1
Behera S, Catreux S, Rossi M, Truong S, Huang Z, Ruehle M, Visvanath A, Parnaby G, Roddey C, Onuchic V, Finocchio A, Cameron DL, English A, Mehtalia S, Han J, Mehio R, Sedlazeck FJ. Comprehensive genome analysis and variant detection at scale using DRAGEN. Nat Biotechnol 2024:10.1038/s41587-024-02382-1. [PMID: 39455800 DOI: 10.1038/s41587-024-02382-1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/24/2023] [Accepted: 08/08/2024] [Indexed: 10/28/2024]
Abstract
Research and medical genomics require comprehensive, scalable methods for the discovery of novel disease targets, evolutionary drivers and genetic markers with clinical significance. This necessitates a framework to identify all types of variants independent of their size or location. Here we present DRAGEN, which uses multigenome mapping with pangenome references, hardware acceleration and machine learning-based variant detection to provide insights into individual genomes, with ~30 min of computation time from raw reads to variant detection. DRAGEN outperforms current state-of-the-art methods in speed and accuracy across all variant types (single-nucleotide variations, insertions or deletions, short tandem repeats, structural variations and copy number variations) and incorporates specialized methods for analysis of medically relevant genes. We demonstrate the performance of DRAGEN across 3,202 whole-genome sequencing datasets by generating fully genotyped multisample variant call format files and demonstrate its scalability, accuracy and innovation to further advance the integration of comprehensive genomics. Overall, DRAGEN marks a major milestone in sequencing data analysis and will provide insights across various diseases, including Mendelian and rare diseases, with a highly comprehensive and scalable platform.
Affiliation(s)
- Sairam Behera
- Human Genome Sequencing Center, Baylor College of Medicine, Houston, TX, USA
- Adam English
- Human Genome Sequencing Center, Baylor College of Medicine, Houston, TX, USA
- Fritz J Sedlazeck
- Human Genome Sequencing Center, Baylor College of Medicine, Houston, TX, USA.
- Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX, USA.
- Department of Computer Science, Rice University, Houston, TX, USA.
2
English AC, Cunial F, Metcalf GA, Gibbs RA, Sedlazeck FJ. K-mer analysis of long-read alignment pileups for structural variant genotyping. bioRxiv 2024:2024.10.22.619642. [PMID: 39484432 PMCID: PMC11526963 DOI: 10.1101/2024.10.22.619642] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/03/2024]
Abstract
Accurately genotyping structural variant (SV) alleles is crucial to genomics research. We present a novel method (kanpig) for genotyping SVs that leverages variant graphs and k-mer vectors to rapidly generate accurate SV genotypes. We benchmark kanpig against the latest SV benchmarks and show single-sample genotyping concordance of 82.1%, which is higher than existing genotypers averaging 66.3%. We explore kanpig's applicability to multi-sample projects by benchmarking project-level VCFs containing 47 genetically diverse samples and find kanpig accurately genotypes complex loci (e.g. SVs neighboring other SVs), achieving much higher genotyping concordance than other tools. Kanpig requires only 43 seconds to process a single sample's 20x long-reads and can be run on PacBio or ONT long-reads.
Affiliation(s)
- Adam C English
- Baylor College of Medicine Human Genome Sequencing Center, Houston, TX, USA
- Fabio Cunial
- Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Ginger A Metcalf
- Baylor College of Medicine Human Genome Sequencing Center, Houston, TX, USA
- Richard A Gibbs
- Baylor College of Medicine Human Genome Sequencing Center, Houston, TX, USA
- Department of Molecular and Human Genetics, Baylor College of Medicine, TX, USA
- Fritz J Sedlazeck
- Baylor College of Medicine Human Genome Sequencing Center, Houston, TX, USA
- Department of Molecular and Human Genetics, Baylor College of Medicine, TX, USA
- Department of Computer Science, Rice University, Houston, TX, USA
3
Behera S, Belyeu JR, Chen X, Paulin LF, Nguyen NQH, Newman E, Mahmoud M, Menon VK, Qi Q, Joshi P, Marcovina S, Rossi M, Roller E, Han J, Onuchic V, Avery CL, Ballantyne CM, Rodriguez CJ, Kaplan RC, Muzny DM, Metcalf GA, Gibbs RA, Yu B, Boerwinkle E, Eberle MA, Sedlazeck FJ. Identification of allele-specific KIV-2 repeats and impact on Lp(a) measurements for cardiovascular disease risk. BMC Med Genomics 2024; 17:255. [PMID: 39449055 PMCID: PMC11515395 DOI: 10.1186/s12920-024-02024-0] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/24/2024] [Accepted: 10/07/2024] [Indexed: 10/26/2024] [Open Access]
Abstract
The abundance of Lp(a) protein holds significant implications for the risk of cardiovascular disease (CVD), which is directly impacted by the copy number (CN) of KIV-2, a 5.5 kbp sub-region. KIV-2 is highly polymorphic in the population and accurate analysis is challenging. In this study, we present the DRAGEN KIV-2 CN caller, which utilizes short reads. Data across 166 WGS show that the caller has high accuracy compared to optical mapping and can further phase approximately 50% of the samples. We compared KIV-2 CN to 24 previously postulated KIV-2 relevant SNVs, revealing that many are ineffective predictors of KIV-2 copy number. Population studies, including USA-based cohorts, showed distinct KIV-2 CN distributions for European-, African-, and Hispanic-American populations and further underscored the limitations of SNV predictors. We demonstrate that the CN estimates correlate significantly with the available Lp(a) protein levels and that phasing is highly important.
Affiliation(s)
- Sairam Behera
- Human Genome Sequencing Center, Baylor College of Medicine, Houston, TX, USA
- Jonathan R Belyeu
- Illumina Inc, San Diego, CA, USA
- Present Address: Pacific Biosciences, San Francisco, CA, USA
- Xiao Chen
- Illumina Inc, San Diego, CA, USA
- Present Address: Pacific Biosciences, San Francisco, CA, USA
- Luis F Paulin
- Human Genome Sequencing Center, Baylor College of Medicine, Houston, TX, USA
- Ngoc Quynh H Nguyen
- School of Public Health, University of Texas Health Science Center, Houston, TX, USA
- Medhat Mahmoud
- Human Genome Sequencing Center, Baylor College of Medicine, Houston, TX, USA
- Vipin K Menon
- Human Genome Sequencing Center, Baylor College of Medicine, Houston, TX, USA
- Genentech, San Francisco, CA, USA
- Qibin Qi
- Department of Epidemiology and Population Health, Albert Einstein College of Medicine, Bronx, NY, USA
- Parag Joshi
- Medpace Reference Laboratories, Cincinnati, OH, USA
- Christy L Avery
- Department of Epidemiology, University of North Carolina at Chapel Hill, Chapel Hill, NC, USA
- Carlos J Rodriguez
- Department of Epidemiology and Population Health, Albert Einstein College of Medicine, Bronx, NY, USA
- Robert C Kaplan
- Department of Epidemiology and Population Health, Albert Einstein College of Medicine, Bronx, NY, USA
- Fred Hutchinson Cancer Center, Public Health Sciences Division, Seattle, WA, USA
- Donna M Muzny
- Human Genome Sequencing Center, Baylor College of Medicine, Houston, TX, USA
- Ginger A Metcalf
- Human Genome Sequencing Center, Baylor College of Medicine, Houston, TX, USA
- Richard A Gibbs
- Human Genome Sequencing Center, Baylor College of Medicine, Houston, TX, USA
- Bing Yu
- School of Public Health, University of Texas Health Science Center, Houston, TX, USA
- Eric Boerwinkle
- Human Genome Sequencing Center, Baylor College of Medicine, Houston, TX, USA
- School of Public Health, University of Texas Health Science Center, Houston, TX, USA
- Michael A Eberle
- Illumina Inc, San Diego, CA, USA
- Present Address: Pacific Biosciences, San Francisco, CA, USA
- Fritz J Sedlazeck
- Human Genome Sequencing Center, Baylor College of Medicine, Houston, TX, USA.
- Department of Computer Science, Rice University, Houston, TX, USA.
4
Dodge TO, Kim BY, Baczenas JJ, Banerjee SM, Gunn TR, Donny AE, Given LA, Rice AR, Haase Cox SK, Weinstein ML, Cross R, Moran BM, Haber K, Haghani NB, Machin Kairuz JA, Gellert HR, Du K, Aguillon SM, Tudor MS, Gutiérrez-Rodríguez C, Rios-Cardenas O, Morris MR, Schartl M, Powell DL, Schumer M. Structural genomic variation and behavioral interactions underpin a balanced sexual mimicry polymorphism. Curr Biol 2024; 34:4662-4676.e9. [PMID: 39326413 DOI: 10.1016/j.cub.2024.08.053] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/14/2024] [Revised: 07/15/2024] [Accepted: 08/29/2024] [Indexed: 09/28/2024]
Abstract
How phenotypic diversity originates and persists within populations are classic puzzles in evolutionary biology. While balanced polymorphisms segregate within many species, it remains rare for both the genetic basis and the selective forces to be known, leading to an incomplete understanding of many classes of traits under balancing selection. Here, we uncover the genetic architecture of a balanced sexual mimicry polymorphism and identify behavioral mechanisms that may be involved in its maintenance in the swordtail fish Xiphophorus birchmanni. We find that ∼40% of X. birchmanni males develop a "false gravid spot," a melanic pigmentation pattern that mimics the "pregnancy spot" associated with sexual maturity in female live-bearing fish. Using genome-wide association mapping, we detect a single intergenic region associated with variation in the false gravid spot phenotype, which is upstream of kitlga, a melanophore patterning gene. By performing long-read sequencing within and across populations, we identify complex structural rearrangements between alternate alleles at this locus. The false gravid spot haplotype drives increased allele-specific expression of kitlga, which provides a mechanistic explanation for the increased melanophore abundance that causes the spot. By studying social interactions in the laboratory and in nature, we find that males with the false gravid spot experience less aggression; however, they also receive increased attention from other males and are disdained by females. These behavioral interactions may contribute to the maintenance of this phenotypic polymorphism in natural populations. We speculate that structural variants affecting gene regulation may be an underappreciated driver of balanced polymorphisms across diverse species.
Affiliation(s)
- Tristram O Dodge
- Department of Biology, Stanford University, 327 Campus Drive, Stanford, CA 94305, USA; Centro de Investigaciones Científicas de las Huastecas "Aguazarca" A.C., 16 de Septiembre, 392 Barrio Aguazarca, Calnali, Hidalgo 43240, México.
- Bernard Y Kim
- Department of Biology, Stanford University, 327 Campus Drive, Stanford, CA 94305, USA
- John J Baczenas
- Department of Biology, Stanford University, 327 Campus Drive, Stanford, CA 94305, USA
- Shreya M Banerjee
- Department of Biology, Stanford University, 327 Campus Drive, Stanford, CA 94305, USA; Centro de Investigaciones Científicas de las Huastecas "Aguazarca" A.C., 16 de Septiembre, 392 Barrio Aguazarca, Calnali, Hidalgo 43240, México; Center for Population Biology and Department of Evolution and Ecology, University of California, Davis, 475 Storer Mall, Davis, CA 95616, USA
- Theresa R Gunn
- Department of Biology, Stanford University, 327 Campus Drive, Stanford, CA 94305, USA; Centro de Investigaciones Científicas de las Huastecas "Aguazarca" A.C., 16 de Septiembre, 392 Barrio Aguazarca, Calnali, Hidalgo 43240, México
- Alex E Donny
- Department of Biology, Stanford University, 327 Campus Drive, Stanford, CA 94305, USA; Centro de Investigaciones Científicas de las Huastecas "Aguazarca" A.C., 16 de Septiembre, 392 Barrio Aguazarca, Calnali, Hidalgo 43240, México
- Lyle A Given
- Department of Biology, Stanford University, 327 Campus Drive, Stanford, CA 94305, USA
- Andreas R Rice
- Department of Biology, Stanford University, 327 Campus Drive, Stanford, CA 94305, USA
- Sophia K Haase Cox
- Department of Biology, Stanford University, 327 Campus Drive, Stanford, CA 94305, USA
- M Luke Weinstein
- Department of Biological Sciences, Ohio University, 7 Depot St., Athens, OH 45701, USA
- Ryan Cross
- Department of Biological Sciences, Ohio University, 7 Depot St., Athens, OH 45701, USA
- Benjamin M Moran
- Department of Biology, Stanford University, 327 Campus Drive, Stanford, CA 94305, USA; Centro de Investigaciones Científicas de las Huastecas "Aguazarca" A.C., 16 de Septiembre, 392 Barrio Aguazarca, Calnali, Hidalgo 43240, México
- Kate Haber
- Department of Biology, Stanford University, 327 Campus Drive, Stanford, CA 94305, USA; Berkeley High School, 1980 Allston Way, Berkeley, CA 94704, USA
- Nadia B Haghani
- Department of Biology, Stanford University, 327 Campus Drive, Stanford, CA 94305, USA; Centro de Investigaciones Científicas de las Huastecas "Aguazarca" A.C., 16 de Septiembre, 392 Barrio Aguazarca, Calnali, Hidalgo 43240, México
- Hannah R Gellert
- Department of Biology, Stanford University, 327 Campus Drive, Stanford, CA 94305, USA
- Kang Du
- Xiphophorus Genetic Stock Center, Texas State University, San Marcos, 601 University Drive, San Marcos, TX 78666, USA
- Stepfanie M Aguillon
- Department of Biology, Stanford University, 327 Campus Drive, Stanford, CA 94305, USA; Centro de Investigaciones Científicas de las Huastecas "Aguazarca" A.C., 16 de Septiembre, 392 Barrio Aguazarca, Calnali, Hidalgo 43240, México; Department of Ecology and Evolutionary Biology, University of California, Los Angeles, 612 Charles E. Young Drive South, Los Angeles, CA 90095, USA
- M Scarlett Tudor
- Cooperative Extension and Aquaculture Research Institute, University of Maine, 33 Salmon Farm Road, Franklin, ME 04634, USA
- Carla Gutiérrez-Rodríguez
- Red de Biología Evolutiva, Instituto de Ecología, A.C., Carretera antigua a Coatepec 351, Col. El Haya, Xalapa, Veracruz 91073, México
- Oscar Rios-Cardenas
- Red de Biología Evolutiva, Instituto de Ecología, A.C., Carretera antigua a Coatepec 351, Col. El Haya, Xalapa, Veracruz 91073, México
- Molly R Morris
- Department of Biological Sciences, Ohio University, 7 Depot St., Athens, OH 45701, USA
- Manfred Schartl
- Xiphophorus Genetic Stock Center, Texas State University, San Marcos, 601 University Drive, San Marcos, TX 78666, USA; Developmental Biochemistry, Biocenter, University of Würzburg, Am Hubland, 97074 Wuerzburg, Germany
- Daniel L Powell
- Department of Biology, Stanford University, 327 Campus Drive, Stanford, CA 94305, USA; Centro de Investigaciones Científicas de las Huastecas "Aguazarca" A.C., 16 de Septiembre, 392 Barrio Aguazarca, Calnali, Hidalgo 43240, México; Department of Biology, Louisiana State University, 202 Life Science Building, Baton Rouge, LA 70803, USA
- Molly Schumer
- Department of Biology, Stanford University, 327 Campus Drive, Stanford, CA 94305, USA; Centro de Investigaciones Científicas de las Huastecas "Aguazarca" A.C., 16 de Septiembre, 392 Barrio Aguazarca, Calnali, Hidalgo 43240, México; Howard Hughes Medical Institute, 327 Campus Drive, Stanford, CA 94305, USA.
5
Ma C, Shi X, Li X, Zhang YP, Peng MS. Comprehensive evaluation and guidance of structural variation detection tools in chicken whole genome sequence data. BMC Genomics 2024; 25:970. [PMID: 39415108 PMCID: PMC11481438 DOI: 10.1186/s12864-024-10875-1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/06/2023] [Accepted: 10/08/2024] [Indexed: 10/18/2024] [Open Access]
Abstract
BACKGROUND: Structural variations (SVs) are widespread across the genome and have a great impact on evolution, disease, and phenotypic diversity. Despite the development of numerous bioinformatic tools, commonly referred to as SV callers, tailored for detecting SVs using whole genome sequence (WGS) data and employing diverse algorithms, their performance necessitates rigorous evaluation with real data and validated SVs. Moreover, a considerable proportion of these tools have been primarily designed and optimized using human genome data. Consequently, their applicability and performance in avian species, characterized by smaller genomes and distinct genomic architectures, remain inadequately assessed. RESULTS: We performed a comprehensive assessment of the performance of ten widely used SV callers using population-level real genomic data with five common types of validated SVs. The performance of SV callers varies with the types and sizes of SVs. Compared with other tools, GRIDSS, Lumpy, Wham, and Manta present better detection accuracy. Pindel can detect more small SVs than others. CNVnator and CNVkit can detect more medium and large copy number variations. Given the poor consistency among different SV callers, the combination calling strategy is not recommended. All tools show poor ability in the detection of insertions (especially with size > 150 bp). At least 50× read depth is required to detect more than 80% of the SVs for most tools. CONCLUSIONS: This study highlights the importance and necessity of using real sequencing data, rather than simulated data only, with validated SVs for SV caller evaluation. Some practical guidance and suggestions are provided for SV detection in future research.
Affiliation(s)
- Cheng Ma
- Key Laboratory of Genetic Evolution & Animal Models and Yunnan Key Laboratory of Molecular Biology of Domestic Animals, Kunming Institute of Zoology, Chinese Academy of Sciences, Kunming, 650223, China
- Department of Medical Biochemistry and Microbiology, Uppsala University, BMC, Uppsala, SE-75123, Sweden
- Xian Shi
- Key Laboratory of Genetic Evolution & Animal Models and Yunnan Key Laboratory of Molecular Biology of Domestic Animals, Kunming Institute of Zoology, Chinese Academy of Sciences, Kunming, 650223, China
- University of Chinese Academy of Sciences, Beijing, 100049, China
- Xuzhen Li
- State Key Laboratory for Conservation and Utilization of Bio-Resources in Yunnan, Yunnan Agricultural University, Kunming, 650201, China
- College of Biological Big Data, Yunnan Agriculture University, Kunming, 650201, China
- Ya-Ping Zhang
- Key Laboratory of Genetic Evolution & Animal Models and Yunnan Key Laboratory of Molecular Biology of Domestic Animals, Kunming Institute of Zoology, Chinese Academy of Sciences, Kunming, 650223, China.
- University of Chinese Academy of Sciences, Beijing, 100049, China.
- State Key Laboratory for Conservation and Utilization of Bio-Resources in Yunnan, Yunnan University, Kunming, 650091, China.
- KIZ-CUHK Joint Laboratory of Bioresources and Molecular Research in Common Diseases, Kunming Institute of Zoology, Chinese Academy of Sciences, Kunming, 650223, China.
- Min-Sheng Peng
- Key Laboratory of Genetic Evolution & Animal Models and Yunnan Key Laboratory of Molecular Biology of Domestic Animals, Kunming Institute of Zoology, Chinese Academy of Sciences, Kunming, 650223, China.
- University of Chinese Academy of Sciences, Beijing, 100049, China.
- KIZ-CUHK Joint Laboratory of Bioresources and Molecular Research in Common Diseases, Kunming Institute of Zoology, Chinese Academy of Sciences, Kunming, 650223, China.
6
Ferreira MR, Carratto TMT, Frontanilla TS, Bonadio RS, Jain M, de Oliveira SF, Castelli EC, Mendes-Junior CT. Advances in forensic genetics: Exploring the potential of long read sequencing. Forensic Sci Int Genet 2024; 74:103156. [PMID: 39427416 DOI: 10.1016/j.fsigen.2024.103156] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/03/2024] [Revised: 10/04/2024] [Accepted: 10/06/2024] [Indexed: 10/22/2024]
Abstract
DNA-based technologies have been used in forensic practice since the mid-1980s. While PCR-based STR genotyping using Capillary Electrophoresis remains the gold standard for generating DNA profiles in routine casework worldwide, the research community is continually seeking alternative methods capable of providing additional information to enhance discrimination power or contribute with new investigative leads. Oxford Nanopore Technologies (ONT) and PacBio third-generation sequencing have revolutionized the field, offering real-time capabilities, single-molecule resolution, and long-read sequencing (LRS). ONT, the pioneer of nanopore sequencing, uses biological nanopores to analyze nucleic acids in real-time. Its devices have revolutionized sequencing and may represent an interesting alternative for forensic research and routine casework, given that it offers unparalleled flexibility in a portable size: it enables sequencing approaches that range widely from PCR-amplified short target regions (e.g., CODIS STRs) to PCR-free whole transcriptome or even ultra-long whole genome sequencing. Despite its higher error rate compared to Illumina sequencing, it can significantly improve accuracy in read alignment against a reference genome or de novo genome assembly. This is achieved by generating long contiguous sequences that correctly assemble repetitive sections and regions with structural variation. Moreover, it allows real-time determination of DNA methylation status from native DNA without the need for bisulfite conversion. LRS enables the analysis of thousands of markers at once, providing phasing information and eliminating the need for multiple assays. This maximizes the information retrieved from a single invaluable sample. In this review, we explore the potential use of LRS in different forensic genetics approaches.
Affiliation(s)
- Marcel Rodrigues Ferreira
- Molecular Genetics and Bioinformatics Laboratory, Experimental Research Unit - Unipex, School of Medicine, São Paulo State University - Unesp, Botucatu, São Paulo, Brazil
- Thássia Mayra Telles Carratto
- Departamento de Química, Laboratório de Pesquisas Forenses e Genômicas, Faculdade de Filosofia, Ciências e Letras de Ribeirão Preto, Universidade de São Paulo, Ribeirão Preto, SP 14040-901, Brazil
- Tamara Soledad Frontanilla
- Departamento de Genética, Faculdade de Medicina de Ribeirão Preto, Universidade de São Paulo, Ribeirão Preto, SP 14049-900, Brazil
- Raphael Severino Bonadio
- Depto Genética e Morfologia, Instituto de Ciências Biológicas, Universidade de Brasília, Brasília, DF, Brazil
- Miten Jain
- Department of Bioengineering, Department of Physics, Khoury College of Computer Sciences, Northeastern University, Boston, MA, United States
- Erick C Castelli
- Molecular Genetics and Bioinformatics Laboratory, Experimental Research Unit - Unipex, School of Medicine, São Paulo State University - Unesp, Botucatu, São Paulo, Brazil; Pathology Department, School of Medicine, São Paulo State University - Unesp, Botucatu, São Paulo, Brazil
- Celso Teixeira Mendes-Junior
- Departamento de Química, Laboratório de Pesquisas Forenses e Genômicas, Faculdade de Filosofia, Ciências e Letras de Ribeirão Preto, Universidade de São Paulo, Ribeirão Preto, SP 14040-901, Brazil.
7
Kang X, Zhang W, Li Y, Luo X, Schönhuth A. HyLight: Strain aware assembly of low coverage metagenomes. Nat Commun 2024; 15:8665. [PMID: 39375348 PMCID: PMC11458758 DOI: 10.1038/s41467-024-52907-0] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/21/2023] [Accepted: 09/23/2024] [Indexed: 10/09/2024] [Open Access]
Abstract
Different strains of identical species can vary substantially in terms of their spectrum of biomedically relevant phenotypes. Reconstructing the genomes of microbial communities at the level of their strains poses significant challenges, because sequencing errors can obscure strain-specific variants. Next-generation sequencing (NGS) reads are too short to resolve complex genomic regions. Third-generation sequencing (TGS) reads, although longer, are prone to higher error rates or substantially more expensive. Limiting TGS coverage to reduce costs compromises the accuracy of the assemblies. This explains why prior approaches suffer from losses in strain awareness or accuracy, tend toward excessive costs, or combinations thereof. We introduce HyLight, a metagenome assembly approach that addresses these challenges by implementing the complementary strengths of TGS and NGS data. HyLight employs strain-resolved overlap graphs (OG) to accurately reconstruct individual strains within microbial communities. Our experiments demonstrate that HyLight produces strain-aware and contiguous assemblies at minimal error content, while significantly reducing costs by utilizing low-coverage TGS data. HyLight achieves an average improvement of 19.05% in preserving strain identity and demonstrates near-complete strain awareness across diverse datasets. In summary, HyLight offers considerable advances in metagenome assembly, insofar as it delivers significantly enhanced strain awareness, contiguity, and accuracy without the typical compromises observed in existing approaches.
Affiliation(s)
- Xiongbin Kang
- College of Biology, Hunan University, Changsha, China
- Genome Data Science, Faculty of Technology, Bielefeld University, Bielefeld, Germany
- Wenhai Zhang
- College of Biology, Hunan University, Changsha, China
- Yichen Li
- College of Computer Science and Electronic Engineering, Hunan University, Changsha, China
- Xiao Luo
- College of Biology, Hunan University, Changsha, China.
- Alexander Schönhuth
- Genome Data Science, Faculty of Technology, Bielefeld University, Bielefeld, Germany.
8
Dunn T, Zook JM, Holt JM, Narayanasamy S. Jointly benchmarking small and structural variant calls with vcfdist. Genome Biol 2024; 25:253. [PMID: 39358801 PMCID: PMC11446017 DOI: 10.1186/s13059-024-03394-5] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/23/2024] [Accepted: 09/17/2024] [Indexed: 10/04/2024] [Open Access]
Abstract
In this work, we extend vcfdist to be the first variant call benchmarking tool to jointly evaluate phased single-nucleotide polymorphisms (SNPs), small insertions/deletions (INDELs), and structural variants (SVs) for the whole genome. First, we find that a joint evaluation of small and structural variants uniformly reduces measured errors for SNPs (-28.9%), INDELs (-19.3%), and SVs (-52.4%) across three datasets. vcfdist also corrects a common flaw in phasing evaluations, reducing measured flip errors by over 50%. Lastly, we show that vcfdist is more accurate than previously published works and on par with the newest approaches while providing improved result interpretability.
Affiliation(s)
- Tim Dunn
- Computer Science and Engineering, University of Michigan, Ann Arbor, Michigan, USA.
- Justin M Zook
- National Institute of Standards and Technology, Gaithersburg, Maryland, USA
- Satish Narayanasamy
- Computer Science and Engineering, University of Michigan, Ann Arbor, Michigan, USA
9
Kumari P, Kaur M, Dindhoria K, Ashford B, Amarasinghe SL, Thind AS. Advances in long-read single-cell transcriptomics. Hum Genet 2024; 143:1005-1020. [PMID: 38787419 PMCID: PMC11485027 DOI: 10.1007/s00439-024-02678-x] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/20/2023] [Accepted: 05/07/2024] [Indexed: 05/25/2024]
Abstract
Long-read single-cell transcriptomics (scRNA-Seq) is revolutionizing the way we profile heterogeneity in disease. Traditional short-read scRNA-Seq methods are limited in their ability to provide complete transcript coverage, resolve isoforms, and identify novel transcripts. The scRNA-Seq protocols developed for long-read sequencing platforms overcome these limitations by enabling the characterization of full-length transcripts. Long-read scRNA-Seq techniques initially suffered from poor accuracy compared to short-read scRNA-Seq. However, with improvements in accuracy, accessibility, and cost efficiency, long reads are gaining popularity in the field of scRNA-Seq. This review details the advances in long-read scRNA-Seq, with an emphasis on library preparation protocols and downstream bioinformatics analysis tools.
Affiliation(s)
- Pallawi Kumari
- Institute of Microbial Technology, Council of Scientific and Industrial Research, Chandigarh, India
- Manmeet Kaur
- Institute of Microbial Technology, Council of Scientific and Industrial Research, Chandigarh, India
- Kiran Dindhoria
- Institute of Microbial Technology, Council of Scientific and Industrial Research, Chandigarh, India
- Bruce Ashford
- Illawarra Shoalhaven Local Health District (ISLHD), NSW Health, Wollongong, NSW, Australia
- Shanika L Amarasinghe
- Monash Biomedical Discovery Institute, Monash University, Clayton, VIC, 3800, Australia
- Walter and Eliza Hall Institute of Medical Research, 1G, Royal Parade, Parkville, VIC, 3025, Australia
- Amarinder Singh Thind
- Illawarra Shoalhaven Local Health District (ISLHD), NSW Health, Wollongong, NSW, Australia.
- The School of Chemistry and Molecular Bioscience (SCMB), University of Wollongong, Loftus St, Wollongong, NSW, 2500, Australia.
10
Smolka M, Paulin LF, Grochowski CM, Horner DW, Mahmoud M, Behera S, Kalef-Ezra E, Gandhi M, Hong K, Pehlivan D, Scholz SW, Carvalho CMB, Proukakis C, Sedlazeck FJ. Detection of mosaic and population-level structural variants with Sniffles2. Nat Biotechnol 2024; 42:1571-1580. [PMID: 38168980 PMCID: PMC11217151 DOI: 10.1038/s41587-023-02024-y] [Citation(s) in RCA: 34] [Impact Index Per Article: 34.0] [Received: 04/04/2022] [Accepted: 10/11/2023] [Indexed: 01/05/2024]
Abstract
Calling structural variations (SVs) is technically challenging, but using long reads remains the most accurate way to identify complex genomic alterations. Here we present Sniffles2, which improves over current methods by implementing a repeat aware clustering coupled with a fast consensus sequence and coverage-adaptive filtering. Sniffles2 is 11.8 times faster and 29% more accurate than state-of-the-art SV callers across different coverages (5-50×), sequencing technologies (ONT and HiFi) and SV types. Furthermore, Sniffles2 solves the problem of family-level to population-level SV calling to produce fully genotyped VCF files. Across 11 probands, we accurately identified causative SVs around MECP2, including highly complex alleles with three overlapping SVs. Sniffles2 also enables the detection of mosaic SVs in bulk long-read data. As a result, we identified multiple mosaic SVs in brain tissue from a patient with multiple system atrophy. The identified SV showed a remarkable diversity within the cingulate cortex, impacting both genes involved in neuron function and repetitive elements.
Affiliation(s)
- Moritz Smolka
- Human Genome Sequencing Center Baylor College of Medicine, Houston, TX, USA
| | - Luis F Paulin
- Human Genome Sequencing Center Baylor College of Medicine, Houston, TX, USA
| | | | - Dominic W Horner
- Department of Clinical and Movement Neurosciences, Royal Free Campus, Queen Square Institute of Neurology, University College London, London, UK
- Aligning Science Across Parkinson's (ASAP) Collaborative Research Network, Chevy Chase, MD, USA
| | - Medhat Mahmoud
- Human Genome Sequencing Center Baylor College of Medicine, Houston, TX, USA
- Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX, USA
| | - Sairam Behera
- Human Genome Sequencing Center Baylor College of Medicine, Houston, TX, USA
| | - Ester Kalef-Ezra
- Department of Clinical and Movement Neurosciences, Royal Free Campus, Queen Square Institute of Neurology, University College London, London, UK
- Aligning Science Across Parkinson's (ASAP) Collaborative Research Network, Chevy Chase, MD, USA
| | - Mira Gandhi
- Pacific Northwest Research Institute (PNRI), Seattle, WA, USA
| | - Karl Hong
- Bionano Genomics, San Diego, CA, USA
| | - Davut Pehlivan
- Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX, USA
- Division of Neurology and Developmental Neuroscience, Department of Pediatrics, Baylor College of Medicine, Houston, TX, USA
| | - Sonja W Scholz
- Neurodegenerative Diseases Research Unit, National Institute of Neurological Disorders and Stroke, Bethesda, MD, USA
- Department of Neurology, Johns Hopkins University Medical Center, Baltimore, MD, USA
| | - Claudia M B Carvalho
- Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX, USA
- Pacific Northwest Research Institute (PNRI), Seattle, WA, USA
| | - Christos Proukakis
- Department of Clinical and Movement Neurosciences, Royal Free Campus, Queen Square Institute of Neurology, University College London, London, UK
- Aligning Science Across Parkinson's (ASAP) Collaborative Research Network, Chevy Chase, MD, USA
| | - Fritz J Sedlazeck
- Human Genome Sequencing Center Baylor College of Medicine, Houston, TX, USA.
- Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX, USA.
- Aligning Science Across Parkinson's (ASAP) Collaborative Research Network, Chevy Chase, MD, USA.
- Department of Computer Science, Rice University, Houston, TX, USA.
|
11
|
Zhang Z, Zhang J, Kang L, Qiu X, Xu S, Xu J, Guo Y, Niu Z, Niu B, Bi A, Zhao X, Xu D, Wang J, Yin C, Lu F. Structural variation discovery in wheat using PacBio high-fidelity sequencing. The Plant Journal: for Cell and Molecular Biology 2024; 120:687-698. [PMID: 39239888 DOI: 10.1111/tpj.17011] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/14/2024] [Revised: 08/09/2024] [Accepted: 08/22/2024] [Indexed: 09/07/2024]
Abstract
Structural variations (SVs) pervade plant genomes and contribute substantially to phenotypic diversity. However, most SVs were ineffectively assayed due to their complex nature and the limitations of early genomic technologies. By applying PacBio high-fidelity (HiFi) sequencing to wheat genomes, we performed a comprehensive evaluation of mainstream long-read aligners and SV callers in SV detection. The results indicated that the accuracy of deletion discovery is markedly influenced by callers, accounting for 87.73% of the variance, whereas both aligners (38.25%) and callers (49.32%) contributed substantially to the accuracy variance for insertions. Among the aligners, Winnowmap2 and NGMLR excelled in detecting deletions and insertions, respectively. For SV callers, SVIM achieved the best performance. We demonstrated that combining the aligners and callers mentioned above is optimal for SV detection. Furthermore, we evaluated the effect of sequencing depth on the accuracy of SV detection, revealing that low-coverage HiFi sequencing is sufficiently robust for high-quality SV discovery. This study thoroughly evaluated SV discovery approaches and established optimal workflows for investigating structural variations using low-coverage HiFi sequencing in the wheat genome, which will advance SV discovery and help decipher the biological functions of SVs in wheat and many other plants.
Affiliation(s)
- Zhiliang Zhang
- State Key Laboratory of Plant Cell and Chromosome Engineering, Institute of Genetics and Developmental Biology, Innovative Academy of Seed Design, Chinese Academy of Sciences, Beijing, China
- University of Chinese Academy of Sciences, Beijing, China
| | - Jijin Zhang
- State Key Laboratory of Plant Cell and Chromosome Engineering, Institute of Genetics and Developmental Biology, Innovative Academy of Seed Design, Chinese Academy of Sciences, Beijing, China
- University of Chinese Academy of Sciences, Beijing, China
| | - Lipeng Kang
- State Key Laboratory of Plant Cell and Chromosome Engineering, Institute of Genetics and Developmental Biology, Innovative Academy of Seed Design, Chinese Academy of Sciences, Beijing, China
- University of Chinese Academy of Sciences, Beijing, China
| | - Xuebing Qiu
- State Key Laboratory of Plant Cell and Chromosome Engineering, Institute of Genetics and Developmental Biology, Innovative Academy of Seed Design, Chinese Academy of Sciences, Beijing, China
- University of Chinese Academy of Sciences, Beijing, China
| | - Song Xu
- State Key Laboratory of Plant Cell and Chromosome Engineering, Institute of Genetics and Developmental Biology, Innovative Academy of Seed Design, Chinese Academy of Sciences, Beijing, China
- University of Chinese Academy of Sciences, Beijing, China
| | - Jun Xu
- State Key Laboratory of Plant Cell and Chromosome Engineering, Institute of Genetics and Developmental Biology, Innovative Academy of Seed Design, Chinese Academy of Sciences, Beijing, China
- University of Chinese Academy of Sciences, Beijing, China
| | - Yafei Guo
- State Key Laboratory of Plant Cell and Chromosome Engineering, Institute of Genetics and Developmental Biology, Innovative Academy of Seed Design, Chinese Academy of Sciences, Beijing, China
- University of Chinese Academy of Sciences, Beijing, China
| | - Zelin Niu
- State Key Laboratory of Plant Cell and Chromosome Engineering, Institute of Genetics and Developmental Biology, Innovative Academy of Seed Design, Chinese Academy of Sciences, Beijing, China
- University of Chinese Academy of Sciences, Beijing, China
| | - Beirui Niu
- State Key Laboratory of Plant Cell and Chromosome Engineering, Institute of Genetics and Developmental Biology, Innovative Academy of Seed Design, Chinese Academy of Sciences, Beijing, China
- University of Chinese Academy of Sciences, Beijing, China
| | - Aoyue Bi
- State Key Laboratory of Plant Cell and Chromosome Engineering, Institute of Genetics and Developmental Biology, Innovative Academy of Seed Design, Chinese Academy of Sciences, Beijing, China
- University of Chinese Academy of Sciences, Beijing, China
| | - Xuebo Zhao
- State Key Laboratory of Plant Cell and Chromosome Engineering, Institute of Genetics and Developmental Biology, Innovative Academy of Seed Design, Chinese Academy of Sciences, Beijing, China
- University of Chinese Academy of Sciences, Beijing, China
| | - Daxing Xu
- State Key Laboratory of Plant Cell and Chromosome Engineering, Institute of Genetics and Developmental Biology, Innovative Academy of Seed Design, Chinese Academy of Sciences, Beijing, China
- University of Chinese Academy of Sciences, Beijing, China
| | - Jing Wang
- State Key Laboratory of Plant Cell and Chromosome Engineering, Institute of Genetics and Developmental Biology, Innovative Academy of Seed Design, Chinese Academy of Sciences, Beijing, China
| | - Changbin Yin
- State Key Laboratory of Plant Cell and Chromosome Engineering, Institute of Genetics and Developmental Biology, Innovative Academy of Seed Design, Chinese Academy of Sciences, Beijing, China
| | - Fei Lu
- State Key Laboratory of Plant Cell and Chromosome Engineering, Institute of Genetics and Developmental Biology, Innovative Academy of Seed Design, Chinese Academy of Sciences, Beijing, China
- University of Chinese Academy of Sciences, Beijing, China
- CAS-JIC Centre of Excellence for Plant and Microbial Science (CEPAMS), Institute of Genetics and Developmental Biology, Chinese Academy of Sciences, Beijing, China
|
12
|
Ren P, Zhang J, Vijg J. Somatic mutations in aging and disease. GeroScience 2024; 46:5171-5189. [PMID: 38488948 PMCID: PMC11336144 DOI: 10.1007/s11357-024-01113-3] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/11/2023] [Accepted: 02/27/2024] [Indexed: 03/17/2024] Open
Abstract
Time always leaves its mark, and our genome is no exception. Mutations in the genome of somatic cells were first hypothesized to be the cause of aging in the 1950s, shortly after the molecular structure of DNA had been described. Somatic mutation theories of aging are based on the fact that mutations in DNA as the ultimate template for all cellular functions are irreversible. However, it took until the 1990s to develop the methods to test if DNA mutations accumulate with age in different organs and tissues and estimate the severity of the problem. By now, numerous studies have documented the accumulation of somatic mutations with age in normal cells and tissues of mice, humans, and other animals, showing clock-like mutational signatures that provide information on the underlying causes of the mutations. In this review, we will first briefly discuss the recent advances in next-generation sequencing that now allow quantitative analysis of somatic mutations. Second, we will provide evidence that the mutation rate differs between cell types, with a focus on differences between germline and somatic mutation rate. Third, we will discuss somatic mutational signatures as measures of aging, environmental exposure, and activities of DNA repair processes. Fourth, we will explain the concept of clonally amplified somatic mutations, with a focus on clonal hematopoiesis. Fifth, we will briefly discuss somatic mutations in the transcriptome and in our other genome, i.e., the genome of mitochondria. We will end with a brief discussion of a possible causal contribution of somatic mutations to the aging process.
Affiliation(s)
- Peijun Ren
- Center for Single-Cell Omics, School of Public Health, Shanghai Jiao Tong University School of Medicine, Shanghai, 200025, China.
| | - Jie Zhang
- Center for Single-Cell Omics, School of Public Health, Shanghai Jiao Tong University School of Medicine, Shanghai, 200025, China
| | - Jan Vijg
- Center for Single-Cell Omics, School of Public Health, Shanghai Jiao Tong University School of Medicine, Shanghai, 200025, China.
- Department of Genetics, Albert Einstein College of Medicine, Bronx, NY, 10461, USA.
|
13
|
Chen Z, Morris HR, Polke J, Wood NW, Gandhi S, Ryten M, Houlden H, Tucci A. Repeat expansion disorders. Pract Neurol 2024:pn-2023-003938. [PMID: 39349043 DOI: 10.1136/pn-2023-003938] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Accepted: 08/25/2024] [Indexed: 10/02/2024]
Abstract
An increasing number of repeat expansion disorders have been found to cause both rare and common neurological disease. This is exemplified in recent discoveries of novel repeat expansions underlying a significant proportion of several late-onset neurodegenerative disorders, such as CANVAS (cerebellar ataxia, neuropathy and vestibular areflexia syndrome) and spinocerebellar ataxia type 27B. Most of the 60 described repeat expansion disorders to date are associated with neurological disease, providing substantial challenges for diagnosis, but also opportunities for management in a clinical neurology setting. Commonalities in clinical presentation, overarching diagnostic features and similarities in the approach to genetic testing justify considering these disorders collectively based on their unifying causative mechanism. In this review, we discuss the characteristics and diagnostic challenges of repeat expansion disorders for the neurologist and provide examples to highlight their clinical heterogeneity. With the ready availability of clinical-grade whole-genome sequencing for molecular diagnosis, we discuss the current approaches to testing for repeat expansion disorders and application in clinical practice.
Affiliation(s)
- Zhongbo Chen
- Department of Clinical and Movement Neuroscience, University College London Queen Square Institute of Neurology, London, UK
- The Francis Crick Institute, London, UK
| | - Huw R Morris
- Department of Clinical and Movement Neuroscience, University College London Queen Square Institute of Neurology, London, UK
| | - James Polke
- The Neurogenetics Laboratory, National Hospital for Neurology and Neurosurgery, University College London Hospitals NHS Foundation Trust, London, UK
| | - Nicholas W Wood
- Department of Clinical and Movement Neuroscience, University College London Queen Square Institute of Neurology, London, UK
| | - Sonia Gandhi
- Department of Clinical and Movement Neuroscience, University College London Queen Square Institute of Neurology, London, UK
- The Francis Crick Institute, London, UK
| | - Mina Ryten
- UK Dementia Research Institute at University of Cambridge, Cambridge, UK
| | - Henry Houlden
- Department of Neuromuscular Disease, University College London Queen Square Institute of Neurology, London, UK
| | - Arianna Tucci
- William Harvey Institute, Queen Mary University of London, London, UK
|
14
|
Lok S, Lau TNH, Trost B, Tong AHY, Paton T, Wintle RF, Engstrom MD, Gunn A, Scherer SW. Chromosomal-level reference genome assembly of muskox (Ovibos moschatus) from Banks Island in the Canadian Arctic, a resource for conservation genomics. Sci Rep 2024; 14:21023. [PMID: 39284808 PMCID: PMC11405533 DOI: 10.1038/s41598-024-67270-9] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/27/2024] [Accepted: 07/09/2024] [Indexed: 09/20/2024] Open
Abstract
The muskox (Ovibos moschatus), an integral component and iconic symbol of arctic biocultural diversity, is under threat by rapid environmental disruptions from climate change. We report a chromosomal-level haploid genome assembly of a muskox from Banks Island in the Canadian Arctic Archipelago. The assembly has a contig N50 of 44.7 Mbp, a scaffold N50 of 112.3 Mbp, a complete representation (100%) of the BUSCO v5.2.2 set of 9225 mammalian marker genes and is anchored to the 24 chromosomes of the muskox. Tabulation of heterozygous single nucleotide variants in our specimen revealed a very low level of genetic diversity, which is consistent with recent reports of the muskox having the lowest genome-wide heterozygosity among the ungulates. While muskox populations are currently showing no overt signs of inbreeding depression, environmental disruptions are expected to strain the genomic resilience of the species. One notable impact of rapid climate change in the Arctic is the spread of emerging infectious and parasitic diseases in the muskox, as exemplified by the range expansion of muskox lungworms, and the recent fatal outbreaks of Erysipelothrix rhusiopathiae, a pathogen normally associated with domestic swine and poultry. As a genomics resource for conservation management of the muskox against existing and emerging disease modalities, we annotated the genes of the major histocompatibility complex on chromosome 2 and performed an initial assessment of the genetic diversity of this complex. This resource is further supported by the annotation of the principal genes of the innate immunity system, genes that are rapidly evolving and under positive selection in the muskox, genes associated with environmental adaptations, and the genes associated with socioeconomic benefits for Arctic communities such as wool (qiviut) attributes. These annotations will benefit muskox management and conservation.
Affiliation(s)
- Si Lok
- The Centre for Applied Genomics, Peter Gilgan Centre for Research and Learning, The Hospital for Sick Children, 686 Bay Street, Rm 13.9713, Suite 03-6577, Toronto, ON, M5G 0A4, Canada.
- Program in Genetics and Genome Biology, The Hospital for Sick Children, Toronto, ON, M5G 0A4, Canada.
| | - Timothy N H Lau
- The Centre for Applied Genomics, Peter Gilgan Centre for Research and Learning, The Hospital for Sick Children, 686 Bay Street, Rm 13.9713, Suite 03-6577, Toronto, ON, M5G 0A4, Canada
- Program in Genetics and Genome Biology, The Hospital for Sick Children, Toronto, ON, M5G 0A4, Canada
| | - Brett Trost
- The Centre for Applied Genomics, Peter Gilgan Centre for Research and Learning, The Hospital for Sick Children, 686 Bay Street, Rm 13.9713, Suite 03-6577, Toronto, ON, M5G 0A4, Canada
- Program in Genetics and Genome Biology, The Hospital for Sick Children, Toronto, ON, M5G 0A4, Canada
- Program in Molecular Medicine, The Hospital for Sick Children, Toronto, ON, M5G 0A4, Canada
| | - Amy H Y Tong
- Donnelly Centre for Cellular and Biomolecular Research, University of Toronto, Toronto, ON, M5S 3E1, Canada
| | - Tara Paton
- The Centre for Applied Genomics, Peter Gilgan Centre for Research and Learning, The Hospital for Sick Children, 686 Bay Street, Rm 13.9713, Suite 03-6577, Toronto, ON, M5G 0A4, Canada
- Program in Genetics and Genome Biology, The Hospital for Sick Children, Toronto, ON, M5G 0A4, Canada
| | - Richard F Wintle
- The Centre for Applied Genomics, Peter Gilgan Centre for Research and Learning, The Hospital for Sick Children, 686 Bay Street, Rm 13.9713, Suite 03-6577, Toronto, ON, M5G 0A4, Canada
- Program in Genetics and Genome Biology, The Hospital for Sick Children, Toronto, ON, M5G 0A4, Canada
| | - Mark D Engstrom
- Department of Natural History, Royal Ontario Museum, Toronto, ON, M5S 2C6, Canada
| | | | - Stephen W Scherer
- The Centre for Applied Genomics, Peter Gilgan Centre for Research and Learning, The Hospital for Sick Children, 686 Bay Street, Rm 13.9713, Suite 03-6577, Toronto, ON, M5G 0A4, Canada.
- Program in Genetics and Genome Biology, The Hospital for Sick Children, Toronto, ON, M5G 0A4, Canada.
- McLaughlin Centre, University of Toronto, Toronto, ON, M5G 0A4, Canada.
- Department of Molecular Genetics, Faculty of Medicine, University of Toronto, Toronto, ON, M5S 1A8, Canada.
|
15
|
Höps W, Rausch T, Jendrusch M, Korbel JO, Sedlazeck FJ. Impact and characterization of serial structural variations across humans and great apes. Nat Commun 2024; 15:8007. [PMID: 39266513 PMCID: PMC11393467 DOI: 10.1038/s41467-024-52027-9] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/05/2024] [Accepted: 08/23/2024] [Indexed: 09/14/2024] Open
Abstract
Modern sequencing technology enables the systematic detection of complex structural variation (SV) across genomes. However, extensive DNA rearrangements arising through a series of mutations, a phenomenon we refer to as serial SV (sSV), remain underexplored, posing a challenge for SV discovery. Here, we present NAHRwhals ( https://github.com/WHops/NAHRwhals ), a method to infer repeat-mediated series of SVs in long-read genomic assemblies. Applying NAHRwhals to haplotype-resolved human genomes from 28 individuals reveals 37 sSV loci of various length and complexity. These sSVs explain otherwise cryptic variation in medically relevant regions such as the TPSAB1 gene, 8p23.1, 22q11 and Sotos syndrome regions. Comparisons with great ape assemblies indicate that most human sSVs formed recently, after the human-ape split, and involved non-repeat-mediated processes in addition to non-allelic homologous recombination. NAHRwhals reliably discovers and characterizes sSVs at scale and independent of species, uncovering their genomic abundance and suggesting broader implications for disease.
Affiliation(s)
- Wolfram Höps
- European Molecular Biology Laboratory, Genome Biology Unit, Meyerhofstr. 1, 69117, Heidelberg, Germany
| | - Tobias Rausch
- European Molecular Biology Laboratory, Genome Biology Unit, Meyerhofstr. 1, 69117, Heidelberg, Germany
- Molecular Medicine Partnership Unit, European Molecular Biology Laboratory, University of Heidelberg, Heidelberg, Germany
| | - Michael Jendrusch
- European Molecular Biology Laboratory, Genome Biology Unit, Meyerhofstr. 1, 69117, Heidelberg, Germany
| | - Jan O Korbel
- European Molecular Biology Laboratory, Genome Biology Unit, Meyerhofstr. 1, 69117, Heidelberg, Germany.
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge, CB10 1SD, UK.
| | - Fritz J Sedlazeck
- Human Genome Sequencing Center, Baylor College of Medicine, Houston, TX, 77030, USA
- Department of Computer Science, Rice University, Houston, TX, USA
|
16
|
Redelings BD, Holmes I, Lunter G, Pupko T, Anisimova M. Insertions and Deletions: Computational Methods, Evolutionary Dynamics, and Biological Applications. Mol Biol Evol 2024; 41:msae177. [PMID: 39172750 PMCID: PMC11385596 DOI: 10.1093/molbev/msae177] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/10/2024] [Revised: 07/02/2024] [Accepted: 07/09/2024] [Indexed: 08/24/2024] Open
Abstract
Insertions and deletions constitute the second most important source of natural genomic variation. Insertions and deletions account for up to 25% of genomic variants in humans and are involved in complex evolutionary processes including genomic rearrangements, adaptation, and speciation. Recent advances in long-read sequencing technologies allow detailed inference of insertion and deletion variation in species and populations. Yet, despite their importance, evolutionary studies have traditionally ignored or mishandled insertions and deletions due to a lack of comprehensive methodologies and statistical models of insertion and deletion dynamics. Here, we discuss methods for describing insertion and deletion variation and modeling insertions and deletions over evolutionary time. We provide practical advice for tackling insertions and deletions in genomic sequences and illustrate our discussion with examples of insertion- and deletion-induced effects in human and other natural populations and their contribution to evolutionary processes. We outline promising directions for future developments in statistical methodologies that would allow researchers to analyze insertion and deletion variation and its effects in large genomic data sets and to incorporate insertions and deletions in evolutionary inference.
Affiliation(s)
| | - Ian Holmes
- Department of Bioengineering, University of California, Berkeley, CA 94720, USA
- Calico Life Sciences LLC, South San Francisco, CA 94080, USA
| | - Gerton Lunter
- Department of Epidemiology, University Medical Center Groningen, University of Groningen, Groningen 9713 GZ, The Netherlands
| | - Tal Pupko
- The Shmunis School of Biomedicine and Cancer Research, George S. Wise Faculty of Life Sciences, Tel Aviv University, Tel Aviv 6997801, Israel
| | - Maria Anisimova
- Institute of Computational Life Sciences, Zurich University of Applied Sciences, Wädenswil, Switzerland
- Swiss Institute of Bioinformatics, Lausanne, Switzerland
|
17
|
Köroğlu Ç, Chen P, Traurig M, Altok S, Bogardus C, Baier LJ. De Novo Genome Assemblies From Two Indigenous Americans from Arizona Identify New Polymorphisms in Non-Reference Sequences. Genome Biol Evol 2024; 16:evae188. [PMID: 39190003 PMCID: PMC11384899 DOI: 10.1093/gbe/evae188] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/08/2023] [Revised: 05/17/2024] [Accepted: 08/22/2024] [Indexed: 08/28/2024] Open
Abstract
There is a collective push to diversify human genetic studies by including underrepresented populations. However, analyzing DNA sequence reads involves the initial step of aligning the reads to the GRCh38/hg38 reference genome, which is inadequate for non-European ancestries. In this study, using long-read sequencing technology, we constructed de novo genome assemblies from two indigenous Americans from Arizona (IAZ). Each assembly included ∼17 Mb of DNA sequence not present in hg38 [nonreference sequence (NRS)], which consists mostly of repeat elements. Forty NRSs totaling 240 kb were uniquely anchored to the hg38 primary assembly, generating a modified hg38-NRS reference genome. DNA sequence alignment and variant calling were then conducted with whole-genome sequencing (WGS) data from 387 IAZ using both the hg38 and modified hg38-NRS reference maps. Variant calling with the hg38-NRS map identified ∼50,000 single-nucleotide variants present in at least 5% of the WGS samples which were not detected with the hg38 reference map. We also directly assessed the NRSs positioned within genes. Seventeen NRSs anchored to regions including an identical 187 bp NRS found in both de novo assemblies. The NRS is located in HCN2 79 bp downstream of Exon 3 and contains several putative transcriptional regulatory elements. Genotyping of the HCN2-NRS revealed that the insertion is enriched in IAZ (minor allele frequency = 0.45) compared to other reference populations tested. This study shows that inclusion of population-specific NRSs can dramatically change the variant profile in underrepresented ethnic groups and thereby lead to the discovery of previously missed common variations.
Affiliation(s)
- Çiğdem Köroğlu
- Diabetes Molecular Genetics Section, Phoenix Epidemiology and Clinical Research Branch, National Institute of Diabetes and Digestive and Kidney Diseases, National Institutes of Health, Phoenix, AZ 85004, USA
| | - Peng Chen
- Diabetes Molecular Genetics Section, Phoenix Epidemiology and Clinical Research Branch, National Institute of Diabetes and Digestive and Kidney Diseases, National Institutes of Health, Phoenix, AZ 85004, USA
| | - Michael Traurig
- Diabetes Molecular Genetics Section, Phoenix Epidemiology and Clinical Research Branch, National Institute of Diabetes and Digestive and Kidney Diseases, National Institutes of Health, Phoenix, AZ 85004, USA
| | - Serdar Altok
- Diabetes Molecular Genetics Section, Phoenix Epidemiology and Clinical Research Branch, National Institute of Diabetes and Digestive and Kidney Diseases, National Institutes of Health, Phoenix, AZ 85004, USA
| | - Clifton Bogardus
- Diabetes Molecular Genetics Section, Phoenix Epidemiology and Clinical Research Branch, National Institute of Diabetes and Digestive and Kidney Diseases, National Institutes of Health, Phoenix, AZ 85004, USA
| | - Leslie J Baier
- Diabetes Molecular Genetics Section, Phoenix Epidemiology and Clinical Research Branch, National Institute of Diabetes and Digestive and Kidney Diseases, National Institutes of Health, Phoenix, AZ 85004, USA
|
18
|
Blommaert J, Sandoval-Castillo J, Beheregaray LB, Wellenreuther M. Peering into the gaps: Long-read sequencing illuminates structural variants and genomic evolution in the Australasian snapper. Genomics 2024; 116:110929. [PMID: 39216708 DOI: 10.1016/j.ygeno.2024.110929] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/05/2024] [Revised: 08/25/2024] [Accepted: 08/26/2024] [Indexed: 09/04/2024]
Abstract
Even before genome sequencing, genetic resources have supported species management and breeding programs. Current technologies, such as long-read sequencing, resolve complex genomic regions, like those rich in repeats or high in GC content. Improved genome contiguity enhances accuracy in identifying structural variants (SVs) and transposable elements (TEs). We present an improved genome assembly and SV catalogue for the Australasian snapper (Chrysophrys auratus). The new assembly is more contiguous, allowing for putative identification of 14 centromeres and transfer of 26,115 gene annotations from yellowfin seabream. Compared to the previous assembly, 35,000 additional SVs, including larger and more complex rearrangements, were annotated. SVs and TEs exhibit a distribution pattern skewed towards chromosome ends, likely influenced by recombination. Some SVs overlap with growth-related genes, underscoring their significance. This upgraded genome serves as a foundation for studying natural and artificial selection, offers a reference for related species, and sheds light on genome dynamics shaped by evolution.
Affiliation(s)
- Julie Blommaert
- The New Zealand Institute for Plant and Food Research, Nelson, New Zealand.
| | - Jonathan Sandoval-Castillo
- Molecular Ecology Laboratory, College of Science and Engineering, Flinders University, Bedford Park, South Australia, Australia
| | - Luciano B Beheregaray
- Molecular Ecology Laboratory, College of Science and Engineering, Flinders University, Bedford Park, South Australia, Australia
| | - Maren Wellenreuther
- The New Zealand Institute for Plant and Food Research, Nelson, New Zealand; School of Biological Sciences, The University of Auckland, Auckland, New Zealand
|
19
|
Mbeti JMM, Bénech C, Sack FN, Wete E, Pangetha HN, Ateba SN, Tchatchueng J, Nloga AN, Fichou Y. First investigation of RH gene polymorphism in patients with sickle cell disease and associated blood donors in Cameroon, Central Africa. Blood Transfusion = Trasfusione del Sangue 2024; 22:377-386. [PMID: 38315540 PMCID: PMC11390615 DOI: 10.2450/bloodtransfus.660] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/26/2023] [Accepted: 11/29/2023] [Indexed: 02/07/2024]
Abstract
BACKGROUND: Although genetic polymorphism of the RH blood group system is well known in sub-Saharan Africa, national/regional specificities still remain to be described precisely. For the first time in Cameroon, Central Africa, and in order to better characterize the molecular basis driving RH phenotype variability, as well as to identify the main antigens that may be potentially responsible for alloimmunization, we sought 1) to study the RH genes in a cohort of 109 patients with sickle cell disease; 2) to study the same genes in the corresponding donors whose red blood cells (RBCs) were transfused to the patients (108 donors in 98 patients); 3) to predict RH phenotype on the basis of the molecular data and compare the results with serologic testing; and 4) to identify retrospectively patients at risk for alloimmunization.
MATERIALS AND METHODS: In order to generate an exhaustive dataset, the RH genes of all patient and donor samples were systematically investigated 1) by quantitative multiplex PCR of short fluorescent fragments (QMPSF) for characterization of RHD gene zygosity and potential structural variants (SVs), and 2) by Sanger sequencing for identification of single nucleotide variants (SNVs). Subsequent to molecular analysis, the genotypes and RH phenotype were deduced and predicted, respectively, from reference databases.
RESULTS: In a total of 217 Cameroonian individuals, as many as 24 and up to 22 variant alleles were identified in the RHD and RHCE genes, respectively, in addition to the reference alleles. Interestingly, 65 patients with SCD (66.3%) were assumed to be exposed to one or more undesirable RH antigen(s) with varying degrees of clinical relevance.
DISCUSSION: Beyond the comprehensive report of the nature and distribution of RH variant alleles in a subset of Cameroonian patients treated by transfusion therapy, this work highlights the need for an extensive review of current practice, including routine serologic typing procedures, preferably in the near future.
Affiliation(s)
- Jeanne Manga Messina Mbeti
- Université Catholique d'Afrique Centrale (UCAC), Yaoundé, Cameroon
- Centre Pasteur du Cameroun (CPC), Yaoundé, Cameroon
- Caroline Bénech
- Univ Brest, Inserm, EFS, UBO, UMR1078, GGB, Brest, France
- Laboratory of Excellence GR-Ex, Paris, France
- Françoise Ngo Sack
- Université Catholique d'Afrique Centrale (UCAC), Yaoundé, Cameroon
- Banque de sang, Hôpital Central de Yaoundé, Yaoundé, Cameroon
- Service Hémato-oncologie, Hôpital Central de Yaoundé, Yaoundé, Cameroon
- Estelle Wete
- Centre Mère et Enfant, Fondation Chantal Biya, Yaoundé, Cameroon
- Alexandre Njan Nloga
- Université Catholique d'Afrique Centrale (UCAC), Yaoundé, Cameroon
- Faculté des Sciences, Université de Ngaoundéré, Ngaoundéré, Cameroon
- Yann Fichou
- Univ Brest, Inserm, EFS, UBO, UMR1078, GGB, Brest, France
- Laboratory of Excellence GR-Ex, Paris, France
20
Negi S, Stenton SL, Berger SI, McNulty B, Violich I, Gardner J, Hillaker T, O'Rourke SM, O'Leary MC, Carbonell E, Austin-Tse C, Lemire G, Serrano J, Mangilog B, VanNoy G, Kolmogorov M, Vilain E, O'Donnell-Luria A, Délot E, Miga KH, Monlong J, Paten B. Advancing long-read nanopore genome assembly and accurate variant calling for rare disease detection. MEDRXIV : THE PREPRINT SERVER FOR HEALTH SCIENCES 2024:2024.08.22.24312327. [PMID: 39228712 PMCID: PMC11370519 DOI: 10.1101/2024.08.22.24312327] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 09/05/2024]
Abstract
More than 50% of families with suspected rare monogenic diseases remain unsolved after whole genome analysis by short read sequencing (SRS). Long-read sequencing (LRS) could help bridge this diagnostic gap by capturing variants inaccessible to SRS, facilitating long-range mapping and phasing, and providing haplotype-resolved methylation profiling. To evaluate LRS's additional diagnostic yield, we sequenced a rare disease cohort of 98 samples, including 41 probands and some family members, using nanopore sequencing, achieving per sample ∼36x average coverage and 32 kilobase (kb) read N50 from a single flow cell. Our Napu pipeline generated assemblies, phased variants, and methylation calls. LRS covered, on average, coding exons in ∼280 genes and ∼5 known Mendelian disease genes that were not covered by SRS. In comparison to SRS, LRS detected additional rare, functionally annotated variants, including SVs and tandem repeats, and completely phased 87% of protein-coding genes. LRS detected additional de novo variants, and could be used to distinguish postzygotic mosaic variants from prezygotic de novo variants. Eleven probands were solved, with diverse underlying genetic causes including de novo and compound heterozygous variants, large-scale SVs, and epigenetic modifications. Our study demonstrates LRS's potential to enhance diagnostic yield for rare monogenic diseases, implying utility in future clinical genomics workflows.
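The abstract above reports a read N50 of 32 kb. As a reminder of what that statistic measures, here is a minimal sketch in Python. The function name `read_n50` and the toy read lengths are illustrative assumptions, not part of the Napu pipeline or of the study's data. N50 is taken here as the length L such that reads of length L or longer contain at least half of all sequenced bases.

```python
def read_n50(lengths):
    """Return the N50 of a collection of read lengths (in bases).

    Hypothetical helper for illustration only: sort reads from longest to
    shortest and return the length of the read at which the running total
    first reaches half of all sequenced bases.
    """
    if not lengths:
        raise ValueError("need at least one read length")
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length


# Toy read lengths (made up for the example, not data from the study).
toy_lengths = [2_000, 5_000, 12_000, 32_000, 40_000, 8_000]
print(read_n50(toy_lengths))  # 32000
```

In this toy set the two longest reads (40 kb and 32 kb) already hold more than half of the 99 kb total, so the N50 is 32 kb even though most reads are much shorter.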
21
Lai S, Wang H, Bork P, Chen WH, Zhao XM. Long-read sequencing reveals extensive gut phageome structural variations driven by genetic exchange with bacterial hosts. SCIENCE ADVANCES 2024; 10:eadn3316. [PMID: 39141729 PMCID: PMC11323893 DOI: 10.1126/sciadv.adn3316] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/05/2023] [Accepted: 07/10/2024] [Indexed: 08/16/2024]
Abstract
Genetic variations are instrumental for unraveling phage evolution and deciphering their functional implications. Here, we explore the underlying fine-scale genetic variations in the gut phageome, especially structural variations (SVs). By using virome-enriched long-read metagenomic sequencing across 91 individuals, we identified a total of 14,438 nonredundant phage SVs and revealed their prevalence within the human gut phageome. These SVs are mainly enriched in genes involved in recombination, DNA methylation, and antibiotic resistance. Notably, a substantial fraction of phage SV sequences share close homology with bacterial fragments, with most SVs enriched for horizontal gene transfer (HGT) mechanisms. Further investigations showed that these SV sequences were genetically exchanged between specific phage-bacteria pairs, particularly between phages and their respective bacterial hosts. Temperate phages exhibit a higher frequency of genetic exchange with bacterial chromosomes than virulent phages. Collectively, our findings provide insights into the genetic landscape of the human gut phageome.
Affiliation(s)
- Senying Lai
- Department of Neurology, Zhongshan Hospital and Institute of Science and Technology for Brain-Inspired Intelligence, Fudan University, Shanghai, China
- State Key Laboratory of Medical Neurobiology, Institutes of Brain Science, Fudan University, Shanghai, China
- MOE Key Laboratory of Computational Neuroscience and Brain-Inspired Intelligence, and MOE Frontiers Center for Brain Science, Fudan University, Shanghai, China
- Huarui Wang
- Key Laboratory of Molecular Biophysics of the Ministry of Education, Hubei Key Laboratory of Bioinformatics and Molecular Imaging, Center for Artificial Intelligence Biology, Department of Bioinformatics and Systems Biology, College of Life Science and Technology, Huazhong University of Science and Technology, Wuhan, Hubei, China
- Peer Bork
- European Molecular Biology Laboratory, Structural and Computational Biology Unit, Heidelberg, Germany
- Max Delbrück Centre for Molecular Medicine, Berlin, Germany
- Department of Bioinformatics, Biocenter, University of Würzburg, Würzburg, Germany
- Wei-Hua Chen
- State Key Laboratory of Medical Neurobiology, Institutes of Brain Science, Fudan University, Shanghai, China
- College of Life Science, Henan Normal University, Xinxiang, Henan, China
- Xing-Ming Zhao
- Department of Neurology, Zhongshan Hospital and Institute of Science and Technology for Brain-Inspired Intelligence, Fudan University, Shanghai, China
- State Key Laboratory of Medical Neurobiology, Institutes of Brain Science, Fudan University, Shanghai, China
- MOE Key Laboratory of Computational Neuroscience and Brain-Inspired Intelligence, and MOE Frontiers Center for Brain Science, Fudan University, Shanghai, China
22
|
Nanamiya H, Tanaka D, Hiyama G, Isogai T, Watanabe S. Detection of four isomers of the human cytomegalovirus genome using nanopore long-read sequencing. Virus Genes 2024; 60:377-384. [PMID: 38861195 DOI: 10.1007/s11262-024-02083-6] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/09/2024] [Accepted: 06/01/2024] [Indexed: 06/12/2024]
Abstract
Human cytomegalovirus has a linear DNA genome with a total length of approximately 235 kb. This large genome is divided into two domains, "Long" and "Short". There are four isomers of the cytomegalovirus genome with different orientations of each domain. To confirm the presence of four types of isomers, it is necessary to identify the sequence of the junction between the domains. However, due to the presence of repeat sequences, it is difficult to determine the junction sequences by next-generation sequencing analysis. To solve this problem, long-read sequencing was performed using the Oxford Nanopore sequencer and the junctions were successfully identified in four isomers in strains Merlin and ATCC-2011-3. Nanopore sequencing also revealed the presence of multiple copies of the "a" sequence (a-seq) in the junctions, indicating the diversity of the junction sequences. These results strongly suggest that long-read sequencing using the nanopore sequencer would be beneficial for identifying the complex structure of the cytomegalovirus genome.
Affiliation(s)
- Hideaki Nanamiya
- Fukushima Translational Research Foundation, Capital Front Bldg., 7-4, 1-35, Sakae-Machi, Fukushima, 960-8031, Japan.
- Translational Research Center, Fukushima Medical University, 1, Hikarigaoka, Fukushima, 960-1295, Japan.
- Daisuke Tanaka
- Translational Research Center, Fukushima Medical University, 1, Hikarigaoka, Fukushima, 960-1295, Japan
- Gen Hiyama
- Translational Research Center, Fukushima Medical University, 1, Hikarigaoka, Fukushima, 960-1295, Japan
- Takao Isogai
- Translational Research Center, Fukushima Medical University, 1, Hikarigaoka, Fukushima, 960-1295, Japan
- Shinya Watanabe
- Translational Research Center, Fukushima Medical University, 1, Hikarigaoka, Fukushima, 960-1295, Japan
23
|
Alvarez Jerez P, Daida K, Grenn FP, Malik L, Miano-Burkhardt A, Makarious MB, Ding J, Gibbs JR, Moore A, Reed X, Nalls MA, Shah S, Mahmoud M, Sedlazeck FJ, Dolzhenko E, Park M, Iwaki H, Casey B, Ryten M, Blauwendraat C, Singleton AB, Billingsley KJ. Characterizing a complex CT-rich haplotype in intron 4 of SNCA using large-scale targeted amplicon long-read sequencing. NPJ Parkinsons Dis 2024; 10:136. [PMID: 39060285 PMCID: PMC11282088 DOI: 10.1038/s41531-024-00749-4] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/29/2024] [Accepted: 07/04/2024] [Indexed: 07/28/2024] Open
Abstract
Parkinson's disease (PD) is a common neurodegenerative disorder with a significant risk proportion driven by genetics. While much progress has been made, most of the heritability remains unknown. This is in part because previous genetic studies have focused on the contribution of single nucleotide variants. More complex forms of variation, such as structural variants and tandem repeats, are already associated with several synucleinopathies. However, because more sophisticated sequencing methods are usually required to detect these regions, little is understood regarding their contribution to PD. One example is a polymorphic CT-rich region in intron 4 of the SNCA gene. This haplotype has been suggested to be associated with risk of Lewy Body (LB) pathology in Alzheimer's Disease and SNCA gene expression, but is yet to be investigated in PD. Here, we attempt to resolve this CT-rich haplotype and investigate its role in PD. We performed targeted PacBio HiFi sequencing of the region in 1375 PD cases and 959 controls. We replicate the previously reported associations and a novel association between two PD risk SNVs (rs356182 and rs5019538) and haplotype 4, the largest haplotype. Through quantitative trait locus analyses we identify a significant haplotype 4 association with alternative CAGE transcriptional start site usage, not leading to significant differential SNCA gene expression in post-mortem frontal cortex brain tissue. Therefore, disease association in this locus might not be biologically driven by this CT-rich repeat region. Our data demonstrate the complexity of this SNCA region and highlight that further follow-up functional studies are warranted.
Affiliation(s)
- Pilar Alvarez Jerez
- Laboratory of Neurogenetics, National Institute on Aging, Bethesda, MD, USA
- Center for Alzheimer's and Related Dementias, National Institute on Aging, Bethesda, MD, USA
- Department of Neurodegenerative Disease, UCL Queen Square Institute of Neurology, University College London, London, UK
- Kensuke Daida
- Laboratory of Neurogenetics, National Institute on Aging, Bethesda, MD, USA
- Center for Alzheimer's and Related Dementias, National Institute on Aging, Bethesda, MD, USA
- Francis P Grenn
- Laboratory of Neurogenetics, National Institute on Aging, Bethesda, MD, USA
- Laksh Malik
- Center for Alzheimer's and Related Dementias, National Institute on Aging, Bethesda, MD, USA
- Abigail Miano-Burkhardt
- Laboratory of Neurogenetics, National Institute on Aging, Bethesda, MD, USA
- Center for Alzheimer's and Related Dementias, National Institute on Aging, Bethesda, MD, USA
- Mary B Makarious
- Laboratory of Neurogenetics, National Institute on Aging, Bethesda, MD, USA
- Department of Neurodegenerative Disease, UCL Queen Square Institute of Neurology, University College London, London, UK
- Jinhui Ding
- Laboratory of Neurogenetics, National Institute on Aging, Bethesda, MD, USA
- J Raphael Gibbs
- Laboratory of Neurogenetics, National Institute on Aging, Bethesda, MD, USA
- Anni Moore
- Laboratory of Neurogenetics, National Institute on Aging, Bethesda, MD, USA
- Xylena Reed
- Center for Alzheimer's and Related Dementias, National Institute on Aging, Bethesda, MD, USA
- Mike A Nalls
- Laboratory of Neurogenetics, National Institute on Aging, Bethesda, MD, USA
- Center for Alzheimer's and Related Dementias, National Institute on Aging, Bethesda, MD, USA
- DataTecnica LLC, Washington, DC, USA
- Syed Shah
- DataTecnica LLC, Washington, DC, USA
- Medhat Mahmoud
- Human Genome Sequencing Center, Baylor College of Medicine, One Baylor Plaza, Houston, TX, USA
- Fritz J Sedlazeck
- Human Genome Sequencing Center, Baylor College of Medicine, One Baylor Plaza, Houston, TX, USA
- Department of Computer Science, Rice University, Houston, TX, USA
- Morgan Park
- NIH Intramural Sequencing Center, National Human Genome Research Institute, National Institutes of Health, Bethesda, MD, USA
- Hirotaka Iwaki
- Laboratory of Neurogenetics, National Institute on Aging, Bethesda, MD, USA
- Center for Alzheimer's and Related Dementias, National Institute on Aging, Bethesda, MD, USA
- DataTecnica LLC, Washington, DC, USA
- Bradford Casey
- The Michael J. Fox Foundation for Parkinson's Research, New York, New York, USA
- Mina Ryten
- Genetics and Genomic Medicine, Great Ormond Street Institute of Child Health, University College London, London, UK
- UK Dementia Research Institute at the University of Cambridge and Department of Clinical Neurosciences, University of Cambridge, Cambridge Biomedical Campus, Cambridge, UK
- Aligning Science Across Parkinson's (ASAP) Collaborative Research Network, Chevy Chase, MD, USA
- Cornelis Blauwendraat
- Laboratory of Neurogenetics, National Institute on Aging, Bethesda, MD, USA
- Center for Alzheimer's and Related Dementias, National Institute on Aging, Bethesda, MD, USA
- Andrew B Singleton
- Laboratory of Neurogenetics, National Institute on Aging, Bethesda, MD, USA
- Center for Alzheimer's and Related Dementias, National Institute on Aging, Bethesda, MD, USA
- Kimberley J Billingsley
- Laboratory of Neurogenetics, National Institute on Aging, Bethesda, MD, USA.
- Center for Alzheimer's and Related Dementias, National Institute on Aging, Bethesda, MD, USA.
24
|
Junjun R, Zhengqian Z, Ying W, Jialiang W, Yongzhuang L. A comprehensive review of deep learning-based variant calling methods. Brief Funct Genomics 2024; 23:303-313. [PMID: 38366908 DOI: 10.1093/bfgp/elae003] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/01/2023] [Revised: 01/14/2024] [Accepted: 01/18/2023] [Indexed: 02/18/2024] Open
Abstract
Genome sequencing data have become increasingly important in the field of personalized medicine and diagnosis. However, accurately detecting genomic variations remains a challenging task. Traditional variation detection methods rely on manual inspection or predefined rules, which can be time-consuming and prone to errors. Consequently, deep learning-based approaches for variation detection have gained attention due to their ability to automatically learn genomic features that distinguish between variants. In our review, we discuss the recent advancements in deep learning-based algorithms for detecting small variations and structural variations in genomic data, as well as their advantages and limitations.
Affiliation(s)
- Ren Junjun
- Harbin Institute of Technology, School of Computer Science and Technology, Harbin 150001, China
- Zhang Zhengqian
- Harbin Institute of Technology, School of Computer Science and Technology, Harbin 150001, China
- Wu Ying
- Harbin Institute of Technology, School of Computer Science and Technology, Harbin 150001, China
- Wang Jialiang
- Harbin Institute of Technology, School of Computer Science and Technology, Harbin 150001, China
- Liu Yongzhuang
- Harbin Institute of Technology, School of Computer Science and Technology, Harbin 150001, China
25
|
Choi W, Cha S, Kim K. Navigating the CRISPR/Cas Landscape for Enhanced Diagnosis and Treatment of Wilson's Disease. Cells 2024; 13:1214. [PMID: 39056796 PMCID: PMC11274827 DOI: 10.3390/cells13141214] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/27/2024] [Revised: 07/15/2024] [Accepted: 07/17/2024] [Indexed: 07/28/2024] Open
Abstract
The clustered regularly interspaced short palindromic repeats (CRISPR)/CRISPR-associated protein (Cas) system continues to evolve, thereby enabling more precise detection and repair of mutagenesis. The development of CRISPR/Cas-based diagnosis holds promise for high-throughput, cost-effective, and portable nucleic acid screening and genetic disease diagnosis. In addition, advancements in transportation strategies such as adeno-associated virus (AAV), lentiviral vectors, nanoparticles, and virus-like particles (VLPs) offer synergistic insights for gene therapeutics in vivo. Wilson's disease (WD), a copper metabolism disorder, is primarily caused by mutations in the ATPase copper transporting beta (ATP7B) gene. The condition is associated with the accumulation of copper in the body, leading to irreversible damage to various organs, including the liver, nervous system, kidneys, and eyes. However, the heterogeneous nature and individualized presentation of physical and neurological symptoms in WD patients pose significant challenges to accurate diagnosis. Furthermore, patients must consume copper-chelating medication throughout their lifetime. Herein, we provide a detailed description of WD and review the application of novel CRISPR-based strategies for its diagnosis and treatment, along with the challenges that need to be overcome.
Affiliation(s)
- Woong Choi
- Department of Physiology, Korea University College of Medicine, Seoul 02841, Republic of Korea
- Seongkwang Cha
- Department of Physiology, Korea University College of Medicine, Seoul 02841, Republic of Korea
- Neuroscience Research Institute, Korea University College of Medicine, Seoul 02841, Republic of Korea
- Kyoungmi Kim
- Department of Physiology, Korea University College of Medicine, Seoul 02841, Republic of Korea
- Department of Biomedical Sciences, Korea University College of Medicine, Seoul 02841, Republic of Korea
26
|
Liu Z, Xie Z, Li M. Comprehensive and deep evaluation of structural variation detection pipelines with third-generation sequencing data. Genome Biol 2024; 25:188. [PMID: 39010145 PMCID: PMC11247875 DOI: 10.1186/s13059-024-03324-5] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/27/2023] [Accepted: 06/26/2024] [Indexed: 07/17/2024] Open
Abstract
BACKGROUND Structural variation (SV) detection methods using third-generation sequencing data are widely employed, yet accurately detecting SVs remains challenging. Different methods often yield inconsistent results for certain SV types, complicating tool selection and revealing biases in detection.
RESULTS This study comprehensively evaluates 53 SV detection pipelines using simulated and real data from PacBio (CLR: Continuous Long Read, CCS: Circular Consensus Sequencing) and Nanopore (ONT) platforms. We assess their performance in detecting various sizes and types of SVs, breakpoint biases, and genotyping accuracy with various sequencing depths. Notably, pipelines such as Minimap2-cuteSV2, NGMLR-SVIM, PBMM2-pbsv, Winnowmap-Sniffles2, and Winnowmap-SVision exhibit comparatively higher recall and precision. Our findings also show that combining multiple pipelines with the same aligner, like pbmm2 or winnowmap, can significantly enhance performance. The individual pipelines' detailed ranking and performance metrics can be viewed in a dynamic table: http://pmglab.top/SVPipelinesRanking.
CONCLUSIONS This study comprehensively characterizes the strengths and weaknesses of numerous pipelines, providing valuable insights that can improve SV detection in third-generation sequencing data and inform SV annotation and function prediction.
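The abstract above ranks pipelines by recall and precision. As a brief illustration of how these two metrics (and their harmonic mean, F1) follow from benchmark counts, here is a minimal Python sketch. The function name and the counts are hypothetical and are not taken from the study, which may define matches between called and true SVs in its own way.

```python
def precision_recall_f1(true_pos, false_pos, false_neg):
    """Compute precision, recall and F1 from benchmark counts.

    Illustrative helper (not from the study):
      true_pos  = called SVs that match a truth-set SV
      false_pos = called SVs with no match in the truth set
      false_neg = truth-set SVs that no call matched
    """
    called = true_pos + false_pos
    expected = true_pos + false_neg
    precision = true_pos / called if called else 0.0
    recall = true_pos / expected if expected else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


# Hypothetical counts for one pipeline on one benchmark sample.
precision, recall, f1 = precision_recall_f1(true_pos=90, false_pos=10, false_neg=30)
print(f"precision={precision:.3f} recall={recall:.3f} F1={f1:.3f}")
```

With these made-up counts, 90 of 100 calls are correct (precision 0.9) and 90 of 120 true SVs are found (recall 0.75).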
Affiliation(s)
- Zhi Liu
- Program in Bioinformatics, Zhongshan School of Medicine, The Fifth Affiliated Hospital, Sun Yat-Sen University, Guangzhou, China
- Key Laboratory of Tropical Disease Control (Sun Yat-Sen University), Ministry of Education, Guangzhou, China
- Zhi Xie
- State Key Laboratory of Ophthalmology, Zhongshan Ophthalmic Center, Sun Yat-Sen University, Guangzhou, China
- Miaoxin Li
- Program in Bioinformatics, Zhongshan School of Medicine, The Fifth Affiliated Hospital, Sun Yat-Sen University, Guangzhou, China.
- Key Laboratory of Tropical Disease Control (Sun Yat-Sen University), Ministry of Education, Guangzhou, China.
- Center for Precision Medicine, Sun Yat-Sen University, Guangzhou, China.
- Department of Psychiatry, The University of Hong Kong, Hong Kong, SAR, China.
- Guangdong Provincial Key Laboratory of Biomedical Imaging and Guangdong Provincial Engineering Research Center of Molecular Imaging, The Fifth Affiliated Hospital, Sun Yat-Sen University, Zhuhai, China.
27
|
Barcia-Cruz R, Balboa S, Lema A, Romalde JL. Comparative genomics of Vibrio toranzoniae strains. Int Microbiol 2024:10.1007/s10123-024-00557-z. [PMID: 38995500 DOI: 10.1007/s10123-024-00557-z] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/02/2024] [Revised: 06/25/2024] [Accepted: 07/02/2024] [Indexed: 07/13/2024]
Abstract
Vibrio toranzoniae is a marine bacterium belonging to the Splendidus clade that was originally isolated from healthy clams in Galicia (NW Spain). Its isolation from different hosts and seawater indicated two lifestyles and wide geographical distribution. The aim of the present study was to determine the differences at the genomic level among six strains (4 isolated from clam and 2 from seawater) and to determine their phylogeny. For this purpose, whole genomes of the six strains were sequenced by different technologies including Illumina and PacBio, and the resulting sequences were corrected. Genomes were annotated and compared using different online tools. Furthermore, the core- and pan-genomes were examined, and the phylogeny was inferred. The content of the core genome ranged from 2953 to 2766 genes and that of the pangenome ranged from 6278 to 6132, depending on the tool used. Although the strains shared certain homology, with DDH values ranging from 77.10 to 82.30 and OrthoANI values higher than 97%, some differences were found related to motility, capsule synthesis, iron acquisition systems or mobile genetic elements. Phylogenetic analysis of the core genome did not reveal a differentiation of the strains according to their lifestyle (commensal or free-living), but that of the pangenome indicated certain geographical isolation in the same growing area. This study led to the reclassification of some isolates formerly described as V. toranzoniae and demonstrated the importance of curated deposited sequences for proper phylogenetic assignment.
Affiliation(s)
- Rubén Barcia-Cruz
- Departamento de Microbiología y Parasitología, CIBUS-Facultad de Biología, Universidade de Santiago de Compostela, Campus Vida S/N, 15782, Santiago de Compostela, Spain
- French Agency for Food, Environmental and Occupational Health and Safety (Anses), 94701, Maisons-Alfort Cedex, France
- Sabela Balboa
- Departamento de Microbiología y Parasitología, CIBUS-Facultad de Biología, Universidade de Santiago de Compostela, Campus Vida S/N, 15782, Santiago de Compostela, Spain
- Centro de Investigación Interdisciplinar en Tecnología Ambientales (CRETUS), Universidade de Santiago de Compostela, 15782, Santiago de Compostela, Spain
- Alberto Lema
- Departamento de Microbiología y Parasitología, CIBUS-Facultad de Biología, Universidade de Santiago de Compostela, Campus Vida S/N, 15782, Santiago de Compostela, Spain
- AllGenetics & Biology SL, Oleiros, 15172, Perillo, A Coruña, Spain
- Jesús L Romalde
- Departamento de Microbiología y Parasitología, CIBUS-Facultad de Biología, Universidade de Santiago de Compostela, Campus Vida S/N, 15782, Santiago de Compostela, Spain.
- Centro de Investigación Interdisciplinar en Tecnología Ambientales (CRETUS), Universidade de Santiago de Compostela, 15782, Santiago de Compostela, Spain.
28
|
Bai X, Chen Z, Chen K, Wu Z, Wang R, Liu J, Chang L, Wen L, Tang F. Simultaneous de novo calling and phasing of genetic variants at chromosome-scale using NanoStrand-seq. Cell Discov 2024; 10:74. [PMID: 38977679 PMCID: PMC11231365 DOI: 10.1038/s41421-024-00694-9] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/09/2023] [Accepted: 05/23/2024] [Indexed: 07/10/2024] Open
Abstract
The successful accomplishment of the first telomere-to-telomere human genome assembly, T2T-CHM13, marked a milestone in achieving completeness of the human reference genome. The upcoming era of genome study will focus on fully phased diploid genome assembly, with an emphasis on genetic differences between individual haplotypes. Most existing sequencing approaches only achieved localized haplotype phasing and relied on additional pedigree information for further whole-chromosome scale phasing. The short-read-based Strand-seq method is able to directly phase single nucleotide polymorphisms (SNPs) at whole-chromosome scale but falls short when it comes to phasing structural variations (SVs). To shed light on this issue, we developed a Nanopore sequencing platform-based Strand-seq approach, which we named NanoStrand-seq. This method allowed for de novo SNP calling with high precision (99.52%) and achieved a superior phasing accuracy (0.02% Hamming error rate) at whole-chromosome scale, a level of performance comparable to Strand-seq for haplotype phasing of the GM12878 genome. Importantly, we demonstrated that NanoStrand-seq can efficiently resolve the MHC locus, a highly polymorphic genomic region. Moreover, NanoStrand-seq enabled independent direct calling and phasing of deletions and insertions at whole-chromosome level; when applied to long genomic regions of SNP homozygosity, it outperformed the strategy that combined Strand-seq with bulk long-read sequencing. Finally, we showed that, like Strand-seq, NanoStrand-seq was also applicable to primary cultured cells. Together, here we provided a novel methodology that enabled interrogation of a full spectrum of haplotype-resolved SNPs and SVs at whole-chromosome scale, with broad applications for species with diploid or even potentially polyploid genomes.
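The abstract above quotes phasing accuracy as a Hamming error rate. The sketch below shows one common way such a rate can be computed for a single chromosome; it is an assumption for illustration, since the abstract does not spell out the exact definition used. Each heterozygous site is encoded as 0 or 1 according to which allele sits on the first haplotype, and because haplotype labels are arbitrary, the prediction is compared with the truth in both orientations and the smaller mismatch count is kept.

```python
def hamming_error_rate(predicted, truth):
    """Fraction of heterozygous sites phased inconsistently with the truth.

    Illustrative definition (an assumption, not necessarily the paper's):
    both arguments are equal-length sequences of 0/1 flags, one per
    heterozygous site on a chromosome. Swapping the two haplotype labels
    flips every flag, so the error is the minimum over both orientations.
    """
    if len(predicted) != len(truth):
        raise ValueError("haplotype vectors must have the same length")
    if not truth:
        raise ValueError("need at least one heterozygous site")
    mismatches = sum(p != t for p, t in zip(predicted, truth))
    return min(mismatches, len(truth) - mismatches) / len(truth)


# Toy example: ten sites, one of them assigned to the wrong haplotype.
truth = [0, 1, 0, 1, 1, 0, 0, 1, 1, 1]
predicted = [0, 1, 0, 1, 1, 0, 0, 1, 1, 0]
print(hamming_error_rate(predicted, truth))  # 0.1
```

Under this definition, a rate of 0.02% would correspond to about 2 inconsistently phased sites per 10,000 heterozygous sites.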
Affiliation(s)
- Xiuzhen Bai
- Biomedical Pioneering Innovation Center (BIOPIC), Peking University, Beijing, China
- Beijing Advanced Innovation Center for Genomics (ICG), Ministry of Education Key Laboratory of Cell Proliferation and Differentiation, Beijing, China
- Changping Laboratory, Beijing, China
- Zonggui Chen
- Biomedical Pioneering Innovation Center (BIOPIC), Peking University, Beijing, China
- Changping Laboratory, Beijing, China
- Peking-Tsinghua Center for Life Sciences, Academy for Advanced Interdisciplinary Studies, Peking University, Beijing, China
- Kexuan Chen
- Biomedical Pioneering Innovation Center (BIOPIC), Peking University, Beijing, China
- School of Life Sciences, Peking University, Beijing, China
- Zixin Wu
- Biomedical Pioneering Innovation Center (BIOPIC), Peking University, Beijing, China
- Peking-Tsinghua Center for Life Sciences, Academy for Advanced Interdisciplinary Studies, Peking University, Beijing, China
- Rui Wang
- Department of Medicine, Cancer Institute, Stanford University, Stanford, CA, USA
- Jun'e Liu
- Biomedical Pioneering Innovation Center (BIOPIC), Peking University, Beijing, China
- Beijing Advanced Innovation Center for Genomics (ICG), Ministry of Education Key Laboratory of Cell Proliferation and Differentiation, Beijing, China
- Changping Laboratory, Beijing, China
- School of Life Sciences, Peking University, Beijing, China
- Liang Chang
- State Key Laboratory of Female Fertility Promotion, Center for Reproductive Medicine, Department of Obstetrics and Gynecology, Peking University Third Hospital, Beijing, China
- National Clinical Research Center for Obstetrics and Gynecology (Peking University Third Hospital), Beijing, China
- Key Laboratory of Assisted Reproduction (Peking University), Ministry of Education Beijing, Beijing, China
- Key Laboratory of Reproductive Endocrinology and Assisted Reproductive Technology, Beijing, China
- Lu Wen
- Biomedical Pioneering Innovation Center (BIOPIC), Peking University, Beijing, China
- Beijing Advanced Innovation Center for Genomics (ICG), Ministry of Education Key Laboratory of Cell Proliferation and Differentiation, Beijing, China
- Changping Laboratory, Beijing, China
- Fuchou Tang
- Biomedical Pioneering Innovation Center (BIOPIC), Peking University, Beijing, China.
- Beijing Advanced Innovation Center for Genomics (ICG), Ministry of Education Key Laboratory of Cell Proliferation and Differentiation, Beijing, China.
- Changping Laboratory, Beijing, China.
- Peking-Tsinghua Center for Life Sciences, Academy for Advanced Interdisciplinary Studies, Peking University, Beijing, China.
- School of Life Sciences, Peking University, Beijing, China.
29
|
Kramer M, Goodwin S, Wappel R, Borio M, Offit K, Feldman DR, Stadler ZK, McCombie WR. Exploring the genetic and epigenetic underpinnings of early-onset cancers: Variant prioritization for long read whole genome sequencing from family cancer pedigrees. BIORXIV : THE PREPRINT SERVER FOR BIOLOGY 2024:2024.06.27.601096. [PMID: 39005350 PMCID: PMC11244929 DOI: 10.1101/2024.06.27.601096] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 07/16/2024]
Abstract
Despite significant advances in our understanding of genetic cancer susceptibility, known inherited cancer predisposition syndromes explain at most 20% of early-onset cancers. As early-onset cancer prevalence continues to increase, the need to assess previously inaccessible areas of the human genome, harnessing a trio or quad family-based architecture for variant filtration, may reveal further insights into cancer susceptibility. To assess a broader spectrum of variation than can be ascertained by multi-gene panel sequencing, or even whole genome sequencing with short reads, we employed long read whole genome sequencing using an Oxford Nanopore Technology (ONT) PromethION of 3 families containing an early-onset cancer proband using a trio or quad family architecture. Analysis included 2 early-onset colorectal cancer family trios and one quad consisting of two siblings with testicular cancer, all with unaffected parents. Structural variants (SVs), epigenetic profiles and single nucleotide variants (SNVs) were determined for each individual, and a filtering strategy was employed to refine and prioritize candidate variants based on the family architecture. The family architecture enabled us to focus on inapposite variants while filtering variants shared with the unaffected parents, significantly decreasing background variation that can hamper identification of potentially disease-causing differences. Candidate de novo and compound heterozygous variants were identified in this way. Gene expression, in matched neoplastic and pre-neoplastic lesions, was assessed for one trio. Our study demonstrates the feasibility of a streamlined analysis of genomic variants from long read ONT whole genome sequencing and a way to prioritize key variants for further evaluation of pathogenicity, while revealing what may be missing from panel-based analyses.
|
30
|
Niu J, Wang W, Wang Z, Chen Z, Zhang X, Qin Z, Miao L, Yang Z, Xie C, Xin M, Peng H, Yao Y, Liu J, Ni Z, Sun Q, Guo W. Tagging large CNV blocks in wheat boosts digitalization of germplasm resources by ultra-low-coverage sequencing. Genome Biol 2024; 25:171. [PMID: 38951917 PMCID: PMC11218387 DOI: 10.1186/s13059-024-03315-6] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/31/2024] [Accepted: 06/18/2024] [Indexed: 07/03/2024] Open
Abstract
BACKGROUND The massive structural variations and frequent introgression highly contribute to the genetic diversity of wheat, while the huge and complex genome of polyploid wheat hinders efficient genotyping of abundant varieties towards accurate identification, management, and exploitation of germplasm resources. RESULTS We develop a novel workflow that identifies 1240 high-quality large copy number variation blocks (CNVb) in wheat at the pan-genome level, demonstrating that CNVb can serve as an ideal DNA fingerprinting marker for discriminating massive varieties, with the accuracy validated by PCR assay. We then construct a digitalized genotyping CNVb map across 1599 global wheat accessions. Key CNVb markers are linked with trait-associated introgressions, such as the 1RS·1BL translocation and 2NvS translocation, and the beneficial alleles, such as the end-use quality allele Glu-D1d (Dx5 + Dy10) and the semi-dwarf r-e-z allele. Furthermore, we demonstrate that these tagged CNVb markers promote a stable and cost-effective strategy for evaluating wheat germplasm resources with ultra-low-coverage sequencing data, competing with SNP array for applications such as evaluating new varieties, efficient management of collections in gene banks, and describing wheat germplasm resources in a digitalized manner. We also develop a user-friendly interactive platform, WheatCNVb ( http://wheat.cau.edu.cn/WheatCNVb/ ), for exploring the CNVb profiles over ever-increasing wheat accessions, and also propose a QR-code-like representation of individual digital CNVb fingerprint. This platform also allows uploading new CNVb profiles for comparison with stored varieties. CONCLUSIONS The CNVb-based approach provides a low-cost and high-throughput genotyping strategy for enabling digitalized wheat germplasm management and modern breeding with precise and practical decision-making.
Affiliation(s)
- Jianxia Niu
- Frontiers Science Center for Molecular Design Breeding, Key Laboratory of Crop Heterosis and Utilization, Beijing Key Laboratory of Crop Genetic Improvement, China Agricultural University, Beijing, 100193, China
- Sanya Institute of China Agricultural University, Sanya, 572025, China
- Wenxi Wang
- Frontiers Science Center for Molecular Design Breeding, Key Laboratory of Crop Heterosis and Utilization, Beijing Key Laboratory of Crop Genetic Improvement, China Agricultural University, Beijing, 100193, China
- Zihao Wang
- Frontiers Science Center for Molecular Design Breeding, Key Laboratory of Crop Heterosis and Utilization, Beijing Key Laboratory of Crop Genetic Improvement, China Agricultural University, Beijing, 100193, China
- Sanya Institute of China Agricultural University, Sanya, 572025, China
- Zhe Chen
- Frontiers Science Center for Molecular Design Breeding, Key Laboratory of Crop Heterosis and Utilization, Beijing Key Laboratory of Crop Genetic Improvement, China Agricultural University, Beijing, 100193, China
- Xiaoyu Zhang
- Frontiers Science Center for Molecular Design Breeding, Key Laboratory of Crop Heterosis and Utilization, Beijing Key Laboratory of Crop Genetic Improvement, China Agricultural University, Beijing, 100193, China
- Zhen Qin
- Frontiers Science Center for Molecular Design Breeding, Key Laboratory of Crop Heterosis and Utilization, Beijing Key Laboratory of Crop Genetic Improvement, China Agricultural University, Beijing, 100193, China
- Lingfeng Miao
- Frontiers Science Center for Molecular Design Breeding, Key Laboratory of Crop Heterosis and Utilization, Beijing Key Laboratory of Crop Genetic Improvement, China Agricultural University, Beijing, 100193, China
- Zhengzhao Yang
- Frontiers Science Center for Molecular Design Breeding, Key Laboratory of Crop Heterosis and Utilization, Beijing Key Laboratory of Crop Genetic Improvement, China Agricultural University, Beijing, 100193, China
- Chaojie Xie
- Frontiers Science Center for Molecular Design Breeding, Key Laboratory of Crop Heterosis and Utilization, Beijing Key Laboratory of Crop Genetic Improvement, China Agricultural University, Beijing, 100193, China
- Mingming Xin
- Frontiers Science Center for Molecular Design Breeding, Key Laboratory of Crop Heterosis and Utilization, Beijing Key Laboratory of Crop Genetic Improvement, China Agricultural University, Beijing, 100193, China
- Huiru Peng
- Frontiers Science Center for Molecular Design Breeding, Key Laboratory of Crop Heterosis and Utilization, Beijing Key Laboratory of Crop Genetic Improvement, China Agricultural University, Beijing, 100193, China
- Yingyin Yao
- Frontiers Science Center for Molecular Design Breeding, Key Laboratory of Crop Heterosis and Utilization, Beijing Key Laboratory of Crop Genetic Improvement, China Agricultural University, Beijing, 100193, China
- Jie Liu
- Frontiers Science Center for Molecular Design Breeding, Key Laboratory of Crop Heterosis and Utilization, Beijing Key Laboratory of Crop Genetic Improvement, China Agricultural University, Beijing, 100193, China
- Zhongfu Ni
- Frontiers Science Center for Molecular Design Breeding, Key Laboratory of Crop Heterosis and Utilization, Beijing Key Laboratory of Crop Genetic Improvement, China Agricultural University, Beijing, 100193, China
- Qixin Sun
- Frontiers Science Center for Molecular Design Breeding, Key Laboratory of Crop Heterosis and Utilization, Beijing Key Laboratory of Crop Genetic Improvement, China Agricultural University, Beijing, 100193, China.
- Weilong Guo
- Frontiers Science Center for Molecular Design Breeding, Key Laboratory of Crop Heterosis and Utilization, Beijing Key Laboratory of Crop Genetic Improvement, China Agricultural University, Beijing, 100193, China.
|
31
|
Phillips AR. Variant calling in polyploids for population and quantitative genetics. APPLICATIONS IN PLANT SCIENCES 2024; 12:e11607. [PMID: 39184203 PMCID: PMC11342233 DOI: 10.1002/aps3.11607] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/27/2023] [Revised: 03/03/2024] [Accepted: 04/10/2024] [Indexed: 08/27/2024]
Abstract
Advancements in genome assembly and sequencing technology have made whole genome sequence (WGS) data and reference genomes accessible to study polyploid species. Compared to popular reduced-representation sequencing approaches, the genome-wide coverage and greater marker density provided by WGS data can greatly improve our understanding of polyploid species and polyploid biology. However, biological features that make polyploid species interesting also pose challenges in read mapping, variant identification, and genotype estimation. Accounting for characteristics in variant calling like allelic dosage uncertainty, homology between subgenomes, and variance in chromosome inheritance mode can reduce errors. Here, I discuss the challenges of variant calling in polyploid WGS data and discuss where potential solutions can be integrated into a standard variant calling pipeline.
Affiliation(s)
- Alyssa R. Phillips
- Department of Evolution and Ecology, University of California, Davis, Davis, California 95616, USA
|
32
|
Curry KD, Yu FB, Vance SE, Segarra S, Bhaya D, Chikhi R, Rocha EPC, Treangen TJ. Reference-free structural variant detection in microbiomes via long-read co-assembly graphs. Bioinformatics 2024; 40:i58-i67. [PMID: 38940156 PMCID: PMC11211843 DOI: 10.1093/bioinformatics/btae224] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/29/2024] Open
Abstract
MOTIVATION The study of bacterial genome dynamics is vital for understanding the mechanisms underlying microbial adaptation, growth, and their impact on host phenotype. Structural variants (SVs), genomic alterations of 50 base pairs or more, play a pivotal role in driving evolutionary processes and maintaining genomic heterogeneity within bacterial populations. While SV detection in isolate genomes is relatively straightforward, metagenomes present broader challenges due to the absence of clear reference genomes and the presence of mixed strains. In response, our proposed method, rhea, forgoes reference genomes and metagenome-assembled genomes (MAGs) by encompassing all metagenomic samples in a series (time or other metric) into a single co-assembly graph. The log fold change in graph coverage between successive samples is then calculated to call SVs that are thriving or declining. RESULTS We show rhea to outperform existing methods for SV and horizontal gene transfer (HGT) detection in two simulated mock metagenomes, particularly as the simulated reads diverge from reference genomes and an increase in strain diversity is incorporated. We additionally demonstrate use cases for rhea on series metagenomic data of environmental and fermented food microbiomes to detect specific sequence alterations between successive time and temperature samples, suggesting host advantage. Our approach leverages previous work in assembly graph structural and coverage patterns to provide versatility in studying SVs across diverse and poorly characterized microbial communities for more comprehensive insights into microbial gene flux. AVAILABILITY AND IMPLEMENTATION rhea is open source and available at: https://github.com/treangenlab/rhea.
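The log fold change step mentioned in this abstract can be sketched roughly as follows. This is an illustration only, not rhea's implementation: the base-2 logarithm, the pseudocount, the threshold and all names here are assumptions, and rhea itself works on patterns in a co-assembly graph rather than on isolated nodes.

```python
import math

# Hypothetical sketch (not rhea's code): label graph nodes by how their
# coverage changes between successive samples in a series.

def log_fold_change(cov_prev, cov_next, pseudocount=1.0):
    """log2 ratio of later to earlier coverage; the pseudocount (assumed) avoids log(0)."""
    return math.log2((cov_next + pseudocount) / (cov_prev + pseudocount))

def label_nodes(coverage, threshold=1.0):
    """coverage maps a node id to its coverage in each successive sample."""
    labels = {}
    for node, series in coverage.items():
        changes = [log_fold_change(a, b) for a, b in zip(series, series[1:])]
        if any(c >= threshold for c in changes):
            labels[node] = "thriving"
        elif any(c <= -threshold for c in changes):
            labels[node] = "declining"
        else:
            labels[node] = "stable"
    return labels

# Toy coverage values, invented for illustration only.
example = {"node_a": [3, 7], "node_b": [7, 3], "node_c": [5, 5]}
labels = label_nodes(example)
print(labels)
```

A symmetric threshold on the log scale treats a doubling and a halving of coverage as changes of equal size, which is the usual reason to work with log fold change rather than raw differences.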
Affiliation(s)
- Kristen D Curry
- Department of Computer Science, Rice University, 6100 Main St., Houston, TX 77005, United States
- Department of Genomes and Genetics, Microbial Evolutionary Genomics, Institut Pasteur, Université Paris Cité, CNRS, UMR3525, Paris 75015, France
- Summer E Vance
- Department of Environmental Science, Policy, and Management, University of California, Berkeley, CA 94720, United States
- Santiago Segarra
- Department of Electrical and Computer Engineering, Rice University, Houston, TX 77005, United States
- Devaki Bhaya
- Carnegie Institution for Science, Department of Plant Biology, Stanford, CA 94305, United States
- Rayan Chikhi
- Department of Computational Biology, Institut Pasteur, Université Paris Cité, Paris 75015, France
- Eduardo P C Rocha
- Department of Genomes and Genetics, Microbial Evolutionary Genomics, Institut Pasteur, Université Paris Cité, CNRS, UMR3525, Paris 75015, France
- Todd J Treangen
- Department of Computer Science, Rice University, 6100 Main St., Houston, TX 77005, United States
|
33
|
Hämälä T, Moore C, Cowan L, Carlile M, Gopaulchan D, Brandrud MK, Birkeland S, Loose M, Kolář F, Koch MA, Yant L. Impact of whole-genome duplications on structural variant evolution in Cochlearia. Nat Commun 2024; 15:5377. [PMID: 38918389 PMCID: PMC11199601 DOI: 10.1038/s41467-024-49679-y] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/08/2023] [Accepted: 06/14/2024] [Indexed: 06/27/2024] Open
Abstract
Polyploidy, the result of whole-genome duplication (WGD), is a major driver of eukaryote evolution. Yet WGDs are hugely disruptive mutations, and we still lack a clear understanding of their fitness consequences. Here, we study whether WGDs result in greater diversity of genomic structural variants (SVs) and how they influence evolutionary dynamics in a plant genus, Cochlearia (Brassicaceae). By using long-read sequencing and a graph-based pangenome, we find both negative and positive interactions between WGDs and SVs. Masking of recessive mutations due to WGDs leads to a progressive accumulation of deleterious SVs across four ploidal levels (from diploids to octoploids), likely reducing the adaptive potential of polyploid populations. However, we also discover putative benefits arising from SV accumulation, as more ploidy-specific SVs harbor signals of local adaptation in polyploids than in diploids. Together, our results suggest that SVs play diverse and contrasting roles in the evolutionary trajectories of young polyploids.
Affiliation(s)
- Tuomas Hämälä
- School of Life Sciences, University of Nottingham, Nottingham, UK.
- Production Systems, Natural Resources Institute Finland, Jokioinen, Finland.
- Laura Cowan
- School of Life Sciences, University of Nottingham, Nottingham, UK
- Matthew Carlile
- School of Life Sciences, University of Nottingham, Nottingham, UK
- Siri Birkeland
- Natural History Museum, University of Oslo, Oslo, Norway
- Faculty of Chemistry, Biotechnology and Food Science, Norwegian University of Life Sciences, Ås, Norway
- Matthew Loose
- School of Life Sciences, University of Nottingham, Nottingham, UK
- Filip Kolář
- Department of Botany, Faculty of Science, Charles University, Prague, Czech Republic
- Institute of Botany, Czech Academy of Sciences, Průhonice, Czech Republic
- Marcus A Koch
- Centre for Organismal Studies, University of Heidelberg, Heidelberg, Germany
- Levi Yant
- School of Life Sciences, University of Nottingham, Nottingham, UK.
- Department of Botany, Faculty of Science, Charles University, Prague, Czech Republic.
|
34
|
Pan C, Reinert K. Leaf: an ultrafast filter for population-scale long-read SV detection. Genome Biol 2024; 25:155. [PMID: 38872200 PMCID: PMC11170821 DOI: 10.1186/s13059-024-03297-5] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/28/2022] [Accepted: 06/04/2024] [Indexed: 06/15/2024] Open
Abstract
Advances in sequencing technology have facilitated population-scale long-read structural variant (SV) detection. Arguably, one of the main challenges in population-scale analysis is developing effective computational pipelines. Here, we present a new filter-based pipeline for population-scale long-read SV detection. It better captures SV signals at an early stage than conventional assembly-based or alignment-based pipelines. Assessments in this work suggest that the filter-based pipeline helps better resolve intra-read rearrangements. Moreover, it is also more computationally efficient than conventional pipelines and thus may facilitate population-scale long-read applications.
Affiliation(s)
- Chenxu Pan
- Department of Mathematics and Computer Science, Freie Universität Berlin, Takustr. 9, 14195, Berlin, Germany.
- Knut Reinert
- Department of Mathematics and Computer Science, Freie Universität Berlin, Takustr. 9, 14195, Berlin, Germany
- Department of Computational Molecular Biology, Max Planck Institute for Molecular Genetics, Berlin, 14195, Germany
|
35
|
Li W, Miller D, Liu X, Tosi L, Chkaiban L, Mei H, Hung PH, Parekkadan B, Sherlock G, Levy S. Arrayed in vivo barcoding for multiplexed sequence verification of plasmid DNA and demultiplexing of pooled libraries. Nucleic Acids Res 2024; 52:e47. [PMID: 38709890 PMCID: PMC11162764 DOI: 10.1093/nar/gkae332] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/13/2023] [Revised: 02/23/2024] [Accepted: 04/16/2024] [Indexed: 05/08/2024] Open
Abstract
Sequence verification of plasmid DNA is critical for many cloning and molecular biology workflows. To leverage high-throughput sequencing, several methods have been developed that add a unique DNA barcode to individual samples prior to pooling and sequencing. However, these methods require an individual plasmid extraction and/or in vitro barcoding reaction for each sample processed, limiting throughput and adding cost. Here, we develop an arrayed in vivo plasmid barcoding platform that enables pooled plasmid extraction and library preparation for Oxford Nanopore sequencing. This method has a high accuracy and recovery rate, and greatly increases throughput and reduces cost relative to other plasmid barcoding methods or Sanger sequencing. We use in vivo barcoding to sequence verify >45 000 plasmids and show that the method can be used to transform error-containing dispersed plasmid pools into sequence-perfect arrays or well-balanced pools. In vivo barcoding does not require any specialized equipment beyond a low-overhead Oxford Nanopore sequencer, enabling most labs to flexibly process hundreds to thousands of plasmids in parallel.
Affiliation(s)
- Weiyi Li
- SLAC National Accelerator Laboratory, Stanford University, Stanford, CA, USA
- Darach Miller
- SLAC National Accelerator Laboratory, Stanford University, Stanford, CA, USA
- Xianan Liu
- SLAC National Accelerator Laboratory, Stanford University, Stanford, CA, USA
- Lorenzo Tosi
- Department of Biomedical Engineering, Rutgers University, Piscataway, NJ, USA
- Lamia Chkaiban
- Department of Biomedical Engineering, Rutgers University, Piscataway, NJ, USA
- Han Mei
- SLAC National Accelerator Laboratory, Stanford University, Stanford, CA, USA
- Po-Hsiang Hung
- Department of Genetics, Stanford University School of Medicine, Stanford, CA, USA
- Biju Parekkadan
- Department of Biomedical Engineering, Rutgers University, Piscataway, NJ, USA
- Gavin Sherlock
- Department of Genetics, Stanford University School of Medicine, Stanford, CA, USA
- Sasha F Levy
- SLAC National Accelerator Laboratory, Stanford University, Stanford, CA, USA
|
36
|
Gjoni K, Pollard KS. SuPreMo: a computational tool for streamlining in silico perturbation using sequence-based predictive models. Bioinformatics 2024; 40:btae340. [PMID: 38796686 PMCID: PMC11153836 DOI: 10.1093/bioinformatics/btae340] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/27/2023] [Revised: 05/04/2024] [Accepted: 05/24/2024] [Indexed: 05/28/2024] Open
Abstract
SUMMARY The increasing development of sequence-based machine learning models has raised the demand for manipulating sequences for this application. However, existing approaches to edit and evaluate genome sequences using models have limitations, such as incompatibility with structural variants, challenges in identifying responsible sequence perturbations, and the need for vcf file inputs and phased data. To address these bottlenecks, we present Sequence Mutator for Predictive Models (SuPreMo), a scalable and comprehensive tool for performing and supporting in silico mutagenesis experiments. We then demonstrate how pairs of reference and perturbed sequences can be used with machine learning models to prioritize pathogenic variants or discover new functional sequences. AVAILABILITY AND IMPLEMENTATION SuPreMo was written in Python, and can be run using only one line of code to generate both sequences and 3D genome disruption scores. The codebase, instructions for installation and use, and tutorials are on the GitHub page: https://github.com/ketringjoni/SuPreMo.
Affiliation(s)
- Ketrin Gjoni
- Institute of Data Science and Biotechnology, Gladstone Institutes, 1650 Owens Street, San Francisco, CA 94158, United States
- Department of Epidemiology & Biostatistics, University of California, San Francisco, CA 94158, United States
- Katherine S Pollard
- Institute of Data Science and Biotechnology, Gladstone Institutes, 1650 Owens Street, San Francisco, CA 94158, United States
- Department of Epidemiology & Biostatistics, University of California, San Francisco, CA 94158, United States
- Chan Zuckerberg Biohub, San Francisco, CA 94158, United States
|
37
|
Recuerda M, Campagna L. How structural variants shape avian phenotypes: Lessons from model systems. Mol Ecol 2024; 33:e17364. [PMID: 38651830 DOI: 10.1111/mec.17364] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/06/2023] [Revised: 04/04/2024] [Accepted: 04/09/2024] [Indexed: 04/25/2024]
Abstract
Despite receiving significant recent attention, the relevance of structural variation (SV) in driving phenotypic diversity remains understudied, although recent advances in long-read sequencing, bioinformatics and pangenomic approaches have enhanced SV detection. We review the role of SVs in shaping phenotypes in avian model systems, and identify some general patterns in SV type, length and their associated traits. We found that most of the avian SVs so far identified are short indels in chickens, which are frequently associated with changes in body weight and plumage colouration. Overall, we found that relatively short SVs are more frequently detected, likely due to a combination of their prevalence compared to large SVs, and a detection bias, stemming primarily from the widespread use of short-read sequencing and associated analytical methods. SVs most commonly involve non-coding regions, especially introns, and when patterns of inheritance were reported, SVs associated primarily with dominant discrete traits. We summarise several examples of phenotypic convergence across different species, mediated by different SVs in the same or different genes and different types of changes in the same gene that can lead to various phenotypes. Complex rearrangements and supergenes, which can simultaneously affect and link several genes, tend to have pleiotropic phenotypic effects. Additionally, SVs commonly co-occur with single-nucleotide polymorphisms, highlighting the need to consider all types of genetic changes to understand the basis of phenotypic traits. We end by summarising expectations for when long-read technologies become commonly implemented in non-model birds, likely leading to an increase in SV discovery and characterisation. The growing interest in this subject suggests an increase in our understanding of the phenotypic effects of SVs in upcoming years.
Affiliation(s)
- María Recuerda
- Fuller Evolutionary Biology Program, Cornell Lab of Ornithology, Ithaca, New York, USA
- Leonardo Campagna
- Fuller Evolutionary Biology Program, Cornell Lab of Ornithology, Ithaca, New York, USA
- Department of Ecology and Evolutionary Biology, Cornell University, Ithaca, New York, USA
|
38
|
Thomas M, Mackes N, Preuss-Dodhy A, Wieland T, Bundschus M. Assessing Privacy Vulnerabilities in Genetic Data Sets: Scoping Review. JMIR BIOINFORMATICS AND BIOTECHNOLOGY 2024; 5:e54332. [PMID: 38935957 PMCID: PMC11165293 DOI: 10.2196/54332] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/06/2023] [Revised: 03/26/2024] [Accepted: 03/29/2024] [Indexed: 06/29/2024]
Abstract
BACKGROUND Genetic data are widely considered inherently identifiable. However, genetic data sets come in many shapes and sizes, and the feasibility of privacy attacks depends on their specific content. Assessing the reidentification risk of genetic data is complex, yet there is a lack of guidelines or recommendations that support data processors in performing such an evaluation. OBJECTIVE This study aims to gain a comprehensive understanding of the privacy vulnerabilities of genetic data and create a summary that can guide data processors in assessing the privacy risk of genetic data sets. METHODS We conducted a 2-step search, in which we first identified 21 reviews published between 2017 and 2023 on the topic of genomic privacy and then analyzed all references cited in the reviews (n=1645) to identify 42 unique original research studies that demonstrate a privacy attack on genetic data. We then evaluated the type and components of genetic data exploited for these attacks as well as the effort and resources needed for their implementation and their probability of success. RESULTS From our literature review, we derived 9 nonmutually exclusive features of genetic data that are both inherent to any genetic data set and informative about privacy risk: biological modality, experimental assay, data format or level of processing, germline versus somatic variation content, content of single nucleotide polymorphisms, short tandem repeats, aggregated sample measures, structural variants, and rare single nucleotide variants. CONCLUSIONS On the basis of our literature review, the evaluation of these 9 features covers the great majority of privacy-critical aspects of genetic data and thus provides a foundation and guidance for assessing genetic data risk.
|
39
|
Hu H, Gao R, Gao W, Gao B, Jiang Z, Zhou M, Wang G, Jiang T. SVDF: enhancing structural variation detect from long-read sequencing via automatic filtering strategies. Brief Bioinform 2024; 25:bbae336. [PMID: 38980375 PMCID: PMC11232458 DOI: 10.1093/bib/bbae336] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/06/2024] [Revised: 06/03/2024] [Accepted: 06/27/2024] [Indexed: 07/10/2024] Open
Abstract
Structural variation (SV) is an important form of genomic variation that influences gene function and expression by altering the structure of the genome. Although long-read data have been proven to better characterize SVs, SVs detected from noisy long-read data still include a considerable portion of false-positive calls. To accurately detect SVs in long-read data, we present SVDF, a method that employs a learning-based noise filtering strategy and an SV signature-adaptive clustering algorithm, for effectively reducing the likelihood of false-positive events. Benchmarking results from multiple orthogonal experiments demonstrate that, across different sequencing platforms and depths, SVDF achieves higher calling accuracy for each sample compared to several existing general SV calling tools. We believe that, with its meticulous and sensitive SV detection capability, SVDF can bring new opportunities and advancements to cutting-edge genomic research.
Affiliation(s)
- Heng Hu
- College of Life Sciences, Northeast Forestry University, Harbin 150000, China
- Runtian Gao
- College of Life Sciences, Northeast Forestry University, Harbin 150000, China
- Wentao Gao
- College of Life Sciences, Northeast Forestry University, Harbin 150000, China
- Bo Gao
- Department of Radiology, The Second Affiliated Hospital of Harbin Medical University, Harbin 150000, China
- Zhongjun Jiang
- College of Life Sciences, Northeast Forestry University, Harbin 150000, China
- Murong Zhou
- College of Life Sciences, Northeast Forestry University, Harbin 150000, China
- Guohua Wang
- College of Computer and Control Engineering, Northeast Forestry University, Harbin 150000, China
- State Key Laboratory of Tree Genetics and Breeding, Harbin 150000, China
- Tao Jiang
- School of Computer Science and Technology, Harbin Institute of Technology, Harbin 150000, China
|
40
|
Szakállas N, Barták BK, Valcz G, Nagy ZB, Takács I, Molnár B. Can long-read sequencing tackle the barriers, which the next-generation could not? A review. Pathol Oncol Res 2024; 30:1611676. [PMID: 38818014 PMCID: PMC11137202 DOI: 10.3389/pore.2024.1611676] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/10/2024] [Accepted: 04/30/2024] [Indexed: 06/01/2024]
Abstract
The large-scale heterogeneity of genetic diseases necessitated the deeper examination of nucleotide sequence alterations enhancing the discovery of new targeted drug attack points. The appearance of new sequencing techniques was essential to get more interpretable genomic data. In contrast to the previous short-reads, longer lengths can provide a better insight into the potential health threatening genetic abnormalities. Long-reads offer more accurate variant identification and genome assembly methods, indicating advances in nucleotide deflect-related studies. In this review, we introduce the historical background of sequencing technologies and show their benefits and limits, as well. Furthermore, we highlight the differences between short- and long-read approaches, including their unique advances and difficulties in methodologies and evaluation. Additionally, we provide a detailed description of the corresponding bioinformatics and the current applications.
Affiliation(s)
- Nikolett Szakállas
- Department of Biological Physics, Faculty of Science, Eötvös Loránd University, Budapest, Hungary
- Barbara K. Barták
- Department of Internal Medicine and Oncology, Faculty of Medicine, Semmelweis University, Budapest, Hungary
- Gábor Valcz
- Department of Internal Medicine and Oncology, Faculty of Medicine, Semmelweis University, Budapest, Hungary
- HUN-REN-SU Translational Extracellular Vesicle Research Group, Budapest, Hungary
- Zsófia B. Nagy
- Department of Internal Medicine and Oncology, Faculty of Medicine, Semmelweis University, Budapest, Hungary
- István Takács
- Department of Internal Medicine and Oncology, Faculty of Medicine, Semmelweis University, Budapest, Hungary
- Béla Molnár
- Department of Internal Medicine and Oncology, Faculty of Medicine, Semmelweis University, Budapest, Hungary
|
41
|
Yu Y, Gao R, Luo J. LcDel: deletion variation detection based on clustering and long reads. Front Genet 2024; 15:1404415. [PMID: 38798694 PMCID: PMC11116628 DOI: 10.3389/fgene.2024.1404415] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/21/2024] [Accepted: 04/25/2024] [Indexed: 05/29/2024] Open
Abstract
Motivation: Genomic structural variation refers to chromosomal level variations such as genome rearrangement or insertion/deletion, which typically involve larger DNA fragments compared to single nucleotide variations. Deletion is a common type of structural variant in the genome, which may lead to many diseases, so the detection of deletions can help to gain insights into the pathogenesis of diseases and provide accurate information for disease diagnosis, treatment, and prevention. Many tools exist for deletion variant detection, but they are still inadequate in some aspects, and most of them ignore the presence of chimeric variants in clustering, resulting in less precise clustering results. Results: In this paper, we present LcDel, which can detect deletion variation based on clustering and long reads. LcDel first finds the candidate deletion sites and then performs the first clustering step using two clustering methods (sliding window-based and coverage-based, respectively) based on the length of the deletion. After that, LcDel immediately uses the second clustering by hierarchical clustering to determine the location and length of the deletion. LcDel is benchmarked against some other structural variation detection tools on multiple datasets, and the results show that LcDel has better detection performance for deletion. The source code is available in https://github.com/cyq1314woaini/LcDel.
Affiliation(s)
- Junwei Luo
- School of Software, Henan Polytechnic University, Jiaozuo, China
|
42
|
Su C, Chandradoss KR, Malachowski T, Boya R, Ryu HS, Brennand KJ, Phillips-Cremins JE. MASTR-seq: Multiplexed Analysis of Short Tandem Repeats with sequencing. bioRxiv 2024:2024.04.29.591790. [PMID: 38746155 PMCID: PMC11092654 DOI: 10.1101/2024.04.29.591790] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 05/16/2024]
Abstract
More than 60 human disorders have been linked to unstable expansion of short tandem repeat (STR) tracts. STR length and the extent of DNA methylation are linked to disease pathology and can be mosaic in a cell type-specific manner in several repeat expansion disorders. Mosaic phenomena have been difficult to study to date due to technical bias intrinsic to repeat sequences and the need for multi-modal measurements at single-allele resolution. Nanopore long-read sequencing accurately measures STR length and DNA methylation in the same single molecule but is cost-prohibitive for studies assessing a target locus across multiple experimental conditions or patient samples. Here, we describe MASTR-seq, Multiplexed Analysis of Short Tandem Repeats, for cost-effective, high-throughput, accurate, multi-modal measurements of DNA methylation and STR genotype at single-allele resolution. MASTR-seq couples long-read sequencing, Cas9-mediated target enrichment, and PCR-free multiplexed barcoding to achieve a >10-fold increase in on-target read mapping for 8-12 pooled samples in a single MinION flow cell. We provide a detailed experimental protocol and computational tools and present evidence that MASTR-seq quantifies tract length and DNA methylation status for CGG and CAG STR loci in normal-length and mutation-length human cell lines. The MASTR-seq protocol takes approximately eight days for experiments and one additional day for data processing and analyses.
Key points: We provide a protocol for MASTR-seq: Multiplexed Analysis of Short Tandem Repeats using Cas9-mediated target enrichment and PCR-free, multiplexed nanopore sequencing. MASTR-seq achieves a >10-fold increase in on-target read proportion for highly repetitive, technically inaccessible regions of the genome relevant for human health and disease. MASTR-seq allows for high-throughput, efficient, accurate, and cost-effective measurement of STR length and DNA methylation in the same single allele for up to 8-12 samples in parallel in one Nanopore MinION flow cell.
|
43
|
Bjørnstad PM, Aaløkken R, Åsheim J, Sundaram AYM, Felde CN, Østby GH, Dalland M, Sjursen W, Carrizosa C, Vigeland MD, Sorte HS, Sheng Y, Ariansen SL, Grindedal EM, Gilfillan GD. A 39 kb structural variant causing Lynch Syndrome detected by optical genome mapping and nanopore sequencing. Eur J Hum Genet 2024; 32:513-520. [PMID: 38030917 PMCID: PMC11061271 DOI: 10.1038/s41431-023-01494-7] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/03/2023] [Revised: 10/19/2023] [Accepted: 11/06/2023] [Indexed: 12/01/2023] Open
Abstract
Lynch Syndrome (LS) is a hereditary cancer syndrome caused by pathogenic germline variants in one of the four mismatch repair (MMR) genes MLH1, MSH2, MSH6 and PMS2. It is characterized by a significantly increased risk of multiple cancer types, particularly colorectal and endometrial cancer, with autosomal dominant inheritance. Access to precise and sensitive methods for genetic testing is important, as early detection and prevention of cancer is possible when the variant is known. We present here two unrelated Norwegian families with family histories strongly suggestive of LS, where immunohistochemical and microsatellite instability analyses indicated presence of a pathogenic variant in MSH2, but targeted exon sequencing and multiplex ligation-dependent probe amplification (MLPA) were negative. Using Bionano optical genome mapping, we detected a 39 kb insertion in the MSH2 gene. Precise mapping of the insertion breakpoints and inserted sequence was performed by low-coverage whole-genome sequencing with an Oxford Nanopore MinION. The same variant was present in both families, and later found in other families from the same region of Norway, indicative of a founder event. To our knowledge, this is the first diagnosis of LS caused by a structural variant using these technologies. We suggest that structural variant detection be performed when LS is suspected but not confirmed with first-tier standard genetic testing.
Affiliation(s)
- Pål Marius Bjørnstad
- Department Medical Genetics, Oslo University Hospital and University of Oslo, Oslo, Norway
| | - Ragnhild Aaløkken
- Department Medical Genetics, Oslo University Hospital and University of Oslo, Oslo, Norway
| | - June Åsheim
- Department Medical Genetics, Oslo University Hospital and University of Oslo, Oslo, Norway
| | - Arvind Y M Sundaram
- Department Medical Genetics, Oslo University Hospital and University of Oslo, Oslo, Norway
| | - Caroline N Felde
- Department Medical Genetics, Oslo University Hospital and University of Oslo, Oslo, Norway
| | - G Henriette Østby
- Department Medical Genetics, Oslo University Hospital and University of Oslo, Oslo, Norway
| | - Marianne Dalland
- Department Medical Genetics, Oslo University Hospital and University of Oslo, Oslo, Norway
| | - Wenche Sjursen
- Department of Clinical & Molecular Medicine, NTNU and Department of Medical Genetics, St Olavs Hospital, Trondheim, Norway
| | - Christian Carrizosa
- Department Medical Genetics, Oslo University Hospital and University of Oslo, Oslo, Norway
| | - Magnus D Vigeland
- Department Medical Genetics, Oslo University Hospital and University of Oslo, Oslo, Norway
- Department of Forensic Sciences, Oslo University Hospital, 0372, Oslo, Norway
| | - Hanne S Sorte
- Department Medical Genetics, Oslo University Hospital and University of Oslo, Oslo, Norway
| | - Ying Sheng
- Department Medical Genetics, Oslo University Hospital and University of Oslo, Oslo, Norway
| | - Sarah L Ariansen
- Department Medical Genetics, Oslo University Hospital and University of Oslo, Oslo, Norway
| | - Eli Marie Grindedal
- Department Medical Genetics, Oslo University Hospital and University of Oslo, Oslo, Norway
| | - Gregor D Gilfillan
- Department Medical Genetics, Oslo University Hospital and University of Oslo, Oslo, Norway.
|
44
|
Gunasekaran D, Ardell DH, Nobile CJ. SNP-SVant: A Computational Workflow to Predict and Annotate Genomic Variants in Organisms Lacking Benchmarked Variants. Curr Protoc 2024; 4:e1046. [PMID: 38717471 PMCID: PMC11081530 DOI: 10.1002/cpz1.1046] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 05/12/2024]
Abstract
Whole-genome sequencing is widely used to investigate population genomic variation in organisms of interest. Assorted tools have been independently developed to call variants from short-read sequencing data aligned to a reference genome, including single nucleotide polymorphisms (SNPs) and structural variations (SVs). We developed SNP-SVant, an integrated, flexible, and computationally efficient bioinformatic workflow that predicts high-confidence SNPs and SVs in organisms without benchmarked variants, which are traditionally used for distinguishing sequencing errors from real variants. In the absence of these benchmarked datasets, we leverage multiple rounds of statistical recalibration to increase the precision of variant prediction. The SNP-SVant workflow is flexible, with user options to trade off accuracy for sensitivity. The workflow predicts SNPs and small insertions and deletions using the Genome Analysis ToolKit (GATK) and predicts SVs using the Genome Rearrangement IDentification Software Suite (GRIDSS), and it culminates in variant annotation using custom scripts. A key utility of SNP-SVant is its scalability. Variant calling is a computationally expensive procedure, and thus, SNP-SVant uses a workflow management system with intermediary checkpoint steps to ensure efficient use of resources by minimizing redundant computations and omitting steps where dependent files are available. SNP-SVant also provides metrics to assess the quality of called variants and converts between VCF and aligned FASTA format outputs to ensure compatibility with downstream tools to calculate selection statistics, which are commonplace in population genomics studies. By accounting for both small and large structural variants, users of this workflow can obtain a wide-ranging view of genomic alterations in an organism of interest.
Overall, this workflow advances our capabilities in assessing the functional consequences of different types of genomic alterations, ultimately improving our ability to associate genotypes with phenotypes. © 2024 The Authors. Current Protocols published by Wiley Periodicals LLC. Basic Protocol: Predicting single nucleotide polymorphisms and structural variations. Support Protocol 1: Downloading publicly available sequencing data. Support Protocol 2: Visualizing variant loci using Integrated Genome Viewer. Support Protocol 3: Converting between VCF and aligned FASTA formats.
Affiliation(s)
- Deepika Gunasekaran
- Quantitative and Systems Biology Graduate Program, University of California, Merced, CA, USA
- Department of Molecular and Cell Biology, School of Natural Sciences, University of California, Merced, CA, USA
| | - David H. Ardell
- Department of Molecular and Cell Biology, School of Natural Sciences, University of California, Merced, CA, USA
| | - Clarissa J. Nobile
- Department of Molecular and Cell Biology, School of Natural Sciences, University of California, Merced, CA, USA
- Health Science Research Institute, University of California, Merced, CA, USA
|
45
|
Kim D, Shin JI, Yoo IY, Jo S, Chu J, Cho WY, Shin SH, Chung YJ, Park YJ, Jung SH. GenoMycAnalyzer: a web-based tool for species and drug resistance prediction for Mycobacterium genomes. BMC Genomics 2024; 25:387. [PMID: 38643090 PMCID: PMC11031912 DOI: 10.1186/s12864-024-10320-3] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/04/2024] [Accepted: 04/17/2024] [Indexed: 04/22/2024] Open
Abstract
BACKGROUND Drug-resistant tuberculosis (TB) is a major threat to global public health. Whole-genome sequencing (WGS) is a useful tool for species identification and drug resistance prediction, and many clinical laboratories are transitioning to WGS as a routine diagnostic tool. However, user-friendly and high-confidence automated bioinformatics tools are needed to rapidly identify M. tuberculosis complex (MTBC) and non-tuberculous mycobacteria (NTM), detect drug resistance, and further guide treatment options. RESULTS We developed GenoMycAnalyzer, a web-based software that integrates functions for identifying MTBC and NTM species, lineage and spoligotype prediction, variant calling, annotation, drug-resistance determination, and data visualization. The accuracy of GenoMycAnalyzer for genotypic drug susceptibility testing (gDST) was evaluated using 5,473 MTBC isolates that underwent phenotypic DST (pDST). The GenoMycAnalyzer database was built to predict the gDST for 15 antituberculosis drugs using the World Health Organization mutational catalogue. Compared to pDST, the sensitivity of drug susceptibilities by the GenoMycAnalyzer for first-line drugs ranged from 95.9% for rifampicin (95% CI 94.8-96.7%) to 79.6% for pyrazinamide (95% CI 76.9-82.2%), whereas those for second-line drugs ranged from 98.2% for levofloxacin (95% CI 90.1-100.0%) to 74.9% for capreomycin (95% CI 69.3-80.0%). Notably, the integration of large deletions of the four resistance-conferring genes increased gDST sensitivity. The specificity of drug susceptibilities by the GenoMycAnalyzer ranged from 98.7% for amikacin (95% CI 97.8-99.3%) to 79.5% for ethionamide (95% CI 76.4-82.3%). The incorporated Kraken2 software identified 1,284 mycobacterial species with an accuracy of 98.8%. GenoMycAnalyzer also perfectly predicted lineages for 1,935 MTBC and spoligotypes for 54 MTBC. 
CONCLUSIONS GenoMycAnalyzer offers both web-based and graphical user interfaces, which can help biologists with limited access to high-performance computing systems or limited bioinformatics skills. By streamlining the interpretation of WGS data, the GenoMycAnalyzer has the potential to significantly impact TB management and contribute to global efforts to combat this infectious disease. GenoMycAnalyzer is available at http://www.mycochase.org.
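The sensitivity and specificity figures in this abstract are proportions reported with 95% confidence intervals. The abstract does not say which interval method was used, so the following is only a sketch under the assumption of a Wilson score interval; the function names and any counts passed to them are hypothetical and are not taken from the study.

```python
import math
from typing import Tuple


def sensitivity(tp: int, fn: int) -> float:
    """Fraction of phenotypically resistant isolates also called resistant by genotype."""
    return tp / (tp + fn)


def specificity(tn: int, fp: int) -> float:
    """Fraction of phenotypically susceptible isolates also called susceptible by genotype."""
    return tn / (tn + fp)


def wilson_interval(successes: int, total: int, z: float = 1.96) -> Tuple[float, float]:
    """Wilson score interval for a binomial proportion (z = 1.96 gives roughly 95%).
    Assumed method: the source abstract does not name the interval it reports."""
    if total <= 0:
        raise ValueError("total must be positive")
    p = successes / total
    denom = 1 + z * z / total
    centre = (p + z * z / (2 * total)) / denom
    half = z * math.sqrt(p * (1 - p) / total + z * z / (4 * total * total)) / denom
    return centre - half, centre + half
```

Intervals of this kind widen as the number of isolates tested for a drug shrinks, which is consistent with the wide interval quoted for levofloxacin (90.1-100.0%) compared with rifampicin (94.8-96.7%).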
Affiliation(s)
- Doyoung Kim
- Department of Biomedicine & Health Sciences, College of Medicine, The Catholic University of Korea, Seoul, Korea
| | - Jeong-Ih Shin
- Department of Biomedicine & Health Sciences, College of Medicine, The Catholic University of Korea, Seoul, Korea
- Integrated Research Center for Genomic Polymorphism, Precision Medicine Research Center, College of Medicine, The Catholic University of Korea, Seoul, Korea
| | - In Young Yoo
- Department of Laboratory Medicine, Seoul St. Mary's Hospital, College of Medicine, The Catholic University of Korea, Seoul, Korea
| | - Sungjin Jo
- Department of Laboratory Medicine, Eunpyeong St. Mary's Hospital, College of Medicine, The Catholic University of Korea, Seoul, Korea
| | - Jiyon Chu
- Department of Biomedicine & Health Sciences, College of Medicine, The Catholic University of Korea, Seoul, Korea
| | - Yeun-Jun Chung
- Department of Biomedicine & Health Sciences, College of Medicine, The Catholic University of Korea, Seoul, Korea
- Integrated Research Center for Genomic Polymorphism, Precision Medicine Research Center, College of Medicine, The Catholic University of Korea, Seoul, Korea
- Departments of Microbiology, College of Medicine, The Catholic University of Korea, Seoul, Korea
| | - Yeon-Joon Park
- Department of Laboratory Medicine, Seoul St. Mary's Hospital, College of Medicine, The Catholic University of Korea, Seoul, Korea
| | - Seung-Hyun Jung
- Department of Biomedicine & Health Sciences, College of Medicine, The Catholic University of Korea, Seoul, Korea.
- Integrated Research Center for Genomic Polymorphism, Precision Medicine Research Center, College of Medicine, The Catholic University of Korea, Seoul, Korea.
- Departments of Biochemistry, College of Medicine, The Catholic University of Korea, 222 Banpo-daero, Seoch-Gu, Seoul, 06591, Republic of Korea.
|
46
|
Sjodin BMF, Schmidt DA, Galbreath KE, Russello MA. Putative climate adaptation in American pikas (Ochotona princeps) is associated with copy number variation across environmental gradients. Sci Rep 2024; 14:8568. [PMID: 38609461 PMCID: PMC11014952 DOI: 10.1038/s41598-024-59157-6] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/17/2023] [Accepted: 04/08/2024] [Indexed: 04/14/2024] Open
Abstract
Improved understanding of the genetic basis of adaptation to climate change is necessary for maintaining global biodiversity moving forward. Studies to date have largely focused on sequence variation, yet there is growing evidence that suggests that changes in genome structure may be an even more significant source of adaptive potential. The American pika (Ochotona princeps) is an alpine specialist that shows some evidence of adaptation to climate along elevational gradients, but previous work has been limited to single nucleotide polymorphism based analyses within a fraction of the species range. Here, we investigated the role of copy number variation underlying patterns of local adaptation in the American pika using genome-wide data previously collected across the entire species range. We identified 37-193 putative copy number variants (CNVs) associated with environmental variation (temperature, precipitation, solar radiation) within each of the six major American pika lineages, with patterns of divergence largely following elevational and latitudinal gradients. Genes associated (n = 158) with independent annotations across lineages, variables, and/or CNVs had functions related to mitochondrial structure/function, immune response, hypoxia, olfaction, and DNA repair. Some of these genes have been previously linked to putative high elevation and/or climate adaptation in other species, suggesting they may serve as important targets in future studies.
Affiliation(s)
- Bryson M F Sjodin
- Department of Biology, The University of British Columbia, 3247 University Way, Kelowna, BC, V1V 1V7, Canada
| | - Danielle A Schmidt
- Department of Biology, The University of British Columbia, 3247 University Way, Kelowna, BC, V1V 1V7, Canada
| | - Kurt E Galbreath
- Department of Biology, Northern Michigan University, 1401 Presque Isle Ave, Marquette, MI, 49855, USA
| | - Michael A Russello
- Department of Biology, The University of British Columbia, 3247 University Way, Kelowna, BC, V1V 1V7, Canada.
|
47
|
Du ZZ, He JB, Jiao WB. A comprehensive benchmark of graph-based genetic variant genotyping algorithms on plant genomes for creating an accurate ensemble pipeline. Genome Biol 2024; 25:91. [PMID: 38589937 PMCID: PMC11003132 DOI: 10.1186/s13059-024-03239-1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/19/2023] [Accepted: 04/04/2024] [Indexed: 04/10/2024] Open
Abstract
BACKGROUND Although sequencing technologies have boosted the measurement of the genomic diversity of plant crops, it remains challenging to accurately genotype millions of genetic variants, especially structural variations, with only short reads. In recent years, many graph-based variation genotyping methods have been developed to address this issue and tested for human genomes. However, their performance in plant genomes remains largely elusive. Furthermore, pipelines integrating the advantages of current genotyping methods might be required, considering the different complexity of plant genomes. RESULTS Here we comprehensively evaluate eight such genotypers in different scenarios in terms of variant type and size, sequencing parameters, genomic context, and complexity, as well as graph size, using both simulated and real data sets from representative plant genomes. Our evaluation reveals that there are still great challenges to applying existing methods to plants, such as excessive repeats and variants or high resource consumption. Therefore, we propose a pipeline called Ensemble Variant Genotyper (EVG) that can achieve better genotyping performance in almost all experimental scenarios and comparably higher genotyping recall and precision even using 5× reads. Furthermore, we demonstrate that EVG is more robust with an increasing number of graphed genomes, especially for insertions and deletions. CONCLUSIONS Our study will provide new insights into the development and application of graph-based genotyping algorithms. We conclude that EVG provides an accurate, unbiased, and cost-effective way for genotyping both small and large variations and will be potentially used in population-scale genotyping for large, repetitive, and heterozygous plant genomes.
Affiliation(s)
- Ze-Zhen Du
- National Key Laboratory for Germplasm Innovation & Utilization of Horticultural Crops, Huazhong Agricultural University, Wuhan, China
- Hubei Hongshan Laboratory, Wuhan, China
| | - Jia-Bao He
- National Key Laboratory for Germplasm Innovation & Utilization of Horticultural Crops, Huazhong Agricultural University, Wuhan, China
- Hubei Hongshan Laboratory, Wuhan, China
| | - Wen-Biao Jiao
- National Key Laboratory for Germplasm Innovation & Utilization of Horticultural Crops, Huazhong Agricultural University, Wuhan, China.
- Hubei Hongshan Laboratory, Wuhan, China.
|
48
|
David G, Bertolotti A, Layer R, Scofield D, Hayward A, Baril T, Burnett HA, Gudmunds E, Jensen H, Husby A. Calling Structural Variants with Confidence from Short-Read Data in Wild Bird Populations. Genome Biol Evol 2024; 16:evae049. [PMID: 38489588 PMCID: PMC11018544 DOI: 10.1093/gbe/evae049] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/29/2022] [Revised: 02/28/2024] [Accepted: 03/07/2024] [Indexed: 03/17/2024] Open
Abstract
Comprehensive characterization of structural variation in natural populations has only become feasible in the last decade. To investigate the population genomic nature of structural variation, reproducible and high-confidence structural variation callsets are first required. We created a population-scale reference of the genome-wide landscape of structural variation across 33 Nordic house sparrows (Passer domesticus). To produce a consensus callset across all samples using short-read data, we compare heuristic-based quality filtering and visual curation (Samplot/PlotCritic and Samplot-ML) approaches. We demonstrate that curation of structural variants is important for reducing putative false positives and that the time invested in this step outweighs the potential costs of analyzing short-read-discovered structural variation data sets that include many potential false positives. We find that even a lenient manual curation strategy (e.g. applied by a single curator) can reduce the proportion of putative false positives by up to 80%, thus enriching the proportion of high-confidence variants. Crucially, in applying a lenient manual curation strategy with a single curator, nearly all (>99%) variants rejected as putative false positives were also classified as such by a more stringent curation strategy using three additional curators. Furthermore, variants rejected by manual curation failed to reflect the expected population structure from SNPs, whereas variants passing curation did. Combining heuristic-based quality filtering with rapid manual curation of structural variants in short-read data can therefore become a time- and cost-effective first step for functional and population genomic studies requiring high-confidence structural variation callsets.
Affiliation(s)
- Gabriel David
- Department of Ecology and Genetics, Evolutionary Biology Centre, Uppsala University, Uppsala, Sweden
| | - Ryan Layer
- BioFrontiers Institute, University of Colorado, Boulder, CO, USA
- Department of Computer Science, University of Colorado, Boulder, CO, USA
| | - Douglas Scofield
- Department of Ecology and Genetics, Evolutionary Biology Centre, Uppsala University, Uppsala, Sweden
| | - Alexander Hayward
- Centre for Ecology and Conservation, University of Exeter, Penryn Campus, Penryn, Cornwall, UK
| | - Tobias Baril
- Centre for Ecology and Conservation, University of Exeter, Penryn Campus, Penryn, Cornwall, UK
| | - Hamish A Burnett
- Centre for Biodiversity Dynamics, Department of Biology, Norwegian University of Science and Technology, Trondheim, Norway
| | - Erik Gudmunds
- Department of Ecology and Genetics, Evolutionary Biology Centre, Uppsala University, Uppsala, Sweden
| | - Henrik Jensen
- Centre for Biodiversity Dynamics, Department of Biology, Norwegian University of Science and Technology, Trondheim, Norway
| | - Arild Husby
- Department of Ecology and Genetics, Evolutionary Biology Centre, Uppsala University, Uppsala, Sweden
|
49
|
Li X, Liu Q, Fu C, Li M, Li C, Li X, Zhao S, Zheng Z. Characterizing structural variants based on graph-genotyping provides insights into pig domestication and local adaption. J Genet Genomics 2024; 51:394-406. [PMID: 38056526 DOI: 10.1016/j.jgg.2023.11.005] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/14/2023] [Revised: 11/23/2023] [Accepted: 11/24/2023] [Indexed: 12/08/2023]
Abstract
Structural variants (SVs), such as deletions (DELs) and insertions (INSs), contribute substantially to pig genetic diversity and phenotypic variation. Using a library of SVs discovered from long-read primary assemblies and short-read sequenced genomes, we map pig genomic SVs with a graph-based method for re-genotyping SVs in 402 genomes. Our results demonstrate that those SVs harboring specific trait-associated genes may greatly shape pig domestication and local adaptation. Further characterization of SVs reveals that some population-stratified SVs may alter the transcription of genes by affecting regulatory elements. We identify that the genotypes of two DELs (296-bp DEL, chr7: 52,172,101-52,172,397; 278-bp DEL, chr18: 23,840,143-23,840,421) located in muscle-specific enhancers are associated with the expression of target genes related to meat quality (FSD2) and muscle fiber hypertrophy (LMOD2 and WASL) in pigs. Our results highlight the role of SVs in domestic porcine evolution, and the identified candidate functional genes and SVs are valuable resources for future genomic research and breeding programs in pigs.
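The two deletion intervals quoted in this abstract can be checked against their stated sizes with simple arithmetic. The sketch below uses hypothetical helper names; the convention that size equals end minus start is inferred only from the fact that it reproduces the reported 296 bp and 278 bp, not from any statement in the source about its coordinate system.

```python
import re
from typing import Tuple


def parse_interval(text: str) -> Tuple[str, int, int]:
    """Parse a string such as 'chr7: 52,172,101-52,172,397' into (chrom, start, end)."""
    match = re.fullmatch(r"(chr\w+):\s*([\d,]+)-([\d,]+)", text.strip())
    if match is None:
        raise ValueError(f"unrecognised interval: {text!r}")
    chrom = match.group(1)
    start = int(match.group(2).replace(",", ""))
    end = int(match.group(3).replace(",", ""))
    return chrom, start, end


def interval_span(text: str) -> int:
    """Span of the interval as end - start (assumed convention, see lead-in)."""
    _, start, end = parse_interval(text)
    return end - start


# Coordinates copied from the abstract above.
chr7_del = interval_span("chr7: 52,172,101-52,172,397")
chr18_del = interval_span("chr18: 23,840,143-23,840,421")
```

Both spans agree with the sizes given in the abstract, so the coordinates and the reported deletion lengths are internally consistent.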
Affiliation(s)
- Xin Li
- Key Lab of Agricultural Animal Genetics, Breeding and Reproduction of Ministry of Education and Key Laboratory of Swine Genetics and Breeding of Ministry of Agriculture, College of Animal Science and Technology, Huazhong Agricultural University, Wuhan, Hubei 430070, China
| | - Quan Liu
- Key Lab of Agricultural Animal Genetics, Breeding and Reproduction of Ministry of Education and Key Laboratory of Swine Genetics and Breeding of Ministry of Agriculture, College of Animal Science and Technology, Huazhong Agricultural University, Wuhan, Hubei 430070, China
| | - Chong Fu
- Key Lab of Agricultural Animal Genetics, Breeding and Reproduction of Ministry of Education and Key Laboratory of Swine Genetics and Breeding of Ministry of Agriculture, College of Animal Science and Technology, Huazhong Agricultural University, Wuhan, Hubei 430070, China
| | - Mengxun Li
- Key Lab of Agricultural Animal Genetics, Breeding and Reproduction of Ministry of Education and Key Laboratory of Swine Genetics and Breeding of Ministry of Agriculture, College of Animal Science and Technology, Huazhong Agricultural University, Wuhan, Hubei 430070, China
| | - Changchun Li
- Key Lab of Agricultural Animal Genetics, Breeding and Reproduction of Ministry of Education and Key Laboratory of Swine Genetics and Breeding of Ministry of Agriculture, College of Animal Science and Technology, Huazhong Agricultural University, Wuhan, Hubei 430070, China; The Cooperative Innovation Center for Sustainable Pig Production, Wuhan, Hubei 430070, China
| | - Xinyun Li
- Key Lab of Agricultural Animal Genetics, Breeding and Reproduction of Ministry of Education and Key Laboratory of Swine Genetics and Breeding of Ministry of Agriculture, College of Animal Science and Technology, Huazhong Agricultural University, Wuhan, Hubei 430070, China; The Cooperative Innovation Center for Sustainable Pig Production, Wuhan, Hubei 430070, China; Hubei Hongshan Laboratory, Wuhan, Hubei 430070, China
| | - Shuhong Zhao
- Key Lab of Agricultural Animal Genetics, Breeding and Reproduction of Ministry of Education and Key Laboratory of Swine Genetics and Breeding of Ministry of Agriculture, College of Animal Science and Technology, Huazhong Agricultural University, Wuhan, Hubei 430070, China; The Cooperative Innovation Center for Sustainable Pig Production, Wuhan, Hubei 430070, China; Hubei Hongshan Laboratory, Wuhan, Hubei 430070, China.
| | - Zhuqing Zheng
- Key Lab of Agricultural Animal Genetics, Breeding and Reproduction of Ministry of Education and Key Laboratory of Swine Genetics and Breeding of Ministry of Agriculture, College of Animal Science and Technology, Huazhong Agricultural University, Wuhan, Hubei 430070, China; Institute of Agricultural Biotechnology, Jingchu University of Technology, Jingmen, Hubei 448000, China.
|
50
|
Jensen TD, Ni B, Reuter CM, Gorzynski JE, Fazal S, Bonner D, Ungar RA, Goddard PC, Raja A, Ashley EA, Bernstein JA, Zuchner S, Greicius MD, Montgomery SB, Schatz MC, Wheeler MT, Battle A. Integration of transcriptomics and long-read genomics prioritizes structural variants in rare disease. medRxiv 2024:2024.03.22.24304565. [PMID: 38585781 PMCID: PMC10996727 DOI: 10.1101/2024.03.22.24304565] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 04/09/2024]
Abstract
Rare structural variants (SVs) - insertions, deletions, and complex rearrangements - can cause Mendelian disease, yet they remain difficult to accurately detect and interpret. We sequenced and analyzed Oxford Nanopore long-read genomes of 68 individuals from the Undiagnosed Disease Network (UDN) with no previously identified diagnostic mutations from short-read sequencing. Using our optimized SV detection pipelines and 571 control long-read genomes, we detected 716 long-read rare (MAF < 0.01) SV alleles per genome on average, achieving a 2.4x increase from short-reads. To characterize the functional effects of rare SVs, we assessed their relationship with gene expression from blood or fibroblasts from the same individuals, and found that rare SVs overlapping enhancers were enriched (LOR = 0.46) near expression outliers. We also evaluated tandem repeat expansions (TREs) and found 14 rare TREs per genome; notably these TREs were also enriched near overexpression outliers. To prioritize candidate functional SVs, we developed Watershed-SV, a probabilistic model that integrates expression data with SV-specific genomic annotations, which significantly outperforms baseline models that don't incorporate expression data. Watershed-SV identified a median of eight high-confidence functional SVs per UDN genome. Notably, this included compound heterozygous deletions in FAM177A1 shared by two siblings, which were likely causal for a rare neurodevelopmental disorder. Our observations demonstrate the promise of integrating long-read sequencing with gene expression towards improving the prioritization of functional SVs and TREs in rare disease patients.
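The rarity threshold in this abstract (MAF < 0.01) can be made concrete with a short sketch. The abstract does not describe how allele frequencies were computed, so this assumes a diploid autosomal site and a frequency estimated from the 571 control genomes alone; the function names are hypothetical.

```python
def minor_allele_frequency(alt_allele_count: int, n_individuals: int, ploidy: int = 2) -> float:
    """Frequency of the less common allele among ploidy * n_individuals chromosomes."""
    total = ploidy * n_individuals
    if total <= 0 or not 0 <= alt_allele_count <= total:
        raise ValueError("allele count must lie between 0 and ploidy * n_individuals")
    freq = alt_allele_count / total
    return min(freq, 1.0 - freq)


def is_rare(alt_allele_count: int, n_individuals: int, threshold: float = 0.01) -> bool:
    """True when the minor allele frequency is strictly below the threshold."""
    return minor_allele_frequency(alt_allele_count, n_individuals) < threshold


# 571 diploid controls contribute 1,142 chromosomes under the stated assumptions.
N_CONTROLS = 571
```

Under these assumptions an SV allele seen on 11 of the 1,142 control chromosomes would still count as rare, while one seen on 12 would not.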
|