1
|
Hemstrom W, Grummer JA, Luikart G, Christie MR. Next-generation data filtering in the genomics era. Nat Rev Genet 2024; 25:750-767. [PMID: 38877133 DOI: 10.1038/s41576-024-00738-6] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Accepted: 04/25/2024] [Indexed: 06/16/2024]
Abstract
Genomic data are ubiquitous across disciplines, from agriculture to biodiversity, ecology, evolution and human health. However, these datasets often contain noise or errors and are missing information that can affect the accuracy and reliability of subsequent computational analyses and conclusions. A key step in genomic data analysis is filtering - removing sequencing bases, reads, genetic variants and/or individuals from a dataset - to improve data quality for downstream analyses. Researchers are confronted with a multitude of choices when filtering genomic data; they must choose which filters to apply and select appropriate thresholds. To help usher in the next generation of genomic data filtering, we review and suggest best practices to improve the implementation, reproducibility and reporting standards for filter types and thresholds commonly applied to genomic datasets. We focus mainly on filters for minor allele frequency, missing data per individual or per locus, linkage disequilibrium and Hardy-Weinberg deviations. Using simulated and empirical datasets, we illustrate the large effects of different filtering thresholds on common population genetics statistics, such as Tajima's D value, population differentiation (FST), nucleotide diversity (π) and effective population size (Ne).
Collapse
Affiliation(s)
- William Hemstrom
- Department of Biological Sciences, Purdue University, West Lafayette, IN, USA.
| | - Jared A Grummer
- Flathead Lake Biological Station, Wildlife Biology Program and Division of Biological Sciences, University of Montana, Missoula, MT, USA
| | - Gordon Luikart
- Flathead Lake Biological Station, Wildlife Biology Program and Division of Biological Sciences, University of Montana, Missoula, MT, USA
| | - Mark R Christie
- Department of Biological Sciences, Purdue University, West Lafayette, IN, USA.
- Department of Forestry and Natural Resources, Purdue University, West Lafayette, IN, USA.
| |
Collapse
|
2
|
Behera S, Belyeu JR, Chen X, Paulin LF, Nguyen NQH, Newman E, Mahmoud M, Menon VK, Qi Q, Joshi P, Marcovina S, Rossi M, Roller E, Han J, Onuchic V, Avery CL, Ballantyne CM, Rodriguez CJ, Kaplan RC, Muzny DM, Metcalf GA, Gibbs RA, Yu B, Boerwinkle E, Eberle MA, Sedlazeck FJ. Identification of allele-specific KIV-2 repeats and impact on Lp(a) measurements for cardiovascular disease risk. BMC Med Genomics 2024; 17:255. [PMID: 39449055 DOI: 10.1186/s12920-024-02024-0] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/24/2024] [Accepted: 10/07/2024] [Indexed: 10/26/2024] Open
Abstract
The abundance of Lp(a) protein holds significant implications for the risk of cardiovascular disease (CVD), which is directly impacted by the copy number (CN) of KIV-2, a 5.5 kbp sub-region. KIV-2 is highly polymorphic in the population and accurate analysis is challenging. In this study, we present the DRAGEN KIV-2 CN caller, which utilizes short reads. Data across 166 WGS show that the caller has high accuracy, compared to optical mapping and can further phase approximately 50% of the samples. We compared KIV-2 CN numbers to 24 previously postulated KIV-2 relevant SNVs, revealing that many are ineffective predictors of KIV-2 copy number. Population studies, including USA-based cohorts, showed distinct KIV-2 CN, distributions for European-, African-, and Hispanic-American populations and further underscored the limitations of SNV predictors. We demonstrate that the CN estimates correlate significantly with the available Lp(a) protein levels and that phasing is highly important.
Collapse
Affiliation(s)
- Sairam Behera
- Human Genome Sequencing Center, Baylor College of Medicine, Houston, TX, USA
| | - Jonathan R Belyeu
- Illumina Inc, San Diego, CA, USA
- Present Address: Pacific Biosciences, San Francisco, CA, USA
| | - Xiao Chen
- Illumina Inc, San Diego, CA, USA
- Present Address: Pacific Biosciences, San Francisco, CA, USA
| | - Luis F Paulin
- Human Genome Sequencing Center, Baylor College of Medicine, Houston, TX, USA
| | - Ngoc Quynh H Nguyen
- School of Public Health, University of Texas Health Science Center, Houston, TX, USA
| | | | - Medhat Mahmoud
- Human Genome Sequencing Center, Baylor College of Medicine, Houston, TX, USA
| | - Vipin K Menon
- Human Genome Sequencing Center, Baylor College of Medicine, Houston, TX, USA
- Genentech, San Francisco, CA, USA
| | - Qibin Qi
- Department of Epidemiology and Population Health, Albert Einstein College of Medicine, Bronx, NY, USA
| | - Parag Joshi
- Medpace Reference Laboratories, Cincinnati, OH, USA
| | | | | | | | | | | | - Christy L Avery
- Department of Epidemiology, University of North Carolina at Chapel Hill, Chapel Hill, NC, USA
| | | | - Carlos J Rodriguez
- Department of Epidemiology and Population Health, Albert Einstein College of Medicine, Bronx, NY, USA
| | - Robert C Kaplan
- Department of Epidemiology and Population Health, Albert Einstein College of Medicine, Bronx, NY, USA
- Fred Hutchinson Cancer Center, Public Health Sciences Division, Seattle, WA, USA
| | - Donna M Muzny
- Human Genome Sequencing Center, Baylor College of Medicine, Houston, TX, USA
| | - Ginger A Metcalf
- Human Genome Sequencing Center, Baylor College of Medicine, Houston, TX, USA
| | - Richard A Gibbs
- Human Genome Sequencing Center, Baylor College of Medicine, Houston, TX, USA
| | - Bing Yu
- School of Public Health, University of Texas Health Science Center, Houston, TX, USA
| | - Eric Boerwinkle
- Human Genome Sequencing Center, Baylor College of Medicine, Houston, TX, USA
- School of Public Health, University of Texas Health Science Center, Houston, TX, USA
| | - Michael A Eberle
- Illumina Inc, San Diego, CA, USA
- Present Address: Pacific Biosciences, San Francisco, CA, USA
| | - Fritz J Sedlazeck
- Human Genome Sequencing Center, Baylor College of Medicine, Houston, TX, USA.
- Department of Computer Science, Rice University, Houston, TX, USA.
| |
Collapse
|
3
|
Trinh J, Schaake S, Gabbert C, Lüth T, Cowley SA, Fienemann A, Ullrich KK, Klein C, Seibler P. Optical genome mapping of structural variants in Parkinson's disease-related induced pluripotent stem cells. BMC Genomics 2024; 25:980. [PMID: 39425080 PMCID: PMC11490025 DOI: 10.1186/s12864-024-10902-1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/04/2024] [Accepted: 10/14/2024] [Indexed: 10/21/2024] Open
Abstract
BACKGROUND Certain structural variants (SVs) including large-scale genetic copy number variants, as well as copy number-neutral inversions and translocations may not all be resolved by chromosome karyotype studies. The identification of genetic risk factors for Parkinson's disease (PD) has been primarily focused on the gene-disruptive single nucleotide variants. In contrast, larger SVs, which may significantly influence human phenotypes, have been largely underexplored. Optical genomic mapping (OGM) represents a novel approach that offers greater sensitivity and resolution for detecting SVs. In this study, we used induced pluripotent stem cell (iPSC) lines of patients with PD-linked SNCA and PRKN variants as a proof of concept to (i) show the detection of pathogenic SVs in PD with OGM and (ii) provide a comprehensive screening of genetic abnormalities in iPSCs. RESULTS OGM detected SNCA gene triplication and duplication in patient-derived iPSC lines, which were not identified by long-read sequencing. Additionally, various exon deletions were confirmed by OGM in the PRKN gene of iPSCs, of which exon 3-5 and exon 2 deletions were unable to phase with conventional multiplex-ligation-dependent probe amplification. In terms of chromosomal abnormalities in iPSCs, no gene fusions, no aneuploidy but two balanced inter-chromosomal translocations were detected in one line that were absent in the parental fibroblasts and not identified by routine single nucleotide variant karyotyping. CONCLUSIONS In summary, OGM can detect pathogenic SVs in PD-linked genes as well as reveal genomic abnormalities for iPSCs that were not identified by other techniques, which is supportive for OGM's future use in gene discovery and iPSC line screening.
Collapse
Affiliation(s)
- Joanne Trinh
- Institute of Neurogenetics, University of Lübeck and University Hospital Schleswig-Holstein, Ratzeburger Allee 160, 23562, Lübeck, Germany
| | - Susen Schaake
- Institute of Neurogenetics, University of Lübeck and University Hospital Schleswig-Holstein, Ratzeburger Allee 160, 23562, Lübeck, Germany
| | - Carolin Gabbert
- Institute of Neurogenetics, University of Lübeck and University Hospital Schleswig-Holstein, Ratzeburger Allee 160, 23562, Lübeck, Germany
| | - Theresa Lüth
- Institute of Neurogenetics, University of Lübeck and University Hospital Schleswig-Holstein, Ratzeburger Allee 160, 23562, Lübeck, Germany
| | - Sally A Cowley
- James and Lillian Martin Centre for Stem Cell Research, Sir William Dunn School of Pathology, University of Oxford, Oxford, UK
| | - André Fienemann
- Institute of Neurogenetics, University of Lübeck and University Hospital Schleswig-Holstein, Ratzeburger Allee 160, 23562, Lübeck, Germany
| | - Kristian K Ullrich
- Division Scientific IT Group, Max Planck Institute for Evolutionary Biology, 24306, Plön, Germany
| | - Christine Klein
- Institute of Neurogenetics, University of Lübeck and University Hospital Schleswig-Holstein, Ratzeburger Allee 160, 23562, Lübeck, Germany
| | - Philip Seibler
- Institute of Neurogenetics, University of Lübeck and University Hospital Schleswig-Holstein, Ratzeburger Allee 160, 23562, Lübeck, Germany.
| |
Collapse
|
4
|
Parmar JM, Laing NG, Kennerson ML, Ravenscroft G. Genetics of inherited peripheral neuropathies and the next frontier: looking backwards to progress forwards. J Neurol Neurosurg Psychiatry 2024; 95:992-1001. [PMID: 38744462 PMCID: PMC11503175 DOI: 10.1136/jnnp-2024-333436] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Submit a Manuscript] [Subscribe] [Scholar Register] [Received: 01/18/2024] [Accepted: 04/10/2024] [Indexed: 05/16/2024]
Abstract
Inherited peripheral neuropathies (IPNs) encompass a clinically and genetically heterogeneous group of disorders causing length-dependent degeneration of peripheral autonomic, motor and/or sensory nerves. Despite gold-standard diagnostic testing for pathogenic variants in over 100 known associated genes, many patients with IPN remain genetically unsolved. Providing patients with a diagnosis is critical for reducing their 'diagnostic odyssey', improving clinical care, and for informed genetic counselling. The last decade of massively parallel sequencing technologies has seen a rapid increase in the number of newly described IPN-associated gene variants contributing to IPN pathogenesis. However, the scarcity of additional families and functional data supporting variants in potential novel genes is prolonging patient diagnostic uncertainty and contributing to the missing heritability of IPNs. We review the last decade of IPN disease gene discovery to highlight novel genes, structural variation and short tandem repeat expansions contributing to IPN pathogenesis. From the lessons learnt, we provide our vision for IPN research as we anticipate the future, providing examples of emerging technologies, resources and tools that we propose that will expedite the genetic diagnosis of unsolved IPN families.
Collapse
Affiliation(s)
- Jevin M Parmar
- Rare Disease Genetics and Functional Genomics, Harry Perkins Institute of Medical Research, Perth, Western Australia, Australia
- Centre for Medical Research, Faculty of Health and Medical Sciences, The University of Western Australia, Perth, Western Australia, Australia
| | - Nigel G Laing
- Centre for Medical Research, Faculty of Health and Medical Sciences, The University of Western Australia, Perth, Western Australia, Australia
- Preventive Genetics, Harry Perkins Institute of Medical Research, Perth, Western Australia, Australia
| | - Marina L Kennerson
- Northcott Neuroscience Laboratory, ANZAC Research Institute, Concord, New South Wales, Australia
- Molecular Medicine Laboratory, Concord Hospital, Concord, New South Wales, Australia
| | - Gianina Ravenscroft
- Rare Disease Genetics and Functional Genomics, Harry Perkins Institute of Medical Research, Perth, Western Australia, Australia
- Centre for Medical Research, Faculty of Health and Medical Sciences, The University of Western Australia, Perth, Western Australia, Australia
| |
Collapse
|
5
|
Ma C, Shi X, Li X, Zhang YP, Peng MS. Comprehensive evaluation and guidance of structural variation detection tools in chicken whole genome sequence data. BMC Genomics 2024; 25:970. [PMID: 39415108 PMCID: PMC11481438 DOI: 10.1186/s12864-024-10875-1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/06/2023] [Accepted: 10/08/2024] [Indexed: 10/18/2024] Open
Abstract
BACKGROUND Structural variations (SVs) are widespread across genome and have a great impact on evolution, disease, and phenotypic diversity. Despite the development of numerous bioinformatic tools, commonly referred to as SV callers, tailored for detecting SVs using whole genome sequence (WGS) data and employing diverse algorithms, their performance necessitates rigorous evaluation with real data and validated SVs. Moreover, a considerable proportion of these tools have been primarily designed and optimized using human genome data. Consequently, their applicability and performance in Avian species, characterized by smaller genomes and distinct genomic architectures, remain inadequately assessed. RESULTS We performed a comprehensive assessment of the performance of ten widely used SV callers using population-level real genomic data with the validated five common types of SVs. The performance of SV callers varies with the types and sizes of SVs. As compared with other tools, GRIDSS, Lumpy, Wham, and Manta present better detection accuracy. Pindel can detect more small SVs than others. CNVnator and CNVkit can detect more medium and large copy number variations. Given the poor consistency among different SV callers, the combination calling strategy is not recommended. All tools show poor ability in the detection of insertions (especially with size > 150 bp). At least 50× read depth is required to detect more than 80% of the SVs for most tools. CONCLUSIONS This study highlights the importance and necessity of using real sequencing data, rather than simulated data only, with validated SVs for SV caller evaluation. Some practical guidance and suggestions are provided for SV detection in future researches.
Collapse
Affiliation(s)
- Cheng Ma
- Key Laboratory of Genetic Evolution & Animal Models and Yunnan Key Laboratory of Molecular Biology of Domestic Animals, Kunming Institute of Zoology, Chinese Academy of Sciences, Kunming, 650223, China
- Department of Medical Biochemistry and Microbiology, Uppsala University, BMC, Uppsala, SE-75123, Sweden
| | - Xian Shi
- Key Laboratory of Genetic Evolution & Animal Models and Yunnan Key Laboratory of Molecular Biology of Domestic Animals, Kunming Institute of Zoology, Chinese Academy of Sciences, Kunming, 650223, China
- University of Chinese Academy of Sciences, Beijing, 100049, China
| | - Xuzhen Li
- State Key Laboratory for Conservation and Utilization of Bio-Resources in Yunnan, Yunnan Agricultural University, Kunming, 650201, China
- College of Biological Big Data, Yunnan Agriculture University, Kunming, 650201, China
| | - Ya-Ping Zhang
- Key Laboratory of Genetic Evolution & Animal Models and Yunnan Key Laboratory of Molecular Biology of Domestic Animals, Kunming Institute of Zoology, Chinese Academy of Sciences, Kunming, 650223, China.
- University of Chinese Academy of Sciences, Beijing, 100049, China.
- State Key Laboratory for Conservation and Utilization of Bio-Resources in Yunnan, Yunnan University, Kunming, 650091, China.
- KIZ-CUHK Joint Laboratory of Bioresources and Molecular Research in Common Diseases, Kunming Institute of Zoology, Chinese Academy of Sciences, Kunming, 650223, China.
| | - Min-Sheng Peng
- Key Laboratory of Genetic Evolution & Animal Models and Yunnan Key Laboratory of Molecular Biology of Domestic Animals, Kunming Institute of Zoology, Chinese Academy of Sciences, Kunming, 650223, China.
- University of Chinese Academy of Sciences, Beijing, 100049, China.
- KIZ-CUHK Joint Laboratory of Bioresources and Molecular Research in Common Diseases, Kunming Institute of Zoology, Chinese Academy of Sciences, Kunming, 650223, China.
| |
Collapse
|
6
|
Ferreira MR, Carratto TMT, Frontanilla TS, Bonadio RS, Jain M, de Oliveira SF, Castelli EC, Mendes-Junior CT. Advances in forensic genetics: Exploring the potential of long read sequencing. Forensic Sci Int Genet 2024; 74:103156. [PMID: 39427416 DOI: 10.1016/j.fsigen.2024.103156] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/03/2024] [Revised: 10/04/2024] [Accepted: 10/06/2024] [Indexed: 10/22/2024]
Abstract
DNA-based technologies have been used in forensic practice since the mid-1980s. While PCR-based STR genotyping using Capillary Electrophoresis remains the gold standard for generating DNA profiles in routine casework worldwide, the research community is continually seeking alternative methods capable of providing additional information to enhance discrimination power or contribute with new investigative leads. Oxford Nanopore Technologies (ONT) and PacBio third-generation sequencing have revolutionized the field, offering real-time capabilities, single-molecule resolution, and long-read sequencing (LRS). ONT, the pioneer of nanopore sequencing, uses biological nanopores to analyze nucleic acids in real-time. Its devices have revolutionized sequencing and may represent an interesting alternative for forensic research and routine casework, given that it offers unparalleled flexibility in a portable size: it enables sequencing approaches that range widely from PCR-amplified short target regions (e.g., CODIS STRs) to PCR-free whole transcriptome or even ultra-long whole genome sequencing. Despite its higher error rate compared to Illumina sequencing, it can significantly improve accuracy in read alignment against a reference genome or de novo genome assembly. This is achieved by generating long contiguous sequences that correctly assemble repetitive sections and regions with structural variation. Moreover, it allows real-time determination of DNA methylation status from native DNA without the need for bisulfite conversion. LRS enables the analysis of thousands of markers at once, providing phasing information and eliminating the need for multiple assays. This maximizes the information retrieved from a single invaluable sample. In this review, we explore the potential use of LRS in different forensic genetics approaches.
Collapse
Affiliation(s)
- Marcel Rodrigues Ferreira
- Molecular Genetics and Bioinformatics Laboratory, Experimental Research Unit - Unipex, School of Medicine, São Paulo State University - Unesp, Botucatu, São Paulo, Brazil
| | - Thássia Mayra Telles Carratto
- Departamento de Química, Laboratório de Pesquisas Forenses e Genômicas, Faculdade de Filosofia, Ciências e Letras de Ribeirão Preto, Universidade de São Paulo, Ribeirão Preto, SP 14040-901, Brazil
| | - Tamara Soledad Frontanilla
- Departamento de Genética, Faculdade de Medicina de Ribeirão Preto, Universidade de São Paulo, Ribeirão Preto, SP 14049-900, Brazil
| | - Raphael Severino Bonadio
- Depto Genética e Morfologia, Instituto de Ciências Biológicas, Universidade de Brasília, Brasília, DF, Brazil
| | - Miten Jain
- Department of Bioengineering, Department of Physics, Khoury College of Computer Sciences, Northeastern University, Boston, MA, United States
| | | | - Erick C Castelli
- Molecular Genetics and Bioinformatics Laboratory, Experimental Research Unit - Unipex, School of Medicine, São Paulo State University - Unesp, Botucatu, São Paulo, Brazil; Pathology Department, School of Medicine, São Paulo State University - Unesp, Botucatu, São Paulo, Brazil
| | - Celso Teixeira Mendes-Junior
- Departamento de Química, Laboratório de Pesquisas Forenses e Genômicas, Faculdade de Filosofia, Ciências e Letras de Ribeirão Preto, Universidade de São Paulo, Ribeirão Preto, SP 14040-901, Brazil.
| |
Collapse
|
7
|
Wu L, Yi W, Yao S, Xie S, Peng R, Zhang J, Tan W. mRNA-Based Cancer Vaccines: Advancements and Prospects. NANO LETTERS 2024. [PMID: 39375146 DOI: 10.1021/acs.nanolett.4c03296] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 10/09/2024]
Abstract
The success of mRNA COVID-19 vaccines has reinvigorated research and interest in mRNA-based cancer vaccines. Despite promising results in clinical trials, therapeutic mRNA-based cancer vaccines have not yet been approved for human use. These vaccines are designed to trigger tumor regression, establish enduring antitumor memory, and mitigate adverse reactions. However, challenges such as tumor-induced immunosuppression and immunoresistance significantly hinder their application. Here, we provide an overview of the recent advances of neoantigen discovery and delivery systems for mRNA vaccines, focusing on improving clinical efficacy. Additionally, we summarize the recent clinical advances involving mRNA cancer vaccines and discuss prospective strategies for overcoming immuneresistance.
Collapse
Affiliation(s)
- Lijin Wu
- The Key Laboratory of Zhejiang Province for Aptamers and Theranostics, Zhejiang Cancer Hospital, Hangzhou Institute of Medicine (HIM), Chinese Academy of Sciences, Hangzhou, Zhejiang 310022, China
- School of Molecular Medicine, Hangzhou Institute for Advanced Study, University of Chinese Academy of Sciences, Hangzhou 310024, China
- University of Chinese Academy of Sciences, No.19 A Yuquan Road, Beijing 100049, China
| | - Weicheng Yi
- The Key Laboratory of Zhejiang Province for Aptamers and Theranostics, Zhejiang Cancer Hospital, Hangzhou Institute of Medicine (HIM), Chinese Academy of Sciences, Hangzhou, Zhejiang 310022, China
| | - Shiyu Yao
- The Key Laboratory of Zhejiang Province for Aptamers and Theranostics, Zhejiang Cancer Hospital, Hangzhou Institute of Medicine (HIM), Chinese Academy of Sciences, Hangzhou, Zhejiang 310022, China
- Molecular Science and Biomedicine Laboratory (MBL), State Key Laboratory of Chemo/Biosensing and Chemometrics, College of Chemistry and Chemical Engineering, College of Biology, Aptamer Engineering Center of Hunan Province, Hunan University, Changsha, Hunan 410082, China
| | - Sitao Xie
- The Key Laboratory of Zhejiang Province for Aptamers and Theranostics, Zhejiang Cancer Hospital, Hangzhou Institute of Medicine (HIM), Chinese Academy of Sciences, Hangzhou, Zhejiang 310022, China
| | - Ruizi Peng
- The Key Laboratory of Zhejiang Province for Aptamers and Theranostics, Zhejiang Cancer Hospital, Hangzhou Institute of Medicine (HIM), Chinese Academy of Sciences, Hangzhou, Zhejiang 310022, China
| | - Jing Zhang
- The Key Laboratory of Zhejiang Province for Aptamers and Theranostics, Zhejiang Cancer Hospital, Hangzhou Institute of Medicine (HIM), Chinese Academy of Sciences, Hangzhou, Zhejiang 310022, China
| | - Weihong Tan
- The Key Laboratory of Zhejiang Province for Aptamers and Theranostics, Zhejiang Cancer Hospital, Hangzhou Institute of Medicine (HIM), Chinese Academy of Sciences, Hangzhou, Zhejiang 310022, China
- Molecular Science and Biomedicine Laboratory (MBL), State Key Laboratory of Chemo/Biosensing and Chemometrics, College of Chemistry and Chemical Engineering, College of Biology, Aptamer Engineering Center of Hunan Province, Hunan University, Changsha, Hunan 410082, China
| |
Collapse
|
8
|
Dunn T, Zook JM, Holt JM, Narayanasamy S. Jointly benchmarking small and structural variant calls with vcfdist. Genome Biol 2024; 25:253. [PMID: 39358801 PMCID: PMC11446017 DOI: 10.1186/s13059-024-03394-5] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/23/2024] [Accepted: 09/17/2024] [Indexed: 10/04/2024] Open
Abstract
In this work, we extend vcfdist to be the first variant call benchmarking tool to jointly evaluate phased single-nucleotide polymorphisms (SNPs), small insertions/deletions (INDELs), and structural variants (SVs) for the whole genome. First, we find that a joint evaluation of small and structural variants uniformly reduces measured errors for SNPs (- 28.9%), INDELs (- 19.3%), and SVs (- 52.4%) across three datasets. vcfdist also corrects a common flaw in phasing evaluations, reducing measured flip errors by over 50%. Lastly, we show that vcfdist is more accurate than previously published works and on par with the newest approaches while providing improved result interpretability.
Collapse
Affiliation(s)
- Tim Dunn
- Computer Science and Engineering, University of Michigan, Ann Arbor, Michigan, USA.
| | - Justin M Zook
- National Institute of Standards and Technology, Gaithersburg, Maryland, USA
| | | | - Satish Narayanasamy
- Computer Science and Engineering, University of Michigan, Ann Arbor, Michigan, USA
| |
Collapse
|
9
|
Smolka M, Paulin LF, Grochowski CM, Horner DW, Mahmoud M, Behera S, Kalef-Ezra E, Gandhi M, Hong K, Pehlivan D, Scholz SW, Carvalho CMB, Proukakis C, Sedlazeck FJ. Detection of mosaic and population-level structural variants with Sniffles2. Nat Biotechnol 2024; 42:1571-1580. [PMID: 38168980 PMCID: PMC11217151 DOI: 10.1038/s41587-023-02024-y] [Citation(s) in RCA: 34] [Impact Index Per Article: 34.0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/04/2022] [Accepted: 10/11/2023] [Indexed: 01/05/2024]
Abstract
Calling structural variations (SVs) is technically challenging, but using long reads remains the most accurate way to identify complex genomic alterations. Here we present Sniffles2, which improves over current methods by implementing a repeat aware clustering coupled with a fast consensus sequence and coverage-adaptive filtering. Sniffles2 is 11.8 times faster and 29% more accurate than state-of-the-art SV callers across different coverages (5-50×), sequencing technologies (ONT and HiFi) and SV types. Furthermore, Sniffles2 solves the problem of family-level to population-level SV calling to produce fully genotyped VCF files. Across 11 probands, we accurately identified causative SVs around MECP2, including highly complex alleles with three overlapping SVs. Sniffles2 also enables the detection of mosaic SVs in bulk long-read data. As a result, we identified multiple mosaic SVs in brain tissue from a patient with multiple system atrophy. The identified SV showed a remarkable diversity within the cingulate cortex, impacting both genes involved in neuron function and repetitive elements.
Collapse
Affiliation(s)
- Moritz Smolka
- Human Genome Sequencing Center Baylor College of Medicine, Houston, TX, USA
| | - Luis F Paulin
- Human Genome Sequencing Center Baylor College of Medicine, Houston, TX, USA
| | | | - Dominic W Horner
- Department of Clinical and Movement Neurosciences, Royal Free Campus, Queen Square Institute of Neurology, University College London, London, UK
- Aligning Science Across Parkinson's (ASAP) Collaborative Research Network, Chevy Chase, MD, USA
| | - Medhat Mahmoud
- Human Genome Sequencing Center Baylor College of Medicine, Houston, TX, USA
- Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX, USA
| | - Sairam Behera
- Human Genome Sequencing Center Baylor College of Medicine, Houston, TX, USA
| | - Ester Kalef-Ezra
- Department of Clinical and Movement Neurosciences, Royal Free Campus, Queen Square Institute of Neurology, University College London, London, UK
- Aligning Science Across Parkinson's (ASAP) Collaborative Research Network, Chevy Chase, MD, USA
| | - Mira Gandhi
- Pacific Northwest Research Institute (PNRI), Seattle, WA, USA
| | - Karl Hong
- Bionano Genomics, San Diego, CA, USA
| | - Davut Pehlivan
- Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX, USA
- Division of Neurology and Developmental Neuroscience, Department of Pediatrics, Baylor College of Medicine, Houston, TX, USA
| | - Sonja W Scholz
- Neurodegenerative Diseases Research Unit, National Institute of Neurological Disorders and Stroke, Bethesda, MD, USA
- Department of Neurology, Johns Hopkins University Medical Center, Baltimore, MD, USA
| | - Claudia M B Carvalho
- Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX, USA
- Pacific Northwest Research Institute (PNRI), Seattle, WA, USA
| | - Christos Proukakis
- Department of Clinical and Movement Neurosciences, Royal Free Campus, Queen Square Institute of Neurology, University College London, London, UK
- Aligning Science Across Parkinson's (ASAP) Collaborative Research Network, Chevy Chase, MD, USA
| | - Fritz J Sedlazeck
- Human Genome Sequencing Center Baylor College of Medicine, Houston, TX, USA.
- Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX, USA.
- Aligning Science Across Parkinson's (ASAP) Collaborative Research Network, Chevy Chase, MD, USA.
- Department of Computer Science, Rice University, Houston, TX, USA.
| |
Collapse
|
10
|
Karageorgiou C, Gokcumen O, Dennis MY. Deciphering the role of structural variation in human evolution: a functional perspective. Curr Opin Genet Dev 2024; 88:102240. [PMID: 39121701 PMCID: PMC11485010 DOI: 10.1016/j.gde.2024.102240] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/30/2024] [Revised: 06/27/2024] [Accepted: 07/23/2024] [Indexed: 08/12/2024]
Abstract
Advances in sequencing technologies have enabled the comparison of high-quality genomes of diverse primate species, revealing vast amounts of divergence due to structural variation. Given their large size, structural variants (SVs) can simultaneously alter the function and regulation of multiple genes. Studies estimate that collectively more than 3.5% of the genome is divergent in humans versus other great apes, impacting thousands of genes. Functional genomics and gene-editing tools in various model systems recently emerged as an exciting frontier - investigating the wide-ranging impacts of SVs on molecular, cellular, and systems-level phenotypes. This review examines existing research and identifies future directions to broaden our understanding of the functional roles of SVs on phenotypic innovations and diversity impacting uniquely human features, ranging from cognition to metabolic adaptations.
Collapse
Affiliation(s)
- Charikleia Karageorgiou
- Department of Biological Sciences, University at Buffalo, 109 Cooke Hall, Buffalo, NY 14260, USA. https://twitter.com/@evobioclio
| | - Omer Gokcumen
- Department of Biological Sciences, University at Buffalo, 109 Cooke Hall, Buffalo, NY 14260, USA
| | - Megan Y Dennis
- Department of Biochemistry & Molecular Medicine, Genome Center, and MIND Institute, University of California, Davis, CA 95616, USA.
| |
Collapse
|
11
|
Zhang Z, Zhang J, Kang L, Qiu X, Xu S, Xu J, Guo Y, Niu Z, Niu B, Bi A, Zhao X, Xu D, Wang J, Yin C, Lu F. Structural variation discovery in wheat using PacBio high-fidelity sequencing. THE PLANT JOURNAL : FOR CELL AND MOLECULAR BIOLOGY 2024; 120:687-698. [PMID: 39239888 DOI: 10.1111/tpj.17011] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Subscribe] [Scholar Register] [Received: 01/14/2024] [Revised: 08/09/2024] [Accepted: 08/22/2024] [Indexed: 09/07/2024]
Abstract
Structural variations (SVs) pervade plant genomes and contribute substantially to the phenotypic diversity. However, most SVs were ineffectively assayed due to their complex nature and the limitations of early genomic technologies. By applying the PacBio high-fidelity (HiFi) sequencing for wheat genomes, we performed a comprehensive evaluation of mainstream long-read aligners and SV callers in SV detection. The results indicated that the accuracy of deletion discovery is markedly influenced by callers, accounting for 87.73% of the variance, whereas both aligners (38.25%) and callers (49.32%) contributed substantially to the accuracy variance for insertions. Among the aligners, Winnowmap2 and NGMLR excelled in detecting deletions and insertions, respectively. For SV callers, SVIM achieved the best performance. We demonstrated that combining the aligners and callers mentioned above is optimal for SV detection. Furthermore, we evaluated the effect of sequencing depth on the accuracy of SV detection, revealing that low-coverage HiFi sequencing is sufficiently robust for high-quality SV discovery. This study thoroughly evaluated SV discovery approaches and established optimal workflows for investigating structural variations using low-coverage HiFi sequencing in the wheat genome, which will advance SV discovery and decipher the biological functions of SVs in wheat and many other plants.
Collapse
Affiliation(s)
- Zhiliang Zhang
- State Key Laboratory of Plant Cell and Chromosome Engineering, Institute of Genetics and Developmental Biology, Innovative Academy of Seed Design, Chinese Academy of Sciences, Beijing, China
- University of Chinese Academy of Sciences, Beijing, China
| | - Jijin Zhang
- State Key Laboratory of Plant Cell and Chromosome Engineering, Institute of Genetics and Developmental Biology, Innovative Academy of Seed Design, Chinese Academy of Sciences, Beijing, China
- University of Chinese Academy of Sciences, Beijing, China
| | - Lipeng Kang
- State Key Laboratory of Plant Cell and Chromosome Engineering, Institute of Genetics and Developmental Biology, Innovative Academy of Seed Design, Chinese Academy of Sciences, Beijing, China
- University of Chinese Academy of Sciences, Beijing, China
| | - Xuebing Qiu
- State Key Laboratory of Plant Cell and Chromosome Engineering, Institute of Genetics and Developmental Biology, Innovative Academy of Seed Design, Chinese Academy of Sciences, Beijing, China
- University of Chinese Academy of Sciences, Beijing, China
| | - Song Xu
- State Key Laboratory of Plant Cell and Chromosome Engineering, Institute of Genetics and Developmental Biology, Innovative Academy of Seed Design, Chinese Academy of Sciences, Beijing, China
- University of Chinese Academy of Sciences, Beijing, China
| | - Jun Xu
- State Key Laboratory of Plant Cell and Chromosome Engineering, Institute of Genetics and Developmental Biology, Innovative Academy of Seed Design, Chinese Academy of Sciences, Beijing, China
- University of Chinese Academy of Sciences, Beijing, China
| | - Yafei Guo
- State Key Laboratory of Plant Cell and Chromosome Engineering, Institute of Genetics and Developmental Biology, Innovative Academy of Seed Design, Chinese Academy of Sciences, Beijing, China
- University of Chinese Academy of Sciences, Beijing, China
| | - Zelin Niu
- State Key Laboratory of Plant Cell and Chromosome Engineering, Institute of Genetics and Developmental Biology, Innovative Academy of Seed Design, Chinese Academy of Sciences, Beijing, China
- University of Chinese Academy of Sciences, Beijing, China
| | - Beirui Niu
- State Key Laboratory of Plant Cell and Chromosome Engineering, Institute of Genetics and Developmental Biology, Innovative Academy of Seed Design, Chinese Academy of Sciences, Beijing, China
- University of Chinese Academy of Sciences, Beijing, China
| | - Aoyue Bi
- State Key Laboratory of Plant Cell and Chromosome Engineering, Institute of Genetics and Developmental Biology, Innovative Academy of Seed Design, Chinese Academy of Sciences, Beijing, China
- University of Chinese Academy of Sciences, Beijing, China
| | - Xuebo Zhao
- State Key Laboratory of Plant Cell and Chromosome Engineering, Institute of Genetics and Developmental Biology, Innovative Academy of Seed Design, Chinese Academy of Sciences, Beijing, China
- University of Chinese Academy of Sciences, Beijing, China
| | - Daxing Xu
- State Key Laboratory of Plant Cell and Chromosome Engineering, Institute of Genetics and Developmental Biology, Innovative Academy of Seed Design, Chinese Academy of Sciences, Beijing, China
- University of Chinese Academy of Sciences, Beijing, China
| | - Jing Wang
- State Key Laboratory of Plant Cell and Chromosome Engineering, Institute of Genetics and Developmental Biology, Innovative Academy of Seed Design, Chinese Academy of Sciences, Beijing, China
| | - Changbin Yin
- State Key Laboratory of Plant Cell and Chromosome Engineering, Institute of Genetics and Developmental Biology, Innovative Academy of Seed Design, Chinese Academy of Sciences, Beijing, China
| | - Fei Lu
- State Key Laboratory of Plant Cell and Chromosome Engineering, Institute of Genetics and Developmental Biology, Innovative Academy of Seed Design, Chinese Academy of Sciences, Beijing, China
- University of Chinese Academy of Sciences, Beijing, China
- CAS-JIC Centre of Excellence for Plant and Microbial Science (CEPAMS), Institute of Genetics and Developmental Biology, Chinese Academy of Sciences, Beijing, China
| |
Collapse
|
12
|
Zheng Y, Shang X. FindCSV: a long-read based method for detecting complex structural variations. BMC Bioinformatics 2024; 25:315. [PMID: 39342151 PMCID: PMC11439270 DOI: 10.1186/s12859-024-05937-w] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/24/2024] [Accepted: 09/18/2024] [Indexed: 10/01/2024] Open
Abstract
BACKGROUND Structural variations play a significant role in genetic diseases and evolutionary mechanisms. Extensive research has been conducted over the past decade to detect simple structural variations, leading to the development of well-established detection methods. However, recent studies have highlighted the potentially greater impact of complex structural variations on individuals compared to simple structural variations. Despite this, the field still lacks precise detection methods specifically designed for complex structural variations. Therefore, the development of a highly efficient and accurate detection method is of utmost importance. RESULT In response to this need, we propose a novel method called FindCSV, which leverages deep learning techniques and consensus sequences to enhance the detection of SVs using long-read sequencing data. Compared to current methods, FindCSV performs better in detecting complex and simple structural variations. CONCLUSIONS FindCSV is a new method to detect complex and simple structural variations with reasonable accuracy in real and simulated data. The source code for the program is available at https://github.com/nwpuzhengyan/FindCSV .
Collapse
Affiliation(s)
- Yan Zheng
- School of Computer Science, Northwestern Polytechnical University, West Youyi Road 127, Xi'an, 710072, China.
| | - Xuequn Shang
- School of Computer Science, Northwestern Polytechnical University, West Youyi Road 127, Xi'an, 710072, China.
| |
Collapse
|
13
|
Rubio-Alarcón C, Stelloo E, Vessies DCL, Erve IV, Mekkes NJ, Swennenhuis J, Lakbir S, van Bree EJ, Tijssen M, Diemen PDV, Lanfermeijer M, Linders T, van den Broek D, Punt CJA, Heringa J, Meijer GA, Abeln S, Feitsma H, Fijneman RJA. High Prevalence of Chromosomal Rearrangements and LINE Retrotranspositions Detected in Formalin-Fixed, Paraffin-Embedded Colorectal Cancer Tissue. J Mol Diagn 2024:S1525-1578(24)00206-X. [PMID: 39332629 DOI: 10.1016/j.jmoldx.2024.08.004] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/23/2024] [Revised: 07/04/2024] [Accepted: 08/23/2024] [Indexed: 09/29/2024] Open
Abstract
Structural variants (SVs) caused by chromosomal rearrangements in common fragile sites or long interspersed nuclear element (LINE) retrotranspositions are highly prevalent in colorectal cancer. However, methodology for the targeted detection of these SVs is lacking. This article reports the use of formalin-fixed paraffin-embedded targeted-locus capture (FFPE-TLC) sequencing as a novel technology for the targeted detection of tumor-specific SVs. Analysis of 29 FFPE colorectal tumor samples and 8 matched normal samples revealed tumor-specific SVs in 24 patients (83%), with a median of 2 SVs per patient (range, 1 to 21). A total of 104 SVs were found in the common fragile site-associated genes MACROD2, PRKN, FHIT, and WWOX in 18 patients (62%), and 39 SVs caused by three LINE transposable elements were found in 15 patients (52%). Tumor specificity of SVs was independently verified by Droplet Digital PCR of tumor tissue DNA, and their applicability as plasma circulating tumor DNA biomarkers was demonstrated. It was concluded that FFPE-TLC sequencing enables the detection of tumor-specific SVs caused by chromosomal rearrangements and LINE retrotranspositions in FFPE tissue. Therefore, FFPE-TLC sequencing facilitates the investigation of the biological and clinical effects of SVs using FFPE material from (retrospective) cohorts of cancer patients and has potential clinical applicability in the detection of SV biomarkers in the routine molecular diagnostics setting.
Collapse
Affiliation(s)
- Carmen Rubio-Alarcón
- Department of Pathology, The Netherlands Cancer Institute, Amsterdam, The Netherlands
| | - Ellen Stelloo
- Cergentis B.V., a Solvias company, Utrecht, The Netherlands
| | - Daan C L Vessies
- Department of Laboratory Medicine, The Netherlands Cancer Institute, Amsterdam, The Netherlands
| | - Iris Van't Erve
- Department of Pathology, The Netherlands Cancer Institute, Amsterdam, The Netherlands
| | - Nienke J Mekkes
- Bioinformatics Group, Department of Computer Science, Vrije Universiteit Amsterdam, Amsterdam, The Netherlands
| | | | - Soufyan Lakbir
- Department of Pathology, The Netherlands Cancer Institute, Amsterdam, The Netherlands; Bioinformatics Group, Department of Computer Science, Vrije Universiteit Amsterdam, Amsterdam, The Netherlands
| | - Elisabeth J van Bree
- Department of Pathology, The Netherlands Cancer Institute, Amsterdam, The Netherlands
| | - Marianne Tijssen
- Department of Pathology, The Netherlands Cancer Institute, Amsterdam, The Netherlands
| | - Pien Delis-van Diemen
- Department of Pathology, The Netherlands Cancer Institute, Amsterdam, The Netherlands
| | - Mirthe Lanfermeijer
- Department of Laboratory Medicine, The Netherlands Cancer Institute, Amsterdam, The Netherlands
| | - Theodora Linders
- Department of Laboratory Medicine, The Netherlands Cancer Institute, Amsterdam, The Netherlands
| | - Daan van den Broek
- Department of Laboratory Medicine, The Netherlands Cancer Institute, Amsterdam, The Netherlands
| | - Cornelis J A Punt
- Department of Epidemiology, Julius Center, University Medical Center Utrecht, Utrecht University, Utrecht, The Netherlands
| | - Jaap Heringa
- Bioinformatics Group, Department of Computer Science, Vrije Universiteit Amsterdam, Amsterdam, The Netherlands
| | - Gerrit A Meijer
- Department of Pathology, The Netherlands Cancer Institute, Amsterdam, The Netherlands
| | - Sanne Abeln
- Bioinformatics Group, Department of Computer Science, Vrije Universiteit Amsterdam, Amsterdam, The Netherlands; AI Technology for Life Group, Department of Information and Computing Sciences, Department of Biology, Utrecht University, Utrecht, The Netherlands
| | - Harma Feitsma
- Cergentis B.V., a Solvias company, Utrecht, The Netherlands
| | - Remond J A Fijneman
- Department of Pathology, The Netherlands Cancer Institute, Amsterdam, The Netherlands.
| |
Collapse
|
14
|
Su KJ, Qiu C, Greenbaum J, Zhang X, Liu A, Liu Y, Luo Z, Mungasavalli Gnanesh SS, Tian Q, Zhao LJ, Shen H, Deng HW. Genomic structural variations link multiple genes to bone mineral density in a multi-ethnic cohort study: Louisiana osteoporosis study. J Bone Miner Res 2024; 39:1474-1485. [PMID: 39167757 PMCID: PMC11425707 DOI: 10.1093/jbmr/zjae133] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Submit a Manuscript] [Subscribe] [Scholar Register] [Received: 03/21/2024] [Revised: 07/11/2024] [Accepted: 08/07/2024] [Indexed: 08/23/2024]
Abstract
Osteoporosis, characterized by low BMD, is a highly heritable metabolic bone disorder. Although single nucleotide variations (SNVs) have been extensively studied, they explain only a fraction of BMD heritability. Although genomic structural variations (SVs) are large-scale genomic alterations that contribute to genetic diversity in shaping phenotypic variations, the role of SVs in osteoporosis susceptibility remains poorly understood. This study aims to identify and prioritize genes that harbor BMD-related SVs. We performed whole genome sequencing on 4982 subjects from the Louisiana Osteoporosis Study. To obtain high-confidence SVs, the detection of SVs was performed using an ensemble approach. The SVs were tested for association with BMD variation at the hip (HIP), femoral neck (FNK), and lumbar spine (SPN), respectively. Additionally, we conducted co-occurrence analysis using multi-omics approaches to prioritize the identified genes based on their functional importance. Stratification was employed to explore the sex- and ethnicity-specific effects. We identified significant SV-BMD associations: 125 for FNK-BMD, 99 for SPN-BMD, and 83 for HIP-BMD. We observed SVs that were commonly associated with both FNK and HIP BMDs in our combined and stratified analyses. These SVs explain 13.3% to 19.1% of BMD variation. Novel bone-related genes emerged, including LINC02370, ZNF family genes, and ZDHHC family genes. Additionally, FMN2, carrying BMD-related deletions, showed associations with FNK or HIP BMDs, with sex-specific effects. The co-occurrence analysis prioritized an RNA gene LINC00494 and ZNF family genes positively associated with BMDs at different skeletal sites. Two potential causal genes, IBSP and SPP1, for osteoporosis were also identified. Our study uncovers new insights into genetic factors influencing BMD through SV analysis. We highlight BMD-related SVs, revealing a mix of shared and specific genetic influences across skeletal sites and gender or ethnicity. These findings suggest potential roles in osteoporosis pathophysiology, opening avenues for further research and therapeutic targets.
Collapse
Affiliation(s)
- Kuan-Jui Su
- Deming Department of Medicine, School of Medicine, Tulane Center for Biomedical Informatics and Genomics, Tulane University, New Orleans, LA 70112, United States
| | - Chuan Qiu
- Deming Department of Medicine, School of Medicine, Tulane Center for Biomedical Informatics and Genomics, Tulane University, New Orleans, LA 70112, United States
| | - Jonathan Greenbaum
- Deming Department of Medicine, School of Medicine, Tulane Center for Biomedical Informatics and Genomics, Tulane University, New Orleans, LA 70112, United States
| | - Xiao Zhang
- Deming Department of Medicine, School of Medicine, Tulane Center for Biomedical Informatics and Genomics, Tulane University, New Orleans, LA 70112, United States
| | - Anqi Liu
- Deming Department of Medicine, School of Medicine, Tulane Center for Biomedical Informatics and Genomics, Tulane University, New Orleans, LA 70112, United States
| | - Yong Liu
- Deming Department of Medicine, School of Medicine, Tulane Center for Biomedical Informatics and Genomics, Tulane University, New Orleans, LA 70112, United States
| | - Zhe Luo
- Deming Department of Medicine, School of Medicine, Tulane Center for Biomedical Informatics and Genomics, Tulane University, New Orleans, LA 70112, United States
| | - Shashank Sajjan Mungasavalli Gnanesh
- Deming Department of Medicine, School of Medicine, Tulane Center for Biomedical Informatics and Genomics, Tulane University, New Orleans, LA 70112, United States
| | - Qing Tian
- Deming Department of Medicine, School of Medicine, Tulane Center for Biomedical Informatics and Genomics, Tulane University, New Orleans, LA 70112, United States
| | - Lan-Juan Zhao
- Deming Department of Medicine, School of Medicine, Tulane Center for Biomedical Informatics and Genomics, Tulane University, New Orleans, LA 70112, United States
| | - Hui Shen
- Deming Department of Medicine, School of Medicine, Tulane Center for Biomedical Informatics and Genomics, Tulane University, New Orleans, LA 70112, United States
| | - Hong-Wen Deng
- Deming Department of Medicine, School of Medicine, Tulane Center for Biomedical Informatics and Genomics, Tulane University, New Orleans, LA 70112, United States
| |
Collapse
|
15
|
Mascher M, Jayakodi M, Shim H, Stein N. Promises and challenges of crop translational genomics. Nature 2024:10.1038/s41586-024-07713-5. [PMID: 39313530 DOI: 10.1038/s41586-024-07713-5] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/27/2023] [Accepted: 06/13/2024] [Indexed: 09/25/2024]
Abstract
Crop translational genomics applies breeding techniques based on genomic datasets to improve crops. Technological breakthroughs in the past ten years have made it possible to sequence the genomes of increasing numbers of crop varieties and have assisted in the genetic dissection of crop performance. However, translating research findings to breeding applications remains challenging. Here we review recent progress and future prospects for crop translational genomics in bringing results from the laboratory to the field. Genetic mapping, genomic selection and sequence-assisted characterization and deployment of plant genetic resources utilize rapid genotyping of large populations. These approaches have all had an impact on breeding for qualitative traits, where single genes with large phenotypic effects exert their influence. Characterization of the complex genetic architectures that underlie quantitative traits such as yield and flowering time, especially in newly domesticated crops, will require further basic research, including research into regulation and interactions of genes and the integration of genomic approaches and high-throughput phenotyping, before targeted interventions can be designed. Future priorities for translation include supporting genomics-assisted breeding in low-income countries and adaptation of crops to changing environments.
Collapse
Affiliation(s)
- Martin Mascher
- Leibniz Institute of Plant Genetics and Crop Plant Research (IPK), Gatersleben, Germany.
- German Centre for Integrative Biodiversity Research (iDiv) Halle-Jena-Leipzig, Leipzig, Germany.
| | - Murukarthick Jayakodi
- Leibniz Institute of Plant Genetics and Crop Plant Research (IPK), Gatersleben, Germany
| | - Hyeonah Shim
- Department of Agriculture, Forestry and Bioresources, Plant Genomics and Breeding Institute, Research Institute of Agriculture and Life Sciences, College of Agriculture and Life Sciences, Seoul National University, Seoul, Korea
| | - Nils Stein
- Leibniz Institute of Plant Genetics and Crop Plant Research (IPK), Gatersleben, Germany.
- Martin Luther University Halle-Wittenberg, Halle, Germany.
| |
Collapse
|
16
|
Vilkaite G, Vogel J, Mattsson-Carlgren N. Integrating amyloid and tau imaging with proteomics and genomics in Alzheimer's disease. Cell Rep Med 2024; 5:101735. [PMID: 39293391 DOI: 10.1016/j.xcrm.2024.101735] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/30/2024] [Revised: 07/28/2024] [Accepted: 08/20/2024] [Indexed: 09/20/2024]
Abstract
Alzheimer's disease (AD) is the most common neurodegenerative disease and is characterized by the aggregation of β-amyloid (Aβ) and tau in the brain. Breakthroughs in disease-modifying treatments targeting Aβ bring new hope for the management of AD. But to effectively modify and someday even prevent AD, a better understanding is needed of the biological mechanisms that underlie and link Aβ and tau in AD. Developments of high-throughput omics, including genomics, proteomics, and transcriptomics, together with molecular imaging of Aβ and tau with positron emission tomography (PET), allow us to discover and understand the biological pathways that regulate the aggregation and spread of Aβ and tau in living humans. The field of integrated omics and PET studies of Aβ and tau in AD is growing rapidly. We here provide an update of this field, both in terms of biological insights and in terms of future clinical implications of integrated omics-molecular imaging studies.
Collapse
Affiliation(s)
- Gabriele Vilkaite
- Department of Clinical Sciences Malmö, SciLifeLab, Lund University, Lund, Sweden
| | - Jacob Vogel
- Department of Clinical Sciences Malmö, SciLifeLab, Lund University, Lund, Sweden
| | - Niklas Mattsson-Carlgren
- Clinical Memory Research Unit, Department of Clinical Sciences Malmö, Lund University, Lund, Sweden; Department of Neurology, Skåne University Hospital, Lund University, Lund, Sweden; Wallenberg Center for Molecular Medicine, Lund University, Lund, Sweden.
| |
Collapse
|
17
|
Boyanton BL, Zarate YA, Broadfoot BG, Kelly T, Crawford BD. NR3C2 microdeletions-an underrecognized cause of pseudohypoaldosteronism type 1A: a case report and literature review. Lab Med 2024; 55:640-644. [PMID: 38493321 DOI: 10.1093/labmed/lmae005] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 03/18/2024] Open
Abstract
OBJECTIVES Pseudohypoaldosteronism type 1A (PHA1A) is caused by haploinsufficiency of the mineralocorticoid receptor (MR). Heterozygous small insertions/deletions, transitions, and/or transversions within NR3C2 comprise the majority (85%-90%) of pathogenic copy number variants. Structural chromosomal abnormalities, contiguous gene deletion syndromes, and microdeletions are infrequent. We describe a neonate with PHA1A due to a novel NR3C2 microdeletion involving exons 1-2. METHODS Literature review identified 39 individuals with PHA1A due to NR3C2 microdeletions. Transmission modality, variant description(s), testing method(s), exon(s) deleted, and affected functional domain(s) were characterized. RESULTS In total, 40 individuals with NR3C2 microdeletions were described: 19 involved contiguous exons encoding a single MR domain; 21 involved contiguous exons encoding multiple MR domains. Transmission modality frequency was familial (65%), de novo (20%), or unknown (15%). Sequencing (Sanger or short-read next-generation) failed to detect microdeletions in 100% of tested individuals (n = 38). All were detected using deletion/duplication testing modalities. In 2 individuals, only microarray-based testing was performed; microdeletions were detected in both cases. CONCLUSION Initial testing for PHA1A should rely on sequencing to detect the most common genetic alterations. Deletion/duplication analysis should be performed when initial testing is nondiagnostic. Most NR3C2 microdeletions are parentally transmitted, thus highlighting the importance of familial genetic testing and counseling.
Collapse
Affiliation(s)
- Bobby L Boyanton
- Department of Pathology, University of Arkansas for Medical Sciences and Arkansas Children's Hospital, Little Rock, AR, US
| | - Yuri A Zarate
- Department of Pediatrics, Section of Genetics and Metabolism, University of Arkansas for Medical Sciences and Arkansas Children's Hospital, Little Rock, AR, US
- Division of Genetics and Metabolism, University of Kentucky, Lexington, KY, US
| | - Brannon G Broadfoot
- Department of Pathology, University of Arkansas for Medical Sciences and Arkansas Children's Hospital, Little Rock, AR, US
| | - Thomas Kelly
- Department of Pathology, University of Arkansas for Medical Sciences and Arkansas Children's Hospital, Little Rock, AR, US
| | - Brendan D Crawford
- Department of Medicine, Division of Pediatric Nephrology, University of Arkansas for Medical Sciences and Arkansas Children's Hospital, Little Rock, AR, US
| |
Collapse
|
18
|
Köroğlu Ç, Chen P, Traurig M, Altok S, Bogardus C, Baier LJ. De Novo Genome Assemblies From Two Indigenous Americans from Arizona Identify New Polymorphisms in Non-Reference Sequences. Genome Biol Evol 2024; 16:evae188. [PMID: 39190003 PMCID: PMC11384899 DOI: 10.1093/gbe/evae188] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/08/2023] [Revised: 05/17/2024] [Accepted: 08/22/2024] [Indexed: 08/28/2024] Open
Abstract
There is a collective push to diversify human genetic studies by including underrepresented populations. However, analyzing DNA sequence reads involves the initial step of aligning the reads to the GRCh38/hg38 reference genome which is inadequate for non-European ancestries. In this study, using long-read sequencing technology, we constructed de novo genome assemblies from two indigenous Americans from Arizona (IAZ). Each assembly included ∼17 Mb of DNA sequence not present [nonreference sequence (NRS)] in hg38, which consists mostly of repeat elements. Forty NRSs totaling 240 kb were uniquely anchored to the hg38 primary assembly generating a modified hg38-NRS reference genome. DNA sequence alignment and variant calling were then conducted with whole-genome sequencing (WGS) sequencing data from 387 IAZ using both the hg38 and modified hg38-NRS reference maps. Variant calling with the hg38-NRS map identified ∼50,000 single-nucleotide variants present in at least 5% of the WGS samples which were not detected with the hg38 reference map. We also directly assessed the NRSs positioned within genes. Seventeen NRSs anchored to regions including an identical 187 bp NRS found in both de novo assemblies. The NRS is located in HCN2 79 bp downstream of Exon 3 and contains several putative transcriptional regulatory elements. Genotyping of the HCN2-NRS revealed that the insertion is enriched in IAZ (minor allele frequency = 0.45) compared to other reference populations tested. This study shows that inclusion of population-specific NRSs can dramatically change the variant profile in an underrepresented ethnic groups and thereby lead to the discovery of previously missed common variations.
Collapse
Affiliation(s)
- Çiğdem Köroğlu
- Diabetes Molecular Genetics Section, Phoenix Epidemiology and Clinical Research Branch, National Institute of Diabetes and Digestive and Kidney Diseases, National Institutes of Health, Phoenix, AZ 85004, USA
| | - Peng Chen
- Diabetes Molecular Genetics Section, Phoenix Epidemiology and Clinical Research Branch, National Institute of Diabetes and Digestive and Kidney Diseases, National Institutes of Health, Phoenix, AZ 85004, USA
| | - Michael Traurig
- Diabetes Molecular Genetics Section, Phoenix Epidemiology and Clinical Research Branch, National Institute of Diabetes and Digestive and Kidney Diseases, National Institutes of Health, Phoenix, AZ 85004, USA
| | - Serdar Altok
- Diabetes Molecular Genetics Section, Phoenix Epidemiology and Clinical Research Branch, National Institute of Diabetes and Digestive and Kidney Diseases, National Institutes of Health, Phoenix, AZ 85004, USA
| | - Clifton Bogardus
- Diabetes Molecular Genetics Section, Phoenix Epidemiology and Clinical Research Branch, National Institute of Diabetes and Digestive and Kidney Diseases, National Institutes of Health, Phoenix, AZ 85004, USA
| | - Leslie J Baier
- Diabetes Molecular Genetics Section, Phoenix Epidemiology and Clinical Research Branch, National Institute of Diabetes and Digestive and Kidney Diseases, National Institutes of Health, Phoenix, AZ 85004, USA
| |
Collapse
|
19
|
Engelbrecht E, Rodriguez OL, Watson CT. Addressing Technical Pitfalls in Pursuit of Molecular Factors That Mediate Immunoglobulin Gene Regulation. JOURNAL OF IMMUNOLOGY (BALTIMORE, MD. : 1950) 2024; 213:651-662. [PMID: 39007649 PMCID: PMC11333172 DOI: 10.4049/jimmunol.2400131] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Subscribe] [Scholar Register] [Received: 03/08/2024] [Accepted: 06/13/2024] [Indexed: 07/16/2024]
Abstract
The expressed Ab repertoire is a critical determinant of immune-related phenotypes. Ab-encoding transcripts are distinct from other expressed genes because they are transcribed from somatically rearranged gene segments. Human Abs are composed of two identical H and L chain polypeptides derived from genes in IGH locus and one of two L chain loci. The combinatorial diversity that results from Ab gene rearrangement and the pairing of different H and L chains contributes to the immense diversity of the baseline Ab repertoire. During rearrangement, Ab gene selection is mediated by factors that influence chromatin architecture, promoter/enhancer activity, and V(D)J recombination. Interindividual variation in the composition of the Ab repertoire associates with germline variation in IGH, implicating polymorphism in Ab gene regulation. Determining how IGH variants directly mediate gene regulation will require integration of these variants with other functional genomic datasets. In this study, we argue that standard approaches using short reads have limited utility for characterizing regulatory regions in IGH at haplotype resolution. Using simulated and chromatin immunoprecipitation sequencing reads, we define features of IGH that limit use of short reads and a single reference genome, namely 1) the highly duplicated nature of the DNA sequence in IGH and 2) structural polymorphisms that are frequent in the population. We demonstrate that personalized diploid references enhance performance of short-read data for characterizing mappable portions of the locus, while also showing that long-read profiling tools will ultimately be needed to fully resolve functional impacts of IGH germline variation on expressed Ab repertoires.
Collapse
Affiliation(s)
- Eric Engelbrecht
- Department of Biochemistry and Molecular Genetics, University of Louisville, Louisville, KY
| | - Oscar L Rodriguez
- Department of Biochemistry and Molecular Genetics, University of Louisville, Louisville, KY
| | - Corey T Watson
- Department of Biochemistry and Molecular Genetics, University of Louisville, Louisville, KY
| |
Collapse
|
20
|
Tang X, Berger MF, Solit DB. Precision oncology: current and future platforms for treatment selection. Trends Cancer 2024; 10:781-791. [PMID: 39030146 DOI: 10.1016/j.trecan.2024.06.009] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/01/2024] [Revised: 06/20/2024] [Accepted: 06/21/2024] [Indexed: 07/21/2024]
Abstract
Genomic profiling of hundreds of cancer-associated genes is now a component of routine cancer care. DNA sequencing can identify mutations, mutational signatures, and structural alterations predictive of therapy response and assess for heritable cancer risk, but it has been less useful for identifying predictive biomarkers of sensitivity to cytotoxic chemotherapies, antibody drug conjugates, and immunotherapies. The clinical adoption of molecular profiling platforms such as RNA sequencing better suited to identifying those patients most likely to respond to immunotherapies and drug combinations will be critical to expanding the benefits of precision oncology. This review discusses the potential advantages of innovative molecular and functional profiling platforms designed to replace or complement targeted DNA sequencing and the major hurdles to their clinical adoption.
Collapse
Affiliation(s)
- Xinran Tang
- Human Oncology and Pathogenesis Program, Memorial Sloan Kettering Cancer Center, New York, NY 10065, USA; Graduate School of Medical Sciences, Weill Cornell Medicine, New York, NY 10065, USA
| | - Michael F Berger
- Human Oncology and Pathogenesis Program, Memorial Sloan Kettering Cancer Center, New York, NY 10065, USA; Marie-Josée and Henry R. Kravis Center for Molecular Oncology, Memorial Sloan Kettering Cancer Center, New York, NY 10065, USA; Department of Pathology and Laboratory Medicine, Memorial Sloan Kettering Cancer Center, New York, NY 10065, USA
| | - David B Solit
- Human Oncology and Pathogenesis Program, Memorial Sloan Kettering Cancer Center, New York, NY 10065, USA; Marie-Josée and Henry R. Kravis Center for Molecular Oncology, Memorial Sloan Kettering Cancer Center, New York, NY 10065, USA; Department of Medicine, Memorial Sloan Kettering Cancer Center, New York, NY, USA.
| |
Collapse
|
21
|
Tsai CY, Hsu JSJ, Chen PL, Wu CC. Implementing next-generation sequencing for diagnosis and management of hereditary hearing impairment: a comprehensive review. Expert Rev Mol Diagn 2024; 24:753-765. [PMID: 39194060 DOI: 10.1080/14737159.2024.2396866] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/14/2024] [Accepted: 08/22/2024] [Indexed: 08/29/2024]
Abstract
INTRODUCTION Sensorineural hearing impairment (SNHI), a common childhood disorder with heterogeneous genetic causes, can lead to delayed language development and psychosocial problems. Next-generation sequencing (NGS) offers high-throughput screening and high-sensitivity detection of genetic etiologies of SNHI, enabling clinicians to make informed medical decisions, provide tailored treatments, and improve prognostic outcomes. AREAS COVERED This review covers the diverse etiologies of HHI and the utility of different NGS modalities (targeted sequencing and whole exome/genome sequencing), and includes HHI-related studies on newborn screening, genetic counseling, prognostic prediction, and personalized treatment. Challenges such as the trade-off between cost and diagnostic yield, detection of structural variants, and exploration of the non-coding genome are also highlighted. EXPERT OPINION In the current landscape of NGS-based diagnostics for HHI, there are both challenges (e.g. detection of structural variants and non-coding genome variants) and opportunities (e.g. the emergence of medical artificial intelligence tools). The authors advocate the use of technological advances such as long-read sequencing for structural variant detection, multi-omics analysis for non-coding variant exploration, and medical artificial intelligence for pathogenicity assessment and outcome prediction. By integrating these innovations into clinical practice, precision medicine in the diagnosis and management of HHI can be further improved.
Collapse
Affiliation(s)
- Cheng-Yu Tsai
- Graduate Institute of Medical Genomics and Proteomics, National Taiwan University College of Medicine, Taipei, Taiwan
- Department of Otolaryngology, National Taiwan University Hospital, Taipei, Taiwan
| | - Jacob Shu-Jui Hsu
- Graduate Institute of Medical Genomics and Proteomics, National Taiwan University College of Medicine, Taipei, Taiwan
| | - Pei-Lung Chen
- Graduate Institute of Medical Genomics and Proteomics, National Taiwan University College of Medicine, Taipei, Taiwan
- Graduate Institute of Clinical Medicine, National Taiwan University College of Medicine, Taipei, Taiwan
- Institute of Molecular Medicine, National Taiwan University College of Medicine, Taipei, Taiwan
- Department of Medical Genetics, National Taiwan University Hospital, Taipei, Taiwan
| | - Chen-Chi Wu
- Department of Otolaryngology, National Taiwan University Hospital, Taipei, Taiwan
- Graduate Institute of Clinical Medicine, National Taiwan University College of Medicine, Taipei, Taiwan
- Department of Medical Research, National Taiwan University Hospital Hsin-Chu Branch, Hsinchu, Taiwan
- Department of Otolaryngology, National Taiwan University Hospital Hsin-Chu Branch, Hsinchu, Taiwan
| |
Collapse
|
22
|
Ren J, Gao Z, Lu Y, Li M, Hong J, Wu J, Wu D, Deng W, Xi D, Chong Y. Application of GWAS and mGWAS in Livestock and Poultry Breeding. Animals (Basel) 2024; 14:2382. [PMID: 39199916 PMCID: PMC11350712 DOI: 10.3390/ani14162382] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/18/2024] [Revised: 08/04/2024] [Accepted: 08/15/2024] [Indexed: 09/01/2024] Open
Abstract
In recent years, genome-wide association studies (GWAS) and metabolome genome-wide association studies (mGWAS) have emerged as crucial methods for investigating complex traits in animals and plants. These have played pivotal roles in research on livestock and poultry breeding, facilitating a deeper understanding of genetic diversity, the relationship between genes, and genetic bases in livestock and poultry. This article provides a review of the applications of GWAS and mGWAS in animal genetic breeding, aiming to offer reference and inspiration for relevant researchers, promote innovation in animal genetic improvement and breeding methods, and contribute to the sustainable development of animal husbandry.
Collapse
Affiliation(s)
- Jing Ren
- Key Laboratory of Animal Genetics, Breeding and Reproduction in the Plateau Mountainous Region, Ministry of Education, Guizhou University, Guiyang 550025, China;
- Yunnan Provincial Key Laboratory of Animal Nutrition and Feed, Faculty of Animal Science and Technology, Yunnan Agricultural University, Kunming 650201, China; (Z.G.); : (M.L.); (J.H.); (J.W.); (D.W.); (W.D.)
| | - Zhendong Gao
- Yunnan Provincial Key Laboratory of Animal Nutrition and Feed, Faculty of Animal Science and Technology, Yunnan Agricultural University, Kunming 650201, China; (Z.G.); : (M.L.); (J.H.); (J.W.); (D.W.); (W.D.)
| | - Ying Lu
- Yunnan Provincial Key Laboratory of Animal Nutrition and Feed, Faculty of Animal Science and Technology, Yunnan Agricultural University, Kunming 650201, China; (Z.G.); : (M.L.); (J.H.); (J.W.); (D.W.); (W.D.)
| | - Mengfei Li
- Yunnan Provincial Key Laboratory of Animal Nutrition and Feed, Faculty of Animal Science and Technology, Yunnan Agricultural University, Kunming 650201, China; (Z.G.); : (M.L.); (J.H.); (J.W.); (D.W.); (W.D.)
| | - Jieyun Hong
- Yunnan Provincial Key Laboratory of Animal Nutrition and Feed, Faculty of Animal Science and Technology, Yunnan Agricultural University, Kunming 650201, China; (Z.G.); : (M.L.); (J.H.); (J.W.); (D.W.); (W.D.)
| | - Jiao Wu
- Yunnan Provincial Key Laboratory of Animal Nutrition and Feed, Faculty of Animal Science and Technology, Yunnan Agricultural University, Kunming 650201, China; (Z.G.); : (M.L.); (J.H.); (J.W.); (D.W.); (W.D.)
| | - Dongwang Wu
- Yunnan Provincial Key Laboratory of Animal Nutrition and Feed, Faculty of Animal Science and Technology, Yunnan Agricultural University, Kunming 650201, China; (Z.G.); : (M.L.); (J.H.); (J.W.); (D.W.); (W.D.)
| | - Weidong Deng
- Yunnan Provincial Key Laboratory of Animal Nutrition and Feed, Faculty of Animal Science and Technology, Yunnan Agricultural University, Kunming 650201, China; (Z.G.); : (M.L.); (J.H.); (J.W.); (D.W.); (W.D.)
| | - Dongmei Xi
- Yunnan Provincial Key Laboratory of Animal Nutrition and Feed, Faculty of Animal Science and Technology, Yunnan Agricultural University, Kunming 650201, China; (Z.G.); : (M.L.); (J.H.); (J.W.); (D.W.); (W.D.)
| | - Yuqing Chong
- Yunnan Provincial Key Laboratory of Animal Nutrition and Feed, Faculty of Animal Science and Technology, Yunnan Agricultural University, Kunming 650201, China; (Z.G.); : (M.L.); (J.H.); (J.W.); (D.W.); (W.D.)
| |
Collapse
|
23
|
Nikjoo H, Rahmanian S, Taleei R. Modelling DNA damage-repair and beyond. PROGRESS IN BIOPHYSICS AND MOLECULAR BIOLOGY 2024; 190:1-18. [PMID: 38754703 DOI: 10.1016/j.pbiomolbio.2024.05.002] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Received: 11/15/2023] [Revised: 03/27/2024] [Accepted: 05/10/2024] [Indexed: 05/18/2024]
Abstract
The paper presents a review of mechanistic modelling studies of DNA damage and DNA repair, and consequences to follow in mammalian cell nucleus. We hypothesize DNA deletions are consequences of repair of double strand breaks leading to the modifications of genome that play crucial role in long term development of genetic inheritance and diseases. The aim of the paper is to review formation mechanisms underlying naturally occurring DNA deletions in the human genome and their potential relevance for bridging the gap between induced DNA double strand breaks and deletions in damaged human genome from endogenous and exogenous events. The model of the cell nucleus presented enables simulation of DNA damage at molecular level identifying the spectrum of damage induced in all chromosomal territories and loops. Our mechanistic modelling of DNA repair for double stand breaks (DSB), single strand breaks (SSB) and base damage (BD), shows the complexity of DNA damage is responsible for the longer repair times and the reason for the biphasic feature of mammalian cells repair curves. In the absence of experimentally determined data, the mechanistic model of repair predicts the in vivo rate constants for the proteins involved in the repair of DSB, SSB, and of BD.
Collapse
Affiliation(s)
- Hooshang Nikjoo
- Department of Physiology, Anatomy and Genetics (DPAG), Oxford University, Oxford, OX1 3PT, UK.
| | | | - Reza Taleei
- Medical Physics Division, Department of Radiation Oncology Sidney Kimmel Medical College at Thomas Jefferson University, Philadelphia, PA, 19107, USA.
| |
Collapse
|
24
|
Taylor DJ, Eizenga JM, Li Q, Das A, Jenike KM, Kenny EE, Miga KH, Monlong J, McCoy RC, Paten B, Schatz MC. Beyond the Human Genome Project: The Age of Complete Human Genome Sequences and Pangenome References. Annu Rev Genomics Hum Genet 2024; 25:77-104. [PMID: 38663087 PMCID: PMC11451085 DOI: 10.1146/annurev-genom-021623-081639] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 08/29/2024]
Abstract
The Human Genome Project was an enormous accomplishment, providing a foundation for countless explorations into the genetics and genomics of the human species. Yet for many years, the human genome reference sequence remained incomplete and lacked representation of human genetic diversity. Recently, two major advances have emerged to address these shortcomings: complete gap-free human genome sequences, such as the one developed by the Telomere-to-Telomere Consortium, and high-quality pangenomes, such as the one developed by the Human Pangenome Reference Consortium. Facilitated by advances in long-read DNA sequencing and genome assembly algorithms, complete human genome sequences resolve regions that have been historically difficult to sequence, including centromeres, telomeres, and segmental duplications. In parallel, pangenomes capture the extensive genetic diversity across populations worldwide. Together, these advances usher in a new era of genomics research, enhancing the accuracy of genomic analysis, paving the path for precision medicine, and contributing to deeper insights into human biology.
Collapse
Affiliation(s)
- Dylan J Taylor
- Department of Biology, Johns Hopkins University, Baltimore, Maryland, USA; , ,
| | - Jordan M Eizenga
- Genomics Institute, University of California, Santa Cruz, California, USA; , ,
| | - Qiuhui Li
- Department of Computer Science, Johns Hopkins University, Baltimore, Maryland, USA; ,
| | - Arun Das
- Department of Computer Science, Johns Hopkins University, Baltimore, Maryland, USA; ,
| | - Katharine M Jenike
- Department of Genetic Medicine, Johns Hopkins University School of Medicine, Baltimore, Maryland, USA;
| | - Eimear E Kenny
- Institute for Genomic Health, Icahn School of Medicine at Mount Sinai, New York, NY, USA;
| | - Karen H Miga
- Department of Biomolecular Engineering, University of California, Santa Cruz, California, USA
- Genomics Institute, University of California, Santa Cruz, California, USA; , ,
| | - Jean Monlong
- Institut de Recherche en Santé Digestive, Université de Toulouse, INSERM, INRA, ENVT, UPS, Toulouse, France;
| | - Rajiv C McCoy
- Department of Biology, Johns Hopkins University, Baltimore, Maryland, USA; , ,
| | - Benedict Paten
- Department of Biomolecular Engineering, University of California, Santa Cruz, California, USA
- Genomics Institute, University of California, Santa Cruz, California, USA; , ,
| | - Michael C Schatz
- Department of Computer Science, Johns Hopkins University, Baltimore, Maryland, USA; ,
- Department of Biology, Johns Hopkins University, Baltimore, Maryland, USA; , ,
| |
Collapse
|
25
|
Yuan N, Jia P. Comprehensive assessment of long-read sequencing platforms and calling algorithms for detection of copy number variation. Brief Bioinform 2024; 25:bbae441. [PMID: 39256200 PMCID: PMC11387058 DOI: 10.1093/bib/bbae441] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/01/2024] [Revised: 07/09/2024] [Accepted: 08/25/2024] [Indexed: 09/12/2024] Open
Abstract
Copy number variations (CNVs) play pivotal roles in disease susceptibility and have been intensively investigated in human disease studies. Long-read sequencing technologies offer opportunities for comprehensive structural variation (SV) detection, and numerous methodologies have been developed recently. Consequently, there is a pressing need to assess these methods and aid researchers in selecting appropriate techniques for CNV detection using long-read sequencing. Hence, we conducted an evaluation of eight CNV calling methods across 22 datasets from nine publicly available samples and 15 simulated datasets, covering multiple sequencing platforms. The overall performance of CNV callers varied substantially and was influenced by the input dataset type, sequencing depth, and CNV type, among others. Specifically, the PacBio CCS sequencing platform outperformed PacBio CLR and Nanopore platforms regarding CNV detection recall rates. A sequencing depth of 10x demonstrated the capability to identify 85% of the CNVs detected in a 50x dataset. Moreover, deletions were more generally detectable than duplications. Among the eight benchmarked methods, cuteSV, Delly, pbsv, and Sniffles2 demonstrated superior accuracy, while SVIM exhibited high recall rates.
Collapse
Affiliation(s)
- Na Yuan
- National Genomics Data Center, China National Center for Bioinformation, Beichen West Road, Chaoyang District, Beijing 100101, China
- Beijing Institute of Genomics, Chinese Academy of Sciences, Beichen West Road, Chaoyang District, Beijing 100101, China
| | - Peilin Jia
- National Genomics Data Center, China National Center for Bioinformation, Beichen West Road, Chaoyang District, Beijing 100101, China
- Beijing Institute of Genomics, Chinese Academy of Sciences, Beichen West Road, Chaoyang District, Beijing 100101, China
| |
Collapse
|
26
|
Sund KL, Liu J, Lee J, Garbe J, Abdelhamed Z, Maag C, Hallinan B, Wu SW, Sperry E, Deshpande A, Stottmann R, Smolarek TA, Dyer LM, Hestand MS. Long-read sequencing and optical genome mapping identify causative gene disruptions in noncoding sequence in two patients with neurologic disease and known chromosome abnormalities. Am J Med Genet A 2024:e63818. [PMID: 39041659 DOI: 10.1002/ajmg.a.63818] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/07/2024] [Revised: 06/12/2024] [Accepted: 07/07/2024] [Indexed: 07/24/2024]
Abstract
Despite advances in next generation sequencing (NGS), genetic diagnoses remain elusive for many patients with neurologic syndromes. Long-read sequencing (LRS) and optical genome mapping (OGM) technologies improve upon existing capabilities in the detection and interpretation of structural variation in repetitive DNA, on a single haplotype, while also providing enhanced breakpoint resolution. We performed LRS and OGM on two patients with known chromosomal rearrangements and inconclusive Sanger or NGS. The first patient, who had epilepsy and developmental delay, had a complex translocation between two chromosomes that included insertion and inversion events. The second patient, who had a movement disorder, had an inversion on a single chromosome disrupted by multiple smaller inversions and insertions. Sequence level resolution of the rearrangements identified pathogenic breaks in noncoding sequence in or near known disease-causing genes with relevant neurologic phenotypes (MBD5, NKX2-1). These specific variants have not been reported previously, but expected molecular consequences are consistent with previously reported cases. As the use of LRS and OGM technologies for clinical testing increases and data analyses become more standardized, these methods along with multiomic data to validate noncoding variation effects will improve diagnostic yield and increase the proportion of probands with detectable pathogenic variants for known genes implicated in neurogenetic disease.
Collapse
Affiliation(s)
- Kristen L Sund
- Division of Human Genetics, Cincinnati Children's Hospital Medical Center, Cincinnati, Ohio, USA
| | - Jie Liu
- Division of Human Genetics, Cincinnati Children's Hospital Medical Center, Cincinnati, Ohio, USA
- Department of Pediatrics, University of Cincinnati, Cincinnati, Ohio, USA
| | - Joyce Lee
- Bionano Genomics, San Diego, California, USA
| | - John Garbe
- University of Minnesota Genomics Center, University of Minnesota, Minneapolis, Minnesota, USA
| | - Zakia Abdelhamed
- Division of Human Genetics, Cincinnati Children's Hospital Medical Center, Cincinnati, Ohio, USA
| | - Chelsey Maag
- Division of Human Genetics, Cincinnati Children's Hospital Medical Center, Cincinnati, Ohio, USA
| | - Barbara Hallinan
- Department of Pediatrics, University of Cincinnati, Cincinnati, Ohio, USA
- Division of Neurology, Cincinnati Children's Hospital Medical Center, Cincinnati, Ohio, USA
| | - Steven W Wu
- Department of Pediatrics, University of Cincinnati, Cincinnati, Ohio, USA
- Division of Neurology, Cincinnati Children's Hospital Medical Center, Cincinnati, Ohio, USA
| | - Ethan Sperry
- Division of Human Genetics, Cincinnati Children's Hospital Medical Center, Cincinnati, Ohio, USA
| | - Archana Deshpande
- University of Minnesota Genomics Center, University of Minnesota, Minneapolis, Minnesota, USA
| | - Rolf Stottmann
- Division of Human Genetics, Cincinnati Children's Hospital Medical Center, Cincinnati, Ohio, USA
- Department of Pediatrics, University of Cincinnati, Cincinnati, Ohio, USA
| | - Teresa A Smolarek
- Division of Human Genetics, Cincinnati Children's Hospital Medical Center, Cincinnati, Ohio, USA
- Department of Pediatrics, University of Cincinnati, Cincinnati, Ohio, USA
| | - Lisa M Dyer
- Division of Human Genetics, Cincinnati Children's Hospital Medical Center, Cincinnati, Ohio, USA
- Department of Pediatrics, University of Cincinnati, Cincinnati, Ohio, USA
| | - Matthew S Hestand
- Division of Human Genetics, Cincinnati Children's Hospital Medical Center, Cincinnati, Ohio, USA
- Department of Pediatrics, University of Cincinnati, Cincinnati, Ohio, USA
| |
Collapse
|
27
|
Junjun R, Zhengqian Z, Ying W, Jialiang W, Yongzhuang L. A comprehensive review of deep learning-based variant calling methods. Brief Funct Genomics 2024; 23:303-313. [PMID: 38366908 DOI: 10.1093/bfgp/elae003] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/01/2023] [Revised: 01/14/2024] [Accepted: 01/18/2023] [Indexed: 02/18/2024] Open
Abstract
Genome sequencing data have become increasingly important in the field of personalized medicine and diagnosis. However, accurately detecting genomic variations remains a challenging task. Traditional variation detection methods rely on manual inspection or predefined rules, which can be time-consuming and prone to errors. Consequently, deep learning-based approaches for variation detection have gained attention due to their ability to automatically learn genomic features that distinguish between variants. In our review, we discuss the recent advancements in deep learning-based algorithms for detecting small variations and structural variations in genomic data, as well as their advantages and limitations.
Collapse
Affiliation(s)
- Ren Junjun
- Harbin Institute of Technology, School of Computer Science and Technology, Harbin 150001, China
| | - Zhang Zhengqian
- Harbin Institute of Technology, School of Computer Science and Technology, Harbin 150001, China
| | - Wu Ying
- Harbin Institute of Technology, School of Computer Science and Technology, Harbin 150001, China
| | - Wang Jialiang
- Harbin Institute of Technology, School of Computer Science and Technology, Harbin 150001, China
| | - Liu Yongzhuang
- Harbin Institute of Technology, School of Computer Science and Technology, Harbin 150001, China
| |
Collapse
|
28
|
Liu Z, Xie Z, Li M. Comprehensive and deep evaluation of structural variation detection pipelines with third-generation sequencing data. Genome Biol 2024; 25:188. [PMID: 39010145 PMCID: PMC11247875 DOI: 10.1186/s13059-024-03324-5] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/27/2023] [Accepted: 06/26/2024] [Indexed: 07/17/2024] Open
Abstract
BACKGROUND Structural variation (SV) detection methods using third-generation sequencing data are widely employed, yet accurately detecting SVs remains challenging. Different methods often yield inconsistent results for certain SV types, complicating tool selection and revealing biases in detection. RESULTS This study comprehensively evaluates 53 SV detection pipelines using simulated and real data from PacBio (CLR: Continuous Long Read, CCS: Circular Consensus Sequencing) and Nanopore (ONT) platforms. We assess their performance in detecting various sizes and types of SVs, breakpoint biases, and genotyping accuracy with various sequencing depths. Notably, pipelines such as Minimap2-cuteSV2, NGMLR-SVIM, PBMM2-pbsv, Winnowmap-Sniffles2, and Winnowmap-SVision exhibit comparatively higher recall and precision. Our findings also show that combining multiple pipelines with the same aligner, like pbmm2 or winnowmap, can significantly enhance performance. The individual pipelines' detailed ranking and performance metrics can be viewed in a dynamic table: http://pmglab.top/SVPipelinesRanking . CONCLUSIONS This study comprehensively characterizes the strengths and weaknesses of numerous pipelines, providing valuable insights that can improve SV detection in third-generation sequencing data and inform SV annotation and function prediction.
Collapse
Affiliation(s)
- Zhi Liu
- Program in Bioinformatics, Zhongshan School of Medicine, The Fifth Affiliated Hospital, Sun Yat-Sen University, Guangzhou, China
- Key Laboratory of Tropical Disease Control (Sun Yat-Sen University), Ministry of Education, Guangzhou, China
| | - Zhi Xie
- State Key Laboratory of Ophthalmology, Zhongshan Ophthalmic Center, Sun Yat-Sen University, Guangzhou, China
| | - Miaoxin Li
- Program in Bioinformatics, Zhongshan School of Medicine, The Fifth Affiliated Hospital, Sun Yat-Sen University, Guangzhou, China.
- Key Laboratory of Tropical Disease Control (Sun Yat-Sen University), Ministry of Education, Guangzhou, China.
- Center for Precision Medicine, Sun Yat-Sen University, Guangzhou, China.
- Department of Psychiatry, The University of Hong Kong, Hong Kong, SAR, China.
- Guangdong Provincial Key Laboratory of Biomedical Imaging and Guangdong Provincial Engineering Research Center of Molecular Imaging, The Fifth Affiliated Hospital, Sun Yat-Sen University, Zhuhai, China.
| |
Collapse
|
29
|
Jia H, Tan S, Cai Y, Guo Y, Shen J, Zhang Y, Ma H, Zhang Q, Chen J, Qiao G, Ruan J, Zhang YE. Low-input PacBio sequencing generates high-quality individual fly genomes and characterizes mutational processes. Nat Commun 2024; 15:5644. [PMID: 38969648 PMCID: PMC11226609 DOI: 10.1038/s41467-024-49992-6] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/01/2023] [Accepted: 06/20/2024] [Indexed: 07/07/2024] Open
Abstract
Long-read sequencing, exemplified by PacBio, revolutionizes genomics, overcoming challenges like repetitive sequences. However, the high DNA requirement ( > 1 µg) is prohibitive for small organisms. We develop a low-input (100 ng), low-cost, and amplification-free library-generation method for PacBio sequencing (LILAP) using Tn5-based tagmentation and DNA circularization within one tube. We test LILAP with two Drosophila melanogaster individuals, and generate near-complete genomes, surpassing preexisting single-fly genomes. By analyzing variations in these two genomes, we characterize mutational processes: complex transpositions (transposon insertions together with extra duplications and/or deletions) prefer regions characterized by non-B DNA structures, and gene conversion of transposons occurs on both DNA and RNA levels. Concurrently, we generate two complete assemblies for the endosymbiotic bacterium Wolbachia in these flies and similarly detect transposon conversion. Thus, LILAP promises a broad PacBio sequencing adoption for not only mutational studies of flies and their symbionts but also explorations of other small organisms or precious samples.
Collapse
Affiliation(s)
- Hangxing Jia
- Key Laboratory of Zoological Systematics and Evolution, Institute of Zoology, Chinese Academy of Sciences, Beijing, China.
| | - Shengjun Tan
- Key Laboratory of Zoological Systematics and Evolution, Institute of Zoology, Chinese Academy of Sciences, Beijing, China.
| | - Yingao Cai
- Key Laboratory of Zoological Systematics and Evolution, Institute of Zoology, Chinese Academy of Sciences, Beijing, China
- University of Chinese Academy of Sciences, Beijing, China
| | - Yanyan Guo
- Key Laboratory of Zoological Systematics and Evolution, Institute of Zoology, Chinese Academy of Sciences, Beijing, China
- University of Chinese Academy of Sciences, Beijing, China
| | - Jieyu Shen
- Key Laboratory of Zoological Systematics and Evolution, Institute of Zoology, Chinese Academy of Sciences, Beijing, China
- University of Chinese Academy of Sciences, Beijing, China
| | - Yaqiong Zhang
- Key Laboratory of Zoological Systematics and Evolution, Institute of Zoology, Chinese Academy of Sciences, Beijing, China
| | - Huijing Ma
- Key Laboratory of Zoological Systematics and Evolution, Institute of Zoology, Chinese Academy of Sciences, Beijing, China
| | - Qingzhu Zhang
- Key Laboratory of Zoological Systematics and Evolution, Institute of Zoology, Chinese Academy of Sciences, Beijing, China
- University of Chinese Academy of Sciences, Beijing, China
| | - Jinfeng Chen
- University of Chinese Academy of Sciences, Beijing, China
- State Key Laboratory of Integrated Management of Pest Insects and Rodents, Institute of Zoology, Chinese Academy of Sciences, Beijing, China
| | - Gexia Qiao
- Key Laboratory of Zoological Systematics and Evolution, Institute of Zoology, Chinese Academy of Sciences, Beijing, China
- University of Chinese Academy of Sciences, Beijing, China
| | - Jue Ruan
- Shenzhen Branch, Guangdong Laboratory of Lingnan Modern Agriculture, Genome Analysis Laboratory of the Ministry of Agriculture and Rural Affairs, Agricultural Genomics Institute at Shenzhen, Chinese Academy of Agricultural Sciences, Shenzhen, China.
| | - Yong E Zhang
- Key Laboratory of Zoological Systematics and Evolution, Institute of Zoology, Chinese Academy of Sciences, Beijing, China.
- University of Chinese Academy of Sciences, Beijing, China.
| |
Collapse
|
30
|
Qiao G, Xu P, Guo T, He X, Yue Y, Yang B. Genome-wide detection of structural variation in some sheep breeds using whole-genome long-read sequencing data. J Anim Breed Genet 2024; 141:403-414. [PMID: 38247268 DOI: 10.1111/jbg.12846] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/08/2022] [Revised: 12/21/2023] [Accepted: 12/29/2023] [Indexed: 01/23/2024]
Abstract
Genomic structural variants (SVs) constitute a significant proportion of genetic variation in the genome. The rapid development of long-reads sequencing has facilitated the detection of long-fragment SVs. There is no published study to detect SVs using long-read data from sheep. We applied a long-read mapping approach to detect SVs and characterized a total of 30,771 insertions, deletions, inversions and translocations. We identified 716, 916, 842 and 303 specific SVs in Southdown sheep, Alpine merino sheep, Qilian White Tibetan sheep and Oula sheep, respectively. We annotated these SVs and found that these SV-related genes were primarily enriched in the well-established pathways involved in the regulation of the immune system, growth and development and environmental adaptability. We detected and annotated SVs based on NGS resequencing data to validate the accuracy based on third-generation detection. Moreover, five candidate SVs were verified using the PCR method in 50 sheep. Our study is the first to use a long-reads sequencing approach to construct a novel structural variation map in sheep. We have completed a preliminary exploration of the potential effects of SVs on sheep.
Collapse
Affiliation(s)
- Guoyan Qiao
- Lanzhou Institute of Husbandry and Pharmaceutical Sciences of Chinese Academy of Agricultural Sciences, Lanzhou, China
- College of Ecological Agriculture and Animal Husbandry, Qinghai Communications Technical College, Xining, China
| | - Pan Xu
- State Key Laboratory of Grassland Agro-Ecosystems, Key Laboratory of Grassland Livestock Industry Innovation, Ministry of Agriculture and Rural Affairs, Engineering Research Center of Grassland Industry, Ministry of Education, College of Pastoral Agriculture Science and Technology, Lanzhou University, Lanzhou, China
| | - Tingting Guo
- Lanzhou Institute of Husbandry and Pharmaceutical Sciences of Chinese Academy of Agricultural Sciences, Lanzhou, China
| | - Xue He
- Lanzhou Institute of Husbandry and Pharmaceutical Sciences of Chinese Academy of Agricultural Sciences, Lanzhou, China
| | - Yaojing Yue
- Lanzhou Institute of Husbandry and Pharmaceutical Sciences of Chinese Academy of Agricultural Sciences, Lanzhou, China
| | - Bohui Yang
- Lanzhou Institute of Husbandry and Pharmaceutical Sciences of Chinese Academy of Agricultural Sciences, Lanzhou, China
| |
Collapse
|
31
|
Shim Y, Koo YK, Shin S, Lee ST, Lee KA, Choi JR. Comparison of Optical Genome Mapping With Conventional Diagnostic Methods for Structural Variant Detection in Hematologic Malignancies. Ann Lab Med 2024; 44:324-334. [PMID: 38433573 PMCID: PMC10961627 DOI: 10.3343/alm.2023.0339] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/31/2023] [Revised: 11/21/2023] [Accepted: 02/13/2024] [Indexed: 03/05/2024] Open
Abstract
Background Structural variants (SVs) are currently analyzed using a combination of conventional methods; however, this approach has limitations. Optical genome mapping (OGM), an emerging technology for detecting SVs using a single-molecule strategy, has the potential to replace conventional methods. We compared OGM with conventional diagnostic methods for detecting SVs in various hematologic malignancies. Methods Residual bone marrow aspirates from 27 patients with hematologic malignancies in whom SVs were observed using conventional methods (chromosomal banding analysis, FISH, an RNA fusion panel, and reverse transcription PCR) were analyzed using OGM. The concordance between the OGM and conventional method results was evaluated. Results OGM showed concordance in 63% (17/27) and partial concordance in 37% (10/27) of samples. OGM detected 76% (52/68) of the total SVs correctly (concordance rate for each type of SVs: aneuploidies, 83% [15/18]; balanced translocation, 80% [12/15] unbalanced translocation, 54% [7/13] deletions, 81% [13/16]; duplications, 100% [2/2] inversion 100% [1/1]; insertion, 100% [1/1]; marker chromosome, 0% [0/1]; isochromosome, 100% [1/1]). Sixteen discordant results were attributed to the involvement of centromeric/telomeric regions, detection sensitivity, and a low mapping rate and coverage. OGM identified additional SVs, including submicroscopic SVs and novel fusions, in five cases. Conclusions OGM shows a high level of concordance with conventional diagnostic methods for the detection of SVs and can identify novel variants, suggesting its potential utility in enabling more comprehensive SV analysis in routine diagnostics of hematologic malignancies, although further studies and improvements are required.
Collapse
Affiliation(s)
- Yeeun Shim
- Brain Korea 21 PLUS Project for Medical Science, Yonsei University, Seoul, Korea
- MDxK (Molecular Diagnostics Korea), Inc., Gwacheon, Korea
| | - Yu-Kyung Koo
- Department of Laboratory Medicine, Yonsei University College of Medicine, Seoul, Korea
| | - Saeam Shin
- Department of Laboratory Medicine, Yonsei University College of Medicine, Seoul, Korea
| | - Seung-Tae Lee
- Department of Laboratory Medicine, Yonsei University College of Medicine, Seoul, Korea
- Dxome Co., Ltd., Seongnam, Korea
| | - Kyung-A Lee
- Department of Laboratory Medicine, Yonsei University College of Medicine, Seoul, Korea
| | - Jong Rak Choi
- Department of Laboratory Medicine, Yonsei University College of Medicine, Seoul, Korea
- Dxome Co., Ltd., Seongnam, Korea
| |
Collapse
|
32
|
Niu J, Wang W, Wang Z, Chen Z, Zhang X, Qin Z, Miao L, Yang Z, Xie C, Xin M, Peng H, Yao Y, Liu J, Ni Z, Sun Q, Guo W. Tagging large CNV blocks in wheat boosts digitalization of germplasm resources by ultra-low-coverage sequencing. Genome Biol 2024; 25:171. [PMID: 38951917 PMCID: PMC11218387 DOI: 10.1186/s13059-024-03315-6] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/31/2024] [Accepted: 06/18/2024] [Indexed: 07/03/2024] Open
Abstract
BACKGROUND The massive structural variations and frequent introgression highly contribute to the genetic diversity of wheat, while the huge and complex genome of polyploid wheat hinders efficient genotyping of abundant varieties towards accurate identification, management, and exploitation of germplasm resources. RESULTS We develop a novel workflow that identifies 1240 high-quality large copy number variation blocks (CNVb) in wheat at the pan-genome level, demonstrating that CNVb can serve as an ideal DNA fingerprinting marker for discriminating massive varieties, with the accuracy validated by PCR assay. We then construct a digitalized genotyping CNVb map across 1599 global wheat accessions. Key CNVb markers are linked with trait-associated introgressions, such as the 1RS·1BL translocation and 2NvS translocation, and the beneficial alleles, such as the end-use quality allele Glu-D1d (Dx5 + Dy10) and the semi-dwarf r-e-z allele. Furthermore, we demonstrate that these tagged CNVb markers promote a stable and cost-effective strategy for evaluating wheat germplasm resources with ultra-low-coverage sequencing data, competing with SNP array for applications such as evaluating new varieties, efficient management of collections in gene banks, and describing wheat germplasm resources in a digitalized manner. We also develop a user-friendly interactive platform, WheatCNVb ( http://wheat.cau.edu.cn/WheatCNVb/ ), for exploring the CNVb profiles over ever-increasing wheat accessions, and also propose a QR-code-like representation of individual digital CNVb fingerprint. This platform also allows uploading new CNVb profiles for comparison with stored varieties. CONCLUSIONS The CNVb-based approach provides a low-cost and high-throughput genotyping strategy for enabling digitalized wheat germplasm management and modern breeding with precise and practical decision-making.
Collapse
Affiliation(s)
- Jianxia Niu
- Frontiers Science Center for Molecular Design Breeding, Key Laboratory of Crop Heterosis and Utilization, Beijing Key Laboratory of Crop Genetic Improvement, China Agricultural University, Beijing, 100193, China
- Sanya Institute of China Agricultural University, Sanya, 572025, China
| | - Wenxi Wang
- Frontiers Science Center for Molecular Design Breeding, Key Laboratory of Crop Heterosis and Utilization, Beijing Key Laboratory of Crop Genetic Improvement, China Agricultural University, Beijing, 100193, China
| | - Zihao Wang
- Frontiers Science Center for Molecular Design Breeding, Key Laboratory of Crop Heterosis and Utilization, Beijing Key Laboratory of Crop Genetic Improvement, China Agricultural University, Beijing, 100193, China
- Sanya Institute of China Agricultural University, Sanya, 572025, China
| | - Zhe Chen
- Frontiers Science Center for Molecular Design Breeding, Key Laboratory of Crop Heterosis and Utilization, Beijing Key Laboratory of Crop Genetic Improvement, China Agricultural University, Beijing, 100193, China
| | - Xiaoyu Zhang
- Frontiers Science Center for Molecular Design Breeding, Key Laboratory of Crop Heterosis and Utilization, Beijing Key Laboratory of Crop Genetic Improvement, China Agricultural University, Beijing, 100193, China
| | - Zhen Qin
- Frontiers Science Center for Molecular Design Breeding, Key Laboratory of Crop Heterosis and Utilization, Beijing Key Laboratory of Crop Genetic Improvement, China Agricultural University, Beijing, 100193, China
| | - Lingfeng Miao
- Frontiers Science Center for Molecular Design Breeding, Key Laboratory of Crop Heterosis and Utilization, Beijing Key Laboratory of Crop Genetic Improvement, China Agricultural University, Beijing, 100193, China
| | - Zhengzhao Yang
- Frontiers Science Center for Molecular Design Breeding, Key Laboratory of Crop Heterosis and Utilization, Beijing Key Laboratory of Crop Genetic Improvement, China Agricultural University, Beijing, 100193, China
| | - Chaojie Xie
- Frontiers Science Center for Molecular Design Breeding, Key Laboratory of Crop Heterosis and Utilization, Beijing Key Laboratory of Crop Genetic Improvement, China Agricultural University, Beijing, 100193, China
| | - Mingming Xin
- Frontiers Science Center for Molecular Design Breeding, Key Laboratory of Crop Heterosis and Utilization, Beijing Key Laboratory of Crop Genetic Improvement, China Agricultural University, Beijing, 100193, China
| | - Huiru Peng
- Frontiers Science Center for Molecular Design Breeding, Key Laboratory of Crop Heterosis and Utilization, Beijing Key Laboratory of Crop Genetic Improvement, China Agricultural University, Beijing, 100193, China
| | - Yingyin Yao
- Frontiers Science Center for Molecular Design Breeding, Key Laboratory of Crop Heterosis and Utilization, Beijing Key Laboratory of Crop Genetic Improvement, China Agricultural University, Beijing, 100193, China
| | - Jie Liu
- Frontiers Science Center for Molecular Design Breeding, Key Laboratory of Crop Heterosis and Utilization, Beijing Key Laboratory of Crop Genetic Improvement, China Agricultural University, Beijing, 100193, China
| | - Zhongfu Ni
- Frontiers Science Center for Molecular Design Breeding, Key Laboratory of Crop Heterosis and Utilization, Beijing Key Laboratory of Crop Genetic Improvement, China Agricultural University, Beijing, 100193, China
| | - Qixin Sun
- Frontiers Science Center for Molecular Design Breeding, Key Laboratory of Crop Heterosis and Utilization, Beijing Key Laboratory of Crop Genetic Improvement, China Agricultural University, Beijing, 100193, China.
| | - Weilong Guo
- Frontiers Science Center for Molecular Design Breeding, Key Laboratory of Crop Heterosis and Utilization, Beijing Key Laboratory of Crop Genetic Improvement, China Agricultural University, Beijing, 100193, China.
| |
Collapse
|
33
|
Sun W, Xiong D, Ouyang J, Xiao X, Jiang Y, Wang Y, Li S, Xie Z, Wang J, Tang Z, Zhang Q. Altered chromatin topologies caused by balanced chromosomal translocation lead to central iris hypoplasia. Nat Commun 2024; 15:5048. [PMID: 38871723 DOI: 10.1038/s41467-024-49376-w] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/06/2023] [Accepted: 06/04/2024] [Indexed: 06/15/2024] Open
Abstract
Despite the advent of genomic sequencing, molecular diagnosis remains unsolved in approximately half of patients with Mendelian disorders, largely due to unclarified functions of noncoding regions and the difficulty in identifying complex structural variations. In this study, we map a unique form of central iris hypoplasia in a large family to 6q15-q23.3 and 18p11.31-q12.1 using a genome-wide linkage scan. Long-read sequencing reveals a balanced translocation t(6;18)(q22.31;p11.22) with intergenic breakpoints. By performing Hi-C on induced pluripotent stem cells from a patient, we identify two chromatin topologically associating domains spanning across the breakpoints. These alterations lead the ectopic chromatin interactions between APCDD1 on chromosome 18 and enhancers on chromosome 6, resulting in upregulation of APCDD1. Notably, APCDD1 is specifically localized in the iris of human eyes. Our findings demonstrate that noncoding structural variations can lead to Mendelian diseases by disrupting the 3D genome structure and resulting in altered gene expression.
Collapse
Affiliation(s)
- Wenmin Sun
- State Key Laboratory of Ophthalmology, Zhongshan Ophthalmic Center, Sun Yat-sen University, Guangdong Provincial Key Laboratory of Ophthalmology and Visual Science, Guangzhou, 510060, China
| | - Dan Xiong
- Zhongshan School of Medicine, Sun Yat-sen University, Guangzhou, 510080, China
| | - Jiamin Ouyang
- State Key Laboratory of Ophthalmology, Zhongshan Ophthalmic Center, Sun Yat-sen University, Guangdong Provincial Key Laboratory of Ophthalmology and Visual Science, Guangzhou, 510060, China
| | - Xueshan Xiao
- State Key Laboratory of Ophthalmology, Zhongshan Ophthalmic Center, Sun Yat-sen University, Guangdong Provincial Key Laboratory of Ophthalmology and Visual Science, Guangzhou, 510060, China
| | - Yi Jiang
- State Key Laboratory of Ophthalmology, Zhongshan Ophthalmic Center, Sun Yat-sen University, Guangdong Provincial Key Laboratory of Ophthalmology and Visual Science, Guangzhou, 510060, China
| | - Yingwei Wang
- State Key Laboratory of Ophthalmology, Zhongshan Ophthalmic Center, Sun Yat-sen University, Guangdong Provincial Key Laboratory of Ophthalmology and Visual Science, Guangzhou, 510060, China
| | - Shiqiang Li
- State Key Laboratory of Ophthalmology, Zhongshan Ophthalmic Center, Sun Yat-sen University, Guangdong Provincial Key Laboratory of Ophthalmology and Visual Science, Guangzhou, 510060, China
| | - Ziying Xie
- Zhongshan School of Medicine, Sun Yat-sen University, Guangzhou, 510080, China
| | - Junwen Wang
- State Key Laboratory of Ophthalmology, Zhongshan Ophthalmic Center, Sun Yat-sen University, Guangdong Provincial Key Laboratory of Ophthalmology and Visual Science, Guangzhou, 510060, China
| | - Zhonghui Tang
- Zhongshan School of Medicine, Sun Yat-sen University, Guangzhou, 510080, China.
| | - Qingjiong Zhang
- State Key Laboratory of Ophthalmology, Zhongshan Ophthalmic Center, Sun Yat-sen University, Guangdong Provincial Key Laboratory of Ophthalmology and Visual Science, Guangzhou, 510060, China.
| |
Collapse
|
34
|
Pan C, Reinert K. Leaf: an ultrafast filter for population-scale long-read SV detection. Genome Biol 2024; 25:155. [PMID: 38872200 PMCID: PMC11170821 DOI: 10.1186/s13059-024-03297-5] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/28/2022] [Accepted: 06/04/2024] [Indexed: 06/15/2024] Open
Abstract
Advances in sequencing technology have facilitated population-scale long-read structural variant (SV) detection. Arguably, one of the main challenges in population-scale analysis is developing effective computational pipelines. Here, we present a new filter-based pipeline for population-scale long-read SV detection. It better captures SV signals at an early stage than conventional assembly-based or alignment-based pipelines. Assessments in this work suggest that the filter-based pipeline helps better resolve intra-read rearrangements. Moreover, it is also more computationally efficient than conventional pipelines and thus may facilitate population-scale long-read applications.
Collapse
Affiliation(s)
- Chenxu Pan
- Department of Mathematics and Computer Science, Freie Universität Berlin, Takustr. 9, 14195, Berlin, Germany.
| | - Knut Reinert
- Department of Mathematics and Computer Science, Freie Universität Berlin, Takustr. 9, 14195, Berlin, Germany
- Department of Computational Molecular Biology, Max Planck Institute for Molecular Genetics, Berlin, 14195, Germany
| |
Collapse
|
35
|
Knief U, Müller IA, Stryjewski KF, Metzler D, Sorenson MD, Wolf JBW. Evolution of Chromosomal Inversions across an Avian Radiation. Mol Biol Evol 2024; 41:msae092. [PMID: 38743589 PMCID: PMC11152452 DOI: 10.1093/molbev/msae092] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/08/2024] [Revised: 04/05/2024] [Accepted: 05/03/2024] [Indexed: 05/16/2024] Open
Abstract
Chromosomal inversions are structural mutations that can play a prominent role in adaptation and speciation. Inversions segregating across species boundaries (trans-species inversions) are often taken as evidence for ancient balancing selection or adaptive introgression, but can also be due to incomplete lineage sorting. Using whole-genome resequencing data from 18 populations of 11 recognized munia species in the genus Lonchura (N = 176 individuals), we identify four large para- and pericentric inversions ranging in size from 4 to 20 Mb. All four inversions cosegregate across multiple species and predate the numerous speciation events associated with the rapid radiation of this clade across the prehistoric Sahul (Australia, New Guinea) and Bismarck Archipelago. Using coalescent theory, we infer that trans-specificity is improbable for neutrally segregating variation despite substantial incomplete lineage sorting characterizing this young radiation. Instead, the maintenance of all three autosomal inversions (chr1, chr5, and chr6) is best explained by selection acting along ecogeographic clines not observed for the collinear parts of the genome. In addition, the sex chromosome inversion largely aligns with species boundaries and shows signatures of repeated positive selection for both alleles. This study provides evidence for trans-species inversion polymorphisms involved in both adaptation and speciation. It further highlights the importance of informing selection inference using a null model of neutral evolution derived from the collinear part of the genome.
Collapse
Affiliation(s)
- Ulrich Knief
- Division of Evolutionary Biology, Faculty of Biology, LMU Munich, 82152 Planegg-Martinsried, Germany
- Evolutionary Biology & Ecology, Faculty of Biology, University of Freiburg, 79104 Freiburg, Germany
| | - Ingo A Müller
- Division of Evolutionary Biology, Faculty of Biology, LMU Munich, 82152 Planegg-Martinsried, Germany
- Department of Bioinformatics and Genetics, Swedish Museum of Natural History, 11418 Stockholm, Sweden
- Division of Systematics and Evolution, Department of Zoology, Stockholm University, 11418 Stockholm, Sweden
| | | | - Dirk Metzler
- Division of Evolutionary Biology, Faculty of Biology, LMU Munich, 82152 Planegg-Martinsried, Germany
| | | | - Jochen B W Wolf
- Division of Evolutionary Biology, Faculty of Biology, LMU Munich, 82152 Planegg-Martinsried, Germany
| |
Collapse
|
36
|
Thomas M, Mackes N, Preuss-Dodhy A, Wieland T, Bundschus M. Assessing Privacy Vulnerabilities in Genetic Data Sets: Scoping Review. JMIR BIOINFORMATICS AND BIOTECHNOLOGY 2024; 5:e54332. [PMID: 38935957 PMCID: PMC11165293 DOI: 10.2196/54332] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Subscribe] [Scholar Register] [Received: 11/06/2023] [Revised: 03/26/2024] [Accepted: 03/29/2024] [Indexed: 06/29/2024]
Abstract
BACKGROUND Genetic data are widely considered inherently identifiable. However, genetic data sets come in many shapes and sizes, and the feasibility of privacy attacks depends on their specific content. Assessing the reidentification risk of genetic data is complex, yet there is a lack of guidelines or recommendations that support data processors in performing such an evaluation. OBJECTIVE This study aims to gain a comprehensive understanding of the privacy vulnerabilities of genetic data and create a summary that can guide data processors in assessing the privacy risk of genetic data sets. METHODS We conducted a 2-step search, in which we first identified 21 reviews published between 2017 and 2023 on the topic of genomic privacy and then analyzed all references cited in the reviews (n=1645) to identify 42 unique original research studies that demonstrate a privacy attack on genetic data. We then evaluated the type and components of genetic data exploited for these attacks as well as the effort and resources needed for their implementation and their probability of success. RESULTS From our literature review, we derived 9 nonmutually exclusive features of genetic data that are both inherent to any genetic data set and informative about privacy risk: biological modality, experimental assay, data format or level of processing, germline versus somatic variation content, content of single nucleotide polymorphisms, short tandem repeats, aggregated sample measures, structural variants, and rare single nucleotide variants. CONCLUSIONS On the basis of our literature review, the evaluation of these 9 features covers the great majority of privacy-critical aspects of genetic data and thus provides a foundation and guidance for assessing genetic data risk.
Collapse
|
37
|
Schrauwen I, Rajendran Y, Acharya A, Öhman S, Arvio M, Paetau R, Siren A, Avela K, Granvik J, Leal SM, Määttä T, Kokkonen H, Järvelä I. Optical genome mapping unveils hidden structural variants in neurodevelopmental disorders. Sci Rep 2024; 14:11239. [PMID: 38755281 PMCID: PMC11099145 DOI: 10.1038/s41598-024-62009-y] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/04/2024] [Accepted: 05/13/2024] [Indexed: 05/18/2024] Open
Abstract
While short-read sequencing currently dominates genetic research and diagnostics, it frequently falls short of capturing certain structural variants (SVs), which are often implicated in the etiology of neurodevelopmental disorders (NDDs). Optical genome mapping (OGM) is an innovative technique capable of capturing SVs that are undetectable or challenging-to-detect via short-read methods. This study aimed to investigate NDDs using OGM, specifically focusing on cases that remained unsolved after standard exome sequencing. OGM was performed in 47 families using ultra-high molecular weight DNA. Single-molecule maps were assembled de novo, followed by SV and copy number variant calling. We identified 7 variants of interest, of which 5 (10.6%) were classified as likely pathogenic or pathogenic, located in BCL11A, OPHN1, PHF8, SON, and NFIA. We also identified an inversion disrupting NAALADL2, a gene which previously was found to harbor complex rearrangements in two NDD cases. Variants in known NDD genes or candidate variants of interest missed by exome sequencing mainly consisted of larger insertions (> 1kbp), inversions, and deletions/duplications of a low number of exons (1-4 exons). In conclusion, in addition to improving molecular diagnosis in NDDs, this technique may also reveal novel NDD genes which may harbor complex SVs often missed by standard sequencing techniques.
Collapse
Affiliation(s)
- Isabelle Schrauwen
- Department of Neurology, Center for Statistical Genetics, Gertrude H. Sergievsky Center, Columbia University Medical Center, Columbia University, 630 W 168Th St, New York, NY, 10032, USA.
| | - Yasmin Rajendran
- Department of Neurology, Center for Statistical Genetics, Gertrude H. Sergievsky Center, Columbia University Medical Center, Columbia University, 630 W 168Th St, New York, NY, 10032, USA
| | - Anushree Acharya
- Department of Neurology, Center for Statistical Genetics, Gertrude H. Sergievsky Center, Columbia University Medical Center, Columbia University, 630 W 168Th St, New York, NY, 10032, USA
| | | | - Maria Arvio
- Päijät-Häme Wellbeing Services, Neurology, Lahti, Finland
| | - Ritva Paetau
- Department of Child Neurology, University of Helsinki and Helsinki University Hospital, Helsinki, Finland
| | - Auli Siren
- Kanta-Häme Central Hospital, Hämeenlinna, Finland
| | - Kristiina Avela
- Institute of Biomedicine, University of Turku, Turku, Finland
| | - Johanna Granvik
- The Wellbeing Services County of Ostrobothnia, Kokkola, Finland
| | - Suzanne M Leal
- Department of Neurology, Center for Statistical Genetics, Gertrude H. Sergievsky Center, Columbia University Medical Center, Columbia University, 630 W 168Th St, New York, NY, 10032, USA
- Taub Institute for Alzheimer's Disease and the Aging Brain, Columbia University Medical Center, New York, NY, USA
| | - Tuomo Määttä
- The Wellbeing Services County of Kainuu, Kajaani, Finland
| | - Hannaleena Kokkonen
- Northern Finland Laboratory Centre NordLab and Medical Research Centre, Oulu University Hospital and University of Oulu, Oulu, Finland
| | - Irma Järvelä
- Department of Medical Genetics, University of Helsinki, Helsinki, Finland
| |
Collapse
|
38
|
Chen Q, Wu B, Li C, Ding L, Huang S, Wang J, Zhao J. Deciphering male influence in gynogenetic Pengze crucian carp ( Carassius auratus var. pengsenensis): insights from Nanopore sequencing of structural variations. Front Genet 2024; 15:1392110. [PMID: 38784042 PMCID: PMC11111978 DOI: 10.3389/fgene.2024.1392110] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/27/2024] [Accepted: 04/11/2024] [Indexed: 05/25/2024] Open
Abstract
In this study, we investigate gynogenetic reproduction in Pengze Crucian Carp (Carassius auratus var. pengsenensis) using third-generation Nanopore sequencing to uncover structural variations (SVs) in offspring. Our objective was to understand the role of male genetic material in gynogenesis by examining the genomes of both parents and their offspring. We discovered a notable number of male-specific structural variations (MSSVs): 1,195 to 1,709 MSSVs in homologous offspring, accounting for approximately 0.52%-0.60% of their detected SVs, and 236 to 350 MSSVs in heterologous offspring, making up about 0.10%-0.13%. These results highlight the significant influence of male genetic material on the genetic composition of offspring, particularly in homologous pairs, challenging the traditional view of asexual reproduction. The gene annotation of MSSVs revealed their presence in critical gene regions, indicating potential functional impacts. Specifically, we found 5 MSSVs in the exonic regions of protein-coding genes in homologous offspring, suggesting possible direct effects on protein structure and function. Validation of an MSSV in the exonic region of the polyunsaturated fatty acid 5-lipoxygenase gene confirmed male genetic material transmission in some offspring. This study underscores the importance of further research on the genetic diversity and gynogenesis mechanisms, providing valuable insights for reproductive biology, aquaculture, and fostering innovation in biological research and aquaculture practices.
Collapse
Affiliation(s)
- Qianhui Chen
- Guangzhou Key Laboratory of Subtropical Biodiversity and Biomonitoring, School of Life Sciences, South China Normal University, Guangzhou, China
| | - Biyu Wu
- Guangzhou Key Laboratory of Subtropical Biodiversity and Biomonitoring, School of Life Sciences, South China Normal University, Guangzhou, China
| | - Chao Li
- Guangzhou Key Laboratory of Subtropical Biodiversity and Biomonitoring, School of Life Sciences, South China Normal University, Guangzhou, China
| | - Liyun Ding
- Jiangxi Fisheries Research Institute, Nanchang, China
| | - Shiting Huang
- Guangzhou Key Laboratory of Subtropical Biodiversity and Biomonitoring, School of Life Sciences, South China Normal University, Guangzhou, China
| | - Junjie Wang
- Guangzhou Key Laboratory of Subtropical Biodiversity and Biomonitoring, School of Life Sciences, South China Normal University, Guangzhou, China
| | - Jun Zhao
- Guangzhou Key Laboratory of Subtropical Biodiversity and Biomonitoring, School of Life Sciences, South China Normal University, Guangzhou, China
| |
Collapse
|
39
|
Yang X, Zheng G, Jia P, Wang S, Ye K. Pindel-TD: A Tandem Duplication Detector Based on A Pattern Growth Approach. GENOMICS, PROTEOMICS & BIOINFORMATICS 2024; 22:qzae008. [PMID: 38862430 PMCID: PMC11425056 DOI: 10.1093/gpbjnl/qzae008] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Received: 02/08/2023] [Revised: 10/24/2023] [Accepted: 11/03/2023] [Indexed: 06/13/2024]
Abstract
Tandem duplication (TD) is a major type of structural variations (SVs) that plays an important role in novel gene formation and human diseases. However, TDs are often missed or incorrectly classified as insertions by most modern SV detection methods due to the lack of specialized operation on TD-related mutational signals. Herein, we developed a TD detection module for the Pindel tool, referred to as Pindel-TD, based on a TD-specific pattern growth approach. Pindel-TD is capable of detecting TDs with a wide size range at single nucleotide resolution. Using simulated and real read data from HG002, we demonstrated that Pindel-TD outperforms other leading methods in terms of precision, recall, F1-score, and robustness. Furthermore, by applying Pindel-TD to data generated from the K562 cancer cell line, we identified a TD located at the seventh exon of SAGE1, providing an explanation for its high expression. Pindel-TD is available for non-commercial use at https://github.com/xjtu-omics/pindel.
Collapse
Affiliation(s)
- Xiaofei Yang
- School of Computer Science and Technology, Faculty of Electronic and Information Engineering, Xi'an Jiaotong University, Xi'an 710049, China
- Center for Mathematical Medical, the First Affiliated Hospital of Xi'an Jiaotong University, Xi'an 710061, China
- Genome Institute, the First Affiliated Hospital of Xi'an Jiaotong University, Xi'an 710061, China
- MOE Key Lab for Intelligent Networks & Networks Security, Faculty of Electronic and Information Engineering, Xi'an Jiaotong University, Xi'an 710049, China
| | - Gaoyang Zheng
- Center for Mathematical Medical, the First Affiliated Hospital of Xi'an Jiaotong University, Xi'an 710061, China
- Genome Institute, the First Affiliated Hospital of Xi'an Jiaotong University, Xi'an 710061, China
| | - Peng Jia
- MOE Key Lab for Intelligent Networks & Networks Security, Faculty of Electronic and Information Engineering, Xi'an Jiaotong University, Xi'an 710049, China
- School of Automation Science and Engineering, Faculty of Electronic and Information Engineering, Xi'an Jiaotong University, Xi'an 710049, China
| | - Songbo Wang
- MOE Key Lab for Intelligent Networks & Networks Security, Faculty of Electronic and Information Engineering, Xi'an Jiaotong University, Xi'an 710049, China
- School of Automation Science and Engineering, Faculty of Electronic and Information Engineering, Xi'an Jiaotong University, Xi'an 710049, China
| | - Kai Ye
- Center for Mathematical Medical, the First Affiliated Hospital of Xi'an Jiaotong University, Xi'an 710061, China
- Genome Institute, the First Affiliated Hospital of Xi'an Jiaotong University, Xi'an 710061, China
- MOE Key Lab for Intelligent Networks & Networks Security, Faculty of Electronic and Information Engineering, Xi'an Jiaotong University, Xi'an 710049, China
- School of Automation Science and Engineering, Faculty of Electronic and Information Engineering, Xi'an Jiaotong University, Xi'an 710049, China
- School of Life Science and Technology, Xi'an Jiaotong University, Xi'an 710049, China
- Faculty of Science, Leiden University, Leiden 2311 EZ, Netherland
| |
Collapse
|
40
|
Elfman J, Goins L, Heller T, Singh S, Wang YH, Li H. Discovery of a polymorphic gene fusion via bottom-up chimeric RNA prediction. Nucleic Acids Res 2024; 52:4409-4421. [PMID: 38587197 PMCID: PMC11077074 DOI: 10.1093/nar/gkae258] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/17/2022] [Accepted: 03/27/2024] [Indexed: 04/09/2024] Open
Abstract
Gene fusions and their chimeric products are commonly linked with cancer. However, recent studies have found chimeric transcripts in non-cancer tissues and cell lines. Large-scale efforts to annotate structural variations have identified gene fusions capable of generating chimeric transcripts even in normal tissues. In this study, we present a bottom-up approach targeting population-specific chimeric RNAs, identifying 58 such instances in the GTEx cohort, including notable cases such as SUZ12P1-CRLF3, TFG-ADGRG7 and TRPM4-PPFIA3, which possess distinct patterns across different ancestry groups. We provide direct evidence for an additional 29 polymorphic chimeric RNAs with associated structural variants, revealing 13 novel rare structural variants. Additionally, we utilize the All of Us dataset and a large cohort of clinical samples to characterize the association of the SUZ12P1-CRLF3-causing variant with patient phenotypes. Our study showcases SUZ12P1-CRLF3 as a representative example, illustrating the identification of elusive structural variants by focusing on those producing population-specific fusion transcripts.
Collapse
Affiliation(s)
- Justin Elfman
- Department of Biochemistry and Molecular Genetics, University of Virginia, Charlottesville, VA 22903, USA
| | - Lynette Goins
- Department of Biological Sciences, Clemson University, Clemson, SC 29631, USA
| | - Tessa Heller
- Department of Biochemistry and Molecular Genetics, University of Virginia, Charlottesville, VA 22903, USA
| | - Sandeep Singh
- Department of Biochemistry and Molecular Genetics, University of Virginia, Charlottesville, VA 22903, USA
- Computational Toxicology Facility, CSIR-Indian Institute of Toxicology Research, Lucknow, 226001, Uttar Pradesh, India
| | - Yuh-Hwa Wang
- Department of Biochemistry and Molecular Genetics, University of Virginia, Charlottesville, VA 22903, USA
| | - Hui Li
- Department of Biochemistry and Molecular Genetics, University of Virginia, Charlottesville, VA 22903, USA
- Department of Pathology, University of Virginia, Charlottesville, VA 22903, USA
| |
Collapse
|
41
|
Yang L, Yin H, Bai L, Yao W, Tao T, Zhao Q, Gao Y, Teng J, Xu Z, Lin Q, Diao S, Pan Z, Guan D, Li B, Zhou H, Zhou Z, Zhao F, Wang Q, Pan Y, Zhang Z, Li K, Fang L, Liu GE. Mapping and functional characterization of structural variation in 1060 pig genomes. Genome Biol 2024; 25:116. [PMID: 38715020 PMCID: PMC11075355 DOI: 10.1186/s13059-024-03253-3] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/06/2022] [Accepted: 04/19/2024] [Indexed: 05/12/2024] Open
Abstract
BACKGROUND Structural variations (SVs) have significant impacts on complex phenotypes by rearranging large amounts of DNA sequence. RESULTS We present a comprehensive SV catalog based on the whole-genome sequence of 1060 pigs (Sus scrofa) representing 101 breeds, covering 9.6% of the pig genome. This catalog includes 42,487 deletions, 37,913 mobile element insertions, 3308 duplications, 1664 inversions, and 45,184 break ends. Estimates of breed ancestry and hybridization using genotyped SVs align well with those from single nucleotide polymorphisms. Geographically stratified deletions are observed, along with known duplications of the KIT gene, responsible for white coat color in European pigs. Additionally, we identify a recent SINE element insertion in MYO5A transcripts of European pigs, potentially influencing alternative splicing patterns and coat color alterations. Furthermore, a Yorkshire-specific copy number gain within ABCG2 is found, impacting chromatin interactions and gene expression across multiple tissues over a stretch of genomic region of ~200 kb. Preliminary investigations into SV's impact on gene expression and traits using the Pig Genotype-Tissue Expression (PigGTEx) data reveal SV associations with regulatory variants and gene-trait pairs. For instance, a 51-bp deletion is linked to the lead eQTL of the lipid metabolism regulating gene FADS3, whose expression in embryo may affect loin muscle area, as revealed by our transcriptome-wide association studies. CONCLUSIONS This SV catalog serves as a valuable resource for studying diversity, evolutionary history, and functional shaping of the pig genome by processes like domestication, trait-based breeding, and adaptive evolution.
Collapse
Affiliation(s)
- Liu Yang
- Key Laboratory of Livestock and Poultry Multi-Omics of MARA, Genome Analysis Laboratory of the Ministry of Agriculture, Agricultural Genomics Institute at Shenzhen, Chinese Academy of Agricultural Sciences, Shenzhen, Guangdong, China
- Animal Genomics and Improvement Laboratory, Beltsville Agricultural Research Center, Agricultural Research Service, USDA, Beltsville, MD, 20705, USA
| | - Hongwei Yin
- Key Laboratory of Livestock and Poultry Multi-Omics of MARA, Genome Analysis Laboratory of the Ministry of Agriculture, Agricultural Genomics Institute at Shenzhen, Chinese Academy of Agricultural Sciences, Shenzhen, Guangdong, China
| | - Lijing Bai
- Key Laboratory of Livestock and Poultry Multi-Omics of MARA, Genome Analysis Laboratory of the Ministry of Agriculture, Agricultural Genomics Institute at Shenzhen, Chinese Academy of Agricultural Sciences, Shenzhen, Guangdong, China
| | - Wenye Yao
- Key Laboratory of Livestock and Poultry Multi-Omics of MARA, Genome Analysis Laboratory of the Ministry of Agriculture, Agricultural Genomics Institute at Shenzhen, Chinese Academy of Agricultural Sciences, Shenzhen, Guangdong, China
| | - Tan Tao
- Key Laboratory of Livestock and Poultry Multi-Omics of MARA, Genome Analysis Laboratory of the Ministry of Agriculture, Agricultural Genomics Institute at Shenzhen, Chinese Academy of Agricultural Sciences, Shenzhen, Guangdong, China
| | - Qianyi Zhao
- Key Laboratory of Livestock and Poultry Multi-Omics of MARA, Genome Analysis Laboratory of the Ministry of Agriculture, Agricultural Genomics Institute at Shenzhen, Chinese Academy of Agricultural Sciences, Shenzhen, Guangdong, China
| | - Yahui Gao
- Animal Genomics and Improvement Laboratory, Beltsville Agricultural Research Center, Agricultural Research Service, USDA, Beltsville, MD, 20705, USA
| | - Jinyan Teng
- State Key Laboratory of Swine and Poultry Breeding Industry, National Engineering Research Center for Breeding Swine Industry, Guangdong Provincial Key Lab of Agro-Animal Genomics and Molecular Breeding, College of Animal Science, South China Agricultural University, Guangzhou, China
| | - Zhiting Xu
- State Key Laboratory of Swine and Poultry Breeding Industry, National Engineering Research Center for Breeding Swine Industry, Guangdong Provincial Key Lab of Agro-Animal Genomics and Molecular Breeding, College of Animal Science, South China Agricultural University, Guangzhou, China
| | - Qing Lin
- State Key Laboratory of Swine and Poultry Breeding Industry, National Engineering Research Center for Breeding Swine Industry, Guangdong Provincial Key Lab of Agro-Animal Genomics and Molecular Breeding, College of Animal Science, South China Agricultural University, Guangzhou, China
| | - Shuqi Diao
- State Key Laboratory of Swine and Poultry Breeding Industry, National Engineering Research Center for Breeding Swine Industry, Guangdong Provincial Key Lab of Agro-Animal Genomics and Molecular Breeding, College of Animal Science, South China Agricultural University, Guangzhou, China
| | - Zhangyuan Pan
- Department of Animal Science, University of California-Davis, Davis, CA, USA
| | - Dailu Guan
- Department of Animal Science, University of California-Davis, Davis, CA, USA
| | - Bingjie Li
- Animal and Veterinary Sciences, Scotland's Rural College (SRUC), Roslin Institute Building, Easter Bush, Midlothian, EH25 9RG, United Kingdom
| | - Huaijun Zhou
- Department of Animal Science, University of California-Davis, Davis, CA, USA
| | - Zhongyin Zhou
- State Key Laboratory of Genetic Resources and Evolution, Kunming Institute of Zoology, Chinese Academy of Sciences, Kunming, 650223, China
| | - Fuping Zhao
- Key Laboratory of Animal Genetics, Breeding and Reproduction (Poultry) of Ministry of Agriculture, Institute of Animal Sciences, Chinese Academy of Agricultural Sciences, Beijing, 100193, China
| | - Qishan Wang
- Department of Animal Science, College of Animal Sciences, Zhejiang University, Hangzhou, 310058, China
| | - Yuchun Pan
- Department of Animal Science, College of Animal Sciences, Zhejiang University, Hangzhou, 310058, China
| | - Zhe Zhang
- State Key Laboratory of Swine and Poultry Breeding Industry, National Engineering Research Center for Breeding Swine Industry, Guangdong Provincial Key Lab of Agro-Animal Genomics and Molecular Breeding, College of Animal Science, South China Agricultural University, Guangzhou, China
| | - Kui Li
- Key Laboratory of Livestock and Poultry Multi-Omics of MARA, Genome Analysis Laboratory of the Ministry of Agriculture, Agricultural Genomics Institute at Shenzhen, Chinese Academy of Agricultural Sciences, Shenzhen, Guangdong, China.
| | - Lingzhao Fang
- Center for Quantitative Genetics and Genomics, Aarhus University, Aarhus, Denmark.
| | - George E Liu
- Animal Genomics and Improvement Laboratory, Beltsville Agricultural Research Center, Agricultural Research Service, USDA, Beltsville, MD, 20705, USA.
| |
Collapse
|
42
|
Schloissnig S, Pani S, Rodriguez-Martin B, Ebler J, Hain C, Tsapalou V, Söylev A, Hüther P, Ashraf H, Prodanov T, Asparuhova M, Hunt S, Rausch T, Marschall T, Korbel JO. Long-read sequencing and structural variant characterization in 1,019 samples from the 1000 Genomes Project. BIORXIV : THE PREPRINT SERVER FOR BIOLOGY 2024:2024.04.18.590093. [PMID: 38659906 PMCID: PMC11042266 DOI: 10.1101/2024.04.18.590093] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Grants] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 04/26/2024]
Abstract
Structural variants (SVs) contribute significantly to human genetic diversity and disease 1-4 . Previously, SVs have remained incompletely resolved by population genomics, with short-read sequencing facing limitations in capturing the whole spectrum of SVs at nucleotide resolution 5-7 . Here we leveraged nanopore sequencing 8 to construct an intermediate coverage resource of 1,019 long-read genomes sampled within 26 human populations from the 1000 Genomes Project. By integrating linear and graph-based approaches for SV analysis via pangenome graph-augmentation, we uncover 167,291 sequence-resolved SVs in these samples, considerably advancing SV characterization compared to population-wide short-read sequencing studies 3,4 . Our analysis details diverse SV classes-deletions, duplications, insertions, and inversions-at population-scale. LINE-1 and SVA retrotransposition activities frequently mediate transductions 9,10 of unique sequences, with both mobile element classes transducing sequences at either the 3'- or 5'-end, depending on the source element locus. Furthermore, analyses of SV breakpoint junctions suggest a continuum of homology-mediated rearrangement processes are integral to SV formation, and highlight evidence for SV recurrence involving repeat sequences. Our open-access dataset underscores the transformative impact of long-read sequencing in advancing the characterisation of polymorphic genomic architectures, and provides a resource for guiding variant prioritisation in future long-read sequencing-based disease studies.
Collapse
|
43
|
Du ZZ, He JB, Jiao WB. A comprehensive benchmark of graph-based genetic variant genotyping algorithms on plant genomes for creating an accurate ensemble pipeline. Genome Biol 2024; 25:91. [PMID: 38589937 PMCID: PMC11003132 DOI: 10.1186/s13059-024-03239-1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/19/2023] [Accepted: 04/04/2024] [Indexed: 04/10/2024] Open
Abstract
BACKGROUND Although sequencing technologies have boosted the measurement of the genomic diversity of plant crops, it remains challenging to accurately genotype millions of genetic variants, especially structural variations, with only short reads. In recent years, many graph-based variation genotyping methods have been developed to address this issue and tested for human genomes. However, their performance in plant genomes remains largely elusive. Furthermore, pipelines integrating the advantages of current genotyping methods might be required, considering the different complexity of plant genomes. RESULTS Here we comprehensively evaluate eight such genotypers in different scenarios in terms of variant type and size, sequencing parameters, genomic context, and complexity, as well as graph size, using both simulated and real data sets from representative plant genomes. Our evaluation reveals that there are still great challenges to applying existing methods to plants, such as excessive repeats and variants or high resource consumption. Therefore, we propose a pipeline called Ensemble Variant Genotyper (EVG) that can achieve better genotyping performance in almost all experimental scenarios and comparably higher genotyping recall and precision even using 5× reads. Furthermore, we demonstrate that EVG is more robust with an increasing number of graphed genomes, especially for insertions and deletions. CONCLUSIONS Our study will provide new insights into the development and application of graph-based genotyping algorithms. We conclude that EVG provides an accurate, unbiased, and cost-effective way for genotyping both small and large variations and will be potentially used in population-scale genotyping for large, repetitive, and heterozygous plant genomes.
Collapse
Affiliation(s)
- Ze-Zhen Du
- National Key Laboratory for Germplasm Innovation & Utilization of Horticultural Crops, Huazhong Agricultural University, Wuhan, China
- Hubei Hongshan Laboratory, Wuhan, China
| | - Jia-Bao He
- National Key Laboratory for Germplasm Innovation & Utilization of Horticultural Crops, Huazhong Agricultural University, Wuhan, China
- Hubei Hongshan Laboratory, Wuhan, China
| | - Wen-Biao Jiao
- National Key Laboratory for Germplasm Innovation & Utilization of Horticultural Crops, Huazhong Agricultural University, Wuhan, China.
- Hubei Hongshan Laboratory, Wuhan, China.
| |
Collapse
|
44
|
Rivas-González I, Tung J. A multi-million-year natural experiment: Comparative genomics on a massive scale and its implications for human health. Evol Med Public Health 2024; 12:67-70. [PMID: 38601345 PMCID: PMC11005778 DOI: 10.1093/emph/eoae006] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/10/2024] [Revised: 03/18/2024] [Indexed: 04/12/2024] Open
Abstract
Improving the diversity and quality of genome assemblies for non-human mammals has been a long-standing goal of comparative genomics. The last year saw substantial progress towards this goal, including the release of genome alignments for 240 mammals and nearly half the primate order. These resources have increased our ability to identify evolutionarily constrained regions of the genome, and together strongly support the importance of these regions to biomedically relevant trait variation in humans. They also provide new strategies for identifying the genetic basis of changes unique to individual lineages, illustrating the value of evolutionary comparative approaches for understanding human health.
Collapse
Affiliation(s)
- Iker Rivas-González
- Department of Primate Behavior and Evolution, Max Planck Institute for Evolutionary Anthropology, Leipzig, Germany
| | - Jenny Tung
- Department of Primate Behavior and Evolution, Max Planck Institute for Evolutionary Anthropology, Leipzig, Germany
- Department of Evolutionary Anthropology, Duke University, Durham, NC, USA
- Department of Biology, Duke University, Durham, NC, USA
- Faculty of Life Sciences, Institute of Biology, Leipzig University, Leipzig, Germany
| |
Collapse
|
45
|
David G, Bertolotti A, Layer R, Scofield D, Hayward A, Baril T, Burnett HA, Gudmunds E, Jensen H, Husby A. Calling Structural Variants with Confidence from Short-Read Data in Wild Bird Populations. Genome Biol Evol 2024; 16:evae049. [PMID: 38489588 PMCID: PMC11018544 DOI: 10.1093/gbe/evae049] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/29/2022] [Revised: 02/28/2024] [Accepted: 03/07/2024] [Indexed: 03/17/2024] Open
Abstract
Comprehensive characterization of structural variation in natural populations has only become feasible in the last decade. To investigate the population genomic nature of structural variation, reproducible and high-confidence structural variation callsets are first required. We created a population-scale reference of the genome-wide landscape of structural variation across 33 Nordic house sparrows (Passer domesticus). To produce a consensus callset across all samples using short-read data, we compare heuristic-based quality filtering and visual curation (Samplot/PlotCritic and Samplot-ML) approaches. We demonstrate that curation of structural variants is important for reducing putative false positives and that the time invested in this step outweighs the potential costs of analyzing short-read-discovered structural variation data sets that include many potential false positives. We find that even a lenient manual curation strategy (e.g. applied by a single curator) can reduce the proportion of putative false positives by up to 80%, thus enriching the proportion of high-confidence variants. Crucially, in applying a lenient manual curation strategy with a single curator, nearly all (>99%) variants rejected as putative false positives were also classified as such by a more stringent curation strategy using three additional curators. Furthermore, variants rejected by manual curation failed to reflect the expected population structure from SNPs, whereas variants passing curation did. Combining heuristic-based quality filtering with rapid manual curation of structural variants in short-read data can therefore become a time- and cost-effective first step for functional and population genomic studies requiring high-confidence structural variation callsets.
Collapse
Affiliation(s)
- Gabriel David
- Department of Ecology and Genetics, Evolutionary Biology Centre, Uppsala University, Uppsala, Sweden
| | | | - Ryan Layer
- BioFrontiers Institute, University of Colorado, Boulder, CO, USA
- Department of Computer Science, University of Colorado, Boulder, CO, USA
| | - Douglas Scofield
- Department of Ecology and Genetics, Evolutionary Biology Centre, Uppsala University, Uppsala, Sweden
| | - Alexander Hayward
- Centre for Ecology and Conservation, University of Exeter, Penryn Campus, Penryn, Cornwall, UK
| | - Tobias Baril
- Centre for Ecology and Conservation, University of Exeter, Penryn Campus, Penryn, Cornwall, UK
| | - Hamish A Burnett
- Centre for Biodiversity Dynamics, Department of Biology, Norwegian University of Science and Technology, Trondheim, Norway
| | - Erik Gudmunds
- Department of Ecology and Genetics, Evolutionary Biology Centre, Uppsala University, Uppsala, Sweden
| | - Henrik Jensen
- Centre for Biodiversity Dynamics, Department of Biology, Norwegian University of Science and Technology, Trondheim, Norway
| | - Arild Husby
- Department of Ecology and Genetics, Evolutionary Biology Centre, Uppsala University, Uppsala, Sweden
| |
Collapse
|
46
|
Hujoel MLA, Handsaker RE, Sherman MA, Kamitaki N, Barton AR, Mukamel RE, Terao C, McCarroll SA, Loh PR. Protein-altering variants at copy number-variable regions influence diverse human phenotypes. Nat Genet 2024; 56:569-578. [PMID: 38548989 PMCID: PMC11018521 DOI: 10.1038/s41588-024-01684-z] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/23/2023] [Accepted: 02/08/2024] [Indexed: 04/09/2024]
Abstract
Copy number variants (CNVs) are among the largest genetic variants, yet CNVs have not been effectively ascertained in most genetic association studies. Here we ascertained protein-altering CNVs from UK Biobank whole-exome sequencing data (n = 468,570) using haplotype-informed methods capable of detecting subexonic CNVs and variation within segmental duplications. Incorporating CNVs into analyses of rare variants predicted to cause gene loss of function (LOF) identified 100 associations of predicted LOF variants with 41 quantitative traits. A low-frequency partial deletion of RGL3 exon 6 conferred one of the strongest protective effects of gene LOF on hypertension risk (odds ratio = 0.86 (0.82-0.90)). Protein-coding variation in rapidly evolving gene families within segmental duplications-previously invisible to most analysis methods-generated some of the human genome's largest contributions to variation in type 2 diabetes risk, chronotype and blood cell traits. These results illustrate the potential for new genetic insights from genomic variation that has escaped large-scale analysis to date.
Collapse
Affiliation(s)
- Margaux L A Hujoel
- Division of Genetics, Department of Medicine, Brigham and Women's Hospital and Harvard Medical School, Boston, MA, USA.
- Center for Data Sciences, Brigham and Women's Hospital and Harvard Medical School, Boston, MA, USA.
- Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA, USA.
| | - Robert E Handsaker
- Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Stanley Center for Psychiatric Research, Broad Institute of MIT and Harvard, Boston, MA, USA
- Department of Genetics, Harvard Medical School, Boston, MA, USA
| | - Maxwell A Sherman
- Division of Genetics, Department of Medicine, Brigham and Women's Hospital and Harvard Medical School, Boston, MA, USA
- Center for Data Sciences, Brigham and Women's Hospital and Harvard Medical School, Boston, MA, USA
- Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Computer Science and Artificial Intelligence Laboratory, Massachusetts Institute of Technology, Cambridge, MA, USA
- Serinus Biosciences Inc., New York, NY, USA
| | - Nolan Kamitaki
- Division of Genetics, Department of Medicine, Brigham and Women's Hospital and Harvard Medical School, Boston, MA, USA
- Center for Data Sciences, Brigham and Women's Hospital and Harvard Medical School, Boston, MA, USA
- Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Department of Biomedical Informatics, Harvard Medical School, Boston, MA, USA
| | - Alison R Barton
- Division of Genetics, Department of Medicine, Brigham and Women's Hospital and Harvard Medical School, Boston, MA, USA
- Center for Data Sciences, Brigham and Women's Hospital and Harvard Medical School, Boston, MA, USA
- Department of Biomedical Informatics, Harvard Medical School, Boston, MA, USA
- Department of Human Evolutionary Biology, Harvard University, Cambridge, MA, USA
| | - Ronen E Mukamel
- Division of Genetics, Department of Medicine, Brigham and Women's Hospital and Harvard Medical School, Boston, MA, USA
- Center for Data Sciences, Brigham and Women's Hospital and Harvard Medical School, Boston, MA, USA
- Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA, USA
| | - Chikashi Terao
- Laboratory for Statistical and Translational Genetics, RIKEN Center for Integrative Medical Sciences, Yokohama, Japan
- Clinical Research Center, Shizuoka General Hospital, Shizuoka, Japan
- Department of Applied Genetics, School of Pharmaceutical Sciences, University of Shizuoka, Shizuoka, Japan
| | - Steven A McCarroll
- Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Stanley Center for Psychiatric Research, Broad Institute of MIT and Harvard, Boston, MA, USA
- Department of Genetics, Harvard Medical School, Boston, MA, USA
| | - Po-Ru Loh
- Division of Genetics, Department of Medicine, Brigham and Women's Hospital and Harvard Medical School, Boston, MA, USA.
- Center for Data Sciences, Brigham and Women's Hospital and Harvard Medical School, Boston, MA, USA.
- Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA, USA.
| |
Collapse
|
47
|
Li X, Liu Q, Fu C, Li M, Li C, Li X, Zhao S, Zheng Z. Characterizing structural variants based on graph-genotyping provides insights into pig domestication and local adaption. J Genet Genomics 2024; 51:394-406. [PMID: 38056526 DOI: 10.1016/j.jgg.2023.11.005] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/14/2023] [Revised: 11/23/2023] [Accepted: 11/24/2023] [Indexed: 12/08/2023]
Abstract
Structural variants (SVs), such as deletions (DELs) and insertions (INSs), contribute substantially to pig genetic diversity and phenotypic variation. Using a library of SVs discovered from long-read primary assemblies and short-read sequenced genomes, we map pig genomic SVs with a graph-based method for re-genotyping SVs in 402 genomes. Our results demonstrate that those SVs harboring specific trait-associated genes may greatly shape pig domestication and local adaptation. Further characterization of SVs reveals that some population-stratified SVs may alter the transcription of genes by affecting regulatory elements. We identify that the genotypes of two DELs (296-bp DEL, chr7: 52,172,101-52,172,397; 278-bp DEL, chr18: 23,840,143-23,840,421) located in muscle-specific enhancers are associated with the expression of target genes related to meat quality (FSD2) and muscle fiber hypertrophy (LMOD2 and WASL) in pigs. Our results highlight the role of SVs in domestic porcine evolution, and the identified candidate functional genes and SVs are valuable resources for future genomic research and breeding programs in pigs.
Collapse
Affiliation(s)
- Xin Li
- Key Lab of Agricultural Animal Genetics, Breeding and Reproduction of Ministry of Education and Key Laboratory of Swine Genetics and Breeding of Ministry of Agriculture, College of Animal Science and Technology, Huazhong Agricultural University, Wuhan, Hubei 430070, China
| | - Quan Liu
- Key Lab of Agricultural Animal Genetics, Breeding and Reproduction of Ministry of Education and Key Laboratory of Swine Genetics and Breeding of Ministry of Agriculture, College of Animal Science and Technology, Huazhong Agricultural University, Wuhan, Hubei 430070, China
| | - Chong Fu
- Key Lab of Agricultural Animal Genetics, Breeding and Reproduction of Ministry of Education and Key Laboratory of Swine Genetics and Breeding of Ministry of Agriculture, College of Animal Science and Technology, Huazhong Agricultural University, Wuhan, Hubei 430070, China
| | - Mengxun Li
- Key Lab of Agricultural Animal Genetics, Breeding and Reproduction of Ministry of Education and Key Laboratory of Swine Genetics and Breeding of Ministry of Agriculture, College of Animal Science and Technology, Huazhong Agricultural University, Wuhan, Hubei 430070, China
| | - Changchun Li
- Key Lab of Agricultural Animal Genetics, Breeding and Reproduction of Ministry of Education and Key Laboratory of Swine Genetics and Breeding of Ministry of Agriculture, College of Animal Science and Technology, Huazhong Agricultural University, Wuhan, Hubei 430070, China; The Cooperative Innovation Center for Sustainable Pig Production, Wuhan, Hubei 430070, China
| | - Xinyun Li
- Key Lab of Agricultural Animal Genetics, Breeding and Reproduction of Ministry of Education and Key Laboratory of Swine Genetics and Breeding of Ministry of Agriculture, College of Animal Science and Technology, Huazhong Agricultural University, Wuhan, Hubei 430070, China; The Cooperative Innovation Center for Sustainable Pig Production, Wuhan, Hubei 430070, China; Hubei Hongshan Laboratory, Wuhan, Hubei 430070, China
| | - Shuhong Zhao
- Key Lab of Agricultural Animal Genetics, Breeding and Reproduction of Ministry of Education and Key Laboratory of Swine Genetics and Breeding of Ministry of Agriculture, College of Animal Science and Technology, Huazhong Agricultural University, Wuhan, Hubei 430070, China; The Cooperative Innovation Center for Sustainable Pig Production, Wuhan, Hubei 430070, China; Hubei Hongshan Laboratory, Wuhan, Hubei 430070, China.
| | - Zhuqing Zheng
- Key Lab of Agricultural Animal Genetics, Breeding and Reproduction of Ministry of Education and Key Laboratory of Swine Genetics and Breeding of Ministry of Agriculture, College of Animal Science and Technology, Huazhong Agricultural University, Wuhan, Hubei 430070, China; Institute of Agricultural Biotechnology, Jingchu University of Technology, Jingmen, Hubei 448000, China.
| |
Collapse
|
48
|
Wang S, Lin J, Jia P, Xu T, Li X, Liu Y, Xu D, Bush SJ, Meng D, Ye K. De novo and somatic structural variant discovery with SVision-pro. Nat Biotechnol 2024:10.1038/s41587-024-02190-7. [PMID: 38519720 DOI: 10.1038/s41587-024-02190-7] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/01/2023] [Accepted: 02/27/2024] [Indexed: 03/25/2024]
Abstract
Long-read-based de novo and somatic structural variant (SV) discovery remains challenging, necessitating genomic comparison between samples. We developed SVision-pro, a neural-network-based instance segmentation framework that represents genome-to-genome-level sequencing differences visually and discovers SV comparatively between genomes without any prerequisite for inference models. SVision-pro outperforms state-of-the-art approaches, in particular, the resolving of complex SVs is improved, with low Mendelian error rates, high sensitivity of low-frequency SVs and reduced false-positive rates compared with SV merging approaches.
Collapse
Affiliation(s)
- Songbo Wang
- Department of Gynecology and Obstetrics, Center for Mathematical Medical, The First Affiliated Hospital of Xi'an Jiaotong University, Xi'an, China
- School of Automation Science and Engineering, Faculty of Electronic and Information Engineering, Xi'an Jiaotong University, Xi'an, China
- MOE Key Lab for Intelligent Networks & Networks Security, Faculty of Electronic and Information Engineering, Xi'an Jiaotong University, Xi'an, China
| | - Jiadong Lin
- School of Automation Science and Engineering, Faculty of Electronic and Information Engineering, Xi'an Jiaotong University, Xi'an, China
- MOE Key Lab for Intelligent Networks & Networks Security, Faculty of Electronic and Information Engineering, Xi'an Jiaotong University, Xi'an, China
| | - Peng Jia
- Department of Gynecology and Obstetrics, Center for Mathematical Medical, The First Affiliated Hospital of Xi'an Jiaotong University, Xi'an, China
- School of Automation Science and Engineering, Faculty of Electronic and Information Engineering, Xi'an Jiaotong University, Xi'an, China
- MOE Key Lab for Intelligent Networks & Networks Security, Faculty of Electronic and Information Engineering, Xi'an Jiaotong University, Xi'an, China
| | - Tun Xu
- School of Automation Science and Engineering, Faculty of Electronic and Information Engineering, Xi'an Jiaotong University, Xi'an, China
- MOE Key Lab for Intelligent Networks & Networks Security, Faculty of Electronic and Information Engineering, Xi'an Jiaotong University, Xi'an, China
| | - Xiujuan Li
- School of Automation Science and Engineering, Faculty of Electronic and Information Engineering, Xi'an Jiaotong University, Xi'an, China
- MOE Key Lab for Intelligent Networks & Networks Security, Faculty of Electronic and Information Engineering, Xi'an Jiaotong University, Xi'an, China
| | - Yuezhuangnan Liu
- School of Life Science and Technology, Xi'an Jiaotong University, Xi'an, China
| | - Dan Xu
- School of Life Science and Technology, Xi'an Jiaotong University, Xi'an, China
| | - Stephen J Bush
- School of Automation Science and Engineering, Faculty of Electronic and Information Engineering, Xi'an Jiaotong University, Xi'an, China
- MOE Key Lab for Intelligent Networks & Networks Security, Faculty of Electronic and Information Engineering, Xi'an Jiaotong University, Xi'an, China
| | - Deyu Meng
- MOE Key Lab for Intelligent Networks & Networks Security, Faculty of Electronic and Information Engineering, Xi'an Jiaotong University, Xi'an, China
- School of Mathematics and Statistics, Xi'an Jiaotong University, Xi'an, China
- Macau Institute of Systems Engineering, Macau University of Science and Technology, Taipa, Macau
- Pazhou Laboratory (Huangpu), Guangzhou, Guangdong, China
| | - Kai Ye
- Department of Gynecology and Obstetrics, Center for Mathematical Medical, The First Affiliated Hospital of Xi'an Jiaotong University, Xi'an, China.
- School of Automation Science and Engineering, Faculty of Electronic and Information Engineering, Xi'an Jiaotong University, Xi'an, China.
- MOE Key Lab for Intelligent Networks & Networks Security, Faculty of Electronic and Information Engineering, Xi'an Jiaotong University, Xi'an, China.
- School of Life Science and Technology, Xi'an Jiaotong University, Xi'an, China.
- Faculty of Science, Leiden University, Leiden, The Netherlands.
- Genome Institute, The First Affiliated Hospital of Xi'an Jiaotong University, Xi'an, China.
| |
Collapse
|
49
|
Lesack KJ, Wasmuth JD. The impact of FASTQ and alignment read order on structural variant calling from long-read sequencing data. PeerJ 2024; 12:e17101. [PMID: 38500526 PMCID: PMC10946394 DOI: 10.7717/peerj.17101] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/28/2023] [Accepted: 02/21/2024] [Indexed: 03/20/2024] Open
Abstract
Background Structural variant (SV) calling from DNA sequencing data has been challenging due to several factors, including the ambiguity of short-read alignments, multiple complex SVs in the same genomic region, and the lack of "truth" datasets for benchmarking. Additionally, caller choice, parameter settings, and alignment method are known to affect SV calling. However, the impact of FASTQ read order on SV calling has not been explored for long-read data. Results Here, we used PacBio DNA sequencing data from 15 Caenorhabditis elegans strains and four Arabidopsis thaliana ecotypes to evaluate the sensitivity of different SV callers on FASTQ read order. Comparisons of variant call format files generated from the original and permutated FASTQ files demonstrated that the order of input data affected the SVs predicted by each caller. In particular, pbsv was highly sensitive to the order of the input data, especially at the highest depths where over 70% of the SV calls generated from pairs of differently ordered FASTQ files were in disagreement. These demonstrate that read order sensitivity is a complex, multifactorial process, as the differences observed both within and between species varied considerably according to the specific combination of aligner, SV caller, and sequencing depth. In addition to the SV callers being sensitive to the input data order, the SAMtools alignment sorting algorithm was identified as a source of variability following read order randomization. Conclusion The results of this study highlight the sensitivity of SV calling on the order of reads encoded in FASTQ files, which has not been recognized in long-read approaches. These findings have implications for the replication of SV studies and the development of consistent SV calling protocols. Our study suggests that researchers should pay attention to the input order sensitivity of read alignment sorting methods when analyzing long-read sequencing data for SV calling, as mitigating a source of variability could facilitate future replication work. These results also raise important questions surrounding the relationship between SV caller read order sensitivity and tool performance. Therefore, tool developers should also consider input order sensitivity as a potential source of variability during the development and benchmarking of new and improved methods for SV calling.
Collapse
Affiliation(s)
- Kyle J. Lesack
- Faculty of Veterinary Medicine, University of Calgary, Calgary, Alberta, Canada
- Host-Parasite Interactions Research Training Network, University of Calgary, Calgary, Alberta, Canada
| | - James D. Wasmuth
- Faculty of Veterinary Medicine, University of Calgary, Calgary, Alberta, Canada
- Host-Parasite Interactions Research Training Network, University of Calgary, Calgary, Alberta, Canada
| |
Collapse
|
50
|
Helal AA, Saad BT, Saad MT, Mosaad GS, Aboshanab KM. Benchmarking long-read aligners and SV callers for structural variation detection in Oxford nanopore sequencing data. Sci Rep 2024; 14:6160. [PMID: 38486064 PMCID: PMC10940726 DOI: 10.1038/s41598-024-56604-2] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/02/2023] [Accepted: 03/08/2024] [Indexed: 03/18/2024] Open
Abstract
Structural variants (SVs) are one of the significant types of DNA mutations and are typically defined as larger-than-50-bp genomic alterations that include insertions, deletions, duplications, inversions, and translocations. These modifications can profoundly impact the phenotypic characteristics and contribute to disorders like cancer, response to treatment, and infections. Four long-read aligners and five SV callers have been evaluated using three Oxford Nanopore NGS human genome datasets in terms of precision, recall, and F1-score statistical metrics, depth of coverage, and speed of analysis. The best SV caller regarding recall, precision, and F1-score when matched with different aligners at different coverage levels tend to vary depending on the dataset and the specific SV types being analyzed. However, based on our findings, Sniffles and CuteSV tend to perform well across different aligners and coverage levels, followed by SVIM, PBSV, and SVDSS in the last place. The CuteSV caller has the highest average F1-score (82.51%) and recall (78.50%), and Sniffles has the highest average precision value (94.33%). Minimap2 as an aligner and Sniffles as an SV caller act as a strong base for the pipeline of SV calling because of their high speed and reasonable accomplishment. PBSV has a lower average F1-score, precision, and recall and may generate more false positives and overlook some actual SVs. Our results are valuable in the comprehensive evaluation of popular SV callers and aligners as they provide insight into the performance of several long-read aligners and SV callers and serve as a reference for researchers in selecting the most suitable tools for SV detection.
Collapse
Affiliation(s)
- Asmaa A Helal
- Department of Bioinformatics, HITS Solutions Co., Cairo, 11765, Egypt
| | - Bishoy T Saad
- Department of Bioinformatics, HITS Solutions Co., Cairo, 11765, Egypt.
| | - Mina T Saad
- Department of Bioinformatics, HITS Solutions Co., Cairo, 11765, Egypt
| | - Gamal S Mosaad
- Department of Bioinformatics, HITS Solutions Co., Cairo, 11765, Egypt
| | - Khaled M Aboshanab
- Department of Microbiology and Immunology, Faculty of Pharmacy, Ain Shams University, Organization of African Unity St., Abassi, Cairo, 11566, Egypt.
| |
Collapse
|