1
Yang J, Wang DF, Huang JH, Zhu QH, Luo LY, Lu R, Xie XL, Salehian-Dehkordi H, Esmailizadeh A, Liu GE, Li MH. Structural variant landscapes reveal convergent signatures of evolution in sheep and goats. Genome Biol 2024; 25:148. [PMID: 38845023 PMCID: PMC11155191 DOI: 10.1186/s13059-024-03288-6] [Received: 01/17/2023] [Accepted: 05/21/2024] [Indexed: 06/10/2024] [Open access]
Abstract
BACKGROUND: Sheep and goats have undergone domestication and improvement to produce similar phenotypes, which have been greatly impacted by structural variants (SVs). Here, we report a high-quality chromosome-level reference genome of Asiatic mouflon, and implement a comprehensive analysis of SVs in 897 genomes of worldwide wild and domestic populations of sheep and goats to reveal genetic signatures underlying convergent evolution.
RESULTS: We characterize the SV landscapes in terms of genetic diversity, chromosomal distribution and their links with genes, QTLs and transposable elements, and examine their impacts on regulatory elements. We identify several novel SVs and annotate corresponding genes (e.g., BMPR1B, BMPR2, RALYL, COL21A1, and LRP1B) associated with important production traits such as fertility, meat and milk production, and wool/hair fineness. We detect signatures of selection involving the parallel evolution of orthologous SV-associated genes during domestication, local environmental adaptation, and improvement. In particular, we find that fecundity traits experienced convergent selection targeting the gene BMPR1B, with the DEL00067921 deletion explaining ~10.4% of the phenotypic variation observed in goats.
CONCLUSIONS: Our results provide new insights into the convergent evolution of SVs and serve as a rich resource for the future improvement of sheep, goats, and related livestock.
Affiliation(s)
- Ji Yang
- State Key Laboratory of Animal Biotech Breeding, China Agricultural University, Beijing, 100193, China
- College of Animal Science and Technology, China Agricultural University, Beijing, 100193, China
- Dong-Feng Wang
- CAS Key Laboratory of Animal Ecology and Conservation Biology, Institute of Zoology, Chinese Academy of Sciences (CAS), Beijing, 100101, China
- College of Life Sciences, University of Chinese Academy of Sciences (UCAS), Beijing, 100049, China
- Jia-Hui Huang
- State Key Laboratory of Animal Biotech Breeding, China Agricultural University, Beijing, 100193, China
- College of Animal Science and Technology, China Agricultural University, Beijing, 100193, China
- Qiang-Hui Zhu
- CAS Key Laboratory of Animal Ecology and Conservation Biology, Institute of Zoology, Chinese Academy of Sciences (CAS), Beijing, 100101, China
- College of Life Sciences, University of Chinese Academy of Sciences (UCAS), Beijing, 100049, China
- Ling-Yun Luo
- State Key Laboratory of Animal Biotech Breeding, China Agricultural University, Beijing, 100193, China
- College of Animal Science and Technology, China Agricultural University, Beijing, 100193, China
- Ran Lu
- State Key Laboratory of Animal Biotech Breeding, China Agricultural University, Beijing, 100193, China
- College of Animal Science and Technology, China Agricultural University, Beijing, 100193, China
- Xing-Long Xie
- CAS Key Laboratory of Animal Ecology and Conservation Biology, Institute of Zoology, Chinese Academy of Sciences (CAS), Beijing, 100101, China
- College of Life Sciences, University of Chinese Academy of Sciences (UCAS), Beijing, 100049, China
- Hosein Salehian-Dehkordi
- CAS Key Laboratory of Animal Ecology and Conservation Biology, Institute of Zoology, Chinese Academy of Sciences (CAS), Beijing, 100101, China
- College of Life Sciences, University of Chinese Academy of Sciences (UCAS), Beijing, 100049, China
- Ali Esmailizadeh
- Department of Animal Science, Faculty of Agriculture, Shahid Bahonar University of Kerman, Kerman, 76169-133, Iran
- George E Liu
- Animal Genomics and Improvement Laboratory, BARC, USDA-ARS, Beltsville, MD, 20705, USA
- Meng-Hua Li
- State Key Laboratory of Animal Biotech Breeding, China Agricultural University, Beijing, 100193, China
- College of Animal Science and Technology, China Agricultural University, Beijing, 100193, China
2
Parmar JM, Laing NG, Kennerson ML, Ravenscroft G. Genetics of inherited peripheral neuropathies and the next frontier: looking backwards to progress forwards. J Neurol Neurosurg Psychiatry 2024:jnnp-2024-333436. [PMID: 38744462 DOI: 10.1136/jnnp-2024-333436] [Received: 01/18/2024] [Accepted: 04/10/2024] [Indexed: 05/16/2024]
Abstract
Inherited peripheral neuropathies (IPNs) encompass a clinically and genetically heterogeneous group of disorders causing length-dependent degeneration of peripheral autonomic, motor and/or sensory nerves. Despite gold-standard diagnostic testing for pathogenic variants in over 100 known associated genes, many patients with IPN remain genetically unsolved. Providing patients with a diagnosis is critical for reducing their 'diagnostic odyssey', improving clinical care, and for informed genetic counselling. The last decade of massively parallel sequencing technologies has seen a rapid increase in the number of newly described IPN-associated gene variants contributing to IPN pathogenesis. However, the scarcity of additional families and functional data supporting variants in potential novel genes is prolonging patient diagnostic uncertainty and contributing to the missing heritability of IPNs. We review the last decade of IPN disease gene discovery to highlight novel genes, structural variation and short tandem repeat expansions contributing to IPN pathogenesis. From the lessons learnt, we provide our vision for IPN research as we anticipate the future, providing examples of emerging technologies, resources and tools that we propose that will expedite the genetic diagnosis of unsolved IPN families.
Affiliation(s)
- Jevin M Parmar
- Rare Disease Genetics and Functional Genomics, Harry Perkins Institute of Medical Research, Perth, Western Australia, Australia
- Centre for Medical Research, Faculty of Health and Medical Sciences, The University of Western Australia, Perth, Western Australia, Australia
- Nigel G Laing
- Centre for Medical Research, Faculty of Health and Medical Sciences, The University of Western Australia, Perth, Western Australia, Australia
- Preventive Genetics, Harry Perkins Institute of Medical Research, Perth, Western Australia, Australia
- Marina L Kennerson
- Northcott Neuroscience Laboratory, ANZAC Research Institute, Concord, New South Wales, Australia
- Molecular Medicine Laboratory, Concord Hospital, Concord, New South Wales, Australia
- Gianina Ravenscroft
- Rare Disease Genetics and Functional Genomics, Harry Perkins Institute of Medical Research, Perth, Western Australia, Australia
- Centre for Medical Research, Faculty of Health and Medical Sciences, The University of Western Australia, Perth, Western Australia, Australia
3
Kosugi S, Terao C. Comparative evaluation of SNVs, indels, and structural variations detected with short- and long-read sequencing data. Hum Genome Var 2024; 11:18. [PMID: 38632226 PMCID: PMC11024196 DOI: 10.1038/s41439-024-00276-x] [Received: 01/18/2024] [Revised: 03/12/2024] [Accepted: 03/20/2024] [Indexed: 04/19/2024] [Open access]
Abstract
Short- and long-read sequencing technologies are routinely used to detect DNA variants, including SNVs, indels, and structural variations (SVs). However, the differences in the quality and quantity of variants detected between short- and long-read data are not fully understood. In this study, we comprehensively evaluated the variant calling performance of short- and long-read-based SNV, indel, and SV detection algorithms (6 for SNVs, 12 for indels, and 13 for SVs) using a novel evaluation framework incorporating manual visual inspection. The results showed that indel-insertion calls greater than 10 bp were poorly detected by short-read-based detection algorithms compared to long-read-based algorithms; however, the recall and precision of SNV and indel-deletion detection were similar between short- and long-read data. The recall of SV detection with short-read-based algorithms was significantly lower in repetitive regions, especially for small- to intermediate-sized SVs, than that detected with long-read-based algorithms. In contrast, the recall and precision of SV detection in nonrepetitive regions were similar between short- and long-read data. These findings suggest the need for refined strategies, such as incorporating multiple variant detection algorithms, to generate a more complete set of variants using short-read data.
Affiliation(s)
- Shunichi Kosugi
- Center for Genome Informatics, Research Organization of Information and Systems, Joint Support-Center for Data Science Research, Shizuoka, Japan
- Advanced Genomics Center, National Institute of Genetics, Shizuoka, Japan
- Laboratory for Statistical and Translational Genetics, RIKEN Center for Integrative Medical Sciences, Yokohama, Kanagawa, Japan
- Clinical Research Center, Shizuoka General Hospital, Shizuoka, Japan
- Chikashi Terao
- Laboratory for Statistical and Translational Genetics, RIKEN Center for Integrative Medical Sciences, Yokohama, Kanagawa, Japan
- Clinical Research Center, Shizuoka General Hospital, Shizuoka, Japan
- The Department of Applied Genetics, The School of Pharmaceutical Sciences, University of Shizuoka, Shizuoka, Japan
4
Du ZZ, He JB, Jiao WB. A comprehensive benchmark of graph-based genetic variant genotyping algorithms on plant genomes for creating an accurate ensemble pipeline. Genome Biol 2024; 25:91. [PMID: 38589937 PMCID: PMC11003132 DOI: 10.1186/s13059-024-03239-1] [Received: 07/19/2023] [Accepted: 04/04/2024] [Indexed: 04/10/2024] [Open access]
Abstract
BACKGROUND: Although sequencing technologies have boosted the measurement of the genomic diversity of plant crops, it remains challenging to accurately genotype millions of genetic variants, especially structural variations, with only short reads. In recent years, many graph-based variation genotyping methods have been developed to address this issue and tested for human genomes. However, their performance in plant genomes remains largely elusive. Furthermore, pipelines integrating the advantages of current genotyping methods might be required, considering the different complexity of plant genomes.
RESULTS: Here we comprehensively evaluate eight such genotypers in different scenarios in terms of variant type and size, sequencing parameters, genomic context, and complexity, as well as graph size, using both simulated and real data sets from representative plant genomes. Our evaluation reveals that there are still great challenges to applying existing methods to plants, such as excessive repeats and variants or high resource consumption. Therefore, we propose a pipeline called Ensemble Variant Genotyper (EVG) that can achieve better genotyping performance in almost all experimental scenarios and comparably higher genotyping recall and precision even using 5× reads. Furthermore, we demonstrate that EVG is more robust with an increasing number of graphed genomes, especially for insertions and deletions.
CONCLUSIONS: Our study will provide new insights into the development and application of graph-based genotyping algorithms. We conclude that EVG provides an accurate, unbiased, and cost-effective way for genotyping both small and large variations and will be potentially used in population-scale genotyping for large, repetitive, and heterozygous plant genomes.
Affiliation(s)
- Ze-Zhen Du
- National Key Laboratory for Germplasm Innovation & Utilization of Horticultural Crops, Huazhong Agricultural University, Wuhan, China
- Hubei Hongshan Laboratory, Wuhan, China
- Jia-Bao He
- National Key Laboratory for Germplasm Innovation & Utilization of Horticultural Crops, Huazhong Agricultural University, Wuhan, China
- Hubei Hongshan Laboratory, Wuhan, China
- Wen-Biao Jiao
- National Key Laboratory for Germplasm Innovation & Utilization of Horticultural Crops, Huazhong Agricultural University, Wuhan, China
- Hubei Hongshan Laboratory, Wuhan, China
5
David G, Bertolotti A, Layer R, Scofield D, Hayward A, Baril T, Burnett HA, Gudmunds E, Jensen H, Husby A. Calling Structural Variants with Confidence from Short-Read Data in Wild Bird Populations. Genome Biol Evol 2024; 16:evae049. [PMID: 38489588 PMCID: PMC11018544 DOI: 10.1093/gbe/evae049] [Received: 11/29/2022] [Revised: 02/28/2024] [Accepted: 03/07/2024] [Indexed: 03/17/2024] [Open access]
Abstract
Comprehensive characterization of structural variation in natural populations has only become feasible in the last decade. To investigate the population genomic nature of structural variation, reproducible and high-confidence structural variation callsets are first required. We created a population-scale reference of the genome-wide landscape of structural variation across 33 Nordic house sparrows (Passer domesticus). To produce a consensus callset across all samples using short-read data, we compare heuristic-based quality filtering and visual curation (Samplot/PlotCritic and Samplot-ML) approaches. We demonstrate that curation of structural variants is important for reducing putative false positives and that the time invested in this step outweighs the potential costs of analyzing short-read-discovered structural variation data sets that include many potential false positives. We find that even a lenient manual curation strategy (e.g. applied by a single curator) can reduce the proportion of putative false positives by up to 80%, thus enriching the proportion of high-confidence variants. Crucially, in applying a lenient manual curation strategy with a single curator, nearly all (>99%) variants rejected as putative false positives were also classified as such by a more stringent curation strategy using three additional curators. Furthermore, variants rejected by manual curation failed to reflect the expected population structure from SNPs, whereas variants passing curation did. Combining heuristic-based quality filtering with rapid manual curation of structural variants in short-read data can therefore become a time- and cost-effective first step for functional and population genomic studies requiring high-confidence structural variation callsets.
Affiliation(s)
- Gabriel David
- Department of Ecology and Genetics, Evolutionary Biology Centre, Uppsala University, Uppsala, Sweden
- Ryan Layer
- BioFrontiers Institute, University of Colorado, Boulder, CO, USA
- Department of Computer Science, University of Colorado, Boulder, CO, USA
- Douglas Scofield
- Department of Ecology and Genetics, Evolutionary Biology Centre, Uppsala University, Uppsala, Sweden
- Alexander Hayward
- Centre for Ecology and Conservation, University of Exeter, Penryn Campus, Penryn, Cornwall, UK
- Tobias Baril
- Centre for Ecology and Conservation, University of Exeter, Penryn Campus, Penryn, Cornwall, UK
- Hamish A Burnett
- Centre for Biodiversity Dynamics, Department of Biology, Norwegian University of Science and Technology, Trondheim, Norway
- Erik Gudmunds
- Department of Ecology and Genetics, Evolutionary Biology Centre, Uppsala University, Uppsala, Sweden
- Henrik Jensen
- Centre for Biodiversity Dynamics, Department of Biology, Norwegian University of Science and Technology, Trondheim, Norway
- Arild Husby
- Department of Ecology and Genetics, Evolutionary Biology Centre, Uppsala University, Uppsala, Sweden
6
Li X, Liu Q, Fu C, Li M, Li C, Li X, Zhao S, Zheng Z. Characterizing structural variants based on graph-genotyping provides insights into pig domestication and local adaption. J Genet Genomics 2024; 51:394-406. [PMID: 38056526 DOI: 10.1016/j.jgg.2023.11.005] [Received: 07/14/2023] [Revised: 11/23/2023] [Accepted: 11/24/2023] [Indexed: 12/08/2023]
Abstract
Structural variants (SVs), such as deletions (DELs) and insertions (INSs), contribute substantially to pig genetic diversity and phenotypic variation. Using a library of SVs discovered from long-read primary assemblies and short-read sequenced genomes, we map pig genomic SVs with a graph-based method for re-genotyping SVs in 402 genomes. Our results demonstrate that those SVs harboring specific trait-associated genes may greatly shape pig domestication and local adaptation. Further characterization of SVs reveals that some population-stratified SVs may alter the transcription of genes by affecting regulatory elements. We identify that the genotypes of two DELs (296-bp DEL, chr7: 52,172,101-52,172,397; 278-bp DEL, chr18: 23,840,143-23,840,421) located in muscle-specific enhancers are associated with the expression of target genes related to meat quality (FSD2) and muscle fiber hypertrophy (LMOD2 and WASL) in pigs. Our results highlight the role of SVs in domestic porcine evolution, and the identified candidate functional genes and SVs are valuable resources for future genomic research and breeding programs in pigs.
Affiliation(s)
- Xin Li
- Key Lab of Agricultural Animal Genetics, Breeding and Reproduction of Ministry of Education and Key Laboratory of Swine Genetics and Breeding of Ministry of Agriculture, College of Animal Science and Technology, Huazhong Agricultural University, Wuhan, Hubei 430070, China
- Quan Liu
- Key Lab of Agricultural Animal Genetics, Breeding and Reproduction of Ministry of Education and Key Laboratory of Swine Genetics and Breeding of Ministry of Agriculture, College of Animal Science and Technology, Huazhong Agricultural University, Wuhan, Hubei 430070, China
- Chong Fu
- Key Lab of Agricultural Animal Genetics, Breeding and Reproduction of Ministry of Education and Key Laboratory of Swine Genetics and Breeding of Ministry of Agriculture, College of Animal Science and Technology, Huazhong Agricultural University, Wuhan, Hubei 430070, China
- Mengxun Li
- Key Lab of Agricultural Animal Genetics, Breeding and Reproduction of Ministry of Education and Key Laboratory of Swine Genetics and Breeding of Ministry of Agriculture, College of Animal Science and Technology, Huazhong Agricultural University, Wuhan, Hubei 430070, China
- Changchun Li
- Key Lab of Agricultural Animal Genetics, Breeding and Reproduction of Ministry of Education and Key Laboratory of Swine Genetics and Breeding of Ministry of Agriculture, College of Animal Science and Technology, Huazhong Agricultural University, Wuhan, Hubei 430070, China; The Cooperative Innovation Center for Sustainable Pig Production, Wuhan, Hubei 430070, China
- Xinyun Li
- Key Lab of Agricultural Animal Genetics, Breeding and Reproduction of Ministry of Education and Key Laboratory of Swine Genetics and Breeding of Ministry of Agriculture, College of Animal Science and Technology, Huazhong Agricultural University, Wuhan, Hubei 430070, China; The Cooperative Innovation Center for Sustainable Pig Production, Wuhan, Hubei 430070, China; Hubei Hongshan Laboratory, Wuhan, Hubei 430070, China
- Shuhong Zhao
- Key Lab of Agricultural Animal Genetics, Breeding and Reproduction of Ministry of Education and Key Laboratory of Swine Genetics and Breeding of Ministry of Agriculture, College of Animal Science and Technology, Huazhong Agricultural University, Wuhan, Hubei 430070, China; The Cooperative Innovation Center for Sustainable Pig Production, Wuhan, Hubei 430070, China; Hubei Hongshan Laboratory, Wuhan, Hubei 430070, China
- Zhuqing Zheng
- Key Lab of Agricultural Animal Genetics, Breeding and Reproduction of Ministry of Education and Key Laboratory of Swine Genetics and Breeding of Ministry of Agriculture, College of Animal Science and Technology, Huazhong Agricultural University, Wuhan, Hubei 430070, China; Institute of Agricultural Biotechnology, Jingchu University of Technology, Jingmen, Hubei 448000, China
7
Joe S, Park JL, Kim J, Kim S, Park JH, Yeo MK, Lee D, Yang JO, Kim SY. Comparison of structural variant callers for massive whole-genome sequence data. BMC Genomics 2024; 25:318. [PMID: 38549092 PMCID: PMC10976732 DOI: 10.1186/s12864-024-10239-9] [Received: 07/11/2023] [Accepted: 03/18/2024] [Indexed: 04/01/2024] [Open access]
Abstract
BACKGROUND: Detecting structural variations (SVs) at the population level using next-generation sequencing (NGS) requires substantial computational resources and processing time. Here, we compared the performances of 11 SV callers: Delly, Manta, GridSS, Wham, Sniffles, Lumpy, SvABA, Canvas, CNVnator, MELT, and INSurVeyor. These SV callers have been recently published and have been widely employed for processing massive whole-genome sequencing datasets. We evaluated the accuracy, sequence depth, running time, and memory usage of the SV callers.
RESULTS: Notably, several callers exhibited better calling performance for deletions than for duplications, inversions, and insertions. Among the SV callers, Manta identified deletion SVs with better performance and efficient computing resources, and both Manta and MELT demonstrated relatively good precision regarding calling insertions. We confirmed that the copy number variation callers, Canvas and CNVnator, exhibited better performance in identifying long duplications as they employ the read-depth approach. Finally, we also verified the genotypes inferred from each SV caller using a phased long-read assembly dataset, and Manta showed the highest concordance in terms of the deletions and insertions.
CONCLUSIONS: Our findings provide a comprehensive understanding of the accuracy and computational efficiency of SV callers, thereby facilitating integrative analysis of SV profiles in diverse large-scale genomic datasets.
Grants
- NRF-2020M3E5D708517212, 2020M3A9I6A0103605713 Ministry of Science and ICT, South Korea
- NTIS-1711170620 KRIBB Research Initiative Program
Affiliation(s)
- Soobok Joe
- Korea Bioinformation Center (KOBIC), Korea Research Institute of Bioscience and Biotechnology (KRIBB), Daejeon, 34141, Republic of Korea
- Jong-Lyul Park
- Aging Convergence Research Center, Korea Research Institute of Bioscience and Biotechnology (KRIBB), Daejeon, 34141, Republic of Korea
- Department of Functional Genomics, University of Science and Technology (UST), Daejeon, 34113, Republic of Korea
- Jun Kim
- Department of Convergent Bioscience and Informatics, College of Bioscience and Biotechnology, Chungnam National University, Daejeon, 34134, Republic of Korea
- Sangok Kim
- Korea Bioinformation Center (KOBIC), Korea Research Institute of Bioscience and Biotechnology (KRIBB), Daejeon, 34141, Republic of Korea
- Ji-Hwan Park
- Korea Bioinformation Center (KOBIC), Korea Research Institute of Bioscience and Biotechnology (KRIBB), Daejeon, 34141, Republic of Korea
- Department of Bioscience, University of Science and Technology (UST), Daejeon, 34113, Republic of Korea
- Min-Kyung Yeo
- Department of Pathology, Chungnam National University School of Medicine, Daejeon, 35015, Republic of Korea
- Dongyoon Lee
- Korea Bioinformation Center (KOBIC), Korea Research Institute of Bioscience and Biotechnology (KRIBB), Daejeon, 34141, Republic of Korea
- Jin Ok Yang
- Korea Bioinformation Center (KOBIC), Korea Research Institute of Bioscience and Biotechnology (KRIBB), Daejeon, 34141, Republic of Korea
- Department of Bio and Brain Engineering, Korea Advanced Institute of Science and Technology (KAIST), Daejeon, 34141, Republic of Korea
- Seon-Young Kim
- Korea Bioinformation Center (KOBIC), Korea Research Institute of Bioscience and Biotechnology (KRIBB), Daejeon, 34141, Republic of Korea
- Department of Bioscience, University of Science and Technology (UST), Daejeon, 34113, Republic of Korea
8
Gustafson JA, Gibson SB, Damaraju N, Zalusky MPG, Hoekzema K, Twesigomwe D, Yang L, Snead AA, Richmond PA, De Coster W, Olson ND, Guarracino A, Li Q, Miller AL, Goffena J, Anderson Z, Storz SHR, Ward SA, Sinha M, Gonzaga-Jauregui C, Clarke WE, Basile AO, Corvelo A, Reeves C, Helland A, Musunuri RL, Revsine M, Patterson KE, Paschal CR, Zakarian C, Goodwin S, Jensen TD, Robb E, McCombie WR, Sedlazeck FJ, Zook JM, Montgomery SB, Garrison E, Kolmogorov M, Schatz MC, McLaughlin RN, Dashnow H, Zody MC, Loose M, Jain M, Eichler EE, Miller DE. Nanopore sequencing of 1000 Genomes Project samples to build a comprehensive catalog of human genetic variation. medRxiv 2024:2024.03.05.24303792. [PMID: 38496498 PMCID: PMC10942501 DOI: 10.1101/2024.03.05.24303792] [Indexed: 03/19/2024]
Abstract
Less than half of individuals with a suspected Mendelian condition receive a precise molecular diagnosis after comprehensive clinical genetic testing. Improvements in data quality and costs have heightened interest in using long-read sequencing (LRS) to streamline clinical genomic testing, but the absence of control datasets for variant filtering and prioritization has made tertiary analysis of LRS data challenging. To address this, the 1000 Genomes Project ONT Sequencing Consortium aims to generate LRS data from at least 800 of the 1000 Genomes Project samples. Our goal is to use LRS to identify a broader spectrum of variation so we may improve our understanding of normal patterns of human variation. Here, we present data from analysis of the first 100 samples, representing all 5 superpopulations and 19 subpopulations. These samples, sequenced to an average depth of coverage of 37x and sequence read N50 of 54 kbp, have high concordance with previous studies for identifying single nucleotide and indel variants outside of homopolymer regions. Using multiple structural variant (SV) callers, we identify an average of 24,543 high-confidence SVs per genome, including shared and private SVs likely to disrupt gene function as well as pathogenic expansions within disease-associated repeats that were not detected using short reads. Evaluation of methylation signatures revealed expected patterns at known imprinted loci, samples with skewed X-inactivation patterns, and novel differentially methylated regions. All raw sequencing data, processed data, and summary statistics are publicly available, providing a valuable resource for the clinical genetics community to discover pathogenic SVs.
Affiliation(s)
- Jonas A. Gustafson
- Division of Genetic Medicine, Department of Pediatrics, University of Washington, Seattle, WA, USA
- Molecular and Cellular Biology Program, University of Washington, Seattle, WA, USA
- Sophia B. Gibson
- Division of Genetic Medicine, Department of Pediatrics, University of Washington, Seattle, WA, USA
- Department of Genome Sciences, University of Washington, Seattle, WA, USA
- Nikhita Damaraju
- Division of Genetic Medicine, Department of Pediatrics, University of Washington, Seattle, WA, USA
- Institute for Public Health Genetics, University of Washington, Seattle, WA, USA
- Miranda PG Zalusky
- Division of Genetic Medicine, Department of Pediatrics, University of Washington, Seattle, WA, USA
- Kendra Hoekzema
- Department of Genome Sciences, University of Washington, Seattle, WA, USA
- David Twesigomwe
- Sydney Brenner Institute for Molecular Bioscience, Faculty of Health Sciences, University of the Witwatersrand, Johannesburg, South Africa
- Lei Yang
- Pacific Northwest Research Institute, Seattle, WA, USA
- Wouter De Coster
- Applied and Translational Neurogenomics Group, VIB Center for Molecular Neurology, VIB, Antwerp, Belgium
- Department of Biomedical Sciences, University of Antwerp, Antwerp, Belgium
- Nathan D. Olson
- Material Measurement Laboratory, National Institute of Standards and Technology, Gaithersburg, MD, USA
- Andrea Guarracino
- Department of Genetics, Genomics and Informatics, University of Tennessee Health Science Center, Memphis, TN, USA
- Human Technopole, Milan, Italy
- Qiuhui Li
- Department of Computer Science, Johns Hopkins University, Baltimore, MD, USA
- Angela L. Miller
- Division of Genetic Medicine, Department of Pediatrics, University of Washington, Seattle, WA, USA
- Joy Goffena
- Division of Genetic Medicine, Department of Pediatrics, University of Washington, Seattle, WA, USA
- Zachery Anderson
- Division of Genetic Medicine, Department of Pediatrics, University of Washington, Seattle, WA, USA
- Sophie HR Storz
- Division of Genetic Medicine, Department of Pediatrics, University of Washington, Seattle, WA, USA
- Sydney A. Ward
- Division of Genetic Medicine, Department of Pediatrics, University of Washington, Seattle, WA, USA
- Maisha Sinha
- Division of Genetic Medicine, Department of Pediatrics, University of Washington, Seattle, WA, USA
- Claudia Gonzaga-Jauregui
- International Laboratory for Human Genome Research, Laboratorio Internacional de Investigación sobre el Genoma Humano, Universidad Nacional Autónoma de México
- Wayne E. Clarke
- New York Genome Center, New York, NY, USA
- Outlier Informatics Inc., Saskatoon, SK, Canada
- Mahler Revsine
- Department of Computer Science, Johns Hopkins University, Baltimore, MD, USA
- Cate R. Paschal
- Department of Laboratories, Seattle Children’s Hospital, Seattle, WA, USA
- Department of Laboratory Medicine and Pathology, University of Washington, Seattle, WA, USA
| | - Christina Zakarian
- Department of Genome Sciences, University of Washington, Seattle, WA, USA
| | - Sara Goodwin
- Cold Spring Harbor Laboratory, Cold Spring Harbor, NY, USA
| | | | - Esther Robb
- Department of Computer Science, Stanford University, Stanford, CA, USA
| | | | | | | | | | - Fritz J. Sedlazeck
- Human Genome Sequencing Center Baylor College of Medicine, Houston, TX, USA
- Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX, USA
- Department of Computer Science, Rice University, Houston, TX, USA
| | - Justin M. Zook
- Material Measurement Laboratory, National Institute of Standards and Technology, Gaithersburg, MD, USA
| | | | - Erik Garrison
- Department of Genetics, Genomics and Informatics, University of Tennessee Health Science Center, Memphis, TN, USA
| | - Mikhail Kolmogorov
- Cancer Data Science Laboratory, National Cancer Institute, NIH, Bethesda, MD, USA
| | - Michael C. Schatz
- Department of Computer Science, Johns Hopkins University, Baltimore, MD, USA
| | - Richard N. McLaughlin
- Molecular and Cellular Biology Program, University of Washington, Seattle, WA, USA
- Pacific Northwest Research Institute, Seattle, WA, USA
| | - Harriet Dashnow
- Department of Human Genetics, University of Utah, Salt Lake City, UT, USA
- Department of Biomedical Informatics, University of Colorado School of Medicine, Aurora, CO, USA
| | | | - Matt Loose
- Deep Seq, School of Life Sciences, University of Nottingham, Nottingham, England
| | - Miten Jain
- Department of Bioengineering, Department of Physics, Khoury College of Computer Sciences, Northeastern University, Boston, MA
| | - Evan E. Eichler
- Department of Genome Sciences, University of Washington, Seattle, WA, USA
- Brotman Baty Institute for Precision Medicine, University of Washington, Seattle, WA, USA
- Howard Hughes Medical Institute, University of Washington, Seattle, WA, USA
| | - Danny E. Miller
- Division of Genetic Medicine, Department of Pediatrics, University of Washington, Seattle, WA, USA
- Department of Laboratory Medicine and Pathology, University of Washington, Seattle, WA, USA
- Brotman Baty Institute for Precision Medicine, University of Washington, Seattle, WA, USA
|
9
|
Olivucci G, Iovino E, Innella G, Turchetti D, Pippucci T, Magini P. Long read sequencing on its way to the routine diagnostics of genetic diseases. Front Genet 2024; 15:1374860. [PMID: 38510277 PMCID: PMC10951082 DOI: 10.3389/fgene.2024.1374860] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/22/2024] [Accepted: 02/26/2024] [Indexed: 03/22/2024] Open
Abstract
The clinical application of technological progress in the identification of DNA alterations has always led to improvements in diagnostic yield in genetic medicine. At the chromosome level, from cytogenetic techniques evaluating number and gross structural defects to genomic microarrays detecting cryptic copy number variants, and at the molecular level, from the Sanger method studying the nucleotide sequence of single genes to high-throughput next-generation sequencing (NGS) technologies, resolution and sensitivity have progressively increased, considerably expanding the range of detectable DNA anomalies and, alongside it, of Mendelian disorders with known genetic causes. However, particular genomic regions (i.e., repetitive and GC-rich sequences) are inefficiently analyzed by standard genetic tests, which still rely on laborious, time-consuming and low-sensitivity approaches (i.e., Southern blot for repeat expansion or long-PCR for genes with highly homologous pseudogenes), accounting for at least part of the patients with undiagnosed genetic disorders. Third-generation sequencing, generating long reads with improved mappability, is more suitable for the detection of structural alterations and defects in poorly accessible genomic regions. Although recently implemented and not yet clinically available, long read sequencing (LRS) technologies have already shown their potential in genetic medicine research, which might greatly impact diagnostic yield and reporting times through their translation to clinical settings.
The most investigated LRS application concerns the identification of structural variants and repeat expansions, probably because techniques for their detection have not evolved as rapidly as those dedicated to single nucleotide variant (SNV) identification: gold standard analyses are karyotyping and microarrays for balanced and unbalanced chromosome rearrangements, respectively, and Southern blot and repeat-primed PCR for the amplification and sizing of expanded alleles, impaired by limited resolution and sensitivity that have not been significantly improved by the advent of NGS. Nevertheless, more recently, with the increased accuracy provided by the latest product releases, LRS has also been tested for SNV detection, especially in genes with highly homologous pseudogenes and for haplotype reconstruction to assess the parental origin of alleles with de novo pathogenic variants. We provide a review of relevant recent scientific papers exploring LRS potential in the diagnosis of genetic diseases and its potential future applications in routine genetic testing.
Affiliation(s)
- Giulia Olivucci
- IRCCS Azienda Ospedaliero-Universitaria di Bologna, Bologna, Italy
- Department of Surgical and Oncological Sciences, University of Palermo, Palermo, Italy
| | - Emanuela Iovino
- IRCCS Azienda Ospedaliero-Universitaria di Bologna, Bologna, Italy
| | - Giovanni Innella
- Department of Medical and Surgical Sciences (DIMEC), University of Bologna, Bologna, Italy
- Medical Genetics Unit, IRCCS Azienda Ospedaliero-Universitaria di Bologna, Bologna, Italy
| | - Daniela Turchetti
- Department of Medical and Surgical Sciences (DIMEC), University of Bologna, Bologna, Italy
- Medical Genetics Unit, IRCCS Azienda Ospedaliero-Universitaria di Bologna, Bologna, Italy
| | - Tommaso Pippucci
- IRCCS Azienda Ospedaliero-Universitaria di Bologna, Bologna, Italy
| | - Pamela Magini
- Medical Genetics Unit, IRCCS Azienda Ospedaliero-Universitaria di Bologna, Bologna, Italy
|
10
|
Linderman MD, Wallace J, van der Heyde A, Wieman E, Brey D, Shi Y, Hansen P, Shamsi Z, Liu J, Gelb BD, Bashir A. NPSV-deep: a deep learning method for genotyping structural variants in short read genome sequencing data. Bioinformatics 2024; 40:btae129. [PMID: 38444093 PMCID: PMC10955255 DOI: 10.1093/bioinformatics/btae129] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/24/2023] [Revised: 01/15/2024] [Accepted: 03/04/2024] [Indexed: 03/07/2024] Open
Abstract
MOTIVATION Structural variants (SVs) play a causal role in numerous diseases but can be difficult to detect and accurately genotype (determine zygosity) with short-read genome sequencing data (SRS). Improving SV genotyping accuracy in SRS data, particularly for the many SVs first detected with long-read sequencing, will improve our understanding of genetic variation. RESULTS NPSV-deep is a deep learning-based approach for genotyping previously reported insertion and deletion SVs that recasts this task as an image similarity problem. NPSV-deep predicts the SV genotype based on the similarity between pileup images generated from the actual SRS data and matching SRS simulations. We show that NPSV-deep consistently matches or improves upon the state-of-the-art for SV genotyping accuracy across different SV call sets, samples and variant types, including a 25% reduction in genotyping errors for the Genome-in-a-Bottle (GIAB) high-confidence SVs. NPSV-deep is not limited to the SVs as described; it improves deletion genotyping concordance a further 1.5 percentage points for GIAB SVs (92%) by automatically correcting imprecise/incorrectly described SVs. AVAILABILITY AND IMPLEMENTATION Python/C++ source code and pre-trained models freely available at https://github.com/mlinderm/npsv2.
Affiliation(s)
- Michael D Linderman
- Department of Computer Science, Middlebury College, Middlebury, VT 05753, United States
| | - Jacob Wallace
- Department of Computer Science, Middlebury College, Middlebury, VT 05753, United States
| | - Alderik van der Heyde
- Department of Computer Science, Middlebury College, Middlebury, VT 05753, United States
| | - Eliza Wieman
- Department of Computer Science, Middlebury College, Middlebury, VT 05753, United States
| | - Daniel Brey
- Department of Computer Science, Middlebury College, Middlebury, VT 05753, United States
| | - Yiran Shi
- Department of Computer Science, Middlebury College, Middlebury, VT 05753, United States
| | - Peter Hansen
- Department of Computer Science, Middlebury College, Middlebury, VT 05753, United States
| | | | | | - Bruce D Gelb
- Mindich Child Health and Development Institute and the Departments of Pediatrics and Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, New York, NY 10029, United States
| | - Ali Bashir
- Google, Mountain View, CA 94043, United States
|
11
|
Klaper K, Tlapák H, Selb R, Jansen K, Heuer D. Integrated molecular, phenotypic and epidemiological surveillance of antimicrobial resistance in Neisseria gonorrhoeae in Germany. Int J Med Microbiol 2024; 314:151611. [PMID: 38309143 DOI: 10.1016/j.ijmm.2024.151611] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/02/2023] [Revised: 01/17/2024] [Accepted: 01/22/2024] [Indexed: 02/05/2024] Open
Abstract
Infections with Neisseria gonorrhoeae are among the three most common sexually transmitted infections (STIs) worldwide. In addition, the emergence and spread of antimicrobial resistance (AMR) in Neisseria gonorrhoeae pose an important public-health issue. The integration of genomic, phenotypic and epidemiological data to monitor Neisseria gonorrhoeae fosters our understanding of the emergence and spread of AMR in Neisseria gonorrhoeae and helps to inform therapy guidelines and intervention strategies. Thus, the Gonococcal resistance surveillance (Go-Surv-AMR) was implemented at the Robert Koch Institute in Germany in 2021 to obtain molecular, phenotypic and epidemiological data on Neisseria gonorrhoeae isolated in Germany. Here, we describe the structure and aims of Go-Surv-AMR. Furthermore, we point out future directions of Go-Surv-AMR to improve the integrated genomic surveillance of Neisseria gonorrhoeae. In this context, we discuss current and prospective sequencing approaches and the information derived from their application. Moreover, we highlight the importance of combining phenotypic and WGS data to monitor the evolution of AMR in Neisseria gonorrhoeae in Germany. The implementation and constant development of techniques and tools to improve the genomic surveillance of Neisseria gonorrhoeae will be important in the coming years.
Affiliation(s)
- Kathleen Klaper
- Department Infectious Diseases, Unit 18 'Sexually transmitted bacterial pathogens and HIV', Robert Koch Institute, Berlin, Germany
| | - Hana Tlapák
- Department Infectious Diseases, Unit 18 'Sexually transmitted bacterial pathogens and HIV', Robert Koch Institute, Berlin, Germany
| | - Regina Selb
- Department of Infectious Disease Epidemiology, Unit 34 'HIV/AIDS, STI and Blood-borne Infections', Robert Koch Institute, Berlin, Germany
| | - Klaus Jansen
- Department of Infectious Disease Epidemiology, Unit 34 'HIV/AIDS, STI and Blood-borne Infections', Robert Koch Institute, Berlin, Germany
| | - Dagmar Heuer
- Department Infectious Diseases, Unit 18 'Sexually transmitted bacterial pathogens and HIV', Robert Koch Institute, Berlin, Germany.
|
12
|
Torres DE, Kramer HM, Tracanna V, Fiorin GL, Cook DE, Seidl MF, Thomma BPHJ. Implications of the three-dimensional chromatin organization for genome evolution in a fungal plant pathogen. Nat Commun 2024; 15:1701. [PMID: 38402218 PMCID: PMC10894299 DOI: 10.1038/s41467-024-45884-x] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/14/2023] [Accepted: 02/05/2024] [Indexed: 02/26/2024] Open
Abstract
The spatial organization of eukaryotic genomes is linked to their biological functions, although it is not clear how this impacts the overall evolution of a genome. Here, we uncover the three-dimensional (3D) genome organization of the phytopathogen Verticillium dahliae, known to possess distinct genomic regions, designated adaptive genomic regions (AGRs), enriched in transposable elements and genes that mediate host infection. Short-range DNA interactions form clear topologically associating domains (TADs) with gene-rich boundaries that show reduced levels of gene expression and reduced genomic variation. Intriguingly, TADs are less clearly insulated in AGRs than in the core genome. At a global scale, the genome contains bipartite long-range interactions, particularly enriched for AGRs and more generally containing segmental duplications. Notably, the patterns observed for V. dahliae are also present in other Verticillium species. Thus, our analysis links 3D genome organization to evolutionary features conserved throughout the Verticillium genus.
Affiliation(s)
- David E Torres
- Laboratory of Phytopathology, Wageningen University and Research, Droevendaalsesteeg 1, 6708 PB, Wageningen, The Netherlands
- Theoretical Biology & Bioinformatics Group, Department of Biology, Utrecht University, Utrecht, The Netherlands
| | - H Martin Kramer
- Laboratory of Phytopathology, Wageningen University and Research, Droevendaalsesteeg 1, 6708 PB, Wageningen, The Netherlands
| | - Vittorio Tracanna
- University of Cologne, Institute for Plant Sciences, Cluster of Excellence on Plant Sciences (CEPLAS), Cologne, Germany
| | - Gabriel L Fiorin
- Laboratory of Phytopathology, Wageningen University and Research, Droevendaalsesteeg 1, 6708 PB, Wageningen, The Netherlands
| | - David E Cook
- Laboratory of Phytopathology, Wageningen University and Research, Droevendaalsesteeg 1, 6708 PB, Wageningen, The Netherlands
- Department of Plant Pathology, Kansas State University, 1712 Claflin Road, Manhattan, KS, USA
| | - Michael F Seidl
- Laboratory of Phytopathology, Wageningen University and Research, Droevendaalsesteeg 1, 6708 PB, Wageningen, The Netherlands.
- Theoretical Biology & Bioinformatics Group, Department of Biology, Utrecht University, Utrecht, The Netherlands.
| | - Bart P H J Thomma
- Laboratory of Phytopathology, Wageningen University and Research, Droevendaalsesteeg 1, 6708 PB, Wageningen, The Netherlands.
- University of Cologne, Institute for Plant Sciences, Cluster of Excellence on Plant Sciences (CEPLAS), Cologne, Germany.
|
13
|
Singh A, Ramakrishna G, Singh NK, Abdin MZ, Gaikwad K. Genomic insight into variations associated with flowering-time and early-maturity in pigeonpea mutant TAT-10 and its wild type parent T21. Int J Biol Macromol 2024; 257:128559. [PMID: 38061506 DOI: 10.1016/j.ijbiomac.2023.128559] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/13/2023] [Revised: 11/29/2023] [Accepted: 11/30/2023] [Indexed: 12/24/2023]
Abstract
Pigeonpea [Cajanus cajan (L.) Millspaugh] is an important grain legume crop with a broad range of 90 to 300 days for maturity. To identify the genomic variations associated with early maturity, we conducted whole-genome resequencing of an early-maturing pigeonpea mutant TAT-10 and its wild type parent T21. A total of 135.67 and 146.34 million sequencing reads were generated for T21 and TAT-10, respectively. From these resequencing data, 1,397,178 and 1,419,904 SNPs, 276,741 and 292,347 InDels, and 87,583 and 92,903 SVs were identified in T21 and TAT-10, respectively. We identified 203 genes in the pigeonpea genome that are homologs of flowering-related genes in Arabidopsis and found 791 genomic variations unique to TAT-10 linked to 94 flowering-related genes. We identified three candidate genes for early maturity in TAT-10: Suppressor of FRI 4 (SUF4), Early Flowering In Short Days (EFS), and Probable Lysine-Specific Demethylase ELF6. The variations in ELF6 were predicted to be possibly damaging, and the expression profiles of EFS and ELF6 also supported their probable role during early flowering in TAT-10. The present study has generated information on genomic variations associated with candidate genes for early maturity, which can be further studied and exploited for developing early-maturing pigeonpea cultivars.
Affiliation(s)
- Anupam Singh
- ICAR-National Institute for Plant Biotechnology, New Delhi 110012, India; Centre for Transgenic Plant Development, Department of Biotechnology, School of Chemical and Life Sciences, Jamia Hamdard, New Delhi 110062, India
| | | | | | - Malik Zainul Abdin
- Centre for Transgenic Plant Development, Department of Biotechnology, School of Chemical and Life Sciences, Jamia Hamdard, New Delhi 110062, India.
| | - Kishor Gaikwad
- ICAR-National Institute for Plant Biotechnology, New Delhi 110012, India.
|
14
|
Barbitoff YA, Ushakov MO, Lazareva TE, Nasykhova YA, Glotov AS, Predeus AV. Bioinformatics of germline variant discovery for rare disease diagnostics: current approaches and remaining challenges. Brief Bioinform 2024; 25:bbad508. [PMID: 38271481 PMCID: PMC10810331 DOI: 10.1093/bib/bbad508] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/09/2023] [Revised: 11/18/2023] [Accepted: 12/12/2023] [Indexed: 01/27/2024] Open
Abstract
Next-generation sequencing (NGS) has revolutionized the field of rare disease diagnostics. Whole exome and whole genome sequencing are now routinely used for diagnostic purposes; however, the overall diagnosis rate remains lower than expected. In this work, we review current approaches used for calling and interpretation of germline genetic variants in the human genome, and discuss the most important challenges that persist in the bioinformatic analysis of NGS data in medical genetics. We describe and attempt to quantitatively assess the remaining problems, such as the quality of the reference genome sequence, reproducible coverage biases, or variant calling accuracy in complex regions of the genome. We also discuss the prospects of switching to the complete human genome assembly or the human pan-genome and important caveats associated with such a switch. We touch on arguably the hardest problem of NGS data analysis for medical genomics, namely, the annotation of genetic variants and their subsequent interpretation. We highlight the most challenging aspects of annotation and prioritization of both coding and non-coding variants. Finally, we demonstrate the persistent prevalence of pathogenic variants in the coding genome, and outline research directions that may enhance the efficiency of NGS-based disease diagnostics.
Affiliation(s)
- Yury A Barbitoff
- Dpt. of Genomic Medicine, D.O. Ott Research Institute of Obstetrics, Gynaecology, and Reproductology, Mendeleevskaya line 3, 199034, St. Petersburg, Russia
- Bioinformatics Institute, Kentemirovskaya st. 2A, 197342, St. Petersburg, Russia
| | - Mikhail O Ushakov
- Dpt. of Genomic Medicine, D.O. Ott Research Institute of Obstetrics, Gynaecology, and Reproductology, Mendeleevskaya line 3, 199034, St. Petersburg, Russia
| | - Tatyana E Lazareva
- Dpt. of Genomic Medicine, D.O. Ott Research Institute of Obstetrics, Gynaecology, and Reproductology, Mendeleevskaya line 3, 199034, St. Petersburg, Russia
| | - Yulia A Nasykhova
- Dpt. of Genomic Medicine, D.O. Ott Research Institute of Obstetrics, Gynaecology, and Reproductology, Mendeleevskaya line 3, 199034, St. Petersburg, Russia
| | - Andrey S Glotov
- Dpt. of Genomic Medicine, D.O. Ott Research Institute of Obstetrics, Gynaecology, and Reproductology, Mendeleevskaya line 3, 199034, St. Petersburg, Russia
| | - Alexander V Predeus
- Bioinformatics Institute, Kentemirovskaya st. 2A, 197342, St. Petersburg, Russia
|
15
|
Puckelwartz MJ, Pesce LL, Hernandez EJ, Webster G, Dellefave-Castillo LM, Russell MW, Geisler SS, Kearns SD, Karthik F, Etheridge SP, Monroe TO, Pottinger TD, Kannankeril PJ, Shoemaker MB, Fountain D, Roden DM, Faulkner M, MacLeod HM, Burns KM, Yandell M, Tristani-Firouzi M, George AL, McNally EM. The impact of damaging epilepsy and cardiac genetic variant burden in sudden death in the young. Genome Med 2024; 16:13. [PMID: 38229148 PMCID: PMC10792876 DOI: 10.1186/s13073-024-01284-w] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/22/2023] [Accepted: 01/03/2024] [Indexed: 01/18/2024] Open
Abstract
BACKGROUND Sudden unexpected death in children is a tragic event. Understanding the genetics of sudden death in the young (SDY) enables family counseling and cascade screening. The objective of this study was to characterize genetic variation in an SDY cohort using whole genome sequencing. METHODS The SDY Case Registry is a National Institutes of Health/Centers for Disease Control and Prevention surveillance effort to discern the prevalence, causes, and risk factors for SDY. The SDY Case Registry prospectively collected clinical data and DNA biospecimens from SDY cases < 20 years of age. SDY cases were collected from medical examiner and coroner offices spanning 13 US jurisdictions from 2015 to 2019. The cohort included 211 children (median age 0.33 year; range 0-20 years), determined to have died suddenly and unexpectedly and from whom DNA biospecimens for DNA extractions and next-of-kin consent were ascertained. A control cohort consisted of 211 randomly sampled, sex- and ancestry-matched individuals from the 1000 Genomes Project. Genetic variation was evaluated in epilepsy, cardiomyopathy, and arrhythmia genes in the SDY and control cohorts. American College of Medical Genetics/Genomics guidelines were used to classify variants as pathogenic or likely pathogenic. Additionally, pathogenic and likely pathogenic genetic variation was identified using a Bayesian-based artificial intelligence (AI) tool. RESULTS The SDY cohort was 43% European, 29% African, 3% Asian, 16% Hispanic, and 9% with mixed ancestries and 39% female. Six percent of the cohort was found to harbor a pathogenic or likely pathogenic genetic variant in an epilepsy, cardiomyopathy, or arrhythmia gene. The genomes of SDY cases, but not controls, were enriched for rare, potentially damaging variants in epilepsy, cardiomyopathy, and arrhythmia-related genes. A greater number of rare epilepsy genetic variants correlated with younger age at death. 
CONCLUSIONS While damaging cardiomyopathy and arrhythmia genes are recognized contributors to SDY, we also observed an enrichment in epilepsy-related genes in the SDY cohort and a correlation between rare epilepsy variation and younger age at death. These findings emphasize the importance of considering epilepsy genes when evaluating SDY.
Affiliation(s)
- Megan J Puckelwartz
- Department of Pharmacology, Feinberg School of Medicine, Northwestern University, Chicago, IL, USA.
- Center for Genetic Medicine, Feinberg School of Medicine, Northwestern University, Chicago, IL, USA.
| | - Lorenzo L Pesce
- Department of Pharmacology, Feinberg School of Medicine, Northwestern University, Chicago, IL, USA
- Center for Genetic Medicine, Feinberg School of Medicine, Northwestern University, Chicago, IL, USA
| | | | - Gregory Webster
- Division of Cardiology, Department of Pediatrics, Ann & Robert H. Lurie Children's Hospital of Chicago, Chicago, IL, USA
| | | | - Mark W Russell
- Department of Pediatrics, University of Michigan, Ann Arbor, MI, USA
| | - Sarah S Geisler
- Department of Pediatrics, University of Michigan, Ann Arbor, MI, USA
| | - Samuel D Kearns
- Center for Genetic Medicine, Feinberg School of Medicine, Northwestern University, Chicago, IL, USA
| | - Felix Karthik
- Center for Genetic Medicine, Feinberg School of Medicine, Northwestern University, Chicago, IL, USA
| | - Susan P Etheridge
- Division of Pediatric Cardiology, University of Utah, Salt Lake City, UT, USA
| | - Tanner O Monroe
- Center for Genetic Medicine, Feinberg School of Medicine, Northwestern University, Chicago, IL, USA
| | - Tess D Pottinger
- Center for Genetic Medicine, Feinberg School of Medicine, Northwestern University, Chicago, IL, USA
| | - Prince J Kannankeril
- Department of Pediatrics, Vanderbilt University Medical Center, Nashville, TN, USA
| | - M Benjamin Shoemaker
- Department of Medicine, Division of Cardiovascular Medicine, Vanderbilt University Medical Center, Nashville, TN, USA
| | - Darlene Fountain
- Department of Pediatrics, Vanderbilt University Medical Center, Nashville, TN, USA
| | - Dan M Roden
- Departments of Medicine, Pharmacology, and Biomedical Informatics, Vanderbilt University Medical Center, Nashville, TN, USA
| | | | | | - Kristin M Burns
- Division of Cardiovascular Sciences, National Heart, Lung, and Blood Institute, National Institutes of Health, Bethesda, MD, USA
| | - Mark Yandell
- Department of Human Genetics, University of Utah, Salt Lake City, UT, USA
| | | | - Alfred L George
- Department of Pharmacology, Feinberg School of Medicine, Northwestern University, Chicago, IL, USA
| | - Elizabeth M McNally
- Center for Genetic Medicine, Feinberg School of Medicine, Northwestern University, Chicago, IL, USA
|
16
|
Haq SAU, Bashir T, Roberts TH, Husaini AM. Ameliorating the effects of multiple stresses on agronomic traits in crops: modern biotechnological and omics approaches. Mol Biol Rep 2023; 51:41. [PMID: 38158512 DOI: 10.1007/s11033-023-09042-8] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/15/2022] [Accepted: 10/13/2023] [Indexed: 01/03/2024]
Abstract
While global climate change poses a significant environmental threat to agriculture, the increasing population is another big challenge to food security. To address this, developing crop varieties with increased productivity and tolerance to biotic and abiotic stresses is crucial. Breeders must identify traits to ensure higher and consistent yields under inconsistent environmental challenges, possess resilience against emerging biotic and abiotic stresses and satisfy customer demands for safer and more nutritious meals. With the advent of omics-based technologies, molecular tools are now integrated with breeding to understand the molecular genetics of genotype-based traits and develop better climate-smart crops. The rapid development of omics technologies offers an opportunity to generate novel datasets for crop species. Identifying genes and pathways responsible for significant agronomic traits has been made possible by integrating omics data with genetic and phenotypic information. This paper discusses the importance and use of omics-based strategies, including genomics, transcriptomics, proteomics and phenomics, for agricultural and horticultural crop improvement, which aligns with developing better adaptability in these crop species to the changing climate conditions.
Affiliation(s)
- Syed Anam Ul Haq
- Genome Engineering and Societal Biotechnology Lab, Division of Plant Biotechnology, SKUAST-K, Shalimar, Srinagar, Jammu and Kashmir, 190025, India
| | - Tanzeel Bashir
- Genome Engineering and Societal Biotechnology Lab, Division of Plant Biotechnology, SKUAST-K, Shalimar, Srinagar, Jammu and Kashmir, 190025, India
| | - Thomas H Roberts
- Plant Breeding Institute, School of Life and Environmental Sciences, Faculty of Science, Sydney Institute of Agriculture, The University of Sydney, Eveleigh, Australia
| | - Amjad M Husaini
- Genome Engineering and Societal Biotechnology Lab, Division of Plant Biotechnology, SKUAST-K, Shalimar, Srinagar, Jammu and Kashmir, 190025, India.
|
17
|
Shah RK, Cygan E, Kozlik T, Colina A, Zamora AE. Utilizing immunogenomic approaches to prioritize targetable neoantigens for personalized cancer immunotherapy. Front Immunol 2023; 14:1301100. [PMID: 38149253 PMCID: PMC10749952 DOI: 10.3389/fimmu.2023.1301100] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/24/2023] [Accepted: 11/29/2023] [Indexed: 12/28/2023] Open
Abstract
Advancements in sequencing technologies and bioinformatics algorithms have expanded our ability to identify tumor-specific somatic mutation-derived antigens (neoantigens). While recent studies have shown neoantigens to be compelling targets for cancer immunotherapy due to their foreign nature and high immunogenicity, the need for increasingly accurate and cost-effective approaches to rapidly identify neoantigens remains a challenging task, but essential for successful cancer immunotherapy. Currently, gene expression analysis and algorithms for variant calling can be used to generate lists of mutational profiles across patients, but more care is needed to curate these lists and prioritize the candidate neoantigens most capable of inducing an immune response. A growing amount of evidence suggests that only a handful of somatic mutations predicted by mutational profiling approaches act as immunogenic neoantigens. Hence, unbiased screening of all candidate neoantigens predicted by Whole Genome Sequencing/Whole Exome Sequencing may be necessary to more comprehensively access the full spectrum of immunogenic neoepitopes. Once putative cancer neoantigens are identified, one of the largest bottlenecks in translating these neoantigens into actionable targets for cell-based therapies is identifying the cognate T cell receptors (TCRs) capable of recognizing these neoantigens. While many TCR-directed screening and validation assays have utilized bulk samples in the past, there has been a recent surge in the number of single-cell assays that provide a more granular understanding of the factors governing TCR-pMHC interactions. The goal of this review is to provide an overview of existing strategies to identify candidate neoantigens using genomics-based approaches and methods for assessing neoantigen immunogenicity. Additionally, applications, prospects, and limitations of some of the current single-cell technologies will be discussed. 
Finally, we will briefly summarize some of the recent models that have been used to predict TCR antigen specificity and analyze the TCR receptor repertoire.
Affiliation(s)
- Ravi K. Shah
- Department of Medicine, Medical College of Wisconsin, Milwaukee, WI, United States
| | - Erin Cygan
- Department of Microbiology and Immunology, Medical College of Wisconsin, Milwaukee, WI, United States
| | - Tanya Kozlik
- Department of Medicine, Medical College of Wisconsin, Milwaukee, WI, United States
| | - Alfredo Colina
- Department of Microbiology and Immunology, Medical College of Wisconsin, Milwaukee, WI, United States
| | - Anthony E. Zamora
- Department of Medicine, Medical College of Wisconsin, Milwaukee, WI, United States
- Department of Microbiology and Immunology, Medical College of Wisconsin, Milwaukee, WI, United States
18
Meng X, Wang M, Luo M, Sun L, Yan Q, Liu Y. Systematic evaluation of multiple NGS platforms for structural variants detection. J Biol Chem 2023; 299:105436. [PMID: 37944616 PMCID: PMC10724692 DOI: 10.1016/j.jbc.2023.105436] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Received: 09/28/2023] [Revised: 10/29/2023] [Accepted: 10/31/2023] [Indexed: 11/12/2023] Open
Abstract
Structural variations (SVs) are critical genome changes affecting human diseases. Although many hybridization-based methods exist, evaluating SVs through next-generation sequencing (NGS) data is still necessary for broader research exploration. Here, we comprehensively compared the performance of 16 SV callers and multiple NGS platforms using NA12878 whole genome sequencing (WGS) datasets. The results indicated that several SV callers performed relatively well, such as Manta, GRIDSS, LUMPY, TARDIS, FermiKit, and Wham. Meanwhile, all NGS platforms performed similarly when the same software was used. Additionally, we found that the undetected SVs mostly originated from long-read datasets; therefore, a more appropriate strategy for accurate SV detection in the future will be to integrate long and short reads. With NGS currently a mainstream method in bioinformatics, our study provides helpful and comprehensive guidelines for specific categories of SV research.
Affiliation(s)
- Xuan Meng
- School of Medicine, Southern University of Science and Technology, Shenzhen, China
- Miao Wang
- Research Cooperation Department, GeneMind Biosciences Company Limited, Shenzhen, China
- Mingjie Luo
- Research Cooperation Department, GeneMind Biosciences Company Limited, Shenzhen, China
- Lei Sun
- Research Cooperation Department, GeneMind Biosciences Company Limited, Shenzhen, China
- Qin Yan
- Research Cooperation Department, GeneMind Biosciences Company Limited, Shenzhen, China
- Yongfeng Liu
- Research Cooperation Department, GeneMind Biosciences Company Limited, Shenzhen, China
19
Sopic M, Vilne B, Gerdts E, Trindade F, Uchida S, Khatib S, Wettinger SB, Devaux Y, Magni P. Multiomics tools for improved atherosclerotic cardiovascular disease management. Trends Mol Med 2023; 29:983-995. [PMID: 37806854 DOI: 10.1016/j.molmed.2023.09.004] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 08/02/2023] [Revised: 09/20/2023] [Accepted: 09/21/2023] [Indexed: 10/10/2023]
Abstract
Multiomics studies offer accurate preventive and therapeutic strategies for atherosclerotic cardiovascular disease (ASCVD) beyond traditional risk factors. By using artificial intelligence (AI) and machine learning (ML) approaches, it is possible to integrate multiple 'omics and clinical data sets into tools that can be utilized for the development of personalized diagnostic and therapeutic approaches. However, multiple challenges in data quality, integration, and privacy still need to be addressed. In this opinion, we emphasize that joint efforts, exemplified by the AtheroNET COST Action, have a pivotal role in overcoming these challenges to advance multiomics approaches in ASCVD research, with the aim of fostering more precise and effective patient care.
Affiliation(s)
- Miron Sopic
- Cardiovascular Research Unit, Department of Precision Health, 1A-B rue Edison, Luxembourg Institute of Health, L-1445 Strassen, Luxembourg; Department of Medical Biochemistry, Faculty of Pharmacy, University of Belgrade, Belgrade, 11000, Serbia
- Baiba Vilne
- Bioinformatics Laboratory, Rīga Stradiņš University, Rīga, LV-1007, Latvia
- Eva Gerdts
- Center for Research on Cardiac Disease in Women, Department of Clinical Science, University of Bergen, Bergen, 5020, Norway
- Fábio Trindade
- Cardiovascular R&D Centre - UnIC@RISE, Department of Surgery and Physiology, Faculty of Medicine of the University of Porto, Porto, 4099-002, Portugal
- Shizuka Uchida
- Center for RNA Medicine, Department of Clinical Medicine, Aalborg University, Copenhagen, SV, DK-2450, Denmark
- Soliman Khatib
- Natural Compounds and Analytical Chemistry Laboratory, MIGAL-Galilee Research Institute, Kiryat Shemona, 11016, Israel; Department of Biotechnology, Tel-Hai College, Upper Galilee 12210, Israel
- Stephanie Bezzina Wettinger
- Department of Applied Biomedical Science, Faculty of Health Sciences, University of Malta, Msida, 2080, Malta
- Yvan Devaux
- Cardiovascular Research Unit, Department of Precision Health, 1A-B rue Edison, Luxembourg Institute of Health, L-1445 Strassen, Luxembourg
- Paolo Magni
- Department of Pharmacological and Biomolecular Sciences 'Rodolfo Paoletti', Università degli Studi di Milano, Via G. Balzaretti 9, 20133 Milano, Italy; IRCCS MultiMedica, Via Milanese 300, 20099 Sesto S. Giovanni, Milan, Italy
20
Lemay MA, de Ronne M, Bélanger R, Belzile F. k-mer-based GWAS enhances the discovery of causal variants and candidate genes in soybean. The Plant Genome 2023; 16:e20374. [PMID: 37596724 DOI: 10.1002/tpg2.20374] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 04/04/2023] [Accepted: 07/19/2023] [Indexed: 08/20/2023]
Abstract
Genome-wide association studies (GWAS) are powerful statistical methods that detect associations between genotype and phenotype at genome scale. Despite their power, GWAS frequently fail to pinpoint the causal variant or the gene controlling a given trait in crop species. Assessing genetic variants other than single-nucleotide polymorphisms (SNPs) could alleviate this problem. In this study, we tested the potential of structural variant (SV)- and k-mer-based GWAS in soybean by applying these methods as well as conventional SNP/indel-based GWAS to 13 traits. We assessed the performance of each GWAS approach based on loci for which the causal genes or variants were known from previous genetic studies. We found that k-mer-based GWAS was the most versatile approach and the best at pinpointing causal variants or candidate genes. Moreover, k-mer-based analyses identified promising candidate genes for loci related to pod color, pubescence form, and resistance to Phytophthora sojae. In our dataset, SV-based GWAS did not add value compared to k-mer-based GWAS and may not be worth the time and computational resources invested. Despite promising results, significant challenges remain regarding the downstream analysis of k-mer-based GWAS. Notably, better methods are needed to associate significant k-mers with sequence variation. Our results suggest that coupling k-mer- and SNP/indel-based GWAS is a powerful approach for discovering candidate genes in crop species.
Affiliation(s)
- Marc-André Lemay
- Département de phytologie, Université Laval, Québec, QC, Canada
- Institut de biologie intégrative et des systèmes, Université Laval, Québec, QC, Canada
- Centre de recherche et d'innovation sur les végétaux, Université Laval, Québec, QC, Canada
- Maxime de Ronne
- Département de phytologie, Université Laval, Québec, QC, Canada
- Institut de biologie intégrative et des systèmes, Université Laval, Québec, QC, Canada
- Centre de recherche et d'innovation sur les végétaux, Université Laval, Québec, QC, Canada
- Richard Bélanger
- Département de phytologie, Université Laval, Québec, QC, Canada
- Institut de biologie intégrative et des systèmes, Université Laval, Québec, QC, Canada
- Centre de recherche et d'innovation sur les végétaux, Université Laval, Québec, QC, Canada
- François Belzile
- Département de phytologie, Université Laval, Québec, QC, Canada
- Institut de biologie intégrative et des systèmes, Université Laval, Québec, QC, Canada
- Centre de recherche et d'innovation sur les végétaux, Université Laval, Québec, QC, Canada
21
Li L, Hong C, Xu J, Chung CYL, Leung AKY, Boncan DAT, Cheng L, Lo KW, Lai PBS, Wong J, Zhou J, Cheng ASL, Chan TF, Yue F, Yip KY. Accurate identification of structural variations from cancer samples. Brief Bioinform 2023; 25:bbad520. [PMID: 38233091 PMCID: PMC10794023 DOI: 10.1093/bib/bbad520] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/24/2023] [Revised: 12/11/2023] [Accepted: 12/18/2023] [Indexed: 01/19/2024] Open
Abstract
Structural variations (SVs) are commonly found in cancer genomes. They can cause gene amplification, deletion and fusion, among other functional consequences. With an average read length of hundreds of kilobases, nano-channel-based optical DNA mapping is powerful in detecting large SVs. However, existing SV calling methods are not tailored for cancer samples, which have special properties such as mixed cell types and sub-clones. Here we propose the Cancer Optical Mapping for detecting Structural Variations (COMSV) method, which is specifically designed for cancer samples. It shows high sensitivity and specificity in benchmark comparisons. When applied to cancer cell lines and patient samples, COMSV identifies hundreds of novel SVs per sample.
Affiliation(s)
- Le Li
- Department of Computer Science and Engineering, The Chinese University of Hong Kong, Shatin, New Territories, Hong Kong
- Chenyang Hong
- Department of Computer Science and Engineering, The Chinese University of Hong Kong, Shatin, New Territories, Hong Kong
- Jie Xu
- Department of Biochemistry and Molecular Genetics, Feinberg School of Medicine, Northwestern University, Chicago, Illinois 60208, USA
- Claire Yik-Lok Chung
- School of Life Sciences and State Key Laboratory of Agrobiotechnology, The Chinese University of Hong Kong, Shatin, New Territories, Hong Kong
- Alden King-Yung Leung
- School of Life Sciences and State Key Laboratory of Agrobiotechnology, The Chinese University of Hong Kong, Shatin, New Territories, Hong Kong
- Delbert Almerick T Boncan
- School of Life Sciences and State Key Laboratory of Agrobiotechnology, The Chinese University of Hong Kong, Shatin, New Territories, Hong Kong
- Lixin Cheng
- Department of Computer Science and Engineering, The Chinese University of Hong Kong, Shatin, New Territories, Hong Kong
- Kwok-Wai Lo
- Department of Anatomical and Cellular Pathology, The Chinese University of Hong Kong, Shatin, New Territories, Hong Kong
- Paul B S Lai
- Department of Surgery, The Chinese University of Hong Kong, Shatin, New Territories, Hong Kong
- John Wong
- Department of Surgery, The Chinese University of Hong Kong, Shatin, New Territories, Hong Kong
- Jingying Zhou
- School of Biomedical Sciences, The Chinese University of Hong Kong, Shatin, New Territories, Hong Kong
- Alfred Sze-Lok Cheng
- School of Biomedical Sciences, The Chinese University of Hong Kong, Shatin, New Territories, Hong Kong
- Ting-Fung Chan
- School of Life Sciences and State Key Laboratory of Agrobiotechnology, The Chinese University of Hong Kong, Shatin, New Territories, Hong Kong
- Hong Kong Bioinformatics Centre, The Chinese University of Hong Kong, Shatin, New Territories, Hong Kong
- Hong Kong Institute of Diabetes and Obesity, The Chinese University of Hong Kong, Shatin, New Territories, Hong Kong
- Feng Yue
- Department of Biochemistry and Molecular Genetics, Feinberg School of Medicine, Northwestern University, Chicago, Illinois 60208, USA
- Kevin Y Yip
- Department of Computer Science and Engineering, The Chinese University of Hong Kong, Shatin, New Territories, Hong Kong
- Hong Kong Bioinformatics Centre, The Chinese University of Hong Kong, Shatin, New Territories, Hong Kong
- Hong Kong Institute of Diabetes and Obesity, The Chinese University of Hong Kong, Shatin, New Territories, Hong Kong
- Sanford Burnham Prebys Medical Discovery Institute, La Jolla, California 92037, USA
22
Klever MK, Sträng E, Hetzel S, Jungnitsch J, Dolnik A, Schöpflin R, Schrezenmeier JF, Schick F, Blau O, Westermann J, Rücker FG, Xia Z, Döhner K, Schrezenmeier H, Spielmann M, Meissner A, Melo US, Mundlos S, Bullinger L. AML with complex karyotype: extreme genomic complexity revealed by combined long-read sequencing and Hi-C technology. Blood Adv 2023; 7:6520-6531. [PMID: 37582288 PMCID: PMC10632680 DOI: 10.1182/bloodadvances.2023010887] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/05/2023] [Revised: 07/17/2023] [Accepted: 07/30/2023] [Indexed: 08/17/2023] Open
Abstract
Acute myeloid leukemia with complex karyotype (CK-AML) is associated with poor prognosis, which is only in part explained by underlying TP53 mutations. Especially in the presence of complex chromosomal rearrangements, such as chromothripsis, the outcome of CK-AML is dismal. However, how this degree of complexity of genomic rearrangements contributes to the leukemogenic phenotype and treatment resistance of CK-AML remains largely unknown. Applying an integrative workflow for the detection of structural variants (SVs) based on Oxford Nanopore (ONT) genomic DNA long-read sequencing (gDNA-LRS) and high-throughput chromosome conformation capture (Hi-C) in a well-defined cohort of CK-AML, we identified regions with an extreme density of SVs. These rearrangements consisted to a large degree of focal amplifications enriched in the proximity of mammalian-wide interspersed repeat elements, which often result in oncogenic fusion transcripts, such as USP7::MVD, or the deregulation of oncogenic driver genes, as confirmed by RNA-seq and ONT direct complementary DNA sequencing. We termed this novel phenomenon chromocataclysm. Thus, our integrative SV detection workflow combining gDNA-LRS and Hi-C enables the unraveling of complex genomic rearrangements at very high resolution in regions that are hard to analyze by conventional sequencing technology, thereby providing an important tool to identify novel drivers underlying cancer with complex karyotypic changes.
Affiliation(s)
- Marius-Konstantin Klever
- Division of Hematology, Oncology, and Cancer Immunology, Medical Department, Charité – Universitätsmedizin Berlin, Corporate Member of Freie Universität Berlin, Humboldt-Universität zu Berlin, and Berlin Institute of Health, Berlin, Germany
- RG Development and Disease, Max Planck Institute for Molecular Genetics, Berlin, Germany
- Institute for Medical Genetics and Human Genetics, Charité University Medicine Berlin, Berlin, Germany
- Eric Sträng
- Division of Hematology, Oncology, and Cancer Immunology, Medical Department, Charité – Universitätsmedizin Berlin, Corporate Member of Freie Universität Berlin, Humboldt-Universität zu Berlin, and Berlin Institute of Health, Berlin, Germany
- Sara Hetzel
- Department of Genome Regulation, Max Planck Institute for Molecular Genetics, Berlin, Germany
- Julius Jungnitsch
- Institute for Medical Genetics and Human Genetics, Charité University Medicine Berlin, Berlin, Germany
- Human Molecular Genomics Group, Max Planck Institute for Molecular Genetics, Berlin, Germany
- Anna Dolnik
- Division of Hematology, Oncology, and Cancer Immunology, Medical Department, Charité – Universitätsmedizin Berlin, Corporate Member of Freie Universität Berlin, Humboldt-Universität zu Berlin, and Berlin Institute of Health, Berlin, Germany
- Robert Schöpflin
- RG Development and Disease, Max Planck Institute for Molecular Genetics, Berlin, Germany
- Institute for Medical Genetics and Human Genetics, Charité University Medicine Berlin, Berlin, Germany
- Department of Computational Molecular Biology, Max Planck Institute for Molecular Genetics, Berlin, Germany
- Jens-Florian Schrezenmeier
- Division of Hematology, Oncology, and Cancer Immunology, Medical Department, Charité – Universitätsmedizin Berlin, Corporate Member of Freie Universität Berlin, Humboldt-Universität zu Berlin, and Berlin Institute of Health, Berlin, Germany
- Felix Schick
- Division of Hematology, Oncology, and Cancer Immunology, Medical Department, Charité – Universitätsmedizin Berlin, Corporate Member of Freie Universität Berlin, Humboldt-Universität zu Berlin, and Berlin Institute of Health, Berlin, Germany
- Olga Blau
- Division of Hematology, Oncology, and Cancer Immunology, Medical Department, Charité – Universitätsmedizin Berlin, Corporate Member of Freie Universität Berlin, Humboldt-Universität zu Berlin, and Berlin Institute of Health, Berlin, Germany
- Labor Berlin – Charité Vivantes GmbH, Berlin, Germany
- Jörg Westermann
- Division of Hematology, Oncology, and Cancer Immunology, Medical Department, Charité – Universitätsmedizin Berlin, Corporate Member of Freie Universität Berlin, Humboldt-Universität zu Berlin, and Berlin Institute of Health, Berlin, Germany
- Labor Berlin – Charité Vivantes GmbH, Berlin, Germany
- Frank G. Rücker
- Department of Internal Medicine III, University Hospital of Ulm, Ulm, Germany
- Zuyao Xia
- Department of Internal Medicine III, University Hospital of Ulm, Ulm, Germany
- Konstanze Döhner
- Department of Internal Medicine III, University Hospital of Ulm, Ulm, Germany
- Hubert Schrezenmeier
- Institute of Transfusion Medicine, University of Ulm, Ulm, Germany
- Institute for Clinical Transfusion Medicine and Immunogenetics, German Red Cross Blood Transfusion Service Baden-Württemberg-Hessen and University Hospital Ulm, Ulm, Germany
- Malte Spielmann
- Human Molecular Genomics Group, Max Planck Institute for Molecular Genetics, Berlin, Germany
- Institut für Humangenetik Lübeck, Universität zu Lübeck, Lübeck, Germany
- Alexander Meissner
- Department of Genome Regulation, Max Planck Institute for Molecular Genetics, Berlin, Germany
- Uirá Souto Melo
- RG Development and Disease, Max Planck Institute for Molecular Genetics, Berlin, Germany
- Institute for Medical Genetics and Human Genetics, Charité University Medicine Berlin, Berlin, Germany
- Stefan Mundlos
- RG Development and Disease, Max Planck Institute for Molecular Genetics, Berlin, Germany
- Institute for Medical Genetics and Human Genetics, Charité University Medicine Berlin, Berlin, Germany
- Labor Berlin – Charité Vivantes GmbH, Berlin, Germany
- Lars Bullinger
- Division of Hematology, Oncology, and Cancer Immunology, Medical Department, Charité – Universitätsmedizin Berlin, Corporate Member of Freie Universität Berlin, Humboldt-Universität zu Berlin, and Berlin Institute of Health, Berlin, Germany
- Labor Berlin – Charité Vivantes GmbH, Berlin, Germany
- German Cancer Consortium (DKTK) and German Cancer Research Center (DKFZ), Heidelberg, Germany
23
Zong W, Wang J, Zhao R, Niu N, Su Y, Hu Z, Liu X, Hou X, Wang L, Wang L, Zhang L. Associations of genome-wide structural variations with phenotypic differences in cross-bred Eurasian pigs. J Anim Sci Biotechnol 2023; 14:136. [PMID: 37805653 PMCID: PMC10559557 DOI: 10.1186/s40104-023-00929-x] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/23/2023] [Accepted: 08/03/2023] [Indexed: 10/09/2023] Open
Abstract
BACKGROUND During approximately 10,000 years of domestication and selection, a large number of structural variations (SVs) have emerged in the genome of pig breeds, profoundly influencing their phenotypes and the ability to adapt to the local environment. SVs (≥ 50 bp) are widely distributed in the genome, mainly in the form of insertion (INS), mobile element insertion (MEI), deletion (DEL), duplication (DUP), inversion (INV), and translocation (TRA). While studies have investigated the SVs in pig genomes, genome-wide association studies (GWAS) based on SVs have rarely been conducted. RESULTS Here, we obtained a high-quality SV map containing 123,151 SVs from 15 Large White and 15 Min pigs by integrating several SV tools, with 53.95% of the SVs being reported for the first time. These high-quality SVs were used to recover the population genetic structure, confirming the accuracy of genotyping. Potential functional SV loci were then identified based on positional effects and breed stratification. Finally, GWAS were performed for 36 traits by genotyping the screened potential causal loci in the F2 population according to their corresponding genomic positions. We identified a large number of loci involved in 8 carcass traits and 6 skeletal traits on chromosome 7, with FKBP5 containing the most significant SV locus for almost all traits. In addition, we found several significant loci for intramuscular fat, abdominal circumference, heart weight, and liver weight, among others. CONCLUSIONS We constructed a high-quality SV map using high-coverage sequencing data and then analyzed the SVs by performing GWAS for 25 carcass traits, 7 skeletal traits, and 4 meat quality traits, finding that SVs may affect body size differences between European and Chinese pig breeds.
Affiliation(s)
- Wencheng Zong
- State Key Laboratory of Animal Biotech Breeding, Institute of Animal Sciences, Chinese Academy of Agricultural Sciences, Beijing, 100193, China
- Jinbu Wang
- State Key Laboratory of Animal Biotech Breeding, Institute of Animal Sciences, Chinese Academy of Agricultural Sciences, Beijing, 100193, China
- Runze Zhao
- State Key Laboratory of Animal Biotech Breeding, Institute of Animal Sciences, Chinese Academy of Agricultural Sciences, Beijing, 100193, China
- College of Animal Science, Shanxi Agricultural University, Jinzhong, 030801, China
- Naiqi Niu
- State Key Laboratory of Animal Biotech Breeding, Institute of Animal Sciences, Chinese Academy of Agricultural Sciences, Beijing, 100193, China
- Yanfang Su
- State Key Laboratory of Animal Biotech Breeding, Institute of Animal Sciences, Chinese Academy of Agricultural Sciences, Beijing, 100193, China
- Ziping Hu
- State Key Laboratory of Animal Biotech Breeding, Institute of Animal Sciences, Chinese Academy of Agricultural Sciences, Beijing, 100193, China
- College of Animal Science and Technology, Qingdao Agricultural University, Qingdao, 266109, China
- Xin Liu
- State Key Laboratory of Animal Biotech Breeding, Institute of Animal Sciences, Chinese Academy of Agricultural Sciences, Beijing, 100193, China
- Xinhua Hou
- State Key Laboratory of Animal Biotech Breeding, Institute of Animal Sciences, Chinese Academy of Agricultural Sciences, Beijing, 100193, China
- Ligang Wang
- State Key Laboratory of Animal Biotech Breeding, Institute of Animal Sciences, Chinese Academy of Agricultural Sciences, Beijing, 100193, China
- Lixian Wang
- State Key Laboratory of Animal Biotech Breeding, Institute of Animal Sciences, Chinese Academy of Agricultural Sciences, Beijing, 100193, China
- Longchao Zhang
- State Key Laboratory of Animal Biotech Breeding, Institute of Animal Sciences, Chinese Academy of Agricultural Sciences, Beijing, 100193, China
24
Paskov K, Chrisman B, Stockham N, Washington PY, Dunlap K, Jung JY, Wall DP. Identifying crossovers and shared genetic material in whole genome sequencing data from families. Genome Res 2023; 33:1747-1756. [PMID: 37879861 PMCID: PMC10691535 DOI: 10.1101/gr.277172.122] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/02/2022] [Accepted: 09/12/2023] [Indexed: 10/27/2023]
Abstract
Large, whole-genome sequencing (WGS) data sets containing families provide an important opportunity to identify crossovers and shared genetic material in siblings. However, the high variant calling error rates of WGS in some areas of the genome can result in spurious crossover calls, and the special inheritance status of the X Chromosome presents challenges. We have developed a hidden Markov model that addresses these issues by modeling the inheritance of variants in families in the presence of error-prone regions and inherited deletions. We call our method PhasingFamilies. We validate PhasingFamilies using the platinum genome family NA1281 (precision: 0.81; recall: 0.97), as well as simulated genomes with known crossover positions (precision: 0.93; recall: 0.92). Using 1925 quads from the Simons Simplex Collection, we found that PhasingFamilies resolves crossovers to a median resolution of 3527.5 bp. These crossovers recapitulate existing recombination rate maps, including for the X Chromosome; produce sibling pair IBD that matches expected distributions; and are validated by the haplotype estimation tool SHAPEIT. We provide an efficient, open-source implementation of PhasingFamilies that can be used to identify crossovers from family sequencing data.
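The precision and recall figures quoted in this abstract can be illustrated with a minimal sketch. It assumes, purely for illustration, that a called crossover interval counts as correct when it contains a known crossover position; the abstract does not state the matching rule the authors used, and the function name and toy data below are hypothetical.

```python
from typing import List, Tuple

Interval = Tuple[int, int]  # (start, end) of a called crossover, in bp


def precision_recall(calls: List[Interval], truth: List[int]) -> Tuple[float, float]:
    """Score called crossover intervals against known crossover positions.

    Assumption (not from the paper): a call is a true positive if it
    contains at least one known position, and a known position counts as
    recovered if at least one call contains it.
    """
    hit_calls = sum(any(s <= t <= e for t in truth) for s, e in calls)
    hit_truth = sum(any(s <= t <= e for s, e in calls) for t in truth)
    precision = hit_calls / len(calls) if calls else 0.0
    recall = hit_truth / len(truth) if truth else 0.0
    return precision, recall


# Toy data, hypothetical and unrelated to the published results.
calls = [(100, 200), (500, 650), (900, 950)]
truth = [150, 600, 2000]
p, r = precision_recall(calls, truth)
print(f"precision={p:.2f} recall={r:.2f}")
```

In this toy case two of the three calls contain a known position and two of the three known positions are recovered, so both scores are 2/3.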
Affiliation(s)
- Kelley Paskov
- Department of Biomedical Data Science, Stanford University, Stanford, California 94305, USA
- Brianna Chrisman
- Department of Bioengineering, Stanford University, Stanford, California 94305, USA
- Nathaniel Stockham
- Department of Neuroscience, Stanford University, Stanford, California 94305, USA
- Kaitlyn Dunlap
- Department of Biomedical Data Science, Stanford University, Stanford, California 94305, USA
- Department of Pediatrics, Stanford University, Stanford, California 94305, USA
- Jae-Yoon Jung
- Department of Biomedical Data Science, Stanford University, Stanford, California 94305, USA
- Department of Pediatrics, Stanford University, Stanford, California 94305, USA
- Dennis P Wall
- Department of Biomedical Data Science, Stanford University, Stanford, California 94305, USA
- Department of Pediatrics, Stanford University, Stanford, California 94305, USA
25
Chen T, Tang C, Zheng W, Qian Y, Chen M, Zou Q, Jin Y, Wang K, Zhou X, Gou S, Lai L. VCFshiny: an R/Shiny application for interactively analyzing and visualizing genetic variants. Bioinformatics Advances 2023; 3:vbad107. [PMID: 37701675 PMCID: PMC10493178 DOI: 10.1093/bioadv/vbad107] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/09/2023] [Revised: 07/24/2023] [Accepted: 08/24/2023] [Indexed: 09/14/2023]
Abstract
Summary Next-generation sequencing generates variants that are typically documented in variant call format (VCF) files. However, comprehensively examining variant information from VCF files can pose a significant challenge for researchers lacking bioinformatics and programming expertise. To address this issue, we introduce VCFshiny, an R package that features a user-friendly web interface enabling interactive annotation, interpretation, and visualization of variant information stored in VCF files. VCFshiny offers two annotation methods, Annovar and VariantAnnotation, to add annotations such as genes or functional impact. Annotated VCF files are accepted as inputs for summarizing and visualizing variant information. This includes the total number of variants, overlaps across sample replicates, base alterations of single nucleotides, length distributions of insertions and deletions (indels), high-frequency mutated genes, variant distribution in the genome and of genome features, variants in cancer driver genes, and cancer mutational signatures. VCFshiny makes VCF files easier to interpret by offering an interactive web interface for analysis and visualization. Availability and implementation The source code is available under an MIT open source license at https://github.com/123xiaochen/VCFshiny with documentation at https://123xiaochen.github.io/VCFshiny.
Affiliation(s)
- Tao Chen
- Guangdong Provincial Key Laboratory of Large Animal Models for Biomedicine, South China Institute of Large Animal Models for Biomedicine, School of Biotechnology and Health Sciences, Wuyi University, Jiangmen 529020, China
- Chengcheng Tang
- Guangdong Provincial Key Laboratory of Large Animal Models for Biomedicine, South China Institute of Large Animal Models for Biomedicine, School of Biotechnology and Health Sciences, Wuyi University, Jiangmen 529020, China
- Wei Zheng
- Guangdong Provincial Key Laboratory of Large Animal Models for Biomedicine, South China Institute of Large Animal Models for Biomedicine, School of Biotechnology and Health Sciences, Wuyi University, Jiangmen 529020, China
- Yanan Qian
- CAS Key Laboratory of Regenerative Biology, Guangdong Provincial Key Laboratory of Stem Cell and Regenerative Medicine, Guangzhou Institutes of Biomedicine and Health, Chinese Academy of Sciences, Guangzhou 510530, China
- Min Chen
- Guangdong Provincial Key Laboratory of Large Animal Models for Biomedicine, South China Institute of Large Animal Models for Biomedicine, School of Biotechnology and Health Sciences, Wuyi University, Jiangmen 529020, China
- Qingjian Zou
- Guangdong Provincial Key Laboratory of Large Animal Models for Biomedicine, South China Institute of Large Animal Models for Biomedicine, School of Biotechnology and Health Sciences, Wuyi University, Jiangmen 529020, China
- Yinge Jin
- Guangdong Provincial Key Laboratory of Large Animal Models for Biomedicine, South China Institute of Large Animal Models for Biomedicine, School of Biotechnology and Health Sciences, Wuyi University, Jiangmen 529020, China
- Kepin Wang
- CAS Key Laboratory of Regenerative Biology, Guangdong Provincial Key Laboratory of Stem Cell and Regenerative Medicine, Guangzhou Institutes of Biomedicine and Health, Chinese Academy of Sciences, Guangzhou 510530, China
- Sanya Institute of Swine Resource, Hainan Provincial Research Centre of Laboratory Animals, Sanya 572000, China
- Xiaoqing Zhou
- Guangdong Provincial Key Laboratory of Large Animal Models for Biomedicine, South China Institute of Large Animal Models for Biomedicine, School of Biotechnology and Health Sciences, Wuyi University, Jiangmen 529020, China
- Shixue Gou
- CAS Key Laboratory of Regenerative Biology, Guangdong Provincial Key Laboratory of Stem Cell and Regenerative Medicine, Guangzhou Institutes of Biomedicine and Health, Chinese Academy of Sciences, Guangzhou 510530, China
- Sanya Institute of Swine Resource, Hainan Provincial Research Centre of Laboratory Animals, Sanya 572000, China
- Guangzhou National Laboratory, Guangzhou 510005, China
- Liangxue Lai
- Guangdong Provincial Key Laboratory of Large Animal Models for Biomedicine, South China Institute of Large Animal Models for Biomedicine, School of Biotechnology and Health Sciences, Wuyi University, Jiangmen 529020, China
- CAS Key Laboratory of Regenerative Biology, Guangdong Provincial Key Laboratory of Stem Cell and Regenerative Medicine, Guangzhou Institutes of Biomedicine and Health, Chinese Academy of Sciences, Guangzhou 510530, China
- Sanya Institute of Swine Resource, Hainan Provincial Research Centre of Laboratory Animals, Sanya 572000, China
26
Khandekar A, Vangara R, Barnes M, Díaz-Gay M, Abbasi A, Bergstrom EN, Steele CD, Pillay N, Alexandrov LB. Visualizing and exploring patterns of large mutational events with SigProfilerMatrixGenerator. BMC Genomics 2023; 24:469. [PMID: 37605126 PMCID: PMC10440861 DOI: 10.1186/s12864-023-09584-y] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Received: 02/03/2023] [Accepted: 08/14/2023] [Indexed: 08/23/2023] Open
Abstract
BACKGROUND All cancers harbor somatic mutations in their genomes. In principle, mutations affecting between one and fifty base pairs are generally classified as small mutational events. Conversely, large mutational events affect more than fifty base pairs, and, in most cases, they encompass copy-number and structural variants affecting many thousands of base pairs. Prior studies have demonstrated that examining patterns of somatic mutations can be leveraged to provide both biological and clinical insights, thus, resulting in an extensive repertoire of tools for evaluating small mutational events. Recently, classification schemas for examining large-scale mutational events have emerged and shown their utility across the spectrum of human cancers. However, there has been no computationally efficient bioinformatics tool that allows visualizing and exploring these large-scale mutational events.
RESULTS Here, we present a new version of SigProfilerMatrixGenerator that now delivers integrated capabilities for examining large mutational events. The tool provides support for examining copy-number variants and structural variants under two previously developed classification schemas and it supports data from numerous algorithms and data modalities. SigProfilerMatrixGenerator is written in Python with an R wrapper package provided for users that prefer working in an R environment.
CONCLUSIONS The new version of SigProfilerMatrixGenerator provides the first standardized bioinformatics tool for optimized exploration and visualization of two previously developed classification schemas for copy number and structural variants. The tool is freely available at https://github.com/AlexandrovLab/SigProfilerMatrixGenerator with an extensive documentation at https://osf.io/s93d5/wiki/home/.
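The size convention stated in the abstract above (events affecting 1 to 50 bp are small, events affecting more than 50 bp are large) can be sketched in a few lines of Python. This is only an illustration of that threshold: the function name `classify_event` and the constant `SMALL_EVENT_MAX_BP` are hypothetical and are not part of SigProfilerMatrixGenerator.

```python
# Hypothetical helper illustrating the 50 bp threshold from the abstract;
# it is not taken from SigProfilerMatrixGenerator.
SMALL_EVENT_MAX_BP = 50


def classify_event(affected_bp: int) -> str:
    """Return 'small' for events affecting 1-50 bp and 'large' for more than 50 bp."""
    if affected_bp < 1:
        raise ValueError("an event must affect at least one base pair")
    return "small" if affected_bp <= SMALL_EVENT_MAX_BP else "large"


if __name__ == "__main__":
    for size in (1, 50, 51, 25_000):
        print(size, classify_event(size))
```

A 25 kb copy-number segment, for example, falls in the "large" class, which is the class the new version of the tool is described as supporting.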
Affiliation(s)
- Azhar Khandekar
  - Department of Cellular and Molecular Medicine, UC San Diego, La Jolla, CA, 92093, USA
  - Department of Bioengineering, UC San Diego, La Jolla, CA, 92093, USA
  - Moores Cancer Center, UC San Diego, La Jolla, CA, 92037, USA
- Raviteja Vangara
  - Department of Cellular and Molecular Medicine, UC San Diego, La Jolla, CA, 92093, USA
  - Department of Bioengineering, UC San Diego, La Jolla, CA, 92093, USA
  - Moores Cancer Center, UC San Diego, La Jolla, CA, 92037, USA
- Mark Barnes
  - Department of Cellular and Molecular Medicine, UC San Diego, La Jolla, CA, 92093, USA
  - Department of Bioengineering, UC San Diego, La Jolla, CA, 92093, USA
  - Moores Cancer Center, UC San Diego, La Jolla, CA, 92037, USA
- Marcos Díaz-Gay
  - Department of Cellular and Molecular Medicine, UC San Diego, La Jolla, CA, 92093, USA
  - Department of Bioengineering, UC San Diego, La Jolla, CA, 92093, USA
  - Moores Cancer Center, UC San Diego, La Jolla, CA, 92037, USA
- Ammal Abbasi
  - Department of Cellular and Molecular Medicine, UC San Diego, La Jolla, CA, 92093, USA
  - Department of Bioengineering, UC San Diego, La Jolla, CA, 92093, USA
  - Moores Cancer Center, UC San Diego, La Jolla, CA, 92037, USA
- Erik N Bergstrom
  - Department of Cellular and Molecular Medicine, UC San Diego, La Jolla, CA, 92093, USA
  - Department of Bioengineering, UC San Diego, La Jolla, CA, 92093, USA
  - Moores Cancer Center, UC San Diego, La Jolla, CA, 92037, USA
- Christopher D Steele
  - Department of Cellular and Molecular Medicine, UC San Diego, La Jolla, CA, 92093, USA
  - Department of Bioengineering, UC San Diego, La Jolla, CA, 92093, USA
  - Moores Cancer Center, UC San Diego, La Jolla, CA, 92037, USA
- Nischalan Pillay
  - Research Department of Pathology, Cancer Institute, University College London, London, WC1E 6BT, UK
  - Department of Cellular and Molecular Pathology, Royal National Orthopaedic Hospital NHS Trust, Stanmore, HA7 4LP, Middlesex, UK
- Ludmil B Alexandrov
  - Department of Cellular and Molecular Medicine, UC San Diego, La Jolla, CA, 92093, USA
  - Department of Bioengineering, UC San Diego, La Jolla, CA, 92093, USA
  - Moores Cancer Center, UC San Diego, La Jolla, CA, 92037, USA
|
27
|
Grossi A, Rusmini M, Cusano R, Massidda M, Santamaria G, Napoli F, Angelelli A, Fava D, Uva P, Ceccherini I, Maghnie M. Whole genome sequencing in ROHHAD trios proved inconclusive: what's beyond? Front Genet 2023; 14:1031074. [PMID: 37609037 PMCID: PMC10440434 DOI: 10.3389/fgene.2023.1031074] [Received: 08/29/2022] [Accepted: 07/27/2023] [Indexed: 08/24/2023] Open
Abstract
Rapid-onset Obesity with Hypothalamic dysfunction, Hypoventilation and Autonomic Dysregulation (ROHHAD) is a rare, life-threatening, pediatric disorder of unknown etiology, whose diagnosis is made difficult by poor knowledge of clinical manifestation, and lack of any confirmatory tests. Children with ROHHAD usually present with rapid onset weight gain which may be followed, over months or years, by hypothalamic dysfunction, hypoventilation, autonomic dysfunction, including impaired bowel motility, and tumors of neural crest origin. Despite the lack of evidence of inheritance in ROHHAD, several studies have been conducted in recent years that have explored possible genetic origins, with unsuccessful results. In order to broaden the search for possible genetic risk factors, an attempt was made to analyse the non-coding variants in two trios (proband with parents), recruited in the Gaslini Children's Hospital in Genoa (Italy). Both patients were females, with a typical history of ROHHAD. Gene variants (single nucleotide variants, short insertions/deletions, splice variants or in tandem expansion of homopolymeric tracts) or altered genomic regions (copy number variations or structural variants) shared between the two probands were searched. Currently, we have not found any potentially pathogenic changes, consistent with the ROHHAD clinical phenotype, and involving genes, regions or pathways shared between the two trios. To definitively rule out the genetic etiology, third-generation sequencing technologies (e.g., long-reads sequencing, optical mapping) should be applied, as well as other pathways, including those associated with immunological and autoimmune disorders, should be explored, making use not only of genomics but also of different -omic datasets.
Affiliation(s)
- A. Grossi
  - Laboratory of Genetics and Genomics of Rare Diseases, IRCCS Istituto Giannina Gaslini, Genova, Italy
- M. Rusmini
  - Laboratory of Genetics and Genomics of Rare Diseases, IRCCS Istituto Giannina Gaslini, Genova, Italy
  - Clinical Bioinformatics, IRCCS Istituto Giannina Gaslini, Genova, Italy
- R. Cusano
  - CRS4, Science and Technology Park Polaris, Pula, Italy
- M. Massidda
  - CRS4, Science and Technology Park Polaris, Pula, Italy
- G. Santamaria
  - Laboratory of Genetics and Genomics of Rare Diseases, IRCCS Istituto Giannina Gaslini, Genova, Italy
- F. Napoli
  - Pediatric Clinic and Endocrinology, IRCCS Istituto Giannina Gaslini, Genova, Italy
- A. Angelelli
  - D.I.N.O.G.M.I, Università degli Studi di Genova, Genova, Italy
- D. Fava
  - D.I.N.O.G.M.I, Università degli Studi di Genova, Genova, Italy
- P. Uva
  - Clinical Bioinformatics, IRCCS Istituto Giannina Gaslini, Genova, Italy
- I. Ceccherini
  - Laboratory of Genetics and Genomics of Rare Diseases, IRCCS Istituto Giannina Gaslini, Genova, Italy
- M. Maghnie
  - Pediatric Clinic and Endocrinology, IRCCS Istituto Giannina Gaslini, Genova, Italy
  - D.I.N.O.G.M.I, Università degli Studi di Genova, Genova, Italy
|
28
|
Ahsan MU, Liu Q, Perdomo JE, Fang L, Wang K. A survey of algorithms for the detection of genomic structural variants from long-read sequencing data. Nat Methods 2023; 20:1143-1158. [PMID: 37386186 PMCID: PMC11208083 DOI: 10.1038/s41592-023-01932-w] [Received: 07/16/2021] [Accepted: 05/31/2023] [Indexed: 07/01/2023]
Abstract
As long-read sequencing technologies are becoming increasingly popular, a number of methods have been developed for the discovery and analysis of structural variants (SVs) from long reads. Long reads enable detection of SVs that could not be previously detected from short-read sequencing, but computational methods must adapt to the unique challenges and opportunities presented by long-read sequencing. Here, we summarize over 50 long-read-based methods for SV detection, genotyping and visualization, and discuss how new telomere-to-telomere genome assemblies and pangenome efforts can improve the accuracy and drive the development of SV callers in the future.
Affiliation(s)
- Mian Umair Ahsan
  - Raymond G. Perelman Center for Cellular and Molecular Therapeutics, Children's Hospital of Philadelphia, Philadelphia, PA, USA
- Qian Liu
  - Raymond G. Perelman Center for Cellular and Molecular Therapeutics, Children's Hospital of Philadelphia, Philadelphia, PA, USA
- Jonathan Elliot Perdomo
  - Raymond G. Perelman Center for Cellular and Molecular Therapeutics, Children's Hospital of Philadelphia, Philadelphia, PA, USA
  - School of Biomedical Engineering, Drexel University, Philadelphia, PA, USA
- Li Fang
  - Raymond G. Perelman Center for Cellular and Molecular Therapeutics, Children's Hospital of Philadelphia, Philadelphia, PA, USA
  - Department of Genetics and Biomedical Informatics, Zhongshan School of Medicine, Sun Yat-sen University, Guangzhou, China
- Kai Wang
  - Raymond G. Perelman Center for Cellular and Molecular Therapeutics, Children's Hospital of Philadelphia, Philadelphia, PA, USA
  - Department of Pathology and Laboratory Medicine, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, USA
|
29
|
Schmidt M, Kutzner A. MSV: a modular structural variant caller that reveals nested and complex rearrangements by unifying breakends inferred directly from reads. Genome Biol 2023; 24:170. [PMID: 37461107 DOI: 10.1186/s13059-023-03009-5] [Received: 03/27/2021] [Accepted: 07/06/2023] [Indexed: 07/20/2023] Open
Abstract
Structural variant (SV) calling belongs to the standard tools of modern bioinformatics for identifying and describing alterations in genomes. Initially, this work presents several complex genomic rearrangements that reveal conceptual ambiguities inherent to the representation via basic SV. We contextualize these ambiguities theoretically as well as practically and propose a graph-based approach for resolving them. For various yeast genomes, we practically compute adjacency matrices of our graph model and demonstrate that they provide highly accurate descriptions of one genome in terms of another. An open-source prototype implementation of our approach is available under the MIT license at https://github.com/ITBE-Lab/MA.
Affiliation(s)
- Markus Schmidt
  - Biomedical Center Munich, Department of Physiological Chemistry, Ludwig-Maximilians-Universität, Großhaderner Str. 9, 82152, Planegg-Martinsried, Germany
- Arne Kutzner
  - Department of Information Systems, College of Engineering, Hanyang University, 222 Wangsimni-Ro, Seongdong-Gu, Seoul, 133-791, Republic of Korea
|
30
|
Sohn JI, Choi MH, Yi D, Menon VA, Kim YJ, Lee J, Park JW, Kyung S, Shin SH, Na B, Joung JG, Ju YS, Yeom MS, Koh Y, Yoon SS, Baek D, Kim TM, Nam JW. Ultrafast prediction of somatic structural variations by filtering out reads matched to pan-genome k-mer sets. Nat Biomed Eng 2023; 7:853-866. [PMID: 36536253 DOI: 10.1038/s41551-022-00980-5] [Received: 07/26/2021] [Accepted: 11/01/2022] [Indexed: 12/24/2022]
Abstract
Variant callers typically produce massive numbers of false positives for structural variations, such as cancer-relevant copy-number alterations and fusion genes resulting from genome rearrangements. Here we describe an ultrafast and accurate detector of somatic structural variations that reduces read-mapping costs by filtering out reads matched to pan-genome k-mer sets. The detector, which we named ETCHING (for efficient detection of chromosomal rearrangements and fusion genes), reduces the number of false positives by leveraging machine-learning classifiers trained with six breakend-related features (clipped-read count, split-reads count, supporting paired-end read count, average mapping quality, depth difference and total length of clipped bases). When benchmarked against six callers on reference cell-free DNA, validated biomarkers of structural variants, matched tumour and normal whole genomes, and tumour-only targeted sequencing datasets, ETCHING was 11-fold faster than the second-fastest structural-variant caller at comparable performance and memory use. The speed and accuracy of ETCHING may aid large-scale genome projects and facilitate practical implementations in precision medicine.
Affiliation(s)
- Jang-Il Sohn
  - Department of Life Science, Hanyang University, Seoul, Republic of Korea
  - Research Institute for Convergence of Basic Sciences, Hanyang University, Seoul, Republic of Korea
- Min-Hak Choi
  - Department of Life Science, Hanyang University, Seoul, Republic of Korea
- Dohun Yi
  - Department of Life Science, Hanyang University, Seoul, Republic of Korea
- Vipin A Menon
  - Department of Life Science, Hanyang University, Seoul, Republic of Korea
- Yeon Jeong Kim
  - Samsung Genome Institute, Samsung Medical Center, Seoul, Republic of Korea
- Junehawk Lee
  - Center for Supercomputing Applications, Division of National Supercomputing, Korea Institute of Science and Technology Information, Daejeon, Republic of Korea
- Jung Woo Park
  - Center for Supercomputing Applications, Division of National Supercomputing, Korea Institute of Science and Technology Information, Daejeon, Republic of Korea
- Byunggook Na
  - Department of Electrical and Computer Engineering, Seoul National University, Seoul, Republic of Korea
- Je-Gun Joung
  - Department of Biomedical Science, College of Life Science, CHA University, Seongnam, Republic of Korea
- Young Seok Ju
  - Graduate School of Medical Science and Engineering, Korea Advanced Institute of Science and Technology, Daejeon, Republic of Korea
  - Biomedical Science and Engineering Interdisciplinary Program, Korea Advanced Institute of Science and Technology, Daejeon, Republic of Korea
- Min Sun Yeom
  - Center for Supercomputing Applications, Division of National Supercomputing, Korea Institute of Science and Technology Information, Daejeon, Republic of Korea
- Youngil Koh
  - College of Medicine, Seoul National University, Seoul, Republic of Korea
- Sung-Soo Yoon
  - College of Medicine, Seoul National University, Seoul, Republic of Korea
- Daehyun Baek
  - School of Biological Sciences, Seoul National University, Seoul, Republic of Korea
- Tae-Min Kim
  - Department of Medical Informatics and Cancer Research Institute, College of Medicine, The Catholic University of Korea, Seoul, Republic of Korea
- Jin-Wu Nam
  - Department of Life Science, Hanyang University, Seoul, Republic of Korea
  - Research Institute for Convergence of Basic Sciences, Hanyang University, Seoul, Republic of Korea
  - Bio-BigData Center, Hanyang Institute of Bioscience and Biotechnology, Hanyang University, Seoul, Republic of Korea
|
31
|
Laufer VA, Glover TW, Wilson TE. Applications of advanced technologies for detecting genomic structural variation. Mutat Res Rev Mutat Res 2023; 792:108475. [PMID: 37931775 PMCID: PMC10792551 DOI: 10.1016/j.mrrev.2023.108475] [Received: 04/26/2023] [Revised: 09/07/2023] [Accepted: 11/02/2023] [Indexed: 11/08/2023]
Abstract
Chromosomal structural variation (SV) encompasses a heterogenous class of genetic variants that exerts strong influences on human health and disease. Despite their importance, many structural variants (SVs) have remained poorly characterized at even a basic level, a discrepancy predicated upon the technical limitations of prior genomic assays. However, recent advances in genomic technology can identify and localize SVs accurately, opening new questions regarding SV risk factors and their impacts in humans. Here, we first define and classify human SVs and their generative mechanisms, highlighting characteristics leveraged by various SV assays. We next examine the first-ever gapless assembly of the human genome and the technical process of assembling it, which required third-generation sequencing technologies to resolve structurally complex loci. The new portions of that "telomere-to-telomere" and subsequent pangenome assemblies highlight aspects of SV biology likely to develop in the near-term. We consider the strengths and limitations of the most promising new SV technologies and when they or longstanding approaches are best suited to meeting salient goals in the study of human SV in population-scale genomics research, clinical, and public health contexts. It is a watershed time in our understanding of human SV when new approaches are expected to fundamentally change genomic applications.
Affiliation(s)
- Vincent A Laufer
  - Department of Pathology, University of Michigan Medical School, Ann Arbor, MI 48109, USA
- Thomas W Glover
  - Department of Pathology, University of Michigan Medical School, Ann Arbor, MI 48109, USA
  - Department of Human Genetics, University of Michigan Medical School, Ann Arbor, MI 48109, USA
- Thomas E Wilson
  - Department of Pathology, University of Michigan Medical School, Ann Arbor, MI 48109, USA
  - Department of Human Genetics, University of Michigan Medical School, Ann Arbor, MI 48109, USA
|
32
|
Medvedev P. Theoretical Analysis of Sequencing Bioinformatics Algorithms and Beyond. Commun ACM 2023; 66:118-125. [PMID: 38736702 PMCID: PMC11087067 DOI: 10.1145/3571723] [Indexed: 05/14/2024]
Abstract
A case study reveals the theoretical analysis of algorithms is not always as helpful as standard dogma might suggest.
Affiliation(s)
- Paul Medvedev
- Department of Computer Science and Engineering and the Department of Biochemistry and Molecular Biology and the Director of the Center for Computational Biology and Bioinformatics at Pennsylvania State University, University Park, PA, USA
|
33
|
Rajaby R, Liu DX, Au CH, Cheung YT, Lau AYT, Yang QY, Sung WK. INSurVeyor: improving insertion calling from short read sequencing data. Nat Commun 2023; 14:3243. [PMID: 37277343 DOI: 10.1038/s41467-023-38870-2] [Received: 08/29/2022] [Accepted: 05/18/2023] [Indexed: 06/07/2023] Open
Abstract
Insertions are one of the major types of structural variations and are defined as the addition of 50 nucleotides or more into a DNA sequence. Several methods exist to detect insertions from next-generation sequencing short read data, but they generally have low sensitivity. Our contribution is two-fold. First, we introduce INSurVeyor, a fast, sensitive and precise method that detects insertions from next-generation sequencing paired-end data. Using publicly available benchmark datasets (both human and non-human), we show that INSurVeyor is not only more sensitive than any individual caller we tested, but also more sensitive than all of them combined. Furthermore, for most types of insertions, INSurVeyor is almost as sensitive as long reads callers. Second, we provide state-of-the-art catalogues of insertions for 1047 Arabidopsis thaliana genomes from the 1001 Genomes Project and 3202 human genomes from the 1000 Genomes Project, both generated with INSurVeyor. We show that they are more complete and precise than existing resources, and important insertions are missed by existing methods.
Affiliation(s)
- Ramesh Rajaby
  - Hong Kong Genome Institute, Hong Kong Science Park, Shatin, Hong Kong, China
  - A*STAR Genome Institute of Singapore, 60 Biopolis Street, Singapore, 138672, Singapore
- Dong-Xu Liu
  - National Key Laboratory of Crop Genetic Improvement, College of Informatics, Huazhong Agricultural University, Wuhan, 430070, China
  - Hubei Key Laboratory of Agricultural Bioinformatics, College of Informatics, Huazhong Agricultural University, Wuhan, 430070, China
- Chun Hang Au
  - Hong Kong Genome Institute, Hong Kong Science Park, Shatin, Hong Kong, China
- Yuen-Ting Cheung
  - Hong Kong Genome Institute, Hong Kong Science Park, Shatin, Hong Kong, China
- Amy Yuet Ting Lau
  - Hong Kong Genome Institute, Hong Kong Science Park, Shatin, Hong Kong, China
- Qing-Yong Yang
  - National Key Laboratory of Crop Genetic Improvement, College of Informatics, Huazhong Agricultural University, Wuhan, 430070, China
  - Hubei Key Laboratory of Agricultural Bioinformatics, College of Informatics, Huazhong Agricultural University, Wuhan, 430070, China
- Wing-Kin Sung
  - Hong Kong Genome Institute, Hong Kong Science Park, Shatin, Hong Kong, China
  - A*STAR Genome Institute of Singapore, 60 Biopolis Street, Singapore, 138672, Singapore
  - National Key Laboratory of Crop Genetic Improvement, College of Informatics, Huazhong Agricultural University, Wuhan, 430070, China
  - Hubei Key Laboratory of Agricultural Bioinformatics, College of Informatics, Huazhong Agricultural University, Wuhan, 430070, China
  - Department of Chemical Pathology, The Chinese University of Hong Kong, Hong Kong, China
  - Laboratory of Computational Genomics, Li Ka Shing Institute of Health Sciences, The Chinese University of Hong Kong, Hong Kong, China
  - School of Computing, National University of Singapore, 13 Computing Drive, Singapore, 117417, Singapore
|
34
|
Wilson TE, Ahmed S, Higgins J, Salk J, Glover T. svCapture: efficient and specific detection of very low frequency structural variant junctions by error-minimized capture sequencing. NAR Genom Bioinform 2023; 5:lqad042. [PMID: 37181851 PMCID: PMC10167630 DOI: 10.1093/nargab/lqad042] [Received: 12/09/2022] [Revised: 03/15/2023] [Accepted: 04/28/2023] [Indexed: 05/16/2023] Open
Abstract
Error-corrected sequencing of genomic targets enriched by probe-based capture has become a standard approach for detecting single-nucleotide variants (SNVs) and small insertion/deletions (indels) present at very low variant allele frequencies. Less attention has been given to comparable strategies for rare structural variant (SV) junctions, where different error mechanisms must be addressed. Working from samples with known SV properties, we demonstrate that duplex sequencing (DuplexSeq), which demands confirmation of variants on both strands of a source DNA molecule, eliminates false SV junctions arising from chimeric PCR. DuplexSeq could not address frequent intermolecular ligation artifacts that arise during Y-adapter addition prior to strand denaturation without requiring multiple source molecules. In contrast, tagmentation libraries coupled with data filtering based on strand family size greatly reduced both artifact classes and enabled efficient and specific detection of single-molecule SV junctions. The throughput of SV capture sequencing (svCapture) and base-level accuracy of DuplexSeq provided detailed views of the microhomology profile and limited occurrence of de novo SNVs near the junctions of hundreds of newly created SVs, suggesting end joining as a possible formation mechanism. The open source svCapture pipeline enables rare SV detection as a routine addition to SNVs/indels in properly prepared capture sequencing libraries.
Affiliation(s)
- Thomas E Wilson
  - Department of Pathology, University of Michigan, Ann Arbor, MI 48109, USA
  - Department of Human Genetics, University of Michigan, Ann Arbor, MI 48109, USA
- Samreen Ahmed
  - Department of Pathology, University of Michigan, Ann Arbor, MI 48109, USA
  - Department of Human Genetics, University of Michigan, Ann Arbor, MI 48109, USA
- Jake Higgins
  - TwinStrand Biosciences Inc., Seattle, WA 98121, USA
- Jesse J Salk
  - TwinStrand Biosciences Inc., Seattle, WA 98121, USA
- Thomas W Glover
  - Department of Pathology, University of Michigan, Ann Arbor, MI 48109, USA
  - Department of Human Genetics, University of Michigan, Ann Arbor, MI 48109, USA
|
35
|
Lee YL, Bosse M, Takeda H, Moreira GCM, Karim L, Druet T, Oget-Ebrad C, Coppieters W, Veerkamp RF, Groenen MAM, Georges M, Bouwman AC, Charlier C. High-resolution structural variants catalogue in a large-scale whole genome sequenced bovine family cohort data. BMC Genomics 2023; 24:225. [PMID: 37127590 PMCID: PMC10152703 DOI: 10.1186/s12864-023-09259-8] [Received: 08/28/2022] [Accepted: 03/20/2023] [Indexed: 05/03/2023] Open
Abstract
BACKGROUND Structural variants (SVs) are chromosomal segments that differ between genomes, such as deletions, duplications, insertions, inversions and translocations. The genomics revolution enabled the discovery of sub-microscopic SVs via array and whole-genome sequencing (WGS) data, paving the way to unravel the functional impact of SVs. Recent human expression QTL mapping studies demonstrated that SVs play a disproportionally large role in altering gene expression, underlining the importance of including SVs in genetic analyses. Therefore, this study aimed to generate and explore a high-quality bovine SV catalogue exploiting a unique cattle family cohort data (total 266 samples, forming 127 trios).
RESULTS We curated 13,731 SVs segregating in the population, consisting of 12,201 deletions, 1,509 duplications, and 21 multi-allelic CNVs (> 50-bp). Of these, we validated a subset of copy number variants (CNVs) utilising a direct genotyping approach in an independent cohort, indicating that at least 62% of the CNVs are true variants, segregating in the population. Among gene-disrupting SVs, we prioritised two likely high impact duplications, encompassing ORM1 and POPDC3 genes, respectively. Liver expression QTL mapping results revealed that these duplications are likely causing altered gene expression, confirming the functional importance of SVs. Although most of the accurately genotyped CNVs are tagged by single nucleotide polymorphisms (SNPs) ascertained in WGS data, most CNVs were not captured by individual SNPs obtained from a 50K genotyping array.
CONCLUSION We generated a high-quality SV catalogue exploiting unique whole genome sequenced bovine family cohort data. Two high impact duplications upregulating the ORM1 and POPDC3 are putative candidates for postpartum feed intake and hoof health traits, thus warranting further investigation. Generally, CNVs were in low LD with SNPs on the 50K array. Hence, it remains crucial to incorporate CNVs via means other than tagging SNPs, such as investigation of tagging haplotypes, direct imputation of CNVs, or direct genotyping as done in the current study. The SV catalogue and the custom genotyping array generated in the current study will serve as valuable resources accelerating utilisation of full spectrum of genetic variants in bovine genomes.
Affiliation(s)
- Young-Lim Lee
  - Animal Breeding and Genomics, Wageningen University & Research, Wageningen, the Netherlands
  - Unit of Animal Genomics, GIGA-R & Faculty of Veterinary Medicine, University of Liège, Liège, Belgium
- Mirte Bosse
  - Animal Breeding and Genomics, Wageningen University & Research, Wageningen, the Netherlands
- Haruko Takeda
  - Unit of Animal Genomics, GIGA-R & Faculty of Veterinary Medicine, University of Liège, Liège, Belgium
- Latifa Karim
  - GIGA Institute, GIGA Genomics Platform, University of Liège, Liège, Belgium
- Tom Druet
  - Unit of Animal Genomics, GIGA-R & Faculty of Veterinary Medicine, University of Liège, Liège, Belgium
- Claire Oget-Ebrad
  - Unit of Animal Genomics, GIGA-R & Faculty of Veterinary Medicine, University of Liège, Liège, Belgium
- Wouter Coppieters
  - Unit of Animal Genomics, GIGA-R & Faculty of Veterinary Medicine, University of Liège, Liège, Belgium
  - GIGA Institute, GIGA Genomics Platform, University of Liège, Liège, Belgium
- Roel F Veerkamp
  - Animal Breeding and Genomics, Wageningen University & Research, Wageningen, the Netherlands
- Martien A M Groenen
  - Animal Breeding and Genomics, Wageningen University & Research, Wageningen, the Netherlands
- Michel Georges
  - Unit of Animal Genomics, GIGA-R & Faculty of Veterinary Medicine, University of Liège, Liège, Belgium
- Aniek C Bouwman
  - Animal Breeding and Genomics, Wageningen University & Research, Wageningen, the Netherlands
- Carole Charlier
  - Unit of Animal Genomics, GIGA-R & Faculty of Veterinary Medicine, University of Liège, Liège, Belgium
|
36
|
Steensma MJ, Lee YL, Bouwman AC, Pita Barros C, Derks MFL, Bink MCAM, Harlizius B, Huisman AE, Crooijmans RPMA, Groenen MAM, Mulder HA, Rochus CM. Identification and characterisation of de novo germline structural variants in two commercial pig lines using trio-based whole genome sequencing. BMC Genomics 2023; 24:208. [PMID: 37072725 PMCID: PMC10114323 DOI: 10.1186/s12864-023-09296-3] [Received: 01/30/2023] [Accepted: 04/04/2023] [Indexed: 04/20/2023] Open
Abstract
BACKGROUND De novo mutations arising in the germline are a source of genetic variation and their discovery broadens our understanding of genetic disorders and evolutionary patterns. Although the number of de novo single nucleotide variants (dnSNVs) has been studied in a number of species, relatively little is known about the occurrence of de novo structural variants (dnSVs). In this study, we investigated 37 deeply sequenced pig trios from two commercial lines to identify dnSVs present in the offspring. The identified dnSVs were characterised by identifying their parent of origin, their functional annotations and characterizing sequence homology at the breakpoints.
RESULTS We identified four swine germline dnSVs, all located in intronic regions of protein-coding genes. Our conservative, first estimate of the swine germline dnSV rate is 0.108 (95% CI 0.038-0.255) per generation (one dnSV per nine offspring), detected using short-read sequencing. Two detected dnSVs are clusters of mutations. Mutation cluster 1 contains a de novo duplication, a dnSNV and a de novo deletion. Mutation cluster 2 contains a de novo deletion and three de novo duplications, of which one is inverted. Mutation cluster 2 is 25 kb in size, whereas mutation cluster 1 (197 bp) and the other two individual dnSVs (64 and 573 bp) are smaller. Only mutation cluster 2 could be phased and is located on the paternal haplotype. Mutation cluster 2 originates from both micro-homology as well as non-homology mutation mechanisms, where mutation cluster 1 and the other two dnSVs are caused by mutation mechanisms lacking sequence homology. The 64 bp deletion and mutation cluster 1 were validated through PCR. Lastly, the 64 bp deletion and the 573 bp duplication were validated in sequenced offspring of probands with three generations of sequence data.
CONCLUSIONS Our estimate of 0.108 dnSVs per generation in the swine germline is conservative, due to our small sample size and restricted possibilities of dnSV detection from short-read sequencing. The current study highlights the complexity of dnSVs and shows the potential of breeding programs for pigs and livestock species in general, to provide a suitable population structure for identification and characterisation of dnSVs.
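The point estimate in this abstract can be reproduced from the two counts it reports (four dnSVs, 37 trios). The sketch below is only an illustration: it assumes one offspring per trio, and it uses a Wilson score interval because the abstract does not say how the published confidence interval was computed, so the bounds it prints are not expected to match the published 0.038-0.255 exactly. The function name `wilson_interval` is hypothetical.

```python
import math


def wilson_interval(successes: int, n: int, z: float = 1.96) -> tuple[float, float]:
    """Wilson score interval for a binomial proportion (z = 1.96 for ~95%)."""
    p = successes / n
    denom = 1 + z**2 / n
    centre = (p + z**2 / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
    return centre - half, centre + half


n_dnsv = 4    # dnSVs reported in the abstract
n_trios = 37  # sequenced trios; one offspring per trio is an assumption

rate = n_dnsv / n_trios
low, high = wilson_interval(n_dnsv, n_trios)

print(f"rate = {rate:.3f} dnSVs per generation")
print(f"roughly one dnSV per {1 / rate:.0f} offspring")
print(f"Wilson interval (illustrative): {low:.3f} to {high:.3f}")
```

The rate is simply 4/37, which rounds to the reported 0.108 and corresponds to about one dnSV per nine offspring.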
Affiliation(s)
- Marije J Steensma
- Wageningen University & Research Animal Breeding and Genomics, P.O. Box 338, Wageningen, 6700 AH, the Netherlands.
| | - Y L Lee
- Wageningen University & Research Animal Breeding and Genomics, P.O. Box 338, Wageningen, 6700 AH, the Netherlands
| | - A C Bouwman
- Wageningen University & Research Animal Breeding and Genomics, P.O. Box 338, Wageningen, 6700 AH, the Netherlands
| | - C Pita Barros
- Wageningen University & Research Animal Breeding and Genomics, P.O. Box 338, Wageningen, 6700 AH, the Netherlands
| | - M F L Derks
- Wageningen University & Research Animal Breeding and Genomics, P.O. Box 338, Wageningen, 6700 AH, the Netherlands
- Topigs Norsvin Research Center, Schoenaker 6, Beuningen, 6641 SZ, the Netherlands
| | - M C A M Bink
- Hendrix Genetics, P.O. Box 114, Boxmeer, 5830 AC, the Netherlands
| | - B Harlizius
- Topigs Norsvin Research Center, Schoenaker 6, Beuningen, 6641 SZ, the Netherlands
| | - A E Huisman
- Hendrix Genetics, P.O. Box 114, Boxmeer, 5830 AC, the Netherlands
| | - R P M A Crooijmans
- Wageningen University & Research Animal Breeding and Genomics, P.O. Box 338, Wageningen, 6700 AH, the Netherlands
| | - M A M Groenen
- Wageningen University & Research Animal Breeding and Genomics, P.O. Box 338, Wageningen, 6700 AH, the Netherlands
| | - H A Mulder
- Wageningen University & Research Animal Breeding and Genomics, P.O. Box 338, Wageningen, 6700 AH, the Netherlands
| | - C M Rochus
- University of Guelph, Centre for Genetic Improvement of Livestock, 50 Stone Rd E, Guelph, ON, N1G 2W1, Canada
|
37
|
Tan L, Qi X, Kong W, Jin J, Lu D, Zhang X, Wang Y, Wang S, Dong W, Shi X, Chen W, Wang J, Li K, Xie Y, Gao L, Guan F, Gao K, Li C, Wang C, Hu Z, Zhang L, Guo X, Shen B, Ma Y. A conditional knockout rat resource of mitochondrial protein-coding genes via a DdCBE-induced premature stop codon. Science Advances 2023; 9:eadf2695. [PMID: 37058569 PMCID: PMC10104465 DOI: 10.1126/sciadv.adf2695] [Citation(s) in RCA: 6] [Impact Index Per Article: 6.0] [Received: 11/03/2022] [Accepted: 03/14/2023] [Indexed: 06/19/2023]
Abstract
Hundreds of pathogenic variants of mitochondrial DNA (mtDNA) have been reported to cause mitochondrial diseases, which still lack effective treatments. It is a huge challenge to install these mutations one by one. Instead of installing pathogenic variants, we repurposed the DddA-derived cytosine base editor to incorporate a premature stop codon in the mtProtein-coding genes, ablating the mitochondrial proteins encoded in the mtDNA (mtProteins), and generated a library of both cell and rat resources with mtProtein depletion. In vitro, we depleted 12 of 13 mtProtein-coding genes with high efficiency and specificity, resulting in decreased mtProtein levels and impaired oxidative phosphorylation. Moreover, we generated six conditional knockout rat strains to ablate mtProteins using the Cre/loxP system. Mitochondrially encoded ATP synthase membrane subunit 8 and NADH:ubiquinone oxidoreductase core subunit 1 were specifically depleted in heart cells or neurons, resulting in heart failure or abnormal brain development. Our work provides cell and rat resources for studying the function of mtProtein-coding genes and therapeutic strategies.
Affiliation(s)
- Lei Tan
- State Key Laboratory of Reproductive Medicine, Women’s Hospital of Nanjing Medical University, Nanjing Maternity and Child Health Care Hospital, Nanjing Medical University, Nanjing, Jiangsu, China
| | - Xiaolong Qi
- Key Laboratory of Human Disease Comparative Medicine, National Health Commission of China (NHC), Institute of Laboratory Animal Science, Chinese Academy of Medical Sciences, Peking Union Medicine College, Beijing, China
| | - Weining Kong
- Beijing Engineering Research Center for Experimental Animal Models of Human Critical Diseases, Institute of Laboratory Animal Science, Chinese Academy of Medical Sciences, Peking Union Medicine College, Beijing, China
| | - Jiachuan Jin
- Center for Reproductive Medicine, The Third Affiliated Hospital of Zhengzhou University, Zhengzhou, China
| | - Dan Lu
- Key Laboratory of Human Disease Comparative Medicine, National Health Commission of China (NHC), Institute of Laboratory Animal Science, Chinese Academy of Medical Sciences, Peking Union Medicine College, Beijing, China
| | - Xu Zhang
- Beijing Engineering Research Center for Experimental Animal Models of Human Critical Diseases, Institute of Laboratory Animal Science, Chinese Academy of Medical Sciences, Peking Union Medicine College, Beijing, China
| | - Yue Wang
- State Key Laboratory of Reproductive Medicine, Women’s Hospital of Nanjing Medical University, Nanjing Maternity and Child Health Care Hospital, Nanjing Medical University, Nanjing, Jiangsu, China
| | - Siting Wang
- State Key Laboratory of Reproductive Medicine, Women’s Hospital of Nanjing Medical University, Nanjing Maternity and Child Health Care Hospital, Nanjing Medical University, Nanjing, Jiangsu, China
| | - Wei Dong
- Beijing Engineering Research Center for Experimental Animal Models of Human Critical Diseases, Institute of Laboratory Animal Science, Chinese Academy of Medical Sciences, Peking Union Medicine College, Beijing, China
| | - Xudong Shi
- Beijing Engineering Research Center for Experimental Animal Models of Human Critical Diseases, Institute of Laboratory Animal Science, Chinese Academy of Medical Sciences, Peking Union Medicine College, Beijing, China
| | - Wei Chen
- Key Laboratory of Human Disease Comparative Medicine, National Health Commission of China (NHC), Institute of Laboratory Animal Science, Chinese Academy of Medical Sciences, Peking Union Medicine College, Beijing, China
| | - Jianying Wang
- State Key Laboratory of Reproductive Medicine, Women’s Hospital of Nanjing Medical University, Nanjing Maternity and Child Health Care Hospital, Nanjing Medical University, Nanjing, Jiangsu, China
| | - Keru Li
- Key Laboratory of Human Disease Comparative Medicine, National Health Commission of China (NHC), Institute of Laboratory Animal Science, Chinese Academy of Medical Sciences, Peking Union Medicine College, Beijing, China
| | - Yuan Xie
- Department of Bioinformatics, School of Biomedical Engineering and Informatics, Nanjing Medical University, Nanjing, Jiangsu, China
| | - Lijuan Gao
- Key Laboratory of Human Disease Comparative Medicine, National Health Commission of China (NHC), Institute of Laboratory Animal Science, Chinese Academy of Medical Sciences, Peking Union Medicine College, Beijing, China
| | - Feifei Guan
- Beijing Engineering Research Center for Experimental Animal Models of Human Critical Diseases, Institute of Laboratory Animal Science, Chinese Academy of Medical Sciences, Peking Union Medicine College, Beijing, China
| | - Kai Gao
- Beijing Engineering Research Center for Experimental Animal Models of Human Critical Diseases, Institute of Laboratory Animal Science, Chinese Academy of Medical Sciences, Peking Union Medicine College, Beijing, China
| | - Chaojun Li
- State Key Laboratory of Reproductive Medicine, Women’s Hospital of Nanjing Medical University, Nanjing Maternity and Child Health Care Hospital, Nanjing Medical University, Nanjing, Jiangsu, China
| | - Cheng Wang
- State Key Laboratory of Reproductive Medicine, Women’s Hospital of Nanjing Medical University, Nanjing Maternity and Child Health Care Hospital, Nanjing Medical University, Nanjing, Jiangsu, China
- Department of Bioinformatics, School of Biomedical Engineering and Informatics, Nanjing Medical University, Nanjing, Jiangsu, China
| | - Zhibin Hu
- State Key Laboratory of Reproductive Medicine, Women’s Hospital of Nanjing Medical University, Nanjing Maternity and Child Health Care Hospital, Nanjing Medical University, Nanjing, Jiangsu, China
- Center for Global Health, School of Public Health, Nanjing Medical University, Nanjing, Jiangsu, China
- Gusu School, Nanjing Medical University, Nanjing, Jiangsu, China
| | - Lianfeng Zhang
- Key Laboratory of Human Disease Comparative Medicine, National Health Commission of China (NHC), Institute of Laboratory Animal Science, Chinese Academy of Medical Sciences, Peking Union Medicine College, Beijing, China
- Neuroscience center, Chinese Academy of Medical Sciences, Beijing, China
| | - Xuejiang Guo
- State Key Laboratory of Reproductive Medicine, Women’s Hospital of Nanjing Medical University, Nanjing Maternity and Child Health Care Hospital, Nanjing Medical University, Nanjing, Jiangsu, China
| | - Bin Shen
- State Key Laboratory of Reproductive Medicine, Women’s Hospital of Nanjing Medical University, Nanjing Maternity and Child Health Care Hospital, Nanjing Medical University, Nanjing, Jiangsu, China
- Center for Global Health, School of Public Health, Nanjing Medical University, Nanjing, Jiangsu, China
- Gusu School, Nanjing Medical University, Nanjing, Jiangsu, China
- Zhejiang Laboratory, Hangzhou, Zhejiang, China
| | - Yuanwu Ma
- Key Laboratory of Human Disease Comparative Medicine, National Health Commission of China (NHC), Institute of Laboratory Animal Science, Chinese Academy of Medical Sciences, Peking Union Medicine College, Beijing, China
- Neuroscience center, Chinese Academy of Medical Sciences, Beijing, China
- National Human Diseases Animal Model Resource Center, Institute of Laboratory Animal Science, Chinese Academy of Medical Sciences, Peking Union Medicine College, Beijing, China
|
38
|
Bezdvornykh I, Cherkasov N, Kanapin A, Samsonova A. A collection of read depth profiles at structural variant breakpoints. Sci Data 2023; 10:186. [PMID: 37024526 PMCID: PMC10079824 DOI: 10.1038/s41597-023-02076-4] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/18/2022] [Accepted: 03/16/2023] [Indexed: 04/08/2023] Open
Abstract
SWaveform, a newly created open genome-wide resource for read depth signal in the vicinity of structural variant (SV) breakpoints, aims to boost development of computational tools and algorithms for discovery of genomic rearrangement events from sequencing data. SVs are a dominant force shaping genomes and contribute substantially to genetic diversity. Still, there are challenges in reliable and efficient genotyping of SVs from whole genome sequencing data, which delays translation into clinical applications and wastes valuable resources. SWaveform includes a database containing ~7 million read depth profiles at SV breakpoints extracted from 911 sequencing samples generated by the Human Genome Diversity Project, generalised patterns of the signal at breakpoints, an interface for navigation and download, and a toolbox for local deployment with users' own data. The dataset can be of immense value to bioinformatics and engineering communities as it enables the application of intelligent signal processing and machine learning techniques for discovery of genomic rearrangement events, and thus opens the way for development of innovative algorithms and software.
Affiliation(s)
- Igor Bezdvornykh
- Institute of Translational Biomedicine, Saint Petersburg State University, Saint Petersburg, 199004, Russia
| | - Nikolay Cherkasov
- Institute of Translational Biomedicine, Saint Petersburg State University, Saint Petersburg, 199004, Russia
| | - Alexander Kanapin
- Institute of Translational Biomedicine, Saint Petersburg State University, Saint Petersburg, 199004, Russia
| | - Anastasia Samsonova
- Institute of Translational Biomedicine, Saint Petersburg State University, Saint Petersburg, 199004, Russia.
|
39
|
Weisweiler M, Stich B. Benchmarking of structural variant detection in the tetraploid potato genome using linked-read sequencing. Genomics 2023; 115:110568. [PMID: 36702293 DOI: 10.1016/j.ygeno.2023.110568] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/28/2022] [Revised: 01/12/2023] [Accepted: 01/18/2023] [Indexed: 01/25/2023]
Abstract
It has recently been shown that structural variants (SV) can have a higher impact on gene expression variation than single nucleotide variants (SNV) in different plant species. Additionally, SV have been associated with phenotypic variation in several crops. However, compared to the established SV detection based on short-read sequencing, fewer approaches have been described for linked-read based SV calling. We therefore evaluated the performance of six linked-read SV callers compared to an established short-read SV caller based on simulated linked-reads in tetraploid potato. The objectives of our study were to i) compare the performance of SV callers based on linked-read sequencing to short-read sequencing, ii) examine the influence of SV type, SV length, haplotype incidence (HI), as well as sequencing coverage on the SV calling performance in the tetraploid potato genome, and iii) evaluate the accuracy of detecting insertions by linked-read compared to short-read sequencing. We observed high break point resolutions (BPR) when detecting short SV and slightly lower BPR for large SV. Our observations highlighted the importance of short-read signals provided by Manta and LinkedSV to detect short SV. Manta and NAIBR performed well for detecting larger deletions, inversions, and duplications. Detection of large SV was only weakly influenced by the HI. Furthermore, we illustrated that large insertions can be assembled by Novel-X. Our results suggest the usage of the short-read and linked-read SV callers Manta, NAIBR, LinkedSV, and Novel-X based on at least 90x linked-read sequencing coverage to ensure the detection of a broad range of SV in the tetraploid potato genome.
Affiliation(s)
- Marius Weisweiler
- Institute for Quantitative Genetics and Genomics of Plants, Universitätsstraße 1, 40225 Düsseldorf, Germany
| | - Benjamin Stich
- Institute for Quantitative Genetics and Genomics of Plants, Universitätsstraße 1, 40225 Düsseldorf, Germany; Cluster of Excellence on Plant Sciences, From Complex Traits towards Synthetic Modules, Universitätsstraße 1, 40225 Düsseldorf, Germany; Max Planck Institute for Plant Breeding Research, Carl-von-Linne-Weg 10, 50829 Köln, Germany.
|
40
|
Divakar MK, Jain A, Bhoyar RC, Senthivel V, Jolly B, Imran M, Sharma D, Bajaj A, Gupta V, Scaria V, Sivasubbu S. Whole-genome sequencing of 1029 Indian individuals reveals unique and rare structural variants. J Hum Genet 2023; 68:409-417. [PMID: 36813834 DOI: 10.1038/s10038-023-01131-7] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/14/2022] [Revised: 01/31/2023] [Accepted: 02/06/2023] [Indexed: 02/24/2023]
Abstract
Structural variants contribute to genetic variability in human genomes and can occur in population-specific patterns. We aimed to understand the landscape of structural variants in the genomes of healthy Indian individuals and explore their potential implications in genetic disease conditions. For the identification of structural variants, a whole genome sequencing dataset of 1029 self-declared healthy Indian individuals from the IndiGen project was analysed. Further, these variants were evaluated for potential pathogenicity and their associations with genetic diseases. We also compared our identified variations with the existing global datasets. We generated a compendium of 38,560 high-confidence structural variants in total, comprising 28,393 deletions, 5,030 duplications, 5,038 insertions, and 99 inversions. Notably, around 55% of these variants were unique to the studied population. Further analysis revealed 134 deletions with predicted pathogenic/likely pathogenic effects, and their affected genes were mainly enriched for neurological disease conditions, such as intellectual disability and neurodegenerative diseases. The IndiGenomes dataset helped us to understand the unique spectrum of structural variants in the Indian population. More than half of the identified variants were not present in the publicly available global dataset on structural variants. Clinically important deletions identified in IndiGenomes might aid in improving the diagnosis of unsolved genetic diseases, particularly in neurological conditions. Along with basal allele frequency data and clinically important deletions, IndiGenomes data might serve as a baseline resource for future studies on genomic structural variant analysis in the Indian population.
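As a quick consistency check on the counts quoted in this abstract, the four per-type counts can be summed and compared with the stated total. This is only an arithmetic sketch using the numbers given above; the variable names are illustrative and the percentage shares are derived here, not reported by the authors.

```python
# Per-type SV counts as quoted in the abstract.
counts = {
    "deletions": 28393,
    "duplications": 5030,
    "insertions": 5038,
    "inversions": 99,
}
reported_total = 38560

total = sum(counts.values())
# Derived shares (percent of all SVs); these are not figures from the paper.
shares = {sv_type: 100 * n / total for sv_type, n in counts.items()}

print(f"sum of per-type counts: {total} (reported: {reported_total})")
for sv_type, share in shares.items():
    print(f"{sv_type:>12}: {share:5.1f}%")
```

The per-type counts add up to the reported total, with deletions making up the large majority of calls.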
Affiliation(s)
- Mohit Kumar Divakar
- CSIR-Institute of Genomics and Integrative Biology (CSIR-IGIB), Mathura Road, New Delhi, 110025, India; Academy of Scientific and Innovative Research (AcSIR), Ghaziabad, 201002, India
| | - Abhinav Jain
- CSIR-Institute of Genomics and Integrative Biology (CSIR-IGIB), Mathura Road, New Delhi, 110025, India; Academy of Scientific and Innovative Research (AcSIR), Ghaziabad, 201002, India
| | - Rahul C Bhoyar
- CSIR-Institute of Genomics and Integrative Biology (CSIR-IGIB), Mathura Road, New Delhi, 110025, India
| | - Vigneshwar Senthivel
- CSIR-Institute of Genomics and Integrative Biology (CSIR-IGIB), Mathura Road, New Delhi, 110025, India; Academy of Scientific and Innovative Research (AcSIR), Ghaziabad, 201002, India
| | - Bani Jolly
- CSIR-Institute of Genomics and Integrative Biology (CSIR-IGIB), Mathura Road, New Delhi, 110025, India; Academy of Scientific and Innovative Research (AcSIR), Ghaziabad, 201002, India
| | - Mohamed Imran
- CSIR-Institute of Genomics and Integrative Biology (CSIR-IGIB), Mathura Road, New Delhi, 110025, India; Academy of Scientific and Innovative Research (AcSIR), Ghaziabad, 201002, India
| | - Disha Sharma
- CSIR-Institute of Genomics and Integrative Biology (CSIR-IGIB), Mathura Road, New Delhi, 110025, India; Academy of Scientific and Innovative Research (AcSIR), Ghaziabad, 201002, India
| | - Anjali Bajaj
- CSIR-Institute of Genomics and Integrative Biology (CSIR-IGIB), Mathura Road, New Delhi, 110025, India; Academy of Scientific and Innovative Research (AcSIR), Ghaziabad, 201002, India
| | - Vishu Gupta
- CSIR-Institute of Genomics and Integrative Biology (CSIR-IGIB), Mathura Road, New Delhi, 110025, India; Academy of Scientific and Innovative Research (AcSIR), Ghaziabad, 201002, India
| | - Vinod Scaria
- CSIR-Institute of Genomics and Integrative Biology (CSIR-IGIB), Mathura Road, New Delhi, 110025, India; Academy of Scientific and Innovative Research (AcSIR), Ghaziabad, 201002, India
| | - Sridhar Sivasubbu
- CSIR-Institute of Genomics and Integrative Biology (CSIR-IGIB), Mathura Road, New Delhi, 110025, India; Academy of Scientific and Innovative Research (AcSIR), Ghaziabad, 201002, India
|
41
|
Khandekar A, Vangara R, Barnes M, Díaz-Gay M, Abbasi A, Bergstrom EN, Steele CD, Pillay N, Alexandrov LB. Visualizing and exploring patterns of large mutational events with SigProfilerMatrixGenerator. bioRxiv 2023:2023.02.03.527015. [PMID: 36778452 PMCID: PMC9915726 DOI: 10.1101/2023.02.03.527015] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Indexed: 02/07/2023]
Abstract
BACKGROUND All cancers harbor somatic mutations in their genomes. In principle, mutations affecting between one and fifty base pairs are generally classified as small mutational events. Conversely, large mutational events affect more than fifty base pairs, and, in most cases, they encompass copy-number and structural variants affecting many thousands of base pairs. Prior studies have demonstrated that examining patterns of somatic mutations can provide both biological and clinical insights, resulting in an extensive repertoire of tools for evaluating small mutational events. Recently, classification schemas for examining large-scale mutational events have emerged and shown their utility across the spectrum of human cancers. However, there has been no standard bioinformatics tool that allows visualizing and exploring these large-scale mutational events. RESULTS Here, we present a new version of SigProfilerMatrixGenerator that now delivers integrated capabilities for examining large mutational events. The tool provides support for examining copy-number variants and structural variants under two previously developed classification schemas, and it supports data from numerous algorithms and data modalities. SigProfilerMatrixGenerator is written in Python, with an R wrapper package provided for users who prefer working in an R environment. CONCLUSIONS The new version of SigProfilerMatrixGenerator provides the first standardized bioinformatics tool for optimized exploration and visualization of two previously developed classification schemas for copy number and structural variants. The tool is freely available at https://github.com/AlexandrovLab/SigProfilerMatrixGenerator with extensive documentation at https://osf.io/s93d5/wiki/home/.
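The size convention stated at the start of this abstract (one to fifty base pairs is a small event, more than fifty is a large event) can be written as a one-line rule. The sketch below is a hypothetical helper for illustration only; `classify_event` is not part of SigProfilerMatrixGenerator, and the tool's actual classification schemas for copy-number and structural variants are considerably richer than a single length threshold.

```python
SMALL_EVENT_MAX_BP = 50  # threshold taken from the abstract's definition


def classify_event(length_bp: int) -> str:
    """Label a mutational event as 'small' or 'large' by affected length.

    Hypothetical helper: applies only the 1-50 bp versus >50 bp convention
    described in the abstract.
    """
    if length_bp < 1:
        raise ValueError("an event must affect at least one base pair")
    return "small" if length_bp <= SMALL_EVENT_MAX_BP else "large"


for length in (1, 50, 51, 25_000):
    print(f"{length:>6} bp -> {classify_event(length)}")
```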
|
42
|
Udine E, Jain A, van Blitterswijk M. Advances in sequencing technologies for amyotrophic lateral sclerosis research. Mol Neurodegener 2023; 18:4. [PMID: 36635726 PMCID: PMC9838075 DOI: 10.1186/s13024-022-00593-1] [Citation(s) in RCA: 3] [Impact Index Per Article: 3.0] [Received: 10/06/2022] [Accepted: 12/23/2022] [Indexed: 01/14/2023] Open
Abstract
Amyotrophic lateral sclerosis (ALS) is caused by upper and lower motor neuron loss and has a fairly rapid disease progression, leading to fatality in an average of 2-5 years after symptom onset. Numerous genes have been implicated in this disease; however, many cases remain unexplained. Several technologies are being used to identify regions of interest and investigate candidate genes. Initial approaches to detect ALS genes include, among others, linkage analysis, Sanger sequencing, and genome-wide association studies. More recently, next-generation sequencing methods, such as whole-exome and whole-genome sequencing, have been introduced. While those methods have been particularly useful in discovering new ALS-linked genes, methodological advances are becoming increasingly important, especially given the complex genetics of ALS. Novel sequencing technologies, like long-read sequencing, are beginning to be used to uncover the contribution of repeat expansions and other types of structural variation, which may help explain missing heritability in ALS. In this review, we discuss how popular and/or upcoming methods are being used to discover ALS genes, highlighting emerging long-read sequencing platforms and their role in aiding our understanding of this challenging disease.
Affiliation(s)
- Evan Udine
- Department of Neuroscience, Mayo Clinic, 4500 San Pablo Road S, Jacksonville, FL 32224, USA; Mayo Clinic Graduate School of Biomedical Sciences, 4500 San Pablo Road S, Jacksonville, FL 32224, USA
| | - Angita Jain
- Department of Neuroscience, Mayo Clinic, 4500 San Pablo Road S, Jacksonville, FL 32224, USA; Mayo Clinic Graduate School of Biomedical Sciences, 4500 San Pablo Road S, Jacksonville, FL 32224, USA; Center for Clinical and Translational Sciences, Mayo Clinic, 4500 San Pablo Road S, Jacksonville, FL 32224, USA
| | - Marka van Blitterswijk
- Department of Neuroscience, Mayo Clinic, 4500 San Pablo Road S, Jacksonville, FL, 32224, USA.
|
43
|
Li J, Gao L, Ye Y. HiSV: A control-free method for structural variation detection from Hi-C data. PLoS Comput Biol 2023; 19:e1010760. [PMID: 36608109 DOI: 10.1371/journal.pcbi.1010760] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/17/2022] [Accepted: 11/24/2022] [Indexed: 01/07/2023] Open
Abstract
Structural variations (SVs) play an essential role in the evolution of human genomes and are associated with cancer genetics and rare disease. High-throughput chromosome capture (Hi-C) technology probes all genome-wide crosslinked chromatin to study the spatial architecture of chromosomes. Hi-C read pairs can span megabases, making the technology useful for detecting large-scale SVs. So far, the identification of SVs from Hi-C data is still in the early stages, with only a few methods available. In particular, no algorithm has been developed that can detect SVs without control samples. Therefore, we developed HiSV (Hi-C for Structural Variation), a control-free method for identifying large-scale SVs from a Hi-C sample. Inspired by the single image saliency detection model, HiSV constructs a saliency map of interaction frequencies and extracts saliency segments as large-scale SVs. In evaluations on both simulated and real data, HiSV not only detected all variant types, but also achieved a higher level of accuracy and sensitivity than existing methods. Moreover, our results on cancer cell lines showed that HiSV effectively detected eight complex SV events and identified two novel SVs of key factors associated with cancer development. Finally, we found that integrating the results of HiSV helped the WGS method to identify a total of 94 novel SVs in two cancer cell lines.
Affiliation(s)
- Junping Li
- Department of Computer Science, School of Computer Science and Technology, Xidian University, Xi'an, Shaanxi, China
| | - Lin Gao
- Department of Computer Science, School of Computer Science and Technology, Xidian University, Xi'an, Shaanxi, China
| | - Yusen Ye
- Department of Computer Science, School of Computer Science and Technology, Xidian University, Xi'an, Shaanxi, China
|
44
|
Stuart KC, Sherwin WB, Edwards RJ, Rollins LA. Evolutionary genomics: Insights from the invasive European starlings. Front Genet 2023; 13:1010456. [PMID: 36685843 PMCID: PMC9845568 DOI: 10.3389/fgene.2022.1010456] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/03/2022] [Accepted: 11/23/2022] [Indexed: 01/06/2023] Open
Abstract
Two fundamental questions for evolutionary studies are the speed at which evolution occurs, and the way that this evolution may present itself within an organism's genome. Evolutionary studies on invasive populations are poised to tackle some of these pressing questions, including understanding the mechanisms behind rapid adaptation, and how it facilitates population persistence within a novel environment. Investigation of these questions is assisted by recent developments in experimental, sequencing, and analytical protocols; in particular, the growing accessibility of next generation sequencing has enabled a broader range of taxa to be characterised. In this perspective, we discuss recent genetic findings within the invasive European starlings in Australia, and outline some critical next steps within this research system. Further, we use discoveries within this study system to guide discussion of pressing future research directions more generally within the fields of population and evolutionary genetics, including the use of historic specimens, phenotypic data, non-SNP genetic variants (e.g., structural variants), and pan-genomes. In particular, we emphasise the need for exploratory genomics studies across a range of invasive taxa so we can begin understanding broad mechanisms that underpin rapid adaptation in these systems. Understanding how genetic diversity arises and is maintained in a population, and how this contributes to adaptability, requires a deep understanding of how evolution functions at the molecular level, and is of fundamental importance for future studies and the preservation of biodiversity across the globe.
Affiliation(s)
- Katarina C. Stuart
- Evolution & Ecology Research Centre, School of Biological, Earth and Environmental Sciences, UNSW Sydney, Sydney, NSW, Australia
| | - William B. Sherwin
- Evolution & Ecology Research Centre, School of Biological, Earth and Environmental Sciences, UNSW Sydney, Sydney, NSW, Australia
| | - Richard J. Edwards
- Evolution & Ecology Research Centre, School of Biotechnology and Biomolecular Sciences, UNSW Sydney, Sydney, NSW, Australia
| | - Lee A Rollins
- Evolution & Ecology Research Centre, School of Biological, Earth and Environmental Sciences, UNSW Sydney, Sydney, NSW, Australia
|
45
|
Ding X, Han J, Van Winkle LS, Zhang QY. Detection of Transgene Location in the CYP2A13/2B6/2F1-transgenic Mouse Model using Optical Genome Mapping Technology. Drug Metab Dispos 2023; 51:46-53. [PMID: 36273825 PMCID: PMC9832375 DOI: 10.1124/dmd.122.001090] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/18/2022] [Revised: 10/01/2022] [Accepted: 10/04/2022] [Indexed: 01/14/2023] Open
Abstract
Most transgenic mouse models are generated through random integration of the transgene. The location of the transgene provides valuable information for assessing potential effects of the transgenesis on the host and for designing genotyping protocols that can amplify across the integration site, but it is challenging to identify. Here, we report the successful use of optical genome mapping technology to identify the transgene insertion site in a CYP2A13/2B6/2F1-transgenic mouse model, which produces three human cytochrome P450 (P450) enzymes (CYP2A13, CYP2B6, and CYP2F1) that are encoded by neighboring genes on human chromosome 19. These enzymes metabolize many drugs, respiratory toxicants, and chemical carcinogens. Initial efforts to identify candidate insertion sites by whole genome sequencing were unsuccessful, apparently because the transgene is located in a region of the mouse genome that contains highly repetitive sequences. Subsequent use of the optical genome mapping approach, which compares genome-wide marker distribution between the transgenic mouse genome and a reference mouse (GRCm38) or human (GRCh38) genome, localized the insertion site to mouse chromosome 14, between two marker positions at 4451324 base pair and 4485032 base pair. A transgene-mouse genome junction sequence was further identified through long-polymerase chain reaction amplification and DNA sequencing at GRCm38 Chr.14:4484726. The transgene insertion (∼2.4 megabase pair) contained 5-7 copies of the human transgenes, which replaced a 26.9-33.4 kilobase pair mouse genomic region, including exons 1-4 of Gm3182, a predicted and highly redundant gene. Finally, the sequencing results enabled the design of a new genotyping protocol that can distinguish between hemizygous and homozygous CYP2A13/2B6/2F1-transgenic mice.
SIGNIFICANCE STATEMENT: This study characterizes the genomic structure of, and provides a new genotyping method for, a transgenic mouse model that expresses three human P450 enzymes, CYP2A13, CYP2B6, and CYP2F1, that are important in xenobiotic metabolism and toxicity. The demonstrated success in applying the optical genome mapping technology for identification of transgene insertion sites should encourage others to do the same for other transgenic models generated through random integration, including most of the currently available human P450 transgenic mouse models.
Affiliation(s)
- Xinxin Ding, John Han, Qing-Yu Zhang
  - Department of Pharmacology and Toxicology, College of Pharmacy, University of Arizona, Tucson, Arizona
- Laura S Van Winkle
  - Center for Health and the Environment and Department of Anatomy Physiology and Cell Biology, School of Veterinary Medicine, UC Davis, Davis, California
46
Lesack K, Mariene GM, Andersen EC, Wasmuth JD. Different structural variant prediction tools yield considerably different results in Caenorhabditis elegans. PLoS One 2022; 17:e0278424. [PMID: 36584177 PMCID: PMC9803319 DOI: 10.1371/journal.pone.0278424] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/15/2022] [Accepted: 11/15/2022] [Indexed: 01/01/2023] Open
Abstract
The accurate characterization of structural variation is crucial for our understanding of how large chromosomal alterations affect phenotypic differences and contribute to genome evolution. Whole-genome sequencing is a popular approach for identifying structural variants, but the accuracy of popular tools remains unclear due to the limitations of existing benchmarks. Moreover, the performance of these tools for predicting variants in non-human genomes is less certain, as most tools were developed and benchmarked using data from the human genome. To evaluate the use of long-read data for the validation of short-read structural variant calls, the agreement between predictions from a short-read ensemble learning method and from long-read tools was compared using real and simulated data from Caenorhabditis elegans. The results obtained from simulated data indicate that the best performing tool is contingent on the type and size of the variant, as well as the sequencing depth of coverage. These results also highlight the need for reference datasets generated from real data that can be used as 'ground truth' in benchmarks.
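One common way to quantify agreement between two sets of structural variant calls is reciprocal overlap. The abstract does not say which matching criterion the authors used, so the sketch below is only an illustration under that assumption: the `SV` record, the function names, and the 0.5 threshold are all hypothetical and are not taken from the paper.

```python
from typing import NamedTuple, Sequence


class SV(NamedTuple):
    """A hypothetical structural variant record (half-open interval)."""
    chrom: str
    start: int   # 0-based, inclusive
    end: int     # exclusive
    svtype: str  # e.g. "DEL", "DUP", "INV"


def reciprocal_overlap(a: SV, b: SV) -> float:
    """Smaller of the two overlap fractions; 0.0 if the calls cannot match."""
    if a.chrom != b.chrom or a.svtype != b.svtype:
        return 0.0
    shared = min(a.end, b.end) - max(a.start, b.start)
    if shared <= 0:
        return 0.0
    return min(shared / (a.end - a.start), shared / (b.end - b.start))


def supported_fraction(short_calls: Sequence[SV],
                       long_calls: Sequence[SV],
                       min_ro: float = 0.5) -> float:
    """Fraction of short-read calls matched by at least one long-read call."""
    if not short_calls:
        return 0.0
    hits = sum(
        any(reciprocal_overlap(s, l) >= min_ro for l in long_calls)
        for s in short_calls
    )
    return hits / len(short_calls)
```

The quadratic scan is fine for small call sets; genome-scale comparisons would normally use an interval index instead.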
Affiliation(s)
- Kyle Lesack
  - Faculty of Veterinary Medicine, University of Calgary, Alberta, Canada
  - Host-Parasite Interactions Research Training Network, University of Calgary, Alberta, Canada
- Grace M. Mariene
  - Faculty of Veterinary Medicine, University of Calgary, Alberta, Canada
  - Host-Parasite Interactions Research Training Network, University of Calgary, Alberta, Canada
- Erik C. Andersen
  - Department of Molecular Biosciences, Northwestern University, Evanston, IL, United States of America
- James D. Wasmuth
  - Faculty of Veterinary Medicine, University of Calgary, Alberta, Canada
  - Host-Parasite Interactions Research Training Network, University of Calgary, Alberta, Canada
47
Dashnow H, Pedersen BS, Hiatt L, Brown J, Beecroft SJ, Ravenscroft G, LaCroix AJ, Lamont P, Roxburgh RH, Rodrigues MJ, Davis M, Mefford HC, Laing NG, Quinlan AR. STRling: a k-mer counting approach that detects short tandem repeat expansions at known and novel loci. Genome Biol 2022; 23:257. [PMID: 36517892 PMCID: PMC9753380 DOI: 10.1186/s13059-022-02826-4] [Citation(s) in RCA: 13] [Impact Index Per Article: 6.5] [Received: 11/29/2021] [Accepted: 11/30/2022] [Indexed: 12/23/2022] Open
Abstract
Expansions of short tandem repeats (STRs) cause many rare diseases. Expansion detection is challenging with short-read DNA sequencing data since supporting reads are often mapped incorrectly. Detection is particularly difficult for "novel" STRs, which include new motifs at known loci or STRs absent from the reference genome. We developed STRling to efficiently count k-mers to recover informative reads and call expansions at known and novel STR loci. STRling is sensitive to known STR disease loci, has a low false discovery rate, and resolves novel STR expansions to base-pair position accuracy. It is fast, scalable, open-source, and available at: github.com/quinlan-lab/STRling.
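The idea of counting k-mers to flag reads dominated by a repeat unit can be illustrated with a short sketch. This is not STRling's implementation: the function names and the rotation-collapsing step are assumptions made for illustration, and strand (reverse-complement) handling is deliberately left out.

```python
from collections import Counter


def min_rotation(kmer: str) -> str:
    """Collapse rotations of a k-mer (CAG, AGC, GCA) to one representative."""
    return min(kmer[i:] + kmer[:i] for i in range(len(kmer)))


def dominant_repeat_unit(read: str, k: int) -> tuple[str, float]:
    """Return the most frequent rotation-collapsed k-mer in a read and the
    fraction of k-mer windows it accounts for (hypothetical helper)."""
    windows = len(read) - k + 1
    if windows <= 0:
        return "", 0.0
    counts = Counter(min_rotation(read[i:i + k]) for i in range(windows))
    unit, n = counts.most_common(1)[0]
    return unit, n / windows
```

A read made up almost entirely of one repeat unit scores close to 1.0, while ordinary sequence scores much lower, so a threshold on this fraction is one simple way to pick out candidate STR reads regardless of where they mapped.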
Affiliation(s)
- Harriet Dashnow
  - Department of Human Genetics, University of Utah, Salt Lake City, UT USA
- Brent S. Pedersen
  - Department of Human Genetics, University of Utah, Salt Lake City, UT USA
  - Utrecht University Medical Center, Utrecht, The Netherlands
- Laurel Hiatt
  - Department of Human Genetics, University of Utah, Salt Lake City, UT USA
- Joe Brown
  - Department of Human Genetics, University of Utah, Salt Lake City, UT USA
- Sarah J. Beecroft
  - Pawsey Supercomputing Research Centre, Kensington, WA Australia
  - Harry Perkins Institute of Medical Research and Centre for Medical Research, University of Western Australia, Perth, WA Australia
- Gianina Ravenscroft
  - Harry Perkins Institute of Medical Research and Centre for Medical Research, University of Western Australia, Perth, WA Australia
- Amy J. LaCroix
  - Department of Pediatrics, Division of Genetic Medicine, University of Washington, Seattle, WA 98195 USA
- Phillipa Lamont
  - Neurogenetic Unit, Royal Perth Hospital, Perth, WA Australia
- Richard H. Roxburgh
  - Neurology, Auckland City Hospital, Auckland, New Zealand
- Miriam J. Rodrigues
  - Neurology, Auckland City Hospital, Auckland, New Zealand
  - Centre for Brain Research, University of Auckland, Auckland, New Zealand
- Mark Davis
  - Neurogenetics Unit, Department of Diagnostic Genomics, PathWest Laboratory Medicine, Western Australian Department of Health, Nedlands, Australia
- Heather C. Mefford
  - Department of Pediatrics, Division of Genetic Medicine, University of Washington, Seattle, WA 98195 USA
- Nigel G. Laing
  - Harry Perkins Institute of Medical Research and Centre for Medical Research, University of Western Australia, Perth, WA Australia
  - Neurogenetics Unit, Department of Diagnostic Genomics, PathWest Laboratory Medicine, Western Australian Department of Health, Nedlands, Australia
- Aaron R. Quinlan
  - Department of Human Genetics, University of Utah, Salt Lake City, UT USA
48
Muñoz-Barrera A, Rubio-Rodríguez LA, Díaz-de Usera A, Jáspez D, Lorenzo-Salazar JM, González-Montelongo R, García-Olivares V, Flores C. From Samples to Germline and Somatic Sequence Variation: A Focus on Next-Generation Sequencing in Melanoma Research. Life (Basel) 2022; 12:1939. [PMID: 36431075 PMCID: PMC9695713 DOI: 10.3390/life12111939] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/28/2022] [Revised: 11/12/2022] [Accepted: 11/16/2022] [Indexed: 11/24/2022] Open
Abstract
Next-generation sequencing (NGS) applications have flourished in the last decade, permitting the identification of cancer driver genes and profoundly expanding the possibilities of genomic studies of cancer, including melanoma. Here we present a technical review of many of the methodological approaches enabled by NGS applications, with a focus on assessing germline and somatic sequence variation. We provide cautionary notes and discuss key technical details involved in library preparation, the most common problems with the samples, and guidance to circumvent them. We also provide an overview of the sequence-based methods for cancer genomics, setting out the pros and cons of targeted sequencing vs. exome or whole-genome sequencing (WGS), the fundamentals of the most common commercial platforms, and a comparison of throughputs and key applications. Details of the steps and the main software involved in the bioinformatics processing of the sequencing results, from preprocessing to variant prioritization and filtering, are also provided in the context of the full spectrum of genetic variation (SNVs, indels, CNVs, structural variation, and gene fusions). Finally, we emphasize selected bioinformatic pipelines behind (a) short-read WGS identification of small germline and somatic variants, (b) detection of gene fusions from transcriptomes, and (c) de novo assembly of genomes from long-read WGS data. Overall, we provide comprehensive guidance across the main methodological procedures involved in obtaining sequencing results for the most common short- and long-read NGS platforms, highlighting key applications in melanoma research.
Affiliation(s)
- Adrián Muñoz-Barrera
  - Genomics Division, Instituto Tecnológico y de Energías Renovables (ITER), 38600 Santa Cruz de Tenerife, Spain
- Luis A. Rubio-Rodríguez
  - Genomics Division, Instituto Tecnológico y de Energías Renovables (ITER), 38600 Santa Cruz de Tenerife, Spain
- Ana Díaz-de Usera
  - Genomics Division, Instituto Tecnológico y de Energías Renovables (ITER), 38600 Santa Cruz de Tenerife, Spain
  - Research Unit, Hospital Universitario Nuestra Señora de Candelaria, 38010 Santa Cruz de Tenerife, Spain
- David Jáspez
  - Genomics Division, Instituto Tecnológico y de Energías Renovables (ITER), 38600 Santa Cruz de Tenerife, Spain
- José M. Lorenzo-Salazar
  - Genomics Division, Instituto Tecnológico y de Energías Renovables (ITER), 38600 Santa Cruz de Tenerife, Spain
- Rafaela González-Montelongo
  - Genomics Division, Instituto Tecnológico y de Energías Renovables (ITER), 38600 Santa Cruz de Tenerife, Spain
- Víctor García-Olivares
  - Genomics Division, Instituto Tecnológico y de Energías Renovables (ITER), 38600 Santa Cruz de Tenerife, Spain
- Carlos Flores
  - Genomics Division, Instituto Tecnológico y de Energías Renovables (ITER), 38600 Santa Cruz de Tenerife, Spain
  - Research Unit, Hospital Universitario Nuestra Señora de Candelaria, 38010 Santa Cruz de Tenerife, Spain
  - CIBER de Enfermedades Respiratorias, Instituto de Salud Carlos III, 28029 Madrid, Spain
  - Facultad de Ciencias de la Salud, Universidad Fernando de Pessoa Canarias, 35450 Las Palmas de Gran Canaria, Spain
49
Chen Y, Miao Y, Bai W, Lin K, Pang E. Characteristics and potential functional effects of long insertions in Asian butternuts. BMC Genomics 2022; 23:732. [PMID: 36307757 PMCID: PMC9617325 DOI: 10.1186/s12864-022-08961-3] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/11/2022] [Accepted: 10/17/2022] [Indexed: 11/10/2022] Open
Abstract
Background
Structural variants (SVs) play important roles in adaptive evolution and species diversification. In plants especially, many phenotypes involved in the response to the environment have been found to be associated with SVs. Despite the prevalence and significance of SVs, long insertions remain poorly detected and studied in all but model species.
Results
We used whole-genome resequencing of paired reads from 80 Asian butternuts to detect long insertions and further analyse their characteristics and potential functional effects. By combining mapping-based and de novo assembly-based methods, we obtained a pangenome of multiple related species representing higher taxonomic groups. We obtained 89,312 distinct contigs totaling 147,773,999 base pairs (bp) of new sequences, of which 347 were putative long insertions placed in the reference genome. Most of the putative long insertions appeared in multiple species; in contrast, only 62 putative long insertions appeared in one species, which may be involved in the response to the environment. Sixty-five putative long insertions fell into 61 distinct protein-coding genes involved in plant development, and 105 putative long insertions fell upstream of 106 distinct protein-coding genes involved in cellular respiration. A total of 3,367 genes were annotated in 2,606 contigs. We propose PLAINS (https://github.com/CMB-BNU/PLAINS.git), a streamlined, comprehensive pipeline for the prediction and analysis of long insertions using whole-genome resequencing.
Conclusions
Our study lays down an important foundation for further whole-genome long insertion studies, allowing the investigation of their effects by experiments.
50
Pang H, Lin J, Luo S, Huang G, Li X, Xie Z, Zhou Z. The missing heritability in type 1 diabetes. Diabetes Obes Metab 2022; 24:1901-1911. [PMID: 35603907 PMCID: PMC9545639 DOI: 10.1111/dom.14777] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 03/07/2022] [Revised: 05/04/2022] [Accepted: 05/17/2022] [Indexed: 12/15/2022]
Abstract
Type 1 diabetes (T1D) is a complex autoimmune disease characterized by an absolute deficiency of insulin. It affects more than 20 million people worldwide and imposes an enormous financial burden on patients. The underlying pathogenic mechanisms of T1D are still obscure, but it is widely accepted that both genetics and the environment play an important role in its onset and development. Previous studies have identified more than 60 susceptibility loci associated with T1D, explaining approximately 80%-85% of the heritability. However, most identified variants confer only small increases in risk, which restricts their potential clinical application. In addition, there is still a so-called 'missing heritability' phenomenon. While the gap between known heritability and true heritability in T1D is small compared with that in other complex traits and disorders, further elucidation of T1D genetics has the potential to bring novel insights into its aetiology and provide new therapeutic targets. Many hypotheses have been proposed to explain the missing heritability, including variants remaining to be found (variants with small effect sizes, rare variants and structural variants) and interactions (gene-gene and gene-environment interactions; e.g. epigenetic effects). In the following review, we introduce the possible sources of missing heritability and discuss the existing related knowledge in the context of T1D.
Affiliation(s)
- Haipeng Pang, Jian Lin, Shuoming Luo, Gan Huang, Xia Li, Zhiguo Xie, Zhiguang Zhou
  - National Clinical Research Center for Metabolic Diseases, Key Laboratory of Diabetes Immunology (Central South University), Ministry of Education, and Department of Metabolism and Endocrinology, The Second Xiangya Hospital of Central South University, Changsha, China