1
She H, Liu Z, Xu Z, Zhang H, Wu J, Wang X, Cheng F, Charlesworth D, Qian W. Insights into spinach domestication from genome sequences of two wild spinach progenitors, Spinacia turkestanica and Spinacia tetrandra. New Phytologist 2024; 243:477-494. [PMID: 38715078 DOI: 10.1111/nph.19799] [Received: 02/12/2024] [Accepted: 04/18/2024] [Indexed: 06/07/2024]
Abstract
Cultivated spinach (Spinacia oleracea) is a dioecious species. We report high-quality genome sequences for its two closest wild relatives, Spinacia turkestanica and Spinacia tetrandra, which are also dioecious, and are used to study the genetics of spinach domestication. Using a combination of genomic approaches, we assembled genomes of both these species and analyzed them in comparison with the previously assembled S. oleracea genome. These species diverged c. 6.3 million years ago (Ma), while cultivated spinach split from S. turkestanica 0.8 Ma. In all three species, all six chromosomes include very large gene-poor, repeat-rich regions, which, in S. oleracea, are pericentromeric regions with very low recombination rates in both male and female genetic maps. We describe population genomic evidence that the similar regions in the wild species also recombine rarely. We characterized 282 structural variants (SVs) that have been selected during domestication. These regions include genes associated with leaf margin type and flowering time. We also describe evidence that the downy mildew resistance loci of cultivated spinach are derived from introgression from both wild spinach species. Collectively, this study reveals the genome architecture of spinach assemblies and highlights the importance of SVs during the domestication of cultivated spinach.
Affiliation(s)
- Hongbing She
- State Key Laboratory of Vegetable Biobreeding, Institute of Vegetables and Flowers, Chinese Academy of Agricultural Sciences, Beijing, 100081, China
- Zhiyuan Liu
- State Key Laboratory of Vegetable Biobreeding, Institute of Vegetables and Flowers, Chinese Academy of Agricultural Sciences, Beijing, 100081, China
- Zhaosheng Xu
- State Key Laboratory of Vegetable Biobreeding, Institute of Vegetables and Flowers, Chinese Academy of Agricultural Sciences, Beijing, 100081, China
- Helong Zhang
- State Key Laboratory of Vegetable Biobreeding, Institute of Vegetables and Flowers, Chinese Academy of Agricultural Sciences, Beijing, 100081, China
- Jian Wu
- State Key Laboratory of Vegetable Biobreeding, Institute of Vegetables and Flowers, Chinese Academy of Agricultural Sciences, Beijing, 100081, China
- Xiaowu Wang
- State Key Laboratory of Vegetable Biobreeding, Institute of Vegetables and Flowers, Chinese Academy of Agricultural Sciences, Beijing, 100081, China
- Feng Cheng
- State Key Laboratory of Vegetable Biobreeding, Institute of Vegetables and Flowers, Chinese Academy of Agricultural Sciences, Beijing, 100081, China
- Deborah Charlesworth
- Institute of Ecology and Evolution, School of Biological Sciences, University of Edinburgh, Charlotte Auerbach Road, Edinburgh, EH9 3FL, UK
- Wei Qian
- State Key Laboratory of Vegetable Biobreeding, Institute of Vegetables and Flowers, Chinese Academy of Agricultural Sciences, Beijing, 100081, China
2
Srivastav SP, Feschotte C, Clark AG. Rapid evolution of piRNA clusters in the Drosophila melanogaster ovary. Genome Res 2024; 34:711-724. [PMID: 38749655 DOI: 10.1101/gr.278062.123] [Received: 05/08/2023] [Accepted: 05/07/2024] [Indexed: 05/28/2024]
Abstract
The piRNA pathway is a highly conserved mechanism to repress transposable element (TE) activity in the animal germline via a specialized class of small RNAs called piwi-interacting RNAs (piRNAs). piRNAs are produced from discrete genomic regions called piRNA clusters (piCs). Although the molecular processes by which piCs function are relatively well understood in Drosophila melanogaster, much less is known about the origin and evolution of piCs in this or any other species. To investigate piC origin and evolution, we use a population genomic approach to compare piC activity and sequence composition across eight geographically distant strains of D. melanogaster with high-quality long-read genome assemblies. We perform annotations of ovary piCs and genome-wide TE content in each strain. Our analysis uncovers extensive variation in piC activity across strains and signatures of rapid birth and death of piCs. Most TEs inferred to be recently active show an enrichment of insertions into old and large piCs, consistent with the previously proposed "trap" model of piC evolution. In contrast, a small subset of active LTR families is enriched for the formation of new piCs, suggesting that these TEs have higher proclivity to form piCs. Thus, our findings uncover processes leading to the origin of piCs. We propose that piC evolution begins with the emergence of piRNAs from individual insertions of a few select TE families prone to seed new piCs that subsequently expand by accretion of insertions from most other TE families during evolution to form larger "trap" clusters. Our study shows that TEs themselves are the major force driving the rapid evolution of piCs.
Affiliation(s)
- Satyam P Srivastav
- Department of Molecular Biology and Genetics, Cornell University, Ithaca, New York 14853, USA
- Cédric Feschotte
- Department of Molecular Biology and Genetics, Cornell University, Ithaca, New York 14853, USA
- Andrew G Clark
- Department of Molecular Biology and Genetics, Cornell University, Ithaca, New York 14853, USA
3
Pan C, Reinert K. Leaf: an ultrafast filter for population-scale long-read SV detection. Genome Biol 2024; 25:155. [PMID: 38872200 PMCID: PMC11170821 DOI: 10.1186/s13059-024-03297-5] [Received: 11/28/2022] [Accepted: 06/04/2024] [Indexed: 06/15/2024] [Open Access]
Abstract
Advances in sequencing technology have facilitated population-scale long-read structural variant (SV) detection. Arguably, one of the main challenges in population-scale analysis is developing effective computational pipelines. Here, we present a new filter-based pipeline for population-scale long-read SV detection. It better captures SV signals at an early stage than conventional assembly-based or alignment-based pipelines. Assessments in this work suggest that the filter-based pipeline helps better resolve intra-read rearrangements. Moreover, it is also more computationally efficient than conventional pipelines and thus may facilitate population-scale long-read applications.
Affiliation(s)
- Chenxu Pan
- Department of Mathematics and Computer Science, Freie Universität Berlin, Takustr. 9, 14195, Berlin, Germany.
- Knut Reinert
- Department of Mathematics and Computer Science, Freie Universität Berlin, Takustr. 9, 14195, Berlin, Germany
- Department of Computational Molecular Biology, Max Planck Institute for Molecular Genetics, Berlin, 14195, Germany
4
Margalit S, Tulpová Z, Detinis Zur T, Michaeli Y, Deek J, Nifker G, Haldar R, Gnatek Y, Omer D, Dekel B, Baris Feldman H, Grunwald A, Ebenstein Y. Long-Read Structural and Epigenetic Profiling of a Kidney Tumor-Matched Sample with Nanopore Sequencing and Optical Genome Mapping. bioRxiv 2024:2024.03.31.587463. [PMID: 38915648 PMCID: PMC11195078 DOI: 10.1101/2024.03.31.587463] [Indexed: 06/26/2024]
Abstract
Carcinogenesis often involves significant alterations in the cancer genome architecture, marked by large structural and copy number variations (SVs and CNVs) that are difficult to capture with short-read sequencing. Traditionally, cytogenetic techniques are applied to detect such aberrations, but they are limited in resolution and do not cover features smaller than several hundred kilobases. Optical genome mapping and nanopore sequencing are attractive technologies that bridge this resolution gap and offer enhanced performance for cytogenetic applications. These methods profile native, individual DNA molecules, thus capturing epigenetic information. We applied both techniques to characterize a clear cell renal cell carcinoma (ccRCC) tumor's structural and copy number landscape, highlighting the relative strengths of each method in the context of variant size and average read length. Additionally, we assessed their utility for methylome and hydroxymethylome profiling, emphasizing differences in epigenetic analysis applicability.
5
Wang H, Li C, Yu X, Gao J. Deletion variants calling in third-generation sequencing data based on a dual-attention mechanism. Brief Bioinform 2024; 25:bbae269. [PMID: 38851298 PMCID: PMC11162298 DOI: 10.1093/bib/bbae269] [Received: 11/10/2023] [Revised: 04/18/2024] [Accepted: 05/23/2024] [Indexed: 06/10/2024] [Open Access]
Abstract
Deletion is a crucial type of genomic structural variation and is associated with numerous genetic diseases. The advent of third-generation sequencing technology has facilitated the analysis of complex genomic structures and the elucidation of the mechanisms underlying phenotypic changes and disease onset due to genomic variants. Importantly, it has introduced innovative perspectives for deletion variants calling. Here we propose a method named Dual Attention Structural Variation (DASV) to analyze deletion structural variations in sequencing data. DASV converts gene alignment information into images and integrates them with genomic sequencing data through a dual attention mechanism. Subsequently, it employs a multi-scale network to precisely identify deletion regions. Compared with four widely used genome structural variation calling tools: cuteSV, SVIM, Sniffles and PBSV, the results demonstrate that DASV consistently achieves a balance between precision and recall, enhancing the F1 score across various datasets. The source code is available at https://github.com/deconvolution-w/DASV.
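The balance between precision and recall mentioned in this abstract is conventionally summarized by the F1 score, the harmonic mean of the two. The sketch below shows that calculation only; the function name `precision_recall_f1` and the counts are hypothetical illustrations, not code or results from the DASV paper.

```python
def precision_recall_f1(tp: int, fp: int, fn: int) -> tuple[float, float, float]:
    """Return (precision, recall, F1) from counts of true positives,
    false positives and false negatives among called deletions."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    # F1 is the harmonic mean of precision and recall.
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Hypothetical counts, chosen only to illustrate the arithmetic.
p, r, f1 = precision_recall_f1(tp=80, fp=20, fn=20)
print(f"precision={p:.2f} recall={r:.2f} F1={f1:.2f}")
```

Because F1 is a harmonic mean, a caller that gains recall at a large cost in precision (or the reverse) scores lower than one that keeps the two close together.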
Affiliation(s)
- Han Wang
- College of Information Science and Technology, Beijing University of Chemical Technology, North Third Ring Road 15, 100029, Beijing, China
- Chang Li
- College of Information Science and Technology, Beijing University of Chemical Technology, North Third Ring Road 15, 100029, Beijing, China
- Xinyu Yu
- College of Information Science and Technology, Beijing University of Chemical Technology, North Third Ring Road 15, 100029, Beijing, China
- Jingyang Gao
- College of Information Science and Technology, Beijing University of Chemical Technology, North Third Ring Road 15, 100029, Beijing, China
6
Yu Y, Gao R, Luo J. LcDel: deletion variation detection based on clustering and long reads. Front Genet 2024; 15:1404415. [PMID: 38798694 PMCID: PMC11116628 DOI: 10.3389/fgene.2024.1404415] [Received: 03/21/2024] [Accepted: 04/25/2024] [Indexed: 05/29/2024] [Open Access]
Abstract
Motivation: Genomic structural variation refers to chromosomal-level variations such as genome rearrangement or insertion/deletion, which typically involve larger DNA fragments than single nucleotide variations. Deletion is a common type of structural variant in the genome and may lead to many diseases, so the detection of deletions can help to gain insights into the pathogenesis of diseases and provide accurate information for disease diagnosis, treatment, and prevention. Many tools exist for deletion variant detection, but they are still inadequate in some aspects, and most of them ignore the presence of chimeric variants in clustering, resulting in less precise clustering results. Results: In this paper, we present LcDel, which detects deletion variation based on clustering and long reads. LcDel first finds the candidate deletion sites and then performs a first clustering step, using one of two clustering methods (sliding window-based or coverage-based) depending on the length of the deletion. LcDel then applies a second, hierarchical clustering step to determine the location and length of the deletion. LcDel is benchmarked against other structural variation detection tools on multiple datasets, and the results show that LcDel has better detection performance for deletions. The source code is available at https://github.com/cyq1314woaini/LcDel.
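The two-stage clustering idea described in this abstract can be illustrated with a generic sketch. It is not LcDel's implementation: the first pass is a simple position-based grouping, the second pass is a length-based split standing in for the hierarchical clustering step, and every name and threshold (`max_gap`, `max_ratio`, `min_support`) is a hypothetical choice made for illustration.

```python
from statistics import median


def cluster_by_position(candidates, max_gap=100):
    """First pass: group (start, length) candidates whose start lies
    within max_gap bases of the previous candidate's start."""
    clusters = []
    for cand in sorted(candidates):
        if clusters and cand[0] - clusters[-1][-1][0] <= max_gap:
            clusters[-1].append(cand)
        else:
            clusters.append([cand])
    return clusters


def split_by_length(cluster, max_ratio=0.3):
    """Second pass: split one positional cluster wherever consecutive
    deletion lengths differ by more than max_ratio of the larger one."""
    groups = []
    for cand in sorted(cluster, key=lambda c: c[1]):
        if groups and cand[1] - groups[-1][-1][1] <= max_ratio * cand[1]:
            groups[-1].append(cand)
        else:
            groups.append([cand])
    return groups


def call_deletions(candidates, min_support=2):
    """Report (start, length, support) per group with enough supporting reads."""
    calls = []
    for cluster in cluster_by_position(candidates):
        for group in split_by_length(cluster):
            if len(group) >= min_support:
                start = int(median(c[0] for c in group))
                length = int(median(c[1] for c in group))
                calls.append((start, length, len(group)))
    return sorted(calls)


# Hypothetical per-read deletion signatures: (start position, deletion length).
reads = [(1000, 300), (1005, 310), (998, 295), (1010, 900), (1012, 910), (5000, 50)]
print(call_deletions(reads))
```

In this toy input, two deletions of very different lengths start at nearly the same position. Grouping on position alone would merge them, which is why a second pass on length is needed to separate them.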
Affiliation(s)
- Junwei Luo
- School of Software, Henan Polytechnic University, Jiaozuo, China
7
Chen Q, Wu B, Li C, Ding L, Huang S, Wang J, Zhao J. Deciphering male influence in gynogenetic Pengze crucian carp (Carassius auratus var. pengsenensis): insights from Nanopore sequencing of structural variations. Front Genet 2024; 15:1392110. [PMID: 38784042 PMCID: PMC11111978 DOI: 10.3389/fgene.2024.1392110] [Received: 02/27/2024] [Accepted: 04/11/2024] [Indexed: 05/25/2024] [Open Access]
Abstract
In this study, we investigate gynogenetic reproduction in Pengze Crucian Carp (Carassius auratus var. pengsenensis) using third-generation Nanopore sequencing to uncover structural variations (SVs) in offspring. Our objective was to understand the role of male genetic material in gynogenesis by examining the genomes of both parents and their offspring. We discovered a notable number of male-specific structural variations (MSSVs): 1,195 to 1,709 MSSVs in homologous offspring, accounting for approximately 0.52%-0.60% of their detected SVs, and 236 to 350 MSSVs in heterologous offspring, making up about 0.10%-0.13%. These results highlight the significant influence of male genetic material on the genetic composition of offspring, particularly in homologous pairs, challenging the traditional view of asexual reproduction. The gene annotation of MSSVs revealed their presence in critical gene regions, indicating potential functional impacts. Specifically, we found 5 MSSVs in the exonic regions of protein-coding genes in homologous offspring, suggesting possible direct effects on protein structure and function. Validation of an MSSV in the exonic region of the polyunsaturated fatty acid 5-lipoxygenase gene confirmed male genetic material transmission in some offspring. This study underscores the importance of further research on the genetic diversity and gynogenesis mechanisms, providing valuable insights for reproductive biology, aquaculture, and fostering innovation in biological research and aquaculture practices.
Affiliation(s)
- Qianhui Chen
- Guangzhou Key Laboratory of Subtropical Biodiversity and Biomonitoring, School of Life Sciences, South China Normal University, Guangzhou, China
- Biyu Wu
- Guangzhou Key Laboratory of Subtropical Biodiversity and Biomonitoring, School of Life Sciences, South China Normal University, Guangzhou, China
- Chao Li
- Guangzhou Key Laboratory of Subtropical Biodiversity and Biomonitoring, School of Life Sciences, South China Normal University, Guangzhou, China
- Liyun Ding
- Jiangxi Fisheries Research Institute, Nanchang, China
- Shiting Huang
- Guangzhou Key Laboratory of Subtropical Biodiversity and Biomonitoring, School of Life Sciences, South China Normal University, Guangzhou, China
- Junjie Wang
- Guangzhou Key Laboratory of Subtropical Biodiversity and Biomonitoring, School of Life Sciences, South China Normal University, Guangzhou, China
- Jun Zhao
- Guangzhou Key Laboratory of Subtropical Biodiversity and Biomonitoring, School of Life Sciences, South China Normal University, Guangzhou, China
8
Shi T, Zhang X, Hou Y, Jia C, Dan X, Zhang Y, Jiang Y, Lai Q, Feng J, Feng J, Ma T, Wu J, Liu S, Zhang L, Long Z, Chen L, Street NR, Ingvarsson PK, Liu J, Yin T, Wang J. The super-pangenome of Populus unveils genomic facets for its adaptation and diversification in widespread forest trees. Molecular Plant 2024; 17:725-746. [PMID: 38486452 DOI: 10.1016/j.molp.2024.03.009] [Received: 10/09/2023] [Revised: 02/22/2024] [Accepted: 03/11/2024] [Indexed: 04/05/2024]
Abstract
Understanding the underlying mechanisms and links between genome evolution and adaptive innovations stands as a key goal in evolutionary studies. Poplars, among the world's most widely distributed and cultivated trees, exhibit extensive phenotypic diversity and environmental adaptability. In this study, we present a genus-level super-pangenome comprising 19 Populus genomes, revealing the likely pivotal role of private genes in facilitating local environmental and climate adaptation. Through the integration of pangenomes with transcriptomes, methylomes, and chromatin accessibility mapping, we unveil that the evolutionary trajectories of pangenes and duplicated genes are closely linked to local genomic landscapes of regulatory and epigenetic architectures, notably CG methylation in gene-body regions. Further comparative genomic analyses have enabled the identification of 142 202 structural variants across species that intersect with a significant number of genes and contribute substantially to both phenotypic and adaptive divergence. We have experimentally validated a ∼180-bp presence/absence variant affecting the expression of the CUC2 gene, crucial for leaf serration formation. Finally, we developed a user-friendly web-based tool encompassing the multi-omics resources associated with the Populus super-pangenome (http://www.populus-superpangenome.com). Together, the present pioneering super-pangenome resource in forest trees not only aids in the advancement of breeding efforts of this globally important tree genus but also offers valuable insights into potential avenues for comprehending tree biology.
Affiliation(s)
- Tingting Shi
- Key Laboratory for Bio-Resources and Eco-Environment of Ministry of Education, College of Life Sciences, Sichuan University, Chengdu, Sichuan, China
- Xinxin Zhang
- Key Laboratory for Bio-Resources and Eco-Environment of Ministry of Education, College of Life Sciences, Sichuan University, Chengdu, Sichuan, China
- Yukang Hou
- Key Laboratory for Bio-Resources and Eco-Environment of Ministry of Education, College of Life Sciences, Sichuan University, Chengdu, Sichuan, China
- Changfu Jia
- Key Laboratory for Bio-Resources and Eco-Environment of Ministry of Education, College of Life Sciences, Sichuan University, Chengdu, Sichuan, China
- Xuming Dan
- Key Laboratory for Bio-Resources and Eco-Environment of Ministry of Education, College of Life Sciences, Sichuan University, Chengdu, Sichuan, China
- Yulin Zhang
- Key Laboratory for Bio-Resources and Eco-Environment of Ministry of Education, College of Life Sciences, Sichuan University, Chengdu, Sichuan, China
- Yuanzhong Jiang
- Key Laboratory for Bio-Resources and Eco-Environment of Ministry of Education, College of Life Sciences, Sichuan University, Chengdu, Sichuan, China
- Qiang Lai
- Key Laboratory for Bio-Resources and Eco-Environment of Ministry of Education, College of Life Sciences, Sichuan University, Chengdu, Sichuan, China
- Jiajun Feng
- Key Laboratory for Bio-Resources and Eco-Environment of Ministry of Education, College of Life Sciences, Sichuan University, Chengdu, Sichuan, China
- Jianju Feng
- College of Horticulture and Forestry, Tarim University, Alar 843300, China
- Tao Ma
- Key Laboratory for Bio-Resources and Eco-Environment of Ministry of Education, College of Life Sciences, Sichuan University, Chengdu, Sichuan, China
- Jiali Wu
- Key Laboratory for Bio-Resources and Eco-Environment of Ministry of Education, College of Life Sciences, Sichuan University, Chengdu, Sichuan, China
- Shuyu Liu
- Key Laboratory for Bio-Resources and Eco-Environment of Ministry of Education, College of Life Sciences, Sichuan University, Chengdu, Sichuan, China
- Lei Zhang
- Key Laboratory for Bio-Resources and Eco-Environment of Ministry of Education, College of Life Sciences, Sichuan University, Chengdu, Sichuan, China
- Zhiqin Long
- Key Laboratory for Bio-Resources and Eco-Environment of Ministry of Education, College of Life Sciences, Sichuan University, Chengdu, Sichuan, China
- Liyang Chen
- Key Laboratory for Bio-Resources and Eco-Environment of Ministry of Education, College of Life Sciences, Sichuan University, Chengdu, Sichuan, China
- Nathaniel R Street
- Umeå Plant Science Centre, Department of Plant Physiology, Umeå University, Umeå, Västerbotten, Sweden
- Pär K Ingvarsson
- Linnean Centre for Plant Biology, Department of Plant Biology, Uppsala BioCenter, Swedish University of Agricultural Sciences, Uppsala, Sweden
- Jianquan Liu
- Key Laboratory for Bio-Resources and Eco-Environment of Ministry of Education, College of Life Sciences, Sichuan University, Chengdu, Sichuan, China.
- Tongming Yin
- The Key Laboratory of Tree Genetics and Biotechnology of Jiangsu Province and Education Department of China, Nanjing Forestry University, Nanjing, Jiangsu, China.
- Jing Wang
- Key Laboratory for Bio-Resources and Eco-Environment of Ministry of Education, College of Life Sciences, Sichuan University, Chengdu, Sichuan, China.
9
Lindeboom TA, Sanchez Olmos MDC, Schulz K, Brinkmann CK, Ramírez Rojas AA, Hochrein L, Schindler D. An Optimized Genotyping Workflow for Identifying Highly SCRaMbLEd Synthetic Yeasts. ACS Synth Biol 2024; 13:1116-1127. [PMID: 38597458 PMCID: PMC11036488 DOI: 10.1021/acssynbio.3c00476] [Received: 08/03/2023] [Revised: 03/01/2024] [Accepted: 03/25/2024] [Indexed: 04/11/2024]
Abstract
Synthetic Sc2.0 yeast strains contain hundreds to thousands of loxPsym recombination sites that allow restructuring of the Saccharomyces cerevisiae genome by SCRaMbLE. Thus, a highly diverse yeast population can arise from a single genotype. The selection of genetically diverse candidates with rearranged synthetic chromosomes for downstream analysis requires an efficient and straightforward workflow. Here we present loxTags, a set of qPCR primers for genotyping across loxPsym sites to detect not only deletions but also inversions and translocations after SCRaMbLE. To cope with the large number of amplicons, we generated qTagGer, a qPCR genotyping primer prediction tool. Using loxTag-based genotyping and long-read sequencing, we show that light-inducible Cre recombinase L-SCRaMbLE can efficiently generate diverse recombination events when applied to Sc2.0 strains containing a linear or a circular version of synthetic chromosome III.
Affiliation(s)
- Timon A Lindeboom
- Max Planck Institute for Terrestrial Microbiology, Karl-von-Frisch-Str. 10, 35043 Marburg, Germany
- Karina Schulz
- Department of Molecular Biology, University of Potsdam, Karl-Liebknecht-Str. 24/25, 14476 Potsdam, Germany
- Cedric K Brinkmann
- Max Planck Institute for Terrestrial Microbiology, Karl-von-Frisch-Str. 10, 35043 Marburg, Germany
- Adán A Ramírez Rojas
- Max Planck Institute for Terrestrial Microbiology, Karl-von-Frisch-Str. 10, 35043 Marburg, Germany
- Lena Hochrein
- Department of Molecular Biology, University of Potsdam, Karl-Liebknecht-Str. 24/25, 14476 Potsdam, Germany
- Daniel Schindler
- Max Planck Institute for Terrestrial Microbiology, Karl-von-Frisch-Str. 10, 35043 Marburg, Germany
- Center for Synthetic Microbiology, Philipps-University Marburg, Karl-von-Frisch-Str. 14, 35032 Marburg, Germany
10
Kosugi S, Terao C. Comparative evaluation of SNVs, indels, and structural variations detected with short- and long-read sequencing data. Hum Genome Var 2024; 11:18. [PMID: 38632226 PMCID: PMC11024196 DOI: 10.1038/s41439-024-00276-x] [Received: 01/18/2024] [Revised: 03/12/2024] [Accepted: 03/20/2024] [Indexed: 04/19/2024] [Open Access]
Abstract
Short- and long-read sequencing technologies are routinely used to detect DNA variants, including SNVs, indels, and structural variations (SVs). However, the differences in the quality and quantity of variants detected between short- and long-read data are not fully understood. In this study, we comprehensively evaluated the variant calling performance of short- and long-read-based SNV, indel, and SV detection algorithms (6 for SNVs, 12 for indels, and 13 for SVs) using a novel evaluation framework incorporating manual visual inspection. The results showed that indel-insertion calls greater than 10 bp were poorly detected by short-read-based detection algorithms compared to long-read-based algorithms; however, the recall and precision of SNV and indel-deletion detection were similar between short- and long-read data. The recall of SV detection with short-read-based algorithms was significantly lower in repetitive regions, especially for small- to intermediate-sized SVs, than that detected with long-read-based algorithms. In contrast, the recall and precision of SV detection in nonrepetitive regions were similar between short- and long-read data. These findings suggest the need for refined strategies, such as incorporating multiple variant detection algorithms, to generate a more complete set of variants using short-read data.
Affiliation(s)
- Shunichi Kosugi
- Center for Genome Informatics, Research Organization of Information and Systems, Joint Support-Center for Data Science Research, Shizuoka, Japan.
- Advanced Genomics Center, National Institute of Genetics, Shizuoka, Japan.
- Laboratory for Statistical and Translational Genetics, RIKEN Center for Integrative Medical Sciences, Yokohama, Kanagawa, Japan.
- Clinical Research Center, Shizuoka General Hospital, Shizuoka, Japan.
- Chikashi Terao
- Laboratory for Statistical and Translational Genetics, RIKEN Center for Integrative Medical Sciences, Yokohama, Kanagawa, Japan
- Clinical Research Center, Shizuoka General Hospital, Shizuoka, Japan
- The Department of Applied Genetics, The School of Pharmaceutical Sciences, University of Shizuoka, Shizuoka, Japan
11
Fleming A, Galey M, Briggs L, Edwards M, Hogg C, John S, Wilkinson S, Quinn E, Rai R, Burgoyne T, Rogers A, Patel MP, Griffin P, Muller S, Carr SB, Loebinger MR, Lucas JS, Shah A, Jose R, Mitchison HM, Shoemark A, Miller DE, Morris-Rosendahl DJ. Combined approaches, including long-read sequencing, address the diagnostic challenge of HYDIN in primary ciliary dyskinesia. Eur J Hum Genet 2024:10.1038/s41431-024-01599-7. [PMID: 38605126 DOI: 10.1038/s41431-024-01599-7] [Received: 12/11/2023] [Revised: 03/08/2024] [Accepted: 03/18/2024] [Indexed: 04/13/2024] [Open Access]
Abstract
Primary ciliary dyskinesia (PCD), a disorder of the motile cilia, is now recognised as an underdiagnosed cause of bronchiectasis. Accurate PCD diagnosis comprises clinical assessment, analysis of cilia and the identification of biallelic variants in one of 50 known PCD-related genes, including HYDIN. HYDIN-related PCD is underdiagnosed due to the presence of a pseudogene, HYDIN2, with 98% sequence homology to HYDIN. This presents a significant challenge for Short-Read Next Generation Sequencing (SR-NGS) and analysis, and many diagnostic PCD gene panels do not include HYDIN. We have used a combined approach of SR-NGS with bioinformatic masking of HYDIN2, and state-of-the-art long-read Nanopore sequencing (LR-NGS), together with analysis of respiratory cilia including transmission electron microscopy and immunofluorescence, to address the underdiagnosis of HYDIN as a cause of PCD. Bioinformatic masking of HYDIN2 after SR-NGS facilitated the detection of biallelic HYDIN variants in 15 of 437 families, but compromised the detection of copy number variants. Supplementing testing with LR-NGS detected HYDIN deletions in 2 families, where SR-NGS had detected a single heterozygous HYDIN variant. LR-NGS was also able to confirm true homozygosity in 2 families when parental testing was not possible. Utilising a combined genomic diagnostic approach, biallelic HYDIN variants were detected in 17 families from 242 genetically confirmed PCD cases, comprising 7% of our PCD cohort. This represents the largest reported HYDIN cohort to date and highlights previous underdiagnosis of HYDIN-associated PCD. Moreover, this provides further evidence for the utility of LR-NGS in diagnostic testing, particularly for regions of high genomic complexity.
Affiliation(s)
- Andrew Fleming
- Clinical Genetics and Genomics Laboratory, Royal Brompton and Harefield Hospitals, Guy's and St. Thomas' NHS Foundation Trust, London, SW3 6NP, UK
- Miranda Galey
- Division of Genetic Medicine, Department of Pediatrics, University of Washington and Seattle Children's Hospital, Seattle, WA, USA
- Department of Laboratory Medicine and Pathology, University of Washington and Seattle Children's Hospital, Seattle, WA, 98105, USA
- Lizi Briggs
- Clinical Genetics and Genomics Laboratory, Royal Brompton and Harefield Hospitals, Guy's and St. Thomas' NHS Foundation Trust, London, SW3 6NP, UK
- Matthew Edwards
- Clinical Genetics and Genomics Laboratory, Royal Brompton and Harefield Hospitals, Guy's and St. Thomas' NHS Foundation Trust, London, SW3 6NP, UK
- Claire Hogg
- Primary Ciliary Dyskinesia Centre, Royal Brompton and Harefield Clinical Group, Guy's and St. Thomas' NHS Foundation Trust, London, SW3 6NP, UK
- National Heart and Lung Institute, Imperial College London, London, SW3 6LY, UK
- Shibu John
- Clinical Genetics and Genomics Laboratory, Royal Brompton and Harefield Hospitals, Guy's and St. Thomas' NHS Foundation Trust, London, SW3 6NP, UK
- Sam Wilkinson
- Clinical Genetics and Genomics Laboratory, Royal Brompton and Harefield Hospitals, Guy's and St. Thomas' NHS Foundation Trust, London, SW3 6NP, UK
- Ellie Quinn
- Clinical Genetics and Genomics Laboratory, Royal Brompton and Harefield Hospitals, Guy's and St. Thomas' NHS Foundation Trust, London, SW3 6NP, UK
- Ranjit Rai
- Primary Ciliary Dyskinesia Centre, Royal Brompton and Harefield Clinical Group, Guy's and St. Thomas' NHS Foundation Trust, London, SW3 6NP, UK
- Tom Burgoyne
- Primary Ciliary Dyskinesia Centre, Royal Brompton and Harefield Clinical Group, Guy's and St. Thomas' NHS Foundation Trust, London, SW3 6NP, UK
- Genetics and Genomic Medicine Department, University College London, UCL Great Ormond Street Institute of Child Health, London, WC1N 1EH, UK
- Andy Rogers
- Primary Ciliary Dyskinesia Centre, Royal Brompton and Harefield Clinical Group, Guy's and St. Thomas' NHS Foundation Trust, London, SW3 6NP, UK
- Mitali P Patel
- Genetics and Genomic Medicine Department, University College London, UCL Great Ormond Street Institute of Child Health, London, WC1N 1EH, UK
- MRC Prion Unit at UCL, Institute of Prion Diseases, UCL, London, W1W 7FF, UK
- Paul Griffin
- Primary Ciliary Dyskinesia Centre, Royal Brompton and Harefield Clinical Group, Guy's and St. Thomas' NHS Foundation Trust, London, SW3 6NP, UK
- Steven Muller
- Clinical Genetics and Genomics Laboratory, Royal Brompton and Harefield Hospitals, Guy's and St. Thomas' NHS Foundation Trust, London, SW3 6NP, UK
- Siobhan B Carr
- Primary Ciliary Dyskinesia Centre, Royal Brompton and Harefield Clinical Group, Guy's and St. Thomas' NHS Foundation Trust, London, SW3 6NP, UK
- National Heart and Lung Institute, Imperial College London, London, SW3 6LY, UK
- Michael R Loebinger
- Primary Ciliary Dyskinesia Centre, Royal Brompton and Harefield Clinical Group, Guy's and St. Thomas' NHS Foundation Trust, London, SW3 6NP, UK
- National Heart and Lung Institute, Imperial College London, London, SW3 6LY, UK
- Jane S Lucas
- Primary Ciliary Dyskinesia Centre, University Hospital Southampton NHS Foundation Trust, Southampton, SO16 6YD, UK
- Clinical and Experimental Sciences Academic Unit, University of Southampton Faculty of Medicine, Southampton, SO16 6YD, UK
- Anand Shah
- Primary Ciliary Dyskinesia Centre, Royal Brompton and Harefield Clinical Group, Guy's and St. Thomas' NHS Foundation Trust, London, SW3 6NP, UK
- MRC Centre of Global Infectious Disease Analysis, Department of Infectious Disease Epidemiology, School of Public Health, Imperial College London, London, W2 1PG, UK
- Ricardo Jose
- Primary Ciliary Dyskinesia Centre, Royal Brompton and Harefield Clinical Group, Guy's and St. Thomas' NHS Foundation Trust, London, SW3 6NP, UK
| | - Hannah M Mitchison
- Genetics and Genomic Medicine Department, University College London, UCL Great Ormond Street Institute of Child Health, London, WC1N 1EH, UK
- MRC Prion Unit at UCL, Institute of Prion Diseases, UCL, London, W1W 7FF, UK
| | - Amelia Shoemark
- Primary Ciliary Dyskinesia Centre, Royal Brompton and Harefield Clinical Group, Guy's and St. Thomas' NHS Foundation Trust, London, SW3 6NP, UK
- Respiratory Research Group, Molecular and Cellular Medicine, University of Dundee, Dundee, DD1 9SY, UK
| | - Danny E Miller
- Division of Genetic Medicine, Department of Pediatrics, University of Washington and Seattle Children's Hospital, Seattle, WA, USA
- Department of Laboratory Medicine and Pathology, University of Washington and Seattle Children's Hospital, Seattle, WA, 98105, USA
- Brotman Baty Institute for Precision Medicine, University of Washington, Seattle, WA, 98195, USA
| | - Deborah J Morris-Rosendahl
- Clinical Genetics and Genomics Laboratory, Royal Brompton and Harefield Hospitals, Guy's and St. Thomas' NHS Foundation Trust, London, SW3 6NP, UK.
- National Heart and Lung Institute, Imperial College London, London, SW3 6LY, UK.
|
12
|
Zhang S, Xu N, Fu L, Yang X, Li Y, Yang Z, Feng Y, Ma K, Jiang X, Han J, Hu R, Zhang L, de Gennaro L, Ryabov F, Meng D, He Y, Wu D, Yang C, Paparella A, Mao Y, Bian X, Lu Y, Antonacci F, Ventura M, Shepelev VA, Miga KH, Alexandrov IA, Logsdon GA, Phillippy AM, Su B, Zhang G, Eichler EE, Lu Q, Shi Y, Sun Q, Mao Y. Comparative genomics of macaques and integrated insights into genetic variation and population history. BIORXIV : THE PREPRINT SERVER FOR BIOLOGY 2024:2024.04.07.588379. [PMID: 38645259 PMCID: PMC11030432 DOI: 10.1101/2024.04.07.588379] [Indexed: 04/23/2024]
Abstract
The crab-eating macaques (Macaca fascicularis) and rhesus macaques (M. mulatta) are widely studied nonhuman primates in biomedical and evolutionary research. Despite their significance, the current understanding of the complex genomic structure in macaques and the differences between species requires substantial improvement. Here, we present a complete genome assembly of a crab-eating macaque and 20 haplotype-resolved macaque assemblies to investigate the complex regions and major genomic differences between species. Segmental duplication in macaques is ∼42% lower, while centromeres are ∼3.7 times longer than those in humans. The characterization of ∼2 Mbp fixed genetic variants and ∼240 Mbp complex loci highlights potential associations with metabolic differences between the two macaque species (e.g., CYP2C76 and EHBP1L1). Additionally, hundreds of alternative splicing differences show post-transcriptional regulation divergence between these two species (e.g., PNPO). We also characterize 91 large-scale genomic differences between macaques and humans at a single-base-pair resolution and highlight their impact on gene regulation in primate evolution (e.g., FOLH1 and PIEZO2). Finally, population genetics recapitulates macaque speciation and selective sweeps, highlighting potential genetic basis of reproduction and tail phenotype differences (e.g., STAB1, SEMA3F, and HOXD13). In summary, the integrated analysis of genetic variation and population genetics in macaques greatly enhances our comprehension of lineage-specific phenotypes, adaptation, and primate evolution, thereby improving their biomedical applications in human diseases.
|
13
|
Mori T, Fujimaru T, Liu C, Patterson K, Yamamoto K, Suzuki T, Chiga M, Sekine A, Ubara Y, Miller DE, Zalusky MPG, Mandai S, Ando F, Mori Y, Kikuchi H, Susa K, Chong JX, Bamshad MJ, Tan YQ, Zhang F, Uchida S, Sohara E. CFAP47 is a novel causative gene implicated in X-linked polycystic kidney disease. MEDRXIV : THE PREPRINT SERVER FOR HEALTH SCIENCES 2024:2024.04.05.24304760. [PMID: 38633811 PMCID: PMC11023651 DOI: 10.1101/2024.04.05.24304760] [Indexed: 04/19/2024]
Abstract
Autosomal dominant polycystic kidney disease (ADPKD) is a well-described condition in which ~80% of cases have a genetic explanation, while the genetic basis of sporadic cystic kidney disease in adults remains unclear in ~30% of cases. This study aimed to identify novel genes associated with polycystic kidney disease (PKD) in patients with sporadic cystic kidney disease in which a clear genetic change was not identified in established genes. A next-generation sequencing panel analyzed known genes related to renal cysts in 118 sporadic cases, followed by whole-genome sequencing on 47 unrelated individuals without identified candidate variants. Three male patients were found to have rare missense variants in the X-linked gene Cilia And Flagella Associated Protein 47 (CFAP47). CFAP47 was expressed in primary cilia of human renal tubules, and knockout mice exhibited vacuolation of tubular cells and tubular dilation, providing evidence that CFAP47 is a causative gene involved in cyst formation. This discovery of CFAP47 as a newly identified gene associated with PKD, displaying X-linked inheritance, emphasizes the need for further cases to understand the role of CFAP47 in PKD.
Affiliation(s)
- Takayasu Mori
- Department of Nephrology, Graduate School of Medical and Dental Sciences, Tokyo Medical and Dental University, Tokyo, Japan
| | - Takuya Fujimaru
- Department of Nephrology, Graduate School of Medical and Dental Sciences, Tokyo Medical and Dental University, Tokyo, Japan
| | - Chunyu Liu
- Soong Ching Ling Institute of Maternal and Child Health, International Peace Maternity and Child Health Hospital, Shanghai Jiao Tong University School of Medicine, Shanghai, China
- State Key Laboratory of Genetic Engineering, Institute of Medical Genetics and Genomics, Fudan University, Shanghai, China
| | - Karynne Patterson
- Department of Genome Sciences, University of Washington, Seattle, WA, USA
| | - Kohei Yamamoto
- Department of Comprehensive Pathology, Graduate School of Medical and Dental Sciences, Tokyo Medical and Dental University, Tokyo, Japan
| | - Takefumi Suzuki
- Department of Nephrology, Graduate School of Medical and Dental Sciences, Tokyo Medical and Dental University, Tokyo, Japan
| | - Motoko Chiga
- Department of Nephrology, Graduate School of Medical and Dental Sciences, Tokyo Medical and Dental University, Tokyo, Japan
| | - Akinari Sekine
- Department of Nephrology and Rheumatology, Toranomon Hospital, Japan
- Okinaka Memorial Institute for Medical Research, Toranomon Hospital, Tokyo, Japan
| | - Yoshifumi Ubara
- Department of Nephrology and Rheumatology, Toranomon Hospital, Japan
- Okinaka Memorial Institute for Medical Research, Toranomon Hospital, Tokyo, Japan
| | - Danny E Miller
- Department of Pediatrics, Division of Genetic Medicine, University of Washington, 1959 NE Pacific Street, Box 357371, Seattle, WA, 98195, USA
- Brotman-Baty Institute for Precision Medicine, 1959 NE Pacific Street, Box 357657, Seattle, WA, 98195, USA
| | - Miranda PG Zalusky
- Department of Pediatrics, Division of Genetic Medicine, University of Washington, 1959 NE Pacific Street, Box 357371, Seattle, WA, 98195, USA
| | - Shintaro Mandai
- Department of Nephrology, Graduate School of Medical and Dental Sciences, Tokyo Medical and Dental University, Tokyo, Japan
| | - Fumiaki Ando
- Department of Nephrology, Graduate School of Medical and Dental Sciences, Tokyo Medical and Dental University, Tokyo, Japan
| | - Yutaro Mori
- Department of Nephrology, Graduate School of Medical and Dental Sciences, Tokyo Medical and Dental University, Tokyo, Japan
| | - Hiroaki Kikuchi
- Department of Nephrology, Graduate School of Medical and Dental Sciences, Tokyo Medical and Dental University, Tokyo, Japan
| | - Koichiro Susa
- Department of Nephrology, Graduate School of Medical and Dental Sciences, Tokyo Medical and Dental University, Tokyo, Japan
| | | | - Jessica X. Chong
- Department of Pediatrics, Division of Genetic Medicine, University of Washington, 1959 NE Pacific Street, Box 357371, Seattle, WA, 98195, USA
- Brotman-Baty Institute for Precision Medicine, 1959 NE Pacific Street, Box 357657, Seattle, WA, 98195, USA
| | - Michael J. Bamshad
- Department of Pediatrics, Division of Genetic Medicine, University of Washington, 1959 NE Pacific Street, Box 357371, Seattle, WA, 98195, USA
- Brotman-Baty Institute for Precision Medicine, 1959 NE Pacific Street, Box 357657, Seattle, WA, 98195, USA
| | - Yue-Qiu Tan
- Institute of Reproductive and Stem Cell Engineering, NHC Key Laboratory of Human Stem Cell and Reproductive Engineering, School of Basic Medical Science, Central South University, Changsha, China
- Clinical Research Center for Reproduction and Genetics in Hunan Province, Reproductive and Genetic Hospital of CITIC-Xiangya, Changsha, China
| | - Feng Zhang
- Soong Ching Ling Institute of Maternal and Child Health, International Peace Maternity and Child Health Hospital, Shanghai Jiao Tong University School of Medicine, Shanghai, China
- State Key Laboratory of Genetic Engineering, Institute of Medical Genetics and Genomics, Fudan University, Shanghai, China
| | - Shinichi Uchida
- Department of Nephrology, Graduate School of Medical and Dental Sciences, Tokyo Medical and Dental University, Tokyo, Japan
| | - Eisei Sohara
- Department of Nephrology, Graduate School of Medical and Dental Sciences, Tokyo Medical and Dental University, Tokyo, Japan
|
14
|
Sedeek K, Mohammed N, Zhou Y, Zuccolo A, Sanikommu K, Kantharajappa S, Al-Bader N, Tashkandi M, Wing RA, Mahfouz MM. Multitrait engineering of Hassawi red rice for sustainable cultivation. PLANT SCIENCE : AN INTERNATIONAL JOURNAL OF EXPERIMENTAL PLANT BIOLOGY 2024; 341:112018. [PMID: 38325660 DOI: 10.1016/j.plantsci.2024.112018] [Received: 11/21/2023] [Revised: 01/15/2024] [Accepted: 01/31/2024] [Indexed: 02/09/2024]
Abstract
Sustainable agriculture requires locally adapted varieties that produce nutritious food with limited agricultural inputs. Genome engineering represents a viable approach to develop cultivars that fulfill these criteria. For example, the red Hassawi rice, a native landrace of Saudi Arabia, tolerates local drought and high-salinity conditions and produces grain with diverse health-promoting phytochemicals. However, Hassawi has a long growth cycle, high cultivation costs, low productivity, and susceptibility to lodging. Here, to improve these undesirable traits via genome editing, we established efficient regeneration and Agrobacterium-mediated transformation protocols for Hassawi. In addition, we generated the first high-quality reference genome and targeted the key flowering repressor gene, Hd4, thus shortening the plant's lifecycle and height. Using CRISPR/Cas9 multiplexing, we simultaneously disrupted negative regulators of flowering time (Hd2, Hd4, and Hd5), grain size (GS3), grain number (GN1a), and plant height (Sd1). The resulting homozygous mutant lines flowered extremely early (∼56 days) and had shorter stems (approximately 107 cm), longer grains (by 5.1%), and more grains per plant (by 50.2%), thereby enhancing overall productivity. Furthermore, the awns of grains were 86.4% shorter compared to unedited plants. Moreover, the modified rice grain displayed improved nutritional attributes. As a result, the modified Hassawi rice combines several desirable traits that can incentivize large-scale cultivation and reduce malnutrition.
Affiliation(s)
- Khalid Sedeek
- Laboratory for Genome Engineering and Synthetic Biology, Division of Biological Sciences, King Abdullah University of Science and Technology (KAUST), Thuwal, Saudi Arabia; Center for Desert Agriculture, Biological and Environmental Sciences and Engineering Division (BESE), King Abdullah University of Science and Technology (KAUST), Thuwal, Saudi Arabia
| | - Nahed Mohammed
- Center for Desert Agriculture, Biological and Environmental Sciences and Engineering Division (BESE), King Abdullah University of Science and Technology (KAUST), Thuwal, Saudi Arabia
| | - Yong Zhou
- Center for Desert Agriculture, Biological and Environmental Sciences and Engineering Division (BESE), King Abdullah University of Science and Technology (KAUST), Thuwal, Saudi Arabia
| | - Andrea Zuccolo
- Center for Desert Agriculture, Biological and Environmental Sciences and Engineering Division (BESE), King Abdullah University of Science and Technology (KAUST), Thuwal, Saudi Arabia; Crop Science Research Center, Sant'Anna School of Advanced Studies, Piazza Martiri della Libertà 33, 56127 Pisa, Italy
| | - Krishnaveni Sanikommu
- Laboratory for Genome Engineering and Synthetic Biology, Division of Biological Sciences, King Abdullah University of Science and Technology (KAUST), Thuwal, Saudi Arabia; Center for Desert Agriculture, Biological and Environmental Sciences and Engineering Division (BESE), King Abdullah University of Science and Technology (KAUST), Thuwal, Saudi Arabia
| | - Sunitha Kantharajappa
- Laboratory for Genome Engineering and Synthetic Biology, Division of Biological Sciences, King Abdullah University of Science and Technology (KAUST), Thuwal, Saudi Arabia; Center for Desert Agriculture, Biological and Environmental Sciences and Engineering Division (BESE), King Abdullah University of Science and Technology (KAUST), Thuwal, Saudi Arabia
| | - Noor Al-Bader
- Center for Desert Agriculture, Biological and Environmental Sciences and Engineering Division (BESE), King Abdullah University of Science and Technology (KAUST), Thuwal, Saudi Arabia
| | - Manal Tashkandi
- Department of Biological Science, College of Science, University of Jeddah, Jeddah, Saudi Arabia
| | - Rod A Wing
- Center for Desert Agriculture, Biological and Environmental Sciences and Engineering Division (BESE), King Abdullah University of Science and Technology (KAUST), Thuwal, Saudi Arabia; Arizona Genomics Institute, School of Plant Sciences, University of Arizona, Tucson, AZ, USA; International Rice Research Institute (IRRI), Strategic Innovation, Los Baños, 4031 Laguna, Philippines
| | - Magdy M Mahfouz
- Laboratory for Genome Engineering and Synthetic Biology, Division of Biological Sciences, King Abdullah University of Science and Technology (KAUST), Thuwal, Saudi Arabia; Center for Desert Agriculture, Biological and Environmental Sciences and Engineering Division (BESE), King Abdullah University of Science and Technology (KAUST), Thuwal, Saudi Arabia.
|
15
|
Jensen TD, Ni B, Reuter CM, Gorzynski JE, Fazal S, Bonner D, Ungar RA, Goddard PC, Raja A, Ashley EA, Bernstein JA, Zuchner S, Greicius MD, Montgomery SB, Schatz MC, Wheeler MT, Battle A. Integration of transcriptomics and long-read genomics prioritizes structural variants in rare disease. MEDRXIV : THE PREPRINT SERVER FOR HEALTH SCIENCES 2024:2024.03.22.24304565. [PMID: 38585781 PMCID: PMC10996727 DOI: 10.1101/2024.03.22.24304565] [Indexed: 04/09/2024]
Abstract
Rare structural variants (SVs) - insertions, deletions, and complex rearrangements - can cause Mendelian disease, yet they remain difficult to accurately detect and interpret. We sequenced and analyzed Oxford Nanopore long-read genomes of 68 individuals from the Undiagnosed Disease Network (UDN) with no previously identified diagnostic mutations from short-read sequencing. Using our optimized SV detection pipelines and 571 control long-read genomes, we detected 716 long-read rare (MAF < 0.01) SV alleles per genome on average, achieving a 2.4x increase over short reads. To characterize the functional effects of rare SVs, we assessed their relationship with gene expression from blood or fibroblasts from the same individuals, and found that rare SVs overlapping enhancers were enriched (LOR = 0.46) near expression outliers. We also evaluated tandem repeat expansions (TREs) and found 14 rare TREs per genome; notably, these TREs were also enriched near overexpression outliers. To prioritize candidate functional SVs, we developed Watershed-SV, a probabilistic model that integrates expression data with SV-specific genomic annotations, which significantly outperforms baseline models that don't incorporate expression data. Watershed-SV identified a median of eight high-confidence functional SVs per UDN genome. Notably, this included compound heterozygous deletions in FAM177A1 shared by two siblings, which were likely causal for a rare neurodevelopmental disorder. Our observations demonstrate the promise of integrating long-read sequencing with gene expression towards improving the prioritization of functional SVs and TREs in rare disease patients.
|
16
|
Liu YH, Luo C, Golding SG, Ioffe JB, Zhou XM. Tradeoffs in alignment and assembly-based methods for structural variant detection with long-read sequencing data. Nat Commun 2024; 15:2447. [PMID: 38503752 PMCID: PMC10951360 DOI: 10.1038/s41467-024-46614-z] [Received: 10/17/2022] [Accepted: 03/04/2024] [Indexed: 03/21/2024] Open
Abstract
Long-read sequencing offers long contiguous DNA fragments, facilitating diploid genome assembly and structural variant (SV) detection. Efficient and robust algorithms for SV identification are crucial with increasing data availability. Alignment-based methods, favored for their computational efficiency and lower coverage requirements, are prominent. Alternative approaches, relying solely on available reads for de novo genome assembly and employing assembly-based tools for SV detection via comparison to a reference genome, demand significantly more computational resources. However, the lack of comprehensive benchmarking constrains our comprehension and hampers further algorithm development. Here we systematically compare 14 read alignment-based SV calling methods (including 4 deep learning-based methods and 1 hybrid method), and 4 assembly-based SV calling methods, alongside 4 upstream aligners and 7 assemblers. Assembly-based tools excel in detecting large SVs, especially insertions, and exhibit robustness to evaluation parameter changes and coverage fluctuations. Conversely, alignment-based tools demonstrate superior genotyping accuracy at low sequencing coverage (5-10×) and excel in detecting complex SVs, like translocations, inversions, and duplications. Our evaluation provides performance insights, highlighting the absence of a universally superior tool. We furnish guidelines across 31 criteria combinations, aiding users in selecting the most suitable tools for diverse scenarios and offering directions for further method development.
Affiliation(s)
- Yichen Henry Liu
- Department of Computer Science, Vanderbilt University, 37235, Nashville, TN, USA
| | - Can Luo
- Department of Biomedical Engineering, Vanderbilt University, 37235, Nashville, TN, USA
| | - Staunton G Golding
- Department of Biomedical Engineering, Vanderbilt University, 37235, Nashville, TN, USA
| | - Jacob B Ioffe
- Department of Computer Science, Vanderbilt University, 37235, Nashville, TN, USA
| | - Xin Maizie Zhou
- Department of Computer Science, Vanderbilt University, 37235, Nashville, TN, USA.
- Department of Biomedical Engineering, Vanderbilt University, 37235, Nashville, TN, USA.
- Data Science Institute, Vanderbilt University, 37235, Nashville, TN, USA.
|
17
|
Lesack KJ, Wasmuth JD. The impact of FASTQ and alignment read order on structural variant calling from long-read sequencing data. PeerJ 2024; 12:e17101. [PMID: 38500526 PMCID: PMC10946394 DOI: 10.7717/peerj.17101] [Received: 06/28/2023] [Accepted: 02/21/2024] [Indexed: 03/20/2024] Open
Abstract
Background: Structural variant (SV) calling from DNA sequencing data has been challenging due to several factors, including the ambiguity of short-read alignments, multiple complex SVs in the same genomic region, and the lack of "truth" datasets for benchmarking. Additionally, caller choice, parameter settings, and alignment method are known to affect SV calling. However, the impact of FASTQ read order on SV calling has not been explored for long-read data.
Results: Here, we used PacBio DNA sequencing data from 15 Caenorhabditis elegans strains and four Arabidopsis thaliana ecotypes to evaluate the sensitivity of different SV callers to FASTQ read order. Comparisons of variant call format files generated from the original and permutated FASTQ files demonstrated that the order of input data affected the SVs predicted by each caller. In particular, pbsv was highly sensitive to the order of the input data, especially at the highest depths where over 70% of the SV calls generated from pairs of differently ordered FASTQ files were in disagreement. These results demonstrate that read order sensitivity is a complex, multifactorial process, as the differences observed both within and between species varied considerably according to the specific combination of aligner, SV caller, and sequencing depth. In addition to the SV callers being sensitive to the input data order, the SAMtools alignment sorting algorithm was identified as a source of variability following read order randomization.
Conclusion: The results of this study highlight the sensitivity of SV calling to the order of reads encoded in FASTQ files, which has not been recognized in long-read approaches. These findings have implications for the replication of SV studies and the development of consistent SV calling protocols. Our study suggests that researchers should pay attention to the input order sensitivity of read alignment sorting methods when analyzing long-read sequencing data for SV calling, as mitigating a source of variability could facilitate future replication work. These results also raise important questions surrounding the relationship between SV caller read order sensitivity and tool performance. Therefore, tool developers should also consider input order sensitivity as a potential source of variability during the development and benchmarking of new and improved methods for SV calling.
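The read-order experiment described in this abstract can be illustrated with a minimal sketch. It is not the authors' pipeline: the function names, the 4-line unwrapped FASTQ assumption, and the use of a Jaccard index over (chromosome, position, type) tuples as the agreement measure are all assumptions made here for illustration. Alignment and SV calling themselves are outside the sketch.

```python
import random
from typing import List, Set, Tuple

Record = Tuple[str, ...]          # one FASTQ record: header, sequence, '+', qualities
Call = Tuple[str, int, str]       # hypothetical SV key: (chromosome, position, SV type)


def read_fastq(lines: List[str]) -> List[Record]:
    """Group lines into 4-line FASTQ records (assumes unwrapped FASTQ)."""
    if len(lines) % 4 != 0:
        raise ValueError("FASTQ line count is not a multiple of 4")
    return [tuple(lines[i:i + 4]) for i in range(0, len(lines), 4)]


def shuffle_records(records: List[Record], seed: int) -> List[Record]:
    """Return a permuted copy of the records; a fixed seed keeps it reproducible."""
    rng = random.Random(seed)
    shuffled = list(records)
    rng.shuffle(shuffled)
    return shuffled


def call_agreement(calls_a: Set[Call], calls_b: Set[Call]) -> float:
    """Jaccard index of two SV call sets (1.0 means identical call sets)."""
    union = calls_a | calls_b
    if not union:
        return 1.0
    return len(calls_a & calls_b) / len(union)
```

In use, each permuted FASTQ would be aligned and passed to the same SV caller, and the resulting call sets compared with `call_agreement`; a value below 1.0 for inputs that differ only in read order indicates order sensitivity.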
Affiliation(s)
- Kyle J. Lesack
- Faculty of Veterinary Medicine, University of Calgary, Calgary, Alberta, Canada
- Host-Parasite Interactions Research Training Network, University of Calgary, Calgary, Alberta, Canada
| | - James D. Wasmuth
- Faculty of Veterinary Medicine, University of Calgary, Calgary, Alberta, Canada
- Host-Parasite Interactions Research Training Network, University of Calgary, Calgary, Alberta, Canada
|
18
|
Gustafson JA, Gibson SB, Damaraju N, Zalusky MPG, Hoekzema K, Twesigomwe D, Yang L, Snead AA, Richmond PA, De Coster W, Olson ND, Guarracino A, Li Q, Miller AL, Goffena J, Anderson Z, Storz SHR, Ward SA, Sinha M, Gonzaga-Jauregui C, Clarke WE, Basile AO, Corvelo A, Reeves C, Helland A, Musunuri RL, Revsine M, Patterson KE, Paschal CR, Zakarian C, Goodwin S, Jensen TD, Robb E, McCombie WR, Sedlazeck FJ, Zook JM, Montgomery SB, Garrison E, Kolmogorov M, Schatz MC, McLaughlin RN, Dashnow H, Zody MC, Loose M, Jain M, Eichler EE, Miller DE. Nanopore sequencing of 1000 Genomes Project samples to build a comprehensive catalog of human genetic variation. MEDRXIV : THE PREPRINT SERVER FOR HEALTH SCIENCES 2024:2024.03.05.24303792. [PMID: 38496498 PMCID: PMC10942501 DOI: 10.1101/2024.03.05.24303792] [Indexed: 03/19/2024]
Abstract
Less than half of individuals with a suspected Mendelian condition receive a precise molecular diagnosis after comprehensive clinical genetic testing. Improvements in data quality and costs have heightened interest in using long-read sequencing (LRS) to streamline clinical genomic testing, but the absence of control datasets for variant filtering and prioritization has made tertiary analysis of LRS data challenging. To address this, the 1000 Genomes Project ONT Sequencing Consortium aims to generate LRS data from at least 800 of the 1000 Genomes Project samples. Our goal is to use LRS to identify a broader spectrum of variation so we may improve our understanding of normal patterns of human variation. Here, we present data from analysis of the first 100 samples, representing all 5 superpopulations and 19 subpopulations. These samples, sequenced to an average depth of coverage of 37x and sequence read N50 of 54 kbp, have high concordance with previous studies for identifying single nucleotide and indel variants outside of homopolymer regions. Using multiple structural variant (SV) callers, we identify an average of 24,543 high-confidence SVs per genome, including shared and private SVs likely to disrupt gene function as well as pathogenic expansions within disease-associated repeats that were not detected using short reads. Evaluation of methylation signatures revealed expected patterns at known imprinted loci, samples with skewed X-inactivation patterns, and novel differentially methylated regions. All raw sequencing data, processed data, and summary statistics are publicly available, providing a valuable resource for the clinical genetics community to discover pathogenic SVs.
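The abstract above reports a sequence read N50 of 54 kbp. As a reminder of what that statistic means, here is a minimal sketch using the standard definition: the largest length L such that reads of length L or longer contain at least half of all sequenced bases. The function name and the toy read lengths are illustrative assumptions, not values from the study.

```python
from typing import Iterable


def read_n50(lengths: Iterable[int]) -> int:
    """Return the N50 of a collection of read lengths.

    Reads are visited from longest to shortest; the N50 is the length of the
    read at which the running total first reaches half of all bases.
    """
    ordered = sorted(lengths, reverse=True)
    if not ordered:
        raise ValueError("at least one read length is required")
    total = sum(ordered)
    running = 0
    for length in ordered:
        running += length
        if 2 * running >= total:   # integer comparison avoids float rounding
            return length
    raise AssertionError("unreachable for non-empty input")


# Toy example (hypothetical read lengths in bp): total = 20, so the running
# sum reaches half of the bases at the second-longest read.
print(read_n50([2, 3, 4, 5, 6]))  # 5
```

Because N50 is weighted by bases rather than by read count, a small number of very long reads can raise it well above the median read length.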
Affiliation(s)
- Jonas A. Gustafson
- Division of Genetic Medicine, Department of Pediatrics, University of Washington, Seattle, WA, USA
- Molecular and Cellular Biology Program, University of Washington, Seattle, WA, USA
| | - Sophia B. Gibson
- Division of Genetic Medicine, Department of Pediatrics, University of Washington, Seattle, WA, USA
- Department of Genome Sciences, University of Washington, Seattle, WA, USA
| | - Nikhita Damaraju
- Division of Genetic Medicine, Department of Pediatrics, University of Washington, Seattle, WA, USA
- Institute for Public Health Genetics, University of Washington, Seattle, WA, USA
| | - Miranda PG Zalusky
- Division of Genetic Medicine, Department of Pediatrics, University of Washington, Seattle, WA, USA
| | - Kendra Hoekzema
- Department of Genome Sciences, University of Washington, Seattle, WA, USA
| | - David Twesigomwe
- Sydney Brenner Institute for Molecular Bioscience, Faculty of Health Sciences, University of the Witwatersrand, Johannesburg, South Africa
| | - Lei Yang
- Pacific Northwest Research Institute, Seattle, WA, USA
| | | | | | - Wouter De Coster
- Applied and Translational Neurogenomics Group, VIB Center for Molecular Neurology, VIB, Antwerp, Belgium
- Department of Biomedical Sciences, University of Antwerp, Antwerp, Belgium
| | - Nathan D. Olson
- Material Measurement Laboratory, National Institute of Standards and Technology, Gaithersburg, MD, USA
| | - Andrea Guarracino
- Department of Genetics, Genomics and Informatics, University of Tennessee Health Science Center, Memphis, TN, USA
- Human Technopole, Milan, Italy
| | - Qiuhui Li
- Department of Computer Science, Johns Hopkins University, Baltimore, MD, USA
| | - Angela L. Miller
- Division of Genetic Medicine, Department of Pediatrics, University of Washington, Seattle, WA, USA
| | - Joy Goffena
- Division of Genetic Medicine, Department of Pediatrics, University of Washington, Seattle, WA, USA
| | - Zachery Anderson
- Division of Genetic Medicine, Department of Pediatrics, University of Washington, Seattle, WA, USA
| | - Sophie HR Storz
- Division of Genetic Medicine, Department of Pediatrics, University of Washington, Seattle, WA, USA
| | - Sydney A. Ward
- Division of Genetic Medicine, Department of Pediatrics, University of Washington, Seattle, WA, USA
| | - Maisha Sinha
- Division of Genetic Medicine, Department of Pediatrics, University of Washington, Seattle, WA, USA
| | - Claudia Gonzaga-Jauregui
- International Laboratory for Human Genome Research, Laboratorio Internacional de Investigación sobre el Genoma Humano, Universidad Nacional Autónoma de México
| | - Wayne E. Clarke
- New York Genome Center, New York, NY, USA
- Outlier Informatics Inc., Saskatoon, SK, Canada
| | | | | | | | | | | | - Mahler Revsine
- Department of Computer Science, Johns Hopkins University, Baltimore, MD, USA
| | | | - Cate R. Paschal
- Department of Laboratories, Seattle Children’s Hospital, Seattle, WA, USA
- Department of Laboratory Medicine and Pathology, University of Washington, Seattle, WA, USA
| | - Christina Zakarian
- Department of Genome Sciences, University of Washington, Seattle, WA, USA
| | - Sara Goodwin
- Cold Spring Harbor Laboratory, Cold Spring Harbor, NY, USA
| | | | - Esther Robb
- Department of Computer Science, Stanford University, Stanford, CA, USA
| | | | | | | | | | - Fritz J. Sedlazeck
- Human Genome Sequencing Center Baylor College of Medicine, Houston, TX, USA
- Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX, USA
- Department of Computer Science, Rice University, Houston, TX, USA
| | - Justin M. Zook
- Material Measurement Laboratory, National Institute of Standards and Technology, Gaithersburg, MD, USA
| | | | - Erik Garrison
- Department of Genetics, Genomics and Informatics, University of Tennessee Health Science Center, Memphis, TN, USA
- Mikhail Kolmogorov
- Cancer Data Science Laboratory, National Cancer Institute, NIH, Bethesda, MD, USA
- Michael C. Schatz
- Department of Computer Science, Johns Hopkins University, Baltimore, MD, USA
- Richard N. McLaughlin
- Molecular and Cellular Biology Program, University of Washington, Seattle, WA, USA
- Pacific Northwest Research Institute, Seattle, WA, USA
- Harriet Dashnow
- Department of Human Genetics, University of Utah, Salt Lake City, UT, USA
- Department of Biomedical Informatics, University of Colorado School of Medicine, Aurora, CO, USA
- Matt Loose
- Deep Seq, School of Life Sciences, University of Nottingham, Nottingham, England
- Miten Jain
- Department of Bioengineering, Department of Physics, Khoury College of Computer Sciences, Northeastern University, Boston, MA
- Evan E. Eichler
- Department of Genome Sciences, University of Washington, Seattle, WA, USA
- Brotman Baty Institute for Precision Medicine, University of Washington, Seattle, WA, USA
- Howard Hughes Medical Institute, University of Washington, Seattle, WA, USA
- Danny E. Miller
- Division of Genetic Medicine, Department of Pediatrics, University of Washington, Seattle, WA, USA
- Department of Laboratory Medicine and Pathology, University of Washington, Seattle, WA, USA
- Brotman Baty Institute for Precision Medicine, University of Washington, Seattle, WA, USA
19
Carpinteyro-Ponce J, Machado CA. The Complex Landscape of Structural Divergence Between the Drosophila pseudoobscura and D. persimilis Genomes. Genome Biol Evol 2024; 16:evae047. [PMID: 38482945 PMCID: PMC10980976 DOI: 10.1093/gbe/evae047] [Accepted: 03/07/2024] [Indexed: 04/01/2024]
Abstract
Structural genomic variants are key drivers of phenotypic evolution. They can span hundreds to millions of base pairs and can thus affect large numbers of genetic elements. Although structural variation is quite common within and between species, its characterization depends upon the quality of genome assemblies and the proportion of repetitive elements. Using new high-quality genome assemblies, we report a complex and previously hidden landscape of structural divergence between the genomes of Drosophila persimilis and D. pseudoobscura, two classic species in speciation research, and study the relationships among structural variants, transposable elements, and gene expression divergence. The new assemblies confirm the already known fixed inversion differences between these species. Consistent with previous studies showing higher levels of nucleotide divergence between fixed inversions relative to collinear regions of the genome, we also find a significant overrepresentation of INDELs inside the inversions. We find that transposable elements accumulate in regions with low levels of recombination, and spatial correlation analyses reveal a strong association between transposable elements and structural variants. We also report a strong association between differentially expressed (DE) genes and structural variants and an overrepresentation of DE genes inside the fixed chromosomal inversions that separate this species pair. Interestingly, species-specific structural variants are overrepresented in DE genes involved in neural development, spermatogenesis, and oocyte-to-embryo transition. Overall, our results highlight the association of transposable elements with structural variants and their importance in driving evolutionary divergence.
Affiliation(s)
- Carlos A Machado
- Department of Biology, University of Maryland, College Park, MD, USA
20
Lecomte L, Árnyasi M, Ferchaud A, Kent M, Lien S, Stenløkk K, Sylvestre F, Bernatchez L, Mérot C. Investigating structural variant, indel and single nucleotide polymorphism differentiation between locally adapted Atlantic salmon populations. Evol Appl 2024; 17:e13653. [PMID: 38495945 PMCID: PMC10940791 DOI: 10.1111/eva.13653] [Received: 08/23/2023] [Revised: 12/14/2023] [Accepted: 01/13/2024] [Indexed: 03/19/2024]
Abstract
Genomic structural variants (SVs) are now recognized as an integral component of intraspecific polymorphism and are known to contribute to evolutionary processes in various organisms. However, they are inherently difficult to detect and genotype from readily available short-read sequencing data, and therefore remain poorly documented in wild populations. Salmonid species displaying strong interpopulation variability in both life history traits and habitat characteristics, such as Atlantic salmon (Salmo salar), offer a prime context for studying adaptive polymorphism, but the contribution of SVs to fine-scale local adaptation has yet to be explored. Here, we performed a comparative analysis of SVs, single nucleotide polymorphisms (SNPs) and small indels (<50 bp) segregating in the Romaine and Puyjalon salmon, two putatively locally adapted populations inhabiting neighboring rivers (Québec, Canada) and showing pronounced variation in life history traits, namely growth, fecundity, and age at maturity and smoltification. We first catalogued polymorphism using a hybrid SV characterization approach pairing both short- (16X) and long-read sequencing (20X) for variant discovery with graph-based genotyping of SVs across 60 salmon genomes, along with characterization of SNPs and small indels from short reads. We thus identified 115,907 SVs, 8,777,832 SNPs and 1,089,321 short indels, with SVs covering 4.8 times more base pairs than SNPs. All three variant types revealed a highly congruent population structure and similar patterns of FST and density variation along the genome. Finally, we performed outlier detection and redundancy analysis (RDA) to identify variants of interest in the putative local adaptation of Romaine and Puyjalon salmon. Genes located near these variants were enriched for biological processes related to nervous system function, suggesting that observed variation in traits such as age at smoltification could arise from differences in neural development. This study therefore demonstrates the feasibility of large-scale SV characterization and highlights its relevance for salmonid population genomics.
Affiliation(s)
- Laurie Lecomte
- Institut de Biologie Intégrative et des Systèmes (IBIS), Université Laval, Québec, Canada
- Département de Biologie, Université Laval, Québec, Canada
- Mariann Árnyasi
- Department of Animal and Aquacultural Sciences (IHA), Faculty of Life Sciences (BIOVIT), Centre for Integrative Genetics (CIGENE), Norwegian University of Life Sciences (NMBU), Ås, Norway
- Anne‐Laure Ferchaud
- Institut de Biologie Intégrative et des Systèmes (IBIS), Université Laval, Québec, Canada
- Département de Biologie, Université Laval, Québec, Canada
- Present address: Parks Canada, Office of the Chief Ecosystem Scientist, Québec, QC, Canada
- Matthew Kent
- Department of Animal and Aquacultural Sciences (IHA), Faculty of Life Sciences (BIOVIT), Centre for Integrative Genetics (CIGENE), Norwegian University of Life Sciences (NMBU), Ås, Norway
- Sigbjørn Lien
- Department of Animal and Aquacultural Sciences (IHA), Faculty of Life Sciences (BIOVIT), Centre for Integrative Genetics (CIGENE), Norwegian University of Life Sciences (NMBU), Ås, Norway
- Kristina Stenløkk
- Department of Animal and Aquacultural Sciences (IHA), Faculty of Life Sciences (BIOVIT), Centre for Integrative Genetics (CIGENE), Norwegian University of Life Sciences (NMBU), Ås, Norway
- Florent Sylvestre
- Institut de Biologie Intégrative et des Systèmes (IBIS), Université Laval, Québec, Canada
- Département de Biologie, Université Laval, Québec, Canada
- Louis Bernatchez
- Institut de Biologie Intégrative et des Systèmes (IBIS), Université Laval, Québec, Canada
- Département de Biologie, Université Laval, Québec, Canada
- Claire Mérot
- Institut de Biologie Intégrative et des Systèmes (IBIS), Université Laval, Québec, Canada
- Département de Biologie, Université Laval, Québec, Canada
- Present address: UMR 6553 Ecobio, OSUR, CNRS, Université de Rennes, Rennes, France
21
Cui X, Lin Q, Chen M, Wang Y, Wang Y, Wang Y, Tao J, Yin H, Zhao T. Long-read sequencing unveils novel somatic variants and methylation patterns in the genetic information system of early lung cancer. Comput Biol Med 2024; 171:108174. [PMID: 38442557 DOI: 10.1016/j.compbiomed.2024.108174] [Received: 01/07/2024] [Revised: 01/25/2024] [Accepted: 02/18/2024] [Indexed: 03/07/2024]
Abstract
Lung cancer poses a global health challenge, necessitating advanced diagnostics for improved outcomes. Intensive efforts are ongoing to pinpoint early detection biomarkers, such as genomic variations and DNA methylation, to elevate diagnostic precision. We conducted long-read sequencing on cancerous and adjacent non-cancerous tissues from a patient with lung adenocarcinoma. We identified somatic structural variations (SVs) specific to lung cancer by integrating data from various SV calling methods and differentially methylated regions (DMRs) that were distinct between these two tissue samples, revealing a unique methylation pattern associated with lung cancer. This study discovered over 40,000 somatic SVs and over 180,000 DMRs linked to lung cancer. We identified approximately 700 genes of significant relevance through comprehensive analysis, including genes intricately associated with many lung cancers, such as NOTCH1, SMOC2, CSMD2, and others. Furthermore, we observed that somatic SVs and DMRs were substantially enriched in several pathways, such as axon guidance signaling pathways, which suggests a comprehensive multi-omics impact on lung cancer progression across various biological investigation levels. These datasets can potentially serve as biomarkers for early lung cancer detection and may hold significant value in clinical diagnosis and treatment applications.
Affiliation(s)
- Xinran Cui
- School of Computer Science and Technology, Harbin Institute of Technology, 92 West Da Zhi St, Harbin, Heilongjiang, 150000, China
- Qingyan Lin
- Department of Respiratory and Critical Care, Heilongjiang Provincial Hospital, 405 Gorokhovaya Street, Harbin, Heilongjiang, 150000, China
- Ming Chen
- Institute of Bioinformatics, Harbin Institute of Technology, 92 West Da Zhi St, Harbin, Heilongjiang, 150000, China
- Yidan Wang
- Department of Respiratory and Critical Care, Heilongjiang Provincial Hospital, 405 Gorokhovaya Street, Harbin, Heilongjiang, 150000, China
- Yiwen Wang
- Tanwei College, Tsinghua University, Shuangqing Road, Beijing, 100084, China
- Yadong Wang
- School of Computer Science and Technology, Harbin Institute of Technology, 92 West Da Zhi St, Harbin, Heilongjiang, 150000, China.
- Jiang Tao
- School of Computer Science and Technology, Harbin Institute of Technology, 92 West Da Zhi St, Harbin, Heilongjiang, 150000, China.
- Honglei Yin
- Department of Respiratory and Critical Care, Heilongjiang Provincial Hospital, 405 Gorokhovaya Street, Harbin, Heilongjiang, 150000, China.
- Tianyi Zhao
- School of Medicine, Harbin Institute of Technology, 92 West Da Zhi St, Harbin, Heilongjiang, 150000, China.
22
Song B, Buckler ES, Stitzer MC. New whole-genome alignment tools are needed for tapping into plant diversity. TRENDS IN PLANT SCIENCE 2024; 29:355-369. [PMID: 37749022 DOI: 10.1016/j.tplants.2023.08.013] [Received: 04/24/2023] [Revised: 07/19/2023] [Accepted: 08/23/2023] [Indexed: 09/27/2023]
Abstract
Genome alignment is one of the most foundational methods for genome sequence studies. With rapid advances in sequencing and assembly technologies, these newly assembled genomes present challenges for alignment tools to meet the increased complexity and scale. Plant genome alignment is technologically challenging because of frequent whole-genome duplications (WGDs) as well as chromosome rearrangements and fractionation, high nucleotide diversity, widespread structural variation, and high transposable element (TE) activity causing large proportions of repeat elements. We summarize classical pairwise and multiple genome alignment (MGA) methods, and highlight techniques that are widely used or are being developed by the plant research community. We also outline the remaining challenges for precise genome alignment and the interpretation of alignment results in plants.
Affiliation(s)
- Baoxing Song
- National Key Laboratory of Wheat Improvement, Peking University Institute of Advanced Agricultural Sciences, Shandong Laboratory of Advanced Agriculture Sciences in Weifang, Weifang, Shandong 261325, China; Key Laboratory of Maize Biology and Genetic Breeding in Arid Area of Northwest Region of the Ministry of Agriculture, College of Agronomy, Northwest A&F University, Yangling, Shaanxi 712100, China.
- Edward S Buckler
- Institute for Genomic Diversity, Cornell University, Ithaca, NY 14853, USA; Section of Plant Breeding and Genetics, Cornell University, Ithaca, NY 14853, USA; Agricultural Research Service, United States Department of Agriculture, Ithaca, NY 14853, USA
- Michelle C Stitzer
- Institute for Genomic Diversity, Cornell University, Ithaca, NY 14853, USA; Department of Molecular Biology and Genetics, Cornell University, Ithaca, NY 14853, USA.
23
Liu X, Zheng J, Ding J, Wu J, Zuo F, Zhang G. When Livestock Genomes Meet Third-Generation Sequencing Technology: From Opportunities to Applications. Genes (Basel) 2024; 15:245. [PMID: 38397234 PMCID: PMC10888458 DOI: 10.3390/genes15020245] [Received: 12/23/2023] [Revised: 01/30/2024] [Accepted: 02/10/2024] [Indexed: 02/25/2024]
Abstract
Third-generation sequencing technology has found widespread application in the genomic, transcriptomic, and epigenetic research of both human and livestock genetics. This technology offers significant advantages in the sequencing of complex genomic regions, the identification of intricate structural variations, and the production of high-quality genomes. Its attributes, including long sequencing reads, obviation of PCR amplification, and direct determination of DNA/RNA, contribute to its efficacy. This review presents a comprehensive overview of third-generation sequencing technologies, exemplified by single-molecule real-time sequencing (SMRT) and Oxford Nanopore Technology (ONT). Emphasizing the research advancements in livestock genomics, the review delves into genome assembly, structural variation detection, transcriptome sequencing, and epigenetic investigations enabled by third-generation sequencing. A comprehensive analysis is conducted on the application and potential challenges of third-generation sequencing technology for genome detection in livestock. Beyond providing valuable insights into genome structure analysis and the identification of rare genes in livestock, the review ventures into an exploration of the genetic mechanisms underpinning exemplary traits. This review not only contributes to our understanding of the genomic landscape in livestock but also provides fresh perspectives for the advancement of research in this domain.
Affiliation(s)
- Xinyue Liu
- College of Animal Science and Technology, Southwest University, Rongchang, Chongqing 402460, China; (X.L.); (J.Z.); (J.D.); (J.W.); (F.Z.)
- Junyuan Zheng
- College of Animal Science and Technology, Southwest University, Rongchang, Chongqing 402460, China; (X.L.); (J.Z.); (J.D.); (J.W.); (F.Z.)
- Jialan Ding
- College of Animal Science and Technology, Southwest University, Rongchang, Chongqing 402460, China; (X.L.); (J.Z.); (J.D.); (J.W.); (F.Z.)
- Jiaxin Wu
- College of Animal Science and Technology, Southwest University, Rongchang, Chongqing 402460, China; (X.L.); (J.Z.); (J.D.); (J.W.); (F.Z.)
- Fuyuan Zuo
- College of Animal Science and Technology, Southwest University, Rongchang, Chongqing 402460, China; (X.L.); (J.Z.); (J.D.); (J.W.); (F.Z.)
- Beef Cattle Engineering and Technology Research Center of Chongqing, Southwest University, Rongchang, Chongqing 402460, China
- Gongwei Zhang
- College of Animal Science and Technology, Southwest University, Rongchang, Chongqing 402460, China; (X.L.); (J.Z.); (J.D.); (J.W.); (F.Z.)
- Beef Cattle Engineering and Technology Research Center of Chongqing, Southwest University, Rongchang, Chongqing 402460, China
24
Audano PA, Beck CR. Small polymorphisms are a source of ancestral bias in structural variant breakpoint placement. Genome Res 2024; 34:7-19. [PMID: 38176712 PMCID: PMC10904011 DOI: 10.1101/gr.278203.123] [Received: 06/20/2023] [Accepted: 01/02/2024] [Indexed: 01/06/2024]
Abstract
High-quality genome assemblies and sophisticated algorithms have increased sensitivity for a wide range of variant types, and breakpoint accuracy for structural variants (SVs, ≥50 bp) has improved to near base pair precision. Despite these advances, many SV breakpoint locations are subject to systematic bias affecting variant representation. To understand why SV breakpoints are inconsistent across samples, we reanalyzed 64 phased haplotypes constructed from long-read assemblies released by the Human Genome Structural Variation Consortium (HGSVC). We identify 882 SV insertions and 180 SV deletions with variable breakpoints not anchored in tandem repeats (TRs) or segmental duplications (SDs). SVs called from aligned sequencing reads increase breakpoint disagreements by 2×-16×. Sequence accuracy had a minimal impact on breakpoints, but we observe a strong effect of ancestry. We confirm that SNP and indel polymorphisms are enriched at shifted breakpoints and are also absent from variant callsets. Breakpoint homology increases the likelihood of imprecise SV calls and the distance they are shifted, and tandem duplications are the most heavily affected SVs. Because graph genome methods normalize SV calls across samples, we investigated graphs generated by two different methods and find the resulting breakpoints are subject to other technical biases affecting breakpoint accuracy. The breakpoint inconsistencies we characterize affect ∼5% of the SVs called in a human genome and can impact variant interpretation and annotation. These limitations underscore a need for algorithm development to improve SV databases, mitigate the impact of ancestry on breakpoints, and increase the value of callsets for investigating breakpoint features.
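The effect of breakpoint homology described in this abstract can be illustrated with a minimal Python sketch. It is not the authors' method: the function name `shift_range` and the toy reference are hypothetical. It only shows why a deletion flanked by matching sequence has several equally valid breakpoint placements.

```python
def shift_range(ref: str, start: int, length: int) -> tuple[int, int]:
    """Return the leftmost and rightmost 0-based start positions at which a
    deletion of `length` bases yields the same resulting haplotype."""
    left = start
    # Moving left by one base is neutral when the base that enters the
    # deleted span on the left equals the base that leaves it on the right.
    while left > 0 and ref[left - 1] == ref[left + length - 1]:
        left -= 1
    right = start
    # Moving right by one base is neutral under the mirrored condition.
    while right + length < len(ref) and ref[right] == ref[right + length]:
        right += 1
    return left, right


# Toy example (hypothetical sequence): deleting "CA" inside a CA repeat.
ref = "GGCACACATT"
print(shift_range(ref, 2, 2))  # (2, 6)
```

Any start position between the two returned values describes the same allele, so a nearby SNP or indel that nudges the alignment can move the reported breakpoint within that window.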
Affiliation(s)
- Peter A Audano
- The Jackson Laboratory for Genomic Medicine, Farmington, Connecticut 06032, USA
- Christine R Beck
- The Jackson Laboratory for Genomic Medicine, Farmington, Connecticut 06032, USA;
- Department of Genetics and Genome Sciences, Institute for Systems Genomics, University of Connecticut Health Center, Farmington, Connecticut 06030, USA
25
Zheng Z, Zhu M, Zhang J, Liu X, Hou L, Liu W, Yuan S, Luo C, Yao X, Liu J, Yang Y. A sequence-aware merger of genomic structural variations at population scale. Nat Commun 2024; 15:960. [PMID: 38307885 PMCID: PMC10837428 DOI: 10.1038/s41467-024-45244-9] [Received: 09/19/2023] [Accepted: 01/18/2024] [Indexed: 02/04/2024]
Abstract
Merging structural variations (SVs) at the population level presents a significant challenge, yet it is essential for conducting comprehensive genotypic analyses, especially in the era of pangenomics. Here, we introduce PanPop, a tool that utilizes an advanced sequence-aware SV merging algorithm to efficiently merge SVs of various types. We demonstrate that PanPop can merge and optimize the majority of multiallelic SVs into informative biallelic variants. We show its superior precision and lower rates of missing data compared to alternative software solutions. Our approach not only enables the filtering of SVs by leveraging multiple SV callers for enhanced accuracy but also facilitates the accurate merging of large-scale population SVs. These capabilities of PanPop will help to accelerate future SV-related studies.
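The abstract does not spell out PanPop's algorithm, so the following is only a generic sketch of what "sequence-aware" merging can mean: insertion calls are grouped when they are both close in position and similar in inserted sequence. The function name `merge_insertions`, the thresholds `max_dist` and `min_sim`, and the greedy strategy are all assumptions made for illustration.

```python
from difflib import SequenceMatcher


def merge_insertions(calls, max_dist=50, min_sim=0.9):
    """Greedily cluster insertion calls given as (position, sequence) pairs.

    A call joins the first cluster whose representative (its first member)
    lies within `max_dist` bases and whose sequence similarity ratio is at
    least `min_sim`; otherwise it starts a new cluster.
    """
    clusters = []
    for pos, seq in sorted(calls):
        for cluster in clusters:
            rep_pos, rep_seq = cluster[0]
            close = abs(pos - rep_pos) <= max_dist
            ratio = SequenceMatcher(None, rep_seq, seq, autojunk=False).ratio()
            if close and ratio >= min_sim:
                cluster.append((pos, seq))
                break
        else:
            # No existing cluster accepted this call.
            clusters.append([(pos, seq)])
    return clusters


calls = [(100, "ACGTACGTAC"), (104, "ACGTACGTAC"), (102, "TTTTTGGGGG")]
for cluster in merge_insertions(calls):
    print(cluster)
```

A purely position-based merger would collapse all three calls into one record; comparing sequences keeps the unrelated insertion at position 102 separate.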
Affiliation(s)
- Zeyu Zheng
- State Key Laboratory of Herbage Improvement and Grassland Agro-ecosystems, College of Ecology, Lanzhou University, Lanzhou, China
- Mingjia Zhu
- State Key Laboratory of Herbage Improvement and Grassland Agro-ecosystems, College of Ecology, Lanzhou University, Lanzhou, China
- Jin Zhang
- State Key Laboratory of Herbage Improvement and Grassland Agro-ecosystems, College of Ecology, Lanzhou University, Lanzhou, China
- Xinfeng Liu
- State Key Laboratory of Herbage Improvement and Grassland Agro-ecosystems, College of Ecology, Lanzhou University, Lanzhou, China
- Liqiang Hou
- State Key Laboratory of Herbage Improvement and Grassland Agro-ecosystems, College of Ecology, Lanzhou University, Lanzhou, China
- Wenyu Liu
- State Key Laboratory of Herbage Improvement and Grassland Agro-ecosystems, College of Ecology, Lanzhou University, Lanzhou, China
- Shuai Yuan
- State Key Laboratory of Herbage Improvement and Grassland Agro-ecosystems, College of Ecology, Lanzhou University, Lanzhou, China
- Changhong Luo
- State Key Laboratory of Herbage Improvement and Grassland Agro-ecosystems, College of Ecology, Lanzhou University, Lanzhou, China
- Xinhao Yao
- State Key Laboratory of Herbage Improvement and Grassland Agro-ecosystems, College of Ecology, Lanzhou University, Lanzhou, China
- Jianquan Liu
- State Key Laboratory of Herbage Improvement and Grassland Agro-ecosystems, College of Ecology, Lanzhou University, Lanzhou, China.
- Yongzhi Yang
- State Key Laboratory of Herbage Improvement and Grassland Agro-ecosystems, College of Ecology, Lanzhou University, Lanzhou, China.
26
Charron P, Kang M. VariantDetective: an accurate all-in-one pipeline for detecting consensus bacterial SNPs and SVs. Bioinformatics 2024; 40:btae066. [PMID: 38366603 PMCID: PMC10898327 DOI: 10.1093/bioinformatics/btae066] [Received: 07/10/2023] [Revised: 01/16/2024] [Accepted: 02/14/2024] [Indexed: 02/18/2024]
Abstract
MOTIVATION: Genomic variations comprise a spectrum of alterations, ranging from single nucleotide polymorphisms (SNPs) to large-scale structural variants (SVs), which play crucial roles in bacterial evolution and species diversification. Accurately identifying SNPs and SVs is beneficial for subsequent evolutionary and epidemiological studies. This study presents VariantDetective (VD), a novel, user-friendly, and all-in-one pipeline combining SNP and SV calling to generate consensus genomic variants using multiple tools. RESULTS: The VD pipeline accepts various file types as input to initiate SNP and/or SV calling, and benchmarking results demonstrate VD's robustness and high accuracy across multiple tested datasets when compared to existing variant calling approaches. AVAILABILITY AND IMPLEMENTATION: The source code, test data, and relevant information for VD are freely accessible at https://github.com/OLF-Bioinformatics/VariantDetective under the MIT License.
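The idea of a consensus call set built from several tools can be sketched in a few lines of Python. This is not VariantDetective's code: the function name `consensus`, the `min_support` threshold, and the representation of a variant as a `(contig, position, ref, alt)` tuple are assumptions made for illustration.

```python
from collections import Counter


def consensus(callsets, min_support=2):
    """Keep variants reported by at least `min_support` of the callers.

    Each call set is an iterable of hashable variant keys; duplicates
    within one caller's output count only once.
    """
    counts = Counter(v for calls in callsets for v in set(calls))
    return sorted(v for v, n in counts.items() if n >= min_support)


# Hypothetical output of three callers run on the same sample.
caller_a = [("chr1", 100, "A", "G"), ("chr1", 250, "C", "T")]
caller_b = [("chr1", 100, "A", "G"), ("chr1", 900, "G", "A")]
caller_c = [("chr1", 100, "A", "G"), ("chr1", 250, "C", "T")]

print(consensus([caller_a, caller_b, caller_c]))
```

Exact-key matching suits SNPs; for SVs a real pipeline would need a tolerance on breakpoint position and size, which this sketch leaves out.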
Affiliation(s)
- Philippe Charron
- Ottawa Laboratory-Fallowfield, Canadian Food Inspection Agency, 3851 Fallowfield Road, Nepean, Ontario K2J 4S1, Canada
- Mingsong Kang
- Ottawa Laboratory-Fallowfield, Canadian Food Inspection Agency, 3851 Fallowfield Road, Nepean, Ontario K2J 4S1, Canada
27
Zhang Z, Jiang T, Li G, Cao S, Liu Y, Liu B, Wang Y. Kled: an ultra-fast and sensitive structural variant detection tool for long-read sequencing data. Brief Bioinform 2024; 25:bbae049. [PMID: 38385878 PMCID: PMC10883419 DOI: 10.1093/bib/bbae049] [Received: 09/15/2023] [Revised: 01/12/2024] [Accepted: 01/26/2024] [Indexed: 02/23/2024]
Abstract
Structural Variants (SVs) are a crucial type of genetic variant that can significantly impact phenotypes. Therefore, the identification of SVs is an essential part of modern genomic analysis. In this article, we present kled, an ultra-fast and sensitive SV caller for long-read sequencing data given the specially designed approach with a novel signature-merging algorithm, custom refinement strategies and a high-performance program structure. The evaluation results demonstrate that kled can achieve optimal SV calling compared to several state-of-the-art methods on simulated and real long-read data for different platforms and sequencing depths. Furthermore, kled excels at rapid SV calling and can efficiently utilize multiple Central Processing Unit (CPU) cores while maintaining low memory usage. The source code for kled can be obtained from https://github.com/CoREse/kled.
Affiliation(s)
- Zhendong Zhang
- Center for Bioinformatics, Faculty of Computing, Harbin Institute of Technology, Harbin, Heilongjiang 150001, China
- Key Laboratory of Biological Bigdata, Ministry of Education, Harbin Institute of Technology, Harbin, Heilongjiang 150001, China
- Tao Jiang
- Center for Bioinformatics, Faculty of Computing, Harbin Institute of Technology, Harbin, Heilongjiang 150001, China
- Zhengzhou Research Institute, Harbin Institute of Technology, Zhengzhou, Henan, 450000, China
- Key Laboratory of Biological Bigdata, Ministry of Education, Harbin Institute of Technology, Harbin, Heilongjiang 150001, China
- Gaoyang Li
- Center for Bioinformatics, Faculty of Computing, Harbin Institute of Technology, Harbin, Heilongjiang 150001, China
- Key Laboratory of Biological Bigdata, Ministry of Education, Harbin Institute of Technology, Harbin, Heilongjiang 150001, China
- Shuqi Cao
- Center for Bioinformatics, Faculty of Computing, Harbin Institute of Technology, Harbin, Heilongjiang 150001, China
- Key Laboratory of Biological Bigdata, Ministry of Education, Harbin Institute of Technology, Harbin, Heilongjiang 150001, China
- Yadong Liu
- Center for Bioinformatics, Faculty of Computing, Harbin Institute of Technology, Harbin, Heilongjiang 150001, China
- Zhengzhou Research Institute, Harbin Institute of Technology, Zhengzhou, Henan, 450000, China
- Key Laboratory of Biological Bigdata, Ministry of Education, Harbin Institute of Technology, Harbin, Heilongjiang 150001, China
- Bo Liu
- Center for Bioinformatics, Faculty of Computing, Harbin Institute of Technology, Harbin, Heilongjiang 150001, China
- Zhengzhou Research Institute, Harbin Institute of Technology, Zhengzhou, Henan, 450000, China
- Key Laboratory of Biological Bigdata, Ministry of Education, Harbin Institute of Technology, Harbin, Heilongjiang 150001, China
- Yadong Wang
- Center for Bioinformatics, Faculty of Computing, Harbin Institute of Technology, Harbin, Heilongjiang 150001, China
- Zhengzhou Research Institute, Harbin Institute of Technology, Zhengzhou, Henan, 450000, China
- Key Laboratory of Biological Bigdata, Ministry of Education, Harbin Institute of Technology, Harbin, Heilongjiang 150001, China
28
Zheng Y, Shang X. SVvalidation: A long-read-based validation method for genomic structural variation. PLoS One 2024; 19:e0291741. [PMID: 38181020 PMCID: PMC10769053 DOI: 10.1371/journal.pone.0291741] [Received: 07/17/2023] [Accepted: 09/05/2023] [Indexed: 01/07/2024]
Abstract
Although various methods have been developed to detect structural variations (SVs) in genomic sequences, few are used to validate these results. Several commonly used SV callers produce many false positive SVs, and existing validation methods are not accurate enough. Therefore, a highly efficient and accurate validation method is essential. In response, we propose SVvalidation, a new method that uses long-read sequencing data for validating SVs with higher accuracy and efficiency. Compared to existing methods, SVvalidation performs better in validating SVs in repeat regions and can determine the homozygosity or heterozygosity of an SV. Additionally, SVvalidation offers the highest recall, precision, and F1-score (improving by 7-16%) across all datasets. Moreover, SVvalidation is suitable for different types of SVs. The program is available at https://github.com/nwpuzhengyan/SVvalidation.
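For readers unfamiliar with the three metrics quoted in this abstract, the standard definitions can be written as a short Python sketch. The function name `precision_recall_f1` and the example counts are hypothetical and are not taken from the paper's benchmarks.

```python
def precision_recall_f1(tp: int, fp: int, fn: int) -> tuple[float, float, float]:
    """Compute precision, recall and F1 from true-positive, false-positive
    and false-negative counts, returning 0.0 where a ratio is undefined."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    # F1 is the harmonic mean of precision and recall.
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


# Hypothetical counts: 90 SVs correctly validated, 10 wrongly accepted,
# 30 true SVs wrongly rejected.
p, r, f1 = precision_recall_f1(tp=90, fp=10, fn=30)
print(f"precision={p:.3f} recall={r:.3f} F1={f1:.3f}")
```

Because F1 is a harmonic mean, it stays low unless both precision and recall are high, which is why it is commonly reported alongside the two individual values.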
Affiliation(s)
- Yan Zheng
- School of Computer Science, Northwestern Polytechnical University, Xi’an, China
- Xuequn Shang
- School of Computer Science, Northwestern Polytechnical University, Xi’an, China
29
Smolka M, Paulin LF, Grochowski CM, Horner DW, Mahmoud M, Behera S, Kalef-Ezra E, Gandhi M, Hong K, Pehlivan D, Scholz SW, Carvalho CMB, Proukakis C, Sedlazeck FJ. Detection of mosaic and population-level structural variants with Sniffles2. Nat Biotechnol 2024:10.1038/s41587-023-02024-y. [PMID: 38168980 DOI: 10.1038/s41587-023-02024-y] [Received: 04/04/2022] [Accepted: 10/11/2023] [Indexed: 01/05/2024]
Abstract
Calling structural variations (SVs) is technically challenging, but using long reads remains the most accurate way to identify complex genomic alterations. Here we present Sniffles2, which improves over current methods by implementing a repeat aware clustering coupled with a fast consensus sequence and coverage-adaptive filtering. Sniffles2 is 11.8 times faster and 29% more accurate than state-of-the-art SV callers across different coverages (5-50×), sequencing technologies (ONT and HiFi) and SV types. Furthermore, Sniffles2 solves the problem of family-level to population-level SV calling to produce fully genotyped VCF files. Across 11 probands, we accurately identified causative SVs around MECP2, including highly complex alleles with three overlapping SVs. Sniffles2 also enables the detection of mosaic SVs in bulk long-read data. As a result, we identified multiple mosaic SVs in brain tissue from a patient with multiple system atrophy. The identified SV showed a remarkable diversity within the cingulate cortex, impacting both genes involved in neuron function and repetitive elements.
Affiliation(s)
- Moritz Smolka
- Human Genome Sequencing Center Baylor College of Medicine, Houston, TX, USA
- Luis F Paulin
- Human Genome Sequencing Center Baylor College of Medicine, Houston, TX, USA
- Dominic W Horner
- Department of Clinical and Movement Neurosciences, Royal Free Campus, Queen Square Institute of Neurology, University College London, London, UK
- Aligning Science Across Parkinson's (ASAP) Collaborative Research Network, Chevy Chase, MD, USA
- Medhat Mahmoud
- Human Genome Sequencing Center Baylor College of Medicine, Houston, TX, USA
- Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX, USA
- Sairam Behera
- Human Genome Sequencing Center Baylor College of Medicine, Houston, TX, USA
- Ester Kalef-Ezra
- Department of Clinical and Movement Neurosciences, Royal Free Campus, Queen Square Institute of Neurology, University College London, London, UK
- Aligning Science Across Parkinson's (ASAP) Collaborative Research Network, Chevy Chase, MD, USA
- Mira Gandhi
- Pacific Northwest Research Institute (PNRI), Seattle, WA, USA
- Karl Hong
- Bionano Genomics, San Diego, CA, USA
- Davut Pehlivan
- Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX, USA
- Division of Neurology and Developmental Neuroscience, Department of Pediatrics, Baylor College of Medicine, Houston, TX, USA
- Sonja W Scholz
- Neurodegenerative Diseases Research Unit, National Institute of Neurological Disorders and Stroke, Bethesda, MD, USA
- Department of Neurology, Johns Hopkins University Medical Center, Baltimore, MD, USA
- Claudia M B Carvalho
- Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX, USA
- Pacific Northwest Research Institute (PNRI), Seattle, WA, USA
- Christos Proukakis
- Department of Clinical and Movement Neurosciences, Royal Free Campus, Queen Square Institute of Neurology, University College London, London, UK
- Aligning Science Across Parkinson's (ASAP) Collaborative Research Network, Chevy Chase, MD, USA
- Fritz J Sedlazeck
- Human Genome Sequencing Center Baylor College of Medicine, Houston, TX, USA.
- Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX, USA.
- Aligning Science Across Parkinson's (ASAP) Collaborative Research Network, Chevy Chase, MD, USA.
- Department of Computer Science, Rice University, Houston, TX, USA.
30
Gaitán N, Duitama J. A graph clustering algorithm for detection and genotyping of structural variants from long reads. Gigascience 2024; 13:giad112. [PMID: 38206589 PMCID: PMC10783151 DOI: 10.1093/gigascience/giad112] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/29/2023] [Revised: 08/02/2023] [Accepted: 12/08/2023] [Indexed: 01/12/2024] Open
Abstract
BACKGROUND Structural variants (SVs) are genomic polymorphisms defined by their length (>50 bp). The usual types of SVs are deletions, insertions, translocations, inversions, and copy number variants. SV detection and genotyping is fundamental given the role of SVs in phenomena such as phenotypic variation and evolutionary events. Thus, methods to identify SVs using long-read sequencing data have been recently developed. FINDINGS We present an accurate and efficient algorithm to predict germline SVs from long-read sequencing data. The algorithm starts collecting evidence (signatures) of SVs from read alignments. Then, signatures are clustered based on a Euclidean graph with coordinates calculated from lengths and genomic positions. Clustering is performed by the DBSCAN algorithm, which provides the advantage of delimiting clusters with high resolution. Clusters are transformed into SVs and a Bayesian model allows to precisely genotype SVs based on their supporting evidence. This algorithm is integrated into the single sample variants detector of the Next Generation Sequencing Experience Platform, which facilitates the integration with other functionalities for genomics analysis. We performed multiple benchmark experiments, including simulation and real data, representing different genome profiles, sequencing technologies (PacBio HiFi, ONT), and read depths. CONCLUSION The results show that our approach outperformed state-of-the-art tools on germline SV calling and genotyping, especially at low depths, and in error-prone repetitive regions. We believe this work significantly contributes to the development of bioinformatic strategies to maximize the use of long-read sequencing technologies.
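The clustering step described in this abstract can be illustrated with a minimal sketch. It assumes each SV signature is reduced to a point `(genomic position, length)` and that plain Euclidean distance is used; the abstract does not give the exact coordinate transformation, so the `eps` and `min_pts` values, the example signatures, and the function name `dbscan` are all hypothetical. This is a textbook DBSCAN written for illustration, not the authors' implementation.

```python
from math import hypot


def dbscan(points, eps, min_pts):
    """Label each point with a cluster id (0, 1, ...) or -1 for noise."""
    labels = [None] * len(points)

    def neighbors(i):
        # All points (including i itself) within eps of point i.
        xi, yi = points[i]
        return [j for j, (xj, yj) in enumerate(points)
                if hypot(xi - xj, yi - yj) <= eps]

    cluster = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        seeds = neighbors(i)
        if len(seeds) < min_pts:
            labels[i] = -1           # provisional noise; may become a border point
            continue
        cluster += 1
        labels[i] = cluster
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster  # border point: joins the cluster, is not expanded
            if labels[j] is not None:
                continue
            labels[j] = cluster
            reachable = neighbors(j)
            if len(reachable) >= min_pts:
                queue.extend(reachable)  # core point: keep growing the cluster
    return labels


# Hypothetical signatures: (genomic position, SV length in bp).
signatures = [(1000, 300), (1004, 310), (998, 295), (1002, 305),
              (5000, 60), (5003, 62), (5001, 58),
              (9000, 500)]
labels = dbscan(signatures, eps=20, min_pts=3)
print(labels)
```

Signatures that agree in both position and length end up in one cluster, which would then be turned into a single SV call; the isolated signature is labelled noise.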
Affiliation(s)
- Nicolás Gaitán
- Systems and Computing Engineering Department, Universidad de Los Andes, Bogotá 111711, Colombia
- Jorge Duitama
- Systems and Computing Engineering Department, Universidad de Los Andes, Bogotá 111711, Colombia
31
Harvey WT, Ebert P, Ebler J, Audano PA, Munson KM, Hoekzema K, Porubsky D, Beck CR, Marschall T, Garimella K, Eichler EE. Whole-genome long-read sequencing downsampling and its effect on variant-calling precision and recall. Genome Res 2023; 33:2029-2040. [PMID: 38190646 PMCID: PMC10760522 DOI: 10.1101/gr.278070.123] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/04/2023] [Accepted: 11/03/2023] [Indexed: 01/10/2024]
Abstract
Advances in long-read sequencing (LRS) technologies continue to make whole-genome sequencing more complete, affordable, and accurate. LRS provides significant advantages over short-read sequencing approaches, including phased de novo genome assembly, access to previously excluded genomic regions, and discovery of more complex structural variants (SVs) associated with disease. Limitations remain with respect to cost, scalability, and platform-dependent read accuracy and the tradeoffs between sequence coverage and sensitivity of variant discovery are important experimental considerations for the application of LRS. We compare the genetic variant-calling precision and recall of Oxford Nanopore Technologies (ONT) and Pacific Biosciences (PacBio) HiFi platforms over a range of sequence coverages. For read-based applications, LRS sensitivity begins to plateau around 12-fold coverage with a majority of variants called with reasonable accuracy (F1 score above 0.5), and both platforms perform well for SV detection. Genome assembly increases variant-calling precision and recall of SVs and indels in HiFi data sets with HiFi outperforming ONT in quality as measured by the F1 score of assembly-based variant call sets. While both technologies continue to evolve, our work offers guidance to design cost-effective experimental strategies that do not compromise on discovering novel biology.
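For readers unfamiliar with the metrics named in this abstract, the following sketch shows how precision, recall, and the F1 score are conventionally computed from true-positive, false-positive, and false-negative counts. The counts in the example are invented for illustration and are not taken from the paper; the function name `precision_recall_f1` is likewise hypothetical.

```python
def precision_recall_f1(tp, fp, fn):
    """Return (precision, recall, F1) from variant-call comparison counts."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    # F1 is the harmonic mean of precision and recall.
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Hypothetical call set: 80 calls match the truth set, 20 do not,
# and 80 true variants were missed.
p, r, f1 = precision_recall_f1(tp=80, fp=20, fn=80)
print(f"precision={p:.3f} recall={r:.3f} F1={f1:.3f}")
```

Because F1 is a harmonic mean, it is pulled toward the lower of the two values, so a call set with high precision but low recall can still fall near the 0.5 threshold mentioned in the abstract.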
Affiliation(s)
- William T Harvey
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, Washington 98195-5065, USA
- Peter Ebert
- Institute for Medical Biometry and Bioinformatics, Medical Faculty, Heinrich Heine University, 40225 Düsseldorf, Germany
- Core Unit Bioinformatics, Medical Faculty, Heinrich Heine University, 40225 Düsseldorf, Germany
- Center for Digital Medicine, Heinrich Heine University, 40225 Düsseldorf, Germany
- Jana Ebler
- Institute for Medical Biometry and Bioinformatics, Medical Faculty, Heinrich Heine University, 40225 Düsseldorf, Germany
- Center for Digital Medicine, Heinrich Heine University, 40225 Düsseldorf, Germany
- Peter A Audano
- The Jackson Laboratory for Genomic Medicine, Farmington, Connecticut 06032, USA
- Katherine M Munson
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, Washington 98195-5065, USA
- Kendra Hoekzema
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, Washington 98195-5065, USA
- David Porubsky
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, Washington 98195-5065, USA
- Christine R Beck
- The Jackson Laboratory for Genomic Medicine, Farmington, Connecticut 06032, USA
- Department of Genetics and Genome Sciences, University of Connecticut Health Center, Farmington, Connecticut 06030-6403, USA
- Tobias Marschall
- Institute for Medical Biometry and Bioinformatics, Medical Faculty, Heinrich Heine University, 40225 Düsseldorf, Germany
- Center for Digital Medicine, Heinrich Heine University, 40225 Düsseldorf, Germany
- Kiran Garimella
- Data Sciences Platform, Broad Institute of MIT and Harvard, Cambridge, Massachusetts 02142, USA
- Evan E Eichler
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, Washington 98195-5065, USA
- Howard Hughes Medical Institute, University of Washington, Seattle, Washington 98195, USA
32
Wolff K, Friedhoff R, Schwarzer F, Pucker B. Data literacy in genome research. J Integr Bioinform 2023; 20:jib-2023-0033. [PMID: 38047760 PMCID: PMC10777367 DOI: 10.1515/jib-2023-0033] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/02/2023] [Accepted: 11/15/2023] [Indexed: 12/05/2023] Open
Abstract
With an ever increasing amount of research data available, it becomes constantly more important to possess data literacy skills to benefit from this valuable resource. An integrative course was developed to teach students the fundamentals of data literacy through an engaging genome sequencing project. Each cohort of students performed planning of the experiment, DNA extraction, nanopore sequencing, genome sequence assembly, prediction of genes in the assembled sequence, and assignment of functional annotation terms to predicted genes. Students learned how to communicate science through writing a protocol in the form of a scientific paper, providing comments during a peer-review process, and presenting their findings as part of an international symposium. Many students enjoyed the opportunity to own a project and to work towards a meaningful objective.
Affiliation(s)
- Katharina Wolff
- Plant Biotechnology and Bioinformatics, Institute of Plant Biology & BRICS, TU Braunschweig, Braunschweig, Germany
- Ronja Friedhoff
- Plant Biotechnology and Bioinformatics, Institute of Plant Biology & BRICS, TU Braunschweig, Braunschweig, Germany
- Friderieke Schwarzer
- Plant Biotechnology and Bioinformatics, Institute of Plant Biology & BRICS, TU Braunschweig, Braunschweig, Germany
- Boas Pucker
- Plant Biotechnology and Bioinformatics, Institute of Plant Biology & BRICS, TU Braunschweig, Braunschweig, Germany
33
Choo ZN, Behr JM, Deshpande A, Hadi K, Yao X, Tian H, Takai K, Zakusilo G, Rosiene J, Da Cruz Paula A, Weigelt B, Setton J, Riaz N, Powell SN, Busam K, Shoushtari AN, Ariyan C, Reis-Filho J, de Lange T, Imieliński M. Most large structural variants in cancer genomes can be detected without long reads. Nat Genet 2023; 55:2139-2148. [PMID: 37945902 PMCID: PMC10703688 DOI: 10.1038/s41588-023-01540-6] [Citation(s) in RCA: 7] [Impact Index Per Article: 7.0] [Received: 06/01/2021] [Accepted: 09/19/2023] [Indexed: 11/12/2023]
Abstract
Short-read sequencing is the workhorse of cancer genomics yet is thought to miss many structural variants (SVs), particularly large chromosomal alterations. To characterize missing SVs in short-read whole genomes, we analyzed 'loose ends', local violations of mass balance between adjacent DNA segments. In the landscape of loose ends across 1,330 high-purity cancer whole genomes, most large (>10-kb) clonal SVs were fully resolved by short reads in the 87% of the human genome where copy number could be reliably measured. Some loose ends represent neotelomeres, which we propose as a hallmark of the alternative lengthening of telomeres phenotype. These pan-cancer findings were confirmed by long-molecule profiles of 38 breast cancer and melanoma cases. Our results indicate that aberrant homologous recombination is unlikely to drive the majority of large cancer SVs. Furthermore, analysis of mass balance in short-read whole genome data provides a surprisingly complete picture of cancer chromosomal structure.
Affiliation(s)
- Zi-Ning Choo
- New York Genome Center, New York, NY, USA
- Department of Pathology and Laboratory Medicine, Weill Cornell Medicine, New York, NY, USA
- Tri-institutional MD PhD Program, Weill Cornell Medicine, New York, NY, USA
- Physiology and Biophysics PhD Program, Weill Cornell Medicine, New York, NY, USA
- Julie M Behr
- New York Genome Center, New York, NY, USA
- Department of Pathology and Laboratory Medicine, Weill Cornell Medicine, New York, NY, USA
- Tri-institutional PhD Program in Computational Biology and Medicine, New York, NY, USA
- Aditya Deshpande
- New York Genome Center, New York, NY, USA
- Department of Pathology and Laboratory Medicine, Weill Cornell Medicine, New York, NY, USA
- Tri-institutional PhD Program in Computational Biology and Medicine, New York, NY, USA
- Kevin Hadi
- New York Genome Center, New York, NY, USA
- Department of Pathology and Laboratory Medicine, Weill Cornell Medicine, New York, NY, USA
- Physiology and Biophysics PhD Program, Weill Cornell Medicine, New York, NY, USA
- Xiaotong Yao
- New York Genome Center, New York, NY, USA
- Department of Pathology and Laboratory Medicine, Weill Cornell Medicine, New York, NY, USA
- Tri-institutional PhD Program in Computational Biology and Medicine, New York, NY, USA
- Huasong Tian
- New York Genome Center, New York, NY, USA
- Department of Pathology and Laboratory Medicine, Weill Cornell Medicine, New York, NY, USA
- Perlmutter Cancer Center, NYU Grossman School of Medicine, New York, NY, USA
- Kaori Takai
- Laboratory of Cell Biology and Genetics, Rockefeller University, New York, NY, USA
- George Zakusilo
- Laboratory of Cell Biology and Genetics, Rockefeller University, New York, NY, USA
- Joel Rosiene
- New York Genome Center, New York, NY, USA
- Department of Pathology and Laboratory Medicine, Weill Cornell Medicine, New York, NY, USA
- Britta Weigelt
- Memorial Sloan Kettering Cancer Center, New York, NY, USA
- Jeremy Setton
- Memorial Sloan Kettering Cancer Center, New York, NY, USA
- Nadeem Riaz
- Memorial Sloan Kettering Cancer Center, New York, NY, USA
- Simon N Powell
- Memorial Sloan Kettering Cancer Center, New York, NY, USA
- Klaus Busam
- Memorial Sloan Kettering Cancer Center, New York, NY, USA
- Titia de Lange
- Laboratory of Cell Biology and Genetics, Rockefeller University, New York, NY, USA
- Marcin Imieliński
- New York Genome Center, New York, NY, USA.
- Department of Pathology and Laboratory Medicine, Weill Cornell Medicine, New York, NY, USA.
- Perlmutter Cancer Center, NYU Grossman School of Medicine, New York, NY, USA.
- Department of Pathology, NYU Grossman School of Medicine, New York, NY, USA.
34
Chen S, Wang P, Kong W, Chai K, Zhang S, Yu J, Wang Y, Jiang M, Lei W, Chen X, Wang W, Gao Y, Qu S, Wang F, Wang Y, Zhang Q, Gu M, Fang K, Ma C, Sun W, Ye N, Wu H, Zhang X. Gene mining and genomics-assisted breeding empowered by the pangenome of tea plant Camellia sinensis. Nat Plants 2023; 9:1986-1999. [PMID: 38012346 DOI: 10.1038/s41477-023-01565-z] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 04/24/2023] [Accepted: 10/20/2023] [Indexed: 11/29/2023]
Abstract
Tea is one of the world's oldest crops and is cultivated to produce beverages with various flavours. Despite advances in sequencing technologies, the genetic mechanisms underlying key agronomic traits of tea remain unclear. In this study, we present a high-quality pangenome of 22 elite cultivars, representing broad genetic diversity in the species. Our analysis reveals that a recent long terminal repeat burst contributed nearly 20% of gene copies, introducing functional genetic variants that affect phenotypes such as leaf colour. Our graphical pangenome improves the efficiency of genome-wide association studies and allows the identification of key genes controlling bud flush timing. We also identified strong correlations between allelic variants and flavour-related chemistries. These findings deepen our understanding of the genetic basis of tea quality and provide valuable genomic resources to facilitate its genomics-assisted breeding.
Affiliation(s)
- Shuai Chen
- National Key Laboratory for Tropical Crop Breeding, Shenzhen Branch, Guangdong Laboratory for Lingnan Modern Agriculture, Genome Analysis Laboratory of the Ministry of Agriculture, Agricultural Genomics Institute at Shenzhen, Chinese Academy of Agricultural Sciences, Shenzhen, China
- Pengjie Wang
- National Key Laboratory for Tropical Crop Breeding, Shenzhen Branch, Guangdong Laboratory for Lingnan Modern Agriculture, Genome Analysis Laboratory of the Ministry of Agriculture, Agricultural Genomics Institute at Shenzhen, Chinese Academy of Agricultural Sciences, Shenzhen, China
- Weilong Kong
- National Key Laboratory for Tropical Crop Breeding, Shenzhen Branch, Guangdong Laboratory for Lingnan Modern Agriculture, Genome Analysis Laboratory of the Ministry of Agriculture, Agricultural Genomics Institute at Shenzhen, Chinese Academy of Agricultural Sciences, Shenzhen, China
- Kun Chai
- Center for Genomics and Biotechnology, Fujian Provincial Key Laboratory of Haixia Applied Plant Systems Biology, Key Laboratory of Genetics, Fujian Agriculture and Forestry University, Fuzhou, China
- Shengcheng Zhang
- National Key Laboratory for Tropical Crop Breeding, Shenzhen Branch, Guangdong Laboratory for Lingnan Modern Agriculture, Genome Analysis Laboratory of the Ministry of Agriculture, Agricultural Genomics Institute at Shenzhen, Chinese Academy of Agricultural Sciences, Shenzhen, China
- Jiaxin Yu
- National Key Laboratory for Tropical Crop Breeding, Shenzhen Branch, Guangdong Laboratory for Lingnan Modern Agriculture, Genome Analysis Laboratory of the Ministry of Agriculture, Agricultural Genomics Institute at Shenzhen, Chinese Academy of Agricultural Sciences, Shenzhen, China
- Yibin Wang
- National Key Laboratory for Tropical Crop Breeding, Shenzhen Branch, Guangdong Laboratory for Lingnan Modern Agriculture, Genome Analysis Laboratory of the Ministry of Agriculture, Agricultural Genomics Institute at Shenzhen, Chinese Academy of Agricultural Sciences, Shenzhen, China
- Mengwei Jiang
- Center for Genomics and Biotechnology, Fujian Provincial Key Laboratory of Haixia Applied Plant Systems Biology, Key Laboratory of Genetics, Fujian Agriculture and Forestry University, Fuzhou, China
- Wenlong Lei
- Center for Genomics and Biotechnology, Fujian Provincial Key Laboratory of Haixia Applied Plant Systems Biology, Key Laboratory of Genetics, Fujian Agriculture and Forestry University, Fuzhou, China
- Xiao Chen
- National Key Laboratory for Tropical Crop Breeding, Shenzhen Branch, Guangdong Laboratory for Lingnan Modern Agriculture, Genome Analysis Laboratory of the Ministry of Agriculture, Agricultural Genomics Institute at Shenzhen, Chinese Academy of Agricultural Sciences, Shenzhen, China
- Wenling Wang
- Center for Genomics and Biotechnology, Fujian Provincial Key Laboratory of Haixia Applied Plant Systems Biology, Key Laboratory of Genetics, Fujian Agriculture and Forestry University, Fuzhou, China
- Yingying Gao
- National Key Laboratory for Tropical Crop Breeding, Shenzhen Branch, Guangdong Laboratory for Lingnan Modern Agriculture, Genome Analysis Laboratory of the Ministry of Agriculture, Agricultural Genomics Institute at Shenzhen, Chinese Academy of Agricultural Sciences, Shenzhen, China
- Shenyang Qu
- National Key Laboratory for Tropical Crop Breeding, Shenzhen Branch, Guangdong Laboratory for Lingnan Modern Agriculture, Genome Analysis Laboratory of the Ministry of Agriculture, Agricultural Genomics Institute at Shenzhen, Chinese Academy of Agricultural Sciences, Shenzhen, China
- Fang Wang
- National Key Laboratory for Tropical Crop Breeding, Shenzhen Branch, Guangdong Laboratory for Lingnan Modern Agriculture, Genome Analysis Laboratory of the Ministry of Agriculture, Agricultural Genomics Institute at Shenzhen, Chinese Academy of Agricultural Sciences, Shenzhen, China
- Yinghao Wang
- Center for Genomics and Biotechnology, Fujian Provincial Key Laboratory of Haixia Applied Plant Systems Biology, Key Laboratory of Genetics, Fujian Agriculture and Forestry University, Fuzhou, China
- Qing Zhang
- National Key Laboratory for Tropical Crop Breeding, Shenzhen Branch, Guangdong Laboratory for Lingnan Modern Agriculture, Genome Analysis Laboratory of the Ministry of Agriculture, Agricultural Genomics Institute at Shenzhen, Chinese Academy of Agricultural Sciences, Shenzhen, China
- Mengya Gu
- College of Horticulture, Fujian Agriculture and Forestry University, Fuzhou, China
- Kaixing Fang
- Tea Research Institute, Guangdong Academy of Agricultural Sciences, Guangdong Provincial Key Laboratory of Tea Plant Resources Innovation and Utilization, Guangzhou, China
- Chunlei Ma
- Key Laboratory of Biology, Genetics and Breeding of Special Economic Animals and Plants, Ministry of Agriculture and Rural Affairs, Tea Research Institute of the Chinese Academy of Agricultural Sciences, Hangzhou, China
- Weijiang Sun
- College of Horticulture, Fujian Agriculture and Forestry University, Fuzhou, China
- Naixing Ye
- College of Horticulture, Fujian Agriculture and Forestry University, Fuzhou, China.
- Hualing Wu
- Tea Research Institute, Guangdong Academy of Agricultural Sciences, Guangdong Provincial Key Laboratory of Tea Plant Resources Innovation and Utilization, Guangzhou, China.
- Xingtan Zhang
- National Key Laboratory for Tropical Crop Breeding, Shenzhen Branch, Guangdong Laboratory for Lingnan Modern Agriculture, Genome Analysis Laboratory of the Ministry of Agriculture, Agricultural Genomics Institute at Shenzhen, Chinese Academy of Agricultural Sciences, Shenzhen, China.
35
Wei ZG, Bu PY, Zhang XD, Liu F, Qian Y, Wu FX. invMap: a sensitive mapping tool for long noisy reads with inversion structural variants. Bioinformatics 2023; 39:btad726. [PMID: 38058196 DOI: 10.1093/bioinformatics/btad726] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/06/2023] [Revised: 11/02/2023] [Accepted: 12/05/2023] [Indexed: 12/08/2023]
Abstract
MOTIVATION Longer reads produced by PacBio or Oxford Nanopore sequencers could more frequently span the breakpoints of structural variations (SVs) than shorter reads. Therefore, existing long-read mapping methods often generate wrong alignments and variant calls. Compared to deletions and insertions, inversion events are more difficult to be detected since the anchors in inversion regions are nonlinear to those in SV-free regions. To address this issue, this study presents a novel long-read mapping algorithm (named as invMap). RESULTS For each long noisy read, invMap first locates the aligned region with a specifically designed scoring method for chaining, then checks the remaining anchors in the aligned region to discover potential inversions. We benchmark invMap on simulated datasets across different genomes and sequencing coverages, experimental results demonstrate that invMap is more accurate to locate aligned regions and call SVs for inversions than the competing methods. The real human genome sequencing dataset of NA12878 illustrates that invMap can effectively find more candidate variant calls for inversions than the competing methods. AVAILABILITY AND IMPLEMENTATION The invMap software is available at https://github.com/zhang134/invMap.git.
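The idea that anchors inside an inversion break the linear pattern of the surrounding alignment can be illustrated with a toy sketch. It assumes each anchor is already reduced to `(reference position, strand relative to the main chain)` and simply reports runs of reverse-strand anchors; the function name `candidate_inversions`, the `min_anchors` threshold, and the example anchors are hypothetical. invMap's actual scoring and chaining are not described in the abstract and are not reproduced here.

```python
from itertools import groupby


def candidate_inversions(anchors, min_anchors=2):
    """Return reference spans covered by runs of reverse-strand anchors.

    anchors: iterable of (ref_pos, strand), strand being '+' or '-'
    relative to the read's main alignment chain.
    """
    ordered = sorted(anchors)  # order anchors along the reference
    spans = []
    for strand, run in groupby(ordered, key=lambda anchor: anchor[1]):
        run = list(run)
        # A single stray reverse anchor is more likely noise than an inversion.
        if strand == "-" and len(run) >= min_anchors:
            spans.append((run[0][0], run[-1][0]))
    return spans


# Hypothetical anchors for one long read.
anchors = [(100, "+"), (220, "+"), (340, "-"), (410, "-"),
           (480, "-"), (600, "+"), (650, "-"), (720, "+")]
print(candidate_inversions(anchors))
```

Requiring several consecutive reverse-strand anchors is one simple way to separate a real inverted segment from spurious seed matches in noisy reads.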
Affiliation(s)
- Ze-Gang Wei
- School of Physics and Optoelectronics Technology, Baoji University of Arts and Sciences, Baoji 721016, China
- Division of Biomedical Engineering, Department of Computer Science and Department of Mechanical Engineering, University of Saskatchewan, Saskatoon, SK S7N 5A9, Canada
- Peng-Yu Bu
- School of Physics and Optoelectronics Technology, Baoji University of Arts and Sciences, Baoji 721016, China
- Xiao-Dan Zhang
- School of Physics and Optoelectronics Technology, Baoji University of Arts and Sciences, Baoji 721016, China
- Fei Liu
- School of Physics and Optoelectronics Technology, Baoji University of Arts and Sciences, Baoji 721016, China
- Yu Qian
- School of Physics and Optoelectronics Technology, Baoji University of Arts and Sciences, Baoji 721016, China
- Fang-Xiang Wu
- Division of Biomedical Engineering, Department of Computer Science and Department of Mechanical Engineering, University of Saskatchewan, Saskatoon, SK S7N 5A9, Canada
36
Magi A, Mattei G, Mingrino A, Caprioli C, Ronchini C, Frigè G, Semeraro R, Baragli M, Bolognini D, Colombo E, Mazzarella L, Pelicci PG. GASOLINE: detecting germline and somatic structural variants from long-reads data. Sci Rep 2023; 13:20817. [PMID: 38012350 PMCID: PMC10682169 DOI: 10.1038/s41598-023-48285-0] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/07/2023] [Accepted: 11/24/2023] [Indexed: 11/29/2023] Open
Abstract
Long-read sequencing allows analyses of single nucleic-acid molecules and produces sequences in the order of tens to hundreds of kilobases. Its application to whole-genome analyses allows identification of complex genomic structural-variants (SVs) with unprecedented resolution. SV identification, however, requires complex computational methods, based on either read-depth or intra- and inter-alignment signatures approaches, which are limited by size or type of SVs. Moreover, most currently available tools only detect germline variants, thus requiring separate computation of sample pairs for comparative analyses. To overcome these limits, we developed a novel tool (Germline And SOmatic structuraL varIants detectioN and gEnotyping; GASOLINE) that groups SV signatures using a sophisticated clustering procedure based on a modified reciprocal overlap criterion, and is designed to identify germline SVs, from single samples, and somatic SVs from paired test and control samples. GASOLINE is a collection of Perl, R and Fortran code; it analyzes aligned data in BAM format and produces VCF files with statistically significant somatic SVs. Germline or somatic analysis of 30× sequencing coverage experiments requires 4-5 h with 20 threads. GASOLINE outperformed currently available methods in the detection of both germline and somatic SVs in synthetic and real long-reads datasets. Notably, when applied on a pair of metastatic melanoma and matched-normal sample, GASOLINE identified five genuine somatic SVs that were missed using five different sequencing technologies and state-of-the-art SV calling approaches. Thus, GASOLINE identifies germline and somatic SVs with unprecedented accuracy and resolution, outperforming currently available state-of-the-art WGS long-reads computational methods.
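As background for the clustering criterion mentioned in this abstract, the sketch below computes the standard reciprocal overlap between two intervals: the shared length divided by each interval's own length, taking the smaller of the two fractions. The abstract says GASOLINE uses a modified version of this criterion whose details are not given, so this is the plain textbook form only. The half-open `(start, end)` representation and the example coordinates are assumptions.

```python
def reciprocal_overlap(a, b):
    """Smaller of the two overlap fractions between half-open intervals a and b."""
    (a_start, a_end), (b_start, b_end) = a, b
    shared = min(a_end, b_end) - max(a_start, b_start)
    if shared <= 0:
        return 0.0  # intervals do not overlap
    return min(shared / (a_end - a_start), shared / (b_end - b_start))


# Hypothetical SV signatures as (start, end) on the same chromosome.
shifted = reciprocal_overlap((100, 200), (150, 250))   # half of each is shared
nested = reciprocal_overlap((100, 200), (110, 190))    # limited by the longer interval
disjoint = reciprocal_overlap((100, 200), (300, 400))
print(shifted, nested, disjoint)
```

Taking the minimum of the two fractions means a short signature nested inside a much longer one scores low, which keeps events of very different sizes from being merged.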
Affiliation(s)
- Alberto Magi
- Department of Information Engineering, University of Florence, 50100, Florence, Italy.
- Institute for Biomedical Technologies, National Research Council, Segrate, Milan, Italy.
- Gianluca Mattei
- Department of Information Engineering, University of Florence, 50100, Florence, Italy
- Alessandra Mingrino
- Department of Experimental and Clinical Medicine, University of Florence, Florence, Italy
- Chiara Caprioli
- Department of Experimental Oncology, IEO European Institute of Oncology IRCCS, Milan, Italy
- Department of Oncology and Hemato-Oncology, University of Milan, Milan, Italy
- Chiara Ronchini
- Department of Experimental Oncology, IEO European Institute of Oncology IRCCS, Milan, Italy
- Gianmaria Frigè
- Department of Experimental Oncology, IEO European Institute of Oncology IRCCS, Milan, Italy
- Department of Oncology and Hemato-Oncology, University of Milan, Milan, Italy
- Roberto Semeraro
- Department of Experimental and Clinical Medicine, University of Florence, Florence, Italy
- Marta Baragli
- Department of Information Engineering, University of Florence, 50100, Florence, Italy
- Davide Bolognini
- Department of Experimental and Clinical Medicine, University of Florence, Florence, Italy
- Emanuela Colombo
- Department of Experimental Oncology, IEO European Institute of Oncology IRCCS, Milan, Italy
- Department of Oncology and Hemato-Oncology, University of Milan, Milan, Italy
- Luca Mazzarella
- Department of Experimental Oncology, IEO European Institute of Oncology IRCCS, Milan, Italy
- Pier Giuseppe Pelicci
- Department of Experimental Oncology, IEO European Institute of Oncology IRCCS, Milan, Italy.
- Department of Oncology and Hemato-Oncology, University of Milan, Milan, Italy.
37
Ren L, Duan X, Dong L, Zhang R, Yang J, Gao Y, Peng R, Hou W, Liu Y, Li J, Yu Y, Zhang N, Shang J, Liang F, Wang D, Chen H, Sun L, Hao L, Scherer A, Nordlund J, Xiao W, Xu J, Tong W, Hu X, Jia P, Ye K, Li J, Jin L, Hong H, Wang J, Fan S, Fang X, Zheng Y, Shi L. Quartet DNA reference materials and datasets for comprehensively evaluating germline variant calling performance. Genome Biol 2023; 24:270. [PMID: 38012772 PMCID: PMC10680274 DOI: 10.1186/s13059-023-03109-2] [Citation(s) in RCA: 3] [Impact Index Per Article: 3.0] [Received: 11/08/2022] [Accepted: 11/13/2023] [Indexed: 11/29/2023] Open
Abstract
BACKGROUND Genomic DNA reference materials are widely recognized as essential for ensuring data quality in omics research. However, relying solely on reference datasets to evaluate the accuracy of variant calling results is incomplete, as they are limited to benchmark regions. Therefore, it is important to develop DNA reference materials that enable the assessment of variant detection performance across the entire genome. RESULTS We established a DNA reference material suite from four immortalized cell lines derived from a family of parents and monozygotic twins. Comprehensive reference datasets of 4.2 million small variants and 15,000 structural variants were integrated and certified for evaluating the reliability of germline variant calls inside the benchmark regions. Importantly, the genetic built-in-truth of the Quartet family design enables estimation of the precision of variant calls outside the benchmark regions. Using the Quartet reference materials along with study samples, batch effects are objectively monitored and alleviated by training a machine learning model with the Quartet reference datasets to remove potential artifact calls. Moreover, the matched RNA and protein reference materials and datasets from the Quartet project enables cross-omics validation of variant calls from multiomics data. CONCLUSIONS The Quartet DNA reference materials and reference datasets provide a unique resource for objectively assessing the quality of germline variant calls throughout the whole-genome regions and improving the reliability of large-scale genomic profiling.
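The "genetic built-in-truth" of a family of two parents and monozygotic twins can be illustrated with a small consistency check: the twins' genotypes should be identical, and each should be explainable by one allele from each parent. The sketch below assumes biallelic, autosomal genotypes encoded as allele pairs such as `(0, 1)`; the function names and example genotypes are hypothetical, and the paper's actual precision estimation is more involved than this check.

```python
def mendelian_consistent(father, mother, child):
    """True if the child's two alleles can come one from each parent."""
    a, b = child
    return (a in father and b in mother) or (b in father and a in mother)


def quartet_consistent(father, mother, twin1, twin2):
    """True if twins share a genotype that is also Mendelian-consistent."""
    same_genotype = sorted(twin1) == sorted(twin2)  # allele order is irrelevant
    return same_genotype and mendelian_consistent(father, mother, twin1)


# Hypothetical calls at three sites: (father, mother, twin1, twin2).
sites = [
    ((0, 1), (0, 0), (0, 1), (1, 0)),  # consistent
    ((0, 1), (0, 0), (0, 1), (1, 1)),  # twins disagree
    ((0, 0), (0, 0), (0, 1), (0, 1)),  # allele 1 absent from both parents
]
results = [quartet_consistent(*site) for site in sites]
print(results)
```

Calls that fail such a check are likely errors in at least one sample, which is why a family design can flag unreliable calls even where no benchmark truth set exists.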
Affiliation(s)
- Luyao Ren
- State Key Laboratory of Genetic Engineering, School of Life Sciences and Human Phenome Institute, Fudan University, Shanghai, China
| | - Xiaoke Duan
- State Key Laboratory of Genetic Engineering, School of Life Sciences and Human Phenome Institute, Fudan University, Shanghai, China
| | | | - Rui Zhang
- National Center for Clinical Laboratories, Institute of Geriatric Medicine, Chinese Academy of Medical Sciences, Beijing Hospital, Beijing, China
| | - Jingcheng Yang
- State Key Laboratory of Genetic Engineering, School of Life Sciences and Human Phenome Institute, Fudan University, Shanghai, China
- Greater Bay Area Institute of Precision Medicine, Guangzhou, Guangdong, China
| | - Yuechen Gao
- State Key Laboratory of Genetic Engineering, School of Life Sciences and Human Phenome Institute, Fudan University, Shanghai, China
| | - Rongxue Peng
- National Center for Clinical Laboratories, Institute of Geriatric Medicine, Chinese Academy of Medical Sciences, Beijing Hospital, Beijing, China
| | - Wanwan Hou
- State Key Laboratory of Genetic Engineering, School of Life Sciences and Human Phenome Institute, Fudan University, Shanghai, China
| | - Yaqing Liu
- State Key Laboratory of Genetic Engineering, School of Life Sciences and Human Phenome Institute, Fudan University, Shanghai, China
| | - Jingjing Li
- State Key Laboratory of Genetic Engineering, School of Life Sciences and Human Phenome Institute, Fudan University, Shanghai, China
- Nextomics Biosciences Institute, Wuhan, Hubei, China
| | - Ying Yu
- State Key Laboratory of Genetic Engineering, School of Life Sciences and Human Phenome Institute, Fudan University, Shanghai, China
| | - Naixin Zhang
- State Key Laboratory of Genetic Engineering, School of Life Sciences and Human Phenome Institute, Fudan University, Shanghai, China
| | - Jun Shang
- State Key Laboratory of Genetic Engineering, School of Life Sciences and Human Phenome Institute, Fudan University, Shanghai, China
| | - Fan Liang
- Nextomics Biosciences Institute, Wuhan, Hubei, China
| | - Depeng Wang
- Nextomics Biosciences Institute, Wuhan, Hubei, China
| | - Hui Chen
- OrigiMed Co., Ltd, Shanghai, China
| | - Lele Sun
- Sequanta Technologies Co., Ltd, Shanghai, China
| | | | - Andreas Scherer
- Institute for Molecular Medicine Finland (FIMM), University of Helsinki, Helsinki, Finland
- EATRIS ERIC-European Infrastructure for Translational Medicine, Amsterdam, the Netherlands
| | - Jessica Nordlund
- EATRIS ERIC-European Infrastructure for Translational Medicine, Amsterdam, the Netherlands
- Department of Medical Sciences, Science for Life Laboratory, Uppsala University, Uppsala, Sweden
| | - Wenming Xiao
- Office of Oncologic Diseases, Office of New Drugs, Center for Drug Evaluation and Research, US Food and Drug Administration, Silver Spring, MD, USA
| | - Joshua Xu
- Division of Bioinformatics and Biostatistics, National Center for Toxicological Research, US Food and Drug Administration, Jefferson, AR, USA
| | - Weida Tong
- Division of Bioinformatics and Biostatistics, National Center for Toxicological Research, US Food and Drug Administration, Jefferson, AR, USA
| | - Xin Hu
- Shanghai Cancer Center, Fudan University, Shanghai, China
| | - Peng Jia
- School of Automation Science and Engineering, Faculty of Electronic and Information Engineering, Xi'an Jiaotong University, Xi'an, China
| | - Kai Ye
- School of Automation Science and Engineering, Faculty of Electronic and Information Engineering, Xi'an Jiaotong University, Xi'an, China
| | - Jinming Li
- National Center for Clinical Laboratories, Institute of Geriatric Medicine, Chinese Academy of Medical Sciences, Beijing Hospital, Beijing, China
| | - Li Jin
- State Key Laboratory of Genetic Engineering, School of Life Sciences and Human Phenome Institute, Fudan University, Shanghai, China
| | - Huixiao Hong
- Division of Bioinformatics and Biostatistics, National Center for Toxicological Research, US Food and Drug Administration, Jefferson, AR, USA
| | - Jing Wang
- National Institute of Metrology, Beijing, China.
| | - Shaohua Fan
- State Key Laboratory of Genetic Engineering, School of Life Sciences and Human Phenome Institute, Fudan University, Shanghai, China.
| | - Xiang Fang
- National Institute of Metrology, Beijing, China.
| | - Yuanting Zheng
- State Key Laboratory of Genetic Engineering, School of Life Sciences and Human Phenome Institute, Fudan University, Shanghai, China.
| | - Leming Shi
- State Key Laboratory of Genetic Engineering, School of Life Sciences and Human Phenome Institute, Fudan University, Shanghai, China
- Shanghai Cancer Center, Fudan University, Shanghai, China
- International Human Phenome Institutes, Shanghai, China
38
Huang Q, Mitsiades I, Dowst H, Zarrin-Khameh N, Noor AB, Castro P, Scheurer ME, Godoy G, Mims MP, Mitsiades N. Incidental detection of FGFR3 fusion via liquid biopsy leading to earlier diagnosis of urothelial carcinoma. NPJ Precis Oncol 2023; 7:123. [PMID: 37980380 PMCID: PMC10657397 DOI: 10.1038/s41698-023-00467-9] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/15/2023] [Accepted: 10/13/2023] [Indexed: 11/20/2023]
Abstract
The rising utilization of circulating tumor DNA (ctDNA) assays in Precision Oncology may incidentally detect genetic material from secondary sources. It is important that such findings are recognized and properly leveraged for both diagnosis and monitoring of response to treatment. Here, we report a patient in whom serial cell-free DNA (cfDNA) monitoring for his known prostate adenocarcinoma uncovered the emergence of an unexpected FGFR3-TACC3 gene fusion, a BRCA1 frameshift mutation, and other molecular abnormalities. Due to the rarity of FGFR3 fusions in prostate cancer, a workup for a second primary cancer was performed, leading to the diagnosis of an otherwise-asymptomatic urothelial carcinoma (UC). Once UC-directed treatment was initiated, the presence of these genetic abnormalities in cfDNA allowed for disease monitoring and early detection of resistance, well before radiographic progression. These findings also uncovered opportunities for targeted therapies against FGFR and BRCA1. Overall, this report highlights the multifaceted utility of longitudinal ctDNA monitoring in early cancer diagnosis, disease prognostication, therapeutic target identification, monitoring of treatment response, and early detection of emergence of resistance.
Affiliation(s)
- Quillan Huang
- Dept. of Medicine, Baylor College of Medicine, Houston, TX, 77030, USA
- Ben Taub General Hospital, Harris Health System, Houston, TX, 77030, USA
- Dan L Duncan Comprehensive Cancer Center, Houston, TX, 77030, USA
| | - Irene Mitsiades
- Harvard Medical School, Boston, MA, 02115, USA
- Boston University School of Arts and Sciences, Boston, MA, 02215, USA
| | - Heidi Dowst
- Dan L Duncan Comprehensive Cancer Center, Houston, TX, 77030, USA
| | - Neda Zarrin-Khameh
- Ben Taub General Hospital, Harris Health System, Houston, TX, 77030, USA
- Dan L Duncan Comprehensive Cancer Center, Houston, TX, 77030, USA
- Dept. of Pathology, Baylor College of Medicine, Houston, TX, 77030, USA
| | - Attiya Batool Noor
- Ben Taub General Hospital, Harris Health System, Houston, TX, 77030, USA
| | - Patricia Castro
- Dan L Duncan Comprehensive Cancer Center, Houston, TX, 77030, USA
- Dept. of Pathology, Baylor College of Medicine, Houston, TX, 77030, USA
| | - Michael E Scheurer
- Dan L Duncan Comprehensive Cancer Center, Houston, TX, 77030, USA
- Dept. of Pediatrics, Baylor College of Medicine, Houston, TX, 77030, USA
| | - Guilherme Godoy
- Ben Taub General Hospital, Harris Health System, Houston, TX, 77030, USA
- Dan L Duncan Comprehensive Cancer Center, Houston, TX, 77030, USA
- Dept. of Urology, Baylor College of Medicine, Houston, TX, 77030, USA
| | - Martha P Mims
- Dept. of Medicine, Baylor College of Medicine, Houston, TX, 77030, USA
- Ben Taub General Hospital, Harris Health System, Houston, TX, 77030, USA
- Dan L Duncan Comprehensive Cancer Center, Houston, TX, 77030, USA
| | - Nicholas Mitsiades
- Department of Internal Medicine, UC Davis Comprehensive Cancer Center, Sacramento, CA, 95817, USA.
39
Cuenca-Guardiola J, Morena-Barrio BDL, Navarro-Manzano E, Stevens J, Ouwehand WH, Gleadall NS, Corral J, Fernández-Breis JT. Detection and annotation of transposable element insertions and deletions on the human genome using nanopore sequencing. iScience 2023; 26:108214. [PMID: 37953943 PMCID: PMC10638045 DOI: 10.1016/j.isci.2023.108214] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/06/2023] [Revised: 07/28/2023] [Accepted: 10/11/2023] [Indexed: 11/14/2023]
Abstract
Repetitive sequences represent about 45% of the human genome. Some are transposable elements (TEs) with the ability to change their position in the genome, creating genetic variability as either insertions or deletions, with potential pathogenic consequences. We used long-read nanopore sequencing to identify TE variants in the genomes of 24 patients with antithrombin deficiency. We identified 7,344 TE insertions and 3,056 TE deletions, of which 2,926 were not previously described in publicly available databases. The insertions affected 3,955 genes, with 6 insertions located in exons, 3,929 in introns, and 147 in promoters. Potential functional impact was evaluated with gene annotation and enrichment analysis, which suggested a strong relationship with neuron-related functions and autism. We conclude that these results encourage the generation of a complete map of TEs in the human genome, which will be useful for identifying new TEs involved in genetic disorders.
Affiliation(s)
- Javier Cuenca-Guardiola
- Departamento de Informática y Sistemas, Universidad de Murcia, CEIR Campus Mare Nostrum, IMIB-Pascual Parrilla, Facultad de Informática, Campus de Espinardo, Murcia 30100, Spain
| | - Belén de la Morena-Barrio
- Servicio de Hematología, Hospital Universitario Morales Meseguer, Centro Regional de Hemodonación, Universidad de Murcia, IMIB-Pascual Parrilla, CIBERER-III, Ronda de Garay S/N, Murcia 30003, Spain
| | - Esther Navarro-Manzano
- Servicio de Hematología, Hospital Universitario Morales Meseguer, Centro Regional de Hemodonación, Universidad de Murcia, IMIB-Pascual Parrilla, CIBERER-III, Ronda de Garay S/N, Murcia 30003, Spain
| | - Jonathan Stevens
- Department of Haematology, University of Cambridge, CB2 0PT, Cambridge Biomedical Campus, Cambridge, Cambridge, England, UK
- Blood and Transplant, National Health Service (NHS), CB2 0QQ, Cambridge Biomedical Campus, Cambridge, England, UK
| | - Willem H Ouwehand
- Department of Haematology, University of Cambridge, CB2 0PT, Cambridge Biomedical Campus, Cambridge, Cambridge, England, UK
- Blood and Transplant, National Health Service (NHS), CB2 0QQ, Cambridge Biomedical Campus, Cambridge, England, UK
- British Heart Foundation Cambridge Centre of Excellence, Division of Cardiovascular Medicine, Cambridge Heart and Lung Research Institute, Cambridge Biomedical Campus, Cambridge, England CB2 0AY, UK
- University College London Hospitals, NHS Foundation Trust, London, England, UK
| | - Nicholas S Gleadall
- Department of Haematology, University of Cambridge, CB2 0PT, Cambridge Biomedical Campus, Cambridge, Cambridge, England, UK
- Blood and Transplant, National Health Service (NHS), CB2 0QQ, Cambridge Biomedical Campus, Cambridge, England, UK
| | - Javier Corral
- Servicio de Hematología, Hospital Universitario Morales Meseguer, Centro Regional de Hemodonación, Universidad de Murcia, IMIB-Pascual Parrilla, CIBERER-III, Ronda de Garay S/N, Murcia 30003, Spain
| | - Jesualdo Tomás Fernández-Breis
- Departamento de Informática y Sistemas, Universidad de Murcia, CEIR Campus Mare Nostrum, IMIB-Pascual Parrilla, Facultad de Informática, Campus de Espinardo, Murcia 30100, Spain
40
LaFlamme CW, Rastin C, Sengupta S, Pennington HE, Russ-Hall SJ, Schneider AL, Bonkowski ES, Almanza Fuerte EP, Galey M, Goffena J, Gibson SB, Allan TJ, Nyaga DM, Lieffering N, Hebbar M, Walker EV, Darnell D, Olsen SR, Kolekar P, Djekidel N, Rosikiewicz W, McConkey H, Kerkhof J, Levy MA, Relator R, Lev D, Lerman-Sagie T, Park KL, Alders M, Cappuccio G, Chatron N, Demain L, Genevieve D, Lesca G, Roscioli T, Sanlaville D, Tedder ML, Hubshman MW, Ketkar S, Dai H, Worley KC, Rosenfeld JA, Chao HT, Neale G, Carvill GL, Wang Z, Berkovic SF, Sadleir LG, Miller DE, Scheffer IE, Sadikovic B, Mefford HC. Diagnostic Utility of Genome-wide DNA Methylation Analysis in Genetically Unsolved Developmental and Epileptic Encephalopathies and Refinement of a CHD2 Episignature. medRxiv 2023:2023.10.11.23296741. [PMID: 37873138 PMCID: PMC10592992 DOI: 10.1101/2023.10.11.23296741] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/25/2023]
Abstract
Sequence-based genetic testing currently identifies causative genetic variants in ∼50% of individuals with developmental and epileptic encephalopathies (DEEs). Aberrant changes in DNA methylation are implicated in various neurodevelopmental disorders but remain unstudied in DEEs. Rare epigenetic variations ("epivariants") can drive disease by modulating gene expression at single loci, whereas genome-wide DNA methylation changes can result in distinct "episignature" biomarkers for monogenic disorders in a growing number of rare diseases. Here, we interrogate the diagnostic utility of genome-wide DNA methylation array analysis on peripheral blood samples from 516 individuals with genetically unsolved DEEs who had previously undergone extensive genetic testing. We identified rare differentially methylated regions (DMRs) and explanatory episignatures to discover causative and candidate genetic etiologies in 10 individuals. We then used long-read sequencing to identify DNA variants underlying rare DMRs, including one balanced translocation, three CG-rich repeat expansions, and two copy number variants. We also identify pathogenic sequence variants associated with episignatures; some had been missed by previous exome sequencing. Although most DEE genes lack known episignatures, the increase in diagnostic yield for DNA methylation analysis in DEEs is comparable to the added yield of genome sequencing. Finally, we refine an episignature for CHD2 using an 850K methylation array, and refine it further at higher CpG resolution using bisulfite sequencing to investigate CHD2 pathophysiology. Our study shows that genome-wide DNA methylation analysis identifies causal and candidate genetic etiologies in ∼2% (10/516) of unsolved DEE cases.
41
Hwang HY, Wang J. Effect of recombination on genetic diversity of Caenorhabditis elegans. Sci Rep 2023; 13:16425. [PMID: 37777524 PMCID: PMC10542817 DOI: 10.1038/s41598-023-42600-5] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/20/2022] [Accepted: 09/12/2023] [Indexed: 10/02/2023]
Abstract
Greater molecular divergence and genetic diversity are present in regions of high recombination in many species. Studies describing the correlation between variant abundance and recombination rate have long focused on recombination in the context of linked selection models, whereby interference between linked sites under positive or negative selection reduces genetic diversity in regions of low recombination. Here, we show that indels, especially those of intermediate sizes, are enriched relative to single nucleotide polymorphisms in regions of high recombination in C. elegans. To explain this phenomenon, we reintroduce an alternative model that emphasizes the mutagenic effect of recombination. To extend the analysis, we examine the variants in a phylogenetic context and discuss how different models could be examined together. The number of variants generated by recombination in natural populations could be substantial, possibly including the majority of some indel subtypes. Our work highlights the potential importance of a mutagenic effect of recombination, which could have a significant role in shaping natural genetic diversity.
Affiliation(s)
- Ho-Yon Hwang
- Department of Biochemistry and Molecular Biology, Bloomberg School of Public Health, Department of Neuroscience, School of Medicine, Johns Hopkins University, Baltimore, MD, 21205, USA.
| | - Jiou Wang
- Department of Biochemistry and Molecular Biology, Bloomberg School of Public Health, Department of Neuroscience, School of Medicine, Johns Hopkins University, Baltimore, MD, 21205, USA.
42
Jiang T, Liu S, Guo H. Reply: Correspondence on NanoVar's performance outlined by Jiang T. et al. in 'Long-read sequencing settings for efficient structural variation detection based on comprehensive evaluation'. BMC Bioinformatics 2023; 24:352. [PMID: 37730581 PMCID: PMC10510213 DOI: 10.1186/s12859-023-05483-x] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/14/2023] [Accepted: 09/13/2023] [Indexed: 09/22/2023]
Abstract
We published a paper in BMC Bioinformatics comprehensively evaluating the performance of structural variation (SV) calling with long-read SV detection methods based on simulated error-prone long-read data under various sequencing settings. Recently, C.Y.T. et al. wrote a correspondence claiming that the performance of NanoVar was underestimated in our benchmarking and listed some errors in our previous manuscripts. To clarify these matters, we reproduced our previous benchmarking results and carried out a series of parallel experiments on both the newly generated simulated datasets and the ones provided by C.Y.T. et al. The robust benchmark results indicate that NanoVar has unstable performance on simulated data produced from different versions of VISOR, while other tools do not exhibit this phenomenon. Furthermore, the errors reported by C.Y.T. et al. arose from their use of other versions of VISOR and Sniffles, which differ considerably in usage and results from the versions applied in our previous work. We hope that this commentary demonstrates the validity of our previous publication and clears up the misunderstanding about the commands and results in our benchmarking. We also welcome further attention from experts and scholars in the scientific community, and their help in improving this work.
Affiliation(s)
- Tao Jiang
- Faculty of Computing, Harbin Institute of Technology, Harbin, 150001, China
| | - Shiqi Liu
- Faculty of Computing, Harbin Institute of Technology, Harbin, 150001, China
| | - Hongzhe Guo
- Faculty of Computing, Harbin Institute of Technology, Harbin, 150001, China.
43
Xia X, Zhang F, Li S, Luo X, Peng L, Dong Z, Pausch H, Leonard AS, Crysnanto D, Wang S, Tong B, Lenstra JA, Han J, Li F, Xu T, Gu L, Jin L, Dang R, Huang Y, Lan X, Ren G, Wang Y, Gao Y, Ma Z, Cheng H, Ma Y, Chen H, Pang W, Lei C, Chen N. Structural variation and introgression from wild populations in East Asian cattle genomes confer adaptation to local environment. Genome Biol 2023; 24:211. [PMID: 37723525 PMCID: PMC10507960 DOI: 10.1186/s13059-023-03052-2] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 07/21/2022] [Accepted: 09/07/2023] [Indexed: 09/20/2023]
Abstract
BACKGROUND: Structural variations (SVs) in individual genomes are major determinants of complex traits, including adaptability to environmental variables. The Mongolian and Hainan cattle breeds in East Asia are of taurine and indicine origin and have evolved to adapt to cold and hot environments, respectively. However, few studies have investigated SVs in East Asian cattle genomes and their roles in environmental adaptation, and little is known about adaptively introgressed SVs in East Asian cattle. RESULTS: In this study, we examine the roles of SVs in the climate adaptation of these two cattle lineages by generating highly contiguous chromosome-scale genome assemblies. Comparison of the two assemblies along with 18 Mongolian and Hainan cattle genomes obtained from long-read sequencing data provides a catalog of 123,898 nonredundant SVs. Several SVs detected from long reads are in exons of genes associated with epidermal differentiation, skin barrier, and bovine tuberculosis resistance. Functional investigations show that a 108-bp exonic insertion in SPN may affect the uptake of Mycobacterium tuberculosis by macrophages, which might contribute to the low susceptibility of Hainan cattle to bovine tuberculosis. Genotyping of 373 whole genomes from 39 breeds identifies 2610 SVs that are differentiated along a "north-south" gradient in China and overlap with 862 related genes that are enriched in pathways related to environmental adaptation. We identify 1457 Chinese indicine-stratified SVs that possibly originate from banteng and are frequent in Chinese indicine cattle. CONCLUSIONS: Our findings highlight the unique contribution of SVs in East Asian cattle to environmental adaptation and disease resistance.
Affiliation(s)
- Xiaoting Xia
- Key Laboratory of Animal Genetics, Breeding and Reproduction of Shaanxi Province, College of Animal Science and Technology, Northwest A&F University, Yangling, Xianyang, China
| | - Fengwei Zhang
- Key Laboratory of Animal Genetics, Breeding and Reproduction of Shaanxi Province, College of Animal Science and Technology, Northwest A&F University, Yangling, Xianyang, China
| | - Shuang Li
- Key Laboratory of Animal Genetics, Breeding and Reproduction of Shaanxi Province, College of Animal Science and Technology, Northwest A&F University, Yangling, Xianyang, China
| | - Xiaoyu Luo
- Key Laboratory of Animal Genetics, Breeding and Reproduction of Shaanxi Province, College of Animal Science and Technology, Northwest A&F University, Yangling, Xianyang, China
| | - Lixin Peng
- National Engineering Research Center for Non-Food Biorefinery, Guangxi Academy of Sciences, 98 Daling Road, Nanning, China
| | - Zheng Dong
- Key Laboratory of Animal Genetics, Breeding and Reproduction of Shaanxi Province, College of Animal Science and Technology, Northwest A&F University, Yangling, Xianyang, China
| | - Hubert Pausch
- Animal Genomics, ETH Zurich, Universitaetstrasse 2, 8006, Zurich, Switzerland
| | - Alexander S Leonard
- Animal Genomics, ETH Zurich, Universitaetstrasse 2, 8006, Zurich, Switzerland
| | - Danang Crysnanto
- Animal Genomics, ETH Zurich, Universitaetstrasse 2, 8006, Zurich, Switzerland
| | - Shikang Wang
- Key Laboratory of Animal Genetics, Breeding and Reproduction of Shaanxi Province, College of Animal Science and Technology, Northwest A&F University, Yangling, Xianyang, China
| | - Bin Tong
- The State Key Laboratory of Reproductive Regulation and Breeding of Grassland Livestock, School of Life Sciences, Inner Mongolia University, Hohhot, China
| | - Johannes A Lenstra
- Faculty of Veterinary Medicine, Utrecht University, Utrecht, The Netherlands
| | - Jianlin Han
- Livestock Genetics Program, International Livestock Research Institute (ILRI), Nairobi, Kenya
- CAAS-ILRI Joint Laboratory On Livestock and Forage Genetic Resources, Institute of Animal Science, Chinese Academy of Agriculture Sciences (CAAS), Beijing, China
| | - Fuyong Li
- Department of Infectious Diseases and Public Health, Jockey Club College of Veterinary Medicine and Life Sciences, City University of Hong Kong, Kowloon, Hong Kong SAR, China
| | - Tieshan Xu
- Tropical Crops Genetic Resources Institute, Chinese Academy of Tropical Agricultural Sciences, Haikou, China
| | - Lihong Gu
- Institute of Animal Science & Veterinary Medicine, Hainan Academy of Agricultural Sciences, Haikou, China
| | - Liangliang Jin
- Key Laboratory of Animal Genetics, Breeding and Reproduction of Shaanxi Province, College of Animal Science and Technology, Northwest A&F University, Yangling, Xianyang, China
| | - Ruihua Dang
- Key Laboratory of Animal Genetics, Breeding and Reproduction of Shaanxi Province, College of Animal Science and Technology, Northwest A&F University, Yangling, Xianyang, China
| | - Yongzhen Huang
- Key Laboratory of Animal Genetics, Breeding and Reproduction of Shaanxi Province, College of Animal Science and Technology, Northwest A&F University, Yangling, Xianyang, China
| | - Xianyong Lan
- Key Laboratory of Animal Genetics, Breeding and Reproduction of Shaanxi Province, College of Animal Science and Technology, Northwest A&F University, Yangling, Xianyang, China
| | - Gang Ren
- Key Laboratory of Animal Genetics, Breeding and Reproduction of Shaanxi Province, College of Animal Science and Technology, Northwest A&F University, Yangling, Xianyang, China
| | - Yu Wang
- Key Laboratory of Animal Genetics, Breeding and Reproduction of Shaanxi Province, College of Animal Science and Technology, Northwest A&F University, Yangling, Xianyang, China
| | - Yuanpeng Gao
- College of Veterinary Medicine, Northwest A&F University, Xianyang, Yangling, China
| | - Zhijie Ma
- Qinghai Academy of Animal Science and Veterinary Medicine, Qinghai University, Xining, China
| | - Haijian Cheng
- Key Laboratory of Animal Genetics, Breeding and Reproduction of Shaanxi Province, College of Animal Science and Technology, Northwest A&F University, Yangling, Xianyang, China
- Institute of Animal Science and Veterinary Medicine, Shandong Academy of Agricultural Sciences, Shandong Key Lab of Animal Disease Control and Breeding, Jinan, China
| | - Yun Ma
- Key Laboratory of Ruminant Molecular and Cellular Breeding of Ningxia Hui Autonomous Region, School of Agriculture, Ningxia University, Yinchuan, China
| | - Hong Chen
- Key Laboratory of Animal Genetics, Breeding and Reproduction of Shaanxi Province, College of Animal Science and Technology, Northwest A&F University, Yangling, Xianyang, China
| | - Weijun Pang
- Key Laboratory of Animal Genetics, Breeding and Reproduction of Shaanxi Province, College of Animal Science and Technology, Northwest A&F University, Yangling, Xianyang, China.
| | - Chuzhao Lei
- Key Laboratory of Animal Genetics, Breeding and Reproduction of Shaanxi Province, College of Animal Science and Technology, Northwest A&F University, Yangling, Xianyang, China.
| | - Ningbo Chen
- Key Laboratory of Animal Genetics, Breeding and Reproduction of Shaanxi Province, College of Animal Science and Technology, Northwest A&F University, Yangling, Xianyang, China.
44
Shiraishi Y, Koya J, Chiba K, Okada A, Arai Y, Saito Y, Shibata T, Kataoka K. Precise characterization of somatic complex structural variations from tumor/control paired long-read sequencing data with nanomonsv. Nucleic Acids Res 2023; 51:e74. [PMID: 37336583 PMCID: PMC10415145 DOI: 10.1093/nar/gkad526] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Received: 02/14/2023] [Revised: 05/23/2023] [Accepted: 06/07/2023] [Indexed: 06/21/2023]
Abstract
We present our novel software, nanomonsv, for detecting somatic structural variations (SVs) at single-base resolution using tumor and matched control long-read sequencing data. The current version of nanomonsv includes two detection modules: a Canonical SV module and a Single breakend SV module. Using tumor/control paired long-read sequencing data from three cancer cell lines and their matched lymphoblastoid cell lines, we demonstrate that the Canonical SV module can identify somatic SVs that can be captured by short-read technologies with higher precision and recall than existing methods. In addition, we have developed a workflow to classify mobile element insertions while elucidating their in-depth properties, such as 5' truncations, internal inversions, and source sites for 3' transductions. Furthermore, the Single breakend SV module enables the detection of complex SVs that can only be identified by long reads, such as SVs involving highly repetitive centromeric sequences, and LINE1- and virus-mediated rearrangements. In summary, our approaches applied to cancer long-read sequencing data can reveal various features of somatic SVs and will lead to a better understanding of mutational processes and functional consequences of somatic SVs.
Affiliation(s)
- Yuichi Shiraishi
- Division of Genome Analysis Platform Development, National Cancer Center Research Institute, Tokyo, Japan
| | - Junji Koya
- Division of Molecular Oncology, National Cancer Center Research Institute, Tokyo, Japan
| | - Kenichi Chiba
- Division of Genome Analysis Platform Development, National Cancer Center Research Institute, Tokyo, Japan
| | - Ai Okada
- Division of Genome Analysis Platform Development, National Cancer Center Research Institute, Tokyo, Japan
| | - Yasuhito Arai
- Division of Cancer Genomics, National Cancer Center Research Institute, Tokyo, Japan
| | - Yuki Saito
- Division of Molecular Oncology, National Cancer Center Research Institute, Tokyo, Japan
- Department of Gastroenterology, Keio University School of Medicine, Tokyo, Japan
| | - Tatsuhiro Shibata
- Division of Cancer Genomics, National Cancer Center Research Institute, Tokyo, Japan
- Laboratory of Molecular Medicine, The Institute of Medical Science, The University of Tokyo, Tokyo, Japan
| | - Keisuke Kataoka
- Division of Molecular Oncology, National Cancer Center Research Institute, Tokyo, Japan
- Department of Hematology, Keio University School of Medicine, Tokyo, Japan
45
Dai X, Bian P, Hu D, Luo F, Huang Y, Jiao S, Wang X, Gong M, Li R, Cai Y, Wen J, Yang Q, Deng W, Nanaei HA, Wang Y, Wang F, Zhang Z, Rosen BD, Heller R, Jiang Y. A Chinese indicine pangenome reveals a wealth of novel structural variants introgressed from other Bos species. Genome Res 2023; 33:1284-1298. [PMID: 37714713 PMCID: PMC10547261 DOI: 10.1101/gr.277481.122] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/08/2022] [Accepted: 06/30/2023] [Indexed: 09/17/2023]
Abstract
Chinese indicine cattle harbor a much higher genetic diversity compared with other domestic cattle, but their genome architecture remains uninvestigated. Using PacBio HiFi sequencing data from 10 Chinese indicine cattle across southern China, we assembled 20 high-quality partially phased genomes and integrated them into a multiassembly graph containing 148.5 Mb (5.6%) of novel sequence. We identified 156,009 high-confidence nonredundant structural variants (SVs) and 206 SV hotspots spanning ∼195 Mb of gene-rich sequence. We detected 34,249 archaic introgressed fragments in Chinese indicine cattle covering 1.93 Gb (73.3%) of the genome. We inferred an average of 3.8%, 3.2%, 1.4%, and 0.5% of introgressed sequence originating, respectively, from banteng-like, kouprey-like, gayal-like, and gaur-like Bos species, as well as 0.6% of unknown origin. Introgression from multiple donors might have contributed to the genetic diversity of Chinese indicine cattle. Altogether, this study highlights the contribution of interspecies introgression to the genomic architecture of an important livestock population and shows how exotic genomic elements can contribute to the genetic variation available for selection.
Affiliation(s)
- Xuelei Dai
- Key Laboratory of Animal Genetics, Breeding and Reproduction of Shaanxi Province, College of Animal Science and Technology, Northwest A&F University, Yangling, Shaanxi 712100, China
| | - Peipei Bian
- Key Laboratory of Animal Genetics, Breeding and Reproduction of Shaanxi Province, College of Animal Science and Technology, Northwest A&F University, Yangling, Shaanxi 712100, China
| | - Dexiang Hu
- Key Laboratory of Animal Genetics, Breeding and Reproduction of Shaanxi Province, College of Animal Science and Technology, Northwest A&F University, Yangling, Shaanxi 712100, China
| | - Funong Luo
- Key Laboratory of Animal Genetics, Breeding and Reproduction of Shaanxi Province, College of Animal Science and Technology, Northwest A&F University, Yangling, Shaanxi 712100, China
| | - Yongzhen Huang
- Key Laboratory of Animal Genetics, Breeding and Reproduction of Shaanxi Province, College of Animal Science and Technology, Northwest A&F University, Yangling, Shaanxi 712100, China
| | - Shaohua Jiao
- Key Laboratory of Animal Genetics, Breeding and Reproduction of Shaanxi Province, College of Animal Science and Technology, Northwest A&F University, Yangling, Shaanxi 712100, China
| | - Xihong Wang
- Key Laboratory of Animal Genetics, Breeding and Reproduction of Shaanxi Province, College of Animal Science and Technology, Northwest A&F University, Yangling, Shaanxi 712100, China
| | - Mian Gong
- Key Laboratory of Animal Genetics, Breeding and Reproduction of Shaanxi Province, College of Animal Science and Technology, Northwest A&F University, Yangling, Shaanxi 712100, China
| | - Ran Li
- Key Laboratory of Animal Genetics, Breeding and Reproduction of Shaanxi Province, College of Animal Science and Technology, Northwest A&F University, Yangling, Shaanxi 712100, China
| | - Yudong Cai
- Key Laboratory of Animal Genetics, Breeding and Reproduction of Shaanxi Province, College of Animal Science and Technology, Northwest A&F University, Yangling, Shaanxi 712100, China
| | - Jiayue Wen
- Key Laboratory of Animal Genetics, Breeding and Reproduction of Shaanxi Province, College of Animal Science and Technology, Northwest A&F University, Yangling, Shaanxi 712100, China
| | - Qimeng Yang
- Key Laboratory of Animal Genetics, Breeding and Reproduction of Shaanxi Province, College of Animal Science and Technology, Northwest A&F University, Yangling, Shaanxi 712100, China
| | - Weidong Deng
- Faculty of Animal Science and Technology, Yunnan Agricultural University, Kunming 650201, China
| | - Hojjat Asadollahpour Nanaei
- Key Laboratory of Animal Genetics, Breeding and Reproduction of Shaanxi Province, College of Animal Science and Technology, Northwest A&F University, Yangling, Shaanxi 712100, China
- Reproductive Biotechnology Research Center, Avicenna Research Institute, ACECR, Tehran 1983969412, Iran
| | - Yu Wang
- Key Laboratory of Animal Genetics, Breeding and Reproduction of Shaanxi Province, College of Animal Science and Technology, Northwest A&F University, Yangling, Shaanxi 712100, China
| | - Fei Wang
- Key Laboratory of Animal Genetics, Breeding and Reproduction of Shaanxi Province, College of Animal Science and Technology, Northwest A&F University, Yangling, Shaanxi 712100, China
| | - Zijing Zhang
- Institute of Animal Husbandry and Veterinary Science, Henan Academy of Agricultural Sciences, Zhengzhou 450002, China
| | - Benjamin D Rosen
- Animal Genomics and Improvement Laboratory, USDA-ARS, Beltsville, Maryland 20705, USA
| | - Rasmus Heller
- Department of Biology, University of Copenhagen, 2200 Copenhagen, Denmark
| | - Yu Jiang
- Key Laboratory of Animal Genetics, Breeding and Reproduction of Shaanxi Province, College of Animal Science and Technology, Northwest A&F University, Yangling, Shaanxi 712100, China
- Key Laboratory of Livestock Biology, Northwest A&F University, Yangling, Shaanxi 712100, China
|
46
|
Ahsan MU, Liu Q, Perdomo JE, Fang L, Wang K. A survey of algorithms for the detection of genomic structural variants from long-read sequencing data. Nat Methods 2023; 20:1143-1158. [PMID: 37386186 PMCID: PMC11208083 DOI: 10.1038/s41592-023-01932-w] [Citation(s) in RCA: 7] [Impact Index Per Article: 7.0] [Received: 07/16/2021] [Accepted: 05/31/2023] [Indexed: 07/01/2023]
Abstract
As long-read sequencing technologies are becoming increasingly popular, a number of methods have been developed for the discovery and analysis of structural variants (SVs) from long reads. Long reads enable detection of SVs that could not be previously detected from short-read sequencing, but computational methods must adapt to the unique challenges and opportunities presented by long-read sequencing. Here, we summarize over 50 long-read-based methods for SV detection, genotyping and visualization, and discuss how new telomere-to-telomere genome assemblies and pangenome efforts can improve the accuracy and drive the development of SV callers in the future.
Affiliation(s)
- Mian Umair Ahsan
- Raymond G. Perelman Center for Cellular and Molecular Therapeutics, Children's Hospital of Philadelphia, Philadelphia, PA, USA
| | - Qian Liu
- Raymond G. Perelman Center for Cellular and Molecular Therapeutics, Children's Hospital of Philadelphia, Philadelphia, PA, USA
| | - Jonathan Elliot Perdomo
- Raymond G. Perelman Center for Cellular and Molecular Therapeutics, Children's Hospital of Philadelphia, Philadelphia, PA, USA
- School of Biomedical Engineering, Drexel University, Philadelphia, PA, USA
| | - Li Fang
- Raymond G. Perelman Center for Cellular and Molecular Therapeutics, Children's Hospital of Philadelphia, Philadelphia, PA, USA
- Department of Genetics and Biomedical Informatics, Zhongshan School of Medicine, Sun Yat-sen University, Guangzhou, China
| | - Kai Wang
- Raymond G. Perelman Center for Cellular and Molecular Therapeutics, Children's Hospital of Philadelphia, Philadelphia, PA, USA
- Department of Pathology and Laboratory Medicine, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, USA
|
47
|
Zwaig M, Johnston MJ, Lee JJ, Farooq H, Gallo M, Jabado N, Taylor MD, Ragoussis J. Linked-read based analysis of the medulloblastoma genome. Front Oncol 2023; 13:1221611. [PMID: 37576901 PMCID: PMC10419201 DOI: 10.3389/fonc.2023.1221611] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/12/2023] [Accepted: 07/06/2023] [Indexed: 08/15/2023] Open
Abstract
Introduction: Medulloblastoma is the most common type of malignant pediatric brain tumor, with group 4 medulloblastomas (G4 MBs) accounting for 40% of cases. However, the molecular mechanisms that underlie this subgroup are still poorly understood. Point mutations are detected in a large number of genes at low incidence per gene, while the detection of complex structural variants in recurrently affected genes typically requires the application of long-read technologies. Methods: Here, we applied linked-read sequencing, which combines the long-range genome information of long-read sequencing with the high base-pair accuracy of short-read sequencing and very low sample input requirements. Results: We demonstrate the detection of complex structural variants and point mutations in these tumors, and, for the first time, the detection of extrachromosomal DNA (ecDNA) with linked-reads. We provide further evidence for the high heterogeneity of somatic mutations in G4 MBs and add new complex events associated with it. Discussion: We detected several enhancer-hijacking events, an ecDNA containing the MYCN gene, and rare structural rearrangements, such as chromothripsis in a G4 medulloblastoma, chromoplexy involving 8 different chromosomes, a TERT gene rearrangement, and a PRDM6 duplication.
Affiliation(s)
- Melissa Zwaig
- Victor Phillip Dahdaleh Institute of Genomic Medicine and Department of Human Genetics, McGill University, Montreal, QC, Canada
| | - Michael J. Johnston
- Alberta Children’s Hospital Research Institute, Arnie Charbonneau Cancer Institute, and Department of Biochemistry and Molecular Biology, Cumming School of Medicine, University of Calgary, Calgary, AB, Canada
| | - John J.Y. Lee
- Department of Pathology and Center for Cancer Research, Massachusetts General Hospital and Harvard Medical School, Boston, MA, United States
- Broad Institute of Harvard and Massachusetts Institute of Technology (MIT), Cambridge, MA, United States
| | | | - Marco Gallo
- Alberta Children’s Hospital Research Institute, Arnie Charbonneau Cancer Institute, and Department of Biochemistry and Molecular Biology, Cumming School of Medicine, University of Calgary, Calgary, AB, Canada
| | - Nada Jabado
- Department of Human Genetics, McGill University, Montreal, QC, Canada
- The Research Institute of the McGill University Health Centre, Montreal, QC, Canada
- Department of Pediatrics, McGill University, Montreal, QC, Canada
| | - Michael D. Taylor
- Division of Neurosurgery, The Arthur and Sonia Labatt Brain Tumour Research Centre and the Developmental and Stem Cell Biology Program, The Hospital for Sick Children, Toronto, ON, Canada
- Texas Children’s Cancer Center, Hematology-Oncology Section and Department of Pediatrics – Hematology/Oncology and Neurosurgery, Baylor College of Medicine, Houston, TX, United States
| | - Jiannis Ragoussis
- Victor Phillip Dahdaleh Institute of Genomic Medicine and Department of Human Genetics, McGill University, Montreal, QC, Canada
|
48
|
Schmidt M, Kutzner A. MSV: a modular structural variant caller that reveals nested and complex rearrangements by unifying breakends inferred directly from reads. Genome Biol 2023; 24:170. [PMID: 37461107 DOI: 10.1186/s13059-023-03009-5] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/27/2021] [Accepted: 07/06/2023] [Indexed: 07/20/2023] Open
Abstract
Structural variant (SV) calling belongs to the standard tools of modern bioinformatics for identifying and describing alterations in genomes. Initially, this work presents several complex genomic rearrangements that reveal conceptual ambiguities inherent to the representation via basic SV. We contextualize these ambiguities theoretically as well as practically and propose a graph-based approach for resolving them. For various yeast genomes, we practically compute adjacency matrices of our graph model and demonstrate that they provide highly accurate descriptions of one genome in terms of another. An open-source prototype implementation of our approach is available under the MIT license at https://github.com/ITBE-Lab/MA.
Affiliation(s)
- Markus Schmidt
- Biomedical Center Munich, Department of Physiological Chemistry, Ludwig-Maximilians-Universität, Großhaderner Str. 9, 82152, Planegg-Martinsried, Germany
| | - Arne Kutzner
- Department of Information Systems, College of Engineering, Hanyang University, 222 Wangsimni-Ro, Seongdong-Gu, Seoul, 133-791, Republic of Korea
|
49
|
Benson CW, Sheltra MR, Maughan PJ, Jellen EN, Robbins MD, Bushman BS, Patterson EL, Hall ND, Huff DR. Homoeologous evolution of the allotetraploid genome of Poa annua L. BMC Genomics 2023; 24:350. [PMID: 37365554 DOI: 10.1186/s12864-023-09456-5] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Received: 03/23/2023] [Accepted: 06/15/2023] [Indexed: 06/28/2023] Open
Abstract
BACKGROUND: Poa annua (annual bluegrass) is an allotetraploid turfgrass, an agronomically significant weed, and one of the most widely dispersed plant species on earth. Here, we report the chromosome-scale genome assemblies of P. annua's diploid progenitors, P. infirma and P. supina, and use multi-omic analyses spanning all three species to better understand P. annua's evolutionary novelty. RESULTS: We find that the diploids diverged from their common ancestor 5.5–6.3 million years ago and hybridized to form P. annua ≤ 50,000 years ago. The diploid genomes are similar in chromosome structure and most notably distinguished by the divergent evolutionary histories of their transposable elements, leading to a 1.7× difference in genome size. In allotetraploid P. annua, we find biased movement of retrotransposons from the larger (A) subgenome to the smaller (B) subgenome. We show that P. annua's B subgenome is preferentially accumulating genes and that its genes are more highly expressed. Whole-genome resequencing of several additional P. annua accessions revealed large-scale chromosomal rearrangements characterized by extensive TE-downsizing and evidence to support the Genome Balance Hypothesis. CONCLUSIONS: The divergent evolutions of the diploid progenitors played a central role in conferring onto P. annua its remarkable phenotypic plasticity. We find that plant genes (guided by selection and drift) and transposable elements (mostly guided by host immunity) each respond to polyploidy in unique ways and that P. annua uses whole-genome duplication to purge highly parasitized heterochromatic sequences. The findings and genomic resources presented here will enable the development of homoeolog-specific markers for accelerated weed science and turfgrass breeding.
Affiliation(s)
- Christopher W Benson
- Department of Plant Science, Pennsylvania State University, University Park, PA, USA
- Intercollegiate Graduate Degree Program in Plant Biology, Pennsylvania State University, University Park, PA, USA
| | - Matthew R Sheltra
- Department of Plant Science, Pennsylvania State University, University Park, PA, USA
- Intercollegiate Graduate Degree Program in Plant Biology, Pennsylvania State University, University Park, PA, USA
| | - Peter J Maughan
- Department of Plant and Wildlife Sciences, Brigham Young University, Logan, UT, USA
| | - Eric N Jellen
- Department of Plant and Wildlife Sciences, Brigham Young University, Logan, UT, USA
| | | | | | - Eric L Patterson
- Department of Plant, Soil, and Microbial Sciences, Michigan State University, East Lansing, MI, USA
| | - Nathan D Hall
- Department of Plant, Soil, and Microbial Sciences, Michigan State University, East Lansing, MI, USA
| | - David R Huff
- Department of Plant Science, Pennsylvania State University, University Park, PA, USA
|
50
|
Kosugi S, Kamatani Y, Harada K, Tomizuka K, Momozawa Y, Morisaki T, Terao C. Detection of trait-associated structural variations using short-read sequencing. Cell Genomics 2023; 3:100328. [PMID: 37388916 PMCID: PMC10300613 DOI: 10.1016/j.xgen.2023.100328] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/19/2022] [Revised: 02/17/2023] [Accepted: 04/25/2023] [Indexed: 07/01/2023]
Abstract
Genomic structural variation (SV) affects genetic and phenotypic characteristics in diverse organisms, but the lack of reliable methods to detect SV has hindered genetic analysis. We developed a computational algorithm (MOPline) that includes missing call recovery combined with high-confidence SV call selection and genotyping using short-read whole-genome sequencing (WGS) data. Using 3,672 high-coverage WGS datasets, MOPline stably detected ∼16,000 SVs per individual, which is ∼1.7–3.3-fold higher than in previous large-scale projects, while exhibiting a comparable level of statistical quality metrics. We imputed SVs from 181,622 Japanese individuals for 42 diseases and 60 quantitative traits. A genome-wide association study with the imputed SVs revealed 41 top-ranked or nearly top-ranked genome-wide significant SVs, including 8 exonic SVs with 5 novel associations and enriched mobile element insertions. This study demonstrates that short-read WGS data can be used to identify rare and common SVs associated with a variety of traits.
Affiliation(s)
- Shunichi Kosugi
- Laboratory for Statistical and Translational Genetics, RIKEN Center for Integrative Medical Sciences, 1-7-22 Suehiro-cho, Tsurumi-ku, Yokohama, Kanagawa 230-0045, Japan
- Clinical Research Center, Shizuoka General Hospital, Shizuoka, Japan
| | - Yoichiro Kamatani
- Department of Computational Biology and Medical Sciences, Graduate School of Frontier Sciences, The University of Tokyo, 5-1-5, Kashiwanoha, Kashiwa-shi, Chiba 277-8562, Japan
| | - Katsutoshi Harada
- Laboratory for Statistical and Translational Genetics, RIKEN Center for Integrative Medical Sciences, 1-7-22 Suehiro-cho, Tsurumi-ku, Yokohama, Kanagawa 230-0045, Japan
| | - Kohei Tomizuka
- Laboratory for Statistical and Translational Genetics, RIKEN Center for Integrative Medical Sciences, 1-7-22 Suehiro-cho, Tsurumi-ku, Yokohama, Kanagawa 230-0045, Japan
| | - Yukihide Momozawa
- Laboratory for Genotyping Development, RIKEN Center for Integrative Medical Sciences, 1-7-22 Suehiro-cho, Tsurumi-ku, Yokohama City, Kanagawa 230-0045, Japan
| | - Takayuki Morisaki
- Division of Molecular Pathology, Institute of Medical Science, The University of Tokyo, 4-6-1, Shirokane-dai, Minato-ku, Tokyo 108-8639, Japan
| | | | - Chikashi Terao
- Laboratory for Statistical and Translational Genetics, RIKEN Center for Integrative Medical Sciences, 1-7-22 Suehiro-cho, Tsurumi-ku, Yokohama, Kanagawa 230-0045, Japan
- Clinical Research Center, Shizuoka General Hospital, Shizuoka, Japan
- The Department of Applied Genetics, The School of Pharmaceutical Sciences, University of Shizuoka, Shizuoka, Japan
|