1. Wan RD, Gao X, Wang GW, Wu SX, Yang QL, Zhang YW, Yang QE. Identification of Candidate Genes Related to Hybrid Sterility by Genomic Structural Variation and Transcriptome Analyses in Cattle-yak. J Dairy Sci 2024:S0022-0302(24)01212-8. [PMID: 39414017 DOI: 10.3168/jds.2024-24770] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/08/2024] [Accepted: 09/24/2024] [Indexed: 10/18/2024]
Abstract
Hybrids between closely related but genetically incompatible species are often inviable or sterile. Cattle-yak, an interspecific hybrid of yak and cattle, exhibits male-specific sterility, which limits the fixation of its desired traits and prevents genetic improvement in yak through crossbreeding. Transcriptome profiles of testicular tissues have been generated in cattle, yak, and cattle-yak; however, the genetic variations underlying differential gene expression associated with hybrid sterility have yet to be elucidated. We detected differences in the cellular composition and gene expression of testes from yak and cattle-yak at 3 mo of age, 10 mo of age and adulthood. Histological analysis revealed that the most advanced germ cells were gonocytes (prospermatogonia) at 3 mo and spermatocytes at 10 mo. Complete spermatogenesis occurred in the seminiferous tubules of adult yak, whereas only spermatogonia and a limited number of spermatocytes were detected in the testis of adult cattle-yak. Transcriptome analysis revealed 180, 6310, and 6112 differentially expressed genes (DEGs) in yak and cattle-yak at each stage, respectively. Next, we examined the spermatogenic cell types in the backcross generation (BC1) and detected the appearance of round spermatids, indicating the partial recovery of spermatogenesis in these animals. Compared with those in cattle-yak, 272 DEGs were identified in the testes of BC1 animals. Notably, we discovered that the expression of X chromosome-linked (X-linked) genes was upregulated in the testis of cattle-yak compared with yak, suggesting a possible abnormality in the process of meiotic sex chromosome inactivation (MSCI) in hybrid animals. We next screened DEGs harboring structural variations (SVs) and identified a list of SV genes associated with spermatogonial development, meiotic recombination, and double-strand break (DSB) repair. 
Furthermore, we found that the SV genes ESCO2 (establishment of sister chromatid cohesion N-acetyltransferase 2) and BRDT (bromodomain testis associated) may be involved in meiotic arrest of cattle-yak spermatocytes. Overall, our research provides a valuable database for identifying structural variant loci that contribute to hybrid sterility.
Affiliation(s)
- Rui-Dong Wan
- Key Laboratory of Adaptation and Evolution of Plateau Biota, Northwest Institute of Plateau Biology, Chinese Academy of Sciences, Xining, Qinghai, 810001, China; University of Chinese Academy of Sciences, Beijing, 100049, China; Qinghai Key Laboratory of Animal Ecological Genomics, Northwest Institute of Plateau Biology, Chinese Academy of Sciences, Xining 810001, Qinghai, China
- Xue Gao
- Key Laboratory of Adaptation and Evolution of Plateau Biota, Northwest Institute of Plateau Biology, Chinese Academy of Sciences, Xining, Qinghai, 810001, China; University of Chinese Academy of Sciences, Beijing, 100049, China; Qinghai Key Laboratory of Animal Ecological Genomics, Northwest Institute of Plateau Biology, Chinese Academy of Sciences, Xining 810001, Qinghai, China
- Guo-Wen Wang
- Key Laboratory of Adaptation and Evolution of Plateau Biota, Northwest Institute of Plateau Biology, Chinese Academy of Sciences, Xining, Qinghai, 810001, China; University of Chinese Academy of Sciences, Beijing, 100049, China; Qinghai Key Laboratory of Animal Ecological Genomics, Northwest Institute of Plateau Biology, Chinese Academy of Sciences, Xining 810001, Qinghai, China
- Shi-Xin Wu
- Key Laboratory of Adaptation and Evolution of Plateau Biota, Northwest Institute of Plateau Biology, Chinese Academy of Sciences, Xining, Qinghai, 810001, China; University of Chinese Academy of Sciences, Beijing, 100049, China; Qinghai Key Laboratory of Animal Ecological Genomics, Northwest Institute of Plateau Biology, Chinese Academy of Sciences, Xining 810001, Qinghai, China
- Qi-Lin Yang
- Key Laboratory of Adaptation and Evolution of Plateau Biota, Northwest Institute of Plateau Biology, Chinese Academy of Sciences, Xining, Qinghai, 810001, China; University of Chinese Academy of Sciences, Beijing, 100049, China; Qinghai Key Laboratory of Animal Ecological Genomics, Northwest Institute of Plateau Biology, Chinese Academy of Sciences, Xining 810001, Qinghai, China
- Yi-Wen Zhang
- Key Laboratory of Adaptation and Evolution of Plateau Biota, Northwest Institute of Plateau Biology, Chinese Academy of Sciences, Xining, Qinghai, 810001, China; University of Chinese Academy of Sciences, Beijing, 100049, China; Qinghai Key Laboratory of Animal Ecological Genomics, Northwest Institute of Plateau Biology, Chinese Academy of Sciences, Xining 810001, Qinghai, China
- Qi-En Yang
- Key Laboratory of Adaptation and Evolution of Plateau Biota, Northwest Institute of Plateau Biology, Chinese Academy of Sciences, Xining, Qinghai, 810001, China; University of Chinese Academy of Sciences, Beijing, 100049, China; Qinghai Key Laboratory of Animal Ecological Genomics, Northwest Institute of Plateau Biology, Chinese Academy of Sciences, Xining 810001, Qinghai, China.
2. Harris L, McDonagh EM, Zhang X, Fawcett K, Foreman A, Daneck P, Sergouniotis PI, Parkinson H, Mazzarotto F, Inouye M, Hollox EJ, Birney E, Fitzgerald T. Genome-wide association testing beyond SNPs. Nat Rev Genet 2024:10.1038/s41576-024-00778-y. [PMID: 39375560 DOI: 10.1038/s41576-024-00778-y] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Accepted: 09/03/2024] [Indexed: 10/09/2024]
Abstract
Decades of genetic association testing in human cohorts have provided important insights into the genetic architecture and biological underpinnings of complex traits and diseases. However, for certain traits, genome-wide association studies (GWAS) for common SNPs are approaching signal saturation, which underscores the need to explore other types of genetic variation to understand the genetic basis of traits and diseases. Copy number variation (CNV) is an important source of heritability that is well known to functionally affect human traits. Recent technological and computational advances enable the large-scale, genome-wide evaluation of CNVs, with implications for downstream applications such as polygenic risk scoring and drug target identification. Here, we review the current state of CNV-GWAS, discuss current limitations in resource infrastructure that need to be overcome to enable the wider uptake of CNV-GWAS results, highlight emerging opportunities and suggest guidelines and standards for future GWAS for genetic variation beyond SNPs at scale.
Affiliation(s)
- Laura Harris
- European Molecular Biology Laboratory (EMBL), European Bioinformatics Institute (EBI), Wellcome Genome Campus, Hinxton, UK
- Ellen M McDonagh
- European Molecular Biology Laboratory (EMBL), European Bioinformatics Institute (EBI), Wellcome Genome Campus, Hinxton, UK
- Xiaolei Zhang
- European Molecular Biology Laboratory (EMBL), European Bioinformatics Institute (EBI), Wellcome Genome Campus, Hinxton, UK
- Katherine Fawcett
- European Molecular Biology Laboratory (EMBL), European Bioinformatics Institute (EBI), Wellcome Genome Campus, Hinxton, UK
- Department of Population Health Sciences, University of Leicester, Leicester, UK
- Amy Foreman
- European Molecular Biology Laboratory (EMBL), European Bioinformatics Institute (EBI), Wellcome Genome Campus, Hinxton, UK
- Petr Daneck
- Wellcome Sanger Institute, Wellcome Genome Campus, Hinxton, UK
- Panagiotis I Sergouniotis
- European Molecular Biology Laboratory (EMBL), European Bioinformatics Institute (EBI), Wellcome Genome Campus, Hinxton, UK
- Division of Evolution, Infection and Genomics, School of Biological Sciences, University of Manchester, Manchester, UK
- Helen Parkinson
- European Molecular Biology Laboratory (EMBL), European Bioinformatics Institute (EBI), Wellcome Genome Campus, Hinxton, UK
- Francesco Mazzarotto
- Department of Molecular and Translational Medicine, University of Brescia, Brescia, Italy
- National Heart and Lung Institute, Imperial College London, London, UK
- Michael Inouye
- British Heart Foundation Cardiovascular Epidemiology Unit, Department of Public Health and Primary Care, University of Cambridge, Cambridge, UK
- Cambridge Baker Systems Genomics Initiative, Department of Public Health and Primary Care, University of Cambridge, Cambridge, UK
- Cambridge Baker Systems Genomics Initiative, Baker Heart and Diabetes Institute, Melbourne, Australia
- Edward J Hollox
- Department of Genetics and Genome Biology, University of Leicester, Leicester, UK
- Ewan Birney
- European Molecular Biology Laboratory (EMBL), European Bioinformatics Institute (EBI), Wellcome Genome Campus, Hinxton, UK
- Tomas Fitzgerald
- European Molecular Biology Laboratory (EMBL), European Bioinformatics Institute (EBI), Wellcome Genome Campus, Hinxton, UK.
3. Kong Q, Jiang Y, Sun M, Wang Y, Zhang L, Zeng X, Wang Z, Wang Z, Liu Y, Gan Y, Liu H, Gao X, Yang X, Song X, Liu H, Shi J. Biparental graph strategy to represent and analyze hybrid plant genomes. Plant Physiol 2024; 196:1284-1297. [PMID: 38991561 PMCID: PMC11444280 DOI: 10.1093/plphys/kiae375] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/14/2024] [Revised: 05/14/2024] [Accepted: 06/20/2024] [Indexed: 07/13/2024]
Abstract
Hybrid plants are found extensively in the wild, and they often demonstrate superior performance of complex traits over their parents and other selfing plants. This phenomenon, known as heterosis, has been extensively applied in plant breeding for decades. However, the process of decoding hybrid plant genomes has seriously lagged due to the challenges associated with genome assembly and the lack of appropriate methodologies for their subsequent representation and analysis. Here, we present the assembly and analysis of 2 hybrids, an intraspecific hybrid between 2 maize (Zea mays ssp. mays) inbred lines and an interspecific hybrid between maize and its wild relative teosinte (Z. mays ssp. parviglumis), utilizing a combination of PacBio High Fidelity sequencing and chromatin conformation capture sequencing data. The haplotypic assemblies are well phased at chromosomal scale, successfully resolving the complex loci with extensive parental structural variations (SVs). By integrating into a biparental genome graph, the haplotypic assemblies can facilitate downstream short-read-based SV calling and allele-specific gene expression analysis, demonstrating outstanding advantages over a single linear genome. Our work offers a comprehensive workflow that aims to facilitate the decoding of numerous hybrid plant genomes, particularly those with unknown or inaccessible parentage, thereby enhancing our understanding of genome evolution and heterosis.
Affiliation(s)
- Qianqian Kong
- School of Agriculture and Biotechnology, Shenzhen Campus of Sun Yat-sen University, Sun Yat-sen University, Shenzhen 518107, China
- Yi Jiang
- School of Agriculture and Biotechnology, Shenzhen Campus of Sun Yat-sen University, Sun Yat-sen University, Shenzhen 518107, China
- Mingfei Sun
- Modern Crop Biotechnology Research and Application Laboratory, College of Life Sciences, Shandong Agricultural University, Tai'an 271018, China
- Yunpeng Wang
- Jilin Provincial Crop Transgenic Science and Technology Innovation Center, Institute of Agricultural Biotechnology, Jilin Academy of Agricultural Sciences, Changchun 130033, China
- Lin Zhang
- College of Agriculture, Northeast Agricultural University, Changjiang Road, Xiangfang District, Harbin 150030, China
- Xing Zeng
- College of Agriculture, Northeast Agricultural University, Changjiang Road, Xiangfang District, Harbin 150030, China
- Zhiheng Wang
- School of Agriculture and Biotechnology, Shenzhen Campus of Sun Yat-sen University, Sun Yat-sen University, Shenzhen 518107, China
- Zijie Wang
- School of Agriculture and Biotechnology, Shenzhen Campus of Sun Yat-sen University, Sun Yat-sen University, Shenzhen 518107, China
- Yuting Liu
- School of Agriculture and Biotechnology, Shenzhen Campus of Sun Yat-sen University, Sun Yat-sen University, Shenzhen 518107, China
- Yuanxian Gan
- School of Agriculture and Biotechnology, Shenzhen Campus of Sun Yat-sen University, Sun Yat-sen University, Shenzhen 518107, China
- Han Liu
- College of Plant Science and Technology, Beijing University of Agriculture, Beijing 102206, China
- Xiang Gao
- School of Agriculture and Biotechnology, Shenzhen Campus of Sun Yat-sen University, Sun Yat-sen University, Shenzhen 518107, China
- Xuerong Yang
- Modern Crop Biotechnology Research and Application Laboratory, College of Life Sciences, Shandong Agricultural University, Tai'an 271018, China
- Xinyuan Song
- Jilin Provincial Crop Transgenic Science and Technology Innovation Center, Institute of Agricultural Biotechnology, Jilin Academy of Agricultural Sciences, Changchun 130033, China
- Hongjun Liu
- Modern Crop Biotechnology Research and Application Laboratory, College of Life Sciences, Shandong Agricultural University, Tai'an 271018, China
- Junpeng Shi
- School of Agriculture and Biotechnology, Shenzhen Campus of Sun Yat-sen University, Sun Yat-sen University, Shenzhen 518107, China
4. Smolka M, Paulin LF, Grochowski CM, Horner DW, Mahmoud M, Behera S, Kalef-Ezra E, Gandhi M, Hong K, Pehlivan D, Scholz SW, Carvalho CMB, Proukakis C, Sedlazeck FJ. Detection of mosaic and population-level structural variants with Sniffles2. Nat Biotechnol 2024; 42:1571-1580. [PMID: 38168980 PMCID: PMC11217151 DOI: 10.1038/s41587-023-02024-y] [Citation(s) in RCA: 34] [Impact Index Per Article: 34.0] [Received: 04/04/2022] [Accepted: 10/11/2023] [Indexed: 01/05/2024]
Abstract
Calling structural variations (SVs) is technically challenging, but using long reads remains the most accurate way to identify complex genomic alterations. Here we present Sniffles2, which improves over current methods by implementing a repeat aware clustering coupled with a fast consensus sequence and coverage-adaptive filtering. Sniffles2 is 11.8 times faster and 29% more accurate than state-of-the-art SV callers across different coverages (5-50×), sequencing technologies (ONT and HiFi) and SV types. Furthermore, Sniffles2 solves the problem of family-level to population-level SV calling to produce fully genotyped VCF files. Across 11 probands, we accurately identified causative SVs around MECP2, including highly complex alleles with three overlapping SVs. Sniffles2 also enables the detection of mosaic SVs in bulk long-read data. As a result, we identified multiple mosaic SVs in brain tissue from a patient with multiple system atrophy. The identified SV showed a remarkable diversity within the cingulate cortex, impacting both genes involved in neuron function and repetitive elements.
Affiliation(s)
- Moritz Smolka
- Human Genome Sequencing Center Baylor College of Medicine, Houston, TX, USA
- Luis F Paulin
- Human Genome Sequencing Center Baylor College of Medicine, Houston, TX, USA
- Dominic W Horner
- Department of Clinical and Movement Neurosciences, Royal Free Campus, Queen Square Institute of Neurology, University College London, London, UK
- Aligning Science Across Parkinson's (ASAP) Collaborative Research Network, Chevy Chase, MD, USA
- Medhat Mahmoud
- Human Genome Sequencing Center Baylor College of Medicine, Houston, TX, USA
- Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX, USA
- Sairam Behera
- Human Genome Sequencing Center Baylor College of Medicine, Houston, TX, USA
- Ester Kalef-Ezra
- Department of Clinical and Movement Neurosciences, Royal Free Campus, Queen Square Institute of Neurology, University College London, London, UK
- Aligning Science Across Parkinson's (ASAP) Collaborative Research Network, Chevy Chase, MD, USA
- Mira Gandhi
- Pacific Northwest Research Institute (PNRI), Seattle, WA, USA
- Karl Hong
- Bionano Genomics, San Diego, CA, USA
- Davut Pehlivan
- Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX, USA
- Division of Neurology and Developmental Neuroscience, Department of Pediatrics, Baylor College of Medicine, Houston, TX, USA
- Sonja W Scholz
- Neurodegenerative Diseases Research Unit, National Institute of Neurological Disorders and Stroke, Bethesda, MD, USA
- Department of Neurology, Johns Hopkins University Medical Center, Baltimore, MD, USA
- Claudia M B Carvalho
- Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX, USA
- Pacific Northwest Research Institute (PNRI), Seattle, WA, USA
- Christos Proukakis
- Department of Clinical and Movement Neurosciences, Royal Free Campus, Queen Square Institute of Neurology, University College London, London, UK
- Aligning Science Across Parkinson's (ASAP) Collaborative Research Network, Chevy Chase, MD, USA
- Fritz J Sedlazeck
- Human Genome Sequencing Center Baylor College of Medicine, Houston, TX, USA.
- Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX, USA.
- Aligning Science Across Parkinson's (ASAP) Collaborative Research Network, Chevy Chase, MD, USA.
- Department of Computer Science, Rice University, Houston, TX, USA.
5. Liu L, Zhang J, Wood S, Newell F, Leonard C, Koufariotis LT, Nones K, Dalley AJ, Chittoory H, Bashirzadeh F, Son JH, Steinfort D, Williamson JP, Bint M, Pahoff C, Nguyen PT, Twaddell S, Arnold D, Grainge C, Simpson PT, Fielding D, Waddell N, Pearson JV. Performance of somatic structural variant calling in lung cancer using Oxford Nanopore sequencing technology. BMC Genomics 2024; 25:898. [PMID: 39350042 PMCID: PMC11441263 DOI: 10.1186/s12864-024-10792-3] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/01/2024] [Accepted: 09/11/2024] [Indexed: 10/04/2024] [Open Access]
Abstract
BACKGROUND Lung cancer is a heterogeneous disease and the primary cause of cancer-related mortality worldwide. Somatic mutations, including large structural variants, are important biomarkers in lung cancer for selecting targeted therapy. Genomic studies in lung cancer have been conducted using short-read sequencing. Emerging long-read sequencing technologies are a promising alternative for studying somatic structural variants; however, there is currently no consensus on how to process data and call somatic events. In this study, we performed whole genome sequencing of lung cancer and matched non-tumour samples using long- and short-read sequencing to comprehensively benchmark three sequence aligners and seven structural variant callers, comprising generic callers (SVIM, Sniffles2, DELLY in generic mode and cuteSV) and somatic callers (Severus, SAVANA, nanomonsv and DELLY in somatic mode). RESULTS Different combinations of aligners and variant callers influenced somatic structural variant detection. The choice of caller had a significant influence on somatic structural variant detection in terms of variant type, size, sensitivity, and accuracy. The performance of each variant caller was assessed by comparison with somatic structural variants identified by short-read sequencing. More events were detected with long-read sequencing than with short-read sequencing. The mean recall of somatic variant events identified by long-read sequencing was higher for the somatic callers (72%) than for the generic callers (53%). Among the somatic callers, when using the minimap2 aligner, SAVANA and Severus achieved the highest recall at 79.5% and 79.25% respectively, followed by nanomonsv with a recall of 72.5%. CONCLUSION Long-read sequencing can identify somatic structural variants in clinical samples. The longer reads have the potential to improve our understanding of cancer development and inform personalized cancer treatment.
Affiliation(s)
- Lingchen Liu
- QIMR Berghofer Medical Research Institute, Brisbane, Australia
- Faculty of Medicine, The University of Queensland, Brisbane, Australia
- Jia Zhang
- QIMR Berghofer Medical Research Institute, Brisbane, Australia
- Faculty of Medicine, The University of Queensland, Brisbane, Australia
- Scott Wood
- QIMR Berghofer Medical Research Institute, Brisbane, Australia
- Felicity Newell
- QIMR Berghofer Medical Research Institute, Brisbane, Australia
- Conrad Leonard
- QIMR Berghofer Medical Research Institute, Brisbane, Australia
- Katia Nones
- QIMR Berghofer Medical Research Institute, Brisbane, Australia
- Andrew J Dalley
- Faculty of Medicine, The University of Queensland, Brisbane, Australia
- Haarika Chittoory
- Faculty of Medicine, The University of Queensland, Brisbane, Australia
- Farzad Bashirzadeh
- Department of Thoracic Medicine, The Royal Brisbane & Women's Hospital, Brisbane, Australia
- Jung Hwa Son
- Department of Thoracic Medicine, The Royal Brisbane & Women's Hospital, Brisbane, Australia
- Daniel Steinfort
- Department of Thoracic Medicine, Royal Melbourne Hospital, Melbourne, Australia
- Michael Bint
- Department of Thoracic Medicine, Sunshine Coast University Hospital, Birtinya, Australia
- Carl Pahoff
- Department of Thoracic Medicine, Gold Coast University Hospital, Southport, Australia
- Phan T Nguyen
- Department of Thoracic Medicine, Royal Adelaide Hospital, Adelaide, Australia
- Scott Twaddell
- Department of Respiratory and Sleep Medicine, John Hunter Hospital, Newcastle, Australia
- David Arnold
- Department of Respiratory and Sleep Medicine, John Hunter Hospital, Newcastle, Australia
- Christopher Grainge
- Department of Respiratory and Sleep Medicine, John Hunter Hospital, Newcastle, Australia
- Peter T Simpson
- Faculty of Medicine, The University of Queensland, Brisbane, Australia
- David Fielding
- Faculty of Medicine, The University of Queensland, Brisbane, Australia
- Department of Thoracic Medicine, The Royal Brisbane & Women's Hospital, Brisbane, Australia
- Nicola Waddell
- QIMR Berghofer Medical Research Institute, Brisbane, Australia.
- Faculty of Medicine, The University of Queensland, Brisbane, Australia.
- John V Pearson
- QIMR Berghofer Medical Research Institute, Brisbane, Australia
- Faculty of Medicine, The University of Queensland, Brisbane, Australia
6. Zheng Y, Shang X. FindCSV: a long-read based method for detecting complex structural variations. BMC Bioinformatics 2024; 25:315. [PMID: 39342151 PMCID: PMC11439270 DOI: 10.1186/s12859-024-05937-w] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/24/2024] [Accepted: 09/18/2024] [Indexed: 10/01/2024] [Open Access]
Abstract
BACKGROUND Structural variations play a significant role in genetic diseases and evolutionary mechanisms. Extensive research has been conducted over the past decade to detect simple structural variations, leading to the development of well-established detection methods. However, recent studies have highlighted the potentially greater impact of complex structural variations on individuals compared to simple structural variations. Despite this, the field still lacks precise detection methods specifically designed for complex structural variations. Therefore, the development of a highly efficient and accurate detection method is of utmost importance. RESULT In response to this need, we propose a novel method called FindCSV, which leverages deep learning techniques and consensus sequences to enhance the detection of SVs using long-read sequencing data. Compared to current methods, FindCSV performs better in detecting complex and simple structural variations. CONCLUSIONS FindCSV is a new method to detect complex and simple structural variations with reasonable accuracy in real and simulated data. The source code for the program is available at https://github.com/nwpuzhengyan/FindCSV .
Affiliation(s)
- Yan Zheng
- School of Computer Science, Northwestern Polytechnical University, West Youyi Road 127, Xi'an, 710072, China.
- Xuequn Shang
- School of Computer Science, Northwestern Polytechnical University, West Youyi Road 127, Xi'an, 710072, China.
7. Logsdon GA, Ebert P, Audano PA, Loftus M, Porubsky D, Ebler J, Yilmaz F, Hallast P, Prodanov T, Yoo D, Paisie CA, Harvey WT, Zhao X, Martino GV, Henglin M, Munson KM, Rabbani K, Chin CS, Gu B, Ashraf H, Austine-Orimoloye O, Balachandran P, Bonder MJ, Cheng H, Chong Z, Crabtree J, Gerstein M, Guethlein LA, Hasenfeld P, Hickey G, Hoekzema K, Hunt SE, Jensen M, Jiang Y, Koren S, Kwon Y, Li C, Li H, Li J, Norman PJ, Oshima KK, Paten B, Phillippy AM, Pollock NR, Rausch T, Rautiainen M, Scholz S, Song Y, Söylev A, Sulovari A, Surapaneni L, Tsapalou V, Zhou W, Zhou Y, Zhu Q, Zody MC, Mills RE, Devine SE, Shi X, Talkowski ME, Chaisson MJP, Dilthey AT, Konkel MK, Korbel JO, Lee C, Beck CR, Eichler EE, Marschall T. Complex genetic variation in nearly complete human genomes. bioRxiv 2024:2024.09.24.614721. [PMID: 39372794 PMCID: PMC11451754 DOI: 10.1101/2024.09.24.614721] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/08/2024]
Abstract
Diverse sets of complete human genomes are required to construct a pangenome reference and to understand the extent of complex structural variation. Here, we sequence 65 diverse human genomes and build 130 haplotype-resolved assemblies (130 Mbp median continuity), closing 92% of all previous assembly gaps [1,2] and reaching telomere-to-telomere (T2T) status for 39% of the chromosomes. We highlight complete sequence continuity of complex loci, including the major histocompatibility complex (MHC), SMN1/SMN2, NBPF8, and AMY1/AMY2, and fully resolve 1,852 complex structural variants (SVs). In addition, we completely assemble and validate 1,246 human centromeres. We find up to 30-fold variation in α-satellite high-order repeat (HOR) array length and characterize the pattern of mobile element insertions into α-satellite HOR arrays. While most centromeres predict a single site of kinetochore attachment, epigenetic analysis suggests the presence of two hypomethylated regions for 7% of centromeres. Combining our data with the draft pangenome reference [1] significantly enhances genotyping accuracy from short-read data, enabling whole-genome inference [3] to a median quality value (QV) of 45. Using this approach, 26,115 SVs per sample are detected, substantially increasing the number of SVs now amenable to downstream disease association studies.
Affiliation(s)
- Glennis A Logsdon
- Perelman School of Medicine, University of Pennsylvania, Department of Genetics, Epigenetics Institute, Philadelphia, PA, USA
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA
- Peter Ebert
- Core Unit Bioinformatics, Medical Faculty and University Hospital Düsseldorf, Heinrich Heine University, Düsseldorf, Germany
- Center for Digital Medicine, Heinrich Heine University, Düsseldorf, Germany
- Peter A Audano
- The Jackson Laboratory for Genomic Medicine, Farmington, CT, USA
- Mark Loftus
- Clemson University, Department of Genetics & Biochemistry, Clemson, SC, USA
- Center for Human Genetics, Clemson University, Greenwood, SC, USA
- David Porubsky
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA
- Jana Ebler
- Institute for Medical Biometry and Bioinformatics, Medical Faculty and University Hospital Düsseldorf, Heinrich Heine University, Düsseldorf, Germany
- Center for Digital Medicine, Heinrich Heine University, Düsseldorf, Germany
- Feyza Yilmaz
- The Jackson Laboratory for Genomic Medicine, Farmington, CT, USA
- Pille Hallast
- The Jackson Laboratory for Genomic Medicine, Farmington, CT, USA
- Timofey Prodanov
- Institute for Medical Biometry and Bioinformatics, Medical Faculty and University Hospital Düsseldorf, Heinrich Heine University, Düsseldorf, Germany
- Center for Digital Medicine, Heinrich Heine University, Düsseldorf, Germany
- DongAhn Yoo
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA
- Carolyn A Paisie
- The Jackson Laboratory for Genomic Medicine, Farmington, CT, USA
- William T Harvey
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA
- Xuefang Zhao
- Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Center for Genomic Medicine, Massachusetts General Hospital, Boston, MA, USA
- Gianni V Martino
- Clemson University, Department of Genetics & Biochemistry, Clemson, SC, USA
- Center for Human Genetics, Clemson University, Greenwood, SC, USA
- Medical University of South Carolina, College of Graduate Studies, Charleston, SC, USA
- Mir Henglin
- Institute for Medical Biometry and Bioinformatics, Medical Faculty and University Hospital Düsseldorf, Heinrich Heine University, Düsseldorf, Germany
- Center for Digital Medicine, Heinrich Heine University, Düsseldorf, Germany
- Katherine M Munson
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA
- Keon Rabbani
- Department of Quantitative and Computational Biology, University of Southern California, Los Angeles, CA, USA
- Chen-Shan Chin
- Foundation of Biological Data Sciences, Belmont, CA, USA
- Bida Gu
- Department of Quantitative and Computational Biology, University of Southern California, Los Angeles, CA, USA
- Hufsah Ashraf
- Institute for Medical Biometry and Bioinformatics, Medical Faculty and University Hospital Düsseldorf, Heinrich Heine University, Düsseldorf, Germany
- Center for Digital Medicine, Heinrich Heine University, Düsseldorf, Germany
- Olanrewaju Austine-Orimoloye
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge, United Kingdom
- Marc Jan Bonder
- Department of Genetics, University of Groningen, University Medical Center Groningen, Groningen, the Netherlands; Oncode Institute, Utrecht, The Netherlands
- Division of Computational Genomics and Systems Genetics, German Cancer Research Center, Heidelberg, Germany
- Haoyu Cheng
- Department of Biomedical Informatics and Data Science, Yale School of Medicine, New Haven, CT, USA
- Zechen Chong
- Department of Biomedical Informatics and Data Science, Heersink School of Medicine, University of Alabama, Birmingham, AL, USA
- Jonathan Crabtree
- Institute for Genome Sciences, University of Maryland School of Medicine, Baltimore, MD, USA
| | - Mark Gerstein
- Department of Molecular Biophysics and Biochemistry, Yale University, New Haven, CT, USA
- Program in Computational Biology and Bioinformatics, Yale University, New Haven, CT, USA
| | - Lisbeth A Guethlein
- Department of Structural Biology, School of Medicine, Stanford University, Stanford, CA, USA
| | - Patrick Hasenfeld
- European Molecular Biology Laboratory (EMBL), Genome Biology Unit, Heidelberg, Germany
| | - Glenn Hickey
- UC Santa Cruz Genomics Institute, University of California, Santa Cruz, CA, USA
| | - Kendra Hoekzema
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA
| | - Sarah E Hunt
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge, United Kingdom
| | - Matthew Jensen
- Department of Molecular Biophysics and Biochemistry, Yale University, New Haven, CT, USA
- Program in Computational Biology and Bioinformatics, Yale University, New Haven, CT, USA
| | - Yunzhe Jiang
- Department of Molecular Biophysics and Biochemistry, Yale University, New Haven, CT, USA
- Program in Computational Biology and Bioinformatics, Yale University, New Haven, CT, USA
| | - Sergey Koren
- Genome Informatics Section, Center for Genomics and Data Science Research, National Human Genome Research Institute, National Institutes of Health, Bethesda, MD, USA
| | - Youngjun Kwon
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA
| | - Chong Li
- Temple University, Department of Computer and Information Sciences, College of Science and Technology, Philadelphia, PA, USA
- Temple University, Institute for Genomics and Evolutionary Medicine, Philadelphia, PA, USA
| | - Heng Li
- Department of Data Science, Dana-Farber Cancer Institute, Boston, MA, USA
- Department of Biomedical Informatics, Harvard Medical School, Boston, MA, USA
| | - Jiaqi Li
- Department of Molecular Biophysics and Biochemistry, Yale University, New Haven, CT, USA
- Program in Computational Biology and Bioinformatics, Yale University, New Haven, CT, USA
| | - Paul J Norman
- Department of Biomedical Informatics, University of Colorado School of Medicine, Aurora, CO, USA
- Department of Immunology and Microbiology, University of Colorado School of Medicine, Aurora, CO, USA
| | - Keisuke K Oshima
- Perelman School of Medicine, University of Pennsylvania, Department of Genetics, Epigenetics Institute, Philadelphia, PA, USA
| | - Benedict Paten
- UC Santa Cruz Genomics Institute, University of California, Santa Cruz, CA, USA
| | - Adam M Phillippy
- Genome Informatics Section, Center for Genomics and Data Science Research, National Human Genome Research Institute, National Institutes of Health, Bethesda, MD, USA
| | - Nicholas R Pollock
- Department of Biomedical Informatics, University of Colorado School of Medicine, Aurora, CO, USA
| | - Tobias Rausch
- European Molecular Biology Laboratory (EMBL), Genome Biology Unit, Heidelberg, Germany
| | - Mikko Rautiainen
- Institute for Molecular Medicine Finland (FIMM), University of Helsinki, Helsinki, Finland
| | - Stephan Scholz
- Institute of Medical Microbiology and Hospital Hygiene, Medical Faculty, Heinrich Heine University, Düsseldorf, Germany
| | - Yuwei Song
- Department of Biomedical Informatics and Data Science, Heersink School of Medicine, University of Alabama, Birmingham, AL, USA
| | - Arda Söylev
- Institute for Medical Biometry and Bioinformatics, Medical Faculty and University Hospital Düsseldorf, Heinrich Heine University, Düsseldorf, Germany
- Center for Digital Medicine, Heinrich Heine University, Düsseldorf, Germany
| | - Arvis Sulovari
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA
| | - Likhitha Surapaneni
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge, United Kingdom
| | - Vasiliki Tsapalou
- European Molecular Biology Laboratory (EMBL), Genome Biology Unit, Heidelberg, Germany
| | - Weichen Zhou
- Department of Computational Medicine & Bioinformatics, University of Michigan, MI, USA
| | - Ying Zhou
- Department of Data Science, Dana-Farber Cancer Institute, Boston, MA, USA
- Department of Biomedical Informatics, Harvard Medical School, Boston, MA, USA
| | - Qihui Zhu
- The Jackson Laboratory for Genomic Medicine, Farmington, CT, USA
- Stanford Health Care, Palo Alto, CA, USA
| | | | - Ryan E Mills
- Department of Computational Medicine & Bioinformatics, University of Michigan, MI, USA
| | - Scott E Devine
- Institute for Genome Sciences, University of Maryland School of Medicine, Baltimore, MD, USA
| | - Xinghua Shi
- Temple University, Department of Computer and Information Sciences, College of Science and Technology, Philadelphia, PA, USA
- Temple University, Institute for Genomics and Evolutionary Medicine, Philadelphia, PA, USA
| | - Mike E Talkowski
- Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Center for Genomic Medicine, Massachusetts General Hospital, Boston, MA, USA
- Department of Neurology, Harvard Medical School, Boston, MA, USA
| | - Mark J P Chaisson
- Department of Quantitative and Computational Biology, University of Southern California, Los Angeles, CA, USA
| | - Alexander T Dilthey
- Center for Digital Medicine, Heinrich Heine University, Düsseldorf, Germany
- Institute of Medical Microbiology and Hospital Hygiene, Medical Faculty, Heinrich Heine University, Düsseldorf, Germany
| | - Miriam K Konkel
- Clemson University, Department of Genetics & Biochemistry, Clemson, SC, USA
- Center for Human Genetics, Clemson University, Greenwood, SC, USA
| | - Jan O Korbel
- European Molecular Biology Laboratory (EMBL), Genome Biology Unit, Heidelberg, Germany
| | - Charles Lee
- The Jackson Laboratory for Genomic Medicine, Farmington, CT, USA
| | - Christine R Beck
- The Jackson Laboratory for Genomic Medicine, Farmington, CT, USA
- The University of Connecticut Health Center, Farmington, CT, USA
| | - Evan E Eichler
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA
- Howard Hughes Medical Institute, University of Washington, Seattle, WA, USA
| | - Tobias Marschall
- Institute for Medical Biometry and Bioinformatics, Medical Faculty and University Hospital Düsseldorf, Heinrich Heine University, Düsseldorf, Germany
- Center for Digital Medicine, Heinrich Heine University, Düsseldorf, Germany
|
8
|
Hou Y, Gan J, Fan Z, Sun L, Garg V, Wang Y, Li S, Bao P, Cao B, Varshney RK, Zhao H. Haplotype-based pangenomes reveal genetic variations and climate adaptations in moso bamboo populations. Nat Commun 2024; 15:8085. [PMID: 39278956 PMCID: PMC11402969 DOI: 10.1038/s41467-024-52376-5] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/18/2023] [Accepted: 08/30/2024] [Indexed: 09/18/2024] Open
Abstract
Moso bamboo (Phyllostachys edulis), an ecologically and economically important forest species in East Asia, plays vital roles in carbon sequestration and climate change mitigation. However, intensifying climate change threatens moso bamboo survival. Here we generate high-quality haplotype-based pangenome assemblies for 16 representative moso bamboo accessions and integrate these assemblies with 427 previously resequenced accessions. Characterization of the haplotype-based pangenome reveals extensive genetic variation, predominantly between haplotypes rather than within accessions. Many genes with allele-specific expression patterns are implicated in climate responses. Integrating spatiotemporal climate data reveals more than 1050 variations associated with pivotal climate factors, including temperature and precipitation. Climate-associated variations enable the prediction of increased genetic risk across the northern and western regions of China under future emissions scenarios, underscoring the threats posed by rising temperatures. Our integrated haplotype-based pangenome elucidates moso bamboo's local climate adaptation mechanisms and provides critical genomic resources for addressing intensifying climate pressures on this essential bamboo. More broadly, this study demonstrates the power of long-read sequencing in dissecting adaptive traits in climate-sensitive species, advancing evolutionary knowledge to support conservation.
Affiliation(s)
- Yinguang Hou
- Institute of Gene Science and Industrialization for Bamboo and Rattan Resources, International Centre for Bamboo and Rattan, Beijing, 100102, China
- Key Laboratory of National Forestry and Grassland Administration/Beijing for Bamboo & Rattan Science and Technology, Beijing, 100102, China
| | - Junwei Gan
- Institute of Gene Science and Industrialization for Bamboo and Rattan Resources, International Centre for Bamboo and Rattan, Beijing, 100102, China
- Key Laboratory of National Forestry and Grassland Administration/Beijing for Bamboo & Rattan Science and Technology, Beijing, 100102, China
| | - Zeyu Fan
- Institute of Gene Science and Industrialization for Bamboo and Rattan Resources, International Centre for Bamboo and Rattan, Beijing, 100102, China
- Key Laboratory of National Forestry and Grassland Administration/Beijing for Bamboo & Rattan Science and Technology, Beijing, 100102, China
| | - Lei Sun
- Institute of Gene Science and Industrialization for Bamboo and Rattan Resources, International Centre for Bamboo and Rattan, Beijing, 100102, China
- Key Laboratory of National Forestry and Grassland Administration/Beijing for Bamboo & Rattan Science and Technology, Beijing, 100102, China
| | - Vanika Garg
- Centre for Crop & Food Innovation, WA State Agricultural Biotechnology Centre, Food Futures Institute, Murdoch University, Murdoch, WA, 6150, Australia
| | - Yu Wang
- Institute of Gene Science and Industrialization for Bamboo and Rattan Resources, International Centre for Bamboo and Rattan, Beijing, 100102, China
- Key Laboratory of National Forestry and Grassland Administration/Beijing for Bamboo & Rattan Science and Technology, Beijing, 100102, China
| | - Shanying Li
- Institute of Gene Science and Industrialization for Bamboo and Rattan Resources, International Centre for Bamboo and Rattan, Beijing, 100102, China
- Key Laboratory of National Forestry and Grassland Administration/Beijing for Bamboo & Rattan Science and Technology, Beijing, 100102, China
| | - Pengfei Bao
- Institute of Gene Science and Industrialization for Bamboo and Rattan Resources, International Centre for Bamboo and Rattan, Beijing, 100102, China
- Key Laboratory of National Forestry and Grassland Administration/Beijing for Bamboo & Rattan Science and Technology, Beijing, 100102, China
| | - Bingchen Cao
- Institute of Gene Science and Industrialization for Bamboo and Rattan Resources, International Centre for Bamboo and Rattan, Beijing, 100102, China
- Key Laboratory of National Forestry and Grassland Administration/Beijing for Bamboo & Rattan Science and Technology, Beijing, 100102, China
| | - Rajeev K Varshney
- Centre for Crop & Food Innovation, WA State Agricultural Biotechnology Centre, Food Futures Institute, Murdoch University, Murdoch, WA, 6150, Australia
| | - Hansheng Zhao
- Institute of Gene Science and Industrialization for Bamboo and Rattan Resources, International Centre for Bamboo and Rattan, Beijing, 100102, China.
- Key Laboratory of National Forestry and Grassland Administration/Beijing for Bamboo & Rattan Science and Technology, Beijing, 100102, China.
|
9
|
Sirén J, Eskandar P, Ungaro MT, Hickey G, Eizenga JM, Novak AM, Chang X, Chang PC, Kolmogorov M, Carroll A, Monlong J, Paten B. Personalized pangenome references. Nat Methods 2024:10.1038/s41592-024-02407-2. [PMID: 39261641 DOI: 10.1038/s41592-024-02407-2] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/18/2023] [Accepted: 08/06/2024] [Indexed: 09/13/2024]
Abstract
Pangenomes reduce reference bias by representing genetic diversity better than a single reference sequence. Yet when comparing a sample to a pangenome, variants in the pangenome that are not part of the sample can be misleading, for example, causing false read mappings. These irrelevant variants are generally rarer in terms of allele frequency, and have previously been dealt with by filtering rare variants. However, this blunt heuristic both fails to remove some irrelevant variants and removes many relevant variants. We propose a new approach that imputes a personalized pangenome subgraph by sampling local haplotypes according to k-mer counts in the reads. We implement the approach in the vg toolkit (https://github.com/vgteam/vg) for the Giraffe short-read aligner and compare its accuracy to state-of-the-art methods using human pangenome graphs from the Human Pangenome Reference Consortium. This reduces small variant genotyping errors by four times relative to the Genome Analysis Toolkit and makes short-read structural variant genotyping of known variants competitive with long-read variant discovery methods.
Affiliation(s)
- Jouni Sirén
- UC Santa Cruz Genomics Institute, University of California, Santa Cruz, CA, USA.
| | - Parsa Eskandar
- UC Santa Cruz Genomics Institute, University of California, Santa Cruz, CA, USA
| | - Matteo Tommaso Ungaro
- UC Santa Cruz Genomics Institute, University of California, Santa Cruz, CA, USA
- University of Ferrara, Ferrara, Italy
| | - Glenn Hickey
- UC Santa Cruz Genomics Institute, University of California, Santa Cruz, CA, USA
| | - Jordan M Eizenga
- UC Santa Cruz Genomics Institute, University of California, Santa Cruz, CA, USA
| | - Adam M Novak
- UC Santa Cruz Genomics Institute, University of California, Santa Cruz, CA, USA
| | - Xian Chang
- UC Santa Cruz Genomics Institute, University of California, Santa Cruz, CA, USA
| | | | - Mikhail Kolmogorov
- Center for Cancer Research, National Cancer Institute, National Institutes of Health, Bethesda, MD, USA
| | | | - Jean Monlong
- UC Santa Cruz Genomics Institute, University of California, Santa Cruz, CA, USA
- Institut de Recherche en Santé Digestive, Université de Toulouse, INSERM, INRA, ENVT, UPS, Toulouse, France
| | - Benedict Paten
- UC Santa Cruz Genomics Institute, University of California, Santa Cruz, CA, USA.
|
10
|
Jiang T, Zhou Z, Zhang Z, Cao S, Wang Y, Liu Y. MEHunter: transformer-based mobile element variant detection from long reads. Bioinformatics 2024; 40:btae557. [PMID: 39287014 DOI: 10.1093/bioinformatics/btae557] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/21/2024] [Revised: 09/03/2024] [Accepted: 09/13/2024] [Indexed: 09/19/2024]
Abstract
SUMMARY: Mobile genetic elements (MEs) are heritable mutagens that significantly contribute to genetic diseases. The advent of long-read sequencing technologies, capable of resolving large DNA fragments, offers promising prospects for the comprehensive detection of ME variants (MEVs). However, achieving high precision while maintaining recall remains challenging, mainly because of the variable length and similar content of MEV signatures, which are often obscured by the noise in long reads. Here, we propose MEHunter, a high-performance MEV detection approach utilizing a fine-tuned transformer model adept at identifying potential MEVs with fragmented features. Benchmark experiments on both simulated and real datasets demonstrate that MEHunter consistently achieves higher accuracy and sensitivity than the state-of-the-art tools. Furthermore, it is capable of detecting novel, potentially individual-specific MEVs that have been overlooked in published population projects. AVAILABILITY AND IMPLEMENTATION: MEHunter is available from https://github.com/120L021101/MEHunter.
Affiliation(s)
- Tao Jiang
- Center for Bioinformatics, Faculty of Computing, Harbin Institute of Technology, Harbin, Heilongjiang 150001, China
- Zhengzhou Research Institute, Harbin Institute of Technology, Zhengzhou, Henan 450000, China
| | - Zuji Zhou
- Center for Bioinformatics, Faculty of Computing, Harbin Institute of Technology, Harbin, Heilongjiang 150001, China
| | - Zhendong Zhang
- Center for Bioinformatics, Faculty of Computing, Harbin Institute of Technology, Harbin, Heilongjiang 150001, China
| | - Shuqi Cao
- Center for Bioinformatics, Faculty of Computing, Harbin Institute of Technology, Harbin, Heilongjiang 150001, China
| | - Yadong Wang
- Center for Bioinformatics, Faculty of Computing, Harbin Institute of Technology, Harbin, Heilongjiang 150001, China
- Zhengzhou Research Institute, Harbin Institute of Technology, Zhengzhou, Henan 450000, China
| | - Yadong Liu
- Center for Bioinformatics, Faculty of Computing, Harbin Institute of Technology, Harbin, Heilongjiang 150001, China
- Zhengzhou Research Institute, Harbin Institute of Technology, Zhengzhou, Henan 450000, China
|
11
|
Xia Z, Xiang W, Wang Q, Li X, Li Y, Gao J, Tang T, Yang C, Cui Y. CSV-Filter: a deep learning-based comprehensive structural variant filtering method for both short and long reads. Bioinformatics 2024; 40:btae539. [PMID: 39240375 PMCID: PMC11419953 DOI: 10.1093/bioinformatics/btae539] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/27/2024] [Revised: 07/29/2024] [Accepted: 09/03/2024] [Indexed: 09/07/2024]
Abstract
MOTIVATION: Structural variants (SVs) play an important role in genetic research and precision medicine. As existing SV detection methods usually contain a substantial number of false positive calls, approaches to filter the detection results are needed. RESULTS: We developed a novel deep learning-based SV filtering tool, CSV-Filter, for both short and long reads. CSV-Filter uses a novel multi-level grayscale image encoding method based on CIGAR strings of the alignment results and employs image augmentation techniques to improve SV feature extraction. CSV-Filter also utilizes self-supervised learning networks for transfer as classification models, and employs mixed-precision operations to accelerate training. The experiments showed that the integration of CSV-Filter with popular SV detection tools could considerably reduce false positive SVs for short and long reads, while leaving true positive SVs almost unchanged. Compared with DeepSVFilter, an SV filtering tool for short reads, CSV-Filter could recognize more false positive calls and support long reads as an additional feature. AVAILABILITY AND IMPLEMENTATION: https://github.com/xzyschumacher/CSV-Filter.
Affiliation(s)
- Zeyu Xia
- College of Computer Science and Technology, National University of Defense Technology, Hunan 410073, P. R. China
| | - Weiming Xiang
- College of Computer Science and Electronic Engineering, Hunan University, Hunan 410082, P. R. China
| | - Qingzhe Wang
- College of Computer Science and Technology, National University of Defense Technology, Hunan 410073, P. R. China
| | - Xingze Li
- College of Computer Science and Technology, National University of Defense Technology, Hunan 410073, P. R. China
| | - Yilin Li
- College of Computer Science and Technology, National University of Defense Technology, Hunan 410073, P. R. China
| | - Junyu Gao
- College of Computer Science and Technology, National University of Defense Technology, Hunan 410073, P. R. China
| | - Tao Tang
- College of Computer Science and Technology, National University of Defense Technology, Hunan 410073, P. R. China
| | - Canqun Yang
- College of Computer Science and Technology, National University of Defense Technology, Hunan 410073, P. R. China
- National Supercomputer Center in Tianjin, Tianjin, 300457, P. R. China
- Haihe Lab of ITAI, Tianjin, 300457, P. R. China
| | - Yingbo Cui
- College of Computer Science and Technology, National University of Defense Technology, Hunan 410073, P. R. China
|
12
|
Huang G, Bao Z, Feng L, Zhai J, Wendel JF, Cao X, Zhu Y. A telomere-to-telomere cotton genome assembly reveals centromere evolution and a Mutator transposon-linked module regulating embryo development. Nat Genet 2024; 56:1953-1963. [PMID: 39147922 DOI: 10.1038/s41588-024-01877-6] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/20/2023] [Accepted: 07/18/2024] [Indexed: 08/17/2024]
Abstract
Assembly of complete genomes can reveal functional genetic elements missing from draft sequences. Here we present the near-complete telomere-to-telomere and contiguous genome of the cotton species Gossypium raimondii. Our assembly identified gaps and misoriented or misassembled regions in previous assemblies and produced 13 centromeres, with 25 chromosomal ends having telomeres. In contrast to satellite-rich Arabidopsis and rice centromeres, cotton centromeres lack phased CENH3 nucleosome positioning patterns and probably evolved by invasion from long terminal repeat retrotransposons. In-depth expression profiling of transposable elements revealed a previously unannotated DNA transposon (MuTC01) that interacts with miR2947 to produce trans-acting small interfering RNAs (siRNAs), one of which targets the newly evolved LEC2 (LEC2b) to produce phased siRNAs. Systematic genome editing experiments revealed that this tripartite module, miR2947-MuTC01-LEC2b, controls the morphogenesis of complex folded embryos characteristic of Gossypium and its close relatives in the cotton tribe. Our study reveals a trans-acting siRNA-based tripartite regulatory pathway for embryo development in higher plants.
Affiliation(s)
- Gai Huang
- State Key Laboratory of Protein and Plant Gene Research, School of Life Sciences, Peking University, Beijing, China.
- Institute for Advanced Studies, Wuhan University, Wuhan, China.
- Institute of Genetics and Developmental Biology, Chinese Academy of Sciences, Beijing, China.
| | - Zhigui Bao
- Max Planck Institute for Biology Tübingen, Tübingen, Germany
| | - Li Feng
- Department of Biology, School of Life Sciences, Southern University of Science and Technology, Shenzhen, China
| | - Jixian Zhai
- Department of Biology, School of Life Sciences, Southern University of Science and Technology, Shenzhen, China
| | - Jonathan F Wendel
- Department of Ecology, Evolution, and Organismal Biology, Iowa State University, Ames, IA, USA
| | - Xiaofeng Cao
- Institute of Genetics and Developmental Biology, Chinese Academy of Sciences, Beijing, China
| | - Yuxian Zhu
- State Key Laboratory of Protein and Plant Gene Research, School of Life Sciences, Peking University, Beijing, China.
- Institute for Advanced Studies, Wuhan University, Wuhan, China.
- Hubei Hongshan Laboratory, Wuhan, China.
- Taikang Center for Life and Medical Sciences, Wuhan University, Wuhan, China.
|
13
|
Zheng J, Li T, Ye H, Jiang Z, Jiang W, Yang H, Wu Z, Xie Z. Comprehensive identification of pathogenic variants in retinoblastoma by long- and short-read sequencing. Cancer Lett 2024; 598:217121. [PMID: 39009069 DOI: 10.1016/j.canlet.2024.217121] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/18/2023] [Revised: 06/16/2024] [Accepted: 07/11/2024] [Indexed: 07/17/2024]
Abstract
Retinoblastoma (RB) is the most common intraocular malignancy in childhood. The causal variants in RB have mostly been characterized by short-read sequencing (SRS) analysis, which has technical limitations in identifying structural variants (SVs) and phasing information. Long-read sequencing (LRS) technology has advantages over SRS in detecting SVs, phased genetic variants, and methylation. In this study, we comprehensively characterized the genetic landscape of RB using combinatorial LRS and SRS of 16 RB tumors and 16 matched blood samples. We detected a total of 232 somatic SVs, with an average of 14.5 SVs per sample across the whole genome in our cohort. We identified 20 distinct pathogenic variants disrupting the RB1 gene, including three novel small variants and five somatic SVs. More somatic SVs were detected by LRS than by SRS (140 vs. 122) in RB samples with WGS data, particularly insertions (18 vs. 1). Furthermore, our analysis shows that, with the exception of one sample that lacked methylation data, all samples presented biallelic inactivation of RB1 in various forms, including two cases with a biallelic hypermethylated promoter and four cases with compound heterozygous mutations that were missed in SRS analysis. By inferring the relative timing of somatic events, we reveal a genetic progression in which RB1 disruption occurs early and is followed by copy number changes, including amplifications of Chr2p and deletions of Chr16q, during RB tumorigenesis. Altogether, we characterize the comprehensive genetic landscape of RB, providing novel insights into the genetic alterations and mechanisms contributing to RB initiation and development. Our work also establishes a framework to analyze the genomic landscape of cancers based on LRS data.
Affiliation(s)
- Jingjing Zheng
- State Key Laboratory of Ophthalmology, Zhongshan Ophthalmic Center, Sun Yat-sen University, Guangzhou, China
| | - Tong Li
- State Key Laboratory of Ophthalmology, Zhongshan Ophthalmic Center, Sun Yat-sen University, Guangzhou, China
| | - Huijing Ye
- State Key Laboratory of Ophthalmology, Zhongshan Ophthalmic Center, Sun Yat-sen University, Guangzhou, China
| | - Zehang Jiang
- State Key Laboratory of Ophthalmology, Zhongshan Ophthalmic Center, Sun Yat-sen University, Guangzhou, China
| | - Wenbing Jiang
- State Key Laboratory of Ophthalmology, Zhongshan Ophthalmic Center, Sun Yat-sen University, Guangzhou, China
| | - Huasheng Yang
- State Key Laboratory of Ophthalmology, Zhongshan Ophthalmic Center, Sun Yat-sen University, Guangzhou, China.
| | - Zhikun Wu
- State Key Laboratory of Ophthalmology, Zhongshan Ophthalmic Center, Sun Yat-sen University, Guangzhou, China.
| | - Zhi Xie
- State Key Laboratory of Ophthalmology, Zhongshan Ophthalmic Center, Sun Yat-sen University, Guangzhou, China.
|
14
|
Luo C, Liu YH, Zhou XM. VolcanoSV enables accurate and robust structural variant calling in diploid genomes from single-molecule long read sequencing. Nat Commun 2024; 15:6956. [PMID: 39138168 PMCID: PMC11322167 DOI: 10.1038/s41467-024-51282-0] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/18/2024] [Accepted: 07/31/2024] [Indexed: 08/15/2024] Open
Abstract
Structural variants (SVs) significantly contribute to human genome diversity and play a crucial role in precision medicine. Although advancements in single-molecule long-read sequencing offer a groundbreaking resource for SV detection, identifying SV breakpoints and sequences accurately and robustly remains challenging. We introduce VolcanoSV, an innovative hybrid SV detection pipeline that utilizes both a reference genome and local de novo assembly to generate a phased diploid assembly. VolcanoSV uses phased SNPs and unique k-mer similarity analysis, enabling precise haplotype-resolved SV discovery. VolcanoSV is adept at constructing comprehensive genetic maps encompassing SNPs, small indels, and all types of SVs, making it well-suited for human genomics studies. Our extensive experiments demonstrate that VolcanoSV surpasses state-of-the-art assembly-based tools in the detection of insertion and deletion SVs, exhibiting superior recall, precision, F1 scores, and genotype accuracy across a diverse range of datasets, including low-coverage (10x) datasets. VolcanoSV outperforms assembly-based tools in the identification of complex SVs, including translocations, duplications, and inversions, in both simulated and real cancer data. Moreover, VolcanoSV is robust to various evaluation parameters and accurately identifies breakpoints and SV sequences.
Affiliation(s)
- Can Luo
- Department of Biomedical Engineering, Vanderbilt University, Nashville, TN, USA
| | - Yichen Henry Liu
- Department of Computer Science, Vanderbilt University, Nashville, TN, USA
| | - Xin Maizie Zhou
- Department of Biomedical Engineering, Vanderbilt University, Nashville, TN, USA.
- Department of Computer Science, Vanderbilt University, Nashville, TN, USA.
- Data Science Institute, Vanderbilt University, Nashville, TN, USA.
|
15
|
LaFlamme CW, Rastin C, Sengupta S, Pennington HE, Russ-Hall SJ, Schneider AL, Bonkowski ES, Almanza Fuerte EP, Allan TJ, Zalusky MPG, Goffena J, Gibson SB, Nyaga DM, Lieffering N, Hebbar M, Walker EV, Darnell D, Olsen SR, Kolekar P, Djekidel MN, Rosikiewicz W, McConkey H, Kerkhof J, Levy MA, Relator R, Lev D, Lerman-Sagie T, Park KL, Alders M, Cappuccio G, Chatron N, Demain L, Genevieve D, Lesca G, Roscioli T, Sanlaville D, Tedder ML, Gupta S, Jones EA, Weisz-Hubshman M, Ketkar S, Dai H, Worley KC, Rosenfeld JA, Chao HT, Neale G, Carvill GL, Wang Z, Berkovic SF, Sadleir LG, Miller DE, Scheffer IE, Sadikovic B, Mefford HC. Diagnostic utility of DNA methylation analysis in genetically unsolved pediatric epilepsies and CHD2 episignature refinement. Nat Commun 2024; 15:6524. [PMID: 39107278 PMCID: PMC11303402 DOI: 10.1038/s41467-024-50159-6] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/20/2023] [Accepted: 06/28/2024] [Indexed: 08/09/2024] Open
Abstract
Sequence-based genetic testing identifies causative variants in ~50% of individuals with developmental and epileptic encephalopathies (DEEs). Aberrant changes in DNA methylation are implicated in various neurodevelopmental disorders but remain unstudied in DEEs. We interrogate the diagnostic utility of genome-wide DNA methylation array analysis on peripheral blood samples from 582 individuals with genetically unsolved DEEs. We identify rare differentially methylated regions (DMRs) and explanatory episignatures to uncover causative and candidate genetic etiologies in 12 individuals. Using long-read sequencing, we identify DNA variants underlying rare DMRs, including one balanced translocation, three CG-rich repeat expansions, and four copy number variants. We also identify pathogenic variants associated with episignatures. Finally, we refine the CHD2 episignature using an 850K methylation array and bisulfite sequencing to investigate potential insights into CHD2 pathophysiology. Our study demonstrates the diagnostic yield of genome-wide DNA methylation analysis to identify causal and candidate variants as 2% (12/582) for unsolved DEE cases.
Affiliation(s)
- Christy W LaFlamme
- Center for Pediatric Neurological Disease Research, Department of Cell and Molecular Biology, St. Jude Children's Research Hospital, Memphis, TN, 38105, USA
- Graduate School of Biomedical Sciences, St. Jude Children's Research Hospital, Memphis, TN, 38105, USA
| | - Cassandra Rastin
- Department of Pathology & Laboratory Medicine, Western University, London, ON, N5A 3K7, Canada
- Verspeeten Clinical Genome Centre, London Health Science Centre, London, ON, N6A 5W9, Canada
- Soham Sengupta
- Center for Pediatric Neurological Disease Research, Department of Cell and Molecular Biology, St. Jude Children's Research Hospital, Memphis, TN, 38105, USA
- Helen E Pennington
- Center for Pediatric Neurological Disease Research, Department of Cell and Molecular Biology, St. Jude Children's Research Hospital, Memphis, TN, 38105, USA
- Department of Mathematics & Statistics, Rhodes College, Memphis, TN, 38112, USA
- Sophie J Russ-Hall
- Epilepsy Research Centre, Department of Medicine, University of Melbourne, Austin Health, Heidelberg, VIC, 3084, Australia
- Amy L Schneider
- Epilepsy Research Centre, Department of Medicine, University of Melbourne, Austin Health, Heidelberg, VIC, 3084, Australia
- Emily S Bonkowski
- Center for Pediatric Neurological Disease Research, Department of Cell and Molecular Biology, St. Jude Children's Research Hospital, Memphis, TN, 38105, USA
- Edith P Almanza Fuerte
- Center for Pediatric Neurological Disease Research, Department of Cell and Molecular Biology, St. Jude Children's Research Hospital, Memphis, TN, 38105, USA
- Talia J Allan
- Epilepsy Research Centre, Department of Medicine, University of Melbourne, Austin Health, Heidelberg, VIC, 3084, Australia
- Miranda Perez-Galey Zalusky
- Division of Genetic Medicine, Department of Pediatrics, University of Washington and Seattle Children's Hospital, Seattle, WA, 98195, USA
- Joy Goffena
- Division of Genetic Medicine, Department of Pediatrics, University of Washington and Seattle Children's Hospital, Seattle, WA, 98195, USA
- Sophia B Gibson
- Division of Genetic Medicine, Department of Pediatrics, University of Washington and Seattle Children's Hospital, Seattle, WA, 98195, USA
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, 98195, USA
- Denis M Nyaga
- Department of Paediatrics and Child Health, University of Otago, Wellington, 6242, New Zealand
- Nico Lieffering
- Department of Paediatrics and Child Health, University of Otago, Wellington, 6242, New Zealand
- Malavika Hebbar
- Division of Genetic Medicine, Department of Pediatrics, University of Washington and Seattle Children's Hospital, Seattle, WA, 98195, USA
- Emily V Walker
- Hartwell Center for Bioinformatics and Biotechnology, St. Jude Children's Research Hospital Memphis, Memphis, TN, 38105, USA
- Daniel Darnell
- Hartwell Center for Bioinformatics and Biotechnology, St. Jude Children's Research Hospital Memphis, Memphis, TN, 38105, USA
- Scott R Olsen
- Hartwell Center for Bioinformatics and Biotechnology, St. Jude Children's Research Hospital Memphis, Memphis, TN, 38105, USA
- Pandurang Kolekar
- Department of Computational Biology, St. Jude Children's Research Hospital, Memphis, TN, 38105, USA
- Mohamed Nadhir Djekidel
- Center for Applied Bioinformatics, St. Jude Children's Research Hospital, Memphis, TN, 38105, USA
- Wojciech Rosikiewicz
- Center for Applied Bioinformatics, St. Jude Children's Research Hospital, Memphis, TN, 38105, USA
- Haley McConkey
- Verspeeten Clinical Genome Centre, London Health Science Centre, London, ON, N6A 5W9, Canada
- Jennifer Kerkhof
- Verspeeten Clinical Genome Centre, London Health Science Centre, London, ON, N6A 5W9, Canada
- Michael A Levy
- Verspeeten Clinical Genome Centre, London Health Science Centre, London, ON, N6A 5W9, Canada
- Raissa Relator
- Verspeeten Clinical Genome Centre, London Health Science Centre, London, ON, N6A 5W9, Canada
- Dorit Lev
- Institute of Medical Genetics, Wolfson Medical Center, Holon, 58100, Israel
- Tally Lerman-Sagie
- Fetal Neurology Clinic, Pediatric Neurology Unit, Wolfson Medical Center, Holon, 58100, Israel
- Sackler School of Medicine, Tel-Aviv University, Tel-Aviv, Israel
- Kristen L Park
- Departments of Pediatrics and Neurology, University of Colorado School of Medicine, Aurora, CO, 80045, USA
- Marielle Alders
- Department of Human Genetics, Amsterdam Reproduction and Development Research Institute, Amsterdam UMC, University of Amsterdam, Amsterdam, Meibergdreef 9, Amsterdam, Netherlands
- Gerarda Cappuccio
- Telethon Institute of Genetics and Medicine, Pozzuoli, Italy
- Department of Translational Medicine, Federico II University of Naples, Naples, Italy
- Nicolas Chatron
- Department of Medical Genetics, Member of the ERN EpiCARE, University Hospital of Lyon and Claude Bernard Lyon I University, Lyon, France
- Pathophysiology and Genetics of Neuron and Muscle (PNMG), UCBL, CNRS UMR5261 - INSERM, U1315, Lyon, France
- Leigh Demain
- Manchester Centre for Genomic Medicine, St Mary's Hospital, Manchester University NHS Foundation Trust, Health Innovation Manchester, Manchester, UK
- David Genevieve
- Montpellier University, Inserm Unit 1183, Reference Center for Rare Diseases Developmental Anomaly and Malformative Syndrome, Clinical Genetic Department, CHU Montpellier, Montpellier, France
- Gaetan Lesca
- Department of Medical Genetics, Member of the ERN EpiCARE, University Hospital of Lyon and Claude Bernard Lyon I University, Lyon, France
- Pathophysiology and Genetics of Neuron and Muscle (PNMG), UCBL, CNRS UMR5261 - INSERM, U1315, Lyon, France
- Tony Roscioli
- Neuroscience Research Australia (NeuRA), Sydney, NSW, Australia
- Prince of Wales Clinical School, Faculty of Medicine, University of New South Wales, Sydney, NSW, Australia
- New South Wales Health Pathology Randwick Genomics, Prince of Wales Hospital, Sydney, NSW, Australia
- Damien Sanlaville
- Department of Medical Genetics, Member of the ERN EpiCARE, University Hospital of Lyon and Claude Bernard Lyon I University, Lyon, France
- Pathophysiology and Genetics of Neuron and Muscle (PNMG), UCBL, CNRS UMR5261 - INSERM, U1315, Lyon, France
- Sachin Gupta
- TY Nelson Department of Neurology and Neurosurgery, The Children's Hospital at Westmead, Westmead, NSW, Australia
- Elizabeth A Jones
- Manchester Centre for Genomic Medicine, St Mary's Hospital, Manchester University NHS Foundation Trust, Health Innovation Manchester, Manchester, UK
- Division of Evolution, Infection and Genomics, School of Biological Sciences, Faculty of Biology, Medicine and Health, University of Manchester, Manchester, UK
- Monika Weisz-Hubshman
- Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX, 77030, USA
- Texas Children's Hospital, Genetic Department, Houston, TX, 77030, USA
- Shamika Ketkar
- Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX, 77030, USA
- Hongzheng Dai
- Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX, 77030, USA
- Kim C Worley
- Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX, 77030, USA
- Jill A Rosenfeld
- Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX, 77030, USA
- Hsiao-Tuan Chao
- Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX, 77030, USA
- Department of Pediatrics, Section of Neurology and Developmental Neuroscience, Baylor College of Medicine, Houston, TX, 77030, USA
- Cain Pediatric Neurology Research Foundation Laboratories, Jan and Dan Duncan Neurological Research Institute, Texas Children's Hospital, Houston, TX, 77030, USA
- Texas Children's Hospital, Houston, TX, 77030, USA
- Department of Neuroscience, Baylor College of Medicine, Houston, TX, 77030, USA
- McNair Medical Institute, The Robert and Janice McNair Foundation, Houston, TX, 77030, USA
- Geoffrey Neale
- Hartwell Center for Bioinformatics and Biotechnology, St. Jude Children's Research Hospital Memphis, Memphis, TN, 38105, USA
- Gemma L Carvill
- Ken and Ruth Davee Department of Neurology, Northwestern University Feinberg School of Medicine, Chicago, IL, USA
- Zhaoming Wang
- Department of Computational Biology, St. Jude Children's Research Hospital, Memphis, TN, 38105, USA
- Department of Epidemiology and Cancer Control, St. Jude Children's Research Hospital, Memphis, TN, 38105, USA
- Samuel F Berkovic
- Epilepsy Research Centre, Department of Medicine, University of Melbourne, Austin Health, Heidelberg, VIC, 3084, Australia
- Lynette G Sadleir
- Department of Paediatrics and Child Health, University of Otago, Wellington, 6242, New Zealand
- Danny E Miller
- Division of Genetic Medicine, Department of Pediatrics, University of Washington and Seattle Children's Hospital, Seattle, WA, 98195, USA
- Department of Laboratory Medicine and Pathology, University of Washington, Seattle, WA, 98195, USA
- Brotman Baty Institute for Precision Medicine, University of Washington, Seattle, WA, 98195, USA
- Ingrid E Scheffer
- Epilepsy Research Centre, Department of Medicine, University of Melbourne, Austin Health, Heidelberg, VIC, 3084, Australia
- Department of Paediatrics, University of Melbourne, Royal Children's Hospital, Melbourne, VIC, Australia
- Florey Institute and Murdoch Children's Research Institute, Melbourne, VIC, Australia
- Bekim Sadikovic
- Department of Pathology & Laboratory Medicine, Western University, London, ON, N5A 3K7, Canada.
- Verspeeten Clinical Genome Centre, London Health Science Centre, London, ON, N6A 5W9, Canada.
- Heather C Mefford
- Center for Pediatric Neurological Disease Research, Department of Cell and Molecular Biology, St. Jude Children's Research Hospital, Memphis, TN, 38105, USA.
16
She H, Liu Z, Xu Z, Zhang H, Wu J, Cheng F, Wang X, Qian W. Pan-genome analysis of 13 Spinacia accessions reveals structural variations associated with sex chromosome evolution and domestication traits in spinach. Plant Biotechnol J 2024. [PMID: 39095952 DOI: 10.1111/pbi.14433] [Received: 03/29/2024] [Revised: 06/12/2024] [Accepted: 06/27/2024] [Indexed: 08/04/2024]
Abstract
Structural variations (SVs) are major genetic variants that can be involved in the origin, adaptation and domestication of species. However, the identification and characterization of SVs in Spinacia species are rare due to the lack of a pan-genome. Here, we report eight chromosome-scale assemblies of cultivated spinach and its two wild species. After integration with five existing assemblies, we constructed a comprehensive Spinacia pan-genome and identified 193 661 pan-SVs, which were genotyped in 452 Spinacia accessions. A genome-wide association study enabled by our pan-SVs identified signals associated with sex and clarified the evolutionary direction of spinach. Most sex-linked SVs (86%) were biased to occur on the Y chromosome during the evolution of the sex-linked region, resulting in reduced Y-linked gene expression. The frequency of pan-SVs among Spinacia accessions further illustrated the contribution of these SVs to domestication traits, such as bolting time and seed dormancy. Furthermore, compared with SNPs, pan-SVs act as efficient variants in genomic selection (GS) because of their ability to capture missing heritability information and their higher prediction accuracy. Overall, this study provides a valuable resource for spinach genomics and highlights the potential utility of pan-SVs in crop improvement and breeding programmes.
Affiliation(s)
- Hongbing She
- State Key Laboratory of Vegetable Biobreeding, Institute of Vegetables and Flowers, Chinese Academy of Agricultural Sciences, Beijing, China
- Zhiyuan Liu
- State Key Laboratory of Vegetable Biobreeding, Institute of Vegetables and Flowers, Chinese Academy of Agricultural Sciences, Beijing, China
- Zhaosheng Xu
- State Key Laboratory of Vegetable Biobreeding, Institute of Vegetables and Flowers, Chinese Academy of Agricultural Sciences, Beijing, China
- Helong Zhang
- State Key Laboratory of Vegetable Biobreeding, Institute of Vegetables and Flowers, Chinese Academy of Agricultural Sciences, Beijing, China
- Jian Wu
- State Key Laboratory of Vegetable Biobreeding, Institute of Vegetables and Flowers, Chinese Academy of Agricultural Sciences, Beijing, China
- Feng Cheng
- State Key Laboratory of Vegetable Biobreeding, Institute of Vegetables and Flowers, Chinese Academy of Agricultural Sciences, Beijing, China
- Xiaowu Wang
- State Key Laboratory of Vegetable Biobreeding, Institute of Vegetables and Flowers, Chinese Academy of Agricultural Sciences, Beijing, China
- Wei Qian
- State Key Laboratory of Vegetable Biobreeding, Institute of Vegetables and Flowers, Chinese Academy of Agricultural Sciences, Beijing, China
- Zhongyuan Research Center, Chinese Academy of Agricultural Sciences, Xinxiang, China
17
Rangel V, Sterrenberg JN, Garawi A, Mezcord V, Folkerts ML, Calderon SE, Garcia YE, Wang J, Soyfer EM, Eng OS, Valerin JB, Tanjasiri SP, Quintero-Rivera F, Seldin MM, Masri S, Frock RL, Fleischman AG, Pannunzio NR. Increased AID results in mutations at the CRLF2 locus implicated in Latin American ALL health disparities. Nat Commun 2024; 15:6331. [PMID: 39068148 PMCID: PMC11283463 DOI: 10.1038/s41467-024-50537-0] [Received: 09/06/2023] [Accepted: 07/10/2024] [Indexed: 07/30/2024] Open
Abstract
Activation-induced cytidine deaminase (AID) is a B cell-specific mutator required for antibody diversification. However, it is also implicated in the etiology of several B cell malignancies. Evaluating the AID-induced mutation load in patients at risk for certain blood cancers is critical in assessing disease severity and treatment options. We have developed a digital PCR (dPCR) assay that allows us to quantify mutations resulting from AID modification or DNA double-strand break (DSB) formation and repair at sites known to be prone to DSBs. Implementation of this assay shows that increased AID levels in immature B cells increase genome instability at loci linked to chromosomal translocation formation. This includes the CRLF2 locus that is often involved in translocations associated with a subtype of acute lymphoblastic leukemia (ALL) that disproportionately affects Hispanics, particularly those with Latin American ancestry. Using dPCR, we characterize the CRLF2 locus in B cell-derived genomic DNA from both Hispanic ALL patients and healthy Hispanic donors and find increased mutations in both, suggesting that vulnerability to DNA damage at CRLF2 may be driving this health disparity. Our ability to detect and quantify these mutations will potentiate future risk identification, early detection of cancers, and reduction of associated cancer health disparities.
Affiliation(s)
- Valeria Rangel
- Division of Hematology/Oncology, Department of Medicine, University of California, Irvine, Irvine, CA, USA
- Department of Biological Chemistry, University of California, Irvine, Irvine, CA, USA
- Jason N Sterrenberg
- Division of Hematology/Oncology, Department of Medicine, University of California, Irvine, Irvine, CA, USA
- Aya Garawi
- School of Biological Sciences, University of California, Irvine, Irvine, CA, USA
- Vyanka Mezcord
- Center for Applied Biotechnology Studies, Department of Biological Science, California State University Fullerton, Fullerton, CA, USA
- Melissa L Folkerts
- Division of Hematology/Oncology, Department of Medicine, University of California, Irvine, Irvine, CA, USA
- Department of Biological Chemistry, University of California, Irvine, Irvine, CA, USA
- Sabrina E Calderon
- School of Biological Sciences, University of California, Irvine, Irvine, CA, USA
- Yadhira E Garcia
- Department of Pharmaceutical Sciences, School of Pharmacy & Pharmaceutical Sciences, University of California, Irvine, CA, USA
- Jinglong Wang
- Division of Radiation and Cancer Biology, Department of Radiation Oncology, Stanford University School of Medicine, Stanford, CA, USA
- Eli M Soyfer
- Division of Hematology/Oncology, Department of Medicine, University of California, Irvine, Irvine, CA, USA
- Department of Biological Chemistry, University of California, Irvine, Irvine, CA, USA
- Oliver S Eng
- Division of Surgical Oncology, Department of Surgery, University of California, Irvine, Irvine, CA, USA
- Chao Family Comprehensive Cancer Center, University of California, Irvine, Irvine, CA, USA
- Jennifer B Valerin
- Division of Hematology/Oncology, Department of Medicine, University of California, Irvine, Irvine, CA, USA
- Chao Family Comprehensive Cancer Center, University of California, Irvine, Irvine, CA, USA
- Sora Park Tanjasiri
- Chao Family Comprehensive Cancer Center, University of California, Irvine, Irvine, CA, USA
- Department of Health, Society and Behavior, University of California, Irvine, Irvine, CA, USA
- Fabiola Quintero-Rivera
- Department of Pathology and Laboratory Medicine, University of California, Irvine, Irvine, CA, USA
- Department of Pediatrics, University of California, Irvine, Irvine, CA, USA
- Marcus M Seldin
- Department of Biological Chemistry, University of California, Irvine, Irvine, CA, USA
- Chao Family Comprehensive Cancer Center, University of California, Irvine, Irvine, CA, USA
- Selma Masri
- Department of Biological Chemistry, University of California, Irvine, Irvine, CA, USA
- Chao Family Comprehensive Cancer Center, University of California, Irvine, Irvine, CA, USA
- Richard L Frock
- Division of Radiation and Cancer Biology, Department of Radiation Oncology, Stanford University School of Medicine, Stanford, CA, USA
- Angela G Fleischman
- Division of Hematology/Oncology, Department of Medicine, University of California, Irvine, Irvine, CA, USA
- Department of Biological Chemistry, University of California, Irvine, Irvine, CA, USA
- Chao Family Comprehensive Cancer Center, University of California, Irvine, Irvine, CA, USA
- Nicholas R Pannunzio
- Division of Hematology/Oncology, Department of Medicine, University of California, Irvine, Irvine, CA, USA.
- Department of Biological Chemistry, University of California, Irvine, Irvine, CA, USA.
- Chao Family Comprehensive Cancer Center, University of California, Irvine, Irvine, CA, USA.
18
Yuan N, Jia P. Comprehensive assessment of long-read sequencing platforms and calling algorithms for detection of copy number variation. Brief Bioinform 2024; 25:bbae441. [PMID: 39256200 PMCID: PMC11387058 DOI: 10.1093/bib/bbae441] [Received: 02/01/2024] [Revised: 07/09/2024] [Accepted: 08/25/2024] [Indexed: 09/12/2024] Open
Abstract
Copy number variations (CNVs) play pivotal roles in disease susceptibility and have been intensively investigated in human disease studies. Long-read sequencing technologies offer opportunities for comprehensive structural variation (SV) detection, and numerous methodologies have been developed recently. Consequently, there is a pressing need to assess these methods and aid researchers in selecting appropriate techniques for CNV detection using long-read sequencing. Hence, we conducted an evaluation of eight CNV calling methods across 22 datasets from nine publicly available samples and 15 simulated datasets, covering multiple sequencing platforms. The overall performance of CNV callers varied substantially and was influenced by the input dataset type, sequencing depth, and CNV type, among others. Specifically, the PacBio CCS sequencing platform outperformed PacBio CLR and Nanopore platforms regarding CNV detection recall rates. A sequencing depth of 10x demonstrated the capability to identify 85% of the CNVs detected in a 50x dataset. Moreover, deletions were more generally detectable than duplications. Among the eight benchmarked methods, cuteSV, Delly, pbsv, and Sniffles2 demonstrated superior accuracy, while SVIM exhibited high recall rates.
Affiliation(s)
- Na Yuan
- National Genomics Data Center, China National Center for Bioinformation, Beichen West Road, Chaoyang District, Beijing 100101, China
- Beijing Institute of Genomics, Chinese Academy of Sciences, Beichen West Road, Chaoyang District, Beijing 100101, China
- Peilin Jia
- National Genomics Data Center, China National Center for Bioinformation, Beichen West Road, Chaoyang District, Beijing 100101, China
- Beijing Institute of Genomics, Chinese Academy of Sciences, Beichen West Road, Chaoyang District, Beijing 100101, China
19
Haase MAB, Lazar-Stefanita L, Ólafsson G, Wudzinska A, Shen MJ, Truong DM, Boeke JD. macroH2A1 drives nucleosome dephasing and genome instability in histone humanized yeast. Cell Rep 2024; 43:114472. [PMID: 38990716 DOI: 10.1016/j.celrep.2024.114472] [Received: 05/01/2023] [Revised: 01/15/2024] [Accepted: 06/24/2024] [Indexed: 07/13/2024] Open
Abstract
In addition to replicative histones, eukaryotic genomes encode a repertoire of non-replicative variant histones, providing additional layers of structural and epigenetic regulation. Here, we systematically replace individual replicative human histones with non-replicative human variant histones using a histone replacement system in yeast. We show that variants H2A.J, TsH2B, and H3.5 complement their respective replicative counterparts. However, macroH2A1 fails to complement, and its overexpression is toxic in yeast, negatively interacting with yeast's native histones and kinetochore genes. To isolate yeast with macroH2A1 chromatin, we uncouple the effects of its macro and histone fold domains, revealing that both domains suffice to override native nucleosome positioning. Furthermore, both uncoupled constructs of macroH2A1 exhibit lower nucleosome occupancy, decreased short-range chromatin interactions (<20 kb), disrupted centromeric clustering, and increased chromosome instability. Our observations demonstrate that lack of a canonical histone H2A dramatically alters chromatin organization in yeast, leading to genome instability and substantial fitness defects.
Affiliation(s)
- Max A B Haase
- Institute for Systems Genetics and Department of Biochemistry and Molecular Pharmacology, NYU Langone Health, New York, NY 10016, USA; Vilcek Institute of Graduate Biomedical Sciences, NYU School of Medicine, New York, NY 10016, USA
- Luciana Lazar-Stefanita
- Institute for Systems Genetics and Department of Biochemistry and Molecular Pharmacology, NYU Langone Health, New York, NY 10016, USA
- Guðjón Ólafsson
- Institute for Systems Genetics and Department of Biochemistry and Molecular Pharmacology, NYU Langone Health, New York, NY 10016, USA
- Aleksandra Wudzinska
- Institute for Systems Genetics and Department of Biochemistry and Molecular Pharmacology, NYU Langone Health, New York, NY 10016, USA
- Michael J Shen
- Institute for Systems Genetics and Department of Biochemistry and Molecular Pharmacology, NYU Langone Health, New York, NY 10016, USA
- David M Truong
- Department of Biomedical Engineering, NYU Tandon School of Engineering, Brooklyn, NY 11201, USA; Department of Pathology, NYU Langone Health, New York, NY 10016, USA
- Jef D Boeke
- Institute for Systems Genetics and Department of Biochemistry and Molecular Pharmacology, NYU Langone Health, New York, NY 10016, USA; Department of Biomedical Engineering, NYU Tandon School of Engineering, Brooklyn, NY 11201, USA.
20
Junjun R, Zhengqian Z, Ying W, Jialiang W, Yongzhuang L. A comprehensive review of deep learning-based variant calling methods. Brief Funct Genomics 2024; 23:303-313. [PMID: 38366908 DOI: 10.1093/bfgp/elae003] [Received: 12/01/2023] [Revised: 01/14/2024] [Accepted: 01/18/2023] [Indexed: 02/18/2024] Open
Abstract
Genome sequencing data have become increasingly important in the field of personalized medicine and diagnosis. However, accurately detecting genomic variations remains a challenging task. Traditional variation detection methods rely on manual inspection or predefined rules, which can be time-consuming and prone to errors. Consequently, deep learning-based approaches for variation detection have gained attention due to their ability to automatically learn genomic features that distinguish between variants. In our review, we discuss the recent advancements in deep learning-based algorithms for detecting small variations and structural variations in genomic data, as well as their advantages and limitations.
Affiliation(s)
- Ren Junjun
- Harbin Institute of Technology, School of Computer Science and Technology, Harbin 150001, China
- Zhang Zhengqian
- Harbin Institute of Technology, School of Computer Science and Technology, Harbin 150001, China
- Wu Ying
- Harbin Institute of Technology, School of Computer Science and Technology, Harbin 150001, China
- Wang Jialiang
- Harbin Institute of Technology, School of Computer Science and Technology, Harbin 150001, China
- Liu Yongzhuang
- Harbin Institute of Technology, School of Computer Science and Technology, Harbin 150001, China
21
Liu Z, Xie Z, Li M. Comprehensive and deep evaluation of structural variation detection pipelines with third-generation sequencing data. Genome Biol 2024; 25:188. [PMID: 39010145 PMCID: PMC11247875 DOI: 10.1186/s13059-024-03324-5] [Received: 06/27/2023] [Accepted: 06/26/2024] [Indexed: 07/17/2024] Open
Abstract
BACKGROUND: Structural variation (SV) detection methods using third-generation sequencing data are widely employed, yet accurately detecting SVs remains challenging. Different methods often yield inconsistent results for certain SV types, complicating tool selection and revealing biases in detection. RESULTS: This study comprehensively evaluates 53 SV detection pipelines using simulated and real data from PacBio (CLR: Continuous Long Read, CCS: Circular Consensus Sequencing) and Nanopore (ONT) platforms. We assess their performance in detecting various sizes and types of SVs, breakpoint biases, and genotyping accuracy with various sequencing depths. Notably, pipelines such as Minimap2-cuteSV2, NGMLR-SVIM, PBMM2-pbsv, Winnowmap-Sniffles2, and Winnowmap-SVision exhibit comparatively higher recall and precision. Our findings also show that combining multiple pipelines with the same aligner, like pbmm2 or winnowmap, can significantly enhance performance. The individual pipelines' detailed ranking and performance metrics can be viewed in a dynamic table: http://pmglab.top/SVPipelinesRanking. CONCLUSIONS: This study comprehensively characterizes the strengths and weaknesses of numerous pipelines, providing valuable insights that can improve SV detection in third-generation sequencing data and inform SV annotation and function prediction.
Affiliation(s)
- Zhi Liu
- Program in Bioinformatics, Zhongshan School of Medicine, The Fifth Affiliated Hospital, Sun Yat-Sen University, Guangzhou, China
- Key Laboratory of Tropical Disease Control (Sun Yat-Sen University), Ministry of Education, Guangzhou, China
- Zhi Xie
- State Key Laboratory of Ophthalmology, Zhongshan Ophthalmic Center, Sun Yat-Sen University, Guangzhou, China
- Miaoxin Li
- Program in Bioinformatics, Zhongshan School of Medicine, The Fifth Affiliated Hospital, Sun Yat-Sen University, Guangzhou, China.
- Key Laboratory of Tropical Disease Control (Sun Yat-Sen University), Ministry of Education, Guangzhou, China.
- Center for Precision Medicine, Sun Yat-Sen University, Guangzhou, China.
- Department of Psychiatry, The University of Hong Kong, Hong Kong, SAR, China.
- Guangdong Provincial Key Laboratory of Biomedical Imaging and Guangdong Provincial Engineering Research Center of Molecular Imaging, The Fifth Affiliated Hospital, Sun Yat-Sen University, Zhuhai, China.
22
Zhang Z, Liu Y, Li X, Liu Y, Wang Y, Jiang T. HapKled: a haplotype-aware structural variant calling approach for Oxford nanopore sequencing data. Front Genet 2024; 15:1435087. [PMID: 39045321 PMCID: PMC11263161 DOI: 10.3389/fgene.2024.1435087] [Received: 05/24/2024] [Accepted: 06/13/2024] [Indexed: 07/25/2024] Open
Abstract
Introduction: Structural Variants (SVs) are a type of variation that can significantly influence phenotypes and cause diseases. Thus, the accurate detection of SVs is a vital part of modern genetic analysis. The advent of long-read sequencing technology ushers in a new era of more accurate and comprehensive SV calling, and many tools have been developed to call SVs using long-read data. Haplotype-tagging is a procedure that can tag haplotype information on reads and can thus potentially improve SV detection; nevertheless, few methods make use of this information. In this article, we introduce HapKled, a new SV detection tool that can accurately detect SVs from Oxford Nanopore Technologies (ONT) long-read alignment data. Methods: HapKled utilizes haplotype information underlying alignment data by conducting haplotype-tagging using WhatsHap on the reads to improve the detection performance, with three unique calling mechanics including altering clustering conditions according to haplotype information of signatures, determination of similar SVs based on haplotype information, and slack filtering conditions based on haplotype quality. Results: In our evaluations, HapKled outperformed state-of-the-art tools and can deliver better SV detection results on both simulated and real sequencing data. The code and experiments of HapKled can be obtained from https://github.com/CoREse/HapKled. Discussion: With the superb SV detection performance that HapKled can deliver, HapKled could be useful in bioinformatics research, clinical diagnosis, and medical research and development.
Affiliation(s)
- Zhendong Zhang
- Faculty of Computing, Harbin Institute of Technology, Harbin, Heilongjiang, China
- Yue Liu
- Faculty of Computing, Harbin Institute of Technology, Harbin, Heilongjiang, China
- Xin Li
- Faculty of Computing, Harbin Institute of Technology, Harbin, Heilongjiang, China
- Yadong Liu
- Faculty of Computing, Harbin Institute of Technology, Harbin, Heilongjiang, China
- Zhengzhou Research Institute, Harbin Institute of Technology, Zhengzhou, Henan, China
- Yadong Wang
- Faculty of Computing, Harbin Institute of Technology, Harbin, Heilongjiang, China
- Zhengzhou Research Institute, Harbin Institute of Technology, Zhengzhou, Henan, China
- Tao Jiang
- Faculty of Computing, Harbin Institute of Technology, Harbin, Heilongjiang, China
- Zhengzhou Research Institute, Harbin Institute of Technology, Zhengzhou, Henan, China
23
Bai X, Chen Z, Chen K, Wu Z, Wang R, Liu J, Chang L, Wen L, Tang F. Simultaneous de novo calling and phasing of genetic variants at chromosome-scale using NanoStrand-seq. Cell Discov 2024; 10:74. [PMID: 38977679 PMCID: PMC11231365 DOI: 10.1038/s41421-024-00694-9] [Received: 06/09/2023] [Accepted: 05/23/2024] [Indexed: 07/10/2024] Open
Abstract
The completion of the first telomere-to-telomere human genome assembly, T2T-CHM13, marked a milestone in achieving completeness of the human reference genome. The upcoming era of genome study will focus on fully phased diploid genome assembly, with an emphasis on genetic differences between individual haplotypes. Most existing sequencing approaches only achieve localized haplotype phasing and rely on additional pedigree information for further whole-chromosome scale phasing. The short-read-based Strand-seq method is able to directly phase single nucleotide polymorphisms (SNPs) at whole-chromosome scale but falls short when it comes to phasing structural variations (SVs). To address this issue, we developed a Nanopore sequencing platform-based Strand-seq approach, which we named NanoStrand-seq. This method allowed for de novo SNP calling with high precision (99.52%) and achieved a superior phasing accuracy (0.02% Hamming error rate) at whole-chromosome scale, a level of performance comparable to Strand-seq for haplotype phasing of the GM12878 genome. Importantly, we demonstrated that NanoStrand-seq can efficiently resolve the MHC locus, a highly polymorphic genomic region. Moreover, NanoStrand-seq enabled independent direct calling and phasing of deletions and insertions at whole-chromosome level; when applied to long genomic regions of SNP homozygosity, it outperformed the strategy that combined Strand-seq with bulk long-read sequencing. Finally, we showed that, like Strand-seq, NanoStrand-seq was also applicable to primary cultured cells. Together, we provide a novel methodology that enables interrogation of a full spectrum of haplotype-resolved SNPs and SVs at whole-chromosome scale, with broad applications for species with diploid or even potentially polyploid genomes.
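For readers unfamiliar with the Hamming error rate quoted above, the following minimal sketch shows one common way to compute it. The abstract does not give the paper's exact definition, so the function below is an assumption: haplotypes are encoded as strings of `0`/`1` alleles at heterozygous sites, and the rate is the smaller of the mismatch fractions against the truth haplotype and its complement, because haplotype labels are arbitrary.

```python
def hamming_error_rate(predicted: str, truth: str) -> float:
    """Fraction of heterozygous sites phased differently from the truth.

    Both arguments are strings over {'0', '1'}, one character per
    heterozygous site, describing one haplotype of a diploid sample.
    """
    if len(predicted) != len(truth):
        raise ValueError("haplotypes must cover the same sites")
    if not truth:
        raise ValueError("at least one site is required")
    mismatches = sum(p != t for p, t in zip(predicted, truth))
    # Comparing against the complementary haplotype flips every site,
    # so its mismatch count is simply the number of matches.
    flipped = len(truth) - mismatches
    return min(mismatches, flipped) / len(truth)


rate = hamming_error_rate("0101100110", "0101100111")
print(rate)  # 0.1 (one of ten sites differs)
```

Under this definition, a reported rate of 0.02% would correspond to roughly two wrongly phased sites per 10,000 heterozygous sites.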
Affiliation(s)
- Xiuzhen Bai
- Biomedical Pioneering Innovation Center (BIOPIC), Peking University, Beijing, China
- Beijing Advanced Innovation Center for Genomics (ICG), Ministry of Education Key Laboratory of Cell Proliferation and Differentiation, Beijing, China
- Changping Laboratory, Beijing, China
| | - Zonggui Chen
- Biomedical Pioneering Innovation Center (BIOPIC), Peking University, Beijing, China
- Changping Laboratory, Beijing, China
- Peking-Tsinghua Center for Life Sciences, Academy for Advanced Interdisciplinary Studies, Peking University, Beijing, China
| | - Kexuan Chen
- Biomedical Pioneering Innovation Center (BIOPIC), Peking University, Beijing, China
- School of Life Sciences, Peking University, Beijing, China
| | - Zixin Wu
- Biomedical Pioneering Innovation Center (BIOPIC), Peking University, Beijing, China
- Peking-Tsinghua Center for Life Sciences, Academy for Advanced Interdisciplinary Studies, Peking University, Beijing, China
| | - Rui Wang
- Department of Medicine, Cancer Institute, Stanford University, Stanford, CA, USA
| | - Jun'e Liu
- Biomedical Pioneering Innovation Center (BIOPIC), Peking University, Beijing, China
- Beijing Advanced Innovation Center for Genomics (ICG), Ministry of Education Key Laboratory of Cell Proliferation and Differentiation, Beijing, China
- Changping Laboratory, Beijing, China
- School of Life Sciences, Peking University, Beijing, China
| | - Liang Chang
- State Key Laboratory of Female Fertility Promotion, Center for Reproductive Medicine, Department of Obstetrics and Gynecology, Peking University Third Hospital, Beijing, China
- National Clinical Research Center for Obstetrics and Gynecology (Peking University Third Hospital), Beijing, China
- Key Laboratory of Assisted Reproduction (Peking University), Ministry of Education Beijing, Beijing, China
- Key Laboratory of Reproductive Endocrinology and Assisted Reproductive Technology, Beijing, China
| | - Lu Wen
- Biomedical Pioneering Innovation Center (BIOPIC), Peking University, Beijing, China
- Beijing Advanced Innovation Center for Genomics (ICG), Ministry of Education Key Laboratory of Cell Proliferation and Differentiation, Beijing, China
- Changping Laboratory, Beijing, China
| | - Fuchou Tang
- Biomedical Pioneering Innovation Center (BIOPIC), Peking University, Beijing, China.
- Beijing Advanced Innovation Center for Genomics (ICG), Ministry of Education Key Laboratory of Cell Proliferation and Differentiation, Beijing, China.
- Changping Laboratory, Beijing, China.
- Peking-Tsinghua Center for Life Sciences, Academy for Advanced Interdisciplinary Studies, Peking University, Beijing, China.
- School of Life Sciences, Peking University, Beijing, China.
|
24
|
She H, Liu Z, Xu Z, Zhang H, Wu J, Wang X, Cheng F, Charlesworth D, Qian W. Insights into spinach domestication from genome sequences of two wild spinach progenitors, Spinacia turkestanica and Spinacia tetrandra. The New Phytologist 2024; 243:477-494. [PMID: 38715078 DOI: 10.1111/nph.19799] [Received: 02/12/2024] [Accepted: 04/18/2024] [Indexed: 06/07/2024]
Abstract
Cultivated spinach (Spinacia oleracea) is a dioecious species. We report high-quality genome sequences for its two closest wild relatives, Spinacia turkestanica and Spinacia tetrandra, which are also dioecious, and are used to study the genetics of spinach domestication. Using a combination of genomic approaches, we assembled genomes of both these species and analyzed them in comparison with the previously assembled S. oleracea genome. These species diverged c. 6.3 million years ago (Ma), while cultivated spinach split from S. turkestanica 0.8 Ma. In all three species, all six chromosomes include very large gene-poor, repeat-rich regions, which, in S. oleracea, are pericentromeric regions with very low recombination rates in both male and female genetic maps. We describe population genomic evidence that the similar regions in the wild species also recombine rarely. We characterized 282 structural variants (SVs) that have been selected during domestication. These regions include genes associated with leaf margin type and flowering time. We also describe evidence that the downy mildew resistance loci of cultivated spinach are derived from introgression from both wild spinach species. Collectively, this study reveals the genome architecture of spinach assemblies and highlights the importance of SVs during the domestication of cultivated spinach.
Affiliation(s)
- Hongbing She
- State Key Laboratory of Vegetable Biobreeding, Institute of Vegetables and Flowers, Chinese Academy of Agricultural Sciences, Beijing, 100081, China
| | - Zhiyuan Liu
- State Key Laboratory of Vegetable Biobreeding, Institute of Vegetables and Flowers, Chinese Academy of Agricultural Sciences, Beijing, 100081, China
| | - Zhaosheng Xu
- State Key Laboratory of Vegetable Biobreeding, Institute of Vegetables and Flowers, Chinese Academy of Agricultural Sciences, Beijing, 100081, China
| | - Helong Zhang
- State Key Laboratory of Vegetable Biobreeding, Institute of Vegetables and Flowers, Chinese Academy of Agricultural Sciences, Beijing, 100081, China
| | - Jian Wu
- State Key Laboratory of Vegetable Biobreeding, Institute of Vegetables and Flowers, Chinese Academy of Agricultural Sciences, Beijing, 100081, China
| | - Xiaowu Wang
- State Key Laboratory of Vegetable Biobreeding, Institute of Vegetables and Flowers, Chinese Academy of Agricultural Sciences, Beijing, 100081, China
| | - Feng Cheng
- State Key Laboratory of Vegetable Biobreeding, Institute of Vegetables and Flowers, Chinese Academy of Agricultural Sciences, Beijing, 100081, China
| | - Deborah Charlesworth
- Institute of Ecology and Evolution, School of Biological Sciences, University of Edinburgh, Charlotte Auerbach Road, Edinburgh, EH9 3FL, UK
| | - Wei Qian
- State Key Laboratory of Vegetable Biobreeding, Institute of Vegetables and Flowers, Chinese Academy of Agricultural Sciences, Beijing, 100081, China
|
25
|
Zhang T, Peng W, Xiao H, Cao S, Chen Z, Su X, Luo Y, Liu Z, Peng Y, Yang X, Jiang GF, Xu X, Ma Z, Zhou Y. Population genomics highlights structural variations in local adaptation to saline coastal environments in woolly grape. Journal of Integrative Plant Biology 2024; 66:1408-1426. [PMID: 38578160 DOI: 10.1111/jipb.13653] [Received: 09/27/2023] [Accepted: 03/04/2024] [Indexed: 04/06/2024]
Abstract
Structural variations (SVs) are a feature of plant genomes that has been largely unexplored despite their significant impact on plant phenotypic traits and local adaptation to abiotic and biotic stress. In this study, we employed woolly grape (Vitis retordii), a species native to the tropical and subtropical regions of East Asia with both coastal and inland habitats, as a valuable model for examining the impact of SVs on local adaptation. We assembled a haplotype-resolved chromosomal reference genome for woolly grape, and conducted population genetic analyses based on whole-genome sequencing (WGS) data from coastal and inland populations. The demographic analyses revealed recent bottlenecks in all populations and asymmetric gene flow from the inland to the coastal population. In total, 1,035 genes associated with plant adaptive regulation for salt stress, radiation, and environmental adaptation were detected underlying local selection by SVs and SNPs in the coastal population, of which 37.29% and 65.26% were detected by SVs and SNPs, respectively. Candidate genes such as FSD2, RGA1, and AAP8 associated with salt tolerance were found to be highly differentiated and selected during the process of local adaptation to coastal habitats in SV regions. Our study highlights the importance of SVs in local adaptation; candidate genes related to salt stress and climatic adaptation to tropical and subtropical environments are important genomic resources for future breeding programs of grapevine and its rootstocks.
Affiliation(s)
- Tianhao Zhang
- National Key Laboratory of Tropical Crop Breeding, Shenzhen Branch, Guangdong Laboratory of Lingnan Modern Agriculture, Key Laboratory of Synthetic Biology, Ministry of Agriculture and Rural Affairs, Agricultural Genomics Institute at Shenzhen, Chinese Academy of Agricultural Sciences, Shenzhen, 518000, China
- State Key Laboratory for Conservation and Utilization of Subtropical Agro-Bioresources, Guangxi University, Nanning, 530004, China
- Guangxi Key Laboratory of Forest Ecology and Conservation, Guangxi Colleges and Universities Key Laboratory for Cultivation and Utilization of Subtropical Forest Plantation, College of Forestry, Guangxi University, Nanning, 530004, China
- College of Informatics, Huazhong Agricultural University, Wuhan, 430070, China
| | - Wenjing Peng
- National Key Laboratory of Tropical Crop Breeding, Shenzhen Branch, Guangdong Laboratory of Lingnan Modern Agriculture, Key Laboratory of Synthetic Biology, Ministry of Agriculture and Rural Affairs, Agricultural Genomics Institute at Shenzhen, Chinese Academy of Agricultural Sciences, Shenzhen, 518000, China
- State Key Laboratory for Conservation and Utilization of Subtropical Agro-Bioresources, Guangxi University, Nanning, 530004, China
- Guangxi Key Laboratory of Sugarcane Biology, College of Agriculture, Guangxi University, Nanning, 530004, China
| | - Hua Xiao
- National Key Laboratory of Tropical Crop Breeding, Shenzhen Branch, Guangdong Laboratory of Lingnan Modern Agriculture, Key Laboratory of Synthetic Biology, Ministry of Agriculture and Rural Affairs, Agricultural Genomics Institute at Shenzhen, Chinese Academy of Agricultural Sciences, Shenzhen, 518000, China
| | - Shuo Cao
- National Key Laboratory of Tropical Crop Breeding, Shenzhen Branch, Guangdong Laboratory of Lingnan Modern Agriculture, Key Laboratory of Synthetic Biology, Ministry of Agriculture and Rural Affairs, Agricultural Genomics Institute at Shenzhen, Chinese Academy of Agricultural Sciences, Shenzhen, 518000, China
- Key Laboratory of Horticultural Plant Biology Ministry of Education, Huazhong Agricultural University, Wuhan, 430070, China
| | - Zhuyifu Chen
- National Key Laboratory of Tropical Crop Breeding, Shenzhen Branch, Guangdong Laboratory of Lingnan Modern Agriculture, Key Laboratory of Synthetic Biology, Ministry of Agriculture and Rural Affairs, Agricultural Genomics Institute at Shenzhen, Chinese Academy of Agricultural Sciences, Shenzhen, 518000, China
| | - Xiangnian Su
- National Key Laboratory of Tropical Crop Breeding, Shenzhen Branch, Guangdong Laboratory of Lingnan Modern Agriculture, Key Laboratory of Synthetic Biology, Ministry of Agriculture and Rural Affairs, Agricultural Genomics Institute at Shenzhen, Chinese Academy of Agricultural Sciences, Shenzhen, 518000, China
- State Key Laboratory for Conservation and Utilization of Subtropical Agro-Bioresources, Guangxi University, Nanning, 530004, China
- Guangxi Key Laboratory of Forest Ecology and Conservation, Guangxi Colleges and Universities Key Laboratory for Cultivation and Utilization of Subtropical Forest Plantation, College of Forestry, Guangxi University, Nanning, 530004, China
| | - Yuanyuan Luo
- Zhengzhou Fruit Research Institute, Chinese Academy of Agricultural Sciences, Zhengzhou, 450009, China
| | - Zhongjie Liu
- National Key Laboratory of Tropical Crop Breeding, Shenzhen Branch, Guangdong Laboratory of Lingnan Modern Agriculture, Key Laboratory of Synthetic Biology, Ministry of Agriculture and Rural Affairs, Agricultural Genomics Institute at Shenzhen, Chinese Academy of Agricultural Sciences, Shenzhen, 518000, China
| | - Yanling Peng
- National Key Laboratory of Tropical Crop Breeding, Shenzhen Branch, Guangdong Laboratory of Lingnan Modern Agriculture, Key Laboratory of Synthetic Biology, Ministry of Agriculture and Rural Affairs, Agricultural Genomics Institute at Shenzhen, Chinese Academy of Agricultural Sciences, Shenzhen, 518000, China
| | - Xiping Yang
- State Key Laboratory for Conservation and Utilization of Subtropical Agro-Bioresources, Guangxi University, Nanning, 530004, China
- Guangxi Key Laboratory of Sugarcane Biology, College of Agriculture, Guangxi University, Nanning, 530004, China
| | - Guo-Feng Jiang
- State Key Laboratory for Conservation and Utilization of Subtropical Agro-Bioresources, Guangxi University, Nanning, 530004, China
- Guangxi Key Laboratory of Forest Ecology and Conservation, Guangxi Colleges and Universities Key Laboratory for Cultivation and Utilization of Subtropical Forest Plantation, College of Forestry, Guangxi University, Nanning, 530004, China
| | - Xiaodong Xu
- National Key Laboratory of Tropical Crop Breeding, Shenzhen Branch, Guangdong Laboratory of Lingnan Modern Agriculture, Key Laboratory of Synthetic Biology, Ministry of Agriculture and Rural Affairs, Agricultural Genomics Institute at Shenzhen, Chinese Academy of Agricultural Sciences, Shenzhen, 518000, China
| | - Zhiyao Ma
- National Key Laboratory of Tropical Crop Breeding, Shenzhen Branch, Guangdong Laboratory of Lingnan Modern Agriculture, Key Laboratory of Synthetic Biology, Ministry of Agriculture and Rural Affairs, Agricultural Genomics Institute at Shenzhen, Chinese Academy of Agricultural Sciences, Shenzhen, 518000, China
| | - Yongfeng Zhou
- National Key Laboratory of Tropical Crop Breeding, Shenzhen Branch, Guangdong Laboratory of Lingnan Modern Agriculture, Key Laboratory of Synthetic Biology, Ministry of Agriculture and Rural Affairs, Agricultural Genomics Institute at Shenzhen, Chinese Academy of Agricultural Sciences, Shenzhen, 518000, China
- National Key Laboratory of Tropical Crop Breeding, Tropical Crops Genetic Resources Institute, Chinese Academy of Tropical Agricultural Sciences, Haikou, 571101, China
|
26
|
Srivastav SP, Feschotte C, Clark AG. Rapid evolution of piRNA clusters in the Drosophila melanogaster ovary. Genome Res 2024; 34:711-724. [PMID: 38749655 PMCID: PMC11216404 DOI: 10.1101/gr.278062.123] [Received: 05/08/2023] [Accepted: 05/07/2024] [Indexed: 05/28/2024]
Abstract
The piRNA pathway is a highly conserved mechanism to repress transposable element (TE) activity in the animal germline via a specialized class of small RNAs called piwi-interacting RNAs (piRNAs). piRNAs are produced from discrete genomic regions called piRNA clusters (piCs). Although the molecular processes by which piCs function are relatively well understood in Drosophila melanogaster, much less is known about the origin and evolution of piCs in this or any other species. To investigate piC origin and evolution, we use a population genomic approach to compare piC activity and sequence composition across eight geographically distant strains of D. melanogaster with high-quality long-read genome assemblies. We perform annotations of ovary piCs and genome-wide TE content in each strain. Our analysis uncovers extensive variation in piC activity across strains and signatures of rapid birth and death of piCs. Most TEs inferred to be recently active show an enrichment of insertions into old and large piCs, consistent with the previously proposed "trap" model of piC evolution. In contrast, a small subset of active LTR families is enriched for the formation of new piCs, suggesting that these TEs have higher proclivity to form piCs. Thus, our findings uncover processes leading to the origin of piCs. We propose that piC evolution begins with the emergence of piRNAs from individual insertions of a few select TE families prone to seed new piCs that subsequently expand by accretion of insertions from most other TE families during evolution to form larger "trap" clusters. Our study shows that TEs themselves are the major force driving the rapid evolution of piCs.
Affiliation(s)
- Satyam P Srivastav
- Department of Molecular Biology and Genetics, Cornell University, Ithaca, New York 14853, USA
| | - Cédric Feschotte
- Department of Molecular Biology and Genetics, Cornell University, Ithaca, New York 14853, USA
| | - Andrew G Clark
- Department of Molecular Biology and Genetics, Cornell University, Ithaca, New York 14853, USA
|
27
|
Schmidt M, Guerreiro R, Baig N, Habekuß A, Will T, Ruckwied B, Stich B. Fine mapping a QTL for BYDV-PAV resistance in maize. TAG. Theoretical and Applied Genetics. Theoretische und Angewandte Genetik 2024; 137:163. [PMID: 38896149 PMCID: PMC11186928 DOI: 10.1007/s00122-024-04668-z] [Received: 01/13/2024] [Accepted: 06/01/2024] [Indexed: 06/21/2024]
Abstract
Barley yellow dwarf (BYD) is one of the most economically important virus diseases of cereals worldwide, causing yield losses of up to 80%. The means to control BYD are limited, and the use of genetically resistant cultivars is the most economical and environmentally friendly approach. The objectives of this study were i) to identify the causative gene for BYD virus (BYDV)-PAV resistance in maize, ii) to identify single nucleotide polymorphisms and/or structural variations in the gene sequences, which may cause differing susceptibilities to BYDV-PAV of maize inbreds, and iii) to characterize the effect of BYDV-PAV infection on gene expression of susceptible, tolerant, and resistant maize inbreds. Using two biparental mapping populations, we narrowed a previously published quantitative trait locus for BYDV-PAV resistance in maize to ~0.3 Mbp, comprising nine genes. Association mapping and gene expression analysis further reduced the number of candidate genes for BYDV-PAV resistance in maize to two: Zm00001eb428010 and Zm00001eb428020. The predicted functions of these genes suggest that they confer BYDV-PAV resistance either by interfering with virus replication or by inducing reactive oxygen species signaling. The gene sequence of Zm00001eb428010 is affected by a 54 bp deletion in the 5′-UTR and a protein-altering variant in BYDV-PAV-resistant maize inbreds but not in BYDV-PAV-susceptible and -tolerant inbreds. This finding suggests that altered abundance and/or properties of the proteins encoded by Zm00001eb428010 may lead to BYDV-PAV resistance.
Affiliation(s)
- Maria Schmidt
- Institute for Quantitative Genetics and Genomics of Plants, Heinrich Heine University, Düsseldorf, Germany
| | - Ricardo Guerreiro
- Institute for Quantitative Genetics and Genomics of Plants, Heinrich Heine University, Düsseldorf, Germany
| | - Nadia Baig
- Institute for Quantitative Genetics and Genomics of Plants, Heinrich Heine University, Düsseldorf, Germany
| | - Antje Habekuß
- Federal Research Center for Cultivated Plants, Institute for Resistance Research and Stress Tolerance, Julius-Kühn Institute, Quedlinburg, Germany
| | - Torsten Will
- Federal Research Center for Cultivated Plants, Institute for Resistance Research and Stress Tolerance, Julius-Kühn Institute, Quedlinburg, Germany
| | - Britta Ruckwied
- Federal Research Center for Cultivated Plants, Institute for Resistance Research and Stress Tolerance, Julius-Kühn Institute, Quedlinburg, Germany
| | - Benjamin Stich
- Institute for Quantitative Genetics and Genomics of Plants, Heinrich Heine University, Düsseldorf, Germany.
- Cluster of Excellence On Plant Sciences, From Complex Traits Towards Synthetic Modules, Heinrich Heine University, Düsseldorf, Germany.
- Federal Research Center for Cultivated Plants, Institute for Breeding Research On Agricultural Crops, Julius-Kühn Institute, Sanitz, Germany.
|
28
|
Daida K, Yoshino H, Malik L, Baker B, Ishiguro M, Genner R, Paquette K, Li Y, Nishioka K, Masuzugawa S, Hirano M, Takahashi K, Kolmogolv M, Billingsley KJ, Funayama M, Blauwendraat C, Hattori N. The Utility of Long-Read Sequencing in Diagnosing Genetic Autosomal Recessive Parkinson's Disease: a genetic screening study. medRxiv: the preprint server for health sciences 2024:2024.06.14.24308784. [PMID: 39108517 PMCID: PMC11302705 DOI: 10.1101/2024.06.14.24308784] [Indexed: 08/10/2024]
Abstract
Background: Mutations within the genes PRKN and PINK1 are the leading cause of early-onset autosomal recessive Parkinson's disease (PD). However, the genetic cause of most early-onset PD (EOPD) cases remains unresolved. Long-read sequencing has successfully identified many pathogenic structural variants that cause disease, but this technology has not been widely applied to PD. We recently identified the genetic cause of EOPD in a pair of monozygotic twins by uncovering a complex structural variant that spans over 7 Mb, using Oxford Nanopore Technologies (ONT) long-read sequencing. In this study, we aimed to expand on this and assess whether a second variant could be detected with ONT long-read sequencing in other unresolved EOPD cases reported to carry one heterozygous variant in PRKN or PINK1. Methods: ONT long-read sequencing was performed on patients with one reported PRKN/PINK1 pathogenic variant. EOPD patients with an age at onset younger than 50 were included in this study. As a positive control, we also included EOPD patients who had already been identified to carry two known PRKN pathogenic variants. Initial genetic testing was performed using short-read targeted panel sequencing for single nucleotide variants and multiplex ligation-dependent probe amplification (MLPA) for copy number variants. Results: 48 patients were included in this study (PRKN "one-variant" n = 24, PINK1 "one-variant" n = 12, PRKN "two-variants" n = 12). Using ONT long-read sequencing, we detected a second pathogenic variant in six PRKN "one-variant" patients (26%, 6/23) but none in the PINK1 "one-variant" patients (0%, 0/12). Long-read sequencing identified one case with a complex inversion, two instances of structural variant overlap, and three cases of duplication. In addition, in the positive control PRKN "two-variants" group, we were able to identify both pathogenic variants in PRKN in all the patients (100%, 12/12).
Conclusions: These data highlight that ONT long-read sequencing is a powerful tool for identifying pathogenic structural variants at the PRKN locus that are often missed by conventional methods. Therefore, in EOPD cases where conventional methods fail to detect a second variant, long-read sequencing should be considered as an alternative and complementary approach.
Affiliation(s)
- Kensuke Daida
- Integrative Neurogenomics Unit, Laboratory of Neurogenetics, National Institute on Aging, National Institutes of Health, Bethesda, MD, USA
- Center for Alzheimer's and Related Dementias (CARD), National Institute on Aging and National Institute of Neurological Disorders and Stroke, National Institutes of Health, Bethesda, MD, USA
- Department of Neurology, Faculty of Medicine, Juntendo University, Tokyo, Japan
| | - Hiroyo Yoshino
- Research Institute for Diseases of Old Age, Graduate School of Medicine, Juntendo University, Tokyo, Japan
| | - Laksh Malik
- Center for Alzheimer's and Related Dementias (CARD), National Institute on Aging and National Institute of Neurological Disorders and Stroke, National Institutes of Health, Bethesda, MD, USA
| | - Breeana Baker
- Center for Alzheimer's and Related Dementias (CARD), National Institute on Aging and National Institute of Neurological Disorders and Stroke, National Institutes of Health, Bethesda, MD, USA
| | - Mayu Ishiguro
- Department of Neurology, Faculty of Medicine, Juntendo University, Tokyo, Japan
| | - Rylee Genner
- Center for Alzheimer's and Related Dementias (CARD), National Institute on Aging and National Institute of Neurological Disorders and Stroke, National Institutes of Health, Bethesda, MD, USA
| | - Kimberly Paquette
- Center for Alzheimer's and Related Dementias (CARD), National Institute on Aging and National Institute of Neurological Disorders and Stroke, National Institutes of Health, Bethesda, MD, USA
| | - Yuanzhe Li
- Department of Neurology, Faculty of Medicine, Juntendo University, Tokyo, Japan
- Department of Diagnosis, Prevention and Treatment of Dementia, Graduate School of Medicine, Juntendo University, Tokyo, Japan
| | - Kenya Nishioka
- Department of Neurology, Juntendo Tokyo Koto Geriatric Medical Center, Koto-ku, Tokyo, Japan
| | | | - Makito Hirano
- Department of Neurology, Kindai University Faculty of Medicine, Osaka, Japan
| | - Kenta Takahashi
- Division of Neurology and Gerontology, Department of Internal Medicine, School of Medicine, Iwate Medical University, Morioka, Japan
| | - Mikhail Kolmogolv
- Center for Cancer Research, National Cancer Institute, National Institutes of Health, Bethesda, MD, USA
| | - Kimberley J Billingsley
- Molecular Genetics Section, Laboratory of Neurogenetics, National Institute on Aging, National Institutes of Health, Bethesda, MD, USA
| | - Manabu Funayama
- Department of Neurology, Faculty of Medicine, Juntendo University, Tokyo, Japan
- Research Institute for Diseases of Old Age, Graduate School of Medicine, Juntendo University, Tokyo, Japan
| | - Cornelis Blauwendraat
- Integrative Neurogenomics Unit, Laboratory of Neurogenetics, National Institute on Aging, National Institutes of Health, Bethesda, MD, USA
- Center for Alzheimer's and Related Dementias (CARD), National Institute on Aging and National Institute of Neurological Disorders and Stroke, National Institutes of Health, Bethesda, MD, USA
| | - Nobutaka Hattori
- Department of Neurology, Faculty of Medicine, Juntendo University, Tokyo, Japan
- Research Institute for Diseases of Old Age, Graduate School of Medicine, Juntendo University, Tokyo, Japan
- Neurodegenerative Disorders Collaborative Laboratory, RIKEN Center for Brain Science, Wako, Saitama, Japan
|
29
|
Pan C, Reinert K. Leaf: an ultrafast filter for population-scale long-read SV detection. Genome Biol 2024; 25:155. [PMID: 38872200 PMCID: PMC11170821 DOI: 10.1186/s13059-024-03297-5] [Received: 11/28/2022] [Accepted: 06/04/2024] [Indexed: 06/15/2024] Open
Abstract
Advances in sequencing technology have facilitated population-scale long-read structural variant (SV) detection. Arguably, one of the main challenges in population-scale analysis is developing effective computational pipelines. Here, we present a new filter-based pipeline for population-scale long-read SV detection. It better captures SV signals at an early stage than conventional assembly-based or alignment-based pipelines. Assessments in this work suggest that the filter-based pipeline helps better resolve intra-read rearrangements. Moreover, it is also more computationally efficient than conventional pipelines and thus may facilitate population-scale long-read applications.
Affiliation(s)
- Chenxu Pan
- Department of Mathematics and Computer Science, Freie Universität Berlin, Takustr. 9, 14195, Berlin, Germany.
| | - Knut Reinert
- Department of Mathematics and Computer Science, Freie Universität Berlin, Takustr. 9, 14195, Berlin, Germany
- Department of Computational Molecular Biology, Max Planck Institute for Molecular Genetics, Berlin, 14195, Germany
|
30
|
Lautenschläger N, Schmidt K, Schiffer C, Wulff TF, Hahnke K, Finstermeier K, Mansour M, Elsholz AKW, Charpentier E. Expanding the genetic toolbox for the obligate human pathogen Streptococcus pyogenes. Front Bioeng Biotechnol 2024; 12:1395659. [PMID: 38911550 PMCID: PMC11190166 DOI: 10.3389/fbioe.2024.1395659] [Received: 03/04/2024] [Accepted: 05/06/2024] [Indexed: 06/25/2024] Open
Abstract
Genetic tools form the basis for the study of molecular mechanisms. Despite many recent advances in the field of genetic engineering in bacteria, genetic toolsets remain scarce for non-model organisms, such as the obligate human pathogen Streptococcus pyogenes. To overcome this limitation and enable the straightforward investigation of gene functions in S. pyogenes, we have developed a comprehensive genetic toolset. By adapting and combining different tools previously applied in other Gram-positive bacteria, we have created new replicative and integrative plasmids for gene expression and genetic manipulation, constitutive and inducible promoters, as well as fluorescence reporters for S. pyogenes. The new replicative plasmids feature low- and high-copy replicons combined with different resistance cassettes and a standardized multiple cloning site for rapid cloning procedures. We designed site-specific integrative plasmids and verified their integration by nanopore sequencing. To minimize the effect of plasmid integration on bacterial physiology, we screened publicly available RNA-sequencing datasets for transcriptionally silent sites. We validated this approach by designing the integrative plasmid pSpy0K6 targeting the transcriptionally silent gene SPy_1078. Analysis of the activity of different constitutive promoters indicated a wide variety of strengths, with the lactococcal promoter P23 showing the strongest activity and the synthetic promoter PxylS2 showing the weakest activity. Further, we assessed the functionality of three inducible regulatory elements, including a zinc- and an IPTG-inducible promoter as well as an erythromycin-inducible riboswitch, that showed low-to-no background expression and high inducibility. Additionally, we demonstrated the applicability of two codon-optimized fluorescent proteins, mNeonGreen and mKate2, as reporters in S. pyogenes. We therefore adapted the chemically defined medium called RPMI4Spy that showed reduced autofluorescence and enabled efficient signal detection in plate reader assays and fluorescence microscopy. Finally, we developed a plasmid-based system for genome engineering in S. pyogenes featuring the counterselection marker pheS*, which enabled the scarless deletion of the sagB gene. This new toolbox simplifies previously laborious genetic manipulation procedures and lays the foundation for new methodologies to study gene functions in S. pyogenes, leading to a better understanding of its virulence mechanisms and physiology.
Affiliation(s)
- Katja Schmidt
- Max Planck Unit for the Science of Pathogens, Berlin, Germany
- Thomas F. Wulff
- Max Planck Unit for the Science of Pathogens, Berlin, Germany
- Karin Hahnke
- Max Planck Unit for the Science of Pathogens, Berlin, Germany
- Moïse Mansour
- Max Planck Unit for the Science of Pathogens, Berlin, Germany
- Emmanuelle Charpentier
- Max Planck Unit for the Science of Pathogens, Berlin, Germany
- Institut für Biologie, Humboldt-Universität zu Berlin, Berlin, Germany
|
31
|
Wang H, Li C, Yu X, Gao J. Deletion variants calling in third-generation sequencing data based on a dual-attention mechanism. Brief Bioinform 2024; 25:bbae269. [PMID: 38851298 PMCID: PMC11162298 DOI: 10.1093/bib/bbae269] [Received: 11/10/2023] [Revised: 04/18/2024] [Accepted: 05/23/2024] [Indexed: 06/10/2024] [Open access]
Abstract
Deletion is a crucial type of genomic structural variation and is associated with numerous genetic diseases. The advent of third-generation sequencing technology has facilitated the analysis of complex genomic structures and the elucidation of the mechanisms underlying phenotypic changes and disease onset due to genomic variants. Importantly, it has introduced innovative perspectives for deletion variants calling. Here we propose a method named Dual Attention Structural Variation (DASV) to analyze deletion structural variations in sequencing data. DASV converts gene alignment information into images and integrates them with genomic sequencing data through a dual attention mechanism. Subsequently, it employs a multi-scale network to precisely identify deletion regions. Compared with four widely used genome structural variation calling tools: cuteSV, SVIM, Sniffles and PBSV, the results demonstrate that DASV consistently achieves a balance between precision and recall, enhancing the F1 score across various datasets. The source code is available at https://github.com/deconvolution-w/DASV.
Affiliation(s)
- Han Wang
- College of Information Science and Technology, Beijing University of Chemical Technology, North Third Ring Road 15, 100029, Beijing, China
- Chang Li
- College of Information Science and Technology, Beijing University of Chemical Technology, North Third Ring Road 15, 100029, Beijing, China
- Xinyu Yu
- College of Information Science and Technology, Beijing University of Chemical Technology, North Third Ring Road 15, 100029, Beijing, China
- Jingyang Gao
- College of Information Science and Technology, Beijing University of Chemical Technology, North Third Ring Road 15, 100029, Beijing, China
|
32
|
Hu H, Gao R, Gao W, Gao B, Jiang Z, Zhou M, Wang G, Jiang T. SVDF: enhancing structural variation detect from long-read sequencing via automatic filtering strategies. Brief Bioinform 2024; 25:bbae336. [PMID: 38980375 PMCID: PMC11232458 DOI: 10.1093/bib/bbae336] [Received: 05/06/2024] [Revised: 06/03/2024] [Accepted: 06/27/2024] [Indexed: 07/10/2024] [Open access]
Abstract
Structural variation (SV) is an important form of genomic variation that influences gene function and expression by altering the structure of the genome. Although long-read data have been proven to better characterize SVs, SVs detected from noisy long-read data still include a considerable portion of false-positive calls. To accurately detect SVs in long-read data, we present SVDF, a method that employs a learning-based noise filtering strategy and an SV signature-adaptive clustering algorithm, for effectively reducing the likelihood of false-positive events. Benchmarking results from multiple orthogonal experiments demonstrate that, across different sequencing platforms and depths, SVDF achieves higher calling accuracy for each sample compared to several existing general SV calling tools. We believe that, with its meticulous and sensitive SV detection capability, SVDF can bring new opportunities and advancements to cutting-edge genomic research.
Affiliation(s)
- Heng Hu
- College of Life Sciences, Northeast Forestry University, Harbin 150000, China
- Runtian Gao
- College of Life Sciences, Northeast Forestry University, Harbin 150000, China
- Wentao Gao
- College of Life Sciences, Northeast Forestry University, Harbin 150000, China
- Bo Gao
- Department of Radiology, The Second Affiliated Hospital of Harbin Medical University, Harbin 150000, China
- Zhongjun Jiang
- College of Life Sciences, Northeast Forestry University, Harbin 150000, China
- Murong Zhou
- College of Life Sciences, Northeast Forestry University, Harbin 150000, China
- Guohua Wang
- College of Computer and Control Engineering, Northeast Forestry University, Harbin 150000, China
- State Key Laboratory of Tree Genetics and Breeding, Harbin 150000, China
- Tao Jiang
- School of Computer Science and Technology, Harbin Institute of Technology, Harbin 150000, China
|
33
|
Shi T, Zhang X, Hou Y, Jia C, Dan X, Zhang Y, Jiang Y, Lai Q, Feng J, Feng J, Ma T, Wu J, Liu S, Zhang L, Long Z, Chen L, Street NR, Ingvarsson PK, Liu J, Yin T, Wang J. The super-pangenome of Populus unveils genomic facets for its adaptation and diversification in widespread forest trees. Mol Plant 2024; 17:725-746. [PMID: 38486452 DOI: 10.1016/j.molp.2024.03.009] [Received: 10/09/2023] [Revised: 02/22/2024] [Accepted: 03/11/2024] [Indexed: 04/05/2024]
Abstract
Understanding the underlying mechanisms and links between genome evolution and adaptive innovations stands as a key goal in evolutionary studies. Poplars, among the world's most widely distributed and cultivated trees, exhibit extensive phenotypic diversity and environmental adaptability. In this study, we present a genus-level super-pangenome comprising 19 Populus genomes, revealing the likely pivotal role of private genes in facilitating local environmental and climate adaptation. Through the integration of pangenomes with transcriptomes, methylomes, and chromatin accessibility mapping, we unveil that the evolutionary trajectories of pangenes and duplicated genes are closely linked to local genomic landscapes of regulatory and epigenetic architectures, notably CG methylation in gene-body regions. Further comparative genomic analyses have enabled the identification of 142 202 structural variants across species that intersect with a significant number of genes and contribute substantially to both phenotypic and adaptive divergence. We have experimentally validated a ∼180-bp presence/absence variant affecting the expression of the CUC2 gene, crucial for leaf serration formation. Finally, we developed a user-friendly web-based tool encompassing the multi-omics resources associated with the Populus super-pangenome (http://www.populus-superpangenome.com). Together, the present pioneering super-pangenome resource in forest trees not only aids in the advancement of breeding efforts of this globally important tree genus but also offers valuable insights into potential avenues for comprehending tree biology.
Affiliation(s)
- Tingting Shi
- Key Laboratory for Bio-Resources and Eco-Environment of Ministry of Education, College of Life Sciences, Sichuan University, Chengdu, Sichuan, China
- Xinxin Zhang
- Key Laboratory for Bio-Resources and Eco-Environment of Ministry of Education, College of Life Sciences, Sichuan University, Chengdu, Sichuan, China
- Yukang Hou
- Key Laboratory for Bio-Resources and Eco-Environment of Ministry of Education, College of Life Sciences, Sichuan University, Chengdu, Sichuan, China
- Changfu Jia
- Key Laboratory for Bio-Resources and Eco-Environment of Ministry of Education, College of Life Sciences, Sichuan University, Chengdu, Sichuan, China
- Xuming Dan
- Key Laboratory for Bio-Resources and Eco-Environment of Ministry of Education, College of Life Sciences, Sichuan University, Chengdu, Sichuan, China
- Yulin Zhang
- Key Laboratory for Bio-Resources and Eco-Environment of Ministry of Education, College of Life Sciences, Sichuan University, Chengdu, Sichuan, China
- Yuanzhong Jiang
- Key Laboratory for Bio-Resources and Eco-Environment of Ministry of Education, College of Life Sciences, Sichuan University, Chengdu, Sichuan, China
- Qiang Lai
- Key Laboratory for Bio-Resources and Eco-Environment of Ministry of Education, College of Life Sciences, Sichuan University, Chengdu, Sichuan, China
- Jiajun Feng
- Key Laboratory for Bio-Resources and Eco-Environment of Ministry of Education, College of Life Sciences, Sichuan University, Chengdu, Sichuan, China
- Jianju Feng
- College of Horticulture and Forestry, Tarim University, Alar 843300, China
- Tao Ma
- Key Laboratory for Bio-Resources and Eco-Environment of Ministry of Education, College of Life Sciences, Sichuan University, Chengdu, Sichuan, China
- Jiali Wu
- Key Laboratory for Bio-Resources and Eco-Environment of Ministry of Education, College of Life Sciences, Sichuan University, Chengdu, Sichuan, China
- Shuyu Liu
- Key Laboratory for Bio-Resources and Eco-Environment of Ministry of Education, College of Life Sciences, Sichuan University, Chengdu, Sichuan, China
- Lei Zhang
- Key Laboratory for Bio-Resources and Eco-Environment of Ministry of Education, College of Life Sciences, Sichuan University, Chengdu, Sichuan, China
- Zhiqin Long
- Key Laboratory for Bio-Resources and Eco-Environment of Ministry of Education, College of Life Sciences, Sichuan University, Chengdu, Sichuan, China
- Liyang Chen
- Key Laboratory for Bio-Resources and Eco-Environment of Ministry of Education, College of Life Sciences, Sichuan University, Chengdu, Sichuan, China
- Nathaniel R Street
- Umeå Plant Science Centre, Department of Plant Physiology, Umeå University, Umeå, Västerbotten, Sweden
- Pär K Ingvarsson
- Linnean Centre for Plant Biology, Department of Plant Biology, Uppsala BioCenter, Swedish University of Agricultural Sciences, Uppsala, Sweden
- Jianquan Liu
- Key Laboratory for Bio-Resources and Eco-Environment of Ministry of Education, College of Life Sciences, Sichuan University, Chengdu, Sichuan, China.
- Tongming Yin
- The Key Laboratory of Tree Genetics and Biotechnology of Jiangsu Province and Education Department of China, Nanjing Forestry University, Nanjing, Jiangsu, China.
- Jing Wang
- Key Laboratory for Bio-Resources and Eco-Environment of Ministry of Education, College of Life Sciences, Sichuan University, Chengdu, Sichuan, China.
|
34
|
Kosugi S, Terao C. Comparative evaluation of SNVs, indels, and structural variations detected with short- and long-read sequencing data. Hum Genome Var 2024; 11:18. [PMID: 38632226 PMCID: PMC11024196 DOI: 10.1038/s41439-024-00276-x] [Received: 01/18/2024] [Revised: 03/12/2024] [Accepted: 03/20/2024] [Indexed: 04/19/2024] [Open access]
Abstract
Short- and long-read sequencing technologies are routinely used to detect DNA variants, including SNVs, indels, and structural variations (SVs). However, the differences in the quality and quantity of variants detected between short- and long-read data are not fully understood. In this study, we comprehensively evaluated the variant calling performance of short- and long-read-based SNV, indel, and SV detection algorithms (6 for SNVs, 12 for indels, and 13 for SVs) using a novel evaluation framework incorporating manual visual inspection. The results showed that indel-insertion calls greater than 10 bp were poorly detected by short-read-based detection algorithms compared to long-read-based algorithms; however, the recall and precision of SNV and indel-deletion detection were similar between short- and long-read data. The recall of SV detection with short-read-based algorithms was significantly lower in repetitive regions, especially for small- to intermediate-sized SVs, than that detected with long-read-based algorithms. In contrast, the recall and precision of SV detection in nonrepetitive regions were similar between short- and long-read data. These findings suggest the need for refined strategies, such as incorporating multiple variant detection algorithms, to generate a more complete set of variants using short-read data.
Affiliation(s)
- Shunichi Kosugi
- Center for Genome Informatics, Research Organization of Information and Systems, Joint Support-Center for Data Science Research, Shizuoka, Japan.
- Advanced Genomics Center, National Institute of Genetics, Shizuoka, Japan.
- Laboratory for Statistical and Translational Genetics, RIKEN Center for Integrative Medical Sciences, Yokohama, Kanagawa, Japan.
- Clinical Research Center, Shizuoka General Hospital, Shizuoka, Japan.
- Chikashi Terao
- Laboratory for Statistical and Translational Genetics, RIKEN Center for Integrative Medical Sciences, Yokohama, Kanagawa, Japan
- Clinical Research Center, Shizuoka General Hospital, Shizuoka, Japan
- The Department of Applied Genetics, The School of Pharmaceutical Sciences, University of Shizuoka, Shizuoka, Japan
|
35
|
Li X, Liu Q, Fu C, Li M, Li C, Li X, Zhao S, Zheng Z. Characterizing structural variants based on graph-genotyping provides insights into pig domestication and local adaption. J Genet Genomics 2024; 51:394-406. [PMID: 38056526 DOI: 10.1016/j.jgg.2023.11.005] [Received: 07/14/2023] [Revised: 11/23/2023] [Accepted: 11/24/2023] [Indexed: 12/08/2023]
Abstract
Structural variants (SVs), such as deletions (DELs) and insertions (INSs), contribute substantially to pig genetic diversity and phenotypic variation. Using a library of SVs discovered from long-read primary assemblies and short-read sequenced genomes, we map pig genomic SVs with a graph-based method for re-genotyping SVs in 402 genomes. Our results demonstrate that those SVs harboring specific trait-associated genes may greatly shape pig domestication and local adaptation. Further characterization of SVs reveals that some population-stratified SVs may alter the transcription of genes by affecting regulatory elements. We identify that the genotypes of two DELs (296-bp DEL, chr7: 52,172,101-52,172,397; 278-bp DEL, chr18: 23,840,143-23,840,421) located in muscle-specific enhancers are associated with the expression of target genes related to meat quality (FSD2) and muscle fiber hypertrophy (LMOD2 and WASL) in pigs. Our results highlight the role of SVs in domestic porcine evolution, and the identified candidate functional genes and SVs are valuable resources for future genomic research and breeding programs in pigs.
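The two deletion sizes quoted in this abstract can be checked against their coordinates. The following is a minimal sketch, assuming the length is simply end minus start; the abstract does not state its coordinate convention, and the helper names `parse_region` and `span_length` are hypothetical, introduced only for illustration.

```python
import re


def parse_region(region: str) -> tuple[str, int, int]:
    """Split a string such as 'chr7: 52,172,101-52,172,397' into (chrom, start, end)."""
    match = re.fullmatch(r"\s*(\w+):\s*([\d,]+)-([\d,]+)\s*", region)
    if match is None:
        raise ValueError(f"unrecognized region string: {region!r}")
    chrom = match.group(1)
    # Thousands separators are removed before converting to integers.
    start = int(match.group(2).replace(",", ""))
    end = int(match.group(3).replace(",", ""))
    return chrom, start, end


def span_length(region: str) -> int:
    """Return end - start (assumed convention; not stated in the abstract)."""
    _, start, end = parse_region(region)
    return end - start


for label, region in [
    ("296-bp DEL", "chr7: 52,172,101-52,172,397"),
    ("278-bp DEL", "chr18: 23,840,143-23,840,421"),
]:
    print(label, span_length(region))
```

Under this assumed convention, both computed spans match the sizes stated in the abstract.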
Affiliation(s)
- Xin Li
- Key Lab of Agricultural Animal Genetics, Breeding and Reproduction of Ministry of Education and Key Laboratory of Swine Genetics and Breeding of Ministry of Agriculture, College of Animal Science and Technology, Huazhong Agricultural University, Wuhan, Hubei 430070, China
- Quan Liu
- Key Lab of Agricultural Animal Genetics, Breeding and Reproduction of Ministry of Education and Key Laboratory of Swine Genetics and Breeding of Ministry of Agriculture, College of Animal Science and Technology, Huazhong Agricultural University, Wuhan, Hubei 430070, China
- Chong Fu
- Key Lab of Agricultural Animal Genetics, Breeding and Reproduction of Ministry of Education and Key Laboratory of Swine Genetics and Breeding of Ministry of Agriculture, College of Animal Science and Technology, Huazhong Agricultural University, Wuhan, Hubei 430070, China
- Mengxun Li
- Key Lab of Agricultural Animal Genetics, Breeding and Reproduction of Ministry of Education and Key Laboratory of Swine Genetics and Breeding of Ministry of Agriculture, College of Animal Science and Technology, Huazhong Agricultural University, Wuhan, Hubei 430070, China
- Changchun Li
- Key Lab of Agricultural Animal Genetics, Breeding and Reproduction of Ministry of Education and Key Laboratory of Swine Genetics and Breeding of Ministry of Agriculture, College of Animal Science and Technology, Huazhong Agricultural University, Wuhan, Hubei 430070, China; The Cooperative Innovation Center for Sustainable Pig Production, Wuhan, Hubei 430070, China
- Xinyun Li
- Key Lab of Agricultural Animal Genetics, Breeding and Reproduction of Ministry of Education and Key Laboratory of Swine Genetics and Breeding of Ministry of Agriculture, College of Animal Science and Technology, Huazhong Agricultural University, Wuhan, Hubei 430070, China; The Cooperative Innovation Center for Sustainable Pig Production, Wuhan, Hubei 430070, China; Hubei Hongshan Laboratory, Wuhan, Hubei 430070, China
- Shuhong Zhao
- Key Lab of Agricultural Animal Genetics, Breeding and Reproduction of Ministry of Education and Key Laboratory of Swine Genetics and Breeding of Ministry of Agriculture, College of Animal Science and Technology, Huazhong Agricultural University, Wuhan, Hubei 430070, China; The Cooperative Innovation Center for Sustainable Pig Production, Wuhan, Hubei 430070, China; Hubei Hongshan Laboratory, Wuhan, Hubei 430070, China.
- Zhuqing Zheng
- Key Lab of Agricultural Animal Genetics, Breeding and Reproduction of Ministry of Education and Key Laboratory of Swine Genetics and Breeding of Ministry of Agriculture, College of Animal Science and Technology, Huazhong Agricultural University, Wuhan, Hubei 430070, China; Institute of Agricultural Biotechnology, Jingchu University of Technology, Jingmen, Hubei 448000, China.
|
36
|
Jensen TD, Ni B, Reuter CM, Gorzynski JE, Fazal S, Bonner D, Ungar RA, Goddard PC, Raja A, Ashley EA, Bernstein JA, Zuchner S, Greicius MD, Montgomery SB, Schatz MC, Wheeler MT, Battle A. Integration of transcriptomics and long-read genomics prioritizes structural variants in rare disease. medRxiv 2024:2024.03.22.24304565. [PMID: 38585781 PMCID: PMC10996727 DOI: 10.1101/2024.03.22.24304565] [Indexed: 04/09/2024]
Abstract
Rare structural variants (SVs) - insertions, deletions, and complex rearrangements - can cause Mendelian disease, yet they remain difficult to accurately detect and interpret. We sequenced and analyzed Oxford Nanopore long-read genomes of 68 individuals from the Undiagnosed Disease Network (UDN) with no previously identified diagnostic mutations from short-read sequencing. Using our optimized SV detection pipelines and 571 control long-read genomes, we detected 716 long-read rare (MAF < 0.01) SV alleles per genome on average, achieving a 2.4x increase from short-reads. To characterize the functional effects of rare SVs, we assessed their relationship with gene expression from blood or fibroblasts from the same individuals, and found that rare SVs overlapping enhancers were enriched (LOR = 0.46) near expression outliers. We also evaluated tandem repeat expansions (TREs) and found 14 rare TREs per genome; notably these TREs were also enriched near overexpression outliers. To prioritize candidate functional SVs, we developed Watershed-SV, a probabilistic model that integrates expression data with SV-specific genomic annotations, which significantly outperforms baseline models that don't incorporate expression data. Watershed-SV identified a median of eight high-confidence functional SVs per UDN genome. Notably, this included compound heterozygous deletions in FAM177A1 shared by two siblings, which were likely causal for a rare neurodevelopmental disorder. Our observations demonstrate the promise of integrating long-read sequencing with gene expression towards improving the prioritization of functional SVs and TREs in rare disease patients.
|
37
|
Wang S, Lin J, Jia P, Xu T, Li X, Liu Y, Xu D, Bush SJ, Meng D, Ye K. De novo and somatic structural variant discovery with SVision-pro. Nat Biotechnol 2024:10.1038/s41587-024-02190-7. [PMID: 38519720 DOI: 10.1038/s41587-024-02190-7] [Received: 08/01/2023] [Accepted: 02/27/2024] [Indexed: 03/25/2024]
Abstract
Long-read-based de novo and somatic structural variant (SV) discovery remains challenging, necessitating genomic comparison between samples. We developed SVision-pro, a neural-network-based instance segmentation framework that represents genome-to-genome-level sequencing differences visually and discovers SV comparatively between genomes without any prerequisite for inference models. SVision-pro outperforms state-of-the-art approaches, in particular, the resolving of complex SVs is improved, with low Mendelian error rates, high sensitivity of low-frequency SVs and reduced false-positive rates compared with SV merging approaches.
Affiliation(s)
- Songbo Wang
- Department of Gynecology and Obstetrics, Center for Mathematical Medical, The First Affiliated Hospital of Xi'an Jiaotong University, Xi'an, China
- School of Automation Science and Engineering, Faculty of Electronic and Information Engineering, Xi'an Jiaotong University, Xi'an, China
- MOE Key Lab for Intelligent Networks & Networks Security, Faculty of Electronic and Information Engineering, Xi'an Jiaotong University, Xi'an, China
- Jiadong Lin
- School of Automation Science and Engineering, Faculty of Electronic and Information Engineering, Xi'an Jiaotong University, Xi'an, China
- MOE Key Lab for Intelligent Networks & Networks Security, Faculty of Electronic and Information Engineering, Xi'an Jiaotong University, Xi'an, China
- Peng Jia
- Department of Gynecology and Obstetrics, Center for Mathematical Medical, The First Affiliated Hospital of Xi'an Jiaotong University, Xi'an, China
- School of Automation Science and Engineering, Faculty of Electronic and Information Engineering, Xi'an Jiaotong University, Xi'an, China
- MOE Key Lab for Intelligent Networks & Networks Security, Faculty of Electronic and Information Engineering, Xi'an Jiaotong University, Xi'an, China
- Tun Xu
- School of Automation Science and Engineering, Faculty of Electronic and Information Engineering, Xi'an Jiaotong University, Xi'an, China
- MOE Key Lab for Intelligent Networks & Networks Security, Faculty of Electronic and Information Engineering, Xi'an Jiaotong University, Xi'an, China
- Xiujuan Li
- School of Automation Science and Engineering, Faculty of Electronic and Information Engineering, Xi'an Jiaotong University, Xi'an, China
- MOE Key Lab for Intelligent Networks & Networks Security, Faculty of Electronic and Information Engineering, Xi'an Jiaotong University, Xi'an, China
- Yuezhuangnan Liu
- School of Life Science and Technology, Xi'an Jiaotong University, Xi'an, China
- Dan Xu
- School of Life Science and Technology, Xi'an Jiaotong University, Xi'an, China
- Stephen J Bush
- School of Automation Science and Engineering, Faculty of Electronic and Information Engineering, Xi'an Jiaotong University, Xi'an, China
- MOE Key Lab for Intelligent Networks & Networks Security, Faculty of Electronic and Information Engineering, Xi'an Jiaotong University, Xi'an, China
- Deyu Meng
- MOE Key Lab for Intelligent Networks & Networks Security, Faculty of Electronic and Information Engineering, Xi'an Jiaotong University, Xi'an, China
- School of Mathematics and Statistics, Xi'an Jiaotong University, Xi'an, China
- Macau Institute of Systems Engineering, Macau University of Science and Technology, Taipa, Macau
- Pazhou Laboratory (Huangpu), Guangzhou, Guangdong, China
- Kai Ye
- Department of Gynecology and Obstetrics, Center for Mathematical Medical, The First Affiliated Hospital of Xi'an Jiaotong University, Xi'an, China.
- School of Automation Science and Engineering, Faculty of Electronic and Information Engineering, Xi'an Jiaotong University, Xi'an, China.
- MOE Key Lab for Intelligent Networks & Networks Security, Faculty of Electronic and Information Engineering, Xi'an Jiaotong University, Xi'an, China.
- School of Life Science and Technology, Xi'an Jiaotong University, Xi'an, China.
- Faculty of Science, Leiden University, Leiden, The Netherlands.
- Genome Institute, The First Affiliated Hospital of Xi'an Jiaotong University, Xi'an, China.
|
38
|
Liu YH, Luo C, Golding SG, Ioffe JB, Zhou XM. Tradeoffs in alignment and assembly-based methods for structural variant detection with long-read sequencing data. Nat Commun 2024; 15:2447. [PMID: 38503752 PMCID: PMC10951360 DOI: 10.1038/s41467-024-46614-z] [Received: 10/17/2022] [Accepted: 03/04/2024] [Indexed: 03/21/2024] [Open access]
Abstract
Long-read sequencing offers long contiguous DNA fragments, facilitating diploid genome assembly and structural variant (SV) detection. Efficient and robust algorithms for SV identification are crucial with increasing data availability. Alignment-based methods, favored for their computational efficiency and lower coverage requirements, are prominent. Alternative approaches, relying solely on available reads for de novo genome assembly and employing assembly-based tools for SV detection via comparison to a reference genome, demand significantly more computational resources. However, the lack of comprehensive benchmarking constrains our comprehension and hampers further algorithm development. Here we systematically compare 14 read alignment-based SV calling methods (including 4 deep learning-based methods and 1 hybrid method), and 4 assembly-based SV calling methods, alongside 4 upstream aligners and 7 assemblers. Assembly-based tools excel in detecting large SVs, especially insertions, and exhibit robustness to evaluation parameter changes and coverage fluctuations. Conversely, alignment-based tools demonstrate superior genotyping accuracy at low sequencing coverage (5-10×) and excel in detecting complex SVs, like translocations, inversions, and duplications. Our evaluation provides performance insights, highlighting the absence of a universally superior tool. We furnish guidelines across 31 criteria combinations, aiding users in selecting the most suitable tools for diverse scenarios and offering directions for further method development.
Affiliation(s)
- Yichen Henry Liu
- Department of Computer Science, Vanderbilt University, 37235, Nashville, TN, USA
- Can Luo
- Department of Biomedical Engineering, Vanderbilt University, 37235, Nashville, TN, USA
- Staunton G Golding
- Department of Biomedical Engineering, Vanderbilt University, 37235, Nashville, TN, USA
- Jacob B Ioffe
- Department of Computer Science, Vanderbilt University, 37235, Nashville, TN, USA
- Xin Maizie Zhou
- Department of Computer Science, Vanderbilt University, 37235, Nashville, TN, USA.
- Department of Biomedical Engineering, Vanderbilt University, 37235, Nashville, TN, USA.
- Data Science Institute, Vanderbilt University, 37235, Nashville, TN, USA.
|
39
|
Helal AA, Saad BT, Saad MT, Mosaad GS, Aboshanab KM. Benchmarking long-read aligners and SV callers for structural variation detection in Oxford nanopore sequencing data. Sci Rep 2024; 14:6160. [PMID: 38486064 PMCID: PMC10940726 DOI: 10.1038/s41598-024-56604-2] [Received: 11/02/2023] [Accepted: 03/08/2024] [Indexed: 03/18/2024] [Open access]
Abstract
Structural variants (SVs) are one of the significant types of DNA mutations and are typically defined as larger-than-50-bp genomic alterations that include insertions, deletions, duplications, inversions, and translocations. These modifications can profoundly impact the phenotypic characteristics and contribute to disorders like cancer, response to treatment, and infections. Four long-read aligners and five SV callers have been evaluated using three Oxford Nanopore NGS human genome datasets in terms of precision, recall, and F1-score statistical metrics, depth of coverage, and speed of analysis. The best SV caller regarding recall, precision, and F1-score when matched with different aligners at different coverage levels tend to vary depending on the dataset and the specific SV types being analyzed. However, based on our findings, Sniffles and CuteSV tend to perform well across different aligners and coverage levels, followed by SVIM, PBSV, and SVDSS in the last place. The CuteSV caller has the highest average F1-score (82.51%) and recall (78.50%), and Sniffles has the highest average precision value (94.33%). Minimap2 as an aligner and Sniffles as an SV caller act as a strong base for the pipeline of SV calling because of their high speed and reasonable accomplishment. PBSV has a lower average F1-score, precision, and recall and may generate more false positives and overlook some actual SVs. Our results are valuable in the comprehensive evaluation of popular SV callers and aligners as they provide insight into the performance of several long-read aligners and SV callers and serve as a reference for researchers in selecting the most suitable tools for SV detection.
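The precision, recall, and F1-score reported in this benchmark are presumably the standard classification metrics. The following is a minimal sketch of how they are computed from true-positive, false-positive, and false-negative call counts. The function name and the example counts are hypothetical and are not taken from the paper.

```python
def precision_recall_f1(tp: int, fp: int, fn: int) -> tuple[float, float, float]:
    """Compute precision, recall, and F1 from SV call counts.

    tp: calls matching a truth-set SV
    fp: calls with no matching truth-set SV
    fn: truth-set SVs that were not called
    """
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    # F1 is the harmonic mean of precision and recall.
    if precision + recall == 0.0:
        f1 = 0.0
    else:
        f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Hypothetical counts, for illustration only.
p, r, f1 = precision_recall_f1(tp=80, fp=5, fn=20)
print(f"precision={p:.4f} recall={r:.4f} F1={f1:.4f}")
```

Because F1 is a harmonic mean, a caller with very high precision but modest recall can still score a lower F1 than a more balanced caller. That is consistent with the abstract reporting the best precision and the best F1-score for different tools.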
Affiliation(s)
- Asmaa A Helal
- Department of Bioinformatics, HITS Solutions Co., Cairo, 11765, Egypt
- Bishoy T Saad
- Department of Bioinformatics, HITS Solutions Co., Cairo, 11765, Egypt.
- Mina T Saad
- Department of Bioinformatics, HITS Solutions Co., Cairo, 11765, Egypt
- Gamal S Mosaad
- Department of Bioinformatics, HITS Solutions Co., Cairo, 11765, Egypt
- Khaled M Aboshanab
- Department of Microbiology and Immunology, Faculty of Pharmacy, Ain Shams University, Organization of African Unity St., Abassi, Cairo, 11566, Egypt.
|
40
|
Cui X, Lin Q, Chen M, Wang Y, Wang Y, Wang Y, Tao J, Yin H, Zhao T. Long-read sequencing unveils novel somatic variants and methylation patterns in the genetic information system of early lung cancer. Comput Biol Med 2024; 171:108174. [PMID: 38442557 DOI: 10.1016/j.compbiomed.2024.108174] [Received: 01/07/2024] [Revised: 01/25/2024] [Accepted: 02/18/2024] [Indexed: 03/07/2024]
Abstract
Lung cancer poses a global health challenge, necessitating advanced diagnostics for improved outcomes. Intensive efforts are ongoing to pinpoint early detection biomarkers, such as genomic variations and DNA methylation, to elevate diagnostic precision. We conducted long-read sequencing on cancerous and adjacent non-cancerous tissues from a patient with lung adenocarcinoma. We identified somatic structural variations (SVs) specific to lung cancer by integrating data from various SV calling methods and differentially methylated regions (DMRs) that were distinct between these two tissue samples, revealing a unique methylation pattern associated with lung cancer. This study discovered over 40,000 somatic SVs and over 180,000 DMRs linked to lung cancer. We identified approximately 700 genes of significant relevance through comprehensive analysis, including genes intricately associated with many lung cancers, such as NOTCH1, SMOC2, CSMD2, and others. Furthermore, we observed that somatic SVs and DMRs were substantially enriched in several pathways, such as axon guidance signaling pathways, which suggests a comprehensive multi-omics impact on lung cancer progression across various biological investigation levels. These datasets can potentially serve as biomarkers for early lung cancer detection and may hold significant value in clinical diagnosis and treatment applications.
Affiliation(s)
- Xinran Cui
- School of Computer Science and Technology, Harbin Institute of Technology, 92 West Da Zhi St, Harbin, Heilongjiang, 150000, China

- Qingyan Lin
- Department of Respiratory and Critical Care, Heilongjiang Provincial Hospital, 405 Gorokhovaya Street, Harbin, Heilongjiang, 150000, China

- Ming Chen
- Institute of Bioinformatics, Harbin Institute of Technology, 92 West Da Zhi St, Harbin, Heilongjiang, 150000, China

- Yidan Wang
- Department of Respiratory and Critical Care, Heilongjiang Provincial Hospital, 405 Gorokhovaya Street, Harbin, Heilongjiang, 150000, China

- Yiwen Wang
- Tanwei College, Tsinghua University, Shuangqing Road, Beijing, 100084, China

- Yadong Wang
- School of Computer Science and Technology, Harbin Institute of Technology, 92 West Da Zhi St, Harbin, Heilongjiang, 150000, China.

- Jiang Tao
- School of Computer Science and Technology, Harbin Institute of Technology, 92 West Da Zhi St, Harbin, Heilongjiang, 150000, China.

- Honglei Yin
- Department of Respiratory and Critical Care, Heilongjiang Provincial Hospital, 405 Gorokhovaya Street, Harbin, Heilongjiang, 150000, China.

- Tianyi Zhao
- School of Medicine, Harbin Institute of Technology, 92 West Da Zhi St, Harbin, Heilongjiang, 150000, China.

41
Nakamura W, Hirata M, Oda S, Chiba K, Okada A, Mateos RN, Sugawa M, Iida N, Ushiama M, Tanabe N, Sakamoto H, Sekine S, Hirasawa A, Kawai Y, Tokunaga K, Tsujimoto SI, Shiba N, Ito S, Yoshida T, Shiraishi Y. Assessing the efficacy of target adaptive sampling long-read sequencing through hereditary cancer patient genomes. NPJ Genom Med 2024; 9:11. [PMID: 38368425 PMCID: PMC10874402 DOI: 10.1038/s41525-024-00394-z] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/28/2023] [Accepted: 01/15/2024] [Indexed: 02/19/2024] Open
Abstract
Innovations in sequencing technology have led to the discovery of novel mutations that cause inherited diseases. However, many patients with suspected genetic diseases remain undiagnosed. Long-read sequencing technologies are expected to significantly improve the diagnostic rate by overcoming the limitations of short-read sequencing. In addition, Oxford Nanopore Technologies (ONT) offers adaptive sampling and computationally driven target enrichment technology. This enables more affordable intensive analysis of target gene regions compared to standard non-selective long-read sequencing. In this study, we developed an efficient computational workflow for target adaptive sampling long-read sequencing (TAS-LRS) and evaluated it through application to 33 genomes collected from suspected hereditary cancer patients. Our workflow can identify single nucleotide variants with nearly the same accuracy as the short-read platform and elucidate complex forms of structural variations. We also newly identified several SINE-R/VNTR/Alu (SVA) elements affecting the APC gene in two patients with familial adenomatous polyposis, as well as their sites of origin. In addition, we demonstrated that off-target reads from adaptive sampling, which are typically discarded, can be effectively used to accurately genotype common single-nucleotide polymorphisms (SNPs) across the entire genome, enabling the calculation of a polygenic risk score. Furthermore, we identified allele-specific MLH1 promoter hypermethylation in a Lynch syndrome patient. In summary, our workflow with TAS-LRS can simultaneously capture monogenic risk variants including complex structural variations, polygenic background, and epigenetic alterations, and will be an efficient platform for genetic disease research and diagnosis.
Affiliation(s)
- Wataru Nakamura
- Division of Genome Analysis Platform Development, National Cancer Center Research Institute, Tokyo, Japan
- Department of Pediatrics, Yokohama City University Hospital, Kanagawa, Japan

- Makoto Hirata
- Division of Genetic Medicine and Services, National Cancer Center Hospital, Tokyo, Japan
- Department of Molecular Pathology, National Cancer Center Research Institute, Tokyo, Japan

- Satoyo Oda
- Division of Genetic Medicine and Services, National Cancer Center Hospital, Tokyo, Japan
- Division of Laboratory Medicine, National Cancer Center Hospital, Tokyo, Japan

- Kenichi Chiba
- Division of Genome Analysis Platform Development, National Cancer Center Research Institute, Tokyo, Japan

- Ai Okada
- Division of Genome Analysis Platform Development, National Cancer Center Research Institute, Tokyo, Japan

- Raúl Nicolás Mateos
- Division of Genome Analysis Platform Development, National Cancer Center Research Institute, Tokyo, Japan

- Masahiro Sugawa
- Division of Genome Analysis Platform Development, National Cancer Center Research Institute, Tokyo, Japan

- Naoko Iida
- Division of Genome Analysis Platform Development, National Cancer Center Research Institute, Tokyo, Japan

- Mineko Ushiama
- Division of Genetic Medicine and Services, National Cancer Center Hospital, Tokyo, Japan
- Department of Clinical Genetics, National Cancer Center Research Institute, Tokyo, Japan

- Noriko Tanabe
- Division of Genetic Medicine and Services, National Cancer Center Hospital, Tokyo, Japan

- Hiromi Sakamoto
- Division of Genetic Medicine and Services, National Cancer Center Hospital, Tokyo, Japan
- Department of Clinical Genetics, National Cancer Center Research Institute, Tokyo, Japan

- Shigeki Sekine
- Division of Molecular Pathology, National Cancer Center Research Institute, Tokyo, Japan

- Akira Hirasawa
- Department of Clinical Genetics and Genomic Medicine, Okayama University Hospital, Okayama, Japan

- Yosuke Kawai
- Genome Medical Science Project, Research Institute, National Center for Global Health and Medicine, Tokyo, Japan

- Katsushi Tokunaga
- Genome Medical Science Project, Research Institute, National Center for Global Health and Medicine, Tokyo, Japan
- Central Biobank, National Center Biobank Network, Tokyo, Japan

- Shin-Ichi Tsujimoto
- Department of Pediatrics, Yokohama City University Hospital, Kanagawa, Japan

- Norio Shiba
- Department of Pediatrics, Yokohama City University Hospital, Kanagawa, Japan

- Shuichi Ito
- Department of Pediatrics, Yokohama City University Hospital, Kanagawa, Japan

- Teruhiko Yoshida
- Division of Genetic Medicine and Services, National Cancer Center Hospital, Tokyo, Japan
- Department of Clinical Genetics, National Cancer Center Research Institute, Tokyo, Japan

- Yuichi Shiraishi
- Division of Genome Analysis Platform Development, National Cancer Center Research Institute, Tokyo, Japan.

42
Liu X, Zheng J, Ding J, Wu J, Zuo F, Zhang G. When Livestock Genomes Meet Third-Generation Sequencing Technology: From Opportunities to Applications. Genes (Basel) 2024; 15:245. [PMID: 38397234 PMCID: PMC10888458 DOI: 10.3390/genes15020245] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/23/2023] [Revised: 01/30/2024] [Accepted: 02/10/2024] [Indexed: 02/25/2024] Open
Abstract
Third-generation sequencing technology has found widespread application in the genomic, transcriptomic, and epigenetic research of both human and livestock genetics. This technology offers significant advantages in the sequencing of complex genomic regions, the identification of intricate structural variations, and the production of high-quality genomes. Its attributes, including long sequencing reads, obviation of PCR amplification, and direct determination of DNA/RNA, contribute to its efficacy. This review presents a comprehensive overview of third-generation sequencing technologies, exemplified by single-molecule real-time sequencing (SMRT) and Oxford Nanopore Technology (ONT). Emphasizing the research advancements in livestock genomics, the review delves into genome assembly, structural variation detection, transcriptome sequencing, and epigenetic investigations enabled by third-generation sequencing. A comprehensive analysis is conducted on the application and potential challenges of third-generation sequencing technology for genome detection in livestock. Beyond providing valuable insights into genome structure analysis and the identification of rare genes in livestock, the review ventures into an exploration of the genetic mechanisms underpinning exemplary traits. This review not only contributes to our understanding of the genomic landscape in livestock but also provides fresh perspectives for the advancement of research in this domain.
Affiliation(s)
- Xinyue Liu
- College of Animal Science and Technology, Southwest University, Rongchang, Chongqing 402460, China; (X.L.); (J.Z.); (J.D.); (J.W.); (F.Z.)

- Junyuan Zheng
- College of Animal Science and Technology, Southwest University, Rongchang, Chongqing 402460, China; (X.L.); (J.Z.); (J.D.); (J.W.); (F.Z.)

- Jialan Ding
- College of Animal Science and Technology, Southwest University, Rongchang, Chongqing 402460, China; (X.L.); (J.Z.); (J.D.); (J.W.); (F.Z.)

- Jiaxin Wu
- College of Animal Science and Technology, Southwest University, Rongchang, Chongqing 402460, China; (X.L.); (J.Z.); (J.D.); (J.W.); (F.Z.)

- Fuyuan Zuo
- College of Animal Science and Technology, Southwest University, Rongchang, Chongqing 402460, China; (X.L.); (J.Z.); (J.D.); (J.W.); (F.Z.)
- Beef Cattle Engineering and Technology Research Center of Chongqing, Southwest University, Rongchang, Chongqing 402460, China

- Gongwei Zhang
- College of Animal Science and Technology, Southwest University, Rongchang, Chongqing 402460, China; (X.L.); (J.Z.); (J.D.); (J.W.); (F.Z.)
- Beef Cattle Engineering and Technology Research Center of Chongqing, Southwest University, Rongchang, Chongqing 402460, China

43
Li H, Liu Y, Fan P, Dai Z, Hao J, Duan W, Liang Z, Wang Y. The Genome of Vitis zhejiang-adstricta Strengthens the Protection and Utilization of the Endangered Ancient Grape Endemic to China. Plant Cell Physiol 2024; 65:216-227. [PMID: 37930871 PMCID: PMC10873524 DOI: 10.1093/pcp/pcad140] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/02/2023] [Revised: 10/31/2023] [Accepted: 11/01/2023] [Indexed: 11/08/2023]
Abstract
Vitis zhejiang-adstricta (V. zhejiang-adstricta) is one of the most important and endangered wild grapes. It is a national key protected, rare and endangered ancient wild grape endemic to China and is used as a candidate material for resistance breeding owing to its excellent disease resistance. Here, we present a high-quality chromosome-level assembly of V. zhejiang-adstricta (IB-VB-01), comprising 506.66 Mb assembled into 19 pseudo-chromosomes. The contig N50 length is 3.91 Mb, with 31,196 annotated protein-coding genes. Comparative genome and evolutionary analyses illustrated that V. zhejiang-adstricta has a specific position in the evolution of East Asian Vitis and shared a common ancestor with Vitis vinifera, with the two species diverging about 10.42 (between 9.34 and 11.12) Mya. Compared with those in other plants, the expanded gene families were related to disease resistance, and the contracted gene families were related to plant growth and primary metabolism. Through the analysis of gene family expansion and contraction, the evolution of environmental adaptability, and especially of the NBS-LRR gene family of V. zhejiang-adstricta, was elucidated based on the pathways of resistance genes (R genes), unique genes, and structural variations. The near-complete and accurate diploid V. zhejiang-adstricta reference genome obtained herein serves as an important complement to wild grape genomes and will provide valuable genomic resources for investigating the genomic architecture of V. zhejiang-adstricta as well as for improving disease resistance breeding strategies in grape.
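The contig N50 quoted in this abstract is the length of the shortest contig in the smallest set of longest contigs that together cover at least half of the assembly. A minimal sketch of that standard definition follows; the function name and the toy contig lengths are hypothetical and do not come from the V. zhejiang-adstricta assembly.

```python
def contig_n50(lengths: list[int]) -> int:
    """Return the N50 of a list of contig lengths (in bp)."""
    if not lengths:
        raise ValueError("at least one contig length is required")
    total = sum(lengths)
    running = 0
    # Walk contigs from longest to shortest until half the assembly is covered.
    for length in sorted(lengths, reverse=True):
        running += length
        if running * 2 >= total:
            return length
    raise ValueError("contig lengths must be positive")


# Toy assembly (illustration only): total length 54, half is 27.
toy_contigs = [2, 3, 4, 5, 6, 7, 8, 9, 10]
print(contig_n50(toy_contigs))
```

In the toy case the three longest contigs (10 + 9 + 8 = 27) reach half of the total, so the N50 is 8.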
Affiliation(s)
- Huayang Li
- Beijing Key Laboratory of Grape Science and Enology, CAS Key Laboratory of Plant Resources, Institute of Botany, Chinese Academy of Sciences, 20 Nanxincun, Xiangshan, Beijing 100093, China
- State Key Laboratory of Plant Diversity and Specialty Crops, Institute of Botany, Chinese Academy of Sciences, 20 Nanxincun, Xiangshan, Beijing 100093, PR China
- China National Botanical Garden, 20 Nanxincun, Xiangshan, Beijing 100093, PR China
- University of Chinese Academy of Sciences, 19 Yuquan Rd, Beijing 100049, PR China

- Yongbo Liu
- State Key Laboratory of Environmental Criteria and Risk Assessment, Chinese Research Academy of Environmental Sciences, 8 Dayangfang, Beijing 100012, PR China

- Peige Fan
- Beijing Key Laboratory of Grape Science and Enology, CAS Key Laboratory of Plant Resources, Institute of Botany, Chinese Academy of Sciences, 20 Nanxincun, Xiangshan, Beijing 100093, China
- State Key Laboratory of Plant Diversity and Specialty Crops, Institute of Botany, Chinese Academy of Sciences, 20 Nanxincun, Xiangshan, Beijing 100093, PR China
- China National Botanical Garden, 20 Nanxincun, Xiangshan, Beijing 100093, PR China

- Zhanwu Dai
- Beijing Key Laboratory of Grape Science and Enology, CAS Key Laboratory of Plant Resources, Institute of Botany, Chinese Academy of Sciences, 20 Nanxincun, Xiangshan, Beijing 100093, China
- State Key Laboratory of Plant Diversity and Specialty Crops, Institute of Botany, Chinese Academy of Sciences, 20 Nanxincun, Xiangshan, Beijing 100093, PR China
- China National Botanical Garden, 20 Nanxincun, Xiangshan, Beijing 100093, PR China

- Jiachen Hao
- China National Botanical Garden, 20 Nanxincun, Xiangshan, Beijing 100093, PR China

- Wei Duan
- Beijing Key Laboratory of Grape Science and Enology, CAS Key Laboratory of Plant Resources, Institute of Botany, Chinese Academy of Sciences, 20 Nanxincun, Xiangshan, Beijing 100093, China
- State Key Laboratory of Plant Diversity and Specialty Crops, Institute of Botany, Chinese Academy of Sciences, 20 Nanxincun, Xiangshan, Beijing 100093, PR China
- China National Botanical Garden, 20 Nanxincun, Xiangshan, Beijing 100093, PR China

- Zhenchang Liang
- Beijing Key Laboratory of Grape Science and Enology, CAS Key Laboratory of Plant Resources, Institute of Botany, Chinese Academy of Sciences, 20 Nanxincun, Xiangshan, Beijing 100093, China
- State Key Laboratory of Plant Diversity and Specialty Crops, Institute of Botany, Chinese Academy of Sciences, 20 Nanxincun, Xiangshan, Beijing 100093, PR China
- China National Botanical Garden, 20 Nanxincun, Xiangshan, Beijing 100093, PR China

- Yi Wang
- Beijing Key Laboratory of Grape Science and Enology, CAS Key Laboratory of Plant Resources, Institute of Botany, Chinese Academy of Sciences, 20 Nanxincun, Xiangshan, Beijing 100093, China
- State Key Laboratory of Plant Diversity and Specialty Crops, Institute of Botany, Chinese Academy of Sciences, 20 Nanxincun, Xiangshan, Beijing 100093, PR China
- China National Botanical Garden, 20 Nanxincun, Xiangshan, Beijing 100093, PR China

44
Cautereels C, Smets J, Bircham P, De Ruysscher D, Zimmermann A, De Rijk P, Steensels J, Gorkovskiy A, Masschelein J, Verstrepen KJ. Combinatorial optimization of gene expression through recombinase-mediated promoter and terminator shuffling in yeast. Nat Commun 2024; 15:1112. [PMID: 38326309 PMCID: PMC10850122 DOI: 10.1038/s41467-024-44997-7] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/10/2023] [Accepted: 01/12/2024] [Indexed: 02/09/2024] Open
Abstract
Microbes are increasingly employed as cell factories to produce biomolecules. This often involves the expression of complex heterologous biosynthesis pathways in host strains. Achieving maximal product yields and avoiding build-up of (toxic) intermediates requires balanced expression of every pathway gene. However, despite progress in metabolic modeling, the optimization of gene expression still heavily relies on trial-and-error. Here, we report an approach for in vivo, multiplexed Gene Expression Modification by LoxPsym-Cre Recombination (GEMbLeR). GEMbLeR exploits orthogonal LoxPsym sites to independently shuffle promoter and terminator modules at distinct genomic loci. This approach facilitates creation of large strain libraries, in which expression of every pathway gene ranges over 120-fold and each strain harbors a unique expression profile. When applied to the biosynthetic pathway of astaxanthin, an industrially relevant antioxidant, a single round of GEMbLeR improved pathway flux and doubled production titers. Together, this shows that GEMbLeR allows rapid and efficient gene expression optimization in heterologous biosynthetic pathways, offering possibilities for enhancing the performance of microbial cell factories.
Affiliation(s)
- Charlotte Cautereels
- VIB Laboratory for Systems Biology, VIB-KU Leuven Center for Microbiology, Leuven, 3001, Belgium
- Laboratory of Genetics and Genomics, Center of Microbial and Plant Genetics, Department M2S, KU Leuven, Gaston Geenslaan 1, Leuven, 3001, Belgium

- Jolien Smets
- VIB Laboratory for Systems Biology, VIB-KU Leuven Center for Microbiology, Leuven, 3001, Belgium
- Laboratory of Genetics and Genomics, Center of Microbial and Plant Genetics, Department M2S, KU Leuven, Gaston Geenslaan 1, Leuven, 3001, Belgium

- Peter Bircham
- VIB Laboratory for Systems Biology, VIB-KU Leuven Center for Microbiology, Leuven, 3001, Belgium
- Laboratory of Genetics and Genomics, Center of Microbial and Plant Genetics, Department M2S, KU Leuven, Gaston Geenslaan 1, Leuven, 3001, Belgium

- Dries De Ruysscher
- Molecular Biotechnology of Plants and Micro-organisms, Department of Biology, KU Leuven, Kasteelpark Arenberg 31, box 2438, Leuven, 3001, Belgium
- Laboratory for Biomolecular Discovery & Engineering, VIB-KU Leuven Center for Microbiology, Leuven, 3001, Belgium

- Anna Zimmermann
- VIB Laboratory for Systems Biology, VIB-KU Leuven Center for Microbiology, Leuven, 3001, Belgium
- Laboratory of Genetics and Genomics, Center of Microbial and Plant Genetics, Department M2S, KU Leuven, Gaston Geenslaan 1, Leuven, 3001, Belgium

- Peter De Rijk
- Neuromics Support Facility, VIB Center for Molecular Neurology, VIB, Antwerp, 2610, Belgium
- Neuromics Support Facility, Department of Biomedical Sciences, University of Antwerp, Antwerp, 2610, Belgium

- Jan Steensels
- VIB Laboratory for Systems Biology, VIB-KU Leuven Center for Microbiology, Leuven, 3001, Belgium
- Laboratory of Genetics and Genomics, Center of Microbial and Plant Genetics, Department M2S, KU Leuven, Gaston Geenslaan 1, Leuven, 3001, Belgium

- Anton Gorkovskiy
- VIB Laboratory for Systems Biology, VIB-KU Leuven Center for Microbiology, Leuven, 3001, Belgium
- Laboratory of Genetics and Genomics, Center of Microbial and Plant Genetics, Department M2S, KU Leuven, Gaston Geenslaan 1, Leuven, 3001, Belgium

- Joleen Masschelein
- Molecular Biotechnology of Plants and Micro-organisms, Department of Biology, KU Leuven, Kasteelpark Arenberg 31, box 2438, Leuven, 3001, Belgium
- Laboratory for Biomolecular Discovery & Engineering, VIB-KU Leuven Center for Microbiology, Leuven, 3001, Belgium

- Kevin J Verstrepen
- VIB Laboratory for Systems Biology, VIB-KU Leuven Center for Microbiology, Leuven, 3001, Belgium.
- Laboratory of Genetics and Genomics, Center of Microbial and Plant Genetics, Department M2S, KU Leuven, Gaston Geenslaan 1, Leuven, 3001, Belgium.

45
Zheng Z, Zhu M, Zhang J, Liu X, Hou L, Liu W, Yuan S, Luo C, Yao X, Liu J, Yang Y. A sequence-aware merger of genomic structural variations at population scale. Nat Commun 2024; 15:960. [PMID: 38307885 PMCID: PMC10837428 DOI: 10.1038/s41467-024-45244-9] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/19/2023] [Accepted: 01/18/2024] [Indexed: 02/04/2024] Open
Abstract
Merging structural variations (SVs) at the population level presents a significant challenge, yet it is essential for conducting comprehensive genotypic analyses, especially in the era of pangenomics. Here, we introduce PanPop, a tool that utilizes an advanced sequence-aware SV merging algorithm to efficiently merge SVs of various types. We demonstrate that PanPop can merge and optimize the majority of multiallelic SVs into informative biallelic variants. We show its superior precision and lower rates of missing data compared to alternative software solutions. Our approach not only enables the filtering of SVs by leveraging multiple SV callers for enhanced accuracy but also facilitates the accurate merging of large-scale population SVs. These capabilities of PanPop will help to accelerate future SV-related studies.
Affiliation(s)
- Zeyu Zheng
- State Key Laboratory of Herbage Improvement and Grassland Agro-ecosystems, College of Ecology, Lanzhou University, Lanzhou, China

- Mingjia Zhu
- State Key Laboratory of Herbage Improvement and Grassland Agro-ecosystems, College of Ecology, Lanzhou University, Lanzhou, China

- Jin Zhang
- State Key Laboratory of Herbage Improvement and Grassland Agro-ecosystems, College of Ecology, Lanzhou University, Lanzhou, China

- Xinfeng Liu
- State Key Laboratory of Herbage Improvement and Grassland Agro-ecosystems, College of Ecology, Lanzhou University, Lanzhou, China

- Liqiang Hou
- State Key Laboratory of Herbage Improvement and Grassland Agro-ecosystems, College of Ecology, Lanzhou University, Lanzhou, China

- Wenyu Liu
- State Key Laboratory of Herbage Improvement and Grassland Agro-ecosystems, College of Ecology, Lanzhou University, Lanzhou, China

- Shuai Yuan
- State Key Laboratory of Herbage Improvement and Grassland Agro-ecosystems, College of Ecology, Lanzhou University, Lanzhou, China

- Changhong Luo
- State Key Laboratory of Herbage Improvement and Grassland Agro-ecosystems, College of Ecology, Lanzhou University, Lanzhou, China

- Xinhao Yao
- State Key Laboratory of Herbage Improvement and Grassland Agro-ecosystems, College of Ecology, Lanzhou University, Lanzhou, China

- Jianquan Liu
- State Key Laboratory of Herbage Improvement and Grassland Agro-ecosystems, College of Ecology, Lanzhou University, Lanzhou, China.

- Yongzhi Yang
- State Key Laboratory of Herbage Improvement and Grassland Agro-ecosystems, College of Ecology, Lanzhou University, Lanzhou, China.

46
Charron P, Kang M. VariantDetective: an accurate all-in-one pipeline for detecting consensus bacterial SNPs and SVs. Bioinformatics 2024; 40:btae066. [PMID: 38366603 PMCID: PMC10898327 DOI: 10.1093/bioinformatics/btae066] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/10/2023] [Revised: 01/16/2024] [Accepted: 02/14/2024] [Indexed: 02/18/2024] Open
Abstract
MOTIVATION: Genomic variations comprise a spectrum of alterations, ranging from single nucleotide polymorphisms (SNPs) to large-scale structural variants (SVs), which play crucial roles in bacterial evolution and species diversification. Accurately identifying SNPs and SVs is beneficial for subsequent evolutionary and epidemiological studies. This study presents VariantDetective (VD), a novel, user-friendly, and all-in-one pipeline combining SNP and SV calling to generate consensus genomic variants using multiple tools. RESULTS: The VD pipeline accepts various file types as input to initiate SNP and/or SV calling, and benchmarking results demonstrate VD's robustness and high accuracy across multiple tested datasets when compared to existing variant calling approaches. AVAILABILITY AND IMPLEMENTATION: The source code, test data, and relevant information for VD are freely accessible at https://github.com/OLF-Bioinformatics/VariantDetective under the MIT License.
Affiliation(s)
- Philippe Charron
- Ottawa Laboratory-Fallowfield, Canadian Food Inspection Agency, 3851 Fallowfield Road, Nepean, Ontario K2J 4S1, Canada

- Mingsong Kang
- Ottawa Laboratory-Fallowfield, Canadian Food Inspection Agency, 3851 Fallowfield Road, Nepean, Ontario K2J 4S1, Canada

47
Wang N, Chen P, Xu Y, Guo L, Li X, Yi H, Larkin RM, Zhou Y, Deng X, Xu Q. Phased genomics reveals hidden somatic mutations and provides insight into fruit development in sweet orange. Hortic Res 2024; 11:uhad268. [PMID: 38371640 PMCID: PMC10873711 DOI: 10.1093/hr/uhad268] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/26/2023] [Accepted: 12/01/2023] [Indexed: 02/20/2024]
Abstract
Although revisiting the discoveries and implications of genetic variations using phased genomics is critical, such efforts are still lacking. Somatic mutations represent a crucial source of genetic diversity for breeding and are especially remarkable in heterozygous perennial and asexual crops. In this study, we focused on a diploid sweet orange (Citrus sinensis) and constructed a haplotype-resolved genome using high fidelity (HiFi) reads, which revealed 10.6% new sequences. Based on the phased genome, we elucidate significant genetic admixtures and haplotype differences. We developed a somatic detection strategy that reveals hidden somatic mutations overlooked in a single reference genome. We generated a phased somatic variation map by combining high-depth whole-genome sequencing (WGS) data from 87 sweet orange somatic varieties. Notably, we found twice as many somatic mutations relative to a single reference genome. Using these hidden somatic mutations, we separated sweet oranges into seven major clades and provide insight into unprecedented genetic mosaicism and strong positive selection. Furthermore, these phased genomics data indicate that genomic heterozygous variations contribute to allele-specific expression during fruit development. By integrating allelic expression differences and somatic mutations, we identified a somatic mutation that induces increases in fruit size. Applications of phased genomics will lead to powerful approaches for discovering genetic variations and uncovering their effects in highly heterozygous plants. Our data provide insight into the hidden somatic mutation landscape in the sweet orange genome, which will facilitate citrus breeding.
Affiliation(s)
- Nan Wang
- Institute of Horticultural Research, Hunan Academy of Agricultural Sciences, Changsha, China
- National Key Laboratory for Germplasm Innovation & Utilization of Horticultural Crops, Huazhong Agricultural University, Wuhan, China
- National Key Laboratory of Tropical Crop Breeding, Shenzhen Branch, Guangdong Laboratory of Lingnan Modern Agriculture, Key Laboratory of Synthetic Biology, Ministry of Agriculture and Rural Affairs, Agricultural Genomics Institute at Shenzhen, Chinese Academy of Agricultural Sciences, Shenzhen, China

- Peng Chen
- Institute of Horticultural Research, Hunan Academy of Agricultural Sciences, Changsha, China
- Yuelu Mountain Laboratory, Changsha, China

- Yuanyuan Xu
- Institute of Horticultural Research, Hunan Academy of Agricultural Sciences, Changsha, China
- Yuelu Mountain Laboratory, Changsha, China

- Lingxia Guo
- Institute of Horticultural Research, Hunan Academy of Agricultural Sciences, Changsha, China
- Yuelu Mountain Laboratory, Changsha, China

- Xianxin Li
- Institute of Horticultural Research, Hunan Academy of Agricultural Sciences, Changsha, China
- Yuelu Mountain Laboratory, Changsha, China

- Hualin Yi
- National Key Laboratory for Germplasm Innovation & Utilization of Horticultural Crops, Huazhong Agricultural University, Wuhan, China
- Hubei Hongshan Laboratory, Wuhan, China

- Robert M Larkin
- National Key Laboratory for Germplasm Innovation & Utilization of Horticultural Crops, Huazhong Agricultural University, Wuhan, China
- Hubei Hongshan Laboratory, Wuhan, China

- Yongfeng Zhou
- National Key Laboratory of Tropical Crop Breeding, Shenzhen Branch, Guangdong Laboratory of Lingnan Modern Agriculture, Key Laboratory of Synthetic Biology, Ministry of Agriculture and Rural Affairs, Agricultural Genomics Institute at Shenzhen, Chinese Academy of Agricultural Sciences, Shenzhen, China
- National Key Laboratory of Tropical Crop Breeding, Tropical Crops Genetic Resources Institute, Chinese Academy of Tropical Agricultural Sciences, Haikou, China

- Xiuxin Deng
- National Key Laboratory for Germplasm Innovation & Utilization of Horticultural Crops, Huazhong Agricultural University, Wuhan, China
- Hubei Hongshan Laboratory, Wuhan, China

- Qiang Xu
- National Key Laboratory for Germplasm Innovation & Utilization of Horticultural Crops, Huazhong Agricultural University, Wuhan, China
- Hubei Hongshan Laboratory, Wuhan, China

48
Lv Y, Liu C, Li X, Wang Y, He H, He W, Chen W, Yang L, Dai X, Cao X, Yu X, Liu J, Zhang B, Wei H, Zhang H, Qian H, Shi C, Leng Y, Liu X, Guo M, Wang X, Zhang Z, Wang T, Zhang B, Xu Q, Cui Y, Zhang Q, Yuan Q, Jahan N, Ma J, Zheng X, Zhou Y, Qian Q, Guo L, Shang L. A centromere map based on super pan-genome highlights the structure and function of rice centromeres. J Integr Plant Biol 2024; 66:196-207. [PMID: 38158885 DOI: 10.1111/jipb.13607] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/19/2023] [Accepted: 12/28/2023] [Indexed: 01/03/2024]
Abstract
Rice (Oryza sativa) is a significant crop worldwide with a genome shaped by various evolutionary factors. Rice centromeres are crucial for chromosome segregation and contain some unreported genes. Because the centromere region is diverse and complex, a comprehensive understanding of rice centromere structure and function at the population level is needed. We constructed a high-quality centromere map based on the rice super pan-genome, consisting of a 251-accession panel comprising both cultivated and wild species of Asian and African rice. We showed that rice centromeres have diverse CentO satellite repeats, which vary across chromosomes and subpopulations, reflecting their distinct evolutionary patterns. We also revealed that long terminal repeats (LTRs), especially young Gypsy-type LTRs, are abundant in the peripheral CentO-enriched regions and drive rice centromere expansion and evolution. Furthermore, high-quality genome assembly and a complete telomere-to-telomere (T2T) reference genome enable us to obtain more centromeric genome information, although mapping and cloning of centromere genes remain challenging. We investigated the association between structural variations and gene expression in the rice centromere. A centromere gene, OsMAB, which positively regulates rice tiller number, was further confirmed by expression quantitative trait loci, haplotype analysis, and clustered regularly interspaced short palindromic repeats (CRISPR)/CRISPR-associated protein 9 methods. By revealing new insights into the evolutionary patterns and biological roles of rice centromeres, our findings will facilitate future research on centromere biology and crop improvement.
Affiliation(s)
- Yang Lv
- State Key Laboratory of Rice Biology, China National Rice Research Institute, Hangzhou, 310006, China
- Shenzhen Branch, Guangdong Laboratory of Lingnan Modern Agriculture, Genome Analysis Laboratory of the Ministry of Agriculture and Rural Affairs, Agricultural Genomics Institute at Shenzhen, Chinese Academy of Agricultural Sciences, Shenzhen, 518120, China
- Congcong Liu
- Shenzhen Branch, Guangdong Laboratory of Lingnan Modern Agriculture, Genome Analysis Laboratory of the Ministry of Agriculture and Rural Affairs, Agricultural Genomics Institute at Shenzhen, Chinese Academy of Agricultural Sciences, Shenzhen, 518120, China
- Xiaoxia Li
- Shenzhen Branch, Guangdong Laboratory of Lingnan Modern Agriculture, Genome Analysis Laboratory of the Ministry of Agriculture and Rural Affairs, Agricultural Genomics Institute at Shenzhen, Chinese Academy of Agricultural Sciences, Shenzhen, 518120, China
- Yueying Wang
- State Key Laboratory of Rice Biology, China National Rice Research Institute, Hangzhou, 310006, China
- Huiying He
- Shenzhen Branch, Guangdong Laboratory of Lingnan Modern Agriculture, Genome Analysis Laboratory of the Ministry of Agriculture and Rural Affairs, Agricultural Genomics Institute at Shenzhen, Chinese Academy of Agricultural Sciences, Shenzhen, 518120, China
- Wenchuang He
- Shenzhen Branch, Guangdong Laboratory of Lingnan Modern Agriculture, Genome Analysis Laboratory of the Ministry of Agriculture and Rural Affairs, Agricultural Genomics Institute at Shenzhen, Chinese Academy of Agricultural Sciences, Shenzhen, 518120, China
- Wu Chen
- Shenzhen Branch, Guangdong Laboratory of Lingnan Modern Agriculture, Genome Analysis Laboratory of the Ministry of Agriculture and Rural Affairs, Agricultural Genomics Institute at Shenzhen, Chinese Academy of Agricultural Sciences, Shenzhen, 518120, China
- Longbo Yang
- Shenzhen Branch, Guangdong Laboratory of Lingnan Modern Agriculture, Genome Analysis Laboratory of the Ministry of Agriculture and Rural Affairs, Agricultural Genomics Institute at Shenzhen, Chinese Academy of Agricultural Sciences, Shenzhen, 518120, China
- Xiaofan Dai
- Shenzhen Branch, Guangdong Laboratory of Lingnan Modern Agriculture, Genome Analysis Laboratory of the Ministry of Agriculture and Rural Affairs, Agricultural Genomics Institute at Shenzhen, Chinese Academy of Agricultural Sciences, Shenzhen, 518120, China
- Xinglan Cao
- Shenzhen Branch, Guangdong Laboratory of Lingnan Modern Agriculture, Genome Analysis Laboratory of the Ministry of Agriculture and Rural Affairs, Agricultural Genomics Institute at Shenzhen, Chinese Academy of Agricultural Sciences, Shenzhen, 518120, China
- Xiaoman Yu
- Shenzhen Branch, Guangdong Laboratory of Lingnan Modern Agriculture, Genome Analysis Laboratory of the Ministry of Agriculture and Rural Affairs, Agricultural Genomics Institute at Shenzhen, Chinese Academy of Agricultural Sciences, Shenzhen, 518120, China
- Jiajia Liu
- Shenzhen Branch, Guangdong Laboratory of Lingnan Modern Agriculture, Genome Analysis Laboratory of the Ministry of Agriculture and Rural Affairs, Agricultural Genomics Institute at Shenzhen, Chinese Academy of Agricultural Sciences, Shenzhen, 518120, China
- Bin Zhang
- Shenzhen Branch, Guangdong Laboratory of Lingnan Modern Agriculture, Genome Analysis Laboratory of the Ministry of Agriculture and Rural Affairs, Agricultural Genomics Institute at Shenzhen, Chinese Academy of Agricultural Sciences, Shenzhen, 518120, China
- Hua Wei
- Shenzhen Branch, Guangdong Laboratory of Lingnan Modern Agriculture, Genome Analysis Laboratory of the Ministry of Agriculture and Rural Affairs, Agricultural Genomics Institute at Shenzhen, Chinese Academy of Agricultural Sciences, Shenzhen, 518120, China
- Hong Zhang
- Shenzhen Branch, Guangdong Laboratory of Lingnan Modern Agriculture, Genome Analysis Laboratory of the Ministry of Agriculture and Rural Affairs, Agricultural Genomics Institute at Shenzhen, Chinese Academy of Agricultural Sciences, Shenzhen, 518120, China
- Hongge Qian
- Shenzhen Branch, Guangdong Laboratory of Lingnan Modern Agriculture, Genome Analysis Laboratory of the Ministry of Agriculture and Rural Affairs, Agricultural Genomics Institute at Shenzhen, Chinese Academy of Agricultural Sciences, Shenzhen, 518120, China
- Chuanlin Shi
- Shenzhen Branch, Guangdong Laboratory of Lingnan Modern Agriculture, Genome Analysis Laboratory of the Ministry of Agriculture and Rural Affairs, Agricultural Genomics Institute at Shenzhen, Chinese Academy of Agricultural Sciences, Shenzhen, 518120, China
- Yue Leng
- Shenzhen Branch, Guangdong Laboratory of Lingnan Modern Agriculture, Genome Analysis Laboratory of the Ministry of Agriculture and Rural Affairs, Agricultural Genomics Institute at Shenzhen, Chinese Academy of Agricultural Sciences, Shenzhen, 518120, China
- Xiangpei Liu
- Shenzhen Branch, Guangdong Laboratory of Lingnan Modern Agriculture, Genome Analysis Laboratory of the Ministry of Agriculture and Rural Affairs, Agricultural Genomics Institute at Shenzhen, Chinese Academy of Agricultural Sciences, Shenzhen, 518120, China
- Mingliang Guo
- Shenzhen Branch, Guangdong Laboratory of Lingnan Modern Agriculture, Genome Analysis Laboratory of the Ministry of Agriculture and Rural Affairs, Agricultural Genomics Institute at Shenzhen, Chinese Academy of Agricultural Sciences, Shenzhen, 518120, China
- Xianmeng Wang
- Shenzhen Branch, Guangdong Laboratory of Lingnan Modern Agriculture, Genome Analysis Laboratory of the Ministry of Agriculture and Rural Affairs, Agricultural Genomics Institute at Shenzhen, Chinese Academy of Agricultural Sciences, Shenzhen, 518120, China
- Zhipeng Zhang
- Shenzhen Branch, Guangdong Laboratory of Lingnan Modern Agriculture, Genome Analysis Laboratory of the Ministry of Agriculture and Rural Affairs, Agricultural Genomics Institute at Shenzhen, Chinese Academy of Agricultural Sciences, Shenzhen, 518120, China
- Tianyi Wang
- Shenzhen Branch, Guangdong Laboratory of Lingnan Modern Agriculture, Genome Analysis Laboratory of the Ministry of Agriculture and Rural Affairs, Agricultural Genomics Institute at Shenzhen, Chinese Academy of Agricultural Sciences, Shenzhen, 518120, China
- Bintao Zhang
- Shenzhen Branch, Guangdong Laboratory of Lingnan Modern Agriculture, Genome Analysis Laboratory of the Ministry of Agriculture and Rural Affairs, Agricultural Genomics Institute at Shenzhen, Chinese Academy of Agricultural Sciences, Shenzhen, 518120, China
- Qiang Xu
- Shenzhen Branch, Guangdong Laboratory of Lingnan Modern Agriculture, Genome Analysis Laboratory of the Ministry of Agriculture and Rural Affairs, Agricultural Genomics Institute at Shenzhen, Chinese Academy of Agricultural Sciences, Shenzhen, 518120, China
- Yan Cui
- Shenzhen Branch, Guangdong Laboratory of Lingnan Modern Agriculture, Genome Analysis Laboratory of the Ministry of Agriculture and Rural Affairs, Agricultural Genomics Institute at Shenzhen, Chinese Academy of Agricultural Sciences, Shenzhen, 518120, China
- Qianqian Zhang
- Shenzhen Branch, Guangdong Laboratory of Lingnan Modern Agriculture, Genome Analysis Laboratory of the Ministry of Agriculture and Rural Affairs, Agricultural Genomics Institute at Shenzhen, Chinese Academy of Agricultural Sciences, Shenzhen, 518120, China
- Qiaoling Yuan
- Shenzhen Branch, Guangdong Laboratory of Lingnan Modern Agriculture, Genome Analysis Laboratory of the Ministry of Agriculture and Rural Affairs, Agricultural Genomics Institute at Shenzhen, Chinese Academy of Agricultural Sciences, Shenzhen, 518120, China
- Noushin Jahan
- State Key Laboratory of Rice Biology, China National Rice Research Institute, Hangzhou, 310006, China
- Jie Ma
- State Key Laboratory of Rice Biology, China National Rice Research Institute, Hangzhou, 310006, China
- Xiaoming Zheng
- Yazhouwan National Laboratory, No. 8 Huanjin Road, Yazhou District, Sanya City, 572024, China
- National Key Facility for Crop Gene Resources and Genetic Improvement, Institute of Crop Science, Chinese Academy of Agricultural Sciences, Beijing, 100081, China
- Yongfeng Zhou
- Shenzhen Branch, Guangdong Laboratory of Lingnan Modern Agriculture, Genome Analysis Laboratory of the Ministry of Agriculture and Rural Affairs, Agricultural Genomics Institute at Shenzhen, Chinese Academy of Agricultural Sciences, Shenzhen, 518120, China
- Qian Qian
- State Key Laboratory of Rice Biology, China National Rice Research Institute, Hangzhou, 310006, China
- Shenzhen Branch, Guangdong Laboratory of Lingnan Modern Agriculture, Genome Analysis Laboratory of the Ministry of Agriculture and Rural Affairs, Agricultural Genomics Institute at Shenzhen, Chinese Academy of Agricultural Sciences, Shenzhen, 518120, China
- Yazhouwan National Laboratory, No. 8 Huanjin Road, Yazhou District, Sanya City, 572024, China
- Longbiao Guo
- State Key Laboratory of Rice Biology, China National Rice Research Institute, Hangzhou, 310006, China
- Lianguang Shang
- Shenzhen Branch, Guangdong Laboratory of Lingnan Modern Agriculture, Genome Analysis Laboratory of the Ministry of Agriculture and Rural Affairs, Agricultural Genomics Institute at Shenzhen, Chinese Academy of Agricultural Sciences, Shenzhen, 518120, China
- Yazhouwan National Laboratory, No. 8 Huanjin Road, Yazhou District, Sanya City, 572024, China
49
Wei ZG, Zhang XD, Fan XG, Qian Y, Liu F, Wu FX. pathMap: a path-based mapping tool for long noisy reads with high sensitivity. Brief Bioinform 2024; 25:bbae107. [PMID: 38517696 PMCID: PMC10959152 DOI: 10.1093/bib/bbae107] [Received: 07/05/2023] [Revised: 12/25/2023] [Accepted: 02/28/2024] [Indexed: 03/24/2024]
Abstract
With the rapid development of single-molecule sequencing (SMS) technologies, output read lengths are continuously increasing. Mapping such reads onto a reference genome is one of the most fundamental tasks in sequence analysis. Mapping sensitivity is becoming a major concern, since a more sensitive mapper detects more aligned regions on the reference and obtains more aligned bases, both of which are useful for downstream analysis. In this study, we present pathMap, a novel k-mer graph-based mapper that is specifically designed for mapping SMS reads with high sensitivity. By viewing an alignment chain as a path containing as many anchors as possible in the matched k-mer graph, pathMap treats chaining as a path selection problem in a directed graph. Because pathMap iteratively searches for the longest path among the remaining nodes, more high-quality candidate chains can be detected and aligned. In experiments on simulated and real datasets against other state-of-the-art mappers such as minimap2 and Winnowmap2, pathMap maps at least 11.50% more chains than its closest competitor and increases mapping sensitivity by 17.28% and 13.84% of bases over the next-best mapper for Pacific Biosciences and Oxford Nanopore sequencing data, respectively. In addition, pathMap is more robust to sequence errors and more sensitive in species- and strain-specific identification of pathogens using MinION reads.
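The chaining idea in this abstract (treat anchors as nodes of a directed graph, take the longest path, then repeat on the remaining nodes) can be illustrated with a minimal sketch. This is not pathMap's implementation: the anchor representation as `(read_pos, ref_pos)` pairs, the colinearity rule for edges, and the `max_gap` and `min_anchors` parameters are all assumptions made for illustration.

```python
from typing import List, Tuple

Anchor = Tuple[int, int]  # hypothetical: (read_pos, ref_pos) of a matched k-mer


def longest_chain(anchors: List[Anchor], max_gap: int) -> List[Anchor]:
    """Longest path (by anchor count) in the DAG where a -> b if b lies
    after a on both read and reference, within max_gap on each axis."""
    pts = sorted(anchors)
    if not pts:
        return []
    best = [1] * len(pts)   # best[j]: length of the longest path ending at j
    prev = [-1] * len(pts)  # back-pointers for path recovery
    for j, (rj, fj) in enumerate(pts):
        for i in range(j):
            ri, fi = pts[i]
            colinear = ri < rj and fi < fj
            close = rj - ri <= max_gap and fj - fi <= max_gap
            if colinear and close and best[i] + 1 > best[j]:
                best[j] = best[i] + 1
                prev[j] = i
    end = max(range(len(pts)), key=best.__getitem__)
    path = []
    while end != -1:
        path.append(pts[end])
        end = prev[end]
    return path[::-1]


def iterative_chains(anchors: List[Anchor], max_gap: int = 500,
                     min_anchors: int = 3) -> List[List[Anchor]]:
    """Repeatedly extract the longest chain from the anchors not yet used."""
    remaining = set(anchors)
    chains = []
    while remaining:
        chain = longest_chain(list(remaining), max_gap)
        if len(chain) < min_anchors:
            break  # assumed stopping rule: remaining chains are too short
        chains.append(chain)
        remaining -= set(chain)
    return chains
```

With two colinear groups of anchors that are far apart on the reference, the first pass recovers the longer group and the second pass recovers the other, so both candidate regions are reported rather than only the best one. The quadratic dynamic program is chosen here for clarity, not speed.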
Affiliation(s)
- Ze-Gang Wei
- School of Physics and Opto-Electronics Technology, Baoji University of Arts and Sciences, Baoji, 721016, China
- Division of Biomedical Engineering, Department of Computer Science and Department of Mechanical Engineering, University of Saskatchewan, Saskatoon, SK S7N 5A9, Canada
- Xiao-Dan Zhang
- School of Physics and Opto-Electronics Technology, Baoji University of Arts and Sciences, Baoji, 721016, China
- Xing-Guo Fan
- School of Physics and Opto-Electronics Technology, Baoji University of Arts and Sciences, Baoji, 721016, China
- Yu Qian
- School of Physics and Opto-Electronics Technology, Baoji University of Arts and Sciences, Baoji, 721016, China
- Fei Liu
- School of Physics and Opto-Electronics Technology, Baoji University of Arts and Sciences, Baoji, 721016, China
- Fang-Xiang Wu
- Division of Biomedical Engineering, Department of Computer Science and Department of Mechanical Engineering, University of Saskatchewan, Saskatoon, SK S7N 5A9, Canada
50
Zhang Z, Jiang T, Li G, Cao S, Liu Y, Liu B, Wang Y. Kled: an ultra-fast and sensitive structural variant detection tool for long-read sequencing data. Brief Bioinform 2024; 25:bbae049. [PMID: 38385878 PMCID: PMC10883419 DOI: 10.1093/bib/bbae049] [Received: 09/15/2023] [Revised: 01/12/2024] [Accepted: 01/26/2024] [Indexed: 02/23/2024]
Abstract
Structural variants (SVs) are a crucial type of genetic variant that can significantly impact phenotypes. Therefore, the identification of SVs is an essential part of modern genomic analysis. In this article, we present kled, an ultra-fast and sensitive SV caller for long-read sequencing data, built on a specially designed approach with a novel signature-merging algorithm, custom refinement strategies and a high-performance program structure. The evaluation results demonstrate that kled can achieve optimal SV calling compared to several state-of-the-art methods on simulated and real long-read data for different platforms and sequencing depths. Furthermore, kled excels at rapid SV calling and can efficiently utilize multiple central processing unit (CPU) cores while maintaining low memory usage. The source code for kled can be obtained from https://github.com/CoREse/kled.
Affiliation(s)
- Zhendong Zhang
- Center for Bioinformatics, Faculty of Computing, Harbin Institute of Technology, Harbin, Heilongjiang 150001, China
- Key Laboratory of Biological Bigdata, Ministry of Education, Harbin Institute of Technology, Harbin, Heilongjiang 150001, China
- Tao Jiang
- Center for Bioinformatics, Faculty of Computing, Harbin Institute of Technology, Harbin, Heilongjiang 150001, China
- Zhengzhou Research Institute, Harbin Institute of Technology, Zhengzhou, Henan, 450000, China
- Key Laboratory of Biological Bigdata, Ministry of Education, Harbin Institute of Technology, Harbin, Heilongjiang 150001, China
- Gaoyang Li
- Center for Bioinformatics, Faculty of Computing, Harbin Institute of Technology, Harbin, Heilongjiang 150001, China
- Key Laboratory of Biological Bigdata, Ministry of Education, Harbin Institute of Technology, Harbin, Heilongjiang 150001, China
- Shuqi Cao
- Center for Bioinformatics, Faculty of Computing, Harbin Institute of Technology, Harbin, Heilongjiang 150001, China
- Key Laboratory of Biological Bigdata, Ministry of Education, Harbin Institute of Technology, Harbin, Heilongjiang 150001, China
- Yadong Liu
- Center for Bioinformatics, Faculty of Computing, Harbin Institute of Technology, Harbin, Heilongjiang 150001, China
- Zhengzhou Research Institute, Harbin Institute of Technology, Zhengzhou, Henan, 450000, China
- Key Laboratory of Biological Bigdata, Ministry of Education, Harbin Institute of Technology, Harbin, Heilongjiang 150001, China
- Bo Liu
- Center for Bioinformatics, Faculty of Computing, Harbin Institute of Technology, Harbin, Heilongjiang 150001, China
- Zhengzhou Research Institute, Harbin Institute of Technology, Zhengzhou, Henan, 450000, China
- Key Laboratory of Biological Bigdata, Ministry of Education, Harbin Institute of Technology, Harbin, Heilongjiang 150001, China
- Yadong Wang
- Center for Bioinformatics, Faculty of Computing, Harbin Institute of Technology, Harbin, Heilongjiang 150001, China
- Zhengzhou Research Institute, Harbin Institute of Technology, Zhengzhou, Henan, 450000, China
- Key Laboratory of Biological Bigdata, Ministry of Education, Harbin Institute of Technology, Harbin, Heilongjiang 150001, China