1
|
Harris L, McDonagh EM, Zhang X, Fawcett K, Foreman A, Daneck P, Sergouniotis PI, Parkinson H, Mazzarotto F, Inouye M, Hollox EJ, Birney E, Fitzgerald T. Genome-wide association testing beyond SNPs. Nat Rev Genet 2024:10.1038/s41576-024-00778-y. [PMID: 39375560 DOI: 10.1038/s41576-024-00778-y] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Accepted: 09/03/2024] [Indexed: 10/09/2024]
Abstract
Decades of genetic association testing in human cohorts have provided important insights into the genetic architecture and biological underpinnings of complex traits and diseases. However, for certain traits, genome-wide association studies (GWAS) for common SNPs are approaching signal saturation, which underscores the need to explore other types of genetic variation to understand the genetic basis of traits and diseases. Copy number variation (CNV) is an important source of heritability that is well known to functionally affect human traits. Recent technological and computational advances enable the large-scale, genome-wide evaluation of CNVs, with implications for downstream applications such as polygenic risk scoring and drug target identification. Here, we review the current state of CNV-GWAS, discuss current limitations in resource infrastructure that need to be overcome to enable the wider uptake of CNV-GWAS results, highlight emerging opportunities and suggest guidelines and standards for future GWAS for genetic variation beyond SNPs at scale.
Collapse
Affiliation(s)
- Laura Harris
- European Molecular Biology Laboratory (EMBL), European Bioinformatics Institute (EBI), Wellcome Genome Campus, Hinxton, UK
| | - Ellen M McDonagh
- European Molecular Biology Laboratory (EMBL), European Bioinformatics Institute (EBI), Wellcome Genome Campus, Hinxton, UK
| | - Xiaolei Zhang
- European Molecular Biology Laboratory (EMBL), European Bioinformatics Institute (EBI), Wellcome Genome Campus, Hinxton, UK
| | - Katherine Fawcett
- European Molecular Biology Laboratory (EMBL), European Bioinformatics Institute (EBI), Wellcome Genome Campus, Hinxton, UK
- Department of Population Health Sciences, University of Leicester, Leicester, UK
| | - Amy Foreman
- European Molecular Biology Laboratory (EMBL), European Bioinformatics Institute (EBI), Wellcome Genome Campus, Hinxton, UK
| | - Petr Daneck
- Wellcome Sanger Institute, Wellcome Genome Campus, Hinxton, UK
| | - Panagiotis I Sergouniotis
- European Molecular Biology Laboratory (EMBL), European Bioinformatics Institute (EBI), Wellcome Genome Campus, Hinxton, UK
- Division of Evolution, Infection and Genomics, School of Biological Sciences, University of Manchester, Manchester, UK
| | - Helen Parkinson
- European Molecular Biology Laboratory (EMBL), European Bioinformatics Institute (EBI), Wellcome Genome Campus, Hinxton, UK
| | - Francesco Mazzarotto
- Department of Molecular and Translational Medicine, University of Brescia, Brescia, Italy
- National Heart and Lung Institute, Imperial College London, London, UK
| | - Michael Inouye
- British Heart Foundation Cardiovascular Epidemiology Unit, Department of Public Health and Primary Care, University of Cambridge, Cambridge, UK
- Cambridge Baker Systems Genomics Initiative, Department of Public Health and Primary Care, University of Cambridge, Cambridge, UK
- Cambridge Baker Systems Genomics Initiative, Baker Heart and Diabetes Institute, Melbourne, Australia
| | - Edward J Hollox
- Department of Genetics and Genome Biology, University of Leicester, Leicester, UK
| | - Ewan Birney
- European Molecular Biology Laboratory (EMBL), European Bioinformatics Institute (EBI), Wellcome Genome Campus, Hinxton, UK
| | - Tomas Fitzgerald
- European Molecular Biology Laboratory (EMBL), European Bioinformatics Institute (EBI), Wellcome Genome Campus, Hinxton, UK.
| |
Collapse
|
2
|
Logsdon GA, Ebert P, Audano PA, Loftus M, Porubsky D, Ebler J, Yilmaz F, Hallast P, Prodanov T, Yoo D, Paisie CA, Harvey WT, Zhao X, Martino GV, Henglin M, Munson KM, Rabbani K, Chin CS, Gu B, Ashraf H, Austine-Orimoloye O, Balachandran P, Bonder MJ, Cheng H, Chong Z, Crabtree J, Gerstein M, Guethlein LA, Hasenfeld P, Hickey G, Hoekzema K, Hunt SE, Jensen M, Jiang Y, Koren S, Kwon Y, Li C, Li H, Li J, Norman PJ, Oshima KK, Paten B, Phillippy AM, Pollock NR, Rausch T, Rautiainen M, Scholz S, Song Y, Söylev A, Sulovari A, Surapaneni L, Tsapalou V, Zhou W, Zhou Y, Zhu Q, Zody MC, Mills RE, Devine SE, Shi X, Talkowski ME, Chaisson MJP, Dilthey AT, Konkel MK, Korbel JO, Lee C, Beck CR, Eichler EE, Marschall T. Complex genetic variation in nearly complete human genomes. BIORXIV : THE PREPRINT SERVER FOR BIOLOGY 2024:2024.09.24.614721. [PMID: 39372794 PMCID: PMC11451754 DOI: 10.1101/2024.09.24.614721] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Grants] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 10/08/2024]
Abstract
Diverse sets of complete human genomes are required to construct a pangenome reference and to understand the extent of complex structural variation. Here, we sequence 65 diverse human genomes and build 130 haplotype-resolved assemblies (130 Mbp median continuity), closing 92% of all previous assembly gaps1,2 and reaching telomere-to-telomere (T2T) status for 39% of the chromosomes. We highlight complete sequence continuity of complex loci, including the major histocompatibility complex (MHC), SMN1/SMN2, NBPF8, and AMY1/AMY2, and fully resolve 1,852 complex structural variants (SVs). In addition, we completely assemble and validate 1,246 human centromeres. We find up to 30-fold variation in α-satellite high-order repeat (HOR) array length and characterize the pattern of mobile element insertions into α-satellite HOR arrays. While most centromeres predict a single site of kinetochore attachment, epigenetic analysis suggests the presence of two hypomethylated regions for 7% of centromeres. Combining our data with the draft pangenome reference1 significantly enhances genotyping accuracy from short-read data, enabling whole-genome inference3 to a median quality value (QV) of 45. Using this approach, 26,115 SVs per sample are detected, substantially increasing the number of SVs now amenable to downstream disease association studies.
Collapse
Affiliation(s)
- Glennis A Logsdon
- Perelman School of Medicine, University of Pennsylvania, Department of Genetics, Epigenetics Institute, Philadelphia, PA, USA
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA
| | - Peter Ebert
- Core Unit Bioinformatics, Medical Faculty and University Hospital Düsseldorf, Heinrich Heine University, Düsseldorf, Germany
- Center for Digital Medicine, Heinrich Heine University, Düsseldorf, Germany
| | - Peter A Audano
- The Jackson Laboratory for Genomic Medicine, Farmington, CT, USA
| | - Mark Loftus
- Clemson University, Department of Genetics & Biochemistry, Clemson, SC, USA
- Center for Human Genetics, Clemson University, Greenwood, SC, USA
| | - David Porubsky
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA
| | - Jana Ebler
- Institute for Medical Biometry and Bioinformatics, Medical Faculty and University Hospital Düsseldorf, Heinrich Heine University, Düsseldorf, Germany
- Center for Digital Medicine, Heinrich Heine University, Düsseldorf, Germany
| | - Feyza Yilmaz
- The Jackson Laboratory for Genomic Medicine, Farmington, CT, USA
| | - Pille Hallast
- The Jackson Laboratory for Genomic Medicine, Farmington, CT, USA
| | - Timofey Prodanov
- Institute for Medical Biometry and Bioinformatics, Medical Faculty and University Hospital Düsseldorf, Heinrich Heine University, Düsseldorf, Germany
- Center for Digital Medicine, Heinrich Heine University, Düsseldorf, Germany
| | - DongAhn Yoo
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA
| | - Carolyn A Paisie
- The Jackson Laboratory for Genomic Medicine, Farmington, CT, USA
| | - William T Harvey
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA
| | - Xuefang Zhao
- Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Center for Genomic Medicine, Massachusetts General Hospital, Boston, MA, USA
| | - Gianni V Martino
- Clemson University, Department of Genetics & Biochemistry, Clemson, SC, USA
- Center for Human Genetics, Clemson University, Greenwood, SC, USA
- Medical University of South Carolina, College of Graduate Studies, Charleston, SC, USA
| | - Mir Henglin
- Institute for Medical Biometry and Bioinformatics, Medical Faculty and University Hospital Düsseldorf, Heinrich Heine University, Düsseldorf, Germany
- Center for Digital Medicine, Heinrich Heine University, Düsseldorf, Germany
| | - Katherine M Munson
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA
| | - Keon Rabbani
- Department of Quantitative and Computational Biology, University of Southern California, Los Angeles, CA, USA
| | - Chen-Shan Chin
- Foundation of Biological Data Sciences, Belmont, CA, USA
| | - Bida Gu
- Department of Quantitative and Computational Biology, University of Southern California, Los Angeles, CA, USA
| | - Hufsah Ashraf
- Institute for Medical Biometry and Bioinformatics, Medical Faculty and University Hospital Düsseldorf, Heinrich Heine University, Düsseldorf, Germany
- Center for Digital Medicine, Heinrich Heine University, Düsseldorf, Germany
| | - Olanrewaju Austine-Orimoloye
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge, United Kingdom
| | | | - Marc Jan Bonder
- Department of Genetics, University of Groningen, University Medical Center Groningen, Groningen, the Netherlands; Oncode Institute, Utrecht, The Netherlands
- Division of Computational Genomics and Systems Genetics, German Cancer Research Center, Heidelberg, Germany
| | - Haoyu Cheng
- Department of Biomedical Informatics and Data Science, Yale School of Medicine, New Haven, CT, USA
| | - Zechen Chong
- Department of Biomedical Informatics and Data Science, Heersink School of Medicine, University of Alabama, Birmingham, AL, USA
| | - Jonathan Crabtree
- Institute for Genome Sciences, University of Maryland School of Medicine, Baltimore, MD, USA
| | - Mark Gerstein
- Department of Molecular Biophysics and Biochemistry, Yale University, New Haven, CT, USA
- Program in Computational Biology and Bioinformatics, Yale University, New Haven, CT, USA
| | - Lisbeth A Guethlein
- Department of Structural Biology, School of Medicine, Stanford University, Stanford, CA, USA
| | - Patrick Hasenfeld
- European Molecular Biology Laboratory (EMBL), Genome Biology Unit, Heidelberg, Germany
| | - Glenn Hickey
- UC Santa Cruz Genomics Institute, University of California, Santa Cruz, CA, USA
| | - Kendra Hoekzema
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA
| | - Sarah E Hunt
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge, United Kingdom
| | - Matthew Jensen
- Department of Molecular Biophysics and Biochemistry, Yale University, New Haven, CT, USA
- Program in Computational Biology and Bioinformatics, Yale University, New Haven, CT, USA
| | - Yunzhe Jiang
- Department of Molecular Biophysics and Biochemistry, Yale University, New Haven, CT, USA
- Program in Computational Biology and Bioinformatics, Yale University, New Haven, CT, USA
| | - Sergey Koren
- Genome Informatics Section, Center for Genomics and Data Science Research, National Human Genome Research Institute, National Institutes of Health, Bethesda, MD, USA
| | - Youngjun Kwon
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA
| | - Chong Li
- Temple University, Department of Computer and Information Sciences, College of Science and Technology, Philadelphia, PA, USA
- Temple University, Institute for Genomics and Evolutionary Medicine, Philadelphia, PA, USA
| | - Heng Li
- Department of Data Science, Dana-Farber Cancer Institute, Boston, MA, USA
- Department of Biomedical Informatics, Harvard Medical School, Boston, MA, USA
| | - Jiaqi Li
- Department of Molecular Biophysics and Biochemistry, Yale University, New Haven, CT, USA
- Program in Computational Biology and Bioinformatics, Yale University, New Haven, CT, USA
| | - Paul J Norman
- Department of Biomedical Informatics, University of Colorado School of Medicine, Aurora, CO, USA
- Department of Immunology and Microbiology, University of Colorado School of Medicine, Aurora, CO, USA
| | - Keisuke K Oshima
- Perelman School of Medicine, University of Pennsylvania, Department of Genetics, Epigenetics Institute, Philadelphia, PA, USA
| | - Benedict Paten
- UC Santa Cruz Genomics Institute, University of California, Santa Cruz, CA, USA
| | - Adam M Phillippy
- Genome Informatics Section, Center for Genomics and Data Science Research, National Human Genome Research Institute, National Institutes of Health, Bethesda, MD, USA
| | - Nicholas R Pollock
- Department of Biomedical Informatics, University of Colorado School of Medicine, Aurora, CO, USA
| | - Tobias Rausch
- European Molecular Biology Laboratory (EMBL), Genome Biology Unit, Heidelberg, Germany
| | - Mikko Rautiainen
- Institute for Molecular Medicine Finland (FIMM), University of Helsinki, Helsinki, Finland
| | - Stephan Scholz
- Institute of Medical Microbiology and Hospital Hygiene, Medical Faculty, Heinrich Heine University, Düsseldorf, Germany
| | - Yuwei Song
- Department of Biomedical Informatics and Data Science, Heersink School of Medicine, University of Alabama, Birmingham, AL, USA
| | - Arda Söylev
- Institute for Medical Biometry and Bioinformatics, Medical Faculty and University Hospital Düsseldorf, Heinrich Heine University, Düsseldorf, Germany
- Center for Digital Medicine, Heinrich Heine University, Düsseldorf, Germany
| | - Arvis Sulovari
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA
| | - Likhitha Surapaneni
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge, United Kingdom
| | - Vasiliki Tsapalou
- European Molecular Biology Laboratory (EMBL), Genome Biology Unit, Heidelberg, Germany
| | - Weichen Zhou
- Department of Computational Medicine & Bioinformatics, University of Michigan, MI, USA
| | - Ying Zhou
- Department of Data Science, Dana-Farber Cancer Institute, Boston, MA, USA
- Department of Biomedical Informatics, Harvard Medical School, Boston, MA, USA
| | - Qihui Zhu
- The Jackson Laboratory for Genomic Medicine, Farmington, CT, USA
- Stanford Health Care, Palo Alto, CA, USA
| | | | - Ryan E Mills
- Department of Computational Medicine & Bioinformatics, University of Michigan, MI, USA
| | - Scott E Devine
- Institute for Genome Sciences, University of Maryland School of Medicine, Baltimore, MD, USA
| | - Xinghua Shi
- Temple University, Department of Computer and Information Sciences, College of Science and Technology, Philadelphia, PA, USA
- Temple University, Institute for Genomics and Evolutionary Medicine, Philadelphia, PA, USA
| | - Mike E Talkowski
- Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Center for Genomic Medicine, Massachusetts General Hospital, Boston, MA, USA
- Department of Neurology, Harvard Medical School, Boston, MA, USA
| | - Mark J P Chaisson
- Department of Quantitative and Computational Biology, University of Southern California, Los Angeles, CA, USA
| | - Alexander T Dilthey
- Center for Digital Medicine, Heinrich Heine University, Düsseldorf, Germany
- Institute of Medical Microbiology and Hospital Hygiene, Medical Faculty, Heinrich Heine University, Düsseldorf, Germany
| | - Miriam K Konkel
- Clemson University, Department of Genetics & Biochemistry, Clemson, SC, USA
- Center for Human Genetics, Clemson University, Greenwood, SC, USA
| | - Jan O Korbel
- European Molecular Biology Laboratory (EMBL), Genome Biology Unit, Heidelberg, Germany
| | - Charles Lee
- The Jackson Laboratory for Genomic Medicine, Farmington, CT, USA
| | - Christine R Beck
- The Jackson Laboratory for Genomic Medicine, Farmington, CT, USA
- The University of Connecticut Health Center, Farmington, CT, USA
| | - Evan E Eichler
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA
- Howard Hughes Medical Institute, University of Washington, Seattle, WA, USA
| | - Tobias Marschall
- Institute for Medical Biometry and Bioinformatics, Medical Faculty and University Hospital Düsseldorf, Heinrich Heine University, Düsseldorf, Germany
- Center for Digital Medicine, Heinrich Heine University, Düsseldorf, Germany
| |
Collapse
|
3
|
Nakamichi K, Huey J, Sangermano R, Place EM, Bujakowska KM, Marra M, Everett LA, Yang P, Chao JR, Van Gelder RN, Mustafi D. Targeted long-read sequencing enriches disease-relevant genomic regions of interest to provide complete Mendelian disease diagnostics. JCI Insight 2024; 9:e183902. [PMID: 39264853 PMCID: PMC11530123 DOI: 10.1172/jci.insight.183902] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/26/2024] [Accepted: 09/10/2024] [Indexed: 09/14/2024] Open
Abstract
Despite advances in sequencing technologies, a molecular diagnosis remains elusive in many patients with Mendelian disease. Current short-read clinical sequencing approaches cannot provide chromosomal phase information or epigenetic information without further sample processing, which is not routinely done and can result in an incomplete molecular diagnosis in patients. The ability to provide phased genetic and epigenetic information from a single sequencing run would improve the diagnostic rate of Mendelian conditions. Here, we describe targeted long-read sequencing of Mendelian disease genes (TaLon-SeqMD) using a real-time adaptive sequencing approach. Optimization of bioinformatic targeting enabled selective enrichment of multiple disease-causing regions of the human genome. Haplotype-resolved variant calling and simultaneous resolution of epigenetic base modification could be achieved in a single sequencing run. The TaLon-SeqMD approach was validated in a cohort of 18 individuals with previous genetic testing targeting 373 inherited retinal disease (IRD) genes, yielding the complete molecular diagnosis in each case. This approach was then applied in 2 IRD cases with inconclusive testing, which uncovered noncoding and structural variants that were difficult to characterize by standard short-read sequencing. Overall, these results demonstrate TaLon-SeqMD as an approach to provide rapid phased-variant calling to provide the molecular basis of Mendelian diseases.
Collapse
Affiliation(s)
- Kenji Nakamichi
- Department of Ophthalmology, University of Washington, Seattle, Washington, USA
- Roger and Karalis Johnson Retina Center, Seattle, Washington, USA
| | - Jennifer Huey
- Department of Ophthalmology, University of Washington, Seattle, Washington, USA
- Roger and Karalis Johnson Retina Center, Seattle, Washington, USA
| | - Riccardo Sangermano
- Ocular Genomics Institute, Department of Ophthalmology, Massachusetts Eye and Ear, Harvard Medical School, Boston, Massachusetts, USA
| | - Emily M. Place
- Ocular Genomics Institute, Department of Ophthalmology, Massachusetts Eye and Ear, Harvard Medical School, Boston, Massachusetts, USA
| | - Kinga M. Bujakowska
- Ocular Genomics Institute, Department of Ophthalmology, Massachusetts Eye and Ear, Harvard Medical School, Boston, Massachusetts, USA
| | - Molly Marra
- Casey Eye Institute, Oregon Health & Science University, Portland, Oregon, USA
| | - Lesley A. Everett
- Casey Eye Institute, Oregon Health & Science University, Portland, Oregon, USA
| | - Paul Yang
- Casey Eye Institute, Oregon Health & Science University, Portland, Oregon, USA
| | - Jennifer R. Chao
- Department of Ophthalmology, University of Washington, Seattle, Washington, USA
- Roger and Karalis Johnson Retina Center, Seattle, Washington, USA
| | - Russell N. Van Gelder
- Department of Ophthalmology, University of Washington, Seattle, Washington, USA
- Roger and Karalis Johnson Retina Center, Seattle, Washington, USA
- Departments of Laboratory Medicine and Pathology and Biological Structure, University of Washington, Seattle, Washington, USA
| | - Debarshi Mustafi
- Department of Ophthalmology, University of Washington, Seattle, Washington, USA
- Roger and Karalis Johnson Retina Center, Seattle, Washington, USA
- Brotman Baty Institute for Precision Medicine, Seattle, Washington, USA
- Division of Ophthalmology, Seattle Children’s Hospital, Seattle, Washington, USA
| |
Collapse
|
4
|
Jiang T, Zhou Z, Zhang Z, Cao S, Wang Y, Liu Y. MEHunter: transformer-based mobile element variant detection from long reads. BIOINFORMATICS (OXFORD, ENGLAND) 2024; 40:btae557. [PMID: 39287014 DOI: 10.1093/bioinformatics/btae557] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Subscribe] [Scholar Register] [Received: 05/21/2024] [Revised: 09/03/2024] [Accepted: 09/13/2024] [Indexed: 09/19/2024]
Abstract
SUMMARY Mobile genetic elements (MEs) are heritable mutagens that significantly contribute to genetic diseases. The advent of long-read sequencing technologies, capable of resolving large DNA fragments, offers promising prospects for the comprehensive detection of ME variants (MEVs). However, achieving high precision while maintaining recall performance remains challenging mainly brought by the variable length and similar content of MEV signatures, which are often obscured by the noise in long reads. Here, we propose MEHunter, a high-performance MEV detection approach utilizing a fine-tuned transformer model adept at identifying potential MEVs with fragmented features. Benchmark experiments on both simulated and real datasets demonstrate that MEHunter consistently achieves higher accuracy and sensitivity than the state-of-the-art tools. Furthermore, it is capable of detecting novel potentially individual-specific MEVs that have been overlooked in published population projects. AVAILABILITY AND IMPLEMENTATION MEHunter is available from https://github.com/120L021101/MEHunter.
Collapse
Affiliation(s)
- Tao Jiang
- Center for Bioinformatics, Faculty of Computing, Harbin Institute of Technology, Harbin, Heilongjiang 150001, China
- Zhengzhou Research Institute, Harbin Institute of Technology, Zhengzhou, Henan 450000, China
| | - Zuji Zhou
- Center for Bioinformatics, Faculty of Computing, Harbin Institute of Technology, Harbin, Heilongjiang 150001, China
| | - Zhendong Zhang
- Center for Bioinformatics, Faculty of Computing, Harbin Institute of Technology, Harbin, Heilongjiang 150001, China
| | - Shuqi Cao
- Center for Bioinformatics, Faculty of Computing, Harbin Institute of Technology, Harbin, Heilongjiang 150001, China
| | - Yadong Wang
- Center for Bioinformatics, Faculty of Computing, Harbin Institute of Technology, Harbin, Heilongjiang 150001, China
- Zhengzhou Research Institute, Harbin Institute of Technology, Zhengzhou, Henan 450000, China
| | - Yadong Liu
- Center for Bioinformatics, Faculty of Computing, Harbin Institute of Technology, Harbin, Heilongjiang 150001, China
- Zhengzhou Research Institute, Harbin Institute of Technology, Zhengzhou, Henan 450000, China
| |
Collapse
|
5
|
González-Prendes R, Pena RN, Richart C, Nadal J, Ros-Freixedes R. Long-read de novo assembly of the red-legged partridge (Alectoris rufa) genome. Sci Data 2024; 11:908. [PMID: 39191744 PMCID: PMC11349902 DOI: 10.1038/s41597-024-03659-5] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/30/2024] [Accepted: 07/18/2024] [Indexed: 08/29/2024] Open
Abstract
The red-legged partridge (Alectoris rufa) is a popular game bird species that is in decline in several regions of southwestern Europe. The introduction of farm-reared individuals of a distinct genetic make-up in hunting reserves can result in genetic swamping of wild populations. Here we present a de novo genome assembly for the red-legged partridge based on long-read sequencing technology. The assembled genome size is 1.14 Gb, with scaffold N50 of 37.6 Mb and contig N50 of 29.5 Mb. Our genome is highly contiguous and contains 97.06% of complete avian core genes. Overall, the quality of this genome assembly is equivalent to those available for other close relatives such as the Japanese quail or the chicken. This genome assembly will contribute to the understanding of genetic dynamics of wild populations of red-legged partridges with releases of farm-reared reinforcements and to appropriate management decisions of such populations.
Collapse
Affiliation(s)
- Rayner González-Prendes
- Animal Breeding and Genomics, Wageningen University & Research, 6708PB, Wageningen, The Netherlands
| | - Ramona Natacha Pena
- Departament de Ciència Animal, Universitat de Lleida, Lleida, Spain
- Agrotecnio-CERCA Center, Lleida, Spain
| | - Cristóbal Richart
- Departament de Medicina i Cirurgia, Universitat Rovira i Virgili, Tarragona, Spain
| | - Jesús Nadal
- Departament de Ciència Animal, Universitat de Lleida, Lleida, Spain.
| | - Roger Ros-Freixedes
- Departament de Ciència Animal, Universitat de Lleida, Lleida, Spain.
- Agrotecnio-CERCA Center, Lleida, Spain.
| |
Collapse
|
6
|
Liu Z, Xie Z, Li M. Comprehensive and deep evaluation of structural variation detection pipelines with third-generation sequencing data. Genome Biol 2024; 25:188. [PMID: 39010145 PMCID: PMC11247875 DOI: 10.1186/s13059-024-03324-5] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/27/2023] [Accepted: 06/26/2024] [Indexed: 07/17/2024] Open
Abstract
BACKGROUND Structural variation (SV) detection methods using third-generation sequencing data are widely employed, yet accurately detecting SVs remains challenging. Different methods often yield inconsistent results for certain SV types, complicating tool selection and revealing biases in detection. RESULTS This study comprehensively evaluates 53 SV detection pipelines using simulated and real data from PacBio (CLR: Continuous Long Read, CCS: Circular Consensus Sequencing) and Nanopore (ONT) platforms. We assess their performance in detecting various sizes and types of SVs, breakpoint biases, and genotyping accuracy with various sequencing depths. Notably, pipelines such as Minimap2-cuteSV2, NGMLR-SVIM, PBMM2-pbsv, Winnowmap-Sniffles2, and Winnowmap-SVision exhibit comparatively higher recall and precision. Our findings also show that combining multiple pipelines with the same aligner, like pbmm2 or winnowmap, can significantly enhance performance. The individual pipelines' detailed ranking and performance metrics can be viewed in a dynamic table: http://pmglab.top/SVPipelinesRanking . CONCLUSIONS This study comprehensively characterizes the strengths and weaknesses of numerous pipelines, providing valuable insights that can improve SV detection in third-generation sequencing data and inform SV annotation and function prediction.
Collapse
Affiliation(s)
- Zhi Liu
- Program in Bioinformatics, Zhongshan School of Medicine, The Fifth Affiliated Hospital, Sun Yat-Sen University, Guangzhou, China
- Key Laboratory of Tropical Disease Control (Sun Yat-Sen University), Ministry of Education, Guangzhou, China
| | - Zhi Xie
- State Key Laboratory of Ophthalmology, Zhongshan Ophthalmic Center, Sun Yat-Sen University, Guangzhou, China
| | - Miaoxin Li
- Program in Bioinformatics, Zhongshan School of Medicine, The Fifth Affiliated Hospital, Sun Yat-Sen University, Guangzhou, China.
- Key Laboratory of Tropical Disease Control (Sun Yat-Sen University), Ministry of Education, Guangzhou, China.
- Center for Precision Medicine, Sun Yat-Sen University, Guangzhou, China.
- Department of Psychiatry, The University of Hong Kong, Hong Kong, SAR, China.
- Guangdong Provincial Key Laboratory of Biomedical Imaging and Guangdong Provincial Engineering Research Center of Molecular Imaging, The Fifth Affiliated Hospital, Sun Yat-Sen University, Zhuhai, China.
| |
Collapse
|
7
|
Chan YF, Lu CW, Kuo HC, Hung CM. A chromosome-level genome assembly of the Asian house martin implies potential genes associated with the feathered-foot trait. G3 (BETHESDA, MD.) 2024; 14:jkae077. [PMID: 38607414 PMCID: PMC11152083 DOI: 10.1093/g3journal/jkae077] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Subscribe] [Scholar Register] [Received: 03/04/2024] [Revised: 03/04/2024] [Accepted: 03/27/2024] [Indexed: 04/13/2024]
Abstract
The presence of feathers is a vital characteristic among birds, yet most modern birds had no feather on their feet. The discoveries of feathers on the hind limbs of basal birds and dinosaurs have sparked an interest in the evolutionary origin and genetic mechanism of feathered feet. However, the majority of studies investigating the genes associated with this trait focused on domestic populations. Understanding the genetic mechanism underpinned feathered-foot development in wild birds is still in its infancy. Here, we assembled a chromosome-level genome of the Asian house martin (Delichon dasypus) using the long-read High Fidelity sequencing approach to initiate the search for genes associated with its feathered feet. We employed the whole-genome alignment of D. dasypus with other swallow species to identify high-SNP regions and chromosomal inversions in the D. dasypus genome. After filtering out variations unrelated to D. dasypus evolution, we found six genes related to feather development near the high-SNP regions. We also detected three feather development genes in chromosomal inversions between the Asian house martin and the barn swallow genomes. We discussed their association with the wingless/integrated (WNT), bone morphogenetic protein, and fibroblast growth factor pathways and their potential roles in feathered-foot development. Future studies are encouraged to utilize the D. dasypus genome to explore the evolutionary process of the feathered-foot trait in avian species. This endeavor will shed light on the evolutionary path of feathers in birds.
Collapse
Affiliation(s)
- Yuan-Fu Chan
- Biodiversity Research Center, Academia Sinica, Taipei 11529, Taiwan
| | - Chia-Wei Lu
- Biodiversity Research Center, Academia Sinica, Taipei 11529, Taiwan
| | - Hao-Chih Kuo
- Biodiversity Research Center, Academia Sinica, Taipei 11529, Taiwan
| | - Chih-Ming Hung
- Biodiversity Research Center, Academia Sinica, Taipei 11529, Taiwan
| |
Collapse
|
8
|
Hu H, Gao R, Gao W, Gao B, Jiang Z, Zhou M, Wang G, Jiang T. SVDF: enhancing structural variation detect from long-read sequencing via automatic filtering strategies. Brief Bioinform 2024; 25:bbae336. [PMID: 38980375 PMCID: PMC11232458 DOI: 10.1093/bib/bbae336] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/06/2024] [Revised: 06/03/2024] [Accepted: 06/27/2024] [Indexed: 07/10/2024] Open
Abstract
Structural variation (SV) is an important form of genomic variation that influences gene function and expression by altering the structure of the genome. Although long-read data have been proven to better characterize SVs, SVs detected from noisy long-read data still include a considerable portion of false-positive calls. To accurately detect SVs in long-read data, we present SVDF, a method that employs a learning-based noise filtering strategy and an SV signature-adaptive clustering algorithm, for effectively reducing the likelihood of false-positive events. Benchmarking results from multiple orthogonal experiments demonstrate that, across different sequencing platforms and depths, SVDF achieves higher calling accuracy for each sample compared to several existing general SV calling tools. We believe that, with its meticulous and sensitive SV detection capability, SVDF can bring new opportunities and advancements to cutting-edge genomic research.
Collapse
Affiliation(s)
- Heng Hu
- College of Life Sciences, Northeast Forestry University, Harbin 150000, China
| | - Runtian Gao
- College of Life Sciences, Northeast Forestry University, Harbin 150000, China
| | - Wentao Gao
- College of Life Sciences, Northeast Forestry University, Harbin 150000, China
| | - Bo Gao
- Department of Radiology, The Second Affiliated Hospital of Harbin Medical University, Harbin 150000, China
| | - Zhongjun Jiang
- College of Life Sciences, Northeast Forestry University, Harbin 150000, China
| | - Murong Zhou
- College of Life Sciences, Northeast Forestry University, Harbin 150000, China
| | - Guohua Wang
- College of Computer and Control Engineering, Northeast Forestry University, Harbin 150000, China
- State Key Laboratory of Tree Genetics and Breeding, Harbin 150000, China
| | - Tao Jiang
- School of Computer Science and Technology, Harbin Institute of Technology, Harbin 150000, China
| |
Collapse
|
9
|
Wang S, Lin J, Jia P, Xu T, Li X, Liu Y, Xu D, Bush SJ, Meng D, Ye K. De novo and somatic structural variant discovery with SVision-pro. Nat Biotechnol 2024:10.1038/s41587-024-02190-7. [PMID: 38519720 DOI: 10.1038/s41587-024-02190-7] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/01/2023] [Accepted: 02/27/2024] [Indexed: 03/25/2024]
Abstract
Long-read-based de novo and somatic structural variant (SV) discovery remains challenging, necessitating genomic comparison between samples. We developed SVision-pro, a neural-network-based instance segmentation framework that represents genome-to-genome-level sequencing differences visually and discovers SV comparatively between genomes without any prerequisite for inference models. SVision-pro outperforms state-of-the-art approaches, in particular, the resolving of complex SVs is improved, with low Mendelian error rates, high sensitivity of low-frequency SVs and reduced false-positive rates compared with SV merging approaches.
Collapse
Affiliation(s)
- Songbo Wang
- Department of Gynecology and Obstetrics, Center for Mathematical Medical, The First Affiliated Hospital of Xi'an Jiaotong University, Xi'an, China
- School of Automation Science and Engineering, Faculty of Electronic and Information Engineering, Xi'an Jiaotong University, Xi'an, China
- MOE Key Lab for Intelligent Networks & Networks Security, Faculty of Electronic and Information Engineering, Xi'an Jiaotong University, Xi'an, China
| | - Jiadong Lin
- School of Automation Science and Engineering, Faculty of Electronic and Information Engineering, Xi'an Jiaotong University, Xi'an, China
- MOE Key Lab for Intelligent Networks & Networks Security, Faculty of Electronic and Information Engineering, Xi'an Jiaotong University, Xi'an, China
| | - Peng Jia
- Department of Gynecology and Obstetrics, Center for Mathematical Medical, The First Affiliated Hospital of Xi'an Jiaotong University, Xi'an, China
- School of Automation Science and Engineering, Faculty of Electronic and Information Engineering, Xi'an Jiaotong University, Xi'an, China
- MOE Key Lab for Intelligent Networks & Networks Security, Faculty of Electronic and Information Engineering, Xi'an Jiaotong University, Xi'an, China
| | - Tun Xu
- School of Automation Science and Engineering, Faculty of Electronic and Information Engineering, Xi'an Jiaotong University, Xi'an, China
- MOE Key Lab for Intelligent Networks & Networks Security, Faculty of Electronic and Information Engineering, Xi'an Jiaotong University, Xi'an, China
| | - Xiujuan Li
- School of Automation Science and Engineering, Faculty of Electronic and Information Engineering, Xi'an Jiaotong University, Xi'an, China
- MOE Key Lab for Intelligent Networks & Networks Security, Faculty of Electronic and Information Engineering, Xi'an Jiaotong University, Xi'an, China
| | - Yuezhuangnan Liu
- School of Life Science and Technology, Xi'an Jiaotong University, Xi'an, China
| | - Dan Xu
- School of Life Science and Technology, Xi'an Jiaotong University, Xi'an, China
| | - Stephen J Bush
- School of Automation Science and Engineering, Faculty of Electronic and Information Engineering, Xi'an Jiaotong University, Xi'an, China
- MOE Key Lab for Intelligent Networks & Networks Security, Faculty of Electronic and Information Engineering, Xi'an Jiaotong University, Xi'an, China
| | - Deyu Meng
- MOE Key Lab for Intelligent Networks & Networks Security, Faculty of Electronic and Information Engineering, Xi'an Jiaotong University, Xi'an, China
- School of Mathematics and Statistics, Xi'an Jiaotong University, Xi'an, China
- Macau Institute of Systems Engineering, Macau University of Science and Technology, Taipa, Macau
- Pazhou Laboratory (Huangpu), Guangzhou, Guangdong, China
| | - Kai Ye
- Department of Gynecology and Obstetrics, Center for Mathematical Medical, The First Affiliated Hospital of Xi'an Jiaotong University, Xi'an, China.
- School of Automation Science and Engineering, Faculty of Electronic and Information Engineering, Xi'an Jiaotong University, Xi'an, China.
- MOE Key Lab for Intelligent Networks & Networks Security, Faculty of Electronic and Information Engineering, Xi'an Jiaotong University, Xi'an, China.
- School of Life Science and Technology, Xi'an Jiaotong University, Xi'an, China.
- Faculty of Science, Leiden University, Leiden, The Netherlands.
- Genome Institute, The First Affiliated Hospital of Xi'an Jiaotong University, Xi'an, China.
| |
Collapse
|
10
|
Liu YH, Luo C, Golding SG, Ioffe JB, Zhou XM. Tradeoffs in alignment and assembly-based methods for structural variant detection with long-read sequencing data. Nat Commun 2024; 15:2447. [PMID: 38503752 PMCID: PMC10951360 DOI: 10.1038/s41467-024-46614-z] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/17/2022] [Accepted: 03/04/2024] [Indexed: 03/21/2024] Open
Abstract
Long-read sequencing offers long contiguous DNA fragments, facilitating diploid genome assembly and structural variant (SV) detection. Efficient and robust algorithms for SV identification are crucial with increasing data availability. Alignment-based methods, favored for their computational efficiency and lower coverage requirements, are prominent. Alternative approaches, relying solely on available reads for de novo genome assembly and employing assembly-based tools for SV detection via comparison to a reference genome, demand significantly more computational resources. However, the lack of comprehensive benchmarking constrains our comprehension and hampers further algorithm development. Here we systematically compare 14 read alignment-based SV calling methods (including 4 deep learning-based methods and 1 hybrid method), and 4 assembly-based SV calling methods, alongside 4 upstream aligners and 7 assemblers. Assembly-based tools excel in detecting large SVs, especially insertions, and exhibit robustness to evaluation parameter changes and coverage fluctuations. Conversely, alignment-based tools demonstrate superior genotyping accuracy at low sequencing coverage (5-10×) and excel in detecting complex SVs, like translocations, inversions, and duplications. Our evaluation provides performance insights, highlighting the absence of a universally superior tool. We furnish guidelines across 31 criteria combinations, aiding users in selecting the most suitable tools for diverse scenarios and offering directions for further method development.
Collapse
Affiliation(s)
- Yichen Henry Liu
- Department of Computer Science, Vanderbilt University, 37235, Nashville, TN, USA
| | - Can Luo
- Department of Biomedical Engineering, Vanderbilt University, 37235, Nashville, TN, USA
| | - Staunton G Golding
- Department of Biomedical Engineering, Vanderbilt University, 37235, Nashville, TN, USA
| | - Jacob B Ioffe
- Department of Computer Science, Vanderbilt University, 37235, Nashville, TN, USA
| | - Xin Maizie Zhou
- Department of Computer Science, Vanderbilt University, 37235, Nashville, TN, USA.
- Department of Biomedical Engineering, Vanderbilt University, 37235, Nashville, TN, USA.
- Data Science Institute, Vanderbilt University, 37235, Nashville, TN, USA.
| |
Collapse
|
11
|
Lesack KJ, Wasmuth JD. The impact of FASTQ and alignment read order on structural variant calling from long-read sequencing data. PeerJ 2024; 12:e17101. [PMID: 38500526 PMCID: PMC10946394 DOI: 10.7717/peerj.17101] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/28/2023] [Accepted: 02/21/2024] [Indexed: 03/20/2024] Open
Abstract
Background Structural variant (SV) calling from DNA sequencing data has been challenging due to several factors, including the ambiguity of short-read alignments, multiple complex SVs in the same genomic region, and the lack of "truth" datasets for benchmarking. Additionally, caller choice, parameter settings, and alignment method are known to affect SV calling. However, the impact of FASTQ read order on SV calling has not been explored for long-read data. Results Here, we used PacBio DNA sequencing data from 15 Caenorhabditis elegans strains and four Arabidopsis thaliana ecotypes to evaluate the sensitivity of different SV callers on FASTQ read order. Comparisons of variant call format files generated from the original and permutated FASTQ files demonstrated that the order of input data affected the SVs predicted by each caller. In particular, pbsv was highly sensitive to the order of the input data, especially at the highest depths where over 70% of the SV calls generated from pairs of differently ordered FASTQ files were in disagreement. These demonstrate that read order sensitivity is a complex, multifactorial process, as the differences observed both within and between species varied considerably according to the specific combination of aligner, SV caller, and sequencing depth. In addition to the SV callers being sensitive to the input data order, the SAMtools alignment sorting algorithm was identified as a source of variability following read order randomization. Conclusion The results of this study highlight the sensitivity of SV calling on the order of reads encoded in FASTQ files, which has not been recognized in long-read approaches. These findings have implications for the replication of SV studies and the development of consistent SV calling protocols. Our study suggests that researchers should pay attention to the input order sensitivity of read alignment sorting methods when analyzing long-read sequencing data for SV calling, as mitigating a source of variability could facilitate future replication work. These results also raise important questions surrounding the relationship between SV caller read order sensitivity and tool performance. Therefore, tool developers should also consider input order sensitivity as a potential source of variability during the development and benchmarking of new and improved methods for SV calling.
Collapse
Affiliation(s)
- Kyle J. Lesack
- Faculty of Veterinary Medicine, University of Calgary, Calgary, Alberta, Canada
- Host-Parasite Interactions Research Training Network, University of Calgary, Calgary, Alberta, Canada
| | - James D. Wasmuth
- Faculty of Veterinary Medicine, University of Calgary, Calgary, Alberta, Canada
- Host-Parasite Interactions Research Training Network, University of Calgary, Calgary, Alberta, Canada
| |
Collapse
|
12
|
Zheng Y, Shang X. SVvalidation: A long-read-based validation method for genomic structural variation. PLoS One 2024; 19:e0291741. [PMID: 38181020 PMCID: PMC10769053 DOI: 10.1371/journal.pone.0291741] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/17/2023] [Accepted: 09/05/2023] [Indexed: 01/07/2024] Open
Abstract
Although various methods have been developed to detect structural variations (SVs) in genomic sequences, few are used to validate these results. Several commonly used SV callers produce many false positive SVs, and existing validation methods are not accurate enough. Therefore, a highly efficient and accurate validation method is essential. In response, we propose SVvalidation-a new method that uses long-read sequencing data for validating SVs with higher accuracy and efficiency. Compared to existing methods, SVvalidation performs better in validating SVs in repeat regions and can determine the homozygosity or heterozygosity of an SV. Additionally, SVvalidation offers the highest recall, precision, and F1-score (improving by 7-16%) across all datasets. Moreover, SVvalidation is suitable for different types of SVs. The program is available at https://github.com/nwpuzhengyan/SVvalidation.
Collapse
Affiliation(s)
- Yan Zheng
- School of Computer Science, Northwestern Polytechnical University, Xi’an, China
| | - Xuequn Shang
- School of Computer Science, Northwestern Polytechnical University, Xi’an, China
| |
Collapse
|
13
|
Xiao Y, Xiao Z, Liu L, Ma Y, Zhao H, Wu Y, Huang J, Xu P, Liu J, Li J. Innovative approach for high-throughput exploiting sex-specific markers in Japanese parrotfish Oplegnathus fasciatus. Gigascience 2024; 13:giae045. [PMID: 39028586 PMCID: PMC11258905 DOI: 10.1093/gigascience/giae045] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/06/2023] [Revised: 04/21/2024] [Accepted: 06/22/2024] [Indexed: 07/21/2024] Open
Abstract
BACKGROUND The use of sex-specific molecular markers has become a prominent method in enhancing fish production and economic value, as well as providing a foundation for understanding the complex molecular mechanisms involved in fish sex determination. Over the past decades, research on male and female sex identification has predominantly employed molecular biology methodologies such as restriction fragment length polymorphism, random amplification of polymorphic DNA, simple sequence repeat, and amplified fragment length polymorphism. The emergence of high-throughput sequencing technologies, particularly Illumina, has led to the utilization of single nucleotide polymorphism and insertion/deletion variants as significant molecular markers for investigating sex identification in fish. The advancement of sex-controlled breeding encounters numerous challenges, including the inefficiency of current methods, intricate experimental protocols, high costs of development, elevated rates of false positives, marker instability, and cumbersome field-testing procedures. Nevertheless, the emergence and swift progress of PacBio high-throughput sequencing technology, characterized by its long-read output capabilities, offers novel opportunities to overcome these obstacles. FINDINGS Utilizing male/female assembled genome information in conjunction with short-read sequencing data survey and long-read PacBio sequencing data, a catalog of large-segment (>100 bp) insertion/deletion genetic variants was generated through a genome-wide variant site-scanning approach with bidirectional comparisons. The sequence tagging sites were ranked based on the long-read depth of the insertion/deletion site, with markers exhibiting lower long-read depth being considered more effective for large-segment deletion variants. Subsequently, a catalog of bulk primers and simulated PCR for the male/female variant loci was developed, incorporating primer design for the target region and electronic PCR (e-PCR) technology. The Japanese parrotfish (Oplegnathus fasciatus), belonging to the Oplegnathidae family within the Centrarchiformes order, holds significant economic value as a rocky reef fish indigenous to East Asia. The criteria for rapid identification of male and female differences in Japanese parrotfish were established through agarose gel electrophoresis, which revealed 2 amplified bands for males and 1 amplified band for females. A high-throughput identification catalog of sex-specific markers was then constructed using this method, resulting in the identification of 3,639 (2,786 INS/853 DEL, ♀ as reference) and 3,672 (2,876 INS/833 DEL, ♂ as reference) markers in conjunction with 1,021 and 894 high-quality genetic sex identification markers, respectively. Sixteen differential loci were randomly chosen from the catalog for validation, with 11 of them meeting the criteria for male/female distinctions. The implementation of cost-effective and efficient technological processes would facilitate the rapid advancement of genetic breeding through expediting the high-throughput development of sex genetic markers for various species. CONCLUSIONS Our study utilized assembled genome information from male and female individuals obtained from PacBio, in addition to data from short-read sequencing data survey and long-read PacBio sequencing data. We extensively employed genome-wide variant site scanning and identification, high-throughput primer design of target regions, and e-PCR batch amplification, along with statistical analysis and ranking of the long-read depth of the variant sites. Through this integrated approach, we successfully compiled a catalog of large insertion/deletion sites (>100 bp) in both male and female Japanese parrotfish.
Collapse
Affiliation(s)
- Yongshuang Xiao
- Center for Ocean Mega-Science, Key Laboratory of Breeding Biotechnology and Sustainable Aquaculture (CAS), Institute of Oceanology, Chinese Academy of Sciences, Qingdao, 266071, China
- Laboratory for Marine Biology and Biotechnology, Qingdao Marine Science and Technology Center, Qingdao, 266071, China
- Shandong Province Key Laboratory of Experimental Marine Biology, Institute of Oceanology, Chinese Academy of Sciences, Qingdao, 266071, China
| | - Zhizhong Xiao
- Center for Ocean Mega-Science, Key Laboratory of Breeding Biotechnology and Sustainable Aquaculture (CAS), Institute of Oceanology, Chinese Academy of Sciences, Qingdao, 266071, China
- Laboratory for Marine Biology and Biotechnology, Qingdao Marine Science and Technology Center, Qingdao, 266071, China
- Shandong Province Key Laboratory of Experimental Marine Biology, Institute of Oceanology, Chinese Academy of Sciences, Qingdao, 266071, China
- Weihai Hao Huigan Marine Biotechnology Co., Weihai, 26449, China
| | - Lin Liu
- Wuhan Frasergen Bioinformatics Co., Ltd, East Lake High-Tech Zone, Wuhan, 430073, China
| | - Yuting Ma
- Center for Ocean Mega-Science, Key Laboratory of Breeding Biotechnology and Sustainable Aquaculture (CAS), Institute of Oceanology, Chinese Academy of Sciences, Qingdao, 266071, China
- Laboratory for Marine Biology and Biotechnology, Qingdao Marine Science and Technology Center, Qingdao, 266071, China
- Shandong Province Key Laboratory of Experimental Marine Biology, Institute of Oceanology, Chinese Academy of Sciences, Qingdao, 266071, China
| | - Haixia Zhao
- Center for Ocean Mega-Science, Key Laboratory of Breeding Biotechnology and Sustainable Aquaculture (CAS), Institute of Oceanology, Chinese Academy of Sciences, Qingdao, 266071, China
- Laboratory for Marine Biology and Biotechnology, Qingdao Marine Science and Technology Center, Qingdao, 266071, China
- Shandong Province Key Laboratory of Experimental Marine Biology, Institute of Oceanology, Chinese Academy of Sciences, Qingdao, 266071, China
| | - Yanduo Wu
- Center for Ocean Mega-Science, Key Laboratory of Breeding Biotechnology and Sustainable Aquaculture (CAS), Institute of Oceanology, Chinese Academy of Sciences, Qingdao, 266071, China
- Laboratory for Marine Biology and Biotechnology, Qingdao Marine Science and Technology Center, Qingdao, 266071, China
- Shandong Province Key Laboratory of Experimental Marine Biology, Institute of Oceanology, Chinese Academy of Sciences, Qingdao, 266071, China
| | - Jinwei Huang
- Center for Ocean Mega-Science, Key Laboratory of Breeding Biotechnology and Sustainable Aquaculture (CAS), Institute of Oceanology, Chinese Academy of Sciences, Qingdao, 266071, China
- Laboratory for Marine Biology and Biotechnology, Qingdao Marine Science and Technology Center, Qingdao, 266071, China
- Shandong Province Key Laboratory of Experimental Marine Biology, Institute of Oceanology, Chinese Academy of Sciences, Qingdao, 266071, China
| | - Pingrui Xu
- Center for Ocean Mega-Science, Key Laboratory of Breeding Biotechnology and Sustainable Aquaculture (CAS), Institute of Oceanology, Chinese Academy of Sciences, Qingdao, 266071, China
- Laboratory for Marine Biology and Biotechnology, Qingdao Marine Science and Technology Center, Qingdao, 266071, China
- Shandong Province Key Laboratory of Experimental Marine Biology, Institute of Oceanology, Chinese Academy of Sciences, Qingdao, 266071, China
| | - Jing Liu
- Center for Ocean Mega-Science, Key Laboratory of Breeding Biotechnology and Sustainable Aquaculture (CAS), Institute of Oceanology, Chinese Academy of Sciences, Qingdao, 266071, China
| | - Jun Li
- Center for Ocean Mega-Science, Key Laboratory of Breeding Biotechnology and Sustainable Aquaculture (CAS), Institute of Oceanology, Chinese Academy of Sciences, Qingdao, 266071, China
- Laboratory for Marine Biology and Biotechnology, Qingdao Marine Science and Technology Center, Qingdao, 266071, China
- Shandong Province Key Laboratory of Experimental Marine Biology, Institute of Oceanology, Chinese Academy of Sciences, Qingdao, 266071, China
| |
Collapse
|
14
|
Brannan EO, Hartley GA, O’Neill RJ. Mechanisms of Rapid Karyotype Evolution in Mammals. Genes (Basel) 2023; 15:62. [PMID: 38254952 PMCID: PMC10815390 DOI: 10.3390/genes15010062] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/12/2023] [Revised: 12/27/2023] [Accepted: 12/28/2023] [Indexed: 01/24/2024] Open
Abstract
Chromosome reshuffling events are often a foundational mechanism by which speciation can occur, giving rise to highly derivative karyotypes even amongst closely related species. Yet, the features that distinguish lineages prone to such rapid chromosome evolution from those that maintain stable karyotypes across evolutionary time are still to be defined. In this review, we summarize lineages prone to rapid karyotypic evolution in the context of Simpson's rates of evolution-tachytelic, horotelic, and bradytelic-and outline the mechanisms proposed to contribute to chromosome rearrangements, their fixation, and their potential impact on speciation events. Furthermore, we discuss relevant genomic features that underpin chromosome variation, including patterns of fusions/fissions, centromere positioning, and epigenetic marks such as DNA methylation. Finally, in the era of telomere-to-telomere genomics, we discuss the value of gapless genome resources to the future of research focused on the plasticity of highly rearranged karyotypes.
Collapse
Affiliation(s)
- Emry O. Brannan
- Department of Molecular and Cell Biology, University of Connecticut, Storrs, CT 06269, USA; (E.O.B.); (G.A.H.)
| | - Gabrielle A. Hartley
- Department of Molecular and Cell Biology, University of Connecticut, Storrs, CT 06269, USA; (E.O.B.); (G.A.H.)
| | - Rachel J. O’Neill
- Department of Molecular and Cell Biology, University of Connecticut, Storrs, CT 06269, USA; (E.O.B.); (G.A.H.)
- Institute for Systems Genomics, University of Connecticut, Storrs, CT 06269, USA
| |
Collapse
|
15
|
Ghielmetti G, Loubser J, Kerr TJ, Stuber T, Thacker T, Martin LC, O'Hare MA, Mhlophe SK, Okunola A, Loxton AG, Warren RM, Moseley MH, Miller MA, Goosen WJ. Advancing animal tuberculosis surveillance using culture-independent long-read whole-genome sequencing. Front Microbiol 2023; 14:1307440. [PMID: 38075895 PMCID: PMC10699144 DOI: 10.3389/fmicb.2023.1307440] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/05/2023] [Accepted: 10/23/2023] [Indexed: 02/12/2024] Open
Abstract
Animal tuberculosis is a significant infectious disease affecting both livestock and wildlife populations worldwide. Effective disease surveillance and characterization of Mycobacterium bovis (M. bovis) strains are essential for understanding transmission dynamics and implementing control measures. Currently, sequencing of genomic information has relied on culture-based methods, which are time-consuming, resource-demanding, and concerning in terms of biosafety. This study explores the use of culture-independent long-read whole-genome sequencing (WGS) for a better understanding of M. bovis epidemiology in African buffaloes (Syncerus caffer). By comparing two sequencing approaches, we evaluated the efficacy of Illumina WGS performed on culture extracts and culture-independent Oxford Nanopore adaptive sampling (NAS). Our objective was to assess the potential of NAS to detect genomic variants without sample culture. In addition, culture-independent amplicon sequencing, targeting mycobacterial-specific housekeeping and full-length 16S rRNA genes, was applied to investigate the presence of microorganisms, including nontuberculous mycobacteria. The sequencing quality obtained from DNA extracted directly from tissues using NAS is comparable to the sequencing quality of reads generated from culture-derived DNA using both NAS and Illumina technologies. We present a new approach that provides complete and accurate genome sequence reconstruction, culture independently, and using an economically affordable technique.
Collapse
Affiliation(s)
- Giovanni Ghielmetti
- Division of Molecular Biology and Human Genetics, South African Medical Research Council Centre for Tuberculosis Research, Faculty of Medicine and Health Sciences, Stellenbosch University, Cape Town, South Africa
- Section of Veterinary Bacteriology, Institute for Food Safety and Hygiene, Vetsuisse Faculty, University of Zurich, Zurich, Switzerland
| | - Johannes Loubser
- Division of Molecular Biology and Human Genetics, South African Medical Research Council Centre for Tuberculosis Research, Faculty of Medicine and Health Sciences, Stellenbosch University, Cape Town, South Africa
| | - Tanya J. Kerr
- Division of Molecular Biology and Human Genetics, South African Medical Research Council Centre for Tuberculosis Research, Faculty of Medicine and Health Sciences, Stellenbosch University, Cape Town, South Africa
| | - Tod Stuber
- National Veterinary Services Laboratories, Veterinary Services, Animal and Plant Health Inspection Service, United States Department of Agriculture, Ames, IA, United States
| | - Tyler Thacker
- National Veterinary Services Laboratories, Veterinary Services, Animal and Plant Health Inspection Service, United States Department of Agriculture, Ames, IA, United States
| | - Lauren C. Martin
- Division of Molecular Biology and Human Genetics, South African Medical Research Council Centre for Tuberculosis Research, Faculty of Medicine and Health Sciences, Stellenbosch University, Cape Town, South Africa
| | - Michaela A. O'Hare
- Division of Molecular Biology and Human Genetics, South African Medical Research Council Centre for Tuberculosis Research, Faculty of Medicine and Health Sciences, Stellenbosch University, Cape Town, South Africa
| | - Sinegugu K. Mhlophe
- Division of Molecular Biology and Human Genetics, South African Medical Research Council Centre for Tuberculosis Research, Faculty of Medicine and Health Sciences, Stellenbosch University, Cape Town, South Africa
| | - Abisola Okunola
- Division of Molecular Biology and Human Genetics, South African Medical Research Council Centre for Tuberculosis Research, Faculty of Medicine and Health Sciences, Stellenbosch University, Cape Town, South Africa
| | - Andre G. Loxton
- Division of Molecular Biology and Human Genetics, South African Medical Research Council Centre for Tuberculosis Research, Faculty of Medicine and Health Sciences, Stellenbosch University, Cape Town, South Africa
| | - Robin M. Warren
- Division of Molecular Biology and Human Genetics, South African Medical Research Council Centre for Tuberculosis Research, Faculty of Medicine and Health Sciences, Stellenbosch University, Cape Town, South Africa
| | - Mark H. Moseley
- School of Biological Sciences, University of Aberdeen, Aberdeen, United Kingdom
| | - Michele A. Miller
- Division of Molecular Biology and Human Genetics, South African Medical Research Council Centre for Tuberculosis Research, Faculty of Medicine and Health Sciences, Stellenbosch University, Cape Town, South Africa
| | - Wynand J. Goosen
- Division of Molecular Biology and Human Genetics, South African Medical Research Council Centre for Tuberculosis Research, Faculty of Medicine and Health Sciences, Stellenbosch University, Cape Town, South Africa
| |
Collapse
|
16
|
Ahsan MU, Liu Q, Perdomo JE, Fang L, Wang K. A survey of algorithms for the detection of genomic structural variants from long-read sequencing data. Nat Methods 2023; 20:1143-1158. [PMID: 37386186 PMCID: PMC11208083 DOI: 10.1038/s41592-023-01932-w] [Citation(s) in RCA: 11] [Impact Index Per Article: 11.0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/16/2021] [Accepted: 05/31/2023] [Indexed: 07/01/2023]
Abstract
As long-read sequencing technologies are becoming increasingly popular, a number of methods have been developed for the discovery and analysis of structural variants (SVs) from long reads. Long reads enable detection of SVs that could not be previously detected from short-read sequencing, but computational methods must adapt to the unique challenges and opportunities presented by long-read sequencing. Here, we summarize over 50 long-read-based methods for SV detection, genotyping and visualization, and discuss how new telomere-to-telomere genome assemblies and pangenome efforts can improve the accuracy and drive the development of SV callers in the future.
Collapse
Affiliation(s)
- Mian Umair Ahsan
- Raymond G. Perelman Center for Cellular and Molecular Therapeutics, Children's Hospital of Philadelphia, Philadelphia, PA, USA
| | - Qian Liu
- Raymond G. Perelman Center for Cellular and Molecular Therapeutics, Children's Hospital of Philadelphia, Philadelphia, PA, USA
| | - Jonathan Elliot Perdomo
- Raymond G. Perelman Center for Cellular and Molecular Therapeutics, Children's Hospital of Philadelphia, Philadelphia, PA, USA
- School of Biomedical Engineering, Drexel University, Philadelphia, PA, USA
| | - Li Fang
- Raymond G. Perelman Center for Cellular and Molecular Therapeutics, Children's Hospital of Philadelphia, Philadelphia, PA, USA
- Department of Genetics and Biomedical Informatics, Zhongshan School of Medicine, Sun Yat-sen University, Guangzhou, China
| | - Kai Wang
- Raymond G. Perelman Center for Cellular and Molecular Therapeutics, Children's Hospital of Philadelphia, Philadelphia, PA, USA.
- Department of Pathology and Laboratory Medicine, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, USA.
| |
Collapse
|
17
|
Ren J, Gu B, Chaisson MJP. vamos: variable-number tandem repeats annotation using efficient motif sets. Genome Biol 2023; 24:175. [PMID: 37501141 PMCID: PMC10373352 DOI: 10.1186/s13059-023-03010-y] [Citation(s) in RCA: 3] [Impact Index Per Article: 3.0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/23/2022] [Accepted: 07/06/2023] [Indexed: 07/29/2023] Open
Abstract
Roughly 3% of the human genome is composed of variable-number tandem repeats (VNTRs): arrays of motifs at least six bases. These loci are highly polymorphic, yet current approaches that define and merge variants based on alignment breakpoints do not capture their full diversity. Here we present a method vamos: VNTR Annotation using efficient Motif Sets that instead annotates VNTR using repeat composition under different levels of motif diversity. Using vamos we estimate 7.4-16.7 alleles per locus when applied to 74 haplotype-resolved human assemblies, compared to breakpoint-based approaches that estimate 4.0-5.5 alleles per locus.
Collapse
Affiliation(s)
- Jingwen Ren
- Department of Quantitative and Computational Biology, University of Southern California, Los Angeles, US
| | - Bida Gu
- Department of Quantitative and Computational Biology, University of Southern California, Los Angeles, US
| | - Mark J. P. Chaisson
- Department of Quantitative and Computational Biology, University of Southern California, Los Angeles, US
| |
Collapse
|
18
|
Drazdauskienė U, Kapustina Ž, Medžiūnė J, Dubovskaja V, Sabaliauskaitė R, Jarmalaitė S, Lubys A. Fusion sequencing via terminator-assisted synthesis (FTAS-seq) identifies TMPRSS2 fusion partners in prostate cancer. Mol Oncol 2023; 17:993-1006. [PMID: 37300660 DOI: 10.1002/1878-0261.13428] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/28/2022] [Revised: 02/26/2023] [Accepted: 04/03/2023] [Indexed: 06/12/2023] Open
Abstract
Genetic rearrangements that fuse an androgen-regulated promoter area with a protein-coding portion of an originally androgen-unaffected gene are frequent in prostate cancer, with the fusion between transmembrane serine protease 2 (TMPRSS2) and ETS transcription factor ERG (ERG) (TMPRSS2-ERG fusion) being the most prevalent. Conventional hybridization- or amplification-based methods can test for the presence of expected gene fusions, but the exploratory analysis of currently unknown fusion partners is often cost-prohibitive. Here, we developed an innovative next-generation sequencing (NGS)-based approach for gene fusion analysis termed fusion sequencing via terminator-assisted synthesis (FTAS-seq). FTAS-seq can be used to enrich the gene of interest while simultaneously profiling the whole spectrum of its 3'-terminal fusion partners. Using this novel semi-targeted RNA-sequencing technique, we were able to identify 11 previously uncharacterized TMPRSS2 fusion partners and capture a range of TMPRSS2-ERG isoforms. We tested the performance of FTAS-seq with well-characterized prostate cancer cell lines and utilized the technique for the analysis of patient RNA samples. FTAS-seq chemistry combined with appropriate primer panels holds great potential as a tool for biomarker discovery that can support the development of personalized cancer therapies.
Collapse
Affiliation(s)
| | | | | | | | | | - Sonata Jarmalaitė
- National Cancer Institute, Vilnius, Lithuania
- Institute of Biosciences, Life Sciences Center, Vilnius University, Lithuania
| | - Arvydas Lubys
- Thermo Fisher Scientific Baltics, Vilnius, Lithuania
| |
Collapse
|