1
Schrauwen I, Rajendran Y, Acharya A, Öhman S, Arvio M, Paetau R, Siren A, Avela K, Granvik J, Leal SM, Määttä T, Kokkonen H, Järvelä I. Optical genome mapping unveils hidden structural variants in neurodevelopmental disorders. Sci Rep 2024; 14:11239. [PMID: 38755281 PMCID: PMC11099145 DOI: 10.1038/s41598-024-62009-y] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/04/2024] [Accepted: 05/13/2024] [Indexed: 05/18/2024] Open
Abstract
While short-read sequencing currently dominates genetic research and diagnostics, it frequently falls short of capturing certain structural variants (SVs), which are often implicated in the etiology of neurodevelopmental disorders (NDDs). Optical genome mapping (OGM) is an innovative technique capable of capturing SVs that are undetectable or challenging-to-detect via short-read methods. This study aimed to investigate NDDs using OGM, specifically focusing on cases that remained unsolved after standard exome sequencing. OGM was performed in 47 families using ultra-high molecular weight DNA. Single-molecule maps were assembled de novo, followed by SV and copy number variant calling. We identified 7 variants of interest, of which 5 (10.6%) were classified as likely pathogenic or pathogenic, located in BCL11A, OPHN1, PHF8, SON, and NFIA. We also identified an inversion disrupting NAALADL2, a gene which previously was found to harbor complex rearrangements in two NDD cases. Variants in known NDD genes or candidate variants of interest missed by exome sequencing mainly consisted of larger insertions (> 1kbp), inversions, and deletions/duplications of a low number of exons (1-4 exons). In conclusion, in addition to improving molecular diagnosis in NDDs, this technique may also reveal novel NDD genes which may harbor complex SVs often missed by standard sequencing techniques.
Affiliation(s)
- Isabelle Schrauwen
- Department of Neurology, Center for Statistical Genetics, Gertrude H. Sergievsky Center, Columbia University Medical Center, Columbia University, 630 W 168th St, New York, NY, 10032, USA.
- Yasmin Rajendran
- Department of Neurology, Center for Statistical Genetics, Gertrude H. Sergievsky Center, Columbia University Medical Center, Columbia University, 630 W 168th St, New York, NY, 10032, USA
- Anushree Acharya
- Department of Neurology, Center for Statistical Genetics, Gertrude H. Sergievsky Center, Columbia University Medical Center, Columbia University, 630 W 168th St, New York, NY, 10032, USA
- Maria Arvio
- Päijät-Häme Wellbeing Services, Neurology, Lahti, Finland
- Ritva Paetau
- Department of Child Neurology, University of Helsinki and Helsinki University Hospital, Helsinki, Finland
- Auli Siren
- Kanta-Häme Central Hospital, Hämeenlinna, Finland
- Kristiina Avela
- Institute of Biomedicine, University of Turku, Turku, Finland
- Johanna Granvik
- The Wellbeing Services County of Ostrobothnia, Kokkola, Finland
- Suzanne M Leal
- Department of Neurology, Center for Statistical Genetics, Gertrude H. Sergievsky Center, Columbia University Medical Center, Columbia University, 630 W 168th St, New York, NY, 10032, USA
- Taub Institute for Alzheimer's Disease and the Aging Brain, Columbia University Medical Center, New York, NY, USA
- Tuomo Määttä
- The Wellbeing Services County of Kainuu, Kajaani, Finland
- Hannaleena Kokkonen
- Northern Finland Laboratory Centre NordLab and Medical Research Centre, Oulu University Hospital and University of Oulu, Oulu, Finland
- Irma Järvelä
- Department of Medical Genetics, University of Helsinki, Helsinki, Finland
2
Malamon JS, Farrell JJ, Xia LC, Dombroski BA, Das RG, Way J, Kuzma AB, Valladares O, Leung YY, Scanlon AJ, Lopez IAB, Brehony J, Worley KC, Zhang NR, Wang LS, Farrer LA, Schellenberg GD, Lee WP, Vardarajan BN. A comparative study of structural variant calling in WGS from Alzheimer's disease families. Life Sci Alliance 2024; 7:e202302181. [PMID: 38418088 PMCID: PMC10902710 DOI: 10.26508/lsa.202302181] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/24/2023] [Revised: 02/07/2024] [Accepted: 02/08/2024] [Indexed: 03/01/2024] Open
Abstract
Detecting structural variants (SVs) in whole-genome sequencing poses significant challenges. We present a protocol for variant calling, merging, genotyping, sensitivity analysis, and laboratory validation for generating a high-quality SV call set in whole-genome sequencing from the Alzheimer's Disease Sequencing Project comprising 578 individuals from 111 families. Employing two complementary pipelines, Scalpel and Parliament, for SV/indel calling, we assessed sensitivity through sample replicates (N = 9) with in silico variant spike-ins. We developed a novel metric, D-score, to evaluate caller specificity for deletions. The accuracy of deletions was evaluated by Sanger sequencing. We generated a high-quality call set of 152,301 deletions of diverse sizes. Sanger sequencing validated 114 of 146 detected deletions (78.1%). Scalpel excelled in accuracy for deletions ≤100 bp, whereas Parliament was optimal for deletions >900 bp. Overall, 83.0% and 72.5% of calls by Scalpel and Parliament were validated, respectively, including all 11 deletions called by both Parliament and Scalpel between 101 and 900 bp. Our flexible protocol successfully generated a high-quality deletion call set and a truth set of Sanger sequencing-validated deletions with precise breakpoints spanning 1-17,000 bp.
Affiliation(s)
- John S Malamon
- Department of Pathology and Laboratory Medicine, Penn Neurodegeneration Genomics Center, University of Pennsylvania Perelman School of Medicine, Philadelphia, PA, USA
- John J Farrell
- Biomedical Genetics Section, Department of Medicine, Boston University School of Medicine, Boston University, Boston, MA, USA
- Li Charlie Xia
- https://ror.org/03mtd9a03 Division of Oncology, Department of Medicine, Stanford University School of Medicine, Stanford, CA, USA
- Department of Statistics, The Wharton School, University of Pennsylvania, Philadelphia, PA, USA
- Beth A Dombroski
- Department of Pathology and Laboratory Medicine, Penn Neurodegeneration Genomics Center, University of Pennsylvania Perelman School of Medicine, Philadelphia, PA, USA
- Rueben G Das
- Department of Pathology and Laboratory Medicine, Penn Neurodegeneration Genomics Center, University of Pennsylvania Perelman School of Medicine, Philadelphia, PA, USA
- Jessica Way
- Broad Institute, Massachusetts Institute of Technology, Cambridge, MA, USA
- Amanda B Kuzma
- Department of Pathology and Laboratory Medicine, Penn Neurodegeneration Genomics Center, University of Pennsylvania Perelman School of Medicine, Philadelphia, PA, USA
- Otto Valladares
- Department of Pathology and Laboratory Medicine, Penn Neurodegeneration Genomics Center, University of Pennsylvania Perelman School of Medicine, Philadelphia, PA, USA
- Yuk Yee Leung
- Department of Pathology and Laboratory Medicine, Penn Neurodegeneration Genomics Center, University of Pennsylvania Perelman School of Medicine, Philadelphia, PA, USA
- Allison J Scanlon
- Department of Pathology and Laboratory Medicine, Penn Neurodegeneration Genomics Center, University of Pennsylvania Perelman School of Medicine, Philadelphia, PA, USA
- Irving Antonio Barrera Lopez
- Department of Pathology and Laboratory Medicine, Penn Neurodegeneration Genomics Center, University of Pennsylvania Perelman School of Medicine, Philadelphia, PA, USA
- Jack Brehony
- Department of Pathology and Laboratory Medicine, Penn Neurodegeneration Genomics Center, University of Pennsylvania Perelman School of Medicine, Philadelphia, PA, USA
- Kim C Worley
- https://ror.org/02pttbw34 Human Genome Sequencing Center, and Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX, USA
- Nancy R Zhang
- Department of Statistics, The Wharton School, University of Pennsylvania, Philadelphia, PA, USA
- Li-San Wang
- Department of Pathology and Laboratory Medicine, Penn Neurodegeneration Genomics Center, University of Pennsylvania Perelman School of Medicine, Philadelphia, PA, USA
- Lindsay A Farrer
- Biomedical Genetics Section, Department of Medicine, Boston University School of Medicine, Boston University, Boston, MA, USA
- Departments of Neurology and Ophthalmology, Boston University School of Medicine, Boston University, Boston, MA, USA
- Departments of Epidemiology and Biostatistics, Boston University School of Public Health, Boston, MA, USA
- Gerard D Schellenberg
- Department of Pathology and Laboratory Medicine, Penn Neurodegeneration Genomics Center, University of Pennsylvania Perelman School of Medicine, Philadelphia, PA, USA
- Wan-Ping Lee
- Department of Pathology and Laboratory Medicine, Penn Neurodegeneration Genomics Center, University of Pennsylvania Perelman School of Medicine, Philadelphia, PA, USA
- Badri N Vardarajan
- https://ror.org/01esghr10 Gertrude H. Sergievsky Center and Taub Institute of Aging Brain, Department of Neurology, Columbia University Medical Center, New York, NY, USA
3
Zhang Z, Jiang T, Li G, Cao S, Liu Y, Liu B, Wang Y. Kled: an ultra-fast and sensitive structural variant detection tool for long-read sequencing data. Brief Bioinform 2024; 25:bbae049. [PMID: 38385878 PMCID: PMC10883419 DOI: 10.1093/bib/bbae049] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/15/2023] [Revised: 01/12/2024] [Accepted: 01/26/2024] [Indexed: 02/23/2024] Open
Abstract
Structural Variants (SVs) are a crucial type of genetic variant that can significantly impact phenotypes. Therefore, the identification of SVs is an essential part of modern genomic analysis. In this article, we present kled, an ultra-fast and sensitive SV caller for long-read sequencing data given the specially designed approach with a novel signature-merging algorithm, custom refinement strategies and a high-performance program structure. The evaluation results demonstrate that kled can achieve optimal SV calling compared to several state-of-the-art methods on simulated and real long-read data for different platforms and sequencing depths. Furthermore, kled excels at rapid SV calling and can efficiently utilize multiple Central Processing Unit (CPU) cores while maintaining low memory usage. The source code for kled can be obtained from https://github.com/CoREse/kled.
Affiliation(s)
- Zhendong Zhang
- Center for Bioinformatics, Faculty of Computing, Harbin Institute of Technology, Harbin, Heilongjiang 150001, China
- Key Laboratory of Biological Bigdata, Ministry of Education, Harbin Institute of Technology, Harbin, Heilongjiang 150001, China
- Tao Jiang
- Center for Bioinformatics, Faculty of Computing, Harbin Institute of Technology, Harbin, Heilongjiang 150001, China
- Zhengzhou Research Institute, Harbin Institute of Technology, Zhengzhou, Henan, 450000, China
- Key Laboratory of Biological Bigdata, Ministry of Education, Harbin Institute of Technology, Harbin, Heilongjiang 150001, China
- Gaoyang Li
- Center for Bioinformatics, Faculty of Computing, Harbin Institute of Technology, Harbin, Heilongjiang 150001, China
- Key Laboratory of Biological Bigdata, Ministry of Education, Harbin Institute of Technology, Harbin, Heilongjiang 150001, China
- Shuqi Cao
- Center for Bioinformatics, Faculty of Computing, Harbin Institute of Technology, Harbin, Heilongjiang 150001, China
- Key Laboratory of Biological Bigdata, Ministry of Education, Harbin Institute of Technology, Harbin, Heilongjiang 150001, China
- Yadong Liu
- Center for Bioinformatics, Faculty of Computing, Harbin Institute of Technology, Harbin, Heilongjiang 150001, China
- Zhengzhou Research Institute, Harbin Institute of Technology, Zhengzhou, Henan, 450000, China
- Key Laboratory of Biological Bigdata, Ministry of Education, Harbin Institute of Technology, Harbin, Heilongjiang 150001, China
- Bo Liu
- Center for Bioinformatics, Faculty of Computing, Harbin Institute of Technology, Harbin, Heilongjiang 150001, China
- Zhengzhou Research Institute, Harbin Institute of Technology, Zhengzhou, Henan, 450000, China
- Key Laboratory of Biological Bigdata, Ministry of Education, Harbin Institute of Technology, Harbin, Heilongjiang 150001, China
- Yadong Wang
- Center for Bioinformatics, Faculty of Computing, Harbin Institute of Technology, Harbin, Heilongjiang 150001, China
- Zhengzhou Research Institute, Harbin Institute of Technology, Zhengzhou, Henan, 450000, China
- Key Laboratory of Biological Bigdata, Ministry of Education, Harbin Institute of Technology, Harbin, Heilongjiang 150001, China
4
Behera S, Catreux S, Rossi M, Truong S, Huang Z, Ruehle M, Visvanath A, Parnaby G, Roddey C, Onuchic V, Cameron DL, English A, Mehtalia S, Han J, Mehio R, Sedlazeck FJ. Comprehensive and accurate genome analysis at scale using DRAGEN accelerated algorithms. bioRxiv 2024:2024.01.02.573821. [PMID: 38260545 PMCID: PMC10802302 DOI: 10.1101/2024.01.02.573821] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 01/24/2024]
Abstract
Research and medical genomics require comprehensive and scalable solutions to drive the discovery of novel disease targets, evolutionary drivers, and genetic markers with clinical significance. This necessitates a framework to identify all types of variants independent of their size (e.g., SNV/SV) or location (e.g., repeats). Here we present DRAGEN that utilizes novel methods based on multigenomes, hardware acceleration, and machine learning based variant detection to provide novel insights into individual genomes with ~30min computation time (from raw reads to variant detection). DRAGEN outperforms all other state-of-the-art methods in speed and accuracy across all variant types (SNV, indel, STR, SV, CNV) and further incorporates specialized methods to obtain key insights in medically relevant genes (e.g., HLA, SMN, GBA). We showcase DRAGEN across 3,202 genomes and demonstrate its scalability, accuracy, and innovations to further advance the integration of comprehensive genomics for research and medical applications.
Affiliation(s)
- Sairam Behera
- Human Genome Sequencing Center, Baylor College of Medicine, Houston, TX, USA
- Adam English
- Human Genome Sequencing Center, Baylor College of Medicine, Houston, TX, USA
- Fritz J Sedlazeck
- Human Genome Sequencing Center, Baylor College of Medicine, Houston, TX, USA
- Department of Molecular and Human Genetics, Baylor College of Medicine, TX, USA
- Department of Computer Science, Rice University, TX, USA
5
Salava H, Deák T, Czepe C, Maghuly F. Sample and Library Preparation for PacBio Long-Read Sequencing in Grapevine. Methods Mol Biol 2024; 2787:183-197. [PMID: 38656490 DOI: 10.1007/978-1-0716-3778-4_12] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 04/26/2024]
Abstract
PacBio long-read sequencing is a third-generation technology that generates long reads up to 20 kilobases (kb), unlike short-read sequencing instruments that produce up to 600 bases. Long-read sequencing is particularly advantageous in higher organisms, such as humans and plants, where repetitive regions in the genome are more abundant. The PacBio long-read sequencing uses a single molecule, real-time approach where the SMRT cells contain several zero-mode waveguides (ZMWs). Each ZMW contains a single DNA molecule bound by a DNA polymerase. All ZMWs are flushed with deoxy nucleotides with a fluorophore specific to each nucleotide. As the sequencing proceeds, the detector detects the wavelength of the fluorescence and the nucleotides are read in real-time. This chapter describes the sample and library preparation for PacBio long-read sequencing for grapevine.
Affiliation(s)
- Hymavathi Salava
- Plant Functional Genomics Lab, Institute of Molecular Biotechnology, Department of Biotechnology, University of Natural Resources and Life Sciences (BOKU), Vienna, Austria
- Tamás Deák
- Institute of Viticulture and Oenology, Hungarian University of Agriculture and Life Sciences (MATE), Budapest, Hungary
- Carmen Czepe
- Next Generation Sequencing Unit, Vienna Biocenter Core Facilities (VBCF), Vienna, Austria
- Fatemeh Maghuly
- Plant Functional Genomics Lab, Institute of Molecular Biotechnology, Department of Biotechnology, University of Natural Resources and Life Sciences (BOKU), Vienna, Austria
6
Hansen MH, Cédile O, Kjeldsen MLG, Thomassen M, Preiss B, von Neuhoff N, Abildgaard N, Nyvold CG. Toward Cytogenomics: Technical Assessment of Long-Read Nanopore Whole-Genome Sequencing for Detecting Large Chromosomal Alterations in Mantle Cell Lymphoma. J Mol Diagn 2023; 25:796-805. [PMID: 37683892 DOI: 10.1016/j.jmoldx.2023.08.004] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/23/2022] [Revised: 06/20/2023] [Accepted: 08/14/2023] [Indexed: 09/10/2023] Open
Abstract
The current advances and success of next-generation sequencing hold the potential for the transition of cancer cytogenetics toward comprehensive cytogenomics. However, the conventional use of short reads impedes the resolution of chromosomal aberrations. Thus, this study evaluated the detection and reproducibility of extensive copy number alterations and chromosomal translocations using long-read Oxford Nanopore Technologies whole-genome sequencing compared with short-read Illumina sequencing. Using the mantle cell lymphoma cell line Granta-519, almost 99% copy-number reproducibility at the 100-kilobase resolution between replicates was demonstrated, with 98% concordance to Illumina. Collectively, the performance of copy number calling from 1.5 million to 7.5 million long reads was comparable to 1 billion Illumina-based reads (50× coverage). Expectedly, the long-read resolution of canonical translocation t(11;14)(q13;q32) was superior, with a sequence similarity of 89% to the already published CCND1/IGH junction (9× coverage), spanning up to 69 kilobases. The cytogenetic profile of Granta-519 was in general agreement with the literature and karyotype, although several differences remained unresolved. In conclusion, contemporary long-read sequencing is primed for future cytogenomics or sequencing-guided cytogenetics. The combined strength of long- and short-read sequencing is apparent, where the high-precision junctional mapping complements and splits paired-end reads. The potential is emphasized by the flexible single-sample genomic data acquisition of Oxford Nanopore Technologies with the high resolution of allelic imbalances using Illumina short-read sequencing.
Affiliation(s)
- Marcus H Hansen
- Hematology-Pathology Research Laboratory, Research Unit of Hematology and Research Unit of Pathology, University of Southern Denmark and Odense University Hospital, Odense, Denmark; Department of Hematology, Odense University Hospital, Odense, Denmark.
- Oriane Cédile
- Hematology-Pathology Research Laboratory, Research Unit of Hematology and Research Unit of Pathology, University of Southern Denmark and Odense University Hospital, Odense, Denmark; Department of Hematology, Odense University Hospital, Odense, Denmark; OPEN, Odense Patient Data Explorative Network, Odense University Hospital, Odense, Denmark
- Marie L G Kjeldsen
- Hematology-Pathology Research Laboratory, Research Unit of Hematology and Research Unit of Pathology, University of Southern Denmark and Odense University Hospital, Odense, Denmark
- Mads Thomassen
- Department of Clinical Genetics, Odense University Hospital, Odense, Denmark
- Birgitte Preiss
- Hematology-Pathology Research Laboratory, Research Unit of Hematology and Research Unit of Pathology, University of Southern Denmark and Odense University Hospital, Odense, Denmark; Department of Pathology, Odense University Hospital, Odense, Denmark
- Nils von Neuhoff
- Department of Pediatric Hematology and Oncology, Essen University Hospital and University of Duisburg-Essen, Essen, Germany
- Niels Abildgaard
- Hematology-Pathology Research Laboratory, Research Unit of Hematology and Research Unit of Pathology, University of Southern Denmark and Odense University Hospital, Odense, Denmark; Department of Hematology, Odense University Hospital, Odense, Denmark
- Charlotte G Nyvold
- Hematology-Pathology Research Laboratory, Research Unit of Hematology and Research Unit of Pathology, University of Southern Denmark and Odense University Hospital, Odense, Denmark; Department of Hematology, Odense University Hospital, Odense, Denmark; OPEN, Odense Patient Data Explorative Network, Odense University Hospital, Odense, Denmark
7
Nasca A, Mencacci NE, Invernizzi F, Zech M, Keller Sarmiento IJ, Legati A, Frascarelli C, Bustos BI, Romito LM, Krainc D, Winkelmann J, Carecchio M, Nardocci N, Zorzi G, Prokisch H, Lubbe SJ, Garavaglia B, Ghezzi D. Variants in ATP5F1B are associated with dominantly inherited dystonia. Brain 2023; 146:2730-2738. [PMID: 36860166 PMCID: PMC10316767 DOI: 10.1093/brain/awad068] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/14/2022] [Revised: 12/31/2022] [Accepted: 02/05/2023] [Indexed: 03/03/2023] Open
Abstract
ATP5F1B is a subunit of the mitochondrial ATP synthase or complex V of the mitochondrial respiratory chain. Pathogenic variants in nuclear genes encoding assembly factors or structural subunits are associated with complex V deficiency, typically characterized by autosomal recessive inheritance and multisystem phenotypes. Movement disorders have been described in a subset of cases carrying autosomal dominant variants in structural subunits genes ATP5F1A and ATP5MC3. Here, we report the identification of two different ATP5F1B missense variants (c.1000A>C; p.Thr334Pro and c.1445T>C; p.Val482Ala) segregating with early-onset isolated dystonia in two families, both with autosomal dominant mode of inheritance and incomplete penetrance. Functional studies in mutant fibroblasts revealed no decrease of ATP5F1B protein amount but severe reduction of complex V activity and impaired mitochondrial membrane potential, suggesting a dominant-negative effect. In conclusion, our study describes a new candidate gene associated with isolated dystonia and confirms that heterozygous variants in genes encoding subunits of the mitochondrial ATP synthase may cause autosomal dominant isolated dystonia with incomplete penetrance, likely through a dominant-negative mechanism.
Affiliation(s)
- Alessia Nasca
- Unit of Medical Genetics and Neurogenetics, Fondazione IRCCS Istituto Neurologico Carlo Besta, 20126 Milan, Italy
- Niccolò E Mencacci
- Ken and Ruth Davee Department of Neurology and Simpson Querrey Center for Neurogenetics, Northwestern University, Feinberg School of Medicine, Chicago 60611, IL, USA
- Federica Invernizzi
- Unit of Medical Genetics and Neurogenetics, Fondazione IRCCS Istituto Neurologico Carlo Besta, 20126 Milan, Italy
- Michael Zech
- Institute of Human Genetics, School of Medicine, Technical University of Munich, 81675 Munich, Germany
- Institute of Neurogenomics, Helmholtz Zentrum München, 85764 Munich, Germany
- Ignacio J Keller Sarmiento
- Ken and Ruth Davee Department of Neurology and Simpson Querrey Center for Neurogenetics, Northwestern University, Feinberg School of Medicine, Chicago 60611, IL, USA
- Andrea Legati
- Unit of Medical Genetics and Neurogenetics, Fondazione IRCCS Istituto Neurologico Carlo Besta, 20126 Milan, Italy
- Chiara Frascarelli
- Unit of Medical Genetics and Neurogenetics, Fondazione IRCCS Istituto Neurologico Carlo Besta, 20126 Milan, Italy
- Bernabe I Bustos
- Ken and Ruth Davee Department of Neurology and Simpson Querrey Center for Neurogenetics, Northwestern University, Feinberg School of Medicine, Chicago 60611, IL, USA
- Luigi M Romito
- Parkinson and Movement Disorders Unit, Fondazione IRCCS Istituto Neurologico Carlo Besta, 20133 Milan, Italy
- Dimitri Krainc
- Ken and Ruth Davee Department of Neurology and Simpson Querrey Center for Neurogenetics, Northwestern University, Feinberg School of Medicine, Chicago 60611, IL, USA
- Juliane Winkelmann
- Institute of Human Genetics, School of Medicine, Technical University of Munich, 81675 Munich, Germany
- Institute of Neurogenomics, Helmholtz Zentrum München, 85764 Munich, Germany
- Lehrstuhl für Neurogenetik, Technische Universität München, 81675 Munich, Germany
- Munich Cluster for Systems Neurology, SyNergy, 81377 Munich, Germany
- Miryam Carecchio
- Unit of Medical Genetics and Neurogenetics, Fondazione IRCCS Istituto Neurologico Carlo Besta, 20126 Milan, Italy
- Department Neuroscience, University of Padua, 35128 Padua, Italy
- Department of Pediatric Neuroscience, Fondazione IRCCS Istituto Neurologico Carlo Besta, 20133 Milan, Italy
- Nardo Nardocci
- Department of Pediatric Neuroscience, Fondazione IRCCS Istituto Neurologico Carlo Besta, 20133 Milan, Italy
- Giovanna Zorzi
- Department of Pediatric Neuroscience, Fondazione IRCCS Istituto Neurologico Carlo Besta, 20133 Milan, Italy
- Holger Prokisch
- Institute of Human Genetics, School of Medicine, Technical University of Munich, 81675 Munich, Germany
- Institute of Neurogenomics, Helmholtz Zentrum München, 85764 Munich, Germany
- Steven J Lubbe
- Ken and Ruth Davee Department of Neurology and Simpson Querrey Center for Neurogenetics, Northwestern University, Feinberg School of Medicine, Chicago 60611, IL, USA
- Barbara Garavaglia
- Unit of Medical Genetics and Neurogenetics, Fondazione IRCCS Istituto Neurologico Carlo Besta, 20126 Milan, Italy
- Daniele Ghezzi
- Unit of Medical Genetics and Neurogenetics, Fondazione IRCCS Istituto Neurologico Carlo Besta, 20126 Milan, Italy
- Department of Pathophysiology and Transplantation (DEPT), University of Milan, 20122 Milan, Italy
8
Olson ND, Wagner J, Dwarshuis N, Miga KH, Sedlazeck FJ, Salit M, Zook JM. Variant calling and benchmarking in an era of complete human genome sequences. Nat Rev Genet 2023:10.1038/s41576-023-00590-0. [PMID: 37059810 DOI: 10.1038/s41576-023-00590-0] [Citation(s) in RCA: 24] [Impact Index Per Article: 24.0] [Accepted: 02/22/2023] [Indexed: 04/16/2023]
Abstract
Genetic variant calling from DNA sequencing has enabled understanding of germline variation in hundreds of thousands of humans. Sequencing technologies and variant-calling methods have advanced rapidly, routinely providing reliable variant calls in most of the human genome. We describe how advances in long reads, deep learning, de novo assembly and pangenomes have expanded access to variant calls in increasingly challenging, repetitive genomic regions, including medically relevant regions, and how new benchmark sets and benchmarking methods illuminate their strengths and limitations. Finally, we explore the possible future of more complete characterization of human genome variation in light of the recent completion of a telomere-to-telomere human genome reference assembly and human pangenomes, and we consider the innovations needed to benchmark their newly accessible repetitive regions and complex variants.
Affiliation(s)
- Nathan D Olson
- Material Measurement Laboratory, National Institute of Standards and Technology, Gaithersburg, MD, USA
- Justin Wagner
- Material Measurement Laboratory, National Institute of Standards and Technology, Gaithersburg, MD, USA
- Nathan Dwarshuis
- Material Measurement Laboratory, National Institute of Standards and Technology, Gaithersburg, MD, USA
- Karen H Miga
- UC Santa Cruz Genomics Institute, University of California Santa Cruz, Santa Cruz, CA, USA
- Fritz J Sedlazeck
- Baylor College of Medicine, Human Genome Sequencing Center, Houston, TX, USA
- Justin M Zook
- Material Measurement Laboratory, National Institute of Standards and Technology, Gaithersburg, MD, USA.
9
Lee H, Kim J, Lee J. Benchmarking datasets for assembly-based variant calling using high-fidelity long reads. BMC Genomics 2023; 24:148. [PMID: 36973656 PMCID: PMC10045170 DOI: 10.1186/s12864-023-09255-y] [Citation(s) in RCA: 4] [Impact Index Per Article: 4.0] [Received: 12/05/2022] [Accepted: 03/17/2023] [Indexed: 03/29/2023] Open
Abstract
BACKGROUND: Recent advances in long-read sequencing technologies have enabled accurate identification of all genetic variants in individuals or cells; this procedure is known as variant calling. However, benchmarking studies on variant calling using different long-read sequencing technologies are still lacking.
RESULTS: We used two Caenorhabditis elegans strains to measure several variant calling metrics. These two strains shared true-positive genetic variants that were introduced during strain generation. In addition, both strains contained common and distinguishable variants induced by DNA damage, possibly leading to false-positive estimation. We obtained accurate and noisy long reads from both strains using high-fidelity (HiFi) and continuous long-read (CLR) sequencing platforms, and compared the variant calling performance of the two platforms. HiFi identified a 1.65-fold higher number of true-positive variants on average, with 60% fewer false-positive variants, than CLR did. We also compared read-based and assembly-based variant calling methods in combination with subsampling of various sequencing depths and demonstrated that variant calling after genome assembly was particularly effective for detection of large insertions, even with 10× sequencing depth of accurate long-read sequencing data.
CONCLUSIONS: By directly comparing the two long-read sequencing technologies, we demonstrated that variant calling after genome assembly with 10× or more depth of accurate long-read sequencing data allowed reliable detection of true-positive variants. Considering the high cost of HiFi sequencing, we herein propose appropriate methodologies for performing cost-effective and high-quality variant calling: 10× assembly-based variant calling. The results of the present study may facilitate the development of methods for identifying all genetic variants at the population level.
Affiliation(s)
- Hyunji Lee
- Institute of Molecular Biology and Genetics, Seoul National University, Seoul, 08826, Korea
- Department of Biological Sciences, Seoul National University, Seoul, 08826, Korea
- Jun Kim
- Department of Biological Sciences, Seoul National University, Seoul, 08826, Korea.
- Research Institute of Basic Sciences, Seoul National University, Seoul, 08826, Korea.
- Department of Convergent Bioscience and Informatics, College of Bioscience and Biotechnology, Chungnam National University, Daejeon, 34134, Korea.
| | - Junho Lee
- Institute of Molecular Biology and Genetics, Seoul National University, Seoul, 08826, Korea.
- Department of Biological Sciences, Seoul National University, Seoul, 08826, Korea.
- Research Institute of Basic Sciences, Seoul National University, Seoul, 08826, Korea.
|
10
|
Weisweiler M, Stich B. Benchmarking of structural variant detection in the tetraploid potato genome using linked-read sequencing. Genomics 2023; 115:110568. [PMID: 36702293 DOI: 10.1016/j.ygeno.2023.110568] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/28/2022] [Revised: 01/12/2023] [Accepted: 01/18/2023] [Indexed: 01/25/2023]
Abstract
It has recently been shown that structural variants (SV) can have a higher impact on gene expression variation than single nucleotide variants (SNV) in different plant species. Additionally, SV have been associated with phenotypic variation in several crops. However, compared to the established SV detection based on short-read sequencing, fewer approaches have been described for linked-read based SV calling. We therefore evaluated the performance of six linked-read SV callers compared to an established short-read SV caller based on simulated linked-reads in tetraploid potato. The objectives of our study were to i) compare the performance of SV callers based on linked-read sequencing to short-read sequencing, ii) examine the influence of SV type, SV length, haplotype incidence (HI), as well as sequencing coverage on the SV calling performance in the tetraploid potato genome, and iii) evaluate the accuracy of detecting insertions by linked-read compared to short-read sequencing. We observed high breakpoint resolution (BPR) when detecting short SV and slightly lower BPR for large SV. Our observations highlighted the importance of short-read signals provided by Manta and LinkedSV to detect short SV. Manta and NAIBR performed well for detecting larger deletions, inversions, and duplications. The detection of large SV was only weakly influenced by the HI. Furthermore, we illustrated that large insertions can be assembled by Novel-X. Our results suggest using the short-read and linked-read SV callers Manta, NAIBR, LinkedSV, and Novel-X based on at least 90x linked-read sequencing coverage to ensure the detection of a broad range of SV in the tetraploid potato genome.
Affiliation(s)
- Marius Weisweiler
- Institute for Quantitative Genetics and Genomics of Plants, Universitätsstraße 1, 40225 Düsseldorf, Germany
| | - Benjamin Stich
- Institute for Quantitative Genetics and Genomics of Plants, Universitätsstraße 1, 40225 Düsseldorf, Germany; Cluster of Excellence on Plant Sciences, From Complex Traits towards Synthetic Modules, Universitätsstraße 1, 40225 Düsseldorf, Germany; Max Planck Institute for Plant Breeding Research, Carl-von-Linne-Weg 10, 50829 Köln, Germany.
|
11
|
Zhang J, Nie C, Li X, Zhao X, Jia Y, Han J, Chen Y, Wang L, Lv X, Yang W, Li K, Zhang J, Ning Z, Bao H, Zhao C, Li J, Qu L. Comprehensive analysis of structural variants in chickens using PacBio sequencing. Front Genet 2022; 13:971588. [PMID: 36338955 PMCID: PMC9632285 DOI: 10.3389/fgene.2022.971588] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 06/17/2022] [Accepted: 09/08/2022] [Indexed: 11/13/2022] Open
Abstract
Structural variants (SVs) are one of the main sources of genetic variation and have a greater impact on phenotype evolution, disease susceptibility, and environmental adaptation than single nucleotide polymorphisms (SNPs). However, SVs remain challenging to type accurately, and each detection method has its own limitations. Here, we explored SVs from 10 different chickens using PacBio technology and detected 49,501 high-confidence SVs. The results showed that PacBio long reads detected more SVs than Illumina short reads, owing in part to SV sites on chromosomes that are related to chicken growth and development. During chicken domestication, some SVs beneficial to the breed or without any effect on the genomic function of the breed were retained, whereas deleterious SVs were generally eliminated. This study could facilitate the analysis of the genetic characteristics of different chickens and provide a better understanding of their phenotypic characteristics at the SV level, based on the long-read sequencing method. This study enriches our knowledge of SVs in chickens and improves our understanding of chicken genomic diversity.
Affiliation(s)
- Jinxin Zhang
- Department of Animal Genetics and Breeding, National Engineering Laboratory for Animal Breeding, College of Animal Science and Technology, China Agricultural University, Beijing, China
| | - Changsheng Nie
- Department of Animal Genetics and Breeding, National Engineering Laboratory for Animal Breeding, College of Animal Science and Technology, China Agricultural University, Beijing, China
| | - Xinghua Li
- Department of Animal Genetics and Breeding, National Engineering Laboratory for Animal Breeding, College of Animal Science and Technology, China Agricultural University, Beijing, China
| | - Xiurong Zhao
- Department of Animal Genetics and Breeding, National Engineering Laboratory for Animal Breeding, College of Animal Science and Technology, China Agricultural University, Beijing, China
| | - Yaxiong Jia
- Institute of Animal Sciences, Chinese Academy of Agricultural Sciences, Beijing, China
| | - Jianlin Han
- Institute of Animal Sciences, Chinese Academy of Agricultural Sciences, Beijing, China
| | - Yu Chen
- Beijing Municipal General Station of Animal Science, Beijing, China
| | - Liang Wang
- Beijing Municipal General Station of Animal Science, Beijing, China
| | - Xueze Lv
- Beijing Municipal General Station of Animal Science, Beijing, China
| | - Weifang Yang
- Beijing Municipal General Station of Animal Science, Beijing, China
| | - Kaiyang Li
- Beijing Municipal General Station of Animal Science, Beijing, China
| | - Jianwei Zhang
- Beijing Municipal General Station of Animal Science, Beijing, China
| | - Zhonghua Ning
- Department of Animal Genetics and Breeding, National Engineering Laboratory for Animal Breeding, College of Animal Science and Technology, China Agricultural University, Beijing, China
| | - Haigang Bao
- Department of Animal Genetics and Breeding, National Engineering Laboratory for Animal Breeding, College of Animal Science and Technology, China Agricultural University, Beijing, China
| | - Chunjiang Zhao
- Department of Animal Genetics and Breeding, National Engineering Laboratory for Animal Breeding, College of Animal Science and Technology, China Agricultural University, Beijing, China
| | - Junying Li
- Department of Animal Genetics and Breeding, National Engineering Laboratory for Animal Breeding, College of Animal Science and Technology, China Agricultural University, Beijing, China
| | - Lujiang Qu
- Department of Animal Genetics and Breeding, National Engineering Laboratory for Animal Breeding, College of Animal Science and Technology, China Agricultural University, Beijing, China
- *Correspondence: Lujiang Qu
|
12
|
Walker K, Kalra D, Lowdon R, Chen G, Molik D, Soto DC, Dabbaghie F, Khleifat AA, Mahmoud M, Paulin LF, Raza MS, Pfeifer SP, Agustinho DP, Aliyev E, Avdeyev P, Barrozo ER, Behera S, Billingsley K, Chong LC, Choubey D, De Coster W, Fu Y, Gener AR, Hefferon T, Henke DM, Höps W, Illarionova A, Jochum MD, Jose M, Kesharwani RK, Kolora SRR, Kubica J, Lakra P, Lattimer D, Liew CS, Lo BW, Lo C, Lötter A, Majidian S, Mendem SK, Mondal R, Ohmiya H, Parvin N, Peralta C, Poon CL, Prabhakaran R, Saitou M, Sammi A, Sanio P, Sapoval N, Syed N, Treangen T, Wang G, Xu T, Yang J, Zhang S, Zhou W, Sedlazeck FJ, Busby B. The third international hackathon for applying insights into large-scale genomic composition to use cases in a wide range of organisms. F1000Res 2022; 11:530. [PMID: 36262335 PMCID: PMC9557141 DOI: 10.12688/f1000research.110194.1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Accepted: 05/04/2022] [Indexed: 01/25/2023] Open
Abstract
In October 2021, 59 scientists from 14 countries and 13 U.S. states collaborated virtually in the Third Annual Baylor College of Medicine & DNANexus Structural Variation hackathon. The goal of the hackathon was to advance research on structural variants (SVs) by prototyping and iterating on open-source software. This led to nine hackathon projects focused on diverse genomics research interests, including various SV discovery and genotyping methods, SV sequence reconstruction, and clinically relevant structural variation, including SARS-CoV-2 variants. Repositories for the projects that participated in the hackathon are available at https://github.com/collaborativebioinformatics.
Affiliation(s)
- Kimberly Walker
- Human Genome Sequencing Center, Baylor College of Medicine, Houston, TX, 77030, USA
| | - Divya Kalra
- Human Genome Sequencing Center, Baylor College of Medicine, Houston, TX, 77030, USA
| | | | - Guangyi Chen
- Drug Bioinformatics, Helmholtz Institute for Pharmaceutical Research Saarland (HIPS), Saarbrücken, Germany; Center for Bioinformatics, Saarland University, Saarbrücken, Germany
| | - David Molik
- Tropical Crop and Commodity Protection Research Unit, Pacific Basin Agricultural Research Center, Hilo, HI, 96720, USA
| | - Daniela C. Soto
- Biochemistry & Molecular Medicine, Genome Center, MIND Institute, University of California, Davis, Davis, CA, 95616, USA
| | - Fawaz Dabbaghie
- Drug Bioinformatics, Helmholtz Institute for Pharmaceutical Research Saarland (HIPS), Saarbrücken, Germany; Institute for Medical Biometry and Bioinformatics, University Hospital Düsseldorf, Düsseldorf, Germany
| | - Ahmad Al Khleifat
- Institute of Psychiatry, Psychology & Neuroscience, King's College London, London, UK
| | - Medhat Mahmoud
- Human Genome Sequencing Center, Baylor College of Medicine, Houston, TX, 77030, USA
| | - Luis F Paulin
- Human Genome Sequencing Center, Baylor College of Medicine, Houston, TX, 77030, USA
| | - Muhammad Sohail Raza
- CAS Key Laboratory of Genomic and Precision Medicine, Beijing Institute of Genomics, Beijing, China
| | - Susanne P. Pfeifer
- Center for Evolution and Medicine, Arizona State University, Tempe, AZ, USA
| | - Daniel Paiva Agustinho
- Department of Molecular Microbiology, Washington University in St. Louis School of Medicine, St. Louis, MO, 63110, USA
| | - Elbay Aliyev
- Research Department, Sidra Medicine, Doha, Qatar
| | - Pavel Avdeyev
- Computational Biology Institute, The George Washington University, Washington, DC, 20052, USA
| | - Enrico R. Barrozo
- Department of Obstetrics & Gynecology, Baylor College of Medicine, Houston, TX, 77030, USA
| | - Sairam Behera
- Human Genome Sequencing Center, Baylor College of Medicine, Houston, TX, 77030, USA
| | - Kimberley Billingsley
- Molecular Genetics Section, Laboratory of Neurogenetics, National Institute on Aging, National Institutes of Health, Bethesda, MD, USA
| | - Li Chuin Chong
- Beykoz Institute of Life Sciences and Biotechnology, Bezmialem Vakif University, Beykoz, Istanbul, Turkey
| | - Deepak Choubey
- Department of Technology, Savitribai Phule Pune University, Pune, Maharashtra, India
| | - Wouter De Coster
- Applied and Translational Neurogenomics Group, VIB Center for Molecular Neurology, Antwerp, Belgium; Applied and Translational Neurogenomics Group, Department of Biomedical Sciences, University of Antwerp, Antwerp, Belgium
| | - Yilei Fu
- Department of Computer Science, Rice University, Houston, TX, USA
| | - Alejandro R. Gener
- Association of Public Health Labs, Centers for Disease Control and Prevention, Downey, CA, USA
| | - Timothy Hefferon
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD, 20892, USA
| | - David Morgan Henke
- Department of Molecular Virology and Microbiology, Baylor College of Medicine, Houston, TX, 77030, USA
| | - Wolfram Höps
- EMBL Heidelberg, Genome Biology Unit, Heidelberg, Germany
| | | | - Michael D. Jochum
- Department of Obstetrics & Gynecology, Baylor College of Medicine, Houston, TX, 77030, USA
| | - Maria Jose
- Centre for Bioinformatics, Pondicherry University, Pondicherry, India
| | - Rupesh K. Kesharwani
- Human Genome Sequencing Center, Baylor College of Medicine, Houston, TX, 77030, USA
| | | | | | - Priya Lakra
- Department of Zoology, University of Delhi, Delhi, India
| | - Damaris Lattimer
- University of Applied Sciences Upper Austria - FH Hagenberg, Hagenberg im Mühlkreis, Austria
| | - Chia-Sin Liew
- Center for Biotechnology, University of Nebraska-Lincoln, Lincoln, Nebraska, 68588, USA
| | - Bai-Wei Lo
- Department of Biology, University of Konstanz, Konstanz, Germany
| | - Chunhsuan Lo
- Human Genetics Laboratory, National Institute of Genetics, Japan, Mishima City, Japan
| | - Anneri Lötter
- Department of Biochemistry, University of Pretoria, Pretoria, South Africa
| | - Sina Majidian
- Department of Computational Biology, University of Lausanne, Lausanne, Switzerland
| | | | - Rajarshi Mondal
- Department of Biotechnology, The University of Burdwan, West Bengal, India
| | - Hiroko Ohmiya
- Genetic Reagent Development Unit, Medical & Biological Laboratories Co., Ltd., Tokyo, Japan
| | - Nasrin Parvin
- Department of Biotechnology, The University of Burdwan, West Bengal, India
| | | | | | | | - Marie Saitou
- Center of Integrative Genetics (CIGENE), Faculty of Biosciences, Norwegian University of Life Sciences, Ås, Norway
| | - Aditi Sammi
- School of Biochemical Engineering, Indian Institute of Technology (BHU), Varanasi, Uttar Pradesh, India
| | - Philippe Sanio
- University of Applied Sciences Upper Austria - FH Hagenberg, Hagenberg im Mühlkreis, Austria
| | - Nicolae Sapoval
- Department of Computer Science, Rice University, Houston, TX, USA
| | - Najeeb Syed
- Research Department, Sidra Medicine, Doha, Qatar
| | - Todd Treangen
- Department of Computer Science, Rice University, Houston, TX, USA
| | | | - Tiancheng Xu
- Department of Computer Science, Rice University, Houston, TX, USA
| | - Jianzhi Yang
- Department of Quantitative and Computational Biology, University of Southern California, Los Angeles, CA, USA
| | - Shangzhe Zhang
- School of Biology, University of St Andrews, St Andrews, UK
| | - Weiyu Zhou
- Department of Statistical Science, George Mason University, Fairfax, Virginia, USA
| | - Fritz J Sedlazeck
- Human Genome Sequencing Center, Baylor College of Medicine, Houston, TX, 77030, USA
|
13
|
Gao Y, Ma L, Liu GE. Initial Analysis of Structural Variation Detections in Cattle Using Long-Read Sequencing Methods. Genes (Basel) 2022; 13:genes13050828. [PMID: 35627213 PMCID: PMC9142105 DOI: 10.3390/genes13050828] [Citation(s) in RCA: 4] [Impact Index Per Article: 2.0] [Received: 04/15/2022] [Revised: 05/01/2022] [Accepted: 05/04/2022] [Indexed: 02/01/2023] Open
Abstract
Structural variations (SVs), a major source of genetic variation, are widely distributed in the genome. SVs involve longer genomic sequences and potentially have stronger effects than SNPs, but they are not well captured by short-read sequencing owing to their size and their association with repeats. Improved characterization of SVs can provide more advanced insight into complex traits. With the availability of long-read sequencing, it has become feasible to uncover the full range of SVs. Here, we sequenced one cattle individual using 10x Genomics (10×G) linked reads, Pacific Biosciences (PacBio) continuous long reads (CLR) and circular consensus sequencing (CCS), as well as Oxford Nanopore Technologies (ONT) PromethION. We evaluated the ability of these methods to detect SVs. We identified 21,164 SVs, which amount to 186 Mb covering 7.07% of the whole genome. More SVs were inferred from long reads than from short reads. PacBio CLR identified the most large SVs and covered the largest portion of the genome. SVs called with PacBio CCS and ONT data showed high uniformity. PacBio CCS calls overlapped most with the calls obtained from short-read data. Together, we found that long reads outperformed short reads in terms of SV detection.
Affiliation(s)
- Yahui Gao
- Animal Genomics and Improvement Laboratory, Beltsville Agricultural Research Center, Agricultural Research Service, U.S. Department of Agriculture, Beltsville, MD 20705, USA
- Department of Animal and Avian Sciences, University of Maryland, College Park, MD 20742, USA
| | - Li Ma
- Department of Animal and Avian Sciences, University of Maryland, College Park, MD 20742, USA
| | - George E. Liu
- Animal Genomics and Improvement Laboratory, Beltsville Agricultural Research Center, Agricultural Research Service, U.S. Department of Agriculture, Beltsville, MD 20705, USA
- Correspondence: Tel.: +1-301-504-9843
|
14
|
Yang J, Chaisson MJP. TT-Mars: structural variants assessment based on haplotype-resolved assemblies. Genome Biol 2022; 23:110. [PMID: 35524317 PMCID: PMC9077962 DOI: 10.1186/s13059-022-02666-2] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 09/29/2021] [Accepted: 03/30/2022] [Indexed: 01/30/2023] Open
Abstract
Variant benchmarking is often performed by comparing a test callset to a gold standard set of variants. In repetitive regions of the genome, it may be difficult to establish what is the truth for a call, for example, when different alignment scoring metrics provide equally supported but different variant calls on the same data. Here, we provide an alternative approach, TT-Mars, that takes advantage of the recent production of high-quality haplotype-resolved genome assemblies by providing false discovery rates for variant calls based on how well their call reflects the content of the assembly, rather than comparing calls themselves.
Affiliation(s)
- Jianzhi Yang
- Department of Quantitative and Computational Biology, University of Southern California, Los Angeles, CA, USA
| | - Mark J P Chaisson
- Department of Quantitative and Computational Biology, University of Southern California, Los Angeles, CA, USA.
|
15
|
Xu SY. Engineering Infrequent DNA Nicking Endonuclease by Fusion of a BamHI Cleavage-Deficient Mutant and a DNA Nicking Domain. Front Microbiol 2022; 12:787073. [PMID: 35178039 PMCID: PMC8845596 DOI: 10.3389/fmicb.2021.787073] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/30/2021] [Accepted: 12/17/2021] [Indexed: 11/13/2022] Open
Abstract
Strand-specific DNA nicking endonucleases (NEases) typically nick 3–7 bp sites. Our goal is to engineer an infrequent NEase with a >8 bp recognition sequence. A BamHI catalytic-deficient mutant D94N/E113K was constructed, purified, and shown to bind and protect the GGATCC site from BamHI restriction. The mutant was fused to a 76-amino acid (aa) DNA nicking domain of phage Gamma HNH (gHNH) NEase. The chimeric enzyme was purified, and it was shown to nick downstream of a composite site 5′ GGATCC-N(4-6)-AC↑CGR 3′ (R = A or G) or to nick both sides of the BamHI site at the composite site 5′ CCG↓GT-N5-GGATCC-N5-AC↑CGG 3′ (the down arrow ↓ indicates that the strand shown is nicked; the up arrow ↑ indicates that the bottom strand is nicked). Due to the attenuated activity of the small nicking domain, the fusion nickase is active in the presence of Mn2+ or Ni2+, and it has low activity in Mg2+ buffer. This work provided a proof-of-concept experiment in which a chimeric NEase could be engineered utilizing the binding specificity of a Type II restriction endonuclease (REase) in fusion with a nicking domain to generate an infrequent nickase, which bridges the gap between natural REases and homing endonucleases. The engineered chimeric NEase provided a framework for further optimization in molecular diagnostic applications.
|
16
|
Methods to Improve Molecular Diagnosis in Genomic Cold Cases in Pediatric Neurology. Genes (Basel) 2022; 13:genes13020333. [PMID: 35205378 PMCID: PMC8871714 DOI: 10.3390/genes13020333] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 01/03/2022] [Revised: 02/06/2022] [Accepted: 02/07/2022] [Indexed: 02/04/2023] Open
Abstract
During the last decade, genetic testing has emerged as an important etiological diagnostic tool for Mendelian diseases, including pediatric neurological conditions. A genetic diagnosis has a considerable impact on disease management and treatment; however, many cases remain undiagnosed after applying standard diagnostic sequencing techniques. This review discusses various methods to improve the molecular diagnostic rates in these genomic cold cases. We discuss extended analysis methods to consider, including non-Mendelian inheritance models, mosaicism, dual/multiple diagnoses, periodic re-analysis, artificial intelligence tools, and deep phenotyping, in addition to integrating various omics methods to improve variant prioritization. Last, novel genomic technologies, including long-read sequencing, artificial long-read sequencing, and optical genome mapping, are discussed. In conclusion, a more comprehensive molecular analysis and a timely re-analysis of unsolved cases are imperative to improve diagnostic rates. In addition, our current understanding of the human genome is still limited due to restrictions in technologies. Novel technologies are now available that improve upon some of these limitations and can capture all human genomic variation more accurately. Last, we recommend a more routine implementation of high molecular weight DNA extraction methods that are compatible with these novel genomic methods, so that samples can be used with them and benefit optimally from them.
|
17
|
Abstract
Summary StructuralVariantAnnotation is an R/Bioconductor package that provides a framework for decoupling downstream analysis of structural variant breakpoints from upstream variant calling methods. It standardizes the representational format from BEDPE, or any of the three different notations supported by VCF, into a breakpoint GRanges data structure suitable for use by the wider Bioconductor ecosystem. It handles both transitive breakpoints and duplication/insertion notational differences of identical variants, both of which are common scenarios that confound downstream analysis when comparing short- and long-read-based call sets. StructuralVariantAnnotation provides the caller-agnostic foundation needed for an R/Bioconductor ecosystem of structural variant annotation, classification and interpretation tools able to handle both simple and complex genomic rearrangements. Availability and implementation StructuralVariantAnnotation is implemented in R and available for download as the Bioconductor StructuralVariantAnnotation package. Details can be found at https://www.bioconductor.org/packages/release/bioc/html/StructuralVariantAnnotation.html. It has been released under a GPL license.
Affiliation(s)
| | - Ruining Dong
- The Walter and Eliza Hall Institute of Medical Research, Parkville, VIC 3052, Australia
- Department of Medical Biology, University of Melbourne, Parkville, VIC 3052, Australia
|
18
|
Luo J, Ding H, Shen J, Zhai H, Wu Z, Yan C, Luo H. BreakNet: detecting deletions using long reads and a deep learning approach. BMC Bioinformatics 2021; 22:577. [PMID: 34856923 PMCID: PMC8641175 DOI: 10.1186/s12859-021-04499-5] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 09/14/2021] [Accepted: 11/29/2021] [Indexed: 12/29/2022] Open
Abstract
Background Structural variations (SVs) occupy a prominent position in human genetic diversity, and deletions form an important type of SV that has been suggested to be associated with genetic diseases. Although various deletion calling methods based on long reads have been proposed, a new approach is still needed to mine features in long-read alignment information. Recently, deep learning has attracted much attention in genome analysis, and it is a promising technique for calling SVs. Results In this paper, we propose BreakNet, a deep learning method that detects deletions by using long reads. BreakNet first extracts feature matrices from long-read alignments. Second, it uses a time-distributed convolutional neural network (CNN) to integrate and map the feature matrices to feature vectors. Third, BreakNet employs a bidirectional long short-term memory (BLSTM) model to analyse the produced set of continuous feature vectors in both the forward and backward directions. Finally, a classification module determines whether a region refers to a deletion. On real long-read sequencing datasets, we demonstrate that BreakNet outperforms Sniffles, SVIM and cuteSV in terms of their F1 scores. The source code for the proposed method is available from GitHub at https://github.com/luojunwei/BreakNet. Conclusions Our work shows that deep learning can be combined with long reads to call deletions more effectively than existing methods. Supplementary Information The online version contains supplementary material available at 10.1186/s12859-021-04499-5.
Affiliation(s)
- Junwei Luo
- College of Computer Science and Technology, Henan Polytechnic University, Jiaozuo, 454003, China
| | - Hongyu Ding
- College of Computer Science and Technology, Henan Polytechnic University, Jiaozuo, 454003, China
| | - Jiquan Shen
- College of Computer Science and Technology, Henan Polytechnic University, Jiaozuo, 454003, China.
| | - Haixia Zhai
- College of Computer Science and Technology, Henan Polytechnic University, Jiaozuo, 454003, China
| | - Zhengjiang Wu
- College of Computer Science and Technology, Henan Polytechnic University, Jiaozuo, 454003, China
| | - Chaokun Yan
- School of Computer Science and Information Engineering, Henan University, Kaifeng, 475001, China
| | - Huimin Luo
- School of Computer Science and Information Engineering, Henan University, Kaifeng, 475001, China
|
19
|
Comprehensive characterization of copy number variation (CNV) called from array, long- and short-read data. BMC Genomics 2021; 22:826. [PMID: 34789167 PMCID: PMC8596897 DOI: 10.1186/s12864-021-08082-3] [Citation(s) in RCA: 6] [Impact Index Per Article: 2.0] [Received: 06/09/2021] [Accepted: 10/13/2021] [Indexed: 11/10/2022] Open
Abstract
BACKGROUND SNP arrays and short- and long-read genome sequencing are genome-wide high-throughput technologies that may be used to assay copy number variants (CNVs) in a personal genome. Each of these technologies comes with its own limitations and biases, many of which are well known, but not all of them are thoroughly quantified. RESULTS We assembled an ensemble of public datasets of published CNV calls and raw data for the well-studied Genome in a Bottle individual NA12878. This assembly represents a variety of methods and pipelines used for CNV calling from array, short- and long-read technologies. We then performed cross-technology comparisons regarding their ability to call CNVs. Unlike other studies, we refrained from using a gold standard. Instead, we attempted to validate the CNV calls against the raw data of each technology. CONCLUSIONS Our study confirms that long-read platforms enable CNV calling in genomic regions inaccessible to arrays or short reads. We also found that the reproducibility of a CNV by different pipelines within each technology is strongly linked to other CNV evidence measures. Importantly, the three technologies show distinct public database frequency profiles, which differ depending on what technology the database was built on.
|
20
|
Mahmoud M, Doddapaneni H, Timp W, Sedlazeck FJ. PRINCESS: comprehensive detection of haplotype resolved SNVs, SVs, and methylation. Genome Biol 2021; 22:268. [PMID: 34521442 PMCID: PMC8442460 DOI: 10.1186/s13059-021-02486-w] [Citation(s) in RCA: 23] [Impact Index Per Article: 7.7] [Received: 05/01/2021] [Accepted: 09/02/2021] [Indexed: 12/11/2022] Open
Abstract
Long-read sequencing has been shown to have advantages in structural variation (SV) detection and methylation calling. Many studies focus on only one of SV detection, methylation calling, or SNV phasing; however, only the combination of variants provides a comprehensive insight into the sample and thus enables novel findings in biology or medicine. PRINCESS is a structured workflow that takes raw sequence reads and generates a fully phased SNV, SV, and methylation call set within a few hours. PRINCESS achieves high accuracy and long phasing even on low-coverage datasets and can resolve repetitive, complex, medically relevant genes that often escape detection. PRINCESS is publicly available at https://github.com/MeHelmy/princess under the MIT license.
Affiliation(s)
- Medhat Mahmoud
- Human Genome Sequencing Center, Baylor College of Medicine, Houston, TX, 77030, USA.
- Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX, 77030, USA.
| | | | - Winston Timp
- Department of Biomedical Engineering, Johns Hopkins University, Baltimore, MD, 21218, USA
| | - Fritz J Sedlazeck
- Human Genome Sequencing Center, Baylor College of Medicine, Houston, TX, 77030, USA.
|
21
|
Gingras MC, Sabo A, Cardenas M, Rana A, Dhingra S, Meng Q, Hu J, Muzny DM, Doddapaneni H, Perez L, Korchina V, Nessner C, Liu X, Chao H, Goss J, Gibbs RA. Sequencing of a central nervous system tumor demonstrates cancer transmission in an organ transplant. Life Sci Alliance 2021; 4:4/9/e202000941. [PMID: 34301805 PMCID: PMC8321656 DOI: 10.26508/lsa.202000941] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/20/2020] [Revised: 07/09/2021] [Accepted: 07/12/2021] [Indexed: 11/24/2022] Open
Abstract
This study uses DNA sequencing to trace a donor organ transplant–mediated cancer transmission and illustrates how precise molecular pathology profiles might reduce future risk for transplant recipients. Four organ transplant recipients from an organ donor diagnosed with anaplastic pleomorphic xanthoastrocytoma developed fatal malignancies for which the origin could not be confirmed by standard methods. We identified the somatic mutational profiles of the neoplasms using next-generation sequencing technologies and tracked the relationship between the different samples. The data were consistent with the presence of an aggressive clonal entity in the donor and the subsequent proliferation of descendent tumors in each recipient. Deleterious mutations in BRAF, PIK3CA, SDHC, DDR2, and FANCD2, and a chromosomal deletion spanning the CDKN2A/B genes, were shared between the recipients’ lesions. In addition to demonstrating that DNA sequencing tracked a donor/recipient cancer transmission, this study established that the genetic profile of a donor tumor and its potential aggressive phenotype could have been determined before transplantation was considered. As the genetic correlates of tumor invasion and metastases become better known, adding genetic profiling by DNA sequencing to the data considered for transplant safety should be considered.
Affiliation(s)
- Marie-Claude Gingras
  - Human Genome Sequencing Center, Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX, USA
  - Michael E. DeBakey Department of Surgery, Baylor College of Medicine, Houston, TX, USA
- Aniko Sabo
  - Human Genome Sequencing Center, Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX, USA
- Maria Cardenas
  - Human Genome Sequencing Center, Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX, USA
- Abbas Rana
  - Abdominal Transplant Center, Michael E. DeBakey Department of Surgery, Baylor College of Medicine, Houston, TX, USA
- Sadhna Dhingra
  - Department of Pathology, Baylor College of Medicine, Houston, TX, USA
- Qingchang Meng
  - Human Genome Sequencing Center, Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX, USA
- Jianhong Hu
  - Human Genome Sequencing Center, Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX, USA
- Donna M Muzny
  - Human Genome Sequencing Center, Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX, USA
- Harshavardhan Doddapaneni
  - Human Genome Sequencing Center, Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX, USA
- Lesette Perez
  - Human Genome Sequencing Center, Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX, USA
- Viktoriya Korchina
  - Human Genome Sequencing Center, Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX, USA
- Caitlin Nessner
  - Human Genome Sequencing Center, Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX, USA
- Xiuping Liu
  - Human Genome Sequencing Center, Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX, USA
- Hsu Chao
  - Human Genome Sequencing Center, Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX, USA
- John Goss
  - Abdominal Transplant Center, Michael E. DeBakey Department of Surgery, Baylor College of Medicine, Houston, TX, USA
- Richard A Gibbs
  - Human Genome Sequencing Center, Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX, USA
22
Gu W, Zhou A, Wang L, Sun S, Cui X, Zhu D. SVLR: Genome Structural Variant Detection Using Long-Read Sequencing Data. J Comput Biol 2021; 28:774-788. [PMID: 33973820 DOI: 10.1089/cmb.2021.0048] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 01/11/2023]
Abstract
Genome structural variants (SVs) have a great impact on human phenotype and diversity and have been linked to numerous diseases. Long-read sequencing technologies have made it possible to find SVs as long as 10,000 nucleotides. Long-read-based SV detection has therefore attracted the attention of many recent research projects, and many tools for detecting SVs from long reads have been developed. In this article, we present a new method, called SVLR, to detect SVs based on long-read sequencing data. Compared with existing methods, SVLR can detect three new kinds of SVs: block replacements, block interchanges, and translocations. Although these new SVs are structurally more complicated, SVLR achieves accuracies that are comparable with those for the classic SVs. Moreover, for the classic SVs that can be detected by state-of-the-art methods (e.g., SVIM and Sniffles), our experiments demonstrate recall improvements of up to 38% without harming precision (i.e., >78%). We also point out three directions for further improving SV detection in the future. Source code: https://github.com/GWYSDU/SVLR.
Affiliation(s)
- Wenyan Gu
  - School of Computer Science and Technology, Shandong University, Qingdao, China
- Aizhong Zhou
  - School of Computer Science and Technology, Shandong University, Qingdao, China
- Lusheng Wang
  - Department of Computer Science, City University of Hong Kong, Hong Kong, China
- Shiwei Sun
  - Key Laboratory of Intelligent Information Processing, Institute of Computing Technology, Chinese Academy of Sciences, Beijing, China
- Xuefeng Cui
  - School of Computer Science and Technology, Shandong University, Qingdao, China
- Daming Zhu
  - School of Computer Science and Technology, Shandong University, Qingdao, China
23
Kronenberg ZN, Rhie A, Koren S, Concepcion GT, Peluso P, Munson KM, Porubsky D, Kuhn K, Mueller KA, Low WY, Hiendleder S, Fedrigo O, Liachko I, Hall RJ, Phillippy AM, Eichler EE, Williams JL, Smith TPL, Jarvis ED, Sullivan ST, Kingan SB. Extended haplotype-phasing of long-read de novo genome assemblies using Hi-C. Nat Commun 2021; 12:1935. [PMID: 33911078 PMCID: PMC8081726 DOI: 10.1038/s41467-020-20536-y] [Citation(s) in RCA: 43] [Impact Index Per Article: 14.3] [Received: 05/13/2020] [Accepted: 11/12/2020] [Indexed: 01/27/2023]
Abstract
Haplotype-resolved genome assemblies are important for understanding how combinations of variants impact phenotypes. To date, these assemblies have been best created with complex protocols, such as cultured cells that contain a single-haplotype (haploid) genome, single cells where haplotypes are separated, or co-sequencing of parental genomes in a trio-based approach. These approaches are impractical in most situations. To address this issue, we present FALCON-Phase, a phasing tool that uses ultra-long-range Hi-C chromatin interaction data to extend phase blocks of partially phased diploid assemblies to chromosome or scaffold scale. FALCON-Phase uses the inherent phasing information in Hi-C reads, skipping variant calling, and reduces the computational complexity of phasing. Our method is validated on three benchmark datasets generated as part of the Vertebrate Genomes Project (VGP), including human, cow, and zebra finch, for which high-quality, fully haplotype-resolved assemblies are available using the trio-based approach. FALCON-Phase is accurate without parental data, and performance is better in samples with higher heterozygosity. For cow and zebra finch the accuracy is 97%, compared to 80-91% for human. FALCON-Phase is applicable to any draft assembly that contains long primary contigs and phased associate contigs.
Affiliation(s)
- Zev N Kronenberg
  - Phase Genomics, Seattle, WA, USA
  - Pacific Biosciences, Menlo Park, CA, USA
- Arang Rhie
  - Genome Informatics Section, Computational and Statistical Genomics Branch, National Human Genome Research Institute, Bethesda, MD, USA
- Sergey Koren
  - Genome Informatics Section, Computational and Statistical Genomics Branch, National Human Genome Research Institute, Bethesda, MD, USA
- Katherine M Munson
  - Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA
- David Porubsky
  - Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA
- Kristen Kuhn
  - US Meat Animal Research Center, ARS USDA, Clay Center, NE, USA
- Wai Yee Low
  - Davies Research Centre, School of Animal and Veterinary Sciences, The University of Adelaide, Roseworthy, SA, Australia
- Stefan Hiendleder
  - Davies Research Centre, School of Animal and Veterinary Sciences, The University of Adelaide, Roseworthy, SA, Australia
- Olivier Fedrigo
  - Vertebrate Genomes Laboratory, The Rockefeller University, New York, NY, USA
- Adam M Phillippy
  - Genome Informatics Section, Computational and Statistical Genomics Branch, National Human Genome Research Institute, Bethesda, MD, USA
- Evan E Eichler
  - Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA
  - Howard Hughes Medical Institute, University of Washington, Seattle, WA, USA
- John L Williams
  - Davies Research Centre, School of Animal and Veterinary Sciences, The University of Adelaide, Roseworthy, SA, Australia
  - Dipartimento di Scienze Animali, della Nutrizione e degli Alimenti, Università Cattolica del Sacro Cuore, 29122, Piacenza, Italy
- Erich D Jarvis
  - Laboratory of Neurogenetics of Language, The Rockefeller University, New York, NY, USA
  - Howard Hughes Medical Institute, Chevy Chase, MD, USA
24
Mc Cartney AM, Mahmoud M, Jochum M, Agustinho DP, Zorman B, Al Khleifat A, Dabbaghie F, Kesharwani RK, Smolka M, Dawood M, Albin D, Aliyev E, Almabrazi H, Arslan A, Balaji A, Behera S, Billingsley K, Cameron DL, Daw J, Dawson ET, De Coster W, Du H, Dunn C, Esteban R, Jolly A, Kalra D, Liao C, Liu Y, Lu TY, Havrilla JM, Khayat MM, Marin M, Monlong J, Price S, Gener AR, Ren J, Sagayaradj S, Sapoval N, Sinner C, Soto DC, Soylev A, Subramaniyan A, Syed N, Tadimeti N, Tater P, Vats P, Vaughn J, Walker K, Wang G, Zeng Q, Zhang S, Zhao T, Kille B, Biederstedt E, Chaisson M, English A, Kronenberg Z, Treangen TJ, Hefferon T, Chin CS, Busby B, Sedlazeck FJ. An international virtual hackathon to build tools for the analysis of structural variants within species ranging from coronaviruses to vertebrates. F1000Res 2021; 10:246. [PMID: 34621504 PMCID: PMC8479851 DOI: 10.12688/f1000research.51477.1] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Accepted: 03/04/2021] [Indexed: 11/08/2023]
Abstract
In October 2020, 62 scientists from nine nations worked together remotely in the Second Baylor College of Medicine & DNAnexus hackathon, focusing on related topics in structural variation, pan-genomes, and SARS-CoV-2 research. The overarching aims were to assess the current status of the field, identify the remaining challenges, and determine how to combine the strengths of the different interests to drive research and method development forward. Over the four days, eight groups each designed and developed new open-source methods to improve the identification and analysis of variations among species, including humans and SARS-CoV-2. These included improvements in SV calling, genotyping, annotation, and filtering, together with advances in benchmarking existing methods. Further groups focused on the diversity of SARS-CoV-2. Daily discussion summaries and methods are publicly available at https://github.com/collaborativebioinformatics and provide valuable insights for both participants and the research community.
Affiliation(s)
- Fawaz Dabbaghie
  - Institute for Medical Biometry and Bioinformatics, Düsseldorf, Germany
- Ahmed Arslan
  - Stanford University School of Medicine, California, USA
- Daniel L Cameron
  - Walter and Eliza Hall Institute of Medical Research, Melbourne, Australia
- Joyjit Daw
  - NVIDIA Corporation, Santa Clara, California, USA
- Haowei Du
  - Baylor College of Medicine, Houston, USA
- Jean Monlong
  - UC Santa Cruz Genomics Institute, Santa Cruz, USA
- Arda Soylev
  - Konya Food and Agriculture University, Konya, Turkey
- Pankaj Vats
  - NVIDIA Corporation, Santa Clara, California, USA
- Qiandong Zeng
  - Laboratory Corporation of America Holdings, Westborough, USA
25
Minoche AE, Lundie B, Peters GB, Ohnesorg T, Pinese M, Thomas DM, Zankl A, Roscioli T, Schonrock N, Kummerfeld S, Burnett L, Dinger ME, Cowley MJ. ClinSV: clinical grade structural and copy number variant detection from whole genome sequencing data. Genome Med 2021; 13:32. [PMID: 33632298 PMCID: PMC7908648 DOI: 10.1186/s13073-021-00841-x] [Citation(s) in RCA: 34] [Impact Index Per Article: 11.3] [Received: 06/30/2020] [Accepted: 02/02/2021] [Indexed: 01/09/2023]
Abstract
Whole genome sequencing (WGS) has the potential to outperform clinical microarrays for the detection of structural variants (SVs), including copy number variants (CNVs), but has been challenged by high false positive rates. Here we present ClinSV, a WGS-based SV integration, annotation, prioritization, and visualization framework, which identified 99.8% of simulated pathogenic ClinVar CNVs > 10 kb and 11/11 pathogenic variants from matched microarrays. The false positive rate was low (1.5-4.5%) and reproducibility high (95-99%). In clinical practice, ClinSV identified reportable variants in 22 of 485 patients (4.5%), of which 35-63% were not detectable by current clinical microarray designs. ClinSV is available at https://github.com/KCCG/ClinSV.
Affiliation(s)
- Andre E Minoche
  - Kinghorn Centre for Clinical Genomics, Garvan Institute of Medical Research, 370 Victoria Street, Darlinghurst, NSW, Australia
  - St Vincent's Clinical School, UNSW, Sydney, NSW, Australia
- Ben Lundie
  - Kinghorn Centre for Clinical Genomics, Garvan Institute of Medical Research, 370 Victoria Street, Darlinghurst, NSW, Australia
- Greg B Peters
  - Sydney Genome Diagnostics, The Children's Hospital at Westmead, Hawkesbury Road & Hainsworth Street, Westmead, NSW, Australia
- Thomas Ohnesorg
  - Kinghorn Centre for Clinical Genomics, Garvan Institute of Medical Research, 370 Victoria Street, Darlinghurst, NSW, Australia
  - Genome.One, Darlinghurst, NSW, Australia
- Mark Pinese
  - Children's Cancer Institute, University of New South Wales, Randwick, Sydney, NSW, Australia
  - School of Women's and Children's Health, UNSW, Sydney, NSW, Australia
- David M Thomas
  - St Vincent's Clinical School, UNSW, Sydney, NSW, Australia
  - The Kinghorn Cancer Centre and Cancer Division, Garvan Institute of Medical Research, 370 Victoria Street, Darlinghurst, NSW, Australia
- Andreas Zankl
  - Kinghorn Centre for Clinical Genomics, Garvan Institute of Medical Research, 370 Victoria Street, Darlinghurst, NSW, Australia
  - Department of Clinical Genetics, The Children's Hospital at Westmead, Hawkesbury Road, Westmead, NSW, Australia
  - Sydney Medical School, The University of Sydney, Camperdown, NSW, Australia
- Tony Roscioli
  - NSW Health Pathology Randwick, Sydney, NSW, Australia
  - Centre for Clinical Genetics, Sydney Children's Hospital, Randwick, NSW, Australia
  - Prince of Wales Clinical School, University of New South Wales, Sydney, NSW, Australia
  - Neuroscience Research Australia, University of New South Wales, Randwick, Sydney, NSW, Australia
- Nicole Schonrock
  - Kinghorn Centre for Clinical Genomics, Garvan Institute of Medical Research, 370 Victoria Street, Darlinghurst, NSW, Australia
  - Genome.One, Darlinghurst, NSW, Australia
- Sarah Kummerfeld
  - Kinghorn Centre for Clinical Genomics, Garvan Institute of Medical Research, 370 Victoria Street, Darlinghurst, NSW, Australia
  - St Vincent's Clinical School, UNSW, Sydney, NSW, Australia
- Leslie Burnett
  - Kinghorn Centre for Clinical Genomics, Garvan Institute of Medical Research, 370 Victoria Street, Darlinghurst, NSW, Australia
  - St Vincent's Clinical School, UNSW, Sydney, NSW, Australia
  - Genome.One, Darlinghurst, NSW, Australia
  - Sydney Medical School, The University of Sydney, Camperdown, NSW, Australia
- Marcel E Dinger
  - Kinghorn Centre for Clinical Genomics, Garvan Institute of Medical Research, 370 Victoria Street, Darlinghurst, NSW, Australia
  - School of Biotechnology and Biomolecular Sciences, UNSW, Sydney, NSW, Australia
- Mark J Cowley
  - Kinghorn Centre for Clinical Genomics, Garvan Institute of Medical Research, 370 Victoria Street, Darlinghurst, NSW, Australia
  - St Vincent's Clinical School, UNSW, Sydney, NSW, Australia
  - Children's Cancer Institute, University of New South Wales, Randwick, Sydney, NSW, Australia
  - School of Women's and Children's Health, UNSW, Sydney, NSW, Australia
26
Chen L, Pryce JE, Hayes BJ, Daetwyler HD. Investigating the Effect of Imputed Structural Variants from Whole-Genome Sequence on Genome-Wide Association and Genomic Prediction in Dairy Cattle. Animals (Basel) 2021; 11:ani11020541. [PMID: 33669735 PMCID: PMC7922624 DOI: 10.3390/ani11020541] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 01/19/2021] [Revised: 02/09/2021] [Accepted: 02/12/2021] [Indexed: 02/06/2023]
Abstract
Simple Summary: Structural variants are large changes to the DNA sequences that differ from individual to individual. We discovered and quality-controlled a set of 24,908 structural variants and used a technique called imputation to infer them into 35,568 Holstein and Jersey cattle. We then investigated whether the structural variants affected key dairy cattle traits such as milk production, fertility and overall conformation. Structural variants explained generally less than 10 percent of the phenotypic variation in these traits. Four of the structural variants were significantly associated with dairy cattle production traits. However, the inclusion of the structural variants in the genomic prediction model did not increase genomic prediction accuracy.
Abstract: Structural variations (SVs) are large DNA segments of deletions, duplications, copy number variations, inversions and translocations in a re-sequenced genome compared to a reference genome. They have been found to be associated with several complex traits in dairy cattle and could potentially help to improve genomic prediction accuracy of dairy traits. Imputation of SVs was performed in individuals genotyped with single-nucleotide polymorphism (SNP) panels without the expense of sequencing them. In this study, we generated 24,908 high-quality SVs in a total of 478 whole-genome sequenced Holstein and Jersey cattle. We imputed 4489 SVs with R2 > 0.5 into 35,568 Holstein and Jersey dairy cattle with 578,999 SNPs with two pipelines, FImpute and Eagle2.3-Minimac3. Genome-wide association studies for production, fertility and overall type with these 4489 SVs revealed four significant SVs, of which two were highly linked to significant SNP. We also estimated the variance components for SNP and SV models for these traits using genomic best linear unbiased prediction (GBLUP). Furthermore, we assessed the effect on genomic prediction accuracy of adding SVs to GBLUP models. The estimated percentage of genetic variance captured by SVs for production traits was up to 4.57% for milk yield in bulls and 3.53% for protein yield in cows. Finally, no consistent increase in genomic prediction accuracy was observed when including SVs in GBLUP.
Affiliation(s)
- Long Chen
  - Agriculture Victoria, AgriBio, Centre for AgriBioscience, Bundoora, VIC 3083, Australia
  - School of Applied Systems Biology, La Trobe University, Bundoora, VIC 3083, Australia
- Jennie E. Pryce
  - Agriculture Victoria, AgriBio, Centre for AgriBioscience, Bundoora, VIC 3083, Australia
  - School of Applied Systems Biology, La Trobe University, Bundoora, VIC 3083, Australia
- Ben J. Hayes
  - Agriculture Victoria, AgriBio, Centre for AgriBioscience, Bundoora, VIC 3083, Australia
  - Queensland Alliance for Agriculture and Food Innovation, Centre for Animal Science, The University of Queensland, St. Lucia, QLD 4067, Australia
- Hans D. Daetwyler
  - Agriculture Victoria, AgriBio, Centre for AgriBioscience, Bundoora, VIC 3083, Australia
  - School of Applied Systems Biology, La Trobe University, Bundoora, VIC 3083, Australia
27
Goldrich DY, LaBarge B, Chartrand S, Zhang L, Sadowski HB, Zhang Y, Pham K, Way H, Lai CYJ, Pang AWC, Clifford B, Hastie AR, Oldakowski M, Goldenberg D, Broach JR. Identification of Somatic Structural Variants in Solid Tumors by Optical Genome Mapping. J Pers Med 2021; 11:142. [PMID: 33670576 PMCID: PMC7921992 DOI: 10.3390/jpm11020142] [Citation(s) in RCA: 10] [Impact Index Per Article: 3.3] [Received: 01/22/2021] [Revised: 02/12/2021] [Accepted: 02/15/2021] [Indexed: 12/12/2022]
Abstract
Genomic structural variants comprise a significant fraction of somatic mutations driving cancer onset and progression. However, such variants are not readily revealed by standard next-generation sequencing. Optical genome mapping (OGM) surpasses short-read sequencing in detecting large (>500 bp) and complex structural variants (SVs) but requires isolation of ultra-high-molecular-weight DNA from the tissue of interest. We have successfully applied a protocol involving a paramagnetic nanobind disc to a wide range of solid tumors. Using as little as 6.5 mg of input tumor tissue, we show successful extraction of high-molecular-weight genomic DNA that provides a high genomic map rate and effective coverage by optical mapping. We demonstrate the system's utility in identifying somatic SVs affecting functional and cancer-related genes for each sample. Duplicate/triplicate analysis of select samples shows intra-sample reliability but also intra-sample heterogeneity. We also demonstrate that simply filtering SVs based on a GRCh38 human control database provides high positive and negative predictive values for true somatic variants. Our results indicate that the solid tissue DNA extraction protocol, OGM and SV analysis can be applied to a wide variety of solid tumors to capture SVs across the entire genome with functional importance in cancer prognosis and treatment.
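The control-database filtering step mentioned in this abstract can be illustrated with a minimal Python sketch. It is not the authors' pipeline: the `SV` record, the 50% reciprocal-overlap threshold, and the example coordinates are all assumptions chosen for illustration, and the matching rule used by the actual OGM software may differ.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SV:
    """Hypothetical minimal SV record (not a vendor file format)."""
    chrom: str
    start: int
    end: int
    sv_type: str  # e.g. "DEL", "DUP", "INV"


def reciprocal_overlap(a: SV, b: SV) -> float:
    """Return the smaller of the two overlap fractions, or 0.0 if no match is possible."""
    if a.chrom != b.chrom or a.sv_type != b.sv_type:
        return 0.0
    overlap = min(a.end, b.end) - max(a.start, b.start)
    if overlap <= 0:
        return 0.0
    return min(overlap / (a.end - a.start), overlap / (b.end - b.start))


def filter_against_controls(tumor_calls, control_calls, min_overlap=0.5):
    """Keep tumor calls that match no control call at the assumed threshold."""
    return [
        call
        for call in tumor_calls
        if not any(reciprocal_overlap(call, ctrl) >= min_overlap for ctrl in control_calls)
    ]


# Illustrative coordinates only; they are not taken from the study.
controls = [SV("chr1", 1000, 5000, "DEL")]
tumor = [SV("chr1", 1100, 5100, "DEL"), SV("chr9", 21000, 90000, "DEL")]
candidates = filter_against_controls(tumor, controls)
```

In this toy input the first tumor call overlaps a control deletion almost entirely and is dropped, while the second has no control counterpart and is retained as a candidate somatic SV.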
Affiliation(s)
- David Y. Goldrich
  - Department of Otolaryngology—Head and Neck Surgery, Pennsylvania State University College of Medicine, Hershey, PA 17033, USA
- Brandon LaBarge
  - Department of Otolaryngology—Head and Neck Surgery, Pennsylvania State University College of Medicine, Hershey, PA 17033, USA
- Scott Chartrand
  - Department of Biochemistry and Molecular Biology, Pennsylvania State University College of Medicine, Hershey, PA 17033, USA
- Lijun Zhang
  - Department of Biochemistry and Molecular Biology, Pennsylvania State University College of Medicine, Hershey, PA 17033, USA
- Henry B. Sadowski
  - Bionano Genomics, San Diego, CA 92121, USA
- Yang Zhang
  - Bionano Genomics, San Diego, CA 92121, USA
- Khoa Pham
  - Bionano Genomics, San Diego, CA 92121, USA
- Hannah Way
  - Bionano Genomics, San Diego, CA 92121, USA
- Chi-Yu Jill Lai
  - Bionano Genomics, San Diego, CA 92121, USA
- Andy Wing Chun Pang
  - Bionano Genomics, San Diego, CA 92121, USA
- Benjamin Clifford
  - Bionano Genomics, San Diego, CA 92121, USA
- Alex R. Hastie
  - Bionano Genomics, San Diego, CA 92121, USA
- Mark Oldakowski
  - Bionano Genomics, San Diego, CA 92121, USA
- David Goldenberg
  - Department of Otolaryngology—Head and Neck Surgery, Pennsylvania State University College of Medicine, Hershey, PA 17033, USA
- James R. Broach
  - Department of Biochemistry and Molecular Biology, Pennsylvania State University College of Medicine, Hershey, PA 17033, USA
28
Zarate S, Carroll A, Mahmoud M, Krasheninina O, Jun G, Salerno WJ, Schatz MC, Boerwinkle E, Gibbs RA, Sedlazeck FJ. Parliament2: Accurate structural variant calling at scale. Gigascience 2020; 9:giaa145. [PMID: 33347570 PMCID: PMC7751401 DOI: 10.1093/gigascience/giaa145] [Citation(s) in RCA: 34] [Impact Index Per Article: 8.5] [Received: 04/03/2020] [Revised: 09/17/2020] [Accepted: 11/18/2020] [Indexed: 12/24/2022]
Abstract
BACKGROUND: Structural variants (SVs) are critical contributors to genetic diversity and genomic disease. To predict the phenotypic impact of SVs, there is a need for better estimates of both the occurrence and frequency of SVs, preferably from large, ethnically diverse cohorts. The current standard approach relies on short paired-end reads, from which SVs remain challenging to detect, especially at the scale of hundreds to thousands of samples. FINDINGS: We present Parliament2, a consensus SV framework that leverages multiple best-in-class methods to identify high-quality SVs from short-read DNA sequence data at scale. Parliament2 incorporates pre-installed SV callers that are optimized for efficient execution in parallel to reduce the overall runtime and costs. We demonstrate the accuracy of Parliament2 when applied to data from NovaSeq and HiSeq X platforms with the Genome in a Bottle (GIAB) SV call set across all size classes. The reported quality score per SV is calibrated across different SV types and size classes. Parliament2 has the highest F1 score (74.27%) measured across the independent gold standard from GIAB. We illustrate the compute performance by processing all 1000 Genomes samples (2,691 samples) in <1 day on GRCh38. Parliament2 improves the runtime performance of individual methods and is open source (https://github.com/slzarate/parliament2), and a Docker image, as well as a WDL implementation, is available. CONCLUSION: Parliament2 provides a highly accurate single-sample SV call set from short-read DNA sequence data and enables cost-efficient application over cloud or cluster environments, processing thousands of samples.
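For readers unfamiliar with the F1 score quoted in this abstract, the following Python sketch shows how it is derived from benchmark counts as the harmonic mean of precision and recall. The function name and the counts are hypothetical and are not taken from the Parliament2 evaluation.

```python
def f1_score(true_pos: int, false_pos: int, false_neg: int) -> float:
    """Harmonic mean of precision and recall, computed from benchmark counts."""
    if true_pos == 0:
        return 0.0  # no correct calls: define F1 as zero to avoid dividing by zero
    precision = true_pos / (true_pos + false_pos)
    recall = true_pos / (true_pos + false_neg)
    return 2 * precision * recall / (precision + recall)


# Hypothetical counts against a truth set: 80 matching calls,
# 20 calls absent from the truth set, 20 truth-set SVs that were missed.
example_f1 = f1_score(true_pos=80, false_pos=20, false_neg=20)
```

With these made-up counts, precision and recall are both 0.8, so the F1 score is also 0.8.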
Affiliation(s)
- Samantha Zarate
  - DNAnexus, 1975 W El Camino Real #204, Mountain View, CA 94040, USA
  - Department of Computer Science, 3400 N. Charles St. Johns Hopkins University, Baltimore, MD 21218, USA
- Andrew Carroll
  - DNAnexus, 1975 W El Camino Real #204, Mountain View, CA 94040, USA
- Medhat Mahmoud
  - Human Genome Sequencing Center, One Baylor Plaza, Baylor College of Medicine, Houston, TX 77030, USA
- Olga Krasheninina
  - Human Genome Sequencing Center, One Baylor Plaza, Baylor College of Medicine, Houston, TX 77030, USA
- Goo Jun
  - Human Genetics Center, 1200 Pressler Street, University of Texas Health Science Center at Houston, Houston, TX 77040, USA
- William J Salerno
  - Human Genome Sequencing Center, One Baylor Plaza, Baylor College of Medicine, Houston, TX 77030, USA
- Michael C Schatz
  - Department of Computer Science, 3400 N. Charles St. Johns Hopkins University, Baltimore, MD 21218, USA
- Eric Boerwinkle
  - Human Genome Sequencing Center, One Baylor Plaza, Baylor College of Medicine, Houston, TX 77030, USA
  - Human Genetics Center, 1200 Pressler Street, University of Texas Health Science Center at Houston, Houston, TX 77040, USA
- Richard A Gibbs
  - Human Genome Sequencing Center, One Baylor Plaza, Baylor College of Medicine, Houston, TX 77030, USA
- Fritz J Sedlazeck
  - Human Genome Sequencing Center, One Baylor Plaza, Baylor College of Medicine, Houston, TX 77030, USA
29
Integrative analysis of structural variations using short-reads and linked-reads yields highly specific and sensitive predictions. PLoS Comput Biol 2020; 16:e1008397. [PMID: 33226985 PMCID: PMC7721175 DOI: 10.1371/journal.pcbi.1008397] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.5] [Received: 01/10/2020] [Revised: 12/07/2020] [Accepted: 09/24/2020] [Indexed: 11/19/2022]
Abstract
Genetic diseases are driven by aberrations of the human genome. Identification of such aberrations, including structural variations (SVs), is key to our understanding. Conventional short-read whole genome sequencing (cWGS) can identify SVs to base-pair resolution, but utilizes only short-range information and suffers from a high false discovery rate (FDR). Linked-read sequencing (10XWGS) utilizes long-range information by linkage of short reads originating from the same large DNA molecule. This can mitigate alignment-based artefacts, especially in repetitive regions, and should enable better prediction of SVs. However, an unbiased evaluation of this technology is not available. In this study, we performed a comprehensive analysis of different types and sizes of SVs predicted by both technologies and validated with an independent PCR-based approach. The SVs identified by both technologies were highly specific, while the validation rate dropped for events found by only one of them. A particularly high FDR was observed for SVs found only by 10XWGS. To improve FDR and sensitivity, statistical models for both technologies were trained. Using our approach, we characterized SVs from the MCF7 cell line and a primary breast cancer tumor with high precision. This approach improves SV prediction and can therefore help in understanding the underlying genetics in various diseases.
Cancer and many other diseases are often driven by structural rearrangements in the patients. Their precise identification is necessary to understand the evolution of, and cure for, the disease. In this study, we have compared two sequencing technologies for the identification of structural variations, i.e., Illumina's short-read and 10X Genomics linked-read sequencing. Short-read sequencing is already known to have a high false discovery rate for structural variations, while an unbiased performance evaluation of linked-read sequencing is missing. Hence, we evaluate the performance of these two technologies using computational and PCR-based methodologies. Moreover, we also present a statistical approach to increase their performance, supporting better detection of structural variations and thus further research into disease biology.
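The comparison described in this abstract, splitting calls into those shared by both technologies and those found by only one and then measuring the false discovery rate in each group, can be sketched in Python as follows. The call identifiers and PCR outcomes are invented for illustration, and matching calls by exact identifier is a simplifying assumption; real call sets would be matched by breakpoint proximity.

```python
def false_discovery_rate(validated: dict[str, bool]) -> float:
    """Fraction of tested calls that failed validation (FP / (TP + FP))."""
    if not validated:
        raise ValueError("no validated calls in this group")
    failed = sum(1 for confirmed in validated.values() if not confirmed)
    return failed / len(validated)


# Hypothetical call sets, keyed by an assumed shared identifier.
short_read_calls = {"sv1", "sv2", "sv3"}
linked_read_calls = {"sv2", "sv3", "sv4", "sv5"}

# Hypothetical PCR outcomes: True means the SV was confirmed.
pcr_confirmed = {"sv1": True, "sv2": True, "sv3": True, "sv4": False, "sv5": True}

groups = {
    "shared": short_read_calls & linked_read_calls,
    "short_read_only": short_read_calls - linked_read_calls,
    "linked_read_only": linked_read_calls - short_read_calls,
}
fdr_by_group = {
    name: false_discovery_rate({call: pcr_confirmed[call] for call in calls})
    for name, calls in groups.items()
}
```

In this toy data the shared calls all validate (FDR 0.0), while half of the linked-read-only calls fail (FDR 0.5), mirroring the qualitative pattern the abstract reports without reproducing its numbers.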
30
Singchat W, Ahmad SF, Laopichienpong N, Suntronpong A, Panthum T, Griffin DK, Srikulnath K. Snake W Sex Chromosome: The Shadow of Ancestral Amniote Super-Sex Chromosome. Cells 2020; 9:cells9112386. [PMID: 33142713 PMCID: PMC7692289 DOI: 10.3390/cells9112386] [Citation(s) in RCA: 11] [Impact Index Per Article: 2.8] [Received: 10/14/2020] [Revised: 10/27/2020] [Accepted: 10/29/2020] [Indexed: 12/20/2022]
Abstract
Heteromorphic sex chromosomes, particularly the ZZ/ZW sex chromosome system of birds and some reptiles, undergo evolutionary dynamics distinct from those of autosomes. The W sex chromosome is a unique karyological member of this heteromorphic pair, which has been extensively studied in snakes to explore the origin, evolution, and genetic diversity of amniote sex chromosomes. The snake W sex chromosome offers a fascinating model system to elucidate ancestral trajectories that have resulted in genetic divergence of amniote sex chromosomes. Although the principal mechanism driving evolution of the amniote sex chromosome remains obscure, an emerging hypothesis, supported by studies of W sex chromosomes of squamate reptiles and snakes, suggests that sex chromosomes share varied genomic blocks across several amniote lineages. This implies the possible split of an ancestral super-sex chromosome via chromosomal rearrangements. We review the major findings pertaining to sex chromosomal profiles in amniotes and discuss the evolution of an ancestral super-sex chromosome by collating recent evidence sourced mainly from the snake W sex chromosome analysis. We highlight the role of repeat-mediated sex chromosome conformation and present a genomic landscape of snake Z and W chromosomes, which reveals the relative abundance of major repeats, and identifies the expansion of certain transposable elements. The latest revolution in chromosomics, i.e., complete telomere-to-telomere assembly, offers mechanistic insights into the evolutionary origin of sex chromosomes.
Affiliation(s)
- Worapong Singchat
- Laboratory of Animal Cytogenetics and Comparative Genomics (ACCG), Department of Genetics, Faculty of Science, Kasetsart University, 50 Ngamwongwan, Chatuchak, Bangkok 10900, Thailand; (W.S.); (S.F.A.); (N.L.); (A.S.); (T.P.)
- Special Research Unit for Wildlife Genomics (SRUWG), Department of Forest Biology, Faculty of Forestry, Kasetsart University, 50 Ngamwongwan, Chatuchak, Bangkok 10900, Thailand
- Syed Farhan Ahmad
- Laboratory of Animal Cytogenetics and Comparative Genomics (ACCG), Department of Genetics, Faculty of Science, Kasetsart University, 50 Ngamwongwan, Chatuchak, Bangkok 10900, Thailand; (W.S.); (S.F.A.); (N.L.); (A.S.); (T.P.)
- Special Research Unit for Wildlife Genomics (SRUWG), Department of Forest Biology, Faculty of Forestry, Kasetsart University, 50 Ngamwongwan, Chatuchak, Bangkok 10900, Thailand
- Nararat Laopichienpong
- Laboratory of Animal Cytogenetics and Comparative Genomics (ACCG), Department of Genetics, Faculty of Science, Kasetsart University, 50 Ngamwongwan, Chatuchak, Bangkok 10900, Thailand; (W.S.); (S.F.A.); (N.L.); (A.S.); (T.P.)
- Special Research Unit for Wildlife Genomics (SRUWG), Department of Forest Biology, Faculty of Forestry, Kasetsart University, 50 Ngamwongwan, Chatuchak, Bangkok 10900, Thailand
- Aorarat Suntronpong
- Laboratory of Animal Cytogenetics and Comparative Genomics (ACCG), Department of Genetics, Faculty of Science, Kasetsart University, 50 Ngamwongwan, Chatuchak, Bangkok 10900, Thailand; (W.S.); (S.F.A.); (N.L.); (A.S.); (T.P.)
- Special Research Unit for Wildlife Genomics (SRUWG), Department of Forest Biology, Faculty of Forestry, Kasetsart University, 50 Ngamwongwan, Chatuchak, Bangkok 10900, Thailand
- Thitipong Panthum
- Laboratory of Animal Cytogenetics and Comparative Genomics (ACCG), Department of Genetics, Faculty of Science, Kasetsart University, 50 Ngamwongwan, Chatuchak, Bangkok 10900, Thailand; (W.S.); (S.F.A.); (N.L.); (A.S.); (T.P.)
- Special Research Unit for Wildlife Genomics (SRUWG), Department of Forest Biology, Faculty of Forestry, Kasetsart University, 50 Ngamwongwan, Chatuchak, Bangkok 10900, Thailand
- Kornsorn Srikulnath
- Laboratory of Animal Cytogenetics and Comparative Genomics (ACCG), Department of Genetics, Faculty of Science, Kasetsart University, 50 Ngamwongwan, Chatuchak, Bangkok 10900, Thailand; (W.S.); (S.F.A.); (N.L.); (A.S.); (T.P.)
- Special Research Unit for Wildlife Genomics (SRUWG), Department of Forest Biology, Faculty of Forestry, Kasetsart University, 50 Ngamwongwan, Chatuchak, Bangkok 10900, Thailand
- Center for Advanced Studies in Tropical Natural Resources, National Research University-Kasetsart University, Kasetsart University, (CASTNAR, NRU-KU, Thailand), Bangkok 10900, Thailand
- Center of Excellence on Agricultural Biotechnology (AG-BIO/PERDO-CHE), Bangkok 10900, Thailand
- Omics Center for Agriculture, Bioresources, Food and Health, Kasetsart University (OmiKU), Bangkok 10900, Thailand
- Amphibian Research Center, Hiroshima University, 1-3-1, Kagamiyama, Higashihiroshima 739-8526, Japan
- Correspondence: ; Tel.: +66-2562-5644
|
31
|
Koboldt DC. Best practices for variant calling in clinical sequencing. Genome Med 2020; 12:91. [PMID: 33106175 PMCID: PMC7586657 DOI: 10.1186/s13073-020-00791-w] [Citation(s) in RCA: 138] [Impact Index Per Article: 34.5] [Received: 10/30/2019] [Accepted: 10/08/2020] [Indexed: 02/08/2023] Open
Abstract
Next-generation sequencing technologies have enabled a dramatic expansion of clinical genetic testing both for inherited conditions and diseases such as cancer. Accurate variant calling in NGS data is a critical step upon which virtually all downstream analysis and interpretation processes rely. Just as NGS technologies have evolved considerably over the past 10 years, so too have the software tools and approaches for detecting sequence variants in clinical samples. In this review, I discuss the current best practices for variant calling in clinical sequencing studies, with a particular emphasis on trio sequencing for inherited disorders and somatic mutation detection in cancer patients. I describe the relative strengths and weaknesses of panel, exome, and whole-genome sequencing for variant detection. Recommended tools and strategies for calling variants of different classes are also provided, along with guidance on variant review, validation, and benchmarking to ensure optimal performance. Although NGS technologies are continually evolving, and new capabilities (such as long-read single-molecule sequencing) are emerging, the “best practice” principles in this review should be relevant to clinical variant calling in the long term.
Affiliation(s)
- Daniel C Koboldt
- Steve and Cindy Rasmussen Institute for Genomic Medicine at Nationwide Children's Hospital, Columbus, OH, USA; Department of Pediatrics, The Ohio State University, Columbus, OH, USA.
|
32
|
Jia H, Wei H, Zhu D, Ma J, Yang H, Wang R, Feng X. PASA: Identifying More Credible Structural Variants of Hedou12. IEEE/ACM Trans Comput Biol Bioinform 2020; 17:1493-1503. [PMID: 31425044 DOI: 10.1109/tcbb.2019.2934463] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/10/2023]
Abstract
Although plenty of structural variant detecting approaches for human genomes can be looked up in the literatures, little has been acknowledged on the effectiveness of those structural variant softwares for plant genomes. Moreover, it has been demonstrated frequent occurrences for those structural variant detecting softwares to find too many false structural variants. In this paper, we devote to detect deletions, insertions, and inversions, in total of three kinds of structural variants occurring in Hedou12 genome in contrast to Williams82 genome. To find more potential structural variants, we try to develop new principles to detect discordant and split read map sets supporting structural variants. Aiming to enhance the precision of structural variant detections, we propose two new sequencing characteristic based probability models, which use the sequencing parameters of Hedou12 genome as well as the parameters for Hedou12 paired-end reads to be aligned onto Williams82, to evaluate the probability for a potential structural variant to occur in. To remove the false members from those potential structural variants, we propose a set cover problem model to describe formally on which potential structural variants it should accept to achieve as high as possible a probability summation. This will achieve a solution with more credible structural variants, which can be verified by comparing with DELLY version 0.5.8 and LUMPY version 0.2.2.3. Our algorithm has been verified to be able to find deletions, insertions, and inversions in Hedou12 in contrast to Williams82 DELLY as well as LUMPY fails to find.
|
33
|
Jiang T, Liu Y, Jiang Y, Li J, Gao Y, Cui Z, Liu Y, Liu B, Wang Y. Long-read-based human genomic structural variation detection with cuteSV. Genome Biol 2020; 21:189. [PMID: 32746918 PMCID: PMC7477834 DOI: 10.1186/s13059-020-02107-y] [Citation(s) in RCA: 137] [Impact Index Per Article: 34.3] [Received: 12/11/2019] [Accepted: 07/14/2020] [Indexed: 01/01/2023] Open
Abstract
Long-read sequencing is promising for the comprehensive discovery of structural variations (SVs). However, it is still non-trivial to achieve high yields and performance simultaneously due to the complex SV signatures implied by noisy long reads. We propose cuteSV, a sensitive, fast, and scalable long-read-based SV detection approach. cuteSV uses tailored methods to collect the signatures of various types of SVs and employs a clustering-and-refinement method to implement sensitive SV detection. Benchmarks on simulated and real long-read sequencing datasets demonstrate that cuteSV has higher yields and scaling performance than state-of-the-art tools. cuteSV is available at https://github.com/tjiangHIT/cuteSV.
Affiliation(s)
- Tao Jiang
- Center for Bioinformatics, School of Computer Science and Technology, Harbin Institute of Technology, Harbin, 150001, Heilongjiang, China
- Yongzhuang Liu
- Center for Bioinformatics, School of Computer Science and Technology, Harbin Institute of Technology, Harbin, 150001, Heilongjiang, China
- Yue Jiang
- Nebula Genomics, Harbin, 150030, Heilongjiang, China
- Junyi Li
- School of Computer Science and Technology, Harbin Institute of Technology (Shenzhen), Shenzhen, 518055, Guangdong, China
- Yan Gao
- Center for Bioinformatics, School of Computer Science and Technology, Harbin Institute of Technology, Harbin, 150001, Heilongjiang, China
- Zhe Cui
- Center for Bioinformatics, School of Computer Science and Technology, Harbin Institute of Technology, Harbin, 150001, Heilongjiang, China
- Yadong Liu
- Center for Bioinformatics, School of Computer Science and Technology, Harbin Institute of Technology, Harbin, 150001, Heilongjiang, China
- Bo Liu
- Center for Bioinformatics, School of Computer Science and Technology, Harbin Institute of Technology, Harbin, 150001, Heilongjiang, China.
- Yadong Wang
- Center for Bioinformatics, School of Computer Science and Technology, Harbin Institute of Technology, Harbin, 150001, Heilongjiang, China.
|
34
|
A robust benchmark for detection of germline large deletions and insertions. Nat Biotechnol 2020; 38:1347-1355. [PMID: 32541955 PMCID: PMC8454654 DOI: 10.1038/s41587-020-0538-8] [Citation(s) in RCA: 175] [Impact Index Per Article: 43.8] [Received: 07/16/2019] [Accepted: 04/28/2020] [Indexed: 12/19/2022]
Abstract
New technologies and analysis methods are enabling genomic structural variants (SVs) to be detected with ever-increasing accuracy, resolution, and comprehensiveness. To help translate these methods to routine research and clinical practice, we developed the first sequence-resolved benchmark set for identification of both false negative and false positive germline large insertions and deletions. To create this benchmark for a broadly consented son in a Personal Genome Project trio with broadly available cells and DNA, the Genome in a Bottle (GIAB) Consortium integrated 19 sequence-resolved variant calling methods from diverse technologies. The final benchmark set contains 12745 isolated, sequence-resolved insertion (7281) and deletion (5464) calls ≥50 base pairs (bp). The Tier 1 benchmark regions, for which any extra calls are putative false positives, cover 2.51 Gbp and 5262 insertions and 4095 deletions supported by ≥1 diploid assembly. We demonstrate the benchmark set reliably identifies false negatives and false positives in high-quality SV callsets from short-, linked-, and long-read sequencing and optical mapping.
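The counts quoted in this abstract can be cross-checked with a few lines of Python. This is only an arithmetic sanity check of the reported figures: the variable names are illustrative and do not come from the paper, and the final ratio assumes that the Tier 1 calls are a subset of the full benchmark set.

```python
# Figures as reported in the abstract; variable names are illustrative.
insertions = 7281
deletions = 5464
reported_total = 12745

tier1_insertions = 5262
tier1_deletions = 4095

# Insertions plus deletions should reproduce the reported benchmark size.
benchmark_total = insertions + deletions

# Number of calls inside the Tier 1 benchmark regions.
tier1_total = tier1_insertions + tier1_deletions

# Share of the benchmark in Tier 1 regions (assumes Tier 1 is a subset).
tier1_fraction = tier1_total / benchmark_total

print(benchmark_total == reported_total)
print(tier1_total)
print(round(tier1_fraction, 3))
```

Under that assumption, roughly three quarters of the benchmark calls fall in regions where any extra call is treated as a putative false positive.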
|
35
|
Heller D, Vingron M. SVIM: structural variant identification using mapped long reads. Bioinformatics 2019; 35:2907-2915. [PMID: 30668829 PMCID: PMC6735718 DOI: 10.1093/bioinformatics/btz041] [Citation(s) in RCA: 154] [Impact Index Per Article: 38.5] [Received: 07/26/2018] [Revised: 01/04/2019] [Accepted: 01/22/2019] [Indexed: 02/07/2023] Open
Abstract
Motivation: Structural variants are defined as genomic variants larger than 50 bp. They have been shown to affect more bases in any given genome than single-nucleotide polymorphisms or small insertions and deletions. Additionally, they have great impact on human phenotype and diversity and have been linked to numerous diseases. Due to their size and association with repeats, they are difficult to detect by shotgun sequencing, especially when based on short reads. Long read, single-molecule sequencing technologies like those offered by Pacific Biosciences or Oxford Nanopore Technologies produce reads with a length of several thousand base pairs. Despite the higher error rate and sequencing cost, long-read sequencing offers many advantages for the detection of structural variants. Yet, available software tools still do not fully exploit the possibilities. Results: We present SVIM, a tool for the sensitive detection and precise characterization of structural variants from long-read data. SVIM consists of three components for the collection, clustering and combination of structural variant signatures from read alignments. It discriminates five different variant classes including similar types, such as tandem and interspersed duplications and novel element insertions. SVIM is unique in its capability of extracting both the genomic origin and destination of duplications. It compares favorably with existing tools in evaluations on simulated data and real datasets from Pacific Biosciences and Nanopore sequencing machines. Availability and implementation: The source code and executables of SVIM are available on Github: github.com/eldariont/svim. SVIM has been implemented in Python 3 and published on bioconda and the Python Package Index. Supplementary information: Supplementary data are available at Bioinformatics online.
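The idea of collecting read-level signatures and clustering them into candidate calls can be illustrated with a deliberately naive sketch. It is not SVIM's algorithm: the `Signature` structure, the `max_gap` threshold, and the median-based summary are all assumptions made for illustration, and the grouping is a simple gap-based pass over sorted start positions.

```python
from dataclasses import dataclass
from statistics import median


@dataclass(frozen=True)
class Signature:
    """One read-level hint of a deletion (hypothetical structure)."""
    chrom: str
    start: int
    length: int


def cluster_signatures(signatures, max_gap=100):
    """Group signatures on one chromosome whose sorted starts are <= max_gap bp apart."""
    clusters = []
    for sig in sorted(signatures, key=lambda s: (s.chrom, s.start)):
        last = clusters[-1] if clusters else None
        if last and last[-1].chrom == sig.chrom and sig.start - last[-1].start <= max_gap:
            last.append(sig)
        else:
            clusters.append([sig])
    return clusters


def summarize(cluster):
    """Collapse a cluster into one candidate call using median start and length."""
    return {
        "chrom": cluster[0].chrom,
        "start": int(median(s.start for s in cluster)),
        "length": int(median(s.length for s in cluster)),
        "support": len(cluster),
    }


# Toy input: three reads agree on one deletion, two others stand alone.
signatures = [
    Signature("chr1", 10_000, 310),
    Signature("chr1", 10_020, 295),
    Signature("chr1", 10_045, 305),
    Signature("chr1", 52_000, 1_200),
    Signature("chr2", 10_010, 300),
]
calls = [summarize(c) for c in cluster_signatures(signatures)]
for call in calls:
    print(call)
```

In this toy input the three nearby chr1 signatures merge into a single call with support 3, while the distant chr1 signature and the chr2 signature remain separate candidates. Real callers use richer distance measures and per-type signature handling.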
Affiliation(s)
- David Heller
- Department of Computational Molecular Biology, Max Planck Institute for Molecular Genetics, Berlin, Germany
- Martin Vingron
- Department of Computational Molecular Biology, Max Planck Institute for Molecular Genetics, Berlin, Germany
|
36
|
Peng Y, Yuan C, Tao X, Zhao Y, Yao X, Zhuge L, Huang J, Zheng Q, Zhang Y, Hong H, Chen H, Sun Y. Integrated analysis of optical mapping and whole-genome sequencing reveals intratumoral genetic heterogeneity in metastatic lung squamous cell carcinoma. Transl Lung Cancer Res 2020; 9:670-681. [PMID: 32676329 PMCID: PMC7354123 DOI: 10.21037/tlcr-19-401] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.8] [Indexed: 12/19/2022]
Abstract
Background: Intratumoral heterogeneity is a crucial factor to the outcome of patients and resistance to therapies, in which structural variants play an indispensable but undiscovered role. Methods: We performed an integrated analysis of optical mapping and whole-genome sequencing on a primary tumor (PT) and matched metastases including lymph node metastasis (LNM) and tumor thrombus in the pulmonary vein (TPV). Single nucleotide variants, indels and structural variants were analyzed to reveal intratumoral genetic heterogeneity among tumor cells in different sites. Results: Our results demonstrated there were less nonsynonymous somatic variants shared with PT in LNM than in TPV, while there were more structural variants shared with PT in LNM than in TPV. More private variants and its affected genes associated with tumorigenesis and progression were identified in TPV than in LNM. It should be noticed that optical mapping detected an average of 77.1% (74.5-78.5%) large structural variants (>5,000 bp) not detected by whole-genome sequencing and identified several structural variants private to metastases. Conclusions: Our study does demonstrate structural variants, especially large structural variants play a crucial role in intratumoral genetic heterogeneity and optical mapping could make up for the deficiency of whole-genome sequencing to identify structural variants.
Affiliation(s)
- Yizhou Peng
- Department of Thoracic Surgery, Fudan University Shanghai Cancer Center, Shanghai 200032, China; Department of Oncology, Shanghai Medical College, Fudan University, Shanghai 200032, China
- Chongze Yuan
- Department of Thoracic Surgery, Fudan University Shanghai Cancer Center, Shanghai 200032, China; Department of Oncology, Shanghai Medical College, Fudan University, Shanghai 200032, China
- Xiaoting Tao
- Department of Thoracic Surgery, Fudan University Shanghai Cancer Center, Shanghai 200032, China; Department of Oncology, Shanghai Medical College, Fudan University, Shanghai 200032, China
- Yue Zhao
- Department of Thoracic Surgery, Fudan University Shanghai Cancer Center, Shanghai 200032, China; Department of Oncology, Shanghai Medical College, Fudan University, Shanghai 200032, China
- Xingxin Yao
- Department of Thoracic Surgery, Fudan University Shanghai Cancer Center, Shanghai 200032, China; Department of Oncology, Shanghai Medical College, Fudan University, Shanghai 200032, China
- Lingdun Zhuge
- Department of Thoracic Surgery, Fudan University Shanghai Cancer Center, Shanghai 200032, China; Department of Oncology, Shanghai Medical College, Fudan University, Shanghai 200032, China
- Qiang Zheng
- Department of Oncology, Shanghai Medical College, Fudan University, Shanghai 200032, China; Department of Pathology, Fudan University Shanghai Cancer Center, Shanghai 200032, China
- Yue Zhang
- Berry Genomics Corporation, Beijing 100015, China
- Hui Hong
- Department of Thoracic Surgery, Fudan University Shanghai Cancer Center, Shanghai 200032, China; Department of Oncology, Shanghai Medical College, Fudan University, Shanghai 200032, China
- Haiquan Chen
- Department of Thoracic Surgery, Fudan University Shanghai Cancer Center, Shanghai 200032, China; Department of Oncology, Shanghai Medical College, Fudan University, Shanghai 200032, China
- Yihua Sun
- Department of Thoracic Surgery, Fudan University Shanghai Cancer Center, Shanghai 200032, China; Department of Oncology, Shanghai Medical College, Fudan University, Shanghai 200032, China
|
37
|
SVXplorer: Three-tier approach to identification of structural variants via sequential recombination of discordant cluster signatures. PLoS Comput Biol 2020; 16:e1007737. [PMID: 32182236 PMCID: PMC7100977 DOI: 10.1371/journal.pcbi.1007737] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/24/2019] [Revised: 03/27/2020] [Accepted: 02/18/2020] [Indexed: 11/19/2022] Open
Abstract
The identification of structural variants using short-read data remains challenging. Most approaches that use discordant paired-end sequences ignore non-trivial signatures presented by variants containing 3 breakpoints, such as those generated by various copy-paste and cut-paste mechanisms. This can result in lower precision and sensitivity in the identification of the more common structural variants such as deletions and duplications. We present SVXplorer, which uses a graph-based clustering approach streamlined by the integration of non-trivial signatures from discordant paired-end alignments, split-reads and read depth information to improve upon existing methods. We show that SVXplorer is more sensitive and precise compared to several existing approaches on multiple real and simulated datasets. SVXplorer is available for download at https://github.com/kunalkathuria/SVXplorer.
|
38
|
Balachandran P, Beck CR. Structural variant identification and characterization. Chromosome Res 2020; 28:31-47. [PMID: 31907725 PMCID: PMC7131885 DOI: 10.1007/s10577-019-09623-z] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.5] [Received: 10/15/2019] [Revised: 10/15/2019] [Accepted: 11/24/2019] [Indexed: 01/06/2023]
Abstract
Structural variant (SV) differences between human genomes can cause germline and mosaic disease as well as inter-individual variation. De-regulation of accurate DNA repair and genomic surveillance mechanisms results in a large number of SVs in cancer. Analysis of the DNA sequences at SV breakpoints can help identify pathways of mutagenesis and regions of the genome that are more susceptible to rearrangement. Large-scale SV analyses have been enabled by high-throughput genome-level sequencing on humans in the past decade. These studies have shed light on the mechanisms and prevalence of complex genomic rearrangements. Recent advancements in both sequencing and other mapping technologies as well as calling algorithms for detection of genomic rearrangements have helped propel SV detection into population-scale studies, and have begun to elucidate previously inaccessible regions of the genome. Here, we discuss the genomic organization of simple and complex SVs, the molecular mechanisms of their formation, and various ways to detect them. We also introduce methods for characterizing SVs and their consequences on human genomes.
Affiliation(s)
- Christine R Beck
- The Jackson Laboratory for Genomic Medicine, Farmington, CT, 06032, USA.
- Department of Genetics and Genome Sciences, Institute for Systems Genomics, University of Connecticut Health Center, Farmington, CT, 06030, USA.
|
39
|
Abstract
Identifying structural variation (SV) is essential for genome interpretation but has been historically difficult due to limitations inherent to available genome technologies. Detection methods that use ensemble algorithms and emerging sequencing technologies have enabled the discovery of thousands of SVs, uncovering information about their ubiquity, relationship to disease and possible effects on biological mechanisms. Given the variability in SV type and size, along with unique detection biases of emerging genomic platforms, multiplatform discovery is necessary to resolve the full spectrum of variation. Here, we review modern approaches for investigating SVs and proffer that, moving forwards, studies integrating biological information with detection will be necessary to comprehensively understand the impact of SV in the human genome.
Affiliation(s)
- Steve S Ho
- Department of Human Genetics, University of Michigan, Ann Arbor, MI, USA
- Alexander E Urban
- Department of Psychiatry and Behavioral Sciences, Stanford University School of Medicine, Stanford, CA, USA
- Department of Genetics, Stanford University School of Medicine, Stanford, CA, USA
- Ryan E Mills
- Department of Human Genetics, University of Michigan, Ann Arbor, MI, USA.
- Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, MI, USA.
|
40
|
Structural variation and its potential impact on genome instability: Novel discoveries in the EGFR landscape by long-read sequencing. PLoS One 2020; 15:e0226340. [PMID: 31940362 PMCID: PMC6961855 DOI: 10.1371/journal.pone.0226340] [Citation(s) in RCA: 18] [Impact Index Per Article: 4.5] [Received: 05/16/2019] [Accepted: 11/25/2019] [Indexed: 12/29/2022] Open
Abstract
Structural variation (SV) is typically defined as variation within the human genome that exceeds 50 base pairs (bp). SV may be copy number neutral or it may involve duplications, deletions, and complex rearrangements. Recent studies have shown SV to be associated with many human diseases. However, studies of SV have been challenging due to technological constraints. With the advent of third generation (long-read) sequencing technology, exploration of longer stretches of DNA not easily examined previously has been made possible. In the present study, we utilized third generation (long-read) sequencing techniques to examine SV in the EGFR landscape of four haplotypes derived from two human samples. We analyzed the EGFR gene and its landscape (+/- 500,000 base pairs) using this approach and were able to identify a region of non-coding DNA with over 90% similarity to the most common activating EGFR mutation in non-small cell lung cancer. Based on previously published Alu-element genome instability algorithms, we propose a molecular mechanism to explain how this non-coding region of DNA may be interacting with and impacting the stability of the EGFR gene and potentially generating this cancer-driver gene. By these techniques, we were also able to identify previously hidden structural variation in the four haplotypes and in the human reference genome (hg38). We applied previously published algorithms to compare the relative stabilities of these five different EGFR gene landscape haplotypes to estimate their relative potentials to generate the EGFR exon 19, 15 bp canonical deletion. To our knowledge, the present study is the first to use the differences in genomic architecture between targeted cancer-linked phased haplotypes to estimate their relative potentials to form a common cancer-linked driver mutation.
|
41
|
Kuzniar A, Maassen J, Verhoeven S, Santuari L, Shneider C, Kloosterman WP, de Ridder J. sv-callers: a highly portable parallel workflow for structural variant detection in whole-genome sequence data. PeerJ 2020; 8:e8214. [PMID: 31934500 PMCID: PMC6951283 DOI: 10.7717/peerj.8214] [Citation(s) in RCA: 12] [Impact Index Per Article: 3.0] [Received: 05/15/2019] [Accepted: 11/14/2019] [Indexed: 12/19/2022] Open
Abstract
Structural variants (SVs) are an important class of genetic variation implicated in a wide array of genetic diseases including cancer. Despite the advances in whole genome sequencing, comprehensive and accurate detection of SVs in short-read data still poses some practical and computational challenges. We present sv-callers, a highly portable workflow that enables parallel execution of multiple SV detection tools, as well as provide users with example analyses of detected SV callsets in a Jupyter Notebook. This workflow supports easy deployment of software dependencies, configuration and addition of new analysis tools. Moreover, porting it to different computing systems requires minimal effort. Finally, we demonstrate the utility of the workflow by performing both somatic and germline SV analyses on different high-performance computing systems.
Affiliation(s)
- Luca Santuari
- Center for Molecular Medicine, University Medical Center Utrecht, Utrecht, Netherlands
- Carl Shneider
- Center for Molecular Medicine, University Medical Center Utrecht, Utrecht, Netherlands
- Wigard P Kloosterman
- Center for Molecular Medicine, University Medical Center Utrecht, Utrecht, Netherlands
- Jeroen de Ridder
- Center for Molecular Medicine, University Medical Center Utrecht, Utrecht, Netherlands
|
42
|
Yokoyama TT, Kasahara M. Visualization tools for human structural variations identified by whole-genome sequencing. J Hum Genet 2020; 65:49-60. [PMID: 31666648 PMCID: PMC8075883 DOI: 10.1038/s10038-019-0687-0] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.0] [Received: 08/05/2019] [Revised: 09/27/2019] [Accepted: 10/02/2019] [Indexed: 01/02/2023]
Abstract
Visualizing structural variations (SVs) is a critical step for finding associations between SVs and human traits or diseases. Given that there are many sequencing platforms used for SV identification and given that how best to visualize SVs together with other data, such as read alignments and annotations, depends on research goals, there are dozens of SV visualization tools designed for different research goals and sequencing platforms. Here, we provide a comprehensive survey of over 30 SV visualization tools to help users choose which tools to use. This review targets users who wish to visualize a set of SVs identified from the massively parallel sequencing reads of an individual human genome. We first categorize the ways in which SV visualization tools display SVs into ten major categories, which we denote as view modules. View modules allow readers to understand the features of each SV visualization tool quickly. Next, we introduce the features of individual SV visualization tools from several aspects, including whether SV views are integrated with annotations, whether long-read alignment is displayed, whether underlying data structures are graph-based, the type of SVs shown, whether auditing is possible, whether bird's eye view is available, sequencing platforms, and the number of samples. We hope that this review will serve as a guide for readers on the currently available SV visualization tools and lead to the development of new SV visualization tools in the near future.
Affiliation(s)
- Toshiyuki T Yokoyama
- Department of Computational Biology and Medical Sciences, Graduate School of Frontier Sciences, The University of Tokyo, Chiba, Japan
- Masahiro Kasahara
- Department of Computational Biology and Medical Sciences, Graduate School of Frontier Sciences, The University of Tokyo, Chiba, Japan.
|
43
|
Zhou A, Lin T, Xing J. Evaluating nanopore sequencing data processing pipelines for structural variation identification. Genome Biol 2019; 20:237. [PMID: 31727126 PMCID: PMC6857234 DOI: 10.1186/s13059-019-1858-1] [Citation(s) in RCA: 24] [Impact Index Per Article: 4.8] [Received: 05/18/2019] [Accepted: 10/10/2019] [Indexed: 11/10/2022] Open
Abstract
BACKGROUND: Structural variations (SVs) account for about 1% of the differences among human genomes and play a significant role in phenotypic variation and disease susceptibility. The emerging nanopore sequencing technology can generate long sequence reads and can potentially provide accurate SV identification. However, the tools for aligning long-read data and detecting SVs have not been thoroughly evaluated. RESULTS: Using four nanopore datasets, including both empirical and simulated reads, we evaluate four alignment tools and three SV detection tools. We also evaluate the impact of sequencing depth on SV detection. Finally, we develop a machine learning approach to integrate call sets from multiple pipelines. Overall SV callers' performance varies depending on the SV types. For an initial data assessment, we recommend using aligner minimap2 in combination with SV caller Sniffles because of their speed and relatively balanced performance. For detailed analysis, we recommend incorporating information from multiple call sets to improve the SV call performance. CONCLUSIONS: We present a workflow for evaluating aligners and SV callers for nanopore sequencing data and approaches for integrating multiple call sets. Our results indicate that additional optimizations are needed to improve SV detection accuracy and sensitivity, and an integrated call set can provide enhanced performance. The nanopore technology is improving, and the sequencing community is likely to grow accordingly. In turn, better benchmark call sets will be available to more accurately assess the performance of available tools and facilitate further tool development.
Affiliation(s)
- Anbo Zhou
- Department of Genetics, Rutgers, the State University of New Jersey, Piscataway, NJ, 08854, USA
- Timothy Lin
- Department of Genetics, Rutgers, the State University of New Jersey, Piscataway, NJ, 08854, USA
| | - Jinchuan Xing
- Department of Genetics, Rutgers, the State University of New Jersey, Piscataway, NJ, 08854, USA.
- Human Genetics Institute of New Jersey, Rutgers, the State University of New Jersey, Piscataway, NJ, 08854, USA.
| |
Collapse
|
44
|
Li R, Tian X, Yang P, Fan Y, Li M, Zheng H, Wang X, Jiang Y. Recovery of non-reference sequences missing from the human reference genome. BMC Genomics 2019; 20:746. [PMID: 31619167 PMCID: PMC6796347 DOI: 10.1186/s12864-019-6107-1] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.2] [Received: 07/02/2019] [Accepted: 09/20/2019] [Indexed: 01/12/2023] Open
Abstract
Background The non-reference sequences (NRS) represent structure variations in human genome with potential functional significance. However, besides the known insertions, it is currently unknown whether other types of structure variations with NRS exist. Results Here, we compared 31 human de novo assemblies with the current reference genome to identify the NRS and their location. We resolved the precise location of 6113 NRS adding up to 12.8 Mb. Besides 1571 insertions, we detected 3041 alternate alleles, which were defined as having less than 90% (or none) identity with the reference alleles. These alternate alleles overlapped with 1143 protein-coding genes including a putative novel MHC haplotype. Further, we demonstrated that the alternate alleles and their flanking regions had high content of tandem repeats, indicating that their origin was associated with tandem repeats. Conclusions Our study detected a large number of NRS including many alternate alleles which are previously uncharacterized. We suggested that the origin of alternate alleles was associated with tandem repeats. Our results enriched the spectrum of genetic variations in human genome.
Affiliation(s)
- Ran Li
  - Key Laboratory of Animal Genetics, Breeding and Reproduction of Shaanxi Province, College of Animal Science and Technology, Northwest A&F University, Yangling, 712100, China
- Xiaomeng Tian
  - Key Laboratory of Animal Genetics, Breeding and Reproduction of Shaanxi Province, College of Animal Science and Technology, Northwest A&F University, Yangling, 712100, China
- Peng Yang
  - Key Laboratory of Animal Genetics, Breeding and Reproduction of Shaanxi Province, College of Animal Science and Technology, Northwest A&F University, Yangling, 712100, China
- Yingzhi Fan
  - Key Laboratory of Animal Genetics, Breeding and Reproduction of Shaanxi Province, College of Animal Science and Technology, Northwest A&F University, Yangling, 712100, China
- Ming Li
  - Key Laboratory of Animal Genetics, Breeding and Reproduction of Shaanxi Province, College of Animal Science and Technology, Northwest A&F University, Yangling, 712100, China
- Hongxiang Zheng
  - Human Phenome Institute, Fudan University, Shanghai, 200438, China
- Xihong Wang
  - Key Laboratory of Animal Genetics, Breeding and Reproduction of Shaanxi Province, College of Animal Science and Technology, Northwest A&F University, Yangling, 712100, China
- Yu Jiang
  - Key Laboratory of Animal Genetics, Breeding and Reproduction of Shaanxi Province, College of Animal Science and Technology, Northwest A&F University, Yangling, 712100, China
|
45
|
Yang N, Wu S, Yan J. Structural variation in complex genome: detection, integration and function. Sci China Life Sci 2019; 62:1098-1100. [PMID: 31376014 DOI: 10.1007/s11427-019-9664-4] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.4] [Received: 06/28/2019] [Accepted: 07/16/2019] [Indexed: 11/30/2022]
Affiliation(s)
- Ning Yang
  - National Key Laboratory of Crop Genetic Improvement, Huazhong Agricultural University, Wuhan, 430070, China
- Shenshen Wu
  - National Key Laboratory of Crop Genetic Improvement, Huazhong Agricultural University, Wuhan, 430070, China
- Jianbing Yan
  - National Key Laboratory of Crop Genetic Improvement, Huazhong Agricultural University, Wuhan, 430070, China
|
46
|
Comprehensive evaluation and characterisation of short read general-purpose structural variant calling software. Nat Commun 2019; 10:3240. [PMID: 31324872 PMCID: PMC6642177 DOI: 10.1038/s41467-019-11146-4] [Citation(s) in RCA: 137] [Impact Index Per Article: 27.4] [Received: 09/14/2018] [Accepted: 06/26/2019] [Indexed: 01/12/2023] Open
Abstract
In recent years, many software packages for identifying structural variants (SVs) using whole-genome sequencing data have been released. When published, a new method is commonly compared with those already available, but this tends to be selective and incomplete. The lack of comprehensive benchmarking of methods presents challenges for users in selecting methods and for developers in understanding algorithm behaviours and limitations. Here we report the comprehensive evaluation of 10 SV callers, selected following a rigorous process and spanning the breadth of detection approaches, using high-quality reference cell lines, as well as simulations. Due to the nature of available truth sets, our focus is on general-purpose rather than somatic callers. We characterise the impact on performance of event size and type, sequencing characteristics, and genomic context, and analyse the efficacy of ensemble calling and calibration of variant quality scores. Finally, we provide recommendations for both users and methods developers. A number of computational methods have been developed for calling structural variants (SVs) using short read sequencing data. Here, the authors perform a comprehensive benchmarking analysis comparing 10 general-purpose callers and provide recommendations for both users and methods developers.
|
47
|
Sedlazeck FJ, Lee H, Darby CA, Schatz MC. Piercing the dark matter: bioinformatics of long-range sequencing and mapping. Nat Rev Genet 2018; 19:329-346. [PMID: 29599501 DOI: 10.1038/s41576-018-0003-4] [Citation(s) in RCA: 281] [Impact Index Per Article: 56.2] [Indexed: 01/11/2023]
Abstract
Several new genomics technologies have become available that offer long-read sequencing or long-range mapping with higher throughput and higher resolution analysis than ever before. These long-range technologies are rapidly advancing the field with improved reference genomes, more comprehensive variant identification and more complete views of transcriptomes and epigenomes. However, they also require new bioinformatics approaches to take full advantage of their unique characteristics while overcoming their complex errors and modalities. Here, we discuss several of the most important applications of the new technologies, focusing on both the currently available bioinformatics tools and opportunities for future research.
Affiliation(s)
- Fritz J Sedlazeck
  - Human Genome Sequencing Center, Baylor College of Medicine, Houston, TX, USA
- Hayan Lee
  - Department of Genetics, Stanford University, Stanford, CA, USA
- Charlotte A Darby
  - Department of Computer Science, Johns Hopkins University, Baltimore, MD, USA
- Michael C Schatz
  - Department of Computer Science, Johns Hopkins University, Baltimore, MD, USA
  - Simons Center for Quantitative Biology, Cold Spring Harbor Laboratory, Cold Spring Harbor, NY, USA
|
48
|
TSD: A Computational Tool To Study the Complex Structural Variants Using PacBio Targeted Sequencing Data. G3 (Bethesda) 2019; 9:1371-1376. [PMID: 30850377 PMCID: PMC6505135 DOI: 10.1534/g3.118.200900] [Citation(s) in RCA: 11] [Impact Index Per Article: 2.2] [Indexed: 11/18/2022]
Abstract
PacBio sequencing is a powerful approach to study DNA or RNA sequences in a longer scope. It is especially useful in exploring the complex structural variants generated by random integration or multiple rearrangement of endogenous or exogenous sequences. Here, we present a tool, TSD, for complex structural variant discovery using PacBio targeted sequencing data. It allows researchers to identify and visualize the genomic structures of targeted sequences by unlimited splitting, alignment and assembly of long PacBio reads. Application to the sequencing data derived from an HBV integrated human cell line (PLC/PRF/5) indicated that TSD could recover the full profile of HBV integration events, especially for the regions with the complex human-HBV genome integrations and multiple HBV rearrangements. Compared to other long read analysis tools, TSD showed a better performance for detecting complex genomic structural variants. TSD is publicly available at: https://github.com/menggf/tsd.
|
49
|
Zook JM, McDaniel J, Olson ND, Wagner J, Parikh H, Heaton H, Irvine SA, Trigg L, Truty R, McLean CY, De La Vega FM, Xiao C, Sherry S, Salit M. An open resource for accurately benchmarking small variant and reference calls. Nat Biotechnol 2019; 37:561-566. [PMID: 30936564 PMCID: PMC6500473 DOI: 10.1038/s41587-019-0074-6] [Citation(s) in RCA: 183] [Impact Index Per Article: 36.6] [Received: 05/25/2018] [Accepted: 02/19/2019] [Indexed: 12/30/2022]
Abstract
Benchmark small variant calls are required for developing, optimizing and assessing the performance of sequencing and bioinformatics methods. Here, as part of the Genome in a Bottle (GIAB) Consortium, we apply a reproducible, cloud-based pipeline to integrate multiple short- and linked-read sequencing datasets and provide benchmark calls for human genomes. We generate benchmark calls for one previously analyzed GIAB sample, as well as six genomes from the Personal Genome Project. These new genomes have broad, open consent, making this a 'first of its kind' resource that is available to the community for multiple downstream applications. We produce 17% more benchmark single nucleotide variations, 176% more indels and 12% larger benchmark regions than previously published GIAB benchmarks. We demonstrate that this benchmark reliably identifies errors in existing callsets and highlight challenges in interpreting performance metrics when using benchmarks that are not perfect or comprehensive. Finally, we identify strengths and weaknesses of callsets by stratifying performance according to variant type and genome context.
Affiliation(s)
- Justin M Zook
  - Material Measurement Laboratory, National Institute of Standards and Technology, Gaithersburg, MD, USA
- Jennifer McDaniel
  - Material Measurement Laboratory, National Institute of Standards and Technology, Gaithersburg, MD, USA
- Nathan D Olson
  - Material Measurement Laboratory, National Institute of Standards and Technology, Gaithersburg, MD, USA
- Justin Wagner
  - Material Measurement Laboratory, National Institute of Standards and Technology, Gaithersburg, MD, USA
- Hemang Parikh
  - Material Measurement Laboratory, National Institute of Standards and Technology, Gaithersburg, MD, USA
- Haynes Heaton
  - 10x Genomics, Pleasanton, CA, USA
  - Wellcome Trust Sanger Institute, Hinxton, Cambridge, UK
- Len Trigg
  - Real Time Genomics, Hamilton, New Zealand
- Cory Y McLean
  - Verily Life Sciences, South San Francisco, CA, USA
  - Google Inc., Mountain View, CA, USA
- Francisco M De La Vega
  - Department of Biomedical Data Science, Stanford University School of Medicine, Stanford, CA, USA
- Chunlin Xiao
  - National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD, USA
- Stephen Sherry
  - National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD, USA
- Marc Salit
  - Material Measurement Laboratory, National Institute of Standards and Technology, Gaithersburg, MD, USA
  - Joint Initiative for Metrology in Biology, Stanford, CA, USA
  - Department of Bioengineering, Stanford University, Stanford, CA, USA
|
50
|
Parivesh A, Barseghyan H, Délot E, Vilain E. Translating genomics to the clinical diagnosis of disorders/differences of sex development. Curr Top Dev Biol 2019; 134:317-375. [PMID: 30999980 PMCID: PMC7382024 DOI: 10.1016/bs.ctdb.2019.01.005] [Citation(s) in RCA: 18] [Impact Index Per Article: 3.6] [Indexed: 12/15/2022]
Abstract
The medical and psychosocial challenges faced by patients living with Disorders/Differences of Sex Development (DSD) and their families can be alleviated by a rapid and accurate diagnostic process. Clinical diagnosis of DSD is limited by a lack of standardization of anatomical and endocrine phenotyping and genetic testing, as well as poor genotype/phenotype correlation. Historically, DSD genes have been identified through positional cloning of disease-associated variants segregating in families and validation of candidates in animal and in vitro modeling of variant pathogenicity. Owing to the complexity of conditions grouped under DSD, genome-wide scanning methods are better suited for identifying disease causing gene variant(s) and providing a clinical diagnosis. Here, we review a number of established genomic tools (karyotyping, chromosomal microarrays and exome sequencing) used in clinic for DSD diagnosis, as well as emerging genomic technologies such as whole-genome (short-read) sequencing, long-read sequencing, and optical mapping used for novel DSD gene discovery. These, together with gene expression and epigenetic studies can potentiate the clinical diagnosis of DSD diagnostic rates and enhance the outcomes for patients and families.
Affiliation(s)
- Abhinav Parivesh
  - Center for Genetic Medicine Research, Children's National Medical Center, Washington, DC, United States
- Hayk Barseghyan
  - Center for Genetic Medicine Research, Children's National Medical Center, Washington, DC, United States
  - Department of Genomics and Precision Medicine, The George Washington University, Washington, DC, United States
- Emmanuèle Délot
  - Center for Genetic Medicine Research, Children's National Medical Center, Washington, DC, United States
  - Department of Genomics and Precision Medicine, The George Washington University, Washington, DC, United States
- Eric Vilain
  - Center for Genetic Medicine Research, Children's National Medical Center, Washington, DC, United States
  - Department of Genomics and Precision Medicine, The George Washington University, Washington, DC, United States
|