101
Nicholas TJ, Al‐Sweel N, Farrell A, Mao R, Bayrak‐Toydemir P, Miller CE, Bentley D, Palmquist R, Moore B, Hernandez EJ, Cormier MJ, Fredrickson E, Noble K, Rynearson S, Holt C, Karren M, Bonkowsky JL, Tristani‐Firouzi M, Yandell M, Marth G, Quinlan AR, Brunelli L, Toydemir R, Shayota BJ, Carey JC, Boyden SE, Malone Jenkins S. Comprehensive variant calling from whole-genome sequencing identifies a complex inversion that disrupts ZFPM2 in familial congenital diaphragmatic hernia. Mol Genet Genomic Med 2022; 10:e1888. [PMID: 35119225 PMCID: PMC9000945 DOI: 10.1002/mgg3.1888] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 11/05/2021] [Revised: 01/14/2022] [Accepted: 01/18/2022] [Indexed: 01/03/2023]
Abstract
BACKGROUND: Genetic disorders contribute to significant morbidity and mortality in critically ill newborns. Despite advances in genome sequencing technologies, a majority of neonatal cases remain unsolved. Complex structural variants (SVs) often elude conventional genome sequencing variant calling pipelines and will explain a portion of these unsolved cases.
METHODS: As part of the Utah NeoSeq project, we used a research-based, rapid whole-genome sequencing (WGS) protocol to investigate the genomic etiology for a newborn with a left-sided congenital diaphragmatic hernia (CDH) and cardiac malformations, whose mother also had a history of CDH and atrial septal defect.
RESULTS: Using both a novel alignment-free variant caller and traditional alignment-based variant callers, we identified a maternally inherited complex SV on chromosome 8, consisting of an inversion flanked by deletions. This complex inversion, further confirmed using orthogonal molecular techniques, disrupts the ZFPM2 gene, which is associated with both CDH and various congenital heart defects.
CONCLUSIONS: Our results demonstrate that complex structural events, which are often unidentifiable or not reported by clinically validated testing procedures, can be discovered and accurately characterized with conventional short-read sequencing, and they underscore the utility of WGS as a first-line diagnostic tool.
Affiliation(s)
- Thomas J. Nicholas
  - Department of Human Genetics, Utah Center for Genetic Discovery, University of Utah, Salt Lake City, USA
- Najla Al‐Sweel
  - ARUP Laboratories, Salt Lake City, USA
  - Department of Pathology, University of Utah, Salt Lake City, USA
- Andrew Farrell
  - Department of Human Genetics, Utah Center for Genetic Discovery, University of Utah, Salt Lake City, USA
- Rong Mao
  - ARUP Laboratories, Salt Lake City, USA
  - Department of Pathology, University of Utah, Salt Lake City, USA
- Pinar Bayrak‐Toydemir
  - ARUP Laboratories, Salt Lake City, USA
  - Department of Pathology, University of Utah, Salt Lake City, USA
- Dawn Bentley
  - Division of Neonatology, Department of Pediatrics, University of Utah School of Medicine, Salt Lake City, USA
- Rachel Palmquist
  - Division of Pediatric Neurology, Department of Pediatrics, University of Utah School of Medicine, Salt Lake City, USA
  - Primary Children's Center for Personalized Medicine, Salt Lake City, USA
- Barry Moore
  - Department of Human Genetics, Utah Center for Genetic Discovery, University of Utah, Salt Lake City, USA
- Edgar J. Hernandez
  - Department of Human Genetics, Utah Center for Genetic Discovery, University of Utah, Salt Lake City, USA
- Michael J. Cormier
  - Department of Human Genetics, Utah Center for Genetic Discovery, University of Utah, Salt Lake City, USA
- Shawn Rynearson
  - Department of Human Genetics, Utah Center for Genetic Discovery, University of Utah, Salt Lake City, USA
- Carson Holt
  - Department of Human Genetics, Utah Center for Genetic Discovery, University of Utah, Salt Lake City, USA
- Mary Anne Karren
  - Department of Human Genetics, Utah Center for Genetic Discovery, University of Utah, Salt Lake City, USA
- Joshua L. Bonkowsky
  - Division of Pediatric Neurology, Department of Pediatrics, University of Utah School of Medicine, Salt Lake City, USA
  - Primary Children's Center for Personalized Medicine, Salt Lake City, USA
- Martin Tristani‐Firouzi
  - Division of Pediatric Cardiology, Department of Pediatrics, University of Utah School of Medicine, Salt Lake City, USA
- Mark Yandell
  - Department of Human Genetics, Utah Center for Genetic Discovery, University of Utah, Salt Lake City, USA
- Gabor Marth
  - Department of Human Genetics, Utah Center for Genetic Discovery, University of Utah, Salt Lake City, USA
- Aaron R. Quinlan
  - Department of Human Genetics, Utah Center for Genetic Discovery, University of Utah, Salt Lake City, USA
  - Department of Biomedical Informatics, University of Utah, Salt Lake City, USA
- Luca Brunelli
  - Division of Neonatology, Department of Pediatrics, University of Utah School of Medicine, Salt Lake City, USA
- Reha M. Toydemir
  - ARUP Laboratories, Salt Lake City, USA
  - Department of Pathology, University of Utah, Salt Lake City, USA
- Brian J. Shayota
  - Division of Medical Genetics, Department of Pediatrics, University of Utah School of Medicine, Salt Lake City, USA
- John C. Carey
  - Division of Medical Genetics, Department of Pediatrics, University of Utah School of Medicine, Salt Lake City, USA
- Steven E. Boyden
  - Department of Human Genetics, Utah Center for Genetic Discovery, University of Utah, Salt Lake City, USA
- Sabrina Malone Jenkins
  - Division of Neonatology, Department of Pediatrics, University of Utah School of Medicine, Salt Lake City, USA
102
Jobson E, Roberts R. Genomic structural variation in tomato and its role in plant immunity. Molecular Horticulture 2022; 2:7. [PMID: 37789472 PMCID: PMC10515242 DOI: 10.1186/s43897-022-00029-w] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 11/17/2021] [Accepted: 02/22/2022] [Indexed: 10/05/2023]
Abstract
It is well known that large genomic variations can greatly impact the phenotype of an organism. Structural Variants (SVs) encompass any genomic variation larger than 30 base pairs, and include changes caused by deletions, inversions, duplications, transversions, and other genome modifications. Due to their size and complex nature, until recently, it has been difficult to truly capture these variations. Recent advances in sequencing technology and computational analyses now permit more extensive studies of SVs in plant genomes. In tomato, advances in sequencing technology have allowed researchers to sequence hundreds of genomes from tomatoes, and tomato relatives. These studies have identified SVs related to fruit size and flavor, as well as plant disease response, resistance/susceptibility, and the ability of plants to detect pathogens (immunity). In this review, we discuss the implications for genomic structural variation in plants with a focus on its role in tomato immunity. We also discuss how advances in sequencing technology have led to new discoveries of SVs in more complex genomes, the current evidence for the role of SVs in biotic and abiotic stress responses, and the outlook for genetic modification of SVs to advance plant breeding objectives.
Affiliation(s)
- Emma Jobson
  - Montana State University Extension, Montana State University, Bozeman, MT, 59717, United States
- Robyn Roberts
  - Agricultural Biology Department, College of Agricultural Sciences, Colorado State University, Fort Collins, CO, USA
103
Assessment of linkage disequilibrium patterns between structural variants and single nucleotide polymorphisms in three commercial chicken populations. BMC Genomics 2022; 23:193. [PMID: 35264116 PMCID: PMC8908679 DOI: 10.1186/s12864-022-08418-7] [Citation(s) in RCA: 5] [Impact Index Per Article: 2.5] [Received: 08/31/2021] [Accepted: 02/24/2022] [Indexed: 12/29/2022]
Abstract
BACKGROUND: Structural variants (SVs) are causative for some prominent phenotypic traits of livestock, such as different comb types in chickens or color patterns in pigs. Their effects on production traits are also increasingly studied. Nevertheless, accurately calling SVs remains challenging. It is therefore of interest whether nearby single nucleotide polymorphisms (SNPs) are in strong linkage disequilibrium (LD) with SVs and can serve as markers. The literature comes to different conclusions on whether SVs are in LD with SNPs at the same level as SNPs are with other SNPs. The present study aimed to generate a precise SV callset from whole-genome short-read sequencing (WGS) data for three commercial chicken populations and to evaluate LD patterns between the called SVs and surrounding SNPs. It is thereby the first study to assess LD between SVs and SNPs in chickens.
RESULTS: The final callset consisted of 12,294,329 bivariate SNPs, 4,301 deletions (DEL), 224 duplications (DUP), 218 inversions (INV) and 117 translocation breakpoints (BND). While average LD between DELs and SNPs was at the same level as between SNPs and SNPs, LD between other SVs and SNPs was strongly reduced (DUP: 40%, INV: 27%, BND: 19% of between-SNP LD). A main factor for the reduced LD was the presence of local minor allele frequency differences, which accounted for 50% of the difference between SNP-SNP and DUP-SNP LD. This was potentially accompanied by lower genotyping accuracies for DUP, INV and BND compared with SNPs and DELs. An evaluation of the presence of tag SNPs (the SNP in highest LD with the variant of interest) further revealed DELs to be slightly less well tagged by WGS SNPs than WGS SNPs are by other SNPs. This difference, however, was no longer present when reducing the pool of potential tag SNPs to SNPs located on four different chicken genotyping arrays.
CONCLUSIONS: The results implied that genomic variance due to DELs in the chicken populations studied can be captured by different SNP marker sets as well as variance from WGS SNPs, whereas separate SV calling might be advisable for DUP, INV, and BND effects.
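Illustrative note (not taken from the study): LD between an SV and a nearby SNP is commonly summarized as r², the squared correlation between the two genotype vectors. The sketch below assumes unphased genotypes coded as 0/1/2 copies of the alternate allele; the function name `genotype_r2` and the toy genotype vectors are hypothetical, and the study's own LD pipeline may differ.

```python
def genotype_r2(x, y):
    """Squared Pearson correlation between two genotype dosage vectors.

    x, y: equal-length sequences of 0/1/2 allele counts for the same samples
    (assumed coding; hypothetical helper, not the study's method).
    Returns NaN if either variant is monomorphic in the sample.
    """
    if len(x) != len(y) or len(x) == 0:
        raise ValueError("genotype vectors must be non-empty and equal length")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return float("nan")  # r^2 is undefined without variation
    return (cov * cov) / (var_x * var_y)


# Toy example: a deletion perfectly tagged by a SNP across four samples.
deletion = [0, 1, 2, 1]
tag_snp = [0, 1, 2, 1]
print(genotype_r2(deletion, tag_snp))  # 1.0
```

Under this definition, a tag SNP for a given SV would be the SNP with the highest r² among the candidates considered.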
104
Liu Z, Roberts R, Mercer TR, Xu J, Sedlazeck FJ, Tong W. Towards accurate and reliable resolution of structural variants for clinical diagnosis. Genome Biol 2022; 23:68. [PMID: 35241127 PMCID: PMC8892125 DOI: 10.1186/s13059-022-02636-8] [Citation(s) in RCA: 28] [Impact Index Per Article: 14.0] [Received: 08/27/2021] [Accepted: 02/15/2022] [Indexed: 12/17/2022]
Abstract
Structural variants (SVs) are a major source of human genetic diversity and have been associated with different diseases and phenotypes. The detection of SVs is difficult, and a diverse range of detection methods and data analysis protocols has been developed. This difficulty and diversity make the detection of SVs for clinical applications challenging and require a framework to ensure accuracy and reproducibility. Here, we discuss current developments in the diagnosis of SVs and propose a roadmap for the accurate and reproducible detection of SVs that includes case studies provided from the FDA-led SEquencing Quality Control Phase II (SEQC-II) and other consortium efforts.
Affiliation(s)
- Zhichao Liu
  - National Center for Toxicological Research, U.S. Food and Drug Administration, Jefferson, AR, 72079, USA
- Ruth Roberts
  - ApconiX, BioHub at Alderley Park, Alderley Edge, SK10 4TG, UK
  - University of Birmingham, Edgbaston, Birmingham, B15 2TT, UK
- Timothy R Mercer
  - Australian Institute for Bioengineering and Nanotechnology, University of Queensland, Brisbane, QLD, Australia
  - Garvan Institute of Medical Research, Sydney, NSW, Australia
  - St Vincent's Clinical School, University of New South Wales, Sydney, NSW, Australia
- Joshua Xu
  - National Center for Toxicological Research, U.S. Food and Drug Administration, Jefferson, AR, 72079, USA
- Fritz J Sedlazeck
  - Human Genome Sequencing Center, Baylor College of Medicine, One Baylor Plaza, Houston, TX, 77030, USA
- Weida Tong
  - National Center for Toxicological Research, U.S. Food and Drug Administration, Jefferson, AR, 72079, USA
105
Long-read sequencing on the SMRT platform enables efficient haplotype linkage analysis in preimplantation genetic testing for β-thalassemia. J Assist Reprod Genet 2022; 39:739-746. [PMID: 35141813 PMCID: PMC8995213 DOI: 10.1007/s10815-022-02415-1] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 11/18/2021] [Accepted: 01/26/2022] [Indexed: 10/19/2022]
Abstract
PURPOSE: This study aimed to evaluate the value of long-read sequencing for preimplantation haplotype linkage analysis.
METHODS: The genetic material of three β-thalassemia mutation carrier couples was sequenced using single-molecule real-time sequencing in a 7.7-kb region of the HBB gene and a 7.4-kb region that partially overlapped with it, to detect the presence of 17 HBB gene mutations common in the Chinese population and the haplotypes formed by the continuous array of single-nucleotide polymorphisms linked to these mutations. By using the same method to analyze multiple displacement amplification products of embryos from the three families and comparing the results with those of the parents, it could be determined whether the embryos carried disease-causing mutations without the need for a proband.
RESULTS: The HBB gene mutations of the three couples were accurately detected, and the haplotype linked to the pathogenic site was successfully obtained without the need for a proband. A total of 68.75% (22/32) of embryos from the three families successfully underwent haplotype linkage analysis, and the results were consistent with the results of NGS-based mutation site detection.
CONCLUSION: This study supports long-read sequencing as a potential tool for preimplantation haplotype linkage analysis.
106
Marwaha S, Knowles JW, Ashley EA. A guide for the diagnosis of rare and undiagnosed disease: beyond the exome. Genome Med 2022; 14:23. [PMID: 35220969 PMCID: PMC8883622 DOI: 10.1186/s13073-022-01026-w] [Citation(s) in RCA: 105] [Impact Index Per Article: 52.5] [Received: 06/09/2021] [Accepted: 02/10/2022] [Indexed: 02/07/2023]
Abstract
Rare diseases affect 30 million people in the USA and more than 300-400 million worldwide, often causing chronic illness, disability, and premature death. Traditional diagnostic techniques rely heavily on heuristic approaches, coupling clinical experience from prior rare disease presentations with the medical literature. A large number of rare disease patients remain undiagnosed for years and many even die without an accurate diagnosis. In recent years, gene panels, microarrays, and exome sequencing have helped to identify the molecular cause of such rare and undiagnosed diseases. These technologies have allowed diagnoses for a sizable proportion (25-35%) of undiagnosed patients, often with actionable findings. However, a large proportion of these patients remain undiagnosed. In this review, we focus on technologies that can be adopted if exome sequencing is unrevealing. We discuss the benefits of sequencing the whole genome and the additional benefit that may be offered by long-read technology, pan-genome reference, transcriptomics, metabolomics, proteomics, and methyl profiling. We highlight computational methods to help identify regionally distant patients with similar phenotypes or similar genetic mutations. Finally, we describe approaches to automate and accelerate genomic analysis. The strategies discussed here are intended to serve as a guide for clinicians and researchers in the next steps when encountering patients with non-diagnostic exomes.
Affiliation(s)
- Shruti Marwaha
  - Department of Medicine, Division of Cardiovascular Medicine, School of Medicine, Stanford University, Stanford, CA, USA
  - Stanford Center for Undiagnosed Diseases, Stanford University, Stanford, CA, USA
- Joshua W Knowles
  - Department of Medicine, Division of Cardiovascular Medicine, School of Medicine, Stanford University, Stanford, CA, USA
  - Department of Medicine, Diabetes Research Center, Cardiovascular Institute and Prevention Research Center, Stanford, CA, USA
- Euan A Ashley
  - Department of Medicine, Division of Cardiovascular Medicine, School of Medicine, Stanford University, Stanford, CA, USA
  - Stanford Center for Undiagnosed Diseases, Stanford University, Stanford, CA, USA
  - Department of Genetics, School of Medicine, Stanford University, Stanford, CA, USA
107
Lemay MA, Sibbesen JA, Torkamaneh D, Hamel J, Levesque RC, Belzile F. Combined use of Oxford Nanopore and Illumina sequencing yields insights into soybean structural variation biology. BMC Biol 2022; 20:53. [PMID: 35197050 PMCID: PMC8867729 DOI: 10.1186/s12915-022-01255-w] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Received: 10/25/2021] [Accepted: 02/16/2022] [Indexed: 12/31/2022]
Abstract
BACKGROUND: Structural variants (SVs), including deletions, insertions, duplications, and inversions, are relatively long genomic variations implicated in a diverse range of processes from human disease to ecology and evolution. Given their complex signatures, tendency to occur in repeated regions, and large size, discovering SVs based on short reads is challenging compared to single-nucleotide variants. The increasing availability of long-read technologies has greatly facilitated SV discovery; however, these technologies remain too costly to apply routinely to population-level studies. Here, we combined short-read and long-read sequencing technologies to provide a comprehensive population-scale assessment of structural variation in a panel of Canadian soybean cultivars.
RESULTS: We used Oxford Nanopore long-read sequencing data (~12× mean coverage) for 17 samples to both benchmark SV calls made from Illumina short-read data and predict SVs that were subsequently genotyped in a population of 102 samples using Illumina data. Benchmarking results show that variants discovered using Oxford Nanopore can be accurately genotyped from the Illumina data. We first use the genotyped deletions and insertions for population genetics analyses and show that results are comparable to those based on single-nucleotide variants. We observe that the population frequency and distribution within the genome of deletions and insertions are constrained by the location of genes. Gene Ontology and PFAM domain enrichment analyses also confirm previous reports that genes harboring high-frequency deletions and insertions are enriched for functions in defense response. Finally, we discover polymorphic transposable elements from the deletions and insertions and report evidence of the recent activity of a Stowaway MITE.
CONCLUSIONS: We show that structural variants discovered using Oxford Nanopore data can be genotyped with high accuracy from Illumina data. Our results demonstrate that long-read and short-read sequencing technologies can be efficiently combined to enhance SV analysis in large populations, providing a reusable framework for their study in a wider range of samples and non-model species.
Affiliation(s)
- Marc-André Lemay
  - Département de phytologie, Université Laval, Quebec, Canada
  - Institut de biologie intégrative et des systèmes, Université Laval, Quebec, Canada
- Davoud Torkamaneh
  - Département de phytologie, Université Laval, Quebec, Canada
  - Institut de biologie intégrative et des systèmes, Université Laval, Quebec, Canada
- Jérémie Hamel
  - Institut de biologie intégrative et des systèmes, Université Laval, Quebec, Canada
  - Département de microbiologie-infectiologie et d’immunologie, Université Laval, Quebec, Canada
- Roger C. Levesque
  - Institut de biologie intégrative et des systèmes, Université Laval, Quebec, Canada
  - Département de microbiologie-infectiologie et d’immunologie, Université Laval, Quebec, Canada
- François Belzile
  - Département de phytologie, Université Laval, Quebec, Canada
  - Institut de biologie intégrative et des systèmes, Université Laval, Quebec, Canada
108
Menon VK, Okhuysen PC, Chappell CL, Mahmoud M, Meng Q, Doddapaneni H, Vee V, Han Y, Salvi S, Bhamidipati S, Kottapalli K, Weissenberger G, Shen H, Ross MC, Hoffman KL, Cregeen SJ, Muzny DM, Metcalf GA, Gibbs RA, Petrosino JF, Sedlazeck FJ. Fully resolved assembly of Cryptosporidium parvum. Gigascience 2022; 11:giac010. [PMID: 35166336 PMCID: PMC8848321 DOI: 10.1093/gigascience/giac010] [Citation(s) in RCA: 7] [Impact Index Per Article: 3.5] [Received: 10/08/2021] [Revised: 12/07/2021] [Accepted: 01/20/2022] [Indexed: 11/17/2022]
Abstract
BACKGROUND: Cryptosporidium parvum is an apicomplexan parasite commonly found across many host species with a global infection prevalence in human populations of 7.6%. Understanding its diversity and genomic makeup can help in fighting established infections and prohibiting further transmission. The basis of every genomic study is a high-quality reference genome that has continuity and completeness, thus enabling comprehensive comparative studies.
FINDINGS: Here, we provide a highly accurate and complete reference genome of Cryptosporidium parvum. The assembly is based on Oxford Nanopore reads and was improved using Illumina reads for error correction. We also outline how to evaluate and choose from different assembly methods based on 2 main approaches that can be applied to other Cryptosporidium species. The assembly encompasses 8 chromosomes and includes 13 telomeres that were resolved. Overall, the assembly shows a high completion rate with 98.4% single-copy BUSCO genes.
CONCLUSIONS: This high-quality reference genome of a zoonotic IIaA17G2R1 C. parvum subtype isolate provides the basis for subsequent comparative genomic studies across the Cryptosporidium clade. This will enable improved understanding of diversity, functional, and association studies.
Affiliation(s)
- Vipin K Menon
  - Human Genome Sequencing Center, Baylor College of Medicine, Houston, TX 77030, USA
- Pablo C Okhuysen
  - Department of Infectious Diseases, The University of Texas MD Anderson Cancer Center, Houston, TX 77030, USA
- Cynthia L Chappell
  - Center for Infectious Diseases, The University of Texas School of Public Health, Houston, TX 77030, USA
- Medhat Mahmoud
  - Human Genome Sequencing Center, Baylor College of Medicine, Houston, TX 77030, USA
- Qingchang Meng
  - Human Genome Sequencing Center, Baylor College of Medicine, Houston, TX 77030, USA
- Harsha Doddapaneni
  - Human Genome Sequencing Center, Baylor College of Medicine, Houston, TX 77030, USA
- Vanesa Vee
  - Human Genome Sequencing Center, Baylor College of Medicine, Houston, TX 77030, USA
- Yi Han
  - Human Genome Sequencing Center, Baylor College of Medicine, Houston, TX 77030, USA
- Sejal Salvi
  - Human Genome Sequencing Center, Baylor College of Medicine, Houston, TX 77030, USA
- Sravya Bhamidipati
  - Human Genome Sequencing Center, Baylor College of Medicine, Houston, TX 77030, USA
- Kavya Kottapalli
  - Human Genome Sequencing Center, Baylor College of Medicine, Houston, TX 77030, USA
- George Weissenberger
  - Human Genome Sequencing Center, Baylor College of Medicine, Houston, TX 77030, USA
- Hua Shen
  - Human Genome Sequencing Center, Baylor College of Medicine, Houston, TX 77030, USA
- Matthew C Ross
  - Alkek Center for Metagenomics and Microbiome Research, Department of Molecular Virology and Microbiology, Baylor College of Medicine, Houston, TX, USA
- Kristi L Hoffman
  - Alkek Center for Metagenomics and Microbiome Research, Department of Molecular Virology and Microbiology, Baylor College of Medicine, Houston, TX, USA
- Sara Javornik Cregeen
  - Alkek Center for Metagenomics and Microbiome Research, Department of Molecular Virology and Microbiology, Baylor College of Medicine, Houston, TX, USA
- Donna M Muzny
  - Human Genome Sequencing Center, Baylor College of Medicine, Houston, TX 77030, USA
- Ginger A Metcalf
  - Human Genome Sequencing Center, Baylor College of Medicine, Houston, TX 77030, USA
- Richard A Gibbs
  - Human Genome Sequencing Center, Baylor College of Medicine, Houston, TX 77030, USA
- Joseph F Petrosino
  - Alkek Center for Metagenomics and Microbiome Research, Department of Molecular Virology and Microbiology, Baylor College of Medicine, Houston, TX, USA
- Fritz J Sedlazeck
  - Human Genome Sequencing Center, Baylor College of Medicine, Houston, TX 77030, USA
109
Methods to Improve Molecular Diagnosis in Genomic Cold Cases in Pediatric Neurology. Genes (Basel) 2022; 13:333. [PMID: 35205378 PMCID: PMC8871714 DOI: 10.3390/genes13020333] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 01/03/2022] [Revised: 02/06/2022] [Accepted: 02/07/2022] [Indexed: 02/04/2023]
Abstract
During the last decade, genetic testing has emerged as an important etiological diagnostic tool for Mendelian diseases, including pediatric neurological conditions. A genetic diagnosis has a considerable impact on disease management and treatment; however, many cases remain undiagnosed after applying standard diagnostic sequencing techniques. This review discusses various methods to improve the molecular diagnostic rates in these genomic cold cases. We discuss extended analysis methods to consider, non-Mendelian inheritance models, mosaicism, dual/multiple diagnoses, periodic re-analysis, artificial intelligence tools, and deep phenotyping, in addition to integrating various omics methods to improve variant prioritization. Last, novel genomic technologies, including long-read sequencing, artificial long-read sequencing, and optical genome mapping are discussed. In conclusion, a more comprehensive molecular analysis and a timely re-analysis of unsolved cases are imperative to improve diagnostic rates. In addition, our current understanding of the human genome is still limited due to restrictions in technologies. Novel technologies are now available that improve upon some of these limitations and can capture all human genomic variation more accurately. Last, we recommend a more routine implementation of high molecular weight DNA extraction methods that is coherent with the ability to use and/or optimally benefit from these novel genomic methods.
110
Murdock DR, Rosenfeld JA, Lee B. What Has the Undiagnosed Diseases Network Taught Us About the Clinical Applications of Genomic Testing? Annu Rev Med 2022; 73:575-585. [PMID: 35084988 PMCID: PMC10874501 DOI: 10.1146/annurev-med-042120-014904] [Citation(s) in RCA: 8] [Impact Index Per Article: 4.0] [Indexed: 11/09/2022]
Abstract
Genetic testing has undergone a revolution in the last decade, particularly with the advent of next-generation sequencing and its associated reductions in costs and increases in efficiencies. The Undiagnosed Diseases Network (UDN) has been a leader in the application of such genomic testing for rare disease diagnosis. This review discusses the current state of genomic testing performed within the UDN, with a focus on the strengths and limitations of whole-exome and whole-genome sequencing in clinical diagnostics and the importance of ongoing data reanalysis. The role of emerging technologies such as RNA and long-read sequencing to further improve diagnostic rates in the UDN is also described. This review concludes with a discussion of the challenges faced in insurance coverage of comprehensive genomic testing as well as the opportunities for a larger role of testing in clinical medicine.
Affiliation(s)
- David R Murdock
  - Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, Texas 77030, USA
- Jill A Rosenfeld
  - Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, Texas 77030, USA
- Brendan Lee
  - Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, Texas 77030, USA
  - Texas Children's Hospital, Houston, Texas 77030, USA
111
Affiliation(s)
- Parwinder Kaur
  - UWA School of Agriculture and Environment, The University of Western Australia, Perth, WA, 6009, Australia
- Baohong Zhang
  - Department of Biology, East Carolina University, Greenville, NC, 27858, USA
112
Wierzbicki F, Schwarz F, Cannalonga O, Kofler R. Novel quality metrics allow identifying and generating high-quality assemblies of piRNA clusters. Mol Ecol Resour 2022; 22:102-121. [PMID: 34181811 DOI: 10.1111/1755-0998.13455] [Citation(s) in RCA: 7] [Impact Index Per Article: 3.5] [Received: 12/15/2020] [Revised: 04/30/2021] [Accepted: 06/14/2021] [Indexed: 12/30/2022]
Abstract
In most animals, it is thought that the proliferation of a transposable element (TE) is stopped when the TE jumps into a piRNA cluster. Despite this central importance, little is known about the composition and the evolutionary dynamics of piRNA clusters. This is largely because piRNA clusters are notoriously difficult to assemble as they are frequently composed of highly repetitive DNA. With long reads, we may finally be able to obtain reliable assemblies of piRNA clusters. Unfortunately, it is unclear how to generate and identify the best assemblies, as many assembly strategies exist and standard quality metrics are ignorant of TEs. To address these problems, we introduce several novel quality metrics that assess: (a) the fraction of completely assembled piRNA clusters, (b) the quality of the assembled clusters and (c) whether an assembly captures the overall TE landscape of an organism (i.e. the abundance, the number of SNPs and internal deletions of all TE families). The requirements for computing these metrics vary, ranging from annotations of piRNA clusters to consensus sequences of TEs and genomic sequencing data. Using these novel metrics, we evaluate the effect of assembly algorithm, polishing, read length, coverage and residual polymorphisms, and finally identify strategies that yield reliable assemblies of piRNA clusters. Based on an optimized approach, we provide assemblies for the two Drosophila melanogaster strains Canton-S and Pi2. About 80% of known piRNA clusters were assembled in both strains. Finally, we demonstrate the generality of our approach by extending our metrics to humans and Arabidopsis thaliana.
Affiliation(s)
- Filip Wierzbicki
  - Institut für Populationsgenetik, Vetmeduni Vienna, Wien, Austria
  - Vienna Graduate School of Population Genetics, Vetmeduni Vienna, Vienna, Austria
- Florian Schwarz
  - Institut für Populationsgenetik, Vetmeduni Vienna, Wien, Austria
  - Vienna Graduate School of Population Genetics, Vetmeduni Vienna, Vienna, Austria
- Robert Kofler
  - Institut für Populationsgenetik, Vetmeduni Vienna, Wien, Austria
113
|
Jiang T, Liu S, Cao S, Wang Y. Structural Variant Detection from Long-Read Sequencing Data with cuteSV. Methods Mol Biol 2022; 2493:137-151. [PMID: 35751813 DOI: 10.1007/978-1-0716-2293-3_9] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Indexed: 06/15/2023]
Abstract
Structural variation (SV) represents genomic rearrangements and is strongly associated with human health and disease. Long-read sequencing technologies now provide the opportunity to identify SVs more comprehensively and at ever-higher resolution. However, given the high sequencing error rates and the complexity of SVs, many technical issues remain to be settled. Hence, we propose cuteSV, a sensitive, fast, and scalable alignment-based SV detection approach for the comprehensive discovery of diverse SVs. The benchmarking results indicate that cuteSV is suitable for large-scale genome projects because of its excellent SV yields and ultra-fast speed. Here, we explain the overall framework and provide a detailed outline so that users can apply cuteSV correctly and comprehensively. More details are available at https://github.com/tjiangHIT/cuteSV.
Affiliation(s)
- Tao Jiang
- Harbin Institute of Technology, Harbin, Heilongjiang, China
- Shiqi Liu
- Harbin Institute of Technology, Harbin, Heilongjiang, China
- Shuqi Cao
- Harbin Institute of Technology, Harbin, Heilongjiang, China
- Yadong Wang
- Harbin Institute of Technology, Harbin, Heilongjiang, China
|
114
|
Lemay MA, Malle S. A Practical Guide to Using Structural Variants for Genome-Wide Association Studies. Methods Mol Biol 2022; 2481:161-172. [PMID: 35641764 DOI: 10.1007/978-1-0716-2237-7_10] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Indexed: 06/15/2023]
Abstract
Structural variants (SVs) are known to have large functional impacts on phenotypes of agricultural interest, but they have yet to be routinely used for GWAS. Apart from the difficulty in obtaining high-quality SV genotype data for large populations, one of the main hurdles to using SVs for GWAS lies in formatting of genotype data for use with popular GWAS programs. This protocol describes how typical SV genotype data can be formatted for input to three GWAS programs commonly used by the plant genetics community: TASSEL, GAPIT, and mrMLM.
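The formatting step this abstract refers to often comes down to recoding genotype calls as numbers. The sketch below, a generic illustration rather than the protocol's own code, converts VCF-style `GT` strings into alternate-allele counts (0, 1, 2) and writes one tab-separated row per SV. The function names, the `NA` missing-value code and the row layout are assumptions; the exact input layout each of TASSEL, GAPIT and mrMLM expects is not reproduced here and should be taken from each program's documentation.

```python
def gt_to_dosage(gt):
    """Return the number of non-reference alleles in a VCF GT string, or None if missing."""
    alleles = gt.replace("|", "/").split("/")
    if "." in alleles:
        return None
    return sum(1 for allele in alleles if allele != "0")


def to_numeric_row(sv_id, genotypes, missing="NA"):
    """Build one tab-separated row: SV identifier followed by one dosage per sample."""
    fields = [sv_id]
    for gt in genotypes:
        dosage = gt_to_dosage(gt)
        fields.append(missing if dosage is None else str(dosage))
    return "\t".join(fields)


# Hypothetical SV identifier and four sample genotypes.
print(to_numeric_row("sv_0001", ["0/0", "0/1", "1|1", "./."]))
```

Phased and unphased genotypes are treated identically here because only the allele count matters for a numeric encoding.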
Affiliation(s)
- Marc-André Lemay
- Département de phytologie and Institut de biologie intégrative et des systèmes, Université Laval, Quebec City, QC, Canada
- Sidiki Malle
- Institut Polytechnique Rural de Formation et de Recherche Appliquée De Katibougou, Koulikoro, Mali
|
115
|
Khayat MM, Sahraeian SME, Zarate S, Carroll A, Hong H, Pan B, Shi L, Gibbs RA, Mohiyuddin M, Zheng Y, Sedlazeck FJ. Hidden biases in germline structural variant detection. Genome Biol 2021; 22:347. [PMID: 34930391 PMCID: PMC8686633 DOI: 10.1186/s13059-021-02558-x] [Citation(s) in RCA: 12] [Impact Index Per Article: 4.0] [Received: 08/14/2020] [Accepted: 11/24/2021] [Indexed: 01/23/2023]
Abstract
BACKGROUND Genomic structural variations (SV) are important determinants of genotypic and phenotypic changes in many organisms. However, the detection of SV from next-generation sequencing data remains challenging. RESULTS In this study, DNA from a Chinese family quartet is sequenced at three different sequencing centers in triplicate. A total of 288 derivative data sets are generated utilizing different analysis pipelines and compared to identify sources of analytical variability. Mapping methods provide the major contribution to variability, followed by sequencing centers and replicates. Interestingly, SV supported by only one center or replicate often represent true positives with 47.02% and 45.44% overlapping the long-read SV call set, respectively. This is consistent with an overall higher false negative rate for SV calling in centers and replicates compared to mappers (15.72%). Finally, we observe that the SV calling variability also persists in a genotyping approach, indicating the impact of the underlying sequencing and preparation approaches. CONCLUSIONS This study provides the first detailed insights into the sources of variability in SV identification from next-generation sequencing and highlights remaining challenges in SV calling for large cohorts. We further give recommendations on how to reduce SV calling variability and the choice of alignment methodology.
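Percentages such as the share of calls "overlapping the long-read SV call set" come from an interval comparison between two call sets. The sketch below shows one simple way to compute such a fraction. The tuple layout `(chrom, start, end)`, the `slop` tolerance and the any-overlap criterion are illustrative assumptions, not the matching rules used in the study.

```python
def overlaps(a, b, slop=0):
    """True if intervals a and b share a chromosome and lie within `slop` bases of overlapping."""
    return a[0] == b[0] and a[1] <= b[2] + slop and b[1] <= a[2] + slop


def fraction_supported(calls, reference_set, slop=0):
    """Fraction of `calls` that overlap at least one interval in `reference_set`."""
    if not calls:
        return 0.0
    supported = sum(
        1 for call in calls if any(overlaps(call, ref, slop) for ref in reference_set)
    )
    return supported / len(calls)


# Toy call sets; coordinates are made up.
short_read_calls = [("chr1", 100, 200), ("chr1", 500, 600), ("chr2", 100, 200), ("chr3", 10, 20)]
long_read_calls = [("chr1", 150, 250), ("chr2", 300, 400)]

print(fraction_supported(short_read_calls, long_read_calls))
print(fraction_supported(short_read_calls, long_read_calls, slop=100))
```

Real comparisons usually also require matching SV types and some minimum reciprocal overlap, which would tighten the criterion used here.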
Affiliation(s)
- Michael M Khayat
- Human Genome Sequencing Center, Baylor College of Medicine, Houston, TX, USA
- Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX, USA
- Huixiao Hong
- National Center for Toxicological Research, Food and Drug Administration, Jefferson, AR, USA
- Bohu Pan
- National Center for Toxicological Research, Food and Drug Administration, Jefferson, AR, USA
- Leming Shi
- State Key Laboratory of Genetic Engineering, Human Phenome Institute, School of Life Sciences and Shanghai Cancer Center, Fudan University, Shanghai, China
- Institute of Thoracic Oncology, Fudan University, Shanghai, China
- Richard A Gibbs
- Human Genome Sequencing Center, Baylor College of Medicine, Houston, TX, USA
- Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX, USA
- Yuanting Zheng
- State Key Laboratory of Genetic Engineering, Human Phenome Institute, School of Life Sciences and Shanghai Cancer Center, Fudan University, Shanghai, China
- Institute of Thoracic Oncology, Fudan University, Shanghai, China
- Fritz J Sedlazeck
- Human Genome Sequencing Center, Baylor College of Medicine, Houston, TX, USA
|
116
|
Baslan T, Kovaka S, Sedlazeck FJ, Zhang Y, Wappel R, Tian S, Lowe SW, Goodwin S, Schatz MC. High resolution copy number inference in cancer using short-molecule nanopore sequencing. Nucleic Acids Res 2021; 49:e124. [PMID: 34551429 PMCID: PMC8643650 DOI: 10.1093/nar/gkab812] [Citation(s) in RCA: 9] [Impact Index Per Article: 3.0] [Received: 02/02/2021] [Revised: 07/19/2021] [Accepted: 09/09/2021] [Indexed: 01/23/2023]
Abstract
Genome copy number is an important source of genetic variation in health and disease. In cancer, Copy Number Alterations (CNAs) can be inferred from short-read sequencing data, enabling genomics-based precision oncology. Emerging Nanopore sequencing technologies offer the potential for broader clinical utility, for example in smaller hospitals, due to lower instrument cost, higher portability, and ease of use. Nonetheless, Nanopore sequencing devices are limited in the number of retrievable sequencing reads/molecules compared to short-read sequencing platforms, limiting CNA inference accuracy. To address this limitation, we targeted the sequencing of short-length DNA molecules loaded at optimized concentration in an effort to increase sequence read/molecule yield from a single nanopore run. We show that short-molecule nanopore sequencing reproducibly returns high read counts and allows high quality CNA inference. We demonstrate the clinical relevance of this approach by accurately inferring CNAs in acute myeloid leukemia samples. The data shows that, compared to traditional approaches such as chromosome analysis/cytogenetics, short molecule nanopore sequencing returns more sensitive, accurate copy number information in a cost effective and expeditious manner, including for multiplex samples. Our results provide a framework for short-molecule nanopore sequencing with applications in research and medicine, which includes but is not limited to, CNAs.
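The reason read count limits CNA accuracy is easiest to see in a read-depth calculation: reads are counted in fixed genomic bins and each bin is compared with the typical bin. The sketch below is a generic illustration of that idea, not the authors' pipeline. The bin size, the median normalisation and the pseudocount are assumptions chosen for the example.

```python
import math
from statistics import median


def bin_counts(read_starts, bin_size, n_bins):
    """Count reads per fixed-width bin; reads starting beyond the last bin are ignored."""
    counts = [0] * n_bins
    for pos in read_starts:
        idx = pos // bin_size
        if 0 <= idx < n_bins:
            counts[idx] += 1
    return counts


def log2_ratios(counts, pseudocount=0.5):
    """log2 of each bin count relative to the median bin (pseudocount avoids log of zero)."""
    baseline = median(counts) + pseudocount
    return [math.log2((c + pseudocount) / baseline) for c in counts]


# Toy per-bin counts: bin 2 looks gained, bin 4 looks lost.
counts = [10, 10, 20, 10, 5]
ratios = log2_ratios(counts)
print([round(r, 2) for r in ratios])
```

With few reads per bin the ratios become noisy, which is why a higher read yield per run translates directly into finer or more reliable copy number calls.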
Affiliation(s)
- Timour Baslan
- Cancer Biology and Genetics Program, Memorial Sloan Kettering Cancer Center, New York, NY, USA
- Sam Kovaka
- Department of Computer Science, Johns Hopkins University, Baltimore, MD, USA
- Fritz J Sedlazeck
- Human Genome Sequencing Center, Baylor College of Medicine, Houston, TX, USA
- Yanming Zhang
- Cytogenetics Laboratory, Memorial Sloan Kettering Cancer Center, New York, NY, USA
- Robert Wappel
- Cold Spring Harbor Laboratory, Cold Spring Harbor, NY, USA
- Sha Tian
- Cancer Biology and Genetics Program, Memorial Sloan Kettering Cancer Center, New York, NY, USA
- Scott W Lowe
- Cancer Biology and Genetics Program, Memorial Sloan Kettering Cancer Center, New York, NY, USA; Howard Hughes Medical Institute, Chevy Chase, MD, USA
- Sara Goodwin
- Cold Spring Harbor Laboratory, Cold Spring Harbor, NY, USA
- Michael C Schatz
- Department of Computer Science, Johns Hopkins University, Baltimore, MD, USA; Cold Spring Harbor Laboratory, Cold Spring Harbor, NY, USA; Department of Biology, Johns Hopkins University, Baltimore, MD, USA
|
117
|
Chen Z, He X. Application of third-generation sequencing in cancer research. Medical Review (Berlin, Germany) 2021; 1:150-171. [PMID: 37724303 PMCID: PMC10388785 DOI: 10.1515/mr-2021-0013] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 05/14/2021] [Accepted: 08/09/2021] [Indexed: 09/20/2023]
Abstract
In the past several years, nanopore sequencing technology from Oxford Nanopore Technologies (ONT) and single-molecule real-time (SMRT) sequencing technology from Pacific BioSciences (PacBio) have become available to researchers and are currently being tested for cancer research. These methods offer many advantages over the most widely used high-throughput short-read sequencing approaches and allow the comprehensive analysis of transcriptomes by identifying full-length splice isoforms and several other posttranscriptional events. In addition, these platforms enable structural variation characterization at a previously unparalleled resolution and direct detection of epigenetic marks in native DNA and RNA. Here, we present a comprehensive summary of important applications of these technologies in cancer research, including the identification of complex structural variants, alternatively spliced isoforms, fusion transcript events, and exogenous RNA. Furthermore, we discuss the impact of the newly developed nanopore direct RNA sequencing (RNA-Seq) approach in advancing epitranscriptome research in cancer. Although unique challenges remain for these new single-molecule long-read methods, they will unravel many aspects of cancer genome complexity in unprecedented ways and present an encouraging outlook for continued application in an increasing number of different cancer research settings.
Affiliation(s)
- Zhiao Chen
- Fudan University Shanghai Cancer Center and Institutes of Biomedical Sciences, Fudan University, Shanghai, China
- Department of Oncology, Shanghai Medical College, Fudan University, Shanghai, China
- Xianghuo He
- Fudan University Shanghai Cancer Center and Institutes of Biomedical Sciences, Fudan University, Shanghai, China
- Department of Oncology, Shanghai Medical College, Fudan University, Shanghai, China
- Key Laboratory of Breast Cancer in Shanghai, Fudan University Shanghai Cancer Center, Fudan University, Shanghai, China
|
118
|
Hall CL, Kesharwani RK, Phillips NR, Planz JV, Sedlazeck FJ, Zascavage RR. Accurate profiling of forensic autosomal STRs using the Oxford Nanopore Technologies MinION device. Forensic Sci Int Genet 2021; 56:102629. [PMID: 34837788 DOI: 10.1016/j.fsigen.2021.102629] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 07/01/2021] [Revised: 09/28/2021] [Accepted: 11/01/2021] [Indexed: 01/23/2023]
Abstract
The high variability characteristic of short tandem repeat (STR) markers is harnessed for human identification in forensic genetic analyses. Despite the power and reliability of current typing techniques, sequence-level information both within and around STRs is masked in the length-based profiles generated. Forensic STR typing using next generation sequencing (NGS) has therefore gained attention as an alternative to traditional capillary electrophoresis (CE) approaches. In this proof-of-principle study, we evaluate the forensic applicability of the newest and smallest NGS platform available: the Oxford Nanopore Technologies (ONT) MinION device. Although nanopore sequencing on the handheld MinION offers numerous advantages, including low startup cost and on-site sample processing, the relatively high error rate and lack of forensic-specific analysis software have prevented accurate profiling across STR panels in previous studies. Here we present STRspy, a streamlined method capable of producing length- and sequence-based STR allele designations from noisy, error-prone third generation sequencing reads. To assess the capabilities of STRspy, seven reference samples (female: n = 2; male: n = 5) were amplified at 15 and 30 PCR cycles using the Promega PowerSeq 46GY System and sequenced on the ONT MinION device in triplicate. Basecalled reads were then processed with STRspy using a custom database containing alleles reported in the STRSeq BioProject NIST 1036 dataset. Resultant STR allele designations and flanking region single nucleotide polymorphism (SNP) calls were compared to the manufacturer-validated genotypes for each sample. STRspy generated robust and reliable genotypes across all autosomal STR loci amplified with 30 PCR cycles, achieving 100% concordance based on both length and sequence. Furthermore, we were able to identify flanking region SNPs in the 15-cycle dataset with >90% accuracy. These results demonstrate that, when analyzed with STRspy, ONT reads can reveal additional variation in and around STR loci depending on read coverage. As the first and only third generation sequencing platform-specific method to successfully profile the entire panel of autosomal STRs amplified by a commercially available multiplex, STRspy significantly increases the feasibility of nanopore sequencing in forensic applications.
Affiliation(s)
- Courtney L Hall
- Department of Microbiology, Immunology & Genetics, University of North Texas Health Science Center, 3400 Camp Bowie Blvd, Fort Worth, TX 76107, USA
- Rupesh K Kesharwani
- Human Genome Sequencing Center, Baylor College of Medicine, One Baylor Plaza, Houston TX 77030, USA
- Nicole R Phillips
- Department of Microbiology, Immunology & Genetics, University of North Texas Health Science Center, 3400 Camp Bowie Blvd, Fort Worth, TX 76107, USA
- John V Planz
- Department of Microbiology, Immunology & Genetics, University of North Texas Health Science Center, 3400 Camp Bowie Blvd, Fort Worth, TX 76107, USA
- Fritz J Sedlazeck
- Human Genome Sequencing Center, Baylor College of Medicine, One Baylor Plaza, Houston TX 77030, USA
- Roxanne R Zascavage
- Department of Microbiology, Immunology & Genetics, University of North Texas Health Science Center, 3400 Camp Bowie Blvd, Fort Worth, TX 76107, USA; Department of Criminology and Criminal Justice, University of Texas at Arlington, 701 S Nedderman Dr, Arlington, TX 76109, USA
|
119
|
Liu H, Yan XM, Wang XR, Zhang DX, Zhou Q, Shi TL, Jia KH, Tian XC, Zhou SS, Zhang RG, Yun QZ, Wang Q, Xiang Q, Mannapperuma C, Van Zalen E, Street NR, Porth I, El-Kassaby YA, Zhao W, Wang XR, Guan W, Mao JF. Centromere-Specific Retrotransposons and Very-Long-Chain Fatty Acid Biosynthesis in the Genome of Yellowhorn (Xanthoceras sorbifolium, Sapindaceae), an Oil-Producing Tree With Significant Drought Resistance. Frontiers in Plant Science 2021; 12:766389. [PMID: 34880890 PMCID: PMC8647845 DOI: 10.3389/fpls.2021.766389] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Received: 09/02/2021] [Accepted: 10/18/2021] [Indexed: 05/17/2023]
Abstract
In-depth genome characterization is still lacking for most biofuel crops, especially for centromeres, which play a fundamental role during nuclear division and in the maintenance of genome stability. This study applied long-read sequencing technologies to assemble a highly contiguous genome for yellowhorn (Xanthoceras sorbifolium), an oil-producing tree, and conducted extensive comparative analyses to understand centromere structure and evolution, and fatty acid biosynthesis. We produced a reference-level genome of yellowhorn, ∼470 Mb in length with ∼95% of contigs anchored onto 15 chromosomes. Genome annotation identified 22,049 protein-coding genes and 65.7% of the genome sequence as repetitive elements. Long terminal repeat retrotransposons (LTR-RTs) account for ∼30% of the yellowhorn genome, which is maintained by a moderate birth rate and a low removal rate. We identified the centromeric regions on each chromosome and found enrichment of centromere-specific retrotransposons of LINE1 and Gypsy in these regions, which have evolved recently (∼0.7 MYA). We compared the genomes of three cultivars and found frequent inversions. We analyzed the transcriptomes from different tissues and identified the candidate genes involved in very-long-chain fatty acid biosynthesis and their expression profiles. Collinear block analysis showed that yellowhorn shared the gamma (γ) hexaploidy event with Vitis vinifera but did not undergo any further whole-genome duplication. This study provides excellent genomic resources for understanding centromere structure and evolution and for functional studies in this important oil-producing plant.
Affiliation(s)
- Hui Liu
- National Engineering Laboratory for Tree Breeding, Beijing Advanced Innovation Center for Tree Breeding by Molecular Design, Key Laboratory of Genetics and Breeding in Forest Trees and Ornamental Plants, Ministry of Education, School of Ecology and Nature Conservation, College of Biological Sciences and Technology, Beijing Forestry University, Beijing, China
- Xue-Mei Yan
- National Engineering Laboratory for Tree Breeding, Beijing Advanced Innovation Center for Tree Breeding by Molecular Design, Key Laboratory of Genetics and Breeding in Forest Trees and Ornamental Plants, Ministry of Education, School of Ecology and Nature Conservation, College of Biological Sciences and Technology, Beijing Forestry University, Beijing, China
- Xin-rui Wang
- National Engineering Laboratory for Tree Breeding, Beijing Advanced Innovation Center for Tree Breeding by Molecular Design, Key Laboratory of Genetics and Breeding in Forest Trees and Ornamental Plants, Ministry of Education, School of Ecology and Nature Conservation, College of Biological Sciences and Technology, Beijing Forestry University, Beijing, China
- Dong-Xu Zhang
- Protected Agricultural Technology, R&D Center, Shanxi Datong University, Datong, China
- Qingyuan Zhou
- Key Laboratory of Plant Resources, Institute of Botany, Chinese Academy of Sciences, Beijing, China
- Tian-Le Shi
- National Engineering Laboratory for Tree Breeding, Beijing Advanced Innovation Center for Tree Breeding by Molecular Design, Key Laboratory of Genetics and Breeding in Forest Trees and Ornamental Plants, Ministry of Education, School of Ecology and Nature Conservation, College of Biological Sciences and Technology, Beijing Forestry University, Beijing, China
- Kai-Hua Jia
- National Engineering Laboratory for Tree Breeding, Beijing Advanced Innovation Center for Tree Breeding by Molecular Design, Key Laboratory of Genetics and Breeding in Forest Trees and Ornamental Plants, Ministry of Education, School of Ecology and Nature Conservation, College of Biological Sciences and Technology, Beijing Forestry University, Beijing, China
- Xue-Chan Tian
- National Engineering Laboratory for Tree Breeding, Beijing Advanced Innovation Center for Tree Breeding by Molecular Design, Key Laboratory of Genetics and Breeding in Forest Trees and Ornamental Plants, Ministry of Education, School of Ecology and Nature Conservation, College of Biological Sciences and Technology, Beijing Forestry University, Beijing, China
- Shan-Shan Zhou
- National Engineering Laboratory for Tree Breeding, Beijing Advanced Innovation Center for Tree Breeding by Molecular Design, Key Laboratory of Genetics and Breeding in Forest Trees and Ornamental Plants, Ministry of Education, School of Ecology and Nature Conservation, College of Biological Sciences and Technology, Beijing Forestry University, Beijing, China
- Ren-Gang Zhang
- Department of Bioinformatics, Ori (Shandong) Gene Science and Technology Co., Ltd., Weifang, China
- Quan-Zheng Yun
- Department of Bioinformatics, Ori (Shandong) Gene Science and Technology Co., Ltd., Weifang, China
- Qing Wang
- Key Laboratory of Forest Ecology and Environment of the National Forestry and Grassland Administration, Research Institute of Forest Ecology, Environment and Protection, Chinese Academy of Forestry, Beijing, China
- Qiuhong Xiang
- National Engineering Laboratory for Tree Breeding, Beijing Advanced Innovation Center for Tree Breeding by Molecular Design, Key Laboratory of Genetics and Breeding in Forest Trees and Ornamental Plants, Ministry of Education, School of Ecology and Nature Conservation, College of Biological Sciences and Technology, Beijing Forestry University, Beijing, China
- Chanaka Mannapperuma
- Umeå Plant Science Centre, Department of Plant Physiology, Umeå University, Umeå, Sweden
- Elena Van Zalen
- Umeå Plant Science Centre, Department of Plant Physiology, Umeå University, Umeå, Sweden
- Nathaniel R. Street
- Umeå Plant Science Centre, Department of Plant Physiology, Umeå University, Umeå, Sweden
- Ilga Porth
- Départment des Sciences du Bois et de la Forêt, Faculté de Foresterie, de Géographie et de Géomatique, Université Laval Québec, Quebec City, QC, Canada
- Yousry A. El-Kassaby
- Department of Forest and Conservation Sciences, Faculty of Forestry, University of British Columbia, Vancouver, BC, Canada
- Wei Zhao
- National Engineering Laboratory for Tree Breeding, Beijing Advanced Innovation Center for Tree Breeding by Molecular Design, Key Laboratory of Genetics and Breeding in Forest Trees and Ornamental Plants, Ministry of Education, School of Ecology and Nature Conservation, College of Biological Sciences and Technology, Beijing Forestry University, Beijing, China
- Department of Ecology and Environmental Science, Umeå Plant Science Centre, Umeå University, Umeå, Sweden
- Xiao-Ru Wang
- National Engineering Laboratory for Tree Breeding, Beijing Advanced Innovation Center for Tree Breeding by Molecular Design, Key Laboratory of Genetics and Breeding in Forest Trees and Ornamental Plants, Ministry of Education, School of Ecology and Nature Conservation, College of Biological Sciences and Technology, Beijing Forestry University, Beijing, China
- Department of Ecology and Environmental Science, Umeå Plant Science Centre, Umeå University, Umeå, Sweden
- Wenbin Guan
- National Engineering Laboratory for Tree Breeding, Beijing Advanced Innovation Center for Tree Breeding by Molecular Design, Key Laboratory of Genetics and Breeding in Forest Trees and Ornamental Plants, Ministry of Education, School of Ecology and Nature Conservation, College of Biological Sciences and Technology, Beijing Forestry University, Beijing, China
- Jian-Feng Mao
- National Engineering Laboratory for Tree Breeding, Beijing Advanced Innovation Center for Tree Breeding by Molecular Design, Key Laboratory of Genetics and Breeding in Forest Trees and Ornamental Plants, Ministry of Education, School of Ecology and Nature Conservation, College of Biological Sciences and Technology, Beijing Forestry University, Beijing, China
|
120
|
Comprehensive characterization of copy number variation (CNV) called from array, long- and short-read data. BMC Genomics 2021; 22:826. [PMID: 34789167 PMCID: PMC8596897 DOI: 10.1186/s12864-021-08082-3] [Citation(s) in RCA: 6] [Impact Index Per Article: 2.0] [Received: 06/09/2021] [Accepted: 10/13/2021] [Indexed: 11/10/2022]
Abstract
BACKGROUND SNP arrays and short- and long-read genome sequencing are genome-wide high-throughput technologies that may be used to assay copy number variants (CNVs) in a personal genome. Each of these technologies comes with its own limitations and biases, many of which are well-known, but not all of them are thoroughly quantified. RESULTS We assembled an ensemble of public datasets of published CNV calls and raw data for the well-studied Genome in a Bottle individual NA12878. This assembly represents a variety of methods and pipelines used for CNV calling from array, short- and long-read technologies. We then performed cross-technology comparisons regarding their ability to call CNVs. Unlike other studies, we refrained from using a gold standard. Instead, we attempted to validate the CNV calls by the raw data of each technology. CONCLUSIONS Our study confirms that long-read platforms enable recalling CNVs in genomic regions inaccessible to arrays or short reads. We also found that the reproducibility of a CNV by different pipelines within each technology is strongly linked to other CNV evidence measures. Importantly, the three technologies show distinct public database frequency profiles, which differ depending on what technology the database was built on.
|
121
|
Schielzeth H, Wolf JBW. Community genomics: a community-wide perspective on within-species genetic diversity. American Journal of Botany 2021; 108:2108-2111. [PMID: 34767249 DOI: 10.1002/ajb2.1796] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 03/02/2021] [Accepted: 09/07/2021] [Indexed: 06/13/2023]
Affiliation(s)
- Holger Schielzeth
- Institute of Ecology and Evolution, Friedrich Schiller University Jena, Germany
- Jochen B W Wolf
- Division of Evolutionary Biology, Faculty of Biology, LMU Munich, Germany
|
122
|
Coombe L, Li JX, Lo T, Wong J, Nikolic V, Warren RL, Birol I. LongStitch: high-quality genome assembly correction and scaffolding using long reads. BMC Bioinformatics 2021; 22:534. [PMID: 34717540 PMCID: PMC8557608 DOI: 10.1186/s12859-021-04451-7] [Citation(s) in RCA: 19] [Impact Index Per Article: 6.3] [Received: 06/17/2021] [Accepted: 10/19/2021] [Indexed: 12/12/2022]
Abstract
BACKGROUND Generating high-quality de novo genome assemblies is foundational to the genomic study of model and non-model organisms. In recent years, long-read sequencing has greatly benefited genome assembly and scaffolding, a process by which assembled sequences are ordered and oriented through the use of long-range information. Long reads are better able to span repetitive genomic regions compared to short reads, and thus have tremendous utility for resolving problematic regions and helping generate more complete draft assemblies. Here, we present LongStitch, a scalable pipeline that corrects and scaffolds draft genome assemblies exclusively using long reads. RESULTS LongStitch incorporates multiple tools developed by our group and runs in up to three stages, which include initial assembly correction (Tigmint-long), followed by two incremental scaffolding stages (ntLink and ARKS-long). Tigmint-long and ARKS-long are misassembly correction and scaffolding utilities, respectively, previously developed for linked reads, that we adapted for long reads. Here, we describe the LongStitch pipeline and introduce our new long-read scaffolder, ntLink, which utilizes lightweight minimizer mappings to join contigs. LongStitch was tested on short and long-read assemblies of Caenorhabditis elegans, Oryza sativa, and three different human individuals using corresponding nanopore long-read data, and improves the contiguity of each assembly from 1.2-fold up to 304.6-fold (as measured by NGA50 length). Furthermore, LongStitch generates more contiguous and correct assemblies compared to the state-of-the-art long-read scaffolder LRScaf in most tests, and consistently improves upon human assemblies in under five hours using less than 23 GB of RAM. CONCLUSIONS Due to its effectiveness and efficiency in improving draft assemblies using long reads, we expect LongStitch to benefit a wide variety of de novo genome assembly projects. The LongStitch pipeline is freely available at https://github.com/bcgsc/longstitch.
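The "lightweight minimizer mappings" mentioned for ntLink rest on a general idea: from every window of consecutive k-mers, keep only the smallest one, so that two sequences sharing a region tend to select the same few k-mers. The sketch below illustrates that general idea only. It orders k-mers lexicographically for readability, whereas practical tools typically order by a hash value, and none of the names or parameter values are taken from ntLink.

```python
def minimizers(seq, k, w):
    """Return sorted (position, k-mer) pairs selected as the minimum of each window of w k-mers."""
    kmers = [seq[i:i + k] for i in range(len(seq) - k + 1)]
    picked = set()
    for start in range(len(kmers) - w + 1):
        window = kmers[start:start + w]
        # Smallest k-mer wins; ties go to the leftmost occurrence.
        offset = min(range(w), key=lambda j: (window[j], j))
        picked.add((start + offset, window[offset]))
    return sorted(picked)


# Toy sequence with k = 3 and windows of w = 3 consecutive k-mers.
print(minimizers("ACGTACGT", k=3, w=3))
```

Because only a subset of k-mers is kept, a read and a contig can be compared through their shared minimizers rather than through a full alignment.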
Affiliation(s)
- Lauren Coombe
- Canada's Michael Smith Genome Sciences Centre, BC Cancer Research, 100-570 West 7th Avenue, Vancouver, BC, V5Z 4S6, Canada
- Janet X Li
- Canada's Michael Smith Genome Sciences Centre, BC Cancer Research, 100-570 West 7th Avenue, Vancouver, BC, V5Z 4S6, Canada
- Theodora Lo
- Canada's Michael Smith Genome Sciences Centre, BC Cancer Research, 100-570 West 7th Avenue, Vancouver, BC, V5Z 4S6, Canada
- Johnathan Wong
- Canada's Michael Smith Genome Sciences Centre, BC Cancer Research, 100-570 West 7th Avenue, Vancouver, BC, V5Z 4S6, Canada
- Vladimir Nikolic
- Canada's Michael Smith Genome Sciences Centre, BC Cancer Research, 100-570 West 7th Avenue, Vancouver, BC, V5Z 4S6, Canada
- René L Warren
- Canada's Michael Smith Genome Sciences Centre, BC Cancer Research, 100-570 West 7th Avenue, Vancouver, BC, V5Z 4S6, Canada
- Inanc Birol
- Canada's Michael Smith Genome Sciences Centre, BC Cancer Research, 100-570 West 7th Avenue, Vancouver, BC, V5Z 4S6, Canada
|
123
|
Abstract
De novo assembled genomes serve as the backbone for modern genomics. In an article in this issue of Cell Systems, Ekim et al. present the mdBG assembler that can assemble genomes 100-fold faster than previous methods, including a human genome in under 10 min, which unlocks pan-genomics for many species.
|
124
|
Luo X, Cui K, Wang Z, Li Z, Wu Z, Huang W, Zhu XQ, Ruan J, Zhang W, Liu Q. High-quality reference genome of Fasciola gigantica: Insights into the genomic signatures of transposon-mediated evolution and specific parasitic adaption in tropical regions. PLoS Negl Trop Dis 2021; 15:e0009750. [PMID: 34610021 PMCID: PMC8519440 DOI: 10.1371/journal.pntd.0009750] [Citation(s) in RCA: 12] [Impact Index Per Article: 4.0] [Received: 04/12/2021] [Revised: 10/15/2021] [Accepted: 08/23/2021] [Indexed: 12/31/2022]
Abstract
Fasciola gigantica and Fasciola hepatica are causative pathogens of fascioliasis, with the widest latitudinal, longitudinal, and altitudinal distribution; however, among parasites, they have the largest sequenced genomes, hindering genomic research. In the present study, we used various sequencing and assembly technologies to generate a new high-quality Fasciola gigantica reference genome. We improved the integration of gene structure prediction, and identified two independent transposable element expansion events contributing to (1) the speciation between Fasciola and Fasciolopsis during the Cretaceous-Paleogene boundary mass extinction, and (2) the habitat switch to the liver during the Paleocene-Eocene Thermal Maximum, accompanied by gene length increment. Long interspersed element (LINE) duplication contributed to the second transposon-mediated alteration, showing an obvious trend of insertion into gene regions, regardless of strong purifying effect. Gene ontology analysis of genes with long LINE insertions identified membrane-associated and vesicle secretion process proteins, further implicating the functional alteration of the gene network. We identified 852 predicted excretory/secretory proteins and 3300 protein-protein interactions between Fasciola gigantica and its host. Among them, copper/zinc superoxide dismutase genes, with specific gene copy number variations, might play a central role in the phase I detoxification process. Analysis of 559 single-copy orthologs suggested that Fasciola gigantica and Fasciola hepatica diverged at 11.8 Ma near the Middle and Late Miocene Epoch boundary. We identified 98 rapidly evolving gene families, including actin and aquaporin, which might explain the large body size and the parasitic adaptive character resulting in these liver flukes becoming epidemic in tropical and subtropical regions.
Affiliation(s)
- Xier Luo
- Shenzhen Branch, Guangdong Laboratory of Lingnan Modern Agriculture, Genome Analysis Laboratory of the Ministry of Agriculture and Rural Affairs, Agricultural Genomics Institute at Shenzhen, Chinese Academy of Agricultural Sciences, Shenzhen, China
- State Key Laboratory for Conservation and Utilization of Subtropical Agro-bioresources, Guangxi University, Nanning, China
| | - Kuiqing Cui
- State Key Laboratory for Conservation and Utilization of Subtropical Agro-bioresources, Guangxi University, Nanning, China
| | - Zhiqiang Wang
- State Key Laboratory for Conservation and Utilization of Subtropical Agro-bioresources, Guangxi University, Nanning, China
| | - Zhipeng Li
- State Key Laboratory for Conservation and Utilization of Subtropical Agro-bioresources, Guangxi University, Nanning, China
| | - Zhengjiao Wu
- State Key Laboratory for Conservation and Utilization of Subtropical Agro-bioresources, Guangxi University, Nanning, China
| | - Weiyi Huang
- State Key Laboratory for Conservation and Utilization of Subtropical Agro-bioresources, Guangxi University, Nanning, China
| | - Xing-Quan Zhu
- College of Veterinary Medicine, Shanxi Agricultural University, Taigu, China
| | - Jue Ruan
- Shenzhen Branch, Guangdong Laboratory of Lingnan Modern Agriculture, Genome Analysis Laboratory of the Ministry of Agriculture and Rural Affairs, Agricultural Genomics Institute at Shenzhen, Chinese Academy of Agricultural Sciences, Shenzhen, China
- State Key Laboratory for Conservation and Utilization of Subtropical Agro-bioresources, Guangxi University, Nanning, China
| | - Weiyu Zhang
- State Key Laboratory for Conservation and Utilization of Subtropical Agro-bioresources, Guangxi University, Nanning, China
| | - Qingyou Liu
- Shenzhen Branch, Guangdong Laboratory of Lingnan Modern Agriculture, Genome Analysis Laboratory of the Ministry of Agriculture and Rural Affairs, Agricultural Genomics Institute at Shenzhen, Chinese Academy of Agricultural Sciences, Shenzhen, China
- State Key Laboratory for Conservation and Utilization of Subtropical Agro-bioresources, Guangxi University, Nanning, China
|
125
|
Revollo JR, Miranda JA, Dobrovolsky VN. PacBio sequencing detects genome-wide ultra-low-frequency substitution mutations resulting from exposure to chemical mutagens. ENVIRONMENTAL AND MOLECULAR MUTAGENESIS 2021; 62:438-445. [PMID: 34424574 DOI: 10.1002/em.22462] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/09/2021] [Revised: 08/18/2021] [Accepted: 08/20/2021] [Indexed: 06/13/2023]
Abstract
Genetic toxicology uses several assays to identify mutagens and protect the public. Most of these assays, however, rely on reporter genes, can only measure mutation indirectly based on phenotype, and often require specific cell lines or animal models; these features impede their integration with existing and emerging toxicological models, such as organoids. In this study, we show that PacBio Single-Molecule, Real-Time (PB SMRT) sequencing identified substitution mutations caused by chemical mutagens in Escherichia coli by generating nearly error-free consensus reads after repeatedly inspecting both strands of circular DNA molecules. Using DNA from E. coli exposed to ethyl methanesulfonate (EMS) or N-ethyl-N-nitrosourea (ENU), PB SMRT sequencing detected mutation frequencies (MFs) and spectra comparable to those obtained by clone-sequencing from the same exposures. The optimized background MF of PB SMRT sequencing was ≤ 1 × 10⁻⁷ mutations per base pair (mut/bp).
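As a rough illustration of the unit reported in this abstract, a mutation frequency in mut/bp is the number of substitutions observed divided by the number of consensus bases interrogated. The sketch below is not the authors' pipeline; the function name and the example counts are hypothetical and chosen only to show the arithmetic.

```python
def mutation_frequency(mutations: int, bases_interrogated: int) -> float:
    """Return the mutation frequency in mutations per base pair (mut/bp)."""
    if bases_interrogated <= 0:
        raise ValueError("bases_interrogated must be positive")
    if mutations < 0:
        raise ValueError("mutations must be non-negative")
    return mutations / bases_interrogated


# Hypothetical counts for illustration only; they are not taken from the study.
background = mutation_frequency(mutations=3, bases_interrogated=50_000_000)
exposed = mutation_frequency(mutations=450, bases_interrogated=50_000_000)

print(f"background MF: {background:.1e} mut/bp")
print(f"exposed MF:    {exposed:.1e} mut/bp")
print(f"fold increase over background: {exposed / background:.0f}x")
```

With counts like these, the background value falls below the 1 × 10⁻⁷ mut/bp level quoted in the abstract, which is the kind of comparison a mutagen-exposed sample would be judged against.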
Affiliation(s)
- Javier R Revollo
- Division of Genetic and Molecular Toxicology, National Center for Toxicological Research, U.S. Food and Drug Administration, Jefferson, Arkansas, USA
| | - Jaime A Miranda
- Division of Genetic and Molecular Toxicology, National Center for Toxicological Research, U.S. Food and Drug Administration, Jefferson, Arkansas, USA
| | - Vasily N Dobrovolsky
- Division of Genetic and Molecular Toxicology, National Center for Toxicological Research, U.S. Food and Drug Administration, Jefferson, Arkansas, USA
|
126
|
Fu Y, Mahmoud M, Muraliraman VV, Sedlazeck FJ, Treangen TJ. Vulcan: Improved long-read mapping and structural variant calling via dual-mode alignment. Gigascience 2021; 10:6375129. [PMID: 34561697 PMCID: PMC8463296 DOI: 10.1093/gigascience/giab063] [Citation(s) in RCA: 6] [Impact Index Per Article: 2.0] [Received: 06/02/2021] [Revised: 07/22/2021] [Accepted: 08/29/2021] [Indexed: 01/23/2023] Open
Abstract
BACKGROUND Long-read sequencing has enabled unprecedented surveys of structural variation across the entire human genome. To maximize the potential of long-read sequencing in this context, novel mapping methods have emerged that have primarily focused on either speed or accuracy. Various heuristics and scoring schemas have been implemented in widely used read mappers (minimap2 and NGMLR) to optimize for speed or accuracy, which have variable performance across different genomic regions and for specific structural variants. Our hypothesis is that constraining read mapping to the use of a single gap penalty across distinct mutational hot spots reduces read alignment accuracy and impedes structural variant detection. FINDINGS We tested our hypothesis by implementing a read-mapping pipeline called Vulcan that uses two distinct gap penalty modes, which we refer to as dual-mode alignment. The high-level idea is that Vulcan leverages the computed normalized edit distance of the mapped reads via minimap2 to identify poorly aligned reads and realigns them using the more accurate yet computationally more expensive long-read mapper (NGMLR). In support of our hypothesis, we show that Vulcan improves the alignments for Oxford Nanopore Technology long reads for both simulated and real datasets. These improvements, in turn, lead to improved accuracy for structural variant calling performance on human genome datasets compared to either of the read-mapping methods alone. CONCLUSIONS Vulcan is the first long-read mapping framework that combines two distinct gap penalty modes for improved structural variant recall and precision. Vulcan is open-source and available under the MIT License at https://gitlab.com/treangenlab/vulcan.
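The selection step described in this abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not Vulcan's actual code: the `Alignment` record, the `split_for_realignment` helper, and the fixed cutoff of 0.1 are all hypothetical, and the edit distance is assumed to be available per read (for example from the `NM` tag that SAM-producing mappers can report).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Alignment:
    """Hypothetical per-read summary of a first-pass alignment."""
    read_id: str
    edit_distance: int   # mismatches plus inserted and deleted bases
    aligned_length: int  # length of the aligned portion of the read


def normalized_edit_distance(aln: Alignment) -> float:
    """Edit distance scaled by aligned length, so long and short reads compare fairly."""
    if aln.aligned_length <= 0:
        return 1.0  # treat an empty alignment as maximally poor
    return aln.edit_distance / aln.aligned_length


def split_for_realignment(alignments, cutoff):
    """Partition reads into those kept as-is and those flagged for a second aligner."""
    keep, realign = [], []
    for aln in alignments:
        if normalized_edit_distance(aln) > cutoff:
            realign.append(aln)
        else:
            keep.append(aln)
    return keep, realign


# Made-up example reads, for illustration only.
reads = [
    Alignment("read_a", edit_distance=50, aligned_length=10_000),
    Alignment("read_b", edit_distance=1_800, aligned_length=9_000),
    Alignment("read_c", edit_distance=300, aligned_length=12_000),
]
keep, realign = split_for_realignment(reads, cutoff=0.1)
print("kept:", [a.read_id for a in keep])
print("flagged for realignment:", [a.read_id for a in realign])
```

In a dual-mode pipeline of the kind the abstract describes, only the flagged reads would be passed to the slower, more accurate mapper, which keeps the extra compute cost proportional to the fraction of poorly aligned reads.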
Affiliation(s)
- Yilei Fu
- Department of Computer Science, Rice University, Houston, TX 77251-1892, USA
| | - Medhat Mahmoud
- Human Genome Sequencing Center, Baylor College of Medicine, Houston, TX 77030, USA.,Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX 77030, USA
| | | | - Fritz J Sedlazeck
- Human Genome Sequencing Center, Baylor College of Medicine, Houston, TX 77030, USA
| | - Todd J Treangen
- Department of Computer Science, Rice University, Houston, TX 77251-1892, USA
|
127
|
Lima L, Marchet C, Caboche S, Da Silva C, Istace B, Aury JM, Touzet H, Chikhi R. Comparative assessment of long-read error correction software applied to Nanopore RNA-sequencing data. Brief Bioinform 2021; 21:1164-1181. [PMID: 31232449 DOI: 10.1093/bib/bbz058] [Citation(s) in RCA: 24] [Impact Index Per Article: 8.0] [Received: 12/28/2018] [Revised: 04/05/2019] [Accepted: 04/22/2019] [Indexed: 12/13/2022] Open
Abstract
MOTIVATION Nanopore long-read sequencing technology offers promising alternatives to high-throughput short-read sequencing, especially in the context of RNA-sequencing. However, this technology is currently hindered by high error rates in the output data that affect analyses such as the identification of isoforms, exon boundaries, open reading frames and creation of gene catalogues. Due to the novelty of such data, computational methods are still actively being developed and options for the error correction of Nanopore RNA-sequencing long reads remain limited. RESULTS In this article, we evaluate the extent to which existing long-read DNA error correction methods are capable of correcting cDNA Nanopore reads. We provide an automatic and extensive benchmark tool that reports not only classical error correction metrics but also the effect of correction on gene families, isoform diversity, bias toward the major isoform and splice site detection. We find that long-read error correction tools that were originally developed for DNA are also suitable for the correction of Nanopore RNA-sequencing data, especially in terms of increasing base pair accuracy. Yet investigators should be warned that the correction process perturbs gene family sizes and isoform diversity. This work provides guidelines on which (or whether) error correction tools should be used, depending on the application type. BENCHMARKING SOFTWARE https://gitlab.com/leoisl/LR_EC_analyser.
Affiliation(s)
- Leandro Lima
- Univ Lyon, Université Lyon 1, CNRS, Laboratoire de Biométrie et Biologie Evolutive UMR Villeurbanne, France.,EPI ERABLE - Inria Grenoble, Rhône-Alpes, France.,Università di Roma 'Tor Vergata', Roma, Italy
| | | | - Ségolène Caboche
- Université de Lille, CNRS, Inserm, CHU Lille, Institut Pasteur de Lille, UMR, Center for Infection and Immunity of Lille, Lille, France
| | - Corinne Da Silva
- Genoscope, Institut de biologie Francois-Jacob, Commissariat à l'Energie Atomique (CEA), Université Paris-Saclay, Evry, France
| | - Benjamin Istace
- Genoscope, Institut de biologie Francois-Jacob, Commissariat à l'Energie Atomique (CEA), Université Paris-Saclay, Evry, France
| | - Jean-Marc Aury
- Genoscope, Institut de biologie Francois-Jacob, Commissariat à l'Energie Atomique (CEA), Université Paris-Saclay, Evry, France
| | - Hélène Touzet
- CNRS, Université de Lille, CRIStAL UMR, Lille, France
| | - Rayan Chikhi
- CNRS, Université de Lille, CRIStAL UMR, Lille, France.,Institut Pasteur, C3BI - USR 3756, 25-28 rue du Docteur Roux, Paris, France
|
128
|
Genomic and transcriptomic analyses reveal a tandem amplification unit of 11 genes and mutations in mismatch repair genes in methotrexate-resistant HT-29 cells. Exp Mol Med 2021; 53:1344-1355. [PMID: 34521988 PMCID: PMC8492700 DOI: 10.1038/s12276-021-00668-x] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 01/22/2021] [Revised: 06/04/2021] [Accepted: 06/21/2021] [Indexed: 12/16/2022] Open
Abstract
DHFR gene amplification is commonly present in methotrexate (MTX)-resistant colon cancer cells and acute lymphoblastic leukemia. In this study, we proposed an integrative framework to characterize the amplified region by using a combination of single-molecule real-time sequencing, next-generation optical mapping, and chromosome conformation capture (Hi-C). We identified an amplification unit spanning 11 genes, from the DHFR gene to the ATP6AP1L gene position, with high adjusted interaction frequencies on chromosome 5 (~2.2 Mbp) and a twenty-fold tandemly amplified region. We also detected novel inversions at the start and end positions of the amplified region, as well as frameshift insertions in most of the MSH and MLH genes. These mutations might stimulate chromosomal breakage and cause the dysregulation of mismatch repair. Characterizing the tandem gene-amplified unit may be critical for identifying the mechanisms that trigger genomic rearrangements. These findings may provide new insight into the mechanisms underlying the amplification process and the evolution of drug resistance.
Sequencing a large region of DNA containing many surplus copies of genes linked to drug resistance in colon cancer cells may illuminate how these genomic rearrangements arise. Such regions of gene amplification are highly repetitive, making them impossible to sequence using ordinary methods, and little is known about how they are generated. Using advanced methods, Jeong-Sun Seo at Seoul National University Bundang Hospital in South Korea and co-workers sequenced a region of gene amplification in colon cancer cells. The amplified region was approximately 20 times the length of that in healthy cells and contained many copies of an eleven-gene segment, including a gene implicated in drug resistance. The region also contained mutations in chromosomal repair genes which would disrupt repair pathways. These results illuminate the genetic changes that lead to gene amplification and drug resistance in cancer cells.
|
129
|
Yan SM, Sherman RM, Taylor DJ, Nair DR, Bortvin AN, Schatz MC, McCoy RC. Local adaptation and archaic introgression shape global diversity at human structural variant loci. eLife 2021; 10:e67615. [PMID: 34528508 PMCID: PMC8492059 DOI: 10.7554/elife.67615] [Citation(s) in RCA: 23] [Impact Index Per Article: 7.7] [Received: 02/18/2021] [Accepted: 09/14/2021] [Indexed: 12/13/2022] Open
Abstract
Large genomic insertions and deletions are a potent source of functional variation, but are challenging to resolve with short-read sequencing, limiting knowledge of the role of such structural variants (SVs) in human evolution. Here, we used a graph-based method to genotype long-read-discovered SVs in short-read data from diverse human genomes. We then applied an admixture-aware method to identify 220 SVs exhibiting extreme patterns of frequency differentiation, a signature of local adaptation. The top two variants traced to the immunoglobulin heavy chain locus, tagging a haplotype that swept to near fixation in certain southeast Asian populations but is rare in other global populations. Further investigation revealed evidence that the haplotype traces to gene flow from Neanderthals, corroborating the role of immune-related genes as prominent targets of adaptive introgression. Our study demonstrates how recent technical advances can help resolve signatures of key evolutionary events that remained obscured within technically challenging regions of the genome.
Affiliation(s)
- Stephanie M Yan
- Department of Biology, Johns Hopkins University, Baltimore, United States
| | - Rachel M Sherman
- Department of Computer Science, Johns Hopkins University, Baltimore, United States
| | - Dylan J Taylor
- Department of Biology, Johns Hopkins University, Baltimore, United States
| | - Divya R Nair
- Department of Biology, Johns Hopkins University, Baltimore, United States
| | - Andrew N Bortvin
- Department of Biology, Johns Hopkins University, Baltimore, United States
| | - Michael C Schatz
- Department of Biology, Johns Hopkins University, Baltimore, United States
- Department of Computer Science, Johns Hopkins University, Baltimore, United States
| | - Rajiv C McCoy
- Department of Biology, Johns Hopkins University, Baltimore, United States
|
130
|
Mahmoud M, Doddapaneni H, Timp W, Sedlazeck FJ. PRINCESS: comprehensive detection of haplotype resolved SNVs, SVs, and methylation. Genome Biol 2021; 22:268. [PMID: 34521442 PMCID: PMC8442460 DOI: 10.1186/s13059-021-02486-w] [Citation(s) in RCA: 23] [Impact Index Per Article: 7.7] [Received: 05/01/2021] [Accepted: 09/02/2021] [Indexed: 12/11/2022] Open
Abstract
Long-read sequencing has been shown to have advantages in structural variation (SV) detection and methylation calling. Many studies focus on either SVs, methylation, or phasing of SNVs; however, only the combination of variants provides a comprehensive insight into the sample and thus enables novel findings in biology or medicine. PRINCESS is a structured workflow that takes raw sequence reads and generates a fully phased SNV, SV, and methylation call set within a few hours. PRINCESS achieves high accuracy and long phasing even on low-coverage datasets and can resolve repetitive, complex, medically relevant genes that often escape detection. PRINCESS is publicly available at https://github.com/MeHelmy/princess under the MIT license.
Affiliation(s)
- Medhat Mahmoud
- Human Genome Sequencing Center, Baylor College of Medicine, Houston, TX, 77030, USA.
- Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX, 77030, USA.
| | | | - Winston Timp
- Department of Biomedical Engineering, Johns Hopkins University, Baltimore, MD, 21218, USA
| | - Fritz J Sedlazeck
- Human Genome Sequencing Center, Baylor College of Medicine, Houston, TX, 77030, USA.
|
131
|
Délot EC, Vilain E. Towards improved genetic diagnosis of human differences of sex development. Nat Rev Genet 2021; 22:588-602. [PMID: 34083777 PMCID: PMC10598994 DOI: 10.1038/s41576-021-00365-5] [Citation(s) in RCA: 27] [Impact Index Per Article: 9.0] [Accepted: 04/14/2021] [Indexed: 02/05/2023]
Abstract
Despite being collectively among the most frequent congenital developmental conditions worldwide, differences of sex development (DSD) lack recognition and research funding. As a result, what constitutes optimal management remains uncertain. Identification of the individual conditions under the DSD umbrella is challenging and molecular genetic diagnosis is frequently not achieved, which has psychosocial and health-related repercussions for patients and their families. New genomic approaches have the potential to resolve this impasse through better detection of protein-coding variants and ascertainment of under-recognized aetiology, such as mosaic, structural, non-coding or epigenetic variants. Ultimately, it is hoped that better outcomes data, improved understanding of the molecular causes and greater public awareness will bring an end to the stigma often associated with DSD.
Affiliation(s)
- Emmanuèle C Délot
- Center for Genetic Medicine Research, Children's Research Institute, Children's National Hospital, Washington, DC, USA
- Department of Genomics and Precision Medicine, School of Medicine and Health Sciences, George Washington University, Washington, DC, USA
| | - Eric Vilain
- Center for Genetic Medicine Research, Children's Research Institute, Children's National Hospital, Washington, DC, USA.
- Department of Genomics and Precision Medicine, School of Medicine and Health Sciences, George Washington University, Washington, DC, USA.
|
132
|
Abstract
Long-read sequencing technologies have now reached a level of accuracy and yield that allows their application to variant detection at a scale of tens to thousands of samples. Concomitant with the development of new computational tools, the first population-scale studies involving long-read sequencing have emerged over the past 2 years and, given the continuous advancement of the field, many more are likely to follow. In this Review, we survey recent developments in population-scale long-read sequencing, highlight potential challenges of a scaled-up approach and provide guidance regarding experimental design. We provide an overview of current long-read sequencing platforms, variant calling methodologies and approaches for de novo assemblies and reference-based mapping approaches. Furthermore, we summarize strategies for variant validation, genotyping and predicting functional impact and emphasize challenges remaining in achieving long-read sequencing at a population scale.
Affiliation(s)
- Wouter De Coster
- Applied and Translational Neurogenomics Group, VIB Center for Molecular Neurology, VIB, Antwerp, Belgium
- Applied and Translational Neurogenomics Group, Department of Biomedical Sciences, University of Antwerp, Antwerp, Belgium
| | | | - Fritz J Sedlazeck
- Human Genome Sequencing Center, Baylor College of Medicine, Houston, TX, USA.
|
133
|
Hu D, Jing J, Snowdon RJ, Mason AS, Shen J, Meng J, Zou J. Exploring the gene pool of Brassica napus by genomics-based approaches. PLANT BIOTECHNOLOGY JOURNAL 2021; 19:1693-1712. [PMID: 34031989 PMCID: PMC8428838 DOI: 10.1111/pbi.13636] [Citation(s) in RCA: 19] [Impact Index Per Article: 6.3] [Received: 12/09/2020] [Revised: 05/13/2021] [Accepted: 05/14/2021] [Indexed: 05/08/2023]
Abstract
De novo allopolyploidization in Brassica provides a very successful model for reconstructing polyploid genomes using progenitor species and relatives to broaden crop gene pools and understand genome evolution after polyploidy, interspecific hybridization and exotic introgression. B. napus (AACC), the major cultivated rapeseed species and the third largest oilseed crop in the world, is a young Brassica species with a limited genetic base resulting from its short history of domestication, cultivation, and intensive selection during breeding for target economic traits. However, the gene pool of B. napus has been significantly enriched in recent decades, benefiting from worldwide efforts to introduce abundant subgenomic variation and novel genomic variation via intraspecific, interspecific and intergeneric crosses. An important question in this respect is how to utilize such variation to breed crops adapted to the changing global climate. Here, we review the genetic diversity, genome structure, and population-level differentiation of the B. napus gene pool in relation to known exotic introgressions from various species of the Brassicaceae, especially those elucidated by recent genome-sequencing projects. We also summarize progress in gene cloning, trait-marker associations, gene editing, molecular marker-assisted selection and genome-wide prediction, and describe the challenges and opportunities of these techniques as molecular platforms to exploit novel genomic variation and their value in the rapeseed gene pool. Future progress will accelerate the creation and manipulation of genetic diversity with genomic-based improvement, as well as provide novel insights into the neo-domestication of polyploid crops with novel genetic diversity from reconstructed genomes.
Affiliation(s)
- Dandan Hu
- National Key Laboratory of Crop Genetic Improvement, College of Plant Science & Technology, Huazhong Agricultural University, Wuhan, China
| | - Jinjie Jing
- National Key Laboratory of Crop Genetic Improvement, College of Plant Science & Technology, Huazhong Agricultural University, Wuhan, China
| | - Rod J. Snowdon
- Department of Plant Breeding, IFZ Research Centre for Biosystems, Land Use and Nutrition, Justus Liebig University, Giessen, Germany
| | - Annaliese S. Mason
- Department of Plant Breeding, IFZ Research Centre for Biosystems, Land Use and Nutrition, Justus Liebig University, Giessen, Germany
- Plant Breeding Department, INRES, The University of Bonn, Bonn, Germany
| | - Jinxiong Shen
- National Key Laboratory of Crop Genetic Improvement, College of Plant Science & Technology, Huazhong Agricultural University, Wuhan, China
| | - Jinling Meng
- National Key Laboratory of Crop Genetic Improvement, College of Plant Science & Technology, Huazhong Agricultural University, Wuhan, China
| | - Jun Zou
- National Key Laboratory of Crop Genetic Improvement, College of Plant Science & Technology, Huazhong Agricultural University, Wuhan, China
|
134
|
Abstract
The reference human genome sequence is inarguably the most important and widely used resource in the fields of human genetics and genomics. It has transformed the conduct of biomedical sciences and brought invaluable benefits to the understanding and improvement of human health. However, the commonly used reference sequence has profound limitations, because across much of its span, it represents the sequence of just one human haplotype. This single, monoploid reference structure presents a critical barrier to representing the broad genomic diversity in the human population. In this review, we discuss the modernization of the reference human genome sequence to a more complete reference of human genomic diversity, known as a human pangenome.
Affiliation(s)
- Karen H Miga
- UC Santa Cruz Genomics Institute and Department of Biomedical Engineering, University of California, Santa Cruz, California 95064, USA;
| | - Ting Wang
- Department of Genetics, Edison Family Center for Genome Sciences and Systems Biology, and McDonnell Genome Institute, Washington University School of Medicine, St. Louis, Missouri 63110, USA;
|
135
|
Alser M, Rotman J, Deshpande D, Taraszka K, Shi H, Baykal PI, Yang HT, Xue V, Knyazev S, Singer BD, Balliu B, Koslicki D, Skums P, Zelikovsky A, Alkan C, Mutlu O, Mangul S. Technology dictates algorithms: recent developments in read alignment. Genome Biol 2021; 22:249. [PMID: 34446078 PMCID: PMC8390189 DOI: 10.1186/s13059-021-02443-7] [Citation(s) in RCA: 37] [Impact Index Per Article: 12.3] [Received: 10/15/2020] [Accepted: 07/28/2021] [Indexed: 01/08/2023] Open
Abstract
Aligning sequencing reads onto a reference is an essential step of the majority of genomic analysis pipelines. Computational algorithms for read alignment have evolved in accordance with technological advances, leading to today's diverse array of alignment methods. We provide a systematic survey of algorithmic foundations and methodologies across 107 alignment methods, for both short and long reads. We provide a rigorous experimental evaluation of 11 read aligners to demonstrate the effect of these underlying algorithms on speed and efficiency of read alignment. We discuss how general alignment algorithms have been tailored to the specific needs of various domains in biology.
Affiliation(s)
- Mohammed Alser
- Computer Science Department, ETH Zürich, 8092, Zürich, Switzerland
- Computer Engineering Department, Bilkent University, 06800 Bilkent, Ankara, Turkey
- Information Technology and Electrical Engineering Department, ETH Zürich, Zürich, 8092, Switzerland
| | - Jeremy Rotman
- Department of Computer Science, University of California Los Angeles, Los Angeles, CA, 90095, USA
| | - Dhrithi Deshpande
- Department of Clinical Pharmacy, School of Pharmacy, University of Southern California, Los Angeles, CA, 90089, USA
| | - Kodi Taraszka
- Department of Computer Science, University of California Los Angeles, Los Angeles, CA, 90095, USA
| | - Huwenbo Shi
- Department of Epidemiology, Harvard T.H. Chan School of Public Health, Boston, MA, 02115, USA
- Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA, 02142, USA
| | - Pelin Icer Baykal
- Department of Computer Science, Georgia State University, Atlanta, GA, 30302, USA
| | - Harry Taegyun Yang
- Department of Computer Science, University of California Los Angeles, Los Angeles, CA, 90095, USA
- Bioinformatics Interdepartmental Ph.D. Program, University of California Los Angeles, Los Angeles, CA, 90095, USA
| | - Victor Xue
- Department of Computer Science, University of California Los Angeles, Los Angeles, CA, 90095, USA
| | - Sergey Knyazev
- Department of Computer Science, Georgia State University, Atlanta, GA, 30302, USA
| | - Benjamin D Singer
- Division of Pulmonary and Critical Care Medicine, Northwestern University Feinberg School of Medicine, Chicago, IL, 60611, USA
- Department of Biochemistry & Molecular Genetics, Northwestern University Feinberg School of Medicine, Chicago, USA
- Simpson Querrey Institute for Epigenetics, Northwestern University Feinberg School of Medicine, Chicago, IL, 60611, USA
| | - Brunilda Balliu
- Department of Computational Medicine, University of California Los Angeles, Los Angeles, CA, 90095, USA
| | - David Koslicki
- Computer Science and Engineering, Pennsylvania State University, University Park, PA, 16801, USA
- Biology Department, Pennsylvania State University, University Park, PA, 16801, USA
- The Huck Institutes of the Life Sciences, Pennsylvania State University, University Park, PA, 16801, USA
| | - Pavel Skums
- Department of Computer Science, Georgia State University, Atlanta, GA, 30302, USA
| | - Alex Zelikovsky
- Department of Computer Science, Georgia State University, Atlanta, GA, 30302, USA
- The Laboratory of Bioinformatics, I.M. Sechenov First Moscow State Medical University, Moscow, 119991, Russia
| | - Can Alkan
- Computer Engineering Department, Bilkent University, 06800 Bilkent, Ankara, Turkey
- Bilkent-Hacettepe Health Sciences and Technologies Program, Ankara, Turkey
| | - Onur Mutlu
- Computer Science Department, ETH Zürich, 8092, Zürich, Switzerland
- Computer Engineering Department, Bilkent University, 06800 Bilkent, Ankara, Turkey
- Information Technology and Electrical Engineering Department, ETH Zürich, Zürich, 8092, Switzerland
| | - Serghei Mangul
- Department of Clinical Pharmacy, School of Pharmacy, University of Southern California, Los Angeles, CA, 90089, USA.
|
136
|
Locke RK, Greig DR, Jenkins C, Dallman TJ, Cowley LA. Acquisition and loss of CTX-M plasmids in Shigella species associated with MSM transmission in the UK. Microb Genom 2021; 7. [PMID: 34427554 PMCID: PMC8549364 DOI: 10.1099/mgen.0.000644] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Indexed: 02/03/2023] Open
Abstract
Shigellosis in men who have sex with men (MSM) is caused by multidrug-resistant Shigellae, exhibiting resistance to antimicrobials including azithromycin, ciprofloxacin and, more recently, the third-generation cephalosporins. To explore AMR context, we sequenced four blaCTX-M-27-positive MSM Shigella isolates (2018–20) using Oxford Nanopore Technologies: three S. sonnei (identified as two MSM clade 2, one MSM clade 5) and one S. flexneri 3a. All S. sonnei isolates harboured Tn7/Int2 chromosomal integrons, whereas S. flexneri 3a contained the Shigella Resistance Locus. All strains harboured IncFII pKSR100-like plasmids (67-83 kbp); where present, blaCTX-M-27 was located on these plasmids flanked by IS26 and IS903B. However, blaCTX-M-27 was lost in S. flexneri 3a during storage between Illumina and Nanopore sequencing. IncFII AMR regions were mosaic and likely reorganised by IS26; three of the four plasmids contained azithromycin-resistance genes erm(B) and mph(A) and one harboured the pKSR100 integron. Additionally, all S. sonnei isolates possessed a large IncB/O/K/Z plasmid, two of which carried aph(3’)-Ib/aph(6)-Id/sul2 and tet(A). Monitoring the transmission of mobile genetic elements with co-located AMR determinants is necessary to inform empirical treatment guidance and clinical management of MSM-associated shigellosis.
Affiliation(s)
| | - David R Greig
- Gastrointestinal Reference Services, Public Health England, London, UK.,Division of Infection and Immunity, The Roslin Institute and Royal (Dick) School of Veterinary Studies, University of Edinburgh, Easter Bush, EH25 9RG, UK
| | - Claire Jenkins
- Gastrointestinal Reference Services, Public Health England, London, UK
| | - Tim J Dallman
- Gastrointestinal Reference Services, Public Health England, London, UK.,Division of Infection and Immunity, The Roslin Institute and Royal (Dick) School of Veterinary Studies, University of Edinburgh, Easter Bush, EH25 9RG, UK
| | | |
Collapse
|
137
|
Reddy S, Hung LH, Sala-Torra O, Radich JP, Yeung CC, Yeung KY. A graphical, interactive and GPU-enabled workflow to process long-read sequencing data. BMC Genomics 2021; 22:626. [PMID: 34425749 PMCID: PMC8381503 DOI: 10.1186/s12864-021-07927-1] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Received: 05/27/2021] [Accepted: 08/10/2021] [Indexed: 12/18/2022] Open
Abstract
BACKGROUND Long-read sequencing has great promise in enabling portable, rapid molecular-assisted cancer diagnoses. A key challenge in democratizing long-read sequencing technology in the biomedical and clinical community is the lack of graphical bioinformatics software tools that can efficiently process the raw nanopore reads and support graphical output and interactive visualizations for interpretation of results. Another obstacle is that high-performance software tools for long-read sequencing data analyses often leverage graphics processing units (GPUs), which are challenging and time-consuming to configure, especially on the cloud. RESULTS We present a graphical cloud-enabled workflow for fast, interactive analysis of nanopore sequencing data using GPUs. Users customize parameters, monitor execution and visualize results through an accessible graphical interface. The workflow and its components are completely containerized to ensure reproducibility and facilitate installation of the GPU-enabled software. We also provide an Amazon Machine Image (AMI) with all software and drivers pre-installed for GPU computing on the cloud. Most importantly, we demonstrate the potential of applying our software tools to reduce the turnaround time of cancer diagnostics by generating blood cancer (NB4, K562, ME1, MV4;11) cell line Nanopore data using the Flongle adapter. We observe a 29x speedup and a 93x reduction in costs for the rate-limiting basecalling step in the analysis of blood cancer cell line data. CONCLUSIONS Our interactive and efficient software tools will make analyses of Nanopore data using GPU and cloud computing accessible to biomedical and clinical scientists, thus facilitating the adoption of cost-effective, fast, portable and real-time long-read sequencing. SUPPLEMENTARY INFORMATION The online version contains supplementary material available at 10.1186/s12864-021-07927-1.
Affiliation(s)
- Ling-Hong Hung
  - School of Engineering and Technology, University of Washington, 98402, Tacoma, WA, USA
- Olga Sala-Torra
  - Clinical Research Division, Fred Hutchinson Cancer Research Center, 98109, Seattle, WA, USA
- Jerald P Radich
  - Clinical Research Division, Fred Hutchinson Cancer Research Center, 98109, Seattle, WA, USA
  - Clinical Research Division, Kurt Enslein Endowed Chair, Fred Hutchinson Cancer Research Center, 98109, Seattle, WA, USA
  - Department of Medicine, University of Washington, 98109, Seattle, WA, USA
- Cecilia CS Yeung
  - Clinical Research Division, Fred Hutchinson Cancer Research Center, 98109, Seattle, WA, USA
  - Department of Laboratory Medicine and Pathology, University of Washington, 98109, Seattle, WA, USA
- Ka Yee Yeung
  - School of Engineering and Technology, University of Washington, 98402, Tacoma, WA, USA
|
138
|
Wold J, Koepfli KP, Galla SJ, Eccles D, Hogg CJ, Le Lec MF, Guhlin J, Santure AW, Steeves TE. Expanding the conservation genomics toolbox: Incorporating structural variants to enhance genomic studies for species of conservation concern. Mol Ecol 2021; 30:5949-5965. [PMID: 34424587 PMCID: PMC9290615 DOI: 10.1111/mec.16141] [Citation(s) in RCA: 21] [Impact Index Per Article: 7.0] [Received: 01/20/2021] [Revised: 07/28/2021] [Accepted: 08/18/2021] [Indexed: 12/28/2022]
Abstract
Structural variants (SVs) are large rearrangements (>50 bp) within the genome that impact gene function and the content and structure of chromosomes. As a result, SVs are a significant source of functional genomic variation, that is, variation at genomic regions underpinning phenotype differences, that can have large effects on individual and population fitness. While there are increasing opportunities to investigate functional genomic variation in threatened species via single nucleotide polymorphism (SNP) data sets, SVs remain understudied despite their potential influence on fitness traits of conservation interest. In this future-focused Opinion, we contend that characterizing SVs offers the conservation genomics community an exciting opportunity to complement SNP-based approaches to enhance species recovery. We also leverage the existing literature (predominantly in human health, agriculture and ecoevolutionary biology) to identify approaches for readily characterizing SVs and consider how integrating these into the conservation genomics toolbox may transform the way we manage some of the world's most threatened species.
Affiliation(s)
- Jana Wold
  - School of Biological Sciences, University of Canterbury, Christchurch, New Zealand
- Klaus-Peter Koepfli
  - Smithsonian-Mason School of Conservation, Front Royal, Virginia, USA
  - Centre for Species Survival, Smithsonian Conservation Biology Institute, National Zoological Park, Washington, District of Columbia, USA
  - Computer Technologies Laboratory, ITMO University, Saint Petersburg, Russia
- Stephanie J Galla
  - School of Biological Sciences, University of Canterbury, Christchurch, New Zealand
  - Department of Biological Sciences, Boise State University, Boise, Idaho, USA
- David Eccles
  - Malaghan Institute of Medical Research, Wellington, New Zealand
- Carolyn J Hogg
  - School of Life and Environmental Sciences, The University of Sydney, Sydney, NSW, Australia
- Marissa F Le Lec
  - Department of Biochemistry, University of Otago, Dunedin, Otago, New Zealand
- Joseph Guhlin
  - Department of Biochemistry, University of Otago, Dunedin, Otago, New Zealand
  - Genomics Aotearoa, Dunedin, Otago, New Zealand
- Anna W Santure
  - School of Biological Sciences, The University of Auckland, Auckland, New Zealand
- Tammy E Steeves
  - School of Biological Sciences, University of Canterbury, Christchurch, New Zealand
|
139
|
Zhang S, Liu W, Liu X, Du X, Zhang K, Zhang Y, Song Y, Zi Y, Qiu Q, Lenstra JA, Liu J. Structural Variants Selected during Yak Domestication Inferred from Long-Read Whole-Genome Sequencing. Mol Biol Evol 2021; 38:3676-3680. [PMID: 33944937 PMCID: PMC8382902 DOI: 10.1093/molbev/msab134] [Citation(s) in RCA: 24] [Impact Index Per Article: 8.0] [Indexed: 12/12/2022] Open
Abstract
Structural variants (SVs) represent an important genetic resource for both natural and artificial selection. Here we present a chromosome-scale reference genome for domestic yak (Bos grunniens) that has longer contigs and scaffolds (N50 44.72 and 114.39 Mb, respectively) than reported for any other ruminant genome. We further obtained long-read resequencing data for 6 wild and 23 domestic yaks and constructed a genetic SV map of 372,220 SVs that covers the geographic range of the yaks. The majority of the SVs contain repetitive sequences and several are in or near genes. By comparing SVs in domestic and wild yaks, we identified genes that are predominantly related to the nervous system, behavior, immunity, and reproduction and may have been targeted by artificial selection during yak domestication. These findings provide new insights into the domestication of animals living at high altitude and highlight the importance of SVs in animal domestication.
Affiliation(s)
- Shangzhe Zhang
  - State Key Laboratory of Grassland and Agro-ecosystem, Institute of Innovation Ecology and School of Life Science, Lanzhou University, Lanzhou, China
- Wenyu Liu
  - State Key Laboratory of Grassland and Agro-ecosystem, Institute of Innovation Ecology and School of Life Science, Lanzhou University, Lanzhou, China
- Xinfeng Liu
  - State Key Laboratory of Grassland and Agro-ecosystem, Institute of Innovation Ecology and School of Life Science, Lanzhou University, Lanzhou, China
- Xin Du
  - State Key Laboratory of Grassland and Agro-ecosystem, Institute of Innovation Ecology and School of Life Science, Lanzhou University, Lanzhou, China
- Ke Zhang
  - State Key Laboratory of Grassland and Agro-ecosystem, Institute of Innovation Ecology and School of Life Science, Lanzhou University, Lanzhou, China
- Yang Zhang
  - The Supercomputing Center, Lanzhou University, Lanzhou, China
- Yongwu Song
  - Animal Disease Prevention and Control Center of Gangcha County, Haibei Tibetan Autonomous Prefecture, China
- Yunnan Zi
  - Animal Husbandry Workstation of Xiahe County, Gannan Tibetan Autonomous Prefecture, China
- Qiang Qiu
  - State Key Laboratory of Grassland and Agro-ecosystem, Institute of Innovation Ecology and School of Life Science, Lanzhou University, Lanzhou, China
- Johannes A Lenstra
  - Faculty of Veterinary Medicine, Utrecht University, Utrecht, The Netherlands
- Jianquan Liu
  - State Key Laboratory of Grassland and Agro-ecosystem, Institute of Innovation Ecology and School of Life Science, Lanzhou University, Lanzhou, China
|
140
|
Hirakawa H, Toyoda A, Itoh T, Suzuki Y, Nagano AJ, Sugiyama S, Onodera Y. A spinach genome assembly with remarkable completeness, and its use for rapid identification of candidate genes for agronomic traits. DNA Res 2021; 28:6303609. [PMID: 34142133 PMCID: PMC8231376 DOI: 10.1093/dnares/dsab004] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.7] [Received: 01/05/2021] [Indexed: 01/23/2023] Open
Abstract
Spinach (Spinacia oleracea) is grown as a nutritious leafy vegetable worldwide. To accelerate spinach breeding efficiency, a high-quality reference genome sequence with great completeness and continuity is needed as a basic infrastructure. Here, we used long-read and linked-read technologies to construct a de novo spinach genome assembly, designated SOL_r1.1, which was comprised of 287 scaffolds (total size: 935.7 Mb; N50 = 11.3 Mb) with a low proportion of undetermined nucleotides (Ns = 0.34%) and with high gene completeness (BUSCO complete 96.9%). A genome-wide survey of resistance gene analogues identified 695 genes encoding nucleotide-binding site domains, receptor-like protein kinases, receptor-like proteins and transmembrane-coiled coil domains. Based on a high-density double-digest restriction-site associated DNA sequencing-based linkage map, the genome assembly was anchored to six pseudomolecules representing ∼73.5% of the whole genome assembly. In addition, we used SOL_r1.1 to identify quantitative trait loci for bolting timing and fruit/seed shape, which harbour biologically plausible candidate genes, such as homologues of the FLOWERING LOCUS T and EPIDERMAL PATTERNING FACTOR-LIKE genes. The new genome assembly, SOL_r1.1, will serve as a useful resource for identifying loci associated with important agronomic traits and for developing molecular markers for spinach breeding/selection programs.
Affiliation(s)
- Hideki Hirakawa
  - The Department of Technology Development, Kazusa DNA Research Institute, Kisarazu, Chiba 292-0818, Japan
- Atsushi Toyoda
  - Department of Genomics and Evolutionary Biology, National Institute of Genetics, Mishima 411-8540, Japan
- Takehiko Itoh
  - School of Life Science and Technology, Tokyo Institute of Technology, Meguro-ku, Tokyo 152-8550, Japan
- Yutaka Suzuki
  - The Department of Computational Biology and Medical Sciences, Graduate School of Frontier Sciences, The University of Tokyo, Kashiwa 277-8568, Japan
- Atsushi J Nagano
  - Faculty of Agriculture, Ryukoku University, Otsu, Shiga 520-2194, Japan
- Suguru Sugiyama
  - School of Agriculture, Hokkaido University, Sapporo 060-8589, Japan
- Yasuyuki Onodera
  - The Research Faculty of Agriculture, Hokkaido University, Sapporo 060-8589, Japan
|
141
|
Karousis ED, Gypas F, Zavolan M, Mühlemann O. Nanopore sequencing reveals endogenous NMD-targeted isoforms in human cells. Genome Biol 2021; 22:223. [PMID: 34389041 PMCID: PMC8361881 DOI: 10.1186/s13059-021-02439-3] [Citation(s) in RCA: 21] [Impact Index Per Article: 7.0] [Received: 04/30/2021] [Accepted: 07/26/2021] [Indexed: 12/13/2022] Open
Abstract
BACKGROUND Nonsense-mediated mRNA decay (NMD) is a eukaryotic, translation-dependent degradation pathway that targets mRNAs with premature termination codons and also regulates the expression of some mRNAs that encode full-length proteins. Although many genes express NMD-sensitive transcripts, identifying them based on short-read sequencing data remains a challenge. RESULTS To identify and analyze endogenous targets of NMD, we apply cDNA Nanopore sequencing and short-read sequencing to human cells with varying expression levels of NMD factors. Our approach detects full-length NMD substrates that are highly unstable and increase in levels or even only appear when NMD is inhibited. Among the many new NMD-targeted isoforms that our analysis identifies, most derive from alternative exon usage. The isoform-aware analysis reveals many genes with significant changes in splicing but no significant changes in overall expression levels upon NMD knockdown. NMD-sensitive mRNAs have more exons in the 3′ UTR and, for those mRNAs with a termination codon in the last exon, the length of the 3′ UTR per se does not correlate with NMD sensitivity. Analysis of splicing signals reveals isoforms where NMD has been co-opted in the regulation of gene expression, though the main function of NMD seems to be ridding the transcriptome of isoforms resulting from spurious splicing events. CONCLUSIONS Long-read sequencing enables the identification of many novel NMD-sensitive mRNAs and reveals both known and unexpected features concerning their biogenesis and their biological role. Our data provide a highly valuable resource of human NMD transcript targets for future genomic and transcriptomic applications.
Affiliation(s)
- Evangelos D Karousis
  - Department of Chemistry, Biochemistry and Pharmaceutical Sciences, University of Bern, Freiestrasse 3, 3012, Bern, Switzerland
- Foivos Gypas
  - Friedrich Miescher Institute for Biomedical Research, Maulbeerstrasse 66, 4058, Basel, Switzerland
- Mihaela Zavolan
  - Biozentrum, University of Basel and Swiss Institute of Bioinformatics, Klingelbergstrasse 50-70, 4056, Basel, Switzerland
- Oliver Mühlemann
  - Department of Chemistry, Biochemistry and Pharmaceutical Sciences, University of Bern, Freiestrasse 3, 3012, Bern, Switzerland
|
142
|
Sakamoto Y, Zaha S, Suzuki Y, Seki M, Suzuki A. Application of long-read sequencing to the detection of structural variants in human cancer genomes. Comput Struct Biotechnol J 2021; 19:4207-4216. [PMID: 34527193 PMCID: PMC8350331 DOI: 10.1016/j.csbj.2021.07.030] [Citation(s) in RCA: 7] [Impact Index Per Article: 2.3] [Received: 04/06/2021] [Revised: 07/20/2021] [Accepted: 07/25/2021] [Indexed: 01/02/2023] Open
Abstract
In recent years, the so-called long-read sequencing technology has had a substantial impact on various aspects of genome sciences. Here, we introduce recent studies of cancerous structural variants (SVs) using long-read sequencing technologies, namely Pacific Biosciences (PacBio) sequencers, Oxford Nanopore Technologies (ONT) sequencers, and linked-read methods. By taking advantage of long-read lengths, these technologies have enabled the precise detection of SVs, including long insertions by transposable elements, such as LINE-1. In addition to SV detection, the epigenome status (including DNA methylation and haplotype information) surrounding SV loci has also been unveiled by long-read sequencing technologies, to identify the effects of SVs. Among the various research fields in which long-read sequencing has been applied, cancer genomics has shown the most remarkable advances. In fact, many studies are beginning to shed light on the detection of SVs and the elucidation of their complex structures in various types of cancer. In the particular case of cancers, we summarize the technical limitations of the application of this technology to the analysis of clinical samples. We will introduce recent achievements from this viewpoint. However, a similar approach will be started for other applications in the near future. Therefore, by complementing the current short-read sequencing analysis, long-read sequencing should reveal the complex nature of human genomes in their healthy and disease states, which will open a new opportunity for a better understanding of disease development and for a novel strategy for drug development.
Affiliation(s)
- Yoshitaka Sakamoto
  - Department of Computational Biology and Medical Sciences, Graduate School of Frontier Sciences, The University of Tokyo, 5-1-5 Kashiwanoha, Kashiwa, Chiba 277-8561, Japan
- Suzuko Zaha
  - Department of Computational Biology and Medical Sciences, Graduate School of Frontier Sciences, The University of Tokyo, 5-1-5 Kashiwanoha, Kashiwa, Chiba 277-8561, Japan
- Yutaka Suzuki
  - Department of Computational Biology and Medical Sciences, Graduate School of Frontier Sciences, The University of Tokyo, 5-1-5 Kashiwanoha, Kashiwa, Chiba 277-8561, Japan
- Masahide Seki
  - Department of Computational Biology and Medical Sciences, Graduate School of Frontier Sciences, The University of Tokyo, 5-1-5 Kashiwanoha, Kashiwa, Chiba 277-8561, Japan
- Ayako Suzuki
  - Department of Computational Biology and Medical Sciences, Graduate School of Frontier Sciences, The University of Tokyo, 5-1-5 Kashiwanoha, Kashiwa, Chiba 277-8561, Japan
|
143
|
Tan KT, Kim H, Carrot-Zhang J, Zhang Y, Kim WJ, Kugener G, Wala JA, Howard TP, Chi YY, Beroukhim R, Li H, Ha G, Alper SL, Perlman EJ, Mullen EA, Hahn WC, Meyerson M, Hong AL. Haplotype-resolved germline and somatic alterations in renal medullary carcinomas. Genome Med 2021; 13:114. [PMID: 34261517 PMCID: PMC8281718 DOI: 10.1186/s13073-021-00929-4] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Received: 10/13/2020] [Accepted: 06/25/2021] [Indexed: 11/10/2022] Open
Abstract
BACKGROUND Renal medullary carcinomas (RMCs) are rare kidney cancers that occur in adolescents and young adults of African ancestry. Although RMC is associated with the sickle cell trait and somatic loss of the tumor suppressor, SMARCB1, the ancestral origins of RMC remain unknown. Further, characterization of structural variants (SVs) involving SMARCB1 in RMC remains limited. METHODS We used linked-read genome sequencing to reconstruct germline and somatic haplotypes in 15 unrelated patients with RMC registered on the Children's Oncology Group (COG) AREN03B2 study between 2006 and 2017 or from our prior study. We performed fine-mapping of the HBB locus and assessed the germline for cancer predisposition genes. Subsequently, we assessed the tumor samples for mutations outside of SMARCB1 and integrated RNA sequencing to interrogate the structural variants at the SMARCB1 locus. RESULTS We find that the haplotype of the sickle cell mutation in patients with RMC originated from three geographical regions in Africa. In addition, fine-mapping of the HBB locus identified the sickle cell mutation as the sole candidate variant. We further identify that the SMARCB1 structural variants are characterized by blunt or 1-bp homology events. CONCLUSIONS Our findings suggest that RMC does not arise from a single founder population and that the HbS allele is a strong candidate germline allele which confers risk for RMC. Furthermore, we find that the SVs that disrupt SMARCB1 function are likely repaired by non-homologous end-joining. These findings highlight how haplotype-based analyses using linked-read genome sequencing can be applied to identify potential risk variants in small and rare disease cohorts and provide nucleotide resolution to structural variants.
Affiliation(s)
- Kar-Tong Tan
  - Department of Medical Oncology, Dana-Farber Cancer Institute, Boston, MA, USA
  - Broad Institute of MIT and Harvard, Cambridge, MA, USA
  - Department of Genetics, Harvard Medical School, Boston, MA, USA
- Hyunji Kim
  - Department of Medical Oncology, Dana-Farber Cancer Institute, Boston, MA, USA
  - Broad Institute of MIT and Harvard, Cambridge, MA, USA
  - Department of Genetics, Harvard Medical School, Boston, MA, USA
- Jian Carrot-Zhang
  - Department of Medical Oncology, Dana-Farber Cancer Institute, Boston, MA, USA
  - Broad Institute of MIT and Harvard, Cambridge, MA, USA
  - Department of Genetics, Harvard Medical School, Boston, MA, USA
- Yuxiang Zhang
  - Department of Genetics, Harvard Medical School, Boston, MA, USA
- Won Jun Kim
  - Department of Medical Oncology, Dana-Farber Cancer Institute, Boston, MA, USA
  - Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Jeremiah A Wala
  - Department of Medicine, University of California San Francisco, San Francisco, CA, USA
- Thomas P Howard
  - Department of Medical Oncology, Dana-Farber Cancer Institute, Boston, MA, USA
  - Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Yueh-Yun Chi
  - Department of Pediatrics, University of Southern California, Los Angeles, CA, USA
- Rameen Beroukhim
  - Department of Medical Oncology, Dana-Farber Cancer Institute, Boston, MA, USA
  - Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Heng Li
  - Biostatistics and Computational Biology, Dana-Farber Cancer Institute, Boston, MA, USA
- Gavin Ha
  - Public Health Sciences Division, Fred Hutchinson Cancer Research Center, Seattle, WA, USA
- Seth L Alper
  - Department of Medicine, Beth Israel Deaconess Medical Center, Boston, MA, USA
- Elizabeth A Mullen
  - Department of Hematology and Oncology, Boston Children's Hospital, Boston, MA, USA
  - Department of Pediatric Oncology, Dana-Farber Cancer Institute, Boston, MA, USA
- William C Hahn
  - Department of Medical Oncology, Dana-Farber Cancer Institute, Boston, MA, USA
  - Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Matthew Meyerson
  - Department of Medical Oncology, Dana-Farber Cancer Institute, Boston, MA, USA
  - Broad Institute of MIT and Harvard, Cambridge, MA, USA
  - Department of Genetics, Harvard Medical School, Boston, MA, USA
- Andrew L Hong
  - Department of Pediatrics, Emory University, Atlanta, GA, USA
  - Aflac Center for Cancer and Blood Disorders, Children's Healthcare of Atlanta, Atlanta, GA, USA
|
144
|
Kamil G, Yoon JY, Yoo S, Cheon CK. Clinical relevance of targeted exome sequencing in patients with rare syndromic short stature. Orphanet J Rare Dis 2021; 16:297. [PMID: 34217350 PMCID: PMC8254301 DOI: 10.1186/s13023-021-01937-8] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/16/2021] [Accepted: 06/27/2021] [Indexed: 11/10/2022] Open
Abstract
BACKGROUND Large-scale genomic analyses have provided insight into the genetic complexity of short stature (SS); however, only a portion of genetic causes have been identified. In this study, we identified disease-causing mutations in a cohort of Korean patients with suspected syndromic SS by targeted exome sequencing (TES). METHODS Thirty-four patients in South Korea with suspected syndromic disorders based on abnormal growth and dysmorphic facial features, developmental delay, or accompanying anomalies were enrolled in 2018-2020 and evaluated by TES. RESULTS For 17 of 34 patients with suspected syndromic SS, a genetic diagnosis was obtained by TES. The mean SDS values for height, IGF-1, and IGFBP-3 for these 17 patients were −3.27 ± 1.25, −0.42 ± 1.15, and 0.36 ± 1.31, respectively. Most patients displayed distinct facial features (16/17) and developmental delay or intellectual disability (12/17). In 17 patients, 19 genetic variants were identified, including 13 novel heterozygous variants, associated with 15 different genetic diseases, including many inherited rare skeletal disorders and connective tissue diseases (e.g., cleidocranial dysplasia, Hajdu-Cheney syndrome, Sheldon-Hall, acromesomelic dysplasia Maroteaux type, and microcephalic osteodysplastic primordial dwarfism type II). After re-classification by clinical reassessment, including family member testing and segregation studies, 42.1% of variants were pathogenic, 42.1% were likely pathogenic, and 15.7% were variants of uncertain significance. Ultra-rare diseases accounted for 12 out of 15 genetic diseases (80%). CONCLUSIONS A high positive result from genetic testing suggests that TES may be an effective diagnostic approach for patients with syndromic SS, with implications for genetic counseling. These results expand the mutation spectrum for rare genetic diseases related to SS in Korea.
Affiliation(s)
- Gilyazetdinov Kamil
  - Department of Pediatrics, National Children's Medical Center, Tashkent, Uzbekistan
  - Research Institute for Convergence of Biomedical Science and Technology, Pusan National University Yangsan Hospital, Yangsan, Korea
- Ju Young Yoon
  - Division of Pediatric Endocrinology, Department of Pediatrics, Pusan National University Children's Hospital, Yangsan, Korea
- Sukdong Yoo
  - Division of Pediatric Endocrinology, Department of Pediatrics, Pusan National University Children's Hospital, Yangsan, Korea
- Chong Kun Cheon
  - Division of Pediatric Endocrinology, Department of Pediatrics, Pusan National University Children's Hospital, Yangsan, Korea
  - Research Institute for Convergence of Biomedical Science and Technology, Pusan National University Yangsan Hospital, Yangsan, Korea
|
145
|
Methods to Study Translated Pseudogenes: Recombinant Expression and Complementation, Targeted Proteomics, and RNA Profiling. Methods Mol Biol 2021. [PMID: 34165719 DOI: 10.1007/978-1-0716-1503-4_15] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 08/19/2024]
Abstract
The technical challenge in proving that a given expressed pseudogene is in fact translated into a functional protein is specificity. To circumvent this challenge, one approach is to use PCR in order to generate a series of clones that allow one to exogenously express the pseudogenic protein of interest, either native or fused to a tag, which can facilitate purification, detection, and complementation in both bacterial and mammalian cells. This approach allows an assessment of whether a putative pseudogenic protein possesses enzymatic activity, to identify its subcellular localization and to test its capacity to complement the parental homolog. An alternative approach is to detect the endogenous protein using targeted proteomics analysis and to assess the full range of endogenous RNA isoforms, in order to consider additional coding and noncoding RNA functionality.
|
146
|
Tunjić-Cvitanić M, Pasantes JJ, García-Souto D, Cvitanić T, Plohl M, Šatović-Vukšić E. Satellitome Analysis of the Pacific Oyster Crassostrea gigas Reveals New Pattern of Satellite DNA Organization, Highly Scattered across the Genome. Int J Mol Sci 2021; 22:ijms22136798. [PMID: 34202698 PMCID: PMC8268682 DOI: 10.3390/ijms22136798] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 06/09/2021] [Revised: 06/18/2021] [Accepted: 06/19/2021] [Indexed: 12/22/2022] Open
Abstract
Several features already qualified the invasive bivalve species Crassostrea gigas as a valuable non-standard model organism in genome research. C. gigas is characterized by the low contribution of satellite DNAs (satDNAs) vs. mobile elements and has an extremely low amount of heterochromatin, predominantly built of DNA transposons. In this work, we have identified 52 satDNAs composing the satellitome of C. gigas and constituting about 6.33% of the genome. Satellitome analysis reveals unusual, highly scattered organization of relatively short satDNA arrays across the whole genome. However, peculiar chromosomal distribution and densities are specific for each satDNA. The inspection of the organizational forms of the 11 most abundant satDNAs shows association with constitutive parts of Helitron mobile elements. Nine of the inspected satDNAs are dominantly found in mobile element-associated form, two mostly appear standalone, and only one is present exclusively as Helitron-associated sequence. The Helitron-related satDNAs appear in more chromosomes than other satDNAs, indicating that these mobile elements could be leading satDNA propagation in C. gigas. No significant accumulation of satDNAs on certain chromosomal positions was detected in C. gigas, thus establishing a novel pattern of satDNA organization on the genome level.
Affiliation(s)
- Monika Tunjić-Cvitanić
  - Division of Molecular Biology, Ruđer Bošković Institute, 10000 Zagreb, Croatia
- Juan J. Pasantes
  - Centro de Investigación Mariña, Universidade de Vigo, Dpto de Bioquímica, Xenética e Inmunoloxía, 36310 Vigo, Spain
- Daniel García-Souto
  - Genomes and Disease, Centre for Research in Molecular Medicine and Chronic Diseases (CIMUS), Universidade de Santiago de Compostela, 15706 Santiago de Compostela, Spain
  - Department of Zoology, Genetics and Physical Anthropology, Universidade de Santiago de Compostela, 15706 Santiago de Compostela, Spain
- Tonči Cvitanić
  - Rimac Automobili d.o.o., Ljubljanska ulica 7, 10431 Sveta Nedelja, Croatia
- Miroslav Plohl
  - Division of Molecular Biology, Ruđer Bošković Institute, 10000 Zagreb, Croatia
- Eva Šatović-Vukšić
  - Division of Molecular Biology, Ruđer Bošković Institute, 10000 Zagreb, Croatia
|
147
|
Tvedte ES, Gasser M, Sparklin BC, Michalski J, Hjelmen CE, Johnston JS, Zhao X, Bromley R, Tallon LJ, Sadzewicz L, Rasko DA, Dunning Hotopp JC. Comparison of long-read sequencing technologies in interrogating bacteria and fly genomes. G3 (Bethesda) 2021; 11:jkab083. [PMID: 33768248 PMCID: PMC8495745 DOI: 10.1093/g3journal/jkab083] [Citation(s) in RCA: 20] [Impact Index Per Article: 6.7] [Received: 02/09/2021] [Accepted: 03/07/2021] [Indexed: 12/14/2022]
Abstract
The newest generation of DNA sequencing technology is highlighted by the ability to generate sequence reads hundreds of kilobases in length. Pacific Biosciences (PacBio) and Oxford Nanopore Technologies (ONT) have pioneered competitive long-read platforms, with more recent work focused on improving sequencing throughput and per-base accuracy. We used whole-genome sequencing data produced by three PacBio protocols (Sequel II CLR, Sequel II HiFi, RS II) and two ONT protocols (Rapid Sequencing and Ligation Sequencing) to compare assemblies of the bacterium Escherichia coli and the fruit fly Drosophila ananassae. In both organisms tested, Sequel II assemblies had the highest consensus accuracy, even after accounting for differences in sequencing throughput. ONT and PacBio CLR had the longest reads sequenced compared to PacBio RS II and HiFi, and genome contiguity was highest when assembling these datasets. ONT Rapid Sequencing libraries had the fewest chimeric reads in addition to superior quantification of E. coli plasmids versus ligation-based libraries. The quality of assemblies can be enhanced by adopting hybrid approaches using Illumina libraries for bacterial genome assembly or polishing eukaryotic genome assemblies, and an ONT-Illumina hybrid approach would be more cost-effective for many users. Genome-wide DNA methylation could be detected using both technologies; however, ONT libraries enabled the identification of a broader range of known E. coli methyltransferase recognition motifs in addition to undocumented D. ananassae motifs. The ideal choice of long-read technology may depend on several factors, including the question or hypothesis under examination. No single technology outperformed others in all metrics examined.
Affiliation(s)
- Eric S Tvedte
  - Institute for Genome Sciences, University of Maryland School of Medicine, Baltimore, MD 21201, USA
- Mark Gasser
  - Institute for Genome Sciences, University of Maryland School of Medicine, Baltimore, MD 21201, USA
- Benjamin C Sparklin
  - Institute for Genome Sciences, University of Maryland School of Medicine, Baltimore, MD 21201, USA
- Jane Michalski
  - Institute for Genome Sciences, University of Maryland School of Medicine, Baltimore, MD 21201, USA
  - Department of Microbiology and Immunology, University of Maryland School of Medicine, Baltimore, MD 21201, USA
- Carl E Hjelmen
  - Department of Biology, Texas A&M University, College Station, TX 77843, USA
- J Spencer Johnston
  - Department of Entomology, Texas A&M University, College Station, TX 77843, USA
- Xuechu Zhao
  - Institute for Genome Sciences, University of Maryland School of Medicine, Baltimore, MD 21201, USA
- Robin Bromley
  - Institute for Genome Sciences, University of Maryland School of Medicine, Baltimore, MD 21201, USA
- Luke J Tallon
  - Institute for Genome Sciences, University of Maryland School of Medicine, Baltimore, MD 21201, USA
- Lisa Sadzewicz
  - Institute for Genome Sciences, University of Maryland School of Medicine, Baltimore, MD 21201, USA
- David A Rasko
  - Institute for Genome Sciences, University of Maryland School of Medicine, Baltimore, MD 21201, USA
  - Department of Microbiology and Immunology, University of Maryland School of Medicine, Baltimore, MD 21201, USA
- Julie C Dunning Hotopp
  - Institute for Genome Sciences, University of Maryland School of Medicine, Baltimore, MD 21201, USA
  - Department of Microbiology and Immunology, University of Maryland School of Medicine, Baltimore, MD 21201, USA
  - Greenebaum Cancer Center, University of Maryland School of Medicine, Baltimore, MD 21201, USA
148
Suh A, Dion-Côté AM. New Perspectives on the Evolution of Within-Individual Genome Variation and Germline/Soma Distinction. Genome Biol Evol 2021; 13:evab095. [PMID: 33963843 PMCID: PMC8245192 DOI: 10.1093/gbe/evab095] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Accepted: 05/07/2021] [Indexed: 12/19/2022] Open
Abstract
Genomes can vary significantly even within the same individual. The underlying mechanisms are manifold, ranging from somatic mutation and recombination, through development-associated ploidy changes and genetic bottlenecks, to programmed DNA elimination during germline/soma differentiation. In this perspective piece, we briefly review recent developments in the study of within-individual genome variation in eukaryotes and prokaryotes. We highlight a Society for Molecular Biology and Evolution 2020 virtual symposium entitled "Within-individual genome variation and germline/soma distinction" and the present Special Section of the same name in Genome Biology and Evolution, which together foster cross-taxon synergies in the field to identify and tackle key open questions in the understanding of within-individual genome variation.
Affiliation(s)
- Alexander Suh
  - School of Biological Sciences—Organisms and the Environment, University of East Anglia, Norwich, United Kingdom
  - Department of Organismal Biology—Systematic Biology, Evolutionary Biology Centre (EBC), Science for Life Laboratory, Uppsala University, Sweden
149
Guiglielmoni N, Houtain A, Derzelle A, Van Doninck K, Flot JF. Overcoming uncollapsed haplotypes in long-read assemblies of non-model organisms. BMC Bioinformatics 2021; 22:303. [PMID: 34090340 PMCID: PMC8178825 DOI: 10.1186/s12859-021-04118-3] [Citation(s) in RCA: 26] [Impact Index Per Article: 8.7] [Received: 01/18/2021] [Accepted: 04/02/2021] [Indexed: 12/21/2022] Open
Abstract
Background: Long-read sequencing is revolutionizing genome assembly: as PacBio and Nanopore technologies become more accessible both technically and in cost, long-read assemblers are flourishing and starting to deliver chromosome-level assemblies. However, these long reads are usually error-prone, which makes generating a haploid reference from a diploid genome difficult. Failure to properly collapse haplotypes results in fragmented and structurally incorrect assemblies and wreaks havoc on orthology inference pipelines, yet this serious issue is rarely acknowledged or dealt with in genomic projects, and an independent, comparative benchmark of the capacity of assemblers and post-processing tools to properly collapse or purge haplotypes is still lacking. Results: We tested different assembly strategies on the genome of the rotifer Adineta vaga, a non-model organism for which high coverages of both PacBio and Nanopore reads were available. The assemblers we tested (Canu, Flye, NextDenovo, Ra, Raven, Shasta and wtdbg2) exhibited strikingly different behaviors when dealing with highly heterozygous regions, resulting in variable amounts of uncollapsed haplotypes. Filtering reads generally improved haploid assemblies, and we also benchmarked three post-processing tools aimed at detecting and purging uncollapsed haplotypes in long-read assemblies: HaploMerger2, purge_haplotigs and purge_dups. Conclusions: We provide a thorough evaluation of popular assemblers on a non-model eukaryote genome with variable levels of heterozygosity. Our study highlights several strategies using pre- and post-processing approaches to generate haploid assemblies with high continuity and completeness. This benchmark will help users to improve haploid assemblies of non-model organisms and to evaluate the quality of their own assemblies. Supplementary Information: The online version contains supplementary material available at 10.1186/s12859-021-04118-3.
Affiliation(s)
- Nadège Guiglielmoni
  - Service Evolution Biologique et Ecologie, Université libre de Bruxelles (ULB), Avenue Franklin D. Roosevelt 50, 1050, Brussels, Belgium
- Antoine Houtain
  - Laboratoire d'Ecologie et Génétique Evolutive, Université de Namur, Rue de Bruxelles 61, 5000, Namur, Belgium
- Alessandro Derzelle
  - Laboratoire d'Ecologie et Génétique Evolutive, Université de Namur, Rue de Bruxelles 61, 5000, Namur, Belgium
- Karine Van Doninck
  - Laboratoire d'Ecologie et Génétique Evolutive, Université de Namur, Rue de Bruxelles 61, 5000, Namur, Belgium
  - Département de Biologie des Organismes, Université libre de Bruxelles (ULB), Avenue Franklin D. Roosevelt 50, 1050, Brussels, Belgium
- Jean-François Flot
  - Service Evolution Biologique et Ecologie, Université libre de Bruxelles (ULB), Avenue Franklin D. Roosevelt 50, 1050, Brussels, Belgium
  - Interuniversity Institute of Bioinformatics in Brussels - (IB)², Avenue Franklin D. Roosevelt 50, 1050, Brussels, Belgium
150
Ono Y, Asai K, Hamada M. PBSIM2: a simulator for long-read sequencers with a novel generative model of quality scores. Bioinformatics 2021; 37:589-595. [PMID: 32976553 PMCID: PMC8097687 DOI: 10.1093/bioinformatics/btaa835] [Citation(s) in RCA: 46] [Impact Index Per Article: 15.3] [Received: 06/22/2020] [Revised: 08/20/2020] [Accepted: 09/11/2020] [Indexed: 12/21/2022] Open
Abstract
Motivation: Recent high-throughput long-read sequencers, such as PacBio and Oxford Nanopore sequencers, produce longer reads with more errors than short-read sequencers. In addition to the high error rates of reads, non-uniformity of errors leads to difficulties in various downstream analyses using long reads. Many useful simulators that characterize long-read error patterns and simulate them have been developed. However, there is still room for improvement in the simulation of the non-uniformity of errors. Results: To capture the characteristics of errors in reads from long-read sequencers, we introduce a generative model for quality scores that uses a hidden Markov model together with a recent model selection method, the factorized information criterion. We evaluated our simulator from various points of view, and the results indicate that it successfully simulates reads that are consistent with real reads. Availability and implementation: The source code of PBSIM2 is freely available from https://github.com/yukiteruono/pbsim2. Supplementary information: Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Yukiteru Ono
  - Department of Computational Biology and Medical Sciences, Graduate School of Frontier Sciences, University of Tokyo, Kashiwa 277-8561, Japan
- Kiyoshi Asai
  - Department of Computational Biology and Medical Sciences, Graduate School of Frontier Sciences, University of Tokyo, Kashiwa 277-8561, Japan
  - Artificial Intelligence Research Center (AIRC), National Institute of Advanced Industrial Science and Technology (AIST), Tokyo 135-0064, Japan
- Michiaki Hamada
  - Department of Electrical Engineering and Bioscience, Faculty of Science and Engineering, Waseda University, Tokyo 169-8555, Japan
  - Computational Bio Big-Data Open Innovation Laboratory (CBBD-OIL), National Institute of Advanced Industrial Science and Technology (AIST), Tokyo 169-8555, Japan
  - Institute for Medical-oriented Structural Biology, Waseda University, Tokyo 162-8480, Japan
  - Graduate School of Medicine, Nippon Medical School, Tokyo 113-8602, Japan