1
Luan T, Commichaux S, Hoffmann M, Jayeola V, Jang JH, Pop M, Rand H, Luo Y. Benchmarking short and long read polishing tools for nanopore assemblies: achieving near-perfect genomes for outbreak isolates. BMC Genomics 2024; 25:679. [PMID: 38978005 PMCID: PMC11232133 DOI: 10.1186/s12864-024-10582-x] [Received: 02/29/2024] [Accepted: 07/01/2024] [Indexed: 07/10/2024] [Open Access]
Abstract
BACKGROUND: Oxford Nanopore provides high-throughput sequencing platforms able to reconstruct complete bacterial genomes with 99.95% accuracy. However, even small levels of error can obscure the phylogenetic relationships between closely related isolates. Polishing tools have been developed to correct these errors, but it is uncertain whether they achieve the accuracy needed for high-resolution source tracking of foodborne illness outbreaks.
RESULTS: We tested 132 combinations of assembly and short- and long-read polishing tools to assess their accuracy for reconstructing the genome sequences of 15 highly similar Salmonella enterica serovar Newport isolates from a 2020 onion outbreak. While long-read polishing alone improved accuracy, near-perfect accuracy (99.9999% accuracy, or ~5 nucleotide errors across the 4.8 Mbp genome, excluding low-confidence regions) was only obtained by pipelines that combined both long- and short-read polishing tools. Notably, medaka was a more accurate and efficient long-read polisher than Racon. Among short-read polishers, NextPolish showed the highest accuracy, but Pilon, Polypolish, and POLCA performed similarly. Among the 5 best-performing pipelines, polishing with medaka followed by NextPolish was the most common combination. Importantly, the order of polishing tools mattered, i.e., using less accurate tools after more accurate ones introduced errors. Indels in homopolymers and repetitive regions, where the short reads could not be uniquely mapped, remained the most challenging errors to correct.
CONCLUSIONS: Short reads are still needed to correct errors in nanopore-sequenced assemblies to obtain the accuracy required for source tracking investigations. Our granular assessment of the performance of the polishing pipelines allowed us to suggest best practices for tool users and areas for improvement for tool developers.
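The accuracy figures quoted in this abstract can be checked with simple arithmetic. The following is a minimal sketch, not code from the paper: the function names are hypothetical, and the only inputs are the 4.8 Mbp genome length and the two accuracy percentages stated above.

```python
import math

GENOME_LENGTH_BP = 4_800_000  # 4.8 Mbp, as stated in the abstract


def expected_errors(accuracy_percent: float, genome_length_bp: int) -> float:
    """Expected number of erroneous bases for a given per-base accuracy."""
    error_rate = 1.0 - accuracy_percent / 100.0
    return error_rate * genome_length_bp


def phred_quality(accuracy_percent: float) -> float:
    """Phred-scaled quality, Q = -10 * log10(per-base error rate)."""
    error_rate = 1.0 - accuracy_percent / 100.0
    return -10.0 * math.log10(error_rate)


# Hypothetical variable names, used only for this illustration.
raw_errors = expected_errors(99.95, GENOME_LENGTH_BP)
polished_errors = expected_errors(99.9999, GENOME_LENGTH_BP)

print(f"99.95% accuracy:   about {raw_errors:.0f} errors (Q{phred_quality(99.95):.0f})")
print(f"99.9999% accuracy: about {polished_errors:.1f} errors (Q{phred_quality(99.9999):.0f})")
```

Under these assumptions, 99.95% accuracy corresponds to roughly 2,400 errors per genome, while 99.9999% corresponds to about 4.8, which matches the "~5 nucleotide errors" quoted in the abstract.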
Affiliation(s)
- Tu Luan
- Department of Computer Science, University of Maryland, College Park, MD, 20742, USA
- Seth Commichaux
- Center for Food Safety and Applied Nutrition, Food and Drug Administration, Laurel, MD, 20708, USA.
- Maria Hoffmann
- Center for Food Safety and Applied Nutrition, Food and Drug Administration, College Park, MD, 20740, USA
- Victor Jayeola
- Center for Food Safety and Applied Nutrition, Food and Drug Administration, College Park, MD, 20740, USA
- Jae Hee Jang
- Center for Food Safety and Applied Nutrition, Food and Drug Administration, College Park, MD, 20740, USA
- Mihai Pop
- Department of Computer Science, University of Maryland, College Park, MD, 20742, USA
- Hugh Rand
- Center for Food Safety and Applied Nutrition, Food and Drug Administration, College Park, MD, 20740, USA
- Yan Luo
- Center for Food Safety and Applied Nutrition, Food and Drug Administration, College Park, MD, 20740, USA
2
Tanudisastro HA, Deveson IW, Dashnow H, MacArthur DG. Sequencing and characterizing short tandem repeats in the human genome. Nat Rev Genet 2024; 25:460-475. [PMID: 38366034 DOI: 10.1038/s41576-024-00692-3] [Accepted: 12/06/2023] [Indexed: 02/18/2024]
Abstract
Short tandem repeats (STRs) are highly polymorphic sequences throughout the human genome that are composed of repeated copies of a 1-6-bp motif. Over 1 million variable STR loci are known, some of which regulate gene expression and influence complex traits, such as height. Moreover, variants in at least 60 STR loci cause genetic disorders, including Huntington disease and fragile X syndrome. Accurately identifying and genotyping STR variants is challenging, in particular mapping short reads to repetitive regions and inferring expanded repeat lengths. Recent advances in sequencing technology and computational tools for STR genotyping from sequencing data promise to help overcome this challenge and solve genetically unresolved cases and the 'missing heritability' of polygenic traits. Here, we compare STR genotyping methods, analytical tools and their applications to understand the effect of STR variation on health and disease. We identify emergent opportunities to refine genotyping and quality-control approaches as well as to integrate STRs into variant-calling workflows and large cohort analyses.
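The definition above (tandem copies of a 1-6 bp motif) can be illustrated with a small regular-expression scan. This is a simplified sketch and not one of the genotyping tools the review compares: the function name, the `min_copies` threshold, and the example sequence are all hypothetical, and real STR callers additionally handle imperfect repeats and read-level evidence.

```python
import re


def find_strs(sequence: str, min_copies: int = 3):
    """Return (start, motif, copies) for perfect tandem repeats of a 1-6 bp motif.

    The lazy quantifier makes the regex prefer the shortest motif, so
    'ATATATAT' is reported as motif 'AT' rather than 'ATAT'.
    """
    if min_copies < 2:
        raise ValueError("min_copies must be at least 2")
    # One captured motif followed by at least (min_copies - 1) further copies.
    pattern = re.compile(r"([ACGT]{1,6}?)\1{%d,}" % (min_copies - 1))
    hits = []
    for match in pattern.finditer(sequence.upper()):
        motif = match.group(1)
        copies = len(match.group(0)) // len(motif)
        hits.append((match.start(), motif, copies))
    return hits


# Toy sequence: four tandem copies of CAG flanked by two T bases on each side.
print(find_strs("TTCAGCAGCAGCAGTT"))
```

The scan only finds exact repeats, which is precisely why the mapping and length-inference problems described in the abstract require dedicated tools.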
Affiliation(s)
- Hope A Tanudisastro
- Centre for Population Genomics, Garvan Institute of Medical Research, Sydney, New South Wales, Australia
- Centre for Population Genomics, Murdoch Children's Research Institute, Melbourne, Victoria, Australia
- Faculty of Medicine and Health, University of New South Wales, Sydney, New South Wales, Australia
- Faculty of Medicine and Health, University of Sydney, Sydney, New South Wales, Australia
- Ira W Deveson
- Faculty of Medicine and Health, University of New South Wales, Sydney, New South Wales, Australia
- Genomics and Inherited Disease Program, Garvan Institute of Medical Research, Sydney, New South Wales, Australia
- Harriet Dashnow
- Department of Human Genetics, University of Utah, Salt Lake City, UT, USA.
- Daniel G MacArthur
- Centre for Population Genomics, Garvan Institute of Medical Research, Sydney, New South Wales, Australia.
- Centre for Population Genomics, Murdoch Children's Research Institute, Melbourne, Victoria, Australia.
- Faculty of Medicine and Health, University of New South Wales, Sydney, New South Wales, Australia.
3
Hansen MH, Maagaard M, Cédile O, Nyvold CG. SWIGH-SCORE: A translational light-weight approach in computational detection of rearranged immunoglobulin heavy chain to be used in monoclonal lymphoproliferative disorders. MethodsX 2024; 12:102741. [PMID: 38846434 PMCID: PMC11154698 DOI: 10.1016/j.mex.2024.102741] [Received: 02/22/2024] [Revised: 04/23/2024] [Accepted: 05/02/2024] [Indexed: 06/09/2024] [Open Access]
Abstract
We present a lightweight tool for clonotyping and measurable residual disease (MRD) assessment in monoclonal lymphoproliferative disorders. It is a translational method that enables computational detection of rearranged immunoglobulin heavy chain gene sequences.
- The swigh-score clonotyping tool emphasizes parallelization and applicability across sequencing platforms.
- The algorithm is based on an adaptation of the Smith-Waterman algorithm for local alignment of reads generated by second- and third-generation sequencers.
For method validation, we demonstrate the targeted sequences of immunoglobulin heavy chain genes from diagnostic bone marrow using serial dilutions of CD138+ plasma cells from a patient with multiple myeloma. Sequencing libraries from diagnostic samples were prepared for the three sequencing platforms, Ion S5 (Thermo Fisher Scientific), MiSeq (Illumina), and MinION (Oxford Nanopore), using the LymphoTrack assay. Basic quality filtering was performed, and a Smith-Waterman-based swigh-score algorithm was developed in shell and C for clonotyping and MRD assessment using FASTQ data files. Performance is demonstrated across the three different sequencing platforms.
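For readers unfamiliar with the Smith-Waterman algorithm mentioned above, the following is a minimal textbook sketch of local-alignment scoring. It is not the swigh-score implementation (which the abstract says was written in shell and C); the function name, the linear gap penalty, and the scoring values are assumptions chosen only for illustration.

```python
def smith_waterman(query: str, target: str,
                   match: int = 2, mismatch: int = -1, gap: int = -2):
    """Return (best_score, (query_end, target_end)) of the best local alignment.

    Uses a linear gap penalty and keeps only two rows of the dynamic
    programming matrix, so memory is proportional to len(target).
    """
    cols = len(target) + 1
    prev = [0] * cols
    best_score, best_end = 0, (0, 0)
    for i in range(1, len(query) + 1):
        curr = [0] * cols
        for j in range(1, cols):
            substitution = match if query[i - 1] == target[j - 1] else mismatch
            curr[j] = max(
                0,                            # start a new local alignment
                prev[j - 1] + substitution,   # align the two bases
                prev[j] + gap,                # gap in the target
                curr[j - 1] + gap,            # gap in the query
            )
            if curr[j] > best_score:
                best_score, best_end = curr[j], (i, j)
        prev = curr
    return best_score, best_end


# A short query found exactly inside a longer target sequence.
print(smith_waterman("ACGT", "TTACGTTT"))
```

Because cell values are floored at zero, the alignment can start and end anywhere in either sequence, which is what makes the method suitable for locating a known clonotype sequence inside reads of varying length.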
Affiliation(s)
- Marcus Høy Hansen
- Haematology-Pathology Research Laboratory, Research Unit of Haematology, Department of Hematology, and Research Unit of Pathology, Department of Pathology, University of Southern Denmark and Odense University Hospital, Odense, Denmark
- Markus Maagaard
- Haematology-Pathology Research Laboratory, Research Unit of Haematology, Department of Hematology, and Research Unit of Pathology, Department of Pathology, University of Southern Denmark and Odense University Hospital, Odense, Denmark
- Oriane Cédile
- Haematology-Pathology Research Laboratory, Research Unit of Haematology, Department of Hematology, and Research Unit of Pathology, Department of Pathology, University of Southern Denmark and Odense University Hospital, Odense, Denmark
- OPEN, Odense Patient data Explorative Network, Haematology-Pathology Research Laboratory, Odense University Hospital, Odense, Denmark
- Charlotte Guldborg Nyvold
- Haematology-Pathology Research Laboratory, Research Unit of Haematology, Department of Hematology, and Research Unit of Pathology, Department of Pathology, University of Southern Denmark and Odense University Hospital, Odense, Denmark
- OPEN, Odense Patient data Explorative Network, Haematology-Pathology Research Laboratory, Odense University Hospital, Odense, Denmark
4
Castellana S, De Laurentiis V, Bianco A, Del Sambro L, Grassi M, De Leonardis F, Derobertis AM, De Carlo C, Sparapano E, Mosca A, Stolfa S, Ronga L, Santacroce L, Chironna M, Parisi M, Capozzi L, Parisi A. Pannonibacter anstelovis sp. nov. Isolated from Two Cases of Bloodstream Infections in Paediatric Patients. Microorganisms 2024; 12:799. [PMID: 38674743 PMCID: PMC11051880 DOI: 10.3390/microorganisms12040799] [Received: 03/20/2024] [Revised: 04/05/2024] [Accepted: 04/12/2024] [Indexed: 04/28/2024] [Open Access]
Abstract
This study describes two cases of bacteraemia sustained by a new putative Pannonibacter species isolated at the U.O.C. of Microbiology and Virology of the Policlinico of Bari (Bari, Italy) from the blood cultures of two patients admitted to the Paediatric Oncohaematology Unit. Pannonibacter spp. are environmental Gram-negative bacteria not commonly associated with nosocomial infections. Species identification was performed using Sanger sequencing of the 16S rRNA gene and Whole-Genome Sequencing (WGS) for both strains. Genomic analyses of the two isolates, BLAST similarity search, and phylogeny of the 16S rDNA sequences led to an assignment to the species Pannonibacter phragmitetus. However, ANIb, ANIm, tetranucleotide correlation, and digital DNA-DNA hybridization analyses of the two draft genomes showed that they were very different from those of the species P. phragmitetus. MALDI-TOF analysis, assessment of antimicrobial susceptibility by E-test method, and Analytical Profile Index (API) tests were also performed. This result highlights how environmental bacterial species can easily adapt to the human host and, especially in nosocomial environments, also gain pathogenic potential through antimicrobial resistance.
Affiliation(s)
- Stefano Castellana
- Istituto Zooprofilattico Sperimentale della Puglia e della Basilicata, 71121 Foggia, Italy; (S.C.); (A.B.); (L.D.S.); (A.M.D.); (A.P.)
- Vittoriana De Laurentiis
- UOC Microbiology and Virology, Azienda Ospedaliera-Universitaria Policlinico of Bari, 70124 Bari, Italy; (V.D.L.); (C.D.C.); (E.S.); (S.S.); (L.R.)
- Angelica Bianco
- Istituto Zooprofilattico Sperimentale della Puglia e della Basilicata, 71121 Foggia, Italy; (S.C.); (A.B.); (L.D.S.); (A.M.D.); (A.P.)
- Laura Del Sambro
- Istituto Zooprofilattico Sperimentale della Puglia e della Basilicata, 71121 Foggia, Italy; (S.C.); (A.B.); (L.D.S.); (A.M.D.); (A.P.)
- Massimo Grassi
- Division of Paediatric Haematology and Oncology, Azienda Ospedaliera-Universitaria Policlinico of Bari, 70124 Bari, Italy; (M.G.); (F.D.L.)
- Francesco De Leonardis
- Division of Paediatric Haematology and Oncology, Azienda Ospedaliera-Universitaria Policlinico of Bari, 70124 Bari, Italy; (M.G.); (F.D.L.)
- Anna Maria Derobertis
- Istituto Zooprofilattico Sperimentale della Puglia e della Basilicata, 71121 Foggia, Italy; (S.C.); (A.B.); (L.D.S.); (A.M.D.); (A.P.)
- Carmen De Carlo
- UOC Microbiology and Virology, Azienda Ospedaliera-Universitaria Policlinico of Bari, 70124 Bari, Italy; (V.D.L.); (C.D.C.); (E.S.); (S.S.); (L.R.)
- Eleonora Sparapano
- UOC Microbiology and Virology, Azienda Ospedaliera-Universitaria Policlinico of Bari, 70124 Bari, Italy; (V.D.L.); (C.D.C.); (E.S.); (S.S.); (L.R.)
- Adriana Mosca
- Department of Interdisciplinary Medicine, School of Medicine, University of Bari “Aldo Moro”, 70124 Bari, Italy; (A.M.); (L.S.)
- Stefania Stolfa
- UOC Microbiology and Virology, Azienda Ospedaliera-Universitaria Policlinico of Bari, 70124 Bari, Italy; (V.D.L.); (C.D.C.); (E.S.); (S.S.); (L.R.)
- Luigi Ronga
- UOC Microbiology and Virology, Azienda Ospedaliera-Universitaria Policlinico of Bari, 70124 Bari, Italy; (V.D.L.); (C.D.C.); (E.S.); (S.S.); (L.R.)
- Luigi Santacroce
- Department of Interdisciplinary Medicine, School of Medicine, University of Bari “Aldo Moro”, 70124 Bari, Italy; (A.M.); (L.S.)
- Maria Chironna
- Department of Interdisciplinary Medicine, Hygiene Section, University of Bari “Aldo Moro”, 70124 Bari, Italy
- Michela Parisi
- University-Hospital Pediatric Department, Bambino Gesù Paediatric Hospital, 00165 Rome, Italy
- Loredana Capozzi
- Istituto Zooprofilattico Sperimentale della Puglia e della Basilicata, 71121 Foggia, Italy; (S.C.); (A.B.); (L.D.S.); (A.M.D.); (A.P.)
- Antonio Parisi
- Istituto Zooprofilattico Sperimentale della Puglia e della Basilicata, 71121 Foggia, Italy; (S.C.); (A.B.); (L.D.S.); (A.M.D.); (A.P.)
5
Ermini L, Driguez P. The Application of Long-Read Sequencing to Cancer. Cancers (Basel) 2024; 16:1275. [PMID: 38610953 PMCID: PMC11011098 DOI: 10.3390/cancers16071275] [Received: 02/23/2024] [Revised: 03/20/2024] [Accepted: 03/21/2024] [Indexed: 04/14/2024] [Open Access]
Abstract
Cancer is a multifaceted disease arising from numerous genomic aberrations that have been identified as a result of advancements in sequencing technologies. While next-generation sequencing (NGS), which uses short reads, has transformed cancer research and diagnostics, it is limited by read length. Third-generation sequencing (TGS), led by the Pacific Biosciences and Oxford Nanopore Technologies platforms, employs long-read sequences, which have marked a paradigm shift in cancer research. Cancer genomes often harbour complex events, and TGS, with its ability to span large genomic regions, has facilitated their characterisation, providing a better understanding of how complex rearrangements affect cancer initiation and progression. TGS has also characterised the entire transcriptome of various cancers, revealing cancer-associated isoforms that could serve as biomarkers or therapeutic targets. Furthermore, TGS has advanced cancer research by improving genome assemblies, detecting complex variants, and providing a more complete picture of transcriptomes and epigenomes. This review focuses on TGS and its growing role in cancer research. We investigate its advantages and limitations, providing a rigorous scientific analysis of its use in detecting previously hidden aberrations missed by NGS. This promising technology holds immense potential for both research and clinical applications, with far-reaching implications for cancer diagnosis and treatment.
Affiliation(s)
- Luca Ermini
- NORLUX Neuro-Oncology Laboratory, Department of Cancer Research, Luxembourg Institute of Health, L-1210 Luxembourg, Luxembourg
- Patrick Driguez
- Bioscience Core Lab, King Abdullah University of Science and Technology, Thuwal 23955-6900, Saudi Arabia
6
Greig DR, Do Nascimento V, Gally DL, Gharbia SE, Dallman TJ, Jenkins C. Re-analysis of an outbreak of Shiga toxin-producing Escherichia coli O157:H7 associated with raw drinking milk using Nanopore sequencing. Sci Rep 2024; 14:5821. [PMID: 38461188 PMCID: PMC10925052 DOI: 10.1038/s41598-024-54662-0] [Received: 09/26/2022] [Accepted: 02/15/2024] [Indexed: 03/11/2024] [Open Access]
Abstract
The aim of this study was to compare Illumina and Oxford Nanopore Technology (ONT) sequencing data in order to quantify genetic variation, assess within-outbreak strain relatedness, and characterise microevolutionary events in the accessory genomes of a cluster of 23 genetically and epidemiologically linked isolates related to an outbreak of Shiga toxin-producing Escherichia coli O157:H7 caused by the consumption of raw drinking milk. There were seven discrepant variants called between the two technologies: five were false-negative or false-positive variants in the Illumina data and two were false-negative calls in the ONT data. After masking horizontally acquired sequences such as prophages, analysis of both short- and long-read sequences revealed that the 20 isolates linked to the outbreak in 2017 had a maximum distance of one SNP between each other, and a maximum of five SNPs when including three additional strains identified in 2019. Analysis of the ONT data revealed a 47 kbp deletion event in a terminal compound prophage within one sample relative to the remaining samples, and a large 0.65 Mbp chromosomal rearrangement (inversion) within one sample relative to the remaining samples. Furthermore, we detected two bacteriophages encoding the highly pathogenic Shiga toxin (Stx) subtype, Stx2a. One was typical of Stx2a-phage in this sub-lineage (Ic); the other was atypical and inserted into a site usually occupied by Stx2c-encoding phage. Finally, we observed an increase in the size of the pO157 IncFIB plasmid (1.6 kbp) in isolates from 2019 compared to those from 2017, due to the duplication of insertion elements within the plasmids from the more recently isolated strains. The ability to characterize the accessory genome in this way is the first step to understanding the significance of these microevolutionary events and their impact on the genome plasticity and virulence between strains of this zoonotic, foodborne pathogen.
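The "maximum SNP distance" reported above is a pairwise comparison over aligned positions, computed after masking regions such as prophages. The sketch below shows the idea on toy data; it is not the pipeline used in the study, and the function names, the `masked` parameter, and the example sequences are hypothetical.

```python
from itertools import combinations


def snp_distance(seq_a: str, seq_b: str, masked=frozenset()) -> int:
    """Count differing positions between two aligned sequences.

    Positions listed in `masked` and positions where either sequence has a
    non-ACGT character (for example N or a gap) are ignored.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(
        1
        for pos, (a, b) in enumerate(zip(seq_a, seq_b))
        if pos not in masked and a != b and a in "ACGT" and b in "ACGT"
    )


def max_pairwise_distance(alignment: dict, masked=frozenset()) -> int:
    """Largest SNP distance between any two isolates in the alignment."""
    return max(
        (snp_distance(alignment[x], alignment[y], masked)
         for x, y in combinations(alignment, 2)),
        default=0,
    )


# Toy alignment of three hypothetical isolates (not data from the study).
toy = {"isolate_a": "ACGTACGT", "isolate_b": "ACGTACGA", "isolate_c": "ACGAACGT"}
print(max_pairwise_distance(toy))               # all positions considered
print(max_pairwise_distance(toy, masked={3}))   # position 3 masked out
```

Masking changes the result because differences inside horizontally acquired sequence would otherwise inflate the apparent distance between closely related isolates.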
Affiliation(s)
- David R Greig
- National Infection Service, United Kingdom Health Security Agency, London, NW9 5EQ, UK.
- NIHR Health Protection Research Unit for Gastrointestinal Pathogens, Liverpool, UK.
- Division of Infection and Immunity, The Roslin Institute and Royal (Dick) School of Veterinary Studies, University of Edinburgh, Easter Bush, Edinburgh, UK.
- David L Gally
- Division of Infection and Immunity, The Roslin Institute and Royal (Dick) School of Veterinary Studies, University of Edinburgh, Easter Bush, Edinburgh, UK
- Saheer E Gharbia
- National Infection Service, United Kingdom Health Security Agency, London, NW9 5EQ, UK
- NIHR Health Protection Research Unit in Genomes and Enabling Data, Warwick, UK
- Timothy J Dallman
- Institute for Risk Assessment Sciences (IRAS), Faculty of Veterinary Medicine, Utrecht University, 3584 CL, Utrecht, The Netherlands
- Claire Jenkins
- National Infection Service, United Kingdom Health Security Agency, London, NW9 5EQ, UK
- NIHR Health Protection Research Unit for Gastrointestinal Pathogens, Liverpool, UK
7
Cheng O, Ling MH, Wang C, Wu S, Ritchie ME, Göke J, Amin N, Davidson NM. Flexiplex: a versatile demultiplexer and search tool for omics data. Bioinformatics 2024; 40:btae102. [PMID: 38379414 PMCID: PMC10914444 DOI: 10.1093/bioinformatics/btae102] [Received: 08/31/2023] [Revised: 01/11/2024] [Accepted: 02/20/2024] [Indexed: 02/22/2024] [Open Access]
Abstract
MOTIVATION: The process of analyzing high-throughput sequencing data often requires the identification and extraction of specific target sequences. This could include tasks such as identifying cellular barcodes and UMIs in single-cell data, and specific genetic variants for genotyping. However, existing tools that perform these functions are often task-specific, such as only demultiplexing barcodes for a dedicated type of experiment, or are not tolerant to noise in the sequencing data.
RESULTS: To overcome these limitations, we developed Flexiplex, a versatile and fast sequence searching and demultiplexing tool for omics data, which is based on the Levenshtein distance and thus allows imperfect matches. We demonstrate Flexiplex's application on three use cases: identifying cell-line-specific sequences in Illumina short-read single-cell data, and discovering and demultiplexing cellular barcodes from noisy long-read single-cell RNA-seq data. We show that Flexiplex achieves an excellent balance of accuracy and computational efficiency compared to leading task-specific tools.
AVAILABILITY AND IMPLEMENTATION: Flexiplex is available at https://davidsongroup.github.io/flexiplex/.
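To make the idea of Levenshtein-based imperfect matching concrete, here is a minimal sketch of assigning a noisy barcode to a list of known barcodes. It is not Flexiplex's code or command-line interface; the function names, the `max_distance` threshold, and the rule of rejecting ties are assumptions made for this illustration.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance counting substitutions, insertions, and deletions."""
    prev = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        curr = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            curr.append(min(prev[j] + 1,          # deletion from a
                            curr[j - 1] + 1,      # insertion into a
                            prev[j - 1] + cost))  # match or substitution
        prev = curr
    return prev[-1]


def assign_barcode(observed: str, known_barcodes, max_distance: int = 2):
    """Return the unique closest known barcode, or None if too far or tied."""
    scored = sorted((levenshtein(observed, bc), bc) for bc in known_barcodes)
    if not scored or scored[0][0] > max_distance:
        return None
    if len(scored) > 1 and scored[1][0] == scored[0][0]:
        return None  # ambiguous: two barcodes are equally close
    return scored[0][1]


# Hypothetical barcodes; the observed sequence has one substitution error.
print(assign_barcode("ACGTACGA", ["ACGTACGT", "TTTTGGGG"]))
```

Allowing a small edit distance tolerates the insertions and deletions typical of noisy long reads, at the cost of having to reject observations that sit equally close to two barcodes.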
Affiliation(s)
- Oliver Cheng
- Blood Cells and Blood Cancer Division, The Walter and Eliza Hall Institute of Medical Research, Parkville, VIC 3052, Australia
- Bioinformatics Division, The Walter and Eliza Hall Institute of Medical Research, Parkville, VIC 3052, Australia
- Faculty of Science, The University of Melbourne, Parkville, VIC 3010, Australia
- Min Hao Ling
- Department for Epigenetic and Epitranscriptomic Regulation, Genome Institute of Singapore (GIS), Agency for Science, Technology and Research (A*STAR), Singapore 138672, Republic of Singapore
- Changqing Wang
- Epigenetics and Development Division, The Walter and Eliza Hall Institute of Medical Research, Parkville, VIC 3052, Australia
- Department of Medical Biology, Faculty of Medicine, Dentistry and Health Sciences, The University of Melbourne, Parkville, VIC 3010, Australia
- Shuyi Wu
- Blood Cells and Blood Cancer Division, The Walter and Eliza Hall Institute of Medical Research, Parkville, VIC 3052, Australia
- Bioinformatics Division, The Walter and Eliza Hall Institute of Medical Research, Parkville, VIC 3052, Australia
- Faculty of Science, The University of Melbourne, Parkville, VIC 3010, Australia
- Matthew E Ritchie
- Epigenetics and Development Division, The Walter and Eliza Hall Institute of Medical Research, Parkville, VIC 3052, Australia
- Department of Medical Biology, Faculty of Medicine, Dentistry and Health Sciences, The University of Melbourne, Parkville, VIC 3010, Australia
- Jonathan Göke
- Department for Epigenetic and Epitranscriptomic Regulation, Genome Institute of Singapore (GIS), Agency for Science, Technology and Research (A*STAR), Singapore 138672, Republic of Singapore
- Department of Statistics and Data Science, National University of Singapore, Singapore 117546, Republic of Singapore
- Noorul Amin
- Blood Cells and Blood Cancer Division, The Walter and Eliza Hall Institute of Medical Research, Parkville, VIC 3052, Australia
- Bioinformatics Division, The Walter and Eliza Hall Institute of Medical Research, Parkville, VIC 3052, Australia
- Department of Medical Biology, Faculty of Medicine, Dentistry and Health Sciences, The University of Melbourne, Parkville, VIC 3010, Australia
- Nadia M Davidson
- Blood Cells and Blood Cancer Division, The Walter and Eliza Hall Institute of Medical Research, Parkville, VIC 3052, Australia
- Bioinformatics Division, The Walter and Eliza Hall Institute of Medical Research, Parkville, VIC 3052, Australia
- Department of Medical Biology, Faculty of Medicine, Dentistry and Health Sciences, The University of Melbourne, Parkville, VIC 3010, Australia
8
Hansen MH, Cédile O, Abildgaard N, Nyvold CG. The potential of 3rd-generation nanopore sequencing for B-cell clonotyping in lymphoproliferative disorders. EJHAEM 2024; 5:290-293. [PMID: 38406528 PMCID: PMC10887334 DOI: 10.1002/jha2.815] [Received: 09/20/2023] [Revised: 09/29/2023] [Accepted: 10/24/2023] [Indexed: 02/27/2024]
Abstract
Lymphoid malignancies are characterized by clonal cell expansion, often identifiable by unique immunoglobulin rearrangements. Heavy (IGH) and light-chain gene usage offers diagnostic insights and enables sensitive residual disease detection via next-generation sequencing. With its adaptable throughput and variable read lengths, Oxford Nanopore third-generation sequencing now holds promise for clonotyping. This study analyzed CD138+ plasma-cell DNA from eight multiple myeloma patients, comparing clonotyping performance between Nanopore sequencing, Illumina MiSeq, and Ion Torrent S5. We demonstrated clonotype consistency across platforms through Smith-Waterman local alignment of nanopore reads. The mean clonal percentage of IGH V and J gene usage in the CD138+ cells was 69% for Nanopore, 67% for S5, and 76% for MiSeq. When aligned with known clonotypes, clonal cells averaged a 91% similarity, exceeding 85%. In summary, Nanopore sequencing, with its capacity for generating millions of high-quality reads, proves effective for detecting clonal IGH rearrangements. This versatile platform offers the potential for measuring residual disease down to a sensitivity level of 10⁻⁶ at a lower cost, marking a significant advancement in clonotyping techniques.
Affiliation(s)
- Marcus H. Hansen
- Haematology-Pathology Research Laboratory, Research Unit of Haematology, Department of Haematology, and Research Unit of Pathology, Department of Pathology, University of Southern Denmark and Odense University Hospital, Odense, Denmark
- Oriane Cédile
- Haematology-Pathology Research Laboratory, Research Unit of Haematology, Department of Haematology, and Research Unit of Pathology, Department of Pathology, University of Southern Denmark and Odense University Hospital, Odense, Denmark
- OPEN, Odense Patient data Explorative Network, Odense University Hospital, Odense, Denmark
- Niels Abildgaard
- Haematology-Pathology Research Laboratory, Research Unit of Haematology, Department of Haematology, and Research Unit of Pathology, Department of Pathology, University of Southern Denmark and Odense University Hospital, Odense, Denmark
- Charlotte G. Nyvold
- Haematology-Pathology Research Laboratory, Research Unit of Haematology, Department of Haematology, and Research Unit of Pathology, Department of Pathology, University of Southern Denmark and Odense University Hospital, Odense, Denmark
- OPEN, Odense Patient data Explorative Network, Odense University Hospital, Odense, Denmark
9
Duarte VDS, Porcellato D. Host DNA depletion methods and genome-centric metagenomics of bovine hindmilk microbiome. mSphere 2024; 9:e0047023. [PMID: 38054728 PMCID: PMC10826364 DOI: 10.1128/msphere.00470-23] [Received: 08/22/2023] [Accepted: 10/20/2023] [Indexed: 12/07/2023] [Open Access]
Abstract
Bovine mastitis is a multi-etiological and complex disease, resulting in serious economic consequences for dairy farmers and industry. In recent years, the microbiological evaluation of raw milk has been investigated in depth using next-generation sequencing approaches such as metataxonomic analysis. Despite this, host DNA remains a major challenge in the shotgun metagenomic sequencing of microbial communities in milk samples. In this study, we aimed to evaluate different methods for host DNA depletion and/or microbial DNA enrichment and assess the use of PCR-based whole genome amplification in milk samples with high somatic cell count (SCC) by using short- and long-read sequencing technologies. Our results showed that the DNA extraction methods performed differently in terms of host DNA removal, impacting metagenome composition and functional profiles. Moreover, the ratio of SCC/bacteria ultimately impacts microbial DNA yield, and samples with low SCC (SCC below 100,000 cells/mL) are the most problematic. When milk samples with high SCC (SCC above 200,000 cells/mL) underwent multiple-displacement amplification (MDA), we successfully recovered high-quality metagenome-assembled genomes (MAGs), and long-read sequencing was feasible even for samples with low DNA concentration. By combining MDA and short-read sequencing, we recovered twice as many MAGs as in untreated samples, and an ongoing mastitis-pathogen co-infection not reported by traditional methods was detected. Overall, this new approach will improve the detection of mastitis-associated microorganisms and make it possible to examine host-microbiome interactions in bovine mastitis.
IMPORTANCE: Next-generation sequencing technologies have been widely used to gain new insights into the diversity of the microbial community of milk samples and dairy products for different purposes such as microbial safety, profiling of starter cultures, and host-microbiome interactions. Milk is a complex food matrix, and the presence of host nucleic acid sequences is considered a contaminant in untargeted high-throughput sequencing studies. Therefore, genome-centric metagenomic studies of milk samples focusing on the health-disease status in dairy cattle are still scarce, which makes it difficult to evaluate the microbial ecophysiology of bovine hindmilk. This study provides an alternative method for genome-centric metagenome studies applied to hindmilk samples with high somatic cell content, which is indispensable to examining host-microbiome interactions in bovine mastitis.
Affiliation(s)
- Vinícius da Silva Duarte
- Faculty of Chemistry, Biotechnology and Food Science, Norwegian University of Life Sciences, Ås, Norway
- Davide Porcellato
- Faculty of Chemistry, Biotechnology and Food Science, Norwegian University of Life Sciences, Ås, Norway
10
Safar HA, Alatar F, Mustafa AS. Three Rounds of Read Correction Significantly Improve Eukaryotic Protein Detection in ONT Reads. Microorganisms 2024; 12:247. [PMID: 38399651 PMCID: PMC10893331 DOI: 10.3390/microorganisms12020247] [Received: 10/29/2023] [Revised: 01/19/2024] [Accepted: 01/23/2024] [Indexed: 02/25/2024] [Open Access]
Abstract
BACKGROUND: Whole-genome sequencing of eukaryotes is crucial for species identification, gene detection, and protein annotation. Oxford Nanopore Technology (ONT) is an affordable and rapid platform for sequencing eukaryotes; however, its relatively high error rates require computational and bioinformatic efforts to produce more accurate genome assemblies. Here, we evaluated the effect of read correction tools on eukaryote genome completeness, gene detection and protein annotation.
METHODS: Reads generated by ONT of four eukaryotes, C. albicans, C. gattii, S. cerevisiae, and P. falciparum, were assembled using minimap2 and underwent three rounds of read correction using flye, medaka and racon. The generated consensus FASTA files were compared for total length (bp), genome completeness, gene detection, and protein annotation by QUAST, BUSCO, BRAKER1 and InterProScan, respectively.
RESULTS: Genome completeness was dependent on the assembly method rather than on the read correction tool; however, medaka performed better than flye and racon. Racon performed significantly better than flye and medaka in gene detection, while both racon and medaka performed significantly better than flye in protein annotation.
CONCLUSION: We show that three rounds of read correction significantly affect gene detection and protein annotation, which are dependent on assembly quality in preference to assembly completeness.
Affiliation(s)
- Hussain A. Safar
- OMICS Research Unit, Health Science Centre, Kuwait University, Kuwait City 13110, Kuwait
- Fatemah Alatar
- Serology and Molecular Microbiology Reference Laboratory, Mubarak Al-Kabeer Hospital, Ministry of Health, Kuwait City 13110, Kuwait
- Abu Salim Mustafa
- Department of Microbiology, Faculty of Medicine, Kuwait University, Kuwait City 13110, Kuwait
11
Yin H, Wu S, Tan J, Guo Q, Li M, Guo J, Wang Y, Jiang X, Zhu H. IPEV: identification of prokaryotic and eukaryotic virus-derived sequences in virome using deep learning. Gigascience 2024; 13:giae018. [PMID: 38649300 PMCID: PMC11034026 DOI: 10.1093/gigascience/giae018] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/23/2023] [Revised: 03/14/2024] [Accepted: 03/25/2024] [Indexed: 04/25/2024] Open
Abstract
BACKGROUND The virome obtained through virus-like particle enrichment contains a mixture of prokaryotic and eukaryotic virus-derived fragments. Accurate identification and classification of these elements are crucial to understanding their roles and functions in microbial communities. However, the rapid mutation rates of viral genomes pose challenges in developing high-performance tools for classification, potentially limiting downstream analyses. FINDINGS We present IPEV, a novel method to distinguish prokaryotic and eukaryotic viruses in viromes, with a 2-dimensional convolutional neural network combining trinucleotide pair relative distance and frequency. Cross-validation assessments of IPEV demonstrate its state-of-the-art precision, significantly improving the F1-score by approximately 22% on an independent test set compared to existing methods when query viruses share less than 30% sequence similarity with known viruses. Furthermore, IPEV outperforms other methods in accuracy on marine and gut virome samples based on annotations by sequence alignments. IPEV reduces runtime by at most 1,225 times compared to existing methods under the same computing configuration. We also utilized IPEV to analyze longitudinal samples and found that the gut virome exhibits a higher degree of temporal stability than previously observed in persistent personal viromes, providing novel insights into the resilience of the gut virome in individuals. CONCLUSIONS IPEV is a high-performance, user-friendly tool that assists biologists in identifying and classifying prokaryotic and eukaryotic viruses within viromes. The tool is available at https://github.com/basehc/IPEV.
Affiliation(s)
- Hengchuang Yin
- Department of Biomedical Engineering, College of Future Technology, and Center for Quantitative Biology, Peking University, Beijing 100871, China
- Shufang Wu
- Department of Biomedical Engineering, College of Future Technology, and Center for Quantitative Biology, Peking University, Beijing 100871, China
- Jie Tan
- Department of Biomedical Engineering, College of Future Technology, and Center for Quantitative Biology, Peking University, Beijing 100871, China
- Qian Guo
- Department of Biomedical Engineering, College of Future Technology, and Center for Quantitative Biology, Peking University, Beijing 100871, China
- Mo Li
- Department of Biomedical Engineering, College of Future Technology, and Center for Quantitative Biology, Peking University, Beijing 100871, China
- School of Life Sciences, Peking University, Beijing 100871, China
- Jinyuan Guo
- Department of Biomedical Engineering, College of Future Technology, and Center for Quantitative Biology, Peking University, Beijing 100871, China
- Department of Biomedical Engineering, Georgia Institute of Technology and Emory University, Atlanta, GA 30332, USA
- Yaqi Wang
- Department of Biomedical Engineering, College of Future Technology, and Center for Quantitative Biology, Peking University, Beijing 100871, China
- Xiaoqing Jiang
- Department of Biomedical Engineering, College of Future Technology, and Center for Quantitative Biology, Peking University, Beijing 100871, China
- Beijing Institute of Genomics, Chinese Academy of Sciences, and China National Center for Bioinformation, Beijing 100101, China
- Huaiqiu Zhu
- Department of Biomedical Engineering, College of Future Technology, and Center for Quantitative Biology, Peking University, Beijing 100871, China
- School of Life Sciences, Peking University, Beijing 100871, China
- Department of Biomedical Engineering, Georgia Institute of Technology and Emory University, Atlanta, GA 30332, USA
12
Dial DT, Weglarz KM, Brunet BMT, Havill NP, von Dohlen CD, Burke GR. Whole-genome sequence of the Cooley spruce gall adelgid, Adelges cooleyi (Hemiptera: Sternorrhyncha: Adelgidae). G3 (Bethesda, Md.) 2023; 14:jkad224. [PMID: 37766465 PMCID: PMC10755206 DOI: 10.1093/g3journal/jkad224] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 05/25/2023] [Revised: 09/06/2023] [Accepted: 09/11/2023] [Indexed: 09/29/2023]
Abstract
The adelgids (Adelgidae) are a small family of sap-feeding insects, which, together with true aphids (Aphididae) and phylloxerans (Phylloxeridae), make up the infraorder Aphidomorpha. Some adelgid species, such as Adelges tsugae, Adelges piceae, Adelges laricis, Pineus pini, and Pineus boerneri, are highly destructive to forest ecosystems. Despite this, there are no high-quality genomic resources for adelgids, hindering advanced genomic analyses within Adelgidae and among Aphidomorpha. Here, we used PacBio continuous long-read and Illumina RNA-sequencing to construct a high-quality draft genome assembly for the Cooley spruce gall adelgid, Adelges cooleyi (Gillette), a gall-forming species endemic to North America. The assembled genome is 270.2 Mb in total size and has scaffold and contig N50 statistics of 14.87 and 7.18 Mb, respectively. There are 24,967 predicted coding sequences, and the assembly completeness is estimated at 98.1 and 99.6% with core BUSCO gene sets of Arthropoda and Hemiptera, respectively. Phylogenomic analysis using the A. cooleyi genome, 3 publicly available adelgid transcriptomes, 4 phylloxera transcriptomes, the Daktulosphaira vitifoliae (grape phylloxera) genome, 4 aphid genomes, and 2 outgroup coccoid genomes fully resolves adelgids and phylloxerans as sister taxa. The mitochondrial genome is 24 kb, among the largest in insects sampled to date, with 39.4% composed of noncoding regions. This genome assembly is currently the only genome-scale, annotated assembly for adelgids and will be a valuable resource for understanding the ecology and evolution of Aphidomorpha.
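For readers unfamiliar with the N50 statistic quoted in this abstract, the following is a minimal sketch of the usual definition: the length L such that sequences of length L or longer contain at least half of all assembled bases. The function name `n50` and the toy lengths are illustrative assumptions, not taken from the paper or from any assembly tool.

```python
def n50(lengths):
    """Return the N50 of a collection of contig or scaffold lengths.

    N50 is the length L such that sequences of length >= L together
    contain at least half of the total assembled bases.
    """
    if not lengths:
        raise ValueError("lengths must not be empty")
    total = sum(lengths)
    running = 0
    # Walk from the longest sequence down until half the bases are covered.
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length


# Toy example with hypothetical contig lengths (not data from the paper).
toy_contigs = [80, 70, 50, 40, 30, 20]
print(n50(toy_contigs))  # prints 70
```

In the toy example the total is 290 bases, and the two longest contigs (80 + 70 = 150) already cover more than half of it, so the N50 is 70.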
Affiliation(s)
- Dustin T Dial
- Department of Entomology, University of Georgia, Athens, GA 30602, USA
- Bryan M T Brunet
- Ottawa Research and Development Centre, Agriculture and Agri-Food Canada, Ottawa, ON, Canada K1A 0C6
- Nathan P Havill
- USDA Forest Service, Northern Research Station, Hamden, CT 06514, USA
- Gaelen R Burke
- Department of Entomology, University of Georgia, Athens, GA 30602, USA
13
Liu Z, Zhu C, Steinmetz LM, Wei W. Identification and quantification of small exon-containing isoforms in long-read RNA sequencing data. Nucleic Acids Res 2023; 51:e104. [PMID: 37843096 PMCID: PMC10639058 DOI: 10.1093/nar/gkad810] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/09/2023] [Revised: 08/03/2023] [Accepted: 09/20/2023] [Indexed: 10/17/2023] Open
Abstract
Small exons are pervasive in transcriptomes across organisms, and their quantification in RNA isoforms is crucial for understanding gene functions. Although long-read RNA-seq based on Oxford Nanopore Technologies (ONT) offers the advantage of covering transcripts in full length, its lower base accuracy poses challenges for identifying individual exons, particularly microexons (≤ 30 nucleotides). Here, we systematically assess small-exon quantification in synthetic and human ONT RNA-seq datasets. We demonstrate that reads containing small exons are often not properly aligned, affecting the quantification of relevant transcripts. Thus, we develop a local-realignment method for misaligned exons (MisER), which remaps reads with misaligned exons to the transcript references. Using synthetic and simulated datasets, we demonstrate the high sensitivity and specificity of MisER for the quantification of transcripts containing small exons. Moreover, MisER enabled us to identify small exons, particularly neural-regulated microexons, with a higher percent spliced-in (PSI) index in neural tissues when comparing 14 neural with 16 non-neural tissues in humans. Our work introduces an improved quantification method for long-read RNA-seq and especially facilitates studies using ONT long reads to elucidate the regulation of genes involving small exons.
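The percent spliced-in (PSI) index mentioned in this abstract can be illustrated with a short calculation. The sketch below uses a common read-count definition (reads supporting exon inclusion divided by all informative reads); it is an assumption that this matches the exact formula used by MisER, and the function name and counts are hypothetical.

```python
def percent_spliced_in(inclusion_reads, exclusion_reads):
    """Return PSI as a percentage, or None when no reads are informative.

    inclusion_reads: reads whose alignment includes the exon.
    exclusion_reads: reads that span the locus but skip the exon.
    """
    if inclusion_reads < 0 or exclusion_reads < 0:
        raise ValueError("read counts must be non-negative")
    total = inclusion_reads + exclusion_reads
    if total == 0:
        return None  # PSI is undefined without informative reads
    return 100.0 * inclusion_reads / total


# Hypothetical counts: 30 reads include the microexon, 10 skip it.
print(percent_spliced_in(30, 10))  # prints 75.0
```

Under this definition, a read that contains a small exon but is misaligned so that the exon is missed would be counted as exclusion, lowering the PSI estimate. That is the kind of bias the abstract describes.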
Affiliation(s)
- Zhen Liu
- Lingang Laboratory, Shanghai, Shanghai 200031, China
- CAS Key Laboratory of Computational Biology, Shanghai Institute of Nutrition and Health, University of Chinese Academy of Sciences, Chinese Academy of Sciences, Shanghai, Shanghai 200031, China
- Chenchen Zhu
- Department of Genetics, School of Medicine, Stanford University, Stanford, CA 94305, USA
- Lars M Steinmetz
- Department of Genetics, School of Medicine, Stanford University, Stanford, CA 94305, USA
- Stanford Genome Technology Center, Stanford University, Palo Alto, CA 94304, USA
- Wu Wei
- Lingang Laboratory, Shanghai, Shanghai 200031, China
- CAS Key Laboratory of Computational Biology, Shanghai Institute of Nutrition and Health, University of Chinese Academy of Sciences, Chinese Academy of Sciences, Shanghai, Shanghai 200031, China
- Center for Biomedical Informatics, Shanghai Children's Hospital, Shanghai Jiao Tong University, Shanghai, Shanghai 200040, China
14
Ciabatti E, González-Rueda A, de Malmazet D, Lee H, Morgese F, Tripodi M. Genomic stability of self-inactivating rabies. eLife 2023; 12:e83459. [PMID: 37921437 PMCID: PMC10666929 DOI: 10.7554/elife.83459] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/14/2022] [Accepted: 11/02/2023] [Indexed: 11/04/2023] Open
Abstract
Transsynaptic viral vectors provide means to gain genetic access to neurons based on synaptic connectivity and are essential tools for the dissection of neural circuit function. Among them, the retrograde monosynaptic ΔG-Rabies has been widely used in neuroscience research. A recently developed engineered version of the ΔG-Rabies, the non-toxic self-inactivating (SiR) virus, allows the long term genetic manipulation of neural circuits. However, the high mutational rate of the rabies virus poses a risk that mutations targeting the key genetic regulatory element in the SiR genome could emerge and revert it to a canonical ΔG-Rabies. Such revertant mutations have recently been identified in a SiR batch. To address the origin, incidence and relevance of these mutations, we investigated the genomic stability of SiR in vitro and in vivo. We found that "revertant" mutations are rare and accumulate only when SiR is extensively amplified in vitro, particularly in suboptimal production cell lines that have insufficient levels of TEV protease activity. Moreover, we confirmed that SiR-CRE, unlike canonical ΔG-Rab-CRE or revertant-SiR-CRE, is non-toxic and that revertant mutations do not emerge in vivo during long-term experiments.
Affiliation(s)
- Hassal Lee
- MRC Laboratory of Molecular Biology, Cambridge, United Kingdom
- Fabio Morgese
- MRC Laboratory of Molecular Biology, Cambridge, United Kingdom
- Marco Tripodi
- MRC Laboratory of Molecular Biology, Cambridge, United Kingdom
15
Schelkunov MI. Mabs, a suite of tools for gene-informed genome assembly. BMC Bioinformatics 2023; 24:377. [PMID: 37794322 PMCID: PMC10548655 DOI: 10.1186/s12859-023-05499-3] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Received: 04/04/2023] [Accepted: 09/26/2023] [Indexed: 10/06/2023] Open
Abstract
BACKGROUND Despite constantly improving genome sequencing methods, error-free eukaryotic genome assembly has not yet been achieved. Among other kinds of problems of eukaryotic genome assembly are so-called "haplotypic duplications", which may manifest themselves as cases of alleles being mistakenly assembled as paralogues. Haplotypic duplications are dangerous because they create illusions of gene family expansions and, thus, may lead scientists to incorrect conclusions about genome evolution and functioning. RESULTS Here, I present Mabs, a suite of tools that serve as parameter optimizers of the popular genome assemblers Hifiasm and Flye. By optimizing the parameters of Hifiasm and Flye, Mabs tries to create genome assemblies with the genes assembled as accurately as possible. Tests on 6 eukaryotic genomes showed that in 6 out of 6 cases, Mabs created assemblies with more accurately assembled genes than those generated by Hifiasm and Flye when they were run with default parameters. When assemblies of Mabs, Hifiasm and Flye were postprocessed by a popular tool for haplotypic duplication removal, Purge_dups, genes were better assembled by Mabs in 5 out of 6 cases. CONCLUSIONS Mabs is useful for making high-quality genome assemblies. It is available at https://github.com/shelkmike/Mabs.
16
Leitner K, Motheramgari K, Borth N, Marx N. Nanopore Cas9-targeted sequencing enables accurate and simultaneous identification of transgene integration sites, their structure and epigenetic status in recombinant Chinese hamster ovary cells. Biotechnol Bioeng 2023; 120:2403-2418. [PMID: 36938677 DOI: 10.1002/bit.28382] [Citation(s) in RCA: 3] [Impact Index Per Article: 3.0] [Received: 11/18/2022] [Revised: 01/27/2023] [Accepted: 03/12/2023] [Indexed: 03/21/2023]
Abstract
The integration of a transgene expression construct into the host genome is the initial step in generating recombinant cell lines used for biopharmaceutical production. The stability and level of recombinant gene expression in Chinese hamster ovary (CHO) cells can be correlated with the copy number, integration site, and epigenetic context of the transgene vector. In addition, undesired integration events, such as concatemers, truncated, and inverted vector repeats, affect the stability of recombinant cell lines. Thus, to characterize cell clones and to isolate the most promising candidates, it is crucial to obtain information on the site of integration, the structure of the integrated sequence, and the epigenetic status. Current sequencing techniques allow this information to be gathered separately but do not offer a comprehensive and simultaneous resolution. In this study, we present a fast and robust nanopore Cas9-targeted sequencing (nCats) pipeline that identifies integration sites, the composition of the integrated sequence, and its DNA methylation status in CHO cells, all from the same sequencing run. A Cas9-enrichment step during library preparation enables targeted and directional nanopore sequencing with up to 724× median on-target coverage and reads up to 153 kb long. The data generated by nCats provide sensitive, detailed, and correct information on the transgene integration sites and the expression vector structure, which could only partly be obtained from traditional Targeted Locus Amplification-seq data. Moreover, with nCats the DNA methylation status can be analyzed from the same raw data without prior DNA amplification.
Affiliation(s)
- Klaus Leitner
- Austrian Center of Industrial Biotechnology GmbH, Vienna, Austria
- Nicole Borth
- Austrian Center of Industrial Biotechnology GmbH, Vienna, Austria
- Department of Biotechnology, Institute of Animal Cell Technology and Systems Biology, University of Natural Resources and Life Sciences, Vienna, Austria
- Nicolas Marx
- Department of Biotechnology, Institute of Animal Cell Technology and Systems Biology, University of Natural Resources and Life Sciences, Vienna, Austria
17
Vachon A, Seo GE, Patel NH, Coffin CS, Marinier E, Eyras E, Osiowy C. Hepatitis B virus serum RNA transcript isoform composition and proportion in chronic hepatitis B patients by nanopore long-read sequencing. Front Microbiol 2023; 14:1233178. [PMID: 37645229 PMCID: PMC10461054 DOI: 10.3389/fmicb.2023.1233178] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Received: 06/01/2023] [Accepted: 07/31/2023] [Indexed: 08/31/2023] Open
Abstract
Introduction Serum hepatitis B virus (HBV) RNA is a promising new biomarker to manage and predict clinical outcomes of chronic hepatitis B (CHB) infection. However, the HBV serum transcriptome within encapsidated particles, which is the biomarker analyte measured in serum, remains poorly characterized. This study aimed to evaluate serum HBV RNA transcript composition and proportionality by PCR-cDNA nanopore sequencing of samples from CHB patients having varied HBV genotype (gt, A to F) and HBeAg status. Methods Longitudinal specimens from 3 individuals during and following pregnancy (approximately 7 months between time points) were also investigated. HBV RNA extracted from 16 serum samples obtained from 13 patients (73.3% female, 84.6% Asian) was sequenced, and serum HBV RNA isoform detection and quantification were performed using three bioinformatic workflows: FLAIR, RATTLE, and a GraphMap-based workflow within the Galaxy application. A spike-in RNA variant (SIRV) control mix was used to assess run quality and coverage. The proportionality of transcript isoforms was based on total HBV reads determined by each workflow. Results All chosen isoform detection workflows showed high agreement in transcript proportionality and composition for most samples. HBV pregenomic RNA (pgRNA) was the most frequently observed transcript isoform (93.8% of patient samples), while other detected transcripts included pgRNA spliced variants, 3' truncated variants and HBx mRNA, depending on the isoform detection method. Spliced variants of pgRNA were primarily observed in HBV gtB, C, E, or F-infected patients, with the Sp1 spliced variant detected most frequently. Twelve other pgRNA spliced variant transcripts were identified, including 3 previously unidentified transcripts, although spliced isoform identification was very dependent on the workflow used to analyze sequence data. Longitudinal sampling among pregnant and post-partum antiviral-treated individuals showed increasing proportions of 3' truncated pgRNA variants over time. Conclusions This study demonstrated long-read sequencing as a promising tool for the characterization of the serum HBV transcriptome. However, further studies are needed to better understand how serum HBV RNA isoform type and proportion are linked to CHB disease progression and antiviral treatment response.
Affiliation(s)
- Alicia Vachon
- Department of Medical Microbiology and Infectious Diseases, University of Manitoba, Winnipeg, MB, Canada
- National Microbiology Laboratory, Public Health Agency of Canada, Winnipeg, MB, Canada
- Grace E. Seo
- Department of Medical Microbiology and Infectious Diseases, University of Manitoba, Winnipeg, MB, Canada
- National Microbiology Laboratory, Public Health Agency of Canada, Winnipeg, MB, Canada
- Nishi H. Patel
- Department of Medicine and Department of Microbiology, Immunology, and Infectious Diseases, Cumming School of Medicine, University of Calgary, Calgary, AB, Canada
- Carla S. Coffin
- Department of Medicine and Department of Microbiology, Immunology, and Infectious Diseases, Cumming School of Medicine, University of Calgary, Calgary, AB, Canada
- Eric Marinier
- National Microbiology Laboratory, Public Health Agency of Canada, Winnipeg, MB, Canada
- Eduardo Eyras
- EMBL Australia Partner Laboratory Network at the Australian National University, Canberra, ACT, Australia
- The John Curtin School of Medical Research, ANU College of Health and Medicine, Canberra, ACT, Australia
- Catalan Institution for Research and Advanced Studies, Barcelona, Spain
- Hospital del Mar Medical Research Institute, Barcelona, Spain
- Carla Osiowy
- Department of Medical Microbiology and Infectious Diseases, University of Manitoba, Winnipeg, MB, Canada
- National Microbiology Laboratory, Public Health Agency of Canada, Winnipeg, MB, Canada
18
Calvo-Roitberg E, Daniels RF, Pai AA. Challenges in identifying mRNA transcript starts and ends from long-read sequencing data. bioRxiv 2023:2023.07.26.550536. [PMID: 37546743 PMCID: PMC10402045 DOI: 10.1101/2023.07.26.550536] [Citation(s) in RCA: 4] [Impact Index Per Article: 4.0] [Indexed: 08/08/2023]
Abstract
Long-read sequencing (LRS) technologies have the potential to revolutionize scientific discoveries in RNA biology, especially by enabling the comprehensive identification and quantification of full-length mRNA isoforms. However, inherently high error rates make the analysis of long-read sequencing data challenging. While these error rates have been characterized for sequence and splice site identification, it is still unclear how accurately LRS reads represent transcript start and end sites. Here, we systematically assess the variability and accuracy of mRNA terminal ends identified by LRS reads across multiple sequencing platforms. We find substantial inconsistencies in both the start and end coordinates of LRS reads spanning a gene, such that LRS reads often fail to accurately recapitulate annotated or empirically derived terminal ends of mRNA molecules. To address this challenge, we introduce an approach to condition reads based on empirically derived terminal ends and identify a subset of reads that are more likely to represent full-length transcripts. Our approach can improve transcriptome analyses by enhancing the fidelity of transcript terminal end identification, but may result in lower power to quantify genes or discover novel isoforms. Thus, it is necessary to be cautious when selecting sequencing approaches and/or interpreting data from long-read RNA sequencing.
Affiliation(s)
- Rachel F Daniels
- RNA Therapeutics Institute, University of Massachusetts Chan Medical School, Worcester, MA
- Athma A Pai
- RNA Therapeutics Institute, University of Massachusetts Chan Medical School, Worcester, MA
19
Frascarelli C, Zanetti N, Nasca A, Izzo R, Lamperti C, Lamantea E, Legati A, Ghezzi D. Nanopore long-read next-generation sequencing for detection of mitochondrial DNA large-scale deletions. Front Genet 2023; 14:1089956. [PMID: 37456669 PMCID: PMC10344361 DOI: 10.3389/fgene.2023.1089956] [Citation(s) in RCA: 5] [Impact Index Per Article: 5.0] [Received: 11/04/2022] [Accepted: 06/13/2023] [Indexed: 07/18/2023] Open
Abstract
Primary mitochondrial diseases are progressive genetic disorders affecting multiple organs and characterized by mitochondrial dysfunction. These disorders can be caused by mutations in nuclear genes encoding proteins with mitochondrial localization or by genetic defects in the mitochondrial genome (mtDNA). The latter include pathogenic point variants and large-scale deletions/rearrangements. MtDNA molecules with the wild-type or a variant sequence can exist together in a single cell, a condition known as mtDNA heteroplasmy. MtDNA single point mutations are typically detected by means of Next-Generation Sequencing (NGS) based on short reads, which, however, is of limited use for identifying structural mtDNA alterations. Recently, new NGS technologies based on long reads have been released, yielding sequences of several kilobases in length; this approach is suitable for detecting structural alterations affecting the mitochondrial genome. In the present work we illustrate the optimization of two sequencing protocols based on long-read Oxford Nanopore Technology to detect mtDNA structural alterations. This approach presents strong advantages in the analysis of mtDNA compared to both short-read NGS and traditional techniques, potentially becoming the method of choice for genetic studies on mtDNA.
Affiliation(s)
- Chiara Frascarelli
- Unit of Medical Genetics and Neurogenetics, Fondazione IRCCS Istituto Neurologico Carlo Besta, Milan, Italy
- Nadia Zanetti
- Unit of Medical Genetics and Neurogenetics, Fondazione IRCCS Istituto Neurologico Carlo Besta, Milan, Italy
- Alessia Nasca
- Unit of Medical Genetics and Neurogenetics, Fondazione IRCCS Istituto Neurologico Carlo Besta, Milan, Italy
- Rossella Izzo
- Unit of Medical Genetics and Neurogenetics, Fondazione IRCCS Istituto Neurologico Carlo Besta, Milan, Italy
- Costanza Lamperti
- Unit of Medical Genetics and Neurogenetics, Fondazione IRCCS Istituto Neurologico Carlo Besta, Milan, Italy
- Eleonora Lamantea
- Unit of Medical Genetics and Neurogenetics, Fondazione IRCCS Istituto Neurologico Carlo Besta, Milan, Italy
- Andrea Legati
- Unit of Medical Genetics and Neurogenetics, Fondazione IRCCS Istituto Neurologico Carlo Besta, Milan, Italy
- Daniele Ghezzi
- Unit of Medical Genetics and Neurogenetics, Fondazione IRCCS Istituto Neurologico Carlo Besta, Milan, Italy
- Department of Pathophysiology and Transplantation (DEPT), University of Milan, Milan, Italy
20
Levin I, Štrajbl M, Fastman Y, Baran D, Twito S, Mioduser J, Keren A, Fischman S, Zhenin M, Nimrod G, Levitin N, Mayor MB, Gadrich M, Ofran Y. Accurate profiling of full-length Fv in highly homologous antibody libraries using UMI tagged short reads. Nucleic Acids Res 2023; 51:e61. [PMID: 37014016 PMCID: PMC10287906 DOI: 10.1093/nar/gkad235] [Citation(s) in RCA: 3] [Impact Index Per Article: 3.0] [Received: 02/21/2022] [Revised: 03/14/2023] [Accepted: 03/29/2023] [Indexed: 04/05/2023] Open
Abstract
Deep parallel sequencing (NGS) is a viable tool for monitoring scFv and Fab library dynamics in many antibody engineering high-throughput screening efforts. Although very useful, the commonly used Illumina NGS platform cannot handle the entire sequence of an scFv or Fab in a single read, so analyses usually focus on specific CDRs or resort to sequencing the VH and VL variable domains separately, which limits its utility for comprehensive monitoring of selection dynamics. Here we present a simple and robust method for deep sequencing repertoires of full-length scFv, Fab and Fv antibody sequences. This process utilizes standard molecular procedures and unique molecular identifiers (UMI) to pair separately sequenced VH and VL. We show that UMI-assisted VH-VL matching allows for a comprehensive and highly accurate mapping of full-length Fv clonal dynamics in large, highly homologous antibody libraries, as well as identification of rare variants. In addition to its utility in synthetic antibody discovery processes, our method can be instrumental in generating large datasets for machine learning (ML) applications, which in the field of antibody engineering have been hampered by a conspicuous paucity of large-scale full-length Fv data.
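The idea of pairing separately sequenced VH and VL reads through a shared UMI can be sketched in a few lines. This is a generic illustration under stated assumptions, not the authors' pipeline: it assumes each read has already been reduced to a hypothetical `(umi, chain, sequence)` tuple, and it resolves sequencing noise by taking the most frequent sequence per UMI and chain.

```python
from collections import Counter, defaultdict


def pair_vh_vl_by_umi(reads):
    """Pair VH and VL sequences that share a UMI.

    reads: iterable of (umi, chain, sequence) tuples, chain in {"VH", "VL"}.
    Returns a dict mapping each UMI seen with both chains to a
    (vh_sequence, vl_sequence) tuple, using the most frequent sequence
    observed for each chain as a simple consensus.
    """
    by_umi = defaultdict(lambda: {"VH": Counter(), "VL": Counter()})
    for umi, chain, sequence in reads:
        if chain not in ("VH", "VL"):
            raise ValueError(f"unknown chain label: {chain!r}")
        by_umi[umi][chain][sequence] += 1

    pairs = {}
    for umi, chains in by_umi.items():
        # Keep only UMIs for which both chains were observed.
        if chains["VH"] and chains["VL"]:
            vh = chains["VH"].most_common(1)[0][0]
            vl = chains["VL"].most_common(1)[0][0]
            pairs[umi] = (vh, vl)
    return pairs


# Hypothetical toy reads (short placeholder strings, not real antibody data).
toy_reads = [
    ("AAAC", "VH", "QVQL"),
    ("AAAC", "VH", "QVQL"),
    ("AAAC", "VH", "QVQV"),  # minority variant, treated as a read error
    ("AAAC", "VL", "DIQM"),
    ("GGTT", "VH", "EVQL"),  # no VL partner, so this UMI is dropped
]
print(pair_vh_vl_by_umi(toy_reads))
```

A majority vote is the simplest possible consensus. A real workflow would likely also need UMI error correction and minimum read-count thresholds, which are left out here.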
Affiliation(s)
- Adi Keren
- Biolojic Design, Ltd, Rehovot, Israel
- Yanay Ofran
- Biolojic Design, Ltd, Rehovot, Israel
- The Goodman Faculty of Life Sciences, Bar Ilan University, Ramat Gan, Israel
21
Boßelmann CM, Leu C, Lal D. Technological and computational approaches to detect somatic mosaicism in epilepsy. Neurobiol Dis 2023:106208. [PMID: 37343892 DOI: 10.1016/j.nbd.2023.106208] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/05/2023] [Revised: 06/03/2023] [Accepted: 06/16/2023] [Indexed: 06/23/2023] Open
Abstract
Lesional epilepsy is a common and severe disease frequently associated with malformations of cortical development, including focal cortical dysplasia and hemimegalencephaly. Recent advances in sequencing and variant calling technologies have identified several genetic causes, including both short/single nucleotide and structural somatic variation. In this review, we aim to provide a comprehensive overview of the methodological advancements in this field while highlighting the unresolved technological and computational challenges that persist, including ultra-low variant allele fractions in bulk tissue, low availability of paired control samples, spatial variability of mutational burden within the lesion, and the issue of false-positive calls and validation procedures. Information from genetic testing in focal epilepsy may be integrated into clinical care to inform histopathological diagnosis, postoperative prognosis, and candidate precision therapies.
Affiliation(s)
- Christian M Boßelmann
- Genomic Medicine Institute, Lerner Research Institute, Cleveland Clinic, Cleveland, OH, USA; Epilepsy Center, Neurological Institute, Cleveland Clinic, Cleveland, OH, USA
- Costin Leu
- Genomic Medicine Institute, Lerner Research Institute, Cleveland Clinic, Cleveland, OH, USA; Department of Clinical and Experimental Epilepsy, Institute of Neurology, University College London, London, UK
- Dennis Lal
- Genomic Medicine Institute, Lerner Research Institute, Cleveland Clinic, Cleveland, OH, USA; Epilepsy Center, Neurological Institute, Cleveland Clinic, Cleveland, OH, USA; Stanley Center for Psychiatric Research, Broad Institute of Harvard and M.I.T., Cambridge, MA, USA; Cologne Center for Genomics (CCG), University of Cologne, Cologne, Germany
22
Li KK, Lau B, Suárez NM, Camiolo S, Gunson R, Davison AJ, Orton RJ. Direct Nanopore Sequencing of Human Cytomegalovirus Genomes from High-Viral-Load Clinical Samples. Viruses 2023; 15:1248. [PMID: 37376548 DOI: 10.3390/v15061248] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/19/2023] [Revised: 05/18/2023] [Accepted: 05/22/2023] [Indexed: 06/29/2023] Open
Abstract
Nanopore sequencing is becoming increasingly commonplace in clinical settings, particularly for diagnostic assessments and outbreak investigations, due to its portability, low cost, and ability to operate in near real-time. Although high sequencing error rates initially hampered the wider implementation of this technology, improvements have been made continually with each iteration of the sequencing hardware and base-calling software. Here, we assess the feasibility of using nanopore sequencing to determine the complete genomes of human cytomegalovirus (HCMV) in high-viral-load clinical samples without viral DNA enrichment, PCR amplification, or prior knowledge of the sequences. We utilised a hybrid bioinformatic approach that involved assembling the reads de novo, improving the consensus sequence by aligning reads to the best-matching genome from a collated set of published sequences, and polishing the improved consensus sequence. The final genomes from a urine sample and a lung sample, the former with an HCMV to human DNA load approximately 50 times greater than the latter, achieved 99.97 and 99.93% identity, respectively, to the benchmark genomes obtained independently by Illumina sequencing. Thus, we demonstrated that nanopore sequencing is capable of determining HCMV genomes directly from high-viral-load clinical samples with a high accuracy.
Affiliation(s)
- Kathy K Li
- Medical Research Council, University of Glasgow Centre for Virus Research, Glasgow G61 1QH, UK
- Regional Virus Laboratory, Belfast Health and Social Care Trust, Belfast BT12 6BA, UK
- Betty Lau
- Medical Research Council, University of Glasgow Centre for Virus Research, Glasgow G61 1QH, UK
- Nicolás M Suárez
- Medical Research Council, University of Glasgow Centre for Virus Research, Glasgow G61 1QH, UK
- Salvatore Camiolo
- Medical Research Council, University of Glasgow Centre for Virus Research, Glasgow G61 1QH, UK
- Rory Gunson
- West of Scotland Specialist Virology Centre, NHS Greater Glasgow & Clyde, Glasgow G31 2ER, UK
- Andrew J Davison
- Medical Research Council, University of Glasgow Centre for Virus Research, Glasgow G61 1QH, UK
- Richard J Orton
- Medical Research Council, University of Glasgow Centre for Virus Research, Glasgow G61 1QH, UK
|
23
|
Wong J, Coombe L, Nikolić V, Zhang E, Nip KM, Sidhu P, Warren RL, Birol I. Linear time complexity de novo long read genome assembly with GoldRush. Nat Commun 2023; 14:2906. [PMID: 37217507 DOI: 10.1038/s41467-023-38716-x] [Citation(s) in RCA: 4] [Impact Index Per Article: 4.0] [Received: 11/10/2022] [Accepted: 05/11/2023] [Indexed: 05/24/2023] Open
Abstract
Current state-of-the-art de novo long read genome assemblers follow the Overlap-Layout-Consensus paradigm. While read-to-read overlap - its most costly step - was improved in modern long read genome assemblers, these tools still often require excessive RAM when assembling a typical human dataset. Our work departs from this paradigm, foregoing all-vs-all sequence alignments in favor of a dynamic data structure implemented in GoldRush, a de novo long read genome assembly algorithm with linear time complexity. We tested GoldRush on Oxford Nanopore Technologies long sequencing read datasets with different base error profiles sourced from three human cell lines, rice, and tomato. Here, we show that GoldRush achieves assembly scaffold NGA50 lengths of 18.3-22.2, 0.3 and 2.6 Mbp, for the genomes of human, rice, and tomato, respectively, and assembles each genome within a day, using at most 54.5 GB of random-access memory, demonstrating the scalability of our genome assembly paradigm and its implementation.
Affiliation(s)
- Johnathan Wong
- Canada's Michael Smith Genome Sciences Centre, BC Cancer, Vancouver, BC, V5Z 4S6, Canada
- Lauren Coombe
- Canada's Michael Smith Genome Sciences Centre, BC Cancer, Vancouver, BC, V5Z 4S6, Canada
- Vladimir Nikolić
- Canada's Michael Smith Genome Sciences Centre, BC Cancer, Vancouver, BC, V5Z 4S6, Canada
- Emily Zhang
- Canada's Michael Smith Genome Sciences Centre, BC Cancer, Vancouver, BC, V5Z 4S6, Canada
- Ka Ming Nip
- Canada's Michael Smith Genome Sciences Centre, BC Cancer, Vancouver, BC, V5Z 4S6, Canada
- Puneet Sidhu
- Canada's Michael Smith Genome Sciences Centre, BC Cancer, Vancouver, BC, V5Z 4S6, Canada
- René L Warren
- Canada's Michael Smith Genome Sciences Centre, BC Cancer, Vancouver, BC, V5Z 4S6, Canada
- Inanç Birol
- Canada's Michael Smith Genome Sciences Centre, BC Cancer, Vancouver, BC, V5Z 4S6, Canada
|
24
|
Berger B, Yu YW. Navigating bottlenecks and trade-offs in genomic data analysis. Nat Rev Genet 2023; 24:235-250. [PMID: 36476810 DOI: 10.1038/s41576-022-00551-z] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Accepted: 10/27/2022] [Indexed: 12/12/2022]
Abstract
Genome sequencing and analysis allow researchers to decode the functional information hidden in DNA sequences as well as to study cell to cell variation within a cell population. Traditionally, the primary bottleneck in genomic analysis pipelines has been the sequencing itself, which has been much more expensive than the computational analyses that follow. However, an important consequence of the continued drive to expand the throughput of sequencing platforms at lower cost is that often the analytical pipelines are struggling to keep up with the sheer amount of raw data produced. Computational cost and efficiency have thus become of ever increasing importance. Recent methodological advances, such as data sketching, accelerators and domain-specific libraries/languages, promise to address these modern computational challenges. However, despite being more efficient, these innovations come with a new set of trade-offs, both expected, such as accuracy versus memory and expense versus time, and more subtle, including the human expertise needed to use non-standard programming interfaces and set up complex infrastructure. In this Review, we discuss how to navigate these new methodological advances and their trade-offs.
Affiliation(s)
- Bonnie Berger
- Department of Mathematics, Massachusetts Institute of Technology, Cambridge, MA, USA
- Computer Science and Artificial Intelligence Laboratory, Massachusetts Institute of Technology, Cambridge, MA, USA
- Yun William Yu
- Department of Computer and Mathematical Sciences, University of Toronto Scarborough, Toronto, Ontario, Canada
- Tri-Campus Department of Mathematics, University of Toronto, Toronto, Ontario, Canada
|
25
|
Gorzalski AJ, Kerwin H, Verma S, Hess DC, Sevinsky J, Libuit K, Vlasova-St Louis I, Siao D, Siao L, Buñuel D, Van Hooser S, Pandori MW. Rapid Lineage Assignment of Severe Acute Respiratory Syndrome Coronavirus 2 Cases through Automated Library Preparation, Sequencing, and Bioinformatic Analysis. J Mol Diagn 2023; 25:191-196. [PMID: 36754279 PMCID: PMC9902282 DOI: 10.1016/j.jmoldx.2023.01.005] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/20/2022] [Revised: 01/06/2023] [Accepted: 01/12/2023] [Indexed: 02/10/2023] Open
Abstract
The coronavirus disease 2019 (COVID-19) pandemic has provided a stage to illustrate that there is considerable value in obtaining rapid, whole-genome-based information about pathogens. This article describes the utility of a commercially available, automated severe acute respiratory syndrome associated coronavirus 2 (SARS-CoV-2) library preparation, genome sequencing, and a bioinformatics analysis pipeline to provide rapid, near-real-time SARS-CoV-2 variant description. This study evaluated the turnaround time, accuracy, and other quality-related parameters obtained from commercially available automated sequencing instrumentation, from analysis of continuous clinical samples obtained from January 1, 2021, to October 6, 2021. This analysis included a base-by-base assessment of sequencing accuracy at every position in the SARS-CoV-2 chromosome using two commercially available methods. Mean turnaround time, from the receipt of a specimen for SARS-CoV-2 testing to the availability of the results, with lineage assignment, was <3 days. Accuracy of sequencing by one method was 100%, although certain sites on the genome were found repeatedly to have been sequenced with varying degrees of read error rate.
Affiliation(s)
- Subhash Verma
- Department of Microbiology and Immunology, University of Nevada-Reno, School of Medicine, Reno, Nevada
- David C Hess
- Nevada State Public Health Laboratory, Reno, Nevada; Department of Pathology and Laboratory Medicine, University of Nevada-Reno, School of Medicine, Reno, Nevada
- Lauren Siao
- Nevada State Public Health Laboratory, Reno, Nevada
- Diego Buñuel
- Nevada State Public Health Laboratory, Reno, Nevada
- Mark W Pandori
- Nevada State Public Health Laboratory, Reno, Nevada; Department of Microbiology and Immunology, University of Nevada-Reno, School of Medicine, Reno, Nevada; Department of Pathology and Laboratory Medicine, University of Nevada-Reno, School of Medicine, Reno, Nevada
|
26
|
Clappier C, Böttner D, Heinzelmann D, Stadermann A, Schulz P, Schmidt M, Lindner B. Deciphering integration loci of CHO manufacturing cell lines using long read nanopore sequencing. N Biotechnol 2023; 75:31-39. [PMID: 36925062 DOI: 10.1016/j.nbt.2023.03.003] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/20/2022] [Revised: 03/02/2023] [Accepted: 03/13/2023] [Indexed: 03/17/2023]
Abstract
Despite advances in genetic characterization of Chinese hamster ovary (CHO) cell lines regarding identification of integration sites using next generation sequencing, e.g. targeted locus amplification sequencing (TLA-seq), the concatemer structure of the integrated vectors remains elusive. Here, the entire integration locus of two CHO manufacturing cell lines was reconstructed combining CRISPR/Cas9 target enrichment, nanopore sequencing and the Canu de novo assembly pipeline. An IgG producing CHO cell line integrated 3 vector copies, which were near full-length and contained all relevant vector elements such as transgenes and their promoters on each of the vector copies. In contrast, a second CHO cell line producing a bivalent bispecific antibody integrated 7 highly fragmented vector copies in different orientations leading to head-to-head and tail-to-tail fusions. The size of the vector fragments ranged from 3.0 to 11.4 kbp each carrying 1-3 transgenes. The breakpoints of the genome-vector and vector-vector junctions were validated using Sanger sequencing and Southern blotting. A comparison to TLA-seq data confirmed the genomic breakpoints, but most of the breakpoints of the vector-vector fusions were missed by TLA-seq. For the first time, the complete transgene locus of CHO manufacturing cell lines could be deciphered. Strikingly, the application of the nanopore long-read sequencing technology led to novel insights into the complexity of genomic transgene integrations of CHO manufacturing cell lines generated via random integration.
Affiliation(s)
- Christian Clappier
- Bioprocess Development Biologicals, Cell Line Development, Boehringer Ingelheim Pharma GmbH & Co. KG, Birkendorfer Strasse 65, 88397 Biberach, Germany
- Dennis Böttner
- Research, Cardiometabolic Diseases, Boehringer Ingelheim Pharma GmbH & Co. KG, Birkendorfer Strasse 65, 88397 Biberach, Germany
- Daniel Heinzelmann
- Bioprocess Development Biologicals, Cell Line Development, Boehringer Ingelheim Pharma GmbH & Co. KG, Birkendorfer Strasse 65, 88397 Biberach, Germany
- Anna Stadermann
- Bioprocess Development Biologicals, Cell Line Development, Boehringer Ingelheim Pharma GmbH & Co. KG, Birkendorfer Strasse 65, 88397 Biberach, Germany
- Patrick Schulz
- Bioprocess Development Biologicals, Cell Line Development, Boehringer Ingelheim Pharma GmbH & Co. KG, Birkendorfer Strasse 65, 88397 Biberach, Germany
- Moritz Schmidt
- Bioprocess Development Biologicals, Cell Line Development, Boehringer Ingelheim Pharma GmbH & Co. KG, Birkendorfer Strasse 65, 88397 Biberach, Germany
- Benjamin Lindner
- Bioprocess Development Biologicals, Cell Line Development, Boehringer Ingelheim Pharma GmbH & Co. KG, Birkendorfer Strasse 65, 88397 Biberach, Germany
|
27
|
Eagle SHC, Robertson J, Bastedo DP, Liu K, Nash JHE. Evaluation of five commercial DNA extraction kits using Salmonella as a model for implementation of rapid Nanopore sequencing in routine diagnostic laboratories. Access Microbiol 2023; 5:000468.v3. [PMID: 36910509 PMCID: PMC9996181 DOI: 10.1099/acmi.0.000468.v3] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/30/2022] [Accepted: 11/07/2022] [Indexed: 02/23/2023] Open
Abstract
Oxford Nanopore long-read sequencing offers advantages over Illumina short reads for the identification and characterization of bacterial pathogens for outbreak detection and surveillance activities within a diagnostic public health laboratory context. Compared to Illumina, Nanopore is more cost-effective for small batches, has a lower capital cost and has a faster turnaround time, in addition to the ability to assemble complete bacterial genomes. The quantity and quality of DNA required for Nanopore sequencing are greater than for Illumina, and the DNA extraction methods recommended for obtaining high-molecular-weight DNA are different from those typically used in diagnostic laboratories. Using a Salmonella isolate with a previously closed PacBio genome as a model Enterobacteriaceae organism, we evaluated the quantity, quality and fragmentation of five commercial DNA extraction kits. Nanopore sequencing performance was evaluated for the top three methods: Qiagen EZ1 DNA Tissue, Qiagen DNeasy Blood and Tissue, and a modified, in-house version of the MasterPure Complete DNA and RNA purification. To evaluate the effect of post-extraction DNA purification methods, we subjected extracted DNA from the three selected extraction methods to purification by AMPure beads or ethanol precipitation and compared these outputs with untreated DNA as a control. All methods are suitable for routine whole-genome sequencing (WGS), since all 60 replicates had very high genome recovery rates, with ≥98 % of the reference genome covered by mapped Nanopore reads. For 85 % of the replicates, assembly was able to produce a complete, circular chromosome using either Flye or Canu. In most cases, it is recommended to move directly from extraction to sequencing, as untreated DNA had the highest rates of genome closure regardless of extraction method. 
Using our evaluation criteria, the Qiagen DNeasy Blood and Tissue kit was found to be the best overall method due to its low cost, ability to scale from single tubes to 96-well plates, and high consistency in yield and sequencing performance.
Affiliation(s)
- Shannon H C Eagle
- National Microbiology Laboratory, Public Health Agency of Canada, Guelph, Ontario, Canada
- James Robertson
- National Microbiology Laboratory, Public Health Agency of Canada, Guelph, Ontario, Canada
- D Patrick Bastedo
- National Microbiology Laboratory, Public Health Agency of Canada, Toronto, Ontario, Canada
- Kira Liu
- Patented Medicine Prices Review Board, Ottawa, Ontario, Canada
- John H E Nash
- National Microbiology Laboratory, Public Health Agency of Canada, Toronto, Ontario, Canada
|
28
|
Goemann CL, Wilkinson R, Henriques W, Bui H, Goemann HM, Carlson RP, Viamajala S, Gerlach R, Wiedenheft B. Genome sequence, phylogenetic analysis, and structure-based annotation reveal metabolic potential of Chlorella sp. SLA-04. ALGAL RES 2022. [DOI: 10.1016/j.algal.2022.102943] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 12/15/2022]
|
29
|
Shearman JR, Pootakham W, Sonthirod C, Naktang C, Yoocha T, Sangsrakru D, Jomchai N, Tongsima S, Piriyapongsa J, Ngamphiw C, Wanasen N, Ukoskit K, Punpee P, Klomsa-ard P, Sriroth K, Zhang J, Zhang X, Ming R, Tragoonrung S, Tangphatsornruang S. A draft chromosome-scale genome assembly of a commercial sugarcane. Sci Rep 2022; 12:20474. [PMID: 36443360 PMCID: PMC9705387 DOI: 10.1038/s41598-022-24823-0] [Citation(s) in RCA: 8] [Impact Index Per Article: 4.0] [Received: 07/19/2022] [Accepted: 11/21/2022] [Indexed: 11/29/2022] Open
Abstract
Sugarcane accounts for a large portion of the world's sugar production. Modern commercial cultivars are complex hybrids of S. officinarum, S. spontaneum, and several other Saccharum species, resulting in an auto-allopolyploid with 8-12 copies of each chromosome. The current genome assembly gold standard is to generate a long read assembly followed by chromatin conformation capture sequencing to scaffold. We used the PacBio RSII and chromatin conformation capture sequencing to sequence and assemble the genome of a South East Asian commercial sugarcane cultivar, known as Khon Kaen 3. The Khon Kaen 3 genome assembled into 104,477 contigs totalling 7 Gb, which scaffolded into 56 pseudochromosomes containing 5.2 Gb of sequence. Genome annotation produced 242,406 genes from 30,927 orthogroups. Aligning the Khon Kaen 3 genome sequence to S. officinarum and S. spontaneum revealed a high level of apparent recombination, indicating a chimeric assembly. This assembly error is explained by high nucleotide identity between S. officinarum and S. spontaneum, where 91.8% of S. spontaneum aligns to S. officinarum at 94% identity. Thus, the subgenomes of commercial sugarcane are so similar that using short reads to correct long PacBio reads produced chimeric long reads. Future attempts to sequence sugarcane must take this information into account.
Affiliation(s)
- Jeremy R. Shearman
- grid.425537.2, 0000 0001 2191 4408, National Omics Center, National Science and Technology Development Agency, Pathum Thani, Thailand
- Wirulda Pootakham
- grid.425537.2, 0000 0001 2191 4408, National Omics Center, National Science and Technology Development Agency, Pathum Thani, Thailand
- Chutima Sonthirod
- grid.425537.2, 0000 0001 2191 4408, National Omics Center, National Science and Technology Development Agency, Pathum Thani, Thailand
- Chaiwat Naktang
- grid.425537.2, 0000 0001 2191 4408, National Omics Center, National Science and Technology Development Agency, Pathum Thani, Thailand
- Thippawan Yoocha
- grid.425537.2, 0000 0001 2191 4408, National Omics Center, National Science and Technology Development Agency, Pathum Thani, Thailand
- Duangjai Sangsrakru
- grid.425537.2, 0000 0001 2191 4408, National Omics Center, National Science and Technology Development Agency, Pathum Thani, Thailand
- Nukoon Jomchai
- grid.425537.2, 0000 0001 2191 4408, National Omics Center, National Science and Technology Development Agency, Pathum Thani, Thailand
- Sissades Tongsima
- grid.425537.2, 0000 0001 2191 4408, National Biobank of Thailand, National Science and Technology Development Agency, Pathum Thani, Thailand
- Jittima Piriyapongsa
- grid.425537.2, 0000 0001 2191 4408, National Biobank of Thailand, National Science and Technology Development Agency, Pathum Thani, Thailand
- Chumpol Ngamphiw
- grid.425537.2, 0000 0001 2191 4408, National Biobank of Thailand, National Science and Technology Development Agency, Pathum Thani, Thailand
- Nanchaya Wanasen
- grid.425537.2, 0000 0001 2191 4408, National Center for Genetic Engineering and Biotechnology, National Science and Technology Development Agency, Pathum Thani, Thailand
- Kittipat Ukoskit
- grid.412434.4, 0000 0004 1937 1127, Department of Biotechnology, Faculty of Science and Technology, Thammasat University, Rangsit Campus, Klong Luang, Pathum Thani, Thailand
- Prapat Punpee
- grid.425537.2, 0000 0001 2191 4408, National Center for Genetic Engineering and Biotechnology, National Science and Technology Development Agency, Pathum Thani, Thailand; Crop Production, Mitr Phol Innovation and Research Center, Pathum Thani, Thailand
- Peeraya Klomsa-ard
- Crop Production, Mitr Phol Innovation and Research Center, Pathum Thani, Thailand
- Klanarong Sriroth
- Crop Production, Mitr Phol Innovation and Research Center, Pathum Thani, Thailand
- Jisen Zhang
- grid.256111.0, 0000 0004 1760 2876, Center for Genomics and Biotechnology, Fujian Agriculture and Forestry University, Fuzhou, Fujian, China
- Xingtan Zhang
- grid.256111.0, 0000 0004 1760 2876, Center for Genomics and Biotechnology, Fujian Agriculture and Forestry University, Fuzhou, Fujian, China
- Ray Ming
- grid.256111.0, 0000 0004 1760 2876, Center for Genomics and Biotechnology, Fujian Agriculture and Forestry University, Fuzhou, Fujian, China
- Somvong Tragoonrung
- grid.425537.2, 0000 0001 2191 4408, National Center for Genetic Engineering and Biotechnology, National Science and Technology Development Agency, Pathum Thani, Thailand
- Sithichoke Tangphatsornruang
- grid.425537.2, 0000 0001 2191 4408, National Omics Center, National Science and Technology Development Agency, Pathum Thani, Thailand
|
30
|
Blassel L, Medvedev P, Chikhi R. Mapping-friendly sequence reductions: Going beyond homopolymer compression. iScience 2022; 25:105305. [PMID: 36339268 PMCID: PMC9633736 DOI: 10.1016/j.isci.2022.105305] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Received: 07/05/2022] [Revised: 08/17/2022] [Accepted: 10/03/2022] [Indexed: 11/09/2022] Open
Abstract
Sequencing errors continue to pose algorithmic challenges to methods working with sequencing data. One of the simplest and most prevalent techniques for ameliorating the detrimental effects of homopolymer expansion/contraction errors present in long reads is homopolymer compression. It collapses runs of repeated nucleotides, to remove some sequencing errors and improve mapping sensitivity. Though our intuitive understanding justifies why homopolymer compression works, it in no way implies that it is the best transformation that can be done. In this paper, we explore if there are transformations that can be applied in the same pre-processing manner as homopolymer compression that would achieve better alignment sensitivity. We introduce a more general framework than homopolymer compression, called mapping-friendly sequence reductions. We transform the reference and the reads using these reductions and then apply an alignment algorithm. We demonstrate that some mapping-friendly sequence reductions lead to improved mapping accuracy, outperforming homopolymer compression. Mapping-friendly sequence reductions (MSRs) are functions that transform DNA sequences. They are a generalization of the concept of homopolymer compression. We show that some well-chosen MSRs enable more accurate long-read mapping.
Affiliation(s)
- Luc Blassel
- Sequence Bioinformatics, Department of Computational Biology, Institut Pasteur, Paris, France; Sorbonne Université, Collège doctoral, Paris F-75005, France
- Paul Medvedev
- Department of Computer Science and Engineering, Pennsylvania State University, University Park, PA, USA; Department of Biochemistry and Molecular Biology, Pennsylvania State University, University Park, PA, USA; Center for Computational Biology and Bioinformatics, Pennsylvania State University, University Park, PA, USA
- Rayan Chikhi
- Sequence Bioinformatics, Department of Computational Biology, Institut Pasteur, Paris, France
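The homopolymer compression that the Blassel et al. abstract takes as its baseline can be sketched in a few lines of Python. This is only an illustration of the basic idea (collapsing each run of identical nucleotides to one character), not the authors' implementation of mapping-friendly sequence reductions; the function name `homopolymer_compress` and the example sequences are made up for this sketch.

```python
from itertools import groupby


def homopolymer_compress(seq: str) -> str:
    """Collapse every run of identical characters to a single character."""
    # groupby yields one (base, run) pair per maximal run of equal bases.
    return "".join(base for base, _run in groupby(seq))


# Hypothetical example: the read differs from the reference only by
# homopolymer length errors (one T missing, one extra A).
reference = "ACGTTTTGCAAAC"
read_with_indels = "ACGTTTGCAAAAC"

compressed_reference = homopolymer_compress(reference)
compressed_read = homopolymer_compress(read_with_indels)

print(compressed_reference)
print(compressed_read)
print(compressed_reference == compressed_read)
```

Because both sequences reduce to the same string, a mapper working on the compressed forms would not see the homopolymer length errors at all, which is the intuition the abstract describes. The cost is that genuine homopolymer length differences are hidden as well.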
|
31
|
Parameterized syncmer schemes improve long-read mapping. PLoS Comput Biol 2022; 18:e1010638. [PMID: 36306319 PMCID: PMC9645665 DOI: 10.1371/journal.pcbi.1010638] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/06/2022] [Revised: 11/09/2022] [Accepted: 10/04/2022] [Indexed: 11/06/2022] Open
Abstract
Motivation Sequencing long reads presents novel challenges to mapping. One such challenge is low sequence similarity between the reads and the reference, due to high sequencing error and mutation rates. This occurs, e.g., in a cancer tumor, or due to differences between strains of viruses or bacteria. A key idea in mapping algorithms is to sketch sequences with their minimizers. Recently, syncmers were introduced as an alternative sketching method that is more robust to mutations and sequencing errors. Results We introduce parameterized syncmer schemes (PSS), a generalization of syncmers, and provide a theoretical analysis for multi-parameter schemes. By combining PSS with downsampling or minimizers we can achieve any desired compression and window guarantee. We implemented the use of PSS in the popular minimap2 and Winnowmap2 mappers. In tests on simulated and real long-read data from a variety of genomes, the PSS-based algorithms, with scheme parameters selected on the basis of our theoretical analysis, reduced unmapped reads by 20-60% at high compression while usually using less memory. The advantage was more pronounced at low sequence identity. At sequence identity of 75% and medium compression, PSS-minimap had only 37% as many unmapped reads, and 8% fewer of the reads that did map were incorrectly mapped. Even at lower compression and error rates, PSS-based mapping mapped more reads than the original minimizer-based mappers as well as mappers using the original syncmer schemes. We conclude that using PSS can improve mapping of long reads in a wide range of settings. Popular long-read mappers use minimizers, the minimal hashed k-mers from overlapping windows, as alignment seeds. Recent work showed that syncmers, which select a fixed set of k-mers as seeds, are more likely to be conserved under errors or mutations than minimizers, making them potentially useful for mapping error-prone long reads. 
We introduce a framework for creating syncmers, that we call parameterized syncmer schemes, which generalize those introduced so far, and provide a theoretical analysis of their properties. We implemented parameterized syncmer schemes in the minimap2 and Winnowmap2 long-read mappers. Using parameters selected on the basis of our theoretical analysis we demonstrate improved mapping performance, with fewer unmapped and incorrectly mapped reads on a variety of simulated and real datasets. The improvements are consistent across a broad range of compression rates and sequence identities, with the most significant improvements for lower sequence identity (high error or mutation rates) and high compression.
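The difference between the two seed-selection ideas in this abstract can be illustrated with a small Python sketch. It makes several assumptions that are not in the abstract: k-mers and s-mers are ordered lexicographically instead of by a hash, ties go to the leftmost position, and a syncmer is taken to be a k-mer whose smallest s-mer starts at one of a chosen set of offsets (offsets `{0, k - s}` give the commonly described "closed" syncmers). The function names and the example sequence are hypothetical, and this is not the code added to minimap2 or Winnowmap2.

```python
def minimizers(seq: str, k: int, w: int) -> list[int]:
    """Start positions of the smallest k-mer in each window of w consecutive k-mers."""
    kmers = [seq[i:i + k] for i in range(len(seq) - k + 1)]
    selected = set()
    for start in range(len(kmers) - w + 1):
        window = range(start, start + w)
        # Lexicographic order stands in for a hash; ties go to the leftmost k-mer.
        selected.add(min(window, key=lambda i: (kmers[i], i)))
    return sorted(selected)


def syncmers(seq: str, k: int, s: int, positions: set[int]) -> list[int]:
    """Start positions of k-mers whose smallest s-mer starts at an offset in `positions`."""
    selected = []
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        smers = [kmer[j:j + s] for j in range(k - s + 1)]
        offset = min(range(len(smers)), key=lambda j: (smers[j], j))
        if offset in positions:
            selected.append(i)
    return selected


# Hypothetical toy input and parameters.
seq = "ACGTTGCATGACCGTA"
k, s, w = 5, 2, 4
closed = {0, k - s}  # smallest s-mer at the first or last offset of the k-mer

print("minimizer positions:", minimizers(seq, k, w))
print("syncmer positions:  ", syncmers(seq, k, s, closed))
```

In this sketch a minimizer is chosen relative to its neighbours in the window, so an error in a nearby k-mer can change which seed is picked. The syncmer test looks only at the k-mer itself, so an unchanged k-mer is selected or rejected regardless of its context, which is the conservation property the abstract refers to.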
|
32
|
Srinivas M, O’Sullivan O, Cotter PD, van Sinderen D, Kenny JG. The Application of Metagenomics to Study Microbial Communities and Develop Desirable Traits in Fermented Foods. Foods 2022; 11:3297. [PMCID: PMC9601669 DOI: 10.3390/foods11203297] [Citation(s) in RCA: 7] [Impact Index Per Article: 3.5] [Indexed: 11/18/2022] Open
Abstract
The microbial communities present within fermented foods are diverse and dynamic, producing a variety of metabolites responsible for the fermentation processes, imparting characteristic organoleptic qualities and health-promoting traits, and maintaining microbiological safety of fermented foods. In this context, it is crucial to study these microbial communities to characterise fermented foods and the production processes involved. High Throughput Sequencing (HTS)-based methods such as metagenomics enable microbial community studies through amplicon and shotgun sequencing approaches. As the field constantly develops, sequencing technologies are becoming more accessible, affordable and accurate with a further shift from short read to long read sequencing being observed. Metagenomics is enjoying wide-spread application in fermented food studies and in recent years is also being employed in concert with synthetic biology techniques to help tackle problems with the large amounts of waste generated in the food sector. This review presents an introduction to current sequencing technologies and the benefits of their application in fermented foods.
Affiliation(s)
- Meghana Srinivas
- Food Biosciences Department, Teagasc Food Research Centre, Moorepark, P61 C996 Cork, Ireland
- APC Microbiome Ireland, University College Cork, T12 CY82 Cork, Ireland
- School of Microbiology, University College Cork, T12 CY82 Cork, Ireland
- Orla O’Sullivan
- Food Biosciences Department, Teagasc Food Research Centre, Moorepark, P61 C996 Cork, Ireland
- APC Microbiome Ireland, University College Cork, T12 CY82 Cork, Ireland
- VistaMilk SFI Research Centre, Fermoy, P61 C996 Cork, Ireland
- Paul D. Cotter
- Food Biosciences Department, Teagasc Food Research Centre, Moorepark, P61 C996 Cork, Ireland
- APC Microbiome Ireland, University College Cork, T12 CY82 Cork, Ireland
- VistaMilk SFI Research Centre, Fermoy, P61 C996 Cork, Ireland
- Douwe van Sinderen
- APC Microbiome Ireland, University College Cork, T12 CY82 Cork, Ireland
- School of Microbiology, University College Cork, T12 CY82 Cork, Ireland
- John G. Kenny
- Food Biosciences Department, Teagasc Food Research Centre, Moorepark, P61 C996 Cork, Ireland
- APC Microbiome Ireland, University College Cork, T12 CY82 Cork, Ireland
- VistaMilk SFI Research Centre, Fermoy, P61 C996 Cork, Ireland
|
33
|
Transcriptomics and RNA-Based Therapeutics as Potential Approaches to Manage SARS-CoV-2 Infection. Int J Mol Sci 2022; 23:ijms231911058. [PMID: 36232363 PMCID: PMC9570475 DOI: 10.3390/ijms231911058] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/01/2022] [Revised: 09/13/2022] [Accepted: 09/14/2022] [Indexed: 11/24/2022] Open
Abstract
SARS-CoV-2 is a coronavirus family member that appeared in China in December 2019 and caused the disease called COVID-19, which was declared a pandemic in 2020 by the World Health Organization. In recent months, great efforts have been made in the field of basic and clinical research to understand the biology and infection processes of SARS-CoV-2. In particular, transcriptome analysis has contributed to generating new knowledge of the viral sequences and intracellular signaling pathways that regulate the infection and pathogenesis of SARS-CoV-2, generating new information about its biology. Furthermore, transcriptomics approaches including spatial transcriptomics, single-cell transcriptomics and direct RNA sequencing have been used for clinical applications in monitoring, detection, diagnosis, and treatment to generate new clinical predictive models for SARS-CoV-2. Consequently, RNA-based therapeutics and their relationship with SARS-CoV-2 have emerged as promising strategies to battle the SARS-CoV-2 pandemic with the assistance of novel approaches such as CRISPR-CAS, ASOs, and siRNA systems. Lastly, we discuss the importance of precision public health in the management of patients infected with SARS-CoV-2 and establish that the fusion of transcriptomics, RNA-based therapeutics, and precision public health will allow a linkage for developing health systems that facilitate the acquisition of relevant clinical strategies for rapid decision making to assist in the management and treatment of the SARS-CoV-2-infected population to combat this global public health problem.
|
34
|
Walker K, Kalra D, Lowdon R, Chen G, Molik D, Soto DC, Dabbaghie F, Khleifat AA, Mahmoud M, Paulin LF, Raza MS, Pfeifer SP, Agustinho DP, Aliyev E, Avdeyev P, Barrozo ER, Behera S, Billingsley K, Chong LC, Choubey D, De Coster W, Fu Y, Gener AR, Hefferon T, Henke DM, Höps W, Illarionova A, Jochum MD, Jose M, Kesharwani RK, Kolora SRR, Kubica J, Lakra P, Lattimer D, Liew CS, Lo BW, Lo C, Lötter A, Majidian S, Mendem SK, Mondal R, Ohmiya H, Parvin N, Peralta C, Poon CL, Prabhakaran R, Saitou M, Sammi A, Sanio P, Sapoval N, Syed N, Treangen T, Wang G, Xu T, Yang J, Zhang S, Zhou W, Sedlazeck FJ, Busby B. The third international hackathon for applying insights into large-scale genomic composition to use cases in a wide range of organisms. F1000Res 2022; 11:530. [PMID: 36262335 PMCID: PMC9557141 DOI: 10.12688/f1000research.110194.1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Accepted: 05/04/2022] [Indexed: 01/25/2023] Open
Abstract
In October 2021, 59 scientists from 14 countries and 13 U.S. states collaborated virtually in the Third Annual Baylor College of Medicine & DNANexus Structural Variation hackathon. The goal of the hackathon was to advance research on structural variants (SVs) by prototyping and iterating on open-source software. This led to nine hackathon projects focused on diverse genomics research interests, including various SV discovery and genotyping methods, SV sequence reconstruction, and clinically relevant structural variation, including SARS-CoV-2 variants. Repositories for the projects that participated in the hackathon are available at https://github.com/collaborativebioinformatics.
Affiliation(s)
- Kimberly Walker
  - Human Genome Sequencing Center, Baylor College of Medicine, Houston, TX, 77030, USA
- Divya Kalra
  - Human Genome Sequencing Center, Baylor College of Medicine, Houston, TX, 77030, USA
- Guangyi Chen
  - Drug Bioinformatics, Helmholtz Institute for Pharmaceutical Research Saarland (HIPS), Saarbrücken, Germany
  - Center for Bioinformatics, Saarland University, Saarbrücken, Germany
- David Molik
  - Tropical Crop and Commodity Protection Research Unit, Pacific Basin Agricultural Research Center, Hilo, HI, 96720, USA
- Daniela C. Soto
  - Biochemistry & Molecular Medicine, Genome Center, MIND Institute, University of California, Davis, Davis, CA, 95616, USA
- Fawaz Dabbaghie
  - Drug Bioinformatics, Helmholtz Institute for Pharmaceutical Research Saarland (HIPS), Saarbrücken, Germany
  - Institute for Medical Biometry and Bioinformatics, University hospital Düsseldorf, Düsseldorf, Germany
- Ahmad Al Khleifat
  - Institute of Psychiatry, Psychology & Neuroscience, King's College London, London, UK
- Medhat Mahmoud
  - Human Genome Sequencing Center, Baylor College of Medicine, Houston, TX, 77030, USA
- Luis F Paulin
  - Human Genome Sequencing Center, Baylor College of Medicine, Houston, TX, 77030, USA
- Muhammad Sohail Raza
  - CAS Key Laboratory of Genomic and Precision Medicine, Beijing Institute of Genomics, Beijing, China
- Susanne P. Pfeifer
  - Center for Evolution and Medicine, Arizona State University, Tempe, AZ, USA
- Daniel Paiva Agustinho
  - Department of Molecular Microbiology, Washington University in St. Louis School of Medicine, St. Louis, MO, 63110, USA
- Elbay Aliyev
  - Research Department, Sidra Medicine, Doha, Qatar
- Pavel Avdeyev
  - Computational Biology Institute, The George Washington University, Washington, DC, 20052, USA
- Enrico R. Barrozo
  - Department of Obstetrics & Gynecology, Baylor College of Medicine, Houston, TX, 77030, USA
- Sairam Behera
  - Human Genome Sequencing Center, Baylor College of Medicine, Houston, TX, 77030, USA
- Kimberley Billingsley
  - Molecular Genetics Section, Laboratory of Neurogenetics, National Institute on Aging, National Institutes of Health, Bethesda, MD, USA
- Li Chuin Chong
  - Beykoz Institute of Life Sciences and Biotechnology, Bezmialem Vakif University, Beykoz, Istanbul, Turkey
- Deepak Choubey
  - Department of Technology, Savitribai Phule Pune University, Pune, Maharashtra, India
- Wouter De Coster
  - Applied and Translational Neurogenomics Group, VIB Center for Molecular Neurology, Antwerp, Belgium
  - Applied and Translational Neurogenomics Group, Department of Biomedical Sciences, University of Antwerp, Antwerp, Belgium
- Yilei Fu
  - Department of Computer Science, Rice University, Houston, TX, USA
- Alejandro R. Gener
  - Association of Public Health Labs, Centers for Disease Control and Prevention, Downey, CA, USA
- Timothy Hefferon
  - National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD, 20892, USA
- David Morgan Henke
  - Department Molecular Virology and Microbiology, Baylor College of Medicine, Houston, TX, 77030, USA
- Wolfram Höps
  - EMBL Heidelberg, Genome Biology Unit, Heidelberg, Germany
- Michael D. Jochum
  - Department of Obstetrics & Gynecology, Baylor College of Medicine, Houston, TX, 77030, USA
- Maria Jose
  - Centre for Bioinformatics, Pondicherry University, Pondicherry, India
- Rupesh K. Kesharwani
  - Human Genome Sequencing Center, Baylor College of Medicine, Houston, TX, 77030, USA
- Priya Lakra
  - Department of Zoology, University of Delhi, Delhi, India
- Damaris Lattimer
  - University of Applied Sciences Upper Austria - FH Hagenberg, Mühlkreis, Austria
- Chia-Sin Liew
  - Center for Biotechnology, University of Nebraska-Lincoln, Lincoln, Nebraska, 68588, USA
- Bai-Wei Lo
  - Department of Biology, University of Konstanz, Konstanz, Germany
- Chunhsuan Lo
  - Human Genetics Laboratory, National Institute of Genetics, Japan, Mishima City, Japan
- Anneri Lötter
  - Department of Biochemistry, University of Pretoria, Pretoria, South Africa
- Sina Majidian
  - Department of Computational Biology, University of Lausanne, Lausanne, Switzerland
- Rajarshi Mondal
  - Department of Biotechnology, The University of Burdwan, West Bengal, India
- Hiroko Ohmiya
  - Genetic Reagent Development Unit, Medical & Biological Laboratories Co., Ltd., Tokyo, Japan
- Nasrin Parvin
  - Department of Biotechnology, The University of Burdwan, West Bengal, India
- Marie Saitou
  - Center of Integrative Genetics (CIGENE), Faculty of Biosciences, Norwegian University of Life Sciences, Ås, Norway
- Aditi Sammi
  - School of Biochemical Engineering, Indian Institute of Technology (BHU), Varanasi, Uttar Pradesh, India
- Philippe Sanio
  - University of Applied Sciences Upper Austria - FH Hagenberg, Hagenberg im Mühlkreis, Austria
- Nicolae Sapoval
  - Department of Computer Science, Rice University, Houston, TX, USA
- Najeeb Syed
  - Research Department, Sidra Medicine, Doha, Qatar
- Todd Treangen
  - Department of Computer Science, Rice University, Houston, TX, USA
- Tiancheng Xu
  - Department of Computer Science, Rice University, Houston, TX, USA
- Jianzhi Yang
  - Department of Quantitative and Computational Biology, University of Southern California, Los Angeles, CA, USA
- Shangzhe Zhang
  - School of Biology, University of St Andrews, St Andrews, UK
- Weiyu Zhou
  - Department of Statistical Science, George Mason University, Fairfax, Virginia, USA
- Fritz J Sedlazeck
  - Human Genome Sequencing Center, Baylor College of Medicine, Houston, TX, 77030, USA
|
35
|
Jiao X, Imamichi H, Sherman BT, Nahar R, Dewar RL, Lane HC, Imamichi T, Chang W. QuasiSeq: profiling viral quasispecies via self-tuning spectral clustering with PacBio long sequencing reads. Bioinformatics 2022; 38:3192-3199. [PMID: 35532087 PMCID: PMC9890302 DOI: 10.1093/bioinformatics/btac313] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 11/14/2021] [Revised: 04/27/2022] [Accepted: 05/04/2022] [Indexed: 02/04/2023] Open
Abstract
MOTIVATION The existence of quasispecies in the viral population causes difficulties for disease prevention and treatment. High-throughput sequencing provides opportunity to determine rare quasispecies and long sequencing reads covering full genomes reduce quasispecies determination to a clustering problem. The challenge is high similarity of quasispecies and high error rate of long sequencing reads. RESULTS We developed QuasiSeq using a novel signature-based self-tuning clustering method, SigClust, to profile viral mixtures with high accuracy and sensitivity. QuasiSeq can correctly identify quasispecies even using low-quality sequencing reads (accuracy <80%) and produce quasispecies sequences with high accuracy (≥99.55%). Using high-quality circular consensus sequencing reads, QuasiSeq can produce quasispecies sequences with 100% accuracy. QuasiSeq has higher sensitivity and specificity than similar published software. Moreover, the requirement of the computational resource can be controlled by the size of the signature, which makes it possible to handle big sequencing data for rare quasispecies discovery. Furthermore, parallel computation is implemented to process the clusters and further reduce the runtime. Finally, we developed a web interface for the QuasiSeq workflow with simple parameter settings based on the quality of sequencing data, making it easy to use for users without advanced data science skills. AVAILABILITY AND IMPLEMENTATION QuasiSeq is open source and freely available at https://github.com/LHRI-Bioinformatics/QuasiSeq. The current release (v1.0.0) is archived and available at https://zenodo.org/badge/latestdoi/340494542. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Robin L Dewar
  - Virus Isolation and Serology Laboratory, Frederick National Laboratory for Cancer Research, Frederick, MD 21702, USA
- H Clifford Lane
  - Laboratory of Immunoregulation, National Institute of Allergy and Infectious Diseases, Bethesda, MD 20892, USA
- Tomozumi Imamichi
  - Laboratory of Human Retrovirology and Immunoinformatics, Frederick National Laboratory for Cancer Research, Frederick, MD 21702, USA
|
36
|
Bohmann K, Elbrecht V, Carøe C, Bista I, Leese F, Bunce M, Yu DW, Seymour M, Dumbrell AJ, Creer S. Strategies for sample labelling and library preparation in DNA metabarcoding studies. Mol Ecol Resour 2022; 22:1231-1246. [PMID: 34551203 PMCID: PMC9293284 DOI: 10.1111/1755-0998.13512] [Citation(s) in RCA: 25] [Impact Index Per Article: 12.5] [Received: 04/28/2021] [Revised: 09/07/2021] [Accepted: 09/14/2021] [Indexed: 11/26/2022]
Abstract
Metabarcoding of DNA extracted from environmental or bulk specimen samples is increasingly used to profile biota in basic and applied biodiversity research because of its targeted nature that allows sequencing of genetic markers from many samples in parallel. To achieve this, PCR amplification is carried out with primers designed to target a taxonomically informative marker within a taxonomic group, and sample-specific nucleotide identifiers are added to the amplicons prior to sequencing. The latter enables assignment of the sequences back to the samples they originated from. Nucleotide identifiers can be added during the metabarcoding PCR and during "library preparation", that is, when amplicons are prepared for sequencing. Different strategies to achieve this labelling exist. All have advantages, challenges and limitations, some of which can lead to misleading results, and in the worst case compromise the fidelity of the metabarcoding data. Given the range of questions addressed using metabarcoding, ensuring that data generation is robust and fit for the chosen purpose is critically important for practitioners seeking to employ metabarcoding for biodiversity assessments. Here, we present an overview of the three main workflows for sample-specific labelling and library preparation in metabarcoding studies on Illumina sequencing platforms; one-step PCR, two-step PCR, and tagged PCR. Further, we distill the key considerations for researchers seeking to select an appropriate metabarcoding strategy for their specific study. Ultimately, by gaining insights into the consequences of different metabarcoding workflows, we hope to further consolidate the power of metabarcoding as a tool to assess biodiversity across a range of applications.
Affiliation(s)
- Kristine Bohmann
  - Faculty of Health and Medical Sciences, Section for Evolutionary Genomics, Globe Institute, University of Copenhagen, Copenhagen, Denmark
- Vasco Elbrecht
  - Department of Environmental Systems Science, ETH Zurich, Zürich, Switzerland
- Christian Carøe
  - Faculty of Health and Medical Sciences, Section for Evolutionary Genomics, Globe Institute, University of Copenhagen, Copenhagen, Denmark
- Iliana Bista
  - Department of Genetics, University of Cambridge, Cambridge, UK
  - Tree of Life, Wellcome Sanger Institute, Hinxton, UK
- Florian Leese
  - Aquatic Ecosystem Research, Faculty of Biology, University of Duisburg‐Essen, Essen, Germany
- Michael Bunce
  - Trace and Environmental DNA (TrEnD) Laboratory, School of Molecular and Life Sciences, Curtin University, Perth, WA, Australia
- Douglas W. Yu
  - State Key Laboratory of Genetic Resources and Evolution, Kunming Institute of Zoology, Chinese Academy of Sciences, Kunming, China
  - School of Biological Sciences, Norwich Research Park, University of East Anglia, Norwich, UK
  - Center for Excellence in Animal Evolution and Genetics, Chinese Academy of Sciences, Kunming, Yunnan, China
- Mathew Seymour
  - Department of Ecology, Swedish University of Agricultural Sciences, Uppsala, Sweden
- Simon Creer
  - Molecular Ecology and Evolution Group, School of Natural Sciences, Bangor University, Gwynedd, UK
|
37
|
Choi S, Kim KW, Ku KB, Kim SJ, Park C, Park D, Kim S, Yi H. Human Alphacoronavirus Universal Primers for Genome Amplification and Sequencing. Front Microbiol 2022; 13:789665. [PMID: 35401489 PMCID: PMC8990890 DOI: 10.3389/fmicb.2022.789665] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/05/2021] [Accepted: 03/04/2022] [Indexed: 11/13/2022] Open
Abstract
Rapid and accurate sequencing covering the entire genome is essential to identify genetic variations of viral pathogens. However, due to the low viral titers in clinical samples, certain amplification steps are required for viral genome sequencing. At present, there are no universal primers available for alphacoronaviruses and, since these viruses have diverse strains, new primers specific to the target strain must be continuously developed for sequencing. Thus, in this study, we aimed to develop a universal primer set valid for all human alphacoronaviruses and applicable to samples containing trace amounts of the virus. To this end, we designed overlapping primer pairs capable of amplifying the entire genome of all known human alphacoronaviruses. The selected primers, named the AC primer set, were composed of 10 primer pairs stretching over the entire genome of alphacoronaviruses, and produced PCR products of the expected size (3-5 kb) from both the HCoV-229E and HCoV-NL63 strains. After genome amplification, an evaluation using various sequencing platforms was carried out. The amplicon library sequencing data were assembled into complete genome sequences in all sequencing strategies examined in this study. The sequencing accuracy varied depending on the sequencing technology, but all sequencing methods showed a sequencing error of less than 0.01%. In the mock clinical specimen, the detection limit was 10⁻³ PFU/ml (10² copies/ml). The AC primer set and experimental procedure optimized in this study may enable the fast diagnosis of mutant alphacoronaviruses in future epidemics.
Affiliation(s)
- Sungmi Choi
  - Interdisciplinary Program in Precision Public Health, Korea University, Seoul, South Korea
- Kwan Woo Kim
  - Interdisciplinary Program in Precision Public Health, Korea University, Seoul, South Korea
- Keun Bon Ku
  - Center for Convergent Research of Emerging Virus Infection, Korea Research Institute of Chemical Technology, Daejeon, South Korea
- Seong-Jun Kim
  - Center for Convergent Research of Emerging Virus Infection, Korea Research Institute of Chemical Technology, Daejeon, South Korea
- Changwoo Park
  - Center for Convergent Research of Emerging Virus Infection, Korea Research Institute of Chemical Technology, Daejeon, South Korea
  - Microbiological Analysis Team, Group for Biometrology, Korea Research Institute of Standards and Science (KRISS), Daejeon, South Korea
  - Department of Agricultural Biotechnology, Seoul National University, Seoul, South Korea
- Dongju Park
  - Center for Convergent Research of Emerging Virus Infection, Korea Research Institute of Chemical Technology, Daejeon, South Korea
  - Microbiological Analysis Team, Group for Biometrology, Korea Research Institute of Standards and Science (KRISS), Daejeon, South Korea
  - Department of Biological Science, Chungnam National University, Daejeon, South Korea
- Seil Kim
  - Center for Convergent Research of Emerging Virus Infection, Korea Research Institute of Chemical Technology, Daejeon, South Korea
  - Microbiological Analysis Team, Group for Biometrology, Korea Research Institute of Standards and Science (KRISS), Daejeon, South Korea
  - Department of Bio-Analysis Science, University of Science and Technology, Daejeon, South Korea
- Hana Yi
  - Interdisciplinary Program in Precision Public Health, Korea University, Seoul, South Korea
  - School of Biosystems and Biomedical Sciences, Korea University, Seoul, South Korea
|
38
|
Altermann E, Tegetmeyer HE, Chanyi RM. The evolution of bacterial genome assemblies - where do we need to go next? MICROBIOME RESEARCH REPORTS 2022; 1:15. [PMID: 38046358 PMCID: PMC10688829 DOI: 10.20517/mrr.2022.02] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 02/05/2022] [Revised: 03/08/2022] [Accepted: 03/24/2022] [Indexed: 12/05/2023]
Abstract
Genome sequencing has fundamentally changed our ability to decipher and understand the genetic blueprint of life and how it changes over time in response to environmental and evolutionary pressures. The pace of sequencing is still increasing in response to advances in technologies, paving the way from sequenced genes to genomes to metagenomes to metagenome-assembled genomes (MAGs). Our ability to interrogate increasingly complex microbial communities through metagenomes and MAGs is opening up a tantalizing future where we may be able to delve deeper into the mechanisms and genetic responses emerging over time. In the near future, we will be able to detect MAG assembly variations within strains originating from diverging sub-populations, and one of the emerging challenges will be to capture these variations in a biologically relevant way. Here, we present a brief overview of sequencing technologies and the current state of metagenome assemblies to suggest the need to develop new data formats that can capture the genetic variations within strains and communities, which previously remained invisible due to sequencing technology limitations.
Affiliation(s)
- Eric Altermann
  - AgResearch Ltd., Private Bag 11008, Palmerston North 4410, New Zealand
  - Riddet Institute, Massey University, Private Bag 11222, Palmerston North 4442, New Zealand
  - Massey University, School of Veterinary Science, Palmerston North 4100, New Zealand
- Halina E. Tegetmeyer
  - AgResearch Ltd., Private Bag 11008, Palmerston North 4410, New Zealand
  - Center for Biotechnology, Bielefeld University, Universitaetsstrasse 27, Bielefeld 33615, Germany
- Ryan M. Chanyi
  - AgResearch Ltd., Private Bag 11008, Palmerston North 4410, New Zealand
  - Riddet Institute, Massey University, Private Bag 11222, Palmerston North 4442, New Zealand
|
39
|
Chaurasiya S, Yang A, Zhang Z, Lu J, Valencia H, Kim SI, Woo Y, Warner SG, Olafsen T, Zhao Y, Wu X, Fein S, Cheng L, Cheng M, Ede N, Fong Y. A comprehensive preclinical study supporting clinical trial of oncolytic chimeric poxvirus CF33-hNIS-anti-PD-L1 to treat breast cancer. Mol Ther Methods Clin Dev 2022; 24:102-116. [PMID: 35024377 PMCID: PMC8718831 DOI: 10.1016/j.omtm.2021.12.002] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 09/16/2021] [Accepted: 12/04/2021] [Indexed: 01/12/2023]
Abstract
CF33-hNIS-anti-PD-L1 is an oncolytic chimeric poxvirus encoding two transgenes: human sodium iodide symporter and a single-chain variable fragment against PD-L1. Comprehensive preclinical pharmacology studies encompassing primary and secondary pharmacodynamics and biodistribution and safety studies were performed to support the clinical development of CF33-hNIS-anti-PD-L1. Most of the studies were performed in triple-negative breast cancer (TNBC) models, as the phase I trial is planned for patients with TNBC. Biological functions of virus-encoded transgenes were confirmed, and the virus demonstrated anti-tumor efficacy against TNBC models in mice. In a good laboratory practice (GLP) toxicology study, the virus did not produce any observable adverse effects in mice, suggesting that the doses proposed for the clinical trial should be well tolerated in patients. Furthermore, no neurotoxic effects in mice were seen following intracranial injection of the virus. Also, the risk for horizontal transmission of CF33-hNIS-anti-PD-L1 was assessed in mice, and our results suggest that the virus is unlikely to transmit from infected patients to healthy individuals. Finally, the in-use stability and compatibility of CF33-hNIS-anti-PD-L1 tested under different conditions mimicking the clinical scenarios confirmed the suitability of the virus in clinical settings. The results of these preclinical studies support the use of CF33-hNIS-anti-PD-L1 in a first-in-human trial in patients with TNBC.
Affiliation(s)
- Shyambabu Chaurasiya
  - Department of Surgery, City of Hope National Medical Center, Familian Science building, Room#1100 1500 E Duarte Road, Duarte, CA 91010, USA
- Annie Yang
  - Department of Surgery, City of Hope National Medical Center, Familian Science building, Room#1100 1500 E Duarte Road, Duarte, CA 91010, USA
- Zhifang Zhang
  - Department of Surgery, City of Hope National Medical Center, Familian Science building, Room#1100 1500 E Duarte Road, Duarte, CA 91010, USA
- Jianming Lu
  - Department of Surgery, City of Hope National Medical Center, Familian Science building, Room#1100 1500 E Duarte Road, Duarte, CA 91010, USA
- Hannah Valencia
  - Department of Surgery, City of Hope National Medical Center, Familian Science building, Room#1100 1500 E Duarte Road, Duarte, CA 91010, USA
- Sang-In Kim
  - Department of Surgery, City of Hope National Medical Center, Familian Science building, Room#1100 1500 E Duarte Road, Duarte, CA 91010, USA
- Yanghee Woo
  - Department of Surgery, City of Hope National Medical Center, Familian Science building, Room#1100 1500 E Duarte Road, Duarte, CA 91010, USA
- Suanne G Warner
  - Department of Surgery, Mayo Clinic, Rochester, MN 55902, USA
- Tove Olafsen
  - Small Animal Imaging Core, Shared Resources, City of Hope National Medical Center, Duarte, CA 91010, USA
- Yuqi Zhao
  - Integrative Genomics Core, City of Hope National Medical Center, Duarte, CA 91010, USA
- Xiwei Wu
  - Integrative Genomics Core, City of Hope National Medical Center, Duarte, CA 91010, USA
- Yuman Fong
  - Department of Surgery, City of Hope National Medical Center, Familian Science building, Room#1100 1500 E Duarte Road, Duarte, CA 91010, USA
|
40
|
Control of subunit stoichiometry in single-chain MspA nanopores. Biophys J 2022; 121:742-754. [PMID: 35101416 PMCID: PMC8943699 DOI: 10.1016/j.bpj.2022.01.022] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 09/19/2021] [Revised: 12/18/2021] [Accepted: 01/25/2022] [Indexed: 11/21/2022] Open
Abstract
Transmembrane protein channels enable fast and highly sensitive detection of single molecules. Nanopore sequencing of DNA was achieved using an engineered Mycobacterium smegmatis porin A (MspA) in combination with a motor enzyme. Due to its favorable channel geometry, the octameric MspA pore exhibits the highest current level compared with other pore proteins. To date, MspA is the only protein nanopore with a published record of DNA sequencing. While widely used in commercial devices, nanopore sequencing of DNA suffers from significant base-calling errors due to stochastic events of the complex DNA-motor-pore combination and the contribution of up to five nucleotides to the signal at each position. Different mutations in specific subunits of a pore protein offer an enormous potential to improve nucleotide resolution and sequencing accuracy. However, individual subunits of MspA and other oligomeric protein pores are randomly assembled in vivo and in vitro, preventing the efficient production of designed pores with different subunit mutations. In this study, we converted octameric MspA into a single-chain pore by connecting eight subunits using peptide linkers. Lipid bilayer experiments demonstrated that single-chain MspA formed membrane-spanning channels and discriminated all four nucleotides identical to MspA produced from monomers in DNA hairpin experiments. Single-chain constructs comprising three, five, six, and seven connected subunits assembled to functional channels, demonstrating a remarkable plasticity of MspA to different subunit stoichiometries. Thus, single-chain MspA constitutes a new milestone in the optimization of MspA as a biosensor for DNA sequencing and many other applications by enabling the production of pores with distinct subunit mutations and pore diameters.
|
41
|
Nakanishi H, Yoneyama K, Hara M, Takada A, Sakai K, Saito K. Estimating individual mtDNA haplotypes in mixed DNA samples by combining MinION and MiSeq. Int J Legal Med 2022; 136:423-432. [PMID: 35001166 DOI: 10.1007/s00414-021-02763-0] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 07/20/2021] [Accepted: 12/03/2021] [Indexed: 12/30/2022]
Abstract
We tried to estimate individual mtDNA haplotypes in mixed DNA samples by combining MinION and MiSeq. The BAM files produced by MiSeq were viewed using Integrative Genomics Viewer (IGV) to verify mixed bases. By sorting the reads according to base type for each mixed base, partial haplotypes were determined. Then, the BAM files produced by MinKNOW were viewed using IGV. To determine haplotypes with IGV, only mixed bases determined by MiSeq were used as target bases. By sorting the reads according to base type for each target base, each contributor's haplotype was estimated. In mixed samples from two contributors, even a haplotype with a minor contribution of 5% could be distinguished from the haplotype of the major contributor. In mixed samples of three contributors (mixture ratios of 1:1:1 and 4:2:1), each haplotype could also be distinguished. Sequences of C-stretches were determined very inaccurately in the MinION analysis. Although the analysis method was simple, each haplotype was correctly detected in all mixed samples with two or three contributors in various mixture ratios by combining MinION and MiSeq. This should be useful for identifying contributors to mixed samples.
Affiliation(s)
- Hiroaki Nakanishi
  - Department of Forensic Medicine, Juntendo University School of Medicine, 2-1-1, Hongo, Bunkyo-Ku, Tokyo, 113-8421, Japan
- Katsumi Yoneyama
  - Department of Forensic Medicine, Saitama Medical University, 38 Morohongo, Moroyama, Saitama, 350-0495, Japan
- Masaaki Hara
  - Department of Forensic Medicine, Saitama Medical University, 38 Morohongo, Moroyama, Saitama, 350-0495, Japan
- Aya Takada
  - Department of Forensic Medicine, Saitama Medical University, 38 Morohongo, Moroyama, Saitama, 350-0495, Japan
- Kentaro Sakai
  - Department of Forensic Medicine, Juntendo University School of Medicine, 2-1-1, Hongo, Bunkyo-Ku, Tokyo, 113-8421, Japan
  - Tokyo Medical Examiner's Office, Tokyo Metropolitan Government, 4-21-18, Otsuka, Bunkyo-Ku, Tokyo, 112-0012, Japan
- Kazuyuki Saito
  - Department of Forensic Medicine, Juntendo University School of Medicine, 2-1-1, Hongo, Bunkyo-Ku, Tokyo, 113-8421, Japan
|
42
|
Surveillance of Listeria monocytogenes: Early Detection, Population Dynamics, and Quasimetagenomic Sequencing during Selective Enrichment. Appl Environ Microbiol 2021; 87:e0177421. [PMID: 34613762 PMCID: PMC8612253 DOI: 10.1128/aem.01774-21] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Indexed: 12/31/2022] Open
Abstract
In this study, we addressed different aspects regarding the implementation of quasimetagenomic sequencing as a hybrid surveillance method in combination with enrichment for early detection of Listeria monocytogenes in the food industry. Different experimental enrichment cultures were used, comprising seven L. monocytogenes strains of different sequence types (STs), with and without a background microbiota community. To assess whether the proportions of the different STs changed over time during enrichment, the growth and population dynamics were assessed using dapE colony sequencing and dapE and 16S rRNA amplicon sequencing. There was a tendency of some STs to have a higher relative abundance during the late stage of enrichment when L. monocytogenes was enriched without background microbiota. When coenriched with background microbiota, the population dynamics of the different STs was more consistent over time. To evaluate the earliest possible time point during enrichment that allows the detection of L. monocytogenes and at the same time the generation of genetic information that enables an estimation regarding the strain diversity in a sample, quasimetagenomic sequencing was performed early during enrichment in the presence of the background microbiota using Oxford Nanopore Technologies Flongle and Illumina MiSeq sequencing. The application of multiple displacement amplification (MDA) enabled detection of L. monocytogenes (and the background microbiota) after only 4 h of enrichment using both applied sequencing approaches. The MiSeq sequencing data additionally enabled the prediction of cooccurring L. monocytogenes strains in the samples. IMPORTANCE We showed that a combination of a short primary enrichment combined with MDA and Nanopore sequencing can accelerate the traditional process of cultivation and identification of L. monocytogenes. The use of Illumina MiSeq sequencing additionally allowed us to predict the presence of cooccurring L. monocytogenes strains. 
Our results suggest quasimetagenomic sequencing is a valuable and promising hybrid surveillance tool for the food industry that enables faster identification of L. monocytogenes during early enrichment. Routine application of this approach could lead to more efficient and proactive actions in the food industry that prevent contamination and subsequent product recalls and food destruction, economic and reputational losses, and human listeriosis cases.
|
43
|
Shaw J, Yu YW. Theory of local k-mer selection with applications to long-read alignment. Bioinformatics 2021; 38:4659-4669. [PMID: 36124869 PMCID: PMC9563685 DOI: 10.1093/bioinformatics/btab790] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 05/25/2021] [Revised: 11/09/2021] [Accepted: 11/16/2021] [Indexed: 01/23/2023] Open
Abstract
Motivation Selecting a subset of k-mers in a string in a local manner is a common task in bioinformatics tools for speeding up computation. Arguably the most well-known and common method is the minimizer technique, which selects the ‘lowest-ordered’ k-mer in a sliding window. Recently, it has been shown that minimizers may be a sub-optimal method for selecting subsets of k-mers when mutations are present. There is, however, a lack of understanding behind the theory of why certain methods perform well. Results We first theoretically investigate the conservation metric for k-mer selection methods. We derive an exact expression for calculating the conservation of a k-mer selection method. This turns out to be tractable enough for us to prove closed-form expressions for a variety of methods, including (open and closed) syncmers, (a, b, n)-words, and an upper bound for minimizers. As a demonstration of our results, we modified the minimap2 read aligner to use a more conserved k-mer selection method and demonstrate that there is up to an 8.2% relative increase in number of mapped reads. However, we found that the k-mers selected by more conserved methods are also more repetitive, leading to a runtime increase during alignment. We give new insight into how one might use new k-mer selection methods as a reparameterization to optimize for speed and alignment quality. Availability and implementation Simulations and supplementary methods are available at https://github.com/bluenote-1577/local-kmer-selection-results. os-minimap2 is a modified version of minimap2 and available at https://github.com/bluenote-1577/os-minimap2. Supplementary information Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Jim Shaw
  - Department of Mathematics, University of Toronto, Toronto, ON M5S 2E4, Canada
- Yun William Yu
  - Department of Mathematics, University of Toronto, Toronto, ON M5S 2E4, Canada
  - Department of Computer and Mathematical Sciences, University of Toronto at Scarborough, Scarborough, ON M1C 1A4, Canada
|
44
|
Galata V, Busi SB, Kunath BJ, de Nies L, Calusinska M, Halder R, May P, Wilmes P, Laczny CC. Functional meta-omics provide critical insights into long- and short-read assemblies. Brief Bioinform 2021; 22:bbab330. [PMID: 34453168 PMCID: PMC8575027 DOI: 10.1093/bib/bbab330] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/14/2021] [Revised: 07/13/2021] [Accepted: 07/26/2021] [Indexed: 11/12/2022] Open
Abstract
Real-world evaluations of metagenomic reconstructions are challenged by distinguishing reconstruction artifacts from genes and proteins present in situ. Here, we evaluate short-read-only, long-read-only and hybrid assembly approaches on four different metagenomic samples of varying complexity. We demonstrate how different assembly approaches affect gene and protein inference, which is particularly relevant for downstream functional analyses. For a human gut microbiome sample, we use complementary metatranscriptomic and metaproteomic data to assess the metagenomic data-based protein predictions. Our findings pave the way for critical assessments of metagenomic reconstructions. We propose a reference-independent solution, which exploits the synergistic effects of multi-omic data integration for the in situ study of microbiomes using long-read sequencing data.
Affiliation(s)
- Valentina Galata
  - Luxembourg Centre for Systems Biomedicine, 7, avenue des Hauts-Fourneaux, Esch-sur-Alzette L-4362, Luxembourg
- Susheel Bhanu Busi
  - Luxembourg Centre for Systems Biomedicine, 7, avenue des Hauts-Fourneaux, Esch-sur-Alzette L-4362, Luxembourg
- Benoît Josef Kunath
  - Luxembourg Centre for Systems Biomedicine, 7, avenue des Hauts-Fourneaux, Esch-sur-Alzette L-4362, Luxembourg
- Laura de Nies
  - Luxembourg Centre for Systems Biomedicine, 7, avenue des Hauts-Fourneaux, Esch-sur-Alzette L-4362, Luxembourg
- Magdalena Calusinska
  - BioSystems and Bioprocessing Engineering, Luxembourg Institute of Science and Technology, Rue du Brill 41, Belvaux L-4422, Luxembourg
- Rashi Halder
  - Luxembourg Centre for Systems Biomedicine, 7, avenue des Hauts-Fourneaux, Esch-sur-Alzette L-4362, Luxembourg
- Patrick May
  - Luxembourg Centre for Systems Biomedicine, 7, avenue des Hauts-Fourneaux, Esch-sur-Alzette L-4362, Luxembourg
- Paul Wilmes
  - Luxembourg Centre for Systems Biomedicine, 7, avenue des Hauts-Fourneaux, Esch-sur-Alzette L-4362, Luxembourg
- Cédric Christian Laczny
  - Luxembourg Centre for Systems Biomedicine, 7, avenue des Hauts-Fourneaux, Esch-sur-Alzette L-4362, Luxembourg
|
45
|
Huang N, Nie F, Ni P, Gao X, Luo F, Wang J. BlockPolish: accurate polishing of long-read assembly via block divide-and-conquer. Brief Bioinform 2021; 23:6383560. [PMID: 34619757 DOI: 10.1093/bib/bbab405] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/19/2021] [Revised: 08/13/2021] [Accepted: 09/03/2021] [Indexed: 11/13/2022] Open
Abstract
Long-read sequencing technology enables significant progress in de novo genome assembly. However, the high error rate and the wide error distribution of raw reads result in a large number of errors in the assembly. Polishing is a procedure to fix errors in the draft assembly and improve the reliability of genomic analysis. However, existing methods treat all the regions of the assembly equally, although there are fundamental differences between the error distributions of these regions. How to achieve very high accuracy in genome assembly is still a challenging problem. Motivated by the uneven errors in different regions of the assembly, we propose a novel polishing workflow named BlockPolish. In this method, we divide contigs into blocks with low complexity and high complexity according to statistics of aligned nucleotide bases. Multiple sequence alignment is applied to realign raw reads in complex blocks and optimize the alignment result. Due to the different distributions of error rates in trivial and complex blocks, two multitask bidirectional long short-term memory (LSTM) networks are proposed to predict the consensus sequences. In the whole-genome assemblies of NA12878 assembled by Wtdbg2 and Flye using Nanopore data, BlockPolish has a higher polishing accuracy than other state-of-the-art tools, including Racon, Medaka and MarginPolish & HELEN. In all assemblies, errors are predominantly indels, and BlockPolish performs well in correcting them. In addition to the Nanopore assemblies, we further demonstrate that BlockPolish can also reduce the errors in the PacBio assemblies. The source code of BlockPolish is freely available on GitHub (https://github.com/huangnengCSU/BlockPolish).
Affiliation(s)
- Neng Huang
  - School of Computer Science and Engineering, Central South University, China
- Fan Nie
  - School of Computer Science and Engineering, Central South University, China
- Peng Ni
  - School of Computer Science and Engineering, Central South University, China
- Xin Gao
  - School of Computer Science, King Abdullah University of Science and Technology, Saudi Arabia
- Feng Luo
  - School of Computing, Clemson University, USA
- Jianxin Wang
  - School of Computer Science and Engineering, Central South University, China
|
46
|
Delahaye C, Nicolas J. Sequencing DNA with nanopores: Troubles and biases. PLoS One 2021; 16:e0257521. [PMID: 34597327 PMCID: PMC8486125 DOI: 10.1371/journal.pone.0257521] [Citation(s) in RCA: 166] [Impact Index Per Article: 55.3] [Received: 02/23/2021] [Accepted: 09/06/2021] [Indexed: 12/03/2022] Open
Abstract
Oxford Nanopore Technologies' (ONT) long read sequencers offer access to longer DNA fragments than previous sequencer generations, at the cost of a higher error rate. While many papers have studied read correction methods, few have addressed the detailed characterization of observed errors, a task complicated by frequent changes in chemistry and software in ONT technology. The MinION sequencer is now more stable and this paper proposes an up-to-date view of its error landscape, using the most mature flowcell and basecaller. We studied Nanopore sequencing error biases on both bacterial and human DNA reads. We found that, although Nanopore sequencing is expected not to suffer from GC bias, it is a crucial parameter with respect to errors. In particular, low-GC reads have fewer errors than high-GC reads (about 6% and 8% respectively). The error profile for homopolymeric regions or regions with short repeats, the source of about half of all sequencing errors, also depends on the GC rate and mainly shows deletions, although there are some reads with long insertions. Another interesting finding is that the quality measure, although over-estimated, offers valuable information to predict the error rate as well as the abundance of reads. We supplemented this study with an analysis of a rapeseed RNA read set and showed a higher level of errors with a higher level of deletion in these data. Finally, we have implemented an open source pipeline for long-term monitoring of the error profile, which enables users to easily compute the various analyses presented in this work, including for future developments of the sequencing device. Overall, we hope this work will provide a basis for the design of better error-correction methods.
|
47
|
Holmqvist I, Bäckerholm A, Tian Y, Xie G, Thorell K, Tang KW. FLAME: long-read bioinformatics tool for comprehensive spliceome characterization. RNA (New York, N.Y.) 2021; 27:1127-1139. [PMID: 34253685 PMCID: PMC8457008 DOI: 10.1261/rna.078800.121] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.7] [Received: 04/13/2021] [Accepted: 06/28/2021] [Indexed: 06/13/2023]
Abstract
Comprehensive characterization of differentially spliced RNA transcripts with nanopore sequencing is limited by bioinformatics tools that are reliant on existing annotations. We have developed FLAME, a bioinformatics pipeline for alternative splicing analysis of gene-specific or transcriptome-wide long-read sequencing data. FLAME is a Python-based tool aimed at providing comprehensible quantification of full-length splice variants, reliable de novo recognition of splice sites and exons, and representation of consecutive exon connectivity in the form of a weighted adjacency matrix. Notably, this workflow circumvents issues related to inadequate reference annotations and allows for incorporation of short-read sequencing data to improve the confidence of nanopore sequencing reads. In this study, the Epstein-Barr virus long noncoding RNA RPMS1 was used to demonstrate the utility of the pipeline. RPMS1 is ubiquitously expressed in Epstein-Barr virus associated cancer and known to undergo ample differential splicing. To fully resolve the RPMS1 spliceome, we combined gene-specific nanopore sequencing reads from a primary gastric adenocarcinoma and a nasopharyngeal carcinoma cell line with matched publicly available short-read sequencing data sets. All previously reported splice variants, including putative ORFs, were detected using FLAME. In addition, 32 novel exons, including two intron retentions and a cassette exon, were discovered within the RPMS1 gene.
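The "weighted adjacency matrix" of consecutive exon connectivity mentioned in this abstract can be illustrated with a short Python sketch. This is not FLAME's implementation: it assumes each read has already been reduced to an ordered list of exon labels, and the function name `exon_adjacency` and the labels `E1` to `E3` are hypothetical.

```python
def exon_adjacency(reads, exons):
    """Count how often each exon is directly followed by another exon.

    reads: list of reads, each an ordered list of exon labels (assumed input).
    exons: ordered list of all exon labels; fixes the row/column order.
    Returns a square matrix where entry [i][j] is the number of reads in
    which exons[i] is immediately followed by exons[j].
    """
    index = {name: i for i, name in enumerate(exons)}
    matrix = [[0] * len(exons) for _ in exons]
    for read in reads:
        # Pair every exon with its immediate downstream neighbour.
        for upstream, downstream in zip(read, read[1:]):
            matrix[index[upstream]][index[downstream]] += 1
    return matrix


exons = ["E1", "E2", "E3"]
reads = [["E1", "E2", "E3"], ["E1", "E3"], ["E1", "E2", "E3"]]
matrix = exon_adjacency(reads, exons)
```

In this toy input, two reads support the E1 to E2 and E2 to E3 junctions, and one read skips E2, so the E1 to E3 entry has weight 1. Reading the matrix row by row gives the relative usage of each splice junction.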
MESH Headings
- Benchmarking
- Cell Line, Tumor
- Computational Biology/methods
- Epstein-Barr Virus Infections/genetics
- Epstein-Barr Virus Infections/metabolism
- Epstein-Barr Virus Infections/pathology
- Epstein-Barr Virus Infections/virology
- Exons
- Herpesvirus 4, Human/genetics
- Herpesvirus 4, Human/pathogenicity
- High-Throughput Nucleotide Sequencing
- Humans
- Introns
- Nanopore Sequencing
- Nasopharyngeal Carcinoma/genetics
- Nasopharyngeal Carcinoma/metabolism
- Nasopharyngeal Carcinoma/pathology
- Nasopharyngeal Carcinoma/virology
- Nasopharyngeal Neoplasms/genetics
- Nasopharyngeal Neoplasms/metabolism
- Nasopharyngeal Neoplasms/pathology
- Nasopharyngeal Neoplasms/virology
- RNA Splicing
- RNA, Long Noncoding/genetics
- RNA, Long Noncoding/metabolism
- RNA, Messenger/genetics
- RNA, Messenger/metabolism
- RNA, Viral/genetics
- RNA, Viral/metabolism
- Sequence Analysis, RNA
- Software
Affiliation(s)
- Isak Holmqvist
  - Department of Infectious Diseases, Institute of Biomedicine, University of Gothenburg, 413 46 Gothenburg, Sweden
- Alan Bäckerholm
  - Department of Infectious Diseases, Institute of Biomedicine, University of Gothenburg, 413 46 Gothenburg, Sweden
- Yarong Tian
  - Department of Infectious Diseases, Institute of Biomedicine, University of Gothenburg, 413 46 Gothenburg, Sweden
- Guojiang Xie
  - Department of Infectious Diseases, Institute of Biomedicine, University of Gothenburg, 413 46 Gothenburg, Sweden
- Kaisa Thorell
  - Department of Infectious Diseases, Institute of Biomedicine, University of Gothenburg, 413 46 Gothenburg, Sweden
- Ka-Wei Tang
  - Department of Infectious Diseases, Institute of Biomedicine, University of Gothenburg, 413 46 Gothenburg, Sweden
  - Wallenberg Centre for Molecular and Translational Medicine, Sahlgrenska Center for Cancer Research, Västra Götaland Region, Department of Clinical Microbiology, Sahlgrenska University Hospital, 413 46 Gothenburg, Sweden
|
48
|
Pistone D, Meroni G, Panelli S, D’Auria E, Acunzo M, Pasala AR, Zuccotti GV, Bandi C, Drago L. A Journey on the Skin Microbiome: Pitfalls and Opportunities. Int J Mol Sci 2021; 22:9846. [PMID: 34576010 PMCID: PMC8469928 DOI: 10.3390/ijms22189846] [Citation(s) in RCA: 13] [Impact Index Per Article: 4.3] [Received: 08/09/2021] [Revised: 09/07/2021] [Accepted: 09/08/2021] [Indexed: 12/22/2022] Open
Abstract
The human skin microbiota is essential for maintaining homeostasis and ensuring barrier functions. Over the years, the characterization of its composition and taxonomic diversity has reached outstanding goals, with more than 10 million bacterial genes collected and cataloged. Nevertheless, the study of the skin microbiota presents specific challenges that need to be addressed in study design. Benchmarking procedures and reproducible and robust analysis workflows for increasing comparability among studies are required. For various reasons and because of specific technical problems, these issues have been investigated in gut microbiota studies, but they have been largely overlooked for skin microbiota. After a short description of the skin microbiota, the review tackles methodological aspects and their pitfalls, covering NGS approaches and high throughput culture-based techniques. Recent insights into the "core" and "transient" types of skin microbiota and how the manipulation of these communities can prevent or combat skin diseases are also covered. Finally, this review includes an overview of the main dermatological diseases, the changes in the microbiota composition associated with them, and the recommended skin sampling procedures. The last section focuses on topical and oral probiotics to improve and maintain skin health, considering their possible applications for skin diseases.
Affiliation(s)
- Dario Pistone
  - Pediatric Clinical Research Center “Invernizzi”, Department of Biomedical and Clinical Sciences “L. Sacco”, University of Milan, 20157 Milan, Italy
  - Department of Biomedical Sciences for Health, University of Milan, 20133 Milan, Italy
- Gabriele Meroni
  - Department of Biomedical Surgical and Dental Sciences-One Health Unit, University of Milan, 20133 Milan, Italy
- Simona Panelli
  - Pediatric Clinical Research Center “Invernizzi”, Department of Biomedical and Clinical Sciences “L. Sacco”, University of Milan, 20157 Milan, Italy
- Enza D’Auria
  - Department of Pediatrics, Children’s Hospital Vittore Buzzi, University of Milan, 20154 Milan, Italy
- Miriam Acunzo
  - Department of Pediatrics, Children’s Hospital Vittore Buzzi, University of Milan, 20154 Milan, Italy
- Ajay Ratan Pasala
  - Pediatric Clinical Research Center “Invernizzi”, Department of Biomedical and Clinical Sciences “L. Sacco”, University of Milan, 20157 Milan, Italy
- Gian Vincenzo Zuccotti
  - Pediatric Clinical Research Center “Invernizzi”, Department of Biomedical and Clinical Sciences “L. Sacco”, University of Milan, 20157 Milan, Italy
  - Department of Pediatrics, Children’s Hospital Vittore Buzzi, University of Milan, 20154 Milan, Italy
- Claudio Bandi
  - Pediatric Clinical Research Center “Invernizzi”, Department of Biosciences, University of Milan, 20133 Milan, Italy
- Lorenzo Drago
  - Department of Biomedical Sciences for Health, University of Milan, 20133 Milan, Italy
|
49
|
Garushyants SK, Rogozin IB, Koonin EV. Insertions in SARS-CoV-2 genome caused by template switch and duplications give rise to new variants that merit monitoring. bioRxiv: The Preprint Server for Biology 2021:2021.04.23.441209. [PMID: 33907754 PMCID: PMC8077628 DOI: 10.1101/2021.04.23.441209] [Citation(s) in RCA: 6] [Impact Index Per Article: 2.0] [Indexed: 05/25/2023]
Abstract
The appearance of multiple new SARS-CoV-2 variants during the winter of 2020-2021 is a matter of grave concern. Some of these new variants, such as B.1.617.2, B.1.1.7, and B.1.351, manifest higher infectivity and virulence than the earlier SARS-CoV-2 variants, with potential dramatic effects on the course of the COVID-19 pandemic. So far, analysis of new SARS-CoV-2 variants has focused primarily on point nucleotide substitutions and short deletions that are readily identifiable by comparison to consensus genome sequences. In contrast, insertions have largely escaped the attention of researchers, although the furin site insert in the spike protein is thought to be a determinant of SARS-CoV-2 virulence and other inserts might have contributed to coronavirus pathogenicity as well. Here, we investigate insertions in SARS-CoV-2 genomes and identify 347 unique inserts of different lengths. We present evidence that these inserts reflect actual virus variance rather than sequencing errors. Two principal mechanisms appear to account for the inserts in the SARS-CoV-2 genomes: polymerase slippage and template switch, the latter possibly associated with the synthesis of subgenomic RNAs. We show that inserts in the Spike glycoprotein can affect its antigenic properties and thus merit monitoring. At least three inserts in the N-terminal domain of the Spike (ins245IME, ins246DSWG, and ins248SSLT) that were first detected in 2021 are predicted to lead to escape from neutralizing antibodies, whereas other inserts might result in escape from T-cell immunity.
Affiliation(s)
- Sofya K. Garushyants
  - National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD, USA
- Igor B. Rogozin
  - National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD, USA
- Eugene V. Koonin
  - National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD, USA
|
50
|
Wold J, Koepfli KP, Galla SJ, Eccles D, Hogg CJ, Le Lec MF, Guhlin J, Santure AW, Steeves TE. Expanding the conservation genomics toolbox: Incorporating structural variants to enhance genomic studies for species of conservation concern. Mol Ecol 2021; 30:5949-5965. [PMID: 34424587 PMCID: PMC9290615 DOI: 10.1111/mec.16141] [Citation(s) in RCA: 21] [Impact Index Per Article: 7.0] [Received: 01/20/2021] [Revised: 07/28/2021] [Accepted: 08/18/2021] [Indexed: 12/28/2022]
Abstract
Structural variants (SVs) are large rearrangements (>50 bp) within the genome that impact gene function and the content and structure of chromosomes. As a result, SVs are a significant source of functional genomic variation, that is, variation at genomic regions underpinning phenotype differences, that can have large effects on individual and population fitness. While there are increasing opportunities to investigate functional genomic variation in threatened species via single nucleotide polymorphism (SNP) data sets, SVs remain understudied despite their potential influence on fitness traits of conservation interest. In this future-focused Opinion, we contend that characterizing SVs offers the conservation genomics community an exciting opportunity to complement SNP-based approaches to enhance species recovery. We also leverage the existing literature (predominantly in human health, agriculture and ecoevolutionary biology) to identify approaches for readily characterizing SVs and consider how integrating these into the conservation genomics toolbox may transform the way we manage some of the world's most threatened species.
Affiliation(s)
- Jana Wold
  - School of Biological Sciences, University of Canterbury, Christchurch, New Zealand
- Klaus-Peter Koepfli
  - Smithsonian-Mason School of Conservation, Front Royal, Virginia, USA
  - Centre for Species Survival, Smithsonian Conservation Biology Institute, National Zoological Park, Washington, District of Columbia, USA
  - Computer Technologies Laboratory, ITMO University, Saint Petersburg, Russia
- Stephanie J Galla
  - School of Biological Sciences, University of Canterbury, Christchurch, New Zealand
  - Department of Biological Sciences, Boise State University, Boise, Idaho, USA
- David Eccles
  - Malaghan Institute of Medical Research, Wellington, New Zealand
- Carolyn J Hogg
  - School of Life and Environmental Sciences, The University of Sydney, Sydney, NSW, Australia
- Marissa F Le Lec
  - Department of Biochemistry, University of Otago, Dunedin, Otago, New Zealand
- Joseph Guhlin
  - Department of Biochemistry, University of Otago, Dunedin, Otago, New Zealand
  - Genomics Aotearoa, Dunedin, Otago, New Zealand
- Anna W Santure
  - School of Biological Sciences, The University of Auckland, Auckland, New Zealand
- Tammy E Steeves
  - School of Biological Sciences, University of Canterbury, Christchurch, New Zealand
|