1. PathoLive—Real-Time Pathogen Identification from Metagenomic Illumina Datasets. Life (Basel) 2022; 12:life12091345. [PMID: 36143382 PMCID: PMC9505849 DOI: 10.3390/life12091345] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/13/2022] [Revised: 08/24/2022] [Accepted: 08/24/2022] [Indexed: 11/18/2022]
Abstract
In recent years, NGS has become a crucial workhorse for open-view pathogen diagnostics. Yet massively parallel high-throughput technologies lead to long turnaround times, because the analysis can only be performed after sequencing has finished. The interpretation of results can be further complicated by contamination, clinically irrelevant sequences, and the sheer amount and complexity of the data. We implemented PathoLive, a real-time diagnostics pipeline for the detection of pathogens from clinical samples hours before sequencing has finished. Based on real-time alignment with HiLive2, mappings are scored with respect to common contaminations, low-entropy areas, and sequences of widespread, non-pathogenic organisms. The results are visualized using an interactive taxonomic tree that provides an easily interpretable overview of the relevance of hits. For a human plasma sample that was spiked in vitro with six pathogenic viruses, all agents were clearly detected after only 40 of 200 sequencing cycles. For a real-world sample from Sudan, the results correctly indicated the presence of Crimean-Congo hemorrhagic fever virus. In a second real-world dataset from the 2019 SARS-CoV-2 outbreak in Wuhan, we found the presence of a SARS coronavirus as the most relevant hit without the novel virus reference genome being included in the database. For all samples, clinically irrelevant hits were correctly de-emphasized. Our approach is valuable for obtaining fast and accurate NGS-based pathogen identifications and for correctly prioritizing and visualizing them based on their clinical significance. PathoLive is open source and available on GitLab and BioConda.
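The abstract mentions scoring mappings with respect to low-entropy areas. As an illustration only, the following minimal sketch shows one common way to flag low-complexity sequence, using Shannon entropy over base frequencies. It is not taken from PathoLive; the function names and the threshold of 1.0 bit per base are assumptions chosen for the example.

```python
import math
from collections import Counter


def shannon_entropy(seq: str) -> float:
    """Return the Shannon entropy (bits per base) of a nucleotide sequence."""
    if not seq:
        return 0.0
    counts = Counter(seq.upper())
    total = len(seq)
    # Sum -p * log2(p) over the observed base frequencies.
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


def is_low_entropy(seq: str, threshold: float = 1.0) -> bool:
    """Flag a sequence as low-complexity.

    The threshold is a hypothetical value for illustration, not a
    parameter documented for PathoLive.
    """
    return shannon_entropy(seq) < threshold
```

A homopolymer run has zero entropy and is flagged, while a sequence that uses all four bases evenly reaches the maximum of 2 bits per base and is not.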
2. Lang J, Sun J, Yang Z, He L, He Y, Chen Y, Huang L, Li P, Li J, Qin L. Nano2NGS-Muta: a framework for converting nanopore sequencing data to NGS-liked sequencing data for hotspot mutation detection. NAR Genom Bioinform 2022; 4:lqac033. [PMID: 35464239 PMCID: PMC9022462 DOI: 10.1093/nargab/lqac033] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 11/14/2021] [Revised: 03/30/2022] [Accepted: 04/13/2022] [Indexed: 12/12/2022]
Abstract
Nanopore sequencing, also known as single-molecule real-time sequencing, is a third/fourth-generation sequencing technology that enables deciphering single DNA/RNA molecules without the polymerase chain reaction. Although nanopore sequencing has made significant progress in scientific research and clinical practice, its application has been limited compared with next-generation sequencing (NGS) because of its specific design principles and data characteristics, especially in hotspot mutation detection. Therefore, we developed Nano2NGS-Muta as a data analysis framework for hotspot mutation detection based on long reads from nanopore sequencing. Nano2NGS-Muta is characterized by applying nanopore sequencing data to NGS-like data analysis pipelines. Long reads are converted into short reads and then processed through existing NGS analysis pipelines in combination with statistical methods for hotspot mutation detection. Nano2NGS-Muta not only effectively avoids false-positive and false-negative results caused by non-random errors and unexpected insertions-deletions (indels) in nanopore sequencing data and improves the detection accuracy of hotspot mutations compared with conventional nanopore sequencing data analysis algorithms, but also breaks down the barriers between data analysis methods for short-read and long-read sequencing. We hope Nano2NGS-Muta can serve as a reference method for nanopore sequencing data and promote a broader application scope for nanopore sequencing technology in scientific research and clinical practice.
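The abstract states that long reads are converted into short reads before entering NGS pipelines, but does not say how. One simple possibility, shown here purely as an assumption, is a sliding window over each long read. The function name `chop_read` and the default window length and step are hypothetical and are not taken from Nano2NGS-Muta.

```python
from typing import Iterator, Tuple


def chop_read(name: str, seq: str, length: int = 150,
              step: int = 75) -> Iterator[Tuple[str, str]]:
    """Yield (fragment_name, fragment) windows over a long read.

    Windows start every `step` bases and are `length` bases long.
    A read shorter than `length` is returned whole. Trailing bases
    that do not fill a complete window are not covered by this sketch.
    """
    if length <= 0 or step <= 0:
        raise ValueError("length and step must be positive")
    last_start = max(len(seq) - length, 0)
    for start in range(0, last_start + 1, step):
        # Name each fragment by its 0-based offset in the parent read.
        yield f"{name}_{start}", seq[start:start + length]
```

Overlapping windows (a step smaller than the window length) keep every hotspot position covered by more than one fragment, which is one reason a caller might choose them over non-overlapping chunks.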
Affiliation(s)
- Jidong Lang: Bioinformatics and Product Development Department, Qitan Technology (Beijing) Co., Ltd, Beijing 100192, China
- Jiguo Sun: Bioinformatics and Product Development Department, Qitan Technology (Beijing) Co., Ltd, Beijing 100192, China
- Zhi Yang: Bioinformatics and Product Development Department, Qitan Technology (Beijing) Co., Ltd, Beijing 100192, China
- Lei He: Bioinformatics and Product Development Department, Qitan Technology (Beijing) Co., Ltd, Beijing 100192, China
- Yu He: Bioinformatics and Product Development Department, Qitan Technology (Beijing) Co., Ltd, Beijing 100192, China
- Yanmei Chen: Bioinformatics and Product Development Department, Qitan Technology (Beijing) Co., Ltd, Beijing 100192, China
- Lei Huang: Bioinformatics and Product Development Department, Qitan Technology (Beijing) Co., Ltd, Beijing 100192, China
- Ping Li: Bioinformatics and Product Development Department, Qitan Technology (Beijing) Co., Ltd, Beijing 100192, China
- Jialin Li: Bioinformatics and Product Development Department, Qitan Technology (Beijing) Co., Ltd, Beijing 100192, China
- Liu Qin: Bioinformatics and Product Development Department, Qitan Technology (Beijing) Co., Ltd, Beijing 100192, China
3. Gaedigk A, Boone EC, Scherer SE, Lee SB, Numanagić I, Sahinalp C, Smith JD, McGee S, Radhakrishnan A, Qin X, Wang WY, Farrow EG, Gonzaludo N, Halpern AL, Nickerson DA, Miller NA, Pratt VM, Kalman LV. CYP2C8, CYP2C9, and CYP2C19 Characterization Using Next-Generation Sequencing and Haplotype Analysis: A GeT-RM Collaborative Project. J Mol Diagn 2022; 24:337-350. [PMID: 35134542 PMCID: PMC9069873 DOI: 10.1016/j.jmoldx.2021.12.011] [Citation(s) in RCA: 25] [Impact Index Per Article: 12.5] [Received: 11/10/2021] [Revised: 12/09/2021] [Accepted: 12/28/2021] [Indexed: 01/13/2023]
Abstract
Pharmacogenetic tests typically target selected sequence variants to identify haplotypes that are often defined by star (∗) allele nomenclature. Due to their design, these targeted genotyping assays are unable to detect novel variants that may change the function of the gene product and thereby affect phenotype prediction and patient care. In the current study, 137 DNA samples that were previously characterized by the Genetic Testing Reference Material (GeT-RM) program using a variety of targeted genotyping methods were recharacterized using targeted and whole genome sequencing analysis. Sequence data were analyzed using three genotype calling tools to identify star allele diplotypes for CYP2C8, CYP2C9, and CYP2C19. The genotype calls from next-generation sequencing (NGS) correlated well to those previously reported, except when novel alleles were present in a sample. Six novel alleles and 38 novel suballeles were identified in the three genes due to identification of variants not covered by targeted genotyping assays. In addition, several ambiguous genotype calls from a previous study were resolved using the NGS and/or long-read NGS data. Diplotype calls were mostly consistent between the calling algorithms, although several discrepancies were noted. This study highlights the utility of NGS for pharmacogenetic testing and demonstrates that there are many novel alleles that are yet to be discovered, even in highly characterized genes such as CYP2C9 and CYP2C19.
Affiliation(s)
- Andrea Gaedigk: Division of Clinical Pharmacology, Toxicology and Therapeutic Innovation, Children's Mercy Kansas City, Kansas City, Missouri; University of Missouri-Kansas City School of Medicine, Kansas City, Missouri
- Erin C Boone: Division of Clinical Pharmacology, Toxicology and Therapeutic Innovation, Children's Mercy Kansas City, Kansas City, Missouri
- Steven E Scherer: Human Genome Sequencing Center, Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, Texas
- Seung-Been Lee: Precision Medicine Institute, Macrogen Inc., Seongnam, Republic of Korea
- Ibrahim Numanagić: Department of Computer Science, University of Victoria, Victoria, British Columbia, Canada
- Cenk Sahinalp: Cancer Data Science Laboratory, National Cancer Institute, National Institutes of Health, Bethesda, Maryland
- Joshua D Smith: Department of Genome Sciences, University of Washington, Seattle, Washington
- Sean McGee: Department of Genome Sciences, University of Washington, Seattle, Washington
- Xiang Qin: Human Genome Sequencing Center, Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, Texas
- Wendy Y Wang: Division of Clinical Pharmacology, Toxicology and Therapeutic Innovation, Children's Mercy Kansas City, Kansas City, Missouri
- Emily G Farrow: University of Missouri-Kansas City School of Medicine, Kansas City, Missouri; Center for Genomic Medicine, Children's Mercy Kansas City, Kansas City, Missouri
- Nina Gonzaludo: Medical Genomics Research, Illumina Inc., San Diego, California
- Aaron L Halpern: Medical Genomics Research, Illumina Inc., San Diego, California
- Deborah A Nickerson: Department of Genome Sciences, University of Washington, Seattle, Washington
- Neil A Miller: University of Missouri-Kansas City School of Medicine, Kansas City, Missouri; Center for Genomic Medicine, Children's Mercy Kansas City, Kansas City, Missouri
- Victoria M Pratt: Department of Medical and Molecular Genetics, Indiana University School of Medicine, Indianapolis, Indiana
- Lisa V Kalman: Informatics and Data Science Branch, Division of Laboratory Systems, Centers for Disease Control and Prevention, Atlanta, Georgia
4. Zhang D, Zhang J, Du J, Zhou Y, Wu P, Liu Z, Sun Z, Wang J, Ding W, Chen J, Wang J, Xu Y, Ouyang C, Yang Q. OUP accepted manuscript. Clin Chem 2022; 68:826-836. [PMID: 35290433 DOI: 10.1093/clinchem/hvac024] [Citation(s) in RCA: 7] [Impact Index Per Article: 3.5] [Received: 08/17/2021] [Accepted: 12/28/2021] [Indexed: 11/13/2022]
Affiliation(s)
- Dong Zhang: Department of Clinical Laboratory, State Key Laboratory of Complex Severe and Rare Diseases, Peking Union Medical College Hospital, Peking Union Medical College, Chinese Academy of Medical Sciences, Beijing, China
- Jingjia Zhang: Department of Clinical Laboratory, State Key Laboratory of Complex Severe and Rare Diseases, Peking Union Medical College Hospital, Peking Union Medical College, Chinese Academy of Medical Sciences, Beijing, China
- Juan Du: Department of Clinical Laboratory, State Key Laboratory of Complex Severe and Rare Diseases, Peking Union Medical College Hospital, Peking Union Medical College, Chinese Academy of Medical Sciences, Beijing, China
- Yiwen Zhou: Hangzhou Matridx Biotechnology Co., Ltd., Hangzhou, Zhejiang, China
- Pengfei Wu: Hangzhou Matridx Biotechnology Co., Ltd., Hangzhou, Zhejiang, China
- Zidan Liu: Hangzhou Matridx Biotechnology Co., Ltd., Hangzhou, Zhejiang, China
- Zhunzhun Sun: Hangzhou Matridx Biotechnology Co., Ltd., Hangzhou, Zhejiang, China
- Jianghao Wang: Hangzhou Matridx Biotechnology Co., Ltd., Hangzhou, Zhejiang, China
- Wenchao Ding: Hangzhou Matridx Biotechnology Co., Ltd., Hangzhou, Zhejiang, China
- Junjie Chen: Hangzhou Matridx Biotechnology Co., Ltd., Hangzhou, Zhejiang, China
- Jun Wang: Hangzhou Matridx Biotechnology Co., Ltd., Hangzhou, Zhejiang, China
- Yingchun Xu: Department of Clinical Laboratory, State Key Laboratory of Complex Severe and Rare Diseases, Peking Union Medical College Hospital, Peking Union Medical College, Chinese Academy of Medical Sciences, Beijing, China
- Chuan Ouyang: Hangzhou Matridx Biotechnology Co., Ltd., Hangzhou, Zhejiang, China
- Qiwen Yang: Department of Clinical Laboratory, State Key Laboratory of Complex Severe and Rare Diseases, Peking Union Medical College Hospital, Peking Union Medical College, Chinese Academy of Medical Sciences, Beijing, China
5. Chappleboim A, Joseph-Strauss D, Rahat A, Sharkia I, Adam M, Kitsberg D, Fialkoff G, Lotem M, Gershon O, Schmidtner AK, Oiknine-Djian E, Klochendler A, Sadeh R, Dor Y, Wolf D, Habib N, Friedman N. Early sample tagging and pooling enables simultaneous SARS-CoV-2 detection and variant sequencing. Sci Transl Med 2021; 13:eabj2266. [PMID: 34591660 PMCID: PMC9928115 DOI: 10.1126/scitranslmed.abj2266] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Indexed: 12/24/2022]
Abstract
Most severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) diagnostic tests have relied on RNA extraction followed by reverse transcription quantitative polymerase chain reaction (RT-qPCR) assays. Whereas automation improved logistics and different pooling strategies increased testing capacity, highly multiplexed next-generation sequencing (NGS) diagnostics remain a largely untapped resource. NGS tests have the potential to markedly increase throughput while providing crucial SARS-CoV-2 variant information. Current NGS-based detection and genotyping assays for SARS-CoV-2 are costly, mostly due to parallel sample processing through multiple steps. Here, we have established ApharSeq, in which samples are barcoded in the lysis buffer and pooled before reverse transcription. We validated this assay by applying ApharSeq to more than 500 clinical samples from the Clinical Virology Laboratory at Hadassah hospital in a robotic workflow. The assay was linear across five orders of magnitude, and the limit of detection was Ct 33 (~1000 copies/ml, 95% sensitivity) with >99.5% specificity. ApharSeq provided targeted high-confidence genotype information due to unique molecular identifiers incorporated into this method. Because of early pooling, we were able to estimate a 10- to 100-fold reduction in labor, automated liquid handling, and reagent requirements in high-throughput settings compared to current testing methods. The protocol can be tailored to assay other host or pathogen RNA targets simultaneously. These results suggest that ApharSeq can be a promising tool for current and future mass diagnostic challenges.
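The abstract describes barcoding samples before pooling and using unique molecular identifiers (UMIs) for high-confidence genotyping. The minimal sketch below illustrates the general idea of counting distinct molecules per sample after pooled sequencing. It is not the ApharSeq pipeline: the function name and the input format of (barcode, UMI) pairs are assumptions, and the sketch does no sequencing-error correction of barcodes or UMIs.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set, Tuple


def count_unique_molecules(reads: Iterable[Tuple[str, str]]) -> Dict[str, int]:
    """Count distinct UMIs per sample barcode.

    `reads` is an iterable of (sample_barcode, umi) pairs, a
    hypothetical pre-parsed representation of pooled reads.
    Reads sharing both barcode and UMI are treated as PCR
    duplicates of one original molecule and counted once.
    """
    umis_by_sample: Dict[str, Set[str]] = defaultdict(set)
    for barcode, umi in reads:
        umis_by_sample[barcode].add(umi)
    return {barcode: len(umis) for barcode, umis in umis_by_sample.items()}
```

For example, three reads from sample `S1` that carry only two distinct UMIs count as two molecules, so amplification duplicates do not inflate the per-sample signal.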
Affiliation(s)
- Alon Chappleboim: Alexander Silberman Institute of Life Science, Hebrew University of Jerusalem, Jerusalem 9190401, Israel; Rachel and Selim Benin School of Computer Science, Hebrew University of Jerusalem, Jerusalem 9190401, Israel
- Daphna Joseph-Strauss: Alexander Silberman Institute of Life Science, Hebrew University of Jerusalem, Jerusalem 9190401, Israel; Rachel and Selim Benin School of Computer Science, Hebrew University of Jerusalem, Jerusalem 9190401, Israel
- Ayelet Rahat: Alexander Silberman Institute of Life Science, Hebrew University of Jerusalem, Jerusalem 9190401, Israel; Rachel and Selim Benin School of Computer Science, Hebrew University of Jerusalem, Jerusalem 9190401, Israel
- Israa Sharkia: Alexander Silberman Institute of Life Science, Hebrew University of Jerusalem, Jerusalem 9190401, Israel; Rachel and Selim Benin School of Computer Science, Hebrew University of Jerusalem, Jerusalem 9190401, Israel
- Miriam Adam: Edmond and Lily Safra Center for Brain Sciences, Hebrew University of Jerusalem, Jerusalem 9190401, Israel
- Daniel Kitsberg: Edmond and Lily Safra Center for Brain Sciences, Hebrew University of Jerusalem, Jerusalem 9190401, Israel
- Gavriel Fialkoff: Alexander Silberman Institute of Life Science, Hebrew University of Jerusalem, Jerusalem 9190401, Israel; Rachel and Selim Benin School of Computer Science, Hebrew University of Jerusalem, Jerusalem 9190401, Israel
- Matan Lotem: Alexander Silberman Institute of Life Science, Hebrew University of Jerusalem, Jerusalem 9190401, Israel; Rachel and Selim Benin School of Computer Science, Hebrew University of Jerusalem, Jerusalem 9190401, Israel
- Omer Gershon: Alexander Silberman Institute of Life Science, Hebrew University of Jerusalem, Jerusalem 9190401, Israel; Rachel and Selim Benin School of Computer Science, Hebrew University of Jerusalem, Jerusalem 9190401, Israel
- Anna-Kristina Schmidtner: Edmond and Lily Safra Center for Brain Sciences, Hebrew University of Jerusalem, Jerusalem 9190401, Israel
- Esther Oiknine-Djian: Hadassah Hebrew University Medical Center, Jerusalem 9112001, Israel; Lautenberg Centre for Immunology and Cancer Research, IMRIC, Faculty of Medicine, Hebrew University of Jerusalem, Jerusalem 9112001, Israel
- Agnes Klochendler: Department of Developmental Biology and Cancer Research, IMRIC, Faculty of Medicine, Hebrew University of Jerusalem, Jerusalem 9112001, Israel
- Ronen Sadeh: Alexander Silberman Institute of Life Science, Hebrew University of Jerusalem, Jerusalem 9190401, Israel; Rachel and Selim Benin School of Computer Science, Hebrew University of Jerusalem, Jerusalem 9190401, Israel
- Yuval Dor: Department of Developmental Biology and Cancer Research, IMRIC, Faculty of Medicine, Hebrew University of Jerusalem, Jerusalem 9112001, Israel
- Dana Wolf: Hadassah Hebrew University Medical Center, Jerusalem 9112001, Israel; Lautenberg Centre for Immunology and Cancer Research, IMRIC, Faculty of Medicine, Hebrew University of Jerusalem, Jerusalem 9112001, Israel
- Naomi Habib: Edmond and Lily Safra Center for Brain Sciences, Hebrew University of Jerusalem, Jerusalem 9190401, Israel
- Nir Friedman: Alexander Silberman Institute of Life Science, Hebrew University of Jerusalem, Jerusalem 9190401, Israel; Rachel and Selim Benin School of Computer Science, Hebrew University of Jerusalem, Jerusalem 9190401, Israel
6. Bartoszewicz JM, Genske U, Renard BY. Deep learning-based real-time detection of novel pathogens during sequencing. Brief Bioinform 2021; 22:6326527. [PMID: 34297793 DOI: 10.1093/bib/bbab269] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 04/19/2021] [Revised: 06/09/2021] [Accepted: 06/23/2021] [Indexed: 11/12/2022]
Abstract
Novel pathogens evolve quickly and may emerge rapidly, causing dangerous outbreaks or even global pandemics. Next-generation sequencing is the state of the art in open-view pathogen detection, and one of the few methods available at the earliest stages of an epidemic, even when the biological threat is unknown. Analyzing the samples as the sequencer is running can greatly reduce the turnaround time, but existing tools rely on close matches to lists of known pathogens and perform poorly on novel species. Machine learning approaches can predict if single reads originate from more distant, unknown pathogens but require relatively long input sequences and processed data from a finished sequencing run. Incomplete sequences contain less information, leading to a trade-off between sequencing time and detection accuracy. Using a workflow for real-time pathogenic potential prediction, we investigate which subsequences already allow accurate inference. We train deep neural networks to classify Illumina and Nanopore reads and integrate the models with HiLive2, a real-time Illumina mapper. This approach outperforms alternatives based on machine learning and sequence alignment on simulated and real data, including SARS-CoV-2 sequencing runs. After just 50 Illumina cycles, we observe an 80-fold sensitivity increase compared to real-time mapping. The first 250 bp of Nanopore reads, corresponding to 0.5 s of sequencing time, are enough to yield predictions more accurate than mapping the finished long reads. The approach could also be used for screening synthetic sequences against biosecurity threats.
Affiliation(s)
- Jakub M Bartoszewicz: Digital Engineering Faculty, Hasso Plattner Institute, University of Potsdam, Prof.-Dr.-Helmert-Straße 2-3, 14482 Brandenburg, Germany
- Ulrich Genske: Digital Engineering Faculty, Hasso Plattner Institute, University of Potsdam, Prof.-Dr.-Helmert-Straße 2-3, 14482 Brandenburg, Germany
- Bernhard Y Renard: Digital Engineering Faculty, Hasso Plattner Institute, University of Potsdam, Prof.-Dr.-Helmert-Straße 2-3, 14482 Brandenburg, Germany
7. Kaye AM, Wasserman WW. The genome atlas: navigating a new era of reference genomes. Trends Genet 2021; 37:807-818. [PMID: 33419587 DOI: 10.1016/j.tig.2020.12.002] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Received: 10/09/2020] [Revised: 12/03/2020] [Accepted: 12/07/2020] [Indexed: 10/22/2022]
Abstract
The reference genome serves two distinct purposes within the field of genomics. First, it provides a persistent structure against which findings can be reported, allowing for universal knowledge exchange between users. Second, it reduces the computational costs and time required to process genomic data by creating a scaffold that can be relied upon by analysis software. Here, we posit that current efforts to extend the linear reference to a graph-based structure while trying to fulfil both of these purposes concurrently will face a trade-off between comprehensiveness and computational efficiency. In this article, we explore how the reference genome is used and suggest an alternative structure, The Genome Atlas (TGA), to fulfil the bipartite role of the reference genome.
Affiliation(s)
- Alice M Kaye: Centre for Molecular Medicine and Therapeutics, BC Children's Hospital Research Institute, University of British Columbia, Vancouver, BC, Canada
- Wyeth W Wasserman: Centre for Molecular Medicine and Therapeutics, BC Children's Hospital Research Institute, University of British Columbia, Vancouver, BC, Canada