1
Alathari S, Joseph A, Bolaños LM, Studholme DJ, Jeffries AR, Appenteng P, Duodu KA, Sawyerr EB, Paley R, Tyler CR, Temperton B. In field use of water samples for genomic surveillance of infectious spleen and kidney necrosis virus (ISKNV) infecting tilapia fish in Lake Volta, Ghana. PeerJ 2024; 12:e17605. [PMID: 39011377 PMCID: PMC11248997 DOI: 10.7717/peerj.17605] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/22/2023] [Accepted: 05/30/2024] [Indexed: 07/17/2024]
Abstract
Viral outbreaks are a constant threat to aquaculture, limiting production for better global food security. A lack of diagnostic testing and monitoring in resource-limited areas hinders the capacity to respond rapidly to disease outbreaks and to prevent viral pathogens becoming endemic in fisheries productive waters. Recent developments in diagnostic testing for emerging viruses, however, offer a solution for rapid in situ monitoring of viral outbreaks. Genomic epidemiology has furthermore proven highly effective in detecting viral mutations involved in pathogenesis and assisting in resolving chains of transmission. Here, we demonstrate the application of an in-field epidemiological tool kit to track viral outbreaks in aquaculture on farms with reduced access to diagnostic labs, and with non-destructive sampling. Inspired by the "lab in a suitcase" approach used for genomic surveillance of human viral pathogens and wastewater monitoring of COVID-19, we evaluated the feasibility of real-time genome sequencing surveillance of the fish pathogen, Infectious spleen and kidney necrosis virus (ISKNV) in Lake Volta. Viral fractions from water samples collected from cages holding Nile tilapia (Oreochromis niloticus) with suspected ongoing ISKNV infections were concentrated and used as a template for whole genome sequencing, using a previously developed tiled PCR method for ISKNV. Mutations in ISKNV in samples collected from the water surrounding the cages matched those collected from infected caged fish, illustrating that water samples can be used for detecting predominant ISKNV variants in an ongoing outbreak. This approach allows for the detection of ISKNV and tracking of the dynamics of variant frequencies, and may thus assist in guiding control measures for the rapid isolation and quarantine of infected farms and facilities.
Affiliation(s)
- Shayma Alathari
  - Faculty of Health and Life Sciences, University of Exeter, Exeter, United Kingdom
- Andrew Joseph
  - Centre for Environment, Fisheries and Aquaculture Science (Cefas), Weymouth, United Kingdom
- Luis M Bolaños
  - Faculty of Health and Life Sciences, University of Exeter, Exeter, United Kingdom
- David J Studholme
  - Faculty of Health and Life Sciences, University of Exeter, Exeter, United Kingdom
- Aaron R Jeffries
  - Faculty of Health and Life Sciences, University of Exeter, Exeter, United Kingdom
- Patrick Appenteng
  - Fisheries Commission, Ministry of Fisheries and Aquaculture Development, Accra, Ghana
- Kwaku A Duodu
  - Fisheries Commission, Ministry of Fisheries and Aquaculture Development, Accra, Ghana
- Eric B Sawyerr
  - Fisheries Commission, Ministry of Fisheries and Aquaculture Development, Accra, Ghana
- Richard Paley
  - Centre for Environment, Fisheries and Aquaculture Science (Cefas), Weymouth, United Kingdom
- Charles R Tyler
  - Faculty of Health and Life Sciences, University of Exeter, Exeter, United Kingdom
  - University of Exeter, Sustainable Aquaculture Futures Centre, Exeter, United Kingdom
- Ben Temperton
  - Faculty of Health and Life Sciences, University of Exeter, Exeter, United Kingdom
2
Wattanasombat S, Tongjai S. Easing genomic surveillance: A comprehensive performance evaluation of long-read assemblers across multi-strain mixture data of HIV-1 and Other pathogenic viruses for constructing a user-friendly bioinformatic pipeline. F1000Res 2024; 13:556. [PMID: 38984017 PMCID: PMC11231628 DOI: 10.12688/f1000research.149577.1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Accepted: 05/14/2024] [Indexed: 07/11/2024]
Abstract
Background: Determining the appropriate computational requirements and software performance is essential for efficient genomic surveillance. The lack of standardized benchmarking complicates software selection, especially with limited resources. Methods: We developed a containerized benchmarking pipeline to evaluate seven long-read assemblers (Canu, GoldRush, MetaFlye, Strainline, HaploDMF, iGDA, and RVHaplo) for viral haplotype reconstruction, using both simulated and experimental Oxford Nanopore sequencing data of HIV-1 and other viruses. Benchmarking was conducted on three computational systems to assess each assembler's performance, utilizing QUAST and BLASTN for quality assessment. Results: Our findings show that assembler choice significantly impacts assembly time, with CPU and memory usage having minimal effect. Assembler selection also influences the size of the contigs, with a minimum read length of 2,000 nucleotides required for quality assembly. A 4,000-nucleotide read length improves quality further. Canu was efficient among de novo assemblers but not suitable for multi-strain mixtures, while GoldRush produced only consensus assemblies. Strainline and MetaFlye were suitable for metagenomic sequencing data, with Strainline requiring high memory and MetaFlye operable on low-specification machines. Among reference-based assemblers, iGDA had high error rates, RVHaplo showed the best runtime and accuracy but became ineffective with similar sequences, and HaploDMF, utilizing machine learning, had fewer errors with a slightly longer runtime. Conclusions: The HIV-64148 pipeline, containerized using Docker, facilitates easy deployment and offers flexibility to select from a range of assemblers to match computational systems or study requirements. This tool aids in genome assembly and provides valuable information on HIV-1 sequences, enhancing viral evolution monitoring and understanding.
Affiliation(s)
- Sara Wattanasombat
  - Department of Microbiology, Faculty of Medicine, Chiang Mai University, Chiang Mai, 50200, Thailand
- Siripong Tongjai
  - Department of Microbiology, Faculty of Medicine, Chiang Mai University, Chiang Mai, 50200, Thailand
3
Hayashida T, Tran LK, Dang ALD, Nagai M, Matsumoto S, Le HNM, Van TD, Tran GV, Tanuma J, Pham TN, Oka S. Identification of New Circulating Recombinant Form of HIV-1 CRF127_07109 in Northern Vietnam. AIDS Res Hum Retroviruses 2024. [PMID: 38666693 DOI: 10.1089/aid.2024.0022] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 05/04/2024]
Abstract
Some candidates of a new circulating recombinant form (CRF) of HIV-1 were found in northern Vietnam in our previous study. We succeeded in near full-length sequencing using MinION with plasma samples from 12 people living with HIV. Three of the samples were CRF109_0107, which was recently reported in China. Three others were the newly identified CRF127_07109, while six of them were considered to be CRF127_07109-related unique recombinant forms (URFs). The time to the most recent common ancestor of CRF127_07109 was estimated to be between 2015 and 2019. Our findings showed that CRF127_07109 and related URFs were generated recently in northern Vietnam, rather than migrated independently to northern Vietnam.
Affiliation(s)
- Tsunefusa Hayashida
  - AIDS Clinical Center, National Center for Global Health and Medicine, Tokyo, Japan
- Linh Khanh Tran
  - AIDS Clinical Center, National Center for Global Health and Medicine, Tokyo, Japan
- An Luong-Dieu Dang
  - Department of Biology, Ecology and Earth Sciences, University of Calabria, Cosenza, Italy
- Moeko Nagai
  - AIDS Clinical Center, National Center for Global Health and Medicine, Tokyo, Japan
- Shoko Matsumoto
  - AIDS Clinical Center, National Center for Global Health and Medicine, Tokyo, Japan
- Hoa Nguyen-Minh Le
  - Department of Viral and Parasitic Diseases, National Hospital of Tropical Diseases, Hanoi, Vietnam
- Trang Dinh Van
  - Department of Viral and Parasitic Diseases, National Hospital of Tropical Diseases, Hanoi, Vietnam
- Giang Van Tran
  - Department of Viral and Parasitic Diseases, National Hospital of Tropical Diseases, Hanoi, Vietnam
  - Department of Infectious Diseases, Hanoi Medical University, Hanoi, Vietnam
- Junko Tanuma
  - AIDS Clinical Center, National Center for Global Health and Medicine, Tokyo, Japan
- Thach Ngoc Pham
  - Department of Viral and Parasitic Diseases, National Hospital of Tropical Diseases, Hanoi, Vietnam
- Shinichi Oka
  - AIDS Clinical Center, National Center for Global Health and Medicine, Tokyo, Japan
4
Wennmann JT, Lim FS, Senger S, Gani M, Jehle JA, Keilwagen J. Haplotype determination of the Bombyx mori nucleopolyhedrovirus by Nanopore sequencing and linkage of single nucleotide variants. J Gen Virol 2024; 105. [PMID: 38767624 DOI: 10.1099/jgv.0.001983] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 05/22/2024]
Abstract
Naturally occurring isolates of baculoviruses, such as the Bombyx mori nucleopolyhedrovirus (BmNPV), usually consist of numerous genetically different haplotypes. Deciphering the different haplotypes of such isolates is hampered by the large size of the dsDNA genome, as well as the short read length of next generation sequencing (NGS) techniques that are widely applied for baculovirus isolate characterization. In this study, we addressed this challenge by combining the accuracy of NGS to determine single nucleotide variants (SNVs) as genetic markers with the long read length of Nanopore sequencing technique. This hybrid approach allowed the comprehensive analysis of genetically homogeneous and heterogeneous isolates of BmNPV. Specifically, this allowed the identification of two putative major haplotypes in the heterogeneous isolate BmNPV-Ja by SNV position linkage. SNV positions, which were determined based on NGS data, were linked by the long Nanopore reads in a Position Weight Matrix. Using a modified Expectation-Maximization algorithm, the Nanopore reads were assigned according to the occurrence of variable SNV positions by machine learning. The cohorts of reads were de novo assembled, which led to the identification of BmNPV haplotypes. The method demonstrated the strength of the combined approach of short- and long-read sequencing techniques to decipher the genetic diversity of baculovirus isolates.
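The read-assignment step in this abstract can be pictured with a small sketch. The code below is not the authors' modified Expectation-Maximization algorithm; it is a plain two-component EM over binary SNV alleles, written under several assumptions: exactly two haplotypes, reads given as hypothetical `{site_index: allele}` dictionaries (0 = reference, 1 = variant), and a deterministic initialization seeded from the first read. The per-haplotype table `theta` plays the role of a position weight matrix restricted to SNV sites.

```python
import math


def em_assign_reads(reads, n_sites, n_iter=30, eps=0.05):
    """Assign reads to one of two haplotypes with a simple EM.

    reads   : list of dicts mapping SNV site index -> allele (0 or 1)
    n_sites : number of SNV sites
    Returns (labels, consensus): one haplotype label per read and the
    rounded allele pattern of each haplotype.
    """
    # theta[k][site] = probability of the variant allele at `site`
    # in haplotype k. Seed haplotype 0 from the first read and
    # haplotype 1 with the opposite pattern (illustrative choice).
    theta = [[0.5] * n_sites for _ in range(2)]
    for site, allele in reads[0].items():
        theta[0][site] = 0.7 if allele == 1 else 0.3
        theta[1][site] = 1.0 - theta[0][site]
    weights = [0.5, 0.5]
    resp = []

    for _ in range(n_iter):
        # E-step: responsibility of each haplotype for each read.
        resp = []
        for read in reads:
            log_p = []
            for k in range(2):
                lp = math.log(weights[k])
                for site, allele in read.items():
                    p_alt = theta[k][site]
                    lp += math.log(p_alt if allele == 1 else 1.0 - p_alt)
                log_p.append(lp)
            top = max(log_p)
            unnorm = [math.exp(lp - top) for lp in log_p]
            total = sum(unnorm)
            resp.append([u / total for u in unnorm])

        # M-step: re-estimate mixing weights and allele probabilities,
        # clamped away from 0 and 1 so the logarithms stay finite.
        for k in range(2):
            share = sum(r[k] for r in resp) / len(reads)
            weights[k] = min(1.0 - eps, max(eps, share))
            for site in range(n_sites):
                covered = [(r[k], read[site])
                           for r, read in zip(resp, reads) if site in read]
                depth = sum(w for w, _ in covered)
                if depth > 0:
                    alt = sum(w for w, allele in covered if allele == 1)
                    theta[k][site] = min(1.0 - eps, max(eps, alt / depth))

    labels = [0 if r[0] >= r[1] else 1 for r in resp]
    consensus = [[1 if p > 0.5 else 0 for p in row] for row in theta]
    return labels, consensus


# Toy data: overlapping reads drawn from haplotypes 01010 and 10101.
reads = [
    {0: 0, 1: 1, 2: 0},
    {2: 1, 3: 0, 4: 1},
    {1: 1, 2: 0, 3: 1},
    {0: 1, 1: 0, 2: 1},
    {2: 0, 3: 1, 4: 0},
    {1: 0, 2: 1, 3: 0},
]
labels, consensus = em_assign_reads(reads, n_sites=5)
print(labels)
print(consensus)
```

Each resulting cohort of reads could then be handed to a de novo assembler, as the abstract describes; that step is outside this sketch.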
Affiliation(s)
- Jörg T Wennmann
  - Julius Kühn Institute (JKI) - Federal Research Centre for Cultivated Plants, Institute for Biological Control, Schwabenheimer Str. 101, 69221 Dossenheim, Germany
- Fang-Shiang Lim
  - Julius Kühn Institute (JKI) - Federal Research Centre for Cultivated Plants, Institute for Biological Control, Schwabenheimer Str. 101, 69221 Dossenheim, Germany
- Sergei Senger
  - Julius Kühn Institute (JKI) - Federal Research Centre for Cultivated Plants, Institute for Biological Control, Schwabenheimer Str. 101, 69221 Dossenheim, Germany
- Mudasir Gani
  - Division of Entomology, Faculty of Agriculture, Sher-e-Kashmir University of Agricultural Sciences & Technology, Kashmir 193 201, J&K, India
- Johannes A Jehle
  - Julius Kühn Institute (JKI) - Federal Research Centre for Cultivated Plants, Institute for Biological Control, Schwabenheimer Str. 101, 69221 Dossenheim, Germany
- Jens Keilwagen
  - Julius Kühn Institute (JKI) - Federal Research Centre for Cultivated Plants, Institute for Biosafety in Plant Biotechnology, Ernst-Baur-Str. 27, 06484 Quedlinburg, Germany
5
Su J, Li S, Zheng Z, Lam TW, Luo R. ClusterV-Web: a user-friendly tool for profiling HIV quasispecies and generating drug resistance reports from nanopore long-read data. Bioinformatics Advances 2024; 4:vbae006. [PMID: 38282975 PMCID: PMC10812873 DOI: 10.1093/bioadv/vbae006] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/01/2023] [Revised: 12/11/2023] [Accepted: 01/12/2024] [Indexed: 01/30/2024]
Abstract
Summary Third-generation long-read sequencing is an increasingly utilized technique for profiling human immunodeficiency virus (HIV) quasispecies and detecting drug resistance mutations due to its ability to cover the entire viral genome in individual reads. Recently, the ClusterV tool has demonstrated accurate detection of HIV quasispecies from Nanopore long-read sequencing data. However, the need for scripting skills and a computational environment may act as a barrier for many potential users. To address this issue, we have introduced ClusterV-Web, a user-friendly web-based application that enables easy configuration and execution of ClusterV, both remotely and locally. Our tool provides interactive tables and data visualizations to aid in the interpretation of results. This development is expected to democratize access to long-read sequencing data analysis, enabling a wider range of researchers and clinicians to efficiently profile HIV quasispecies and detect drug resistance mutations. Availability and implementation ClusterV-Web is freely available and open source, with detailed documentation accessible at http://www.bio8.cs.hku.hk/ClusterVW/. The standalone Docker image and source code are also available at https://github.com/HKU-BAL/ClusterV-Web.
Affiliation(s)
- Junhao Su
  - Department of Computer Science, The University of Hong Kong, Hong Kong 999077, China
- Shumin Li
  - Department of Computer Science, The University of Hong Kong, Hong Kong 999077, China
- Zhenxian Zheng
  - Department of Computer Science, The University of Hong Kong, Hong Kong 999077, China
- Tak-Wah Lam
  - Department of Computer Science, The University of Hong Kong, Hong Kong 999077, China
- Ruibang Luo
  - Department of Computer Science, The University of Hong Kong, Hong Kong 999077, China
6
Kang X, Xu J, Luo X, Schönhuth A. Hybrid-hybrid correction of errors in long reads with HERO. Genome Biol 2023; 24:275. [PMID: 38041098 PMCID: PMC10690975 DOI: 10.1186/s13059-023-03112-7] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/03/2023] [Accepted: 11/16/2023] [Indexed: 12/03/2023]
Abstract
Although generally superior, hybrid approaches for correcting errors in third-generation sequencing (TGS) reads, using next-generation sequencing (NGS) reads, mistake haplotype-specific variants for errors in polyploid and mixed samples. We suggest HERO, as the first "hybrid-hybrid" approach, to make use of both de Bruijn graphs and overlap graphs for optimal catering to the particular strengths of NGS and TGS reads. Extensive benchmarking experiments demonstrate that HERO improves indel and mismatch error rates by on average 65% (27-95%) and 20% (4-61%). Using HERO prior to genome assembly significantly improves the assemblies in the majority of the relevant categories.
Affiliation(s)
- Xiongbin Kang
  - College of Biology, Hunan University, Changsha, China
  - Genome Data Science, Faculty of Technology, Bielefeld University, Bielefeld, Germany
- Jialu Xu
  - College of Biology, Hunan University, Changsha, China
- Xiao Luo
  - College of Biology, Hunan University, Changsha, China
- Alexander Schönhuth
  - Genome Data Science, Faculty of Technology, Bielefeld University, Bielefeld, Germany
7
Scanlan JL, Mitchell AC, Marcroft SJ, Forsyth LM, Idnurm A, Van de Wouw AP. Deep amplicon sequencing reveals extensive allelic diversity in the erg11/CYP51 promoter and allows multi-population DMI fungicide resistance monitoring in the canola pathogen Leptosphaeria maculans. Fungal Genet Biol 2023; 168:103814. [PMID: 37343617 DOI: 10.1016/j.fgb.2023.103814] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/16/2023] [Revised: 04/29/2023] [Accepted: 06/12/2023] [Indexed: 06/23/2023]
Abstract
Continued use of fungicides provides a strong selection pressure towards strains with mutations to render these chemicals less effective. Previous research has shown that resistance to the demethylation inhibitor (DMI) fungicides, which target ergosterol synthesis, in the canola pathogen Leptosphaeria maculans has emerged in Australia and Europe. The change in fungicide sensitivity of individual isolates was found to be due to DNA insertions into the promoter of the erg11/CYP51 DMI target gene. Whether or not these were the only types of mutations and how prevalent they were in Australian populations was explored in the current study. New isolates with reduced DMI sensitivity were obtained from screens on DMI-treated plants, revealing eight independent insertions in the erg11 promoter. A novel deep amplicon sequencing approach applied to populations of ascospores fired from stubble identified an additional undetected insertion allele and quantified the frequencies of all known insertions, suggesting that, at least in the samples processed, the combined frequency of resistant alleles is between 0.0376% and 32.6%. Combined insertion allele frequencies positively correlated with population-level measures of in planta resistance to four different DMI treatments. Additionally, there was no evidence for erg11 coding mutations playing a role in conferring resistance in Australian populations. This research provides a key method for assessing fungicide resistance frequency in stubble-borne populations of plant pathogens and a baseline from which additional surveillance can be conducted in L. maculans. Whether or not the observed resistance allele frequencies are associated with loss of effective disease control in the field remains to be established.
Affiliation(s)
- Jack L Scanlan
  - School of BioSciences, The University of Melbourne, VIC 3010, Australia
- Angela C Mitchell
  - School of BioSciences, The University of Melbourne, VIC 3010, Australia
- Alexander Idnurm
  - School of BioSciences, The University of Melbourne, VIC 3010, Australia
8
Molinos-Albert LM, Baquero E, Bouvin-Pley M, Lorin V, Charre C, Planchais C, Dimitrov JD, Monceaux V, Vos M, Hocqueloux L, Berger JL, Seaman MS, Braibant M, Avettand-Fenoël V, Sáez-Cirión A, Mouquet H. Anti-V1/V3-glycan broadly HIV-1 neutralizing antibodies in a post-treatment controller. Cell Host Microbe 2023; 31:1275-1287.e8. [PMID: 37433296 DOI: 10.1016/j.chom.2023.06.006] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Received: 01/30/2023] [Revised: 05/08/2023] [Accepted: 06/13/2023] [Indexed: 07/13/2023]
Abstract
HIV-1 broadly neutralizing antibodies (bNAbs) can decrease viremia but are usually unable to counteract autologous viruses escaping the antibody pressure. Nonetheless, bNAbs may contribute to natural HIV-1 control in individuals off antiretroviral therapy (ART). Here, we describe a bNAb B cell lineage elicited in a post-treatment controller (PTC) that exhibits broad seroneutralization and show that a representative antibody from this lineage, EPTC112, targets a quaternary epitope in the glycan-V3 loop supersite of the HIV-1 envelope glycoprotein. The cryo-EM structure of EPTC112 complexed with soluble BG505 SOSIP.664 envelope trimers revealed interactions with N301- and N156-branched N-glycans and the 324GDIR327 V3 loop motif. Although the sole contemporaneous virus circulating in this PTC was resistant to EPTC112, it was potently neutralized by autologous plasma IgG antibodies. Our findings illuminate how cross-neutralizing antibodies can alter the HIV-1 infection course in PTCs and may control viremia off-ART, supporting their role in functional HIV-1 cure strategies.
Affiliation(s)
- Luis M Molinos-Albert
  - Humoral Immunology Unit, Institut Pasteur, Université Paris Cité, INSERM U1222, Paris 75015, France
- Eduard Baquero
  - NanoImaging Core Facility, Centre de Ressources et Recherches Technologiques (C2RT), Université Paris Cité, Institut Pasteur, Paris 75015, France
- Valérie Lorin
  - Humoral Immunology Unit, Institut Pasteur, Université Paris Cité, INSERM U1222, Paris 75015, France
- Caroline Charre
  - Université Cité, Faculté de Médecine, Paris 75014, France; INSERM U1016, CNRS UMR8104, Institut Cochin, Paris 75014, France; AP-HP, Service de Virologie, Hôpital Cochin, Paris 75014, France
- Cyril Planchais
  - Humoral Immunology Unit, Institut Pasteur, Université Paris Cité, INSERM U1222, Paris 75015, France
- Jordan D Dimitrov
  - Centre de Recherche des Cordeliers, INSERM, Sorbonne Université, Université de Paris, Paris 75006, France
- Valérie Monceaux
  - Viral Reservoirs and Immune control Unit, Institut Pasteur, Université Paris Cité, Paris 75015, France; HIV, Inflammation and Persistence Unit, Institut Pasteur, Université Paris Cité, Paris 75015, France
- Matthijn Vos
  - NanoImaging Core Facility, Centre de Ressources et Recherches Technologiques (C2RT), Université Paris Cité, Institut Pasteur, Paris 75015, France
- Laurent Hocqueloux
  - Service des Maladies Infectieuses et Tropicales, Centre Hospitalier Universitaire d'Orléans La Source, Orléans 45067, France
- Jean-Luc Berger
  - Department of Internal Medicine, Clinical Immunology and Infectious Diseases, Reims University Hospital, Reims 51100, France
- Véronique Avettand-Fenoël
  - Université Cité, Faculté de Médecine, Paris 75014, France; INSERM U1016, CNRS UMR8104, Institut Cochin, Paris 75014, France; AP-HP, Service de Virologie, Hôpital Cochin, Paris 75014, France
- Asier Sáez-Cirión
  - Viral Reservoirs and Immune control Unit, Institut Pasteur, Université Paris Cité, Paris 75015, France; HIV, Inflammation and Persistence Unit, Institut Pasteur, Université Paris Cité, Paris 75015, France
- Hugo Mouquet
  - Humoral Immunology Unit, Institut Pasteur, Université Paris Cité, INSERM U1222, Paris 75015, France
9
Sun S, Cheng F, Han D, Wei S, Zhong A, Massoudian S, Johnson AB. Pairwise comparative analysis of six haplotype assembly methods based on users' experience. BMC Genom Data 2023; 24:35. [PMID: 37386408 PMCID: PMC10311811 DOI: 10.1186/s12863-023-01134-5] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/01/2022] [Accepted: 05/25/2023] [Indexed: 07/01/2023]
Abstract
BACKGROUND A haplotype is a set of DNA variants inherited together from one parent or chromosome. Haplotype information is useful for studying genetic variation and disease association. Haplotype assembly (HA) is a process of obtaining haplotypes using DNA sequencing data. Currently, there are many HA methods with their own strengths and weaknesses. This study focused on comparing six HA methods or algorithms: HapCUT2, MixSIH, PEATH, WhatsHap, SDhaP, and MAtCHap using two NA12878 datasets named hg19 and hg38. The 6 HA algorithms were run on chromosome 10 of these two datasets, each with 3 filtering levels based on sequencing depth (DP1, DP15, and DP30). Their outputs were then compared. RESULT Run time (CPU time) was compared to assess the efficiency of 6 HA methods. HapCUT2 was the fastest HA for 6 datasets, with run time consistently under 2 min. In addition, WhatsHap was relatively fast, and its run time was 21 min or less for all 6 datasets. The other 4 HA algorithms' run time varied across different datasets and coverage levels. To assess their accuracy, pairwise comparisons were conducted for each pair of the six packages by generating their disagreement rates for both haplotype blocks and Single Nucleotide Variants (SNVs). The authors also compared them using switch distance (error), i.e., the number of positions where two chromosomes of a certain phase must be switched to match with the known haplotype. HapCUT2, PEATH, MixSIH, and MAtCHap generated output files with similar numbers of blocks and SNVs, and they had relatively similar performance. WhatsHap generated a much larger number of SNVs in the hg19 DP1 output, which caused it to have high disagreement percentages with other methods. However, for the hg38 data, WhatsHap had similar performance as the other 4 algorithms, except SDhaP. The comparison analysis showed that SDhaP had a much larger disagreement rate when it was compared with the other algorithms in all 6 datasets. 
CONCLUSION The comparative analysis is important because each algorithm is different. The findings of this study provide a deeper understanding of the performance of currently available HA algorithms and useful input for other users.
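The switch distance mentioned in this abstract can be made concrete with a short sketch. It assumes a simplified representation that is not taken from the paper: each haplotype is a string of `0`/`1` alleles over the same ordered heterozygous SNV sites of a diploid sample, and the function name is hypothetical. A switch is counted wherever the predicted phase flips relative to the known haplotype between neighbouring sites.

```python
def switch_distance(predicted: str, truth: str) -> int:
    """Count phase switches needed to make `predicted` match `truth`.

    Both arguments are strings of '0'/'1' alleles for one haplotype,
    listed over the same ordered heterozygous SNV sites.
    """
    if len(predicted) != len(truth):
        raise ValueError("haplotypes must cover the same SNV sites")
    # For each site, record whether the predicted allele agrees with truth.
    agrees = [p == t for p, t in zip(predicted, truth)]
    # A switch occurs wherever agreement flips between adjacent sites.
    return sum(1 for a, b in zip(agrees, agrees[1:]) if a != b)


print(switch_distance("000111", "000000"))  # one switch after the third site
print(switch_distance("111111", "000000"))  # fully complementary: no switch
print(switch_distance("010101", "000000"))  # phase flips at every boundary
```

A fully complementary prediction scores zero because it describes the same pair of haplotypes with the labels swapped, which is why the measure counts flips rather than mismatches.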
Affiliation(s)
- Shuying Sun
  - Department of Mathematics, Texas State University, San Marcos, TX USA
- Flora Cheng
  - Carnegie Mellon University, Pittsburgh, PA USA
- Daphne Han
  - Carnegie Mellon University, Pittsburgh, PA USA
- Sarah Wei
  - Massachusetts Institute of Technology, Cambridge, MA USA
10
Yu R, Cai D, Sun Y. AccuVIR: an ACCUrate VIRal genome assembly tool for third-generation sequencing data. Bioinformatics 2023; 39:6969105. [PMID: 36610711 PMCID: PMC9825286 DOI: 10.1093/bioinformatics/btac827] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/23/2022] [Revised: 11/24/2022] [Accepted: 12/24/2022] [Indexed: 12/27/2022]
Abstract
MOTIVATION RNA viruses tend to mutate constantly. While many of the variants are neutral, some can lead to higher transmissibility or virulence. Accurate assembly of complete viral genomes enables the identification of underlying variants, which are essential for studying virus evolution and elucidating the relationship between genotypes and virus properties. Recently, third-generation sequencing platforms such as Nanopore sequencers have been used for real-time virus sequencing for Ebola, Zika, coronavirus disease 2019, etc. However, their high per-base error rate prevents the accurate reconstruction of the viral genome. RESULTS In this work, we introduce a new tool, AccuVIR, for viral genome assembly and polishing using error-prone long reads. It can better distinguish sequencing errors from true variants based on the key observation that sequencing errors can disrupt the gene structures of viruses, which usually have a high density of coding regions. Our experimental results on both simulated and real third-generation sequencing data demonstrated its superior performance on generating more accurate viral genomes than generic assembly or polish tools. AVAILABILITY AND IMPLEMENTATION The source code and the documentation of AccuVIR are available at https://github.com/rainyrubyzhou/AccuVIR. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
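The key observation in this abstract, that sequencing errors tend to disrupt gene structure, can be illustrated with a toy check. This is not AccuVIR's algorithm; it is a minimal sketch with a hypothetical helper that tests whether a candidate coding sequence still forms one intact open reading frame. Insertions and deletions, the dominant long-read error type, usually shift the frame and fail the check, while many true substitutions pass it.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def has_intact_orf(seq: str) -> bool:
    """Return True if `seq` is a single complete open reading frame.

    The sequence must start with ATG, have a length divisible by three,
    end with a stop codon, and contain no earlier in-frame stop codon.
    """
    seq = seq.upper()
    if len(seq) < 6 or len(seq) % 3 != 0 or not seq.startswith("ATG"):
        return False
    codons = [seq[i:i + 3] for i in range(0, len(seq), 3)]
    if codons[-1] not in STOP_CODONS:
        return False
    return not any(codon in STOP_CODONS for codon in codons[:-1])


gene = "ATGGCTGAATAA"                 # ATG GCT GAA TAA
with_insertion = "ATGGCTTGAATAA"      # one extra base shifts the frame
with_substitution = "ATGGCTGACTAA"    # GAA -> GAC keeps the frame intact

print(has_intact_orf(gene))
print(has_intact_orf(with_insertion))
print(has_intact_orf(with_substitution))
```

A polishing tool could prefer candidate sequences that pass such a check in densely coding regions, which is the intuition the abstract describes.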
Affiliation(s)
- Runzhou Yu
  - Department of Electrical Engineering, City University of Hong Kong, Kowloon, Hong Kong SAR 000000, China
- Dehan Cai
  - Department of Electrical Engineering, City University of Hong Kong, Kowloon, Hong Kong SAR 000000, China
- Yanni Sun
  - To whom correspondence should be addressed.
11
Cai D, Shang J, Sun Y. HaploDMF: viral haplotype reconstruction from long reads via deep matrix factorization. Bioinformatics 2022; 38:5360-5367. [PMID: 36308467 PMCID: PMC9750122 DOI: 10.1093/bioinformatics/btac708] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/20/2022] [Revised: 10/06/2022] [Accepted: 10/25/2022] [Indexed: 12/24/2022]
Abstract
MOTIVATION Lacking strict proofreading mechanisms, many RNA viruses can generate progeny with slightly changed genomes. Being able to characterize highly similar genomes (i.e. haplotypes) in one virus population helps study the viruses' evolution and their interactions with the host/other microbes. High-throughput sequencing data has become the major source for characterizing viral populations. However, the inherent limitation on read length by next-generation sequencing makes complete haplotype reconstruction difficult. RESULTS In this work, we present a new tool named HaploDMF that can construct complete haplotypes using third-generation sequencing (TGS) data. HaploDMF utilizes a deep matrix factorization model with an adapted loss function to learn latent features from aligned reads automatically. The latent features are then used to cluster reads of the same haplotype. Unlike existing tools whose performance can be affected by the overlap size between reads, HaploDMF is able to achieve highly robust performance on data with different coverage, haplotype number and error rates. In particular, it can generate more complete haplotypes even when the sequencing coverage drops in the middle. We benchmark HaploDMF against the state-of-the-art tools on simulated and real sequencing TGS data on different viruses. The results show that HaploDMF competes favorably against all others. AVAILABILITY AND IMPLEMENTATION The source code and the documentation of HaploDMF are available at https://github.com/dhcai21/HaploDMF. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Dehan Cai
  - Department of Electrical Engineering, City University of Hong Kong, Kowloon, Hong Kong SAR, China
- Jiayu Shang
  - Department of Electrical Engineering, City University of Hong Kong, Kowloon, Hong Kong SAR, China
- Yanni Sun
  - To whom correspondence should be addressed.
12
Liu Y, Kearney J, Mahmoud M, Kille B, Sedlazeck FJ, Treangen TJ. Rescuing Low Frequency Variants within Intra-Host Viral Populations directly from Oxford Nanopore sequencing data. bioRxiv 2021:2021.09.03.458038. [PMID: 34518837 PMCID: PMC8437309 DOI: 10.1101/2021.09.03.458038] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Indexed: 11/25/2022]
Abstract
Infectious disease monitoring on Oxford Nanopore Technologies (ONT) platforms offers rapid turnaround times and low cost, exemplified by well over half a million ONT SARS-CoV-2 datasets. Tracking low frequency intra-host variants has provided important insights with respect to elucidating within host viral population dynamics and transmission. However, given the higher error rate of ONT, accurate identification of intra-host variants with low allele frequencies remains an open challenge with no viable solutions available. In response to this need, we present Variabel, a novel approach and first method designed for rescuing low frequency intra-host variants from ONT data alone. We evaluated Variabel on both within patient and across patient paired Illumina and ONT datasets; our results show that Variabel can accurately identify low frequency variants below 0.5 allele frequency, outperforming existing state-of-the-art ONT variant callers for this task. Variabel is open-source and available for download at: www.gitlab.com/treangenlab/variabel.
Affiliation(s)
- Yunxi Liu
  - Department of Computer Science, Rice University, 6100 Main Street, Houston, TX 77005, USA
- Joshua Kearney
  - Department of Computer Science, Rice University, 6100 Main Street, Houston, TX 77005, USA
- Medhat Mahmoud
  - Human Genome Sequencing Center, Baylor College of Medicine, Houston, TX 77030, USA
- Bryce Kille
  - Department of Computer Science, Rice University, 6100 Main Street, Houston, TX 77005, USA
- Fritz J Sedlazeck
  - Human Genome Sequencing Center, Baylor College of Medicine, Houston, TX 77030, USA
- Todd J Treangen
  - Department of Computer Science, Rice University, 6100 Main Street, Houston, TX 77005, USA