1
Quek ZBR, Ng SH. Hybrid-Capture Target Enrichment in Human Pathogens: Identification, Evolution, Biosurveillance, and Genomic Epidemiology. Pathogens 2024; 13:275. [PMID: 38668230 PMCID: PMC11054155 DOI: 10.3390/pathogens13040275] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 01/30/2024] [Revised: 03/11/2024] [Accepted: 03/18/2024] [Indexed: 04/29/2024] Open
Abstract
High-throughput sequencing (HTS) has revolutionised the field of pathogen genomics, enabling the direct recovery of pathogen genomes from clinical and environmental samples. However, pathogen nucleic acids are often overwhelmed by those of the host, requiring deep metagenomic sequencing to recover sufficient sequences for downstream analyses (e.g., identification and genome characterisation). To circumvent this, hybrid-capture target enrichment (HC) is able to enrich pathogen nucleic acids across multiple scales of divergences and taxa, depending on the panel used. In this review, we outline the applications of HC in human pathogens (bacteria, fungi, parasites and viruses), including identification, genomic epidemiology, antimicrobial resistance genotyping, and evolution. Importantly, we explored the applicability of HC to clinical metagenomics, which ultimately requires more work before it is a reliable and accurate tool for clinical diagnosis. Relatedly, the utility of HC was exemplified by COVID-19, which was used as a case study to illustrate the maturity of HC for recovering pathogen sequences. As we unravel the origins of COVID-19, zoonoses remain more relevant than ever. Therefore, the role of HC in biosurveillance studies, which is critical in preparing us for the next pandemic, is also highlighted in this review. We also found that while HC is a popular tool to study viruses, it remains underutilised in parasites and fungi and, to a lesser extent, bacteria. Finally, we evaluated the future of HC with respect to bait design in the eukaryotic groups and the prospect of combining HC with long-read HTS.
Affiliation(s)
- Z. B. Randolph Quek
- Defence Medical & Environmental Research Institute, DSO National Laboratories, Singapore 117510, Singapore
2
Li T, Unger ER, Rajeevan MS. Broad-Spectrum Detection of HPV in Male Genital Samples Using Target-Enriched Whole-Genome Sequencing. Viruses 2023; 15:1967. [PMID: 37766373 PMCID: PMC10538195 DOI: 10.3390/v15091967] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/03/2023] [Revised: 09/18/2023] [Accepted: 09/19/2023] [Indexed: 09/29/2023] Open
Abstract
Most human papillomavirus (HPV) surveillance studies target 30-50 of the more than 200 known types. We applied our recently described enriched whole-genome sequencing (eWGS) assay to demonstrate the impact of detecting all known and novel HPV types in male genital samples (n = 50). HPV was detected in nearly all (82%) samples (mean number of types per sample 13.6; range 1-85), and nearly all HPV-positive samples included types in multiple genera (88%). A total of 560 HPV detections (237 unique HPV types: 46 alpha, 55 beta, 135 gamma, and 1 mu types) were made. The most frequently detected HPV types were alpha (HPV90, 43, and 74), beta (HPV115, 195, and 120), and gamma (HPV134, mSD2, and HPV50). High-risk alpha types (HPV16, 18, 31, 39, 52, and 58) were not common. A novel gamma type was identified (now officially HPV229) along with 90 unclassified types. This pilot study demonstrates the utility of the eWGS assay for broad-spectrum type detection and suggests a significantly higher type diversity in males compared to females that warrants further study.
Affiliation(s)
- Mangalathu S. Rajeevan
- Division of High-Consequence Pathogens & Pathology, Centers for Disease Control and Prevention, 1600 Clifton Road, Atlanta, GA 30329, USA
3
Macedo R, Isidro J, Ferreira R, Pinto M, Borges V, Duarte S, Vieira L, Gomes JP. Molecular Capture of Mycobacterium tuberculosis Genomes Directly from Clinical Samples: A Potential Backup Approach for Epidemiological and Drug Susceptibility Inferences. Int J Mol Sci 2023; 24:2912. [PMID: 36769230 PMCID: PMC9918089 DOI: 10.3390/ijms24032912] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 12/29/2022] [Revised: 01/20/2023] [Accepted: 01/30/2023] [Indexed: 02/05/2023] Open
Abstract
The application of whole genome sequencing of Mycobacterium tuberculosis directly on clinical samples has been investigated as a means to avoid the time-consuming need for culture isolation, which can lead to prolonged suboptimal antibiotic treatment. We aimed to provide a proof-of-concept regarding the application of the molecular capture of M. tuberculosis genomes directly from positive sputum samples as an approach for epidemiological and drug susceptibility predictions. Smear-positive sputum samples (n = 100) were subjected to the SureSelectXT HS Target Enrichment protocol (Agilent Technologies, Santa Clara, CA, USA) and whole-genome sequencing analysis. A higher number of on-target reads was obtained for samples with higher smear grades (i.e., 3+ followed by 2+). Moreover, 37 out of 100 samples showed ≥90% of the reference genome covered with at least 10-fold depth of coverage (27, 9, and 1 samples were 3+, 2+, and 1+, respectively). Regarding drug-resistance/susceptibility prediction, for 42 samples, ≥90% of the >9000 hits that are surveyed by TB-profiler were detected. Our results demonstrated that M. tuberculosis genome capture and sequencing directly from clinical samples constitute a potentially valid backup approach for phylogenetic inferences and resistance prediction, especially in settings where culture is not routinely performed or for samples that fail to grow.
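The coverage criterion quoted in this abstract (≥90% of the reference genome covered at a depth of at least 10-fold) can be illustrated with a minimal Python sketch. It assumes per-position depths are already available as a list of integers; the function names and the toy depth values are hypothetical, and nothing here reproduces the authors' actual pipeline.

```python
def breadth_of_coverage(depths, min_depth=10):
    """Return the fraction of reference positions with depth >= min_depth."""
    if not depths:
        raise ValueError("depths must not be empty")
    covered = sum(1 for d in depths if d >= min_depth)
    return covered / len(depths)


def passes_threshold(depths, min_depth=10, min_breadth=0.90):
    """True if at least min_breadth of positions reach min_depth."""
    return breadth_of_coverage(depths, min_depth) >= min_breadth


# Toy per-position depths for a 10-base "genome" (illustrative values only).
toy_depths = [0, 5, 12, 30, 10, 45, 11, 25, 18, 10]

print(breadth_of_coverage(toy_depths))  # 8 of 10 positions reach depth 10
print(passes_threshold(toy_depths))     # 0.8 is below the 0.90 cut-off
```

In practice the depth list would come from an alignment-based depth report; both thresholds are parameters so that the same check can be reused with other cut-offs.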
Affiliation(s)
- Rita Macedo
- National Reference Laboratory for Mycobacteria, Department of Infectious Diseases, National Institute of Health (INSA), 1649-016 Lisbon, Portugal
- Joana Isidro
- Genomics and Bioinformatics Unit, Department of Infectious Diseases, National Institute of Health (INSA), 1649-016 Lisbon, Portugal
- Rita Ferreira
- Genomics and Bioinformatics Unit, Department of Infectious Diseases, National Institute of Health (INSA), 1649-016 Lisbon, Portugal
- Miguel Pinto
- Genomics and Bioinformatics Unit, Department of Infectious Diseases, National Institute of Health (INSA), 1649-016 Lisbon, Portugal
- Vítor Borges
- Genomics and Bioinformatics Unit, Department of Infectious Diseases, National Institute of Health (INSA), 1649-016 Lisbon, Portugal
- Sílvia Duarte
- Innovation and Technology Unit, National Institute of Health (INSA), 1649-016 Lisbon, Portugal
- Luís Vieira
- Innovation and Technology Unit, National Institute of Health (INSA), 1649-016 Lisbon, Portugal
- João Paulo Gomes
- Genomics and Bioinformatics Unit, Department of Infectious Diseases, National Institute of Health (INSA), 1649-016 Lisbon, Portugal
4
Nested PCR followed by NGS: Validation and application for HPV genotyping of Tunisian cervical samples. PLoS One 2021; 16:e0255914. [PMID: 34379683 PMCID: PMC8357094 DOI: 10.1371/journal.pone.0255914] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Received: 03/09/2021] [Accepted: 07/26/2021] [Indexed: 12/28/2022] Open
Abstract
The most widely used methodologies for HPV genotyping in Tunisian studies are based on hybridization, which is limited to a restricted number of HPV types and lacks specificity and sensitivity for some types. Recently, next-generation sequencing (NGS) technology has been efficiently used for HPV genotyping. In this work, we designed and validated a sensitive genotyping method based on nested PCR followed by NGS. Eighty-six samples were tested for the validation of the assay. These included 43 reference plasmids and 43 HPV-positive clinical cervical specimens previously evaluated with the conventional genotyping method, Reverse Line Hybridization (RLH). Results of genotyping using NGS were compared to those of RLH. The analytical sensitivity of the NGS assay was 1 GE/μl per sample. NGS allowed the detection of all HPV types present in the reference plasmids. On the clinical samples, a total of 19 HPV types were detected versus 14 types using RLH. Besides the identification of more HPV types in multiple infections (6 types for NGS versus 4 for RLH), NGS allowed the identification of HPV types that were not detected by RLH. In addition, the NGS assay detected HPV types that had not been described in Tunisia so far: HPV81, HPV43, HPV74, and HPV62. The high sensitivity and specificity of NGS for HPV genotyping, in addition to the identification of new HPV types, may justify the use of this technique to profile circulating types with high accuracy in epidemiological studies.
5
Human Papillomavirus Detection by Whole-Genome Next-Generation Sequencing: Importance of Validation and Quality Assurance Procedures. Viruses 2021; 13:1323. [PMID: 34372528 PMCID: PMC8310033 DOI: 10.3390/v13071323] [Citation(s) in RCA: 10] [Impact Index Per Article: 2.5] [Received: 05/06/2021] [Revised: 06/04/2021] [Accepted: 06/18/2021] [Indexed: 12/27/2022] Open
Abstract
Next-generation sequencing (NGS) yields powerful opportunities for studying human papillomavirus (HPV) genomics for applications in epidemiology, public health, and clinical diagnostics. HPV genotypes, variants, and point mutations can be investigated in clinical materials and described in unprecedented detail. However, both the NGS laboratory analysis and the bioinformatical approach require numerous steps and checks to ensure robust interpretation of results. Here, we provide a step-by-step review of recommendations for validation and quality assurance procedures of each step in the typical NGS workflow, with a focus on whole-genome sequencing approaches. The use of directed pilots and protocols to ensure optimization of sequencing data yield, followed by curated bioinformatical procedures, is particularly emphasized. Finally, the storage and sharing of data sets are discussed. The development of international standards for quality assurance should be a goal for the HPV NGS community, similar to what has been developed for other areas of sequencing efforts including microbiology and molecular pathology. We thus propose that it is time for NGS to be included in the global efforts on quality assurance and improvement of HPV-based testing and diagnostics.
6
Kim SY, Hwang KA, Ann JH, Kim JH, Nam JH. Next-generation sequencing for typing human papillomaviruses and predicting multi-infections and their clinical symptoms. Microbiol Immunol 2021; 65:273-278. [PMID: 34133044 DOI: 10.1111/1348-0421.12927] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 02/21/2021] [Revised: 05/29/2021] [Accepted: 06/12/2021] [Indexed: 11/28/2022]
Abstract
Human papillomavirus (HPV) has more than 100 different types, some of which are associated with cancer. The most common example is that of cervical cancer, which is associated with HPV16 and HPV18. Here, we performed next-generation sequencing (NGS) to type 2436 samples obtained from Korean women to elucidate the correlation between multiple infections, virus types, and cytology. NGS revealed that types 58, 56, and 16 were the most common among high-risk (HR) types, whereas types 90, 54, and 81 were the most common among low-risk (LR) types. The incidence of atypical squamous cells of undetermined significance (ASCUS) or high-grade squamous intraepithelial lesion (HSIL) was 11.45% in single-type cases and 27.17% in multiple infections by the two types of HPV. The incidence of ASCUS or HSIL was 29.79% in multiple infections with only HR types and 29.81% in multiple infections with mixed HR and LR types, whereas it was 18.79% in multiple infections with only LR types (P ≤ 0.0001). Co-infection by LR-HPV and HR-HPV is therefore more likely to cause cell lesions. Collectively, these results show that the higher the incidence of multiple infections, the greater the frequency of cell lesions. Thus, to predict the clinical symptoms, it would be beneficial to confirm the HPV type and multiple infections using NGS, although this could be relatively expensive.
Affiliation(s)
- Sang-Yeon Kim
- Department of Medical and Biological Sciences and Department of Biotechnology, The Catholic University of Korea, Bucheon, Korea; Department of Quality Assurance, SML Genetree, Seoul, Korea
- Kyung-A Hwang
- Department of Quality Assurance, SML Genetree, Seoul, Korea
- Ji-Hoon Ann
- Department of Quality Assurance, SML Genetree, Seoul, Korea
- Ji-Hye Kim
- Department of Medical Nutrition, Graduate School of East-West Medical Science, The Kyung Hee University of Korea, Yongin, Korea
- Jae-Hwan Nam
- Department of Medical and Biological Sciences and Department of Biotechnology, The Catholic University of Korea, Bucheon, Korea
7
Acquaviva G, Visani M, Sanza V, De Leo A, Maloberti T, Pierotti P, Crucitti P, Collina G, Chiarelli Olivari C, Pession A, Tallini G, de Biase D. Different Methods in HPV Genotyping of Anogenital and Oropharyngeal Lesions: Comparison between VisionArray® Technology, Next Generation Sequencing, and Hybrid Capture Assay. Journal of Molecular Pathology 2021; 2:29-41. [DOI: 10.3390/jmp2010004] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 12/17/2024] Open
Abstract
(1) Background: Human papillomaviruses (HPVs) are known to be related to the development of about 5% of all human cancers. The clinical relevance of HPV infection has been deeply investigated in carcinomas of the oropharyngeal area, uterine cervix, and anogenital area. To date, several different methods have been used for detecting HPV infection. The aim of the present study was to compare three different methods for detecting the HPV genome. (2) Methods: A total of 50 samples were analyzed. Twenty-five of them were tested using both next generation sequencing (NGS) and VisionArray® technology; the other 25 were tested using the Hybrid Capture (HC) II assay and VisionArray® technology. (3) Results: A substantial agreement was obtained between NGS and VisionArray® (κ = 0.802), as well as between HC II and VisionArray® (κ = 0.606). In both analyses, the concordance increased if only high-risk HPVs (HR-HPVs) were considered as “positive”. (4) Conclusions: Our data highlighted the importance of technical choice in HPV characterization, which should be guided by the clinical aims, costs, starting material, and turnaround time for results.
Affiliation(s)
- Giorgia Acquaviva
- Molecular Diagnostic Unit, Department of Medicine (Dipartimento di Medicina Specialistica, Diagnostica e Sperimentale), University of Bologna, Azienda USL di Bologna, viale Ercolani 4/2, 40138 Bologna, Italy
- Michela Visani
- Molecular Diagnostic Unit, Department of Medicine (Dipartimento di Medicina Specialistica, Diagnostica e Sperimentale), University of Bologna, Azienda USL di Bologna, viale Ercolani 4/2, 40138 Bologna, Italy
- Viviana Sanza
- Molecular Diagnostic Unit, Department of Medicine (Dipartimento di Medicina Specialistica, Diagnostica e Sperimentale), University of Bologna, Azienda USL di Bologna, viale Ercolani 4/2, 40138 Bologna, Italy
- Antonio De Leo
- Molecular Diagnostic Unit, Department of Medicine (Dipartimento di Medicina Specialistica, Diagnostica e Sperimentale), University of Bologna, Azienda USL di Bologna, viale Ercolani 4/2, 40138 Bologna, Italy
- Thais Maloberti
- Molecular Diagnostic Unit, Department of Medicine (Dipartimento di Medicina Specialistica, Diagnostica e Sperimentale), University of Bologna, Azienda USL di Bologna, viale Ercolani 4/2, 40138 Bologna, Italy
- Paola Pierotti
- Anatomic Pathology Unit, Azienda USL-Maggiore Hospital, 40133 Bologna, Italy
- Paola Crucitti
- Anatomic Pathology Unit, Azienda USL-Maggiore Hospital, 40133 Bologna, Italy
- Guido Collina
- Anatomical Pathology Unit, ASUR Marche, Area Vasta 5, Ospedale “C e G Mazzoni” Ascoli Piceno, 63100 Ascoli Piceno, Italy
- Cecilia Chiarelli Olivari
- Molecular Diagnostic Unit, Department of Pharmacy and Biotechnology, University of Bologna, viale Ercolani 4/2, 40138 Bologna, Italy
- Annalisa Pession
- Molecular Diagnostic Unit, Department of Pharmacy and Biotechnology, University of Bologna, viale Ercolani 4/2, 40138 Bologna, Italy
- Giovanni Tallini
- Molecular Diagnostic Unit, Department of Medicine (Dipartimento di Medicina Specialistica, Diagnostica e Sperimentale), University of Bologna, Azienda USL di Bologna, viale Ercolani 4/2, 40138 Bologna, Italy
- Dario de Biase
- Molecular Diagnostic Unit, Department of Pharmacy and Biotechnology, University of Bologna, viale Ercolani 4/2, 40138 Bologna, Italy
8
Characterization and Diversity of 243 Complete Human Papillomavirus Genomes in Cervical Swabs Using Next Generation Sequencing. Viruses 2020; 12:1437. [PMID: 33327447 PMCID: PMC7764970 DOI: 10.3390/v12121437] [Citation(s) in RCA: 10] [Impact Index Per Article: 2.0] [Received: 10/29/2020] [Revised: 12/04/2020] [Accepted: 12/10/2020] [Indexed: 12/21/2022] Open
Abstract
In recent years, next generation sequencing (NGS) technology has been widely used for the discovery of novel human papillomavirus (HPV) genotypes, variant characterization and genotyping. Here, we compared the analytical performance of NGS with a commercial PCR-based assay (Anyplex II HPV28) in cervical samples of 744 women. Overall, HPV positivity was 50.2% by the Anyplex and 45.5% by the NGS. With the NGS, we detected 25 genotypes covered by Anyplex and 41 additional genotypes. Agreement between the two methods for HPV positivity was 80.8% (kappa = 0.616) and 84.8% (kappa = 0.652) for 28 HPV genotypes and 14 high-risk genotypes, respectively. We recovered and characterized 243 complete HPV genomes from 153 samples spanning 40 different genotypes. According to phylogenetic analysis and pairwise distance, we identified novel lineages and sublineages of four high-risk and 16 low-risk genotypes. In total, 17 novel lineages and 14 novel sublineages were proposed, including novel lineages of HPV45, HPV52, HPV66 and a novel sublineage of HPV59. Our study provides important genomic insights on HPV types and lineages, where few complete genomes were publicly available.
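Agreement statistics like the percent agreement and kappa values reported in this abstract are conventionally computed from a 2×2 table of paired positive/negative calls. The following Python sketch shows the standard Cohen's kappa calculation; the function name and the toy counts are hypothetical and are not the study's data.

```python
def cohens_kappa(both_pos, only_a, only_b, both_neg):
    """Return (observed agreement, Cohen's kappa) for two binary assays.

    both_pos: samples positive by both assays
    only_a:   positive by assay A only
    only_b:   positive by assay B only
    both_neg: negative by both assays
    """
    n = both_pos + only_a + only_b + both_neg
    if n == 0:
        raise ValueError("the table must contain at least one sample")
    observed = (both_pos + both_neg) / n
    a_pos = (both_pos + only_a) / n  # positivity rate of assay A
    b_pos = (both_pos + only_b) / n  # positivity rate of assay B
    # Agreement expected by chance if the two assays were independent.
    expected = a_pos * b_pos + (1 - a_pos) * (1 - b_pos)
    if expected == 1:
        raise ValueError("kappa is undefined when chance agreement is 1")
    return observed, (observed - expected) / (1 - expected)


# Toy counts (illustrative only): 100 paired results.
agreement, kappa = cohens_kappa(both_pos=40, only_a=10, only_b=5, both_neg=45)
print(round(agreement, 3), round(kappa, 3))
```

With these toy counts the observed agreement is 0.85 and the chance-expected agreement is 0.50, giving a kappa of 0.70. Kappa is lower than raw agreement because it discounts the matches that two independent assays would produce by chance.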
9
Yan Y, Zhang H, Jiang C, Ma X, Zhou X, Tian X, Song Y, Chen X, Yu L, Li R, Chen H, Wang X, Liu T, He Z, Li H. Human Papillomavirus Prevalence and Integration Status in Tissue Samples of Bladder Cancer in the Chinese Population. J Infect Dis 2020; 224:114-122. [PMID: 33205207 DOI: 10.1093/infdis/jiaa710] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.0] [Received: 07/04/2020] [Accepted: 11/11/2020] [Indexed: 12/14/2022] Open
Abstract
BACKGROUND Human papillomavirus (HPV) infection is associated with multiple types of cancer, but the evidence has not yet been fully elucidated in bladder cancer. METHODS Frozen tissue samples collected from 146 patients aged 32 to 89 years with bladder cancer pathological diagnosis between 2015 and 2019 were analyzed. HPV genotyping and integration status determination were performed by capture-based next generation sequencing. Statistical analysis of HPV type distributions was performed according to stage, grade, sex, and age group of patients. RESULTS Mean (SD) age of the 146 patients was 66.64 ± 10.06 years and 83.56% were men. Overall HPV infection rate was 28.77% (37.50% in women and 27.05% in men), with 11.90% HPV integration events. Among them, 17.12% single and 11.65% coinfections were observed. HPV18 (24.66%) was the most prevalent genotype, followed by HPV33, 16, and 39. All HPV were European lineage (A). HPV16 was more prevalent in women (P = .04). CONCLUSIONS HPV infection may contribute to the etiology both in men and women with bladder cancer. HPV18, followed by HPV33, 16, and 39 genotypes, potentially represent the predominant oncogenic risk types for bladder carcinogenesis.
Affiliation(s)
- Yongji Yan
- Department of Urology, Dongzhimen Hospital, Beijing University of Chinese Medicine, Beijing, China
- Hongfeng Zhang
- Department of Pathology, Central Hospital of Wuhan, Tongji Medical College, Huazhong University of Science and Technology, Wuhan, Hubei, China
- Chunfan Jiang
- Department of Pathology, Xiangyang Central Hospital, Hubei University of Arts and Science, Xiangyang, Hubei, China
- Xin Ma
- Department of Urology, General Hospital of the People's Liberation Army, Beijing, China
- Xueying Zhou
- Academician expert workstation of Obstetrics and Gynecology, Central Hospital of Wuhan, Tongji Medical College, Huazhong University of Science and Technology, Wuhan, Hubei, China
- Xun Tian
- Academician expert workstation of Obstetrics and Gynecology, Central Hospital of Wuhan, Tongji Medical College, Huazhong University of Science and Technology, Wuhan, Hubei, China; Department of Obstetrics and Gynecology, Tongji Hospital, Tongji Medical College, Huazhong University of Science and Technology, Wuhan, Hubei, China
- Xu Chen
- Department of Urology, First Affiliated Hospital of Sun Yat-sen University, Guangzhou, Guangdong, China
- Liyao Yu
- Academician expert workstation of Obstetrics and Gynecology, Central Hospital of Wuhan, Tongji Medical College, Huazhong University of Science and Technology, Wuhan, Hubei, China
- Rui Li
- Academician expert workstation of Obstetrics and Gynecology, Central Hospital of Wuhan, Tongji Medical College, Huazhong University of Science and Technology, Wuhan, Hubei, China
- Hongwei Chen
- Academician expert workstation of Obstetrics and Gynecology, Central Hospital of Wuhan, Tongji Medical College, Huazhong University of Science and Technology, Wuhan, Hubei, China
- Xin Wang
- Department of Obstetrics and Gynecology, Tongji Hospital, Tongji Medical College, Huazhong University of Science and Technology, Wuhan, Hubei, China
- Ting Liu
- Academician expert workstation of Obstetrics and Gynecology, Central Hospital of Wuhan, Tongji Medical College, Huazhong University of Science and Technology, Wuhan, Hubei, China
- Zhaohui He
- Department of Urology, Eighth Affiliated Hospital, Sun Yat-sen University, Shenzhen, Guangdong, China
- Hongzhao Li
- Department of Urology, General Hospital of the People's Liberation Army, Beijing, China
10
Bioinformatics Pipeline for Human Papillomavirus Short Read Genomic Sequences Classification Using Support Vector Machine. Viruses 2020; 12:710. [PMID: 32629900 PMCID: PMC7412107 DOI: 10.3390/v12070710] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.4] [Received: 06/03/2020] [Revised: 06/26/2020] [Accepted: 06/27/2020] [Indexed: 12/24/2022] Open
Abstract
We recently developed a test based on the Agilent SureSelect target enrichment system that captures genomic fragments from 191 human papillomavirus (HPV) types for Illumina sequencing. This enriched whole genome sequencing (eWGS) assay provides an approach to identify all HPV types in a sample. Here we present a machine learning algorithm that calls HPV types based on the eWGS output. The algorithm, based on the support vector machine (SVM) technique, was trained on eWGS data from 122 control samples with known HPV types. The new algorithm demonstrated good performance in HPV type detection for designed samples with 25 or more HPV plasmid copies per sample. We compared the HPV typing results of the new algorithm for 261 residual epidemiologic samples with the typing results of the standard HPV Linear Array (LA). The agreement between methods (97.4%) was substantial (kappa = 0.783). However, the new algorithm additionally identified 428 instances of HPV types that the LA assay, by design, cannot detect. Overall, we have demonstrated that the bioinformatics pipeline is an accurate tool for calling HPV types by analyzing data generated by eWGS processing of DNA fragments extracted from control and epidemiological samples.