1
Tozetto-Mendoza TR, da-Costa AC, Moron AF, Leal É, Lima SH, Ferreira NE, Honorato L, Paião HGO, Freire WS, Mendes-Correa MC, Witkin SS. Characterization of Torquetenovirus in amniotic fluid at the time of in utero fetal surgery: correlation with early premature delivery and respiratory distress. Front Med (Lausanne) 2023; 10:1161091. [PMID: 37547599 PMCID: PMC10400322 DOI: 10.3389/fmed.2023.1161091] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/07/2023] [Accepted: 06/26/2023] [Indexed: 08/08/2023]
Abstract
Torquetenovirus (TTV) is a commensal virus present in many healthy individuals. Although considered to be non-pathogenic, its presence and titer have been shown to be indicative of altered immune status in individuals with chronic infections or following allogeneic transplantations. We evaluated if TTV was present in amniotic fluid (AF) at the time of in utero surgery to correct a fetal neurological defect, and whether its detection was predictive of adverse post-surgical parameters. AF was collected from 27 women by needle aspiration prior to a uterine incision. TTV titer in the AF was measured by isolation of viral DNA followed by gene amplification and analysis. The TTV genomes were further characterized and sequenced by metagenomics. Pregnancy outcome parameters were subsequently obtained by chart review. Three of the AFs (11.1%) were positive for TTV at 3.36, 4.16, and 4.19 log10 copies/mL. Analysis of their genomes revealed DNA sequences similar to previously identified TTV isolates. Mean gestational age at delivery was >2 weeks earlier (32.5 vs. 34.6 weeks) and the prevalence of respiratory distress was greater (100% vs. 20.8%) in the TTV-positive pregnancies. TTV detection in AF prior to intrauterine surgery may indicate elevated post-surgical risk for earlier delivery and newborn respiratory distress.
Affiliation(s)
- Tania Regina Tozetto-Mendoza
- Laboratório de Investigação Médica em Virologia (LIM 52), Faculdade de Medicina da Universidade de São Paulo—Instituto de Medicina Tropical de São Paulo, São Paulo, Brazil
- A. Charlys da-Costa
- Laboratório de Investigação Médica em Virologia (LIM 52), Faculdade de Medicina da Universidade de São Paulo—Instituto de Medicina Tropical de São Paulo, São Paulo, Brazil
- Antonio F. Moron
- Department of Obstetrics, Universidade Federal de São Paulo, São Paulo, Brazil
- Hospital e Maternidade Santa Joana, São Paulo, Brazil
- Élcio Leal
- Laboratório de Diversidade Viral, Instituto de Ciências Biológicas, Universidade Federal do Pará, Belém, Pará, Brazil
- Silvia Helena Lima
- Laboratório de Investigação Médica em Virologia (LIM 52), Faculdade de Medicina da Universidade de São Paulo—Instituto de Medicina Tropical de São Paulo, São Paulo, Brazil
- Noely Evangelista Ferreira
- Laboratório de Investigação Médica em Virologia (LIM 52), Faculdade de Medicina da Universidade de São Paulo—Instituto de Medicina Tropical de São Paulo, São Paulo, Brazil
- Layla Honorato
- Laboratório de Investigação Médica em Virologia (LIM 52), Faculdade de Medicina da Universidade de São Paulo—Instituto de Medicina Tropical de São Paulo, São Paulo, Brazil
- Heuder Gustavo Oliveira Paião
- Laboratório de Investigação Médica em Virologia (LIM 52), Faculdade de Medicina da Universidade de São Paulo—Instituto de Medicina Tropical de São Paulo, São Paulo, Brazil
- Wilton Santos Freire
- Laboratório de Investigação Médica em Virologia (LIM 52), Faculdade de Medicina da Universidade de São Paulo—Instituto de Medicina Tropical de São Paulo, São Paulo, Brazil
- Maria Cássia Mendes-Correa
- Laboratório de Investigação Médica em Virologia (LIM 52), Faculdade de Medicina da Universidade de São Paulo—Instituto de Medicina Tropical de São Paulo, São Paulo, Brazil
- Departamento de Moléstias Infecciosas e Parasitárias, Universidade de São Paulo, Faculdade de Medicina, São Paulo, Brazil
- Steven S. Witkin
- Laboratório de Investigação Médica em Virologia (LIM 52), Faculdade de Medicina da Universidade de São Paulo—Instituto de Medicina Tropical de São Paulo, São Paulo, Brazil
- Departamento de Moléstias Infecciosas e Parasitárias, Universidade de São Paulo, Faculdade de Medicina, São Paulo, Brazil
2
Deif MA, Solyman AAA, Kamarposhti MA, Band SS, Hammam RE. A deep bidirectional recurrent neural network for identification of SARS-CoV-2 from viral genome sequences. MATHEMATICAL BIOSCIENCES AND ENGINEERING : MBE 2021; 18:8933-8950. [PMID: 34814329 DOI: 10.3934/mbe.2021440] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.7] [Indexed: 06/13/2023]
Abstract
In this work, Deep Bidirectional Recurrent Neural Network (BRNN) models were implemented based on both Long Short-Term Memory (LSTM) and Gated Recurrent Unit (GRU) cells in order to distinguish between genome sequences of SARS-CoV-2 and those of other coronavirus strains such as SARS-CoV and MERS-CoV, common cold viruses and other Acute Respiratory Infection (ARI) viruses. An investigation of the hyper-parameters, including the optimizer type and the number of unit cells, was also performed to attain the best performance of the BRNN models. Results showed that the GRU BRNN model was able to discriminate between SARS-CoV-2 and other classes of viruses with a higher overall classification accuracy (96.8%) than the LSTM BRNN model (95.8%). The hyper-parameters producing the highest performance for both models were the SGD optimizer and 80 unit cells. This study proved that the proposed GRU BRNN model has a better classification ability for SARS-CoV-2, thus providing an efficient tool to help in containing the disease and achieving better clinical decisions with high precision.
Affiliation(s)
- Mohanad A Deif
- Department of Bioelectronics, Modern University of Technology and Information (MTI) University, Cairo 11571, Egypt
- Ahmed A A Solyman
- Department of Electrical and Electronics Engineering, Istanbul Gelisim University, Avcılar 34310, Turkey
- Shahab S Band
- Future Technology Research Center, College of Future, National Yunlin University of Science and Technology, 123 University Road, Yunlin 64002, Taiwan
- Rania E Hammam
- Department of Bioelectronics, Modern University of Technology and Information (MTI) University, Cairo 11571, Egypt
3
Dasari CM, Bhukya R. Explainable deep neural networks for novel viral genome prediction. APPL INTELL 2021; 52:3002-3017. [PMID: 34764607 PMCID: PMC8232563 DOI: 10.1007/s10489-021-02572-3] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.7] [Accepted: 05/26/2021] [Indexed: 11/27/2022]
Abstract
Viral infection causes a wide variety of human diseases including cancer and COVID-19. Viruses invade host cells and associate with host molecules, potentially disrupting the normal function of hosts in ways that lead to fatal diseases. Novel viral genome prediction is crucial for understanding complex viral diseases like AIDS and Ebola. While most existing computational techniques classify viral genomes, the efficiency of the classification depends solely on the structural features extracted. The state-of-the-art DNN models achieved excellent performance by automatic extraction of classification features, but the degree of model explainability is relatively poor. During model training for viral prediction, the proposed CNN- and CNN-LSTM-based methods (EdeepVPP, EdeepVPP-hybrid) automatically extract features. EdeepVPP also performs model interpretation in order to extract, through learned filters, the most important patterns that characterize viral genomes. It is an interpretable CNN model that extracts vital biologically relevant patterns (features) from feature maps of viral sequences. The EdeepVPP-hybrid predictor outperforms all the existing methods by achieving 0.992 mean AUC-ROC and 0.990 AUC-PR on 19 human metagenomic contig experiment datasets using 10-fold cross-validation. We evaluate the ability of CNN filters to detect patterns across high average activation values. To further assess the robustness of the EdeepVPP model, we perform leave-one-experiment-out cross-validation. It can work as a recommendation system to further analyze the raw sequences labeled as ‘unknown’ by alignment-based methods. We show that our interpretable model can extract patterns that are considered to be the most important features for predicting virus sequences through learned filters.
Affiliation(s)
- Raju Bhukya
- National Institute of Technology, Warangal, Telangana 506004 India
4
Lopez-Rincon A, Tonda A, Mendoza-Maldonado L, Mulders DGJC, Molenkamp R, Perez-Romero CA, Claassen E, Garssen J, Kraneveld AD. Classification and specific primer design for accurate detection of SARS-CoV-2 using deep learning. Sci Rep 2021; 11:947. [PMID: 33441822 PMCID: PMC7806918 DOI: 10.1038/s41598-020-80363-5] [Citation(s) in RCA: 37] [Impact Index Per Article: 12.3] [Received: 10/05/2020] [Accepted: 12/21/2020] [Indexed: 02/07/2023]
Abstract
In this paper, deep learning is coupled with explainable artificial intelligence techniques for the discovery of representative genomic sequences in SARS-CoV-2. A convolutional neural network classifier is first trained on 553 sequences from the National Genomics Data Center repository, separating the genome of different virus strains from the Coronavirus family with 98.73% accuracy. The network's behavior is then analyzed, to discover sequences used by the model to identify SARS-CoV-2, ultimately uncovering sequences exclusive to it. The discovered sequences are validated on samples from the National Center for Biotechnology Information and Global Initiative on Sharing All Influenza Data repositories, and are proven to be able to separate SARS-CoV-2 from different virus strains with near-perfect accuracy. Next, one of the sequences is selected to generate a primer set, and tested against other state-of-the-art primer sets, obtaining competitive results. Finally, the primer is synthesized and tested on patient samples (n = 6 previously tested positive), delivering a sensitivity similar to routine diagnostic methods, and 100% specificity. The proposed methodology has a substantial added value over existing methods, as it is able to both automatically identify promising primer sets for a virus from a limited amount of data, and deliver effective results in a minimal amount of time. Considering the possibility of future pandemics, these characteristics are invaluable to promptly create specific detection methods for diagnostics.
Affiliation(s)
- Alejandro Lopez-Rincon
- Division of Pharmacology, Utrecht Institute for Pharmaceutical Sciences, Faculty of Science, Utrecht University, Universiteitsweg 99, 3584 CG, Utrecht, The Netherlands.
- Alberto Tonda
- UMR 518 MIA-Paris, INRAE, c/o 113 rue Nationale, 75103, Paris, France
- Lucero Mendoza-Maldonado
- Hospital Civil de Guadalajara "Dr. Juan I. Menchaca", Salvador Quevedo y Zubieta 750, Independencia Oriente, C.P. 44340, Guadalajara, Jalisco, México
- Richard Molenkamp
- Department of Viroscience, Erasmus Medical Center, Rotterdam, The Netherlands
- Carmina A Perez-Romero
- Departamento de Investigación, Universidad Central de Queretaro (UNICEQ), Av. 5 de Febrero 1602, San Pablo, 76130, Santiago de Querétaro, QRO, Mexico
- Eric Claassen
- Athena Institute, Vrije Universiteit, De Boelelaan 1085, 1081 HV, Amsterdam, The Netherlands
- Johan Garssen
- Division of Pharmacology, Utrecht Institute for Pharmaceutical Sciences, Faculty of Science, Utrecht University, Universiteitsweg 99, 3584 CG, Utrecht, The Netherlands
- Department Immunology, Danone Nutricia research, Uppsalalaan 12, 3584 CT, Utrecht, The Netherlands
- Aletta D Kraneveld
- Division of Pharmacology, Utrecht Institute for Pharmaceutical Sciences, Faculty of Science, Utrecht University, Universiteitsweg 99, 3584 CG, Utrecht, The Netherlands
5
Constant companion: clinical and developmental aspects of torque teno virus infections. Arch Virol 2020; 165:2749-2757. [PMID: 33040309 DOI: 10.1007/s00705-020-04841-x] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.5] [Received: 06/24/2020] [Accepted: 09/04/2020] [Indexed: 12/18/2022]
Abstract
Torque teno virus (TTV) is a commensal human virus observed as a circular single-negative-strand DNA molecule in various tissues and biological samples, notably in blood serum and lymphocytes. TTV has no apparent clinical significance, although it might be very useful as a prospective tool for gene delivery or as an epidemiological marker. Human populations are ubiquitously infected with TTV; the prevalence may reach 100%. The majority of babies become spontaneously infected with TTV, so that by the end of the first year of life, the prevalence reaches 'adult' values. TTV positivity in healthy early infancy and the presence of TTV in umbilical cord blood samples have been reported. The mechanism of infection and the dynamics of TTV prevalence in infants with age remain understudied. Meanwhile, the potential diagnostic and prognostic value of TTV as a marker deserves special attention and study, along with the possibility, causes and consequences of placental transmission of TTV under normal or pathological conditions.
6
Shah AA, Wang D, Hirsch E. Nucleic Acid-Based Screening of Maternal Serum to Detect Viruses in Women with Labor or PROM. Reprod Sci 2020; 27:537-544. [PMID: 31925769 DOI: 10.1007/s43032-019-00051-2] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Received: 02/05/2019] [Accepted: 06/12/2019] [Indexed: 01/12/2023]
Abstract
The purpose of this study was to determine whether timing of the initiating event of spontaneous labor (either uterine contractions with intact fetal membranes or rupture of membranes prior to labor (PROM)) is associated with maternal viral infection. It was a prospective case-control study of women with either spontaneous labor or PROM occurring < 37 weeks' gestation ("cases") or at term ("controls"). An initial unbiased screen for viruses was performed with next-generation sequencing (NGS) in serum pooled from eight cases delivered by C/S and representing a range of gestational ages, membrane rupture status, and presence or absence of chorioamnionitis. Custom PCR was used to query individual patient samples from the original cohort. The NGS screen generated 15 million reads. Seven unique viral sequences were detected in two cases, all identified as torque teno virus (TTV), a ubiquitous DNA anellovirus of no known pathogenicity. Using nested and semi-nested PCR, sera from 72 patients (47 cases and 25 matched controls, stratified by ROM status) were screened for the 3 subtypes of anelloviruses (TTV, TTMDV, or TTMV). These were found in 43/47 cases (91%) and 16/25 controls (64%) (p = 0.012, OR = 5.9 (95% CI = 1.4-29.9)). In logistic regression, pregnant women with at least one type of anellovirus were more likely to experience preterm labor than those with no anellovirus (p = 0.03, aOR = 4.6, CI = 1.2-18.7). Among women experiencing a spontaneous initiating event of labor, TTV was more likely to be present in the serum of preterm than term patients. TTV may have a role in determining the timing of parturition.
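The anellovirus comparison reported above (43/47 cases vs. 16/25 controls) can be checked with a small calculation. The sketch below is illustrative only: the function name is hypothetical, and it computes the simple cross-product odds ratio from the 2x2 counts, which comes out at about 6.0. That is close to, but not identical to, the reported OR of 5.9; presumably the authors used a different estimator, which the abstract does not specify.

```python
def crude_odds_ratio(pos_cases, n_cases, pos_controls, n_controls):
    """Cross-product odds ratio from a 2x2 table (hypothetical helper)."""
    a = pos_cases                    # cases with at least one anellovirus
    b = n_cases - pos_cases          # cases with none
    c = pos_controls                 # controls with at least one anellovirus
    d = n_controls - pos_controls    # controls with none
    return (a * d) / (b * c)


# Counts taken from the abstract: 43/47 cases, 16/25 controls.
cases_rate = 43 / 47
controls_rate = 16 / 25
or_estimate = crude_odds_ratio(43, 47, 16, 25)

print(f"positivity: {cases_rate:.0%} of cases vs {controls_rate:.0%} of controls")
print(f"crude odds ratio: {or_estimate:.2f}")
```

The adjusted odds ratio (aOR = 4.6) comes from a logistic regression with covariates that are not listed in the abstract, so it cannot be reproduced from these counts alone.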
Affiliation(s)
- Ankit A Shah
- Department of Obstetrics and Gynecology, NorthShore University Health System, 2650 Ridge Ave, Evanston, IL, USA
- Department of Obstetrics and Gynecology, Pritzker School of Medicine, University of Chicago, 5801 S Ellis Ave, Chicago, IL, 60637, USA
- David Wang
- Departments of Molecular Microbiology and Pathology & Immunology, Washington University School of Medicine, 660 S Euclid Ave, St. Louis, MO, 63110, USA
- Emmet Hirsch
- Department of Obstetrics and Gynecology, NorthShore University Health System, 2650 Ridge Ave, Evanston, IL, USA
- Department of Obstetrics and Gynecology, Pritzker School of Medicine, University of Chicago, 5801 S Ellis Ave, Chicago, IL, 60637, USA
7
Tampuu A, Bzhalava Z, Dillner J, Vicente R. ViraMiner: Deep learning on raw DNA sequences for identifying viral genomes in human samples. PLoS One 2019; 14:e0222271. [PMID: 31509583 PMCID: PMC6738585 DOI: 10.1371/journal.pone.0222271] [Citation(s) in RCA: 49] [Impact Index Per Article: 9.8] [Received: 06/07/2019] [Accepted: 08/22/2019] [Indexed: 11/23/2022]
Abstract
Despite its clinical importance, detection of highly divergent or yet unknown viruses is a major challenge. When human samples are sequenced, conventional alignments classify many assembled contigs as "unknown" since many of the sequences are not similar to known genomes. In this work, we developed ViraMiner, a deep learning-based method to identify viruses in various human biospecimens. ViraMiner contains two branches of Convolutional Neural Networks designed to detect both patterns and pattern-frequencies on raw metagenomics contigs. The training dataset included sequences obtained from 19 metagenomic experiments which were analyzed and labeled by BLAST. The model achieves significantly improved accuracy compared to other machine learning methods for viral genome classification. Using 300 bp contigs ViraMiner achieves 0.923 area under the ROC curve. To our knowledge, this is the first machine learning methodology that can detect the presence of viral sequences among raw metagenomic contigs from diverse human samples. We suggest that the proposed model captures different types of information of genome composition, and can be used as a recommendation system to further investigate sequences labeled as "unknown" by conventional alignment methods. Exploring these highly-divergent viruses, in turn, can enhance our knowledge of infectious causes of diseases.
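Convolutional networks that operate on raw DNA need a numeric representation of each contig. A common choice is one-hot encoding, sketched below. This is an assumption for illustration: the abstract does not state which encoding is used, and the function name and the handling of ambiguous bases are hypothetical.

```python
BASES = "ACGT"  # column order of the encoding (an arbitrary but fixed choice)


def one_hot(seq):
    """Encode a DNA string as a list of 4-element rows, one row per base.

    Bases outside A/C/G/T (for example N) become an all-zero row.
    """
    return [[1 if base == b else 0 for b in BASES] for base in seq.upper()]


# A 300 bp contig would give a 300 x 4 matrix; a short example is shown here.
encoded = one_hot("ACGTN")
for row in encoded:
    print(row)
```

A matrix of this shape can be fed to 1D convolutional filters, each of which then scans the sequence for a short motif.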
Affiliation(s)
- Ardi Tampuu
- Computational Neuroscience Lab, Institute of Computer Science, University of Tartu, Tartu, Estonia
- Zurab Bzhalava
- Department of Laboratory Medicine, Karolinska Institutet, Stockholm, Sweden
- Joakim Dillner
- Department of Laboratory Medicine, Karolinska Institutet, Stockholm, Sweden
- Karolinska University Laboratory, Karolinska University Hospital, Stockholm, Sweden
- Raul Vicente
- Computational Neuroscience Lab, Institute of Computer Science, University of Tartu, Tartu, Estonia
8
Abbas AA, Young JC, Clarke EL, Diamond JM, Imai I, Haas AR, Cantu E, Lederer DJ, Meyer K, Milewski RK, Olthoff KM, Shaked A, Christie JD, Bushman FD, Collman RG. Bidirectional transfer of Anelloviridae lineages between graft and host during lung transplantation. Am J Transplant 2019; 19:1086-1097. [PMID: 30203917 PMCID: PMC6411461 DOI: 10.1111/ajt.15116] [Citation(s) in RCA: 26] [Impact Index Per Article: 5.2] [Received: 04/19/2018] [Revised: 09/05/2018] [Accepted: 09/05/2018] [Indexed: 01/25/2023]
Abstract
Solid organ transplantation disrupts virus-host relationships, potentially resulting in viral transfer from donor to recipient, reactivation of latent viruses, and new viral infections. Viral transfer, colonization, and reactivation are typically monitored using assays for specific viruses, leaving the behavior of full viral populations (the "virome") understudied. Here we sought to investigate the temporal behavior of viruses from donor lungs and transplant recipients comprehensively. We interrogated the bronchoalveolar lavage and blood viromes during the peritransplant period and 6-16 months posttransplant in 13 donor-recipient pairs using shotgun metagenomic sequencing. Anelloviridae, ubiquitous human commensal viruses, were the most abundant human viruses identified. Herpesviruses, parvoviruses, polyomaviruses, and bacteriophages were also detected. Anelloviridae populations were complex, with some donor organs and hosts harboring multiple contemporaneous lineages. We identified transfer of Anelloviridae lineages from donor organ to recipient serum in 4 of 7 cases that could be queried, and immigration of lineages from recipient serum into the allograft in 6 of 10 such cases. Thus, metagenomic analyses revealed that viral populations move between graft and host in both directions, showing that organ transplantation involves implantation of both the allograft and commensal viral communities.
Affiliation(s)
- A. A. Abbas
- Department of Microbiology, University of Pennsylvania Perelman School of Medicine, Philadelphia, PA
- J. C. Young
- Department of Microbiology, University of Pennsylvania Perelman School of Medicine, Philadelphia, PA
- E. L. Clarke
- Department of Microbiology, University of Pennsylvania Perelman School of Medicine, Philadelphia, PA
- J. M. Diamond
- Pulmonary, Allergy and Critical Care Division, Department of Medicine, University of Pennsylvania Perelman School of Medicine, Philadelphia, PA
- Center for Clinical Epidemiology and Biostatistics, University of Pennsylvania Perelman School of Medicine, Philadelphia, PA
- I. Imai
- Pulmonary, Allergy and Critical Care Division, Department of Medicine, University of Pennsylvania Perelman School of Medicine, Philadelphia, PA
- A. R. Haas
- Pulmonary, Allergy and Critical Care Division, Department of Medicine, University of Pennsylvania Perelman School of Medicine, Philadelphia, PA
- E. Cantu
- Department of Surgery, University of Pennsylvania Perelman School of Medicine, Philadelphia, PA
- D. J. Lederer
- Departments of Medicine and Epidemiology, College of Physicians and Surgeons, Columbia University, New York, NY
- K. Meyer
- School of Medicine and Public Health, University of Wisconsin, Madison, WI
- R. K. Milewski
- Department of Surgery, University of Pennsylvania Perelman School of Medicine, Philadelphia, PA
- K. M. Olthoff
- Department of Surgery, University of Pennsylvania Perelman School of Medicine, Philadelphia, PA
- A. Shaked
- Department of Surgery, University of Pennsylvania Perelman School of Medicine, Philadelphia, PA
- J. D. Christie
- Pulmonary, Allergy and Critical Care Division, Department of Medicine, University of Pennsylvania Perelman School of Medicine, Philadelphia, PA
- Center for Clinical Epidemiology and Biostatistics, University of Pennsylvania Perelman School of Medicine, Philadelphia, PA
- F. D. Bushman
- Department of Microbiology, University of Pennsylvania Perelman School of Medicine, Philadelphia, PA
- R. G. Collman
- Department of Microbiology, University of Pennsylvania Perelman School of Medicine, Philadelphia, PA
- Pulmonary, Allergy and Critical Care Division, Department of Medicine, University of Pennsylvania Perelman School of Medicine, Philadelphia, PA
9
Maarala AI, Bzhalava Z, Dillner J, Heljanko K, Bzhalava D. ViraPipe: scalable parallel pipeline for viral metagenome analysis from next generation sequencing reads. Bioinformatics 2018; 34:928-935. [PMID: 29106455 DOI: 10.1093/bioinformatics/btx702] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.7] [Received: 06/09/2017] [Accepted: 11/01/2017] [Indexed: 11/13/2022]
Abstract
Motivation: Next Generation Sequencing (NGS) technology enables identification of microbial genomes from massive numbers of human microbiomes more rapidly and cheaply than ever before. However, the traditional sequential genome analysis algorithms, tools, and platforms are inefficient for performing large-scale metagenomic studies on ever-growing sample data volumes. Currently, there is an urgent need for scalable analysis pipelines that enable harnessing all the power of parallel computation in computing clusters and in cloud computing environments. We propose ViraPipe, a scalable metagenome analysis pipeline that is able to analyze thousands of human microbiomes in parallel in tolerable time. The pipeline is tuned for analyzing viral metagenomes and the software is applicable for other metagenomic analyses as well. ViraPipe integrates the parallel BWA-MEM read aligner, the MegaHit de novo assembler, and the BLAST and HMMER3 sequence search tools. We show the scalability of ViraPipe by running experiments on mining virus-related genomes from NGS datasets in a distributed Spark computing cluster. Results: ViraPipe analyses 768 human samples in 210 minutes on a Spark computing cluster comprising 23 nodes and 1288 cores in total. The speedup of ViraPipe executed on 23 nodes was 11x compared to the sequential analysis pipeline executed on a single node. The whole process includes parallel decompression, read interleaving, BWA-MEM read alignment, filtering and normalizing of non-human reads, de novo contig assembly, and searching of sequences with the BLAST and HMMER3 tools. Contact: ilari.maarala@aalto.fi. Availability and implementation: https://github.com/NGSeq/ViraPipe.
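The scaling figures quoted above can be turned into two derived numbers: parallel efficiency (speedup divided by the number of workers) and sample throughput. The following is a minimal sketch using only the values in the abstract; the function name is hypothetical, and treating each node as one worker is an assumption made for illustration.

```python
def parallel_efficiency(speedup, workers):
    """Fraction of ideal linear speedup actually achieved (hypothetical helper)."""
    return speedup / workers


# Values from the abstract: 11x speedup on 23 nodes, 768 samples in 210 minutes.
efficiency = parallel_efficiency(11, 23)
throughput = 768 / 210  # samples per minute on the full cluster

print(f"efficiency per node: {efficiency:.1%}")
print(f"throughput: {throughput:.2f} samples/min")
```

An efficiency near 48% means that roughly half of the ideal linear speedup is retained at 23 nodes, with the remainder lost to overheads that the abstract does not break down.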
Affiliation(s)
- Altti Ilari Maarala
- Department of Computer Science, Aalto University, Espoo, Finland
- Helsinki Institute for Information Technology HIIT, Espoo, Finland
- Zurab Bzhalava
- Department of Laboratory Medicine, Karolinska Institutet, Stockholm, Sweden
- Joakim Dillner
- Department of Laboratory Medicine, Karolinska Institutet, Stockholm, Sweden
- Keijo Heljanko
- Department of Computer Science, Aalto University, Espoo, Finland
- Helsinki Institute for Information Technology HIIT, Espoo, Finland
- Davit Bzhalava
- Department of Laboratory Medicine, Karolinska Institutet, Stockholm, Sweden
10
Machine Learning for detection of viral sequences in human metagenomic datasets. BMC Bioinformatics 2018; 19:336. [PMID: 30249176 PMCID: PMC6154907 DOI: 10.1186/s12859-018-2340-x] [Citation(s) in RCA: 26] [Impact Index Per Article: 4.3] [Received: 09/26/2017] [Accepted: 08/28/2018] [Indexed: 11/10/2022]
Abstract
BACKGROUND: Detection of highly divergent or yet unknown viruses from metagenomics sequencing datasets is a major bioinformatics challenge. When human samples are sequenced, a large proportion of assembled contigs are classified as "unknown", as conventional methods find no similarity to known sequences. We wished to explore whether machine learning algorithms using Relative Synonymous Codon Usage frequency (RSCU) could improve the detection of viral sequences in metagenomic sequencing data. RESULTS: We trained Random Forest and Artificial Neural Network models using metagenomic sequences taxonomically classified into virus and non-virus classes. The algorithms achieved accuracies well beyond chance level, with area under ROC curve 0.79. Two codons (TCG and CGC) were found to have a particularly strong discriminative capacity. CONCLUSION: RSCU-based machine learning techniques applied to metagenomic sequencing data can help identify a large number of putative viral sequences and provide an addition to conventional methods for taxonomic classification.
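RSCU, the feature used above, is conventionally defined as the observed count of a codon divided by the count expected if all synonymous codons for the same amino acid were used equally. The sketch below computes it under stated assumptions: the input is a coding sequence already in frame from position 0, the standard genetic code applies, and stop codons are skipped. The function name is hypothetical, and the paper's own preprocessing is not described in the abstract.

```python
from collections import Counter

# Standard genetic code, built from the conventional TCAG ordering.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def rscu(seq):
    """Return {codon: RSCU} for amino acids that occur in the sequence."""
    seq = seq.upper()
    codons = [seq[i:i + 3] for i in range(0, len(seq) - 2, 3)]
    counts = Counter(c for c in codons if c in CODON_TABLE)

    # Group synonymous codons by the amino acid they encode.
    families = {}
    for codon, aa in CODON_TABLE.items():
        if aa != "*":  # skip stop codons
            families.setdefault(aa, []).append(codon)

    values = {}
    for family in families.values():
        total = sum(counts[c] for c in family)
        if total == 0:
            continue  # amino acid absent: RSCU undefined, so it is omitted
        expected = total / len(family)  # count under equal codon usage
        for codon in family:
            values[codon] = counts[codon] / expected
    return values


# Four serine codons: TCG twice, TCT once, AGC once.
example = rscu("TCGTCGTCTAGC")
print({codon: round(v, 2) for codon, v in example.items() if v > 0})
```

A value above 1 means the codon is used more often than expected under equal usage. A vector of such values per contig is the kind of input a Random Forest or neural network classifier could be trained on.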
11
Arroyo Mühr LS, Hortlund M, Bzhalava Z, Nordqvist Kleppe S, Bzhalava D, Hultin E, Dillner J. Viruses in case series of tumors: Consistent presence in different cancers in the same subject. PLoS One 2017; 12:e0172308. [PMID: 28257474 PMCID: PMC5336194 DOI: 10.1371/journal.pone.0172308] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.7] [Received: 11/11/2016] [Accepted: 02/02/2017] [Indexed: 12/20/2022]
Abstract
Studies investigating presence of viruses in cancer often analyze case series of cancers, resulting in detection of many viruses that are not etiologically linked to the tumors where they are found. The incidence of virus-associated cancers is greatly increased in immunocompromised individuals. Non-melanoma skin cancer (NMSC) is also greatly increased and a variety of viruses have been detected in NMSC. As immunosuppressed patients often develop multiple independent NMSCs, we reasoned that viruses consistently present in independent tumors might be more likely to be involved in tumorigenesis. We sequenced 8 different NMSCs from 1 patient in comparison to 8 different NMSCs from 8 different patients. Among the latter, 12 different virus sequences were detected, but none in more than 1 tumor each. In contrast, the patient with multiple NMSCs had human papillomavirus type 15 and type 38 present in 6 out of 8 NMSCs.
Affiliation(s)
- Laila Sara Arroyo Mühr
- Division of Pathology, Department of Laboratory Medicine, Karolinska Institutet, Stockholm, Sweden
- Maria Hortlund
- Division of Pathology, Department of Laboratory Medicine, Karolinska Institutet, Stockholm, Sweden
- Zurab Bzhalava
- Division of Pathology, Department of Laboratory Medicine, Karolinska Institutet, Stockholm, Sweden
- Sara Nordqvist Kleppe
- Division of Pathology, Department of Laboratory Medicine, Karolinska Institutet, Stockholm, Sweden
- Davit Bzhalava
- Division of Pathology, Department of Laboratory Medicine, Karolinska Institutet, Stockholm, Sweden
- Emilie Hultin
- Division of Pathology, Department of Laboratory Medicine, Karolinska Institutet, Stockholm, Sweden
- Joakim Dillner
- Division of Pathology, Department of Laboratory Medicine, Karolinska Institutet, Stockholm, Sweden
12
Virome characterisation from Guthrie cards in children who later developed acute lymphoblastic leukaemia. Br J Cancer 2016; 115:1008-1014. [PMID: 27552439 PMCID: PMC5061901 DOI: 10.1038/bjc.2016.261] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.9] [Received: 03/21/2016] [Revised: 07/22/2016] [Accepted: 07/29/2016] [Indexed: 12/17/2022]
Abstract
Background: Some childhood acute lymphoblastic leukaemias (ALL) can be traced back to a prenatal origin, where a virus infection could be involved in the first pre-leukaemic clone development. The DNA virome of 95 children who later developed ALL was characterised from neonatal blood spots (NBS) using unbiased next-generation sequencing (NGS) and compared with the virome of 95 non-ALL controls. Methods: DNA was individually extracted from the ALL patients and controls, pooled, randomly amplified and sequenced using the Illumina MiSeq Sequencing System. Results: Virus-like sequences identified in both groups mapped to human endogenous retroviruses and Propionibacterium phage, considered a part of the normal microbial flora. Potential pathogens human herpesvirus type 6 (HHV-6) and parvovirus B19 were also identified, but only a few samples in both ALL and controls tested positive by PCR follow-up. Conclusions: Unbiased NGS was employed to search for DNA from potential infectious agents in neonatal samples of children who later developed ALL. Although several viral candidates were identified in the NBS samples, further investigation by PCR suggested that these viruses did not have a major role in ALL development.
|
13
|
Abstract
We tested prostatic secretions from men with and without prostate cancer (13 cases and 13 matched controls) or prostatitis (18 cases and 18 matched controls) with metagenomic sequencing. A large number (>200) of viral reads was only detected among four prostate cancer cases (1 patient each positive for Merkel cell polyomavirus, JC polyomavirus and Human Papillomavirus types 89 or 40, respectively). Lower numbers of reads from a large variety of viruses were detected in all patient groups. Our knowledge of the biology of the prostate may be furthered by the fact that DNA viruses are commonly shed from the prostate and can be readily detected by metagenomic sequencing of expressed prostate secretions.
|
14
|
Perlejewski K, Popiel M, Laskus T, Nakamura S, Motooka D, Stokowy T, Lipowski D, Pollak A, Lechowicz U, Caraballo Cortés K, Stępień A, Radkowski M, Bukowska-Ośko I. Next-generation sequencing (NGS) in the identification of encephalitis-causing viruses: Unexpected detection of human herpesvirus 1 while searching for RNA pathogens. J Virol Methods 2015; 226:1-6. [PMID: 26424618 DOI: 10.1016/j.jviromet.2015.09.010] [Citation(s) in RCA: 37] [Impact Index Per Article: 4.1] [Received: 05/18/2015] [Revised: 09/24/2015] [Accepted: 09/24/2015] [Indexed: 11/18/2022]
Abstract
BACKGROUND Encephalitis is a severe neurological syndrome usually caused by viruses. Despite significant progress in diagnostic techniques, the causative agent remains unidentified in the majority of cases. The aim of the present study was to test an alternative approach for the detection of putative pathogens in encephalitis using next-generation sequencing (NGS). METHODS RNA was extracted from cerebrospinal fluid (CSF) from a 60-year-old male patient with encephalitis and subjected to isothermal linear nucleic acid amplification (Ribo-SPIA, NuGen) followed by next-generation sequencing using the MiSeq (Illumina) system and metagenomics data analysis. RESULTS The sequencing run yielded 1,578,856 reads overall, of which 2579 matched the human herpesvirus 1 (HHV-1) genome; the presence of this pathogen in CSF was confirmed by specific PCR. In subsequent experiments we found that DNase I treatment, while lowering the background of host-derived sequences, lowered the number of detectable HHV-1 sequences by a factor of 4. Furthermore, we found that the routine extraction of total RNA by the Chomczynski method could be used for identification of both DNA and RNA pathogens in typical clinical settings, as it results in retention of a significant amount of DNA. CONCLUSION In summary, it seems that NGS preceded by nucleic acid amplification could supplement currently used diagnostic methods in encephalitis.
Affiliation(s)
- Karol Perlejewski
- Department of Immunopathology of Infectious and Parasitic Diseases, Warsaw Medical University, 3C Pawinskiego Street, 02-106 Warsaw, Poland.
| | - Marta Popiel
- Department of Immunopathology of Infectious and Parasitic Diseases, Warsaw Medical University, 3C Pawinskiego Street, 02-106 Warsaw, Poland; Postgraduate School of Molecular Medicine, 61 Żwirki i Wigury Street, 02-091 Warsaw, Poland.
| | - Tomasz Laskus
- Department of Immunopathology of Infectious and Parasitic Diseases, Warsaw Medical University, 3C Pawinskiego Street, 02-106 Warsaw, Poland.
| | - Shota Nakamura
- Department of Infection Metagenomics, Genome Information Research Center, Research Institute for Microbial Diseases, Osaka University 3-1 Yamadaoka, Suita-City, Osaka, Japan.
| | - Daisuke Motooka
- Department of Infection Metagenomics, Genome Information Research Center, Research Institute for Microbial Diseases, Osaka University 3-1 Yamadaoka, Suita-City, Osaka, Japan.
| | - Tomasz Stokowy
- Department of Clinical Science, University of Bergen, 5021 Bergen, Norway.
| | - Dariusz Lipowski
- Municipal Hospital for Infectious Diseases, 37 Wolska Street, 01-201 Warsaw, Poland.
| | - Agnieszka Pollak
- Department of Genetics, Institute of Physiology and Pathology of Hearing, Mochnackiego 10, 02-042 Warsaw, Poland.
| | - Urszula Lechowicz
- Department of Genetics, Institute of Physiology and Pathology of Hearing, Mochnackiego 10, 02-042 Warsaw, Poland.
| | - Kamila Caraballo Cortés
- Department of Immunopathology of Infectious and Parasitic Diseases, Warsaw Medical University, 3C Pawinskiego Street, 02-106 Warsaw, Poland.
| | - Adam Stępień
- Department of Neurology, Military Institute of Medicine, 128 Szaserów Street, 04-141 Warsaw, Poland.
| | - Marek Radkowski
- Department of Immunopathology of Infectious and Parasitic Diseases, Warsaw Medical University, 3C Pawinskiego Street, 02-106 Warsaw, Poland.
| | - Iwona Bukowska-Ośko
- Department of Immunopathology of Infectious and Parasitic Diseases, Warsaw Medical University, 3C Pawinskiego Street, 02-106 Warsaw, Poland.
| |
|
15
|
Bzhalava D, Hultin E, Arroyo Mühr LS, Ekström J, Lehtinen M, de Villiers EM, Dillner J. Viremia during pregnancy and risk of childhood leukemia and lymphomas in the offspring: Nested case-control study. Int J Cancer 2015; 138:2212-20. [PMID: 26132655 DOI: 10.1002/ijc.29666] [Citation(s) in RCA: 14] [Impact Index Per Article: 1.6] [Received: 04/14/2015] [Revised: 05/24/2015] [Accepted: 06/01/2015] [Indexed: 01/29/2023]
Abstract
A possible role for infections of the pregnant mother in the development of childhood acute leukemias and lymphomas has been suggested. However, no specific infectious agent has been identified. Offspring of 74,000 mothers who had serum samples taken during pregnancy and stored in a large-scale biobank were followed up to the age of 15 years (750,000 person years) through over-generation linkages between the biobank files, the Swedish national population and cancer registers to identify incident leukemia/lymphoma cases in the offspring. First-trimester sera from mothers of 47 cases and 47 matched controls were retrieved and analyzed using next generation sequencing. Anelloviruses were the most common viruses detected, found in 37/47 cases and in 40/47 controls, respectively (OR: 0.6, 95% CI: 0.2-1.9). None of the detected viruses was associated with leukemia/lymphoma in the offspring. Viremia during pregnancy was common, but no association with leukemia/lymphoma risk in the offspring was found.
Affiliation(s)
- Davit Bzhalava
- Department of Laboratory Medicine, Karolinska Institutet, Stockholm, SE-141 86, Sweden
| | - Emilie Hultin
- Department of Laboratory Medicine, Karolinska Institutet, Stockholm, SE-141 86, Sweden
| | | | - Johanna Ekström
- Department of Clinical Sciences, Lund University, Malmö, Sweden
| | - Matti Lehtinen
- National Institute for Health and Welfare, Oulu, Finland
| | - Ethel-Michele de Villiers
- Abteilung Tumorvirus-Charakterisierung, Deutsches Krebsforschungszentrum, Heidelberg, 69120, Germany
| | - Joakim Dillner
- Department of Laboratory Medicine, Karolinska Institutet, Stockholm, SE-141 86, Sweden; Department of Medical Epidemiology & Biostatistics, Karolinska Institutet, Stockholm, SE-171 77, Sweden
| |
|
16
|
Kocjan BJ, Bzhalava D, Forslund O, Dillner J, Poljak M. Molecular methods for identification and characterization of novel papillomaviruses. Clin Microbiol Infect 2015; 21:808-16. [PMID: 26003284 DOI: 10.1016/j.cmi.2015.05.011] [Citation(s) in RCA: 44] [Impact Index Per Article: 4.9] [Received: 03/17/2015] [Revised: 04/27/2015] [Accepted: 05/12/2015] [Indexed: 02/02/2023]
Abstract
Papillomaviruses (PV) are a remarkably heterogeneous family of small DNA viruses that infect a wide variety of vertebrate species and are aetiologically linked with the development of various neoplastic changes of the skin and mucosal epithelia. Based on nucleotide similarity, PVs are hierarchically classified into genera, species and types. Novel human PV (HPV) types are given a unique number only after the whole genome has been cloned and deposited with the International HPV Reference Center. As of 9 March 2015, 200 different HPV types, belonging to 49 species, had been recognized by the International HPV Reference Center. In addition, 131 animal PV types identified from 66 different animal species exist. Recent advances in molecular techniques have resulted in an explosive increase in the identification of novel HPV types and novel subgenomic HPV sequences in the last few years. Among PV genera, the γ-PV genus has been growing most rapidly in recent years with 80 completely sequenced HPV types, followed by α-PV and β-PV genera that have 65 and 51 recognized HPV types, respectively. We reviewed in detail the contemporary molecular methods most often used for identification and characterization of novel PV types, including PCR, rolling circle amplification and next-generation sequencing. Furthermore, we present a short overview of 12 and 10 novel HPV types recently identified in Sweden and Slovenia, respectively. Finally, an update on the International Human Papillomavirus Reference Center is provided.
Affiliation(s)
- B J Kocjan
- Institute of Microbiology and Immunology, Faculty of Medicine, University of Ljubljana, Slovenia
| | - D Bzhalava
- International Human Papillomavirus Reference Center, Department of Laboratory Medicine, Karolinska Institutet, Stockholm, Sweden
| | - O Forslund
- Department of Laboratory Medicine, Lund University, Malmö, Sweden
| | - J Dillner
- International Human Papillomavirus Reference Center, Department of Laboratory Medicine, Karolinska Institutet, Stockholm, Sweden
| | - M Poljak
- Institute of Microbiology and Immunology, Faculty of Medicine, University of Ljubljana, Slovenia.
| |
|
17
|
Leblanc D, Houde A, Gagné MJ, Plante D, Bellon-Gagnon P, Jones TH, Muehlhauser V, Wilhelm B, Avery B, Janecko N, Brassard J. Presence, viral load and characterization of Torque teno sus viruses in liver and pork chop samples at retail. Int J Food Microbiol 2014; 178:60-4. [DOI: 10.1016/j.ijfoodmicro.2014.03.005] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.8] [Received: 08/30/2013] [Revised: 02/24/2014] [Accepted: 03/03/2014] [Indexed: 11/15/2022]
|
18
|
Smelov V, Arroyo Mühr LS, Bzhalava D, Brown LJ, Komyakov B, Dillner J. Metagenomic sequencing of expressed prostate secretions. J Med Virol 2014; 86:2042-8. [DOI: 10.1002/jmv.23900] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.2] [Accepted: 01/21/2014] [Indexed: 11/11/2022]
Affiliation(s)
- Vitaly Smelov
- Department of Laboratory Medicine; Karolinska Institutet; Stockholm Sweden
- Department of Urology and Andrology; North-Western State Medical University Named After I.I. Mechnikov; St. Petersburg Russia
- St. Petersburg State University Outpatient Clinic; St. Petersburg Russia
| | | | - Davit Bzhalava
- Department of Laboratory Medicine; Karolinska Institutet; Stockholm Sweden
| | | | - Boris Komyakov
- Department of Urology and Andrology; North-Western State Medical University Named After I.I. Mechnikov; St. Petersburg Russia
| | - Joakim Dillner
- Department of Laboratory Medicine; Karolinska Institutet; Stockholm Sweden
| |
|
19
|
Bzhalava D, Johansson H, Ekström J, Faust H, Möller B, Eklund C, Nordin P, Stenquist B, Paoli J, Persson B, Forslund O, Dillner J. Unbiased approach for virus detection in skin lesions. PLoS One 2013; 8:e65953. [PMID: 23840382 PMCID: PMC3696016 DOI: 10.1371/journal.pone.0065953] [Citation(s) in RCA: 50] [Impact Index Per Article: 4.5] [Received: 11/20/2012] [Accepted: 05/01/2013] [Indexed: 01/28/2023] Open
Abstract
To assess the presence of virus DNA in skin lesions, swab samples from 82 squamous cell carcinomas of the skin (SCCs), 60 actinic keratoses (AKs), paraffin-embedded biopsies from 28 SCCs and 72 keratoacanthomas (KAs) and fresh-frozen biopsies from 92 KAs, 85 SCCs and 92 AKs were analyzed by high throughput sequencing (HTS) using 454 or Ion Torrent technology. We found a total of 4,284 viral reads, of which 4,168 were Human Papillomavirus (HPV)-related, belonging to 15 known (HPV8, HPV12, HPV20, HPV36, HPV38, HPV45, HPV57, HPV59, HPV104, HPV105, HPV107, HPV109, HPV124, HPV138, HPV147), four previously described putative (HPV 915 F 06 007 FD1, FA73, FA101, SE42) and two putatively new HPV types (SE46, SE47). SE42 was cloned, sequenced, designated as HPV155 and found to have 76% similarity to the most closely related known HPV type. In conclusion, an unbiased approach for viral DNA detection in skin tumors has found that, although some new putative HPVs were found, known HPV types constituted most of the viral DNA.
Affiliation(s)
- Davit Bzhalava
- Departments of Laboratory Medicine, Medical Epidemiology & Biostatistics, Karolinska Institutet, Stockholm, Sweden
- Departments of Clinical Microbiology and Pathology, Karolinska Hospital, Stockholm, Sweden
| | - Hanna Johansson
- Department of Medical Microbiology, Skåne University Hospital, Lund University, Malmö, Sweden
| | - Johanna Ekström
- Departments of Laboratory Medicine, Medical Epidemiology & Biostatistics, Karolinska Institutet, Stockholm, Sweden
- Departments of Clinical Microbiology and Pathology, Karolinska Hospital, Stockholm, Sweden
- Department of Medical Microbiology, Skåne University Hospital, Lund University, Malmö, Sweden
| | - Helena Faust
- Department of Medical Microbiology, Skåne University Hospital, Lund University, Malmö, Sweden
| | - Birgitta Möller
- Departments of Laboratory Medicine, Medical Epidemiology & Biostatistics, Karolinska Institutet, Stockholm, Sweden
- Departments of Clinical Microbiology and Pathology, Karolinska Hospital, Stockholm, Sweden
| | - Carina Eklund
- Departments of Laboratory Medicine, Medical Epidemiology & Biostatistics, Karolinska Institutet, Stockholm, Sweden
- Departments of Clinical Microbiology and Pathology, Karolinska Hospital, Stockholm, Sweden
| | - Peter Nordin
- Dermatology Clinic, Läkarhuset, Gothenburg, Sweden
| | - Bo Stenquist
- Department of Dermatology and Venereology, Sahlgrenska University Hospital, Institute of Clinical Sciences at the Sahlgrenska Academy, University of Gothenburg, Sweden
| | - John Paoli
- Department of Dermatology and Venereology, Sahlgrenska University Hospital, Institute of Clinical Sciences at the Sahlgrenska Academy, University of Gothenburg, Sweden
| | - Bengt Persson
- IFM Bioinformatics and Swedish e-Science Research Centre, Linköping University, Linköping, Sweden
| | - Ola Forslund
- Department of Medical Microbiology, Skåne University Hospital, Lund University, Malmö, Sweden
| | - Joakim Dillner
- Departments of Laboratory Medicine, Medical Epidemiology & Biostatistics, Karolinska Institutet, Stockholm, Sweden
- Departments of Clinical Microbiology and Pathology, Karolinska Hospital, Stockholm, Sweden
- Department of Medical Microbiology, Skåne University Hospital, Lund University, Malmö, Sweden
| |
|