1
Dasari CM, Bhukya R. Explainable deep neural networks for novel viral genome prediction. APPL INTELL 2021; 52:3002-3017. [PMID: 34764607 PMCID: PMC8232563 DOI: 10.1007/s10489-021-02572-3] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.7] [Accepted: 05/26/2021] [Indexed: 11/27/2022]
Abstract
Viral infection causes a wide variety of human diseases including cancer and COVID-19. Viruses invade host cells and associate with host molecules, potentially disrupting normal host function and leading to fatal diseases. Novel viral genome prediction is crucial for understanding complex viral diseases like AIDS and Ebola. While most existing computational techniques classify viral genomes, the efficiency of the classification depends solely on the structural features extracted. The state-of-the-art DNN models achieved excellent performance by automatic extraction of classification features, but the degree of model explainability is relatively poor. During model training for viral prediction, the proposed CNN and CNN-LSTM based methods (EdeepVPP, EdeepVPP-hybrid) automatically extract features. EdeepVPP also performs model interpretation in order to extract, through learned filters, the most important patterns that characterize viral genomes. It is an interpretable CNN model that extracts vital biologically relevant patterns (features) from feature maps of viral sequences. The EdeepVPP-hybrid predictor outperforms all the existing methods by achieving 0.992 mean AUC-ROC and 0.990 AUC-PR on 19 human metagenomic contig experiment datasets using 10-fold cross-validation. We evaluate the ability of CNN filters to detect patterns across high average activation values. To further assess the robustness of the EdeepVPP model, we perform leave-one-experiment-out cross-validation. It can work as a recommendation system to further analyze the raw sequences labeled as ‘unknown’ by alignment-based methods. We show that our interpretable model can extract patterns that are considered to be the most important features for predicting virus sequences through learned filters.
Affiliation(s)
- Raju Bhukya
- National Institute of Technology, Warangal, Telangana 506004 India
2
Kraberger S, Mastroeni D, Delvaux E, Varsani A. Genome Sequences of Novel Torque Teno Viruses Identified in Human Brain Tissue. Microbiol Resour Announc 2020; 9:e00924-20. [PMID: 32912920 PMCID: PMC7484079 DOI: 10.1128/mra.00924-20] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Received: 08/05/2020] [Accepted: 08/19/2020] [Indexed: 11/24/2022] Open
Abstract
Complete genome sequences of two novel torque teno viruses (TTVs) were identified in human brain tissue. These sequences are 3,245 nucleotides (nt) and 2,900 nt long and share 68% and 72% open reading frame 1 (ORF1) identity, respectively, with other human TTVs. This report extends the identification of TTV sequences in the brain.
Affiliation(s)
- Simona Kraberger
- The Biodesign Center for Fundamental and Applied Microbiomics, School of Life Sciences, Arizona State University, Tempe, Arizona, USA
- Diego Mastroeni
- The Biodesign ASU-Banner Neurodegenerative Disease Research Center, Arizona State University, Tempe, Arizona, USA
- School of Life Sciences, Arizona State University, Tempe, Arizona, USA
- Elaine Delvaux
- The Biodesign ASU-Banner Neurodegenerative Disease Research Center, Arizona State University, Tempe, Arizona, USA
- Arvind Varsani
- The Biodesign Center for Fundamental and Applied Microbiomics, School of Life Sciences, Arizona State University, Tempe, Arizona, USA
- School of Life Sciences, Arizona State University, Tempe, Arizona, USA
- Center for Evolution and Medicine, Arizona State University, Tempe, Arizona, USA
- Structural Biology Research Unit, Department of Clinical Laboratory Sciences, University of Cape Town Observatory, Cape Town, South Africa
3
Maternal Infection in Pregnancy and Childhood Leukemia: A Systematic Review and Meta-analysis. J Pediatr 2020; 217:98-109.e8. [PMID: 31810630 PMCID: PMC7605597 DOI: 10.1016/j.jpeds.2019.10.046] [Citation(s) in RCA: 20] [Impact Index Per Article: 5.0] [Received: 07/11/2019] [Revised: 09/13/2019] [Accepted: 10/17/2019] [Indexed: 01/22/2023]
Abstract
OBJECTIVE To summarize the published evidence regarding the association between maternal infection during pregnancy and childhood leukemia. STUDY DESIGN In this systematic review and meta-analysis (PROSPERO number, CRD42018087289), we searched PubMed and Embase to identify relevant studies. We included human studies that reported associations of at least one measure of maternal infection during pregnancy with acute lymphoblastic leukemia (ALL) or all childhood leukemias in the offspring. One reviewer extracted the data first using a standardized form, and the second reviewer independently checked the data for accuracy. Two reviewers used the Newcastle-Ottawa Scale to assess the quality of included studies. We conducted random effects meta-analyses to pool the ORs of specific type of infection on ALL and childhood leukemia. RESULTS This review included 20 studies (ALL, n = 15; childhood leukemia, n = 14) reported in 32 articles. Most (>65%) included studies reported a positive association between infection variables and ALL or childhood leukemia. Among specific types of infection, we found that influenza during pregnancy was associated with higher risk of ALL (pooled OR, 3.64; 95% CI, 1.34-9.90) and childhood leukemia (pooled OR, 1.77; 95% CI, 1.01-3.11). Varicella (pooled OR, 10.19; 95% CI, 1.98-52.39) and rubella (pooled OR, 2.79; 95% CI, 1.16-6.71) infections were also associated with higher childhood leukemia risk. CONCLUSIONS Our findings suggest that maternal infection during pregnancy may be associated with a higher risk of childhood leukemia.
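The abstract above pools study-level odds ratios with a random-effects model but does not name the estimator. As a minimal sketch, assuming the common DerSimonian-Laird estimator and assuming each study is summarized by an OR with a 95% CI, the pooling step could look like the following. The function name `pool_odds_ratios` and the input layout are hypothetical and are not taken from the paper.

```python
import math


def pool_odds_ratios(studies, z=1.96):
    """Pool (OR, ci_low, ci_high) tuples on the log scale.

    Assumption: DerSimonian-Laird random-effects estimator; the
    reviewed paper does not state which estimator it used.
    Returns (pooled_or, ci_low, ci_high).
    """
    # Work on the log-odds scale; recover each standard error from the CI width.
    y = [math.log(o) for o, _, _ in studies]
    se = [(math.log(hi) - math.log(lo)) / (2 * z) for _, lo, hi in studies]

    # Fixed-effect (inverse-variance) weights and Cochran's Q.
    w = [1 / s ** 2 for s in se]
    fixed = sum(wi * yi for wi, yi in zip(w, y)) / sum(w)
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, y))

    # Between-study variance tau^2, truncated at zero.
    c = sum(w) - sum(wi ** 2 for wi in w) / sum(w)
    k = len(studies)
    tau2 = max(0.0, (q - (k - 1)) / c) if k > 1 and c > 0 else 0.0

    # Random-effects weights add tau^2 to each within-study variance.
    w_re = [1 / (s ** 2 + tau2) for s in se]
    pooled = sum(wi * yi for wi, yi in zip(w_re, y)) / sum(w_re)
    se_pooled = math.sqrt(1 / sum(w_re))
    return (
        math.exp(pooled),
        math.exp(pooled - z * se_pooled),
        math.exp(pooled + z * se_pooled),
    )
```

Calling the function with two identical hypothetical studies, `pool_odds_ratios([(2.0, 1.0, 4.0), (2.0, 1.0, 4.0)])`, returns a pooled OR of about 2.0 with a narrower interval than either input, which is the behaviour expected when there is no between-study heterogeneity.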
4
Tampuu A, Bzhalava Z, Dillner J, Vicente R. ViraMiner: Deep learning on raw DNA sequences for identifying viral genomes in human samples. PLoS One 2019; 14:e0222271. [PMID: 31509583 PMCID: PMC6738585 DOI: 10.1371/journal.pone.0222271] [Citation(s) in RCA: 49] [Impact Index Per Article: 9.8] [Received: 06/07/2019] [Accepted: 08/22/2019] [Indexed: 11/23/2022] Open
Abstract
Despite its clinical importance, detection of highly divergent or yet unknown viruses is a major challenge. When human samples are sequenced, conventional alignments classify many assembled contigs as "unknown" since many of the sequences are not similar to known genomes. In this work, we developed ViraMiner, a deep learning-based method to identify viruses in various human biospecimens. ViraMiner contains two branches of Convolutional Neural Networks designed to detect both patterns and pattern-frequencies on raw metagenomics contigs. The training dataset included sequences obtained from 19 metagenomic experiments which were analyzed and labeled by BLAST. The model achieves significantly improved accuracy compared to other machine learning methods for viral genome classification. Using 300 bp contigs ViraMiner achieves 0.923 area under the ROC curve. To our knowledge, this is the first machine learning methodology that can detect the presence of viral sequences among raw metagenomic contigs from diverse human samples. We suggest that the proposed model captures different types of information of genome composition, and can be used as a recommendation system to further investigate sequences labeled as "unknown" by conventional alignment methods. Exploring these highly-divergent viruses, in turn, can enhance our knowledge of infectious causes of diseases.
Affiliation(s)
- Ardi Tampuu
- Computational Neuroscience Lab, Institute of Computer Science, University of Tartu, Tartu, Estonia
- Zurab Bzhalava
- Department of Laboratory Medicine, Karolinska Institutet, Stockholm, Sweden
- Joakim Dillner
- Department of Laboratory Medicine, Karolinska Institutet, Stockholm, Sweden
- Karolinska University Laboratory, Karolinska University Hospital, Stockholm, Sweden
- Raul Vicente
- Computational Neuroscience Lab, Institute of Computer Science, University of Tartu, Tartu, Estonia
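The ViraMiner abstract above describes convolutional networks applied to raw 300 bp contigs. A convolutional layer needs a numeric input, and a common way to provide one is one-hot encoding of the nucleotides. The sketch below illustrates that preprocessing step only. The A/C/G/T channel order, the all-zero rows for ambiguous bases, and the zero padding of short contigs are assumptions made for illustration, not details confirmed by the abstract, and `one_hot` is a hypothetical helper name.

```python
ALPHABET = "ACGT"  # assumed channel order


def one_hot(contig, length=300):
    """Encode a DNA contig as a length x 4 matrix of 0.0/1.0 values.

    Assumptions: contigs longer than `length` are truncated, shorter
    ones are zero-padded, and any base outside A/C/G/T (for example N)
    becomes an all-zero row.
    """
    contig = contig.upper()[:length]
    rows = [[1.0 if base == b else 0.0 for b in ALPHABET] for base in contig]
    # Pad with all-zero rows so every contig has the same shape.
    rows.extend([0.0] * len(ALPHABET) for _ in range(length - len(rows)))
    return rows
```

For example, `one_hot("ACGTN", length=6)` yields six rows: one row per called base with a single 1.0, then two all-zero rows for the `N` and for the padding position.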
5
Machine Learning for detection of viral sequences in human metagenomic datasets. BMC Bioinformatics 2018; 19:336. [PMID: 30249176 PMCID: PMC6154907 DOI: 10.1186/s12859-018-2340-x] [Citation(s) in RCA: 26] [Impact Index Per Article: 4.3] [Received: 09/26/2017] [Accepted: 08/28/2018] [Indexed: 11/10/2022] Open
Abstract
BACKGROUND Detection of highly divergent or yet unknown viruses from metagenomics sequencing datasets is a major bioinformatics challenge. When human samples are sequenced, a large proportion of assembled contigs are classified as "unknown", as conventional methods find no similarity to known sequences. We wished to explore whether machine learning algorithms using Relative Synonymous Codon Usage frequency (RSCU) could improve the detection of viral sequences in metagenomic sequencing data. RESULTS We trained Random Forest and Artificial Neural Network using metagenomic sequences taxonomically classified into virus and non-virus classes. The algorithms achieved accuracies well beyond chance level, with area under ROC curve 0.79. Two codons (TCG and CGC) were found to have a particularly strong discriminative capacity. CONCLUSION RSCU-based machine learning techniques applied to metagenomic sequencing data can help identify a large number of putative viral sequences and provide an addition to conventional methods for taxonomic classification.
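The abstract above uses Relative Synonymous Codon Usage (RSCU) as the feature set. RSCU for a codon is its observed count divided by the count expected if all synonymous codons for the same amino acid were used equally. A minimal sketch follows, assuming the standard genetic code and a single known reading frame. How the paper chose reading frames for contigs is not stated in the abstract, so the `frame` parameter here is a hypothetical simplification.

```python
from collections import Counter
from itertools import product

BASES = "TCAG"
# Standard genetic code listed in TCAG order; '*' marks stop codons.
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def rscu(seq, frame=0):
    """Return {codon: RSCU} for all 64 codons of `seq` read in `frame`.

    RSCU = count(codon) * n_synonyms / total count of the synonymous family.
    Codons whose family never occurs in the sequence get 0.0.
    """
    seq = seq.upper()
    codons = [seq[i:i + 3] for i in range(frame, len(seq) - 2, 3)]
    # Codons containing ambiguous bases are not in the table and are skipped.
    counts = Counter(c for c in codons if c in CODON_TABLE)

    # Group codons into synonymous families by encoded amino acid.
    families = {}
    for codon, aa in CODON_TABLE.items():
        families.setdefault(aa, []).append(codon)

    values = {}
    for family in families.values():
        total = sum(counts[c] for c in family)
        for c in family:
            values[c] = counts[c] * len(family) / total if total else 0.0
    return values
```

The resulting 64 values per sequence can serve as a fixed-length feature vector for a classifier such as the Random Forest mentioned in the abstract. Stop codons are treated here as one more synonymous family, which is a simplification.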
6
Hultin E, Mühr LSA, Bzhalava Z, Hortlund M, Lagheden C, Sundström P, Dillner J. Viremia preceding multiple sclerosis: Two nested case-control studies. Virology 2018; 520:21-29. [PMID: 29772404 DOI: 10.1016/j.virol.2018.04.006] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.5] [Received: 11/08/2017] [Revised: 03/26/2018] [Accepted: 04/10/2018] [Indexed: 11/28/2022]
Abstract
Infections have been suggested to be involved in Multiple Sclerosis (MS). We used metagenomic sequencing to detect both known and yet unknown microorganisms in 2 nested case control studies of MS. Two different cohorts were followed for MS using registry linkages. Serum samples taken before diagnosis as well as samples from matched control subjects were selected. In cohort1 with 75 cases and 75 controls, most viral reads were Anelloviridae-related and >95% detected among the cases. Among samples taken up to 2 years before MS diagnosis, Anellovirus species TTMV1, TTMV6 and TTV27 were significantly more common among cases. In cohort2, 93 cases and 93 controls were tested under the pre-specified hypothesis that the same association would be found. Although most viral reads were again related to Anelloviridae, no significant case-control differences were seen. We conclude that the Anelloviridae-MS association may be due to multiple hypothesis testing, but other explanations are possible.
Affiliation(s)
- Emilie Hultin
- Department of Laboratory Medicine, Karolinska Institutet, Huddinge SE-141 86, Sweden
- Zurab Bzhalava
- Department of Laboratory Medicine, Karolinska Institutet, Huddinge SE-141 86, Sweden
- Maria Hortlund
- Department of Laboratory Medicine, Karolinska Institutet, Huddinge SE-141 86, Sweden
- Camilla Lagheden
- Department of Laboratory Medicine, Karolinska Institutet, Huddinge SE-141 86, Sweden
- Peter Sundström
- Department of Pharmacology and Clinical Neuroscience, Umeå University, Umeå SE-901 87, Sweden
- Joakim Dillner
- Department of Laboratory Medicine, Karolinska Institutet, Huddinge SE-141 86, Sweden.
7
Bzhalava Z, Hultin E, Dillner J. Extension of the viral ecology in humans using viral profile hidden Markov models. PLoS One 2018; 13:e0190938. [PMID: 29351302 PMCID: PMC5774701 DOI: 10.1371/journal.pone.0190938] [Citation(s) in RCA: 19] [Impact Index Per Article: 3.2] [Received: 11/02/2017] [Accepted: 12/23/2017] [Indexed: 11/18/2022] Open
Abstract
When human samples are sequenced, many assembled contigs are "unknown", as conventional alignments find no similarity to known sequences. Hidden Markov models (HMM) exploit the positions of specific nucleotides in protein-encoding codons in various microbes. The algorithm HMMER3 implements HMM using a reference set of sequences encoding viral proteins, "vFam". We used HMMER3 analysis of "unknown" human sample-derived sequences and identified 510 contigs distantly related to viruses (Anelloviridae (n = 1), Baculoviridae (n = 34), Circoviridae (n = 35), Caulimoviridae (n = 3), Closteroviridae (n = 5), Geminiviridae (n = 21), Herpesviridae (n = 10), Iridoviridae (n = 12), Marseillevirus (n = 26), Mimiviridae (n = 80), Phycodnaviridae (n = 165), Poxviridae (n = 23), Retroviridae (n = 6) and 89 contigs related to described viruses not yet assigned to any taxonomic family). In summary, we find that analysis using the HMMER3 algorithm and the "vFam" database greatly extended the detection of viruses in biospecimens from humans.
Affiliation(s)
- Zurab Bzhalava
- Dept. of Laboratory Medicine, Karolinska Institutet, Stockholm, Sweden
- Emilie Hultin
- Dept. of Laboratory Medicine, Karolinska Institutet, Stockholm, Sweden
- Joakim Dillner
- Dept. of Laboratory Medicine, Karolinska Institutet, Stockholm, Sweden
8
Arellano-Galindo J, Barrera AP, Jiménez-Hernández E, Zavala-Vega S, Campos-Valdéz G, Xicohtencatl-Cortes J, Ochoa SA, Cruz-Córdova A, Crisóstomo-Vázquez MDP, Fernández-Macías JC, Mejía-Aranguré JM. Infectious Agents in Childhood Leukemia. Arch Med Res 2017; 48:305-313. [PMID: 29157671 DOI: 10.1016/j.arcmed.2017.09.001] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Received: 09/14/2016] [Accepted: 09/18/2017] [Indexed: 11/26/2022]
Abstract
Acute leukemia is the most common pediatric cancer, representing one-third of all cancers that occur in children under 15 years old, with a varied incidence worldwide. Although a number of advances have increased the knowledge of leukemia pathophysiology, its etiology remains less well understood. The role of infectious agents, such as viruses, bacteria, or parasites, in the pathogenesis of leukemia has been discussed. To date, several cellular mechanisms involving infectious agents have been proposed to cause leukemia following infections. However, although leukemia can be triggered by contact with such agents, they can also be beneficial in developing immune stimulation and protection despite the risk of leukemic clones. In this review, we analyze the proposed hypotheses concerning how infectious agents may play a role in the origin and development of leukemia, as well as in a possible mechanism of protection following infections. We review reported clinical observations associated with vaccination or breastfeeding that support hypotheses such as early life exposure and the resulting early immune stimulation that lead to protection.
Affiliation(s)
- José Arellano-Galindo
- Área de Virología, Laboratorio de Infectología, Hospital Infantil de México Federico Gómez, Ciudad de México, México
- Alberto Parra Barrera
- Laboratorio de Cáncer y Hematopoyesis, Sección de Estudios de Posgrado e Investigación, Escuela Superior de Medicina, Instituto Politécnico Nacional, Ciudad de México, México
- Elva Jiménez-Hernández
- Departamento de Hematología Pediátrica, Unidad Médica de Alta Especialidad, Centro Médico Nacional la Raza, Instituto Mexicano del Seguro Social, Ciudad de México, México
- Sergio Zavala-Vega
- Área de Virología, Laboratorio de Infectología, Hospital Infantil de México Federico Gómez, Ciudad de México, México
- Guillermina Campos-Valdéz
- Área de Virología, Laboratorio de Infectología, Hospital Infantil de México Federico Gómez, Ciudad de México, México
- Juan Xicohtencatl-Cortes
- Laboratorio de Investigación en Bacteriología Intestinal, Hospital Infantil de México Federico Gómez, Ciudad de México, México
- Sara A Ochoa
- Laboratorio de Investigación en Bacteriología Intestinal, Hospital Infantil de México Federico Gómez, Ciudad de México, México
- Ariadnna Cruz-Córdova
- Laboratorio de Investigación en Bacteriología Intestinal, Hospital Infantil de México Federico Gómez, Ciudad de México, México
- Juan Carlos Fernández-Macías
- Área de Virología, Laboratorio de Infectología, Hospital Infantil de México Federico Gómez, Ciudad de México, México
- Juan Manuel Mejía-Aranguré
- Unidad de Investigación en Epidemiología Clínica, Unidad Médica de Alta Especialidad, Hospital de Pediatría, Ciudad de México, México; Coordinación de Investigación en Salud, Centro Médico Nacional Siglo XXI, Instituto Mexicano del Seguro Social, Ciudad de México, México.
9
Gonzales-Gustavson E, Timoneda N, Fernandez-Cassi X, Caballero A, Abril JF, Buti M, Rodriguez-Frias F, Girones R. Identification of sapovirus GV.2, astrovirus VA3 and novel anelloviruses in serum from patients with acute hepatitis of unknown aetiology. PLoS One 2017; 12:e0185911. [PMID: 28982120 PMCID: PMC5628893 DOI: 10.1371/journal.pone.0185911] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.3] [Received: 05/03/2017] [Accepted: 09/21/2017] [Indexed: 12/14/2022] Open
Abstract
Hepatitis is a general term meaning inflammation of the liver, which can be caused by a variety of viruses. However, a substantial number of cases remain of unknown aetiology. We analysed the serum of patients with clinical signs of hepatitis using a metagenomics approach to characterize their viral species composition. Four pools of patients with hepatitis without identified aetiological agents were evaluated. Additionally, one pool of patients with hepatitis E (HEV) and pools of healthy volunteers were included as controls. A high diversity of anelloviruses, including novel sequences, was found in pools from patients with hepatitis of unknown aetiology. Moreover, viruses recently associated with gastroenteritis, such as sapovirus GV.2 and astrovirus VA3, were also detected only in those pools. In addition, most of the HEV genome was recovered from the HEV pool. Finally, GB virus C and human endogenous retrovirus were found in the HEV and healthy pools. Our study provides an overview of the virome in serum from hepatitis patients, suggesting a potential role of these viruses not previously described in cases of hepatitis. However, further epidemiologic studies are necessary to confirm their contribution to the development of hepatitis.
Affiliation(s)
- Eloy Gonzales-Gustavson
- Laboratory of Virus Contaminants of Water and Food, Department of Genetics, Microbiology and Statistics, Faculty of Biology, University of Barcelona, Barcelona, Catalonia, Spain
- N. Timoneda
- Laboratory of Virus Contaminants of Water and Food, Department of Genetics, Microbiology and Statistics, Faculty of Biology, University of Barcelona, Barcelona, Catalonia, Spain
- Computational Genomics Lab, Department of Genetics, Microbiology and Statistics, Faculty of Biology, University of Barcelona, Barcelona, Catalonia, Spain
- Institut de Biomedicina de la Universitat de Barcelona (IBUB), Barcelona, Catalonia, Spain
- X. Fernandez-Cassi
- Laboratory of Virus Contaminants of Water and Food, Department of Genetics, Microbiology and Statistics, Faculty of Biology, University of Barcelona, Barcelona, Catalonia, Spain
- A. Caballero
- Hospital Universitari Vall d’Hebron and CIBEREHD del Instituto Carlos III, Barcelona, Catalonia, Spain
- J. F. Abril
- Computational Genomics Lab, Department of Genetics, Microbiology and Statistics, Faculty of Biology, University of Barcelona, Barcelona, Catalonia, Spain
- Institut de Biomedicina de la Universitat de Barcelona (IBUB), Barcelona, Catalonia, Spain
- M. Buti
- Hospital Universitari Vall d’Hebron and CIBEREHD del Instituto Carlos III, Barcelona, Catalonia, Spain
- F. Rodriguez-Frias
- Hospital Universitari Vall d’Hebron and CIBEREHD del Instituto Carlos III, Barcelona, Catalonia, Spain
- R. Girones
- Laboratory of Virus Contaminants of Water and Food, Department of Genetics, Microbiology and Statistics, Faculty of Biology, University of Barcelona, Barcelona, Catalonia, Spain
10
Arroyo Mühr LS, Hortlund M, Bzhalava Z, Nordqvist Kleppe S, Bzhalava D, Hultin E, Dillner J. Viruses in case series of tumors: Consistent presence in different cancers in the same subject. PLoS One 2017; 12:e0172308. [PMID: 28257474 PMCID: PMC5336194 DOI: 10.1371/journal.pone.0172308] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.7] [Received: 11/11/2016] [Accepted: 02/02/2017] [Indexed: 12/20/2022] Open
Abstract
Studies investigating presence of viruses in cancer often analyze case series of cancers, resulting in detection of many viruses that are not etiologically linked to the tumors where they are found. The incidence of virus-associated cancers is greatly increased in immunocompromised individuals. Non-melanoma skin cancer (NMSC) is also greatly increased and a variety of viruses have been detected in NMSC. As immunosuppressed patients often develop multiple independent NMSCs, we reasoned that viruses consistently present in independent tumors might be more likely to be involved in tumorigenesis. We sequenced 8 different NMSCs from 1 patient in comparison to 8 different NMSCs from 8 different patients. Among the latter, 12 different virus sequences were detected, but none in more than 1 tumor each. In contrast, the patient with multiple NMSCs had human papillomavirus type 15 and type 38 present in 6 out of 8 NMSCs.
Affiliation(s)
- Laila Sara Arroyo Mühr
- Division of Pathology, Department of Laboratory Medicine, Karolinska Institutet, Stockholm, Sweden
- Maria Hortlund
- Division of Pathology, Department of Laboratory Medicine, Karolinska Institutet, Stockholm, Sweden
- Zurab Bzhalava
- Division of Pathology, Department of Laboratory Medicine, Karolinska Institutet, Stockholm, Sweden
- Sara Nordqvist Kleppe
- Division of Pathology, Department of Laboratory Medicine, Karolinska Institutet, Stockholm, Sweden
- Davit Bzhalava
- Division of Pathology, Department of Laboratory Medicine, Karolinska Institutet, Stockholm, Sweden
- Emilie Hultin
- Division of Pathology, Department of Laboratory Medicine, Karolinska Institutet, Stockholm, Sweden
- Joakim Dillner
- Division of Pathology, Department of Laboratory Medicine, Karolinska Institutet, Stockholm, Sweden