1
Akinsulie OC, Idris I, Aliyu VA, Shahzad S, Banwo OG, Ogunleye SC, Olorunshola M, Okedoyin DO, Ugwu C, Oladapo IP, Gbadegoye JO, Akande QA, Babawale P, Rostami S, Soetan KO. The potential application of artificial intelligence in veterinary clinical practice and biomedical research. Front Vet Sci 2024; 11:1347550. [PMID: 38356661 PMCID: PMC10864457 DOI: 10.3389/fvets.2024.1347550] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/01/2023] [Accepted: 01/15/2024] [Indexed: 02/16/2024]
Abstract
Artificial intelligence (AI) is a fast-paced technological advancement in terms of its application to various fields of science and technology. In particular, AI has the potential to play various roles in veterinary clinical practice, enhancing the way veterinary care is delivered, improving outcomes for animals and ultimately humans. Also, in recent years, the emergence of AI has led to a new direction in biomedical research, especially in translational research with great potential, promising to revolutionize science. AI is applicable in antimicrobial resistance (AMR) research, cancer research, drug design and vaccine development, epidemiology, disease surveillance, and genomics. Here, we highlighted and discussed the potential impact of various aspects of AI in veterinary clinical practice and biomedical research, proposing this technology as a key tool for addressing pressing global health challenges across various domains.
Affiliation(s)
- Olalekan Chris Akinsulie
  - Faculty of Veterinary Medicine, University of Ibadan, Ibadan, Nigeria
  - College of Veterinary Medicine, Washington State University, Pullman, WA, United States
- Ibrahim Idris
  - Faculty of Veterinary Medicine, Usman Danfodiyo University, Sokoto, Nigeria
- Sammuel Shahzad
  - College of Veterinary Medicine, Washington State University, Pullman, WA, United States
- Seto Charles Ogunleye
  - Faculty of Veterinary Medicine, University of Ibadan, Ibadan, Nigeria
  - Department of Population Medicine and Pathobiology, College of Veterinary Medicine, Mississippi State University, Starkville, MS, United States
- Mercy Olorunshola
  - Department of Pharmaceutical Microbiology, University of Ibadan, Ibadan, Nigeria
- Deborah O. Okedoyin
  - Department of Animal Sciences, North Carolina Agricultural and Technical State University, Greensboro, NC, United States
- Charles Ugwu
  - College of Veterinary Medicine, Washington State University, Pullman, WA, United States
- Joy Olaoluwa Gbadegoye
  - Department of Physiology, University of Tennessee Health Science Center, Memphis, TN, United States
- Qudus Afolabi Akande
  - Department of Biological Sciences, University of Notre Dame, Notre Dame, IN, United States
- Pius Babawale
  - Department of Pathobiological Sciences, School of Veterinary Medicine, Louisiana State University, Baton Rouge, LA, United States
- Sahar Rostami
  - Department of Population Medicine and Pathobiology, College of Veterinary Medicine, Mississippi State University, Starkville, MS, United States
2
Gonzalez-Isunza G, Jawaid MZ, Liu P, Cox DL, Vazquez M, Arsuaga J. Using machine learning to detect coronaviruses potentially infectious to humans. Sci Rep 2023; 13:9319. [PMID: 37291260 PMCID: PMC10248971 DOI: 10.1038/s41598-023-35861-7] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/10/2023] [Accepted: 05/24/2023] [Indexed: 06/10/2023]
Abstract
Establishing the host range for novel viruses remains a challenge. Here, we address the challenge of identifying non-human animal coronaviruses that may infect humans by creating an artificial neural network model that learns from spike protein sequences of alpha and beta coronaviruses and their binding annotation to their host receptor. The proposed method produces a human-Binding Potential (h-BiP) score that distinguishes, with high accuracy, the binding potential among coronaviruses. Three viruses, previously unknown to bind human receptors, were identified: Bat coronavirus BtCoV/133/2005 and Pipistrellus abramus bat coronavirus HKU5-related (both MERS related viruses), and Rhinolophus affinis coronavirus isolate LYRa3 (a SARS related virus). We further analyze the binding properties of BtCoV/133/2005 and LYRa3 using molecular dynamics. To test whether this model can be used for surveillance of novel coronaviruses, we re-trained the model on a set that excludes SARS-CoV-2 and all viral sequences released after the SARS-CoV-2 was published. The results predict the binding of SARS-CoV-2 with a human receptor, indicating that machine learning methods are an excellent tool for the prediction of host expansion events.
Affiliation(s)
- M Zaki Jawaid
  - Department of Physics, University of California, Davis, USA
- Pengyu Liu
  - Department of Microbiology and Molecular Genetics, University of California, Davis, CA, USA
- Daniel L Cox
  - Department of Physics, University of California, Davis, USA
- Mariel Vazquez
  - Department of Microbiology and Molecular Genetics, University of California, Davis, CA, USA
  - Department of Mathematics, University of California, Davis, CA, USA
- Javier Arsuaga
  - Department of Molecular and Cellular Biology, University of California, Davis, CA, USA
  - Department of Mathematics, University of California, Davis, CA, USA
3
Iuchi H, Kawasaki J, Kubo K, Fukunaga T, Hokao K, Yokoyama G, Ichinose A, Suga K, Hamada M. Bioinformatics approaches for unveiling virus-host interactions. Comput Struct Biotechnol J 2023; 21:1774-1784. [PMID: 36874163 PMCID: PMC9969756 DOI: 10.1016/j.csbj.2023.02.044] [Citation(s) in RCA: 4] [Impact Index Per Article: 4.0] [Received: 11/17/2022] [Revised: 02/22/2023] [Accepted: 02/22/2023] [Indexed: 03/03/2023]
Abstract
The coronavirus disease-2019 (COVID-19) pandemic has elucidated major limitations in the capacity of medical and research institutions to appropriately manage emerging infectious diseases. We can improve our understanding of infectious diseases by unveiling virus-host interactions through host range prediction and protein-protein interaction prediction. Although many algorithms have been developed to predict virus-host interactions, numerous issues remain to be solved, and the entire network remains veiled. In this review, we comprehensively surveyed algorithms used to predict virus-host interactions. We also discuss the current challenges, such as dataset biases toward highly pathogenic viruses, and the potential solutions. The complete prediction of virus-host interactions remains difficult; however, bioinformatics can contribute to progress in research on infectious diseases and human health.
Affiliation(s)
- Hitoshi Iuchi
  - Waseda Research Institute for Science and Engineering, Waseda University, Tokyo 169-8555, Japan
  - Computational Bio Big-Data Open Innovation Laboratory (CBBD-OIL), National Institute of Advanced Industrial Science and Technology (AIST), Tokyo 169-8555, Japan
- Junna Kawasaki
  - Faculty of Science and Engineering, Waseda University, Okubo Shinjuku-ku, Tokyo 169-8555, Japan
- Kento Kubo
  - Computational Bio Big-Data Open Innovation Laboratory (CBBD-OIL), National Institute of Advanced Industrial Science and Technology (AIST), Tokyo 169-8555, Japan
  - School of Advanced Science and Engineering, Waseda University, Okubo Shinjuku-ku, Tokyo 169-8555, Japan
- Tsukasa Fukunaga
  - Waseda Institute for Advanced Study, Waseda University, Nishi Waseda, Shinjuku-ku, Tokyo 169-0051, Japan
- Koki Hokao
  - School of Advanced Science and Engineering, Waseda University, Okubo Shinjuku-ku, Tokyo 169-8555, Japan
- Gentaro Yokoyama
  - Computational Bio Big-Data Open Innovation Laboratory (CBBD-OIL), National Institute of Advanced Industrial Science and Technology (AIST), Tokyo 169-8555, Japan
  - School of Advanced Science and Engineering, Waseda University, Okubo Shinjuku-ku, Tokyo 169-8555, Japan
- Akiko Ichinose
  - Waseda Research Institute for Science and Engineering, Waseda University, Tokyo 169-8555, Japan
- Kanta Suga
  - School of Advanced Science and Engineering, Waseda University, Okubo Shinjuku-ku, Tokyo 169-8555, Japan
- Michiaki Hamada
  - Waseda Research Institute for Science and Engineering, Waseda University, Tokyo 169-8555, Japan
  - Computational Bio Big-Data Open Innovation Laboratory (CBBD-OIL), National Institute of Advanced Industrial Science and Technology (AIST), Tokyo 169-8555, Japan
  - School of Advanced Science and Engineering, Waseda University, Okubo Shinjuku-ku, Tokyo 169-8555, Japan
  - Graduate School of Medicine, Nippon Medical School, Tokyo 113-8602, Japan
4
Roy T, Sharma K, Dhall A, Patiyal S, Raghava GPS. In silico method for predicting infectious strains of influenza A virus from its genome and protein sequences. J Gen Virol 2022; 103. [PMID: 36318663 DOI: 10.1099/jgv.0.001802] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/16/2023]
Abstract
Influenza A is a contagious viral disease responsible for four pandemics in the past and a major public health concern. Being zoonotic in nature, the virus can cross the species barrier and transmit from wild aquatic bird reservoirs to humans via intermediate hosts. In this study, we have developed a computational method for the prediction of human-associated and non-human-associated influenza A virus sequences. The models were trained and validated on proteins and genome sequences of influenza A virus. Firstly, we have developed prediction models for 15 types of influenza A proteins using composition-based and one-hot-encoding features. We have achieved a highest AUC of 0.98 for HA protein on a validation dataset using dipeptide composition-based features. Of note, we obtained a maximum AUC of 0.99 using one-hot-encoding features for protein-based models on a validation dataset. Secondly, we built models using whole genome sequences which achieved an AUC of 0.98 on a validation dataset. In addition, we showed that our method outperforms a similarity-based approach (i.e., blast) on the same validation dataset. Finally, we integrated our best models into a user-friendly web server 'FluSPred' (https://webs.iiitd.edu.in/raghava/fluspred/index.html) and a standalone version (https://github.com/raghavagps/FluSPred) for the prediction of human-associated/non-human-associated influenza A virus strains.
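As an illustration of the two feature types named in this abstract, the following Python sketch computes a dipeptide composition vector and a one-hot encoding for a protein sequence. It is not the FluSPred implementation: the function names, the ordering of the 20-letter alphabet, the use of fractions rather than percentages, and zero-padding to a fixed length are all assumptions made for this example.

```python
from itertools import product

# Standard 20 amino acids; the ordering is an arbitrary choice for this sketch.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
DIPEPTIDES = ["".join(pair) for pair in product(AMINO_ACIDS, repeat=2)]


def dipeptide_composition(protein):
    """Return a 400-element list with the fraction of each adjacent residue pair."""
    protein = protein.upper()
    counts = dict.fromkeys(DIPEPTIDES, 0)
    for i in range(len(protein) - 1):
        pair = protein[i:i + 2]
        if pair in counts:  # skip pairs containing non-standard residues (e.g. X)
            counts[pair] += 1
    total = sum(counts.values())
    return [counts[d] / total if total else 0.0 for d in DIPEPTIDES]


def one_hot_encode(protein, length):
    """Return a flat list of length * 20 values; truncates or zero-pads to `length`."""
    encoded = []
    # ljust pads with spaces, which match no residue and so encode as all zeros.
    for residue in protein.upper()[:length].ljust(length):
        encoded.extend(1 if residue == aa else 0 for aa in AMINO_ACIDS)
    return encoded
```

Either vector could then be passed to any standard classifier; the abstract does not say which learning algorithm performed best, so none is assumed here.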
Affiliation(s)
- Trinita Roy
  - Department of Computational Biology, Indraprastha Institute of Information Technology, Okhla Phase 3, New Delhi-110020, India
- Khushal Sharma
  - Department of Computational Biology, Indraprastha Institute of Information Technology, Okhla Phase 3, New Delhi-110020, India
- Anjali Dhall
  - Department of Computational Biology, Indraprastha Institute of Information Technology, Okhla Phase 3, New Delhi-110020, India
- Sumeet Patiyal
  - Department of Computational Biology, Indraprastha Institute of Information Technology, Okhla Phase 3, New Delhi-110020, India
- Gajendra Pal Singh Raghava
  - Department of Computational Biology, Indraprastha Institute of Information Technology, Okhla Phase 3, New Delhi-110020, India
5
Xu Y, Wojtczak D. Dive into machine learning algorithms for influenza virus host prediction with hemagglutinin sequences. Biosystems 2022; 220:104740. [DOI: 10.1016/j.biosystems.2022.104740] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 01/31/2022] [Revised: 07/02/2022] [Accepted: 07/16/2022] [Indexed: 11/26/2022]
6
Borkenhagen LK, Allen MW, Runstadler JA. Influenza virus genotype to phenotype predictions through machine learning: a systematic review. Emerg Microbes Infect 2021; 10:1896-1907. [PMID: 34498543 PMCID: PMC8462836 DOI: 10.1080/22221751.2021.1978824] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Indexed: 11/22/2022]
Abstract
Background: There is great interest in understanding the viral genomic predictors of phenotypic traits that allow influenza A viruses to adapt to or become more virulent in different hosts. Machine learning techniques have demonstrated promise in addressing this critical need for other pathogens because the underlying algorithms are especially well equipped to uncover complex patterns in large datasets and produce generalizable predictions for new data. As the body of research where these techniques are applied for influenza A virus phenotype prediction continues to grow, it is useful to consider the strengths and weaknesses of these approaches to understand what has prevented these models from seeing widespread use by surveillance laboratories and to identify gaps that are underexplored with this technology. Methods and Results: We present a systematic review of English literature published through 15 April 2021 of studies employing machine learning methods to generate predictions of influenza A virus phenotypes from genomic or proteomic input. Forty-nine studies were included in this review, spanning the topics of host discrimination, human adaptability, subtype and clade assignment, pandemic lineage assignment, characteristics of infection, and antiviral drug resistance. Conclusions: Our findings suggest that biases in model design and a dearth of wet laboratory follow-up may explain why these models often go underused. We, therefore, offer guidance to overcome these limitations, aid in improving predictive models of previously studied influenza A virus phenotypes, and extend those models to unexplored phenotypes in the ultimate pursuit of tools to enable the characterization of virus isolates across surveillance laboratories.
Affiliation(s)
- Laura K Borkenhagen
  - Department of Infectious Disease and Global Health, Cummings School of Veterinary Medicine, Tufts University, North Grafton, MA, USA
- Martin W Allen
  - Department of Computer Science, School of Engineering, Tufts University, Medford, MA, USA
- Jonathan A Runstadler
  - Department of Infectious Disease and Global Health, Cummings School of Veterinary Medicine, Tufts University, North Grafton, MA, USA
7
Computational Viromics: Applications of the Computational Biology in Viromics Studies. Virol Sin 2021; 36:1256-1260. [PMID: 34057678 PMCID: PMC8165334 DOI: 10.1007/s12250-021-00395-7] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 06/02/2020] [Accepted: 04/14/2021] [Indexed: 12/30/2022]
8
Bartoszewicz JM, Seidel A, Renard BY. Interpretable detection of novel human viruses from genome sequencing data. NAR Genom Bioinform 2021; 3:lqab004. [PMID: 33554119 PMCID: PMC7849996 DOI: 10.1093/nargab/lqab004] [Citation(s) in RCA: 17] [Impact Index Per Article: 5.7] [Received: 10/12/2020] [Revised: 01/04/2021] [Accepted: 01/15/2021] [Indexed: 01/21/2023]
Abstract
Viruses evolve extremely quickly, so reliable methods for viral host prediction are necessary to safeguard biosecurity and biosafety alike. Novel human-infecting viruses are difficult to detect with standard bioinformatics workflows. Here, we predict whether a virus can infect humans directly from next-generation sequencing reads. We show that deep neural architectures significantly outperform both shallow machine learning and standard, homology-based algorithms, cutting the error rates in half and generalizing to taxonomic units distant from those presented during training. Further, we develop a suite of interpretability tools and show that it can be applied also to other models beyond the host prediction task. We propose a new approach for convolutional filter visualization to disentangle the information content of each nucleotide from its contribution to the final classification decision. Nucleotide-resolution maps of the learned associations between pathogen genomes and the infectious phenotype can be used to detect regions of interest in novel agents, for example, the SARS-CoV-2 coronavirus, unknown before it caused a COVID-19 pandemic in 2020. All methods presented here are implemented as easy-to-install packages not only enabling analysis of NGS datasets without requiring any deep learning skills, but also allowing advanced users to easily train and explain new models for genomics.
Affiliation(s)
- Jakub M Bartoszewicz
  - Bioinformatics (MF1), Department of Methodology and Research Infrastructure, Robert Koch Institute, 13353 Berlin, Germany
  - Department of Mathematics and Computer Science, Free University of Berlin, 14195 Berlin, Germany
  - Data Analytics and Computational Statistics, Hasso Plattner Institute for Digital Engineering, 14482 Potsdam, Brandenburg, Germany
  - Digital Engineering Faculty, University of Potsdam, 14482 Potsdam, Brandenburg, Germany
- Anja Seidel
  - Bioinformatics (MF1), Department of Methodology and Research Infrastructure, Robert Koch Institute, 13353 Berlin, Germany
  - Department of Mathematics and Computer Science, Free University of Berlin, 14195 Berlin, Germany
- Bernhard Y Renard
  - Bioinformatics (MF1), Department of Methodology and Research Infrastructure, Robert Koch Institute, 13353 Berlin, Germany
  - Data Analytics and Computational Statistics, Hasso Plattner Institute for Digital Engineering, 14482 Potsdam, Brandenburg, Germany
  - Digital Engineering Faculty, University of Potsdam, 14482 Potsdam, Brandenburg, Germany
9
Lu C, Cai Z, Zou Y, Zhang Z, Chen W, Deng L, Du X, Wu A, Yang L, Wang D, Shu Y, Jiang T, Peng Y. FluPhenotype-a one-stop platform for early warnings of the influenza A virus. Bioinformatics 2020; 36:3251-3253. [PMID: 32049310 DOI: 10.1093/bioinformatics/btaa083] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.5] [Received: 07/03/2019] [Revised: 01/06/2020] [Accepted: 02/04/2020] [Indexed: 11/12/2022]
Abstract
MOTIVATION: Newly emerging influenza viruses keep challenging global public health. To evaluate the potential risk of the viruses, it is critical to rapidly determine the phenotypes of the viruses, including the antigenicity, host, virulence and drug resistance. RESULTS: Here, we built FluPhenotype, a one-stop platform to rapidly determine the phenotypes of the influenza A viruses. The input of FluPhenotype is the complete or partial genomic/protein sequences of the influenza A viruses. The output presents five types of information about the viruses: (i) sequence annotation including the gene and protein names as well as the open reading frames, (ii) potential hosts and human-adaptation-associated amino acid markers, (iii) antigenic and genetic relationships with the vaccine strains of different HA subtypes, (iv) mammalian virulence-related amino acid markers and (v) drug resistance-related amino acid markers. FluPhenotype will be a useful bioinformatic tool for surveillance and early warnings of the newly emerging influenza A viruses. AVAILABILITY AND IMPLEMENTATION: It is publicly available from: http://www.computationalbiology.cn:18888/IVEW. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Congyu Lu
  - College of Biology, Hunan Provincial Key Laboratory of Medical Virology, Hunan University, Changsha 410082, China
- Zena Cai
  - College of Biology, Hunan Provincial Key Laboratory of Medical Virology, Hunan University, Changsha 410082, China
- Zheng Zhang
  - College of Biology, Hunan Provincial Key Laboratory of Medical Virology, Hunan University, Changsha 410082, China
- Wenjun Chen
  - College of Biology, Hunan Provincial Key Laboratory of Medical Virology, Hunan University, Changsha 410082, China
- Lizong Deng
  - Center for Systems Medicine, Institute of Basic Medical Sciences, Chinese Academy of Medical Sciences & Peking Union Medical College, Beijing 100005, China
  - Suzhou Institute of Systems Medicine, Suzhou 215123, China
- Xiangjun Du
  - School of Public Health (Shenzhen), Sun Yat-sen University, Guangdong 510275, China
- Aiping Wu
  - Center for Systems Medicine, Institute of Basic Medical Sciences, Chinese Academy of Medical Sciences & Peking Union Medical College, Beijing 100005, China
  - Suzhou Institute of Systems Medicine, Suzhou 215123, China
- Lei Yang
  - National Institute for Viral Disease Control and Prevention, China CDC, Beijing, China
- Dayan Wang
  - National Institute for Viral Disease Control and Prevention, China CDC, Beijing, China
- Yuelong Shu
  - School of Public Health (Shenzhen), Sun Yat-sen University, Guangdong 510275, China
- Taijiao Jiang
  - Center for Systems Medicine, Institute of Basic Medical Sciences, Chinese Academy of Medical Sciences & Peking Union Medical College, Beijing 100005, China
  - Suzhou Institute of Systems Medicine, Suzhou 215123, China
- Yousong Peng
  - College of Biology, Hunan Provincial Key Laboratory of Medical Virology, Hunan University, Changsha 410082, China
10
Kogay R, Neely TB, Birnbaum DP, Hankel CR, Shakya M, Zhaxybayeva O. Machine-Learning Classification Suggests That Many Alphaproteobacterial Prophages May Instead Be Gene Transfer Agents. Genome Biol Evol 2020; 11:2941-2953. [PMID: 31560374 PMCID: PMC6821227 DOI: 10.1093/gbe/evz206] [Citation(s) in RCA: 18] [Impact Index Per Article: 4.5] [Accepted: 09/25/2019] [Indexed: 12/20/2022]
Abstract
Many of the sequenced bacterial and archaeal genomes encode regions of viral provenance. Yet, not all of these regions encode bona fide viruses. Gene transfer agents (GTAs) are thought to be former viruses that are now maintained in genomes of some bacteria and archaea and are hypothesized to enable exchange of DNA within bacterial populations. In Alphaproteobacteria, genes homologous to the "head-tail" gene cluster that encodes structural components of the Rhodobacter capsulatus GTA (RcGTA) are found in many taxa, even if they are only distantly related to Rhodobacter capsulatus. Yet, in most genomes available in GenBank RcGTA-like genes have annotations of typical viral proteins, and therefore are not easily distinguished from their viral homologs without additional analyses. Here, we report a "support vector machine" classifier that quickly and accurately distinguishes RcGTA-like genes from their viral homologs by capturing the differences in the amino acid composition of the encoded proteins. Our open-source classifier is implemented in Python and can be used to scan homologs of the RcGTA genes in newly sequenced genomes. The classifier can also be trained to identify other types of GTAs, or even to detect other elements of viral ancestry. Using the classifier trained on a manually curated set of homologous viruses and GTAs, we detected RcGTA-like "head-tail" gene clusters in 57.5% of the 1,423 examined alphaproteobacterial genomes. We also demonstrated that more than half of the in silico prophage predictions are instead likely to be GTAs, suggesting that in many alphaproteobacterial genomes the RcGTA-like elements remain unrecognized.
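The general idea of classifying proteins by amino acid composition with a support vector machine can be sketched as follows. This is a minimal illustration only, not the authors' classifier: it uses a linear SVM trained by batch subgradient descent on the hinge loss, and the function names, hyperparameters (`lam`, `lr`, `epochs`), toy sequences, and labels are all hypothetical. The abstract does not state the kernel or training settings actually used.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def amino_acid_composition(protein):
    """Return the fraction of each of the 20 standard residues in `protein`."""
    protein = protein.upper()
    counts = [protein.count(aa) for aa in AMINO_ACIDS]
    total = sum(counts)
    return [c / total if total else 0.0 for c in counts]


def train_linear_svm(X, y, lam=0.01, lr=0.1, epochs=500):
    """Fit weights w and bias b by batch subgradient descent on the hinge loss."""
    n, dim = len(X), len(X[0])
    w, b = [0.0] * dim, 0.0
    for _ in range(epochs):
        grad_w = [lam * wj for wj in w]  # gradient of the L2 penalty
        grad_b = 0.0
        for xi, yi in zip(X, y):
            margin = yi * (sum(wj * xj for wj, xj in zip(w, xi)) + b)
            if margin < 1:  # only margin violators contribute to the hinge term
                for j in range(dim):
                    grad_w[j] -= yi * xi[j] / n
                grad_b -= yi / n
        w = [wj - lr * gj for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b


def predict(w, b, x):
    """Return +1 or -1 depending on the side of the separating hyperplane."""
    score = sum(wj * xj for wj, xj in zip(w, x)) + b
    return 1 if score >= 0 else -1


# Invented toy sequences, not real GTA or viral proteins.
toy_proteins = ["AAGAAGAGAA", "GAAGGAAAGA", "KKNKKNKNKK", "NKKNNKKKNK"]
toy_labels = [1, 1, -1, -1]  # +1 = "GTA-like", -1 = "viral" (hypothetical labels)
X = [amino_acid_composition(p) for p in toy_proteins]
w, b = train_linear_svm(X, toy_labels)
```

A new homolog would be scored with `predict(w, b, amino_acid_composition(sequence))`. In practice a tested library implementation with cross-validated parameters would be preferable to this hand-rolled trainer.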
Affiliation(s)
- Roman Kogay
  - Department of Biological Sciences, Dartmouth College, Hanover, New Hampshire
- Taylor B Neely
  - Department of Biological Sciences, Dartmouth College, Hanover, New Hampshire
  - Amazon.com Inc., Seattle, WA
- Daniel P Birnbaum
  - Department of Biological Sciences, Dartmouth College, Hanover, New Hampshire
  - School of Engineering and Applied Sciences, Harvard University, Cambridge, MA
- Camille R Hankel
  - Department of Biological Sciences, Dartmouth College, Hanover, New Hampshire
  - Department of Earth and Planetary Sciences, Harvard University, Cambridge, MA
- Migun Shakya
  - Department of Biological Sciences, Dartmouth College, Hanover, New Hampshire
  - Bioscience Division, Los Alamos National Laboratory, Los Alamos, NM
- Olga Zhaxybayeva
  - Department of Biological Sciences, Dartmouth College, Hanover, New Hampshire
  - Department of Computer Science, Dartmouth College, Hanover, New Hampshire
11
Zhang Z, Cai Z, Tan Z, Lu C, Jiang T, Zhang G, Peng Y. Rapid identification of human-infecting viruses. Transbound Emerg Dis 2019; 66:2517-2522. [PMID: 31373773 PMCID: PMC7168554 DOI: 10.1111/tbed.13314] [Citation(s) in RCA: 19] [Impact Index Per Article: 3.8] [Received: 05/10/2019] [Revised: 07/01/2019] [Accepted: 07/26/2019] [Indexed: 01/08/2023]
Abstract
Viruses have caused much mortality and morbidity to humans and pose a serious threat to global public health. The virome with the potential of human infection is still far from complete. Novel viruses have been discovered at an unprecedented pace with the rapid development of viral metagenomics. However, there is still a lack of methodology for rapidly identifying novel viruses with the potential of human infection. This study built several machine learning models to discriminate human-infecting viruses from other viruses based on the frequency of k-mers in the viral genomic sequences. The k-nearest neighbor (KNN) model can predict the human-infecting viruses with an accuracy of over 90%. The performance of this KNN model built on the short contigs (≥1 kb) is comparable to those built on the viral genomes. We used a reported human blood virome to further validate this KNN model with an accuracy of over 80% based on very short raw reads (150 bp). Our work demonstrates a conceptual and generic protocol for the discovery of novel human-infecting viruses in viral metagenomics studies.
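The k-mer frequency plus k-nearest-neighbor approach described in this abstract can be sketched roughly as below. This is an illustrative sketch, not the study's pipeline: the choice of `k`, the Euclidean distance, the number of neighbors, the function names, and the toy sequences and labels are all assumptions introduced here.

```python
from collections import Counter
from itertools import product
import math


def kmer_frequencies(seq, k=3):
    """Return the normalized frequency of every ACGT k-mer in `seq`."""
    kmers = ["".join(p) for p in product("ACGT", repeat=k)]
    seq = seq.upper()
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    # Windows containing ambiguous bases (e.g. N) are excluded from the total.
    total = sum(counts[km] for km in kmers)
    return [counts[km] / total if total else 0.0 for km in kmers]


def knn_predict(train, query, n_neighbors=3):
    """Majority vote among the `n_neighbors` training vectors closest to `query`.

    `train` is a list of (vector, label) pairs; distance is Euclidean.
    """
    ranked = sorted(train, key=lambda item: math.dist(item[0], query))
    votes = Counter(label for _, label in ranked[:n_neighbors])
    return votes.most_common(1)[0][0]


# Invented toy sequences and labels, for illustration only.
toy_train = [
    (kmer_frequencies("GCGCGCGCGCGC", k=2), "human-infecting"),
    (kmer_frequencies("GGCCGGCCGGCC", k=2), "human-infecting"),
    (kmer_frequencies("ATATATATATAT", k=2), "other"),
    (kmer_frequencies("AATTAATTAATT", k=2), "other"),
]
toy_query = kmer_frequencies("GCGCGGCCGCGC", k=2)
toy_prediction = knn_predict(toy_train, toy_query, n_neighbors=3)
```

Because the feature vector depends only on k-mer counts, the same functions apply unchanged to full genomes, contigs, or short reads, which is the property the abstract relies on when validating with 150 bp reads.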
Affiliation(s)
- Zheng Zhang
  - College of Biology, Hunan University, Changsha, China
- Zena Cai
  - College of Biology, Hunan University, Changsha, China
- Zhiying Tan
  - College of Computer Science and Electronic Engineering, Hunan University, Changsha, China
- Congyu Lu
  - College of Biology, Hunan University, Changsha, China
- Taijiao Jiang
  - Suzhou Institute of Systems Medicine, Suzhou, China
  - Center of System Medicine, Institute of Basic Medical Sciences, Chinese Academy of Medical Sciences & Peking Union Medical College, Beijing, China
- Gaihua Zhang
  - College of Life Sciences, Hunan Normal University, Changsha, China
12
Application of Support Vector Machines in Viral Biology. GLOBAL VIROLOGY III: VIROLOGY IN THE 21ST CENTURY 2019. [PMCID: PMC7114997 DOI: 10.1007/978-3-030-29022-1_12] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/13/2022]
Abstract
Novel experimental and sequencing techniques have led to an exponential explosion and spiraling of data in viral genomics. To analyse such data, rapidly gain information, and transform this information to knowledge, interdisciplinary approaches involving several different types of expertise are necessary. Machine learning has been in the forefront of providing models with increasing accuracy due to development of newer paradigms with strong fundamental bases. Support Vector Machines (SVM) is one such robust tool, based rigorously on statistical learning theory. SVM provides very high quality and robust solutions to classification and regression problems. Several studies in virology employ high performance tools including SVM for identification of potentially important gene and protein functions. This is mainly due to the highly beneficial aspects of SVM. In this chapter we briefly provide lucid and easy to understand details of SVM algorithms along with applications in virology.