1
|
Kumar N, Srivastava R. Deep learning in structural bioinformatics: current applications and future perspectives. Brief Bioinform 2024; 25:bbae042. [PMID: 38701422 PMCID: PMC11066934 DOI: 10.1093/bib/bbae042] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/17/2023] [Revised: 01/05/2024] [Accepted: 01/18/2024] [Indexed: 05/05/2024] Open
Abstract
In this review article, we explore the transformative impact of deep learning (DL) on structural bioinformatics, emphasizing its pivotal role in a scientific revolution driven by extensive data, accessible toolkits and robust computing resources. As big data continue to advance, DL is poised to become an integral component in healthcare and biology, revolutionizing analytical processes. Our comprehensive review provides detailed insights into DL, featuring specific demonstrations of its notable applications in bioinformatics. We address challenges tailored for DL, spotlight recent successes in structural bioinformatics and present a clear exposition of DL, from basic shallow neural networks to advanced models such as convolutional, recurrent, artificial and transformer neural networks. This paper discusses the emerging use of DL for understanding biomolecular structures, anticipating ongoing developments and applications in the realm of structural bioinformatics.
Affiliation(s)
- Niranjan Kumar
- School of Computational and Integrative Sciences, Jawaharlal Nehru University, New Delhi, India
| | - Rakesh Srivastava
- Center for Computational Natural Sciences and Bioinformatics, International Institute of Information Technology, Hyderabad, India
| |
|
2
|
Liu GY, Yu D, Fan MM, Zhang X, Jin ZY, Tang C, Liu XF. Antimicrobial resistance crisis: could artificial intelligence be the solution? Mil Med Res 2024; 11:7. [PMID: 38254241 PMCID: PMC10804841 DOI: 10.1186/s40779-024-00510-1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/18/2023] [Accepted: 01/08/2024] [Indexed: 01/24/2024] Open
Abstract
Antimicrobial resistance is a global public health threat, and the World Health Organization (WHO) has announced a priority list of the most threatening pathogens against which novel antibiotics need to be developed. The discovery and introduction of novel antibiotics are time-consuming and expensive. According to WHO's report of antibacterial agents in clinical development, only 18 novel antibiotics have been approved since 2014. Therefore, novel antibiotics are critically needed. Artificial intelligence (AI) has been rapidly applied to drug development since its recent technical breakthrough and has dramatically improved the efficiency of the discovery of novel antibiotics. Here, we first summarized recently marketed novel antibiotics, and antibiotic candidates in clinical development. In addition, we systematically reviewed the involvement of AI in antibacterial drug development and utilization, including small molecules, antimicrobial peptides, phage therapy, essential oils, as well as resistance mechanism prediction, and antibiotic stewardship.
Affiliation(s)
- Guang-Yu Liu
- Department of Immunology and Pathogen Biology, School of Basic Medical Sciences, Hangzhou Normal University, Key Laboratory of Aging and Cancer Biology of Zhejiang Province, Key Laboratory of Inflammation and Immunoregulation of Hangzhou, Hangzhou Normal University, Hangzhou, 311121, China
| | - Dan Yu
- National Key Discipline of Pediatrics Key Laboratory of Major Diseases in Children Ministry of Education, Laboratory of Dermatology, Beijing Pediatric Research Institute, Beijing Children's Hospital, Capital Medical University, National Center for Children's Health, Beijing, 100045, China
| | - Mei-Mei Fan
- Department of Immunology and Pathogen Biology, School of Basic Medical Sciences, Hangzhou Normal University, Key Laboratory of Aging and Cancer Biology of Zhejiang Province, Key Laboratory of Inflammation and Immunoregulation of Hangzhou, Hangzhou Normal University, Hangzhou, 311121, China
| | - Xu Zhang
- Robert and Arlene Kogod Center on Aging, Mayo Clinic, Rochester, MN, 55905, USA
- Department of Biochemistry and Molecular Biology, Mayo Clinic, Rochester, MN, 55905, USA
| | - Ze-Yu Jin
- Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX, 77030, USA
| | - Christoph Tang
- Sir William Dunn School of Pathology, University of Oxford, Oxford, OX1 3RE, UK.
| | - Xiao-Fen Liu
- Institute of Antibiotics, Huashan Hospital, Fudan University, Key Laboratory of Clinical Pharmacology of Antibiotics, National Health Commission of the People's Republic of China, National Clinical Research Centre for Aging and Medicine, Huashan Hospital, Fudan University, Shanghai, 200040, China.
| |
|
3
|
Cobián Güemes AG, Le T, Rojas MI, Jacobson NE, Villela H, McNair K, Hung SH, Han L, Boling L, Octavio JC, Dominguez L, Cantú VA, Archdeacon S, Vega AA, An MA, Hajama H, Burkeen G, Edwards RA, Conrad DJ, Rohwer F, Segall AM. Compounding Achromobacter Phages for Therapeutic Applications. Viruses 2023; 15:1665. [PMID: 37632008 PMCID: PMC10457797 DOI: 10.3390/v15081665] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 06/09/2023] [Revised: 07/27/2023] [Accepted: 07/27/2023] [Indexed: 08/27/2023] Open
Abstract
Achromobacter species colonization of Cystic Fibrosis respiratory airways is an increasing concern. Two adult patients with Cystic Fibrosis colonized by Achromobacter xylosoxidans CF418 or Achromobacter ruhlandii CF116 experienced fatal exacerbations. Achromobacter spp. are naturally resistant to several antibiotics. Therefore, phages could be valuable as therapeutics for the control of Achromobacter. In this study, thirteen lytic phages were isolated and characterized at the morphological and genomic levels for potential future use in phage therapy. They are presented here as the Achromobacter Kumeyaay phage collection. Six distinct Achromobacter phage genome clusters were identified based on a comprehensive phylogenetic analysis of the Kumeyaay collection as well as the publicly available Achromobacter phages. The infectivity of all phages in the Kumeyaay collection was tested in 23 Achromobacter clinical isolates; 78% of these isolates were lysed by at least one phage. A cryptic prophage was induced in Achromobacter xylosoxidans CF418 when infected with some of the lytic phages. This prophage genome was characterized and is presented as Achromobacter phage CF418-P1. Prophage induction during lytic phage preparation for therapy interventions requires further exploration. Large-scale production of phages and removal of endotoxins using an octanol-based procedure resulted in a phage concentrate of 1 × 10^9 plaque-forming units per milliliter with an endotoxin concentration of 65 endotoxin units per milliliter, which is below the Food and Drug Administration recommended maximum threshold for human administration. This study provides a comprehensive framework for the isolation, bioinformatic characterization, and safe production of phages to kill Achromobacter spp. in order to potentially manage Cystic Fibrosis (CF) pulmonary infections.
Affiliation(s)
- Ana Georgina Cobián Güemes
- Department of Biology, Viral Information Institute, San Diego State University, San Diego, CA 92182, USA
| | - Tram Le
- Department of Biology, Viral Information Institute, San Diego State University, San Diego, CA 92182, USA
| | - Maria Isabel Rojas
- Department of Biology, Viral Information Institute, San Diego State University, San Diego, CA 92182, USA
| | - Nicole E. Jacobson
- Department of Biology, Viral Information Institute, San Diego State University, San Diego, CA 92182, USA
| | - Helena Villela
- Department of Biology, Viral Information Institute, San Diego State University, San Diego, CA 92182, USA
- Marine Microbiomes Lab, Red Sea Research Center, King Abdullah University of Science and Technology, Building 2, Level 3, Room 3216 WS03, Thuwal 23955-6900, Saudi Arabia
| | - Katelyn McNair
- Computational Sciences Research Center, San Diego State University, San Diego, CA 92182, USA
| | - Shr-Hau Hung
- Department of Biology, Viral Information Institute, San Diego State University, San Diego, CA 92182, USA
| | - Lili Han
- Department of Biology, Viral Information Institute, San Diego State University, San Diego, CA 92182, USA
- Research Centre for Eco-Environmental Sciences, Chinese Academy of Sciences, Beijing 100085, China
| | - Lance Boling
- Department of Biology, Viral Information Institute, San Diego State University, San Diego, CA 92182, USA
| | - Jessica Claire Octavio
- Department of Biology, Viral Information Institute, San Diego State University, San Diego, CA 92182, USA
| | - Lorena Dominguez
- Department of Biology, Viral Information Institute, San Diego State University, San Diego, CA 92182, USA
| | - Vito Adrian Cantú
- Computational Sciences Research Center, San Diego State University, San Diego, CA 92182, USA
| | - Sinéad Archdeacon
- College of Biological Sciences, University of California Davis, Davis, CA 95616, USA
| | - Alejandro A. Vega
- Department of Biology, Viral Information Institute, San Diego State University, San Diego, CA 92182, USA
- David Geffen School of Medicine, University of California Los Angeles, Los Angeles, CA 90025, USA
| | - Michelle A. An
- Department of Biology, Viral Information Institute, San Diego State University, San Diego, CA 92182, USA
| | - Hamza Hajama
- Department of Biology, Viral Information Institute, San Diego State University, San Diego, CA 92182, USA
| | - Gregory Burkeen
- Department of Biology, Viral Information Institute, San Diego State University, San Diego, CA 92182, USA
| | - Robert A. Edwards
- Department of Biology, Viral Information Institute, San Diego State University, San Diego, CA 92182, USA
- Computational Sciences Research Center, San Diego State University, San Diego, CA 92182, USA
- Flinders Accelerator for Microbiome Exploration, Flinders University, Sturt Road, Bedford Park 5042, Australia
| | - Douglas J. Conrad
- Department of Medicine, Division of Pulmonary, Critical Care and Sleep Medicine, University of California San Diego, San Diego, CA 9500, USA
| | - Forest Rohwer
- Department of Biology, Viral Information Institute, San Diego State University, San Diego, CA 92182, USA
| | - Anca M. Segall
- Department of Biology, Viral Information Institute, San Diego State University, San Diego, CA 92182, USA
- Computational Sciences Research Center, San Diego State University, San Diego, CA 92182, USA
| |
|
4
|
Shang J, Peng C, Tang X, Sun Y. PhaVIP: Phage VIrion Protein classification based on chaos game representation and Vision Transformer. Bioinformatics 2023; 39:i30-i39. [PMID: 37387136 DOI: 10.1093/bioinformatics/btad229] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 07/01/2023] Open
Abstract
MOTIVATION As viruses that mainly infect bacteria, phages are key players across a wide range of ecosystems. Analyzing phage proteins is indispensable for understanding phages' functions and roles in microbiomes. High-throughput sequencing enables us to obtain phages in different microbiomes with low cost. However, compared to the fast accumulation of newly identified phages, phage protein classification remains difficult. In particular, a fundamental need is to annotate virion proteins, the structural proteins, such as major tail, baseplate, etc. Although there are experimental methods for virion protein identification, they are too expensive or time-consuming, leaving a large number of proteins unclassified. Thus, there is a great demand to develop a computational method for fast and accurate phage virion protein (PVP) classification. RESULTS In this work, we adapted the state-of-the-art image classification model, Vision Transformer, to conduct virion protein classification. By encoding protein sequences into unique images using chaos game representation, we can leverage Vision Transformer to learn both local and global features from sequence "images". Our method, PhaVIP, has two main functions: classifying PVP and non-PVP sequences and annotating the types of PVP, such as capsid and tail. We tested PhaVIP on several datasets with increasing difficulty and benchmarked it against alternative tools. The experimental results show that PhaVIP has superior performance. After validating the performance of PhaVIP, we investigated two applications that can use the output of PhaVIP: phage taxonomy classification and phage host prediction. The results showed the benefit of using classified proteins over all proteins. AVAILABILITY AND IMPLEMENTATION The web server of PhaVIP is available via: https://phage.ee.cityu.edu.hk/phavip. The source code of PhaVIP is available via: https://github.com/KennthShang/PhaVIP.
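The chaos game representation named in this abstract can be illustrated with a minimal sketch. The abstract does not give PhaVIP's exact encoding, so everything specific here is an assumption: the 20 amino acids are placed on a regular polygon, each residue moves the current point halfway toward its vertex, and the visited points are binned into a small count grid. The names `cgr_points` and `cgr_image`, the contraction `ratio` and the grid `size` are hypothetical.

```python
import math

# Assumption: the 20 standard amino acids, placed in alphabetical order
# on the unit circle. PhaVIP's actual layout may differ.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
VERTICES = {
    aa: (math.cos(2 * math.pi * i / len(AMINO_ACIDS)),
         math.sin(2 * math.pi * i / len(AMINO_ACIDS)))
    for i, aa in enumerate(AMINO_ACIDS)
}


def cgr_points(seq, ratio=0.5):
    """Return the chaos-game trajectory of a protein sequence.

    Starting at the origin, each residue moves the current point a
    fraction `ratio` of the way toward that residue's vertex.
    Characters outside the 20-letter alphabet are skipped.
    """
    x, y = 0.0, 0.0
    points = []
    for aa in seq.upper():
        if aa not in VERTICES:
            continue
        vx, vy = VERTICES[aa]
        x += ratio * (vx - x)
        y += ratio * (vy - y)
        points.append((x, y))
    return points


def cgr_image(seq, size=32, ratio=0.5):
    """Bin the trajectory into a size x size grid of visit counts."""
    image = [[0] * size for _ in range(size)]
    for x, y in cgr_points(seq, ratio):
        # Map [-1, 1] to pixel indices, clamping the boundary case.
        col = max(0, min(size - 1, int((x + 1) / 2 * size)))
        row = max(0, min(size - 1, int((1 - y) / 2 * size)))
        image[row][col] += 1
    return image
```

A grid produced this way is the kind of fixed-size "image" that an image classifier could consume; how PhaVIP feeds its images to the Vision Transformer is not described in the abstract.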
Affiliation(s)
- Jiayu Shang
- Department of Electrical Engineering, City University of Hong Kong, Hong Kong (SAR), China
| | - Cheng Peng
- Department of Electrical Engineering, City University of Hong Kong, Hong Kong (SAR), China
| | - Xubo Tang
- Department of Electrical Engineering, City University of Hong Kong, Hong Kong (SAR), China
| | - Yanni Sun
- Department of Electrical Engineering, City University of Hong Kong, Hong Kong (SAR), China
| |
|
5
|
Shakir S, Zaidi SSEA, Hashemi FSG, Nyirakanani C, Vanderschuren H. Harnessing plant viruses in the metagenomics era: from the development of infectious clones to applications. TRENDS IN PLANT SCIENCE 2023; 28:297-311. [PMID: 36379846 DOI: 10.1016/j.tplants.2022.10.005] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 04/19/2022] [Revised: 10/17/2022] [Accepted: 10/20/2022] [Indexed: 06/16/2023]
Abstract
Recent metagenomic studies which focused on virus characterization in the entire plant environment have revealed a remarkable viral diversity in plants. The exponential discovery of viruses also requires the concomitant implementation of high-throughput methods to perform their functional characterization. Despite several limitations, the development of viral infectious clones remains a method of choice to understand virus biology, their role in the phytobiome, and plant resilience. Here, we review the latest approaches for efficient characterization of plant viruses and technical advances built on high-throughput sequencing and synthetic biology to streamline assembly of viral infectious clones. We then discuss the applications of plant viral vectors in fundamental and applied plant research as well as their technical and regulatory limitations, and we propose strategies for their safer field applications.
Affiliation(s)
- Sara Shakir
- Plant Genetics and Rhizosphere Processes Laboratory, TERRA Teaching and Research Center, Gembloux Agro-Bio Tech, University of Liège, Gembloux, Belgium.
| | - Syed Shan-E-Ali Zaidi
- Plant Genetics and Rhizosphere Processes Laboratory, TERRA Teaching and Research Center, Gembloux Agro-Bio Tech, University of Liège, Gembloux, Belgium
| | - Farahnaz Sadat Golestan Hashemi
- Plant Genetics and Rhizosphere Processes Laboratory, TERRA Teaching and Research Center, Gembloux Agro-Bio Tech, University of Liège, Gembloux, Belgium
| | - Chantal Nyirakanani
- Plant Genetics and Rhizosphere Processes Laboratory, TERRA Teaching and Research Center, Gembloux Agro-Bio Tech, University of Liège, Gembloux, Belgium; Department of Crop Science, School of Agriculture, University of Rwanda, Musanze, Rwanda
| | - Hervé Vanderschuren
- Plant Genetics and Rhizosphere Processes Laboratory, TERRA Teaching and Research Center, Gembloux Agro-Bio Tech, University of Liège, Gembloux, Belgium; Laboratory of Tropical Crop Improvement, Division of Crop Biotechnics, Biosystems Department, KU Leuven, Leuven, Belgium.
| |
|
6
|
Zhou F, Yang H, Si Y, Gan R, Yu L, Chen C, Ren C, Wu J, Zhang F. PhageTailFinder: A tool for phage tail module detection and annotation. Front Genet 2023; 14:947466. [PMID: 36755570 PMCID: PMC9901426 DOI: 10.3389/fgene.2023.947466] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/18/2022] [Accepted: 01/05/2023] [Indexed: 01/24/2023] Open
Abstract
Decades of overconsumption of antimicrobials in the treatment and prevention of bacterial infections have resulted in the increasing emergence of drug-resistant bacteria, which poses a significant challenge to public health, driving the urgent need to find alternatives to conventional antibiotics. Bacteriophages are viruses infecting specific bacterial hosts, often destroying the infected bacterial hosts. Phages attach to and enter their potential hosts using their tail proteins, with the composition of the tail determining the range of potentially infected bacteria. To aid the exploitation of bacteriophages for therapeutic purposes, we developed the PhageTailFinder algorithm to predict tail-related proteins and identify the putative tail module in previously uncharacterized phages. The PhageTailFinder relies on a two-state hidden Markov model (HMM) to predict the probability of a given protein being tail-related. The process takes into account the natural modularity of phage tail-related proteins, rather than simply considering amino acid properties or secondary structures for each protein in isolation. The PhageTailFinder exhibited robust predictive power for phage tail proteins in novel phages due to this sequence-independent operation. The performance of the prediction model was evaluated in 13 extensively studied phages and a sample of 992 complete phages from the NCBI database. The algorithm achieved a high true-positive prediction rate (>80%) in over half (571) of the studied phages, and the ROC value was 0.877 using general models and 0.968 using corresponding morphologic models. It is notable that the median ROC value of 992 complete phages is more than 0.75 even for novel phages, indicating the high accuracy and specificity of the PhageTailFinder. When applied to a dataset containing 189,680 viral genomes derived from 11,810 bulk metagenomic human stool samples, the ROC value was 0.895. 
In addition, tail protein clusters could be identified for further studies using the density-based spatial clustering of applications with noise (DBSCAN) algorithm. The developed PhageTailFinder tool can be accessed either as a web server (http://www.microbiome-bigdata.com/PHISDetector/index/tools/PhageTailFinder) or as a stand-alone program on a standard desktop computer (https://github.com/HIT-ImmunologyLab/PhageTailFinder).
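To illustrate how a two-state hidden Markov model can use gene neighbourhood rather than scoring each protein in isolation, the following is a minimal forward-backward sketch. It is a generic textbook computation, not PhageTailFinder's implementation: the function name `posterior_tail`, the transition probability `p_stay`, the prior `prior_tail` and the per-gene emission likelihoods are all hypothetical, since the abstract does not specify them.

```python
def posterior_tail(emissions, p_stay=0.9, prior_tail=0.5):
    """Posterior probability that each gene is tail-related.

    emissions: list of (P(obs | tail), P(obs | non-tail)) pairs, one per
    gene in genome order; values are assumed to be strictly positive.
    p_stay: assumed probability of staying in the same state from one
    gene to the next, which encodes the modularity of tail genes.
    """
    n = len(emissions)
    if n == 0:
        return []
    # State 0 = tail, state 1 = non-tail.
    trans = [[p_stay, 1 - p_stay], [1 - p_stay, p_stay]]
    init = [prior_tail, 1 - prior_tail]

    # Forward pass, normalised at every step to avoid underflow.
    fwd = []
    for t, e in enumerate(emissions):
        if t == 0:
            cur = [init[s] * e[s] for s in (0, 1)]
        else:
            prev = fwd[-1]
            cur = [e[s] * sum(prev[r] * trans[r][s] for r in (0, 1))
                   for s in (0, 1)]
        z = sum(cur)
        fwd.append([c / z for c in cur])

    # Backward pass, also normalised.
    bwd = [[1.0, 1.0] for _ in range(n)]
    for t in range(n - 2, -1, -1):
        nxt = [sum(trans[s][r] * emissions[t + 1][r] * bwd[t + 1][r]
                   for r in (0, 1))
               for s in (0, 1)]
        z = sum(nxt)
        bwd[t] = [v / z for v in nxt]

    # Combine both passes into per-gene posteriors for the tail state.
    post = []
    for f, b in zip(fwd, bwd):
        tail, other = f[0] * b[0], f[1] * b[1]
        post.append(tail / (tail + other))
    return post
```

With a high `p_stay`, a gene whose own evidence is ambiguous is pulled toward the state of its neighbours, which is one way the modularity described in the abstract could enter the prediction.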
Affiliation(s)
- Fengxia Zhou
- HIT Center for Life Sciences, School of Life Science and Technology, Harbin Institute of Technology, Harbin, China
| | - Han Yang
- HIT Center for Life Sciences, School of Life Science and Technology, Harbin Institute of Technology, Harbin, China
| | - Yu Si
- HIT Center for Life Sciences, School of Life Science and Technology, Harbin Institute of Technology, Harbin, China
| | - Rui Gan
- HIT Center for Life Sciences, School of Life Science and Technology, Harbin Institute of Technology, Harbin, China
| | - Ling Yu
- HIT Center for Life Sciences, School of Life Science and Technology, Harbin Institute of Technology, Harbin, China
| | - Chuangeng Chen
- HIT Center for Life Sciences, School of Life Science and Technology, Harbin Institute of Technology, Harbin, China
| | - Chunyan Ren
- Department of Hematology, Department of Oncology, Boston Children's Hospital, Harvard Medical School, Boston, MA, United States
| | - Jiqiu Wu
- Department of Genetics, University Medical Center Groningen, University of Groningen, Groningen, Netherlands
| | - Fan Zhang
- HIT Center for Life Sciences, School of Life Science and Technology, Harbin Institute of Technology, Harbin, China
- Anhui Province Key Laboratory of Medical Physics and Technology, Institute of Health and Medical Technology, Hefei Institutes of Physical Science, Chinese Academy of Sciences, Hefei, China
| |
|
7
|
Bajiya N, Dhall A, Aggarwal S, Raghava GPS. Advances in the field of phage-based therapy with special emphasis on computational resources. Brief Bioinform 2023; 24:6961791. [PMID: 36575815 DOI: 10.1093/bib/bbac574] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Received: 07/28/2022] [Revised: 11/07/2022] [Accepted: 11/25/2022] [Indexed: 12/29/2022] Open
Abstract
In the current era, one of the major challenges is to manage the treatment of drug/antibiotic-resistant strains of bacteria. Phage therapy, a century-old technique, may serve as an alternative to antibiotics in treating bacterial infections caused by drug-resistant strains of bacteria. In this review, a systematic attempt has been made to summarize phage-based therapy in depth. This review has been divided into the following two sections: general information and computer-aided phage therapy (CAPT). In the case of general information, we cover the history of phage therapy, the mechanism of action, the status of phage-based products (approved and clinical trials) and the challenges. This review emphasizes CAPT, where we cover primary phage-associated resources, phage prediction methods and pipelines. This review covers a wide range of databases and resources, including viral genomes and proteins, phage receptors, host genomes of phages, phage-host interactions and lytic proteins. In the post-genomic era, identifying the most suitable phage for lysing a drug-resistant strain of bacterium is crucial for developing alternative treatments for drug-resistant bacteria and this remains a challenging problem. Thus, we compile all phage-associated prediction methods that include the prediction of phages for a bacterial strain, the host for a phage and the identification of interacting phage-host pairs. Most of these methods have been developed using machine learning and deep learning techniques. This review also discusses recent advances in the field of CAPT, where we briefly describe computational tools available for predicting phage virions, the life cycle of phages and prophage identification. Finally, we describe phage-based therapy's advantages, challenges and opportunities.
Affiliation(s)
- Nisha Bajiya
- Department of Computational Biology, Indraprastha Institute of Information Technology, Okhla Phase 3, New Delhi, 110020, India
| | - Anjali Dhall
- Department of Computational Biology, Indraprastha Institute of Information Technology, Okhla Phase 3, New Delhi, 110020, India
| | - Suchet Aggarwal
- Department of Computer Science and Engineering, Indraprastha Institute of Information Technology, Okhla Phase 3, New Delhi, 110020, India
| | - Gajendra P S Raghava
- Department of Computational Biology, Indraprastha Institute of Information Technology, Okhla Phase 3, New Delhi, 110020, India
| |
|
8
|
Braga LPP, Orland C, Emilson EJS, Fitch AA, Osterholz H, Dittmar T, Basiliko N, Mykytczuk NCS, Tanentzap AJ. Viruses direct carbon cycling in lake sediments under global change. Proc Natl Acad Sci U S A 2022; 119:e2202261119. [PMID: 36206369 PMCID: PMC9564219 DOI: 10.1073/pnas.2202261119] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/08/2022] [Accepted: 08/18/2022] [Indexed: 11/18/2022] Open
Abstract
Global change is altering the vast amount of carbon cycled by microbes between land and freshwater, but how viruses mediate this process is poorly understood. Here, we show that viruses direct carbon cycling in lake sediments, and these impacts intensify with future changes in water clarity and terrestrial organic matter (tOM) inputs. Using experimental tOM gradients within sediments of a clear and a dark boreal lake, we identified 156 viral operational taxonomic units (vOTUs), of which 21% strongly increased with abundances of key bacteria and archaea, identified via metagenome-assembled genomes (MAGs). MAGs included the most abundant prokaryotes, which were themselves associated with dissolved organic matter (DOM) composition and greenhouse gas (GHG) concentrations. Increased abundances of virus-like particles were separately associated with reduced bacterial metabolism and with shifts in DOM toward amino sugars, likely released by cell lysis rather than higher molecular mass compounds accumulating from reduced tOM degradation. An additional 9.6% of vOTUs harbored auxiliary metabolic genes associated with DOM and GHGs. Taken together, these different effects on host dynamics and metabolism can explain why abundances of vOTUs rather than MAGs were better overall predictors of carbon cycling. Future increases in tOM quantity, but not quality, will change viral composition and function with consequences for DOM pools. Given their importance, viruses must now be explicitly considered in efforts to understand and predict the freshwater carbon cycle and its future under global environmental change.
Affiliation(s)
- Lucas P. P. Braga
- Ecosystems and Global Change Group, Department of Plant Sciences, University of Cambridge, Cambridge CB2 3EA, United Kingdom
- Institute of Chemistry, University of Sao Paulo, Sao Paulo 05508-900, Brazil
| | - Chloé Orland
- Ecosystems and Global Change Group, Department of Plant Sciences, University of Cambridge, Cambridge CB2 3EA, United Kingdom
| | - Erik J. S. Emilson
- Ecosystems and Global Change Group, Department of Plant Sciences, University of Cambridge, Cambridge CB2 3EA, United Kingdom
| | - Amelia A. Fitch
- Ecosystems and Global Change Group, Department of Plant Sciences, University of Cambridge, Cambridge CB2 3EA, United Kingdom
| | - Helena Osterholz
- Institute for Chemistry and Biology of the Marine Environment and Helmholtz Institute for Functional Marine Biodiversity, University of Oldenburg, 26129 Oldenburg, Germany
| | - Thorsten Dittmar
- Institute for Chemistry and Biology of the Marine Environment and Helmholtz Institute for Functional Marine Biodiversity, University of Oldenburg, 26129 Oldenburg, Germany
| | - Nathan Basiliko
- Vale Living with Lakes Centre, Laurentian University, Sudbury, ON P3E2C6, Canada
| | | | - Andrew J. Tanentzap
- Ecosystems and Global Change Group, Department of Plant Sciences, University of Cambridge, Cambridge CB2 3EA, United Kingdom
| |
|
9
|
Microbiome-phage interactions in inflammatory bowel disease. Clin Microbiol Infect 2022:S1198-743X(22)00506-7. [PMID: 36191844 DOI: 10.1016/j.cmi.2022.08.027] [Citation(s) in RCA: 10] [Impact Index Per Article: 5.0] [Received: 07/11/2022] [Revised: 08/23/2022] [Accepted: 08/29/2022] [Indexed: 11/22/2022]
Abstract
BACKGROUND Inflammatory bowel diseases (IBD) constitute a group of auto-inflammatory disorders impacting the gastrointestinal tract and other systemic organs. The gut microbiome contributes to IBD pathology through multiple mechanisms. Bacteriophages (hereafter termed phages) are viruses that are able to specifically infect bacteria. Considered as part of the gut microbiome, phages may impact bacterial community structure in various clinical contexts. Additionally, exogenous phage administration may represent a means of suppressing IBD-associated pathobionts, yet utilization of phage therapy remains at an early developmental phase. OBJECTIVES Herein, we summarize the latest advances in understanding endogenous phage impacts on the gut microbiome in health and in IBD. We highlight the prospect of phage utilization as a targeted mode of pathobiont eradication, in preventing and treating IBD manifestations and complications. SOURCES Selected peer-reviewed publications regarding the role of phages in health and in IBD, published between 2013 and 2022. CONTENT The human gut microbiome is increasingly suggested to play a significant role in the onset and progression of multiple non-communicable diseases such as IBD. Several studies suggest that this effect may be mediated by discrete disease-contributing commensals. However, eradication of such pathogenic bacteria remains a daunting unmet task. Altered community structure in IBD may be influenced by blooms of phages within the gut bacterial ecosystem. Moreover, combinations of phages specifically targeting disease-contributing pathobiont strain clades may be harnessed as a potential eradication treatment for preventing and treating IBD, while bearing minimal adverse impacts on the surrounding bacterial microbiome. IMPLICATIONS Understanding endogenous phage-gut commensal interactions in health and in IBD may enable phage utilization in precision gut microbiome editing, towards treating IBD and other non-communicable microbiome-associated diseases. Nevertheless, developing phage combination-mediated IBD pathobiont eradication treatment modalities will likely necessitate better strain-level bacterial target identification and resolution of treatment-related challenges, such as phage delivery, off-target effects, and bacterial resistance.
|
10
|
Fang Z, Feng T, Zhou H, Chen M. DeePVP: Identification and classification of phage virion proteins using deep learning. Gigascience 2022; 11:giac076. [PMID: 35950840 PMCID: PMC9366990 DOI: 10.1093/gigascience/giac076] [Citation(s) in RCA: 7] [Impact Index Per Article: 3.5] [Received: 02/08/2022] [Revised: 06/08/2022] [Accepted: 07/11/2022] [Indexed: 01/04/2023] Open
Abstract
BACKGROUND Many biological properties of phages are determined by phage virion proteins (PVPs), and the poor annotation of PVPs is a bottleneck for many areas of viral research, such as viral phylogenetic analysis, viral host identification, and antibacterial drug design. Because of the high diversity of PVP sequences, the PVP annotation of a phage genome remains a particularly challenging bioinformatic task. FINDINGS Based on deep learning, we developed DeePVP. The main module of DeePVP aims to discriminate PVPs from non-PVPs within a phage genome, while the extended module of DeePVP can further classify predicted PVPs into the 10 major classes of PVPs. Compared with the present state-of-the-art tools, the main module of DeePVP performs better, with a 9.05% higher F1-score in the PVP identification task. Moreover, the overall accuracy of the extended module of DeePVP in the PVP classification task is approximately 3.72% higher than that of PhANNs. Two application cases show that the predictions of DeePVP are more reliable and can better reveal the compact PVP-enriched region than the current state-of-the-art tools. Particularly, in the Escherichia phage phiEC1 genome, a novel PVP-enriched region that is conserved in many other Escherichia phage genomes was identified, indicating that DeePVP will be a useful tool for the analysis of phage genomic structures. CONCLUSIONS DeePVP outperforms state-of-the-art tools. The program is optimized in both a virtual machine with graphical user interface and a docker so that the tool can be easily run by noncomputer professionals. DeePVP is freely available at https://github.com/fangzcbio/DeePVP/.
Affiliation(s)
- Zhencheng Fang
- Microbiome Medicine Center, Department of Laboratory Medicine, Zhujiang Hospital, Southern Medical University, Guangzhou 510280, China
| | - Tao Feng
- Microbiome Medicine Center, Department of Laboratory Medicine, Zhujiang Hospital, Southern Medical University, Guangzhou 510280, China
| | - Hongwei Zhou
- Microbiome Medicine Center, Department of Laboratory Medicine, Zhujiang Hospital, Southern Medical University, Guangzhou 510280, China
| | - Muxuan Chen
- Microbiome Medicine Center, Department of Laboratory Medicine, Zhujiang Hospital, Southern Medical University, Guangzhou 510280, China
| |
|
11
|
Ataee S, Brochet X, Peña-Reyes CA. Bacteriophage Genetic Edition Using LSTM. FRONTIERS IN BIOINFORMATICS 2022; 2:932319. [PMID: 36353213 PMCID: PMC9639385 DOI: 10.3389/fbinf.2022.932319] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 04/29/2022] [Accepted: 06/06/2022] [Indexed: 09/16/2023] Open
Abstract
Bacteriophages are gaining increasing interest as antimicrobial tools, largely due to the emergence of multi-antibiotic-resistant bacteria. Although their huge diversity and virulence make them particularly attractive for targeting a wide range of bacterial pathogens, it is difficult to select suitable phages due to their high specificity which limits their host range. In addition, other challenges remain such as structural fragility under certain environmental conditions, immunogenicity of phage therapy, or development of bacterial resistance. The use of genetically engineered phages may reduce characteristics that hinder prophylactic and therapeutic applications of phages. Nowadays, there is no systematic method to modify a given phage genome conferring its sought characteristics. We explore the use of artificial intelligence for this purpose as it has the potential to both guide and accelerate genome modification to generate phage variants with unique properties that overcome the limitations of natural phages. We propose an original architecture composed of two deep learning-driven components: a phage-bacterium interaction predictor and a phage genome-sequence generator. The former is a multi-branch 1-D convolutional neural network (1D-CNN) that analyses phage and bacterial genomes to predict interactions. The latter is a recurrent neural network, more particularly a long short-term memory (LSTM), that performs genomic modifications to a phage to offer substantial host range improvement. For this component, we developed two different architectures composed of one or two stacked LSTM layers with 256 neurons each. These generators are used to modify, more precisely to rewrite, the genome sequence of 42 selected phages, while the predictor is used to estimate the host range of the modified bacteriophages across 46 strains of Pseudomonas aeruginosa. 
The proposed generators, trained with an average accuracy of 96.1%, are able to improve the host range for an average of 18 phages among the 42 under study, increasing both their average host range, by 73.0 and 103.7%, and the maximum host ranges from 21 to 24 and 29, respectively. These promising results showed that the use of deep learning methodologies allows genetic modification of phages to extend, for instance, their host range, confirming the potential of these approaches to guide bacteriophage engineering.
Affiliation(s)
- Shabnam Ataee
- Institute of Information and Communication Technology (IICT), School of Management and Engineering Vaud (HEIG-VD), Yverdon-les-Bains, Switzerland
- HES-SO University of Applied Sciences and Arts Western Switzerland, Delémont, Switzerland
- CI4CB—Computational Intelligence for Computational Biology, SIB—Swiss Institute of Bioinformatics, Lausanne, Switzerland
| | - Xavier Brochet
- Institute of Information and Communication Technology (IICT), School of Management and Engineering Vaud (HEIG-VD), Yverdon-les-Bains, Switzerland
- HES-SO University of Applied Sciences and Arts Western Switzerland, Delémont, Switzerland
- CI4CB—Computational Intelligence for Computational Biology, SIB—Swiss Institute of Bioinformatics, Lausanne, Switzerland
| | - Carlos Andrés Peña-Reyes
- Institute of Information and Communication Technology (IICT), School of Management and Engineering Vaud (HEIG-VD), Yverdon-les-Bains, Switzerland
- HES-SO University of Applied Sciences and Arts Western Switzerland, Delémont, Switzerland
- CI4CB—Computational Intelligence for Computational Biology, SIB—Swiss Institute of Bioinformatics, Lausanne, Switzerland
| |
|
12
|
Chu Y, Guo S, Cui D, Fu X, Ma Y. DeephageTP: a convolutional neural network framework for identifying phage-specific proteins from metagenomic sequencing data. PeerJ 2022; 10:e13404. [PMID: 35698617 PMCID: PMC9188312 DOI: 10.7717/peerj.13404] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/06/2021] [Accepted: 04/18/2022] [Indexed: 01/14/2023] Open
Abstract
Bacteriophages (phages) are the most abundant and diverse biological entities on Earth. Due to the lack of universal gene markers and database representatives, about 50-90% of phage genes cannot be assigned functions. This makes it challenging to identify phage genomes and annotate the functions of phage genes efficiently by homology search on a large scale, especially for newly discovered phages. Portal (portal protein), TerL (large terminase subunit protein), and TerS (small terminase subunit protein) are three specific proteins of Caudovirales phages. Here, we developed a CNN (convolutional neural network)-based framework, DeephageTP, to identify these three proteins from metagenomic data. The framework takes one-hot encoding data of original protein sequences as the input and automatically extracts predictive features in the process of modeling. To overcome the false positive problem, a cutoff-loss-value strategy is introduced based on the distributions of the loss values of protein sequences within the same category. The proposed model with a set of cutoff-loss-values demonstrates high performance in terms of Precision in identifying TerL and Portal sequences (94% and 90%, respectively) from the mimic metagenomic dataset. Finally, we tested the efficacy of the framework using three real metagenomic datasets, and the results showed that, compared to the conventional alignment-based methods, our proposed framework had a particular advantage in identifying novel phage-specific protein sequences of Portal and TerL with remote homology to their counterparts in the training datasets. In summary, our study for the first time develops a CNN-based framework for identifying phage-specific protein sequences with high complexity and low conservation, and this framework will help us find novel phages in metagenomic sequencing data. DeephageTP is available at https://github.com/chuym726/DeephageTP.
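The one-hot input encoding and the idea of rejecting predictions by a loss cutoff can be illustrated with a minimal sketch. This is not the DeephageTP implementation: the function names, the padding length, the use of cross-entropy against the predicted class, and the cutoff values are all assumptions chosen for illustration.

```python
import math

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues
AA_INDEX = {aa: i for i, aa in enumerate(AMINO_ACIDS)}


def one_hot_encode(sequence, max_len):
    """Return a max_len x 20 matrix of 0/1 values.

    Unknown residues and padding positions stay as all-zero rows
    (an assumption; other conventions exist).
    """
    matrix = [[0] * len(AMINO_ACIDS) for _ in range(max_len)]
    for pos, residue in enumerate(sequence[:max_len]):
        idx = AA_INDEX.get(residue.upper())
        if idx is not None:
            matrix[pos][idx] = 1
    return matrix


def passes_cutoff(probabilities, cutoffs):
    """Hypothetical cutoff-loss filter.

    The loss is taken as the cross-entropy of the prediction with respect
    to its own predicted class; the prediction is accepted only if that
    loss does not exceed the per-class cutoff.
    """
    predicted = max(range(len(probabilities)), key=probabilities.__getitem__)
    loss = -math.log(probabilities[predicted])
    return predicted, loss <= cutoffs[predicted]
```

In practice the per-class cutoffs would be derived from the loss distribution of training sequences in each category, as the abstract describes; the fixed values used when calling `passes_cutoff` here are placeholders.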
Affiliation(s)
- Yunmeng Chu
- Shenzhen Key Laboratory of Synthetic Genomics, Guangdong Provincial Key Laboratory of Synthetic Genomics, CAS Key Laboratory of Quantitative Engineering Biology, Shenzhen Institute of Synthetic Biology, Shenzhen Institutes of Advanced Technology, Chinese, Shenzhen, Guangdong, P.R. China,Department of Bioengineering and Biotechnology, Huaqiao University, Xiamen, Fujian, P.R. China
| | - Shun Guo
- Shenzhen Key Laboratory of Synthetic Genomics, Guangdong Provincial Key Laboratory of Synthetic Genomics, CAS Key Laboratory of Quantitative Engineering Biology, Shenzhen Institute of Synthetic Biology, Shenzhen Institutes of Advanced Technology, Chinese, Shenzhen, Guangdong, P.R. China
| | - Dachao Cui
- Shenzhen Key Laboratory of Synthetic Genomics, Guangdong Provincial Key Laboratory of Synthetic Genomics, CAS Key Laboratory of Quantitative Engineering Biology, Shenzhen Institute of Synthetic Biology, Shenzhen Institutes of Advanced Technology, Chinese, Shenzhen, Guangdong, P.R. China
| | - Xiongfei Fu
- Shenzhen Key Laboratory of Synthetic Genomics, Guangdong Provincial Key Laboratory of Synthetic Genomics, CAS Key Laboratory of Quantitative Engineering Biology, Shenzhen Institute of Synthetic Biology, Shenzhen Institutes of Advanced Technology, Chinese, Shenzhen, Guangdong, P.R. China
| | - Yingfei Ma
- Shenzhen Key Laboratory of Synthetic Genomics, Guangdong Provincial Key Laboratory of Synthetic Genomics, CAS Key Laboratory of Quantitative Engineering Biology, Shenzhen Institute of Synthetic Biology, Shenzhen Institutes of Advanced Technology, Chinese, Shenzhen, Guangdong, P.R. China
| |
|
13
|
Phage_UniR_LGBM: Phage Virion Proteins Classification with UniRep Features and LightGBM Model. COMPUTATIONAL AND MATHEMATICAL METHODS IN MEDICINE 2022; 2022:9470683. [PMID: 35465015 PMCID: PMC9033350 DOI: 10.1155/2022/9470683] [Citation(s) in RCA: 26] [Impact Index Per Article: 13.0] [Received: 01/27/2022] [Accepted: 03/15/2022] [Indexed: 11/23/2022]
Abstract
Phages, the most prevalent biological entities on the planet, serve a variety of critical roles. A primary role of phages is to facilitate gene-to-gene communication. Phage proteins can be divided into virion proteins and nonvirion proteins. Experimental identification is a difficult process that requires a significant amount of laboratory time and expense. It is therefore important to design practical computational techniques and develop well-performing tools. In this work, Phage_UniR_LGBM is proposed to classify virion proteins. Specifically, the model uses UniRep as the feature representation and the LightGBM algorithm as the classifier. The model is trained on the training data and evaluated on the testing data with cross-validation. Phage_UniR_LGBM was compared with several state-of-the-art features and classification algorithms. Phage_UniR_LGBM achieves 88.51% in Sp, 89.89% in Sn, 89.18% in Acc, 0.7873 in MCC, and 0.8925 in F1 score.
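For readers unfamiliar with the reported metrics, the sketch below computes specificity (Sp), sensitivity (Sn), accuracy (Acc), Matthews correlation coefficient (MCC), and F1 score from a binary confusion matrix using their standard definitions. The function name and the example counts are illustrative assumptions and are not taken from the paper.

```python
import math


def binary_metrics(tp, tn, fp, fn):
    """Standard binary classification metrics from confusion-matrix counts.

    Assumes each class has at least one sample and at least one positive
    prediction, so the Sn, Sp, and precision denominators are nonzero.
    """
    sn = tp / (tp + fn)                      # sensitivity (recall)
    sp = tn / (tn + fp)                      # specificity
    acc = (tp + tn) / (tp + tn + fp + fn)    # accuracy
    precision = tp / (tp + fp)
    f1 = 2 * precision * sn / (precision + sn)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {"Sn": sn, "Sp": sp, "Acc": acc, "MCC": mcc, "F1": f1}


# Illustrative counts only (not the paper's data).
metrics = binary_metrics(tp=90, tn=80, fp=20, fn=10)
```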
|
14
|
Ahmad S, Charoenkwan P, Quinn JMW, Moni MA, Hasan MM, Lio' P, Shoombuatong W. SCORPION is a stacking-based ensemble learning framework for accurate prediction of phage virion proteins. Sci Rep 2022; 12:4106. [PMID: 35260777 PMCID: PMC8904530 DOI: 10.1038/s41598-022-08173-5] [Citation(s) in RCA: 21] [Impact Index Per Article: 10.5] [Received: 01/24/2022] [Accepted: 03/03/2022] [Indexed: 12/30/2022] Open
Abstract
Fast and accurate identification of phage virion proteins (PVPs) would greatly facilitate antibacterial drug discovery and development. Although several research efforts based on machine learning (ML) methods have been made for in silico identification of PVPs, these methods have certain limitations. Therefore, in this study, we propose a new computational approach, termed SCORPION (StaCking-based Predictor fOR Phage VIrion PrOteiNs), to accurately identify PVPs using only protein primary sequences. Specifically, we explored 13 different feature descriptors covering different aspects (i.e., compositional information, composition-transition-distribution information, position-specific information and physicochemical properties) with 10 popular ML algorithms to construct a pool of optimal baseline models. These optimal baseline models were then used to generate probabilistic features (PFs), which were combined into a new feature vector. Finally, we utilized a two-step feature selection strategy to determine the optimal PF feature vector and used this feature vector to develop a stacked model (SCORPION). Both tenfold cross-validation and independent test results indicate that SCORPION achieves better predictive performance than its constituent baseline models and existing methods. We anticipate SCORPION will serve as a useful tool for the cost-effective and large-scale screening of new PVPs. The source codes and datasets for this work are available for downloading in the GitHub repository (https://github.com/saeed344/SCORPION).
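The data flow of stacking, in which baseline model outputs become the input features of a second-level model, can be sketched as follows. This is a toy illustration and not SCORPION itself: the two baseline scorers, the logistic meta-model, and its weights are hypothetical stand-ins for trained models, and no feature selection is performed.

```python
import math


def charged_fraction_model(seq):
    """Hypothetical baseline: fraction of charged residues as a pseudo-probability."""
    return sum(seq.count(r) for r in "DEKR") / len(seq)


def hydrophobic_fraction_model(seq):
    """Hypothetical baseline: fraction of hydrophobic residues as a pseudo-probability."""
    return sum(seq.count(r) for r in "AILMFVW") / len(seq)


BASELINES = [charged_fraction_model, hydrophobic_fraction_model]


def probabilistic_features(seq):
    """Level 0: each baseline contributes one probability-like feature."""
    return [model(seq) for model in BASELINES]


def stacked_predict(seq, weights, bias):
    """Level 1: a logistic meta-model over the probabilistic features.

    The weights and bias are placeholders; in a real stacked model they
    would be fitted on out-of-fold baseline predictions.
    """
    features = probabilistic_features(seq)
    z = bias + sum(w * p for w, p in zip(weights, features))
    return 1.0 / (1.0 + math.exp(-z))
```

The key design point is that the meta-model never sees the raw sequence, only the baseline outputs, which is what distinguishes stacking from simply averaging or voting.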
Affiliation(s)
- Saeed Ahmad
- Center of Data Mining and Biomedical Informatics, Faculty of Medical Technology, Mahidol University, Bangkok, 10700, Thailand
| | - Phasit Charoenkwan
- Modern Management and Information Technology, College of Arts, Media and Technology, Chiang Mai University, Chiang Mai, 50200, Thailand
| | - Julian M W Quinn
- Bone Biology Division, Garvan Institute of Medical Research, 384 Victoria Street, Darlinghurst, NSW, 2010, Australia
| | - Mohammad Ali Moni
- Faculty of Health and Behavioural Sciences, School of Health and Rehabilitation Sciences, The University of Queensland, St Lucia, QLD, 4072, Australia
| | - Md Mehedi Hasan
- Division of Biomedical Informatics and Genomics, John W. Deming Department of Medicine, School of Medicine, Tulane Center for Biomedical Informatics and Genomics, Tulane University, New Orleans, LA, 70112, USA
| | - Pietro Lio'
- Department of Computer Science and Technology, University of Cambridge, Cambridge, CB3 0FD, UK
| | - Watshara Shoombuatong
- Center of Data Mining and Biomedical Informatics, Faculty of Medical Technology, Mahidol University, Bangkok, 10700, Thailand.
| |
|
15
|
Kabir M, Nantasenamat C, Kanthawong S, Charoenkwan P, Shoombuatong W. Large-scale comparative review and assessment of computational methods for phage virion proteins identification. EXCLI JOURNAL 2022; 21:11-29. [PMID: 35145365 PMCID: PMC8822302 DOI: 10.17179/excli2021-4411] [Citation(s) in RCA: 4] [Impact Index Per Article: 2.0] [Received: 10/12/2021] [Accepted: 11/29/2021] [Indexed: 12/11/2022]
Abstract
Phage virion proteins (PVPs) are effective at recognizing and binding to host cell receptors while having no deleterious effects on human or animal cells. Understanding their functional mechanisms is regarded as a critical goal that will aid in rational antibacterial drug discovery and development. Although high-throughput experimental methods for identifying PVPs are considered the gold standard for exploring crucial PVP features, these procedures are frequently time-consuming and labor-intensive. Thus far, more than ten sequence-based predictors have been established for the in silico identification of PVPs in conjunction with traditional experimental approaches. As a result, a revised and more thorough assessment is extremely desirable. With this purpose in mind, we first conduct a thorough survey and evaluation of 13 state-of-the-art PVP predictors. These PVP predictors can be classified into three groups according to the types of machine learning (ML) algorithms employed (i.e. traditional ML-based methods, ensemble-based methods and deep learning-based methods). Subsequently, we explored which factors are important for building more accurate and stable predictors, including training/independent datasets, feature encoding algorithms, feature selection methods, core algorithms, performance evaluation metrics/strategies and web servers. Finally, we provide insights and future perspectives for the design and development of new and more effective computational approaches for the detection and characterization of PVPs.
Affiliation(s)
- Muhammad Kabir
- School of Systems and Technology, Department of Computer Science, University of Management and Technology, Lahore, Pakistan, 54770
| | - Chanin Nantasenamat
- Center of Data Mining and Biomedical Informatics, Faculty of Medical Technology, Mahidol University, Bangkok, Thailand, 10700
| | - Sakawrat Kanthawong
- Department of Microbiology, Faculty of Medicine, Khon Kaen University, Khon Kaen, Thailand, 40002
| | - Phasit Charoenkwan
- Modern Management and Information Technology, College of Arts, Media and Technology, Chiang Mai University, Chiang Mai, Thailand, 50200
| | - Watshara Shoombuatong
- Center of Data Mining and Biomedical Informatics, Faculty of Medical Technology, Mahidol University, Bangkok, Thailand, 10700
| |
|
16
|
Predicting the capsid architecture of phages from metagenomic data. Comput Struct Biotechnol J 2022; 20:721-732. [PMID: 35140890 PMCID: PMC8814770 DOI: 10.1016/j.csbj.2021.12.032] [Citation(s) in RCA: 6] [Impact Index Per Article: 3.0] [Received: 10/01/2021] [Revised: 12/22/2021] [Accepted: 12/22/2021] [Indexed: 12/29/2022] Open
Abstract
Tailed phages are viruses that infect bacteria and are the most abundant biological entities on Earth. Their ecological, evolutionary, and biogeochemical roles in the planet stem from their genomic diversity. Known tailed phage genomes range from 10 to 735 kilobase pairs thanks to the size variability of the protective protein capsids that store them. However, the role of tailed phage capsids’ diversity in ecosystems is unclear. A fundamental gap is the difficulty of associating genomic information with viral capsids in the environment. To address this problem, here, we introduce a computational approach to predict the capsid architecture (T-number) of tailed phages using the sequence of a single gene—the major capsid protein. This approach relies on an allometric model that relates the genome length and capsid architecture of tailed phages. This allometric model was applied to isolated phage genomes to generate a library that associated major capsid proteins and putative capsid architectures. This library was used to train machine learning methods, and the most computationally scalable model investigated (random forest) was applied to human gut metagenomes. Compared to isolated phages, the analysis of gut data reveals a large abundance of mid-sized (T = 7) capsids, as expected, followed by a relatively large frequency of jumbo-like tailed phage capsids (T ≥ 25) and small capsids (T = 4) that have been under-sampled. We discussed how to increase the method’s accuracy and how to extend the approach to other viruses. The computational pipeline introduced here opens the doors to monitor the ongoing evolution and selection of viral capsids across ecosystems.
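As background for the T-numbers mentioned above (T = 4, T = 7, T ≥ 25): in the standard Caspar-Klug description of icosahedral capsids, the triangulation number takes only the values T = h² + hk + k² for non-negative integers h and k, not both zero. The sketch below enumerates those allowed values. It does not reproduce the paper's allometric model, whose parameters are not given in the abstract; the function names are illustrative.

```python
def valid_t_numbers(max_t):
    """Enumerate triangulation numbers T = h*h + h*k + k*k up to max_t."""
    values = set()
    h = 0
    while h * h <= max_t:
        k = 0
        while h * h + h * k + k * k <= max_t:
            t = h * h + h * k + k * k
            if t > 0:  # h = k = 0 is not a capsid
                values.add(t)
            k += 1
        h += 1
    return sorted(values)


def is_valid_t_number(t):
    """Check whether t is an allowed triangulation number."""
    return t in valid_t_numbers(t)
```

For example, 4, 7, and 25 are allowed values (from (h, k) = (2, 0), (2, 1), and (5, 0)), whereas 5 and 6 are not.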
|
17
|
Gu X, Guo L, Liao B, Jiang Q. Pseudo-188D: Phage Protein Prediction Based on a Model of Pseudo-188D. Front Genet 2021; 12:796327. [PMID: 34925468 PMCID: PMC8672092 DOI: 10.3389/fgene.2021.796327] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/21/2021] [Accepted: 11/15/2021] [Indexed: 11/13/2022] Open
Abstract
Phages have seriously affected the biochemical systems of the world. Phages are related not only to our health but also to medical treatments for many cancers and skin infections; therefore, this paper sought to identify phage proteins. In this paper, a Pseudo-188D model was established. The digital features of the phage were extracted by PseudoKNC, an appropriate vector was selected by the AdaBoost tool, and features were extracted by 188D. The extracted digital features were then combined, and finally the viral proteins of the phage were predicted by a stochastic gradient descent algorithm. The performance of our model reached 93.4853%. To verify the stability of our model, we randomly selected 80% of the downloaded data to train the model and used the remaining 20% of the data to verify its robustness.
Affiliation(s)
- Xiaomei Gu
- Key Laboratory of Computational Science and Application of Hainan Province, Haikou, China.,Institute of Yangtze River Delta, University of Electronic Science and Technology of China, Haikou, China.,Key Laboratory of Data Science and Intelligence Education, Hainan Normal University, Ministry of Education, Haikou, China.,School of Mathematics and Statistics, Hainan Normal University, Haikou, China
| | - Lina Guo
- Beidahuang Industry Group General Hospital, Harbin, China
| | - Bo Liao
- Key Laboratory of Computational Science and Application of Hainan Province, Haikou, China.,Key Laboratory of Data Science and Intelligence Education, Hainan Normal University, Ministry of Education, Haikou, China.,School of Mathematics and Statistics, Hainan Normal University, Haikou, China
| | - Qinghua Jiang
- Key Laboratory of Computational Science and Application of Hainan Province, Haikou, China.,Key Laboratory of Data Science and Intelligence Education, Hainan Normal University, Ministry of Education, Haikou, China.,School of Mathematics and Statistics, Hainan Normal University, Haikou, China
| |
|
18
|
iPVP-MCV: A Multi-Classifier Voting Model for the Accurate Identification of Phage Virion Proteins. Symmetry (Basel) 2021. [DOI: 10.3390/sym13081506] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Indexed: 12/13/2022] Open
Abstract
The classic structure of a bacteriophage is commonly characterized by complex symmetry. The head of the structure features icosahedral symmetry, whereas the tail features helical symmetry. The phage virion protein (PVP), a type of bacteriophage structural protein, is an essential material of the infectious viral particles and is responsible for multiple biological functions. Accurate identification of PVPs is of great significance for comprehending the interaction between phages and host bacteria and developing new antimicrobial drugs or antibiotics. However, traditional experimental approaches for identifying PVPs are often time-consuming and laborious. Therefore, the development of computational methods that can efficiently and accurately identify PVPs is desired. In this study, we proposed a multi-classifier voting model called iPVP-MCV to enhance the predictive performance of PVPs based on their amino acid sequences. First, three types of evolutionary features were extracted from the position-specific scoring matrix (PSSM) profiles to represent PVPs and non-PVPs. Then, a set of baseline models were trained based on the support vector machine (SVM) algorithm combined with each type of feature descriptors. Finally, the outputs of these baseline models were integrated to construct the proposed method iPVP-MCV by using the majority voting strategy. Our results demonstrated that the proposed iPVP-MCV model was superior to existing methods when performing the rigorous independent dataset test.
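The majority voting step that combines the baseline classifiers can be sketched in a few lines. This is a generic illustration rather than the iPVP-MCV code: the function names are assumptions, the label lists stand in for the outputs of three SVM baselines, and ties (possible only with an even number of voters) resolve to whichever label `Counter` encounters first.

```python
from collections import Counter


def majority_vote(labels):
    """Return the most common label among one sample's baseline predictions."""
    label, _count = Counter(labels).most_common(1)[0]
    return label


def ensemble_predict(per_model_predictions):
    """Combine predictions from several models, one list of labels per model.

    All lists must have the same length (one entry per sample).
    """
    return [majority_vote(votes) for votes in zip(*per_model_predictions)]


# Three hypothetical baseline models, three samples (1 = PVP, 0 = non-PVP).
combined = ensemble_predict([[1, 0, 1], [1, 1, 0], [0, 0, 1]])
```

Using an odd number of baseline models, as with the three feature types described above, avoids ties in the binary case.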
|
19
|
Nami Y, Imeni N, Panahi B. Application of machine learning in bacteriophage research. BMC Microbiol 2021; 21:193. [PMID: 34174831 PMCID: PMC8235560 DOI: 10.1186/s12866-021-02256-5] [Citation(s) in RCA: 18] [Impact Index Per Article: 6.0] [Received: 03/13/2021] [Accepted: 06/08/2021] [Indexed: 12/20/2022] Open
Abstract
Phages are one of the key components in the structure, dynamics, and interactions of microbial communities in different bins. They have a clear impact on human health and the food industry. Bacteriophage characterization using in vitro approaches is a time-consuming, costly, and laborious task. On the other hand, with the advent of new high-throughput sequencing technology, the development of a powerful computational framework to characterize newly identified bacteriophages is inevitable for future research. Machine learning includes powerful techniques that enable the analysis of complex datasets for knowledge discovery and pattern recognition. In this study, we have conducted a comprehensive review of the application of machine learning methods, using different types of features, to various aspects of bacteriophage research, including automated curation, identification, classification, host species recognition, virion protein identification, and life cycle prediction. Moreover, potential limitations and advantages of the developed frameworks are discussed.
Affiliation(s)
- Yousef Nami
- Department of Food Biotechnology, Branch for Northwest & West Region, Agricultural Biotechnology Research Institute of Iran, Agricultural Research, Education and Extension Organization (AREEO), Tabriz, Iran
| | - Nazila Imeni
- Young Researchers and Elite Club, Marand Branch, Islamic Azad University, Marand, Iran
| | - Bahman Panahi
- Department of Genomics, Branch for Northwest & West Region, Agricultural Biotechnology Research Institute of Iran, Agricultural Research, Education and Extension Organization (AREEO), Tabriz, Iran.
| |
|
20
|
Martínez-Ruiz EB, Cooper M, Barrero-Canosa J, Haryono MAS, Bessarab I, Williams RBH, Szewzyk U. Genome analysis of Pseudomonas sp. OF001 and Rubrivivax sp. A210 suggests multicopper oxidases catalyze manganese oxidation required for cylindrospermopsin transformation. BMC Genomics 2021; 22:464. [PMID: 34157973 PMCID: PMC8218464 DOI: 10.1186/s12864-021-07766-0] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/09/2020] [Accepted: 06/03/2021] [Indexed: 11/10/2022] Open
Abstract
BACKGROUND Cylindrospermopsin is a highly persistent cyanobacterial secondary metabolite toxic to humans and other living organisms. Strain OF001 and A210 are manganese-oxidizing bacteria (MOB) able to transform cylindrospermopsin during the oxidation of Mn2+. So far, the enzymes involved in manganese oxidation in strain OF001 and A210 are unknown. Therefore, we analyze the genomes of two cylindrospermopsin-transforming MOB, Pseudomonas sp. OF001 and Rubrivivax sp. A210, to identify enzymes that could catalyze the oxidation of Mn2+. We also investigated specific metabolic features related to pollutant degradation and explored the metabolic potential of these two MOB with respect to the role they may play in biotechnological applications and/or in the environment. RESULTS Strain OF001 encodes two multicopper oxidases and one haem peroxidase potentially involved in Mn2+ oxidation, with a high similarity to manganese-oxidizing enzymes described for Pseudomonas putida GB-1 (80, 83 and 42% respectively). Strain A210 encodes one multicopper oxidase potentially involved in Mn2+ oxidation, with a high similarity (59%) to the manganese-oxidizing multicopper oxidase in Leptothrix discophora SS-1. Strain OF001 and A210 have genes that might confer them the ability to remove aromatic compounds via the catechol meta- and ortho-cleavage pathway, respectively. Based on the genomic content, both strains may grow over a wide range of O2 concentrations, including microaerophilic conditions, fix nitrogen, and reduce nitrate and sulfate in an assimilatory fashion. Moreover, the strain A210 encodes genes which may convey the ability to reduce nitrate in a dissimilatory manner, and fix carbon via the Calvin cycle. Both MOB encode CRISPR-Cas systems, several predicted genomic islands, and phage proteins, which likely contribute to their genome plasticity. CONCLUSIONS The genomes of Pseudomonas sp. OF001 and Rubrivivax sp. A210 encode sequences with high similarity to already described MCOs which may catalyze manganese oxidation required for cylindrospermopsin transformation. Furthermore, the analysis of the general metabolism of two MOB strains may contribute to a better understanding of the niches of cylindrospermopsin-removing MOB in natural habitats and their implementation in biotechnological applications to treat water.
Affiliation(s)
- Erika Berenice Martínez-Ruiz
- Chair of Environmental Microbiology, Technische Universität Berlin, Institute of Environmental Technology, Straße des 17. Juni 135, 10623, Berlin, Germany.
| | - Myriel Cooper
- Chair of Environmental Microbiology, Technische Universität Berlin, Institute of Environmental Technology, Straße des 17. Juni 135, 10623, Berlin, Germany.
| | - Jimena Barrero-Canosa
- Chair of Environmental Microbiology, Technische Universität Berlin, Institute of Environmental Technology, Straße des 17. Juni 135, 10623, Berlin, Germany
| | - Mindia A S Haryono
- Singapore Centre for Environmental Life Sciences Engineering, National University of Singapore, Singapore, 119077, Singapore
| | - Irina Bessarab
- Singapore Centre for Environmental Life Sciences Engineering, National University of Singapore, Singapore, 119077, Singapore
| | - Rohan B H Williams
- Singapore Centre for Environmental Life Sciences Engineering, National University of Singapore, Singapore, 119077, Singapore
| | - Ulrich Szewzyk
- Chair of Environmental Microbiology, Technische Universität Berlin, Institute of Environmental Technology, Straße des 17. Juni 135, 10623, Berlin, Germany
| |
|
21
|
Component Parts of Bacteriophage Virions Accurately Defined by a Machine-Learning Approach Built on Evolutionary Features. mSystems 2021; 6:e0024221. [PMID: 34042467 PMCID: PMC8269216 DOI: 10.1128/msystems.00242-21] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Indexed: 11/20/2022] Open
Abstract
Antimicrobial resistance (AMR) continues to evolve as a major threat to human health, and new strategies are required for the treatment of AMR infections. Bacteriophages (phages) that kill bacterial pathogens are being identified for use in phage therapies, with the intention to apply these bactericidal viruses directly into the infection sites in bespoke phage cocktails. Despite the great unsampled phage diversity for this purpose, an issue hampering the roll out of phage therapy is the poor quality annotation of many of the phage genomes, particularly for those from infrequently sampled environmental sources. We developed a computational tool called STEP3 to use the “evolutionary features” that can be recognized in genome sequences of diverse phages. These features, when integrated into an ensemble framework, achieved a stable and robust prediction performance when benchmarked against other prediction tools using phages from diverse sources. Validation of the prediction accuracy of STEP3 was conducted with high-resolution mass spectrometry analysis of two novel phages, isolated from a watercourse in the Southern Hemisphere. STEP3 provides a robust computational approach to distinguish specific and universal features in phages to improve the quality of phage cocktails and is available for use at http://step3.erc.monash.edu/. IMPORTANCE In response to the global problem of antimicrobial resistance, there are moves to use bacteriophages (phages) as therapeutic agents. Selecting which phages will be effective therapeutics relies on interpreting features contributing to shelf-life and applicability to diagnosed infections. However, the protein components of the phage virions that dictate these properties vary so much in sequence that best estimates suggest failure to recognize up to 90% of them. We have utilized this diversity in evolutionary features as an advantage, to apply machine learning for prediction accuracy for diverse components in phage virions. 
We benchmark this new tool showing the accurate recognition and evaluation of phage component parts using genome sequence data of phages from undersampled environments, where the richest diversity of phage still lies.
|
22
|
Campbell DE, Ly LK, Ridlon JM, Hsiao A, Whitaker RJ, Degnan PH. Infection with Bacteroides Phage BV01 Alters the Host Transcriptome and Bile Acid Metabolism in a Common Human Gut Microbe. Cell Rep 2021; 32:108142. [PMID: 32937127 PMCID: PMC8354205 DOI: 10.1016/j.celrep.2020.108142] [Citation(s) in RCA: 16] [Impact Index Per Article: 5.3] [Received: 04/10/2020] [Revised: 07/07/2020] [Accepted: 08/21/2020] [Indexed: 12/16/2022] Open
Abstract
Gut-associated phages are hypothesized to alter the abundance and activity of their bacterial hosts, contributing to human health and disease. Although temperate phages constitute a significant fraction of the gut virome, the effects of lysogenic infection are underexplored. We report that the temperate phage Bacteroides phage BV01 broadly alters the transcriptome of its host, the prominent human gut symbiont Bacteroides vulgatus. This alteration occurs through phage-induced repression of a tryptophan-rich sensory protein (TspO) and represses bile acid deconjugation. Because microbially modified bile acids are important signals for the mammalian host, this is a mechanism by which a phage may influence mammalian phenotypes. Furthermore, BV01 and its relatives in the proposed phage family Salyersviridae are ubiquitous in human gut metagenomes, infecting a broad range of Bacteroides hosts. These results demonstrate the complexity of phage-bacteria-mammal relationships and emphasize a need to better understand the role of temperate phages in the gut microbiome.
Affiliation(s)
| | - Lindsey K Ly
- Division of Nutritional Sciences, University of Illinois, Urbana, IL 61801, USA; Department of Animal Sciences, University of Illinois, Urbana, IL 61801, USA
| | - Jason M Ridlon
- Division of Nutritional Sciences, University of Illinois, Urbana, IL 61801, USA; Department of Animal Sciences, University of Illinois, Urbana, IL 61801, USA; Carl R. Woese Institute for Genomic Biology, University of Illinois, Urbana, IL 61801, USA
| | - Ansel Hsiao
- Department of Microbiology and Plant Pathology, University of California, Riverside, Riverside, CA 92521, USA
| | - Rachel J Whitaker
- Department of Microbiology, University of Illinois, Urbana, IL 61801, USA; Carl R. Woese Institute for Genomic Biology, University of Illinois, Urbana, IL 61801, USA
| | - Patrick H Degnan
- Department of Microbiology and Plant Pathology, University of California, Riverside, Riverside, CA 92521, USA.
| |
|
23
|
Mwesigwa S, Williams L, Retshabile G, Katagirya E, Mboowa G, Mlotshwa B, Kyobe S, Kateete DP, Wampande EM, Wayengera M, Mpoloka SW, Mirembe AN, Kasvosve I, Morapedi K, Kisitu GP, Kekitiinwa AR, Anabwani G, Joloba ML, Matovu E, Mulindwa J, Noyes H, Botha G, Brown CW, Mardon G, Matshaba M, Hanchard NA. Unmapped exome reads implicate a role for Anelloviridae in childhood HIV-1 long-term non-progression. NPJ Genom Med 2021; 6:24. [PMID: 33741997 PMCID: PMC7979878 DOI: 10.1038/s41525-021-00185-w] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 09/28/2020] [Accepted: 01/25/2021] [Indexed: 01/31/2023] Open
Abstract
Human immunodeficiency virus (HIV) infection remains a significant public health burden globally. The role of viral co-infection in the rate of progression of HIV infection has been suggested but not empirically tested, particularly among children. We extracted and classified 42 viral species from whole-exome sequencing (WES) data of 813 HIV-infected children in Botswana and Uganda categorised as either long-term non-progressors (LTNPs) or rapid progressors (RPs). The Ugandan participants had a higher viral community diversity index compared to Batswana (p = 4.6 × 10⁻¹³), and viral sequences were more frequently detected among LTNPs than RPs (24% vs 16%; p = 0.008; OR, 1.9; 95% CI, 1.6-2.3), with Anelloviridae showing strong association with LTNP status (p = 3 × 10⁻⁴; q = 0.004, OR, 3.99; 95% CI, 1.74-10.25). This trend was still evident when stratified by country, sex, and sequencing platform, and after a logistic regression analysis adjusting for age, sex, country, and the sequencing platform (p = 0.02; q = 0.03; OR, 7.3; 95% CI, 1.6-40.5). Torque teno virus (TTV), which made up 95% of the Anelloviridae reads, has been associated with reduced immune activation. We identify an association between viral co-infection and prolonged AIDS-free survival status that may have utility as a biomarker of LTNP and could provide mechanistic insights into HIV progression in children, demonstrating the added value of interrogating off-target WES reads in cohort studies.
Affiliation(s)
| | | | | | - Eric Katagirya
- College of Health Sciences, Makerere University, Kampala, Uganda
| | - Gerald Mboowa
- College of Health Sciences, Makerere University, Kampala, Uganda
| | | | - Samuel Kyobe
- College of Health Sciences, Makerere University, Kampala, Uganda
| | - David P Kateete
- College of Health Sciences, Makerere University, Kampala, Uganda
| | | | - Misaki Wayengera
- College of Health Sciences, Makerere University, Kampala, Uganda
| | | | - Angella N Mirembe
- Baylor College of Medicine Children's Foundation Uganda (Baylor Uganda), Kampala, Uganda
| | | | | | - Grace P Kisitu
- Baylor College of Medicine Children's Foundation Uganda (Baylor Uganda), Kampala, Uganda
| | - Adeodata R Kekitiinwa
- Baylor College of Medicine Children's Foundation Uganda (Baylor Uganda), Kampala, Uganda
| | - Gabriel Anabwani
- Botswana-Baylor Children's Clinical Centre of Excellence, Gaborone, Botswana
| | - Moses L Joloba
- College of Health Sciences, Makerere University, Kampala, Uganda
| | - Enock Matovu
- College of Veterinary Medicine, Animal Resources and Biosecurity, Makerere University, Kampala, Uganda
| | - Julius Mulindwa
- College of Veterinary Medicine, Animal Resources and Biosecurity, Makerere University, Kampala, Uganda
| | - Harry Noyes
- Institute of Integrative Biology, University of Liverpool, Liverpool, UK
| | - Gerrit Botha
- Institute of Infectious Disease and Molecular Medicine, University of Cape Town, Cape Town, South Africa
| | - Chester W Brown
- University of Tennessee Health Science Center, Le Bonheur Children's Hospital, Memphis, TN, USA
| | - Graeme Mardon
- Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX, USA
| | - Mogomotsi Matshaba
- Botswana-Baylor Children's Clinical Centre of Excellence, Gaborone, Botswana
- Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX, USA
| | - Neil A Hanchard
- Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX, USA.
- National Human Genome Research Institute, National Institutes of Health, Bethesda, MD, USA.
| |
|
24
|
Fang Z, Zhou H. VirionFinder: Identification of Complete and Partial Prokaryote Virus Virion Protein From Virome Data Using the Sequence and Biochemical Properties of Amino Acids. Front Microbiol 2021; 12:615711. [PMID: 33613485 PMCID: PMC7894196 DOI: 10.3389/fmicb.2021.615711] [Citation(s) in RCA: 6] [Impact Index Per Article: 2.0] [Received: 10/09/2020] [Accepted: 01/04/2021] [Indexed: 01/22/2023] Open
Abstract
Viruses are some of the most abundant biological entities on Earth, and prokaryote viruses are the dominant members of the viral community. Because of the diversity of prokaryote viruses, functional annotation cannot be performed on a large number of genes from newly discovered prokaryote viruses by searching the current database; therefore, the development of an alignment-free algorithm for functional annotation of prokaryote virus proteins is important to understand the viral community. The identification of prokaryote virus virion proteins (PVVPs) is a critical step for many viral analyses, such as species classification, phylogenetic analysis and the exploration of how prokaryote viruses interact with their hosts. Although a series of PVVP prediction tools have been developed, the performance of these tools is still not satisfactory. Moreover, viral metagenomic data contain fragmented sequences, leading to the existence of some incomplete genes. Therefore, a tool that can identify partial PVVPs is also needed. In this work, we present a novel algorithm, called VirionFinder, to distinguish complete and partial PVVPs from non-prokaryote virus virion proteins (non-PVVPs). VirionFinder uses the sequence and biochemical properties of the 20 amino acids as the mathematical model to encode the protein sequences and uses a deep learning technique to identify whether a given protein is a PVVP. Compared with the state-of-the-art tools using artificial benchmark datasets, the results show that under the same specificity (Sp), the sensitivity (Sn) of VirionFinder is approximately 10-34% higher than the Sn of these tools on both complete and partial proteins. When evaluating related tools using real virome data, the recognition rate of PVVP-like sequences of VirionFinder is also much higher than that of the other tools. We expect that VirionFinder will be a powerful tool for identifying novel virion proteins from both complete prokaryote virus genomes and viral metagenomic data. VirionFinder is freely available at https://github.com/zhenchengfang/VirionFinder.
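The idea of encoding a protein from the sequence and biochemical properties of its amino acids can be illustrated with a minimal sketch. It is not VirionFinder's actual feature set: the function name `encode_protein` is hypothetical, and the single biochemical property used here (Kyte-Doolittle hydropathy) is an assumption chosen only as a familiar example.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

# Kyte-Doolittle hydropathy values, used here only as an example property
# (an assumption for illustration, not the property set of any cited tool).
HYDROPATHY = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def encode_protein(seq):
    """Encode each residue as a 20-dim one-hot vector plus one hydropathy value."""
    rows = []
    for aa in seq.upper():
        one_hot = [1.0 if aa == ref else 0.0 for ref in AMINO_ACIDS]
        # Non-standard residues (e.g. X) get an all-zero one-hot vector
        # and a neutral property value of 0.0.
        rows.append(one_hot + [HYDROPATHY.get(aa, 0.0)])
    return rows
```

The resulting residue-by-feature matrix (one row per residue, 21 columns in this sketch) is the kind of input a sequence-based neural network could consume; partial proteins simply yield shorter matrices.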
Affiliation(s)
- Zhencheng Fang
- Microbiome Medicine Center, Department of Laboratory Medicine, Zhujiang Hospital, Southern Medical University, Guangzhou, China
- Center for Quantitative Biology, Peking University, Beijing, China
| | - Hongwei Zhou
- Microbiome Medicine Center, Department of Laboratory Medicine, Zhujiang Hospital, Southern Medical University, Guangzhou, China
- State Key Laboratory of Organ Failure Research, Southern Medical University, Guangzhou, China
| |
|
25
|
Abstract
Today massive amounts of sequenced metagenomic and metatranscriptomic data from different ecological niches and environmental locations are available. Scientific progress depends critically on methods that allow extracting useful information from the various types of sequence data. Here, we will first discuss types of information contained in the various flavours of biological sequence data, and how this information can be interpreted to increase our scientific knowledge and understanding. We argue that a mechanistic understanding of biological systems analysed from different perspectives is required to consistently interpret experimental observations, and that this understanding is greatly facilitated by the generation and analysis of dynamic mathematical models. We conclude that, in order to construct mathematical models and to test mechanistic hypotheses, time-series data are of critical importance. We review diverse techniques to analyse time-series data and discuss various approaches by which time-series of biological sequence data have been successfully used to derive and test mechanistic hypotheses. Analysing the bottlenecks of current strategies in the extraction of knowledge and understanding from data, we conclude that combined experimental and theoretical efforts should be implemented as early as possible during the planning phase of individual experiments and scientific research projects. This article is part of the theme issue ‘Integrative research perspectives on marine conservation’.
Affiliation(s)
- Ovidiu Popa
- Institute of Quantitative and Theoretical Biology, CEPLAS, Heinrich-Heine University Düsseldorf, Germany
| | - Ellen Oldenburg
- Institute of Quantitative and Theoretical Biology, CEPLAS, Heinrich-Heine University Düsseldorf, Germany
| | - Oliver Ebenhöh
- Institute of Quantitative and Theoretical Biology, CEPLAS, Heinrich-Heine University Düsseldorf, Germany; Cluster of Excellence on Plant Sciences, CEPLAS, Heinrich-Heine University Düsseldorf, Germany
| |
|
26
|
Cantu VA, Salamon P, Seguritan V, Redfield J, Salamon D, Edwards RA, Segall AM. PhANNs, a fast and accurate tool and web server to classify phage structural proteins. PLoS Comput Biol 2020; 16:e1007845. [PMID: 33137102 PMCID: PMC7660903 DOI: 10.1371/journal.pcbi.1007845] [Citation(s) in RCA: 44] [Impact Index Per Article: 11.0] [Received: 03/30/2020] [Revised: 11/12/2020] [Accepted: 09/26/2020] [Indexed: 02/07/2023] Open
Abstract
For any given bacteriophage genome or phage-derived sequences in metagenomic data sets, we are unable to assign a function to 50–90% of genes, or more. Structural protein-encoding genes constitute a large fraction of the average phage genome and are among the most divergent and difficult-to-identify genes using homology-based methods. To understand the functions encoded by phages, their contributions to their environments, and to help gauge their utility as potential phage therapy agents, we have developed a new approach to classify phage ORFs into ten major classes of structural proteins or into an “other” category. The resulting tool is named PhANNs (Phage Artificial Neural Networks). We built a database of 538,213 manually curated phage protein sequences that we split into eleven subsets (10 for cross-validation, one for testing) using a novel clustering method that ensures there are no homologous proteins between sets yet maintains the maximum sequence diversity for training. An Artificial Neural Network ensemble trained on features extracted from those sets reached a test F1-score of 0.875 and test accuracy of 86.2%. PhANNs can rapidly classify proteins into one of the ten structural classes or, if not predicted to fall in one of the ten classes, as “other,” providing a new approach for functional annotation of phage proteins. PhANNs is open source and can be run from our web server or installed locally. Bacteriophages (phages, viruses that infect bacteria) are the most abundant biological entity on Earth. They outnumber bacteria by a factor of ten. As phages are very different from each other and from bacteria, and we have relatively few phage genes in our database compared to bacterial genes, we are unable to assign function to 50–90% of phage genes. In this work, we developed PhANNs, a machine learning tool that can classify a phage gene as one of ten structural roles, or “other”. This approach does not require a similar gene to be known.
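The requirement that no homologous proteins are shared between the eleven subsets can be illustrated with a minimal sketch. It is not the clustering method of PhANNs: it assumes that cluster labels capturing homology have already been computed by some other means, and the function name `split_by_cluster` and the greedy balancing rule are hypothetical choices for illustration.

```python
from collections import defaultdict


def split_by_cluster(cluster_of, n_subsets=11):
    """Assign proteins to subsets so that each cluster stays in one subset.

    cluster_of maps a protein identifier to a precomputed cluster label
    (assumed here to group homologous proteins together).
    """
    members = defaultdict(list)
    for protein, cluster in cluster_of.items():
        members[cluster].append(protein)

    subsets = [[] for _ in range(n_subsets)]
    # Place the largest clusters first, always into the currently smallest
    # subset, so that subset sizes stay roughly balanced.
    for cluster in sorted(members, key=lambda c: (-len(members[c]), str(c))):
        target = min(range(n_subsets), key=lambda i: len(subsets[i]))
        subsets[target].extend(members[cluster])
    return subsets
```

Because whole clusters are moved as a unit, a protein in the test subset never has a cluster mate in any cross-validation subset, which is the property the abstract describes; how well this prevents leakage depends entirely on the quality of the clustering.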
Affiliation(s)
- Vito Adrian Cantu
- Computational Science Research Center, San Diego State University, San Diego, United States of America
- Viral Information Institute, San Diego State University, San Diego, United States of America
| | - Peter Salamon
- Viral Information Institute, San Diego State University, San Diego, United States of America
- Department of Mathematics and Statistics, San Diego State University, San Diego, United States of America
| | - Victor Seguritan
- Computational Science Research Center, San Diego State University, San Diego, United States of America
| | - Jackson Redfield
- Department of Biology, San Diego State University, San Diego, United States of America
| | - David Salamon
- Department of Mathematics and Statistics, San Diego State University, San Diego, United States of America
| | - Robert A. Edwards
- Computational Science Research Center, San Diego State University, San Diego, United States of America
- Viral Information Institute, San Diego State University, San Diego, United States of America
- Department of Biology, San Diego State University, San Diego, United States of America
| | - Anca M. Segall
- Computational Science Research Center, San Diego State University, San Diego, United States of America
- Viral Information Institute, San Diego State University, San Diego, United States of America
- Department of Biology, San Diego State University, San Diego, United States of America
| |
|
27
|
Hryckowian AJ, Merrill BD, Porter NT, Van Treuren W, Nelson EJ, Garlena RA, Russell DA, Martens EC, Sonnenburg JL. Bacteroides thetaiotaomicron-Infecting Bacteriophage Isolates Inform Sequence-Based Host Range Predictions. Cell Host Microbe 2020; 28:371-379.e5. [PMID: 32652063 PMCID: PMC8045012 DOI: 10.1016/j.chom.2020.06.011] [Citation(s) in RCA: 41] [Impact Index Per Article: 10.3] [Received: 03/04/2020] [Revised: 04/22/2020] [Accepted: 06/12/2020] [Indexed: 12/21/2022]
Abstract
Our emerging view of the gut microbiome largely focuses on bacteria, while less is known about other microbial components, such as bacteriophages (phages). Though phages are abundant in the gut, very few phages have been isolated from this ecosystem. Here, we report the genomes of 27 phages from the United States and Bangladesh that infect the prevalent human gut bacterium Bacteroides thetaiotaomicron. These phages are mostly distinct from previously sequenced phages with the exception of two, which are crAss-like phages. We compare these isolates to existing human gut metagenomes, revealing similarities to previously inferred phages and additional unexplored phage diversity. Finally, we use host tropisms of these phages to identify alleles of phage structural genes associated with infectivity. This work provides a detailed view of the gut's "viral dark matter" and a framework for future efforts to further integrate isolation- and sequencing-focused efforts to understand gut-resident phages.
Affiliation(s)
- Andrew J Hryckowian
- Department of Microbiology & Immunology, Stanford University School of Medicine, Stanford, CA 94305, USA.
| | - Bryan D Merrill
- Department of Microbiology & Immunology, Stanford University School of Medicine, Stanford, CA 94305, USA
| | - Nathan T Porter
- Department of Microbiology & Immunology, University of Michigan Medical School, Ann Arbor, MI 48109, USA
| | - William Van Treuren
- Department of Microbiology & Immunology, Stanford University School of Medicine, Stanford, CA 94305, USA
| | - Eric J Nelson
- Emerging Pathogens Institute, University of Florida, Gainesville, FL 32611, USA
| | - Rebecca A Garlena
- Department of Biological Sciences, University of Pittsburgh, Pittsburgh, PA 15213, USA
| | - Daniel A Russell
- Department of Biological Sciences, University of Pittsburgh, Pittsburgh, PA 15213, USA
| | - Eric C Martens
- Department of Microbiology & Immunology, University of Michigan Medical School, Ann Arbor, MI 48109, USA
| | - Justin L Sonnenburg
- Department of Microbiology & Immunology, Stanford University School of Medicine, Stanford, CA 94305, USA.
| |
|
28
|
Bodner K, Melkonian AL, Covert MW. A Protocol to Engineer Bacteriophages for Live-Cell Imaging of Bacterial Prophage Induction Inside Mammalian Cells. STAR Protoc 2020; 1:100084. [PMID: 33111117 PMCID: PMC7580223 DOI: 10.1016/j.xpro.2020.100084] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Indexed: 11/18/2022] Open
Abstract
The gut microbiome is dominated by lysogens, bacteria that carry bacterial viruses (phages). Uncovering the function of phages in the microbiome and observing interactions between phages, bacteria, and mammalian cells in real time in specific cell types are limited by the difficulty of engineering fluorescent markers into large, lysogenic phage genomes. Here, we present a method to multiplex the engineering of life-cycle reporters into lysogenic phages and how to infect macrophages with engineered lysogens to study these interactions at the single-cell level. For complete details on the use and execution of this protocol, please refer to Bodner et al. (2020). A λ phage with a fluorescent lysis reporter is generated by yeast recombineering. E. coli are infected with recombinant phage to form lysogens. The lysis reporter is validated by agarose pad imaging and plaque assay. Macrophages are infected with the lysogens and imaged to track prophage induction.
Affiliation(s)
- Katie Bodner
- Department of Bioengineering, Stanford University, Stanford, CA 94305, USA
- Allen Discovery Center for Systems Modeling of Infection, Stanford University, Stanford, CA 94305, USA
- Corresponding author
| | - Arin L. Melkonian
- Department of Bioengineering, Stanford University, Stanford, CA 94305, USA
- Allen Discovery Center for Systems Modeling of Infection, Stanford University, Stanford, CA 94305, USA
| | - Markus W. Covert
- Department of Bioengineering, Stanford University, Stanford, CA 94305, USA
- Allen Discovery Center for Systems Modeling of Infection, Stanford University, Stanford, CA 94305, USA
- Corresponding author
| |
|
29
|
Meta-iPVP: a sequence-based meta-predictor for improving the prediction of phage virion proteins using effective feature representation. J Comput Aided Mol Des 2020; 34:1105-1116. [DOI: 10.1007/s10822-020-00323-z] [Citation(s) in RCA: 28] [Impact Index Per Article: 7.0] [Received: 05/04/2020] [Accepted: 06/10/2020] [Indexed: 12/11/2022]
|
30
|
|
31
|
Li HF, Wang XF, Tang H. Predicting Bacteriophage Enzymes and Hydrolases by Using Combined Features. Front Bioeng Biotechnol 2020; 8:183. [PMID: 32266225 PMCID: PMC7105632 DOI: 10.3389/fbioe.2020.00183] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.3] [Received: 02/02/2020] [Accepted: 02/24/2020] [Indexed: 12/19/2022] Open
Abstract
Bacteriophages are viruses that infect bacteria. They have been applied in the treatment of pathogenic bacterial infections. Phage enzymes and hydrolases play the most important role in the destruction of bacterial cells. Correctly identifying the hydrolases encoded by phages is not only beneficial to the study of their function, but also conducive to antibacterial drug discovery. Thus, this work aims to recognize the enzymes and hydrolases in phages. A combination of different features was used to represent samples of phage and hydrolase. A feature selection technique called analysis of variance was developed to optimize features. The classification was performed by using a support vector machine (SVM). The prediction process includes two steps. The first step is to identify phage enzymes. The second step is to determine whether a phage enzyme is a hydrolase or not. The jackknife cross-validated results showed that our method could produce overall accuracies of 85.1 and 94.3%, respectively, for the two predictions, demonstrating that the proposed method is promising.
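Feature ranking by analysis of variance can be illustrated with a minimal sketch. It assumes the standard one-way ANOVA F-statistic (between-class variance divided by within-class variance) computed independently for each feature; the function names and the toy data in the test are hypothetical, and the sketch does not reproduce the authors' feature set or their SVM step.

```python
def anova_f(groups):
    """One-way ANOVA F-statistic for one feature, given its values per class."""
    n_total = sum(len(g) for g in groups)
    k = len(groups)
    grand_mean = sum(sum(g) for g in groups) / n_total
    means = [sum(g) / len(g) for g in groups]
    # Mean square between classes: spread of class means around the grand mean.
    between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means)) / (k - 1)
    # Mean square within classes: spread of samples around their own class mean.
    within = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g) / (n_total - k)
    return between / within if within > 0 else float("inf")


def rank_features(samples, labels):
    """Return feature indices sorted from highest to lowest F-statistic."""
    classes = sorted(set(labels))
    scores = []
    for j in range(len(samples[0])):
        groups = [[s[j] for s, y in zip(samples, labels) if y == c] for c in classes]
        scores.append(anova_f(groups))
    return sorted(range(len(scores)), key=lambda j: -scores[j])
```

A feature whose values differ strongly between classes relative to their spread within each class receives a high F-statistic; keeping only the top-ranked features before training a classifier is one common way to use such a ranking.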
Affiliation(s)
- Hong-Fei Li
- Department of Pathophysiology, Key Laboratory of Medical Electrophysiology, Ministry of Education, Southwest Medical University, Luzhou, China; School of Computer and Information Engineering, Henan Normal University, Henan, China
| | - Xian-Fang Wang
- School of Computer and Information Engineering, Henan Normal University, Henan, China
| | - Hua Tang
- Department of Pathophysiology, Key Laboratory of Medical Electrophysiology, Ministry of Education, Southwest Medical University, Luzhou, China
| |
|
32
|
Meng C, Zhang J, Ye X, Guo F, Zou Q. Review and comparative analysis of machine learning-based phage virion protein identification methods. BIOCHIMICA ET BIOPHYSICA ACTA-PROTEINS AND PROTEOMICS 2020; 1868:140406. [PMID: 32135196 DOI: 10.1016/j.bbapap.2020.140406] [Citation(s) in RCA: 20] [Impact Index Per Article: 5.0] [Received: 01/01/2020] [Revised: 02/14/2020] [Accepted: 02/27/2020] [Indexed: 02/01/2023]
Abstract
Phage virion protein (PVP) identification plays a key role in elucidating relationships between phages and hosts. Moreover, PVP identification can facilitate the design of related biochemical entities. Recently, several machine learning approaches have emerged for this purpose and have shown their potential. In this study, the proposed PVP identifiers are systematically reviewed, and the related algorithms and tools are comprehensively analyzed. We summarized the common framework of these PVP identifiers and constructed our own novel identifiers based upon the framework. Furthermore, we focus on a performance comparison of all PVP identifiers by using a training dataset and an independent dataset. Highlighting the pros and cons of these identifiers demonstrates that g-gap DPC (dipeptide composition) features are capable of representing characteristics of PVPs. Moreover, SVM (support vector machine) is proven to be the more effective classifier to distinguish PVPs and non-PVPs.
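The g-gap dipeptide composition can be illustrated with a minimal sketch. It assumes the common definition in which, for a gap g, every pair of residues separated by g intervening positions is counted and the counts are normalized to frequencies; the function name `g_gap_dpc` is hypothetical and the code is not taken from any of the reviewed tools.

```python
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def g_gap_dpc(seq, g=0):
    """Return the 400-dimensional g-gap dipeptide composition of a protein sequence.

    g = 0 gives the ordinary (adjacent) dipeptide composition.
    """
    pairs = ["".join(p) for p in product(AMINO_ACIDS, repeat=2)]
    counts = dict.fromkeys(pairs, 0)
    total = 0
    for i in range(len(seq) - g - 1):
        pair = seq[i] + seq[i + g + 1]
        if pair in counts:  # skip pairs that contain non-standard residues
            counts[pair] += 1
            total += 1
    if total == 0:
        return [0.0] * len(pairs)
    # Normalize by the number of counted pairs so the features sum to 1.
    return [counts[p] / total for p in pairs]
```

Every protein, whatever its length, is mapped to a fixed-length vector of 400 frequencies, which is what makes this representation convenient as input to a classifier such as an SVM.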
Affiliation(s)
- Chaolu Meng
- College of Intelligence and Computing, Tianjin University, Tianjin, China; College of Computer and Information Engineering, Inner Mongolia Agricultural University, Hohhot, China
| | - Jun Zhang
- Rehabilitation Department, Heilongjiang Province Land Reclamation Headquarters General Hospital, Harbin, China
| | - Xiucai Ye
- Department of Computer Science, University of Tsukuba, Tsukuba, Science City, Japan
| | - Fei Guo
- College of Intelligence and Computing, Tianjin University, Tianjin, China
| | - Quan Zou
- Institute of Fundamental and Frontier Sciences, University of Electronic Science and Technology of China, Chengdu, China; Center for Informational Biology, University of Electronic Science and Technology of China, Chengdu, China.
| |
|
33
|
Tisza MJ, Pastrana DV, Welch NL, Stewart B, Peretti A, Starrett GJ, Pang YYS, Krishnamurthy SR, Pesavento PA, McDermott DH, Murphy PM, Whited JL, Miller B, Brenchley J, Rosshart SP, Rehermann B, Doorbar J, Ta'ala BA, Pletnikova O, Troncoso JC, Resnick SM, Bolduc B, Sullivan MB, Varsani A, Segall AM, Buck CB. Discovery of several thousand highly diverse circular DNA viruses. eLife 2020; 9:51971. [PMID: 32014111 PMCID: PMC7000223 DOI: 10.7554/elife.51971] [Citation(s) in RCA: 116] [Impact Index Per Article: 29.0] [Received: 09/19/2019] [Accepted: 01/06/2020] [Indexed: 12/18/2022] Open
Abstract
Although millions of distinct virus species likely exist, only approximately 9000 are catalogued in GenBank's RefSeq database. We selectively enriched for the genomes of circular DNA viruses in over 70 animal samples, ranging from nematodes to human tissue specimens. A bioinformatics pipeline, Cenote-Taker, was developed to automatically annotate over 2500 complete genomes in a GenBank-compliant format. The new genomes belong to dozens of established and emerging viral families. Some appear to be the result of previously undescribed recombination events between ssDNA and ssRNA viruses. In addition, hundreds of circular DNA elements that do not encode any discernable similarities to previously characterized sequences were identified. To characterize these ‘dark matter’ sequences, we used an artificial neural network to identify candidate viral capsid proteins, several of which formed virus-like particles when expressed in culture. These data further the understanding of viral sequence diversity and allow for high throughput documentation of the virosphere.
When scientists hunt for new DNA sequences, sometimes they get a lot more than they bargained for. Such is the case in metagenomic surveys, which analyze not just DNA of a particular organism, but all the DNA in an environment at large. A vexing problem with these surveys is the overwhelming number of DNA sequences detected that are so different from any known microbe that they cannot be classified using traditional approaches. However, some of these “known unknowns” are undoubtedly viral sequences, because only a fraction of the enormous diversity of viruses has been characterized. This “viral dark matter” is a major obstacle for those studying viruses. This led Tisza et al. to attempt to classify some of the unknown viral sequences in their metagenomic surveys. The search, which specifically focused on viruses with circular DNA genomes, detected over 2,500 circular viral genomes. Intensive analysis revealed that many of these genomes had similar makeup to previously discovered viruses, but hundreds of them were totally different from any known virus, based on typical methods of comparison. Computational analysis of genes that were conserved among some of these brand-new circular sequences often revealed virus-like features. Experiments on a few of these genes showed that they encoded proteins capable of forming particles reminiscent of characteristic viral shells, implying that these new sequences are indeed viruses. Tisza et al. have added the 2,500 newly characterized viral sequences to the publicly accessible GenBank database, and the sequences are being considered for the more authoritative RefSeq database, which currently contains around 9,000 complete viral genomes. The expanded databases will hopefully now better equip scientists to explore the enormous diversity of viruses and help medics and veterinarians to detect disease-causing viruses in humans and other animals.
Affiliation(s)
- Michael J Tisza
- Lab of Cellular Oncology, National Cancer Institute, National Institutes of Health, Bethesda, United States
| | - Diana V Pastrana
- Lab of Cellular Oncology, National Cancer Institute, National Institutes of Health, Bethesda, United States
| | - Nicole L Welch
- Lab of Cellular Oncology, National Cancer Institute, National Institutes of Health, Bethesda, United States
| | - Brittany Stewart
- Lab of Cellular Oncology, National Cancer Institute, National Institutes of Health, Bethesda, United States
| | - Alberto Peretti
- Lab of Cellular Oncology, National Cancer Institute, National Institutes of Health, Bethesda, United States
| | - Gabriel J Starrett
- Lab of Cellular Oncology, National Cancer Institute, National Institutes of Health, Bethesda, United States
| | - Yuk-Ying S Pang
- Lab of Cellular Oncology, National Cancer Institute, National Institutes of Health, Bethesda, United States
| | - Siddharth R Krishnamurthy
- Metaorganism Immunity Section, Laboratory of Immune System Biology, National Institute of Allergy and Infectious Diseases, National Institutes of Health, Bethesda, United States
| | - Patricia A Pesavento
- Department of Pathology, Microbiology, and Immunology, University of California, Davis, Davis, United States
| | - David H McDermott
- Molecular Signaling Section, Laboratory of Molecular Immunology, National Institute of Allergy and Infectious Diseases, National Institutes of Health, Bethesda, United States
| | - Philip M Murphy
- Molecular Signaling Section, Laboratory of Molecular Immunology, National Institute of Allergy and Infectious Diseases, National Institutes of Health, Bethesda, United States
| | - Jessica L Whited
- Department of Orthopedic Surgery, Harvard Medical School, The Harvard Stem Cell Institute, Brigham and Women's Hospital, Boston, United States; Broad Institute of MIT and Harvard, Cambridge, United States; Department of Stem Cell and Regenerative Biology, Harvard University, Cambridge, United States
| | - Bess Miller
- Department of Orthopedic Surgery, Harvard Medical School, The Harvard Stem Cell Institute, Brigham and Women's Hospital, Boston, United States; Broad Institute of MIT and Harvard, Cambridge, United States
| | - Jason Brenchley
- Barrier Immunity Section, Lab of Viral Diseases, National Institute of Allergy and Infectious Diseases, National Institutes of Health, Cambridge, United States
| | - Stephan P Rosshart
- Immunology Section, Liver Diseases Branch, National Institute of Diabetes and Digestive and Kidney Diseases, National Institutes of Health, Bethesda, United States
| | - Barbara Rehermann
- Immunology Section, Liver Diseases Branch, National Institute of Diabetes and Digestive and Kidney Diseases, National Institutes of Health, Bethesda, United States
| | - John Doorbar
- Department of Pathology, University of Cambridge, Cambridge, United Kingdom
| | | | - Olga Pletnikova
- Department of Pathology (Neuropathology), Johns Hopkins University School of Medicine, Baltimore, United States
| | - Juan C Troncoso
- Department of Pathology (Neuropathology), Johns Hopkins University School of Medicine, Baltimore, United States
| | - Susan M Resnick
- Laboratory of Behavioral Neuroscience, National Institute on Aging, National Institutes of Health, Baltimore, United States
| | - Ben Bolduc
- Department of Microbiology, Ohio State University, Columbus, United States
| | - Matthew B Sullivan
- Department of Microbiology, Ohio State University, Columbus, United States; Civil Environmental and Geodetic Engineering, Ohio State University, Columbus, United States
| | - Arvind Varsani
- The Biodesign Center of Fundamental and Applied Microbiomics, School of Life Sciences, Center for Evolution and Medicine, Arizona State University, Tempe, United States; Structural Biology Research Unit, Department of Clinical Laboratory Sciences, University of Cape Town, Rondebosch, South Africa
| | - Anca M Segall
- Viral Information Institute and Department of Biology, San Diego State University, San Diego, United States
| | - Christopher B Buck
- Lab of Cellular Oncology, National Cancer Institute, National Institutes of Health, Bethesda, United States
| |
|
34
|
Charoenkwan P, Kanthawong S, Schaduangrat N, Yana J, Shoombuatong W. PVPred-SCM: Improved Prediction and Analysis of Phage Virion Proteins Using a Scoring Card Method. Cells 2020; 9:E353. [PMID: 32028709 PMCID: PMC7072630 DOI: 10.3390/cells9020353] [Citation(s) in RCA: 39] [Impact Index Per Article: 9.8] [Received: 12/26/2019] [Revised: 01/20/2020] [Accepted: 01/27/2020] [Indexed: 12/16/2022] Open
Abstract
Although existing methods have been successful in predicting phage (or bacteriophage) virion proteins (PVPs) using various types of protein features and complex classifiers, such as support vector machine and naïve Bayes, these two methods do not allow interpretability. However, the characterization and analysis of PVPs might be of great significance to understanding the molecular mechanisms of bacteriophage genetics and the development of antibacterial drugs. Hence, we herein proposed a novel method (PVPred-SCM) based on the scoring card method (SCM) in conjunction with dipeptide composition to identify and characterize PVPs. In PVPred-SCM, the propensity scores of 400 dipeptides were calculated using the statistical discrimination approach. A rigorous independent validation test showed that PVPred-SCM utilizing only dipeptide composition yielded an accuracy of 77.56%, indicating that PVPred-SCM performed well relative to the state-of-the-art method utilizing a number of protein features. Furthermore, the propensity scores of dipeptides were used to provide insights into the biochemical and biophysical properties of PVPs. Upon comparison, it was found that PVPred-SCM was superior to the existing methods considering its simplicity, interpretability, and implementation. Finally, in an effort to facilitate high-throughput prediction of PVPs, we provided a user-friendly web-server for identifying the likelihood of whether or not these sequences are PVPs. It is anticipated that PVPred-SCM will become a useful tool, or at least a complement to existing methods, for predicting and analyzing PVPs.
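The dipeptide-composition scoring idea in this abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names and toy sequences are hypothetical, and a plain difference in class-mean composition stands in for the propensity scores, which the published method derives with its own statistical procedure.

```python
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
# All 20 x 20 = 400 ordered residue pairs.
DIPEPTIDES = ["".join(p) for p in product(AMINO_ACIDS, repeat=2)]


def dipeptide_composition(seq):
    """Return the fraction of each of the 400 dipeptides in seq."""
    seq = seq.upper()
    counts = dict.fromkeys(DIPEPTIDES, 0)
    total = 0
    for i in range(len(seq) - 1):
        pair = seq[i:i + 2]
        if pair in counts:  # skip pairs containing non-standard residues
            counts[pair] += 1
            total += 1
    if total == 0:
        return dict.fromkeys(DIPEPTIDES, 0.0)
    return {dp: c / total for dp, c in counts.items()}


def mean_composition(seqs):
    """Average the dipeptide composition over a list of sequences."""
    comps = [dipeptide_composition(s) for s in seqs]
    return {dp: sum(c[dp] for c in comps) / len(comps) for dp in DIPEPTIDES}


def propensity_scores(positives, negatives):
    """Assumed stand-in score: positive-class mean minus negative-class mean."""
    pos = mean_composition(positives)
    neg = mean_composition(negatives)
    return {dp: pos[dp] - neg[dp] for dp in DIPEPTIDES}


def score_sequence(seq, scores):
    """Composition-weighted sum of per-dipeptide scores for one sequence."""
    comp = dipeptide_composition(seq)
    return sum(comp[dp] * scores[dp] for dp in DIPEPTIDES)


# Toy data only; real use would need curated PVP and non-PVP sequences.
scores = propensity_scores(["AAAA"], ["CCCC"])
```

A sequence would then be called a PVP when `score_sequence` exceeds a threshold chosen on training data. Because every dipeptide carries one readable score, the classifier stays interpretable, which is the property the abstract emphasizes.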
Affiliation(s)
- Phasit Charoenkwan
- Modern Management and Information Technology, College of Arts, Media and Technology, Chiang Mai University, Chiang Mai 50200, Thailand;
| | - Sakawrat Kanthawong
- Department of Microbiology, Faculty of Medicine, Khon Kaen University, Khon Kaen 40002, Thailand;
| | - Nalini Schaduangrat
- Center of Data Mining and Biomedical Informatics, Faculty of Medical Technology, Mahidol University, Bangkok 10700, Thailand;
| | - Janchai Yana
- Department of Chemistry, Faculty of Science and Technology, Chiang Mai Rajabhat University, Chiang Mai 50300, Thailand;
| | - Watshara Shoombuatong
- Center of Data Mining and Biomedical Informatics, Faculty of Medical Technology, Mahidol University, Bangkok 10700, Thailand;
| |
|
35
|
Benler S, Hung SH, Vander Griend JA, Peters GA, Rohwer F, Segall AM. Gp4 is a nuclease required for morphogenesis of T4-like bacteriophages. Virology 2020; 543:7-12. [PMID: 32056848 DOI: 10.1016/j.virol.2020.01.008] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 07/23/2019] [Revised: 01/15/2020] [Accepted: 01/15/2020] [Indexed: 11/26/2022]
Abstract
An essential step in the morphogenesis of tailed bacteriophages is the joining of heads and tails to form infectious virions. Our understanding of the maturation of complete virus particles remains incomplete. Through an unknown mechanism, phage T4 gene product 4 (gp4) plays an essential role in the head-tail joining step of T4-like phages. Alignment of T4 gp4 homologs identified a type II restriction endonuclease motif. Purified gp4 from both T4 and a marine T4-like bacteriophage, YC, have non-specific nuclease activity in vitro. Mutation of a single conserved amino acid residue in the endonuclease fold of T4 and YC gp4 abrogates nuclease activity. When expressed in trans, the wild type T4 gp4, but neither the mutated T4 protein nor the YC homolog, rescues a T4 gene 4 amber mutant phage. Thus the nuclease activity appears essential for morphogenesis, potentially by cleaving packaged DNA to enable the joining of heads to tails.
Affiliation(s)
- Sean Benler
- Department of Biology and Viral Information Institute, San Diego State University, 5500 Campanile Drive, San Diego, CA, 92182, USA.
| | - Shr-Hau Hung
- Department of Biology and Viral Information Institute, San Diego State University, 5500 Campanile Drive, San Diego, CA, 92182, USA
| | - Jacob A Vander Griend
- Department of Biology and Viral Information Institute, San Diego State University, 5500 Campanile Drive, San Diego, CA, 92182, USA
| | - Gregory A Peters
- Department of Biology and Viral Information Institute, San Diego State University, 5500 Campanile Drive, San Diego, CA, 92182, USA
| | - Forest Rohwer
- Department of Biology and Viral Information Institute, San Diego State University, 5500 Campanile Drive, San Diego, CA, 92182, USA
| | - Anca M Segall
- Department of Biology and Viral Information Institute, San Diego State University, 5500 Campanile Drive, San Diego, CA, 92182, USA.
| |
|
36
|
Shkoporov AN, Hill C. Bacteriophages of the Human Gut: The "Known Unknown" of the Microbiome. Cell Host Microbe 2019; 25:195-209. [PMID: 30763534 DOI: 10.1016/j.chom.2019.01.017] [Citation(s) in RCA: 348] [Impact Index Per Article: 69.6] [Indexed: 02/06/2023]
Abstract
The human gut microbiome is a dense and taxonomically diverse consortium of microorganisms. While the bacterial components of the microbiome have received considerable attention, comparatively little is known about the composition and physiological significance of human gut-associated bacteriophage populations (phageome). By extrapolating our knowledge of phage-host interactions from other environments, one could expect that >10^12 viruses reside in the human gut, and we can predict that they play important roles in regulating the complex microbial networks operating in this habitat. Before delving into their function, we need to first overcome the challenges associated with studying and characterizing the phageome. In this Review, we summarize the available methods and main findings regarding taxonomic composition, community structure, and population dynamics in the human gut phageome. We also discuss the main challenges in the field and identify promising avenues for future research.
Affiliation(s)
- Andrey N Shkoporov
- APC Microbiome Ireland & School of Microbiology, University College Cork, Co. Cork, Ireland.
| | - Colin Hill
- APC Microbiome Ireland & School of Microbiology, University College Cork, Co. Cork, Ireland
| |
|
37
|
Arif M, Ali F, Ahmad S, Kabir M, Ali Z, Hayat M. Pred-BVP-Unb: Fast prediction of bacteriophage Virion proteins using un-biased multi-perspective properties with recursive feature elimination. Genomics 2019; 112:1565-1574. [PMID: 31526842 DOI: 10.1016/j.ygeno.2019.09.006] [Citation(s) in RCA: 35] [Impact Index Per Article: 7.0] [Received: 07/19/2019] [Revised: 08/27/2019] [Accepted: 09/11/2019] [Indexed: 10/26/2022]
Abstract
Bacteriophage virion proteins (BVPs) are bacterial viruses that have a great impact on different biological functions of bacteria. They are widely used in genetic engineering and phage therapy applications. Correct identification of BVPs through conventional pathogen methods is slow and expensive. Thus, designing a bioinformatics predictor is urgently desirable to accelerate correct identification of BVPs within a huge volume of proteins. However, the performance of available prediction tools is inadequate due to the lack of useful feature representation and a severe imbalance issue. In the present study, we propose an intelligent model, called Pred-BVP-Unb, for discrimination of BVPs that employs three nominal sequence-driven descriptors, i.e. Bi-PSSM evolutionary information, composition & translation, and split amino acid composition. The imbalance between classes was handled with the help of a synthetic minority oversampling technique. The essential attributes are selected by a robust algorithm called recursive feature elimination. Finally, the optimal feature space is provided to a support vector machine classifier using a radial basis kernel in order to train the model. Our predictor remarkably outperforms existing approaches in the literature by achieving the highest accuracy of 92.54% and 83.06% on the benchmark and independent datasets, respectively. We expect that the Pred-BVP-Unb tool can provide useful hints for designing antibacterial drugs and also help expedite large-scale discovery of new bacteriophage virion proteins. The source code and all datasets are publicly available at https://github.com/Muhammad-Arif-NUST/BVP_Pred_Unb.
Affiliation(s)
- Muhammad Arif
- School of Computer Science and Engineering, Nanjing University of Science and Technology, Nanjing 210094, China; Department of Computer Science, Abdul Wali Khan University Mardan, KP, Pakistan.
| | - Farman Ali
- School of Computer Science and Engineering, Nanjing University of Science and Technology, Nanjing 210094, China.
| | - Saeed Ahmad
- School of Computer Science and Engineering, Nanjing University of Science and Technology, Nanjing 210094, China
| | - Muhammad Kabir
- School of Computer Science and Engineering, Nanjing University of Science and Technology, Nanjing 210094, China
| | - Zakir Ali
- School of Computer Science and Engineering, Nanjing University of Science and Technology, Nanjing 210094, China
| | - Maqsood Hayat
- Department of Computer Science, Abdul Wali Khan University Mardan, KP, Pakistan.
| |
|
38
|
Takahashi MB, Coelho de Oliveira H, Fernández Núñez EG, Rocha JC. Brewing process optimization by artificial neural network and evolutionary algorithm approach. J Food Process Eng 2019. [DOI: 10.1111/jfpe.13103] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.0] [Indexed: 01/21/2023]
Affiliation(s)
- Maria Beatriz Takahashi
- Departamento de Ciências Biológicas, Universidade Estadual Paulista‐UNESP/Assis, Assis, São Paulo, Brazil
| | | | - Eutimio Gustavo Fernández Núñez
- Centro de Ciências Naturais e Humanas (CCNH), Universidade Federal do ABC, Santo André, São Paulo, Brazil
- Escola de Artes, Ciências e Humanidades (EACH), Universidade de São Paulo, São Paulo, São Paulo, Brazil
| | - José Celso Rocha
- Departamento de Ciências Biológicas, Universidade Estadual Paulista‐UNESP/Assis, Assis, São Paulo, Brazil
| |
|
39
|
Ru X, Li L, Wang C. Identification of Phage Viral Proteins With Hybrid Sequence Features. Front Microbiol 2019; 10:507. [PMID: 30972038 PMCID: PMC6443926 DOI: 10.3389/fmicb.2019.00507] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.8] [Received: 12/24/2018] [Accepted: 02/27/2019] [Indexed: 02/01/2023] Open
Abstract
The uniqueness of bacteriophages plays an important role in bioinformatics research. In real applications, the function of the bacteriophage virion proteins is the main area of interest. Therefore, it is very important to classify bacteriophage virion proteins and non-phage virion proteins accurately. Extracting comprehensive and effective sequence features from proteins plays a vital role in protein classification. In order to represent protein information more fully, this paper combines the features extracted by the feature representation algorithm based on sequence information (CCPA) with those extracted by the feature representation algorithm based on sequence and structure information. After extracting features, the Max-Relevance-Max-Distance (MRMD) algorithm is used to select the optimal feature set with the strongest correlation between class labels and low redundancy between features. Given the randomness of the samples selected by the random forest classification algorithm and the randomness of the features used to produce each node variable, a random forest method is employed to perform 10-fold cross-validation on the bacteriophage protein classification. The accuracy of this model is as high as 93.5% in the classification of phage proteins in this study. This study also found that, among the eight physicochemical properties considered, the charge property has the greatest impact on the classification of bacteriophage proteins. These results indicate that the model discussed in this paper is an important tool in bacteriophage protein research.
Affiliation(s)
- Xiaoqing Ru
- School of Information and Electrical Engineering, Hebei University of Engineering, Handan, China
| | - Lihong Li
- School of Information and Electrical Engineering, Hebei University of Engineering, Handan, China
| | - Chunyu Wang
- School of Computer Science and Technology, Harbin Institute of Technology, Harbin, China
| |
|
40
|
Yang L, Gao H, Liu Z, Tang L. Identification of Phage Virion Proteins by Using the g-gap Tripeptide Composition. Lett Org Chem 2019. [DOI: 10.2174/1570178615666180910112813] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/22/2022]
Abstract
Phages are widely distributed in locations populated by bacterial hosts. Phage proteins can be divided into two main categories, that is, virion and non-virion proteins with different functions. In practice, people mainly use phage virion proteins to clarify the lysis mechanism of bacterial cells and develop new antibacterial drugs. Accurate identification of phage virion proteins is therefore essential to understanding the phage lysis mechanism. Although some computational methods have focused on identifying virion proteins, the results are not satisfactory, leaving room for improvement. In this study, a new sequence-based method was proposed to identify phage virion proteins using g-gap tripeptide composition. In this approach, the protein features were first extracted from the g-gap tripeptide composition. Subsequently, we obtained an optimal feature subset by performing incremental feature selection (IFS) with information gain. Finally, the support vector machine (SVM) was used as the classifier to discriminate virion proteins from non-virion proteins. In a 10-fold cross-validation test, our proposed method achieved an accuracy of 97.40% with an AUC of 0.9958, which outperforms state-of-the-art methods. The result reveals that our proposed method could be a promising method for phage virion protein identification.
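The feature-extraction step named in this abstract can be illustrated with a short sketch. The abstract does not define the g-gap tripeptide, so the code assumes one common reading: three residues taken at positions i, i+g+1 and i+2(g+1), with g residues skipped between neighbors. The function name and example sequence are hypothetical, and the feature selection and SVM steps are not shown.

```python
from collections import Counter

AMINO_ACIDS = set("ACDEFGHIKLMNPQRSTVWY")


def g_gap_tripeptide_composition(seq, g=0):
    """Return relative frequencies of g-gap tripeptides found in seq.

    Assumed definition: residues at i, i+g+1 and i+2*(g+1).
    With g=0 this reduces to ordinary adjacent tripeptides.
    """
    seq = seq.upper()
    step = g + 1
    span = 2 * step  # distance from the first to the third residue
    counts = Counter()
    for i in range(len(seq) - span):
        tri = seq[i] + seq[i + step] + seq[i + 2 * step]
        if all(ch in AMINO_ACIDS for ch in tri):  # skip non-standard residues
            counts[tri] += 1
    total = sum(counts.values())
    if total == 0:
        return {}
    return {tri: n / total for tri, n in counts.items()}


adjacent = g_gap_tripeptide_composition("ACDEF", g=0)
gapped = g_gap_tripeptide_composition("ACDEF", g=1)
```

The result is kept sparse (only observed tripeptides appear). Expanding it over all 20^3 = 8000 possible tripeptides would give the fixed-length vector that a feature-selection step and a classifier would consume.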
Affiliation(s)
- Liangwei Yang
- School of Computer Science and Engineering, Center for Informational Biology, University of Electronic Science and Technology of China, Chengdu 610054, China
| | - Hui Gao
- School of Computer Science and Engineering, Center for Informational Biology, University of Electronic Science and Technology of China, Chengdu 610054, China
| | - Zhen Liu
- School of Computer Science and Engineering, Center for Informational Biology, University of Electronic Science and Technology of China, Chengdu 610054, China
| | - Lixia Tang
- Key Laboratory for Neuro-Information of Ministry of Education, School of Life Science and Technology, Center for Informational Biology, University of Electronic Science and Technology of China, Chengdu 610054, China
| |
|
41
|
Pan Y, Gao H, Lin H, Liu Z, Tang L, Li S. Identification of Bacteriophage Virion Proteins Using Multinomial Naïve Bayes with g-Gap Feature Tree. Int J Mol Sci 2018; 19:E1779. [PMID: 29914091 PMCID: PMC6032154 DOI: 10.3390/ijms19061779] [Citation(s) in RCA: 18] [Impact Index Per Article: 3.0] [Received: 05/24/2018] [Revised: 06/12/2018] [Accepted: 06/12/2018] [Indexed: 01/29/2023] Open
Abstract
Bacteriophages, which are tremendously important to the ecology and evolution of bacteria, play a key role in the development of genetic engineering. Bacteriophage virion proteins are essential materials of the infectious viral particles and are responsible for several biological functions. The correct identification of bacteriophage virion proteins is of great importance for understanding both life at the molecular level and genetic evolution. However, few computational methods are available for identifying bacteriophage virion proteins. In this paper, we proposed a new method to predict bacteriophage virion proteins using a Multinomial Naïve Bayes classification model based on discrete features generated from the g-gap feature tree. The accuracy of the proposed model reaches 98.37% with MCC of 96.27% in 10-fold cross-validation. This result suggests that the proposed method can be a useful approach in identifying bacteriophage virion proteins from sequence information. For the convenience of experimental scientists, a web server (PhagePred) that implements the proposed predictor is available, which can be freely accessed on the Internet.
Affiliation(s)
- Yanyuan Pan
- School of Computer Science and Engineering, Center for Informational Biology, University of Electronic Science and Technology of China, Chengdu 610054, China.
| | - Hui Gao
- School of Computer Science and Engineering, Center for Informational Biology, University of Electronic Science and Technology of China, Chengdu 610054, China.
| | - Hao Lin
- Key Laboratory for Neuro-Information of Ministry of Education, School of Life Science and Technology, Center for Informational Biology, University of Electronic Science and Technology of China, Chengdu 610054, China.
| | - Zhen Liu
- School of Computer Science and Engineering, Center for Informational Biology, University of Electronic Science and Technology of China, Chengdu 610054, China.
| | - Lixia Tang
- Key Laboratory for Neuro-Information of Ministry of Education, School of Life Science and Technology, Center for Informational Biology, University of Electronic Science and Technology of China, Chengdu 610054, China.
| | - Songtao Li
- School of Computer Science and Engineering, Center for Informational Biology, University of Electronic Science and Technology of China, Chengdu 610054, China.
| |
|
42
|
Manavalan B, Shin TH, Lee G. PVP-SVM: Sequence-Based Prediction of Phage Virion Proteins Using a Support Vector Machine. Front Microbiol 2018; 9:476. [PMID: 29616000 PMCID: PMC5864850 DOI: 10.3389/fmicb.2018.00476] [Citation(s) in RCA: 133] [Impact Index Per Article: 22.2] [Received: 12/07/2017] [Accepted: 02/28/2018] [Indexed: 12/29/2022] Open
Abstract
Accurately identifying bacteriophage virion proteins from uncharacterized sequences is important to understand interactions between the phage and its host bacteria in order to develop new antibacterial drugs. However, identification of such proteins using experimental techniques is expensive and often time consuming; hence, development of an efficient computational algorithm for the prediction of phage virion proteins (PVPs) prior to in vitro experimentation is needed. Here, we describe a support vector machine (SVM)-based PVP predictor, called PVP-SVM, which was trained with 136 optimal features. A feature selection protocol was employed to identify the optimal features from a large set that included amino acid composition, dipeptide composition, atomic composition, physicochemical properties, and chain-transition-distribution. PVP-SVM achieved an accuracy of 0.870 during leave-one-out cross-validation, which was 6% higher than control SVM predictors trained with all features, indicating the efficiency of the feature selection method. Furthermore, PVP-SVM displayed superior performance compared to the currently available method, PVPred, and two other machine-learning methods developed in this study when objectively evaluated with an independent dataset. For the convenience of the scientific community, a user-friendly and publicly accessible web server has been established at www.thegleelab.org/PVP-SVM/PVP-SVM.html.
Affiliation(s)
| | - Tae H Shin
- Department of Physiology, Ajou University School of Medicine, Suwon, South Korea; Institute of Molecular Science and Technology, Ajou University, Suwon, South Korea
| | - Gwang Lee
- Department of Physiology, Ajou University School of Medicine, Suwon, South Korea; Institute of Molecular Science and Technology, Ajou University, Suwon, South Korea
| |
|
43
|
McNair K, Aziz RK, Pusch GD, Overbeek R, Dutilh BE, Edwards R. Phage Genome Annotation Using the RAST Pipeline. Methods Mol Biol 2018; 1681:231-238. [PMID: 29134599 DOI: 10.1007/978-1-4939-7343-9_17] [Citation(s) in RCA: 42] [Impact Index Per Article: 7.0] [Indexed: 12/18/2022]
Abstract
Phages are complex biomolecular machineries that have to survive in a bacterial world. Phage genomes show many adaptations to their lifestyle such as shorter genes, reduced capacity for redundant DNA sequences, and the inclusion of tRNAs in their genomes. In addition, phages are not free-living; they require a host for replication and survival. These unique adaptations provide challenges for the bioinformatics analysis of phage genomes. In particular, ORF calling, genome annotation, noncoding RNA (ncRNA) identification, and the identification of transposons and insertions are all complicated in phage genome analysis. We provide a road map through the phage genome annotation pipeline, and discuss the challenges and solutions for phage genome annotation as we have implemented them in the rapid annotation using subsystems (RAST) pipeline.
Affiliation(s)
- Katelyn McNair
- Computational Sciences Research Center, San Diego State University, 5500 Campanile Drive, San Diego, CA, 92182, USA
| | - Ramy Karam Aziz
- Department of Microbiology and Immunology, Faculty of Pharmacy, Cairo University, Cairo, 11562, Egypt; Argonne National Laboratory, 9700 S. Cass Ave, Argonne, IL, 60439, USA
| | - Gordon D Pusch
- Argonne National Laboratory, 9700 S. Cass Ave, Argonne, IL, 60439, USA
| | - Ross Overbeek
- Argonne National Laboratory, 9700 S. Cass Ave, Argonne, IL, 60439, USA
| | - Bas E Dutilh
- Theoretical Biology and Bioinformatics, Utrecht University, Padualaan 8, 3584, Utrecht, The Netherlands; Centre for Molecular and Biomolecular Informatics, Radboud Institute for Molecular Life Sciences, Radboud University Medical Centre, Geert Grooteplein 28, 6525, Nijmegen, The Netherlands
| | - Robert Edwards
- Computational Sciences Research Center, San Diego State University, 5500 Campanile Drive, San Diego, CA, 92182, USA; Departments of Biology and Computer Science, San Diego State University, 5500 Campanile Drive, San Diego, CA, 92182, USA
| |
|
44
|
Shatabda S, Saha S, Sharma A, Dehzangi A. iPHLoc-ES: Identification of bacteriophage protein locations using evolutionary and structural features. J Theor Biol 2017; 435:229-237. [DOI: 10.1016/j.jtbi.2017.09.022] [Citation(s) in RCA: 21] [Impact Index Per Article: 3.0] [Received: 07/20/2017] [Revised: 09/18/2017] [Accepted: 09/20/2017] [Indexed: 10/18/2022]
|
45
|
A Method Based on Artificial Intelligence To Fully Automatize The Evaluation of Bovine Blastocyst Images. Sci Rep 2017; 7:7659. [PMID: 28794478 PMCID: PMC5550425 DOI: 10.1038/s41598-017-08104-9] [Citation(s) in RCA: 35] [Impact Index Per Article: 5.0] [Received: 03/09/2017] [Accepted: 07/06/2017] [Indexed: 11/28/2022] Open
Abstract
Morphological analysis is the standard method of assessing embryo quality; however, its inherent subjectivity tends to generate discrepancies among evaluators. Using genetic algorithms and artificial neural networks (ANNs), we developed a new method for embryo analysis that is more robust and reliable than standard methods. Bovine blastocysts produced in vitro were classified as grade 1 (excellent or good), 2 (fair), or 3 (poor) by three experienced embryologists according to the International Embryo Technology Society (IETS) standard. The images (n = 482) were subjected to automatic feature extraction, and the results were used as input for a supervised learning process. One part of the dataset (15%) was used for a blind test posterior to the fitting, for which the system had an accuracy of 76.4%. Interestingly, when the same embryologists evaluated a sub-sample (10%) of the dataset, there was only 54.0% agreement with the standard (mode for grades). However, when using the ANN to assess this sub-sample, there was 87.5% agreement with the modal values obtained by the evaluators. The presented methodology is covered by National Institute of Industrial Property (INPI) and World Intellectual Property Organization (WIPO) patents and is currently undergoing a commercial evaluation of its feasibility.
|
46
|
Krishnamurthy SR, Wang D. Origins and challenges of viral dark matter. Virus Res 2017; 239:136-142. [DOI: 10.1016/j.virusres.2017.02.002] [Citation(s) in RCA: 141] [Impact Index Per Article: 20.1] [Received: 12/05/2016] [Revised: 01/31/2017] [Accepted: 02/06/2017] [Indexed: 02/07/2023]
|
47
|
Nkili-Meyong AA, Bigarré L, Labouba I, Vallaeys T, Avarre JC, Berthet N. Contribution of Next-Generation Sequencing to Aquatic and Fish Virology. Intervirology 2017; 59:285-300. [PMID: 28668959 DOI: 10.1159/000477808] [Citation(s) in RCA: 22] [Impact Index Per Article: 3.1] [Received: 01/19/2017] [Accepted: 05/27/2017] [Indexed: 12/13/2022] Open
Abstract
The recent technological advances in nucleic acid sequencing, called next-generation sequencing (NGS), have revolutionized the field of genomics and have also influenced viral research. Aquatic viruses, and especially those infecting fish, have also greatly benefited from NGS technologies, which provide a huge amount of molecular information at a low cost in a relatively short period of time. Here, we review the use of the current high-throughput sequencing platforms with a special focus on the associated challenges (regarding sample preparation and bioinformatics) in their applications to the field of aquatic virology, especially for: (i) discovering novel viruses that may be associated with fish mortalities, (ii) elucidating the mechanisms of pathogenesis, and finally (iii) studying the molecular epidemiology of these pathogens.
Affiliation(s)
- Andriniaina Andy Nkili-Meyong
- Département Zoonoses et Maladies Emergentes, Centre International de Recherches Médicales de Franceville (CIRMF), Franceville, Gabon
| | | | | | | | | | | |
|
48
|
Martinez-Hernandez F, Fornas O, Lluesma Gomez M, Bolduc B, de la Cruz Peña MJ, Martínez JM, Anton J, Gasol JM, Rosselli R, Rodriguez-Valera F, Sullivan MB, Acinas SG, Martinez-Garcia M. Single-virus genomics reveals hidden cosmopolitan and abundant viruses. Nat Commun 2017; 8:15892. [PMID: 28643787 PMCID: PMC5490008 DOI: 10.1038/ncomms15892] [Citation(s) in RCA: 129] [Impact Index Per Article: 18.4] [Received: 12/19/2016] [Accepted: 05/10/2017] [Indexed: 12/22/2022] Open
Abstract
Microbes drive ecosystems under constraints imposed by viruses. However, a lack of virus genome information hinders our ability to answer fundamental, biological questions concerning microbial communities. Here we apply single-virus genomics (SVGs) to assess whether portions of marine viral communities are missed by current techniques. The majority of the here-identified 44 viral single-amplified genomes (vSAGs) are more abundant in global ocean virome data sets than published metagenome-assembled viral genomes or isolates. This indicates that vSAGs likely best represent the dsDNA viral populations dominating the oceans. Species-specific recruitment patterns and virome simulation data suggest that vSAGs are highly microdiverse and that microdiversity hinders the metagenomic assembly, which could explain why their genomes have not been identified before. Altogether, SVGs enable the discovery of some of the likely most abundant and ecologically relevant marine viral species, such as vSAG 37-F6, which were overlooked by other methodologies.
Affiliation(s)
- Francisco Martinez-Hernandez
- Department of Physiology, Genetics, and Microbiology, University of Alicante, Carretera San Vicente del Raspeig, San Vicente del Raspeig, Alicante 03690, Spain
| | - Oscar Fornas
- Centre for Genomic Regulation (CRG), The Barcelona Institute for Science and Technology (BIST), Carrer del Doctor Aiguader, 88, PRBB Building, Barcelona 08003, Spain
- Universitat Pompeu Fabra (UPF), Carrer del Doctor Aiguader, 88, PRBB Building, Barcelona 08003, Spain
| | - Monica Lluesma Gomez
- Department of Physiology, Genetics, and Microbiology, University of Alicante, Carretera San Vicente del Raspeig, San Vicente del Raspeig, Alicante 03690, Spain
| | - Benjamin Bolduc
- Department of Microbiology, The Ohio State University, 105 Biological Sciences Building, 484 West 12th Avenue Columbus, Ohio 43210, USA
| | - Maria Jose de la Cruz Peña
- Department of Physiology, Genetics, and Microbiology, University of Alicante, Carretera San Vicente del Raspeig, San Vicente del Raspeig, Alicante 03690, Spain
| | - Joaquín Martínez Martínez
- Bigelow Laboratory for Ocean Sciences, 60 Bigelow Drive, PO Box 380, East Boothbay, Maine 04544, USA
| | - Josefa Anton
- Department of Physiology, Genetics, and Microbiology, University of Alicante, Carretera San Vicente del Raspeig, San Vicente del Raspeig, Alicante 03690, Spain
| | - Josep M. Gasol
- Department of Marine Biology and Oceanography, Institut de Ciències del Mar (ICM), CSIC, Passeig Marítim, 47, Barcelona 08003, Spain
| | - Riccardo Rosselli
- Evolutionary Genomics Group, Departamento de Producción Vegetal y Microbiología, Universidad Miguel Hernández, Campus San Juan, San Juan, Alicante 03550, Spain
| | - Francisco Rodriguez-Valera
- Evolutionary Genomics Group, Departamento de Producción Vegetal y Microbiología, Universidad Miguel Hernández, Campus San Juan, San Juan, Alicante 03550, Spain
| | - Matthew B. Sullivan
- Department of Microbiology, The Ohio State University, 105 Biological Sciences Building, 484 West 12th Avenue Columbus, Ohio 43210, USA
- Department of Civil, Environmental and Geodetic Engineering, The Ohio State University, 105 Biological Sciences Building, 484 West 12th Avenue Columbus, Ohio 43210, USA
| | - Silvia G. Acinas
- Department of Marine Biology and Oceanography, Institut de Ciències del Mar (ICM), CSIC, Passeig Marítim, 47, Barcelona 08003, Spain
| | - Manuel Martinez-Garcia
- Department of Physiology, Genetics, and Microbiology, University of Alicante, Carretera San Vicente del Raspeig, San Vicente del Raspeig, Alicante 03690, Spain
|
49
|
|
50
|
Galiez C, Magnan CN, Coste F, Baldi P. VIRALpro: a tool to identify viral capsid and tail sequences. Bioinformatics 2016; 32:1405-7. [PMID: 26733451 PMCID: PMC5860506 DOI: 10.1093/bioinformatics/btv727] [Citation(s) in RCA: 17] [Impact Index Per Article: 2.1] [Received: 08/11/2015] [Revised: 11/24/2015] [Accepted: 12/07/2015] [Indexed: 12/13/2022]
Abstract
MOTIVATION: Not only sequence data continue to outpace annotation information, but also the problem is further exacerbated when organisms are underrepresented in the annotation databases. This is the case with non-human-pathogenic viruses which occur frequently in metagenomic projects. Thus, there is a need for tools capable of detecting and classifying viral sequences. RESULTS: We describe VIRALpro a new effective tool for identifying capsid and tail protein sequences, which are the cornerstones toward viral sequence annotation and viral genome classification. AVAILABILITY AND IMPLEMENTATION: The data, software and corresponding web server are available from http://scratch.proteomics.ics.uci.edu as part of the SCRATCH suite. CONTACT: clovis.galiez@inria.fr or pfbaldi@uci.edu SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Clovis Galiez
- INRIA, Campus De Beaulieu, Rennes Cedex, 35042, France
| | - Christophe N Magnan
- Department of Computer Science and Institute for Genomics and Bioinformatics, University of California, Irvine, Irvine, CA 92697, USA
| | | | - Pierre Baldi
- Department of Computer Science and Institute for Genomics and Bioinformatics, University of California, Irvine, Irvine, CA 92697, USA
|