1
Gillani M, Pollastri G. Protein subcellular localization prediction tools. Comput Struct Biotechnol J 2024; 23:1796-1807. [PMID: 38707539 PMCID: PMC11066471 DOI: 10.1016/j.csbj.2024.04.032] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/13/2024] [Revised: 04/11/2024] [Accepted: 04/11/2024] [Indexed: 05/07/2024] Open
Abstract
Protein subcellular localization prediction is of great significance in bioinformatics and biological research. Because most proteins do not have experimentally determined localization information, computational prediction methods and tools have been an active research area for more than two decades. Knowledge of the subcellular location of a protein provides valuable information about its functions, the functioning of the cell, and its possible interactions with other proteins. Fast, reliable, and accurate predictors provide platforms to harness the abundance of sequence data to predict subcellular locations accordingly. During the last decade, there has been a considerable amount of research effort aimed at developing subcellular localization predictors. This paper reviews recent subcellular localization prediction tools in the Eukaryotic, Prokaryotic, and Virus-based categories, followed by a detailed analysis. Each predictor is discussed based on its main features, strengths, weaknesses, algorithms used, prediction techniques, and analysis. This review is supported by taxonomies of prediction tools that highlight their relevant areas and examples for straightforward categorization and ease of understanding. These taxonomies help users find suitable tools according to their needs. Furthermore, recent research gaps and challenges are discussed to cover areas that need the utmost attention. This survey provides an in-depth analysis of the most recent prediction tools to assist readers and can be considered a quick guide for researchers to identify and explore recent advancements in the literature.
Affiliation(s)
- Maryam Gillani
- School of Computer Science, University College Dublin (UCD), Dublin, D04 V1W8, Ireland
- Gianluca Pollastri
- School of Computer Science, University College Dublin (UCD), Dublin, D04 V1W8, Ireland
2
Phumiphanjarphak W, Aiewsakun P. Entourage: all-in-one sequence analysis software for genome assembly, virus detection, virus discovery, and intrasample variation profiling. BMC Bioinformatics 2024; 25:222. [PMID: 38914932 PMCID: PMC11197340 DOI: 10.1186/s12859-024-05846-y] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/08/2023] [Accepted: 06/14/2024] [Indexed: 06/26/2024] Open
Abstract
BACKGROUND Pan-virus detection, and virome investigation in general, can be challenging, mainly due to the lack of universally conserved genetic elements in viruses. Metagenomic next-generation sequencing can offer a promising solution to this problem by providing an unbiased overview of the microbial community, enabling detection of any viruses without prior target selection. However, a major challenge in utilising metagenomic next-generation sequencing for virome investigation is that data analysis can be highly complex, involving numerous data processing steps. RESULTS Here, we present Entourage to address this challenge. Entourage enables short-read sequence assembly, viral sequence search with or without reference virus targets using contig-based approaches, and intrasample sequence variation quantification. Several workflows are implemented in Entourage to facilitate end-to-end virus sequence detection analysis through a single command line, from read cleaning, sequence assembly, to virus sequence searching. The results generated are comprehensive, allowing for thorough quality control, reliability assessment, and interpretation. We illustrate Entourage's utility as a streamlined workflow for virus detection by employing it to comprehensively search for target virus sequences and beyond in raw sequence read data generated from HeLa cell culture samples spiked with viruses. Furthermore, we showcase its flexibility and performance on a real-world dataset by analysing a preassembled Tara Oceans dataset. Overall, our results show that Entourage performs well even with low virus sequencing depth in single digits, and it can be used to discover novel viruses effectively. Additionally, by using sequence data generated from a patient with chronic SARS-CoV-2 infection, we demonstrate Entourage's capability to quantify virus intrasample genetic variations, and generate publication-quality figures illustrating the results. 
CONCLUSIONS Entourage is an all-in-one, versatile, and streamlined bioinformatics software for virome investigation, developed with a focus on ease of use. Entourage is available at https://codeberg.org/CENMIG/Entourage under the MIT license.
Affiliation(s)
- Worakorn Phumiphanjarphak
- Department of Microbiology, Faculty of Science, Mahidol University, Ratchathewi District, 272 Rama VI Road, Bangkok, 10400, Thailand
- Pornchai Matangkasombut Center for Microbial Genomics, Department of Microbiology, Faculty of Science, Mahidol University, Bangkok, Thailand
- Pakorn Aiewsakun
- Department of Microbiology, Faculty of Science, Mahidol University, Ratchathewi District, 272 Rama VI Road, Bangkok, 10400, Thailand
- Pornchai Matangkasombut Center for Microbial Genomics, Department of Microbiology, Faculty of Science, Mahidol University, Bangkok, Thailand
3
Kadlečková D, Saláková M, Erban T, Tachezy R. Discovery and characterization of novel DNA viruses in Apis mellifera: expanding the honey bee virome through metagenomic analysis. mSystems 2024; 9:e0008824. [PMID: 38441971 PMCID: PMC11019937 DOI: 10.1128/msystems.00088-24] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/19/2024] [Accepted: 02/09/2024] [Indexed: 03/07/2024] Open
Abstract
To date, many viruses have been discovered to infect honey bees. In this study, we used high-throughput sequencing to expand the known virome of the honey bee, Apis mellifera, by identifying several novel DNA viruses. While the majority of previously identified bee viruses are RNA viruses, our study reveals nine new genomes from the Parvoviridae family, tentatively named Bee densoviruses 1 to 9. In addition, we characterized a large DNA virus, Apis mellifera filamentous-like virus (AmFLV), which shares limited protein identities with the known Apis mellifera filamentous virus. The complete sequence of AmFLV, obtained by a combination of laboratory techniques and bioinformatics, spans 152,678 bp. Its linear dsDNA genome encodes 112 proteins, of which 49 are annotated. Another large virus we discovered is Apis mellifera nudivirus, which belongs to a group of Alphanudivirus. Its circular dsDNA genome is 129,467 bp long and has 106 protein-encoding genes. The virus contains most of the core genes of the family Nudiviridae. This research demonstrates the effectiveness of viral binning in identifying viruses in honey bee virology, showcasing its initial application in this field. IMPORTANCE Honey bees contribute significantly to food security by providing pollination services. Understanding the virome of honey bees is crucial for the health and conservation of bee populations and also for the stability of the ecosystems and economies for which they are indispensable. This study unveils previously unknown DNA viruses in the honey bee virome, expanding our knowledge of potential threats to bee health. The viral binning approach we employed in this study offers a promising method for uncovering and understanding the vast viral diversity in these essential pollinators.
Affiliation(s)
- Dominika Kadlečková
- Department of Genetics and Microbiology, Faculty of Science BIOCEV, Charles University, Vestec, Průmyslová, Czechia
- Martina Saláková
- Department of Genetics and Microbiology, Faculty of Science BIOCEV, Charles University, Vestec, Průmyslová, Czechia
- Tomáš Erban
- Crop Research Institute, Drnovská, Prague, Czechia
- Ruth Tachezy
- Department of Genetics and Microbiology, Faculty of Science BIOCEV, Charles University, Vestec, Průmyslová, Czechia
4
Zolfo M, Silverj A, Blanco-Míguez A, Manghi P, Rota-Stabelli O, Heidrich V, Jensen J, Maharjan S, Franzosa E, Menni C, Visconti A, Pinto F, Ciciani M, Huttenhower C, Cereseto A, Asnicar F, Kitano H, Yamada T, Segata N. Discovering and exploring the hidden diversity of human gut viruses using highly enriched virome samples. bioRxiv 2024:2024.02.19.580813. [PMID: 38464031 PMCID: PMC10925137 DOI: 10.1101/2024.02.19.580813] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 03/12/2024]
Abstract
Viruses are an abundant and crucial component of the human microbiome, but accurately discovering them via metagenomics is still challenging. Currently, the available viral reference genomes poorly represent the diversity in microbiome samples, and expanding such a set of viral references is difficult. As a result, many viruses are still undetectable through metagenomics even when considering the power of de novo metagenomic assembly and binning, as viruses lack universal markers. Here, we describe a novel approach to catalog new viral members of the human gut microbiome and show how the resulting resource improves metagenomic analyses. We retrieved >3,000 viral-like particles (VLP) enriched metagenomic samples (viromes), evaluated the efficiency of the enrichment in each sample to leverage the viromes of highest purity, and applied multiple analysis steps involving assembly and comparison with hundreds of thousands of metagenome-assembled genomes to discover new viral genomes. We reported over 162,000 viral sequences passing quality control from thousands of gut metagenomes and viromes. The great majority of the retrieved viral sequences (~94.4%) were of unknown origin, most had a CRISPR spacer matching host bacteria, and four of them could be detected in >50% of a set of 18,756 gut metagenomes we surveyed. We included the obtained collection of sequences in a new MetaPhlAn 4.1 release, which can quantify reads within a metagenome matching the known and newly uncovered viral diversity. Additionally, we released the viral database for further virome and metagenomic studies of the human microbiome.
Affiliation(s)
- Moreno Zolfo
- Department CIBIO, University of Trento, Italy
- Integrated Open Systems Unit, Okinawa Institute of Science and Technology (OIST), Okinawa, Japan
- Andrea Silverj
- Department CIBIO, University of Trento, Italy
- Center Agriculture Food Environment (C3A), University of Trento, Italy
- Fondazione Edmund Mach, San Michele all’Adige, Trento, Italy
- Omar Rota-Stabelli
- Department CIBIO, University of Trento, Italy
- Center Agriculture Food Environment (C3A), University of Trento, Italy
- Fondazione Edmund Mach, San Michele all’Adige, Trento, Italy
- Jordan Jensen
- Harvard Chan Microbiome in Public Health Center, Harvard T.H. Chan School of Public Health, Boston, MA, USA
- Sagun Maharjan
- Harvard Chan Microbiome in Public Health Center, Harvard T.H. Chan School of Public Health, Boston, MA, USA
- Eric Franzosa
- Harvard Chan Microbiome in Public Health Center, Harvard T.H. Chan School of Public Health, Boston, MA, USA
- Cristina Menni
- Department of Twin Research & Genetic Epidemiology, King’s College London, London, UK
- Alessia Visconti
- Center for Biostatistics, Epidemiology and Public Health, Department of Clinical and Biological Sciences, University of Turin, Turin, Italy
- Curtis Huttenhower
- Harvard Chan Microbiome in Public Health Center, Harvard T.H. Chan School of Public Health, Boston, MA, USA
- Hiroaki Kitano
- Integrated Open Systems Unit, Okinawa Institute of Science and Technology (OIST), Okinawa, Japan
- The Systems Biology Institute (SBI), Tokyo, Japan
- IOM Bioworks Pvt. Ltd., Centre for Cellular and Molecular Platforms (C-CAMP), GKVK Post, Bellary Rd, Bengaluru, Karnataka-560065, India
- Takuji Yamada
- Integrated Open Systems Unit, Okinawa Institute of Science and Technology (OIST), Okinawa, Japan
- School of Life Science and Technology, Tokyo Institute of Technology, Tokyo, Japan
- Metagen, Inc., Yamagata, Japan
- Metagen Therapeutics, Inc., Yamagata, Japan
- digzyme, Inc., Tokyo, Japan
- Nicola Segata
- Department CIBIO, University of Trento, Italy
- Department of Experimental Oncology, IEO European Institute of Oncology IRCCS, Milan, Italy
5
Yin H, Wu S, Tan J, Guo Q, Li M, Guo J, Wang Y, Jiang X, Zhu H. IPEV: identification of prokaryotic and eukaryotic virus-derived sequences in virome using deep learning. Gigascience 2024; 13:giae018. [PMID: 38649300 PMCID: PMC11034026 DOI: 10.1093/gigascience/giae018] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/23/2023] [Revised: 03/14/2024] [Accepted: 03/25/2024] [Indexed: 04/25/2024] Open
Abstract
BACKGROUND The virome obtained through virus-like particle enrichment contains a mixture of prokaryotic and eukaryotic virus-derived fragments. Accurate identification and classification of these elements are crucial to understanding their roles and functions in microbial communities. However, the rapid mutation rates of viral genomes pose challenges in developing high-performance tools for classification, potentially limiting downstream analyses. FINDINGS We present IPEV, a novel method to distinguish prokaryotic and eukaryotic viruses in viromes, with a 2-dimensional convolutional neural network combining trinucleotide pair relative distance and frequency. Cross-validation assessments of IPEV demonstrate its state-of-the-art precision, significantly improving the F1-score by approximately 22% on an independent test set compared to existing methods when query viruses share less than 30% sequence similarity with known viruses. Furthermore, IPEV outperforms other methods in accuracy on marine and gut virome samples based on annotations by sequence alignments. IPEV reduces runtime by at most 1,225 times compared to existing methods under the same computing configuration. We also utilized IPEV to analyze longitudinal samples and found that the gut virome exhibits a higher degree of temporal stability than previously observed in persistent personal viromes, providing novel insights into the resilience of the gut virome in individuals. CONCLUSIONS IPEV is a high-performance, user-friendly tool that assists biologists in identifying and classifying prokaryotic and eukaryotic viruses within viromes. The tool is available at https://github.com/basehc/IPEV.
Affiliation(s)
- Hengchuang Yin
- Department of Biomedical Engineering, College of Future Technology, and Center for Quantitative Biology, Peking University, Beijing 100871, China
- Shufang Wu
- Department of Biomedical Engineering, College of Future Technology, and Center for Quantitative Biology, Peking University, Beijing 100871, China
- Jie Tan
- Department of Biomedical Engineering, College of Future Technology, and Center for Quantitative Biology, Peking University, Beijing 100871, China
- Qian Guo
- Department of Biomedical Engineering, College of Future Technology, and Center for Quantitative Biology, Peking University, Beijing 100871, China
- Mo Li
- Department of Biomedical Engineering, College of Future Technology, and Center for Quantitative Biology, Peking University, Beijing 100871, China
- School of Life Sciences, Peking University, Beijing 100871, China
- Jinyuan Guo
- Department of Biomedical Engineering, College of Future Technology, and Center for Quantitative Biology, Peking University, Beijing 100871, China
- Department of Biomedical Engineering, Georgia Institute of Technology and Emory University, Atlanta, GA 30332, USA
- Yaqi Wang
- Department of Biomedical Engineering, College of Future Technology, and Center for Quantitative Biology, Peking University, Beijing 100871, China
- Xiaoqing Jiang
- Department of Biomedical Engineering, College of Future Technology, and Center for Quantitative Biology, Peking University, Beijing 100871, China
- Beijing Institute of Genomics, Chinese Academy of Sciences, and China National Center for Bioinformation, Beijing 100101, China
- Huaiqiu Zhu
- Department of Biomedical Engineering, College of Future Technology, and Center for Quantitative Biology, Peking University, Beijing 100871, China
- School of Life Sciences, Peking University, Beijing 100871, China
- Department of Biomedical Engineering, Georgia Institute of Technology and Emory University, Atlanta, GA 30332, USA
6
Hsieh SY, Savva GM, Telatin A, Tiwari SK, Tariq MA, Newberry F, Seton KA, Booth C, Bansal AS, Wileman T, Adriaenssens EM, Carding SR. Investigating the Human Intestinal DNA Virome and Predicting Disease-Associated Virus-Host Interactions in Severe Myalgic Encephalomyelitis/Chronic Fatigue Syndrome (ME/CFS). Int J Mol Sci 2023; 24:17267. [PMID: 38139096 PMCID: PMC10744171 DOI: 10.3390/ijms242417267] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/15/2023] [Revised: 11/27/2023] [Accepted: 11/30/2023] [Indexed: 12/24/2023] Open
Abstract
Understanding how the human virome, and which of its constituents, contributes to health or disease states is reliant on obtaining comprehensive virome profiles. By combining DNA viromes from isolated virus-like particles (VLPs) and whole metagenomes from the same faecal sample of a small cohort of healthy individuals and patients with severe myalgic encephalomyelitis/chronic fatigue syndrome (ME/CFS), we have obtained a more inclusive profile of the human intestinal DNA virome. Key features are the identification of a core virome comprising tailed phages of the class Caudoviricetes, and a greater diversity of DNA viruses including extracellular phages and integrated prophages. Using an in silico approach, we predicted interactions between members of the Anaerotruncus genus and unique viruses present in ME/CFS microbiomes. This study therefore provides a framework and rationale for studies of larger cohorts of patients to further investigate disease-associated interactions between the intestinal virome and the bacteriome.
Affiliation(s)
- Shen-Yuan Hsieh
- Food, Microbiome, and Health Research Programme, Quadram Institute Bioscience, Norwich Research Park, Norwich NR4 7UQ, UK; (S.-Y.H.); (A.T.); (S.K.T.); (M.A.T.); (F.N.); (K.A.S.); (T.W.)
- George M. Savva
- Core Science Resources, Quadram Institute Bioscience, Norwich NR4 7UQ, UK; (G.M.S.); (C.B.)
- Andrea Telatin
- Food, Microbiome, and Health Research Programme, Quadram Institute Bioscience, Norwich Research Park, Norwich NR4 7UQ, UK; (S.-Y.H.); (A.T.); (S.K.T.); (M.A.T.); (F.N.); (K.A.S.); (T.W.)
- Sumeet K. Tiwari
- Food, Microbiome, and Health Research Programme, Quadram Institute Bioscience, Norwich Research Park, Norwich NR4 7UQ, UK; (S.-Y.H.); (A.T.); (S.K.T.); (M.A.T.); (F.N.); (K.A.S.); (T.W.)
- Mohammad A. Tariq
- Food, Microbiome, and Health Research Programme, Quadram Institute Bioscience, Norwich Research Park, Norwich NR4 7UQ, UK; (S.-Y.H.); (A.T.); (S.K.T.); (M.A.T.); (F.N.); (K.A.S.); (T.W.)
- Fiona Newberry
- Food, Microbiome, and Health Research Programme, Quadram Institute Bioscience, Norwich Research Park, Norwich NR4 7UQ, UK; (S.-Y.H.); (A.T.); (S.K.T.); (M.A.T.); (F.N.); (K.A.S.); (T.W.)
- Katharine A. Seton
- Food, Microbiome, and Health Research Programme, Quadram Institute Bioscience, Norwich Research Park, Norwich NR4 7UQ, UK; (S.-Y.H.); (A.T.); (S.K.T.); (M.A.T.); (F.N.); (K.A.S.); (T.W.)
- Catherine Booth
- Core Science Resources, Quadram Institute Bioscience, Norwich NR4 7UQ, UK; (G.M.S.); (C.B.)
- Thomas Wileman
- Food, Microbiome, and Health Research Programme, Quadram Institute Bioscience, Norwich Research Park, Norwich NR4 7UQ, UK; (S.-Y.H.); (A.T.); (S.K.T.); (M.A.T.); (F.N.); (K.A.S.); (T.W.)
- Norwich Medical School, University of East Anglia, Norwich NR4 7TJ, UK
- Evelien M. Adriaenssens
- Food, Microbiome, and Health Research Programme, Quadram Institute Bioscience, Norwich Research Park, Norwich NR4 7UQ, UK; (S.-Y.H.); (A.T.); (S.K.T.); (M.A.T.); (F.N.); (K.A.S.); (T.W.)
- Simon R. Carding
- Food, Microbiome, and Health Research Programme, Quadram Institute Bioscience, Norwich Research Park, Norwich NR4 7UQ, UK; (S.-Y.H.); (A.T.); (S.K.T.); (M.A.T.); (F.N.); (K.A.S.); (T.W.)
- Norwich Medical School, University of East Anglia, Norwich NR4 7TJ, UK