1
Dantas CWD, Martins DT, Nogueira WG, Alegria OVC, Ramos RTJ. Tools and methodology to in silico phage discovery in freshwater environments. Front Microbiol 2024; 15:1390726. [PMID: 38881659 PMCID: PMC11176557 DOI: 10.3389/fmicb.2024.1390726] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/23/2024] [Accepted: 05/16/2024] [Indexed: 06/18/2024]
Abstract
Freshwater availability is essential, and maintaining it has become an enormous challenge. Due to population growth and climate change, freshwater sources are becoming scarce, creating a need for strategies for its reuse. Currently, the constant discharge of waste into water bodies from human activities leads to the dissemination of pathogenic bacteria, negatively impacting water quality from the source to the infrastructure required for treatment, for example through the accumulation of biofilms. Current water treatment methods cannot keep pace with bacterial evolution, as bacteria increasingly exhibit multidrug resistance to antibiotics. Furthermore, using more powerful disinfectants may disturb the balance of aquatic ecosystems. Therefore, there is a need to explore sustainable ways to control the spread of pathogenic bacteria. Bacteriophages can infect bacteria and archaea, hijacking their host machinery to favor their own replication. They are abundant worldwide and provide a biological alternative to treating bacteria with antibiotics. In contrast to common disinfectants and antibiotics, bacteriophages are highly specific, minimizing adverse effects on aquatic microbial communities, and offer a lower cost-benefit ratio in production compared to antibiotics. However, because environmental bacteriophages are difficult to cultivate and identify, alternative approaches combining NGS metagenomics with bioinformatic tools can help identify new bacteriophages that may be useful as an alternative treatment against resistant bacteria. In this review, we discuss advances in exploring the freshwater virome, current applications of bacteriophages in freshwater treatment, current challenges, and future perspectives.
Affiliation(s)
- Carlos Willian Dias Dantas
  - Department of Biochemistry and Immunology, Institute of Biological Sciences, Federal University of Minas Gerais, Belo Horizonte, Minas Gerais, Brazil
  - Laboratory of Simulation and Computational Biology - SIMBIC, High Performance Computing Center - CCAD, Federal University of Pará, Belém, Pará, Brazil
  - Laboratory of Bioinformatics and Genomics of Microorganisms, Institute of Biological Sciences, Federal University of Pará, Belém, Pará, Brazil
- David Tavares Martins
  - Laboratory of Simulation and Computational Biology - SIMBIC, High Performance Computing Center - CCAD, Federal University of Pará, Belém, Pará, Brazil
  - Laboratory of Bioinformatics and Genomics of Microorganisms, Institute of Biological Sciences, Federal University of Pará, Belém, Pará, Brazil
- Wylerson Guimarães Nogueira
  - Department of Biochemistry and Immunology, Institute of Biological Sciences, Federal University of Minas Gerais, Belo Horizonte, Minas Gerais, Brazil
- Oscar Victor Cardenas Alegria
  - Laboratory of Simulation and Computational Biology - SIMBIC, High Performance Computing Center - CCAD, Federal University of Pará, Belém, Pará, Brazil
  - Laboratory of Bioinformatics and Genomics of Microorganisms, Institute of Biological Sciences, Federal University of Pará, Belém, Pará, Brazil
- Rommel Thiago Jucá Ramos
  - Laboratory of Simulation and Computational Biology - SIMBIC, High Performance Computing Center - CCAD, Federal University of Pará, Belém, Pará, Brazil
  - Laboratory of Bioinformatics and Genomics of Microorganisms, Institute of Biological Sciences, Federal University of Pará, Belém, Pará, Brazil
2
Tian Q, Zhang P, Zhai Y, Wang Y, Zou Q. Application and Comparison of Machine Learning and Database-Based Methods in Taxonomic Classification of High-Throughput Sequencing Data. Genome Biol Evol 2024; 16:evae102. [PMID: 38748485 PMCID: PMC11135637 DOI: 10.1093/gbe/evae102] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Accepted: 05/12/2024] [Indexed: 05/30/2024]
Abstract
The advent of high-throughput sequencing technologies has not only revolutionized bioinformatics but has also heightened the demand for efficient taxonomic classification. Despite technological advances, efficiently processing and analyzing the deluge of sequencing data for precise taxonomic classification remains a formidable challenge. Existing classification approaches fall primarily into two categories, database-based methods and machine learning methods, each with its own challenges and advantages. Accordingly, the aim of our study was to compare these two categories while also investigating the merits of integrating multiple database-based methods. Using simulated data sets, we conducted an in-depth comparison of the performance of both methodological categories in taxonomic classification. Our analysis revealed that database-based methods excel in classification accuracy when backed by a rich and comprehensive reference database. Conversely, machine learning methods perform better in scenarios where reference sequences are sparse or lacking, but under most other conditions they generally perform worse than database-based methods. Moreover, our study confirms that integrating multiple database-based methods does enhance classification accuracy. These findings shed new light on the taxonomic classification of high-throughput sequencing data and have substantial implications for the future development of computational biology. The source code of this study is publicly available at https://github.com/LoadStar822/Genome-Classifier-Performance-Evaluator. Additionally, a dedicated webpage showcasing our collected database, data sets, and various classification software can be found at http://lab.malab.cn/~tqz/project/taxonomic/.
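The abstract reports that integrating several database-based classifiers improves accuracy, but it does not say how their outputs are combined. As a minimal sketch, assuming a simple per-read majority vote (chosen here for illustration, not necessarily the authors' integration method), the combination step could look like this. The classifier names and taxon labels are hypothetical.

```python
from collections import Counter
from typing import Dict, Optional

# Per-read calls: classifier name -> taxon label, or None if unclassified.
Calls = Dict[str, Optional[str]]


def majority_vote(calls: Calls, min_votes: int = 2) -> Optional[str]:
    """Return the most frequent label if at least `min_votes` classifiers agree.

    Unclassified calls (None) do not vote. Ties are broken by the order in
    which labels first appear, because Counter.most_common keeps that order.
    """
    labels = [label for label in calls.values() if label is not None]
    if not labels:
        return None
    label, votes = Counter(labels).most_common(1)[0]
    return label if votes >= min_votes else None


def combine(per_read: Dict[str, Calls], min_votes: int = 2) -> Dict[str, Optional[str]]:
    """Apply the vote to every read ID."""
    return {read_id: majority_vote(calls, min_votes) for read_id, calls in per_read.items()}


if __name__ == "__main__":
    # Hypothetical tool names and labels, for illustration only.
    example = {
        "read_1": {"tool_a": "Escherichia", "tool_b": "Escherichia", "tool_c": "Shigella"},
        "read_2": {"tool_a": "Bacteroides", "tool_b": None, "tool_c": None},
    }
    print(combine(example))  # read_2 has only one vote, so it stays unclassified
```

Requiring at least two agreeing tools trades some sensitivity for precision; setting `min_votes=1` would keep every read that any single tool classified.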
Affiliation(s)
- Qinzhong Tian
  - Institute of Fundamental and Frontier Sciences, University of Electronic Science and Technology of China, Chengdu, China
  - Yangtze Delta Region Institute (Quzhou), University of Electronic Science and Technology of China, Quzhou 324003, China
- Pinglu Zhang
  - Institute of Fundamental and Frontier Sciences, University of Electronic Science and Technology of China, Chengdu, China
  - Yangtze Delta Region Institute (Quzhou), University of Electronic Science and Technology of China, Quzhou 324003, China
- Yixiao Zhai
  - Institute of Fundamental and Frontier Sciences, University of Electronic Science and Technology of China, Chengdu, China
  - Yangtze Delta Region Institute (Quzhou), University of Electronic Science and Technology of China, Quzhou 324003, China
- Yansu Wang
  - Institute of Fundamental and Frontier Sciences, University of Electronic Science and Technology of China, Chengdu, China
  - Yangtze Delta Region Institute (Quzhou), University of Electronic Science and Technology of China, Quzhou 324003, China
- Quan Zou
  - Institute of Fundamental and Frontier Sciences, University of Electronic Science and Technology of China, Chengdu, China
  - Yangtze Delta Region Institute (Quzhou), University of Electronic Science and Technology of China, Quzhou 324003, China
3
Wu Y, Gao N, Sun C, Feng T, Liu Q, Chen WH. A compendium of ruminant gastrointestinal phage genomes revealed a higher proportion of lytic phages than in any other environments. Microbiome 2024; 12:69. [PMID: 38576042 PMCID: PMC10993611 DOI: 10.1186/s40168-024-01784-2] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/20/2023] [Accepted: 02/29/2024] [Indexed: 04/06/2024]
Abstract
BACKGROUND Ruminants are important livestock animals with a unique digestive system comprising multiple stomach compartments. Despite significant progress in studying the microbiome at gastrointestinal tract (GIT) sites of ruminants, we still lack an understanding of their viral community. Here, we surveyed the viral ecology of the ruminant GIT using 2333 samples from 10 sites along the GIT of 8 ruminant species. RESULTS We present the Unified Ruminant Phage Catalogue (URPC), a comprehensive survey of phages in the GITs of ruminants comprising 64,922 non-redundant phage genomes. We characterized the distributions of the phage genomes across ruminants and GIT sites and found that most phages were organism-specific. We revealed that ~60% of the ruminant phages were lytic, the highest proportion compared with all other environments, which will certainly facilitate their application in microbial interventions. To further facilitate future applications of the phages, we also constructed a comprehensive virus-bacteria/archaea interaction network and identified dozens of phages that may have lytic effects on methanogenic archaea. CONCLUSIONS The URPC dataset represents a useful resource for future microbial interventions to improve ruminant production and environmental quality. Phages have great potential for controlling pathogenic bacterial/archaeal species and reducing methane emissions. Our findings provide insights into the virome ecology of the ruminant GIT and offer a starting point for future research on phage therapy in ruminants. Video Abstract.
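The lytic-proportion estimate and the search for phages that may lyse methanogens both rest on two kinds of per-phage predictions: a lifestyle label and a set of predicted hosts. As a minimal sketch, assuming those predictions have already been produced by upstream tools (not shown) and using hypothetical phage IDs, the network-based selection could look like this.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Set, Tuple


def build_network(pairs: Iterable[Tuple[str, str]]) -> Dict[str, Set[str]]:
    """Turn predicted (phage, host) pairs into a phage -> set-of-hosts map."""
    network: Dict[str, Set[str]] = defaultdict(set)
    for phage, host in pairs:
        network[phage].add(host)
    return dict(network)


def lytic_fraction(lifestyle: Dict[str, str]) -> float:
    """Share of phages whose predicted lifestyle is 'lytic'."""
    if not lifestyle:
        return 0.0
    return sum(1 for label in lifestyle.values() if label == "lytic") / len(lifestyle)


def lytic_phages_targeting(
    network: Dict[str, Set[str]], lifestyle: Dict[str, str], targets: Set[str]
) -> List[str]:
    """Sorted IDs of lytic phages linked to at least one host in `targets`."""
    return sorted(
        phage
        for phage, hosts in network.items()
        if lifestyle.get(phage) == "lytic" and hosts & targets
    )


if __name__ == "__main__":
    # Hypothetical phage IDs; host genera are used only as examples.
    pairs = [
        ("phage_1", "Methanobrevibacter"),
        ("phage_1", "Prevotella"),
        ("phage_2", "Prevotella"),
        ("phage_3", "Methanosphaera"),
    ]
    lifestyle = {"phage_1": "lytic", "phage_2": "lytic", "phage_3": "temperate"}
    net = build_network(pairs)
    methanogens = {"Methanobrevibacter", "Methanosphaera"}
    print(lytic_fraction(lifestyle))
    print(lytic_phages_targeting(net, lifestyle, methanogens))  # ['phage_1']
```

Storing hosts as sets makes the "targets a methanogen" test a single set intersection per phage.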
Affiliation(s)
- Yingjian Wu
  - Key Laboratory of Molecular Biophysics of the Ministry of Education, Hubei Key Laboratory of Bioinformatics and Molecular-Imaging, Center for Artificial Biology, Department of Bioinformatics and Systems Biology, College of Life Science and Technology, Huazhong University of Science and Technology, Wuhan 430074, Hubei, China
- Na Gao
  - Department of Laboratory Medicine, Zhongnan Hospital of Wuhan University, Wuhan University, Wuhan, 430071, China
- Chuqing Sun
  - Key Laboratory of Molecular Biophysics of the Ministry of Education, Hubei Key Laboratory of Bioinformatics and Molecular-Imaging, Center for Artificial Biology, Department of Bioinformatics and Systems Biology, College of Life Science and Technology, Huazhong University of Science and Technology, Wuhan 430074, Hubei, China
- Tong Feng
  - Key Laboratory of Molecular Biophysics of the Ministry of Education, Hubei Key Laboratory of Bioinformatics and Molecular-Imaging, Center for Artificial Biology, Department of Bioinformatics and Systems Biology, College of Life Science and Technology, Huazhong University of Science and Technology, Wuhan 430074, Hubei, China
- Qingyou Liu
  - Guangdong Provincial Key Laboratory of Animal Molecular Design and Precise Breeding, School of Life Science and Engineering, Foshan University, Foshan, 528225, China
  - State Key Laboratory for Conservation and Utilization of Subtropical Agro-Bioresources, Guangxi University, Nanning, 530005, China
- Wei-Hua Chen
  - Key Laboratory of Molecular Biophysics of the Ministry of Education, Hubei Key Laboratory of Bioinformatics and Molecular-Imaging, Center for Artificial Biology, Department of Bioinformatics and Systems Biology, College of Life Science and Technology, Huazhong University of Science and Technology, Wuhan 430074, Hubei, China
  - Institution of Medical Artificial Intelligence, Binzhou Medical University, Yantai, 264003, China
4
Chen J, Sun C, Dong Y, Jin M, Lai S, Jia L, Zhao X, Wang H, Gao NL, Bork P, Liu Z, Chen W, Zhao X. Efficient Recovery of Complete Gut Viral Genomes by Combined Short- and Long-Read Sequencing. Advanced Science (Weinheim, Baden-Wurttemberg, Germany) 2024; 11:e2305818. [PMID: 38240578 PMCID: PMC10987132 DOI: 10.1002/advs.202305818] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/18/2023] [Revised: 12/01/2023] [Indexed: 04/04/2024]
Abstract
Current metagenome-assembled human gut phage catalogs contain mostly fragmented genomes. Here, a comprehensive gut virome detection procedure is developed, involving virus-like particle (VLP) enrichment from ≈500 g of feces and combined short- and long-read sequencing. Applied to 135 samples, it yields a Chinese Gut Virome Catalog (CHGV) of 21,499 non-redundant viral operational taxonomic units (vOTUs) that are significantly longer than those obtained by short-read sequencing and include ≈35% (7,675) complete genomes, a proportion ≈nine times that of the Gut Virome Database (GVD, ≈4%, 1,443). Interestingly, the majority (≈60%, 13,356) of the CHGV vOTUs are obtained by either long-read or hybrid assemblies, with little overlap with those assembled from short-read data alone. With this dataset, the vast diversity of the gut virome is elucidated, including 32% (6,962) novel vOTUs compared with public gut virome databases, dozens of phages that are more prevalent than the crAssphages and/or Gubaphages, and several viral clades that are more diverse than these two. Finally, the functional capacities of CHGV-encoded proteins are characterized, and a virus-host interaction network is constructed to facilitate future research and applications.
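The arithmetic below recomputes the CHGV proportions from the counts given in the abstract, which makes them easier to interpret. It shows that 7,675 of 21,499 vOTUs is ≈35.7%, that 13,356 is ≈62.1% and 6,962 is ≈32.4%, and that the "≈nine times" comparison holds for the proportions (≈35% vs ≈4%, a ratio of 8.75) rather than for the absolute numbers of complete genomes (7,675 vs 1,443, a ratio of ≈5.3).

```python
def percent(count: int, total: int) -> float:
    """Percentage of `total` represented by `count`."""
    return 100 * count / total


TOTAL_VOTUS = 21_499  # non-redundant vOTUs in the CHGV, as reported

reported = {
    "complete genomes": 7_675,
    "long-read or hybrid assemblies": 13_356,
    "novel vs. public databases": 6_962,
}

if __name__ == "__main__":
    for label, count in reported.items():
        print(f"{label}: {percent(count, TOTAL_VOTUS):.1f}%")
    # Ratio of the reported proportions of complete genomes (≈35% vs ≈4%).
    print(f"ratio of proportions: {35 / 4:.2f}")  # 8.75
    # Ratio of the absolute complete-genome counts (7,675 vs 1,443).
    print(f"ratio of counts: {7_675 / 1_443:.2f}")  # 5.32
```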
Affiliation(s)
- Jingchao Chen
  - Key Laboratory of Molecular Biophysics of the Ministry of Education, Hubei Key Laboratory of Bioinformatics and Molecular Imaging, Center for Artificial Intelligence Biology, Department of Bioinformatics and Systems Biology, College of Life Science and Technology, Huazhong University of Science and Technology, Wuhan, Hubei 430074, China
- Chuqing Sun
  - Key Laboratory of Molecular Biophysics of the Ministry of Education, Hubei Key Laboratory of Bioinformatics and Molecular Imaging, Center for Artificial Intelligence Biology, Department of Bioinformatics and Systems Biology, College of Life Science and Technology, Huazhong University of Science and Technology, Wuhan, Hubei 430074, China
- Yanqi Dong
  - Department of Neurology, Zhongshan Hospital and Institute of Science and Technology for Brain-Inspired Intelligence, Fudan University, Shanghai 200433, China
- Menglu Jin
  - Key Laboratory of Molecular Biophysics of the Ministry of Education, Hubei Key Laboratory of Bioinformatics and Molecular Imaging, Center for Artificial Intelligence Biology, Department of Bioinformatics and Systems Biology, College of Life Science and Technology, Huazhong University of Science and Technology, Wuhan, Hubei 430074, China
  - College of Life Science, Henan Normal University, Xinxiang, Henan 453007, China
- Senying Lai
  - Department of Neurology, Zhongshan Hospital and Institute of Science and Technology for Brain-Inspired Intelligence, Fudan University, Shanghai 200433, China
- Longhao Jia
  - Department of Neurology, Zhongshan Hospital and Institute of Science and Technology for Brain-Inspired Intelligence, Fudan University, Shanghai 200433, China
- Xueyang Zhao
  - College of Life Science, Henan Normal University, Xinxiang, Henan 453007, China
- Huarui Wang
  - Key Laboratory of Molecular Biophysics of the Ministry of Education, Hubei Key Laboratory of Bioinformatics and Molecular Imaging, Center for Artificial Intelligence Biology, Department of Bioinformatics and Systems Biology, College of Life Science and Technology, Huazhong University of Science and Technology, Wuhan, Hubei 430074, China
- Na L. Gao
  - Key Laboratory of Molecular Biophysics of the Ministry of Education, Hubei Key Laboratory of Bioinformatics and Molecular Imaging, Center for Artificial Intelligence Biology, Department of Bioinformatics and Systems Biology, College of Life Science and Technology, Huazhong University of Science and Technology, Wuhan, Hubei 430074, China
  - Department of Laboratory Medicine, Zhongnan Hospital of Wuhan University, Wuhan University, Wuhan 430071, China
- Peer Bork
  - European Molecular Biology Laboratory, Structural and Computational Biology Unit, 69117 Heidelberg, Germany
  - Max Delbrück Centre for Molecular Medicine, 13125 Berlin, Germany
  - Yonsei Frontier Lab (YFL), Yonsei University, 03722 Seoul, South Korea
  - Department of Bioinformatics, Biocenter, University of Würzburg, 97070 Würzburg, Germany
- Zhi Liu
  - Department of Biotechnology, College of Life Science and Technology, Huazhong University of Science and Technology, 430074 Wuhan, China
- Wei-Hua Chen
  - Key Laboratory of Molecular Biophysics of the Ministry of Education, Hubei Key Laboratory of Bioinformatics and Molecular Imaging, Center for Artificial Intelligence Biology, Department of Bioinformatics and Systems Biology, College of Life Science and Technology, Huazhong University of Science and Technology, Wuhan, Hubei 430074, China
  - College of Life Science, Henan Normal University, Xinxiang, Henan 453007, China
  - Institution of Medical Artificial Intelligence, Binzhou Medical University, Yantai 264003, China
- Xing-Ming Zhao
  - Department of Neurology, Zhongshan Hospital and Institute of Science and Technology for Brain-Inspired Intelligence, Fudan University, Shanghai 200433, China
  - MOE Key Laboratory of Computational Neuroscience and Brain-Inspired Intelligence, and MOE Frontiers Center for Brain Science, Fudan University, Shanghai 200433, China
  - State Key Laboratory of Medical Neurobiology, Institute of Brain Science, Fudan University, Shanghai 200433, China
  - International Human Phenome Institutes (Shanghai), Shanghai 200433, China
5
Parra B, Cockx B, Lutz VT, Brøndsted L, Smets BF, Dechesne A. Isolation and characterization of novel plasmid-dependent phages infecting bacteria carrying diverse conjugative plasmids. Microbiol Spectr 2024; 12:e0253723. [PMID: 38063386 PMCID: PMC10782986 DOI: 10.1128/spectrum.02537-23] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/19/2023] [Accepted: 11/12/2023] [Indexed: 12/23/2023]
Abstract
IMPORTANCE This work was undertaken because plasmid-dependent phages can reduce the prevalence of conjugative plasmids and can be leveraged to prevent the acquisition and dissemination of antibiotic resistance genes (ARGs) by bacteria. The two novel phages described in this study, Lu221 and Hi226, can infect Escherichia coli, Salmonella enterica, Kluyvera sp., and Enterobacter sp. carrying conjugative plasmids. This was verified with plasmids that carry resistance determinants and belong to the most common plasmid families among Gram-negative pathogens. Therefore, the newly isolated phages could help control the spread of ARGs and thus help combat the antimicrobial resistance crisis.
Affiliation(s)
- Boris Parra
  - Department of Environmental Engineering and Resource Engineering, Technical University of Denmark, Kongens Lyngby, Denmark
  - Laboratorio de Investigación de Agentes Antibacterianos, Departamento de Microbiología, Facultad de Ciencias Biológicas, Universidad de Concepción, Concepción, Chile
  - Instituto de Ciencias Naturales, Facultad de Medicina Veterinaria y Agronomía, Universidad de las Américas, Concepción, Chile
- Bastiaan Cockx
  - Department of Environmental Engineering and Resource Engineering, Technical University of Denmark, Kongens Lyngby, Denmark
- Veronika T. Lutz
  - Department of Veterinary and Animal Sciences, University of Copenhagen, København, Denmark
- Lone Brøndsted
  - Department of Veterinary and Animal Sciences, University of Copenhagen, København, Denmark
- Barth F. Smets
  - Department of Environmental Engineering and Resource Engineering, Technical University of Denmark, Kongens Lyngby, Denmark
- Arnaud Dechesne
  - Department of Environmental Engineering and Resource Engineering, Technical University of Denmark, Kongens Lyngby, Denmark
6
Ha AD, Aylward FO. Automated classification of giant virus genomes using a random forest model built on trademark protein families. bioRxiv: the preprint server for biology 2023:2023.11.10.566645. [PMID: 38014039 PMCID: PMC10680617 DOI: 10.1101/2023.11.10.566645] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/29/2023]
Abstract
Viruses of the phylum Nucleocytoviricota, often referred to as "giant viruses," are prevalent in various environments around the globe and play significant roles in shaping eukaryotic diversity and activities in global ecosystems. Given the extensive phylogenetic diversity within this viral group and the highly complex composition of their genomes, taxonomic classification of giant viruses, particularly of incomplete metagenome-assembled genomes (MAGs), can present a considerable challenge. Here we developed TIGTOG (Taxonomic Information of Giant viruses using Trademark Orthologous Groups), a machine learning-based approach to predict the taxonomic classification of novel giant virus MAGs based on profiles of protein family content. We applied a random forest algorithm to a training set of 1,531 quality-checked, phylogenetically diverse Nucleocytoviricota genomes using pre-selected sets of giant virus orthologous groups (GVOGs). The classification models predicted viral taxonomic assignments with a cross-validation accuracy of 99.6% at the order level and 97.3% at the family level. We found that no individual GVOG or genome feature significantly influenced the algorithm's performance or the models' predictions, indicating that classification predictions were based on a comprehensive genomic signature, which reduces the need for a fixed set of marker genes for taxonomic assignment. Our classification models were validated with an independent test set of 823 giant virus genomes of varied genomic completeness and taxonomy, achieving accuracies of 98.6% and 95.9% at the order and family levels, respectively. Our results indicate that protein family profiles can be used to accurately classify large DNA viruses at different taxonomic levels and provide a fast and accurate method for the classification of giant viruses. This approach could easily be adapted to other viral groups.
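A random forest needs every genome described by the same fixed set of columns. As a minimal sketch of how "profiles of protein family content" could be encoded, assuming per-protein orthologous-group assignments are already available (the search step is not shown), each genome becomes a vector of protein counts per group. The group IDs and genomes below are hypothetical placeholders, not the actual GVOG set, and this is not presented as TIGTOG's own feature code.

```python
from collections import Counter
from typing import Dict, Iterable, List, Sequence, Tuple

# (protein_id, orthologous_group_id) assignments for one genome, e.g. from an
# HMM search against a marker-family database (search step not shown).
Hits = Iterable[Tuple[str, str]]


def og_count_profile(hits: Hits, feature_order: Sequence[str]) -> List[int]:
    """Count proteins per orthologous group, in a fixed column order.

    Groups missing from `feature_order` are ignored, so every genome yields a
    vector of the same length even when it is an incomplete MAG.
    """
    counts = Counter(og for _, og in hits)
    return [counts[og] for og in feature_order]


def feature_matrix(
    genomes: Dict[str, Hits], feature_order: Sequence[str]
) -> Tuple[List[str], List[List[int]]]:
    """Return genome names (sorted) and their aligned count profiles."""
    names = sorted(genomes)
    rows = [og_count_profile(genomes[name], feature_order) for name in names]
    return names, rows


if __name__ == "__main__":
    # Hypothetical group IDs and genomes, for illustration only.
    features = ["GVOG_A", "GVOG_B", "GVOG_C"]
    genomes = {
        "mag_2": [("p1", "GVOG_B"), ("p2", "GVOG_B"), ("p3", "GVOG_Z")],
        "mag_1": [("q1", "GVOG_A")],
    }
    names, X = feature_matrix(genomes, features)
    print(names)  # ['mag_1', 'mag_2']
    print(X)      # [[1, 0, 0], [0, 2, 0]]
```

Rows in this shape are what a random forest implementation such as scikit-learn's `RandomForestClassifier` takes as `X` in `fit(X, y)`, with taxonomic labels as `y`. Presence/absence (0/1) is an equally plausible encoding; counts are used here only for illustration.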
7
Cao Y, Feng T, Wu Y, Xu Y, Du L, Wang T, Luo Y, Wang Y, Li Z, Xuan Z, Chen S, Yao N, Gao NL, Xiao Q, Huang K, Wang X, Cui K, Rehman SU, Tang X, Liu D, Han H, Li Y, Chen WH, Liu Q. The multi-kingdom microbiome of the goat gastrointestinal tract. Microbiome 2023; 11:219. [PMID: 37779211 PMCID: PMC10544373 DOI: 10.1186/s40168-023-01651-6] [Citation(s) in RCA: 8] [Impact Index Per Article: 8.0] [Received: 12/12/2022] [Accepted: 08/14/2023] [Indexed: 10/03/2023]
Abstract
BACKGROUND Goats are important livestock worldwide and play an indispensable role in human life by providing meat, milk, fiber, and pelts. Despite recent significant advances in microbiome studies, a comprehensive survey of goat microbiomes covering gastrointestinal tract (GIT) sites, developmental stages, feeding styles, and geographical factors is still unavailable. Here, we surveyed goat multi-kingdom microbial communities using 497 samples from ten sites along the goat GIT. RESULTS We reconstructed a goat multi-kingdom microbiome catalog (GMMC) including 4004 bacterial, 71 archaeal, and 7204 viral genomes and annotated over 4,817,256 non-redundant protein-coding genes. We revealed patterns of feeding-driven microbial community dynamics along the goat GIT sites, which were likely associated with gastrointestinal food digestion and absorption capabilities and disease risks, and identified an abundance of large intestine-enriched genera involved in plant fiber digestion. We quantified the effects of various factors, including GIT site, age, feeding style, and geography, on the distribution and abundance of methane-producing microbes, and identified 68 virulent viruses targeting the methane producers via a comprehensive virus-bacterium/archaea interaction network. CONCLUSIONS Together, our GMMC catalog provides functional insights into the goat GIT microbiota through microbiome-host interactions and paves the way for microbial interventions for better goat and eco-environmental quality. Video Abstract.
Affiliation(s)
- Yanhong Cao
  - Guangdong Provincial Key Laboratory of Animal Molecular Design and Precise Breeding, School of Life Science and Engineering, Foshan University, Foshan, 528225, China
  - Guangxi Vocational University of Agriculture, Nanning, Guangxi, 530007, China
- Tong Feng
  - Department of Bioinformatics and Systems Biology, Key Laboratory of Molecular Biophysics of the Ministry of Education, Hubei Key Laboratory of Bioinformatics and Molecular-imaging, Center for Artificial Biology, College of Life Science and Technology, Huazhong University of Science and Technology, Wuhan, 430074, Hubei, China
  - State Key Laboratory for Conservation and Utilization of Subtropical Agro-Bioresources, Guangxi University, Nanning, 530005, China
- Yingjian Wu
  - Department of Bioinformatics and Systems Biology, Key Laboratory of Molecular Biophysics of the Ministry of Education, Hubei Key Laboratory of Bioinformatics and Molecular-imaging, Center for Artificial Biology, College of Life Science and Technology, Huazhong University of Science and Technology, Wuhan, 430074, Hubei, China
- Yixue Xu
  - State Key Laboratory for Conservation and Utilization of Subtropical Agro-Bioresources, Guangxi University, Nanning, 530005, China
- Li Du
  - Hainan Key Lab of Tropical Animal Reproduction and Breeding and Epidemic Disease Research, College of Animal Science and Technology, Hainan University, Haikou, 570000, Hainan, China
- Teng Wang
  - Department of Bioinformatics and Systems Biology, Key Laboratory of Molecular Biophysics of the Ministry of Education, Hubei Key Laboratory of Bioinformatics and Molecular-imaging, Center for Artificial Biology, College of Life Science and Technology, Huazhong University of Science and Technology, Wuhan, 430074, Hubei, China
- Yuhong Luo
  - State Key Laboratory for Conservation and Utilization of Subtropical Agro-Bioresources, Guangxi University, Nanning, 530005, China
- Yan Wang
  - State Key Laboratory for Conservation and Utilization of Subtropical Agro-Bioresources, Guangxi University, Nanning, 530005, China
- Zhipeng Li
  - State Key Laboratory for Conservation and Utilization of Subtropical Agro-Bioresources, Guangxi University, Nanning, 530005, China
- Zeyi Xuan
  - Animal Husbandry Research Institute of Guangxi Zhuang Autonomous Region, Nanning, 530001, Guangxi, China
- Shaomei Chen
  - Animal Husbandry Research Institute of Guangxi Zhuang Autonomous Region, Nanning, 530001, Guangxi, China
- Na Yao
  - Animal Husbandry Research Institute of Guangxi Zhuang Autonomous Region, Nanning, 530001, Guangxi, China
- Na L Gao
  - Department of Bioinformatics and Systems Biology, Key Laboratory of Molecular Biophysics of the Ministry of Education, Hubei Key Laboratory of Bioinformatics and Molecular-imaging, Center for Artificial Biology, College of Life Science and Technology, Huazhong University of Science and Technology, Wuhan, 430074, Hubei, China
- Qian Xiao
  - Hainan Key Lab of Tropical Animal Reproduction and Breeding and Epidemic Disease Research, College of Animal Science and Technology, Hainan University, Haikou, 570000, Hainan, China
- Kongwei Huang
  - Guangdong Provincial Key Laboratory of Animal Molecular Design and Precise Breeding, School of Life Science and Engineering, Foshan University, Foshan, 528225, China
- Xiaobo Wang
  - State Key Laboratory for Conservation and Utilization of Subtropical Agro-Bioresources, Guangxi University, Nanning, 530005, China
- Kuiqing Cui
  - Guangdong Provincial Key Laboratory of Animal Molecular Design and Precise Breeding, School of Life Science and Engineering, Foshan University, Foshan, 528225, China
  - State Key Laboratory for Conservation and Utilization of Subtropical Agro-Bioresources, Guangxi University, Nanning, 530005, China
- Saif Ur Rehman
  - State Key Laboratory for Conservation and Utilization of Subtropical Agro-Bioresources, Guangxi University, Nanning, 530005, China
- Xiangfang Tang
  - State Key Laboratory of Animal Nutrition, Institute of Animal Science, Chinese Academy of Agricultural Sciences, Beijing, 100193, China
- Dewu Liu
  - South China Agricultural University, Guangzhou, 510642, China
- Hongbing Han
  - College of Animal Science and Technology, China Agricultural University, Beijing, 100193, China
- Ying Li
  - Guangdong Provincial Key Laboratory of Animal Molecular Design and Precise Breeding, School of Life Science and Engineering, Foshan University, Foshan, 528225, China
- Wei-Hua Chen
  - Department of Bioinformatics and Systems Biology, Key Laboratory of Molecular Biophysics of the Ministry of Education, Hubei Key Laboratory of Bioinformatics and Molecular-imaging, Center for Artificial Biology, College of Life Science and Technology, Huazhong University of Science and Technology, Wuhan, 430074, Hubei, China
  - Institution of Medical Artificial Intelligence, Binzhou Medical University, Yantai, 264003, China
- Qingyou Liu
  - Guangdong Provincial Key Laboratory of Animal Molecular Design and Precise Breeding, School of Life Science and Engineering, Foshan University, Foshan, 528225, China
  - State Key Laboratory for Conservation and Utilization of Subtropical Agro-Bioresources, Guangxi University, Nanning, 530005, China
8
Gomes RAL, Zerbini FM. ConCreT, a 2D convolutional neural network for taxonomic classification applied to viruses in the phylum Cressdnaviricota. J Virol Methods 2023; 320:114789. [PMID: 37536450 DOI: 10.1016/j.jviromet.2023.114789] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/03/2023] [Revised: 07/19/2023] [Accepted: 07/31/2023] [Indexed: 08/05/2023]
Abstract
Taxonomic assignments allow scientists to communicate better with each other. In virology, taxonomy is continually improving towards a more precise and comprehensive framework. With the huge numbers of new viruses being described in metagenomic studies, automated taxonomy tools are urgently needed. A number of such tools have been proposed, and those applying machine learning (ML), mainly deep learning, stand out for their accurate results. Still, there is a demand for tools that are less computationally intensive and that can classify viruses down to the genus and species ranks. Cressdnaviruses are good subjects for testing such tools, due to their small, circular genomes and the existence of several families and genera with highly imbalanced numbers of species. We developed a 2D convolutional neural network for virus taxonomy and tested it for the classification of viruses from the phylum Cressdnaviricota. We obtained >98% accuracy in the final pipeline tested, which we named ConCreT (Convolutional Neural Network for Cressdnavirus Taxonomy). Applying data augmentation to the more imbalanced groups, and no augmentation to the more balanced ones, achieved the best score in the final test.
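The abstract does not describe how ConCreT turns a genome into a 2D input. One common way to give a CNN a 2D view of a nucleotide sequence is a k-mer count grid in the spirit of the frequency chaos game representation (FCGR). The sketch below is that generic encoding, assumed for illustration only and not ConCreT's documented preprocessing; the bit assignment per base is one arbitrary convention among several. An optional wrap-around handles circular genomes such as those of cressdnaviruses.

```python
from typing import List

# Each base sets one bit in the row index and one in the column index,
# so every k-mer maps to a unique cell of a 2^k x 2^k grid.
BASE_BITS = {"A": (0, 0), "C": (0, 1), "G": (1, 1), "T": (1, 0)}


def kmer_grid(seq: str, k: int, circular: bool = False) -> List[List[int]]:
    """Count k-mers of `seq` into a 2^k x 2^k grid (FCGR-like layout)."""
    seq = seq.upper()
    if circular and len(seq) >= k:
        # Wrap around so k-mers spanning the origin of a circular genome count.
        seq = seq + seq[: k - 1]
    size = 2 ** k
    grid = [[0] * size for _ in range(size)]
    for i in range(len(seq) - k + 1):
        kmer = seq[i : i + k]
        if any(base not in BASE_BITS for base in kmer):
            continue  # skip k-mers containing N or other ambiguity codes
        row = col = 0
        for base in kmer:
            r_bit, c_bit = BASE_BITS[base]
            row = row * 2 + r_bit
            col = col * 2 + c_bit
        grid[row][col] += 1
    return grid


if __name__ == "__main__":
    print(kmer_grid("acgtn", 1))  # [[1, 1], [1, 1]]
```

Each grid could then be scaled to frequencies and used as a single-channel image; the choice of k sets the image size (k = 6 gives 64 x 64).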
Affiliation(s)
- Ruither A L Gomes
- Dep. de Fitopatologia, Universidade Federal de Viçosa, Viçosa, MG 36570-900, Brazil; National Institute for Science and Technology on Plant-Pest Interactions, Universidade Federal de Viçosa, Viçosa, MG 36570-900, Brazil
- F Murilo Zerbini
- Dep. de Fitopatologia, Universidade Federal de Viçosa, Viçosa, MG 36570-900, Brazil; National Institute for Science and Technology on Plant-Pest Interactions, Universidade Federal de Viçosa, Viçosa, MG 36570-900, Brazil
9
Ibañez-Lligoña M, Colomer-Castell S, González-Sánchez A, Gregori J, Campos C, Garcia-Cehic D, Andrés C, Piñana M, Pumarola T, Rodríguez-Frias F, Antón A, Quer J. Bioinformatic Tools for NGS-Based Metagenomics to Improve the Clinical Diagnosis of Emerging, Re-Emerging and New Viruses. Viruses 2023; 15:v15020587. [PMID: 36851800 PMCID: PMC9965957 DOI: 10.3390/v15020587] [Citation(s) in RCA: 8] [Impact Index Per Article: 8.0] [Received: 01/13/2023] [Revised: 02/16/2023] [Accepted: 02/17/2023] [Indexed: 02/24/2023] [Open access]
Abstract
Epidemics and pandemics have occurred throughout human history, resulting in millions of deaths. Many such disease outbreaks are caused by viruses. Some viruses, particularly RNA viruses, are characterized by high genetic variability, which can affect phenotypic features such as tropism, antigenicity, and susceptibility to antiviral drugs, vaccines, and the host immune response. The best strategy for confronting the emergence of new infectious genomes is prompt identification. However, currently available diagnostic tests are often of limited use for detecting new agents. High-throughput next-generation sequencing technologies based on metagenomics may offer a solution for detecting new infectious genomes and properly diagnosing certain diseases. Metagenomic techniques enable the identification and characterization of disease-causing agents, but they require a large amount of genetic material and involve complex bioinformatic analyses. A wide variety of analytical tools can be used for the quality control and pre-processing of metagenomic data, filtering of untargeted sequences, assembly and quality control of reads, and taxonomic profiling of sequences, in order to identify both new viruses and known viruses whose sequences are already available in dedicated databases. Although there have been huge advances in the field of metagenomics, there is still no consensus on which approaches should be used for specific data analysis tasks. In this review, we provide some background on the study of viral infections, describe the contribution of metagenomics to this field, and place special emphasis on the bioinformatic tools (with their capabilities and limitations) available for use in metagenomic analyses of viral pathogens.
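To make the first two steps named in the abstract (quality control/pre-processing and filtering of untargeted sequences) concrete, here is a deliberately simplified sketch. It assumes Phred+33 quality strings, uses an illustrative sliding-window rule (window of 4, mean quality 20) and treats a read as host-derived when at least half of its k-mers occur in a host k-mer set. These thresholds and all function names are hypothetical; real pipelines rely on dedicated trimming and host-removal tools rather than this toy logic.

```python
from typing import Iterable, List, Set, Tuple

Read = Tuple[str, str, str]  # (name, sequence, Phred+33 quality string)


def phred_scores(quality: str, offset: int = 33) -> List[int]:
    """Decode a FASTQ quality string into integer Phred scores."""
    return [ord(ch) - offset for ch in quality]


def sliding_window_trim(
    seq: str, quality: str, window: int = 4, min_mean: float = 20.0
) -> Tuple[str, str]:
    """Cut the read at the start of the first window whose mean score is below `min_mean`."""
    scores = phred_scores(quality)
    for start in range(max(1, len(scores) - window + 1)):
        chunk = scores[start:start + window]
        if chunk and sum(chunk) / len(chunk) < min_mean:
            return seq[:start], quality[:start]
    return seq, quality


def kmers(seq: str, k: int) -> Set[str]:
    """All distinct substrings of length k."""
    seq = seq.upper()
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def build_host_index(host_sequences: Iterable[str], k: int) -> Set[str]:
    """Collect every k-mer of the host reference sequences."""
    index: Set[str] = set()
    for host in host_sequences:
        index |= kmers(host, k)
    return index


def is_host_read(seq: str, host_index: Set[str], k: int, min_fraction: float = 0.5) -> bool:
    """Flag a read if at least `min_fraction` of its k-mers occur in the host index."""
    read_kmers = kmers(seq, k)
    if not read_kmers:
        return False
    shared = sum(1 for km in read_kmers if km in host_index)
    return shared / len(read_kmers) >= min_fraction


def preprocess(
    reads: Iterable[Read], host_index: Set[str], k: int, min_length: int = 30
) -> List[Read]:
    """Quality-trim each read, then drop reads that are too short or look host-derived."""
    kept: List[Read] = []
    for name, seq, qual in reads:
        seq, qual = sliding_window_trim(seq, qual)
        if len(seq) >= min_length and not is_host_read(seq, host_index, k):
            kept.append((name, seq, qual))
    return kept


host_index = build_host_index(["AAAAACCCCCGGGGGTTTTT"], k=5)
reads = [
    ("r1", "AAAAACCCCC", "IIIIIIIIII"),  # shares all its 5-mers with the host
    ("r2", "TGCATGCATG", "IIIIIIIIII"),  # shares no 5-mers with the host
]
print(preprocess(reads, host_index, k=5, min_length=8))
```

Only reads that survive both filters would go on to assembly and taxonomic profiling, the later steps listed in the abstract, which this sketch does not cover.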
Affiliation(s)
- Marta Ibañez-Lligoña
- Liver Diseases-Viral Hepatitis, Liver Unit, Vall d’Hebron Institut de Recerca (VHIR), Vall d’Hebron Hospital Universitari, Vall d’Hebron Barcelona Hospital Campus, Passeig Vall d’Hebron 119-129, 08035 Barcelona, Spain
- Centro de Investigación Biomédica en Red de Enfermedades Hepáticas y Digestivas (CIBERehd), Instituto de Salud Carlos III, Av. Monforte de Lemos, 3-5, 28029 Madrid, Spain
- Biochemistry and Molecular Biology Department, Universitat Autònoma de Barcelona (UAB), Campus de la UAB, Plaça Cívica, 08193 Bellaterra, Spain
- Sergi Colomer-Castell
- Liver Diseases-Viral Hepatitis, Liver Unit, Vall d’Hebron Institut de Recerca (VHIR), Vall d’Hebron Hospital Universitari, Vall d’Hebron Barcelona Hospital Campus, Passeig Vall d’Hebron 119-129, 08035 Barcelona, Spain
- Centro de Investigación Biomédica en Red de Enfermedades Hepáticas y Digestivas (CIBERehd), Instituto de Salud Carlos III, Av. Monforte de Lemos, 3-5, 28029 Madrid, Spain
- Biochemistry and Molecular Biology Department, Universitat Autònoma de Barcelona (UAB), Campus de la UAB, Plaça Cívica, 08193 Bellaterra, Spain
- Alejandra González-Sánchez
- Microbiology Department, Vall d’Hebron Institut de Recerca (VHIR), Vall d’Hebron Hospital Universitari, Vall d’Hebron Barcelona Hospital Campus, Passeig Vall d’Hebron 119-129, 08035 Barcelona, Spain
- Josep Gregori
- Liver Diseases-Viral Hepatitis, Liver Unit, Vall d’Hebron Institut de Recerca (VHIR), Vall d’Hebron Hospital Universitari, Vall d’Hebron Barcelona Hospital Campus, Passeig Vall d’Hebron 119-129, 08035 Barcelona, Spain
- Carolina Campos
- Liver Diseases-Viral Hepatitis, Liver Unit, Vall d’Hebron Institut de Recerca (VHIR), Vall d’Hebron Hospital Universitari, Vall d’Hebron Barcelona Hospital Campus, Passeig Vall d’Hebron 119-129, 08035 Barcelona, Spain
- Centro de Investigación Biomédica en Red de Enfermedades Hepáticas y Digestivas (CIBERehd), Instituto de Salud Carlos III, Av. Monforte de Lemos, 3-5, 28029 Madrid, Spain
- Biochemistry and Molecular Biology Department, Universitat Autònoma de Barcelona (UAB), Campus de la UAB, Plaça Cívica, 08193 Bellaterra, Spain
- Damir Garcia-Cehic
- Liver Diseases-Viral Hepatitis, Liver Unit, Vall d’Hebron Institut de Recerca (VHIR), Vall d’Hebron Hospital Universitari, Vall d’Hebron Barcelona Hospital Campus, Passeig Vall d’Hebron 119-129, 08035 Barcelona, Spain
- Centro de Investigación Biomédica en Red de Enfermedades Hepáticas y Digestivas (CIBERehd), Instituto de Salud Carlos III, Av. Monforte de Lemos, 3-5, 28029 Madrid, Spain
- Cristina Andrés
- Microbiology Department, Vall d’Hebron Institut de Recerca (VHIR), Vall d’Hebron Hospital Universitari, Vall d’Hebron Barcelona Hospital Campus, Passeig Vall d’Hebron 119-129, 08035 Barcelona, Spain
- Maria Piñana
- Microbiology Department, Vall d’Hebron Institut de Recerca (VHIR), Vall d’Hebron Hospital Universitari, Vall d’Hebron Barcelona Hospital Campus, Passeig Vall d’Hebron 119-129, 08035 Barcelona, Spain
- Tomàs Pumarola
- Microbiology Department, Vall d’Hebron Institut de Recerca (VHIR), Vall d’Hebron Hospital Universitari, Vall d’Hebron Barcelona Hospital Campus, Passeig Vall d’Hebron 119-129, 08035 Barcelona, Spain
- Microbiology Department, Universitat Autònoma de Barcelona (UAB), Campus de la UAB, Plaça Cívica, 08193 Bellaterra, Spain
- Francisco Rodríguez-Frias
- Liver Diseases-Viral Hepatitis, Liver Unit, Vall d’Hebron Institut de Recerca (VHIR), Vall d’Hebron Hospital Universitari, Vall d’Hebron Barcelona Hospital Campus, Passeig Vall d’Hebron 119-129, 08035 Barcelona, Spain
- Centro de Investigación Biomédica en Red de Enfermedades Hepáticas y Digestivas (CIBERehd), Instituto de Salud Carlos III, Av. Monforte de Lemos, 3-5, 28029 Madrid, Spain
- Department of Basic Sciences, Universitat Internacional de Catalunya, Sant Cugat del Vallès, 08195 Barcelona, Spain
- Andrés Antón
- Microbiology Department, Vall d’Hebron Institut de Recerca (VHIR), Vall d’Hebron Hospital Universitari, Vall d’Hebron Barcelona Hospital Campus, Passeig Vall d’Hebron 119-129, 08035 Barcelona, Spain
- Microbiology Department, Universitat Autònoma de Barcelona (UAB), Campus de la UAB, Plaça Cívica, 08193 Bellaterra, Spain
- Josep Quer
- Liver Diseases-Viral Hepatitis, Liver Unit, Vall d’Hebron Institut de Recerca (VHIR), Vall d’Hebron Hospital Universitari, Vall d’Hebron Barcelona Hospital Campus, Passeig Vall d’Hebron 119-129, 08035 Barcelona, Spain
- Centro de Investigación Biomédica en Red de Enfermedades Hepáticas y Digestivas (CIBERehd), Instituto de Salud Carlos III, Av. Monforte de Lemos, 3-5, 28029 Madrid, Spain
- Biochemistry and Molecular Biology Department, Universitat Autònoma de Barcelona (UAB), Campus de la UAB, Plaça Cívica, 08193 Bellaterra, Spain
10
Benchmarking Bioinformatic Tools for Amplicon-Based Sequencing of Norovirus. Appl Environ Microbiol 2023; 89:e0152222. [PMID: 36541780 PMCID: PMC9888279 DOI: 10.1128/aem.01522-22] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 12/24/2022] [Open access]
Abstract
To survey noroviruses in our environment, it is essential that both wet-lab and computational methods are fit for purpose. Using a simulated sequencing data set, denoising-based (DADA2, Deblur, and USEARCH-UNOISE3) and clustering-based (VSEARCH and FROGS) pipelines were compared with respect to their ability to represent composition and sequence information. Open-source classifiers (Ribosomal Database Project [RDP], BLASTn, IDTAXA, QIIME2 naive Bayes, and SINTAX) were trained using three different databases: a custom database, the NoroNet database, and the Human calicivirus database. Each classifier-database combination was then compared in terms of classification accuracy. Based on composition analysis, VSEARCH provides a robust option for analyzing viral amplicons; however, all pipelines could return OTUs highly similar to the expected sequences. Importantly, pipeline choice could lead to more false positives (DADA2) or to underclassification (FROGS), a key consideration when applying a pipeline for source attribution. Classification was more strongly affected by the choice of classifier than by the database, although disagreement increased for norovirus GII.4 capsid variant designation. We recommend the RDP classifier in conjunction with VSEARCH; however, maintaining the underlying database is essential for optimal use.
IMPORTANCE
By benchmarking bioinformatic pipelines for analyzing high-throughput sequencing (HTS) data sets, we provide method standardization for bioinformatics broadly, and for norovirus specifically, in situations where no officially endorsed methods currently exist. This study provides recommendations for the appropriate analysis and classification of norovirus amplicon HTS data and will be widely applicable during outbreak investigations.
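The abstract recommends the RDP classifier, which is a k-mer ("word") based naive Bayes classifier. The sketch below illustrates that general technique using the word-probability form commonly described for it: a word prior P_w = (n(w) + 0.5) / (N + 1), where n(w) is the number of reference sequences containing word w and N is the total number of references, and P(w | taxon) = (m(w) + P_w) / (M + 1), where m(w) counts the taxon's references containing w and M is the taxon's reference count. It is an assumption-laden simplification, not the RDP implementation: it scores a single rank, uses each distinct query k-mer once and omits RDP's bootstrap confidence estimate. The class and method names are hypothetical.

```python
import math
from collections import defaultdict
from typing import Dict, List, Set, Tuple


def words(seq: str, k: int) -> Set[str]:
    """Distinct k-mers ("words") of a sequence."""
    seq = seq.upper()
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


class KmerNaiveBayes:
    """Simplified k-mer naive Bayes classifier (single rank, no bootstrap)."""

    def __init__(self, k: int = 8) -> None:
        self.k = k
        self.taxon_size: Dict[str, int] = {}
        self.taxon_words: Dict[str, Dict[str, int]] = {}
        self.word_prior: Dict[str, float] = {}
        self.unseen_prior = 0.0

    def fit(self, refs: List[Tuple[str, str]]) -> "KmerNaiveBayes":
        """Count, per taxon, how many reference sequences contain each word."""
        self.taxon_size, self.taxon_words = {}, {}
        seqs_with_word: Dict[str, int] = defaultdict(int)
        for seq, taxon in refs:
            self.taxon_size[taxon] = self.taxon_size.get(taxon, 0) + 1
            counts = self.taxon_words.setdefault(taxon, {})
            for w in words(seq, self.k):
                seqs_with_word[w] += 1
                counts[w] = counts.get(w, 0) + 1
        n = len(refs)
        # Word prior P_w = (n(w) + 0.5) / (N + 1); unseen words get n(w) = 0.
        self.word_prior = {w: (c + 0.5) / (n + 1) for w, c in seqs_with_word.items()}
        self.unseen_prior = 0.5 / (n + 1)
        return self

    def log_score(self, seq: str, taxon: str) -> float:
        """Sum of log((m(w) + P_w) / (M + 1)) over the query's distinct words."""
        m_total = self.taxon_size[taxon]
        counts = self.taxon_words[taxon]
        total = 0.0
        for w in words(seq, self.k):
            prior = self.word_prior.get(w, self.unseen_prior)
            total += math.log((counts.get(w, 0) + prior) / (m_total + 1))
        return total

    def predict(self, seq: str) -> str:
        """Return the taxon with the highest log score."""
        return max(self.taxon_size, key=lambda t: self.log_score(seq, t))


refs = [
    ("AAAAAAAACCCC", "genusA"), ("AAAAAAAACCCG", "genusA"),
    ("GGGGTTTTGGGG", "genusB"), ("GGGGTTTTGGGT", "genusB"),
]
clf = KmerNaiveBayes(k=4).fit(refs)
print(clf.predict("AAAAACCC"), clf.predict("GGGGTTTT"))
```

Working in log space avoids numerical underflow when many word probabilities are multiplied; the abstract's caveat still applies, since any such classifier is only as good as the reference database it is trained on.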