1
Argentel-Martínez L, Peñuelas-Rubio O, Herrera-Sepúlveda A, González-Aguilera J, Sudheer S, Salim LM, Lal S, Pradeep CK, Ortiz A, Sansinenea E, Hathurusinghe SHK, Shin JH, Babalola OO, Azizoglu U. Biotechnological advances in plant growth-promoting rhizobacteria for sustainable agriculture. World J Microbiol Biotechnol 2024; 41:21. [PMID: 39738995 DOI: 10.1007/s11274-024-04231-4] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/10/2024] [Accepted: 12/13/2024] [Indexed: 01/02/2025]
Abstract
The rhizosphere, the soil zone surrounding plant roots, serves as a reservoir for numerous beneficial microorganisms that enhance plant productivity and crop yield, with substantial potential for application as biofertilizers. These microbes play critical roles in ecological processes such as nutrient recycling, organic matter decomposition, and mineralization. Plant growth-promoting rhizobacteria (PGPR) represent a promising tool for sustainable agriculture, enabling green management of crop health and growth as eco-friendly alternatives to chemical fertilizers and pesticides. In this regard, biotechnological advances in genomics and gene editing have been crucial to the development of microbiome engineering, which is pivotal in developing microbial consortia to improve crop production. Genome mining, which involves comprehensive analysis of the entire genome sequence data of PGPR, is crucial for identifying genes encoding valuable bacterial enzymes and metabolites. The CRISPR-Cas system, a cutting-edge genome-editing technology, has shown significant promise in beneficial microbial species. Advances in genetic engineering, particularly CRISPR-Cas, have markedly enhanced grain output, plant biomass, resistance to pests, and the sensory and nutritional quality of crops. There has been great progress in the use of PGPR in important crops; however, further study of synthetic microbial communities, microbiome engineering, and gene editing approaches in field trials is needed. This review focuses on future research directions involving several factors and topics around the use of PGPR, with special emphasis on biotechnological advances.
Affiliation(s)
- Leandris Argentel-Martínez
- Tecnológico Nacional de México/Instituto Tecnológico del Valle del Yaqui, CP: 85260, Bácum, Sonora, Mexico
- Ofelda Peñuelas-Rubio
- Tecnológico Nacional de México/Instituto Tecnológico del Valle del Yaqui, CP: 85260, Bácum, Sonora, Mexico
- Angélica Herrera-Sepúlveda
- Tecnológico Nacional de México/Instituto Tecnológico del Valle del Yaqui, CP: 85260, Bácum, Sonora, Mexico
- Jorge González-Aguilera
- Department of Agronomy, Universidad Estadual de Mato Grosso Do Sul (UEMS), Cassilândia, MS, 79540-000, Brazil
- Surya Sudheer
- Institute of Ecology and Earth Sciences, Department of Botany, University of Tartu, 51005, Tartu, Estonia
- Linu M Salim
- Faculty of Fisheries Engineering, Kerala University of Fisheries and Ocean Studies, Cochin, India
- Sunaina Lal
- Department of Biochemistry, Sikkim Manipal Institute of Medical Sciences, Gangtok, Sikkim, India
- Chittethu Kunjan Pradeep
- Microbiology Division, Jawaharlal Nehru Tropical Botanic Garden & Research Institute, Palode, Thiruvananthapuram, Kerala, 695562, India
- Aurelio Ortiz
- Facultad de Ciencias Químicas, Benemérita Universidad Autónoma de Puebla, C.P. 72570, Puebla, Puebla, México
- Estibaliz Sansinenea
- Facultad de Ciencias Químicas, Benemérita Universidad Autónoma de Puebla, C.P. 72570, Puebla, Puebla, México
- Jae-Ho Shin
- School of Applied Biosciences, College of Agriculture and Life Sciences, Kyungpook National University, Daegu, 41566, Republic of Korea
- Olubukola Oluranti Babalola
- Food Security and Safety Focus Area, Faculty of Natural and Agricultural Sciences, North-West University, Private Bag X2046, Mmabatho, 2735, South Africa
- Ugur Azizoglu
- Department of Crop and Animal Production, Safiye Cikrikcioglu Vocational College, Kayseri University, Kayseri, Türkiye
- Genome and Stem Cell Research Center, Erciyes University, Kayseri, Türkiye
2
Ribeiro S, Chaumet G, Alves K, Nourikyan J, Shi L, Lavergne JP, Mijakovic I, de Bernard S, Buffat L. BacSPaD: A Robust Bacterial Strains' Pathogenicity Resource Based on Integrated and Curated Genomic Metadata. Pathogens 2024; 13:672. [PMID: 39204272 PMCID: PMC11357117 DOI: 10.3390/pathogens13080672] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/07/2024] [Revised: 08/06/2024] [Accepted: 08/07/2024] [Indexed: 09/03/2024] Open
Abstract
The vast array of omics data in microbiology presents significant opportunities for studying bacterial pathogenesis and creating computational tools for predicting pathogenic potential. However, the field lacks a comprehensive, curated resource that catalogs bacterial strains and their ability to cause human infections. Current methods for identifying pathogenicity determinants often introduce biases and miss critical aspects of bacterial pathogenesis. In response to this gap, we introduce BacSPaD (Bacterial Strains' Pathogenicity Database), a thoroughly curated database focusing on pathogenicity annotations for a wide range of high-quality, complete bacterial genomes. Our rule-based annotation workflow combines metadata from trusted sources with automated keyword matching, extensive manual curation, and detailed literature review. Our analysis classified 5502 genomes as pathogenic to humans (HP) and 490 as non-pathogenic to humans (NHP), encompassing 532 species, 193 genera, and 96 families. Statistical analysis demonstrated a significant but moderate correlation between virulence factors and HP classification, highlighting the complexity of bacterial pathogenicity and the need for ongoing research. This resource is poised to enhance our understanding of bacterial pathogenicity mechanisms and aid in the development of predictive models. To improve accessibility and provide key visualization statistics, we developed a user-friendly web interface.
Affiliation(s)
- Sara Ribeiro
- AltraBio SAS, 69007 Lyon, France (L.B.)
- Bases Moléculaires et Structurales des Systèmes Infectieux, IBCP, Université Lyon 1, CNRS, UMR 5086, 69007 Lyon, France
- Lei Shi
- Division of Systems and Synthetic Biology, Department of Life Sciences, Chalmers University of Technology, 412 96 Göteborg, Sweden
- Jean-Pierre Lavergne
- Bases Moléculaires et Structurales des Systèmes Infectieux, IBCP, Université Lyon 1, CNRS, UMR 5086, 69007 Lyon, France
- Ivan Mijakovic
- Division of Systems and Synthetic Biology, Department of Life Sciences, Chalmers University of Technology, 412 96 Göteborg, Sweden
- Novo Nordisk Foundation Center for Biosustainability, Technical University of Denmark, 2800 Lyngby, Denmark
3
Zhang X, Zhang D, Zhang X, Zhang X. Artificial intelligence applications in the diagnosis and treatment of bacterial infections. Front Microbiol 2024; 15:1449844. [PMID: 39165576 PMCID: PMC11334354 DOI: 10.3389/fmicb.2024.1449844] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/16/2024] [Accepted: 07/04/2024] [Indexed: 08/22/2024] Open
Abstract
The diagnosis and treatment of bacterial infections in the medical and public health field in the 21st century remain significantly challenging. Artificial Intelligence (AI) has emerged as a powerful new tool in diagnosing and treating bacterial infections. AI is rapidly revolutionizing epidemiological studies of infectious diseases, providing effective early warning, prevention, and control of outbreaks. Machine learning models provide a highly flexible way to simulate and predict the complex mechanisms of pathogen-host interactions, which is crucial for a comprehensive understanding of the nature of diseases. Machine learning-based pathogen identification technology and antimicrobial drug susceptibility testing break through the limitations of traditional methods, significantly shorten the time from sample collection to the determination of results, and greatly improve the speed and accuracy of laboratory testing. In addition, AI technology application in treating bacterial infections, particularly in the research and development of drugs and vaccines, and the application of innovative therapies such as bacteriophage, provides new strategies for improving therapy and curbing bacterial resistance. Although AI has broad application prospects in diagnosing and treating bacterial infections, significant challenges remain in data quality and quantity, model interpretability, clinical integration, and patient privacy protection. To overcome these challenges and realize widespread application in clinical practice, interdisciplinary cooperation, technology innovation, and policy support are essential components of the joint efforts required.
In summary, with continuous advancements and in-depth application of AI technology, AI will enable doctors to more effectively address the challenge of bacterial infection, promoting the development of medical practice toward precision, efficiency, and personalization; optimizing the best nursing and treatment plans for patients; and providing strong support for public health safety.
Affiliation(s)
- Xiaoyu Zhang
- First Department of Infectious Diseases, The First Affiliated Hospital of China Medical University, Shenyang, China
- Deng Zhang
- Department of Infectious Diseases, The First Affiliated Hospital of Xiamen University, Xiamen, China
- Xifan Zhang
- First Department of Infectious Diseases, The First Affiliated Hospital of China Medical University, Shenyang, China
- Xin Zhang
- First Department of Infectious Diseases, The First Affiliated Hospital of China Medical University, Shenyang, China
4
Rao PS, Downie DL, David-Ferdon C, Beekmann SE, Santibanez S, Polgreen PM, Kuehnert M, Courtney S, Lee JS, Chaitram J, Salerno RM, Gundlapalli AV. Pathogen-Agnostic Advanced Molecular Diagnostic Testing for Difficult-to-Diagnose Clinical Syndromes-Results of an Emerging Infections Network Survey of Frontline US Infectious Disease Clinicians, May 2023. Open Forum Infect Dis 2024; 11:ofae395. [PMID: 39113826 PMCID: PMC11304606 DOI: 10.1093/ofid/ofae395] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/04/2024] [Accepted: 07/10/2024] [Indexed: 08/10/2024] Open
Abstract
During routine clinical practice, infectious disease physicians encounter patients with difficult-to-diagnose clinical syndromes and may order advanced molecular testing to detect pathogens. These tests may identify potential infectious causes for illness and allow clinicians to adapt treatments or stop unnecessary antimicrobials. Cases of pathogen-agnostic disease testing also provide an important window into known, emerging, and reemerging pathogens and may be leveraged as part of national sentinel surveillance. A survey of Emerging Infections Network members, a group of infectious disease providers in North America, was conducted in May 2023. The objective of the survey was to gain insight into how and when infectious disease physicians use advanced molecular testing for patients with difficult-to-diagnose infectious diseases, as well as to explore the usefulness of advanced molecular testing and barriers to use. Overall, 643 providers answered at least some of the survey questions; 478 (74%) of those who completed the survey had ordered advanced molecular testing in the last two years, and formed the basis for this study. Respondents indicated that they most often ordered broad-range 16S rRNA gene sequencing, followed by metagenomic next-generation sequencing and whole genome sequencing; and commented that in clinical practice, some, but not all tests were useful. Many physicians also noted several barriers to use, including a lack of national guidelines and cost, while others commented that whole genome sequencing had potential for use in outbreak surveillance. Improving frontline physician access, availability, affordability, and developing clear national guidelines for interpretation and use of advanced molecular testing could potentially support clinical practice and public health surveillance.
Affiliation(s)
- Preetika S Rao
- Office of Public Health Data, Surveillance and Technology, Centers for Disease Control and Prevention, Atlanta, Georgia, USA
- Diane L Downie
- Office of Readiness and Response, Centers for Disease Control and Prevention, Atlanta, Georgia, USA
- Corinne David-Ferdon
- National Center for Injury Prevention and Control, Centers for Disease Control and Prevention, Atlanta, Georgia, USA
- Susan E Beekmann
- Emerging Infections Network, University of Iowa, Iowa City, Iowa, USA
- Scott Santibanez
- National Center for Emerging and Zoonotic Infectious Diseases, Centers for Disease Control and Prevention, Atlanta, Georgia, USA
- Philip M Polgreen
- Emerging Infections Network, University of Iowa, Iowa City, Iowa, USA
- Matthew Kuehnert
- National Center for Emerging and Zoonotic Infectious Diseases, Centers for Disease Control and Prevention, Atlanta, Georgia, USA
- Sean Courtney
- Office of Laboratory Systems and Response, Centers for Disease Control and Prevention, Atlanta, Georgia, USA
- Justin S Lee
- Global Health Center, Centers for Disease Control and Prevention, Atlanta, Georgia, USA
- Jasmine Chaitram
- Office of Laboratory Systems and Response, Centers for Disease Control and Prevention, Atlanta, Georgia, USA
- Reynolds M Salerno
- Office of Laboratory Systems and Response, Centers for Disease Control and Prevention, Atlanta, Georgia, USA
- Adi V Gundlapalli
- Office of Public Health Data, Surveillance and Technology, Centers for Disease Control and Prevention, Atlanta, Georgia, USA
5
Domrazek K, Jurka P. Application of Next-Generation Sequencing (NGS) Techniques for Selected Companion Animals. Animals (Basel) 2024; 14:1578. [PMID: 38891625 PMCID: PMC11171117 DOI: 10.3390/ani14111578] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/24/2024] [Revised: 05/22/2024] [Accepted: 05/24/2024] [Indexed: 06/21/2024] Open
Abstract
Next-Generation Sequencing (NGS) techniques have revolutionized veterinary medicine for cats and dogs, offering insights across various domains. In veterinary parasitology, NGS enables comprehensive profiling of parasite populations, aiding in understanding transmission dynamics and drug resistance mechanisms. In infectious diseases, NGS facilitates rapid pathogen identification, characterization of virulence factors, and tracking of outbreaks. Moreover, NGS sheds light on metabolic processes by elucidating gene expression patterns and metabolic pathways, essential for diagnosing metabolic disorders and designing tailored treatments. In autoimmune diseases, NGS helps identify genetic predispositions and molecular mechanisms underlying immune dysregulation. Veterinary oncology benefits from NGS through personalized tumor profiling, mutation analysis, and identification of therapeutic targets, fostering precision medicine approaches. Additionally, NGS plays a pivotal role in veterinary genetics, unraveling the genetic basis of inherited diseases and facilitating breeding programs for healthier animals. Physiological investigations leverage NGS to explore complex biological systems, unraveling gene-environment interactions and molecular pathways governing health and disease. Application of NGS in treatment planning enhances precision and efficacy by enabling personalized therapeutic strategies tailored to individual animals and their diseases, ultimately advancing veterinary care for companion animals.
Affiliation(s)
- Kinga Domrazek
- Institute of Veterinary Medicine, Warsaw University of Life Sciences—SGGW, Nowoursynowska 159c, 02-776 Warsaw, Poland
6
Ma C, Liu S, Koslicki D. MetagenomicKG: a knowledge graph for metagenomic applications. bioRxiv 2024:2024.03.14.585056. [PMID: 38559251 PMCID: PMC10980061 DOI: 10.1101/2024.03.14.585056] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 04/04/2024]
Abstract
Motivation: The sheer volume and variety of genomic content within microbial communities makes metagenomics a field rich in biomedical knowledge. To traverse these complex communities and their vast unknowns, metagenomic studies often depend on distinct reference databases, such as the Genome Taxonomy Database (GTDB), the Kyoto Encyclopedia of Genes and Genomes (KEGG), and the Bacterial and Viral Bioinformatics Resource Center (BV-BRC), for various analytical purposes. These databases are crucial for genetic and functional annotation of microbial communities. Nevertheless, the inconsistent nomenclature or identifiers of these databases present challenges for effective integration, representation, and utilization. Knowledge graphs (KGs) offer an appropriate solution by organizing biological entities and their interrelations into a cohesive network. The graph structure not only facilitates the unveiling of hidden patterns but also enriches our biological understanding with deeper insights. Despite KGs having shown potential in various biomedical fields, their application in metagenomics remains underexplored. Results: We present MetagenomicKG, a novel knowledge graph specifically tailored for metagenomic analysis. MetagenomicKG integrates taxonomic, functional, and pathogenesis-related information from widely used databases, and further links these with established biomedical knowledge graphs to expand biological connections. Through several use cases, we demonstrate its utility in enabling hypothesis generation regarding the relationships between microbes and diseases, generating sample-specific graph embeddings, and providing robust pathogen prediction. Availability and Implementation: The source code and technical details for constructing the MetagenomicKG and reproducing all analyses are available at GitHub: https://github.com/KoslickiLab/MetagenomicKG. We also host a Neo4j instance: http://mkg.cse.psu.edu:7474 for accessing and querying this graph.
Affiliation(s)
- Chunyu Ma
- Huck Institutes of the Life Sciences, Pennsylvania State University, State College, Pennsylvania, USA
- Shaopeng Liu
- Huck Institutes of the Life Sciences, Pennsylvania State University, State College, Pennsylvania, USA
- David Koslicki
- Huck Institutes of the Life Sciences, Pennsylvania State University, State College, Pennsylvania, USA
- Department of Computer Science and Engineering, Pennsylvania State University, State College, Pennsylvania, USA
- Department of Biology, Pennsylvania State University, State College, Pennsylvania, USA
- The One Health Microbiome Center, Huck Institutes of the Life Sciences, Pennsylvania State University, State College, Pennsylvania, USA
7
Akinsulie OC, Idris I, Aliyu VA, Shahzad S, Banwo OG, Ogunleye SC, Olorunshola M, Okedoyin DO, Ugwu C, Oladapo IP, Gbadegoye JO, Akande QA, Babawale P, Rostami S, Soetan KO. The potential application of artificial intelligence in veterinary clinical practice and biomedical research. Front Vet Sci 2024; 11:1347550. [PMID: 38356661 PMCID: PMC10864457 DOI: 10.3389/fvets.2024.1347550] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/01/2023] [Accepted: 01/15/2024] [Indexed: 02/16/2024] Open
Abstract
Artificial intelligence (AI) is a fast-paced technological advancement in terms of its application to various fields of science and technology. In particular, AI has the potential to play various roles in veterinary clinical practice, enhancing the way veterinary care is delivered, improving outcomes for animals and ultimately humans. Also, in recent years, the emergence of AI has led to a new direction in biomedical research, especially in translational research with great potential, promising to revolutionize science. AI is applicable in antimicrobial resistance (AMR) research, cancer research, drug design and vaccine development, epidemiology, disease surveillance, and genomics. Here, we highlighted and discussed the potential impact of various aspects of AI in veterinary clinical practice and biomedical research, proposing this technology as a key tool for addressing pressing global health challenges across various domains.
Affiliation(s)
- Olalekan Chris Akinsulie
- Faculty of Veterinary Medicine, University of Ibadan, Ibadan, Nigeria
- College of Veterinary Medicine, Washington State University, Pullman, WA, United States
- Ibrahim Idris
- Faculty of Veterinary Medicine, Usman Danfodiyo University, Sokoto, Nigeria
- Sammuel Shahzad
- College of Veterinary Medicine, Washington State University, Pullman, WA, United States
- Seto Charles Ogunleye
- Faculty of Veterinary Medicine, University of Ibadan, Ibadan, Nigeria
- Department of Population Medicine and Pathobiology, College of Veterinary Medicine, Mississippi State University, Starkville, MS, United States
- Mercy Olorunshola
- Department of Pharmaceutical Microbiology, University of Ibadan, Ibadan, Nigeria
- Deborah O. Okedoyin
- Department of Animal Sciences, North Carolina Agricultural and Technical State University, Greensboro, NC, United States
- Charles Ugwu
- College of Veterinary Medicine, Washington State University, Pullman, WA, United States
- Joy Olaoluwa Gbadegoye
- Department of Physiology, University of Tennessee Health Science Center, Memphis, TN, United States
- Qudus Afolabi Akande
- Department of Biological Sciences, University of Notre Dame, Notre Dame, IN, United States
- Pius Babawale
- Department of Pathobiological Sciences, School of Veterinary Medicine, Louisiana State University, Baton Rouge, LA, United States
- Sahar Rostami
- Department of Population Medicine and Pathobiology, College of Veterinary Medicine, Mississippi State University, Starkville, MS, United States
8
Goldberg Z, Linder AG, Miller LN, Sorrell EM. Wastewater Collection and Sequencing as a Proactive Approach to Utilizing Threat Agnostic Biological Defense. Health Secur 2024; 22:11-15. [PMID: 37856169 DOI: 10.1089/hs.2023.0075] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Indexed: 10/20/2023] Open
Affiliation(s)
- Zev Goldberg
- Zev Goldberg, MSc, was a 2022-2023 Griffin Fellow, Elizabeth R. Griffin Program, Center for Global Health Science and Security, Georgetown University, Washington, DC
- Alexander G Linder
- Alexander G. Linder, MSc, is a Junior Scientist, Elizabeth R. Griffin Program, Center for Global Health Science and Security, Georgetown University, Washington, DC
- Lauren N Miller
- Lauren N. Miller, MSc, is a Junior Scientist, Elizabeth R. Griffin Program, Center for Global Health Science and Security, Georgetown University, Washington, DC
- Erin M Sorrell
- Erin M. Sorrell, PhD, MSc, is a Senior Scholar, Johns Hopkins Center for Health Security, and an Associate Professor, Department of Environmental Health and Engineering, Johns Hopkins Bloomberg School of Public Health, Baltimore, MD
9
Karlsen ST, Rau MH, Sánchez BJ, Jensen K, Zeidan AA. From genotype to phenotype: computational approaches for inferring microbial traits relevant to the food industry. FEMS Microbiol Rev 2023; 47:fuad030. [PMID: 37286882 PMCID: PMC10337747 DOI: 10.1093/femsre/fuad030] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/28/2023] [Revised: 05/31/2023] [Accepted: 06/06/2023] [Indexed: 06/09/2023] Open
Abstract
When selecting microbial strains for the production of fermented foods, various microbial phenotypes need to be taken into account to achieve target product characteristics, such as biosafety, flavor, texture, and health-promoting effects. Through continuous advances in sequencing technologies, microbial whole-genome sequences of increasing quality can now be obtained both cheaper and faster, which increases the relevance of genome-based characterization of microbial phenotypes. Prediction of microbial phenotypes from genome sequences makes it possible to quickly screen large strain collections in silico to identify candidates with desirable traits. Several microbial phenotypes relevant to the production of fermented foods can be predicted using knowledge-based approaches, leveraging our existing understanding of the genetic and molecular mechanisms underlying those phenotypes. In the absence of this knowledge, data-driven approaches can be applied to estimate genotype-phenotype relationships based on large experimental datasets. Here, we review computational methods that implement knowledge- and data-driven approaches for phenotype prediction, as well as methods that combine elements from both approaches. Furthermore, we provide examples of how these methods have been applied in industrial biotechnology, with special focus on the fermented food industry.
Affiliation(s)
- Signe T Karlsen
- Bioinformatics & Modeling, R&D Digital Innovation, Chr. Hansen A/S, Bøge Allé 10-12, 2970 Hørsholm, Denmark
- Martin H Rau
- Bioinformatics & Modeling, R&D Digital Innovation, Chr. Hansen A/S, Bøge Allé 10-12, 2970 Hørsholm, Denmark
- Benjamín J Sánchez
- Bioinformatics & Modeling, R&D Digital Innovation, Chr. Hansen A/S, Bøge Allé 10-12, 2970 Hørsholm, Denmark
- Kristian Jensen
- Bioinformatics & Modeling, R&D Digital Innovation, Chr. Hansen A/S, Bøge Allé 10-12, 2970 Hørsholm, Denmark
- Ahmad A Zeidan
- Bioinformatics & Modeling, R&D Digital Innovation, Chr. Hansen A/S, Bøge Allé 10-12, 2970 Hørsholm, Denmark
10
Johnson MA, Vinatzer BA, Li S. Reference-Free Plant Disease Detection Using Machine Learning and Long-Read Metagenomic Sequencing. Appl Environ Microbiol 2023; 89:e0026023. [PMID: 37184398 PMCID: PMC10304783 DOI: 10.1128/aem.00260-23] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 02/16/2023] [Accepted: 04/14/2023] [Indexed: 05/16/2023] Open
Abstract
Surveillance for early disease detection is crucial to reduce the threat of plant diseases to food security. Metagenomic sequencing and taxonomic classification have recently been used to detect and identify plant pathogens. However, for an emerging pathogen, its genome may not be similar enough to any public genome to permit reference-based tools to identify infected samples. Also, in the case of point-of-care diagnosis in the field, database access may be limited. Therefore, here we explore reference-free detection of plant pathogens using metagenomic sequencing and machine learning (ML). We used long-read metagenomes from healthy and infected plants as our model system and constructed k-mer frequency tables to test eight different ML models. The accuracies in classifying individual reads as coming from a healthy or infected metagenome were compared. Of all models, random forest (RF) had the best combination of short run-time and high accuracy (over 0.90) using tomato metagenomes. We further evaluated the RF model with a different tomato sample infected with the same pathogen or a different pathogen and a grapevine sample infected with a grapevine pathogen and achieved similar performances. ML models can thus learn features to successfully perform reference-free detection of plant diseases whereby a model trained with one pathogen-host system can also be used to detect different pathogens on different hosts. Potential and challenges of applying ML to metagenomics in plant disease detection are discussed. IMPORTANCE Climate change may lead to the emergence of novel plant diseases caused by yet unknown pathogens. Surveillance for emerging plant diseases is crucial to reduce their threat to food security. However, conventional genomic-based methods require knowledge of existing plant pathogens and cannot be applied to detecting newly emerged pathogens. In this work, we explored reference-free, metagenomic sequencing-based disease detection using machine learning.
By sequencing the genomes of all microbial species extracted from an infected plant sample, we were able to train machine learning models to accurately classify individual sequencing reads as coming from a healthy or an infected plant sample. This method has the potential to be integrated into a generic pipeline for a metagenomic-based plant disease surveillance approach but also has limitations that still need to be overcome.
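The k-mer frequency table mentioned in the abstract above can be illustrated with a minimal sketch. This is not the authors' code: the function names `kmer_frequencies` and `kmer_table`, the choice of k = 3, and the handling of ambiguous bases are all assumptions made for illustration. Each read becomes one row of normalized k-mer frequencies in a fixed column order, and such rows could then be passed to a classifier such as a random forest (that step is not shown here).

```python
from collections import Counter
from itertools import product

ALPHABET = "ACGT"


def kmer_frequencies(read, k=3):
    """Return normalized k-mer frequencies for one read.

    Columns follow the lexicographic order of all 4**k k-mers over A/C/G/T.
    Windows containing other symbols (for example N) are ignored; this is an
    illustrative choice, not something stated in the abstract.
    """
    read = read.upper()
    kmers = ["".join(p) for p in product(ALPHABET, repeat=k)]
    # Count every overlapping window of length k in the read.
    counts = Counter(read[i:i + k] for i in range(len(read) - k + 1))
    total = sum(counts[km] for km in kmers)
    if total == 0:
        return [0.0] * len(kmers)
    return [counts[km] / total for km in kmers]


def kmer_table(reads, k=3):
    """Build a feature table with one row of k-mer frequencies per read."""
    return [kmer_frequencies(r, k) for r in reads]


if __name__ == "__main__":
    # Hypothetical toy reads, used only to show the shape of the table.
    table = kmer_table(["ACGTACGTAC", "GGGGCCCCGG"], k=2)
    print(len(table), len(table[0]))
```

Normalizing by the number of valid windows keeps reads of different lengths comparable, which matters for long-read data where read length varies widely.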
Affiliation(s)
- Marcela A. Johnson
- School of Plant and Environmental Sciences, Virginia Tech, Blacksburg, Virginia, USA
- Graduate Program in Genetics, Bioinformatics, and Computational Biology, Virginia Tech, Blacksburg, Virginia, USA
- Boris A. Vinatzer
- School of Plant and Environmental Sciences, Virginia Tech, Blacksburg, Virginia, USA
- Song Li
- School of Plant and Environmental Sciences, Virginia Tech, Blacksburg, Virginia, USA
11
Yu L, Zhang Y, Qi X, Bai K, Zhang Z, Bu H. Next-generation sequencing for the diagnosis of Listeria monocytogenes meningoencephalitis: a case series of five consecutive patients. J Med Microbiol 2023; 72. [PMID: 36748504 DOI: 10.1099/jmm.0.001641] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 01/11/2023] Open
Abstract
Introduction. The prompt and specific diagnosis of Listeria monocytogenes meningoencephalitis (LMM) is challenging. Next-generation sequencing (NGS) of cerebrospinal fluid (CSF) is an emerging technique for diagnosing infrequent causative pathogens. Hypothesis/Gap statement. We hypothesized that NGS of CSF is an effective approach for diagnosing LMM. Aim. To evaluate the effectiveness of NGS, we present five cases of LMM diagnosed using NGS of the CSF. Methodology. Between August 2017 and 30 September 2020, we used NGS of the CSF to detect pathogens in patients with clinically suspected central nervous system infections. The clinical characteristics, laboratory tests, imaging findings and NGS results are reviewed. Results. Five patients were diagnosed with LMM using NGS of the CSF within 2 to 4 days, although the clinical manifestations, medical history and imaging findings varied strikingly. NGS of CSF showed sequence reads corresponding to L. monocytogenes species ranging from 118 to 1997 bp, genomic coverage of 0.29-5.96 %, relative abundance of 14.83-32.16 % and sequencing depth of 1.12 to 1.35. The prompt diagnosis resulted in targeted and effective treatment with the appropriate antibiotics, although two patients with the most severe cerebral parenchymal lesions showed little improvement. Conclusion. Our results demonstrate the power of NGS of CSF for the prompt diagnosis of LMM. NGS of CSF is an important complementary tool for identifying L. monocytogenes.
Affiliation(s)
- Lili Yu
- Department of Neurology, the Second Hospital of Hebei Medical University, Shijiazhuang 050000, Hebei, PR China; Key Laboratory of Neurology of Hebei Province, Shijiazhuang 050000, Hebei, PR China
- Yu Zhang
- Department of Neurology, the Second Hospital of Hebei Medical University, Shijiazhuang 050000, Hebei, PR China; Key Laboratory of Neurology of Hebei Province, Shijiazhuang 050000, Hebei, PR China
- Xuejiao Qi
- Department of Neurology, the Second Hospital of Hebei Medical University, Shijiazhuang 050000, Hebei, PR China; Key Laboratory of Neurology of Hebei Province, Shijiazhuang 050000, Hebei, PR China
- Kaixuan Bai
- Department of Neurology, the Second Hospital of Hebei Medical University, Shijiazhuang 050000, Hebei, PR China; Key Laboratory of Neurology of Hebei Province, Shijiazhuang 050000, Hebei, PR China
- Zhenyuan Zhang
- Department of Neurology, the Second Hospital of Hebei Medical University, Shijiazhuang 050000, Hebei, PR China; Key Laboratory of Neurology of Hebei Province, Shijiazhuang 050000, Hebei, PR China
- Hui Bu
- Department of Neurology, the Second Hospital of Hebei Medical University, Shijiazhuang 050000, Hebei, PR China; Key Laboratory of Neurology of Hebei Province, Shijiazhuang 050000, Hebei, PR China
12
Eberl L. Rehabilitation of the 'bad guys' for biocontrol applications. Environ Microbiol 2023; 25:97-101. [PMID: 36168979 DOI: 10.1111/1462-2920.16216] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/14/2022] [Accepted: 09/16/2022] [Indexed: 01/24/2023]
Affiliation(s)
- Leo Eberl
- Department of Plant and Microbial biology, University of Zurich, Zurich, Switzerland
13
Gupta A, Malwe AS, Srivastava GN, Thoudam P, Hibare K, Sharma VK. MP4: a machine learning based classification tool for prediction and functional annotation of pathogenic proteins from metagenomic and genomic datasets. BMC Bioinformatics 2022; 23:507. [PMID: 36443666 PMCID: PMC9703692 DOI: 10.1186/s12859-022-05061-7] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 12/26/2021] [Accepted: 11/16/2022] [Indexed: 11/29/2022] Open
Abstract
Bacteria can exceptionally evolve and develop pathogenic features making it crucial to determine novel pathogenic proteins for specific therapeutic interventions. Therefore, we have developed a machine-learning tool that predicts and functionally classifies pathogenic proteins into their respective pathogenic classes. Through construction of pathogenic proteins database and optimization of ML algorithms, Support Vector Machine was selected for the model construction. The developed SVM classifier yielded an accuracy of 81.72% on the blind-dataset and classified the proteins into three classes: Non-pathogenic proteins (Class-1), Antibiotic Resistance Proteins and Toxins (Class-2), and Secretory System Associated and capsular proteins (Class-3). The classifier provided an accuracy of 79% on real dataset-1, and 72% on real dataset-2. Based on the probability of prediction, users can estimate the pathogenicity and annotation of proteins under scrutiny. Tool will provide accurate prediction of pathogenic proteins in genomic and metagenomic datasets providing leads for experimental validations. Tool is available at: http://metagenomics.iiserb.ac.in/mp4 .
Affiliation(s)
- Ankit Gupta
- MetaBioSys Group, Department of Biological Sciences, Indian Institute of Science Education and Research, Bhopal, Madhya Pradesh, India
- Aditya S. Malwe
- MetaBioSys Group, Department of Biological Sciences, Indian Institute of Science Education and Research, Bhopal, Madhya Pradesh, India
- Gopal N. Srivastava
- MetaBioSys Group, Department of Biological Sciences, Indian Institute of Science Education and Research, Bhopal, Madhya Pradesh, India
- Parikshit Thoudam
- MetaBioSys Group, Department of Biological Sciences, Indian Institute of Science Education and Research, Bhopal, Madhya Pradesh, India
- Keshav Hibare
- MetaBioSys Group, Department of Biological Sciences, Indian Institute of Science Education and Research, Bhopal, Madhya Pradesh, India
- Vineet K. Sharma
- MetaBioSys Group, Department of Biological Sciences, Indian Institute of Science Education and Research, Bhopal, Madhya Pradesh, India
14
A computational approach to biological pathogenicity. Mol Genet Genomics 2022; 297:1741-1754. [PMID: 36125534 PMCID: PMC9486766 DOI: 10.1007/s00438-022-01951-w] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/02/2021] [Accepted: 08/28/2022] [Indexed: 11/03/2022]
Abstract
The current pandemic (COVID-19) has made evident the need to approach pathogenicity from a deeper and more systematic perspective that might lead to methodologies to quickly predict new strains of microbes that could be pathogenic to humans. Here we propose as a solution a general and principled definition of pathogenicity that can be practically implemented in operational ways in a framework for characterizing and assessing the (degree of) potential pathogenicity of a microbe to a given host (e.g., a human individual) just based on DNA biomarkers, and to the point of predicting its impact on a host a priori to a meaningful degree of accuracy. The definition is based on basic biochemistry, the Gibbs free Energy of duplex formation between oligonucleotides and some deep structural properties of DNA revealed by an approximation with certain properties. We propose two operational tests based on the nearest neighbor (NN) model of the Gibbs Energy and an approximating metric (the h-distance.) Quality assessments demonstrate that these tests predict pathogenicity with an accuracy of over 80%, and sensitivity and specificity over 90%. Other tests obtained by training machine learning models on deep features extracted from DNA sequences yield scores of 90% for accuracy, 100% for sensitivity and 80% for specificity. These results hint towards the possibility of an operational, objective, and general conceptual framework for prior identification of pathogens and their impact without the cost of death or sickness in a host (e.g., humans.) Consequently, a reasonable prediction of possible pathogens might pave the way to eventually transform the way we handle and prepare for future pandemic events and mitigate the adverse impact on human health, while reducing the number of clinical trials to obtain similar results.
15
Bartoszewicz JM, Nasri F, Nowicka M, Renard BY. Detecting DNA of novel fungal pathogens using ResNets and a curated fungi-hosts data collection. Bioinformatics 2022; 38:ii168-ii174. [PMID: 36124807 DOI: 10.1093/bioinformatics/btac495] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Accepted: 07/08/2022] [Indexed: 12/25/2022] Open
Abstract
BACKGROUND: Emerging pathogens are a growing threat, but large data collections and approaches for predicting the risk associated with novel agents are limited to bacteria and viruses. Pathogenic fungi, which also pose a constant threat to public health, remain understudied. Relevant data remain comparatively scarce and scattered among many different sources, hindering the development of sequencing-based detection workflows for novel fungal pathogens. No prediction method working for agents across all three groups is available, even though the cause of an infection is often difficult to identify from symptoms alone. RESULTS: We present a curated collection of fungal host range data, comprising records on human, animal and plant pathogens, as well as other plant-associated fungi, linked to publicly available genomes. We show that it can be used to predict the pathogenic potential of novel fungal species directly from DNA sequences with either sequence homology or deep learning. We develop learned, numerical representations of the collected genomes and visualize the landscape of fungal pathogenicity. Finally, we train multi-class models predicting if next-generation sequencing reads originate from novel fungal, bacterial or viral threats. CONCLUSIONS: The neural networks trained using our data collection enable accurate detection of novel fungal pathogens. A curated set of over 1400 genomes with host and pathogenicity metadata supports training of machine-learning models and sequence comparison, not limited to the pathogen detection task. AVAILABILITY AND IMPLEMENTATION: The data, models and code are hosted at https://zenodo.org/record/5846345, https://zenodo.org/record/5711877 and https://gitlab.com/dacs-hpi/deepac. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Jakub M Bartoszewicz
- Hasso Plattner Institute for Digital Engineering, Digital Engineering Faculty, University of Potsdam, Potsdam 14482, Germany; Department of Mathematics and Computer Science, Free University of Berlin, Berlin 14195, Germany
- Ferdous Nasri
- Hasso Plattner Institute for Digital Engineering, Digital Engineering Faculty, University of Potsdam, Potsdam 14482, Germany; Department of Mathematics and Computer Science, Free University of Berlin, Berlin 14195, Germany
- Melania Nowicka
- Hasso Plattner Institute for Digital Engineering, Digital Engineering Faculty, University of Potsdam, Potsdam 14482, Germany; Department of Mathematics and Computer Science, Free University of Berlin, Berlin 14195, Germany
- Bernhard Y Renard
- Hasso Plattner Institute for Digital Engineering, Digital Engineering Faculty, University of Potsdam, Potsdam 14482, Germany
16
Costantini C, Nunzi E, Romani L. From the nose to the lungs: the intricate journey of airborne pathogens amidst commensal bacteria. Am J Physiol Cell Physiol 2022; 323:C1036-C1043. [PMID: 36036448 PMCID: PMC9529274 DOI: 10.1152/ajpcell.00287.2022] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Indexed: 11/25/2022]
Abstract
The recent COVID-19 pandemic has dramatically brought the pitfalls of airborne pathogens to the attention of the scientific community. Not only viruses but also bacteria and fungi may exploit air transmission to colonize and infect potential hosts and be the cause of significant morbidity and mortality in susceptible populations. The efforts to decipher the mechanisms of pathogenicity of airborne microbes have brought to light the delicate equilibrium that governs the homeostasis of mucosal membranes. The microorganisms already thriving in the permissive environment of the respiratory tract represent a critical component of this equilibrium and a potent barrier to infection by means of direct competition with airborne pathogens or indirectly via modulation of the immune response. Moving down the respiratory tract, physicochemical and biological constraints promote site-specific expansion of microbes that engage in cross talk with the local immune system to maintain homeostasis and promote protection. In this review, we critically assess the site-specific microbial communities that an airborne pathogen encounters in its hypothetical travel along the respiratory tract and discuss the changes in the composition and function of the microbiome in airborne diseases by taking fungal and SARS-CoV-2 infections as examples. Finally, we discuss how technological and bioinformatics advancements may turn microbiome analysis into a valuable tool in the hands of clinicians to predict the risk of disease onset, the clinical course, and the response to treatment of individual patients in the direction of personalized medicine implementation.
Affiliation(s)
- Claudio Costantini
- Department of Medicine and Surgery, University of Perugia, Perugia, Italy
- Emilia Nunzi
- Department of Medicine and Surgery, University of Perugia, Perugia, Italy
- Luigina Romani
- Department of Medicine and Surgery, University of Perugia, Perugia, Italy
17
Naor-Hoffmann S, Svetlitsky D, Sal-Man N, Orenstein Y, Ziv-Ukelson M. Predicting the pathogenicity of bacterial genomes using widely spread protein families. BMC Bioinformatics 2022; 23:253. [PMID: 35751023 PMCID: PMC9233384 DOI: 10.1186/s12859-022-04777-w] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/08/2021] [Accepted: 04/13/2022] [Indexed: 11/15/2022] Open
Abstract
Background: The human body is inhabited by a diverse community of commensal non-pathogenic bacteria, many of which are essential for our health. By contrast, pathogenic bacteria have the ability to invade their hosts and cause a disease. Characterizing the differences between pathogenic and commensal non-pathogenic bacteria is important for the detection of emerging pathogens and for the development of new treatments. Previous methods for classification of bacteria as pathogenic or non-pathogenic used either raw genomic reads or protein families as features. Using protein families instead of reads provided a better interpretability of the resulting model. However, the accuracy of protein-families-based classifiers can still be improved. Results: We developed a wide scope pathogenicity classifier (WSPC), a new protein-content-based machine-learning classification model. We trained WSPC on a newly curated dataset of 641 bacterial genomes, where each genome belongs to a different species. A comparative analysis we conducted shows that WSPC outperforms existing models on two benchmark test sets. We observed that the most discriminative protein-family features in WSPC are widely spread among bacterial species. These features correspond to proteins that are involved in the ability of bacteria to survive and replicate during an infection, rather than proteins that are directly involved in damaging or invading the host.
Affiliation(s)
- Shaked Naor-Hoffmann
- Department of Computer Science, Ben-Gurion University of the Negev, Be'er Sheva, Israel
- Dina Svetlitsky
- Department of Computer Science, Ben-Gurion University of the Negev, Be'er Sheva, Israel
- Neta Sal-Man
- The Shraga Segal Department of Microbiology, Immunology and Genetics, Faculty of Health Sciences, Ben-Gurion University of the Negev, Be'er Sheva, Israel
- Yaron Orenstein
- School of Electrical and Computer Engineering, Ben-Gurion University of the Negev, Be'er Sheva, Israel
- Michal Ziv-Ukelson
- Department of Computer Science, Ben-Gurion University of the Negev, Be'er Sheva, Israel
18
Wani AK, Roy P, Kumar V, Mir TUG. Metagenomics and artificial intelligence in the context of human health. Infect Genet Evol 2022; 100:105267. [PMID: 35278679 DOI: 10.1016/j.meegid.2022.105267] [Citation(s) in RCA: 18] [Impact Index Per Article: 6.0] [Received: 08/19/2021] [Revised: 03/03/2022] [Accepted: 03/04/2022] [Indexed: 12/12/2022]
Abstract
Human microbiome is ubiquitous, dynamic, and site-specific consortia of microbial communities. The pathogenic nature of microorganisms within human tissues has led to an increase in microbial studies. Characterization of genera, like Streptococcus, Cutibacterium, Staphylococcus, Bifidobacterium, Lactococcus and Lactobacillus through culture-dependent and culture-independent techniques has been reported. However, due to the unique environment within human tissues, it is difficult to culture these microorganisms making their molecular studies strenuous. MGs offer a gateway to explore and characterize hidden microbial communities through a culture-independent mode by direct DNA isolation. By function and sequence-based MGs, Scientists can explore the mechanistic details of numerous microbes and their interaction with the niche. Since the data generated from MGs studies is highly complex and multi-dimensional, it requires accurate analytical tools to evaluate and interpret the data. Artificial intelligence (AI) provides the luxury to automatically learn the data dimensionality and ease its complexity that makes the disease diagnosis and disease response easy, accurate and timely. This review provides insight into the human microbiota and its exploration and expansion through MG studies. The review elucidates the significance of MGs in studying the changing microbiota during disease conditions besides highlighting the role of AI in computational analysis of MG data.
Affiliation(s)
- Atif Khurshid Wani
- Department of Biotechnology, School of Bioengineering and Biosciences, Lovely Professional University, Punjab 144411, India
- Priyanka Roy
- Department of Basic and Applied Sciences, National Institute of Food Technology Entrepreneurship and Management, Sonipat 131 028, Haryana, India
- Vijay Kumar
- Department of Basic and Applied Sciences, National Institute of Food Technology Entrepreneurship and Management, Sonipat 131 028, Haryana, India
- Tahir Ul Gani Mir
- Department of Biotechnology, School of Bioengineering and Biosciences, Lovely Professional University, Punjab 144411, India
19
Simón D, Borsani O, Filippi CV. RFPDR: a random forest approach for plant disease resistance protein prediction. PeerJ 2022; 10:e11683. [PMID: 35480565 PMCID: PMC9037127 DOI: 10.7717/peerj.11683] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 12/21/2020] [Accepted: 06/06/2021] [Indexed: 01/06/2023] Open
Abstract
Background: Plant innate immunity relies on a broad repertoire of receptor proteins that can detect pathogens and trigger an effective defense response. Bioinformatic tools based on conserved domain and sequence similarity are within the most popular strategies for protein identification and characterization. However, the multi-domain nature, high sequence diversity and complex evolutionary history of disease resistance (DR) proteins make their prediction a real challenge. Here we present RFPDR, which pioneers the application of Random Forest (RF) for Plant DR protein prediction. Methods: A recently published collection of experimentally validated DR proteins was used as a positive dataset, while 10x10 nested datasets, ranging from 400-4,000 non-DR proteins, were used as negative datasets. A total of 9,631 features were extracted from each protein sequence, and included in a full dimension (FD) RFPDR model. Sequence selection was performed, to generate a reduced-dimension (RD) RFPDR model. Model performances were evaluated using an 80/20 (training/testing) partition, with 10-cross fold validation, and compared to baseline, sequence-based and state-of-the-art strategies. To gain some insights into the underlying biology, the most discriminatory sequence-based features in the RF classifier were identified. Results and Discussion: RD-RFPDR showed to be sensitive (86.4 ± 4.0%) and specific (96.9 ± 1.5%) for identifying DR proteins, while robust to data imbalance. Its high performance and robustness, added to the fact that RD-RFPDR provides valuable information related to DR proteins underlying properties, make RD-RFPDR an interesting approach for DR protein prediction, complementing the state-of-the-art strategies.
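The sensitivity and specificity quoted in this abstract follow the usual confusion-matrix definitions. The sketch below only illustrates those definitions; the helper name and the counts are hypothetical and are not taken from the paper.

```python
def sensitivity_specificity(tp: int, fn: int, tn: int, fp: int) -> tuple[float, float]:
    """Return (sensitivity, specificity) from confusion-matrix counts."""
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("need at least one positive and one negative example")
    sensitivity = tp / (tp + fn)  # share of true DR proteins that were detected
    specificity = tn / (tn + fp)  # share of non-DR proteins correctly rejected
    return sensitivity, specificity


# Hypothetical counts, chosen only for illustration.
sens, spec = sensitivity_specificity(tp=19, fn=3, tn=155, fp=5)
print(f"sensitivity={sens:.3f} specificity={spec:.3f}")
```

Reporting both values matters when negatives heavily outnumber positives, as in the nested negative datasets described above, because accuracy alone can look high even when few positives are recovered.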
Affiliation(s)
- Diego Simón
- Laboratorio de Virología Molecular, Centro de Investigaciones Nucleares, Facultad de Ciencias, Universidad de la República, Montevideo, Uruguay
- Laboratorio de Evolución Experimental de Virus, Institut Pasteur de Montevideo, Montevideo, Uruguay
- Laboratorio de Genómica Evolutiva, Departamento de Biología Celular y Molecular, Facultad de Ciencias, Universidad de la República, Montevideo, Uruguay
- Omar Borsani
- Departamento de Biología Vegetal, Facultad de Agronomía, Universidad de la República, Montevideo, Uruguay
- Carla Valeria Filippi
- Departamento de Biología Vegetal, Facultad de Agronomía, Universidad de la República, Montevideo, Uruguay
20
Abstract
Assessing the threat posed by bacterial samples is fundamentally important to safeguarding human health. Whole-genome sequence analysis of bacteria provides a route to achieving this goal. However, this approach is fundamentally constrained by the scope, the diversity, and our understanding of the bacterial genome sequences that are available for devising threat assessment schemes. For example, genome-based strategies offer limited utility for assessing the threat associated with pathogens that exploit novel virulence mechanisms or are recently emergent. To address these limitations, we developed PathEngine, a machine learning strategy that features the use of phenotypic hallmarks of pathogenesis to assess pathogenic threat. PathEngine successfully classified potential pathogenic threats with high accuracy and thereby establishes a phenotype-based, sequence-independent pipeline for threat assessment. Bacterial pathogen identification, which is critical for human health, has historically relied on culturing organisms from clinical specimens. More recently, the application of machine learning (ML) to whole-genome sequences (WGSs) has facilitated pathogen identification. However, relying solely on genetic information to identify emerging or new pathogens is fundamentally constrained, especially if novel virulence factors exist. In addition, even WGSs with ML pipelines are unable to discern phenotypes associated with cryptic genetic loci linked to virulence. Here, we set out to determine if ML using phenotypic hallmarks of pathogenesis could assess potential pathogenic threat without using any sequence-based analysis. This approach successfully classified potential pathogenetic threat associated with previously machine-observed and unobserved bacteria with 99% and 85% accuracy, respectively. 
This work establishes a phenotype-based pipeline for potential pathogenic threat assessment, which we term PathEngine, and offers strategies for the identification of bacterial pathogens.
21
Voigt B, Fischer O, Krumnow C, Herta C, Dabrowski PW. NGS read classification using AI. PLoS One 2021; 16:e0261548. [PMID: 34936673 PMCID: PMC8694450 DOI: 10.1371/journal.pone.0261548] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/24/2021] [Accepted: 12/03/2021] [Indexed: 11/19/2022] Open
Abstract
Clinical metagenomics is a powerful diagnostic tool, as it offers an open view into all DNA in a patient's sample. This allows the detection of pathogens that would slip through the cracks of classical specific assays. However, due to this unspecific nature of metagenomic sequencing, a huge amount of unspecific data is generated during the sequencing itself and the diagnosis only takes place at the data analysis stage where relevant sequences are filtered out. Typically, this is done by comparison to reference databases. While this approach has been optimized over the past years and works well to detect pathogens that are represented in the used databases, a common challenge in analysing a metagenomic patient sample arises when no pathogen sequences are found: How to determine whether truly no evidence of a pathogen is present in the data or whether the pathogen's genome is simply absent from the database and the sequences in the dataset could thus not be classified? Here, we present a novel approach to this problem of detecting novel pathogens in metagenomic datasets by classifying the (segments of) proteins encoded by the sequences in the datasets. We train a neural network on the sequences of coding sequences, labeled by taxonomic domain, and use this neural network to predict the taxonomic classification of sequences that can not be classified by comparison to a reference database, thus facilitating the detection of potential novel pathogens.
Affiliation(s)
- Benjamin Voigt
- Center for Bio-Medical image and Information processing (CBMI), HTW University of Applied Sciences, Berlin, Germany
- Oliver Fischer
- Center for Bio-Medical image and Information processing (CBMI), HTW University of Applied Sciences, Berlin, Germany
- Christian Krumnow
- Center for Bio-Medical image and Information processing (CBMI), HTW University of Applied Sciences, Berlin, Germany
- Christian Herta
- Center for Bio-Medical image and Information processing (CBMI), HTW University of Applied Sciences, Berlin, Germany
- Piotr Wojciech Dabrowski
- Center for Bio-Medical image and Information processing (CBMI), HTW University of Applied Sciences, Berlin, Germany
22
Arning N, Sheppard SK, Bayliss S, Clifton DA, Wilson DJ. Machine learning to predict the source of campylobacteriosis using whole genome data. PLoS Genet 2021; 17:e1009436. [PMID: 34662334 PMCID: PMC8553134 DOI: 10.1371/journal.pgen.1009436] [Citation(s) in RCA: 14] [Impact Index Per Article: 3.5] [Received: 02/20/2021] [Revised: 10/28/2021] [Accepted: 08/26/2021] [Indexed: 11/18/2022] Open
Abstract
Campylobacteriosis is among the world's most common foodborne illnesses, caused predominantly by the bacterium Campylobacter jejuni. Effective interventions require determination of the infection source which is challenging as transmission occurs via multiple sources such as contaminated meat, poultry, and drinking water. Strain variation has allowed source tracking based upon allelic variation in multi-locus sequence typing (MLST) genes allowing isolates from infected individuals to be attributed to specific animal or environmental reservoirs. However, the accuracy of probabilistic attribution models has been limited by the ability to differentiate isolates based upon just 7 MLST genes. Here, we broaden the input data spectrum to include core genome MLST (cgMLST) and whole genome sequences (WGS), and implement multiple machine learning algorithms, allowing more accurate source attribution. We increase attribution accuracy from 64% using the standard iSource population genetic approach to 71% for MLST, 85% for cgMLST and 78% for kmerized WGS data using the classifier we named aiSource. To gain insight beyond the source model prediction, we use Bayesian inference to analyse the relative affinity of C. jejuni strains to infect humans and identified potential differences, in source-human transmission ability among clonally related isolates in the most common disease causing lineage (ST-21 clonal complex). Providing generalizable computationally efficient methods, based upon machine learning and population genetics, we provide a scalable approach to global disease surveillance that can continuously incorporate novel samples for source attribution and identify fine-scale variation in transmission potential.
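The abstract mentions "kmerized WGS data" without further detail. One common way to turn a sequence into k-mer features is to count its overlapping substrings of length k. The sketch below assumes that interpretation; the function name, the value of k, and the toy sequence are illustrative and are not taken from the paper.

```python
from collections import Counter


def kmer_counts(sequence: str, k: int) -> Counter:
    """Count overlapping substrings of length k in a DNA sequence."""
    if k <= 0:
        raise ValueError("k must be a positive integer")
    seq = sequence.upper()
    # A sequence shorter than k yields an empty range, hence an empty Counter.
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


# Toy sequence, for illustration only.
counts = kmer_counts("ACGTACGTAC", k=4)
for kmer, n in sorted(counts.items()):
    print(kmer, n)
```

Counts of this kind can be arranged as one row per genome and one column per k-mer to form the input matrix of a classifier.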
Affiliation(s)
- Nicolas Arning
- Big Data institute, Nuffield Department of Population Health, University of Oxford, Li Ka Shing Centre for Health Information and Discovery, Old Road Campus, Oxford, United Kingdom
- Samuel K. Sheppard
- The Milner Centre of Evolution, Department of Biology & Biochemistry, University of Bath, Claverton Down, Bath, United Kingdom
- Sion Bayliss
- The Milner Centre of Evolution, Department of Biology & Biochemistry, University of Bath, Claverton Down, Bath, United Kingdom
- David A. Clifton
- Department of Engineering Science, University of Oxford, Oxford, UK; Oxford-Suzhou Centre for Advanced Research, Suzhou, China
- Daniel J. Wilson
- Big Data institute, Nuffield Department of Population Health, University of Oxford, Li Ka Shing Centre for Health Information and Discovery, Old Road Campus, Oxford, United Kingdom
23
The fate of plant growth-promoting rhizobacteria in soilless agriculture: future perspectives. 3 Biotech 2021; 11:382. [PMID: 34350087 DOI: 10.1007/s13205-021-02941-2] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Received: 06/08/2021] [Accepted: 07/22/2021] [Indexed: 02/07/2023] Open
Abstract
The application of plant growth-promoting rhizobacteria (PGPRs) can be an excellent and eco-friendly alternative to the use of chemical fertilizers. While PGPRs are often used in traditional agriculture to facilitate yield increases, their use in soilless agriculture has been limited. Soilless agriculture is growing in popularity among commercial farmers because it eliminates soil-borne problems, and the essential strategy is to keep the system as clean as possible. However, a new trend is the inclusion of PGPRs to enhance plant development. Despite the plethora of research that has been performed to date, there remains a huge knowledge gap that needs to be addressed to facilitate the commercialization of PGPRs for sustainable soilless agriculture. Hence, the development of proper strategies and additional research and trials are required. The present review provides an update on recent developments in the use of PGPRs in soilless agriculture, examining these bacteria from different perspectives in an attempt to generate critical discussion and aid in the understanding of the interaction between soilless agriculture and PGPRs.
24
Bartoszewicz JM, Genske U, Renard BY. Deep learning-based real-time detection of novel pathogens during sequencing. Brief Bioinform 2021; 22:6326527. [PMID: 34297793 DOI: 10.1093/bib/bbab269] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Received: 04/19/2021] [Revised: 06/09/2021] [Accepted: 06/23/2021] [Indexed: 11/12/2022] Open
Abstract
Novel pathogens evolve quickly and may emerge rapidly, causing dangerous outbreaks or even global pandemics. Next-generation sequencing is the state of the art in open-view pathogen detection, and one of the few methods available at the earliest stages of an epidemic, even when the biological threat is unknown. Analyzing the samples as the sequencer is running can greatly reduce the turnaround time, but existing tools rely on close matches to lists of known pathogens and perform poorly on novel species. Machine learning approaches can predict if single reads originate from more distant, unknown pathogens but require relatively long input sequences and processed data from a finished sequencing run. Incomplete sequences contain less information, leading to a trade-off between sequencing time and detection accuracy. Using a workflow for real-time pathogenic potential prediction, we investigate which subsequences already allow accurate inference. We train deep neural networks to classify Illumina and Nanopore reads and integrate the models with HiLive2, a real-time Illumina mapper. This approach outperforms alternatives based on machine learning and sequence alignment on simulated and real data, including SARS-CoV-2 sequencing runs. After just 50 Illumina cycles, we observe an 80-fold sensitivity increase compared to real-time mapping. The first 250 bp of Nanopore reads, corresponding to 0.5 s of sequencing time, are enough to yield predictions more accurate than mapping the finished long reads. The approach could also be used for screening synthetic sequences against biosecurity threats.
Affiliation(s)
- Jakub M Bartoszewicz
- Digital Engineering Faculty, Hasso Plattner Institute, University of Potsdam, Prof.-Dr.-Helmert-Straße 2-3, 14482 Brandenburg, Germany
| | - Ulrich Genske
- Digital Engineering Faculty, Hasso Plattner Institute, University of Potsdam, Prof.-Dr.-Helmert-Straße 2-3, 14482 Brandenburg, Germany
| | - Bernhard Y Renard
- Digital Engineering Faculty, Hasso Plattner Institute, University of Potsdam, Prof.-Dr.-Helmert-Straße 2-3, 14482 Brandenburg, Germany
| |
|
25
|
Donaghy JA, Danyluk MD, Ross T, Krishna B, Farber J. Big Data Impacting Dynamic Food Safety Risk Management in the Food Chain. Front Microbiol 2021; 12:668196. [PMID: 34093486 PMCID: PMC8177817 DOI: 10.3389/fmicb.2021.668196] [Citation(s) in RCA: 13] [Impact Index Per Article: 3.3] [Received: 02/15/2021] [Accepted: 04/01/2021] [Indexed: 01/11/2023]
Abstract
Foodborne pathogens are a major contributor to foodborne illness worldwide. The adaptation of a more quantitative risk-based approach, with metrics such as Food Safety Objectives (FSO) and Performance Objectives (PO), necessitates quantitative inputs from all stages of the food value chain. The potential exists for utilization of big data, generated through digital transformational technologies, as inputs to a dynamic risk management concept for food safety microbiology. The industrial revolution in Internet of Things (IoT) will leverage data inputs from precision agriculture, connected factories/logistics, precision healthcare, and precision food safety to improve the dynamism of microbial risk management. Furthermore, interconnectivity of public health databases, social media, and e-commerce tools, as well as technologies such as blockchain, will enhance traceability for retrospective and real-time management of foodborne cases. Despite the enormous potential of data volume and velocity, some challenges remain, including data ownership, interoperability, and accessibility. This paper provides insight into the prospective use of big data for dynamic risk management from a microbiological safety perspective in the context of the International Commission on Microbiological Specifications for Foods (ICMSF) conceptual equation, and describes examples of how a dynamic risk management system (DRMS) could be used in real time to identify hazards and control Shiga toxin-producing Escherichia coli risks related to leafy greens.
Affiliation(s)
- John A Donaghy
- Corporate Operations - Quality Management (Food Safety) Société des Produits Nestlé S.A., Vevey, Switzerland
| | - Michelle D Danyluk
- IFAS Food Science and Human Nutrition, University of Florida, Gainesville, FL, United States
| | - Tom Ross
- Centre for Food Safety and Innovation, University of Tasmania, Hobart, TSA, Australia
| | - Bobby Krishna
- Department of Food Safety, Dubai Municipality, Dubai, United Arab Emirates
| | - Jeff Farber
- Department of Food Science, University of Guelph, Guelph, ON, Canada
| |
|
26
|
Parker MT, Kunjapur AM. Deployment of Engineered Microbes: Contributions to the Bioeconomy and Considerations for Biosecurity. Health Secur 2021; 18:278-296. [PMID: 32816583 DOI: 10.1089/hs.2020.0010] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.0] [Indexed: 12/19/2022]
Abstract
Engineering at microscopic scales has an immense effect on the modern bioeconomy. Microbes contribute to such disparate markets as chemical manufacturing, fuel production, crop optimization, and pharmaceutical synthesis, to name a few. Due to new and emerging synthetic biology technologies, and the sophistication and control afforded by them, we are on the brink of deploying engineered microbes to not only enhance traditional applications but also to introduce these microbes to sectors, contexts, and formats not previously attempted. In microbially managed medicine, microbial engineering holds promise for increasing efficacy, improving tissue penetration, and sustaining treatment. In the environment, the most effective areas for deployment are in the management of crops and protection of ecosystems. However, caution is warranted before introducing engineered organisms to new environments where they may proliferate without control and could cause unforeseen effects. We summarize ideas and data that can inform identification and assessment of the risks that these tools present to ensure that realistic hazards are described and unrealistic ones do not hinder advancement. Further, because modes of containment are crucial complements to deployment, we describe the state of the art in microbial biocontainment strategies, current gaps, and how these gaps might be addressed through technological advances in synthetic engineering. Collectively, this work highlights engineered microbes as a foundational and expanding facet of the bioeconomy, projects their utility in upcoming deployments outside the laboratory, and identifies knowns and unknowns that will be necessary considerations and points of focus in this endeavor.
Affiliation(s)
- Michael T Parker
- Michael T. Parker, PhD, is an Assistant Dean, Office of the Dean, Georgetown University, Washington, DC. Aditya M. Kunjapur, PhD, is an Assistant Professor, Chemical and Biomolecular Engineering, University of Delaware, Newark, DE
| | - Aditya M Kunjapur
- Michael T. Parker, PhD, is an Assistant Dean, Office of the Dean, Georgetown University, Washington, DC. Aditya M. Kunjapur, PhD, is an Assistant Professor, Chemical and Biomolecular Engineering, University of Delaware, Newark, DE
| |
|
27
|
Bartoszewicz JM, Seidel A, Renard BY. Interpretable detection of novel human viruses from genome sequencing data. NAR Genom Bioinform 2021; 3:lqab004. [PMID: 33554119 PMCID: PMC7849996 DOI: 10.1093/nargab/lqab004] [Citation(s) in RCA: 17] [Impact Index Per Article: 4.3] [Received: 10/12/2020] [Revised: 01/04/2021] [Accepted: 01/15/2021] [Indexed: 01/21/2023]
Abstract
Viruses evolve extremely quickly, so reliable methods for viral host prediction are necessary to safeguard biosecurity and biosafety alike. Novel human-infecting viruses are difficult to detect with standard bioinformatics workflows. Here, we predict whether a virus can infect humans directly from next-generation sequencing reads. We show that deep neural architectures significantly outperform both shallow machine learning and standard, homology-based algorithms, cutting the error rates in half and generalizing to taxonomic units distant from those presented during training. Further, we develop a suite of interpretability tools and show that it can be applied also to other models beyond the host prediction task. We propose a new approach for convolutional filter visualization to disentangle the information content of each nucleotide from its contribution to the final classification decision. Nucleotide-resolution maps of the learned associations between pathogen genomes and the infectious phenotype can be used to detect regions of interest in novel agents, for example, the SARS-CoV-2 coronavirus, unknown before it caused a COVID-19 pandemic in 2020. All methods presented here are implemented as easy-to-install packages not only enabling analysis of NGS datasets without requiring any deep learning skills, but also allowing advanced users to easily train and explain new models for genomics.
Affiliation(s)
- Jakub M Bartoszewicz
- Bioinformatics (MF1), Department of Methodology and Research Infrastructure, Robert Koch Institute, 13353 Berlin, Germany
- Department of Mathematics and Computer Science, Free University of Berlin, 14195 Berlin, Germany
- Data Analytics and Computational Statistics, Hasso Plattner Institute for Digital Engineering, 14482 Potsdam, Brandenburg, Germany
- Digital Engineering Faculty, University of Potsdam, 14482 Potsdam, Brandenburg, Germany
| | - Anja Seidel
- Bioinformatics (MF1), Department of Methodology and Research Infrastructure, Robert Koch Institute, 13353 Berlin, Germany
- Department of Mathematics and Computer Science, Free University of Berlin, 14195 Berlin, Germany
| | - Bernhard Y Renard
- Bioinformatics (MF1), Department of Methodology and Research Infrastructure, Robert Koch Institute, 13353 Berlin, Germany
- Data Analytics and Computational Statistics, Hasso Plattner Institute for Digital Engineering, 14482 Potsdam, Brandenburg, Germany
- Digital Engineering Faculty, University of Potsdam, 14482 Potsdam, Brandenburg, Germany
| |
|
28
|
O'Brien JT, Nelson C. Assessing the Risks Posed by the Convergence of Artificial Intelligence and Biotechnology. Health Secur 2020; 18:219-227. [PMID: 32559154 PMCID: PMC7310294 DOI: 10.1089/hs.2019.0122] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.4] [Received: 10/14/2019] [Revised: 03/04/2020] [Accepted: 04/29/2020] [Indexed: 12/22/2022]
Abstract
Rapid developments are currently taking place in the fields of artificial intelligence (AI) and biotechnology, and applications arising from the convergence of these 2 fields are likely to offer immense opportunities that could greatly benefit human health and biosecurity. The combination of AI and biotechnology could potentially lead to breakthroughs in precision medicine, improved biosurveillance, and discovery of novel medical countermeasures as well as facilitate a more effective public health emergency response. However, as is the case with many preceding transformative technologies, new opportunities often present new risks in parallel. Understanding the current and emerging risks at the intersection of AI and biotechnology is crucial for health security specialists and unlikely to be achieved by examining either field in isolation. Uncertainties multiply as technologies merge, showcasing the need to identify robust assessment frameworks that could adequately analyze the risk landscape emerging at the convergence of these 2 domains. This paper explores the criteria needed to assess risks associated with AI and biotechnology and evaluates 3 previously published risk assessment frameworks. After highlighting their strengths and limitations and applying them to relevant AI and biotechnology examples, the authors suggest a hybrid framework with recommendations for future approaches to risk assessment for convergent technologies.
Affiliation(s)
- John T. O'Brien
- John T. O'Brien, MS, is a Research Associate, Bipartisan Commission on Biodefense, Washington, DC
| | - Cassidy Nelson
- Cassidy Nelson, MBBS, MPH, is a Research Scholar, Future of Humanity Institute, University of Oxford, Oxford, UK
| |
|
29
|
Genomic Investigation into the Virulome, Pathogenicity, Stress Response Factors, Clonal Lineages, and Phylogenetic Relationship of Escherichia coli Strains Isolated from Meat Sources in Ghana. Genes (Basel) 2020; 11:genes11121504. [PMID: 33327465 PMCID: PMC7764966 DOI: 10.3390/genes11121504] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.2] [Received: 11/05/2020] [Revised: 11/29/2020] [Accepted: 12/03/2020] [Indexed: 12/26/2022]
Abstract
Escherichia coli are among the most common foodborne pathogens associated with infections reported from meat sources. This study investigated the virulome, pathogenicity, stress response factors, clonal lineages, and the phylogenomic relationship of E. coli isolated from different meat sources in Ghana using whole-genome sequencing. Isolates were screened from five meat sources (beef, chevon, guinea fowl, local chicken, and mutton) and five areas (Aboabo, Central market, Nyorni, Victory cinema, and Tishegu) based in the Tamale Metropolis, Ghana. Following microbial identification, the E. coli strains were subjected to whole-genome sequencing. Comparative visualisation analyses showed different DNA synteny of the strains. The isolates consisted of diverse sequence types (STs), with the most common being ST155 (n = 3/14). Based Upon Related Sequence Types (eBURST) analyses of the study sequence types identified four similar clones, five single-locus variants, and two satellite clones (more distantly related) with global curated E. coli STs. All the isolates possessed at least one restriction-modification (R-M) and CRISPR defence system. Further analysis revealed conserved stress response mechanisms (detoxification, osmotic, oxidative, and periplasmic stress) in the strains. Estimation of pathogenicity predicted a higher average probability score (Pscore ≈ 0.937), supporting their pathogenic potential to humans. Diverse virulence genes that were clonal-specific were identified. Phylogenomic tree analyses coupled with metadata insights depicted the high genetic diversity of the E. coli isolates with no correlation with their meat sources and areas. The findings of these bioinformatic analyses further our understanding of E. coli in meat sources and are broadly relevant to the design of contamination control strategies in meat retail settings in Ghana.
|
30
|
Chen J, Karanth S, Pradhan AK. Quantitative microbial risk assessment for Salmonella: Inclusion of whole genome sequencing and genomic epidemiological studies, and advances in the bioinformatics pipeline. Journal of Agriculture and Food Research 2020; 2:100045. [DOI: 10.1016/j.jafr.2020.100045] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.8] [Indexed: 01/05/2025]
|
31
|
Jaiswal S, Kumar M, Mandeep, Sunita, Singh Y, Shukla P. Systems Biology Approaches for Therapeutics Development Against COVID-19. Front Cell Infect Microbiol 2020; 10:560240. [PMID: 33194800 PMCID: PMC7655984 DOI: 10.3389/fcimb.2020.560240] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.4] [Received: 05/11/2020] [Accepted: 09/29/2020] [Indexed: 12/13/2022]
Abstract
Systems biology approaches for promoting the development of new therapeutic drugs are gaining importance. The threat of the COVID-19 outbreak needs to be eliminated for global welfare, and every area of research is focusing on it. There is an opportunity to find new, quick, and accurate tools for developing treatment options, including a vaccine against COVID-19. This review covers various aspects of pathogenesis and host factors for exploring viral targets and developing suitable therapeutic solutions through systems biology tools. It also covers multiomics tools in detail, i.e., transcriptomics, proteomics, genomics, lipidomics, immunomics, and in silico computational modeling, aimed at studying host-virus interactions in search of therapeutic targets against COVID-19.
Affiliation(s)
- Shweta Jaiswal
- Enzyme Technology and Protein Bioinformatics Laboratory, Department of Microbiology, Maharshi Dayanand University, Rohtak, India
| | - Mohit Kumar
- Soil Microbial Ecology and Environmental Toxicology Laboratory, Department of Zoology, University of Delhi, Delhi, India
- Department of Zoology, Hindu College, University of Delhi, Delhi, India
| | - Mandeep
- Enzyme Technology and Protein Bioinformatics Laboratory, Department of Microbiology, Maharshi Dayanand University, Rohtak, India
| | - Sunita
- Enzyme Technology and Protein Bioinformatics Laboratory, Department of Microbiology, Maharshi Dayanand University, Rohtak, India
- Bacterial Pathogenesis Laboratory, Department of Zoology, University of Delhi, Delhi, India
| | - Yogendra Singh
- Bacterial Pathogenesis Laboratory, Department of Zoology, University of Delhi, Delhi, India
| | - Pratyoosh Shukla
- Enzyme Technology and Protein Bioinformatics Laboratory, Department of Microbiology, Maharshi Dayanand University, Rohtak, India
| |
|
32
|
Bartoszewicz JM, Seidel A, Rentzsch R, Renard BY. DeePaC: predicting pathogenic potential of novel DNA with reverse-complement neural networks. Bioinformatics 2020; 36:81-89. [PMID: 31298694 DOI: 10.1093/bioinformatics/btz541] [Citation(s) in RCA: 16] [Impact Index Per Article: 3.2] [Received: 04/09/2019] [Revised: 06/22/2019] [Accepted: 07/10/2019] [Indexed: 12/31/2022]
Abstract
Motivation: We expect novel pathogens to arise due to their fast-paced evolution, and new species to be discovered thanks to advances in DNA sequencing and metagenomics. Moreover, recent developments in synthetic biology raise concerns that some strains of bacteria could be modified for malicious purposes. Traditional approaches to open-view pathogen detection depend on databases of known organisms, which limits their performance on unknown, unrecognized and unmapped sequences. In contrast, machine learning methods can infer pathogenic phenotypes from single NGS reads, even though the biological context is unavailable. Results: We present DeePaC, a Deep Learning Approach to Pathogenicity Classification. It includes a flexible framework allowing easy evaluation of neural architectures with reverse-complement parameter sharing. We show that convolutional neural networks and LSTMs outperform the state-of-the-art based on both sequence homology and machine learning. Combining a deep learning approach with integrating the predictions for both mates in a read pair results in cutting the error rate almost in half in comparison to the previous state-of-the-art. Availability and implementation: The code and the models are available at: https://gitlab.com/rki_bioinformatics/DeePaC. Supplementary information: Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Jakub M Bartoszewicz
- Bioinformatics Unit (MF1), Department of Methodology and Research Infrastructure, Robert Koch Institute, 13353 Berlin, Germany
- Department of Mathematics and Computer Science, Free University of Berlin, 14195 Berlin, Germany
| | - Anja Seidel
- Bioinformatics Unit (MF1), Department of Methodology and Research Infrastructure, Robert Koch Institute, 13353 Berlin, Germany
- Department of Mathematics and Computer Science, Free University of Berlin, 14195 Berlin, Germany
| | - Robert Rentzsch
- Bioinformatics Unit (MF1), Department of Methodology and Research Infrastructure, Robert Koch Institute, 13353 Berlin, Germany
| | - Bernhard Y Renard
- Bioinformatics Unit (MF1), Department of Methodology and Research Infrastructure, Robert Koch Institute, 13353 Berlin, Germany
| |
|
33
|
Agany DD, Pietri JE, Gnimpieba EZ. Assessment of vector-host-pathogen relationships using data mining and machine learning. Comput Struct Biotechnol J 2020; 18:1704-1721. [PMID: 32670510 PMCID: PMC7340972 DOI: 10.1016/j.csbj.2020.06.031] [Citation(s) in RCA: 12] [Impact Index Per Article: 2.4] [Received: 04/02/2020] [Revised: 06/19/2020] [Accepted: 06/19/2020] [Indexed: 12/15/2022]
Abstract
Infectious diseases, including vector-borne diseases transmitted by arthropods, are a leading cause of morbidity and mortality worldwide. In the era of big data, addressing broad-scale, fundamental questions regarding the complex dynamics of these diseases will increasingly require the integration of diverse datasets to produce new biological knowledge. This review provides a current snapshot of the systematic assessment of the relationships between microbial pathogens, arthropod vectors and mammalian hosts using data mining and machine learning. We employ PRISMA to identify 32 key papers relevant to this topic. Our analysis shows an increasing use of data mining and machine learning tasks and techniques, including prediction, classification, clustering, association rules mining, and deep learning, over the last decade. However, it also reveals a number of critical challenges in applying these to the study of vector-host-pathogen interactions at various systems biology levels. Here, relevant studies, current limitations and future directions are discussed. Furthermore, the quality of data in relevant papers was assessed using the FAIR (Findable, Accessible, Interoperable, Reusable) compliance criteria to evaluate and encourage reproducibility and shareability of research outcomes. Although shortcomings in their application remain, data mining and machine learning have significant potential to break new ground in understanding fundamental aspects of vector-host-pathogen relationships and their application in this field should be encouraged. In particular, while predictive modeling, feature engineering and supervised machine learning are already being used in the field, other data mining and machine learning methods such as deep learning and association rules analysis lag behind and should be implemented in combination with established methods to accelerate hypothesis and knowledge generation in the domain.
Affiliation(s)
- Diing D.M. Agany
- University of South Dakota, Biomedical Engineering Program, Sioux Falls, SD, United States
- 2DBEST (2-Dimensional Materials for Biofilm Engineering, Science and Technology), United States
| | - Jose E. Pietri
- University of South Dakota, Sanford School of Medicine, Division of Basic Biomedical Sciences, Vermillion, SD, United States
| | - Etienne Z. Gnimpieba
- University of South Dakota, Biomedical Engineering Program, Sioux Falls, SD, United States
- 2DBEST (2-Dimensional Materials for Biofilm Engineering, Science and Technology), United States
| |
|
34
|
Barash E, Sal-Man N, Sabato S, Ziv-Ukelson M. BacPaCS-Bacterial Pathogenicity Classification via Sparse-SVM. Bioinformatics 2020; 35:2001-2008. [PMID: 30407484 DOI: 10.1093/bioinformatics/bty928] [Citation(s) in RCA: 10] [Impact Index Per Article: 2.0] [Received: 01/30/2018] [Revised: 08/30/2018] [Accepted: 11/07/2018] [Indexed: 01/01/2023]
Abstract
Motivation: Bacterial infections are a major cause of illness worldwide. However, most bacterial strains pose no threat to human health and may even be beneficial. Thus, developing powerful diagnostic bioinformatic tools that differentiate pathogenic from commensal bacteria is critical for effective treatment of bacterial infections. Results: We propose a machine-learning approach for classifying human-hosted bacteria as pathogenic or non-pathogenic based on their genome-derived proteomes. Our approach is based on sparse Support Vector Machines (SVM), which autonomously select a small set of genes that are related to bacterial pathogenicity. We implement our approach as a tool, 'Bacterial Pathogenicity Classification via sparse-SVM' (BacPaCS), which is fully automated and handles datasets significantly larger than those previously used. BacPaCS shows high accuracy in distinguishing pathogenic from non-pathogenic bacteria in a clinically relevant dataset comprising only human-hosted bacteria. Among the genes that received the highest positive weight in the resulting classifier, we found genes that are known to be related to bacterial pathogenicity, in addition to novel candidates whose involvement in bacterial virulence has never been reported. Availability and implementation: The code and the resulting model are available at: https://github.com/barashe/bacpacs. Supplementary information: Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Eran Barash
- Department of Computer Science, Faculty of Natural Sciences
| | - Neta Sal-Man
- The Shraga Segal Department of Microbiology Immunology and Genetics, Faculty of Health Sciences, Ben-Gurion University of the Negev, BeerSheva, Israel
| | - Sivan Sabato
- Department of Computer Science, Faculty of Natural Sciences
| | | |
|
35
|
Uelze L, Grützke J, Borowiak M, Hammerl JA, Juraschek K, Deneke C, Tausch SH, Malorny B. Typing methods based on whole genome sequencing data. One Health Outlook 2020; 2:3. [PMID: 33829127 PMCID: PMC7993478 DOI: 10.1186/s42522-020-0010-1] [Citation(s) in RCA: 98] [Impact Index Per Article: 19.6] [Received: 10/04/2019] [Accepted: 01/08/2020] [Indexed: 05/12/2023]
Abstract
Whole genome sequencing (WGS) of foodborne pathogens has become an effective method for investigating the information contained in the genome sequence of bacterial pathogens. In addition, its highly discriminative power enables the comparison of genetic relatedness between bacteria even on a sub-species level. For this reason, WGS is being implemented worldwide and across sectors (human, veterinary, food, and environment) for the investigation of disease outbreaks, source attribution, and improved risk characterization models. In order to extract relevant information from the large quantity and complex data produced by WGS, a host of bioinformatics tools has been developed, allowing users to analyze and interpret sequencing data, starting from simple gene-searches to complex phylogenetic studies. Depending on the research question, the complexity of the dataset and their bioinformatics skill set, users can choose between a great variety of tools for the analysis of WGS data. In this review, we describe the relevant approaches for phylogenomic studies for outbreak studies and give an overview of selected tools for the characterization of foodborne pathogens based on WGS data. Despite the efforts of the last years, harmonization and standardization of typing tools are still urgently needed to allow for an easy comparison of data between laboratories, moving towards a one health worldwide surveillance system for foodborne pathogens.
Affiliation(s)
- Laura Uelze
- Department for Biological Safety, German Federal Institute for Risk Assessment, BfR, Max-Dohrn Straße 8-10, 10589 Berlin, Germany
| | - Josephine Grützke
- Department for Biological Safety, German Federal Institute for Risk Assessment, BfR, Max-Dohrn Straße 8-10, 10589 Berlin, Germany
| | - Maria Borowiak
- Department for Biological Safety, German Federal Institute for Risk Assessment, BfR, Max-Dohrn Straße 8-10, 10589 Berlin, Germany
| | - Jens Andre Hammerl
- Department for Biological Safety, German Federal Institute for Risk Assessment, BfR, Max-Dohrn Straße 8-10, 10589 Berlin, Germany
| | - Katharina Juraschek
- Department for Biological Safety, German Federal Institute for Risk Assessment, BfR, Max-Dohrn Straße 8-10, 10589 Berlin, Germany
| | - Carlus Deneke
- Department for Biological Safety, German Federal Institute for Risk Assessment, BfR, Max-Dohrn Straße 8-10, 10589 Berlin, Germany
| | - Simon H. Tausch
- Department for Biological Safety, German Federal Institute for Risk Assessment, BfR, Max-Dohrn Straße 8-10, 10589 Berlin, Germany
| | - Burkhard Malorny
- Department for Biological Safety, German Federal Institute for Risk Assessment, BfR, Max-Dohrn Straße 8-10, 10589 Berlin, Germany
| |
|
36
|
Leimeister CA, Dencker T, Morgenstern B. Accurate multiple alignment of distantly related genome sequences using filtered spaced word matches as anchor points. Bioinformatics 2019; 35:211-218. [PMID: 29992260 PMCID: PMC6330006 DOI: 10.1093/bioinformatics/bty592] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.3] [Received: 05/23/2017] [Accepted: 07/09/2018] [Indexed: 01/30/2023]
Abstract
Motivation: Most methods for pairwise and multiple genome alignment use fast local homology search tools to identify anchor points, i.e. high-scoring local alignments of the input sequences. Sequence segments between those anchor points are then aligned with slower, more sensitive methods. Finding suitable anchor points is therefore crucial for genome sequence comparison; speed and sensitivity of genome alignment depend on the underlying anchoring methods. Results: In this article, we use filtered spaced word matches to generate anchor points for genome alignment. For a given binary pattern representing match and don't-care positions, we first search for spaced-word matches, i.e. ungapped local pairwise alignments with matching nucleotides at the match positions of the pattern and possible mismatches at the don't-care positions. Those spaced-word matches that have similarity scores above some threshold value are then extended using a standard X-drop algorithm; the resulting local alignments are used as anchor points. To evaluate this approach, we used the popular multiple-genome-alignment pipeline Mugsy and replaced the exact word matches that Mugsy uses as anchor points with our spaced-word-based anchor points. For closely related genome sequences, the two anchoring procedures lead to multiple alignments of similar quality. For distantly related genomes, however, alignments calculated with our filtered-spaced-word matches are superior to alignments produced with the original Mugsy program where exact word matches are used to find anchor points. Availability and implementation: http://spacedanchor.gobics.de. Supplementary information: Supplementary data are available at Bioinformatics online.
Affiliation(s)
| | - Thomas Dencker
- Department of Bioinformatics, Institute of Microbiology and Genetics
| | - Burkhard Morgenstern
- Department of Bioinformatics, Institute of Microbiology and Genetics; Center for Computational Sciences, University of Goettingen, Goettingen, Germany
| |
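The spaced-word matching step described in the abstract above can be illustrated with a short sketch. This is not the authors' implementation: the function names, the `1`/`0` pattern encoding, and the +1/-1 scoring are assumptions made for illustration (the abstract does not specify the scoring scheme), and the X-drop extension step is omitted.

```python
from collections import defaultdict


def spaced_words(seq, pattern):
    """Yield (position, word) pairs; word keeps only the match ('1') positions."""
    match_idx = [i for i, c in enumerate(pattern) if c == "1"]
    for pos in range(len(seq) - len(pattern) + 1):
        yield pos, "".join(seq[pos + i] for i in match_idx)


def spaced_word_matches(seq_a, seq_b, pattern, min_score=0):
    """Return (pos_a, pos_b, score) for spaced-word matches with score >= min_score.

    Assumed scoring (illustrative only): +1 for each identical nucleotide and
    -1 for each mismatch, counted over every position of the pattern,
    including the don't-care ('0') positions.
    """
    # Index all spaced words of the first sequence by their match-position letters.
    index = defaultdict(list)
    for pos_a, word in spaced_words(seq_a, pattern):
        index[word].append(pos_a)

    hits = []
    for pos_b, word in spaced_words(seq_b, pattern):
        for pos_a in index.get(word, ()):
            score = sum(
                1 if seq_a[pos_a + i] == seq_b[pos_b + i] else -1
                for i in range(len(pattern))
            )
            # Filtering step: keep only matches above the threshold.
            if score >= min_score:
                hits.append((pos_a, pos_b, score))
    return hits


if __name__ == "__main__":
    # Pattern "1101": positions 0, 1 and 3 must match; position 2 is don't-care.
    print(spaced_word_matches("ACGTACGT", "ACCTACGA", "1101", min_score=3))
```

Raising `min_score` discards matches whose don't-care positions mostly disagree, which is the filtering idea the abstract refers to; in the described method the surviving matches would then be extended into local alignments and used as anchor points.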
|
37
|
Amoako DG, Somboro AM, Abia ALK, Allam M, Ismail A, Bester LA, Essack SY. Genome Mining and Comparative Pathogenomic Analysis of An Endemic Methicillin-Resistant Staphylococcus aureus (MRSA) Clone, ST612-CC8-t1257-SCCmec_IVd(2B), Isolated in South Africa. Pathogens 2019; 8:E166. [PMID: 31569754 PMCID: PMC6963616 DOI: 10.3390/pathogens8040166] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.3] [Received: 07/29/2019] [Revised: 09/16/2019] [Accepted: 09/17/2019] [Indexed: 12/19/2022]
Abstract
This study undertook genome mining and comparative genomics to gain genetic insights into the dominance of the methicillin-resistant Staphylococcus aureus (MRSA) endemic clone ST612-CC8-t1257-SCCmec_IVd(2B), obtained from the poultry food chain in South Africa. Functional annotation of the genome revealed a vast array of similar central metabolic, cellular and biochemical networks within the endemic clone that are crucial for its survival in the microbial community. In-silico analysis of the clone revealed the possession of uniform defense systems, restriction-modification system (type I and IV), accessory gene regulator (type I), arginine catabolic mobile element (type II), and type 1 clustered, regularly interspaced, short palindromic repeat (CRISPR)-Cas array (N = 7 ± 1), which offer protection against exogenous attacks. The estimated pathogenic potential predicted a higher probability (average Pscore ≈ 0.927) of the clone being pathogenic to its host. The clone carried a battery of putative virulence determinants whose expression is critical for establishing infection. However, there was a slight difference in their possession of adherence factors (biofilm operon system) and toxins (hemolysins and enterotoxins). Further analysis revealed conserved environmental tolerance and persistence mechanisms related to stress (oxidative and osmotic), heat shock, sporulation, bacteriocins, and detoxification, which enable it to withstand lethal threats and contribute to its success in diverse ecological niches. Phylogenomic analysis with close sister lineages revealed that the clone was closely related to the MRSA isolate SHV713 from Australia. The results of this bioinformatic analysis provide valuable insights into the biology of this endemic clone.
Affiliation(s)
- Daniel Gyamfi Amoako
  - Infection Genomics and Applied Bioinformatics Division, Antimicrobial Research Unit, College of Health Sciences, University of KwaZulu-Natal, Durban 4000, South Africa.
  - Biomedical Resource Unit, School of Laboratory Medicine and Medical Sciences, College of Health Sciences, University of KwaZulu-Natal, Durban 4000, South Africa.
- Anou M Somboro
  - Biomedical Resource Unit, School of Laboratory Medicine and Medical Sciences, College of Health Sciences, University of KwaZulu-Natal, Durban 4000, South Africa.
  - Antimicrobial Research Unit, College of Health Sciences, University of KwaZulu-Natal, Durban 4000, South Africa.
- Akebe Luther King Abia
  - Antimicrobial Research Unit, College of Health Sciences, University of KwaZulu-Natal, Durban 4000, South Africa.
- Mushal Allam
  - Sequencing Core Facility, National Institute for Communicable Diseases, National Health Laboratory Service, Johannesburg 2131, South Africa.
- Arshad Ismail
  - Sequencing Core Facility, National Institute for Communicable Diseases, National Health Laboratory Service, Johannesburg 2131, South Africa.
- Linda A Bester
  - Biomedical Resource Unit, School of Laboratory Medicine and Medical Sciences, College of Health Sciences, University of KwaZulu-Natal, Durban 4000, South Africa.
- Sabiha Y Essack
  - Antimicrobial Research Unit, College of Health Sciences, University of KwaZulu-Natal, Durban 4000, South Africa.
|
38
|
Vilne B, Meistere I, Grantiņa-Ieviņa L, Ķibilds J. Machine Learning Approaches for Epidemiological Investigations of Food-Borne Disease Outbreaks. Front Microbiol 2019; 10:1722. [PMID: 31447800 PMCID: PMC6691741 DOI: 10.3389/fmicb.2019.01722] [Citation(s) in RCA: 17] [Impact Index Per Article: 2.8] [Received: 03/07/2019] [Accepted: 07/12/2019] [Indexed: 12/14/2022]
Abstract
Foodborne diseases (FBDs) are infections of the gastrointestinal tract caused by foodborne pathogens (FBPs) such as bacteria [Salmonella, Listeria monocytogenes and Shiga toxin-producing E. coli (STEC)] and several viruses, but also parasites and some fungi. Artificial intelligence (AI) and its sub-discipline machine learning (ML) are re-emerging and gaining ever-increasing popularity in the scientific community and industry, and could lead to actionable knowledge in a diverse range of sectors, including epidemiological investigations of FBD outbreaks and antimicrobial resistance (AMR). As genotyping using whole-genome sequencing (WGS) is becoming more accessible and affordable, it is increasingly used as a routine tool for the detection of pathogens, and has the potential to differentiate between outbreak strains that are closely related, identify virulence/resistance genes and provide improved understanding of transmission events within hours to days. In most cases, the computational pipeline of WGS data analysis can be divided into four (though not necessarily consecutive) major steps: de novo genome assembly, genome characterization, comparative genomics, and inference of phylogeny or phylogenomics. In each step, ML could be used to increase the speed and potentially the accuracy (given increasing amounts of high-quality input data) of identification of the source of ongoing outbreaks, leading to more efficient treatment and prevention of additional cases. In this review, we explore whether ML or any other form of AI algorithms have already been proposed for the respective tasks and compare those with mechanistic model-based approaches.
Affiliation(s)
- Baiba Vilne
  - Institute of Food Safety, Animal Health and Environment "BIOR", Riga, Latvia
  - SIA net-OMICS, Riga, Latvia
- Irēna Meistere
  - Institute of Food Safety, Animal Health and Environment "BIOR", Riga, Latvia
- Juris Ķibilds
  - Institute of Food Safety, Animal Health and Environment "BIOR", Riga, Latvia
|
39
|
Abstract
The implementation of whole-genome sequencing in food safety has revolutionized foodborne pathogen tracking and outbreak investigations. The vast amounts of genomic data that are being produced through ongoing surveillance efforts continue advancing our understanding of pathogen diversity and genome biology. Produced genomic data are also supporting the use of metagenomics and metatranscriptomics for detection and functional characterization of microbiological hazards in foods and food processing environments. In addition to that, many studies have shown that metabolic and pathogenic potential, antimicrobial resistance, and other phenotypes relevant to food safety can be predicted from whole-genome sequences, omitting the need for multiple laboratory tests. Nevertheless, further work in the area of functional inference is necessary to enable accurate interpretation of functional information inferred from genomic and metagenomic data, as well as real-time detection and tracking of high-risk pathogen subtypes and microbiomes.
|
40
|
Coagulase-Negative Staphylococci Pathogenomics. Int J Mol Sci 2019; 20:ijms20051215. [PMID: 30862021 PMCID: PMC6429511 DOI: 10.3390/ijms20051215] [Citation(s) in RCA: 103] [Impact Index Per Article: 17.2] [Received: 01/22/2019] [Revised: 02/28/2019] [Accepted: 03/07/2019] [Indexed: 01/16/2023]
Abstract
Coagulase-negative Staphylococci (CoNS) are skin commensal bacteria. Besides their role in maintaining homeostasis, CoNS have emerged as major pathogens in nosocomial settings. Several studies have investigated the molecular basis for this emergence and identified multiple putative virulence factors with regard to Staphylococcus aureus pathogenicity. In the last decade, numerous CoNS whole-genome sequences have been released, leading to the identification of numerous putative virulence factors. Koch's postulates and the molecular rendition of these postulates, established by Stanley Falkow in 1988, do not explain the microbial pathogenicity of CoNS. However, whole-genome sequence data have shed new light on CoNS pathogenicity. In this review, we analyzed the contribution of genomics in defining CoNS virulence, focusing on the most frequent and pathogenic CoNS species: S. epidermidis, S. haemolyticus, S. saprophyticus, S. capitis, and S. lugdunensis.
|
41
|
Where are we going with genomics in plant pathogenic bacteria? Genomics 2018; 111:729-736. [PMID: 29678682 DOI: 10.1016/j.ygeno.2018.04.011] [Citation(s) in RCA: 14] [Impact Index Per Article: 2.0] [Received: 03/15/2018] [Accepted: 04/13/2018] [Indexed: 12/12/2022]
Abstract
Genome sequencing is now commonly used in research laboratories thanks to the rise of high-throughput sequencing with higher speed and output-to-cost ratios. Here, we summarize the application of genomics to different aspects of plant bacterial pathosystems. Genomics has been used to study the mechanisms of plant-bacteria interactions and host specificity. It also helps with taxonomy, the study of non-cultured bacteria, identification of causal agents, single-cell sequencing, population genetics, and meta-transcriptomics. Overall, genomics has significantly improved our understanding of plant-microbe interactions.
|
42
|
Saeb ATM. Current Bioinformatics resources in combating infectious diseases. Bioinformation 2018; 14:31-35. [PMID: 29497257 PMCID: PMC5818640 DOI: 10.6026/97320630014031] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.6] [Received: 12/23/2017] [Revised: 01/16/2018] [Accepted: 01/17/2018] [Indexed: 12/13/2022]
Abstract
Bioinformatics tools and techniques for analyzing next-generation sequencing (NGS) data are increasingly used for the diagnosis and monitoring of infectious diseases. It is of interest to review the application of bioinformatics tools, commonly used databases and NGS data in clinical microbiology, focusing on molecular identification, genotyping, microbiome research, antimicrobial resistance analysis and detection of unknown disease-associated pathogens in clinical specimens. This review documents available bioinformatics resources and databases that are used by medical microbiology scientists and physicians to control emerging infectious pathogens.
Affiliation(s)
- Amr T. M. Saeb
  - Genetics and Biotechnology Department, Strategic Center for Diabetes Research, College of Medicine, King Saud University, KSA
|
43
|
Kulkarni P, Frommolt P. Challenges in the Setup of Large-scale Next-Generation Sequencing Analysis Workflows. Comput Struct Biotechnol J 2017; 15:471-477. [PMID: 29158876 PMCID: PMC5683667 DOI: 10.1016/j.csbj.2017.10.001] [Citation(s) in RCA: 31] [Impact Index Per Article: 3.9] [Received: 07/13/2017] [Revised: 09/29/2017] [Accepted: 10/06/2017] [Indexed: 11/18/2022]
Abstract
While Next-Generation Sequencing (NGS) can now be considered an established analysis technology for research applications across the life sciences, the analysis workflows still require substantial bioinformatics expertise. Typical challenges include the appropriate selection of analytical software tools, the speedup of the overall procedure using HPC parallelization and acceleration technology, the development of automation strategies, data storage solutions and finally the development of methods for full exploitation of the analysis results across multiple experimental conditions. Recently, NGS has begun to expand into clinical environments, where it facilitates diagnostics enabling personalized therapeutic approaches, but is also accompanied by new technological, legal and ethical challenges. There are probably as many overall concepts for the analysis of the data as there are academic research institutions. Among these concepts are, for instance, complex IT architectures developed in-house, ready-to-use technologies installed on-site as well as comprehensive Everything as a Service (XaaS) solutions. In this mini-review, we summarize the key points to consider in the setup of the analysis architectures, mostly for scientific rather than diagnostic purposes, and provide an overview of the current state of the art and challenges of the field.
Affiliation(s)
- Pranav Kulkarni
  - Bioinformatics Core Facility, CECAD Research Center, University of Cologne, Germany
- Peter Frommolt
  - Bioinformatics Core Facility, CECAD Research Center, University of Cologne, Germany
|