1
|
Kader Chowdhury QMM, Islam S, Narayanan L, Ogunleye SC, Wang S, Thu D, Freitag NE, Lawrence ML, Abdelhamed H. An insight into the role of branched-chain α-keto acid dehydrogenase (BKD) complex in branched-chain fatty acid biosynthesis and virulence of Listeria monocytogenes. J Bacteriol 2024; 206:e0003324. [PMID: 38899896 PMCID: PMC11270904 DOI: 10.1128/jb.00033-24] [Received: 02/02/2024] [Accepted: 05/31/2024] [Indexed: 06/21/2024] Open
Abstract
Listeria monocytogenes is a foodborne bacterial pathogen that causes listeriosis. Positive regulatory factor A (PrfA) is a pleiotropic master activator of virulence genes of L. monocytogenes that becomes active upon the entry of the bacterium into the cytosol of infected cells. L. monocytogenes can survive and multiply at low temperatures; this is accomplished through the maintenance of appropriate membrane fluidity via branched-chain fatty acid (BCFA) synthesis. Branched-chain α-keto acid dehydrogenase (BKD), which is composed of four polypeptides encoded by lpd, bkdA1, bkdA2, and bkdB, is known to play a vital role in BCFA biosynthesis. Here, we constructed BKD-deficient Listeria strains by in-frame deletion of the lpd, bkdA1, bkdA2, and bkdB genes. To determine the role of BKD in vivo and in vitro, mouse model challenges, plaque assays in murine L2 fibroblasts, and intracellular replication assays in J774A.1 macrophages were conducted. BKD-deficient strains exhibited defects in BCFA composition, virulence, and PrfA-regulon function within the host cells. Transcriptomics analysis revealed that the transcript levels of the PrfA-regulon were lower in the ΔbkdA1 strain than in the wild type. This study demonstrates that L. monocytogenes strains lacking BKD complex components were defective in PrfA-regulon function, and full activation of wild-type prfA may not occur within host cells in the absence of BKD. Further study will investigate the consequences of BKD deletion on PrfA function through altering BCFA catabolism. IMPORTANCE Listeria monocytogenes is the causative agent of listeriosis, a disease with a high mortality rate. In this study, we have shown that the deletion of BKD can impact the function of PrfA and the PrfA-regulon. The production of virulence proteins within host cells is necessary for L. monocytogenes to promote its intracellular survival and is likely dependent on membrane integrity. We thus report a link between L. monocytogenes membrane integrity and the function of PrfA. This knowledge will increase our understanding of L. monocytogenes pathogenesis, which may provide insight into the development of antimicrobial agents.
Affiliation(s)
- Q M Monzur Kader Chowdhury
- Department of Comparative Biomedical Sciences, College of Veterinary Medicine, Mississippi State University, Mississippi State, Mississippi, USA
| | - Shamima Islam
- Department of Comparative Biomedical Sciences, College of Veterinary Medicine, Mississippi State University, Mississippi State, Mississippi, USA
| | - Lakshmi Narayanan
- Department of Comparative Biomedical Sciences, College of Veterinary Medicine, Mississippi State University, Mississippi State, Mississippi, USA
| | - Seto C. Ogunleye
- Department of Comparative Biomedical Sciences, College of Veterinary Medicine, Mississippi State University, Mississippi State, Mississippi, USA
| | - Shangshang Wang
- Department of Animal and Dairy Sciences, Mississippi State University, Mississippi State, Mississippi, USA
| | - Dinh Thu
- Tyson Foods, R&D Ingredient Solutions, Springdale, Arkansas, USA
| | - Nancy E. Freitag
- Department of Pharmaceutical Sciences, University of Illinois at Chicago, Chicago, Illinois, USA
| | - Mark L. Lawrence
- Department of Comparative Biomedical Sciences, College of Veterinary Medicine, Mississippi State University, Mississippi State, Mississippi, USA
| | - Hossam Abdelhamed
- Department of Comparative Biomedical Sciences, College of Veterinary Medicine, Mississippi State University, Mississippi State, Mississippi, USA
| |
|
2
|
Dulyayangkul P, Sealey JE, Lee WWY, Satapoomin N, Reding C, Heesom KJ, Williams PB, Avison MB. Improving nitrofurantoin resistance prediction in Escherichia coli from whole-genome sequence by integrating NfsA/B enzyme assays. Antimicrob Agents Chemother 2024; 68:e0024224. [PMID: 38767379 PMCID: PMC11232377 DOI: 10.1128/aac.00242-24] [Received: 02/19/2024] [Accepted: 04/13/2024] [Indexed: 05/22/2024] Open
Abstract
Nitrofurantoin resistance in Escherichia coli is primarily caused by mutations damaging two enzymes, NfsA and NfsB. Studies based on small isolate collections with defined nitrofurantoin MICs have found significant random genetic drift in nfsA and nfsB, making it extremely difficult to predict nitrofurantoin resistance from whole-genome sequence (WGS) where both genes are not obviously disrupted by nonsense or frameshift mutations or insertional inactivation. Here, we report a WGS survey of 200 oqxAB-negative E. coli from community urine samples, of which 34 were nitrofurantoin resistant. We characterized individual non-synonymous mutations seen in nfsA and nfsB among this collection using complementation cloning and NfsA/B enzyme assays in cell extracts. We definitively identified R203C, H11Y, W212R, A112E, and A112T in NfsA and R121C, Q142H, F84S, P163H, W46R, K57E, and V191G in NfsB as amino acid substitutions that reduce enzyme activity sufficiently to cause resistance. In contrast, E58D, I117T, K141E, L157F, A172S, G187D, and A188V in NfsA and G66D, M75I, V93A, and A174E in NfsB are functionally silent in this context. We identified that 9/166 (5.4%) nitrofurantoin-susceptible isolates were "pre-resistant," defined as having loss of function mutations in nfsA or nfsB. Finally, using NfsA/B enzyme assays and proteomics, we demonstrated that 9/34 (26.5%) ribE wild-type nitrofurantoin-resistant isolates also carried functionally wild-type nfsB or nfsB/nfsA. In these cases, NfsA/B activity was reduced through downregulated gene expression. Our biological understanding of nitrofurantoin resistance is greatly improved by this analysis but is still insufficient to allow its reliable prediction from WGS data.
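The substitution calls and proportions reported in this abstract can be restated as a small lookup, sketched here in Python. The function names `classify_substitution` and `percent` and the label strings are illustrative assumptions, not part of the study; the substitution lists and isolate counts are taken from the abstract. As the authors caution, a lookup of this kind is not sufficient for reliable resistance prediction, because some resistant isolates carried functionally wild-type genes with reduced expression.

```python
# Substitutions the abstract reports as reducing enzyme activity enough to cause resistance.
LOSS_OF_FUNCTION = {
    "NfsA": {"R203C", "H11Y", "W212R", "A112E", "A112T"},
    "NfsB": {"R121C", "Q142H", "F84S", "P163H", "W46R", "K57E", "V191G"},
}

# Substitutions the abstract reports as functionally silent in this context.
SILENT = {
    "NfsA": {"E58D", "I117T", "K141E", "L157F", "A172S", "G187D", "A188V"},
    "NfsB": {"G66D", "M75I", "V93A", "A174E"},
}


def classify_substitution(protein, substitution):
    """Return a label for a substitution (hypothetical helper, labels are assumptions)."""
    if substitution in LOSS_OF_FUNCTION.get(protein, set()):
        return "loss_of_function"
    if substitution in SILENT.get(protein, set()):
        return "silent"
    # Anything not characterized in the abstract stays unresolved.
    return "uncharacterized"


def percent(part, whole):
    """Percentage rounded to one decimal place, as quoted in the abstract."""
    return round(100 * part / whole, 1)


TOTAL_ISOLATES = 200
RESISTANT = 34
SUSCEPTIBLE = TOTAL_ISOLATES - RESISTANT  # the 166 susceptible isolates

if __name__ == "__main__":
    print(classify_substitution("NfsA", "R203C"))
    print(percent(9, SUSCEPTIBLE))  # "pre-resistant" share of susceptible isolates
    print(percent(9, RESISTANT))    # resistant isolates with functionally wild-type genes
```

Unlisted substitutions deliberately return "uncharacterized" instead of defaulting to "silent", which mirrors the abstract's point that untested variants cannot be assumed harmless.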
Affiliation(s)
- Punyawee Dulyayangkul
- School of Cellular and Molecular Medicine, University of Bristol, Bristol, United Kingdom
- Laboratory of Biotechnology, Chulabhorn Research Institute, Bangkok, Thailand
| | - Jordan E Sealey
- School of Cellular and Molecular Medicine, University of Bristol, Bristol, United Kingdom
| | - Winnie W Y Lee
- School of Cellular and Molecular Medicine, University of Bristol, Bristol, United Kingdom
| | - Naphat Satapoomin
- School of Cellular and Molecular Medicine, University of Bristol, Bristol, United Kingdom
| | - Carlos Reding
- School of Cellular and Molecular Medicine, University of Bristol, Bristol, United Kingdom
| | - Kate J Heesom
- University of Bristol Proteomics Facility, Bristol, United Kingdom
| | - Philip B Williams
- University Hospitals Bristol and Weston NHS Foundation Trust, Bristol, United Kingdom
| | - Matthew B Avison
- School of Cellular and Molecular Medicine, University of Bristol, Bristol, United Kingdom
| |
|
3
|
Zheng L, Shen J, Chen R, Hu Y, Zhao W, Leung ELH, Dai L. Genome engineering of the human gut microbiome. J Genet Genomics 2024; 51:479-491. [PMID: 38218395 DOI: 10.1016/j.jgg.2024.01.002] [Received: 10/08/2023] [Revised: 01/02/2024] [Accepted: 01/03/2024] [Indexed: 01/15/2024]
Abstract
The human gut microbiome, a complex ecosystem, significantly influences host health, impacting crucial aspects such as metabolism and immunity. To enhance our comprehension and control of the molecular mechanisms orchestrating the intricate interplay between gut commensal bacteria and human health, the exploration of genome engineering for gut microbes is a promising frontier. Nevertheless, the complexities and diversities inherent in the gut microbiome pose substantial challenges to the development of effective genome engineering tools for human gut microbes. In this comprehensive review, we provide an overview of the current progress and challenges in genome engineering of human gut commensal bacteria, whether executed in vitro or in situ. A specific focus is directed towards the advancements and prospects in cargo DNA delivery and high-throughput techniques. Additionally, we elucidate the immense potential of genome engineering methods to enhance our understanding of the human gut microbiome and engineer the microorganisms to enhance human health.
Affiliation(s)
- Linggang Zheng
- Dr Neher's Biophysics Laboratory for Innovative Drug Discovery/State Key Laboratory of Quality Research in Chinese Medicine, Macau University of Science and Technology, Taipa, Macau 999078, China; CAS Key Laboratory of Quantitative Engineering Biology, Shenzhen Institute of Synthetic Biology, Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences, Shenzhen 518055, China
| | - Juntao Shen
- CAS Key Laboratory of Quantitative Engineering Biology, Shenzhen Institute of Synthetic Biology, Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences, Shenzhen 518055, China
| | - Ruiyue Chen
- CAS Key Laboratory of Quantitative Engineering Biology, Shenzhen Institute of Synthetic Biology, Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences, Shenzhen 518055, China; University of Chinese Academy of Sciences, Beijing 100049, China
| | - Yucan Hu
- CAS Key Laboratory of Quantitative Engineering Biology, Shenzhen Institute of Synthetic Biology, Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences, Shenzhen 518055, China; University of Chinese Academy of Sciences, Beijing 100049, China
| | - Wei Zhao
- CAS Key Laboratory of Quantitative Engineering Biology, Shenzhen Institute of Synthetic Biology, Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences, Shenzhen 518055, China
| | - Elaine Lai-Han Leung
- Cancer Center, Faculty of Health Science, University of Macau, Macau 999078, China; MOE Frontiers Science Center for Precision Oncology, University of Macau, Macau 999078, China.
| | - Lei Dai
- CAS Key Laboratory of Quantitative Engineering Biology, Shenzhen Institute of Synthetic Biology, Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences, Shenzhen 518055, China; University of Chinese Academy of Sciences, Beijing 100049, China.
| |
|
4
|
Paul S, Olymon K, Martinez GS, Sarkar S, Yella VR, Kumar A. MLDSPP: Bacterial Promoter Prediction Tool Using DNA Structural Properties with Machine Learning and Explainable AI. J Chem Inf Model 2024; 64:2705-2719. [PMID: 38258978 DOI: 10.1021/acs.jcim.3c02017] [Indexed: 01/24/2024]
Abstract
Bacterial promoters play a crucial role in gene expression by serving as docking sites for the transcription initiation machinery. However, accurately identifying promoter regions in bacterial genomes remains a challenge due to their diverse architecture and variations. In this study, we propose MLDSPP (Machine Learning and Duplex Stability based Promoter prediction in Prokaryotes), a machine learning-based promoter prediction tool, to comprehensively screen bacterial promoter regions in 12 diverse genomes. We leveraged biologically relevant and informative DNA structural properties, such as DNA duplex stability and base stacking, and state-of-the-art machine learning (ML) strategies to gain insights into promoter characteristics. We evaluated several machine learning models, including Support Vector Machines, Random Forests, and XGBoost, and assessed their performance using accuracy, precision, recall, specificity, F1 score, and MCC metrics. Our findings reveal that XGBoost outperformed other models and current state-of-the-art promoter prediction tools, namely Sigma70pred and iPromoter2L, achieving F1-scores >95% in most systems. Significantly, the use of one-hot encoding for representing nucleotide sequences complements these structural features, enhancing our XGBoost model's predictive capabilities. To address the challenge of model interpretability, we incorporated explainable AI techniques using Shapley values. This enhancement allows for a better understanding and interpretation of the predictions of our model. In conclusion, our study presents MLDSPP as a novel, generic tool for predicting promoter regions in bacteria, utilizing original downstream sequences as nonpromoter controls. This tool has the potential to significantly advance the field of bacterial genomics and contribute to our understanding of gene regulation in diverse bacterial systems.
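For readers unfamiliar with the evaluation metrics and the one-hot encoding named in this abstract, the following Python sketch shows the standard definitions. It is not the MLDSPP code: the function names and the A, C, G, T column order are assumptions made for illustration, and the counts in the usage lines are invented.

```python
import math

BASES = "ACGT"  # column order is an assumption for this sketch


def one_hot(sequence):
    """Encode a DNA string as a list of 4-element rows; unknown bases become all zeros."""
    return [[1 if base == b else 0 for b in BASES] for base in sequence.upper()]


def binary_metrics(tp, fp, tn, fn):
    """Standard binary-classification metrics from confusion-matrix counts."""
    def ratio(num, den):
        return num / den if den else 0.0

    precision = ratio(tp, tp + fp)
    recall = ratio(tp, tp + fn)
    mcc_den = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return {
        "accuracy": ratio(tp + tn, tp + fp + tn + fn),
        "precision": precision,
        "recall": recall,
        "specificity": ratio(tn, tn + fp),
        "f1": ratio(2 * precision * recall, precision + recall),
        "mcc": ratio(tp * tn - fp * fn, mcc_den),
    }


if __name__ == "__main__":
    print(one_hot("ACGT"))
    # Invented counts: 8 true positives, 2 false positives, 8 true negatives, 2 false negatives.
    print(binary_metrics(tp=8, fp=2, tn=8, fn=2))
```

MCC uses all four confusion-matrix cells, so it can stay well below F1 on the same predictions; this is one reason to report both, as the authors do.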
Affiliation(s)
- Subhojit Paul
- Department of Molecular Biology and Biotechnology, Tezpur University, Tezpur 784028, Assam, India
| | - Kaushika Olymon
- Department of Molecular Biology and Biotechnology, Tezpur University, Tezpur 784028, Assam, India
| | - Gustavo Sganzerla Martinez
- Microbiology and Immunology, Dalhousie University, Halifax, Nova Scotia B3H 4H7, Canada
- Pediatrics, Izaak Walton Killam (IWK) Health Center, Canadian Center for Vaccinology (CCfV), Halifax, Nova Scotia B3H 4H7, Canada
| | - Sharmilee Sarkar
- Department of Molecular Biology and Biotechnology, Tezpur University, Tezpur 784028, Assam, India
| | - Venkata Rajesh Yella
- Department of Biotechnology, Koneru Lakshmaiah Education Foundation, Guntur 522302, Andhra Pradesh, India
| | - Aditya Kumar
- Department of Molecular Biology and Biotechnology, Tezpur University, Tezpur 784028, Assam, India
| |
|
5
|
Yang G, Li J, Hu J, Shi JY. Recognition of cyanobacteria promoters via Siamese network-based contrastive learning under novel non-promoter generation. Brief Bioinform 2024; 25:bbae193. [PMID: 38701419 PMCID: PMC11066903 DOI: 10.1093/bib/bbae193] [Received: 01/16/2024] [Revised: 03/08/2024] [Accepted: 04/05/2024] [Indexed: 05/05/2024] Open
Abstract
It is a vital step to recognize cyanobacteria promoters on a genome-wide scale. Computational methods are promising to assist in difficult biological identification. When building recognition models, these methods rely on non-promoter generation to cope with the lack of real non-promoters. Nevertheless, the factitious significant difference between promoters and non-promoters causes over-optimistic prediction. Moreover, designed for E. coli or B. subtilis, existing methods cannot uncover novel, distinct motifs among cyanobacterial promoters. To address these issues, this work first proposes a novel non-promoter generation strategy called phantom sampling, which can eliminate the factitious difference between promoters and generated non-promoters. Furthermore, it elaborates a novel promoter prediction model based on the Siamese network (SiamProm), which can amplify the hidden difference between promoters and non-promoters through a joint characterization of global associations, upstream and downstream contexts, and neighboring associations w.r.t. k-mer tokens. The comparison with state-of-the-art methods demonstrates the superiority of our phantom sampling and SiamProm. Both comprehensive ablation studies and feature space illustrations also validate the effectiveness of the Siamese network and its components. More importantly, SiamProm, upon our phantom sampling, finds a novel cyanobacterial promoter motif ('GCGATCGC'), which is palindrome-patterned, content-conserved, but position-shifted.
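The "palindrome-patterned" and "position-shifted" properties of the reported motif can be checked directly. The Python sketch below assumes "palindrome" is meant in the usual DNA sense (a sequence equal to its own reverse complement); the helper names and the example sequence are illustrative and are not taken from SiamProm.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(sequence):
    """Reverse complement of a DNA string containing only A, C, G, T."""
    return sequence.upper().translate(COMPLEMENT)[::-1]


def is_palindromic(sequence):
    """True if the sequence equals its own reverse complement."""
    return sequence.upper() == reverse_complement(sequence)


def motif_positions(sequence, motif):
    """0-based start positions of every (possibly overlapping) motif occurrence."""
    sequence, motif = sequence.upper(), motif.upper()
    positions = []
    start = sequence.find(motif)
    while start != -1:
        positions.append(start)
        start = sequence.find(motif, start + 1)
    return positions


if __name__ == "__main__":
    motif = "GCGATCGC"  # motif reported in the abstract
    print(is_palindromic(motif))
    # Invented example sequence; a position-shifted motif would appear at
    # different offsets in different promoters.
    print(motif_positions("AAGCGATCGCTT", motif))
```

Because the motif is its own reverse complement, a scan of one strand finds the same sites as a scan of the other, so the position list does not depend on strand choice.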
Affiliation(s)
- Guang Yang
- School of Life Sciences, Northwestern Polytechnical University, Xi’an, Shaanxi, 710072, China
| | - Jianing Li
- School of Computer Science, Northwestern Polytechnical University, Xi’an, Shaanxi, 710072, China
| | - Jinlu Hu
- School of Life Sciences, Northwestern Polytechnical University, Xi’an, Shaanxi, 710072, China
| | - Jian-Yu Shi
- School of Life Sciences, Northwestern Polytechnical University, Xi’an, Shaanxi, 710072, China
| |
|
6
|
Martinez GS, Perez-Rueda E, Kumar A, Dutt M, Maya CR, Ledesma-Dominguez L, Casa PL, Kumar A, de Avila e Silva S, Kelvin DJ. CDBProm: the Comprehensive Directory of Bacterial Promoters. NAR Genom Bioinform 2024; 6:lqae018. [PMID: 38385146 PMCID: PMC10880602 DOI: 10.1093/nargab/lqae018] [Received: 09/12/2023] [Revised: 01/12/2024] [Accepted: 01/29/2024] [Indexed: 02/23/2024] Open
Abstract
The decreasing cost of whole genome sequencing has produced high volumes of genomic information that require annotation. The experimental identification of promoter sequences, pivotal for regulating gene expression, is a laborious and cost-prohibitive task. To expedite this, we introduce the Comprehensive Directory of Bacterial Promoters (CDBProm), a directory of in-silico predicted bacterial promoter sequences. We first identified that an Extreme Gradient Boosting (XGBoost) algorithm would distinguish promoters from random downstream regions with an accuracy of 87%. To capture distinctive promoter signals, we generated a second XGBoost classifier trained on the instances misclassified in our first classifier. The predictor of CDBProm is then fed with over 55 million upstream regions from more than 6000 bacterial genomes. Upon finding potential promoter sequences in upstream regions, each promoter is mapped to the genomic data of the organism, linking the predicted promoter with its coding DNA sequence, and identifying the function of the gene regulated by the promoter. The collection of bacterial promoters available in CDBProm enables the quantitative analysis of a plethora of bacterial promoters. Our collection with over 24 million promoters is publicly available at https://aw.iimas.unam.mx/cdbprom/.
Affiliation(s)
- Gustavo Sganzerla Martinez
- Microbiology and Immunology, Dalhousie University, Halifax, Nova Scotia B3H 4H7, Canada
- Pediatrics, Izaak Walton Killam (IWK) Health Center. Canadian Center for Vaccinology (CCfV), Halifax, Nova Scotia B3H 4H7, Canada
- BioForge Canada Limited, Halifax, Nova Scotia B3N 3B9, Canada
| | - Ernesto Perez-Rueda
- Instituto de Investigaciones en Matemáticas Aplicadas y en Sistemas, Universidad Nacional Autonóma de México, Unidad Académica del Estado de Yucatán, Mérida 97302, Yucatán, Mexico
| | - Anuj Kumar
- Microbiology and Immunology, Dalhousie University, Halifax, Nova Scotia B3H 4H7, Canada
- Pediatrics, Izaak Walton Killam (IWK) Health Center. Canadian Center for Vaccinology (CCfV), Halifax, Nova Scotia B3H 4H7, Canada
- BioForge Canada Limited, Halifax, Nova Scotia B3N 3B9, Canada
| | - Mansi Dutt
- Microbiology and Immunology, Dalhousie University, Halifax, Nova Scotia B3H 4H7, Canada
- Pediatrics, Izaak Walton Killam (IWK) Health Center. Canadian Center for Vaccinology (CCfV), Halifax, Nova Scotia B3H 4H7, Canada
- BioForge Canada Limited, Halifax, Nova Scotia B3N 3B9, Canada
| | - Cinthia Rodríguez Maya
- Facultad de Ciencias e Ingeniería, Universidad Nacional Autonoma de Mexico, Mexico City 04510, Mexico
| | - Leonardo Ledesma-Dominguez
- Instituto de Investigaciones en Matematicas Aplicadas y en Sistemas, Universidad Nacional Autonoma de Mexico, Mexico City 04510, Mexico
| | - Pedro Lenz Casa
- Biotechnology Institute, Universidade de Caxias do Sul, Caxias do Sul, Rio Grande do Sul 95070-560, Brazil
| | - Aditya Kumar
- Molecular Biology and Biotechnology, Tezpur University, Tezpur, Assam 784028, India
| | - Scheila de Avila e Silva
- Biotechnology Institute, Universidade de Caxias do Sul, Caxias do Sul, Rio Grande do Sul 95070-560, Brazil
| | - David J Kelvin
- Microbiology and Immunology, Dalhousie University, Halifax, Nova Scotia B3H 4H7, Canada
- Pediatrics, Izaak Walton Killam (IWK) Health Center. Canadian Center for Vaccinology (CCfV), Halifax, Nova Scotia B3H 4H7, Canada
- BioForge Canada Limited, Halifax, Nova Scotia B3N 3B9, Canada
| |
|
7
|
Aguilar-Carrillo Y, Soto-Urzúa L, Martínez-Martínez MDLÁ, Becerril-Ramírez M, Martínez-Morales LJ. Computational Analysis of the Tripartite Interaction of Phasins (PhaP4 and 5)-Sigma Factor (σ 24)-DNA of Azospirillum brasilense Sp7. Polymers (Basel) 2024; 16:611. [PMID: 38475295 DOI: 10.3390/polym16050611] [Received: 12/08/2023] [Revised: 02/04/2024] [Accepted: 02/07/2024] [Indexed: 03/14/2024] Open
Abstract
Azospirillum brasilense Sp7 produces PHB, which is covered by granule-associated proteins (GAPs). Phasins are the main GAPs. Previous studies have shown phasins can regulate PHB synthesis. When A. brasilense grows under stress conditions, it uses sigma factors to transcribe genes for survival. One of these factors is the σ24 factor. This study determined the possible interaction between phasins and the σ24 factor or phasin-σ24 factor complex and DNA. Three-dimensional structures of phasins and σ24 factor structures were predicted using the I-TASSER and SWISS-Model servers, respectively. Subsequently, a molecular docking between phasins and the σ24 factor was performed using the ClusPro 2.0 server, followed by molecular docking between protein complexes and DNA using the HDOCK server. Evaluation of the types of ligand-receptor interactions was performed using the BIOVIA Discovery Visualizer for three-dimensional diagrams, as well as the LigPlot server to obtain bi-dimensional diagrams. The results showed the phasins (Pha4Abs7 or Pha5Abs7)-σ24 factor complex was bound near the -35 box of the promoter region of the phaC gene. However, in the individual interaction of PhaP5Abs7 and the σ24 factor, with DNA, both proteins were bound to the -35 box. This did not occur with PhaP4Abs7, which was bound to the -10 box. This change could affect the transcription level of the phaC gene and possibly affect PHB synthesis.
Affiliation(s)
- Yovani Aguilar-Carrillo
- Centro de Investigaciones en Ciencias Microbiológicas, Instituto de Ciencias, Benemérita Universidad Autónoma de Puebla, Av. San Claudio y Av. 24 Sur, Col. San Manuel Ciudad Universitaria, Puebla 72570, Mexico
| | - Lucía Soto-Urzúa
- Centro de Investigaciones en Ciencias Microbiológicas, Instituto de Ciencias, Benemérita Universidad Autónoma de Puebla, Av. San Claudio y Av. 24 Sur, Col. San Manuel Ciudad Universitaria, Puebla 72570, Mexico
| | - María De Los Ángeles Martínez-Martínez
- Centro de Investigaciones en Ciencias Microbiológicas, Instituto de Ciencias, Benemérita Universidad Autónoma de Puebla, Av. San Claudio y Av. 24 Sur, Col. San Manuel Ciudad Universitaria, Puebla 72570, Mexico
| | - Mirian Becerril-Ramírez
- Centro de Investigaciones en Ciencias Microbiológicas, Instituto de Ciencias, Benemérita Universidad Autónoma de Puebla, Av. San Claudio y Av. 24 Sur, Col. San Manuel Ciudad Universitaria, Puebla 72570, Mexico
| | - Luis Javier Martínez-Morales
- Centro de Investigaciones en Ciencias Microbiológicas, Instituto de Ciencias, Benemérita Universidad Autónoma de Puebla, Av. San Claudio y Av. 24 Sur, Col. San Manuel Ciudad Universitaria, Puebla 72570, Mexico
| |
|
8
|
Park JH, Lee S, Shin E, Abdi Nansa S, Lee SJ. The Transposition of Insertion Sequences in Sigma-Factor- and LysR-Deficient Mutants of Deinococcus geothermalis. Microorganisms 2024; 12:328. [PMID: 38399731 PMCID: PMC10892881 DOI: 10.3390/microorganisms12020328] [Received: 01/17/2024] [Revised: 01/29/2024] [Accepted: 02/01/2024] [Indexed: 02/25/2024] Open
Abstract
Some insertion sequence (IS) elements were actively transposed using oxidative stress conditions, including gamma irradiation and hydrogen peroxide treatment, in Deinococcus geothermalis, a radiation-resistant bacterium. D. geothermalis wild-type (WT), sigma factor gene-disrupted (∆dgeo_0606), and LysR gene-disrupted (∆dgeo_1692) mutants were examined for IS induction that resulted in non-pigmented colonies after gamma irradiation (5 kGy) exposure. The loss of pigmentation occurred because dgeo_0524, which encodes a phytoene desaturase in the carotenoid pathway, was disrupted by the transposition of IS elements. The types and loci of the IS elements were identified as ISDge2 and ISDge6 in the ∆dgeo_0606 mutant and ISDge5 and ISDge7 in the ∆dgeo_1692 mutant, but were not identified in the WT strain. Furthermore, 80 and 100 mM H2O2 treatments induced different transpositions of IS elements in ∆dgeo_0606 (ISDge5, ISDge6, and ISDge7) and WT (ISDge6). However, no IS transposition was observed in the ∆dgeo_1692 mutant. The complementary strain of the ∆dgeo_0606 mutation showed recovery effects in the viability assay; however, the growth-delayed curve did not return because the neighboring gene dgeo_0607 was overexpressed, probably acting as an anti-sigma factor. The expression levels of certain transposases, recognized as pivotal contributors to IS transposition, did not precisely correlate with active transposition in varying oxidation environments. Nevertheless, these findings suggest that specific IS elements integrated into dgeo_0524 in a target-gene-deficient and oxidation-source-dependent manner.
Affiliation(s)
- Sung-Jae Lee
- Department of Biology, Kyung Hee University, Seoul 02447, Republic of Korea; (J.H.P.); (S.L.); (E.S.); (S.A.N.)
| |
|
9
|
Ligeti B, Szepesi-Nagy I, Bodnár B, Ligeti-Nagy N, Juhász J. ProkBERT family: genomic language models for microbiome applications. Front Microbiol 2024; 14:1331233. [PMID: 38282738 PMCID: PMC10810988 DOI: 10.3389/fmicb.2023.1331233] [Received: 10/31/2023] [Accepted: 12/11/2023] [Indexed: 01/30/2024] Open
Abstract
Background: In the evolving landscape of microbiology and microbiome analysis, the integration of machine learning is crucial for understanding complex microbial interactions, and predicting and recognizing novel functionalities within extensive datasets. However, the effectiveness of these methods in microbiology faces challenges due to the complex and heterogeneous nature of microbial data, further complicated by low signal-to-noise ratios, context-dependency, and a significant shortage of appropriately labeled datasets. This study introduces the ProkBERT model family, a collection of large language models, designed for genomic tasks. It provides a generalizable sequence representation for nucleotide sequences, learned from unlabeled genome data. This approach helps overcome the above-mentioned limitations in the field, thereby improving our understanding of microbial ecosystems and their impact on health and disease. Methods: ProkBERT models are based on transfer learning and self-supervised methodologies, enabling them to use the abundant yet complex microbial data effectively. The introduction of the novel Local Context-Aware (LCA) tokenization technique marks a significant advancement, allowing ProkBERT to overcome the contextual limitations of traditional transformer models. This methodology not only retains rich local context but also demonstrates remarkable adaptability across various bioinformatics tasks. Results: In practical applications such as promoter prediction and phage identification, the ProkBERT models show superior performance. For promoter prediction tasks, the top-performing model achieved a Matthews Correlation Coefficient (MCC) of 0.74 for E. coli and 0.62 in mixed-species contexts. In phage identification, ProkBERT models consistently outperformed established tools like VirSorter2 and DeepVirFinder, achieving an MCC of 0.85. These results underscore the models' exceptional accuracy and generalizability in both supervised and unsupervised tasks. Conclusions: The ProkBERT model family is a compact yet powerful tool in the field of microbiology and bioinformatics. Its capacity for rapid, accurate analyses and its adaptability across a spectrum of tasks mark a significant advancement in machine learning applications in microbiology. The models are available on GitHub (https://github.com/nbrg-ppcu/prokbert) and HuggingFace (https://huggingface.co/nerualbioinfo), providing an accessible tool for the community.
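The abstract describes Local Context-Aware tokenization only at a high level. As a rough illustration, and assuming it amounts to splitting a sequence into overlapping k-mers taken at a fixed shift, a tokenizer could be sketched in Python as follows. The function name `kmer_tokens` and the `k` and `shift` values are hypothetical choices for this sketch and are not the ProkBERT API.

```python
def kmer_tokens(sequence, k=6, shift=1):
    """Split a nucleotide string into k-mers whose start positions advance by `shift`.

    With shift < k, consecutive tokens overlap and therefore share local context.
    This is an assumed reading of the tokenization idea, not the published implementation.
    """
    if k <= 0 or shift <= 0:
        raise ValueError("k and shift must be positive")
    sequence = sequence.upper()
    return [sequence[i:i + k] for i in range(0, len(sequence) - k + 1, shift)]


if __name__ == "__main__":
    seq = "ATGCGTAC"  # invented example sequence
    print(kmer_tokens(seq, k=6, shift=1))
    print(kmer_tokens(seq, k=6, shift=2))
```

A smaller shift yields more tokens per sequence and more shared context between neighbours, at the cost of a longer input to the model; a larger shift shortens the input but reduces that overlap.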
Affiliation(s)
- Balázs Ligeti
- Faculty of Information Technology and Bionics, Pázmány Péter Catholic University, Budapest, Hungary
| | - István Szepesi-Nagy
- Faculty of Information Technology and Bionics, Pázmány Péter Catholic University, Budapest, Hungary
| | - Babett Bodnár
- Faculty of Information Technology and Bionics, Pázmány Péter Catholic University, Budapest, Hungary
| | - Noémi Ligeti-Nagy
- Language Technology Research Group, HUN-REN Hungarian Research Centre for Linguistics, Budapest, Hungary
| | - János Juhász
- Faculty of Information Technology and Bionics, Pázmány Péter Catholic University, Budapest, Hungary
- Institute of Medical Microbiology, Semmelweis University, Budapest, Hungary
| |
|
10
|
Lechtenberg T, Wynands B, Wierckx N. Engineering 5-hydroxymethylfurfural (HMF) oxidation in Pseudomonas boosts tolerance and accelerates 2,5-furandicarboxylic acid (FDCA) production. Metab Eng 2024; 81:262-272. [PMID: 38154655 DOI: 10.1016/j.ymben.2023.12.010] [Received: 10/26/2023] [Revised: 12/12/2023] [Accepted: 12/21/2023] [Indexed: 12/30/2023]
Abstract
Due to its tolerance properties, Pseudomonas has gained particular interest as host for oxidative upgrading of the toxic aldehyde 5-hydroxymethylfurfural (HMF) into 2,5-furandicarboxylic acid (FDCA), a promising biobased alternative to terephthalate in polyesters. However, until now, the native enzymes responsible for aldehyde oxidation are unknown. Here, we report the identification of the primary HMF-converting enzymes of P. taiwanensis VLB120 and P. putida KT2440 by extended gene deletions. The key players in HMF oxidation are a molybdenum-dependent periplasmic oxidoreductase and a cytoplasmic dehydrogenase. Deletion of the corresponding genes almost completely abolished HMF oxidation, leading instead to aldehyde reduction. In this context, two HMF-reducing dehydrogenases were also revealed. These discoveries enabled enhancement of Pseudomonas' furanic aldehyde oxidation machinery by genomic overexpression of the respective genes. The resulting BOX strains (Boosted OXidation) represent superior hosts for biotechnological synthesis of FDCA from HMF. The increased oxidation rates provide greatly elevated HMF tolerance, thus tackling one of the major drawbacks of whole-cell catalysis with this aldehyde. Furthermore, the ROX (Reduced OXidation) and ROAR (Reduced Oxidation And Reduction) deletion mutants offer a solid foundation for future development of Pseudomonads as biotechnological chassis notably for scenarios where rapid HMF conversion is undesirable.
Affiliation(s)
- Thorsten Lechtenberg
- Institute of Bio- and Geosciences IBG-1: Biotechnology, Forschungszentrum Jülich, 52425 Jülich, Germany.
| | - Benedikt Wynands
- Institute of Bio- and Geosciences IBG-1: Biotechnology, Forschungszentrum Jülich, 52425 Jülich, Germany.
| | - Nick Wierckx
- Institute of Bio- and Geosciences IBG-1: Biotechnology, Forschungszentrum Jülich, 52425 Jülich, Germany.
| |
|
11
|
Zhu Q, Bai X, Li Q, Zhang M, Hu G, Pan K, Liu H, Ke Z, Hong Q, Qiu J. PcaR, a GntR/FadR Family Transcriptional Repressor Controls the Transcription of Phenazine-1-Carboxylic Acid 1,2-Dioxygenase Gene Cluster in Sphingomonas histidinilytica DS-9. Appl Environ Microbiol 2023; 89:e0212122. [PMID: 37191535 PMCID: PMC10304782 DOI: 10.1128/aem.02121-22] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/16/2022] [Accepted: 04/29/2023] [Indexed: 05/17/2023] Open
Abstract
In our previous study, the phenazine-1-carboxylic acid (PCA) 1,2-dioxygenase gene cluster (pcaA1A2A3A4 cluster) in Sphingomonas histidinilytica DS-9 was identified to be responsible for the conversion of PCA to 1,2-dihydroxyphenazine (Ren Y, Zhang M, Gao S, Zhu Q, et al. 2022. Appl Environ Microbiol 88:e00543-22). However, the regulatory mechanism of the pcaA1A2A3A4 cluster has not been elucidated yet. In this study, the pcaA1A2A3A4 cluster was found to be transcribed as two divergent operons: pcaA3-ORF5205 (named A3-5205 operon) and pcaA1A2-ORF5208-pcaA4-ORF5210 (named A1-5210 operon). The promoter regions of the two operons overlap. PcaR acts as a transcriptional repressor of the pcaA1A2A3A4 cluster, and it belongs to the GntR/FadR family of transcriptional regulators. Gene disruption of pcaR can shorten the lag phase of PCA degradation. The results of electrophoretic mobility shift assay and DNase I footprinting showed that PcaR binds to a 25-bp motif in the ORF5205-pcaA1 intergenic promoter region to regulate the expression of the two operons. The 25-bp motif covers the -10 region of the promoter of the A3-5205 operon and the -35 region and -10 region of the promoter of the A1-5210 operon. The TNGT/ANCNA box within the motif was essential for PcaR binding to the two promoters. PCA acted as an effector of PcaR, preventing it from binding to the promoter region and repressing the transcription of the pcaA1A2A3A4 cluster. In addition, PcaR represses its own transcription, and this repression can be relieved by PCA. This study reveals the regulatory mechanism of PCA degradation in strain DS-9, and the identification of PcaR increases the variety of regulatory models of the GntR/FadR-type regulators. IMPORTANCE Sphingomonas histidinilytica DS-9 is a phenazine-1-carboxylic acid (PCA)-degrading strain. The 1,2-dioxygenase gene cluster (pcaA1A2A3A4 cluster, encoding dioxygenase PcaA1A2, reductase PcaA3, and ferredoxin PcaA4) is responsible for the initial degradation step of PCA and is widely distributed in Sphingomonads, but its regulatory mechanism has not been investigated yet. In this study, a GntR/FadR-type transcriptional regulator, PcaR, which represses the transcription of the pcaA1A2A3A4 cluster and the pcaR gene, was identified and characterized. The binding site of PcaR in the ORF5205-pcaA1 intergenic promoter region contains a TNGT/ANCNA box, which is important for the binding. These findings enhance our understanding of the molecular mechanism of PCA degradation.
Affiliation(s)
- Qian Zhu
- Key Laboratory of Agricultural and Environmental Microbiology, Ministry of Agriculture and Rural Affairs, College of Life Sciences, Nanjing Agricultural University, Nanjing, Jiangsu, People’s Republic of China
| | - Xuekun Bai
- Key Laboratory of Agricultural and Environmental Microbiology, Ministry of Agriculture and Rural Affairs, College of Life Sciences, Nanjing Agricultural University, Nanjing, Jiangsu, People’s Republic of China
| | - Qian Li
- Key Laboratory of Agricultural and Environmental Microbiology, Ministry of Agriculture and Rural Affairs, College of Life Sciences, Nanjing Agricultural University, Nanjing, Jiangsu, People’s Republic of China
| | - Mingliang Zhang
- Key Laboratory of Agricultural and Environmental Microbiology, Ministry of Agriculture and Rural Affairs, College of Life Sciences, Nanjing Agricultural University, Nanjing, Jiangsu, People’s Republic of China
| | - Gang Hu
- Key Laboratory of Agricultural and Environmental Microbiology, Ministry of Agriculture and Rural Affairs, College of Life Sciences, Nanjing Agricultural University, Nanjing, Jiangsu, People’s Republic of China
| | - Kaihua Pan
- Key Laboratory of Agricultural and Environmental Microbiology, Ministry of Agriculture and Rural Affairs, College of Life Sciences, Nanjing Agricultural University, Nanjing, Jiangsu, People’s Republic of China
| | - Hongfei Liu
- Key Laboratory of Agricultural and Environmental Microbiology, Ministry of Agriculture and Rural Affairs, College of Life Sciences, Nanjing Agricultural University, Nanjing, Jiangsu, People’s Republic of China
| | - Zhijian Ke
- School of Biological and Chemical Engineering, Ningbo Tech University, Ningbo, Zhejiang, People’s Republic of China
| | - Qing Hong
- Key Laboratory of Agricultural and Environmental Microbiology, Ministry of Agriculture and Rural Affairs, College of Life Sciences, Nanjing Agricultural University, Nanjing, Jiangsu, People’s Republic of China
| | - Jiguo Qiu
- Key Laboratory of Agricultural and Environmental Microbiology, Ministry of Agriculture and Rural Affairs, College of Life Sciences, Nanjing Agricultural University, Nanjing, Jiangsu, People’s Republic of China
| |
|
12
|
Sharma D, Sharma K, Mishra A, Siwach P, Mittal A, Jayaram B. Molecular dynamics simulation-based trinucleotide and tetranucleotide level structural and energy characterization of the functional units of genomic DNA. Phys Chem Chem Phys 2023; 25:7323-7337. [PMID: 36825435 DOI: 10.1039/d2cp04820e] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 02/12/2023]
Abstract
Genomes of most organisms on earth are written in a universal language of life, made up of four units - adenine (A), thymine (T), guanine (G), and cytosine (C), and understanding the way they are put together has been a great challenge to date. Multiple efforts have been made to annotate this wonderfully engineered string of DNA using different methods but they lack a universal character. In this article, we have investigated the structural and energetic profiles of both prokaryotes and eukaryotes by considering two essential genomic sites, viz., the transcription start sites (TSS) and exon-intron boundaries. We have characterized these sites by mapping the structural and energy features of DNA obtained from molecular dynamics simulations, which considers all possible trinucleotide and tetranucleotide steps. For DNA, these physicochemical properties show distinct signatures at the TSS and intron-exon boundaries. Our results firmly convey the idea that DNA uses the same dialect for prokaryotes and eukaryotes and that it is worth going beyond sequence-level analyses to physicochemical space to determine the functional destiny of DNA sequences.
Affiliation(s)
- Dinesh Sharma
- Supercomputing Facility for Bioinformatics & Computational Biology, Kusuma School of Biological Sciences, Indian Institute of Technology, Delhi, India
| | - Kopal Sharma
- Supercomputing Facility for Bioinformatics & Computational Biology, Kusuma School of Biological Sciences, Indian Institute of Technology, Delhi, India
| | - Akhilesh Mishra
- Supercomputing Facility for Bioinformatics & Computational Biology, Kusuma School of Biological Sciences, Indian Institute of Technology, Delhi, India
| | - Priyanka Siwach
- Department of Biotechnology, Chaudhary Devi Lal University, Sirsa, Haryana, India
| | - Aditya Mittal
- Supercomputing Facility for Bioinformatics & Computational Biology, Kusuma School of Biological Sciences, Indian Institute of Technology, Delhi, India
| | - B Jayaram
- Supercomputing Facility for Bioinformatics & Computational Biology, Kusuma School of Biological Sciences, Indian Institute of Technology, Delhi, India; Department of Chemistry, Indian Institute of Technology, Delhi, India.
| |
|
13
|
Xu SQ, Wang X, Xu L, Wang KX, Jiang YH, Zhang FY, Hong Q, He J, Liu SJ, Qiu JG. The MocR family transcriptional regulator DnfR has multiple binding sites and regulates Dirammox gene transcription in Alcaligenes faecalis JQ135. Environ Microbiol 2023; 25:675-688. [PMID: 36527381 DOI: 10.1111/1462-2920.16318] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/14/2022] [Accepted: 12/14/2022] [Indexed: 12/23/2022]
Abstract
Microbial ammonia oxidation is vital to the nitrogen cycle. A biological process, called Dirammox (direct ammonia oxidation, NH3→NH2OH→N2), has been recently identified in Alcaligenes ammonioxydans and Alcaligenes faecalis. However, its transcriptional regulatory mechanism has not yet been fully elucidated. The present study characterized a new MocR-like transcription factor DnfR that is involved in the Dirammox process in A. faecalis strain JQ135. The entire dnf cluster was composed of 10 genes and transcribed as five transcriptional units, that is, dnfIH, dnfR, dnfG, dnfABCDE and dnfF. DnfR activates the transcription of dnfIH, dnfG and dnfABCDE genes, and represses its own transcription. The intact 1506-bp dnfR gene was required for activation of Dirammox. Electrophoretic mobility shift assays and DNase I footprinting analyses showed that DnfR has one binding site in the dnfH-dnfR intergenic region and two binding sites in the dnfG-dnfA intergenic region. Three binding sites of DnfR shared a 6-bp repeated conserved sequence 5'-GGTCTG-N17-GGTCTG-3' which was essential for the transcription of downstream target genes. Cysteine and glutamate act as possible effectors of DnfR to activate the transcription of transcriptional units of dnfG and dnfABCDE, respectively. This study provided new insights in the transcriptional regulation mechanism of Dirammox by DnfR in A. faecalis JQ135.
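As an illustration of the conserved repeat quoted in the abstract above (GGTCTG, 17 arbitrary bases, GGTCTG), the following minimal Python sketch scans a sequence for exact matches. The function name, the forward-strand-only search, and the example sequence are assumptions made here for illustration; they are not the authors' method, which relied on EMSA and DNase I footprinting.

```python
import re

# Pattern built from the consensus quoted in the abstract: GGTCTG-N17-GGTCTG.
# The lookahead lets overlapping candidate sites be reported as well.
DNFR_LIKE_SITE = re.compile(r"(?=(GGTCTG[ACGT]{17}GGTCTG))")


def find_dnfr_like_sites(sequence):
    """Return 0-based start positions of exact GGTCTG-N17-GGTCTG matches.

    Hypothetical helper: only the given (forward) strand is searched and
    no mismatches are tolerated.
    """
    seq = sequence.upper()
    return [match.start() for match in DNFR_LIKE_SITE.finditer(seq)]


# Made-up example: 3 flanking bases, the repeat with a 17-nt spacer, 2 more bases.
example = "AAA" + "GGTCTG" + "ACGTACGTACGTACGTA" + "GGTCTG" + "TT"
print(find_dnfr_like_sites(example))
```

A real search would probably also scan the reverse complement and allow a small number of mismatches, since binding sites rarely match a consensus exactly.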
Affiliation(s)
- Si-Qiong Xu
- Key Laboratory of Agricultural and Environmental Microbiology, Ministry of Agriculture and Rural Affairs, College of Life Sciences, Nanjing Agricultural University, Nanjing, China
| | - Xiao Wang
- Key Laboratory of Agricultural and Environmental Microbiology, Ministry of Agriculture and Rural Affairs, College of Life Sciences, Nanjing Agricultural University, Nanjing, China
| | - Lu Xu
- Key Laboratory of Agricultural and Environmental Microbiology, Ministry of Agriculture and Rural Affairs, College of Life Sciences, Nanjing Agricultural University, Nanjing, China
| | - Ke-Xin Wang
- Key Laboratory of Agricultural and Environmental Microbiology, Ministry of Agriculture and Rural Affairs, College of Life Sciences, Nanjing Agricultural University, Nanjing, China
| | - Yin-Hu Jiang
- Key Laboratory of Agricultural and Environmental Microbiology, Ministry of Agriculture and Rural Affairs, College of Life Sciences, Nanjing Agricultural University, Nanjing, China
| | - Fu-Yin Zhang
- Key Laboratory of Agricultural and Environmental Microbiology, Ministry of Agriculture and Rural Affairs, College of Life Sciences, Nanjing Agricultural University, Nanjing, China
| | - Qing Hong
- Key Laboratory of Agricultural and Environmental Microbiology, Ministry of Agriculture and Rural Affairs, College of Life Sciences, Nanjing Agricultural University, Nanjing, China
| | - Jian He
- Key Laboratory of Agricultural and Environmental Microbiology, Ministry of Agriculture and Rural Affairs, College of Life Sciences, Nanjing Agricultural University, Nanjing, China
| | - Shuang-Jiang Liu
- State Key Laboratory of Microbial Resources, and Environmental Microbiology Research Center at Institute of Microbiology, Chinese Academy of Sciences, Beijing, China
- State Key Laboratory of Microbial Technology, Shandong University, Qingdao, China
| | - Ji-Guo Qiu
- Key Laboratory of Agricultural and Environmental Microbiology, Ministry of Agriculture and Rural Affairs, College of Life Sciences, Nanjing Agricultural University, Nanjing, China
| |
|
14
|
Explainable artificial intelligence as a reliable annotator of archaeal promoter regions. Sci Rep 2023; 13:1763. [PMID: 36720898 PMCID: PMC9889792 DOI: 10.1038/s41598-023-28571-7] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 11/08/2022] [Accepted: 01/20/2023] [Indexed: 02/02/2023] Open
Abstract
Archaea are a vast and unexplored cellular domain that thrive in a high diversity of environments, having central roles in processes mediating global carbon and nutrient fluxes. For these organisms to balance their metabolism, the appropriate regulation of their gene expression is essential. A key momentum in regulating genes responsible for the life maintenance of archaea is when transcription factor proteins bind to the promoter element. This DNA segment is conserved, which enables its exploration by machine learning techniques. Here, we trained and tested a support vector machine with 3935 known archaeal promoter sequences. All promoter sequences were coded into DNA Duplex Stability. After, we performed a model interpretation task to map the decision pattern of the classification procedure. We also used a dataset of known-promoter sequences for validation. Our results showed that an AT rich region around position -27 upstream (relative to the start TSS) is the most conserved in the analyzed organisms. In addition, we were able to identify the BRE element (-33), the PPE (at -10) and a position at +3, that provides a more understandable picture of how promoters are organized in all the archaeal organisms. Finally, we used the interpreted model to identify potential promoter sequences of 135 unannotated organisms, delivering regulatory regions annotation of archaea in a scale never accomplished before (https://pcyt.unam.mx/gene-regulation/). We consider that this approach will be useful to understand how gene regulation is achieved in other organisms apart from the already established transcription factor binding sites.
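The abstract above states that promoter sequences were "coded into DNA Duplex Stability" before classification but does not give the encoding. One common way to do this, sketched below as an assumption rather than as the authors' procedure, is to replace each dinucleotide step with a nearest-neighbor free-energy value. The table holds approximate unified nearest-neighbor values (kcal/mol at 37 °C) and should be checked against a primary thermodynamics source before any real use; the function name is hypothetical.

```python
# Approximate nearest-neighbor free energies (kcal/mol, 37 degrees C) for each
# dinucleotide step read 5'->3'. Illustrative values only (an assumption here);
# the study summarized above may have used a different table.
NEAREST_NEIGHBOR_DG = {
    "AA": -1.00, "TT": -1.00,
    "AT": -0.88,
    "TA": -0.58,
    "CA": -1.45, "TG": -1.45,
    "GT": -1.44, "AC": -1.44,
    "CT": -1.28, "AG": -1.28,
    "GA": -1.30, "TC": -1.30,
    "CG": -2.17,
    "GC": -2.24,
    "GG": -1.84, "CC": -1.84,
}


def encode_duplex_stability(sequence):
    """Map a DNA sequence of length n to n-1 numeric stability values.

    Hypothetical helper: each overlapping dinucleotide is looked up in the
    table above; characters other than A, C, G, T raise a ValueError.
    """
    seq = sequence.upper()
    values = []
    for i in range(len(seq) - 1):
        step = seq[i:i + 2]
        if step not in NEAREST_NEIGHBOR_DG:
            raise ValueError(f"unsupported dinucleotide {step!r} at position {i}")
        values.append(NEAREST_NEIGHBOR_DG[step])
    return values


# AT-rich stretches give less negative (less stable) values than GC-rich ones.
print(encode_duplex_stability("TATAAA"))
print(encode_duplex_stability("GCGCGC"))
```

The resulting fixed-length numeric vectors are the kind of input a support vector machine or neural network can consume directly, which is why such an encoding suits the classification task described.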
|
15
|
A Four-Step Platform to Optimize Growth Conditions for High-Yield Production of Siderophores in Cyanobacteria. Metabolites 2023; 13:metabo13020154. [PMID: 36837773 PMCID: PMC9967094 DOI: 10.3390/metabo13020154] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/08/2022] [Revised: 01/16/2023] [Accepted: 01/17/2023] [Indexed: 01/22/2023] Open
Abstract
In response to iron deprivation and in specific environmental conditions, the cyanobacterium Anabaena flos aquae produces siderophores, iron-chelating molecules that, by virtue of their interesting environmental and clinical applications, have recently been gaining the interest of the pharmaceutical industry. Yields of siderophore recovery from in vitro producing cyanobacterial cultures are, unfortunately, very low and most of the time reach only analytical quantities. We here propose a four-step experimental pipeline for a rapid and inexpensive identification and optimization of growth parameters influencing, at the transcriptional level, siderophore production in Anabaena flos aquae. The four-step pipeline consists of: (1) identification of the promoter region of the operon of interest in the genome of Anabaena flos aquae; (2) cloning of the promoter in a recombinant DNA vector, upstream of the cDNA coding for the Green Fluorescent Protein (GFP), followed by its stable transformation in Escherichia coli; (3) identification of the environmental parameters affecting expression of the gene in Escherichia coli and their application to the cultivation of the Anabaena strain; (4) identification of siderophores by the combined use of high-resolution tandem mass spectrometry and molecular networking. This multidisciplinary, sustainable, and green pipeline is amenable to automation and is virtually applicable to any cyanobacteria or, more generally, to any microorganism.
|
16
|
Abstract
As the global burden of antibiotic resistance continues to grow, creative approaches to antibiotic discovery are needed to accelerate the development of novel medicines. A rapidly progressing computational revolution, artificial intelligence, offers an optimistic path forward due to its ability to alleviate bottlenecks in the antibiotic discovery pipeline. In this review, we discuss how advancements in artificial intelligence are reinvigorating the adoption of past antibiotic discovery models, namely natural product exploration and small molecule screening. We then explore the application of contemporary machine learning approaches to emerging areas of antibiotic discovery, including antibacterial systems biology, drug combination development, antimicrobial peptide discovery, and mechanism of action prediction. Lastly, we propose a call to action for open access of high-quality screening datasets and interdisciplinary collaboration to accelerate the rate at which machine learning models can be trained and new antibiotic drugs can be developed.
Affiliation(s)
- Telmah Lluka
- Department of Biochemistry and Biomedical Sciences, Michael G. DeGroote Institute for Infectious Disease Research, David Braley Centre for Antibiotic Discovery, McMaster University, Hamilton, Ontario, Canada
| | - Jonathan M Stokes
- Department of Biochemistry and Biomedical Sciences, Michael G. DeGroote Institute for Infectious Disease Research, David Braley Centre for Antibiotic Discovery, McMaster University, Hamilton, Ontario, Canada
| |
|
17
|
Survey of mycobacterial fluoroquinolone resistance protein conservon (mfp conservon) in Mycobacteriaceae and identification of its promoter activity. GENE REPORTS 2022. [DOI: 10.1016/j.genrep.2022.101684] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/21/2022]
|
18
|
Park SK, Mohr G, Yao J, Russell R, Lambowitz AM. Group II intron-like reverse transcriptases function in double-strand break repair. Cell 2022; 185:3671-3688.e23. [PMID: 36113466 PMCID: PMC9530004 DOI: 10.1016/j.cell.2022.08.014] [Citation(s) in RCA: 6] [Impact Index Per Article: 3.0] [Received: 03/27/2022] [Revised: 06/16/2022] [Accepted: 08/14/2022] [Indexed: 01/26/2023]
Abstract
Bacteria encode reverse transcriptases (RTs) of unknown function that are closely related to group II intron-encoded RTs. We found that a Pseudomonas aeruginosa group II intron-like RT (G2L4 RT) with YIDD instead of YADD at its active site functions in DNA repair in its native host and when expressed in Escherichia coli. G2L4 RT has biochemical activities strikingly similar to those of human DNA repair polymerase θ and uses them for translesion DNA synthesis and double-strand break repair (DSBR) via microhomology-mediated end-joining (MMEJ). We also found that a group II intron RT can function similarly in DNA repair, with reciprocal active-site substitutions showing isoleucine favors MMEJ and alanine favors primer extension in both enzymes. These DNA repair functions utilize conserved structural features of non-LTR-retroelement RTs, including human LINE-1 and other eukaryotic non-LTR-retrotransposon RTs, suggesting such enzymes may have inherent ability to function in DSBR in a wide range of organisms.
Affiliation(s)
- Seung Kuk Park
- Departments of Molecular Biosciences and Oncology, University of Texas at Austin, Austin, TX 78712, USA
| | - Georg Mohr
- Departments of Molecular Biosciences and Oncology, University of Texas at Austin, Austin, TX 78712, USA
| | - Jun Yao
- Departments of Molecular Biosciences and Oncology, University of Texas at Austin, Austin, TX 78712, USA
| | - Rick Russell
- Departments of Molecular Biosciences and Oncology, University of Texas at Austin, Austin, TX 78712, USA
| | - Alan M Lambowitz
- Departments of Molecular Biosciences and Oncology, University of Texas at Austin, Austin, TX 78712, USA.
| |
|
19
|
Coppens L, Wicke L, Lavigne R. SAPPHIRE.CNN: Implementation of dRNA-seq-driven, species-specific promoter prediction using convolutional neural networks. Comput Struct Biotechnol J 2022; 20:4969-4974. [PMID: 36147675 PMCID: PMC9478156 DOI: 10.1016/j.csbj.2022.09.006] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Received: 06/06/2022] [Revised: 09/03/2022] [Accepted: 09/05/2022] [Indexed: 11/22/2022] Open
Abstract
Data availability is a consistent bottleneck for the development of bacterial species-specific promoter prediction software. In this work we leverage genome-wide promoter datasets generated with dRNA-seq in the Gram-negative bacteria Pseudomonas aeruginosa and Salmonella enterica for promoter prediction. Convolutional neural networks are presented as an optimal architecture for model training and are further modified and tailored for promoter prediction. The resulting predictors reach high binary accuracies (95% and 94.9%) on test sets and outperform each other when predicting promoters in their associated species. SAPPHIRE.CNN is available online and can also be downloaded to run locally. Our results indicate a dependency of binary promoter classification on an organism’s GC content and a decreased performance of our classifiers on genera they were not trained for, further supporting the need for dedicated, species-specific promoter classification tools.
Affiliation(s)
- Lucas Coppens
- Department of Bioengineering and Imperial College Centre for Synthetic Biology, Imperial College London, London, UK; Laboratory of Gene Technology, Department of Biosystems, KU Leuven, Kasteelpark Arenberg 21, Box 2462, 3001 Leuven, Belgium
| | - Laura Wicke
- Laboratory of Gene Technology, Department of Biosystems, KU Leuven, Kasteelpark Arenberg 21, Box 2462, 3001 Leuven, Belgium; Institute for Molecular Infection Biology (IMIB), Medical Faculty, University of Würzburg, Josef-Schneider-Straße 2, 97080 Würzburg, Germany
| | - Rob Lavigne
- Laboratory of Gene Technology, Department of Biosystems, KU Leuven, Kasteelpark Arenberg 21, Box 2462, 3001 Leuven, Belgium
| |
|
20
|
Chhaya A, Sharma A, Dattu Hade M, Kaur J, Dikshit KL. Transcript analysis and expression of the glbO gene, encoding truncated hemoglobin O, of M. smegmatis implicate its role under hypoxia and oxidative stress. Gene X 2022; 841:146759. [PMID: 35933051 DOI: 10.1016/j.gene.2022.146759] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/10/2022] [Accepted: 07/24/2022] [Indexed: 12/12/2022] Open
Abstract
Although truncated hemoglobin O (trHbO) is ubiquitous among mycobacteria, its physiological function is not very obvious and may be diverse. In an attempt to understand the role of trHbO in cellular metabolism of a non-pathogenic mycobacterium, we analysed the expression profile of the glbO gene, encoding trHbO, in M. smegmatis and studied implications of its overexpression on the physiology of its host under different environmental conditions. Quantitative RT-PCR indicated that the transcript level of the glbO gene remains low at a basal level during the aerobic growth cycle of M. smegmatis but is induced significantly during low oxygen, oxidative stress and macrophage infection. Overexpression of the glbO gene enhanced growth of M. smegmatis under hypoxia, promoted pellicle biofilm formation and provided resistance towards oxidative stress. Additionally, glbO gene overexpressing M. smegmatis exhibited enhanced cell survival over isogenic control cells and altered the level of pro- and anti-inflammatory cytokines during intracellular infection. These results suggested an important role of trHbO in supporting the cellular metabolism and survival of M. smegmatis under both low oxygen and oxidative stress.
Affiliation(s)
- Ajay Chhaya
- Department of Biotechnology, Panjab University, Chandigarh 160014, India
| | - Aashish Sharma
- Department of Biotechnology, Panjab University, Chandigarh 160014, India
| | - Mangesh Dattu Hade
- Department of Biotechnology, Panjab University, Chandigarh 160014, India
| | - Jagdeep Kaur
- Department of Biotechnology, Panjab University, Chandigarh 160014, India
| | - Kanak L Dikshit
- Department of Biotechnology, Panjab University, Chandigarh 160014, India.
| |
|
21
|
Dall'Alba G, Casa PL, Abreu FPD, Notari DL, de Avila E Silva S. A Survey of Biological Data in a Big Data Perspective. BIG DATA 2022; 10:279-297. [PMID: 35394342 DOI: 10.1089/big.2020.0383] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Indexed: 06/14/2023]
Abstract
The amount of available data is continuously growing. This phenomenon promotes a new concept, named big data. The highlight technologies related to big data are cloud computing (infrastructure) and Not Only SQL (NoSQL; data storage). In addition, for data analysis, machine learning algorithms such as decision trees, support vector machines, artificial neural networks, and clustering techniques present promising results. In a biological context, big data has many applications due to the large number of biological databases available. Some limitations of biological big data are related to the inherent features of these data, such as high degrees of complexity and heterogeneity, since biological systems provide information from an atomic level to interactions between organisms or their environment. Such characteristics make most bioinformatic-based applications difficult to build, configure, and maintain. Although the rise of big data is relatively recent, it has contributed to a better understanding of the underlying mechanisms of life. The main goal of this article is to provide a concise and reliable survey of the application of big data-related technologies in biology. As such, some fundamental concepts of information technology, including storage resources, analysis, and data sharing, are described along with their relation to biological data.
Affiliation(s)
- Gabriel Dall'Alba
- Computational Biology and Bioinformatics Laboratory, Biotechnology Institute, Department of Life Sciences, University of Caxias do Sul, Caxias do Sul, Brazil
- Genome Science and Technology Program, Faculty of Science, The University of British Columbia, Vancouver, Canada
| | - Pedro Lenz Casa
- Computational Biology and Bioinformatics Laboratory, Biotechnology Institute, Department of Life Sciences, University of Caxias do Sul, Caxias do Sul, Brazil
| | - Fernanda Pessi de Abreu
- Computational Biology and Bioinformatics Laboratory, Biotechnology Institute, Department of Life Sciences, University of Caxias do Sul, Caxias do Sul, Brazil
| | - Daniel Luis Notari
- Computational Biology and Bioinformatics Laboratory, Biotechnology Institute, Department of Life Sciences, University of Caxias do Sul, Caxias do Sul, Brazil
| | - Scheila de Avila E Silva
- Computational Biology and Bioinformatics Laboratory, Biotechnology Institute, Department of Life Sciences, University of Caxias do Sul, Caxias do Sul, Brazil
| |
|
22
|
Machine learning and statistics shape a novel path in archaeal promoter annotation. BMC Bioinformatics 2022; 23:171. [PMID: 35538405 PMCID: PMC9087966 DOI: 10.1186/s12859-022-04714-x] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/16/2021] [Accepted: 05/05/2022] [Indexed: 11/29/2022] Open
Abstract
Background: Archaea are a vast and unexplored domain. Bioinformatic techniques might enlighten the path to a higher quality genome annotation in varied organisms. Promoter sequences of archaea have the action of a plethora of proteins upon it. The conservation found in a structural level of the binding site of proteins such as TBP, TFB, and TFE aids RNAP-DNA stabilization and makes the archaeal promoter prone to be explored by statistical and machine learning techniques. Results and discussions: In this study, experimentally verified promoter sequences of the organisms Haloferax volcanii, Sulfolobus solfataricus, and Thermococcus kodakarensis were converted into DNA duplex stability attributes (i.e. numerical variables) and were classified through Artificial Neural Networks and an in-house statistical method of classification, being tested with three forms of controls. The recognition of these promoters enabled its use to validate unannotated promoter sequences in other organisms. As a result, the binding site of basal transcription factors was located through a DNA duplex stability codification. Additionally, the classification presented satisfactory results (above 90%) among varied levels of control. Concluding remarks: The classification models were employed to perform genomic annotation into the archaea Aciduliprofundum boonei and Thermofilum pendens, from which potential promoters have been identified and uploaded into public repositories. Supplementary Information: The online version contains supplementary material available at 10.1186/s12859-022-04714-x.
|
23
|
Zill D, Lettau E, Lorent C, Seifert F, Singh P, Lauterbach L. Crucial role of the chaperonin GroES/EL for heterologous production of the soluble methane monooxygenase from Methylomonas methanica MC09. Chembiochem 2022; 23:e202200195. [PMID: 35385600 PMCID: PMC9324122 DOI: 10.1002/cbic.202200195] [Citation(s) in RCA: 6] [Impact Index Per Article: 3.0] [Received: 04/05/2022] [Indexed: 11/15/2022]
Abstract
Methane is a widespread energy source and can serve as an attractive C1 building block for a future bioeconomy. The soluble methane monooxygenase (sMMO) is able to break the strong C−H bond of methane and convert it to methanol. The high structural complexity, multiplex cofactors, and unfamiliar folding or maturation procedures of sMMO have hampered the heterologous production and thus biotechnological applications. Here, we demonstrate the heterologous production of active sMMO from the marine Methylomonas methanica MC09 in Escherichia coli by co‐synthesizing the GroES/EL chaperonin. Iron determination, electron paramagnetic resonance spectroscopy, and native gel immunoblots revealed the incorporation of the non‐heme diiron centre and homodimer formation of active sMMO. The production of recombinant sMMO will enable the expansion of the possibilities of detailed studies, allowing for a variety of novel biotechnological applications.
Affiliation(s)
- Domenic Zill
- RWTH Aachen Fakultät für Mathematik Informatik und Naturwissenschaften: Rheinisch Westfalische Technische Hochschule Aachen Fakultat fur Mathematik Informatik und Naturwissenschaften, Institute of Applied Microbiology, GERMANY
- Elisabeth Lettau
- RWTH Aachen Faculty of Mathematics Computer Science and Natural Sciences: Rheinisch Westfalische Technische Hochschule Aachen Fakultat fur Mathematik Informatik und Naturwissenschaften, Institute of Applied Microbiology, GERMANY
- Christian Lorent
- TU Berlin: Technische Universitat Berlin, Institute for Chemistry, GERMANY
- Franziska Seifert
- Martin-Luther-Universität Halle-Wittenberg: Martin-Luther-Universitat Halle-Wittenberg, Institut für Pharmazeutische Technologie und Biopharmazie, GERMANY
- Praveen Singh
- RWTH Aachen Faculty of Mathematics Computer Science and Natural Sciences: Rheinisch Westfalische Technische Hochschule Aachen Fakultat fur Mathematik Informatik und Naturwissenschaften, Institute of Applied Microbiology, GERMANY
- Lars Lauterbach
- RWTH Aachen University: Rheinisch-Westfalische Technische Hochschule Aachen, Institute of Applied Microbiology, Worringer Weg 1, 52074, Aachen, GERMANY
|
24
|
Ye Q, Shin E, Lee C, Choi N, Kim Y, Yoon KS, Lee SJ. Transposition of insertion sequences by dielectric barrier discharge plasma and gamma irradiation in the radiation-resistant bacterium Deinococcus geothermalis. J Microbiol Methods 2022; 196:106473. [DOI: 10.1016/j.mimet.2022.106473] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/26/2022] [Revised: 04/19/2022] [Accepted: 04/19/2022] [Indexed: 12/27/2022]
|
25
|
Xu H, Yang C, Tian X, Chen Y, Liu WQ, Li J. Regulatory Part Engineering for High-Yield Protein Synthesis in an All-Streptomyces-Based Cell-Free Expression System. ACS Synth Biol 2022; 11:570-578. [PMID: 35129330 DOI: 10.1021/acssynbio.1c00587] [Citation(s) in RCA: 12] [Impact Index Per Article: 6.0] [Indexed: 11/30/2022]
Abstract
Streptomyces-based cell-free expression systems have been developed to meet the demand for synthetic biology applications. However, protein yields from the previous Streptomyces systems are relatively low, and there is a serious limitation of available genetic tools such as plasmids for gene (co)expression. Here, we sought to expand the plasmid toolkit with a focus on the enhancement of protein production. By screening native promoters and ribosome binding sites, we were able to construct a panel of plasmids with different abilities for protein synthesis, which covered a nearly 3-fold range of protein yields. Using the most efficient plasmid, the protein yield reached up to a maximum value of 515.7 ± 25.3 μg/mL. With the plasmid toolkit, we anticipate that our Streptomyces cell-free system will offer great opportunities for cell-free synthetic biology applications such as in vitro biosynthesis of valuable natural products when cell-based systems remain difficult or not amenable.
Affiliation(s)
- Huiling Xu
- School of Physical Science and Technology, ShanghaiTech University, Shanghai 201210, China
- Chen Yang
- School of Physical Science and Technology, ShanghaiTech University, Shanghai 201210, China
- Xintong Tian
- School of Physical Science and Technology, ShanghaiTech University, Shanghai 201210, China
- Yilin Chen
- School of Physical Science and Technology, ShanghaiTech University, Shanghai 201210, China
- Wan-Qiu Liu
- School of Physical Science and Technology, ShanghaiTech University, Shanghai 201210, China
- Jian Li
- School of Physical Science and Technology, ShanghaiTech University, Shanghai 201210, China
|
26
|
Zhang M, Jia C, Li F, Li C, Zhu Y, Akutsu T, Webb GI, Zou Q, Coin LJM, Song J. Critical assessment of computational tools for prokaryotic and eukaryotic promoter prediction. Brief Bioinform 2022; 23:6502561. [PMID: 35021193 PMCID: PMC8921625 DOI: 10.1093/bib/bbab551] [Citation(s) in RCA: 6] [Impact Index Per Article: 3.0] [Received: 08/09/2021] [Revised: 11/12/2021] [Accepted: 11/30/2021] [Indexed: 01/13/2023] Open
Abstract
Promoters are crucial regulatory DNA regions for gene transcriptional activation. Rapid advances in next-generation sequencing technologies have accelerated the accumulation of genome sequences, providing increased training data to inform computational approaches for both prokaryotic and eukaryotic promoter prediction. However, it remains a significant challenge to accurately identify species-specific promoter sequences using computational approaches. To advance computational support for promoter prediction, in this study, we curated 58 comprehensive, up-to-date, benchmark datasets for 7 different species (i.e. Escherichia coli, Bacillus subtilis, Homo sapiens, Mus musculus, Arabidopsis thaliana, Zea mays and Drosophila melanogaster) to assist the research community to assess the relative functionality of alternative approaches and support future research on both prokaryotic and eukaryotic promoters. We revisited 106 predictors published since 2000 for promoter identification (40 for prokaryotic promoter, 61 for eukaryotic promoter, and 5 for both). We systematically evaluated their training datasets, computational methodologies, calculated features, performance and software usability. On the basis of these benchmark datasets, we benchmarked 19 predictors with functioning webservers/local tools and assessed their prediction performance. We found that deep learning and traditional machine learning-based approaches generally outperformed scoring function-based approaches. Taken together, the curated benchmark dataset repository and the benchmarking analysis in this study serve to inform the design and implementation of computational approaches for promoter prediction and facilitate more rigorous comparison of new techniques in the future.
Affiliation(s)
- Cangzhi Jia
- School of Science, Dalian Maritime University, Dalian 116026, China (corresponding author)
- Geoffrey I Webb
- Department of Data Science and Artificial Intelligence, Monash University, Melbourne, VIC 3800, Australia; Monash Data Futures Institute, Monash University, Melbourne, VIC 3800, Australia
- Quan Zou
- Institute of Fundamental and Frontier Sciences, University of Electronic Science and Technology of China, Chengdu, China (corresponding author)
- Lachlan J M Coin
- Department of Microbiology and Immunology, The Peter Doherty Institute for Infection and Immunity, The University of Melbourne, 792 Elizabeth Street, Melbourne, Victoria 3000, Australia (corresponding author)
- Jiangning Song
- Biomedicine Discovery Institute and Department of Biochemistry and Molecular Biology, Monash University, Melbourne, VIC 3800, Australia (corresponding author)
|
27
|
Bhukya R, Kumari A, Amilpur S, Dasari CM. PPred-PCKSM: A multi-layer predictor for identifying promoter and its variants using position based features. Comput Biol Chem 2022; 97:107623. [DOI: 10.1016/j.compbiolchem.2022.107623] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 07/31/2021] [Revised: 01/02/2022] [Accepted: 01/05/2022] [Indexed: 11/03/2022]
|
28
|
Current and emerging tools of computational biology to improve the detoxification of mycotoxins. Appl Environ Microbiol 2021; 88:e0210221. [PMID: 34878810 DOI: 10.1128/aem.02102-21] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.7] [Indexed: 01/07/2023] Open
Abstract
Biological organisms carry a rich potential for removing toxins from our environment, but identifying suitable candidates and improving them remain challenging. We explore the use of computational tools to discover strains and enzymes that detoxify harmful compounds. In particular, we focus on mycotoxins (fungi-produced toxins that contaminate food and feed) and on biological enzymes that are capable of rendering them less harmful. We discuss the use of established and novel computational tools to complement existing empirical data in three directions: discovering the prospect of detoxification among underexplored organisms, finding important cellular processes that contribute to detoxification, and improving the performance of detoxifying enzymes. We hope to create a synergistic conversation between researchers in computational biology and those in the bioremediation field. We showcase open bioremediation questions where computational researchers can contribute and highlight relevant existing and emerging computational tools that could benefit bioremediation researchers.
|
29
|
Chevez-Guardado R, Peña-Castillo L. Promotech: a general tool for bacterial promoter recognition. Genome Biol 2021; 22:318. [PMID: 34789306 PMCID: PMC8597233 DOI: 10.1186/s13059-021-02514-9] [Citation(s) in RCA: 16] [Impact Index Per Article: 5.3] [Received: 10/31/2020] [Accepted: 10/11/2021] [Indexed: 12/14/2022] Open
Abstract
Promoters are genomic regions where the transcription machinery binds to initiate the transcription of specific genes. Computational tools for identifying bacterial promoters have been around for decades. However, most of these tools were designed to recognize promoters in one or few bacterial species. Here, we present Promotech, a machine-learning-based method for promoter recognition in a wide range of bacterial species. We compare Promotech's performance with the performance of five other promoter prediction methods. Promotech outperforms these other programs in terms of area under the precision-recall curve (AUPRC) or precision at the same level of recall. Promotech is available at https://github.com/BioinformaticsLabAtMUN/PromoTech .
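Note on the metric: the area under the precision-recall curve (AUPRC) mentioned in this abstract is often estimated with average precision. The following is a minimal sketch of that estimator, not the evaluation code of the cited work; the function name `average_precision` and the toy labels and scores are hypothetical, and tied scores are not treated specially.

```python
from typing import Sequence


def average_precision(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Estimate AUPRC as the mean of precision values at each true positive.

    labels: 1 for a true promoter, 0 for a non-promoter (toy encoding).
    scores: classifier scores, higher meaning "more promoter-like".
    """
    if len(labels) != len(scores):
        raise ValueError("labels and scores must have the same length")
    total_positives = sum(labels)
    if total_positives == 0:
        raise ValueError("at least one positive label is required")

    # Rank predictions from the highest to the lowest score.
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)

    true_positives = 0
    precision_sum = 0.0
    for rank, (_, label) in enumerate(ranked, start=1):
        if label:
            true_positives += 1
            precision_sum += true_positives / rank  # precision at this recall step
    return precision_sum / total_positives


# Toy example: two promoters ranked first and third among four candidates.
ap = average_precision([1, 0, 1, 0], [0.9, 0.8, 0.7, 0.1])
print(round(ap, 4))
```

Unlike accuracy, this summary remains informative when non-promoter windows vastly outnumber promoters, which is the usual situation in genome-wide scans.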
Affiliation(s)
- Ruben Chevez-Guardado
- Department of Computer Science, Memorial University of Newfoundland, 230 Elizabeth Ave, St. John's, Newfoundland, A1C 5S7, Canada
- Lourdes Peña-Castillo
- Department of Computer Science, Memorial University of Newfoundland, 230 Elizabeth Ave, St. John's, Newfoundland, A1C 5S7, Canada; Department of Biology, Memorial University of Newfoundland, 230 Elizabeth Ave, St. John's, Newfoundland, A1C 5S7, Canada
|
30
|
A Genome-Scale Antibiotic Screen in Serratia marcescens Identifies YdgH as a Conserved Modifier of Cephalosporin and Detergent Susceptibility. Antimicrob Agents Chemother 2021; 65:e0078621. [PMID: 34491801 DOI: 10.1128/aac.00786-21] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Indexed: 11/20/2022] Open
Abstract
Serratia marcescens, a member of the order Enterobacterales, is adept at colonizing health care environments and is an important cause of invasive infections. Antibiotic resistance is a daunting problem in S. marcescens because, in addition to plasmid-mediated mechanisms, most isolates have considerable intrinsic resistance to multiple antibiotic classes. To discover endogenous modifiers of antibiotic susceptibility in S. marcescens, a high-density transposon insertion library was subjected to sub-MICs of two cephalosporins, cefoxitin, and cefepime, as well as the fluoroquinolone ciprofloxacin. Comparisons of transposon insertion abundance before and after antibiotic exposure identified hundreds of potential modifiers of susceptibility to these agents. Using single-gene deletions, we validated several candidate modifiers of cefoxitin susceptibility and chose ydgH, a gene of unknown function, for further characterization. In addition to cefoxitin, deletion of ydgH in S. marcescens resulted in decreased susceptibility to multiple third-generation cephalosporins and, in contrast, to increased susceptibility to both cationic and anionic detergents. YdgH is highly conserved throughout the Enterobacterales, and we observed similar phenotypes in Escherichia coli O157:H7 and Enterobacter cloacae mutants. YdgH is predicted to localize to the periplasm, and we speculate that it may be involved there in cell envelope homeostasis. Collectively, our findings provide insight into chromosomal mediators of antibiotic resistance in S. marcescens and will serve as a resource for further investigations of this important pathogen.
|
31
|
Martinez-Hernandez F, Diop A, Garcia-Heredia I, Bobay LM, Martinez-Garcia M. Unexpected myriad of co-occurring viral strains and species in one of the most abundant and microdiverse viruses on Earth. ISME Journal 2021; 16:1025-1035. [PMID: 34775488 DOI: 10.1038/s41396-021-01150-2] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Received: 04/29/2021] [Revised: 10/15/2021] [Accepted: 10/28/2021] [Indexed: 11/09/2022]
Abstract
Viral genetic microdiversity drives adaptation, pathogenicity, and speciation and has critical consequences for the viral-host arms race occurring at the strain and species levels, which ultimately impact microbial community structure and biogeochemical cycles. Despite the fact that most efforts have focused on viral macrodiversity, little is known about the microdiversity of ecologically important viruses on Earth. Recently, single-virus genomics discovered the putatively most abundant ocean virus in temperate and tropical waters: the uncultured dsDNA virus vSAG 37-F6 infecting Pelagibacter, the most abundant marine bacteria. In this study, we report the cooccurrence of up to ≈1,500 different viral strains (>95% nucleotide identity) and ≈30 related species (80-95% nucleotide identity) in a single oceanic sample. Viral microdiversity was maintained over space and time, and most alleles were the result of synonymous mutations without any apparent adaptive benefits to cope with host translation codon bias and efficiency. Gene flow analysis used to delimitate species according to the biological species concept (BSC) revealed the impact of recombination in shaping vSAG 37-F6 virus and Pelagibacter speciation. Data demonstrated that this large viral microdiversity somehow mirrors the host species diversity since ≈50% of the 926 analyzed Pelagibacter genomes were found to belong to independent BSC species that do not significantly engage in gene flow with one another. The host range of this evolutionarily successful virus revealed that a single viral species can infect multiple Pelagibacter BSC species, indicating that this virus crosses not only formal BSC barriers but also biomes since viral ancestors are found in freshwater.
Affiliation(s)
- Awa Diop
- Department of Biology, University of North Carolina at Greensboro, Greensboro, USA
- Louis-Marie Bobay
- Department of Biology, University of North Carolina at Greensboro, Greensboro, USA
- Manuel Martinez-Garcia
- Department of Physiology, Genetics, and Microbiology, University of Alicante, Alicante, Spain.
|
32
|
Imported One-Day-Old Chicks as Trojan Horses for Multidrug-Resistant Priority Pathogens Harboring mcr-9, rmtG and Extended-Spectrum β-Lactamase Genes. Appl Environ Microbiol 2021; 88:e0167521. [PMID: 34731047 DOI: 10.1128/aem.01675-21] [Citation(s) in RCA: 10] [Impact Index Per Article: 3.3] [Indexed: 11/20/2022] Open
Abstract
Antimicrobial resistance is a critical issue that is no longer restricted to hospital settings, but also represents a growing problem involving intensive animal production systems. In this study, we have performed a microbiological and molecular investigation of priority pathogens carrying transferable resistance genes to critical antimicrobials in one-day-old chickens imported from Brazil to Uruguay. Bacterial identification was performed by MALDI-TOF mass spectrometry and antibiotic susceptibility was determined by Sensititre. Antimicrobial resistance genes were sought by polymerase chain reaction and clonality was assessed by PFGE. Four multidrug-resistant (MDR) representative strains were sequenced by Illumina and/or Oxford Nanopore Technologies. Twenty-eight MDR isolates were identified as Escherichia coli (n = 14), Enterobacter cloacae (n = 11) and Klebsiella pneumoniae (n = 3). While resistance to oxyiminocephalosporins was due to blaCTX-M-2, blaCTX-M-8, blaCTX-M-15, blaCTX-M-55 and blaCMY-2, plasmid-mediated quinolone resistance was associated with qnrB19, qnrE1, and qnrB2 genes. Finally, resistance to aminoglycosides and fosfomycin was due to the presence of 16S rRNA methyltransferase rmtG and fosA-type genes, respectively. Short and long-read genome sequencing of E. cloacae ODC-Eclo3 strain revealed the presence of IncQ/rmtG (pUR-EC3.1, 7,400-bp), IncHI2A/mcr-9.1/blaCTX-M-2 [pUR-EC3.2, ST16 (pMLST), 408,436-bp] and IncN2/qnrB19/aacC3/aph(3'')-Ib (pUR-EC3.3) resistance plasmids. Strikingly, the blaCTX-M-2 gene was carried by a novel Tn1696-like composite transposon designated Tn7337. In summary, we report that imported one-day-old chicks can act as Trojan horses for the hidden spread of WHO critical priority MDR pathogens harboring mcr-9, rmtG and extended-spectrum β-lactamase genes in poultry farms, which is a critical issue within a One Health perspective.
Importance: Antimicrobial resistance is considered a significant problem for global health, including within the concept of "One Health"; the food chain is a link that directly connects human and animal health. In this work, we searched for microorganisms resistant to antibiotics considered critical for human health in the intestinal microbiota of one-day-old baby chicks imported to Uruguay from Brazil. We describe genes conferring resistance to antibiotics that the WHO classifies as Watch or Reserve, such as rmtG and mcr-9.1, which confer resistance to all aminoglycosides and to colistin, respectively, among other genes, and their presence in new mobile genetic elements that favor their dissemination. The sustained entry of these microorganisms evades the sanitary measures implemented by countries and production establishments to reduce the selection of resistant microorganisms. These silently imported resistant microorganisms could explain a considerable part of the antimicrobial resistance problems found in the production stages of the system.
|
33
|
Martinez GS, Sarkar S, Kumar A, Pérez‐Rueda E, de Avila e Silva S. Characterization of promoters in archaeal genomes based on DNA structural parameters. Microbiologyopen 2021; 10:e1230. [PMID: 34713600 PMCID: PMC8553660 DOI: 10.1002/mbo3.1230] [Citation(s) in RCA: 7] [Impact Index Per Article: 2.3] [Received: 07/15/2021] [Revised: 07/27/2021] [Accepted: 07/29/2021] [Indexed: 11/10/2022] Open
Abstract
The transcription machinery of archaea can be roughly described as a simplified version of that of eukaryotic organisms. The basal transcription factor machinery binds to the TATA box found around 28 nucleotides upstream of the transcription start site; however, some transcription units lack a clear TATA box and still have TBP/TFB binding over them. This apparent absence of conserved sequences could be a consequence of sequence divergence associated with the upstream region, operon, and gene organization. Furthermore, earlier studies have found that a structural analysis yields more information than a simple sequence inspection. In this work, we evaluated and coded 3630 archaeal promoter sequences of three organisms, Haloferax volcanii, Thermococcus kodakarensis, and Sulfolobus solfataricus, into DNA duplex stability, enthalpy, curvature, and bendability parameters. We also split our dataset into conserved TATA and degenerated TATA promoters to identify differences between these two classes of promoters. The structural analysis reveals variations in archaeal promoter architecture, that is, a distinctive signal is observed in the TFB, TBP, and TFE binding sites independently of these being TATA-conserved or TATA-degenerated. In addition, the promoter-finding method was validated with upstream regions of 13 other archaea, suggesting that there might be promoter sequences among them. Therefore, we suggest a novel method for locating promoters within the genome of archaea based on DNA energetic/structural features.
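Note on the encoding: converting a promoter sequence into a DNA duplex stability profile, as this abstract describes, is commonly done with a nearest-neighbor model averaged over a sliding window. The sketch below illustrates that general idea only. The free-energy table is an illustrative assumption (kcal/mol, loosely following published nearest-neighbor tables) and is not confirmed to be the parameter set of the cited work; the function name `stability_profile` and the window size are hypothetical.

```python
# Illustrative nearest-neighbor free energies (kcal/mol) per dinucleotide step.
# These values are assumptions for demonstration; substitute a vetted table.
NN_DG = {
    "AA": -1.00, "TT": -1.00, "AT": -0.88, "TA": -0.58,
    "CA": -1.45, "TG": -1.45, "GT": -1.44, "AC": -1.44,
    "CT": -1.28, "AG": -1.28, "GA": -1.30, "TC": -1.30,
    "CG": -2.17, "GC": -2.24, "GG": -1.84, "CC": -1.84,
}


def stability_profile(seq: str, window: int = 15) -> list[float]:
    """Return the mean free energy of each window of dinucleotide steps.

    Less negative values indicate a less stable (easier to melt) duplex.
    Raises KeyError if the sequence contains characters other than A, C, G, T.
    """
    seq = seq.upper()
    steps = [NN_DG[seq[i:i + 2]] for i in range(len(seq) - 1)]
    if window < 1 or window > len(steps):
        raise ValueError("window must be between 1 and the number of steps")
    return [
        sum(steps[i:i + window]) / window
        for i in range(len(steps) - window + 1)
    ]


# A/T-rich stretches, such as a TATA box, appear as less negative values.
print(stability_profile("GCGCGCTATATATAGCGCGC", window=5))
```

Each window value can then serve as one numerical attribute of the sequence for a downstream classifier or statistical test.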
Affiliation(s)
- Sharmilee Sarkar
- Department of Molecular Biology and Biotechnology, Tezpur University, Tezpur, Assam, India
- Aditya Kumar
- Department of Molecular Biology and Biotechnology, Tezpur University, Tezpur, Assam, India
- Ernesto Pérez‐Rueda
- Unidad Académica de Yucatán, Instituto de Investigaciones en Matemáticas Aplicadas y en Sistemas, Universidad Nacional Autónoma de México, Mérida, Yucatán, México
|
34
|
Wang CY, Liu LC, Wu YC, Zhang YX. Identification and Validation of Four Novel Promoters for Gene Engineering with Broad Suitability across Species. J Microbiol Biotechnol 2021; 31:1154-1162. [PMID: 34226414 PMCID: PMC9706022 DOI: 10.4014/jmb.2103.03049] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/29/2021] [Revised: 06/24/2021] [Accepted: 06/27/2021] [Indexed: 12/15/2022]
Abstract
The transcriptional capacities of target genes are strongly influenced by promoters, whereas few studies have focused on the development of robust, high-performance and cross-species promoters for wide application in different bacteria. In this work, four novel promoters (Pk.rtufB, Pk.r1, Pk.r2, and Pk.r3) were predicted from Ketogulonicigenium robustum and their inconsistency in the -10 and -35 region nucleotide sequences indicated they were different promoters. Their activities were evaluated by using green fluorescent protein (gfp) as a reporter in different species of bacteria, including K. vulgare SPU B805, Pseudomonas putida KT2440, Paracoccus denitrificans PD1222, Bacillus licheniformis and Raoultella ornithinolytica, due to their importance in metabolic engineering. Our results showed that the four promoters had different activities, with Pk.r1 showing the strongest activity in almost all of the experimental bacteria. By comparison with the commonly used promoters of E. coli (tufB, lac, lacUV5), K. vulgare (Psdh, Psndh) and P. putida KT2440 (JE111411), the four promoters showed significant differences due to only 12.62% nucleotide similarities, and relatively higher ability in regulating target gene expression. Further validation experiments confirmed their ability in initiating the target minCD cassette because of the shape changes under the promoter regulation. The overexpression of sorbose dehydrogenase and cytochrome c551 by Pk.r1 and Pk.r2 resulted in a 22.75% enhancement of 2-KGA yield, indicating their potential for practical application in metabolic engineering. This study demonstrates an example of applying bioinformatics to find new biological components for gene operation and provides four novel promoters with broad suitability, which enriches the usable range of promoters to realize accurate regulation in different genetic backgrounds.
Affiliation(s)
- Cai-Yun Wang
- School of Life Science and Biopharmaceutics, Shenyang Pharmaceutical University, Shenyang 110016, P.R. China
- Li-Cheng Liu
- School of Life Science and Biopharmaceutics, Shenyang Pharmaceutical University, Shenyang 110016, P.R. China
- Ying-Cai Wu
- School of Life Science and Biopharmaceutics, Shenyang Pharmaceutical University, Shenyang 110016, P.R. China
- Yi-Xuan Zhang
- School of Life Science and Biopharmaceutics, Shenyang Pharmaceutical University, Shenyang 110016, P.R. China (corresponding author; phone: +86-024-43520921)
|
35
|
Martinez GS, de Ávila e Silva S, Kumar A, Pérez-Rueda E. DNA structural and physical properties reveal peculiarities in promoter sequences of the bacterium Escherichia coli K-12. SN Applied Sciences 2021. [DOI: 10.1007/s42452-021-04713-2] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/20/2022] Open
Abstract
Gene transcription in bacteria starts with a promoter sequence being recognized by a transcription factor found in the RNAP enzyme. This process is assisted by the conservation of nucleotides as well as by other factors governing these intergenic regions. Accordingly, encoding genetic information as physical properties of the DNA, such as enthalpy, stability, and base-pair stacking, could indicate promoter activity and help differentiate promoter from non-promoter data. In this work, a total of 3131 promoter sequences associated with six different sigma factors in the bacterium E. coli were converted into numeric attributes, and a strong set of control sequences, consisting of shuffled versions of the original sequences as well as coding regions, is provided. The parameterized genetic information was then normalized and exhaustively analyzed through statistical tests. The results suggest that strong signals in the promoter sequences match the binding sites of transcription factor proteins, indicating that promoter activity is well represented by its conversion into physical attributes. Moreover, the features tested in this report showed significant differences between promoter and control data, enabling these features to be employed in bacterial promoter classification. The results produced here may aid in bacterial promoter recognition by providing a robust set of biological inferences.
|
36
|
Wilson EH, Groom JD, Sarfatis MC, Ford SM, Lidstrom ME, Beck DAC. A Computational Framework for Identifying Promoter Sequences in Nonmodel Organisms Using RNA-seq Data Sets. ACS Synth Biol 2021; 10:1394-1405. [PMID: 33988977 DOI: 10.1021/acssynbio.1c00017] [Citation(s) in RCA: 11] [Impact Index Per Article: 3.7] [Indexed: 01/30/2023]
Abstract
Engineering microorganisms into biological factories that convert renewable feedstocks into valuable materials is a major goal of synthetic biology; however, for many nonmodel organisms, we do not yet have the genetic tools, such as suites of strong promoters, necessary to effectively engineer them. In this work, we developed a computational framework that can leverage standard RNA-seq data sets to identify sets of constitutive, strongly expressed genes and predict strong promoter signals within their upstream regions. The framework was applied to a diverse collection of RNA-seq data measured for the methanotroph Methylotuvimicrobium buryatense 5GB1 and identified 25 genes that were constitutively, strongly expressed across 12 experimental conditions. For each gene, the framework predicted short (27-30 nucleotide) sequences as candidate promoters and derived -35 and -10 consensus promoter motifs (TTGACA and TATAAT, respectively) for strong expression in M. buryatense. This consensus closely matches the canonical E. coli sigma-70 motif and was found to be enriched in promoter regions of the genome. A subset of promoter predictions was experimentally validated in a XylE reporter assay, including the consensus promoter, which showed high expression. The pmoC, pqqA, and ssrA promoter predictions were additionally screened in an experiment that scrambled the -35 and -10 signal sequences, confirming that transcription initiation was disrupted when these specific regions of the predicted sequence were altered. These results indicate that the computational framework can make biologically meaningful promoter predictions and identify key pieces of regulatory systems that can serve as foundational tools for engineering diverse microorganisms for biomolecule production.
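Note on the motifs: the -35 and -10 consensus hexamers quoted in this abstract (TTGACA and TATAAT) lend themselves to a simple scan of an upstream region. The following is a minimal sketch of such a scan, not the framework described in the cited work; the function names, the allowed spacer range of 15 to 19 nucleotides, and the mismatch tolerance are assumptions chosen for illustration.

```python
MINUS_35 = "TTGACA"  # -35 consensus quoted in the abstract
MINUS_10 = "TATAAT"  # -10 consensus quoted in the abstract


def mismatches(a: str, b: str) -> int:
    """Count positions at which two equal-length strings differ."""
    return sum(x != y for x, y in zip(a, b))


def find_promoter_like_sites(seq, max_mismatches=1, spacers=range(15, 20)):
    """Return (start of -35 hexamer, spacer length, total mismatches) per hit.

    max_mismatches applies to each hexamer separately. The spacer range is an
    assumption for illustration, not a value taken from the abstract.
    """
    seq = seq.upper()
    hits = []
    for i in range(len(seq) - 6 + 1):
        m35 = mismatches(seq[i:i + 6], MINUS_35)
        if m35 > max_mismatches:
            continue
        for spacer in spacers:
            j = i + 6 + spacer  # start of the candidate -10 hexamer
            if j + 6 > len(seq):
                break
            m10 = mismatches(seq[j:j + 6], MINUS_10)
            if m10 <= max_mismatches:
                hits.append((i, spacer, m35 + m10))
    return hits


# Synthetic example: both consensus hexamers separated by a 17-nt spacer.
example = "GC" + MINUS_35 + "A" * 17 + MINUS_10 + "GC"
print(find_promoter_like_sites(example))  # prints [(2, 17, 0)]
```

A consensus scan of this kind only flags candidates; the abstract reports that predictions were additionally checked in a reporter assay.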
Affiliation(s)
- Erin H. Wilson
- The Paul G. Allen School of Computer Science & Engineering, University of Washington, Seattle, Washington 98195, United States
- Joseph D. Groom
- Department of Chemical Engineering, University of Washington, Seattle, Washington 98195, United States
- M. Claire Sarfatis
- Department of Microbiology, University of Washington, Seattle, Washington 98195, United States
- Stephanie M. Ford
- Department of Chemical Engineering, University of Washington, Seattle, Washington 98195, United States
- Mary E. Lidstrom
- Department of Chemical Engineering, University of Washington, Seattle, Washington 98195, United States
- Department of Microbiology, University of Washington, Seattle, Washington 98195, United States
- David A. C. Beck
- Department of Chemical Engineering, University of Washington, Seattle, Washington 98195, United States
- eScience Institute, University of Washington, Seattle, Washington 98195, United States
|
37
|
Mishra A, Dhanda S, Siwach P, Aggarwal S, Jayaram B. A novel method SEProm for prokaryotic promoter prediction based on DNA structure and energetics. Bioinformatics 2020; 36:2375-2384. [PMID: 31909789 DOI: 10.1093/bioinformatics/btz941] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.0] [Received: 05/24/2019] [Revised: 11/08/2019] [Accepted: 01/02/2020] [Indexed: 11/13/2022] Open
Abstract
MOTIVATION Despite conservation in general architecture of promoters and protein-DNA interaction interface of RNA polymerases among various prokaryotes, identification of promoter regions in the whole genome sequences remains a daunting challenge. The available tools for promoter prediction do not seem to address the problem satisfactorily, apparently because the biochemical nature of promoter signals is yet to be understood fully. Using 28 structural and 3 energetic parameters, we found that prokaryotic promoter regions have a unique structural and energy state, quite distinct from that of coding regions and the information for this signature state is in-built in their sequences. We developed a novel promoter prediction tool from these 31 parameters using various statistical techniques. RESULTS Here, we introduce SEProm, a novel tool that is developed by studying and utilizing the in-built structural and energy information of DNA sequences, which is applicable to all prokaryotes including archaea. Compared to five most recent, diverged and current best available tools, SEProm performs much better, predicting promoters with an 'F-value' of 82.04 and 'Precision' of 81.08. The next best 'F-value' was obtained with PromPredict (72.14) followed by BProm (68.37). On the basis of 'Precision' value, the next best 'Precision' was observed for Pepper (75.39) followed by PromPredict (72.01). SEProm maintained the lead even when comparison was done on two test organisms (not involved in training for SEProm). AVAILABILITY AND IMPLEMENTATION The software is freely available with easy to follow instructions (www.scfbio-iitd.res.in/software/TSS_Predict.jsp). SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
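The abstract above reports an 'F-value' and a 'Precision' but no recall. If one assumes the 'F-value' is the usual F1 score (the harmonic mean of precision and recall), recall can be back-calculated. This is only a sketch under that assumption; the abstract does not define its 'F-value', and the helper name `recall_from_f1` is hypothetical.

```python
def recall_from_f1(f1, precision):
    """Solve F1 = 2PR / (P + R) for R, giving R = F1 * P / (2P - F1).

    Works on any consistent scale (fractions or percentages).
    """
    denominator = 2 * precision - f1
    if denominator <= 0:
        raise ValueError("No positive recall is consistent with these values")
    return f1 * precision / denominator


# Values quoted in the abstract, on a percentage scale.
implied_recall = recall_from_f1(82.04, 81.08)
print(round(implied_recall, 2))
```

Under this assumption the implied recall is close to the reported precision, which is what an F1 near both values would suggest.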
Affiliation(s)
- Akhilesh Mishra
  - Supercomputing Facility for Bioinformatics & Computational Biology
  - Kusuma School of Biological Sciences, Indian Institute of Technology, New Delhi 110016, India
- Sahil Dhanda
  - Supercomputing Facility for Bioinformatics & Computational Biology
- Priyanka Siwach
  - Supercomputing Facility for Bioinformatics & Computational Biology
  - Department of Biotechnology, Chaudhary Devi Lal University, Sirsa 125055, India
- Shruti Aggarwal
  - Supercomputing Facility for Bioinformatics & Computational Biology
- B Jayaram
  - Supercomputing Facility for Bioinformatics & Computational Biology
  - Kusuma School of Biological Sciences, Indian Institute of Technology, New Delhi 110016, India
  - Department of Chemistry, Indian Institute of Technology, New Delhi 110016, India
|
38
|
Abstract
The promoter region is a key element required for the production of RNA in bacteria. While new high-throughput technology allows massively parallel mapping of promoter elements, we still mainly rely on bioinformatics tools to predict such elements in bacterial genomes. Additionally, despite many different prediction tools having become popular to identify bacterial promoters, no systematic comparison of such tools has been performed. Here, we performed a systematic comparison between several widely used promoter prediction tools (BPROM, bTSSfinder, BacPP, CNNProm, IBBP, Virtual Footprint, iPro70-FMWin, 70ProPred, iPromoter-2L, and MULTiPly) using well-defined sequence data sets and standardized metrics to determine how well those tools performed relative to each other. For this, we used data sets of experimentally validated promoters from Escherichia coli and a control data set composed of randomly generated sequences with similar nucleotide distributions. We compared the performance of the tools using metrics such as specificity, sensitivity, accuracy, and Matthews correlation coefficient (MCC).
We show that the widely used BPROM presented the worst performance among the compared tools, while four tools (CNNProm, iPro70-FMWin, 70ProPred, and iPromoter-2L) offered high predictive power. Of these tools, iPro70-FMWin exhibited the best results for most of the metrics used. We present here some potentials and limitations of available tools, and we hope that future work can build upon our effort to systematically characterize this useful class of bioinformatics tools. IMPORTANCE The correct mapping of promoter elements is a crucial step in microbial genomics. Also, when combining new DNA elements into synthetic sequences, predicting the potential generation of new promoter sequences is critical. Over the last years, many bioinformatics tools have been created to allow users to predict promoter elements in a sequence or genome of interest. Here, we assess the predictive power of some of the main prediction tools available using well-defined promoter data sets. Using Escherichia coli as a model organism, we demonstrated that while some tools are biased toward AT-rich sequences, others are very efficient in identifying real promoters with low false-negative rates. We hope the potentials and limitations presented here will help the microbiology community to choose promoter prediction tools among many available alternatives.
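The four metrics named in this abstract (specificity, sensitivity, accuracy, and MCC) all follow from a binary confusion matrix. The sketch below uses the standard textbook definitions; the function name `classification_metrics` and the example counts are hypothetical and are not taken from the study.

```python
import math


def classification_metrics(tp, fp, tn, fn):
    """Compute standard binary-classification metrics from confusion counts.

    tp: promoters predicted as promoters      fn: promoters missed
    tn: controls predicted as non-promoters   fp: controls called promoters
    Undefined ratios (zero denominators) are reported as 0.0.
    """
    total = tp + fp + tn + fn
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    specificity = tn / (tn + fp) if (tn + fp) else 0.0
    accuracy = (tp + tn) / total if total else 0.0
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "accuracy": accuracy,
        "mcc": mcc,
    }


# Made-up counts for a tool tested on 100 promoters and 100 random controls.
print(classification_metrics(tp=90, fp=20, tn=80, fn=10))
```

MCC is useful alongside accuracy because it stays near zero for a tool that labels almost everything as a promoter, which is the kind of AT-rich bias the abstract mentions.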
|
39
|
Liu X, Guo Z, He T, Ren M. Prediction and analysis of prokaryotic promoters based on sequence features. Biosystems 2020; 197:104218. [PMID: 32755610 DOI: 10.1016/j.biosystems.2020.104218] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Received: 03/25/2020] [Revised: 07/03/2020] [Accepted: 07/21/2020] [Indexed: 10/23/2022]
Abstract
Promoter recognition is an important but difficult part of functional genomic annotation. Many studies have been carried out to address this issue. However, they still cannot meet application needs. Most of the methods exhibit specificity, and the objects analyzed are relatively simple, especially for prokaryotes. Hence, more research on prokaryotic promoters is needed. In this study, the similarity between gene expression and the transmission of information inspired us to analyze promoter sequences by calculating the information content of the sequences and the correlation between sequences in the subregion. We also calculated other sequence features as supplements, such as the Hurst exponent, GC content, and sequence bending property. Then, we employed an artificial neural network to build a classifier and applied it to identify promoters in three organisms, Escherichia coli, Bacillus subtilis, and Pseudomonas aeruginosa. The experiments on the benchmark test set indicate that our method has good capability to distinguish promoters from randomly selected nonpromoters. The maximal AUC for the classifier is 0.90, and the minimal AUC score is 0.80. Additionally, cross-species experiments were conducted. The cross-species experiments on the three organisms yielded an AUC of 0.8, suggesting that our approach has better generalization ability, which is conducive to revealing the more common characteristics of prokaryotic promoters.
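Two of the sequence features mentioned in this abstract, GC content and information content, can be illustrated with a few lines of code. This is a minimal sketch that uses per-sequence Shannon entropy as a stand-in for "information content"; the abstract does not give the authors' exact formulas, and the function names `gc_content` and `shannon_entropy` are hypothetical.

```python
import math
from collections import Counter


def gc_content(seq):
    """Fraction of bases in `seq` that are G or C (0.0 for an empty string)."""
    seq = seq.upper()
    if not seq:
        return 0.0
    return sum(base in "GC" for base in seq) / len(seq)


def shannon_entropy(seq):
    """Shannon entropy of the base composition, in bits per position.

    Ranges from 0 (a single repeated base) to 2 (all four bases equally
    frequent). Used here as an assumed proxy for information content.
    """
    seq = seq.upper()
    if not seq:
        return 0.0
    counts = Counter(seq)
    n = len(seq)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


candidate = "TTGACATATAAT"
features = [gc_content(candidate), shannon_entropy(candidate)]
print(features)
```

A feature vector like `features` is the kind of input a neural-network classifier could take, alongside the other descriptors the abstract lists.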
Affiliation(s)
- Xiao Liu
  - School of Microelectronics and Communication Engineering, Chongqing University, 174 ShaPingBa District, Chongqing, 400044, China
- Zhirui Guo
  - School of Microelectronics and Communication Engineering, Chongqing University, 174 ShaPingBa District, Chongqing, 400044, China
- Ting He
  - School of Microelectronics and Communication Engineering, Chongqing University, 174 ShaPingBa District, Chongqing, 400044, China
- Meixiang Ren
  - School of Microelectronics and Communication Engineering, Chongqing University, 174 ShaPingBa District, Chongqing, 400044, China
|
40
|
Ren H, Shi C, Zhao H. Computational Tools for Discovering and Engineering Natural Product Biosynthetic Pathways. iScience 2020; 23:100795. [PMID: 31926431 PMCID: PMC6957853 DOI: 10.1016/j.isci.2019.100795] [Citation(s) in RCA: 34] [Impact Index Per Article: 8.5] [Received: 10/24/2019] [Revised: 11/24/2019] [Accepted: 12/19/2019] [Indexed: 01/09/2023] Open
Abstract
Natural products (NPs), also known as secondary metabolites, are produced in bacteria, fungi, and plants. NPs represent a rich source of antibacterial, antifungal, and anticancer agents. Recent advances in DNA sequencing technologies and bioinformatics unveiled nature's great potential for synthesizing numerous NPs that may confer unprecedented structural and biological features. However, discovering novel bioactive NPs by genome mining remains a challenge. Moreover, even with interesting bioactivity, the low productivity of many NPs significantly limits their practical applications. Here we discuss the progress in developing bioinformatics tools for efficient discovery of bioactive NPs. In addition, we highlight computational methods for optimizing the productivity of NPs of pharmaceutical importance.
Affiliation(s)
- Hengqian Ren
  - Department of Chemical and Biomolecular Engineering, University of Illinois at Urbana-Champaign, Urbana, IL 61801, USA
- Chengyou Shi
  - Department of Chemical and Biomolecular Engineering, University of Illinois at Urbana-Champaign, Urbana, IL 61801, USA
- Huimin Zhao
  - Department of Chemical and Biomolecular Engineering, University of Illinois at Urbana-Champaign, Urbana, IL 61801, USA
  - Departments of Chemistry, Biochemistry, and Bioengineering, Carl R. Woese Institute for Genomic Biology, University of Illinois at Urbana-Champaign, Urbana, IL 61801, USA
  - Center for Advanced Bioenergy and Bioproducts Innovation, University of Illinois at Urbana-Champaign, Urbana, IL 61801, USA
|
41
|
Lenzini L, Di Patti F, Livi R, Fondi M, Fani R, Mengoni A. A Method for the Structure-Based, Genome-Wide Analysis of Bacterial Intergenic Sequences Identifies Shared Compositional and Functional Features. Genes (Basel) 2019; 10:genes10100834. [PMID: 31652625 PMCID: PMC6826451 DOI: 10.3390/genes10100834] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/26/2019] [Revised: 10/07/2019] [Accepted: 10/16/2019] [Indexed: 11/16/2022] Open
Abstract
In this paper, we propose a computational strategy for performing genome-wide analyses of intergenic sequences in bacterial genomes. Following the direction of a previous paper, in which a method for genome-wide analysis of eukaryotic intergenic sequences was proposed, we developed a tool that implements similar concepts for bacterial genomes. This allows us to (i) classify intergenic sequences into clusters characterized by specific global structural features and (ii) draw possible relations with their functional features.
Affiliation(s)
- Leonardo Lenzini
  - Dipartimento di Fisica e Astronomia, Università degli Studi di Firenze, Sesto Fiorentino, 50019, Italy
  - Istituto Nazionale di Fisica Nucleare, Sesto Fiorentino, 50019, Italy
- Francesca Di Patti
  - Dipartimento di Fisica e Astronomia, Università degli Studi di Firenze, Sesto Fiorentino, 50019, Italy
  - Centro Interdipartimentale per lo Studio delle Dinamiche Complesse, Sesto Fiorentino, 50019, Italy
- Roberto Livi
  - Dipartimento di Fisica e Astronomia, Università degli Studi di Firenze, Sesto Fiorentino, 50019, Italy
  - Istituto Nazionale di Fisica Nucleare, Sesto Fiorentino, 50019, Italy
  - Centro Interdipartimentale per lo Studio delle Dinamiche Complesse, Sesto Fiorentino, 50019, Italy
  - Istituto dei Sistemi Complessi, Consiglio Nazionale delle Ricerche, Sesto Fiorentino, 50019, Italy
- Marco Fondi
  - Dipartimento di Biologia, Università degli Studi di Firenze, Sesto Fiorentino, 50019, Italy
- Renato Fani
  - Istituto dei Sistemi Complessi, Consiglio Nazionale delle Ricerche, Sesto Fiorentino, 50019, Italy
  - Dipartimento di Biologia, Università degli Studi di Firenze, Sesto Fiorentino, 50019, Italy
- Alessio Mengoni
  - Dipartimento di Biologia, Università degli Studi di Firenze, Sesto Fiorentino, 50019, Italy
|
42
|
Coelho RV, Dall'Alba G, de Avila E Silva S, Echeverrigaray S, Delamare APL. Toward Algorithms for Automation of Postgenomic Data Analyses: Bacillus subtilis Promoter Prediction with Artificial Neural Network. OMICS-A JOURNAL OF INTEGRATIVE BIOLOGY 2019; 24:300-309. [PMID: 31573385 DOI: 10.1089/omi.2019.0041] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.4] [Indexed: 12/20/2022]
Abstract
In the present postgenomic era, the capacity to generate big data has far exceeded the capacity to analyze, contextualize, and make sense of the data in clinical, biological, and ecological applications. There is a great unmet need for automation and algorithms to aid in analyses of big data, in biology in particular. In this context, it is noteworthy that computational methods used to analyze the regulation of bacterial gene expression have in the past focused mainly on Escherichia coli promoters due to the large amount of data available. The challenge and prospects of automation in prediction and recognition of bacterial sequences as promoters have not been properly addressed due to the promoter size and degenerate pattern. We report here an original neural network approach for recognition and prediction of Bacillus subtilis promoters. The artificial neural network took 767 B. subtilis promoter sequences as input, and the study also aimed to identify the architecture that provides the best prediction. Two multilayer perceptron neural network architectures offered the highest accuracy: one with five, and another with seven neurons in the hidden layer. These architectures achieved accuracies of 98.57% and 97.69%, respectively. The results collectively indicate the promise of the application of neural network approaches to the B. subtilis promoter recognition problem, while also suggesting the broader potential of algorithms for automation of data analyses in the postgenomic era.
Affiliation(s)
- Rafael Vieira Coelho
  - Farroupilha Campus, Rio Grande do Sul Federal Institute of Education, Science and Technology (IFRS), Farroupilha, Brazil
- Gabriel Dall'Alba
  - Biotechnology Institute, Caxias do Sul University (UCS), Caxias do Sul, Brazil
|
43
|
Image-based promoter prediction: a promoter prediction method based on evolutionarily generated patterns. Sci Rep 2018; 8:17695. [PMID: 30523308 PMCID: PMC6283834 DOI: 10.1038/s41598-018-36308-0] [Citation(s) in RCA: 16] [Impact Index Per Article: 2.7] [Received: 10/26/2017] [Accepted: 11/12/2018] [Indexed: 11/18/2022] Open
Abstract
Prediction of promoter regions is crucial for studying gene function and regulation. The well-accepted position weight matrix method for this purpose relies on predefined motifs, which would hinder application across different species. Here, we introduce image-based promoter prediction (IBPP) as a method that creates an “image” from training promoter sequences using an evolutionary approach and predicts promoters by matching with the “image”. We used Escherichia coli σ70 promoter sequences to test the performance of IBPP and the combination of IBPP and a support vector machine algorithm (IBPP-SVM). The “images” generated with IBPP could effectively distinguish promoter from non-promoter sequences. Compared with IBPP, IBPP-SVM showed a substantial improvement in sensitivity. Furthermore, both methods showed good performance for sequences of up to 2,000 nt in length. The performances of IBPP and IBPP-SVM were largely affected by the threshold and dimension of vectors, respectively. The source code and documentation are freely available at https://github.com/hahatcdg/IBPP.
|
44
|
Dall'Alba G, Casa PL, Notari DL, Adami AG, Echeverrigaray S, de Avila E Silva S. Analysis of the nucleotide content of Escherichia coli promoter sequences related to the alternative sigma factors. J Mol Recognit 2018; 32:e2770. [PMID: 30458580 DOI: 10.1002/jmr.2770] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.2] [Received: 08/08/2018] [Revised: 10/23/2018] [Accepted: 10/24/2018] [Indexed: 01/26/2023]
Abstract
Promoters are DNA sequences located upstream of the transcription start site of genes. In bacteria, the RNA polymerase enzyme requires additional subunits, called sigma factors (σ) to begin specific gene transcription in distinct environmental conditions. Currently, promoter prediction still poses many challenges due to the characteristics of these sequences. In this paper, the nucleotide content of Escherichia coli promoter sequences, related to five alternative σ factors, was analyzed by a machine learning technique in order to provide profiles according to the σ factor which recognizes them. For this, the clustering technique was applied since it is a viable method for finding hidden patterns on a data set. As a result, 20 groups of sequences were formed, and, aided by the Weblogo tool, it was possible to determine sequence profiles. These found patterns should be considered for implementing computational prediction tools. In addition, evidence was found of an overlap between the functions of the genes regulated by different σ factors, suggesting that DNA structural properties are also essential parameters for further studies.
Affiliation(s)
- Gabriel Dall'Alba
  - Department of Life Sciences, Universidade de Caxias do Sul, Caxias do Sul, Rio Grande do Sul, Brazil
- Pedro Lenz Casa
  - Department of Life Sciences, Universidade de Caxias do Sul, Caxias do Sul, Rio Grande do Sul, Brazil
- Daniel Luis Notari
  - Department of Exact Sciences, Universidade de Caxias do Sul, Caxias do Sul, Rio Grande do Sul, Brazil
- Andre Gustavo Adami
  - Department of Exact Sciences, Universidade de Caxias do Sul, Caxias do Sul, Rio Grande do Sul, Brazil
- Sergio Echeverrigaray
  - Department of Life Sciences, Universidade de Caxias do Sul, Caxias do Sul, Rio Grande do Sul, Brazil
- Scheila de Avila E Silva
  - Department of Exact Sciences, Universidade de Caxias do Sul, Caxias do Sul, Rio Grande do Sul, Brazil
|
45
|
Coelho RV, de Avila E Silva S, Echeverrigaray S, Delamare APL. Bacillus subtilis promoter sequences data set for promoter prediction in Gram-positive bacteria. Data Brief 2018; 19:264-270. [PMID: 29892645 PMCID: PMC5993011 DOI: 10.1016/j.dib.2018.05.025] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.3] [Received: 01/29/2018] [Revised: 04/02/2018] [Accepted: 05/07/2018] [Indexed: 11/28/2022] Open
Abstract
This paper presents a prediction of Bacillus subtilis promoters using a Support Vector Machine system. In the literature, there is a lack of information on Gram-positive bacterial promoter sequences compared to Gram-negative bacteria. Promoter sequence identification is essential for studying gene expression. Initially, we collected the B. subtilis genome sequence from the NCBI database, and promoters were identified by their sigma factors in the DBTBS database. We then grouped the promoters according to 15 factors in 2 domains, corresponding to sigma 54 and sigma 70 of Gram-negative bacteria. Based on these data we developed a script in Python to search for promoters in the B. subtilis genome. After processing the data, we obtained 767 promoter sequences for B. subtilis, most of which were recognized by sigma SigA. To validate the data we found, we developed a software package called BacSVM+, which receives promoters as input and returns the best combination of parameters in a LibSVM library to predict promoter regions in the bacteria used in the simulation. All data gathered as well as the BacSVM+ software is available for download at http://bacpp.bioinfoucs.com/rafael/Sigmas.zip.
Affiliation(s)
- Rafael Vieira Coelho
  - Rio Grande do Sul Federal Institute of Education, Science and Technology (IFRS), Farroupilha Campus, Farroupilha, RS, Brazil
- Sergio Echeverrigaray
  - Biotechnology Institute, University of Caxias do Sul (UCS), Caxias do Sul, RS, Brazil
|
46
|
Chang SC, Lee CY. OpaR and RpoS are positive regulators of a virulence factor PrtA in Vibrio parahaemolyticus. Microbiology (Reading) 2018; 164:221-231. [DOI: 10.1099/mic.0.000591] [Citation(s) in RCA: 14] [Impact Index Per Article: 2.3] [Indexed: 01/14/2023] Open
Affiliation(s)
- San-Chi Chang
  - Department of Agricultural Chemistry, Microbiology Laboratory, National Taiwan University, Taipei, Taiwan, ROC
- Chia-Yin Lee
  - Department of Agricultural Chemistry, Microbiology Laboratory, National Taiwan University, Taipei, Taiwan, ROC
|
47
|
Shahmuradov IA, Mohamad Razali R, Bougouffa S, Radovanovic A, Bajic VB. bTSSfinder: a novel tool for the prediction of promoters in cyanobacteria and Escherichia coli. Bioinformatics 2017; 33:334-340. [PMID: 27694198 PMCID: PMC5408793 DOI: 10.1093/bioinformatics/btw629] [Citation(s) in RCA: 33] [Impact Index Per Article: 4.7] [Received: 07/21/2016] [Accepted: 09/27/2016] [Indexed: 12/01/2022] Open
Abstract
Motivation The computational search for promoters in prokaryotes remains an attractive problem in bioinformatics. Despite the attention it has received for many years, the problem has not been addressed satisfactorily. In any bacterial genome, the transcription start site is chosen mostly by the sigma (σ) factor proteins, which control the gene activation. The majority of published bacterial promoter prediction tools target σ70 promoters in Escherichia coli. Moreover, no σ-specific classification of promoters is available for prokaryotes other than for E. coli. Results Here, we introduce bTSSfinder, a novel tool that predicts putative promoters for five classes of σ factors in Cyanobacteria (σA, σC, σH, σG and σF) and for five classes of sigma factors in E. coli (σ70, σ38, σ32, σ28 and σ24). Compared to currently available tools, bTSSfinder achieves higher accuracy (MCC = 0.86, F1-score = 0.93, compared to the next best tool with MCC = 0.59, F1-score = 0.79) and covers multiple classes of promoters. Availability and Implementation bTSSfinder is available standalone and online at http://www.cbrc.kaust.edu.sa/btssfinder. Supplementary information Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Ilham Ayub Shahmuradov
  - Computational Bioscience Research Center (CBRC), Computer, Electrical and Mathematical Sciences and Engineering Division (CEMSE), 4700 King Abdullah University of Science and Technology (KAUST), Thuwal, Kingdom of Saudi Arabia
- Rozaimi Mohamad Razali
  - Computational Bioscience Research Center (CBRC), Computer, Electrical and Mathematical Sciences and Engineering Division (CEMSE), 4700 King Abdullah University of Science and Technology (KAUST), Thuwal, Kingdom of Saudi Arabia
- Salim Bougouffa
  - Computational Bioscience Research Center (CBRC), Computer, Electrical and Mathematical Sciences and Engineering Division (CEMSE), 4700 King Abdullah University of Science and Technology (KAUST), Thuwal, Kingdom of Saudi Arabia
- Aleksandar Radovanovic
  - Computational Bioscience Research Center (CBRC), Computer, Electrical and Mathematical Sciences and Engineering Division (CEMSE), 4700 King Abdullah University of Science and Technology (KAUST), Thuwal, Kingdom of Saudi Arabia
- Vladimir B Bajic
  - Computational Bioscience Research Center (CBRC), Computer, Electrical and Mathematical Sciences and Engineering Division (CEMSE), 4700 King Abdullah University of Science and Technology (KAUST), Thuwal, Kingdom of Saudi Arabia
|
48
|
Kernan T, West AC, Banta S. Characterization of endogenous promoters for control of recombinant gene expression in Acidithiobacillus ferrooxidans. Biotechnol Appl Biochem 2017; 64:793-802. [DOI: 10.1002/bab.1546] [Citation(s) in RCA: 17] [Impact Index Per Article: 2.4] [Received: 08/31/2016] [Accepted: 11/16/2016] [Indexed: 11/06/2022]
Affiliation(s)
- Timothy Kernan
  - Department of Physiology & Cellular Biophysics, Columbia University, New York, NY, USA
- Alan C. West
  - Department of Chemical Engineering, Columbia University, New York, NY, USA
- Scott Banta
  - Department of Chemical Engineering, Columbia University, New York, NY, USA
|
49
|
Kumar A, Manivelan V, Bansal M. Structural features of DNA are conserved in the promoter region of orthologous genes across different strains of Helicobacter pylori. FEMS Microbiol Lett 2016; 363:fnw207. [DOI: 10.1093/femsle/fnw207] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.8] [Accepted: 08/25/2016] [Indexed: 12/19/2022] Open
|
50
|
Abbas MM, Mohie-Eldin MM, EL-Manzalawy Y. Assessing the effects of data selection and representation on the development of reliable E. coli sigma 70 promoter region predictors. PLoS One 2015; 10:e0119721. [PMID: 25803493 PMCID: PMC4372424 DOI: 10.1371/journal.pone.0119721] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.9] [Received: 06/12/2014] [Accepted: 01/26/2015] [Indexed: 11/27/2022] Open
Abstract
As the number of sequenced bacterial genomes increases, the need for rapid and reliable tools for the annotation of functional elements (e.g., transcriptional regulatory elements) becomes more pressing. Promoters are the key regulatory elements, which recruit the transcriptional machinery through binding to a variety of regulatory proteins (known as sigma factors). The identification of the promoter regions is very challenging because these regions do not adhere to specific sequence patterns or motifs and are difficult to determine experimentally. Machine learning represents a promising and cost-effective approach for computational identification of prokaryotic promoter regions. However, the quality of the predictors depends on several factors including: i) training data; ii) data representation; iii) classification algorithms; iv) evaluation procedures. In this work, we create several variants of E. coli promoter data sets and utilize them to experimentally examine the effect of these factors on the predictive performance of E. coli σ70 promoter models. Our results suggest that under some combinations of the first three criteria, a prediction model might perform very well on cross-validation experiments while its performance on independent test data is very poor. This emphasizes the importance of evaluating promoter region predictors using independent test data, which corrects for the over-optimistic performance that might be estimated using the cross-validation procedure. Our analysis of the tested models shows that good prediction models often perform well regardless of how the non-promoter data was obtained. On the other hand, poor prediction models seem to be more sensitive to the choice of non-promoter sequences. Interestingly, the best performing sequence-based classifiers outperform the best performing structure-based classifiers on both cross-validation and independent test performance evaluation experiments.
Finally, we propose a meta-predictor method combining two top performing sequence-based and structure-based classifiers and compare its performance with some of the state-of-the-art E. coli σ70 promoter prediction methods.
Affiliation(s)
- Mostafa M. Abbas
  - KINDI Center for Computing Research, College of Engineering, Qatar University, Doha, Qatar
- Yasser EL-Manzalawy
  - Systems and Computer Engineering, Al-Azhar University, Cairo, Egypt
  - College of Information Sciences, Penn State University, University Park, United States of America
|