1
|
de Crécy-Lagard V, Dias R, Friedberg I, Yuan Y, Swairjo MA. Limitations of Current Machine-Learning Models in Predicting Enzymatic Functions for Uncharacterized Proteins. bioRxiv 2024:2024.07.01.601547. [PMID: 39005379 PMCID: PMC11244979 DOI: 10.1101/2024.07.01.601547] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 07/16/2024]
Abstract
Thirty to seventy percent of proteins in any given genome have no assigned function and have been labeled as the protein "unknome". This large knowledge gap prevents the biological community from fully leveraging the plethora of genomic data that is now available. Machine-learning approaches are showing some promise in propagating functional knowledge from experimentally characterized proteins to the correct set of isofunctional orthologs. However, they largely fail to predict enzymatic functions unseen in the training set, as shown by dissecting the predictions made for over 450 enzymes of unknown function from the model bacteria Escherichia coli using the DeepECTransformer platform. Lessons from these failures can help the community develop machine-learning methods that assist domain experts in making testable functional predictions for more members of the uncharacterized proteome.
Article Summary
Many proteins in any genome, ranging from 30 to 70%, lack an assigned function. This knowledge gap limits the full use of the vast available genomic data. Machine learning has shown promise in transferring functional knowledge from proteins of known functions to similar ones, but largely fails to predict novel functions not seen in its training data. Understanding these failures can guide the development of better machine-learning methods to help experts make accurate functional predictions for uncharacterized proteins.
|
2
|
Gong X, Zhang J, Gan Q, Teng Y, Hou J, Lyu Y, Liu Z, Wu Z, Dai R, Zou Y, Wang X, Zhu D, Zhu H, Liu T, Yan Y. Advancing microbial production through artificial intelligence-aided biology. Biotechnol Adv 2024; 74:108399. [PMID: 38925317 DOI: 10.1016/j.biotechadv.2024.108399] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/03/2024] [Revised: 05/20/2024] [Accepted: 06/23/2024] [Indexed: 06/28/2024]
Abstract
Microbial cell factories (MCFs) have been leveraged to construct sustainable platforms for value-added compound production. To optimize metabolism and reach optimal productivity, synthetic biology has developed various genetic devices to engineer microbial systems by gene editing, high-throughput protein engineering, and dynamic regulation. However, current synthetic biology methodologies still rely heavily on manual design, laborious testing, and exhaustive analysis. The emerging interdisciplinary field of artificial intelligence (AI) and biology has become pivotal in addressing the remaining challenges. AI-aided microbial production harnesses the power of processing, learning, and predicting vast amounts of biological data within seconds, providing outputs with high probability. With well-trained AI models, the conventional Design-Build-Test (DBT) cycle has been transformed into a multidimensional Design-Build-Test-Learn-Predict (DBTLP) workflow, leading to significantly improved operational efficiency and reduced labor consumption. Here, we comprehensively review the main components and recent advances in AI-aided microbial production, focusing on genome annotation, AI-aided protein engineering, artificial functional protein design, and AI-enabled pathway prediction. Finally, we discuss the challenges of integrating novel AI techniques into biology and propose the potential of large language models (LLMs) in advancing microbial production.
Affiliation(s)
- Xinyu Gong
- School of Chemical, Materials, and Biomedical Engineering, College of Engineering, The University of Georgia, Athens, GA 30602, USA
| | - Jianli Zhang
- School of Chemical, Materials, and Biomedical Engineering, College of Engineering, The University of Georgia, Athens, GA 30602, USA
| | - Qi Gan
- School of Chemical, Materials, and Biomedical Engineering, College of Engineering, The University of Georgia, Athens, GA 30602, USA
| | - Yuxi Teng
- School of Chemical, Materials, and Biomedical Engineering, College of Engineering, The University of Georgia, Athens, GA 30602, USA
| | - Jixin Hou
- School of ECAM, College of Engineering, University of Georgia, Athens, GA 30602, USA
| | - Yanjun Lyu
- Department of Computer Science and Engineering, The University of Texas at Arlington, Arlington 76019, USA
| | - Zhengliang Liu
- School of Computing, The University of Georgia, Athens, GA 30602, USA
| | - Zihao Wu
- School of Computing, The University of Georgia, Athens, GA 30602, USA
| | - Runpeng Dai
- Department of Biostatistics, University of North Carolina at Chapel Hill, Chapel Hill, NC 27599, USA
| | - Yusong Zou
- School of Chemical, Materials, and Biomedical Engineering, College of Engineering, The University of Georgia, Athens, GA 30602, USA
| | - Xianqiao Wang
- School of ECAM, College of Engineering, University of Georgia, Athens, GA 30602, USA
| | - Dajiang Zhu
- Department of Computer Science and Engineering, The University of Texas at Arlington, Arlington 76019, USA
| | - Hongtu Zhu
- Department of Biostatistics, University of North Carolina at Chapel Hill, Chapel Hill, NC 27599, USA
| | - Tianming Liu
- School of Computing, The University of Georgia, Athens, GA 30602, USA
| | - Yajun Yan
- School of Chemical, Materials, and Biomedical Engineering, College of Engineering, The University of Georgia, Athens, GA 30602, USA.
|
3
|
Glockow T, Kaster AK, Rabe KS, Niemeyer CM. Sustainable agriculture: leveraging microorganisms for a circular economy. Appl Microbiol Biotechnol 2024; 108:452. [PMID: 39212740 PMCID: PMC11364797 DOI: 10.1007/s00253-024-13294-0] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/12/2024] [Revised: 08/20/2024] [Accepted: 08/21/2024] [Indexed: 09/04/2024]
Abstract
Microorganisms serve as linchpins in agricultural systems. Classic examples include microbial composting for nutrient recovery, using microorganisms in biogas technology for agricultural waste utilization, and employing biofilters to reduce emissions from stables or improve water quality in aquaculture. This mini-review highlights the importance of microbiome analysis in understanding microbial diversity, dynamics, and functions, fostering innovations for a more sustainable agriculture. In this regard, customized microorganisms for soil improvement, replacements for harmful agrochemicals or antibiotics in animal husbandry, and (probiotic) additives in animal nutrition are already in or even beyond the testing phase for a large-scale conventional agriculture. Additionally, as climate change reduces arable land, new strategies based on closed-loop systems and controlled environment agriculture, emphasizing microbial techniques, are being developed for regional food production. These strategies aim to secure the future food supply and pave the way for a sustainable, resilient, and circular agricultural economy.
KEY POINTS:
• Microbial strategies facilitate the integration of multiple trophic levels, essential for cycling carbon, nitrogen, phosphorus, and micronutrients.
• Exploring microorganisms in integrated biological systems is essential for developing practical agricultural solutions.
• Technological progress makes sustainable closed-entity re-circulation systems possible, securing resilient future food production.
Affiliation(s)
- Till Glockow
- Acheron GmbH, Auf Der Muggenburg 30, 28217, Bremen, Germany
| | - Anne-Kristin Kaster
- Karlsruhe Institute of Technology (KIT), Institute for Biological Interfaces 5 (IBG-5), Biotechnology and Microbial Genetics, Hermann-Von-Helmholtz-Platz 1, 76344, Eggenstein-Leopoldshafen, Germany
| | - Kersten S Rabe
- Karlsruhe Institute of Technology (KIT), Institute for Biological Interfaces 1 (IBG-1), Biomolecular Micro- and Nanostructures, Hermann-Von-Helmholtz-Platz 1, 76344, Eggenstein-Leopoldshafen, Germany
| | - Christof M Niemeyer
- Karlsruhe Institute of Technology (KIT), Institute for Biological Interfaces 1 (IBG-1), Biomolecular Micro- and Nanostructures, Hermann-Von-Helmholtz-Platz 1, 76344, Eggenstein-Leopoldshafen, Germany.
|
4
|
Tominaga K, Ozaki S, Sato S, Katayama T, Nishimura Y, Omae K, Iwasaki W. Frequent nonhomologous replacement of replicative helicase loaders by viruses in Vibrionaceae. Proc Natl Acad Sci U S A 2024; 121:e2317954121. [PMID: 38683976 PMCID: PMC11087808 DOI: 10.1073/pnas.2317954121] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/18/2023] [Accepted: 03/14/2024] [Indexed: 05/02/2024] Open
Abstract
Several microbial genomes lack textbook-defined essential genes. If an essential gene is absent from a genome, then an evolutionarily independent gene of unknown function complements its function. Here, we identified frequent nonhomologous replacement of an essential component of DNA replication initiation, a replicative helicase loader gene, in Vibrionaceae. Our analysis of Vibrionaceae genomes revealed two genes with unknown function, named vdhL1 and vdhL2, that were substantially enriched in genomes without the known helicase-loader genes. These genes showed no sequence similarity to genes with known function but encoded proteins structurally similar to a viral helicase loader. Analyses of genomic syntenies and coevolution with helicase genes suggested that vdhL1/2 encodes a helicase loader. The in vitro assay showed that Vibrio harveyi VdhL1 and Vibrio ezurae VdhL2 promote the helicase activity of DnaB. Furthermore, molecular phylogenetics suggested that vdhL1/2 were derived from phages and replaced an intrinsic helicase loader gene of Vibrionaceae over 20 times. This high replacement frequency implies the host's advantage in acquiring a viral helicase loader gene.
Affiliation(s)
- Kento Tominaga
- Department of Integrated Biosciences, Graduate School of Frontier Sciences, The University of Tokyo, Chiba 277-0882, Japan
| | - Shogo Ozaki
- Department of Molecular Biology, Graduate School of Pharmaceutical Sciences, Kyushu University, Fukuoka 812-8582, Japan
| | - Shohei Sato
- Department of Molecular Biology, Graduate School of Pharmaceutical Sciences, Kyushu University, Fukuoka 812-8582, Japan
| | - Tsutomu Katayama
- Department of Molecular Biology, Graduate School of Pharmaceutical Sciences, Kyushu University, Fukuoka 812-8582, Japan
| | - Yuki Nishimura
- Department of Integrated Biosciences, Graduate School of Frontier Sciences, The University of Tokyo, Chiba 277-0882, Japan
| | - Kimiho Omae
- Department of Integrated Biosciences, Graduate School of Frontier Sciences, The University of Tokyo, Chiba 277-0882, Japan
| | - Wataru Iwasaki
- Department of Integrated Biosciences, Graduate School of Frontier Sciences, The University of Tokyo, Chiba 277-0882, Japan
- Department of Biological Sciences, Graduate School of Science, The University of Tokyo, Tokyo 113-0032, Japan
- Department of Computational Biology and Medical Sciences, Graduate School of Frontier Sciences, The University of Tokyo, Chiba 277-0882, Japan
- Atmosphere and Ocean Research Institute, The University of Tokyo, Chiba 277-8564, Japan
- Institute for Quantitative Biosciences, The University of Tokyo, Tokyo 113-0032, Japan
- Collaborative Research Institute for Innovative Microbiology, The University of Tokyo, Tokyo 113-8657, Japan
|
5
|
Kim MJ, Martin CA, Kim J, Jablonski MM. Computational methods in glaucoma research: Current status and future outlook. Mol Aspects Med 2023; 94:101222. [PMID: 37925783 PMCID: PMC10842846 DOI: 10.1016/j.mam.2023.101222] [Citation(s) in RCA: 6] [Impact Index Per Article: 6.0] [Received: 08/01/2023] [Revised: 10/06/2023] [Accepted: 10/19/2023] [Indexed: 11/07/2023]
Abstract
Advancements in computational techniques have transformed glaucoma research, providing a deeper understanding of genetics, disease mechanisms, and potential therapeutic targets. Systems genetics integrates genomic and clinical data, aiding in identifying drug targets, comprehending disease mechanisms, and personalizing treatment strategies for glaucoma. Molecular dynamics (MD) simulations offer valuable molecular-level insights into glaucoma-related biomolecule behavior and drug interactions, guiding experimental studies and drug discovery efforts. Artificial intelligence (AI) technologies hold promise in revolutionizing glaucoma research, enhancing disease diagnosis, target identification, and drug candidate selection. The generalized protocols for systems genetics, MD simulations, and AI model development are included as a guide for glaucoma researchers. These computational methods, however, are not separate and work harmoniously together to discover novel ways to combat glaucoma. Ongoing research and progress in genomics technologies, MD simulations, and AI methodologies suggest that computational methods will become an integral part of glaucoma research in the future.
Affiliation(s)
- Minjae J Kim
- Department of Ophthalmology, The Hamilton Eye Institute, The University of Tennessee Health Science Center, Memphis, TN, 38163, USA.
| | - Cole A Martin
- Department of Ophthalmology, The Hamilton Eye Institute, The University of Tennessee Health Science Center, Memphis, TN, 38163, USA.
| | - Jinhwa Kim
- Graduate School of Artificial Intelligence, Graduate School of Metaverse, Department of Management Information Systems, Sogang University, 1 Shinsoo-Dong, Mapo-Gu, Seoul, South Korea.
| | - Monica M Jablonski
- Department of Ophthalmology, The Hamilton Eye Institute, The University of Tennessee Health Science Center, Memphis, TN, 38163, USA.
|
6
|
Metze F, Vollmers J, Lenk F, Kaster AK. First shotgun metagenomics study of Juan de Fuca deep-sea sediments reveals distinct microbial communities above, within, between, and below sulfate methane transition zones. Front Microbiol 2023; 14:1241810. [PMID: 38053553 PMCID: PMC10694467 DOI: 10.3389/fmicb.2023.1241810] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/17/2023] [Accepted: 10/03/2023] [Indexed: 12/07/2023] Open
Abstract
The marine deep subsurface is home to a vast microbial ecosystem, affecting biogeochemical cycles on a global scale. One of the better-studied deep biospheres is the Juan de Fuca (JdF) Ridge, where hydrothermal fluid introduces oxidants into the sediment from below, resulting in two sulfate methane transition zones (SMTZs). In this study, we present the first shotgun metagenomics study of unamplified DNA from sediment samples from different depths in this stratified environment. Bioinformatic analyses showed a shift from a heterotrophic, Chloroflexota-dominated community above the upper SMTZ to a chemolithoautotrophic Proteobacteria-dominated community below the secondary SMTZ. The reintroduction of sulfate likely enables respiration and boosts active cells that oxidize acetate, iron, and complex carbohydrates to degrade dead biomass in this low-abundance, low-diversity environment. In addition, analyses showed many proteins of unknown function as well as novel metagenome-assembled genomes (MAGs). The study provides new insights into microbial communities in this habitat, enabled by an improved DNA extraction protocol that allows a less biased view of taxonomic composition and metabolic activities, as well as uncovering novel taxa. Our approach presents the first successful attempt at unamplified shotgun sequencing samples from beyond 50 meters below the seafloor and opens new ways for capturing the true diversity and functional potential of deep-sea sediments.
Affiliation(s)
- Anne-Kristin Kaster
- Institute for Biological Interfaces (IBG 5), Karlsruhe Institute of Technology, Hermann-von-Helmholtz Platz, Karlsruhe, Germany
|
7
|
Karlsen ST, Rau MH, Sánchez BJ, Jensen K, Zeidan AA. From genotype to phenotype: computational approaches for inferring microbial traits relevant to the food industry. FEMS Microbiol Rev 2023; 47:fuad030. [PMID: 37286882 PMCID: PMC10337747 DOI: 10.1093/femsre/fuad030] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/28/2023] [Revised: 05/31/2023] [Accepted: 06/06/2023] [Indexed: 06/09/2023] Open
Abstract
When selecting microbial strains for the production of fermented foods, various microbial phenotypes need to be taken into account to achieve target product characteristics, such as biosafety, flavor, texture, and health-promoting effects. Through continuous advances in sequencing technologies, microbial whole-genome sequences of increasing quality can now be obtained both cheaper and faster, which increases the relevance of genome-based characterization of microbial phenotypes. Prediction of microbial phenotypes from genome sequences makes it possible to quickly screen large strain collections in silico to identify candidates with desirable traits. Several microbial phenotypes relevant to the production of fermented foods can be predicted using knowledge-based approaches, leveraging our existing understanding of the genetic and molecular mechanisms underlying those phenotypes. In the absence of this knowledge, data-driven approaches can be applied to estimate genotype-phenotype relationships based on large experimental datasets. Here, we review computational methods that implement knowledge- and data-driven approaches for phenotype prediction, as well as methods that combine elements from both approaches. Furthermore, we provide examples of how these methods have been applied in industrial biotechnology, with special focus on the fermented food industry.
Affiliation(s)
- Signe T Karlsen
- Bioinformatics & Modeling, R&D Digital Innovation, Chr. Hansen A/S, Bøge Allé 10-12, 2970 Hørsholm, Denmark
| | - Martin H Rau
- Bioinformatics & Modeling, R&D Digital Innovation, Chr. Hansen A/S, Bøge Allé 10-12, 2970 Hørsholm, Denmark
| | - Benjamín J Sánchez
- Bioinformatics & Modeling, R&D Digital Innovation, Chr. Hansen A/S, Bøge Allé 10-12, 2970 Hørsholm, Denmark
| | - Kristian Jensen
- Bioinformatics & Modeling, R&D Digital Innovation, Chr. Hansen A/S, Bøge Allé 10-12, 2970 Hørsholm, Denmark
| | - Ahmad A Zeidan
- Bioinformatics & Modeling, R&D Digital Innovation, Chr. Hansen A/S, Bøge Allé 10-12, 2970 Hørsholm, Denmark
|
8
|
Münch JM, Sobol MS, Brors B, Kaster AK. Single-cell transcriptomics and data analyses for prokaryotes-Past, present and future concepts. Adv Appl Microbiol 2023; 123:1-39. [PMID: 37400172 DOI: 10.1016/bs.aambs.2023.04.002] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 07/05/2023]
Abstract
Transcriptomics, or more specifically mRNA sequencing, is a powerful tool to study gene expression at the single-cell level (scRNA-seq), which enables new insights into a plethora of biological processes. While methods for single-cell RNA-seq in eukaryotes are well established, application to prokaryotes is still challenging. Reasons for that are rigid and diverse cell wall structures hampering lysis, the lack of polyadenylated transcripts impeding mRNA enrichment, and minute amounts of RNA requiring amplification steps before sequencing. Despite those obstacles, several promising scRNA-seq approaches for bacteria have been published recently, although difficulties in the experimental workflow and in data processing and analysis remain. In particular, bias is often introduced by amplification, which makes it difficult to distinguish between technical noise and biological variation. Future optimization of experimental procedures and data analysis algorithms is needed for the improvement of scRNA-seq but also to aid in the emergence of prokaryotic single-cell multi-omics to help address 21st century challenges in the biotechnology and health sector.
Affiliation(s)
- Julia M Münch
- Institute for Biological Interfaces 5, Karlsruhe Institute of Technology, Eggenstein-Leopoldshafen, Germany; Division of Applied Bioinformatics, German Cancer Research Center (DKFZ), Heidelberg, Germany; Faculty of Biosciences, Heidelberg University, Heidelberg, Germany; HIDSS4Health - Helmholtz Information and Data Science School for Health, Karlsruhe/Heidelberg, Germany
| | - Morgan S Sobol
- Institute for Biological Interfaces 5, Karlsruhe Institute of Technology, Eggenstein-Leopoldshafen, Germany
| | - Benedikt Brors
- Division of Applied Bioinformatics, German Cancer Research Center (DKFZ), Heidelberg, Germany; HIDSS4Health - Helmholtz Information and Data Science School for Health, Karlsruhe/Heidelberg, Germany
| | - Anne-Kristin Kaster
- Institute for Biological Interfaces 5, Karlsruhe Institute of Technology, Eggenstein-Leopoldshafen, Germany; HIDSS4Health - Helmholtz Information and Data Science School for Health, Karlsruhe/Heidelberg, Germany.
|