1. Nana Teukam YG, Kwate Dassi L, Manica M, Probst D, Schwaller P, Laino T. Language models can identify enzymatic binding sites in protein sequences. Comput Struct Biotechnol J 2024; 23:1929-1937. [PMID: 38736695 PMCID: PMC11087710 DOI: 10.1016/j.csbj.2024.04.012] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/15/2024] [Revised: 04/05/2024] [Accepted: 04/05/2024] [Indexed: 05/14/2024] Open
Abstract
Recent advances in language modeling have had a tremendous impact on how we handle sequential data in science. Language architectures have emerged as a hotbed of innovation and creativity in natural language processing over the last decade, and have since gained prominence in modeling proteins and chemical processes, elucidating structural relationships from textual/sequential data. Surprisingly, some of these relationships refer to three-dimensional structural features, raising important questions about the dimensionality of the information encoded within sequential data. Here, we demonstrate that the unsupervised application of a language model architecture to a language representation of bio-catalyzed chemical reactions can capture the signal at the base of the substrate-binding site atomic interactions. This allows us to identify the three-dimensional binding site position in unknown protein sequences. The language representation comprises a reaction-simplified molecular-input line-entry system (SMILES) for substrate and products, and amino acid sequence information for the enzyme. This approach can recover, with no supervision, 52.13% of the binding site when considering co-crystallized substrate-enzyme structures as ground truth, vastly outperforming other attention-based models.
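The language representation described in this abstract (reaction SMILES for substrate and products plus the enzyme's amino acid sequence) can be pictured with a minimal Python sketch. Everything specific here is an assumption made for illustration and is not taken from the paper: the `|` and `>>` separator tokens, the token order, the simplified SMILES regular expression, and the function names `tokenize_smiles` and `build_representation`.

```python
import re

# Simplified SMILES tokenizer (an illustrative assumption, not the paper's):
# bracket atoms, two-letter halogens, two-digit ring closures, single-letter
# atoms, ring digits, and bond/branch symbols.
SMILES_TOKEN = re.compile(
    r"\[[^\]]+\]|Br|Cl|%\d{2}|[A-Za-z]|\d|[=#\-+\\/()@.:~*$]"
)


def tokenize_smiles(smiles):
    """Split a SMILES string into tokens; fail loudly on unknown characters."""
    tokens = SMILES_TOKEN.findall(smiles)
    if "".join(tokens) != smiles:
        raise ValueError(f"Could not tokenize SMILES: {smiles!r}")
    return tokens


def build_representation(substrates, enzyme_sequence, products, sep="|"):
    """Return one token list: substrates, separator, enzyme residues, '>>', products.

    Note: residue letters such as 'C' or 'N' collide with atom symbols here;
    a real vocabulary would need to keep the two alphabets distinct.
    """
    substrate_tokens = tokenize_smiles(".".join(substrates))
    product_tokens = tokenize_smiles(".".join(products))
    enzyme_tokens = list(enzyme_sequence)  # one token per amino acid
    return substrate_tokens + [sep] + enzyme_tokens + [">>"] + product_tokens


if __name__ == "__main__":
    # Toy inputs: ethanol to acetaldehyde with a made-up three-residue "enzyme".
    print(build_representation(["CCO"], "MKV", ["CC=O"]))
```

A sequence built this way is the kind of input over which attention weights between substrate tokens and residue tokens could be inspected; how the published model actually derives binding sites from attention is not specified in the abstract.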
Affiliation(s)
- Loïc Kwate Dassi
  - IBM Research Europe, Säumerstrasse 4, 8803 Rüschlikon, Switzerland
- Matteo Manica
  - IBM Research Europe, Säumerstrasse 4, 8803 Rüschlikon, Switzerland
- Daniel Probst
  - IBM Research Europe, Säumerstrasse 4, 8803 Rüschlikon, Switzerland
  - National Center for Competence in Research-Catalysis (NCCR-Catalysis), Switzerland
- Philippe Schwaller
  - IBM Research Europe, Säumerstrasse 4, 8803 Rüschlikon, Switzerland
  - National Center for Competence in Research-Catalysis (NCCR-Catalysis), Switzerland
- Teodoro Laino
  - IBM Research Europe, Säumerstrasse 4, 8803 Rüschlikon, Switzerland
  - National Center for Competence in Research-Catalysis (NCCR-Catalysis), Switzerland

2. Lai PT, Coudert E, Aimo L, Axelsen K, Breuza L, de Castro E, Feuermann M, Morgat A, Pourcel L, Pedruzzi I, Poux S, Redaschi N, Rivoire C, Sveshnikova A, Wei CH, Leaman R, Luo L, Lu Z, Bridge A. EnzChemRED, a rich enzyme chemistry relation extraction dataset. Sci Data 2024; 11:982. [PMID: 39251610 PMCID: PMC11384730 DOI: 10.1038/s41597-024-03835-7] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/01/2024] [Accepted: 08/23/2024] [Indexed: 09/11/2024] Open
Abstract
Expert curation is essential to capture knowledge of enzyme functions from the scientific literature in FAIR open knowledgebases but cannot keep pace with the rate of new discoveries and new publications. In this work we present EnzChemRED, for Enzyme Chemistry Relation Extraction Dataset, a new training and benchmarking dataset to support the development of Natural Language Processing (NLP) methods such as (large) language models that can assist enzyme curation. EnzChemRED consists of 1,210 expert curated PubMed abstracts where enzymes and the chemical reactions they catalyze are annotated using identifiers from the protein knowledgebase UniProtKB and the chemical ontology ChEBI. We show that fine-tuning language models with EnzChemRED significantly boosts their ability to identify proteins and chemicals in text (86.30% F1 score) and to extract the chemical conversions (86.66% F1 score) and the enzymes that catalyze those conversions (83.79% F1 score). We apply our methods to abstracts at PubMed scale to create a draft map of enzyme functions in literature to guide curation efforts in UniProtKB and the reaction knowledgebase Rhea.
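For readers less familiar with the metric quoted in this abstract, the F1 score is the harmonic mean of precision and recall. The short Python sketch below shows the standard arithmetic; the counts in the example are invented for illustration and are not EnzChemRED results, and the function name `f1_score_from_counts` is hypothetical.

```python
def f1_score_from_counts(true_pos, false_pos, false_neg):
    """Compute F1 from raw counts: harmonic mean of precision and recall."""
    if true_pos == 0:
        return 0.0  # no correct predictions: precision and recall are both 0
    precision = true_pos / (true_pos + false_pos)
    recall = true_pos / (true_pos + false_neg)
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    # Made-up counts: 8 correct extractions, 2 spurious, 2 missed.
    print(round(f1_score_from_counts(8, 2, 2), 4))
```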
Affiliation(s)
- Po-Ting Lai
  - National Center for Biotechnology Information (NCBI), National Library of Medicine (NLM), National Institutes of Health (NIH), Bethesda, MD, 20894, USA
- Elisabeth Coudert
  - Swiss-Prot group, SIB Swiss Institute of Bioinformatics, Centre Medical Universitaire, CH-1211, Geneva, 4, Switzerland
- Lucila Aimo
  - Swiss-Prot group, SIB Swiss Institute of Bioinformatics, Centre Medical Universitaire, CH-1211, Geneva, 4, Switzerland
- Kristian Axelsen
  - Swiss-Prot group, SIB Swiss Institute of Bioinformatics, Centre Medical Universitaire, CH-1211, Geneva, 4, Switzerland
- Lionel Breuza
  - Swiss-Prot group, SIB Swiss Institute of Bioinformatics, Centre Medical Universitaire, CH-1211, Geneva, 4, Switzerland
- Edouard de Castro
  - Swiss-Prot group, SIB Swiss Institute of Bioinformatics, Centre Medical Universitaire, CH-1211, Geneva, 4, Switzerland
- Marc Feuermann
  - Swiss-Prot group, SIB Swiss Institute of Bioinformatics, Centre Medical Universitaire, CH-1211, Geneva, 4, Switzerland
- Anne Morgat
  - Swiss-Prot group, SIB Swiss Institute of Bioinformatics, Centre Medical Universitaire, CH-1211, Geneva, 4, Switzerland
- Lucille Pourcel
  - Swiss-Prot group, SIB Swiss Institute of Bioinformatics, Centre Medical Universitaire, CH-1211, Geneva, 4, Switzerland
- Ivo Pedruzzi
  - Swiss-Prot group, SIB Swiss Institute of Bioinformatics, Centre Medical Universitaire, CH-1211, Geneva, 4, Switzerland
- Sylvain Poux
  - Swiss-Prot group, SIB Swiss Institute of Bioinformatics, Centre Medical Universitaire, CH-1211, Geneva, 4, Switzerland
- Nicole Redaschi
  - Swiss-Prot group, SIB Swiss Institute of Bioinformatics, Centre Medical Universitaire, CH-1211, Geneva, 4, Switzerland
- Catherine Rivoire
  - Swiss-Prot group, SIB Swiss Institute of Bioinformatics, Centre Medical Universitaire, CH-1211, Geneva, 4, Switzerland
- Anastasia Sveshnikova
  - Swiss-Prot group, SIB Swiss Institute of Bioinformatics, Centre Medical Universitaire, CH-1211, Geneva, 4, Switzerland
- Chih-Hsuan Wei
  - National Center for Biotechnology Information (NCBI), National Library of Medicine (NLM), National Institutes of Health (NIH), Bethesda, MD, 20894, USA
- Robert Leaman
  - National Center for Biotechnology Information (NCBI), National Library of Medicine (NLM), National Institutes of Health (NIH), Bethesda, MD, 20894, USA
- Ling Luo
  - School of Computer Science and Technology, Dalian University of Technology, 116024, Dalian, China
- Zhiyong Lu
  - National Center for Biotechnology Information (NCBI), National Library of Medicine (NLM), National Institutes of Health (NIH), Bethesda, MD, 20894, USA
- Alan Bridge
  - Swiss-Prot group, SIB Swiss Institute of Bioinformatics, Centre Medical Universitaire, CH-1211, Geneva, 4, Switzerland

3. Brejchova J, Brejchova K, Kuda O. Metabolic Pathways of Acylcarnitine Synthesis. Physiol Res 2024; 73:S153-S163. [PMID: 38752770 DOI: 10.33549/physiolres.935261] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 09/04/2024] Open
Abstract
Acylcarnitines are important markers in metabolic studies of many diseases, including metabolic, cardiovascular, and neurological disorders. We reviewed analytical methods for analyzing acylcarnitines with respect to the available molecular structural information, the technical limitations of legacy methods, and the potential of new mass spectrometry-based techniques to provide new information on metabolite structure. We summarized the nomenclature of acylcarnitines based on historical common names and common abbreviations, and we propose the use of systematic abbreviations derived from the shorthand notation for lipid structures. The transition to systematic nomenclature will facilitate acylcarnitine annotation, reporting, and standardization in metabolomics. We have reviewed the metabolic origins of acylcarnitines important for the biological interpretation of human metabolomic profiles. We identified neglected isomers of acylcarnitines and summarized the metabolic pathways involved in the synthesis and degradation of acylcarnitines, including branched-chain lipids and amino acids. We reviewed the primary literature, mapped the metabolic transformations of acyl-CoAs to acylcarnitines, and created a freely available WikiPathway WP5423 to help researchers navigate the acylcarnitine field. The WikiPathway was curated, metabolites and metabolic reactions were annotated, and references were included. We also provide a table for conversion between common names and abbreviations and systematic abbreviations linked to the LIPID MAPS or Human Metabolome Database.
Affiliation(s)
- J Brejchova
  - Laboratory of Metabolism of Bioactive Lipids, Institute of Physiology of the Czech Academy of Sciences, Prague, Czech Republic

4. d’Oelsnitz S, Love JD, Ellington AD, Ross D. Ligify: Automated Genome Mining for Ligand-Inducible Transcription Factors. ACS Synth Biol 2024; 13:2577-2586. [PMID: 39029917 PMCID: PMC11334909 DOI: 10.1021/acssynbio.4c00372] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/24/2024] [Revised: 06/27/2024] [Accepted: 07/01/2024] [Indexed: 07/21/2024]
Abstract
Prokaryotic transcription factors can be repurposed into biosensors for the ligand-inducible control of gene expression, but the landscape of chemical ligands for which biosensors exist is extremely limited. To expand this landscape, we developed Ligify, a web application that leverages information in enzyme reaction databases to predict transcription factors that may be responsive to user-defined chemicals. Candidate transcription factors are then incorporated into automatically generated plasmid sequences that are designed to express GFP in response to the target chemical. Our benchmarking analyses demonstrated that Ligify correctly predicted 31/100 previously validated biosensors and highlighted strategies for further improvement. We then used Ligify to build a panel of genetic circuits that could induce a 47-fold, 5-fold, 9-fold, and 27-fold change in fluorescence in response to D-ribose, L-sorbose, isoeugenol, and 4-vinylphenol, respectively. Ligify should enhance the ability of researchers to quickly develop biosensors for an expanded range of chemicals and is publicly available at https://ligify.groov.bio.
Affiliation(s)
- Simon d’Oelsnitz
  - Department of Molecular Biosciences, University of Texas at Austin, Austin, Texas 78712, United States
- Joshua D. Love
  - Independent Web Developer, Bentonville, Arkansas 72712, United States
- Andrew D. Ellington
  - Department of Molecular Biosciences, University of Texas at Austin, Austin, Texas 78712, United States
- David Ross
  - National Institute of Standards and Technology, Gaithersburg, Maryland 20878, United States

5. Martinez K, Agirre J, Akune Y, Aoki-Kinoshita KF, Arighi C, Axelsen KB, Bolton E, Bordeleau E, Edwards NJ, Fadda E, Feizi T, Hayes C, Ives CM, Joshi HJ, Krishna Prasad K, Kossida S, Lisacek F, Liu Y, Lütteke T, Ma J, Malik A, Martin M, Mehta AY, Neelamegham S, Panneerselvam K, Ranzinger R, Ricard-Blum S, Sanou G, Shanker V, Thomas PD, Tiemeyer M, Urban J, Vita R, Vora J, Yamamoto Y, Mazumder R. Functional implications of glycans and their curation: insights from the workshop held at the 16th Annual International Biocuration Conference in Padua, Italy. Database (Oxford) 2024; 2024:baae073. [PMID: 39137905 PMCID: PMC11321244 DOI: 10.1093/database/baae073] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/01/2024] [Revised: 06/24/2024] [Accepted: 07/10/2024] [Indexed: 08/15/2024]
Abstract
Dynamic changes in protein glycosylation impact human health and disease progression. However, current resources that capture disease and phenotype information focus primarily on the macromolecules within the central dogma of molecular biology (DNA, RNA, proteins). To gain a better understanding of organisms, there is a need to capture the functional impact of glycans and glycosylation on biological processes. A workshop titled "Functional impact of glycans and their curation" was held in conjunction with the 16th Annual International Biocuration Conference to discuss ongoing worldwide activities related to glycan function curation. This workshop brought together subject matter experts, tool developers, and biocurators from over 20 projects and bioinformatics resources. Participants discussed four key topics for each of their resources: (i) how they curate glycan function-related data from publications and other sources, (ii) what type of data they would like to acquire, (iii) what data they currently have, and (iv) what standards they use. Their answers contributed input that provided a comprehensive overview of state-of-the-art glycan function curation and annotations. This report summarizes the outcome of discussions, including potential solutions and areas where curators, data wranglers, and text mining experts can collaborate to address current gaps in glycan and glycosylation annotations, leveraging each other's work to improve their respective resources and encourage impactful data sharing among resources. Database URL: https://wiki.glygen.org/Glycan_Function_Workshop_2023.
Affiliation(s)
- Karina Martinez
  - Department of Biochemistry & Molecular Medicine, The George Washington University School of Medicine and Health Sciences, 2300 I St. NW, Washington, DC 20052, United States
- Jon Agirre
  - York Structural Biology Laboratory, Department of Chemistry, University of York, Wentworth Way, York YO10 5DD, United Kingdom
- Yukie Akune
  - The Glycosciences Laboratory, Imperial College London, Hammersmith Campus, Du Cane Road, London W12 0NN, United Kingdom
- Kiyoko F Aoki-Kinoshita
  - Glycan and Life Systems Integration Center (GaLSIC), Soka University, 1-236 Tangi-machi, Hachioji, Tokyo 192-8577, Japan
- Cecilia Arighi
  - Department of Computer and Information Sciences, University of Delaware, 18 Amstel Ave, Newark, DE 19716, United States
- Kristian B Axelsen
  - Swiss-Prot Group, Swiss Institute of Bioinformatics (SIB), CMU, 1 rue Michel Servet, Geneva 4 1211, Switzerland
- Evan Bolton
  - National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, 8600 Rockville Pike, Bethesda, MD 20894, United States
- Emily Bordeleau
  - Michael Smith Laboratories, The University of British Columbia, 2185 East Mall, Vancouver, British Columbia V6T 1Z4, Canada
- Nathan J Edwards
  - Department of Biochemistry and Molecular & Cellular Biology, Georgetown University, 2115 Wisconsin Ave NW, Washington, DC 20007, United States
- Elisa Fadda
  - Department of Chemistry and Hamilton Institute, Maynooth University, Kilcock Road, Maynooth, Co. Kildare W23 AH3Y, Ireland
- Ten Feizi
  - The Glycosciences Laboratory, Imperial College London, Hammersmith Campus, Du Cane Road, London W12 0NN, United Kingdom
- Catherine Hayes
  - Proteome Informatics Group, Swiss Institute of Bioinformatics (SIB), route de Drize 7, Geneva CH-1227, Switzerland
- Callum M Ives
  - Department of Chemistry and Hamilton Institute, Maynooth University, Kilcock Road, Maynooth, Co. Kildare W23 AH3Y, Ireland
- Hiren J Joshi
  - Copenhagen Center for Glycomics, Department of Cellular and Molecular Medicine, Faculty of Health Sciences, University of Copenhagen, Blegdamsvej 3, Copenhagen DK-2200, Denmark
- Khakurel Krishna Prasad
  - ELI Beamlines Facility, The Extreme Light Infrastructure ERIC, Za Radnicí 835, Dolní Břežany 25241, Czech Republic
- Sofia Kossida
  - IMGT, The International ImMunoGeneTics Information System, National Center for Scientific Research (CNRS), Institute of Human Genetics (IGH), University of Montpellier (UM), 141 rue de la Cardonille, Montpellier 34 090, France
- Frederique Lisacek
  - Proteome Informatics Group, Swiss Institute of Bioinformatics (SIB), route de Drize 7, Geneva CH-1227, Switzerland
- Yan Liu
  - The Glycosciences Laboratory, Imperial College London, Hammersmith Campus, Du Cane Road, London W12 0NN, United Kingdom
- Thomas Lütteke
  - Institute of Veterinary Physiology and Biochemistry, Justus-Liebig-University Gießen, Frankfurter Str. 100, Gießen 35392, Germany
- Junfeng Ma
  - Department of Oncology, Lombardi Comprehensive Cancer Center, Georgetown University Medical Center, 3900 Reservoir Road NW, Washington, DC 20007, United States
- Adnan Malik
  - European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, United Kingdom
- Maria Martin
  - European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, United Kingdom
- Akul Y Mehta
  - Department of Surgery, Beth Israel Deaconess Medical Center, National Center for Functional Glycomics, Harvard Medical School, 330 Brookline Avenue, Boston, MA 02215, United States
- Sriram Neelamegham
  - Departments of Chemical & Biological Engineering, Biomedical Engineering and Medicine, University at Buffalo, State University of New York, 906 Furnas Hall, Buffalo, NY 14260, United States
- Kalpana Panneerselvam
  - European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, United Kingdom
- René Ranzinger
  - Complex Carbohydrate Research Center, University of Georgia, 315 Riverbend Rd, Athens, GA 30602, United States
- Sylvie Ricard-Blum
  - Institute of Molecular and Supramolecular Chemistry and Biochemistry (ICBMS), UMR 5246, University Lyon 1, CNRS, 43 Boulevard du 11 novembre 1918, Villeurbanne cedex F-69622, France
- Gaoussou Sanou
  - IMGT, The International ImMunoGeneTics Information System, National Center for Scientific Research (CNRS), Institute of Human Genetics (IGH), University of Montpellier (UM), 141 rue de la Cardonille, Montpellier 34 090, France
- Vijay Shanker
  - Department of Computer and Information Sciences, University of Delaware, 18 Amstel Ave, Newark, DE 19716, United States
- Paul D Thomas
  - Department of Population and Public Health Sciences, University of Southern California, 2001 N Soto Street, Los Angeles, CA 90032, United States
- Michael Tiemeyer
  - Complex Carbohydrate Research Center, University of Georgia, 315 Riverbend Rd, Athens, GA 30602, United States
- James Urban
  - Department of Chemistry and Molecular Biology, University of Gothenburg, Medicinaregatan 7 B, Gothenburg 41390, Sweden
- Randi Vita
  - Immune Epitope Database and Analysis Project, La Jolla Institute for Allergy & Immunology, 9420 Athena Circle, La Jolla, CA 92037, United States
- Jeet Vora
  - Department of Biochemistry & Molecular Medicine, The George Washington University School of Medicine and Health Sciences, 2300 I St. NW, Washington, DC 20052, United States
- Yasunori Yamamoto
  - Database Center for Life Science, Joint Support-Center for Data Science Research, Research Organization of Information and Systems, 178-4-4 Wakashiba, Kashiwa, Chiba 277-0871, Japan
- Raja Mazumder
  - Department of Biochemistry & Molecular Medicine, The George Washington University School of Medicine and Health Sciences, 2300 I St. NW, Washington, DC 20052, United States

6. Shiroma H, Darzi Y, Terajima E, Nakagawa Z, Tsuchikura H, Tsukuda N, Moriya Y, Okuda S, Goto S, Yamada T. Enteropathway: the metabolic pathway database for the human gut microbiota. Brief Bioinform 2024; 25:bbae419. [PMID: 39222063 PMCID: PMC11367760 DOI: 10.1093/bib/bbae419] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/10/2024] [Revised: 07/09/2024] [Accepted: 08/08/2024] [Indexed: 09/04/2024] Open
Abstract
The human gut microbiota produces diverse, extensive metabolites that have the potential to affect host physiology. Despite significant efforts to identify metabolic pathways for producing these microbial metabolites, a comprehensive metabolic pathway database for the human gut microbiota is still lacking. Here, we present Enteropathway, a metabolic pathway database that integrates 3269 compounds, 3677 reactions, and 876 modules that were obtained from 1012 manually curated articles from the scientific literature. Notably, 698 of these modules are new entries and cannot be found in any other databases. The database is accessible from a web application (https://enteropathway.org) that offers a metabolic diagram for graphical visualization of metabolic pathways, a customization interface, and an enrichment analysis feature for highlighting enriched modules on the metabolic diagram. Overall, Enteropathway is a comprehensive reference database that can complement widely used databases and a tool for visual and statistical analysis in human gut microbiota studies; it was designed to help researchers pinpoint new insights into the complex interplay between microbiota and host metabolism.
Affiliation(s)
- Hirotsugu Shiroma
  - School of Life Science and Technology, Tokyo Institute of Technology, 2-12-1 M6-3 Ookayama, Meguro-ku, Tokyo 152-8550, Japan
- Youssef Darzi
  - School of Life Science and Technology, Tokyo Institute of Technology, 2-12-1 M6-3 Ookayama, Meguro-ku, Tokyo 152-8550, Japan
  - Omixer solutions, 4-7-15, Zaimokuza, Kamakura-shi, Kanagawa 248-0013, Japan
- Etsuko Terajima
  - School of Life Science and Technology, Tokyo Institute of Technology, 2-12-1 M6-3 Ookayama, Meguro-ku, Tokyo 152-8550, Japan
- Zenichi Nakagawa
  - School of Life Science and Technology, Tokyo Institute of Technology, 2-12-1 M6-3 Ookayama, Meguro-ku, Tokyo 152-8550, Japan
- Hirotaka Tsuchikura
  - School of Life Science and Technology, Tokyo Institute of Technology, 2-12-1 M6-3 Ookayama, Meguro-ku, Tokyo 152-8550, Japan
- Naoki Tsukuda
  - School of Life Science and Technology, Tokyo Institute of Technology, 2-12-1 M6-3 Ookayama, Meguro-ku, Tokyo 152-8550, Japan
- Yuki Moriya
  - Database Center for Life Science, Joint Support-Center for Data Science Research, Research Organization of Information and Systems, 178-4-4 Wakashiba, Kashiwa-shi, Chiba 277-0871, Japan
- Shujiro Okuda
  - Graduate School of Medical and Dental Sciences, Niigata University, 2-5274, Gakkocho-dori, Chuo-ku, Niigata City, Niigata 951-8514, Japan
- Susumu Goto
  - Database Center for Life Science, Joint Support-Center for Data Science Research, Research Organization of Information and Systems, 178-4-4 Wakashiba, Kashiwa-shi, Chiba 277-0871, Japan
- Takuji Yamada
  - School of Life Science and Technology, Tokyo Institute of Technology, 2-12-1 M6-3 Ookayama, Meguro-ku, Tokyo 152-8550, Japan
  - Metagen, Inc., 246-2 Mizukami, Kakuganji, Tsuruoka, Yamagata 997-0052, Japan
  - Metagen Therapeutics, Inc., 246-2 Mizukami, Kakuganji, Tsuruoka, Yamagata 997-0052, Japan
  - Digzyme, Inc., 2-2-1 Toranomon, Minato-ku, Tokyo 105-0001, Japan

7. Zeng T, Jin Z, Zheng S, Yu T, Wu R. Developing BioNavi for Hybrid Retrosynthesis Planning. JACS Au 2024; 4:2492-2502. [PMID: 39055138 PMCID: PMC11267531 DOI: 10.1021/jacsau.4c00228] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/11/2024] [Revised: 06/18/2024] [Accepted: 06/20/2024] [Indexed: 07/27/2024]
Abstract
Illuminating synthetic pathways is essential for producing valuable chemicals, such as bioactive molecules. Chemical and biological syntheses are crucial, and their integration often leads to more efficient and sustainable pathways. Despite the rapid development of retrosynthesis models, few of them consider both chemical and biological syntheses, hindering the pathway design for high-value chemicals. Here, we propose BioNavi, which incorporates multitask learning and reaction templates into a deep learning-driven model to design hybrid synthesis pathways in a more interpretable manner. BioNavi outperforms existing approaches on different data sets, achieving a 75% hit rate in replicating reported biosynthetic pathways and displaying superior ability in designing hybrid synthesis pathways. Additional case studies further illustrate the potential application of BioNavi in de novo pathway design. The enhanced web server (http://biopathnavi.qmclab.com/bionavi/) simplifies input operations and implements step-by-step exploration according to user experience. We show that BioNavi is a handy navigator for designing synthetic pathways for various chemicals.
Affiliation(s)
- Tao Zeng
  - School of Pharmaceutical Sciences, Sun Yat-sen University, Guangzhou 510006, P. R. China
- Zhehao Jin
  - Center for Synthetic Biochemistry, CAS Key Laboratory of Quantitative Engineering Biology, Shenzhen Institute of Synthetic Biology, Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences (CAS), Shenzhen 518055, P. R. China
- Shuangjia Zheng
  - Global Institute of Future Technology, Shanghai Jiao Tong University, Shanghai 200240, P. R. China
- Tao Yu
  - Center for Synthetic Biochemistry, CAS Key Laboratory of Quantitative Engineering Biology, Shenzhen Institute of Synthetic Biology, Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences (CAS), Shenzhen 518055, P. R. China
- Ruibo Wu
  - School of Pharmaceutical Sciences, Sun Yat-sen University, Guangzhou 510006, P. R. China

8. de Crécy-Lagard V, Dias R, Friedberg I, Yuan Y, Swairjo MA. Limitations of Current Machine-Learning Models in Predicting Enzymatic Functions for Uncharacterized Proteins. bioRxiv 2024:2024.07.01.601547. [PMID: 39005379 PMCID: PMC11244979 DOI: 10.1101/2024.07.01.601547] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 07/16/2024]
Abstract
Thirty to seventy percent of proteins in any given genome have no assigned function and have been labeled as the protein "unknome". This large knowledge gap prevents the biological community from fully leveraging the plethora of genomic data that is now available. Machine-learning approaches are showing some promise in propagating functional knowledge from experimentally characterized proteins to the correct set of isofunctional orthologs. However, they largely fail to predict enzymatic functions unseen in the training set, as shown by dissecting the predictions made for 450 enzymes of unknown function from the model bacterium Escherichia coli using the DeepECTransformer platform. Lessons from these failures can help the community develop machine-learning methods that assist domain experts in making testable functional predictions for more members of the uncharacterized proteome.

9. van Milgen J. From the biochemical pieces to the nutritional puzzle: using meta-reactions in teaching and research. Animal 2024; 18:101204. [PMID: 38897106 DOI: 10.1016/j.animal.2024.101204] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/08/2024] [Revised: 05/16/2024] [Accepted: 05/17/2024] [Indexed: 06/21/2024] Open
Abstract
We now live in an era where metabolic data are increasingly accessible and available. Analysis of these data can be done using novel techniques (e.g., machine learning and artificial intelligence), but this does not relieve scientists of the need to use "human intelligence". The objective of this paper is to combine the information of a large database of biochemical reactions with a method and tool to make nutritional biochemistry more accessible to nutritionists. A script was developed to extract information from a database with more than 16 000 biochemical reactions so that it can be used for "biochemical bookkeeping". A system of more than 300 meta-reactions (i.e., the outcome reaction of a series of connected individual reactions) was constructed covering a wide range of metabolic pathways for macro- and micronutrients. Meta-reactions were constructed by identifying metabolic nodes, which are inputs or outputs of a metabolic system or that serve as connection points between meta-reactions. Complete metabolic pathways can be constructed by combining and balancing the meta-reactions using a simple Excel tool. To illustrate the use of meta-reactions and the tool in the teaching of nutritional biochemistry, examples are given to illustrate how much ATP can be synthesized from glucose, either directly or indirectly (i.e., via storage and mobilization or via transfer of intermediate metabolites between tissues and generations). To illustrate how meta-reactions and the tool can be used in research, nutrient balance data of the mammary gland of a dairy cow were used to construct a plausible pathway of nutrient metabolism of the whole mammary gland. The balance data included 34 metabolites taken up or exported by the mammary gland and 39 meta-reactions were used to construct a metabolic pathway that accounted for the uptake and output of metabolites. The results highlighted the importance of the synthesis of proline from arginine and the concomitant synthesis of urea by the mammary gland. They also raised the question of whether the availability of metabolic pathways or glucose uptake would be the more limiting factor for the synthesis of NADPH required for fatty acid synthesis. The availability of an open database with biochemical reactions, the concept of meta-reactions, and the provision of a tool allow users to construct metabolic pathways, which helps in acquiring a more comprehensive and integrated view of metabolism and may raise issues that may be difficult to identify otherwise.
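The "biochemical bookkeeping" idea of combining and balancing meta-reactions can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the paper describes an Excel tool, not this code; the reactions, the metabolite names `A`, `B`, `C`, `X`, and the function `combine_reactions` are all hypothetical. Each reaction is a mapping from metabolite to stoichiometric coefficient (negative for consumed, positive for produced), and intermediates cancel when the scaled reactions are summed.

```python
def combine_reactions(reactions, multipliers):
    """Sum scaled reactions and drop metabolites whose net coefficient is zero.

    reactions   : list of dicts {metabolite: coefficient}, with negative
                  coefficients for inputs and positive ones for outputs
    multipliers : how many times each reaction runs in the combined pathway
    """
    net = {}
    for reaction, times in zip(reactions, multipliers):
        for metabolite, coefficient in reaction.items():
            net[metabolite] = net.get(metabolite, 0) + times * coefficient
    # Intermediates that are fully produced and consumed vanish from the balance.
    return {m: c for m, c in net.items() if c != 0}


if __name__ == "__main__":
    # Hypothetical meta-reactions: A -> 2 B, and B -> C + X.
    r1 = {"A": -1, "B": 2}
    r2 = {"B": -1, "C": 1, "X": 1}
    # Running r2 twice balances the intermediate B, leaving A -> 2 C + 2 X.
    print(combine_reactions([r1, r2], [1, 2]))
```

Choosing the multipliers so that every connecting node nets to zero is the balancing step; here they are supplied by hand, whereas a fuller tool might solve for them.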
Affiliation(s)
- J van Milgen
  - PEGASE, INRAE, Institut Agro, 35590 Saint Gilles, France

10. Vaparanta K, Merilahti JAM, Ojala VK, Elenius K. De Novo Multi-Omics Pathway Analysis Designed for Prior Data Independent Inference of Cell Signaling Pathways. Mol Cell Proteomics 2024; 23:100780. [PMID: 38703893 PMCID: PMC11259815 DOI: 10.1016/j.mcpro.2024.100780] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/07/2023] [Revised: 04/07/2024] [Accepted: 04/30/2024] [Indexed: 05/06/2024] Open
Abstract
New tools for cell signaling pathway inference from multi-omics data that are independent of previous knowledge are needed. Here, we propose a new de novo method, the de novo multi-omics pathway analysis (DMPA), to model and combine omics data into network modules and pathways. DMPA was validated with published omics data and was found accurate in discovering reported molecular associations in transcriptome, interactome, phosphoproteome, methylome, and metabolomics data, and signaling pathways in multi-omics data. DMPA was benchmarked against module discovery and multi-omics integration methods and outperformed previous methods in module and pathway discovery especially when applied to datasets of relatively low sample sizes. Transcription factor, kinase, subcellular location, and function prediction algorithms were devised for transcriptome, phosphoproteome, and interactome modules and pathways, respectively. To apply DMPA in a biologically relevant context, interactome, phosphoproteome, transcriptome, and proteome data were collected from analyses carried out using melanoma cells to address gamma-secretase cleavage-dependent signaling characteristics of the receptor tyrosine kinase TYRO3. The pathways modeled with DMPA reflected the predicted function and its direction in validation experiments.
Affiliation(s)
- Katri Vaparanta
- Turku Bioscience Centre, University of Turku and Åbo Akademi University, Turku, Finland; Medicity Research Laboratories, University of Turku, Turku, Finland; Institute of Biomedicine, University of Turku, Turku, Finland.
- Johannes A M Merilahti
- Turku Bioscience Centre, University of Turku and Åbo Akademi University, Turku, Finland; Medicity Research Laboratories, University of Turku, Turku, Finland; Institute of Biomedicine, University of Turku, Turku, Finland
- Veera K Ojala
- Turku Bioscience Centre, University of Turku and Åbo Akademi University, Turku, Finland; Medicity Research Laboratories, University of Turku, Turku, Finland; Institute of Biomedicine, University of Turku, Turku, Finland
- Klaus Elenius
- Turku Bioscience Centre, University of Turku and Åbo Akademi University, Turku, Finland; Medicity Research Laboratories, University of Turku, Turku, Finland; Institute of Biomedicine, University of Turku, Turku, Finland; Department of Oncology, Turku University Hospital, Turku, Finland.
11
Farr E, Dimitrov D, Schmidt C, Turei D, Lobentanzer S, Dugourd A, Saez-Rodriguez J. MetalinksDB: a flexible and contextualizable resource of metabolite-protein interactions. Brief Bioinform 2024; 25:bbae347. [PMID: 39038934 PMCID: PMC11262834 DOI: 10.1093/bib/bbae347] [Received: 02/06/2024] [Revised: 05/29/2024] [Accepted: 07/08/2024] [Indexed: 07/24/2024]
Abstract
From the catalytic breakdown of nutrients to signaling, interactions between metabolites and proteins play an essential role in cellular function. An important case is cell-cell communication, where metabolites, secreted into the microenvironment, initiate signaling cascades by binding to intra- or extracellular receptors of neighboring cells. Protein-protein cell-cell communication interactions are routinely predicted from transcriptomic data. However, inferring metabolite-mediated intercellular signaling remains challenging, partially due to the limited size of intercellular prior knowledge resources focused on metabolites. Here, we leverage knowledge-graph infrastructure to integrate generalistic metabolite-protein with curated metabolite-receptor resources to create MetalinksDB. MetalinksDB is an order of magnitude larger than existing metabolite-receptor resources and can be tailored to specific biological contexts, such as diseases, pathways, or tissue/cellular locations. We demonstrate MetalinksDB's utility in identifying deregulated processes in renal cancer using multi-omics bulk data. Furthermore, we infer metabolite-driven intercellular signaling in acute kidney injury using spatial transcriptomics data. MetalinksDB is a comprehensive and customizable database of intercellular metabolite-protein interactions, accessible via a web interface (https://metalinks.omnipathdb.org/) and programmatically as a knowledge graph (https://github.com/biocypher/metalinks). We anticipate that by enabling diverse analyses tailored to specific biological contexts, MetalinksDB will facilitate the discovery of disease-relevant metabolite-mediated intercellular signaling processes.
Affiliation(s)
- Elias Farr
- Heidelberg University, Faculty of Medicine, and Heidelberg University Hospital, Institute for Computational Biomedicine, Im Neuenheimer Feld 130.3, 69120, Heidelberg, Germany
- Wellcome Sanger Institute, Wellcome Genome Campus, Cambridge CB10 1SA, United Kingdom
- Daniel Dimitrov
- Heidelberg University, Faculty of Medicine, and Heidelberg University Hospital, Institute for Computational Biomedicine, Im Neuenheimer Feld 130.3, 69120, Heidelberg, Germany
- Christina Schmidt
- Heidelberg University, Faculty of Medicine, and Heidelberg University Hospital, Institute for Computational Biomedicine, Im Neuenheimer Feld 130.3, 69120, Heidelberg, Germany
- Denes Turei
- Heidelberg University, Faculty of Medicine, and Heidelberg University Hospital, Institute for Computational Biomedicine, Im Neuenheimer Feld 130.3, 69120, Heidelberg, Germany
- Sebastian Lobentanzer
- Heidelberg University, Faculty of Medicine, and Heidelberg University Hospital, Institute for Computational Biomedicine, Im Neuenheimer Feld 130.3, 69120, Heidelberg, Germany
- Aurelien Dugourd
- Heidelberg University, Faculty of Medicine, and Heidelberg University Hospital, Institute for Computational Biomedicine, Im Neuenheimer Feld 130.3, 69120, Heidelberg, Germany
- EMBL European Bioinformatics Institute, Wellcome Genome Campus, Cambridge CB10 1SA, United Kingdom
- Julio Saez-Rodriguez
- Heidelberg University, Faculty of Medicine, and Heidelberg University Hospital, Institute for Computational Biomedicine, Im Neuenheimer Feld 130.3, 69120, Heidelberg, Germany
- EMBL European Bioinformatics Institute, Wellcome Genome Campus, Cambridge CB10 1SA, United Kingdom
12
Glauer M, Neuhaus F, Flügel S, Wosny M, Mossakowski T, Memariani A, Schwerdt J, Hastings J. Chebifier: automating semantic classification in ChEBI to accelerate data-driven discovery. Digital Discovery 2024; 3:896-907. [PMID: 38756223 PMCID: PMC11094693 DOI: 10.1039/d3dd00238a] [Received: 12/08/2023] [Accepted: 03/26/2024] [Indexed: 05/18/2024]
Abstract
Connecting chemical structural representations with meaningful categories and semantic annotations representing existing knowledge enables data-driven digital discovery from chemistry data. Ontologies are semantic annotation resources that provide definitions and a classification hierarchy for a domain. They are widely used throughout the life sciences. ChEBI is a large-scale ontology for the domain of biologically interesting chemistry that connects representations of chemical structures with meaningful chemical and biological categories. Classifying novel molecular structures into ontologies such as ChEBI has been a longstanding objective for data scientific methods, but the approaches that have been developed to date are limited in several ways: they are not able to expand as the ontology expands without manual intervention, and they are not able to learn from continuously expanding data. We have developed an approach for automated classification of chemicals in the ChEBI ontology based on a neuro-symbolic AI technique that harnesses the ontology itself to create the learning system. We provide this system as a publicly available tool, Chebifier, and as an API, ChEB-AI. We here evaluate our approach and show how it constitutes an advance towards a continuously learning semantic system for chemical knowledge discovery.
Affiliation(s)
- Marie Wosny
- Institute for Implementation Science in Health Care, University of Zurich Switzerland
- School of Medicine, University of St. Gallen Switzerland
- Johannes Schwerdt
- Otto von Guericke University Magdeburg Germany
- University of Applied Sciences Merseburg Germany
- Janna Hastings
- Institute for Implementation Science in Health Care, University of Zurich Switzerland
- School of Medicine, University of St. Gallen Switzerland
- Swiss Institute of Bioinformatics Switzerland
13
Rutherford KM, Lera-Ramírez M, Wood V. PomBase: a Global Core Biodata Resource-growth, collaboration, and sustainability. Genetics 2024; 227:iyae007. [PMID: 38376816 PMCID: PMC11075564 DOI: 10.1093/genetics/iyae007] [Received: 11/28/2023] [Accepted: 01/13/2024] [Indexed: 02/21/2024]
Abstract
PomBase (https://www.pombase.org), the model organism database (MOD) for fission yeast, was recently awarded Global Core Biodata Resource (GCBR) status by the Global Biodata Coalition (GBC; https://globalbiodata.org/) after a rigorous selection process. In this MOD review, we present PomBase's continuing growth and improvement over the last 2 years. We describe these improvements in the context of the qualitative GCBR indicators related to scientific quality, comprehensivity, accelerating science, user stories, and collaborations with other biodata resources. This review also showcases the depth of existing connections both within the biocuration ecosystem and between PomBase and its user community.
Affiliation(s)
- Kim M Rutherford
- Department of Biochemistry, University of Cambridge, Cambridge CB2 1GA, UK
- Manuel Lera-Ramírez
- Department of Genetics, Evolution and Environment, University College London, London WC1E 6BT, UK
- Valerie Wood
- Department of Biochemistry, University of Cambridge, Cambridge CB2 1GA, UK
14
Öztürk-Çolak A, Marygold SJ, Antonazzo G, Attrill H, Goutte-Gattat D, Jenkins VK, Matthews BB, Millburn G, dos Santos G, Tabone CJ. FlyBase: updates to the Drosophila genes and genomes database. Genetics 2024; 227:iyad211. [PMID: 38301657 PMCID: PMC11075543 DOI: 10.1093/genetics/iyad211] [Received: 10/17/2023] [Accepted: 11/27/2023] [Indexed: 02/03/2024]
Abstract
FlyBase (flybase.org) is a model organism database and knowledge base about Drosophila melanogaster, commonly known as the fruit fly. Researchers from around the world rely on the genetic, genomic, and functional information available in FlyBase, as well as its tools to view and interrogate these data. In this article, we describe the latest developments and updates to FlyBase. These include the introduction of single-cell RNA sequencing data, improved content and display of functional information, updated orthology pipelines, new chemical reports, and enhancements to our outreach resources.
Affiliation(s)
- Arzu Öztürk-Çolak
- Department of Physiology, Development and Neuroscience, University of Cambridge, Cambridge CB2 3DY, UK
- Steven J Marygold
- Department of Physiology, Development and Neuroscience, University of Cambridge, Cambridge CB2 3DY, UK
- Giulia Antonazzo
- Department of Physiology, Development and Neuroscience, University of Cambridge, Cambridge CB2 3DY, UK
- Helen Attrill
- Department of Physiology, Development and Neuroscience, University of Cambridge, Cambridge CB2 3DY, UK
- Damien Goutte-Gattat
- Department of Physiology, Development and Neuroscience, University of Cambridge, Cambridge CB2 3DY, UK
- Victoria K Jenkins
- Department of Molecular and Cellular Biology, Harvard University, Cambridge, MA 02138, USA
- Beverley B Matthews
- Department of Molecular and Cellular Biology, Harvard University, Cambridge, MA 02138, USA
- Gillian Millburn
- Department of Physiology, Development and Neuroscience, University of Cambridge, Cambridge CB2 3DY, UK
- Gilberto dos Santos
- Department of Molecular and Cellular Biology, Harvard University, Cambridge, MA 02138, USA
- Christopher J Tabone
- Department of Molecular and Cellular Biology, Harvard University, Cambridge, MA 02138, USA
15
Santangelo BE, Apgar M, Colorado ASB, Martin CG, Sterrett J, Wall E, Joachimiak MP, Hunter LE, Lozupone CA. Integrating biological knowledge for mechanistic inference in the host-associated microbiome. Front Microbiol 2024; 15:1351678. [PMID: 38638909 PMCID: PMC11024261 DOI: 10.3389/fmicb.2024.1351678] [Received: 12/06/2023] [Accepted: 02/26/2024] [Indexed: 04/20/2024]
Abstract
Advances in high-throughput technologies have enhanced our ability to describe microbial communities as they relate to human health and disease. Alongside the growth in sequencing data has come an influx of resources that synthesize knowledge surrounding microbial traits, functions, and metabolic potential with knowledge of how they may impact host pathways to influence disease phenotypes. These knowledge bases can enable the development of mechanistic explanations that may underlie correlations detected between microbial communities and disease. In this review, we survey existing resources and methodologies for the computational integration of broad classes of microbial and host knowledge. We evaluate these knowledge bases in their access methods, content, and source characteristics. We discuss challenges of the creation and utilization of knowledge bases including inconsistency of nomenclature assignment of taxa and metabolites across sources, whether the biological entities represented are rooted in ontologies or taxonomies, and how the structure and accessibility limit the diversity of applications and user types. We make this information available in a code and data repository at: https://github.com/lozuponelab/knowledge-source-mappings. Addressing these challenges will allow for the development of more effective tools for drawing from abundant knowledge to find new insights into microbial mechanisms in disease by fostering a systematic and unbiased exploration of existing information.
Affiliation(s)
- Brook E. Santangelo
- Department of Biomedical Informatics, University of Colorado School of Medicine, Aurora, CO, United States
- Madison Apgar
- Department of Biomedical Informatics, University of Colorado School of Medicine, Aurora, CO, United States
- Casey G. Martin
- Department of Biomedical Informatics, University of Colorado School of Medicine, Aurora, CO, United States
- John Sterrett
- Department of Integrative Physiology, University of Colorado, Boulder, CO, United States
- Elena Wall
- Department of Biomedical Informatics, University of Colorado School of Medicine, Aurora, CO, United States
- Marcin P. Joachimiak
- Lawrence Berkeley National Laboratory, Environmental Genomics and Systems Biology Division, Biosystems Data Science Department, Berkeley, CA, United States
- Lawrence E. Hunter
- Department of Biomedical Informatics, University of Colorado School of Medicine, Aurora, CO, United States
- Catherine A. Lozupone
- Department of Biomedical Informatics, University of Colorado School of Medicine, Aurora, CO, United States
16
Galgonek J, Vondrášek J. The IDSM mass spectrometry extension: searching mass spectra using SPARQL. Bioinformatics 2024; 40:btae174. [PMID: 38561173 PMCID: PMC11034985 DOI: 10.1093/bioinformatics/btae174] [Received: 12/07/2023] [Revised: 02/24/2024] [Accepted: 03/28/2024] [Indexed: 04/04/2024]
Abstract
SUMMARY: The Integrated Database of Small Molecules (IDSM) integrates data from small-molecule datasets, making them accessible through the SPARQL query language. Its unique feature is the ability to search for compounds through SPARQL based on their molecular structure. We extended IDSM to enable mass spectra databases to be integrated and searched based on mass spectrum similarity. As sources of mass spectra, we employed the MassBank of North America database and the In Silico Spectral Database of natural products. AVAILABILITY AND IMPLEMENTATION: The extension is an integral part of IDSM, which is available at https://idsm.elixir-czech.cz. The manual and usage examples are available at https://idsm.elixir-czech.cz/docs/ms. The source code of all IDSM parts is available under open-source licences at https://github.com/idsm-src.
Affiliation(s)
- Jakub Galgonek
- Institute of Organic Chemistry and Biochemistry of the Czech Academy of Sciences, Flemingovo náměstí 2, Prague 160 00, Czech Republic
- Jiří Vondrášek
- Institute of Organic Chemistry and Biochemistry of the Czech Academy of Sciences, Flemingovo náměstí 2, Prague 160 00, Czech Republic
17
Schottlender G, Prieto JM, Clemente C, Schuster CD, Dumas V, Fernández Do Porto D, Martí MA. Bacterial cytochrome P450s: a bioinformatics odyssey of substrate discovery. Front Microbiol 2024; 15:1343029. [PMID: 38384262 PMCID: PMC10879549 DOI: 10.3389/fmicb.2024.1343029] [Received: 11/22/2023] [Accepted: 01/23/2024] [Indexed: 02/23/2024]
Abstract
Bacterial P450 cytochromes (BacCYPs) are versatile heme-containing proteins responsible for oxidation reactions on a wide range of substrates, contributing to the production of valuable natural products with limitless biotechnological potential. While the sequencing of microbial genomes has provided a wealth of BacCYP sequences, functional characterization lags behind, hindering our understanding of their roles. This study employs a comprehensive approach to predict BacCYP substrate specificity, bridging the gap between sequence and function. We employed an integrated approach combining sequence and functional data analysis, genomic context exploration, 3D structural modeling with molecular docking, and phylogenetic clustering. The research begins with an in-depth analysis of BacCYP sequence diversity and structural characteristics, revealing conserved motifs and recurrent residues in the active site. Phylogenetic analysis identifies distinct groups within the BacCYP family based on sequence similarity. However, our study reveals that sequence alone does not consistently predict substrate specificity, necessitating additional perspectives. The study delves into the genetic context of BacCYPs, utilizing neighboring gene information to infer potential substrates, a method proven very effective in many cases. Molecular docking is employed to assess BacCYP-substrate interactions, confirming potential substrates and providing insights into selectivity. Finally, a comprehensive strategy is proposed for predicting BacCYP substrates, involving all the evaluated approaches. The effectiveness of this strategy is demonstrated with two case studies, highlighting its potential for substrate discovery.
Affiliation(s)
- Gustavo Schottlender
- Facultad de Ciencias Exactas y Naturales, Instituto de Cálculo, Universidad de Buenos Aires, CONICET, Universidad de Buenos Aires, Buenos Aires, Argentina
- Juan Manuel Prieto
- Instituto de Química Biológica de la Facultad de Ciencias Exactas y Naturales (IQUIBICEN) CONICET, Buenos Aires, Argentina
- Camila Clemente
- Instituto de Química Biológica de la Facultad de Ciencias Exactas y Naturales (IQUIBICEN) CONICET, Buenos Aires, Argentina
- Claudio David Schuster
- Instituto de Química Biológica de la Facultad de Ciencias Exactas y Naturales (IQUIBICEN) CONICET, Buenos Aires, Argentina
- Victoria Dumas
- Departamento de Química Biológica, Facultad de Ciencias Exactas y Naturales, Universidad de Buenos Aires (FCEyN-UBA), Buenos Aires, Argentina
- Darío Fernández Do Porto
- Facultad de Ciencias Exactas y Naturales, Instituto de Cálculo, Universidad de Buenos Aires, CONICET, Universidad de Buenos Aires, Buenos Aires, Argentina
- Departamento de Química Biológica, Facultad de Ciencias Exactas y Naturales, Universidad de Buenos Aires (FCEyN-UBA), Buenos Aires, Argentina
- Marcelo Adrian Martí
- Instituto de Química Biológica de la Facultad de Ciencias Exactas y Naturales (IQUIBICEN) CONICET, Buenos Aires, Argentina
- Departamento de Química Biológica, Facultad de Ciencias Exactas y Naturales, Universidad de Buenos Aires (FCEyN-UBA), Buenos Aires, Argentina
18
Zulfiqar M, Singh V, Steinbeck C, Sorokina M. Review on computer-assisted biosynthetic capacities elucidation to assess metabolic interactions and communication within microbial communities. Crit Rev Microbiol 2024:1-40. [PMID: 38270170 DOI: 10.1080/1040841x.2024.2306465] [Received: 03/13/2023] [Accepted: 01/12/2024] [Indexed: 01/26/2024]
Abstract
Microbial communities thrive through interactions and communication, which are challenging to study as most microorganisms are not cultivable. To address this challenge, researchers focus on the extracellular space where communication events occur. Exometabolomics and interactome analysis provide insights into the molecules involved in communication and the dynamics of their interactions. Advances in sequencing technologies and computational methods enable the reconstruction of taxonomic and functional profiles of microbial communities using high-throughput multi-omics data. Network-based approaches, including community flux balance analysis, aim to model molecular interactions within and between communities. Despite these advances, challenges remain in computer-assisted biosynthetic capacities elucidation, requiring continued innovation and collaboration among diverse scientists. This review provides insights into the current state and future directions of computer-assisted biosynthetic capacities elucidation in studying microbial communities.
Affiliation(s)
- Mahnoor Zulfiqar
- Institute for Inorganic and Analytical Chemistry, Friedrich Schiller University, Jena, Germany
- Cluster of Excellence Balance of the Microverse, Friedrich Schiller University Jena, Jena, Germany
- Vinay Singh
- Institute for Inorganic and Analytical Chemistry, Friedrich Schiller University, Jena, Germany
- Christoph Steinbeck
- Institute for Inorganic and Analytical Chemistry, Friedrich Schiller University, Jena, Germany
- Cluster of Excellence Balance of the Microverse, Friedrich Schiller University Jena, Jena, Germany
- Maria Sorokina
- Institute for Inorganic and Analytical Chemistry, Friedrich Schiller University, Jena, Germany
- Data Science and Artificial Intelligence, Research and Development, Pharmaceuticals, Bayer, Berlin, Germany
19
Witting M, Malik A, Leach A, Bridge A, Aimo L, Conroy MJ, O'Donnell VB, Hoffmann N, Kopczynski D, Giacomoni F, Paulhe N, Gassiot AC, Poupin N, Jourdan F, Bertrand-Michel J. Challenges and perspectives for naming lipids in the context of lipidomics. Metabolomics 2024; 20:15. [PMID: 38267595 PMCID: PMC10808356 DOI: 10.1007/s11306-023-02075-x] [Received: 08/04/2023] [Accepted: 12/01/2023] [Indexed: 01/26/2024]
Abstract
INTRODUCTION: Lipids are key compounds in the study of metabolism and are increasingly studied in biology projects. They form a very broad family that encompasses many compounds, and the name of the same compound may vary depending on the community in which it is studied. OBJECTIVES: In addition, their structures are varied and complex, which complicates their analysis. Indeed, the structural resolution does not always allow a complete level of annotation, so the actual compound analysed will vary from study to study and should be clearly stated. For all these reasons, the identification and naming of lipids is complicated and highly variable from one study to another, and it needs to be harmonized. METHODS & RESULTS: In this position paper, we present and discuss the different ways to name lipids (with chemoinformatic and semantic identifiers) and their importance for sharing lipidomic results. CONCLUSION: Homogenising this identification and adopting the same rules is essential to be able to share data within the community and to map data on functional networks.
Affiliation(s)
- Michael Witting
- Metabolomics and Proteomics Core, Helmholtz Zentrum München, Ingolstädter Landstraße 1, 85764, Neuherberg, Germany
- Chair of Analytical Food Chemistry, TUM School of Life Sciences, Technical University of Munich, Maximus-von-Imhof-Forum 2, 85354, Freising-Weihenstephan, Germany
- Adnan Malik
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, Cambridge, CB10 1SD, UK
- Andrew Leach
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, Cambridge, CB10 1SD, UK
- Alan Bridge
- SIB Swiss Institute of Bioinformatics, Centre Medical Universitaire, 1211, Geneva 4, Switzerland
- Lucila Aimo
- SIB Swiss Institute of Bioinformatics, Centre Medical Universitaire, 1211, Geneva 4, Switzerland
- Matthew J Conroy
- Division of Infection and Immunity, Systems Immunity Research Institute, School of Medicine, Cardiff University, Cardiff, CF14 4XN, UK
- Valerie B O'Donnell
- Division of Infection and Immunity, Systems Immunity Research Institute, School of Medicine, Cardiff University, Cardiff, CF14 4XN, UK
- Nils Hoffmann
- Institute for Bio- and Geosciences (IBG-5), Forschungszentrum Jülich GmbH, 52425, Jülich, Germany
- Dominik Kopczynski
- Institute for Analytical Chemistry, Universität Wien, Währingerstrasse 38, 1090, Vienna, Austria
- Franck Giacomoni
- Université Clermont Auvergne, INRAE, UNH, Plateforme d'Exploration du Métabolisme, MetaboHUB Clermont, Clermont-Ferrand, France
- MetaboHUB, National Infrastructure of Metabolomics and Fluxomics ANR-11-INBS-0010, 31077, Toulouse, France
- Nils Paulhe
- Université Clermont Auvergne, INRAE, UNH, Plateforme d'Exploration du Métabolisme, MetaboHUB Clermont, Clermont-Ferrand, France
- MetaboHUB, National Infrastructure of Metabolomics and Fluxomics ANR-11-INBS-0010, 31077, Toulouse, France
- Amaury Cazenave Gassiot
- Singapore Lipidomics Incubator, Life Sciences Institute, and Precision Medicine TRP, Yong Loo Lin School of Medicine, National University of Singapore, Singapore, Singapore
- Nathalie Poupin
- UMR1331 Toxalim, Université de Toulouse, INRAE, ENVT, INP-Purpan, UPS, Toulouse, France
- Fabien Jourdan
- MetaboHUB, National Infrastructure of Metabolomics and Fluxomics ANR-11-INBS-0010, 31077, Toulouse, France
- UMR1331 Toxalim, Université de Toulouse, INRAE, ENVT, INP-Purpan, UPS, Toulouse, France
- Justine Bertrand-Michel
- MetaboHUB, National Infrastructure of Metabolomics and Fluxomics ANR-11-INBS-0010, 31077, Toulouse, France.
- I2MC, Inserm U1297, Université de Toulouse, Toulouse, France.
20
Feitosa-Junior OR, Lubbe A, Kosina SM, Martins-Junior J, Barbosa D, Baccari C, Zaini PA, Bowen BP, Northen TR, Lindow SE, da Silva AM. The Exometabolome of Xylella fastidiosa in Contact with Paraburkholderia phytofirmans Supernatant Reveals Changes in Nicotinamide, Amino Acids, Biotin, and Plant Hormones. Metabolites 2024; 14:82. [PMID: 38392974 PMCID: PMC10890622 DOI: 10.3390/metabo14020082] [Received: 11/29/2023] [Revised: 01/11/2024] [Accepted: 01/12/2024] [Indexed: 02/25/2024]
Abstract
Microbial competition within plant tissues affects invading pathogens' fitness. Metabolomics is a great tool for studying their biochemical interactions by identifying accumulated metabolites. Xylella fastidiosa, a Gram-negative bacterium causing Pierce's disease (PD) in grapevines, secretes various virulence factors including cell wall-degrading enzymes, adhesion proteins, and quorum-sensing molecules. These factors, along with outer membrane vesicles, contribute to its pathogenicity. Previous studies demonstrated that co-inoculating X. fastidiosa with the Paraburkholderia phytofirmans strain PsJN suppressed PD symptoms. Here, we further investigated the interaction between the phytopathogen and the endophyte by analyzing the exometabolome of wild-type X. fastidiosa and a diffusible signaling factor (DSF) mutant lacking quorum sensing, cultivated with 20% P. phytofirmans spent media. Liquid chromatography-mass spectrometry (LC-MS) and the Method for Metabolite Annotation and Gene Integration (MAGI) were used to detect and map metabolites to genomes, revealing a total of 121 metabolites, of which 25 were further investigated. These metabolites potentially relate to host adaptation, virulence, and pathogenicity. Notably, this study presents the first comprehensive profile of X. fastidiosa in the presence of P. phytofirmans spent media. The results highlight that P. phytofirmans and the absence of functional quorum sensing affect the ratios of glutamine to glutamate (Gln:Glu) in X. fastidiosa. Additionally, two compounds with plant metabolism and growth properties, 2-aminoisobutyric acid and gibberellic acid, were downregulated when X. fastidiosa interacted with P. phytofirmans. These findings suggest that P. phytofirmans-mediated disease suppression involves modulation of the exometabolome of X. fastidiosa, impacting plant immunity.
Affiliation(s)
- Oseias R Feitosa-Junior
- Department of Biochemistry, Institute of Chemistry, University of Sao Paulo, Sao Paulo 05508-900, SP, Brazil
- The DOE Joint Genome Institute, Berkeley, CA 94720, USA
- Department of Plant and Microbial Biology, University of California, Berkeley, CA 94720, USA
- Andrea Lubbe
- Environmental Genomics and Systems Biology Division, Lawrence Berkeley National Laboratory, Berkeley, CA 94720, USA
- Suzanne M Kosina
- Environmental Genomics and Systems Biology Division, Lawrence Berkeley National Laboratory, Berkeley, CA 94720, USA
- Joaquim Martins-Junior
- Department of Biochemistry, Institute of Chemistry, University of Sao Paulo, Sao Paulo 05508-900, SP, Brazil
- Deibs Barbosa
- Department of Biochemistry, Institute of Chemistry, University of Sao Paulo, Sao Paulo 05508-900, SP, Brazil
- Clelia Baccari
- Department of Plant and Microbial Biology, University of California, Berkeley, CA 94720, USA
- Paulo A Zaini
- Department of Plant Sciences, University of California, Davis, CA 95616, USA
- Benjamin P Bowen
- The DOE Joint Genome Institute, Berkeley, CA 94720, USA
- Environmental Genomics and Systems Biology Division, Lawrence Berkeley National Laboratory, Berkeley, CA 94720, USA
- Trent R Northen
- The DOE Joint Genome Institute, Berkeley, CA 94720, USA
- Environmental Genomics and Systems Biology Division, Lawrence Berkeley National Laboratory, Berkeley, CA 94720, USA
- Steven E Lindow
- Department of Plant and Microbial Biology, University of California, Berkeley, CA 94720, USA
- Aline M da Silva
- Department of Biochemistry, Institute of Chemistry, University of Sao Paulo, Sao Paulo 05508-900, SP, Brazil
21
Xing H, Cai P, Liu D, Han M, Liu J, Le Y, Zhang D, Hu QN. High-throughput prediction of enzyme promiscuity based on substrate-product pairs. Brief Bioinform 2024; 25:bbae089. [PMID: 38487850 PMCID: PMC10940840 DOI: 10.1093/bib/bbae089] [Received: 10/30/2023] [Revised: 01/20/2024] [Accepted: 02/03/2024] [Indexed: 03/18/2024]
Abstract
The screening of enzymes for catalyzing specific substrate-product pairs is often constrained in the realms of metabolic engineering and synthetic biology. Existing tools based on substrate and reaction similarity predominantly rely on prior knowledge, demonstrating limited extrapolative capabilities and an inability to incorporate custom candidate-enzyme libraries. Addressing these limitations, we have developed the Substrate-product Pair-based Enzyme Promiscuity Prediction (SPEPP) model. This innovative approach utilizes transfer learning and transformer architecture to predict enzyme promiscuity, thereby elucidating the intricate interplay between enzymes and substrate-product pairs. SPEPP exhibited robust predictive ability, eliminating the need for prior knowledge of reactions and allowing users to define their own candidate-enzyme libraries. It can be seamlessly integrated into various applications, including metabolic engineering, de novo pathway design, and hazardous material degradation. To better assist metabolic engineers in designing and refining biochemical pathways, particularly those without programming skills, we also designed EnzyPick, an easy-to-use web server for enzyme screening based on SPEPP. EnzyPick is accessible at http://www.biosynther.com/enzypick/.
Affiliation(s)
- Huadong Xing
- CAS Key Laboratory of Computational Biology, CAS Key Laboratory of Nutrition, Metabolism and Food Safety, Shanghai Institute of Nutrition and Health, University of Chinese Academy of Sciences, Chinese Academy of Sciences, Shanghai 200031, China
| | - Pengli Cai
- CAS Key Laboratory of Computational Biology, CAS Key Laboratory of Nutrition, Metabolism and Food Safety, Shanghai Institute of Nutrition and Health, University of Chinese Academy of Sciences, Chinese Academy of Sciences, Shanghai 200031, China
| | - Dongliang Liu
- CAS Key Laboratory of Computational Biology, CAS Key Laboratory of Nutrition, Metabolism and Food Safety, Shanghai Institute of Nutrition and Health, University of Chinese Academy of Sciences, Chinese Academy of Sciences, Shanghai 200031, China
| | - Mengying Han
- CAS Key Laboratory of Computational Biology, CAS Key Laboratory of Nutrition, Metabolism and Food Safety, Shanghai Institute of Nutrition and Health, University of Chinese Academy of Sciences, Chinese Academy of Sciences, Shanghai 200031, China
| | - Juan Liu
- Institute of Artificial Intelligence, School of Computer Science, Wuhan University, Wuhan 430072, China
| | - Yingying Le
- CAS Key Laboratory of Computational Biology, CAS Key Laboratory of Nutrition, Metabolism and Food Safety, Shanghai Institute of Nutrition and Health, University of Chinese Academy of Sciences, Chinese Academy of Sciences, Shanghai 200031, China
| | - Dachuan Zhang
- Institute of Environmental Engineering, ETH Zurich, Laura-Hezner-Weg 7, 8093 Zurich, Switzerland
| | - Qian-Nan Hu
- CAS Key Laboratory of Computational Biology, CAS Key Laboratory of Nutrition, Metabolism and Food Safety, Shanghai Institute of Nutrition and Health, University of Chinese Academy of Sciences, Chinese Academy of Sciences, Shanghai 200031, China
|
22
|
McDonald AG, Lisacek F. Simulated digestions of free oligosaccharides and mucin-type O-glycans reveal a potential role for Clostridium perfringens. Sci Rep 2024; 14:1649. [PMID: 38238389 PMCID: PMC10796942 DOI: 10.1038/s41598-023-51012-4] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/16/2023] [Accepted: 12/29/2023] [Indexed: 01/22/2024] Open
Abstract
The development of a stable human gut microbiota occurs within the first year of life. Many open questions remain about how microfloral species are influenced by the composition of milk, in particular its content of human milk oligosaccharides (HMOs). The objective is to investigate the effect of the human HMO glycome on bacterial symbiosis and competition, based on the glycoside hydrolase (GH) enzyme activities known to be present in microbial species. We extracted from UniProt a list of all bacterial species catalysing glycoside hydrolase activities (EC 3.2.1.-), cross-referencing with the BRENDA database, and obtained a set of taxonomic lineages and CAZy family data. A set of 13 documented enzyme activities was selected and modelled within an enzyme simulator according to a method described previously in the context of biosynthesis. A diverse population of experimentally observed HMOs was fed to the simulator, and the enzymes matching specific bacterial species were recorded, based on the appearance of the individual enzymes in the UniProt dataset. Pairs of bacterial species were identified that possessed complementary enzyme profiles enabling the digestion of the HMO glycome, from which potential symbioses could be inferred. Conversely, bacterial species having similar GH enzyme profiles were considered likely to be in competition for the same set of dietary HMOs within the gut of the newborn. We generated a set of putative biodegradative networks from the simulator output, which provides a visualisation of the ability of organisms to digest HMO and mucin-type O-glycans. B. bifidum, B. longum and C. perfringens species were predicted to have the most diverse GH activity and therefore to excel in their ability to digest these substrates. The expected cooperative role of Bifidobacteriales contrasts with the surprising capacities of the pathogen. These findings indicate that potential pathogens may associate in the human gut based on their shared glycoside hydrolase digestive apparatus, which, in the event of colonisation, might result in dysbiosis. The methods described can readily be adapted to other enzyme categories and species, and can easily be fine-tuned if new degrading enzymes are identified and require inclusion in the model.
Affiliation(s)
- Andrew G McDonald
- Proteome Informatics Group, SIB Swiss Institute of Bioinformatics, 1211, Geneva, Switzerland.
- School of Biochemistry and Immunology, Trinity College Dublin, Dublin 2, Ireland.
| | - Frédérique Lisacek
- Proteome Informatics Group, SIB Swiss Institute of Bioinformatics, 1211, Geneva, Switzerland.
- Computer Science Department, University of Geneva, Geneva, Switzerland.
- Section of Biology, University of Geneva, Geneva, Switzerland.
|
23
|
Milacic M, Beavers D, Conley P, Gong C, Gillespie M, Griss J, Haw R, Jassal B, Matthews L, May B, Petryszak R, Ragueneau E, Rothfels K, Sevilla C, Shamovsky V, Stephan R, Tiwari K, Varusai T, Weiser J, Wright A, Wu G, Stein L, Hermjakob H, D’Eustachio P. The Reactome Pathway Knowledgebase 2024. Nucleic Acids Res 2024; 52:D672-D678. [PMID: 37941124 PMCID: PMC10767911 DOI: 10.1093/nar/gkad1025] [Citation(s) in RCA: 3] [Impact Index Per Article: 3.0] [Received: 09/29/2023] [Revised: 10/14/2023] [Accepted: 10/20/2023] [Indexed: 11/10/2023] Open
Abstract
The Reactome Knowledgebase (https://reactome.org), an Elixir and GCBR core biological data resource, provides manually curated molecular details of a broad range of normal and disease-related biological processes. Processes are annotated as an ordered network of molecular transformations in a single consistent data model. Reactome thus functions both as a digital archive of manually curated human biological processes and as a tool for discovering functional relationships in data such as gene expression profiles or somatic mutation catalogs from tumor cells. Here we review progress towards annotation of the entire human proteome, targeted annotation of disease-causing genetic variants of proteins and of small-molecule drugs in a pathway context, and towards supporting explicit annotation of cell- and tissue-specific pathways. Finally, we briefly discuss issues involved in making Reactome more fully interoperable with other related resources such as the Gene Ontology and maintaining the resulting community resource network.
Affiliation(s)
- Marija Milacic
- Ontario Institute for Cancer Research, Toronto, ON M5G0A3, Canada
| | - Deidre Beavers
- Oregon Health and Science University, Portland, OR 97239, USA
| | - Patrick Conley
- Oregon Health and Science University, Portland, OR 97239, USA
| | - Chuqiao Gong
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, Cambridgeshire CB10 1SD, UK
| | - Marc Gillespie
- Ontario Institute for Cancer Research, Toronto, ON M5G0A3, Canada
- College of Pharmacy and Health Sciences, St. John's University, Queens, NY 11439, USA
| | - Johannes Griss
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, Cambridgeshire CB10 1SD, UK
- Department of Dermatology, Medical University of Vienna, 1090 Vienna, Austria
| | - Robin Haw
- Ontario Institute for Cancer Research, Toronto, ON M5G0A3, Canada
| | - Bijay Jassal
- Ontario Institute for Cancer Research, Toronto, ON M5G0A3, Canada
| | - Lisa Matthews
- NYU Grossman School of Medicine, New York, NY 10016, USA
| | - Bruce May
- Ontario Institute for Cancer Research, Toronto, ON M5G0A3, Canada
| | | | - Eliot Ragueneau
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, Cambridgeshire CB10 1SD, UK
| | - Karen Rothfels
- Ontario Institute for Cancer Research, Toronto, ON M5G0A3, Canada
| | - Cristoffer Sevilla
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, Cambridgeshire CB10 1SD, UK
| | | | - Ralf Stephan
- Ontario Institute for Cancer Research, Toronto, ON M5G0A3, Canada
- Institute for Globally Distributed Open Research and Education (IGDORE)
| | - Krishna Tiwari
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, Cambridgeshire CB10 1SD, UK
| | - Thawfeek Varusai
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, Cambridgeshire CB10 1SD, UK
- Open Targets, Wellcome Genome Campus, Hinxton, Cambridgeshire CB10 1SD, UK
| | - Joel Weiser
- Ontario Institute for Cancer Research, Toronto, ON M5G0A3, Canada
| | - Adam Wright
- Ontario Institute for Cancer Research, Toronto, ON M5G0A3, Canada
| | - Guanming Wu
- Oregon Health and Science University, Portland, OR 97239, USA
| | - Lincoln Stein
- Ontario Institute for Cancer Research, Toronto, ON M5G0A3, Canada
- Department of Molecular Genetics, University of Toronto, Toronto, ON M5S 1A1, Canada
| | - Henning Hermjakob
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, Cambridgeshire CB10 1SD, UK
|
24
|
Conroy MJ, Andrews RM, Andrews S, Cockayne L, Dennis E, Fahy E, Gaud C, Griffiths W, Jukes G, Kolchin M, Mendivelso K, Lopez-Clavijo A, Ready C, Subramaniam S, O’Donnell V. LIPID MAPS: update to databases and tools for the lipidomics community. Nucleic Acids Res 2024; 52:D1677-D1682. [PMID: 37855672 PMCID: PMC10767878 DOI: 10.1093/nar/gkad896] [Citation(s) in RCA: 5] [Impact Index Per Article: 5.0] [Received: 09/12/2023] [Revised: 10/03/2023] [Accepted: 10/04/2023] [Indexed: 10/20/2023] Open
Abstract
LIPID MAPS (LIPID Metabolites and Pathways Strategy), www.lipidmaps.org, provides a systematic and standardized approach to organizing lipid structural and biochemical data. Founded 20 years ago, the LIPID MAPS nomenclature and classification has become the accepted community standard. LIPID MAPS provides databases for cataloging and identifying lipids at varying levels of characterization in addition to numerous software tools and educational resources, and became an ELIXIR-UK data resource in 2020. This paper describes the expansion of existing databases in LIPID MAPS, including richer metadata with literature provenance, taxonomic data and improved interoperability to facilitate FAIR compliance. A joint project funded by ELIXIR-UK, in collaboration with WikiPathways, curates and hosts pathway data, and annotates lipids in the context of their biochemical pathways. Updated features of the search infrastructure are described along with implementation of programmatic access via API and SPARQL. New lipid-specific databases have been developed and provision of lipidomics tools to the community has been updated. Training and engagement have been expanded with webinars, podcasts and an online training school.
Affiliation(s)
- Matthew J Conroy
- Systems Immunity Research Institute, School of Medicine, Cardiff University, Cardiff CF14 4XN, UK
| | - Robert M Andrews
- Systems Immunity Research Institute, School of Medicine, Cardiff University, Cardiff CF14 4XN, UK
| | - Simon Andrews
- Babraham Institute, Babraham Research Campus, Cambridge CB22 3AT, UK
| | - Lauren Cockayne
- Systems Immunity Research Institute, School of Medicine, Cardiff University, Cardiff CF14 4XN, UK
| | - Edward A Dennis
- Department of Pharmacology, Department of Chemistry and Biochemistry, University of California, San Diego, La Jolla, CA 92093-0601, USA
| | - Eoin Fahy
- Department of Bioengineering, University of California, San Diego, 9500 Gilman Drive, La Jolla, CA 92037, USA
| | - Caroline Gaud
- Babraham Institute, Babraham Research Campus, Cambridge CB22 3AT, UK
| | - William J Griffiths
- Swansea University Medical School, Singleton Park, Swansea SA2 8PP, Wales, UK
| | - Geoff Jukes
- Systems Immunity Research Institute, School of Medicine, Cardiff University, Cardiff CF14 4XN, UK
| | - Maksim Kolchin
- Boehringer Ingelheim Espana SA, Carrer de Prat de la Riba, 50, 08174 Sant Cugat del Vallès, Barcelona, Spain
| | - Karla Mendivelso
- Systems Immunity Research Institute, School of Medicine, Cardiff University, Cardiff CF14 4XN, UK
| | | | - Caroline Ready
- Systems Immunity Research Institute, School of Medicine, Cardiff University, Cardiff CF14 4XN, UK
| | - Shankar Subramaniam
- Department of Bioengineering, University of California, San Diego, 9500 Gilman Drive, La Jolla, CA 92037, USA
| | - Valerie B O’Donnell
- Systems Immunity Research Institute, School of Medicine, Cardiff University, Cardiff CF14 4XN, UK
|
25
|
Boob AG, Chen J, Zhao H. Enabling pathway design by multiplex experimentation and machine learning. Metab Eng 2024; 81:70-87. [PMID: 38040110 DOI: 10.1016/j.ymben.2023.11.006] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/14/2023] [Revised: 11/01/2023] [Accepted: 11/25/2023] [Indexed: 12/03/2023]
Abstract
The remarkable metabolic diversity observed in nature has provided a foundation for sustainable production of a wide array of valuable molecules. However, transferring the biosynthetic pathway to the desired host often runs into inherent failures that arise from intermediate accumulation and reduced flux resulting from competing pathways within the host cell. Moreover, the conventional trial and error methods utilized in pathway optimization struggle to fully grasp the intricacies of installed pathways, leading to time-consuming and labor-intensive experiments, ultimately resulting in suboptimal yields. Considering these obstacles, there is a pressing need to explore the enzyme expression landscape and identify the optimal pathway configuration for enhanced production of molecules. This review delves into recent advancements in pathway engineering, with a focus on multiplex experimentation and machine learning techniques. These approaches play a pivotal role in overcoming the limitations of traditional methods, enabling exploration of a broader design space and increasing the likelihood of discovering optimal pathway configurations for enhanced production of molecules. We discuss several tools and strategies for pathway design, construction, and optimization for sustainable and cost-effective microbial production of molecules ranging from bulk to fine chemicals. We also highlight major successes in academia and industry through compelling case studies.
Affiliation(s)
- Aashutosh Girish Boob
- Department of Chemical and Biomolecular Engineering, University of Illinois Urbana-Champaign, Urbana, IL, 61801, United States; Carl R. Woese Institute for Genomic Biology, University of Illinois Urbana-Champaign, Urbana, IL, 61801, United States; DOE Center for Advanced Bioenergy and Bioproducts Innovation, University of Illinois Urbana-Champaign, Urbana, Illinois 61801, United States
| | - Junyu Chen
- Department of Bioengineering, University of Illinois Urbana-Champaign, Urbana, IL, 61801, United States; Carl R. Woese Institute for Genomic Biology, University of Illinois Urbana-Champaign, Urbana, IL, 61801, United States; DOE Center for Advanced Bioenergy and Bioproducts Innovation, University of Illinois Urbana-Champaign, Urbana, Illinois 61801, United States
| | - Huimin Zhao
- Department of Chemical and Biomolecular Engineering, University of Illinois Urbana-Champaign, Urbana, IL, 61801, United States; Department of Bioengineering, University of Illinois Urbana-Champaign, Urbana, IL, 61801, United States; Carl R. Woese Institute for Genomic Biology, University of Illinois Urbana-Champaign, Urbana, IL, 61801, United States; DOE Center for Advanced Bioenergy and Bioproducts Innovation, University of Illinois Urbana-Champaign, Urbana, Illinois 61801, United States.
|
26
|
Abad-Navarro F, Martínez-Costa C. A knowledge graph-based data harmonization framework for secondary data reuse. Comput Methods Programs Biomed 2024; 243:107918. [PMID: 37981455 DOI: 10.1016/j.cmpb.2023.107918] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/22/2023] [Revised: 10/02/2023] [Accepted: 11/05/2023] [Indexed: 11/21/2023]
Abstract
BACKGROUND AND OBJECTIVE: The adoption of new technologies in clinical care systems has made a great amount of valuable data available. However, this data is usually heterogeneous and must be harmonized before it can be integrated and analysed. We propose a semantic-driven harmonization framework that (1) enables the meaningful sharing and integration of healthcare data across institutions and (2) facilitates the analysis and exploitation of the shared data. METHODS: The framework includes an ontology-based common data model (i.e. SCDM), a data transformation pipeline and a semantic query system. Heterogeneous datasets, mapped to different terminologies, are integrated by using an ontology-based infrastructure rooted in a top-level ontology. A graph database is generated by using these mappings, and a web-based semantic query system facilitates data exploration. RESULTS: Several datasets from different European institutions have been integrated by using the framework in the context of the European H2020 Precise4Q project. Through the query system, data scientists were able to explore data and use it for building machine learning models. CONCLUSIONS: The flexible data representation using RDF, together with the formal semantic underpinning provided by the SCDM, has enabled the semantic integration, query and advanced exploitation of heterogeneous data in the context of the Precise4Q project.
Affiliation(s)
- Francisco Abad-Navarro
- Departamento de Informática y Sistemas, Universidad de Murcia, CEIR Campus Mare Nostrum, IMIB-Arrixaca, 30100, Murcia, Spain.
| | - Catalina Martínez-Costa
- Departamento de Informática y Sistemas, Universidad de Murcia, CEIR Campus Mare Nostrum, IMIB-Arrixaca, 30100, Murcia, Spain.
|
27
|
Ma S, Fan L, Konanki SA, Liu E, Gennari JH, Smith LP, Hellerstein JL, Sauro HM. VSCode-Antimony: a source editor for building, analyzing, and translating antimony models. Bioinformatics 2023; 39:btad753. [PMID: 38096590 PMCID: PMC10753917 DOI: 10.1093/bioinformatics/btad753] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/02/2023] [Revised: 11/06/2023] [Accepted: 12/13/2023] [Indexed: 12/29/2023] Open
Abstract
MOTIVATION: Developing biochemical models in systems biology is a complex, knowledge-intensive activity. Some modelers (especially novices) benefit from model development tools with a graphical user interface. However, as with the development of complex software, text-based representations of models provide many benefits for advanced model development. At present, the tools for text-based model development are limited, typically just a textual editor that provides features such as copy, paste, find, and replace. Since these tools are not "model aware," they do not provide features for: (i) model building such as autocompletion of species names; (ii) model analysis such as hover messages that provide information about chemical species; and (iii) model translation to convert between model representations. We refer to these as BAT features. RESULTS: We present VSCode-Antimony, a tool for building, analyzing, and translating models written in the Antimony modeling language, a human readable representation of Systems Biology Markup Language (SBML) models. VSCode-Antimony is a source editor, a tool with language-aware features. For example, there is autocompletion of variable names to assist with model building, hover messages that aid in model analysis, and translation between XML and Antimony representations of SBML models. These features result from making VSCode-Antimony model-aware by incorporating several sophisticated capabilities: analysis of the Antimony grammar (e.g. to identify model symbols and their types); a query system for accessing knowledge sources for chemical species and reactions; and automatic conversion between different model representations (e.g. between Antimony and SBML). AVAILABILITY AND IMPLEMENTATION: VSCode-Antimony is available as an open source extension in the VSCode Marketplace at https://marketplace.visualstudio.com/items?itemName=stevem.vscode-antimony. Source code can be found at https://github.com/sys-bio/vscode-antimony.
Affiliation(s)
- Steve Ma
- NVIDIA Corporation, Redmond, WA 98052, United States
| | - Longxuan Fan
- Department of Mathematics, University of Washington, Seattle, WA 98195, United States
| | - Sai Anish Konanki
- Allen School of Computer Science, University of Washington, Seattle, WA 98195, United States
| | - Eva Liu
- Allen School of Computer Science, University of Washington, Seattle, WA 98195, United States
| | - John H Gennari
- Biomedical and Health Informatics, University of Washington, Seattle, WA 98195, United States
| | - Lucian P Smith
- Department of Bioengineering, University of Washington, Seattle, WA 98195, United States
| | | | - Herbert M Sauro
- Department of Bioengineering, University of Washington, Seattle, WA 98195, United States
|
28
|
Irmler S, Bavan T, Binz E, Portmann R. Ability of Latilactobacillus curvatus FAM25164 to produce tryptamine: Identification of a novel tryptophan decarboxylase. Food Microbiol 2023; 116:104343. [PMID: 37689414 DOI: 10.1016/j.fm.2023.104343] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/31/2023] [Revised: 07/16/2023] [Accepted: 07/17/2023] [Indexed: 09/11/2023]
Abstract
Screenings of cheese isolates revealed that the Latilactobacillus curvatus strain FAM25164 formed tryptamine and tyramine. The present study examined whether a tryptophan decarboxylase, which has rarely been described in bacteria, could be involved in the production of tryptamine. The genome of strain FAM25164 was sequenced and two amino acid decarboxylase genes of interest were identified by sequence comparisons and gene context analyses. One of the two genes, named tdc1, showed 99% identity to the tdcA gene that has recently been demonstrated by knockout studies to play a role in tyramine formation in L. curvatus. The second gene, named tdc2, was predicted to have an amino acid decarboxylase function, but could not be assigned to a metabolic function. Its protein sequence has 51% identity with Tdc1 and the tdc2 gene is part of a gene cluster not often found in publicly available genome sequences of L. curvatus. Among others, the gene cluster includes a tryptophan-tRNA ligase, indicating that tdc2 plays a role in tryptophan metabolism. To study decarboxylase activity, tdc1 and tdc2 were cloned and expressed as His6-tagged proteins in Escherichia coli. The recombinant E. coli expressing tdc1 produced tyramine, whereas E. coli expressing tdc2 produced tryptamine. The purified recombinant Tdc1 protein decarboxylated tyrosine and 2,3-dihydroxy-l-phenylalanine (l-DOPA), but not tryptophan or phenylalanine. In contrast, the purified Tdc2 was capable of decarboxylating tryptophan but not l-DOPA, tyrosine, or phenylalanine. This study describes a novel bacterial tryptophan decarboxylase (EC 4.1.1.105) that may be responsible for tryptamine formation in cheese.
|
29
|
Ribeiro AJM, Riziotis IG, Borkakoti N, Thornton JM. Enzyme function and evolution through the lens of bioinformatics. Biochem J 2023; 480:1845-1863. [PMID: 37991346 PMCID: PMC10754289 DOI: 10.1042/bcj20220405] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/20/2023] [Revised: 11/09/2023] [Accepted: 11/14/2023] [Indexed: 11/23/2023]
Abstract
Enzymes have been shaped by evolution over billions of years to catalyse the chemical reactions that support life on earth. Dispersed in the literature, or organised in online databases, knowledge about enzymes can be structured in distinct dimensions, either related to their quality as biological macromolecules, such as their sequence and structure, or related to their chemical functions, such as the catalytic site, kinetics, mechanism, and overall reaction. The evolution of enzymes can only be understood when each of these dimensions is considered. In addition, many of the properties of enzymes only make sense in the light of evolution. We start this review by outlining the main paradigms of enzyme evolution, including gene duplication and divergence, convergent evolution, and evolution by recombination of domains. In the second part, we overview the current collective knowledge about enzymes, as organised by different types of data and collected in several databases. We also highlight some increasingly powerful computational tools that can be used to close gaps in understanding, in particular for types of data that require laborious experimental protocols. We believe that recent advances in protein structure prediction will be a powerful catalyst for the prediction of binding, mechanism, and ultimately, chemical reactions. A comprehensive mapping of enzyme function and evolution may be attainable in the near future.
Affiliation(s)
- Antonio J. M. Ribeiro
- European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SD, U.K
| | - Ioannis G. Riziotis
- European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SD, U.K
| | - Neera Borkakoti
- European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SD, U.K
| | - Janet M. Thornton
- European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SD, U.K
|
30
|
Prešern U, Goličnik M. Enzyme Databases in the Era of Omics and Artificial Intelligence. Int J Mol Sci 2023; 24:16918. [PMID: 38069254 PMCID: PMC10707154 DOI: 10.3390/ijms242316918] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/03/2023] [Revised: 11/24/2023] [Accepted: 11/26/2023] [Indexed: 12/18/2023] Open
Abstract
Enzyme research is important for the development of various scientific fields such as medicine and biotechnology. Enzyme databases facilitate this research by providing a wide range of information relevant to research planning and data analysis. Over the years, various databases that cover different aspects of enzyme biology (e.g., kinetic parameters, enzyme occurrence, and reaction mechanisms) have been developed. Most of the databases are curated manually, which improves reliability of the information; however, such curation cannot keep pace with the exponential growth in published data. Lack of data standardization is another obstacle for data extraction and analysis. Improving machine readability of databases is especially important in the light of recent advances in deep learning algorithms that require big training datasets. This review provides information regarding the current state of enzyme databases, especially in relation to the ever-increasing amount of generated research data and recent advancements in artificial intelligence algorithms. Furthermore, it describes several enzyme databases, providing the reader with necessary information for their use.
Affiliation(s)
| | - Marko Goličnik
- Institute of Biochemistry and Molecular Genetics, Faculty of Medicine, University of Ljubljana, Vrazov trg 2, 1000 Ljubljana, Slovenia;
|
31
|
Rushing BR, Thessen AE, Soliman GA, Ramesh A, Sumner SCJ. The Exposome and Nutritional Pharmacology and Toxicology: A New Application for Metabolomics. Exposome 2023; 3:osad008. [PMID: 38766521 PMCID: PMC11101153 DOI: 10.1093/exposome/osad008] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Indexed: 05/22/2024]
Abstract
The exposome refers to all of the internal and external life-long exposures that an individual experiences. These exposures, either acute or chronic, are associated with changes in metabolism that will positively or negatively influence the health and well-being of individuals. Nutrients and other dietary compounds modulate similar biochemical processes and have the potential in some cases to counteract the negative effects of exposures or enhance their beneficial effects. We present herein the concept of Nutritional Pharmacology/Toxicology which uses high-information metabolomics workflows to identify metabolic targets associated with exposures. Using this information, nutritional interventions can be designed toward those targets to mitigate adverse effects or enhance positive effects. We also discuss the potential for this approach in precision nutrition where nutrients/diet can be used to target gene-environment interactions and other subpopulation characteristics. Deriving these "nutrient cocktails" presents an opportunity to modify the effects of exposures for more beneficial outcomes in public health.
Affiliation(s)
- Blake R. Rushing
- Department of Nutrition, University of North Carolina at Chapel Hill, Chapel Hill, NC, USA
| | - Anne E Thessen
- Department of Biomedical Informatics, University of Colorado Anschutz Medical Campus, Aurora, CO, USA
| | - Ghada A. Soliman
- Department of Environmental, Occupational and Geospatial Health Sciences, City University of New York-Graduate School of Public Health and Health Policy, New York, NY, USA
| | - Aramandla Ramesh
- Department of Biochemistry, Cancer Biology, Neuroscience & Pharmacology, Meharry Medical College, Nashville, TN, USA
| | - Susan CJ Sumner
- Department of Nutrition, University of North Carolina at Chapel Hill, Chapel Hill, NC, USA
|
32
|
Probst D. An explainability framework for deep learning on chemical reactions exemplified by enzyme-catalysed reaction classification. J Cheminform 2023; 15:113. [PMID: 37996942 PMCID: PMC10668483 DOI: 10.1186/s13321-023-00784-y] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/22/2023] [Accepted: 11/13/2023] [Indexed: 11/25/2023] Open
Abstract
Assigning or proposing a catalysing enzyme given a chemical or biochemical reaction is of great interest to life sciences and chemistry alike. The exploration and design of metabolic pathways and the challenge of finding more sustainable enzyme-catalysed alternatives to traditional organic reactions are just two examples of tasks that require an association between reaction and enzyme. However, given the lack of large and balanced annotated data sets of enzyme-catalysed reactions, assigning an enzyme to a reaction still relies on expert-curated rules and databases. Here, we present a data-driven explainable human-in-the-loop machine learning approach to support and ultimately automate the association of a catalysing enzyme with a given biochemical reaction. In addition, the proposed method is capable of predicting enzymes as candidate catalysts for organic reactions amenable to biocatalysis. Finally, the introduced explainability and visualisation methods can easily be generalised to support other machine-learning approaches involving chemical and biochemical reactions.
Affiliation(s)
- Daniel Probst
- Signal Processing Laboratory 2, Institute of Electrical and Micro Engineering, School of Engineering, EPFL, Rte Cantonale, 1015, Lausanne, Vaud, Switzerland.
| |
|
33
|
Ryu G, Kim GB, Yu T, Lee SY. Deep learning for metabolic pathway design. Metab Eng 2023; 80:130-141. [PMID: 37734652 DOI: 10.1016/j.ymben.2023.09.012] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/19/2023] [Revised: 09/17/2023] [Accepted: 09/19/2023] [Indexed: 09/23/2023]
Abstract
The establishment of a bio-based circular economy is imperative in tackling the climate crisis and advancing sustainable development. In this realm, the creation of microbial cell factories is central to generating a variety of chemicals and materials. The design of metabolic pathways is crucial in shaping these microbial cell factories, especially when it comes to producing chemicals with yet-to-be-discovered biosynthetic routes. To aid in navigating the complexities of chemical and metabolic domains, computer-supported tools for metabolic pathway design have emerged. In this paper, we evaluate how digital strategies can be employed for pathway prediction and enzyme discovery. Additionally, we touch upon the recent strides made in using deep learning techniques for metabolic pathway prediction. These computational tools and strategies streamline the design of metabolic pathways, facilitating the development of microbial cell factories. Leveraging the capabilities of deep learning in metabolic pathway design is profoundly promising, potentially hastening the advent of a bio-based circular economy.
Affiliation(s)
- Gahyeon Ryu
- Metabolic and Biomolecular Engineering National Research Laboratory, Department of Chemical and Biomolecular Engineering (BK21 Four), KAIST Institute for BioCentury, Korea Advanced Institute of Science and Technology (KAIST), Daejeon, 34141, Republic of Korea; Systems Metabolic Engineering and Systems Healthcare Cross-Generation Collaborative Laboratory, KAIST, Daejeon, 34141, Republic of Korea
| | - Gi Bae Kim
- Metabolic and Biomolecular Engineering National Research Laboratory, Department of Chemical and Biomolecular Engineering (BK21 Four), KAIST Institute for BioCentury, Korea Advanced Institute of Science and Technology (KAIST), Daejeon, 34141, Republic of Korea; Systems Metabolic Engineering and Systems Healthcare Cross-Generation Collaborative Laboratory, KAIST, Daejeon, 34141, Republic of Korea
| | - Taeho Yu
- Metabolic and Biomolecular Engineering National Research Laboratory, Department of Chemical and Biomolecular Engineering (BK21 Four), KAIST Institute for BioCentury, Korea Advanced Institute of Science and Technology (KAIST), Daejeon, 34141, Republic of Korea; Systems Metabolic Engineering and Systems Healthcare Cross-Generation Collaborative Laboratory, KAIST, Daejeon, 34141, Republic of Korea
| | - Sang Yup Lee
- Metabolic and Biomolecular Engineering National Research Laboratory, Department of Chemical and Biomolecular Engineering (BK21 Four), KAIST Institute for BioCentury, Korea Advanced Institute of Science and Technology (KAIST), Daejeon, 34141, Republic of Korea; Systems Metabolic Engineering and Systems Healthcare Cross-Generation Collaborative Laboratory, KAIST, Daejeon, 34141, Republic of Korea; BioProcess Engineering Research Center and BioInformatics Research Center, KAIST, Daejeon, 34141, Republic of Korea; Graduate School of Engineering Biology, KAIST, Daejeon, 34141, Republic of Korea.
| |
|
34
|
Bay ÖF, Hayes KS, Schwartz JM, Grencis RK, Roberts IS. A genome-scale metabolic model of parasitic whipworm. Nat Commun 2023; 14:6937. [PMID: 37907472 PMCID: PMC10618284 DOI: 10.1038/s41467-023-42552-4] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/14/2023] [Accepted: 10/13/2023] [Indexed: 11/02/2023] Open
Abstract
Genome-scale metabolic models are widely used to enhance our understanding of metabolic features of organisms and host-pathogen interactions, and to identify therapeutics for diseases. Here we present iTMU798, the genome-scale metabolic model of the mouse whipworm Trichuris muris. The model demonstrates the metabolic features of T. muris and allows the prediction of metabolic steps essential for its survival. Specifically, it predicts that the Thioredoxin Reductase (TrxR) enzyme is essential, a prediction we validate in vitro with the drug auranofin. Furthermore, our observation that the T. muris genome lacks gsr-1 encoding Glutathione Reductase (GR) but has GR activity that can be inhibited by auranofin indicates a mechanism for the reduction of glutathione by the TrxR enzyme in T. muris. In addition, iTMU798 predicts seven essential amino acids that cannot be synthesised by T. muris, a prediction we validate for the amino acid tryptophan. Overall, iTMU798 is a powerful tool for studying the metabolism of T. muris and of other Trichuris spp., for understanding host-parasite interactions, and for the rational design of new intervention strategies.
Affiliation(s)
- Ömer F Bay
- Division of Infection, Immunity and Respiratory Medicine, Faculty of Biology, Medicine and Health, University of Manchester, Manchester, UK
- Bioinformatics, Abdullah Gül University, Kayseri, Türkiye
- The Lydia Becker Institute of Immunology and Inflammation, Faculty of Biology, Medicine and Health, University of Manchester, Manchester, UK
| | - Kelly S Hayes
- Division of Infection, Immunity and Respiratory Medicine, Faculty of Biology, Medicine and Health, University of Manchester, Manchester, UK
- The Lydia Becker Institute of Immunology and Inflammation, Faculty of Biology, Medicine and Health, University of Manchester, Manchester, UK
- The Wellcome Trust Centre for Cell-Matrix Research, University of Manchester, Manchester, UK
| | - Jean-Marc Schwartz
- Division of Evolution, Infection and Genomics, Faculty of Biology, Medicine and Health, University of Manchester, Manchester, UK
| | - Richard K Grencis
- Division of Infection, Immunity and Respiratory Medicine, Faculty of Biology, Medicine and Health, University of Manchester, Manchester, UK.
- The Lydia Becker Institute of Immunology and Inflammation, Faculty of Biology, Medicine and Health, University of Manchester, Manchester, UK.
- The Wellcome Trust Centre for Cell-Matrix Research, University of Manchester, Manchester, UK.
| | - Ian S Roberts
- Division of Infection, Immunity and Respiratory Medicine, Faculty of Biology, Medicine and Health, University of Manchester, Manchester, UK.
- The Lydia Becker Institute of Immunology and Inflammation, Faculty of Biology, Medicine and Health, University of Manchester, Manchester, UK.
| |
|
35
|
Palm EH, Chirsir P, Krier J, Thiessen PA, Zhang J, Bolton EE, Schymanski EL. ShinyTPs: Curating Transformation Products from Text Mining Results. Environ Sci Technol Lett 2023; 10:865-871. [PMID: 37840815 PMCID: PMC10569035 DOI: 10.1021/acs.estlett.3c00537] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/28/2023] [Revised: 09/14/2023] [Accepted: 09/15/2023] [Indexed: 10/17/2023]
Abstract
Transformation product (TP) information is essential to accurately evaluate the hazards compounds pose to human health and the environment. However, information about TPs is often limited, and existing data is often not fully Findable, Accessible, Interoperable, and Reusable (FAIR). FAIRifying existing TP knowledge is a relatively easy path toward improving access to data for identification workflows and for machine-learning-based algorithms. ShinyTPs was developed to curate existing transformation information derived from text-mined data within the PubChem database. The application (available as an R package) visualizes the text-mined chemical names to facilitate the user validation of the automatically extracted reactions. ShinyTPs was applied to a case study using 436 tentatively identified compounds to prioritize TP retrieval. This resulted in the extraction of 645 reactions (associated with 496 compounds), of which 319 were not previously available in PubChem. The curated reactions were added to the PubChem Transformations library, which was used as a TP suspect list for identification of TPs using the open-source workflow patRoon. In total, 72 compounds from the library were tentatively identified, 18% of which were curated using ShinyTPs, showing that the app can help support TP identification in non-target analysis workflows.
Affiliation(s)
- Emma H. Palm
- Luxembourg Centre for Systems Biomedicine (LCSB), University of Luxembourg, 6 Avenue du Swing, 4367 Belvaux, Luxembourg
| | - Parviel Chirsir
- Luxembourg Centre for Systems Biomedicine (LCSB), University of Luxembourg, 6 Avenue du Swing, 4367 Belvaux, Luxembourg
| | - Jessy Krier
- Luxembourg Centre for Systems Biomedicine (LCSB), University of Luxembourg, 6 Avenue du Swing, 4367 Belvaux, Luxembourg
| | - Paul A. Thiessen
- National Center for Biotechnology Information (NCBI), National Library of Medicine (NLM), National Institutes of Health (NIH), Bethesda, Maryland 20894, United States
| | - Jian Zhang
- National Center for Biotechnology Information (NCBI), National Library of Medicine (NLM), National Institutes of Health (NIH), Bethesda, Maryland 20894, United States
| | - Evan E. Bolton
- National Center for Biotechnology Information (NCBI), National Library of Medicine (NLM), National Institutes of Health (NIH), Bethesda, Maryland 20894, United States
| | - Emma L. Schymanski
- Luxembourg Centre for Systems Biomedicine (LCSB), University of Luxembourg, 6 Avenue du Swing, 4367 Belvaux, Luxembourg
| |
|
36
|
Ribeiro AJM, Riziotis IG, Tyzack JD, Borkakoti N, Thornton JM. EzMechanism: an automated tool to propose catalytic mechanisms of enzyme reactions. Nat Methods 2023; 20:1516-1522. [PMID: 37735566 PMCID: PMC10555830 DOI: 10.1038/s41592-023-02006-7] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/05/2022] [Accepted: 08/15/2023] [Indexed: 09/23/2023]
Abstract
Over the years, hundreds of enzyme reaction mechanisms have been studied using experimental and simulation methods. This rich literature on biological catalysis is now ripe for use as the foundation of new knowledge-based approaches to investigate enzyme mechanisms. Here, we present a tool able to automatically infer mechanistic paths for a given three-dimensional active site and enzyme reaction, based on a set of catalytic rules compiled from the Mechanism and Catalytic Site Atlas, a database of enzyme mechanisms. EzMechanism (pronounced as 'Easy' Mechanism) is available to everyone through a web user interface. When studying a mechanism, EzMechanism facilitates and improves the generation of hypotheses, by making sure that relevant information is considered, as derived from the literature on both related and unrelated enzymes. We validated EzMechanism on a set of 62 enzymes and have identified paths for further improvement, including the need for additional and more generic catalytic rules.
Affiliation(s)
- Antonio J M Ribeiro
- European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge, UK.
| | - Ioannis G Riziotis
- European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge, UK
| | - Jonathan D Tyzack
- European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge, UK
| | - Neera Borkakoti
- European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge, UK
| | - Janet M Thornton
- European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge, UK.
| |
|
37
|
Holland BL, Matthews ML, Bota P, Sweetlove LJ, Long SP, diCenzo GC. A genome-scale metabolic reconstruction of soybean and Bradyrhizobium diazoefficiens reveals the cost-benefit of nitrogen fixation. New Phytol 2023; 240:744-756. [PMID: 37649265 DOI: 10.1111/nph.19203] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Received: 03/29/2023] [Accepted: 07/05/2023] [Indexed: 09/01/2023]
Abstract
Nitrogen-fixing symbioses allow legumes to thrive in nitrogen-poor soils at the cost of diverting some photoassimilate to their microsymbionts. Effort is being made to bioengineer nitrogen fixation into nonleguminous crops. This requires a quantitative understanding of its energetic costs and the links between metabolic variations and symbiotic efficiency. A whole-plant metabolic model for soybean (Glycine max) with its associated microsymbiont Bradyrhizobium diazoefficiens was developed and applied to predict the cost-benefit of nitrogen fixation with varying soil nitrogen availability. The model predicted a nitrogen-fixation cost of c. 4.13 g C g⁻¹ N, which when implemented into a crop scale model, translated to a grain yield reduction of 27% compared with a non-nodulating plant receiving its nitrogen from the soil. Considering the lower nitrogen content of cereals, the yield cost to a hypothetical N-fixing cereal is predicted to be less than half that of soybean. Soybean growth was predicted to be c. 5% greater when the nodule nitrogen export products were amides versus ureides. This is the first metabolic reconstruction in a tropical crop species that simulates the entire plant and nodule metabolism. Going forward, this model will serve as a tool to investigate carbon use efficiency and key mechanisms within N-fixing symbiosis in a tropical species forming determinate nodules.
Affiliation(s)
- Bethany L Holland
- Carl R Woese Institute for Genomic Biology, University of Illinois at Urbana-Champaign, Urbana, IL, 61801, USA
| | - Megan L Matthews
- Department of Civil and Environmental Engineering, University of Illinois at Urbana-Champaign, Urbana, IL, 61801, USA
| | - Pedro Bota
- Department of Plant Sciences, University of Oxford, South Parks Road, Oxford, OX1 3RB, UK
| | - Lee J Sweetlove
- Department of Plant Sciences, University of Oxford, South Parks Road, Oxford, OX1 3RB, UK
| | - Stephen P Long
- Carl R Woese Institute for Genomic Biology, University of Illinois at Urbana-Champaign, Urbana, IL, 61801, USA
- Departments of Plant Biology and of Crop Sciences, University of Illinois at Urbana-Champaign, Urbana, IL, 61801, USA
| | - George C diCenzo
- Department of Biology, Queen's University, Kingston, ON, K7L 3N6, Canada
| |
|
38
|
Wang Y, Quan S, Zhao Y, Xia Y, Zhang R, Ran M, Wu Z, Zhang W. The active synergetic microbiota with Aspergillus as the core dominates the metabolic network of ester synthesis in medium-high temperature Daqu. Food Microbiol 2023; 115:104336. [PMID: 37567625 DOI: 10.1016/j.fm.2023.104336] [Citation(s) in RCA: 6] [Impact Index Per Article: 6.0] [Received: 02/22/2023] [Revised: 07/07/2023] [Accepted: 07/07/2023] [Indexed: 08/13/2023]
Abstract
The active ester-synthesis microorganisms in medium-high temperature Daqu (MHT-Daqu) largely impact the strong-flavor Baijiu quality, while their actual composition and metabolic mechanism remain unclear. Here, to explore how the active microbiota contributes to MHT-Daqu ester biosynthesis, metatranscriptomic and metaproteomic analyses coupled with experimental verification were performed. The results showed that the MHT-Daqu microbiota with the higher ester-forming ability exhibited a more active dynamic alteration from transcription to translation. The genera Aspergillus, Bacillus, Leuconostoc, and Pediococcus could transcribe and translate obviously more ester-forming enzymes. In the ester-synthesis metabolic network, the synergetic microbiota confirmed by interaction analysis, containing Eurotiales, Bacillales, and Saccharomycetales, played an essential role, in which the Eurotiales and its representative genus Aspergillus contributed the highest transcript and protein abundance in almost every metabolic process, respectively. The recombined fermentation verified that their corresponding genera could produce the ester and precursor profiles very close to that of the original MHT-Daqu active microbiota, while the microbiota without Aspergillus caused a polar separation. These results indicated that the synergetic microbiota with Aspergillus as the core dominated the metabolic network of ester synthesis in MHT-Daqu. Our study provides a detailed framework of the association between the active synergetic microbiota and ester synthesis in MHT-Daqu.
Affiliation(s)
- Yan Wang
- College of Biomass Science and Engineering, Sichuan University, Chengdu, 610065, China.
| | - Shikai Quan
- College of Biomass Science and Engineering, Sichuan University, Chengdu, 610065, China.
| | - Yajiao Zhao
- College of Biomass Science and Engineering, Sichuan University, Chengdu, 610065, China.
| | - Yu Xia
- College of Biomass Science and Engineering, Sichuan University, Chengdu, 610065, China.
| | - Rui Zhang
- Luzhou Laojiao Co., Ltd, Luzhou, 646600, China.
| | - Maofang Ran
- Luzhou Laojiao Co., Ltd, Luzhou, 646600, China.
| | - Zhengyun Wu
- College of Biomass Science and Engineering, Sichuan University, Chengdu, 610065, China.
| | - Wenxue Zhang
- College of Biomass Science and Engineering, Sichuan University, Chengdu, 610065, China; School of Liquor-Brewing Engineering, Sichuan University of Jinjiang College, Meishan, 620860, China.
| |
|
39
|
Finnigan W, Lubberink M, Hepworth LJ, Citoler J, Mattey AP, Ford GJ, Sangster J, Cosgrove SC, da Costa BZ, Heath RS, Thorpe TW, Yu Y, Flitsch SL, Turner NJ. RetroBioCat Database: A Platform for Collaborative Curation and Automated Meta-Analysis of Biocatalysis Data. ACS Catal 2023; 13:11771-11780. [PMID: 37671181 PMCID: PMC10476152 DOI: 10.1021/acscatal.3c01418] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Received: 03/29/2023] [Revised: 06/26/2023] [Indexed: 09/07/2023]
Abstract
Despite the increasing use of biocatalysis for organic synthesis, there are currently no databases that adequately capture synthetic biotransformations. The lack of a biocatalysis database prevents accelerating biocatalyst characterization efforts from being leveraged to quickly identify candidate enzymes for reactions or cascades, slowing their development. The RetroBioCat Database (available at retrobiocat.com) addresses this gap by capturing information on synthetic biotransformations and providing an analysis platform that allows biocatalysis data to be searched and explored through a range of highly interactive data visualization tools. This database makes it simple to explore available enzymes, their substrate scopes, and how characterized enzymes are related to each other and the wider sequence space. Data entry is facilitated through an openly accessible curation platform, featuring automated tools to accelerate the process. The RetroBioCat Database democratizes biocatalysis knowledge and has the potential to accelerate biocatalytic reaction development, making it a valuable resource for the community.
Affiliation(s)
- William Finnigan
- Department of Chemistry, Manchester Institute of Biotechnology, University of Manchester, 131 Princess Street, Manchester M1 7DN, U.K.
| | | | - Lorna J. Hepworth
- Department of Chemistry, Manchester Institute of Biotechnology, University of Manchester, 131 Princess Street, Manchester M1 7DN, U.K.
| | - Joan Citoler
- Department of Chemistry, Manchester Institute of Biotechnology, University of Manchester, 131 Princess Street, Manchester M1 7DN, U.K.
| | - Ashley P. Mattey
- Department of Chemistry, Manchester Institute of Biotechnology, University of Manchester, 131 Princess Street, Manchester M1 7DN, U.K.
| | - Grayson J. Ford
- Department of Chemistry, Manchester Institute of Biotechnology, University of Manchester, 131 Princess Street, Manchester M1 7DN, U.K.
| | - Jack Sangster
- Department of Chemistry, Manchester Institute of Biotechnology, University of Manchester, 131 Princess Street, Manchester M1 7DN, U.K.
| | | | - Bruna Zucoloto da Costa
- Department of Chemistry, Manchester Institute of Biotechnology, University of Manchester, 131 Princess Street, Manchester M1 7DN, U.K.
| | - Rachel S. Heath
- Department of Chemistry, Manchester Institute of Biotechnology, University of Manchester, 131 Princess Street, Manchester M1 7DN, U.K.
| | | | - Yuqi Yu
- Department of Chemistry, Manchester Institute of Biotechnology, University of Manchester, 131 Princess Street, Manchester M1 7DN, U.K.
| | - Sabine L. Flitsch
- Department of Chemistry, Manchester Institute of Biotechnology, University of Manchester, 131 Princess Street, Manchester M1 7DN, U.K.
| | - Nicholas J. Turner
- Department of Chemistry, Manchester Institute of Biotechnology, University of Manchester, 131 Princess Street, Manchester M1 7DN, U.K.
| |
|
40
|
Han SY, Kim WY, Kim JS, Hwang I. Comparative transcriptomics reveals the role of altered energy metabolism in the establishment of single-cell C4 photosynthesis in Bienertia sinuspersici. Front Plant Sci 2023; 14:1202521. [PMID: 37476170 PMCID: PMC10354284 DOI: 10.3389/fpls.2023.1202521] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 04/08/2023] [Accepted: 05/31/2023] [Indexed: 07/22/2023]
Abstract
Single-cell C4 photosynthesis (SCC4) in terrestrial plants without Kranz anatomy involves three steps: initial CO2 fixation in the cytosol, CO2 release in mitochondria, and a second CO2 fixation in central chloroplasts. Here, we investigated how the large number of mechanisms underlying these processes, which occur in three different compartments, are orchestrated in a coordinated manner to establish the C4 pathway in Bienertia sinuspersici, a SCC4 plant. Leaves were subjected to transcriptome analysis at three different developmental stages. Functional enrichment analysis revealed that SCC4 cycle genes are coexpressed with genes regulating cyclic electron flow and amino/organic acid metabolism, two key processes required for the production of energy molecules in C3 plants. Comparative gene expression profiling of B. sinuspersici and three other species (Suaeda aralocaspica, Amaranthus hypochondriacus, and Arabidopsis thaliana) showed that the direction of metabolic flux was determined via an alteration in energy supply in peripheral chloroplasts and mitochondria via regulation of gene expression in the direction of the C4 cycle. Based on these results, we propose that the redox homeostasis of energy molecules via energy metabolism regulation is key to the establishment of the SCC4 pathway in B. sinuspersici.
Affiliation(s)
- Sang-Yun Han
- Department of Life Sciences, Pohang University of Science and Technology, Pohang, Republic of Korea
| | - Woe-Yeon Kim
- Division of Applied Life Science (BK21+) and Research Institute of Life Science, Institute of Agriculture and Life Sciences, Gyeongsang National University, Jinju, Republic of Korea
| | - Jung Sun Kim
- Genomic Division, Department of Agricultural Bio-Resources, National Institute of Agricultural Sciences, Rural Development Administration, Jeonju, Republic of Korea
| | - Inhwan Hwang
- Department of Life Sciences, Pohang University of Science and Technology, Pohang, Republic of Korea
| |
|
41
|
Fakih I, Got J, Robles-Rodriguez CE, Siegel A, Forano E, Muñoz-Tamayo R. Dynamic genome-based metabolic modeling of the predominant cellulolytic rumen bacterium Fibrobacter succinogenes S85. mSystems 2023; 8:e0102722. [PMID: 37289026 PMCID: PMC10308913 DOI: 10.1128/msystems.01027-22] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/24/2022] [Accepted: 03/14/2023] [Indexed: 06/09/2023] Open
Abstract
Fibrobacter succinogenes is a cellulolytic bacterium that plays an essential role in the degradation of plant fibers in the rumen ecosystem. It converts cellulose polymers into intracellular glycogen and the fermentation metabolites succinate, acetate, and formate. We developed dynamic models of F. succinogenes S85 metabolism on glucose, cellobiose, and cellulose on the basis of a network reconstruction done with the automatic reconstruction of metabolic model workspace. The reconstruction was based on genome annotation, five template-based orthology methods, gap filling, and manual curation. The metabolic network of F. succinogenes S85 comprises 1,565 reactions with 77% linked to 1,317 genes, 1,586 unique metabolites, and 931 pathways. The network was reduced using the NetRed algorithm and analyzed for the computation of elementary flux modes. A yield analysis was further performed to select a minimal set of macroscopic reactions for each substrate. The accuracy of the models was acceptable in simulating F. succinogenes carbohydrate metabolism with an average coefficient of variation of the root mean squared error of 19%. The resulting models are useful resources for investigating the metabolic capabilities of F. succinogenes S85, including the dynamics of metabolite production. Such an approach is a key step toward the integration of omics microbial information into predictive models of rumen metabolism.
IMPORTANCE: F. succinogenes S85 is a cellulose-degrading and succinate-producing bacterium. Such functions are central for the rumen ecosystem and are of special interest for several industrial applications. This work illustrates how information of the genome of F. succinogenes can be translated to develop predictive dynamic models of rumen fermentation processes. We expect this approach can be applied to other rumen microbes for producing a model of rumen microbiome that can be used for studying microbial manipulation strategies aimed at enhancing feed utilization and mitigating enteric emissions.
Affiliation(s)
- Ibrahim Fakih
- Université Clermont Auvergne, INRAE, UMR454 Microbiologie Environnement Digestif et Santé, 63000 Clermont-Ferrand, France
- Université Paris-Saclay, INRAE, AgroParisTech, UMR Modélisation Systémique Appliquée aux Ruminants, 91120 Palaiseau, France
| | - Jeanne Got
- Université Rennes, Inria, CNRS, IRISA, Dyliss team, 35042 Rennes, France
| | | | - Anne Siegel
- Université Rennes, Inria, CNRS, IRISA, Dyliss team, 35042 Rennes, France
| | - Evelyne Forano
- Université Clermont Auvergne, INRAE, UMR454 Microbiologie Environnement Digestif et Santé, 63000 Clermont-Ferrand, France
| | - Rafael Muñoz-Tamayo
- Université Paris-Saclay, INRAE, AgroParisTech, UMR Modélisation Systémique Appliquée aux Ruminants, 91120 Palaiseau, France
| |
|
42
|
Galgonek J, Vondrášek J. A comparison of approaches to accessing existing biological and chemical relational databases via SPARQL. J Cheminform 2023; 15:61. [PMID: 37340506 DOI: 10.1186/s13321-023-00729-5] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/15/2022] [Accepted: 05/30/2023] [Indexed: 06/22/2023] Open
Abstract
Current biological and chemical research is increasingly dependent on the reusability of previously acquired data, which typically come from various sources. Consequently, there is a growing need for database systems and databases stored in them to be interoperable with each other. One of the possible solutions to address this issue is to use systems based on Semantic Web technologies, namely on the Resource Description Framework (RDF) to express data and on the SPARQL query language to retrieve the data. Many existing biological and chemical databases are stored in the form of a relational database (RDB). Converting a relational database into the RDF form and storing it in a native RDF database system may not be desirable in many cases. It may be necessary to preserve the original database form, and having two versions of the same data may not be convenient. A solution may be to use a system mapping the relational database to the RDF form. Such a system keeps data in their original relational form and translates incoming SPARQL queries to equivalent SQL queries, which are evaluated by a relational-database system. This review compares different RDB-to-RDF mapping systems with a primary focus on those that can be used free of charge. In addition, it compares different approaches to expressing RDB-to-RDF mappings. The review shows that these systems represent a viable method providing sufficient performance. Their real-life performance is demonstrated on data and queries coming from the neXtProt project.
Affiliation(s)
- Jakub Galgonek
- Institute of Organic Chemistry and Biochemistry of the CAS, Flemingovo náměstí 2, 166 10, Prague 6, Czech Republic.
| | - Jiří Vondrášek
- Institute of Organic Chemistry and Biochemistry of the CAS, Flemingovo náměstí 2, 166 10, Prague 6, Czech Republic
| |
|
43
|
Sankaranarayanan K, Jensen KF. Computer-assisted multistep chemoenzymatic retrosynthesis using a chemical synthesis planner. Chem Sci 2023; 14:6467-6475. [PMID: 37325140 PMCID: PMC10266459 DOI: 10.1039/d3sc01355c] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/14/2023] [Accepted: 05/17/2023] [Indexed: 06/17/2023] Open
Abstract
Chemoenzymatic synthesis methods use organic and enzyme chemistry to synthesize a desired small molecule. Complementing organic synthesis with enzyme-catalyzed selective transformations under mild conditions enables more sustainable and synthetically efficient chemical manufacturing. Here, we present a multistep retrosynthesis search algorithm to facilitate chemoenzymatic synthesis of pharmaceutical compounds, specialty chemicals, commodity chemicals, and monomers. First, we employ the synthesis planner ASKCOS to plan multistep syntheses starting from commercially available materials. Then, we identify transformations that can be catalyzed by enzymes using a small database of biocatalytic reaction rules previously curated for RetroBioCat, a computer-aided synthesis planning tool for biocatalytic cascades. Enzymatic suggestions captured by the approach include ones capable of reducing the number of synthetic steps. We successfully plan chemoenzymatic routes for active pharmaceutical ingredients or their intermediates (e.g., Sitagliptin, Rivastigmine, and Ephedrine), commodity chemicals (e.g., acrylamide and glycolic acid), and specialty chemicals (e.g., S-Metolachlor and Vanillin), in a retrospective fashion. In addition to recovering published routes, the algorithm proposes many sensible alternative pathways. Our approach provides a chemoenzymatic synthesis planning strategy by identifying synthetic transformations that could be candidates for enzyme catalysis.
Affiliation(s)
- Karthik Sankaranarayanan
- Department of Chemical Engineering, Massachusetts Institute of Technology, 77 Massachusetts Avenue, Cambridge, Massachusetts 02139, USA
| | - Klavs F Jensen
- Department of Chemical Engineering, Massachusetts Institute of Technology 77 Massachusetts Avenue Cambridge Massachusetts 02139 USA
| |
|
44
|
Chen N, Zhang R, Zeng T, Zhang X, Wu R. Developing TeroENZ and TeroMAP modules for the terpenome research platform TeroKit. Database (Oxford) 2023; 2023:7173549. [PMID: 37207351 PMCID: PMC10380177 DOI: 10.1093/database/baad020] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 01/24/2023] [Revised: 02/19/2023] [Accepted: 03/17/2023] [Indexed: 05/21/2023]
Abstract
Terpenoids and their derivatives are collectively known as the terpenome and are the largest class of natural products, whose biosynthesis refers to various kinds of enzymes. To date, there is no terpenome-related enzyme database, which is a desire for enzyme mining, metabolic engineering and discovery of new natural products related to terpenoids. In this work, we have constructed a comprehensive database called TeroENZ (http://terokit.qmclab.com/browse_enz.html) containing 13 462 enzymes involved in the terpenoid biosynthetic pathway, covering 2541 species and 4293 reactions reported in the literature and public databases. At the same time, we classify enzymes according to their catalytic reactions into cyclase, oxidoreductase, transferase, and so on, and also make a classification according to species. This meticulous classification is beneficial for users as it can be retrieved and downloaded conveniently. We also provide a computational module for isozyme prediction. Moreover, a module named TeroMAP (http://terokit.qmclab.com/browse_rxn.html) is also constructed to organize all available terpenoid enzymatic reactions into an interactive network by interfacing with the previously established database of terpenoid compounds, TeroMOL. Finally, all these databases and modules are integrated into the web server TeroKit (http://terokit.qmclab.com/) to shed light on the field of terpenoid research. Database URL http://terokit.qmclab.com/.
Affiliation(s)
- Nianhang Chen
- School of Pharmaceutical Sciences, Sun Yat-sen University, Guangzhou 510006, China
| | - Rong Zhang
- School of Pharmaceutical Sciences, Sun Yat-sen University, Guangzhou 510006, China
| | - Tao Zeng
- School of Pharmaceutical Sciences, Sun Yat-sen University, Guangzhou 510006, China
| | - Xuting Zhang
- School of Pharmaceutical Sciences, Sun Yat-sen University, Guangzhou 510006, China
| | - Ruibo Wu
- School of Pharmaceutical Sciences, Sun Yat-sen University, Guangzhou 510006, China
| |
|
45
|
Kroll A, Ranjan S, Engqvist MKM, Lercher MJ. A general model to predict small molecule substrates of enzymes based on machine and deep learning. Nat Commun 2023; 14:2787. [PMID: 37188731 DOI: 10.1038/s41467-023-38347-2] [Citation(s) in RCA: 16] [Impact Index Per Article: 16.0] [Received: 06/24/2022] [Accepted: 04/21/2023] [Indexed: 05/17/2023] Open
Abstract
For most proteins annotated as enzymes, it is unknown which primary and/or secondary reactions they catalyze. Experimental characterizations of potential substrates are time-consuming and costly. Machine learning predictions could provide an efficient alternative, but are hampered by a lack of information regarding enzyme non-substrates, as available training data comprises mainly positive examples. Here, we present ESP, a general machine-learning model for the prediction of enzyme-substrate pairs with an accuracy of over 91% on independent and diverse test data. ESP can be applied successfully across widely different enzymes and a broad range of metabolites included in the training data, outperforming models designed for individual, well-studied enzyme families. ESP represents enzymes through a modified transformer model, and is trained on data augmented with randomly sampled small molecules assigned as non-substrates. By facilitating easy in silico testing of potential substrates, the ESP web server may support both basic and applied science.
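The augmentation idea in this abstract (adding randomly sampled small molecules as assumed non-substrates) can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the function name `augment_with_negatives`, the uniform sampling, and the toy identifiers are hypothetical and are not taken from the ESP code base, which may select negatives differently.

```python
import random


def augment_with_negatives(positives, molecule_pool, n_neg_per_pos=1, seed=0):
    """Return (enzyme, molecule, label) triples with sampled negatives.

    positives: list of (enzyme_id, molecule_id) pairs known to be substrates.
    molecule_pool: iterable of molecule ids to draw assumed non-substrates from.
    Label 1 marks a known pair, label 0 marks a sampled (assumed) non-substrate.
    """
    rng = random.Random(seed)
    known = set(positives)
    pool = sorted(set(molecule_pool))  # sorted for reproducible sampling
    labeled = [(enzyme, mol, 1) for enzyme, mol in positives]
    for enzyme, _ in positives:
        # Never label a known substrate of this enzyme as a negative.
        candidates = [m for m in pool if (enzyme, m) not in known]
        k = min(n_neg_per_pos, len(candidates))
        for mol in rng.sample(candidates, k):
            labeled.append((enzyme, mol, 0))
    return labeled


# Toy data with hypothetical identifiers.
toy_positives = [("E1", "A"), ("E2", "B")]
toy_pool = ["A", "B", "C"]
toy_dataset = augment_with_negatives(toy_positives, toy_pool)
```

Sampled negatives are only presumed non-substrates, so some label noise is expected; an enzyme with several known substrates may also receive the same negative more than once in this simple version.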
Affiliation(s)
- Alexander Kroll
- Institute for Computer Science and Department of Biology, Heinrich Heine University, D-40225, Düsseldorf, Germany
| | - Sahasra Ranjan
- Department of Computer Science and Engineering, Indian Institute of Technology Bombay, Powai, Mumbai, 400076, India
| | - Martin K M Engqvist
- Department of Biology and Bioengineering, Chalmers University of Technology, SE-412 96, Gothenburg, Sweden
- EnginZyme AB, Tomtebodevägen 6, 17165, Stockholm, Sweden
| | - Martin J Lercher
- Institute for Computer Science and Department of Biology, Heinrich Heine University, D-40225, Düsseldorf, Germany
| |
|
46
|
Aleksander SA, Balhoff J, Carbon S, Cherry JM, Drabkin HJ, Ebert D, Feuermann M, Gaudet P, Harris NL, Hill DP, Lee R, Mi H, Moxon S, Mungall CJ, Muruganugan A, Mushayahama T, Sternberg PW, Thomas PD, Van Auken K, Ramsey J, Siegele DA, Chisholm RL, Fey P, Aspromonte MC, Nugnes MV, Quaglia F, Tosatto S, Giglio M, Nadendla S, Antonazzo G, Attrill H, Dos Santos G, Marygold S, Strelets V, Tabone CJ, Thurmond J, Zhou P, Ahmed SH, Asanitthong P, Luna Buitrago D, Erdol MN, Gage MC, Ali Kadhum M, Li KYC, Long M, Michalak A, Pesala A, Pritazahra A, Saverimuttu SCC, Su R, Thurlow KE, Lovering RC, Logie C, Oliferenko S, Blake J, Christie K, Corbani L, Dolan ME, Drabkin HJ, Hill DP, Ni L, Sitnikov D, Smith C, Cuzick A, Seager J, Cooper L, Elser J, Jaiswal P, Gupta P, Jaiswal P, Naithani S, Lera-Ramirez M, Rutherford K, Wood V, De Pons JL, Dwinell MR, Hayman GT, Kaldunski ML, Kwitek AE, Laulederkind SJF, Tutaj MA, Vedi M, Wang SJ, D'Eustachio P, Aimo L, Axelsen K, Bridge A, Hyka-Nouspikel N, Morgat A, Aleksander SA, Cherry JM, Engel SR, Karra K, Miyasato SR, Nash RS, Skrzypek MS, Weng S, Wong ED, Bakker E, Berardini TZ, Reiser L, Auchincloss A, Axelsen K, Argoud-Puy G, Blatter MC, Boutet E, Breuza L, Bridge A, Casals-Casas C, Coudert E, Estreicher A, Livia Famiglietti M, Feuermann M, Gos A, Gruaz-Gumowski N, Hulo C, Hyka-Nouspikel N, Jungo F, Le Mercier P, Lieberherr D, Masson P, Morgat A, Pedruzzi I, Pourcel L, Poux S, Rivoire C, Sundaram S, Bateman A, Bowler-Barnett E, Bye-A-Jee H, Denny P, Ignatchenko A, Ishtiaq R, Lock A, Lussi Y, Magrane M, Martin MJ, Orchard S, Raposo P, Speretta E, Tyagi N, Warner K, Zaru R, Diehl AD, Lee R, Chan J, Diamantakis S, Raciti D, Zarowiecki M, Fisher M, James-Zorn C, Ponferrada V, Zorn A, Ramachandran S, Ruzicka L, Westerfield M. The Gene Ontology knowledgebase in 2023. Genetics 2023; 224:iyad031. 
[PMID: 36866529 PMCID: PMC10158837 DOI: 10.1093/genetics/iyad031] [Citation(s) in RCA: 389] [Impact Index Per Article: 389.0] [Received: 12/13/2022] [Revised: 02/10/2023] [Accepted: 02/11/2023] [Indexed: 03/04/2023] Open
Abstract
The Gene Ontology (GO) knowledgebase (http://geneontology.org) is a comprehensive resource concerning the functions of genes and gene products (proteins and noncoding RNAs). GO annotations cover genes from organisms across the tree of life as well as viruses, though most gene function knowledge currently derives from experiments carried out in a relatively small number of model organisms. Here, we provide an updated overview of the GO knowledgebase, as well as the efforts of the broad, international consortium of scientists that develops, maintains, and updates the GO knowledgebase. The GO knowledgebase consists of three components: (1) the GO, a computational knowledge structure describing the functional characteristics of genes; (2) GO annotations, evidence-supported statements asserting that a specific gene product has a particular functional characteristic; and (3) GO Causal Activity Models (GO-CAMs), mechanistic models of molecular "pathways" (GO biological processes) created by linking multiple GO annotations using defined relations. Each of these components is continually expanded, revised, and updated in response to newly published discoveries and receives extensive QA checks, reviews, and user feedback. For each of these components, we provide a description of the current contents, recent developments to keep the knowledgebase up to date with new discoveries, and guidance on how users can best make use of the data that we provide. We conclude with future directions for the project.
|
47
|
Mogilenko DA, Sergushichev A, Artyomov MN. Systems Immunology Approaches to Metabolism. Annu Rev Immunol 2023; 41:317-342. [PMID: 37126419 DOI: 10.1146/annurev-immunol-101220-031513] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 05/02/2023]
Abstract
Over the last decade, immunometabolism has emerged as a novel interdisciplinary field of research and yielded significant fundamental insights into the regulation of immune responses. Multiple classical approaches to interrogate immunometabolism, including bulk metabolic profiling and analysis of metabolic regulator expression, paved the way to appreciating the physiological complexity of immunometabolic regulation in vivo. Studying immunometabolism at the systems level raised the need to transition towards the next-generation technology for metabolic profiling and analysis. Spatially resolved metabolic imaging and computational algorithms for multi-modal data integration are new approaches to connecting metabolism and immunity. In this review, we discuss recent studies that highlight the complex physiological interplay between immune responses and metabolism and give an overview of technological developments that bear the promise of capturing this complexity most directly and comprehensively.
Affiliation(s)
- Denis A Mogilenko
- Department of Pathology and Immunology, Washington University School of Medicine, St. Louis, Missouri, USA
- Current affiliation: Department of Medicine, Department of Pathology, Microbiology, and Immunology, and Vanderbilt Center for Immunobiology, Vanderbilt University Medical Center, Nashville, Tennessee, USA
| | - Alexey Sergushichev
- Department of Pathology and Immunology, Washington University School of Medicine, St. Louis, Missouri, USA
- Computer Technologies Laboratory, ITMO University, Saint Petersburg, Russia
| | - Maxim N Artyomov
- Department of Pathology and Immunology, Washington University School of Medicine, St. Louis, Missouri, USA
| |
|
48
|
Bremer E, Calteau A, Danchin A, Harwood C, Helmann JD, Médigue C, Palsson BO, Sekowska A, Vallenet D, Zuniga A, Zuniga C. A model industrial workhorse: Bacillus subtilis strain 168 and its genome after a quarter of a century. Microb Biotechnol 2023; 16:1203-1231. [PMID: 37002859 DOI: 10.1111/1751-7915.14257] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/01/2023] [Accepted: 03/20/2023] [Indexed: 04/04/2023] Open
Abstract
The vast majority of genomic sequences are automatically annotated using various software programs. The accuracy of these annotations depends heavily on the very few manual annotation efforts that combine verified experimental data with genomic sequences from model organisms. Here, we summarize the updated functional annotation of Bacillus subtilis strain 168, a quarter century after its genome sequence was first made public. Since the last such effort 5 years ago, 1168 genetic functions have been updated, allowing the construction of a new metabolic model of this organism of environmental and industrial interest. The emphasis in this review is on new metabolic insights, the role of metals in metabolism and macromolecule biosynthesis, functions involved in biofilm formation, features controlling cell growth, and finally, protein agents that allow class discrimination, thus allowing maintenance management, and accuracy of all cell processes. New 'genomic objects' and an extensive updated literature review have been included for the sequence, now available at the International Nucleotide Sequence Database Collaboration (INSDC: AccNum AL009126.4).
Affiliation(s)
- Erhard Bremer
- Department of Biology, Laboratory for Microbiology and Center for Synthetic Microbiology (SYNMIKRO) Philipps‐University Marburg Marburg Germany
| | - Alexandra Calteau
- LABGeM, Génomique Métabolique, CEA, Genoscope, Institut de Biologie François Jacob Université d'Évry, Université Paris‐Saclay, CNRS Évry France
| | - Antoine Danchin
- School of Biomedical Sciences, Li KaShing Faculty of Medicine Hong Kong University Pokfulam SAR Hong Kong China
| | - Colin Harwood
- Centre for Bacterial Cell Biology, Biosciences Institute Newcastle University Baddiley Clark Building Newcastle upon Tyne UK
| | - John D. Helmann
- Department of Microbiology Cornell University Ithaca New York USA
| | - Claudine Médigue
- LABGeM, Génomique Métabolique, CEA, Genoscope, Institut de Biologie François Jacob Université d'Évry, Université Paris‐Saclay, CNRS Évry France
| | - Bernhard O. Palsson
- Department of Bioengineering University of California San Diego La Jolla USA
| | | | - David Vallenet
- LABGeM, Génomique Métabolique, CEA, Genoscope, Institut de Biologie François Jacob Université d'Évry, Université Paris‐Saclay, CNRS Évry France
| | - Abril Zuniga
- Department of Biology San Diego State University San Diego California USA
| | - Cristal Zuniga
- Bioinformatics and Medical Informatics Graduate Program San Diego State University San Diego California USA
| |
|
49
|
Rothfels K, Milacic M, Matthews L, Haw R, Sevilla C, Gillespie M, Stephan R, Gong C, Ragueneau E, May B, Shamovsky V, Wright A, Weiser J, Beavers D, Conley P, Tiwari K, Jassal B, Griss J, Senff-Ribeiro A, Brunson T, Petryszak R, Hermjakob H, D'Eustachio P, Wu G, Stein L. Using the Reactome Database. Curr Protoc 2023; 3:e722. [PMID: 37053306 PMCID: PMC11184634 DOI: 10.1002/cpz1.722] [Citation(s) in RCA: 3] [Impact Index Per Article: 3.0] [Indexed: 04/15/2023]
Abstract
Pathway databases provide descriptions of the roles of proteins, nucleic acids, lipids, carbohydrates, and other molecular entities within their biological cellular contexts. Pathway-centric views of these roles may allow for the discovery of unexpected functional relationships in data such as gene expression profiles and somatic mutation catalogues from tumor cells. For this reason, there is a high demand for high-quality pathway databases and their associated tools. The Reactome project (a collaboration between the Ontario Institute for Cancer Research, New York University Langone Health, the European Bioinformatics Institute, and Oregon Health & Science University) is one such pathway database. Reactome collects detailed information on biological pathways and processes in humans from the primary literature. Reactome content is manually curated, expert-authored, and peer-reviewed and spans the gamut from simple intermediate metabolism to signaling pathways and complex cellular events. This information is supplemented with likely orthologous molecular reactions in mouse, rat, zebrafish, worm, and other model organisms. © 2023 The Authors. Current Protocols published by Wiley Periodicals LLC.
- Basic Protocol 1: Browsing a Reactome pathway
- Basic Protocol 2: Exploring Reactome annotations of disease and drugs
- Basic Protocol 3: Finding the pathways involving a gene or protein
- Alternate Protocol 1: Finding the pathways involving a gene or protein using UniProtKB (SwissProt), Ensembl, or Entrez gene identifier
- Alternate Protocol 2: Using advanced search
- Basic Protocol 4: Using the Reactome pathway analysis tool to identify statistically overrepresented pathways
- Basic Protocol 5: Using the Reactome pathway analysis tool to overlay expression data onto Reactome pathway diagrams
- Basic Protocol 6: Comparing inferred model organism and human pathways using the Species Comparison tool
- Basic Protocol 7: Comparing tissue-specific expression using the Tissue Distribution tool
Affiliation(s)
- Karen Rothfels
- Ontario Institute for Cancer Research, Toronto, Ontario, Canada
| | - Marija Milacic
- Ontario Institute for Cancer Research, Toronto, Ontario, Canada
| | | | - Robin Haw
- Ontario Institute for Cancer Research, Toronto, Ontario, Canada
| | - Cristoffer Sevilla
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, Cambridgeshire, UK
| | - Marc Gillespie
- Ontario Institute for Cancer Research, Toronto, Ontario, Canada
- College of Pharmacy and Health Sciences, St. John's University, Queens, New York
| | - Ralf Stephan
- Ontario Institute for Cancer Research, Toronto, Ontario, Canada
| | - Chuqiao Gong
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, Cambridgeshire, UK
| | - Eliot Ragueneau
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, Cambridgeshire, UK
| | - Bruce May
- Ontario Institute for Cancer Research, Toronto, Ontario, Canada
| | | | - Adam Wright
- Ontario Institute for Cancer Research, Toronto, Ontario, Canada
| | - Joel Weiser
- Ontario Institute for Cancer Research, Toronto, Ontario, Canada
| | | | | | - Krishna Tiwari
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, Cambridgeshire, UK
| | - Bijay Jassal
- Ontario Institute for Cancer Research, Toronto, Ontario, Canada
| | - Johannes Griss
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, Cambridgeshire, UK
- Department of Dermatology, Medical University of Vienna, Vienna, Austria
| | - Andrea Senff-Ribeiro
- Ontario Institute for Cancer Research, Toronto, Ontario, Canada
- Universidade Federal do Paraná, Curitiba, Brazil
| | | | - Robert Petryszak
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, Cambridgeshire, UK
- Oregon Health and Science University, Portland, Oregon
| | - Henning Hermjakob
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, Cambridgeshire, UK
| | | | - Guanming Wu
- Oregon Health and Science University, Portland, Oregon
| | - Lincoln Stein
- Ontario Institute for Cancer Research, Toronto, Ontario, Canada
- Department of Molecular Genetics, University of Toronto, Toronto, Ontario, Canada
| |
|
50
|
Rappoport D, Jinich A. Enzyme Substrate Prediction from Three-Dimensional Feature Representations Using Space-Filling Curves. J Chem Inf Model 2023; 63:1637-1648. [PMID: 36802628 DOI: 10.1021/acs.jcim.3c00005] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 02/22/2023]
Abstract
Compact and interpretable structural feature representations are required for accurately predicting properties and function of proteins. In this work, we construct and evaluate three-dimensional feature representations of protein structures based on space-filling curves (SFCs). We focus on the problem of enzyme substrate prediction, using two ubiquitous enzyme families as case studies: the short-chain dehydrogenase/reductases (SDRs) and the S-adenosylmethionine-dependent methyltransferases (SAM-MTases). Space-filling curves such as the Hilbert curve and the Morton curve generate a reversible mapping from discretized three-dimensional to one-dimensional representations and thus help to encode three-dimensional molecular structures in a system-independent way and with only a few adjustable parameters. Using three-dimensional structures of SDRs and SAM-MTases generated using AlphaFold2, we assess the performance of the SFC-based feature representations in predictions on a new benchmark database of enzyme classification tasks including their cofactor and substrate selectivity. Gradient-boosted tree classifiers yield binary prediction accuracy of 0.77-0.91 and area under curve (AUC) characteristics of 0.83-0.92 for the classification tasks. We investigate the effects of amino acid encoding, spatial orientation, and (the few) parameters of SFC-based encodings on the accuracy of the predictions. Our results suggest that geometry-based approaches such as SFCs are promising for generating protein structural representations and are complementary to the existing protein feature representations such as evolutionary scale modeling (ESM) sequence embeddings.
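As an illustration of the reversible three-dimensional to one-dimensional mapping mentioned in this abstract, the following is a minimal sketch of a Morton (Z-order) curve. The function names `voxel_index`, `morton_encode`, and `morton_decode`, the 10-bit grid, and the 1.0-unit cell size are assumptions chosen for the example; they are not taken from the paper, whose discretization and encoding details may differ.

```python
def voxel_index(coord, origin, cell):
    """Map a 3D coordinate to integer grid indices (assumes coord >= origin)."""
    return tuple(int((c - o) // cell) for c, o in zip(coord, origin))


def morton_encode(x, y, z, bits=10):
    """Interleave the low `bits` bits of x, y, z into one Morton code."""
    code = 0
    for i in range(bits):
        code |= ((x >> i) & 1) << (3 * i)      # x bit goes to position 3i
        code |= ((y >> i) & 1) << (3 * i + 1)  # y bit goes to position 3i + 1
        code |= ((z >> i) & 1) << (3 * i + 2)  # z bit goes to position 3i + 2
    return code


def morton_decode(code, bits=10):
    """Invert morton_encode, recovering the (x, y, z) grid indices."""
    x = y = z = 0
    for i in range(bits):
        x |= ((code >> (3 * i)) & 1) << i
        y |= ((code >> (3 * i + 1)) & 1) << i
        z |= ((code >> (3 * i + 2)) & 1) << i
    return x, y, z


# Hypothetical atom position, discretized on a grid with unit cell size.
cell_indices = voxel_index((3.5, 0.2, 7.9), origin=(0.0, 0.0, 0.0), cell=1.0)
position_on_curve = morton_encode(*cell_indices)
recovered = morton_decode(position_on_curve)
```

Sorting occupied voxels by their Morton code yields a one-dimensional ordering in which nearby codes tend to correspond to nearby voxels, which is the property that makes such curves useful for building sequence-like features from structures.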
Affiliation(s)
- Dmitrij Rappoport
- Department of Chemistry, University of California, Irvine, 1102 Natural Sciences 2, Irvine, California 92697, United States
| | - Adrian Jinich
- Weill Cornell Medicine, 1300 York Avenue, Box 65, New York, New York 10065, United States
| |
|