1
Merz N, Schilling K, Thomas D, Hahnefeld L, Grösch S. Cell Labeling with 15-YNE Is Useful for Tracking Protein Palmitoylation and Metabolic Lipid Flux in the Same Sample. Molecules 2025; 30:377. [PMID: 39860245 PMCID: PMC11767944 DOI: 10.3390/molecules30020377] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/17/2024] [Revised: 01/13/2025] [Accepted: 01/15/2025] [Indexed: 01/27/2025] Open
Abstract
Protein S-palmitoylation is the process by which a palmitoyl fatty acid is attached to a cysteine residue of a protein via a thioester bond. A range of methodologies is available for the detection of protein S-palmitoylation. In this study, two methods for detecting the S-palmitoylation of different proteins were compared after metabolic labeling of cells with 15-hexadecynoic acid (15-YNE) to ascertain their relative usefulness. It was hypothesized that labeling cells with a traceable lipid would affect lipid metabolism and the cellular lipidome. We therefore developed a method to track 15-YNE incorporation into lipids using liquid chromatography high-resolution mass spectrometry (LC-HRMS) as well as protein palmitoylation in the same sample. We observed a time- and concentration-dependent S-palmitoylation of calnexin and succinate dehydrogenase complex flavoprotein subunit A (SDHA) depending on the cell type. Detecting S-palmitoylation with a clickable fluorophore and with biotin azide followed by immunoprecipitation proved equally useful. 15-YNE was incorporated into a wide array of lipid classes during the process, yet it did not appear to modify the overall lipid composition of the cells. In conclusion, we show that 15-YNE is a useful tracer to detect both protein S-palmitoylation and lipid metabolism in the same sample.
Affiliation(s)
- Nadine Merz
  - Goethe University Frankfurt, Institute of Clinical Pharmacology, Faculty of Medicine, Theodor Stern Kai 7, 60590 Frankfurt am Main, Germany
- Karin Schilling
  - Goethe University Frankfurt, Institute of Clinical Pharmacology, Faculty of Medicine, Theodor Stern Kai 7, 60590 Frankfurt am Main, Germany
- Dominique Thomas
  - Goethe University Frankfurt, Institute of Clinical Pharmacology, Faculty of Medicine, Theodor Stern Kai 7, 60590 Frankfurt am Main, Germany
  - Fraunhofer Institute for Translational Medicine and Pharmacology ITMP, Theodor-Stern-Kai 7, 60596 Frankfurt am Main, Germany
  - Fraunhofer Cluster of Excellence for Immune-Mediated Diseases CIMD, Theodor-Stern-Kai 7, 60596 Frankfurt am Main, Germany
- Lisa Hahnefeld
  - Goethe University Frankfurt, Institute of Clinical Pharmacology, Faculty of Medicine, Theodor Stern Kai 7, 60590 Frankfurt am Main, Germany
  - Fraunhofer Institute for Translational Medicine and Pharmacology ITMP, Theodor-Stern-Kai 7, 60596 Frankfurt am Main, Germany
  - Fraunhofer Cluster of Excellence for Immune-Mediated Diseases CIMD, Theodor-Stern-Kai 7, 60596 Frankfurt am Main, Germany
- Sabine Grösch
  - Goethe University Frankfurt, Institute of Clinical Pharmacology, Faculty of Medicine, Theodor Stern Kai 7, 60590 Frankfurt am Main, Germany
  - Fraunhofer Institute for Translational Medicine and Pharmacology ITMP, Theodor-Stern-Kai 7, 60596 Frankfurt am Main, Germany
2
Arora C, Matic M, Bisceglia L, Di Chiaro P, De Oliveira Rosa N, Carli F, Clubb L, Nemati Fard LA, Kargas G, Diaferia GR, Vukotic R, Licata L, Wu G, Natoli G, Gutkind JS, Raimondi F. The landscape of cancer-rewired GPCR signaling axes. Cell Genomics 2024; 4:100557. [PMID: 38723607 PMCID: PMC11099383 DOI: 10.1016/j.xgen.2024.100557] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/05/2023] [Revised: 02/17/2024] [Accepted: 04/10/2024] [Indexed: 05/15/2024]
Abstract
We explored the dysregulation of G-protein-coupled receptor (GPCR) ligand systems in cancer transcriptomics datasets to uncover new therapeutic opportunities in oncology. We derived an interaction network of receptors with ligands and their biosynthetic enzymes. Multiple GPCRs are differentially regulated together with their upstream partners across cancer subtypes and are associated with specific transcriptional programs and with patient survival patterns. The expression of both receptor-ligand (or enzyme) partners improved patient stratification, suggesting a synergistic role for the activation of GPCR networks in modulating cancer phenotypes. Remarkably, we identified many such axes across several cancer molecular subtypes, including many involving receptor-biosynthetic enzymes for neurotransmitters. We found that GPCRs from these actionable axes, including muscarinic, adenosine, 5-hydroxytryptamine, and chemokine receptors, are the targets of multiple drugs displaying anti-growth effects in large-scale cancer cell drug screens, which we further validated. We have made the results generated in this study freely available through a webapp (gpcrcanceraxes.bioinfolab.sns.it).
Affiliation(s)
- Chakit Arora
  - Laboratorio di Biologia Bio@SNS, Scuola Normale Superiore, Piazza dei Cavalieri 7, 56126 Pisa, Italy
- Marin Matic
  - Laboratorio di Biologia Bio@SNS, Scuola Normale Superiore, Piazza dei Cavalieri 7, 56126 Pisa, Italy
- Luisa Bisceglia
  - Laboratorio di Biologia Bio@SNS, Scuola Normale Superiore, Piazza dei Cavalieri 7, 56126 Pisa, Italy
- Pierluigi Di Chiaro
  - Department of Experimental Oncology, IEO, European Institute of Oncology IRCCS, Milano, Italy
- Natalia De Oliveira Rosa
  - Laboratorio di Biologia Bio@SNS, Scuola Normale Superiore, Piazza dei Cavalieri 7, 56126 Pisa, Italy
- Francesco Carli
  - Laboratorio di Biologia Bio@SNS, Scuola Normale Superiore, Piazza dei Cavalieri 7, 56126 Pisa, Italy
- Lauren Clubb
  - Department of Pharmacology and Moores Cancer Center, University of California, San Diego, La Jolla, CA 92093, USA
- Lorenzo Amir Nemati Fard
  - Laboratorio di Biologia Bio@SNS, Scuola Normale Superiore, Piazza dei Cavalieri 7, 56126 Pisa, Italy
- Giorgos Kargas
  - Laboratorio di Biologia Bio@SNS, Scuola Normale Superiore, Piazza dei Cavalieri 7, 56126 Pisa, Italy
- Giuseppe R Diaferia
  - Department of Experimental Oncology, IEO, European Institute of Oncology IRCCS, Milano, Italy
- Ranka Vukotic
  - Azienda Ospedaliero-Universitaria Pisana, Via Roma, 67, 56126 Pisa, Italy
- Luana Licata
  - Department of Biology, University of Rome Tor Vergata, 00133 Rome, Italy
- Guanming Wu
  - Division of Bioinformatics and Computational Biology, Department of Medical Informatics and Clinical Epidemiology, Oregon Health & Science University, Portland, OR, USA
- Gioacchino Natoli
  - Department of Experimental Oncology, IEO, European Institute of Oncology IRCCS, Milano, Italy
- J Silvio Gutkind
  - Department of Pharmacology and Moores Cancer Center, University of California, San Diego, La Jolla, CA 92093, USA
- Francesco Raimondi
  - Laboratorio di Biologia Bio@SNS, Scuola Normale Superiore, Piazza dei Cavalieri 7, 56126 Pisa, Italy
3
Karbhal R, Sawant S, Kulkarni-Kale U. iCEED: Integrated customized extraction of enzyme data. J Bioinform Comput Biol 2024; 22:2450005. [PMID: 38779780 DOI: 10.1142/s0219720024500057] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 05/25/2024]
Abstract
Enzymes catalyze diverse biochemical reactions and are building blocks of cellular and metabolic pathways. Data and metadata of enzymes are distributed across databases and are archived in various formats. The enzyme databases provide utilities for efficient searches and downloading enzyme records in batch mode but do not support organism-specific extraction of subsets of data. Users are required to write scripts for parsing entries for customized data extraction prior to downstream analysis. Integrated Customized Extraction of Enzyme Data (iCEED) has been developed to provide organism-specific customized data extraction utilities for seven commonly used enzyme databases and brings these resources under an integrated portal. iCEED provides dropdown menus and search boxes with a typeahead utility for submission of queries, as well as an enzyme class-based browsing utility. A utility to facilitate mapping and visualization of functionally important features on the three-dimensional (3D) structures of enzymes is integrated. The customized data extraction utilities provided in iCEED are expected to be useful for biochemists, biotechnologists, computational biologists, and life science researchers to build curated datasets of their choice through an easy-to-navigate web-based interface. The integrated feature visualization system is useful for a fine-grained understanding of the enzyme structure-function relationship. Desired subsets of data, extracted and curated using iCEED, can be subsequently used for downstream processing, analyses, and knowledge discovery. iCEED can also be used for training and teaching purposes.
Affiliation(s)
- Rajiv Karbhal
  - Bioinformatics Centre, Savitribai Phule Pune University, Pune 411007, India
- Sangeeta Sawant
  - Bioinformatics Centre, Savitribai Phule Pune University, Pune 411007, India
- Urmila Kulkarni-Kale
  - Bioinformatics Centre, Savitribai Phule Pune University, Pune 411007, India
  - Department of Natural Sciences and Environmental Health, University of Southeastern Norway, Gullbringvegen 36, 3800 Bø, Norway
4
Altenhoff A, Bairoch A, Bansal P, Baratin D, Bastian F, Bolleman J, Bridge A, Burdet F, Crameri K, Dauvillier J, Dessimoz C, Gehant S, Glover N, Gnodtke K, Hayes C, Ibberson M, Kriventseva E, Kuznetsov D, Frédérique L, Mehl F, Mendes de Farias T, Michel PA, Moretti S, Morgat A, Österle S, Pagni M, Redaschi N, Robinson-Rechavi M, Samarasinghe K, Sima AC, Szklarczyk D, Topalov O, Touré V, Unni D, von Mering C, Wollbrett J, Zahn-Zabal M, Zdobnov E. The SIB Swiss Institute of Bioinformatics Semantic Web of data. Nucleic Acids Res 2024; 52:D44-D51. [PMID: 37878411 PMCID: PMC10767860 DOI: 10.1093/nar/gkad902] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/07/2023] [Revised: 10/02/2023] [Accepted: 10/05/2023] [Indexed: 10/27/2023] Open
Abstract
The SIB Swiss Institute of Bioinformatics (https://www.sib.swiss/) is a federation of bioinformatics research and service groups. The international life science community in academia and industry has been accessing the freely available databases provided by SIB since its inception in 1998. In this paper, we present the 11 databases which currently offer semantically enriched data in accordance with the FAIR principles (Findable, Accessible, Interoperable, Reusable), as well as the Swiss Personalized Health Network initiative (SPHN), which also employs this enrichment. The semantic enrichment facilitates the manipulation of large data sets from public databases and private data sets. Examples are provided to illustrate that the data from the SIB databases can be queried using precise criteria not only individually but also across multiple databases, including a variety of non-SIB databases. Data manipulation, be it exploration, extraction, annotation, combination, or publication, is possible using the SPARQL query language. Providing documentation, tutorials and sample queries makes it easier to navigate this web of semantic data. Through this paper, the reader will discover how the existing SIB knowledge graphs can be leveraged to tackle the complex biological or clinical questions that are being addressed today.
5
Arora C, Matic M, DiChiaro P, Rosa NDO, Carli F, Clubb L, Fard LAN, Kargas G, Diaferia G, Vukotic R, Licata L, Wu G, Natoli G, Gutkind JS, Raimondi F. The landscape of cancer rewired GPCR signaling axes. bioRxiv 2023:2023.03.13.532291. [PMID: 37398064 PMCID: PMC10312480 DOI: 10.1101/2023.03.13.532291] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 07/04/2023]
Abstract
We explored the dysregulation of GPCR ligand signaling systems in cancer transcriptomics datasets to uncover new therapeutic opportunities in oncology. We derived an interaction network of receptors with ligands and their biosynthetic enzymes, which revealed that multiple GPCRs are differentially regulated together with their upstream partners across cancer subtypes. We showed that biosynthetic pathway enrichment from enzyme expression recapitulated pathway activity signatures from metabolomics datasets, providing valuable surrogate information for GPCRs responding to organic ligands. We found that several GPCR signaling components were significantly associated with patient survival in a cancer type-specific fashion. The expression of both receptor-ligand (or enzyme) partners improved patient stratification, suggesting a synergistic role for the activation of GPCR networks in modulating cancer phenotypes. Remarkably, we identified many such axes across several cancer molecular subtypes, including many pairs involving receptor-biosynthetic enzymes for neurotransmitters. We found that GPCRs from these actionable axes, including muscarinic, adenosine, 5-hydroxytryptamine and chemokine receptors, are the targets of multiple drugs displaying anti-growth effects in large-scale cancer cell drug screens. We have made the results generated in this study freely available through a webapp (gpcrcanceraxes.bioinfolab.sns.it).
Affiliation(s)
- Chakit Arora
  - Laboratorio di Biologia Bio@SNS, Scuola Normale Superiore, Piazza dei Cavalieri 7, 56126, Pisa, Italy
- Marin Matic
  - Laboratorio di Biologia Bio@SNS, Scuola Normale Superiore, Piazza dei Cavalieri 7, 56126, Pisa, Italy
- Pierluigi DiChiaro
  - Department of Experimental Oncology, IEO, European Institute of Oncology IRCCS, Milano, Italy
- Natalia De Oliveira Rosa
  - Laboratorio di Biologia Bio@SNS, Scuola Normale Superiore, Piazza dei Cavalieri 7, 56126, Pisa, Italy
- Francesco Carli
  - Laboratorio di Biologia Bio@SNS, Scuola Normale Superiore, Piazza dei Cavalieri 7, 56126, Pisa, Italy
- Lauren Clubb
  - Department of Pharmacology and Moores Cancer Center, University of California, San Diego, La Jolla, CA 92093, USA
- Lorenzo Amir Nemati Fard
  - Laboratorio di Biologia Bio@SNS, Scuola Normale Superiore, Piazza dei Cavalieri 7, 56126, Pisa, Italy
- Giorgos Kargas
  - Laboratorio di Biologia Bio@SNS, Scuola Normale Superiore, Piazza dei Cavalieri 7, 56126, Pisa, Italy
- Giuseppe Diaferia
  - Department of Experimental Oncology, IEO, European Institute of Oncology IRCCS, Milano, Italy
- Ranka Vukotic
  - Azienda Ospedaliero-Universitaria Pisana, Via Roma, 67, 56126 Pisa
- Luana Licata
  - Department of Biology, University of Rome ‘Tor Vergata’, Rome 00133, Italy
- Guanming Wu
  - Division of Bioinformatics and Computational Biology, Department of Medical Informatics and Clinical Epidemiology, Oregon Health & Science University, Portland, Oregon, USA
- Gioacchino Natoli
  - Department of Experimental Oncology, IEO, European Institute of Oncology IRCCS, Milano, Italy
- J. Silvio Gutkind
  - Department of Pharmacology and Moores Cancer Center, University of California, San Diego, La Jolla, CA 92093, USA
- Francesco Raimondi
  - Laboratorio di Biologia Bio@SNS, Scuola Normale Superiore, Piazza dei Cavalieri 7, 56126, Pisa, Italy
6
Ji J, Zhang D, Ye J, Zheng Y, Cui J, Sun X. MycotoxinDB: A Data-Driven Platform for Investigating Masked Forms of Mycotoxins. Journal of Agricultural and Food Chemistry 2023. [PMID: 37145977 DOI: 10.1021/acs.jafc.3c01403] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 05/07/2023]
Abstract
Mycotoxins are likely to be converted into masked forms when subjected to plant metabolism or food processing. These masked forms of mycotoxins, together with their prototypes, may cause mixture toxicity effects, with adverse consequences for animal welfare and productivity. The structure elucidation of masked mycotoxins is the most challenging task in mycotoxin research due to the limitations of traditional analysis methods. To assist in the rapid identification of masked mycotoxins, we developed a data-driven online prediction tool, MycotoxinDB, based on reaction rules. Using MycotoxinDB, we identified seven masked DONs from wheat samples. Given its widespread applications, we expect that MycotoxinDB will become an indispensable tool in future mycotoxin research. MycotoxinDB is freely available at: http://www.mycotoxin-db.com/.
Affiliation(s)
- Jian Ji
  - State Key Laboratory of Food Science and Technology, School of Food Science and Technology, National Engineering Research Center for Functional Food, Synergetic Innovation Center of Food Safety and Quality Control, Jiangnan University, Wuxi, Jiangsu 214122, P. R. China
  - College of Food Science and Pharmacy, Xinjiang Agricultural University, No. 311 Nongda Dong Road, Ürümqi, 830052 Xinjiang Uygur Autonomous Region, P. R. China
- Dachuan Zhang
  - Ecological Systems Design, Institute of Environmental Engineering, ETH Zurich, Zurich 8093, Switzerland
- Jin Ye
  - Academy of National Food and Strategic Reserves Administration, No. 11 Baiwanzhuang Str, Xicheng District, Beijing 100037, P. R. China
- Yi Zheng
  - Jiangsu Agri-animal Husbandry Vocational College, Key Laboratory for High-Tech Research and Development of Veterinary Biopharmaceuticals, Taizhou, Jiangsu 225300, P. R. China
- Jing Cui
  - Wuxi Food Safety Inspection and Test Center, No. 1 Building, Life Science Park, Xinwu District, Jiangsu, P. R. China
- Xiulan Sun
  - State Key Laboratory of Food Science and Technology, School of Food Science and Technology, National Engineering Research Center for Functional Food, Synergetic Innovation Center of Food Safety and Quality Control, Jiangnan University, Wuxi, Jiangsu 214122, P. R. China
7
Slenter DN, Hemel IMGM, Evelo CT, Bierau J, Willighagen EL, Steinbusch LKM. Extending inherited metabolic disorder diagnostics with biomarker interaction visualizations. Orphanet J Rare Dis 2023; 18:95. [PMID: 37101200 PMCID: PMC10131334 DOI: 10.1186/s13023-023-02683-9] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/01/2022] [Accepted: 04/02/2023] [Indexed: 04/28/2023] Open
Abstract
BACKGROUND Inherited Metabolic Disorders (IMDs) are rare diseases where one impaired protein leads to a cascade of changes in the adjacent chemical conversions. IMDs often present with non-specific symptoms, a lack of a clear genotype-phenotype correlation, and de novo mutations, complicating diagnosis. Furthermore, products of one metabolic conversion can be the substrate of another pathway, obscuring biomarker identification and causing overlapping biomarkers for different disorders. Visualization of the connections between metabolic biomarkers and the enzymes involved might aid in the diagnostic process. The goal of this study was to provide a proof-of-concept framework for integrating knowledge of metabolic interactions with real-life patient data before scaling up this approach. This framework was tested on two groups of well-studied and related metabolic pathways (the urea cycle and pyrimidine de novo synthesis). The lessons learned from our approach will help to scale up the framework and support the diagnosis of other less-understood IMDs.
METHODS Our framework integrates literature and expert knowledge into machine-readable pathway models, including relevant urine biomarkers and their interactions. The clinical data of 16 previously diagnosed patients with various pyrimidine and urea cycle disorders were visualized on the top 3 relevant pathways. Two expert laboratory scientists evaluated the resulting visualizations to derive a diagnosis.
RESULTS The proof-of-concept platform resulted in varying numbers of relevant biomarkers (five to 48), pathways, and pathway interactions for each patient. The two experts reached the same conclusions for all samples with our proposed framework as with the current metabolic diagnostic pipeline. For nine patient samples, the diagnosis was made without knowledge about clinical symptoms or sex. For the remaining seven cases, four interpretations pointed in the direction of a subset of disorders, while three cases were found to be undiagnosable with the available data. Diagnosing these patients would require additional testing besides biochemical analysis.
CONCLUSION The presented framework shows how metabolic interaction knowledge can be integrated with clinical data in one visualization, which can be relevant for future analysis of difficult patient cases and untargeted metabolomics data. Several challenges were identified during the development of this framework, which should be resolved before this approach can be scaled up and implemented to support the diagnosis of other (less understood) IMDs. The framework could be extended with other OMICS data (e.g. genomics, transcriptomics), and phenotypic data, as well as linked to other knowledge captured as Linked Open Data.
Affiliation(s)
- Denise N Slenter
  - Department of Bioinformatics (BiGCaT), NUTRIM, Maastricht University, Maastricht, The Netherlands
- Irene M G M Hemel
  - Department of Bioinformatics (BiGCaT), NUTRIM, Maastricht University, Maastricht, The Netherlands
  - Maastricht Centre for Systems Biology (MaCSBio), Maastricht University, Maastricht, The Netherlands
- Chris T Evelo
  - Department of Bioinformatics (BiGCaT), NUTRIM, Maastricht University, Maastricht, The Netherlands
  - Maastricht Centre for Systems Biology (MaCSBio), Maastricht University, Maastricht, The Netherlands
- Jörgen Bierau
  - Department of Clinical Genetics, Maastricht University Medical Center, Maastricht, The Netherlands
  - Department of Clinical Genetics, Erasmus Medical Center, Rotterdam, The Netherlands
- Egon L Willighagen
  - Department of Bioinformatics (BiGCaT), NUTRIM, Maastricht University, Maastricht, The Netherlands
- Laura K M Steinbusch
  - Department of Clinical Genetics, Maastricht University Medical Center, Maastricht, The Netherlands
8
Yu T, Boob AG, Volk MJ, Liu X, Cui H, Zhao H. Machine learning-enabled retrobiosynthesis of molecules. Nat Catal 2023. [DOI: 10.1038/s41929-022-00909-w] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 02/18/2023]
9
Rose TD, Köhler N, Falk L, Klischat L, Lazareva OE, Pauling JK. Lipid network and moiety analysis for revealing enzymatic dysregulation and mechanistic alterations from lipidomics data. Brief Bioinform 2023; 24:bbac572. [PMID: 36592059 PMCID: PMC9851308 DOI: 10.1093/bib/bbac572] [Citation(s) in RCA: 13] [Impact Index Per Article: 6.5] [Received: 08/26/2022] [Revised: 11/10/2022] [Accepted: 11/24/2022] [Indexed: 01/03/2023] Open
Abstract
Lipidomics is of growing importance for clinical and biomedical research due to many associations between lipid metabolism and diseases. The discovery of these associations is facilitated by improved lipid identification and quantification. Sophisticated computational methods are advantageous for interpreting such large-scale data for understanding metabolic processes and their underlying (patho)mechanisms. To generate hypotheses about these mechanisms, the combination of metabolic networks and graph algorithms is a powerful option to pinpoint molecular disease drivers and their interactions. Here we present lipid network explorer (LINEX$^2$), a lipid network analysis framework that fuels biological interpretation of alterations in lipid compositions. By integrating lipid-metabolic reactions from public databases, we generate dataset-specific lipid interaction networks. To aid interpretation of these networks, we present an enrichment graph algorithm that infers changes in enzymatic activity in the context of their multispecificity from lipidomics data. Our inference method successfully recovered the MBOAT7 enzyme from knock-out data. Furthermore, we mechanistically interpret lipidomic alterations of adipocytes in obesity by leveraging network enrichment and lipid moieties. We address the general lack of lipidomics data mining options to elucidate potential disease mechanisms and make lipidomics more clinically relevant.
Affiliation(s)
- Tim D Rose
  - LipiTUM, Chair of Experimental Bioinformatics, TUM School of Life Sciences, Technical University of Munich, 85354 Freising, Germany
- Nikolai Köhler
  - LipiTUM, Chair of Experimental Bioinformatics, TUM School of Life Sciences, Technical University of Munich, 85354 Freising, Germany
- Lisa Falk
  - LipiTUM, Chair of Experimental Bioinformatics, TUM School of Life Sciences, Technical University of Munich, 85354 Freising, Germany
- Lucie Klischat
  - LipiTUM, Chair of Experimental Bioinformatics, TUM School of Life Sciences, Technical University of Munich, 85354 Freising, Germany
- Olga E Lazareva
  - Chair of Experimental Bioinformatics, TUM School of Life Sciences, Technical University of Munich, 85354 Freising, Germany
  - Division of Computational Genomics and Systems Genetics, German Cancer Research Center (DKFZ), 69120 Heidelberg, Germany
  - Junior Clinical Cooperation Unit Multiparametric methods for early detection of prostate cancer, German Cancer Research Center (DKFZ), Heidelberg, Germany
  - European Molecular Biology Laboratory, Genome Biology Unit, 69117 Heidelberg, Germany
- Josch K Pauling
  - LipiTUM, Chair of Experimental Bioinformatics, TUM School of Life Sciences, Technical University of Munich, 85354 Freising, Germany
10
Andersen JL, Fagerberg R, Flamm C, Fontana W, Kolčák J, Laurent CVFP, Merkle D, Nøjgaard N. Representing Catalytic Mechanisms with Rule Composition. J Chem Inf Model 2022; 62:5513-5524. [PMID: 36326605 DOI: 10.1021/acs.jcim.2c00426] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/06/2022]
Abstract
An "imaginary transition structure" overlays the molecular graphs of the educt and product sides of an elementary chemical reaction in a single graph to highlight the changes in bond structure. We generalize this idea to reactions with complex mechanisms in a formally rigorous approach based on composing arrow-pushing steps represented as graph-transformation rules to construct an overall composite rule and a derived transition structure. This transition structure retains information about transient bond changes that are invisible at the overall level and can be constructed automatically from an existing database of detailed enzymatic mechanisms. We use the construction to (i) illuminate the distribution of catalytic action across enzymes and substrates and (ii) search a large database for reactions of known or unknown mechanisms that are compatible with the mechanism captured by the constructed composite rule.
Affiliation(s)
- Jakob L Andersen
  - Department of Mathematics and Computer Science, University of Southern Denmark, Campusvej 55, 5230 Odense, Denmark
- Rolf Fagerberg
  - Department of Mathematics and Computer Science, University of Southern Denmark, Campusvej 55, 5230 Odense, Denmark
- Christoph Flamm
  - Department of Theoretical Chemistry, University of Vienna, Währinger Straße 17, 1090 Vienna, Austria
- Walter Fontana
  - Department of Systems Biology, Harvard Medical School, 200 Longwood Avenue, Boston, Massachusetts 02115, United States
- Juri Kolčák
  - Department of Mathematics and Computer Science, University of Southern Denmark, Campusvej 55, 5230 Odense, Denmark
  - Department of Systems Biology, Harvard Medical School, 200 Longwood Avenue, Boston, Massachusetts 02115, United States
- Christophe V F P Laurent
  - Department of Mathematics and Computer Science, University of Southern Denmark, Campusvej 55, 5230 Odense, Denmark
- Daniel Merkle
  - Department of Mathematics and Computer Science, University of Southern Denmark, Campusvej 55, 5230 Odense, Denmark
- Nikolai Nøjgaard
  - Department of Mathematics and Computer Science, University of Southern Denmark, Campusvej 55, 5230 Odense, Denmark
11
Shi Z, Liu P, Liao X, Mao Z, Zhang J, Wang Q, Sun J, Ma H, Ma Y. Data-Driven Synthetic Cell Factories Development for Industrial Biomanufacturing. BioDesign Research 2022; 2022:9898461. [PMID: 37850146 PMCID: PMC10521697 DOI: 10.34133/2022/9898461] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/09/2022] [Accepted: 05/26/2022] [Indexed: 10/19/2023] Open
Abstract
Revolutionary breakthroughs in artificial intelligence (AI) and machine learning (ML) have had a profound impact on a wide range of scientific disciplines, including the development of artificial cell factories for biomanufacturing. In this paper, we review the latest studies on the application of data-driven methods for the design of new proteins, pathways, and strains. We first briefly introduce the various types of data and databases relevant to industrial biomanufacturing, which are the basis for data-driven research. Different types of algorithms, including traditional ML and more recent deep learning methods, are also presented. We then demonstrate how these data-based approaches can be applied to address various issues in cell factory development using examples from recent studies, including the prediction of protein function, improvement of metabolic models, estimation of missing kinetic parameters, design of non-natural biosynthesis pathways, and pathway optimization. In the last section, we discuss the current limitations of these data-driven approaches and propose that data-driven methods should be integrated with mechanistic models to complement each other and facilitate the development of synthetic strains for industrial biomanufacturing.
Collapse
Affiliation(s)
- Zhenkun Shi
- Key Laboratory of Systems Microbial Technology, Tianjin Institute of Industrial Biotechnology, Chinese Academy of Sciences, Tianjin 300308, China
- National Technology Innovation Center of Synthetic Biology, Tianjin 300308, China
- Pi Liu
- Key Laboratory of Systems Microbial Technology, Tianjin Institute of Industrial Biotechnology, Chinese Academy of Sciences, Tianjin 300308, China
- National Technology Innovation Center of Synthetic Biology, Tianjin 300308, China
- Xiaoping Liao
- Key Laboratory of Systems Microbial Technology, Tianjin Institute of Industrial Biotechnology, Chinese Academy of Sciences, Tianjin 300308, China
- National Technology Innovation Center of Synthetic Biology, Tianjin 300308, China
- Zhitao Mao
- Key Laboratory of Systems Microbial Technology, Tianjin Institute of Industrial Biotechnology, Chinese Academy of Sciences, Tianjin 300308, China
- National Technology Innovation Center of Synthetic Biology, Tianjin 300308, China
- Jianqi Zhang
- Key Laboratory of Systems Microbial Technology, Tianjin Institute of Industrial Biotechnology, Chinese Academy of Sciences, Tianjin 300308, China
- National Technology Innovation Center of Synthetic Biology, Tianjin 300308, China
- Qinhong Wang
- Key Laboratory of Systems Microbial Technology, Tianjin Institute of Industrial Biotechnology, Chinese Academy of Sciences, Tianjin 300308, China
- National Technology Innovation Center of Synthetic Biology, Tianjin 300308, China
- Jibin Sun
- Key Laboratory of Systems Microbial Technology, Tianjin Institute of Industrial Biotechnology, Chinese Academy of Sciences, Tianjin 300308, China
- National Technology Innovation Center of Synthetic Biology, Tianjin 300308, China
- Hongwu Ma
- Key Laboratory of Systems Microbial Technology, Tianjin Institute of Industrial Biotechnology, Chinese Academy of Sciences, Tianjin 300308, China
- National Technology Innovation Center of Synthetic Biology, Tianjin 300308, China
- Yanhe Ma
- Key Laboratory of Systems Microbial Technology, Tianjin Institute of Industrial Biotechnology, Chinese Academy of Sciences, Tianjin 300308, China
- National Technology Innovation Center of Synthetic Biology, Tianjin 300308, China
12
Zheng S, Zeng T, Li C, Chen B, Coley CW, Yang Y, Wu R. Deep learning driven biosynthetic pathways navigation for natural products with BioNavi-NP. Nat Commun 2022; 13:3342. [PMID: 35688826 PMCID: PMC9187661 DOI: 10.1038/s41467-022-30970-9] [Citation(s) in RCA: 42] [Impact Index Per Article: 14.0] [Received: 10/07/2021] [Accepted: 05/27/2022] [Indexed: 12/30/2022] Open
Abstract
The complete biosynthetic pathways are unknown for most natural products (NPs); it is thus valuable to make computer-aided bio-retrosynthesis predictions. Here, a navigable and user-friendly toolkit, BioNavi-NP, is developed to predict the biosynthetic pathways for both NPs and NP-like compounds. First, a single-step bio-retrosynthesis prediction model is trained using both general organic and biosynthetic reactions through end-to-end transformer neural networks. Based on this model, plausible biosynthetic pathways can be efficiently sampled through an AND-OR tree-based planning algorithm from iterative multi-step bio-retrosynthetic routes. Extensive evaluations reveal that BioNavi-NP can identify biosynthetic pathways for 90.2% of 368 test compounds and recover the reported building blocks as in the test set for 72.8%, 1.7 times more accurate than existing conventional rule-based approaches. The model is further shown to identify biologically plausible pathways for complex NPs collected from the recent literature. The toolkit as well as the curated datasets and learned models are freely available to facilitate the elucidation and reconstruction of the biosynthetic pathways for NPs. The complete biosynthetic pathways of most natural products (NPs) are unknown. Here, the authors report BioNavi-NP, a computational toolkit for bio-retrosynthetic pathway elucidation or reconstruction for both NPs and NP-like compounds.
Affiliation(s)
- Shuangjia Zheng
- School of Pharmaceutical Sciences, Sun Yat-sen University, Guangzhou, 510006, China; School of Computer Science and Engineering, Sun Yat-sen University, Guangzhou, 510006, China; Galixir, Beijing, China
- Tao Zeng
- School of Pharmaceutical Sciences, Sun Yat-sen University, Guangzhou, 510006, China
- Binghong Chen
- College of Computing, Georgia Institute of Technology, Atlanta, GA, USA
- Connor W Coley
- Department of Chemical Engineering, Massachusetts Institute of Technology, Cambridge, MA, USA
- Yuedong Yang
- School of Computer Science and Engineering, Sun Yat-sen University, Guangzhou, 510006, China
- Ruibo Wu
- School of Pharmaceutical Sciences, Sun Yat-sen University, Guangzhou, 510006, China
13
The Immunometabolic Atlas: A tool for design and interpretation of metabolomics studies in immunology. PLoS One 2022; 17:e0268408. [PMID: 35550647 PMCID: PMC9098072 DOI: 10.1371/journal.pone.0268408] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/15/2021] [Accepted: 04/28/2022] [Indexed: 11/28/2022] Open
Abstract
Immunometabolism, which concerns the interplay between metabolism and the immune system, is increasingly recognized as a potential source of novel drug targets and biomarkers. In this context, the use of metabolomics to identify metabolic characteristics associated with specific functional immune response processes is of value. Currently, there is a lack of tools to determine known associations between metabolites and immune processes. Consequently, interpretation of metabolites in metabolomics studies in terms of their role in the immune system, or selection of the most relevant metabolite classes to include in metabolomics studies, is challenging. Here, we describe the Immunometabolic Atlas (IMA), a public web application and library of R functions to infer immune processes associated with specific metabolites and vice versa. The IMA derives metabolite-immune process associations utilizing a protein-metabolite network analysis algorithm that associates immune system-associated annotated proteins in Gene Ontology to metabolites. We evaluated IMA inferred metabolite-immune system associations using a text mining strategy, identifying substantial overlap, but also demonstrating a significant chemical space of immune system-associated metabolites that should be confirmed experimentally. Overall, the IMA facilitates the interpretation and design of immunometabolomics studies by the association of metabolites to specific immune processes.
14
Liu S, Moon CD, Zheng N, Huws S, Zhao S, Wang J. Opportunities and challenges of using metagenomic data to bring uncultured microbes into cultivation. Microbiome 2022; 10:76. [PMID: 35546409 PMCID: PMC9097414 DOI: 10.1186/s40168-022-01272-5] [Citation(s) in RCA: 53] [Impact Index Per Article: 17.7] [Received: 01/13/2022] [Accepted: 04/10/2022] [Indexed: 05/12/2023]
Abstract
Although there is now an extensive understanding of the diversity of microbial life on earth through culture-independent metagenomic DNA sequence analyses, the isolation and cultivation of microbes remains critical to directly study them and confirm their metabolic and physiological functions, and their ecological roles. However, the majority of environmental microbes are as yet uncultured; bringing these rare or poorly characterized groups into culture is therefore a priority to further understand microbiome functions. Moreover, cultivated isolates may find utility in a range of applications, such as new probiotics, biocontrol agents, and agents for industrial processes. The growing abundance of metagenomic and meta-transcriptomic sequence information from a wide range of environments provides more opportunities to guide the isolation and cultivation of microbes of interest. In this paper, we discuss a range of successful methodologies and applications that have underpinned recent metagenome-guided efforts to isolate and cultivate microbes. These approaches range from determining specific culture conditions to enrich for taxa of interest to more complex strategies that specifically target the capture of microbial species through antibody engineering and genome editing. With the greater degree of genomic information now available from uncultivated members, such as via metagenome-assembled genomes, the theoretical understanding of their cultivation requirements will enable greater possibilities to capture these and ultimately gain a more comprehensive understanding of the microbiomes.
Affiliation(s)
- Sijia Liu
- State Key Laboratory of Animal Nutrition, Institute of Animal Sciences, Chinese Academy of Agricultural Sciences, No. 2 Yuanmingyuan West Road, Haidian, Beijing, 100193, China
- College of Pastoral Agriculture Science and Technology, Lanzhou University, Lanzhou, 730020, China
- Christina D Moon
- AgResearch Ltd., Grasslands Research Centre, Palmerston North, New Zealand
- Nan Zheng
- State Key Laboratory of Animal Nutrition, Institute of Animal Sciences, Chinese Academy of Agricultural Sciences, No. 2 Yuanmingyuan West Road, Haidian, Beijing, 100193, China
- Sharon Huws
- School of Biological Sciences and Institute for Global Food Security, 19 Chlorine Gardens, Queen's University Belfast, Belfast, UK
- Shengguo Zhao
- State Key Laboratory of Animal Nutrition, Institute of Animal Sciences, Chinese Academy of Agricultural Sciences, No. 2 Yuanmingyuan West Road, Haidian, Beijing, 100193, China
- Jiaqi Wang
- State Key Laboratory of Animal Nutrition, Institute of Animal Sciences, Chinese Academy of Agricultural Sciences, No. 2 Yuanmingyuan West Road, Haidian, Beijing, 100193, China
15
PyMiner: A method for metabolic pathway design based on the uniform similarity of substrate-product pairs and conditional search. PLoS One 2022; 17:e0266783. [PMID: 35404943 PMCID: PMC9000129 DOI: 10.1371/journal.pone.0266783] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/04/2022] [Accepted: 03/26/2022] [Indexed: 11/30/2022] Open
Abstract
Metabolic pathway design is an essential step in constructing an efficient microbial cell factory to produce high value-added chemicals. Meanwhile, the computational design of biologically meaningful metabolic pathways has been attracting much attention as a route to natural and non-natural products. However, there has been a lack of effective methods to perform metabolic network reduction automatically. In addition, comprehensive evaluation indexes for metabolic pathways are still relatively scarce. Here, we define a novel uniform similarity to calculate the main substrate-product pairs of known biochemical reactions, and further develop an efficient metabolic pathway design tool named PyMiner. As a result, the redundant information of the general metabolic network (GMN) is eliminated, and the number of substrate-product pairs is shown to decrease by 81.62% on average. Considering that the nodes in the extracted metabolic network (EMN) constructed in this work are large in scale but imbalanced in distribution, we establish a conditional search strategy (CSS) that cuts search time in 90.6% of cases. Compared with state-of-the-art methods, PyMiner shows obvious advantages and demonstrates equivalent or better performance in 95% of cases of experimentally verified pathways. Consequently, PyMiner is a practical and effective tool for metabolic pathway design.
16
Fukushima A, Takahashi M, Nagasaki H, Aono Y, Kobayashi M, Kusano M, Saito K, Kobayashi N, Arita M. Development of RIKEN Plant Metabolome MetaDatabase. Plant & Cell Physiology 2022; 63:433-440. [PMID: 34918130 PMCID: PMC8917833 DOI: 10.1093/pcp/pcab173] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 09/05/2021] [Revised: 11/15/2021] [Accepted: 12/16/2021] [Indexed: 06/14/2023]
Abstract
The advancement of metabolomics in terms of techniques for measuring small molecules has enabled the rapid detection and quantification of numerous cellular metabolites. Metabolomic data provide new opportunities to gain a deeper understanding of plant metabolism that can improve the health of both plants and the humans who consume them. Although major public repositories for general metabolomic data have been established, the community still has shortcomings related to data sharing, especially in terms of data reanalysis, reusability and reproducibility. To address these issues, we developed the RIKEN Plant Metabolome MetaDatabase (RIKEN PMM, http://metabobank.riken.jp/pmm/db/plantMetabolomics), which stores mass spectrometry-based (e.g. gas chromatography-MS-based) metabolite profiling data of plants together with their detailed, structured experimental metadata, including sampling and experimental procedures. Our metadata are described as Linked Open Data based on the Resource Description Framework using standardized and controlled vocabularies, such as the Metabolomics Standards Initiative Ontology, which are to be integrated with various life and biomedical science data using the World Wide Web. RIKEN PMM implements intuitive and interactive operations for plant metabolome data, including raw data (netCDF format), mass spectra (NIST MSP format) and metabolite annotations. The feature is suitable not only for biologists who are interested in metabolomic phenotypes, but also for researchers who would like to investigate life science in general through plant metabolomic approaches.
Affiliation(s)
- Atsushi Fukushima
- Metabolome Informatics Research Team, RIKEN Center for Sustainable Resource Science, 1-7-22 Suehiro, Tsurumi, Yokohama, Kanagawa 230-0045, Japan
- Mikiko Takahashi
- Metabolome Informatics Research Team, RIKEN Center for Sustainable Resource Science, 1-7-22 Suehiro, Tsurumi, Yokohama, Kanagawa 230-0045, Japan
- Hideki Nagasaki
- Metabolome Informatics Research Team, RIKEN Center for Sustainable Resource Science, 1-7-22 Suehiro, Tsurumi, Yokohama, Kanagawa 230-0045, Japan
- Yusuke Aono
- Degree Programs in Life and Earth Sciences, University of Tsukuba, 1-1-1 Tennodai, Tsukuba, Ibaraki 305-8572, Japan
- Makoto Kobayashi
- Metabolome Informatics Research Team, RIKEN Center for Sustainable Resource Science, 1-7-22 Suehiro, Tsurumi, Yokohama, Kanagawa 230-0045, Japan
- Miyako Kusano
- Metabolome Informatics Research Team, RIKEN Center for Sustainable Resource Science, 1-7-22 Suehiro, Tsurumi, Yokohama, Kanagawa 230-0045, Japan
- Faculty of Life and Environmental Science, University of Tsukuba, 1-1-1 Tennodai, Tsukuba, Ibaraki 305-8572, Japan
- Tsukuba Plant Innovation Research Center, University of Tsukuba, 1-1-1 Tennodai, Tsukuba, Ibaraki 305-8572, Japan
- Kazuki Saito
- Metabolome Informatics Research Team, RIKEN Center for Sustainable Resource Science, 1-7-22 Suehiro, Tsurumi, Yokohama, Kanagawa 230-0045, Japan
- Norio Kobayashi
- Metabolome Informatics Research Team, RIKEN Center for Sustainable Resource Science, 1-7-22 Suehiro, Tsurumi, Yokohama, Kanagawa 230-0045, Japan
- Data Knowledge Organization Unit, RIKEN Information R&D and Strategy Headquarters, 2-1 Hirosawa, Wako, Saitama 351-0198, Japan
- Masanori Arita
- Metabolome Informatics Research Team, RIKEN Center for Sustainable Resource Science, 1-7-22 Suehiro, Tsurumi, Yokohama, Kanagawa 230-0045, Japan
- Bioinformation and DDBJ Center, National Institute of Genetics, Yata 1111, Mishima, Shizuoka 411-8540, Japan
17
Martens M, Evelo CT, Willighagen EL. Providing Adverse Outcome Pathways from the AOP-Wiki in a Semantic Web Format to Increase Usability and Accessibility of the Content. Applied In Vitro Toxicology 2022; 8:2-13. [PMID: 35388368 DOI: 10.26434/chemrxiv.13524191] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Indexed: 05/22/2023]
Abstract
Introduction: The AOP-Wiki is the main platform for the development and storage of adverse outcome pathways (AOPs). These AOPs describe mechanistic information about toxicodynamic processes and can be used to develop effective risk assessment strategies. However, it is challenging to automatically and systematically parse, filter, and use its contents. We explored solutions to better structure the AOP-Wiki content, and to link it with chemical and biological resources. Together, this allows more detailed exploration, which can be automated. Materials and Methods: We converted the complete AOP-Wiki content into resource description framework (RDF) triples. We used >20 ontologies for the semantic annotation of property-object relations, including the Chemical Information Ontology, Dublin Core, and the AOP Ontology. Results: The resulting RDF contains >122,000 triples describing 158 unique properties of >15,000 unique subjects. Furthermore, >3500 link-outs were added to 12 chemical databases, and >7500 link-outs to 4 gene and protein databases. The AOP-Wiki RDF has been made available at https://aopwiki.rdf.bigcat-bioinformatics.org. Discussion: SPARQL queries can be used to answer biological and toxicological questions, such as listing measurement methods for all Key Events leading to an Adverse Outcome of interest. The full power that the use of this new resource provides becomes apparent when combining the content with external databases using federated queries. Conclusion: Overall, the AOP-Wiki RDF allows new ways to explore the rapidly growing AOP knowledge and makes the integration of this database in automated workflows possible, making the AOP-Wiki more FAIR.
Affiliation(s)
- Marvin Martens
- Department of Bioinformatics-BiGCaT, NUTRIM, and Maastricht University, Maastricht, The Netherlands
- Chris T Evelo
- Department of Bioinformatics-BiGCaT, NUTRIM, and Maastricht University, Maastricht, The Netherlands
- Maastricht Centre for Systems Biology (MaCSBio), Maastricht University, Maastricht, The Netherlands
- Egon L Willighagen
- Department of Bioinformatics-BiGCaT, NUTRIM, and Maastricht University, Maastricht, The Netherlands
18
Martens M, Evelo CT, Willighagen EL. Providing Adverse Outcome Pathways from the AOP-Wiki in a Semantic Web Format to Increase Usability and Accessibility of the Content. Applied In Vitro Toxicology 2022; 8:2-13. [PMID: 35388368 PMCID: PMC8978481 DOI: 10.1089/aivt.2021.0010] [Citation(s) in RCA: 10] [Impact Index Per Article: 3.3] [Indexed: 12/01/2022]
Abstract
Introduction: The AOP-Wiki is the main platform for the development and storage of adverse outcome pathways (AOPs). These AOPs describe mechanistic information about toxicodynamic processes and can be used to develop effective risk assessment strategies. However, it is challenging to automatically and systematically parse, filter, and use its contents. We explored solutions to better structure the AOP-Wiki content, and to link it with chemical and biological resources. Together, this allows more detailed exploration, which can be automated. Materials and Methods: We converted the complete AOP-Wiki content into resource description framework (RDF) triples. We used >20 ontologies for the semantic annotation of property–object relations, including the Chemical Information Ontology, Dublin Core, and the AOP Ontology. Results: The resulting RDF contains >122,000 triples describing 158 unique properties of >15,000 unique subjects. Furthermore, >3500 link-outs were added to 12 chemical databases, and >7500 link-outs to 4 gene and protein databases. The AOP-Wiki RDF has been made available at https://aopwiki.rdf.bigcat-bioinformatics.org. Discussion: SPARQL queries can be used to answer biological and toxicological questions, such as listing measurement methods for all Key Events leading to an Adverse Outcome of interest. The full power that the use of this new resource provides becomes apparent when combining the content with external databases using federated queries. Conclusion: Overall, the AOP-Wiki RDF allows new ways to explore the rapidly growing AOP knowledge and makes the integration of this database in automated workflows possible, making the AOP-Wiki more FAIR.
Affiliation(s)
- Marvin Martens
- Department of Bioinformatics-BiGCaT, NUTRIM, and Maastricht University, Maastricht, The Netherlands
- Chris T. Evelo
- Department of Bioinformatics-BiGCaT, NUTRIM, and Maastricht University, Maastricht, The Netherlands
- Maastricht Centre for Systems Biology (MaCSBio), Maastricht University, Maastricht, The Netherlands
- Egon L. Willighagen
- Department of Bioinformatics-BiGCaT, NUTRIM, and Maastricht University, Maastricht, The Netherlands
19
Gillespie M, Jassal B, Stephan R, Milacic M, Rothfels K, Senff-Ribeiro A, Griss J, Sevilla C, Matthews L, Gong C, Deng C, Varusai T, Ragueneau E, Haider Y, May B, Shamovsky V, Weiser J, Brunson T, Sanati N, Beckman L, Shao X, Fabregat A, Sidiropoulos K, Murillo J, Viteri G, Cook J, Shorser S, Bader G, Demir E, Sander C, Haw R, Wu G, Stein L, Hermjakob H, D’Eustachio P. The reactome pathway knowledgebase 2022. Nucleic Acids Res 2022; 50:D687-D692. [PMID: 34788843 PMCID: PMC8689983 DOI: 10.1093/nar/gkab1028] [Citation(s) in RCA: 1037] [Impact Index Per Article: 345.7] [Received: 09/30/2021] [Revised: 10/11/2021] [Accepted: 10/13/2021] [Indexed: 11/13/2022] Open
Abstract
The Reactome Knowledgebase (https://reactome.org), an Elixir core resource, provides manually curated molecular details across a broad range of physiological and pathological biological processes in humans, including both hereditary and acquired disease processes. The processes are annotated as an ordered network of molecular transformations in a single consistent data model. Reactome thus functions both as a digital archive of manually curated human biological processes and as a tool for discovering functional relationships in data such as gene expression profiles or somatic mutation catalogs from tumor cells. Recent curation work has expanded our annotations of normal and disease-associated signaling processes and of the drugs that target them, in particular infections caused by the SARS-CoV-1 and SARS-CoV-2 coronaviruses and the host response to infection. New tools support better simultaneous analysis of high-throughput data from multiple sources and the placement of understudied ('dark') proteins from analyzed datasets in the context of Reactome's manually curated pathways.
Affiliation(s)
- Marc Gillespie
- Ontario Institute for Cancer Research, Toronto, ON M5G0A3, Canada
- College of Pharmacy and Health Sciences, St. John’s University, Queens, NY 11439, USA
- Bijay Jassal
- Ontario Institute for Cancer Research, Toronto, ON M5G0A3, Canada
- Ralf Stephan
- Ontario Institute for Cancer Research, Toronto, ON M5G0A3, Canada
- Marija Milacic
- Ontario Institute for Cancer Research, Toronto, ON M5G0A3, Canada
- Karen Rothfels
- Ontario Institute for Cancer Research, Toronto, ON M5G0A3, Canada
- Andrea Senff-Ribeiro
- Ontario Institute for Cancer Research, Toronto, ON M5G0A3, Canada
- Universidade Federal do Paraná, Curitiba, 80060-000, Brazil
- Johannes Griss
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, Cambridgeshire, CB10 1SD, UK
- Department of Dermatology, Medical University of Vienna, 1090 Vienna, Austria
- Cristoffer Sevilla
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, Cambridgeshire, CB10 1SD, UK
- Lisa Matthews
- NYU Grossman School of Medicine, New York, NY 10016, USA
- Chuqiao Gong
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, Cambridgeshire, CB10 1SD, UK
- Chuan Deng
- National Center for Protein Sciences Beijing, Beijing Institute of Life Omics, Beijing 102206, China
- Chongqing Key Laboratory on Big Data for Bio Intelligence, Chongqing University of Posts and Telecommunications, Chongqing 400065, China
- Thawfeek Varusai
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, Cambridgeshire, CB10 1SD, UK
- Open Targets, Wellcome Genome Campus, Hinxton, Cambridgeshire, CB10 1SD, UK
- Eliot Ragueneau
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, Cambridgeshire, CB10 1SD, UK
- Yusra Haider
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, Cambridgeshire, CB10 1SD, UK
- Bruce May
- Ontario Institute for Cancer Research, Toronto, ON M5G0A3, Canada
- Joel Weiser
- Ontario Institute for Cancer Research, Toronto, ON M5G0A3, Canada
- Timothy Brunson
- Oregon Health and Science University, Portland, OR 97239, USA
- Nasim Sanati
- Oregon Health and Science University, Portland, OR 97239, USA
- Liam Beckman
- Oregon Health and Science University, Portland, OR 97239, USA
- Xiang Shao
- Oregon Health and Science University, Portland, OR 97239, USA
- Antonio Fabregat
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, Cambridgeshire, CB10 1SD, UK
- Konstantinos Sidiropoulos
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, Cambridgeshire, CB10 1SD, UK
- Julieth Murillo
- Centro Internacional de Entrenamiento e Investigaciones Médicas, Cali 18 # 122-135, Colombia
- Guilherme Viteri
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, Cambridgeshire, CB10 1SD, UK
- Justin Cook
- Ontario Institute for Cancer Research, Toronto, ON M5G0A3, Canada
- Solomon Shorser
- Ontario Institute for Cancer Research, Toronto, ON M5G0A3, Canada
- Gary Bader
- The Donnelly Centre, University of Toronto, Toronto, ON M5S 3E1, Canada
- Emek Demir
- Oregon Health and Science University, Portland, OR 97239, USA
- Chris Sander
- cBio Center at Dana-Farber Cancer Institute, Boston, MA 02115, USA
- Robin Haw
- Ontario Institute for Cancer Research, Toronto, ON M5G0A3, Canada
- Guanming Wu
- Oregon Health and Science University, Portland, OR 97239, USA
- Lincoln Stein
- Ontario Institute for Cancer Research, Toronto, ON M5G0A3, Canada
- Department of Molecular Genetics, University of Toronto, Toronto, ON M5S 1A1, Canada
- Henning Hermjakob
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, Cambridgeshire, CB10 1SD, UK
- National Center for Protein Sciences Beijing, Beijing Institute of Life Omics, Beijing 102206, China
20
Bansal P, Morgat A, Axelsen KB, Muthukrishnan V, Coudert E, Aimo L, Hyka-Nouspikel N, Gasteiger E, Kerhornou A, Neto TB, Pozzato M, Blatter MC, Ignatchenko A, Redaschi N, Bridge A. Rhea, the reaction knowledgebase in 2022. Nucleic Acids Res 2022; 50:D693-D700. [PMID: 34755880 PMCID: PMC8728268 DOI: 10.1093/nar/gkab1016] [Citation(s) in RCA: 84] [Impact Index Per Article: 28.0] [Received: 09/14/2021] [Revised: 10/08/2021] [Accepted: 11/09/2021] [Indexed: 12/15/2022] Open
Abstract
Rhea (https://www.rhea-db.org) is an expert-curated knowledgebase of biochemical reactions based on the chemical ontology ChEBI (Chemical Entities of Biological Interest) (https://www.ebi.ac.uk/chebi). In this paper, we describe a number of key developments in Rhea since our last report in the database issue of Nucleic Acids Research in 2019. These include improved reaction coverage in Rhea, the adoption of Rhea as the reference vocabulary for enzyme annotation in the UniProt knowledgebase UniProtKB (https://www.uniprot.org), the development of a new Rhea website, and the designation of Rhea as an ELIXIR Core Data Resource. We hope that these and other developments will enhance the utility of Rhea as a reference resource to study and engineer enzymes and the metabolic systems in which they function.
Affiliation(s)
- Parit Bansal
- Swiss-Prot group, SIB Swiss Institute of Bioinformatics, Centre Medical Universitaire, CH-1211 Geneva 4, Switzerland
- Anne Morgat
- Swiss-Prot group, SIB Swiss Institute of Bioinformatics, Centre Medical Universitaire, CH-1211 Geneva 4, Switzerland
- Kristian B Axelsen
- Swiss-Prot group, SIB Swiss Institute of Bioinformatics, Centre Medical Universitaire, CH-1211 Geneva 4, Switzerland
- Venkatesh Muthukrishnan
- Swiss-Prot group, SIB Swiss Institute of Bioinformatics, Centre Medical Universitaire, CH-1211 Geneva 4, Switzerland
- Elisabeth Coudert
- Swiss-Prot group, SIB Swiss Institute of Bioinformatics, Centre Medical Universitaire, CH-1211 Geneva 4, Switzerland
- Lucila Aimo
- Swiss-Prot group, SIB Swiss Institute of Bioinformatics, Centre Medical Universitaire, CH-1211 Geneva 4, Switzerland
- Nevila Hyka-Nouspikel
- Swiss-Prot group, SIB Swiss Institute of Bioinformatics, Centre Medical Universitaire, CH-1211 Geneva 4, Switzerland
- Elisabeth Gasteiger
- Swiss-Prot group, SIB Swiss Institute of Bioinformatics, Centre Medical Universitaire, CH-1211 Geneva 4, Switzerland
- Arnaud Kerhornou
- Swiss-Prot group, SIB Swiss Institute of Bioinformatics, Centre Medical Universitaire, CH-1211 Geneva 4, Switzerland
- Teresa Batista Neto
- Swiss-Prot group, SIB Swiss Institute of Bioinformatics, Centre Medical Universitaire, CH-1211 Geneva 4, Switzerland
- Monica Pozzato
- Swiss-Prot group, SIB Swiss Institute of Bioinformatics, Centre Medical Universitaire, CH-1211 Geneva 4, Switzerland
- Marie-Claude Blatter
- Swiss-Prot group, SIB Swiss Institute of Bioinformatics, Centre Medical Universitaire, CH-1211 Geneva 4, Switzerland
- Alex Ignatchenko
- EMBL-EBI European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
- Nicole Redaschi
- Swiss-Prot group, SIB Swiss Institute of Bioinformatics, Centre Medical Universitaire, CH-1211 Geneva 4, Switzerland
- Alan Bridge
- Swiss-Prot group, SIB Swiss Institute of Bioinformatics, Centre Medical Universitaire, CH-1211 Geneva 4, Switzerland
21
Meldal BHM, Perfetto L, Combe C, Lubiana T, Ferreira Cavalcante JV, Bye-A-Jee H, Waagmeester A, del-Toro N, Shrivastava A, Barrera E, Wong E, Mlecnik B, Bindea G, Panneerselvam K, Willighagen E, Rappsilber J, Porras P, Hermjakob H, Orchard S. Complex Portal 2022: new curation frontiers. Nucleic Acids Res 2022; 50:D578-D586. [PMID: 34718729 PMCID: PMC8689886 DOI: 10.1093/nar/gkab991] [Citation(s) in RCA: 35] [Impact Index Per Article: 11.7] [Received: 09/15/2021] [Revised: 10/07/2021] [Accepted: 10/10/2021] [Indexed: 01/02/2023] Open
Abstract
The Complex Portal (www.ebi.ac.uk/complexportal) is a manually curated, encyclopaedic database of macromolecular complexes with known function from a range of model organisms. It summarizes complex composition, topology and function along with links to a large range of domain-specific resources (i.e. wwPDB, EMDB and Reactome). Since the last update in 2019, we have produced a first draft complexome for Escherichia coli, maintained and updated that of Saccharomyces cerevisiae, added over 40 coronavirus complexes and increased the human complexome to over 1100 complexes that include approximately 200 complexes that act as targets for viral proteins or are part of the immune system. The display of protein features in ComplexViewer has been improved and the participant table is now colour-coordinated with the nodes in ComplexViewer. Community collaboration has expanded, for example by contributing to an analysis of putative transcription cofactors and providing data accessible to semantic web tools through Wikidata which is now populated with manually curated Complex Portal content through a new bot. Our data license is now CC0 to encourage data reuse. Users are encouraged to get in touch, provide us with feedback and send curation requests through the 'Support' link.
Affiliation(s)
- Birgit H M Meldal
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
| | - Livia Perfetto
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
- Fondazione Human Technopole, 20157 Milan, Italy
| | - Colin Combe
- Wellcome Centre for Cell Biology, University of Edinburgh, Edinburgh EH9 3BF, UK
| | - Tiago Lubiana
- Department of Clinical and Toxicological Analyses, School of Pharmaceutical Sciences, University of São Paulo, Av. Professor Lineu Prestes 580, CEP 05508-000 São Paulo SP, Brasil
| | - João Vitor Ferreira Cavalcante
- Bioinformatics Multidisciplinary Environment (BioME), Digital Metropolis Institute, Federal University of Rio Grande do Norte, Av. Odilon Gomes de Lima 1722, Capim Macio, 59078-400 Natal/RN, Brasil
| | - Hema Bye-A-Jee
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
| | | | - Noemi del-Toro
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
| | - Anjali Shrivastava
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
| | - Elisabeth Barrera
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
| | - Edith Wong
- Department of Genetics, School of Medicine, Stanford University, Palo Alto, CA, USA
| | - Bernhard Mlecnik
- Laboratory of Integrative Cancer Immunology, INSERM, 75006 Paris, France
- Equipe Labellisée Ligue Contre le Cancer, 75006 Paris, France
- Centre de Recherche des Cordeliers, Sorbonne Université, Université de Paris, 75006 Paris, France
- Inovarion, 75005 Paris, France
| | - Gabriela Bindea
- Laboratory of Integrative Cancer Immunology, INSERM, 75006 Paris, France
- Equipe Labellisée Ligue Contre le Cancer, 75006 Paris, France
- Centre de Recherche des Cordeliers, Sorbonne Université, Université de Paris, 75006 Paris, France
| | - Kalpana Panneerselvam
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
| | - Egon Willighagen
- Dept of Bioinformatics - BiGCaT, NUTRIM, Maastricht University, Universiteitssingel 50, 6229 ER Maastricht, The Netherlands
| | - Juri Rappsilber
- Wellcome Centre for Cell Biology, University of Edinburgh, Edinburgh EH9 3BF, UK
- Bioanalytics, Institute of Biotechnology, Technische Universität Berlin, 13355 Berlin, Germany
| | - Pablo Porras
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
| | - Henning Hermjakob
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
| | - Sandra Orchard
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
|
22
|
Danchin A. In vivo, in vitro and in silico: an open space for the development of microbe-based applications of synthetic biology. Microb Biotechnol 2022; 15:42-64. [PMID: 34570957 PMCID: PMC8719824 DOI: 10.1111/1751-7915.13937] [Citation(s) in RCA: 12] [Impact Index Per Article: 4.0] [Received: 09/11/2021] [Accepted: 09/14/2021] [Indexed: 12/24/2022] Open
Abstract
Living systems are studied using three complementary approaches: living cells, cell-free systems and computer-mediated modelling. Progress in understanding, which allows researchers to create novel chassis and industrial processes, rests on a cycle that combines in vivo, in vitro and in silico studies. This design-build-test-learn loop, iterating between experiments and analyses, combines physiology, genetics, biochemistry and bioinformatics in a way that keeps moving forward. Because computer-aided approaches are not directly constrained by the material nature of the entities of interest, we illustrate here how this virtuous cycle allows researchers to explore chemistry that is foreign to that present in extant life, from whole chassis to novel metabolic cycles. Particular emphasis is placed on the importance of evolution.
Affiliation(s)
- Antoine Danchin
- Kodikos Labs, Institut Cochin, 24 rue du Faubourg Saint-Jacques, 75014 Paris, France
|
23
|
Green biomanufacturing promoted by automatic retrobiosynthesis planning and computational enzyme design. Chin J Chem Eng 2022. [DOI: 10.1016/j.cjche.2021.08.017] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/18/2022]
|
24
|
Feierabend M, Renz A, Zelle E, Nöh K, Wiechert W, Dräger A. High-Quality Genome-Scale Reconstruction of Corynebacterium glutamicum ATCC 13032. Front Microbiol 2021; 12:750206. [PMID: 34867870 PMCID: PMC8634658 DOI: 10.3389/fmicb.2021.750206] [Citation(s) in RCA: 11] [Impact Index Per Article: 2.8] [Received: 07/30/2021] [Accepted: 10/19/2021] [Indexed: 11/30/2022] Open
Abstract
Corynebacterium glutamicum is a microbe of enormous biotechnological relevance. In particular, its strain ATCC 13032 is a widely used producer of L-amino acids at an industrial scale. Its apparent robustness also makes it a favorable platform host for a wide range of further compounds, mainly because of emerging bio-based economies. A deep understanding of the biochemical processes in C. glutamicum is essential for a sustainable enhancement of the microbe's productivity. Computational systems biology has the potential to provide a valuable basis for driving metabolic engineering and biotechnological advances, such as increased yields of healthy producer strains based on genome-scale metabolic models (GEMs). Advanced reconstruction pipelines are now available that facilitate the reconstruction of GEMs and support their manual curation. This article presents iCGB21FR, an updated and unified GEM of C. glutamicum ATCC 13032 with high quality regarding comprehensiveness and data standards, built with the latest modeling techniques and advanced reconstruction pipelines. It comprises 1042 metabolites, 1539 reactions, and 805 genes with detailed annotations and database cross-references. The model was validated using different media and yielded realistic growth rate predictions under aerobic and anaerobic conditions. The new GEM produces all canonical amino acids, and its phenotypic predictions are consistent with laboratory data. The in silico model proved fruitful in adding knowledge about the metabolism of C. glutamicum: iCGB21FR still produces L-glutamate when the enzyme pyruvate carboxylase is knocked out, despite the common belief that this enzyme is relevant for the amino acid's production. We conclude that integrating high standards into the reconstruction of GEMs facilitates replicating validated knowledge and closing knowledge gaps, and makes the models a useful basis for metabolic engineering.
The model is freely available from BioModels Database under identifier MODEL2102050001.
Affiliation(s)
- Martina Feierabend
- Computational Systems Biology of Infections and Antimicrobial-Resistant Pathogens, Institute for Bioinformatics and Medical Informatics (IBMI), University of Tübingen, Tübingen, Germany
- Department of Computer Science, University of Tübingen, Tübingen, Germany
| | - Alina Renz
- Computational Systems Biology of Infections and Antimicrobial-Resistant Pathogens, Institute for Bioinformatics and Medical Informatics (IBMI), University of Tübingen, Tübingen, Germany
- Department of Computer Science, University of Tübingen, Tübingen, Germany
| | - Elisabeth Zelle
- Institute of Bio- and Geosciences, IBG-1: Biotechnology, Forschungszentrum Jülich GmbH, Jülich, Germany
| | - Katharina Nöh
- Institute of Bio- and Geosciences, IBG-1: Biotechnology, Forschungszentrum Jülich GmbH, Jülich, Germany
| | - Wolfgang Wiechert
- Institute of Bio- and Geosciences, IBG-1: Biotechnology, Forschungszentrum Jülich GmbH, Jülich, Germany
- Computational Systems Biotechnology (AVT.CSB), RWTH Aachen University, Aachen, Germany
| | - Andreas Dräger
- Computational Systems Biology of Infections and Antimicrobial-Resistant Pathogens, Institute for Bioinformatics and Medical Informatics (IBMI), University of Tübingen, Tübingen, Germany
- Department of Computer Science, University of Tübingen, Tübingen, Germany
|
25
|
GPRuler: Metabolic gene-protein-reaction rules automatic reconstruction. PLoS Comput Biol 2021; 17:e1009550. [PMID: 34748537 PMCID: PMC8601613 DOI: 10.1371/journal.pcbi.1009550] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Received: 05/14/2021] [Revised: 11/18/2021] [Accepted: 10/11/2021] [Indexed: 11/19/2022] Open
Abstract
Metabolic network models are increasingly being used in health care and industry. As a consequence, many tools have been released to automate their de novo reconstruction. To enable gene deletion simulations and the integration of gene expression data, these networks must include gene-protein-reaction (GPR) rules, which describe, using Boolean logic, the relationships between the gene products (e.g., enzyme isoforms or subunits) associated with the catalysis of a given reaction. Nevertheless, the reconstruction of GPRs remains a largely manual and time-consuming process. Aiming at fully automating the reconstruction of GPRs for any organism, we propose the open-source Python-based framework GPRuler. By mining text and data from 9 different biological databases, GPRuler can reconstruct GPRs starting either from just the name of the target organism or from an existing metabolic model. The performance of the developed tool is evaluated at small scale for a manually curated metabolic model, and at genome scale for three metabolic models of Homo sapiens and Saccharomyces cerevisiae. With these models as benchmarks, the proposed tool showed its ability to reproduce the original GPR rules with a high level of accuracy. In all the tested scenarios, after a manual investigation of the mismatches between the rules proposed by GPRuler and the original ones, the proposed approach proved in many cases to be more accurate than the original models. By complementing existing tools for metabolic network reconstruction with the possibility to reconstruct GPRs quickly and with few resources, GPRuler paves the way to the study of context-specific metabolic networks, representing the active portion of the complete network in given conditions, for organisms of industrial or biomedical interest that have not yet been characterized metabolically.
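As an illustration of what a GPR rule encodes, the following is a minimal sketch of evaluating a Boolean GPR rule under a gene deletion. It is not GPRuler's implementation: the function name `reaction_is_active`, the gene identifiers, and the rule strings are hypothetical, and the sketch assumes a non-empty, well-formed rule made only of gene identifiers, `and`, `or`, and parentheses, where `and` joins subunits of a complex and `or` joins isoenzymes.

```python
import re


def reaction_is_active(rule: str, deleted_genes: set) -> bool:
    """Return True if the reaction can still be catalysed after the deletions.

    Hypothetical helper: a gene evaluates to True when it is NOT deleted.
    'and' binds tighter than 'or', as in ordinary Boolean logic.
    """
    # Split the rule into parentheses and words (gene ids, 'and', 'or').
    tokens = re.findall(r"\(|\)|[^\s()]+", rule)
    pos = 0

    def parse_or() -> bool:
        nonlocal pos
        value = parse_and()
        while pos < len(tokens) and tokens[pos].lower() == "or":
            pos += 1
            rhs = parse_and()  # always parse the right side, then combine
            value = value or rhs
        return value

    def parse_and() -> bool:
        nonlocal pos
        value = parse_atom()
        while pos < len(tokens) and tokens[pos].lower() == "and":
            pos += 1
            rhs = parse_atom()
            value = value and rhs
        return value

    def parse_atom() -> bool:
        nonlocal pos
        token = tokens[pos]
        pos += 1
        if token == "(":
            value = parse_or()
            pos += 1  # skip the closing ')'
            return value
        return token not in deleted_genes

    return parse_or()


# Hypothetical rule: a two-subunit complex (geneA, geneB) or an isoenzyme (geneC).
rule = "(geneA and geneB) or geneC"
print(reaction_is_active(rule, {"geneA"}))           # isoenzyme geneC still present
print(reaction_is_active(rule, {"geneA", "geneC"}))  # neither route remains
```

In this sketch, deleting one subunit disables the complex but leaves the reaction active through the isoenzyme; the reaction is lost only when every alternative in the `or` expression is broken.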
|
26
|
Heid E, Goldman S, Sankaranarayanan K, Coley CW, Flamm C, Green WH. EHreact: Extended Hasse Diagrams for the Extraction and Scoring of Enzymatic Reaction Templates. J Chem Inf Model 2021; 61:4949-4961. [PMID: 34587449 PMCID: PMC8549070 DOI: 10.1021/acs.jcim.1c00921] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.3] [Received: 07/30/2021] [Indexed: 11/29/2022]
Abstract
Data-driven computer-aided synthesis planning utilizing organic or biocatalyzed reactions from large databases has gained increasing interest in the last decade, sparking the development of numerous tools to extract, apply, and score general reaction templates. The generation of reaction rules for enzymatic reactions is especially challenging since substrate promiscuity varies between enzymes, causing the optimal levels of rule specificity and optimal number of included atoms to differ between enzymes. This complicates an automated extraction from databases and has promoted the creation of manually curated reaction rule sets. Here, we present EHreact, a purely data-driven open-source software tool, to extract and score reaction rules from sets of reactions known to be catalyzed by an enzyme at appropriate levels of specificity without expert knowledge. EHreact extracts and groups reaction rules into tree-like structures, Hasse diagrams, based on common substructures in the imaginary transition structures. Each diagram can be utilized to output a single or a set of reaction rules, as well as calculate the probability of a new substrate to be processed by the given enzyme by inferring information about the reactive site of the enzyme from the known reactions and their grouping in the template tree. EHreact heuristically predicts the activity of a given enzyme on a new substrate, outperforming current approaches in accuracy and functionality.
Affiliation(s)
- Esther Heid
- Department of Chemical Engineering, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, United States
| | - Samuel Goldman
- Computational and Systems Biology, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, United States
| | - Karthik Sankaranarayanan
- Department of Chemical Engineering, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, United States
| | - Connor W. Coley
- Department of Chemical Engineering, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, United States
| | - Christoph Flamm
- Department of Theoretical Chemistry, University of Vienna, 1090 Vienna, Austria
| | - William H. Green
- Department of Chemical Engineering, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, United States
|
27
|
Zaru R, Onwubiko J, Ribeiro AJM, Cochrane K, Tyzack JD, Muthukrishnan V, Pravda L, Thornton JM, O'Donovan C, Velanker S, Orchard S, Leach A, Martin MJ. The Enzyme Portal: an integrative tool for enzyme information and analysis. FEBS J 2021; 289:5875-5890. [PMID: 34437766 DOI: 10.1111/febs.16168] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 06/11/2021] [Revised: 08/10/2021] [Accepted: 08/25/2021] [Indexed: 12/19/2022]
Abstract
Enzymes play essential roles in all life processes and are used extensively in the biomedical and biotechnological fields. However, enzyme-related information is spread across multiple resources, making its retrieval time-consuming. In response to this challenge, the Enzyme Portal has been established to facilitate enzyme research by providing a freely available hub where researchers can easily find and explore enzyme-related information. It integrates relevant enzyme data for a wide range of species from various resources such as UniProtKB, PDBe and ChEMBL. Here, we describe what type of enzyme-related data the Enzyme Portal provides, how the information is organized and, by showcasing two potential use cases, how to access and retrieve it.
Affiliation(s)
- Rossana Zaru
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Hinxton, Cambridge, UK
| | - Joseph Onwubiko
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Hinxton, Cambridge, UK
| | - Antonio J M Ribeiro
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Hinxton, Cambridge, UK
| | - Keeva Cochrane
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Hinxton, Cambridge, UK
| | - Jonathan D Tyzack
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Hinxton, Cambridge, UK
| | - Venkatesh Muthukrishnan
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Hinxton, Cambridge, UK
| | - Lukas Pravda
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Hinxton, Cambridge, UK
| | - Janet M Thornton
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Hinxton, Cambridge, UK
| | - Claire O'Donovan
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Hinxton, Cambridge, UK
| | - Sameer Velanker
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Hinxton, Cambridge, UK
| | - Sandra Orchard
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Hinxton, Cambridge, UK
| | - Andrew Leach
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Hinxton, Cambridge, UK
| | - Maria J Martin
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Hinxton, Cambridge, UK
|
28
|
Baldazzi D, Savojardo C, Martelli PL, Casadio R. BENZ WS: the Bologna ENZyme Web Server for four-level EC number annotation. Nucleic Acids Res 2021; 49:W60-W66. [PMID: 33963861 PMCID: PMC8262719 DOI: 10.1093/nar/gkab328] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.5] [Received: 03/03/2021] [Revised: 04/01/2021] [Accepted: 04/20/2021] [Indexed: 11/12/2022] Open
Abstract
The Bologna ENZyme Web Server (BENZ WS) annotates four-level Enzyme Commission numbers (EC numbers) as defined by the International Union of Biochemistry and Molecular Biology (IUBMB). BENZ WS filters a target sequence with a combined system of Hidden Markov Models, which model protein sequences annotated with the same molecular function, and Pfams, which carry conserved protein domains. When successful, BENZ returns an associated four-level EC number for any enzyme target sequence. Our system can annotate both monofunctional and polyfunctional enzymes, and it can be a valuable resource for sequence functional annotation.
Affiliation(s)
- Davide Baldazzi
- Biocomputing Group, Department of Pharmacy and Biotechnologies, University of Bologna, Italy
| | - Castrense Savojardo
- Biocomputing Group, Department of Pharmacy and Biotechnologies, University of Bologna, Italy
| | - Pier Luigi Martelli
- Biocomputing Group, Department of Pharmacy and Biotechnologies, University of Bologna, Italy
| | - Rita Casadio
- Biocomputing Group, Department of Pharmacy and Biotechnologies, University of Bologna, Italy
- Institute of Biomembranes, Bioenergetics and Molecular Biotechnologies (IBIOM), Italian National Research Council (CNR), Bari, Italy
|
29
|
Andersen JL, Fagerberg R, Flamm C, Fontana W, Kolčák J, Laurent CVFP, Merkle D, Nøjgaard N. Graph transformation for enzymatic mechanisms. Bioinformatics 2021; 37:i392-i400. [PMID: 34252947 PMCID: PMC8686676 DOI: 10.1093/bioinformatics/btab296] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Accepted: 04/26/2021] [Indexed: 11/15/2022] Open
Abstract
MOTIVATION: The design of enzymes is as challenging as it is consequential for making chemical synthesis in medical and industrial applications more efficient, cost-effective and environmentally friendly. While several aspects of this complex problem are computationally assisted, the drafting of catalytic mechanisms, i.e. the specification of the chemical steps (and hence intermediate states) that the enzyme is meant to implement, is largely left to human expertise. The ability to capture specific chemistries of multistep catalysis in a fashion that enables its computational construction and design is therefore highly desirable and would equally impact the elucidation of existing enzymatic reactions whose mechanisms are unknown. RESULTS: We use the mathematical framework of graph transformation to express the distinction between rules and reactions in chemistry. We derive about 1000 rules for amino acid side chain chemistry from the M-CSA database, a curated repository of enzymatic mechanisms. Using graph transformation, we are able to propose hundreds of hypothetical catalytic mechanisms for a large number of unrelated reactions in the Rhea database. We analyze these mechanisms and find that they combine, in a chemically sound fashion, individual steps from a variety of known multistep mechanisms, showing that plausible novel mechanisms for catalysis can be constructed computationally. AVAILABILITY AND IMPLEMENTATION: The source code of the initial prototype of our approach is available at https://github.com/Nojgaard/mechsearch. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Jakob L Andersen
- Department of Mathematics and Computer Science, University of Southern Denmark, Odense, Denmark
| | - Rolf Fagerberg
- Department of Mathematics and Computer Science, University of Southern Denmark, Odense, Denmark
| | - Christoph Flamm
- Department of Theoretical Chemistry, University of Vienna, Vienna, Austria
| | - Walter Fontana
- Department of Systems Biology, Harvard Medical School, Boston, MA, USA
| | - Juri Kolčák
- Department of Mathematics and Computer Science, University of Southern Denmark, Odense, Denmark
| | | | - Daniel Merkle
- Department of Mathematics and Computer Science, University of Southern Denmark, Odense, Denmark
| | - Nikolai Nøjgaard
- Department of Mathematics and Computer Science, University of Southern Denmark, Odense, Denmark
|
30
|
Renz A, Dräger A. Curating and comparing 114 strain-specific genome-scale metabolic models of Staphylococcus aureus. NPJ Syst Biol Appl 2021; 7:30. [PMID: 34188046 PMCID: PMC8241996 DOI: 10.1038/s41540-021-00188-4] [Citation(s) in RCA: 10] [Impact Index Per Article: 2.5] [Received: 10/16/2020] [Accepted: 05/25/2021] [Indexed: 12/19/2022] Open
Abstract
Staphylococcus aureus is a high-priority pathogen causing severe infections with high morbidity and mortality worldwide. Many S. aureus strains are methicillin-resistant (MRSA) or even multi-drug resistant. It is one of the most successful and prominent modern pathogens. An effective fight against S. aureus infections requires novel targets for antimicrobial and antistaphylococcal therapies. Recent advances in whole-genome sequencing and high-throughput techniques facilitate the generation of genome-scale metabolic models (GEMs). Among the multiple applications of GEMs is drug-targeting in pathogens. Hence, comprehensive and predictive metabolic reconstructions of S. aureus could facilitate the identification of novel targets for antimicrobial therapies. This review aims to give an overview of all available GEMs of multiple S. aureus strains. We downloaded all 114 available GEMs of S. aureus for further analysis. The scope of each model was evaluated, including the number of reactions, metabolites, and genes. Furthermore, all models were quality-controlled using MEMOTE, an open-source application with standardized metabolic tests. Growth capabilities and model similarities were examined. This review should serve as a guide for choosing the appropriate GEM for a given research question. With the information about the availability, the format, and the strengths and potentials of each model, one can either choose an existing model or combine several models to create models with even higher predictive values. This facilitates model-driven discoveries of novel antimicrobial targets to fight multi-drug resistant S. aureus strains.
Affiliation(s)
- Alina Renz
- Computational Systems Biology of Infections and Antimicrobial-Resistant Pathogens, Institute for Bioinformatics and Medical Informatics (IBMI), University of Tübingen, Tübingen, Germany
- Department of Computer Science, University of Tübingen, Tübingen, Germany
- Cluster of Excellence 'Controlling Microbes to Fight Infections', University of Tübingen, Tübingen, Germany
| | - Andreas Dräger
- Computational Systems Biology of Infections and Antimicrobial-Resistant Pathogens, Institute for Bioinformatics and Medical Informatics (IBMI), University of Tübingen, Tübingen, Germany
- Department of Computer Science, University of Tübingen, Tübingen, Germany
- Cluster of Excellence 'Controlling Microbes to Fight Infections', University of Tübingen, Tübingen, Germany
- German Center for Infection Research (DZIF), Partner Site Tübingen, Tübingen, Germany
|
31
|
Shah HA, Liu J, Yang Z, Feng J. Review of Machine Learning Methods for the Prediction and Reconstruction of Metabolic Pathways. Front Mol Biosci 2021; 8:634141. [PMID: 34222327 PMCID: PMC8247443 DOI: 10.3389/fmolb.2021.634141] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.0] [Received: 11/27/2020] [Accepted: 06/01/2021] [Indexed: 11/13/2022] Open
Abstract
Prediction and reconstruction of metabolic pathways play significant roles in many fields, such as genetic engineering, metabolic engineering, and drug discovery, and are becoming some of the most active research topics in synthetic biology. With the increase in related data and the development of machine learning techniques, many machine-learning-based methods have been proposed for the prediction or reconstruction of metabolic pathways. Machine learning techniques show state-of-the-art performance in handling the rapidly increasing volume of data in synthetic biology. To support researchers in this field, we briefly review the research progress of metabolic pathway reconstruction and prediction based on machine learning. Some challenging issues in the reconstruction of metabolic pathways are also discussed in this paper.
Affiliation(s)
- Hayat Ali Shah
- Institute of Artificial Intelligence, School of Computer Science, Wuhan University, Wuhan, China
| | - Juan Liu
- Institute of Artificial Intelligence, School of Computer Science, Wuhan University, Wuhan, China
| | - Zhihui Yang
- Institute of Artificial Intelligence, School of Computer Science, Wuhan University, Wuhan, China
| | - Jing Feng
- Institute of Artificial Intelligence, School of Computer Science, Wuhan University, Wuhan, China
|
32
|
Galgonek J, Vondrášek J. IDSM ChemWebRDF: SPARQLing small-molecule datasets. J Cheminform 2021; 13:38. [PMID: 33980298 PMCID: PMC8117646 DOI: 10.1186/s13321-021-00515-1] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Received: 01/06/2021] [Accepted: 04/23/2021] [Indexed: 11/12/2022] Open
Abstract
The Resource Description Framework (RDF), together with well-defined ontologies, significantly increases data interoperability and usability. The SPARQL query language was introduced to retrieve requested RDF data and to explore links between them. Among other useful features, SPARQL supports federated queries that combine multiple independent data source endpoints. This allows users to obtain insights that are not possible using only a single data source. Owing to all of these useful features, many biological and chemical databases present their data in RDF, and support SPARQL querying. In our project, we primarily focused on the PubChem, ChEMBL and ChEBI small-molecule datasets. These datasets are already being exported to RDF by their creators. However, none of them has an official and currently supported SPARQL endpoint. This omission makes it difficult to construct complex or federated queries that could access all of the datasets, thus underutilising the main advantage of the availability of RDF data. Our goal is to address this gap by integrating the datasets into one database called the Integrated Database of Small Molecules (IDSM) that will be accessible through a SPARQL endpoint. Beyond that, we will also focus on increasing mutual interoperability of the datasets. To realise the endpoint, we decided to implement an in-house developed SPARQL engine based on the PostgreSQL relational database for data storage. In our approach, data are stored in the traditional relational form, and the SPARQL engine translates incoming SPARQL queries into equivalent SQL queries. An important feature of the engine is that it optimises the resulting SQL queries. Together with optimisations performed by PostgreSQL, this allows efficient evaluations of SPARQL queries. The endpoint provides not only querying in the dataset, but also the compound substructure and similarity search supported by our Sachem project.
Although the endpoint is accessible from an internet browser, it is mainly intended to be used for programmatic access by other services, for example as a part of federated queries. For regular users, we offer a rich web application called ChemWebRDF using the endpoint. The application is publicly available at https://idsm.elixir-czech.cz/chemweb/.
Affiliation(s)
- Jakub Galgonek
- Institute of Organic Chemistry and Biochemistry of the CAS, Flemingovo náměstí 2, 166 10, Prague 6, Czech Republic.
| | - Jiří Vondrášek
- Institute of Organic Chemistry and Biochemistry of the CAS, Flemingovo náměstí 2, 166 10, Prague 6, Czech Republic
|
33
|
McDonald AG, Davey GP. Simulating the enzymes of ganglioside biosynthesis with Glycologue. Beilstein J Org Chem 2021; 17:739-748. [PMID: 33828618 PMCID: PMC8008095 DOI: 10.3762/bjoc.17.64] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.0] [Received: 12/21/2020] [Accepted: 03/12/2021] [Indexed: 02/03/2023] Open
Abstract
Gangliosides are an important class of sialylated glycosphingolipids linked to ceramide that are a component of the mammalian cell surface, especially those of the central nervous system, where they function in intercellular recognition and communication. We describe an in silico method for determining the metabolic pathways leading to the most common gangliosides, based on the known enzymes of their biosynthesis. A network of 41 glycolipids is produced by the actions of the 10 enzymes included in the model. The different ganglioside nomenclature systems in common use are compared and a systematic variant of the widely used Svennerholm nomenclature is described. Knockouts of specific enzyme activities are used to simulate congenital defects in ganglioside biosynthesis, and altered ganglioside status in cancer, and the effects on network structure are predicted. The simulator is available at the Glycologue website, https://glycologue.org/.
Affiliation(s)
- Andrew G McDonald
- School of Biochemistry and Immunology, Trinity College Dublin, Dublin 2, Ireland
| | - Gavin P Davey
- School of Biochemistry and Immunology, Trinity College Dublin, Dublin 2, Ireland
|
34
|
Bolleman J, de Castro E, Baratin D, Gehant S, Cuche BA, Auchincloss AH, Coudert E, Hulo C, Masson P, Pedruzzi I, Rivoire C, Xenarios I, Redaschi N, Bridge A. HAMAP as SPARQL rules-A portable annotation pipeline for genomes and proteomes. Gigascience 2021; 9:5731417. [PMID: 32034905 PMCID: PMC7007698 DOI: 10.1093/gigascience/giaa003] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.0] [Received: 06/28/2019] [Revised: 11/30/2019] [Accepted: 01/13/2020] [Indexed: 12/24/2022] Open
Abstract
Background: Genome and proteome annotation pipelines are generally custom built and not easily reusable by other groups. This leads to duplication of effort, increased costs, and suboptimal annotation quality. One way to address these issues is to encourage the adoption of annotation standards and technological solutions that enable the sharing of biological knowledge and tools for genome and proteome annotation. Results: Here we demonstrate one approach to generate portable genome and proteome annotation pipelines that users can run without recourse to custom software. This proof of concept uses our own rule-based annotation pipeline HAMAP, which provides functional annotation for protein sequences to the same depth and quality as UniProtKB/Swiss-Prot, and the World Wide Web Consortium (W3C) standards Resource Description Framework (RDF) and SPARQL (a recursive acronym for the SPARQL Protocol and RDF Query Language). We translate complex HAMAP rules into the W3C standard SPARQL 1.1 syntax, and then apply them to protein sequences in RDF format using freely available SPARQL engines. This approach supports the generation of annotation that is identical to that generated by our own in-house pipeline, using standard, off-the-shelf solutions, and is applicable to any genome or proteome annotation pipeline. Conclusions: HAMAP SPARQL rules are freely available for download from the HAMAP FTP site, ftp://ftp.expasy.org/databases/hamap/sparql/, under the CC-BY-ND 4.0 license. The annotations generated by the rules are under the CC-BY 4.0 license. A tutorial and supplementary code to use HAMAP as SPARQL are available on GitHub at https://github.com/sib-swiss/HAMAP-SPARQL, and general documentation about HAMAP can be found on the HAMAP website at https://hamap.expasy.org.
Affiliation(s)
- Jerven Bolleman
- Swiss-Prot Group, SIB Swiss Institute of Bioinformatics, Centre Médical Universitaire, 1 rue Michel-Servet, CH-1211 Geneva 4, Switzerland
- Edouard de Castro
- Swiss-Prot Group, SIB Swiss Institute of Bioinformatics, Centre Médical Universitaire, 1 rue Michel-Servet, CH-1211 Geneva 4, Switzerland
- Delphine Baratin
- Swiss-Prot Group, SIB Swiss Institute of Bioinformatics, Centre Médical Universitaire, 1 rue Michel-Servet, CH-1211 Geneva 4, Switzerland
- Sebastien Gehant
- Swiss-Prot Group, SIB Swiss Institute of Bioinformatics, Centre Médical Universitaire, 1 rue Michel-Servet, CH-1211 Geneva 4, Switzerland
- Beatrice A Cuche
- Swiss-Prot Group, SIB Swiss Institute of Bioinformatics, Centre Médical Universitaire, 1 rue Michel-Servet, CH-1211 Geneva 4, Switzerland
- Andrea H Auchincloss
- Swiss-Prot Group, SIB Swiss Institute of Bioinformatics, Centre Médical Universitaire, 1 rue Michel-Servet, CH-1211 Geneva 4, Switzerland
- Elisabeth Coudert
- Swiss-Prot Group, SIB Swiss Institute of Bioinformatics, Centre Médical Universitaire, 1 rue Michel-Servet, CH-1211 Geneva 4, Switzerland
- Chantal Hulo
- Swiss-Prot Group, SIB Swiss Institute of Bioinformatics, Centre Médical Universitaire, 1 rue Michel-Servet, CH-1211 Geneva 4, Switzerland
- Patrick Masson
- Swiss-Prot Group, SIB Swiss Institute of Bioinformatics, Centre Médical Universitaire, 1 rue Michel-Servet, CH-1211 Geneva 4, Switzerland
- Ivo Pedruzzi
- Swiss-Prot Group, SIB Swiss Institute of Bioinformatics, Centre Médical Universitaire, 1 rue Michel-Servet, CH-1211 Geneva 4, Switzerland
- Catherine Rivoire
- Swiss-Prot Group, SIB Swiss Institute of Bioinformatics, Centre Médical Universitaire, 1 rue Michel-Servet, CH-1211 Geneva 4, Switzerland
- Ioannis Xenarios
- Swiss-Prot Group, SIB Swiss Institute of Bioinformatics, Centre Médical Universitaire, 1 rue Michel-Servet, CH-1211 Geneva 4, Switzerland; Centre Hospitalier Universitaire Vaudois/Ludwig Institute for Cancer Research, Agora Centre, CH-1005 Lausanne, Switzerland
- Nicole Redaschi
- Swiss-Prot Group, SIB Swiss Institute of Bioinformatics, Centre Médical Universitaire, 1 rue Michel-Servet, CH-1211 Geneva 4, Switzerland
- Alan Bridge
- Swiss-Prot Group, SIB Swiss Institute of Bioinformatics, Centre Médical Universitaire, 1 rue Michel-Servet, CH-1211 Geneva 4, Switzerland
35
Feuermann M, Boutet E, Morgat A, Axelsen KB, Bansal P, Bolleman J, de Castro E, Coudert E, Gasteiger E, Géhant S, Lieberherr D, Lombardot T, Neto TB, Pedruzzi I, Poux S, Pozzato M, Redaschi N, Bridge A. Diverse Taxonomies for Diverse Chemistries: Enhanced Representation of Natural Product Metabolism in UniProtKB. Metabolites 2021; 11:48. [PMID: 33445429 PMCID: PMC7827101 DOI: 10.3390/metabo11010048] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Received: 12/19/2020] [Revised: 01/05/2021] [Accepted: 01/07/2021] [Indexed: 01/28/2023] Open
Abstract
The UniProt Knowledgebase UniProtKB is a comprehensive, high-quality, and freely accessible resource of protein sequences and functional annotation that covers genomes and proteomes from tens of thousands of taxa, including a broad range of plants and microorganisms producing natural products of medical, nutritional, and agronomical interest. Here we describe work that enhances the utility of UniProtKB as a support for both the study of natural products and for their discovery. The foundation of this work is an improved representation of natural product metabolism in UniProtKB using Rhea, an expert-curated knowledgebase of biochemical reactions, that is built on the ChEBI (Chemical Entities of Biological Interest) ontology of small molecules. Knowledge of natural products and precursors is captured in ChEBI, enzyme-catalyzed reactions in Rhea, and enzymes in UniProtKB/Swiss-Prot, thereby linking chemical structure data directly to protein knowledge. We provide a practical demonstration of how users can search UniProtKB for protein knowledge relevant to natural products through interactive or programmatic queries using metabolite names and synonyms, chemical identifiers, chemical classes, and chemical structures and show how to federate UniProtKB with other data and knowledge resources and tools using semantic web technologies such as RDF and SPARQL. All UniProtKB data are freely available for download in a broad range of formats for users to further mine or exploit as an annotation source, to enrich other natural product datasets and databases.
Affiliation(s)
- Marc Feuermann
- Swiss-Prot Group, SIB Swiss Institute of Bioinformatics, CMU, 1 Michel-Servet, CH-1211 Geneva 4, Switzerland
- Emmanuel Boutet
- Swiss-Prot Group, SIB Swiss Institute of Bioinformatics, CMU, 1 Michel-Servet, CH-1211 Geneva 4, Switzerland
- Anne Morgat
- Swiss-Prot Group, SIB Swiss Institute of Bioinformatics, CMU, 1 Michel-Servet, CH-1211 Geneva 4, Switzerland
- Kristian B. Axelsen
- Swiss-Prot Group, SIB Swiss Institute of Bioinformatics, CMU, 1 Michel-Servet, CH-1211 Geneva 4, Switzerland
- Parit Bansal
- Swiss-Prot Group, SIB Swiss Institute of Bioinformatics, CMU, 1 Michel-Servet, CH-1211 Geneva 4, Switzerland
- Jerven Bolleman
- Swiss-Prot Group, SIB Swiss Institute of Bioinformatics, CMU, 1 Michel-Servet, CH-1211 Geneva 4, Switzerland
- Edouard de Castro
- Swiss-Prot Group, SIB Swiss Institute of Bioinformatics, CMU, 1 Michel-Servet, CH-1211 Geneva 4, Switzerland
- Elisabeth Coudert
- Swiss-Prot Group, SIB Swiss Institute of Bioinformatics, CMU, 1 Michel-Servet, CH-1211 Geneva 4, Switzerland
- Elisabeth Gasteiger
- Swiss-Prot Group, SIB Swiss Institute of Bioinformatics, CMU, 1 Michel-Servet, CH-1211 Geneva 4, Switzerland
- Sébastien Géhant
- Swiss-Prot Group, SIB Swiss Institute of Bioinformatics, CMU, 1 Michel-Servet, CH-1211 Geneva 4, Switzerland
- Damien Lieberherr
- Swiss-Prot Group, SIB Swiss Institute of Bioinformatics, CMU, 1 Michel-Servet, CH-1211 Geneva 4, Switzerland
- Thierry Lombardot
- Swiss-Prot Group, SIB Swiss Institute of Bioinformatics, CMU, 1 Michel-Servet, CH-1211 Geneva 4, Switzerland
- Teresa B. Neto
- Swiss-Prot Group, SIB Swiss Institute of Bioinformatics, CMU, 1 Michel-Servet, CH-1211 Geneva 4, Switzerland
- Ivo Pedruzzi
- Swiss-Prot Group, SIB Swiss Institute of Bioinformatics, CMU, 1 Michel-Servet, CH-1211 Geneva 4, Switzerland
- Sylvain Poux
- Swiss-Prot Group, SIB Swiss Institute of Bioinformatics, CMU, 1 Michel-Servet, CH-1211 Geneva 4, Switzerland
- Monica Pozzato
- Swiss-Prot Group, SIB Swiss Institute of Bioinformatics, CMU, 1 Michel-Servet, CH-1211 Geneva 4, Switzerland
- Nicole Redaschi
- Swiss-Prot Group, SIB Swiss Institute of Bioinformatics, CMU, 1 Michel-Servet, CH-1211 Geneva 4, Switzerland
- Alan Bridge
- Swiss-Prot Group, SIB Swiss Institute of Bioinformatics, CMU, 1 Michel-Servet, CH-1211 Geneva 4, Switzerland
- on behalf of the UniProt Consortium
- Swiss-Prot Group, SIB Swiss Institute of Bioinformatics, CMU, 1 Michel-Servet, CH-1211 Geneva 4, Switzerland
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SD, UK
- Protein Information Resource, University of Delaware, 15 Innovation Way, Suite 205, Newark, DE 19711, USA
- Protein Information Resource, Georgetown University Medical Center, 3300 Whitehaven Street NorthWest, Suite 1200, Washington, DC 20007, USA
36
Carbon S, Douglass E, Good BM, Unni DR, Harris NL, Mungall CJ, Basu S, Chisholm RL, Dodson RJ, Hartline E, Fey P, Thomas PD, Albou LP, Ebert D, Kesling MJ, Mi H, Muruganujan A, Huang X, Mushayahama T, LaBonte SA, Siegele DA, Antonazzo G, Attrill H, Brown NH, Garapati P, Marygold SJ, Trovisco V, dos Santos G, Falls K, Tabone C, Zhou P, Goodman JL, Strelets VB, Thurmond J, Garmiri P, Ishtiaq R, Rodríguez-López M, Acencio ML, Kuiper M, Lægreid A, Logie C, Lovering RC, Kramarz B, Saverimuttu SCC, Pinheiro SM, Gunn H, Su R, Thurlow KE, Chibucos M, Giglio M, Nadendla S, Munro J, Jackson R, Duesbury MJ, Del-Toro N, Meldal BHM, Paneerselvam K, Perfetto L, Porras P, Orchard S, Shrivastava A, Chang HY, Finn RD, Mitchell AL, Rawlings ND, Richardson L, Sangrador-Vegas A, Blake JA, Christie KR, Dolan ME, Drabkin HJ, Hill DP, Ni L, Sitnikov DM, Harris MA, Oliver SG, Rutherford K, Wood V, Hayles J, Bähler J, Bolton ER, De Pons JL, Dwinell MR, Hayman GT, Kaldunski ML, Kwitek AE, Laulederkind SJF, Plasterer C, Tutaj MA, Vedi M, Wang SJ, D’Eustachio P, Matthews L, Balhoff JP, Aleksander SA, Alexander MJ, Cherry JM, Engel SR, Gondwe F, Karra K, Miyasato SR, Nash RS, Simison M, Skrzypek MS, Weng S, Wong ED, Feuermann M, Gaudet P, Morgat A, Bakker E, Berardini TZ, Reiser L, Subramaniam S, Huala E, Arighi CN, Auchincloss A, Axelsen K, Argoud-Puy G, Bateman A, Blatter MC, Boutet E, Bowler E, Breuza L, Bridge A, Britto R, Bye-A-Jee H, Casas CC, Coudert E, Denny P, Estreicher A, Famiglietti ML, Georghiou G, Gos A, Gruaz-Gumowski N, Hatton-Ellis E, Hulo C, Ignatchenko A, Jungo F, Laiho K, Le Mercier P, Lieberherr D, Lock A, Lussi Y, MacDougall A, Magrane M, Martin MJ, Masson P, Natale DA, Hyka-Nouspikel N, Orchard S, Pedruzzi I, Pourcel L, Poux S, Pundir S, Rivoire C, Speretta E, Sundaram S, Tyagi N, Warner K, Zaru R, Wu CH, Diehl AD, Chan JN, Grove C, Lee RYN, Muller HM, Raciti D, Van Auken K, Sternberg PW, Berriman M, Paulini M, Howe K, Gao S, Wright A, Stein L, Howe DG, Toro S, 
Westerfield M, Jaiswal P, Cooper L, Elser J. The Gene Ontology resource: enriching a GOld mine. Nucleic Acids Res 2021; 49:D325-D334. [PMID: 33290552 PMCID: PMC7779012 DOI: 10.1093/nar/gkaa1113] [Citation(s) in RCA: 2095] [Impact Index Per Article: 523.8] [Received: 09/15/2020] [Revised: 10/22/2020] [Accepted: 12/02/2020] [Indexed: 12/28/2022] Open
Abstract
The Gene Ontology Consortium (GOC) provides the most comprehensive resource currently available for computable knowledge regarding the functions of genes and gene products. Here, we report the advances of the consortium over the past two years. The new GO-CAM annotation framework was notably improved, and we formalized the model with a computational schema to check and validate the rapidly increasing repository of 2838 GO-CAMs. In addition, we describe the impacts of several collaborations to refine GO and report a 10% increase in the number of GO annotations, a 25% increase in annotated gene products, and over 9,400 new scientific articles annotated. As the project matures, we continue our efforts to review older annotations in light of newer findings and to maintain consistency with other ontologies. As a result, 20 000 annotations derived from experimental data were reviewed, corresponding to 2.5% of experimental GO annotations. The website (http://geneontology.org) was redesigned for quick access to documentation, downloads and tools. To maintain an accurate resource and support traceability and reproducibility, we have made available a historical archive covering the past 15 years of GO data with a consistent format and file structure for both the ontology and annotations.
37
Morgat A, Lombardot T, Coudert E, Axelsen K, Neto TB, Gehant S, Bansal P, Bolleman J, Gasteiger E, de Castro E, Baratin D, Pozzato M, Xenarios I, Poux S, Redaschi N, Bridge A. Enzyme annotation in UniProtKB using Rhea. Bioinformatics 2020; 36:1896-1901. [PMID: 31688925 PMCID: PMC7162351 DOI: 10.1093/bioinformatics/btz817] [Citation(s) in RCA: 42] [Impact Index Per Article: 8.4] [Received: 07/16/2019] [Revised: 10/24/2019] [Accepted: 10/31/2019] [Indexed: 12/18/2022] Open
Abstract
Motivation: To provide high quality computationally tractable enzyme annotation in UniProtKB using Rhea, a comprehensive expert-curated knowledgebase of biochemical reactions which describes reaction participants using the ChEBI (Chemical Entities of Biological Interest) ontology. Results: We replaced existing textual descriptions of biochemical reactions in UniProtKB with their equivalents from Rhea, which is now the standard for annotation of enzymatic reactions in UniProtKB. We developed improved search and query facilities for the UniProt website, REST API and SPARQL endpoint that leverage the chemical structure data, nomenclature and classification that Rhea and ChEBI provide. Availability and implementation: UniProtKB at https://www.uniprot.org; UniProt REST API at https://www.uniprot.org/help/api; UniProt SPARQL endpoint at https://sparql.uniprot.org/; Rhea at https://www.rhea-db.org.
Affiliation(s)
- Anne Morgat
- Swiss-Prot Group, SIB Swiss Institute of Bioinformatics, Geneva 1211-4, Switzerland
- Thierry Lombardot
- Swiss-Prot Group, SIB Swiss Institute of Bioinformatics, Geneva 1211-4, Switzerland
- Elisabeth Coudert
- Swiss-Prot Group, SIB Swiss Institute of Bioinformatics, Geneva 1211-4, Switzerland
- Kristian Axelsen
- Swiss-Prot Group, SIB Swiss Institute of Bioinformatics, Geneva 1211-4, Switzerland
- Teresa Batista Neto
- Swiss-Prot Group, SIB Swiss Institute of Bioinformatics, Geneva 1211-4, Switzerland
- Sebastien Gehant
- Swiss-Prot Group, SIB Swiss Institute of Bioinformatics, Geneva 1211-4, Switzerland
- Parit Bansal
- Swiss-Prot Group, SIB Swiss Institute of Bioinformatics, Geneva 1211-4, Switzerland
- Jerven Bolleman
- Swiss-Prot Group, SIB Swiss Institute of Bioinformatics, Geneva 1211-4, Switzerland
- Elisabeth Gasteiger
- Swiss-Prot Group, SIB Swiss Institute of Bioinformatics, Geneva 1211-4, Switzerland
- Edouard de Castro
- Swiss-Prot Group, SIB Swiss Institute of Bioinformatics, Geneva 1211-4, Switzerland
- Delphine Baratin
- Swiss-Prot Group, SIB Swiss Institute of Bioinformatics, Geneva 1211-4, Switzerland
- Monica Pozzato
- Swiss-Prot Group, SIB Swiss Institute of Bioinformatics, Geneva 1211-4, Switzerland
- Sylvain Poux
- Swiss-Prot Group, SIB Swiss Institute of Bioinformatics, Geneva 1211-4, Switzerland
- Nicole Redaschi
- Swiss-Prot Group, SIB Swiss Institute of Bioinformatics, Geneva 1211-4, Switzerland
- Alan Bridge
- Swiss-Prot Group, SIB Swiss Institute of Bioinformatics, Geneva 1211-4, Switzerland
38
Chen F, Yuan L, Ding S, Tian Y, Hu QN. Data-driven rational biosynthesis design: from molecules to cell factories. Brief Bioinform 2020; 21:1238-1248. [PMID: 31243440 DOI: 10.1093/bib/bbz065] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.0] [Received: 01/25/2019] [Revised: 04/28/2019] [Accepted: 05/08/2019] [Indexed: 11/12/2022] Open
Abstract
A proliferation of chemical, reaction and enzyme databases, new computational methods and software tools for data-driven rational biosynthesis design have emerged in recent years. With the coming of the era of big data, particularly in the bio-medical field, data-driven rational biosynthesis design could potentially be useful to construct target-oriented chassis organisms. Engineering the complicated metabolic systems of chassis organisms to biosynthesize target molecules from inexpensive biomass is the main goal of cell factory design. The process of data-driven cell factory design could be divided into several parts: (1) target molecule selection; (2) metabolic reaction and pathway design; (3) prediction of novel enzymes based on protein domain and structure transformation of biosynthetic reactions; (4) construction of large-scale DNA for metabolic pathways; and (5) DNA assembly methods and visualization tools. The construction of a one-stop cell factory system could achieve automated design from the molecule level to the chassis level. In this article, we outline data-driven rational biosynthesis design steps and provide an overview of related tools in individual steps.
Affiliation(s)
- Fu Chen
- College of Biotechnology, Tianjin University of Science and Technology, Tianjin, People's Republic of China; Tianjin Institute of Industrial Biotechnology, Chinese Academy of Sciences, Tianjin, People's Republic of China; CAS Key Laboratory of Computational Biology, CAS-MPG Partner Institute for Computational Biology, Shanghai Institute of Nutrition and Health, Shanghai Institutes for Biological Sciences, University of Chinese Academy of Sciences, Chinese Academy of Sciences, Shanghai, People's Republic of China
- Le Yuan
- Tianjin Institute of Industrial Biotechnology, Chinese Academy of Sciences, Tianjin, People's Republic of China; University of Chinese Academy of Sciences, Beijing, People's Republic of China
- Shaozhen Ding
- CAS Key Laboratory of Computational Biology, CAS-MPG Partner Institute for Computational Biology, Shanghai Institute of Nutrition and Health, Shanghai Institutes for Biological Sciences, University of Chinese Academy of Sciences, Chinese Academy of Sciences, Shanghai, People's Republic of China
- Yu Tian
- Tianjin Institute of Industrial Biotechnology, Chinese Academy of Sciences, Tianjin, People's Republic of China; University of Chinese Academy of Sciences, Beijing, People's Republic of China
- Qian-Nan Hu
- CAS Key Laboratory of Computational Biology, CAS-MPG Partner Institute for Computational Biology, Shanghai Institute of Nutrition and Health, Shanghai Institutes for Biological Sciences, University of Chinese Academy of Sciences, Chinese Academy of Sciences, Shanghai, People's Republic of China
39
Jassal B, Matthews L, Viteri G, Gong C, Lorente P, Fabregat A, Sidiropoulos K, Cook J, Gillespie M, Haw R, Loney F, May B, Milacic M, Rothfels K, Sevilla C, Shamovsky V, Shorser S, Varusai T, Weiser J, Wu G, Stein L, Hermjakob H, D'Eustachio P. The reactome pathway knowledgebase. Nucleic Acids Res 2020; 48:D498-D503. [PMID: 31691815 PMCID: PMC7145712 DOI: 10.1093/nar/gkz1031] [Citation(s) in RCA: 1200] [Impact Index Per Article: 240.0] [Received: 09/27/2019] [Revised: 10/18/2019] [Accepted: 10/21/2019] [Indexed: 12/20/2022] Open
Abstract
The Reactome Knowledgebase (https://reactome.org) provides molecular details of signal transduction, transport, DNA replication, metabolism and other cellular processes as an ordered network of molecular transformations in a single consistent data model, an extended version of a classic metabolic map. Reactome functions both as an archive of biological processes and as a tool for discovering functional relationships in data such as gene expression profiles or somatic mutation catalogs from tumor cells. To extend our ability to annotate human disease processes, we have implemented a new drug class and have used it initially to annotate drugs relevant to cardiovascular disease. Our annotation model depends on external domain experts to identify new areas for annotation and to review new content. New web pages facilitate recruitment of community experts and allow those who have contributed to Reactome to identify their contributions and link them to their ORCID records. To improve visualization of our content, we have implemented a new tool to automatically lay out the components of individual reactions with multiple options for downloading the reaction diagrams and associated data, and a new display of our event hierarchy that will facilitate visual interpretation of pathway analysis results.
Affiliation(s)
- Bijay Jassal
- Ontario Institute for Cancer Research, Toronto, ON M5G0A3, Canada
- Guilherme Viteri
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, Cambridgeshire, CB10 1SD, UK
- Chuqiao Gong
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, Cambridgeshire, CB10 1SD, UK
- Pascual Lorente
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, Cambridgeshire, CB10 1SD, UK
- Antonio Fabregat
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, Cambridgeshire, CB10 1SD, UK; Open Targets, Wellcome Genome Campus, Hinxton, Cambridgeshire, CB10 1SD, UK
- Konstantinos Sidiropoulos
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, Cambridgeshire, CB10 1SD, UK
- Justin Cook
- Ontario Institute for Cancer Research, Toronto, ON M5G0A3, Canada
- Marc Gillespie
- Ontario Institute for Cancer Research, Toronto, ON M5G0A3, Canada; College of Pharmacy and Health Sciences, St. John's University, Queens, NY 11439, USA
- Robin Haw
- Ontario Institute for Cancer Research, Toronto, ON M5G0A3, Canada
- Fred Loney
- Oregon Health and Science University, Portland, OR 97239, USA
- Bruce May
- Ontario Institute for Cancer Research, Toronto, ON M5G0A3, Canada
- Marija Milacic
- Ontario Institute for Cancer Research, Toronto, ON M5G0A3, Canada
- Karen Rothfels
- Ontario Institute for Cancer Research, Toronto, ON M5G0A3, Canada
- Cristoffer Sevilla
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, Cambridgeshire, CB10 1SD, UK
- Solomon Shorser
- Ontario Institute for Cancer Research, Toronto, ON M5G0A3, Canada
- Thawfeek Varusai
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, Cambridgeshire, CB10 1SD, UK
- Joel Weiser
- Ontario Institute for Cancer Research, Toronto, ON M5G0A3, Canada
- Guanming Wu
- Oregon Health and Science University, Portland, OR 97239, USA
- Lincoln Stein
- Ontario Institute for Cancer Research, Toronto, ON M5G0A3, Canada; Department of Molecular Genetics, University of Toronto, Toronto, ON, M5S 1A1, Canada
- Henning Hermjakob
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, Cambridgeshire, CB10 1SD, UK; National Center for Protein Sciences, Beijing, China
40
Alexandrov T. Spatial Metabolomics and Imaging Mass Spectrometry in the Age of Artificial Intelligence. Annu Rev Biomed Data Sci 2020; 3:61-87. [PMID: 34056560 DOI: 10.1146/annurev-biodatasci-011420-031537] [Citation(s) in RCA: 128] [Impact Index Per Article: 25.6] [Indexed: 12/18/2022]
Abstract
Spatial metabolomics is an emerging field of omics research that has enabled localizing metabolites, lipids, and drugs in tissue sections, a feat considered impossible just two decades ago. Spatial metabolomics and its enabling technology, imaging mass spectrometry, generate big hyper-spectral imaging data that have motivated the development of tailored computational methods at the intersection of computational metabolomics and image analysis. Experimental and computational developments have recently opened doors to applications of spatial metabolomics in life sciences and biomedicine. At the same time, these advances have coincided with a rapid evolution in machine learning, deep learning, and artificial intelligence, which are transforming our everyday life and promise to revolutionize biology and healthcare. Here, we introduce spatial metabolomics through the eyes of a computational scientist, review the outstanding challenges, provide a look into the future, and discuss opportunities granted by the ongoing convergence of human and artificial intelligence.
Affiliation(s)
- Theodore Alexandrov
- Structural and Computational Biology Unit, European Molecular Biology Laboratory, 69117 Heidelberg, Germany; Skaggs School of Pharmacy and Pharmaceutical Sciences, University of California, San Diego, La Jolla, California 92093, USA
41
Tian Y, Wu L, Yuan L, Ding S, Chen F, Zhang T, Ren A, Zhang D, Tu W, Chen J, Hu QN. BCSExplorer: a customized biosynthetic chemical space explorer with multifunctional objective function analysis. Bioinformatics 2020; 36:1642-1643. [PMID: 31593245 DOI: 10.1093/bioinformatics/btz755] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Received: 04/21/2019] [Revised: 09/11/2019] [Accepted: 10/01/2019] [Indexed: 11/14/2022] Open
Abstract
Summary: The biosynthetic ability of living organisms has important applications in producing bulk chemicals, biofuels and natural products. Based on the most comprehensive biosynthesis knowledgebase, a computational system, BCSExplorer, is proposed to discover the unexplored chemical space using nature's biosynthetic potential. BCSExplorer first integrates the most comprehensive biosynthetic reaction database with 280 000 biochemical reactions and 60 000 chemicals biosynthesized globally over the past 130 years. Second, in this study, a biosynthesis tree is computed for a starting chemical molecule based on a comprehensive biotransformation rule library covering almost all biosynthetic possibilities, in which redundant rules are removed using a new algorithm. Moreover, biosynthesis feasibility, drug-likeness and toxicity analysis of a new generation of compounds will be pursued in further studies to meet various needs. BCSExplorer represents a novel method to explore biosynthetically available chemical space. Availability and implementation: BCSExplorer is available at: http://www.rxnfinder.org/bcsexplorer/. Supplementary information: Supplementary data are available at Bioinformatics online.
Collapse
Affiliation(s)
- Yu Tian
- Tianjin Institute of Industrial Biotechnology, Chinese Academy of Sciences, Tianjin 300308, P.R. China
- University of Chinese Academy of Sciences, Beijing 100864, P.R. China
- Ling Wu
- Tianjin Institute of Industrial Biotechnology, Chinese Academy of Sciences, Tianjin 300308, P.R. China
- Le Yuan
- Tianjin Institute of Industrial Biotechnology, Chinese Academy of Sciences, Tianjin 300308, P.R. China
- University of Chinese Academy of Sciences, Beijing 100864, P.R. China
- Shaozhen Ding
- CAS Key Laboratory of Computational Biology, CAS-MPG Partner Institute for Computational Biology, Shanghai Institute of Nutrition and Health, Shanghai Institutes for Biological Sciences, University of Chinese Academy of Sciences, Chinese Academy of Sciences, Shanghai 200333, P.R. China
- Fu Chen
- Tianjin Institute of Industrial Biotechnology, Chinese Academy of Sciences, Tianjin 300308, P.R. China
- College of Biotechnology, Tianjin University of Science and Technology, Tianjin 300457, P.R. China
- Tong Zhang
- Tianjin Institute of Industrial Biotechnology, Chinese Academy of Sciences, Tianjin 300308, P.R. China
- College of Biotechnology, Tianjin University of Science and Technology, Tianjin 300457, P.R. China
- Ailin Ren
- Tianjin Institute of Industrial Biotechnology, Chinese Academy of Sciences, Tianjin 300308, P.R. China
- University of Chinese Academy of Sciences, Beijing 100864, P.R. China
- Dachuan Zhang
- CAS Key Laboratory of Computational Biology, CAS-MPG Partner Institute for Computational Biology, Shanghai Institute of Nutrition and Health, Shanghai Institutes for Biological Sciences, University of Chinese Academy of Sciences, Chinese Academy of Sciences, Shanghai 200333, P.R. China
- Weizhong Tu
- Wuhan LifeSynther Science and Technology Co. Limited, Wuhan 430070, P.R. China
- Junni Chen
- Wuhan LifeSynther Science and Technology Co. Limited, Wuhan 430070, P.R. China
- Qian-Nan Hu
- CAS Key Laboratory of Computational Biology, CAS-MPG Partner Institute for Computational Biology, Shanghai Institute of Nutrition and Health, Shanghai Institutes for Biological Sciences, University of Chinese Academy of Sciences, Chinese Academy of Sciences, Shanghai 200333, P.R. China
42
Holliday GL, Brown SD, Mischel D, Polacco BJ, Babbitt PC. A strategy for large-scale comparison of evolutionary- and reaction-based classifications of enzyme function. Database (Oxford) 2020; 2020:baaa034. [PMID: 32449511 PMCID: PMC7246345 DOI: 10.1093/database/baaa034] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.8] [Received: 11/05/2019] [Revised: 03/18/2020] [Accepted: 04/27/2020] [Indexed: 12/12/2022]
Abstract
Determining the molecular function of enzymes discovered by genome sequencing represents a primary foundation for understanding many aspects of biology. Historically, classification of enzyme reactions has used the enzyme nomenclature system developed to describe the overall reactions performed by biochemically characterized enzymes, irrespective of their associated sequences. In contrast, functional classification and assignment for the millions of protein sequences of unknown function now available is largely done in two computational steps, first by similarity-based assignment of newly obtained sequences to homologous groups, followed by transferring to them the known functions of similar biochemically characterized homologs. Due to the fundamental differences in their etiologies and practice, 'how' these chemistry- and evolution-centric functional classification systems relate to each other has been difficult to explore on a large scale. To investigate this issue in a new way, we integrated two published ontologies that had previously described each of these classification systems independently. The resulting infrastructure was then used to compare the functional assignments obtained from each classification system for the well-studied and functionally diverse enolase superfamily. Mapping these function assignments to protein structure and reaction similarity networks shows a profound and complex disconnect between the homology- and chemistry-based classification systems. This conclusion mirrors previous observations suggesting that, except for closely related sequences, facile annotation transfer from small numbers of characterized enzymes to the huge number of uncharacterized homologs to which they are related is problematic.
Our extension of these comparisons to large enzyme superfamilies in a computationally intelligent manner provides a foundation for new directions in protein function prediction for the huge proportion of sequences of unknown function represented in major databases. Interactive sequence, reaction, substrate and product similarity networks computed for this work for the enolase and two other superfamilies are freely available for download from the Structure Function Linkage Database Archive (http://sfld.rbvi.ucsf.edu).
Affiliation(s)
- Gemma L Holliday
- Department of Bioengineering and Therapeutic Sciences, University of California, San Francisco, San Francisco, 1700 4th Street, CA 94143, USA
- Present Address: Medicines Discovery Catapult, Mereside, Alderley Park, Alderley Edge SK10 4TG, UK
- Shoshana D Brown
- Department of Bioengineering and Therapeutic Sciences, University of California, San Francisco, San Francisco, 1700 4th Street, CA 94143, USA
- David Mischel
- Department of Bioengineering and Therapeutic Sciences, University of California, San Francisco, San Francisco, 1700 4th Street, CA 94143, USA
- Benjamin J Polacco
- Department of Bioengineering and Therapeutic Sciences, University of California, San Francisco, San Francisco, 1700 4th Street, CA 94143, USA
- Patricia C Babbitt
- Department of Bioengineering and Therapeutic Sciences, University of California, San Francisco, San Francisco, 1700 4th Street, CA 94143, USA
- Department of Pharmaceutical Chemistry, University of California, San Francisco, San Francisco, 1700 4th Street, CA 94143, USA
- Quantitative Biosciences Institute, University of California, San Francisco, San Francisco, 1700 4th Street, CA 94143, USA
43
Lu H, Li F, Sánchez BJ, Zhu Z, Li G, Domenzain I, Marcišauskas S, Anton PM, Lappa D, Lieven C, Beber ME, Sonnenschein N, Kerkhoven EJ, Nielsen J. A consensus S. cerevisiae metabolic model Yeast8 and its ecosystem for comprehensively probing cellular metabolism. Nat Commun 2019; 10:3586. [PMID: 31395883 PMCID: PMC6687777 DOI: 10.1038/s41467-019-11581-3] [Citation(s) in RCA: 171] [Impact Index Per Article: 28.5] [Received: 03/19/2019] [Accepted: 07/17/2019] [Indexed: 01/06/2023] Open
Abstract
Genome-scale metabolic models (GEMs) represent extensive knowledgebases that provide a platform for model simulations and integrative analysis of omics data. This study introduces Yeast8 and an associated ecosystem of models that represent a comprehensive computational resource for performing simulations of the metabolism of Saccharomyces cerevisiae, an important model organism and widely used cell factory. Yeast8 tracks community development with version control, setting a standard for how GEMs can be continuously updated in a simple and reproducible way. We use Yeast8 to develop the derived models panYeast8 and coreYeast8, which in turn enable the reconstruction of GEMs for 1,011 different yeast strains. Through integration with enzyme constraints (ecYeast8) and protein 3D structures (proYeast8DB), Yeast8 further facilitates the exploration of yeast metabolism at a multi-scale level, enabling prediction of how single nucleotide variations translate to phenotypic traits.
Affiliation(s)
- Hongzhong Lu
- Department of Biology and Biological Engineering, Chalmers University of Technology, Kemivägen 10, SE412 96, Gothenburg, Sweden
- Feiran Li
- Department of Biology and Biological Engineering, Chalmers University of Technology, Kemivägen 10, SE412 96, Gothenburg, Sweden
- Benjamín J Sánchez
- Department of Biology and Biological Engineering, Chalmers University of Technology, Kemivägen 10, SE412 96, Gothenburg, Sweden
- Zhengming Zhu
- Department of Biology and Biological Engineering, Chalmers University of Technology, Kemivägen 10, SE412 96, Gothenburg, Sweden
- School of Biotechnology, Jiangnan University, 1800 Lihu Road, 214122, Wuxi, Jiangsu, China
- Gang Li
- Department of Biology and Biological Engineering, Chalmers University of Technology, Kemivägen 10, SE412 96, Gothenburg, Sweden
- Iván Domenzain
- Department of Biology and Biological Engineering, Chalmers University of Technology, Kemivägen 10, SE412 96, Gothenburg, Sweden
- Simonas Marcišauskas
- Department of Biology and Biological Engineering, Chalmers University of Technology, Kemivägen 10, SE412 96, Gothenburg, Sweden
- Petre Mihail Anton
- Department of Biology and Biological Engineering, Chalmers University of Technology, Kemivägen 10, SE412 96, Gothenburg, Sweden
- Dimitra Lappa
- Department of Biology and Biological Engineering, Chalmers University of Technology, Kemivägen 10, SE412 96, Gothenburg, Sweden
- Christian Lieven
- The Novo Nordisk Foundation Center for Biosustainability, Technical University of Denmark, DK-2800 Kgs. Lyngby, Denmark
- Moritz Emanuel Beber
- The Novo Nordisk Foundation Center for Biosustainability, Technical University of Denmark, DK-2800 Kgs. Lyngby, Denmark
- Nikolaus Sonnenschein
- The Novo Nordisk Foundation Center for Biosustainability, Technical University of Denmark, DK-2800 Kgs. Lyngby, Denmark
- Eduard J Kerkhoven
- Department of Biology and Biological Engineering, Chalmers University of Technology, Kemivägen 10, SE412 96, Gothenburg, Sweden
- Jens Nielsen
- Department of Biology and Biological Engineering, Chalmers University of Technology, Kemivägen 10, SE412 96, Gothenburg, Sweden
- The Novo Nordisk Foundation Center for Biosustainability, Technical University of Denmark, DK-2800 Kgs. Lyngby, Denmark
- BioInnovation Institute, Ole Maaløes Vej 3, DK2200, Copenhagen N, Denmark