1
Shah HA, Liu J, Yang Z, Yang F, Zhang Q, Feng J. DeepRT: Predicting compounds presence in pathway modules and classifying into module classes using deep neural networks based on molecular properties. J Bioinform Comput Biol 2023; 21:2350017. [PMID: 37632195 DOI: 10.1142/s0219720023500178] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Indexed: 08/27/2023]
Abstract
Metabolic pathways play a crucial role in understanding the biochemistry of organisms. In metabolic pathways, modules refer to clusters of interconnected reactions or sub-networks representing specific functional units or biological processes within the overall pathway. In pathway modules, compounds are major elements and refer to the various molecules that participate in the biochemical reactions within the pathway modules. These molecules can include substrates, intermediates and final products. Determining the presence relation between compounds and pathway modules is essential for synthesizing new molecules and predicting hidden reactions. To date, several computational methods have been proposed to address this problem. However, these methods predict only the metabolic pathways and their types, not the pathway modules. To address this issue, we propose a novel deep learning model, DeepRT, that integrates message passing neural networks (MPNNs) and a transformer encoder. This combination allows DeepRT to effectively extract global and local structure information from the molecular graph. The model is designed to perform two tasks: first, determining the presence relation of a compound with a pathway module, and second, predicting the relation between a query compound and module classes. The proposed DeepRT model was evaluated on a dataset comprising compounds and pathway modules, and it outperforms existing approaches.
Affiliation(s)
- Hayat Ali Shah
  - Institute of Artificial Intelligence, School of Computer Science, Wuhan University, P. R. China
- Juan Liu
  - Institute of Artificial Intelligence, School of Computer Science, Wuhan University, P. R. China
- Zhihui Yang
  - Institute of Artificial Intelligence, School of Computer Science, Wuhan University, P. R. China
- Feng Yang
  - Institute of Artificial Intelligence, School of Computer Science, Wuhan University, P. R. China
- Qiang Zhang
  - Institute of Artificial Intelligence, School of Computer Science, Wuhan University, P. R. China
- Jing Feng
  - Institute of Artificial Intelligence, School of Computer Science, Wuhan University, P. R. China
2
Jiang J, Liu LP, Hassoun S. Learning graph representations of biochemical networks and its application to enzymatic link prediction. Bioinformatics 2021; 37:793-799. [PMID: 33051674 PMCID: PMC8097755 DOI: 10.1093/bioinformatics/btaa881] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 02/25/2020] [Revised: 08/01/2020] [Accepted: 09/29/2020] [Indexed: 11/20/2022]
Abstract
Motivation: The complete characterization of enzymatic activities between molecules remains incomplete, hindering biological engineering and limiting biological discovery. We develop in this work a technique, enzymatic link prediction (ELP), for predicting the likelihood of an enzymatic transformation between two molecules. ELP models enzymatic reactions cataloged in the KEGG database as a graph. ELP is innovative over prior works in using graph embedding to learn molecular representations that capture not only molecular and enzymatic attributes but also graph connectivity.
Results: We explore transductive (test nodes included in the training graph) and inductive (test nodes not part of the training graph) learning models. We show that ELP achieves high AUC when learning node embeddings using both graph connectivity and node attributes. Further, we show that graph embedding improves link prediction by 30% in area under curve over fingerprint-based similarity approaches and by 8% over support vector machines. We compare ELP against rule-based methods. We also evaluate ELP for predicting links in pathway maps and for reconstruction of edges in reaction networks of four common gut microbiota phyla: actinobacteria, bacteroidetes, firmicutes and proteobacteria. To emphasize the importance of graph embedding in the context of biochemical networks, we illustrate how graph embedding can guide visualization.
Availability and implementation: The code and datasets are available through https://github.com/HassounLab/ELP.
Affiliation(s)
- Julie Jiang
  - Department of Computer Science, Tufts University, Medford 02155, USA
- Li-Ping Liu
  - Department of Computer Science, Tufts University, Medford 02155, USA
- Soha Hassoun
  - Department of Computer Science, Tufts University, Medford 02155, USA
  - Department of Chemical and Biological Engineering, Tufts University, Medford 02155, USA
3
M A Basher AR, Hallam SJ. Leveraging heterogeneous network embedding for metabolic pathway prediction. Bioinformatics 2021; 37:822-829. [PMID: 33305310 PMCID: PMC8098024 DOI: 10.1093/bioinformatics/btaa906] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Received: 03/24/2020] [Revised: 10/03/2020] [Accepted: 10/08/2020] [Indexed: 01/27/2023]
Abstract
Motivation: Metabolic pathway reconstruction from genomic sequence information is a key step in predicting regulatory and functional potential of cells at the individual, population and community levels of organization. Although the most common methods for metabolic pathway reconstruction are gene-centric, e.g. mapping annotated proteins onto known pathways using a reference database, pathway-centric methods based on heuristics or machine learning to infer pathway presence provide a powerful engine for hypothesis generation in biological systems. Such methods rely on rule sets or rich feature information that may not be known or readily accessible.
Results: Here, we present pathway2vec, a software package consisting of six representational learning modules used to automatically generate features for pathway inference. Specifically, we build a three-layered network composed of compounds, enzymes and pathways, where nodes within a layer manifest inter-interactions and nodes between layers manifest betweenness interactions. This layered architecture captures relevant relationships used to learn a neural embedding-based low-dimensional space of metabolic features. We benchmark pathway2vec performance based on node-clustering, embedding visualization and pathway prediction using MetaCyc as a trusted source. In the pathway prediction task, results indicate that it is possible to leverage embeddings to improve prediction outcomes.
Availability and implementation: The software package and installation instructions are published on http://github.com/pathway2vec.
Supplementary information: Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Abdur Rahman M A Basher
  - Graduate Program in Bioinformatics, University of British Columbia, Vancouver, BC V6T 1Z3, Canada
- Steven J Hallam
  - Graduate Program in Bioinformatics, University of British Columbia, Vancouver, BC V6T 1Z3, Canada
  - Department of Microbiology & Immunology, University of British Columbia, Vancouver, BC V6T 1Z3, Canada
  - Genome Science and Technology Program, University of British Columbia, Vancouver, BC V6T 1Z3, Canada
  - Life Sciences Institute, University of British Columbia, Vancouver, BC V6T 1Z3, Canada
  - ECOSCOPE Training Program, University of British Columbia, Vancouver, BC V6T 1Z3, Canada
4
Liu Y, Benitez MG, Chen J, Harrison E, Khusnutdinova AN, Mahadevan R. Opportunities and Challenges for Microbial Synthesis of Fatty Acid-Derived Chemicals (FACs). Front Bioeng Biotechnol 2021; 9:613322. [PMID: 33575251 PMCID: PMC7870715 DOI: 10.3389/fbioe.2021.613322] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 10/02/2020] [Accepted: 01/04/2021] [Indexed: 11/13/2022]
Abstract
Concerns about global warming and the uneven worldwide distribution of fossil fuels have spurred the development of alternative, renewable, sustainable, and environmentally friendly resources. From an engineering perspective, biosynthesis of fatty acid-derived chemicals (FACs) is an attractive and promising solution to produce chemicals from abundant renewable feedstocks and carbon dioxide in microbial chassis. However, several factors limit the viability of this process. This review first summarizes the types of FACs and their wide applications. Next, we take a deep look into the microbial platforms used to produce FACs and give an outlook for platform development. Then we discuss the bottlenecks in metabolic pathways and suggest possible solutions. Finally, we highlight the most recent advances in the fast-growing field of model-based strain design for FAC biosynthesis.
Affiliation(s)
- Yilan Liu
  - Department of Chemical Engineering and Applied Chemistry, University of Toronto, Toronto, ON, Canada
- Mauricio Garcia Benitez
  - Department of Chemical Engineering and Applied Chemistry, University of Toronto, Toronto, ON, Canada
- Jinjin Chen
  - Department of Chemical Engineering and Applied Chemistry, University of Toronto, Toronto, ON, Canada
- Emma Harrison
  - Department of Chemical Engineering and Applied Chemistry, University of Toronto, Toronto, ON, Canada
- Anna N. Khusnutdinova
  - Department of Chemical Engineering and Applied Chemistry, University of Toronto, Toronto, ON, Canada
- Radhakrishnan Mahadevan
  - Department of Chemical Engineering and Applied Chemistry, University of Toronto, Toronto, ON, Canada
  - Institute of Biomedical Engineering, University of Toronto, Toronto, ON, Canada
5
M A Basher AR, McLaughlin RJ, Hallam SJ. Metabolic pathway inference using multi-label classification with rich pathway features. PLoS Comput Biol 2020; 16:e1008174. [PMID: 33001968 PMCID: PMC7529316 DOI: 10.1371/journal.pcbi.1008174] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.3] [Received: 02/02/2020] [Accepted: 07/21/2020] [Indexed: 12/15/2022]
Abstract
Metabolic inference from genomic sequence information is a necessary step in determining the capacity of cells to make a living in the world at different levels of biological organization. A common method for determining the metabolic potential encoded in genomes is to map conceptually translated open reading frames onto a database containing known product descriptions. Such gene-centric methods are limited in their capacity to predict pathway presence or absence and do not support standardized rule sets for automated and reproducible research. Pathway-centric methods based on defined rule sets or machine learning algorithms provide an adjunct or alternative inference method that supports hypothesis generation and testing of metabolic relationships within and between cells. Here, we present mlLGPR, multi-label based on logistic regression for pathway prediction, a software package that uses supervised multi-label classification and rich pathway features to infer metabolic networks in organismal and multi-organismal datasets. We evaluated mlLGPR performance using a corpus of 12 experimental datasets manifesting diverse multi-label properties, including manually curated organismal genomes, synthetic microbial communities and low complexity microbial communities. Resulting performance metrics equaled or exceeded previous reports for organismal genomes and identified specific challenges associated with feature engineering and training data for community-level metabolic inference.
Affiliation(s)
- Abdur Rahman M A Basher
  - Graduate Program in Bioinformatics, University of British Columbia, Genome Sciences Centre, 100-570 West 7th Avenue, Vancouver, British Columbia, Canada
- Ryan J McLaughlin
  - Graduate Program in Bioinformatics, University of British Columbia, Genome Sciences Centre, 100-570 West 7th Avenue, Vancouver, British Columbia, Canada
- Steven J Hallam
  - Graduate Program in Bioinformatics, University of British Columbia, Genome Sciences Centre, 100-570 West 7th Avenue, Vancouver, British Columbia, Canada
  - Department of Microbiology & Immunology, University of British Columbia, 2552-2350 Health Sciences Mall, Vancouver, British Columbia, Canada
  - Genome Science and Technology Program, University of British Columbia, 2329 West Mall, Vancouver, BC, Canada
  - Life Sciences Institute, University of British Columbia, Vancouver, British Columbia, Canada
  - ECOSCOPE Training Program, University of British Columbia, Vancouver, British Columbia, Canada
6
Barupal DK, Fan S, Fiehn O. Integrating bioinformatics approaches for a comprehensive interpretation of metabolomics datasets. Curr Opin Biotechnol 2018; 54:1-9. [PMID: 29413745 PMCID: PMC6358024 DOI: 10.1016/j.copbio.2018.01.010] [Citation(s) in RCA: 44] [Impact Index Per Article: 7.3] [Received: 12/15/2017] [Revised: 01/09/2018] [Accepted: 01/11/2018] [Indexed: 12/28/2022]
Abstract
Access to high-quality metabolomics data has become a routine component of biological studies. However, interpreting those datasets in biological contexts remains a challenge, especially because many identified metabolites are not found in biochemical pathway databases. Starting from statistical analyses, a range of new tools is available, including metabolite set enrichment analysis, pathway and network visualization, pathway prediction, biochemical databases and text mining. Integrating these approaches into comprehensive and unbiased interpretations requires careful consideration of both the caveats of the metabolomics dataset itself and the structure and properties of the biological study design. Special considerations need to be taken when adopting approaches from genomics for use in metabolomics. The R and Python programming languages are enabling an easier exchange of diverse tools to deploy integrated workflows. This review summarizes the key ideas and latest developments regarding these approaches.
Affiliation(s)
- Dinesh Kumar Barupal
  - NIH West Coast Metabolomics Center, University of California Davis, Davis, CA 95616, United States
- Sili Fan
  - NIH West Coast Metabolomics Center, University of California Davis, Davis, CA 95616, United States
- Oliver Fiehn
  - NIH West Coast Metabolomics Center, University of California Davis, Davis, CA 95616, United States
  - Biochemistry Department, Faculty of Science, King Abdulaziz University, P.O. Box 80203, Jeddah 21589, Saudi Arabia
7
Dangi AK, Sharma B, Hill RT, Shukla P. Bioremediation through microbes: systems biology and metabolic engineering approach. Crit Rev Biotechnol 2018; 39:79-98. [DOI: 10.1080/07388551.2018.1500997] [Citation(s) in RCA: 77] [Impact Index Per Article: 12.8] [Indexed: 10/28/2022]
Affiliation(s)
- Arun Kumar Dangi
  - Enzyme Technology and Protein Bioinformatics Laboratory, Department of Microbiology, Maharshi Dayanand University, Rohtak, India
- Babita Sharma
  - Enzyme Technology and Protein Bioinformatics Laboratory, Department of Microbiology, Maharshi Dayanand University, Rohtak, India
- Russell T. Hill
  - Institute of Marine and Environmental Technology, University of Maryland Center for Environmental Science, Baltimore, MD, USA
- Pratyoosh Shukla
  - Enzyme Technology and Protein Bioinformatics Laboratory, Department of Microbiology, Maharshi Dayanand University, Rohtak, India
8
Sivakumar TV, Bhaduri A, Duvvuru Muni RR, Park JH, Kim TY. SimCAL: a flexible tool to compute biochemical reaction similarity. BMC Bioinformatics 2018; 19:254. [PMID: 29969981 PMCID: PMC6029250 DOI: 10.1186/s12859-018-2248-5] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.5] [Received: 08/18/2017] [Accepted: 06/14/2018] [Indexed: 11/29/2022]
Abstract
Background: Computation of reaction similarity is a prerequisite for several bioinformatics applications, including enzyme identification for specific biochemical reactions, enzyme classification and mining for specific inhibitors. Reaction similarity is often assessed at one of two levels: (i) comparison across all the constituent substrates and products of a reaction (reaction-level similarity), or (ii) comparison at the transformation center with various degrees of neighborhood (transformation-level similarity). Existing reaction similarity computation tools are designed for specific applications and use different features and similarity measures. A single system integrating these diverse features enables comparison of the impact of different molecular properties on similarity score computation.
Results: To address these requirements, we present SimCAL, an integrated system to calculate reaction similarity with novel features and the capability to perform comparative assessment. SimCAL provides reaction similarity computation at both the whole-reaction level and the transformation level. Novel physicochemical features such as stereochemistry, mass, volume and charge are included in computing the reaction fingerprint. Users can choose from four different fingerprint types and nine molecular similarity measures. Further, a comparative assessment of these features is also enabled. The performance of SimCAL was assessed on 3,688,122 reaction pairs with Enzyme Commission (EC) numbers from MetaCyc and achieved an area under the curve (AUC) of > 0.9. In addition, SimCAL results showed strong correlation with state-of-the-art EC-BLAST and molecular signature based reaction similarity methods.
Conclusions: SimCAL is developed in Java and is available as a standalone tool with an intuitive, user-friendly graphical interface and also as a console application. With its customizable feature selection and similarity calculations, it is expected to cater to a wide audience interested in studying and analyzing biochemical reactions and metabolic networks.
Electronic supplementary material: The online version of this article (10.1186/s12859-018-2248-5) contains supplementary material, which is available to authorized users.
Affiliation(s)
- Anirban Bhaduri
  - Bioinformatics Lab, Samsung Advanced Institute of Technology, Bangalore, 560037, India
- Jin Hwan Park
  - Biomaterials Lab, Materials Center, Samsung Advanced Institute of Technology, Gyeonggi-do, 443803, South Korea
- Tae Yong Kim
  - Biomaterials Lab, Materials Center, Samsung Advanced Institute of Technology, Gyeonggi-do, 443803, South Korea
9
Stroehlein AJ, Young ND, Gasser RB. Advances in kinome research of parasitic worms - implications for fundamental research and applied biotechnological outcomes. Biotechnol Adv 2018; 36:915-934. [PMID: 29477756 DOI: 10.1016/j.biotechadv.2018.02.013] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.2] [Received: 12/01/2017] [Revised: 02/15/2018] [Accepted: 02/21/2018] [Indexed: 12/17/2022]
Abstract
Protein kinases are enzymes that play essential roles in the regulation of many cellular processes. Despite expansions in the fields of genomics, transcriptomics and bioinformatics, there is limited information on the kinase complements (kinomes) of most eukaryotic organisms, including parasitic worms that cause serious diseases of humans and animals. The biological uniqueness of these worms and the draft status of their genomes pose challenges for the identification and classification of protein kinases using established tools. In this article, we provide an account of kinase biology, the roles of kinases in diseases and their importance as drug targets, and drug discovery efforts in key socioeconomically important parasitic worms. In this context, we summarise methods and resources commonly used for the curation, identification, classification and functional annotation of protein kinase sequences from draft genomes; review recent advances made in the characterisation of the worm kinomes; and discuss the implications of these advances for investigating kinase signalling and developing small-molecule inhibitors as new anti-parasitic drugs.
Affiliation(s)
- Andreas J Stroehlein
  - Melbourne Veterinary School, Department of Veterinary Biosciences, Faculty of Veterinary and Agricultural Sciences, The University of Melbourne, Parkville, Victoria 3010, Australia
- Neil D Young
  - Melbourne Veterinary School, Department of Veterinary Biosciences, Faculty of Veterinary and Agricultural Sciences, The University of Melbourne, Parkville, Victoria 3010, Australia
- Robin B Gasser
  - Melbourne Veterinary School, Department of Veterinary Biosciences, Faculty of Veterinary and Agricultural Sciences, The University of Melbourne, Parkville, Victoria 3010, Australia
10
Nolte TM, Ragas AMJ. A review of quantitative structure-property relationships for the fate of ionizable organic chemicals in water matrices and identification of knowledge gaps. Environ Sci Process Impacts 2017; 19:221-246. [PMID: 28296985 DOI: 10.1039/c7em00034k] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.3] [Indexed: 05/25/2023]
Abstract
Many organic chemicals are ionizable by nature. After use and release into the environment, various fate processes determine their concentrations, and hence exposure to aquatic organisms. In the absence of suitable data, such fate processes can be estimated using Quantitative Structure-Property Relationships (QSPRs). In this review we compiled available QSPRs from the open literature and assessed their applicability towards ionizable organic chemicals. Using quantitative and qualitative criteria we selected the 'best' QSPRs for sorption, (a)biotic degradation, and bioconcentration. The results indicate that many suitable QSPRs exist, but some critical knowledge gaps remain. Specifically, future focus should be directed towards the development of QSPR models for biodegradation in wastewater and sediment systems, direct photolysis and reaction with singlet oxygen, as well as additional reactive intermediates. Adequate QSPRs for bioconcentration in fish exist, but more accurate assessments can be achieved using pharmacologically based toxicokinetic (PBTK) models. No adequate QSPRs exist for bioconcentration in non-fish species. Due to the high variability of chemical and biological species as well as environmental conditions in QSPR datasets, accurate predictions for specific systems and inter-dataset conversions are problematic, for which standardization is needed. For all QSPR endpoints, additional data requirements involve supplementing the current chemical space covered and accurately characterizing the test systems used.
Affiliation(s)
- Tom M Nolte
  - Department of Environmental Science, Institute for Water and Wetland Research, Radboud University Nijmegen, P.O. Box 9010, 6500 GL Nijmegen, The Netherlands
- Ad M J Ragas
  - Department of Environmental Science, Institute for Water and Wetland Research, Radboud University Nijmegen, P.O. Box 9010, 6500 GL Nijmegen, The Netherlands
11
Huang Y, Zhong C, Lin HX, Wang J. A Method for Finding Metabolic Pathways Using Atomic Group Tracking. PLoS One 2017; 12:e0168725. [PMID: 28068354 PMCID: PMC5221824 DOI: 10.1371/journal.pone.0168725] [Citation(s) in RCA: 20] [Impact Index Per Article: 2.9] [Received: 09/05/2016] [Accepted: 12/05/2016] [Indexed: 12/13/2022]
Abstract
A fundamental computational problem in metabolic engineering is to find pathways between compounds. Pathfinding methods using atom tracking have been widely used to find biochemically relevant pathways. However, these methods require the user to define the atoms to be tracked. This may lead to failing to predict the pathways that do not conserve the user-defined atoms. In this work, we propose a pathfinding method called AGPathFinder to find biochemically relevant metabolic pathways between two given compounds. In AGPathFinder, we find alternative pathways by tracking the movement of atomic groups through metabolic networks and use combined information of reaction thermodynamics and compound similarity to guide the search towards more feasible pathways and better performance. The experimental results show that atomic group tracking enables our method to find pathways without the need of defining the atoms to be tracked, avoid hub metabolites, and obtain biochemically meaningful pathways. Our results also demonstrate that atomic group tracking, when incorporated with combined information of reaction thermodynamics and compound similarity, improves the quality of the found pathways. In most cases, the average compound inclusion accuracy and reaction inclusion accuracy for the top resulting pathways of our method are around 0.90 and 0.70, respectively, which are better than those of the existing methods. Additionally, AGPathFinder provides the information of thermodynamic feasibility and compound similarity for the resulting pathways.
Affiliation(s)
- Yiran Huang
  - School of Computer Science and Engineering, South China University of Technology, Guangzhou, China
  - School of Computer, Electronics and Information, Guangxi University, Nanning, China
- Cheng Zhong
  - School of Computer, Electronics and Information, Guangxi University, Nanning, China
- Hai Xiang Lin
  - Faculty of Electrical Engineering, Mathematics and Computer Science, Delft University of Technology, Delft, The Netherlands
- Jianyi Wang
  - School of Chemistry and Chemical Engineering, Guangxi University, Nanning, China