1. Manen-Freixa L, Antolin AA. Polypharmacology prediction: the long road toward comprehensively anticipating small-molecule selectivity to de-risk drug discovery. Expert Opin Drug Discov 2024:1-27. [PMID: 39004919 DOI: 10.1080/17460441.2024.2376643] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/15/2024] [Accepted: 07/02/2024] [Indexed: 07/16/2024]
Abstract
INTRODUCTION: Small molecules often bind to multiple targets, a behavior termed polypharmacology. Anticipating polypharmacology is essential for drug discovery because unknown off-targets can modulate safety and efficacy, profoundly affecting drug discovery success. Unfortunately, experimental methods to assess selectivity have significant limitations, and drugs still fail in the clinic due to unanticipated off-targets. Computational methods are a cost-effective, complementary approach to predicting polypharmacology. AREAS COVERED: This review aims to provide a comprehensive overview of the state of polypharmacology prediction and to discuss its strengths and limitations, covering both classical cheminformatics methods and bioinformatic approaches. The authors review available data sources, paying close attention to their different coverage. The authors then discuss major algorithms, grouped by the types of data they exploit, using selected examples. EXPERT OPINION: Polypharmacology prediction has made impressive progress over the last decades and has contributed to identifying many off-targets. However, data incompleteness currently prevents most approaches from comprehensively predicting selectivity. Moreover, our limited agreement on model assessment makes it difficult to identify the best algorithms, which at present show modest performance in prospective real-world applications. Despite these limitations, the exponential growth of multidisciplinary Big Data and AI holds much potential to improve polypharmacology prediction and de-risk drug discovery.
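One classical cheminformatics idea for anticipating off-targets is to score each target by how similar a query compound is to that target's known ligands. The sketch below illustrates that idea only; it is not the review's method. It assumes fingerprints are given as sets of "on" bit indices, and the function names (`tanimoto`, `predict_targets`), the 0.4 cut-off, and the placeholder target names are hypothetical.

```python
from typing import Dict, List, Set, Tuple


def tanimoto(a: Set[int], b: Set[int]) -> float:
    """Tanimoto (Jaccard) similarity between two sets of 'on' fingerprint bits."""
    union = a | b
    if not union:
        return 0.0
    return len(a & b) / len(union)


def predict_targets(
    query: Set[int],
    ligands_by_target: Dict[str, List[Set[int]]],
    threshold: float = 0.4,  # hypothetical similarity cut-off
) -> List[Tuple[str, float]]:
    """Score each target by the query's maximum similarity to any of its known
    ligands, keep targets at or above the threshold, and rank them."""
    hits = []
    for target, ligands in ligands_by_target.items():
        best = max((tanimoto(query, lig) for lig in ligands), default=0.0)
        if best >= threshold:
            hits.append((target, best))
    return sorted(hits, key=lambda hit: hit[1], reverse=True)


if __name__ == "__main__":
    # Hypothetical fingerprints (bit indices) for ligands of three placeholder targets
    known = {
        "TARGET_A": [{1, 2, 3, 4}, {10, 11}],
        "TARGET_B": [{1, 2, 9}],
        "TARGET_C": [{20, 21, 22}],
    }
    print(predict_targets({1, 2, 3, 5}, known))
```

Here the query shares 3 of 5 bits with a TARGET_A ligand (0.6) and 2 of 5 with the TARGET_B ligand (0.4), so both are kept as predicted targets and TARGET_C is dropped. Taking the maximum similarity per target is one simple design choice; averaging over all ligands of a target is another.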
Affiliation(s)
- Leticia Manen-Freixa
- Oncobell Division, Bellvitge Biomedical Research Institute (IDIBELL) and ProCURE Department, Catalan Institute of Oncology (ICO), Barcelona, Spain
- Albert A Antolin
- Oncobell Division, Bellvitge Biomedical Research Institute (IDIBELL) and ProCURE Department, Catalan Institute of Oncology (ICO), Barcelona, Spain
- Center for Cancer Drug Discovery, The Division of Cancer Therapeutics, The Institute of Cancer Research, London, UK
2. Li J, Lardon R, Mangelinckx S, Geelen D. A practical guide to the discovery of biomolecules with biostimulant activity. J Exp Bot 2024; 75:3797-3817. [PMID: 38630561 DOI: 10.1093/jxb/erae156] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/13/2023] [Accepted: 04/16/2024] [Indexed: 04/19/2024]
Abstract
The growing demand for sustainable solutions in agriculture, which are critical for crop productivity and food quality in the face of climate change and the need to reduce agrochemical usage, has brought biostimulants into the spotlight as valuable tools for regenerative agriculture. With their diverse biological activities, biostimulants can contribute to crop growth, nutrient use efficiency, and abiotic stress resilience, as well as to the restoration of soil health. Biomolecules including humic substances, protein lysates, phenolics, and carbohydrates have undergone thorough investigation because of their demonstrated biostimulant activities. Here, we review the process of discovering and developing extract-based biostimulants and propose a practical step-by-step pipeline that starts with the initial identification of biomolecules, followed by extraction and isolation, determination of bioactivity, identification of active compound(s), elucidation of mechanisms, formulation, and assessment of effectiveness. Together, these steps form a roadmap that aims to expedite the transfer of interdisciplinary knowledge from laboratory-scale studies to pilot-scale production in practical scenarios aligned with prevailing regulatory frameworks.
Affiliation(s)
- Jing Li
- HortiCell, Department Plants and Crops, Faculty of Bioscience Engineering, Ghent University, Coupure Links 653, 9000 Ghent, Belgium
- Robin Lardon
- HortiCell, Department Plants and Crops, Faculty of Bioscience Engineering, Ghent University, Coupure Links 653, 9000 Ghent, Belgium
- Sven Mangelinckx
- SynBioC, Department of Green Chemistry and Technology, Faculty of Bioscience Engineering, Ghent University, Coupure Links 653, 9000 Ghent, Belgium
- Danny Geelen
- HortiCell, Department Plants and Crops, Faculty of Bioscience Engineering, Ghent University, Coupure Links 653, 9000 Ghent, Belgium
3. Sulaiman MK. Molecular mechanisms and therapeutic potential of natural flavonoids in diabetic nephropathy: Modulation of intracellular developmental signaling pathways. Curr Res Pharmacol Drug Discov 2024; 7:100194. [PMID: 39071051 PMCID: PMC11276931 DOI: 10.1016/j.crphar.2024.100194] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/23/2024] [Revised: 06/26/2024] [Accepted: 07/02/2024] [Indexed: 07/30/2024]
Abstract
Recognized as a common microvascular complication of diabetes mellitus (DM), diabetic nephropathy (DN) is the principal cause of chronic end-stage renal disease (ESRD). Patients with diabetes have an approximately 25% risk of developing progressive renal disease. The underlying principles of DN control target the dual outcomes of blood glucose regulation through sodium-glucose cotransporter 2 (SGLT2) blockade and hypertension management through renin-angiotensin-aldosterone system inhibition. However, these treatments fail to halt progression to kidney failure and cardiovascular comorbidities. Recently, the dysregulation of subcellular signaling pathways has been increasingly implicated in DN pathogenesis. Natural compounds are emerging as effective and side-effect-free therapeutic agents that target intracellular pathways. This narrative review synthesizes recent insights into the dysregulation of maintenance pathways in DN, drawing on animal and human studies. To compile this review, articles published since 2000 that report DN signaling pathways and their treatment with natural flavonoids were collected from the PubMed, Cochrane Library, Web of Science, Google Scholar, and EMBASE databases. Because therapeutic interventions are frequently based on the results of clinical trials, data from current phase II and III clinical trials on DN are also briefly analyzed.
4. Comajuncosa-Creus A, Lenes A, Sánchez-Palomino M, Dalton D, Aloy P. Stereochemically-aware bioactivity descriptors for uncharacterized chemical compounds. J Cheminform 2024; 16:70. [PMID: 38890727 PMCID: PMC11186078 DOI: 10.1186/s13321-024-00867-4] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/25/2024] [Accepted: 06/05/2024] [Indexed: 06/20/2024]
Abstract
Stereochemistry plays a fundamental role in pharmacology. Here, we systematically investigate the relationship between stereoisomerism and bioactivity in over 1 million compounds, finding that a very significant fraction (~40%) of spatial isomer pairs show, to some extent, distinct bioactivities. We then use the 3D representation of these molecules to train a collection of deep neural networks (Signaturizers3D) that generate bioactivity descriptors for small molecules, capturing their effects at increasing levels of biological complexity (i.e., from protein targets to clinical outcomes). Further, we assess the ability of the descriptors to distinguish between stereoisomers and to recapitulate their different target binding profiles. Overall, we show how these new stereochemically-aware descriptors provide an even more faithful description of complex small-molecule bioactivity properties, capturing key differences in the activity of stereoisomers.
Scientific contribution
We systematically assess the relationship between stereoisomerism and bioactivity on a large scale, focusing on compound-target binding events, and use our findings to train novel deep learning models to generate stereochemically-aware bioactivity signatures for any compound of interest.
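One simple way to find candidate stereoisomer pairs in a large compound collection is to group standard InChIKeys by their first 14-character block, which encodes the skeleton (connectivity) without stereochemistry. The sketch below does this and then counts the share of pairs whose sets of active targets differ. It is an assumption for illustration, not the authors' pipeline: treating "distinct bioactivity" as "different target sets" is a simplification, and the keys and target labels in the usage are hypothetical placeholders.

```python
from collections import defaultdict
from itertools import combinations
from typing import Dict, FrozenSet, List


def stereo_groups(inchikeys: List[str]) -> Dict[str, List[str]]:
    """Group standard InChIKeys by their first (skeleton/connectivity) block.
    Groups with more than one key are candidate stereoisomer sets."""
    groups: Dict[str, List[str]] = defaultdict(list)
    for key in inchikeys:
        groups[key.split("-")[0]].append(key)
    return {skeleton: keys for skeleton, keys in groups.items() if len(keys) > 1}


def fraction_distinct_pairs(actives: Dict[str, FrozenSet[str]]) -> float:
    """Share of candidate stereoisomer pairs whose sets of active targets differ."""
    pairs = [
        pair
        for keys in stereo_groups(list(actives)).values()
        for pair in combinations(keys, 2)
    ]
    if not pairs:
        return 0.0
    distinct = sum(1 for a, b in pairs if actives[a] != actives[b])
    return distinct / len(pairs)


if __name__ == "__main__":
    # Hypothetical InChIKey-shaped strings: the first three share a skeleton block
    actives = {
        "AAAAAAAAAAAAAA-BBBBBBBBSA-N": frozenset({"T1"}),
        "AAAAAAAAAAAAAA-CCCCCCCCSA-N": frozenset({"T1"}),
        "AAAAAAAAAAAAAA-DDDDDDDDSA-N": frozenset({"T2"}),
        "EEEEEEEEEEEEEE-FFFFFFFFSA-N": frozenset({"T3"}),
    }
    print(fraction_distinct_pairs(actives))
```

In this toy set, two of the three candidate pairs have different target sets, so the function returns 2/3. Keys that differ only in isotopic or protonation layers also share a first block, so a real analysis would need extra filtering before calling a group stereoisomers.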
Affiliation(s)
- Arnau Comajuncosa-Creus
- Institute for Research in Biomedicine (IRB Barcelona), The Barcelona Institute of Science and Technology, Barcelona, Catalonia, Spain
- Aksel Lenes
- Institute for Research in Biomedicine (IRB Barcelona), The Barcelona Institute of Science and Technology, Barcelona, Catalonia, Spain
- Miguel Sánchez-Palomino
- Institute for Research in Biomedicine (IRB Barcelona), The Barcelona Institute of Science and Technology, Barcelona, Catalonia, Spain
- Dylan Dalton
- Institute for Research in Biomedicine (IRB Barcelona), The Barcelona Institute of Science and Technology, Barcelona, Catalonia, Spain
- Patrick Aloy
- Institute for Research in Biomedicine (IRB Barcelona), The Barcelona Institute of Science and Technology, Barcelona, Catalonia, Spain
- Institució Catalana de Recerca i Estudis Avançats (ICREA), Barcelona, Catalonia, Spain
5. Jana T, Sarkar D, Ganguli D, Mukherjee SK, Mandal RS, Das S. ABDpred: Prediction of active antimicrobial compounds using supervised machine learning techniques. Indian J Med Res 2024; 159:78-90. [PMID: 38345040 PMCID: PMC10954100 DOI: 10.4103/ijmr.ijmr_1832_22] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/23/2022] [Indexed: 03/06/2024]
Abstract
BACKGROUND & OBJECTIVES: New antibiotics are urgently needed to treat infectious diseases. An ever-increasing repertoire of multidrug-resistant pathogens poses an imminent threat to human lives across the globe. However, the low success rate of existing approaches and technologies for antibiotic discovery remains a major bottleneck. In silico methods such as machine learning (ML) appear more promising than conventional experimental approaches for meeting these challenges. The goal of this study was to create ML models that can successfully predict new antimicrobial compounds. METHODS: In this article, we employed eight different ML algorithms, namely extreme gradient boosting, random forest, gradient boosting classifier, deep neural network, support vector machine, multilayer perceptron, decision tree, and logistic regression. These models were trained with five-fold cross-validation on a dataset comprising 312 antibiotic drugs and a negative set of 936 non-antibiotic drugs. RESULTS: The top four ML classifiers (extreme gradient boosting, random forest, gradient boosting classifier, and deep neural network) achieved an accuracy of 80 per cent or above on the testing and blind datasets. INTERPRETATION & CONCLUSIONS: We aggregated the four top-performing models through a soft-voting technique to develop an ensemble-based ML method and incorporated it into a freely accessible online prediction server named ABDpred (http://clinicalmedicinessd.com.in/abdpred/).
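Soft voting, as named in the abstract, averages the predicted class probabilities of several models instead of counting their hard yes/no votes. A minimal sketch, assuming each trained model is wrapped as a function that returns the probability that a compound is an antibiotic; the equal default weights, the 0.5 threshold, and the stand-in models in the usage are assumptions, not details taken from ABDpred.

```python
from typing import Callable, List, Optional, Sequence, Tuple

# Any trained classifier wrapped as: features -> probability of the positive class
Model = Callable[[Sequence[float]], float]


def soft_vote(
    models: List[Model],
    x: Sequence[float],
    weights: Optional[List[float]] = None,
    threshold: float = 0.5,
) -> Tuple[float, bool]:
    """Soft voting: weighted mean of the models' positive-class probabilities,
    followed by a single threshold on that mean."""
    if not models:
        raise ValueError("at least one model is required")
    if weights is None:
        weights = [1.0] * len(models)  # assumption: equal weights
    if len(weights) != len(models):
        raise ValueError("one weight per model is required")
    p = sum(w * m(x) for w, m in zip(weights, models)) / sum(weights)
    return p, p >= threshold


if __name__ == "__main__":
    # Hypothetical stand-ins for four trained models that split 2 to 2
    stand_ins = [lambda x: 0.9, lambda x: 0.6, lambda x: 0.4, lambda x: 0.3]
    print(soft_vote(stand_ins, [0.0]))
```

With these four stand-ins, hard voting would tie 2 to 2, whereas the averaged probability of about 0.55 yields a positive call: soft voting lets the more confident models break the tie.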
Affiliation(s)
- Tanmoy Jana
- Division of Clinical Medicine, ICMR-National Institute of Cholera and Enteric Diseases, Kolkata, West Bengal, India
- Debasree Sarkar
- Division of Clinical Medicine, ICMR-National Institute of Cholera and Enteric Diseases, Kolkata, West Bengal, India
- Debayan Ganguli
- Division of Clinical Medicine, ICMR-National Institute of Cholera and Enteric Diseases, Kolkata, West Bengal, India
- Sandip Kumar Mukherjee
- Division of Clinical Medicine, ICMR-National Institute of Cholera and Enteric Diseases, Kolkata, West Bengal, India
- Rahul Shubhra Mandal
- Department of Cancer Biology, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, USA
- Santasabuj Das
- Division of Clinical Medicine, ICMR-National Institute of Cholera and Enteric Diseases, Kolkata, West Bengal, India
- ICMR-National Institute of Occupational Health, Ahmedabad, India
6. Paykan Heyrati M, Ghorbanali Z, Akbari M, Pishgahi G, Zare-Mirakabad F. BioAct-Het: A Heterogeneous Siamese Neural Network for Bioactivity Prediction Using Novel Bioactivity Representation. ACS Omega 2023; 8:44757-44772. [PMID: 38046344 PMCID: PMC10688196 DOI: 10.1021/acsomega.3c05778] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/07/2023] [Revised: 10/13/2023] [Accepted: 10/24/2023] [Indexed: 12/05/2023]
Abstract
Drug failure during experimental procedures due to low bioactivity presents a significant challenge. To mitigate this risk and enhance compound bioactivities, predicting bioactivity classes during lead optimization is essential. Existing studies on structure-activity relationships have highlighted the connection between the chemical structures of compounds and their bioactivity. However, these studies often overlook the intricate relationship between drugs and bioactivity, which encompasses multiple factors beyond chemical structure alone. To address this issue, we propose the BioAct-Het model, which employs a heterogeneous siamese neural network to model the complex relationship between drugs and bioactivity classes by bringing them into a unified latent space. In particular, we introduce a novel representation for bioactivity classes, called Bio-Prof, and enrich the original bioactivity data sets to tackle data scarcity. With these approaches, our model outperformed previous models. BioAct-Het is evaluated through three distinct strategies: association-based, bioactivity class-based, and compound-based. The association-based strategy uses supervised classification, the bioactivity class-based strategy adopts a retrospective evaluation approach, and the compound-based strategy resembles meta-learning. Furthermore, the model's effectiveness on real-world problems is analyzed through a case study on the use of vancomycin and oseltamivir for COVID-19 treatment, as well as molnupiravir's potential efficacy in treating COVID-19 patients. The data and code underlying this article are available at https://github.com/CBRC-lab/BioAct-Het. The data sets were derived from sources in the public domain.
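A common way to train a siamese network so that associated pairs lie close together in a shared latent space is the standard contrastive loss. The abstract does not state which loss BioAct-Het uses, so the sketch below is an assumption: it shows only the loss for one (drug embedding, bioactivity-class embedding) pair, where the two embeddings would come from two different (heterogeneous) encoders that are not shown. The margin value and the embeddings in the usage are hypothetical.

```python
import math
from typing import Sequence


def euclidean(u: Sequence[float], v: Sequence[float]) -> float:
    """Distance between two embeddings in the shared latent space."""
    if len(u) != len(v):
        raise ValueError("embeddings must have the same dimension")
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def contrastive_loss(
    drug_emb: Sequence[float],
    class_emb: Sequence[float],
    associated: bool,
    margin: float = 1.0,  # assumed margin hyperparameter
) -> float:
    """Contrastive loss for one (drug, bioactivity class) pair: associated pairs
    are pulled together (loss d^2); unassociated pairs are penalized only while
    they are closer than the margin (loss max(0, margin - d)^2)."""
    d = euclidean(drug_emb, class_emb)
    if associated:
        return d ** 2
    return max(0.0, margin - d) ** 2


if __name__ == "__main__":
    # Hypothetical 2-D embeddings from the drug encoder and the class encoder
    drug, near_class, far_class = [0.0, 0.0], [0.3, 0.4], [3.0, 4.0]
    print(contrastive_loss(drug, near_class, associated=True))   # about 0.25: pair at distance 0.5
    print(contrastive_loss(drug, far_class, associated=False))   # 0.0: already beyond the margin
```

The margin keeps unassociated drug-class pairs from being pushed apart indefinitely; once they are farther than the margin, they no longer contribute to the loss.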
Affiliation(s)
- Mehdi Paykan Heyrati
- Computational Biology Research Center (CBRC), Department of Mathematics and Computer Science, Amirkabir University of Technology, Tehran 1591634311, Iran
- Zahra Ghorbanali
- Computational Biology Research Center (CBRC), Department of Mathematics and Computer Science, Amirkabir University of Technology, Tehran 1591634311, Iran
- Mohammad Akbari
- Computational Biology Research Center (CBRC), Department of Mathematics and Computer Science, Amirkabir University of Technology, Tehran 1591634311, Iran
- Ghasem Pishgahi
- Students’ Scientific Research Center (SSRC), Tehran University of Medical Sciences, Tehran 1416753955, Iran
- Fatemeh Zare-Mirakabad
- Computational Biology Research Center (CBRC), Department of Mathematics and Computer Science, Amirkabir University of Technology, Tehran 1591634311, Iran
7. Yu Z, Wu Z, Zhou M, Cao K, Li W, Liu G, Tang Y. EDC-Predictor: A Novel Strategy for Prediction of Endocrine-Disrupting Chemicals by Integrating Pharmacological and Toxicological Profiles. Environ Sci Technol 2023; 57:18013-18025. [PMID: 37053516 DOI: 10.1021/acs.est.2c08558] [Citation(s) in RCA: 6] [Impact Index Per Article: 6.0] [Indexed: 06/19/2023]
Abstract
Identification of endocrine-disrupting chemicals (EDCs) is crucial for reducing human health risks. However, it is difficult because EDCs act through complex mechanisms. In this study, we propose a novel strategy named EDC-Predictor that integrates pharmacological and toxicological profiles to predict EDCs. Unlike conventional methods that focus on only a few nuclear receptors (NRs), EDC-Predictor considers more targets. It uses computational target profiles from network-based and machine learning-based methods to characterize compounds, including both EDCs and non-EDCs. The best model built on these target profiles outperformed models built on molecular fingerprints. In a case study predicting NR-related EDCs, EDC-Predictor showed a wider applicability domain and higher accuracy than four previous tools. Another case study further demonstrated that EDC-Predictor could predict EDCs targeting proteins other than NRs. Finally, a free web server was developed to make EDC prediction easier (http://lmmd.ecust.edu.cn/edcpred/). In summary, EDC-Predictor could be a powerful tool for EDC prediction and drug safety assessment.
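A computational target profile can be turned into a fixed-length feature vector by reading predicted interaction scores in a fixed target order, so that every compound is described on the same axes. The sketch below assumes each upstream predictor (network-based or machine learning-based) returns a dictionary from target name to score; the target panel, the scores, and the choice to concatenate the two profiles are hypothetical illustrations, not EDC-Predictor's documented implementation.

```python
from typing import Dict, List, Sequence


def target_profile(predicted: Dict[str, float], panel: Sequence[str]) -> List[float]:
    """Fixed-order vector of predicted interaction scores; 0.0 for targets not predicted."""
    return [predicted.get(target, 0.0) for target in panel]


def combined_profile(
    network_scores: Dict[str, float],
    ml_scores: Dict[str, float],
    panel: Sequence[str],
) -> List[float]:
    """Concatenate network-based and machine learning-based profiles into one feature vector."""
    return target_profile(network_scores, panel) + target_profile(ml_scores, panel)


if __name__ == "__main__":
    panel = ["ESR1", "AR", "NR3C1", "TPO"]  # hypothetical panel: three NRs plus one non-NR target
    network = {"ESR1": 0.82, "TPO": 0.10}   # hypothetical network-based scores
    ml = {"ESR1": 0.75, "AR": 0.31}         # hypothetical ML-based scores
    print(combined_profile(network, ml, panel))
```

Because the panel can include non-NR targets, a classifier trained on such vectors is not limited to the few nuclear receptors used by conventional methods.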
Affiliation(s)
- Zhuohang Yu
- Shanghai Frontiers Science Center of Optogenetic Techniques for Cell Metabolism, School of Pharmacy, East China University of Science and Technology, 130 Meilong Road, Shanghai 200237, China
- Zengrui Wu
- Shanghai Frontiers Science Center of Optogenetic Techniques for Cell Metabolism, School of Pharmacy, East China University of Science and Technology, 130 Meilong Road, Shanghai 200237, China
- Moran Zhou
- Shanghai Frontiers Science Center of Optogenetic Techniques for Cell Metabolism, School of Pharmacy, East China University of Science and Technology, 130 Meilong Road, Shanghai 200237, China
- Kangjia Cao
- Shanghai Frontiers Science Center of Optogenetic Techniques for Cell Metabolism, School of Pharmacy, East China University of Science and Technology, 130 Meilong Road, Shanghai 200237, China
- Weihua Li
- Shanghai Frontiers Science Center of Optogenetic Techniques for Cell Metabolism, School of Pharmacy, East China University of Science and Technology, 130 Meilong Road, Shanghai 200237, China
- Guixia Liu
- Shanghai Frontiers Science Center of Optogenetic Techniques for Cell Metabolism, School of Pharmacy, East China University of Science and Technology, 130 Meilong Road, Shanghai 200237, China
- Yun Tang
- Shanghai Frontiers Science Center of Optogenetic Techniques for Cell Metabolism, School of Pharmacy, East China University of Science and Technology, 130 Meilong Road, Shanghai 200237, China
8. Fan F, Wu G, Yang Y, Liu F, Qian Y, Yu Q, Ren H, Geng J. A Graph Neural Network Model with a Transparent Decision-Making Process Defines the Applicability Domain for Environmental Estrogen Screening. Environ Sci Technol 2023; 57:18236-18245. [PMID: 37749748 DOI: 10.1021/acs.est.3c04571] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 09/27/2023]
Abstract
The application of deep learning (DL) models to screen environmental estrogens (EEs) for the sound management of chemicals has garnered significant attention. However, the currently available DL model for screening EEs lacks both a transparent decision-making process and effective applicability domain (AD) characterization, which makes the reliability of its predictions uncertain and limits its practical application. To address this issue, a graph neural network (GNN) model was developed to screen EEs, achieving accuracy rates of 88.9% and 92.5% on the internal and external test sets, respectively. The decision-making process of the GNN model was explored through network-like similarity graphs (NSGs) based on the model features (FT). We found that prediction accuracy depends on the feature distribution of compounds in the NSGs. An AD characterization method called ADFT was proposed that excludes predictions falling outside the model's prediction range, leading to a 15% improvement in the F1 score of the GNN model. The GNN model with this AD method may serve as an efficient tool for screening EEs, and it identified 800 potential EEs in the Inventory of Existing Chemical Substances of China. Additionally, this study offers new insights into the decision-making process of DL models.
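ADFT itself is built on the NSG feature distribution described above and is not reproduced here. As a generic stand-in, a distance-based applicability domain flags a query as out of domain when its mean distance to its k nearest training compounds in model-feature space exceeds a threshold derived from the training set. The value of k, the mean-plus-z-standard-deviations rule, and all function names below are assumptions; the 15% F1 gain reported in the abstract belongs to ADFT, not to this sketch.

```python
import math
import statistics
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def knn_mean_distance(x: Sequence[float], train: Sequence[Point], k: int) -> float:
    """Mean Euclidean distance from x to its k nearest training points."""
    dists = sorted(math.dist(x, t) for t in train)
    k = min(k, len(dists))
    return sum(dists[:k]) / k


def ad_threshold(train: List[Point], k: int = 3, z: float = 1.0) -> float:
    """Mean + z * std of leave-one-out kNN distances within the training set."""
    if len(train) < 2:
        raise ValueError("need at least two training points")
    loo = [knn_mean_distance(t, train[:i] + train[i + 1:], k) for i, t in enumerate(train)]
    return statistics.mean(loo) + z * statistics.pstdev(loo)


def keep_in_domain(
    predictions: List[str], features: List[Point], train: List[Point], k: int = 3, z: float = 1.0
) -> List[str]:
    """Return only the predictions whose feature vectors fall inside the domain."""
    limit = ad_threshold(train, k, z)
    return [p for p, f in zip(predictions, features) if knn_mean_distance(f, train, k) <= limit]


if __name__ == "__main__":
    # Hypothetical 2-D model features for four training compounds
    train = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (1.0, 1.0)]
    print(keep_in_domain(["EE", "non-EE"], [(0.5, 0.5), (5.0, 5.0)], train, k=1))
```

The query at (0.5, 0.5) sits inside the training cloud and keeps its prediction; the one at (5.0, 5.0) is far from every training compound, so its prediction is withheld rather than reported with false confidence.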
Affiliation(s)
- Fan Fan
- State Key Laboratory of Pollution Control and Resource Reuse, School of the Environment, Nanjing University, Nanjing 210023, Jiangsu, P. R. China
- Gang Wu
- State Key Laboratory of Pollution Control and Resource Reuse, School of the Environment, Nanjing University, Nanjing 210023, Jiangsu, P. R. China
- Yining Yang
- School of Life Sciences, Tsinghua University, Beijing 100084, China
- Fu Liu
- State Key Laboratory of Pollution Control and Resource Reuse, School of the Environment, Nanjing University, Nanjing 210023, Jiangsu, P. R. China
- Yuli Qian
- State Key Laboratory of Pollution Control and Resource Reuse, School of the Environment, Nanjing University, Nanjing 210023, Jiangsu, P. R. China
- Qingmiao Yu
- State Key Laboratory of Pollution Control and Resource Reuse, School of the Environment, Nanjing University, Nanjing 210023, Jiangsu, P. R. China
- Key Laboratory of the Three Gorges Reservoir Region's Eco-Environment, Ministry of Education, Chongqing University, Chongqing 400044, China
- Hongqiang Ren
- State Key Laboratory of Pollution Control and Resource Reuse, School of the Environment, Nanjing University, Nanjing 210023, Jiangsu, P. R. China
- Jinju Geng
- State Key Laboratory of Pollution Control and Resource Reuse, School of the Environment, Nanjing University, Nanjing 210023, Jiangsu, P. R. China
- Key Laboratory of the Three Gorges Reservoir Region's Eco-Environment, Ministry of Education, Chongqing University, Chongqing 400044, China
9. van Heerden A, Turon G, Duran-Frigola M, Pillay N, Birkholtz LM. Machine Learning Approaches Identify Chemical Features for Stage-Specific Antimalarial Compounds. ACS Omega 2023; 8:43813-43826. [PMID: 38027377 PMCID: PMC10666252 DOI: 10.1021/acsomega.3c05664] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/02/2023] [Revised: 10/18/2023] [Accepted: 10/20/2023] [Indexed: 12/01/2023]
Abstract
Efficacy data from diverse chemical libraries screened against the various stages of the malaria parasite Plasmodium falciparum, including asexual blood stage (ABS) parasites and transmissible gametocytes, serve as a valuable reservoir of information on the chemical space of compounds that are active (or inactive) against the parasite. We postulated that these data could be mined to define chemical features associated with ABS-only activity and/or with additional life cycle activity profiles such as gametocytocidal activity. Additionally, this information could reveal chemical features associated with inactive compounds, which could avoid unnecessary future screening of similar chemical analogs. Therefore, we aimed to use machine learning to identify the chemical space associated with stage-specific antimalarial activity. We collected data from various chemical libraries screened against the asexual (126 374 compounds) and sexual (gametocyte; 93 941 compounds) stages of the parasite, calculated the compounds' molecular fingerprints, and trained machine learning models to recognize stage-specific active and inactive compounds. We built several models that predict compound activity against ABS and dual activity against ABS and gametocytes, with Support Vector Machines (SVM) performing best: high recall (90 and 66%) and low false-positive predictions (15 and 1%). This allowed the identification of chemical features enriched in active and inactive populations, an important outcome that could be mined for essential chemical features to streamline hit-to-lead optimization of antimalarial candidates. The predictive capabilities of the models held true in diverse chemical spaces, indicating that the ML models are robust and can serve as a prioritization tool to drive and guide phenotypic screening and medicinal chemistry programs.
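The abstract reports recall and "false-positive predictions" without defining the latter, which could mean either the false-positive rate or the false discovery rate. The sketch below computes all three from confusion-matrix counts so the difference is visible; the counts in the usage are hypothetical and chosen only to make the arithmetic easy to follow, and `Confusion` and `_ratio` are illustrative names.

```python
from dataclasses import dataclass


def _ratio(num: int, den: int) -> float:
    """Safe division: 0.0 when the denominator is zero."""
    return num / den if den else 0.0


@dataclass
class Confusion:
    tp: int  # actives predicted active
    fp: int  # inactives predicted active
    tn: int  # inactives predicted inactive
    fn: int  # actives predicted inactive

    def recall(self) -> float:
        """Share of true actives recovered by the model."""
        return _ratio(self.tp, self.tp + self.fn)

    def false_positive_rate(self) -> float:
        """Share of true inactives wrongly flagged as active."""
        return _ratio(self.fp, self.fp + self.tn)

    def false_discovery_rate(self) -> float:
        """Share of predicted actives that are actually inactive."""
        return _ratio(self.fp, self.fp + self.tp)


if __name__ == "__main__":
    c = Confusion(tp=90, fp=15, tn=85, fn=10)  # hypothetical counts
    print(c.recall(), c.false_positive_rate(), round(c.false_discovery_rate(), 3))
```

With these counts the script prints `0.9 0.15 0.143`: the same 15 false positives give a 15% false-positive rate but a 14.3% false discovery rate, so the two readings of "false-positive predictions" diverge as class balance changes.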
Affiliation(s)
- Ashleigh van Heerden
- Department of Biochemistry, Genetics and Microbiology, Institute for Sustainable Malaria Control, University of Pretoria, Private Bag X20, Hatfield 0028, South Africa
- Gemma Turon
- Ersilia Open Source Initiative, 28 Belgrave Road, Cambridge CB1 3DE, U.K.
- Nelishia Pillay
- Department of Computer Science, University of Pretoria, Private Bag X20, Hatfield 0028, South Africa
- Lyn-Marié Birkholtz
- Department of Biochemistry, Genetics and Microbiology, Institute for Sustainable Malaria Control, University of Pretoria, Private Bag X20, Hatfield 0028, South Africa
10. Mullowney MW, Duncan KR, Elsayed SS, Garg N, van der Hooft JJJ, Martin NI, Meijer D, Terlouw BR, Biermann F, Blin K, Durairaj J, Gorostiola González M, Helfrich EJN, Huber F, Leopold-Messer S, Rajan K, de Rond T, van Santen JA, Sorokina M, Balunas MJ, Beniddir MA, van Bergeijk DA, Carroll LM, Clark CM, Clevert DA, Dejong CA, Du C, Ferrinho S, Grisoni F, Hofstetter A, Jespers W, Kalinina OV, Kautsar SA, Kim H, Leao TF, Masschelein J, Rees ER, Reher R, Reker D, Schwaller P, Segler M, Skinnider MA, Walker AS, Willighagen EL, Zdrazil B, Ziemert N, Goss RJM, Guyomard P, Volkamer A, Gerwick WH, Kim HU, Müller R, van Wezel GP, van Westen GJP, Hirsch AKH, Linington RG, Robinson SL, Medema MH. Artificial intelligence for natural product drug discovery. Nat Rev Drug Discov 2023; 22:895-916. [PMID: 37697042 DOI: 10.1038/s41573-023-00774-7] [Citation(s) in RCA: 33] [Impact Index Per Article: 33.0] [Accepted: 07/20/2023] [Indexed: 09/13/2023]
Abstract
Developments in computational omics technologies have provided new means to access the hidden diversity of natural products, unearthing new potential for drug discovery. In parallel, artificial intelligence approaches such as machine learning have led to exciting developments in the computational drug design field, facilitating biological activity prediction and de novo drug design for molecular targets of interest. Here, we describe current and future synergies between these developments to effectively identify drug candidates from the plethora of molecules produced by nature. We also discuss how to address key challenges in realizing the potential of these synergies, such as the need for high-quality datasets to train deep learning algorithms and appropriate strategies for algorithm validation.
Affiliation(s)
| | - Katherine R Duncan
- Strathclyde Institute of Pharmacy and Biomedical Sciences, University of Strathclyde, Glasgow, UK
| | - Somayah S Elsayed
- Department of Molecular Biotechnology, Institute of Biology, Leiden University, Leiden, The Netherlands
| | - Neha Garg
- School of Chemistry and Biochemistry, Center for Microbial Dynamics and Infection, Georgia Institute of Technology, Atlanta, GA, USA
| | - Justin J J van der Hooft
- Bioinformatics Group, Wageningen University, Wageningen, The Netherlands
- Department of Biochemistry, University of Johannesburg, Johannesburg, South Africa
| | - Nathaniel I Martin
- Biological Chemistry Group, Institute of Biology, Leiden University, Leiden, The Netherlands
| | - David Meijer
- Bioinformatics Group, Wageningen University, Wageningen, The Netherlands
| | - Barbara R Terlouw
- Bioinformatics Group, Wageningen University, Wageningen, The Netherlands
| | - Friederike Biermann
- Bioinformatics Group, Wageningen University, Wageningen, The Netherlands
- Institute of Molecular Bio Science, Goethe-University Frankfurt, Frankfurt am Main, Germany
- LOEWE Center for Translational Biodiversity Genomics (TBG), Frankfurt am Main, Germany
| | - Kai Blin
- The Novo Nordisk Foundation Center for Biosustainability, Technical University of Denmark, Kongens Lyngby, Denmark
| | | | - Marina Gorostiola González
- Drug Discovery and Safety, Leiden Academic Centre for Drug Research, Leiden, The Netherlands
- ONCODE institute, Leiden, The Netherlands
| | - Eric J N Helfrich
- Institute of Molecular Bio Science, Goethe-University Frankfurt, Frankfurt am Main, Germany
- LOEWE Center for Translational Biodiversity Genomics (TBG), Frankfurt am Main, Germany
| | - Florian Huber
- Center for Digitalization and Digitality, Hochschule Düsseldorf, Düsseldorf, Germany
| | - Stefan Leopold-Messer
- Institut für Mikrobiologie, Eidgenössische Technische Hochschule (ETH) Zürich, Zürich, Switzerland
| | - Kohulan Rajan
- Institute for Inorganic and Analytical Chemistry, Friedrich-Schiller-University Jena, Jena, Germany
| | - Tristan de Rond
- School of Chemical Sciences, University of Auckland, Auckland, New Zealand
| | - Jeffrey A van Santen
- Department of Chemistry, Simon Fraser University, Burnaby, British Columbia, Canada
| | - Maria Sorokina
- Institute for Inorganic and Analytical Chemistry, Friedrich-Schiller University, Jena, Germany
- Pharmaceuticals R&D, Bayer AG, Berlin, Germany
- Marcy J Balunas
- Department of Microbiology and Immunology, University of Michigan, Ann Arbor, MI, USA
- Department of Medicinal Chemistry, University of Michigan, Ann Arbor, MI, USA
- Mehdi A Beniddir
- Équipe "Chimie des Substances Naturelles", Université Paris-Saclay, CNRS, BioCIS, Orsay, France
- Doris A van Bergeijk
- Department of Molecular Biotechnology, Institute of Biology, Leiden University, Leiden, The Netherlands
- Laura M Carroll
- Structural and Computational Biology Unit, EMBL, Heidelberg, Germany
- Chase M Clark
- Division of Pharmaceutical Sciences, School of Pharmacy, University of Wisconsin-Madison, Madison, WI, USA
- Chao Du
- Department of Molecular Biotechnology, Institute of Biology, Leiden University, Leiden, The Netherlands
- Francesca Grisoni
- Institute for Complex Molecular Systems, Department of Biomedical Engineering, Eindhoven University of Technology, Eindhoven, The Netherlands
- Centre for Living Technologies, Alliance TU/e, WUR, UU, UMC Utrecht, Utrecht, The Netherlands
- Willem Jespers
- Drug Discovery and Safety, Leiden Academic Centre for Drug Research, Leiden, The Netherlands
- Olga V Kalinina
- Helmholtz Institute for Pharmaceutical Research Saarland (HIPS), Helmholtz Centre for Infection Research (HZI), Saarbrücken, Germany
- Drug Bioinformatics, Medical Faculty, Saarland University, Homburg, Germany
- Center for Bioinformatics, Saarland University, Saarbrücken, Germany
- Hyunwoo Kim
- College of Pharmacy and Integrated Research Institute for Drug Development, Dongguk University Seoul, Goyang-si, Republic of Korea
- Tiago F Leao
- Center for Nuclear Energy in Agriculture, University of São Paulo, Piracicaba, Brazil
- Joleen Masschelein
- Center for Microbiology, VIB-KU Leuven, Heverlee, Belgium
- Department of Biology, KU Leuven, Heverlee, Belgium
- Evan R Rees
- Division of Pharmaceutical Sciences, School of Pharmacy, University of Wisconsin-Madison, Madison, WI, USA
- Raphael Reher
- Institute of Pharmaceutical Biology and Biotechnology, University of Marburg, Marburg, Germany
- Institute of Pharmacy, Martin-Luther-University Halle-Wittenberg, Halle (Saale), Germany
- Daniel Reker
- Department of Biomedical Engineering, Duke University, Durham, NC, USA
- Duke Microbiome Center, Duke University, Durham, NC, USA
- Philippe Schwaller
- Laboratory of Artificial Chemical Intelligence, Institut des Sciences et Ingénierie Chimiques, Ecole Polytechnique Fédérale de Lausanne (EPFL), Lausanne, Switzerland
- Michael A Skinnider
- Adapsyn Bioscience, Hamilton, Ontario, Canada
- Michael Smith Laboratories, University of British Columbia, Vancouver, British Columbia, Canada
- Allison S Walker
- Department of Chemistry, Vanderbilt University, Nashville, TN, USA
- Department of Biological Sciences, Vanderbilt University, Nashville, TN, USA
- Egon L Willighagen
- Department of Bioinformatics - BiGCaT, NUTRIM, Maastricht University, Maastricht, The Netherlands
- Barbara Zdrazil
- European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Cambridgeshire, UK
- Nadine Ziemert
- Interfaculty Institute for Microbiology and Infection Medicine Tuebingen (IMIT), Institute for Bioinformatics and Medical Informatics (IBMI), University of Tuebingen, Tuebingen, Germany
- Pierre Guyomard
- Bonsai team, CRIStAL - Centre de Recherche en Informatique Signal et Automatique de Lille, Université de Lille, Villeneuve d'Ascq Cedex, France
- Andrea Volkamer
- Center for Bioinformatics, Saarland University, Saarbrücken, Germany
- In silico Toxicology and Structural Bioinformatics, Institute of Physiology, Charité - Universitätsmedizin Berlin, Berlin, Germany
- William H Gerwick
- Scripps Institution of Oceanography, University of California San Diego, La Jolla, CA, USA
- Hyun Uk Kim
- Department of Chemical and Biomolecular Engineering, Korea Advanced Institute of Science and Technology (KAIST), Daejeon, Korea
- Rolf Müller
- Helmholtz Institute for Pharmaceutical Research Saarland (HIPS), Helmholtz Centre for Infection Research (HZI), Saarbrücken, Germany
- Department of Pharmacy, Saarland University, Saarbrücken, Germany
- German Center for Infection Research (DZIF), Braunschweig, Germany
- Helmholtz International Lab for Anti-Infectives, Saarbrücken, Germany
- Gilles P van Wezel
- Department of Molecular Biotechnology, Institute of Biology, Leiden University, Leiden, The Netherlands
- Netherlands Institute of Ecology, NIOO-KNAW, Wageningen, The Netherlands
- Gerard J P van Westen
- Drug Discovery and Safety, Leiden Academic Centre for Drug Research, Leiden, The Netherlands
- Anna K H Hirsch
- Helmholtz Institute for Pharmaceutical Research Saarland (HIPS), Helmholtz Centre for Infection Research (HZI), Saarbrücken, Germany
- Department of Pharmacy, Saarland University, Saarbrücken, Germany
- German Center for Infection Research (DZIF), Braunschweig, Germany
- Helmholtz International Lab for Anti-Infectives, Saarbrücken, Germany
- Roger G Linington
- Department of Chemistry, Simon Fraser University, Burnaby, British Columbia, Canada
- Serina L Robinson
- Department of Environmental Microbiology, Eawag: Swiss Federal Institute for Aquatic Science and Technology, Dübendorf, Switzerland
- Marnix H Medema
- Bioinformatics Group, Wageningen University, Wageningen, The Netherlands
- Institute of Biology, Leiden University, Leiden, The Netherlands
11
von Hellfeld R, Gade C, Vargesson N, Hastings A. Considerations for future quantitative structure-activity relationship (QSAR) modelling for heavy metals - A case study of mercury. Toxicology 2023; 499:153661. [PMID: 37924932 DOI: 10.1016/j.tox.2023.153661] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/25/2023] [Revised: 10/16/2023] [Accepted: 10/28/2023] [Indexed: 11/06/2023]
Abstract
With increasing annual chemical development and production, safety testing demands and requirements have also increased. In addition to traditional animal testing, quantitative structure-activity relationship (QSAR) modelling can be used to predict the biological effect of a chemical structure, based on the analysis of quantitative characteristics of its structural features. Whilst this is suitable for, e.g., pharmaceuticals, other compounds can be more challenging to model. The naturally occurring heavy metal mercury speciates in the environment, with some toxic species accumulating in aquatic organisms. Although this is well known, only limited data are available from (eco)toxicological studies, none of which account for this speciation behaviour. The present work highlights the current toxicity data for mercury in aquatic animals, as well as the gaps in our understanding and in the data needed for future QSAR modelling. All publicly available ecotoxicology data were obtained from databases and the literature. Only a few studies that assessed mercury toxicity in aquatic species could be identified. For these, likely speciation products were determined using PHREEQC. This highlighted that the mercury species used for exposure was not always the predominant species in the medium. Finally, the descriptors for the modelled species were obtained from ChemDes, highlighting the limited availability of such details. Additional testing that accounts for speciation and biological interactions is required to determine the toxicity profile of different mercury species in aquatic environments. In the present work, insufficient mercury-species-specific data were obtained to conduct QSAR modelling successfully. This highlights a significant lack of data for a heavy metal with potentially fatal repercussions.
Affiliation(s)
- Rebecca von Hellfeld
- School of Biological Sciences, University of Aberdeen, Aberdeen, Scotland, United Kingdom; National Decommissioning Centre, Aberdeen, Scotland, United Kingdom
- Christoph Gade
- School of Biological Sciences, University of Aberdeen, Aberdeen, Scotland, United Kingdom; National Decommissioning Centre, Aberdeen, Scotland, United Kingdom
- Neil Vargesson
- School of Medicine, Medical Sciences and Nutrition, Institute of Medical Sciences, University of Aberdeen, Aberdeen, Scotland, United Kingdom
- Astley Hastings
- School of Biological Sciences, University of Aberdeen, Aberdeen, Scotland, United Kingdom; National Decommissioning Centre, Aberdeen, Scotland, United Kingdom
12
Turon G, Hlozek J, Woodland JG, Kumar A, Chibale K, Duran-Frigola M. First fully-automated AI/ML virtual screening cascade implemented at a drug discovery centre in Africa. Nat Commun 2023; 14:5736. [PMID: 37714843 PMCID: PMC10504240 DOI: 10.1038/s41467-023-41512-2] [Citation(s) in RCA: 9] [Impact Index Per Article: 9.0] [Received: 01/11/2023] [Accepted: 09/06/2023] [Indexed: 09/17/2023]
Abstract
Streamlined data-driven drug discovery remains challenging, especially in resource-limited settings. We present ZairaChem, an artificial intelligence (AI)- and machine learning (ML)-based tool for quantitative structure-activity/property relationship (QSAR/QSPR) modelling. ZairaChem is fully automated, requires low computational resources and works across a broad spectrum of datasets. We describe an end-to-end implementation at the H3D Centre, the leading integrated drug discovery unit in Africa, at which no prior AI/ML capabilities were available. By leveraging in-house data collected over a decade, we have developed a virtual screening cascade for malaria and tuberculosis drug discovery comprising 15 models for key decision-making assays ranging from whole-cell phenotypic screening and cytotoxicity to aqueous solubility, permeability, microsomal metabolic stability, cytochrome inhibition, and cardiotoxicity. We show how computational profiling of compounds, prior to synthesis and testing, can inform progression of frontrunner compounds at H3D. This project is a first-of-its-kind deployment at scale of AI/ML tools in a research centre operating in a low-resource setting.
Affiliation(s)
- Gemma Turon
- Ersilia Open Source Initiative, Cambridge, UK
- Jason Hlozek
- Department of Chemistry and Holistic Drug Discovery and Development (H3D) Centre, University of Cape Town, Cape Town, South Africa
- John G Woodland
- Department of Chemistry and Holistic Drug Discovery and Development (H3D) Centre, University of Cape Town, Cape Town, South Africa
- South African Medical Research Council Drug Discovery and Development Research Unit, Institute of Infectious Disease and Molecular Medicine, University of Cape Town, Cape Town, South Africa
- Ankur Kumar
- Ersilia Open Source Initiative, Cambridge, UK
- Kelly Chibale
- Department of Chemistry and Holistic Drug Discovery and Development (H3D) Centre, University of Cape Town, Cape Town, South Africa
- South African Medical Research Council Drug Discovery and Development Research Unit, Institute of Infectious Disease and Molecular Medicine, University of Cape Town, Cape Town, South Africa
13
Wang B, Guo J, Liu X, Yu Y, Wu J, Wang Y. Prediction of the effects of small molecules on the gut microbiome using machine learning method integrating with optimal molecular features. BMC Bioinformatics 2023; 24:338. [PMID: 37697256 PMCID: PMC10496404 DOI: 10.1186/s12859-023-05455-1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/10/2023] [Accepted: 08/25/2023] [Indexed: 09/13/2023]
Abstract
BACKGROUND The human gut microbiome (HGM), consisting of trillions of microorganisms, is crucial to human health. Adverse drug use is one of the most important causes of HGM disorder. Thus, it is necessary to identify drugs or compounds with anti-commensal effects on the HGM in the early drug discovery stage. This study proposes a novel anti-commensal effect classification using a machine learning method and optimal molecular features. To improve prediction performance, we explored combinations of six fingerprints and three descriptors to select the best characterization as molecular features. RESULTS The final consensus model based on optimal features yielded an F1-score of 0.725 ± 0.014, an accuracy (ACC) of 82.9 ± 0.7%, and an area under the ROC curve (AUC) of 0.791 ± 0.009 in five-fold cross-validation. In addition, this model outperformed prior studies that used the same algorithm. Furthermore, the important chemical descriptors and misclassified anti-commensal compounds are analyzed to better understand and interpret the model. Finally, seven structural alerts responsible for the chemical anti-commensal effect are identified, providing valuable information for drug design. CONCLUSION Our study may provide a promising tool for screening anti-commensal compounds in the early stage of drug discovery and for assessing the potential risks of these drugs in vivo.
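The metrics reported above (F1-score, accuracy and AUC as mean ± standard deviation over five folds) can be illustrated with a minimal, dependency-free Python sketch. It is not the authors' code: the fold labels and scores are hypothetical, the 0.5 decision threshold is an assumption, and AUC is computed with the rank-based (Mann-Whitney) formulation.

```python
"""Cross-validation metrics reported as mean ± standard deviation.

Illustrative sketch, not the authors' pipeline: fold labels and scores are
hypothetical, the 0.5 decision threshold is an assumption, and each fold is
assumed to contain both classes (otherwise AUC is undefined).
"""
from statistics import mean, stdev


def f1_and_accuracy(y_true, y_pred):
    """Return (F1, accuracy) for binary labels, with 1 = anti-commensal."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    f1 = 2 * tp / (2 * tp + fp + fn) if tp else 0.0
    accuracy = sum(1 for t, p in pairs if t == p) / len(pairs)
    return f1, accuracy


def roc_auc(y_true, scores):
    """Rank-based AUC: chance a random positive outscores a random negative (ties = 0.5)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def summarise(folds, threshold=0.5):
    """folds: list of (y_true, scores) pairs; returns {metric: (mean, sd)}."""
    per_fold = []
    for y_true, scores in folds:
        y_pred = [1 if s >= threshold else 0 for s in scores]
        f1, acc = f1_and_accuracy(y_true, y_pred)
        per_fold.append((f1, acc, roc_auc(y_true, scores)))
    # zip(*per_fold) regroups the per-fold tuples into one series per metric.
    return {name: (mean(vals), stdev(vals))
            for name, vals in zip(("F1", "ACC", "AUC"), zip(*per_fold))}


# Two hypothetical folds for brevity (the study used five).
folds = [
    ([1, 0, 1, 0], [0.9, 0.1, 0.4, 0.4]),
    ([1, 1, 0, 0], [0.8, 0.6, 0.3, 0.7]),
]
for metric, (m, sd) in summarise(folds).items():
    print(f"{metric}: {m:.3f} ± {sd:.3f}")
```

Reporting the standard deviation across folds, as the abstract does, conveys how sensitive each metric is to the particular train/test split.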
Affiliation(s)
- Binyou Wang
- School of Basic Medical Sciences, Southwest Medical University, Luzhou, 646000, China
- School of Pharmacy, Southwest Medical University, Luzhou, 646000, China
- Jianmin Guo
- School of Basic Medical Sciences, Southwest Medical University, Luzhou, 646000, China
- Xiaofeng Liu
- School of Basic Medical Sciences, Southwest Medical University, Luzhou, 646000, China
- Yang Yu
- School of Basic Medical Sciences, Southwest Medical University, Luzhou, 646000, China
- Key Laboratory of Medical Electrophysiology, Ministry of Education and Medical Electrophysiological Key Laboratory of Sichuan Province, Institute of Cardiovascular Research, Southwest Medical University, Luzhou, 646000, China
- Jianming Wu
- School of Basic Medical Sciences, Southwest Medical University, Luzhou, 646000, China
- School of Pharmacy, Southwest Medical University, Luzhou, 646000, China
- Key Laboratory of Medical Electrophysiology, Ministry of Education and Medical Electrophysiological Key Laboratory of Sichuan Province, Institute of Cardiovascular Research, Southwest Medical University, Luzhou, 646000, China
- Sichuan Key Medical Laboratory of New Drug Discovery and Druggability Evaluation, Luzhou Key Laboratory of Activity Screening and Druggability Evaluation for Chinese Materia Medica, School of Pharmacy, Southwest Medical University, Luzhou, 646000, China
- Yiwei Wang
- School of Basic Medical Sciences, Southwest Medical University, Luzhou, 646000, China
- School of Pharmacy, Southwest Medical University, Luzhou, 646000, China
- Key Laboratory of Medical Electrophysiology, Ministry of Education and Medical Electrophysiological Key Laboratory of Sichuan Province, Institute of Cardiovascular Research, Southwest Medical University, Luzhou, 646000, China
14
Béquignon OM, Gómez-Tamayo JC, Lenselink EB, Wink S, Hiemstra S, Lam CC, Gadaleta D, Roncaglioni A, Norinder U, Water BVD, Pastor M, van Westen GJP. Collaborative SAR Modeling and Prospective In Vitro Validation of Oxidative Stress Activation in Human HepG2 Cells. J Chem Inf Model 2023; 63:5433-5445. [PMID: 37616385 PMCID: PMC10498489 DOI: 10.1021/acs.jcim.3c00220] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/11/2023] [Indexed: 08/26/2023]
Abstract
Oxidative stress is the consequence of an abnormal increase in reactive oxygen species (ROS). ROS are generated mainly during metabolism, in both normal and pathological conditions, as well as from exposure to xenobiotics. Xenobiotics can, on the one hand, disrupt molecular machinery involved in redox processes and, on the other hand, reduce the effectiveness of antioxidant activity. Such dysregulation may lead to oxidative damage when combined with oxidative stress that exceeds the cell's capacity to detoxify ROS. In this work, a green fluorescent protein (GFP)-tagged, nuclear factor erythroid 2-related factor 2 (NRF2)-regulated sulfiredoxin reporter (Srxn1-GFP) was used to measure the antioxidant response of HepG2 cells to a large series of drugs and drug-like compounds (2230 compounds). These compounds were then classified as positive or negative depending on the cellular response and distributed among different modeling groups to establish structure-activity relationship (SAR) models. A selection of models was used to prospectively predict oxidative stress induced by a new set of compounds, which were subsequently tested experimentally to validate the model predictions. Altogether, this exercise exemplifies the challenges of developing SAR models for a phenotypic cellular readout, including model combination, chemical space selection, and interpretation of results.
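Combining predictions from several independently built SAR models is one of the challenges named above. The sketch below shows one simple option, a majority vote over binary calls; it is an illustrative assumption, not necessarily how the collaborating groups combined their models, and the model names and predictions are hypothetical.

```python
"""Majority-vote consensus over binary calls from several SAR models.

Illustrative assumption only: the combination rule, model names and
predictions are hypothetical (1 = oxidative-stress positive, 0 = negative).
"""
from typing import Dict, List


def consensus(predictions: Dict[str, List[int]], min_share: float = 0.5) -> List[int]:
    """Call a compound positive when strictly more than `min_share` of models vote 1.

    With an even number of models, a tie (exactly half) is called negative.
    """
    models = list(predictions.values())
    n_compounds = len(models[0])
    if any(len(m) != n_compounds for m in models):
        raise ValueError("every model must score the same compounds")
    calls = []
    for i in range(n_compounds):
        share = sum(m[i] for m in models) / len(models)
        calls.append(1 if share > min_share else 0)
    return calls


# Hypothetical predictions for four compounds from three models.
preds = {
    "model_a": [1, 0, 1, 0],
    "model_b": [1, 1, 1, 0],
    "model_c": [1, 0, 0, 1],
}
print(consensus(preds))  # -> [1, 0, 1, 0]
```

Raising `min_share` trades sensitivity for specificity; whether a cautious (high-agreement) or permissive rule suits a safety readout is a project-level decision.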
Affiliation(s)
- Olivier J. M. Béquignon
- Leiden Academic Centre for Drug Research, Leiden University, Wassenaarseweg 76, 2333 AL Leiden, The Netherlands
- Jose C. Gómez-Tamayo
- Research Programme on Biomedical Informatics (GRIB), Department of Medicine and Life Sciences, Hospital del Mar Medical Research Institute, Universitat Pompeu Fabra, Carrer del Dr. Aiguader 88, 08002 Barcelona, Spain
- Eelke B. Lenselink
- Leiden Academic Centre for Drug Research, Leiden University, Wassenaarseweg 76, 2333 AL Leiden, The Netherlands
- Steven Wink
- Leiden Academic Centre for Drug Research, Leiden University, Wassenaarseweg 76, 2333 AL Leiden, The Netherlands
- Steven Hiemstra
- Leiden Academic Centre for Drug Research, Leiden University, Wassenaarseweg 76, 2333 AL Leiden, The Netherlands
- Chi Chung Lam
- Leiden Academic Centre for Drug Research, Leiden University, Wassenaarseweg 76, 2333 AL Leiden, The Netherlands
- Domenico Gadaleta
- Laboratory of Environmental Chemistry and Toxicology, Department of Environmental Health Sciences, IRCCS - Istituto di Ricerche Farmacologiche Mario Negri, Via la Masa 19, 20156 Milano, Italy
- Alessandra Roncaglioni
- Laboratory of Environmental Chemistry and Toxicology, Department of Environmental Health Sciences, IRCCS - Istituto di Ricerche Farmacologiche Mario Negri, Via la Masa 19, 20156 Milano, Italy
- Ulf Norinder
- MTM Research Centre, School of Science and Technology, Örebro University, SE-70182 Örebro, Sweden
- Bob van de Water
- Leiden Academic Centre for Drug Research, Leiden University, Wassenaarseweg 76, 2333 AL Leiden, The Netherlands
- Manuel Pastor
- Research Programme on Biomedical Informatics (GRIB), Department of Medicine and Life Sciences, Hospital del Mar Medical Research Institute, Universitat Pompeu Fabra, Carrer del Dr. Aiguader 88, 08002 Barcelona, Spain
- Gerard J. P. van Westen
- Leiden Academic Centre for Drug Research, Leiden University, Wassenaarseweg 76, 2333 AL Leiden, The Netherlands
15
Mushtaq M, Usmani S, Jabeen A, Nur-E-Alam M, Ahmed S, Ahmad A, Ul-Haq Z. Identification of potent anti-immunogenic agents through virtual screening, 3D-QSAR studies, and in vitro experiments. Mol Divers 2023:10.1007/s11030-023-10709-4. [PMID: 37550601 DOI: 10.1007/s11030-023-10709-4] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/20/2023] [Accepted: 07/30/2023] [Indexed: 08/09/2023]
Abstract
A wealth of literature has highlighted the discovery of various immune modulators that are frequently used in clinical practice yet are associated with numerous drawbacks. In light of this pharmacological deficiency, medical scientists are motivated to develop new immune modulators with minimized adverse effects and improved therapeutic potential. T-cell differentiation and growth are central to human defense and are regulated by interleukin-2 (IL-2), an immunomodulatory cytokine. However, investigation of IL-2 as a target is hindered by its flat binding site and widely distributed hotspot residues. In this regard, a rapid and rational investigation guided by integrated computational techniques was undertaken to uncover new potential leads against IL-2. In particular, a combination of score-based and pharmacophore-based virtual screening approaches was employed, reducing the data from millions of small molecules to a manageable number. Subsequent docking and 3D-QSAR prediction via comparative molecular field analysis (CoMFA) further helped remove false positives. The reliability of the model was assessed via standard metrics describing the model's fit and its robustness in predicting the activity of new compounds. This extensive virtual screening led to the identification of 24 leads with potential anti-IL-2 activity. Furthermore, the theoretical findings were corroborated by in vitro testing, further supporting the anti-inflammatory potential of the identified leads.
Affiliation(s)
- Mamona Mushtaq
- Dr. Panjwani Center for Molecular Medicine and Drug Research, ICCBS, University of Karachi, Karachi, 75270, Pakistan
- Saman Usmani
- Dr. Panjwani Center for Molecular Medicine and Drug Research, ICCBS, University of Karachi, Karachi, 75270, Pakistan
- Almas Jabeen
- Dr. Panjwani Center for Molecular Medicine and Drug Research, ICCBS, University of Karachi, Karachi, 75270, Pakistan
- Mohammad Nur-E-Alam
- Department of Pharmacognosy, College of Pharmacy, King Saud University, P.O. Box 2457, Riyadh, 11451, Kingdom of Saudi Arabia
- Sarfaraz Ahmed
- Department of Pharmacognosy, College of Pharmacy, King Saud University, P.O. Box 2457, Riyadh, 11451, Kingdom of Saudi Arabia
- Aftab Ahmad
- Department of Biomedical and Pharmaceutical Sciences, Chapman University School of Pharmacy, Irvine, CA, 92618, USA
- Zaheer Ul-Haq
- Dr. Panjwani Center for Molecular Medicine and Drug Research, ICCBS, University of Karachi, Karachi, 75270, Pakistan
16
Mittal A, Ahuja G. Advancing chemical carcinogenicity prediction modeling: opportunities and challenges. Trends Pharmacol Sci 2023; 44:400-410. [PMID: 37183054 DOI: 10.1016/j.tips.2023.04.002] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/14/2023] [Revised: 04/11/2023] [Accepted: 04/18/2023] [Indexed: 05/16/2023]
Abstract
Carcinogenicity assessment of any compound is a laborious and expensive exercise with several associated ethical and practical concerns. While artificial intelligence (AI) offers promising solutions, it faces several challenges, including the inadequacy of available experimentally validated (non)carcinogen datasets and variability within bioassays, both of which compromise model training. Existing AI solutions that leverage classical chemistry-driven descriptors do not provide adequate biological interpretability of the mechanisms that impart carcinogenicity. This highlights the urgency of devising alternative AI strategies. We propose multiple strategies, including implementing data-driven (integrated databases) and known carcinogen-characteristic-derived features, to overcome these apparent shortcomings. In summary, these next-generation approaches will continue to facilitate robust chemical carcinogenicity prediction, concomitant with deeper mechanistic insights.
Affiliation(s)
- Aayushi Mittal
- Department of Computational Biology, Indraprastha Institute of Information Technology-Delhi (IIIT-Delhi), Okhla, Phase III, New Delhi, 110020, India
- Gaurav Ahuja
- Department of Computational Biology, Indraprastha Institute of Information Technology-Delhi (IIIT-Delhi), Okhla, Phase III, New Delhi, 110020, India
17
Xu JY, Wang K, Men SH, Yang Y, Zhou Q, Yan ZG. QSAR-QSIIR-based prediction of bioconcentration factor using machine learning and preliminary application. Environ Int 2023; 177:108003. [PMID: 37276762 DOI: 10.1016/j.envint.2023.108003] [Citation(s) in RCA: 3] [Impact Index Per Article: 3.0] [Received: 02/04/2023] [Revised: 05/25/2023] [Accepted: 05/29/2023] [Indexed: 06/07/2023]
Abstract
Bioconcentration factor (BCF) is one of the important parameters for developing human health ambient water quality criteria (HHAWQC) for chemical pollutants. The traditional experimental method for obtaining BCF is time-consuming and costly, so prediction of BCF by modeling has attracted much attention. QSAR (Quantitative Structure-Activity Relationship) models based on molecular descriptors are often used to predict BCF; however, to improve prediction accuracy, previous models were applicable only to a single category of substance and a single species, and could not meet the need for BCF prediction of pollutants lacking toxicity data. In this study, 17 optimized traditional molecular descriptors and five kinds of bioactivity descriptors were selected from more than 200 molecular descriptors and 25 kinds of bioactivity descriptors. A QSAR-QSIIR (Quantitative Structure In vitro-In vivo Relationship) model suitable for multiple chemical substances and across species was constructed using an optimized 4-MLP machine learning algorithm with the selected molecular and bioactivity descriptors. The constructed model significantly improves the prediction accuracy of BCF. The R² values of the validation set and test set are 0.8575 and 0.7924, respectively, and predicted BCF values mostly differ from measured BCF by less than 1.5-fold. The constructed QSAR-QSIIR model was then used to predict the BCF of BTEX (benzene, toluene, ethylbenzene, and xylenes) in common Chinese aquatic products, and the HHAWQC of BTEX in China were derived from the predicted BCF, providing a valuable reference for establishing China's BTEX water quality standards.
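The two performance figures quoted above, R² and the share of predictions within 1.5-fold of the measurement, can be computed as in this minimal sketch. The BCF values are hypothetical, BCF is assumed to be strictly positive, and whether R² is computed on raw or log-transformed BCF is a choice the abstract does not state (the example uses log10, a common convention for BCF models).

```python
"""R² and fold-error summary for predicted vs. measured BCF values.

Minimal sketch with hypothetical values; BCF is assumed strictly positive.
"""
import math
from typing import Sequence


def r_squared(measured: Sequence[float], predicted: Sequence[float]) -> float:
    """Coefficient of determination, 1 - SS_res / SS_tot."""
    mean_obs = sum(measured) / len(measured)
    ss_res = sum((m - p) ** 2 for m, p in zip(measured, predicted))
    ss_tot = sum((m - mean_obs) ** 2 for m in measured)
    return 1.0 - ss_res / ss_tot


def fraction_within_fold(measured, predicted, fold: float = 1.5) -> float:
    """Share of compounds whose prediction lies within `fold` times the measurement."""
    # max(p/m, m/p) is the fold error, symmetric for over- and under-prediction.
    hits = sum(1 for m, p in zip(measured, predicted) if max(p / m, m / p) <= fold)
    return hits / len(measured)


measured_bcf = [10.0, 100.0, 1000.0, 50.0]   # hypothetical
predicted_bcf = [12.0, 80.0, 2500.0, 50.0]   # hypothetical
log_r2 = r_squared([math.log10(v) for v in measured_bcf],
                   [math.log10(v) for v in predicted_bcf])
print(round(log_r2, 3))
print(fraction_within_fold(measured_bcf, predicted_bcf))  # -> 0.75
```

A single large miss (here 2500 vs. 1000) dominates R² on the raw scale, which is one reason fold-error summaries are often reported alongside it.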
Affiliation(s)
- Jia-Yun Xu
- State Key Laboratory of Environmental Criteria and Risk Assessment, Chinese Research Academy of Environmental Sciences, Beijing 100012, China
- Kun Wang
- National Engineering Laboratory for Lake Pollution Control and Ecological Restoration, State Environment Protection Key Laboratory for Lake Pollution Control, Chinese Research Academy of Environmental Sciences, Beijing 100012, China
- Shu-Hui Men
- State Key Laboratory of Environmental Criteria and Risk Assessment, Chinese Research Academy of Environmental Sciences, Beijing 100012, China
- Yang Yang
- China Energy Longyuan Environmental Protection Co., Ltd., Beijing 100039, China
- Quan Zhou
- State Key Laboratory of Environmental Criteria and Risk Assessment, Chinese Research Academy of Environmental Sciences, Beijing 100012, China
- Zhen-Guang Yan
- State Key Laboratory of Environmental Criteria and Risk Assessment, Chinese Research Academy of Environmental Sciences, Beijing 100012, China
18
Emonts J, Buyel J. An overview of descriptors to capture protein properties - Tools and perspectives in the context of QSAR modeling. Comput Struct Biotechnol J 2023; 21:3234-3247. [PMID: 38213891 PMCID: PMC10781719 DOI: 10.1016/j.csbj.2023.05.022] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Received: 03/13/2023] [Revised: 05/23/2023] [Accepted: 05/23/2023] [Indexed: 01/13/2024]
Abstract
Proteins are important ingredients in food and feed, they are the active components of many pharmaceutical products, and they are necessary, in the form of enzymes, for the success of many technical processes. However, production can be challenging, especially when using heterologous host cells such as bacteria to express and assemble recombinant mammalian proteins. The manufacturability of proteins can be hindered by low solubility, a tendency to aggregate, or inefficient purification. Tools such as in silico protein engineering and models that predict separation criteria can overcome these issues but usually require the complex shape and surface properties of proteins to be represented by a small number of quantitative numeric values known as descriptors, similar to those used to capture the features of small molecules. Here, we review the current status of protein descriptors, especially for application in quantitative structure-activity relationship (QSAR) models. First, we describe the complexity of proteins and the properties that descriptors must accommodate. Then we introduce descriptors of shape and surface properties that quantify the global and local features of proteins. Finally, we highlight the current limitations of protein descriptors and propose strategies for deriving novel, more informative protein descriptors.
Affiliation(s)
- J. Emonts
- Fraunhofer Institute for Molecular Biology and Applied Ecology IME, Germany
- J.F. Buyel
- University of Natural Resources and Life Sciences, Vienna (BOKU), Department of Biotechnology (DBT), Institute of Bioprocess Science and Engineering (IBSE), Muthgasse 18, 1190 Vienna, Austria
- Institute for Molecular Biotechnology, Worringerweg 1, RWTH Aachen University, 52074 Aachen, Germany
19
An S, Hwang SY, Gong J, Ahn S, Park IG, Oh S, Chin YW, Noh M. Computational Prediction of the Phenotypic Effect of Flavonoids on Adiponectin Biosynthesis. J Chem Inf Model 2023; 63:856-869. [PMID: 36716271 DOI: 10.1021/acs.jcim.3c00033] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Indexed: 02/01/2023]
Abstract
In silico machine learning applications for phenotype-based screening have been limited, primarily by the lack of machine-readable data related to disease phenotypes. Adiponectin, a nuclear receptor (NR)-regulated adipocytokine, is relatively downregulated in human metabolic diseases. Here, we present a machine learning model to predict the adiponectin-secretion-promoting activity of flavonoid-associated phytochemicals (FAPs). We modeled a structure-activity relationship between the chemical similarity of FAPs and their bioactivities using a random forest-based classifier, which provided the NR activity of each FAP as a probability. To link the classifier-predicted NR activity to the phenotype, we next designed a single-cell transcriptomics-based multiple linear regression model to generate the relative adiponectin score (RAS) of FAPs. In experimental validation, the estimated RAS values of FAPs isolated from Scutellaria baicalensis correlated significantly with their adiponectin-secretion-promoting activity. This combined cheminformatics and bioinformatics approach enables the computational reconstruction of phenotype-based screening systems.
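The multiple linear regression step described above can be sketched generically as ordinary least squares solved through the normal equations. This is an assumption-laden illustration: the predictors stand in for hypothetical per-compound features (for example, classifier-predicted NR activity probabilities), and the study's single-cell transcriptomics-derived inputs and weights are not reproduced.

```python
"""Ordinary least-squares multiple linear regression in plain Python.

Generic sketch: predictors and targets are hypothetical stand-ins, not the
study's single-cell-derived inputs.
"""
from typing import List, Sequence


def fit_ols(X: Sequence[Sequence[float]], y: Sequence[float]) -> List[float]:
    """Return [intercept, b1, ..., bk] minimising the sum of squared residuals."""
    rows = [[1.0, *map(float, x)] for x in X]  # prepend an intercept column
    k = len(rows[0])
    # Augmented normal equations: (X^T X | X^T y).
    A = [[sum(r[i] * r[j] for r in rows) for j in range(k)]
         + [sum(r[i] * t for r, t in zip(rows, y))]
         for i in range(k)]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(A[r][col]))
        if abs(A[pivot][col]) < 1e-12:
            raise ValueError("predictors are collinear; X^T X is singular")
        A[col], A[pivot] = A[pivot], A[col]
        for r in range(k):
            if r != col:
                factor = A[r][col] / A[col][col]
                A[r] = [a - factor * c for a, c in zip(A[r], A[col])]
    # A is now diagonal, so each coefficient is a single division.
    return [A[i][k] / A[i][i] for i in range(k)]


def predict(coef: Sequence[float], x: Sequence[float]) -> float:
    """Linear score for one compound: intercept + sum of b_j * x_j."""
    return coef[0] + sum(b * v for b, v in zip(coef[1:], x))


# Hypothetical data generated exactly from y = 1 + 2*x1 - 3*x2.
X = [[0, 0], [1, 0], [0, 1], [1, 1], [2, 1]]
y = [1, 3, -2, 0, 2]
coef = fit_ols(X, y)
print([round(c, 6) for c in coef])  # -> [1.0, 2.0, -3.0]
```

Solving the normal equations directly is fine for a handful of predictors; for many correlated features a QR- or SVD-based solver is numerically safer.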
Affiliation(s)
- Seungchan An
- Natural Products Research Institute, College of Pharmacy, Seoul National University, Seoul 08826, Republic of Korea
- Seok Young Hwang
- Natural Products Research Institute, College of Pharmacy, Seoul National University, Seoul 08826, Republic of Korea
- Junpyo Gong
- Natural Products Research Institute, College of Pharmacy, Seoul National University, Seoul 08826, Republic of Korea
- Sungjin Ahn
- Natural Products Research Institute, College of Pharmacy, Seoul National University, Seoul 08826, Republic of Korea
- In Guk Park
- Natural Products Research Institute, College of Pharmacy, Seoul National University, Seoul 08826, Republic of Korea
- Soyeon Oh
- Natural Products Research Institute, College of Pharmacy, Seoul National University, Seoul 08826, Republic of Korea
- Young-Won Chin
- Natural Products Research Institute, College of Pharmacy, Seoul National University, Seoul 08826, Republic of Korea
- Minsoo Noh
- Natural Products Research Institute, College of Pharmacy, Seoul National University, Seoul 08826, Republic of Korea
20
Duran-Frigola M, Cigler M, Winter GE. Advancing Targeted Protein Degradation via Multiomics Profiling and Artificial Intelligence. J Am Chem Soc 2023; 145:2711-2732. [PMID: 36706315 PMCID: PMC9912273 DOI: 10.1021/jacs.2c11098] [Citation(s) in RCA: 8] [Impact Index Per Article: 8.0] [Indexed: 01/28/2023]
Abstract
Only around 20% of the human proteome is considered to be druggable with small-molecule antagonists. This leaves some of the most compelling therapeutic targets outside the reach of ligand discovery. The concept of targeted protein degradation (TPD) promises to overcome some of these limitations. In brief, TPD depends on small molecules that induce proximity between a protein of interest (POI) and an E3 ubiquitin ligase, causing ubiquitination and degradation of the POI. In this perspective, we reflect on current challenges in the field and discuss how advances in multiomics profiling, artificial intelligence, and machine learning (AI/ML) will be vital in overcoming them. The presented roadmap is discussed in the context of small-molecule degraders but is equally applicable to other emerging proximity-inducing modalities.
Affiliation(s)
- Miquel Duran-Frigola
- CeMM Research Center for Molecular Medicine of the Austrian Academy of Sciences, 1090 Vienna, Austria
- Ersilia Open Source Initiative, 28 Belgrave Road, CB1 3DE, Cambridge, United Kingdom
- Marko Cigler
- CeMM Research Center for Molecular Medicine of the Austrian Academy of Sciences, 1090 Vienna, Austria
- Georg E. Winter
- CeMM Research Center for Molecular Medicine of the Austrian Academy of Sciences, 1090 Vienna, Austria
21
Yang H, Obrezanova O, Pointon A, Stebbeds W, Francis J, Beattie KA, Clements P, Harvey JS, Smith GF, Bender A. Prediction of inotropic effect based on calcium transients in human iPSC-derived cardiomyocytes and machine learning. Toxicol Appl Pharmacol 2023; 459:116342. [PMID: 36502871 DOI: 10.1016/j.taap.2022.116342] [Citation(s) in RCA: 3] [Impact Index Per Article: 3.0] [Received: 07/28/2022] [Revised: 11/23/2022] [Accepted: 12/05/2022] [Indexed: 12/13/2022]
Abstract
Functional changes to cardiomyocytes are undesirable during drug discovery, and identifying the inotropic effects of compounds is hence necessary to decrease the risk of cardiovascular adverse effects in the clinic. Recently, approaches leveraging calcium transients in human induced pluripotent stem cell-derived cardiomyocytes (hiPSC-CMs) have been developed to detect contractility changes induced by a variety of mechanisms early in drug discovery projects. Although these approaches have provided some predictive ability, we hypothesised that using additional waveform parameters could offer improved insights as well as predictivity. In this study, we derived 25 parameters from each calcium transient waveform and developed a modified Random Forest method to predict the inotropic effects of the compounds. In total, annotated data for 48 compounds were available for modelling, of which 31 were inotropes. The results show that the Random Forest model with a modified purity criterion performed slightly better than an unmodified algorithm in terms of the Area Under the Curve, giving values of 0.84 vs 0.81 in cross-validation, and outperformed the ToxCast Pipeline model, for which the highest value was 0.76 when using the best-performing parameter, PW10. Our study hence demonstrates that more advanced parameters derived from waveforms, in combination with additional machine learning methods, provide improved predictivity of cardiovascular risk associated with inotropic effects.
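The abstract compares models by their Area Under the Curve (0.84 vs 0.81 vs 0.76). Assuming this is the ROC AUC, the short Python sketch below, which is not the authors' code, shows what that number measures: the fraction of (inotrope, non-inotrope) pairs in which the inotrope receives the higher model score, with ties counted as half. The function name `roc_auc` and the toy labels and scores are hypothetical.

```python
def roc_auc(labels, scores):
    """ROC AUC computed as the Mann-Whitney pairwise win rate.

    Assumption: 'AUC' in the abstract refers to the ROC curve.
    labels: 1 for a positive (e.g. inotrope), 0 for a negative.
    scores: model outputs, higher meaning 'more likely positive'.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0   # positive correctly ranked above negative
            elif p == n:
                wins += 0.5   # ties count as half a win
    return wins / (len(pos) * len(neg))


# Hypothetical toy data: 2 positives, 2 negatives; 3 of 4 pairs ranked correctly.
print(roc_auc([1, 1, 0, 0], [0.9, 0.4, 0.6, 0.1]))  # → 0.75
```

An AUC of 0.5 corresponds to random ranking and 1.0 to perfect separation. The pairwise loop is adequate for a dataset of 48 compounds; larger datasets would normally use a rank-based formula or a library routine such as scikit-learn's `roc_auc_score`.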
Affiliation(s)
- Hongbin Yang
- Centre for Molecular Informatics, Department of Chemistry, University of Cambridge, UK
- Olga Obrezanova
- Imaging and Data Analytics, Clinical Pharmacology & Safety Sciences, BioPharmaceuticals R&D, AstraZeneca, Cambridge, UK
- Amy Pointon
- Functional and Mechanistic Safety, Clinical Pharmacology & Safety Sciences, BioPharmaceuticals R&D, AstraZeneca, Cambridge, UK
- Will Stebbeds
- Screening Profiling and Mechanistic Biology, Medicinal Science and Technology, GlaxoSmithKline, Stevenage, UK
- Jo Francis
- Mechanistic & Structural Biology, AstraZeneca, Cambridge, UK
- Kylie A Beattie
- Target and Systems Safety, Non-Clinical Safety, In Vivo/In Vitro Translation, GlaxoSmithKline, Ware, UK
- Peter Clements
- Pathology UK, Non-Clinical Safety, In Vivo/In Vitro Translation, GlaxoSmithKline, Ware, UK
- James S Harvey
- Target and Systems Safety, Non-Clinical Safety, In Vivo/In Vitro Translation, GlaxoSmithKline, Ware, UK
- Graham F Smith
- Imaging and Data Analytics, Clinical Pharmacology & Safety Sciences, BioPharmaceuticals R&D, AstraZeneca, Cambridge, UK
- Andreas Bender
- Centre for Molecular Informatics, Department of Chemistry, University of Cambridge, UK.
22
Using chemical and biological data to predict drug toxicity. SLAS DISCOVERY : ADVANCING LIFE SCIENCES R & D 2023; 28:53-64. [PMID: 36639032 DOI: 10.1016/j.slasd.2022.12.003] [Citation(s) in RCA: 12] [Impact Index Per Article: 12.0] [Received: 10/30/2022] [Revised: 12/19/2022] [Accepted: 12/31/2022] [Indexed: 01/12/2023]
Abstract
Various sources of information can be used to better understand and predict compound activity and safety-related endpoints, including biological data such as gene expression and cell morphology. In this review, we first introduce types of chemical, in vitro and in vivo information that can be used to describe compounds and adverse effects. We then explore how compound descriptors based on chemical structure or biological perturbation response can be used to predict safety-related endpoints, and how especially biological data can help us to better understand adverse effects mechanistically. Overall, the described applications demonstrate how large-scale biological information presents new opportunities to anticipate and understand the biological effects of compounds, and how this can support predictive toxicology and drug discovery projects.
23
Zhao H, Yang Y, Wang S, Yang X, Zhou K, Xu C, Zhang X, Fan J, Hou D, Li X, Lin H, Tan Y, Wang S, Chu XY, Zhuoma D, Zhang F, Ju D, Zeng X, Chen YZ. NPASS database update 2023: quantitative natural product activity and species source database for biomedical research. Nucleic Acids Res 2022; 51:D621-D628. [PMID: 36624664 PMCID: PMC9825494 DOI: 10.1093/nar/gkac1069] [Citation(s) in RCA: 8] [Impact Index Per Article: 4.0] [Received: 09/15/2022] [Revised: 10/16/2022] [Accepted: 10/26/2022] [Indexed: 11/30/2022]
Abstract
Quantitative activity and species source data of natural products (NPs) are important for drug discovery, medicinal plant research, and microbial investigations. Activity values of NPs against specific targets are useful for discovering targeted therapeutic agents and investigating the mechanism of medicinal plants. Composition/concentration values of NPs in individual species facilitate the assessments and investigations of the therapeutic quality of herbs and phenotypes of microbes. Here, we describe an update of the NPASS natural product activity and species source database previously featured in NAR. This update includes: (i) new data of ∼95 000 records of the composition/concentration values of ∼1 490 NPs/NP clusters in ∼390 species, (ii) extended data of activity values of ∼43 200 NPs against ∼7 700 targets (∼40% and ∼32% increase, respectively), (iii) extended data of ∼31 600 species sources of ∼94 400 NPs (∼26% and ∼32% increase, respectively), (iv) new species types of ∼440 co-cultured microbes and ∼420 engineered microbes, (v) new data of ∼66 600 NPs without experimental activity values but with estimated activity profiles from the established chemical similarity tool Chemical Checker, (vi) new data of the computed drug-likeness properties and the absorption, distribution, metabolism, excretion and toxicity (ADMET) properties for all NPs. NPASS update version is freely accessible at http://bidd.group/NPASS.
Affiliation(s)
- Kaicheng Zhou
- Department of Biological Medicines & Shanghai Engineering Research Center of Immunotherapeutics, Fudan University School of Pharmacy, Shanghai 201203, China
- Caili Xu
- Department of Biological Medicines & Shanghai Engineering Research Center of Immunotherapeutics, Fudan University School of Pharmacy, Shanghai 201203, China
- Xuyao Zhang
- Department of Biological Medicines & Shanghai Engineering Research Center of Immunotherapeutics, Fudan University School of Pharmacy, Shanghai 201203, China
- Jiajun Fan
- Department of Biological Medicines & Shanghai Engineering Research Center of Immunotherapeutics, Fudan University School of Pharmacy, Shanghai 201203, China
- Dongyue Hou
- Department of Biological Medicines & Shanghai Engineering Research Center of Immunotherapeutics, Fudan University School of Pharmacy, Shanghai 201203, China
- Xingxiu Li
- Department of Biological Medicines & Shanghai Engineering Research Center of Immunotherapeutics, Fudan University School of Pharmacy, Shanghai 201203, China
- Hanbo Lin
- Department of Biological Medicines & Shanghai Engineering Research Center of Immunotherapeutics, Fudan University School of Pharmacy, Shanghai 201203, China
- Ying Tan
- The State Key Laboratory of Chemical Oncogenomics & Key Laboratory of Chemical Biology, Tsinghua University Shenzhen Graduate School, Shenzhen Kivita Innovative Drug Discovery Institute, China
- Shanshan Wang
- Qian Xuesen Collaborative Research Center of Astrochemistry and Space Life Sciences, Institute of Drug Discovery Technology, Ningbo University, Ningbo 315211, China
- Xin-Yi Chu
- Qian Xuesen Collaborative Research Center of Astrochemistry and Space Life Sciences, Institute of Drug Discovery Technology, Ningbo University, Ningbo 315211, China
- Fengying Zhang
- Key Lab of Agricultural Products Processing and Quality Control of Nanchang City, Jiangxi Agricultural University, Nanchang 330045, China
- Dianwen Ju
- Correspondence may also be addressed to Dianwen Ju. Tel: +86 51980037
- Xian Zeng
- Correspondence may also be addressed to Xian Zeng. Tel: +86 51980035
- Yu Zong Chen
- To whom correspondence should be addressed. Tel: +86 755 26032094
24
Li Q, Zhang X, Wu L, Bo X, He S, Wang S. PLA-MoRe: A Protein-Ligand Binding Affinity Prediction Model via Comprehensive Molecular Representations. J Chem Inf Model 2022; 62:4380-4390. [PMID: 36054653 DOI: 10.1021/acs.jcim.2c00960] [Citation(s) in RCA: 4] [Impact Index Per Article: 2.0] [Indexed: 11/29/2022]
Abstract
Accurately predicting the binding affinity of protein-ligand pairs is an essential part of drug discovery. Since wet-laboratory experiments to determine binding affinity are expensive and time-consuming, several computational methods for binding affinity prediction have been proposed. In representing compounds, most methods focus only on structural properties such as SMILES and ignore bioactive properties. In this study, we proposed a novel model named PLA-MoRe to predict protein-ligand binding affinity, which represents compounds based on both structural and bioactive properties and mainly contains three feature extractors. First, a structure feature extractor based on the graph isomorphism network was constructed to learn the representations of the molecular graphs. Second, we designed an Autoencoder-based bioactive feature extractor to integrate multisource bioactive information, including chemical, target, network, cellular, and clinical data. These two parts aimed to learn representations of compounds in terms of structures and bioactivities, respectively. Then, we constructed a sequence feature extractor to learn embeddings for protein sequences. The outputs of the three extractors were concatenated and fed into a fully connected network for affinity prediction. We compared PLA-MoRe with three state-of-the-art methods, and an ablation study was conducted to test the role of each part of the model. Further attention visualization showed that our model had the potential to locate the binding sites, which might help explain the mechanism of interaction. These results prove that PLA-MoRe is competitive and reliable. The source code is freely available at the GitHub repository https://github.com/QingyuLiaib/PLA-MoRe.
Affiliation(s)
- Qingyu Li
- Beijing Institute of Microbiology and Epidemiology, Beijing 100850, China
- Xiaochang Zhang
- Beijing Institute of Microbiology and Epidemiology, Beijing 100850, China
- Lianlian Wu
- Academy of Medical Engineering and Translational Medicine, Tianjin University, Tianjin 300072, China
- Institute of Health Service and Transfusion Medicine, Beijing 100850, China
- Xiaochen Bo
- Institute of Health Service and Transfusion Medicine, Beijing 100850, China
- Song He
- Institute of Health Service and Transfusion Medicine, Beijing 100850, China
- Shengqi Wang
- Beijing Institute of Microbiology and Epidemiology, Beijing 100850, China
25
Mittal A, Mohanty SK, Gautam V, Arora S, Saproo S, Gupta R, Sivakumar R, Garg P, Aggarwal A, Raghavachary P, Dixit NK, Singh VP, Mehta A, Tayal J, Naidu S, Sengupta D, Ahuja G. Artificial intelligence uncovers carcinogenic human metabolites. Nat Chem Biol 2022; 18:1204-1213. [PMID: 35953549 DOI: 10.1038/s41589-022-01110-7] [Citation(s) in RCA: 6] [Impact Index Per Article: 3.0] [Received: 12/02/2021] [Accepted: 07/07/2022] [Indexed: 12/14/2022]
Abstract
The genome of a eukaryotic cell is often vulnerable to both intrinsic and extrinsic threats owing to its constant exposure to a myriad of heterogeneous compounds. Despite the availability of innate DNA damage responses, some genomic lesions trigger malignant transformation of cells. Accurate prediction of carcinogens is an ever-challenging task owing to the limited information about bona fide (non-)carcinogens. We developed Metabokiller, an ensemble classifier that accurately recognizes carcinogens by quantitatively assessing their electrophilicity, their potential to induce proliferation, oxidative stress, genomic instability, epigenome alterations, and anti-apoptotic response. Concomitant with the carcinogenicity prediction, Metabokiller is fully interpretable and outperforms existing best-practice methods for carcinogenicity prediction. Metabokiller unraveled potential carcinogenic human metabolites. To cross-validate Metabokiller predictions, we performed multiple functional assays using Saccharomyces cerevisiae and human cells with two Metabokiller-flagged human metabolites, namely 4-nitrocatechol and 3,4-dihydroxyphenylacetic acid, and observed high synergy between Metabokiller predictions and experimental validations.
Affiliation(s)
- Aayushi Mittal
- Department of Computational Biology, Indraprastha Institute of Information Technology-Delhi, Okhla, Phase III, New Delhi, Delhi, India
- Sanjay Kumar Mohanty
- Department of Computational Biology, Indraprastha Institute of Information Technology-Delhi, Okhla, Phase III, New Delhi, Delhi, India
- Vishakha Gautam
- Department of Computational Biology, Indraprastha Institute of Information Technology-Delhi, Okhla, Phase III, New Delhi, Delhi, India
- Sakshi Arora
- Department of Computational Biology, Indraprastha Institute of Information Technology-Delhi, Okhla, Phase III, New Delhi, Delhi, India
- Sheetanshu Saproo
- Department of Bio-Medical Engineering, Indian Institute of Technology Ropar, Rupnagar, Punjab, India
- Ria Gupta
- Department of Computational Biology, Indraprastha Institute of Information Technology-Delhi, Okhla, Phase III, New Delhi, Delhi, India
- Roshan Sivakumar
- Department of Computational Biology, Indraprastha Institute of Information Technology-Delhi, Okhla, Phase III, New Delhi, Delhi, India
- Prakriti Garg
- Department of Computational Biology, Indraprastha Institute of Information Technology-Delhi, Okhla, Phase III, New Delhi, Delhi, India
- Anmol Aggarwal
- Department of Computational Biology, Indraprastha Institute of Information Technology-Delhi, Okhla, Phase III, New Delhi, Delhi, India
- Padmasini Raghavachary
- Department of Computational Biology, Indraprastha Institute of Information Technology-Delhi, Okhla, Phase III, New Delhi, Delhi, India
- Nilesh Kumar Dixit
- Department of Computational Biology, Indraprastha Institute of Information Technology-Delhi, Okhla, Phase III, New Delhi, Delhi, India
- Vijay Pal Singh
- CSIR-Institute of Genomics & Integrative Biology, New Delhi, Delhi, India
- Anurag Mehta
- Rajiv Gandhi Cancer Institute & Research Centre, New Delhi, Delhi, India
- Juhi Tayal
- Rajiv Gandhi Cancer Institute & Research Centre, New Delhi, Delhi, India
- Srivatsava Naidu
- Department of Bio-Medical Engineering, Indian Institute of Technology Ropar, Rupnagar, Punjab, India
- Debarka Sengupta
- Department of Computational Biology, Indraprastha Institute of Information Technology-Delhi, Okhla, Phase III, New Delhi, Delhi, India
- Gaurav Ahuja
- Department of Computational Biology, Indraprastha Institute of Information Technology-Delhi, Okhla, Phase III, New Delhi, Delhi, India
26
Leng D, Zheng L, Wen Y, Zhang Y, Wu L, Wang J, Wang M, Zhang Z, He S, Bo X. A benchmark study of deep learning-based multi-omics data fusion methods for cancer. Genome Biol 2022; 23:171. [PMID: 35945544 PMCID: PMC9361561 DOI: 10.1186/s13059-022-02739-2] [Citation(s) in RCA: 32] [Impact Index Per Article: 16.0] [Received: 02/22/2022] [Accepted: 07/26/2022] [Indexed: 11/10/2022]
Abstract
BACKGROUND A fused method using a combination of multi-omics data enables a comprehensive study of complex biological processes and highlights the interrelationship of relevant biomolecules and their functions. Driven by high-throughput sequencing technologies, several promising deep learning methods have been proposed for fusing multi-omics data generated from a large number of samples. RESULTS In this study, 16 representative deep learning methods are comprehensively evaluated on simulated, single-cell, and cancer multi-omics datasets. For each of the datasets, two tasks are designed: classification and clustering. Classification performance is evaluated using three benchmarking metrics: accuracy, F1 macro, and F1 weighted. Clustering performance is evaluated using four benchmarking metrics: the Jaccard index (JI), C-index, silhouette score, and Davies-Bouldin score. For the cancer multi-omics datasets, the methods' strength in capturing the association of multi-omics dimensionality reduction results with survival and clinical annotations is further evaluated. The benchmarking results indicate that moGAT achieves the best classification performance. Meanwhile, efmmdVAE, efVAE, and lfmmdVAE show the most promising performance across all complementary contexts in clustering tasks. CONCLUSIONS Our benchmarking results not only provide a reference for biomedical researchers to choose appropriate deep learning-based multi-omics data fusion methods, but also suggest future directions for the development of more effective multi-omics data fusion methods. The deep learning frameworks are available at https://github.com/zhenglinyi/DL-mo .
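The benchmark scores classification with accuracy, F1 macro and F1 weighted. As a hedged sketch of how the two F1 variants differ (not the benchmark's own code): macro-F1 averages the per-class F1 scores with equal weight, while weighted-F1 weights each class by its support (count in the true labels), so rare classes influence it less. The helper names and toy labels below are hypothetical, and the set of classes is assumed to be the union of true and predicted labels.

```python
from collections import Counter


def per_class_f1(y_true, y_pred, label):
    """One-vs-rest F1 for a single class label."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == label and p == label)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t != label and p == label)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == label and p != label)
    if tp == 0:
        return 0.0  # convention: F1 is 0 when the class has no true positives
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def f1_macro_weighted(y_true, y_pred):
    """Return (macro-F1, weighted-F1) for multi-class labels."""
    support = Counter(y_true)                 # class frequencies in the truth
    labels = sorted(set(y_true) | set(y_pred))
    scores = {c: per_class_f1(y_true, y_pred, c) for c in labels}
    macro = sum(scores.values()) / len(labels)                  # equal weights
    weighted = sum(scores[c] * support[c] for c in labels) / len(y_true)
    return macro, weighted


# Hypothetical toy labels for three classes with unequal support.
macro, weighted = f1_macro_weighted([0, 0, 0, 1, 1, 2], [0, 0, 1, 1, 1, 2])
print(round(macro, 3), round(weighted, 3))  # → 0.867 0.833
```

Here the perfectly predicted but rare class 2 lifts the macro score more than the weighted score, which is exactly the difference the two metrics are meant to expose on imbalanced cancer subtypes.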
Affiliation(s)
- Dongjin Leng
- Institute of Health Service and Transfusion Medicine, Beijing, People's Republic of China
- Linyi Zheng
- School of Informatics, Xiamen University, Xiamen, People's Republic of China
- Yuqi Wen
- Institute of Health Service and Transfusion Medicine, Beijing, People's Republic of China
- Yunhao Zhang
- School of Informatics, Xiamen University, Xiamen, People's Republic of China
- Lianlian Wu
- Academy of Medical Engineering and Translational Medicine, Tianjin University, Tianjin, People's Republic of China
- Jing Wang
- School of Medicine, Tsinghua University, Beijing, People's Republic of China
- Meihong Wang
- School of Informatics, Xiamen University, Xiamen, People's Republic of China
- Zhongnan Zhang
- School of Informatics, Xiamen University, Xiamen, People's Republic of China
- Song He
- Institute of Health Service and Transfusion Medicine, Beijing, People's Republic of China
- Xiaochen Bo
- Institute of Health Service and Transfusion Medicine, Beijing, People's Republic of China
27
Katritsis NM, Liu A, Youssef G, Rathee S, MacMahon M, Hwang W, Wollman L, Han N. dialogi: Utilising NLP With Chemical and Disease Similarities to Drive the Identification of Drug-Induced Liver Injury Literature. Front Genet 2022; 13:894209. [PMID: 36017500 PMCID: PMC9395939 DOI: 10.3389/fgene.2022.894209] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/11/2022] [Accepted: 06/17/2022] [Indexed: 11/13/2022]
Abstract
Drug-Induced Liver Injury (DILI), despite its low occurrence rate, can cause severe side effects or even lead to death. Thus, it is one of the leading causes for terminating the development of new drugs and restricting the use of already-circulating ones. Moreover, its multifactorial nature, combined with a clinical presentation that often mimics other liver diseases, complicates the identification of DILI-related (or “positive”) literature, which remains the main medium for sourcing results from clinical practice and experimental studies. This work, contributing to the “Literature AI for DILI Challenge” of the Critical Assessment of Massive Data Analysis (CAMDA) 2021, presents an automated pipeline for distinguishing between DILI-positive and negative publications. We used Natural Language Processing (NLP) to filter out the uninformative parts of a text, and to identify and extract mentions of chemicals and diseases. We combined that information with small-molecule and disease embeddings, which are capable of capturing chemical and disease similarities, to improve classification performance. The former were directly sourced from the Chemical Checker (CC). For the latter, we collected data that encode different aspects of disease similarity from the National Library of Medicine’s (NLM) Medical Subject Headings (MeSH) thesaurus and the Comparative Toxicogenomics Database (CTD). Following a procedure similar to the one used in the CC, vector representations for diseases were learnt and evaluated. Two Neural Network (NN) classifiers were developed: a baseline model that accepts texts as input and an augmented (extended) model that also utilises chemical and disease embeddings. We trained, validated, and tested the classifiers through a Nested Cross-Validation (NCV) scheme with 10 outer and 5 inner folds. During this, the baseline and extended models performed virtually identically, with F1-scores of 95.04 ± 0.61% and 94.80 ± 0.41%, respectively. 
Upon validation on an external, withheld, dataset that is meant to assess classifier generalisability, the extended model achieved an F1-score of 91.14 ± 1.62%, outperforming its baseline counterpart which received a lower score of 88.30 ± 2.44%. We make further comparisons between the classifiers and discuss future improvements and directions, including utilising chemical and disease embeddings for visualisation and exploratory analysis of the DILI-positive literature.
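The classifiers were evaluated with nested cross-validation using 10 outer and 5 inner folds. The pure-Python sketch below shows one way such index splits can be generated; it illustrates the general scheme under stated assumptions and is not the authors' pipeline. In particular, the folds here are contiguous and unshuffled, and the function names are hypothetical. The key property is that the inner folds are carved only out of the outer training set, so each outer test fold is never seen during model or hyperparameter selection.

```python
def kfold_indices(n, k):
    """Split range(n) into k contiguous, near-equal folds (no shuffling)."""
    fold_sizes = [n // k + (1 if i < n % k else 0) for i in range(k)]
    folds, start = [], 0
    for size in fold_sizes:
        folds.append(list(range(start, start + size)))
        start += size
    return folds


def nested_cv_splits(n, outer_k=10, inner_k=5):
    """Yield (outer_train, outer_test, inner_splits) for nested CV.

    inner_splits is a list of (inner_train, inner_val) index lists drawn
    only from outer_train; the outer test fold is held out entirely.
    """
    for test_fold in kfold_indices(n, outer_k):
        test_set = set(test_fold)
        outer_train = [i for i in range(n) if i not in test_set]
        inner = []
        # Inner folds index into outer_train, then map back to sample ids.
        for val_pos in kfold_indices(len(outer_train), inner_k):
            val = [outer_train[j] for j in val_pos]
            val_set = set(val)
            inner.append(([i for i in outer_train if i not in val_set], val))
        yield outer_train, test_fold, inner


# Hypothetical dataset of 100 documents: 90 train / 10 test per outer fold.
first_train, first_test, first_inner = next(nested_cv_splits(100))
print(len(first_train), len(first_test), len(first_inner))  # → 90 10 5
```

In practice the data would usually be shuffled (and often stratified by label) before splitting; scikit-learn's `StratifiedKFold` combined with `GridSearchCV` inside `cross_val_score` is one common way to obtain the same nested structure.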
Affiliation(s)
- Nicholas M. Katritsis
- Milner Therapeutics Institute, University of Cambridge, Cambridge, United Kingdom
- Department of Chemical Engineering and Biotechnology, University of Cambridge, Cambridge, United Kingdom
- *Correspondence: Nicholas M. Katritsis; Namshik Han
- Anika Liu
- Milner Therapeutics Institute, University of Cambridge, Cambridge, United Kingdom
- Department of Chemistry, Centre for Molecular Informatics, University of Cambridge, Cambridge, United Kingdom
- Gehad Youssef
- Milner Therapeutics Institute, University of Cambridge, Cambridge, United Kingdom
- Sanjay Rathee
- Milner Therapeutics Institute, University of Cambridge, Cambridge, United Kingdom
- Méabh MacMahon
- Milner Therapeutics Institute, University of Cambridge, Cambridge, United Kingdom
- Centre for Therapeutics Discovery, LifeArc, Stevenage, United Kingdom
- Woochang Hwang
- Milner Therapeutics Institute, University of Cambridge, Cambridge, United Kingdom
- Lilly Wollman
- Milner Therapeutics Institute, University of Cambridge, Cambridge, United Kingdom
- Namshik Han
- Milner Therapeutics Institute, University of Cambridge, Cambridge, United Kingdom
- Cambridge Centre for AI in Medicine, University of Cambridge, Cambridge, United Kingdom
- *Correspondence: Nicholas M. Katritsis; Namshik Han
28
Gautam V, Gupta R, Gupta D, Ruhela A, Mittal A, Mohanty SK, Arora S, Gupta R, Saini C, Sengupta D, Murugan NA, Ahuja G. deepGraphh: AI-driven web service for graph-based quantitative structure-activity relationship analysis. Brief Bioinform 2022; 23:6648791. [PMID: 35868454 DOI: 10.1093/bib/bbac288] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/05/2022] [Revised: 06/01/2022] [Accepted: 06/23/2022] [Indexed: 11/12/2022]
Abstract
Artificial intelligence (AI)-based computational techniques allow rapid exploration of the chemical space. However, representing compounds as detailed, computation-compatible features is one of the crucial steps in quantitative structure-activity relationship (QSAR) analysis. Recently, graph-based methods have emerged as a powerful alternative to chemistry-restricted fingerprints or descriptors for modeling. Although graph-based modeling offers multiple advantages, its implementation demands in-depth domain knowledge and programming skills. Here we introduce deepGraphh, an end-to-end web service featuring a conglomerate of established graph-based methods for model generation for classification or regression tasks. The graphical user interface of deepGraphh supports highly configurable model parameter tuning, model generation, cross-validation and testing of user-supplied query molecules. deepGraphh supports four widely adopted methods for QSAR analysis, namely graph convolution network, graph attention network, directed acyclic graph and Attentive FP. Comparative analysis revealed that the deepGraphh-supported methods perform comparably to descriptor-based machine learning techniques. Finally, we used deepGraphh models to predict the blood-brain barrier permeability of human and microbiome-generated metabolites. In summary, deepGraphh offers a one-stop web service for graph-based methods for chemoinformatics.
Affiliation(s)
- Vishakha Gautam
- Department of Computational Biology, Indraprastha Institute of Information Technology-Delhi (IIIT-Delhi), Okhla, Phase III, New Delhi-110020, India
- Rahul Gupta
- Department of Computational Biology, Indraprastha Institute of Information Technology-Delhi (IIIT-Delhi), Okhla, Phase III, New Delhi-110020, India
- Deepti Gupta
- Department of Computational Biology, Indraprastha Institute of Information Technology-Delhi (IIIT-Delhi), Okhla, Phase III, New Delhi-110020, India
- Anubhav Ruhela
- Department of Computational Biology, Indraprastha Institute of Information Technology-Delhi (IIIT-Delhi), Okhla, Phase III, New Delhi-110020, India
- Aayushi Mittal
- Department of Computational Biology, Indraprastha Institute of Information Technology-Delhi (IIIT-Delhi), Okhla, Phase III, New Delhi-110020, India
- Sanjay Kumar Mohanty
- Department of Computational Biology, Indraprastha Institute of Information Technology-Delhi (IIIT-Delhi), Okhla, Phase III, New Delhi-110020, India
- Sakshi Arora
- Department of Computational Biology, Indraprastha Institute of Information Technology-Delhi (IIIT-Delhi), Okhla, Phase III, New Delhi-110020, India
- Ria Gupta
- Department of Computational Biology, Indraprastha Institute of Information Technology-Delhi (IIIT-Delhi), Okhla, Phase III, New Delhi-110020, India
- Chandan Saini
- Department of Computational Biology, Indraprastha Institute of Information Technology-Delhi (IIIT-Delhi), Okhla, Phase III, New Delhi-110020, India
- Debarka Sengupta
- Department of Computational Biology, Indraprastha Institute of Information Technology-Delhi (IIIT-Delhi), Okhla, Phase III, New Delhi-110020, India
- Department of Computer Science and Engineering, Indraprastha Institute of Information Technology-Delhi (IIIT-Delhi), Okhla, Phase III, New Delhi-110020, India
- Centre for Artificial Intelligence, Indraprastha Institute of Information Technology, New Delhi, India
- Natarajan Arul Murugan
- Department of Computational Biology, Indraprastha Institute of Information Technology-Delhi (IIIT-Delhi), Okhla, Phase III, New Delhi-110020, India
- Gaurav Ahuja
- Department of Computational Biology, Indraprastha Institute of Information Technology-Delhi (IIIT-Delhi), Okhla, Phase III, New Delhi-110020, India
29
Blay V, Radivojevic T, Allen JE, Hudson CM, Garcia Martin H. MACAW: An Accessible Tool for Molecular Embedding and Inverse Molecular Design. J Chem Inf Model 2022; 62:3551-3564. [PMID: 35857932 PMCID: PMC9364320 DOI: 10.1021/acs.jcim.2c00229] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Indexed: 12/29/2022]
Abstract
The growing capabilities of synthetic biology and organic chemistry demand tools to guide syntheses toward useful molecules. Here, we present Molecular AutoenCoding Auto-Workaround (MACAW), a tool that uses a novel approach to generate molecules predicted to meet a desired property specification (e.g., a binding affinity of 50 nM or an octane number of 90). MACAW describes molecules by embedding them into a smooth multidimensional numerical space, avoiding uninformative dimensions that previous methods often introduce. The coordinates in this embedding provide a natural choice of features for accurately predicting molecular properties, which we demonstrate with examples for cetane and octane numbers, flash points, and histamine H1 receptor binding affinity. The approach is computationally efficient and well-suited to the small- and medium-size datasets commonly used in biosciences. We showcase the utility of MACAW for virtual screening by identifying molecules with high predicted binding affinity to the histamine H1 receptor and limited affinity to the muscarinic M2 receptor, which are targets of medicinal relevance. Combining these predictive capabilities with a novel generative algorithm for molecules allows us to recommend molecules with a desired property value (i.e., inverse molecular design). We demonstrate this capability by recommending molecules with predicted octane numbers of 40, 80, and 120, which is an important characteristic of biofuels. Thus, MACAW augments classical retrosynthesis tools by providing recommendations for molecules on specification.
Affiliation(s)
- Vincent Blay
- Biological Systems and Engineering Division, Lawrence Berkeley National Laboratory, Berkeley, California 94720, United States
- Biofuels and Bioproducts Division, DOE Joint BioEnergy Institute, Emeryville, California 94608, United States
- Tijana Radivojevic
- Biological Systems and Engineering Division, Lawrence Berkeley National Laboratory, Berkeley, California 94720, United States
- Biofuels and Bioproducts Division, DOE Joint BioEnergy Institute, Emeryville, California 94608, United States
- DOE Agile BioFoundry, Emeryville, California 94608, United States
- Jonathan E Allen
- Global Security Computing Applications, Lawrence Livermore National Laboratory, Livermore, California 94550, United States
- Corey M Hudson
- Sandia National Laboratories, Livermore, California 94550, United States
- Hector Garcia Martin
- Biological Systems and Engineering Division, Lawrence Berkeley National Laboratory, Berkeley, California 94720, United States
- Biofuels and Bioproducts Division, DOE Joint BioEnergy Institute, Emeryville, California 94608, United States
- DOE Agile BioFoundry, Emeryville, California 94608, United States
30
Artificial intelligence and machine-learning approaches in structure and ligand-based discovery of drugs affecting central nervous system. Mol Divers 2022; 27:959-985. [PMID: 35819579 DOI: 10.1007/s11030-022-10489-3] [Citation(s) in RCA: 7] [Impact Index Per Article: 3.5] [Received: 04/27/2022] [Accepted: 06/21/2022] [Indexed: 12/11/2022]
Abstract
CNS disorders are indications with a very high unmet medical need, a relatively small number of available drugs, and a subpar level of satisfaction among patients and caregivers. Discovery of CNS drugs is an extremely expensive affair with its own unique challenges, leading to extremely high attrition rates and low efficiency. With the explosion of data in the information age, there is hardly any aspect of life that has not been touched by data-driven technologies such as artificial intelligence (AI) and machine learning (ML). Drug discovery is no exception: the emergence of big data via genomic, proteomic, biological, and chemical technologies has driven pharmaceutical giants to collaborate with AI-oriented companies to revolutionise drug discovery, with the goal of increasing the efficiency of the process. In recent years, many examples of innovative applications of AI and ML techniques in CNS drug discovery have been reported. Research on therapeutics for diseases such as schizophrenia, Alzheimer's and Parkinsonism has been given a new direction and thrust by these developments. AI and ML have been applied to both ligand-based and structure-based drug discovery and design of CNS therapeutics. In this review, we summarise the general aspects of AI and ML from the perspective of drug discovery, followed by comprehensive coverage of recent developments in the application of AI/ML techniques to CNS drug discovery.
31
Metabolite-based biosensors for natural product discovery and overproduction. Curr Opin Biotechnol 2022; 75:102699. [DOI: 10.1016/j.copbio.2022.102699] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 09/30/2021] [Revised: 01/25/2022] [Accepted: 02/05/2022] [Indexed: 12/22/2022]
32
Song Z, Trozzi F, Tian H, Yin C, Tao P. Mechanistic Insights into Enzyme Catalysis from Explaining Machine-Learned Quantum Mechanical and Molecular Mechanical Minimum Energy Pathways. ACS Phys Chem Au 2022; 2:316-330. [PMID: 35936506 PMCID: PMC9344433 DOI: 10.1021/acsphyschemau.2c00005] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Indexed: 01/20/2023]
Abstract
With the increasing popularity of machine learning (ML) applications, a demand has also emerged for explainable artificial intelligence techniques that can explain ML models developed for computational chemistry. In this study, we present the development of the Boltzmann-weighted cumulative integrated gradients (BCIG) approach for extracting mechanistic insights from ML models trained on high-level quantum mechanical and molecular mechanical (QM/MM) minimum energy pathways. Using the acylation reactions of the Toho-1 β-lactamase with two antibiotics (ampicillin and cefalexin) as model systems, we show that the BCIG approach can quantitatively attribute the energetic contributions within one system, and the relative reactivity of individual steps across different systems, to specific chemical processes such as bond making/breaking and proton transfers. The proposed BCIG attribution method quantifies chemically interpretable insights in terms of the contribution of each elementary chemical process, in agreement with validating QM/MM calculations and our intuitive mechanistic understanding of the model reactions.
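As its name indicates, BCIG builds on integrated gradients. Integrated gradients attributes a model's output to its input features by averaging the gradient along a straight path from a baseline input to the actual input. The minimal sketch below shows only that underlying step. The toy energy function, its analytic gradient, the all-zero baseline and the 200-step midpoint rule are all hypothetical choices made for illustration. The Boltzmann weighting and the cumulative aggregation along the QM/MM pathway that define BCIG are specific to the paper and are not reproduced here.

```python
import math
from typing import Callable, List, Sequence

Vector = List[float]


def toy_energy(x: Sequence[float]) -> float:
    """Hypothetical stand-in for a trained ML energy model (not the paper's model)."""
    return x[0] ** 2 + 3.0 * x[0] * x[1] + math.sin(x[2])


def toy_energy_grad(x: Sequence[float]) -> Vector:
    """Analytic gradient of toy_energy with respect to each input feature."""
    return [2.0 * x[0] + 3.0 * x[1], 3.0 * x[0], math.cos(x[2])]


def integrated_gradients(
    grad: Callable[[Sequence[float]], Vector],
    x: Sequence[float],
    baseline: Sequence[float],
    steps: int = 200,
) -> Vector:
    """Approximate IG_i = (x_i - b_i) * integral_0^1 df/dx_i(b + a*(x - b)) da
    with a midpoint Riemann sum over `steps` points on the straight-line path."""
    totals = [0.0] * len(x)
    for k in range(steps):
        alpha = (k + 0.5) / steps  # midpoint of the k-th sub-interval
        point = [b + alpha * (xi - b) for xi, b in zip(x, baseline)]
        for i, g in enumerate(grad(point)):
            totals[i] += g
    # Scale the averaged path gradient by the displacement from the baseline.
    return [(xi - b) * t / steps for xi, b, t in zip(x, baseline, totals)]


x = [1.0, 2.0, 0.5]
baseline = [0.0, 0.0, 0.0]
attributions = integrated_gradients(toy_energy_grad, x, baseline)
# Completeness check: the attributions should sum to f(x) - f(baseline).
gap = toy_energy(x) - toy_energy(baseline)
print([round(a, 4) for a in attributions], round(sum(attributions), 4), round(gap, 4))
```

The completeness property is what makes the attributions read as an energy decomposition: the per-feature contributions add up to the total change in the model output relative to the baseline.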
33
Drug-Induced Immune Thrombocytopenia Toxicity Prediction Based on Machine Learning. Pharmaceutics 2022; 14:943. [PMID: 35631529 PMCID: PMC9143325 DOI: 10.3390/pharmaceutics14050943] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 03/30/2022] [Revised: 04/20/2022] [Accepted: 04/22/2022] [Indexed: 11/29/2022]
Abstract
Drug-induced immune thrombocytopenia (DITP) often occurs in patients receiving many drug treatments simultaneously. However, clinicians usually fail to accurately distinguish which drugs are the plausible culprits. Despite significant advances in laboratory-based DITP testing, in vitro experimental assays are expensive and, in certain cases, cannot provide a timely diagnosis. To address these shortcomings, this paper proposes an efficient machine learning-based method for DITP toxicity prediction. A small dataset of 225 molecules was constructed. The molecules were represented by six fingerprints, three descriptors, and their combinations. Seven classical machine learning models were examined to determine an optimal model. The results show that the RDMD + PubChem-k-NN model provides the best prediction performance among all the models, achieving an area under the curve of 76.9% and an overall accuracy of 75.6% on the external validation set. Application domain (AD) analysis demonstrates the prediction reliability of the RDMD + PubChem-k-NN model. Five structural fragments related to DITP toxicity are identified through the information gain (IG) method combined with fragment frequency analysis. Overall, to the best of our knowledge, this is the first machine learning-based classification model for recognizing chemicals with DITP toxicity, and it can be used as an efficient tool in drug design and clinical therapy.
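The DITP abstract combines three standard ingredients. The first is a k-nearest-neighbour classifier over molecular fingerprints. The second is an area-under-the-ROC-curve evaluation on an external set. The third is information-gain scoring of structural fragments. The minimal sketch below illustrates all three under stated assumptions. Fingerprints are hypothetical toy sets of "on" bit indices rather than real RDMD or PubChem fingerprints. Tanimoto similarity is assumed as the neighbour metric, because the abstract does not name one. The value k = 3 is arbitrary. The information-gain function is a generic entropy-based formulation, not necessarily the authors' exact procedure.

```python
import math
from typing import FrozenSet, Sequence, Tuple

Fingerprint = FrozenSet[int]  # indices of the "on" bits of a binary fingerprint
Dataset = Sequence[Tuple[Fingerprint, int]]  # (fingerprint, 1 = toxic / 0 = non-toxic)


def tanimoto(a: Fingerprint, b: Fingerprint) -> float:
    """Tanimoto (Jaccard) similarity between two binary fingerprints."""
    union = len(a | b)
    return len(a & b) / union if union else 1.0


def knn_score(query: Fingerprint, train: Dataset, k: int = 3) -> float:
    """Fraction of the k most similar training molecules labelled toxic (1)."""
    ranked = sorted(train, key=lambda item: tanimoto(query, item[0]), reverse=True)
    neighbours = ranked[:k]
    return sum(label for _, label in neighbours) / len(neighbours)


def roc_auc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """ROC AUC as the probability that a random positive outranks a random negative."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def entropy(labels: Sequence[int]) -> float:
    """Shannon entropy (in bits) of a list of binary labels."""
    if not labels:
        return 0.0
    p = sum(labels) / len(labels)
    return -sum(q * math.log2(q) for q in (p, 1.0 - p) if q > 0)


def information_gain(bit: int, data: Dataset) -> float:
    """Reduction in label entropy from splitting on the presence of one fragment bit."""
    labels = [y for _, y in data]
    with_bit = [y for fp, y in data if bit in fp]
    without = [y for fp, y in data if bit not in fp]
    n = len(data)
    conditional = len(with_bit) / n * entropy(with_bit) + len(without) / n * entropy(without)
    return entropy(labels) - conditional


# Hypothetical toy data, not taken from the paper's 225-molecule dataset.
train = [
    (frozenset({1, 2, 3, 4}), 1),
    (frozenset({1, 2, 3, 5}), 1),
    (frozenset({1, 2, 6}), 1),
    (frozenset({7, 8, 9}), 0),
    (frozenset({7, 8, 10}), 0),
    (frozenset({8, 9, 10, 11}), 0),
]
external = [(frozenset({1, 2, 3}), 1), (frozenset({7, 9, 10}), 0), (frozenset({2, 8, 9}), 0)]

scores = [knn_score(fp, train) for fp, _ in external]
print("k-NN scores:", scores)
print("external AUC:", roc_auc(scores, [y for _, y in external]))
print("IG of bit 1:", information_gain(1, train), "IG of bit 9:", round(information_gain(9, train), 3))
```

A fragment bit that appears only in toxic training molecules (bit 1 here) reaches the maximum gain of one bit on this balanced set. A bit shared across classes scores lower, which is the intuition behind ranking fragments by information gain.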
34
35
Ortea I. Foodomics in health: advanced techniques for studying the bioactive role of foods. Trends Analyt Chem 2022. [DOI: 10.1016/j.trac.2022.116589] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Indexed: 12/13/2022]
36
Fernández-Torras A, Comajuncosa-Creus A, Duran-Frigola M, Aloy P. Connecting chemistry and biology through molecular descriptors. Curr Opin Chem Biol 2021; 66:102090. [PMID: 34626922 DOI: 10.1016/j.cbpa.2021.09.001] [Citation(s) in RCA: 6] [Impact Index Per Article: 2.0] [Received: 07/11/2021] [Revised: 08/23/2021] [Accepted: 09/03/2021] [Indexed: 01/14/2023]
Abstract
Through the representation of small-molecule structures as numerical descriptors and the exploitation of the similarity principle, chemoinformatics has made paramount contributions to drug discovery, from unveiling mechanisms of action and repurposing approved drugs to the de novo crafting of molecules with desired properties and tailored targets. Yet the inherent complexity of biological systems has fostered the implementation of large-scale experimental screenings seeking a deeper understanding of the targeted proteins, the disrupted biological processes, and the systemic responses of cells to chemical perturbations. Building on this wealth of data, a new generation of data-driven descriptors has arisen, providing a rich portrait of small-molecule characteristics that goes beyond chemical properties. Here, we give an overview of biologically relevant descriptors, covering chemical compounds, proteins, and other biological entities such as diseases and cell lines, while aligning them with major contributions from fields such as natural language processing and computer vision. We envision a new scenario in which chemical and biological entities are both translated into a common numerical format. In this computational framework, complex connections between entities can be unveiled by means of simple arithmetic operations, such as distance measures, additions, and subtractions.
Affiliation(s)
- Adrià Fernández-Torras
- Joint IRB-BSC-CRG Program in Computational Biology, Institute for Research in Biomedicine (IRB Barcelona), The Barcelona Institute of Science and Technology, Barcelona, Catalonia, Spain
- Arnau Comajuncosa-Creus
- Joint IRB-BSC-CRG Program in Computational Biology, Institute for Research in Biomedicine (IRB Barcelona), The Barcelona Institute of Science and Technology, Barcelona, Catalonia, Spain
- Miquel Duran-Frigola
- Joint IRB-BSC-CRG Program in Computational Biology, Institute for Research in Biomedicine (IRB Barcelona), The Barcelona Institute of Science and Technology, Barcelona, Catalonia, Spain; Ersilia Open Source Initiative, Cambridge, United Kingdom
- Patrick Aloy
- Joint IRB-BSC-CRG Program in Computational Biology, Institute for Research in Biomedicine (IRB Barcelona), The Barcelona Institute of Science and Technology, Barcelona, Catalonia, Spain; Institució Catalana de Recerca i Estudis Avançats (ICREA), Barcelona, Catalonia, Spain
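The Fernández-Torras et al. abstract envisions chemical and biological entities sharing one numerical format. In that format, connections can be found with distances, additions and subtractions. The minimal sketch below illustrates the idea with hypothetical three-dimensional vectors. These are not the descriptors discussed in the review. Cosine distance is an assumed choice of metric. The analogy query (b - a + c) borrows the familiar word-embedding trick purely as an illustration.

```python
import math
from typing import Dict, Iterable, List, Sequence

Vector = List[float]


def add(u: Sequence[float], v: Sequence[float]) -> Vector:
    """Element-wise sum of two descriptor vectors."""
    return [a + b for a, b in zip(u, v)]


def subtract(u: Sequence[float], v: Sequence[float]) -> Vector:
    """Element-wise difference of two descriptor vectors."""
    return [a - b for a, b in zip(u, v)]


def cosine_distance(u: Sequence[float], v: Sequence[float]) -> float:
    """1 - cosine similarity; 0.0 when both vectors point in the same direction."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / (norm_u * norm_v)


def nearest(query: Sequence[float], space: Dict[str, Vector], candidates: Iterable[str]) -> str:
    """Name of the candidate entity whose vector is closest to the query."""
    return min(candidates, key=lambda name: cosine_distance(query, space[name]))


# Hypothetical descriptors placing compounds and protein targets in one shared space.
space: Dict[str, Vector] = {
    "compound_A": [1.0, 0.0, 0.0],
    "compound_B": [0.0, 1.0, 0.0],
    "target_X": [0.9, 0.1, 0.0],
    "target_Y": [0.0, 0.9, 0.1],
}
targets = ["target_X", "target_Y"]

# Distance query: which target lies closest to compound_A?
print(nearest(space["compound_A"], space, targets))
# Analogy query: target_X is to compound_A as which target is to compound_B?
query = add(subtract(space["compound_B"], space["compound_A"]), space["target_X"])
print(nearest(query, space, targets))
```

Once every entity lives in the same vector space, "which target does this compound resemble?" and relational questions reduce to the same nearest-neighbour search. This is the simplicity the abstract points to.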