1. Lin M, Cai J, Wei Y, Peng X, Luo Q, Li B, Chen Y, Wang L. MalariaFlow: A comprehensive deep learning platform for multistage phenotypic antimalarial drug discovery. Eur J Med Chem 2024; 277:116776. [PMID: 39173285 DOI: 10.1016/j.ejmech.2024.116776] [Received: 05/11/2024] [Revised: 07/31/2024] [Accepted: 08/01/2024] [Indexed: 08/24/2024]
Abstract
Malaria remains a significant global health challenge due to the growing drug resistance of Plasmodium parasites and the failure to block transmission within the human host. While machine learning (ML) and deep learning (DL) methods have shown promise in accelerating antimalarial drug discovery, the performance of DL models based on molecular graphs and other co-representation approaches warrants further exploration. Current research has overlooked mutant strains of the malaria parasite with varying degrees of sensitivity or resistance, and has not covered the prediction of inhibitory activities across the three major life cycle stages (liver, asexual blood, and gametocyte) within the human host, which is crucial for both treatment and transmission blocking. In this study, we manually curated a benchmark antimalarial activity dataset comprising 407,404 unique compounds and 410,654 bioactivity data points across ten Plasmodium phenotypes and three stages. Performance was systematically compared among two fingerprint-based ML models (RF::Morgan and XGBoost::Morgan), four graph-based DL models (GCN, GAT, MPNN, and Attentive FP), and three co-representation DL models (FP-GNN, HiGNN, and FG-BERT). The comparison revealed that: 1) the FP-GNN model achieved the best predictive performance, outperforming the other methods in distinguishing active and inactive compounds across balanced, more positive, and more negative datasets, with an overall AUROC of 0.900; 2) fingerprint-based ML models outperformed graph-based DL models on large datasets (>1000 compounds), but the three co-representation DL models were able to incorporate domain-specific chemical knowledge to bridge this gap, achieving better predictive performance. These findings provide valuable guidance for selecting appropriate ML and DL methods for antimalarial activity prediction tasks.
The interpretability analysis of the FP-GNN model revealed its ability to accurately capture the key structural features responsible for the liver- and blood-stage activities of the known antimalarial drug atovaquone. Finally, we developed a web server, MalariaFlow, incorporating these high-quality models for antimalarial activity prediction, virtual screening, and similarity search, successfully predicting novel triple-stage antimalarial hits validated through experimental testing, demonstrating its effectiveness and value in discovering potential multistage antimalarial drug candidates.
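For readers unfamiliar with the AUROC metric quoted in this abstract, the following is a minimal sketch of how such a score can be computed from binary activity labels and model scores. It uses the rank-based definition (the probability that a randomly chosen active compound is scored above a randomly chosen inactive one). The function name `auroc` and the six example compounds are illustrative assumptions, not data or code from the cited study.

```python
def auroc(labels, scores):
    """Rank-based AUROC: chance that a random active outscores a random inactive.

    labels: 1 for active, 0 for inactive (hypothetical example data).
    scores: model outputs, higher meaning "more likely active".
    Ties between an active and an inactive count as half a win.
    """
    actives = [s for y, s in zip(labels, scores) if y == 1]
    inactives = [s for y, s in zip(labels, scores) if y == 0]
    if not actives or not inactives:
        raise ValueError("need at least one active and one inactive compound")
    wins = 0.0
    for a in actives:
        for i in inactives:
            if a > i:
                wins += 1.0
            elif a == i:
                wins += 0.5
    return wins / (len(actives) * len(inactives))


# Hypothetical toy data: three actives and three inactives.
labels = [1, 1, 0, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.4, 0.3, 0.1]
example_auroc = auroc(labels, scores)  # 8 of 9 active/inactive pairs are ranked correctly
```

The pairwise loop is quadratic in the number of compounds, so for datasets of the size described above a sorted-rank implementation would be the practical choice; the pairwise form is shown only because it mirrors the definition directly.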
Affiliation(s)
- Mujie Lin
  - School of Biology and Biological Engineering, South China University of Technology, Guangzhou, 510006, China
- Junxi Cai
  - School of Civil Engineering and Transportation, South China University of Technology, Guangzhou, 510006, China
- Yuancheng Wei
  - School of Software Engineering, South China University of Technology, Guangzhou, 510006, China
- Xinru Peng
  - School of Software Engineering, South China University of Technology, Guangzhou, 510006, China
- Qianhui Luo
  - School of Biology and Biological Engineering, South China University of Technology, Guangzhou, 510006, China
- Biaoshun Li
  - School of Biology and Biological Engineering, South China University of Technology, Guangzhou, 510006, China
- Yihao Chen
  - School of Biology and Biological Engineering, South China University of Technology, Guangzhou, 510006, China
- Ling Wang
  - School of Biology and Biological Engineering, South China University of Technology, Guangzhou, 510006, China.
2. Miao Y, Liu W, Alsallameh SMS, Albekairi NA, Muhseen ZT, Butch CJ. Unraveling Cordia myxa's anti-malarial potential: integrative insights from network pharmacology, molecular modeling, and machine learning. BMC Infect Dis 2024; 24:1180. [PMID: 39427127 PMCID: PMC11490058 DOI: 10.1186/s12879-024-10078-9] [Received: 07/17/2024] [Accepted: 10/10/2024] [Indexed: 10/21/2024] Open
Abstract
Malaria is a potentially fatal infectious disease caused by parasites of the genus Plasmodium, which are transmitted to humans through the bites of infected female Anopheles mosquitoes, and it remains a serious public health problem worldwide. Cordia myxa is a medicinal plant with antimicrobial, anti-inflammatory, antioxidant, and antidiabetic activities, which makes it an important natural resource for treating various diseases in traditional medicine. In this investigation, a network pharmacology method was used to identify the potent active components, possible targets, and signaling pathways of C. myxa in relation to malaria therapy. The active compounds were subjected to molecular docking to validate their activity against the potential targets. The study concluded that the constituents cosmosiin, stigmastanol, robinetin, and quercetin were highly active and could regulate the expression of Interleukin 6 (IL6) and Cysteine-aspartic acid protease 3 (CASP3), which may act as potential therapeutic targets for malaria treatment. These analyses were validated by molecular dynamics simulation, which reflects the overall structural stability of the intermolecular conformations and interactions. The results were also supported by binding free energies computed from the simulation trajectories, which indicated a significant role of electrostatic and van der Waals energies in the total intermolecular interactions. Finally, we used machine learning to predict the anti-malarial activity of C. myxa compounds, comparing them with approved drugs. Using the Chemprop model and MAIP predictions, we assessed ten compounds, revealing their potential as lead anti-malarial agents. This study establishes a groundwork for understanding the anti-malarial action of C. myxa.
Affiliation(s)
- Yufei Miao
  - Department of Biomedical Engineering, College of Engineering and Applied Sciences, Nanjing University, Nanjing, 210093, China
- Wenkang Liu
  - Department of Biomedical Engineering, College of Engineering and Applied Sciences, Nanjing University, Nanjing, 210093, China
- Sarah Mohammed Saeed Alsallameh
  - Department of Medical Laboratories Techniques, College of Health and Medical Techniques, Gilgamesh Ahliya University Gau, Baghdad, Iraq
- Norah A Albekairi
  - College of Pharmacy, King Saud University, Post Box 2455, Riyadh, 11451, Saudi Arabia
- Ziyad Tariq Muhseen
  - Department of Biomedical Engineering, College of Engineering and Applied Sciences, Nanjing University, Nanjing, 210093, China.
  - Department of Pharmacy, Al-Mustaqbal University, Hillah, Babylon, 51001, Iraq.
- Christopher J Butch
  - Department of Biomedical Engineering, College of Engineering and Applied Sciences, Nanjing University, Nanjing, 210093, China.
  - State Key Laboratory of Analytical Chemistry for Life Science, Jiangsu Key Laboratory of Artificial Functional Materials, Nanjing University, Nanjing, 210093, China.
3. Turon G, Tse E, Qiu X, Todd M, Duran-Frigola M. Open Source Code Contributions to Global Health: The Case of Antimalarial Drug Discovery. ACS Med Chem Lett 2024; 15:1645-1650. [PMID: 39291016 PMCID: PMC11403727 DOI: 10.1021/acsmedchemlett.4c00131] [Received: 03/22/2024] [Revised: 07/17/2024] [Accepted: 07/29/2024] [Indexed: 09/19/2024] Open
Abstract
The discovery of treatments for infectious diseases that affect the poorest countries has been stagnant for decades. As long as expected returns on investment remain low, pharmaceutical companies' lack of interest in this disease area must be compensated for with collaborative efforts from the public sector. New approaches to drug discovery, inspired by the "open source" philosophy prevalent in software development, offer a platform for experts from diverse backgrounds to contribute their skills, enhancing reproducibility, progress tracking, and public discussion. Here, we present the first efforts of Ersilia, an initiative focused on attracting data scientists into contributing to global health, toward meeting the goals of Open Source Malaria, a consortium of medicinal chemists investigating antimalarial compounds using a purely open science approach. We showcase the chemical space exploration of a set of triazolopyrazine compounds with potent antiplasmodial activity and discuss how open source practices can serve as a common ground to make drug discovery more inclusive and participative.
Affiliation(s)
- Gemma Turon
  - Ersilia Open Source Initiative, Barcelona 08039, Spain
- Edwin Tse
  - School of Pharmacy, University College London, London WC1N 1AX, U.K
- Xin Qiu
  - School of Pharmacy, University College London, London WC1N 1AX, U.K
- Matthew Todd
  - School of Pharmacy, University College London, London WC1N 1AX, U.K
4. Hlozek J, Chibale K, Woodland JG. Ongoing Implementation and Prospective Validation of Artificial Intelligence/Machine Learning Tools at an African Drug Discovery Center. ACS Med Chem Lett 2024; 15:989-993. [PMID: 39015279 PMCID: PMC11247640 DOI: 10.1021/acsmedchemlett.4c00243] [Received: 05/30/2024] [Accepted: 06/05/2024] [Indexed: 07/18/2024] Open
Abstract
Artificial intelligence (AI) and machine learning (ML) are anticipated to accelerate drug discovery programs. Following our development of an end-to-end virtual screening cascade at the University of Cape Town (UCT) Holistic Drug Discovery and Development (H3D) Center, we report the ongoing implementation of open-source AI/ML tools for use in resource-constrained settings.
Affiliation(s)
- Jason Hlozek
  - Department of Chemistry and Holistic Drug Discovery and Development (H3D) Center, University of Cape Town, Cape Town 7701, South Africa
- Kelly Chibale
  - Department of Chemistry and Holistic Drug Discovery and Development (H3D) Center, University of Cape Town, Cape Town 7701, South Africa
  - South African Medical Research Council Drug Discovery and Development Research Unit, Institute of Infectious Disease and Molecular Medicine, University of Cape Town, Cape Town 7925, South Africa
- John G. Woodland
  - Department of Chemistry and Holistic Drug Discovery and Development (H3D) Center, University of Cape Town, Cape Town 7701, South Africa
  - South African Medical Research Council Drug Discovery and Development Research Unit, Institute of Infectious Disease and Molecular Medicine, University of Cape Town, Cape Town 7925, South Africa
5. Shen X, Zeng T, Chen N, Li J, Wu R. NIMO: A Natural Product-Inspired Molecular Generative Model Based on Conditional Transformer. Molecules 2024; 29:1867. [PMID: 38675687 PMCID: PMC11053988 DOI: 10.3390/molecules29081867] [Received: 03/05/2024] [Revised: 04/11/2024] [Accepted: 04/13/2024] [Indexed: 04/28/2024] Open
Abstract
Natural products (NPs) have diverse biological activity and significant medicinal value. The structural diversity of NPs is the mainstay of drug discovery. Expanding the chemical space of NPs is an urgent need. Inspired by the concept of fragment-assembled pseudo-natural products, we developed a computational tool called NIMO, which is based on the transformer neural network model. NIMO employs two tailor-made motif extraction methods to map a molecular graph into a semantic motif sequence. All these generated motif sequences are used to train our molecular generative models. Various NIMO models were trained under different task scenarios by recognizing syntactic patterns and structure-property relationships. We further explored the performance of NIMO in structure-guided, activity-oriented, and pocket-based molecule generation tasks. Our results show that NIMO had excellent performance for molecule generation from scratch and structure optimization from a scaffold.
Affiliation(s)
- Xiaojuan Shen
  - School of Pharmaceutical Sciences, Sun Yat-sen University, Guangzhou 510006, China; (X.S.); (T.Z.); (N.C.)
- Tao Zeng
  - School of Pharmaceutical Sciences, Sun Yat-sen University, Guangzhou 510006, China; (X.S.); (T.Z.); (N.C.)
- Nianhang Chen
  - School of Pharmaceutical Sciences, Sun Yat-sen University, Guangzhou 510006, China; (X.S.); (T.Z.); (N.C.)
- Jiabo Li
  - ChemXAI Inc., 53 Barry Lane, Syosset, NY 11791, USA
- Ruibo Wu
  - School of Pharmaceutical Sciences, Sun Yat-sen University, Guangzhou 510006, China; (X.S.); (T.Z.); (N.C.)
6. Heyndrickx W, Mervin L, Morawietz T, Sturm N, Friedrich L, Zalewski A, Pentina A, Humbeck L, Oldenhof M, Niwayama R, Schmidtke P, Fechner N, Simm J, Arany A, Drizard N, Jabal R, Afanasyeva A, Loeb R, Verma S, Harnqvist S, Holmes M, Pejo B, Telenczuk M, Holway N, Dieckmann A, Rieke N, Zumsande F, Clevert DA, Krug M, Luscombe C, Green D, Ertl P, Antal P, Marcus D, Do Huu N, Fuji H, Pickett S, Acs G, Boniface E, Beck B, Sun Y, Gohier A, Rippmann F, Engkvist O, Göller AH, Moreau Y, Galtier MN, Schuffenhauer A, Ceulemans H. MELLODDY: Cross-pharma Federated Learning at Unprecedented Scale Unlocks Benefits in QSAR without Compromising Proprietary Information. J Chem Inf Model 2024; 64:2331-2344. [PMID: 37642660 PMCID: PMC11005050 DOI: 10.1021/acs.jcim.3c00799] [Received: 05/25/2023] [Indexed: 08/31/2023]
Abstract
Federated multipartner machine learning has been touted as an appealing and efficient method to increase the effective training data volume and thereby the predictivity of models, particularly when the generation of training data is resource-intensive. In the landmark MELLODDY project, indeed, each of ten pharmaceutical companies realized aggregated improvements on its own classification or regression models through federated learning. To this end, they leveraged a novel implementation extending multitask learning across partners, on a platform audited for privacy and security. The experiments involved an unprecedented cross-pharma data set of 2.6+ billion confidential experimental activity data points, documenting 21+ million physical small molecules and 40+ thousand assays in on-target and secondary pharmacodynamics and pharmacokinetics. Appropriate complementary metrics were developed to evaluate the predictive performance in the federated setting. In addition to predictive performance increases in labeled space, the results point toward an extended applicability domain in federated learning. Increases in collective training data volume, including by means of auxiliary data resulting from single concentration high-throughput and imaging assays, continued to boost predictive performance, albeit with a saturating return. Markedly higher improvements were observed for the pharmacokinetics and safety panel assay-based task subsets.
Affiliation(s)
- Lewis Mervin
  - AstraZeneca R&D, Biomedical Campus, 1 Francis Crick Ave, Cambridge CB2 0SL, U.K.
- Tobias Morawietz
  - Bayer Pharma AG, Global Drug Discovery, Chemical Research, Computational Chemistry, Aprather Weg 18 a, Wuppertal 42096, Germany
- Noé Sturm
  - Novartis Institutes for BioMedical Research, Novartis Campus, Basel 4002, Switzerland
- Lukas Friedrich
  - Merck KGaA, Global Research & Development, Frankfurter Strasse 250, Darmstadt 64293, Germany
- Adam Zalewski
  - Amgen Research (Munich) GmbH, Staffelseestraße 2, Munich 81477, Germany
- Anastasia Pentina
  - Bayer AG, Machine Learning Research, Research & Development, Pharmaceuticals, Berlin 10117, Germany
- Lina Humbeck
  - BI Medicinal Chemistry Department, Boehringer Ingelheim Pharma GmbH & Co. KG, Birkendorfer Str. 65, Biberach an der Riss 88397, Germany
- Martijn Oldenhof
  - KU Leuven, ESAT-STADIUS, Kasteelpark Arenberg 10, Heverlee 3001, Belgium
- Ritsuya Niwayama
  - Institut de recherches Servier, 125 chemin de ronde Croissy-sur-Seine, Île-de-France 78290, France
- Nikolas Fechner
  - Novartis Institutes for BioMedical Research, Novartis Campus, Basel 4002, Switzerland
- Jaak Simm
  - KU Leuven, ESAT-STADIUS, Kasteelpark Arenberg 10, Heverlee 3001, Belgium
- Adam Arany
  - KU Leuven, ESAT-STADIUS, Kasteelpark Arenberg 10, Heverlee 3001, Belgium
- Rama Jabal
  - Iktos, 65 rue de Prony, Paris 75017, France
- Arina Afanasyeva
  - Modality Informatics Group, Digital Research Solutions, Advanced Informatics & Analytics, Astellas Pharma Inc., 21 Miyukigaoka, Tsukuba-shi, Ibaraki 305-8585, Japan
- Regis Loeb
  - KU Leuven, ESAT-STADIUS, Kasteelpark Arenberg 10, Heverlee 3001, Belgium
- Shlok Verma
  - GlaxoSmithKline, Computational Sciences, Gunnels Wood Road Stevenage, Herts SG1 2NY, U.K.
- Simon Harnqvist
  - GlaxoSmithKline, Computational Sciences, Gunnels Wood Road Stevenage, Herts SG1 2NY, U.K.
- Matthew Holmes
  - GlaxoSmithKline, Computational Sciences, Gunnels Wood Road Stevenage, Herts SG1 2NY, U.K.
- Balazs Pejo
  - Budapest University of Technology and Economics, Department of Networked Systems and Services, Műegyetem rkp. 3, Budapest 1111, Hungary
- Nicholas Holway
  - Novartis Institutes for BioMedical Research, Novartis Campus, Basel 4002, Switzerland
- Arne Dieckmann
  - Bayer AG, API Production, Product Supply, Pharmaceuticals, Ernst-Schering-Straße 14, Bergkamen 59192, Germany
- Nicola Rieke
  - NVIDIA GmbH, Floessergasse 2, Munich 81369, Germany
- Djork-Arné Clevert
  - Bayer AG, Machine Learning Research, Research & Development, Pharmaceuticals, Berlin 10117, Germany
- Michael Krug
  - Merck KGaA, Global Research & Development, Frankfurter Strasse 250, Darmstadt 64293, Germany
- Christopher Luscombe
  - GlaxoSmithKline, Computational Sciences, Gunnels Wood Road Stevenage, Herts SG1 2NY, U.K.
- Darren Green
  - GlaxoSmithKline, Computational Sciences, Gunnels Wood Road Stevenage, Herts SG1 2NY, U.K.
- Peter Ertl
  - Novartis Institutes for BioMedical Research, Novartis Campus, Basel 4002, Switzerland
- Peter Antal
  - Budapest University of Technology and Economics, Department of Measurement and Information Systems, Műegyetem rkp. 3, Budapest 1111, Hungary
- David Marcus
  - GlaxoSmithKline, Computational Sciences, Gunnels Wood Road Stevenage, Herts SG1 2NY, U.K.
- Hideyoshi Fuji
  - Modality Informatics Group, Digital Research Solutions, Advanced Informatics & Analytics, Astellas Pharma Inc., 21 Miyukigaoka, Tsukuba-shi, Ibaraki 305-8585, Japan
- Stephen Pickett
  - GlaxoSmithKline, Computational Sciences, Gunnels Wood Road Stevenage, Herts SG1 2NY, U.K.
- Gergely Acs
  - Budapest University of Technology and Economics, Department of Networked Systems and Services, Műegyetem rkp. 3, Budapest 1111, Hungary
- Eric Boniface
  - Substra Foundation - Labelia Labs, 4 rue Voltaire, Nantes 44000, France
- Bernd Beck
  - BI Medicinal Chemistry Department, Boehringer Ingelheim Pharma GmbH & Co. KG, Birkendorfer Str. 65, Biberach an der Riss 88397, Germany
- Yax Sun
  - Amgen Research, 1 Amgen Center Drive, Thousand Oaks, California 92130, United States
- Arnaud Gohier
  - Institut de recherches Servier, 125 chemin de ronde Croissy-sur-Seine, Île-de-France 78290, France
- Friedrich Rippmann
  - Merck KGaA, Global Research & Development, Frankfurter Strasse 250, Darmstadt 64293, Germany
- Ola Engkvist
  - AstraZeneca, Molecular AI, Discovery Sciences, R&D, Pepparedsleden 1, Mölndal 431 50, Sweden
- Andreas H. Göller
  - Bayer Pharma AG, Global Drug Discovery, Chemical Research, Computational Chemistry, Aprather Weg 18 a, Wuppertal 42096, Germany
- Yves Moreau
  - KU Leuven, ESAT-STADIUS, Kasteelpark Arenberg 10, Heverlee 3001, Belgium
- Ansgar Schuffenhauer
  - Novartis Institutes for BioMedical Research, Novartis Campus, Basel 4002, Switzerland
- Hugo Ceulemans
  - Janssen Pharmaceutica NV, Turnhoutseweg 30, Beerse 2340, Belgium
7. Zdrazil B, Felix E, Hunter F, Manners EJ, Blackshaw J, Corbett S, de Veij M, Ioannidis H, Lopez DM, Mosquera J, Magarinos M, Bosc N, Arcila R, Kizilören T, Gaulton A, Bento A, Adasme M, Monecke P, Landrum G, Leach A. The ChEMBL Database in 2023: a drug discovery platform spanning multiple bioactivity data types and time periods. Nucleic Acids Res 2024; 52:D1180-D1192. [PMID: 37933841 PMCID: PMC10767899 DOI: 10.1093/nar/gkad1004] [Received: 09/09/2023] [Revised: 10/09/2023] [Accepted: 10/23/2023] [Indexed: 11/08/2023] Open
Abstract
ChEMBL (https://www.ebi.ac.uk/chembl/) is a manually curated, high-quality, large-scale, open, FAIR and Global Core Biodata Resource of bioactive molecules with drug-like properties, previously described in the 2012, 2014, 2017 and 2019 Nucleic Acids Research Database Issues. Since its introduction in 2009, ChEMBL's content has changed dramatically in size and diversity of data types. Through incorporation of multiple new datasets from depositors since the 2019 update, ChEMBL now contains slightly more bioactivity data from deposited data vs data extracted from literature. In collaboration with the EUbOPEN consortium, chemical probe data is now regularly deposited into ChEMBL. Release 27 made curated data available for compounds screened for potential anti-SARS-CoV-2 activity from several large-scale drug repurposing screens. In addition, new patent bioactivity data have been added to the latest ChEMBL releases, and various new features have been incorporated, including a Natural Product likeness score, updated flags for Natural Products, a new flag for Chemical Probes, and the initial annotation of the action type for ∼270 000 bioactivity measurements.
Affiliation(s)
- Barbara Zdrazil
  - European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, Cambridgeshire CB10 1SD, UK
- Eloy Felix
  - European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, Cambridgeshire CB10 1SD, UK
- Fiona Hunter
  - European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, Cambridgeshire CB10 1SD, UK
- Emma J Manners
  - European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, Cambridgeshire CB10 1SD, UK
- James Blackshaw
  - European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, Cambridgeshire CB10 1SD, UK
- Sybilla Corbett
  - European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, Cambridgeshire CB10 1SD, UK
- Marleen de Veij
  - European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, Cambridgeshire CB10 1SD, UK
- Harris Ioannidis
  - European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, Cambridgeshire CB10 1SD, UK
- David Mendez Lopez
  - European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, Cambridgeshire CB10 1SD, UK
- Juan F Mosquera
  - European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, Cambridgeshire CB10 1SD, UK
- Maria Paula Magarinos
  - European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, Cambridgeshire CB10 1SD, UK
- Nicolas Bosc
  - European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, Cambridgeshire CB10 1SD, UK
- Ricardo Arcila
  - European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, Cambridgeshire CB10 1SD, UK
- Tevfik Kizilören
  - European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, Cambridgeshire CB10 1SD, UK
- Anna Gaulton
  - European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, Cambridgeshire CB10 1SD, UK
- A Patrícia Bento
  - European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, Cambridgeshire CB10 1SD, UK
- Melissa F Adasme
  - European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, Cambridgeshire CB10 1SD, UK
- Peter Monecke
  - Sanofi, R&D, Preclinical Safety, Industriepark Höchst, 65926 Frankfurt am Main, Germany
- Gregory A Landrum
  - Department of Chemistry and Applied Biosciences, ETH Zürich, 8093 Zürich, Switzerland
- Andrew R Leach
  - European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, Cambridgeshire CB10 1SD, UK
8. Li Y, Cardoso-Silva J, Kelly JM, Delves MJ, Furnham N, Papageorgiou LG, Tsoka S. Optimisation-based modelling for explainable lead discovery in malaria. Artif Intell Med 2024; 147:102700. [PMID: 38184363 DOI: 10.1016/j.artmed.2023.102700] [Received: 03/21/2023] [Revised: 10/17/2023] [Accepted: 10/29/2023] [Indexed: 01/08/2024]
Abstract
BACKGROUND The search for new antimalarial treatments is urgent due to growing resistance to existing therapies. The Open Source Malaria (OSM) project offers a promising starting point, having extensively screened various compounds for their effectiveness. Further analysis of the chemical space surrounding these compounds could provide the means for innovative drugs. METHODS We report an optimisation-based method for quantitative structure-activity relationship (QSAR) modelling that provides explainable modelling of ligand activity through a mathematical programming formulation. The methodology is based on piecewise regression principles and offers optimal detection of breakpoint features, efficient allocation of samples into distinct sub-groups based on breakpoint feature values, and insightful regression coefficients. Analysis of OSM antimalarial compounds yields interpretable results through rules generated by the model that reflect the contribution of individual fingerprint fragments in ligand activity prediction. Using knowledge of fragment prioritisation and screening of commercially available compound libraries, potential lead compounds for antimalarials are identified and evaluated experimentally via a Plasmodium falciparum asexual growth inhibition assay (PfGIA) and a human cell cytotoxicity assay. CONCLUSIONS Three compounds are identified as potential leads for antimalarials using the methodology described above. This work illustrates how explainable predictive models based on mathematical optimisation can pave the way towards more efficient fragment-based lead discovery as applied in malaria.
Affiliation(s)
- Yutong Li
  - Department of Informatics, King's College London, Bush House, London, WC2B 4BG, UK
- Jonathan Cardoso-Silva
  - Data Science Institute, London School of Economics and Political Science, Houghton St, London, WC2A 2AE, UK
- John M Kelly
  - Department of Infection Biology, London School of Hygiene and Tropical Medicine, Keppel St, London, WC1E 7HT, UK
- Michael J Delves
  - Department of Infection Biology, London School of Hygiene and Tropical Medicine, Keppel St, London, WC1E 7HT, UK
- Nicholas Furnham
  - Department of Infection Biology, London School of Hygiene and Tropical Medicine, Keppel St, London, WC1E 7HT, UK
- Lazaros G Papageorgiou
  - The Sargent Centre for Process Systems Engineering, Department of Chemical Engineering, University College London, Torrington Place, London, WC1E 7JE, UK
- Sophia Tsoka
  - Department of Informatics, King's College London, Bush House, London, WC2B 4BG, UK.
9. Bosc N, Felix E, Gardner JMF, Mills J, Timmerman M, Asveld D, Rensen K, Mukherjee P, Das R, Chenu E, Besson D, Burrows JN, Duffy J, Laleu B, Guantai EM, Leach AR. MAIP: An Open-Source Tool to Enrich High-Throughput Screening Output and Identify Novel, Druglike Molecules with Antimalarial Activity. ACS Med Chem Lett 2023; 14:1733-1741. [PMID: 38116432 PMCID: PMC10726451 DOI: 10.1021/acsmedchemlett.3c00369] [Received: 08/29/2023] [Revised: 10/30/2023] [Accepted: 11/06/2023] [Indexed: 12/21/2023] Open
Abstract
Efforts to tackle malaria must continue for a disease that threatens half of the global population. Parasite resistance to current therapies requires new chemotypes that are able to demonstrate effectiveness and safety. Previously, we developed a machine-learning-based approach to predict compound antimalarial activity, which was trained on the compound collections of several organizations. The resulting prediction platform, MAIP, was made freely available to the scientific community and offers a solution to prioritize molecules of interest in virtual screening and hit-to-lead optimization. Here, we experimentally validate MAIP and demonstrate how the approach was used in combination with a robust compound selection workflow and a recently introduced innovative high-throughput screening (HTS) cascade to select and purchase compounds from a public library for subsequent experimental screening. We observed a 12-fold enrichment compared with a randomly selected set of molecules, and the eight hits we ultimately selected exhibit good potency and absorption, distribution, metabolism, and excretion (ADME) profiles.
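The "12-fold enrichment" reported in this abstract is a ratio of hit rates. As a minimal sketch of that arithmetic, assuming enrichment is defined as the hit rate in the model-selected set divided by the hit rate in a randomly selected set, the calculation could look like the following. The function name and all compound counts are hypothetical and chosen only so the ratio comes out at 12; they are not figures from the study.

```python
def enrichment_factor(hits_selected, n_selected, hits_random, n_random):
    """Hit rate of a model-selected set divided by the hit rate of a random set.

    Computed as a single fraction, (hits_selected / n_selected) / (hits_random / n_random),
    rearranged to avoid intermediate rounding.
    """
    if n_selected <= 0 or n_random <= 0:
        raise ValueError("both sets must contain at least one tested compound")
    if hits_random <= 0:
        raise ValueError("random set needs at least one hit to define a baseline")
    return (hits_selected * n_random) / (n_selected * hits_random)


# Hypothetical counts: 60 hits among 1000 selected compounds (6.0 %)
# versus 5 hits among 1000 randomly picked compounds (0.5 %).
example_enrichment = enrichment_factor(60, 1000, 5, 1000)
```

With these illustrative counts the selected set has a twelve times higher hit rate than the random baseline, which is the sense in which a virtual screening tool "enriches" a purchased library.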
Affiliation(s)
- Nicolas Bosc
  - European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, Cambridgeshire CB10 1SD, United Kingdom
- Eloy Felix
  - European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, Cambridgeshire CB10 1SD, United Kingdom
- J. Mark F. Gardner
  - AMG Consultants Ltd, Discovery Park House, Discovery Park, Sandwich, Kent CT13 9ND, United Kingdom
- James Mills
  - Sandexis Medicinal Chemistry Ltd, Innovation House, Discovery Park, Sandwich, Kent CT13 9FF, United Kingdom
- Martijn Timmerman
  - Pivot Park Screening Centre, Pivot Park (Frederick Banting Building), Kloosterstraat 9, 5349 AB Oss, The Netherlands
- Dennis Asveld
  - Pivot Park Screening Centre, Pivot Park (Frederick Banting Building), Kloosterstraat 9, 5349 AB Oss, The Netherlands
- Kim Rensen
  - Pivot Park Screening Centre, Pivot Park (Frederick Banting Building), Kloosterstraat 9, 5349 AB Oss, The Netherlands
- Partha Mukherjee
  - TCG Life Sciences, Bengal Intelligent Park Limited, Block EP & GP, Salt Lake Electronics Complex, Sector V, Kolkata, West Bengal 700091, India
- Rishi Das
  - TCG Life Sciences, Bengal Intelligent Park Limited, Block EP & GP, Salt Lake Electronics Complex, Sector V, Kolkata, West Bengal 700091, India
- Elodie Chenu
  - Medicines for Malaria Ventures, 1215 Geneva, Switzerland
- James Duffy
  - Medicines for Malaria Ventures, 1215 Geneva, Switzerland
- Benoît Laleu
  - Medicines for Malaria Ventures, 1215 Geneva, Switzerland
- Eric M. Guantai
  - Department of Pharmacy, Faculty of Health Sciences, University of Nairobi, 00202 Nairobi, Kenya
- Andrew R. Leach
  - European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, Cambridgeshire CB10 1SD, United Kingdom
10
|
van Heerden A, Turon G, Duran-Frigola M, Pillay N, Birkholtz LM. Machine Learning Approaches Identify Chemical Features for Stage-Specific Antimalarial Compounds. ACS Omega 2023; 8:43813-43826. [PMID: 38027377 PMCID: PMC10666252 DOI: 10.1021/acsomega.3c05664] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/02/2023] [Revised: 10/18/2023] [Accepted: 10/20/2023] [Indexed: 12/01/2023]
Abstract
Efficacy data from diverse chemical libraries, screened against the various stages of the malaria parasite Plasmodium falciparum, including asexual blood stage (ABS) parasites and transmissible gametocytes, serve as a valuable reservoir of information on the chemical space of compounds that are either active (or not) against the parasite. We postulated that this data can be mined to define chemical features associated with the sole ABS activity and/or those that provide additional life cycle activity profiles like gametocytocidal activity. Additionally, this information could provide chemical features associated with inactive compounds, which could eliminate any future unnecessary screening of similar chemical analogs. Therefore, we aimed to use machine learning to identify the chemical space associated with stage-specific antimalarial activity. We collected data from various chemical libraries that were screened against the asexual (126 374 compounds) and sexual (gametocyte) stages of the parasite (93 941 compounds), calculated the compounds' molecular fingerprints, and trained machine learning models to recognize stage-specific active and inactive compounds. We were able to build several models that predict compound activity against ABS and dual activity against ABS and gametocytes, with Support Vector Machines (SVM) showing superior abilities with high recall (90 and 66%) and low false-positive predictions (15 and 1%). This allowed the identification of chemical features enriched in active and inactive populations, an important outcome that could be mined for essential chemical features to streamline hit-to-lead optimization strategies of antimalarial candidates. The predictive capabilities of the models held true in diverse chemical spaces, indicating that the ML models are therefore robust and can serve as a prioritization tool to drive and guide phenotypic screening and medicinal chemistry programs.
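The recall and false-positive figures quoted above can be read off a confusion matrix. The sketch below assumes the false-positive figure is a false-positive rate, FP / (FP + TN), which the abstract does not specify; the function names and counts are hypothetical and are not the study's data.

```python
def recall(true_pos, false_neg):
    """Fraction of truly active compounds that the model labels active."""
    return true_pos / (true_pos + false_neg)


def false_positive_rate(false_pos, true_neg):
    """Fraction of truly inactive compounds that the model labels active (assumed definition)."""
    return false_pos / (false_pos + true_neg)


# Hypothetical confusion-matrix counts for an ABS activity classifier.
print(f"recall = {recall(true_pos=90, false_neg=10):.2f}")
print(f"false-positive rate = {false_positive_rate(false_pos=15, true_neg=85):.2f}")
```

Reporting both numbers matters for screening prioritization: high recall keeps true actives in the list, while a low false-positive rate limits wasted follow-up assays.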
Affiliation(s)
- Ashleigh van Heerden: Department of Biochemistry, Genetics and Microbiology, Institute for Sustainable Malaria Control, University of Pretoria, Private Bag X20, Hatfield 0028, South Africa
- Gemma Turon: Ersilia Open Source Initiative, 28 Belgrave Road, Cambridge CB1 3DE, U.K.
- Nelishia Pillay: Department of Computer Science, University of Pretoria, Private Bag X20, Hatfield 0028, South Africa
- Lyn-Marié Birkholtz: Department of Biochemistry, Genetics and Microbiology, Institute for Sustainable Malaria Control, University of Pretoria, Private Bag X20, Hatfield 0028, South Africa
|
11
|
Rodríguez-Belenguer P, March-Vila E, Pastor M, Mangas-Sanjuan V, Soria-Olivas E. Usage of model combination in computational toxicology. Toxicol Lett 2023; 389:34-44. [PMID: 37890682 DOI: 10.1016/j.toxlet.2023.10.013] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/08/2023] [Revised: 10/17/2023] [Accepted: 10/24/2023] [Indexed: 10/29/2023]
Abstract
New Approach Methodologies (NAMs) have ushered in a new era in the field of toxicology, aiming to replace animal testing. However, despite these advancements, they are not exempt from the inherent complexities associated with the study's endpoint. In this review, we have identified three major groups of complexities: mechanistic, chemical space, and methodological. The mechanistic complexity arises from interconnected biological processes within a network that are challenging to model in a single step. In the second group, chemical space complexity exhibits significant dissimilarity between compounds in the training and test series. The third group encompasses algorithmic and molecular descriptor limitations and typical class imbalance problems. To address these complexities, this work provides a guide to the usage of a combination of predictive Quantitative Structure-Activity Relationship (QSAR) models, known as metamodels. This combination of low-level models (LLMs) enables a more precise approach to the problem by focusing on different sub-mechanisms or sub-processes. For mechanistic complexity, multiple Molecular Initiating Events (MIEs) or levels of information are combined to form a mechanistic-based metamodel. Regarding the complexity arising from chemical space, two types of approaches were reviewed to construct a fragment-based chemical space metamodel: those with and without structure sharing. Metamodels with structure sharing utilize unsupervised strategies to identify data patterns and build low-level models for each cluster, which are then combined. For situations without structure sharing due to pharmaceutical industry intellectual property, the use of prediction sharing, and federated learning approaches have been reviewed. 
Lastly, to tackle methodological complexity, various algorithms are combined to overcome their limitations, diverse descriptors are employed to enhance problem definition and balanced dataset combinations are used to address class imbalance issues (methodological-based metamodels). Remarkably, metamodels consistently outperformed classical QSAR models across all cases, highlighting the importance of alternatives to classical QSAR models when faced with such complexities.
Affiliation(s)
- Pablo Rodríguez-Belenguer: Research Programme on Biomedical Informatics (GRIB), Department of Medicine and Life Sciences, Universitat Pompeu Fabra, Hospital del Mar Medical Research Institute, 08003 Barcelona, Spain; Department of Pharmacy and Pharmaceutical Technology and Parasitology, Universitat de València, 46100 Valencia, Spain
- Eric March-Vila: Research Programme on Biomedical Informatics (GRIB), Department of Medicine and Life Sciences, Universitat Pompeu Fabra, Hospital del Mar Medical Research Institute, 08003 Barcelona, Spain
- Manuel Pastor: Research Programme on Biomedical Informatics (GRIB), Department of Medicine and Life Sciences, Universitat Pompeu Fabra, Hospital del Mar Medical Research Institute, 08003 Barcelona, Spain
- Victor Mangas-Sanjuan: Department of Pharmacy and Pharmaceutical Technology and Parasitology, Universitat de València, 46100 Valencia, Spain; Interuniversity Research Institute for Molecular Recognition and Technological Development, Universitat Politècnica de València, 46100 Valencia, Spain
- Emilio Soria-Olivas: IDAL, Intelligent Data Analysis Laboratory, ETSE, Universitat de València, 46100 Valencia, Spain
|
12
|
Turon G, Hlozek J, Woodland JG, Kumar A, Chibale K, Duran-Frigola M. First fully-automated AI/ML virtual screening cascade implemented at a drug discovery centre in Africa. Nat Commun 2023; 14:5736. [PMID: 37714843 PMCID: PMC10504240 DOI: 10.1038/s41467-023-41512-2] [Citation(s) in RCA: 9] [Impact Index Per Article: 9.0] [Received: 01/11/2023] [Accepted: 09/06/2023] [Indexed: 09/17/2023] Open
Abstract
Streamlined data-driven drug discovery remains challenging, especially in resource-limited settings. We present ZairaChem, an artificial intelligence (AI)- and machine learning (ML)-based tool for quantitative structure-activity/property relationship (QSAR/QSPR) modelling. ZairaChem is fully automated, requires low computational resources and works across a broad spectrum of datasets. We describe an end-to-end implementation at the H3D Centre, the leading integrated drug discovery unit in Africa, at which no prior AI/ML capabilities were available. By leveraging in-house data collected over a decade, we have developed a virtual screening cascade for malaria and tuberculosis drug discovery comprising 15 models for key decision-making assays ranging from whole-cell phenotypic screening and cytotoxicity to aqueous solubility, permeability, microsomal metabolic stability, cytochrome inhibition, and cardiotoxicity. We show how computational profiling of compounds, prior to synthesis and testing, can inform progression of frontrunner compounds at H3D. This project is a first-of-its-kind deployment at scale of AI/ML tools in a research centre operating in a low-resource setting.
Affiliation(s)
- Gemma Turon: Ersilia Open Source Initiative, Cambridge, UK
- Jason Hlozek: Department of Chemistry and Holistic Drug Discovery and Development (H3D) Centre, University of Cape Town, Cape Town, South Africa
- John G Woodland: Department of Chemistry and Holistic Drug Discovery and Development (H3D) Centre, University of Cape Town, Cape Town, South Africa; South African Medical Research Council Drug Discovery and Development Research Unit, Institute of Infectious Disease and Molecular Medicine, University of Cape Town, Cape Town, South Africa
- Ankur Kumar: Ersilia Open Source Initiative, Cambridge, UK
- Kelly Chibale: Department of Chemistry and Holistic Drug Discovery and Development (H3D) Centre, University of Cape Town, Cape Town, South Africa; South African Medical Research Council Drug Discovery and Development Research Unit, Institute of Infectious Disease and Molecular Medicine, University of Cape Town, Cape Town, South Africa
|
13
|
Smajić A, Rami I, Sosnin S, Ecker GF. Identifying Differences in the Performance of Machine Learning Models for Off-Targets Trained on Publicly Available and Proprietary Data Sets. Chem Res Toxicol 2023; 36:1300-1312. [PMID: 37439496 PMCID: PMC10445286 DOI: 10.1021/acs.chemrestox.3c00042] [Citation(s) in RCA: 4] [Impact Index Per Article: 4.0] [Received: 02/13/2023] [Indexed: 07/14/2023]
Abstract
Each year, publicly available databases are updated with new compounds from different research institutions. Positive experimental outcomes are more likely to be reported; therefore, they account for a considerable fraction of these entries. Established publicly available databases such as ChEMBL allow researchers to use information without constrictions and create predictive tools for a broad spectrum of applications in the field of toxicology. Therefore, we investigated the distribution of positive and nonpositive entries within ChEMBL for a set of off-targets and its impact on the performance of classification models when applied to pharmaceutical industry data sets. Results indicate that models trained on publicly available data tend to overpredict positives, and models based on industry data sets predict negatives more often than those built using publicly available data sets. This is strengthened even further by the visualization of the prediction space for a set of 10,000 compounds, which makes it possible to identify regions in the chemical space where predictions converge. Finally, we highlight the utilization of these models for consensus modeling for potential adverse events prediction.
Affiliation(s)
- Aljoša Smajić: Department of Pharmaceutical Sciences, University of Vienna, Josef-Holaubek-Platz 2, 1090 Vienna, Austria
- Iris Rami: Department of Pharmaceutical Sciences, University of Vienna, Josef-Holaubek-Platz 2, 1090 Vienna, Austria
- Sergey Sosnin: Department of Pharmaceutical Sciences, University of Vienna, Josef-Holaubek-Platz 2, 1090 Vienna, Austria
- Gerhard F. Ecker: Department of Pharmaceutical Sciences, University of Vienna, Josef-Holaubek-Platz 2, 1090 Vienna, Austria
|
14
|
Venkatraman V. FP-MAP: an extensive library of fingerprint-based molecular activity prediction tools. Front Chem 2023; 11:1239467. [PMID: 37649967 PMCID: PMC10462816 DOI: 10.3389/fchem.2023.1239467] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/13/2023] [Accepted: 07/31/2023] [Indexed: 09/01/2023] Open
Abstract
Discovering new drugs for disease treatment is challenging, requiring a multidisciplinary effort as well as time and resources. With a view to improving hit discovery and lead compound identification, machine learning (ML) approaches are being increasingly used in the decision-making process. Although a number of ML-based studies have been published, most studies only report fragments of the wider range of bioactivities wherein each model typically focuses on a particular disease. This study introduces FP-MAP, an extensive atlas of fingerprint-based prediction models that covers a diverse range of activities including neglected tropical diseases (caused by viral, bacterial and parasitic pathogens) as well as other targets implicated in diseases such as Alzheimer's. To arrive at the best predictive models, the performance of ≈4,000 classification/regression models was evaluated on different bioactivity data sets using 12 different molecular fingerprints. The best performing models, which achieved test set AUC values of 0.62-0.99, have been integrated into an easy-to-use graphical user interface that can be downloaded from https://gitlab.com/vishsoft/fpmap.
Affiliation(s)
- Vishwesh Venkatraman: Department of Chemistry, Norwegian University of Science and Technology, Trondheim, Norway
|
15
|
Richard-Bollans A, Aitken C, Antonelli A, Bitencourt C, Goyder D, Lucas E, Ondo I, Pérez-Escobar OA, Pironon S, Richardson JE, Russell D, Silvestro D, Wright CW, Howes MJR. Machine learning enhances prediction of plants as potential sources of antimalarials. Front Plant Sci 2023; 14:1173328. [PMID: 37304721 PMCID: PMC10248027 DOI: 10.3389/fpls.2023.1173328] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/24/2023] [Accepted: 04/20/2023] [Indexed: 06/13/2023]
Abstract
Plants are a rich source of bioactive compounds and a number of plant-derived antiplasmodial compounds have been developed into pharmaceutical drugs for the prevention and treatment of malaria, a major public health challenge. However, identifying plants with antiplasmodial potential can be time-consuming and costly. One approach for selecting plants to investigate is based on ethnobotanical knowledge which, though having provided some major successes, is restricted to a relatively small group of plant species. Machine learning, incorporating ethnobotanical and plant trait data, provides a promising approach to improve the identification of antiplasmodial plants and accelerate the search for new plant-derived antiplasmodial compounds. In this paper we present a novel dataset on antiplasmodial activity for three flowering plant families - Apocynaceae, Loganiaceae and Rubiaceae (together comprising c. 21,100 species) - and demonstrate the ability of machine learning algorithms to predict the antiplasmodial potential of plant species. We evaluate the predictive capability of a variety of algorithms - Support Vector Machines, Logistic Regression, Gradient Boosted Trees and Bayesian Neural Networks - and compare these to two ethnobotanical selection approaches - based on usage as an antimalarial and general usage as a medicine. We evaluate the approaches using the given data and when the given samples are reweighted to correct for sampling biases. In both evaluation settings each of the machine learning models have a higher precision than the ethnobotanical approaches. In the bias-corrected scenario, the Support Vector classifier performs best - attaining a mean precision of 0.67 compared to the best performing ethnobotanical approach with a mean precision of 0.46. We also use the bias correction method and the Support Vector classifier to estimate the potential of plants to provide novel antiplasmodial compounds. 
We estimate that 7677 species in Apocynaceae, Loganiaceae and Rubiaceae warrant further investigation and that at least 1300 active antiplasmodial species are highly unlikely to be investigated by conventional approaches. While traditional and Indigenous knowledge remains vital to our understanding of people-plant relationships and an invaluable source of information, these results indicate a vast and relatively untapped source in the search for new plant-derived antiplasmodial compounds.
Affiliation(s)
- Conal Aitken: Royal Botanic Gardens, Kew, Richmond, United Kingdom; EaStCHEM, School of Chemistry, University of St Andrews, St Andrews, United Kingdom
- Alexandre Antonelli: Royal Botanic Gardens, Kew, Richmond, United Kingdom; Gothenburg Global Biodiversity Centre, Department of Biological and Environmental Sciences, University of Gothenburg, Gothenburg, Sweden; Department of Biology, University of Oxford, Oxford, United Kingdom
- David Goyder: Royal Botanic Gardens, Kew, Richmond, United Kingdom
- Eve Lucas: Royal Botanic Gardens, Kew, Richmond, United Kingdom
- Ian Ondo: Royal Botanic Gardens, Kew, Richmond, United Kingdom
- Samuel Pironon: Royal Botanic Gardens, Kew, Richmond, United Kingdom; UN Environment Programme World Conservation Monitoring Centre (UNEP-WCMC), Cambridge, United Kingdom
- James E. Richardson: School of Biological, Earth and Environmental Sciences, University College Cork, Cork, Ireland; Tropical Diversity Section, Royal Botanic Garden, Edinburgh, United Kingdom; Departamento de Biología, Facultad de Ciencias Naturales, Universidad del Rosario, Bogotá, Colombia; Environmental Research Institute, University College Cork, Cork, Ireland
- David Russell: Royal Botanic Gardens, Kew, Richmond, United Kingdom
- Daniele Silvestro: Gothenburg Global Biodiversity Centre, Department of Biological and Environmental Sciences, University of Gothenburg, Gothenburg, Sweden; Department of Biology, University of Fribourg, Fribourg, Switzerland; Swiss Institute of Bioinformatics, Fribourg, Switzerland
- Colin W. Wright: School of Pharmacy and Medical Sciences, University of Bradford, Bradford, United Kingdom
- Melanie-Jayne R. Howes: Royal Botanic Gardens, Kew, Richmond, United Kingdom; Institute of Pharmaceutical Science, King’s College London, Franklin-Wilkins Building, London, United Kingdom
|
16
|
Mughal H, Bell EC, Mughal K, Derbyshire ER, Freundlich JS. Random Forest Model Predictions Afford Dual-Stage Antimalarial Agents. ACS Infect Dis 2022; 8:1553-1562. [PMID: 35894649 PMCID: PMC9987178 DOI: 10.1021/acsinfecdis.2c00189] [Citation(s) in RCA: 4] [Impact Index Per Article: 2.0] [Indexed: 11/28/2022]
Abstract
The need for novel antimalarials is apparent given the continuing disease burden worldwide, despite significant drug discovery advances from the bench to the bedside. In particular, small-molecule agents with potent efficacy against both the liver and blood stages of Plasmodium parasite infection are critical for clinical settings as they would simultaneously prevent and treat malaria with a reduced selection pressure for resistance. While experimental screens for such dual-stage inhibitors have been conducted, the time and cost of these efforts limit their scope. Here, we have focused on leveraging machine learning approaches to discover novel antimalarials with such properties. A random forest modeling approach was taken to predict small molecules with in vitro efficacy versus liver-stage Plasmodium berghei parasites and a lack of human liver cell cytotoxicity. Empirical validation of the model was achieved with the realization of hits with liver-stage efficacy after prospective scoring of a commercial diversity library and consideration of structural diversity. A subset of these hits also demonstrated promising blood-stage Plasmodium falciparum efficacy. These 18 validated dual-stage antimalarials represent novel starting points for drug discovery and mechanism of action studies with significant potential for seeding a new generation of therapies.
Affiliation(s)
- Haseeb Mughal: Department of Pharmacology, Physiology, and Neuroscience, Rutgers University – New Jersey Medical School, 185 South Orange Ave, Newark, NJ, 07103
- Elise C. Bell: Department of Chemistry, Duke University, 124 Science Drive, Durham, NC 27708, USA
- Khadija Mughal: Department of Pharmacology, Physiology, and Neuroscience, Rutgers University – New Jersey Medical School, 185 South Orange Ave, Newark, NJ, 07103
- Emily R. Derbyshire: Department of Chemistry, Duke University, 124 Science Drive, Durham, NC 27708, USA; Department of Molecular Genetics and Microbiology, Duke University Medical Center, 213 Research Drive, Durham, NC 27710, USA
- Joel S. Freundlich: Department of Pharmacology, Physiology, and Neuroscience, Rutgers University – New Jersey Medical School, 185 South Orange Ave, Newark, NJ, 07103; Department of Medicine, Center for Emerging and Re-emerging Pathogens, Rutgers University – New Jersey Medical School, Newark, NJ, 07103
|
17
|
In Silico Prediction of Plasmodium falciparum Cytoadherence Inhibitors That Disrupt Interaction between gC1qR-DBLβ12 Complex. Pharmaceuticals (Basel) 2022; 15:ph15060691. [PMID: 35745611 PMCID: PMC9230678 DOI: 10.3390/ph15060691] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 03/27/2022] [Revised: 05/24/2022] [Accepted: 05/26/2022] [Indexed: 02/06/2023] Open
Abstract
Malaria causes about half a million deaths per year, mainly in children below 5 years of age. Cytoadherence of Plasmodium falciparum infected erythrocytes in brain and placenta has been linked to severe malaria and malaria-related deaths. Cytoadherence is mediated by binding of human receptor gC1qR to the DBLβ12 domain of a P. falciparum erythrocyte membrane protein family 1 (PfEMP1) protein. In the present work, the gC1qR-DBLβ12 complex was studied extensively by molecular dynamics simulation. The stabilized protein complex was used to study the protein–protein interface interactions, and the interacting amino acid residues were mapped as hotspots. Inhibitors were predicted by screening about 15,000 compounds from Timbal, a virtual database of protein–protein interaction inhibitors. In silico mutagenesis studies, binding profiles and protein–ligand interaction fingerprinting were used to strengthen the screening of potential inhibitors of the gC1qR-DBLβ12 interface. Six compounds were selected and were further subjected to MAIP analysis and ADMET studies. Of these six, compounds 3, 5, and 6 outperformed the other selected compounds on all screening criteria. These compounds may provide novel drugs to treat and manage severe falciparum malaria. Additionally, the identified hotspots can be used in future for designing novel interventions that disrupt the interface interactions, such as peptides or vaccines. Further in vitro and in vivo studies are required to confirm these compounds as potential inhibitors of the gC1qR-DBLβ12 interaction.
|
18
|
Kim HW, Wang M, Leber CA, Nothias LF, Reher R, Kang KB, van der Hooft JJJ, Dorrestein PC, Gerwick WH, Cottrell GW. NPClassifier: A Deep Neural Network-Based Structural Classification Tool for Natural Products. J Nat Prod 2021; 84:2795-2807. [PMID: 34662515 PMCID: PMC8631337 DOI: 10.1021/acs.jnatprod.1c00399] [Citation(s) in RCA: 133] [Impact Index Per Article: 44.3] [Received: 04/23/2021] [Indexed: 05/04/2023]
Abstract
Computational approaches such as genome and metabolome mining are becoming essential to natural products (NPs) research. Consequently, a need exists for an automated structure-type classification system to handle the massive amounts of data appearing for NP structures. An ideal semantic ontology for the classification of NPs should go beyond the simple presence/absence of chemical substructures, but also include the taxonomy of the producing organism, the nature of the biosynthetic pathway, and/or their biological properties. Thus, a holistic and automatic NP classification framework could have considerable value to comprehensively navigate the relatedness of NPs, and especially so when analyzing large numbers of NPs. Here, we introduce NPClassifier, a deep-learning tool for the automated structural classification of NPs from their counted Morgan fingerprints. NPClassifier is expected to accelerate and enhance NP discovery by linking NP structures to their underlying properties.
Affiliation(s)
- Hyun Woo Kim: Center for Marine Biotechnology and Biomedicine, Scripps Institution of Oceanography, University of California San Diego, La Jolla, California 92093, United States
- Mingxun Wang: Skaggs School of Pharmacy and Pharmaceutical Sciences, University of California San Diego, La Jolla, California 92093, United States; Ometa Laboratories LLC, San Diego, California 92121, United States
- Christopher A. Leber: Center for Marine Biotechnology and Biomedicine, Scripps Institution of Oceanography, University of California San Diego, La Jolla, California 92093, United States
- Louis-Félix Nothias: Skaggs School of Pharmacy and Pharmaceutical Sciences, University of California San Diego, La Jolla, California 92093, United States
- Raphael Reher: Center for Marine Biotechnology and Biomedicine, Scripps Institution of Oceanography, University of California San Diego, La Jolla, California 92093, United States; Institute of Pharmacy Martin-Luther-University Halle-Wittenberg, Universitätsplatz 10, 06108 Halle (Saale), Germany
- Kyo Bin Kang: Research Institute of Pharmaceutical Sciences, College of Pharmacy, Sookmyung Women’s University, Seoul 04310, Korea
- Pieter C. Dorrestein: Skaggs School of Pharmacy and Pharmaceutical Sciences, University of California San Diego, La Jolla, California 92093, United States
- William H. Gerwick: Center for Marine Biotechnology and Biomedicine, Scripps Institution of Oceanography, University of California San Diego, La Jolla, California 92093, United States; Skaggs School of Pharmacy and Pharmaceutical Sciences, University of California San Diego, La Jolla, California 92093, United States
- Garrison W. Cottrell: Department of Computer Science and Engineering, University of California, San Diego, La Jolla, California 92093, United States
|
19
|
Humbeck L, Morawietz T, Sturm N, Zalewski A, Harnqvist S, Heyndrickx W, Holmes M, Beck B. Don't Overweight Weights: Evaluation of Weighting Strategies for Multi-Task Bioactivity Classification Models. Molecules 2021; 26:6959. [PMID: 34834051 PMCID: PMC8620420 DOI: 10.3390/molecules26226959] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 10/21/2021] [Revised: 11/11/2021] [Accepted: 11/12/2021] [Indexed: 11/17/2022] Open
Abstract
Machine learning models predicting the bioactivity of chemical compounds belong nowadays to the standard tools of cheminformaticians and computational medicinal chemists. Multi-task and federated learning are promising machine learning approaches that allow privacy-preserving usage of large amounts of data from diverse sources, which is crucial for achieving good generalization and high-performance results. Using large, real world data sets from six pharmaceutical companies, here we investigate different strategies for averaging weighted task loss functions to train multi-task bioactivity classification models. The weighting strategies shall be suitable for federated learning and ensure that learning efforts are well distributed even if data are diverse. Comparing several approaches using weights that depend on the number of sub-tasks per assay, task size, and class balance, respectively, we find that a simple sub-task weighting approach leads to robust model performance for all investigated data sets and is especially suited for federated learning.
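The weighted averaging of task losses described above can be sketched as follows. This is one plausible reading rather than the authors' formulation: each task is weighted by the inverse of the number of sub-tasks that share its assay, so that every assay contributes equally to the total. The function names, assay labels, and loss values are all hypothetical.

```python
from collections import Counter


def subtask_weights(assay_of_task):
    """Weight each task by 1 / (number of sub-tasks derived from the same assay).

    Assumed weighting rule for illustration; the abstract does not give the formula.
    """
    counts = Counter(assay_of_task)
    return [1.0 / counts[assay] for assay in assay_of_task]


def weighted_average_loss(task_losses, task_weights):
    """Weighted mean of per-task losses."""
    if len(task_losses) != len(task_weights):
        raise ValueError("one weight is required per task loss")
    total_weight = sum(task_weights)
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive value")
    weighted_sum = sum(loss * w for loss, w in zip(task_losses, task_weights))
    return weighted_sum / total_weight


# Hypothetical example: assay "A" is split into two sub-tasks, assay "B" has one.
assays = ["A", "A", "B"]
losses = [0.2, 0.4, 0.9]
weights = subtask_weights(assays)
print(weights, weighted_average_loss(losses, weights))
```

Under this rule an assay split into many thresholds does not dominate training simply because it yields more tasks, and the weights depend only on the assay layout, not on private data statistics.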
Affiliation(s)
- Lina Humbeck: Medicinal Chemistry Department, Boehringer Ingelheim Pharma GmbH & Co. KG, Birkendorfer Str. 65, 88397 Biberach an der Riss, Germany
- Tobias Morawietz: Bayer AG, Pharmaceuticals, R&D, Digital Technologies, Computational Molecular Design, 42096 Wuppertal, Germany
- Noe Sturm: Novartis Institutes for BioMedical Research, CH-4002 Basel, Switzerland
- Adam Zalewski: Amgen Research (Munich) GmbH, Staffelseestraße 2, 81477 Munich, Germany
- Simon Harnqvist: Computational Sciences, GlaxoSmithKline, Gunnels Wood Road, Stevenage SG1 2NY, UK
- Matthew Holmes: Computational Sciences, GlaxoSmithKline, Gunnels Wood Road, Stevenage SG1 2NY, UK
- Bernd Beck: Medicinal Chemistry Department, Boehringer Ingelheim Pharma GmbH & Co. KG, Birkendorfer Str. 65, 88397 Biberach an der Riss, Germany
|
20
|
Martin EJ, Zhu XW. Collaborative Profile-QSAR: A Natural Platform for Building Collaborative Models among Competing Companies. J Chem Inf Model 2021; 61:1603-1616. [PMID: 33844519 DOI: 10.1021/acs.jcim.0c01342] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Indexed: 11/29/2022]
Abstract
Massively multitask bioactivity models that transfer learning between thousands of assays have been shown to work dramatically better than separate models trained on each individual assay. In particular, the applicability domain for a given model can expand from compounds similar to those tested in that specific assay to those tested across the full complement of contributing assays. If many large companies would share their assay data and train models on the superset, predictions should be better than what each company can do alone. However, a company's compounds, targets, and activities are among their most guarded trade secrets. Strategies have been proposed to share just the individual collaborators' models, without exposing any of the training data. Profile-QSAR (pQSAR) is a two-level, multitask, stacked model. It uses profiles of level-1 predictions from single-task models for thousands of assays as compound descriptors for level-2 models. This work describes its simple and natural adaptation to safe collaboration by model sharing. Broad model sharing has not yet been implemented across multiple large companies, so there are numerous unanswered questions. Novartis was formed from several mergers and acquisitions. In principle, this should allow an internal simulation of model sharing. In practice, the lack of metadata about the origins of compounds and assays made this difficult. 
Nevertheless, we have attempted to simulate this process and propose some findings: multitask pQSAR is always an improvement over single-task models; collaborative multitask modeling did not improve predictions on internal compounds; collaboration did improve predictions for external compounds but far less than the purely internal multitask modeling for internal compounds; collaborative models for external compounds increasingly improve as overlap between compound collections increases; combining profiles from inside and outside the company is not best, with internal predictions better using only the inside profile and external using only the outside profile, but a consensus of models using all three profiles is best on external compounds and a good compromise on internal compounds. We anticipate similar results from other model-sharing approaches. Indeed, since collaborative pQSAR through model sharing is mathematically identical to pQSAR using actual shared data, we believe our conclusions should apply to collaborative modeling by any current method even including the unlikely scenario of directly sharing all chemical structures and assay data.
Affiliation(s)
- Eric J Martin: Novartis Institute for Biomedical Research, 5959 Horton Street, Emeryville, California 94608-2916, United States
- Xiang-Wei Zhu: Novartis Institute for Biomedical Research, 5959 Horton Street, Emeryville, California 94608-2916, United States
|