1
Wang J, Zhu F. Multi-objective molecular generation via clustered Pareto-based reinforcement learning. Neural Netw 2024; 179:106596. [PMID: 39163823 DOI: 10.1016/j.neunet.2024.106596] [Received: 02/15/2024] [Revised: 06/16/2024] [Accepted: 08/01/2024] [Indexed: 08/22/2024]
Abstract
De novo molecular design is the process of learning knowledge from existing data to propose new chemical structures that satisfy the desired properties. By using de novo design to generate compounds in a directed manner, better solutions can be obtained in large chemical libraries with less comparison cost. But drug design needs to take multiple factors into consideration. For example, in polypharmacology, molecules that activate or inhibit multiple target proteins produce multiple pharmacological activities and are less susceptible to drug resistance. However, most existing molecular generation methods either focus only on affinity for a single target or fail to effectively balance the relationship between multiple targets, resulting in insufficient validity and desirability of the generated molecules. To address the problems, an approach called clustered Pareto-based reinforcement learning (CPRL) is proposed. In CPRL, a pre-trained model is constructed to grasp existing molecular knowledge in a supervised learning manner. In addition, the clustered Pareto optimization algorithm is presented to find the best solution between different objectives. The algorithm first extracts an update set from the sampled molecules through the designed aggregation-based molecular clustering. Then, the final reward is computed by constructing the Pareto frontier ranking of the molecules from the updated set. To explore the vast chemical space, a reinforcement learning agent is designed in CPRL that can be updated under the guidance of the final reward to balance multiple properties. Furthermore, to increase the internal diversity of the molecules, a fixed-parameter exploration model is used for sampling in conjunction with the agent. The experimental results demonstrate that CPRL is capable of balancing multiple properties of the molecule and has higher desirability and validity, reaching 0.9551 and 0.9923, respectively.
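The Pareto frontier ranking mentioned in the abstract can be illustrated with a small non-dominated sorting routine. This is only a minimal sketch, not the CPRL implementation: the clustering step is omitted, the score tuples are hypothetical, every objective is assumed to be maximised, and the rank-to-reward mapping in `rank_rewards` is an assumption made for illustration.

```python
from typing import List, Sequence


def dominates(a: Sequence[float], b: Sequence[float]) -> bool:
    """True if a is no worse than b on every objective and strictly better on one."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))


def pareto_ranks(scores: List[Sequence[float]]) -> List[int]:
    """Rank 0 is the non-dominated front, rank 1 the next front, and so on."""
    ranks = [-1] * len(scores)
    remaining = set(range(len(scores)))
    rank = 0
    while remaining:
        # A point belongs to the current front if no remaining point dominates it.
        front = {
            i for i in remaining
            if not any(dominates(scores[j], scores[i]) for j in remaining if j != i)
        }
        for i in front:
            ranks[i] = rank
        remaining -= front
        rank += 1
    return ranks


def rank_rewards(ranks: List[int]) -> List[float]:
    """Hypothetical mapping: front 0 receives 1.0, later fronts receive less."""
    worst = max(ranks)
    return [1.0 - r / (worst + 1) for r in ranks]


# Hypothetical (objective_1, objective_2) scores for four sampled molecules.
scores = [(0.9, 0.2), (0.5, 0.5), (0.4, 0.4), (0.1, 0.1)]
ranks = pareto_ranks(scores)
rewards = rank_rewards(ranks)
```

In this toy example the first two molecules trade off against each other and share the first front, while the remaining two fall into later fronts and receive smaller rewards.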
Affiliation(s)
- Jing Wang
  - School of Computer Science and Technology, Soochow University, Suzhou, 215006, China.
- Fei Zhu
  - School of Computer Science and Technology, Soochow University, Suzhou, 215006, China.
2
Wang J, Wang X, Pang Y. StructNet-DDI: Molecular Structure Characterization-Based ResNet for Prediction of Drug-Drug Interactions. Molecules 2024; 29:4829. [PMID: 39459198 PMCID: PMC11510539 DOI: 10.3390/molecules29204829] [Received: 09/05/2024] [Revised: 09/30/2024] [Accepted: 10/09/2024] [Indexed: 10/28/2024] [Open Access]
Abstract
This study introduces a deep learning framework based on SMILES representations of chemical structures to predict drug-drug interactions (DDIs). The model extracts Morgan fingerprints and key molecular descriptors, transforming them into raw graphical features for input into a modified ResNet18 architecture. The deep residual network, enhanced with regularization techniques, efficiently addresses training issues such as gradient vanishing and exploding, resulting in superior predictive performance. Experimental results show that StructNet-DDI achieved an AUC of 99.7%, an accuracy of 94.4%, and an AUPR of 99.9%, demonstrating the model's effectiveness and reliability. These findings highlight that StructNet-DDI can effectively extract crucial features from molecular structures, offering a simple yet robust tool for DDI prediction.
Affiliation(s)
- Jihong Wang
  - School of Computer, Guangdong University of Education, Guangzhou 510310, China
- Xiaodan Wang
  - School of Pharmaceutical Chemistry and Chemical Engineering, Guangdong Pharmaceutical University, Zhongshan 528458, China
- Yuyao Pang
  - School of Pharmaceutical Chemistry and Chemical Engineering, Guangdong Pharmaceutical University, Zhongshan 528458, China
3
Zhai S, Tan Y, Zhu C, Zhang C, Gao Y, Mao Q, Zhang Y, Duan H, Yin Y. PepExplainer: An explainable deep learning model for selection-based macrocyclic peptide bioactivity prediction and optimization. Eur J Med Chem 2024; 275:116628. [PMID: 38944933 DOI: 10.1016/j.ejmech.2024.116628] [Received: 04/17/2024] [Revised: 06/21/2024] [Accepted: 06/24/2024] [Indexed: 07/02/2024]
Abstract
Macrocyclic peptides possess unique features, making them highly promising as a drug modality. However, evaluating their bioactivity through wet lab experiments is generally resource-intensive and time-consuming. Despite advancements in artificial intelligence (AI) for bioactivity prediction, challenges remain due to limited data availability and the interpretability issues in deep learning models, often leading to less-than-ideal predictions. To address these challenges, we developed PepExplainer, an explainable graph neural network based on substructure mask explanation (SME). This model excels at deciphering amino acid substructures, translating macrocyclic peptides into detailed molecular graphs at the atomic level, and efficiently handling non-canonical amino acids and complex macrocyclic peptide structures. PepExplainer's effectiveness is enhanced by utilizing the correlation between peptide enrichment data from selection-based focused library and bioactivity data, and employing transfer learning to improve bioactivity predictions of macrocyclic peptides against IL-17C/IL-17 RE interaction. Additionally, PepExplainer underwent further validation for bioactivity prediction using an additional set of thirteen newly synthesized macrocyclic peptides. Moreover, it enabled the optimization of the IC50 of a macrocyclic peptide, reducing it from 15 nM to 5.6 nM based on the contribution score provided by PepExplainer. This achievement underscores PepExplainer's skill in deciphering complex molecular patterns, highlighting its potential to accelerate the discovery and optimization of macrocyclic peptides.
Affiliation(s)
- Silong Zhai
  - School of Pharmaceutical Sciences, Zhejiang University of Technology, Hangzhou, 310014, China
- Yahong Tan
  - State Key Laboratory of Microbial Technology, Institute of Microbial Technology, Shandong University, Qingdao, 266237, China
- Cheng Zhu
  - School of Pharmaceutical Sciences, Zhejiang University of Technology, Hangzhou, 310014, China
- Chengyun Zhang
  - School of Pharmaceutical Sciences, Zhejiang University of Technology, Hangzhou, 310014, China
- Yan Gao
  - Qilu Institute of Technology, Jinan, 250200, China
- Qingyi Mao
  - School of Pharmaceutical Sciences, Zhejiang University of Technology, Hangzhou, 310014, China
- Youming Zhang
  - State Key Laboratory of Microbial Technology, Institute of Microbial Technology, Shandong University, Qingdao, 266237, China
- Hongliang Duan
  - Faculty of Applied Sciences, Macao Polytechnic University, Macao, 999078, China.
- Yizhen Yin
  - State Key Laboratory of Microbial Technology, Institute of Microbial Technology, Shandong University, Qingdao, 266237, China; Shandong Research Institute of Industrial Technology, Jinan, 250101, China.
4
Zheng Y, Ma Y, Xiong Q, Zhu K, Weng N, Zhu Q. The role of artificial intelligence in the development of anticancer therapeutics from natural polyphenols: Current advances and future prospects. Pharmacol Res 2024; 208:107381. [PMID: 39218422 DOI: 10.1016/j.phrs.2024.107381] [Received: 06/11/2024] [Revised: 08/06/2024] [Accepted: 08/26/2024] [Indexed: 09/04/2024]
Abstract
Natural polyphenols, abundant in the human diet, are derived from a wide variety of sources. Numerous preclinical studies have demonstrated their significant anticancer properties against various malignancies, making them valuable resources for drug development. However, traditional experimental methods for developing anticancer therapies from natural polyphenols are time-consuming and labor-intensive. Recently, artificial intelligence has shown promising advancements in drug discovery. Integrating AI technologies into the development process for natural polyphenols can substantially reduce development time and enhance efficiency. In this study, we review the crucial roles of natural polyphenols in anticancer treatment and explore the potential of AI technologies to aid in drug development. Specifically, we discuss the application of AI in key stages such as drug structure prediction, virtual drug screening, prediction of biological activity, and drug-target protein interaction, highlighting the potential to revolutionize the development of natural polyphenol-based anticancer therapies.
Affiliation(s)
- Ying Zheng
  - Division of Abdominal Tumor Multimodality Treatment, Cancer Center, West China Hospital, Sichuan University, No.37 Guoxue Alley, Chengdu, Sichuan 610041, China
- Yifei Ma
  - Division of Abdominal Tumor Multimodality Treatment, Cancer Center, West China Hospital, Sichuan University, No.37 Guoxue Alley, Chengdu, Sichuan 610041, China
- Qunli Xiong
  - Division of Abdominal Tumor Multimodality Treatment, Cancer Center, West China Hospital, Sichuan University, No.37 Guoxue Alley, Chengdu, Sichuan 610041, China
- Kai Zhu
  - Department of Medical Oncology, Clinical Oncology School of Fujian Medical University, Fujian Cancer Hospital, Fujian 350011, PR China
- Ningna Weng
  - Department of Medical Oncology, Clinical Oncology School of Fujian Medical University, Fujian Cancer Hospital, Fujian 350011, PR China
- Qing Zhu
  - Division of Abdominal Tumor Multimodality Treatment, Cancer Center, West China Hospital, Sichuan University, No.37 Guoxue Alley, Chengdu, Sichuan 610041, China.
5
Amorim AM, Piochi LF, Gaspar AT, Preto A, Rosário-Ferreira N, Moreira IS. Advancing Drug Safety in Drug Development: Bridging Computational Predictions for Enhanced Toxicity Prediction. Chem Res Toxicol 2024; 37:827-849. [PMID: 38758610 PMCID: PMC11187637 DOI: 10.1021/acs.chemrestox.3c00352] [Received: 11/06/2023] [Revised: 04/29/2024] [Accepted: 05/07/2024] [Indexed: 05/19/2024]
Abstract
The attrition rate of drugs in clinical trials is generally quite high, with estimates suggesting that approximately 90% of drugs fail to make it through the process. The identification of unexpected toxicity issues during preclinical stages is a significant factor contributing to this high rate of failure. These issues can have a major impact on the success of a drug and must be carefully considered throughout the development process. These late-stage rejections or withdrawals of drug candidates significantly increase the costs associated with drug development, particularly when toxicity is detected during clinical trials or after market release. Understanding drug-biological target interactions is essential for evaluating compound toxicity and safety, as well as predicting therapeutic effects and potential off-target effects that could lead to toxicity. This will enable scientists to predict and assess the safety profiles of drug candidates more accurately. Evaluation of toxicity and safety is a critical aspect of drug development, and biomolecules, particularly proteins, play vital roles in complex biological networks and often serve as targets for various chemicals. Therefore, a better understanding of these interactions is crucial for the advancement of drug development. The development of computational methods for evaluating protein-ligand interactions and predicting toxicity is emerging as a promising approach that adheres to the 3Rs principles (replace, reduce, and refine) and has garnered significant attention in recent years. In this review, we present a thorough examination of the latest breakthroughs in drug toxicity prediction, highlighting the significance of drug-target binding affinity in anticipating and mitigating possible adverse effects. In doing so, we aim to contribute to the development of more effective and secure drugs.
Affiliation(s)
- Ana M. B. Amorim
  - Department of Life Sciences, University of Coimbra, Calçada Martim de Freitas, 3000-456 Coimbra, Portugal
  - CNC-UC—Center for Neuroscience and Cell Biology, University of Coimbra, Calçada Martim de Freitas, 3000-456 Coimbra, Portugal
  - CIBB—Centre for Innovative Biomedicine and Biotechnology, University of Coimbra, Calçada Martim de Freitas, 3000-456 Coimbra, Portugal
  - PhD Programme in Biosciences, Department of Life Sciences, University of Coimbra, Calçada Martim de Freitas, 3000-456 Coimbra, Portugal
  - PURR.AI, Rua Pedro Nunes, IPN Incubadora, Ed C, 3030-199 Coimbra, Portugal
- Luiz F. Piochi
  - Department of Life Sciences, University of Coimbra, Calçada Martim de Freitas, 3000-456 Coimbra, Portugal
  - CNC-UC—Center for Neuroscience and Cell Biology, University of Coimbra, Calçada Martim de Freitas, 3000-456 Coimbra, Portugal
  - CIBB—Centre for Innovative Biomedicine and Biotechnology, University of Coimbra, Calçada Martim de Freitas, 3000-456 Coimbra, Portugal
- Ana T. Gaspar
  - Department of Life Sciences, University of Coimbra, Calçada Martim de Freitas, 3000-456 Coimbra, Portugal
  - CNC-UC—Center for Neuroscience and Cell Biology, University of Coimbra, Calçada Martim de Freitas, 3000-456 Coimbra, Portugal
  - CIBB—Centre for Innovative Biomedicine and Biotechnology, University of Coimbra, Calçada Martim de Freitas, 3000-456 Coimbra, Portugal
- António J. Preto
  - CNC-UC—Center for Neuroscience and Cell Biology, University of Coimbra, Calçada Martim de Freitas, 3000-456 Coimbra, Portugal
  - CIBB—Centre for Innovative Biomedicine and Biotechnology, University of Coimbra, Calçada Martim de Freitas, 3000-456 Coimbra, Portugal
  - PhD Programme in Experimental Biology and Biomedicine, Institute for Interdisciplinary Research (IIIUC), University of Coimbra, Casa Costa Alemão, 3030-789 Coimbra, Portugal
- Nícia Rosário-Ferreira
  - CNC-UC—Center for Neuroscience and Cell Biology, University of Coimbra, Calçada Martim de Freitas, 3000-456 Coimbra, Portugal
  - CIBB—Centre for Innovative Biomedicine and Biotechnology, University of Coimbra, Calçada Martim de Freitas, 3000-456 Coimbra, Portugal
- Irina S. Moreira
  - Department of Life Sciences, University of Coimbra, Calçada Martim de Freitas, 3000-456 Coimbra, Portugal
  - CNC-UC—Center for Neuroscience and Cell Biology, University of Coimbra, Calçada Martim de Freitas, 3000-456 Coimbra, Portugal
  - CIBB—Centre for Innovative Biomedicine and Biotechnology, University of Coimbra, Calçada Martim de Freitas, 3000-456 Coimbra, Portugal
6
Snyder SH, Vignaux PA, Ozalp MK, Gerlach J, Puhl AC, Lane TR, Corbett J, Urbina F, Ekins S. The Goldilocks paradigm: comparing classical machine learning, large language models, and few-shot learning for drug discovery applications. Commun Chem 2024; 7:134. [PMID: 38866916 PMCID: PMC11169557 DOI: 10.1038/s42004-024-01220-4] [Received: 12/21/2023] [Accepted: 06/04/2024] [Indexed: 06/14/2024] [Open Access]
Abstract
Recent advances in machine learning (ML) have led to newer model architectures including transformers (large language models, LLMs) showing state of the art results in text generation and image analysis as well as few-shot learning (FSLC) models which offer predictive power with extremely small datasets. These new architectures may offer promise, yet the 'no free lunch' theorem suggests that no single model algorithm can outperform at all possible tasks. Here, we explore the capabilities of classical (SVR), FSLC, and transformer models (MolBART) over a range of dataset tasks and show a 'goldilocks zone' for each model type, in which dataset size and feature distribution (i.e. dataset "diversity") determine the optimal algorithm strategy. When datasets are small (<50 molecules), FSLC tend to outperform both classical ML and transformers. When datasets are small-to-medium sized (50-240 molecules) and diverse, transformers outperform both classical models and few-shot learning. Finally, when datasets are larger and of sufficient size, classical models then perform the best, suggesting that the optimal model to choose likely depends on the dataset available, its size and diversity. These findings may help to answer the perennial question of which ML algorithm is to be used when faced with a new dataset.
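The size and diversity thresholds reported in the abstract can be read as a simple lookup rule. The sketch below is only an illustration of that reading: the function name is hypothetical, treating anything above 240 molecules as "larger" is an assumption, and the case of a 50-240 molecule dataset that is not diverse is left undecided because the abstract does not cover it.

```python
from typing import Optional


def suggest_model_family(n_molecules: int, is_diverse: bool) -> Optional[str]:
    """Return the model family the abstract reports as best, or None if unstated."""
    if n_molecules < 0:
        raise ValueError("n_molecules must be non-negative")
    if n_molecules < 50:
        # Small datasets: few-shot learning tended to outperform.
        return "few-shot learning"
    if n_molecules <= 240:
        # Small-to-medium datasets: transformers won only when the data were diverse.
        return "transformer" if is_diverse else None
    # Assumption: datasets above 240 molecules count as "larger and of sufficient size".
    return "classical ML"
```

For example, `suggest_model_family(30, is_diverse=False)` returns `"few-shot learning"`, while a diverse set of 120 molecules maps to `"transformer"`.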
Affiliation(s)
- Scott H Snyder
  - Collaborations Pharmaceuticals, Inc., 840 Main Campus Drive, Lab 3510, Raleigh, NC, 27606, USA
- Patricia A Vignaux
  - Collaborations Pharmaceuticals, Inc., 840 Main Campus Drive, Lab 3510, Raleigh, NC, 27606, USA
- Mustafa Kemal Ozalp
  - Collaborations Pharmaceuticals, Inc., 840 Main Campus Drive, Lab 3510, Raleigh, NC, 27606, USA
- Jacob Gerlach
  - Collaborations Pharmaceuticals, Inc., 840 Main Campus Drive, Lab 3510, Raleigh, NC, 27606, USA
- Ana C Puhl
  - Collaborations Pharmaceuticals, Inc., 840 Main Campus Drive, Lab 3510, Raleigh, NC, 27606, USA
- Thomas R Lane
  - Collaborations Pharmaceuticals, Inc., 840 Main Campus Drive, Lab 3510, Raleigh, NC, 27606, USA
- John Corbett
  - Collaborations Pharmaceuticals, Inc., 840 Main Campus Drive, Lab 3510, Raleigh, NC, 27606, USA
- Fabio Urbina
  - Collaborations Pharmaceuticals, Inc., 840 Main Campus Drive, Lab 3510, Raleigh, NC, 27606, USA.
- Sean Ekins
  - Collaborations Pharmaceuticals, Inc., 840 Main Campus Drive, Lab 3510, Raleigh, NC, 27606, USA.
7
Karasev DA, Sobolev BN, Filimonov DA, Lagunin A. Prediction of viral protease inhibitors using proteochemometrics approach. Comput Biol Chem 2024; 110:108061. [PMID: 38574417 DOI: 10.1016/j.compbiolchem.2024.108061] [Received: 12/28/2023] [Revised: 03/21/2024] [Accepted: 03/23/2024] [Indexed: 04/06/2024]
Abstract
Being widely accepted tools in computational drug search, the (Q)SAR methods have limitations related to data incompleteness. The proteochemometrics (PCM) approach expands the applicability area by using description for both protein and ligand structures. The PCM algorithms are urgently required for the development of new antiviral agents. We suggest the PCM method using the TLMNA descriptors, combining the MNA descriptors of ligands and protein sequence N-grams. Our method was validated on the viral chymotrypsin-like proteases and their ligands. We have developed an original protocol allowing us to collect a comprehensive set of 15 protein sequences and more than 9000 ligands from the ChEMBL database. The N-grams were derived from the 3D-based alignment, accurately superposing ligand-binding regions. In testing the ligand set in SAR mode with MNA descriptors, an accuracy above 0.95 was determined that shows the perspective of the antiviral drug search in virtual chemical libraries. The effective PCM models were built with the TLMNA descriptor. The strong validation procedure with pair exclusion simulated the prediction of interactions between the new ligands and new targets, resulting in accuracy estimation up to 0.89. The PCM approach shows slightly lower accuracy caused by more uncertainty compared with SAR, but it overcomes the problem of data incompleteness.
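The general idea of a proteochemometric input, a single feature vector built from both a ligand description and protein sequence N-grams, can be sketched as follows. This is not the TLMNA descriptor: the ligand descriptor strings, the `lig:`/`prot:` prefixes, and both function names are hypothetical, and the N-grams here come from a plain sliding window rather than from the 3D-based alignment the abstract describes.

```python
from collections import Counter
from typing import Dict, Iterable


def sequence_ngrams(sequence: str, n: int = 3) -> Counter:
    """Count overlapping N-grams in a protein sequence using a sliding window."""
    return Counter(sequence[i:i + n] for i in range(len(sequence) - n + 1))


def pcm_features(ligand_descriptors: Iterable[str], protein_sequence: str,
                 n: int = 3) -> Dict[str, int]:
    """Merge ligand descriptors (presence flags) and protein N-gram counts."""
    # Prefixes keep the ligand and protein feature namespaces separate.
    features = {f"lig:{d}": 1 for d in ligand_descriptors}
    for gram, count in sequence_ngrams(protein_sequence, n).items():
        features[f"prot:{gram}"] = count
    return features
```

A pair such as `pcm_features(["d1", "d2"], "SGFRKMAF", n=3)` then yields one sparse dictionary per ligand-protein pair, which any standard classifier that accepts sparse features could consume.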
Affiliation(s)
- Dmitry A Karasev
  - Department of Bioinformatics, Institute of Biomedical Chemistry, Moscow 119121, Russia.
- Boris N Sobolev
  - Department of Bioinformatics, Institute of Biomedical Chemistry, Moscow 119121, Russia
- Dmitry A Filimonov
  - Department of Bioinformatics, Institute of Biomedical Chemistry, Moscow 119121, Russia
- Alexey Lagunin
  - Department of Bioinformatics, Institute of Biomedical Chemistry, Moscow 119121, Russia; Department of Bioinformatics, Pirogov Russian National Research Medical University, Moscow 117997, Russia
8
Heyndrickx W, Mervin L, Morawietz T, Sturm N, Friedrich L, Zalewski A, Pentina A, Humbeck L, Oldenhof M, Niwayama R, Schmidtke P, Fechner N, Simm J, Arany A, Drizard N, Jabal R, Afanasyeva A, Loeb R, Verma S, Harnqvist S, Holmes M, Pejo B, Telenczuk M, Holway N, Dieckmann A, Rieke N, Zumsande F, Clevert DA, Krug M, Luscombe C, Green D, Ertl P, Antal P, Marcus D, Do Huu N, Fuji H, Pickett S, Acs G, Boniface E, Beck B, Sun Y, Gohier A, Rippmann F, Engkvist O, Göller AH, Moreau Y, Galtier MN, Schuffenhauer A, Ceulemans H. MELLODDY: Cross-pharma Federated Learning at Unprecedented Scale Unlocks Benefits in QSAR without Compromising Proprietary Information. J Chem Inf Model 2024; 64:2331-2344. [PMID: 37642660 PMCID: PMC11005050 DOI: 10.1021/acs.jcim.3c00799] [Received: 05/25/2023] [Indexed: 08/31/2023]
Abstract
Federated multipartner machine learning has been touted as an appealing and efficient method to increase the effective training data volume and thereby the predictivity of models, particularly when the generation of training data is resource-intensive. In the landmark MELLODDY project, indeed, each of ten pharmaceutical companies realized aggregated improvements on its own classification or regression models through federated learning. To this end, they leveraged a novel implementation extending multitask learning across partners, on a platform audited for privacy and security. The experiments involved an unprecedented cross-pharma data set of 2.6+ billion confidential experimental activity data points, documenting 21+ million physical small molecules and 40+ thousand assays in on-target and secondary pharmacodynamics and pharmacokinetics. Appropriate complementary metrics were developed to evaluate the predictive performance in the federated setting. In addition to predictive performance increases in labeled space, the results point toward an extended applicability domain in federated learning. Increases in collective training data volume, including by means of auxiliary data resulting from single concentration high-throughput and imaging assays, continued to boost predictive performance, albeit with a saturating return. Markedly higher improvements were observed for the pharmacokinetics and safety panel assay-based task subsets.
Affiliation(s)
- Lewis Mervin
  - AstraZeneca R&D, Biomedical Campus, 1 Francis Crick Ave, Cambridge CB2 0SL, U.K.
- Tobias Morawietz
  - Bayer Pharma AG, Global Drug Discovery, Chemical Research, Computational Chemistry, Aprather Weg 18 a, Wuppertal 42096, Germany
- Noé Sturm
  - Novartis Institutes for BioMedical Research, Novartis Campus, Basel 4002, Switzerland
- Lukas Friedrich
  - Merck KGaA, Global Research & Development, Frankfurter Strasse 250, Darmstadt 64293, Germany
- Adam Zalewski
  - Amgen Research (Munich) GmbH, Staffelseestraße 2, Munich 81477, Germany
- Anastasia Pentina
  - Bayer AG, Machine Learning Research, Research & Development, Pharmaceuticals, Berlin 10117, Germany
- Lina Humbeck
  - BI Medicinal Chemistry Department, Boehringer Ingelheim Pharma GmbH & Co. KG, Birkendorfer Str. 65, Biberach an der Riss 88397, Germany
- Martijn Oldenhof
  - KU Leuven, ESAT-STADIUS, Kasteelpark Arenberg 10, Heverlee 3001, Belgium
- Ritsuya Niwayama
  - Institut de recherches Servier, 125 chemin de ronde Croissy-sur-Seine, Île-de-France 78290, France
- Nikolas Fechner
  - Novartis Institutes for BioMedical Research, Novartis Campus, Basel 4002, Switzerland
- Jaak Simm
  - KU Leuven, ESAT-STADIUS, Kasteelpark Arenberg 10, Heverlee 3001, Belgium
- Adam Arany
  - KU Leuven, ESAT-STADIUS, Kasteelpark Arenberg 10, Heverlee 3001, Belgium
- Rama Jabal
  - Iktos, 65 rue de Prony, Paris 75017, France
- Arina Afanasyeva
  - Modality Informatics Group, Digital Research Solutions, Advanced Informatics & Analytics, Astellas Pharma Inc., 21 Miyukigaoka, Tsukuba-shi, Ibaraki 305-8585, Japan
- Regis Loeb
  - KU Leuven, ESAT-STADIUS, Kasteelpark Arenberg 10, Heverlee 3001, Belgium
- Shlok Verma
  - GlaxoSmithKline, Computational Sciences, Gunnels Wood Road Stevenage, Herts SG1 2NY, U.K.
- Simon Harnqvist
  - GlaxoSmithKline, Computational Sciences, Gunnels Wood Road Stevenage, Herts SG1 2NY, U.K.
- Matthew Holmes
  - GlaxoSmithKline, Computational Sciences, Gunnels Wood Road Stevenage, Herts SG1 2NY, U.K.
- Balazs Pejo
  - Budapest University of Technology and Economics, Department of Networked Systems and Services, Műegyetem rkp. 3, Budapest 1111, Hungary
- Nicholas Holway
  - Novartis Institutes for BioMedical Research, Novartis Campus, Basel 4002, Switzerland
- Arne Dieckmann
  - Bayer AG, API Production, Product Supply, Pharmaceuticals, Ernst-Schering-Straße 14, Bergkamen 59192, Germany
- Nicola Rieke
  - NVIDIA GmbH, Floessergasse 2, Munich 81369, Germany
- Djork-Arné Clevert
  - Bayer AG, Machine Learning Research, Research & Development, Pharmaceuticals, Berlin 10117, Germany
- Michael Krug
  - Merck KGaA, Global Research & Development, Frankfurter Strasse 250, Darmstadt 64293, Germany
- Christopher Luscombe
  - GlaxoSmithKline, Computational Sciences, Gunnels Wood Road Stevenage, Herts SG1 2NY, U.K.
- Darren Green
  - GlaxoSmithKline, Computational Sciences, Gunnels Wood Road Stevenage, Herts SG1 2NY, U.K.
- Peter Ertl
  - Novartis Institutes for BioMedical Research, Novartis Campus, Basel 4002, Switzerland
- Peter Antal
  - Budapest University of Technology and Economics, Department of Measurement and Information Systems, Műegyetem rkp. 3, Budapest 1111, Hungary
- David Marcus
  - GlaxoSmithKline, Computational Sciences, Gunnels Wood Road Stevenage, Herts SG1 2NY, U.K.
- Hideyoshi Fuji
  - Modality Informatics Group, Digital Research Solutions, Advanced Informatics & Analytics, Astellas Pharma Inc., 21 Miyukigaoka, Tsukuba-shi, Ibaraki 305-8585, Japan
- Stephen Pickett
  - GlaxoSmithKline, Computational Sciences, Gunnels Wood Road Stevenage, Herts SG1 2NY, U.K.
- Gergely Acs
  - Budapest University of Technology and Economics, Department of Networked Systems and Services, Műegyetem rkp. 3, Budapest 1111, Hungary
- Eric Boniface
  - Substra Foundation - Labelia Labs, 4 rue Voltaire, Nantes 44000, France
- Bernd Beck
  - BI Medicinal Chemistry Department, Boehringer Ingelheim Pharma GmbH & Co. KG, Birkendorfer Str. 65, Biberach an der Riss 88397, Germany
- Yax Sun
  - Amgen Research, 1 Amgen Center Drive, Thousand Oaks, California 92130, United States
- Arnaud Gohier
  - Institut de recherches Servier, 125 chemin de ronde Croissy-sur-Seine, Île-de-France 78290, France
- Friedrich Rippmann
  - Merck KGaA, Global Research & Development, Frankfurter Strasse 250, Darmstadt 64293, Germany
- Ola Engkvist
  - AstraZeneca, Molecular AI, Discovery Sciences, R&D, Pepparedsleden 1, Mölndal 431 50, Sweden
- Andreas H. Göller
  - Bayer Pharma AG, Global Drug Discovery, Chemical Research, Computational Chemistry, Aprather Weg 18 a, Wuppertal 42096, Germany
- Yves Moreau
  - KU Leuven, ESAT-STADIUS, Kasteelpark Arenberg 10, Heverlee 3001, Belgium
- Ansgar Schuffenhauer
  - Novartis Institutes for BioMedical Research, Novartis Campus, Basel 4002, Switzerland
- Hugo Ceulemans
  - Janssen Pharmaceutica NV, Turnhoutseweg 30, Beerse 2340, Belgium
9
Svensson E, Hoedt PJ, Hochreiter S, Klambauer G. HyperPCM: Robust Task-Conditioned Modeling of Drug-Target Interactions. J Chem Inf Model 2024; 64:2539-2553. [PMID: 38185877 PMCID: PMC11005051 DOI: 10.1021/acs.jcim.3c01417] [Received: 09/04/2023] [Revised: 11/27/2023] [Accepted: 11/27/2023] [Indexed: 01/09/2024]
Abstract
A central problem in drug discovery is to identify the interactions between drug-like compounds and protein targets. Over the past few decades, various quantitative structure-activity relationship (QSAR) and proteo-chemometric (PCM) approaches have been developed to model and predict these interactions. While QSAR approaches solely utilize representations of the drug compound, PCM methods incorporate both representations of the protein target and the drug compound, enabling them to achieve above-chance predictive accuracy on previously unseen protein targets. Both QSAR and PCM approaches have recently been improved by machine learning and deep neural networks, that allow the development of drug-target interaction prediction models from measurement data. However, deep neural networks typically require large amounts of training data and cannot robustly adapt to new tasks, such as predicting interaction for unseen protein targets at inference time. In this work, we propose to use HyperNetworks to efficiently transfer information between tasks during inference and thus to accurately predict drug-target interactions on unseen protein targets. Our HyperPCM method reaches state-of-the-art performance compared to previous methods on multiple well-known benchmarks, including Davis, DUD-E, and a ChEMBL derived data set, and particularly excels at zero-shot inference involving unseen protein targets. Our method, as well as reproducible data preparation, is available at https://github.com/ml-jku/hyper-dti.
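The core idea of a HyperNetwork, one model that outputs the parameters of another, can be shown with a deliberately tiny example. This is a sketch under strong assumptions and not the HyperPCM architecture (see the authors' repository for that): the hypernetwork here is a single untrained linear map, the embeddings are plain lists of floats, and all function names are hypothetical.

```python
import math
import random
from typing import List, Tuple


def make_hypernetwork(protein_dim: int, ligand_dim: int, seed: int = 0) -> List[List[float]]:
    """Random, untrained linear map; one row per generated parameter (weights + bias)."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 0.1) for _ in range(protein_dim)]
            for _ in range(ligand_dim + 1)]


def generate_task_params(hyper: List[List[float]],
                         protein_emb: List[float]) -> Tuple[List[float], float]:
    """Turn a protein embedding into the weights and bias of a ligand scorer."""
    params = [sum(w * x for w, x in zip(row, protein_emb)) for row in hyper]
    return params[:-1], params[-1]


def predict_interaction(hyper: List[List[float]], protein_emb: List[float],
                        ligand_emb: List[float]) -> float:
    """Score a ligand with the protein-specific linear model, squashed to (0, 1)."""
    weights, bias = generate_task_params(hyper, protein_emb)
    logit = sum(w * x for w, x in zip(weights, ligand_emb)) + bias
    return 1.0 / (1.0 + math.exp(-logit))
```

Because the scorer's parameters are computed from the protein embedding at inference time, a protein never seen during training still receives its own task-specific model, which is the property the abstract highlights for zero-shot prediction.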
Affiliation(s)
- Emma Svensson
  - ELLIS Unit Linz & Institute for Machine Learning, Johannes Kepler University, Linz 4040, Austria
  - Molecular AI, Discovery Sciences, R&D, AstraZeneca, Gothenburg, 431 83, Sweden
- Pieter-Jan Hoedt
  - ELLIS Unit Linz & Institute for Machine Learning, Johannes Kepler University, Linz 4040, Austria
- Sepp Hochreiter
  - ELLIS Unit Linz & Institute for Machine Learning, Johannes Kepler University, Linz 4040, Austria
  - Institute of Advanced Research in Artificial Intelligence (IARAI), Vienna 1030, Austria
- Günter Klambauer
  - ELLIS Unit Linz & Institute for Machine Learning, Johannes Kepler University, Linz 4040, Austria
10
Rocha SM, Gustafson DL, Safe S, Tjalkens RB. Comparative safety, pharmacokinetics, and off-target assessment of 1,1-bis(3'-indolyl)-1-(p-chlorophenyl) methane in mouse and dog: implications for therapeutic development. Toxicol Res (Camb) 2024; 13:tfae059. [PMID: 38655145 PMCID: PMC11033559 DOI: 10.1093/toxres/tfae059] [Received: 09/11/2023] [Revised: 03/28/2024] [Accepted: 04/03/2024] [Indexed: 04/26/2024] [Open Access]
Abstract
The modified phytochemical derivative, 1,1-bis(3'-indolyl)-1-(p-chlorophenyl) methane (C-DIM12), has been identified as a potential therapeutic platform based on its capacity to improve disease outcomes in models of neurodegeneration and cancer. However, comprehensive safety studies investigating pathology and off-target binding have not been conducted. To address this, we administered C-DIM12 orogastrically to outbred male CD-1 mice for 7 days (50 mg/kg/day, 200 mg/kg/day, and 300 mg/kg/day) and investigated changes in hematology, clinical chemistry, and whole-body tissue pathology. We also delivered a single dose of C-DIM12 (1 mg/kg, 5 mg/kg, 25 mg/kg, 100 mg/kg, 300 mg/kg, 1,000 mg/kg) orogastrically to male and female beagle dogs and investigated hematology and clinical chemistry, as well as plasma pharmacokinetics over 48-h. Consecutive in-vitro off-target binding through inhibition was performed with 10 μM C-DIM12 against 68 targets in tandem with predictive off-target structural binding capacity. These data show that the highest dose C-DIM12 administered in each species caused modest liver pathology in mouse and dog, whereas lower doses were unremarkable. Off-target screening and predictive modeling of C-DIM12 show inhibition of serine/threonine kinases, calcium signaling, G-protein coupled receptors, extracellular matrix degradation, and vascular and transcriptional regulation pathways. Collectively, these data demonstrate that low doses of C-DIM12 do not induce pathology and are capable of modulating targets relevant to neurodegeneration and cancer.
Affiliation(s)
- Savannah M Rocha
- Department of Environmental and Radiological Health Sciences, Colorado State University, 1680 Campus Delivery Fort Collins, CO 80523, USA
| | - Daniel L Gustafson
- Department of Clinical Sciences, Colorado State University, 1678 Campus Delivery Fort Collins, CO 80523, USA
| | - Stephen Safe
- Department of Veterinary Physiology and Pharmacology, Texas A&M School of Veterinary Medicine & Biomedical Sciences, 4466 TAMU College Station, TX 77843-4466, USA
| | - Ronald B Tjalkens
- Department of Environmental and Radiological Health Sciences, Colorado State University, 1680 Campus Delivery Fort Collins, CO 80523, USA
|
11
|
Jimenes-Vargas K, Pazos A, Munteanu CR, Perez-Castillo Y, Tejera E. Prediction of compound-target interaction using several artificial intelligence algorithms and comparison with a consensus-based strategy. J Cheminform 2024; 16:27. [PMID: 38449058 PMCID: PMC10919000 DOI: 10.1186/s13321-024-00816-1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/20/2023] [Accepted: 02/15/2024] [Indexed: 03/08/2024] Open
Abstract
For understanding a chemical compound's mechanism of action and its side effects, as well as for drug discovery, it is crucial to predict its possible protein targets. This study examines 15 developed target-centric models (TCM) employing different molecular descriptions and machine learning algorithms. They were contrasted with 17 third-party models implemented as web tools (WTCM). In both sets of models, consensus strategies were implemented as a potential improvement over individual predictions. The findings indicate that TCM reach F1-score values greater than 0.8. Comparing both approaches, the best TCM achieves values of 0.75, 0.61, 0.25 and 0.38 for true positive/negative rates (TPR, TNR) and false negative/positive rates (FNR, FPR), outperforming the best WTCM. Moreover, the consensus strategy proves to have the most relevant results in the top 20% of target profiles. The TCM consensus reaches TPR and FNR values of 0.98 and 0, while the WTCM consensus reaches values of 0.75 and 0.24. The implemented computational tool with the TCM and their consensus strategy is available at: https://bioquimio.udla.edu.ec/tidentification01/ . Scientific Contribution: We compare and discuss the performances of 17 public compound-target interaction prediction models and 15 new constructions. We also explore a compound-target interaction prioritization strategy using a consensus approach, and we analyze the challenges involved in interaction modeling.
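The four rates quoted in this abstract come straight from a binary confusion matrix, and TPR + FNR = 1 by construction (as with the 0.75 and 0.25 reported for the best TCM). A minimal sketch follows; the abstract does not say how the consensus is formed, so the majority vote below is an assumption, and the labels and model outputs are made-up toy data rather than values from the study.

```python
from collections import Counter


def rates(y_true, y_pred):
    """Return TPR, TNR, FNR and FPR for binary labels (1 = interacts, 0 = does not).

    Assumes at least one positive and one negative label is present.
    """
    counts = Counter(zip(y_true, y_pred))
    tp, fn = counts[(1, 1)], counts[(1, 0)]
    tn, fp = counts[(0, 0)], counts[(0, 1)]
    positives, negatives = tp + fn, tn + fp
    return {
        "TPR": tp / positives,
        "FNR": fn / positives,
        "TNR": tn / negatives,
        "FPR": fp / negatives,
    }


def consensus(predictions):
    """Majority vote across models (hypothetical rule); ties count as negative."""
    n_models = len(predictions)
    return [1 if 2 * sum(votes) > n_models else 0 for votes in zip(*predictions)]


# Toy data: three hypothetical models scored against eight compound-target pairs.
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
model_a = [1, 1, 1, 0, 0, 0, 1, 1]
model_b = [1, 1, 0, 1, 0, 1, 0, 0]
model_c = [1, 0, 1, 1, 0, 0, 0, 1]

voted = consensus([model_a, model_b, model_c])
single = rates(y_true, model_a)
combined = rates(y_true, voted)
```

In this toy case each model misses one true interaction, but the misses fall on different pairs, so the vote recovers all of them. That is the kind of gain a consensus strategy aims for, although real models often share errors.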
Affiliation(s)
- Karina Jimenes-Vargas
- Bio-Cheminformatics Research Group, Universidad de Las Américas, Quito, 170504, Ecuador.
- Department of Computer Science and Information Technologies, Faculty of Computer Science, Universidade da Coruña, Campus Elviña s/n, 15071, A Coruña, Spain.
| | - Alejandro Pazos
- Department of Computer Science and Information Technologies, Faculty of Computer Science, Universidade da Coruña, Campus Elviña s/n, 15071, A Coruña, Spain
- CITIC-Research Center of Information and Communication Technologies, Universidade da Coruña, 15071, A Coruña, Spain
- Biomedical Research Institute of A Coruña (INIBIC), University Hospital Complex of A Coruña (CHUAC), 15006, A Coruña, Spain
| | - Cristian R Munteanu
- Department of Computer Science and Information Technologies, Faculty of Computer Science, Universidade da Coruña, Campus Elviña s/n, 15071, A Coruña, Spain
- CITIC-Research Center of Information and Communication Technologies, Universidade da Coruña, 15071, A Coruña, Spain
- Biomedical Research Institute of A Coruña (INIBIC), University Hospital Complex of A Coruña (CHUAC), 15006, A Coruña, Spain
| | | | - Eduardo Tejera
- Bio-Cheminformatics Research Group, Universidad de Las Américas, Quito, 170504, Ecuador.
|
12
|
Manelfi C, Tazzari V, Lunghini F, Cerchia C, Fava A, Pedretti A, Stouten PFW, Vistoli G, Beccari AR. "DompeKeys": a set of novel substructure-based descriptors for efficient chemical space mapping, development and structural interpretation of machine learning models, and indexing of large databases. J Cheminform 2024; 16:21. [PMID: 38395961 PMCID: PMC10893756 DOI: 10.1186/s13321-024-00813-4] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/17/2023] [Accepted: 02/10/2024] [Indexed: 02/25/2024] Open
Abstract
The conversion of chemical structures into computer-readable descriptors, able to capture key structural aspects, is of pivotal importance in the field of cheminformatics and computer-aided drug design. Molecular fingerprints represent a widely employed class of descriptors; however, their generation process is time-consuming for large databases and is susceptible to bias. Therefore, descriptors able to accurately detect predefined structural fragments and devoid of lengthy generation procedures would be highly desirable. To meet additional needs, such descriptors should also be interpretable by medicinal chemists, and suitable for indexing databases with trillions of compounds. To this end, we developed, as an integral part of EXSCALATE (Dompé's end-to-end drug discovery platform), the DompeKeys (DK), a new substructure-based descriptor set, which encodes the chemical features that characterize compounds of pharmaceutical interest. DK represent an exhaustive collection of curated SMARTS strings, defining chemical features at different levels of complexity, from specific functional groups and structural patterns to simpler pharmacophoric points, corresponding to a network of hierarchically interconnected substructures. Because of their extended and hierarchical structure, DK can be used, with good performance, in different kinds of applications. In particular, we demonstrate how they are very well suited for effective mapping of chemical space, as well as substructure search and virtual screening. Notably, the incorporation of DK yields highly performing machine learning models for the prediction of both compounds' activity and metabolic reaction occurrence.
The protocol to generate the DK is freely available at https://dompekeys.exscalate.eu and is fully integrated with the Molecular Anatomy protocol for the generation and analysis of hierarchically interconnected molecular scaffolds and frameworks, thus providing a comprehensive and flexible tool for drug design applications.
Affiliation(s)
- Candida Manelfi
- EXSCALATE, Dompé Farmaceutici SpA, Via Tommaso de Amicis 95, 80123, Napoli, Italy
| | - Valerio Tazzari
- EXSCALATE, Dompé Farmaceutici SpA, Via Tommaso de Amicis 95, 80123, Napoli, Italy
| | - Filippo Lunghini
- EXSCALATE, Dompé Farmaceutici SpA, Via Tommaso de Amicis 95, 80123, Napoli, Italy
| | - Carmen Cerchia
- Department of Pharmacy, University of Naples "Federico II", Via D. Montesano 49, 80131, Napoli, Italy
| | - Anna Fava
- EXSCALATE, Dompé Farmaceutici SpA, Via Tommaso de Amicis 95, 80123, Napoli, Italy
| | - Alessandro Pedretti
- Dipartimento di Scienze Farmaceutiche, Università degli Studi di Milano, Via Mangiagalli, 25, 20133, Milano, Italy
| | - Pieter F W Stouten
- EXSCALATE, Dompé Farmaceutici SpA, Via Tommaso de Amicis 95, 80123, Napoli, Italy
- Stouten Pharma Consultancy BV, Kempenarestraat 47, 2860, Sint-Katelijne-Waver, Belgium
| | - Giulio Vistoli
- Dipartimento di Scienze Farmaceutiche, Università degli Studi di Milano, Via Mangiagalli, 25, 20133, Milano, Italy
|
13
|
Guo J. Improving structure-based protein-ligand affinity prediction by graph representation learning and ensemble learning. PLoS One 2024; 19:e0296676. [PMID: 38232063 DOI: 10.1371/journal.pone.0296676] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/06/2023] [Accepted: 12/15/2023] [Indexed: 01/19/2024] Open
Abstract
Predicting protein-ligand binding affinity presents a viable solution for accelerating the discovery of new lead compounds. The recent widespread application of machine learning approaches, especially graph neural networks, has brought new advancements in this field. However, some existing structure-based methods treat protein macromolecules and ligand small molecules in the same way and ignore the data heterogeneity, potentially leading to incomplete exploration of the biochemical information of ligands. In this work, we propose LGN, a graph neural network-based fusion model with extra ligand feature extraction to effectively capture local features and global features within the protein-ligand complex, and make use of interaction fingerprints. By combining the ligand-based features and interaction fingerprints, LGN achieves Pearson correlation coefficients of up to 0.842 on the PDBbind 2016 core set, compared to 0.807 when using the features of complex graphs alone. Finally, we verify the rationalization and generalization of our model through comprehensive experiments. We also compare our model with state-of-the-art baseline methods, which validates the superiority of our model. To reduce the impact of data similarity, we increase the robustness of the model by incorporating ensemble learning.
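The headline figures in this abstract (0.842 versus 0.807) are Pearson correlation coefficients between predicted and measured affinities. As a reminder of what that metric measures, here is a minimal pure-Python sketch; the function name and the toy affinity values are illustrative assumptions, not data from PDBbind or code from the paper.

```python
import math


def pearson_r(predicted, measured):
    """Pearson correlation between two equal-length sequences of numbers.

    Assumes both sequences have non-zero variance.
    """
    n = len(predicted)
    mean_p = sum(predicted) / n
    mean_m = sum(measured) / n
    # Covariance and variances, all left unnormalised because the 1/n factors cancel.
    cov = sum((p - mean_p) * (m - mean_m) for p, m in zip(predicted, measured))
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    var_m = sum((m - mean_m) ** 2 for m in measured)
    return cov / math.sqrt(var_p * var_m)


# Toy affinities (hypothetical, in pK units): predictions track measurements imperfectly.
measured = [4.2, 5.1, 6.3, 7.0, 8.4]
predicted = [4.8, 5.0, 6.9, 6.5, 8.0]
r = pearson_r(predicted, measured)
```

Because the coefficient is invariant to linear rescaling, a model can score well on it while still being systematically offset, which is why affinity benchmarks usually report an error metric such as RMSE alongside it.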
Affiliation(s)
- Jia Guo
- Chongqing Institute of Green and Intelligent Technology, Chinese Academy of Sciences, Beijing, P.R. China
- Chongqing School, University of Chinese Academy of Sciences, Chongqing, China
|
14
|
Zhang R, Xie X, Ni D, Wang H, Li J, Xiao W. MT-EpiPred: Multitask Learning for Prediction of Small-Molecule Epigenetic Modulators. J Chem Inf Model 2024; 64:110-118. [PMID: 38109786 DOI: 10.1021/acs.jcim.3c01368] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 12/20/2023]
Abstract
Epigenetic modulators play an increasingly crucial role in the treatment of various diseases. In this case, it is imperative to systematically investigate the activity of these agents and understand their influence on the entire epigenetic regulatory network rather than solely concentrate on individual targets. This work introduces MT-EpiPred, a multitask learning method capable of predicting the activity of compounds against 78 epigenetic targets. MT-EpiPred demonstrated outstanding performance, boasting an average auROC of 0.915 and the ability to handle few-shot targets. In comparison to the existing method, MT-EpiPred not only expands the target pool but also achieves superior predictive performance with the same data set. MT-EpiPred was then applied to predict the epigenetic target of a newly synthesized compound (1), where the molecular target was unknown. The method identified KDM4D as a potential target, which was subsequently validated through an in vitro enzyme inhibition assay, revealing an IC50 of 4.8 μM. The MT-EpiPred method has been implemented in the web server MT-EpiPred (http://epipred.com), providing free accessibility. In summary, this work presents a convenient and accurate tool for discovering novel small-molecule epigenetic modulators, particularly in the development of selective inhibitors and evaluating the impact of these inhibitors over a broad epigenetic network.
Affiliation(s)
- Ruihan Zhang
- Key Laboratory of Medicinal Chemistry for Natural Resource, Ministry of Education; Yunnan Key Laboratory of Research and Development for Natural Products; The Cloud Computing Engineering Research Center of Yunnan Province; Key Laboratory of Software Engineering of Yunnan Province; School of Software; School of Pharmacy, Yunnan University, Kunming 650500, P. R. China
| | - Xingran Xie
- Key Laboratory of Medicinal Chemistry for Natural Resource, Ministry of Education; Yunnan Key Laboratory of Research and Development for Natural Products; The Cloud Computing Engineering Research Center of Yunnan Province; Key Laboratory of Software Engineering of Yunnan Province; School of Software; School of Pharmacy, Yunnan University, Kunming 650500, P. R. China
| | - Dongxuan Ni
- Key Laboratory of Medicinal Chemistry for Natural Resource, Ministry of Education; Yunnan Key Laboratory of Research and Development for Natural Products; The Cloud Computing Engineering Research Center of Yunnan Province; Key Laboratory of Software Engineering of Yunnan Province; School of Software; School of Pharmacy, Yunnan University, Kunming 650500, P. R. China
| | - Hairong Wang
- Key Laboratory of Medicinal Chemistry for Natural Resource, Ministry of Education; Yunnan Key Laboratory of Research and Development for Natural Products; The Cloud Computing Engineering Research Center of Yunnan Province; Key Laboratory of Software Engineering of Yunnan Province; School of Software; School of Pharmacy, Yunnan University, Kunming 650500, P. R. China
| | - Jin Li
- Key Laboratory of Medicinal Chemistry for Natural Resource, Ministry of Education; Yunnan Key Laboratory of Research and Development for Natural Products; The Cloud Computing Engineering Research Center of Yunnan Province; Key Laboratory of Software Engineering of Yunnan Province; School of Software; School of Pharmacy, Yunnan University, Kunming 650500, P. R. China
| | - Weilie Xiao
- Key Laboratory of Medicinal Chemistry for Natural Resource, Ministry of Education; Yunnan Key Laboratory of Research and Development for Natural Products; The Cloud Computing Engineering Research Center of Yunnan Province; Key Laboratory of Software Engineering of Yunnan Province; School of Software; School of Pharmacy, Yunnan University, Kunming 650500, P. R. China
|
15
|
Zhu Z, Yao Z, Zheng X, Qi G, Li Y, Mazur N, Gao X, Gong Y, Cong B. Drug-target affinity prediction method based on multi-scale information interaction and graph optimization. Comput Biol Med 2023; 167:107621. [PMID: 37907030 DOI: 10.1016/j.compbiomed.2023.107621] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/18/2023] [Revised: 10/16/2023] [Accepted: 10/23/2023] [Indexed: 11/02/2023]
Abstract
Drug-target affinity (DTA) prediction as an emerging and effective method is widely applied to explore the strength of drug-target interactions in drug development research. By predicting these interactions, researchers can assess the potential efficacy and safety of candidate drugs at an early stage, narrowing down the search space for therapeutic targets and accelerating the discovery and development of new drugs. However, existing DTA prediction models mainly use graphical representations of drug molecules, which lack information on interactions between individual substructures, thus affecting prediction accuracy and model interpretability. Therefore, transformer and diffusion on drug graphs in DTA prediction (TDGraphDTA) are introduced to predict drug-target interactions using multi-scale information interaction and graph optimization. An interactive module is integrated into feature extraction of drug and target features at different granularity levels. A diffusion model-based graph optimization module is proposed to improve the representation of molecular graph structures and enhance the interpretability of graph representations while obtaining optimal feature representations. In addition, TDGraphDTA improves the accuracy and reliability of predictions by capturing relationships and contextual information between molecular substructures. The performance of the proposed TDGraphDTA in DTA prediction was verified on three publicly available benchmark datasets (Davis, Metz, and KIBA). Compared with state-of-the-art baseline models, it achieved better results in terms of consistency index, R-squared, etc. Furthermore, compared with some existing methods, the proposed TDGraphDTA is demonstrated to have better structure capturing capabilities by visualizing the feature capturing capabilities of the model using Grad-AAM toxicity labels in the ToxCast dataset. The corresponding source codes are available at https://github.com/Lamouryz/TDGraph.
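The "consistency index" mentioned in this abstract most likely refers to the concordance index (CI) that is commonly reported on the Davis and KIBA benchmarks; that reading is an assumption, since the abstract does not define the term. Under that assumption, a minimal sketch of the metric is shown below: it is the fraction of comparable pairs whose predicted ordering matches the ordering of the true affinities. The function name and toy values are illustrative only.

```python
from itertools import combinations


def concordance_index(y_true, y_pred):
    """Fraction of comparable pairs ranked in the same order by y_pred and y_true.

    Pairs with equal true affinity are skipped; tied predictions score 0.5.
    Assumes at least one pair has differing true affinities.
    """
    concordant = 0.0
    comparable = 0
    for (t_i, p_i), (t_j, p_j) in combinations(zip(y_true, y_pred), 2):
        if t_i == t_j:
            continue  # equal true affinities carry no ordering information
        comparable += 1
        if p_i == p_j:
            concordant += 0.5
        elif (p_i > p_j) == (t_i > t_j):
            concordant += 1.0
    return concordant / comparable


# Toy example: the last two predictions are swapped relative to the truth.
ci = concordance_index([1.0, 2.0, 3.0], [0.1, 0.3, 0.2])
```

A value of 1.0 means every pair is ordered correctly and 0.5 corresponds to random ranking. This quadratic-time loop is fine for illustration but would be slow on full benchmark sets.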
Affiliation(s)
- Zhiqin Zhu
- College of Automation, Chongqing University of Posts and Telecommunications, Chongqing, 400065, China.
| | - Zheng Yao
- College of Automation, Chongqing University of Posts and Telecommunications, Chongqing, 400065, China.
| | - Xin Zheng
- College of Automation, Chongqing University of Posts and Telecommunications, Chongqing, 400065, China.
| | - Guanqiu Qi
- Computer Information Systems Department, State University of New York at Buffalo State, Buffalo, NY 14222, USA.
| | - Yuanyuan Li
- College of Automation, Chongqing University of Posts and Telecommunications, Chongqing, 400065, China.
| | - Neal Mazur
- Computer Information Systems Department, State University of New York at Buffalo State, Buffalo, NY 14222, USA.
| | - Xinbo Gao
- College of Automation, Chongqing University of Posts and Telecommunications, Chongqing, 400065, China.
| | - Yifei Gong
- Faculty of applied science & engineering, the Edward S. Rogers Sr. Department of Electrical & Computer Engineering (ECE), University of Toronto at Toronto, ON M5S, Canada.
| | - Baisen Cong
- Diagnostics Digital, DH(Shanghai) Diagnostics Co, Ltd, a Danaher company, Shanghai, 200335, China.
|
16
|
Mullowney MW, Duncan KR, Elsayed SS, Garg N, van der Hooft JJJ, Martin NI, Meijer D, Terlouw BR, Biermann F, Blin K, Durairaj J, Gorostiola González M, Helfrich EJN, Huber F, Leopold-Messer S, Rajan K, de Rond T, van Santen JA, Sorokina M, Balunas MJ, Beniddir MA, van Bergeijk DA, Carroll LM, Clark CM, Clevert DA, Dejong CA, Du C, Ferrinho S, Grisoni F, Hofstetter A, Jespers W, Kalinina OV, Kautsar SA, Kim H, Leao TF, Masschelein J, Rees ER, Reher R, Reker D, Schwaller P, Segler M, Skinnider MA, Walker AS, Willighagen EL, Zdrazil B, Ziemert N, Goss RJM, Guyomard P, Volkamer A, Gerwick WH, Kim HU, Müller R, van Wezel GP, van Westen GJP, Hirsch AKH, Linington RG, Robinson SL, Medema MH. Artificial intelligence for natural product drug discovery. Nat Rev Drug Discov 2023; 22:895-916. [PMID: 37697042 DOI: 10.1038/s41573-023-00774-7] [Citation(s) in RCA: 33] [Impact Index Per Article: 33.0] [Accepted: 07/20/2023] [Indexed: 09/13/2023]
Abstract
Developments in computational omics technologies have provided new means to access the hidden diversity of natural products, unearthing new potential for drug discovery. In parallel, artificial intelligence approaches such as machine learning have led to exciting developments in the computational drug design field, facilitating biological activity prediction and de novo drug design for molecular targets of interest. Here, we describe current and future synergies between these developments to effectively identify drug candidates from the plethora of molecules produced by nature. We also discuss how to address key challenges in realizing the potential of these synergies, such as the need for high-quality datasets to train deep learning algorithms and appropriate strategies for algorithm validation.
Affiliation(s)
| | - Katherine R Duncan
- Strathclyde Institute of Pharmacy and Biomedical Sciences, University of Strathclyde, Glasgow, UK
| | - Somayah S Elsayed
- Department of Molecular Biotechnology, Institute of Biology, Leiden University, Leiden, The Netherlands
| | - Neha Garg
- School of Chemistry and Biochemistry, Center for Microbial Dynamics and Infection, Georgia Institute of Technology, Atlanta, GA, USA
| | - Justin J J van der Hooft
- Bioinformatics Group, Wageningen University, Wageningen, The Netherlands
- Department of Biochemistry, University of Johannesburg, Johannesburg, South Africa
| | - Nathaniel I Martin
- Biological Chemistry Group, Institute of Biology, Leiden University, Leiden, The Netherlands
| | - David Meijer
- Bioinformatics Group, Wageningen University, Wageningen, The Netherlands
| | - Barbara R Terlouw
- Bioinformatics Group, Wageningen University, Wageningen, The Netherlands
| | - Friederike Biermann
- Bioinformatics Group, Wageningen University, Wageningen, The Netherlands
- Institute of Molecular Bio Science, Goethe-University Frankfurt, Frankfurt am Main, Germany
- LOEWE Center for Translational Biodiversity Genomics (TBG), Frankfurt am Main, Germany
| | - Kai Blin
- The Novo Nordisk Foundation Center for Biosustainability, Technical University of Denmark, Kongens Lyngby, Denmark
| | | | - Marina Gorostiola González
- Drug Discovery and Safety, Leiden Academic Centre for Drug Research, Leiden, The Netherlands
- ONCODE institute, Leiden, The Netherlands
| | - Eric J N Helfrich
- Institute of Molecular Bio Science, Goethe-University Frankfurt, Frankfurt am Main, Germany
- LOEWE Center for Translational Biodiversity Genomics (TBG), Frankfurt am Main, Germany
| | - Florian Huber
- Center for Digitalization and Digitality, Hochschule Düsseldorf, Düsseldorf, Germany
| | - Stefan Leopold-Messer
- Institut für Mikrobiologie, Eidgenössische Technische Hochschule (ETH) Zürich, Zürich, Switzerland
| | - Kohulan Rajan
- Institute for Inorganic and Analytical Chemistry, Friedrich-Schiller-University Jena, Jena, Germany
| | - Tristan de Rond
- School of Chemical Sciences, University of Auckland, Auckland, New Zealand
| | - Jeffrey A van Santen
- Department of Chemistry, Simon Fraser University, Burnaby, British Columbia, Canada
| | - Maria Sorokina
- Institute for Inorganic and Analytical Chemistry, Friedrich-Schiller University, Jena, Germany
- Pharmaceuticals R&D, Bayer AG, Berlin, Germany
| | - Marcy J Balunas
- Department of Microbiology and Immunology, University of Michigan, Ann Arbor, MI, USA
- Department of Medicinal Chemistry, University of Michigan, Ann Arbor, MI, USA
| | - Mehdi A Beniddir
- Équipe "Chimie des Substances Naturelles", Université Paris-Saclay, CNRS, BioCIS, Orsay, France
| | - Doris A van Bergeijk
- Department of Molecular Biotechnology, Institute of Biology, Leiden University, Leiden, The Netherlands
| | - Laura M Carroll
- Structural and Computational Biology Unit, EMBL, Heidelberg, Germany
| | - Chase M Clark
- Division of Pharmaceutical Sciences, School of Pharmacy, University of Wisconsin-Madison, Madison, WI, USA
| | | | | | - Chao Du
- Department of Molecular Biotechnology, Institute of Biology, Leiden University, Leiden, The Netherlands
| | | | - Francesca Grisoni
- Institute for Complex Molecular Systems, Department of Biomedical Engineering, Eindhoven University of Technology, Eindhoven, The Netherlands
- Centre for Living Technologies, Alliance TU/e, WUR, UU, UMC Utrecht, Utrecht, The Netherlands
| | | | - Willem Jespers
- Drug Discovery and Safety, Leiden Academic Centre for Drug Research, Leiden, The Netherlands
| | - Olga V Kalinina
- Helmholtz Institute for Pharmaceutical Research Saarland (HIPS), Helmholtz Centre for Infection Research (HZI), Saarbrücken, Germany
- Drug Bioinformatics, Medical Faculty, Saarland University, Homburg, Germany
- Center for Bioinformatics, Saarland University, Saarbrücken, Germany
| | | | - Hyunwoo Kim
- College of Pharmacy and Integrated Research Institute for Drug Development, Dongguk University Seoul, Goyang-si, Republic of Korea
| | - Tiago F Leao
- Center for Nuclear Energy in Agriculture, University of São Paulo, Piracicaba, Brazil
| | - Joleen Masschelein
- Center for Microbiology, VIB-KU Leuven, Heverlee, Belgium
- Department of Biology, KU Leuven, Heverlee, Belgium
| | - Evan R Rees
- Division of Pharmaceutical Sciences, School of Pharmacy, University of Wisconsin-Madison, Madison, WI, USA
| | - Raphael Reher
- Institute of Pharmaceutical Biology and Biotechnology, University of Marburg, Marburg, Germany
- Institute of Pharmacy, Martin-Luther-University Halle-Wittenberg, Halle (Saale), Germany
| | - Daniel Reker
- Department of Biomedical Engineering, Duke University, Durham, NC, USA
- Duke Microbiome Center, Duke University, Durham, NC, USA
| | - Philippe Schwaller
- Laboratory of Artificial Chemical Intelligence, Institut des Sciences et Ingénierie Chimiques, Ecole Polytechnique Fédérale de Lausanne (EPFL), Lausanne, Switzerland
| | | | - Michael A Skinnider
- Adapsyn Bioscience, Hamilton, Ontario, Canada
- Michael Smith Laboratories, University of British Columbia, Vancouver, British Columbia, Canada
| | - Allison S Walker
- Department of Chemistry, Vanderbilt University, Nashville, TN, USA
- Department of Biological Sciences, Vanderbilt University, Nashville, TN, USA
| | - Egon L Willighagen
- Department of Bioinformatics - BiGCaT, NUTRIM, Maastricht University, Maastricht, The Netherlands
| | - Barbara Zdrazil
- European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Cambridgeshire, UK
| | - Nadine Ziemert
- Interfaculty Institute for Microbiology and Infection Medicine Tuebingen (IMIT), Institute for Bioinformatics and Medical Informatics (IBMI), University of Tuebingen, Tuebingen, Germany
| | | | - Pierre Guyomard
- Bonsai team, CRIStAL - Centre de Recherche en Informatique Signal et Automatique de Lille, Université de Lille, Villeneuve d'Ascq Cedex, France
| | - Andrea Volkamer
- Center for Bioinformatics, Saarland University, Saarbrücken, Germany
- In silico Toxicology and Structural Bioinformatics, Institute of Physiology, Charité - Universitätsmedizin Berlin, Berlin, Germany
| | - William H Gerwick
- Scripps Institution of Oceanography, University of California San Diego, La Jolla, CA, USA
| | - Hyun Uk Kim
- Department of Chemical and Biomolecular Engineering, Korea Advanced Institute of Science and Technology (KAIST), Daejeon, Korea
| | - Rolf Müller
- Helmholtz Institute for Pharmaceutical Research Saarland (HIPS), Helmholtz Centre for Infection Research (HZI), Saarbrücken, Germany
- Department of Pharmacy, Saarland University, Saarbrücken, Germany
- German Center for Infection Research (DZIF), Braunschweig, Germany
- Helmholtz International Lab for Anti-Infectives, Saarbrücken, Germany
| | - Gilles P van Wezel
- Department of Molecular Biotechnology, Institute of Biology, Leiden University, Leiden, The Netherlands
- Netherlands Institute of Ecology, NIOO-KNAW, Wageningen, The Netherlands
| | - Gerard J P van Westen
- Drug Discovery and Safety, Leiden Academic Centre for Drug Research, Leiden, The Netherlands.
| | - Anna K H Hirsch
- Helmholtz Institute for Pharmaceutical Research Saarland (HIPS), Helmholtz Centre for Infection Research (HZI), Saarbrücken, Germany.
- Department of Pharmacy, Saarland University, Saarbrücken, Germany.
- German Center for Infection Research (DZIF), Braunschweig, Germany.
- Helmholtz International Lab for Anti-Infectives, Saarbrücken, Germany.
| | - Roger G Linington
- Department of Chemistry, Simon Fraser University, Burnaby, British Columbia, Canada.
| | - Serina L Robinson
- Department of Environmental Microbiology, Eawag: Swiss Federal Institute for Aquatic Science and Technology, Dübendorf, Switzerland.
| | - Marnix H Medema
- Bioinformatics Group, Wageningen University, Wageningen, The Netherlands.
- Institute of Biology, Leiden University, Leiden, The Netherlands.
|
17
|
Béquignon OM, Gómez-Tamayo JC, Lenselink EB, Wink S, Hiemstra S, Lam CC, Gadaleta D, Roncaglioni A, Norinder U, Water BVD, Pastor M, van Westen GJP. Collaborative SAR Modeling and Prospective In Vitro Validation of Oxidative Stress Activation in Human HepG2 Cells. J Chem Inf Model 2023; 63:5433-5445. [PMID: 37616385 PMCID: PMC10498489 DOI: 10.1021/acs.jcim.3c00220] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/11/2023] [Indexed: 08/26/2023]
Abstract
Oxidative stress is the consequence of an abnormal increase of reactive oxygen species (ROS). ROS are generated mainly during metabolism in both normal and pathological conditions as well as from exposure to xenobiotics. Xenobiotics can, on the one hand, disrupt molecular machinery involved in redox processes and, on the other hand, reduce the effectiveness of the antioxidant activity. Such dysregulation may lead to oxidative damage when combined with oxidative stress exceeding the cell's capacity to detoxify ROS. In this work, a green fluorescent protein (GFP)-tagged nuclear factor erythroid 2-related factor 2 (NRF2)-regulated sulfiredoxin reporter (Srxn1-GFP) was used to measure the antioxidant response of HepG2 cells to a large series of drug and drug-like compounds (2230 compounds). These compounds were then classified as positive or negative depending on cellular response and distributed among different modeling groups to establish structure-activity relationship (SAR) models. A selection of models was used to prospectively predict oxidative stress induced by a new set of compounds, which were subsequently tested experimentally to validate the model predictions. Altogether, this exercise exemplifies the different challenges of developing SAR models of a phenotypic cellular readout, model combination, chemical space selection, and results interpretation.
Affiliation(s)
- Olivier J. M. Béquignon
- Leiden Academic Centre for Drug Research, Leiden University, Wassenaarseweg 76, 2333 AL Leiden, The Netherlands
| | - Jose C. Gómez-Tamayo
- Research Programme on Biomedical Informatics (GRIB), Department of Medicine and Life Sciences, Hospital del Mar Medical Research Institute, Universitat Pompeu Fabra, Carrer del Dr. Aiguader 88, 08002 Barcelona, Spain
| | - Eelke B. Lenselink
- Leiden Academic Centre for Drug Research, Leiden University, Wassenaarseweg 76, 2333 AL Leiden, The Netherlands
| | - Steven Wink
- Leiden Academic Centre for Drug Research, Leiden University, Wassenaarseweg 76, 2333 AL Leiden, The Netherlands
| | - Steven Hiemstra
- Leiden Academic Centre for Drug Research, Leiden University, Wassenaarseweg 76, 2333 AL Leiden, The Netherlands
| | - Chi Chung Lam
- Leiden Academic Centre for Drug Research, Leiden University, Wassenaarseweg 76, 2333 AL Leiden, The Netherlands
| | - Domenico Gadaleta
- Laboratory of Environmental Chemistry and Toxicology, Department of Environmental Health Sciences, IRCCS - Istituto di Ricerche Farmacologiche Mario Negri, Via la Masa 19, 20156 Milano, Italy
| | - Alessandra Roncaglioni
- Laboratory of Environmental Chemistry and Toxicology, Department of Environmental Health Sciences, IRCCS - Istituto di Ricerche Farmacologiche Mario Negri, Via la Masa 19, 20156 Milano, Italy
| | - Ulf Norinder
- MTM Research Centre, School of Science and Technology, Örebro University, SE-70182 Örebro, Sweden
| | - Bob van de Water
- Leiden Academic Centre for Drug Research, Leiden University, Wassenaarseweg 76, 2333 AL Leiden, The Netherlands
| | - Manuel Pastor
- Research Programme on Biomedical Informatics (GRIB), Department of Medicine and Life Sciences, Hospital del Mar Medical Research Institute, Universitat Pompeu Fabra, Carrer del Dr. Aiguader 88, 08002 Barcelona, Spain
| | - Gerard J. P. van Westen
- Leiden Academic Centre for Drug Research, Leiden University, Wassenaarseweg 76, 2333 AL Leiden, The Netherlands
|
18
|
Shi Y, Zhang X, Yang Y, Cai T, Peng C, Wu L, Zhou L, Han J, Ma M, Zhu W, Xu Z. D3CARP: a comprehensive platform with multiple-conformation based docking, ligand similarity search and deep learning approaches for target prediction and virtual screening. Comput Biol Med 2023; 164:107283. [PMID: 37536095 DOI: 10.1016/j.compbiomed.2023.107283] [Citation(s) in RCA: 3] [Impact Index Per Article: 3.0] [Received: 06/08/2023] [Revised: 07/15/2023] [Accepted: 07/28/2023] [Indexed: 08/05/2023]
Abstract
Traditional drug discovery depends on resource- and time-consuming biological experiments, which has driven the development of computational algorithms and tools for drug-target interaction (DTI) prediction. To improve prediction reliability, a comprehensive platform is needed, because some previously reported webservers are small in scale, rely on a single method, or are no longer in service. In this study, we integrated multiple-conformation based docking, 2D/3D ligand similarity search and deep learning approaches to construct a comprehensive webserver, D3CARP, for target prediction and virtual screening. Specifically, 9352 conformations with positive controls of 1970 targets were used for molecular docking, and approximately 2 million target-ligand pairs were used for 2D/3D ligand similarity search and deep learning. In addition, the positive compounds were added as references, and related diseases of therapeutic targets were annotated for further disease-based DTI study. The accuracies of the molecular docking and deep learning approaches were 0.44 and 0.89, respectively, and the average accuracy of the five ligand similarity searches was 0.94. The strengths of D3CARP include support for multiple computational methods, ensemble docking, use of positive controls as references, cross-validation of predicted outcomes, diverse disease types, and broad applicability in drug discovery. D3CARP is freely accessible at https://www.d3pharma.com/D3CARP/index.php.
Affiliation(s)
- Yulong Shi
- State Key Laboratory of Drug Research; Drug Discovery and Design Center, Shanghai Institute of Materia Medica, Chinese Academy of Sciences, Shanghai 201203, China; School of Pharmacy, University of Chinese Academy of Sciences, Beijing 100049, China
| | - Xinben Zhang
- State Key Laboratory of Drug Research; Drug Discovery and Design Center, Shanghai Institute of Materia Medica, Chinese Academy of Sciences, Shanghai 201203, China
| | - Yanqing Yang
- State Key Laboratory of Drug Research; Drug Discovery and Design Center, Shanghai Institute of Materia Medica, Chinese Academy of Sciences, Shanghai 201203, China; School of Pharmacy, University of Chinese Academy of Sciences, Beijing 100049, China
| | - Tingting Cai
- State Key Laboratory of Drug Research; Drug Discovery and Design Center, Shanghai Institute of Materia Medica, Chinese Academy of Sciences, Shanghai 201203, China
| | - Cheng Peng
- State Key Laboratory of Drug Research; Drug Discovery and Design Center, Shanghai Institute of Materia Medica, Chinese Academy of Sciences, Shanghai 201203, China; School of Pharmacy, University of Chinese Academy of Sciences, Beijing 100049, China
| | - Leyun Wu
- State Key Laboratory of Drug Research; Drug Discovery and Design Center, Shanghai Institute of Materia Medica, Chinese Academy of Sciences, Shanghai 201203, China; School of Pharmacy, University of Chinese Academy of Sciences, Beijing 100049, China
| | - Liping Zhou
- State Key Laboratory of Drug Research; Drug Discovery and Design Center, Shanghai Institute of Materia Medica, Chinese Academy of Sciences, Shanghai 201203, China; School of Pharmacy, University of Chinese Academy of Sciences, Beijing 100049, China
| | - Jiaxin Han
- State Key Laboratory of Drug Research; Drug Discovery and Design Center, Shanghai Institute of Materia Medica, Chinese Academy of Sciences, Shanghai 201203, China
| | - Minfei Ma
- State Key Laboratory of Drug Research; Drug Discovery and Design Center, Shanghai Institute of Materia Medica, Chinese Academy of Sciences, Shanghai 201203, China; School of Pharmacy, University of Chinese Academy of Sciences, Beijing 100049, China
| | - Weiliang Zhu
- State Key Laboratory of Drug Research; Drug Discovery and Design Center, Shanghai Institute of Materia Medica, Chinese Academy of Sciences, Shanghai 201203, China; School of Pharmacy, University of Chinese Academy of Sciences, Beijing 100049, China.
| | - Zhijian Xu
- State Key Laboratory of Drug Research; Drug Discovery and Design Center, Shanghai Institute of Materia Medica, Chinese Academy of Sciences, Shanghai 201203, China; School of Pharmacy, University of Chinese Academy of Sciences, Beijing 100049, China.
|
19
|
Kanev GK, Zhang Y, Kooistra AJ, Bender A, Leurs R, Bailey D, Würdinger T, de Graaf C, de Esch IJP, Westerman BA. Predicting the target landscape of kinase inhibitors using 3D convolutional neural networks. PLoS Comput Biol 2023; 19:e1011301. [PMID: 37669273 PMCID: PMC10508635 DOI: 10.1371/journal.pcbi.1011301] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 08/15/2022] [Revised: 09/19/2023] [Accepted: 06/25/2023] [Indexed: 09/07/2023] Open
Abstract
Many therapies in clinical trials are based on single drug-single target relationships. To further extend this concept to multi-target approaches using multi-targeted drugs, we developed a machine learning pipeline to unravel the target landscape of kinase inhibitors. This pipeline, which we call 3D-KINEssence, uses a new type of protein fingerprints (3D FP) based on the structure of kinases generated through a 3D convolutional neural network (3D-CNN). These 3D-CNN kinase fingerprints were matched to molecular Morgan fingerprints to predict the targets of each respective kinase inhibitor based on available bioactivity data. The performance of the pipeline was evaluated on two test sets: a sparse drug-target set where each drug is matched in most cases to a single target and also on a densely-covered drug-target set where each drug is matched to most if not all targets. This latter set is more challenging to train, given its non-exclusive character. Our model's root-mean-square error (RMSE) based on the two datasets was 0.68 and 0.8, respectively. These results indicate that 3D FP can predict the target landscape of kinase inhibitors at around 0.8 log units of bioactivity. Our strategy can be utilized in proteochemometric or chemogenomic workflows by consolidating the target landscape of kinase inhibitors.
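As a reading aid for the error figures quoted in this abstract, the sketch below computes a root-mean-square error and converts an RMSE expressed in log units into a fold error. It assumes the bioactivity values are on a log10 scale (for example pIC50), which the abstract does not state explicitly; the function names and the sample values are hypothetical and are not taken from the paper.

```python
import math


def rmse(predicted, observed):
    """Root-mean-square error between two equal-length sequences."""
    if len(predicted) != len(observed) or not predicted:
        raise ValueError("inputs must be non-empty and of equal length")
    squared = [(p - o) ** 2 for p, o in zip(predicted, observed)]
    return math.sqrt(sum(squared) / len(squared))


def fold_error(rmse_log10):
    """Multiplicative error implied by an RMSE given in log10 units (assumed scale)."""
    return 10 ** rmse_log10


if __name__ == "__main__":
    # Made-up log-scale activities, for illustration only.
    predicted = [6.0, 7.0]
    observed = [6.0, 8.0]
    print(round(rmse(predicted, observed), 3))
    # An RMSE of 0.8 log10 units corresponds to roughly a six-fold error.
    print(round(fold_error(0.8), 1))
```

Under the log10 assumption, an RMSE of about 0.8 means a typical prediction is off by a factor of roughly six in the underlying activity value.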
Affiliation(s)
- Georgi K. Kanev
- Division of Medicinal Chemistry, Amsterdam Institute of Molecular and Life Sciences (AIMMS), Vrije Universiteit Amsterdam, Amsterdam, The Netherlands
- Department of Neurosurgery, Amsterdam University Medical Centers, Cancer Center Amsterdam, Brain Tumor Center Amsterdam, Amsterdam, The Netherlands
| | - Yaran Zhang
- Department of Neurosurgery, Amsterdam University Medical Centers, Cancer Center Amsterdam, Brain Tumor Center Amsterdam, Amsterdam, The Netherlands
| | - Albert J. Kooistra
- Division of Medicinal Chemistry, Amsterdam Institute of Molecular and Life Sciences (AIMMS), Vrije Universiteit Amsterdam, Amsterdam, The Netherlands
- Department of Drug Design and Pharmacology, University of Copenhagen, Copenhagen, Denmark
| | - Andreas Bender
- Centre for Molecular Science Informatics, Department of Chemistry, University of Cambridge, Cambridge, United Kingdom
| | - Rob Leurs
- Division of Medicinal Chemistry, Amsterdam Institute of Molecular and Life Sciences (AIMMS), Vrije Universiteit Amsterdam, Amsterdam, The Netherlands
| | - David Bailey
- The WINDOW consortium, www.window-consortium.org
- IOTA Pharmaceuticals Ltd, St Johns Innovation Centre, Cambridge, United Kingdom
| | - Thomas Würdinger
- Department of Neurosurgery, Amsterdam University Medical Centers, Cancer Center Amsterdam, Brain Tumor Center Amsterdam, Amsterdam, The Netherlands
- The WINDOW consortium, www.window-consortium.org
| | - Chris de Graaf
- Division of Medicinal Chemistry, Amsterdam Institute of Molecular and Life Sciences (AIMMS), Vrije Universiteit Amsterdam, Amsterdam, The Netherlands
| | - Iwan J. P. de Esch
- Division of Medicinal Chemistry, Amsterdam Institute of Molecular and Life Sciences (AIMMS), Vrije Universiteit Amsterdam, Amsterdam, The Netherlands
| | - Bart A. Westerman
- Department of Neurosurgery, Amsterdam University Medical Centers, Cancer Center Amsterdam, Brain Tumor Center Amsterdam, Amsterdam, The Netherlands
- The WINDOW consortium, www.window-consortium.org
|
20
|
Flanary VL, Fisher JL, Wilk EJ, Howton TC, Lasseigne BN. Computational Advancements in Cancer Combination Therapy Prediction. JCO Precis Oncol 2023; 7:e2300261. [PMID: 37824797 DOI: 10.1200/po.23.00261] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/24/2023] [Revised: 07/20/2023] [Accepted: 08/15/2023] [Indexed: 10/14/2023] Open
Abstract
Given the high attrition rate of de novo drug discovery and limited efficacy of single-agent therapies in cancer treatment, combination therapy prediction through in silico drug repurposing has risen as a time- and cost-effective alternative for identifying novel and potentially efficacious therapies for cancer. The purpose of this review is to provide an introduction to computational methods for cancer combination therapy prediction and to summarize recent studies that implement each of these methods. A systematic search of the PubMed database was performed, focusing on studies published within the past 10 years. Our search included reviews and articles of ongoing and retrospective studies. We prioritized articles with findings that suggest considerations for improving combination therapy prediction methods over providing a meta-analysis of all currently available cancer combination therapy prediction methods. Computational methods used for drug combination therapy prediction in cancer research include networks, regression-based machine learning, classifier machine learning models, and deep learning approaches. Each method class has its own advantages and disadvantages, so careful consideration is needed to determine the most suitable class when designing a combination therapy prediction method. Future directions to improve current combination therapy prediction technology include incorporation of disease pathobiology, drug characteristics, patient multiomics data, and drug-drug interactions to determine maximally efficacious and tolerable drug regimens for cancer. As computational methods improve in their capability to integrate patient, drug, and disease data, more comprehensive models can be developed to more accurately predict safe and efficacious combination drug therapies for cancer and other complex diseases.
Affiliation(s)
- Victoria L Flanary
- Department of Cell, Developmental and Integrative Biology, Heersink School of Medicine, The University of Alabama at Birmingham, Birmingham, AL
| | - Jennifer L Fisher
- Department of Cell, Developmental and Integrative Biology, Heersink School of Medicine, The University of Alabama at Birmingham, Birmingham, AL
| | - Elizabeth J Wilk
- Department of Cell, Developmental and Integrative Biology, Heersink School of Medicine, The University of Alabama at Birmingham, Birmingham, AL
| | - Timothy C Howton
- Department of Cell, Developmental and Integrative Biology, Heersink School of Medicine, The University of Alabama at Birmingham, Birmingham, AL
| | - Brittany N Lasseigne
- Department of Cell, Developmental and Integrative Biology, Heersink School of Medicine, The University of Alabama at Birmingham, Birmingham, AL
|
21
|
Gorostiola González M, van den Broek RL, Braun TGM, Chatzopoulou M, Jespers W, IJzerman AP, Heitman LH, van Westen GJP. 3DDPDs: describing protein dynamics for proteochemometric bioactivity prediction. A case for (mutant) G protein-coupled receptors. J Cheminform 2023; 15:74. [PMID: 37641107 PMCID: PMC10463931 DOI: 10.1186/s13321-023-00745-5] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/11/2023] [Accepted: 08/10/2023] [Indexed: 08/31/2023] Open
Abstract
Proteochemometric (PCM) modelling is a powerful computational drug discovery tool used in bioactivity prediction of potential drug candidates relying on both chemical and protein information. In PCM features are computed to describe small molecules and proteins, which directly impact the quality of the predictive models. State-of-the-art protein descriptors, however, are calculated from the protein sequence and neglect the dynamic nature of proteins. This dynamic nature can be computationally simulated with molecular dynamics (MD). Here, novel 3D dynamic protein descriptors (3DDPDs) were designed to be applied in bioactivity prediction tasks with PCM models. As a test case, publicly available G protein-coupled receptor (GPCR) MD data from GPCRmd was used. GPCRs are membrane-bound proteins, which are activated by hormones and neurotransmitters, and constitute an important target family for drug discovery. GPCRs exist in different conformational states that allow the transmission of diverse signals and that can be modified by ligand interactions, among other factors. To translate the MD-encoded protein dynamics two types of 3DDPDs were considered: one-hot encoded residue-specific (rs) and embedding-like protein-specific (ps) 3DDPDs. The descriptors were developed by calculating distributions of trajectory coordinates and partial charges, applying dimensionality reduction, and subsequently condensing them into vectors per residue or protein, respectively. 3DDPDs were benchmarked on several PCM tasks against state-of-the-art non-dynamic protein descriptors. Our rs- and ps3DDPDs outperformed non-dynamic descriptors in regression tasks using a temporal split and showed comparable performance with a random split and in all classification tasks. Combinations of non-dynamic descriptors with 3DDPDs did not result in increased performance. Finally, the power of 3DDPDs to capture dynamic fluctuations in mutant GPCRs was explored. 
The results presented here show the potential of including protein dynamic information on machine learning tasks, specifically bioactivity prediction, and open opportunities for applications in drug discovery, including oncology.
Affiliation(s)
- Marina Gorostiola González
- Division of Drug Discovery and Safety, Leiden Academic Centre for Drug Research, Leiden University, Leiden, The Netherlands
- ONCODE Institute, Leiden, The Netherlands
| | - Remco L van den Broek
- Division of Drug Discovery and Safety, Leiden Academic Centre for Drug Research, Leiden University, Leiden, The Netherlands
| | - Thomas G M Braun
- Division of Drug Discovery and Safety, Leiden Academic Centre for Drug Research, Leiden University, Leiden, The Netherlands
| | - Magdalini Chatzopoulou
- Division of Drug Discovery and Safety, Leiden Academic Centre for Drug Research, Leiden University, Leiden, The Netherlands
| | - Willem Jespers
- Division of Drug Discovery and Safety, Leiden Academic Centre for Drug Research, Leiden University, Leiden, The Netherlands
| | - Adriaan P IJzerman
- Division of Drug Discovery and Safety, Leiden Academic Centre for Drug Research, Leiden University, Leiden, The Netherlands
| | - Laura H Heitman
- Division of Drug Discovery and Safety, Leiden Academic Centre for Drug Research, Leiden University, Leiden, The Netherlands
- ONCODE Institute, Leiden, The Netherlands
| | - Gerard J P van Westen
- Division of Drug Discovery and Safety, Leiden Academic Centre for Drug Research, Leiden University, Leiden, The Netherlands.
|
22
|
Chen L, Fan Z, Chang J, Yang R, Hou H, Guo H, Zhang Y, Yang T, Zhou C, Sui Q, Chen Z, Zheng C, Hao X, Zhang K, Cui R, Zhang Z, Ma H, Ding Y, Zhang N, Lu X, Luo X, Jiang H, Zhang S, Zheng M. Sequence-based drug design as a concept in computational drug design. Nat Commun 2023; 14:4217. [PMID: 37452028 PMCID: PMC10349078 DOI: 10.1038/s41467-023-39856-w] [Citation(s) in RCA: 15] [Impact Index Per Article: 15.0] [Received: 07/18/2022] [Accepted: 06/27/2023] [Indexed: 07/18/2023] Open
Abstract
Drug development based on target proteins has been a successful approach in recent decades. However, the conventional structure-based drug design (SBDD) pipeline is a complex, human-engineered process with multiple independently optimized steps. Here, we propose a sequence-to-drug concept for computational drug design based on protein sequence information by end-to-end differentiable learning. We validate this concept in three stages. First, we design TransformerCPI2.0 as a core tool for the concept, which demonstrates generalization ability across proteins and compounds. Second, we interpret the binding knowledge that TransformerCPI2.0 learned. Finally, we use TransformerCPI2.0 to discover new hits for challenging drug targets, and identify a new target for an existing drug based on an inverse application of the concept. Overall, this proof-of-concept study shows that the sequence-to-drug concept adds a perspective on drug design. It can serve as an alternative method to SBDD, particularly for proteins that do not yet have high-quality 3D structures available.
Affiliation(s)
- Lifan Chen
- Drug Discovery and Design Center, State Key Laboratory of Drug Research, Shanghai Institute of Materia Medica, Chinese Academy of Sciences, 555 Zuchongzhi Road, Shanghai, 201203, China
- University of Chinese Academy of Sciences, No. 19A Yuquan Road, Beijing, 100049, China
| | - Zisheng Fan
- Drug Discovery and Design Center, State Key Laboratory of Drug Research, Shanghai Institute of Materia Medica, Chinese Academy of Sciences, 555 Zuchongzhi Road, Shanghai, 201203, China
- School of Chinese Materia Medica, Nanjing University of Chinese Medicine, 138 Xianlin Road, Jiangsu, Nanjing, 210023, China
- Shanghai Institute for Advanced Immunochemical Studies and School of Life Science and Technology, ShanghaiTech University, No. 393 Huaxia Middle Road, Shanghai, 200031, China
| | - Jie Chang
- Drug Discovery and Design Center, State Key Laboratory of Drug Research, Shanghai Institute of Materia Medica, Chinese Academy of Sciences, 555 Zuchongzhi Road, Shanghai, 201203, China
- School of Chinese Materia Medica, Nanjing University of Chinese Medicine, 138 Xianlin Road, Jiangsu, Nanjing, 210023, China
| | - Ruirui Yang
- Drug Discovery and Design Center, State Key Laboratory of Drug Research, Shanghai Institute of Materia Medica, Chinese Academy of Sciences, 555 Zuchongzhi Road, Shanghai, 201203, China
- University of Chinese Academy of Sciences, No. 19A Yuquan Road, Beijing, 100049, China
- Shanghai Institute for Advanced Immunochemical Studies and School of Life Science and Technology, ShanghaiTech University, No. 393 Huaxia Middle Road, Shanghai, 200031, China
| | - Hui Hou
- Drug Discovery and Design Center, State Key Laboratory of Drug Research, Shanghai Institute of Materia Medica, Chinese Academy of Sciences, 555 Zuchongzhi Road, Shanghai, 201203, China
| | - Hao Guo
- Drug Discovery and Design Center, State Key Laboratory of Drug Research, Shanghai Institute of Materia Medica, Chinese Academy of Sciences, 555 Zuchongzhi Road, Shanghai, 201203, China
| | - Yinghui Zhang
- Drug Discovery and Design Center, State Key Laboratory of Drug Research, Shanghai Institute of Materia Medica, Chinese Academy of Sciences, 555 Zuchongzhi Road, Shanghai, 201203, China
- University of Chinese Academy of Sciences, No. 19A Yuquan Road, Beijing, 100049, China
| | - Tianbiao Yang
- Drug Discovery and Design Center, State Key Laboratory of Drug Research, Shanghai Institute of Materia Medica, Chinese Academy of Sciences, 555 Zuchongzhi Road, Shanghai, 201203, China
- University of Chinese Academy of Sciences, No. 19A Yuquan Road, Beijing, 100049, China
| | - Chenmao Zhou
- Drug Discovery and Design Center, State Key Laboratory of Drug Research, Shanghai Institute of Materia Medica, Chinese Academy of Sciences, 555 Zuchongzhi Road, Shanghai, 201203, China
- School of Chinese Materia Medica, Nanjing University of Chinese Medicine, 138 Xianlin Road, Jiangsu, Nanjing, 210023, China
| | - Qibang Sui
- Drug Discovery and Design Center, State Key Laboratory of Drug Research, Shanghai Institute of Materia Medica, Chinese Academy of Sciences, 555 Zuchongzhi Road, Shanghai, 201203, China
- University of Chinese Academy of Sciences, No. 19A Yuquan Road, Beijing, 100049, China
| | - Zhengyang Chen
- Drug Discovery and Design Center, State Key Laboratory of Drug Research, Shanghai Institute of Materia Medica, Chinese Academy of Sciences, 555 Zuchongzhi Road, Shanghai, 201203, China
- University of Chinese Academy of Sciences, No. 19A Yuquan Road, Beijing, 100049, China
| | - Chen Zheng
- Drug Discovery and Design Center, State Key Laboratory of Drug Research, Shanghai Institute of Materia Medica, Chinese Academy of Sciences, 555 Zuchongzhi Road, Shanghai, 201203, China
| | - Xinyue Hao
- Drug Discovery and Design Center, State Key Laboratory of Drug Research, Shanghai Institute of Materia Medica, Chinese Academy of Sciences, 555 Zuchongzhi Road, Shanghai, 201203, China
- School of Chinese Materia Medica, Nanjing University of Chinese Medicine, 138 Xianlin Road, Jiangsu, Nanjing, 210023, China
| | - Keke Zhang
- Drug Discovery and Design Center, State Key Laboratory of Drug Research, Shanghai Institute of Materia Medica, Chinese Academy of Sciences, 555 Zuchongzhi Road, Shanghai, 201203, China
- School of Chinese Materia Medica, Nanjing University of Chinese Medicine, 138 Xianlin Road, Jiangsu, Nanjing, 210023, China
| | - Rongrong Cui
- Drug Discovery and Design Center, State Key Laboratory of Drug Research, Shanghai Institute of Materia Medica, Chinese Academy of Sciences, 555 Zuchongzhi Road, Shanghai, 201203, China
| | - Zehong Zhang
- Drug Discovery and Design Center, State Key Laboratory of Drug Research, Shanghai Institute of Materia Medica, Chinese Academy of Sciences, 555 Zuchongzhi Road, Shanghai, 201203, China
- University of Chinese Academy of Sciences, No. 19A Yuquan Road, Beijing, 100049, China
| | - Hudson Ma
- Drug Discovery and Design Center, State Key Laboratory of Drug Research, Shanghai Institute of Materia Medica, Chinese Academy of Sciences, 555 Zuchongzhi Road, Shanghai, 201203, China
| | - Yiluan Ding
- Department of Analytical Chemistry, State Key Laboratory of Drug Research, Shanghai Institute of Materia Medica, Chinese Academy of Sciences, 555 Zuchongzhi Road, Shanghai, 201203, China
| | - Naixia Zhang
- Department of Analytical Chemistry, State Key Laboratory of Drug Research, Shanghai Institute of Materia Medica, Chinese Academy of Sciences, 555 Zuchongzhi Road, Shanghai, 201203, China
| | - Xiaojie Lu
- Drug Discovery and Design Center, State Key Laboratory of Drug Research, Shanghai Institute of Materia Medica, Chinese Academy of Sciences, 555 Zuchongzhi Road, Shanghai, 201203, China
- University of Chinese Academy of Sciences, No. 19A Yuquan Road, Beijing, 100049, China
| | - Xiaomin Luo
- Drug Discovery and Design Center, State Key Laboratory of Drug Research, Shanghai Institute of Materia Medica, Chinese Academy of Sciences, 555 Zuchongzhi Road, Shanghai, 201203, China
- University of Chinese Academy of Sciences, No. 19A Yuquan Road, Beijing, 100049, China
| | - Hualiang Jiang
- Drug Discovery and Design Center, State Key Laboratory of Drug Research, Shanghai Institute of Materia Medica, Chinese Academy of Sciences, 555 Zuchongzhi Road, Shanghai, 201203, China
- University of Chinese Academy of Sciences, No. 19A Yuquan Road, Beijing, 100049, China
- School of Chinese Materia Medica, Nanjing University of Chinese Medicine, 138 Xianlin Road, Jiangsu, Nanjing, 210023, China
- Shanghai Institute for Advanced Immunochemical Studies and School of Life Science and Technology, ShanghaiTech University, No. 393 Huaxia Middle Road, Shanghai, 200031, China
- School of Pharmaceutical Science and Technology, Hangzhou Institute for Advanced Study, University of Chinese Academy of Sciences, 1 Sub-lane Xiangshan, Hangzhou, 310024, China
| | - Sulin Zhang
- Drug Discovery and Design Center, State Key Laboratory of Drug Research, Shanghai Institute of Materia Medica, Chinese Academy of Sciences, 555 Zuchongzhi Road, Shanghai, 201203, China.
- University of Chinese Academy of Sciences, No. 19A Yuquan Road, Beijing, 100049, China.
| | - Mingyue Zheng
- Drug Discovery and Design Center, State Key Laboratory of Drug Research, Shanghai Institute of Materia Medica, Chinese Academy of Sciences, 555 Zuchongzhi Road, Shanghai, 201203, China.
- University of Chinese Academy of Sciences, No. 19A Yuquan Road, Beijing, 100049, China.
- School of Chinese Materia Medica, Nanjing University of Chinese Medicine, 138 Xianlin Road, Jiangsu, Nanjing, 210023, China.
- Shanghai Institute for Advanced Immunochemical Studies and School of Life Science and Technology, ShanghaiTech University, No. 393 Huaxia Middle Road, Shanghai, 200031, China.
- School of Pharmaceutical Science and Technology, Hangzhou Institute for Advanced Study, University of Chinese Academy of Sciences, 1 Sub-lane Xiangshan, Hangzhou, 310024, China.
|
23
|
Valls-Margarit J, Piñero J, Füzi B, Cerisier N, Taboureau O, Furlong LI. Assessing network-based methods in the context of system toxicology. Front Pharmacol 2023; 14:1225697. [PMID: 37502213 PMCID: PMC10369070 DOI: 10.3389/fphar.2023.1225697] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/19/2023] [Accepted: 06/30/2023] [Indexed: 07/29/2023] Open
Abstract
Introduction: Network-based methods are promising approaches in systems toxicology because they can be used to predict the effects of drugs and chemicals on health, to elucidate the mode of action of compounds, and to identify biomarkers of toxicity. Over the years, the network biology community has developed a wide range of methods, and users are faced with the task of choosing the most appropriate method for their own application. Furthermore, the advantages and limitations of each method are difficult to determine without a proper standard and a comparative evaluation of their performance. This study aims to evaluate different network-based methods that can be used to gain biological insight into the mechanisms of drug toxicity, using valproic acid (VPA)-induced liver steatosis as a benchmark. Methods: We provide a comprehensive analysis of the results produced by each method and highlight that the experimental design (how the method is applied) matters in addition to the method specifications. We also contribute a systematic methodology to analyse the results of the methods individually and in a comparative manner. Results: Our results show that the evaluated tools differ in their performance against the benchmark and in their ability to provide novel insights into the mechanism of adverse effects of the drug. We also suggest that aggregating the results provided by different methods yields a more confident set of candidate genes and processes to further the knowledge of the drug's mechanism of action. Discussion: By providing a detailed and systematic analysis of the results of different network-based tools, we aim to assist users in making informed decisions about the most appropriate method for systems toxicology applications.
Affiliation(s)
| | - Janet Piñero
- Medbioinformatics Solutions SL, Barcelona, Spain
| | - Barbara Füzi
- Department of Pharmaceutical Sciences, Faculty of Life Sciences, University of Vienna, Vienna, Austria
| | - Natacha Cerisier
- Université Paris Cité, CNRS, INSERM U1133, Unité de Biologie Fonctionnelle et Adaptative, Paris, France
| | - Olivier Taboureau
- Université Paris Cité, CNRS, INSERM U1133, Unité de Biologie Fonctionnelle et Adaptative, Paris, France
|
24
|
Lunghini F, Fava A, Pisapia V, Sacco F, Iaconis D, Beccari AR. ProfhEX: AI-based platform for small molecules liability profiling. J Cheminform 2023; 15:60. [PMID: 37296454 DOI: 10.1186/s13321-023-00728-6] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/16/2022] [Accepted: 05/28/2023] [Indexed: 06/12/2023] Open
Abstract
Off-target drug interactions are a major reason for candidate failure in the drug discovery process. Anticipating a drug's potential adverse effects in the early stages is necessary to minimize health risks to patients, animal testing, and economic costs. With the constantly increasing size of virtual screening libraries, AI-driven methods can be exploited as first-tier screening tools to provide liability estimation for drug candidates. In this work we present ProfhEX, an AI-driven suite of 46 OECD-compliant machine learning models that can profile small molecules on 7 relevant liability groups: cardiovascular, central nervous system, gastrointestinal, endocrine, renal, pulmonary and immune system toxicities. Experimental affinity data were collected from public and commercial data sources. The entire chemical space comprised 289,202 activity data points for a total of 210,116 unique compounds, spanning 46 targets with dataset sizes ranging from 819 to 18,896. Gradient boosting and random forest algorithms were initially employed and ensembled for the selection of a champion model. Models were validated according to the OECD principles, including robust internal (cross validation, bootstrap, y-scrambling) and external validation. Champion models achieved an average Pearson correlation coefficient of 0.84 (SD of 0.05), an R² determination coefficient of 0.68 (SD of 0.1) and a root mean squared error of 0.69 (SD of 0.08). All liability groups showed good hit-detection power with an average enrichment factor at 5% of 13.1 (SD of 4.5) and AUC of 0.92 (SD of 0.05). Benchmarking against existing tools demonstrated the predictive power of ProfhEX models for large-scale liability profiling. This platform will be further expanded with the inclusion of new targets and through complementary modelling approaches, such as structure and pharmacophore-based models. ProfhEX is freely accessible at the following address: https://profhex.exscalate.eu/.
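The "enrichment factor at 5%" reported in this abstract can be illustrated with the commonly used definition: the hit rate among the top-scoring 5% of compounds divided by the hit rate in the whole set. The sketch below is a minimal illustration under that assumption; the abstract does not give the exact formula used by the authors, and the function name and toy data are hypothetical.

```python
def enrichment_factor(scores, labels, fraction=0.05):
    """Hit rate in the top-ranked fraction divided by the overall hit rate.

    scores: predicted scores, higher means more likely active.
    labels: 1 for an active compound, 0 for an inactive one.
    """
    if len(scores) != len(labels) or not scores:
        raise ValueError("inputs must be non-empty and of equal length")
    n_total = len(scores)
    n_top = max(1, int(round(n_total * fraction)))
    # Rank compounds from highest to lowest score.
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    hits_top = sum(label for _, label in ranked[:n_top])
    hits_total = sum(labels)
    if hits_total == 0:
        raise ValueError("at least one active compound is required")
    return (hits_top / n_top) / (hits_total / n_total)


if __name__ == "__main__":
    # Toy set: 100 compounds, the 10 highest-scoring ones are active.
    scores = list(range(100))
    labels = [1 if s >= 90 else 0 for s in scores]
    print(enrichment_factor(scores, labels, fraction=0.05))
```

With this definition, a value of 1 means the ranking is no better than random selection, and the maximum attainable value is bounded by the inverse of the overall fraction of actives.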
Affiliation(s)
- Filippo Lunghini
- EXSCALATE, Dompé Farmaceutici SpA, Via Tommaso de Amicis 95, 80123, Naples, Italy
| | - Anna Fava
- EXSCALATE, Dompé Farmaceutici SpA, Via Tommaso de Amicis 95, 80123, Naples, Italy
| | - Vincenzo Pisapia
- Professional Service Department, SAS Institute, Via Darwin 20/22, 20143, Milan, Italy
| | - Francesco Sacco
- Professional Service Department, SAS Institute, Via Darwin 20/22, 20143, Milan, Italy
| | - Daniela Iaconis
- EXSCALATE, Dompé Farmaceutici SpA, Via Tommaso de Amicis 95, 80123, Naples, Italy
|
25
|
Luukkonen S, Meijer E, Tricarico GA, Hofmans J, Stouten PFW, van Westen GJP, Lenselink EB. Large-Scale Modeling of Sparse Protein Kinase Activity Data. J Chem Inf Model 2023. [PMID: 37294674 DOI: 10.1021/acs.jcim.3c00132] [Citation(s) in RCA: 4] [Impact Index Per Article: 4.0] [Indexed: 06/11/2023]
Abstract
Protein kinases are a protein family that plays an important role in several complex diseases such as cancer and cardiovascular and immunological diseases. Protein kinases have conserved ATP binding sites, which when targeted can lead to similar activities of inhibitors against different kinases. This can be exploited to create multitarget drugs. On the other hand, selectivity (lack of similar activities) is desirable in order to avoid toxicity issues. There is a vast amount of protein kinase activity data in the public domain, which can be used in many different ways. Multitask machine learning models are expected to excel for these kinds of data sets because they can learn from implicit correlations between tasks (in this case activities against a variety of kinases). However, multitask modeling of sparse data poses two major challenges: (i) creating a balanced train-test split without data leakage and (ii) handling missing data. In this work, we construct a protein kinase benchmark set composed of two balanced splits without data leakage, using random and dissimilarity-driven cluster-based mechanisms, respectively. This data set can be used for benchmarking and developing protein kinase activity prediction models. Overall, the performance on the dissimilarity-driven cluster-based split is lower than on random split-based sets for all models, indicating poor generalizability of models. Nevertheless, we show that multitask deep learning models, on this very sparse data set, outperform single-task deep learning and tree-based models. Finally, we demonstrate that data imputation does not improve the performance of (multitask) models on this benchmark set.
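One common way to train or evaluate a multitask model on a sparse activity matrix without imputing values is to mask missing entries so that only measured compound-kinase pairs contribute to the loss. The sketch below shows that idea with a masked mean squared error. It is an illustrative assumption and not the method described in the paper; the function name and the toy matrix are hypothetical.

```python
import math


def masked_mse(predictions, targets):
    """Mean squared error over measured entries only.

    predictions: rows of predicted activities, one column per task (kinase).
    targets: same shape; missing measurements are given as None or NaN.
    """
    total = 0.0
    count = 0
    for pred_row, target_row in zip(predictions, targets):
        for pred, target in zip(pred_row, target_row):
            # Skip compound-task pairs that were never measured.
            if target is None or math.isnan(target):
                continue
            total += (pred - target) ** 2
            count += 1
    if count == 0:
        raise ValueError("no measured entries to evaluate")
    return total / count


if __name__ == "__main__":
    # Two compounds, two tasks; half of the matrix is unmeasured.
    predictions = [[1.0, 2.0], [3.0, 4.0]]
    targets = [[1.0, None], [5.0, float("nan")]]
    print(masked_mse(predictions, targets))
```

Because unmeasured entries are skipped rather than filled in, the error reflects only experimentally observed activities.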
Affiliation(s)
- Sohvi Luukkonen
  - Leiden Academic Centre of Drug Research, Leiden University, Einsteinweg 55, 2333 CC Leiden, The Netherlands
- Erik Meijer
  - Leiden Academic Centre of Drug Research, Leiden University, Einsteinweg 55, 2333 CC Leiden, The Netherlands
- Johan Hofmans
  - Galapagos NV, Generaal De Wittelaan L11 A3, 2800 Mechelen, Belgium
- Pieter F W Stouten
  - Leiden Academic Centre of Drug Research, Leiden University, Einsteinweg 55, 2333 CC Leiden, The Netherlands
  - Galapagos NV, Generaal De Wittelaan L11 A3, 2800 Mechelen, Belgium
  - Stouten Pharma Consultancy BV, Kempenarestraat 47, 2860 Sint-Katelijne-Waver, Belgium
- Gerard J P van Westen
  - Leiden Academic Centre of Drug Research, Leiden University, Einsteinweg 55, 2333 CC Leiden, The Netherlands
|
26
|
Srivathsa AV, Sadashivappa NM, Hegde AK, Radha S, Mahesh AR, Ammunje DN, Sen D, Theivendren P, Govindaraj S, Kunjiappan S, Pavadai P. A Review on Artificial Intelligence Approaches and Rational Approaches in Drug Discovery. Curr Pharm Des 2023; 29:1180-1192. [PMID: 37132148 DOI: 10.2174/1381612829666230428110542] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/30/2022] [Revised: 02/06/2023] [Accepted: 02/27/2023] [Indexed: 05/04/2023]
Abstract
Artificial intelligence (AI) speeds up the drug development process and reduces its time and cost, which is of enormous importance in outbreaks such as COVID-19. It uses a set of machine learning algorithms that collect the available data from resources, then categorise and process them to develop novel learning methodologies. Virtual screening is a successful application of AI, which is used to screen huge drug-like databases and filter them down to a small number of compounds. At the core of AI is neural networking, which uses techniques such as the Convolutional Neural Network (CNN), Recursive Neural Network (RNN) or Generative Adversarial Neural Network (GANN). The applications range from small molecule drug discovery to the development of vaccines. In the present review article, we discuss various techniques of drug design (structure- and ligand-based) and pharmacokinetics and toxicity prediction using AI. A rapid pace of discovery is the need of the hour, and AI is a targeted approach to achieve this.
Affiliation(s)
- Anjana Vidya Srivathsa
  - Department of Pharmaceutical Chemistry, Faculty of Pharmacy, M.S. Ramaiah University of Applied Sciences, M.S.R. Nagar, Bengaluru, 560054, India
- Nandini Markuli Sadashivappa
  - Department of Pharmaceutical Chemistry, Faculty of Pharmacy, M.S. Ramaiah University of Applied Sciences, M.S.R. Nagar, Bengaluru, 560054, India
- Apeksha Krishnamurthy Hegde
  - Department of Pharmaceutical Chemistry, Faculty of Pharmacy, M.S. Ramaiah University of Applied Sciences, M.S.R. Nagar, Bengaluru, 560054, India
- Srimathi Radha
  - Department of Pharmaceutical Chemistry, SRM College of Pharmacy, Faculty of Medicine and Health Sciences, SRM Institute of Science and Technology, Chengalpattu District, Kattankulathur, Tamil Nadu, 603203, India
- Agasa Ramu Mahesh
  - Department of Pharmaceutical Chemistry, Faculty of Pharmacy, M.S. Ramaiah University of Applied Sciences, M.S.R. Nagar, Bengaluru, 560054, India
- Damodar Nayak Ammunje
  - Department of Pharmacology, Faculty of Pharmacy, M.S. Ramaiah University of Applied Sciences, M.S.R. Nagar, Bengaluru, 560054, India
- Debanjan Sen
  - Department of Pharmaceutical Chemistry, BCDA College of Pharmacy & Technology, Hridaypur, Kolkata, 700127, West Bengal, India
- Panneerselvam Theivendren
  - Department of Pharmaceutical Chemistry, Swamy Vivekanandha College of Pharmacy, Elayampalayam, Tiruchengode, 637205, India
- Saravanan Govindaraj
  - Department of Pharmaceutical Chemistry, MNR College of Pharmacy, Fasalwadi, Sangareddy, 502 001, India
- Selvaraj Kunjiappan
  - Department of Biotechnology, Kalasalingam Academy of Research and Education, Krishnankoil, 626126, India
- Parasuraman Pavadai
  - Department of Pharmaceutical Chemistry, Faculty of Pharmacy, M.S. Ramaiah University of Applied Sciences, M.S.R. Nagar, Bengaluru, 560054, India
|
27
|
Amiri Souri E, Chenoweth A, Karagiannis SN, Tsoka S. Drug repurposing and prediction of multiple interaction types via graph embedding. BMC Bioinformatics 2023; 24:202. [PMID: 37193964 DOI: 10.1186/s12859-023-05317-w] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/22/2023] [Accepted: 04/30/2023] [Indexed: 05/18/2023] Open
Abstract
BACKGROUND: Finding drugs that can interact with a specific target to induce a desired therapeutic outcome is a key deliverable in drug discovery for targeted treatment. Therefore, both identifying new drug-target links and delineating the type of drug interaction are important in drug repurposing studies. RESULTS: A computational drug repurposing approach was proposed to predict novel drug-target interactions (DTIs), as well as to predict the type of interaction induced. The methodology is based on mining a heterogeneous graph that integrates drug-drug and protein-protein similarity networks, together with verified drug-disease and protein-disease associations. In order to extract appropriate features, the three-layer heterogeneous graph was mapped to low-dimensional vectors using node embedding principles. The DTI prediction problem was formulated as a multi-label, multi-class classification task, aiming to determine drug modes of action. DTIs were defined by concatenating pairs of drug and target vectors extracted from graph embedding, which were used as input to classification via gradient boosted trees, where a model is trained to predict the type of interaction. After validating the prediction ability of DT2Vec+, a comprehensive analysis of all unknown DTIs was conducted to predict the degree and type of interaction. Finally, the model was applied to propose potential approved drugs to target cancer-specific biomarkers. CONCLUSION: DT2Vec+ showed promising results in predicting the type of DTI, which was achieved via integrating and mapping triplet drug-target-disease association graphs into low-dimensional dense vectors. To our knowledge, this is the first approach that addresses prediction between drugs and targets across six interaction types.
Affiliation(s)
- E Amiri Souri
  - Department of Informatics, Faculty of Natural, Mathematical and Engineering Sciences, King's College London, Bush House, London, WC2B 4BG, UK
- A Chenoweth
  - St. John's Institute of Dermatology, School of Basic and Medical Biosciences, Guy's Hospital, King's College London, London, SE1 9RT, UK
  - Breast Cancer Now Research Unit, School of Cancer and Pharmaceutical Sciences, Guy's Cancer Centre, King's College London, London, SE1 9RT, UK
- S N Karagiannis
  - St. John's Institute of Dermatology, School of Basic and Medical Biosciences, Guy's Hospital, King's College London, London, SE1 9RT, UK
  - Breast Cancer Now Research Unit, School of Cancer and Pharmaceutical Sciences, Guy's Cancer Centre, King's College London, London, SE1 9RT, UK
- S Tsoka
  - Department of Informatics, Faculty of Natural, Mathematical and Engineering Sciences, King's College London, Bush House, London, WC2B 4BG, UK
|
28
|
Sosnina EA, Sosnin S, Fedorov MV. Improvement of multi-task learning by data enrichment: application for drug discovery. J Comput Aided Mol Des 2023; 37:183-200. [PMID: 36943645 DOI: 10.1007/s10822-023-00500-w] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/22/2022] [Accepted: 02/21/2023] [Indexed: 03/23/2023]
Abstract
Multi-task learning in deep neural networks has become a topic of growing importance in many research fields, including drug discovery. However, applying multi-task learning poses new challenges in improving prediction performance. This study investigated the potential of training data enrichment to enhance multi-task model prediction quality in drug discovery. The study evaluated four scenarios with varying degrees of information capacity of the training data and applied two types of test data to evaluate prediction performance. We used three datasets: ViralChEMBL, which consisted of binary activities of compounds against viral species, was applied for the classification task; pQSAR(159) and pQSAR(4267), which consisted of bio-activities of compounds and assays from the research of the profile-QSAR method, were applied for regression tasks. We built multi-task models based on feed-forward DNNs using the PyTorch framework. Our findings showed that training data enrichment could be an effective means of enhancing prediction performance in multi-task learning, but the degree of improvement depends on the quality of the training data. The more unique compounds and targets the training data include, the more new compound-target interactions are required for prediction improvement. We also found that, even using multi-task learning, one could not predict the interactions of compounds that are highly dissimilar from those used for model training. The study provides some recommendations for effectively employing multi-task learning in drug discovery to improve prediction accuracy and facilitate the discovery of novel drug candidates.
Affiliation(s)
- Ekaterina A Sosnina
  - Center for Computational and Data-Intensive Science and Engineering, Skolkovo Institute of Science and Technology, Bolshoy Boulevard 30/1, Moscow, Russia, 143026
- Sergey Sosnin
  - Department of Pharmaceutical Sciences, Faculty of Life Sciences, University of Vienna, Josef-Holaubek-Platz 2, 1190, Vienna, Austria
- Maxim V Fedorov
  - Center for Computational and Data-Intensive Science and Engineering, Skolkovo Institute of Science and Technology, Bolshoy Boulevard 30/1, Moscow, Russia, 143026
  - Sirius University of Science and Technology, Olympiisky Prospect 1, Sochi, Russia, 354340
|
29
|
Bongers BJ, Sijben HJ, Hartog PBR, Tarnovskiy A, IJzerman AP, Heitman LH, van Westen GJP. Proteochemometric Modeling Identifies Chemically Diverse Norepinephrine Transporter Inhibitors. J Chem Inf Model 2023; 63:1745-1755. [PMID: 36926886 PMCID: PMC10052348 DOI: 10.1021/acs.jcim.2c01645] [Citation(s) in RCA: 3] [Impact Index Per Article: 3.0] [Indexed: 03/18/2023]
Abstract
Solute carriers (SLCs) are relatively underexplored compared to other prominent protein families such as kinases and G protein-coupled receptors. However, proteins from the SLC family play an essential role in various diseases. One such SLC is the high-affinity norepinephrine transporter (NET/SLC6A2). In contrast to most other SLCs, the NET has been relatively well studied. However, the chemical space of known ligands has a low chemical diversity, making it challenging to identify chemically novel ligands. Here, a computational screening pipeline was developed to find new NET inhibitors. The approach increases the chemical space to model for NETs using the chemical space of related proteins that were selected utilizing similarity networks. Prior proteochemometric models added data from related proteins, but here we use a data-driven approach to select the optimal proteins to add to the modeled data set. After optimizing the data set, the proteochemometric model was optimized using stepwise feature selection. The final model was created using a two-step approach combining several proteochemometric machine learning models through stacking. This model was applied to the extensive virtual compound database of Enamine, from which the top predicted 22,000 of the 600 million virtual compounds were clustered to end up with 46 chemically diverse candidates. A subselection of 32 candidates was synthesized and subsequently tested using an impedance-based assay. There were five hit compounds identified (hit rate 16%) with sub-micromolar inhibitory potencies toward NET, which are promising for follow-up experimental research. This study demonstrates a data-driven approach to diversify known chemical space to identify novel ligands and is to our knowledge the first to select this set based on the sequence similarity of related targets.
Affiliation(s)
- Brandon J Bongers
  - Division of Drug Discovery and Safety, Leiden Academic Centre for Drug Research, Leiden University, Einsteinweg 55, Leiden 2333 CC, The Netherlands
- Huub J Sijben
  - Division of Drug Discovery and Safety, Leiden Academic Centre for Drug Research, Leiden University, Einsteinweg 55, Leiden 2333 CC, The Netherlands
- Peter B R Hartog
  - Division of Drug Discovery and Safety, Leiden Academic Centre for Drug Research, Leiden University, Einsteinweg 55, Leiden 2333 CC, The Netherlands
- Adriaan P IJzerman
  - Division of Drug Discovery and Safety, Leiden Academic Centre for Drug Research, Leiden University, Einsteinweg 55, Leiden 2333 CC, The Netherlands
- Laura H Heitman
  - Division of Drug Discovery and Safety, Leiden Academic Centre for Drug Research, Leiden University, Einsteinweg 55, Leiden 2333 CC, The Netherlands
  - Oncode Institute, Jaarbeursplein 6, Utrecht 3521 AL, The Netherlands
- Gerard J P van Westen
  - Division of Drug Discovery and Safety, Leiden Academic Centre for Drug Research, Leiden University, Einsteinweg 55, Leiden 2333 CC, The Netherlands
|
30
|
Kour S, Biswas I, Sheoran S, Arora S, Sheela P, Duppala SK, Murthy DK, Pawar SC, Singh H, Kumar D, Prabhu D, Vuree S, Kumar R. Artificial intelligence and nanotechnology for cervical cancer treatment: Current status and future perspectives. J Drug Deliv Sci Technol 2023. [DOI: 10.1016/j.jddst.2023.104392] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 04/05/2023]
|
31
|
Gui C, Li Y, Peng T. Development of predictive QSAR models for the substrates/inhibitors of OATP1B1 by deep neural networks. Toxicol Lett 2023; 376:20-25. [PMID: 36649904 DOI: 10.1016/j.toxlet.2023.01.006] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/21/2022] [Revised: 01/10/2023] [Accepted: 01/12/2023] [Indexed: 01/15/2023]
Abstract
The organic anion transporting polypeptide 1B1 (OATP1B1) is an important hepatic uptake transporter. Inhibition of its normal function could lead to drug-drug interactions. In silico prediction is an effective means to identify potential OATP1B1 inhibitors, and quantitative structure-activity relationship (QSAR) modeling is extensively used. As the structures of OATP1B1 substrates/inhibitors are quite diverse, machine learning-based methods should be a good option for their QSAR analysis. In the present study, deep neural networks (DNNs) were employed to develop QSAR models for the substrates/inhibitors of OATP1B1 with different molecular fingerprints. Our results showed that QSAR models based on 4-hidden-layer DNNs and ECFP4/FCFP4 fingerprints had the best generalization performance. The correlation coefficients (R²) of the test set for the ECFP4 and FCFP4 models were 0.641 and 0.653, respectively. The model application domain (AD) was calculated with a Euclidean distance-based method; the AD could improve the performance of the ECFP4 model but had little effect on the FCFP4 model. Finally, the prediction of 8 additional compounds that were not included in the data set further demonstrated that our QSAR models had a good predictive ability (averaged prediction accuracy >92%). The developed QSAR models could be used to screen large data sets and discover novel inhibitors for OATP1B1.
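The abstract does not say which Euclidean distance-based variant was used for the application domain. As a rough sketch of one common form, the example below treats a compound as inside the domain when its distance to the training-set centroid is at most the mean training distance plus `z` standard deviations. The names `fit_domain` and `in_domain`, the centroid-based rule, and the default `z = 0.5` are all assumptions for illustration.

```python
import math
from statistics import mean, pstdev
from typing import List, Sequence, Tuple


def euclidean(a: Sequence[float], b: Sequence[float]) -> float:
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def fit_domain(train: List[Sequence[float]], z: float = 0.5) -> Tuple[List[float], float]:
    """Return the training centroid and a distance threshold (assumed rule)."""
    centroid = [mean(column) for column in zip(*train)]
    distances = [euclidean(row, centroid) for row in train]
    # Threshold = mean training distance + z * population standard deviation.
    threshold = mean(distances) + z * pstdev(distances)
    return centroid, threshold


def in_domain(x: Sequence[float], centroid: Sequence[float], threshold: float) -> bool:
    """True when x lies within the threshold distance of the centroid."""
    return euclidean(x, centroid) <= threshold


train_features = [[0.0, 0.0], [0.0, 2.0], [2.0, 0.0], [2.0, 2.0]]
centroid, threshold = fit_domain(train_features)
inside = in_domain([1.0, 1.5], centroid, threshold)    # near the centroid
outside = in_domain([5.0, 5.0], centroid, threshold)   # far from all training data
```

Predictions for compounds flagged as outside the domain would typically be reported separately or discarded, which is how an AD filter can change measured test performance.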
Affiliation(s)
- Chunshan Gui
  - College of Pharmaceutical Sciences, Soochow University, 199 Renai Road, Suzhou Industrial Park, Suzhou 215123, China
- Ying Li
  - College of Pharmaceutical Sciences, Soochow University, 199 Renai Road, Suzhou Industrial Park, Suzhou 215123, China
- Taotao Peng
  - College of Pharmaceutical Sciences, Soochow University, 199 Renai Road, Suzhou Industrial Park, Suzhou 215123, China
|
32
|
Liu X, Ye K, van Vlijmen HWT, IJzerman AP, van Westen GJP. DrugEx v3: scaffold-constrained drug design with graph transformer-based reinforcement learning. J Cheminform 2023; 15:24. [PMID: 36803659 PMCID: PMC9940339 DOI: 10.1186/s13321-023-00694-z] [Citation(s) in RCA: 18] [Impact Index Per Article: 18.0] [Received: 04/09/2022] [Accepted: 02/06/2023] [Indexed: 02/22/2023] Open
Abstract
Rational drug design often starts from specific scaffolds to which side chains/substituents are added or modified due to the large drug-like chemical space available to search for novel drug-like molecules. With the rapid growth of deep learning in drug discovery, a variety of effective approaches have been developed for de novo drug design. In previous work we proposed a method named DrugEx, which can be applied in polypharmacology based on multi-objective deep reinforcement learning. However, the previous version is trained under fixed objectives and does not allow users to input any prior information (i.e. a desired scaffold). In order to improve the general applicability, we updated DrugEx to design drug molecules based on scaffolds which consist of multiple fragments provided by users. Here, a Transformer model was employed to generate molecular structures. The Transformer is a multi-head self-attention deep learning model containing an encoder to receive scaffolds as input and a decoder to generate molecules as output. In order to deal with the graph representation of molecules a novel positional encoding for each atom and bond based on an adjacency matrix was proposed, extending the architecture of the Transformer. The graph Transformer model contains growing and connecting procedures for molecule generation starting from a given scaffold based on fragments. Moreover, the generator was trained under a reinforcement learning framework to increase the number of desired ligands. As a proof of concept, the method was applied to design ligands for the adenosine A2A receptor (A2AAR) and compared with SMILES-based methods. The results show that 100% of the generated molecules are valid and most of them had a high predicted affinity value towards A2AAR with given scaffolds.
Affiliation(s)
- Xuhan Liu
  - Drug Discovery and Safety, Leiden Academic Centre for Drug Research, Einsteinweg 55, Leiden, The Netherlands
- Kai Ye
  - School of Electrics and Information Engineering, Xi’an Jiaotong University, 28 XianningW Rd, Xi’an, China
- Herman W. T. van Vlijmen
  - Drug Discovery and Safety, Leiden Academic Centre for Drug Research, Einsteinweg 55, Leiden, The Netherlands
  - Janssen Pharmaceutica NV, Turnhoutseweg 30, B-2340 Beerse, Belgium
- Adriaan P. IJzerman
  - Drug Discovery and Safety, Leiden Academic Centre for Drug Research, Einsteinweg 55, Leiden, The Netherlands
- Gerard J. P. van Westen
  - Drug Discovery and Safety, Leiden Academic Centre for Drug Research, Einsteinweg 55, Leiden, The Netherlands
|
33
|
Atas Guvenilir H, Doğan T. How to approach machine learning-based prediction of drug/compound-target interactions. J Cheminform 2023; 15:16. [PMID: 36747300 PMCID: PMC9901167 DOI: 10.1186/s13321-023-00689-w] [Citation(s) in RCA: 3] [Impact Index Per Article: 3.0] [Received: 09/02/2022] [Accepted: 01/30/2023] [Indexed: 02/08/2023] Open
Abstract
The identification of drug/compound-target interactions (DTIs) constitutes the basis of drug discovery, for which computational predictive approaches have been developed. As a relatively new data-driven paradigm, proteochemometric (PCM) modeling utilizes both protein and compound properties as a pair at the input level and processes them via statistical/machine learning. The representation of input samples (i.e., proteins and their ligands) in the form of quantitative feature vectors is crucial for the extraction of interaction-related properties during the artificial learning and subsequent prediction of DTIs. Lately, the representation learning approach, in which input samples are automatically featurized via training and applying a machine/deep learning model, has been utilized in biomedical sciences. In this study, we performed a comprehensive investigation of different computational approaches/techniques for protein featurization (including both conventional approaches and the novel learned embeddings), data preparation and exploration, machine learning-based modeling, and performance evaluation with the aim of achieving better data representations and more successful learning in DTI prediction. For this, we first constructed realistic and challenging benchmark datasets on small, medium, and large scales to be used as reliable gold standards for specific DTI modeling tasks. We developed and applied a network analysis-based splitting strategy to divide datasets into structurally different training and test folds. Using these datasets together with various featurization methods, we trained and tested DTI prediction models and evaluated their performance from different angles. 
Our main findings can be summarized under 3 items: (i) random splitting of datasets into train and test folds leads to near-complete data memorization and produces highly over-optimistic results, and should therefore be avoided; (ii) learned protein sequence embeddings work well in DTI prediction and offer high potential, even though interaction-related properties (e.g., structures) of proteins are unused during their self-supervised model training; and (iii) during the learning process, PCM models tend to rely heavily on compound features while partially ignoring protein features, primarily due to the inherent bias in DTI data, indicating the requirement for new and unbiased datasets. We hope this study will aid researchers in designing robust and high-performing data-driven DTI prediction systems that have real-world translational value in drug discovery.
Affiliation(s)
- Heval Atas Guvenilir
  - Biological Data Science Laboratory, Department of Computer Engineering, Hacettepe University, Ankara, Turkey
  - Department of Health Informatics, Graduate School of Informatics, METU, Ankara, Turkey
- Tunca Doğan
  - Biological Data Science Laboratory, Department of Computer Engineering, Hacettepe University, Ankara, Turkey
  - Institute of Informatics, Hacettepe University, Ankara, Turkey
  - Department of Bioinformatics, Graduate School of Health Sciences, Hacettepe University, Ankara, Turkey
|
34
|
Thakur M, Bateman A, Brooksbank C, Freeberg M, Harrison M, Hartley M, Keane T, Kleywegt G, Leach A, Levchenko M, Morgan S, McDonagh E, Orchard S, Papatheodorou I, Velankar S, Vizcaino J, Witham R, Zdrazil B, McEntyre J. EMBL's European Bioinformatics Institute (EMBL-EBI) in 2022. Nucleic Acids Res 2023; 51:D9-D17. [PMID: 36477213 PMCID: PMC9825486 DOI: 10.1093/nar/gkac1098] [Citation(s) in RCA: 17] [Impact Index Per Article: 17.0] [Received: 10/13/2022] [Revised: 10/21/2022] [Accepted: 10/31/2022] [Indexed: 12/13/2022] Open
Abstract
The European Molecular Biology Laboratory's European Bioinformatics Institute (EMBL-EBI) is one of the world's leading sources of public biomolecular data. Based at the Wellcome Genome Campus in Hinxton, UK, EMBL-EBI is one of six sites of the European Molecular Biology Laboratory (EMBL), Europe's only intergovernmental life sciences organisation. This overview summarises the status of services that EMBL-EBI data resources provide to scientific communities globally. The scale, openness, rich metadata and extensive curation of EMBL-EBI added-value databases make them particularly well-suited as training sets for deep learning, machine learning and artificial intelligence applications, a selection of which are described here. The data resources at EMBL-EBI can catalyse such developments because they offer sustainable, high-quality data, collected in some cases over decades and made openly available to any researcher, globally. Our aim is for EMBL-EBI data resources to keep providing the foundations for tools and research insights that transform fields across the life sciences.
Affiliation(s)
- Alex Bateman
  - Data Services Teams, EMBL’s European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton CB10 1SD, UK
- Cath Brooksbank
  - Data Services Teams, EMBL’s European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton CB10 1SD, UK
- Mallory Freeberg
  - Data Services Teams, EMBL’s European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton CB10 1SD, UK
- Melissa Harrison
  - Data Services Teams, EMBL’s European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton CB10 1SD, UK
- Matthew Hartley
  - Data Services Teams, EMBL’s European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton CB10 1SD, UK
- Thomas Keane
  - Data Services Teams, EMBL’s European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton CB10 1SD, UK
- Gerard Kleywegt
  - Data Services Teams, EMBL’s European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton CB10 1SD, UK
- Andrew Leach
  - Data Services Teams, EMBL’s European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton CB10 1SD, UK
- Mariia Levchenko
  - Data Services Teams, EMBL’s European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton CB10 1SD, UK
- Sarah Morgan
  - Data Services Teams, EMBL’s European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton CB10 1SD, UK
- Ellen M McDonagh
  - Data Services Teams, EMBL’s European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton CB10 1SD, UK
  - OpenTargets, EMBL’s European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton CB10 1SD, UK
- Sandra Orchard
  - Data Services Teams, EMBL’s European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton CB10 1SD, UK
- Irene Papatheodorou
  - Data Services Teams, EMBL’s European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton CB10 1SD, UK
- Sameer Velankar
  - Data Services Teams, EMBL’s European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton CB10 1SD, UK
- Juan Antonio Vizcaino
  - Data Services Teams, EMBL’s European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton CB10 1SD, UK
- Rick Witham
  - Data Services Teams, EMBL’s European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton CB10 1SD, UK
- Barbara Zdrazil
  - Data Services Teams, EMBL’s European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton CB10 1SD, UK
|
35
|
Béquignon OJM, Bongers BJ, Jespers W, IJzerman AP, van der Water B, van Westen GJP. Papyrus: a large-scale curated dataset aimed at bioactivity predictions. J Cheminform 2023; 15:3. [PMID: 36609528 PMCID: PMC9824924 DOI: 10.1186/s13321-022-00672-x] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/10/2022] [Accepted: 12/17/2022] [Indexed: 01/07/2023] Open
Abstract
With the ongoing rapid growth of publicly available ligand-protein bioactivity data, there is a trove of valuable data that can be used to train a plethora of machine-learning algorithms. However, not all data is equal in terms of size and quality and a significant portion of researchers' time is needed to adapt the data to their needs. On top of that, finding the right data for a research question can often be a challenge on its own. To meet these challenges, we have constructed the Papyrus dataset. Papyrus is comprised of around 60 million data points. This dataset contains multiple large publicly available datasets such as ChEMBL and ExCAPE-DB combined with several smaller datasets containing high-quality data. The aggregated data has been standardised and normalised in a manner that is suitable for machine learning. We show how data can be filtered in a variety of ways and also perform some examples of quantitative structure-activity relationship analyses and proteochemometric modelling. Our ambition is that this pruned data collection constitutes a benchmark set that can be used for constructing predictive models, while also providing an accessible data source for research.
Affiliation(s)
- O. J. M. Béquignon
  - Division of Drug Discovery and Safety, Leiden Academic Centre for Drug Research, Leiden University, Leiden, The Netherlands
- B. J. Bongers
  - Division of Drug Discovery and Safety, Leiden Academic Centre for Drug Research, Leiden University, Leiden, The Netherlands
- W. Jespers
  - Division of Drug Discovery and Safety, Leiden Academic Centre for Drug Research, Leiden University, Leiden, The Netherlands
- A. P. IJzerman
  - Division of Drug Discovery and Safety, Leiden Academic Centre for Drug Research, Leiden University, Leiden, The Netherlands
- B. van der Water
  - Division of Drug Discovery and Safety, Leiden Academic Centre for Drug Research, Leiden University, Leiden, The Netherlands
- G. J. P. van Westen
  - Division of Drug Discovery and Safety, Leiden Academic Centre for Drug Research, Leiden University, Leiden, The Netherlands
|
36
|
On the ability of machine learning methods to discover novel scaffolds. J Mol Model 2022; 29:22. [PMID: 36574054 DOI: 10.1007/s00894-022-05359-6] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/12/2022] [Accepted: 10/21/2022] [Indexed: 12/28/2022]
Abstract
The recent advances in the application of machine learning to drug discovery have made it a 'hot topic' for research, with hundreds of academic groups and companies integrating machine learning into their drug discovery projects. Nevertheless, there remains great uncertainty regarding the most appropriate ways to evaluate the relative performance of these powerful methods against more traditional cheminformatics approaches, and many pitfalls remain for the unwary. In 2020, researchers at MIT (Stokes et al., Cell 180(4), 688-702, 2020) reported the discovery of a new compound with antibacterial activity, halicin, through the use of a neural network machine learning method. A robust ability to identify new active chemotypes through computational methods would be very useful. In this study, we have used the Stokes et al. dataset to compare the performance of this method to two other approaches, Mapping of Activity Through Dichotomic Scores (MADS) by Todeschini et al. (J Chemom 32(4):e2994, 2018) and Random Matrix Theory (RMT) by Lee et al. (Proc Natl Acad Sci 116(9):3373-3378, 2019). Our results demonstrate that all three methods are capable of predicting halicin as an active antibacterial compound, but that this result is dependent on the dataset composition, pre-processing and the molecular fingerprint used. We have further assessed overall performance as determined by several performance metrics. We also investigated the scaffold hopping potential of the methods by modifying the dataset by removal of the β-lactam and fluoroquinolone chemotypes. MADS and RMT are able to identify actives in the test set that contained these substructures. This ability arises because of high scoring fragments of the withheld chemotypes that are in common with other active antibiotic classes. Interestingly, MADS is relatively better compared to the other two methods based on general predictive performance.
37
Pan-cancer functional analysis of somatic mutations in G protein-coupled receptors. Sci Rep 2022; 12:21534. [PMID: 36513718 PMCID: PMC9747925 DOI: 10.1038/s41598-022-25323-x] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Received: 09/10/2022] [Accepted: 11/28/2022] [Indexed: 12/15/2022] Open
Abstract
G protein-coupled receptors (GPCRs) are the most frequently exploited drug target family; moreover, they are often found mutated in cancer. Here we used a dataset of mutations found in patient samples derived from the Genomic Data Commons and compared it to the natural human variance as exemplified by data from the 1000 Genomes Project. We explored cancer-related mutation patterns in all GPCR classes combined and individually. While the location of the mutations across the protein domains did not differ significantly in the two datasets, a mutation enrichment in cancer patients was observed among class-specific conserved motifs in GPCRs such as the Class A "DRY" motif. A Two-Entropy Analysis confirmed the correlation between residue conservation and cancer-related mutation frequency. We subsequently created a ranking of high-scoring GPCRs using a multi-objective approach (Pareto Front Ranking). Our approach was confirmed by the re-discovery of established cancer targets such as the LPA and mGlu receptor families, and it also identified novel GPCRs that had not previously been linked to cancer, such as the P2Y Receptor 10 (P2RY10). Overall, this study presents a list of GPCRs that are amenable to experimental follow-up to elucidate their role in cancer.
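The abstract does not say how the Pareto Front Ranking was computed. As a minimal sketch of the general technique (non-dominated sorting), assuming every item is scored on several objectives where higher is better, the ranking could look like the following. The receptor labels and scores are hypothetical illustration values, not data from the study, and the function names are made up for this example.

```python
from typing import Dict, List, Sequence


def dominates(a: Sequence[float], b: Sequence[float]) -> bool:
    """True if a is at least as good as b on every objective and better on one."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))


def pareto_front_ranks(scores: Dict[str, Sequence[float]]) -> Dict[str, int]:
    """Give rank 1 to the non-dominated items, remove them, and repeat."""
    remaining = dict(scores)
    ranks: Dict[str, int] = {}
    rank = 1
    while remaining:
        # An item is on the current front if no other remaining item dominates it.
        front: List[str] = [
            name
            for name, s in remaining.items()
            if not any(dominates(t, s) for other, t in remaining.items() if other != name)
        ]
        for name in front:
            ranks[name] = rank
            del remaining[name]
        rank += 1
    return ranks


# Hypothetical two-objective scores (higher is better), for illustration only.
example = {
    "GPCR_A": (0.9, 0.2),
    "GPCR_B": (0.4, 0.8),
    "GPCR_C": (0.3, 0.1),
    "GPCR_D": (0.2, 0.05),
}
ranks = pareto_front_ranks(example)
```

Items that share a rank are mutually non-dominated, so the ranking orders candidates without requiring the objectives to be collapsed into a single weighted score.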
38
Liao J, Chen H, Wei L, Wei L. GSAML-DTA: An interpretable drug-target binding affinity prediction model based on graph neural networks with self-attention mechanism and mutual information. Comput Biol Med 2022; 150:106145. [PMID: 37859276 DOI: 10.1016/j.compbiomed.2022.106145] [Citation(s) in RCA: 10] [Impact Index Per Article: 5.0] [Received: 08/10/2022] [Revised: 08/23/2022] [Accepted: 09/24/2022] [Indexed: 11/03/2022]
Abstract
Identifying drug-target affinity (DTA) has great practical importance in the process of designing efficacious drugs for known diseases. Recently, numerous deep learning-based computational methods have been developed to predict drug-target affinity and have achieved impressive performance. However, most of them construct the molecule (drug or target) encoder without considering the weights of the features of each node (atom or residue). Besides, they generally combine drug and target representations directly, so the combined representation may contain task-irrelevant information. In this study, we develop GSAML-DTA, an interpretable deep learning framework for DTA prediction. GSAML-DTA integrates a self-attention mechanism and graph neural networks (GNNs) to build representations of drugs and target proteins from the structural information. In addition, mutual information is introduced to filter out redundant information and retain relevant information in the combined representations of drugs and targets. Extensive experimental results demonstrate that GSAML-DTA outperforms state-of-the-art methods for DTA prediction on two benchmark datasets. Furthermore, GSAML-DTA has the interpretation ability to analyze binding atoms and residues, which may be conducive to chemical biology studies from data. Overall, GSAML-DTA can serve as a powerful and interpretable tool suitable for DTA modelling.
Affiliation(s)
- Jiaqi Liao
  - School of Software, Shandong University, Jinan, China
- Haoyang Chen
  - School of Software, Shandong University, Jinan, China
- Lesong Wei
  - Department of Computer Science, University of Tsukuba, Tsukuba, 3058577, Japan
- Leyi Wei
  - School of Software, Shandong University, Jinan, China
39
Liu Q, van der Stel W, van der Noord VE, Leegwater H, Coban B, Elbertse K, Pruijs JTM, Béquignon OJM, van Westen G, Dévédec SEL, Danen EHJ. Hypoxia Triggers TAZ Phosphorylation in Basal A Triple Negative Breast Cancer Cells. Int J Mol Sci 2022; 23:ijms231710119. [PMID: 36077517 PMCID: PMC9456181 DOI: 10.3390/ijms231710119] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/17/2022] [Revised: 08/31/2022] [Accepted: 09/02/2022] [Indexed: 12/02/2022] Open
Abstract
Hypoxia and HIF signaling drive cancer progression and therapy resistance and have been demonstrated in breast cancer. To what extent breast cancer subtypes differ in their response to hypoxia has not been resolved. Here, we show that hypoxia similarly triggers HIF1 stabilization in luminal and basal A triple negative breast cancer cells and we use high throughput targeted RNA sequencing to analyze its effects on gene expression in these subtypes. We focus on regulation of YAP/TAZ/TEAD targets and find overlapping as well as distinct target genes being modulated in luminal and basal A cells under hypoxia. We reveal a HIF1 mediated, basal A specific response to hypoxia by which TAZ, but not YAP, is phosphorylated at Ser89. While total YAP/TAZ localization is not affected by hypoxia, hypoxia drives a shift of [p-TAZ(Ser89)/p-YAP(Ser127)] from the nucleus to the cytoplasm in basal A but not luminal breast cancer cells. Cell fractionation and YAP knock-out experiments confirm cytoplasmic sequestration of TAZ(Ser89) in hypoxic basal A cells. Pharmacological and genetic interference experiments identify c-Src and CDK3 as kinases involved in such phosphorylation of TAZ at Ser89 in hypoxic basal A cells. Hypoxia attenuates growth of basal A cells and the effect of verteporfin, a disruptor of YAP/TAZ-TEAD–mediated transcription, is diminished under those conditions, while expression of a TAZ-S89A mutant does not confer basal A cells with a growth advantage under hypoxic conditions, indicating that other hypoxia regulated pathways suppressing cell growth are dominant.
40
Jeong J, Choi J. Artificial Intelligence-Based Toxicity Prediction of Environmental Chemicals: Future Directions for Chemical Management Applications. Environmental Science & Technology 2022; 56:7532-7543. [PMID: 35666838 DOI: 10.1021/acs.est.1c07413] [Citation(s) in RCA: 35] [Impact Index Per Article: 17.5] [Indexed: 06/15/2023]
Abstract
Recently, research on the development of artificial intelligence (AI)-based computational toxicology models that predict toxicity without the use of animal testing has emerged because of the rapid development of computer technology. Various computational toxicology techniques that predict toxicity based on the structure of chemical substances are gaining attention, including the quantitative structure-activity relationship. To understand the recent development of these models, we analyzed the databases, molecular descriptors, fingerprints, and algorithms considered in recent studies. Based on a selection of 96 papers published since 2014, we found that AI models have been developed to predict approximately 30 different toxicity end points using more than 20 toxicity databases. For model development, molecular access system and extended-connectivity fingerprints are the most commonly used molecular descriptors. The most used algorithm among the machine learning techniques is the random forest, while the most used algorithm among the deep learning techniques is a deep neural network. The use of AI technology in the development of toxicity prediction models is a new concept that will aid in achieving a scientific accord and meet regulatory applications. The comprehensive overview provided in this study will provide a useful guide for the further development and application of toxicity prediction models.
Affiliation(s)
- Jaeseong Jeong
  - School of Environmental Engineering, University of Seoul, 163 Seoulsiripdae-ro, Dongdaemun-gu, Seoul 02504, South Korea
- Jinhee Choi
  - School of Environmental Engineering, University of Seoul, 163 Seoulsiripdae-ro, Dongdaemun-gu, Seoul 02504, South Korea
41
Meli R, Morris GM, Biggin PC. Scoring Functions for Protein-Ligand Binding Affinity Prediction using Structure-Based Deep Learning: A Review. Frontiers in Bioinformatics 2022; 2:885983. [PMID: 36187180 PMCID: PMC7613667 DOI: 10.3389/fbinf.2022.885983] [Citation(s) in RCA: 20] [Impact Index Per Article: 10.0] [Received: 02/28/2022] [Accepted: 05/11/2022] [Indexed: 01/01/2023] Open
Abstract
The rapid and accurate in silico prediction of protein-ligand binding free energies or binding affinities has the potential to transform drug discovery. In recent years, there has been a rapid growth of interest in deep learning methods for the prediction of protein-ligand binding affinities based on the structural information of protein-ligand complexes. These structure-based scoring functions often obtain better results than classical scoring functions when applied within their applicability domain. Here we review structure-based scoring functions for binding affinity prediction based on deep learning, focussing on different types of architectures, featurization strategies, data sets, methods for training and evaluation, and the role of explainable artificial intelligence in building useful models for real drug-discovery applications.
Affiliation(s)
- Rocco Meli
  - Department of Biochemistry, University of Oxford, Oxford, United Kingdom
- Garrett M. Morris
  - Department of Statistics, University of Oxford, Oxford, United Kingdom
- Philip C. Biggin
  - Department of Biochemistry, University of Oxford, Oxford, United Kingdom
42
Li F, Zhang Z, Guan J, Zhou S. Effective drug-target interaction prediction with mutual interaction neural network. Bioinformatics 2022; 38:3582-3589. [PMID: 35652721 PMCID: PMC9272808 DOI: 10.1093/bioinformatics/btac377] [Citation(s) in RCA: 19] [Impact Index Per Article: 9.5] [Received: 02/11/2022] [Revised: 05/09/2022] [Accepted: 05/31/2022] [Indexed: 11/30/2022] Open
Abstract
Motivation: Accurately predicting drug–target interaction (DTI) is a crucial step in drug discovery. Recently, deep learning techniques have been widely used for DTI prediction and achieved significant performance improvement. One challenge in building deep learning models for DTI prediction is how to appropriately represent drugs and targets. Target distance maps and molecular graphs are low-dimensional and informative representations, which, however, have not been jointly used in DTI prediction. Another challenge is how to effectively model the mutual impact between drugs and targets. Although attention mechanisms have been used to capture the one-way impact of targets on drugs or vice versa, the mutual impact between drugs and targets, which is very important in predicting their interactions, has not yet been explored. Results: Therefore, in this article we propose MINN-DTI, a new model for DTI prediction. MINN-DTI combines an interacting-transformer module (called Interformer) with an improved Communicative Message Passing Neural Network (CMPNN) (called Inter-CMPNN) to better capture the two-way impact between drugs and targets, which are represented by molecular graphs and distance maps, respectively. The proposed method obtains better performance than the state-of-the-art methods on three benchmark datasets: DUD-E, human and BindingDB. MINN-DTI also provides good interpretability by assigning larger weights to the amino acids and atoms that contribute more to the interactions between drugs and targets. Availability and implementation: The data and code of this study are available at https://github.com/admislf/MINN-DTI.
Affiliation(s)
- Fei Li
  - School of Computer Science, Fudan University, Shanghai 200438, China
- Ziqiao Zhang
  - School of Computer Science, Fudan University, Shanghai 200438, China
- Jihong Guan
  - Department of Computer Science and Technology, Tongji University, Shanghai 201804, China
- Shuigeng Zhou
  - School of Computer Science, Fudan University, Shanghai 200438, China
  - Shanghai Key Lab of Intelligent Information Processing, Shanghai 200438, China
43
Murali V, Muralidhar YP, Königs C, Nair M, Madhu S, Nedungadi P, Srinivasa G, Athri P. Predicting clinical trial outcomes using drug bioactivities through graph database integration and machine learning. Chem Biol Drug Des 2022; 100:169-184. [PMID: 35587730 DOI: 10.1111/cbdd.14092] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/29/2022] [Revised: 04/24/2022] [Accepted: 05/15/2022] [Indexed: 11/29/2022]
Abstract
The ability to estimate the probability that a drug will receive approval in clinical trials provides natural advantages for optimizing pharmaceutical research workflows. Success rates of clinical trials have deep implications for costs and duration of development, and are under pressure due to stringent regulatory approval processes. We propose a machine learning approach that can predict the outcome of a trial with reliable accuracy, using biological activities, physicochemical properties of the compounds, target-related features, and NLP-based compound representation. Of the features in this list, biological activities have never before been used as an independent variable for the prediction of clinical trial outcomes. We have extracted the drug-disease pair from clinical trials and mapped target(s) to that pair using multiple data sources. Empirical results demonstrate that ensemble learning outperforms independently trained, small-data ML models. We report results and inferences derived from a random forest classifier with an average accuracy of 93% and an F1 score of 0.96 for the "Pass" class. "Pass" refers to one of the two classes (Pass/Fail) of all clinical trials, and the model performed well in predicting the "Pass" category. Through the analysis of feature contributions to predictive capability, we have demonstrated that bioactivity plays a statistically significant role in predicting clinical trial outcome. A significant effort has gone into the production of the dataset that, for the first time, integrates clinical trial information with protein targets. Cleaned, organized, integrated data and code to map these entities, created as a part of this work, are available open-source. This reproducibility and the freely available code ensure that researchers with access to deeply curated and proprietary clinical trial databases (we only use open-source data in this study) can further expand the scope of the results.
Affiliation(s)
- Vidhya Murali
  - Department of Computer Science and Engineering, Amrita School of Engineering, Bengaluru, India
- Y Pradyumna Muralidhar
  - PES Center for Pattern Recognition, Department of Computer Science and Engineering, PES University, Bengaluru, India
- Cassandra Königs
  - Bioinformatics and Medical Informatics, Bielefeld University, Northrhine-Westphalia, Germany
- Meera Nair
  - Amrita School of Biotechnology, Amrita Vishwa Vidyapeetham, Amritapuri, Kerala, India
- Sethulekshmi Madhu
  - Amrita School of Biotechnology, Amrita Vishwa Vidyapeetham, Amritapuri, Kerala, India
- Prema Nedungadi
  - Department of Computer Science and Engineering, Amrita School of Engineering, Kerala, India
- Gowri Srinivasa
  - PES Center for Pattern Recognition, Department of Computer Science and Engineering, PES University, Bengaluru, India
- Prashanth Athri
  - Department of Computer Science and Engineering, Amrita School of Engineering, Bengaluru, India
44
Rodríguez-Pérez R, Miljković F, Bajorath J. Machine Learning in Chemoinformatics and Medicinal Chemistry. Annu Rev Biomed Data Sci 2022; 5:43-65. [PMID: 35440144 DOI: 10.1146/annurev-biodatasci-122120-124216] [Citation(s) in RCA: 4] [Impact Index Per Article: 2.0] [Indexed: 11/09/2022]
Abstract
In chemoinformatics and medicinal chemistry, machine learning has evolved into an important approach. In recent years, increasing computational resources and new deep learning algorithms have put machine learning onto a new level, addressing previously unmet challenges in pharmaceutical research. In silico approaches for compound activity predictions, de novo design, and reaction modeling have been further advanced by new algorithmic developments and the emergence of big data in the field. Herein, novel applications of machine learning and deep learning in chemoinformatics and medicinal chemistry are reviewed. Opportunities and challenges for new methods and applications are discussed, placing emphasis on proper baseline comparisons, robust validation methodologies, and new applicability domains. Expected final online publication date for the Annual Review of Biomedical Data Science, Volume 5 is August 2022. Please see http://www.annualreviews.org/page/journal/pubdates for revised estimates.
Affiliation(s)
- Raquel Rodríguez-Pérez
  - Department of Life Science Informatics, B-IT (Bonn-Aachen International Center for Information Technology), Chemical Biology and Medicinal Chemistry Program Unit, LIMES (Life and Medical Sciences Institute), Rheinische Friedrich-Wilhelms-Universität, Bonn, Germany
  - Current affiliation: Novartis Institutes for Biomedical Research, Novartis Campus, Basel, Switzerland
- Filip Miljković
  - Department of Life Science Informatics, B-IT (Bonn-Aachen International Center for Information Technology), Chemical Biology and Medicinal Chemistry Program Unit, LIMES (Life and Medical Sciences Institute), Rheinische Friedrich-Wilhelms-Universität, Bonn, Germany
  - Current affiliation: Data Science and AI, Imaging and Data Analytics, Clinical Pharmacology and Safety Sciences, R&D AstraZeneca, Gothenburg, Sweden
- Jürgen Bajorath
  - Department of Life Science Informatics, B-IT (Bonn-Aachen International Center for Information Technology), Chemical Biology and Medicinal Chemistry Program Unit, LIMES (Life and Medical Sciences Institute), Rheinische Friedrich-Wilhelms-Universität, Bonn, Germany
45
Kalakoti Y, Yadav S, Sundar D. Deep Neural Network-Assisted Drug Recommendation Systems for Identifying Potential Drug-Target Interactions. ACS Omega 2022; 7:12138-12146. [PMID: 35449922 PMCID: PMC9016825 DOI: 10.1021/acsomega.2c00424] [Citation(s) in RCA: 4] [Impact Index Per Article: 2.0] [Received: 01/21/2022] [Accepted: 03/18/2022] [Indexed: 06/14/2023]
Abstract
In silico methods to identify novel drug-target interactions (DTIs) have gained significant importance over conventional techniques, which are labor-intensive and low-throughput. Here, we present a machine learning-based multiclass classification workflow that classifies drug-target pairs as active, inactive, or intermediate. Drug molecules, protein sequences, and molecular descriptors were transformed into machine-interpretable embeddings to extract critical features from standard datasets. Tools such as the CHEMBL web resource, iFeature, and an in-house developed deep neural network-assisted drug recommendation (dNNDR)-featx were employed for data retrieval and processing. The models were trained with large-scale DTI datasets and showed an improvement in performance over baseline methods. External validation results showed that models based on att-biLSTM and gCNN could help predict novel DTIs. When tested with a completely different dataset, the proposed models significantly outperformed competing methods. The validity of novel interactions predicted by dNNDR was backed by experimental and computational evidence in the literature. The proposed methodology could elucidate critical features that govern the relationship between a drug and its target.
Affiliation(s)
- Yogesh Kalakoti
  - DAILAB, Department of Biochemical Engineering & Biotechnology, Indian Institute of Technology (IIT) Delhi, New Delhi 110 016, India
- Shashank Yadav
  - DAILAB, Department of Biochemical Engineering & Biotechnology, Indian Institute of Technology (IIT) Delhi, New Delhi 110 016, India
- Durai Sundar
  - DAILAB, Department of Biochemical Engineering & Biotechnology, Indian Institute of Technology (IIT) Delhi, New Delhi 110 016, India
  - School of Artificial Intelligence, Indian Institute of Technology (IIT) Delhi, New Delhi 110 016, India
46
Amiri Souri E, Laddach R, Karagiannis SN, Papageorgiou LG, Tsoka S. Novel drug-target interactions via link prediction and network embedding. BMC Bioinformatics 2022; 23:121. [PMID: 35379165 PMCID: PMC8978405 DOI: 10.1186/s12859-022-04650-w] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/29/2021] [Accepted: 03/17/2022] [Indexed: 11/10/2022] Open
Abstract
BACKGROUND: As many interactions between the chemical and genomic space remain undiscovered, computational methods able to identify potential drug-target interactions (DTIs) are employed to accelerate drug discovery and reduce the required cost. Predicting new DTIs can leverage drug repurposing by identifying new targets for approved drugs. However, developing an accurate computational framework that can efficiently incorporate chemical and genomic spaces remains extremely demanding. A key issue is that most DTI predictions suffer from the lack of experimentally validated negative interactions or limited availability of target 3D structures. RESULTS: We report DT2Vec, a pipeline for DTI prediction based on graph embedding and gradient boosted tree classification. It maps drug-drug and protein-protein similarity networks to low-dimensional features and the DTI prediction is formulated as binary classification based on a strategy of concatenating the drug and target embedding vectors as input features. DT2Vec was compared with three top-performing graph similarity-based algorithms on a standard benchmark dataset and achieved competitive results. In order to explore credible novel DTIs, the model was applied to data from the ChEMBL repository that contain experimentally validated positive and negative interactions which yield a strong predictive model. Then, the developed model was applied to all possible unknown DTIs to predict new interactions. The applicability of DT2Vec as an effective method for drug repurposing is discussed through case studies and evaluation of some novel DTI predictions is undertaken using molecular docking. CONCLUSIONS: The proposed method was able to integrate and map chemical and genomic space into low-dimensional dense vectors and showed promising results in predicting novel DTIs.
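The input construction described in the abstract, concatenating a drug embedding with a target embedding to form one feature vector per drug-target pair, can be sketched as follows. This is a minimal illustration and not the DT2Vec code: the embedding values, the identifiers, and the helper name `pair_features` are hypothetical, and both the graph-embedding step and the gradient boosted tree classifier are omitted.

```python
from typing import Dict, List, Tuple

Vector = List[float]


def pair_features(
    drug_emb: Dict[str, Vector],
    target_emb: Dict[str, Vector],
    pairs: List[Tuple[str, str, int]],
) -> Tuple[List[Vector], List[int]]:
    """Build (X, y): each row of X is a drug vector followed by a target vector."""
    X: List[Vector] = []
    y: List[int] = []
    for drug_id, target_id, label in pairs:
        # List concatenation joins the two embeddings into one input row.
        X.append(drug_emb[drug_id] + target_emb[target_id])
        y.append(label)
    return X, y


# Hypothetical 2-dimensional embeddings and labelled pairs
# (1 = interaction, 0 = validated non-interaction).
drug_emb = {"drug_1": [0.1, 0.7], "drug_2": [0.9, 0.3]}
target_emb = {"target_1": [0.5, 0.5], "target_2": [0.2, 0.8]}
pairs = [("drug_1", "target_1", 1), ("drug_2", "target_2", 0)]

X, y = pair_features(drug_emb, target_emb, pairs)
```

The resulting `X` and `y` have the shape a binary classifier expects, so any classifier that accepts a feature matrix and a label vector could be trained on them.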
Affiliation(s)
- E Amiri Souri
  - Department of Informatics, Faculty of Natural, Mathematical and Engineering Sciences, King's College London, Bush House, London, WC2B 4BG, UK
- R Laddach
  - Department of Informatics, Faculty of Natural, Mathematical and Engineering Sciences, King's College London, Bush House, London, WC2B 4BG, UK
  - St. John's Institute of Dermatology, School of Basic and Medical Biosciences, King's College London, Guy's Hospital, London, SE1 9RT, UK
- S N Karagiannis
  - St. John's Institute of Dermatology, School of Basic and Medical Biosciences, King's College London, Guy's Hospital, London, SE1 9RT, UK
  - Breast Cancer Now Research Unit, School of Cancer and Pharmaceutical Sciences, King's College London, Guy's Cancer Centre, London, SE1 9RT, UK
- L G Papageorgiou
  - Centre for Process Systems Engineering, Department of Chemical Engineering, University College London, Torrington Place, London, WC1E 7JE, UK
- S Tsoka
  - Department of Informatics, Faculty of Natural, Mathematical and Engineering Sciences, King's College London, Bush House, London, WC2B 4BG, UK
47
Zhong F, Wu X, Yang R, Li X, Wang D, Fu Z, Liu X, Wan X, Yang T, Fan Z, Zhang Y, Luo X, Chen K, Zhang S, Jiang H, Zheng M. Drug target inference by mining transcriptional data using a novel graph convolutional network framework. Protein Cell 2022; 13:281-301. [PMID: 34677780 PMCID: PMC8532448 DOI: 10.1007/s13238-021-00885-0] [Citation(s) in RCA: 9] [Impact Index Per Article: 4.5] [Received: 08/04/2021] [Accepted: 09/08/2021] [Indexed: 12/14/2022] Open
Abstract
A fundamental challenge that arises in biomedicine is the need to characterize compounds in a relevant cellular context in order to reveal potential on-target or off-target effects. Recently, the fast accumulation of gene transcriptional profiling data has provided an unprecedented opportunity to explore the protein targets of chemical compounds from the perspective of cell transcriptomics and RNA biology. Here, we propose a novel Siamese spectral-based graph convolutional network (SSGCN) model for inferring the protein targets of chemical compounds from gene transcriptional profiles. Although the gene signature of a compound perturbation only provides indirect clues of the interacting targets, and the biological networks under different experiment conditions further complicate the situation, the SSGCN model was successfully trained to learn from known compound-target pairs by uncovering the hidden correlations between compound perturbation profiles and gene knockdown profiles. On a benchmark set and a large time-split validation dataset, the model achieved higher target inference accuracy than previous methods such as Connectivity Map. Further experimental validations of prediction results highlight the practical usefulness of SSGCN either in inferring the interacting targets of a compound or, conversely, in finding novel inhibitors of a given target of interest.
Affiliation(s)
- Feisheng Zhong
  - Drug Discovery and Design Center, State Key Laboratory of Drug Research, Shanghai Institute of Materia Medica, Chinese Academy of Sciences, Shanghai, 201203, China
  - University of Chinese Academy of Sciences, Beijing, 100049, China
- Xiaolong Wu
  - Drug Discovery and Design Center, State Key Laboratory of Drug Research, Shanghai Institute of Materia Medica, Chinese Academy of Sciences, Shanghai, 201203, China
  - School of Pharmacy, East China University of Science and Technology, Shanghai, 200237, China
- Ruirui Yang
  - Drug Discovery and Design Center, State Key Laboratory of Drug Research, Shanghai Institute of Materia Medica, Chinese Academy of Sciences, Shanghai, 201203, China
  - University of Chinese Academy of Sciences, Beijing, 100049, China
  - Shanghai Institute for Advanced Immunochemical Studies, and School of Life Science and Technology, ShanghaiTech University, Shanghai, 200031, China
- Xutong Li
  - Drug Discovery and Design Center, State Key Laboratory of Drug Research, Shanghai Institute of Materia Medica, Chinese Academy of Sciences, Shanghai, 201203, China
  - University of Chinese Academy of Sciences, Beijing, 100049, China
- Dingyan Wang
  - Drug Discovery and Design Center, State Key Laboratory of Drug Research, Shanghai Institute of Materia Medica, Chinese Academy of Sciences, Shanghai, 201203, China
  - University of Chinese Academy of Sciences, Beijing, 100049, China
- Zunyun Fu
  - Drug Discovery and Design Center, State Key Laboratory of Drug Research, Shanghai Institute of Materia Medica, Chinese Academy of Sciences, Shanghai, 201203, China
  - Nanjing University of Chinese Medicine, Nanjing, 210023, China
- Xiaohong Liu
  - Drug Discovery and Design Center, State Key Laboratory of Drug Research, Shanghai Institute of Materia Medica, Chinese Academy of Sciences, Shanghai, 201203, China
  - Shanghai Institute for Advanced Immunochemical Studies, and School of Life Science and Technology, ShanghaiTech University, Shanghai, 200031, China
- XiaoZhe Wan
  - Drug Discovery and Design Center, State Key Laboratory of Drug Research, Shanghai Institute of Materia Medica, Chinese Academy of Sciences, Shanghai, 201203, China
  - University of Chinese Academy of Sciences, Beijing, 100049, China
- Tianbiao Yang
  - Drug Discovery and Design Center, State Key Laboratory of Drug Research, Shanghai Institute of Materia Medica, Chinese Academy of Sciences, Shanghai, 201203, China
  - University of Chinese Academy of Sciences, Beijing, 100049, China
- Zisheng Fan
  - Drug Discovery and Design Center, State Key Laboratory of Drug Research, Shanghai Institute of Materia Medica, Chinese Academy of Sciences, Shanghai, 201203, China
  - Nanjing University of Chinese Medicine, Nanjing, 210023, China
- Yinghui Zhang
  - Drug Discovery and Design Center, State Key Laboratory of Drug Research, Shanghai Institute of Materia Medica, Chinese Academy of Sciences, Shanghai, 201203, China
  - University of Chinese Academy of Sciences, Beijing, 100049, China
- Xiaomin Luo
  - Drug Discovery and Design Center, State Key Laboratory of Drug Research, Shanghai Institute of Materia Medica, Chinese Academy of Sciences, Shanghai, 201203, China
  - University of Chinese Academy of Sciences, Beijing, 100049, China
- Kaixian Chen
  - Drug Discovery and Design Center, State Key Laboratory of Drug Research, Shanghai Institute of Materia Medica, Chinese Academy of Sciences, Shanghai, 201203, China
  - University of Chinese Academy of Sciences, Beijing, 100049, China
- Sulin Zhang
  - Drug Discovery and Design Center, State Key Laboratory of Drug Research, Shanghai Institute of Materia Medica, Chinese Academy of Sciences, Shanghai, 201203, China
  - University of Chinese Academy of Sciences, Beijing, 100049, China
- Hualiang Jiang
  - Drug Discovery and Design Center, State Key Laboratory of Drug Research, Shanghai Institute of Materia Medica, Chinese Academy of Sciences, Shanghai, 201203, China
  - University of Chinese Academy of Sciences, Beijing, 100049, China
  - School of Pharmacy, East China University of Science and Technology, Shanghai, 200237, China
  - Shanghai Institute for Advanced Immunochemical Studies, and School of Life Science and Technology, ShanghaiTech University, Shanghai, 200031, China
- Mingyue Zheng
  - Drug Discovery and Design Center, State Key Laboratory of Drug Research, Shanghai Institute of Materia Medica, Chinese Academy of Sciences, Shanghai, 201203, China
  - University of Chinese Academy of Sciences, Beijing, 100049, China
  - Nanjing University of Chinese Medicine, Nanjing, 210023, China
48
Prediction of the Neurotoxic Potential of Chemicals Based on Modelling of Molecular Initiating Events Upstream of the Adverse Outcome Pathways of (Developmental) Neurotoxicity. Int J Mol Sci 2022; 23:ijms23063053. [PMID: 35328472 PMCID: PMC8954925 DOI: 10.3390/ijms23063053] [Citation(s) in RCA: 7] [Impact Index Per Article: 3.5] [Received: 02/08/2022] [Revised: 03/07/2022] [Accepted: 03/08/2022] [Indexed: 12/23/2022] Open
Abstract
Developmental and adult/ageing neurotoxicity is an area needing alternative methods for chemical risk assessment. The formulation of a strategy to screen large numbers of chemicals is highly relevant due to potential exposure to compounds that may have long-term adverse health consequences on the nervous system, leading to neurodegeneration. Adverse Outcome Pathways (AOPs) provide information on relevant molecular initiating events (MIEs) and key events (KEs) that could inform the development of computational alternatives for these complex effects. We propose a screening method integrating multiple Quantitative Structure–Activity Relationship (QSAR) models. The MIEs of existing AOP networks of developmental and adult/ageing neurotoxicity were modelled to predict neurotoxicity. Random Forests were used to model each MIE. Predictions returned by single models were integrated and evaluated for their capability to predict neurotoxicity. Specifically, MIE predictions were used within various types of classifiers and compared with other reference standards (chemical descriptors and structural fingerprints) to benchmark their predictive capability. Overall, classifiers based on MIE predictions returned predictive performances comparable to those based on chemical descriptors and structural fingerprints. The integrated computational approach described here will be beneficial for large-scale screening and prioritisation of chemicals as a function of their potential to cause long-term neurotoxic effects.
|
49
|
Broccatelli F, Trager R, Reutlinger M, Karypis G, Li M. Benchmarking Accuracy and Generalizability of Four Graph Neural Networks Using Large In Vitro ADME Datasets from Different Chemical Spaces. Mol Inform 2022; 41:e2100321. [DOI: 10.1002/minf.202100321] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 11/28/2021] [Accepted: 02/13/2022] [Indexed: 11/09/2022]
|
50
|
Lee I, Nam H. Sequence-based prediction of protein binding regions and drug-target interactions. J Cheminform 2022; 14:5. [PMID: 35135622 PMCID: PMC8822694 DOI: 10.1186/s13321-022-00584-w] [Citation(s) in RCA: 17] [Impact Index Per Article: 8.5] [Received: 07/04/2021] [Accepted: 01/20/2022] [Indexed: 12/19/2022] Open
Abstract
Identifying drug-target interactions (DTIs) is important for drug discovery. However, searching all drug-target spaces poses a major bottleneck. Many deep learning models have recently been proposed to address this problem. However, the developers of these deep learning models have neglected interpretability in model construction, which is closely related to a model's performance. We hypothesized that training a model to predict important regions on a protein sequence would increase DTI prediction performance and provide a more interpretable model. Consequently, we constructed a deep learning model, named Highlights on Target Sequences (HoTS), which predicts binding regions (BRs) between a protein sequence and a drug ligand, as well as DTIs between them. To train the model, we collected complexes of protein-ligand interactions and protein sequences of binding sites and pretrained the model to predict BRs for a given protein sequence-ligand pair via object detection employing transformers. After pretraining on BR prediction, we trained the model to predict DTIs from a compound token designed to assign attention to BRs. We confirmed that training the BR prediction model indeed improved the DTI prediction performance. The proposed HoTS model showed good performance in BR prediction on independent test datasets even though it does not use 3D structure information in its prediction. Furthermore, the HoTS model achieved the best performance in DTI prediction on test datasets. Additional analysis confirmed the appropriate attention for BRs and the importance of transformers in BR and DTI prediction. The source code is available on GitHub (https://github.com/GIST-CSBL/HoTS).
Affiliation(s)
- Ingoo Lee
- School of Electrical Engineering and Computer Science, Gwangju Institute of Science and Technology, 123 Cheomdangwagi-ro, Buk-ku, Gwangju, 61005 Republic of Korea
| | - Hojung Nam
- School of Electrical Engineering and Computer Science, Gwangju Institute of Science and Technology, 123 Cheomdangwagi-ro, Buk-ku, Gwangju, 61005 Republic of Korea
| |
|