1
|
Shi W, Yang H, Xie L, Yin XX, Zhang Y. A review of machine learning-based methods for predicting drug-target interactions. Health Inf Sci Syst 2024; 12:30. [PMID: 38617016 PMCID: PMC11014838 DOI: 10.1007/s13755-024-00287-6] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/11/2023] [Accepted: 03/04/2024] [Indexed: 04/16/2024] Open
Abstract
The prediction of drug-target interactions (DTI) is a crucial preliminary stage in drug discovery and development, given the substantial risk of failure and the prolonged validation period associated with in vitro and in vivo experiments. In the contemporary landscape, various machine learning-based methods have emerged as indispensable tools for DTI prediction. This paper begins by placing emphasis on the data representation employed by these methods, delineating five representations for drugs and four for proteins. The methods are then categorized into traditional machine learning-based approaches and deep learning-based ones, with a discussion of representative approaches in each category and the introduction of a novel taxonomy for deep neural network models in DTI prediction. Additionally, we present a synthesis of commonly used datasets and evaluation metrics to facilitate practical implementation. In conclusion, we address current challenges and outline potential future directions in this research field.
Affiliation(s)
- Wen Shi
- Cyberspace Institute of Advanced Technology, Guangzhou University, Guangzhou, 510006 China
- School of Computer Science and Technology, Zhejiang Normal University, Jinhua, 321004 China
| | - Hong Yang
- Cyberspace Institute of Advanced Technology, Guangzhou University, Guangzhou, 510006 China
| | - Linhai Xie
- State Key Laboratory of Proteomics, National Center for Protein Sciences (Beijing), Beijing, 102206 China
| | - Xiao-Xia Yin
- Cyberspace Institute of Advanced Technology, Guangzhou University, Guangzhou, 510006 China
| | - Yanchun Zhang
- School of Computer Science and Technology, Zhejiang Normal University, Jinhua, 321004 China
- Department of New Networks, Peng Cheng Laboratory, Shenzhen, 518000 China
|
2
|
Amorim AM, Piochi LF, Gaspar AT, Preto A, Rosário-Ferreira N, Moreira IS. Advancing Drug Safety in Drug Development: Bridging Computational Predictions for Enhanced Toxicity Prediction. Chem Res Toxicol 2024; 37:827-849. [PMID: 38758610 PMCID: PMC11187637 DOI: 10.1021/acs.chemrestox.3c00352] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/06/2023] [Revised: 04/29/2024] [Accepted: 05/07/2024] [Indexed: 05/19/2024]
Abstract
The attrition rate of drugs in clinical trials is generally quite high, with estimates suggesting that approximately 90% of drugs fail to make it through the process. The identification of unexpected toxicity issues during preclinical stages is a significant factor contributing to this high rate of failure. These issues can have a major impact on the success of a drug and must be carefully considered throughout the development process. These late-stage rejections or withdrawals of drug candidates significantly increase the costs associated with drug development, particularly when toxicity is detected during clinical trials or after market release. Understanding drug-biological target interactions is essential for evaluating compound toxicity and safety, as well as predicting therapeutic effects and potential off-target effects that could lead to toxicity. This will enable scientists to predict and assess the safety profiles of drug candidates more accurately. Evaluation of toxicity and safety is a critical aspect of drug development, and biomolecules, particularly proteins, play vital roles in complex biological networks and often serve as targets for various chemicals. Therefore, a better understanding of these interactions is crucial for the advancement of drug development. The development of computational methods for evaluating protein-ligand interactions and predicting toxicity is emerging as a promising approach that adheres to the 3Rs principles (replace, reduce, and refine) and has garnered significant attention in recent years. In this review, we present a thorough examination of the latest breakthroughs in drug toxicity prediction, highlighting the significance of drug-target binding affinity in anticipating and mitigating possible adverse effects. In doing so, we aim to contribute to the development of more effective and secure drugs.
Affiliation(s)
- Ana M. B. Amorim
- Department of Life Sciences, University of Coimbra, Calçada Martim de Freitas, 3000-456 Coimbra, Portugal
- CNC-UC—Center for Neuroscience and Cell Biology, University of Coimbra, Calçada Martim de Freitas, 3000-456 Coimbra, Portugal
- CIBB—Centre for Innovative Biomedicine and Biotechnology, University of Coimbra, Calçada Martim de Freitas, 3000-456 Coimbra, Portugal
- PhD Programme in Biosciences, Department of Life Sciences, University of Coimbra, Calçada Martim de Freitas, 3000-456 Coimbra, Portugal
- PURR.AI, Rua Pedro Nunes, IPN Incubadora, Ed C, 3030-199 Coimbra, Portugal
| | - Luiz F. Piochi
- Department of Life Sciences, University of Coimbra, Calçada Martim de Freitas, 3000-456 Coimbra, Portugal
- CNC-UC—Center for Neuroscience and Cell Biology, University of Coimbra, Calçada Martim de Freitas, 3000-456 Coimbra, Portugal
- CIBB—Centre for Innovative Biomedicine and Biotechnology, University of Coimbra, Calçada Martim de Freitas, 3000-456 Coimbra, Portugal
| | - Ana T. Gaspar
- Department of Life Sciences, University of Coimbra, Calçada Martim de Freitas, 3000-456 Coimbra, Portugal
- CNC-UC—Center for Neuroscience and Cell Biology, University of Coimbra, Calçada Martim de Freitas, 3000-456 Coimbra, Portugal
- CIBB—Centre for Innovative Biomedicine and Biotechnology, University of Coimbra, Calçada Martim de Freitas, 3000-456 Coimbra, Portugal
| | - António J. Preto
- CNC-UC—Center for Neuroscience and Cell Biology, University of Coimbra, Calçada Martim de Freitas, 3000-456 Coimbra, Portugal
- CIBB—Centre for Innovative Biomedicine and Biotechnology, University of Coimbra, Calçada Martim de Freitas, 3000-456 Coimbra, Portugal
- PhD Programme in Experimental Biology and Biomedicine, Institute for Interdisciplinary Research (IIIUC), University of Coimbra, Casa Costa Alemão, 3030-789 Coimbra, Portugal
| | - Nícia Rosário-Ferreira
- CNC-UC—Center for Neuroscience and Cell Biology, University of Coimbra, Calçada Martim de Freitas, 3000-456 Coimbra, Portugal
- CIBB—Centre for Innovative Biomedicine and Biotechnology, University of Coimbra, Calçada Martim de Freitas, 3000-456 Coimbra, Portugal
| | - Irina S. Moreira
- Department of Life Sciences, University of Coimbra, Calçada Martim de Freitas, 3000-456 Coimbra, Portugal
- CNC-UC—Center for Neuroscience and Cell Biology, University of Coimbra, Calçada Martim de Freitas, 3000-456 Coimbra, Portugal
- CIBB—Centre for Innovative Biomedicine and Biotechnology, University of Coimbra, Calçada Martim de Freitas, 3000-456 Coimbra, Portugal
|
3
|
Khan MF, Ali A, Rehman HM, Noor Khan S, Hammad HM, Waseem M, Wu Y, Clark TG, Jabbar A. Exploring optimal drug targets through subtractive proteomics analysis and pangenomic insights for tailored drug design in tuberculosis. Sci Rep 2024; 14:10904. [PMID: 38740859 DOI: 10.1038/s41598-024-61752-6] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/19/2023] [Accepted: 05/09/2024] [Indexed: 05/16/2024] Open
Abstract
Tuberculosis (TB), caused by Mycobacterium tuberculosis, ranks among the top causes of global human mortality, as reported by the World Health Organization's 2022 TB report. The prevalence of multidrug- and extensively drug-resistant M. tuberculosis strains represents a significant barrier to TB eradication. Fortunately, the availability of many completely sequenced M. tuberculosis genomes has made it possible to investigate the species pangenome, conduct a pan-phylogenetic investigation, and find potential new drug targets. A dataset of 442 complete genomes was used to estimate the pangenome of M. tuberculosis. This study involved phylogenomic classification and in-depth analyses. Sequential filters were applied to the conserved core genome containing 2754 proteins. These filters assessed non-human homology, virulence, essentiality, physicochemical properties, and pathway analysis. Through these intensive filtering approaches, promising broad-spectrum therapeutic targets were identified. These targets were docked with FDA-approved compounds readily available in the ZINC database. Selected highly ranked ligands with inhibitory potential include dihydroergotamine and abiraterone acetate. The effectiveness of the ligands has been supported by molecular dynamics simulation of the ligand-protein complexes, instilling optimism that the identified lead compounds may serve as a robust basis for the development of safe and efficient drugs for TB treatment, subject to further lead optimization and subsequent experimental validation.
Affiliation(s)
- Muhammad Fayaz Khan
- Department of Medical Laboratory Technology, The University of Haripur, Haripur, KP, Pakistan
| | - Amjad Ali
- Atta-ur-Rahman School of Applied Biosciences, National University of Sciences and Technology, Islamabad, Pakistan
| | - Hafiz Muzzammel Rehman
- School of Biochemistry and Biotechnology, University of the Punjab, Lahore, Punjab, Pakistan
| | - Sadiq Noor Khan
- Department of Medical Laboratory Technology, The University of Haripur, Haripur, KP, Pakistan
| | - Hafiz Muhammad Hammad
- School of Biochemistry and Biotechnology, University of the Punjab, Lahore, Punjab, Pakistan
| | - Maaz Waseem
- Atta-ur-Rahman School of Applied Biosciences, National University of Sciences and Technology, Islamabad, Pakistan
| | - Yurong Wu
- Department of Chemistry, The Hong Kong University of Science and Technology, Kowloon, Hong Kong, China
| | - Taane G Clark
- London School of Hygiene and Tropical Medicine, Keppel Street, London, UK.
| | - Abdul Jabbar
- Department of Medical Laboratory Technology, The University of Haripur, Haripur, KP, Pakistan.
|
4
|
Yi C, Taylor ML, Ziebarth J, Wang Y. Predictive Models and Impact of Interfacial Contacts and Amino Acids on Protein-Protein Binding Affinity. ACS Omega 2024; 9:3454-3468. [PMID: 38284090 PMCID: PMC10809705 DOI: 10.1021/acsomega.3c06996] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/13/2023] [Revised: 12/11/2023] [Accepted: 12/14/2023] [Indexed: 01/30/2024]
Abstract
Protein-protein interactions (PPIs) play a central role in nearly all cellular processes. The strength of the binding in a PPI is characterized by the binding affinity (BA) and is a key factor in controlling protein-protein complex formation and defining the structure-function relationship. Despite advancements in understanding protein-protein binding, much remains unknown about the interfacial region and its association with BA. New models are needed to predict BA with improved accuracy for therapeutic design. Here, we use machine learning approaches to examine how well different types of interfacial contacts can be used to predict experimentally determined BA and to reveal the impact of the specific amino acids at the binding interface on BA. We create a series of multivariate linear regression models incorporating different contact features at both residue and atomic levels and examine how different methods of identifying and characterizing these properties impact the performance of these models. Particularly, we introduce a new and simple approach to predict BA based on the quantities of specific amino acids at the protein-protein interface. We found that the numbers of specific amino acids at the protein-protein interface were correlated with BA. We show that the interfacial numbers of amino acids can be used to produce models with consistently good performance across different data sets, indicating the importance of the identities of interfacial amino acids in underlying BA. When trained on a diverse set of complexes from two benchmark data sets, the best performing BA model was generated with an explicit linear equation involving six amino acids. Tyrosine, in particular, was identified as the key amino acid in controlling BA, as it had the strongest correlation with BA and was consistently identified as the most important amino acid in feature importance studies. Glycine and serine were identified as the next two most important amino acids in predicting BA. 
The results from this study further our understanding of PPIs and can be used to make improved predictions of BA, giving them implications for drug design and screening in the pharmaceutical industry.
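The regression idea described in this abstract can be sketched as follows. This is only an illustration under stated assumptions, not the authors' model: the residue counts, the three chosen amino acids, the coefficients, and the helper name `fit_linear_model` are all hypothetical, and the "affinity" values are synthetic (generated from a known linear rule so the fit can be checked).

```python
import numpy as np

# Hypothetical interface residue counts per complex (columns: TYR, GLY, SER).
# These numbers are invented for illustration only.
AMINO_ACIDS = ["TYR", "GLY", "SER"]
counts = np.array([
    [4, 2, 3],
    [1, 5, 2],
    [6, 1, 4],
    [2, 3, 1],
    [5, 2, 2],
], dtype=float)

# Synthetic binding-affinity values produced by a known linear rule
# (assumed coefficients, not taken from the paper).
true_coef = np.array([-1.0, -0.5, -0.25])
true_intercept = -3.0
affinity = counts @ true_coef + true_intercept


def fit_linear_model(X, y):
    """Ordinary least squares with an intercept; returns (coef, intercept)."""
    design = np.column_stack([X, np.ones(len(X))])
    solution, *_ = np.linalg.lstsq(design, y, rcond=None)
    return solution[:-1], solution[-1]


coef, intercept = fit_linear_model(counts, affinity)
for name, value in zip(AMINO_ACIDS, coef):
    print(f"{name}: {value:+.3f}")
print(f"intercept: {intercept:+.3f}")
```

Because the synthetic data are noise-free, the fit recovers the assumed coefficients; with real experimental affinities one would also report a held-out error and the correlation between predicted and measured values.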
Affiliation(s)
- Carey Huang Yi
- Department of Chemistry, The University of Memphis, Memphis, Tennessee 38152, United States
| | - Mitchell Lee Taylor
- Department of Chemistry, The University of Memphis, Memphis, Tennessee 38152, United States
| | - Jesse Ziebarth
- Department of Chemistry, The University of Memphis, Memphis, Tennessee 38152, United States
| | - Yongmei Wang
- Department of Chemistry, The University of Memphis, Memphis, Tennessee 38152, United States
|
5
|
Qiu W, Liang Q, Yu L, Xiao X, Qiu W, Lin W. LSTM-SAGDTA: Predicting Drug-target Binding Affinity with an Attention Graph Neural Network and LSTM Approach. Curr Pharm Des 2024; 30:468-476. [PMID: 38323613 PMCID: PMC11071654 DOI: 10.2174/0113816128282837240130102817] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/18/2023] [Revised: 01/14/2024] [Accepted: 01/19/2024] [Indexed: 02/08/2024]
Abstract
INTRODUCTION Drug development is a challenging and costly process, yet it plays a crucial role in improving healthcare outcomes. Drug development requires extensive research and testing to meet the demands for economic efficiency, cures, and pain relief. METHODS Drug development is a vital research area that necessitates innovation and collaboration to achieve significant breakthroughs. Computer-aided drug design provides a promising avenue for drug discovery and development by reducing costs and improving the efficiency of drug design and testing. RESULTS In this study, a novel model, namely LSTM-SAGDTA, capable of accurately predicting drug-target binding affinity, was developed. We employed SeqVec to characterize the protein and used graph neural networks to capture information on drug molecules. By introducing self-attentive graph pooling, the model achieved greater accuracy and efficiency in predicting drug-target binding affinity. CONCLUSION Moreover, LSTM-SAGDTA obtained superior accuracy over current state-of-the-art methods while using less training time. The experimental results suggest that this method represents a high-precision solution for DTA prediction.
Affiliation(s)
- Wenjing Qiu
- School of Information Engineering, Jingdezhen Ceramic University, Jingdezhen 333000, China
| | - Qianle Liang
- School of Information Engineering, Jingdezhen Ceramic University, Jingdezhen 333000, China
| | - Liyi Yu
- School of Information Engineering, Jingdezhen Ceramic University, Jingdezhen 333000, China
| | - Xuan Xiao
- School of Information Engineering, Jingdezhen Ceramic University, Jingdezhen 333000, China
| | - Wangren Qiu
- School of Information Engineering, Jingdezhen Ceramic University, Jingdezhen 333000, China
| | - Weizhong Lin
- School of Information Engineering, Jingdezhen Ceramic University, Jingdezhen 333000, China
|
6
|
Guendouzi A, Belkhiri L, Guendouzi A, Derouiche TMT, Djekoun A. A combined in silico approaches of 2D-QSAR, molecular docking, molecular dynamics and ADMET prediction of anti-cancer inhibitor activity for actinonin derivatives. J Biomol Struct Dyn 2024; 42:119-133. [PMID: 36995063 DOI: 10.1080/07391102.2023.2192801] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/20/2023] [Accepted: 03/10/2023] [Indexed: 03/31/2023]
Abstract
Inhibition of human mitochondrial peptide deformylase (HsPDF) plays a major role in reducing the growth, proliferation, and survival of cancer cells. In this work, the anticancer activity of a series of 32 actinonin derivatives as HsPDF (PDB: 3G5K) inhibitors was computationally analyzed for the first time, using an in silico study that combined 2D-QSAR modeling and molecular docking and was validated by molecular dynamics and ADMET properties. The results of multilinear regression (MLR) and artificial neural network (ANN) statistical analysis reveal a good correlation between pIC50 activity and the seven (7) descriptors. The developed models were highly significant according to cross-validation, the Y-randomization test, and their applicability range. In addition, all considered data sets show that the AC30 compound exhibits the best binding affinity (docking score = -212.074 kcal/mol and H-bonding energy = -15.879 kcal/mol). Furthermore, molecular dynamics simulations were performed for 500 ns, confirming the stability of the studied complexes under physiological conditions and validating the molecular docking results. Five selected actinonin derivatives (AC1, AC8, AC15, AC18 and AC30), exhibiting the best docking scores, were rationalized as potential leads for HsPDF inhibition, in good agreement with experimental outcomes. Furthermore, based on the in silico study, six new molecules (AC32, AC33, AC34, AC35, AC36 and AC37) were suggested as HsPDF inhibition candidates, to be combined with in vitro and in vivo studies for prospective validation of their anticancer activity. Indeed, the ADMET predictions indicate that these six new ligands have a fairly good drug-likeness profile.
Affiliation(s)
| | - Lotfi Belkhiri
- Centre de Recherche en Sciences Pharmaceutiques CRSP, Constantine, Algeria
- Laboratoire de Physique Mathématique et Subatomique LPMS, Département de Chimie, Université des Frères Mentouri, Constantine, Algeria
| | - Abdelkrim Guendouzi
- Laboratoire de Chimie, Synthèse, Propriétés et Applications LCSPA, Département de Chimie, Faculté des Sciences, Université Dr Moulay Tahar de Saida, Saïda, Algeria
| | - Tahar Mohamed Taha Derouiche
- Centre de Recherche en Sciences Pharmaceutiques CRSP, Constantine, Algeria
- Laboratoire Innovation Développement des Actifs Pharmaceutiques LIDAP, Faculté de Médecine, Département Pharmacie, Université Salah Boubnider Constantine 3, El Khroub, Algeria
| | - Abdelhamid Djekoun
- Centre de Recherche en Sciences Pharmaceutiques CRSP, Constantine, Algeria
|
7
|
Nolte TM. 300-fold higher neuro- and immunotoxicity from low-redox transformation of carbamazepine. Toxicol Rep 2023; 11:319-329. [PMID: 37927955 PMCID: PMC10622881 DOI: 10.1016/j.toxrep.2023.09.013] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/27/2023] [Revised: 09/19/2023] [Accepted: 09/20/2023] [Indexed: 11/07/2023] Open
Abstract
Current challenges in (eco)toxicology are in understanding the transformation of (reactive) substances, and how transformation affects toxic modes of action. Empirical assessment, via experimentation, of the transformation products of a practically infinite number of substances is impossible. Predicting transformation products for (benchmarking) compounds from conditions facilitates risk analyses. This study applied calculus to predict transformation products of an important environmental and medicinal/toxicological marker, carbamazepine. As radicals are ubiquitous in humans and the environment, we looked into radical-mediated transformations of carbamazepine as a benchmark. We calculated proportions of their speciation states as a function of redox conditions, which we took as pH and O2 concentration, describing transformation via covalent and ionic interactions. Formation of ring-contracted products with neuro-immunological activity is thermodynamically favored under anaerobic conditions and at low pH. Experimentally observed product distributions and toxicities reflect that pattern. Our predictive method may support toxicity predictions for other substances and conditions 'similar' to the current case study via interpolation. This paves the way for a more coherent, effective and easier risk assessment of transformation products.
Affiliation(s)
- Tom M. Nolte
- Department of Environmental Science, Institute for Water and Wetland Research, Radboud University Nijmegen, 6500 GL Nijmegen, the Netherlands
- Eidgenössische Technische Hochschule (ETH) Zurich, Laboratory of Inorganic Chemistry, Vladimir-Prelog-Weg 1, 8093 Zurich, Switzerland
|
8
|
Wang Y, Xia Y, Yan J, Yuan Y, Shen HB, Pan X. ZeroBind: a protein-specific zero-shot predictor with subgraph matching for drug-target interactions. Nat Commun 2023; 14:7861. [PMID: 38030641 PMCID: PMC10687269 DOI: 10.1038/s41467-023-43597-1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/15/2023] [Accepted: 11/13/2023] [Indexed: 12/01/2023] Open
Abstract
Existing drug-target interaction (DTI) prediction methods generally fail to generalize well to novel (unseen) proteins and drugs. In this study, we propose a protein-specific meta-learning framework ZeroBind with subgraph matching for predicting protein-drug interactions from their structures. During the meta-training process, ZeroBind formulates training a protein-specific model, which is also considered a learning task, and each task uses graph neural networks (GNNs) to learn the protein graph embedding and the molecular graph embedding. Inspired by the fact that molecules bind to a binding pocket in proteins instead of the whole protein, ZeroBind introduces a weakly supervised subgraph information bottleneck (SIB) module to recognize the maximally informative and compressive subgraphs in protein graphs as potential binding pockets. In addition, ZeroBind trains the models of individual proteins as multiple tasks, whose importance is automatically learned with a task adaptive self-attention module to make final predictions. The results show that ZeroBind achieves superior performance on DTI prediction over existing methods, especially for those unseen proteins and drugs, and performs well after fine-tuning for those proteins or drugs with a few known binding partners.
Affiliation(s)
- Yuxuan Wang
- Institute of Image Processing and Pattern Recognition, Shanghai Jiao Tong University, and Key Laboratory of System Control and Information Processing, Ministry of Education of China, Shanghai, 200240, China
| | - Ying Xia
- Institute of Image Processing and Pattern Recognition, Shanghai Jiao Tong University, and Key Laboratory of System Control and Information Processing, Ministry of Education of China, Shanghai, 200240, China
| | - Junchi Yan
- Department of Computer Science and Engineering, and MoE Key Lab of Artificial Intelligence, AI Institute, Shanghai Jiao Tong University, Shanghai, 200240, China
| | - Ye Yuan
- Institute of Image Processing and Pattern Recognition, Shanghai Jiao Tong University, and Key Laboratory of System Control and Information Processing, Ministry of Education of China, Shanghai, 200240, China
| | - Hong-Bin Shen
- Institute of Image Processing and Pattern Recognition, Shanghai Jiao Tong University, and Key Laboratory of System Control and Information Processing, Ministry of Education of China, Shanghai, 200240, China
| | - Xiaoyong Pan
- Institute of Image Processing and Pattern Recognition, Shanghai Jiao Tong University, and Key Laboratory of System Control and Information Processing, Ministry of Education of China, Shanghai, 200240, China.
|
9
|
Barsbey M, Özçelik R, Bağ A, Atil B, Özgür A, Ozkirimli E. A Computational Software for Training Robust Drug-Target Affinity Prediction Models: pydebiaseddta. J Comput Biol 2023; 30:1240-1245. [PMID: 37988394 DOI: 10.1089/cmb.2023.0194] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/23/2023] Open
Abstract
Robust generalization of drug-target affinity (DTA) prediction models is a notoriously difficult problem in computational drug discovery. In this article, we present pydebiaseddta: a computational software for improving the generalizability of DTA prediction models to novel ligands and/or proteins. pydebiaseddta serves as the practical implementation of the DebiasedDTA training framework, which advocates modifying the training distribution to mitigate the effect of spurious correlations in the training data set that lead to substantially degraded performance for novel ligands and proteins. Written in the Python programming language, pydebiaseddta combines a user-friendly, streamlined interface with a feature-rich and highly modifiable architecture. With this article, we introduce our software, showcase its main functionalities, and describe practical ways for new users to engage with it.
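One generic way to "modify the training distribution" is to reweight training samples, which the following sketch illustrates. It does not use or reproduce the pydebiaseddta API: the function names, the weighting rule (up-weighting samples that a hypothetical weak "guide" model predicts poorly), and the `strength` parameter are all assumptions made for illustration.

```python
import numpy as np


def importance_weights(guide_errors, strength=1.0):
    """Hypothetical rule: larger guide-model error -> larger training weight.

    Weights are rescaled to have mean 1 so the overall loss scale is unchanged.
    """
    errors = np.asarray(guide_errors, dtype=float)
    scale = errors.max()
    if scale == 0:
        # No sample is harder than another: fall back to uniform weights.
        return np.ones_like(errors)
    weights = 1.0 + strength * errors / scale
    return weights / weights.mean()


def weighted_least_squares(X, y, weights):
    """Fit y ~ X @ coef + intercept, minimizing the weighted squared error."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    design = np.column_stack([X, np.ones(len(X))])
    sqrt_w = np.sqrt(np.asarray(weights, dtype=float))
    # Scaling rows by sqrt(weight) turns weighted least squares into plain OLS.
    solution, *_ = np.linalg.lstsq(design * sqrt_w[:, None], y * sqrt_w, rcond=None)
    return solution  # coefficients followed by the intercept


# Toy usage with invented numbers: three samples and their guide errors.
weights = importance_weights([0.1, 0.5, 1.0])
params = weighted_least_squares([[0.0], [1.0], [2.0]], [1.0, 3.0, 5.0], weights)
```

In a real pipeline the weights would typically be passed to the loss function of the affinity predictor rather than to a linear fit; the linear model here only keeps the sketch self-contained.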
Affiliation(s)
- Melih Barsbey
- Department of Computer Engineering, Boğaziçi University, İstanbul, Turkey
| | - Riza Özçelik
- Department of Computer Engineering, Boğaziçi University, İstanbul, Turkey
| | - Alperen Bağ
- Technical University of Munich, Munich, Germany
| | - Berk Atil
- Department of Computer Engineering, Boğaziçi University, İstanbul, Turkey
| | - Arzucan Özgür
- Department of Computer Engineering, Boğaziçi University, İstanbul, Turkey
| | - Elif Ozkirimli
- Roche Informatics, F. Hoffmann-La Roche AG, Basel, Switzerland
|
10
|
Sufyan M, Shokat Z, Ashfaq UA. Artificial intelligence in cancer diagnosis and therapy: Current status and future perspective. Comput Biol Med 2023; 165:107356. [PMID: 37688994 DOI: 10.1016/j.compbiomed.2023.107356] [Citation(s) in RCA: 3] [Impact Index Per Article: 3.0] [Received: 05/24/2023] [Revised: 07/21/2023] [Accepted: 08/12/2023] [Indexed: 09/11/2023]
Abstract
Artificial intelligence (AI) in healthcare plays a pivotal role in combating many fatal diseases, such as skin, breast, and lung cancer. AI is an advanced form of technology that uses mathematical-based algorithmic principles similar to those of the human mind for cognizing complex challenges of the healthcare unit. Cancer is a lethal disease with many etiologies, including numerous genetic and epigenetic mutations. Cancer being a multifactorial disease is difficult to be diagnosed at an early stage. Therefore, genetic variations and other leading factors could be identified in due time through AI and machine learning (ML). AI is the synergetic approach for mining the drug targets, their mechanism of action, and drug-organism interaction from massive raw data. This synergetic approach is also facing several challenges in data mining but computational algorithms from different scientific communities for multi-target drug discovery are highly helpful to overcome the bottlenecks in AI for drug-target discovery. AI and ML could be the epicenter in the medical world for the diagnosis, treatment, and evaluation of almost any disease in the near future. In this comprehensive review, we explore the immense potential of AI and ML when integrated with the biological sciences, specifically in the context of cancer research. Our goal is to illuminate the many ways in which AI and ML are being applied to the study of cancer, from diagnosis to individualized treatment. We highlight the prospective role of AI in supporting oncologists and other medical professionals in making informed decisions and improving patient outcomes by examining the intersection of AI and cancer control. Although AI-based medical therapies show great potential, many challenges must be overcome before they can be implemented in clinical practice. 
We critically assess the current hurdles and provide insights into the future directions of AI-driven approaches, aiming to pave the way for enhanced cancer interventions and improved patient care.
Affiliation(s)
- Muhammad Sufyan
- Department of Bioinformatics and Biotechnology, Government College University Faisalabad, Pakistan.
| | - Zeeshan Shokat
- Department of Bioinformatics and Biotechnology, Government College University Faisalabad, Pakistan.
| | - Usman Ali Ashfaq
- Department of Bioinformatics and Biotechnology, Government College University Faisalabad, Pakistan.
|
11
|
Suviriyapaisal N, Wichadakul D. iEdgeDTA: integrated edge information and 1D graph convolutional neural networks for binding affinity prediction. RSC Adv 2023; 13:25218-25228. [PMID: 37636509 PMCID: PMC10448119 DOI: 10.1039/d3ra03796g] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/06/2023] [Accepted: 08/14/2023] [Indexed: 08/29/2023] Open
Abstract
Artificial intelligence has become more prevalent in broad fields, including drug discovery, in which the process is costly and time-consuming when conducted through wet experiments. As a result, drug repurposing, which tries to utilize approved and low-risk drugs for a new purpose, becomes more attractive. However, screening candidates from many drugs for specific protein targets is still expensive and tedious. This study aims to leverage computational resources to aid drug discovery by utilizing drug-protein interaction data and estimating their interaction strength, the so-called binding affinity. Our estimation approach addresses multiple challenges encountered in the field. First, we employed a graph-based deep learning technique to overcome the limitations of drug compounds represented in string format by incorporating background knowledge of node and edge information as separate multi-dimensional features. Second, we tackled the complexities associated with extracting the representation and structure of proteins by utilizing a pre-trained model for feature extraction. Also, we employed graph operations over the 1D representation of a protein sequence to overcome the fixed-length problem typically encountered in language model tasks. In addition, we conducted a comparative analysis with a baseline model that creates a protein graph from a contact map prediction model, giving valuable insights into the performance and effectiveness of our proposed method. We evaluated the performance of our model using the same benchmark datasets and a variety of metrics as previous work, and the results show that our model achieved the best prediction results while requiring no contact map information, unlike other graph-based methods.
Affiliation(s)
- Natchanon Suviriyapaisal
- Department of Computer Engineering, Faculty of Engineering, Chulalongkorn University Bangkok 10330 Thailand
| | - Duangdao Wichadakul
- Department of Computer Engineering, Faculty of Engineering, Chulalongkorn University Bangkok 10330 Thailand
- Center of Excellence in Systems Biology, Faculty of Medicine, Chulalongkorn University Bangkok 10330 Thailand
| |
|
12
|
Zhang S, Jin Y, Liu T, Wang Q, Zhang Z, Zhao S, Shan B. SS-GNN: A Simple-Structured Graph Neural Network for Affinity Prediction. ACS OMEGA 2023; 8:22496-22507. [PMID: 37396234 PMCID: PMC10308598 DOI: 10.1021/acsomega.3c00085] [Citation(s) in RCA: 3] [Impact Index Per Article: 3.0] [Received: 02/02/2023] [Accepted: 06/01/2023] [Indexed: 07/04/2023]
Abstract
Efficient and effective drug-target binding affinity (DTBA) prediction is a challenging task due to the limited computational resources in practical applications and is a crucial basis for drug screening. Inspired by the good representation ability of graph neural networks (GNNs), we propose a simple-structured GNN model named SS-GNN to accurately predict DTBA. By constructing a single undirected graph based on a distance threshold to represent protein-ligand interactions, the scale of the graph data is greatly reduced. Moreover, ignoring covalent bonds in the protein further reduces the computational cost of the model. The graph neural network-multilayer perceptron (GNN-MLP) module takes the latent feature extraction of atoms and edges in the graph as two mutually independent processes. We also develop an edge-based atom-pair feature aggregation method to represent complex interactions and a graph pooling-based method to predict the binding affinity of the complex. We achieve state-of-the-art prediction performance using a simple model (with only 0.6 M parameters) without introducing complicated geometric feature descriptions. SS-GNN achieves Pearson's Rp = 0.853 on the PDBbind v2016 core set, outperforming state-of-the-art GNN-based methods by 5.2%. Moreover, the simplified model structure and concise data processing procedure improve the prediction efficiency of the model. For a typical protein-ligand complex, affinity prediction takes only 0.2 ms. All codes are freely accessible at https://github.com/xianyuco/SS-GNN.
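The distance-threshold idea described in this abstract can be illustrated with a minimal sketch. It is not the SS-GNN implementation: the function name `build_interaction_edges`, the 5.0 Å cutoff, and the toy coordinates are all assumptions made for illustration, and only the thresholding step is shown (no atom or edge features, no GNN).

```python
import math

def build_interaction_edges(ligand_coords, protein_coords, cutoff=5.0):
    """Return (ligand_idx, protein_idx, distance) for atom pairs within `cutoff`.

    The cutoff value is a hypothetical choice, not one taken from the paper.
    Protein-protein covalent bonds are deliberately not represented here.
    """
    edges = []
    for i, lig_atom in enumerate(ligand_coords):
        for j, prot_atom in enumerate(protein_coords):
            d = math.dist(lig_atom, prot_atom)  # Euclidean distance (Python 3.8+)
            if d <= cutoff:
                edges.append((i, j, d))
    return edges

# Toy coordinates (made up): two ligand atoms, three protein atoms.
ligand = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0)]
protein = [(3.0, 0.0, 0.0), (10.0, 0.0, 0.0), (0.0, 4.0, 0.0)]

edges = build_interaction_edges(ligand, protein, cutoff=5.0)
for lig_idx, prot_idx, dist in edges:
    print(f"ligand atom {lig_idx} -- protein atom {prot_idx}: {dist:.2f}")
```

Because only pairs inside the cutoff become edges, distant protein atoms (such as the second one above) never enter the graph, which is the source of the reduction in graph size that the abstract mentions.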
Affiliation(s)
- Shuke Zhang
- Software College, Hebei Normal University, Shijiazhuang 050024, China
- Shijiazhuang Xianyu Digital Biotechnology Co., Ltd, Shijiazhuang 050024, China
| | - Yanzhao Jin
- Software College, Hebei Normal University, Shijiazhuang 050024, China
- Shijiazhuang Xianyu Digital Biotechnology Co., Ltd, Shijiazhuang 050024, China
| | - Tianmeng Liu
- Software College, Hebei Normal University, Shijiazhuang 050024, China
- Shijiazhuang Xianyu Digital Biotechnology Co., Ltd, Shijiazhuang 050024, China
| | - Qi Wang
- Software College, Hebei Normal University, Shijiazhuang 050024, China
- Shijiazhuang Xianyu Digital Biotechnology Co., Ltd, Shijiazhuang 050024, China
| | - Zhaohui Zhang
- Software College, Hebei Normal University, Shijiazhuang 050024, China
- College of Computer and Cyber Security, Hebei Normal University, Shijiazhuang 050024, China
| | - Shuliang Zhao
- College of Computer and Cyber Security, Hebei Normal University, Shijiazhuang 050024, China
- Hebei Provincial Key Laboratory of Network and Information Security, Shijiazhuang 050024, China
- Hebei Provincial Engineering Research Center for Supply Chain Big Data Analytics & Data Security, Shijiazhuang 050024, China
| | - Bo Shan
- Software College, Hebei Normal University, Shijiazhuang 050024, China
- Shijiazhuang Xianyu Digital Biotechnology Co., Ltd, Shijiazhuang 050024, China
| |
|
13
|
Yousefi N, Yazdani-Jahromi M, Tayebi A, Kolanthai E, Neal CJ, Banerjee T, Gosai A, Balasubramanian G, Seal S, Ozmen Garibay O. BindingSite-AugmentedDTA: enabling a next-generation pipeline for interpretable prediction models in drug repurposing. Brief Bioinform 2023; 24:7140297. [PMID: 37096593 DOI: 10.1093/bib/bbad136] [Citation(s) in RCA: 4] [Impact Index Per Article: 4.0] [Received: 10/11/2022] [Revised: 03/02/2022] [Accepted: 03/16/2023] [Indexed: 04/26/2023] Open
Abstract
While research into drug-target interaction (DTI) prediction is fairly mature, generalizability and interpretability are not always addressed in the existing works in this field. In this paper, we propose a deep learning (DL)-based framework, called BindingSite-AugmentedDTA, which improves drug-target affinity (DTA) predictions by reducing the search space of potential-binding sites of the protein, thus making the binding affinity prediction more efficient and accurate. Our BindingSite-AugmentedDTA is highly generalizable as it can be integrated with any DL-based regression model, while it significantly improves their prediction performance. Also, unlike many existing models, our model is highly interpretable due to its architecture and self-attention mechanism, which can provide a deeper understanding of its underlying prediction mechanism by mapping attention weights back to protein-binding sites. The computational results confirm that our framework can enhance the prediction performance of seven state-of-the-art DTA prediction algorithms in terms of four widely used evaluation metrics, including concordance index, mean squared error, modified squared correlation coefficient ($r^2_m$) and the area under the precision curve. We also contribute to three benchmark drug-target interaction datasets by including additional information on 3D structure of all proteins contained in those datasets, which include the two most commonly used datasets, namely Kiba and Davis, as well as the data from IDG-DREAM drug-kinase binding prediction challenge. Furthermore, we experimentally validate the practical potential of our proposed framework through in-lab experiments. The relatively high agreement between computationally predicted and experimentally observed binding interactions supports the potential of our framework as the next-generation pipeline for prediction models in drug repurposing.
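Two of the evaluation metrics named in this abstract, the concordance index and the mean squared error, can be sketched in a few lines. This is a generic textbook formulation, not code from the paper; the function names and the toy affinity values are assumptions for illustration.

```python
def concordance_index(y_true, y_pred):
    """Fraction of comparable pairs whose predicted order matches the true order.

    Pairs with equal true values are skipped; tied predictions count as 0.5.
    """
    concordant = 0.0
    comparable = 0
    n = len(y_true)
    for i in range(n):
        for j in range(i + 1, n):
            if y_true[i] == y_true[j]:
                continue  # not comparable
            comparable += 1
            # `high` is the index with the larger true affinity.
            high, low = (i, j) if y_true[i] > y_true[j] else (j, i)
            if y_pred[high] > y_pred[low]:
                concordant += 1.0
            elif y_pred[high] == y_pred[low]:
                concordant += 0.5
    return concordant / comparable if comparable else float("nan")

def mean_squared_error(y_true, y_pred):
    """Average squared difference between true and predicted values."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)

# Toy affinities (made up for illustration).
y_true = [5.0, 6.2, 7.1, 8.4]
y_pred = [5.3, 6.0, 7.5, 7.4]

ci = concordance_index(y_true, y_pred)
mse = mean_squared_error(y_true, y_pred)
print(f"CI = {ci:.4f}, MSE = {mse:.4f}")
```

In this toy case five of the six comparable pairs are ranked correctly, so the concordance index is 5/6, while the single large error on the last sample dominates the mean squared error. The two metrics therefore capture different aspects of prediction quality: ranking versus absolute error.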
Affiliation(s)
- Niloofar Yousefi
- Industrial Engineering and Management Systems, University of Central Florida, 32816, 4000 Central Florida Blvd., Orlando, FL, USA
| | - Mehdi Yazdani-Jahromi
- Computer Science, University of Central Florida, 32816, 4000 Central Florida Blvd., Orlando, FL, USA
| | - Aida Tayebi
- Industrial Engineering and Management Systems, University of Central Florida, 32816, 4000 Central Florida Blvd., Orlando, FL, USA
| | - Elayaraja Kolanthai
- College of Medicine, Bionix Cluster, University of Central Florida, 4000 Central Florida Blvd., Orlando 32816, FL, USA
| | - Craig J Neal
- College of Medicine, Bionix Cluster, University of Central Florida, 4000 Central Florida Blvd., Orlando 32816, FL, USA
| | - Tanumoy Banerjee
- Department of Mechanical Engineering and Mechanics, Lehigh University, Bethlehem 18015, PA, USA
| | | | - Ganesh Balasubramanian
- Department of Mechanical Engineering and Mechanics, Lehigh University, Bethlehem 18015, PA, USA
| | - Sudipta Seal
- College of Medicine, Bionix Cluster, University of Central Florida, 4000 Central Florida Blvd., Orlando 32816, FL, USA
- Advanced Materials Processing and Analysis Center, Department of Materials Science and Engineering, University of Central Florida, 4000 Central Florida Blvd., Orlando 32816, FL, USA
| | - Ozlem Ozmen Garibay
- Industrial Engineering and Management Systems, University of Central Florida, 32816, 4000 Central Florida Blvd., Orlando, FL, USA
| |
|
14
|
Thafar MA, Albaradei S, Uludag M, Alshahrani M, Gojobori T, Essack M, Gao X. OncoRTT: Predicting novel oncology-related therapeutic targets using BERT embeddings and omics features. Front Genet 2023; 14:1139626. [PMID: 37091791 PMCID: PMC10117673 DOI: 10.3389/fgene.2023.1139626] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/07/2023] [Accepted: 03/24/2023] [Indexed: 04/08/2023] Open
Abstract
Late-stage drug development failures are usually a consequence of ineffective targets. Thus, proper target identification is needed, which may be possible using computational approaches. This is because effective targets have disease-relevant biological functions, and omics data unveil the proteins involved in these functions. Also, properties that favor the existence of binding between drug and target are deducible from the protein’s amino acid sequence. In this work, we developed OncoRTT, a deep learning (DL)-based method for predicting novel therapeutic targets. OncoRTT is designed to reduce suboptimal target selection by identifying novel targets based on features of known effective targets using DL approaches. First, we created the “OncologyTT” datasets, which include genes/proteins associated with ten prevalent cancer types. Then, we generated three sets of features for all genes: omics features, the proteins’ amino-acid sequence BERT embeddings, and the integrated features to train and test the DL classifiers separately. The models achieved high prediction performances in terms of area under the curve (AUC), i.e., AUC greater than 0.88 for all cancer types, with a maximum of 0.95 for leukemia. Also, OncoRTT outperformed the state-of-the-art method using their data in five out of seven cancer types commonly assessed by both methods. Furthermore, OncoRTT predicts novel therapeutic targets using new test data related to the seven cancer types. We further corroborated these results with other validation evidence using the Open Targets Platform and a case study focused on the top-10 predicted therapeutic targets for lung cancer.
Affiliation(s)
- Maha A. Thafar
- Computer, Electrical and Mathematical Sciences and Engineering Division (CEMSE), Computational Bioscience Research Center, Computer (CBRC), King Abdullah University of Science and Technology (KAUST), Thuwal, Saudi Arabia
- College of Computers and Information Technology, Computer Science Department, Taif University, Taif, Saudi Arabia
| | - Somayah Albaradei
- Computer, Electrical and Mathematical Sciences and Engineering Division (CEMSE), Computational Bioscience Research Center, Computer (CBRC), King Abdullah University of Science and Technology (KAUST), Thuwal, Saudi Arabia
- Faculty of Computing and Information Technology, King Abdulaziz University, Jeddah, Saudi Arabia
| | - Mahmut Uludag
- Computer, Electrical and Mathematical Sciences and Engineering Division (CEMSE), Computational Bioscience Research Center, Computer (CBRC), King Abdullah University of Science and Technology (KAUST), Thuwal, Saudi Arabia
| | - Mona Alshahrani
- National Center for Artificial Intelligence (NCAI), Saudi Data and Artificial Intelligence Authority (SDAIA), Riyadh, Saudi Arabia
| | - Takashi Gojobori
- Computer, Electrical and Mathematical Sciences and Engineering Division (CEMSE), Computational Bioscience Research Center, Computer (CBRC), King Abdullah University of Science and Technology (KAUST), Thuwal, Saudi Arabia
| | - Magbubah Essack
- Computer, Electrical and Mathematical Sciences and Engineering Division (CEMSE), Computational Bioscience Research Center, Computer (CBRC), King Abdullah University of Science and Technology (KAUST), Thuwal, Saudi Arabia
- *Correspondence: Xin Gao; Magbubah Essack
| | - Xin Gao
- Computer, Electrical and Mathematical Sciences and Engineering Division (CEMSE), Computational Bioscience Research Center, Computer (CBRC), King Abdullah University of Science and Technology (KAUST), Thuwal, Saudi Arabia
- *Correspondence: Xin Gao; Magbubah Essack
| |
|
15
|
Yadav S, Bharti S, Mathur P. GlucoKinaseDB: A comprehensive, curated resource of glucokinase modulators for clinical and molecular research. Comput Biol Chem 2023; 103:107818. [PMID: 36680885 DOI: 10.1016/j.compbiolchem.2023.107818] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/18/2022] [Revised: 01/10/2023] [Accepted: 01/12/2023] [Indexed: 01/19/2023]
Abstract
Glucokinase (GK), an isoform of hexokinase expressed predominantly in liver, pancreas and hypothalamus, is crucial to blood glucose management. It is a critical component of the glucose-sensing mechanism of the pancreatic islet cells and glycogen regulation in hepatocytes. GK modulators such as allosteric GKAs (glucokinase activators) and GK-GKRP (glucokinase regulatory protein) disruptors have found potential applications as safer antihyperglycemics. Recent studies have also demonstrated the potential of GK modulators as antiparasitic agents. Researchers targeting GK often undertake the time-consuming task of independently collecting and compiling modulator information due to the lack of any dedicated single-platform resource. To this end, in the present study we describe the design and development of GlucoKinaseDB (GKDB), a comprehensive, curated, online resource of GK modulators. GKDB contains experimentally derived structural and bioactivity information of 1723 modulators along with their detailed molecular descriptors. The web interface is user-friendly, with features such as in-browser visualization, advanced search queries, and cross-links to other databases and original references. The bioactivity and descriptor data can be downloaded in bulk (for the entire database) or for individual modulators. The 3D structures are also downloadable in multiple formats. GKDB employs a PHP-based web design with Bootstrap styling and a MySQL database backend. GKDB can be utilized for clinical and molecular research via development of pharmacophore hypotheses, QSAR/QSPR models, and predictive machine learning models. GKDB is freely accessible online at https://glucokinasedb.in.
Affiliation(s)
- Siddharth Yadav
- Amity Institute of Biotechnology, Amity University Uttar Pradesh, Noida, Uttar Pradesh, India
| | - Samuel Bharti
- Amity Institute of Biotechnology, Amity University Uttar Pradesh, Noida, Uttar Pradesh, India
| | - Puniti Mathur
- Amity Institute of Biotechnology, Amity University Uttar Pradesh, Noida, Uttar Pradesh, India.
| |
|
16
|
Voitsitskyi T, Stratiichuk R, Koleiev I, Popryho L, Ostrovsky Z, Henitsoi P, Khropachov I, Vozniak V, Zhytar R, Nechepurenko D, Yesylevskyy S, Nafiiev A, Starosyla S. 3DProtDTA: a deep learning model for drug-target affinity prediction based on residue-level protein graphs. RSC Adv 2023; 13:10261-10272. [PMID: 37006369 PMCID: PMC10065141 DOI: 10.1039/d3ra00281k] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Received: 01/14/2023] [Accepted: 03/26/2023] [Indexed: 04/03/2023] Open
Abstract
Accurate prediction of the drug-target affinity (DTA) in silico is of critical importance for modern drug discovery. Computational methods of DTA prediction, applied in the early stages of drug development, are able to speed it up and cut its cost significantly. A wide range of approaches based on machine learning were recently proposed for DTA assessment. The most promising of them are based on deep learning techniques and graph neural networks to encode molecular structures. The recent breakthrough in protein structure prediction achieved by AlphaFold has made an unprecedented number of proteins without experimentally defined structures accessible for computational DTA prediction. In this work, we propose a new deep learning DTA model, 3DProtDTA, which utilises AlphaFold structure predictions in conjunction with the graph representation of proteins. The model is superior to its rivals on common benchmarking datasets and has potential for further improvement.
Affiliation(s)
- Taras Voitsitskyi
- Receptor.AI Inc. 20-22 Wenlock Road London N1 7GU UK
- Department of Physics of Biological Systems, Institute of Physics of The National Academy of Sciences of Ukraine Nauky Ave. 46 03038 Kyiv Ukraine
| | - Roman Stratiichuk
- Receptor.AI Inc. 20-22 Wenlock Road London N1 7GU UK
- Department of Biophysics and Medical Informatics, Educational and Scientific Centre "Institute of Biology and Medicine", Taras Shevchenko National University of Kyiv 64 Volodymyrska Str. 01601 Kyiv Ukraine
| | - Ihor Koleiev
- Receptor.AI Inc. 20-22 Wenlock Road London N1 7GU UK
| | | | | | | | | | | | - Roman Zhytar
- Receptor.AI Inc. 20-22 Wenlock Road London N1 7GU UK
| | | | - Semen Yesylevskyy
- Receptor.AI Inc. 20-22 Wenlock Road London N1 7GU UK
- Institute of Organic Chemistry and Biochemistry, Czech Academy of Sciences CZ-166 10 Prague 6 Czech Republic
- Department of Physics of Biological Systems, Institute of Physics of The National Academy of Sciences of Ukraine Nauky Ave. 46 03038 Kyiv Ukraine
| | - Alan Nafiiev
- Receptor.AI Inc. 20-22 Wenlock Road London N1 7GU UK
| | | |
|
17
|
Muniyappan S, Rayan AXA, Varrieth GT. DTiGNN: Learning drug-target embedding from a heterogeneous biological network based on a two-level attention-based graph neural network. MATHEMATICAL BIOSCIENCES AND ENGINEERING : MBE 2023; 20:9530-9571. [PMID: 37161255 DOI: 10.3934/mbe.2023419] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Indexed: 05/11/2023]
Abstract
MOTIVATION: In vitro experiment-based drug-target interaction (DTI) exploration demands more human, financial and data resources. In silico approaches have been recommended for predicting DTIs to reduce time and cost. During the drug development process, one can analyze the therapeutic effect of the drug for a particular disease by identifying how the drug binds to the target for treating that disease. Hence, DTI plays a major role in drug discovery. Many computational methods have been developed for DTI prediction. However, the existing methods have limitations in terms of capturing the interactions via multiple semantics between drug and target nodes in a heterogeneous biological network (HBN). METHODS: In this paper, we propose a DTiGNN framework for identifying unknown drug-target pairs. The DTiGNN first calculates the similarity between the drug and target from multiple perspectives. Then, the features of drugs and targets from each perspective are learned separately by using a novel method termed an information entropy-based random walk. Next, all of the learned features from different perspectives are integrated into a single drug and target similarity network by using a multi-view convolutional neural network. Using the integrated similarity networks, drug interactions, drug-disease associations, protein interactions and protein-disease association, the HBN is constructed. Next, a novel embedding algorithm called a meta-graph guided graph neural network is used to learn the embedding of drugs and targets. Then, a convolutional neural network is employed to infer new DTIs after balancing the sample using oversampling techniques. RESULTS: The DTiGNN is applied to various datasets, and the result shows better performance in terms of the area under receiver operating characteristic curve (AUC) and area under precision-recall curve (AUPR), with scores of 0.98 and 0.99, respectively. There are 23,739 newly predicted DTI pairs in total.
Affiliation(s)
- Saranya Muniyappan
- Computer Science and Engineering, CEG Campus, Anna University, Tamil Nadu, India
| | | | | |
|
18
|
Yang X, Niu Z, Liu Y, Song B, Lu W, Zeng L, Zeng X. Modality-DTA: Multimodality Fusion Strategy for Drug-Target Affinity Prediction. IEEE/ACM TRANSACTIONS ON COMPUTATIONAL BIOLOGY AND BIOINFORMATICS 2023; 20:1200-1210. [PMID: 36083952 DOI: 10.1109/tcbb.2022.3205282] [Citation(s) in RCA: 13] [Impact Index Per Article: 13.0] [Indexed: 05/04/2023]
Abstract
Prediction of the drug-target affinity (DTA) plays an important role in drug discovery. Existing deep learning methods for DTA prediction typically leverage a single modality, namely simplified molecular input line entry specification (SMILES) or amino acid sequence to learn representations. SMILES or amino acid sequences can be encoded into different modalities. Multimodality data provide different kinds of information, with complementary roles for DTA prediction. We propose Modality-DTA, a novel deep learning method for DTA prediction that leverages the multimodality of drugs and targets. A group of backward propagation neural networks is applied to ensure the completeness of the reconstruction process from the latent feature representation to original multimodality data. The tag between the drug and target is used to reduce the noise information in the latent representation from multimodality data. Experiments on three benchmark datasets show that our Modality-DTA outperforms existing methods in all metrics. Modality-DTA reduces the mean square error by 15.7% and improves the area under the precision-recall curve by 12.74% in the Davis dataset. We further find that the drug modality Morgan fingerprint and the target modality generated by one-hot-encoding play the most significant roles. To the best of our knowledge, Modality-DTA is the first method to explore multimodality for DTA prediction.
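The one-hot target modality mentioned in this abstract can be illustrated with a small sketch. It is a generic encoding, not the Modality-DTA code: the 20-letter alphabet ordering, the `max_len` padding scheme, and the all-zero row for unknown residues are assumptions chosen for illustration.

```python
# Standard 20 amino acids in an arbitrary (assumed) order.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
AA_INDEX = {aa: i for i, aa in enumerate(AMINO_ACIDS)}

def one_hot_encode(sequence, max_len):
    """Encode a protein sequence as a max_len x 20 matrix of 0/1 values.

    Sequences longer than max_len are truncated; shorter ones are padded
    with all-zero rows. Unknown residues (e.g. 'X') also map to zero rows.
    """
    rows = []
    for residue in sequence[:max_len]:
        row = [0] * len(AMINO_ACIDS)
        idx = AA_INDEX.get(residue)
        if idx is not None:
            row[idx] = 1
        rows.append(row)
    while len(rows) < max_len:
        rows.append([0] * len(AMINO_ACIDS))
    return rows

encoded = one_hot_encode("MKV", max_len=5)
for row in encoded:
    print("".join(str(v) for v in row))
```

Each residue becomes a row with a single 1, so the encoding carries no notion of similarity between amino acids; that simplicity is one reason it is commonly paired with other modalities.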
|
19
|
Nguyen TM, Quinn TP, Nguyen T, Tran T. Explaining Black Box Drug Target Prediction Through Model Agnostic Counterfactual Samples. IEEE/ACM TRANSACTIONS ON COMPUTATIONAL BIOLOGY AND BIOINFORMATICS 2023; 20:1020-1029. [PMID: 35820003 DOI: 10.1109/tcbb.2022.3190266] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 05/04/2023]
Abstract
Many high-performance DTA deep learning models have been proposed, but they are mostly black-box and thus lack human interpretability. Explainable AI (XAI) can make DTA models more trustworthy, and allows biological knowledge to be distilled from the models. Counterfactual explanation is one popular approach to explaining the behaviour of a deep neural network, which works by systematically answering the question "How would the model output change if the inputs were changed in this way?". We propose a multi-agent reinforcement learning framework, Multi-Agent Counterfactual Drug-target binding Affinity (MACDA), to generate counterfactual explanations for the drug-protein complex. Our proposed framework provides human-interpretable counterfactual instances while optimizing both the input drug and target for counterfactual generation at the same time. We benchmark the proposed MACDA framework using the Davis and PDBBind datasets and find that our framework produces more parsimonious explanations with no loss in explanation validity, as measured by encoding similarity. We then present a case study involving ABL1 and Nilotinib to demonstrate how MACDA can explain the behaviour of a DTA model in the underlying substructure interaction between inputs in its prediction, revealing mechanisms that align with prior domain knowledge.
|
20
|
Sunsetting Binding MOAD with its last data update and the addition of 3D-ligand polypharmacology tools. Sci Rep 2023; 13:3008. [PMID: 36810894 PMCID: PMC9944886 DOI: 10.1038/s41598-023-29996-w] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/05/2022] [Accepted: 02/14/2023] [Indexed: 02/24/2023] Open
Abstract
Binding MOAD is a database of protein-ligand complexes and their affinities with many structured relationships across the dataset. The project has been in development for over 20 years, but now, the time has come to bring it to a close. Currently, the database contains 41,409 structures with affinity coverage for 15,223 (37%) complexes. The website BindingMOAD.org provides numerous tools for polypharmacology exploration. Current relationships include links for structures with sequence similarity, 2D ligand similarity, and binding-site similarity. In this last update, we have added 3D ligand similarity using ROCS to identify ligands which may not necessarily be similar in two dimensions but can occupy the same three-dimensional space. For the 20,387 different ligands present in the database, a total of 1,320,511 3D-shape matches between the ligands were added. Examples of the utility of 3D-shape matching in polypharmacology are presented. Finally, plans for future access to the project data are outlined.
|
21
|
Design and Prediction of Aptamers Assisted by In Silico Methods. Biomedicines 2023; 11:biomedicines11020356. [PMID: 36830893 PMCID: PMC9953197 DOI: 10.3390/biomedicines11020356] [Citation(s) in RCA: 7] [Impact Index Per Article: 7.0] [Received: 12/30/2022] [Revised: 01/21/2023] [Accepted: 01/23/2023] [Indexed: 01/28/2023] Open
Abstract
An aptamer is a single-stranded DNA or RNA that binds to a specific target with high binding affinity. Aptamers are developed through the process of systematic evolution of ligands by exponential enrichment (SELEX), which is repeated to increase the binding power and specificity. However, the SELEX process is time-consuming, and the characterization of aptamer candidates selected through it requires additional effort. Here, we describe in silico methods in order to suggest the most efficient way to develop aptamers and minimize the laborious effort required to screen and optimise aptamers. We investigated several methods for the estimation of aptamer-target molecule binding through conformational structure prediction, molecular docking, and molecular dynamics simulation. In addition, examples of machine learning and deep learning technologies used to predict the binding of targets and ligands in the development of new drugs are introduced. This review will be helpful in the development and application of in silico aptamer screening and characterization.
|
22
|
Rai A, Shah K, Dewangan HK. Review on the Artificial Intelligence-based Nanorobotics Targeted Drug Delivery System for Brain-specific Targeting. Curr Pharm Des 2023; 29:3519-3531. [PMID: 38111114 DOI: 10.2174/0113816128279248231210172053] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/28/2023] [Accepted: 11/07/2023] [Indexed: 12/20/2023]
Abstract
Contemporary medical research increasingly focuses on the blood-brain barrier (BBB) to maintain homeostasis in healthy individuals and provide solutions for neurological disorders, including brain cancer. Specialized in vitro modules replicate the BBB's complex structure and signalling using micro-engineered perfusion devices and advanced 3D cell cultures, thus advancing the understanding of neuropharmacology. This research explores nanoparticle-based biomolecular engineering for precise control, targeting, and transport of theranostic payloads across the BBB using nanorobots. The review summarizes case studies on delivering therapeutics for brain tumors and neurological disorders, such as Alzheimer's, Parkinson's, and multiple sclerosis. It also examines the advantages and disadvantages of nano-robotics. In conclusion, integrating machine learning and AI with robotics aims to develop safe nanorobots capable of interacting with the BBB without adverse effects. This comprehensive review is valuable for extensive analysis and is of great significance to healthcare professionals, engineers specializing in robotics, chemists, and bioengineers involved in pharmaceutical development and neurological research, emphasizing transdisciplinary approaches.
Affiliation(s)
- Akriti Rai
- School of Pharmacy, Lingayas Vidyapeeth, Nachauli, Jasana Road, Faridabad, Haryana 121002, India
| | - Kamal Shah
- Institute of Pharmaceutical Research (IPR), GLA University Mathura, NH-2 Delhi Mathura Road, Po Chaumuhan, Mathura, Uttar Pradesh 281406, India
| | - Hitesh Kumar Dewangan
- University Institute of Pharma Sciences (UIPS), Chandigarh University, NH-95, Chandigarh Ludhiana Highway, Mohali, Punjab, India
| |
|
23
|
Nguyen MT, Nguyen T, Tran T. Learning to discover medicines. INTERNATIONAL JOURNAL OF DATA SCIENCE AND ANALYTICS 2022; 16:1-16. [PMID: 36440369 PMCID: PMC9676887 DOI: 10.1007/s41060-022-00371-8] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/09/2022] [Accepted: 11/05/2022] [Indexed: 11/19/2022]
Abstract
Discovering new medicines is the hallmark of the human endeavor to live a better and longer life. Yet the pace of discovery has slowed down as we need to venture into more wildly unexplored biomedical space to find one that matches today's high standard. Modern AI, enabled by powerful computing, large biomedical databases, and breakthroughs in deep learning, offers a new hope to break this loop as AI is rapidly maturing, ready to make a huge impact in the area. In this paper, we review recent advances in AI methodologies that aim to crack this challenge. We organize the vast and rapidly growing literature on AI for drug discovery into three relatively stable sub-areas: (a) representation learning over molecular sequences and geometric graphs; (b) data-driven reasoning where we predict molecular properties and their binding, optimize existing compounds, generate de novo molecules, and plan the synthesis of target molecules; and (c) knowledge-based reasoning where we discuss the construction and reasoning over biomedical knowledge graphs. We will also identify open challenges and chart possible research directions for the years to come.
Affiliation(s)
- Minh-Tri Nguyen
- Applied Artificial Intelligence Institute, Deakin University, Burwood, VIC Australia
| | - Thin Nguyen
- Applied Artificial Intelligence Institute, Deakin University, Burwood, VIC Australia
| | - Truyen Tran
- Applied Artificial Intelligence Institute, Deakin University, Burwood, VIC Australia
| |
|
24
|
Askr H, Elgeldawi E, Aboul Ella H, Elshaier YAMM, Gomaa MM, Hassanien AE. Deep learning in drug discovery: an integrative review and future challenges. Artif Intell Rev 2022; 56:5975-6037. [PMID: 36415536 PMCID: PMC9669545 DOI: 10.1007/s10462-022-10306-1] [Citation(s) in RCA: 25] [Impact Index Per Article: 12.5] [Accepted: 10/24/2022] [Indexed: 11/18/2022]
Abstract
Recently, using artificial intelligence (AI) in drug discovery has received much attention since it significantly shortens the time and cost of developing new drugs. Deep learning (DL)-based approaches are increasingly being used in all stages of drug development as DL technology advances and drug-related data grows. Therefore, this paper presents a systematic literature review (SLR) that integrates the recent DL technologies and applications in drug discovery, including drug-target interactions (DTIs), drug-drug similarity interactions (DDIs), drug sensitivity and responsiveness, and drug-side effect predictions. We present a review of more than 300 articles between 2000 and 2022. The benchmark data sets, the databases, and the evaluation measures are also presented. In addition, this paper provides an overview of how explainable AI (XAI) supports drug discovery problems. The drug dosing optimization and success stories are discussed as well. Finally, digital twinning (DT) and open issues are suggested as future research challenges for drug discovery problems. Challenges to be addressed and future research directions are identified, and an extensive bibliography is also included.
Affiliation(s)
- Heba Askr
- Faculty of Computers and Artificial Intelligence, University of Sadat City, Sadat City, Egypt
| | - Enas Elgeldawi
- Computer Science Department, Faculty of Science, Minia University, Minia, Egypt
| | - Heba Aboul Ella
- Faculty of Pharmacy and Drug Technology, Chinese University in Egypt (CUE), Cairo, Egypt
| | | | - Mamdouh M. Gomaa
- Computer Science Department, Faculty of Science, Minia University, Minia, Egypt
| | - Aboul Ella Hassanien
- Faculty of Computers and Artificial Intelligence, Cairo University, Cairo, Egypt
| |
|
25
|
Hierarchical graph representation learning for the prediction of drug-target binding affinity. Inf Sci (N Y) 2022. [DOI: 10.1016/j.ins.2022.09.043] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Indexed: 01/01/2023]
|
26
|
Krasoulis A, Antonopoulos N, Pitsikalis V, Theodorakis S. DENVIS: Scalable and High-Throughput Virtual Screening Using Graph Neural Networks with Atomic and Surface Protein Pocket Features. J Chem Inf Model 2022; 62:4642-4659. [PMID: 36154119 DOI: 10.1021/acs.jcim.2c01057] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Indexed: 11/30/2022]
Abstract
Computational methods for virtual screening can dramatically accelerate early-stage drug discovery by identifying potential hits for a specified target. Docking algorithms traditionally use physics-based simulations to address this challenge by estimating the binding orientation of a query protein-ligand pair and a corresponding binding affinity score. In recent years, classical and modern machine learning architectures have shown potential for outperforming traditional docking algorithms. Nevertheless, most learning-based algorithms still rely on the availability of the protein-ligand complex binding pose, typically estimated via docking simulations, which leads to a severe slowdown of the overall virtual screening process. A family of algorithms that process target information at the amino acid sequence level avoids this requirement, but at the cost of processing protein data at a higher representation level. We introduce deep neural virtual screening (DENVIS), an end-to-end pipeline for virtual screening using graph neural networks (GNNs). By performing experiments on two benchmark databases, we show that our method performs competitively with several docking-based, machine learning-based, and hybrid docking/machine learning-based algorithms. By avoiding the intermediate docking step, DENVIS exhibits several orders of magnitude faster screening times (i.e., higher throughput) than both docking-based and hybrid models. When compared to an amino acid sequence-based machine learning model with comparable screening times, DENVIS achieves dramatically better performance. Some key elements of our approach include protein pocket modeling using a combination of atomic and surface features, the use of model ensembles, and data augmentation via artificial negative sampling during model training.
In summary, DENVIS achieves virtual screening performance competitive with the state of the art, while offering the potential to scale to billions of molecules using minimal computational resources.
|
27
|
Wang T, Pulkkinen OI, Aittokallio T. Target-specific compound selectivity for multi-target drug discovery and repurposing. Front Pharmacol 2022; 13:1003480. [PMID: 36225560 PMCID: PMC9549418 DOI: 10.3389/fphar.2022.1003480] [Citation(s) in RCA: 7] [Impact Index Per Article: 3.5] [Received: 07/26/2022] [Accepted: 08/15/2022] [Indexed: 11/13/2022] Open
Abstract
Most drug molecules modulate multiple target proteins, leading either to therapeutic effects or unwanted side effects. Such target promiscuity partly contributes to high attrition rates and leads to wasted costs and time in the current drug discovery process, and makes the assessment of compound selectivity an important factor in drug development and repurposing efforts. Traditionally, selectivity of a compound is characterized in terms of its target activity profile (wide or narrow), which can be quantified using various statistical and information theoretic metrics. Even though the existing selectivity metrics are widely used for characterizing the overall selectivity of a compound, they fall short in quantifying how selective the compound is against a particular target protein (e.g., disease target of interest). We therefore extended the concept of compound selectivity towards target-specific selectivity, defined as the potency of a compound to bind to the particular protein in comparison to the other potential targets. We decompose the target-specific selectivity into two components: 1) the compound’s potency against the target of interest (absolute potency), and 2) the compound’s potency against the other targets (relative potency). The maximally selective compound-target pairs are then identified as a solution of a bi-objective optimization problem that simultaneously optimizes these two potency metrics. In computational experiments carried out using a large-scale kinase inhibitor dataset, which represents a wide range of polypharmacological activities, we show how the optimization-based selectivity scoring offers a systematic approach to finding both potent and selective compounds against given kinase targets. Compared to the existing selectivity metrics, we show how the target-specific selectivity provides additional insights into the target selectivity and promiscuity of multi-targeting kinase inhibitors.
Even though the selectivity score is shown to be relatively robust against both missing bioactivity values and the dataset size, we further developed a permutation-based procedure to calculate empirical p-values to assess the statistical significance of the observed selectivity of a compound-target pair in the given bioactivity dataset. We present several case studies that show how the target-specific selectivity can distinguish between highly selective and broadly-active kinase inhibitors, hence facilitating the discovery or repurposing of multi-targeting drugs.
Affiliation(s)
- Tianduanyi Wang
- Institute for Molecular Medicine Finland (FIMM), University of Helsinki, Helsinki, Finland
- Department of Computer Science, Aalto University, Espoo, Finland
| | - Otto I. Pulkkinen
- Institute for Molecular Medicine Finland (FIMM), University of Helsinki, Helsinki, Finland
- Helsinki Institute for Information Technology (HIIT), Department of Computer Science, University of Helsinki, Helsinki, Finland
- Department of Mathematics and Statistics and InFLAMES Research Flagship, University of Turku, Turku, Finland
| | - Tero Aittokallio
- Institute for Molecular Medicine Finland (FIMM), University of Helsinki, Helsinki, Finland
- Helsinki Institute for Information Technology (HIIT), Department of Computer Science, University of Helsinki, Helsinki, Finland
- Department of Mathematics and Statistics and InFLAMES Research Flagship, University of Turku, Turku, Finland
- Institute for Cancer Research, Department of Cancer Genetics, Oslo University Hospital, Oslo, Norway
- Oslo Centre for Biostatistics and Epidemiology (OCBE), Faculty of Medicine, University of Oslo, Oslo, Norway
- *Correspondence: Tero Aittokallio,
| |
|
28
|
Yaseen A, Amin I, Akhter N, Ben-Hur A, Minhas F. Insights into performance evaluation of compound-protein interaction prediction methods. Bioinformatics 2022; 38:ii75-ii81. [PMID: 36124806 DOI: 10.1093/bioinformatics/btac496] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Indexed: 12/25/2022] Open
Abstract
MOTIVATION Machine-learning-based prediction of compound-protein interactions (CPIs) is important for drug design, screening and repurposing. Despite numerous recent publications with increasing methodological sophistication claiming consistent improvements in predictive accuracy, we have observed a number of fundamental issues in experiment design that produce overoptimistic estimates of model performance. RESULTS We systematically analyze the impact of several factors affecting generalization performance of CPI predictors that are overlooked in existing work: (i) similarity between training and test examples in cross-validation; (ii) synthesizing negative examples in the absence of experimentally verified negative examples and (iii) alignment of evaluation protocol and performance metrics with real-world use of CPI predictors in screening large compound libraries. Using both state-of-the-art approaches by other researchers as well as a simple kernel-based baseline, we have found that effective assessment of generalization performance of CPI predictors requires careful control over similarity between training and test examples. We show that, under stringent performance assessment protocols, a simple kernel-based approach can exceed the predictive performance of existing state-of-the-art methods. We also show that random pairing for generating synthetic negative examples for training and performance evaluation results in models with better generalization in comparison to more sophisticated strategies used in existing studies. Our analyses indicate that using proposed experiment design strategies can offer significant improvements for CPI prediction leading to effective target compound screening for drug repurposing and discovery of putative chemical ligands of SARS-CoV-2-Spike and Human-ACE2 proteins. AVAILABILITY AND IMPLEMENTATION Code and supplementary material available at https://github.com/adibayaseen/HKRCPI.
SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Adiba Yaseen
- Department of Computer and Information Sciences (DCIS), Pakistan Institute of Engineering and Applied Sciences (PIEAS), Islamabad 45650, Pakistan
| | - Imran Amin
- National Institute for Biotechnology and Genetic Engineering, Faisalabad 38000, Pakistan
| | - Naeem Akhter
- Department of Computer and Information Sciences (DCIS), Pakistan Institute of Engineering and Applied Sciences (PIEAS), Islamabad 45650, Pakistan
| | - Asa Ben-Hur
- Department of Computer Science, Colorado State University, Fort Collins, CO 80523, USA
| | - Fayyaz Minhas
- Department of Computer Science, University of Warwick, Coventry CV4 7AL, UK
| |
|
29
|
Sun Y, Jiao Y, Shi C, Zhang Y. Deep learning-based molecular dynamics simulation for structure-based drug design against SARS-CoV-2. Comput Struct Biotechnol J 2022; 20:5014-5027. [PMID: 36091720 PMCID: PMC9448712 DOI: 10.1016/j.csbj.2022.09.002] [Citation(s) in RCA: 5] [Impact Index Per Article: 2.5] [Received: 03/14/2022] [Revised: 08/03/2022] [Accepted: 09/03/2022] [Indexed: 11/26/2022] Open
Abstract
Coronavirus disease 2019 (COVID-19), caused by severe acute respiratory syndrome coronavirus type 2 (SARS-CoV-2), has led to a global pandemic. Deep learning (DL) technology and molecular dynamics (MD) simulation are two mainstream computational approaches to investigate the geometric, chemical and structural features of proteins and guide the relevant drug design. Despite a large number of research papers focusing on drug design for SARS-CoV-2 using DL architectures, it remains unclear how the binding energy of the protein-protein/ligand complex dynamically evolves, which is also vital for drug development. In addition, traditional deep neural networks usually have obvious deficiencies in predicting the interaction sites as protein conformation changes. In this review, we introduce the latest progress of DL and DL-based MD simulation approaches in structure-based drug design (SBDD) for SARS-CoV-2, which could address the problems of protein structure and binding prediction, drug virtual screening, molecular docking and complex evolution. Furthermore, the current challenges and future directions of DL-based MD simulation for SBDD are also discussed.
Affiliation(s)
- Yao Sun
- School of Science, Harbin Institute of Technology (Shenzhen), Shenzhen, Guangdong 518055, China
| | - Yanqi Jiao
- School of Science, Harbin Institute of Technology (Shenzhen), Shenzhen, Guangdong 518055, China
| | - Chengcheng Shi
- State Key Lab of Urban Water Resource and Environment, School of Science, Harbin Institute of Technology (Shenzhen), Shenzhen, Guangdong 518055, China
| | - Yang Zhang
- School of Science, Harbin Institute of Technology (Shenzhen), Shenzhen, Guangdong 518055, China
| |
|
30
|
Chen G, Jiang X, Lv Q, Tan X, Yang Z, Chen CYC. VAERHNN: Voting-averaged ensemble regression and hybrid neural network to investigate potent leads against colorectal cancer. Knowl Based Syst 2022. [DOI: 10.1016/j.knosys.2022.109925] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/14/2022]
|
31
|
Yang R, Zha X, Gao X, Wang K, Cheng B, Yan B. Multi-stage virtual screening of natural products against p38α mitogen-activated protein kinase: predictive modeling by machine learning, docking study and molecular dynamics simulation. Heliyon 2022; 8:e10495. [PMID: 36105464 PMCID: PMC9465123 DOI: 10.1016/j.heliyon.2022.e10495] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 01/19/2022] [Revised: 03/20/2022] [Accepted: 08/25/2022] [Indexed: 11/20/2022] Open
Abstract
p38α is a mitogen-activated protein kinase (MAPK), and the signaling pathways involved are closely related to the inflammation, apoptosis and differentiation of cells, which also makes it an attractive target for drug discovery. With its high efficiency and low cost, virtual screening technology is becoming an indispensable part of drug development. In this study, a novel multi-stage virtual screening method based on machine learning, molecular docking and molecular dynamics simulation was developed to identify p38α MAPK inhibitors from natural products in the ZINC database, which improves the prediction accuracy by considering and utilizing both ligand and receptor information compared to any individual approach. Ultimately, we screened out two candidate inhibitors with acceptable ADMET properties (ZINC4260400 and ZINC8300300). Among the generated machine learning models, Random Forest (RF) and Support Vector Machine (SVM) performed better, with area under the receiver operating characteristic curve (AUC) values of 0.932 and 0.931 on the test set, as well as 0.834 and 0.850 on the external validation set. In addition, the results of molecular docking and ADMET prediction showed that two compounds with appropriate pharmacokinetic properties had binding free energies less than −8.0 kcal/mol for the target protein, and the results of molecular dynamics simulations further confirmed that they were stable during the process of inhibition.
|
32
|
Molecular Evolution of the Pseudomonas aeruginosa DNA Gyrase gyrA Gene. Microorganisms 2022; 10:microorganisms10081660. [PMID: 36014079 PMCID: PMC9415716 DOI: 10.3390/microorganisms10081660] [Citation(s) in RCA: 7] [Impact Index Per Article: 3.5] [Received: 06/30/2022] [Revised: 08/10/2022] [Accepted: 08/13/2022] [Indexed: 11/29/2022] Open
Abstract
DNA gyrase plays important roles in genome replication in various bacteria, including Pseudomonas aeruginosa. The gyrA gene encodes the gyrase subunit A protein (GyrA). Mutations in GyrA are associated with resistance to quinolone-based antibiotics. We performed a detailed molecular evolutionary analysis of the gyrA gene and associated resistance to the quinolone drug, ciprofloxacin, using bioinformatics techniques. We produced an evolutionary phylogenetic tree using the Bayesian Markov Chain Monte Carlo (MCMC) method. This tree indicated that a common ancestor of the gene was present over 760 years ago, and the offspring formed multiple clusters. Quinolone drug-resistance-associated amino-acid substitutions in GyrA, including T83I and D87N, emerged after the drug was used clinically. These substitutions appeared to be positive selection sites. The molecular affinity between ciprofloxacin and the GyrA protein containing T83I and/or D87N decreased significantly compared to that between the drug and the GyrA protein with no substitutions. The rate of evolution of the gene before quinolone drugs were first used in the clinic, in 1962, was significantly lower than that after the drug was used. These results suggest that the gyrA gene evolved to permit the bacterium to overcome quinolone treatment.
|
33
|
Yang R, Xia Y, Xian J, Yu H, Yan B, Cheng B. Identification of Potential Dual Farnesol X Receptor/Retinoid X Receptor α Agonists Based on Machine Learning Models, ADMET Prediction and Molecular Docking. ChemistrySelect 2022. [DOI: 10.1002/slct.202200715] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/09/2022]
Affiliation(s)
- Ruo‐qi Yang
- Affiliated Hospital of Shandong University of Traditional Chinese Medicine Jinan 250355 China
- Shandong University of Traditional Chinese Medicine Jinan 250355 China
| | - Yu Xia
- Shandong University of Traditional Chinese Medicine Jinan 250355 China
| | - Jin Xian
- Affiliated Hospital of Shandong University of Traditional Chinese Medicine Jinan 250355 China
| | - Hui‐juan Yu
- Shandong University of Traditional Chinese Medicine Jinan 250355 China
| | - Bin Yan
- Shandong University of Traditional Chinese Medicine Jinan 250355 China
| | - Bin Cheng
- Affiliated Hospital of Shandong University of Traditional Chinese Medicine Jinan 250355 China
| |
|
34
|
Yazdani-Jahromi M, Yousefi N, Tayebi A, Kolanthai E, Neal CJ, Seal S, Garibay OO. AttentionSiteDTI: an interpretable graph-based model for drug-target interaction prediction using NLP sentence-level relation classification. Brief Bioinform 2022; 23:6640006. [PMID: 35817396 PMCID: PMC9294423 DOI: 10.1093/bib/bbac272] [Citation(s) in RCA: 22] [Impact Index Per Article: 11.0] [Received: 02/18/2022] [Revised: 05/01/2022] [Accepted: 06/10/2022] [Indexed: 11/14/2022] Open
Abstract
In this study, we introduce an interpretable graph-based deep learning prediction model, AttentionSiteDTI, which utilizes protein binding sites along with a self-attention mechanism to address the problem of drug-target interaction prediction. Our proposed model is inspired by sentence classification models in the field of Natural Language Processing, where the drug-target complex is treated as a sentence with relational meaning between its biochemical entities a.k.a. protein pockets and drug molecule. AttentionSiteDTI enables interpretability by identifying the protein binding sites that contribute the most toward the drug-target interaction. Results on three benchmark datasets show improved performance compared with the current state-of-the-art models. More significantly, unlike previous studies, our model shows superior performance, when tested on new proteins (i.e. high generalizability). Through multidisciplinary collaboration, we further experimentally evaluate the practical potential of our proposed approach. To achieve this, we first computationally predict the binding interactions between some candidate compounds and a target protein, then experimentally validate the binding interactions for these pairs in the laboratory. The high agreement between the computationally predicted and experimentally observed (measured) drug-target interactions illustrates the potential of our method as an effective pre-screening tool in drug repurposing applications.
Affiliation(s)
- Mehdi Yazdani-Jahromi
- Industrial Engineering and Management Systems, University of Central Florida, 4000 Central Florida Blvd. Orlando, 32816, Florida, USA
| | - Niloofar Yousefi
- Industrial Engineering and Management Systems, University of Central Florida, 4000 Central Florida Blvd. Orlando, 32816, Florida, USA
| | - Aida Tayebi
- Industrial Engineering and Management Systems, University of Central Florida, 4000 Central Florida Blvd. Orlando, 32816, Florida, USA
| | - Elayaraja Kolanthai
- College of Medicine, Bionix Cluster, University of Central Florida, 4000 Central Florida Blvd. Orlando, 32816, Florida, USA
| | - Craig J Neal
- College of Medicine, Bionix Cluster, University of Central Florida, 4000 Central Florida Blvd. Orlando, 32816, Florida, USA
| | - Sudipta Seal
- College of Medicine, Bionix Cluster, University of Central Florida, 4000 Central Florida Blvd. Orlando, 32816, Florida, USA
- Advanced Materials Processing and Analysis Center, Dept. of Materials Science and Engineering, University of Central Florida, 4000 Central Florida Blvd. Orlando, 32816, Florida, USA
| | - Ozlem Ozmen Garibay
- Industrial Engineering and Management Systems, University of Central Florida, 4000 Central Florida Blvd. Orlando, 32816, Florida, USA
| |
|
35
|
Nguyen TM, Nguyen T, Tran T. Mitigating cold-start problems in drug-target affinity prediction with interaction knowledge transferring. Brief Bioinform 2022; 23:6628784. [PMID: 35788823 PMCID: PMC9353967 DOI: 10.1093/bib/bbac269] [Citation(s) in RCA: 6] [Impact Index Per Article: 3.0] [Received: 03/10/2022] [Revised: 05/20/2022] [Accepted: 06/08/2022] [Indexed: 12/04/2022] Open
Abstract
Predicting the drug-target interaction is crucial for drug discovery as well as drug repurposing. Machine learning is commonly used for the drug-target affinity (DTA) problem. However, the machine learning model faces the cold-start problem, where the model performance drops when predicting the interaction of a novel drug or target. Previous works try to solve the cold-start problem by learning the drug or target representation using unsupervised learning. While the drug or target representation can be learned in an unsupervised manner, it still lacks the interaction information, which is critical in drug-target interaction. To incorporate the interaction information into the drug and protein interaction, we proposed using transfer learning from chemical–chemical interaction (CCI) and protein–protein interaction (PPI) tasks to the drug-target interaction task. The representation learned by CCI and PPI tasks can be transferred smoothly to the DTA task due to the similar nature of the tasks. The result on the DTA datasets shows that our proposed method has advantages compared to other pre-training methods in the DTA task.
Affiliation(s)
- Tri Minh Nguyen
- Applied Artificial Intelligence Institute, Deakin University, Victoria, Australia
| | - Thin Nguyen
- Applied Artificial Intelligence Institute, Deakin University, Victoria, Australia
| | - Truyen Tran
- Applied Artificial Intelligence Institute, Deakin University, Victoria, Australia
| |
|
36
|
Monteiro NRC, Simões CJV, Ávila HV, Abbasi M, Oliveira JL, Arrais JP. Explainable deep drug-target representations for binding affinity prediction. BMC Bioinformatics 2022; 23:237. [PMID: 35715734 PMCID: PMC9204982 DOI: 10.1186/s12859-022-04767-y] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Received: 02/02/2022] [Accepted: 05/25/2022] [Indexed: 11/10/2022] Open
Abstract
Background Several computational advances have been achieved in the drug discovery field, promoting the identification of novel drug–target interactions and new leads. However, most of these methodologies have been overlooking the importance of providing explanations to the decision-making process of deep learning architectures. In this research study, we explore the reliability of convolutional neural networks (CNNs) at identifying relevant regions for binding, specifically binding sites and motifs, and the significance of the deep representations extracted by providing explanations to the model’s decisions based on the identification of the input regions that contributed the most to the prediction. We make use of an end-to-end deep learning architecture to predict binding affinity, where CNNs are exploited in their capacity to automatically identify and extract discriminating deep representations from 1D sequential and structural data. Results The results demonstrate the effectiveness of the deep representations extracted from CNNs in the prediction of drug–target interactions. CNNs were found to identify and extract features from regions relevant for the interaction, where the weight associated with these spots was in the range of those with the highest positive influence given by the CNNs in the prediction. The end-to-end deep learning model achieved the highest performance both in the prediction of the binding affinity and on the ability to correctly distinguish the interaction strength rank order when compared to baseline approaches. Conclusions This research study validates the potential applicability of an end-to-end deep learning architecture in the context of drug discovery beyond the confined space of proteins and ligands with determined 3D structure. Furthermore, it shows the reliability of the deep representations extracted from the CNNs by providing explainability to the decision-making process. 
Supplementary Information The online version contains supplementary material available at 10.1186/s12859-022-04767-y.
Affiliation(s)
- Nelson R C Monteiro
- Univ Coimbra, Centre for Informatics and Systems of the University of Coimbra, Department of Informatics Engineering, Coimbra, Portugal
| | | | - Henrique V Ávila
- Univ Coimbra, Centre for Informatics and Systems of the University of Coimbra, Department of Informatics Engineering, Coimbra, Portugal
| | - Maryam Abbasi
- Univ Coimbra, Centre for Informatics and Systems of the University of Coimbra, Department of Informatics Engineering, Coimbra, Portugal
| | - José L Oliveira
- IEETA, Department of Electronics, Telecommunications and Informatics, University of Aveiro, Aveiro, Portugal
| | - Joel P Arrais
- Univ Coimbra, Centre for Informatics and Systems of the University of Coimbra, Department of Informatics Engineering, Coimbra, Portugal
| |
|
37
|
UnbiasedDTI: Mitigating Real-World Bias of Drug-Target Interaction Prediction by Using Deep Ensemble-Balanced Learning. Molecules 2022; 27:molecules27092980. [PMID: 35566330 PMCID: PMC9100109 DOI: 10.3390/molecules27092980] [Citation(s) in RCA: 7] [Impact Index Per Article: 3.5] [Received: 04/07/2022] [Revised: 04/26/2022] [Accepted: 04/28/2022] [Indexed: 01/27/2023] Open
Abstract
Drug-target interaction (DTI) prediction through in vitro methods is expensive and time-consuming. On the other hand, computational methods can save time and money while enhancing drug discovery efficiency. Most of the computational methods frame DTI prediction as a binary classification task. One important challenge is that the number of negative interactions in all DTI-related datasets is far greater than the number of positive interactions, leading to the class imbalance problem. As a result, a classifier is trained biased towards the majority class (negative class), whereas the minority class (interacting pairs) is of interest. This class imbalance problem is not widely taken into account in DTI prediction studies, and the few previous studies considering balancing in DTI do not focus on the imbalance issue itself. Additionally, they do not benefit from deep learning models and experimental validation. In this study, we propose a computational framework along with experimental validations to predict drug-target interaction using an ensemble of deep learning models to address the class imbalance problem in the DTI domain. The objective of this paper is to mitigate the bias in the prediction of DTI by focusing on the impact of balancing and maintaining other involved parameters at a constant value. Our analysis shows that the proposed model outperforms unbalanced models with the same architecture trained on the BindingDB both computationally and experimentally. These findings demonstrate the significance of balancing, which reduces the bias towards the negative class and leads to better performance. It is important to note that leaning on computational results without experimentally validating them and by relying solely on AUROC and AUPRC metrics is not credible, particularly when the testing set remains unbalanced.
|
38
|
Drug repositioning for cancer in the era of big omics and real-world data. Crit Rev Oncol Hematol 2022; 175:103730. [DOI: 10.1016/j.critrevonc.2022.103730] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/03/2022] [Revised: 05/25/2022] [Accepted: 05/27/2022] [Indexed: 11/15/2022] Open
|
39
|
Nag S, Baidya ATK, Mandal A, Mathew AT, Das B, Devi B, Kumar R. Deep learning tools for advancing drug discovery and development. 3 Biotech 2022; 12:110. [PMID: 35433167 PMCID: PMC8994527 DOI: 10.1007/s13205-022-03165-8] [Citation(s) in RCA: 17] [Impact Index Per Article: 8.5] [Received: 11/06/2021] [Accepted: 03/18/2022] [Indexed: 12/26/2022] Open
Abstract
A few decades ago, drug discovery and development were limited to a bunch of medicinal chemists working in a lab with an enormous amount of testing, validation, and synthetic procedures, all contributing to considerable investments of time and wealth to get one drug out into the clinic. The advancements in computational techniques combined with a boom in multi-omics data led to the development of various bioinformatics/pharmacoinformatics/cheminformatics tools that have helped speed up the drug development process. But with the advent of artificial intelligence (AI), machine learning (ML) and deep learning (DL), the conventional drug discovery process has been further rationalized. Extensive biological data in the form of big data present in various databases across the globe acts as the raw material for ML/DL-based approaches and helps in the accurate identification of patterns and models which can be used to identify therapeutically active molecules with far smaller investments of time, workforce and wealth. In this review, we have begun by introducing the general concepts in the drug discovery pipeline, followed by an outline of the fields in the drug discovery process where ML/DL can be utilized. We have also introduced ML and DL along with their applications, various learning methods, and training models used to develop the ML/DL-based algorithms. Furthermore, we have summarized various DL-based tools existing in the public domain with their application in the drug discovery paradigm, which includes DL tools for identification of drug targets and drug–target interaction such as DeepCPI, DeepDTA, WideDTA, PADME, DeepAffinity, and DeepPocket.
Additionally, we have discussed various DL-based models used in protein structure prediction, de novo design of new chemical scaffolds, virtual screening of chemical libraries for hit identification, absorption, distribution, metabolism, excretion, and toxicity (ADMET) prediction, metabolite prediction, clinical trial design, and oral bioavailability prediction. In the end, we have tried to shed light on some of the successful ML/DL-based models used in the drug discovery and development pipeline while also discussing the current challenges and prospects of the application of DL tools in drug discovery and development. We believe that this review will be useful for medicinal and computational chemists searching for DL tools for use in their drug discovery projects.
Affiliation(s)
- Sagorika Nag
- Department of Pharmaceutical Engineering and Technology, Indian Institute of Technology (B.H.U.), Varanasi, UP 221005 India
| | - Anurag T. K. Baidya
- Department of Pharmaceutical Engineering and Technology, Indian Institute of Technology (B.H.U.), Varanasi, UP 221005 India
| | - Abhimanyu Mandal
- Department of Pharmaceutical Engineering and Technology, Indian Institute of Technology (B.H.U.), Varanasi, UP 221005 India
| | - Alen T. Mathew
- Department of Pharmaceutical Engineering and Technology, Indian Institute of Technology (B.H.U.), Varanasi, UP 221005 India
| | - Bhanuranjan Das
- Department of Pharmaceutical Engineering and Technology, Indian Institute of Technology (B.H.U.), Varanasi, UP 221005 India
| | - Bharti Devi
- Department of Pharmaceutical Engineering and Technology, Indian Institute of Technology (B.H.U.), Varanasi, UP 221005 India
| | - Rajnish Kumar
- Department of Pharmaceutical Engineering and Technology, Indian Institute of Technology (B.H.U.), Varanasi, UP 221005 India
| |
|
40
|
Alshahrani M, Almansour A, Alkhaldi A, Thafar MA, Uludag M, Essack M, Hoehndorf R. Combining biomedical knowledge graphs and text to improve predictions for drug-target interactions and drug-indications. PeerJ 2022; 10:e13061. [PMID: 35402106 PMCID: PMC8988936 DOI: 10.7717/peerj.13061] [Citation(s) in RCA: 5] [Impact Index Per Article: 2.5] [Received: 08/28/2020] [Accepted: 02/13/2022] [Indexed: 01/11/2023] Open
Abstract
Biomedical knowledge is represented in structured databases and published in biomedical literature, and different computational approaches have been developed to exploit each type of information in predictive models. However, the information in structured databases and literature is often complementary. We developed a machine learning method that combines information from literature and databases to predict drug targets and indications. To effectively utilize information in published literature, we integrate knowledge graphs and published literature using named entity recognition and normalization before applying a machine learning model that utilizes the combination of graph and literature. We then use supervised machine learning to show the effects of combining features from biomedical knowledge and published literature on the prediction of drug targets and drug indications. Using datasets for drug-target interactions and drug indications, we demonstrate that our approach is scalable to large graphs and can be used to improve the ranking of targets and indications by exploiting features from either structured or unstructured information alone.
Affiliation(s)
- Mona Alshahrani
- National Center for Artificial Intelligence (NCAI), Saudi Data and Artificial Intelligence Authority (SDAIA), Riyadh, Saudi Arabia
| | - Abdullah Almansour
- National Center for Artificial Intelligence (NCAI), Saudi Data and Artificial Intelligence Authority (SDAIA), Riyadh, Saudi Arabia
| | - Asma Alkhaldi
- National Center for Artificial Intelligence (NCAI), Saudi Data and Artificial Intelligence Authority (SDAIA), Riyadh, Saudi Arabia
| | - Maha A. Thafar
- College of Computers and Information Technology, Taif University, Taif, Saudi Arabia,Computer, Electrical and Mathematical Sciences and Engineering Division (CEMSE), Computational Bioscience Research Center (CBRC), King Abdullah University of Science and Technology (KAUST), King Abdullah University of Science and Technology, Thuwal, Saudi Arabia
| | - Mahmut Uludag
- Computer, Electrical and Mathematical Sciences and Engineering Division (CEMSE), Computational Bioscience Research Center (CBRC), King Abdullah University of Science and Technology (KAUST), King Abdullah University of Science and Technology, Thuwal, Saudi Arabia
| | - Magbubah Essack
- Computer, Electrical and Mathematical Sciences and Engineering Division (CEMSE), Computational Bioscience Research Center (CBRC), King Abdullah University of Science and Technology (KAUST), King Abdullah University of Science and Technology, Thuwal, Saudi Arabia
| | - Robert Hoehndorf
- Computer, Electrical and Mathematical Sciences and Engineering Division (CEMSE), Computational Bioscience Research Center (CBRC), King Abdullah University of Science and Technology (KAUST), King Abdullah University of Science and Technology, Thuwal, Saudi Arabia
| |
|
41
|
Moon S, Zhung W, Yang S, Lim J, Kim WY. PIGNet: a physics-informed deep learning model toward generalized drug-target interaction predictions. Chem Sci 2022; 13:3661-3673. [PMID: 35432900 PMCID: PMC8966633 DOI: 10.1039/d1sc06946b] [Citation(s) in RCA: 47] [Impact Index Per Article: 23.5] [Received: 12/13/2021] [Accepted: 02/06/2022] [Indexed: 12/21/2022] Open
Abstract
Recently, deep neural network (DNN)-based drug–target interaction (DTI) models were highlighted for their high accuracy with affordable computational costs. Yet, the models' insufficient generalization remains a challenging problem in the practice of in silico drug discovery. We propose two key strategies to enhance generalization in the DTI model. The first is to predict the atom–atom pairwise interactions via physics-informed equations parameterized with neural networks and to provide the total binding affinity of a protein–ligand complex as their sum. The second is to improve model generalization by augmenting the training data with a broader range of binding poses and ligands. We validated our model, PIGNet, on the comparative assessment of scoring functions (CASF) 2016 benchmark, where it outperformed previous methods in docking and screening power. Our physics-informing strategy also enables the interpretation of predicted affinities by visualizing the contribution of ligand substructures, providing insights for further ligand optimization. PIGNet, a deep neural network-based drug–target interaction model guided by physics and extensive data augmentation, shows significantly improved generalization ability and model performance.
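The idea of obtaining a total binding affinity as a sum of atom–atom pairwise terms can be illustrated with a minimal sketch. Everything below is an assumption for illustration: the Lennard-Jones-like `pair_energy` function, its fixed parameters, the distance cutoff, and the toy coordinates are hypothetical and are not taken from PIGNet, where the parameters of the physics-informed equations are predicted by neural networks.

```python
import math

def pair_energy(distance, well_depth=1.0, equilibrium_distance=3.5):
    """Hypothetical Lennard-Jones-like energy for one atom-atom pair.

    The fixed parameters are placeholders; in a learned model they would
    be predicted per atom pair rather than set by hand.
    """
    ratio = equilibrium_distance / distance
    return well_depth * (ratio ** 12 - 2 * ratio ** 6)

def total_interaction_energy(ligand_atoms, protein_atoms, cutoff=8.0):
    """Sum pairwise energies over all ligand-protein atom pairs within a cutoff."""
    total = 0.0
    for ligand_xyz in ligand_atoms:
        for protein_xyz in protein_atoms:
            distance = math.dist(ligand_xyz, protein_xyz)
            if 0.0 < distance <= cutoff:
                total += pair_energy(distance)
    return total

# Toy coordinates in angstroms, for illustration only.
ligand = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0)]
protein = [(3.5, 0.0, 0.0), (0.0, 4.0, 0.0)]
print(total_interaction_energy(ligand, protein))
```

Because the total is a plain sum, each ligand atom's share of the score can be read off directly, which is the kind of per-substructure interpretation the abstract describes.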
Affiliation(s)
- Seokhyun Moon
- Department of Chemistry, KAIST 291 Daehak-ro, Yuseong-gu Daejeon 34141 Republic of Korea
| | - Wonho Zhung
- Department of Chemistry, KAIST 291 Daehak-ro, Yuseong-gu Daejeon 34141 Republic of Korea
| | - Soojung Yang
- Department of Chemistry, KAIST 291 Daehak-ro, Yuseong-gu Daejeon 34141 Republic of Korea
| | - Jaechang Lim
- HITS Incorporation 124 Teheran-ro, Gangnam-gu Seoul 06234 Republic of Korea
| | - Woo Youn Kim
- Department of Chemistry, KAIST 291 Daehak-ro, Yuseong-gu Daejeon 34141 Republic of Korea .,HITS Incorporation 124 Teheran-ro, Gangnam-gu Seoul 06234 Republic of Korea.,KI for Artificial Intelligence, KAIST 291 Daehak-ro, Yuseong-gu Daejeon 34141 Republic of Korea
| |
|
42
|
Affinity2Vec: drug-target binding affinity prediction through representation learning, graph mining, and machine learning. Sci Rep 2022; 12:4751. [PMID: 35306525 PMCID: PMC8934358 DOI: 10.1038/s41598-022-08787-9] [Citation(s) in RCA: 21] [Impact Index Per Article: 10.5] [Received: 06/22/2021] [Accepted: 03/08/2022] [Indexed: 11/21/2022] Open
Abstract
Drug-target interaction (DTI) prediction plays a crucial role in drug repositioning and virtual drug screening. Most DTI prediction methods cast the problem as a binary classification task to predict if interactions exist or as a regression task to predict continuous values that indicate a drug's ability to bind to a specific target. The regression-based methods provide insight beyond the binary relationship. However, most of these methods require three-dimensional (3D) structural information of the targets, which is still not available for many targets. Despite this bottleneck, only a few methods address the drug-target binding affinity (DTBA) problem with a non-structure-based approach that avoids the 3D structure limitations. Here we propose Affinity2Vec, a novel regression-based method that formulates the entire task as a graph-based problem. To develop this method, we constructed a weighted heterogeneous graph that integrates data from several sources, including drug-drug similarity, target-target similarity, and drug-target binding affinities. Affinity2Vec further combines several computational techniques from feature representation learning, graph mining, and machine learning to generate or extract features, build the model, and predict the binding affinity between the drug and the target with no 3D structural data. We conducted extensive experiments to evaluate and demonstrate the robustness and efficiency of the proposed method on benchmark datasets used in state-of-the-art non-structure-based drug-target binding affinity studies. Affinity2Vec showed superior and competitive results compared to the state-of-the-art methods based on several evaluation metrics, including mean squared error, r_m^2, concordance index, and area under the precision-recall curve.
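Two of the evaluation metrics named in this abstract, mean squared error and the concordance index, can be illustrated with a minimal sketch. The function names and the toy affinity values are assumptions for illustration and are not taken from Affinity2Vec. The concordance index is written in its common pairwise form, in which a tie in the predictions counts as 0.5.

```python
from itertools import combinations

def mean_squared_error(y_true, y_pred):
    """Average squared difference between measured and predicted affinities."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)

def concordance_index(y_true, y_pred):
    """Fraction of comparable pairs whose predicted order matches the true order."""
    concordant = 0.0
    comparable = 0
    for i, j in combinations(range(len(y_true)), 2):
        if y_true[i] == y_true[j]:
            continue  # pairs with equal measured affinity are not comparable
        comparable += 1
        true_diff = y_true[i] - y_true[j]
        pred_diff = y_pred[i] - y_pred[j]
        if pred_diff == 0:
            concordant += 0.5  # tied prediction counts as half
        elif (true_diff > 0) == (pred_diff > 0):
            concordant += 1.0
    return concordant / comparable if comparable else float("nan")

# Hypothetical affinity values, for illustration only.
measured = [5.0, 6.2, 7.1, 8.4]
predicted = [5.3, 6.0, 7.5, 7.4]
print(mean_squared_error(measured, predicted))
print(concordance_index(measured, predicted))
```

Mean squared error penalizes the size of each error, whereas the concordance index only checks whether pairs are ranked in the right order, so a model can do well on one and poorly on the other.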
|
43
|
Nguyen TM, Nguyen T, Le TM, Tran T. GEFA: Early Fusion Approach in Drug-Target Affinity Prediction. IEEE/ACM Transactions on Computational Biology and Bioinformatics 2022; 19:718-728. [PMID: 34197324 DOI: 10.1109/tcbb.2021.3094217] [Citation(s) in RCA: 27] [Impact Index Per Article: 13.5] [Indexed: 06/13/2023]
Abstract
Predicting the interaction between a compound and a target is crucial for rapid drug repurposing. Deep learning has been successfully applied to the drug-target affinity (DTA) problem. However, previous deep learning-based methods do not model the direct interactions between drug and protein residues. This can lead to inaccurate learning of the target representation, which may change due to drug binding effects. In addition, previous DTA methods learn protein representations solely from the small number of protein sequences in DTA datasets while neglecting proteins outside of those datasets. We propose GEFA (Graph Early Fusion Affinity), a novel graph-in-graph neural network with an attention mechanism that addresses the changes in target representation caused by binding effects. Specifically, a drug is modeled as a graph of atoms, which then serves as a node in a larger graph of the residue-drug complex. The resulting model is an expressive deep nested graph neural network. We also use a pre-trained protein representation powered by recent efforts in learning contextualized protein representations. The experiments are conducted under different settings to evaluate scenarios such as novel drugs or targets. The results demonstrate the effectiveness of the pre-trained protein embedding and the advantages of GEFA in modeling the nested graph for drug-target interaction.
|
44
|
Deep learning allows genome-scale prediction of Michaelis constants from structural features. PLoS Biol 2021; 19:e3001402. [PMID: 34665809 PMCID: PMC8525774 DOI: 10.1371/journal.pbio.3001402] [Citation(s) in RCA: 30] [Impact Index Per Article: 10.0] [Received: 12/11/2020] [Accepted: 08/26/2021] [Indexed: 01/09/2023] Open
Abstract
The Michaelis constant K_M describes the affinity of an enzyme for a specific substrate and is a central parameter in studies of enzyme kinetics and cellular physiology. As measurements of K_M are often difficult and time-consuming, experimental estimates exist for only a minority of enzyme–substrate combinations even in model organisms. Here, we build and train an organism-independent model that successfully predicts K_M values for natural enzyme–substrate combinations using machine and deep learning methods. Predictions are based on a task-specific molecular fingerprint of the substrate, generated using a graph neural network, and on a deep numerical representation of the enzyme’s amino acid sequence. We provide genome-scale K_M predictions for 47 model organisms, which can be used to approximately relate metabolite concentrations to cellular physiology and to aid in the parameterization of kinetic models of cellular metabolism. To understand the action of an enzyme, we need to know its affinity for its substrates, quantified by Michaelis constants, but these are difficult to measure experimentally. This study shows that a deep learning model can predict them from structural features of the enzyme and substrate, providing K_M predictions for all enzymes across 47 model organisms.
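For context, the Michaelis constant enters the standard Michaelis–Menten rate law, v = v_max · [S] / (K_M + [S]), so it equals the substrate concentration at which the reaction runs at half its maximal rate. The sketch below illustrates only that textbook relation; it is not the prediction model described in the abstract, and the function name and numeric values are assumptions chosen for illustration.

```python
def michaelis_menten_rate(substrate_conc, v_max, k_m):
    """Reaction rate from the Michaelis-Menten law: v_max * [S] / (K_M + [S])."""
    if substrate_conc < 0 or k_m <= 0:
        raise ValueError("substrate_conc must be >= 0 and k_m must be > 0")
    return v_max * substrate_conc / (k_m + substrate_conc)

# Hypothetical values in arbitrary units, for illustration only.
v_max = 10.0
k_m = 2.0
for substrate_conc in (0.5, 2.0, 20.0):
    print(substrate_conc, michaelis_menten_rate(substrate_conc, v_max, k_m))
```

A smaller K_M means the enzyme approaches its maximal rate at lower substrate concentrations, which is why K_M is commonly read as a measure of affinity.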
|
45
|
Thafar MA, Olayan RS, Albaradei S, Bajic VB, Gojobori T, Essack M, Gao X. DTi2Vec: Drug-target interaction prediction using network embedding and ensemble learning. J Cheminform 2021; 13:71. [PMID: 34551818 PMCID: PMC8459562 DOI: 10.1186/s13321-021-00552-w] [Citation(s) in RCA: 19] [Impact Index Per Article: 6.3] [Received: 12/24/2020] [Accepted: 09/05/2021] [Indexed: 11/21/2022] Open
Abstract
Drug-target interaction (DTI) prediction is a crucial step in drug discovery and repositioning as it reduces experimental validation costs if done right. Thus, developing in-silico methods to predict potential DTI has become a competitive research niche, with one of its main focuses being improving the prediction accuracy. Using machine learning (ML) models for this task, specifically network-based approaches, is effective and has shown great advantages over the other computational methods. However, ML model development involves upstream hand-crafted feature extraction and other processes that impact prediction accuracy. Thus, network-based representation learning techniques that provide automated feature extraction combined with traditional ML classifiers dealing with downstream link prediction tasks may be better-suited paradigms. Here, we present such a method, DTi2Vec, which identifies DTIs using network representation learning and ensemble learning techniques. DTi2Vec constructs the heterogeneous network, and then it automatically generates features for each drug and target using the nodes embedding technique. DTi2Vec demonstrated its ability in drug-target link prediction compared to several state-of-the-art network-based methods, using four benchmark datasets and large-scale data compiled from DrugBank. DTi2Vec showed a statistically significant increase in the prediction performances in terms of AUPR. We verified the "novel" predicted DTIs using several databases and scientific literature. DTi2Vec is a simple yet effective method that provides high DTI prediction performance while being scalable and efficient in computation, translating into a powerful drug repositioning tool.
Affiliation(s)
- Maha A Thafar
- Computer, Electrical and Mathematical Sciences and Engineering Division (CEMSE), Computational Bioscience Research Center, Computer (CBRC), King Abdullah University of Science and Technology (KAUST), Thuwal, Kingdom of Saudi Arabia
- College of Computers and Information Technology, Computer Science Department, Taif University, Taif, Kingdom of Saudi Arabia
| | - Rawan S Olayan
- The Jackson Laboratory for Genomic Medicine, Farmington, CT, USA
| | - Somayah Albaradei
- Computer, Electrical and Mathematical Sciences and Engineering Division (CEMSE), Computational Bioscience Research Center, Computer (CBRC), King Abdullah University of Science and Technology (KAUST), Thuwal, Kingdom of Saudi Arabia
- Faculty of Computing and Information Technology, King Abdulaziz University, Jeddah, Kingdom of Saudi Arabia
| | - Vladimir B Bajic
- Computer, Electrical and Mathematical Sciences and Engineering Division (CEMSE), Computational Bioscience Research Center, Computer (CBRC), King Abdullah University of Science and Technology (KAUST), Thuwal, Kingdom of Saudi Arabia
| | - Takashi Gojobori
- Computer, Electrical and Mathematical Sciences and Engineering Division (CEMSE), Computational Bioscience Research Center, Computer (CBRC), King Abdullah University of Science and Technology (KAUST), Thuwal, Kingdom of Saudi Arabia
| | - Magbubah Essack
- Computer, Electrical and Mathematical Sciences and Engineering Division (CEMSE), Computational Bioscience Research Center, Computer (CBRC), King Abdullah University of Science and Technology (KAUST), Thuwal, Kingdom of Saudi Arabia.
| | - Xin Gao
- Computer, Electrical and Mathematical Sciences and Engineering Division (CEMSE), Computational Bioscience Research Center, Computer (CBRC), King Abdullah University of Science and Technology (KAUST), Thuwal, Kingdom of Saudi Arabia.
| |
|
46
|
Kim J, Park S, Min D, Kim W. Comprehensive Survey of Recent Drug Discovery Using Deep Learning. Int J Mol Sci 2021; 22:9983. [PMID: 34576146 PMCID: PMC8470987 DOI: 10.3390/ijms22189983] [Citation(s) in RCA: 33] [Impact Index Per Article: 11.0] [Received: 08/24/2021] [Revised: 09/09/2021] [Accepted: 09/10/2021] [Indexed: 02/07/2023] Open
Abstract
Drug discovery based on artificial intelligence has been in the spotlight recently as it significantly reduces the time and cost required for developing novel drugs. With the advancement of deep learning (DL) technology and the growth of drug-related data, numerous deep-learning-based methodologies are emerging at all steps of drug development processes. In particular, pharmaceutical chemists have faced significant issues with regard to selecting and designing potential drugs for a target of interest to enter preclinical testing. The two major challenges are prediction of interactions between drugs and druggable targets and generation of novel molecular structures suitable for a target of interest. Therefore, we reviewed recent deep-learning applications in drug-target interaction (DTI) prediction and de novo drug design. In addition, we introduce a comprehensive summary of a variety of drug and protein representations, DL models, and commonly used benchmark datasets or tools for model training and testing. Finally, we present the remaining challenges for the promising future of DL-based DTI prediction and de novo drug design.
Affiliation(s)
- Jintae Kim
- KaiPharm Co., Ltd., Seoul 03759, Korea; (J.K.); (S.P.)
| | - Sera Park
- KaiPharm Co., Ltd., Seoul 03759, Korea; (J.K.); (S.P.)
| | - Dongbo Min
- Computer Vision Lab, Department of Computer Science and Engineering, Ewha Womans University, Seoul 03760, Korea
| | - Wankyu Kim
- KaiPharm Co., Ltd., Seoul 03759, Korea; (J.K.); (S.P.)
- System Pharmacology Lab, Department of Life Sciences, Ewha Womans University, Seoul 03760, Korea
| |
|
47
|
Xiong Z, Jeon M, Allaway RJ, Kang J, Park D, Lee J, Jeon H, Ko M, Jiang H, Zheng M, Tan AC, Guo X, Dang KK, Tropsha A, Hecht C, Das TK, Carlson HA, Abagyan R, Guinney J, Schlessinger A, Cagan R. Crowdsourced identification of multi-target kinase inhibitors for RET- and TAU- based disease: The Multi-Targeting Drug DREAM Challenge. PLoS Comput Biol 2021; 17:e1009302. [PMID: 34520464 PMCID: PMC8483411 DOI: 10.1371/journal.pcbi.1009302] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Received: 02/26/2021] [Revised: 09/30/2021] [Accepted: 07/23/2021] [Indexed: 01/22/2023] Open
Abstract
A continuing challenge in modern medicine is the identification of safer and more efficacious drugs. Precision therapeutics, which have one molecular target, have been long promised to be safer and more effective than traditional therapies. This approach has proven to be challenging for multiple reasons including lack of efficacy, rapidly acquired drug resistance, and narrow patient eligibility criteria. An alternative approach is the development of drugs that address the overall disease network by targeting multiple biological targets ('polypharmacology'). Rational development of these molecules will require improved methods for predicting single chemical structures that target multiple drug targets. To address this need, we developed the Multi-Targeting Drug DREAM Challenge, in which we challenged participants to predict single chemical entities that target pro-targets but avoid anti-targets for two unrelated diseases: RET-based tumors and a common form of inherited Tauopathy. Here, we report the results of this DREAM Challenge and the development of two neural network-based machine learning approaches that were applied to the challenge of rational polypharmacology. Together, these platforms provide a potentially useful first step towards developing lead therapeutic compounds that address disease complexity through rational polypharmacology.
Affiliation(s)
- Zhaoping Xiong
- Shanghai Institute for Advanced Immunochemical Studies, ShanghaiTech University, Shanghai, China
| | - Minji Jeon
- Department of Computer Science and Engineering, Korea University, Seoul, Republic of Korea
| | | | - Jaewoo Kang
- Department of Computer Science and Engineering, Korea University, Seoul, Republic of Korea
- Interdisciplinary Graduate Program in Bioinformatics, Korea University, Seoul, Republic of Korea
| | - Donghyeon Park
- Department of Computer Science and Engineering, Korea University, Seoul, Republic of Korea
| | - Jinhyuk Lee
- Department of Computer Science and Engineering, Korea University, Seoul, Republic of Korea
| | - Hwisang Jeon
- Interdisciplinary Graduate Program in Bioinformatics, Korea University, Seoul, Republic of Korea
- Drug Discovery and Design Center, State Key Laboratory of Drug Research, Shanghai Institute of Materia Medica, Chinese Academy of Sciences, Shanghai, China
| | - Miyoung Ko
- Department of Computer Science and Engineering, Korea University, Seoul, Republic of Korea
| | - Hualiang Jiang
- Shanghai Institute for Advanced Immunochemical Studies, ShanghaiTech University, Shanghai, China
- Drug Discovery and Design Center, State Key Laboratory of Drug Research, Shanghai Institute of Materia Medica, Chinese Academy of Sciences, Shanghai, China
| | - Mingyue Zheng
- Drug Discovery and Design Center, State Key Laboratory of Drug Research, Shanghai Institute of Materia Medica, Chinese Academy of Sciences, Shanghai, China
| | - Aik Choon Tan
- Department of Biostatistics and Bioinformatics, Moffitt Cancer Center, Tampa, Florida, United States of America
| | - Xindi Guo
- Sage Bionetworks, Seattle, Washington, United States of America
| | | | - Kristen K. Dang
- Sage Bionetworks, Seattle, Washington, United States of America
| | - Alex Tropsha
- Laboratory for Molecular Modeling, Division of Chemical Biology and Medicinal Chemistry, UNC Eshelman School of Pharmacy, University of North Carolina, Chapel Hill, North Carolina, United States of America
| | - Chana Hecht
- Department of Cell, Developmental, and Regenerative Biology, Icahn School of Medicine at Mount Sinai, New York City, New York, United States of America
| | - Tirtha K. Das
- Department of Cell, Developmental, and Regenerative Biology, Icahn School of Medicine at Mount Sinai, New York City, New York, United States of America
| | - Heather A. Carlson
- Department of Medicinal Chemistry, University of Michigan, Ann Arbor, Michigan, United States of America
| | - Ruben Abagyan
- Skaggs School of Pharmacy and Pharmaceutical Sciences, University of California, San Diego, California, United States of America
| | - Justin Guinney
- Sage Bionetworks, Seattle, Washington, United States of America
| | - Avner Schlessinger
- Department of Pharmacological Sciences, Icahn School of Medicine at Mount Sinai, New York City, New York, United States of America
| | - Ross Cagan
- Department of Cell, Developmental, and Regenerative Biology, Icahn School of Medicine at Mount Sinai, New York City, New York, United States of America
- Institute of Cancer Sciences, University of Glasgow; Glasgow, Scotland, United Kingdom
| |
|
48
|
Rifaioglu AS, Cetin Atalay R, Cansen Kahraman D, Doğan T, Martin M, Atalay V. MDeePred: novel multi-channel protein featurization for deep learning-based binding affinity prediction in drug discovery. Bioinformatics 2021; 37:693-704. [PMID: 33067636 DOI: 10.1093/bioinformatics/btaa858] [Citation(s) in RCA: 44] [Impact Index Per Article: 14.7] [Received: 05/02/2020] [Revised: 08/16/2020] [Accepted: 10/06/2020] [Indexed: 12/20/2022] Open
Abstract
MOTIVATION Identification of interactions between bioactive small molecules and target proteins is crucial for novel drug discovery, drug repurposing and uncovering off-target effects. Due to the tremendous size of the chemical space, experimental bioactivity screening efforts require the aid of computational approaches. Although deep learning models have been successful in predicting bioactive compounds, effective and comprehensive featurization of proteins, to be given as input to deep neural networks, remains a challenge. RESULTS Here, we present a novel protein featurization approach to be used in deep learning-based compound-target protein binding affinity prediction. In the proposed method, multiple types of protein features such as sequence, structural, evolutionary and physicochemical properties are incorporated within multiple 2D vectors, which are then fed to state-of-the-art pairwise input hybrid deep neural networks to predict the real-valued compound-target protein interactions. The method adopts the proteochemometric approach, where both the compound and target protein features are used at the input level to model their interaction. The whole system is called MDeePred, and it is a new method for computational drug discovery and repositioning. We evaluated MDeePred on well-known benchmark datasets and compared its performance with the state-of-the-art methods. We also performed an in vitro comparative analysis of MDeePred predictions against the action of selected kinase inhibitors on cancer cells. MDeePred is a scalable method with sufficiently high predictive performance. The featurization approach proposed here can also be utilized for other protein-related predictive tasks. AVAILABILITY AND IMPLEMENTATION The source code, datasets, additional information and user instructions of MDeePred are available at https://github.com/cansyl/MDeePred. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
Affiliation(s)
- A S Rifaioglu
- Department of Computer Engineering, Middle East Technical University, Ankara, Turkey.,Department of Computer Engineering, İskenderun Technical University, Hatay, Turkey
| | - R Cetin Atalay
- Graduate School of Informatics, Middle East Technical University, Ankara, Turkey.,Section of Pulmonary and Critical Care Medicine, The University of Chicago, Chicago, IL, USA
| | - D Cansen Kahraman
- Graduate School of Informatics, Middle East Technical University, Ankara, Turkey
| | - T Doğan
- Department of Computer Engineering, Hacettepe University, Ankara, Turkey.,Institute of Informatics, Hacettepe University, Ankara, Turkey
| | - M Martin
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Cambridge, Hinxton, UK
| | - V Atalay
- Department of Computer Engineering, Middle East Technical University, Ankara, Turkey
| |
|
49
|
Binding affinity prediction for binary drug-target interactions using semi-supervised transfer learning. J Comput Aided Mol Des 2021; 35:883-900. [PMID: 34189637 DOI: 10.1007/s10822-021-00404-7] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 03/22/2021] [Accepted: 06/18/2021] [Indexed: 10/21/2022]
Abstract
In the field of drug-target interaction prediction, the majority of approaches have formulated the problem as a simple binary classification task. These methods used binary drug-target interaction datasets to train their models. However, the prediction of drug-target interactions is inherently a regression problem, since interactions are identified according to the binding affinity between drugs and targets. This paper deals with binary drug-target interactions and tries to identify them based on the binding strength of a drug and its target. To this end, we propose a semi-supervised transfer learning approach to predict the binding affinity in a continuous spectrum for binary interactions. Due to the lack of training data with continuous binding affinity in the target domain, the proposed method makes use of the information available in other domains (i.e. the source domain) via the transfer learning approach. The general framework of our algorithm is based on an objective function that considers the performance in both source and target domains, as well as the unlabeled data in the target domain via a regularization term. To optimize this objective function, we make use of a gradient boosting machine, which constructs the final model. To assess the performance of the proposed method, we have used some benchmark datasets with binary interactions for four classes of human proteins. Our algorithm identifies interactions in a more realistic situation. According to the experimental results, our regression model performs better than the state-of-the-art methods in some procedures.
|
50
|
Lee HG, Kang S, Lee JS. Binding characteristics of staphylococcal protein A and streptococcal protein G for fragment crystallizable portion of human immunoglobulin G. Comput Struct Biotechnol J 2021; 19:3372-3383. [PMID: 34194664 PMCID: PMC8217638 DOI: 10.1016/j.csbj.2021.05.048] [Citation(s) in RCA: 16] [Impact Index Per Article: 5.3] [Received: 02/03/2021] [Revised: 05/29/2021] [Accepted: 05/30/2021] [Indexed: 12/03/2022] Open
Abstract
In the wide array of physiological processes, protein-protein interactions and their binding are the most basal activities for achieving adequate biological metabolism. Among the studies on binding proteins, the examination of interactions between immunoglobulin G (IgG) and natural immunoglobulin-binding ligands, such as staphylococcal protein A (spA) and streptococcal protein G (spG), is essential in the development of pharmaceutical science, biotechnology, and affinity chromatography. The widespread utilization of IgG-spA/spG binding characteristics has allowed researchers to investigate these molecular interactions. However, the detailed binding strength of each ligand and the corresponding binding mechanisms have yet to be fully investigated. In this study, the authors analyzed the binding strengths of IgG-spA and IgG-spG complexes and identified the mechanisms enabling these bindings using molecular dynamics simulation, steered molecular dynamics, and advanced Poisson-Boltzmann Solver simulations. Based on the presented data, the binding strength of the spA ligand was found to significantly exceed that of the spG ligand. To find out which non-covalent interactions or amino acid sites have a dominant role in the tight binding of these ligands, further detailed analyses of electrostatic interactions, hydrophobic bonding, and binding free energies have been performed. In investigating their binding affinity, a relatively independent and different unbinding mechanism was found in each ligand. These distinctly different mechanisms were observed to be highly correlated to the protein secondary and tertiary structures of spA and spG ligands, as explicated from the perspective of hydrogen bonding.
Key Words
- AFM, Atomic Force Microscopy
- APBS, Advanced Poisson–Boltzmann Solver
- Affinity chromatography
- BIR, Between Protein–Protein Interface Residues
- ELISA, Enzyme-linked Immunosorbent Assays
- Fc, Fragment Crystallizable
- IgG, Immunoglobulin G
- Immunoglobulin G
- MD, Molecular Dynamics
- MM/PBSA, Molecular Mechanics Poisson–Boltzmann Surface Area
- Molecular dynamics
- Protein A
- Protein G
- Protein docking
- RMSD, Root Mean Square Deviation
- SASA, Solvent Accessible Surface Area
- SMD, Steered Molecular Dynamics
- spA, Staphylococcal Protein A
- spG, Streptococcal Protein G
Affiliation(s)
- Hae Gon Lee
- Department of Mechanical Engineering, Yonsei University, Seoul 03722, South Korea
| | - Shinill Kang
- Department of Mechanical Engineering, Yonsei University, Seoul 03722, South Korea
| | - Joon Sang Lee
- Department of Mechanical Engineering, Yonsei University, Seoul 03722, South Korea
| |
|