1
Vittorio S, Lunghini F, Morerio P, Gadioli D, Orlandini S, Silva P, Martinovic J, Pedretti A, Bonanni D, Del Bue A, Palermo G, Vistoli G, Beccari AR. Addressing docking pose selection with structure-based deep learning: Recent advances, challenges and opportunities. Comput Struct Biotechnol J 2024; 23:2141-2151. [PMID: 38827235 PMCID: PMC11141151 DOI: 10.1016/j.csbj.2024.05.024] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/23/2024] [Revised: 05/15/2024] [Accepted: 05/15/2024] [Indexed: 06/04/2024] Open
Abstract
Molecular docking is a widely used technique in drug discovery to predict the binding mode of a given ligand to its target. However, identifying the near-native binding pose in docking experiments remains challenging: the scoring functions currently employed by docking programs are parametrized to predict binding affinity and therefore often fail to identify the native binding conformation of the ligand. Selecting the correct binding mode is crucial to obtaining meaningful results and to conveniently optimizing new hit compounds. Deep learning (DL) algorithms have attracted growing interest in this regard for their ability to extract relevant information directly from the protein-ligand structure. Our review aims to present recent advances in the development of DL-based pose selection approaches, discussing limitations and possible future directions. Moreover, a comparison of the performance of some classical scoring functions and DL-based methods in selecting the correct binding mode is reported. In this regard, two novel DL-based pose selectors developed by us are presented.
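To make the pose selection task concrete, the sketch below computes a top-1 success rate: for each complex, the pose with the best score is picked, and it counts as a hit if its RMSD to the native pose is under a cutoff. The 2.0 Å cutoff, the function name `top1_success_rate`, and the toy numbers are illustrative assumptions, not values taken from the review.

```python
import numpy as np

def top1_success_rate(scores, rmsds, rmsd_cutoff=2.0, higher_is_better=True):
    """Fraction of complexes whose best-scored pose is near-native.

    scores, rmsds: one 1D array per complex, with one value per docked pose.
    rmsd_cutoff: assumed near-native threshold in angstroms (hypothetical default).
    """
    hits = 0
    for s, r in zip(scores, rmsds):
        s = np.asarray(s, dtype=float)
        r = np.asarray(r, dtype=float)
        # Index of the pose the selector would report as its first choice
        best = np.argmax(s) if higher_is_better else np.argmin(s)
        hits += int(r[best] <= rmsd_cutoff)
    return hits / len(scores)

# Toy data: two complexes with three and two poses (made-up numbers)
scores = [np.array([0.9, 0.2, 0.5]), np.array([0.1, 0.8])]
rmsds = [np.array([1.2, 6.0, 3.1]), np.array([0.9, 4.5])]
rate = top1_success_rate(scores, rmsds)
print(rate)
```

The same function can compare a classical scoring function with a DL-based selector by passing each method's scores for the same set of poses.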
Affiliation(s)
- Serena Vittorio
  - Dipartimento di Scienze Farmaceutiche, Università degli Studi di Milano, Via Luigi Mangiagalli 25, I-20133 Milano, Italy
- Filippo Lunghini
  - EXSCALATE, Dompé Farmaceutici SpA, Via Tommaso de Amicis 95, 80123 Naples, Italy
- Pietro Morerio
  - Pattern Analysis and Computer Vision, Fondazione Istituto Italiano di Tecnologia, Via Morego, 30, 16163 Genova, Italy
- Davide Gadioli
  - Dipartimento di Elettronica Informazione e Bioingegneria, Politecnico di Milano, Via Ponzio 34/5, I-20133 Milano, Italy
- Sergio Orlandini
  - SCAI, SuperComputing Applications and Innovation Department, CINECA, Via dei Tizii 6, Rome 00185, Italy
- Paulo Silva
  - IT4Innovations, VSB – Technical University of Ostrava, 17. listopadu 2172/15, 70800 Ostrava-Poruba, Czech Republic
- Jan Martinovic
  - IT4Innovations, VSB – Technical University of Ostrava, 17. listopadu 2172/15, 70800 Ostrava-Poruba, Czech Republic
- Alessandro Pedretti
  - Dipartimento di Scienze Farmaceutiche, Università degli Studi di Milano, Via Luigi Mangiagalli 25, I-20133 Milano, Italy
- Domenico Bonanni
  - Department of Physical and Chemical Sciences, University of L'Aquila, via Vetoio, L'Aquila 67010, Italy
- Alessio Del Bue
  - Pattern Analysis and Computer Vision, Fondazione Istituto Italiano di Tecnologia, Via Morego, 30, 16163 Genova, Italy
- Gianluca Palermo
  - Dipartimento di Elettronica Informazione e Bioingegneria, Politecnico di Milano, Via Ponzio 34/5, I-20133 Milano, Italy
- Giulio Vistoli
  - Dipartimento di Scienze Farmaceutiche, Università degli Studi di Milano, Via Luigi Mangiagalli 25, I-20133 Milano, Italy
- Andrea R. Beccari
  - EXSCALATE, Dompé Farmaceutici SpA, Via Tommaso de Amicis 95, 80123 Naples, Italy
2
Shi W, Yang H, Xie L, Yin XX, Zhang Y. A review of machine learning-based methods for predicting drug-target interactions. Health Inf Sci Syst 2024; 12:30. [PMID: 38617016 PMCID: PMC11014838 DOI: 10.1007/s13755-024-00287-6] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/11/2023] [Accepted: 03/04/2024] [Indexed: 04/16/2024] Open
Abstract
The prediction of drug-target interactions (DTI) is a crucial preliminary stage in drug discovery and development, given the substantial risk of failure and the prolonged validation period associated with in vitro and in vivo experiments. In the contemporary landscape, various machine learning-based methods have emerged as indispensable tools for DTI prediction. This paper begins by placing emphasis on the data representation employed by these methods, delineating five representations for drugs and four for proteins. The methods are then categorized into traditional machine learning-based approaches and deep learning-based ones, with a discussion of representative approaches in each category and the introduction of a novel taxonomy for deep neural network models in DTI prediction. Additionally, we present a synthesis of commonly used datasets and evaluation metrics to facilitate practical implementation. In conclusion, we address current challenges and outline potential future directions in this research field.
Affiliation(s)
- Wen Shi
  - Cyberspace Institute of Advanced Technology, Guangzhou University, Guangzhou 510006, China
  - School of Computer Science and Technology, Zhejiang Normal University, Jinhua 321004, China
- Hong Yang
  - Cyberspace Institute of Advanced Technology, Guangzhou University, Guangzhou 510006, China
- Linhai Xie
  - State Key Laboratory of Proteomics, National Center for Protein Sciences (Beijing), Beijing 102206, China
- Xiao-Xia Yin
  - Cyberspace Institute of Advanced Technology, Guangzhou University, Guangzhou 510006, China
- Yanchun Zhang
  - School of Computer Science and Technology, Zhejiang Normal University, Jinhua 321004, China
  - Department of New Networks, Peng Cheng Laboratory, Shenzhen 518000, China
3
Son H, Lee S, Kim J, Park H, Hwang MH, Yi GS. BASE: a web service for providing compound-protein binding affinity prediction datasets with reduced similarity bias. BMC Bioinformatics 2024; 25:340. [PMID: 39478454 PMCID: PMC11526688 DOI: 10.1186/s12859-024-05968-3] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/06/2024] [Accepted: 10/23/2024] [Indexed: 11/02/2024] Open
Abstract
BACKGROUND: Deep learning-based drug-target affinity (DTA) prediction methods have shown impressive performance, despite a high number of training parameters relative to the available data. Previous studies have highlighted the presence of dataset bias by suggesting that models trained solely on protein or ligand structures may perform similarly to those trained on complex structures. However, these studies did not propose solutions and focused solely on analyzing complex structure-based models. Even when ligands are excluded, protein-only models trained on complex structures still incorporate some ligand information at the binding sites. Therefore, it is unclear whether binding affinity can be accurately predicted using only compound or protein features due to potential dataset bias. In this study, we expanded our analysis to comprehensive databases and investigated dataset bias through compound and protein feature-based methods using multilayer perceptron models. We assessed the impact of this bias on current prediction models and proposed the binding affinity similarity explorer (BASE) web service, which provides bias-reduced datasets.
RESULTS: By analyzing eight binding affinity databases using multilayer perceptron models, we confirmed a bias where the compound-protein binding affinity can be accurately predicted using compound features alone. This bias arises because most compounds show consistent binding affinities due to high sequence or functional similarity among their target proteins. Our Uniform Manifold Approximation and Projection analysis based on compound fingerprints further revealed that low and high variation compounds do not exhibit significant structural differences. This suggests that the primary factor driving the consistent binding affinities is protein similarity rather than compound structure. We addressed this bias by creating datasets with progressively reduced protein similarity between the training and test sets, observing significant changes in model performance. We developed the BASE web service to allow researchers to download and utilize these datasets. Feature importance analysis revealed that previous models heavily relied on protein features. However, using bias-reduced datasets increased the importance of compound and interaction features, enabling a more balanced extraction of key features.
CONCLUSIONS: We propose the BASE web service, providing both the affinity prediction results of existing models and bias-reduced datasets. These resources contribute to the development of generalized and robust predictive models, enhancing the accuracy and reliability of DTA predictions in the drug discovery process. BASE is freely available online at https://synbi2024.kaist.ac.kr/base.
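The idea of progressively reducing protein similarity between training and test sets can be illustrated with a small filter over a pairwise similarity matrix. This is a minimal sketch under stated assumptions: the function name `similarity_filtered_train`, the toy matrix, and the thresholds are hypothetical, and the abstract does not say how BASE actually constructs its datasets.

```python
import numpy as np

def similarity_filtered_train(similarity, test_idx, max_similarity):
    """Return indices of training proteins that are dissimilar to the test set.

    similarity: square matrix of pairwise protein similarities in [0, 1].
    test_idx: indices of proteins held out for testing.
    max_similarity: assumed cutoff; lower values give a stricter split.
    """
    similarity = np.asarray(similarity, dtype=float)
    test_idx = np.asarray(test_idx)
    candidates = np.setdiff1d(np.arange(similarity.shape[0]), test_idx)
    # Highest similarity of each candidate protein to any test protein
    max_to_test = similarity[np.ix_(candidates, test_idx)].max(axis=1)
    return candidates[max_to_test <= max_similarity]

# Toy similarity matrix for four proteins (made-up values)
sim = np.array([
    [1.0, 0.9, 0.3, 0.2],
    [0.9, 1.0, 0.4, 0.1],
    [0.3, 0.4, 1.0, 0.6],
    [0.2, 0.1, 0.6, 1.0],
])
test_proteins = [0]
loose = similarity_filtered_train(sim, test_proteins, max_similarity=0.95)
strict = similarity_filtered_train(sim, test_proteins, max_similarity=0.35)
print(loose, strict)
```

Lowering `max_similarity` step by step yields a series of training sets that share less and less with the test proteins, which is one way to observe how much a model's performance depends on protein similarity.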
Affiliation(s)
- Hyojin Son
  - Department of Bio and Brain Engineering, Korea Advanced Institute of Science and Technology (KAIST), Daejeon, Republic of Korea
- Sechan Lee
  - Department of Bio and Brain Engineering, Korea Advanced Institute of Science and Technology (KAIST), Daejeon, Republic of Korea
- Jaeuk Kim
  - Department of Bio and Brain Engineering, Korea Advanced Institute of Science and Technology (KAIST), Daejeon, Republic of Korea
- Haangik Park
  - Department of Bio and Brain Engineering, Korea Advanced Institute of Science and Technology (KAIST), Daejeon, Republic of Korea
- Myeong-Ha Hwang
  - Department of Bio and Brain Engineering, Korea Advanced Institute of Science and Technology (KAIST), Daejeon, Republic of Korea
- Gwan-Su Yi
  - Department of Bio and Brain Engineering, Korea Advanced Institute of Science and Technology (KAIST), Daejeon, Republic of Korea
4
Min Y, Wei Y, Wang P, Wang X, Li H, Wu N, Bauer S, Zheng S, Shi Y, Wang Y, Wu J, Zhao D, Zeng J. From Static to Dynamic Structures: Improving Binding Affinity Prediction with Graph-Based Deep Learning. Adv Sci (Weinh) 2024; 11:e2405404. [PMID: 39206846 PMCID: PMC11516055 DOI: 10.1002/advs.202405404] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/17/2024] [Revised: 07/29/2024] [Indexed: 09/04/2024]
Abstract
Accurate prediction of protein-ligand binding affinities is an essential challenge in structure-based drug design. Despite recent advances in data-driven methods for affinity prediction, their accuracy is still limited, partially because they only take advantage of static crystal structures while the actual binding affinities are generally determined by the thermodynamic ensembles between proteins and ligands. One effective way to approximate such a thermodynamic ensemble is to use molecular dynamics (MD) simulation. Here, an MD dataset containing 3,218 different protein-ligand complexes is curated, and Dynaformer, a graph-based deep learning model, is further developed to predict the binding affinities by learning the geometric characteristics of the protein-ligand interactions from the MD trajectories. In silico experiments demonstrated that the model exhibits state-of-the-art scoring and ranking power on the CASF-2016 benchmark dataset, outperforming the methods hitherto reported. Moreover, in a virtual screening on heat shock protein 90 (HSP90) using Dynaformer, 20 candidates are identified and their binding affinities are further experimentally validated. Dynaformer displayed promising results in virtual drug screening, revealing 12 hit compounds (two are in the submicromolar range), including several novel scaffolds. Overall, these results demonstrated that the approach offers a promising avenue for accelerating the early drug discovery process.
Affiliation(s)
- Yaosen Min
  - Institute for Interdisciplinary Information Sciences, Tsinghua University, Beijing 100084, China
- Ye Wei
  - Institute for Interdisciplinary Information Sciences, Tsinghua University, Beijing 100084, China
- Peizhuo Wang
  - Institute for Interdisciplinary Information Sciences, Tsinghua University, Beijing 100084, China
  - School of Life Science and Technology, Xidian University, Xi'an 710071, Shaanxi, China
- Xiaoting Wang
  - School of Medicine, Tsinghua University, Beijing 100084, China
- Han Li
  - Institute for Interdisciplinary Information Sciences, Tsinghua University, Beijing 100084, China
- Nian Wu
  - Institute for Interdisciplinary Information Sciences, Tsinghua University, Beijing 100084, China
- Stefan Bauer
  - Department of Intelligent Systems, KTH, Stockholm 10044, Sweden
- Yu Shi
  - Microsoft Research Asia, Beijing 100080, China
- Yingheng Wang
  - Department of Electrical Engineering, Tsinghua University, Beijing 100084, China
- Ji Wu
  - Department of Electrical Engineering, Tsinghua University, Beijing 100084, China
- Dan Zhao
  - Institute for Interdisciplinary Information Sciences, Tsinghua University, Beijing 100084, China
- Jianyang Zeng
  - School of Engineering, Westlake University, Hangzhou 310030, China
  - Research Center for Industries of the Future, Westlake University, Hangzhou 310030, China
  - Present address: Westlake Laboratory of Life Sciences and Biomedicine, Westlake University, Hangzhou 310024, China
5
Kokudeva M, Vichev M, Naseva E, Miteva DG, Velikova T. Artificial intelligence as a tool in drug discovery and development. World J Exp Med 2024; 14:96042. [PMID: 39312699 PMCID: PMC11372739 DOI: 10.5493/wjem.v14.i3.96042] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/25/2024] [Revised: 08/06/2024] [Accepted: 08/12/2024] [Indexed: 08/29/2024] Open
Abstract
The rapidly advancing field of artificial intelligence (AI) has garnered substantial attention for its potential application in drug discovery and development. This opinion review critically examined the feasibility and prospects of integrating AI as a transformative tool in the pharmaceutical industry. AI, encompassing machine learning algorithms, deep learning, and data analytics, offers unprecedented opportunities to streamline and enhance various stages of drug development. This opinion review delved into the current landscape of AI-driven approaches, discussing their utilization in target identification, lead optimization, and predictive modeling of pharmacokinetics and toxicity. We aimed to scrutinize the integration of large-scale omics data, electronic health records, and chemical informatics, highlighting the power of AI in uncovering novel therapeutic targets and accelerating drug repurposing strategies. Despite the considerable potential of AI, the review also addressed inherent challenges, including data privacy concerns, interpretability of AI models, and the need for robust validation in real-world clinical settings. Additionally, we explored ethical considerations surrounding AI-driven decision-making in drug development. This opinion review provided a nuanced perspective on the transformative role of AI in drug discovery by discussing the existing literature and emerging trends, presenting critical insights and addressing potential hurdles. In conclusion, this study aimed to stimulate discourse within the scientific community and guide future endeavors to harness the full potential of AI in drug development.
Affiliation(s)
- Maria Kokudeva
  - Department of Pharmacology and Toxicology, Faculty of Pharmacy, Medical University of Sofia, Sofia 1000, Bulgaria
- Emilia Naseva
  - Faculty of Public Health, Medical University of Sofia, Sofia 1431, Bulgaria
- Dimitrina Georgieva Miteva
  - Department of Genetics, Faculty of Biology, Sofia University St. Kliment Ohridski, Sofia 1164, Bulgaria
  - Medical Faculty, Sofia University St. Kliment Ohridski, Sofia 1407, Bulgaria
- Tsvetelina Velikova
  - Medical Faculty, Sofia University St. Kliment Ohridski, Sofia 1407, Bulgaria
6
Issabayeva G, Kang OY, Choi SY, Hyun JY, Park SJ, Jeung HC, Lim HJ. Discovery of selective LATS inhibitors via scaffold hopping: enhancing drug-likeness and kinase selectivity for potential applications in regenerative medicine. RSC Med Chem 2024:d4md00492b. [PMID: 39345719 PMCID: PMC11428031 DOI: 10.1039/d4md00492b] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/02/2024] [Accepted: 09/10/2024] [Indexed: 10/01/2024] Open
Abstract
Due to its essential roles in cell proliferation and apoptosis, the precise regulation of the Hippo pathway through LATS presents a viable biological target for developing new drugs for cancer and regenerative diseases. However, currently available probes for selective and highly drug-like inhibition of LATS require further improvement in terms of activity, selectivity, and drug-like properties. Through scaffold hopping aided by docking studies and AI-assisted prediction of metabolic stabilities, we successfully identified an advanced LATS inhibitor demonstrating potent kinase activity, exceptional selectivity against other kinases, and superior oral pharmacokinetic profiles.
Affiliation(s)
- Guldana Issabayeva
  - Data Convergence Drug Research Center, Korea Research Institute of Chemical Technology, 141 Gajeong-ro, Daejeon 34114, Republic of Korea
  - Department of Medicinal Chemistry and Pharmacology, University of Science & Technology, 217 Gajeong-ro, Daejeon 34113, Republic of Korea
- On-Yu Kang
  - Data Convergence Drug Research Center, Korea Research Institute of Chemical Technology, 141 Gajeong-ro, Daejeon 34114, Republic of Korea
- Seong Yun Choi
  - Data Convergence Drug Research Center, Korea Research Institute of Chemical Technology, 141 Gajeong-ro, Daejeon 34114, Republic of Korea
  - Department of Medicinal Chemistry and Pharmacology, University of Science & Technology, 217 Gajeong-ro, Daejeon 34113, Republic of Korea
- Ji Young Hyun
  - Data Convergence Drug Research Center, Korea Research Institute of Chemical Technology, 141 Gajeong-ro, Daejeon 34114, Republic of Korea
  - Department of Medicinal Chemistry and Pharmacology, University of Science & Technology, 217 Gajeong-ro, Daejeon 34113, Republic of Korea
- Seong Jun Park
  - Data Convergence Drug Research Center, Korea Research Institute of Chemical Technology, 141 Gajeong-ro, Daejeon 34114, Republic of Korea
  - Department of Medicinal Chemistry and Pharmacology, University of Science & Technology, 217 Gajeong-ro, Daejeon 34113, Republic of Korea
- Hei-Cheul Jeung
  - Department of Medical Oncology, Yonsei University College of Medicine, 211 Eonju-ro, Gangnam-gu, Seoul 06273, Republic of Korea
- Hwan Jung Lim
  - Data Convergence Drug Research Center, Korea Research Institute of Chemical Technology, 141 Gajeong-ro, Daejeon 34114, Republic of Korea
  - Department of Medicinal Chemistry and Pharmacology, University of Science & Technology, 217 Gajeong-ro, Daejeon 34113, Republic of Korea
7
Li Y, Liang W, Peng L, Zhang D, Yang C, Li KC. Predicting Drug-Target Interactions Via Dual-Stream Graph Neural Network. IEEE/ACM Trans Comput Biol Bioinform 2024; 21:948-958. [PMID: 36074878 DOI: 10.1109/tcbb.2022.3204188] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Indexed: 06/15/2023]
Abstract
Drug-target interaction prediction is a crucial stage in drug discovery. However, brute-force search over a compound database is financially infeasible. The number of measured drug-target interaction records has increased in recent years, and the rich drug- and protein-related information allows the use of graph machine learning. Despite the advances in deep learning-enabled drug-target interaction prediction, open challenges remain: (1) the rich and complex relationships between drugs and proteins remain to be explored; (2) the intermediate node is not calibrated in the heterogeneous graph. To tackle these issues, this paper proposes a framework named DSG-DTI. Specifically, DSG-DTI combines a heterogeneous graph autoencoder with heterogeneous attention network-based matrix completion. Our framework ensures that the known types of nodes (e.g., drug, target, side effects, diseases) are precisely embedded into a high-dimensional space through our pretraining techniques. In addition, the attention-based heterogeneous graph matrix completion achieves highly competitive results via effective extraction of long-range dependencies. We verify our model on two publicly available benchmarks. The results show that the proposed scheme effectively predicts drug-target interactions and can generalize to newly registered drugs and targets with slight performance degradation, outperforming the other baselines in accuracy.
8
Zhang Y, Li J, Lin S, Zhao J, Xiong Y, Wei DQ. An end-to-end method for predicting compound-protein interactions based on simplified homogeneous graph convolutional network and pre-trained language model. J Cheminform 2024; 16:67. [PMID: 38849874 PMCID: PMC11162000 DOI: 10.1186/s13321-024-00862-9] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/29/2023] [Accepted: 05/19/2024] [Indexed: 06/09/2024] Open
Abstract
Identification of interactions between chemical compounds and proteins is crucial for various applications, including drug discovery, target identification, network pharmacology, and elucidation of protein functions. Deep neural network-based approaches are becoming increasingly popular for efficiently identifying compound-protein interactions with high-throughput capabilities, narrowing down the scope of candidates for traditional labor-intensive, time-consuming and expensive experimental techniques. In this study, we propose an end-to-end approach termed SPVec-SGCN-CPI, which uses a simplified graph convolutional network (SGCN) model with low-dimensional, continuous features generated by our previously developed model SPVec, together with graph topology information, to predict compound-protein interactions. The SGCN technique, which separates the local neighborhood aggregation and nonlinear layer-wise propagation steps, effectively aggregates K-order neighbor information while avoiding neighbor explosion and expediting training. The performance of the SPVec-SGCN-CPI method was assessed across three datasets and compared against four machine learning- and deep learning-based methods, as well as six state-of-the-art methods. Experimental results revealed that SPVec-SGCN-CPI outperformed all these competing methods, particularly excelling in unbalanced data scenarios. By propagating node features and topological information to the feature space, SPVec-SGCN-CPI effectively incorporates interactions between compounds and proteins, enabling the fusion of heterogeneity. Furthermore, our method scored all unlabeled data in ChEMBL, and the top five ranked compound-protein interactions were confirmed through molecular docking and existing evidence. These findings suggest that our model can reliably uncover compound-protein interactions within unlabeled compound-protein pairs, carrying substantial implications for drug re-profiling and discovery. In summary, SPVec-SGCN demonstrates its efficacy in accurately predicting compound-protein interactions, showcasing potential to enhance target identification and streamline drug discovery processes.
Scientific contributions: The methodology presented in this work not only enables the comparatively accurate prediction of compound-protein interactions but also, for the first time, takes sample imbalance, which is very common in the real world, and computational efficiency into consideration simultaneously, accelerating the target identification and drug discovery process.
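The separation of neighborhood aggregation from nonlinear propagation can be sketched as a one-off feature precomputation. The code below assumes the commonly used simplified graph convolution formulation, in which a symmetrically normalized adjacency matrix with self-loops is applied K times to the node features; whether SPVec-SGCN-CPI uses exactly this normalization is not stated in the abstract, and the function name `sgc_features` is hypothetical.

```python
import numpy as np

def sgc_features(adjacency, features, k):
    """Propagate node features over K hops with no nonlinearity in between.

    Assumed formulation: S = D^{-1/2} (A + I) D^{-1/2}, output = S^K X.
    """
    a = np.asarray(adjacency, dtype=float)
    x = np.asarray(features, dtype=float)
    a_hat = a + np.eye(a.shape[0])            # add self-loops
    deg = a_hat.sum(axis=1)                   # degrees including self-loops
    d_inv_sqrt = 1.0 / np.sqrt(deg)
    s = a_hat * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]
    for _ in range(k):                        # K sparse-friendly linear steps
        x = s @ x
    return x

# Toy path graph 0 - 1 - 2 with one-hot node features
adjacency = np.array([[0, 1, 0],
                      [1, 0, 1],
                      [0, 1, 0]])
features = np.eye(3)
one_hop = sgc_features(adjacency, features, k=1)
two_hop = sgc_features(adjacency, features, k=2)
print(two_hop.round(3))
```

Because the propagated features are fixed once computed, any simple classifier can be trained on them afterwards, which is where the training speed-up in this family of models comes from.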
Affiliation(s)
- Yufang Zhang
  - School of Mathematical Sciences and SJTU-Yale Joint Center for Biostatistics and Data Science, Shanghai Jiao Tong University, Shanghai, 200240, China
  - Peng Cheng Laboratory, Shenzhen, 518055, Guangdong, China
  - Zhongjing Research and Industrialization, Institute of Chinese Medicine, Zhongguancun Scientific Park, Meixi, Nanyang, 473006, Henan, China
- Jiayi Li
  - State Key Laboratory of Microbial Metabolism, School of Life Sciences and Biotechnology, and Joint Laboratory of International Cooperation in Metabolic and Developmental Sciences, Ministry of Education, Shanghai JiaoTong University, Shanghai, China
- Shenggeng Lin
  - State Key Laboratory of Microbial Metabolism, School of Life Sciences and Biotechnology, and Joint Laboratory of International Cooperation in Metabolic and Developmental Sciences, Ministry of Education, Shanghai JiaoTong University, Shanghai, China
- Jianwei Zhao
  - State Key Laboratory of Microbial Metabolism, School of Life Sciences and Biotechnology, and Joint Laboratory of International Cooperation in Metabolic and Developmental Sciences, Ministry of Education, Shanghai JiaoTong University, Shanghai, China
- Yi Xiong
  - State Key Laboratory of Microbial Metabolism, School of Life Sciences and Biotechnology, and Joint Laboratory of International Cooperation in Metabolic and Developmental Sciences, Ministry of Education, Shanghai JiaoTong University, Shanghai, China
  - Shanghai Artificial Intelligence Laboratory, Shanghai, 200232, China
- Dong-Qing Wei
  - Peng Cheng Laboratory, Shenzhen, 518055, Guangdong, China
  - Zhongjing Research and Industrialization, Institute of Chinese Medicine, Zhongguancun Scientific Park, Meixi, Nanyang, 473006, Henan, China
  - State Key Laboratory of Microbial Metabolism, School of Life Sciences and Biotechnology, and Joint Laboratory of International Cooperation in Metabolic and Developmental Sciences, Ministry of Education, Shanghai JiaoTong University, Shanghai, China
9
Chen X, Huang J, Shen T, Zhang H, Xu L, Yang M, Xie X, Yan Y, Yan J. DEAttentionDTA: protein-ligand binding affinity prediction based on dynamic embedding and self-attention. Bioinformatics 2024; 40:btae319. [PMID: 38897656 PMCID: PMC11193059 DOI: 10.1093/bioinformatics/btae319] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/15/2023] [Revised: 03/23/2024] [Accepted: 06/17/2024] [Indexed: 06/21/2024] Open
Abstract
MOTIVATION: Predicting protein-ligand binding affinity is crucial in new drug discovery and development. However, most existing models rely on acquiring 3D structures of elusive proteins. Combining amino acid sequences with ligand sequences and better highlighting active sites are also significant challenges.
RESULTS: We propose an innovative neural network model called DEAttentionDTA, based on dynamic word embeddings and a self-attention mechanism, for predicting protein-ligand binding affinity. DEAttentionDTA takes 1D sequence information as input: the global sequence features of the protein's amino acids, the local features of the active pocket site, and the linear representation of the ligand molecule in SMILES format. These three linear sequences are fed into a dynamic word-embedding layer based on a 1D convolutional neural network for embedding encoding and are correlated through a self-attention mechanism. The output affinity prediction values are generated using a linear layer. We compared DEAttentionDTA with various mainstream tools and achieved significantly superior results on the same dataset. We then assessed the performance of this model in the p38 protein family.
AVAILABILITY AND IMPLEMENTATION: The source code is available at https://github.com/whatamazing1/DEAttentionDTA.
Affiliation(s)
- Xiying Chen
  - Key Lab of Molecular Biophysics of Ministry of Education, College of Life Science and Technology, Huazhong University of Science and Technology, Wuhan 430074, China
- Jinsha Huang
  - Key Lab of Molecular Biophysics of Ministry of Education, College of Life Science and Technology, Huazhong University of Science and Technology, Wuhan 430074, China
- Tianqiao Shen
  - Key Lab of Molecular Biophysics of Ministry of Education, College of Life Science and Technology, Huazhong University of Science and Technology, Wuhan 430074, China
- Houjin Zhang
  - Key Lab of Molecular Biophysics of Ministry of Education, College of Life Science and Technology, Huazhong University of Science and Technology, Wuhan 430074, China
- Li Xu
  - Key Lab of Molecular Biophysics of Ministry of Education, College of Life Science and Technology, Huazhong University of Science and Technology, Wuhan 430074, China
- Min Yang
  - Key Lab of Molecular Biophysics of Ministry of Education, College of Life Science and Technology, Huazhong University of Science and Technology, Wuhan 430074, China
- Xiaoman Xie
  - Key Lab of Molecular Biophysics of Ministry of Education, College of Life Science and Technology, Huazhong University of Science and Technology, Wuhan 430074, China
- Yunjun Yan
  - Key Lab of Molecular Biophysics of Ministry of Education, College of Life Science and Technology, Huazhong University of Science and Technology, Wuhan 430074, China
- Jinyong Yan
  - Key Lab of Molecular Biophysics of Ministry of Education, College of Life Science and Technology, Huazhong University of Science and Technology, Wuhan 430074, China
10
Zhang Q, Zuo L, Ren Y, Wang S, Wang W, Ma L, Zhang J, Xia B. FMCA-DTI: a fragment-oriented method based on a multihead cross attention mechanism to improve drug-target interaction prediction. Bioinformatics 2024; 40:btae347. [PMID: 38810106 PMCID: PMC11256963 DOI: 10.1093/bioinformatics/btae347] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/20/2023] [Revised: 04/23/2024] [Accepted: 05/28/2024] [Indexed: 05/31/2024] Open
Abstract
MOTIVATION: Identifying drug-target interactions (DTI) is crucial in drug discovery. Fragments are less complex and can accurately characterize local features, which is important in DTI prediction. Recently, deep learning (DL)-based methods have predicted DTI more efficiently. However, two challenges remain in existing DL-based methods: (i) some methods directly encode drugs and proteins into integers, ignoring the substructure representation; (ii) some methods learn the features of the drugs and proteins separately instead of considering their interactions.
RESULTS: In this article, we propose a fragment-oriented method based on a multihead cross attention mechanism for predicting DTI, named FMCA-DTI. FMCA-DTI obtains multiple types of fragments of drugs and proteins by branch chain mining and category fragment mining. Importantly, FMCA-DTI utilizes the shared-weight-based multihead cross attention mechanism to learn the complex interaction features between different fragments. Experiments on three benchmark datasets show that FMCA-DTI achieves significantly improved performance compared with four state-of-the-art baselines.
AVAILABILITY AND IMPLEMENTATION: The code for this workflow is available at: https://github.com/jacky102022/FMCA-DTI.
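As a rough illustration of cross attention between two sets of fragments, the sketch below lets drug fragment embeddings attend to protein fragment embeddings and vice versa. It is not the FMCA-DTI implementation: the random embeddings, the dimensions, and the reading of "shared-weight" as reusing the same projection matrices in both directions are all assumptions made for the example.

```python
import numpy as np

def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)   # subtract max for stability
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def multihead_cross_attention(queries, context, w_q, w_k, w_v, num_heads):
    """Scaled dot-product attention where queries and keys/values differ."""
    n_q, d_model = queries.shape
    n_c = context.shape[0]
    if d_model % num_heads != 0:
        raise ValueError("d_model must be divisible by num_heads")
    d_head = d_model // num_heads
    # Project, then split the feature axis into heads: (heads, items, d_head)
    q = (queries @ w_q).reshape(n_q, num_heads, d_head).transpose(1, 0, 2)
    k = (context @ w_k).reshape(n_c, num_heads, d_head).transpose(1, 0, 2)
    v = (context @ w_v).reshape(n_c, num_heads, d_head).transpose(1, 0, 2)
    weights = softmax(q @ k.transpose(0, 2, 1) / np.sqrt(d_head))
    heads = weights @ v                        # (heads, n_q, d_head)
    out = heads.transpose(1, 0, 2).reshape(n_q, d_model)
    return out, weights

rng = np.random.default_rng(0)
d_model, num_heads = 8, 2
drug_frags = rng.normal(size=(5, d_model))    # 5 hypothetical drug fragments
prot_frags = rng.normal(size=(7, d_model))    # 7 hypothetical protein fragments
w_q, w_k, w_v = (rng.normal(size=(d_model, d_model)) for _ in range(3))

# Assumed "shared-weight" reading: one set of projections for both directions
drug_ctx, drug_attn = multihead_cross_attention(
    drug_frags, prot_frags, w_q, w_k, w_v, num_heads)
prot_ctx, prot_attn = multihead_cross_attention(
    prot_frags, drug_frags, w_q, w_k, w_v, num_heads)
print(drug_ctx.shape, prot_ctx.shape)
```

Each row of `drug_attn` is a distribution over protein fragments, so the weights can be inspected to see which protein fragments a given drug fragment attends to most.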
Affiliation(s)
- Qi Zhang
- College of Mathematics and Computer Science, Yan'an University, Yan'an 716000, China
| | - Le Zuo
- College of Mathematics and Computer Science, Yan'an University, Yan'an 716000, China
| | - Ying Ren
- College of Mathematics and Computer Science, Yan'an University, Yan'an 716000, China
| | - Siyuan Wang
- College of Mathematics and Computer Science, Yan'an University, Yan'an 716000, China
| | - Wenfa Wang
- College of Mathematics and Computer Science, Yan'an University, Yan'an 716000, China
| | - Lerong Ma
- College of Mathematics and Computer Science, Yan'an University, Yan'an 716000, China
| | - Jing Zhang
- Medical College of Yan'an University, Yan'an University, Yan'an 716000, China
- Medical Research and Experimental Center, The Second Affiliated Hospital of Xi'an Medical University, Xi'an 710021, China
| | - Bisheng Xia
- College of Mathematics and Computer Science, Yan'an University, Yan'an 716000, China
|
11
|
Feng BM, Zhang YY, Zhou XC, Wang JL, Feng YF. MolLoG: A Molecular Level Interpretability Model Bridging Local to Global for Predicting Drug Target Interactions. J Chem Inf Model 2024; 64:4348-4358. [PMID: 38709146 DOI: 10.1021/acs.jcim.4c00171] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 05/07/2024]
Abstract
Developing new pharmaceuticals is a costly and time-consuming endeavor fraught with significant safety risks. A critical aspect of drug research and disease therapy is discerning the existence of interactions between drugs and proteins. The evolution of deep learning (DL) in computer science has been remarkably aided in this regard in recent years. Yet, two challenges remain: (i) balancing the extraction of profound, local cohesive characteristics while warding off gradient disappearance and (ii) globally representing and understanding the interactions between the drug and target local attributes, which is vital for delivering molecular level insights indispensable to drug development. In response to these challenges, we propose a DL network structure, MolLoG, primarily comprising two modules: local feature encoders (LFE) and global interactive learning (GIL). Within the LFE module, graph convolution networks and leap blocks capture the local features of drug and protein molecules, respectively. The GIL module enables the efficient amalgamation of feature information, facilitating the global learning of feature structural semantics and procuring multihead attention weights for abstract features stemming from two modalities, providing biologically pertinent explanations for black-box results. Finally, predictive outcomes are achieved by decoding the unified representation via a multilayer perceptron. Our experimental analysis reveals that MolLoG outperforms several cutting-edge baselines across four data sets, delivering superior overall performance and providing satisfactory results when elucidating various facets of drug-target interaction predictions.
Affiliation(s)
- Bao-Ming Feng
- School of Information and Control Engineering, Qingdao University of Technology, Qingdao, 266520 Shandong, China
| | - Yuan-Yuan Zhang
- School of Information and Control Engineering, Qingdao University of Technology, Qingdao, 266520 Shandong, China
| | - Xiao-Chen Zhou
- School of Information and Control Engineering, Qingdao University of Technology, Qingdao, 266520 Shandong, China
| | - Jin-Long Wang
- School of Information and Control Engineering, Qingdao University of Technology, Qingdao, 266520 Shandong, China
| | - Yin-Fei Feng
- School of Information and Control Engineering, Qingdao University of Technology, Qingdao, 266520 Shandong, China
|
12
|
Zhou G, Qin Y, Hong Q, Li H, Chen H, Shen J. GEMF: a novel geometry-enhanced mid-fusion network for PLA prediction. Brief Bioinform 2024; 25:bbae333. [PMID: 38980371 PMCID: PMC11232467 DOI: 10.1093/bib/bbae333] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/07/2024] [Revised: 06/04/2024] [Accepted: 06/26/2024] [Indexed: 07/10/2024] Open
Abstract
Accurate prediction of protein-ligand binding affinity (PLA) is important for drug discovery. Recent advances in applying graph neural networks have shown great potential for PLA prediction. However, existing methods usually neglect the geometric information (i.e. bond angles), leading to difficulties in accurately distinguishing different molecular structures. In addition, these methods also pose limitations in representing the binding process of protein-ligand complexes. To address these issues, we propose a novel geometry-enhanced mid-fusion network, named GEMF, to learn comprehensive molecular geometry and interaction patterns. Specifically, the GEMF consists of a graph embedding layer, a message passing phase, and a multi-scale fusion module. GEMF can effectively represent protein-ligand complexes as graphs, with graph embeddings based on physicochemical and geometric properties. Moreover, our dual-stream message passing framework models both covalent and non-covalent interactions. In particular, the edge-update mechanism, which is based on line graphs, can fuse both distance and angle information in the covalent branch. In addition, the communication branch consisting of multiple heterogeneous interaction modules is developed to learn intricate interaction patterns. Finally, we fuse the multi-scale features from the covalent, non-covalent, and heterogeneous interaction branches. The extensive experimental results on several benchmarks demonstrate the superiority of GEMF compared with other state-of-the-art methods.
Affiliation(s)
- Guoqiang Zhou
- School of Computer Science, Nanjing University of Posts and Telecommunications, No.9 Wenyuan Road, Jiangsu 210023, China
| | - Yuke Qin
- School of Computer Science, Nanjing University of Posts and Telecommunications, No.9 Wenyuan Road, Jiangsu 210023, China
| | - Qiansen Hong
- School of Computer Science, Nanjing University of Posts and Telecommunications, No.9 Wenyuan Road, Jiangsu 210023, China
| | - Haoran Li
- School of Computing and Information Technology, University of Wollongong, Northfields Avenue, NSW 2522, Australia
| | - Huaming Chen
- School of Electrical and Computer Engineering, University of Sydney, Camperdown, NSW 2050, Australia
| | - Jun Shen
- School of Computing and Information Technology, University of Wollongong, Northfields Avenue, NSW 2522, Australia
|
13
|
Song C, Zhang L. Intelligent Design of Antithrombotic Peptide Targeting Collagen. Langmuir 2024; 40:9661-9668. [PMID: 38664943 DOI: 10.1021/acs.langmuir.4c00543] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 05/08/2024]
Abstract
Binding of blood components to collagen was proved to be a key step in thrombus formation. Intelligent Design of Protein Matcher (IDProMat), a neural network model, was then developed based on the principle of seq2seq to design an antithrombotic peptide targeting collagen. The encoding and decoding of peptide sequence data and the interaction patterns of peptide chains at the interface were studied, and then, IDProMat was applied to the design of peptides to cover collagen. The 99.3% decrease in seq2seq loss and 58.3% decrease in MLP loss demonstrated that IDProMat learned the interaction patterns between residues at the binding interface. An efficient peptide, LRWNSYY, was then designed using this model. Validations on its binding on collagen and its inhibition of platelet adhesion were obtained using docking, MD simulations, and experimental approaches.
Affiliation(s)
- Changwei Song
- Department of Biochemical Engineering and Frontiers Science Center for Synthetic Biology and Key Laboratory of Systems Bioengineering (MOE), School of Chemical Engineering and Technology, Tianjin University, Tianjin 300350, People's Republic of China
| | - Lin Zhang
- Department of Biochemical Engineering and Frontiers Science Center for Synthetic Biology and Key Laboratory of Systems Bioengineering (MOE), School of Chemical Engineering and Technology, Tianjin University, Tianjin 300350, People's Republic of China
|
14
|
Qiu X, Wang H, Tan X, Fang Z. G-K BertDTA: A graph representation learning and semantic embedding-based framework for drug-target affinity prediction. Comput Biol Med 2024; 173:108376. [PMID: 38552281 DOI: 10.1016/j.compbiomed.2024.108376] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/21/2023] [Revised: 03/21/2024] [Accepted: 03/24/2024] [Indexed: 04/17/2024]
Abstract
Developing new drugs is costly, time-consuming, and risky. Drug-target affinity (DTA), indicating the binding capability between drugs and target proteins, is a crucial indicator for drug development. Accurately predicting interaction strength between new drug-target pairs by analyzing previous experiments aids in screening potential drug molecules, repurposing them, and developing safe and effective medicines. Existing computational models for DTA prediction rely on strings or single-graph neural networks, lacking consideration of protein structure and molecular semantic information, leading to limited accuracy. Our experiments demonstrate that string-based methods may overlook protein conformations, causing a high root mean square error (RMSE) of 3.584 in affinity due to a lack of spatial context. Single graph networks also underperform on topology features, with a 6% lower confidence interval (CI) for activity classification. Absent semantic information also limits generalization across diverse compounds, resulting in 18% increment in RMSE and 5% in misclassifications within quantifications study, restricting potential drug discovery. To address these limitations, we propose G-K BertDTA, a novel framework for accurate DTA prediction incorporating protein features, molecular semantic features, and molecular structural information. In this proposed model, we represent drugs as graphs, with a GIN employed to learn the molecular topological information. For the extraction of protein structural features, we utilize a DenseNet architecture. A knowledge-based BERT semantic model is incorporated to obtain rich pre-trained semantic embeddings, thereby enhancing the feature information. We extensively evaluated our proposed approach on the publicly available benchmark datasets (i.e., KIBA and Davis), and experimental results demonstrate the promising performance of our method, which consistently outperforms previous state-of-the-art approaches. 
Code is available at https://github.com/AmbitYuki/G-K-BertDTA.
Affiliation(s)
- Xihe Qiu
- School of Electronic and Electrical Engineering, Shanghai University of Engineering Science, Shanghai, China
| | - Haoyu Wang
- School of Electronic and Electrical Engineering, Shanghai University of Engineering Science, Shanghai, China
| | - Xiaoyu Tan
- INF Technology (Shanghai) Co., Ltd., Shanghai, China
| | - Zhijun Fang
- School of Computer Science and Technology, Donghua University, Shanghai, China.
|
15
|
Kumar N, Acharya V. Advances in machine intelligence-driven virtual screening approaches for big-data. Med Res Rev 2024; 44:939-974. [PMID: 38129992 DOI: 10.1002/med.21995] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/12/2022] [Revised: 07/15/2023] [Accepted: 10/29/2023] [Indexed: 12/23/2023]
Abstract
Virtual screening (VS) is an integral and ever-evolving domain of drug discovery framework. The VS is traditionally classified into ligand-based (LB) and structure-based (SB) approaches. Machine intelligence or artificial intelligence has wide applications in the drug discovery domain to reduce time and resource consumption. In combination with machine intelligence algorithms, VS has emerged into revolutionarily progressive technology that learns within robust decision orders for data curation and hit molecule screening from large VS libraries in minutes or hours. The exponential growth of chemical and biological data has evolved as "big-data" in the public domain demands modern and advanced machine intelligence-driven VS approaches to screen hit molecules from ultra-large VS libraries. VS has evolved from an individual approach (LB and SB) to integrated LB and SB techniques to explore various ligand and target protein aspects for the enhanced rate of appropriate hit molecule prediction. Current trends demand advanced and intelligent solutions to handle enormous data in drug discovery domain for screening and optimizing hits or lead with fewer or no false positive hits. Following the big-data drift and tremendous growth in computational architecture, we presented this review. Here, the article categorized and emphasized individual VS techniques, detailed literature presented for machine learning implementation, modern machine intelligence approaches, and limitations and deliberated the future prospects.
Affiliation(s)
- Neeraj Kumar
- Artificial Intelligence for Computational Biology Lab (AICoB), Biotechnology Division, CSIR-Institute of Himalayan Bioresource Technology, Palampur, Himachal Pradesh, India
- Academy of Scientific and Innovative Research, Ghaziabad, India
| | - Vishal Acharya
- Artificial Intelligence for Computational Biology Lab (AICoB), Biotechnology Division, CSIR-Institute of Himalayan Bioresource Technology, Palampur, Himachal Pradesh, India
- Academy of Scientific and Innovative Research, Ghaziabad, India
|
16
|
Hong Q, Zhou G, Qin Y, Shen J, Li H. SadNet: a novel multimodal fusion network for protein-ligand binding affinity prediction. Phys Chem Chem Phys 2024; 26:12880-12891. [PMID: 38625412 DOI: 10.1039/d3cp05664c] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 04/17/2024]
Abstract
Protein-ligand binding affinity prediction plays an important role in the field of drug discovery. Existing deep learning-based approaches have significantly improved the efficiency of protein-ligand binding affinity prediction through their excellent inductive bias capability. However, these methods only focus on fragmented three-dimensional data, which truncates the integrity of pocket data, leading to the neglect of potential long-range interactions. In this paper, we propose a dual-stream framework, with amino acid sequence assisting the atomic data fusion for graph neural network (termed SadNet), to fuse both 3D atomic data and sequence data for more accurate prediction results. In detail, SadNet consists of a pocket module and a sequence module. The sequence module expands the "receptive field" of the pocket module through a mid-term virtual node fusion. To better integrate sequence-level information from the sequence module and 3D structural information from the pocket module, we incorporate structural information for each amino acid within the sequence module. Besides, to better understand the intrinsic relationship between sequences and 3D atomic information, our SadNet utilizes information stacking from both the early stage and later stage. Experimental results on publicly available benchmark datasets demonstrate the superiority of the proposed dual-stream approach over the state-of-the-art alternatives. The code of this work is available online at https://github.com/wardhong/SadNet.
Affiliation(s)
- Qiansen Hong
- Nanjing University of Posts and Telecommunications, NanJing, China.
| | - Guoqiang Zhou
- Nanjing University of Posts and Telecommunications, NanJing, China.
| | - Yuke Qin
- Nanjing University of Posts and Telecommunications, NanJing, China.
| | - Jun Shen
- University of Wollongong, Australia
|
17
|
Svensson E, Hoedt PJ, Hochreiter S, Klambauer G. HyperPCM: Robust Task-Conditioned Modeling of Drug-Target Interactions. J Chem Inf Model 2024; 64:2539-2553. [PMID: 38185877 PMCID: PMC11005051 DOI: 10.1021/acs.jcim.3c01417] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/04/2023] [Revised: 11/27/2023] [Accepted: 11/27/2023] [Indexed: 01/09/2024]
Abstract
A central problem in drug discovery is to identify the interactions between drug-like compounds and protein targets. Over the past few decades, various quantitative structure-activity relationship (QSAR) and proteo-chemometric (PCM) approaches have been developed to model and predict these interactions. While QSAR approaches solely utilize representations of the drug compound, PCM methods incorporate both representations of the protein target and the drug compound, enabling them to achieve above-chance predictive accuracy on previously unseen protein targets. Both QSAR and PCM approaches have recently been improved by machine learning and deep neural networks, that allow the development of drug-target interaction prediction models from measurement data. However, deep neural networks typically require large amounts of training data and cannot robustly adapt to new tasks, such as predicting interaction for unseen protein targets at inference time. In this work, we propose to use HyperNetworks to efficiently transfer information between tasks during inference and thus to accurately predict drug-target interactions on unseen protein targets. Our HyperPCM method reaches state-of-the-art performance compared to previous methods on multiple well-known benchmarks, including Davis, DUD-E, and a ChEMBL derived data set, and particularly excels at zero-shot inference involving unseen protein targets. Our method, as well as reproducible data preparation, is available at https://github.com/ml-jku/hyper-dti.
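The HyperNetwork idea described in this abstract, where one network generates the parameters of another conditioned on the task, can be sketched in a few lines. This is only an illustration under stated assumptions, not the HyperPCM code: the hypernetwork here is a single fixed random linear map, and the names `make_hypernetwork` and `predict`, the embedding sizes, and the toy vectors are hypothetical.

```python
import random

def make_hypernetwork(target_dim, drug_dim, seed=0):
    """Build a toy linear hypernetwork.

    The returned function maps a protein-target embedding to the
    parameters (weights, bias) of a linear scoring head that is then
    applied to drug embeddings. H is random here; in practice it
    would be learned.
    """
    rng = random.Random(seed)
    n_out = drug_dim + 1  # drug_dim weights plus one bias
    H = [[rng.uniform(-0.5, 0.5) for _ in range(target_dim)]
         for _ in range(n_out)]

    def hyper(target_emb):
        params = [sum(h * t for h, t in zip(row, target_emb)) for row in H]
        return params[:drug_dim], params[drug_dim]

    return hyper

def predict(hyper, target_emb, drug_emb):
    """Score a drug with the head generated for the given target."""
    w, b = hyper(target_emb)
    return sum(wi * di for wi, di in zip(w, drug_emb)) + b

# Hypothetical embeddings: 3-d targets, 4-d drug.
hyper = make_hypernetwork(target_dim=3, drug_dim=4)
target_a = [1.0, 0.0, 0.0]
target_b = [0.0, 1.0, 0.0]  # a target the head was never fitted to
drug = [0.2, -0.1, 0.5, 0.3]

score_a = predict(hyper, target_a, drug)
score_b = predict(hyper, target_b, drug)
```

Because the scoring head is generated from the target embedding at inference time, a previously unseen target gets its own head without retraining, which is the property the abstract associates with zero-shot prediction.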
Affiliation(s)
- Emma Svensson
- ELLIS Unit Linz & Institute for Machine Learning, Johannes Kepler University, Linz 4040, Austria
- Molecular AI, Discovery Sciences, R&D, AstraZeneca, Gothenburg, 431 83, Sweden
| | - Pieter-Jan Hoedt
- ELLIS Unit Linz & Institute for Machine Learning, Johannes Kepler University, Linz 4040, Austria
| | - Sepp Hochreiter
- ELLIS Unit Linz & Institute for Machine Learning, Johannes Kepler University, Linz 4040, Austria
- Institute of Advanced Research in Artificial Intelligence (IARAI), Vienna 1030, Austria
| | - Günter Klambauer
- ELLIS Unit Linz & Institute for Machine Learning, Johannes Kepler University, Linz 4040, Austria
|
18
|
Qu X, Dong L, Luo D, Si Y, Wang B. Water Network-Augmented Two-State Model for Protein-Ligand Binding Affinity Prediction. J Chem Inf Model 2024; 64:2263-2274. [PMID: 37433009 DOI: 10.1021/acs.jcim.3c00567] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 07/13/2023]
Abstract
Water network rearrangement from the ligand-unbound state to the ligand-bound state is known to have significant effects on the protein-ligand binding interactions, but most of the current machine learning-based scoring functions overlook these effects. In this study, we endeavor to construct a comprehensive and realistic deep learning model by incorporating water network information into both ligand-unbound and -bound states. In particular, extended connectivity interaction features were integrated into graph representation, and graph transformer operator was employed to extract features of the ligand-unbound and -bound states. Through these efforts, we developed a water network-augmented two-state model called ECIFGraph::HM-Holo-Apo. Our new model exhibits satisfactory performance in terms of scoring, ranking, docking, screening, and reverse screening power tests on the CASF-2016 benchmark. In addition, it can achieve superior performance in large-scale docking-based virtual screening tests on the DEKOIS2.0 data set. Our study highlights that the use of a water network-augmented two-state model can be an effective strategy to bolster the robustness and applicability of machine learning-based scoring functions, particularly for targets with hydrophilic or solvent-exposed binding pockets.
Affiliation(s)
- Xiaoyang Qu
- State Key Laboratory of Physical Chemistry of Solid Surfaces and Fujian Provincial Key Laboratory of Theoretical and Computational Chemistry, College of Chemistry and Chemical Engineering, Xiamen University, Xiamen 361005, P. R. China
| | - Lina Dong
- State Key Laboratory of Physical Chemistry of Solid Surfaces and Fujian Provincial Key Laboratory of Theoretical and Computational Chemistry, College of Chemistry and Chemical Engineering, Xiamen University, Xiamen 361005, P. R. China
| | - Ding Luo
- State Key Laboratory of Physical Chemistry of Solid Surfaces and Fujian Provincial Key Laboratory of Theoretical and Computational Chemistry, College of Chemistry and Chemical Engineering, Xiamen University, Xiamen 361005, P. R. China
| | - Yubing Si
- College of Chemistry, Zhengzhou University, Zhengzhou 450001, P. R. China
| | - Binju Wang
- State Key Laboratory of Physical Chemistry of Solid Surfaces and Fujian Provincial Key Laboratory of Theoretical and Computational Chemistry, College of Chemistry and Chemical Engineering, Xiamen University, Xiamen 361005, P. R. China
- Innovation Laboratory for Sciences and Technologies of Energy Materials of Fujian Province (IKKEM), Xiamen 361005, P. R. China
|
19
|
An H, Liu X, Cai W, Shao X. Explainable Graph Neural Networks with Data Augmentation for Predicting pKa of C-H Acids. J Chem Inf Model 2024; 64:2383-2392. [PMID: 37706462 DOI: 10.1021/acs.jcim.3c00958] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 09/15/2023]
Abstract
The pKa of C-H acids is an important parameter in the fields of organic synthesis, drug discovery, and materials science. However, the prediction of pKa is still a great challenge due to the limit of experimental data and the lack of chemical insight. Here, a new model for predicting the pKa values of C-H acids is proposed on the basis of graph neural networks (GNNs) and data augmentation. A message passing unit (MPU) was used to extract the topological and target-related information from the molecular graph data, and a readout layer was utilized to retrieve the information on the ionization site C atom. The retrieved information then was adopted to predict pKa by a fully connected network. Furthermore, to increase the diversity of the training data, a knowledge-infused data augmentation technique was established by replacing the H atoms in a molecule with substituents exhibiting different electronic effects. The MPU was pretrained with the augmented data. The efficacy of data augmentation was confirmed by visualizing the distribution of compounds with different substituents and by classifying compounds. The explainability of the model was studied by examining the change of pKa values when a specific atom was masked. This explainability was used to identify the key substituents for pKa. The model was evaluated on two data sets from the iBonD database. Dataset1 includes the experimental pKa values of C-H acids measured in DMSO, while dataset2 comprises the pKa values measured in water. The results show that the knowledge-infused data augmentation technique greatly improves the predictive accuracy of the model, especially when the number of samples is small.
Affiliation(s)
- Hongle An
- Research Center for Analytical Sciences, Tianjin Key Laboratory of Biosensing and Molecular Recognition, State Key Laboratory of Medicinal Chemical Biology, College of Chemistry, Nankai University, Tianjin 300071, China
- Haihe Laboratory of Sustainable Chemical Transformations, Tianjin 300192, China
| | - Xuyang Liu
- Research Center for Analytical Sciences, Tianjin Key Laboratory of Biosensing and Molecular Recognition, State Key Laboratory of Medicinal Chemical Biology, College of Chemistry, Nankai University, Tianjin 300071, China
- Haihe Laboratory of Sustainable Chemical Transformations, Tianjin 300192, China
| | - Wensheng Cai
- Research Center for Analytical Sciences, Tianjin Key Laboratory of Biosensing and Molecular Recognition, State Key Laboratory of Medicinal Chemical Biology, College of Chemistry, Nankai University, Tianjin 300071, China
- Haihe Laboratory of Sustainable Chemical Transformations, Tianjin 300192, China
| | - Xueguang Shao
- Research Center for Analytical Sciences, Tianjin Key Laboratory of Biosensing and Molecular Recognition, State Key Laboratory of Medicinal Chemical Biology, College of Chemistry, Nankai University, Tianjin 300071, China
- Haihe Laboratory of Sustainable Chemical Transformations, Tianjin 300192, China
|
20
|
Zhang X, Gao H, Wang H, Chen Z, Zhang Z, Chen X, Li Y, Qi Y, Wang R. PLANET: A Multi-objective Graph Neural Network Model for Protein-Ligand Binding Affinity Prediction. J Chem Inf Model 2024; 64:2205-2220. [PMID: 37319418 DOI: 10.1021/acs.jcim.3c00253] [Citation(s) in RCA: 7] [Impact Index Per Article: 7.0] [Indexed: 06/17/2023]
Abstract
Predicting protein-ligand binding affinity is a central issue in drug design. Various deep learning models have been published in recent years, where many of them rely on 3D protein-ligand complex structures as input and tend to focus on the single task of reproducing binding affinity. In this study, we have developed a graph neural network model called PLANET (Protein-Ligand Affinity prediction NETwork). This model takes the graph-represented 3D structure of the binding pocket on the target protein and the 2D chemical structure of the ligand molecule as input. It was trained through a multi-objective process with three related tasks, including deriving the protein-ligand binding affinity, protein-ligand contact map, and ligand distance matrix. Besides the protein-ligand complexes with known binding affinity data retrieved from the PDBbind database, a large number of non-binder decoys were also added to the training data for deriving the final model of PLANET. When tested on the CASF-2016 benchmark, PLANET exhibited a scoring power comparable to the best result yielded by other deep learning models as well as a reasonable ranking power and docking power. In virtual screening trials conducted on the DUD-E benchmark, PLANET's performance was notably better than several deep learning and machine learning models. As on the LIT-PCBA benchmark, PLANET achieved comparable accuracy as the conventional docking program Glide, but it only spent less than 1% of Glide's computation time to finish the same job because PLANET did not need exhaustive conformational sampling. Considering the decent accuracy and efficiency of PLANET in binding affinity prediction, it may become a useful tool for conducting large-scale virtual screening.
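The multi-objective training described in this abstract combines three related tasks. A minimal sketch of how such a combined loss might look follows. The abstract does not give the loss formulas or task weights, so the squared-error and cross-entropy terms, the equal default weights, and the function names are all assumptions for illustration.

```python
import math

def mse(pred, true):
    """Mean squared error between two equal-length lists."""
    return sum((p - t) ** 2 for p, t in zip(pred, true)) / len(pred)

def bce(prob, label, eps=1e-12):
    """Mean binary cross-entropy; eps guards against log(0)."""
    return -sum(l * math.log(p + eps) + (1 - l) * math.log(1 - p + eps)
                for p, l in zip(prob, label)) / len(prob)

def multi_objective_loss(aff_pred, aff_true,
                         contact_prob, contact_true,
                         dist_pred, dist_true,
                         weights=(1.0, 1.0, 1.0)):
    """Weighted sum of three task losses (weights are hypothetical).

    affinity:  regression on binding affinity
    contact:   binary protein-ligand contact map, flattened
    distance:  ligand interatomic distance matrix, flattened
    """
    w_aff, w_con, w_dist = weights
    parts = {
        "affinity": mse(aff_pred, aff_true),
        "contact": bce(contact_prob, contact_true),
        "distance": mse(dist_pred, dist_true),
    }
    total = (w_aff * parts["affinity"]
             + w_con * parts["contact"]
             + w_dist * parts["distance"])
    return total, parts

# Toy values: one complex, a 2-entry contact map, a 2-entry distance list.
total, parts = multi_objective_loss(
    aff_pred=[6.0], aff_true=[7.0],
    contact_prob=[0.9, 0.2], contact_true=[1, 0],
    dist_pred=[1.5, 2.5], dist_true=[1.5, 2.0],
)
```

Returning the per-task terms alongside the total makes it easy to monitor whether one auxiliary task dominates training.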
Affiliation(s)
- Xiangying Zhang
- Department of Medicinal Chemistry, School of Pharmacy, Fudan University, 826 Zhangheng Road, Shanghai 201203, People's Republic of China
| | - Haotian Gao
- Department of Medicinal Chemistry, School of Pharmacy, Fudan University, 826 Zhangheng Road, Shanghai 201203, People's Republic of China
| | - Haojie Wang
- Department of Medicinal Chemistry, School of Pharmacy, Fudan University, 826 Zhangheng Road, Shanghai 201203, People's Republic of China
| | - Zhihang Chen
- Department of Medicinal Chemistry, School of Pharmacy, Fudan University, 826 Zhangheng Road, Shanghai 201203, People's Republic of China
| | - Zhe Zhang
- Department of Medicinal Chemistry, School of Pharmacy, Fudan University, 826 Zhangheng Road, Shanghai 201203, People's Republic of China
| | - Xinchong Chen
- Department of Medicinal Chemistry, School of Pharmacy, Fudan University, 826 Zhangheng Road, Shanghai 201203, People's Republic of China
| | - Yan Li
- Department of Medicinal Chemistry, School of Pharmacy, Fudan University, 826 Zhangheng Road, Shanghai 201203, People's Republic of China
| | - Yifei Qi
- Department of Medicinal Chemistry, School of Pharmacy, Fudan University, 826 Zhangheng Road, Shanghai 201203, People's Republic of China
| | - Renxiao Wang
- Department of Medicinal Chemistry, School of Pharmacy, Fudan University, 826 Zhangheng Road, Shanghai 201203, People's Republic of China
|
21
|
Mehta MJ, Kim HJ, Lim SB, Naito M, Miyata K. Recent Progress in the Endosomal Escape Mechanism and Chemical Structures of Polycations for Nucleic Acid Delivery. Macromol Biosci 2024; 24:e2300366. [PMID: 38226723 DOI: 10.1002/mabi.202300366] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/10/2023] [Revised: 12/22/2023] [Indexed: 01/17/2024]
Abstract
Nucleic acid-based therapies are seeing a spiralling surge. Stimuli-responsive polymers, especially pH-responsive ones, are gaining widespread attention because of their ability to efficiently deliver nucleic acids. These polymers can be synthesized and modified according to target requirements, such as delivery sites and the nature of nucleic acids. In this regard, the endosomal escape mechanism of polymer-nucleic acid complexes (polyplexes) remains a topic of considerable interest owing to various plausible escape mechanisms. This review describes current progress in the endosomal escape mechanism of polyplexes and state-of-the-art chemical designs for pH-responsive polymers. The importance is also discussed of the acid dissociation constant (i.e., pKa) in designing the new generation of pH-responsive polymers, along with assays to monitor and quantify the endosomal escape behavior. Further, the use of machine learning is addressed in pKa prediction and polymer design to find novel chemical structures for pH responsiveness. This review will facilitate the design of new pH-responsive polymers for advanced and efficient nucleic acid delivery.
Affiliation(s)
- Mohit J Mehta
- Department of Biological Sciences and Bioengineering, Inha University, 100 Inha-ro, Michuhol-gu, Incheon, 22212, Republic of Korea
| | - Hyun Jin Kim
- Department of Biological Sciences and Bioengineering, Inha University, 100 Inha-ro, Michuhol-gu, Incheon, 22212, Republic of Korea
- Department of Biological Engineering, College of Engineering, Inha University, 100 Inha-ro, Michuhol-gu, Incheon, 22212, Republic of Korea
| | - Sung Been Lim
- Department of Biological Sciences and Bioengineering, Inha University, 100 Inha-ro, Michuhol-gu, Incheon, 22212, Republic of Korea
| | - Mitsuru Naito
- Department of Materials Engineering, Graduate School of Engineering, The University of Tokyo, 7-3-1 Hongo, Bunkyo-ku, Tokyo, 113-8656, Japan
| | - Kanjiro Miyata
- Department of Materials Engineering, Graduate School of Engineering, The University of Tokyo, 7-3-1 Hongo, Bunkyo-ku, Tokyo, 113-8656, Japan
- Department of Bioengineering, Graduate School of Engineering, The University of Tokyo, 7-3-1 Hongo, Bunkyo-ku, Tokyo, 113-8656, Japan
|
22
|
Luo D, Liu D, Qu X, Dong L, Wang B. Enhancing Generalizability in Protein-Ligand Binding Affinity Prediction with Multimodal Contrastive Learning. J Chem Inf Model 2024; 64:1892-1906. [PMID: 38441880 DOI: 10.1021/acs.jcim.3c01961] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 03/26/2024]
Abstract
Improving the generalization ability of scoring functions remains a major challenge in protein-ligand binding affinity prediction. Many machine learning methods are limited by their reliance on single-modal representations, hindering a comprehensive understanding of protein-ligand interactions. We introduce a graph-neural-network-based scoring function that utilizes a triplet contrastive learning loss to improve protein-ligand representations. In this model, three-dimensional complex representations and the fusion of two-dimensional ligand and coarse-grained pocket representations converge while distancing from decoy representations in latent space. After rigorous validation on multiple external data sets, our model exhibits commendable generalization capabilities compared to those of other deep learning-based scoring functions, marking it as a promising tool in the realm of drug discovery. In the future, our training framework can be extended to other biophysical- and biochemical-related problems such as protein-protein interaction and protein mutation prediction.
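As a rough illustration of the triplet contrastive objective described in this abstract, the sketch below uses the standard triplet margin loss: the 3D complex embedding acts as the anchor, the fused 2D ligand and pocket embedding as the positive, and a decoy embedding as the negative. The abstract does not give the exact loss, distance metric, or margin, so the Euclidean distance, the margin of 1.0, and all vectors here are assumptions chosen for illustration only.

```python
import math


def euclidean(u, v):
    """Euclidean distance between two equal-length embedding vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def triplet_loss(anchor, positive, negative, margin=1.0):
    """Standard triplet margin loss: max(0, d(a, p) - d(a, n) + margin).

    The loss is zero once the negative is at least `margin` farther
    from the anchor than the positive is.
    """
    return max(0.0, euclidean(anchor, positive) - euclidean(anchor, negative) + margin)


# Hypothetical 2-D embeddings (real models use far higher dimensions).
complex_3d = [0.0, 0.0]   # anchor: 3D complex representation
fused_2d = [0.0, 1.0]     # positive: fused 2D ligand + pocket representation

# A decoy far from the anchor contributes no loss.
loss_far = triplet_loss(complex_3d, fused_2d, [3.0, 4.0])

# A decoy close to the anchor is penalized.
loss_near = triplet_loss(complex_3d, fused_2d, [0.0, 1.5])

print(loss_far, loss_near)
```

Minimizing this quantity over many triplets pulls the two views of the same complex together while pushing decoys away, which is the behavior the abstract describes in latent space.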
Affiliation(s)
- Ding Luo
- State Key Laboratory of Physical Chemistry of Solid Surfaces and Fujian Provincial Key Laboratory of Theoretical and Computational Chemistry, College of Chemistry and Chemical Engineering, Xiamen University, Xiamen 361005, P. R. China
- Dandan Liu
- State Key Laboratory of Physical Chemistry of Solid Surfaces and Fujian Provincial Key Laboratory of Theoretical and Computational Chemistry, College of Chemistry and Chemical Engineering, Xiamen University, Xiamen 361005, P. R. China
- Xiaoyang Qu
- School of Pharmacy and Medical Technology, Putian University, Putian 351100, P. R. China
- Key Laboratory of Pharmaceutical Analysis and Laboratory Medicine (Putian University), Fujian Province University, Putian 351100, P. R. China
- Lina Dong
- State Key Laboratory of Physical Chemistry of Solid Surfaces and Fujian Provincial Key Laboratory of Theoretical and Computational Chemistry, College of Chemistry and Chemical Engineering, Xiamen University, Xiamen 361005, P. R. China
- Binju Wang
- State Key Laboratory of Physical Chemistry of Solid Surfaces and Fujian Provincial Key Laboratory of Theoretical and Computational Chemistry, College of Chemistry and Chemical Engineering, Xiamen University, Xiamen 361005, P. R. China
- Innovation Laboratory for Sciences and Technologies of Energy Materials of Fujian Province (IKKEM), Xiamen 361005, P. R. China
|
23
|
Helal H, Firoz J, Bilbrey JA, Sprueill H, Herman KM, Krell MM, Murray T, Roldan ML, Kraus M, Li A, Das P, Xantheas SS, Choudhury S. Acceleration of Graph Neural Network-Based Prediction Models in Chemistry via Co-Design Optimization on Intelligence Processing Units. J Chem Inf Model 2024; 64:1568-1580. [PMID: 38382011 DOI: 10.1021/acs.jcim.3c01312] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 02/23/2024]
Abstract
Atomic structure prediction and associated property calculations are the bedrock of chemical physics. Since high-fidelity ab initio modeling techniques for computing the structure and properties can be prohibitively expensive, this motivates the development of machine-learning (ML) models that make these predictions more efficiently. Training graph neural networks over large atomistic databases introduces unique computational challenges, such as the need to process millions of small graphs with variable size and support communication patterns that are distinct from learning over large graphs, such as social networks. We demonstrate a novel hardware-software codesign approach to scale up the training of atomistic graph neural networks (GNN) for structure and property prediction. First, to eliminate redundant computation and memory associated with alternative padding techniques and to improve throughput via minimizing communication, we formulate the effective coalescing of the batches of variable-size atomistic graphs as the bin packing problem and introduce a hardware-agnostic algorithm to pack these batches. In addition, we propose hardware-specific optimizations, including a planner and vectorization for the gather-scatter operations targeted for Graphcore's Intelligence Processing Unit (IPU), as well as model-specific optimizations such as merged communication collectives and optimized softplus. Putting these all together, we demonstrate the effectiveness of the proposed codesign approach by providing an implementation of a well-established atomistic GNN on the Graphcore IPUs. We evaluate the training performance on multiple atomistic graph databases with varying degrees of graph counts, sizes, and sparsity. We demonstrate that such a codesign approach can reduce the training time of atomistic GNNs and can improve their performance by up to 1.5× compared to the baseline implementation of the model on the IPUs. 
Additionally, we compare our IPU implementation with an NVIDIA GPU-based implementation and show that our atomistic GNN implementation on the IPUs runs on average 1.8× faster than on the GPUs.
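To make the bin-packing formulation of batching concrete, the sketch below packs variable-size graphs into fixed-capacity batches using the first-fit decreasing heuristic. The abstract does not say which packing algorithm the authors use, so first-fit decreasing is a stand-in; the function name `pack_graphs`, the node counts, and the capacity of 10 nodes are all hypothetical.

```python
def pack_graphs(node_counts, capacity):
    """Pack graphs into batches with first-fit decreasing.

    node_counts[i] is the number of nodes in graph i; each batch may hold
    at most `capacity` nodes. Returns a list of batches, each a list of
    graph indices.
    """
    # Visit graphs from largest to smallest.
    order = sorted(range(len(node_counts)), key=lambda i: node_counts[i], reverse=True)
    bins = []  # graph indices per batch
    free = []  # remaining node capacity per batch
    for i in order:
        n = node_counts[i]
        if n > capacity:
            raise ValueError(f"graph {i} with {n} nodes exceeds capacity {capacity}")
        for b in range(len(bins)):
            if free[b] >= n:
                # First existing batch with enough room.
                bins[b].append(i)
                free[b] -= n
                break
        else:
            # No batch had room, so open a new one.
            bins.append([i])
            free.append(capacity - n)
    return bins


sizes = [9, 3, 7, 5, 1, 6]  # hypothetical node counts
packs = pack_graphs(sizes, capacity=10)
print(packs)  # [[0, 4], [2, 1], [5], [3]]
```

In this toy case the 31 nodes fit into four batches of 10 slots, so 9 slots are padding. Padding every graph to the largest size (9 nodes) would instead use 54 slots, which is the kind of redundant computation and memory that packing is meant to avoid.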
Affiliation(s)
- Hatem Helal
- Graphcore, Kett House, Station Rd, Cambridge CB1 2JH, U.K
- Jesun Firoz
- Advanced Computing, Mathematics and Data Division, Pacific Northwest National Laboratory, 1100 Dexter Ave N, Seattle, Washington 98109, United States
- Jenna A Bilbrey
- Artificial Intelligence and Data Analytics Division, Pacific Northwest National Laboratory, 902 Battelle Boulevard, Richland, Washington 99352, United States
- Henry Sprueill
- Artificial Intelligence and Data Analytics Division, Pacific Northwest National Laboratory, 902 Battelle Boulevard, Richland, Washington 99352, United States
- Kristina M Herman
- Department of Chemistry, University of Washington, Seattle, Washington 98185, United States
- Tom Murray
- Graphcore, Kett House, Station Rd, Cambridge CB1 2JH, U.K
- Mike Kraus
- Graphcore, Kett House, Station Rd, Cambridge CB1 2JH, U.K
- Ang Li
- Advanced Computing, Mathematics and Data Division, Pacific Northwest National Laboratory, 902 Battelle Boulevard, Richland, Washington 99352, United States
- Payel Das
- IBM Research, Yorktown Heights, New York 10598, United States
- Sotiris S Xantheas
- Department of Chemistry, University of Washington, Seattle, Washington 98185, United States
- Advanced Computing, Mathematics and Data Division, Pacific Northwest National Laboratory, 902 Battelle Boulevard, Richland, Washington 99352, United States
- Sutanay Choudhury
- Advanced Computing, Mathematics and Data Division, Pacific Northwest National Laboratory, 902 Battelle Boulevard, Richland, Washington 99352, United States
|
24
|
Isert C, Atz K, Riniker S, Schneider G. Exploring protein-ligand binding affinity prediction with electron density-based geometric deep learning. RSC Adv 2024; 14:4492-4502. [PMID: 38312732 PMCID: PMC10835705 DOI: 10.1039/d3ra08650j] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/18/2023] [Accepted: 01/19/2024] [Indexed: 02/06/2024] [Open Access]
Abstract
Rational structure-based drug design relies on accurate predictions of protein-ligand binding affinity from structural molecular information. Although deep learning-based methods for predicting binding affinity have shown promise in computational drug design, certain approaches have faced criticism for their potential to inadequately capture the fundamental physical interactions between ligands and their macromolecular targets or for being susceptible to dataset biases. Herein, we propose to include bond-critical points based on the electron density of a protein-ligand complex as a fundamental physical representation of protein-ligand interactions. Employing a geometric deep learning model, we explore the usefulness of these bond-critical points to predict absolute binding affinities of protein-ligand complexes, benchmark model performance against existing methods, and provide a critical analysis of this new approach. The models achieved root-mean-squared errors of 1.4-1.8 log units on the PDBbind dataset, and 1.0-1.7 log units on the PDE10A dataset, not indicating significant advantages over benchmark methods, and thus rendering the utility of electron density for deep learning models context-dependent. The relationship between intermolecular electron density and corresponding binding affinity was analyzed, and Pearson correlation coefficients r > 0.7 were obtained for several macromolecular targets.
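The two metrics quoted in this abstract answer different questions, which a small worked example can show. The sketch below computes the root-mean-squared error and the Pearson correlation coefficient for made-up affinity values in log units; the numbers are hypothetical and are not taken from the paper.

```python
import math


def rmse(predicted, measured):
    """Root-mean-squared error between predictions and measurements."""
    n = len(measured)
    return math.sqrt(sum((p - m) ** 2 for p, m in zip(predicted, measured)) / n)


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


# Hypothetical affinities in log units (e.g. pKd); every prediction is
# exactly one log unit too high.
measured = [5.0, 6.0, 7.0, 8.0]
predicted = [6.0, 7.0, 8.0, 9.0]

error = rmse(predicted, measured)
correlation = pearson_r(predicted, measured)
print(error, correlation)
```

Here the ranking is perfect (r = 1) even though every prediction is off by a full log unit (RMSE = 1), so a high correlation for a target does not by itself imply a low absolute error.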
Affiliation(s)
- Clemens Isert
- ETH Zurich, Department of Chemistry and Applied Biosciences, Vladimir-Prelog-Weg 4, 8093 Zurich, Switzerland. Tel: +41 44 633 73 27
- Kenneth Atz
- ETH Zurich, Department of Chemistry and Applied Biosciences, Vladimir-Prelog-Weg 4, 8093 Zurich, Switzerland. Tel: +41 44 633 73 27
- Sereina Riniker
- ETH Zurich, Department of Chemistry and Applied Biosciences, Vladimir-Prelog-Weg 4, 8093 Zurich, Switzerland. Tel: +41 44 633 73 27
- Gisbert Schneider
- ETH Zurich, Department of Chemistry and Applied Biosciences, Vladimir-Prelog-Weg 4, 8093 Zurich, Switzerland. Tel: +41 44 633 73 27
|
25
|
Zhang Y, Chu Y, Lin S, Xiong Y, Wei DQ. ReHoGCNES-MDA: prediction of miRNA-disease associations using homogenous graph convolutional networks based on regular graph with random edge sampler. Brief Bioinform 2024; 25:bbae103. [PMID: 38517693 PMCID: PMC10959163 DOI: 10.1093/bib/bbae103] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/07/2023] [Revised: 02/04/2024] [Accepted: 02/23/2024] [Indexed: 03/24/2024] [Open Access]
Abstract
Numerous investigations increasingly indicate the significance of microRNA (miRNA) in human diseases. Hence, unearthing associations between miRNA and diseases can contribute to precise diagnosis and efficacious remediation of medical conditions. The detection of miRNA-disease linkages via computational techniques utilizing biological information has emerged as a cost-effective and highly efficient approach. Here, we introduce a computational framework named ReHoGCNES, designed for prospective miRNA-disease association prediction (ReHoGCNES-MDA). This method constructs a homogenous graph convolutional network with a regular graph structure (ReHoGCN) encompassing a disease similarity network, a miRNA similarity network and the known MDA network, and was tested on four experimental tasks. A random edge sampler strategy was utilized to expedite processing and reduce training complexity. Experimental results demonstrate that the proposed ReHoGCNES-MDA method outperforms both homogenous and heterogeneous graph convolutional networks with non-regular graph structure in all four tasks, which suggests that a steady degree distribution of the graph plays an important role in enhancing model performance. Besides, ReHoGCNES-MDA is superior to several machine learning algorithms and state-of-the-art methods on MDA prediction. Furthermore, three case studies were conducted to further demonstrate the predictive ability of ReHoGCNES. Consequently, 93.3% (breast neoplasms), 90% (prostate neoplasms) and 93.3% (prostate neoplasms) of the top 30 forecasted miRNAs were validated by public databases. Hence, ReHoGCNES-MDA might serve as a dependable and beneficial model for predicting possible MDAs.
Affiliation(s)
- Yufang Zhang
- School of Mathematical Sciences and SJTU-Yale Joint Center for Biostatistics and Data Science, Shanghai Jiao Tong University, Shanghai 200240, China
- Peng Cheng Laboratory, Shenzhen, Guangdong 518055, China
- Zhongjing Research and Industrialization Institute of Chinese Medicine, Zhongguancun Scientific Park, Meixi, Nanyang, Henan, 473006, China
- Yanyi Chu
- Department of Pathology, Stanford University School of Medicine, Stanford, CA, 94305, USA
- Shenggeng Lin
- State Key Laboratory of Microbial Metabolism, School of Life Sciences and Biotechnology, and Joint Laboratory of International Cooperation in Metabolic and Developmental Sciences, Ministry of Education, Shanghai Jiao Tong University, Shanghai 200240, China
- Yi Xiong
- State Key Laboratory of Microbial Metabolism, School of Life Sciences and Biotechnology, and Joint Laboratory of International Cooperation in Metabolic and Developmental Sciences, Ministry of Education, Shanghai Jiao Tong University, Shanghai 200240, China
- Shanghai Artificial Intelligence Laboratory, Shanghai, 200232, China
- Dong-Qing Wei
- Peng Cheng Laboratory, Shenzhen, Guangdong 518055, China
- Zhongjing Research and Industrialization Institute of Chinese Medicine, Zhongguancun Scientific Park, Meixi, Nanyang, Henan, 473006, China
- State Key Laboratory of Microbial Metabolism, School of Life Sciences and Biotechnology, and Joint Laboratory of International Cooperation in Metabolic and Developmental Sciences, Ministry of Education, Shanghai Jiao Tong University, Shanghai 200240, China
|
26
|
Yu Z, Wu Z, Wang Z, Wang Y, Zhou M, Li W, Liu G, Tang Y. Network-Based Methods and Their Applications in Drug Discovery. J Chem Inf Model 2024; 64:57-75. [PMID: 38150548 DOI: 10.1021/acs.jcim.3c01613] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 12/29/2023]
Abstract
Drug discovery is time-consuming, expensive, and predominantly follows the "one drug → one target → one disease" paradigm. With the rapid development of systems biology and network pharmacology, a novel drug discovery paradigm, "multidrug → multitarget → multidisease", has emerged. This new holistic paradigm of drug discovery aligns well with the essence of networks, leading to the emergence of network-based methods in the field of drug discovery. In this Perspective, we initially introduce the concept and data sources of networks and highlight classical methodologies employed in network-based methods. Subsequently, we focus on the practical applications of network-based methods across various areas of drug discovery, such as target prediction, virtual screening, prediction of drug therapeutic effects or adverse drug events, and elucidation of molecular mechanisms. In addition, we provide representative web servers for researchers to use network-based methods in specific applications. Finally, we discuss several challenges of network-based methods and the directions for future development. In a word, network-based methods could serve as powerful tools to accelerate drug discovery.
Affiliation(s)
- Zhuohang Yu
- Shanghai Frontiers Science Center of Optogenetic Techniques for Cell Metabolism, Shanghai Key Laboratory of New Drug Design, School of Pharmacy, East China University of Science and Technology, 130 Meilong Road, Shanghai 200237, China
- Zengrui Wu
- Shanghai Frontiers Science Center of Optogenetic Techniques for Cell Metabolism, Shanghai Key Laboratory of New Drug Design, School of Pharmacy, East China University of Science and Technology, 130 Meilong Road, Shanghai 200237, China
- Ze Wang
- Shanghai Frontiers Science Center of Optogenetic Techniques for Cell Metabolism, Shanghai Key Laboratory of New Drug Design, School of Pharmacy, East China University of Science and Technology, 130 Meilong Road, Shanghai 200237, China
- Yimeng Wang
- Shanghai Frontiers Science Center of Optogenetic Techniques for Cell Metabolism, Shanghai Key Laboratory of New Drug Design, School of Pharmacy, East China University of Science and Technology, 130 Meilong Road, Shanghai 200237, China
- Moran Zhou
- Shanghai Frontiers Science Center of Optogenetic Techniques for Cell Metabolism, Shanghai Key Laboratory of New Drug Design, School of Pharmacy, East China University of Science and Technology, 130 Meilong Road, Shanghai 200237, China
- Weihua Li
- Shanghai Frontiers Science Center of Optogenetic Techniques for Cell Metabolism, Shanghai Key Laboratory of New Drug Design, School of Pharmacy, East China University of Science and Technology, 130 Meilong Road, Shanghai 200237, China
- Guixia Liu
- Shanghai Frontiers Science Center of Optogenetic Techniques for Cell Metabolism, Shanghai Key Laboratory of New Drug Design, School of Pharmacy, East China University of Science and Technology, 130 Meilong Road, Shanghai 200237, China
- Yun Tang
- Shanghai Frontiers Science Center of Optogenetic Techniques for Cell Metabolism, Shanghai Key Laboratory of New Drug Design, School of Pharmacy, East China University of Science and Technology, 130 Meilong Road, Shanghai 200237, China
|
27
|
Wang DD, Wu W, Wang R. Structure-based, deep-learning models for protein-ligand binding affinity prediction. J Cheminform 2024; 16:2. [PMID: 38173000 PMCID: PMC10765576 DOI: 10.1186/s13321-023-00795-9] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/28/2023] [Accepted: 12/10/2023] [Indexed: 01/05/2024] [Open Access]
Abstract
The launch of the AlphaFold series has brought deep-learning techniques into molecular structural science. As another crucial problem, structure-based prediction of protein-ligand binding affinity urgently calls for advanced computational techniques. Is deep learning ready to decode this problem? Here we review mainstream structure-based, deep-learning approaches for this problem, focusing on molecular representations, learning architectures and model interpretability. A model taxonomy has been generated. To compensate for the lack of valid comparisons among those models, we implemented and evaluated representative models on a uniform basis and discuss their advantages and shortcomings. This review will potentially benefit structure-based drug discovery and related areas.
Affiliation(s)
- Debby D Wang
- School of Science and Technology, Hong Kong Metropolitan University, 81 Chung Hau Street, Ho Man Tin, Hong Kong, China
- Wenhui Wu
- College of Electronics and Information Engineering, Shenzhen University, Shenzhen, 518060, China
- Guangdong Key Laboratory of Intelligent Information Processing, Shenzhen University, Shenzhen, 518060, China
- Ran Wang
- School of Mathematical Science, Shenzhen University, Shenzhen, 518060, China.
- Guangdong Key Laboratory of Intelligent Information Processing, Shenzhen University, Shenzhen, 518060, China.
- Shenzhen Key Laboratory of Advanced Machine Learning and Applications, Shenzhen University, Shenzhen, 518060, China.
|
28
|
Qiu W, Liang Q, Yu L, Xiao X, Qiu W, Lin W. LSTM-SAGDTA: Predicting Drug-target Binding Affinity with an Attention Graph Neural Network and LSTM Approach. Curr Pharm Des 2024; 30:468-476. [PMID: 38323613 PMCID: PMC11071654 DOI: 10.2174/0113816128282837240130102817] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/18/2023] [Revised: 01/14/2024] [Accepted: 01/19/2024] [Indexed: 02/08/2024]
Abstract
INTRODUCTION: Drug development is a challenging and costly process, yet it plays a crucial role in improving healthcare outcomes. Drug development requires extensive research and testing to meet the demands for economic efficiency, cures, and pain relief. METHODS: Drug development is a vital research area that necessitates innovation and collaboration to achieve significant breakthroughs. Computer-aided drug design provides a promising avenue for drug discovery and development by reducing costs and improving the efficiency of drug design and testing. RESULTS: In this study, a novel model, namely LSTM-SAGDTA, capable of accurately predicting drug-target binding affinity, was developed. We employed SeqVec to characterize the protein and used graph neural networks to capture information on drug molecules. By introducing self-attentive graph pooling, the model achieved greater accuracy and efficiency in predicting drug-target binding affinity. CONCLUSION: Moreover, LSTM-SAGDTA achieved higher accuracy than current state-of-the-art methods while using less training time. The experimental results suggest that this method is a high-precision solution for DTA prediction.
Affiliation(s)
- Wenjing Qiu
- School of Information Engineering, Jingdezhen Ceramic University, Jingdezhen 333000, China
- Qianle Liang
- School of Information Engineering, Jingdezhen Ceramic University, Jingdezhen 333000, China
- Liyi Yu
- School of Information Engineering, Jingdezhen Ceramic University, Jingdezhen 333000, China
- Xuan Xiao
- School of Information Engineering, Jingdezhen Ceramic University, Jingdezhen 333000, China
- Wangren Qiu
- School of Information Engineering, Jingdezhen Ceramic University, Jingdezhen 333000, China
- Weizhong Lin
- School of Information Engineering, Jingdezhen Ceramic University, Jingdezhen 333000, China
|
29
|
Zhou H, Fu H, Shao X, Cai W. Binding Thermodynamics of Fourth-Generation EGFR Inhibitors Revealed by Absolute Binding Free Energy Calculations. J Chem Inf Model 2023; 63:7837-7846. [PMID: 38054791 DOI: 10.1021/acs.jcim.3c01636] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 12/07/2023]
Abstract
The overexpression or mutation of the kinase domain of the epidermal growth factor receptor (EGFR) is strongly associated with non-small-cell lung cancer (NSCLC). EGFR tyrosine kinase inhibitors (TKIs) have proven to be effective in treating NSCLC patients. However, EGFR mutations can result in drug resistance. To elucidate the mechanisms underlying this resistance and inform future drug development, we examined the binding affinities of BLU-945, a recently reported fourth-generation TKI, to wild-type EGFR (EGFRWT) and its double-mutant (L858R/T790M; EGFRDM) and triple-mutant (L858R/T790M/C797S; EGFRTM) forms. We compared the binding affinities of BLU-945, BLU-945 analogues, CH7233163 (another fourth-generation TKI), and erlotinib (a first-generation TKI) using absolute binding free energy calculations. Our findings reveal that BLU-945 and CH7233163 exhibit binding affinities to both EGFRDM and EGFRTM stronger than those of erlotinib, corroborating experimental data. We identified K745 and T854 as the key residues in the binding of fourth-generation EGFR TKIs. Electrostatic forces were the predominant driving force for the binding of fourth-generation TKIs to EGFR mutants. Furthermore, we discovered that the incorporation of piperidinol and sulfone groups in BLU-945 substantially enhanced its binding capacity to EGFR mutants. Our study offers valuable theoretical insights for optimizing fourth-generation EGFR TKIs.
Affiliation(s)
- Huaxin Zhou
- Research Center for Analytical Sciences, Tianjin Key Laboratory of Biosensing and Molecular Recognition, State Key Laboratory of Medicinal Chemical Biology, College of Chemistry, Nankai University, Tianjin 300071, China
- Haihe Laboratory of Sustainable Chemical Transformations, Tianjin 300192, China
- Haohao Fu
- Research Center for Analytical Sciences, Tianjin Key Laboratory of Biosensing and Molecular Recognition, State Key Laboratory of Medicinal Chemical Biology, College of Chemistry, Nankai University, Tianjin 300071, China
- Haihe Laboratory of Sustainable Chemical Transformations, Tianjin 300192, China
- School of Materials Science and Engineering, Smart Sensing Interdisciplinary Science Center, Nankai University, Tianjin 300350, China
- Xueguang Shao
- Research Center for Analytical Sciences, Tianjin Key Laboratory of Biosensing and Molecular Recognition, State Key Laboratory of Medicinal Chemical Biology, College of Chemistry, Nankai University, Tianjin 300071, China
- Haihe Laboratory of Sustainable Chemical Transformations, Tianjin 300192, China
- School of Materials Science and Engineering, Smart Sensing Interdisciplinary Science Center, Nankai University, Tianjin 300350, China
- Wensheng Cai
- Research Center for Analytical Sciences, Tianjin Key Laboratory of Biosensing and Molecular Recognition, State Key Laboratory of Medicinal Chemical Biology, College of Chemistry, Nankai University, Tianjin 300071, China
- Haihe Laboratory of Sustainable Chemical Transformations, Tianjin 300192, China
- School of Materials Science and Engineering, Smart Sensing Interdisciplinary Science Center, Nankai University, Tianjin 300350, China
|
30
|
Park YJ, Kim H, Jo J, Yoon S. Deep contrastive learning of molecular conformation for efficient property prediction. Nat Comput Sci 2023; 3:1015-1022. [PMID: 38177719 DOI: 10.1038/s43588-023-00560-w] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/20/2023] [Accepted: 10/31/2023] [Indexed: 01/06/2024]
Abstract
Data-driven deep learning algorithms provide accurate prediction of high-level quantum-chemical molecular properties. However, their inputs must be constrained to the same quantum-chemical level of geometric relaxation as the training dataset, limiting their flexibility. Adopting alternative cost-effective conformation generative methods introduces domain-shift problems, deteriorating prediction accuracy. Here we propose a deep contrastive learning-based domain-adaptation method called Local Atomic environment Contrastive Learning (LACL). LACL learns to alleviate the disparities in distribution between the two geometric conformations by comparing different conformation-generation methods. We found that LACL forms a domain-agnostic latent space that encapsulates the semantics of an atom's local atomic environment. LACL achieves quantum-chemical accuracy while circumventing the geometric relaxation bottleneck and could enable future application scenarios such as inverse molecular engineering and large-scale screening. Our approach is also generalizable from small organic molecules to long chains of biological and pharmacological molecules.
Affiliation(s)
- Yang Jeong Park
- Department of Electrical and Computer Engineering, Seoul National University, Seoul, Republic of Korea.
- Institute of New Media and Communications, Seoul National University, Seoul, Republic of Korea.
- Department of Nuclear Science and Engineering, Massachusetts Institute of Technology, Cambridge, MA, USA.
- HyunGi Kim
- Department of Electrical and Computer Engineering, Seoul National University, Seoul, Republic of Korea
- Jeonghee Jo
- Department of Electrical and Computer Engineering, Seoul National University, Seoul, Republic of Korea
- Institute of New Media and Communications, Seoul National University, Seoul, Republic of Korea
- Sungroh Yoon
- Department of Electrical and Computer Engineering, Seoul National University, Seoul, Republic of Korea.
- Institute of New Media and Communications, Seoul National University, Seoul, Republic of Korea.
- Interdisciplinary Program in Artificial Intelligence, Seoul National University, Seoul, Republic of Korea.
|
31
|
Zhu Z, Yao Z, Zheng X, Qi G, Li Y, Mazur N, Gao X, Gong Y, Cong B. Drug-target affinity prediction method based on multi-scale information interaction and graph optimization. Comput Biol Med 2023; 167:107621. [PMID: 37907030 DOI: 10.1016/j.compbiomed.2023.107621] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/18/2023] [Revised: 10/16/2023] [Accepted: 10/23/2023] [Indexed: 11/02/2023]
Abstract
Drug-target affinity (DTA) prediction as an emerging and effective method is widely applied to explore the strength of drug-target interactions in drug development research. By predicting these interactions, researchers can assess the potential efficacy and safety of candidate drugs at an early stage, narrowing down the search space for therapeutic targets and accelerating the discovery and development of new drugs. However, existing DTA prediction models mainly use graphical representations of drug molecules, which lack information on interactions between individual substructures, thus affecting prediction accuracy and model interpretability. Therefore, transformer and diffusion on drug graphs in DTA prediction (TDGraphDTA) are introduced to predict drug-target interactions using multi-scale information interaction and graph optimization. An interactive module is integrated into feature extraction of drug and target features at different granularity levels. A diffusion model-based graph optimization module is proposed to improve the representation of molecular graph structures and enhance the interpretability of graph representations while obtaining optimal feature representations. In addition, TDGraphDTA improves the accuracy and reliability of predictions by capturing relationships and contextual information between molecular substructures. The performance of the proposed TDGraphDTA in DTA prediction was verified on three publicly available benchmark datasets (Davis, Metz, and KIBA). Compared with state-of-the-art baseline models, it achieved better results in terms of consistency index, R-squared, etc. Furthermore, compared with some existing methods, the proposed TDGraphDTA is demonstrated to have better structure capturing capabilities by visualizing the feature capturing capabilities of the model using Grad-AAM toxicity labels in the ToxCast dataset. The corresponding source codes are available at https://github.com/Lamouryz/TDGraph.
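Assuming the "consistency index" in this abstract refers to the concordance index (CI) that is commonly reported on the Davis and KIBA benchmarks, the sketch below shows how such a score is usually computed: over all pairs with different measured affinities, it is the fraction whose predicted order matches the measured order, with prediction ties counted as half. The function name and the affinity values are hypothetical, and the paper may define its metric differently.

```python
def concordance_index(y_true, y_pred):
    """Fraction of comparable pairs that the predictions rank correctly.

    A pair (i, j) is comparable when y_true[i] > y_true[j]. It scores 1
    if y_pred[i] > y_pred[j], 0.5 if the predictions tie, and 0 otherwise.
    """
    concordant = 0.0
    comparable = 0
    n = len(y_true)
    for i in range(n):
        for j in range(n):
            if y_true[i] > y_true[j]:
                comparable += 1
                if y_pred[i] > y_pred[j]:
                    concordant += 1.0
                elif y_pred[i] == y_pred[j]:
                    concordant += 0.5
    if comparable == 0:
        raise ValueError("no comparable pairs: all true values are equal")
    return concordant / comparable


# Hypothetical measured affinities and model scores.
affinity = [5.0, 6.2, 7.1, 8.4]
scores = [5.5, 5.1, 7.0, 9.0]

ci = concordance_index(affinity, scores)
print(ci)  # 5 of the 6 comparable pairs are ordered correctly
```

A CI of 1.0 means every pair is ranked correctly and 0.5 corresponds to random ordering. The double loop is quadratic in the number of samples, which is fine for illustration but slow for large benchmark sets.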
Affiliation(s)
- Zhiqin Zhu
- College of Automation, Chongqing University of Posts and Telecommunications, Chongqing, 400065, China.
- Zheng Yao
- College of Automation, Chongqing University of Posts and Telecommunications, Chongqing, 400065, China.
- Xin Zheng
- College of Automation, Chongqing University of Posts and Telecommunications, Chongqing, 400065, China.
- Guanqiu Qi
- Computer Information Systems Department, State University of New York at Buffalo State, Buffalo, NY 14222, USA.
- Yuanyuan Li
- College of Automation, Chongqing University of Posts and Telecommunications, Chongqing, 400065, China.
- Neal Mazur
- Computer Information Systems Department, State University of New York at Buffalo State, Buffalo, NY 14222, USA.
- Xinbo Gao
- College of Automation, Chongqing University of Posts and Telecommunications, Chongqing, 400065, China.
- Yifei Gong
- Faculty of Applied Science & Engineering, The Edward S. Rogers Sr. Department of Electrical & Computer Engineering (ECE), University of Toronto at Toronto, ON M5S, Canada.
- Baisen Cong
- Diagnostics Digital, DH(Shanghai) Diagnostics Co, Ltd, a Danaher company, Shanghai, 200335, China.
| |
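The TDGraphDTA abstract above reports results in terms of a consistency index and R-squared. As a minimal sketch, assuming "consistency index" refers to the concordance index commonly reported on DTA benchmarks and that R-squared is the plain coefficient of determination (the paper may use a variant), the two metrics could be computed as follows. The function names are illustrative and are not taken from the TDGraph repository.

```python
from itertools import combinations


def concordance_index(y_true, y_pred):
    """Fraction of comparable pairs whose predicted order matches the true order.

    Pairs with equal true affinity are skipped; tied predictions count as 0.5.
    """
    concordant = 0.0
    comparable = 0
    for (t_i, p_i), (t_j, p_j) in combinations(zip(y_true, y_pred), 2):
        if t_i == t_j:
            continue  # not comparable: no true ordering to recover
        comparable += 1
        if p_i == p_j:
            concordant += 0.5
        elif (p_i > p_j) == (t_i > t_j):
            concordant += 1.0
    if comparable == 0:
        raise ValueError("no comparable pairs: all true values are equal")
    return concordant / comparable


def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


# Toy affinities (hypothetical values, not benchmark data)
measured = [5.0, 6.0, 7.0, 8.0]
predicted = [5.1, 6.2, 6.1, 8.3]
ci = concordance_index(measured, predicted)
r2 = r_squared(measured, predicted)
```

In this toy case five of the six comparable pairs are ordered correctly, so the concordance index is 5/6.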
|
32
|
Wang S, Tang H, Shan P, Wu Z, Zuo L. ProS-GNN: Predicting effects of mutations on protein stability using graph neural networks. Comput Biol Chem 2023; 107:107952. [PMID: 37643501 DOI: 10.1016/j.compbiolchem.2023.107952] [Citation(s) in RCA: 4] [Impact Index Per Article: 4.0] [Received: 12/13/2022] [Revised: 08/18/2023] [Accepted: 08/25/2023] [Indexed: 08/31/2023]
Abstract
Predicting protein stability change upon variation through a computational approach is a valuable tool to unveil the mechanisms of mutation-induced drug failure and develop immunotherapy strategies. Some previous machine learning-based techniques exhibit anti-symmetric bias toward destabilizing situations, whereas others struggle with generalization to unseen examples. To address these issues, we propose a gated graph neural network-based approach to predict changes in protein stability upon mutation. The model uses message passing to encode the links between the molecular structure and property after eliminating the non-mutant structure and creating input feature vectors. While doing so, it also incorporates the coordinates of the raw atoms to provide spatial insights into the chemical systems. We test the model on the Ssym, Myoglobin, Broom, and p53 datasets to demonstrate the generalization performance. Compared to existing approaches, our proposed method achieves improved linearity with symmetry in less time. The code for this study is available at: https://github.com/HongzhouTang/Pros-GNN.
Affiliation(s)
- Shuyu Wang
- Department of Control Engineering, Northeastern University, Qinhuangdao Campus, Qinhuangdao 066001, China.
| | - Hongzhou Tang
- Department of Control Engineering, Northeastern University, Qinhuangdao Campus, Qinhuangdao 066001, China
| | - Peng Shan
- Department of Control Engineering, Northeastern University, Qinhuangdao Campus, Qinhuangdao 066001, China
| | - Zhaoxia Wu
- Department of Control Engineering, Northeastern University, Qinhuangdao Campus, Qinhuangdao 066001, China
| | - Lei Zuo
- Department of Marine Engineering, University of Michigan, Ann Arbor 48109, USA
| |
|
33
|
Zhang X, Li Y, Wang J, Xu G, Gu Y. A Multi-perspective Model for Protein-Ligand-Binding Affinity Prediction. Interdiscip Sci 2023; 15:696-709. [PMID: 37815680 DOI: 10.1007/s12539-023-00582-y] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/13/2022] [Revised: 07/09/2023] [Accepted: 07/13/2023] [Indexed: 10/11/2023]
Abstract
Gathering information from multi-perspective graphs is an essential issue for many applications, especially protein-ligand-binding affinity prediction. Most traditional approaches obtain such information individually and with low interpretability. In this paper, we harness the rich information from multi-perspective graphs with a general model, which abstractly represents protein-ligand complexes with better interpretability while achieving excellent predictive performance. In addition, we specifically analyze the protein-ligand-binding affinity problem, taking into account the heterogeneity of proteins and ligands. Experimental evaluations on public datasets demonstrate the effectiveness of our data representation strategy, which fuses information from different perspectives. All code is available at https://github.com/Jthy-af/HaPPy.
Affiliation(s)
- Xianfeng Zhang
- School of Computer and Electronic Information, Nanjing Normal University, Nanjing, 210023, China
| | - Yafei Li
- School of Chemistry and Materials Science, Nanjing Normal University, Nanjing, 210023, China
| | - Jinlan Wang
- School of Physics, Southeast University, Nanjing, 211189, China
| | - Guandong Xu
- School of Computer Science, University of Technology Sydney, Sydney, NSW 2008, Australia
| | - Yanhui Gu
- School of Computer and Electronic Information, Nanjing Normal University, Nanjing, 210023, China.
| |
|
34
|
Luo Y, Liu Y, Peng J. Calibrated geometric deep learning improves kinase-drug binding predictions. Nat Mach Intell 2023; 5:1390-1401. [PMID: 38962391 PMCID: PMC11221792 DOI: 10.1038/s42256-023-00751-0] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Received: 12/20/2022] [Accepted: 09/29/2023] [Indexed: 07/05/2024]
Abstract
Protein kinases regulate various cellular functions and hold significant pharmacological promise in cancer and other diseases. Although kinase inhibitors are one of the largest groups of approved drugs, much of the human kinome remains unexplored but potentially druggable. Computational approaches, such as machine learning, offer efficient solutions for exploring kinase-compound interactions and uncovering novel binding activities. Despite the increasing availability of three-dimensional (3D) protein and compound structures, existing methods predominantly focus on exploiting local features from one-dimensional protein sequences and two-dimensional molecular graphs to predict binding affinities, overlooking the 3D nature of the binding process. Here we present KDBNet, a deep learning algorithm that incorporates 3D protein and molecule structure data to predict binding affinities. KDBNet uses graph neural networks to learn structure representations of protein binding pockets and drug molecules, capturing the geometric and spatial characteristics of binding activity. In addition, we introduce an algorithm to quantify and calibrate the uncertainties of KDBNet's predictions, enhancing its utility in model-guided discovery in chemical or protein space. Experiments demonstrated that KDBNet outperforms existing deep learning models in predicting kinase-drug binding affinities. The uncertainties estimated by KDBNet are informative and well-calibrated with respect to prediction errors. When integrated with a Bayesian optimization framework, KDBNet enables data-efficient active learning and accelerates the exploration and exploitation of diverse high-binding kinase-drug pairs.
Affiliation(s)
- Yunan Luo
- School of Computational Science and Engineering, Georgia Institute of Technology, Atlanta, GA, USA
- These authors contributed equally: Yunan Luo, Yang Liu
| | - Yang Liu
- Department of Computer Science, University of Illinois Urbana-Champaign, Urbana, IL, USA
- These authors contributed equally: Yunan Luo, Yang Liu
| | - Jian Peng
- Department of Computer Science, University of Illinois Urbana-Champaign, Urbana, IL, USA
| |
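The KDBNet abstract above states that its estimated uncertainties are well-calibrated with respect to prediction errors. One common way to check a claim of this kind is to compare the nominal coverage of a predictive interval with the fraction of measurements that actually fall inside it. The sketch below assumes a Gaussian predictive distribution with a per-sample mean and standard deviation; it is a generic calibration check, not the calibration algorithm introduced in the paper, and all names and values are hypothetical.

```python
from statistics import NormalDist


def interval_coverage(y_true, mean_pred, std_pred, level=0.95):
    """Observed fraction of true values inside the central `level` interval.

    Assumes each prediction is Gaussian with the given mean and standard
    deviation. For a well-calibrated model the result should be close to
    `level` on a sufficiently large held-out set.
    """
    # Half-width of the central interval in units of standard deviation
    z = NormalDist().inv_cdf(0.5 + level / 2.0)
    inside = sum(
        1
        for y, mu, sd in zip(y_true, mean_pred, std_pred)
        if abs(y - mu) <= z * sd
    )
    return inside / len(y_true)


# Toy example (hypothetical affinities and uncertainties)
measured = [7.0, 6.0, 8.0, 5.0]
pred_mean = [7.1, 6.5, 8.0, 7.0]
pred_std = [0.2, 0.2, 0.1, 0.5]
coverage = interval_coverage(measured, pred_mean, pred_std, level=0.95)
```

Here only two of the four measurements fall inside their 95% intervals, so this toy model would be considered overconfident.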
|
35
|
Dong T, Yang Z, Zhou J, Chen CYC. Equivariant Flexible Modeling of the Protein-Ligand Binding Pose with Geometric Deep Learning. J Chem Theory Comput 2023; 19:8446-8459. [PMID: 37938978 DOI: 10.1021/acs.jctc.3c00273] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Indexed: 11/10/2023]
Abstract
Flexible modeling of the protein-ligand complex structure is a fundamental challenge for in silico drug development. Recent studies have improved commonly used docking tools by incorporating extra deep learning-based steps. However, such strategies limit their accuracy and efficiency because they retain massive sampling pressure and lack consideration for flexible biomolecular changes. In this study, we propose FlexPose, a geometric graph network capable of direct flexible modeling of complex structures in Euclidean space without following conventional sampling and scoring strategies. Our model adopts two key designs, scalar-vector dual feature representation and an SE(3)-equivariant network, to manage dynamic structural changes, as well as two strategies, conformation-aware pretraining and weakly supervised learning, to boost model generalizability in unseen chemical space. Benefiting from these paradigms, our model dramatically outperforms all tested popular docking tools and recently advanced deep learning methods, especially in tasks involving protein conformation changes. We further investigate the impact of protein and ligand similarity on the model performance with two conformation-aware strategies. Moreover, FlexPose provides an affinity estimation and model confidence for postanalysis.
Affiliation(s)
- Tiejun Dong
- Intelligent Medical Research Center, School of Intelligent Systems Engineering, Sun Yat-sen University, Shenzhen, Guangdong 510275, China
| | - Ziduo Yang
- Intelligent Medical Research Center, School of Intelligent Systems Engineering, Sun Yat-sen University, Shenzhen, Guangdong 510275, China
| | - Jun Zhou
- Intelligent Medical Research Center, School of Intelligent Systems Engineering, Sun Yat-sen University, Shenzhen, Guangdong 510275, China
| | - Calvin Yu-Chian Chen
- Intelligent Medical Research Center, School of Intelligent Systems Engineering, Sun Yat-sen University, Shenzhen, Guangdong 510275, China
- AI for Science (AI4S)-Preferred Program, Peking University Shenzhen Graduate School, Shenzhen, Guangdong 518055, China
- School of Electronic and Computer Engineering, Peking University Shenzhen Graduate School, Shenzhen, Guangdong 518055, China
- Department of Medical Research, China Medical University Hospital, Taichung 40447, Taiwan
- Department of Bioinformatics and Medical Engineering, Asia University, Taichung 41354, Taiwan
| |
|
36
|
Yan J, Ye Z, Yang Z, Lu C, Zhang S, Liu Q, Qiu J. Multi-task bioassay pre-training for protein-ligand binding affinity prediction. Brief Bioinform 2023; 25:bbad451. [PMID: 38084920 PMCID: PMC10783875 DOI: 10.1093/bib/bbad451] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/15/2023] [Revised: 10/27/2023] [Accepted: 11/15/2023] [Indexed: 12/18/2023] Open
Abstract
Protein-ligand binding affinity (PLBA) prediction is a fundamental task in drug discovery. Recently, various deep learning-based models have predicted binding affinity by taking the three-dimensional (3D) structure of protein-ligand complexes as input and have achieved astounding progress. However, due to the scarcity of high-quality training data, the generalization ability of current models is still limited. Although there is a vast amount of affinity data available in large-scale databases such as ChEMBL, issues such as inconsistent affinity measurement labels (i.e. IC50, Ki, Kd), different experimental conditions, and the lack of available 3D binding structures complicate the development of high-precision affinity prediction models using these data. To address these issues, we (i) propose Multi-task Bioassay Pre-training (MBP), a pre-training framework for structure-based PLBA prediction; (ii) construct a pre-training dataset called ChEMBL-Dock with more than 300k experimentally measured affinity labels and about 2.8M docked 3D structures. By introducing multi-task pre-training to treat the prediction of different affinity labels as different tasks and classifying relative rankings between samples from the same bioassay, MBP learns robust and transferable structural knowledge from our new ChEMBL-Dock dataset with varied and noisy labels. Experiments substantiate the capability of MBP on the structure-based PLBA prediction task. To the best of our knowledge, MBP is the first affinity pre-training model and shows great potential for future development. The MBP web server is now available for free at: https://huggingface.co/spaces/jiaxianustc/mbp.
Affiliation(s)
- Jiaxian Yan
- Anhui Province Key Lab of Big Data Analysis and Application, University of Science and Technology of China, JinZhai Road, 230026, Anhui, China
| | - Zhaofeng Ye
- Tencent Quantum Laboratory, Tencent, Shennan Road, 518057, Guangdong, China
| | - Ziyi Yang
- Tencent Quantum Laboratory, Tencent, Shennan Road, 518057, Guangdong, China
| | - Chengqiang Lu
- Anhui Province Key Lab of Big Data Analysis and Application, University of Science and Technology of China, JinZhai Road, 230026, Anhui, China
| | - Shengyu Zhang
- Tencent Quantum Laboratory, Tencent, Shennan Road, 518057, Guangdong, China
| | - Qi Liu
- Anhui Province Key Lab of Big Data Analysis and Application, University of Science and Technology of China, JinZhai Road, 230026, Anhui, China
| | - Jiezhong Qiu
- Tencent Quantum Laboratory, Tencent, Shennan Road, 518057, Guangdong, China
| |
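The MBP abstract above describes classifying relative rankings between samples from the same bioassay, which sidesteps the problem that affinity values measured under different assay conditions are not directly comparable. A minimal sketch of how such within-assay training pairs might be assembled is given below. The record layout `(assay_id, sample_id, affinity)` and the function name are assumptions made for illustration; this is not the authors' data pipeline.

```python
from collections import defaultdict
from itertools import combinations


def same_assay_ranking_pairs(records):
    """Build (sample_a, sample_b, label) pairs within each bioassay.

    label is 1 if sample_a has the higher affinity value, else 0.
    Samples from different assays are never paired, and ties are skipped
    because they carry no ranking signal.
    """
    by_assay = defaultdict(list)
    for assay_id, sample_id, affinity in records:
        by_assay[assay_id].append((sample_id, affinity))

    pairs = []
    for samples in by_assay.values():
        for (id_a, val_a), (id_b, val_b) in combinations(samples, 2):
            if val_a == val_b:
                continue
            pairs.append((id_a, id_b, 1 if val_a > val_b else 0))
    return pairs


# Toy records (hypothetical assay IDs, sample IDs, and affinity values)
records = [
    ("A1", "x", 6.0),
    ("A1", "y", 7.5),
    ("A2", "z", 5.0),
    ("A1", "w", 6.0),
]
pairs = same_assay_ranking_pairs(records)
```

A binary classifier trained on such pairs only needs the ordering within an assay to be reliable, not the absolute values across assays.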
|
37
|
Nguyen NQ, Park S, Gim M, Kang J. MulinforCPI: enhancing precision of compound-protein interaction prediction through novel perspectives on multi-level information integration. Brief Bioinform 2023; 25:bbad484. [PMID: 38180829 PMCID: PMC10768804 DOI: 10.1093/bib/bbad484] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/14/2023] [Revised: 11/15/2023] [Accepted: 12/05/2023] [Indexed: 01/07/2024] Open
Abstract
Forecasting the interaction between compounds and proteins is crucial for discovering new drugs. However, previous sequence-based studies have not utilized three-dimensional (3D) information on compounds and proteins, such as atom coordinates and distance matrices, to predict binding affinity. Furthermore, numerous widely adopted computational techniques have relied on sequences of amino acid characters for protein representations. This approach may constrain the model's ability to capture meaningful biochemical features, impeding a more comprehensive understanding of the underlying proteins. Here, we propose a two-step deep learning strategy named MulinforCPI that incorporates transfer learning techniques with multi-level resolution features to overcome these limitations. Our approach leverages 3D information from both proteins and compounds and acquires a profound understanding of the atomic-level features of proteins. Besides, our research highlights the divide between first-principle and data-driven methods, offering new research prospects for compound-protein interaction tasks. We applied the proposed method to six datasets: Davis, Metz, KIBA, CASF-2016, DUD-E and BindingDB, to evaluate the effectiveness of our approach.
Affiliation(s)
- Ngoc-Quang Nguyen
- Department of Computer Science and Engineering, Korea University, 02841, Seoul, Korea
| | - Sejeong Park
- Department of Computer Science and Engineering, Korea University, 02841, Seoul, Korea
- AIGEN Sciences, 04778, Seoul, Korea
| | - Mogan Gim
- Department of Computer Science and Engineering, Korea University, 02841, Seoul, Korea
| | - Jaewoo Kang
- Department of Computer Science and Engineering, Korea University, 02841, Seoul, Korea
- Interdisciplinary Graduate Program in Bioinformatics, Korea University, 02841, Seoul, Korea
- AIGEN Sciences, 04778, Seoul, Korea
| |
|
38
|
Xia S, Chen E, Zhang Y. Integrated Molecular Modeling and Machine Learning for Drug Design. J Chem Theory Comput 2023; 19:7478-7495. [PMID: 37883810 PMCID: PMC10653122 DOI: 10.1021/acs.jctc.3c00814] [Citation(s) in RCA: 4] [Impact Index Per Article: 4.0] [Received: 07/26/2023] [Revised: 10/10/2023] [Accepted: 10/11/2023] [Indexed: 10/28/2023]
Abstract
Modern therapeutic development often involves several stages that are interconnected, and multiple iterations are usually required to bring a new drug to the market. Computational approaches have increasingly become an indispensable part of helping reduce the time and cost of the research and development of new drugs. In this Perspective, we summarize our recent efforts on integrating molecular modeling and machine learning to develop computational tools for modulator design, including a pocket-guided rational design approach based on AlphaSpace to target protein-protein interactions, delta machine learning scoring functions for protein-ligand docking as well as virtual screening, and state-of-the-art deep learning models to predict calculated and experimental molecular properties based on molecular mechanics optimized geometries. Meanwhile, we discuss remaining challenges and promising directions for further development and use a retrospective example of FDA approved kinase inhibitor Erlotinib to demonstrate the use of these newly developed computational tools.
Affiliation(s)
- Song Xia
- Department of Chemistry, New York University, New York, New York 10003, United States
| | - Eric Chen
- Department of Chemistry, New York University, New York, New York 10003, United States
| | - Yingkai Zhang
- Department of Chemistry, New York University, New York, New York 10003, United States
- Simons Center for Computational Physical Chemistry at New York University, New York, New York 10003, United States
- NYU-ECNU Center for Computational Chemistry at NYU Shanghai, Shanghai 200062, China
| |
|
39
|
Libouban PY, Aci-Sèche S, Gómez-Tamayo JC, Tresadern G, Bonnet P. The Impact of Data on Structure-Based Binding Affinity Predictions Using Deep Neural Networks. Int J Mol Sci 2023; 24:16120. [PMID: 38003312 PMCID: PMC10671244 DOI: 10.3390/ijms242216120] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/14/2023] [Revised: 10/30/2023] [Accepted: 11/01/2023] [Indexed: 11/26/2023] Open
Abstract
Artificial intelligence (AI) has gained significant traction in the field of drug discovery, with deep learning (DL) algorithms playing a crucial role in predicting protein-ligand binding affinities. Despite advancements in neural network architectures, system representation, and training techniques, the performance of DL affinity prediction has reached a plateau, prompting the question of whether it is truly solved or if the current performance is overly optimistic and reliant on biased, easily predictable data. Like other DL-related problems, this issue seems to stem from the training and test sets used when building the models. In this work, we investigate the impact of several parameters related to the input data on the performance of neural network affinity prediction models. Notably, we identify the size of the binding pocket as a critical factor influencing the performance of our statistical models; furthermore, it is more important to train a model with as much data as possible than to restrict the training to only high-quality datasets. Finally, we also confirm the bias in the typically used current test sets. Therefore, several types of evaluation and benchmarking are required to understand models' decision-making processes and accurately compare the performance of models.
Affiliation(s)
- Pierre-Yves Libouban
- Institute of Organic and Analytical Chemistry (ICOA), UMR7311, Université d’Orléans, CNRS, Pôle de Chimie rue de Chartres, 45067 Orléans, CEDEX 2, France; (P.-Y.L.); (S.A.-S.)
| | - Samia Aci-Sèche
- Institute of Organic and Analytical Chemistry (ICOA), UMR7311, Université d’Orléans, CNRS, Pôle de Chimie rue de Chartres, 45067 Orléans, CEDEX 2, France; (P.-Y.L.); (S.A.-S.)
| | - Jose Carlos Gómez-Tamayo
- Computational Chemistry, Janssen Research & Development, Janssen Pharmaceutica N. V., B-2340 Beerse, Belgium; (J.C.G.-T.); (G.T.)
| | - Gary Tresadern
- Computational Chemistry, Janssen Research & Development, Janssen Pharmaceutica N. V., B-2340 Beerse, Belgium; (J.C.G.-T.); (G.T.)
| | - Pascal Bonnet
- Institute of Organic and Analytical Chemistry (ICOA), UMR7311, Université d’Orléans, CNRS, Pôle de Chimie rue de Chartres, 45067 Orléans, CEDEX 2, France; (P.-Y.L.); (S.A.-S.)
| |
|
40
|
Zhang S, Liu Y, Xie L. A universal framework for accurate and efficient geometric deep learning of molecular systems. Sci Rep 2023; 13:19171. [PMID: 37932352 PMCID: PMC10628308 DOI: 10.1038/s41598-023-46382-8] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/23/2023] [Accepted: 10/31/2023] [Indexed: 11/08/2023] Open
Abstract
Molecular sciences address a wide range of problems involving molecules of different types and sizes and their complexes. Recently, geometric deep learning, especially Graph Neural Networks, has shown promising performance in molecular science applications. However, most existing works often impose targeted inductive biases to a specific molecular system, and are inefficient when applied to macromolecules or large-scale tasks, thereby limiting their applications to many real-world problems. To address these challenges, we present PAMNet, a universal framework for accurately and efficiently learning the representations of three-dimensional (3D) molecules of varying sizes and types in any molecular system. Inspired by molecular mechanics, PAMNet induces a physics-informed bias to explicitly model local and non-local interactions and their combined effects. As a result, PAMNet can reduce expensive operations, making it time and memory efficient. In extensive benchmark studies, PAMNet outperforms state-of-the-art baselines regarding both accuracy and efficiency in three diverse learning tasks: small molecule properties, RNA 3D structures, and protein-ligand binding affinities. Our results highlight the potential for PAMNet in a broad range of molecular science applications.
Affiliation(s)
- Shuo Zhang
- Department of Computer Science, Hunter College, The City University of New York, New York, 10065, USA
- Helen and Robert Appel Alzheimer's Disease Research Institute, Feil Family Brain and Mind Research Institute, Weill Cornell Medicine, Cornell University, New York, 10065, USA
| | - Yang Liu
- Department of Computer Science, Hunter College, The City University of New York, New York, 10065, USA
| | - Lei Xie
- Department of Computer Science, Hunter College, The City University of New York, New York, 10065, USA.
- Helen and Robert Appel Alzheimer's Disease Research Institute, Feil Family Brain and Mind Research Institute, Weill Cornell Medicine, Cornell University, New York, 10065, USA.
- Ph.D. Program in Computer Science, The Graduate Center, The City University of New York, New York, 10016, USA.
| |
|
41
|
Liu W, Chu Z, Yang C, Yang T, Yang Y, Wu H, Sun J. Discovery of potent STAT3 inhibitors using structure-based virtual screening, molecular dynamic simulation, and biological evaluation. Front Oncol 2023; 13:1287797. [PMID: 38023173 PMCID: PMC10652556 DOI: 10.3389/fonc.2023.1287797] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/02/2023] [Accepted: 10/19/2023] [Indexed: 12/01/2023] Open
Abstract
Introduction: Signal transducer and activator of transcription 3 (STAT3) is ubiquitously hyper-activated in numerous cancers, rendering it an appealing target for therapeutic intervention. Methods and results: In this study, using structure-based virtual screening complemented by molecular dynamics simulations, we identified ten potential STAT3 inhibitors. The simulations pinpointed compounds 8, 9, and 10 as forming distinct hydrogen bonds with the SH2 domain of STAT3. In vitro cytotoxicity assays highlighted compound 4 as a potent inhibitor of gastric cancer cell proliferation across MGC803, KATO III, and NCI-N87 cell lines. Further cellular assays substantiated the ability of compound 4 to attenuate IL-6-mediated STAT3 phosphorylation at Tyr475. Additionally, oxygen consumption rate assays corroborated compound 4's deleterious effects on mitochondrial function. Discussion: Collectively, our findings position compound 4 as a promising lead candidate warranting further exploration in the development of anti-gastric cancer therapeutics.
Affiliation(s)
- Weifeng Liu
- Department of Hepatobiliary and Pancreatic Surgery, The First Affiliated Hospital, and College of Clinical Medicine of Henan University of Science and Technology, Luoyang, Henan, China
| | - Zhijie Chu
- Department of Hepatobiliary and Pancreatic Surgery, The First Affiliated Hospital, and College of Clinical Medicine of Henan University of Science and Technology, Luoyang, Henan, China
| | - Cheng Yang
- Department of Hepatobiliary and Pancreatic Surgery, The First Affiliated Hospital, and College of Clinical Medicine of Henan University of Science and Technology, Luoyang, Henan, China
| | - Tianbao Yang
- Department of Hepatobiliary and Pancreatic Surgery, The First Affiliated Hospital, and College of Clinical Medicine of Henan University of Science and Technology, Luoyang, Henan, China
| | - Yanhui Yang
- Department of Emergency Trauma Surgery, First Affiliated Hospital, College of Clinical Medicine, Henan University of Science and Technology, Luoyang, Henan, China
| | - Haigang Wu
- School of Life Sciences, Henan University, Kaifeng, Henan, China
| | - Junjun Sun
- Department of Hepatobiliary and Pancreatic Surgery, The First Affiliated Hospital, and College of Clinical Medicine of Henan University of Science and Technology, Luoyang, Henan, China
| |
|
42
|
Wu J, Chen H, Cheng M, Xiong H. CurvAGN: Curvature-based Adaptive Graph Neural Networks for Predicting Protein-Ligand Binding Affinity. BMC Bioinformatics 2023; 24:378. [PMID: 37798653 PMCID: PMC10557336 DOI: 10.1186/s12859-023-05503-w] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/05/2023] [Accepted: 09/28/2023] [Indexed: 10/07/2023] Open
Abstract
Accurately predicting the binding affinity between proteins and ligands is crucial for drug discovery. Recent advances in graph neural networks (GNNs) have made significant progress in learning representations of protein-ligand complexes to estimate binding affinities. To improve the performance of GNNs, protein-ligand complexes frequently need to be examined from geometric perspectives. While "off-the-shelf" GNNs can incorporate some basic geometric structures of molecules, such as distances and angles, by modeling the complexes as homophilic graphs, these solutions seldom take into account higher-level geometric attributes such as curvature and homology, or heterophilic interactions. To address these limitations, we introduce the Curvature-based Adaptive Graph Neural Network (CurvAGN). This GNN comprises two components: a curvature block and an adaptive attention guided neural block (AGN). The curvature block encodes multiscale curvature information; the AGN, based on an adaptive graph attention mechanism, then incorporates geometric structure (angle, distance, and multiscale curvature), long-range molecular interactions, and heterophily of the graph into the protein-ligand complex representation. We demonstrate the superiority of our proposed model through experiments conducted on the PDBbind-V2016 core dataset.
Affiliation(s)
- Jianqiu Wu
- Research Center for Graph Computing, Zhejiang Lab, Yuhang, Hangzhou, 311121, Zhejiang, China
| | - Hongyang Chen
- Research Center for Graph Computing, Zhejiang Lab, Yuhang, Hangzhou, 311121, Zhejiang, China.
| | - Minhao Cheng
- Department of Computer Science and Engineering, Hong Kong University of Science and Technology, Jiulongwan, Hongkong, 999077, China
| | - Haoyi Xiong
- Big Data Lab, Baidu Inc., Haidian, Beijing, 100080, China
| |
|
43
|
Dong L, Shi S, Qu X, Luo D, Wang B. Ligand binding affinity prediction with fusion of graph neural networks and 3D structure-based complex graph. Phys Chem Chem Phys 2023; 25:24110-24120. [PMID: 37655493 DOI: 10.1039/d3cp03651k] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 09/02/2023]
Abstract
Accurate prediction of protein-ligand binding affinity is pivotal for drug design and discovery. Here, we propose a novel deep fusion graph neural network framework named FGNN to learn protein-ligand interactions from the 3D structures of protein-ligand complexes. Unlike 1D sequences for proteins or 2D graphs for ligands, the 3D graph of a protein-ligand complex enables more accurate representations of protein-ligand interactions. Benchmark studies have shown that our FGNN fusion models can predict binding affinity more accurately than any individual algorithm. The advantages of fusion strategies have been demonstrated in terms of expressive power of data, learning efficiency and model interpretability. Our fusion models show satisfactory performance on diverse data sets, demonstrating their generalization ability. Given their good performance in both binding affinity prediction and virtual screening, our fusion models are expected to be practically applicable to drug screening and design. Our work highlights the potential of the fusion graph neural network algorithm in solving complex prediction problems in computational biology and chemistry. The fusion graph neural networks (FGNN) model is freely available at https://github.com/LinaDongXMU/FGNN.
Affiliation(s)
- Lina Dong
- State Key Laboratory of Physical Chemistry of Solid Surfaces and Fujian Provincial Key Laboratory of Theoretical and Computational Chemistry, iChEM, College of Chemistry and Chemical Engineering, Xiamen University, Xiamen, 361005, China.
| | - Shuai Shi
- Department of Algorithm, TuringQ Co., Ltd., Shanghai, 200240, China
| | - Xiaoyang Qu
- State Key Laboratory of Physical Chemistry of Solid Surfaces and Fujian Provincial Key Laboratory of Theoretical and Computational Chemistry, iChEM, College of Chemistry and Chemical Engineering, Xiamen University, Xiamen, 361005, China.
| | - Ding Luo
- State Key Laboratory of Physical Chemistry of Solid Surfaces and Fujian Provincial Key Laboratory of Theoretical and Computational Chemistry, iChEM, College of Chemistry and Chemical Engineering, Xiamen University, Xiamen, 361005, China.
| | - Binju Wang
- State Key Laboratory of Physical Chemistry of Solid Surfaces and Fujian Provincial Key Laboratory of Theoretical and Computational Chemistry, iChEM, College of Chemistry and Chemical Engineering, Xiamen University, Xiamen, 361005, China.
- Innovation Laboratory for Sciences and Technologies of Energy Materials of Fujian Province (IKKEM), Xiamen, 361005, China
| |
|
44
|
Han R, Yoon H, Kim G, Lee H, Lee Y. Revolutionizing Medicinal Chemistry: The Application of Artificial Intelligence (AI) in Early Drug Discovery. Pharmaceuticals (Basel) 2023; 16:1259. [PMID: 37765069 PMCID: PMC10537003 DOI: 10.3390/ph16091259] [Citation(s) in RCA: 11] [Impact Index Per Article: 11.0] [Received: 07/27/2023] [Revised: 08/24/2023] [Accepted: 09/04/2023] [Indexed: 09/29/2023] Open
Abstract
Artificial intelligence (AI) has permeated various sectors, including the pharmaceutical industry and research, where it has been utilized to efficiently identify new chemical entities with desirable properties. The application of AI algorithms to drug discovery presents both remarkable opportunities and challenges. This review article focuses on the transformative role of AI in medicinal chemistry. We delve into the applications of machine learning and deep learning techniques in drug screening and design, discussing their potential to expedite the early drug discovery process. In particular, we provide a comprehensive overview of the use of AI algorithms in predicting protein structures, drug-target interactions, and molecular properties such as drug toxicity. While AI has accelerated the drug discovery process, data quality issues and technological constraints remain challenges. Nonetheless, new relationships and methods have been unveiled, demonstrating AI's expanding potential in predicting and understanding drug interactions and properties. For its full potential to be realized, interdisciplinary collaboration is essential. This review underscores AI's growing influence on the future trajectory of medicinal chemistry and stresses the importance of ongoing synergies between computational and domain experts.
Affiliation(s)
- Yoonji Lee
- College of Pharmacy, Chung-Ang University, Seoul 06974, Republic of Korea

45
Shi Y, Zhang X, Yang Y, Cai T, Peng C, Wu L, Zhou L, Han J, Ma M, Zhu W, Xu Z. D3CARP: a comprehensive platform with multiple-conformation based docking, ligand similarity search and deep learning approaches for target prediction and virtual screening. Comput Biol Med 2023; 164:107283. [PMID: 37536095 DOI: 10.1016/j.compbiomed.2023.107283] [Citation(s) in RCA: 3] [Impact Index Per Article: 3.0] [Received: 06/08/2023] [Revised: 07/15/2023] [Accepted: 07/28/2023] [Indexed: 08/05/2023]
Abstract
Resource- and time-consuming biological experiments are unavoidable in traditional drug discovery, which have directly driven the evolution of various computational algorithms and tools for drug-target interaction (DTI) prediction. For improving the prediction reliability, a comprehensive platform is highly expected as some previously reported webservers are small in scale, single-method, or even out of service. In this study, we integrated the multiple-conformation based docking, 2D/3D ligand similarity search and deep learning approaches to construct a comprehensive webserver, namely D3CARP, for target prediction and virtual screening. Specifically, 9352 conformations with positive control of 1970 targets were used for molecular docking, and approximately 2 million target-ligand pairs were used for 2D/3D ligand similarity search and deep learning. Besides, the positive compounds were added as references, and related diseases of therapeutic targets were annotated for further disease-based DTI study. The accuracies of the molecular docking and deep learning approaches were 0.44 and 0.89, respectively. And the average accuracy of five ligand similarity searches was 0.94. The strengths of D3CARP encompass the support for multiple computational methods, ensemble docking, utilization of positive controls as references, cross-validation of predicted outcomes, diverse disease types, and broad applicability in drug discovery. The D3CARP is freely accessible at https://www.d3pharma.com/D3CARP/index.php.
Affiliation(s)
- Yulong Shi
- State Key Laboratory of Drug Research; Drug Discovery and Design Center, Shanghai Institute of Materia Medica, Chinese Academy of Sciences, Shanghai 201203, China; School of Pharmacy, University of Chinese Academy of Sciences, Beijing 100049, China
- Xinben Zhang
- State Key Laboratory of Drug Research; Drug Discovery and Design Center, Shanghai Institute of Materia Medica, Chinese Academy of Sciences, Shanghai 201203, China
- Yanqing Yang
- State Key Laboratory of Drug Research; Drug Discovery and Design Center, Shanghai Institute of Materia Medica, Chinese Academy of Sciences, Shanghai 201203, China; School of Pharmacy, University of Chinese Academy of Sciences, Beijing 100049, China
- Tingting Cai
- State Key Laboratory of Drug Research; Drug Discovery and Design Center, Shanghai Institute of Materia Medica, Chinese Academy of Sciences, Shanghai 201203, China
- Cheng Peng
- State Key Laboratory of Drug Research; Drug Discovery and Design Center, Shanghai Institute of Materia Medica, Chinese Academy of Sciences, Shanghai 201203, China; School of Pharmacy, University of Chinese Academy of Sciences, Beijing 100049, China
- Leyun Wu
- State Key Laboratory of Drug Research; Drug Discovery and Design Center, Shanghai Institute of Materia Medica, Chinese Academy of Sciences, Shanghai 201203, China; School of Pharmacy, University of Chinese Academy of Sciences, Beijing 100049, China
- Liping Zhou
- State Key Laboratory of Drug Research; Drug Discovery and Design Center, Shanghai Institute of Materia Medica, Chinese Academy of Sciences, Shanghai 201203, China; School of Pharmacy, University of Chinese Academy of Sciences, Beijing 100049, China
- Jiaxin Han
- State Key Laboratory of Drug Research; Drug Discovery and Design Center, Shanghai Institute of Materia Medica, Chinese Academy of Sciences, Shanghai 201203, China
- Minfei Ma
- State Key Laboratory of Drug Research; Drug Discovery and Design Center, Shanghai Institute of Materia Medica, Chinese Academy of Sciences, Shanghai 201203, China; School of Pharmacy, University of Chinese Academy of Sciences, Beijing 100049, China
- Weiliang Zhu
- State Key Laboratory of Drug Research; Drug Discovery and Design Center, Shanghai Institute of Materia Medica, Chinese Academy of Sciences, Shanghai 201203, China; School of Pharmacy, University of Chinese Academy of Sciences, Beijing 100049, China
- Zhijian Xu
- State Key Laboratory of Drug Research; Drug Discovery and Design Center, Shanghai Institute of Materia Medica, Chinese Academy of Sciences, Shanghai 201203, China; School of Pharmacy, University of Chinese Academy of Sciences, Beijing 100049, China

46
Zhang Y, Hu Y, Han N, Yang A, Liu X, Cai H. A survey of drug-target interaction and affinity prediction methods via graph neural networks. Comput Biol Med 2023; 163:107136. [PMID: 37329615 DOI: 10.1016/j.compbiomed.2023.107136] [Citation(s) in RCA: 5] [Impact Index Per Article: 5.0] [Received: 03/11/2023] [Revised: 05/29/2023] [Accepted: 06/04/2023] [Indexed: 06/19/2023]
Abstract
The tasks of drug-target interaction (DTI) and drug-target affinity (DTA) prediction play important roles in the field of drug discovery. However, biological experiment-based methods are time-consuming and expensive. Recently, computational-based approaches have accelerated the process of drug-target relationship prediction. Drug and target features are represented in structure-based, sequence-based, and graph-based ways. Although some achievements have been made regarding structure-based representations and sequence-based representations, the acquired feature information is not sufficiently rich. Molecular graph-based representations are some of the more popular approaches, and they have also generated a great deal of interest. In this article, we provide an overview of the DTI prediction and DTA prediction tasks based on graph neural networks (GNNs). We briefly discuss the molecular graphs of drugs, the primary sequences of target proteins, and the graph representations of target proteins. Meanwhile, we conducted experiments on various fundamental datasets to substantiate the plausibility of DTI and DTA utilizing graph neural networks.
Affiliation(s)
- Yue Zhang
- School of Computer Science, Guangdong Polytechnic Normal University, Guangzhou, 510665, China
- Yuqing Hu
- School of Computer Science, Guangdong Polytechnic Normal University, Guangzhou, 510665, China
- Na Han
- School of Computer Science, Guangdong Polytechnic Normal University, Guangzhou, 510665, China
- Aqing Yang
- School of Computer Science, Guangdong Polytechnic Normal University, Guangzhou, 510665, China
- Xiaoyong Liu
- School of Computer Science, Guangdong Polytechnic Normal University, Guangzhou, 510665, China
- Hongmin Cai
- School of Computer Science and Engineering, South China University of Technology, Guangzhou, 510006, China

47
Wu F, Wu L, Radev D, Xu J, Li SZ. Integration of pre-trained protein language models into geometric deep learning networks. Commun Biol 2023; 6:876. [PMID: 37626165 PMCID: PMC10457366 DOI: 10.1038/s42003-023-05133-1] [Citation(s) in RCA: 3] [Impact Index Per Article: 3.0] [Received: 03/23/2023] [Accepted: 07/11/2023] [Indexed: 08/27/2023] Open
Abstract
Geometric deep learning has recently achieved great success in non-Euclidean domains, and learning on 3D structures of large biomolecules is emerging as a distinct research area. However, its efficacy is largely constrained due to the limited quantity of structural data. Meanwhile, protein language models trained on substantial 1D sequences have shown burgeoning capabilities with scale in a broad range of applications. Several preceding studies consider combining these different protein modalities to promote the representation power of geometric neural networks but fail to present a comprehensive understanding of their benefits. In this work, we integrate the knowledge learned by well-trained protein language models into several state-of-the-art geometric networks and evaluate a variety of protein representation learning benchmarks, including protein-protein interface prediction, model quality assessment, protein-protein rigid-body docking, and binding affinity prediction. Our findings show an overall improvement of 20% over baselines. Strong evidence indicates that the incorporation of protein language models' knowledge enhances geometric networks' capacity by a significant margin and can be generalized to complex tasks.
Affiliation(s)
- Fang Wu
- AI Research and Innovation Laboratory, Westlake University, 310030, Hangzhou, China
- Lirong Wu
- AI Research and Innovation Laboratory, Westlake University, 310030, Hangzhou, China
- Dragomir Radev
- Department of Computer Science, Yale University, New Haven, CT, 06511, USA
- Jinbo Xu
- Institute of AI Industry Research, Tsinghua University, Haidian Street, 100084, Beijing, China
- Toyota Technological Institute at Chicago, Chicago, IL, 60637, USA
- Stan Z Li
- AI Research and Innovation Laboratory, Westlake University, 310030, Hangzhou, China

48
Hagg A, Kirschner KN. Open-Source Machine Learning in Computational Chemistry. J Chem Inf Model 2023; 63:4505-4532. [PMID: 37466636 PMCID: PMC10430767 DOI: 10.1021/acs.jcim.3c00643] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 05/04/2023] [Indexed: 07/20/2023]
Abstract
The field of computational chemistry has seen a significant increase in the integration of machine learning concepts and algorithms. In this Perspective, we surveyed 179 open-source software projects, with corresponding peer-reviewed papers published within the last 5 years, to better understand the topics within the field being investigated by machine learning approaches. For each project, we provide a short description, the link to the code, the accompanying license type, and whether the training data and resulting models are made publicly available. Based on those deposited in GitHub repositories, the most popular employed Python libraries are identified. We hope that this survey will serve as a resource to learn about machine learning or specific architectures thereof by identifying accessible codes with accompanying papers on a topic basis. To this end, we also include computational chemistry open-source software for generating training data and fundamental Python libraries for machine learning. Based on our observations and considering the three pillars of collaborative machine learning work, open data, open source (code), and open models, we provide some suggestions to the community.
Affiliation(s)
- Alexander Hagg
- Institute of Technology, Resource and Energy-Efficient Engineering (TREE), University of Applied Sciences Bonn-Rhein-Sieg, 53757 Sankt Augustin, Germany
- Department of Electrical Engineering, Mechanical Engineering and Technical Journalism, University of Applied Sciences Bonn-Rhein-Sieg, 53757 Sankt Augustin, Germany
- Karl N. Kirschner
- Institute of Technology, Resource and Energy-Efficient Engineering (TREE), University of Applied Sciences Bonn-Rhein-Sieg, 53757 Sankt Augustin, Germany
- Department of Computer Science, University of Applied Sciences Bonn-Rhein-Sieg, 53757 Sankt Augustin, Germany

49
Abstract
Drug development is a wide scientific field that faces many challenges these days. Among them are extremely high development costs, long development times, and a small number of new drugs that are approved each year. New and innovative technologies are needed to solve these problems that make the drug discovery process of small molecules more time and cost efficient, and that allow previously undruggable receptor classes to be targeted, such as protein-protein interactions. Structure-based virtual screenings (SBVSs) have become a leading contender in this context. In this review, we give an introduction to the foundations of SBVSs and survey their progress in the past few years with a focus on ultralarge virtual screenings (ULVSs). We outline key principles of SBVSs, recent success stories, new screening techniques, available deep learning-based docking methods, and promising future research directions. ULVSs have an enormous potential for the development of new small-molecule drugs and are already starting to transform early-stage drug discovery.
Affiliation(s)
- Christoph Gorgulla
- Harvard Medical School and Physics Department, Harvard University, Boston, Massachusetts, USA
- Department of Cancer Biology, Dana-Farber Cancer Institute, Boston, Massachusetts, USA
- Current affiliation: Department of Structural Biology, St. Jude Children's Research Hospital, Memphis, Tennessee, USA

50
Shen C, Zhang X, Hsieh CY, Deng Y, Wang D, Xu L, Wu J, Li D, Kang Y, Hou T, Pan P. A generalized protein-ligand scoring framework with balanced scoring, docking, ranking and screening powers. Chem Sci 2023; 14:8129-8146. [PMID: 37538816 PMCID: PMC10395315 DOI: 10.1039/d3sc02044d] [Citation(s) in RCA: 7] [Impact Index Per Article: 7.0] [Received: 04/20/2023] [Accepted: 07/03/2023] [Indexed: 08/05/2023] Open
Abstract
Applying machine learning algorithms to protein-ligand scoring functions has aroused widespread attention in recent years due to the high predictive accuracy and affordable computational cost. Nevertheless, most machine learning-based scoring functions are only applicable to a specific task, e.g., binding affinity prediction, binding pose prediction or virtual screening, suggesting that the development of a scoring function with balanced performance in all critical tasks remains a grand challenge. To this end, we propose a novel parameterization strategy by introducing an adjustable binding affinity term that represents the correlation between the predicted outcomes and experimental data into the training of mixture density network. The resulting residue-atom distance likelihood potential not only retains the superior docking and screening power over all the other state-of-the-art approaches, but also achieves a remarkable improvement in scoring and ranking performance. We emphatically explore the impacts of several key elements on prediction accuracy as well as the task preference, and demonstrate that the performance of scoring/ranking and docking/screening tasks of a certain model could be well balanced through an appropriate manner. Overall, our study highlights the potential utility of our innovative parameterization strategy as well as the resulting scoring framework in future structure-based drug design.
Affiliation(s)
- Chao Shen
- Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, College of Pharmaceutical Sciences, Zhejiang University Hangzhou 310058 Zhejiang China
- State Key Lab of CAD&CG, Zhejiang University Hangzhou 310058 Zhejiang China
- School of Public Health, Zhejiang University Hangzhou 310058 Zhejiang China
- CarbonSilicon AI Technology Co., Ltd Hangzhou 310018 Zhejiang China
- Xujun Zhang
- Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, College of Pharmaceutical Sciences, Zhejiang University Hangzhou 310058 Zhejiang China
- Chang-Yu Hsieh
- Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, College of Pharmaceutical Sciences, Zhejiang University Hangzhou 310058 Zhejiang China
- Yafeng Deng
- CarbonSilicon AI Technology Co., Ltd Hangzhou 310018 Zhejiang China
- Dong Wang
- Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, College of Pharmaceutical Sciences, Zhejiang University Hangzhou 310058 Zhejiang China
- Lei Xu
- Institute of Bioinformatics and Medical Engineering, School of Electrical and Information Engineering, Jiangsu University of Technology Changzhou 213001 China
- Jian Wu
- School of Public Health, Zhejiang University Hangzhou 310058 Zhejiang China
- Dan Li
- Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, College of Pharmaceutical Sciences, Zhejiang University Hangzhou 310058 Zhejiang China
- Yu Kang
- Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, College of Pharmaceutical Sciences, Zhejiang University Hangzhou 310058 Zhejiang China
- Tingjun Hou
- Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, College of Pharmaceutical Sciences, Zhejiang University Hangzhou 310058 Zhejiang China
- State Key Lab of CAD&CG, Zhejiang University Hangzhou 310058 Zhejiang China
- Peichen Pan
- Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, College of Pharmaceutical Sciences, Zhejiang University Hangzhou 310058 Zhejiang China