1
|
Bhatt R, Koes DR, Durrant JD. CENsible: Interpretable Insights into Small-Molecule Binding with Context Explanation Networks. J Chem Inf Model 2024; 64:4651-4660. [PMID: 38847393 PMCID: PMC11200255 DOI: 10.1021/acs.jcim.4c00825] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/10/2024] [Revised: 05/28/2024] [Accepted: 05/28/2024] [Indexed: 06/18/2024]
Abstract
We present a novel and interpretable approach for assessing small-molecule binding using context explanation networks. Given the specific structure of a protein/ligand complex, our CENsible scoring function uses a deep convolutional neural network to predict the contributions of precalculated terms to the overall binding affinity. We show that CENsible can effectively distinguish active vs inactive compounds for many systems. Its primary benefit over related machine-learning scoring functions, however, is that it retains interpretability, allowing researchers to identify the contribution of each precalculated term to the final affinity prediction, with implications for subsequent lead optimization.
Collapse
Affiliation(s)
- Roshni Bhatt
- Department
of Computational and Systems Biology, University
of Pittsburgh, Pittsburgh, Pennsylvania 15260, United States
- Department
of Biological Sciences, University of Pittsburgh, Pittsburgh, Pennsylvania 15260, United States
| | - David Ryan Koes
- Department
of Computational and Systems Biology, University
of Pittsburgh, Pittsburgh, Pennsylvania 15260, United States
| | - Jacob D. Durrant
- Department
of Biological Sciences, University of Pittsburgh, Pittsburgh, Pennsylvania 15260, United States
| |
Collapse
|
2
|
Qu X, Dong L, Luo D, Si Y, Wang B. Water Network-Augmented Two-State Model for Protein-Ligand Binding Affinity Prediction. J Chem Inf Model 2024; 64:2263-2274. [PMID: 37433009 DOI: 10.1021/acs.jcim.3c00567] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 07/13/2023]
Abstract
Water network rearrangement from the ligand-unbound state to the ligand-bound state is known to have significant effects on the protein-ligand binding interactions, but most of the current machine learning-based scoring functions overlook these effects. In this study, we endeavor to construct a comprehensive and realistic deep learning model by incorporating water network information into both ligand-unbound and -bound states. In particular, extended connectivity interaction features were integrated into graph representation, and graph transformer operator was employed to extract features of the ligand-unbound and -bound states. Through these efforts, we developed a water network-augmented two-state model called ECIFGraph::HM-Holo-Apo. Our new model exhibits satisfactory performance in terms of scoring, ranking, docking, screening, and reverse screening power tests on the CASF-2016 benchmark. In addition, it can achieve superior performance in large-scale docking-based virtual screening tests on the DEKOIS2.0 data set. Our study highlights that the use of a water network-augmented two-state model can be an effective strategy to bolster the robustness and applicability of machine learning-based scoring functions, particularly for targets with hydrophilic or solvent-exposed binding pockets.
Collapse
Affiliation(s)
- Xiaoyang Qu
- State Key Laboratory of Physical Chemistry of Solid Surfaces and Fujian Provincial Key Laboratory of Theoretical and Computational Chemistry, College of Chemistry and Chemical Engineering, Xiamen University, Xiamen 361005, P. R. China
| | - Lina Dong
- State Key Laboratory of Physical Chemistry of Solid Surfaces and Fujian Provincial Key Laboratory of Theoretical and Computational Chemistry, College of Chemistry and Chemical Engineering, Xiamen University, Xiamen 361005, P. R. China
| | - Ding Luo
- State Key Laboratory of Physical Chemistry of Solid Surfaces and Fujian Provincial Key Laboratory of Theoretical and Computational Chemistry, College of Chemistry and Chemical Engineering, Xiamen University, Xiamen 361005, P. R. China
| | - Yubing Si
- College of Chemistry, Zhengzhou University, Zhengzhou 450001, P. R. China
| | - Binju Wang
- State Key Laboratory of Physical Chemistry of Solid Surfaces and Fujian Provincial Key Laboratory of Theoretical and Computational Chemistry, College of Chemistry and Chemical Engineering, Xiamen University, Xiamen 361005, P. R. China
- Innovation Laboratory for Sciences and Technologies of Energy Materials of Fujian Province (IKKEM), Xiamen 361005, P. R. China
| |
Collapse
|
3
|
Zeng X, Li SJ, Lv SQ, Wen ML, Li Y. A comprehensive review of the recent advances on predicting drug-target affinity based on deep learning. Front Pharmacol 2024; 15:1375522. [PMID: 38628639 PMCID: PMC11019008 DOI: 10.3389/fphar.2024.1375522] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/24/2024] [Accepted: 03/21/2024] [Indexed: 04/19/2024] Open
Abstract
Accurate calculation of drug-target affinity (DTA) is crucial for various applications in the pharmaceutical industry, including drug screening, design, and repurposing. However, traditional machine learning methods for calculating DTA often lack accuracy, posing a significant challenge in accurately predicting DTA. Fortunately, deep learning has emerged as a promising approach in computational biology, leading to the development of various deep learning-based methods for DTA prediction. To support researchers in developing novel and highly precision methods, we have provided a comprehensive review of recent advances in predicting DTA using deep learning. We firstly conducted a statistical analysis of commonly used public datasets, providing essential information and introducing the used fields of these datasets. We further explored the common representations of sequences and structures of drugs and targets. These analyses served as the foundation for constructing DTA prediction methods based on deep learning. Next, we focused on explaining how deep learning models, such as Convolutional Neural Networks (CNNs), Recurrent Neural Networks (RNNs), Transformer, and Graph Neural Networks (GNNs), were effectively employed in specific DTA prediction methods. We highlighted the unique advantages and applications of these models in the context of DTA prediction. Finally, we conducted a performance analysis of multiple state-of-the-art methods for predicting DTA based on deep learning. The comprehensive review aimed to help researchers understand the shortcomings and advantages of existing methods, and further develop high-precision DTA prediction tool to promote the development of drug discovery.
Collapse
Affiliation(s)
- Xin Zeng
- College of Mathematics and Computer Science, Dali University, Dali, China
| | - Shu-Juan Li
- Yunnan Institute of Endemic Diseases Control and Prevention, Dali, China
| | - Shuang-Qing Lv
- Institute of Surveying and Information Engineering West Yunnan University of Applied Science, Dali, China
| | - Meng-Liang Wen
- State Key Laboratory for Conservation and Utilization of Bio-Resources in Yunnan, Yunnan University, Kunming, China
| | - Yi Li
- College of Mathematics and Computer Science, Dali University, Dali, China
| |
Collapse
|
4
|
Wang DD, Wu W, Wang R. Structure-based, deep-learning models for protein-ligand binding affinity prediction. J Cheminform 2024; 16:2. [PMID: 38173000 PMCID: PMC10765576 DOI: 10.1186/s13321-023-00795-9] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/28/2023] [Accepted: 12/10/2023] [Indexed: 01/05/2024] Open
Abstract
The launch of AlphaFold series has brought deep-learning techniques into the molecular structural science. As another crucial problem, structure-based prediction of protein-ligand binding affinity urgently calls for advanced computational techniques. Is deep learning ready to decode this problem? Here we review mainstream structure-based, deep-learning approaches for this problem, focusing on molecular representations, learning architectures and model interpretability. A model taxonomy has been generated. To compensate for the lack of valid comparisons among those models, we realized and evaluated representatives from a uniform basis, with the advantages and shortcomings discussed. This review will potentially benefit structure-based drug discovery and related areas.
Collapse
Affiliation(s)
- Debby D Wang
- School of Science and Technology, Hong Kong Metropolitan University, 81 Chung Hau Sreet, Ho Man Tin, Hong Kong, China
| | - Wenhui Wu
- College of Electronics and Information Engineering, Shenzhen University, Shenzhen, 518060, China
- Guangdong Key Laboratory of Intelligent Information Processing, Shenzhen University, Shenzhen, 518060, China
| | - Ran Wang
- School of Mathematical Science, Shenzhen University, Shenzhen, 518060, China.
- Guangdong Key Laboratory of Intelligent Information Processing, Shenzhen University, Shenzhen, 518060, China.
- Shenzhen Key Laboratory of Advanced Machine Learning and Applications, Shenzhen University, Shenzhen , 518060, China.
| |
Collapse
|
5
|
Li B, Wang Y, Yin Z, Xu L, Xie L, Xu X. Decision tree-based identification of important molecular fragments for protein-ligand binding. Chem Biol Drug Des 2024; 103:e14427. [PMID: 38230776 DOI: 10.1111/cbdd.14427] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/26/2023] [Revised: 11/16/2023] [Accepted: 12/11/2023] [Indexed: 01/18/2024]
Abstract
Fragment-based drug design is an emerging technology in pharmaceutical research and development. One of the key aspects of this technology is the identification and quantitative characterization of molecular fragments. This study presents a strategy for identifying important molecular fragments based on molecular fingerprints and decision tree algorithms and verifies its feasibility in predicting protein-ligand binding affinity. Specifically, the three-dimensional (3D) structures of protein-ligand complexes are encoded using extended-connectivity fingerprints (ECFP), and three decision tree models, namely Random Forest, XGBoost, and LightGBM, are used to quantitatively characterize the feature importance, thereby extracting important molecular fragments with high reliability. Few-shot learning reveals that the extracted molecular fragments contribute significantly and consistently to the binding affinity even with a small sample size. Despite the absence of location and distance information for molecular fragments in ECFP, 3D visualization, in combination with the reverse ECFP process, shows that the majority of the extracted fragments are located at the binding interface of the protein and the ligand. This alignment with the distance constraints critical for binding affinity further supports the reliability of the strategy for identifying important molecular fragments.
Collapse
Affiliation(s)
- Baiyi Li
- Institute of Bioinformatics and Medical Engineering, Jiangsu University of Technology, Changzhou, China
| | - Yunsong Wang
- School of Electrical and Information Engineering, Jiangsu University of Technology, Changzhou, China
| | - Zuode Yin
- Institute of Bioinformatics and Medical Engineering, Jiangsu University of Technology, Changzhou, China
| | - Lei Xu
- Institute of Bioinformatics and Medical Engineering, Jiangsu University of Technology, Changzhou, China
| | - Liangxu Xie
- Institute of Bioinformatics and Medical Engineering, Jiangsu University of Technology, Changzhou, China
| | - Xiaojun Xu
- Institute of Bioinformatics and Medical Engineering, Jiangsu University of Technology, Changzhou, China
| |
Collapse
|
6
|
Wang J, Chen C, Yao G, Ding J, Wang L, Jiang H. Intelligent Protein Design and Molecular Characterization Techniques: A Comprehensive Review. Molecules 2023; 28:7865. [PMID: 38067593 PMCID: PMC10707872 DOI: 10.3390/molecules28237865] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/21/2023] [Revised: 11/13/2023] [Accepted: 11/23/2023] [Indexed: 12/18/2023] Open
Abstract
In recent years, the widespread application of artificial intelligence algorithms in protein structure, function prediction, and de novo protein design has significantly accelerated the process of intelligent protein design and led to many noteworthy achievements. This advancement in protein intelligent design holds great potential to accelerate the development of new drugs, enhance the efficiency of biocatalysts, and even create entirely new biomaterials. Protein characterization is the key to the performance of intelligent protein design. However, there is no consensus on the most suitable characterization method for intelligent protein design tasks. This review describes the methods, characteristics, and representative applications of traditional descriptors, sequence-based and structure-based protein characterization. It discusses their advantages, disadvantages, and scope of application. It is hoped that this could help researchers to better understand the limitations and application scenarios of these methods, and provide valuable references for choosing appropriate protein characterization techniques for related research in the field, so as to better carry out protein research.
Collapse
Affiliation(s)
| | | | | | - Junjie Ding
- State Key Laboratory of NBC Protection for Civilian, Beijing 102205, China; (J.W.); (C.C.); (G.Y.)
| | - Liangliang Wang
- State Key Laboratory of NBC Protection for Civilian, Beijing 102205, China; (J.W.); (C.C.); (G.Y.)
| | - Hui Jiang
- State Key Laboratory of NBC Protection for Civilian, Beijing 102205, China; (J.W.); (C.C.); (G.Y.)
| |
Collapse
|
7
|
Dong T, Yang Z, Zhou J, Chen CYC. Equivariant Flexible Modeling of the Protein-Ligand Binding Pose with Geometric Deep Learning. J Chem Theory Comput 2023; 19:8446-8459. [PMID: 37938978 DOI: 10.1021/acs.jctc.3c00273] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/10/2023]
Abstract
Flexible modeling of the protein-ligand complex structure is a fundamental challenge for in silico drug development. Recent studies have improved commonly used docking tools by incorporating extra-deep learning-based steps. However, such strategies limit their accuracy and efficiency because they retain massive sampling pressure and lack consideration for flexible biomolecular changes. In this study, we propose FlexPose, a geometric graph network capable of direct flexible modeling of complex structures in Euclidean space without the following conventional sampling and scoring strategies. Our model adopts two key designs: scalar-vector dual feature representation and SE(3)-equivariant network, to manage dynamic structural changes, as well as two strategies: conformation-aware pretraining and weakly supervised learning, to boost model generalizability in unseen chemical space. Benefiting from these paradigms, our model dramatically outperforms all tested popular docking tools and recently advanced deep learning methods, especially in tasks involving protein conformation changes. We further investigate the impact of protein and ligand similarity on the model performance with two conformation-aware strategies. Moreover, FlexPose provides an affinity estimation and model confidence for postanalysis.
Collapse
Affiliation(s)
- Tiejun Dong
- Intelligent Medical Research Center, School of Intelligent Systems Engineering, Sun Yat-sen University, Shenzhen, Guangdong 510275, China
| | - Ziduo Yang
- Intelligent Medical Research Center, School of Intelligent Systems Engineering, Sun Yat-sen University, Shenzhen, Guangdong 510275, China
| | - Jun Zhou
- Intelligent Medical Research Center, School of Intelligent Systems Engineering, Sun Yat-sen University, Shenzhen, Guangdong 510275, China
| | - Calvin Yu-Chian Chen
- Intelligent Medical Research Center, School of Intelligent Systems Engineering, Sun Yat-sen University, Shenzhen, Guangdong 510275, China
- AI for Science (AI4S)-Preferred Program, Peking University Shenzhen Graduate School, Shenzhen, Guangdong 518055, China
- School of Electronic and Computer Engineering, Peking University Shenzhen Graduate School, Shenzhen, Guangdong 518055, China
- Department of Medical Research, China Medical University Hospital, Taichung 40447, Taiwan
- Department of Bioinformatics and Medical Engineering, Asia University, Taichung 41354, Taiwan
| |
Collapse
|
8
|
Kang S, Kim M, Sun J, Lee M, Min K. Prediction of Protein Aggregation Propensity via Data-Driven Approaches. ACS Biomater Sci Eng 2023; 9:6451-6463. [PMID: 37844262 DOI: 10.1021/acsbiomaterials.3c01001] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 10/18/2023]
Abstract
Protein aggregation occurs when misfolded or unfolded proteins physically bind together and can promote the development of various amyloid diseases. This study aimed to construct surrogate models for predicting protein aggregation via data-driven methods using two types of databases. First, an aggregation propensity score database was constructed by calculating the scores for protein structures in the Protein Data Bank using Aggrescan3D 2.0. Moreover, feature- and graph-based models for predicting protein aggregation have been developed by using this database. The graph-based model outperformed the feature-based model, resulting in an R2 of 0.95, although it intrinsically required protein structures. Second, for the experimental data, a feature-based model was built using the Curated Protein Aggregation Database 2.0 to predict the aggregated intensity curves. In summary, this study suggests approaches that are more effective in predicting protein aggregation, depending on the type of descriptor and the database.
Collapse
Affiliation(s)
- Seungpyo Kang
- School of Mechanical Engineering, Soongsil University, 369 Sangdo-ro, Dongjak-gu 06978, Seoul, Republic of Korea
| | - Minseon Kim
- School of Mechanical Engineering, Soongsil University, 369 Sangdo-ro, Dongjak-gu 06978, Seoul, Republic of Korea
| | - Jiwon Sun
- School of Mechanical Engineering, Soongsil University, 369 Sangdo-ro, Dongjak-gu 06978, Seoul, Republic of Korea
| | - Myeonghun Lee
- School of Systems Biomedical Science, Soongsil University, 369 Sangdo-ro, Dongjak-gu 06978, Seoul, Republic of Korea
| | - Kyoungmin Min
- School of Mechanical Engineering, Soongsil University, 369 Sangdo-ro, Dongjak-gu 06978, Seoul, Republic of Korea
| |
Collapse
|
9
|
Bhatt R, Koes DR, Durrant JD. CENsible: Interpretable Insights into Small-Molecule Binding with Context Explanation Networks. BIORXIV : THE PREPRINT SERVER FOR BIOLOGY 2023:2023.10.18.562959. [PMID: 37904961 PMCID: PMC10614872 DOI: 10.1101/2023.10.18.562959] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Grants] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 11/01/2023]
Abstract
We present a novel and interpretable approach for predicting small-molecule binding affinities using context explanation networks (CENs). Given the specific structure of a protein/ligand complex, our CENsible scoring function uses a deep convolutional neural network to predict the contributions of pre-calculated terms to the overall binding affinity. We show that CENsible can effectively distinguish active vs. inactive compounds for many systems. Its primary benefit over related machine-learning scoring functions, however, is that it retains interpretability, allowing researchers to identify the contribution of each pre-calculated term to the final affinity prediction, with implications for subsequent lead optimization.
Collapse
Affiliation(s)
- Roshni Bhatt
- Department of Computational and Systems Biology, University of Pittsburgh, Pittsburgh, PA 15260
- Department of Biological Sciences, University of Pittsburgh, Pittsburgh, PA 15260
| | - David Ryan Koes
- Department of Computational and Systems Biology, University of Pittsburgh, Pittsburgh, PA 15260
| | - Jacob D Durrant
- Department of Biological Sciences, University of Pittsburgh, Pittsburgh, PA 15260
| |
Collapse
|
10
|
Lee M, Min K. AmorProt: Amino Acid Molecular Fingerprints Repurposing-Based Protein Fingerprint. Biochemistry 2023; 62:2700-2709. [PMID: 37622182 DOI: 10.1021/acs.biochem.3c00253] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 08/26/2023]
Abstract
As protein therapeutics play an important role in almost all medical fields, numerous studies have been conducted on proteins using artificial intelligence. Artificial intelligence has enabled data-driven predictions without the need for expensive experiments. Nevertheless, unlike the various molecular fingerprint algorithms that have been developed, protein fingerprint algorithms have rarely been studied. In this study, we proposed the amino acid molecular fingerprints repurposing-based protein (AmorProt) fingerprint, a protein sequence representation method that effectively uses the molecular fingerprints corresponding to 20 amino acids. Subsequently, the performances of the tree-based machine learning and artificial neural network models were compared using (1) amyloid classification and (2) isoelectric point regression. Finally, the applicability and advantages of the developed platform were demonstrated through a case study and the following experiments: (3) comparison of dataset dependence with feature-based methods, (4) feature importance analysis, and (5) protein space analysis. Consequently, the significantly improved model performance and data-set-independent versatility of the AmorProt fingerprint were verified. The results revealed that the current protein representation method can be applied to various fields related to proteins, such as predicting their fundamental properties or interaction with ligands.
Collapse
Affiliation(s)
- Myeonghun Lee
- School of Systems Biomedical Science, Soongsil University, 369 Sangdo-ro, Dongjak-gu, Seoul 06978, Republic of Korea
| | - Kyoungmin Min
- School of Mechanical Engineering, Soongsil University, 369 Sangdo-ro, Dongjak-gu, Seoul 06978, Republic of Korea
| |
Collapse
|
11
|
Yue ZX, Yan TC, Xu HQ, Liu YH, Hong YF, Chen GX, Xie T, Tao L. A systematic review on the state-of-the-art strategies for protein representation. Comput Biol Med 2023; 152:106440. [PMID: 36543002 DOI: 10.1016/j.compbiomed.2022.106440] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/15/2022] [Revised: 12/08/2022] [Accepted: 12/15/2022] [Indexed: 12/23/2022]
Abstract
The study of drug-target protein interaction is a key step in drug research. In recent years, machine learning techniques have become attractive for research, including drug research, due to their automated nature, predictive power, and expected efficiency. Protein representation is a key step in the study of drug-target protein interaction by machine learning, which plays a fundamental role in the ultimate accomplishment of accurate research. With the progress of machine learning, protein representation methods have gradually attracted attention and have consequently developed rapidly. Therefore, in this review, we systematically classify current protein representation methods, comprehensively review them, and discuss the latest advances of interest. According to the information extraction methods and information sources, these representation methods are generally divided into structure and sequence-based representation methods. Each primary class can be further divided into specific subcategories. As for the particular representation methods involve both traditional and the latest approaches. This review contains a comprehensive assessment of the various methods which researchers can use as a reference for their specific protein-related research requirements, including drug research.
Collapse
Affiliation(s)
- Zi-Xuan Yue
- Key Laboratory of Elemene Class Anti-cancer Chinese Medicines, School of Pharmacy, Hangzhou Normal University, Hangzhou, 311121, China
| | - Tian-Ci Yan
- Key Laboratory of Elemene Class Anti-cancer Chinese Medicines, School of Pharmacy, Hangzhou Normal University, Hangzhou, 311121, China
| | - Hong-Quan Xu
- Key Laboratory of Elemene Class Anti-cancer Chinese Medicines, School of Pharmacy, Hangzhou Normal University, Hangzhou, 311121, China
| | - Yu-Hong Liu
- Key Laboratory of Elemene Class Anti-cancer Chinese Medicines, School of Pharmacy, Hangzhou Normal University, Hangzhou, 311121, China
| | - Yan-Feng Hong
- Key Laboratory of Elemene Class Anti-cancer Chinese Medicines, School of Pharmacy, Hangzhou Normal University, Hangzhou, 311121, China
| | - Gong-Xing Chen
- Key Laboratory of Elemene Class Anti-cancer Chinese Medicines, School of Pharmacy, Hangzhou Normal University, Hangzhou, 311121, China
| | - Tian Xie
- Key Laboratory of Elemene Class Anti-cancer Chinese Medicines, School of Pharmacy, Hangzhou Normal University, Hangzhou, 311121, China.
| | - Lin Tao
- Key Laboratory of Elemene Class Anti-cancer Chinese Medicines, School of Pharmacy, Hangzhou Normal University, Hangzhou, 311121, China.
| |
Collapse
|
12
|
Zhang H, Wang Y, Pan Z, Sun X, Mou M, Zhang B, Li Z, Li H, Zhu F. ncRNAInter: a novel strategy based on graph neural network to discover interactions between lncRNA and miRNA. Brief Bioinform 2022; 23:6747810. [PMID: 36198065 DOI: 10.1093/bib/bbac411] [Citation(s) in RCA: 13] [Impact Index Per Article: 6.5] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/07/2022] [Revised: 08/04/2022] [Accepted: 08/23/2022] [Indexed: 12/14/2022] Open
Abstract
In recent years, many studies have illustrated the significant role that non-coding RNA (ncRNA) plays in biological activities, in which lncRNA, miRNA and especially their interactions have been proved to affect many biological processes. Some in silico methods have been proposed and applied to identify novel lncRNA-miRNA interactions (LMIs), but there are still imperfections in their RNA representation and information extraction approaches, which imply there is still room for further improving their performances. Meanwhile, only a few of them are accessible at present, which limits their practical applications. The construction of a new tool for LMI prediction is thus imperative for the better understanding of their relevant biological mechanisms. This study proposed a novel method, ncRNAInter, for LMI prediction. A comprehensive strategy for RNA representation and an optimized deep learning algorithm of graph neural network were utilized in this study. ncRNAInter was robust and showed better performance of 26.7% higher Matthews correlation coefficient than existing reputable methods for human LMI prediction. In addition, ncRNAInter proved its universal applicability in dealing with LMIs from various species and successfully identified novel LMIs associated with various diseases, which further verified its effectiveness and usability. All source code and datasets are freely available at https://github.com/idrblab/ncRNAInter.
Collapse
Affiliation(s)
- Hanyu Zhang
- College of Pharmaceutical Sciences, Zhejiang University, Hangzhou 310058, China.,Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, Alibaba-Zhejiang University Joint Research Center of Future Digital Healthcare, Hangzhou 330110, China
| | - Yunxia Wang
- College of Pharmaceutical Sciences, Zhejiang University, Hangzhou 310058, China
| | - Ziqi Pan
- College of Pharmaceutical Sciences, Zhejiang University, Hangzhou 310058, China
| | - Xiuna Sun
- College of Pharmaceutical Sciences, Zhejiang University, Hangzhou 310058, China
| | - Minjie Mou
- College of Pharmaceutical Sciences, Zhejiang University, Hangzhou 310058, China
| | - Bing Zhang
- Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, Alibaba-Zhejiang University Joint Research Center of Future Digital Healthcare, Hangzhou 330110, China
| | - Zhaorong Li
- Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, Alibaba-Zhejiang University Joint Research Center of Future Digital Healthcare, Hangzhou 330110, China
| | - Honglin Li
- School of Computer Science and Technology, East China Normal University, Shanghai 200062, China.,Shanghai Key Laboratory of New Drug Design, East China University of Science and Technology, Shanghai 200237, China
| | - Feng Zhu
- College of Pharmaceutical Sciences, Zhejiang University, Hangzhou 310058, China.,Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, Alibaba-Zhejiang University Joint Research Center of Future Digital Healthcare, Hangzhou 330110, China
| |
Collapse
|
13
|
Volkov M, Turk JA, Drizard N, Martin N, Hoffmann B, Gaston-Mathé Y, Rognan D. On the Frustration to Predict Binding Affinities from Protein-Ligand Structures with Deep Neural Networks. J Med Chem 2022; 65:7946-7958. [PMID: 35608179 DOI: 10.1021/acs.jmedchem.2c00487] [Citation(s) in RCA: 46] [Impact Index Per Article: 23.0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 02/08/2023]
Abstract
Accurate prediction of binding affinities from protein-ligand atomic coordinates remains a major challenge in early stages of drug discovery. Using modular message passing graph neural networks describing both the ligand and the protein in their free and bound states, we unambiguously evidence that an explicit description of protein-ligand noncovalent interactions does not provide any advantage with respect to ligand or protein descriptors. Simple models, inferring binding affinities of test samples from that of the closest ligands or proteins in the training set, already exhibit good performances, suggesting that memorization largely dominates true learning in the deep neural networks. The current study suggests considering only noncovalent interactions while omitting their protein and ligand atomic environments. Removing all hidden biases probably requires much denser protein-ligand training matrices and a coordinated effort of the drug design community to solve the necessary protein-ligand structures.
Collapse
Affiliation(s)
- Mikhail Volkov
- Laboratoire d'innovation thérapeutique, UMR7200 CNRS-Université de Strasbourg, 74 route du Rhin, Illkirch 67400, France
| | | | | | | | | | | | - Didier Rognan
- Laboratoire d'innovation thérapeutique, UMR7200 CNRS-Université de Strasbourg, 74 route du Rhin, Illkirch 67400, France
| |
Collapse
|
14
|
A novel graph convolutional neural network for predicting interaction sites on protein kinase inhibitors in phosphorylation. Sci Rep 2022; 12:229. [PMID: 34997142 PMCID: PMC8742007 DOI: 10.1038/s41598-021-04230-7] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/04/2021] [Accepted: 12/14/2021] [Indexed: 11/08/2022] Open
Abstract
Protein kinase-inhibitor interactions are key to the phosphorylation of proteins involved in cell proliferation, differentiation, and apoptosis, which shows the importance of binding mechanism research and kinase inhibitor design. In this study, a novel machine learning module (i.e., the WL Box) was designed and assembled to the Prediction of Interaction Sites of Protein Kinase Inhibitors (PISPKI) model, which is a graph convolutional neural network (GCN) to predict the interaction sites of protein kinase inhibitors. The WL Box is a novel module based on the well-known Weisfeiler-Lehman algorithm, which assembles multiple switch weights to effectively compute graph features. The PISPKI model was evaluated by testing with shuffled datasets and ablation analysis using 11 kinase classes. The accuracy of the PISPKI model with the shuffled datasets varied from 83 to 86%, demonstrating superior performance compared to two baseline models. The effectiveness of the model was confirmed by testing with shuffled datasets. Furthermore, the performance of each component of the model was analyzed via the ablation study, which demonstrated that the WL Box module was critical. The code is available at https://github.com/feiqiwang/PISPKI .
Collapse
|
15
|
Wang Y, Wu S, Duan Y, Huang Y. A point cloud-based deep learning strategy for protein-ligand binding affinity prediction. Brief Bioinform 2021; 23:6440132. [PMID: 34849569 DOI: 10.1093/bib/bbab474] [Citation(s) in RCA: 14] [Impact Index Per Article: 4.7] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/13/2021] [Revised: 09/21/2021] [Accepted: 10/15/2021] [Indexed: 01/14/2023] Open
Abstract
There is great interest to develop artificial intelligence-based protein-ligand binding affinity models due to their immense applications in drug discovery. In this paper, PointNet and PointTransformer, two pointwise multi-layer perceptrons have been applied for protein-ligand binding affinity prediction for the first time. Three-dimensional point clouds could be rapidly generated from PDBbind-2016 with 3772 and 11 327 individual point clouds derived from the refined or/and general sets, respectively. These point clouds (the refined or the extended set) were used to train PointNet or PointTransformer, resulting in protein-ligand binding affinity prediction models with Pearson correlation coefficients R = 0.795 or 0.833 from the extended data set, respectively, based on the CASF-2016 benchmark test. The analysis of parameters suggests that the two deep learning models were capable to learn many interactions between proteins and their ligands, and some key atoms for the interactions could be visualized. The protein-ligand interaction features learned by PointTransformer could be further adapted for the XGBoost-based machine learning algorithm, resulting in prediction models with an average Rp of 0.827, which is on par with state-of-the-art machine learning models. These results suggest that the point clouds derived from PDBbind data sets are useful to evaluate the performance of 3D point clouds-centered deep learning algorithms, which could learn atomic features of protein-ligand interactions from natural evolution or medicinal chemistry and thus have wide applications in chemistry and biology.
Collapse
Affiliation(s)
- Yeji Wang
- Xiangya International Academy of Translational Medicine, Central South University, Changsha, Hunan 410013, China
| | - Shuo Wu
- Xiangya International Academy of Translational Medicine, Central South University, Changsha, Hunan 410013, China
| | - Yanwen Duan
- Xiangya International Academy of Translational Medicine, Central South University, Changsha, Hunan 410013, China.,Hunan Engineering Research Center of Combinatorial Biosynthesis and Natural Product Drug Discover, Changsha, Hunan 410011, China.,National Engineering Research Center of Combinatorial Biosynthesis for Drug Discovery, Changsha, Hunan 410011, China
| | - Yong Huang
- Xiangya International Academy of Translational Medicine, Central South University, Changsha, Hunan 410013, China.,National Engineering Research Center of Combinatorial Biosynthesis for Drug Discovery, Changsha, Hunan 410011, China
| |
Collapse
|