1
Bogdanova EA, Novoseletsky VN. ProBAN: Neural network algorithm for predicting binding affinity in protein-protein complexes. Proteins 2024; 92:1127-1136. [PMID: 38722047 DOI: 10.1002/prot.26700] [Received: 12/13/2023] [Revised: 03/22/2024] [Accepted: 04/26/2024] [Indexed: 08/07/2024]
Abstract
Determining binding affinities in protein-protein and protein-peptide complexes is a challenging task that directly affects the development of peptide and protein pharmaceuticals. Although several models have been proposed to predict the dissociation constant and the Gibbs free energy of binding, they currently cannot make stable, highly accurate predictions, particularly for complexes consisting of more than two molecules. In this work, we present ProBAN, a new method for predicting binding affinity in protein-protein complexes based on a deep convolutional neural network. Predictions are made from the spatial structures of complexes, represented as a 4D tensor that encodes the locations of atoms and their ability to participate in the various types of interactions found in protein-protein and protein-peptide complexes. The model was evaluated both on an internal test set of complexes consisting of three or more molecules and on an external test set from the PPI-Affinity service. ProBAN achieved the best prediction quality on these datasets among all the models analyzed: Pearson correlation R = 0.6 and MAE = 1.60 on the internal test, and R = 0.55 and MAE = 1.75 on the external test. The open-source code, the trained ProBAN model, and the collected dataset are freely available at https://github.com/EABogdanova/ProBAN.
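The 4D input described above can be pictured as a stack of 3D occupancy grids, one per interaction type (channels × x × y × z). The sketch below is a minimal, hypothetical illustration and does not reproduce ProBAN's featurization. The four channels, the 24 Å box, and the 1 Å voxel size are assumptions chosen only to show the shape of the data.

```python
import numpy as np

def voxelize(coords, features, box_center, box_size=24.0, resolution=1.0):
    """Map atoms onto a (n_channels, n, n, n) grid centred on box_center.

    coords   : (n_atoms, 3) Cartesian coordinates in angstroms
    features : (n_atoms, n_channels) per-atom interaction flags (0/1);
               the channel meanings are hypothetical, e.g. hydrophobic,
               H-bond donor, H-bond acceptor, charged
    """
    coords = np.asarray(coords, dtype=float)
    features = np.asarray(features, dtype=float)
    n = int(round(box_size / resolution))
    grid = np.zeros((features.shape[1], n, n, n))
    origin = np.asarray(box_center, dtype=float) - box_size / 2.0
    idx = np.floor((coords - origin) / resolution).astype(int)
    inside = np.all((idx >= 0) & (idx < n), axis=1)  # ignore atoms outside the box
    for (i, j, k), f in zip(idx[inside], features[inside]):
        grid[:, i, j, k] += f  # accumulate each atom's flags in its voxel
    return grid

if __name__ == "__main__":
    xyz = [[0.0, 0.0, 0.0], [1.5, 0.0, 0.0], [100.0, 0.0, 0.0]]
    flags = [[1, 0, 0, 0], [0, 1, 1, 0], [1, 1, 1, 1]]
    tensor = voxelize(xyz, flags, box_center=(0.0, 0.0, 0.0))
    print(tensor.shape, tensor.sum())
```

A real pipeline would usually smooth each atom over neighbouring voxels, for example with a Gaussian, instead of using hard occupancy. The hard assignment is kept here only for clarity.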
2
Zhang H, Fan H, Wang J, Hou T, Saravanan KM, Xia W, Kan HW, Li J, Zhang JZH, Liang X, Chen Y. Revolutionizing GPCR-ligand predictions: DeepGPCR with experimental validation for high-precision drug discovery. Brief Bioinform 2024; 25:bbae281. [PMID: 38864340 PMCID: PMC11167311 DOI: 10.1093/bib/bbae281] [Received: 02/28/2024] [Revised: 05/05/2024] [Accepted: 05/29/2024] [Indexed: 06/13/2024]
Abstract
G-protein coupled receptors (GPCRs), which are implicated in many diseases, are the targets of over 40% of approved drugs. However, reliably obtaining experimental GPCR structures is difficult because of their lipid-embedded conformations. Traditional protein-ligand interaction models perform poorly on GPCR-drug interactions because the available structures are few and of low quality, and generalized models trained on soluble protein-ligand pairs are also inadequate. To address these issues, we developed two models: DeepGPCR_BC for binary classification and DeepGPCR_RG for affinity prediction. These models use non-structural GPCR-ligand interaction data and rely on graph convolutional networks and mol2vec techniques to represent binding pockets and ligands as graphs. This approach significantly speeds up predictions while preserving critical physicochemical and spatial information. In independent tests, DeepGPCR_BC outperformed AutoDock Vina and Schrödinger Dock, with an area under the curve of 0.72, an accuracy of 0.68 and a true positive rate of 0.73. DeepGPCR_RG achieved a Pearson correlation of 0.39 and a root mean squared error of 1.34. We applied these models to screen drug candidates for GPR35 (Q9HC97), and three of the eight candidates (F545-1970, K297-0698, S948-0241) gave promising results. We also obtained six active inhibitors for GLP-1R. Our GPCR-specific models pave the way for efficient and accurate large-scale virtual screening, potentially revolutionizing drug discovery in the GPCR field.
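The figures quoted in these abstracts (Pearson R, MAE, RMSE, AUC) are standard evaluation metrics. The sketch below computes them from plain arrays of experimental and predicted values. The example numbers are made up, and AUC is written in its rank form: the probability that a randomly chosen active scores higher than a randomly chosen inactive.

```python
import numpy as np

def pearson_r(y_true, y_pred):
    """Pearson correlation between experimental and predicted affinities."""
    return float(np.corrcoef(y_true, y_pred)[0, 1])

def mae(y_true, y_pred):
    """Mean absolute error."""
    return float(np.mean(np.abs(np.asarray(y_true, float) - np.asarray(y_pred, float))))

def rmse(y_true, y_pred):
    """Root mean squared error."""
    diff = np.asarray(y_true, float) - np.asarray(y_pred, float)
    return float(np.sqrt(np.mean(diff ** 2)))

def roc_auc(labels, scores):
    """Area under the ROC curve via pairwise ranking; ties count as one half."""
    labels = np.asarray(labels, bool)
    scores = np.asarray(scores, float)
    pos, neg = scores[labels], scores[~labels]
    wins = (pos[:, None] > neg[None, :]).sum()
    ties = (pos[:, None] == neg[None, :]).sum()
    return float((wins + 0.5 * ties) / (len(pos) * len(neg)))

if __name__ == "__main__":
    exp = [6.2, 7.5, 5.1, 8.3]   # made-up experimental pKd values
    pred = [6.0, 7.0, 5.9, 7.8]  # made-up predictions
    print(pearson_r(exp, pred), mae(exp, pred), rmse(exp, pred))
    print(roc_auc([1, 1, 0, 0], [0.9, 0.3, 0.5, 0.1]))
```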
Affiliation(s)
- Haiping Zhang
  - Faculty of Synthetic Biology and Institute of Synthetic Biology, Shenzhen Institute of Advanced Technology, Chinese Academy of Sciences, No. 1068 Xueyuan Boulevard, Nanshan District, Shenzhen 518055, Guangdong Province, China
- Hongjie Fan
  - Ganjiang Chinese Medicine Innovation Center, Xinqizhou East Road 888, Ganjiang New Area, Nanchang 330000, China
- Jixia Wang
  - Ganjiang Chinese Medicine Innovation Center, Xinqizhou East Road 888, Ganjiang New Area, Nanchang 330000, China
  - CAS Key Laboratory of Separation Science for Analytical Chemistry, Dalian Institute of Chemical Physics, Chinese Academy of Sciences, No. 457 Zhongshan Road, Dalian 116023, China
- Tao Hou
  - Ganjiang Chinese Medicine Innovation Center, Xinqizhou East Road 888, Ganjiang New Area, Nanchang 330000, China
  - CAS Key Laboratory of Separation Science for Analytical Chemistry, Dalian Institute of Chemical Physics, Chinese Academy of Sciences, No. 457 Zhongshan Road, Dalian 116023, China
- Konda Mani Saravanan
  - Department of Biotechnology, Bharath Institute of Higher Education and Research, Agharam Road 173, Selaiyur, Chennai, Tamil Nadu 600073, India
- Wei Xia
  - Faculty of Synthetic Biology and Institute of Synthetic Biology, Shenzhen Institute of Advanced Technology, Chinese Academy of Sciences, No. 1068 Xueyuan Boulevard, Nanshan District, Shenzhen 518055, Guangdong Province, China
- Hei Wun Kan
  - Faculty of Synthetic Biology and Institute of Synthetic Biology, Shenzhen Institute of Advanced Technology, Chinese Academy of Sciences, No. 1068 Xueyuan Boulevard, Nanshan District, Shenzhen 518055, Guangdong Province, China
- Junxin Li
  - Shenzhen Laboratory of Human Antibody Engineering, Institute of Biomedicine and Biotechnology, Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences, No. 1068 Xueyuan Boulevard, Nanshan District, Shenzhen 518055, Guangdong Province, China
- John Z H Zhang
  - Faculty of Synthetic Biology and Institute of Synthetic Biology, Shenzhen Institute of Advanced Technology, Chinese Academy of Sciences, No. 1068 Xueyuan Boulevard, Nanshan District, Shenzhen 518055, Guangdong Province, China
- Xinmiao Liang
  - Ganjiang Chinese Medicine Innovation Center, Xinqizhou East Road 888, Ganjiang New Area, Nanchang 330000, China
  - CAS Key Laboratory of Separation Science for Analytical Chemistry, Dalian Institute of Chemical Physics, Chinese Academy of Sciences, No. 457 Zhongshan Road, Dalian 116023, China
- Yang Chen
  - Ganjiang Chinese Medicine Innovation Center, Xinqizhou East Road 888, Ganjiang New Area, Nanchang 330000, China
  - CAS Key Laboratory of Separation Science for Analytical Chemistry, Dalian Institute of Chemical Physics, Chinese Academy of Sciences, No. 457 Zhongshan Road, Dalian 116023, China
3
Zeng X, Li SJ, Lv SQ, Wen ML, Li Y. A comprehensive review of the recent advances on predicting drug-target affinity based on deep learning. Front Pharmacol 2024; 15:1375522. [PMID: 38628639 PMCID: PMC11019008 DOI: 10.3389/fphar.2024.1375522] [Received: 01/24/2024] [Accepted: 03/21/2024] [Indexed: 04/19/2024]
Abstract
Accurate calculation of drug-target affinity (DTA) is crucial for many applications in the pharmaceutical industry, including drug screening, design, and repurposing. However, traditional machine learning methods for calculating DTA often lack accuracy. Deep learning has emerged as a promising approach in computational biology, and various deep learning-based methods for DTA prediction have been developed. To support researchers in developing novel, high-precision methods, we provide a comprehensive review of recent advances in predicting DTA with deep learning. We first conducted a statistical analysis of commonly used public datasets, summarizing their essential information and the fields in which they are used. We then explored common representations of the sequences and structures of drugs and targets. These analyses served as the foundation for constructing deep learning-based DTA prediction methods. Next, we explained how deep learning models such as Convolutional Neural Networks (CNNs), Recurrent Neural Networks (RNNs), Transformers, and Graph Neural Networks (GNNs) have been employed in specific DTA prediction methods, and we highlighted the unique advantages and applications of these models in this context. Finally, we conducted a performance analysis of multiple state-of-the-art deep learning-based DTA prediction methods. This review aims to help researchers understand the strengths and shortcomings of existing methods and to develop high-precision DTA prediction tools that advance drug discovery.
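One of the simplest drug representations used by CNN- and RNN-based DTA models is a character-level integer encoding of the SMILES string, padded to a fixed length. The sketch below is a generic illustration and is not taken from the review. The vocabulary construction, the `max_len` value, and the use of 0 for both padding and unknown characters are assumptions. Character-level splitting breaks two-letter atoms such as `Cl` into two tokens. Some models accept this; others use a regex-based tokenizer instead.

```python
def build_vocab(smiles_list):
    """Assign an integer id to every character seen; 0 is reserved for padding."""
    chars = sorted({ch for smi in smiles_list for ch in smi})
    return {ch: i + 1 for i, ch in enumerate(chars)}

def encode_smiles(smiles, vocab, max_len=100):
    """Truncate or zero-pad a SMILES string to max_len integer ids.

    Characters missing from vocab are mapped to 0 (the padding id)."""
    ids = [vocab.get(ch, 0) for ch in smiles[:max_len]]
    return ids + [0] * (max_len - len(ids))

if __name__ == "__main__":
    vocab = build_vocab(["CCO", "c1ccccc1"])  # ethanol, benzene
    print(vocab)
    print(encode_smiles("CCO", vocab, max_len=8))
```

The resulting integer sequences would typically feed an embedding layer followed by 1D convolutions or a recurrent layer.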
Affiliation(s)
- Xin Zeng
  - College of Mathematics and Computer Science, Dali University, Dali, China
- Shu-Juan Li
  - Yunnan Institute of Endemic Diseases Control and Prevention, Dali, China
- Shuang-Qing Lv
  - Institute of Surveying and Information Engineering, West Yunnan University of Applied Science, Dali, China
- Meng-Liang Wen
  - State Key Laboratory for Conservation and Utilization of Bio-Resources in Yunnan, Yunnan University, Kunming, China
- Yi Li
  - College of Mathematics and Computer Science, Dali University, Dali, China
4
Harihar B, Saravanan KM, Gromiha MM, Selvaraj S. Importance of Inter-residue Contacts for Understanding Protein Folding and Unfolding Rates, Remote Homology, and Drug Design. Mol Biotechnol 2024:10.1007/s12033-024-01119-4. [PMID: 38498284 DOI: 10.1007/s12033-024-01119-4] [Received: 12/16/2023] [Accepted: 02/10/2024] [Indexed: 03/20/2024]
Abstract
Inter-residue interactions in protein structures provide valuable insights into protein folding and stability. Understanding these interactions helps in many crucial applications, including the rational design of therapeutic small molecules and biologics, locating functional protein sites, and predicting protein-protein and protein-ligand interactions. Machine learning models that incorporate inter-residue interactions have improved recently. This review highlights theoretical models that use inter-residue interactions to predict the folding and unfolding rates of proteins. Contact maps, which depict inter-residue interactions, help researchers develop computational models for detecting remote homologs and interface residues in protein-protein complexes, which in turn deepens our understanding of the relationship between protein sequence and structure. The review also highlights applications of contact maps derived from inter-residue interactions in drug discovery. Overall, it presents an extensive assessment of the significant models that use inter-residue interactions to investigate folding rates, unfolding rates, remote homology, and drug development, and it outlines potential future advances in building efficient computational models in structural biology.
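The sketch below makes the contact-map idea concrete. It builds a residue contact map from Cα coordinates and computes relative contact order, a classic contact-based descriptor known to correlate with folding rates (Plaxco, Simons and Baker, 1998). The 8 Å Cα cutoff and the minimum sequence separation of 3 are common conventions assumed here, not parameters taken from the review. The original contact-order definition uses heavy-atom contacts within 6 Å.

```python
import numpy as np

def contact_map(ca_coords, cutoff=8.0, min_separation=3):
    """Boolean (L, L) map: residues i, j are in contact if their C-alpha atoms
    are closer than cutoff (angstroms) and |i - j| >= min_separation."""
    x = np.asarray(ca_coords, dtype=float)
    dist = np.linalg.norm(x[:, None, :] - x[None, :, :], axis=-1)
    i, j = np.indices(dist.shape)
    return (dist < cutoff) & (np.abs(i - j) >= min_separation)

def relative_contact_order(cmap):
    """Mean sequence separation of contacts divided by chain length L."""
    i, j = np.nonzero(np.triu(cmap))  # count each contact pair once
    if i.size == 0:
        return 0.0
    return float(np.mean(j - i) / cmap.shape[0])

if __name__ == "__main__":
    # Four residues at the corners of a 3.8 A square: only 0-3 qualifies as a contact.
    square = [[0, 0, 0], [3.8, 0, 0], [3.8, 3.8, 0], [0, 3.8, 0]]
    cmap = contact_map(square)
    print(cmap.astype(int))
    print(relative_contact_order(cmap))
```

Low contact order (mostly local contacts) is associated with faster folding. This is the kind of structure-derived feature the folding-rate models surveyed here build on.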
Affiliation(s)
- Balasubramanian Harihar
  - Department of Bioinformatics, School of Life Sciences, Bharathidasan University, Tiruchirappalli, Tamil Nadu, 620024, India
  - Department of Biotechnology, Bhupat and Jyoti Mehta School of Biosciences, Indian Institute of Technology Madras, Chennai, Tamil Nadu, 600036, India
- Konda Mani Saravanan
  - Department of Bioinformatics, School of Life Sciences, Bharathidasan University, Tiruchirappalli, Tamil Nadu, 620024, India
  - Department of Biotechnology, Bharath Institute of Higher Education and Research, Chennai, Tamil Nadu, 600073, India
- Michael M Gromiha
  - Department of Biotechnology, Bhupat and Jyoti Mehta School of Biosciences, Indian Institute of Technology Madras, Chennai, Tamil Nadu, 600036, India
- Samuel Selvaraj
  - Department of Bioinformatics, School of Life Sciences, Bharathidasan University, Tiruchirappalli, Tamil Nadu, 620024, India
5
Wang H. Prediction of protein-ligand binding affinity via deep learning models. Brief Bioinform 2024; 25:bbae081. [PMID: 38446737 PMCID: PMC10939342 DOI: 10.1093/bib/bbae081] [Received: 11/27/2023] [Revised: 01/31/2024] [Indexed: 03/08/2024]
Abstract
Accurately predicting the binding affinity between proteins and ligands is crucial for drug screening and optimization, but it remains a challenge in computer-aided drug design. The recent success of AlphaFold2 in predicting protein structures has raised new hopes that deep learning (DL) models can accurately predict protein-ligand binding affinity. However, current DL models still face limitations due to low-quality databases, inaccurate input representations and inappropriate model architectures. In this work, we review the computational methods, specifically DL-based models, used to predict protein-ligand binding affinity. We start with a brief introduction to protein-ligand binding affinity and the traditional computational methods used to calculate it. We then introduce the basic principles of DL models for predicting protein-ligand binding affinity. Next, we review the commonly used databases, input representations and DL models in this field. Finally, we discuss the remaining challenges and future work in accurately predicting protein-ligand binding affinity with DL models.
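Affinities are reported both as dissociation constants (Kd, often given as pKd = −log10 Kd) and as binding free energies. The standard conversion between them is ΔG° = RT ln(Kd / c°), with c° = 1 M. The sketch below assumes 298.15 K and Kd in mol/L.

```python
import math

R_KCAL = 1.987204e-3  # gas constant, kcal/(mol*K)

def kd_to_dg(kd_molar, temperature=298.15):
    """Standard binding free energy in kcal/mol from Kd in mol/L (1 M standard state)."""
    return R_KCAL * temperature * math.log(kd_molar)

def pkd_to_dg(pkd, temperature=298.15):
    """Same conversion starting from pKd = -log10(Kd)."""
    return -R_KCAL * temperature * math.log(10) * pkd

if __name__ == "__main__":
    # A 1 nM binder (pKd = 9) corresponds to roughly -12.3 kcal/mol at 25 degrees C.
    print(round(kd_to_dg(1e-9), 2), round(pkd_to_dg(9), 2))
```

At 298.15 K, one pKd unit corresponds to about 1.36 kcal/mol. This makes it easier to compare errors reported on the two scales.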
Affiliation(s)
- Huiwen Wang
  - School of Physics and Engineering, Henan University of Science and Technology, Luoyang 471023, China
6
Shen T, Liu F, Wang Z, Sun J, Bu Y, Meng J, Chen W, Yao K, Mu Y, Li W, Zhao G, Wang S, Wei Y, Zheng L. zPoseScore model for accurate and robust protein-ligand docking pose scoring in CASP15. Proteins 2023; 91:1837-1849. [PMID: 37606194 DOI: 10.1002/prot.26573] [Received: 04/14/2023] [Revised: 07/20/2023] [Accepted: 07/31/2023] [Indexed: 08/23/2023]
Abstract
We introduce zPoseScore, a deep learning-based ligand pose scoring model, for predicting protein-ligand complexes in the 15th Critical Assessment of Protein Structure Prediction (CASP15). Our contributions are threefold. First, we generate six training and evaluation datasets using advanced data augmentation and sampling methods. Second, we redesign the "zFormer" module, inspired by AlphaFold2's Evoformer, to describe protein-ligand interactions efficiently; this module extracts paired protein-ligand features that lead to accurate predictions. Finally, we develop the zPoseScore framework with zFormer for scoring and ranking ligand poses; it encodes and fuses protein-ligand features at the atomic level to output refined ligand poses and per-atom ligand deviations. Our results demonstrate excellent performance on various test datasets, with Pearson correlations of R = 0.783 and 0.659 for ranking docking decoys generated from the experimental and predicted protein structures, respectively, of CASF-2016 protein-ligand complexes. Additionally, we obtained an average local distance difference test score (lDDT_pli) of 0.558 for AIchemy LIG2 in CASP15 for de novo protein-ligand complex structure prediction. Detailed analysis shows that accurate prediction of the ligand binding site and side-chain orientations is crucial for better prediction performance. Our proposed model is one of the most accurate protein-ligand pose prediction models and could serve as a valuable tool in small-molecule drug discovery.
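zPoseScore is described as predicting per-atom ligand deviations and ranking decoy poses. The quantities such a model is compared against can be sketched as below, assuming each decoy lists ligand atoms in the same order as the reference pose. This is a simplified, hypothetical illustration, not the zPoseScore code. It applies no symmetry correction and no superposition, since docking poses are usually compared in the fixed receptor frame.

```python
import numpy as np

def per_atom_deviation(pose, reference):
    """Distance (angstroms) between matching atoms of two ligand poses."""
    pose = np.asarray(pose, dtype=float)
    reference = np.asarray(reference, dtype=float)
    return np.linalg.norm(pose - reference, axis=1)

def pose_rmsd(pose, reference):
    """RMSD over the supplied atoms, without superposition or symmetry handling."""
    dev = per_atom_deviation(pose, reference)
    return float(np.sqrt(np.mean(dev ** 2)))

def rank_decoys(decoys, reference):
    """Return decoy indices ordered from closest to farthest, plus their RMSDs."""
    rmsds = [pose_rmsd(d, reference) for d in decoys]
    return sorted(range(len(decoys)), key=rmsds.__getitem__), rmsds

if __name__ == "__main__":
    ref = [[0.0, 0.0, 0.0], [1.5, 0.0, 0.0]]
    shifted = [[0.0, 0.0, 1.0], [1.5, 0.0, 1.0]]  # rigidly moved by 1 A
    order, rmsds = rank_decoys([shifted, ref], ref)
    print(order, rmsds)
```

A learned scorer replaces the reference-based RMSD with a prediction made without knowing the true pose. The Pearson R values quoted above measure how well such predicted scores track the true ranking.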
Affiliation(s)
- Tao Shen
  - Shanghai Zelixir Biotech Company Ltd., Shanghai, China
- Fuxu Liu
  - Shanghai Zelixir Biotech Company Ltd., Shanghai, China
- Zechen Wang
  - School of Physics, Shandong University, Jinan, Shandong, China
- Jinyuan Sun
  - Institute of Microbiology, Chinese Academy of Sciences, Beijing, China
- Yifan Bu
  - Shanghai Zelixir Biotech Company Ltd., Shanghai, China
- Jintao Meng
  - Shenzhen Key Laboratory of Intelligent Bioinformatics, Shenzhen Institute of Advanced Technology, Chinese Academy of Sciences, Shenzhen, Guangdong, China
- Weihua Chen
  - Shanghai Zelixir Biotech Company Ltd., Shanghai, China
- Keyi Yao
  - Shanghai Zelixir Biotech Company Ltd., Shanghai, China
- Yuguang Mu
  - School of Biological Sciences, Nanyang Technological University, Singapore, Singapore
- Weifeng Li
  - School of Physics, Shandong University, Jinan, Shandong, China
- Guoping Zhao
  - Shenzhen Key Laboratory of Intelligent Bioinformatics, Shenzhen Institute of Advanced Technology, Chinese Academy of Sciences, Shenzhen, Guangdong, China
- Sheng Wang
  - Shanghai Zelixir Biotech Company Ltd., Shanghai, China
- Yanjie Wei
  - Shenzhen Key Laboratory of Intelligent Bioinformatics, Shenzhen Institute of Advanced Technology, Chinese Academy of Sciences, Shenzhen, Guangdong, China
- Liangzhen Zheng
  - Shanghai Zelixir Biotech Company Ltd., Shanghai, China
  - Shenzhen Key Laboratory of Intelligent Bioinformatics, Shenzhen Institute of Advanced Technology, Chinese Academy of Sciences, Shenzhen, Guangdong, China
7
Zhang W, Hu F, Li W, Yin P. Does protein pretrained language model facilitate the prediction of protein-ligand interaction? Methods 2023; 219:8-15. [PMID: 37690736 DOI: 10.1016/j.ymeth.2023.08.016] [Received: 07/01/2023] [Revised: 08/22/2023] [Accepted: 08/29/2023] [Indexed: 09/12/2023]
Abstract
Protein-ligand interaction (PLI) prediction is a critical step in drug discovery. Recently, protein pretrained language models (PLMs) have shown exceptional performance across a wide range of protein-related tasks. However, PLM pre-training and PLI tasks differ substantially, so it is uncertain how much PLMs benefit PLI prediction. In this study, we propose a method that quantitatively assesses the significance of protein PLMs in PLI prediction. Specifically, we analyze the performance of three widely used protein PLMs (TAPE, ESM-1b, and ProtTrans) on three PLI tasks (PDBbind, Kinase, and DUD-E). Models with pre-training consistently achieve better performance at a lower time cost, demonstrating that pre-training enhances both the accuracy and the efficiency of PLI prediction. By quantitatively assessing transferability, the optimal PLM for each PLI task can be identified without costly transfer experiments. Additionally, we examine how PLMs affect the distribution of the feature space and show that discriminability improves after pre-training. Our findings provide insights into the mechanisms underlying PLMs in PLI prediction and pave the way for the design of more interpretable and accurate PLMs in the future. Code and data are freely available at https://github.com/brian-zZZ/PLM-PLI.
Affiliation(s)
- Weihong Zhang
  - Guangdong-Hong Kong-Macao Joint Laboratory of Human-Machine Intelligence-Synergy Systems, Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences, Shenzhen 518055, China
  - School of Artificial Intelligence, University of Chinese Academy of Sciences, Beijing 100049, China
- Fan Hu
  - Guangdong-Hong Kong-Macao Joint Laboratory of Human-Machine Intelligence-Synergy Systems, Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences, Shenzhen 518055, China
- Wang Li
  - Guangdong-Hong Kong-Macao Joint Laboratory of Human-Machine Intelligence-Synergy Systems, Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences, Shenzhen 518055, China
- Peng Yin
  - Guangdong-Hong Kong-Macao Joint Laboratory of Human-Machine Intelligence-Synergy Systems, Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences, Shenzhen 518055, China
8
Domingo L, Djukic M, Johnson C, Borondo F. Binding affinity predictions with hybrid quantum-classical convolutional neural networks. Sci Rep 2023; 13:17951. [PMID: 37864075 PMCID: PMC10589342 DOI: 10.1038/s41598-023-45269-y] [Received: 08/07/2023] [Accepted: 10/17/2023] [Indexed: 10/22/2023]
Abstract
Central to drug design is the identification of biomolecules that bind uniquely and robustly to a target protein while minimizing their interactions with others. Accordingly, precise binding affinity prediction, which enables the accurate selection of suitable candidates from an extensive pool of potential compounds, can greatly reduce the expenses associated with experimental protocols. In this respect, recent advances have shown that deep learning methods outperform traditional computational methods, especially with the advent of large datasets. These methods, however, are complex and very time-intensive, which is a clear bottleneck for their development and practical application. In this context, the emerging field of quantum machine learning holds promise for enhancing many classical machine learning algorithms. In this work, we take a step forward and present a hybrid quantum-classical convolutional neural network that reduces the complexity of its classical counterpart by 20% while maintaining optimal prediction performance. This also yields cost and time savings of up to 40% in the training stage, which means a substantial speed-up of the drug design process.
Affiliation(s)
- L Domingo
  - Grupo de Sistemas Complejos, Universidad Politécnica de Madrid, 28035, Madrid, Spain
  - Instituto de Ciencias Matemáticas (ICMAT), Campus de Cantoblanco UAM, Nicolás Cabrera, 13-15, 28049, Madrid, Spain
  - Departamento de Química, Universidad Autónoma de Madrid, 28049, Cantoblanco, Madrid, Spain
  - Ingenii Inc., New York, USA
- F Borondo
  - Departamento de Química, Universidad Autónoma de Madrid, 28049, Cantoblanco, Madrid, Spain
9
Abstract
A survey of protein databases indicates that most enzymes exist in oligomeric forms, and about half of those in the UniProt database are homodimeric. Understanding why so many enzymes adopt a dimeric form is therefore important. Recent developments in experimental and computational techniques have allowed a deeper understanding of the cooperative interactions between the subunits of dimeric enzymes. This review succinctly summarizes these advances. It gives an overview of experimental and theoretical methods, of cooperativity in substrate binding, and of the molecular mechanisms of cooperative catalysis within homodimeric enzymes. The focus is on the beneficial effects of dimerization and cooperative catalysis. These advances not only provide essential case studies and theoretical support for understanding dimeric enzyme catalysis but also serve as a foundation for designing highly efficient catalysts, such as dimeric organic catalysts. Moreover, they have significant implications for drug design, as exemplified by Paxlovid, which was designed to target the homodimeric main protease of SARS-CoV-2.
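Cooperativity in substrate binding is often summarized with the Hill equation, θ = [S]^n / (K^n + [S]^n), where n > 1 indicates positive cooperativity. The sketch below is a textbook illustration rather than anything specific to the review. It evaluates fractional saturation and shows how cooperativity narrows the substrate range between 10% and 90% saturation, which is a factor of 81^(1/n).

```python
def hill_saturation(s, k_half, n):
    """Fractional saturation theta = s^n / (k_half^n + s^n)."""
    return s ** n / (k_half ** n + s ** n)

def saturation_range_ratio(n):
    """[S] at 90% saturation divided by [S] at 10% saturation: 81 ** (1 / n)."""
    return 81.0 ** (1.0 / n)

if __name__ == "__main__":
    for n in (1.0, 2.0, 4.0):
        theta = hill_saturation(2.0, 1.0, n)  # substrate at twice K_half
        print(f"n={n}: theta(2K)={theta:.3f}, S90/S10={saturation_range_ratio(n):.1f}")
```

For a non-cooperative site (n = 1), going from 10% to 90% saturation takes an 81-fold increase in substrate. With n = 2 the same switch needs only a 9-fold increase.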
Affiliation(s)
- Ke-Wei Chen
  - Lab of Computational Chemistry and Drug Design, State Key Laboratory of Chemical Oncogenomics, Peking University Shenzhen Graduate School, Shenzhen 518055, China
- Tian-Yu Sun
  - Shenzhen Bay Laboratory, Shenzhen 518132, China
- Yun-Dong Wu
  - Lab of Computational Chemistry and Drug Design, State Key Laboratory of Chemical Oncogenomics, Peking University Shenzhen Graduate School, Shenzhen 518055, China
  - Shenzhen Bay Laboratory, Shenzhen 518132, China
10
Abstract
Drug development is a broad scientific field that currently faces many challenges, among them extremely high development costs, long development times, and the small number of new drugs approved each year. New and innovative technologies are needed to address these problems. They should make small-molecule drug discovery more time- and cost-efficient and allow previously undruggable target classes, such as protein-protein interactions, to be targeted. Structure-based virtual screens (SBVSs) have become a leading contender in this context. In this review, we introduce the foundations of SBVSs and survey their progress in recent years, with a focus on ultralarge virtual screens (ULVSs). We outline key principles of SBVSs, recent success stories, new screening techniques, available deep learning-based docking methods, and promising future research directions. ULVSs have enormous potential for the development of new small-molecule drugs and are already starting to transform early-stage drug discovery.
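A practical constraint in ultralarge screens is that billions of docking scores cannot all be kept in memory. A common pattern is to stream (compound_id, score) pairs and retain only the k best hits with a bounded heap. The sketch below is a generic illustration, not the implementation of any specific screening platform. `fake_docking_stream` is a hypothetical stand-in for a docking engine, and lower scores are assumed to be better, as in Vina-style scoring.

```python
import heapq
import random

def top_k_hits(scored_compounds, k):
    """Return the k (compound_id, score) pairs with the lowest scores.

    heapq.nsmallest consumes the iterable once and keeps only k items on its
    internal heap, so memory stays O(k) however large the library is."""
    return heapq.nsmallest(k, scored_compounds, key=lambda item: item[1])

def fake_docking_stream(n_compounds, seed=0):
    """Hypothetical stand-in for a docking run: yields random scores in kcal/mol."""
    rng = random.Random(seed)
    for i in range(n_compounds):
        yield f"CMPD-{i:09d}", rng.uniform(-12.0, -2.0)

if __name__ == "__main__":
    for compound_id, score in top_k_hits(fake_docking_stream(100_000), k=5):
        print(compound_id, round(score, 2))
```

In a distributed screen, each worker can keep its own top-k list. A final `heapq.nsmallest` over the concatenated per-worker results then gives the global hit list.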
Affiliation(s)
- Christoph Gorgulla
  - Harvard Medical School and Physics Department, Harvard University, Boston, Massachusetts, USA
  - Department of Cancer Biology, Dana-Farber Cancer Institute, Boston, Massachusetts, USA
  - Current affiliation: Department of Structural Biology, St. Jude Children's Research Hospital, Memphis, Tennessee, USA
11
Zhao X, Wang X, Jin Z, Wang R. A normalized differential sequence feature encoding method based on amino acid sequences. Math Biosci Eng 2023; 20:14734-14755. [PMID: 37679156 DOI: 10.3934/mbe.2023659] [Indexed: 09/09/2023]
Abstract
Protein interactions underlie cellular processes such as apoptosis, the immune response, and metabolic pathways. To improve protein interaction prediction, we propose an encoding method based on the normalized difference sequence features (NDSF) of amino acid sequences. NDSF jointly encodes the positional relationships between amino acids within each sequence and the correlation characteristics between sequence pairs. The resulting 174-dimensional human protein sequence vectors are then reduced with principal component analysis (PCA) and locally linear embedding (LLE). This study compares the classification performance of four ensemble learning methods (AdaBoost, Extra Trees, LightGBM, and XGBoost) on the PCA and LLE features, using cross-validation and grid search to find the best parameter combinations. The results show that NDSF generally achieves higher accuracy than the sequence matrix-based (MOS) encoding method while greatly reducing loss and encoding time. The feature-extraction bar chart shows that classification accuracy is significantly higher with the linear dimensionality reduction method, PCA, than with the nonlinear method, LLE. With XGBoost classification, the model reaches an accuracy of 99.2%, the best among all models. This study suggests that NDSF combined with PCA and XGBoost may be an effective strategy for classifying different human protein interactions.
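The PCA step applied to the 174-dimensional NDSF vectors can be sketched with NumPy's SVD. This is a generic illustration, not the authors' code. The random matrix stands in for real encoded protein pairs, and the number of retained components is an arbitrary assumption. A typical way to reproduce the PCA + XGBoost pipeline would be `sklearn.decomposition.PCA` followed by a gradient-boosted classifier such as `xgboost.XGBClassifier`.

```python
import numpy as np

def pca_fit_transform(X, n_components):
    """Project rows of X onto the top principal components.

    Returns (scores, components, explained_variance_ratio)."""
    X = np.asarray(X, dtype=float)
    Xc = X - X.mean(axis=0)                     # centre each feature
    _, s, vt = np.linalg.svd(Xc, full_matrices=False)
    components = vt[:n_components]              # rows are principal axes
    var_ratio = s ** 2 / np.sum(s ** 2)         # singular values come sorted descending
    return Xc @ components.T, components, var_ratio[:n_components]

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.normal(size=(200, 174))            # placeholder for 174-dim encodings
    Z, comps, ratio = pca_fit_transform(X, n_components=20)
    print(Z.shape, round(float(ratio.sum()), 3))
```

PCA is linear, so the projection learned on training data can be reused on test data via `(X_test - train_mean) @ components.T`. LLE has no such closed-form mapping for new points.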
Affiliation(s)
- Xiaoman Zhao
  - Institute of Intelligent Machinery, Hefei Institutes of Physical Science, Chinese Academy of Sciences, Hefei 230031, China
  - University of Science and Technology of China, Hefei 230026, China
- Xue Wang
  - Institute of Intelligent Machinery, Hefei Institutes of Physical Science, Chinese Academy of Sciences, Hefei 230031, China
- Zhou Jin
  - Institute of Intelligent Machinery, Hefei Institutes of Physical Science, Chinese Academy of Sciences, Hefei 230031, China
- Rujing Wang
  - Institute of Intelligent Machinery, Hefei Institutes of Physical Science, Chinese Academy of Sciences, Hefei 230031, China
  - University of Science and Technology of China, Hefei 230026, China
12
Zhang H, Saravanan KM, Zhang JZH. DeepBindGCN: Integrating Molecular Vector Representation with Graph Convolutional Neural Networks for Protein-Ligand Interaction Prediction. Molecules 2023; 28:4691. [PMID: 37375246 PMCID: PMC10301867 DOI: 10.3390/molecules28124691] [Received: 04/27/2023] [Revised: 06/08/2023] [Accepted: 06/09/2023] [Indexed: 06/29/2023]
Abstract
The core task of large-scale virtual drug screening is to select high-affinity binders accurately and efficiently from large small-molecule libraries in which non-binders usually dominate. Binding affinity is significantly influenced by the protein pocket, ligand spatial information, and residue and atom types. Here, we used pocket residues or ligand atoms as nodes and constructed edges from neighborhood information to represent the protein pocket or the ligand comprehensively. Moreover, the model with pre-trained molecular vectors performed better than the one with one-hot representations. The main advantage of DeepBindGCN is that it does not depend on a docking conformation, yet it concisely retains spatial information and physicochemical features. Using TIPE3 and the PD-L1 dimer as proof-of-concept examples, we proposed a screening pipeline that integrates DeepBindGCN with other methods to identify compounds with strong binding affinity. This is the first time a model that does not depend on the complex structure has achieved a root mean square error (RMSE) of 1.4190 and a Pearson r of 0.7584 on the PDBbind v.2016 core set. This performance is comparable to that of state-of-the-art affinity prediction models that rely on the 3D complex. DeepBindGCN provides a powerful tool for predicting protein-ligand interactions and can be used in many important large-scale virtual screening applications.
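The graph construction described above uses pocket residues or ligand atoms as nodes and links spatial neighbours with edges. The sketch below illustrates it with NumPy. The 8 Å neighbour cutoff and the one-hot node features are assumptions. The propagation rule is the standard symmetric-normalized GCN layer of Kipf and Welling, chosen for illustration. DeepBindGCN's actual layers, node features (such as its pre-trained vectors), and cutoffs may differ.

```python
import numpy as np

def build_graph(coords, cutoff=8.0):
    """Adjacency matrix linking nodes (e.g. pocket residues) closer than cutoff."""
    x = np.asarray(coords, dtype=float)
    dist = np.linalg.norm(x[:, None, :] - x[None, :, :], axis=-1)
    adj = (dist < cutoff).astype(float)
    np.fill_diagonal(adj, 0.0)                  # self-loops are added in gcn_layer
    return adj

def gcn_layer(adj, features, weight):
    """One GCN step: ReLU(D^-1/2 (A + I) D^-1/2 H W)."""
    a_hat = adj + np.eye(adj.shape[0])
    d_inv_sqrt = 1.0 / np.sqrt(a_hat.sum(axis=1))
    a_norm = a_hat * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]
    return np.maximum(a_norm @ features @ weight, 0.0)

if __name__ == "__main__":
    coords = [[0.0, 0.0, 0.0], [5.0, 0.0, 0.0], [20.0, 0.0, 0.0]]
    h = np.eye(3)                               # toy one-hot node features
    w = np.random.default_rng(0).normal(size=(3, 4))
    adj = build_graph(coords)
    print(adj)
    print(gcn_layer(adj, h, w).shape)
```

Stacking a few such layers and mean-pooling the node embeddings yields one vector per pocket or ligand. Two such vectors could then be combined by a small regressor to predict affinity, which is how a model can avoid needing a docked complex.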
Affiliation(s)
- Haiping Zhang
  - Shenzhen Institute of Synthetic Biology, Faculty of Synthetic Biology, Shenzhen Institute of Advanced Technology, Chinese Academy of Sciences, Shenzhen 518055, China
- Konda Mani Saravanan
  - Department of Biotechnology, Bharath Institute of Higher Education and Research, Chennai 600073, Tamil Nadu, India
- John Z. H. Zhang
  - Shenzhen Institute of Synthetic Biology, Faculty of Synthetic Biology, Shenzhen Institute of Advanced Technology, Chinese Academy of Sciences, Shenzhen 518055, China
  - School of Chemistry and Molecular Engineering, East China Normal University, Shanghai 200062, China
  - NYU-ECNU Center for Computational Chemistry at NYU Shanghai, Shanghai 200062, China
13
Masters MR, Mahmoud AH, Wei Y, Lill MA. Deep Learning Model for Efficient Protein-Ligand Docking with Implicit Side-Chain Flexibility. J Chem Inf Model 2023; 63:1695-1707. [PMID: 36916514 DOI: 10.1021/acs.jcim.2c01436] [Indexed: 03/16/2023]
Abstract
Protein-ligand docking is an essential tool in structure-based drug design with applications ranging from virtual high-throughput screening to pose prediction for lead optimization. Most docking programs for pose prediction are optimized for redocking to an existing cocrystallized protein structure, ignoring protein flexibility. In real-world drug design applications, however, protein flexibility is an essential feature of the ligand-binding process. Flexible protein-ligand docking still remains a significant challenge to computational drug design. To target this challenge, we present a deep learning (DL) model for flexible protein-ligand docking based on the prediction of an intermolecular Euclidean distance matrix (EDM), making the typical use of iterative search algorithms obsolete. The model was trained on a large-scale data set of protein-ligand complexes and evaluated on independent test sets. Our model generates high quality poses for a diverse set of protein and ligand structures and outperforms comparable docking methods.
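The model's output is an intermolecular Euclidean distance matrix (EDM). For readers unfamiliar with the representation, a minimal sketch of how such a matrix is computed from known coordinates (for example, to build training targets) is given below. It illustrates the data structure only; the function name is hypothetical, and the step of recovering a ligand pose from a predicted matrix, which the paper addresses, is not shown.

```python
import math
from typing import List, Sequence

Coord = Sequence[float]


def intermolecular_edm(ligand: Sequence[Coord], protein: Sequence[Coord]) -> List[List[float]]:
    """Return D with D[i][j] = Euclidean distance between ligand atom i and protein atom j.

    Rows index ligand atoms and columns index protein atoms, so the matrix is
    rectangular (n_ligand x n_protein) rather than square.
    """
    return [[math.dist(l, p) for p in protein] for l in ligand]


ligand_xyz = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
protein_xyz = [(0.0, 3.0, 4.0), (1.0, 0.0, 0.0), (4.0, 0.0, 0.0)]
D = intermolecular_edm(ligand_xyz, protein_xyz)
print(len(D), len(D[0]))  # 2 3
print(D[0][0])            # 5.0
```

Only ligand-protein distances are included; intramolecular ligand distances are fixed by the ligand's covalent geometry, whereas the intermolecular block is what encodes the pose.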
Affiliation(s)
- Matthew R Masters
- Department of Pharmaceutical Sciences, University of Basel, Klingelbergstrasse 50, 4056 Basel, Switzerland
- Amr H Mahmoud
- Department of Pharmaceutical Sciences, University of Basel, Klingelbergstrasse 50, 4056 Basel, Switzerland
- Yao Wei
- Department of Pharmaceutical Sciences, University of Basel, Klingelbergstrasse 50, 4056 Basel, Switzerland
- Markus A Lill
- Department of Pharmaceutical Sciences, University of Basel, Klingelbergstrasse 50, 4056 Basel, Switzerland
14
Murugesan A, Nguyen P, Ramesh T, Yli-Harja O, Kandhavelu M, Saravanan KM. Molecular modeling and dynamics studies of the synthetic small molecule agonists with GPR17 and P2Y1 receptor. J Biomol Struct Dyn 2022; 40:12908-12916. [PMID: 34542380 DOI: 10.1080/07391102.2021.1977707] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Indexed: 12/27/2022]
Abstract
The human G protein-coupled receptor 17 (hGPR17), an orphan receptor activated by uracil nucleotides and cysteinyl leukotrienes, is considered a crucial target for neurodegenerative diseases. However, the detailed molecular interactions of potential synthetic GPR17 ligands remain to be characterized. Here, we performed a comparative analysis of the interaction specificity of GPR17 ligands with hGPR17 and the human purinergic G protein-coupled receptor P2Y1 (hP2Y1). Previously, we simulated the interaction stability of synthetic ligands such as T0510.3657, AC1MLNKK, and MDL29951 with the hGPR17 and hP2Y1 receptors in a lipid environment. In the present work, we comparatively studied the protein-ligand interactions of hGPR17-T0510.3657 and P2Y1-MRS2500. Sequence analysis and structural superimposition of the hGPR17 and hP2Y1 receptors revealed similarities in structural arrangement, with a local backbone root mean square deviation (RMSD) of 1.16 Å and a global backbone RMSD of 5.30 Å. The comparative receptor-ligand interaction analysis revealed that hGPR17 and hP2Y1 have distinct binding sites in terms of geometrical properties. Further, molecular docking of T0510.3657 with the hP2Y1 receptor showed non-specific interaction. Experimental validation also revealed that Gi-coupled activation of GPR17 by specific ligands leads to adenylyl cyclase inhibition, whereas hP2Y1 activation causes no inhibition. Overall, these findings suggest that T0510.3657-GPR17 binding specificity could be further explored for the treatment of numerous neuronal diseases. Communicated by Ramaswamy H. Sarma.
Affiliation(s)
- Akshaya Murugesan
- Molecular Signaling Lab, Faculty of Medicine and Health Technology, Tampere University, Tampere, Finland
- Department of Biotechnology, Lady Doak College, Thallakulam, Madurai, India
- Phung Nguyen
- Molecular Signaling Lab, Faculty of Medicine and Health Technology, Tampere University, Tampere, Finland
- Thiyagarajan Ramesh
- Department of Basic Medical Sciences, College of Medicine, Prince Sattam Bin Abdulaziz University, Al Kharj, Kingdom of Saudi Arabia
- Olli Yli-Harja
- Computational Systems Biology Group, Faculty of Medicine and Health Technology, Tampere University, Tampere, Finland
- Institute for Systems Biology, Seattle, WA, USA
- Konda Mani Saravanan
- Scigen Research and Innovation Pvt Ltd, Periyar Technology Business Incubator, Thanjavur, Tamil Nadu, India
15
Limbu S, Dakshanamurthy S. A New Hybrid Neural Network Deep Learning Method for Protein-Ligand Binding Affinity Prediction and De Novo Drug Design. Int J Mol Sci 2022; 23:ijms232213912. [PMID: 36430386 PMCID: PMC9693376 DOI: 10.3390/ijms232213912] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/14/2022] [Revised: 10/25/2022] [Accepted: 11/09/2022] [Indexed: 11/16/2022]
Abstract
Accurately predicting ligand binding affinity in a virtual screening campaign is still challenging. Here, we developed hybrid neural network (HNN) deep learning methods, HNN-denovo and HNN-affinity, by combining a 3D-CNN (convolutional neural network) and an FFNN (feed-forward neural network) in a hybrid framework. HNN-denovo uses the protein pocket structure and protein-ligand interactions as input features, whereas HNN-affinity uses protein sequences and ligand features. The HNN method combines the CNN and feed-forward architectures for the protein structure or protein sequence and the ligand descriptors. To train the models, thousands of known protein-ligand binding affinity data points were retrieved from the PDBbind database. We also developed Random Forest (RF), Gradient Boosting (GB), Decision Tree with AdaBoost (DT), and consensus models, and compared the HNN results with those of the RF, GB, and DT models. We also independently compared the HNN results with the deep learning protein-ligand binding affinity predictions reported in the literature for DLSCORE, KDEEP, and DeepAtom. The predictive performance of the HNN methods (maximum Pearson's R of 0.86) was consistently better than or comparable to that of the DLSCORE, KDEEP, and DeepAtom deep learning methods on both balanced and unbalanced data sets. HNN-affinity can be applied to protein-ligand affinity prediction even in the absence of protein structure information, because it uses the protein sequence as a standalone feature in addition to the ligand descriptors. HNN-denovo can be efficiently applied in structure-based de novo drug design campaigns. HNN-affinity can be used as a standalone method or in conjunction with deep learning molecular docking protocols.
Further, it can be combined with conventional molecular docking methods in a multistep approach to rapidly screen billions of diverse compounds. The HNN methods are highly scalable on cloud ML platforms.
16
Korlepara DB, Vasavi CS, Jeurkar S, Pal PK, Roy S, Mehta S, Sharma S, Kumar V, Muvva C, Sridharan B, Garg A, Modee R, Bhati AP, Nayar D, Priyakumar UD. PLAS-5k: Dataset of Protein-Ligand Affinities from Molecular Dynamics for Machine Learning Applications. Sci Data 2022; 9:548. [PMID: 36071074 PMCID: PMC9451116 DOI: 10.1038/s41597-022-01631-9] [Citation(s) in RCA: 4] [Impact Index Per Article: 2.0] [Received: 02/22/2022] [Accepted: 08/15/2022] [Indexed: 11/08/2022]
Abstract
Computational methods, and more recently modern machine learning methods, have played a key role in structure-based drug design. Although several benchmarking datasets are available for machine learning applications in virtual screening, accurate prediction of binding affinity for a protein-ligand complex remains a major challenge. New datasets that enable models to predict binding affinities better than state-of-the-art scoring functions are important. For the first time, we have developed a dataset, PLAS-5k, comprising 5000 protein-ligand complexes chosen from the PDB. The dataset consists of binding affinities along with energy components, such as electrostatic, van der Waals, and polar and non-polar solvation energies, calculated from molecular dynamics simulations using the MMPBSA (Molecular Mechanics Poisson-Boltzmann Surface Area) method. The calculated binding affinities outperformed docking scores and showed a good correlation with the available experimental values. The availability of energy components may enable optimization of desired components during machine learning-based drug design. Further, the OnionNet model has been retrained on the PLAS-5k dataset and is provided as a baseline for the prediction of binding affinities.
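The energy components listed in this abstract are the terms of the standard MM-PBSA decomposition of the binding free energy, in which a molecular mechanics part (electrostatic plus van der Waals) is combined with a solvation part (a Poisson-Boltzmann polar term plus a surface-area-based non-polar term). The general form is shown below for orientation; the abstract does not say whether PLAS-5k includes the entropic term $-T\Delta S$, so that detail is left open here and should be checked against the dataset's own documentation.

```latex
\Delta G_{\text{bind}}
  = \underbrace{\Delta E_{\text{elec}} + \Delta E_{\text{vdW}}}_{\Delta E_{\text{MM}}}
  + \underbrace{\Delta G_{\text{polar}} + \Delta G_{\text{non-polar}}}_{\Delta G_{\text{solv}}}
  - T\Delta S
```

Each $\Delta$ denotes the complex value minus the sum of the free protein and free ligand values, typically averaged over molecular dynamics snapshots.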
Affiliation(s)
- Divya B Korlepara
- Centre for Computational Natural Sciences and Bioinformatics, International Institute of Information Technology, Hyderabad, 500032, India
- C S Vasavi
- Centre for Computational Natural Sciences and Bioinformatics, International Institute of Information Technology, Hyderabad, 500032, India
- Shruti Jeurkar
- Centre for Computational Natural Sciences and Bioinformatics, International Institute of Information Technology, Hyderabad, 500032, India
- Pradeep Kumar Pal
- Centre for Computational Natural Sciences and Bioinformatics, International Institute of Information Technology, Hyderabad, 500032, India
- Subhajit Roy
- Centre for Computational Natural Sciences and Bioinformatics, International Institute of Information Technology, Hyderabad, 500032, India
- UM-DAE-Centre For Excellence In Basic Sciences, University of Mumbai, Vidyanagari, Mumbai, India
- Sarvesh Mehta
- Centre for Computational Natural Sciences and Bioinformatics, International Institute of Information Technology, Hyderabad, 500032, India
- Shubham Sharma
- Centre for Computational Natural Sciences and Bioinformatics, International Institute of Information Technology, Hyderabad, 500032, India
- Vishal Kumar
- Centre for Computational Natural Sciences and Bioinformatics, International Institute of Information Technology, Hyderabad, 500032, India
- Charuvaka Muvva
- Centre for Computational Natural Sciences and Bioinformatics, International Institute of Information Technology, Hyderabad, 500032, India
- Bhuvanesh Sridharan
- Centre for Computational Natural Sciences and Bioinformatics, International Institute of Information Technology, Hyderabad, 500032, India
- Akshit Garg
- Centre for Computational Natural Sciences and Bioinformatics, International Institute of Information Technology, Hyderabad, 500032, India
- Rohit Modee
- Centre for Computational Natural Sciences and Bioinformatics, International Institute of Information Technology, Hyderabad, 500032, India
- Agastya P Bhati
- Centre for Computational Science, Department of Chemistry, University College London, London, WC1H 0AJ, United Kingdom
- Divya Nayar
- Department of Materials Science and Engineering, Indian Institute of Technology Delhi, Hauz Khas, New Delhi, 110016, India
- U Deva Priyakumar
- Centre for Computational Natural Sciences and Bioinformatics, International Institute of Information Technology, Hyderabad, 500032, India
17
Avery C, Patterson J, Grear T, Frater T, Jacobs DJ. Protein Function Analysis through Machine Learning. Biomolecules 2022; 12:1246. [PMID: 36139085 PMCID: PMC9496392 DOI: 10.3390/biom12091246] [Citation(s) in RCA: 9] [Impact Index Per Article: 4.5] [Received: 07/16/2022] [Revised: 08/22/2022] [Accepted: 08/31/2022] [Indexed: 11/16/2022]
Abstract
Machine learning (ML) has for decades been an important part of the computational biology arsenal for elucidating protein function. With the recent burgeoning of novel ML methods and applications, new ML approaches have been incorporated into many areas of computational biology dealing with protein function. We examine how ML has been integrated into a wide range of computational models to improve prediction accuracy and gain a better understanding of protein function. The applications discussed are protein structure prediction, protein engineering using sequence modifications to achieve stability and druggability characteristics, molecular docking in terms of protein-ligand binding (including allosteric effects), protein-protein interactions, and protein-centric drug discovery. To quantify the mechanisms underlying protein function, a holistic approach that takes structure, flexibility, stability, and dynamics into account is required, as these aspects become inseparable through their interdependence. Another key component of protein function is conformational dynamics, which often manifests as protein kinetics. Computational methods that use ML to generate representative conformational ensembles and to quantify functionally important differences between conformational ensembles are included in this review. Future opportunities are highlighted for each of these topics.
Affiliation(s)
- Chris Avery
- Department of Bioinformatics and Genomics, University of North Carolina at Charlotte, Charlotte, NC 28223, USA
- John Patterson
- Department of Bioinformatics and Genomics, University of North Carolina at Charlotte, Charlotte, NC 28223, USA
- Tyler Grear
- Department of Bioinformatics and Genomics, University of North Carolina at Charlotte, Charlotte, NC 28223, USA
- Department of Physics and Optical Science, University of North Carolina at Charlotte, Charlotte, NC 28223, USA
- Theodore Frater
- Department of Bioinformatics and Genomics, University of North Carolina at Charlotte, Charlotte, NC 28223, USA
- Donald J. Jacobs
- Department of Physics and Optical Science, University of North Carolina at Charlotte, Charlotte, NC 28223, USA
18
Zhang H, Zhang T, Saravanan KM, Liao L, Wu H, Zhang H, Zhang H, Pan Y, Wu X, Wei Y. DeepBindBC: a practical deep learning method for identifying native-like protein-ligand complexes in virtual screening. Methods 2022; 205:247-262. [PMID: 35878751 DOI: 10.1016/j.ymeth.2022.07.009] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 02/05/2022] [Revised: 06/29/2022] [Accepted: 07/12/2022] [Indexed: 12/18/2022]
Abstract
Identifying native-like protein-ligand complexes (PLCs) among an abundance of docking decoys is critical for large-scale virtual drug screening in early-stage lead discovery. Providing reliable predictions is still a challenge for most current affinity prediction models because of a lack of non-binding data during training, the loss of critical physicochemical features, and difficulty learning abstract information with a limited number of neural layers. In this work, we proposed a deep learning model, DeepBindBC, for classifying putative ligands as binding or non-binding. Our model incorporates information on non-binding interactions, making it more suitable for real applications. The ResNet architecture and a more detailed atom-type representation allow implicit features to be learned more accurately. Here, we show that DeepBindBC outperforms AutoDock Vina, Pafnucy, and DLSCORE on three DUD.E test sets. Moreover, DeepBindBC identified a novel human pancreatic α-amylase binder validated by a fluorescence spectral experiment (Ka = 1.0 × 10⁵ M⁻¹). Furthermore, DeepBindBC can be used as a core component of a hybrid virtual screening pipeline that incorporates many other complementary methods, such as DFCNN, AutoDock Vina docking, and pocket molecular dynamics simulation. Additionally, an online web server based on the model is available at http://cbblab.siat.ac.cn/DeepBindBC/index.php for the user's convenience. Our model and the web server provide alternative tools for the early steps of drug discovery by accurately identifying native-like PLCs.
Affiliation(s)
- Haiping Zhang
- Shenzhen Institute of Synthetic Biology, Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences, Shenzhen, Guangdong, PR China
- Center for High Performance Computing, Joint Engineering Research Center for Health Big Data Intelligent Analysis Technology, Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences, Shenzhen, Guangdong 518 055, PR China
- Tingting Zhang
- School of Medicine, Shenzhen University, Shenzhen, Guangdong Province 518060, PR China
- Konda Mani Saravanan
- Department of Biotechnology, Bharath Institute of Higher Education and Research, Chennai 600073, Tamil Nadu, India
- Linbu Liao
- College of Software Technology, Zhejiang University, Zhejiang Province 315048, PR China
- Hao Wu
- Center for High Performance Computing, Joint Engineering Research Center for Health Big Data Intelligent Analysis Technology, Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences, Shenzhen, Guangdong 518 055, PR China
- Haishan Zhang
- Center for High Performance Computing, Joint Engineering Research Center for Health Big Data Intelligent Analysis Technology, Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences, Shenzhen, Guangdong 518 055, PR China
- Huiling Zhang
- Center for High Performance Computing, Joint Engineering Research Center for Health Big Data Intelligent Analysis Technology, Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences, Shenzhen, Guangdong 518 055, PR China
- Yi Pan
- Center for High Performance Computing, Joint Engineering Research Center for Health Big Data Intelligent Analysis Technology, Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences, Shenzhen, Guangdong 518 055, PR China
- Xuli Wu
- School of Medicine, Shenzhen University, Shenzhen, Guangdong Province 518060, PR China
- Yanjie Wei
- Shenzhen Institute of Synthetic Biology, Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences, Shenzhen, Guangdong, PR China
19
Zhang H, Gong X, Peng Y, Saravanan KM, Bian H, Zhang JZH, Wei Y, Pan Y, Yang Y. An Efficient Modern Strategy to Screen Drug Candidates Targeting RdRp of SARS-CoV-2 With Potentially High Selectivity and Specificity. Front Chem 2022; 10:933102. [PMID: 35903186 PMCID: PMC9315156 DOI: 10.3389/fchem.2022.933102] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/05/2022] [Accepted: 06/06/2022] [Indexed: 01/18/2023]
Abstract
Desired drug candidates should have both a high probability of binding and high specificity. Recently, many drug screening strategies have been developed to find compounds with a high probability of binding or high binding affinity. However, there is still no good way to determine whether the selected compounds are highly specific. Here, we developed a reverse DFCNN (Dense Fully Connected Neural Network) and a reverse docking protocol to check a given compound's ability to bind diverse targets and to estimate its specificity with in-house formulas. We used the RNA-dependent RNA polymerase (RdRp) target as a proof-of-concept example to identify drug candidates with high selectivity and high specificity. We first used a previously developed hybrid screening method to find drug candidates in a database of 8,888 compounds. The hybrid screening method combines deep learning, traditional molecular docking, molecular dynamics simulation, and binding free energies calculated by metadynamics, which should make it powerful for selecting high-affinity candidates. We also integrated reverse DFCNN and reverse docking against 102 diverse proteins into the pipeline to assess the specificity of the selected candidates, and finally obtained compounds with both predicted selectivity and specificity. Among the eight selected candidates, Platycodin D and Tubeimoside III were confirmed to effectively inhibit SARS-CoV-2 replication in vitro, with EC50 values of 619.5 and 265.5 nM, respectively. Our study is the first to show that Tubeimoside III potently inhibits SARS-CoV-2 replication. Furthermore, according to our screening procedure, Platycodin D and Tubeimoside III most likely inhibit SARS-CoV-2 by blocking the RdRp cavity.
In addition, careful analysis predicted common critical residues involved in binding the active inhibitors Platycodin D, Tubeimoside III, Azithromycin, and Pralatrexate, which may help promote the development of non-covalent RdRp inhibitors.
Affiliation(s)
- Haiping Zhang
- Shenzhen Institute of Synthetic Biology, Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences, Shenzhen, China
- *Correspondence: Yang Yang; Haiping Zhang
- Xiaohua Gong
- Shenzhen Key Laboratory of Pathogen and Immunity, National Clinical Research Center for Infectious Disease, State Key Discipline of Infectious Disease, Shenzhen Third People’s Hospital, Second Hospital Affiliated to Southern University of Science and Technology, Shenzhen, China
- Yun Peng
- Shenzhen Key Laboratory of Pathogen and Immunity, National Clinical Research Center for Infectious Disease, State Key Discipline of Infectious Disease, Shenzhen Third People’s Hospital, Second Hospital Affiliated to Southern University of Science and Technology, Shenzhen, China
- Konda Mani Saravanan
- Department of Biotechnology, Bharath Institute of Higher Education and Research, Chennai, India
- Hengwei Bian
- Shanghai Engineering Research Center of Molecular Therapeutics and New Drug Development, Shanghai Key Laboratory of Green Chemistry and Chemical Process, School of Chemistry and Molecular Engineering, East China Normal University, Shanghai, China
- John Z. H. Zhang
- Shenzhen Institute of Synthetic Biology, Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences, Shenzhen, China
- Yanjie Wei
- Center for High Performance Computing, Joint Engineering Research Center for Health Big Data Intelligent Analysis Technology, Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences, Shenzhen, China
- Yi Pan
- Center for High Performance Computing, Joint Engineering Research Center for Health Big Data Intelligent Analysis Technology, Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences, Shenzhen, China
- Yang Yang
- Shenzhen Key Laboratory of Pathogen and Immunity, National Clinical Research Center for Infectious Disease, State Key Discipline of Infectious Disease, Shenzhen Third People’s Hospital, Second Hospital Affiliated to Southern University of Science and Technology, Shenzhen, China
- *Correspondence: Yang Yang; Haiping Zhang
20
Zhao Q, Yang M, Cheng Z, Li Y, Wang J. Biomedical Data and Deep Learning Computational Models for Predicting Compound-Protein Relations. IEEE/ACM Trans Comput Biol Bioinform 2022; 19:2092-2110. [PMID: 33769935 DOI: 10.1109/tcbb.2021.3069040] [Citation(s) in RCA: 6] [Impact Index Per Article: 3.0] [Indexed: 06/12/2023]
Abstract
The identification of compound-protein relations (CPRs), which include compound-protein interactions (CPIs) and compound-protein affinities (CPAs), is critical to drug development. A common method for identifying compound-protein relations is in vitro screening. However, the numbers of compounds and proteins are massive, and in vitro screening experiments are labor-intensive, expensive, and time-consuming, with high failure rates. Researchers have therefore developed a computational field called virtual screening (VS) to aid experimental drug development. These methods use experimentally validated biological interaction information to generate datasets and use the physicochemical and structural properties of compounds and target proteins as input to train computational prediction models. Deep learning is now widely used in computer vision and natural language processing, where it has made epoch-making progress. Deep learning has also been widely applied in biomedicine, and deep learning-based CPR prediction has developed rapidly and achieved good results. The purpose of this study is to investigate and discuss the latest applications of deep learning techniques in CPR prediction. First, we describe the datasets and feature engineering (i.e., compound and protein representations and descriptors) commonly used in CPR prediction methods. Then, we review and classify recent deep learning approaches to CPR prediction. Next, a comprehensive comparison is performed to demonstrate the prediction performance of representative methods on classical datasets. Finally, we discuss the current state of the field, including existing challenges and our proposed future directions. We believe that this investigation will provide sufficient references and insight for researchers to understand and develop new deep learning methods that enhance CPR prediction.
21
Kaushal K, Sarma P, Rana SV, Medhi B, Naithani M. Emerging role of artificial intelligence in therapeutics for COVID-19: a systematic review. J Biomol Struct Dyn 2022; 40:4750-4765. [PMID: 33300456 PMCID: PMC7738208 DOI: 10.1080/07391102.2020.1855250] [Citation(s) in RCA: 10] [Impact Index Per Article: 5.0] [Received: 09/08/2020] [Accepted: 11/20/2020] [Indexed: 12/21/2022]
Abstract
To elucidate the role of artificial intelligence (AI) in therapeutics for coronavirus disease 2019 (COVID-19), five databases were searched (December 2019-May 2020). We included both published and preprint original articles in English that applied AI, machine learning, or deep learning to drug repurposing, novel drug discovery, or vaccine and antibody development for COVID-19. Of the 31 included studies, 16 applied AI to drug repurposing, whereas 10 used AI for novel drug discovery. Only four studies used AI technology for vaccine development, and one study generated stable antibodies against SARS-CoV-2. Approximately 50% of the studies exclusively targeted 3CLpro of SARS-CoV-2, and only two targeted ACE2/TMPRSS2 to inhibit host-virus interactions. Around 16% of the identified drugs are in different phases of clinical evaluation against COVID-19. AI has emerged as a promising solution for COVID-19 therapeutics. During the current pandemic, many researchers have used AI-based strategies to process large databases in a more customized manner, leading to faster identification of several potential targets, novel and repurposed drugs, and vaccine candidates. A number of these drugs are either approved or in late-stage clinical trials and are potentially effective against SARS-CoV-2, indicating the validity of the methodology. However, as AI-based screening is still at a budding stage, sole reliance on such algorithms is not advisable at this point, and an evidence-based approach is warranted to confirm their usefulness against this life-threatening disease. Communicated by Ramaswamy H. Sarma.
Affiliation(s)
- Karanvir Kaushal
- Department of Biochemistry, All India Institute of Medical Sciences, Rishikesh, India
- Phulan Sarma
- Department of Pharmacology, Post Graduate Institute of Medical Education and Research, Chandigarh, India
- S. V. Rana
- Department of Biochemistry, All India Institute of Medical Sciences, Rishikesh, India
- Bikash Medhi
- Department of Pharmacology, Post Graduate Institute of Medical Education and Research, Chandigarh, India
- Manisha Naithani
- Department of Biochemistry, All India Institute of Medical Sciences, Rishikesh, India
22
Wang Y, Wei Z, Xi L. Sfcnn: a novel scoring function based on 3D convolutional neural network for accurate and stable protein-ligand affinity prediction. BMC Bioinformatics 2022; 23:222. [PMID: 35676617 PMCID: PMC9178885 DOI: 10.1186/s12859-022-04762-3] [Citation(s) in RCA: 14] [Impact Index Per Article: 7.0] [Received: 10/31/2021] [Accepted: 06/01/2022] [Indexed: 01/09/2023]
Abstract
Background: Computer-aided drug design provides an effective method of identifying lead compounds. However, success rates are significantly bottlenecked by the lack of accurate and reliable scoring functions for evaluating the binding affinities of protein–ligand complexes. Therefore, many scoring functions based on machine learning or deep learning have been developed in recent years to improve prediction accuracy. In this work, we proposed a novel featurization method and used it to build a new scoring function based on a 3D convolutional neural network. Results: This work reports the results of testing four architectures and three featurization methods, and outlines the development of a novel deep 3D convolutional neural network scoring function. The model simplified feature engineering and, in combination with Grad-CAM, made the intermediate layers of the neural network more interpretable. The model was evaluated and compared with other scoring functions on multiple independent datasets. The Pearson correlation coefficients between the binding affinities predicted by our model and the experimental data were 0.7928, 0.7946, 0.6758, and 0.6474 on the CASF-2016 dataset, CASF-2013 dataset, CSAR_HiQ_NRC_set, and Astex_diverse_set, respectively. Overall, our model showed accurate and stable scoring power for predicting the binding affinity of protein–ligand complexes. Conclusions: These results indicate that our model is an excellent scoring function that accurately and stably predicts protein–ligand affinity. Our model will contribute to improving the success rate of virtual screening and thus accelerate the development of potential drugs or novel biologically active lead compounds. Supplementary Information: The online version contains supplementary material available at 10.1186/s12859-022-04762-3.
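The scoring-power figures quoted here, like the RMSE and Pearson r reported for DeepBindGCN, are standard regression metrics between predicted and experimental affinities. A minimal sketch of how they are computed follows, assuming affinities are given on a common scale such as pK; it is a generic illustration with hypothetical toy values, not the evaluation script of any cited paper.

```python
import math
from typing import Sequence


def pearson_r(pred: Sequence[float], expt: Sequence[float]) -> float:
    """Pearson correlation between predicted and experimental affinities."""
    n = len(pred)
    if n != len(expt) or n < 2:
        raise ValueError("need two equal-length sequences with at least 2 values")
    mean_p = sum(pred) / n
    mean_e = sum(expt) / n
    cov = sum((p - mean_p) * (e - mean_e) for p, e in zip(pred, expt))
    sd_p = math.sqrt(sum((p - mean_p) ** 2 for p in pred))
    sd_e = math.sqrt(sum((e - mean_e) ** 2 for e in expt))
    return cov / (sd_p * sd_e)


def rmse(pred: Sequence[float], expt: Sequence[float]) -> float:
    """Root mean square error, in the same units as the affinities (e.g. pK)."""
    if len(pred) != len(expt) or not pred:
        raise ValueError("need two non-empty sequences of equal length")
    return math.sqrt(sum((p - e) ** 2 for p, e in zip(pred, expt)) / len(pred))


predicted = [6.1, 7.4, 5.0, 8.2]      # toy predicted pK values
experimental = [6.0, 7.0, 5.5, 8.0]   # toy experimental pK values
print(round(pearson_r(predicted, experimental), 3))
print(round(rmse(predicted, experimental), 3))
```

Pearson r measures ranking and linear agreement and is insensitive to a constant offset, whereas RMSE penalizes absolute errors, which is why papers in this area usually report both. On Python 3.10 and later, `statistics.correlation` computes the same Pearson coefficient.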
Affiliation(s)
- Yu Wang
- Chongqing Key Laboratory of Big Data for Bio Intelligence, Chongqing University of Posts and Tele-Communications, No. 2 Chongwen Road, Nan'an District, Chongqing, 400065, China
- Zhengxiao Wei
- Department of Clinical Laboratory, Public Health Clinical Center of Chengdu, Chengdu, 610095, China
- Lei Xi
- Hubei Provincial Key Laboratory of Occurrence and Intervention of Rheumatic Diseases, Hubei Minzu University, Enshi, 445000, China
23
Volkov M, Turk JA, Drizard N, Martin N, Hoffmann B, Gaston-Mathé Y, Rognan D. On the Frustration to Predict Binding Affinities from Protein-Ligand Structures with Deep Neural Networks. J Med Chem 2022; 65:7946-7958. [PMID: 35608179 DOI: 10.1021/acs.jmedchem.2c00487] [Citation(s) in RCA: 48] [Impact Index Per Article: 24.0] [Indexed: 02/08/2023]
Abstract
Accurate prediction of binding affinities from protein-ligand atomic coordinates remains a major challenge in the early stages of drug discovery. Using modular message-passing graph neural networks that describe both the ligand and the protein in their free and bound states, we show unambiguously that an explicit description of protein-ligand noncovalent interactions provides no advantage over ligand or protein descriptors. Simple models that infer the binding affinities of test samples from those of the closest ligands or proteins in the training set already perform well, suggesting that memorization largely dominates true learning in these deep neural networks. The current study suggests considering only noncovalent interactions while omitting their protein and ligand atomic environments. Removing all hidden biases will probably require much denser protein-ligand training matrices and a coordinated effort by the drug design community to solve the necessary protein-ligand structures.
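The nearest-neighbour baseline described above can be sketched in a few lines of Python. The sketch assumes ligands are represented as sets of "on" fingerprint bits compared by Tanimoto similarity; that representation, the function names and the toy data are assumptions for illustration and are not taken from the paper.

```python
def tanimoto(a, b):
    """Tanimoto (Jaccard) similarity between two sets of 'on' fingerprint bits."""
    union = a | b
    if not union:
        return 0.0
    return len(a & b) / len(union)


def nearest_neighbor_affinity(query_bits, training):
    """Predict the query's affinity as that of the most similar training ligand.

    training: non-empty list of (fingerprint_bits, affinity) pairs.
    Ties are broken in favour of the first pair in the list.
    """
    if not training:
        raise ValueError("training set is empty")
    _, best_affinity = max(training, key=lambda pair: tanimoto(query_bits, pair[0]))
    return best_affinity


# Hypothetical toy training set: fingerprint bits -> pKd
train = [({1, 2, 3, 4}, 7.5), ({10, 11, 12}, 5.0)]
prediction = nearest_neighbor_affinity({1, 2, 3}, train)
```

If a lookup of this kind rivals a trained network on a given data split, that suggests the network may be memorizing similar training examples rather than learning interactions, which is the concern the abstract raises.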
Affiliation(s)
- Mikhail Volkov
- Laboratoire d'innovation thérapeutique, UMR7200 CNRS-Université de Strasbourg, 74 route du Rhin, Illkirch 67400, France
- Didier Rognan
- Laboratoire d'innovation thérapeutique, UMR7200 CNRS-Université de Strasbourg, 74 route du Rhin, Illkirch 67400, France
24
Shim H, Kim H, Allen JE, Wulff H. Pose Classification Using Three-Dimensional Atomic Structure-Based Neural Networks Applied to Ion Channel-Ligand Docking. J Chem Inf Model 2022; 62:2301-2315. [PMID: 35447030 PMCID: PMC9131459 DOI: 10.1021/acs.jcim.1c01510] [Citation(s) in RCA: 4] [Impact Index Per Article: 2.0] [Received: 12/18/2021] [Indexed: 12/11/2022]
Abstract
The identification of promising lead compounds showing pharmacological activity toward a biological target is essential in early-stage drug discovery. With the recent growth of available small-molecule databases, virtual high-throughput screening using physics-based molecular docking has become an essential tool for fast and cost-efficient lead discovery and optimization. However, the best-scored docking poses are often suboptimal, leading to incorrect screening results and chemical property calculations. We address this pose classification problem with data-driven machine learning approaches that identify correct docking poses from AutoDock Vina and Glide screens. To classify docking poses effectively, we present two convolutional neural network approaches, a three-dimensional convolutional neural network (3D-CNN) and an attention-based point cloud network (PCN), both trained on the PDBbind refined set. We demonstrate the effectiveness of the proposed classifiers on multiple evaluation data sets, including the standard PDBbind CASF-2016 benchmark data set and compound libraries for structurally different protein targets, such as an ion channel data set extracted from the Protein Data Bank (PDB) and an in-house KCa3.1 inhibitor data set. Our experiments show that excluding false-positive docking poses with the proposed classifiers improves virtual high-throughput screening for novel molecules against each target protein, compared with the initial screen based on docking scores.
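The pose-filtering step can be illustrated with a short Python sketch. A precomputed probability stands in for the output of the paper's 3D-CNN or PCN classifier; the `Pose` fields, the 0.5 threshold and the example scores are hypothetical. The sketch relies only on the convention that more negative AutoDock Vina or Glide scores indicate better predicted binding.

```python
from dataclasses import dataclass


@dataclass
class Pose:
    ligand_id: str
    docking_score: float    # e.g. a Vina score in kcal/mol; more negative = better
    classifier_prob: float  # hypothetical probability that the pose is correct


def filter_and_rank(poses, threshold=0.5):
    """Drop poses the classifier considers likely incorrect, then rank survivors by docking score."""
    kept = [p for p in poses if p.classifier_prob >= threshold]
    return sorted(kept, key=lambda p: p.docking_score)


poses = [
    Pose("A", -10.2, 0.10),  # best docking score, but flagged as a likely false positive
    Pose("B", -9.0, 0.80),
    Pose("C", -9.5, 0.60),
]
ranked = [p.ligand_id for p in filter_and_rank(poses)]
```

Without the filter, compound A would top the list on docking score alone; with it, the ranking starts from poses the classifier considers plausible.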
Affiliation(s)
- Heesung Shim
- Department of Pharmacology, University of California, Davis, California 95616, United States
- Hyojin Kim
- Center for Applied Scientific Computing, Lawrence Livermore National Laboratory, Livermore, California 94550, United States
- Jonathan E. Allen
- Global Security Computing Applications Division, Lawrence Livermore National Laboratory, Livermore, California 94550, United States
- Heike Wulff
- Department of Pharmacology, University of California, Davis, California 95616, United States
25
Feng Y, Cheng X, Wu S, Mani Saravanan K, Liu W. Hybrid drug-screening strategy identifies potential SARS-CoV-2 cell-entry inhibitors targeting human transmembrane serine protease. Struct Chem 2022; 33:1503-1515. [PMID: 35571866 PMCID: PMC9091140 DOI: 10.1007/s11224-022-01960-w] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/28/2022] [Accepted: 04/28/2022] [Indexed: 11/21/2022]
Abstract
Coronavirus disease 2019 (COVID-19), caused by severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2), has threatened public health more than any other infectious disease. Researchers around the globe use multiple approaches to identify effective approved drugs that treat the viral infection (drug repurposing). Most drug repurposing approaches target the spike protein or the main protease. Here we use transmembrane serine protease 2 (TMPRSS2), which interacts with cell-surface receptors, as a target for preventing virus entry into the cell. Hypothesizing that TMPRSS2 binders may help prevent virus entry, we performed a systematic drug screening of the current approved-drug database. Furthermore, we screened the Enamine REAL fragment dataset against TMPRSS2 and present nine potential drug-like compounds that give clues about which kinds of groups the pocket prefers to bind, aiding future structure-based drug design for COVID-19. We also employed molecular dynamics simulations, binding free energy calculations, and well-tempered metadynamics to validate the resulting candidate drug and fragment lists. Our results suggest three FDA-approved drugs as potential agents against human TMPRSS2. These findings may pave the way for testing more drugs against TMPRSS2, and testing the efficacy of these drugs in biochemical experiments will help improve COVID-19 treatment. Supplementary information The online version contains supplementary material available at 10.1007/s11224-022-01960-w.
Affiliation(s)
- Yufei Feng
- Life Science and Technology School, Lingnan Normal University, Zhanjiang, 524048 Guangdong Province China
- Xiaoning Cheng
- Central People’s Hospital of Zhanjiang, Zhanjiang, 524045 Guangdong Province China
- Shuilong Wu
- Central People’s Hospital of Zhanjiang, Zhanjiang, 524045 Guangdong Province China
- Konda Mani Saravanan
- Department of Biotechnology, Bharath Institute of Higher Education and Research, Chennai, Tamil Nadu 600073 India
- Wenxin Liu
- Central People’s Hospital of Zhanjiang, Zhanjiang, 524045 Guangdong Province China
26
Ray A. Machine learning in postgenomic biology and personalized medicine. Wiley Interdiscip Rev Data Min Knowl Discov 2022; 12:e1451. [PMID: 35966173 PMCID: PMC9371441 DOI: 10.1002/widm.1451] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/23/2020] [Accepted: 12/22/2021] [Indexed: 06/15/2023]
Abstract
In recent years, artificial intelligence in the form of machine learning has been revolutionizing biology, the biomedical sciences, and gene-based agricultural technology. On the one hand, the massive data generated in the biological sciences by rapid, deep gene sequencing and by protein and other molecular structure determination require machine learning-based analysis capabilities that are distinctly different from classical statistical methods; on the other, these large datasets are enabling the adoption of novel data-intensive machine learning algorithms to solve biological problems that until recently relied on computationally expensive mechanistic model-based approaches. This review provides a bird's-eye view of the applications of machine learning in post-genomic biology. An attempt is also made to indicate, as far as possible, the areas of research poised to make further impact, including the importance of explainable artificial intelligence (XAI) in human health. Further contributions of machine learning are expected to transform medicine, public health, and agricultural technology, as well as to provide invaluable gene-based guidance for managing complex environments in this age of global warming.
Affiliation(s)
- Animesh Ray
- Riggs School of Applied Life Sciences, Keck Graduate Institute, 535 Watson Drive, Claremont, CA 91711, USA
- Division of Biology and Biological Engineering, California Institute of Technology, Pasadena, California, USA
27
Soil Moisture Content Estimation Based on Sentinel-1 SAR Imagery Using an Artificial Neural Network and Hydrological Components. Remote Sens 2022. [DOI: 10.3390/rs14030465] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Indexed: 01/13/2023]
Abstract
This study estimates soil moisture content (SMC) using Sentinel-1A/B C-band synthetic aperture radar (SAR) images and an artificial neural network (ANN) over a 40 × 50 km² area in the Geum River basin in South Korea. The ANN inputs were hydrological components, characterized by the antecedent precipitation index (API) and dry days, together with SAR data (cross-polarization (VH) and copolarization (VV) backscattering coefficients and local incidence angle), topographic data (elevation and slope), and soil data (percentage of clay and sand). A simple logarithmic transformation was useful in establishing a linear relationship between the observed SMC and the API. Because the API did not decrease below 0 during dry periods without rainfall, the number of dry days was used to express the decrease in SMC. The optimal ANN architecture was determined in terms of the number of hidden layers, the number of hidden neurons, and the activation function. Comparing the estimated SMC with the observed SMC gave a Pearson correlation coefficient (R) of 0.85 and a root mean square error (RMSE) of 4.59%.
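The two hydrological inputs can be sketched in Python. The recursive form API_t = k * API_(t-1) + P_t is a common definition of the antecedent precipitation index, but the decay constant k = 0.9, the log(1 + API) transform and the rainfall series are assumptions for illustration; the abstract does not give the paper's exact formulation.

```python
import math


def antecedent_precipitation_index(daily_rain_mm, k=0.9, api0=0.0):
    """Daily API series using the recursion API_t = k * API_(t-1) + P_t (k is an assumed decay constant)."""
    api, series = api0, []
    for p in daily_rain_mm:
        api = k * api + p
        series.append(api)
    return series


def dry_days(daily_rain_mm, wet_threshold_mm=0.0):
    """Number of consecutive days without rain (rain <= threshold) up to and including each day."""
    count, series = 0, []
    for p in daily_rain_mm:
        count = count + 1 if p <= wet_threshold_mm else 0
        series.append(count)
    return series


rain = [12.0, 0.0, 0.0, 5.0, 0.0]  # hypothetical daily rainfall in mm
api = antecedent_precipitation_index(rain)
days = dry_days(rain)
# Assumed log(1 + API) transform, which stays defined when API is 0
log_api = [math.log1p(a) for a in api]
```

Under this recursion the API only decays toward zero between rain events, so a separate dry-day count gives the network a direct signal for continued drying.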
28
Yan C, Feng X, Li G. From Drug Molecules to Thermoset Shape Memory Polymers: A Machine Learning Approach. ACS Appl Mater Interfaces 2021; 13:60508-60521. [PMID: 34878247 DOI: 10.1021/acsami.1c20947] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Indexed: 06/13/2023]
Abstract
Ultraviolet (UV)-curable thermoset shape memory polymers (TSMPs) with high recovery stress but a mild glass transition temperature (Tg) are highly desired for 3D/4D printing of lightweight load-bearing structures and devices. However, a bottleneck is that high recovery stress usually means high Tg. For the few TSMPs with high recovery stress, Tg values are close to the decomposition temperature, so the shape memory effect cannot be triggered safely and effectively. While machine learning (ML) has served as a useful tool for discovering new materials and drugs, the grand challenge of using ML to discover new TSMPs lies in the very limited data available. Here, we report an enhanced ML approach that combines a transfer learning variational autoencoder with a weighted-vector combination method. By learning from a large data set of drug molecules in a pretraining process, we were able to map TSMPs effectively to a hidden space that is much closer to a Gaussian distribution. Through this approach, we created a large compositional space and discovered five new types of UV-curable TSMPs with the desired properties, one of which was validated experimentally. Our contributions include (1) representing the features of TSMPs by drug molecules to overcome the barrier of a limited training data set and (2) developing an ML framework that overcomes the barrier of mapping molar ratio information. This approach is shown to learn TSMP features effectively by exploiting the relatedness between the data-scarce (and biased) TSMP target and the data-abundant drug source, and the results are much more accurate and robust than the benchmark set by the support vector machine method using direct label encoding and Morgan encoding. We therefore believe this framework is a state-of-the-art study in the TSMP field. This study opens new opportunities for discovering not only new TSMPs but also other thermoset polymers.
Affiliation(s)
- Cheng Yan
- Department of Mechanical & Industrial Engineering, Louisiana State University, Baton Rouge, Louisiana 70803, United States
- Xiaming Feng
- Department of Mechanical & Industrial Engineering, Louisiana State University, Baton Rouge, Louisiana 70803, United States
- Guoqiang Li
- Department of Mechanical & Industrial Engineering, Louisiana State University, Baton Rouge, Louisiana 70803, United States
29
Born J, Huynh T, Stroobants A, Cornell WD, Manica M. Active Site Sequence Representations of Human Kinases Outperform Full Sequence Representations for Affinity Prediction and Inhibitor Generation: 3D Effects in a 1D Model. J Chem Inf Model 2021; 62:240-257. [PMID: 34905358 DOI: 10.1021/acs.jcim.1c00889] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Indexed: 12/11/2022]
Abstract
Recent advances in deep learning have enabled the development of large-scale multimodal models for virtual screening and de novo molecular design. The human kinome with its abundant sequence and inhibitor data presents an attractive opportunity to develop proteochemometric models that exploit the size and internal diversity of this family of targets. Here, we challenge a standard practice in sequence-based affinity prediction models: instead of leveraging the full primary structure of proteins, each target is represented by a sequence of 29 discontiguous residues defining the ATP binding site. In kinase-ligand binding affinity prediction, our results show that the reduced active site sequence representation is not only computationally more efficient but consistently yields significantly higher performance than the full primary structure. This trend persists across different models, data sets, and performance metrics and holds true when predicting pIC50 for both unseen ligands and kinases. Our interpretability analysis reveals a potential explanation for the superiority of the active site models: whereas only mild statistical effects about the extraction of three-dimensional (3D) interaction sites take place in the full sequence models, the active site models are equipped with an implicit but strong inductive bias about the 3D structure stemming from the discontiguity of the active sites. Moreover, in direct comparisons, our models perform similarly or better than previous state-of-the-art approaches in affinity prediction. We then investigate a de novo molecular design task and find that the active site provides benefits in the computational efficiency, but otherwise, both kinase representations yield similar optimized affinities (for both SMILES- and SELFIES-based molecular generators). Our work challenges the assumption that the full primary structure is indispensable for modeling human kinases.
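Building the reduced representation amounts to picking residues at fixed positions and concatenating them. The sketch below assumes the active-site positions are supplied as 1-based indices into each kinase's full sequence; the actual 29 positions come from an alignment that the abstract does not list, so the sequence and indices here are hypothetical.

```python
def active_site_sequence(full_sequence, site_positions):
    """Concatenate residues at the given 1-based positions into a short, discontiguous sequence."""
    n = len(full_sequence)
    for pos in site_positions:
        if not 1 <= pos <= n:
            raise ValueError(f"position {pos} is outside a sequence of length {n}")
    return "".join(full_sequence[pos - 1] for pos in site_positions)


# Hypothetical 10-residue sequence and 4 "active-site" positions (the paper uses 29).
reduced = active_site_sequence("ACDEFGHIKL", [1, 3, 5, 10])
```

The resulting string is much shorter than the full sequence, which is where the computational savings come from; the residues it keeps are far apart in sequence but, by construction of a binding site, close together in the folded protein.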
Affiliation(s)
- Jannis Born
- IBM Research Europe, 8804 Rüschlikon, Switzerland
- Department of Biosystems Science and Engineering, ETH Zurich, 4058 Basel, Switzerland
- Tien Huynh
- IBM Research, Yorktown Heights, New York 10598, United States
- Astrid Stroobants
- Department of Chemistry, Imperial College London, SW7 2AZ London, United Kingdom
- Wendy D Cornell
- IBM Research, Yorktown Heights, New York 10598, United States
30
Zhang H, Li J, Saravanan KM, Wu H, Wang Z, Wu D, Wei Y, Lu Z, Chen YH, Wan X, Pan Y. An Integrated Deep Learning and Molecular Dynamics Simulation-Based Screening Pipeline Identifies Inhibitors of a New Cancer Drug Target TIPE2. Front Pharmacol 2021; 12:772296. [PMID: 34887765 PMCID: PMC8650684 DOI: 10.3389/fphar.2021.772296] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 09/07/2021] [Accepted: 11/02/2021] [Indexed: 12/31/2022]
Abstract
The TIPE2 (tumor necrosis factor-alpha-induced protein 8-like 2) protein is a major regulator of cancer and inflammatory diseases. The availability of its sequence and structure, as well as knowledge of the critical amino acids involved in its ligand binding, provides insights into its function and greatly helps identify novel drug candidates against TIPE2. With current advances in deep learning and molecular dynamics simulation-based drug screening, large-scale exploration of inhibitory candidates for TIPE2 has become possible. In this work, we applied deep learning-based methods to perform a preliminary screening against TIPE2 over several commercially available compound datasets. We then carried out a fine screening with molecular dynamics simulations, followed by metadynamics simulations. Finally, four compounds were selected for experimental validation from the 64 candidates obtained in the screening. With surprising accuracy, three of the four compounds bound to TIPE2. Among them, UM-164 exhibited the strongest binding affinity (4.97 µM) and was able to interfere with the binding of TIPE2 to PIP2 in competitive bio-layer interferometry (BLI), indicating that UM-164 is a potential inhibitor of TIPE2 function. The work demonstrates the feasibility of incorporating deep learning and MD simulation into virtual drug screening and provides high-potential inhibitors against TIPE2 for drug development.
Affiliation(s)
- Haiping Zhang
- Center for High Performance Computing, Joint Engineering Research Center for Health Big Data Intelligent Analysis Technology, Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences, Shenzhen, China
- Junxin Li
- Shenzhen Laboratory of Human Antibody Engineering, Institute of Biomedicine and Biotechnology, Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences, University City of Shenzhen, Shenzhen, China
- Konda Mani Saravanan
- Center for High Performance Computing, Joint Engineering Research Center for Health Big Data Intelligent Analysis Technology, Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences, Shenzhen, China
- Hao Wu
- Center for High Performance Computing, Joint Engineering Research Center for Health Big Data Intelligent Analysis Technology, Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences, Shenzhen, China
- Zhichao Wang
- Center for High Performance Computing, Joint Engineering Research Center for Health Big Data Intelligent Analysis Technology, Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences, Shenzhen, China
- Du Wu
- Center for High Performance Computing, Joint Engineering Research Center for Health Big Data Intelligent Analysis Technology, Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences, Shenzhen, China
- Yanjie Wei
- Center for High Performance Computing, Joint Engineering Research Center for Health Big Data Intelligent Analysis Technology, Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences, Shenzhen, China
- Zhen Lu
- Center for Cancer Immunology, Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences, University City of Shenzhen, Shenzhen, China
- Youhai H Chen
- Center for Cancer Immunology, Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences, University City of Shenzhen, Shenzhen, China
- Xiaochun Wan
- Shenzhen Laboratory of Human Antibody Engineering, Institute of Biomedicine and Biotechnology, Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences, University City of Shenzhen, Shenzhen, China
- Yi Pan
- Center for High Performance Computing, Joint Engineering Research Center for Health Big Data Intelligent Analysis Technology, Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences, Shenzhen, China
31
Wang Y, Wu S, Duan Y, Huang Y. A point cloud-based deep learning strategy for protein-ligand binding affinity prediction. Brief Bioinform 2021; 23:6440132. [PMID: 34849569 DOI: 10.1093/bib/bbab474] [Citation(s) in RCA: 20] [Impact Index Per Article: 6.7] [Received: 07/13/2021] [Revised: 09/21/2021] [Accepted: 10/15/2021] [Indexed: 01/14/2023]
Abstract
There is great interest in developing artificial intelligence-based protein-ligand binding affinity models because of their immense applications in drug discovery. In this paper, PointNet and PointTransformer, two pointwise multi-layer perceptrons, are applied to protein-ligand binding affinity prediction for the first time. Three-dimensional point clouds could be rapidly generated from PDBbind-2016: 3772 individual point clouds from the refined set and 11 327 from the refined and general sets combined. These point clouds (the refined or the extended set) were used to train PointNet or PointTransformer, yielding protein-ligand binding affinity models with Pearson correlation coefficients on the CASF-2016 benchmark of R = 0.795 (PointNet) and R = 0.833 (PointTransformer) when trained on the extended data set. Analysis of the parameters suggests that the two deep learning models were capable of learning many interactions between proteins and their ligands, and some key atoms for these interactions could be visualized. The protein-ligand interaction features learned by PointTransformer could be further adapted for an XGBoost-based machine learning algorithm, yielding prediction models with an average Rp of 0.827, on par with state-of-the-art machine learning models. These results suggest that point clouds derived from PDBbind data sets are useful for evaluating 3D point cloud-centered deep learning algorithms, which could learn atomic features of protein-ligand interactions from natural evolution or medicinal chemistry and thus have wide applications in chemistry and biology.
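The core PointNet idea, a shared per-point network followed by a symmetric max-pool so that the result does not depend on the order of the points, can be sketched in NumPy. This omits PointNet's learned input/feature transforms, training, and the regression head; the feature layout (xyz plus four hypothetical atom-type channels) and the random weights are assumptions for illustration.

```python
import numpy as np


def pointnet_global_feature(points, w1, b1, w2, b2):
    """Shared two-layer MLP applied to every point, then max-pooled over points.

    points: (N, F) array with one row per atom.
    Returns a global feature vector that is invariant to the row order of `points`.
    """
    h = np.maximum(points @ w1 + b1, 0.0)  # same weights for every point, ReLU
    h = np.maximum(h @ w2 + b2, 0.0)
    return h.max(axis=0)                   # symmetric pooling over the point axis


rng = np.random.default_rng(0)
n_points, n_features, hidden1, hidden2 = 50, 7, 16, 32  # 7 = xyz + 4 hypothetical atom-type channels
w1, b1 = rng.normal(size=(n_features, hidden1)), np.zeros(hidden1)
w2, b2 = rng.normal(size=(hidden1, hidden2)), np.zeros(hidden2)

cloud = rng.normal(size=(n_points, n_features))
shuffled = cloud[rng.permutation(n_points)]
g_original = pointnet_global_feature(cloud, w1, b1, w2, b2)
g_shuffled = pointnet_global_feature(shuffled, w1, b1, w2, b2)
```

Permutation invariance matters here because the atoms of a protein-ligand complex have no natural ordering; the two global features above agree even though the input rows were shuffled.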
Affiliation(s)
- Yeji Wang
- Xiangya International Academy of Translational Medicine, Central South University, Changsha, Hunan 410013, China
- Shuo Wu
- Xiangya International Academy of Translational Medicine, Central South University, Changsha, Hunan 410013, China
- Yanwen Duan
- Xiangya International Academy of Translational Medicine, Central South University, Changsha, Hunan 410013, China
- Hunan Engineering Research Center of Combinatorial Biosynthesis and Natural Product Drug Discovery, Changsha, Hunan 410011, China
- National Engineering Research Center of Combinatorial Biosynthesis for Drug Discovery, Changsha, Hunan 410011, China
- Yong Huang
- Xiangya International Academy of Translational Medicine, Central South University, Changsha, Hunan 410013, China
- National Engineering Research Center of Combinatorial Biosynthesis for Drug Discovery, Changsha, Hunan 410011, China
32
Wei X, Wu X, Cheng Z, Wu Q, Cao C, Xu X, Shang H. Botanical drugs: a new strategy for structure-based target prediction. Brief Bioinform 2021; 23:6409695. [PMID: 34698349 DOI: 10.1093/bib/bbab425] [Citation(s) in RCA: 7] [Impact Index Per Article: 2.3] [Received: 08/05/2021] [Revised: 09/08/2021] [Accepted: 09/17/2021] [Indexed: 11/14/2022]
Abstract
Target identification of small molecules is an important and still challenging task in drug discovery, especially for botanical drug development. An unclear understanding of ligand-protein interactions is one of the main obstacles to drug repurposing and the identification of off-targets. In this study, we collected 9063 crystal structures of ligand-binding proteins released in the PDB from January 1995 to April 2021 and split the complexes into 5133 interaction pairs of ligand atoms and protein fragments (three covalently linked heavy atoms) with an interatomic distance ≤5 Å. The interaction pairs were grouped into ligand atoms of the same SYBYL atom type surrounding each type of protein fragment, which were further clustered with a Bayesian Gaussian Mixture Model (BGMM). Gaussian distributions containing ≥20 ligand atoms were identified as significant interaction patterns. The reliability of the significant interaction patterns was validated by comparing the number of significant interaction patterns in docked poses with higher versus lower similarity to the native crystal structures. Fifty-one candidate targets of brucine, strychnine and icajine from Semen Strychni (Mǎ Qián Zǐ) and eight candidate targets of astragaloside-IV, formononetin and calycosin-7-glucoside from Astragalus (Huáng Qí) were predicted from the significant interaction patterns in combination with docking; these targets were consistent with the therapeutic effects of Semen Strychni and Astragalus for cancer and chronic pain. The new strategy in this study improves the accuracy of target identification for small molecules, which will facilitate the discovery of botanical drugs.
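The distance criterion used to build interaction pairs can be sketched with NumPy. The sketch assumes each protein fragment is represented by a single 3D point (for example, the centroid of its three heavy atoms), which is a simplification, and the coordinates are hypothetical. For the subsequent clustering step, scikit-learn's `BayesianGaussianMixture` is one off-the-shelf BGMM implementation, though the abstract does not say which implementation the authors used.

```python
import numpy as np


def contact_pairs(ligand_xyz, fragment_xyz, cutoff=5.0):
    """Return (ligand_index, fragment_index) pairs whose distance is <= cutoff (in Å)."""
    lig = np.asarray(ligand_xyz, dtype=float).reshape(-1, 3)
    frag = np.asarray(fragment_xyz, dtype=float).reshape(-1, 3)
    # Pairwise Euclidean distances via broadcasting: shape (n_ligand, n_fragment)
    d = np.linalg.norm(lig[:, None, :] - frag[None, :, :], axis=-1)
    i, j = np.nonzero(d <= cutoff)
    return list(zip(i.tolist(), j.tolist()))


# Hypothetical coordinates in Å: two ligand atoms, three fragment points.
ligand = [[0.0, 0.0, 0.0], [10.0, 0.0, 0.0]]
fragments = [[3.0, 0.0, 0.0], [0.0, 4.0, 0.0], [20.0, 0.0, 0.0]]
pairs = contact_pairs(ligand, fragments)
```

Collecting such pairs over many complexes and grouping them by ligand atom type and fragment type yields the point sets that a mixture model can then cluster.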
Affiliation(s)
- Xuxu Wei
- Key Laboratory of Chinese Internal Medicine of MOE, Dongzhimen Hospital, BUCM, Beijing, China
- School of Medicine, Wuhan University of Science and Technology, Wuhan, Hubei 430081, China
- Xiang Wu
- School of Computer Science and Technology, Wuhan University of Science and Technology, Wuhan, Hubei 430081, China
- Zeyu Cheng
- School of Computer Science and Technology, Wuhan University of Science and Technology, Wuhan, Hubei 430081, China
- Qingming Wu
- School of Medicine, Wuhan University of Science and Technology, Wuhan, Hubei 430081, China
- Chen Cao
- Department of Biochemistry & Molecular Biology, Alberta Children's Hospital Research Institute, University of Calgary, Calgary, AB, Canada
- Xue Xu
- School of Medicine, Wuhan University of Science and Technology, Wuhan, Hubei 430081, China
- Hongcai Shang
- Key Laboratory of Chinese Internal Medicine of MOE, Dongzhimen Hospital, BUCM, Beijing, China
- Evidence-Based Medicine Research Centre, Jiangxi University of Chinese Medicine, Jiangxi, China
33
Recio R, Lerena P, Pozo E, Calderón-Montaño JM, Burgos-Morón E, López-Lázaro M, Valdivia V, Pernia Leal M, Mouillac B, Organero JÁ, Khiar N, Fernández I. Carbohydrate-Based NK1R Antagonists with Broad-Spectrum Anticancer Activity. J Med Chem 2021; 64:10350-10370. [PMID: 34236855 PMCID: PMC8529873 DOI: 10.1021/acs.jmedchem.1c00793] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.7] [Received: 05/01/2021] [Indexed: 01/03/2023]
Abstract
NK1R antagonists, investigated for the treatment of several pathologies, have shown encouraging results in the treatment of several cancers. In the present study, we report the synthesis of carbohydrate-based NK1R antagonists and their evaluation as anticancer agents against a wide range of cancer cells. All of the prepared compounds, derived from either d-galactose or l-arabinose, showed high affinity and NK1R antagonistic activity, with broad-spectrum anticancer activity and marked selectivity, comparable to cisplatin. This strategy allowed us to identify the galactosyl derivative 14α as an interesting hit exhibiting a significant NK1R antagonist effect (kinact 0.209 ± 0.103 μM) and high binding affinity for NK1R (IC50 = 50.4 nM, Ki = 22.4 nM, measured by displacement of [125I]SP from NK1R). Interestingly, this galactosyl derivative showed marked selective cytotoxic activity against 12 different types of cancer cell lines.
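The abstract reports both an IC50 and a Ki from the [125I]SP displacement assay but does not say how Ki was derived. If the standard Cheng–Prusoff relation for a competitive inhibitor, Ki = IC50 / (1 + [L]/Kd), was used, the two reported values imply a radioligand concentration of about 1.25 Kd. That reading is an assumption; the sketch below only shows the arithmetic.

```python
def cheng_prusoff_ki(ic50, ligand_conc, kd):
    """Ki from IC50 for a competitive inhibitor: Ki = IC50 / (1 + [L]/Kd)."""
    return ic50 / (1.0 + ligand_conc / kd)


def implied_ligand_ratio(ic50, ki):
    """Invert Cheng-Prusoff to get the [L]/Kd ratio implied by a reported IC50/Ki pair."""
    return ic50 / ki - 1.0


# nM values from the abstract; the Cheng-Prusoff interpretation is an assumption.
ratio = implied_ligand_ratio(50.4, 22.4)
```

Because only the ratio [L]/Kd enters the relation, the same Ki follows from any radioligand concentration and Kd with that ratio.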
Affiliation(s)
- Rocío Recio
- Departamento de Química Orgánica y Farmacéutica, Facultad de Farmacia, Universidad de Sevilla, C/ Profesor García González, 2, 41012 Sevilla, Spain
- Patricia Lerena
- Departamento de Química Orgánica y Farmacéutica, Facultad de Farmacia, Universidad de Sevilla, C/ Profesor García González, 2, 41012 Sevilla, Spain
- Esther Pozo
- Departamento de Química Orgánica y Farmacéutica, Facultad de Farmacia, Universidad de Sevilla, C/ Profesor García González, 2, 41012 Sevilla, Spain
- José Manuel Calderón-Montaño
- Departamento de Farmacología, Facultad de Farmacia, Universidad de Sevilla, C/ Profesor García González, 2, 41012 Sevilla, Spain
- Estefanía Burgos-Morón
- Departamento de Farmacología, Facultad de Farmacia, Universidad de Sevilla, C/ Profesor García González, 2, 41012 Sevilla, Spain
- Miguel López-Lázaro
- Departamento de Farmacología, Facultad de Farmacia, Universidad de Sevilla, C/ Profesor García González, 2, 41012 Sevilla, Spain
- Victoria Valdivia
- Departamento de Química Orgánica y Farmacéutica, Facultad de Farmacia, Universidad de Sevilla, C/ Profesor García González, 2, 41012 Sevilla, Spain
- Manuel Pernia Leal
- Departamento de Química Orgánica y Farmacéutica, Facultad de Farmacia, Universidad de Sevilla, C/ Profesor García González, 2, 41012 Sevilla, Spain
- Bernard Mouillac
- Institut de Génomique Fonctionnelle (IGF), INSERM, Université de Montpellier, CNRS, F-34094 Montpellier, France
- Juan Ángel Organero
- Departamento de Química Física, Facultad de Ciencias Ambientales y Bioquímicas and INAMOL, Universidad de Castilla-La Mancha, Avenida Carlos III, s/n, 45071 Toledo, Spain
- Noureddine Khiar
- Instituto de Investigaciones Químicas (IIQ), CSIC-Universidad de Sevilla, Avenida Américo Vespucio, 49, Isla de la Cartuja, 41092 Sevilla, Spain
- Inmaculada Fernández
- Departamento de Química Orgánica y Farmacéutica, Facultad de Farmacia, Universidad de Sevilla, C/ Profesor García González, 2, 41012 Sevilla, Spain
34
Kashyap K, Siddiqi MI. Recent trends in artificial intelligence-driven identification and development of anti-neurodegenerative therapeutic agents. Mol Divers 2021; 25:1517-1539. [PMID: 34282519 DOI: 10.1007/s11030-021-10274-8] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 03/28/2021] [Accepted: 07/05/2021] [Indexed: 12/12/2022]
Abstract
Neurological disorders affect many aspects of life. Finding drugs for the central nervous system is a very challenging and complex task because of the blood-brain barrier, P-glycoprotein, and high drug attrition rates. The big data available in online databases and resources have enabled artificial intelligence techniques, including machine learning, to process and analyze data and to predict unknown data with high efficiency. These modern techniques have revolutionized the whole drug development paradigm, bringing unprecedented acceleration to central nervous system drug discovery programs. In addition, the new deep learning architectures proposed in many recent works have given a better understanding of how artificial intelligence can tackle the big, complex problems posed by central nervous system disorders. The present review therefore provides comprehensive and up-to-date information on machine learning/artificial intelligence-driven efforts in the brain care domain. A brief overview is also presented of machine learning algorithms and their uses in structure-based drug design, ligand-based drug design, ADMET prediction, de novo drug design, and drug repurposing. Lastly, we discuss the major challenges and limitations and how they can be tackled in the future using these modern machine learning/artificial intelligence approaches.
Affiliation(s)
- Kushagra Kashyap
- Academy of Scientific and Innovative Research (AcSIR), CSIR-Central Drug Research Institute (CSIR-CDRI) Campus, Lucknow, India; Molecular and Structural Biology Division, CSIR-Central Drug Research Institute (CSIR-CDRI), Sector 10, Jankipuram Extension, Sitapur Road, Lucknow, 226031, India
- Mohammad Imran Siddiqi
- Academy of Scientific and Innovative Research (AcSIR), CSIR-Central Drug Research Institute (CSIR-CDRI) Campus, Lucknow, India; Molecular and Structural Biology Division, CSIR-Central Drug Research Institute (CSIR-CDRI), Sector 10, Jankipuram Extension, Sitapur Road, Lucknow, 226031, India
35
Kim QH, Ko JH, Kim S, Park N, Jhe W. Bayesian neural network with pretrained protein embedding enhances prediction accuracy of drug-protein interaction. Bioinformatics 2021; 37:3428-3435. [PMID: 33978713 PMCID: PMC8545317 DOI: 10.1093/bioinformatics/btab346] [Citation(s) in RCA: 17] [Impact Index Per Article: 5.7] [Received: 12/16/2020] [Revised: 04/26/2021] [Accepted: 05/05/2021] [Indexed: 11/25/2022]
Abstract
Motivation: Characterizing drug–protein interactions (DPIs) is crucial to high-throughput screening for drug discovery. Deep learning-based approaches have attracted attention because they can predict DPIs without human trial and error. However, because data labeling requires significant resources, the available protein datasets are relatively small, which in turn decreases model performance. Here, we propose two methods to construct a deep learning framework that exhibits superior performance with a small labeled dataset. Results: First, we use transfer learning to encode protein sequences with a pretrained model, which learns general sequence representations in an unsupervised manner. Second, we use a Bayesian neural network to build a robust model by estimating the data uncertainty. The resulting model predicts interactions between molecules and proteins better than previous baselines. We also show that the uncertainty quantified by Bayesian inference is related to confidence and can be used to screen DPI data points. Availability and implementation: The code is available at https://github.com/QHwan/PretrainDPI. Supplementary information: Supplementary data are available at Bioinformatics online.
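The screening idea in this abstract, using a Bayesian model's predictive uncertainty to decide which DPI data points to trust, can be illustrated with a short Python sketch. It assumes several stochastic predictions per drug–protein pair are already available (for example, forward passes with weights sampled from an approximate posterior). The function names, pair identifiers and the `max_std` threshold are hypothetical and are not taken from the PretrainDPI code; the sketch only shows how a predictive mean and spread can be summarized and used as a filter, not how the authors' network is built.

```python
from statistics import mean, pstdev


def summarize(samples):
    """Return (predictive mean, predictive std) for one pair's sampled predictions."""
    return mean(samples), pstdev(samples)


def screen_by_uncertainty(predictions, max_std):
    """Keep data points whose predictive standard deviation is at most max_std.

    predictions: dict mapping a (hypothetical) pair identifier to a list of
    sampled predictions, e.g. interaction probabilities from repeated
    stochastic forward passes.
    Returns a dict of identifier -> (mean, std) for the retained points.
    """
    kept = {}
    for pair_id, samples in predictions.items():
        mu, sigma = summarize(samples)
        if sigma <= max_std:  # low spread across samples = more confident prediction
            kept[pair_id] = (mu, sigma)
    return kept


if __name__ == "__main__":
    sampled = {
        "drugA-protX": [0.91, 0.89, 0.93, 0.90],  # consistent samples
        "drugB-protY": [0.20, 0.85, 0.55, 0.40],  # scattered samples
    }
    print(screen_by_uncertainty(sampled, max_std=0.05))
```

With these toy samples only `drugA-protX` passes the 0.05 threshold; the scattered predictions for `drugB-protY` have a standard deviation well above it.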
Affiliation(s)
- QHwan Kim
- Department of Physics and Astronomy, Institute of Applied Physics, Seoul National University, Gwanak-gu, Seoul 08826, Republic of Korea
- Joon-Hyuk Ko
- Department of Physics and Astronomy, Institute of Applied Physics, Seoul National University, Gwanak-gu, Seoul 08826, Republic of Korea
- Sunghoon Kim
- Department of Physics and Astronomy, Institute of Applied Physics, Seoul National University, Gwanak-gu, Seoul 08826, Republic of Korea
- Nojun Park
- Department of Physics and Astronomy, Institute of Applied Physics, Seoul National University, Gwanak-gu, Seoul 08826, Republic of Korea
- Wonho Jhe
- Department of Physics and Astronomy, Institute of Applied Physics, Seoul National University, Gwanak-gu, Seoul 08826, Republic of Korea
36
Kimber TB, Chen Y, Volkamer A. Deep Learning in Virtual Screening: Recent Applications and Developments. Int J Mol Sci 2021; 22:4435. [PMID: 33922714 PMCID: PMC8123040 DOI: 10.3390/ijms22094435] [Citation(s) in RCA: 66] [Impact Index Per Article: 22.0] [Received: 03/18/2021] [Revised: 04/13/2021] [Accepted: 04/14/2021] [Indexed: 01/03/2023]
Abstract
Drug discovery is a cost- and time-intensive process that is often assisted by computational methods, such as virtual screening, to speed up and guide the design of new compounds. For many years, machine learning methods have been successfully applied in computer-aided drug discovery. Recently, thanks to the rise of novel technologies and the increasing amount of available chemical and bioactivity data, deep learning has had a tremendous impact on rational active compound discovery. Herein, recent applications and developments of machine learning, with a focus on deep learning, in virtual screening for active compound design are reviewed. This includes an introduction to different compound and protein encodings, deep learning techniques, and frequently used bioactivity and benchmark data sets for model training and testing. Finally, the present state of the art, including current challenges and emerging problems, is examined and discussed.
Affiliation(s)
- Andrea Volkamer
- In Silico Toxicology and Structural Bioinformatics, Institute of Physiology, Charité-Universitätsmedizin Berlin, Charitéplatz 1, 10117 Berlin, Germany
37
Gupta P, Mohanty D. SMMPPI: a machine learning-based approach for prediction of modulators of protein-protein interactions and its application for identification of novel inhibitors for RBD:hACE2 interactions in SARS-CoV-2. Brief Bioinform 2021; 22:6220172. [PMID: 33839740 PMCID: PMC8083326 DOI: 10.1093/bib/bbab111] [Citation(s) in RCA: 15] [Impact Index Per Article: 5.0] [Received: 11/04/2020] [Revised: 02/18/2021] [Accepted: 03/12/2021] [Indexed: 11/30/2022]
Abstract
Small-molecule modulators of protein–protein interactions (PPIs) are being pursued as novel anticancer, antiviral and antimicrobial drug candidates. We used a large data set of experimentally validated PPI modulators (PPIMs) to develop machine learning classifiers for predicting new small-molecule modulators of PPIs. Our analysis reveals that, with a random forest (RF) classifier, general PPI modulators, independent of PPI family, can be predicted with ROC-AUC higher than 0.9 when training and test sets are generated by random split. The performance of the classifier on data sets very different from those used in training has also been estimated, using different state-of-the-art protocols for removing various types of bias when dividing data into training and test sets. The family-specific PPIM predictors developed in this work for 11 clinically important PPI families also have prediction accuracies above 90% in the majority of cases. All these ML-based predictors have been implemented in freely available software named SMMPPI for prediction of small-molecule modulators of clinically relevant PPIs such as RBD:hACE2, Bromodomain_Histone, BCL2-Like_BAX/BAK, LEDGF_IN, LFA_ICAM, MDM2-Like_P53, RAS_SOS1, XIAP_Smac, WDR5_MLL1, KEAP1_NRF2 and CD4_gp120. We have identified novel chemical scaffolds as inhibitors of the RBD:hACE2 PPI involved in host cell entry of SARS-CoV-2. Docking studies for some of the compounds reveal that they can inhibit the RBD:hACE2 interaction by high-affinity binding to interaction hotspots on RBD. Some of these new scaffolds have also been found in recently reported SARS-CoV-2 viral growth inhibitors; however, it is not known whether these molecules inhibit the entry phase.
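ROC-AUC, the metric quoted in this abstract, can be read as the probability that a randomly chosen positive example (here, a modulator) receives a higher score than a randomly chosen negative one, with ties counted as half. The following is a minimal pure-Python sketch of that pairwise definition; `roc_auc` and the toy labels and scores are illustrative assumptions and are not part of SMMPPI.

```python
def roc_auc(labels, scores):
    """ROC-AUC as the fraction of (positive, negative) pairs ranked correctly.

    labels: 0/1 class labels (1 = modulator, 0 = non-modulator).
    scores: classifier scores; higher means more likely to be a modulator.
    A tie between a positive and a negative counts as half a correct pair.
    This pairwise version is O(P * N) and meant for small illustrative sets.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("ROC-AUC needs at least one positive and one negative")
    correct = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                correct += 1.0
            elif p == n:
                correct += 0.5
    return correct / (len(pos) * len(neg))


if __name__ == "__main__":
    labels = [1, 1, 1, 0, 0, 0]
    scores = [0.9, 0.8, 0.4, 0.5, 0.3, 0.1]
    print(roc_auc(labels, scores))
```

For the toy data, 8 of the 9 positive–negative pairs are ranked correctly, so the function returns 8/9 (about 0.89). The pairwise loop grows quadratically; for large screening sets, a rank-based formulation or scikit-learn's `roc_auc_score` would be the practical choice.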
Affiliation(s)
- Debasisa Mohanty
- Bioinformatics & Computational Biology research group at NII, New Delhi 110067, India
38
Jones D, Kim H, Zhang X, Zemla A, Stevenson G, Bennett WFD, Kirshner D, Wong SE, Lightstone FC, Allen JE. Improved Protein-Ligand Binding Affinity Prediction with Structure-Based Deep Fusion Inference. J Chem Inf Model 2021; 61:1583-1592. [PMID: 33754707 DOI: 10.1021/acs.jcim.0c01306] [Citation(s) in RCA: 96] [Impact Index Per Article: 32.0] [Indexed: 11/28/2022]
Abstract
Predicting accurate protein-ligand binding affinities is an important task in drug discovery but remains a challenge even with computationally expensive biophysics-based energy scoring methods and state-of-the-art deep learning approaches. Despite recent advances in the application of deep convolutional and graph neural network-based approaches, it remains unclear what the relative advantages of each approach are and how they compare with physics-based methodologies that have found more mainstream success in virtual screening pipelines. We present fusion models that combine features and inference from complementary representations to improve binding affinity prediction. To our knowledge, this is the first comprehensive study that uses a common series of evaluations to directly compare the performance of three-dimensional (3D) convolutional neural networks (3D-CNNs), spatial graph neural networks (SG-CNNs), and their fusion. We use temporal and structure-based splits to assess performance on novel protein targets. To test the practical applicability of our models, we examine their performance in cases where the crystal structure is assumed to be unavailable. In these cases, binding free energies are predicted using docking pose coordinates as the inputs to each model. In addition, we compare these deep learning approaches with predictions based on docking scores and molecular mechanics/generalized Born surface area (MM/GBSA) calculations. Our results show that the fusion models make more accurate predictions than their constituent neural network models as well as docking scoring and MM/GBSA rescoring, with the benefit of greater computational efficiency than the MM/GBSA method. Finally, we provide the code to reproduce our results and the parameter files of the trained models used in this work. The software is available as open source at https://github.com/llnl/fast. Model parameter files are available at ftp://gdo-bioinformatics.ucllnl.org/fast/pdbbind2016_model_checkpoints/.
Affiliation(s)
- Derek Jones
- Global Security Computing Applications Division, Lawrence Livermore National Laboratory, 7000 East Avenue, Livermore, California 94550, United States
- Hyojin Kim
- Center for Applied Scientific Computing, Lawrence Livermore National Laboratory, 7000 East Avenue, Livermore, California 94550, United States
- Xiaohua Zhang
- Biosciences and Biotechnology Division, Lawrence Livermore National Laboratory, 7000 East Avenue, Livermore, California 94550, United States
- Adam Zemla
- Global Security Computing Applications Division, Lawrence Livermore National Laboratory, 7000 East Avenue, Livermore, California 94550, United States
- Garrett Stevenson
- Computational Engineering Division, Lawrence Livermore National Laboratory, 7000 East Avenue, Livermore, California 94550, United States
- W F Drew Bennett
- Biosciences and Biotechnology Division, Lawrence Livermore National Laboratory, 7000 East Avenue, Livermore, California 94550, United States
- Daniel Kirshner
- Biosciences and Biotechnology Division, Lawrence Livermore National Laboratory, 7000 East Avenue, Livermore, California 94550, United States
- Sergio E Wong
- Biosciences and Biotechnology Division, Lawrence Livermore National Laboratory, 7000 East Avenue, Livermore, California 94550, United States
- Felice C Lightstone
- Biosciences and Biotechnology Division, Lawrence Livermore National Laboratory, 7000 East Avenue, Livermore, California 94550, United States
- Jonathan E Allen
- Global Security Computing Applications Division, Lawrence Livermore National Laboratory, 7000 East Avenue, Livermore, California 94550, United States
39
Zhang H, Yang Y, Li J, Wang M, Saravanan KM, Wei J, Tze-Yang Ng J, Tofazzal Hossain M, Liu M, Zhang H, Ren X, Pan Y, Peng Y, Shi Y, Wan X, Liu Y, Wei Y. A novel virtual screening procedure identifies Pralatrexate as inhibitor of SARS-CoV-2 RdRp and it reduces viral replication in vitro. PLoS Comput Biol 2020; 16:e1008489. [PMID: 33382685 PMCID: PMC7774833 DOI: 10.1371/journal.pcbi.1008489] [Citation(s) in RCA: 30] [Impact Index Per Article: 7.5] [Received: 08/19/2020] [Accepted: 11/03/2020] [Indexed: 01/18/2023]
Abstract
The spread of severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) poses a serious threat to global public health and has led to a worldwide crisis. No effective drug or vaccine is readily available. The viral RNA-dependent RNA polymerase (RdRp) is a promising therapeutic target. A hybrid drug screening procedure was proposed and applied to identify potential drug candidates targeting RdRp from 1906 approved drugs. Among the four selected commercially available drug candidates, Pralatrexate and Azithromycin were confirmed to effectively inhibit SARS-CoV-2 replication in vitro, with EC50 values of 0.008 μM and 9.453 μM, respectively. Our study is the first to show that Pralatrexate potently inhibits SARS-CoV-2 replication, with stronger inhibitory activity than Remdesivir under the same experimental conditions. The paper demonstrates the feasibility of fast and accurate antiviral drug screening for inhibitors of SARS-CoV-2 and provides potential therapeutic agents against COVID-19.
Affiliation(s)
- Haiping Zhang
- Center for High Performance Computing, Joint Engineering Research Center for Health Big Data Intelligent Analysis Technology, Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences, Shenzhen, Guangdong, China
- Yang Yang
- Shenzhen Key Laboratory of Pathogen and Immunity, National Clinical Research Center for infectious disease, State Key Discipline of Infectious Disease, Shenzhen Third People's Hospital, Second Hospital Affiliated to Southern University of Science and Technology, Shenzhen, China
- Junxin Li
- Shenzhen Laboratory of Human Antibody Engineering, Institute of Biomedicine and Biotechnology, Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences, University City of Shenzhen, Shenzhen, China
- Min Wang
- CAS Key Laboratory of Pathogenic Microbiology and Immunology, Institute of Microbiology, Chinese Academy of Sciences, Beijing, China
- Konda Mani Saravanan
- Center for High Performance Computing, Joint Engineering Research Center for Health Big Data Intelligent Analysis Technology, Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences, Shenzhen, Guangdong, China
- Jinli Wei
- Shenzhen Key Laboratory of Pathogen and Immunity, National Clinical Research Center for infectious disease, State Key Discipline of Infectious Disease, Shenzhen Third People's Hospital, Second Hospital Affiliated to Southern University of Science and Technology, Shenzhen, China
- Justin Tze-Yang Ng
- School of Biological Sciences, Nanyang Technological University, Singapore, Singapore
- Md. Tofazzal Hossain
- Center for High Performance Computing, Joint Engineering Research Center for Health Big Data Intelligent Analysis Technology, Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences, Shenzhen, Guangdong, China
- University of Chinese Academy of Sciences, Shijingshan District, Beijing, China
- Maoxuan Liu
- Shenzhen Laboratory of Human Antibody Engineering, Institute of Biomedicine and Biotechnology, Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences, University City of Shenzhen, Shenzhen, China
- Huiling Zhang
- Center for High Performance Computing, Joint Engineering Research Center for Health Big Data Intelligent Analysis Technology, Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences, Shenzhen, Guangdong, China
- Xiaohu Ren
- Institute of Toxicology, Shenzhen Center for Disease Control and Prevention, Shenzhen, China
- Yi Pan
- Department of Computer Science, Georgia State University, Atlanta, Georgia, United States of America
- Yin Peng
- Department of Pathology, School of Medicine, Shenzhen University, Shenzhen, China
- Yi Shi
- CAS Key Laboratory of Pathogenic Microbiology and Immunology, Institute of Microbiology, Chinese Academy of Sciences, Beijing, China
- Xiaochun Wan
- Shenzhen Laboratory of Human Antibody Engineering, Institute of Biomedicine and Biotechnology, Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences, University City of Shenzhen, Shenzhen, China
- Yingxia Liu
- Shenzhen Key Laboratory of Pathogen and Immunity, National Clinical Research Center for infectious disease, State Key Discipline of Infectious Disease, Shenzhen Third People's Hospital, Second Hospital Affiliated to Southern University of Science and Technology, Shenzhen, China
- Yanjie Wei
- Center for High Performance Computing, Joint Engineering Research Center for Health Big Data Intelligent Analysis Technology, Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences, Shenzhen, Guangdong, China
40
Macari G, Toti D, Pasquadibisceglie A, Polticelli F. DockingApp RF: A State-of-the-Art Novel Scoring Function for Molecular Docking in a User-Friendly Interface to AutoDock Vina. Int J Mol Sci 2020; 21:ijms21249548. [PMID: 33333976 PMCID: PMC7765429 DOI: 10.3390/ijms21249548] [Citation(s) in RCA: 12] [Impact Index Per Article: 3.0] [Received: 11/26/2020] [Revised: 12/11/2020] [Accepted: 12/11/2020] [Indexed: 11/28/2022]
Abstract
Motivation: Bringing a new drug to market is expensive and time-consuming. To cut costs and time, computer-aided drug design (CADD) approaches have increasingly been included in the drug discovery pipeline. However, although traditional docking tools sample conformational space well, they are still unable to produce accurate binding affinity predictions. This work presents a novel scoring function for molecular docking seamlessly integrated into DockingApp, a user-friendly graphical interface for AutoDock Vina. The proposed function is based on a random forest model and a selection of specific features to overcome the existing limits of Vina’s original scoring mechanism. A new version of DockingApp, named DockingApp RF, has been developed to host the proposed scoring function and to automate the rescoring of AutoDock Vina output, even for nonexpert users. Results: By combining intermolecular interaction features, solvent-accessible surface area features and Vina’s energy terms, DockingApp RF’s new scoring function improves the binding affinity prediction of AutoDock Vina. Furthermore, comparison tests on the CASF-2013 and CASF-2016 datasets demonstrate that DockingApp RF’s performance is comparable to that of other state-of-the-art machine-learning- and deep-learning-based scoring functions. The new scoring function thus represents a significant advance in the reliability and effectiveness of docking compared with AutoDock Vina’s scoring function. At the same time, the characteristics that made DockingApp appealing to a wide range of users are retained in this new version and have been complemented with additional features.
Affiliation(s)
- Gabriele Macari
- Department of Sciences, Roma Tre University, 00146 Rome, Italy
- Daniele Toti
- Faculty of Mathematical, Physical and Natural Sciences, Catholic University of the Sacred Heart, 25121 Brescia, Italy
- Fabio Polticelli
- Department of Sciences, Roma Tre University, 00146 Rome, Italy
- National Institute of Nuclear Physics, Roma Tre Section, 00146 Rome, Italy
41
Zaucha J, Softley CA, Sattler M, Frishman D, Popowicz GM. Deep learning model predicts water interaction sites on the surface of proteins using limited-resolution data. Chem Commun (Camb) 2020; 56:15454-15457. [PMID: 33237041 DOI: 10.1039/d0cc04383d] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.0] [Indexed: 02/06/2023]
Abstract
We develop a residual deep learning model, hotWater (https://pypi.org/project/hotWater/), to identify key water interaction sites on proteins for binding models and drug discovery. It was tested on new crystal structures, on cryo-EM and NMR structures from the PDB, and in crystallographic refinement, with promising results.
Affiliation(s)
- Jan Zaucha
- Department of Bioinformatics, Wissenschaftszentrum Weihenstephan, Technische Universität München, Maximus-von-Imhof-Forum 3, 85354 Freising, Germany.
42
Kwon Y, Shin WH, Ko J, Lee J. AK-Score: Accurate Protein-Ligand Binding Affinity Prediction Using an Ensemble of 3D-Convolutional Neural Networks. Int J Mol Sci 2020; 21:E8424. [PMID: 33182567 PMCID: PMC7697539 DOI: 10.3390/ijms21228424] [Citation(s) in RCA: 46] [Impact Index Per Article: 11.5] [Received: 09/27/2020] [Revised: 10/24/2020] [Accepted: 11/07/2020] [Indexed: 02/04/2023]
Abstract
Accurate prediction of the binding affinity of a protein-ligand complex is essential for efficient and successful rational drug design, and many binding affinity prediction methods have therefore been developed. In recent years, as deep learning has become more powerful, it has also been applied to affinity prediction. In this work, a new neural network model that predicts the binding affinity of a protein-ligand complex structure is developed. Our model predicts the binding affinity of a complex using an ensemble of multiple independently trained networks, each consisting of multiple channels of 3D convolutional neural network layers. Our model was trained on 3772 protein-ligand complexes from the refined set of the PDBbind-2016 database and tested on the core set of 285 complexes. The benchmark results show that the Pearson correlation coefficient between the binding affinities predicted by our model and the experimental data is 0.827, which is higher than that of state-of-the-art binding affinity prediction scoring functions. Additionally, our method ranks the relative binding affinities of multiple possible binders of a protein quite accurately, comparable to other scoring functions. Finally, we measured which structural information is critical for predicting binding affinity and found that the complementarity between the protein and ligand is the most important.
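Two computations named in this abstract, averaging the outputs of independently trained networks and scoring the averaged predictions with the Pearson correlation coefficient against experimental affinities, can be sketched in plain Python. The numbers below are made-up placeholders rather than AK-Score outputs, `ensemble_mean` and `pearson_r` are hypothetical helper names, and training the 3D-CNN ensemble members themselves is out of scope.

```python
from math import sqrt


def ensemble_mean(per_model_predictions):
    """Average predicted affinities across independently trained models.

    per_model_predictions: one list per model, each holding one predicted
    affinity per complex, with complexes in the same order for every model.
    """
    if not per_model_predictions:
        raise ValueError("need at least one model's predictions")
    lengths = {len(p) for p in per_model_predictions}
    if len(lengths) != 1:
        raise ValueError("every model must predict the same number of complexes")
    n_models = len(per_model_predictions)
    # zip(*...) groups the i-th prediction of every model together
    return [sum(values) / n_models for values in zip(*per_model_predictions)]


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    norm_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    norm_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    if norm_x == 0 or norm_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / (norm_x * norm_y)


if __name__ == "__main__":
    # Hypothetical predictions from three networks for four complexes
    predictions = [
        [6.1, 7.9, 5.2, 9.0],
        [5.9, 8.3, 4.8, 8.6],
        [6.0, 8.1, 5.0, 9.1],
    ]
    experimental = [6.0, 8.0, 5.5, 9.2]  # placeholder measured affinities
    averaged = ensemble_mean(predictions)
    print("ensemble predictions:", averaged)
    print("Pearson r:", pearson_r(averaged, experimental))
```

Plain averaging is only the simplest way to combine ensemble members; the sketch makes no claim about how AK-Score weights or combines its networks.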
Affiliation(s)
- Yongbeom Kwon
- Department of Chemistry, Kangwon National University, Gangwon-do, Chuncheon 24341, Korea
- Woong-Hee Shin
- Department of Chemical Science Education, Sunchon National University, Jeollanam-do, Suncheon 57922, Korea
- Junsu Ko
- Arontier, 241 Gangnam-daero, Seocho-gu, Seoul 06735, Korea
- Juyong Lee
- Department of Chemistry, Kangwon National University, Gangwon-do, Chuncheon 24341, Korea
43
Ton A, Gentile F, Hsing M, Ban F, Cherkasov A. Rapid Identification of Potential Inhibitors of SARS-CoV-2 Main Protease by Deep Docking of 1.3 Billion Compounds. Mol Inform 2020; 39:e2000028. [PMID: 32162456 PMCID: PMC7228259 DOI: 10.1002/minf.202000028] [Citation(s) in RCA: 339] [Impact Index Per Article: 84.8] [Received: 02/22/2020] [Accepted: 03/11/2020] [Indexed: 12/03/2022]
Abstract
The recently emerged 2019 novel coronavirus (SARS-CoV-2) and the associated COVID-19 disease cause serious or even fatal respiratory tract infections, and yet no approved therapeutics or effective treatments are currently available to combat the outbreak. This urgent situation is pressing the world to respond by developing novel vaccines or small-molecule therapeutics for SARS-CoV-2. As part of these efforts, the structure of the SARS-CoV-2 main protease (Mpro) has been rapidly resolved and made publicly available to facilitate global efforts to develop novel drug candidates. Our group recently developed a novel deep learning platform, Deep Docking (DD), which provides fast prediction of docking scores of Glide (or any other docking program) and hence enables structure-based virtual screening of billions of purchasable molecules in a short time. In the current study, we applied DD to all 1.3 billion compounds from the ZINC15 library to identify the top 1,000 potential ligands for the SARS-CoV-2 Mpro protein. The compounds are made publicly available for further characterization and development by the scientific community.
Affiliation(s)
- Anh‐Tien Ton
- Vancouver Prostate Centre, University of British Columbia, 2660 Oak Street, Vancouver, BC V6H 3Z6, Canada
- Francesco Gentile
- Vancouver Prostate Centre, University of British Columbia, 2660 Oak Street, Vancouver, BC V6H 3Z6, Canada
- Michael Hsing
- Vancouver Prostate Centre, University of British Columbia, 2660 Oak Street, Vancouver, BC V6H 3Z6, Canada
- Fuqiang Ban
- Vancouver Prostate Centre, University of British Columbia, 2660 Oak Street, Vancouver, BC V6H 3Z6, Canada
- Artem Cherkasov
- Vancouver Prostate Centre, University of British Columbia, 2660 Oak Street, Vancouver, BC V6H 3Z6, Canada
44
Wang DD, Zhu M, Yan H. Computationally predicting binding affinity in protein-ligand complexes: free energy-based simulations and machine learning-based scoring functions. Brief Bioinform 2020; 22:5860693. [PMID: 32591817 DOI: 10.1093/bib/bbaa107] [Citation(s) in RCA: 26] [Impact Index Per Article: 6.5] [Received: 03/07/2020] [Revised: 04/20/2020] [Accepted: 05/05/2020] [Indexed: 12/18/2022]
Abstract
Accurately predicting protein-ligand binding affinities can substantially facilitate the drug discovery process, but it remains a difficult problem. To tackle the challenge, many computational methods have been proposed. Among these methods, free energy-based simulations and machine learning-based scoring functions can potentially provide accurate predictions. In this paper, we review these two classes of methods, following a number of thermodynamic cycles for the free energy-based simulations and a feature-representation taxonomy for the machine learning-based scoring functions. More recent deep learning-based predictions, in which a hierarchy of feature representations is generally extracted, are also reviewed. The strengths and weaknesses of the two classes of methods, together with future directions for improvement, are compared and discussed.
Affiliation(s)
- Debby D Wang
- School of Medical Instrument and Food Engineering, University of Shanghai for Science and Technology
- Mengxu Zhu
- Department of Electrical Engineering, City University of Hong Kong
- Hong Yan
- College of Science and Engineering, City University of Hong Kong
45
Insight into potent leads for Alzheimer's disease by using several artificial intelligence algorithms. Biomed Pharmacother 2020; 129:110360. [PMID: 32559623 DOI: 10.1016/j.biopha.2020.110360] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.3] [Received: 04/21/2020] [Revised: 06/01/2020] [Accepted: 06/02/2020] [Indexed: 12/21/2022]
Abstract
Several proteins, including S-nitrosoglutathione reductase (GSNOR), complement Factor D, complement 3b (C3b) and Protein Kinase R-like Endoplasmic Reticulum Kinase (PERK), have been shown to be involved in the pathogenic pathways of Alzheimer's disease (AD) and are considered potential treatment targets for AD. Based on the concept of multiple targets, a network pharmacology-based approach was employed to investigate potential Traditional Chinese Medicine (TCM) candidates that dock well with the GSNOR, C3b, Factor D and PERK proteins. To predict the bioactivities of the candidates, Artificial Intelligence (AI) algorithms, comprising seven machine learning algorithms and a deep learning model, were used to validate the docking results. Furthermore, we propose a novel combined method for efficiently exploring the predictions of the AI algorithms. In addition, comparative molecular field analysis (CoMFA) and comparative molecular similarity indices analysis (CoMSIA) were performed to construct predictive models. The squared correlation coefficients (R2) of nearly all models are higher than 0.75, and the models also perform well on the test set. Moreover, the binding stability of the potential inhibitors was evaluated using 100 ns MD simulations. Collectively, this study indicates that the herbs Ardisia japonica, Ligusticum chuanxiong, Lippia nodiflora and Mirabilis jalapa, containing 2,2'-[benzene-1,4-diylbis(methanediyloxybenzene-4,1-diyl)]bis(oxoacetic acid), Glyasperin B, Nodifloridin A, Miraxanthin III and l-Valine-l-valine anhydride, might form a potential medicine formula for AD.
46
Zhang H, Saravanan KM, Yang Y, Hossain MT, Li J, Ren X, Pan Y, Wei Y. Deep Learning Based Drug Screening for Novel Coronavirus 2019-nCov. Interdiscip Sci 2020; 12:368-376. [PMID: 32488835 PMCID: PMC7266118 DOI: 10.1007/s12539-020-00376-6] [Citation(s) in RCA: 96] [Impact Index Per Article: 24.0] [Received: 02/01/2020] [Revised: 04/20/2020] [Accepted: 05/25/2020] [Indexed: 01/09/2023]
Abstract
A novel coronavirus, called 2019-nCoV, was recently found in Wuhan, Hubei Province of China, and is now spreading across China and other parts of the world. Although some drugs are available to treat 2019-nCoV, there is no proper scientific evidence of their activity against the virus. It is highly important to develop a drug that can combat the virus effectively to save human lives. Developing a drug with traditional methods usually takes a much longer time. For 2019-nCoV, it is now better to rely on alternative methods such as deep learning to develop drugs that can combat the disease effectively, since 2019-nCoV is highly homologous to SARS-CoV. In the present work, we first collected virus RNA sequences of 18 patients reported to have 2019-nCoV from a public database, translated the RNA into protein sequences, and performed multiple sequence alignment. After a careful literature survey and sequence analysis, 3C-like protease was considered a major therapeutic target, and we built a 3D model of 3C-like protease using homology modeling. Relying on this structural model, we performed large-scale virtual screening with a pipeline built on a deep learning-based method, recently developed in our group, that accurately ranks/identifies protein-ligand interacting pairs. Our model identified potential drugs for 2019-nCoV 3C-like protease by screening four chemical compound databases (Chimdiv, Targetmol-Approved_Drug_Library, Targetmol-Natural_Compound_Library, and Targetmol-Bioactive_Compound_Library) and a database of tripeptides. In this paper, we provide a list of possible chemical ligands (Meglumine, Vidarabine, Adenosine, D-Sorbitol, D-Mannitol, Sodium_gluconate, Ganciclovir and Chlorobutanol) and peptide drugs (combinations of isoleucine, lysine and proline) from the databases to guide experimental scientists in validating molecules that can combat the virus in a shorter time.
Affiliation(s)
- Haiping Zhang
  - Center for High Performance Computing, Joint Engineering Research Center for Health Big Data Intelligent Analysis Technology, Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences, Shenzhen, Guangdong, 518055, People's Republic of China
- Konda Mani Saravanan
  - Center for High Performance Computing, Joint Engineering Research Center for Health Big Data Intelligent Analysis Technology, Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences, Shenzhen, Guangdong, 518055, People's Republic of China
- Yang Yang
  - Shenzhen Key Laboratory of Pathogen and Immunity, Guangdong Key Laboratory for Diagnosis and Treatment of Emerging Infectious Diseases, State Key Discipline of Infectious Disease, Second Hospital Affiliated to Southern University of Science and Technology, Shenzhen Third People's Hospital, Shenzhen, 518112, People's Republic of China
- Md Tofazzal Hossain
  - Center for High Performance Computing, Joint Engineering Research Center for Health Big Data Intelligent Analysis Technology, Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences, Shenzhen, Guangdong, 518055, People's Republic of China
  - University of Chinese Academy of Sciences, No. 19(A) Yuquan Road, Shijingshan District, Beijing, 100049, People's Republic of China
- Junxin Li
  - Shenzhen Laboratory of Human Antibody Engineering, Institute of Biomedicine and Biotechnology, Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences, 1068 Xueyuan Boulevard, University City of Shenzhen, Xili, Nanshan, Shenzhen, 518055, People's Republic of China
- Xiaohu Ren
  - Institute of Toxicology, Shenzhen Center for Disease Control and Prevention, No 8 Longyuan Road, Nanshan District, Shenzhen, 518055, China
- Yi Pan
  - Department of Computer Science, Georgia State University, Atlanta, 30302-5060, USA
- Yanjie Wei
  - Center for High Performance Computing, Joint Engineering Research Center for Health Big Data Intelligent Analysis Technology, Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences, Shenzhen, Guangdong, 518055, People's Republic of China
47
Mathai N, Kirchmair J. Similarity-Based Methods and Machine Learning Approaches for Target Prediction in Early Drug Discovery: Performance and Scope. Int J Mol Sci 2020; 21:ijms21103585. [PMID: 32438666 PMCID: PMC7279241 DOI: 10.3390/ijms21103585] [Citation(s) in RCA: 18] [Impact Index Per Article: 4.5] [Received: 04/27/2020] [Revised: 05/13/2020] [Accepted: 05/16/2020] [Indexed: 12/20/2022]
Abstract
Computational methods for predicting the macromolecular targets of drugs and drug-like compounds have evolved into a key technology in drug discovery. However, established validation protocols leave several key questions about the performance and scope of these methods unaddressed. For example, prediction success rates are commonly reported as averages over all compounds of a test set and do not consider the structural relationship between individual test compounds and the training instances. To better understand the value of ligand-based methods for target prediction, we benchmarked a similarity-based method and a random forest-based machine learning approach (both employing 2D molecular fingerprints) under three testing scenarios: a standard testing scenario with external data, a standard time-split scenario, and a scenario designed to most closely resemble real-world conditions. In addition, we deconvoluted the results based on the distances of the individual test molecules from the training data. Surprisingly, the similarity-based approach generally outperformed the machine learning approach in all testing scenarios, even in cases where queries were clearly structurally distinct from the instances in the training (or reference) data, and despite a much higher coverage of the known target space.
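To make "similarity-based target prediction with 2D fingerprints" concrete, here is a minimal sketch under stated assumptions: each fingerprint is represented as a set of "on" bit indices (in practice these would come from a cheminformatics toolkit), similarity is the Tanimoto coefficient, and each target is scored by the maximum similarity of the query to any of its known ligands. This nearest-neighbour scoring rule is a common choice assumed here for illustration, not necessarily the exact scheme benchmarked in the paper; the target names and bit sets are hypothetical.

```python
from typing import Dict, FrozenSet, List, Tuple

Fingerprint = FrozenSet[int]  # indices of the bits set in a binary 2D fingerprint


def tanimoto(a: Fingerprint, b: Fingerprint) -> float:
    """Tanimoto coefficient |A & B| / |A | B| for two binary fingerprints."""
    if not a and not b:
        return 0.0
    shared = len(a & b)
    return shared / (len(a) + len(b) - shared)


def predict_targets(query: Fingerprint,
                    reference: Dict[str, List[Fingerprint]]) -> List[Tuple[str, float]]:
    """Rank targets by the highest similarity of the query to any known ligand."""
    scores = {
        target: max(tanimoto(query, fp) for fp in ligands)
        for target, ligands in reference.items()
        if ligands
    }
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Hypothetical reference set: known ligands of two made-up targets.
reference = {
    "TargetA": [frozenset({1, 2, 3, 4}), frozenset({2, 3, 9})],
    "TargetB": [frozenset({7, 8, 9})],
}
print(predict_targets(frozenset({1, 2, 3}), reference))
# [('TargetA', 0.75), ('TargetB', 0.0)]
```

Because the score is a maximum over individual reference ligands, the same code also yields the nearest-neighbour distance between a query and the reference data, which is the kind of quantity the authors used to deconvolute their results.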
Affiliation(s)
- Neann Mathai
  - Department of Chemistry and Computational Biology Unit (CBU), University of Bergen, N-5020 Bergen, Norway
- Johannes Kirchmair
  - Department of Chemistry and Computational Biology Unit (CBU), University of Bergen, N-5020 Bergen, Norway
  - Department of Pharmaceutical Chemistry, Faculty of Life Sciences, University of Vienna, 1090 Vienna, Austria
48
Rallabandi HR, Ganesan P, Kim YJ. Targeting the C-Terminal Domain Small Phosphatase 1. Life (Basel) 2020; 10:life10050057. [PMID: 32397221 PMCID: PMC7281111 DOI: 10.3390/life10050057] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.5] [Received: 03/29/2020] [Revised: 05/05/2020] [Accepted: 05/07/2020] [Indexed: 12/15/2022]
Abstract
The human C-terminal domain small phosphatase 1 (CTDSP1/SCP1) is a protein phosphatase with a conserved DXDXT/V catalytic motif. CTDSP1’s major activity has been identified as dephosphorylation of Ser5 in the tandem heptad repeats of the RNA polymerase II C-terminal domain (RNAP II CTD). It is also implicated in various pivotal biological processes, such as acting as a driving factor in the repressor element 1 (RE-1)-silencing transcription factor (REST) complex, which silences neuronal genes in non-neuronal cells; the G1/S phase transition; and osteoblast differentiation. Recent findings indicate that negative regulation of CTDSP1 suppresses cancer invasion in neuroglioma cells. Because of its significant roles in various biological activities, several researchers have focused on developing regulators of CTDSP1. In this review, we focus on this emerging target and explore the biological significance, challenges, and opportunities in targeting CTDSP1 from a drug design perspective.
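The motif notation DXDXT/V reads as Asp, any residue, Asp, any residue, then Thr or Val, and the RNAP II CTD consensus heptad is Tyr1-Ser2-Pro3-Thr4-Ser5-Pro6-Ser7 (YSPTSPS). The sketch below shows both ideas as simple sequence scans. The toy sequence is hypothetical (not CTDSP1), and `ser5_positions` assumes the fragment starts exactly at Tyr1 of a consensus repeat, which real CTDs, with their non-consensus repeats, do not always satisfy.

```python
import re
from typing import List, Tuple

# DXDXT/V as a regex; the lookahead lets overlapping occurrences be reported.
MOTIF = re.compile(r"(?=(D.D.[TV]))")


def find_motif(sequence: str) -> List[Tuple[int, str]]:
    """Return (1-based start, matched residues) for every DXDXT/V occurrence."""
    return [(m.start() + 1, m.group(1)) for m in MOTIF.finditer(sequence.upper())]


def ser5_positions(ctd: str) -> List[int]:
    """1-based positions of Ser5 in a CTD fragment assumed to begin at Tyr1."""
    return [i + 1 for i, aa in enumerate(ctd) if i % 7 == 4 and aa == "S"]


print(find_motif("MKLDVDLTGRDADGV"))  # [(4, 'DVDLT'), (11, 'DADGV')]
print(ser5_positions("YSPTSPS" * 3))  # [5, 12, 19]
```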
49
Zhang H, Saravanan KM, Lin J, Liao L, Ng JTY, Zhou J, Wei Y. DeepBindPoc: a deep learning method to rank ligand binding pockets using molecular vector representation. PeerJ 2020; 8:e8864. [PMID: 32292649 PMCID: PMC7144620 DOI: 10.7717/peerj.8864] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.0] [Received: 12/13/2019] [Accepted: 03/08/2020] [Indexed: 11/30/2022]
Abstract
Accurate identification of ligand-binding pockets in a protein is important for structure-based drug design. In recent years, several deep learning models have been developed that learn important physicochemical and spatial information to predict ligand-binding pockets in a protein. However, ranking the native ligand-binding pockets from a pool of predicted pockets with a single web-based tool remains a hard task for computational molecular biologists. We therefore reasoned that an improved model for identifying accurate pockets could be obtained by training on a dataset closer to real applications and by providing ligand information. In this article, we propose a new deep learning method, DeepBindPoc, for identifying and ranking ligand-binding pockets in proteins. The model uses information about both the binding pocket and the associated ligand. We use the mol2vec tool to represent both the given ligand and the pocket as vectors, which serve as input to a densely connected (fully connected) neural network. During training, important features for pocket-ligand binding are automatically extracted and high-level information is preserved. When combined with popular traditional methods such as fpocket and P2Rank, DeepBindPoc demonstrated a strong complementary advantage in detecting native-like pockets. The method was extensively tested and validated with standard procedures on multiple datasets, including a dataset of G-protein-coupled receptors. This systematic testing and validation suggest that DeepBindPoc is a valuable tool for ranking near-native pockets in theoretically modeled proteins whose experimental active site is unknown but whose ligand is known. The DeepBindPoc model described in this article is available on GitHub (https://github.com/haiping1010/DeepBindPoc), and the webserver is available at http://cbblab.siat.ac.cn/DeepBindPoc/index.php.
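The data flow described above (ligand and pocket each turned into a vector, concatenated, and scored by a fully connected network, then pockets ranked by score) can be sketched as follows. This is an illustrative assumption, not the DeepBindPoc code: the mol2vec step is imitated by summing per-substructure vectors from a lookup table, the substructure tokens, embedding table, layer sizes, and pocket names are all hypothetical, and the network weights are random rather than trained, so the resulting ranking is meaningless beyond showing the mechanics.

```python
import math
import random
from typing import Dict, List, Sequence, Tuple

DIM = 8  # toy embedding size (hypothetical)
Layer = Tuple[List[List[float]], List[float]]  # (weights, biases)


def embed(tokens: Sequence[str], table: Dict[str, List[float]]) -> List[float]:
    """mol2vec-style vector: sum of substructure vectors (unknown tokens add zeros here)."""
    vec = [0.0] * DIM
    for tok in tokens:
        for d, v in enumerate(table.get(tok, [0.0] * DIM)):
            vec[d] += v
    return vec


def relu(z: float) -> float:
    return max(0.0, z)


def sigmoid(z: float) -> float:
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def dense(x: List[float], layer: Layer, act) -> List[float]:
    weights, biases = layer
    return [act(sum(w * xi for w, xi in zip(row, x)) + b) for row, b in zip(weights, biases)]


def init_layer(rng: random.Random, n_in: int, n_out: int) -> Layer:
    scale = 1.0 / math.sqrt(n_in)
    return [[rng.gauss(0.0, scale) for _ in range(n_in)] for _ in range(n_out)], [0.0] * n_out


def score(ligand_vec: List[float], pocket_vec: List[float], params: List[Layer]) -> float:
    hidden = dense(ligand_vec + pocket_vec, params[0], relu)  # concatenate, then dense layer
    return dense(hidden, params[1], sigmoid)[0]  # single binding-likelihood output


def rank_pockets(ligand: Sequence[str], pockets: Dict[str, Sequence[str]],
                 table: Dict[str, List[float]], params: List[Layer]) -> List[Tuple[str, float]]:
    lig_vec = embed(ligand, table)
    scored = [(name, score(lig_vec, embed(toks, table), params)) for name, toks in pockets.items()]
    return sorted(scored, key=lambda item: item[1], reverse=True)


rng = random.Random(0)
table = {tok: [rng.gauss(0.0, 1.0) for _ in range(DIM)] for tok in ["s1", "s2", "s3", "s4", "s5"]}
params = [init_layer(rng, 2 * DIM, 16), init_layer(rng, 16, 1)]  # untrained weights
pockets = {"pocket_1": ["s1", "s2"], "pocket_2": ["s3"], "pocket_3": ["s4", "s5", "s1"]}
ranking = rank_pockets(["s2", "s3"], pockets, table, params)
print(ranking)
```

In a real setting the weights would be learned from known pocket-ligand pairs, and the candidate pockets would come from a detector such as fpocket or P2Rank, as the abstract describes.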
Affiliation(s)
- Haiping Zhang
  - Center for High Performance Computing, Joint Engineering Research Center for Health Big Data Intelligent Analysis Technology, Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences, Shenzhen, Guangdong Province, China
- Konda Mani Saravanan
  - Center for High Performance Computing, Joint Engineering Research Center for Health Big Data Intelligent Analysis Technology, Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences, Shenzhen, Guangdong Province, China
- Jinzhi Lin
  - Center for High Performance Computing, Joint Engineering Research Center for Health Big Data Intelligent Analysis Technology, Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences, Shenzhen, Guangdong Province, China
- Linbu Liao
  - College of Software Technology, Zhejiang University, Zhejiang Province, Zhejiang, China
- Justin Tze-Yang Ng
  - School of Biological Sciences, Nanyang Technological University, Singapore, Singapore
- Jiaxiu Zhou
  - Shenzhen Children's Hospital, Shenzhen, Guangdong Province, China
- Yanjie Wei
  - Center for High Performance Computing, Joint Engineering Research Center for Health Big Data Intelligent Analysis Technology, Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences, Shenzhen, Guangdong Province, China
50
Mirabzadeh CA, Ytreberg FM. Implementation of adaptive integration method for free energy calculations in molecular systems. PeerJ Comput Sci 2020; 6:e264. [PMID: 33457645 PMCID: PMC7808261 DOI: 10.7717/peerj-cs.264] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/04/2019] [Accepted: 02/10/2020] [Indexed: 11/20/2022]
Abstract
Estimating free energy differences by computer simulation is useful for a wide variety of applications, such as virtual screening for drug design and understanding how amino acid mutations modify protein interactions. However, calculating free energy differences remains challenging and often requires extensive trial and error and very long simulation times to achieve converged results. Here, we present an implementation of the adaptive integration method (AIM). We tested our implementation on two molecular systems and compared the results from AIM with those from a suite of other methods. The two test cases are the solvation free energy of methane and the free energy of mutating the peptide GAG to GVG. We show that AIM is more efficient than the other tested methods for these systems; that is, for a given simulation time, AIM results converge to a higher level of accuracy and precision.
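As commonly described, AIM treats the coupling parameter λ as a dynamical variable: Monte Carlo moves in λ are accepted with weights given by the current running estimate of F(λ), which is itself obtained by thermodynamic integration of running averages of ⟨∂U/∂λ⟩, so sampling flattens across λ as the estimate converges. The toy sketch below applies that idea to a one-dimensional harmonic oscillator with U(x, λ) = ½k(λ)x², k(λ) = k0 + λ(k1 − k0), whose exact answer is ΔF = ½kT ln(k1/k0) (about 0.693 for the defaults). It is a hypothetical illustration, not the authors' implementation, which targets molecular systems; the function name, λ grid, step size, and step count are all assumptions.

```python
import math
import random


def aim_harmonic(k0=1.0, k1=4.0, beta=1.0, n_lambda=21, n_steps=200_000, step=1.0, seed=0):
    """Toy adaptive integration for U(x, lam) = 0.5 * k(lam) * x**2 on a discrete lambda grid."""
    rng = random.Random(seed)
    lams = [n / (n_lambda - 1) for n in range(n_lambda)]
    h = lams[1] - lams[0]

    def energy(x, lam):
        return 0.5 * (k0 + lam * (k1 - k0)) * x * x

    def du_dlam(x):
        return 0.5 * (k1 - k0) * x * x  # dU/dlam for this linear-in-lambda potential

    sums, counts = [0.0] * n_lambda, [0] * n_lambda

    def mean(n):  # running average of dU/dlam at grid point n (0 until visited)
        return sums[n] / counts[n] if counts[n] else 0.0

    x, i = 0.0, 0
    for _ in range(n_steps):
        # 1) Metropolis move in x at the current lambda.
        x_new = x + rng.uniform(-step, step)
        d_e = energy(x_new, lams[i]) - energy(x, lams[i])
        if d_e <= 0 or rng.random() < math.exp(-beta * d_e):
            x = x_new
        # 2) Update the running average of dU/dlam for the current lambda.
        sums[i] += du_dlam(x)
        counts[i] += 1
        # 3) Propose a neighbouring lambda; F(lam_j) - F(lam_i) comes from the
        #    trapezoidal TI estimate, which flattens the lambda distribution.
        j = i + rng.choice((-1, 1))
        if 0 <= j < n_lambda:
            d_f = 0.5 * h * (mean(i) + mean(j)) * (1 if j > i else -1)
            log_acc = -beta * (energy(x, lams[j]) - energy(x, lams[i])) + beta * d_f
            if log_acc >= 0 or rng.random() < math.exp(log_acc):
                i = j
    # Final estimate: trapezoidal integration of <dU/dlam> over the whole grid.
    means = [mean(n) for n in range(n_lambda)]
    return sum(0.5 * h * (means[n] + means[n + 1]) for n in range(n_lambda - 1))


if __name__ == "__main__":
    print(aim_harmonic(), 0.5 * math.log(4.0))
```

The acceptance rule uses exp(−βΔU + βΔF) because, with weights exp(βF(λ)), the marginal distribution of λ becomes uniform once F is estimated correctly, which is what lets a single run cover all λ values.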
Affiliation(s)
- F. Marty Ytreberg
  - Department of Physics, University of Idaho, Moscow, ID, United States of America
  - Institute for Modeling Collaboration and Innovation, University of Idaho, Moscow, ID, United States of America
  - Institute for Bioinformatics and Evolutionary Studies, University of Idaho, Moscow, ID, United States of America