1
Saharkhiz S, Mostafavi M, Birashk A, Karimian S, Khalilollah S, Jaferian S, Yazdani Y, Alipourfard I, Huh YS, Farani MR, Akhavan-Sigari R. The State-of-the-Art Overview to Application of Deep Learning in Accurate Protein Design and Structure Prediction. Top Curr Chem (Cham) 2024; 382:23. [PMID: 38965117 PMCID: PMC11224075 DOI: 10.1007/s41061-024-00469-6] [Received: 02/04/2024] [Accepted: 06/09/2024] [Indexed: 07/06/2024]
Abstract
In recent years, there has been a notable increase in the scientific community's interest in rational protein design. The prospect of designing an amino acid sequence that can reliably fold into a desired three-dimensional structure and exhibit the intended function is captivating. However, a major challenge in this endeavor lies in accurately predicting the resulting protein structure. The exponential growth of protein databases has fueled the advancement of the field, while newly developed algorithms have pushed the boundaries of what was previously achievable in structure prediction. In particular, using deep learning methods instead of brute force approaches has emerged as a faster and more accurate strategy. These deep-learning techniques leverage the vast amount of data available in protein databases to extract meaningful patterns and predict protein structures with improved precision. In this article, we explore the recent developments in the field of protein structure prediction. We delve into the newly developed methods that leverage deep learning approaches, highlighting their significance and potential for advancing our understanding of protein design.
Affiliation(s)
- Saber Saharkhiz
- Division of Neuroscience, Department of Cellular and Molecular Medicine, Faculty of Medicine, University of Ottawa, Ottawa, ON, Canada
- Mehrnaz Mostafavi
- Faculty of Allied Medicine, Shahid Beheshti University of Medical Sciences, Tehran, Iran
- Amin Birashk
- Department of Computer Science, The University of Texas at Dallas, Richardson, TX, USA
- Shiva Karimian
- Electrical and Computer Research Center, Sanandaj Azad University, Sanandaj, Iran
- Shayan Khalilollah
- Department of Neurosurgery, Faculty of Medicine, Tehran Medical Sciences, Islamic Azad University, Tehran, Iran
- Sohrab Jaferian
- Goergen Institute for Data Science, University of Rochester, Rochester, NY, USA
- Yalda Yazdani
- Immunology Research Center, Tabriz University of Medical Sciences, Tabriz, Iran.
- Iraj Alipourfard
- Institute of Physical Chemistry, Polish Academy of Sciences, Marcina Kasprzaka 44/52, 01-224, Warsaw, Poland.
- Yun Suk Huh
- Department of Biological Engineering, Inha University, Incheon, Republic of Korea
2
Liu H, Hu B, Chen P, Wang X, Wang H, Wang S, Wang J, Lin B, Cheng M. Docking Score ML: Target-Specific Machine Learning Models Improving Docking-Based Virtual Screening in 155 Targets. J Chem Inf Model 2024. [PMID: 38958413 DOI: 10.1021/acs.jcim.4c00072] [Indexed: 07/04/2024]
Abstract
In drug discovery, molecular docking methods face challenges in accurately predicting energy. Scoring functions used in molecular docking often fail to simulate complex protein-ligand interactions fully and accurately, leading to biases and inaccuracies in virtual screening and target predictions. We introduce "Docking Score ML", developed from an analysis of over 200,000 docked complexes from 155 known targets for cancer treatments. The scoring functions used are founded on bioactivity data sourced from ChEMBL and have been fine-tuned using both supervised machine learning and deep learning techniques. We validated our approach extensively, including validation of the selectivity mechanism and tests on the DUDE, DUD-AD, and LIT-PCBA data sets, and performed a multitarget analysis on drugs such as sunitinib. To enhance prediction accuracy, feature fusion techniques were explored. By merging the capabilities of the Graph Convolutional Network (GCN) with multiple docking functions, our results indicated a clear superiority of our methodologies over conventional approaches. These advantages demonstrate that Docking Score ML is an efficient and accurate tool for virtual screening and reverse docking.
Affiliation(s)
- Haihan Liu
- Key Laboratory of Structure-Based Drug Design & Discovery of Ministry of Education, Shenyang Pharmaceutical University, Shenyang 110016, People's Republic of China
- Key Laboratory of Intelligent Drug Design and New Drug Discovery of Liaoning Province, Shenyang Pharmaceutical University, Shenyang 110016, People's Republic of China
- School of Pharmaceutical Engineering, Shenyang Pharmaceutical University, Shenyang 110016, People's Republic of China
- Baichun Hu
- Key Laboratory of Structure-Based Drug Design & Discovery of Ministry of Education, Shenyang Pharmaceutical University, Shenyang 110016, People's Republic of China
- Key Laboratory of Intelligent Drug Design and New Drug Discovery of Liaoning Province, Shenyang Pharmaceutical University, Shenyang 110016, People's Republic of China
- School of Pharmaceutical Engineering, Shenyang Pharmaceutical University, Shenyang 110016, People's Republic of China
- Peiying Chen
- Key Laboratory of Structure-Based Drug Design & Discovery of Ministry of Education, Shenyang Pharmaceutical University, Shenyang 110016, People's Republic of China
- Key Laboratory of Intelligent Drug Design and New Drug Discovery of Liaoning Province, Shenyang Pharmaceutical University, Shenyang 110016, People's Republic of China
- School of Pharmaceutical Engineering, Shenyang Pharmaceutical University, Shenyang 110016, People's Republic of China
- Xiao Wang
- Key Laboratory of Structure-Based Drug Design & Discovery of Ministry of Education, Shenyang Pharmaceutical University, Shenyang 110016, People's Republic of China
- Key Laboratory of Intelligent Drug Design and New Drug Discovery of Liaoning Province, Shenyang Pharmaceutical University, Shenyang 110016, People's Republic of China
- School of Pharmaceutical Engineering, Shenyang Pharmaceutical University, Shenyang 110016, People's Republic of China
- Hanxun Wang
- Key Laboratory of Structure-Based Drug Design & Discovery of Ministry of Education, Shenyang Pharmaceutical University, Shenyang 110016, People's Republic of China
- Key Laboratory of Intelligent Drug Design and New Drug Discovery of Liaoning Province, Shenyang Pharmaceutical University, Shenyang 110016, People's Republic of China
- School of Pharmaceutical Engineering, Shenyang Pharmaceutical University, Shenyang 110016, People's Republic of China
- Shizun Wang
- Key Laboratory of Structure-Based Drug Design & Discovery of Ministry of Education, Shenyang Pharmaceutical University, Shenyang 110016, People's Republic of China
- Key Laboratory of Intelligent Drug Design and New Drug Discovery of Liaoning Province, Shenyang Pharmaceutical University, Shenyang 110016, People's Republic of China
- School of Pharmaceutical Engineering, Shenyang Pharmaceutical University, Shenyang 110016, People's Republic of China
- Jian Wang
- Key Laboratory of Structure-Based Drug Design & Discovery of Ministry of Education, Shenyang Pharmaceutical University, Shenyang 110016, People's Republic of China
- Key Laboratory of Intelligent Drug Design and New Drug Discovery of Liaoning Province, Shenyang Pharmaceutical University, Shenyang 110016, People's Republic of China
- School of Pharmaceutical Engineering, Shenyang Pharmaceutical University, Shenyang 110016, People's Republic of China
- Bin Lin
- Key Laboratory of Structure-Based Drug Design & Discovery of Ministry of Education, Shenyang Pharmaceutical University, Shenyang 110016, People's Republic of China
- Key Laboratory of Intelligent Drug Design and New Drug Discovery of Liaoning Province, Shenyang Pharmaceutical University, Shenyang 110016, People's Republic of China
- School of Pharmaceutical Engineering, Shenyang Pharmaceutical University, Shenyang 110016, People's Republic of China
- Maosheng Cheng
- Key Laboratory of Structure-Based Drug Design & Discovery of Ministry of Education, Shenyang Pharmaceutical University, Shenyang 110016, People's Republic of China
- Key Laboratory of Intelligent Drug Design and New Drug Discovery of Liaoning Province, Shenyang Pharmaceutical University, Shenyang 110016, People's Republic of China
- School of Pharmaceutical Engineering, Shenyang Pharmaceutical University, Shenyang 110016, People's Republic of China
3
Shen C, Song J, Hsieh CY, Cao D, Kang Y, Ye W, Wu Z, Wang J, Zhang O, Zhang X, Zeng H, Cai H, Chen Y, Chen L, Luo H, Zhao X, Jian T, Chen T, Jiang D, Wang M, Ye Q, Wu J, Du H, Shi H, Deng Y, Hou T. DrugFlow: An AI-Driven One-Stop Platform for Innovative Drug Discovery. J Chem Inf Model 2024. [PMID: 38920405 DOI: 10.1021/acs.jcim.4c00621] [Indexed: 06/27/2024]
Abstract
Artificial intelligence (AI)-aided drug design has demonstrated unprecedented effects on modern drug discovery, but there is still an urgent need for user-friendly interfaces that bridge the gap between these sophisticated tools and scientists, particularly those who are less computer savvy. Herein, we present DrugFlow, an AI-driven one-stop platform that offers a clean, convenient, and cloud-based interface to streamline early drug discovery workflows. By seamlessly integrating a range of innovative AI algorithms, covering molecular docking, quantitative structure-activity relationship modeling, molecular generation, ADMET (absorption, distribution, metabolism, excretion, and toxicity) prediction, and virtual screening, DrugFlow can offer effective AI solutions for almost all crucial stages in early drug discovery, including hit identification and hit/lead optimization. We hope that the platform can provide sufficiently valuable guidance to aid real-world drug design and discovery. The platform is available at https://drugflow.com.
Affiliation(s)
- Chao Shen
- Hangzhou Carbonsilicon AI Technology Co., Ltd., Hangzhou 310018, Zhejiang, China
- Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, College of Pharmaceutical Sciences, Zhejiang University, Hangzhou 310058, Zhejiang, China
- Jianfei Song
- Hangzhou Carbonsilicon AI Technology Co., Ltd., Hangzhou 310018, Zhejiang, China
- Chang-Yu Hsieh
- Hangzhou Carbonsilicon AI Technology Co., Ltd., Hangzhou 310018, Zhejiang, China
- Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, College of Pharmaceutical Sciences, Zhejiang University, Hangzhou 310058, Zhejiang, China
- Dongsheng Cao
- Xiangya School of Pharmaceutical Sciences, Central South University, Changsha 410004, Hunan, China
- Yu Kang
- Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, College of Pharmaceutical Sciences, Zhejiang University, Hangzhou 310058, Zhejiang, China
- Wenling Ye
- Hangzhou Carbonsilicon AI Technology Co., Ltd., Hangzhou 310018, Zhejiang, China
- Zhenxing Wu
- Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, College of Pharmaceutical Sciences, Zhejiang University, Hangzhou 310058, Zhejiang, China
- Jike Wang
- Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, College of Pharmaceutical Sciences, Zhejiang University, Hangzhou 310058, Zhejiang, China
- Odin Zhang
- Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, College of Pharmaceutical Sciences, Zhejiang University, Hangzhou 310058, Zhejiang, China
- Xujun Zhang
- Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, College of Pharmaceutical Sciences, Zhejiang University, Hangzhou 310058, Zhejiang, China
- Hao Zeng
- Hangzhou Carbonsilicon AI Technology Co., Ltd., Hangzhou 310018, Zhejiang, China
- Heng Cai
- Hangzhou Carbonsilicon AI Technology Co., Ltd., Hangzhou 310018, Zhejiang, China
- Yu Chen
- Hangzhou Carbonsilicon AI Technology Co., Ltd., Hangzhou 310018, Zhejiang, China
- Linkang Chen
- Hangzhou Carbonsilicon AI Technology Co., Ltd., Hangzhou 310018, Zhejiang, China
- Hao Luo
- Hangzhou Carbonsilicon AI Technology Co., Ltd., Hangzhou 310018, Zhejiang, China
- Xinda Zhao
- Hangzhou Carbonsilicon AI Technology Co., Ltd., Hangzhou 310018, Zhejiang, China
- Tianye Jian
- Hangzhou Carbonsilicon AI Technology Co., Ltd., Hangzhou 310018, Zhejiang, China
- Tong Chen
- Hangzhou Carbonsilicon AI Technology Co., Ltd., Hangzhou 310018, Zhejiang, China
- Dejun Jiang
- Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, College of Pharmaceutical Sciences, Zhejiang University, Hangzhou 310058, Zhejiang, China
- Mingyang Wang
- Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, College of Pharmaceutical Sciences, Zhejiang University, Hangzhou 310058, Zhejiang, China
- Qing Ye
- Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, College of Pharmaceutical Sciences, Zhejiang University, Hangzhou 310058, Zhejiang, China
- Jialu Wu
- Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, College of Pharmaceutical Sciences, Zhejiang University, Hangzhou 310058, Zhejiang, China
- Hongyan Du
- Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, College of Pharmaceutical Sciences, Zhejiang University, Hangzhou 310058, Zhejiang, China
- Hui Shi
- Hangzhou Carbonsilicon AI Technology Co., Ltd., Hangzhou 310018, Zhejiang, China
- Yafeng Deng
- Hangzhou Carbonsilicon AI Technology Co., Ltd., Hangzhou 310018, Zhejiang, China
- Department of Automation, Tsinghua University, Beijing 100084, China
- Tingjun Hou
- Hangzhou Carbonsilicon AI Technology Co., Ltd., Hangzhou 310018, Zhejiang, China
- Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, College of Pharmaceutical Sciences, Zhejiang University, Hangzhou 310058, Zhejiang, China
4
Zhou Y, Chen SJ. Advances in machine-learning approaches to RNA-targeted drug design. ARTIFICIAL INTELLIGENCE CHEMISTRY 2024; 2:100053. [PMID: 38434217 PMCID: PMC10904028 DOI: 10.1016/j.aichem.2024.100053] [Indexed: 03/05/2024]
Abstract
RNA molecules play multifaceted functional and regulatory roles within cells and have garnered significant attention in recent years as promising therapeutic targets. With remarkable successes achieved by artificial intelligence (AI) in different fields such as computer vision and natural language processing, there is a growing imperative to harness AI's potential in computer-aided drug design (CADD) to discover novel drug compounds that target RNA. Although machine-learning (ML) approaches have been widely adopted in the discovery of small molecules targeting proteins, the application of ML approaches to model interactions between RNA and small molecules is still in its infancy. Compared to protein-targeted drug discovery, the major challenges in ML-based RNA-targeted drug discovery stem from the scarcity of available data resources. With the growing interest and the development of curated databases focusing on interactions between RNA and small molecules, the field anticipates rapid growth and the opening of a new avenue for disease treatment. In this review, we aim to provide an overview of recent advancements in computationally modeling RNA-small molecule interactions within the context of RNA-targeted drug discovery, with a particular emphasis on methodologies employing ML techniques.
Affiliation(s)
- Yuanzhe Zhou
- Department of Physics and Astronomy, University of Missouri, Columbia, MO 65211-7010, USA
- Shi-Jie Chen
- Department of Physics and Astronomy, Department of Biochemistry, Institute of Data Sciences and Informatics, University of Missouri, Columbia, MO 65211-7010, USA
5
Zhang X, Shen C, Zhang H, Kang Y, Hsieh CY, Hou T. Advancing Ligand Docking through Deep Learning: Challenges and Prospects in Virtual Screening. Acc Chem Res 2024; 57:1500-1509. [PMID: 38577892 DOI: 10.1021/acs.accounts.4c00093] [Indexed: 04/06/2024]
Abstract
Molecular docking, also termed ligand docking (LD), is a pivotal element of structure-based virtual screening (SBVS) used to predict the binding conformations and affinities of protein-ligand complexes. Traditional LD methodologies rely on a search and scoring framework, utilizing heuristic algorithms to explore binding conformations and scoring functions to evaluate binding strengths. However, to meet the efficiency demands of SBVS, these algorithms and functions are often simplified, prioritizing speed over accuracy. The emergence of deep learning (DL) has exerted a profound impact on diverse fields, ranging from natural language processing to computer vision and drug discovery. DeepMind's AlphaFold2 has impressively exhibited its ability to accurately predict protein structures solely from amino acid sequences, highlighting the remarkable potential of DL in conformation prediction. This groundbreaking advancement circumvents the traditional search-scoring frameworks in LD, enhancing both accuracy and processing speed and thereby catalyzing a broader adoption of DL algorithms in binding pose prediction. Nevertheless, a consensus on certain aspects remains elusive. In this Account, we delineate the current status of employing DL to augment LD within the VS paradigm, highlighting our contributions to this domain. Furthermore, we discuss the challenges and future prospects, drawing insights from our scholarly investigations. Initially, we present an overview of VS and LD, followed by an introduction to DL paradigms, which deviate significantly from traditional search-scoring frameworks. Subsequently, we delve into the challenges associated with the development of DL-based LD (DLLD), encompassing evaluation metrics, application scenarios, and physical plausibility of the predicted conformations. In the evaluation of LD algorithms, it is essential to recognize the multifaceted nature of the metrics.
While the accuracy of binding pose prediction, often measured by the success rate, is a pivotal aspect, the scoring/screening power and computational speed of these algorithms are equally important given the pivotal role of LD tools in VS. Regarding application scenarios, early methods focused on blind docking, where the binding site is unknown. However, recent studies suggest a shift toward identifying binding sites rather than solely predicting binding poses within these models. In contrast, LD with a known pocket in VS has been shown to be more practical. Physical plausibility poses another significant challenge. Although DLLD models often achieve higher success rates compared to traditional methods, they may generate poses with implausible local structures, such as incorrect bond angles or lengths, which are disadvantageous for postprocessing tasks like visualization. Finally, we discuss the future perspectives for DLLD, emphasizing the need to improve generalization ability, strike a balance between speed and accuracy, account for protein conformation flexibility, and enhance physical plausibility. Additionally, we delve into the comparison between generative and regression algorithms in this context, exploring their respective strengths and potential.
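The pose-prediction "success rate" mentioned in this abstract can be illustrated with a minimal Python sketch. It assumes the common convention that a pose counts as successful when its heavy-atom RMSD to the reference pose is at most 2 Å; that threshold, the function names `rmsd` and `success_rate`, and the toy coordinates are illustrative assumptions, not details taken from the Account. The sketch also assumes atoms are listed in the same order in both poses and applies no symmetry correction or superposition.

```python
import math


def rmsd(pred, ref):
    """RMSD (in Å) between two poses given as equally ordered (x, y, z) tuples."""
    if len(pred) != len(ref) or not pred:
        raise ValueError("poses must be non-empty and have the same atom count")
    squared = sum(
        (p - r) ** 2
        for atom_pred, atom_ref in zip(pred, ref)
        for p, r in zip(atom_pred, atom_ref)
    )
    return math.sqrt(squared / len(pred))


def success_rate(pairs, threshold=2.0):
    """Fraction of (predicted, reference) pose pairs with RMSD <= threshold.

    The 2.0 Å default is an assumed, commonly used cutoff.
    """
    hits = sum(1 for pred, ref in pairs if rmsd(pred, ref) <= threshold)
    return hits / len(pairs)


# Hypothetical two-atom ligand used only to exercise the functions.
reference = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0)]
near_pose = [(0.0, 0.0, 1.0), (1.5, 0.0, 1.0)]  # every atom shifted by 1 Å
far_pose = [(0.0, 0.0, 3.0), (1.5, 0.0, 3.0)]   # every atom shifted by 3 Å

rate = success_rate([(near_pose, reference), (far_pose, reference)])
print(rate)
```

A metric of this kind captures only geometric closeness to the reference; it says nothing about the scoring/screening power, speed, or physical plausibility that the abstract lists as equally important.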
Affiliation(s)
- Xujun Zhang
- College of Pharmaceutical Sciences, Zhejiang University, Hangzhou 310058, Zhejiang, China
- Hangzhou Carbonsilicon AI Technology Co., Ltd, Hangzhou 310018, Zhejiang, China
- Chao Shen
- College of Pharmaceutical Sciences, Zhejiang University, Hangzhou 310058, Zhejiang, China
- Hangzhou Carbonsilicon AI Technology Co., Ltd, Hangzhou 310018, Zhejiang, China
- Haotian Zhang
- College of Pharmaceutical Sciences, Zhejiang University, Hangzhou 310058, Zhejiang, China
- Hangzhou Carbonsilicon AI Technology Co., Ltd, Hangzhou 310018, Zhejiang, China
- Yu Kang
- College of Pharmaceutical Sciences, Zhejiang University, Hangzhou 310058, Zhejiang, China
- Chang-Yu Hsieh
- College of Pharmaceutical Sciences, Zhejiang University, Hangzhou 310058, Zhejiang, China
- Tingjun Hou
- College of Pharmaceutical Sciences, Zhejiang University, Hangzhou 310058, Zhejiang, China
6
Herlina T, Akili AWR, Nishinarizki V, Hardianto A, Latip J. Bioinformatics Study of Flavonoids From Genus Erythrina As Ace2 inhibitor Candidates For Covid-19 Treatment. Adv Appl Bioinform Chem 2024; 17:61-70. [PMID: 38764460 PMCID: PMC11102127 DOI: 10.2147/aabc.s454961] [Received: 01/22/2024] [Accepted: 05/03/2024] [Indexed: 05/21/2024]
Abstract
Purpose: This study aimed to screen potential drug candidates from the flavonoids of the genus Erythrina for the Corona Virus Disease 2019 (COVID-19) treatment. Patients and Methods: A comprehensive screening was conducted on the structures of 473 flavonoids derived from the genus Erythrina, focusing on their potential toxicity and pharmacokinetic profiles. Subsequently, flavonoids that were non-toxic and possessed favorable pharmacokinetic properties underwent further analysis to explore their interactions with the angiotensin-converting enzyme 2 (ACE2) receptor, employing molecular docking and molecular dynamics simulations. Results: Among 473 flavonoids, 104 were predicted to be safe from being mutagenic, hepatotoxic, and inhibitors of the human ether-a-go-go-related gene (hERG). Among these 104 flavonoids, 18 compounds were predicted not to be substrates of P-glycoprotein (P-gp). Among these 18 flavonoids, gangetinin (471) and erybraedin D (310) exhibit low binding affinities and root mean square deviation (RMSD) values, indicating stable binding to the ACE2 receptor. The physicochemical attributes of compounds 310 and 471 suggest that they possess drug-like properties. Conclusion: Gangetinin (471) and erybraedin D (310) may serve as promising candidates for COVID-19 treatment due to their potential to inhibit the ACE2-RBD interaction. This warrants further investigation into their inhibitory effects on ACE2-RBD binding through in vitro experiments.
Affiliation(s)
- Tati Herlina
- Department of Chemistry, Universitas Padjadjaran, Jatinangor, West Java, Indonesia
- Vicki Nishinarizki
- Department of Chemistry, Universitas Padjadjaran, Jatinangor, West Java, Indonesia
- Ari Hardianto
- Department of Chemistry, Universitas Padjadjaran, Jatinangor, West Java, Indonesia
- Jalifah Latip
- Department of Chemical Sciences, Universiti Kebangsaan Malaysia, Bangi, Selangor, Malaysia
7
Gu S, Yang Y, Zhao Y, Qiu J, Wang X, Tong HHY, Liu L, Wan X, Liu H, Hou T, Kang Y. Evaluation of AlphaFold2 Structures for Hit Identification across Multiple Scenarios. J Chem Inf Model 2024; 64:3630-3639. [PMID: 38630855 DOI: 10.1021/acs.jcim.3c01976] [Indexed: 04/19/2024]
Abstract
The introduction of AlphaFold2 (AF2) has sparked significant enthusiasm and generated extensive discussion within the scientific community, particularly among drug discovery researchers. Although previous studies have addressed the performance of AF2 structures in virtual screening (VS), a more comprehensive investigation is still necessary considering the paramount importance of structural accuracy in drug design. In this study, we evaluate the performance of AF2 structures in VS across three common drug discovery scenarios: targets with holo, apo, and AF2 structures; targets with only apo and AF2 structures; and targets exclusively with AF2 structures. We utilized both the traditional physics-based Glide and the deep-learning-based scoring function RTMscore to rank the compounds in the DUD-E, DEKOIS 2.0, and DECOY data sets. The results demonstrate that, overall, the performance of VS on AF2 structures is comparable to that on apo structures but notably inferior to that on holo structures across diverse scenarios. Moreover, when a target has solely an AF2 structure, selecting the holo structure of the target from different subtypes within the same protein family produces comparable results with the AF2 structure for VS on the data set of the AF2 structures, and significantly better results than the AF2 structures on its own data set. This indicates that utilizing AF2 structures for docking-based VS may not yield the most satisfactory outcomes, even when solely AF2 structures are available. Moreover, we rule out the possibility that the variations in VS performance between the binding pockets of AF2 and holo structures arise from the differences in their biological assembly composition.
Affiliation(s)
- Shukai Gu
- Faculty of Applied Science, Macao Polytechnic University, Macao 999078, SAR, China
- College of Pharmaceutical Sciences, Zhejiang University, Hangzhou 310058, Zhejiang, China
- Yuwei Yang
- Faculty of Applied Science, Macao Polytechnic University, Macao 999078, SAR, China
- Yihao Zhao
- College of Pharmaceutical Sciences, Zhejiang University, Hangzhou 310058, Zhejiang, China
- Jiayue Qiu
- Faculty of Applied Science, Macao Polytechnic University, Macao 999078, SAR, China
- Xiaorui Wang
- State Key Laboratory of Quality Research in Chinese Medicine, Macau Institute for Applied Research in Medicine and Health, Macau University of Science and Technology, Macao 999078, China
- Henry Hoi Yee Tong
- Faculty of Applied Science, Macao Polytechnic University, Macao 999078, SAR, China
- Liwei Liu
- Advanced Computing and Storage Laboratory, Central Research Institute, 2012 Laboratories, Huawei Technologies Co., Ltd., Nanjing 210000, Jiangsu, China
- Xiaozhe Wan
- Advanced Computing and Storage Laboratory, Central Research Institute, 2012 Laboratories, Huawei Technologies Co., Ltd., Nanjing 210000, Jiangsu, China
- Huanxiang Liu
- Faculty of Applied Science, Macao Polytechnic University, Macao 999078, SAR, China
- Tingjun Hou
- College of Pharmaceutical Sciences, Zhejiang University, Hangzhou 310058, Zhejiang, China
- Yu Kang
- College of Pharmaceutical Sciences, Zhejiang University, Hangzhou 310058, Zhejiang, China
8
Xu Y, Cao L, Chen Y, Zhang Z, Liu W, Li H, Ding C, Pu J, Qian K, Xu W. Integrating Machine Learning in Metabolomics: A Path to Enhanced Diagnostics and Data Interpretation. SMALL METHODS 2024:e2400305. [PMID: 38682615 DOI: 10.1002/smtd.202400305] [Received: 03/02/2024] [Revised: 04/07/2024] [Indexed: 05/01/2024]
Abstract
Metabolomics, leveraging techniques like NMR and MS, is crucial for understanding biochemical processes in pathophysiological states. This field, however, faces challenges in metabolite sensitivity, data complexity, and omics data integration. Recent machine learning advancements have enhanced data analysis and disease classification in metabolomics. This study explores machine learning integration with metabolomics to improve metabolite identification, data efficiency, and diagnostic methods. Using deep learning and traditional machine learning, it presents advancements in metabolic data analysis, including novel algorithms for accurate peak identification, robust disease classification from metabolic profiles, and improved metabolite annotation. It also highlights multiomics integration, demonstrating machine learning's potential in elucidating biological phenomena and advancing disease diagnostics. This work contributes significantly to metabolomics by merging it with machine learning, offering innovative solutions to analytical challenges and setting new standards for omics data analysis.
Affiliation(s)
- Yudian Xu
- Department of Traditional Chinese Medicine, Renji Hospital, School of Medicine, Shanghai Jiao Tong University, Shanghai, 200127, P. R. China
- Linlin Cao
- State Key Laboratory for Oncogenes and Related Genes, Division of Cardiology, Renji Hospital, School of Medicine, Shanghai Jiao Tong University, 160 Pujian Road, Shanghai, 200127, P. R. China
- Yifan Chen
- State Key Laboratory for Oncogenes and Related Genes, Division of Cardiology, Renji Hospital, School of Medicine, Shanghai Jiao Tong University, 160 Pujian Road, Shanghai, 200127, P. R. China
- Ziyue Zhang
- School of Biomedical Engineering, Institute of Medical Robotics and Med-X Research Institute, Shanghai Jiao Tong University, Shanghai, 200030, P. R. China
- Wanshan Liu
- School of Biomedical Engineering, Institute of Medical Robotics and Med-X Research Institute, Shanghai Jiao Tong University, Shanghai, 200030, P. R. China
- He Li
- Department of Traditional Chinese Medicine, Renji Hospital, School of Medicine, Shanghai Jiao Tong University, Shanghai, 200127, P. R. China
- Chenhuan Ding
- Department of Traditional Chinese Medicine, Renji Hospital, School of Medicine, Shanghai Jiao Tong University, Shanghai, 200127, P. R. China
- Jun Pu
- State Key Laboratory for Oncogenes and Related Genes, Division of Cardiology, Renji Hospital, School of Medicine, Shanghai Jiao Tong University, 160 Pujian Road, Shanghai, 200127, P. R. China
- Kun Qian
- State Key Laboratory for Oncogenes and Related Genes, Division of Cardiology, Renji Hospital, School of Medicine, Shanghai Jiao Tong University, 160 Pujian Road, Shanghai, 200127, P. R. China
- School of Biomedical Engineering, Institute of Medical Robotics and Med-X Research Institute, Shanghai Jiao Tong University, Shanghai, 200030, P. R. China
- Wei Xu
- State Key Laboratory for Oncogenes and Related Genes, Division of Cardiology, Renji Hospital, School of Medicine, Shanghai Jiao Tong University, 160 Pujian Road, Shanghai, 200127, P. R. China
9
Li X, Shen C, Zhu H, Yang Y, Wang Q, Yang J, Huang N. A High-Quality Data Set of Protein-Ligand Binding Interactions Via Comparative Complex Structure Modeling. J Chem Inf Model 2024; 64:2454-2466. [PMID: 38181418 DOI: 10.1021/acs.jcim.3c01170] [Indexed: 01/07/2024]
Abstract
High-quality protein-ligand complex structures provide the basis for understanding the nature of noncovalent binding interactions at the atomic level and enable structure-based drug design. However, experimentally determined complex structures are scarce compared with the vast chemical space. In this study, we addressed this issue by constructing the BindingNet data set via comparative complex structure modeling, which contains 69,816 modeled high-quality protein-ligand complex structures with experimental binding affinity data. BindingNet provides valuable insights into investigating protein-ligand interactions, allowing visual inspection and interpretation of structural analogues' structure-activity relationships. It can also be used for evaluating machine-learning-based scoring functions. Our results indicate that machine learning models trained on BindingNet could reduce the bias caused by buried solvent-accessible surface area, as we previously found for models trained on the PDBbind data set. We also discussed strategies to improve BindingNet and its potential utilization for benchmarking the molecular docking methods and ligand binding free energy calculation approaches. The BindingNet complements PDBbind in constructing a sufficient and unbiased protein-ligand binding data set and is freely available at http://bindingnet.huanglab.org.cn.
Affiliation(s)
- Xuelian Li
- National Institute of Biological Sciences, Chinese Academy of Medical Sciences & Peking Union Medical College, Beijing 100730, China
- National Institute of Biological Sciences, 7 Science Park Road, Zhongguancun Life Science Park, Beijing 102206, China
| | - Cheng Shen
- National Institute of Biological Sciences, 7 Science Park Road, Zhongguancun Life Science Park, Beijing 102206, China
| | - Hui Zhu
- National Institute of Biological Sciences, 7 Science Park Road, Zhongguancun Life Science Park, Beijing 102206, China
- Tsinghua Institute of Multidisciplinary Biomedical Research, Tsinghua University, Beijing 102206, China
| | - Yujian Yang
- National Institute of Biological Sciences, 7 Science Park Road, Zhongguancun Life Science Park, Beijing 102206, China
| | - Qing Wang
- National Institute of Biological Sciences, 7 Science Park Road, Zhongguancun Life Science Park, Beijing 102206, China
| | - Jincai Yang
- National Institute of Biological Sciences, 7 Science Park Road, Zhongguancun Life Science Park, Beijing 102206, China
| | - Niu Huang
- National Institute of Biological Sciences, Chinese Academy of Medical Sciences & Peking Union Medical College, Beijing 100730, China
- National Institute of Biological Sciences, 7 Science Park Road, Zhongguancun Life Science Park, Beijing 102206, China
- Tsinghua Institute of Multidisciplinary Biomedical Research, Tsinghua University, Beijing 102206, China
| |
|
10
|
Qu X, Dong L, Luo D, Si Y, Wang B. Water Network-Augmented Two-State Model for Protein-Ligand Binding Affinity Prediction. J Chem Inf Model 2024; 64:2263-2274. [PMID: 37433009 DOI: 10.1021/acs.jcim.3c00567] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 07/13/2023]
Abstract
Water network rearrangement from the ligand-unbound state to the ligand-bound state is known to have significant effects on protein-ligand binding interactions, but most current machine learning-based scoring functions overlook these effects. In this study, we endeavor to construct a comprehensive and realistic deep learning model by incorporating water network information into both the ligand-unbound and ligand-bound states. In particular, extended connectivity interaction features were integrated into the graph representation, and a graph transformer operator was employed to extract features of the ligand-unbound and ligand-bound states. Through these efforts, we developed a water network-augmented two-state model called ECIFGraph::HM-Holo-Apo. Our new model exhibits satisfactory performance in terms of scoring, ranking, docking, screening, and reverse screening power tests on the CASF-2016 benchmark. In addition, it can achieve superior performance in large-scale docking-based virtual screening tests on the DEKOIS2.0 data set. Our study highlights that the use of a water network-augmented two-state model can be an effective strategy to bolster the robustness and applicability of machine learning-based scoring functions, particularly for targets with hydrophilic or solvent-exposed binding pockets.
Affiliation(s)
- Xiaoyang Qu
- State Key Laboratory of Physical Chemistry of Solid Surfaces and Fujian Provincial Key Laboratory of Theoretical and Computational Chemistry, College of Chemistry and Chemical Engineering, Xiamen University, Xiamen 361005, P. R. China
| | - Lina Dong
- State Key Laboratory of Physical Chemistry of Solid Surfaces and Fujian Provincial Key Laboratory of Theoretical and Computational Chemistry, College of Chemistry and Chemical Engineering, Xiamen University, Xiamen 361005, P. R. China
| | - Ding Luo
- State Key Laboratory of Physical Chemistry of Solid Surfaces and Fujian Provincial Key Laboratory of Theoretical and Computational Chemistry, College of Chemistry and Chemical Engineering, Xiamen University, Xiamen 361005, P. R. China
| | - Yubing Si
- College of Chemistry, Zhengzhou University, Zhengzhou 450001, P. R. China
| | - Binju Wang
- State Key Laboratory of Physical Chemistry of Solid Surfaces and Fujian Provincial Key Laboratory of Theoretical and Computational Chemistry, College of Chemistry and Chemical Engineering, Xiamen University, Xiamen 361005, P. R. China
- Innovation Laboratory for Sciences and Technologies of Energy Materials of Fujian Province (IKKEM), Xiamen 361005, P. R. China
| |
|
11
|
Rayka M, Mirzaei M, Mohammad Latifi A. An ensemble-based approach to estimate confidence of predicted protein-ligand binding affinity values. Mol Inform 2024; 43:e202300292. [PMID: 38358080 DOI: 10.1002/minf.202300292] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/25/2023] [Revised: 01/22/2024] [Accepted: 02/02/2024] [Indexed: 02/16/2024]
Abstract
When designing a machine learning-based scoring function, we access a limited number of protein-ligand complexes with experimentally determined binding affinity values, representing only a fraction of all possible protein-ligand complexes. Consequently, it is crucial to report a measure of confidence and quantify the uncertainty in the model's predictions during test time. Here, we adopt the conformal prediction technique to evaluate the confidence of a prediction for each member of the core set of the CASF 2016 benchmark. The conformal prediction technique requires a diverse ensemble of predictors for uncertainty estimation. To this end, we introduce ENS-Score as an ensemble predictor, which includes 30 models with different protein-ligand representation approaches and achieves Pearson's correlation of 0.842 on the core set of the CASF 2016 benchmark. Also, we comprehensively investigate the residual error of each data point to assess the normality behavior of the distribution of the residual errors and their correlation to the structural features of the ligands, such as hydrophobic interactions and halogen bonding. In the end, we provide a local host web application to facilitate the usage of ENS-Score. All codes to repeat results are provided at https://github.com/miladrayka/ENS_Score.
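The abstract above does not say which conformal variant was used, so the following is only a minimal sketch of one common choice, normalized split-conformal regression. It assumes a held-out calibration set with known affinities and uses the spread of the ensemble members as a difficulty estimate. All function names and the `eps` smoothing term are hypothetical and are not taken from the ENS-Score code.

```python
import math
import statistics


def ensemble_stats(member_predictions):
    """Return (mean, population std) of the ensemble members for one complex."""
    mean = statistics.fmean(member_predictions)
    spread = statistics.pstdev(member_predictions)
    return mean, spread


def conformal_quantile(scores, alpha):
    """Finite-sample-corrected (1 - alpha) quantile of calibration scores."""
    n = len(scores)
    rank = math.ceil((n + 1) * (1 - alpha))  # 1-based rank into sorted scores
    if rank > n:
        return math.inf  # too few calibration points for this alpha
    return sorted(scores)[rank - 1]


def calibrate(calib_member_preds, calib_targets, alpha=0.1, eps=1e-6):
    """Nonconformity score = |error| scaled by ensemble spread (assumption)."""
    scores = []
    for preds, y in zip(calib_member_preds, calib_targets):
        mean, spread = ensemble_stats(preds)
        scores.append(abs(y - mean) / (spread + eps))
    return conformal_quantile(scores, alpha)


def predict_interval(member_preds, q, eps=1e-6):
    """Interval around the ensemble mean, wider where members disagree."""
    mean, spread = ensemble_stats(member_preds)
    half_width = q * (spread + eps)
    return mean - half_width, mean + half_width
```

Under the usual exchangeability assumption, an interval built this way covers the true affinity with probability at least 1 - alpha. A complex on which the ensemble members disagree receives a wider interval, which is one way to read "confidence" per test complex.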
Affiliation(s)
- Milad Rayka
- Applied Biotechnology Research Center, Baqiyatallah University of Medical Sciences, Tehran, Iran
| | - Morteza Mirzaei
- Applied Biotechnology Research Center, Baqiyatallah University of Medical Sciences, Tehran, Iran
| | - Ali Mohammad Latifi
- Applied Biotechnology Research Center, Baqiyatallah University of Medical Sciences, Tehran, Iran
| |
|
12
|
Guo L, Wang J. GSScore: a novel Graphormer-based shell-like scoring method for protein-ligand docking. Brief Bioinform 2024; 25:bbae201. [PMID: 38706316 PMCID: PMC11070652 DOI: 10.1093/bib/bbae201] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/08/2023] [Revised: 02/05/2024] [Accepted: 04/16/2024] [Indexed: 05/07/2024]
Abstract
Protein-ligand interactions (PLIs) are essential for cellular activities and drug discovery. However, owing to the complexity and high cost of experimental methods, there is a great demand for computational approaches to recognize PLI patterns, such as protein-ligand docking. In recent years, more and more models based on machine learning have been developed to directly predict the root mean square deviation (RMSD) of a ligand docking pose with reference to its native binding pose. However, new scoring methods are still needed for more accurate RMSD prediction. We present a new deep learning-based scoring method for RMSD prediction of protein-ligand docking poses based on a Graphormer method and a Shell-like graph architecture, named GSScore. To recognize near-native conformations from a set of poses, GSScore takes atoms as nodes and then represents the protein-ligand docking interface as multiple bipartite graphs within different shell ranges. Benefiting from the Graphormer and the Shell-like graph architecture, GSScore can effectively capture the subtle differences between energetically favorable near-native conformations and unfavorable non-native poses without extra information. GSScore was extensively evaluated on diverse test sets including a subset of PDBBind version 2019, CASF2016 as well as DUD-E, and obtained significant improvements over existing methods in terms of RMSE, Pearson correlation coefficient (R), Spearman correlation coefficient and Docking power.
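The idea of splitting the docking interface into bipartite graphs over distance shells can be illustrated with a short sketch. The abstract gives no cutoffs or edge features, so the shell boundaries, the function name, and the edge tuple layout below are all hypothetical. The Graphormer that would consume these graphs is not shown.

```python
import math


def shell_bipartite_edges(protein_xyz, ligand_xyz, shell_bounds):
    """Group protein-ligand atom pairs into distance shells.

    shell_bounds is a list of (lower, upper) distances in angstroms. A pair
    (protein atom i, ligand atom j) becomes an edge of shell k when
    lower <= distance < upper. Edges only ever connect a protein atom to a
    ligand atom, so each shell is a bipartite graph.
    """
    shells = [[] for _ in shell_bounds]
    for i, p in enumerate(protein_xyz):
        for j, l in enumerate(ligand_xyz):
            d = math.dist(p, l)
            for k, (lower, upper) in enumerate(shell_bounds):
                if lower <= d < upper:
                    shells[k].append((i, j, d))
                    break  # shells are assumed not to overlap
    return shells


# Illustrative coordinates and shell ranges (not taken from the paper).
protein = [(0.0, 0.0, 0.0), (10.0, 0.0, 0.0)]
ligand = [(3.0, 0.0, 0.0), (0.0, 5.0, 0.0)]
graphs = shell_bipartite_edges(protein, ligand, [(0.0, 4.0), (4.0, 8.0)])
```

Pairs farther apart than the outermost shell are dropped, which keeps each graph restricted to the interface region.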
Affiliation(s)
- Linyuan Guo
- School of Computer Science and Engineering, Central South University, Rd. Lu Shan Nan, 410083, Changsha, P.R. China
- Hunan Provincial Key Lab on Bioinformatics, Central South University, Rd. Lu Shan Nan, 410083, Changsha, P.R. China
| | - Jianxin Wang
- School of Computer Science and Engineering, Central South University, Rd. Lu Shan Nan, 410083, Changsha, P.R. China
- Hunan Provincial Key Lab on Bioinformatics, Central South University, Rd. Lu Shan Nan, 410083, Changsha, P.R. China
| |
|
13
|
Zengin IN, Koca MS, Tayfuroglu O, Yildiz M, Kocak A. Benchmarking ANI potentials as a rescoring function and screening FDA drugs for SARS-CoV-2 Mpro. J Comput Aided Mol Des 2024; 38:15. [PMID: 38532176 DOI: 10.1007/s10822-024-00554-4] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/03/2024] [Accepted: 02/26/2024] [Indexed: 03/28/2024]
Abstract
Here, we introduce the use of ANI-ML potentials as a rescoring function for host-guest interactions in molecular docking. Our results show that the "docking power" of ANI potentials can compete with the current scoring functions at the same level of computational cost. Benchmarking studies on the CASF-2016 dataset showed that ANI is ranked in the top 5 scoring functions among the other 34 tested. In particular, the ANI-predicted interaction energies, when used in conjunction with the GOLD-PLP scoring function, can boost the top-ranked solution to be the closest to the X-ray structure. Rapid and accurate calculation of interaction energies between ligand and protein also enables screening of millions of drug candidates/docking poses. Using a unique protocol in which docking by GOLD-PLP, rescoring by ANI-ML potentials and extensive MD simulations along with end-state free energy methods are combined, we have screened FDA-approved drugs against the SARS-CoV-2 main protease (Mpro). The top six drug molecules suggested by the consensus of these free energy methods have already been in clinical trials or proposed as potential drug molecules in previous theoretical and experimental studies, supporting the validity and accuracy of our screening method.
Affiliation(s)
- Irem N Zengin
- Department of Chemistry, Gebze Technical University, 41400, Gebze, Kocaeli, Turkey
| | - M Serdar Koca
- Department of Molecular Biology and Genetics, Gebze Technical University, 41400, Gebze, Kocaeli, Turkey
- Pfizer - Universidad de Granada - Junta de Andalucía Centre for Genomics and Oncological Research (GENYO), 18016, Granada, Spain
| | - Omer Tayfuroglu
- Department of Chemistry, Gebze Technical University, 41400, Gebze, Kocaeli, Turkey
| | - Muslum Yildiz
- Department of Molecular Biology and Genetics, Gebze Technical University, 41400, Gebze, Kocaeli, Turkey
| | - Abdulkadir Kocak
- Department of Chemistry, Gebze Technical University, 41400, Gebze, Kocaeli, Turkey.
| |
|
14
|
Luo M, He J, Yin L, Zhan P, Zhao Z, Xiong H, Mei Z. Borneol exerts its antipruritic effects by inhibiting TRPA1 and activating TRPM8. Journal of Ethnopharmacology 2024; 322:117581. [PMID: 38103845 DOI: 10.1016/j.jep.2023.117581] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/11/2023] [Revised: 11/28/2023] [Accepted: 12/10/2023] [Indexed: 12/19/2023]
Abstract
ETHNOPHARMACOLOGICAL RELEVANCE: Borneol is a long-established traditional Chinese medicine that has been found to be effective in treating pain and itchy skin. However, whether borneol has a therapeutic effect on chronic itch and its related mechanisms remain unclear.
AIM OF THE STUDY: To investigate the antipruritic effect of borneol and its molecular mechanism.
MATERIALS AND METHODS: DrugBAN framework and molecular docking were applied to predict the targets of borneol, and the calcium imaging or patch-clamp recording analysis were used to detect the effects of borneol on TRPA1, TRPM8 or TRPV3 channels in HEK293T cells. In addition, various mouse models of acute itch and chronic itch were established to evaluate the antipruritic effects of borneol on C57BL/6J mice. Then, the borneol-induced pruritic relief was further investigated in Trpa1-/-, Trpm8-/-, or Trpa1-/-/Trpm8-/- mice. The effects of borneol on the activation of TRPM8 and the inhibition of TRPA1 were also measured in dorsal root ganglia neurons of wild-type (WT), Trpm8-/- and Trpv1-/- mice. Lastly, a randomized, double-blind study of adult patients was conducted to evaluate the clinical antipruritic effect of borneol.
RESULTS: TRPA1, TRPV3 and TRPM8 are the potential targets of borneol according to the results of DrugBAN algorithm and molecular docking. Calcium imaging and patch-clamp recording analysis demonstrated that borneol activates TRPM8 channel-induced cell excitability and inhibits TRPA1 channel-mediated cell excitability in transfected HEK293T cells. Animal behavior analysis showed that borneol can significantly reduce acute and chronic itch behavior in C57BL/6J mice, but this effect was eliminated in Trpa1-/-, Trpm8-/- mice, or at least in Trpa1-/-/Trpm8-/- mice. Borneol elicits TRPM8 channel induced [Ca2+]i responses but inhibits AITC or SADBE-induced activation of TRPA1 channels in dorsal root ganglia neurons of WT and Trpv1-/- mice, respectively. Furthermore, the clinical results indicated that borneol could reduce itching symptoms in patients and its efficacy is similar to that of menthol.
CONCLUSION: Borneol has therapeutic effects on multiple pruritus models in mice and patients with chronic itch, and the mechanism may be through inhibiting TRPA1 and activating TRPM8.
Affiliation(s)
- Miao Luo
- School of Pharmaceutical Sciences, South-Central Minzu University, Wuhan, 430074, China
| | - Jinfeng He
- School of Pharmaceutical Sciences, South-Central Minzu University, Wuhan, 430074, China
| | - Liang Yin
- School of Pharmaceutical Sciences, South-Central Minzu University, Wuhan, 430074, China
| | - Ping Zhan
- Dermatology Hospital of Jiangxi Province, Nanchang, 330000, China
| | | | - Hui Xiong
- School of Pharmaceutical Sciences, South-Central Minzu University, Wuhan, 430074, China; Ethnopharmacology Level 3 Laboratory of National Administration of Traditional Chinese Medicine, South-Central Minzu University, Wuhan, 430074, China.
| | - Zhinan Mei
- School of Pharmaceutical Sciences, South-Central Minzu University, Wuhan, 430074, China; College of Plant Science and Technology, Huazhong Agricultural University, Wuhan, 430070, China.
| |
|
15
|
Luo D, Liu D, Qu X, Dong L, Wang B. Enhancing Generalizability in Protein-Ligand Binding Affinity Prediction with Multimodal Contrastive Learning. J Chem Inf Model 2024; 64:1892-1906. [PMID: 38441880 DOI: 10.1021/acs.jcim.3c01961] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 03/26/2024]
Abstract
Improving the generalization ability of scoring functions remains a major challenge in protein-ligand binding affinity prediction. Many machine learning methods are limited by their reliance on single-modal representations, hindering a comprehensive understanding of protein-ligand interactions. We introduce a graph-neural-network-based scoring function that utilizes a triplet contrastive learning loss to improve protein-ligand representations. In this model, three-dimensional complex representations and the fusion of two-dimensional ligand and coarse-grained pocket representations converge while distancing from decoy representations in latent space. After rigorous validation on multiple external data sets, our model exhibits commendable generalization capabilities compared to those of other deep learning-based scoring functions, marking it as a promising tool in the realm of drug discovery. In the future, our training framework can be extended to other biophysical- and biochemical-related problems such as protein-protein interaction and protein mutation prediction.
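A triplet contrastive objective of the kind described above can be written in a few lines. This is a minimal sketch that assumes a standard margin-based triplet loss with Euclidean distance. The abstract does not state the distance metric, the margin, or which embedding serves as the anchor, so treating the 3D complex embedding as the anchor and the fused 2D ligand plus pocket embedding as the positive is an assumption made for illustration.

```python
import math


def triplet_loss(anchor, positive, negative, margin=1.0):
    """Margin-based triplet loss on embedding vectors.

    anchor:   embedding of the 3D complex (assumed role)
    positive: fused 2D-ligand + coarse-grained pocket embedding (assumed role)
    negative: embedding of a decoy
    The loss is zero once the decoy is at least `margin` farther from the
    anchor than the positive is.
    """
    d_pos = math.dist(anchor, positive)
    d_neg = math.dist(anchor, negative)
    return max(0.0, d_pos - d_neg + margin)


def mean_triplet_loss(triplets, margin=1.0):
    """Average the loss over a batch of (anchor, positive, negative) tuples."""
    return sum(triplet_loss(a, p, n, margin) for a, p, n in triplets) / len(triplets)
```

Minimizing this quantity pulls the two views of the same complex together in latent space while pushing decoys away, which matches the "converge while distancing from decoy representations" behavior the abstract describes.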
Affiliation(s)
- Ding Luo
- State Key Laboratory of Physical Chemistry of Solid Surfaces and Fujian Provincial Key Laboratory of Theoretical and Computational Chemistry, College of Chemistry and Chemical Engineering, Xiamen University, Xiamen 361005, P. R. China
| | - Dandan Liu
- State Key Laboratory of Physical Chemistry of Solid Surfaces and Fujian Provincial Key Laboratory of Theoretical and Computational Chemistry, College of Chemistry and Chemical Engineering, Xiamen University, Xiamen 361005, P. R. China
| | - Xiaoyang Qu
- School of Pharmacy and Medical Technology, Putian University, Putian 351100, P. R. China
- Key Laboratory of Pharmaceutical Analysis and Laboratory Medicine (Putian University), Fujian Province University, Putian 351100, P. R. China
| | - Lina Dong
- State Key Laboratory of Physical Chemistry of Solid Surfaces and Fujian Provincial Key Laboratory of Theoretical and Computational Chemistry, College of Chemistry and Chemical Engineering, Xiamen University, Xiamen 361005, P. R. China
| | - Binju Wang
- State Key Laboratory of Physical Chemistry of Solid Surfaces and Fujian Provincial Key Laboratory of Theoretical and Computational Chemistry, College of Chemistry and Chemical Engineering, Xiamen University, Xiamen 361005, P. R. China
- Innovation Laboratory for Sciences and Technologies of Energy Materials of Fujian Province (IKKEM), Xiamen 361005, P. R. China
| |
|
16
|
Metcalf DP, Glick ZL, Bortolato A, Jiang A, Cheney DL, Sherrill CD. Directional ΔG Neural Network (DrΔG-Net): A Modular Neural Network Approach to Binding Free Energy Prediction. J Chem Inf Model 2024; 64:1907-1918. [PMID: 38470995 DOI: 10.1021/acs.jcim.3c02054] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 03/14/2024]
Abstract
The protein-ligand binding free energy is a central quantity in structure-based computational drug discovery efforts. Although popular alchemical methods provide sound statistical means of computing the binding free energy of a large breadth of systems, they are generally too costly to be applied at the same frequency as end point or ligand-based methods. By contrast, these data-driven approaches are typically fast enough to address thousands of systems but with reduced transferability to unseen systems. We introduce DrΔG-Net (or simply Dragnet), an equivariant graph neural network that can blend ligand-based and protein-ligand data-driven approaches. It is based on a 3D fingerprint representation of the ligand alone and in complex with the protein target. Dragnet is a global scoring function to predict the binding affinity of arbitrary protein-ligand complexes, but can be easily tuned via transfer learning to specific systems or end points, performing similarly to common 2D ligand-based approaches in these tasks. Dragnet is evaluated on a total of 28 validation proteins with a set of congeneric ligands derived from the Binding DB and one custom set extracted from the ChEMBL Database. In general, a handful of experimental binding affinities are sufficient to optimize the scoring function for a particular protein and ligand scaffold. When not available, predictions from physics-based methods such as absolute free energy perturbation can be used for the transfer learning tuning of Dragnet. Furthermore, we use our data to illustrate the present limitations of data-driven modeling of binding free energy predictions.
Affiliation(s)
- Derek P Metcalf
- Center for Computational Molecular Science and Technology, School of Chemistry and Biochemistry and School of Computational Science and Engineering, Georgia Institute of Technology, Atlanta, Georgia 30332-0400, United States
| | - Zachary L Glick
- Center for Computational Molecular Science and Technology, School of Chemistry and Biochemistry and School of Computational Science and Engineering, Georgia Institute of Technology, Atlanta, Georgia 30332-0400, United States
| | - Andrea Bortolato
- Molecular Structure and Design, Bristol-Myers Squibb Company, P.O. Box 5400, Princeton, New Jersey 08543, United States
| | - Andy Jiang
- Center for Computational Molecular Science and Technology, School of Chemistry and Biochemistry and School of Computational Science and Engineering, Georgia Institute of Technology, Atlanta, Georgia 30332-0400, United States
| | - Daniel L Cheney
- Molecular Structure and Design, Bristol-Myers Squibb Company, P.O. Box 5400, Princeton, New Jersey 08543, United States
| | - C David Sherrill
- Center for Computational Molecular Science and Technology, School of Chemistry and Biochemistry and School of Computational Science and Engineering, Georgia Institute of Technology, Atlanta, Georgia 30332-0400, United States
| |
|
17
|
Wang M, Wu Z, Wang J, Weng G, Kang Y, Pan P, Li D, Deng Y, Yao X, Bing Z, Hsieh CY, Hou T. Genetic Algorithm-Based Receptor Ligand: A Genetic Algorithm-Guided Generative Model to Boost the Novelty and Drug-Likeness of Molecules in a Sampling Chemical Space. J Chem Inf Model 2024; 64:1213-1228. [PMID: 38302422 DOI: 10.1021/acs.jcim.3c01964] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 02/03/2024]
Abstract
Deep learning-based de novo molecular design has recently gained significant attention. While numerous DL-based generative models have been successfully developed for designing novel compounds, the majority of the generated molecules lack sufficiently novel scaffolds or high drug-like profiles. These issues may not be fully captured by commonly used metrics for the assessment of molecular generative models, such as novelty, diversity, and quantitative estimation of the drug-likeness score. To address these limitations, we proposed a genetic algorithm-guided generative model called GARel (genetic algorithm-based receptor-ligand interaction generator), a novel framework for training a DL-based generative model to produce drug-like molecules with novel scaffolds. To efficiently train the GARel model, we utilized dense net to update the parameters based on molecules with novel scaffolds and drug-like features. To demonstrate the capability of the GARel model, we used it to design inhibitors for three targets: AA2AR, EGFR, and SARS-CoV-2. The results indicate that GARel-generated molecules feature more diverse and novel scaffolds and possess more desirable physicochemical properties and favorable docking scores. Compared with other generative models, GARel makes significant progress in balancing novelty and drug-likeness, providing a promising direction for the further development of DL-based de novo design methodology with potential impacts on drug discovery.
Affiliation(s)
- Mingyang Wang
- College of Pharmaceutical Sciences and Cancer Center, Zhejiang University, Hangzhou 310058, Zhejiang, China
- CarbonSilicon AI Technology Co., Ltd., Hangzhou 310018, Zhejiang, China
| | - Zhengjian Wu
- College of Pharmaceutical Sciences and Cancer Center, Zhejiang University, Hangzhou 310058, Zhejiang, China
- School of Computer Science, Wuhan University, Wuhan 430072, Hubei, China
| | - Jike Wang
- College of Pharmaceutical Sciences and Cancer Center, Zhejiang University, Hangzhou 310058, Zhejiang, China
- CarbonSilicon AI Technology Co., Ltd., Hangzhou 310018, Zhejiang, China
| | - Gaoqi Weng
- College of Pharmaceutical Sciences and Cancer Center, Zhejiang University, Hangzhou 310058, Zhejiang, China
| | - Yu Kang
- College of Pharmaceutical Sciences and Cancer Center, Zhejiang University, Hangzhou 310058, Zhejiang, China
| | - Peichen Pan
- College of Pharmaceutical Sciences and Cancer Center, Zhejiang University, Hangzhou 310058, Zhejiang, China
| | - Dan Li
- College of Pharmaceutical Sciences and Cancer Center, Zhejiang University, Hangzhou 310058, Zhejiang, China
| | - Yafeng Deng
- CarbonSilicon AI Technology Co., Ltd., Hangzhou 310018, Zhejiang, China
| | - Xiaojun Yao
- Dr. Neher's Biophysics Laboratory for Innovative Drug Discovery, Macau Institute for Applied Research in Medicine and Health, State Key Laboratory of Quality Research in Chinese Medicine, Macau University of Science and Technology, Taipa, Macau 999078, China
| | - Zhitong Bing
- Institute of Modern Physics, Chinese Academy of Sciences, Lanzhou, Gansu 730000, China
| | - Chang-Yu Hsieh
- College of Pharmaceutical Sciences and Cancer Center, Zhejiang University, Hangzhou 310058, Zhejiang, China
| | - Tingjun Hou
- College of Pharmaceutical Sciences and Cancer Center, Zhejiang University, Hangzhou 310058, Zhejiang, China
| |
|
18
|
Mohebbinia Z, Firouzi R, Karimi-Jafari MH. Improving protein-ligand docking results using the Semiempirical quantum mechanics: testing on the PDBbind 2016 core set. J Biomol Struct Dyn 2024:1-11. [PMID: 38165642 DOI: 10.1080/07391102.2023.2299742] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/30/2023] [Accepted: 12/20/2023] [Indexed: 01/04/2024]
Abstract
Molecular docking techniques are routinely employed for predicting ligand binding conformations and affinities in the in silico phase of the drug design and development process. In this study, a reliable semiempirical quantum mechanics (SQM) method, PM7, was employed for geometry optimization of top-ranked poses obtained from two widely used docking programs, AutoDock4 and AutoDock Vina. The PDBbind core set (version 2016), which contains high-quality crystal protein-ligand complexes with their corresponding experimental binding affinities, was used as an initial dataset in this research. It was shown that docking pose optimization improves the accuracy of pose predictions and is very useful for the refinement of docked complexes via removing clashes between ligands and proteins. It was also demonstrated that AutoDock Vina achieves a higher sampling power than AutoDock4 in generating accurate ligand poses (RMSD ≤ 2.0 Å), while AutoDock4 exhibits a better ranking power than AutoDock Vina. Finally, a new protocol based on a combination of the results obtained from the two docking programs was proposed for structure-based virtual screening studies, which benefits from the robust sampling abilities of AutoDock Vina and the reliable ranking performance of AutoDock4. Communicated by Ramaswamy H. Sarma.
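The RMSD ≤ 2.0 Å success criterion mentioned above can be made concrete with a small sketch. It assumes that the docked pose and the crystal pose list the same ligand atoms in the same order and share the receptor's coordinate frame, so no superposition is applied. Symmetry-equivalent atoms are not handled, and the function names are illustrative rather than taken from AutoDock4 or AutoDock Vina.

```python
import math


def rmsd(pose, reference):
    """Root-mean-square deviation between two equally ordered atom lists."""
    if not pose or len(pose) != len(reference):
        raise ValueError("pose and reference need the same non-zero atom count")
    squared = sum(math.dist(a, b) ** 2 for a, b in zip(pose, reference))
    return math.sqrt(squared / len(pose))


def success_rate(pose_rmsds, cutoff=2.0):
    """Fraction of complexes whose selected pose lies within the cutoff (angstroms)."""
    hits = sum(1 for value in pose_rmsds if value <= cutoff)
    return hits / len(pose_rmsds)
```

Applying `success_rate` to the best pose among all generated poses gives a sampling-style measure, while applying it to the top-scored pose reflects how well the scoring function picks out a near-native pose.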
Affiliation(s)
- Zainab Mohebbinia
- Department of Physical Chemistry, Chemistry and Chemical Engineering Research Center of Iran, Tehran, Iran
| | - Rohoullah Firouzi
- Department of Physical Chemistry, Chemistry and Chemical Engineering Research Center of Iran, Tehran, Iran
| | | |
|
19
|
Li Y, Fan Z, Rao J, Chen Z, Chu Q, Zheng M, Li X. An overview of recent advances and challenges in predicting compound-protein interaction (CPI). Medical Review (2021) 2023; 3:465-486. [PMID: 38282802 PMCID: PMC10808869 DOI: 10.1515/mr-2023-0030] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/18/2023] [Accepted: 08/30/2023] [Indexed: 01/30/2024]
Abstract
Compound-protein interactions (CPIs) are critical in drug discovery for identifying therapeutic targets, drug side effects, and repurposing existing drugs. Machine learning (ML) algorithms have emerged as powerful tools for CPI prediction, offering notable advantages in cost-effectiveness and efficiency. This review provides an overview of recent advances in both structure-based and non-structure-based CPI prediction ML models, highlighting their performance and achievements. It also offers insights into CPI prediction-related datasets and evaluation benchmarks. Lastly, the article presents a comprehensive assessment of the current landscape of CPI prediction, elucidating the challenges faced and outlining emerging trends to advance the field.
Affiliation(s)
- Yanbei Li
- School of Pharmaceutical Science and Technology, Hangzhou Institute for Advanced Study, UCAS, Hangzhou, Zhejiang Province, China
- Drug Discovery and Design Center, State Key Laboratory of Drug Research, Shanghai Institute of Materia Medica, Chinese Academy of Sciences, Shanghai, China
- University of Chinese Academy of Sciences, Beijing, China
| | - Zhehuan Fan
- Drug Discovery and Design Center, State Key Laboratory of Drug Research, Shanghai Institute of Materia Medica, Chinese Academy of Sciences, Shanghai, China
- University of Chinese Academy of Sciences, Beijing, China
| | - Jingxin Rao
- Drug Discovery and Design Center, State Key Laboratory of Drug Research, Shanghai Institute of Materia Medica, Chinese Academy of Sciences, Shanghai, China
- University of Chinese Academy of Sciences, Beijing, China
| | - Zhiyi Chen
- School of Pharmaceutical Science and Technology, Hangzhou Institute for Advanced Study, UCAS, Hangzhou, Zhejiang Province, China
- Drug Discovery and Design Center, State Key Laboratory of Drug Research, Shanghai Institute of Materia Medica, Chinese Academy of Sciences, Shanghai, China
- University of Chinese Academy of Sciences, Beijing, China
| | - Qinyu Chu
- School of Pharmaceutical Science and Technology, Hangzhou Institute for Advanced Study, UCAS, Hangzhou, Zhejiang Province, China
- Drug Discovery and Design Center, State Key Laboratory of Drug Research, Shanghai Institute of Materia Medica, Chinese Academy of Sciences, Shanghai, China
- University of Chinese Academy of Sciences, Beijing, China
| | - Mingyue Zheng
- School of Pharmaceutical Science and Technology, Hangzhou Institute for Advanced Study, UCAS, Hangzhou, Zhejiang Province, China
- Drug Discovery and Design Center, State Key Laboratory of Drug Research, Shanghai Institute of Materia Medica, Chinese Academy of Sciences, Shanghai, China
- University of Chinese Academy of Sciences, Beijing, China
| | - Xutong Li
- Drug Discovery and Design Center, State Key Laboratory of Drug Research, Shanghai Institute of Materia Medica, Chinese Academy of Sciences, Shanghai, China
- University of Chinese Academy of Sciences, Beijing, China
| |
|
20
|
Ribeiro AJM, Riziotis IG, Borkakoti N, Thornton JM. Enzyme function and evolution through the lens of bioinformatics. Biochem J 2023; 480:1845-1863. [PMID: 37991346 PMCID: PMC10754289 DOI: 10.1042/bcj20220405] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/20/2023] [Revised: 11/09/2023] [Accepted: 11/14/2023] [Indexed: 11/23/2023]
Abstract
Enzymes have been shaped by evolution over billions of years to catalyse the chemical reactions that support life on earth. Dispersed in the literature, or organised in online databases, knowledge about enzymes can be structured in distinct dimensions, either related to their quality as biological macromolecules, such as their sequence and structure, or related to their chemical functions, such as the catalytic site, kinetics, mechanism, and overall reaction. The evolution of enzymes can only be understood when each of these dimensions is considered. In addition, many of the properties of enzymes only make sense in the light of evolution. We start this review by outlining the main paradigms of enzyme evolution, including gene duplication and divergence, convergent evolution, and evolution by recombination of domains. In the second part, we overview the current collective knowledge about enzymes, as organised by different types of data and collected in several databases. We also highlight some increasingly powerful computational tools that can be used to close gaps in understanding, in particular for types of data that require laborious experimental protocols. We believe that recent advances in protein structure prediction will be a powerful catalyst for the prediction of binding, mechanism, and ultimately, chemical reactions. A comprehensive mapping of enzyme function and evolution may be attainable in the near future.
Affiliation(s)
- Antonio J. M. Ribeiro
- European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SD, U.K
- Ioannis G. Riziotis
- European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SD, U.K
- Neera Borkakoti
- European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SD, U.K
- Janet M. Thornton
- European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SD, U.K
21
Tran-Nguyen VK, Junaid M, Simeon S, Ballester PJ. A practical guide to machine-learning scoring for structure-based virtual screening. Nat Protoc 2023; 18:3460-3511. [PMID: 37845361 DOI: 10.1038/s41596-023-00885-w] [Citation(s) in RCA: 6] [Impact Index Per Article: 6.0] [Received: 02/08/2022] [Accepted: 07/03/2023] [Indexed: 10/18/2023]
Abstract
Structure-based virtual screening (SBVS) via docking has been used to discover active molecules for a range of therapeutic targets. Chemical and protein data sets that contain integrated bioactivity information have increased both in number and in size. Artificial intelligence and, more concretely, its machine-learning (ML) branch, including deep learning, have effectively exploited these data sets to build scoring functions (SFs) for SBVS against targets with an atomic-resolution 3D model (e.g., generated by X-ray crystallography or predicted by AlphaFold2). Often outperforming their generic and non-ML counterparts, target-specific ML-based SFs represent the state of the art for SBVS. Here, we present a comprehensive and user-friendly protocol to build and rigorously evaluate these new SFs for SBVS. This protocol is organized into four sections: (i) using a public benchmark of a given target to evaluate an existing generic SF; (ii) preparing experimental data for a target from public repositories; (iii) partitioning data into a training set and a test set for subsequent target-specific ML modeling; and (iv) generating and evaluating target-specific ML SFs by using the prepared training-test partitions. All necessary code and input/output data related to three example targets (acetylcholinesterase, HMG-CoA reductase, and peroxisome proliferator-activated receptor-α) are available at https://github.com/vktrannguyen/MLSF-protocol, can be run by using a single computer within 1 week and make use of easily accessible software/programs (e.g., Smina, CNN-Score, RF-Score-VS and DeepCoy) and web resources. Our aim is to provide practical guidance on how to augment training data to enhance SBVS performance, how to identify the most suitable supervised learning algorithm for a data set, and how to build an SF with the highest likelihood of discovering target-active molecules within a given compound library.
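Step (iii) of the protocol, partitioning data into a training set and a test set, can be illustrated with a minimal stratified split. This is only a sketch under stated assumptions, not code from the MLSF-protocol repository: the function name `stratified_split`, the 25% test fraction, and the toy labels are all hypothetical.

```python
import random


def stratified_split(labels, test_fraction=0.25, seed=0):
    """Split sample indices into train/test while keeping class ratios similar.

    Hypothetical helper for illustration; labels are 1 (active) or 0 (inactive).
    """
    rng = random.Random(seed)  # fixed seed so the partition is reproducible
    train_idx, test_idx = [], []
    for cls in sorted(set(labels)):
        idx = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(idx)
        # At least one sample per class goes to the test set.
        n_test = max(1, round(len(idx) * test_fraction))
        test_idx.extend(idx[:n_test])
        train_idx.extend(idx[n_test:])
    return sorted(train_idx), sorted(test_idx)


# Toy data set: 8 actives and 32 inactives (made-up numbers).
labels = [1] * 8 + [0] * 32
train_idx, test_idx = stratified_split(labels)
```

Keeping the active-to-inactive ratio similar in both partitions avoids a test set with too few actives to judge screening performance.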
Affiliation(s)
- Muhammad Junaid
- Centre de Recherche en Cancérologie de Marseille, Marseille, France
- Saw Simeon
- Centre de Recherche en Cancérologie de Marseille, Marseille, France
22
Zhao X, Li H, Zhang K, Huang SY. Iterative Knowledge-Based Scoring Function for Protein-Ligand Interactions by Considering Binding Affinity Information. J Phys Chem B 2023; 127:9021-9034. [PMID: 37822259 DOI: 10.1021/acs.jpcb.3c04421] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/13/2023]
Abstract
Scoring functions for protein-ligand interactions play a critical role in structure-based drug design. Owing to the good balance between general applicability and computational efficiency, knowledge-based scoring functions have obtained significant advancements and achieved many successes. Nevertheless, knowledge-based scoring functions face a challenge in utilizing the experimental affinity data and thus may not perform well in binding affinity prediction. Addressing the challenge, we have proposed an improved version of the iterative knowledge-based scoring function ITScore by considering binding affinity information, which is referred to as ITScoreAff, based on a large training set of 6216 protein-ligand complexes with both structures and affinity data. ITScoreAff was extensively evaluated and compared with ITScore, 33 traditional, and 6 machine learning scoring functions in terms of docking power, ranking power, and screening power on the independent CASF-2016 benchmark. It was shown that ITScoreAff obtained an overall better performance than the other 40 scoring functions and gave an average success rate of 85.3% in docking power, a correlation coefficient of 0.723 in scoring power, and an average rank correlation coefficient of 0.668 in ranking power. In addition, ITScoreAff also achieved the overall best screening power when the top 10% of the ranked database were considered. These results demonstrated the robustness of ITScoreAff and its improvement over existing scoring functions.
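The scoring-power and ranking-power figures quoted above are correlation coefficients between predicted scores and measured affinities. A minimal sketch of how such values can be computed, assuming a Pearson coefficient for scoring power and a Spearman rank coefficient for ranking power (the abstract does not name the coefficients); the function names and toy data are hypothetical.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)


def ranks(values):
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = mean_rank
        i = j + 1
    return result


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


# Made-up predicted scores and measured affinities for five complexes.
predicted = [6.1, 7.4, 5.2, 8.0, 6.9]
measured = [5.8, 7.9, 5.0, 7.5, 7.1]
scoring_power = pearson(predicted, measured)
ranking_power = spearman(predicted, measured)
```

In benchmark practice the rank correlation is usually computed per target and then averaged, which is why the abstract reports an average rank correlation coefficient.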
Affiliation(s)
- Xuejun Zhao
- School of Physics, Huazhong University of Science and Technology, Wuhan, Hubei 430074, P. R. China
- Hao Li
- School of Physics, Huazhong University of Science and Technology, Wuhan, Hubei 430074, P. R. China
- Keqiong Zhang
- School of Physics, Huazhong University of Science and Technology, Wuhan, Hubei 430074, P. R. China
- Sheng-You Huang
- School of Physics, Huazhong University of Science and Technology, Wuhan, Hubei 430074, P. R. China
23
Zhang C, Sui Y, Liu S, Yang M. Anti-Viral Activity of Bioactive Molecules of Silymarin against COVID-19 via In Silico Studies. Pharmaceuticals (Basel) 2023; 16:1479. [PMID: 37895950 PMCID: PMC10610370 DOI: 10.3390/ph16101479] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 09/04/2023] [Revised: 10/11/2023] [Accepted: 10/11/2023] [Indexed: 10/29/2023] [Open access]
Abstract
The severe acute respiratory syndrome-coronavirus-2 (SARS-CoV-2) infection drove the global coronavirus disease 2019 (COVID-19) pandemic, causing a huge loss of human life and a negative impact on economic development. It is an urgent necessity to explore potential drugs against viruses, such as SARS-CoV-2. Silymarin, a mixture of herb-derived polyphenolic flavonoids extracted from the milk thistle, possesses potent antioxidative, anti-apoptotic, and anti-inflammatory properties. Accumulating research studies have demonstrated the killing activity of silymarin against viruses, such as dengue virus, chikungunya virus, and hepatitis C virus. However, the anti-COVID-19 mechanisms of silymarin remain unclear. In this study, multiple disciplinary approaches and methodologies were applied to evaluate the potential mechanisms of silymarin as an anti-viral agent against SARS-CoV-2 infection. In silico approaches such as molecular docking, network pharmacology, and bioinformatic methods were incorporated to assess the ligand-protein binding properties and analyze the protein-protein interaction network. The DAVID database was used to analyze gene functions, such as the Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway and Gene Ontology (GO) enrichment. TCMSP and GeneCards were used to identify drug target genes and COVID-19-related genes. Our results revealed that silymarin compounds, such as silybin A/B and silymonin, displayed triplicate functions against SARS-CoV-2 infection, including directly binding with human angiotensin-converting enzyme 2 (ACE2) to inhibit SARS-CoV-2 entry into the host cells, directly binding with viral proteins RdRp and helicase to inhibit viral replication and proliferation, and regulating host immune response to indirectly inhibit viral infection. Specifically, the targets of silymarin molecules in immune regulation were screened out, such as proinflammatory cytokines TNF and IL-6 and cell growth factors VEGFA and EGF. 
In addition, the molecular mechanism of drug-target protein interaction was investigated, including the binding pockets of drug molecules in human ACE2 and viral proteins, the formation of hydrogen bonds, hydrophobic interactions, and other drug-protein ligand interactions. Finally, the drug-likeness results of candidate molecules passed the criteria for drug screening. Overall, this study demonstrates the molecular mechanism of silymarin molecules against SARS-CoV-2 infection.
Affiliation(s)
- Chunye Zhang
- Christopher S. Bond Life Sciences Center, University of Missouri, Columbia, MO 65212, USA
- Yuxiang Sui
- School of Life Science, Shanxi Normal University, Linfen 041004, China
- Shuai Liu
- The First Affiliated Hospital, Zhejiang University, Hangzhou 310006, China
- Ming Yang
- Department of Surgery, University of Missouri, Columbia, MO 65212, USA
- NextGen Precision Health Institute, University of Missouri, Columbia, MO 65212, USA
24
Dragan P, Joshi K, Atzei A, Latek D. Keras/TensorFlow in Drug Design for Immunity Disorders. Int J Mol Sci 2023; 24:15009. [PMID: 37834457 PMCID: PMC10573944 DOI: 10.3390/ijms241915009] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/08/2023] [Revised: 09/21/2023] [Accepted: 09/29/2023] [Indexed: 10/15/2023] [Open access]
Abstract
Homeostasis of the host immune system is regulated by white blood cells with a variety of cell surface receptors for cytokines. Chemotactic cytokines (chemokines) activate their receptors to evoke the chemotaxis of immune cells in homeostatic migrations or inflammatory conditions towards inflamed tissue or pathogens. Dysregulation of the immune system leading to disorders such as allergies, autoimmune diseases, or cancer requires efficient, fast-acting drugs to minimize the long-term effects of chronic inflammation. Here, we performed structure-based virtual screening (SBVS) assisted by the Keras/TensorFlow neural network (NN) to find novel compound scaffolds acting on three chemokine receptors: CCR2, CCR3, and one CXC receptor, CXCR3. Keras/TensorFlow NN was used here not as a typically used binary classifier but as an efficient multi-class classifier that can discard not only inactive compounds but also low- or medium-activity compounds. Several compounds proposed by SBVS and NN were tested in 100 ns all-atom molecular dynamics simulations to confirm their binding affinity. To improve the basic binding affinity of the compounds, new chemical modifications were proposed. The modified compounds were compared with known antagonists of these three chemokine receptors. Known CXCR3 compounds were among the top predicted compounds; thus, the benefits of using Keras/TensorFlow in drug discovery have been shown in addition to structure-based approaches. Furthermore, we showed that Keras/TensorFlow NN can accurately predict the receptor subtype selectivity of compounds, for which SBVS often fails. We cross-tested chemokine receptor datasets retrieved from ChEMBL and curated datasets for cannabinoid receptors. The NN model trained on the cannabinoid receptor datasets retrieved from ChEMBL was the most accurate in the receptor subtype selectivity prediction. 
Among NN models trained on the chemokine receptor datasets, the CXCR3 model showed the highest accuracy in differentiating the receptor subtype for a given compound dataset.
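The multi-class idea described above, discarding low- and medium-activity compounds as well as inactive ones, depends on how measured activities are binned into classes. A minimal sketch, assuming activities expressed on a negative-log scale (pChEMBL-like values) and three purely illustrative cut-offs (5.0, 6.5, 8.0); neither the thresholds nor the function names come from the paper.

```python
# Hypothetical class names, ordered from least to most active.
CLASS_NAMES = ("inactive", "low", "medium", "high")


def activity_class(p_activity, thresholds=(5.0, 6.5, 8.0)):
    """Return the class index: the number of thresholds reached or exceeded."""
    return sum(p_activity >= t for t in thresholds)


def keep_high_activity(compounds):
    """Keep only the names of compounds that fall into the top class."""
    top = len(CLASS_NAMES) - 1
    return [name for name, value in compounds.items()
            if activity_class(value) == top]


# Made-up compounds with made-up activity values.
compounds = {"cpd_a": 4.2, "cpd_b": 5.9, "cpd_c": 7.1, "cpd_d": 8.6}
selected = keep_high_activity(compounds)
```

A binary classifier would separate only `cpd_a` from the rest; with four classes, the medium-activity `cpd_c` can also be set aside.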
Affiliation(s)
- Paulina Dragan
- Faculty of Chemistry, University of Warsaw, Pasteura 1, 02-903 Warsaw, Poland
- Kavita Joshi
- Faculty of Chemistry, University of Warsaw, Pasteura 1, 02-903 Warsaw, Poland
- Alessandro Atzei
- Faculty of Chemistry, University of Warsaw, Pasteura 1, 02-903 Warsaw, Poland
- Department of Life and Environmental Science, Food Toxicology Unit, University of Cagliari, University Campus of Monserrato, SS 554, 09042 Cagliari, Italy
- Dorota Latek
- Faculty of Chemistry, University of Warsaw, Pasteura 1, 02-903 Warsaw, Poland
25
Zhang X, Shen C, Wang T, Deng Y, Kang Y, Li D, Hou T, Pan P. ML-PLIC: a web platform for characterizing protein-ligand interactions and developing machine learning-based scoring functions. Brief Bioinform 2023; 24:bbad295. [PMID: 37738401 DOI: 10.1093/bib/bbad295] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/21/2023] [Revised: 07/17/2023] [Accepted: 07/31/2023] [Indexed: 09/24/2023] [Open access]
Abstract
Cracking the entangling code of protein-ligand interaction (PLI) is of great importance to structure-based drug design and discovery. Different physical and biochemical representations can be used to describe PLI such as energy terms and interaction fingerprints, which can be analyzed by machine learning (ML) algorithms to create ML-based scoring functions (MLSFs). Here, we propose the ML-based PLI capturer (ML-PLIC), a web platform that automatically characterizes PLI and generates MLSFs to identify the potential binders of a specific protein target through virtual screening (VS). ML-PLIC comprises five modules, including Docking for ligand docking, Descriptors for PLI generation, Modeling for MLSF training, Screening for VS and Pipeline for the integration of the aforementioned functions. We validated the MLSFs constructed by ML-PLIC in three benchmark datasets (Directory of Useful Decoys-Enhanced, Active as Decoys and TocoDecoy), demonstrating accuracy outperforming traditional docking tools and competitive performance to the deep learning-based SF, and provided a case study of the Serine/threonine-protein kinase WEE1 in which MLSFs were developed by using the ML-based VS pipeline in ML-PLIC. Underpinning the latest version of ML-PLIC is a powerful platform that incorporates physical and biological knowledge about PLI, leveraging PLI characterization and MLSF generation into the design of structure-based VS pipeline. The ML-PLIC web platform is now freely available at http://cadd.zju.edu.cn/plic/.
Affiliation(s)
- Xujun Zhang
- Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, College of Pharmaceutical Sciences, Zhejiang University, Hangzhou 310058, Zhejiang, China
- Chao Shen
- Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, College of Pharmaceutical Sciences, Zhejiang University, Hangzhou 310058, Zhejiang, China
- Hangzhou Carbonsilicon AI Technology Co., Ltd, Hangzhou 310018, Zhejiang, China
- Tianyue Wang
- Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, College of Pharmaceutical Sciences, Zhejiang University, Hangzhou 310058, Zhejiang, China
- Yafeng Deng
- Hangzhou Carbonsilicon AI Technology Co., Ltd, Hangzhou 310018, Zhejiang, China
- Yu Kang
- Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, College of Pharmaceutical Sciences, Zhejiang University, Hangzhou 310058, Zhejiang, China
- Dan Li
- Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, College of Pharmaceutical Sciences, Zhejiang University, Hangzhou 310058, Zhejiang, China
- Tingjun Hou
- Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, College of Pharmaceutical Sciences, Zhejiang University, Hangzhou 310058, Zhejiang, China
- Peichen Pan
- Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, College of Pharmaceutical Sciences, Zhejiang University, Hangzhou 310058, Zhejiang, China
26
Zhang X, Zhang O, Shen C, Qu W, Chen S, Cao H, Kang Y, Wang Z, Wang E, Zhang J, Deng Y, Liu F, Wang T, Du H, Wang L, Pan P, Chen G, Hsieh CY, Hou T. Efficient and accurate large library ligand docking with KarmaDock. Nat Comput Sci 2023; 3:789-804. [PMID: 38177786 DOI: 10.1038/s43588-023-00511-5] [Citation(s) in RCA: 8] [Impact Index Per Article: 8.0] [Received: 03/20/2023] [Accepted: 08/08/2023] [Indexed: 01/06/2024]
Abstract
Ligand docking is one of the core technologies in structure-based virtual screening for drug discovery. However, conventional docking tools and existing deep learning tools may suffer from limited performance in terms of speed, pose quality and binding affinity accuracy. Here we propose KarmaDock, a deep learning approach for ligand docking that integrates the functions of docking acceleration, binding pose generation and correction, and binding strength estimation. The three-stage model consists of the following components: (1) encoders for the protein and ligand to learn the representations of intramolecular interactions; (2) E(n) equivariant graph neural networks with self-attention to update the ligand pose based on both protein-ligand and intramolecular interactions, followed by post-processing to ensure chemically plausible structures; (3) a mixture density network for scoring the binding strength. KarmaDock was validated on four benchmark datasets and tested in a real-world virtual screening project that successfully identified experiment-validated active inhibitors of leukocyte tyrosine kinase (LTK).
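Component (3) above scores binding strength with a mixture density network. The general idea behind such a score can be sketched as follows: a network would predict, for each protein-ligand atom pair, the parameters of a Gaussian mixture over their distance, and a pose is scored by the log-likelihood of its observed distances. This is a hedged illustration of that general idea, not KarmaDock's implementation; the mixture parameters are fixed by hand here rather than predicted, and all names and numbers are hypothetical.

```python
import math


def mixture_log_density(d, weights, means, sigmas):
    """Log of a 1-D Gaussian mixture density evaluated at distance d."""
    density = 0.0
    for w, m, s in zip(weights, means, sigmas):
        gauss = math.exp(-0.5 * ((d - m) / s) ** 2) / (s * math.sqrt(2.0 * math.pi))
        density += w * gauss
    return math.log(density)


def pose_score(distances, weights, means, sigmas):
    """Sum of log-likelihoods over atom-pair distances; higher is better."""
    return sum(mixture_log_density(d, weights, means, sigmas) for d in distances)


# Hand-picked mixture (in a real model these would be network outputs).
weights, means, sigmas = [0.7, 0.3], [3.5, 5.0], [0.4, 0.8]

# Made-up atom-pair distances (in angstroms) for two candidate poses.
good_pose = [3.4, 3.6, 4.9]
bad_pose = [7.5, 8.0, 9.0]
good_score = pose_score(good_pose, weights, means, sigmas)
bad_score = pose_score(bad_pose, weights, means, sigmas)
```

Poses whose distances fall near the mixture modes receive a higher likelihood, which is what lets a likelihood of this kind act as a ranking score.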
Affiliation(s)
- Xujun Zhang
- Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, College of Pharmaceutical Sciences, Zhejiang University, Hangzhou, Zhejiang, China
- Odin Zhang
- Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, College of Pharmaceutical Sciences, Zhejiang University, Hangzhou, Zhejiang, China
- Chao Shen
- Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, College of Pharmaceutical Sciences, Zhejiang University, Hangzhou, Zhejiang, China
- Wanglin Qu
- Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, College of Pharmaceutical Sciences, Zhejiang University, Hangzhou, Zhejiang, China
- Shicheng Chen
- Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, College of Pharmaceutical Sciences, Zhejiang University, Hangzhou, Zhejiang, China
- Hanqun Cao
- Department of Mathematics, Chinese University of Hong Kong, Hong Kong, China
- Yu Kang
- Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, College of Pharmaceutical Sciences, Zhejiang University, Hangzhou, Zhejiang, China
- Zhe Wang
- Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, College of Pharmaceutical Sciences, Zhejiang University, Hangzhou, Zhejiang, China
- Jintu Zhang
- Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, College of Pharmaceutical Sciences, Zhejiang University, Hangzhou, Zhejiang, China
- Yafeng Deng
- Hangzhou Carbonsilicon AI Technology Co., Ltd, Hangzhou, Zhejiang, China
- Furui Liu
- Zhejiang Lab, Hangzhou, Zhejiang, China
- Tianyue Wang
- Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, College of Pharmaceutical Sciences, Zhejiang University, Hangzhou, Zhejiang, China
- Hongyan Du
- Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, College of Pharmaceutical Sciences, Zhejiang University, Hangzhou, Zhejiang, China
- Langcheng Wang
- Department of Pathology, New York University Medical Center, New York, NY, USA
- Peichen Pan
- Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, College of Pharmaceutical Sciences, Zhejiang University, Hangzhou, Zhejiang, China
- Chang-Yu Hsieh
- Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, College of Pharmaceutical Sciences, Zhejiang University, Hangzhou, Zhejiang, China
- Tingjun Hou
- Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, College of Pharmaceutical Sciences, Zhejiang University, Hangzhou, Zhejiang, China
27
Hagg A, Kirschner KN. Open-Source Machine Learning in Computational Chemistry. J Chem Inf Model 2023; 63:4505-4532. [PMID: 37466636 PMCID: PMC10430767 DOI: 10.1021/acs.jcim.3c00643] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 05/04/2023] [Indexed: 07/20/2023]
Abstract
The field of computational chemistry has seen a significant increase in the integration of machine learning concepts and algorithms. In this Perspective, we surveyed 179 open-source software projects, with corresponding peer-reviewed papers published within the last 5 years, to better understand the topics within the field being investigated by machine learning approaches. For each project, we provide a short description, the link to the code, the accompanying license type, and whether the training data and resulting models are made publicly available. Based on those deposited in GitHub repositories, the most popular employed Python libraries are identified. We hope that this survey will serve as a resource to learn about machine learning or specific architectures thereof by identifying accessible codes with accompanying papers on a topic basis. To this end, we also include computational chemistry open-source software for generating training data and fundamental Python libraries for machine learning. Based on our observations and considering the three pillars of collaborative machine learning work, open data, open source (code), and open models, we provide some suggestions to the community.
Affiliation(s)
- Alexander Hagg
- Institute of Technology, Resource and Energy-Efficient Engineering (TREE), University of Applied Sciences Bonn-Rhein-Sieg, 53757 Sankt Augustin, Germany
- Department of Electrical Engineering, Mechanical Engineering and Technical Journalism, University of Applied Sciences Bonn-Rhein-Sieg, 53757 Sankt Augustin, Germany
- Karl N. Kirschner
- Institute of Technology, Resource and Energy-Efficient Engineering (TREE), University of Applied Sciences Bonn-Rhein-Sieg, 53757 Sankt Augustin, Germany
- Department of Computer Science, University of Applied Sciences Bonn-Rhein-Sieg, 53757 Sankt Augustin, Germany
28
Zhang X, Shen C, Jiang D, Zhang J, Ye Q, Xu L, Hou T, Pan P, Kang Y. TB-IECS: an accurate machine learning-based scoring function for virtual screening. J Cheminform 2023; 15:63. [PMID: 37403155 DOI: 10.1186/s13321-023-00731-x] [Citation(s) in RCA: 3] [Impact Index Per Article: 3.0] [Received: 02/07/2023] [Accepted: 06/18/2023] [Indexed: 07/06/2023] [Open access]
Abstract
Machine learning-based scoring functions (MLSFs) have shown potential for improving virtual screening capabilities over classical scoring functions (SFs). Due to the high computational cost in the process of feature generation, the numbers of descriptors used in MLSFs and the characterization of protein-ligand interactions are always limited, which may affect the overall accuracy and efficiency. Here, we propose a new SF called TB-IECS (theory-based interaction energy component score), which combines energy terms from Smina and NNScore version 2, and utilizes the eXtreme Gradient Boosting (XGBoost) algorithm for model training. In this study, the energy terms decomposed from 15 traditional SFs were firstly categorized based on their formulas and physicochemical principles, and 324 feature combinations were generated accordingly. Five best feature combinations were selected for further evaluation of the model performance in regard to the selection of feature vectors with various length, interaction types and ML algorithms. The virtual screening power of TB-IECS was assessed on the datasets of DUD-E and LIT-PCBA, as well as seven target-specific datasets from the ChemDiv database. The results showed that TB-IECS outperformed classical SFs including Glide SP and Dock, and effectively balanced the efficiency and accuracy for practical virtual screening.
Affiliation(s)
- Xujun Zhang
- Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, College of Pharmaceutical Sciences, Zhejiang University, Hangzhou, 310058, Zhejiang, China
- Chao Shen
- Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, College of Pharmaceutical Sciences, Zhejiang University, Hangzhou, 310058, Zhejiang, China
- Dejun Jiang
- Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, College of Pharmaceutical Sciences, Zhejiang University, Hangzhou, 310058, Zhejiang, China
- Jintu Zhang
- Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, College of Pharmaceutical Sciences, Zhejiang University, Hangzhou, 310058, Zhejiang, China
- Qing Ye
- Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, College of Pharmaceutical Sciences, Zhejiang University, Hangzhou, 310058, Zhejiang, China
- Lei Xu
- Institute of Bioinformatics and Medical Engineering, School of Electrical and Information Engineering, Jiangsu University of Technology, Changzhou, 213001, China
- Tingjun Hou
- Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, College of Pharmaceutical Sciences, Zhejiang University, Hangzhou, 310058, Zhejiang, China
- Peichen Pan
- Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, College of Pharmaceutical Sciences, Zhejiang University, Hangzhou, 310058, Zhejiang, China
- Yu Kang
- Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, College of Pharmaceutical Sciences, Zhejiang University, Hangzhou, 310058, Zhejiang, China
29
Qureshi R, Irfan M, Gondal TM, Khan S, Wu J, Hadi MU, Heymach J, Le X, Yan H, Alam T. AI in drug discovery and its clinical relevance. Heliyon 2023; 9:e17575. [PMID: 37396052 PMCID: PMC10302550 DOI: 10.1016/j.heliyon.2023.e17575] [Citation(s) in RCA: 28] [Impact Index Per Article: 28.0] [Received: 01/10/2023] [Revised: 06/17/2023] [Accepted: 06/21/2023] [Indexed: 07/04/2023] [Open access]
Abstract
The COVID-19 pandemic has emphasized the need for novel drug discovery processes. However, the journey from conceptualizing a drug to its eventual implementation in clinical settings is a long, complex, and expensive process, with many potential points of failure. Over the past decade, a vast growth in medical information has coincided with advances in computational hardware (cloud computing, GPUs, and TPUs) and the rise of deep learning. Medical data generated from large molecular screening profiles, personal health or pathology records, and public health organizations could benefit from analysis by Artificial Intelligence (AI) approaches to speed up and prevent failures in the drug discovery pipeline. We present applications of AI at various stages of drug discovery pipelines, including the inherently computational approaches of de novo design and prediction of a drug's likely properties. Open-source databases and AI-based software tools that facilitate drug design are discussed along with their associated problems of molecule representation, data collection, complexity, labeling, and disparities among labels. How contemporary AI methods, such as graph neural networks, reinforcement learning, and generative models, along with structure-based methods (i.e., molecular dynamics simulations and molecular docking), can contribute to drug discovery applications and analysis of drug responses is also explored. Finally, recent developments and investments in AI-based start-up companies for biotechnology, drug design and their current progress, hopes and promotions are discussed in this article.
Affiliation(s)
- Rizwan Qureshi
- College of Science and Engineering, Hamad Bin Khalifa University, Doha, Qatar
- Department of Imaging Physics, MD Anderson Cancer Center, The University of Texas, Houston, USA
- Muhammad Irfan
- Faculty of Electrical Engineering, Ghulam Ishaq Khan Institute of Engineering Sciences and Technology, Swabi, Pakistan
- Sheheryar Khan
- School of Professional Education & Executive Development, The Hong Kong Polytechnic University, Hong Kong
- Jia Wu
- Department of Imaging Physics, MD Anderson Cancer Center, The University of Texas, Houston, USA
- John Heymach
- Department of Thoracic Head and Neck Medical Oncology, Division of Cancer Medicine, The University of Texas, MD Anderson Cancer Center, Houston, USA
- Xiuning Le
- Department of Thoracic Head and Neck Medical Oncology, Division of Cancer Medicine, The University of Texas, MD Anderson Cancer Center, Houston, USA
- Hong Yan
- Department of Electrical Engineering, City University of Hong Kong, Kowloon, Hong Kong
- Tanvir Alam
- College of Science and Engineering, Hamad Bin Khalifa University, Doha, Qatar
30
Cui Z, Zhang N, Zhou T, Zhou X, Meng H, Yu Y, Zhang Z, Zhang Y, Wang W, Liu Y. Conserved Sites and Recognition Mechanisms of T1R1 and T2R14 Receptors Revealed by Ensemble Docking and Molecular Descriptors and Fingerprints Combined with Machine Learning. J Agric Food Chem 2023; 71:5630-5645. [PMID: 37005743 DOI: 10.1021/acs.jafc.3c00591] [Citation(s) in RCA: 9] [Impact Index Per Article: 9.0] [Indexed: 06/19/2023]
Abstract
Taste peptides, an important component of protein-rich foodstuffs, enhance the nutrition and taste of food. Among them, umami- and bitter-taste peptides have been extensively reported, while their taste mechanisms remain unclear. Meanwhile, the identification of taste peptides is still a time-consuming and costly task. In this study, 489 peptides with umami/bitter taste from TPDB (http://tastepeptides-meta.com/) were collected and used to train the classification models based on docking analysis, molecular descriptors (MDs), and molecular fingerprints (FPs). A consensus model, taste peptide docking machine (TPDM), was generated based on five learning algorithms (linear regression, random forest, Gaussian naive Bayes, gradient boosting tree, and stochastic gradient descent) and four molecular representation schemes. Model interpretive analysis showed that MDs (VSA_EState, MinEstateIndex, MolLogP) and FPs (598, 322, 952) had the greatest impact on the umami/bitter prediction of peptides. Based on the consensus docking results, we obtained the key recognition modes of umami/bitter receptors (T1Rs/T2Rs): (1) residues 107S-109S, 148S-154T, 247F-249A mainly form hydrogen bonding contacts and (2) residues 153A-158L, 163L, 181Q, 218D, 247F-249A in T1R1 and 56D, 106P, 107V, 152V-156F, 173K-180F in T2R14 constituted their hydrogen bond pockets. The model is available at http://www.tastepeptides-meta.com/yyds.
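The abstract describes a consensus model built from five learning algorithms but does not say how their outputs are combined. One common way to form such a consensus is majority voting, sketched below as an assumption rather than as the TPDM method; the function name and the toy predictions are hypothetical.

```python
from collections import Counter


def majority_vote(per_model_predictions):
    """Combine label predictions from several models, one vote per model.

    per_model_predictions: list of lists, one inner list per model,
    each holding that model's label for every sample in the same order.
    """
    consensus = []
    for votes in zip(*per_model_predictions):  # votes for one sample
        label, _count = Counter(votes).most_common(1)[0]
        consensus.append(label)
    return consensus


# Made-up predictions from five models for three peptides.
predictions = [
    ["umami", "bitter", "bitter"],
    ["umami", "bitter", "umami"],
    ["bitter", "bitter", "umami"],
    ["umami", "umami", "umami"],
    ["umami", "bitter", "bitter"],
]
consensus = majority_vote(predictions)
```

With an odd number of models and two classes, a vote can never tie, which makes this scheme a convenient default for a five-model ensemble.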
Affiliation(s)
- Zhiyong Cui
- Department of Food Science & Technology, School of Agriculture & Biology, Shanghai Jiao Tong University, Shanghai 200240, China
- Ninglong Zhang
- Department of Food Science & Technology, School of Agriculture & Biology, Shanghai Jiao Tong University, Shanghai 200240, China
- Tianxing Zhou
- Department of Bioinformatics, Faculty of Science, The University of Melbourne, Parkville 3010, Victoria, Australia
- Xueke Zhou
- Department of Food Science & Technology, School of Agriculture & Biology, Shanghai Jiao Tong University, Shanghai 200240, China
- Hengli Meng
- Department of Food Science & Technology, School of Agriculture & Biology, Shanghai Jiao Tong University, Shanghai 200240, China
- Yanyang Yu
- Department of Food Science & Technology, School of Agriculture & Biology, Shanghai Jiao Tong University, Shanghai 200240, China
- Zhiwei Zhang
- Department of Food Science & Technology, School of Agriculture & Biology, Shanghai Jiao Tong University, Shanghai 200240, China
- Yin Zhang
- Key Laboratory of Meat Processing of Sichuan, Chengdu University, Chengdu 610106, China
- Wenli Wang
- Department of Food Science & Technology, School of Agriculture & Biology, Shanghai Jiao Tong University, Shanghai 200240, China
- Yuan Liu
- Department of Food Science & Technology, School of Agriculture & Biology, Shanghai Jiao Tong University, Shanghai 200240, China
31
Cavasotto CN, Di Filippo JI. The Impact of Supervised Learning Methods in Ultralarge High-Throughput Docking. J Chem Inf Model 2023; 63:2267-2280. [PMID: 37036491 DOI: 10.1021/acs.jcim.2c01471] [Citation(s) in RCA: 6] [Impact Index Per Article: 6.0] [Indexed: 04/11/2023]
Abstract
Structure-based virtual screening methods are, nowadays, one of the key pillars of computational drug discovery. In recent years, a series of studies have reported docking-based virtual screening campaigns of large databases ranging from hundreds to thousands of millions of compounds, further identifying novel hits after experimental validation. As these large-scale efforts are not generally accessible, machine learning-based protocols have emerged to accelerate the identification of virtual hits within an ultralarge chemical space, reaching impressive reductions in computational time. Herein, we illustrate the motivation and the problem behind the screening of large databases, providing an overview of key concepts and essential applications of machine learning-accelerated protocols, specifically concerning supervised learning methods. We also discuss where the field stands with these novel developments, highlighting possible insights for future studies.
Affiliation(s)
- Claudio N Cavasotto
- Computational Drug Design and Biomedical Informatics Laboratory, Instituto de Investigaciones en Medicina Traslacional (IIMT), CONICET-Universidad Austral, Av. Juan Domingo Perón 1500, B1629AHJ Pilar, Argentina
- Facultad de Ciencias Biomédicas, and Facultad de Ingeniería, Universidad Austral, Av. Juan Domingo Perón 1500, B1629AHJ Pilar, Argentina
- Austral Institute for Applied Artificial Intelligence, Universidad Austral, Av. Juan Domingo Perón 1500, B1629AHJ Pilar, Argentina
- Juan I Di Filippo
- Computational Drug Design and Biomedical Informatics Laboratory, Instituto de Investigaciones en Medicina Traslacional (IIMT), CONICET-Universidad Austral, Av. Juan Domingo Perón 1500, B1629AHJ Pilar, Argentina
- Facultad de Ciencias Biomédicas, and Facultad de Ingeniería, Universidad Austral, Av. Juan Domingo Perón 1500, B1629AHJ Pilar, Argentina
- Austral Institute for Applied Artificial Intelligence, Universidad Austral, Av. Juan Domingo Perón 1500, B1629AHJ Pilar, Argentina
|
32
|
Gu S, Shen C, Yu J, Zhao H, Liu H, Liu L, Sheng R, Xu L, Wang Z, Hou T, Kang Y. Can molecular dynamics simulations improve predictions of protein-ligand binding affinity with machine learning? Brief Bioinform 2023; 24:6995375. [PMID: 36681903 DOI: 10.1093/bib/bbad008] [Citation(s) in RCA: 7] [Impact Index Per Article: 7.0] [Received: 10/08/2022] [Revised: 12/04/2022] [Accepted: 12/30/2023] [Indexed: 01/23/2023] Open
Abstract
Binding affinity prediction largely determines the discovery efficiency of lead compounds in drug discovery. Recently, machine learning (ML)-based approaches have attracted much attention in hopes of enhancing the predictive performance of traditional physics-based approaches. In this study, we evaluated the impact of structural dynamic information on the binding affinity prediction by comparing the models trained on different dimensional descriptors, using three targets (i.e. JAK1, TAF1-BD2 and DDR1) and their corresponding ligands as the examples. Here, 2D descriptors are traditional ECFP4 fingerprints, 3D descriptors are the energy terms of the Smina and NNscore scoring functions and 4D descriptors contain the structural dynamic information derived from the trajectories based on molecular dynamics (MD) simulations. We systematically investigate the MD-refined binding affinity prediction performance of three classical ML algorithms (i.e. RF, SVR and XGB) as well as two common virtual screening methods, namely Glide docking and MM/PBSA. The outcomes of the ML models built using various dimensional descriptors and their combinations reveal that the MD refinement with the optimized protocol can improve the predictive performance on the TAF1-BD2 target with considerable structural flexibility, but not for the less flexible JAK1 and DDR1 targets, when taking docking poses as the initial structure instead of the crystal structures. The results highlight the importance of the initial structures to the final performance of the model through conformational analysis on the three targets with different flexibility.
Affiliation(s)
- Shukai Gu
- Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, College of Pharmaceutical Sciences, Zhejiang University, Hangzhou 310058, Zhejiang, China
- Chao Shen
- Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, College of Pharmaceutical Sciences, Zhejiang University, Hangzhou 310058, Zhejiang, China
- Jiahui Yu
- Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, College of Pharmaceutical Sciences, Zhejiang University, Hangzhou 310058, Zhejiang, China
- Hong Zhao
- Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, College of Pharmaceutical Sciences, Zhejiang University, Hangzhou 310058, Zhejiang, China
- Huanxiang Liu
- Faculty of Applied Science, Macao Polytechnic University, Macao, SAR, China
- Liwei Liu
- Advanced Computing and Storage Laboratory, Central Research Institute, 2012 Laboratories, Huawei Technologies Co., Ltd., Shenzhen 518129, Guangdong, China
- Rong Sheng
- Health Technology Development Dept, Huawei Device Co., Ltd., Dongguan 523808, Guangdong, China
- Lei Xu
- Institute of Bioinformatics and Medical Engineering, School of Electrical and Information Engineering, Jiangsu University of Technology, Changzhou 213001, China
- Zhe Wang
- Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, College of Pharmaceutical Sciences, Zhejiang University, Hangzhou 310058, Zhejiang, China
- Tingjun Hou
- Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, College of Pharmaceutical Sciences, Zhejiang University, Hangzhou 310058, Zhejiang, China
- Yu Kang
- Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, College of Pharmaceutical Sciences, Zhejiang University, Hangzhou 310058, Zhejiang, China
|
33
|
Tran-Nguyen VK, Ballester PJ. Beware of Simple Methods for Structure-Based Virtual Screening: The Critical Importance of Broader Comparisons. J Chem Inf Model 2023; 63:1401-1405. [PMID: 36848585 PMCID: PMC10015451 DOI: 10.1021/acs.jcim.3c00218] [Citation(s) in RCA: 6] [Impact Index Per Article: 6.0] [Indexed: 03/01/2023]
Abstract
We discuss how data unbiasing and simple methods such as protein-ligand Interaction FingerPrint (IFP) can overestimate virtual screening performance. We also show that IFP is strongly outperformed by target-specific machine-learning scoring functions, which were not considered in a recent report concluding that simple methods were better than machine-learning scoring functions at virtual screening.
Affiliation(s)
- Pedro J Ballester
- Department of Bioengineering, Imperial College London, London SW7 2AZ, U.K.
|
34
|
Kwon Y, Park S, Lee J, Kang J, Lee HJ, Kim W. BEAR: A Novel Virtual Screening Method Based on Large-Scale Bioactivity Data. J Chem Inf Model 2023; 63:1429-1437. [PMID: 36821004 DOI: 10.1021/acs.jcim.2c01300] [Citation(s) in RCA: 3] [Impact Index Per Article: 3.0] [Indexed: 02/24/2023]
Abstract
Data-driven drug discovery exploits a comprehensive set of big data to provide an efficient path for the development of new drugs. Currently, publicly available bioassay data sets provide extensive information regarding the bioactivity profiles of millions of compounds. Using these large-scale drug screening data sets, we developed a novel in silico method to virtually screen hit compounds against protein targets, named BEAR (Bioactive compound Enrichment by Assay Repositioning). The underlying idea of BEAR is to reuse bioassay data for predicting hit compounds for targets other than their originally intended purposes, i.e., "assay repositioning". The BEAR approach differs from conventional virtual screening methods in that (1) it relies solely on bioactivity data and requires no physicochemical features of either the target or ligand; (2) accordingly, structurally diverse candidates are predicted, allowing for scaffold hopping; and (3) it shows stable performance across diverse target classes, suggesting its general applicability. Large-scale cross-validation of more than a thousand targets showed that BEAR accurately predicted known ligands (median area under the curve = 0.87), showing that BEAR maintained robust performance even in the validation set with additional constraints. In addition, a comparative analysis demonstrated that BEAR outperformed other machine learning models, including a recent deep learning model for ABC transporter family targets. We predicted P-gp and BCRP dual inhibitors using the BEAR approach and validated the predicted candidates using in vitro assays. The intracellular accumulation effects of mitoxantrone, a well-known P-gp/BCRP dual substrate for cancer treatment, confirmed nine out of 72 dual inhibitor candidates preselected by primary cytotoxicity screening. Consequently, these nine hits are novel and potent dual inhibitors for both P-gp and BCRP, solely predicted by bioactivity profiles without relying on any structural information of targets or ligands.
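The reported performance figure is a median area under the curve across targets. As an illustrative sketch only, assuming this refers to the ROC AUC, the Python below computes it per target with the pairwise (Mann-Whitney) formulation and then takes the median. The scores, labels, and target names are made up for illustration and are not data from the study.

```python
from statistics import median
from typing import Dict, List, Sequence, Tuple


def roc_auc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """ROC AUC as the fraction of (active, inactive) pairs ranked correctly.

    labels use 1 for known ligands and 0 for non-ligands; ties count as 0.5.
    """
    actives = [s for s, y in zip(scores, labels) if y == 1]
    inactives = [s for s, y in zip(scores, labels) if y == 0]
    if not actives or not inactives:
        raise ValueError("need at least one active and one inactive compound")
    wins = sum(
        1.0 if a > i else 0.5 if a == i else 0.0
        for a in actives
        for i in inactives
    )
    return wins / (len(actives) * len(inactives))


# Hypothetical per-target predictions: (scores, labels).
per_target: Dict[str, Tuple[List[float], List[int]]] = {
    "target_1": ([0.9, 0.8, 0.3, 0.1], [1, 0, 1, 0]),
    "target_2": ([0.9, 0.7, 0.2, 0.1], [1, 1, 0, 0]),
    "target_3": ([0.6, 0.5, 0.4, 0.3], [0, 1, 1, 0]),
}

aucs = [roc_auc(scores, labels) for scores, labels in per_target.values()]
print(median(aucs))
```

The pairwise form is quadratic in the number of compounds, so it suits small illustrations; rank-based implementations are the usual choice for large screens.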
Affiliation(s)
- Sera Park
- KaiPharm, Seoul 03760, Republic of Korea
- Jaeok Lee
- College of Pharmacy, Research Institute of Pharmaceutical Science, Ewha Womans University, Seoul 03760, Republic of Korea
- Jiyeon Kang
- College of Pharmacy and Graduate School of Pharmaceutical Sciences, Ewha Womans University, Seoul 03760, Republic of Korea
- Hwa Jeong Lee
- College of Pharmacy and Graduate School of Pharmaceutical Sciences, Ewha Womans University, Seoul 03760, Republic of Korea
- Wankyu Kim
- KaiPharm, Seoul 03760, Republic of Korea
- Department of Life Sciences, College of Natural Science, Ewha Womans University, Seoul 03760, Republic of Korea
|
35
|
Rayka M, Firouzi R. GB-score: Minimally designed machine learning scoring function based on distance-weighted interatomic contact features. Mol Inform 2023; 42:e2200135. [PMID: 36722733 DOI: 10.1002/minf.202200135] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/09/2022] [Revised: 11/24/2022] [Accepted: 11/28/2022] [Indexed: 02/02/2023]
Abstract
In recent years, thanks to advances in computer hardware and dataset availability, data-driven approaches (like machine learning) have become an essential part of the drug design framework for accelerating drug discovery. Constructing new scoring functions based on machine and deep learning, that is, functions that predict the binding score for a protein-ligand pose generated during docking or for a crystal complex, has become an active research area in computer-aided drug design. GB-Score is a state-of-the-art machine learning-based scoring function that uses distance-weighted interatomic contact features, the PDBbind-v2019 general set, and the Gradient Boosting Trees algorithm to predict binding affinity. The distance-weighted interatomic contact featurization method uses the distances between different ligand and protein atom types for numerical representation of the protein-ligand complex. GB-Score attains a Pearson's correlation of 0.862 and an RMSE of 1.190 on the CASF-2016 benchmark test in the scoring power metric. The GB-Score code is freely available at https://github.com/miladrayka/GB_Score.
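To make the featurization idea concrete, here is a minimal Python sketch of distance-weighted contact features between ligand and protein atom types. The inverse-power weighting `1 / d**power`, the 12 Å cutoff, and the use of bare element symbols as atom types are assumptions for illustration; the abstract does not specify them, and this is not the GB-Score implementation.

```python
import math
from collections import defaultdict
from typing import Dict, List, Tuple

Atom = Tuple[str, Tuple[float, float, float]]  # (atom type, xyz coordinates in Å)


def contact_features(
    ligand_atoms: List[Atom],
    protein_atoms: List[Atom],
    cutoff: float = 12.0,  # assumed distance cutoff in Å
    power: int = 2,        # assumed exponent of the inverse-distance weight
) -> Dict[Tuple[str, str], float]:
    """Sum inverse-distance weights per (ligand type, protein type) pair."""
    features: Dict[Tuple[str, str], float] = defaultdict(float)
    for lig_type, lig_xyz in ligand_atoms:
        for prot_type, prot_xyz in protein_atoms:
            distance = math.dist(lig_xyz, prot_xyz)
            # Skip overlapping atoms and pairs beyond the cutoff.
            if 0.0 < distance <= cutoff:
                features[(lig_type, prot_type)] += 1.0 / distance**power
    return dict(features)


# Toy complex: one ligand carbon and three protein atoms.
ligand = [("C", (0.0, 0.0, 0.0))]
protein = [("N", (2.0, 0.0, 0.0)), ("O", (0.0, 4.0, 0.0)), ("N", (0.0, 0.0, 20.0))]

features = contact_features(ligand, protein)
print(features)  # the distant N at 20 Å contributes nothing
```

The resulting dictionary can be flattened into a fixed-length vector over all atom-type pairs and passed to any regressor, such as gradient boosted trees.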
Affiliation(s)
- Milad Rayka
- Department of Physical Chemistry, Chemistry and Chemical Engineering Research Center of Iran, Tehran, Iran
- Rohoullah Firouzi
- Department of Physical Chemistry, Chemistry and Chemical Engineering Research Center of Iran, Tehran, Iran
|
36
|
Yu Y, Xu S, He R, Liang G. Application of Molecular Simulation Methods in Food Science: Status and Prospects. J Agric Food Chem 2023; 71:2684-2703. [PMID: 36719790 DOI: 10.1021/acs.jafc.2c06789] [Citation(s) in RCA: 10] [Impact Index Per Article: 10.0] [Indexed: 06/18/2023]
Abstract
Molecular simulation methods, such as molecular docking, molecular dynamics (MD) simulation, and quantum chemical (QC) calculation, have become popular as characterization and/or virtual screening tools because they can visually display interaction details that in vitro experiments cannot capture and quickly screen bioactive compounds from large databases with millions of molecules. Currently, interdisciplinary research has expanded molecular simulation technology from computer-aided drug design (CADD) to food science. More food scientists are supporting their hypotheses/results with this technology. To better understand the use of molecular simulation methods, it is necessary to systematically summarize their latest applications and usage trends in food science research. However, this type of review article is rare. To bridge this gap, we have comprehensively summarized the principle, combination usage, and application of molecular simulation methods in food science. We also analyzed the limitations and future trends and offered valuable strategies with the latest technologies to help food scientists use molecular simulation methods.
Affiliation(s)
- Yuandong Yu
- Key Laboratory of Biorheological Science and Technology, Ministry of Education, Bioengineering College, Chongqing University, Chongqing 400030, China
- Shiqi Xu
- Key Laboratory of Biorheological Science and Technology, Ministry of Education, Bioengineering College, Chongqing University, Chongqing 400030, China
- Ran He
- Key Laboratory of Biorheological Science and Technology, Ministry of Education, Bioengineering College, Chongqing University, Chongqing 400030, China
- Guizhao Liang
- Key Laboratory of Biorheological Science and Technology, Ministry of Education, Bioengineering College, Chongqing University, Chongqing 400030, China
|
37
|
Recent advances in predicting lncRNA-disease associations based on computational methods. Drug Discov Today 2023; 28:103432. [PMID: 36370992 DOI: 10.1016/j.drudis.2022.103432] [Citation(s) in RCA: 4] [Impact Index Per Article: 4.0] [Received: 08/30/2022] [Revised: 10/19/2022] [Accepted: 11/03/2022] [Indexed: 11/11/2022]
Abstract
Mutations in and dysregulation of long non-coding RNAs (lncRNAs) are closely associated with the development of various human complex diseases, but only a few lncRNAs have been experimentally confirmed to be associated with human diseases. Predicting new potential lncRNA-disease associations (LDAs) will help us to understand the pathogenesis of human diseases and to detect disease markers, as well as in disease diagnosis, prevention and treatment. Computational methods can effectively narrow down the screening scope of biological experiments, thereby reducing the duration and cost of such experiments. In this review, we outline recent advances in computational methods for predicting LDAs, focusing on LDA databases, lncRNA/disease similarity calculations, and advanced computational models. In addition, we analyze the limitations of various computational models and discuss future challenges and directions for development.
|
38
|
Wang L, Shi SH, Li H, Zeng XX, Liu SY, Liu ZQ, Deng YF, Lu AP, Hou TJ, Cao DS. Reducing false positive rate of docking-based virtual screening by active learning. Brief Bioinform 2023; 24:6987822. [PMID: 36642412 DOI: 10.1093/bib/bbac626] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/25/2022] [Revised: 12/10/2022] [Accepted: 12/20/2022] [Indexed: 01/17/2023] Open
Abstract
Machine learning-based scoring functions (MLSFs) have become a favorable alternative to classical scoring functions because of their potentially superior screening performance. However, information on the negative data used to construct MLSFs is rarely reported in the literature, and the putative inactive molecules recorded in existing databases are usually clearly biased relative to active molecules. Here we propose an easy-to-use method named AMLSF that combines active learning using negative molecular selection strategies with MLSF, which can iteratively improve the quality of inactive sets and thus reduce the false positive rate of virtual screening. We chose energy auxiliary terms learning as the MLSF and validated our method on eight targets in the diverse subset of DUD-E. For each target, we screened the IterBioScreen database by AMLSF and compared the screening results with those of the four control models. The results illustrate that the number of active molecules in the top 1000 molecules identified by AMLSF was significantly higher than the number identified by the control models. In addition, the free energy calculation results for the top 10 molecules screened out by AMLSF, the null model and the control models based on DUD-E also showed that AMLSF can identify more active molecules and reduce the false positive rate.
Affiliation(s)
- Lei Wang
- Xiangya School of Pharmaceutical Sciences, Central South University, Changsha 410013, Hunan, China
- Shao-Hua Shi
- Institute for Advancing Translational Medicine in Bone and Joint Diseases, School of Chinese Medicine, Hong Kong Baptist University, Hong Kong SAR, China
- Hui Li
- Xiangya School of Pharmaceutical Sciences, Central South University, Changsha 410013, Hunan, China
- Xiang-Xiang Zeng
- Department of Computer Science, Hunan University, Changsha 410082, Hunan, China
- Su-You Liu
- Xiangya School of Pharmaceutical Sciences, Central South University, Changsha 410013, Hunan, China
- Zhao-Qian Liu
- Xiangya School of Pharmaceutical Sciences, Central South University, Changsha 410013, Hunan, China
- Ya-Feng Deng
- CarbonSilicon AI Technology Co., Ltd, Hangzhou, Zhejiang 310018, China
- Ai-Ping Lu
- Institute for Advancing Translational Medicine in Bone and Joint Diseases, School of Chinese Medicine, Hong Kong Baptist University, Hong Kong SAR, China
- Ting-Jun Hou
- Hangzhou Institute of Innovative Medicine, College of Pharmaceutical Sciences, Zhejiang University, Hangzhou 310058, Zhejiang, China
- Dong-Sheng Cao
- Xiangya School of Pharmaceutical Sciences, Central South University, Changsha 410013, Hunan, China
- Institute for Advancing Translational Medicine in Bone and Joint Diseases, School of Chinese Medicine, Hong Kong Baptist University, Hong Kong SAR, China
|
39
|
Durairaj J, de Ridder D, van Dijk AD. Beyond sequence: Structure-based machine learning. Comput Struct Biotechnol J 2022; 21:630-643. [PMID: 36659927 PMCID: PMC9826903 DOI: 10.1016/j.csbj.2022.12.039] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/26/2022] [Revised: 12/21/2022] [Accepted: 12/21/2022] [Indexed: 12/31/2022] Open
Abstract
Recent breakthroughs in protein structure prediction demarcate the start of a new era in structural bioinformatics. Combined with various advances in experimental structure determination and the uninterrupted pace at which new structures are published, this promises an age in which protein structure information is as prevalent and ubiquitous as sequence. Machine learning in protein bioinformatics has been dominated by sequence-based methods, but this is now changing to make use of the deluge of rich structural information as input. Machine learning methods making use of structures are scattered across literature and cover a number of different applications and scopes; while some try to address questions and tasks within a single protein family, others aim to capture characteristics across all available proteins. In this review, we look at the variety of structure-based machine learning approaches, how structures can be used as input, and typical applications of these approaches in protein biology. We also discuss current challenges and opportunities in this all-important and increasingly popular field.
Affiliation(s)
- Janani Durairaj
- Biozentrum, University of Basel, Basel, Switzerland
- Bioinformatics Group, Department of Plant Sciences, Wageningen University and Research, Wageningen, the Netherlands
- Dick de Ridder
- Bioinformatics Group, Department of Plant Sciences, Wageningen University and Research, Wageningen, the Netherlands
- Aalt D.J. van Dijk
- Bioinformatics Group, Department of Plant Sciences, Wageningen University and Research, Wageningen, the Netherlands
|
40
|
Zhu H, Zhang Y, Li W, Huang N. A Comprehensive Survey of Prospective Structure-Based Virtual Screening for Early Drug Discovery in the Past Fifteen Years. Int J Mol Sci 2022; 23:ijms232415961. [PMID: 36555602 PMCID: PMC9781938 DOI: 10.3390/ijms232415961] [Citation(s) in RCA: 10] [Impact Index Per Article: 5.0] [Received: 11/11/2022] [Revised: 12/12/2022] [Accepted: 12/13/2022] [Indexed: 12/23/2022] Open
Abstract
Structure-based virtual screening (SBVS), also known as molecular docking, has been increasingly applied to discover small-molecule ligands based on the protein structures in the early stage of drug discovery. In this review, we comprehensively surveyed the prospective applications of molecular docking judged by solid experimental validations in the literature over the past fifteen years. We systematically analyzed the novelty of the targets and the docking hits, the practical protocols of docking screening, and the subsequent experimental validations. Among the 419 case studies we reviewed, most virtual screenings were carried out on widely studied targets, and only 22% were on less-explored new targets. Regarding docking software, GLIDE is the most popular program used in molecular docking, while the DOCK 3 series showed a strong capacity for large-scale virtual screening. In addition, the majority of identified hits are promising in structural novelty and one-quarter of the hits showed better potency than 1 μM, indicating that the primary advantage of SBVS is to discover new chemotypes rather than highly potent compounds. Furthermore, in most studies, only in vitro bioassays were carried out to validate the docking hits, which might limit the further characterization and development of the identified active compounds. Finally, several success stories of SBVS with extensive experimental validations have been highlighted, which provide unique insights into future SBVS drug discovery campaigns.
Affiliation(s)
- Hui Zhu
- Tsinghua Institute of Multidisciplinary Biomedical Research, Tsinghua University, Beijing 102206, China
- National Institute of Biological Sciences, 7 Science Park Road, Zhongguancun Life Science Park, Beijing 102206, China
- Yulin Zhang
- Tsinghua Institute of Multidisciplinary Biomedical Research, Tsinghua University, Beijing 102206, China
- National Institute of Biological Sciences, 7 Science Park Road, Zhongguancun Life Science Park, Beijing 102206, China
- Wei Li
- RPXDs (Suzhou) Co., Ltd., Suzhou 215028, China
- Niu Huang
- Tsinghua Institute of Multidisciplinary Biomedical Research, Tsinghua University, Beijing 102206, China
- National Institute of Biological Sciences, 7 Science Park Road, Zhongguancun Life Science Park, Beijing 102206, China
- Correspondence:
|
41
|
Developing a Naïve Bayesian Classification Model with PI3Kγ structural features for virtual screening against PI3Kγ: Combining molecular docking and pharmacophore based on multiple PI3Kγ conformations. Eur J Med Chem 2022; 244:114824. [DOI: 10.1016/j.ejmech.2022.114824] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/04/2022] [Revised: 09/28/2022] [Accepted: 10/01/2022] [Indexed: 11/21/2022]
|
42
|
Zhu H, Yang J, Huang N. Assessment of the Generalization Abilities of Machine-Learning Scoring Functions for Structure-Based Virtual Screening. J Chem Inf Model 2022; 62:5485-5502. [PMID: 36268980 DOI: 10.1021/acs.jcim.2c01149] [Citation(s) in RCA: 5] [Impact Index Per Article: 2.5] [Indexed: 12/13/2022]
Abstract
In structure-based virtual screening (SBVS), it is critical that scoring functions capture protein-ligand atomic interactions. By focusing on the local domains of ligand binding pockets, a standardized pocket Pfam-based clustering (Pfam-cluster) approach was developed to assess the cross-target generalization ability of machine-learning scoring functions (MLSFs). Subsequently, 12 typical MLSFs were evaluated using random cross-validation (Random-CV), protein sequence similarity-based cross-validation (Seq-CV), and pocket Pfam-based cross-validation (Pfam-CV) methods. Surprisingly, all of the tested models showed decreasing performance from Random-CV to Seq-CV to Pfam-CV experiments and did not show satisfactory generalization capacity. Our interpretable analysis suggested that the predictions on novel targets by MLSFs were dependent on buried solvent-accessible surface area (SASA)-related features of complex structures, with greater predicted binding affinities for complexes with larger protein-ligand interfaces. By combining buried SASA-related features with target-specific patterns that were only shared among structurally similar compounds in the same cluster, the random forest (RF)-Score attained a good performance in the Random-CV test. Based on these findings, we strongly advise assessing the generalization ability of MLSFs with the Pfam-cluster approach and being cautious with the features learned by MLSFs.
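The contrast between Random-CV and cluster-based CV comes down to whether complexes from the same cluster may appear in both the training and test folds. The Python sketch below shows a generic group-wise k-fold split under that constraint. The cluster labels are hypothetical placeholders, and the greedy fold assignment is an assumption for illustration rather than the Pfam-cluster protocol of the paper.

```python
from collections import defaultdict
from typing import Dict, Hashable, Iterator, List, Sequence, Tuple


def group_k_fold(
    groups: Sequence[Hashable], n_folds: int
) -> Iterator[Tuple[List[int], List[int]]]:
    """Yield (train_indices, test_indices) with no group on both sides."""
    members: Dict[Hashable, List[int]] = defaultdict(list)
    for index, group in enumerate(groups):
        members[group].append(index)
    if n_folds > len(members):
        raise ValueError("n_folds cannot exceed the number of distinct groups")

    # Greedy balancing: place the largest groups first into the smallest fold.
    folds: List[List[int]] = [[] for _ in range(n_folds)]
    for group in sorted(members, key=lambda g: (-len(members[g]), str(g))):
        min(folds, key=len).extend(members[group])

    for k, test_fold in enumerate(folds):
        train = sorted(i for j, fold in enumerate(folds) if j != k for i in fold)
        yield train, sorted(test_fold)


# Hypothetical cluster label for each protein-ligand complex.
example_groups = ["A", "A", "B", "B", "C", "C", "C"]
splits = list(group_k_fold(example_groups, n_folds=3))
for train, test in splits:
    print("train:", train, "test:", test)
```

With a purely random split, complexes from cluster "C" would typically land in both training and test sets, which is the leakage that cluster-based validation is designed to remove.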
Affiliation(s)
- Hui Zhu
- Tsinghua Institute of Multidisciplinary Biomedical Research, Tsinghua University, Beijing 102206, China
- National Institute of Biological Sciences, 7 Science Park Road, Zhongguancun Life Science Park, Beijing 102206, China
- Jincai Yang
- National Institute of Biological Sciences, 7 Science Park Road, Zhongguancun Life Science Park, Beijing 102206, China
- Niu Huang
- Tsinghua Institute of Multidisciplinary Biomedical Research, Tsinghua University, Beijing 102206, China
- National Institute of Biological Sciences, 7 Science Park Road, Zhongguancun Life Science Park, Beijing 102206, China
|
43
|
Neto MFA, Campos JM, Cerqueira APM, de Lima LR, Da Costa GV, Ramos RDS, Junior JTM, Santos CBR, Leite FHA. Hierarchical Virtual Screening and Binding Free Energy Prediction of Potential Modulators of Aedes Aegypti Odorant-Binding Protein 1. Molecules 2022; 27:molecules27206777. [PMID: 36296371 PMCID: PMC9612181 DOI: 10.3390/molecules27206777] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 09/01/2022] [Revised: 09/27/2022] [Accepted: 09/30/2022] [Indexed: 11/16/2022] Open
Abstract
The Aedes aegypti mosquito is the main hematophagous vector responsible for arbovirus transmission in Brazil. The disruption of A. aegypti hematophagy remains one of the most efficient and least toxic methods against these diseases and, therefore, research into new chemical entities with repellent activity has advanced following the elucidation of the functionality of the olfactory receptors and the behavior of mosquitoes. With the growing interest of the pharmaceutical and cosmetic industries in the development of chemical entities with repellent activity, computational studies (e.g., virtual screening and molecular modeling) are a way to prioritize potential modulators with stereoelectronic characteristics (e.g., pharmacophore models) and binding affinity to the AaegOBP1 binding site (e.g., molecular docking) at a lower computational cost. Thus, pharmacophore- and docking-based virtual screening was employed to prioritize compounds from Sigma-Aldrich® (n = 126,851) and biogenic databases (n = 8766). In addition, molecular dynamics (MD) was performed to prioritize the most promising compounds relative to DEET according to binding free energy calculations. Two compounds showed adequate stereoelectronic requirements (QFIT > 81.53), AaegOBP1 binding site score (Score > 42.0), volatility and non-toxic properties and a better binding free energy value (∆G < −24.13 kcal/mol) compared to DEET (N,N-diethyl-meta-toluamide) (∆G = −24.13 kcal/mol).
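The hierarchical screening described above amounts to a cascade of filters, each applied only to the survivors of the previous stage. The Python sketch below illustrates that cascade using the three thresholds quoted in the abstract (QFIT > 81.53, Score > 42.0, ∆G < −24.13 kcal/mol). The `Compound` record, the compound names, and all property values are hypothetical, and the volatility and toxicity filters mentioned in the abstract are omitted.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Compound:
    name: str
    qfit: float        # pharmacophore fit value
    dock_score: float  # docking score at the binding site
    delta_g: float     # binding free energy in kcal/mol (more negative is better)


def hierarchical_filter(
    compounds: List[Compound],
    qfit_min: float = 81.53,
    score_min: float = 42.0,
    delta_g_ref: float = -24.13,  # DEET reference value from the abstract
) -> List[Compound]:
    """Apply pharmacophore, docking, and free energy filters in sequence."""
    after_pharmacophore = [c for c in compounds if c.qfit > qfit_min]
    after_docking = [c for c in after_pharmacophore if c.dock_score > score_min]
    # In practice the costly free energy estimate would be computed only
    # for compounds that reach this stage.
    return [c for c in after_docking if c.delta_g < delta_g_ref]


# Hypothetical library entries for illustration only.
library = [
    Compound("cpd_1", qfit=85.0, dock_score=45.0, delta_g=-30.0),
    Compound("cpd_2", qfit=80.0, dock_score=50.0, delta_g=-35.0),  # fails QFIT
    Compound("cpd_3", qfit=90.0, dock_score=40.0, delta_g=-28.0),  # fails docking
    Compound("cpd_4", qfit=83.0, dock_score=43.5, delta_g=-20.0),  # fails free energy
]

hits = hierarchical_filter(library)
print([c.name for c in hits])
```

Ordering the stages from cheapest to most expensive is what keeps the overall computational cost low, since molecular dynamics is run on only a small fraction of the starting library.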
Collapse
Affiliation(s)
- Moysés F. A. Neto
- Laboratório de Quimioinformática e Avaliação Biológica, Departamento de Saúde, Universidade Estadual de Feira de Santana, Feira de Santana 44036-900, Brazil
| | - Joaquín M. Campos
- Departamento de Química Farmacéutica y Orgánica, Universidad de Granada, 18071 Granada, Spain
- Biosanitary Institute of Granada (ibs.GRANADA), SAS-University of Granada, 18071 Granada, Spain
| | - Amanda P. M. Cerqueira
- Laboratório de Quimioinformática e Avaliação Biológica, Departamento de Saúde, Universidade Estadual de Feira de Santana, Feira de Santana 44036-900, Brazil
| | - Lucio R. de Lima
- Laboratório de Modelagem e Química Computacional, Departamento de Ciências Biológicas e da Saúde, Universidade Federal do Amapá, Macapá 68902-280, Brazil
| | - Glauber V. Da Costa
- Laboratório de Modelagem e Química Computacional, Departamento de Ciências Biológicas e da Saúde, Universidade Federal do Amapá, Macapá 68902-280, Brazil
| | - Ryan Da S. Ramos
- Laboratório de Modelagem e Química Computacional, Departamento de Ciências Biológicas e da Saúde, Universidade Federal do Amapá, Macapá 68902-280, Brazil
| | - Jairo T. Magalhães Junior
- Centro Multidisciplinar, Departamento de Saúde, Universidade Federal do Oeste da Bahia, Barreiras 47100-000, Brazil
| | - Cleydson B. R. Santos
- Laboratório de Modelagem e Química Computacional, Departamento de Ciências Biológicas e da Saúde, Universidade Federal do Amapá, Macapá 68902-280, Brazil
- Correspondence: (C.B.R.S.); (F.H.A.L.)
| | - Franco H. A. Leite
- Laboratório de Quimioinformática e Avaliação Biológica, Departamento de Saúde, Universidade Estadual de Feira de Santana, Feira de Santana 44036-900, Brazil
- Correspondence: (C.B.R.S.); (F.H.A.L.)
| |
|
44
|
Qu X, Dong L, Zhang J, Si Y, Wang B. Systematic Improvement of the Performance of Machine Learning Scoring Functions by Incorporating Features of Protein-Bound Water Molecules. J Chem Inf Model 2022; 62:4369-4379. [PMID: 36083808 DOI: 10.1021/acs.jcim.2c00916] [Citation(s) in RCA: 8] [Impact Index Per Article: 4.0] [Indexed: 11/30/2022]
Abstract
Water molecules at the ligand-protein interfaces play crucial roles in the binding of the ligands, but the behavior of protein-bound water is largely ignored in many currently used machine learning (ML)-based scoring functions (SFs). In an attempt to improve the prediction performance of existing ML-based SFs, we estimated the water distribution with a HydraMap (HM) method and then incorporated the features extracted from protein-bound waters obtained in this way into three ML-based SFs: RF-Score, ECIF, and PLEC. It was found that a combination of HM-based features can consistently improve the performance of all three SFs, including their scoring, ranking, and docking power. HydraMap-based features show consistently good performance with both crystal structures and docked structures, demonstrating their robustness for SFs. Overall, HM-based features, which are a statistical representation of hydration sites at protein-ligand interfaces, are expected to improve the prediction performance for diverse SFs.
Affiliation(s)
- Xiaoyang Qu
- State Key Laboratory of Physical Chemistry of Solid Surfaces and Fujian Provincial Key Laboratory of Theoretical and Computational Chemistry, College of Chemistry and Chemical Engineering and Innovation Laboratory for Sciences and Technologies of Energy Materials of Fujian Province (IKKEM), Xiamen University, Xiamen 361005, P. R. China
| | - Lina Dong
- State Key Laboratory of Physical Chemistry of Solid Surfaces and Fujian Provincial Key Laboratory of Theoretical and Computational Chemistry, College of Chemistry and Chemical Engineering and Innovation Laboratory for Sciences and Technologies of Energy Materials of Fujian Province (IKKEM), Xiamen University, Xiamen 361005, P. R. China
| | - Jinyan Zhang
- State Key Laboratory of Physical Chemistry of Solid Surfaces and Fujian Provincial Key Laboratory of Theoretical and Computational Chemistry, College of Chemistry and Chemical Engineering and Innovation Laboratory for Sciences and Technologies of Energy Materials of Fujian Province (IKKEM), Xiamen University, Xiamen 361005, P. R. China
| | - Yubing Si
- College of Chemistry, Zhengzhou University, Zhengzhou 450001, P. R. China
| | - Binju Wang
- State Key Laboratory of Physical Chemistry of Solid Surfaces and Fujian Provincial Key Laboratory of Theoretical and Computational Chemistry, College of Chemistry and Chemical Engineering and Innovation Laboratory for Sciences and Technologies of Energy Materials of Fujian Province (IKKEM), Xiamen University, Xiamen 361005, P. R. China
| |
|
45
|
Protein Function Analysis through Machine Learning. Biomolecules 2022; 12:biom12091246. [PMID: 36139085 PMCID: PMC9496392 DOI: 10.3390/biom12091246] [Citation(s) in RCA: 6] [Impact Index Per Article: 3.0] [Received: 07/16/2022] [Revised: 08/22/2022] [Accepted: 08/31/2022] [Indexed: 11/16/2022] Open
Abstract
Machine learning (ML) has been an important part of the computational biology arsenal used to elucidate protein function for decades. With the recent burgeoning of novel ML methods and applications, new ML approaches have been incorporated into many areas of computational biology dealing with protein function. We examine how ML has been integrated into a wide range of computational models to improve prediction accuracy and gain a better understanding of protein function. The applications discussed are protein structure prediction; protein engineering using sequence modifications to achieve stability and druggability characteristics; molecular docking in terms of protein–ligand binding, including allosteric effects; protein–protein interactions; and protein-centric drug discovery. To quantify the mechanisms underlying protein function, a holistic approach that takes structure, flexibility, stability, and dynamics into account is required, as these aspects become inseparable through their interdependence. Another key component of protein function is conformational dynamics, which often manifest as protein kinetics. Computational methods that use ML to generate representative conformational ensembles and to quantify the differences between ensembles that are important for function are included in this review. Future opportunities are highlighted for each of these topics.
|
46
|
Sinha S, Tam B, Wang SM. Applications of Molecular Dynamics Simulation in Protein Study. Membranes 2022; 12:membranes12090844. [PMID: 36135863 PMCID: PMC9505860 DOI: 10.3390/membranes12090844] [Citation(s) in RCA: 16] [Impact Index Per Article: 8.0] [Received: 08/14/2022] [Revised: 08/24/2022] [Accepted: 08/25/2022] [Indexed: 05/29/2023]
Abstract
Molecular dynamics (MD) simulation is increasingly used as a powerful tool to study protein structure-related questions. Since the early simulation study of photoisomerization in rhodopsin in 1976, MD simulation has been used to study protein function, protein stability, protein-protein interactions, enzymatic reactions, drug-protein interactions, and membrane proteins. In this review, we briefly cover the history of MD simulation and the current status of its applications in protein studies.
|
47
|
Shen C, Zhang X, Deng Y, Gao J, Wang D, Xu L, Pan P, Hou T, Kang Y. Boosting Protein-Ligand Binding Pose Prediction and Virtual Screening Based on Residue-Atom Distance Likelihood Potential and Graph Transformer. J Med Chem 2022; 65:10691-10706. [PMID: 35917397 DOI: 10.1021/acs.jmedchem.2c00991] [Citation(s) in RCA: 33] [Impact Index Per Article: 16.5] [Indexed: 02/08/2023]
Abstract
The past few years have witnessed enormous progress toward applying machine learning approaches to the development of protein-ligand scoring functions. However, achieving robust performance and wide applicability remains a major challenge for increasing the success rate of docking-based virtual screening. Herein, a novel scoring function named RTMScore was developed by introducing a tailored residue-based graph representation strategy and several graph transformer layers for learning protein and ligand representations, followed by a mixture density network to obtain a residue-atom distance likelihood potential. Our approach was rigorously validated on the CASF-2016 benchmark, and the results indicate that RTMScore can outperform almost all of the other state-of-the-art methods in terms of both docking and screening power. Further evaluation confirms the robustness of our approach, which not only retains its docking power on cross-docked poses but also achieves improved performance as a rescoring tool in larger-scale virtual screening.
Affiliation(s)
- Chao Shen
- Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, College of Pharmaceutical Sciences, Zhejiang University, Hangzhou, Zhejiang 310058, China
- State Key Lab of CAD&CG, Zhejiang University, Hangzhou, Zhejiang 310058, China
- CarbonSilicon AI Technology Co., Ltd, Hangzhou, Zhejiang 310018, China
| | - Xujun Zhang
- Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, College of Pharmaceutical Sciences, Zhejiang University, Hangzhou, Zhejiang 310058, China
| | - Yafeng Deng
- CarbonSilicon AI Technology Co., Ltd, Hangzhou, Zhejiang 310018, China
| | - Junbo Gao
- Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, College of Pharmaceutical Sciences, Zhejiang University, Hangzhou, Zhejiang 310058, China
| | - Dong Wang
- Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, College of Pharmaceutical Sciences, Zhejiang University, Hangzhou, Zhejiang 310058, China
| | - Lei Xu
- Institute of Bioinformatics and Medical Engineering, School of Electrical and Information Engineering, Jiangsu University of Technology, Changzhou 213001, China
| | - Peichen Pan
- Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, College of Pharmaceutical Sciences, Zhejiang University, Hangzhou, Zhejiang 310058, China
| | - Tingjun Hou
- Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, College of Pharmaceutical Sciences, Zhejiang University, Hangzhou, Zhejiang 310058, China
- State Key Lab of CAD&CG, Zhejiang University, Hangzhou, Zhejiang 310058, China
| | - Yu Kang
- Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, College of Pharmaceutical Sciences, Zhejiang University, Hangzhou, Zhejiang 310058, China
| |
|
48
|
Du H, Jiang D, Gao J, Zhang X, Jiang L, Zeng Y, Wu Z, Shen C, Xu L, Cao D, Hou T, Pan P. Proteome-Wide Profiling of the Covalent-Druggable Cysteines with a Structure-Based Deep Graph Learning Network. Research (Wash D C) 2022; 2022:9873564. [PMID: 35958111 PMCID: PMC9343084 DOI: 10.34133/2022/9873564] [Citation(s) in RCA: 5] [Impact Index Per Article: 2.5] [Received: 04/06/2022] [Accepted: 06/27/2022] [Indexed: 11/06/2022] Open
Abstract
Covalent ligands have attracted increasing attention due to their unique advantages, such as long residence time, high selectivity, and strong binding affinity. They also show promise for targets where previous efforts to identify noncovalent small-molecule inhibitors have failed. However, our limited knowledge of covalent binding sites has hindered the discovery of novel ligands. Therefore, developing in silico methods to identify covalent binding sites is highly desirable. Here, we propose DeepCoSI, the first structure-based deep graph learning model to identify ligandable covalent sites in proteins. By integrating the characterization of the binding pocket with the interactions between each cysteine and its surrounding environment, DeepCoSI achieves state-of-the-art predictive performance. Validation on two external test sets that mimic real application scenarios shows that DeepCoSI has a strong ability to distinguish ligandable sites from the others. Finally, we profiled the entire set of protein structures in the RCSB Protein Data Bank (PDB) with DeepCoSI to evaluate the ligandability of each cysteine for covalent ligand design, and made the predicted data publicly available on a website.
Affiliation(s)
- Hongyan Du
- Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, College of Pharmaceutical Sciences, Zhejiang University, Hangzhou, 310058 Zhejiang, China
- State Key Lab of CAD&CG, Zhejiang University, Hangzhou, 310058 Zhejiang, China
| | - Dejun Jiang
- Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, College of Pharmaceutical Sciences, Zhejiang University, Hangzhou, 310058 Zhejiang, China
- State Key Lab of CAD&CG, Zhejiang University, Hangzhou, 310058 Zhejiang, China
| | - Junbo Gao
- Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, College of Pharmaceutical Sciences, Zhejiang University, Hangzhou, 310058 Zhejiang, China
| | - Xujun Zhang
- Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, College of Pharmaceutical Sciences, Zhejiang University, Hangzhou, 310058 Zhejiang, China
| | - Lingxiao Jiang
- Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, College of Pharmaceutical Sciences, Zhejiang University, Hangzhou, 310058 Zhejiang, China
| | - Yundian Zeng
- Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, College of Pharmaceutical Sciences, Zhejiang University, Hangzhou, 310058 Zhejiang, China
| | - Zhenxing Wu
- Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, College of Pharmaceutical Sciences, Zhejiang University, Hangzhou, 310058 Zhejiang, China
| | - Chao Shen
- Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, College of Pharmaceutical Sciences, Zhejiang University, Hangzhou, 310058 Zhejiang, China
| | - Lei Xu
- Institute of Bioinformatics and Medical Engineering, School of Electrical and Information Engineering, Jiangsu University of Technology, Changzhou 213001, China
| | - Dongsheng Cao
- Xiangya School of Pharmaceutical Sciences, Central South University, Changsha, 410004 Hunan, China
| | - Tingjun Hou
- Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, College of Pharmaceutical Sciences, Zhejiang University, Hangzhou, 310058 Zhejiang, China
- State Key Lab of CAD&CG, Zhejiang University, Hangzhou, 310058 Zhejiang, China
| | - Peichen Pan
- Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, College of Pharmaceutical Sciences, Zhejiang University, Hangzhou, 310058 Zhejiang, China
| |
|
49
|
Yang C, Chen EA, Zhang Y. Protein-Ligand Docking in the Machine-Learning Era. Molecules 2022; 27:4568. [PMID: 35889440 PMCID: PMC9323102 DOI: 10.3390/molecules27144568] [Citation(s) in RCA: 32] [Impact Index Per Article: 16.0] [Received: 07/03/2022] [Accepted: 07/14/2022] [Indexed: 11/16/2022] Open
Abstract
Molecular docking plays a significant role in early-stage drug discovery, from structure-based virtual screening (VS) to hit-to-lead optimization, and its capability and predictive power are critically dependent on the protein-ligand scoring function. In this review, we give a broad overview of recent scoring function development, as well as docking-based applications in drug discovery. We outline the strategies and resources available for structure-based VS and discuss the assessment and development of classical and machine learning protein-ligand scoring functions. In particular, we highlight recent progress in machine learning scoring functions, ranging from descriptor-based models to deep learning approaches. We also discuss the general workflow and docking protocols of structure-based VS, such as structure preparation, binding site detection, docking strategies, and post-docking filtering and re-scoring, and present a case study of a large-scale docking-based VS test on the LIT-PCBA data set.
Affiliation(s)
- Chao Yang
- Department of Chemistry, New York University, New York, NY 10003, USA
| | - Eric Anthony Chen
- Department of Chemistry, New York University, New York, NY 10003, USA
| | - Yingkai Zhang
- Department of Chemistry, New York University, New York, NY 10003, USA
- NYU-ECNU Center for Computational Chemistry at NYU Shanghai, Shanghai 200062, China
| |
|
50
|
Meixner M, Zachmann M, Metzler S, Scheerer J, Zacharias M, Antes I. Dynamic Docking of Macrocycles in Bound and Unbound Protein Structures with DynaDock. J Chem Inf Model 2022; 62:3426-3441. [PMID: 35796228 DOI: 10.1021/acs.jcim.2c00436] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/30/2022]
Abstract
Macrocycles are interesting molecules with unique features due to their conformationally constrained yet flexible ring structure. This characteristic poses a difficult challenge for computational modeling studies since they rely on accurate structural descriptions. In particular, molecular docking calculations suffer from the lack of ring flexibility during pose generation, which is often compensated by using pregenerated ligand conformer ensembles. Moreover, receptor structures are mainly treated rigidly, which limits the use of many docking tools. In this study, we optimized our previous molecular dynamics-based sampling and docking pipeline specifically designed for the accurate prediction of macrocyclic compounds. We developed a dihedral classification procedure for in-depth conformational analysis of the macrocyclic rings and extracted structural ensembles that were subsequently docked in both bound and unbound protein structures employing a fully flexible approach. Our results suggest that including a ring conformer close to the bound state in the starting ensemble increases the chance of successful docking. The bioactive conformations of a diverse set of ligands could be predicted with high and decent accuracy in bound and unbound protein structures, respectively, due to the incorporation of full molecular flexibility in our approach. The remaining unsuccessful docking calculations were mainly caused by large flexible substituents that bind to surface-exposed binding sites, rather than the macrocyclic ring per se and could be further improved by explicit molecular dynamics simulations of the docked complex.
Affiliation(s)
- Maximilian Meixner
- TUM School of Life Sciences, Technical University Munich, Am Staudengarten 2, Freising 85354, Germany
| | - Martin Zachmann
- TUM School of Life Sciences, Technical University Munich, Am Staudengarten 2, Freising 85354, Germany
| | - Sebastian Metzler
- TUM School of Life Sciences, Technical University Munich, Am Staudengarten 2, Freising 85354, Germany
| | - Jonathan Scheerer
- TUM School of Life Sciences, Technical University Munich, Am Staudengarten 2, Freising 85354, Germany
| | - Martin Zacharias
- Center of Functional Protein Assemblies, Technical University Munich, Ernst-Otto-Fischer-Straße 8, Garching bei München 85748, Germany
| | - Iris Antes
- TUM School of Life Sciences, Technical University Munich, Am Staudengarten 2, Freising 85354, Germany
| |
|