1
Qian R, Xue J, Xu Y, Huang J. Alchemical Transformations and Beyond: Recent Advances and Real-World Applications of Free Energy Calculations in Drug Discovery. J Chem Inf Model 2024; 64:7214-7237. [PMID: 39360948 DOI: 10.1021/acs.jcim.4c01024] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/15/2024]
Abstract
Computational methods constitute efficient strategies for screening and optimizing potential drug molecules. A critical factor in this process is the binding affinity between candidate molecules and targets, quantified as binding free energy. Among various estimation methods, alchemical transformation methods stand out for their theoretical rigor. Despite challenges in force field accuracy and sampling efficiency, advancements in algorithms, software, and hardware have increased the application of free energy perturbation (FEP) calculations in the pharmaceutical industry. Here, we review the practical applications of FEP in drug discovery projects since 2018, covering both ligand-centric and residue-centric transformations. We show that relative binding free energy calculations have steadily achieved chemical accuracy in real-world applications. In addition, we discuss alternative physics-based simulation methods and the incorporation of deep learning into free energy calculations.
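As a rough illustration of what a free energy perturbation estimate involves, the sketch below applies the standard Zwanzig exponential-averaging formula to a list of sampled energy differences and then closes a thermodynamic cycle to obtain a relative binding free energy. The function names and the sample values are hypothetical and are not taken from the review; production FEP workflows use many intermediate states and more robust estimators.

```python
import math

R_KCAL = 0.0019872041  # gas constant in kcal/(mol*K)

def zwanzig_delta_g(delta_u, temperature=298.15):
    """Estimate dG(A->B) from energy differences U_B - U_A sampled in state A.

    Implements dG = -RT * ln < exp(-dU / RT) >_A, with the minimum dU
    factored out of the exponentials for numerical stability.
    """
    rt = R_KCAL * temperature
    shift = min(delta_u)
    avg = sum(math.exp(-(du - shift) / rt) for du in delta_u) / len(delta_u)
    return shift - rt * math.log(avg)

def relative_binding_free_energy(dg_complex, dg_solvent):
    """Thermodynamic cycle: ddG_bind = dG(A->B in complex) - dG(A->B in solvent)."""
    return dg_complex - dg_solvent

# Hypothetical energy differences in kcal/mol, for illustration only.
constant_samples = [1.2, 1.2, 1.2]
noisy_samples = [0.8, 1.5, 1.1, 2.0]
```

With identical samples the estimate equals the sampled value; with fluctuating samples it falls below the arithmetic mean, which is why poorly overlapping states make single-step estimates unreliable.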
Affiliation(s)
- Runtong Qian
- Westlake AI Therapeutics Lab, Westlake Laboratory of Life Sciences and Biomedicine, Hangzhou, Zhejiang 310024, China
- Key Laboratory of Structural Biology of Zhejiang Province, School of Life Sciences, Westlake University, 18 Shilongshan Road, Hangzhou, Zhejiang 310024, China
- Jing Xue
- Westlake AI Therapeutics Lab, Westlake Laboratory of Life Sciences and Biomedicine, Hangzhou, Zhejiang 310024, China
- Key Laboratory of Structural Biology of Zhejiang Province, School of Life Sciences, Westlake University, 18 Shilongshan Road, Hangzhou, Zhejiang 310024, China
- You Xu
- Westlake AI Therapeutics Lab, Westlake Laboratory of Life Sciences and Biomedicine, Hangzhou, Zhejiang 310024, China
- Key Laboratory of Structural Biology of Zhejiang Province, School of Life Sciences, Westlake University, 18 Shilongshan Road, Hangzhou, Zhejiang 310024, China
- Jing Huang
- Westlake AI Therapeutics Lab, Westlake Laboratory of Life Sciences and Biomedicine, Hangzhou, Zhejiang 310024, China
- Key Laboratory of Structural Biology of Zhejiang Province, School of Life Sciences, Westlake University, 18 Shilongshan Road, Hangzhou, Zhejiang 310024, China
2
King JE, Koes DR. Interpreting forces as deep learning gradients improves quality of predicted protein structures. Biophys J 2024; 123:2730-2739. [PMID: 38104241 PMCID: PMC11393680 DOI: 10.1016/j.bpj.2023.12.011] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/15/2023] [Revised: 10/30/2023] [Accepted: 12/12/2023] [Indexed: 12/19/2023] [Open Access]
Abstract
Protein structure predictions from deep learning models like AlphaFold2, despite their remarkable accuracy, are likely insufficient for direct use in downstream tasks like molecular docking. The functionality of such models could be improved with a combination of increased accuracy and physical intuition. We propose a new method to train deep learning protein structure prediction models using molecular dynamics force fields to work toward these goals. Our custom PyTorch loss function, OpenMM-Loss, represents the potential energy of a predicted structure. OpenMM-Loss can be applied to any all-atom representation of a protein structure capable of mapping into our software package, SidechainNet. We demonstrate our method's efficacy by finetuning OpenFold. We show that subsequently predicted protein structures, both before and after a relaxation procedure, exhibit comparable accuracy while displaying lower potential energy and improved structural quality as assessed by MolProbity metrics.
Affiliation(s)
- Jonathan Edward King
- Joint PhD Program in Computational Biology, Carnegie Mellon University-University of Pittsburgh, Pittsburgh, Pennsylvania
- David Ryan Koes
- Computational & Systems Biology, University of Pittsburgh, Pittsburgh, Pennsylvania.
3
Fralish Z, Reker D. Finding the most potent compounds using active learning on molecular pairs. Beilstein J Org Chem 2024; 20:2152-2162. [PMID: 39224230 PMCID: PMC11368049 DOI: 10.3762/bjoc.20.185] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/08/2024] [Accepted: 08/02/2024] [Indexed: 09/04/2024] [Open Access]
Abstract
Active learning allows algorithms to steer iterative experimentation to accelerate and de-risk molecular optimizations, but actively trained models might still exhibit poor performance during early project stages where the training data is limited and model exploitation might lead to analog identification with limited scaffold diversity. Here, we present ActiveDelta, an adaptive approach that leverages paired molecular representations to predict improvements from the current best training compound to prioritize further data acquisition. We apply the ActiveDelta concept to both graph-based deep (Chemprop) and tree-based (XGBoost) models during exploitative active learning for 99 Ki benchmarking datasets. We show that both ActiveDelta implementations excel at identifying more potent inhibitors compared to the standard exploitative active learning implementations of Chemprop, XGBoost, and Random Forest. The ActiveDelta approach is also able to identify more chemically diverse inhibitors in terms of their Murcko scaffolds. Finally, deep models such as Chemprop trained on data selected through ActiveDelta approaches can more accurately identify inhibitors in test data created through simulated time-splits. Overall, this study highlights the large potential for molecular pairing approaches to further improve popular active learning strategies in low data regimes by enabling faster and more accurate identification of more diverse molecular hits against critical drug targets.
Affiliation(s)
- Zachary Fralish
- Department of Biomedical Engineering, Duke University, Durham, NC 27708, USA
- Daniel Reker
- Department of Biomedical Engineering, Duke University, Durham, NC 27708, USA
4
Fralish Z, Skaluba P, Reker D. Leveraging bounded datapoints to classify molecular potency improvements. RSC Med Chem 2024; 15:2474-2482. [PMID: 39026630 PMCID: PMC11253865 DOI: 10.1039/d4md00325j] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/06/2024] [Accepted: 05/19/2024] [Indexed: 07/20/2024] [Open Access]
Abstract
Molecular machine learning algorithms are becoming increasingly powerful at predicting the potency of potential drug candidates to guide molecular discovery, lead series prioritization, and structural optimization. However, a substantial amount of inhibition data is bounded and inaccessible to traditional regression algorithms. Here, we develop a novel molecular pairing approach to process this data. This creates a new classification task of predicting which one of two paired molecules is more potent. This novel classification task can be accurately solved by various established molecular machine learning algorithms, including XGBoost and Chemprop. Across 230 ChEMBL IC50 datasets, both tree-based and neural network-based "DeltaClassifiers" show improvements over traditional regression approaches in correctly classifying molecular potency improvements. The Chemprop-based deep DeltaClassifier outperformed all regression approaches evaluated here for paired molecules with shared and with distinct scaffolds, highlighting the promise of this approach for molecular optimization and scaffold-hopping.
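The pairing idea can be illustrated with a minimal sketch of how a bounded measurement (for example IC50 > 10 µM) can still yield a usable class label once it is paired with another measurement. The interval representation, the `pair_label` helper, and the choice to leave tied or overlapping pairs unlabeled are assumptions made for illustration; the abstract does not specify the authors' exact labeling rules.

```python
import math

def interval(op, value):
    """Map a reported IC50 ('=', '>' or '<' plus a value) to a (low, high) range."""
    if op == "=":
        return (value, value)
    if op == ">":
        return (value, math.inf)
    if op == "<":
        return (0.0, value)
    raise ValueError(f"unknown operator: {op}")

def pair_label(a, b):
    """Return 1 if b is more potent (lower IC50) than a, 0 if less potent,
    and None when the two measurements do not determine the order."""
    if a[0] == "=" and b[0] == "=" and a[1] == b[1]:
        return None  # identical exact values: no improvement either way
    lo_a, hi_a = interval(*a)
    lo_b, hi_b = interval(*b)
    if hi_b <= lo_a:
        return 1
    if hi_a <= lo_b:
        return 0
    return None  # overlapping ranges, e.g. two '>' bounds

# Hypothetical IC50 values in micromolar.
weak = (">", 10.0)    # bounded: regression cannot use this value directly
strong = ("=", 1.0)
other_weak = (">", 5.0)
```

Here `pair_label(weak, strong)` is decidable even though `weak` has no exact value, whereas `pair_label(weak, other_weak)` is not and would be dropped from training under this sketch.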
Affiliation(s)
- Zachary Fralish
- Department of Biomedical Engineering, Duke University, Durham, NC 27708, USA
- Paul Skaluba
- Department of Biomedical Engineering, Duke University, Durham, NC 27708, USA
- Daniel Reker
- Department of Biomedical Engineering, Duke University, Durham, NC 27708, USA
5
Zhang L, Zhou R, Liu D, Zhu M, Zhang G, Zhang L, Zhou SF, Jiang W. Multi-strategy orthogonal enhancement and analysis of aldo-keto reductase thermal stability. Int J Biol Macromol 2024; 264:130691. [PMID: 38458293 DOI: 10.1016/j.ijbiomac.2024.130691] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/23/2023] [Revised: 02/26/2024] [Accepted: 03/05/2024] [Indexed: 03/10/2024]
Abstract
Given their outstanding efficiency and selectivity, enzymes are integral in various domains such as drug synthesis, the food industry, and environmental management. However, the inherent instability of natural enzymes limits their widespread industrial application. In this study, we underscore the efficacy of enhancing protein thermal stability through comprehensive protein design strategies, encompassing elements such as the free energy of protein folding, internal forces within proteins, and the overall structural design. We also demonstrate the efficiency and precision of combinatorial screening in the thermal stability design of aldo-keto reductase (AKR7-2-1). In our research, three single-point mutations and five combinatorial mutations were strategically introduced into AKR7-2-1, using multiple computational techniques. Notably, the E12I/S235I mutant showed a significant increase of 25.4 °C in its melting temperature (Tm). Furthermore, the optimal mutant, E12V/S235I, maintained 80 % of its activity while realizing a 16.8 °C elevation in Tm. Remarkably, its half-life at 50 °C was increased to twenty times that of the wild type. Structural analysis indicates that this enhanced thermal stability primarily arises from reduced oscillation in the loop region and increased internal hydrogen bonding. The promising results achieved with AKR7-2-1 demonstrate that our strategy could serve as a valuable reference for enhancing the thermal stability of other industrial enzymes.
Affiliation(s)
- Lingzhi Zhang
- College of Chemical Engineering, Huaqiao University, Xiamen 361021, Fujian Province, PR China
- Rui Zhou
- Shanghai Marine Diesel Engine Research Institute, Shanghai 201108, PR China
- Dekai Liu
- College of Chemical Engineering, Huaqiao University, Xiamen 361021, Fujian Province, PR China
- Meinan Zhu
- College of Chemical Engineering, Huaqiao University, Xiamen 361021, Fujian Province, PR China
- Guangya Zhang
- College of Chemical Engineering, Huaqiao University, Xiamen 361021, Fujian Province, PR China
- Lijuan Zhang
- State Key Laboratory of Microbial Technology, Shandong University, No. 72 Binhai Road, Qingdao 266237, PR China
- Shu-Feng Zhou
- College of Chemical Engineering, Huaqiao University, Xiamen 361021, Fujian Province, PR China.
- Wei Jiang
- College of Chemical Engineering, Huaqiao University, Xiamen 361021, Fujian Province, PR China.
6
Tropsha A, Isayev O, Varnek A, Schneider G, Cherkasov A. Integrating QSAR modelling and deep learning in drug discovery: the emergence of deep QSAR. Nat Rev Drug Discov 2024; 23:141-155. [PMID: 38066301 DOI: 10.1038/s41573-023-00832-0] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Accepted: 10/21/2023] [Indexed: 02/08/2024]
Abstract
Quantitative structure-activity relationship (QSAR) modelling, an approach that was introduced 60 years ago, is widely used in computer-aided drug design. In recent years, progress in artificial intelligence techniques, such as deep learning, the rapid growth of databases of molecules for virtual screening and dramatic improvements in computational power have supported the emergence of a new field of QSAR applications that we term 'deep QSAR'. Marking a decade from the pioneering applications of deep QSAR to tasks involved in small-molecule drug discovery, we herein describe key advances in the field, including deep generative and reinforcement learning approaches in molecular design, deep learning models for synthetic planning and the application of deep QSAR models in structure-based virtual screening. We also reflect on the emergence of quantum computing, which promises to further accelerate deep QSAR applications and the need for open-source and democratized resources to support computer-aided drug design.
Affiliation(s)
- Artem Cherkasov
- University of British Columbia, Vancouver, BC, Canada.
- Photonic Inc., Coquitlam, BC, Canada.
7
Pattanaik L, Menon A, Settels V, Spiekermann KA, Tan Z, Vermeire FH, Sandfort F, Eiden P, Green WH. ConfSolv: Prediction of Solute Conformer-Free Energies across a Range of Solvents. J Phys Chem B 2023; 127:10151-10170. [PMID: 37966798 DOI: 10.1021/acs.jpcb.3c05904] [Citation(s) in RCA: 3] [Impact Index Per Article: 3.0] [Indexed: 11/16/2023]
Abstract
Predicting Gibbs free energy of solution is key to understanding the solvent effects on thermodynamics and reaction rates for kinetic modeling. Accurately computing solution free energies requires the enumeration and evaluation of relevant solute conformers in solution. However, even after generation of relevant conformers, determining their free energy of solution requires an expensive workflow consisting of several ab initio computational chemistry calculations. To help address this challenge, we generate a large data set of solution free energies for nearly 44,000 solutes with almost 9 million conformers calculated in 41 different solvents using density functional theory and COSMO-RS and quantify the impact of solute conformers on the solution free energy. We then train a message passing neural network to predict the relative solution free energies of a set of solute conformers, enabling the identification of a small subset of thermodynamically relevant conformers. The model offers substantial computational time savings, with predictions usually well within 1 kcal/mol of the solution free energy calculated using computational chemistry methods.
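To make the notion of "thermodynamically relevant conformers" concrete, the sketch below converts relative conformer free energies into Boltzmann populations and keeps the conformers above a population cutoff. Boltzmann weighting is standard statistical mechanics, but the 1% cutoff, the function names, and the example energies are assumptions for illustration and are not taken from the paper.

```python
import math

R_KCAL = 0.0019872041  # gas constant in kcal/(mol*K)

def boltzmann_weights(rel_free_energies, temperature=298.15):
    """Populations p_i proportional to exp(-G_i / RT) for energies in kcal/mol."""
    rt = R_KCAL * temperature
    g_min = min(rel_free_energies)  # shift so the largest exponent is zero
    factors = [math.exp(-(g - g_min) / rt) for g in rel_free_energies]
    total = sum(factors)
    return [f / total for f in factors]

def relevant_conformers(rel_free_energies, threshold=0.01, temperature=298.15):
    """Indices of conformers whose population is at least `threshold`."""
    weights = boltzmann_weights(rel_free_energies, temperature)
    return [i for i, w in enumerate(weights) if w >= threshold]

# Hypothetical predicted relative free energies (kcal/mol) for four conformers.
predicted = [0.0, 0.5, 3.0, 6.0]
```

At room temperature RT is roughly 0.6 kcal/mol, so an error of 1 kcal/mol in a relative free energy changes a population ratio by about a factor of five, which is why sub-kcal/mol accuracy matters for this selection.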
Affiliation(s)
- Lagnajit Pattanaik
- Department of Chemical Engineering, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, United States
- Angiras Menon
- Department of Chemical Engineering, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, United States
- Volker Settels
- BASF SE, Scientific Modeling, Group Research, Ludwigshafen am Rhein 67056, Germany
- Kevin A Spiekermann
- Department of Chemical Engineering, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, United States
- Zipei Tan
- Department of Chemical Engineering, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, United States
- Florence H Vermeire
- Department of Chemical Engineering, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, United States
- Department of Chemical Engineering, KU Leuven, Celestijnenlaan 200F, Leuven 3001, Belgium
- Frederik Sandfort
- BASF SE, Scientific Modeling, Group Research, Ludwigshafen am Rhein 67056, Germany
- Philipp Eiden
- BASF SE, Scientific Modeling, Group Research, Ludwigshafen am Rhein 67056, Germany
- William H Green
- Department of Chemical Engineering, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, United States
8
Fralish Z, Chen A, Skaluba P, Reker D. DeepDelta: predicting ADMET improvements of molecular derivatives with deep learning. J Cheminform 2023; 15:101. [PMID: 37885017 PMCID: PMC10605784 DOI: 10.1186/s13321-023-00769-x] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/23/2023] [Accepted: 10/12/2023] [Indexed: 10/28/2023] [Open Access]
Abstract
Established molecular machine learning models process individual molecules as inputs to predict their biological, chemical, or physical properties. However, such algorithms require large datasets and have not been optimized to predict property differences between molecules, limiting their ability to learn from smaller datasets and to directly compare the anticipated properties of two molecules. Many drug and material development tasks would benefit from an algorithm that can directly compare two molecules to guide molecular optimization and prioritization, especially for tasks with limited available data. Here, we develop DeepDelta, a pairwise deep learning approach that processes two molecules simultaneously and learns to predict property differences between two molecules from small datasets. On 10 ADMET benchmark tasks, our DeepDelta approach significantly outperforms two established molecular machine learning algorithms, the directed message passing neural network (D-MPNN) ChemProp and Random Forest using radial fingerprints, for 70% of benchmarks in terms of Pearson's r, 60% of benchmarks in terms of mean absolute error (MAE), and all external test sets for both Pearson's r and MAE. We further analyze our performance and find that DeepDelta particularly outperforms established approaches at predicting large differences in molecular properties and can perform scaffold hopping. Furthermore, we derive fundamental computational tests of our models based on mathematical invariants and show that compliance with these tests correlates with overall model performance, providing an innovative, unsupervised, and easily computable measure of expected model performance and applicability. Taken together, DeepDelta provides an accurate approach to predict molecular property differences by directly training on molecular pairs and their property differences to further support fidelity and transparency in molecular optimization for drug development and the chemical sciences.
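The invariant-based tests mentioned in the abstract can be sketched as label-free consistency checks on any pairwise difference predictor. Any true property difference is zero for a molecule paired with itself, changes sign when the pair is swapped, and adds up along a chain of molecules; whether these are exactly the tests used by the authors is an assumption here, and `predict_delta` is a hypothetical stand-in for a trained model.

```python
from itertools import product

def invariant_errors(predict_delta, molecules):
    """Mean absolute violation of three properties of a true difference:
    identity      d(x, x) = 0
    antisymmetry  d(x, y) = -d(y, x)
    additivity    d(x, y) + d(y, z) = d(x, z)
    No measured labels are needed, only model predictions."""
    def mean(values):
        return sum(values) / len(values)

    identity = [abs(predict_delta(m, m)) for m in molecules]
    antisymmetry = [abs(predict_delta(a, b) + predict_delta(b, a))
                    for a, b in product(molecules, repeat=2)]
    additivity = [abs(predict_delta(a, b) + predict_delta(b, c) - predict_delta(a, c))
                  for a, b, c in product(molecules, repeat=3)]
    return {
        "identity": mean(identity),
        "antisymmetry": mean(antisymmetry),
        "additivity": mean(additivity),
    }

# Toy stand-ins for a trained pairwise model (hypothetical property values).
props = {"A": 1.0, "B": 2.5, "C": 0.5}

def consistent_model(x, y):
    return props[y] - props[x]

def biased_model(x, y):
    return props[y] - props[x] + 0.5  # constant offset breaks all three checks
```

A model that scores near zero on all three checks behaves like a genuine difference function, which is the kind of unsupervised signal the abstract relates to overall model performance.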
Affiliation(s)
- Zachary Fralish
- Department of Biomedical Engineering, Duke University, Durham, NC, 27708, USA
- Ashley Chen
- Department of Computer Science, Duke University, Durham, NC, 27708, USA
- Paul Skaluba
- Department of Biomedical Engineering, Duke University, Durham, NC, 27708, USA
- Daniel Reker
- Department of Biomedical Engineering, Duke University, Durham, NC, 27708, USA.
9
Bustillo L, Laino T, Rodrigues T. The rise of automated curiosity-driven discoveries in chemistry. Chem Sci 2023; 14:10378-10384. [PMID: 37799997 PMCID: PMC10548516 DOI: 10.1039/d3sc03367h] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/02/2023] [Accepted: 09/07/2023] [Indexed: 10/07/2023] [Open Access]
Abstract
The quest for generating novel chemistry knowledge is critical in scientific advancement, and machine learning (ML) has emerged as an asset in this pursuit. Through interpolation among learned patterns, ML can tackle tasks that were previously deemed demanding to machines. This distinctive capacity of ML provides invaluable aid to bench chemists in their daily work. However, current ML tools are typically designed to prioritize experiments with the highest likelihood of success, i.e., higher predictive confidence. In this perspective, we build on current trends that suggest a future in which ML could be just as beneficial in exploring uncharted search spaces through simulated curiosity. We discuss how low and 'negative' data can catalyse one-/few-shot learning, and how the broader use of curious ML and novelty detection algorithms can propel the next wave of chemical discoveries. We anticipate that ML for curiosity-driven research will help the community overcome potentially biased assumptions and uncover unexpected findings in the chemical sciences at an accelerated pace.
Affiliation(s)
- Latimah Bustillo
- Research Institute for Medicines (iMed), Faculdade de Farmácia, Universidade de Lisboa Lisbon Portugal
- Teodoro Laino
- IBM Research Europe Säumerstrasse 4 8803 Rüschlikon Switzerland
- National Center for Competence in Research-Catalysis (NCCR-Catalysis) Zurich Switzerland
- Tiago Rodrigues
- Research Institute for Medicines (iMed), Faculdade de Farmácia, Universidade de Lisboa Lisbon Portugal
10
Efficient prediction of relative ligand binding affinity in drug discovery. Nat Comput Sci 2023; 3:829-830. [PMID: 38177771 DOI: 10.1038/s43588-023-00531-1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 01/06/2024]
11
Yu J, Li Z, Chen G, Kong X, Hu J, Wang D, Cao D, Li Y, Huo R, Wang G, Liu X, Jiang H, Li X, Luo X, Zheng M. Computing the relative binding affinity of ligands based on a pairwise binding comparison network. Nat Comput Sci 2023; 3:860-872. [PMID: 38177766 PMCID: PMC10766524 DOI: 10.1038/s43588-023-00529-9] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/05/2023] [Accepted: 09/05/2023] [Indexed: 01/06/2024]
Abstract
Structure-based lead optimization is an open challenge in drug discovery, which is still largely driven by hypotheses and depends on the experience of medicinal chemists. Here we propose a pairwise binding comparison network (PBCNet) based on a physics-informed graph attention mechanism, specifically tailored for ranking the relative binding affinity among congeneric ligands. Benchmarking on two held-out sets (provided by Schrödinger and Merck) containing over 460 ligands and 16 targets, PBCNet demonstrated substantial advantages in terms of both prediction accuracy and computational efficiency. Equipped with a fine-tuning operation, the performance of PBCNet reaches that of Schrödinger's FEP+, which is much more computationally intensive and requires substantial expert intervention. A further simulation-based experiment showed that active learning-optimized PBCNet may accelerate lead optimization campaigns by 473%. Finally, for the convenience of users, a web service for PBCNet is established to facilitate complex relative binding affinity prediction through an easy-to-operate graphical interface.
Affiliation(s)
- Jie Yu
- Drug Discovery and Design Center, State Key Laboratory of Drug Research, Shanghai Institute of Materia Medica, Chinese Academy of Sciences, Shanghai, China
- School of Information Science and Technology, Shanghai Tech University, Shanghai, China
- Lingang Laboratory, Shanghai, China
- Zhaojun Li
- College of Computer and Information Engineering, Dezhou University, Dezhou City, China
- Development Department, Suzhou Alphama Biotechnology Co., Ltd, Suzhou City, China
- Geng Chen
- Drug Discovery and Design Center, State Key Laboratory of Drug Research, Shanghai Institute of Materia Medica, Chinese Academy of Sciences, Shanghai, China
- University of Chinese Academy of Sciences, Beijing, China
- School of Pharmaceutical Science and Technology, Hangzhou Institute for Advanced Study, UCAS, Hangzhou, China
- Xiangtai Kong
- Drug Discovery and Design Center, State Key Laboratory of Drug Research, Shanghai Institute of Materia Medica, Chinese Academy of Sciences, Shanghai, China
- University of Chinese Academy of Sciences, Beijing, China
- Jie Hu
- School of Chinese Materia Medica, Nanjing University of Chinese Medicine, Nanjing, Jiangsu, China
- Dingyan Wang
- Drug Discovery and Design Center, State Key Laboratory of Drug Research, Shanghai Institute of Materia Medica, Chinese Academy of Sciences, Shanghai, China
- Lingang Laboratory, Shanghai, China
- Duanhua Cao
- Drug Discovery and Design Center, State Key Laboratory of Drug Research, Shanghai Institute of Materia Medica, Chinese Academy of Sciences, Shanghai, China
- Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, College of Pharmaceutical Sciences, Zhejiang University, Hangzhou, Zhejiang, China
- Yanbei Li
- Drug Discovery and Design Center, State Key Laboratory of Drug Research, Shanghai Institute of Materia Medica, Chinese Academy of Sciences, Shanghai, China
- University of Chinese Academy of Sciences, Beijing, China
- School of Pharmaceutical Science and Technology, Hangzhou Institute for Advanced Study, UCAS, Hangzhou, China
- Ruifeng Huo
- School of Chinese Materia Medica, Nanjing University of Chinese Medicine, Nanjing, Jiangsu, China
- Gang Wang
- Drug Discovery and Design Center, State Key Laboratory of Drug Research, Shanghai Institute of Materia Medica, Chinese Academy of Sciences, Shanghai, China
- University of Chinese Academy of Sciences, Beijing, China
- Xiaohong Liu
- Development Department, Suzhou Alphama Biotechnology Co., Ltd, Suzhou City, China
- Hualiang Jiang
- Drug Discovery and Design Center, State Key Laboratory of Drug Research, Shanghai Institute of Materia Medica, Chinese Academy of Sciences, Shanghai, China
- University of Chinese Academy of Sciences, Beijing, China
- School of Chinese Materia Medica, Nanjing University of Chinese Medicine, Nanjing, Jiangsu, China
- Xutong Li
- Drug Discovery and Design Center, State Key Laboratory of Drug Research, Shanghai Institute of Materia Medica, Chinese Academy of Sciences, Shanghai, China.
- University of Chinese Academy of Sciences, Beijing, China.
- Xiaomin Luo
- Drug Discovery and Design Center, State Key Laboratory of Drug Research, Shanghai Institute of Materia Medica, Chinese Academy of Sciences, Shanghai, China.
- University of Chinese Academy of Sciences, Beijing, China.
- Mingyue Zheng
- Drug Discovery and Design Center, State Key Laboratory of Drug Research, Shanghai Institute of Materia Medica, Chinese Academy of Sciences, Shanghai, China.
- University of Chinese Academy of Sciences, Beijing, China.
- State Key Laboratory of Pharmaceutical Biotechnology, Nanjing University, Nanjing, Jiangsu, China.
12
Zhang Y, Menke J, He J, Nittinger E, Tyrchan C, Koch O, Zhao H. Similarity-based pairing improves efficiency of siamese neural networks for regression tasks and uncertainty quantification. J Cheminform 2023; 15:75. [PMID: 37649050 PMCID: PMC10469421 DOI: 10.1186/s13321-023-00744-6] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/07/2022] [Accepted: 08/10/2023] [Indexed: 09/01/2023] [Open Access]
Abstract
Siamese networks, representing a novel class of neural networks, consist of two identical subnetworks sharing weights but receiving different inputs. Here we present a similarity-based pairing method for generating compound pairs to train Siamese neural networks for regression tasks. In comparison with the conventional exhaustive pairing, it reduces the algorithm complexity from O(n²) to O(n). It also consistently yields better prediction performance on the three physicochemical datasets, using a multilayer perceptron with the circular fingerprint as a proof of concept. We further incorporate the transformer-based Chemformer into a Siamese neural network, where it extracts task-specific features from the simplified molecular-input line-entry system (SMILES) representation of compounds. Additionally, we propose a means to measure the prediction uncertainty by utilizing the variance in predictions from a set of reference compounds. Our results demonstrate that high prediction accuracy correlates with high confidence. Finally, we investigate implications of the similarity property principle in machine learning.
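The difference in the number of training pairs can be seen in a small sketch that contrasts exhaustive pairing with pairing each compound to its single most similar neighbour. The Tanimoto similarity on sets of fingerprint bits, the one-neighbour rule, and the toy fingerprints are assumptions for illustration; the abstract does not give the authors' exact pairing procedure. Note that this naive neighbour search still compares all compounds, so only the number of generated pairs, not the search itself, is linear here.

```python
from itertools import permutations

def tanimoto(a, b):
    """Tanimoto similarity between two sets of 'on' fingerprint bits."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0

def exhaustive_pairs(n):
    """All ordered pairs (i, j) with i != j, i.e. n * (n - 1) pairs."""
    return list(permutations(range(n), 2))

def similarity_pairs(fingerprints):
    """Pair each compound with its most similar other compound: n pairs."""
    pairs = []
    for i, fp in enumerate(fingerprints):
        candidates = (k for k in range(len(fingerprints)) if k != i)
        j = max(candidates, key=lambda k: tanimoto(fp, fingerprints[k]))
        pairs.append((i, j))
    return pairs

# Hypothetical fingerprints given as sets of bit indices.
fingerprints = [
    frozenset({1, 2, 3}),
    frozenset({1, 2, 4}),
    frozenset({7, 8}),
    frozenset({7, 8, 9}),
]
```

For these four compounds exhaustive pairing produces 12 training pairs while similarity-based pairing produces 4, and the gap widens quadratically as the dataset grows.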
Affiliation(s)
- Yumeng Zhang
- Medicinal Chemistry, Research and Early Development, Respiratory and Immunology (R&I), BioPharmaceuticals R&D, AstraZeneca, 43183, Gothenburg, Sweden
- Department of Pharmaceutical Biosciences, Uppsala University, Uppsala, Sweden
- Janosch Menke
- Medicinal Chemistry, Research and Early Development, Respiratory and Immunology (R&I), BioPharmaceuticals R&D, AstraZeneca, 43183, Gothenburg, Sweden.
- Institute of Pharmaceutical and Medicinal Chemistry, Westfälische Wilhelms-Universität Münster, 48149, Münster, Germany.
- Jiazhen He
- Molecular AI, Discovery Sciences, R&D, AstraZeneca, 43183, Gothenburg, Sweden
- Eva Nittinger
- Medicinal Chemistry, Research and Early Development, Respiratory and Immunology (R&I), BioPharmaceuticals R&D, AstraZeneca, 43183, Gothenburg, Sweden
- Christian Tyrchan
- Medicinal Chemistry, Research and Early Development, Respiratory and Immunology (R&I), BioPharmaceuticals R&D, AstraZeneca, 43183, Gothenburg, Sweden
- Oliver Koch
- Institute of Pharmaceutical and Medicinal Chemistry, Westfälische Wilhelms-Universität Münster, 48149, Münster, Germany
- Hongtao Zhao
- Medicinal Chemistry, Research and Early Development, Respiratory and Immunology (R&I), BioPharmaceuticals R&D, AstraZeneca, 43183, Gothenburg, Sweden.
13
Meli R, Morris GM, Biggin PC. Scoring Functions for Protein-Ligand Binding Affinity Prediction using Structure-Based Deep Learning: A Review. Front Bioinform 2022; 2:885983. [PMID: 36187180 PMCID: PMC7613667 DOI: 10.3389/fbinf.2022.885983] [Citation(s) in RCA: 20] [Impact Index Per Article: 10.0] [Received: 02/28/2022] [Accepted: 05/11/2022] [Indexed: 01/01/2023] [Open Access]
Abstract
The rapid and accurate in silico prediction of protein-ligand binding free energies or binding affinities has the potential to transform drug discovery. In recent years, there has been a rapid growth of interest in deep learning methods for the prediction of protein-ligand binding affinities based on the structural information of protein-ligand complexes. These structure-based scoring functions often obtain better results than classical scoring functions when applied within their applicability domain. Here we review structure-based scoring functions for binding affinity prediction based on deep learning, focussing on different types of architectures, featurization strategies, data sets, methods for training and evaluation, and the role of explainable artificial intelligence in building useful models for real drug-discovery applications.
Affiliation(s)
- Rocco Meli
- Department of Biochemistry, University of Oxford, Oxford, United Kingdom
- Garrett M. Morris
- Department of Statistics, University of Oxford, Oxford, United Kingdom
- Philip C. Biggin
- Department of Biochemistry, University of Oxford, Oxford, United Kingdom