1. van Tilborg D, Grisoni F. Traversing chemical space with active deep learning for low-data drug discovery. Nature Computational Science 2024:10.1038/s43588-024-00697-2. [PMID: 39333789 DOI: 10.1038/s43588-024-00697-2] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/23/2024] [Accepted: 08/22/2024] [Indexed: 09/30/2024]
Abstract
Deep learning is accelerating drug discovery. However, current approaches are often affected by limitations in the available data, in terms of either size or molecular diversity. Active deep learning has high potential for low-data drug discovery, as it allows iterative model improvement during the screening process. However, there are several 'known unknowns' that limit the wider adoption of active deep learning in drug discovery: (1) what the best computational strategies are for chemical space exploration, (2) how active learning holds up to traditional, non-iterative, approaches and (3) how it should be used in the low-data scenarios typical of drug discovery. To provide answers, this study simulates a low-data drug discovery scenario, and systematically analyzes six active learning strategies combined with two deep learning architectures, on three large-scale molecular libraries. We identify the most important determinants of success in low-data regimes and show that active learning can achieve up to a sixfold improvement in hit discovery when compared with traditional screening methods.
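The iterative screening idea described in this abstract can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the function names, the batch sizes, and in particular the surrogate model, which is a simple difference-of-centroids scorer swapped in for the deep learning architectures the study evaluates. The acquisition rule is plain greedy exploitation, only one of many possible strategies.

```python
import numpy as np

def score_pool(features, is_hit, screened, pool, rng):
    """Score unscreened molecules with a toy surrogate (NOT a deep model).

    Hypothetical scorer: project pool molecules onto the direction that
    points from the mean of known non-hits to the mean of known hits.
    """
    x, y = features[screened], is_hit[screened]
    if y.all() or not y.any():
        # Only one class observed so far: fall back to random exploration.
        return rng.random(pool.size)
    direction = x[y].mean(axis=0) - x[~y].mean(axis=0)
    return features[pool] @ direction

def active_learning_screen(features, is_hit, n_start=8, batch_size=8,
                           n_cycles=5, seed=0):
    """Simulate a low-data screen: label a small random set, then repeat
    'fit surrogate, pick top-scoring batch, reveal labels' for n_cycles."""
    rng = np.random.default_rng(seed)
    n = len(features)
    screened = list(rng.choice(n, size=n_start, replace=False))
    for _ in range(n_cycles):
        pool = np.setdiff1d(np.arange(n), screened)
        if pool.size == 0:
            break
        scores = score_pool(features, is_hit, screened, pool, rng)
        # Greedy exploitation: take the highest-scoring unscreened molecules.
        picks = pool[np.argsort(-scores)[:batch_size]]
        screened.extend(picks.tolist())
    return screened
```

Here `features` is an array of molecular descriptors and `is_hit` a boolean array of (simulated) assay outcomes; comparing the hits among the returned indices with those found by a random draw of the same size gives a rough analogue of the retrospective comparison the abstract describes.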
Affiliation(s)
- Derek van Tilborg
  - Institute for Complex Molecular Systems (ICMS), Department of Biomedical Engineering, Eindhoven University of Technology, Eindhoven, The Netherlands
  - Centre for Living Technologies, Alliance TU/e, WUR, UU, UMC Utrecht, Utrecht, The Netherlands
- Francesca Grisoni
  - Institute for Complex Molecular Systems (ICMS), Department of Biomedical Engineering, Eindhoven University of Technology, Eindhoven, The Netherlands
  - Centre for Living Technologies, Alliance TU/e, WUR, UU, UMC Utrecht, Utrecht, The Netherlands
2. Ghislat G, Hernandez-Hernandez S, Piyawajanusorn C, Ballester PJ. Data-centric challenges with the application and adoption of artificial intelligence for drug discovery. Expert Opin Drug Discov 2024:1-11. [PMID: 39316009 DOI: 10.1080/17460441.2024.2403639] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/09/2024] [Accepted: 09/09/2024] [Indexed: 09/25/2024]
Abstract
INTRODUCTION: Artificial intelligence (AI) is exhibiting tremendous potential to reduce the massive costs and long timescales of drug discovery. There are, however, important challenges currently limiting the impact and scope of AI models. AREAS COVERED: In this perspective, the authors discuss a range of data issues (bias, inconsistency, skewness, irrelevance, small size, high dimensionality), how they challenge AI models, and which issue-specific mitigations have been effective. Next, they point out the challenges faced by uncertainty quantification techniques aimed at enhancing and trusting the predictions from these AI models. They also discuss how conceptual errors, unrealistic benchmarks and performance misestimation can confound the evaluation of models and thus their development. Lastly, the authors explain how human bias, whether from AI experts or drug discovery experts, constitutes another challenge that can be alleviated by gaining more prospective experience. EXPERT OPINION: AI models are often developed to excel on retrospective benchmarks unlikely to anticipate their prospective performance. As a result, only a few of these models are ever reported to have prospective value (e.g. by discovering potent and innovative drug leads for a therapeutic target). The authors have discussed what can go wrong in practice with AI for drug discovery. The authors hope that this will help inform the decisions of editors, funders, investors, and researchers working in this area.
Affiliation(s)
- Ghita Ghislat
  - Department of Life Sciences, Imperial College London, London, UK
3. Zhou G, Rusnac DV, Park H, Canzani D, Nguyen HM, Stewart L, Bush MF, Nguyen PT, Wulff H, Yarov-Yarovoy V, Zheng N, DiMaio F. An artificial intelligence accelerated virtual screening platform for drug discovery. Nat Commun 2024; 15:7761. [PMID: 39237523 PMCID: PMC11377542 DOI: 10.1038/s41467-024-52061-7] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/05/2024] [Accepted: 08/23/2024] [Indexed: 09/07/2024] [Open Access]
Abstract
Structure-based virtual screening is a key tool in early drug discovery, with growing interest in the screening of multi-billion chemical compound libraries. However, the success of virtual screening crucially depends on the accuracy of the binding pose and binding affinity predicted by computational docking. Here we develop a highly accurate structure-based virtual screen method, RosettaVS, for predicting docking poses and binding affinities. Our approach outperforms other state-of-the-art methods on a wide range of benchmarks, partially due to our ability to model receptor flexibility. We incorporate this into a new open-source artificial intelligence accelerated virtual screening platform for drug discovery. Using this platform, we screen multi-billion compound libraries against two unrelated targets, a ubiquitin ligase target KLHDC2 and the human voltage-gated sodium channel NaV1.7. For both targets, we discover hit compounds, including seven hits (14% hit rate) to KLHDC2 and four hits (44% hit rate) to NaV1.7, all with single digit micromolar binding affinities. Screening in both cases is completed in less than seven days. Finally, a high resolution X-ray crystallographic structure validates the predicted docking pose for the KLHDC2 ligand complex, demonstrating the effectiveness of our method in lead discovery.
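The hit counts and hit rates quoted above imply how many compounds were experimentally tested, a figure the abstract does not state. The short sketch below back-calculates it, assuming that hit rate means hits divided by compounds tested and that the quoted percentages are rounded; the function name is hypothetical.

```python
def implied_compounds_tested(hits: int, hit_rate: float) -> int:
    """Back-calculate compounds tested, assuming hit_rate = hits / tested."""
    return round(hits / hit_rate)

# Figures quoted in the abstract: (hits, hit rate) per target.
reported = {"KLHDC2": (7, 0.14), "NaV1.7": (4, 0.44)}

for target, (hits, rate) in reported.items():
    print(target, implied_compounds_tested(hits, rate))
```

Under these assumptions the figures correspond to roughly 50 compounds tested for KLHDC2 and 9 for NaV1.7; these are inferred values, not numbers reported in the abstract.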
Affiliation(s)
- Guangfeng Zhou
  - Department of Biochemistry, University of Washington, Seattle, WA, USA
  - Institute for Protein Design, University of Washington, Seattle, WA, USA
- Domnita-Valeria Rusnac
  - Howard Hughes Medical Institute, Department of Pharmacology, University of Washington, Seattle, WA, USA
- Hahnbeom Park
  - Brain Science Institute, Korea Institute of Science and Technology, Seoul, Republic of Korea
  - KIST-SKKU Brain Research Center, SKKU Institute for Convergence, Sungkyunkwan University, Suwon, Republic of Korea
- Daniele Canzani
  - Department of Chemistry, University of Washington, Seattle, WA, USA
- Hai Minh Nguyen
  - Department of Pharmacology, University of California Davis, Davis, CA, USA
- Lance Stewart
  - Institute for Protein Design, University of Washington, Seattle, WA, USA
- Matthew F Bush
  - Department of Chemistry, University of Washington, Seattle, WA, USA
- Phuong Tran Nguyen
  - Department of Physiology and Membrane Biology, University of California Davis, Davis, CA, USA
- Heike Wulff
  - Department of Pharmacology, University of California Davis, Davis, CA, USA
- Vladimir Yarov-Yarovoy
  - Department of Physiology and Membrane Biology, University of California Davis, Davis, CA, USA
  - Department of Anesthesiology and Pain Medicine, University of California Davis, Sacramento, CA, USA
- Ning Zheng
  - Howard Hughes Medical Institute, Department of Pharmacology, University of Washington, Seattle, WA, USA
- Frank DiMaio
  - Department of Biochemistry, University of Washington, Seattle, WA, USA
  - Institute for Protein Design, University of Washington, Seattle, WA, USA
4. Loeffler HH, Wan S, Klähn M, Bhati AP, Coveney PV. Optimal Molecular Design: Generative Active Learning Combining REINVENT with Precise Binding Free Energy Ranking Simulations. J Chem Theory Comput 2024; 20. [PMID: 39225482 PMCID: PMC11428133 DOI: 10.1021/acs.jctc.4c00576] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/29/2024] [Revised: 08/08/2024] [Accepted: 08/08/2024] [Indexed: 09/04/2024]
Abstract
Active learning (AL) is a specific instance of sequential experimental design and uses machine learning to intelligently choose the next data point or batch of molecular structures to be evaluated. In this sense, it closely mimics the iterative design-make-test-analysis cycle of laboratory experiments to find optimized compounds for a given design task. Here, we describe an AL protocol which combines generative molecular AI, using REINVENT, and physics-based absolute binding free energy molecular dynamics simulation, using ESMACS, to discover new ligands for two different target proteins, 3CLpro and TNKS2. We have deployed our generative active learning (GAL) protocol on Frontier, the world's only exa-scale machine. We show that the protocol can find higher-scoring molecules compared to the baseline, a surrogate ML docking model for 3CLpro and compounds with experimentally determined binding affinities for TNKS2. The ligands found are also chemically diverse and occupy a different chemical space than the baseline. We vary the batch sizes that are put forward for free energy assessment in each GAL cycle to assess the impact on their efficiency on the GAL protocol and recommend their optimal values in different scenarios. Overall, we demonstrate a powerful capability of the combination of physics-based and AI methods which yields effective chemical space sampling at an unprecedented scale and is of immediate and direct relevance to modern, data-driven drug discovery.
Affiliation(s)
- Hannes H. Loeffler
  - Molecular AI, Discovery Sciences, R&D, AstraZeneca, Mölndal 431 83, Sweden
- Shunzhou Wan
  - Centre for Computational Science, Department of Chemistry, University College London, London WC1H 0AJ, U.K.
- Marco Klähn
  - Molecular AI, Discovery Sciences, R&D, AstraZeneca, Mölndal 431 83, Sweden
- Agastya P. Bhati
  - Centre for Computational Science, Department of Chemistry, University College London, London WC1H 0AJ, U.K.
- Peter V. Coveney
  - Centre for Computational Science, Department of Chemistry, University College London, London WC1H 0AJ, U.K.
  - Advanced Research Computing Centre, University College London, London WC1H 0AJ, U.K.
  - Institute for Informatics, Faculty of Science, University of Amsterdam, Amsterdam 1098XH, The Netherlands
5. Pala D, Clark DE. Caught between a ROCK and a hard place: current challenges in structure-based drug design. Drug Discov Today 2024; 29:104106. [PMID: 39029868 DOI: 10.1016/j.drudis.2024.104106] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/11/2024] [Revised: 06/27/2024] [Accepted: 07/13/2024] [Indexed: 07/21/2024]
Abstract
The discipline of structure-based drug design (SBDD) is several decades old and it is tempting to think that the proliferation of experimental structures for many drug targets might make computer-aided drug design (CADD) straightforward. However, this is far from true. In this review, we illustrate some of the challenges that CADD scientists face every day in their work, even now. We use Rho-associated protein kinase (ROCK), and public domain structures and data, as an example to illustrate some of the challenges we have experienced during our project targeting this protein. We hope that this will help to prevent unrealistic expectations of what CADD can accomplish and to educate non-CADD scientists regarding the challenges still facing their CADD colleagues.
Affiliation(s)
- Daniele Pala
  - Medicinal Chemistry and Drug Design Technologies Department, Chiesi Farmaceutici S.p.A, Research Center, Largo Belloli 11/a, 43122 Parma, Italy
- David E Clark
  - Charles River, 6-9 Spire Green Centre, Flex Meadow, Harlow CM19 5TR, UK
6. Tom G, Schmid SP, Baird SG, Cao Y, Darvish K, Hao H, Lo S, Pablo-García S, Rajaonson EM, Skreta M, Yoshikawa N, Corapi S, Akkoc GD, Strieth-Kalthoff F, Seifrid M, Aspuru-Guzik A. Self-Driving Laboratories for Chemistry and Materials Science. Chem Rev 2024; 124:9633-9732. [PMID: 39137296 PMCID: PMC11363023 DOI: 10.1021/acs.chemrev.4c00055] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 08/15/2024]
Abstract
Self-driving laboratories (SDLs) promise an accelerated application of the scientific method. Through the automation of experimental workflows, along with autonomous experimental planning, SDLs hold the potential to greatly accelerate research in chemistry and materials discovery. This review provides an in-depth analysis of the state-of-the-art in SDL technology, its applications across various scientific disciplines, and the potential implications for research and industry. This review additionally provides an overview of the enabling technologies for SDLs, including their hardware, software, and integration with laboratory infrastructure. Most importantly, this review explores the diverse range of scientific domains where SDLs have made significant contributions, from drug discovery and materials science to genomics and chemistry. We provide a comprehensive review of existing real-world examples of SDLs, their different levels of automation, and the challenges and limitations associated with each domain.
Affiliation(s)
- Gary Tom
  - Department of Chemistry, University of Toronto, 80 St. George St, Toronto, Ontario M5S 3H6, Canada
  - Department of Computer Science, University of Toronto, 40 St. George St, Toronto, Ontario M5S 2E4, Canada
  - Vector Institute for Artificial Intelligence, 661 University Ave Suite 710, Toronto, Ontario M5G 1M1, Canada
- Stefan P. Schmid
  - Department of Chemistry and Applied Biosciences, ETH Zurich, Vladimir-Prelog-Weg 1, CH-8093 Zurich, Switzerland
- Sterling G. Baird
  - Acceleration Consortium, 80 St. George St, Toronto, Ontario M5S 3H6, Canada
- Yang Cao
  - Department of Chemistry, University of Toronto, 80 St. George St, Toronto, Ontario M5S 3H6, Canada
  - Department of Computer Science, University of Toronto, 40 St. George St, Toronto, Ontario M5S 2E4, Canada
  - Acceleration Consortium, 80 St. George St, Toronto, Ontario M5S 3H6, Canada
- Kourosh Darvish
  - Department of Computer Science, University of Toronto, 40 St. George St, Toronto, Ontario M5S 2E4, Canada
  - Vector Institute for Artificial Intelligence, 661 University Ave Suite 710, Toronto, Ontario M5G 1M1, Canada
  - Acceleration Consortium, 80 St. George St, Toronto, Ontario M5S 3H6, Canada
- Han Hao
  - Department of Chemistry, University of Toronto, 80 St. George St, Toronto, Ontario M5S 3H6, Canada
  - Department of Computer Science, University of Toronto, 40 St. George St, Toronto, Ontario M5S 2E4, Canada
  - Acceleration Consortium, 80 St. George St, Toronto, Ontario M5S 3H6, Canada
- Stanley Lo
  - Department of Chemistry, University of Toronto, 80 St. George St, Toronto, Ontario M5S 3H6, Canada
- Sergio Pablo-García
  - Department of Chemistry, University of Toronto, 80 St. George St, Toronto, Ontario M5S 3H6, Canada
  - Department of Computer Science, University of Toronto, 40 St. George St, Toronto, Ontario M5S 2E4, Canada
- Ella M. Rajaonson
  - Department of Chemistry, University of Toronto, 80 St. George St, Toronto, Ontario M5S 3H6, Canada
  - Vector Institute for Artificial Intelligence, 661 University Ave Suite 710, Toronto, Ontario M5G 1M1, Canada
- Marta Skreta
  - Department of Computer Science, University of Toronto, 40 St. George St, Toronto, Ontario M5S 2E4, Canada
  - Vector Institute for Artificial Intelligence, 661 University Ave Suite 710, Toronto, Ontario M5G 1M1, Canada
- Naruki Yoshikawa
  - Department of Computer Science, University of Toronto, 40 St. George St, Toronto, Ontario M5S 2E4, Canada
  - Vector Institute for Artificial Intelligence, 661 University Ave Suite 710, Toronto, Ontario M5G 1M1, Canada
- Samantha Corapi
  - Department of Chemistry, University of Toronto, 80 St. George St, Toronto, Ontario M5S 3H6, Canada
- Gun Deniz Akkoc
  - Forschungszentrum Jülich GmbH, Helmholtz Institute for Renewable Energy Erlangen-Nürnberg, Cauerstr. 1, 91058 Erlangen, Germany
  - Department of Chemical and Biological Engineering, Friedrich-Alexander Universität Erlangen-Nürnberg, Egerlandstr. 3, 91058 Erlangen, Germany
- Felix Strieth-Kalthoff
  - Department of Chemistry, University of Toronto, 80 St. George St, Toronto, Ontario M5S 3H6, Canada
  - Department of Computer Science, University of Toronto, 40 St. George St, Toronto, Ontario M5S 2E4, Canada
  - School of Mathematics and Natural Sciences, University of Wuppertal, Gaußstraße 20, 42119 Wuppertal, Germany
- Martin Seifrid
  - Department of Chemistry, University of Toronto, 80 St. George St, Toronto, Ontario M5S 3H6, Canada
  - Department of Computer Science, University of Toronto, 40 St. George St, Toronto, Ontario M5S 2E4, Canada
  - Department of Materials Science and Engineering, North Carolina State University, Raleigh, North Carolina 27695, United States of America
- Alán Aspuru-Guzik
  - Department of Chemistry, University of Toronto, 80 St. George St, Toronto, Ontario M5S 3H6, Canada
  - Department of Computer Science, University of Toronto, 40 St. George St, Toronto, Ontario M5S 2E4, Canada
  - Vector Institute for Artificial Intelligence, 661 University Ave Suite 710, Toronto, Ontario M5G 1M1, Canada
  - Acceleration Consortium, 80 St. George St, Toronto, Ontario M5S 3H6, Canada
  - Department of Chemical Engineering & Applied Chemistry, University of Toronto, Toronto, Ontario M5S 3E5, Canada
  - Department of Materials Science & Engineering, University of Toronto, Toronto, Ontario M5S 3E4, Canada
  - Lebovic Fellow, Canadian Institute for Advanced Research (CIFAR), 661 University Ave, Toronto, Ontario M5G 1M1, Canada
7. Crivelli-Decker J, Beckwith Z, Tom G, Le L, Khuttan S, Salomon-Ferrer R, Beall J, Gómez-Bombarelli R, Bortolato A. Machine Learning Guided AQFEP: A Fast and Efficient Absolute Free Energy Perturbation Solution for Virtual Screening. J Chem Theory Comput 2024; 20. [PMID: 39146234 PMCID: PMC11360131 DOI: 10.1021/acs.jctc.4c00399] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/27/2024] [Revised: 07/25/2024] [Accepted: 07/29/2024] [Indexed: 08/17/2024]
Abstract
Structure-based methods in drug discovery have become an integral part of the modern drug discovery process. The power of virtual screening lies in its ability to rapidly and cost-effectively explore enormous chemical spaces to select promising ligands for further experimental investigation. Relative free energy perturbation (RFEP) and similar methods are the gold standard for binding affinity prediction in drug discovery hit-to-lead and lead optimization phases, but have high computational cost and the requirement of a structural analog with a known activity. Without a reference molecule requirement, absolute FEP (AFEP) has, in theory, better accuracy for hit ID, but in practice, the slow throughput is not compatible with VS, where fast docking and unreliable scoring functions are still the standard. Here, we present an integrated workflow to virtually screen large and diverse chemical libraries efficiently, combining active learning with a physics-based scoring function based on a fast absolute free energy perturbation method. We validated the performance of the approach in the ranking of structurally related ligands, virtual screening hit rate enrichment, and active learning chemical space exploration; disclosing the largest reported collection of free energy simulations to date.
Affiliation(s)
- Zane Beckwith
  - SandboxAQ, Palo Alto, California 94301, United States
- Gary Tom
  - SandboxAQ, Palo Alto, California 94301, United States
  - Department of Chemistry and Department of Computer Science, University of Toronto, Toronto, ON M5S 3H6, Canada
  - Vector Institute for Artificial Intelligence, Toronto, ON M5S 3H6, Canada
- Ly Le
  - SandboxAQ, Palo Alto, California 94301, United States
- Sheenam Khuttan
  - SandboxAQ, Palo Alto, California 94301, United States
  - Department of Chemistry, Brooklyn College of the City University of New York, Brooklyn, New York 11367, United States
- Jackson Beall
  - SandboxAQ, Palo Alto, California 94301, United States
- Rafael Gómez-Bombarelli
  - Department of Materials Science and Engineering, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, United States
8. Muegge I, Bentzien J, Ge Y. Perspectives on current approaches to virtual screening in drug discovery. Expert Opin Drug Discov 2024:1-11. [PMID: 39132881 DOI: 10.1080/17460441.2024.2390511] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/29/2024] [Accepted: 08/06/2024] [Indexed: 08/13/2024]
Abstract
INTRODUCTION: For the past two decades, virtual screening (VS) has been an efficient hit finding approach for drug discovery. Today, billions of commercially accessible compounds are routinely screened, and many successful examples of VS have been reported. VS methods continue to evolve, including machine learning and physics-based methods. AREAS COVERED: The authors examine recent examples of VS in drug discovery and discuss prospective hit finding results from the critical assessment of computational hit-finding experiments (CACHE) challenge. The authors also highlight the cost considerations and open-source options for conducting VS and examine chemical space coverage and library selections for VS. EXPERT OPINION: The advancement of sophisticated VS approaches, including the use of machine learning techniques and increased computer resources as well as the ease of access to synthetically available chemical spaces, and commercial and open-source VS platforms allow for interrogating ultra-large libraries (ULL) of billions of molecules. An impressive number of prospective ULL VS campaigns have generated potent and structurally novel hits across many target classes. Nonetheless, many successful contemporary VS approaches still use considerably smaller focused libraries. This apparent dichotomy illustrates that VS is best conducted in a fit-for-purpose way choosing an appropriate chemical space. Better methods need to be developed to tackle more challenging targets.
Affiliation(s)
- Ingo Muegge
  - Research department, Alkermes, Inc, Waltham, MA, USA
- Jörg Bentzien
  - Research department, Alkermes, Inc, Waltham, MA, USA
- Yunhui Ge
  - Research department, Alkermes, Inc, Waltham, MA, USA
9. Kashafutdinova IM, Poyezzhayeva A, Gimadiev T, Madzhidov T. Active learning approaches in molecule pKi prediction. Mol Inform 2024:e202400154. [PMID: 39105614 DOI: 10.1002/minf.202400154] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/02/2024] [Revised: 06/21/2024] [Accepted: 06/23/2024] [Indexed: 08/07/2024]
Abstract
During the early stages of drug design, identifying compounds with suitable bioactivities is crucial. Given the vast array of potential drug databases, it is feasible to assay only a limited subset of candidates. The optimal method for selecting the candidates, aiming to minimize the overall number of assays, involves an active learning (AL) approach. In this work, we benchmarked a range of AL strategies with two main objectives: (1) to identify a strategy that ensures high model performance and (2) to select molecules with desired properties using minimal assays. To evaluate the different AL strategies, we employed a simulated AL workflow based on "virtual" experiments. These experiments leveraged ChEMBL datasets, which come with known biological activity values for the molecules. Furthermore, for classification tasks, we proposed a hybrid selection strategy that unifies both exploration and exploitation AL strategies into a single acquisition function, defined by parameters n and c. We have also shown that the popular minimal margin and maximal variance selection approaches for exploration selection correspond to minimization of the hybrid acquisition function with n=1 and n=2, respectively. The balance between the exploration and exploitation strategies can be adjusted using a coefficient (c), making the optimal strategy selection straightforward. The primary strength of the hybrid selection method lies in its adaptability; it offers the flexibility to adjust the criteria for molecule selection based on the specific task by modifying the value of the contribution coefficient. Our analysis revealed that, in regression tasks, AL strategies did not succeed at ensuring high model performance; however, they were successful in selecting molecules with desired properties using a minimal number of tests. In analogous experiments in classification tasks, the exploration strategy and the hybrid selection function with a constant c<1 (for n=1) and c≤0.2 (for n=2) were effective in achieving the goal of constructing a high-performance predictive model using minimal data. When searching for molecules with desired properties, exploitation and the hybrid function with c≥1 (n=1) and c≥0.7 (n=2) identified molecules in fewer iterations than the random selection method. Notably, when the hybrid function was set to an intermediate coefficient value (c=0.7), it successfully addressed both tasks simultaneously.
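The abstract does not give the formula of the hybrid acquisition function, so the sketch below uses an assumed form chosen only because it reproduces the stated limiting cases: the score |p - 0.5|^n - c·p is minimized, where p is the predicted probability that a molecule is active. With c=0, n=1 ranks by distance to the decision boundary (minimal margin) and n=2 is equivalent to maximizing the Bernoulli variance p(1-p), since (p - 0.5)^2 = 0.25 - p(1-p). The function names and this exact form are assumptions, not the authors' published definition.

```python
import numpy as np

def hybrid_score(p_active, n=1, c=0.0):
    """Assumed hybrid acquisition score (lower is better).

    The |p - 0.5|**n term rewards uncertain predictions (exploration);
    the -c * p term rewards confident actives (exploitation).
    """
    p = np.asarray(p_active, dtype=float)
    return np.abs(p - 0.5) ** n - c * p

def select_batch(p_active, batch_size, n=1, c=0.0):
    """Return indices of the batch_size molecules with the lowest score."""
    scores = hybrid_score(p_active, n=n, c=c)
    return np.argsort(scores, kind="stable")[:batch_size]
```

For predicted probabilities `[0.05, 0.45, 0.60, 0.95]`, `select_batch(p, 1, n=1, c=0.0)` picks the molecule closest to 0.5, while a large coefficient such as `c=2.0` switches the choice to the most confidently active molecule. In this assumed form with n=1, the switch from exploration to exploitation happens at c=1, which is in line with the thresholds quoted in the abstract.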
Affiliation(s)
- I M Kashafutdinova
  - A.M. Butlerov Institute of Chemistry, Kazan Federal University, Kazan, 420008, Russia
- A Poyezzhayeva
  - A.M. Butlerov Institute of Chemistry, Kazan Federal University, Kazan, 420008, Russia
- T Gimadiev
  - A.M. Butlerov Institute of Chemistry, Kazan Federal University, Kazan, 420008, Russia
- T Madzhidov
  - Chemistry Solutions, Elsevier, London, EC2Y 5AS, UK
10. Peng S, Rajjou L. Advancing plant biology through deep learning-powered natural language processing. Plant Cell Reports 2024; 43:208. [PMID: 39102077 DOI: 10.1007/s00299-024-03294-9] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/07/2024] [Accepted: 07/19/2024] [Indexed: 08/06/2024]
Abstract
The application of deep learning methods, specifically the utilization of Large Language Models (LLMs), in the field of plant biology holds significant promise for generating novel knowledge on plant cell systems. The LLM framework exhibits exceptional potential, particularly with the development of Protein Language Models (PLMs), allowing for in-depth analyses of nucleic acid and protein sequences. This analytical capacity facilitates the discernment of intricate patterns and relationships within biological data, encompassing multi-scale information within DNA or protein sequences. The contribution of PLMs extends beyond mere sequence patterns and structure-function recognition; it also supports advancements in genetic improvements for agriculture. The integration of deep learning approaches into the domain of plant sciences offers opportunities for major breakthroughs in basic research across multi-scale plant traits. Consequently, the strategic application of deep learning methodologies, particularly leveraging the potential of LLMs, will undoubtedly play a pivotal role in advancing plant sciences, plant production, plant uses and propelling the trajectory toward sustainable agroecological and agro-food transitions.
Affiliation(s)
- Shuang Peng
  - Université Paris-Saclay, INRAE, AgroParisTech, Institut Jean-Pierre Bourgin for Plant Sciences (IJPB), 78000, Versailles, France
- Loïc Rajjou
  - Université Paris-Saclay, INRAE, AgroParisTech, Institut Jean-Pierre Bourgin for Plant Sciences (IJPB), 78000, Versailles, France
11. Carlsson J, Luttens A. Structure-based virtual screening of vast chemical space as a starting point for drug discovery. Curr Opin Struct Biol 2024; 87:102829. [PMID: 38848655 DOI: 10.1016/j.sbi.2024.102829] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/28/2024] [Revised: 04/16/2024] [Accepted: 04/21/2024] [Indexed: 06/09/2024]
Abstract
Structure-based virtual screening aims to find molecules forming favorable interactions with a biological macromolecule using computational models of complexes. The recent surge of commercially available chemical space provides the opportunity to search for ligands of therapeutic targets among billions of compounds. This review offers a compact overview of structure-based virtual screens of vast chemical spaces, highlighting successful applications in early drug discovery for therapeutically important targets such as G protein-coupled receptors and viral enzymes. Emphasis is placed on strategies to explore ultra-large chemical libraries and synergies with emerging machine learning techniques. The current opportunities and future challenges of virtual screening are discussed, indicating that this approach will play an important role in the next-generation drug discovery pipeline.
Affiliation(s)
- Jens Carlsson
  - Science for Life Laboratory, Department of Cell and Molecular Biology, Uppsala University, BMC, Box 596, SE-751 24 Uppsala, Sweden
- Andreas Luttens
  - Institute for Medical Engineering & Science and Department of Biological Engineering, Massachusetts Institute of Technology, Cambridge, MA 02139, USA
  - Infectious Disease and Microbiome Program, Broad Institute of MIT and Harvard, Cambridge, MA 02142, USA
12. Chen S, Xie J, Ye R, Xu DD, Yang Y. Structure-aware dual-target drug design through collaborative learning of pharmacophore combination and molecular simulation. Chem Sci 2024; 15:10366-10380. [PMID: 38994407 PMCID: PMC11234869 DOI: 10.1039/d4sc00094c] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/05/2024] [Accepted: 06/09/2024] [Indexed: 07/13/2024] [Open Access]
Abstract
Dual-target drug design has gained significant attention in the treatment of complex diseases, such as cancers and autoimmune disorders. A widely employed design strategy is combining pharmacophores to leverage the knowledge of structure-activity relationships of both targets. Unfortunately, pharmacophore combination often struggles with long and expensive trial and error, because the protein pockets of the two targets impose complex structural constraints. In this study, we propose AIxFuse, a structure-aware dual-target drug design method that learns pharmacophore fusion patterns to satisfy the dual-target structural constraints simulated by molecular docking. AIxFuse employs two self-play reinforcement learning (RL) agents to learn pharmacophore selection and fusion from comprehensive feedback that includes dual-target molecular docking scores. In parallel, the molecular docking scores are learned by active learning (AL). Through collaborative RL and AL, AIxFuse learns to generate molecules with multiple desired properties. AIxFuse is shown to outperform state-of-the-art methods in generating dual-target drugs against glycogen synthase kinase-3 beta (GSK3β) and c-Jun N-terminal kinase 3 (JNK3). When applied to another task against retinoic acid receptor-related orphan receptor γ-t (RORγt) and dihydroorotate dehydrogenase (DHODH), AIxFuse exhibits consistent performance while the compared methods suffer from performance drops, leading to a fivefold higher success rate. Docking studies demonstrate that AIxFuse can generate molecules that concurrently satisfy the binding modes required by both targets. Further free energy perturbation calculations indicate that the generated candidates have promising binding free energies against both targets.
Affiliation(s)
- Sheng Chen
- School of Computer Science and Engineering, Sun Yat-sen University Guangzhou 510006 China
- AixplorerBio Inc. Jiaxing 314031 China
| | - Junjie Xie
- School of Computer Science and Engineering, Sun Yat-sen University Guangzhou 510006 China
- AixplorerBio Inc. Jiaxing 314031 China
| | | | | | - Yuedong Yang
- School of Computer Science and Engineering, Sun Yat-sen University Guangzhou 510006 China
|
13
|
Xia X, Liu Y, Zheng C, Zhang X, Wu Q, Gao X, Zeng X, Su Y. Evolutionary Multiobjective Molecule Optimization in an Implicit Chemical Space. J Chem Inf Model 2024; 64:5161-5174. [PMID: 38870455 PMCID: PMC11235097 DOI: 10.1021/acs.jcim.4c00031] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/05/2024] [Revised: 05/08/2024] [Accepted: 05/13/2024] [Indexed: 06/15/2024]
Abstract
Optimization techniques play a pivotal role in advancing drug development, serving as the foundation of numerous generative methods tailored to efficiently design optimized molecules derived from existing lead compounds. However, existing methods often encounter difficulties in generating diverse, novel, and high-property molecules that simultaneously optimize multiple drug properties. To overcome this bottleneck, we propose a multiobjective molecule optimization framework (MOMO). MOMO employs a specially designed Pareto-based multiproperty evaluation strategy at the molecular sequence level to guide the evolutionary search in an implicit chemical space. A comparative analysis of MOMO with five state-of-the-art methods across two benchmark multiproperty molecule optimization tasks reveals that MOMO markedly outperforms them in terms of diversity, novelty, and optimized properties. The practical applicability of MOMO in drug discovery has also been validated on four challenging tasks in the real-world discovery problem. These results suggest that MOMO can provide a useful tool to facilitate molecule optimization problems with multiple properties.
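The Pareto-based evaluation mentioned in this abstract can be illustrated with a minimal non-dominated filter. This is only a sketch of the general Pareto idea, not MOMO's actual evaluation strategy; the two objectives and the candidate scores below are hypothetical, and both objectives are assumed to be maximized.

```python
def dominates(a, b):
    """True if a is at least as good as b on every objective and strictly better on one."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))


def pareto_front(points):
    """Return the indices of points that no other point dominates."""
    return [
        i
        for i, p in enumerate(points)
        if not any(dominates(q, p) for j, q in enumerate(points) if j != i)
    ]


# Hypothetical (property_1, property_2) scores for five candidate molecules.
candidates = [(0.9, 0.2), (0.5, 0.5), (0.2, 0.9), (0.4, 0.4), (0.9, 0.1)]
front = pareto_front(candidates)
print(front)
```

Candidates on the front represent different trade-offs between the two properties; an evolutionary search would typically prefer them as parents over dominated candidates.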
Affiliation(s)
- Xin Xia
- The Key Laboratory of Intelligent Computing and Signal Processing of Ministry of Education, School of Artificial Intelligence, Anhui University, Hefei 230601, China
- Institute of Artificial Intelligence, Hefei Comprehensive National Science Center, 5089 Wangjiang West Road, Hefei 230088, Anhui, China
| | - Yiping Liu
- College of Computer Science and Electronic Engineering, Hunan University, Changsha 410012, China
| | - Chunhou Zheng
- The Key Laboratory of Intelligent Computing and Signal Processing of Ministry of Education, School of Artificial Intelligence, Anhui University, Hefei 230601, China
| | - Xingyi Zhang
- The Key Laboratory of Intelligent Computing and Signal Processing of Ministry of Education, School of Artificial Intelligence, Anhui University, Hefei 230601, China
| | - Qingwen Wu
- The Key Laboratory of Intelligent Computing and Signal Processing of Ministry of Education, School of Artificial Intelligence, Anhui University, Hefei 230601, China
| | - Xin Gao
- Computer Science Program, Computer, Electrical and Mathematical Sciences and Engineering Division, Computational Bioscience Research Center, King Abdullah University of Science and Technology (KAUST), Thuwal 23955-6900, Kingdom of Saudi Arabia
| | - Xiangxiang Zeng
- College of Computer Science and Electronic Engineering, Hunan University, Changsha 410012, China
| | - Yansen Su
- The Key Laboratory of Intelligent Computing and Signal Processing of Ministry of Education, School of Artificial Intelligence, Anhui University, Hefei 230601, China
- Institute of Artificial Intelligence, Hefei Comprehensive National Science Center, 5089 Wangjiang West Road, Hefei 230088, Anhui, China
|
14
|
Parkhill SL, Johnson EO. Integrating bacterial molecular genetics with chemical biology for renewed antibacterial drug discovery. Biochem J 2024; 481:839-864. [PMID: 38958473 PMCID: PMC11346456 DOI: 10.1042/bcj20220062] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/07/2024] [Revised: 06/20/2024] [Accepted: 06/24/2024] [Indexed: 07/04/2024]
Abstract
The application of dyes to understanding the aetiology of infection inspired antimicrobial chemotherapy and the first wave of antibacterial drugs. The second wave of antibacterial drug discovery was driven by rapid discovery of natural products, now making up 69% of current antibacterial drugs. But now with the most prevalent natural products already discovered, ∼10⁷ new soil-dwelling bacterial species must be screened to discover one new class of natural product. Therefore, instead of a third wave of antibacterial drug discovery, there is now a discovery bottleneck. Unlike natural products which are curated by billions of years of microbial antagonism, the vast synthetic chemical space still requires artificial curation through the therapeutics science of antibacterial drugs - a systematic understanding of how small molecules interact with bacterial physiology, effect desired phenotypes, and benefit the host. Bacterial molecular genetics can elucidate pathogen biology relevant to therapeutics development, but it can also be applied directly to understanding mechanisms and liabilities of new chemical agents with new mechanisms of action. Therefore, the next phase of antibacterial drug discovery could be enabled by integrating chemical expertise with systematic dissection of bacterial infection biology. Facing the ambitious endeavour to find new molecules from nature or new-to-nature which cure bacterial infections, the capabilities furnished by modern chemical biology and molecular genetics can be applied to prospecting for chemical modulators of new targets which circumvent prevalent resistance mechanisms.
Affiliation(s)
- Susannah L. Parkhill
- Systems Chemical Biology of Infection and Resistance Laboratory, The Francis Crick Institute, London, U.K
- Faculty of Life Sciences, University College London, London, U.K
| | - Eachan O. Johnson
- Systems Chemical Biology of Infection and Resistance Laboratory, The Francis Crick Institute, London, U.K
- Faculty of Life Sciences, University College London, London, U.K
- Department of Chemistry, Imperial College, London, U.K
- Department of Chemistry, King's College London, London, U.K
|
15
|
Kim MA, Ai Q, Norquist AJ, Schrier J, Chan EM. Active Learning of Ligands That Enhance Perovskite Nanocrystal Luminescence. ACS Nano 2024; 18:14514-14522. [PMID: 38776469 DOI: 10.1021/acsnano.4c02094] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 05/25/2024]
Abstract
Ligands play a critical role in the optical properties and chemical stability of colloidal nanocrystals (NCs), but identifying ligands that can enhance NC properties is daunting, given the high dimensionality of chemical space. Here, we use machine learning (ML) and robotic screening to accelerate the discovery of ligands that enhance the photoluminescence quantum yield (PLQY) of CsPbBr₃ perovskite NCs. We developed an ML model designed to predict the relative PL enhancement of perovskite NCs when coordinated with a ligand selected from a pool of 29,904 candidate molecules. Ligand candidates were selected using an active learning (AL) approach that accounted for uncertainty quantified by twin regressors. After eight experimental iterations of batch AL (corresponding to 21 initial and 72 model-recommended ligands), the uncertainty of the model decreased, demonstrating an increased confidence in the model predictions. Feature importance and counterfactual analyses of model predictions illustrate the potential use of ligand field strength in designing PL-enhancing ligands. Our versatile AL framework can be readily adapted to screen the effect of ligands on a wide range of colloidal nanomaterials.
Affiliation(s)
- Min A Kim
- The Molecular Foundry, Lawrence Berkeley National Laboratory, Berkeley, California 94720, United States
| | - Qianxiang Ai
- Department of Chemistry and Biochemistry, Fordham University, 441 E. Fordham Rd, The Bronx, New York 10458, United States
| | - Alexander J Norquist
- Department of Chemistry, Haverford College, 370 Lancaster Ave, Haverford, Pennsylvania 19041, United States
| | - Joshua Schrier
- Department of Chemistry and Biochemistry, Fordham University, 441 E. Fordham Rd, The Bronx, New York 10458, United States
| | - Emory M Chan
- The Molecular Foundry, Lawrence Berkeley National Laboratory, Berkeley, California 94720, United States
|
16
|
Retchin M, Wang Y, Takaba K, Chodera JD. DrugGym: A testbed for the economics of autonomous drug discovery. bioRxiv 2024:2024.05.28.596296. [PMID: 38854082 PMCID: PMC11160604 DOI: 10.1101/2024.05.28.596296] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/11/2024]
Abstract
Drug discovery is stochastic. The effectiveness of candidate compounds in satisfying design objectives is unknown ahead of time, and the tools used for prioritization (predictive models and assays) are inaccurate and noisy. In a typical discovery campaign, thousands of compounds may be synthesized and tested before design objectives are achieved, with many others ideated but deprioritized. These challenges are well-documented, but assessing potential remedies has been difficult. We introduce DrugGym, a framework for modeling the stochastic process of drug discovery. Emulating biochemical assays with realistic surrogate models, we simulate the progression from weak hits to sub-micromolar leads with viable ADME. We use this testbed to examine how different ideation, scoring, and decision-making strategies impact statistical measures of utility, such as the probability of program success within predefined budgets and the expected costs to achieve target candidate profile (TCP) goals. We also assess the influence of affinity model inaccuracy, chemical creativity, batch size, and multi-step reasoning. Our findings suggest that reducing affinity model inaccuracy from 2 to 0.5 pIC50 units improves budget-constrained success rates tenfold. DrugGym represents a realistic testbed for machine learning methods applied to the hit-to-lead phase. Source code is available at www.drug-gym.org.
Affiliation(s)
- Michael Retchin
- Tri-Institutional PhD Program in Computational Biology and Medicine, Weill Cornell Medical College, Cornell University, New York, NY 10065
| | - Yuanqing Wang
- Computational and Systems Biology Program, Sloan Kettering Institute, Memorial Sloan Kettering Cancer Center, New York, NY 10065
- Simons Center for Computational Chemistry and Center for Data Science, New York University, New York, NY 10004
| | - Kenichiro Takaba
- Computational and Systems Biology Program, Sloan Kettering Institute, Memorial Sloan Kettering Cancer Center, New York, NY 10065
- Pharmaceutical Research Center, Advanced Drug Discovery, Asahi Kasei Pharma Corporation, Shizuoka 410-2321, Japan
| | - John D. Chodera
- Tri-Institutional PhD Program in Computational Biology and Medicine, Weill Cornell Medical College, Cornell University, New York, NY 10065
- Computational and Systems Biology Program, Sloan Kettering Institute, Memorial Sloan Kettering Cancer Center, New York, NY 10065
|
17
|
Cavasotto CN, Di Filippo JI, Scardino V. Lessons learnt from machine learning in early stages of drug discovery. Expert Opin Drug Discov 2024; 19:631-633. [PMID: 38727031 DOI: 10.1080/17460441.2024.2354279] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/20/2024] [Accepted: 05/08/2024] [Indexed: 05/22/2024]
Affiliation(s)
- Claudio N Cavasotto
- Computational Drug Design and Biomedical Informatics Laboratory, Instituto de Investigaciones en Medicina Traslacional (IIMT), CONICET-Universidad Austral, Pilar, Buenos Aires, Argentina
- Facultad de Ciencias Biomédicas, Universidad Austral, Pilar, Buenos Aires, Argentina
- Austral Institute for Applied Artificial Intelligence, Universidad Austral, Pilar, Argentina
| | - Juan I Di Filippo
- Facultad de Ciencias Biomédicas, Universidad Austral, Pilar, Buenos Aires, Argentina
- Austral Institute for Applied Artificial Intelligence, Universidad Austral, Pilar, Argentina
- Meton AI, Inc, Wilmington, DE, USA
| | - Valeria Scardino
- Austral Institute for Applied Artificial Intelligence, Universidad Austral, Pilar, Argentina
- Meton AI, Inc, Wilmington, DE, USA
|
18
|
Bedart C, Simoben CV, Schapira M. Emerging structure-based computational methods to screen the exploding accessible chemical space. Curr Opin Struct Biol 2024; 86:102812. [PMID: 38603987 DOI: 10.1016/j.sbi.2024.102812] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/15/2024] [Revised: 03/15/2024] [Accepted: 03/16/2024] [Indexed: 04/13/2024]
Abstract
Structure-based virtual screening can be a valuable approach to computationally select hit candidates based on their predicted interaction with a protein of interest. The recent explosion in the size of chemical libraries increases the chances of hitting high-quality compounds during virtual screening exercises but also poses new challenges as the number of chemically accessible molecules grows faster than the computing power necessary to screen them. We review here two novel approaches rapidly gaining in popularity to address this problem: machine learning-accelerated and synthon-based library screening. We summarize the results from seminal proof-of-concept studies, highlight the latest developments, and discuss limitations and future directions.
Affiliation(s)
- Corentin Bedart
- Univ. Lille, Inserm, CHU Lille, U1286 - INFINITE - Institute for Translational Research in Inflammation, F-59000, Lille, France
| | - Conrad Veranso Simoben
- Structural Genomics Consortium, University of Toronto, 101 College Street, MaRS South Tower, Suite 700, Toronto, Ontario M5G 1L7, Canada
| | - Matthieu Schapira
- Structural Genomics Consortium, University of Toronto, 101 College Street, MaRS South Tower, Suite 700, Toronto, Ontario M5G 1L7, Canada; Department of Pharmacology and Toxicology, University of Toronto, 1 King's College Circle, Toronto, Ontario M5S 1A8, Canada.
|
19
|
Bande AY, Baday S. Accelerating Molecular Docking using Machine Learning Methods. Mol Inform 2024; 43:e202300167. [PMID: 38850231 DOI: 10.1002/minf.202300167] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/13/2023] [Accepted: 04/18/2023] [Indexed: 06/10/2024]
Abstract
Virtual screening (VS) is a well-established approach in drug discovery that speeds up the search for a bioactive molecule and reduces the costs and effort associated with experiments. VS helps to narrow down the chemical space to be searched and allows selecting fewer and more probable candidate compounds for experimental testing. Docking calculations are among the most commonly used and highly appreciated structure-based drug discovery methods. Databases of chemical structures of small molecules have been growing rapidly. However, at the moment virtual screening of large libraries via docking is not very common. In this work, we aim to accelerate docking studies by predicting docking scores without explicitly performing docking calculations. We experimented with an attention-based long short-term memory (LSTM) neural network for efficient prediction of docking scores, as well as other machine learning models such as XGBoost. By using docking scores of a small number of ligands, we trained our models and predicted docking scores of a few million molecules. Specifically, we tested our approaches on 11 datasets that were produced from in-house drug discovery studies. On average, by training models using only 7000 molecules, we predicted docking scores of approximately 3.8 million molecules with an R² (coefficient of determination) of 0.77 and a Spearman rank correlation coefficient of 0.85. We designed the system with ease of use in mind. All the user needs to provide is a csv file containing SMILES and their respective docking scores; the system then outputs a model that the user can use to predict the docking score of a new molecule.
Affiliation(s)
- Abdulsalam Y Bande
- Computer Science Department, Informatics Institute, Istanbul Technical University, Istanbul, Türkiye
| | - Sefer Baday
- Computer Science Department, Informatics Institute, Istanbul Technical University, Istanbul, Türkiye
- Applied Informatics Department, Informatics Institute, Istanbul Technical University, Istanbul, Türkiye
- Artificial Intelligence and Data Engineering Department, Faculty of Computer Informatics and Engineering, Istanbul Technical University, Istanbul, 34469, Türkiye
|
20
|
Wang L, Zhou Z, Yang X, Shi S, Zeng X, Cao D. The present state and challenges of active learning in drug discovery. Drug Discov Today 2024; 29:103985. [PMID: 38642700 DOI: 10.1016/j.drudis.2024.103985] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/03/2024] [Revised: 04/08/2024] [Accepted: 04/15/2024] [Indexed: 04/22/2024]
Abstract
Active learning (AL) is an iterative feedback process that efficiently identifies valuable data within vast chemical space, even with limited labeled data. This characteristic renders it a valuable approach to tackle the ongoing challenges faced in drug discovery, such as the ever-expanding exploration space and the limitations of labeled data. Consequently, AL is increasingly gaining prominence in the field of drug development. In this paper, we comprehensively review the application of AL at all stages of drug discovery, including compound-target interaction prediction, virtual screening, molecular generation and optimization, as well as molecular property prediction. Additionally, we discuss the challenges and prospects associated with the current applications of AL in drug discovery.
Affiliation(s)
- Lei Wang
- Xiangya School of Pharmaceutical Sciences, Central South University, Changsha 410013, Hunan, China
| | - Zhenran Zhou
- Department of Computer Science, Hunan University, Changsha 410082, Hunan, China
| | - Xixi Yang
- Department of Computer Science, Hunan University, Changsha 410082, Hunan, China
| | - Shaohua Shi
- Institute for Advancing Translational Medicine in Bone and Joint Diseases, School of Chinese Medicine, Hong Kong Baptist University, Hong Kong SAR, China
| | - Xiangxiang Zeng
- Department of Computer Science, Hunan University, Changsha 410082, Hunan, China.
| | - Dongsheng Cao
- Xiangya School of Pharmaceutical Sciences, Central South University, Changsha 410013, Hunan, China.
|
21
|
Zhang X, Shen C, Zhang H, Kang Y, Hsieh CY, Hou T. Advancing Ligand Docking through Deep Learning: Challenges and Prospects in Virtual Screening. Acc Chem Res 2024; 57:1500-1509. [PMID: 38577892 DOI: 10.1021/acs.accounts.4c00093] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 04/06/2024]
Abstract
Molecular docking, also termed ligand docking (LD), is a pivotal element of structure-based virtual screening (SBVS) used to predict the binding conformations and affinities of protein-ligand complexes. Traditional LD methodologies rely on a search and scoring framework, utilizing heuristic algorithms to explore binding conformations and scoring functions to evaluate binding strengths. However, to meet the efficiency demands of SBVS, these algorithms and functions are often simplified, prioritizing speed over accuracy. The emergence of deep learning (DL) has exerted a profound impact on diverse fields, ranging from natural language processing to computer vision and drug discovery. DeepMind's AlphaFold2 has impressively exhibited its ability to accurately predict protein structures solely from amino acid sequences, highlighting the remarkable potential of DL in conformation prediction. This groundbreaking advancement circumvents the traditional search-scoring frameworks in LD, enhancing both accuracy and processing speed and thereby catalyzing a broader adoption of DL algorithms in binding pose prediction. Nevertheless, a consensus on certain aspects remains elusive. In this Account, we delineate the current status of employing DL to augment LD within the VS paradigm, highlighting our contributions to this domain. Furthermore, we discuss the challenges and future prospects, drawing insights from our scholarly investigations. Initially, we present an overview of VS and LD, followed by an introduction to DL paradigms, which deviate significantly from traditional search-scoring frameworks. Subsequently, we delve into the challenges associated with the development of DL-based LD (DLLD), encompassing evaluation metrics, application scenarios, and physical plausibility of the predicted conformations. In the evaluation of LD algorithms, it is essential to recognize the multifaceted nature of the metrics. While the accuracy of binding pose prediction, often measured by the success rate, is a pivotal aspect, the scoring/screening power and computational speed of these algorithms are equally important given the pivotal role of LD tools in VS. Regarding application scenarios, early methods focused on blind docking, where the binding site is unknown. However, recent studies suggest a shift toward identifying binding sites rather than solely predicting binding poses within these models. In contrast, LD with a known pocket in VS has been shown to be more practical. Physical plausibility poses another significant challenge. Although DLLD models often achieve higher success rates compared to traditional methods, they may generate poses with implausible local structures, such as incorrect bond angles or lengths, which are disadvantageous for postprocessing tasks like visualization. Finally, we discuss the future perspectives for DLLD, emphasizing the need to improve generalization ability, strike a balance between speed and accuracy, account for protein conformation flexibility, and enhance physical plausibility. Additionally, we delve into the comparison between generative and regression algorithms in this context, exploring their respective strengths and potential.
Affiliation(s)
- Xujun Zhang
- College of Pharmaceutical Sciences, Zhejiang University, Hangzhou 310058, Zhejiang, China
- Hangzhou Carbonsilicon AI Technology Co., Ltd, Hangzhou 310018, Zhejiang, China
| | - Chao Shen
- College of Pharmaceutical Sciences, Zhejiang University, Hangzhou 310058, Zhejiang, China
- Hangzhou Carbonsilicon AI Technology Co., Ltd, Hangzhou 310018, Zhejiang, China
| | - Haotian Zhang
- College of Pharmaceutical Sciences, Zhejiang University, Hangzhou 310058, Zhejiang, China
- Hangzhou Carbonsilicon AI Technology Co., Ltd, Hangzhou 310018, Zhejiang, China
| | - Yu Kang
- College of Pharmaceutical Sciences, Zhejiang University, Hangzhou 310058, Zhejiang, China
| | - Chang-Yu Hsieh
- College of Pharmaceutical Sciences, Zhejiang University, Hangzhou 310058, Zhejiang, China
| | - Tingjun Hou
- College of Pharmaceutical Sciences, Zhejiang University, Hangzhou 310058, Zhejiang, China
|
22
|
Kumar N, Acharya V. Advances in machine intelligence-driven virtual screening approaches for big-data. Med Res Rev 2024; 44:939-974. [PMID: 38129992 DOI: 10.1002/med.21995] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/12/2022] [Revised: 07/15/2023] [Accepted: 10/29/2023] [Indexed: 12/23/2023]
Abstract
Virtual screening (VS) is an integral and ever-evolving domain of the drug discovery framework. VS is traditionally classified into ligand-based (LB) and structure-based (SB) approaches. Machine intelligence, or artificial intelligence, has wide applications in the drug discovery domain to reduce time and resource consumption. In combination with machine intelligence algorithms, VS has emerged as a progressive technology that learns within robust decision orders for data curation and hit molecule screening from large VS libraries in minutes or hours. The exponential growth of chemical and biological data into "big-data" in the public domain demands modern and advanced machine intelligence-driven VS approaches to screen hit molecules from ultra-large VS libraries. VS has evolved from individual approaches (LB and SB) to integrated LB and SB techniques that explore various ligand and target protein aspects for an enhanced rate of appropriate hit molecule prediction. Current trends demand advanced and intelligent solutions to handle enormous data in the drug discovery domain for screening and optimizing hits or leads with few or no false positive hits. Following the big-data drift and the tremendous growth in computational architecture, we present this review. The article categorizes and emphasizes individual VS techniques, details the literature on machine learning implementation and modern machine intelligence approaches, and deliberates limitations and future prospects.
Affiliation(s)
- Neeraj Kumar
- Artificial Intelligence for Computational Biology Lab (AICoB), Biotechnology Division, CSIR-Institute of Himalayan Bioresource Technology, Palampur, Himachal Pradesh, India
- Academy of Scientific and Innovative Research, Ghaziabad, India
| | - Vishal Acharya
- Artificial Intelligence for Computational Biology Lab (AICoB), Biotechnology Division, CSIR-Institute of Himalayan Bioresource Technology, Palampur, Himachal Pradesh, India
- Academy of Scientific and Innovative Research, Ghaziabad, India
|
23
|
Marin E, Kovaleva M, Kadukova M, Mustafin K, Khorn P, Rogachev A, Mishin A, Guskov A, Borshchevskiy V. Regression-Based Active Learning for Accessible Acceleration of Ultra-Large Library Docking. J Chem Inf Model 2024; 64:2612-2623. [PMID: 38157481 PMCID: PMC11005039 DOI: 10.1021/acs.jcim.3c01661] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/13/2023] [Revised: 11/28/2023] [Accepted: 12/04/2023] [Indexed: 01/03/2024]
Abstract
Structure-based drug discovery is a process for both hit finding and optimization that relies on a validated three-dimensional model of a target biomolecule, used to rationalize the structure-function relationship for this particular target. An ultralarge virtual screening approach has emerged recently for rapid discovery of high-affinity hit compounds, but it requires substantial computational resources. This study shows that active learning with simple linear regression models can accelerate virtual screening, retrieving up to 90% of the top-1% of the docking hit list after docking just 10% of the ligands. The results demonstrate that it is unnecessary to use complex models, such as deep learning approaches, to predict the imprecise results of ligand docking with a low sampling depth. Furthermore, we explore active learning meta-parameters and find that constant batch size models with a simple ensembling method provide the best ligand retrieval rate. Finally, our approach is validated on the ultralarge size virtual screening data set, retrieving 70% of the top-0.05% of ligands after screening only 2% of the library. Altogether, this work provides a computationally accessible approach for accelerated virtual screening that can serve as a blueprint for the future design of low-compute agents for exploration of the chemical space via large-scale accelerated docking. With recent breakthroughs in protein structure prediction, this method can significantly increase accessibility for the academic community and aid in the rapid discovery of high-affinity hit compounds for various targets.
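The loop this abstract describes (fit a simple linear model on the ligands docked so far, then dock the batch it ranks best) can be sketched as follows. This is a toy illustration under stated assumptions, not the authors' pipeline: the three-feature "library", the synthetic docking score, the constant batch size of 50, and the helper names `fit_linear`, `predict`, and `run_active_learning` are all hypothetical, and lower scores are assumed to be better.

```python
import random


def fit_linear(X, y, ridge=1e-6):
    """Least-squares fit of y ~ w.x + b via the normal equations (tiny ridge for stability)."""
    rows = [x + [1.0] for x in X]  # append a bias column
    d = len(rows[0])
    # Augmented system (A^T A + ridge*I | A^T y).
    A = [
        [sum(r[i] * r[j] for r in rows) + (ridge if i == j else 0.0) for j in range(d)]
        + [sum(r[i] * t for r, t in zip(rows, y))]
        for i in range(d)
    ]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(d):
        pivot = max(range(col, d), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        for r in range(d):
            if r != col:
                f = A[r][col] / A[col][col]
                A[r] = [a - f * b for a, b in zip(A[r], A[col])]
    return [A[i][d] / A[i][i] for i in range(d)]


def predict(w, x):
    """Linear prediction; the last weight is the bias."""
    return sum(wi * xi for wi, xi in zip(w, x)) + w[-1]


def run_active_learning(n_library=2000, init=50, batch=50, budget=200, seed=0):
    rng = random.Random(seed)
    X = [[rng.uniform(-1, 1) for _ in range(3)] for _ in range(n_library)]
    # Synthetic stand-in for a docking score (lower is better); an assumption of this sketch.
    true_w = [1.0, -2.0, 0.5]
    score = [sum(w * v for w, v in zip(true_w, x)) + rng.gauss(0, 0.1) for x in X]
    n_top = n_library // 100  # the top 1% of the full hit list
    top = set(sorted(range(n_library), key=score.__getitem__)[:n_top])
    labeled = set(rng.sample(range(n_library), init))  # random initial batch
    while len(labeled) < budget:
        idx = sorted(labeled)
        w = fit_linear([X[i] for i in idx], [score[i] for i in idx])
        pool = [i for i in range(n_library) if i not in labeled]
        pool.sort(key=lambda i: predict(w, X[i]))  # greedy: best predicted first
        labeled.update(pool[:batch])  # "dock" the next batch
    return len(top & labeled) / n_top


recall = run_active_learning()
print(f"top-1% recall after docking 10% of the library: {recall:.2f}")
```

Because the synthetic score is itself nearly linear, this toy overstates how well a linear surrogate would do on real docking data; it is meant only to show the shape of the select, dock, and refit cycle.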
Affiliation(s)
- Egor Marin
- Research Center for Molecular Mechanisms of Aging and Age-related Diseases, Moscow Institute of Physics and Technology, Dolgoprudny 141701, Russia
| | - Margarita Kovaleva
- Research Center for Molecular Mechanisms of Aging and Age-related Diseases, Moscow Institute of Physics and Technology, Dolgoprudny 141701, Russia
| | - Maria Kadukova
- Research Center for Molecular Mechanisms of Aging and Age-related Diseases, Moscow Institute of Physics and Technology, Dolgoprudny 141701, Russia
- University Grenoble Alpes, Inria, CNRS, Grenoble INP, LJK, 38000 Grenoble, France
| | - Khalid Mustafin
- Research Center for Molecular Mechanisms of Aging and Age-related Diseases, Moscow Institute of Physics and Technology, Dolgoprudny 141701, Russia
| | - Polina Khorn
- Research Center for Molecular Mechanisms of Aging and Age-related Diseases, Moscow Institute of Physics and Technology, Dolgoprudny 141701, Russia
| | - Andrey Rogachev
- Research Center for Molecular Mechanisms of Aging and Age-related Diseases, Moscow Institute of Physics and Technology, Dolgoprudny 141701, Russia
- Joint Institute for Nuclear Research, Dubna 141980, Russian Federation
| | - Alexey Mishin
- Research Center for Molecular Mechanisms of Aging and Age-related Diseases, Moscow Institute of Physics and Technology, Dolgoprudny 141701, Russia
| | - Albert Guskov
- Groningen Biomolecular Sciences and Biotechnology Institute, University of Groningen, Nijenborgh 4, 9747 AG Groningen, The Netherlands
| | - Valentin Borshchevskiy
- Research Center for Molecular Mechanisms of Aging and Age-related Diseases, Moscow Institute of Physics and Technology, Dolgoprudny 141701, Russia
- Joint Institute for Nuclear Research, Dubna 141980, Russian Federation
|
24
|
Gorantla R, Kubincová A, Suutari B, Cossins BP, Mey ASJS. Benchmarking Active Learning Protocols for Ligand-Binding Affinity Prediction. J Chem Inf Model 2024; 64:1955-1965. [PMID: 38446131 PMCID: PMC10966646 DOI: 10.1021/acs.jcim.4c00220] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/07/2024] [Accepted: 02/23/2024] [Indexed: 03/07/2024]
Abstract
Active learning (AL) has become a powerful tool in computational drug discovery, enabling the identification of top binders from vast molecular libraries. To design a robust AL protocol, it is important to understand the influence of AL parameters, as well as the features of the data sets on the outcomes. We use four affinity data sets for different targets (TYK2, USP7, D2R, Mpro) to systematically evaluate the performance of machine learning models [Gaussian process (GP) model and Chemprop model], sample selection protocols, and the batch size based on metrics describing the overall predictive power of the model (R², Spearman rank, root-mean-square error) as well as the accurate identification of top 2%/5% binders (Recall, F1 score). Both models have a comparable Recall of top binders on large data sets, but the GP model surpasses the Chemprop model when training data are sparse. A larger initial batch size, especially on diverse data sets, increased the Recall of both models as well as overall correlation metrics. However, for subsequent cycles, smaller batch sizes of 20 or 30 compounds proved to be desirable. Furthermore, adding artificial Gaussian noise to the data up to a certain threshold still allowed the model to identify clusters with top-scoring compounds. However, excessive noise (>1σ) did impact the model's predictive and exploitative capabilities.
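Two of the metrics named in this abstract, Spearman rank correlation and Recall of the top fraction of binders, can be computed as in the sketch below. The affinity values are made up for illustration, higher values are assumed to mean stronger binding, and the simple Spearman formula used here assumes no tied values; the function names are hypothetical.

```python
def ranks(values):
    """Rank positions (0 = smallest); assumes no tied values."""
    order = sorted(range(len(values)), key=values.__getitem__)
    r = [0] * len(values)
    for rank, i in enumerate(order):
        r[i] = rank
    return r


def spearman(a, b):
    """Spearman rank correlation via 1 - 6*sum(d^2) / (n*(n^2 - 1)), valid without ties."""
    n = len(a)
    d2 = sum((x - y) ** 2 for x, y in zip(ranks(a), ranks(b)))
    return 1 - 6 * d2 / (n * (n * n - 1))


def top_fraction_recall(true, pred, fraction):
    """Share of the true top-`fraction` compounds that the predictions also rank in the top."""
    k = max(1, round(len(true) * fraction))
    top_true = set(sorted(range(len(true)), key=true.__getitem__, reverse=True)[:k])
    top_pred = set(sorted(range(len(pred)), key=pred.__getitem__, reverse=True)[:k])
    return len(top_true & top_pred) / k


# Hypothetical measured and predicted affinities for ten compounds.
measured = [9.1, 7.4, 6.0, 8.2, 5.5, 6.8, 7.9, 5.1, 6.3, 8.8]
predicted = [8.7, 7.0, 6.2, 8.9, 5.0, 6.5, 8.0, 5.3, 6.1, 8.1]

rho = spearman(measured, predicted)
recall_top20 = top_fraction_recall(measured, predicted, 0.2)
print(rho, recall_top20)
```

The two numbers answer different questions: Spearman rank reflects ordering quality across the whole data set, while top-fraction Recall only looks at whether the best binders were found, which is why a benchmark may report both.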
Affiliation(s)
- Rohan Gorantla
- School of Informatics, University of Edinburgh, Edinburgh EH8 9AB, U.K.
- EaStCHEM School of Chemistry, University of Edinburgh, Edinburgh EH9 3FJ, U.K.
- Exscientia, Schrödinger Building, Oxford OX4 4GE, U.K.
25
Cao Z, Sciabola S, Wang Y. Large-Scale Pretraining Improves Sample Efficiency of Active Learning-Based Virtual Screening. J Chem Inf Model 2024; 64:1882-1891. [PMID: 38442000 DOI: 10.1021/acs.jcim.3c01938] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 03/07/2024]
Abstract
Virtual screening of large compound libraries to identify potential hit candidates is one of the earliest steps in drug discovery. As the size of commercially available compound collections grows exponentially to the scale of billions, active learning and Bayesian optimization have recently been proven as effective methods of narrowing down the search space. An essential component of those methods is a surrogate machine learning model that predicts the desired properties of compounds. An accurate model can achieve high sample efficiency by finding hits with only a fraction of the entire library being virtually screened. In this study, we examined the performance of a pretrained transformer-based language model and graph neural network in a Bayesian optimization active learning framework. The best pretrained model identifies 58.97% of the top-50,000 compounds after screening only 0.6% of an ultralarge library containing 99.5 million compounds, improving 8% over the previous state-of-the-art baseline. Through extensive benchmarks, we show that the superior performance of pretrained models persists in both structure-based and ligand-based drug discovery. Pretrained models can serve as a boost to the accuracy and sample efficiency of active learning-based virtual screening.
Affiliation(s)
- Zhonglin Cao
- Medicinal Chemistry, Biogen, Cambridge, Massachusetts 02142, United States
- Simone Sciabola
- Medicinal Chemistry, Biogen, Cambridge, Massachusetts 02142, United States
- Ye Wang
- Medicinal Chemistry, Biogen, Cambridge, Massachusetts 02142, United States
26
Pal S, Nare Z, Rao VA, Smith BO, Morrison I, Fitzgerald EA, Scott A, Bingham MJ, Pesnot T. Accelerating BRPF1b hit identification with BioPhysical and Active Learning Screening (BioPALS). ChemMedChem 2024; 19:e202300590. [PMID: 38372199 DOI: 10.1002/cmdc.202300590] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/29/2023] [Revised: 01/25/2024] [Accepted: 02/12/2024] [Indexed: 02/20/2024]
Abstract
We report the development of BioPhysical and Active Learning Screening (BioPALS); a rapid and versatile hit identification protocol combining AI-powered virtual screening with a GCI-driven biophysical confirmation workflow. Its application to the BRPF1b bromodomain afforded a range of novel micromolar binders with favorable ADMET properties. In addition to the excellent in silico/in vitro confirmation rate demonstrated with BRPF1b, binding kinetics were determined, and binding topologies predicted for all hits. BioPALS is a lean, data-rich, and standardized approach to hit identification applicable to a wide range of biological targets.
Affiliation(s)
- Sandeep Pal
- Concept Life Sciences, Frith Knoll Road, Chapel-en-le-Frith, SK23 0PG, High Peak, UK
- Zandile Nare
- Concept Life Sciences, Frith Knoll Road, Chapel-en-le-Frith, SK23 0PG, High Peak, UK
- Vincenzo A Rao
- Concept Life Sciences, Frith Knoll Road, Chapel-en-le-Frith, SK23 0PG, High Peak, UK
- Brian O Smith
- University of Glasgow, School of Molecular Biosciences, College of Medical Veterinary and Life Sciences, G12 8QQ, Glasgow, UK
- Ian Morrison
- Concept Life Sciences, Frith Knoll Road, Chapel-en-le-Frith, SK23 0PG, High Peak, UK
- Andrew Scott
- Concept Life Sciences, Frith Knoll Road, Chapel-en-le-Frith, SK23 0PG, High Peak, UK
- Matilda J Bingham
- Concept Life Sciences, Frith Knoll Road, Chapel-en-le-Frith, SK23 0PG, High Peak, UK
- Thomas Pesnot
- Concept Life Sciences, Frith Knoll Road, Chapel-en-le-Frith, SK23 0PG, High Peak, UK
27
Dodds M, Guo J, Löhr T, Tibo A, Engkvist O, Janet JP. Sample efficient reinforcement learning with active learning for molecular design. Chem Sci 2024; 15:4146-4160. [PMID: 38487235 PMCID: PMC10935729 DOI: 10.1039/d3sc04653b] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/03/2023] [Accepted: 02/07/2024] [Indexed: 03/17/2024]
Abstract
Reinforcement learning (RL) is a powerful and flexible paradigm for searching for solutions in high-dimensional action spaces. However, bridging the gap between playing computer games with thousands of simulated episodes and solving real scientific problems with complex and involved environments (up to actual laboratory experiments) requires improvements in terms of sample efficiency to make the most of expensive information. The discovery of new drugs is a major commercial application of RL, motivated by the very large nature of the chemical space and the need to perform multiparameter optimization (MPO) across different properties. In silico methods, such as virtual library screening (VS) and de novo molecular generation with RL, show great promise in accelerating this search. However, incorporation of increasingly complex computational models in these workflows requires increasing sample efficiency. Here, we introduce an active learning system linked with an RL model (RL-AL) for molecular design, which aims to improve the sample-efficiency of the optimization process. We identify and characterize unique challenges combining RL and AL, investigate the interplay between the systems, and develop a novel AL approach to solve the MPO problem. Our approach greatly expedites the search for novel solutions relative to baseline-RL for simple ligand- and structure-based oracle functions, with a 5-66-fold increase in hits generated for a fixed oracle budget and a 4-64-fold reduction in computational time to find a specific number of hits. Furthermore, compounds discovered through RL-AL display substantial enrichment of a multi-parameter scoring objective, indicating superior efficacy in curating high-scoring compounds, without a reduction in output diversity. This significant acceleration improves the feasibility of oracle functions that have largely been overlooked in RL due to high computational costs, for example free energy perturbation methods, and in principle is applicable to any RL domain.
Affiliation(s)
- Michael Dodds
- Molecular AI, Discovery Sciences, R&D, AstraZeneca 431 50 Gothenburg Sweden
- Jeff Guo
- Molecular AI, Discovery Sciences, R&D, AstraZeneca 431 50 Gothenburg Sweden
- Thomas Löhr
- Molecular AI, Discovery Sciences, R&D, AstraZeneca 431 50 Gothenburg Sweden
- Alessandro Tibo
- Molecular AI, Discovery Sciences, R&D, AstraZeneca 431 50 Gothenburg Sweden
- Ola Engkvist
- Molecular AI, Discovery Sciences, R&D, AstraZeneca 431 50 Gothenburg Sweden
- Jon Paul Janet
- Molecular AI, Discovery Sciences, R&D, AstraZeneca 431 50 Gothenburg Sweden
28
Esterhuizen JA, Mathur A, Goldsmith BR, Linic S. High-Performance Iridium-Molybdenum Oxide Electrocatalysts for Water Oxidation in Acid: Bayesian Optimization Discovery and Experimental Testing. J Am Chem Soc 2024; 146:5511-5522. [PMID: 38373924 DOI: 10.1021/jacs.3c13491] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 02/21/2024]
Abstract
Ir oxides are costly and scarce catalysts for oxygen evolution reaction (OER) in acid. There has been extensive interest in developing alternatives that are either Ir-free or require smaller amounts of Ir to drive the reactions at acceptable rates. One design strategy is to identify Ir-based mixed oxides that achieve similar performance while requiring smaller amounts of Ir. The obstacle to this strategy has been a very large phase space of the Ir-based mixed metal oxides, in terms of the metals combined with Ir and the different crystallographic structures of the mixed oxides, which prevents a thorough exploration of possible materials. In this work, we developed a workflow that uses machine-learning-aided Bayesian optimization in combination with density functional theory to make the exploration of this phase space plausible. This screening identified Mo as a promising dopant for forming acid-tolerant Ir-based oxides for the OER. We synthesized and characterized the Ir-Mo mixed oxides in the form of thin-film electrocatalysts with a known surface area. We show that these mixed oxides exhibited overpotentials ∼30 mV lower than a pure Ir control while maintaining 24% lower Ir dissolution rates than the Ir control. These findings suggest that Mo is a promising dopant and highlight the promise of machine learning to guide the experimental exploration and optimization of catalytic materials.
Affiliation(s)
- Jacques A Esterhuizen
- Department of Chemical Engineering, University of Michigan, Ann Arbor, Michigan 48109-2136, United States
- Catalysis Science and Technology Institute, University of Michigan, Ann Arbor, Michigan 48109-2136, United States
- Aarti Mathur
- Department of Chemical Engineering, University of Michigan, Ann Arbor, Michigan 48109-2136, United States
- Catalysis Science and Technology Institute, University of Michigan, Ann Arbor, Michigan 48109-2136, United States
- Bryan R Goldsmith
- Department of Chemical Engineering, University of Michigan, Ann Arbor, Michigan 48109-2136, United States
- Catalysis Science and Technology Institute, University of Michigan, Ann Arbor, Michigan 48109-2136, United States
- Suljo Linic
- Department of Chemical Engineering, University of Michigan, Ann Arbor, Michigan 48109-2136, United States
- Catalysis Science and Technology Institute, University of Michigan, Ann Arbor, Michigan 48109-2136, United States
29
Cheng C, Beroza P. Shape-Aware Synthon Search (SASS) for Virtual Screening of Synthon-Based Chemical Spaces. J Chem Inf Model 2024; 64:1251-1260. [PMID: 38335044 DOI: 10.1021/acs.jcim.3c01865] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 02/12/2024]
Abstract
Virtual screening of large-scale chemical libraries has become increasingly useful for identifying high-quality candidates for drug discovery. While it is possible to exhaustively screen chemical spaces that number on the order of billions, indirect combinatorial approaches are needed to efficiently navigate larger, synthon-based virtual spaces. We describe Shape-Aware Synthon Search (SASS), a synthon-based virtual screening method that carries out shape similarity searches in the synthon space instead of the enumerated product space. SASS can replicate results from exhaustive searches in ultralarge, combinatorial spaces with high recall on a variety of query molecules while only scoring a small subspace of possible enumerated products, thereby significantly accelerating large-scale, shape-based virtual screening.
Affiliation(s)
- Chen Cheng
- Discovery Chemistry, Genentech, South San Francisco, California 94080, United States
- Paul Beroza
- Discovery Chemistry, Genentech, South San Francisco, California 94080, United States
30
Klarich K, Goldman B, Kramer T, Riley P, Walters WP. Thompson Sampling: An Efficient Method for Searching Ultralarge Synthesis on Demand Databases. J Chem Inf Model 2024; 64:1158-1171. [PMID: 38316125 PMCID: PMC10900287 DOI: 10.1021/acs.jcim.3c01790] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/06/2023] [Revised: 01/22/2024] [Accepted: 01/23/2024] [Indexed: 02/07/2024]
Abstract
Over the last five years, virtual screening of ultralarge synthesis on-demand libraries has emerged as a powerful tool for hit identification in drug discovery programs. As these libraries have grown to tens of billions of molecules, we have reached a point where it is no longer cost-effective to screen every molecule virtually. To address these challenges, several groups have developed heuristic search methods to rapidly identify the best molecules on a virtual screen. This article describes the application of Thompson sampling (TS), an active learning approach that streamlines the virtual screening of large combinatorial libraries by performing a probabilistic search in the reagent space, thereby never requiring the full enumeration of the library. TS is a general technique that can be applied to various virtual screening modalities, including 2D and 3D similarity search, docking, and application of machine-learning models. In an illustrative example, we show that TS can identify more than half of the top 100 molecules from a docking-based virtual screen of 335 million molecules by evaluating 1% of the data set.
Affiliation(s)
- Kathryn Klarich
- ReNAgade Therapeutics, 640 Memorial Drive, Cambridge, Massachusetts 02139, United States
- Brian Goldman
- Relay Therapeutics, 399 Binney Street, Cambridge, Massachusetts 02141, United States
- Trevor Kramer
- Relay Therapeutics, 399 Binney Street, Cambridge, Massachusetts 02141, United States
- Patrick Riley
- Relay Therapeutics, 399 Binney Street, Cambridge, Massachusetts 02141, United States
- W. Patrick Walters
- Relay Therapeutics, 399 Binney Street, Cambridge, Massachusetts 02141, United States
31
Cheng F, Wang F, Tang J, Zhou Y, Fu Z, Zhang P, Haines JL, Leverenz JB, Gan L, Hu J, Rosen-Zvi M, Pieper AA, Cummings J. Artificial intelligence and open science in discovery of disease-modifying medicines for Alzheimer's disease. Cell Rep Med 2024; 5:101379. [PMID: 38382465 PMCID: PMC10897520 DOI: 10.1016/j.xcrm.2023.101379] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/23/2022] [Revised: 08/15/2023] [Accepted: 12/19/2023] [Indexed: 02/23/2024]
Abstract
The high failure rate of clinical trials in Alzheimer's disease (AD) and AD-related dementia (ADRD) is due to a lack of understanding of the pathophysiology of disease, and this deficit may be addressed by applying artificial intelligence (AI) to "big data" to rapidly and effectively expand therapeutic development efforts. Recent accelerations in computing power and availability of big data, including electronic health records and multi-omics profiles, have converged to provide opportunities for scientific discovery and treatment development. Here, we review the potential utility of applying AI approaches to big data for discovery of disease-modifying medicines for AD/ADRD. We illustrate how AI tools can be applied to the AD/ADRD drug development pipeline through collaborative efforts among neurologists, gerontologists, geneticists, pharmacologists, medicinal chemists, and computational scientists. AI and open data science expedite drug discovery and development of disease-modifying therapeutics for AD/ADRD and other neurodegenerative diseases.
Affiliation(s)
- Feixiong Cheng
- Genomic Medicine Institute, Lerner Research Institute, Cleveland Clinic, Cleveland, OH 44195, USA; Cleveland Clinic Genome Center, Lerner Research Institute, Cleveland Clinic, Cleveland, OH 44195, USA; Department of Molecular Medicine, Cleveland Clinic Lerner College of Medicine, Case Western Reserve University, Cleveland, OH 44195, USA.
- Fei Wang
- Department of Population Health Sciences, Weill Cornell Medical College, Cornell University, New York, NY 10065, USA
- Jian Tang
- Mila-Quebec Institute for Learning Algorithms and CIFAR AI Research Chair, HEC Montreal, Montréal, QC H3T 2A7, Canada
- Yadi Zhou
- Genomic Medicine Institute, Lerner Research Institute, Cleveland Clinic, Cleveland, OH 44195, USA
- Zhimin Fu
- Genomic Medicine Institute, Lerner Research Institute, Cleveland Clinic, Cleveland, OH 44195, USA; College of Pharmacy, Northeast Ohio Medical University, Rootstown, OH 44272, USA
- Pengyue Zhang
- Department of Biostatistics and Health Data Science, Indiana University, Indianapolis, IN 46037, USA
- Jonathan L Haines
- Cleveland Institute for Computational Biology, and Department of Population & Quantitative Health Sciences, Case Western Reserve University, Cleveland, OH 44106, USA
- James B Leverenz
- Lou Ruvo Center for Brain Health, Neurological Institute, Cleveland Clinic, Cleveland, OH 44195, USA
- Li Gan
- Helen and Robert Appel Alzheimer's Disease Research Institute, Brain and Mind Research Institute, Weill Cornell Medicine, New York, NY 10021, USA
- Jianying Hu
- IBM Research, Yorktown Heights, New York, NY 10598, USA
- Michal Rosen-Zvi
- AI for Accelerated Healthcare and Life Sciences Discovery, IBM Research Labs, Haifa 3498825, Israel; Faculty of Medicine, The Hebrew University of Jerusalem, Jerusalem 9190500, Israel
- Andrew A Pieper
- Brain Health Medicines Center, Harrington Discovery Institute, University Hospitals Cleveland Medical Center, Cleveland, OH, 44106, USA; Department of Psychiatry, Case Western Reserve University, Cleveland, OH 44106, USA; Geriatric Psychiatry, GRECC, Louis Stokes Cleveland VA Medical Center, Cleveland, OH 44106, USA; Institute for Transformative Molecular Medicine, School of Medicine, Case Western Reserve University, Cleveland OH 44106, USA; Department of Pathology, Case Western Reserve University, School of Medicine, Cleveland, OH, 44106, USA; Department of Neurosciences, Case Western Reserve University, School of Medicine, Cleveland, OH 44106, USA
- Jeffrey Cummings
- Chambers-Grundy Center for Transformative Neuroscience, Department of Brain Health, School of Integrated Health Sciences, UNLV, Las Vegas, NV 89154, USA
32
Nam K, Shao Y, Major DT, Wolf-Watz M. Perspectives on Computational Enzyme Modeling: From Mechanisms to Design and Drug Development. ACS Omega 2024; 9:7393-7412. [PMID: 38405524 PMCID: PMC10883025 DOI: 10.1021/acsomega.3c09084] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/14/2023] [Revised: 01/15/2024] [Accepted: 01/19/2024] [Indexed: 02/27/2024]
Abstract
Understanding enzyme mechanisms is essential for unraveling the complex molecular machinery of life. In this review, we survey the field of computational enzymology, highlighting key principles governing enzyme mechanisms and discussing ongoing challenges and promising advances. Over the years, computer simulations have become indispensable in the study of enzyme mechanisms, with the integration of experimental and computational exploration now established as a holistic approach to gain deep insights into enzymatic catalysis. Numerous studies have demonstrated the power of computer simulations in characterizing reaction pathways, transition states, substrate selectivity, product distribution, and dynamic conformational changes for various enzymes. Nevertheless, significant challenges remain in investigating the mechanisms of complex multistep reactions, large-scale conformational changes, and allosteric regulation. Beyond mechanistic studies, computational enzyme modeling has emerged as an essential tool for computer-aided enzyme design and the rational discovery of covalent drugs for targeted therapies. Overall, enzyme design/engineering and covalent drug development can greatly benefit from our understanding of the detailed mechanisms of enzymes, such as protein dynamics, entropy contributions, and allostery, as revealed by computational studies. Such a convergence of different research approaches is expected to continue, creating synergies in enzyme research. This review, by outlining the ever-expanding field of enzyme research, aims to provide guidance for future research directions and facilitate new developments in this important and evolving field.
Affiliation(s)
- Kwangho Nam
- Department of Chemistry and Biochemistry, University of Texas at Arlington, Arlington, Texas 76019, United States
- Yihan Shao
- Department of Chemistry and Biochemistry, University of Oklahoma, Norman, Oklahoma 73019-5251, United States
- Dan T. Major
- Department of Chemistry and Institute for Nanotechnology & Advanced Materials, Bar-Ilan University, Ramat-Gan 52900, Israel
33
Patel RA, Webb MA. Data-Driven Design of Polymer-Based Biomaterials: High-throughput Simulation, Experimentation, and Machine Learning. ACS Applied Bio Materials 2024; 7:510-527. [PMID: 36701125 DOI: 10.1021/acsabm.2c00962] [Citation(s) in RCA: 5] [Impact Index Per Article: 5.0] [Indexed: 01/27/2023]
Abstract
Polymers, with the capacity to tunably alter properties and response based on manipulation of their chemical characteristics, are attractive components in biomaterials. Nevertheless, their potential as functional materials is also inhibited by their complexity, which complicates rational or brute-force design and realization. In recent years, machine learning has emerged as a useful tool for facilitating materials design via efficient modeling of structure-property relationships in the chemical domain of interest. In this Spotlight, we discuss the emergence of data-driven design of polymers that can be deployed in biomaterials with particular emphasis on complex copolymer systems. We outline recent developments, as well as our own contributions and takeaways, related to high-throughput data generation for polymer systems, methods for surrogate modeling by machine learning, and paradigms for property optimization and design. Throughout this discussion, we highlight key aspects of successful strategies and other considerations that will be relevant to the future design of polymer-based biomaterials with target properties.
Affiliation(s)
- Roshan A Patel
- Department of Chemical and Biological Engineering, Princeton University, Princeton, New Jersey 08540, United States
- Michael A Webb
- Department of Chemical and Biological Engineering, Princeton University, Princeton, New Jersey 08540, United States
34
Kyro GW, Morgunov A, Brent RI, Batista VS. ChemSpaceAL: An Efficient Active Learning Methodology Applied to Protein-Specific Molecular Generation. J Chem Inf Model 2024; 64:653-665. [PMID: 38287889 DOI: 10.1021/acs.jcim.3c01456] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 01/31/2024]
Abstract
The incredible capabilities of generative artificial intelligence models have inevitably led to their application in the domain of drug discovery. Within this domain, the vastness of chemical space motivates the development of more efficient methods for identifying regions with molecules that exhibit desired characteristics. In this work, we present a computationally efficient active learning methodology and demonstrate its applicability to targeted molecular generation. When applied to c-Abl kinase, a protein with FDA-approved small-molecule inhibitors, the model learns to generate molecules similar to the inhibitors without prior knowledge of their existence and even reproduces two of them exactly. We also show that the methodology is effective for a protein without any commercially available small-molecule inhibitors, the HNH domain of the CRISPR-associated protein 9 (Cas9) enzyme. To facilitate implementation and reproducibility, we made all of our software available through the open-source ChemSpaceAL Python package.
Affiliation(s)
- Gregory W Kyro
- Department of Chemistry, Yale University, New Haven, Connecticut 06511-8499, United States
- Anton Morgunov
- Department of Chemistry, Yale University, New Haven, Connecticut 06511-8499, United States
- Rafael I Brent
- Department of Chemistry, Yale University, New Haven, Connecticut 06511-8499, United States
- Victor S Batista
- Department of Chemistry, Yale University, New Haven, Connecticut 06511-8499, United States
35
Gangwal A, Ansari A, Ahmad I, Azad AK, Kumarasamy V, Subramaniyan V, Wong LS. Generative artificial intelligence in drug discovery: basic framework, recent advances, challenges, and opportunities. Front Pharmacol 2024; 15:1331062. [PMID: 38384298 PMCID: PMC10879372 DOI: 10.3389/fphar.2024.1331062] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/07/2023] [Accepted: 01/17/2024] [Indexed: 02/23/2024]
Abstract
There are two main ways to discover or design small drug molecules. The first involves fine-tuning existing molecules or commercially successful drugs through quantitative structure-activity relationships and virtual screening. The second approach involves generating new molecules through de novo drug design or inverse quantitative structure-activity relationship. Both methods aim to get a drug molecule with the best pharmacokinetic and pharmacodynamic profiles. However, bringing a new drug to market is an expensive and time-consuming endeavor, with the average cost being estimated at around $2.5 billion. One of the biggest challenges is screening the vast number of potential drug candidates to find one that is both safe and effective. The development of artificial intelligence in recent years has been phenomenal, ushering in a revolution in many fields. The field of pharmaceutical sciences has also significantly benefited from multiple applications of artificial intelligence, especially drug discovery projects. Artificial intelligence models are finding use in molecular property prediction, molecule generation, virtual screening, synthesis planning, repurposing, among others. Lately, generative artificial intelligence has gained popularity across domains for its ability to generate entirely new data, such as images, sentences, audios, videos, novel chemical molecules, etc. Generative artificial intelligence has also delivered promising results in drug discovery and development. This review article delves into the fundamentals and framework of various generative artificial intelligence models in the context of drug discovery via de novo drug design approach. Various basic and advanced models have been discussed, along with their recent applications. 
The review also explores recent examples and advances in the generative artificial intelligence approach, as well as the challenges and ongoing efforts to fully harness the potential of generative artificial intelligence in generating novel drug molecules in a faster and more affordable manner. Some clinical-level assets generated from generative artificial intelligence have also been discussed in this review to show the ever-increasing application of artificial intelligence in drug discovery through commercial partnerships.
Affiliation(s)
- Amit Gangwal
- Department of Natural Product Chemistry, Shri Vile Parle Kelavani Mandal’s Institute of Pharmacy, Dhule, Maharashtra, India
- Azim Ansari
- Computer Aided Drug Design Center Shri Vile Parle Kelavani Mandal’s Institute of Pharmacy, Dhule, Maharashtra, India
- Iqrar Ahmad
- Department of Pharmaceutical Chemistry, Prof. Ravindra Nikam College of Pharmacy, Dhule, India
- Abul Kalam Azad
- Faculty of Pharmacy, University College of MAIWP International, Batu Caves, Malaysia
- Vinoth Kumarasamy
- Department of Parasitology and Medical Entomology, Faculty of Medicine, Universiti Kebangsaan Malaysia, Cheras, Malaysia
- Vetriselvan Subramaniyan
- Pharmacology Unit, Jeffrey Cheah School of Medicine and Health Sciences, Monash University Malaysia, Selangor, Malaysia
- School of Bioengineering and Biosciences, Lovely Professional University, Phagwara, Punjab, India
- Ling Shing Wong
- Faculty of Health and Life Sciences, INTI International University, Nilai, Malaysia
36
Priyadarshini MS, Romiluyi O, Wang Y, Miskin K, Ganley C, Clancy P. PAL 2.0: a physics-driven Bayesian optimization framework for material discovery. Materials Horizons 2024; 11:781-791. [PMID: 37997168 DOI: 10.1039/d3mh01474f] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/25/2023]
Abstract
The lack of efficient discovery tools for advanced functional materials remains a major bottleneck to enabling advances in the next-generation energy, health, and sustainability technologies. One main factor contributing to this inefficiency is the large combinatorial space of materials (with respect to material compositions and processing conditions) that is typically redolent of such materials-centric applications. Searches of this large combinatorial space are often influenced by expert knowledge and clustered close to material configurations that are known to perform well, thus ignoring potentially high-performing candidates in unanticipated regions of the composition-space or processing protocol. Moreover, experimental characterization or first principles quantum mechanical calculations of all possible material candidates can be prohibitively expensive, making exhaustive approaches to determine the best candidates infeasible. As a result, there remains a need for the development of computational algorithms that can efficiently search a large parameter space for a given material application. Here, we introduce PAL 2.0, a method that combines a physics-based surrogate model with Bayesian optimization. The key contributing factor of our proposed framework is the ability to create a physics-based hypothesis using XGBoost and Neural Networks. This hypothesis provides a physics-based "prior" (or initial beliefs) to a Gaussian process model, which is then used to perform a search of the material design space. In this paper, we demonstrate the usefulness of our approach on three material test cases: (1) discovery of metal halide perovskites with desired photovoltaic properties, (2) design of metal halide perovskite-solvent pairs that produce the best solution-processed films and (3) design of organic thermoelectric semiconductors. 
Our results indicate that the novel PAL 2.0 approach outperforms other state-of-the-art methods in its efficiency to search the material design space for the optimal candidate. We also demonstrate the physics-based surrogate models constructed in PAL 2.0 have lower prediction errors for material compositions not seen by the model. To the best of our knowledge, there is no competing algorithm capable of this useful combination for materials discovery, especially those for which data are scarce.
Affiliation(s)
- Maitreyee Sharma Priyadarshini
- Department of Chemical and Biomolecular Engineering, Johns Hopkins University, 3400 North Charles Street, Baltimore, 21218, Maryland, USA.
| | - Oluwaseun Romiluyi
- Department of Chemical and Biomolecular Engineering, Johns Hopkins University, 3400 North Charles Street, Baltimore, 21218, Maryland, USA.
| | - Yiran Wang
- Department of Chemical and Biomolecular Engineering, Johns Hopkins University, 3400 North Charles Street, Baltimore, 21218, Maryland, USA.
| | - Kumar Miskin
- Department of Materials Science and Engineering, Johns Hopkins University, 3400 North Charles Street, Baltimore, 21218, Maryland, USA
| | - Connor Ganley
- Department of Chemical and Biomolecular Engineering, Johns Hopkins University, 3400 North Charles Street, Baltimore, 21218, Maryland, USA.
| | - Paulette Clancy
- Department of Chemical and Biomolecular Engineering, Johns Hopkins University, 3400 North Charles Street, Baltimore, 21218, Maryland, USA.
|
37
|
Tropsha A, Isayev O, Varnek A, Schneider G, Cherkasov A. Integrating QSAR modelling and deep learning in drug discovery: the emergence of deep QSAR. Nat Rev Drug Discov 2024; 23:141-155. [PMID: 38066301 DOI: 10.1038/s41573-023-00832-0] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Accepted: 10/21/2023] [Indexed: 02/08/2024]
Abstract
Quantitative structure-activity relationship (QSAR) modelling, an approach that was introduced 60 years ago, is widely used in computer-aided drug design. In recent years, progress in artificial intelligence techniques, such as deep learning, the rapid growth of databases of molecules for virtual screening and dramatic improvements in computational power have supported the emergence of a new field of QSAR applications that we term 'deep QSAR'. Marking a decade from the pioneering applications of deep QSAR to tasks involved in small-molecule drug discovery, we herein describe key advances in the field, including deep generative and reinforcement learning approaches in molecular design, deep learning models for synthetic planning and the application of deep QSAR models in structure-based virtual screening. We also reflect on the emergence of quantum computing, which promises to further accelerate deep QSAR applications and the need for open-source and democratized resources to support computer-aided drug design.
Affiliation(s)
| | | | | | | | - Artem Cherkasov
- University of British Columbia, Vancouver, BC, Canada.
- Photonic Inc., Coquitlam, BC, Canada.
|
38
|
Cai H, Shen C, Jian T, Zhang X, Chen T, Han X, Yang Z, Dang W, Hsieh CY, Kang Y, Pan P, Ji X, Song J, Hou T, Deng Y. CarsiDock: a deep learning paradigm for accurate protein-ligand docking and screening based on large-scale pre-training. Chem Sci 2024; 15:1449-1471. [PMID: 38274053 PMCID: PMC10806797 DOI: 10.1039/d3sc05552c] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/19/2023] [Accepted: 12/18/2023] [Indexed: 01/27/2024] Open
Abstract
The expertise accumulated in deep neural network-based structure prediction has been widely transferred to the field of protein-ligand binding pose prediction, thus leading to the emergence of a variety of deep learning-guided docking models for predicting protein-ligand binding poses without relying on heavy sampling. However, their prediction accuracy and applicability are still far from satisfactory, partially due to the lack of protein-ligand binding complex data. To this end, we create a large-scale complex dataset containing ∼9 M protein-ligand docking complexes for pre-training, and propose CarsiDock, the first deep learning-guided docking approach that leverages pre-training of millions of predicted protein-ligand complexes. CarsiDock contains two main stages, i.e., a deep learning model for the prediction of protein-ligand atomic distance matrices, and a translation, rotation and torsion-guided geometry optimization procedure to reconstruct the matrices into a credible binding pose. The pre-training and multiple innovative architectural designs facilitate the dramatically improved docking accuracy of our approach over the baselines in terms of multiple docking scenarios, thereby contributing to its outstanding early recognition performance in several retrospective virtual screening campaigns. Further explorations demonstrate that CarsiDock can not only guarantee the topological reliability of the binding poses but also successfully reproduce the crucial interactions in crystalized structures, highlighting its superior applicability.
Affiliation(s)
- Heng Cai
- Hangzhou Carbonsilicon AI Technology Co., Ltd Hangzhou 310018 Zhejiang China
| | - Chao Shen
- Hangzhou Carbonsilicon AI Technology Co., Ltd Hangzhou 310018 Zhejiang China
- Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, College of Pharmaceutical Sciences, Zhejiang University Hangzhou 310058 Zhejiang China
| | - Tianye Jian
- Hangzhou Carbonsilicon AI Technology Co., Ltd Hangzhou 310018 Zhejiang China
| | - Xujun Zhang
- Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, College of Pharmaceutical Sciences, Zhejiang University Hangzhou 310058 Zhejiang China
| | - Tong Chen
- Hangzhou Carbonsilicon AI Technology Co., Ltd Hangzhou 310018 Zhejiang China
| | - Xiaoqi Han
- Hangzhou Carbonsilicon AI Technology Co., Ltd Hangzhou 310018 Zhejiang China
| | - Zhuo Yang
- Hangzhou Carbonsilicon AI Technology Co., Ltd Hangzhou 310018 Zhejiang China
| | - Wei Dang
- Hangzhou Carbonsilicon AI Technology Co., Ltd Hangzhou 310018 Zhejiang China
| | - Chang-Yu Hsieh
- Hangzhou Carbonsilicon AI Technology Co., Ltd Hangzhou 310018 Zhejiang China
- Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, College of Pharmaceutical Sciences, Zhejiang University Hangzhou 310058 Zhejiang China
| | - Yu Kang
- Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, College of Pharmaceutical Sciences, Zhejiang University Hangzhou 310058 Zhejiang China
| | - Peichen Pan
- Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, College of Pharmaceutical Sciences, Zhejiang University Hangzhou 310058 Zhejiang China
| | - Xiangyang Ji
- Department of Automation, Tsinghua University Beijing 100084 China
| | - Jianfei Song
- Hangzhou Carbonsilicon AI Technology Co., Ltd Hangzhou 310018 Zhejiang China
| | - Tingjun Hou
- Hangzhou Carbonsilicon AI Technology Co., Ltd Hangzhou 310018 Zhejiang China
- Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, College of Pharmaceutical Sciences, Zhejiang University Hangzhou 310058 Zhejiang China
| | - Yafeng Deng
- Hangzhou Carbonsilicon AI Technology Co., Ltd Hangzhou 310018 Zhejiang China
|
39
|
Chakrabarti M, Tan YS, Balius TE. Considerations Around Structure-Based Drug Discovery for KRAS Using DOCK. Methods Mol Biol 2024; 2797:67-90. [PMID: 38570453 DOI: 10.1007/978-1-0716-3822-4_6] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 04/05/2024]
Abstract
Molecular docking is a popular computational tool in drug discovery. Leveraging structural information, docking software predicts binding poses of small molecules to cavities on the surfaces of proteins. Virtual screening for ligand discovery is a useful application of docking software. In this chapter, using the enigmatic KRAS protein as an example system, we endeavor to teach the reader about best practices for performing molecular docking with UCSF DOCK. We discuss methods for virtual screening and docking molecules on KRAS. We present the following six points to optimize our docking setup for prosecuting a virtual screen: protein structure choice, pocket selection, optimization of the scoring function, modification of sampling spheres and sampling procedures, choosing an appropriate portion of chemical space to dock, and the choice of which top scoring molecules to pick for purchase.
Affiliation(s)
- Mayukh Chakrabarti
- NCI RAS Initiative, Cancer Research Technology Program, Frederick National Laboratory for Cancer Research, Frederick, MD, USA
| | - Y Stanley Tan
- NCI RAS Initiative, Cancer Research Technology Program, Frederick National Laboratory for Cancer Research, Frederick, MD, USA
| | - Trent E Balius
- NCI RAS Initiative, Cancer Research Technology Program, Frederick National Laboratory for Cancer Research, Frederick, MD, USA.
|
40
|
Colliandre L, Muller C. Bayesian Optimization in Drug Discovery. Methods Mol Biol 2024; 2716:101-136. [PMID: 37702937 DOI: 10.1007/978-1-0716-3449-3_5] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 09/14/2023]
Abstract
Drug discovery deals with the search for initial hits and their optimization toward a targeted clinical profile. Throughout the discovery pipeline, the candidate profile will evolve, but the optimization will mainly remain a trial-and-error approach. Many in silico methods have been developed to improve and accelerate this pipeline. Bayesian optimization (BO) is a well-known method for the determination of the global optimum of a function. In the last decade, BO has gained popularity in the early drug design phase. This chapter starts with the concept of black box optimization applied to drug design and presents some approaches to tackle it. Then it focuses on BO and explains its principle and all the algorithmic building blocks needed to implement it. This explanation aims to be accessible to people involved in drug discovery projects. A strong emphasis is placed on the solutions to deal with the specific constraints of drug discovery. Finally, a large set of practical applications of BO is highlighted.
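One of the algorithmic building blocks of BO is the acquisition function, which ranks untested candidates from the surrogate model's predictions. The sketch below is illustrative only and is not taken from the chapter: it assumes a Gaussian posterior per candidate and uses expected improvement (EI) for maximization, one common choice among several. The candidate names, means, standard deviations and the `best_so_far` value are hypothetical.

```python
import math

def expected_improvement(mu, sigma, best, xi=0.0):
    """Expected improvement over `best` for a Gaussian posterior N(mu, sigma^2)."""
    if sigma <= 0.0:
        # No predictive uncertainty: the improvement is deterministic.
        return max(mu - best - xi, 0.0)
    z = (mu - best - xi) / sigma
    pdf = math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)   # standard normal density
    cdf = 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))          # standard normal CDF
    return (mu - best - xi) * cdf + sigma * pdf

# Hypothetical surrogate predictions (mean, standard deviation) for three candidates.
candidates = {"A": (0.80, 0.05), "B": (0.70, 0.30), "C": (0.90, 0.00)}
best_so_far = 0.85  # hypothetical best measured value to date

scores = {name: expected_improvement(m, s, best_so_far)
          for name, (m, s) in candidates.items()}
next_pick = max(scores, key=scores.get)
print(scores, next_pick)
```

In this toy setting the uncertain candidate B is preferred over the slightly better but fully certain candidate C, which illustrates how EI trades off exploitation against exploration.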
|
41
|
Rasmussen MH, Duan C, Kulik HJ, Jensen JH. Uncertain of uncertainties? A comparison of uncertainty quantification metrics for chemical data sets. J Cheminform 2023; 15:121. [PMID: 38111020 PMCID: PMC10729461 DOI: 10.1186/s13321-023-00790-0] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/07/2023] [Accepted: 11/28/2023] [Indexed: 12/20/2023] Open
Abstract
With the increasingly important role of machine learning (ML) models in chemical research, the need for putting a level of confidence to the model predictions naturally arises. Several methods for obtaining uncertainty estimates have been proposed in recent years, but consensus on the evaluation of these has yet to be established, and different studies on uncertainties generally use different metrics to evaluate them. We compare three of the most popular validation metrics (Spearman's rank correlation coefficient, the negative log likelihood (NLL) and the miscalibration area) to the error-based calibration introduced by Levi et al. (Sensors 2022, 22, 5540). Importantly, metrics such as the negative log likelihood (NLL) and Spearman's rank correlation coefficient bear little information in themselves. We therefore introduce reference values obtained through errors simulated directly from the uncertainty distribution. The different metrics target different properties and we show how to interpret them, but we generally find the best overall validation to be done based on the error-based calibration plot introduced by Levi et al. Finally, we illustrate the sensitivity of ranking-based methods (e.g. Spearman's rank correlation coefficient) towards test set design by using the same toy model with different test sets and obtaining vastly different metrics (0.05 vs. 0.65).
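The idea of reference values can be sketched as follows: draw errors directly from the predicted uncertainties, then compute the metrics on those simulated errors to see what a perfectly calibrated model would score. This is a minimal illustration under stated assumptions, not the authors' code: errors are assumed Gaussian with zero mean, the uncertainties are hypothetical values drawn uniformly from [0.1, 1.0], and the helper names `spearman` and `gaussian_nll` are made up for this example.

```python
import numpy as np

def spearman(a, b):
    """Spearman rank correlation (no tie correction; adequate for continuous data)."""
    ranks_a = np.argsort(np.argsort(a))
    ranks_b = np.argsort(np.argsort(b))
    return float(np.corrcoef(ranks_a, ranks_b)[0, 1])

def gaussian_nll(errors, sigmas):
    """Mean negative log likelihood of the errors under N(0, sigma^2)."""
    return float(np.mean(0.5 * np.log(2.0 * np.pi * sigmas**2)
                         + errors**2 / (2.0 * sigmas**2)))

rng = np.random.default_rng(0)
sigmas = rng.uniform(0.1, 1.0, size=5000)   # hypothetical predicted uncertainties
errors = rng.normal(0.0, sigmas)            # errors simulated from those uncertainties

# Reference values: what a perfectly calibrated model would score on this set.
rho_ref = spearman(np.abs(errors), sigmas)
nll_ref = gaussian_nll(errors, sigmas)
print(f"reference Spearman: {rho_ref:.2f}, reference NLL: {nll_ref:.2f}")
```

Even with perfectly calibrated uncertainties the rank correlation stays well below 1, because an individual error is a random draw from its distribution. This is why a raw Spearman value is hard to interpret without such a reference.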
Affiliation(s)
- Maria H Rasmussen
- Department of Chemistry, University of Copenhagen, Copenhagen, Denmark.
| | - Chenru Duan
- Department of Chemical Engineering, Massachusetts Institute of Technology, Cambridge, USA
- Department of Chemistry, Massachusetts Institute of Technology, Cambridge, USA
| | - Heather J Kulik
- Department of Chemical Engineering, Massachusetts Institute of Technology, Cambridge, USA
- Department of Chemistry, Massachusetts Institute of Technology, Cambridge, USA
| | - Jan H Jensen
- Department of Chemistry, University of Copenhagen, Copenhagen, Denmark.
|
42
|
Moesgaard L, Pedersen ML, Uhd Nielsen C, Kongsted J. Structure-based discovery of novel P-glycoprotein inhibitors targeting the nucleotide binding domains. Sci Rep 2023; 13:21217. [PMID: 38040777 PMCID: PMC10692163 DOI: 10.1038/s41598-023-48281-4] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Received: 09/22/2023] [Accepted: 11/24/2023] [Indexed: 12/03/2023] Open
Abstract
P-glycoprotein (P-gp), a membrane transport protein overexpressed in certain drug-resistant cancer cells, has been the target of numerous drug discovery projects aimed at overcoming drug resistance in cancer. Most characterized P-gp inhibitors bind at the large hydrophobic drug binding domain (DBD), but none have yet attained regulatory approval. In this study, we explored the potential of designing inhibitors that target the nucleotide binding domains (NBDs), by computationally screening a large library of 2.6 billion synthesizable molecules, using a combination of machine learning-guided molecular docking and molecular dynamics (MD). 14 of the computationally best-scoring molecules were subsequently tested for their ability to inhibit P-gp mediated calcein-AM efflux. In total, five diverse compounds exhibited inhibitory effects in the calcein-AM assay without displaying toxicity. The activity of these compounds was confirmed by their ability to decrease the verapamil-stimulated ATPase activity of P-gp in a subsequent assay. The discovery of these five novel P-gp inhibitors demonstrates the potential of in-silico screening in drug discovery and provides a new stepping point towards future potent P-gp inhibitors.
Affiliation(s)
- Laust Moesgaard
- Department of Physics, Chemistry and Pharmacy, University of Southern Denmark, Odense M, 5230, Denmark.
| | - Maria L Pedersen
- Department of Physics, Chemistry and Pharmacy, University of Southern Denmark, Odense M, 5230, Denmark
| | - Carsten Uhd Nielsen
- Department of Physics, Chemistry and Pharmacy, University of Southern Denmark, Odense M, 5230, Denmark
| | - Jacob Kongsted
- Department of Physics, Chemistry and Pharmacy, University of Southern Denmark, Odense M, 5230, Denmark
|
43
|
Viswanathan K, Goel M, Laghuvarapu S, Varma G, Priyakumar UD. Streamlining pipeline efficiency: a novel model-agnostic technique for accelerating conditional generative and virtual screening pipelines. Sci Rep 2023; 13:21069. [PMID: 38030689 PMCID: PMC10686981 DOI: 10.1038/s41598-023-42952-y] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/07/2023] [Accepted: 09/16/2023] [Indexed: 12/01/2023] Open
Abstract
The discovery of potential therapeutic agents for life-threatening diseases has become a significant problem. There is a requirement for fast and accurate methods to identify drug-like molecules that can be used as potential candidates for novel targets. Existing techniques like high-throughput screening and virtual screening are time-consuming and inefficient. Traditional molecule generation pipelines are more efficient than virtual screening but use time-consuming docking software. Such docking functions can be emulated using Machine Learning models with comparable accuracy and faster execution times. However, we find that when pre-trained machine learning models are employed in generative pipelines as oracles, they suffer from model degradation in areas where data is scarce. In this study, we propose an active learning-based model that can be added as a supplement to enhanced molecule generation architectures. The proposed method uses uncertainty sampling on the molecules created by the generator model and dynamically learns as the generator samples molecules from different regions of the chemical space. The proposed framework can generate molecules with high binding affinity with [Formula: see text]a 70% improvement in runtime compared to the baseline model by labeling only [Formula: see text]30% of molecules compared to the baseline oracle.
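Uncertainty sampling, as described above, sends only the molecules the surrogate is least sure about to the expensive oracle. The sketch below is a generic illustration and not the authors' implementation: it assumes uncertainty is estimated as the standard deviation across an ensemble of surrogate models, and the prediction values and the `select_uncertain` helper are hypothetical.

```python
import numpy as np

def select_uncertain(ensemble_preds, budget):
    """Return indices of the `budget` molecules with the largest ensemble disagreement.

    ensemble_preds has shape (n_models, n_molecules).
    """
    std = ensemble_preds.std(axis=0)            # per-molecule disagreement
    order = np.argsort(-std)                    # most uncertain first
    return order[:budget], std

# Hypothetical binding-score predictions from 3 surrogate models on 5 generated molecules.
preds = np.array([
    [-7.1, -6.0, -8.2, -5.5, -9.0],
    [-7.0, -4.0, -8.3, -5.6, -6.0],
    [-7.2, -8.0, -8.1, -5.4, -7.5],
])

to_label, std = select_uncertain(preds, budget=2)
print(to_label)  # indices to send to the docking oracle for labeling
```

Molecules where the models agree keep their cheap surrogate prediction, so the oracle is called only for the selected subset, which is then added to the training data before the next round.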
Affiliation(s)
- Karthik Viswanathan
- Center for Computational Natural Sciences and Bioinformatics, International Institute of Information Technology, Hyderabad, 500032, India
| | - Manan Goel
- Center for Computational Natural Sciences and Bioinformatics, International Institute of Information Technology, Hyderabad, 500032, India
| | - Siddhartha Laghuvarapu
- Center for Computational Natural Sciences and Bioinformatics, International Institute of Information Technology, Hyderabad, 500032, India
| | - Girish Varma
- Center for Computational Natural Sciences and Bioinformatics, International Institute of Information Technology, Hyderabad, 500032, India
| | - U Deva Priyakumar
- Center for Computational Natural Sciences and Bioinformatics, International Institute of Information Technology, Hyderabad, 500032, India.
|
44
|
Yang L, Liang X, Zhang N, Lu L. STAR: A Web Server for Assisting Directed Protein Evolution with Machine Learning. ACS Omega 2023; 8:44751-44756. [PMID: 38046324 PMCID: PMC10688154 DOI: 10.1021/acsomega.3c04832] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/04/2023] [Revised: 10/10/2023] [Accepted: 10/12/2023] [Indexed: 12/05/2023]
Abstract
Protein engineering has made significant contributions to industries such as agriculture, food, and pharmaceuticals. In recent years, directed evolution combined with artificial intelligence has emerged as a cutting-edge R&D approach. However, the application of machine learning techniques can be challenging for those without relevant experience and coding skills. To address this issue, we have developed a web-based protein sequence recommendation system: STAR (Sequence recommendaTion via ARtificial intelligence). Our system utilizes Bayesian optimization as its backbone and includes a filtering step using a regression model to enhance the success rate of recommended sequences. Additionally, we have incorporated an in silico-directed evolution approach to expand the exploration of the protein space. The Web site can be accessed at https://www.FindProteinStar.com/.
Affiliation(s)
- Likun Yang
- Asymchem Life Science (Tianjin) Co., Ltd, Tianjin 300457, P. R. China
| | - Xiaoli Liang
- Asymchem Life Science (Tianjin) Co., Ltd, Tianjin 300457, P. R. China
| | - Na Zhang
- Asymchem Life Science (Tianjin) Co., Ltd, Tianjin 300457, P. R. China
| | - Lu Lu
- Asymchem Life Science (Tianjin) Co., Ltd, Tianjin 300457, P. R. China
|
45
|
Xiang Y, Tang YH, Gong Z, Liu H, Wu L, Lin G, Sun H. Efficient Exploration of Chemical Compound Space Using Active Learning for Prediction of Thermodynamic Properties of Alkane Molecules. J Chem Inf Model 2023; 63:6515-6524. [PMID: 37857374 DOI: 10.1021/acs.jcim.3c01430] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/21/2023]
Abstract
We introduce an exploratory active learning (AL) algorithm using Gaussian process regression and marginalized graph kernel (GPR-MGK) to sample chemical compound space (CCS) at minimal cost. Targeting 251,728 enumerated alkane molecules with 4-19 carbon atoms, we applied the AL algorithm to select a diverse and representative set of molecules and then conducted high-throughput molecular simulations on these selected molecules. To demonstrate the power of the AL algorithm, we built directed message-passing neural networks (D-MPNN) using simulation data as the training set to predict liquid densities, heat capacities, and vaporization enthalpies of the CCS. Validations show that D-MPNN models built on the smallest training set considered in this work, which consists of 313 molecules or 0.124% of the original CCS, predict the properties with R² > 0.99 against the computational data and R² > 0.94 against the experimental data. The advantage of the presented AL algorithm is that the predicted uncertainty of GPR depends on only the molecular structures, which renders it compatible with high-throughput data generation.
Affiliation(s)
- Yan Xiang
- School of Chemistry and Chemical Engineering, Shanghai Jiao Tong University, Shanghai 200240, China
| | - Yu-Hang Tang
- Computational Research Division, Lawrence Berkeley National Laboratory, Berkeley, California 94720, United States
- NVIDIA Corporation, Santa Clara, California 95051, United States
| | - Zheng Gong
- School of Chemistry and Chemical Engineering, Shanghai Jiao Tong University, Shanghai 200240, China
| | - Hongyi Liu
- School of Chemistry and Chemical Engineering, Shanghai Jiao Tong University, Shanghai 200240, China
| | - Liang Wu
- School of Chemistry and Chemical Engineering, Shanghai Jiao Tong University, Shanghai 200240, China
| | - Guang Lin
- Department of Mathematics & School of Mechanical Engineering, Purdue University, West Lafayette, Indiana 47907, United States
| | - Huai Sun
- School of Chemistry and Chemical Engineering, Shanghai Jiao Tong University, Shanghai 200240, China
|
46
|
Shiammala PN, Duraimutharasan NKB, Vaseeharan B, Alothaim AS, Al-Malki ES, Snekaa B, Safi SZ, Singh SK, Velmurugan D, Selvaraj C. Exploring the artificial intelligence and machine learning models in the context of drug design difficulties and future potential for the pharmaceutical sectors. Methods 2023; 219:82-94. [PMID: 37778659 DOI: 10.1016/j.ymeth.2023.09.010] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/07/2023] [Revised: 09/21/2023] [Accepted: 09/25/2023] [Indexed: 10/03/2023] Open
Abstract
Artificial intelligence (AI), particularly deep learning as a subcategory of AI, provides opportunities to accelerate and improve the process of discovering and developing new drugs. The use of AI in drug discovery is still in its early stages, but it has the potential to revolutionize the way new drugs are discovered and developed. As AI technology continues to evolve, it is likely that AI will play an even greater role in the future of drug discovery. AI is used to identify new drug targets, design new molecules, and predict the efficacy and safety of potential drugs. The inclusion of AI in drug discovery can screen millions of compounds in a matter of hours, identifying potential drug candidates that would have taken years to find using traditional methods. AI is highly utilized in the pharmaceutical industry by optimizing processes, reducing waste, and ensuring quality control. This review covers much-needed topics, including the different types of machine-learning techniques, their applications in drug discovery, and the challenges and limitations of using machine learning in this field. The state-of-the-art of AI-assisted pharmaceutical discovery is described, covering applications in structure and ligand-based virtual screening, de novo drug creation, prediction of physicochemical and pharmacokinetic properties, drug repurposing, and related topics. Finally, many obstacles and limits of present approaches are outlined, with an eye on potential future avenues for AI-assisted drug discovery and design.
Affiliation(s)
| | | | - Baskaralingam Vaseeharan
- Department of Animal Health and Management, Science Block, Alagappa University, Karaikudi, Tamil Nadu 630 003, India
| | - Abdulaziz S Alothaim
- Department of Biology, College of Science in Zulfi, Majmaah University, Al-Majmaah 11952, Saudi Arabia
| | - Esam S Al-Malki
- Department of Biology, College of Science in Zulfi, Majmaah University, Al-Majmaah 11952, Saudi Arabia
| | - Babu Snekaa
- Laboratory for Artificial Intelligence and Molecular Modelling, Department of Pharmacology, Saveetha Dental College and Hospitals, Saveetha Institute of Medical and Technical Sciences (SIMATS), Saveetha University, Chennai, Tamil Nadu 600077, India
| | - Sher Zaman Safi
- Faculty of Medicine, Bioscience and Nursing, MAHSA University, Jenjarom 42610, Selangor, Malaysia
| | - Sanjeev Kumar Singh
- Computer Aided Drug Design and Molecular Modelling Lab, Department of Bioinformatics, Science Block, Alagappa University, Karaikudi-630 003, Tamil Nadu, India
| | - Devadasan Velmurugan
- Department of Biotechnology, College of Engineering & Technology, SRM Institute of Science & Technology, Kattankulathur, Chennai, Tamil Nadu 603203, India
| | - Chandrabose Selvaraj
- Laboratory for Artificial Intelligence and Molecular Modelling, Department of Pharmacology, Saveetha Dental College and Hospitals, Saveetha Institute of Medical and Technical Sciences (SIMATS), Saveetha University, Chennai, Tamil Nadu 600077, India; Laboratory for Artificial Intelligence and Molecular Modelling, Center for Global Health Research, Saveetha Medical College, Saveetha Institute of Medical and Technical Sciences, Saveetha Nagar, Thandalam, Chennai, Tamil Nadu 602105, India.
|
47
|
Casetti N, Alfonso-Ramos JE, Coley CW, Stuyver T. Combining Molecular Quantum Mechanical Modeling and Machine Learning for Accelerated Reaction Screening and Discovery. Chemistry 2023; 29:e202301957. [PMID: 37526059 DOI: 10.1002/chem.202301957] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/20/2023] [Revised: 07/30/2023] [Accepted: 07/31/2023] [Indexed: 08/02/2023]
Abstract
Molecular quantum mechanical modeling, accelerated by machine learning, has opened the door to high-throughput in silico screening campaigns of complex properties, such as the activation energies of chemical reactions and absorption/emission spectra of materials and molecules. Here, we present an overview of the main principles, concepts, and design considerations involved in such hybrid computational quantum chemistry/machine learning screening workflows, with a special emphasis on some recent examples of their successful application. We end with a brief outlook of further advances that will benefit the field.
Affiliation(s)
- Nicholas Casetti
- Department of Chemical Engineering, Massachusetts Institute of Technology, 77 Massachusetts Avenue, Cambridge, Massachusetts, 02139, United States
| | - Javier E Alfonso-Ramos
- Ecole Nationale Supérieure de Chimie de Paris, Université PSL, CNRS, Institute of Chemistry for Life and Health Sciences, 75005, Paris, France
| | - Connor W Coley
- Department of Chemical Engineering, Massachusetts Institute of Technology, 77 Massachusetts Avenue, Cambridge, Massachusetts, 02139, United States
- Department of Electrical Engineering and Computer Science, Massachusetts Institute of Technology, 77 Massachusetts Avenue, Cambridge, Massachusetts, 02139, United States
| | - Thijs Stuyver
- Ecole Nationale Supérieure de Chimie de Paris, Université PSL, CNRS, Institute of Chemistry for Life and Health Sciences, 75005, Paris, France
|
48
|
Li CH, Tabor DP. Generative organic electronic molecular design informed by quantum chemistry. Chem Sci 2023; 14:11045-11055. [PMID: 37860647 PMCID: PMC10583709 DOI: 10.1039/d3sc03781a] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/22/2023] [Accepted: 09/11/2023] [Indexed: 10/21/2023] Open
Abstract
Generative molecular design strategies have emerged as promising alternatives to trial-and-error approaches for exploring and optimizing within large chemical spaces. To date, generative models with reinforcement learning approaches have frequently used low-cost methods to evaluate the quality of the generated molecules, enabling many loops through the generative model. However, for functional molecular materials tasks, such low-cost methods are either not available or would require the generation of large amounts of training data to train surrogate machine learning models. In this work, we develop a framework that connects the REINVENT reinforcement learning framework with excited state quantum chemistry calculations to discover molecules with specified molecular excited state energy levels, specifically molecules with excited state landscapes that would serve as promising singlet fission or triplet-triplet annihilation materials. We employ a two-step curriculum strategy to first find a set of diverse promising molecules, then demonstrate the framework's ability to exploit a more focused chemical space with anthracene derivatives. Under this protocol, we show that the framework can find desired molecules and improve Pareto fronts for targeted properties versus synthesizability. Moreover, we are able to find several different design principles used by chemists for the design of singlet fission and triplet-triplet annihilation molecules.
Affiliation(s)
- Cheng-Han Li
- Department of Chemistry, Texas A&M University College Station TX 77842 USA
| | - Daniel P Tabor
- Department of Chemistry, Texas A&M University College Station TX 77842 USA
|
49
|
Sivula T, Yetukuri L, Kalliokoski T, Käsnänen H, Poso A, Pöhner I. Machine Learning-Boosted Docking Enables the Efficient Structure-Based Virtual Screening of Giga-Scale Enumerated Chemical Libraries. J Chem Inf Model 2023; 63:5773-5783. [PMID: 37655823 PMCID: PMC10523430 DOI: 10.1021/acs.jcim.3c01239] [Citation(s) in RCA: 5] [Impact Index Per Article: 5.0] [Received: 08/06/2023] [Indexed: 09/02/2023]
Abstract
The emergence of ultra-large screening libraries, filled to the brim with billions of readily available compounds, poses a growing challenge for docking-based virtual screening. Machine learning (ML)-boosted strategies like the tool HASTEN combine rapid ML prediction with the brute-force docking of small fractions of such libraries to increase screening throughput and take on giga-scale libraries. In our case study of an anti-bacterial chaperone and an anti-viral kinase, we first generated a brute-force docking baseline for 1.56 billion compounds in the Enamine REAL lead-like library with the fast Glide high-throughput virtual screening protocol. With HASTEN, we observed robust recall of 90% of the true 1000 top-scoring virtual hits in both targets when docking only 1% of the entire library. This reduction of the required docking experiments by 99% significantly shortens the screening time. In the kinase target, the employment of a hydrogen bonding constraint resulted in a major proportion of unsuccessful docking attempts and hampered ML predictions. We demonstrate the optimization potential in the treatment of failed compounds when performing ML-boosted screening and benchmark and showcase HASTEN as a fast and robust tool in a growing arsenal of approaches to unlock the chemical space covered by giga-scale screening libraries for everyday drug discovery campaigns.
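The recall figure quoted above measures how many of the true top-scoring compounds (from the brute-force baseline) are recovered when only a fraction of the library is actually docked. A minimal sketch of that calculation follows. It is not HASTEN code: the scores and the docked subset are hypothetical, and it assumes that lower docking scores are better.

```python
import numpy as np

def top_k_recall(true_scores, docked_idx, k):
    """Fraction of the true top-k (lowest-score) compounds that were docked."""
    true_top = set(np.argsort(true_scores)[:k].tolist())
    return len(true_top & set(docked_idx)) / k

# Hypothetical brute-force docking scores for a 10-compound library (lower = better).
true_scores = np.array([-9.1, -5.0, -8.7, -4.2, -7.9, -6.3, -8.8, -3.0, -5.5, -6.9])

# Hypothetical subset chosen for docking by the ML model.
docked_idx = [0, 2, 4, 9]

recall = top_k_recall(true_scores, docked_idx, k=3)
print(f"recall of true top-3: {recall:.2f}")
```

In the study's setting the same ratio would be computed with k = 1000 and a docked subset of about 1% of the library.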
Affiliation(s)
- Toni Sivula
- School of Pharmacy, University of Eastern Finland, Kuopio FI-70211, Finland
| | | | - Tuomo Kalliokoski
- Computational Medicine Design, Orion Pharma, Orionintie 1A, Espoo FI-02101, Finland
| | - Heikki Käsnänen
- Computational Medicine Design, Orion Pharma, Orionintie 1A, Espoo FI-02101, Finland
| | - Antti Poso
- School of Pharmacy, University of Eastern Finland, Kuopio FI-70211, Finland
- Department of Pharmaceutical and Medicinal Chemistry, Institute of Pharmaceutical Sciences, Eberhard Karls University, Tübingen DE-72076, Germany
- Cluster of Excellence iFIT (EXC 2180) “Image-Guided and Functionally Instructed Tumor Therapies”, University of Tübingen, Tübingen DE-72076, Germany
- Tübingen Center for Academic Drug Discovery & Development (TüCAD2), Tübingen DE-72076, Germany
| | - Ina Pöhner
- School of Pharmacy, University of Eastern Finland, Kuopio FI-70211, Finland
|
50
|
Alnammi M, Liu S, Ericksen SS, Ananiev GE, Voter AF, Guo S, Keck JL, Hoffmann FM, Wildman SA, Gitter A. Evaluating Scalable Supervised Learning for Synthesize-on-Demand Chemical Libraries. J Chem Inf Model 2023; 63:5513-5528. [PMID: 37625010 PMCID: PMC10538940 DOI: 10.1021/acs.jcim.3c00912] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/16/2023] [Indexed: 08/27/2023]
Abstract
Traditional small-molecule drug discovery is a time-consuming and costly endeavor. High-throughput chemical screening can only assess a tiny fraction of drug-like chemical space. The strong predictive power of modern machine-learning methods for virtual chemical screening enables training models on known active and inactive compounds and extrapolating to much larger chemical libraries. However, there has been limited experimental validation of these methods in practical applications on large commercially available or synthesize-on-demand chemical libraries. Through a prospective evaluation with the bacterial protein-protein interaction PriA-SSB, we demonstrate that ligand-based virtual screening can identify many active compounds in large commercial libraries. We use cross-validation to compare different types of supervised learning models and select a random forest (RF) classifier as the best model for this target. When predicting the activity of more than 8 million compounds from Aldrich Market Select, the RF substantially outperforms a naïve baseline based on chemical structure similarity. 48% of the RF's 701 selected compounds are active. The RF model easily scales to score one billion compounds from the synthesize-on-demand Enamine REAL database. We tested 68 chemically diverse top predictions from Enamine REAL and observed 31 hits (46%), including one with an IC50 value of 1.3 μM.
Affiliation(s)
- Moayad Alnammi
- Department of Computer Sciences, University of Wisconsin−Madison, Madison, Wisconsin 53706, United States
- Morgridge Institute for Research, Madison, Wisconsin 53715, United States
- Department of Information and Computer Science, King Fahd University of Petroleum & Minerals, Dhahran 31261, Saudi Arabia
| | - Shengchao Liu
- Department of Computer Sciences, University of Wisconsin−Madison, Madison, Wisconsin 53706, United States
- Morgridge Institute for Research, Madison, Wisconsin 53715, United States
| | - Spencer S. Ericksen
- Small Molecule Screening Facility, University of Wisconsin−Madison, Madison, Wisconsin 53792, United States
| | - Gene E. Ananiev
- Small Molecule Screening Facility, University of Wisconsin−Madison, Madison, Wisconsin 53792, United States
| | - Andrew F. Voter
- Department of Biomolecular Chemistry, University of Wisconsin−Madison, Madison, Wisconsin 53706, United States
| | - Song Guo
- Small Molecule Screening Facility, University of Wisconsin−Madison, Madison, Wisconsin 53792, United States
| | - James L. Keck
- Department of Biomolecular Chemistry, University of Wisconsin−Madison, Madison, Wisconsin 53706, United States
| | - F. Michael Hoffmann
- Small Molecule Screening Facility, University of Wisconsin−Madison, Madison, Wisconsin 53792, United States
- McArdle Laboratory for Cancer Research, University of Wisconsin−Madison, Madison, Wisconsin 53705, United States
| | - Scott A. Wildman
- Small Molecule Screening Facility, University of Wisconsin−Madison, Madison, Wisconsin 53792, United States
| | - Anthony Gitter
- Department of Computer Sciences, University of Wisconsin−Madison, Madison, Wisconsin 53706, United States
- Morgridge Institute for Research, Madison, Wisconsin 53715, United States
- Department of Biostatistics and Medical Informatics, University of Wisconsin−Madison, Madison, Wisconsin 53792, United States
|