1
Wang L, Zhou Z, Yang X, Shi S, Zeng X, Cao D. The present state and challenges of active learning in drug discovery. Drug Discov Today 2024; 29:103985. [PMID: 38642700 DOI: 10.1016/j.drudis.2024.103985] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/03/2024] [Revised: 04/08/2024] [Accepted: 04/15/2024] [Indexed: 04/22/2024]
Abstract
Active learning (AL) is an iterative feedback process that efficiently identifies valuable data within vast chemical space, even when labeled data are limited. This characteristic makes it a valuable approach to ongoing challenges in drug discovery, such as the ever-expanding space to be explored and the scarcity of labeled data. Consequently, AL is gaining prominence in drug development. In this paper, we comprehensively review the application of AL at all stages of drug discovery, including compound-target interaction prediction, virtual screening, molecular generation and optimization, and molecular property prediction. Additionally, we discuss the challenges and prospects associated with current applications of AL in drug discovery.
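The iterative feedback process described above can be illustrated with a minimal pool-based active learning sketch. This is not the method of Wang et al. It is a toy under loudly stated assumptions: a one-dimensional "chemical space" of descriptor values in [0, 1], and a hypothetical noise-free assay (`oracle`) that calls a compound active when its descriptor is at least 0.62. Uncertainty sampling is used as the query strategy. The helper names `fit_threshold` and `active_learning_loop` are illustrative only and do not come from any library.

```python
from typing import Callable, List, Sequence, Tuple

Labeled = List[Tuple[float, int]]


def fit_threshold(labeled: Labeled) -> float:
    """Toy 1-D model (hypothetical): the decision boundary is the midpoint
    between the largest inactive (label 0) and smallest active (label 1)
    example. Assumes noise-free, separable labels."""
    inactives = [x for x, y in labeled if y == 0]
    actives = [x for x, y in labeled if y == 1]
    lo = max(inactives) if inactives else 0.0
    hi = min(actives) if actives else 1.0
    return (lo + hi) / 2.0


def active_learning_loop(
    pool: Sequence[float],
    oracle: Callable[[float], int],
    seed_idx: Sequence[int],
    n_rounds: int,
    batch_size: int = 1,
) -> Tuple[float, int, List[float]]:
    """Pool-based active learning with uncertainty sampling (illustrative)."""
    seeds = set(seed_idx)
    # Small initial labeled set; every other compound stays in the unlabeled pool.
    labeled: Labeled = [(pool[i], oracle(pool[i])) for i in seeds]
    unlabeled = [x for i, x in enumerate(pool) if i not in seeds]
    history: List[float] = []
    for _ in range(n_rounds):
        boundary = fit_threshold(labeled)  # retrain on all labels gathered so far
        history.append(boundary)
        if not unlabeled:
            break
        # Most uncertain compounds are those closest to the current boundary.
        unlabeled.sort(key=lambda x: abs(x - boundary))
        batch, unlabeled = unlabeled[:batch_size], unlabeled[batch_size:]
        # "Experimental" feedback: the oracle is queried only for the chosen batch.
        labeled.extend((x, oracle(x)) for x in batch)
    return fit_threshold(labeled), len(labeled), history


def oracle(x: float) -> int:
    """Hypothetical assay: active (1) when the descriptor is at least 0.62."""
    return int(x >= 0.62)


# Toy chemical space: 101 compounds, each described by a single value in [0, 1].
pool = [i / 100 for i in range(101)]
final_boundary, n_labeled, history = active_learning_loop(
    pool, oracle, seed_idx=[0, 100], n_rounds=10
)
print(f"boundary ~ {final_boundary:.3f} after labeling {n_labeled} of {len(pool)}")
```

Under these assumptions, the loop places the boundary between 0.61 and 0.62 after labeling only 12 of the 101 compounds. This is a miniature version of the label-efficiency argument in the abstract. The threshold model was chosen only to keep the loop readable. Real drug-discovery pipelines would use a probabilistic model and batch-aware selection instead.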
Affiliation(s)
- Lei Wang
  - Xiangya School of Pharmaceutical Sciences, Central South University, Changsha 410013, Hunan, China
- Zhenran Zhou
  - Department of Computer Science, Hunan University, Changsha 410082, Hunan, China
- Xixi Yang
  - Department of Computer Science, Hunan University, Changsha 410082, Hunan, China
- Shaohua Shi
  - Institute for Advancing Translational Medicine in Bone and Joint Diseases, School of Chinese Medicine, Hong Kong Baptist University, Hong Kong SAR, China
- Xiangxiang Zeng
  - Department of Computer Science, Hunan University, Changsha 410082, Hunan, China
- Dongsheng Cao
  - Xiangya School of Pharmaceutical Sciences, Central South University, Changsha 410013, Hunan, China
2
Day EC, Chittari SS, Bogen MP, Knight AS. Navigating the Expansive Landscapes of Soft Materials: A User Guide for High-Throughput Workflows. ACS Polym Au 2023; 3:406-427. [PMID: 38107416 PMCID: PMC10722570 DOI: 10.1021/acspolymersau.3c00025] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 09/15/2023] [Revised: 11/02/2023] [Accepted: 11/07/2023] [Indexed: 12/19/2023]
Abstract
Synthetic polymers are highly customizable with tailored structures and functionality, yet this versatility generates challenges in the design of advanced materials due to the size and complexity of the design space. Thus, exploration and optimization of polymer properties using combinatorial libraries has become increasingly common, which requires careful selection of synthetic strategies, characterization techniques, and rapid processing workflows to obtain fundamental principles from these large data sets. Herein, we provide guidelines for strategic design of macromolecule libraries and workflows to efficiently navigate these high-dimensional design spaces. We describe synthetic methods for multiple library sizes and structures as well as characterization methods to rapidly generate data sets, including tools that can be adapted from biological workflows. We further highlight relevant insights from statistics and machine learning to aid in data featurization, representation, and analysis. This Perspective acts as a "user guide" for researchers interested in leveraging high-throughput screening toward the design of multifunctional polymers and predictive modeling of structure-property relationships in soft materials.
Affiliation(s)
- Matthew P. Bogen
  - Department of Chemistry, The University of North Carolina at Chapel Hill, Chapel Hill, North Carolina 27599, United States
- Abigail S. Knight
  - Department of Chemistry, The University of North Carolina at Chapel Hill, Chapel Hill, North Carolina 27599, United States
3
McNair D. Artificial Intelligence and Machine Learning for Lead-to-Candidate Decision-Making and Beyond. Annu Rev Pharmacol Toxicol 2023; 63:77-97. [PMID: 35679624 DOI: 10.1146/annurev-pharmtox-051921-023255] [Citation(s) in RCA: 5] [Impact Index Per Article: 5.0] [Indexed: 01/25/2023]
Abstract
The use of artificial intelligence (AI) and machine learning (ML) in pharmaceutical research and development has to date focused on research: target identification; docking-, fragment-, and motif-based generation of compound libraries; modeling of synthesis feasibility; rank-ordering likely hits according to structural and chemometric similarity to compounds having known activity and affinity to the target(s); optimizing a smaller library for synthesis and high-throughput screening; and combining evidence from screening to support hit-to-lead decisions. Applying AI/ML methods to lead optimization and lead-to-candidate (L2C) decision-making has shown slower progress, especially regarding predicting absorption, distribution, metabolism, excretion, and toxicology properties. The present review surveys reasons why this is so, reports progress that has occurred in recent years, and summarizes some of the issues that remain. Effective AI/ML tools to derisk L2C and later phases of development are important to accelerate the pharmaceutical development process, ameliorate escalating development costs, and achieve greater success rates.
Affiliation(s)
- Douglas McNair
  - Global Health, Integrated Development, Bill & Melinda Gates Foundation, Seattle, Washington, USA
4
Ding X, Cui R, Yu J, Liu T, Zhu T, Wang D, Chang J, Fan Z, Liu X, Chen K, Jiang H, Li X, Luo X, Zheng M. Active Learning for Drug Design: A Case Study on the Plasma Exposure of Orally Administered Drugs. J Med Chem 2021; 64:16838-16853. [PMID: 34779199 DOI: 10.1021/acs.jmedchem.1c01683] [Citation(s) in RCA: 9] [Impact Index Per Article: 3.0] [Indexed: 01/16/2023]
Abstract
The success of artificial intelligence (AI) models has been limited by their need for large amounts of high-quality training data, which is the opposite of the situation in most drug discovery pipelines. Active learning (AL) is a subfield of AI that focuses on algorithms that select the data they need to improve their models. Here, we propose a two-phase AL pipeline and apply it to the prediction of drug oral plasma exposure. In phase I, the AL-based model demonstrated a remarkable capability to sample informative data from a noisy data set, using only 30% of the training data to reach an accuracy of 0.856 on an independent test set. In phase II, the AL-based model explored a large, diverse chemical space (855K samples) for experimental testing and feedback. Improved accuracy and new highly confident predictions (50K samples) were observed, suggesting that the model's applicability domain was significantly expanded.
Affiliation(s)
- Xiaoyu Ding
  - Drug Discovery and Design Center, State Key Laboratory of Drug Research, Shanghai Institute of Materia Medica, Chinese Academy of Sciences, 555 Zuchongzhi Road, Shanghai 201203, China; University of Chinese Academy of Sciences, No. 19A Yuquan Road, Beijing 100049, China
- Rongrong Cui
  - School of Chinese Materia Medica, Nanjing University of Chinese Medicine, 138 Xianlin Road, Nanjing 210023, China
- Jie Yu
  - Drug Discovery and Design Center, State Key Laboratory of Drug Research, Shanghai Institute of Materia Medica, Chinese Academy of Sciences, 555 Zuchongzhi Road, Shanghai 201203, China; University of Chinese Academy of Sciences, No. 19A Yuquan Road, Beijing 100049, China
- Tiantian Liu
  - Drug Discovery and Design Center, State Key Laboratory of Drug Research, Shanghai Institute of Materia Medica, Chinese Academy of Sciences, 555 Zuchongzhi Road, Shanghai 201203, China; University of Chinese Academy of Sciences, No. 19A Yuquan Road, Beijing 100049, China
- Tingfei Zhu
  - Drug Discovery and Design Center, State Key Laboratory of Drug Research, Shanghai Institute of Materia Medica, Chinese Academy of Sciences, 555 Zuchongzhi Road, Shanghai 201203, China; University of Chinese Academy of Sciences, No. 19A Yuquan Road, Beijing 100049, China
- Dingyan Wang
  - Drug Discovery and Design Center, State Key Laboratory of Drug Research, Shanghai Institute of Materia Medica, Chinese Academy of Sciences, 555 Zuchongzhi Road, Shanghai 201203, China; University of Chinese Academy of Sciences, No. 19A Yuquan Road, Beijing 100049, China
- Jie Chang
  - School of Chinese Materia Medica, Nanjing University of Chinese Medicine, 138 Xianlin Road, Nanjing 210023, China
- Zisheng Fan
  - School of Chinese Materia Medica, Nanjing University of Chinese Medicine, 138 Xianlin Road, Nanjing 210023, China
- Xiaomeng Liu
  - Drug Discovery and Design Center, State Key Laboratory of Drug Research, Shanghai Institute of Materia Medica, Chinese Academy of Sciences, 555 Zuchongzhi Road, Shanghai 201203, China; University of Chinese Academy of Sciences, No. 19A Yuquan Road, Beijing 100049, China
- Kaixian Chen
  - Drug Discovery and Design Center, State Key Laboratory of Drug Research, Shanghai Institute of Materia Medica, Chinese Academy of Sciences, 555 Zuchongzhi Road, Shanghai 201203, China; University of Chinese Academy of Sciences, No. 19A Yuquan Road, Beijing 100049, China; School of Chinese Materia Medica, Nanjing University of Chinese Medicine, 138 Xianlin Road, Nanjing 210023, China
- Hualiang Jiang
  - Drug Discovery and Design Center, State Key Laboratory of Drug Research, Shanghai Institute of Materia Medica, Chinese Academy of Sciences, 555 Zuchongzhi Road, Shanghai 201203, China; University of Chinese Academy of Sciences, No. 19A Yuquan Road, Beijing 100049, China; School of Chinese Materia Medica, Nanjing University of Chinese Medicine, 138 Xianlin Road, Nanjing 210023, China; School of Life Science and Technology, ShanghaiTech University, 393 Huaxiazhong Road, Shanghai 200031, China
- Xutong Li
  - Drug Discovery and Design Center, State Key Laboratory of Drug Research, Shanghai Institute of Materia Medica, Chinese Academy of Sciences, 555 Zuchongzhi Road, Shanghai 201203, China; University of Chinese Academy of Sciences, No. 19A Yuquan Road, Beijing 100049, China
- Xiaomin Luo
  - Drug Discovery and Design Center, State Key Laboratory of Drug Research, Shanghai Institute of Materia Medica, Chinese Academy of Sciences, 555 Zuchongzhi Road, Shanghai 201203, China; University of Chinese Academy of Sciences, No. 19A Yuquan Road, Beijing 100049, China
- Mingyue Zheng
  - Drug Discovery and Design Center, State Key Laboratory of Drug Research, Shanghai Institute of Materia Medica, Chinese Academy of Sciences, 555 Zuchongzhi Road, Shanghai 201203, China; University of Chinese Academy of Sciences, No. 19A Yuquan Road, Beijing 100049, China; School of Chinese Materia Medica, Nanjing University of Chinese Medicine, 138 Xianlin Road, Nanjing 210023, China
6
Battiti FO, Newman AH, Bonifazi A. Exception That Proves the Rule: Investigation of Privileged Stereochemistry in Designing Dopamine D3R Bitopic Agonists. ACS Med Chem Lett 2020; 11:1956-1964. [PMID: 33062179 DOI: 10.1021/acsmedchemlett.9b00660] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.0] [Received: 12/30/2019] [Accepted: 02/28/2020] [Indexed: 01/11/2023]
Abstract
In this study, starting from our selective D3R agonist FOB02-04A (5), we investigated the chemical space around the linker portion of the molecule via insertion of a hydroxyl substituent and ring expansion of the trans-cyclopropyl moiety into a trans-cyclohexyl scaffold. Moreover, to further elucidate the importance of primary pharmacophore stereochemistry in the design of bitopic ligands, we investigated the chiral requirements of (+)-PD128907 ((+)-(4aR,10bR)-2) by synthesizing and resolving bitopic analogues in all the cis and trans combinations of its 9-methoxy-3,4,4a,10b-tetrahydro-2H,5H-chromeno[4,3-b][1,4]oxazine scaffold. Although we did not obtain new analogues with improved biological profiles compared with our current leads, such a "negative" result, arising from a poor or merely unimproved biological profile, is fundamental to better understanding chemical space and the optimal stereochemistry for target recognition. Herein, we identified essential structural information to understand the differences between orthosteric and bitopic ligand-receptor binding interactions, discriminate D3R active and inactive states, and assist multitarget receptor recognition. Exploring stereochemical complexity and developing extended D3R SAR from this new library complements previously described SAR and inspires future structural and computational biology investigations. Moreover, the expanded characterization of chemical space for D3R agonism may in the future be used in machine learning and artificial intelligence (AI)-based drug design.
Affiliation(s)
- Francisco O. Battiti
  - Medicinal Chemistry Section, Molecular Targets and Medications Discovery Branch, National Institute on Drug Abuse, Intramural Research Program, National Institutes of Health, 333 Cassell Drive, Baltimore, Maryland 21224, United States
- Amy Hauck Newman
  - Medicinal Chemistry Section, Molecular Targets and Medications Discovery Branch, National Institute on Drug Abuse, Intramural Research Program, National Institutes of Health, 333 Cassell Drive, Baltimore, Maryland 21224, United States
- Alessandro Bonifazi
  - Medicinal Chemistry Section, Molecular Targets and Medications Discovery Branch, National Institute on Drug Abuse, Intramural Research Program, National Institutes of Health, 333 Cassell Drive, Baltimore, Maryland 21224, United States
7
Nakano T, Takeda S, Brown JB. Active learning effectively identifies a minimal set of maximally informative and asymptotically performant cytotoxic structure-activity patterns in NCI-60 cell lines. RSC Med Chem 2020; 11:1075-1087. [PMID: 33479700 PMCID: PMC7513593 DOI: 10.1039/d0md00110d] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.0] [Received: 04/08/2020] [Accepted: 06/30/2020] [Indexed: 11/21/2022]
Abstract
The NCI-60 cancer cell line screening panel has provided insights for the development of subtype-specific chemical therapies and for drug repurposing. By extracting chemical structure and cytotoxicity patterns, virtual screening can complement high-throughput assay platforms and improve bioactive compound discovery rates through computational prefiltering of candidate compound libraries. Many groups report high predictive performance for computational models of NCI-60 data when using cross-validation or similar techniques, yet prospective therapy development for novel cancers may have little to no such data and may lack the resources to perform hit identification using large compound libraries. In contrast to bulk screening and analysis, the active learning methodology has demonstrated how to identify compounds for screening in small batches and update computational models iteratively, yielding predictive models from a minimum number of compounds and, importantly, clarifying the data volumes at which limits in predictive ability are reached. Here, in replicate per-cell-line experiments using 50% of the data (∼20 000 compounds) as the external prediction target, predictive limits are reproducibly reached after systematic selection of 10-30% of the incorporable half. The pattern was consistent across all 60 cell lines. Limits of predictability are found to correlate with the doubling times of the cell lines and with the number of cellular response discontinuities (activity cliffs) present per cell line. Organization into chemical scaffolds delineated degrees of predictive challenge. These results provide key insights for strategies in developing new inhibitors in existing cell lines or for future automated therapy selection in personalized oncotherapy.
Affiliation(s)
- Takumi Nakano
  - Kyoto University Graduate School of Medicine, Department of Molecular Biosciences, Life Science Informatics Research Unit, Konoemachi Yoshida Sakyo, Kyoto 606-8501, Japan
- Shunichi Takeda
  - Kyoto University Graduate School of Medicine, Department of Radiation Genetics, Konoemachi Yoshida Sakyo, Kyoto 606-8501, Japan
- J B Brown
  - Kyoto University Graduate School of Medicine, Department of Molecular Biosciences, Life Science Informatics Research Unit, Konoemachi Yoshida Sakyo, Kyoto 606-8501, Japan