1
Lv Q, Chen G, He H, Yang Z, Zhao L, Chen HY, Chen CYC. TCMBank: bridges between the largest herbal medicines, chemical ingredients, target proteins, and associated diseases with intelligence text mining. Chem Sci 2023; 14:10684-10701. [PMID: 37829020 PMCID: PMC10566508 DOI: 10.1039/d3sc02139d] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Received: 04/26/2023] [Accepted: 07/30/2023] [Indexed: 10/14/2023]
Abstract
Traditional Chinese Medicine (TCM) has long been viewed as a precious source for modern drug discovery. AI-assisted drug discovery (AIDD) has been investigated extensively. However, two challenges remain in applying AIDD to guide TCM drug discovery: the lack of large amounts of standardized TCM-related information, and the tendency of AIDD to fail pathologically on out-of-domain data. We released TCM Database@Taiwan in 2011, and it has been widely disseminated and used. We have now developed TCMBank, the largest systematic free TCM database, as an extension of TCM Database@Taiwan. TCMBank contains 9192 herbs, 61 966 ingredients (unduplicated), 15 179 targets, 32 529 diseases, and their pairwise relationships. By integrating multiple data sources, TCMBank provides 3D structure information for ingredients as well as a standard list and detailed information on herbs, ingredients, targets, and diseases. TCMBank has an intelligent document identification module that continuously adds TCM-related information retrieved from the literature in PubChem. In addition, driven by TCMBank big data, we developed an ensemble learning-based drug discovery protocol for identifying potential leads and for drug repurposing. We take colorectal cancer and Alzheimer's disease as examples to demonstrate how artificial intelligence can accelerate drug discovery. Using TCMBank, researchers can view literature-driven relationship mappings between herbs/ingredients and genes/diseases, which helps in understanding the molecular mechanisms of action of ingredients and in identifying potentially effective new treatments. TCMBank is available at https://TCMBank.CN/.
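The herb, ingredient, target, and disease links described in this abstract form a chain of pairwise relations. As a rough illustration only (the abstract does not describe TCMBank's schema or any API), the sketch below composes such relations to list the diseases reachable from each herb. The relation tables, the `compose` helper, and all identifiers are hypothetical placeholders, not TCMBank records.

```python
from collections import defaultdict

# Hypothetical relation tables: each maps an ID to the set of IDs it links to.
# Placeholder IDs only; these are not TCMBank records.
herb_to_ingredient = {"herb_A": {"ing_1", "ing_2"}, "herb_B": {"ing_3"}}
ingredient_to_target = {"ing_1": {"T1"}, "ing_2": {"T2", "T3"}, "ing_3": {"T2"}}
target_to_disease = {"T1": {"D1"}, "T2": {"D2"}, "T3": {"D1"}}


def compose(left: dict, right: dict) -> dict:
    """Compose two relations: a -> b in `left` and b -> c in `right` yield a -> c."""
    out = defaultdict(set)
    for a, bs in left.items():
        for b in bs:
            # IDs with no entry in `right` contribute nothing.
            out[a] |= right.get(b, set())
    return dict(out)


# Chain the relations herb -> ingredient -> target -> disease.
herb_to_target = compose(herb_to_ingredient, ingredient_to_target)
herb_to_disease = compose(herb_to_target, target_to_disease)
print(herb_to_disease)
```

Composing relations step by step keeps each intermediate mapping (for example herb to target) available for inspection, which is one way to trace why a herb ends up linked to a disease.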
Affiliation(s)
- Qiujie Lv
- Artificial Intelligence Medical Research Center, School of Intelligent Systems Engineering, Shenzhen Campus of Sun Yat-sen University Shenzhen Guangdong 518107 P. R. China
- Guanxing Chen
- Artificial Intelligence Medical Research Center, School of Intelligent Systems Engineering, Shenzhen Campus of Sun Yat-sen University Shenzhen Guangdong 518107 P. R. China
- Haohuai He
- Artificial Intelligence Medical Research Center, School of Intelligent Systems Engineering, Shenzhen Campus of Sun Yat-sen University Shenzhen Guangdong 518107 P. R. China
- Ziduo Yang
- Artificial Intelligence Medical Research Center, School of Intelligent Systems Engineering, Shenzhen Campus of Sun Yat-sen University Shenzhen Guangdong 518107 P. R. China
- Lu Zhao
- Department of Clinical Laboratory, The Sixth Affiliated Hospital, Sun Yat-sen University Guangzhou Guangdong 510655 P. R. China
- Biomedical Innovation Center, The Sixth Affiliated Hospital, Sun Yat-sen University Guangzhou Guangdong 510655 P. R. China
- Hsin-Yi Chen
- Artificial Intelligence Medical Research Center, School of Intelligent Systems Engineering, Shenzhen Campus of Sun Yat-sen University Shenzhen Guangdong 518107 P. R. China
- Calvin Yu-Chian Chen
- Artificial Intelligence Medical Research Center, School of Intelligent Systems Engineering, Shenzhen Campus of Sun Yat-sen University Shenzhen Guangdong 518107 P. R. China
- Department of Medical Research, China Medical University Hospital Taichung 40447 Taiwan
- Department of Bioinformatics and Medical Engineering, Asia University Taichung 41354 Taiwan
- Guangdong L-Med Medicine Biotechnology Co., Ltd Meizhou Guangdong 514699 P. R. China
2
van Tilborg D, Alenicheva A, Grisoni F. Exposing the Limitations of Molecular Machine Learning with Activity Cliffs. J Chem Inf Model 2022; 62:5938-5951. [PMID: 36456532 PMCID: PMC9749029 DOI: 10.1021/acs.jcim.2c01073] [Citation(s) in RCA: 25] [Impact Index Per Article: 12.5] [Received: 08/23/2022] [Indexed: 12/03/2022]
Abstract
Machine learning has become a crucial tool in drug discovery and chemistry at large, e.g., to predict molecular properties such as bioactivity with high accuracy. However, activity cliffs (pairs of molecules that are highly similar in structure but exhibit large differences in potency) have received limited attention for their effect on model performance. These edge cases are not only informative for molecule discovery and optimization; models that accurately predict the potency of activity cliffs also have greater potential for prospective applications. Our work aims to fill the current knowledge gap on best-practice machine learning methods in the presence of activity cliffs. We benchmarked a total of 24 machine and deep learning approaches on curated bioactivity data from 30 macromolecular targets for their performance on activity cliff compounds. While all methods struggled in the presence of activity cliffs, machine learning approaches based on molecular descriptors outperformed more complex deep learning methods. Our findings highlight large case-by-case differences in performance, advocating for (a) the inclusion of dedicated "activity-cliff-centered" metrics during model development and evaluation and (b) the development of novel algorithms to better predict the properties of activity cliffs. To this end, the methods, metrics, and results of this study have been encapsulated into an open-access benchmarking platform named MoleculeACE (Activity Cliff Estimation, available on GitHub at: https://github.com/molML/MoleculeACE). MoleculeACE is designed to steer the community toward addressing the pressing but overlooked limitation that activity cliffs pose for molecular machine learning models.
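The activity-cliff definition in this abstract (high structural similarity, large potency gap) can be made concrete in a few lines. The sketch below is a simplified stand-in, not MoleculeACE's code: it represents fingerprints as sets of on-bit indices, uses Tanimoto similarity with an assumed cutoff of 0.9 and an assumed potency difference of more than 10-fold, and then computes an RMSE restricted to cliff compounds as one example of an "activity-cliff-centered" metric. The function names, thresholds, and toy data are illustrative assumptions.

```python
import math
from itertools import combinations


def tanimoto(a: frozenset, b: frozenset) -> float:
    """Tanimoto (Jaccard) similarity of two fingerprints given as sets of on-bits."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def find_activity_cliffs(mols, sim_threshold=0.9, fold_threshold=10.0):
    """Return (i, j, similarity, fold) for similar pairs with a large potency gap.

    mols: list of (fingerprint_bits, potency_nM) tuples (hypothetical input format).
    Both thresholds are assumptions chosen for illustration.
    """
    cliffs = []
    for (i, (fp_i, p_i)), (j, (fp_j, p_j)) in combinations(enumerate(mols), 2):
        sim = tanimoto(fp_i, fp_j)
        fold = max(p_i, p_j) / min(p_i, p_j)
        if sim >= sim_threshold and fold > fold_threshold:
            cliffs.append((i, j, round(sim, 3), fold))
    return cliffs


def rmse_on_subset(y_true, y_pred, indices):
    """Root-mean-square error computed only over the given compound indices."""
    idx = sorted(indices)
    return math.sqrt(sum((y_true[k] - y_pred[k]) ** 2 for k in idx) / len(idx))


# Toy data: molecules 0 and 1 share 19 of 21 on-bits but differ 50-fold in potency.
mols = [
    (frozenset(range(20)), 1.0),
    (frozenset(range(19)) | {25}, 50.0),
    (frozenset(range(100, 110)), 2.0),
]
cliffs = find_activity_cliffs(mols)
# Every compound that takes part in at least one cliff pair.
cliff_idx = {k for i, j, *_ in cliffs for k in (i, j)}

# Hypothetical model predictions on a log10(potency) scale.
y_true = [math.log10(p) for _, p in mols]
y_pred = [0.5, 0.7, 0.3]
print(cliffs, rmse_on_subset(y_true, y_pred, cliff_idx))
```

Reporting the cliff-only RMSE next to the overall RMSE makes it visible when a model that looks accurate on average is failing on exactly the compounds where small structural changes matter most.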
Affiliation(s)
- Derek van Tilborg
- Institute for Complex Molecular Systems and Dept. Biomedical Engineering, Eindhoven University of Technology, 5612 AZ Eindhoven, The Netherlands
- Centre for Living Technologies, Alliance TU/e, WUR, UU, UMC Utrecht, 3584 CB Utrecht, The Netherlands
- Francesca Grisoni
- Institute for Complex Molecular Systems and Dept. Biomedical Engineering, Eindhoven University of Technology, 5612 AZ Eindhoven, The Netherlands
- Centre for Living Technologies, Alliance TU/e, WUR, UU, UMC Utrecht, 3584 CB Utrecht, The Netherlands
3
Zhao X, Sun Y, Zhang R, Chen Z, Hua Y, Zhang P, Guo H, Cui X, Huang X, Li X. Machine Learning Modeling and Insights into the Structural Characteristics of Drug-Induced Neurotoxicity. J Chem Inf Model 2022; 62:6035-6045. [PMID: 36448818 DOI: 10.1021/acs.jcim.2c01131] [Citation(s) in RCA: 4] [Impact Index Per Article: 2.0] [Indexed: 12/02/2022]
Abstract
Many clinical drugs can cause neurotoxicity, which has become a concern for human populations worldwide. Detecting the drug-induced neurotoxicity (DINeurot) potential of compounds with biological experimental methods is costly and time-consuming. In addition, few studies have addressed the structural characteristics of neurotoxic chemicals. In this study, we focused on computational modeling of drug-induced neurotoxicity with machine learning methods and on insights into the structural characteristics of neurotoxic chemicals. Based on clinical drug data with neurotoxicity effects, we developed 35 different classifiers by combining five machine learning methods with seven fingerprint packages. The best-performing model achieved good results in both 5-fold cross-validation (balanced accuracy of 76.51%, AUC of 0.83, and MCC of 0.52) and external validation (balanced accuracy of 83.63%, AUC of 0.87, and MCC of 0.67). The model is freely accessible on the web server DINeuroTpredictor (http://dineurot.sapredictor.cn/). We also analyzed the distribution of several key molecular properties between neurotoxic and non-neurotoxic structures. Several physicochemical properties differed significantly between neurotoxic and non-neurotoxic compounds, including molecular polar surface area (MPSA), AlogP, the number of hydrogen bond acceptors (nHAcc) and donors (nHDon), the number of rotatable bonds (nRotB), and the number of aromatic rings (nAR). In addition, 18 structural alerts responsible for chemical neurotoxicity were identified and have been integrated into our web server SApredictor (http://www.sapredictor.cn). These results could provide useful information for understanding the structural characteristics of chemical neurotoxicity and for predicting it computationally.
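Balanced accuracy, AUC, and MCC, the three metrics reported in this abstract, are all computed from a handful of counts or score comparisons. The plain-Python sketch below shows how each is defined, using toy labels and scores rather than the study's data; in practice scikit-learn provides equivalent functions (`balanced_accuracy_score`, `roc_auc_score`, `matthews_corrcoef`). The 0.5 decision threshold and all names here are assumptions for illustration.

```python
import math


def confusion_counts(y_true, y_pred):
    """Return (tp, tn, fp, fn) for binary labels (1 = neurotoxic, 0 = non-neurotoxic)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == 1 and p == 1 for t, p in pairs)
    tn = sum(t == 0 and p == 0 for t, p in pairs)
    fp = sum(t == 0 and p == 1 for t, p in pairs)
    fn = sum(t == 1 and p == 0 for t, p in pairs)
    return tp, tn, fp, fn


def balanced_accuracy(y_true, y_pred):
    """Mean of sensitivity and specificity; robust to class imbalance."""
    tp, tn, fp, fn = confusion_counts(y_true, y_pred)
    return (tp / (tp + fn) + tn / (tn + fp)) / 2


def mcc(y_true, y_pred):
    """Matthews correlation coefficient; 0.0 when any marginal count is zero."""
    tp, tn, fp, fn = confusion_counts(y_true, y_pred)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0


def roc_auc(y_true, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly; ties count 0.5."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy, imbalanced example: 3 positives, 5 negatives.
y_true = [1, 1, 1, 0, 0, 0, 0, 0]
scores = [0.9, 0.8, 0.3, 0.6, 0.2, 0.1, 0.4, 0.05]
y_pred = [int(s >= 0.5) for s in scores]  # assumed threshold of 0.5
print(balanced_accuracy(y_true, y_pred), mcc(y_true, y_pred), roc_auc(y_true, scores))
```

AUC is computed from the raw scores, whereas balanced accuracy and MCC depend on the chosen threshold, which is why a model can keep the same AUC while its balanced accuracy and MCC change.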
Affiliation(s)
- Xia Zhao
- Department of Clinical Pharmacy, The First Affiliated Hospital of Shandong First Medical University & Shandong Provincial Qianfoshan Hospital, Shandong Engineering and Technology Research Center for Pediatric Drug Development, Shandong Medicine and Health Key Laboratory of Clinical Pharmacy, Jinan, Shandong 250014, China
- Yuhao Sun
- Department of Clinical Pharmacy, The First Affiliated Hospital of Shandong First Medical University & Shandong Provincial Qianfoshan Hospital, Shandong Engineering and Technology Research Center for Pediatric Drug Development, Shandong Medicine and Health Key Laboratory of Clinical Pharmacy, Jinan, Shandong 250014, China
- Ruiqiu Zhang
- Department of Clinical Pharmacy, The First Affiliated Hospital of Shandong First Medical University & Shandong Provincial Qianfoshan Hospital, Shandong Engineering and Technology Research Center for Pediatric Drug Development, Shandong Medicine and Health Key Laboratory of Clinical Pharmacy, Jinan, Shandong 250014, China
- Zhaoyang Chen
- Department of Clinical Pharmacy, The First Affiliated Hospital of Shandong First Medical University & Shandong Provincial Qianfoshan Hospital, Shandong Engineering and Technology Research Center for Pediatric Drug Development, Shandong Medicine and Health Key Laboratory of Clinical Pharmacy, Jinan, Shandong 250014, China
- Yuqing Hua
- Department of Clinical Pharmacy, The First Affiliated Hospital of Shandong First Medical University & Shandong Provincial Qianfoshan Hospital, Shandong Engineering and Technology Research Center for Pediatric Drug Development, Shandong Medicine and Health Key Laboratory of Clinical Pharmacy, Jinan, Shandong 250014, China
- Pei Zhang
- Department of Clinical Pharmacy, The First Affiliated Hospital of Shandong First Medical University & Shandong Provincial Qianfoshan Hospital, Shandong Engineering and Technology Research Center for Pediatric Drug Development, Shandong Medicine and Health Key Laboratory of Clinical Pharmacy, Jinan, Shandong 250014, China
- Huizhu Guo
- Department of Clinical Pharmacy, The First Affiliated Hospital of Shandong First Medical University & Shandong Provincial Qianfoshan Hospital, Shandong Engineering and Technology Research Center for Pediatric Drug Development, Shandong Medicine and Health Key Laboratory of Clinical Pharmacy, Jinan, Shandong 250014, China
- Xueyan Cui
- Department of Clinical Pharmacy, The First Affiliated Hospital of Shandong First Medical University & Shandong Provincial Qianfoshan Hospital, Shandong Engineering and Technology Research Center for Pediatric Drug Development, Shandong Medicine and Health Key Laboratory of Clinical Pharmacy, Jinan, Shandong 250014, China
- Xin Huang
- Department of Clinical Pharmacy, The First Affiliated Hospital of Shandong First Medical University & Shandong Provincial Qianfoshan Hospital, Shandong Engineering and Technology Research Center for Pediatric Drug Development, Shandong Medicine and Health Key Laboratory of Clinical Pharmacy, Jinan, Shandong 250014, China
- Xiao Li
- Department of Clinical Pharmacy, The First Affiliated Hospital of Shandong First Medical University & Shandong Provincial Qianfoshan Hospital, Shandong Engineering and Technology Research Center for Pediatric Drug Development, Shandong Medicine and Health Key Laboratory of Clinical Pharmacy, Jinan, Shandong 250014, China
4
Soares TA, Nunes-Alves A, Mazzolari A, Ruggiu F, Wei GW, Merz K. The (Re)-Evolution of Quantitative Structure-Activity Relationship (QSAR) Studies Propelled by the Surge of Machine Learning Methods. J Chem Inf Model 2022; 62:5317-5320. [PMID: 36437763 DOI: 10.1021/acs.jcim.2c01422] [Citation(s) in RCA: 16] [Impact Index Per Article: 8.0] [Indexed: 11/29/2022]
Affiliation(s)
- Thereza A Soares
- Department of Chemistry, University of São Paulo, Ribeirão Preto 055508-090, Brazil
- Hylleraas Centre for Quantum Molecular Sciences, University of Oslo, Oslo 0315, Norway
- Ariane Nunes-Alves
- Institute of Chemistry, Technische Universität Berlin, Berlin 10623, Germany
- Angelica Mazzolari
- Department of Pharmaceutical Sciences, University of Milan, Via Mangiagalli 25, Milan I-20133, Italy
- Fiorella Ruggiu
- Insitro Inc., 279 East Grand Avenue, South San Francisco 94080, California, United States
- Guo-Wei Wei
- Department of Mathematics, Michigan State University, East Lansing 48824, Michigan, United States
- Kenneth Merz
- Department of Chemistry, Michigan State University, East Lansing 48824, Michigan, United States
5
Sheridan RP. Stability of Prediction in Production ADMET Models as a Function of Version: Why and When Predictions Change. J Chem Inf Model 2022; 62:3477-3485. [PMID: 35849796 DOI: 10.1021/acs.jcim.2c00803] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Indexed: 11/29/2022]
Abstract
Like other pharmaceutical companies, we maintain production QSAR models of ADMET end points and update them regularly. Here, for six ADMET end points, we examine the predictions of test set molecules on multiple versions of random forest models spanning a period of 10 years. For any given end point, the predictions for the majority of molecules are similar across all model versions. However, for a small minority of molecules, the prediction shifts substantially over the span of a few versions. For most molecules that shift, the prediction becomes more accurate at later times. This Perspective investigates metrics that can help indicate which molecules will shift substantially in prediction and when the shift will occur.
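The abstract does not spell out the metrics this Perspective uses, so the sketch below is only a generic illustration of tracking prediction shifts across versions, not Sheridan's method. Given one prediction per model version for each test molecule, it flags molecules whose prediction moves by at least an assumed threshold relative to the first version, records the first version at which that happens, and checks whether the latest prediction is closer to the observed value. The `shift_report` function, the threshold, and the toy data are hypothetical.

```python
def shift_report(preds_by_version, observed, threshold=1.0):
    """Summarize prediction shifts across model versions.

    preds_by_version: molecule ID -> predictions ordered from oldest to newest version.
    observed: molecule ID -> measured value.
    threshold: minimum absolute change from the first version that counts as a shift (assumed).
    """
    report = {}
    for mol, preds in preds_by_version.items():
        first = preds[0]
        # Index of the first version whose prediction differs from version 0 by >= threshold.
        shift_version = next(
            (v for v, p in enumerate(preds) if abs(p - first) >= threshold), None
        )
        if shift_version is None:
            continue  # prediction stayed within the threshold for every version
        report[mol] = {
            "shift_version": shift_version,
            "total_shift": round(preds[-1] - first, 3),
            # True if the newest prediction is closer to the measured value than the oldest.
            "improved": abs(preds[-1] - observed[mol]) < abs(first - observed[mol]),
        }
    return report


# Hypothetical predictions (e.g., on a log scale) for four model versions.
preds = {
    "m1": [5.0, 5.1, 4.9, 5.0],  # stable
    "m2": [5.0, 5.2, 6.3, 6.8],  # shifts toward the observed value
    "m3": [7.0, 6.9, 5.8, 5.9],  # shifts away from the observed value
}
observed = {"m1": 5.0, "m2": 7.0, "m3": 7.1}
print(shift_report(preds, observed))
```

Comparing against the first version rather than the immediately preceding one catches slow drifts that accumulate over several updates, at the cost of ignoring back-and-forth oscillations that stay within the threshold.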
Affiliation(s)
- Robert P Sheridan
- Computational and Structural Chemistry, Merck & Co., Inc., Kenilworth, New Jersey 07033, United States