1
Wu Z, Chen J, Li Y, Deng Y, Zhao H, Hsieh CY, Hou T. From Black Boxes to Actionable Insights: A Perspective on Explainable Artificial Intelligence for Scientific Discovery. J Chem Inf Model 2023; 63:7617-7627. [PMID: 38079566 DOI: 10.1021/acs.jcim.3c01642] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Indexed: 12/26/2023]
Abstract
The application of Explainable Artificial Intelligence (XAI) in the field of chemistry has garnered growing interest for its potential to justify the prediction of black-box machine learning models and provide actionable insights. We first survey a range of XAI techniques adapted for chemical applications and categorize them based on the technical details of each methodology. We then present a few case studies to illustrate the practical utility of XAI, such as identifying carcinogenic molecules and guiding molecular optimizations, in order to provide chemists with concrete examples of ways to take full advantage of XAI-augmented machine learning for chemistry. Despite the initial success of XAI in chemistry, we still face the challenges of developing more reliable explanations, assuring robustness against adversarial actions, and customizing the explanation for different applications and needs of the diverse scientific community. Finally, we discuss the emerging role of large language models like GPT in generating natural language explanations and the specific challenges associated with them. We advocate that addressing the aforementioned challenges and actively embracing new techniques may contribute to establishing machine learning as an indispensable technique for chemistry in this digital era.
Affiliation(s)
- Zhenxing Wu
  - Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, College of Pharmaceutical Sciences, Zhejiang University, Hangzhou, 310058 Zhejiang, P. R. China
  - CarbonSilicon AI Technology Company, Limited, Hangzhou, 310018 Zhejiang, P. R. China
- Jihong Chen
  - Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, College of Pharmaceutical Sciences, Zhejiang University, Hangzhou, 310058 Zhejiang, P. R. China
  - CarbonSilicon AI Technology Company, Limited, Hangzhou, 310018 Zhejiang, P. R. China
- Yitong Li
  - Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, College of Pharmaceutical Sciences, Zhejiang University, Hangzhou, 310058 Zhejiang, P. R. China
- Yafeng Deng
  - CarbonSilicon AI Technology Company, Limited, Hangzhou, 310018 Zhejiang, P. R. China
- Haitao Zhao
  - Center for Intelligent and Biomimetic Systems, Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences, Shenzhen, 440305 Guangdong, P. R. China
- Chang-Yu Hsieh
  - Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, College of Pharmaceutical Sciences, Zhejiang University, Hangzhou, 310058 Zhejiang, P. R. China
- Tingjun Hou
  - Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, College of Pharmaceutical Sciences, Zhejiang University, Hangzhou, 310058 Zhejiang, P. R. China
2
Mittal A, Ahuja G. Advancing chemical carcinogenicity prediction modeling: opportunities and challenges. Trends Pharmacol Sci 2023; 44:400-410. [PMID: 37183054 DOI: 10.1016/j.tips.2023.04.002] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/14/2023] [Revised: 04/11/2023] [Accepted: 04/18/2023] [Indexed: 05/16/2023]
Abstract
Carcinogenicity assessment of any compound is a laborious and expensive exercise with several associated ethical and practical concerns. While artificial intelligence (AI) offers promising solutions, unfortunately, it is contingent on several challenges concerning the inadequacy of available experimentally validated (non)carcinogen datasets and variabilities within bioassays, which contribute to the compromised model training. Existing AI solutions that leverage classical chemistry-driven descriptors do not provide adequate biological interpretability involved in imparting carcinogenicity. This highlights the urgency to devise alternative AI strategies. We propose multiple strategies, including implementing data-driven (integrated databases) and known carcinogen-characteristic-derived features to overcome these apparent shortcomings. In summary, these next-generation approaches will continue facilitating robust chemical carcinogenicity prediction, concomitant with deeper mechanistic insights.
Affiliation(s)
- Aayushi Mittal
  - Department of Computational Biology, Indraprastha Institute of Information Technology-Delhi (IIIT-Delhi), Okhla, Phase III, New Delhi, 110020, India.
- Gaurav Ahuja
  - Department of Computational Biology, Indraprastha Institute of Information Technology-Delhi (IIIT-Delhi), Okhla, Phase III, New Delhi, 110020, India.
3
Martin TB, Audus DJ. Emerging Trends in Machine Learning: A Polymer Perspective. ACS Polymers Au 2023; 3:239-258. [PMID: 37334191 PMCID: PMC10273415 DOI: 10.1021/acspolymersau.2c00053] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/16/2022] [Revised: 12/20/2022] [Accepted: 12/21/2022] [Indexed: 01/19/2023]
Abstract
In the last five years, there has been tremendous growth in machine learning and artificial intelligence as applied to polymer science. Here, we highlight the unique challenges presented by polymers and how the field is addressing them. We focus on emerging trends with an emphasis on topics that have received less attention in the review literature. Finally, we provide an outlook for the field, outline important growth areas in machine learning and artificial intelligence for polymer science and discuss important advances from the greater material science community.
Affiliation(s)
- Tyler B. Martin
  - National Institute of Standards and Technology, Gaithersburg, Maryland 20899, United States
- Debra J. Audus
  - National Institute of Standards and Technology, Gaithersburg, Maryland 20899, United States
4
Wu Z, Wang J, Du H, Jiang D, Kang Y, Li D, Pan P, Deng Y, Cao D, Hsieh CY, Hou T. Chemistry-intuitive explanation of graph neural networks for molecular property prediction with substructure masking. Nat Commun 2023; 14:2585. [PMID: 37142585 PMCID: PMC10160109 DOI: 10.1038/s41467-023-38192-3] [Citation(s) in RCA: 28] [Impact Index Per Article: 14.0] [Received: 09/17/2022] [Accepted: 04/12/2023] [Indexed: 05/06/2023]
Abstract
Graph neural networks (GNNs) have been widely used in molecular property prediction, but explaining their black-box predictions is still a challenge. Most existing explanation methods for GNNs in chemistry focus on attributing model predictions to individual nodes, edges or fragments that are not necessarily derived from a chemically meaningful segmentation of molecules. To address this challenge, we propose a method named substructure mask explanation (SME). SME is based on well-established molecular segmentation methods and provides an interpretation that aligns with the understanding of chemists. We apply SME to elucidate how GNNs learn to predict aqueous solubility, genotoxicity, cardiotoxicity and blood-brain barrier permeation for small molecules. SME provides interpretation that is consistent with the understanding of chemists, alerts them to unreliable performance, and guides them in structural optimization for target properties. Hence, we believe that SME empowers chemists to confidently mine structure-activity relationship (SAR) from reliable GNNs through a transparent inspection on how GNNs pick up useful signals when learning from data.
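The masking idea named in this abstract can be illustrated with a minimal sketch. It is not the authors' SME implementation: the abstract does not give the attribution formula, so the sketch assumes that a substructure's attribution is the change in the model's prediction when that substructure is masked out. The `toy_model` function, its fragment names, and its weights are hypothetical stand-ins for a trained GNN and a real molecular segmentation.

```python
from typing import Callable, List, Sequence, Tuple


def toy_model(fragments: Sequence[str]) -> float:
    """Hypothetical stand-in for a trained GNN: scores a molecule from its fragments."""
    weights = {"hydroxyl": 1.0, "carboxyl": 1.5, "phenyl": -2.0}
    score = sum(weights.get(f, 0.0) for f in fragments)
    # Small interaction term so the toy model is not purely additive.
    if "hydroxyl" in fragments and "carboxyl" in fragments:
        score += 0.5
    return score


def mask_attributions(
    model: Callable[[Sequence[str]], float],
    fragments: Sequence[str],
) -> List[Tuple[str, float]]:
    """Assumed attribution: prediction on the full molecule minus the
    prediction with one substructure removed (masked)."""
    full = model(fragments)
    results = []
    for i, frag in enumerate(fragments):
        masked = [f for j, f in enumerate(fragments) if j != i]
        results.append((frag, full - model(masked)))
    return results


if __name__ == "__main__":
    for name, value in mask_attributions(toy_model, ["phenyl", "hydroxyl", "carboxyl"]):
        print(f"{name}: {value:+.2f}")
```

Because the toy model includes an interaction term, the attributions of hydroxyl and carboxyl each absorb the shared bonus, which is one reason masking-based scores need not sum to the full prediction.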
Affiliation(s)
- Zhenxing Wu
  - Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, College of Pharmaceutical Sciences, Zhejiang University, Hangzhou, 310058, Zhejiang, P.R. China
  - CarbonSilicon AI Technology Co., Ltd, Hangzhou, 310018, Zhejiang, P.R. China
- Jike Wang
  - Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, College of Pharmaceutical Sciences, Zhejiang University, Hangzhou, 310058, Zhejiang, P.R. China
  - CarbonSilicon AI Technology Co., Ltd, Hangzhou, 310018, Zhejiang, P.R. China
  - National Engineering Research Center for Multimedia Software, School of Computer Science, Wuhan University, Wuhan, 430072, Hubei, P.R. China
- Hongyan Du
  - Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, College of Pharmaceutical Sciences, Zhejiang University, Hangzhou, 310058, Zhejiang, P.R. China
  - CarbonSilicon AI Technology Co., Ltd, Hangzhou, 310018, Zhejiang, P.R. China
- Dejun Jiang
  - Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, College of Pharmaceutical Sciences, Zhejiang University, Hangzhou, 310058, Zhejiang, P.R. China
  - CarbonSilicon AI Technology Co., Ltd, Hangzhou, 310018, Zhejiang, P.R. China
- Yu Kang
  - Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, College of Pharmaceutical Sciences, Zhejiang University, Hangzhou, 310058, Zhejiang, P.R. China
- Dan Li
  - Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, College of Pharmaceutical Sciences, Zhejiang University, Hangzhou, 310058, Zhejiang, P.R. China
- Peichen Pan
  - Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, College of Pharmaceutical Sciences, Zhejiang University, Hangzhou, 310058, Zhejiang, P.R. China
- Yafeng Deng
  - CarbonSilicon AI Technology Co., Ltd, Hangzhou, 310018, Zhejiang, P.R. China
- Dongsheng Cao
  - Xiangya School of Pharmaceutical Sciences, Central South University, Changsha, 410004, Hunan, P.R. China
- Chang-Yu Hsieh
  - Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, College of Pharmaceutical Sciences, Zhejiang University, Hangzhou, 310058, Zhejiang, P.R. China
- Tingjun Hou
  - Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, College of Pharmaceutical Sciences, Zhejiang University, Hangzhou, 310058, Zhejiang, P.R. China
5
Wu J, Wang J, Wu Z, Zhang S, Deng Y, Kang Y, Cao D, Hsieh CY, Hou T. ALipSol: An Attention-Driven Mixture-of-Experts Model for Lipophilicity and Solubility Prediction. J Chem Inf Model 2022; 62:5975-5987. [PMID: 36417544 DOI: 10.1021/acs.jcim.2c01290] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Indexed: 11/24/2022]
Abstract
Lipophilicity (logD) and aqueous solubility (logSw) play a central role in drug development. The accurate prediction of these properties remains to be solved due to data scarcity. Current methodologies neglect the intrinsic relationships between physicochemical properties and usually ignore the ionization effects. Here, we propose an attention-driven mixture-of-experts (MoE) model named ALipSol, which explicitly reproduces the hierarchy of task relationships. We adopt the principle of divide-and-conquer by breaking down the complex end point (logD or logSw) into simpler ones (acidic pKa, basic pKa, and logP) and allocating a specific expert network for each subproblem. Subsequently, we implement transfer learning to extract knowledge from related tasks, thus alleviating the dilemma of limited data. Additionally, we substitute the gating network with an attention mechanism to better capture the dynamic task relationships on a per-example basis. We adopt local fine-tuning and consensus prediction to further boost model performance. Extensive evaluation experiments verify the success of the ALipSol model, which achieves RMSE improvement of 8.04%, 2.49%, 8.57%, 12.8%, and 8.60% on the Lipop, ESOL, AqSolDB, external logD, and external logS data sets, respectively, compared with Attentive FP and the state-of-the-art in silico tools. In particular, our model yields more significant advantages (Welch's t-test) for small training data, implying its high robustness and generalizability. The interpretability analysis proves that the atom contributions learned by ALipSol are more reasonable compared with the vanilla Attentive FP, and the substitution effects in benzene derivatives agreed well with empirical constants, revealing the potential of our model to extract useful patterns from data and provide guidance for lead optimization.
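The idea of replacing a gating network with per-example attention over experts can be sketched as follows. This is an illustrative assumption, not the ALipSol architecture: the abstract names the experts (acidic pKa, basic pKa, logP) but does not specify the attention form, so the sketch assumes scaled dot-product attention between a per-molecule query vector and one key vector per expert. The function names, vectors, and scalar expert outputs are all hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def softmax(scores: Sequence[float]) -> List[float]:
    """Numerically stable softmax over a list of scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_weights(query: Sequence[float], keys: Sequence[Sequence[float]]) -> List[float]:
    """Assumed scaled dot-product attention: one weight per expert for this example."""
    scale = math.sqrt(len(query))
    scores = [sum(q * k for q, k in zip(query, key)) / scale for key in keys]
    return softmax(scores)


def combine_experts(
    query: Sequence[float],
    keys: Sequence[Sequence[float]],
    expert_outputs: Sequence[float],
) -> Tuple[float, List[float]]:
    """Weight each expert's scalar output by its attention weight and sum."""
    weights = attention_weights(query, keys)
    prediction = sum(w * o for w, o in zip(weights, expert_outputs))
    return prediction, weights


if __name__ == "__main__":
    # Hypothetical values: one key and one output per expert
    # (acidic pKa, basic pKa, logP); the query encodes one molecule.
    query = [1.0, 0.0]
    keys = [[2.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
    outputs = [4.2, 9.1, 2.3]
    prediction, weights = combine_experts(query, keys, outputs)
    print("weights:", [round(w, 3) for w in weights])
    print("combined prediction:", round(prediction, 3))
```

Since the weights are recomputed from each molecule's query, the mix of experts can differ from one example to the next, which is the per-example behavior the abstract attributes to the attention mechanism.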
Affiliation(s)
- Jialu Wu
  - Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, College of Pharmaceutical Sciences, Zhejiang University, Hangzhou, 310058 Zhejiang, P. R. China
  - CarbonSilicon AI Technology Co., Ltd, Hangzhou, 310018 Zhejiang, P. R. China
- Junmei Wang
  - Department of Pharmaceutical Sciences and Computational Chemical Genomics Screening Center, School of Pharmacy, University of Pittsburgh, Pittsburgh, Pennsylvania 15261, United States
- Zhenxing Wu
  - Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, College of Pharmaceutical Sciences, Zhejiang University, Hangzhou, 310058 Zhejiang, P. R. China
  - CarbonSilicon AI Technology Co., Ltd, Hangzhou, 310018 Zhejiang, P. R. China
- Shengyu Zhang
  - Tencent Quantum Laboratory, Tencent, Shenzhen, 518057 Guangdong, P. R. China
- Yafeng Deng
  - CarbonSilicon AI Technology Co., Ltd, Hangzhou, 310018 Zhejiang, P. R. China
- Yu Kang
  - Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, College of Pharmaceutical Sciences, Zhejiang University, Hangzhou, 310058 Zhejiang, P. R. China
- Dongsheng Cao
  - Xiangya School of Pharmaceutical Sciences, Central South University, Changsha, 410004 Hunan, P. R. China
- Chang-Yu Hsieh
  - Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, College of Pharmaceutical Sciences, Zhejiang University, Hangzhou, 310058 Zhejiang, P. R. China
- Tingjun Hou
  - Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, College of Pharmaceutical Sciences, Zhejiang University, Hangzhou, 310058 Zhejiang, P. R. China
6
The AI system that picks carcinogens out of the chemical crowd. Nature 2022. [DOI: 10.1038/d41586-022-02165-1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/08/2022]