1
|
Kengkanna A, Ohue M. Enhancing property and activity prediction and interpretation using multiple molecular graph representations with MMGX. Commun Chem 2024; 7:74. [PMID: 38580841 PMCID: PMC10997661 DOI: 10.1038/s42004-024-01155-w] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/08/2023] [Accepted: 03/18/2024] [Indexed: 04/07/2024] Open
Abstract
Graph Neural Networks (GNNs) excel in compound property and activity prediction, but the choice of molecular graph representations significantly influences model learning and interpretation. While atom-level molecular graphs resemble natural topology, they overlook key substructures or functional groups and their interpretation partially aligns with chemical intuition. Recent research suggests alternative representations using reduced molecular graphs to integrate higher-level chemical information and leverages both representations for model. However, there is a lack of studies about applicability and impact of different molecular graphs on model learning and interpretation. Here, we introduce MMGX (Multiple Molecular Graph eXplainable discovery), investigating the effects of multiple molecular graphs, including Atom, Pharmacophore, JunctionTree, and FunctionalGroup, on model learning and interpretation with various perspectives. Our findings indicate that multiple graphs relatively improve model performance, but in varying degrees depending on datasets. Interpretation from multiple graphs in different views provides more comprehensive features and potential substructures consistent with background knowledge. These results help to understand model decisions and offer valuable insights for subsequent tasks. The concept of multiple molecular graph representations and diverse interpretation perspectives has broad applicability across tasks, architectures, and explanation techniques, enhancing model learning and interpretation for relevant applications in drug discovery.
Collapse
Affiliation(s)
- Apakorn Kengkanna
- Department of Computer Science, School of Computing, Tokyo Institute of Technology, Kanagawa, 226-8501, Japan
| | - Masahito Ohue
- Department of Computer Science, School of Computing, Tokyo Institute of Technology, Kanagawa, 226-8501, Japan.
| |
Collapse
|
2
|
Biehn SE, Goncalves LM, Lehmann J, Marty JD, Mueller C, Ramirez SA, Tillier F, Sage CR. BioPrint meets the AI age: development of artificial intelligence-based ADMET models for the drug-discovery platform SAFIRE. Future Med Chem 2024; 16:587-599. [PMID: 38372202 DOI: 10.4155/fmc-2024-0007] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/08/2024] [Accepted: 02/08/2024] [Indexed: 02/20/2024] Open
Abstract
Background: To prioritize compounds with a higher likelihood of success, artificial intelligence models can be used to predict absorption, distribution, metabolism, excretion and toxicity (ADMET) properties of molecules quickly and efficiently. Methods: Models were trained with BioPrint database proprietary data along with public datasets to predict various ADMET end points for the SAFIRE platform. Results: SAFIRE models performed at or above 75% accuracy and 0.4 Matthew's correlation coefficient with validation sets. Training with both proprietary and public data improved model performance and expanded the chemical space on which the models were trained. The platform features scoring functionality to guide user decision-making. Conclusion: High-quality datasets along with chemical space considerations yielded ADMET models performing favorably with utility in the drug discovery process.
Collapse
Affiliation(s)
- Sarah E Biehn
- Eurofins DiscoveryAI, Eurofins Panlabs, Inc., Saint Charles, MO 63304, USA
| | | | - Juerg Lehmann
- Eurofins DiscoveryAI, Eurofins Panlabs, Inc., Saint Charles, MO 63304, USA
| | - Jessica D Marty
- Eurofins DiscoveryAI, Eurofins Panlabs, Inc., Saint Charles, MO 63304, USA
| | - Christoph Mueller
- Eurofins DiscoveryAI, Eurofins Panlabs, Inc., Saint Charles, MO 63304, USA
| | - Samuel A Ramirez
- Eurofins DiscoveryAI, Eurofins Panlabs, Inc., Saint Charles, MO 63304, USA
| | - Fabien Tillier
- Eurofins DiscoveryAI, Eurofins Panlabs, Inc., Saint Charles, MO 63304, USA
| | - Carleton R Sage
- Eurofins DiscoveryAI, Eurofins Panlabs, Inc., Saint Charles, MO 63304, USA
| |
Collapse
|
3
|
Shen T, Guo J, Han Z, Zhang G, Liu Q, Si X, Wang D, Wu S, Xia J. AutoMolDesigner for Antibiotic Discovery: An AI-Based Open-Source Software for Automated Design of Small-Molecule Antibiotics. J Chem Inf Model 2024; 64:575-583. [PMID: 38265916 DOI: 10.1021/acs.jcim.3c01562] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 01/26/2024]
Abstract
Discovery of small-molecule antibiotics with novel chemotypes serves as one of the essential strategies to address antibiotic resistance. Although a considerable number of computational tools committed to molecular design have been reported, there is a deficit in holistic and efficient tools specifically developed for small-molecule antibiotic discovery. To address this issue, we report AutoMolDesigner, a computational modeling software dedicated to small-molecule antibiotic design. It is a generalized framework comprising two functional modules, i.e., generative-deep-learning-enabled molecular generation and automated machine-learning-based antibacterial activity/property prediction, wherein individually trained models and curated datasets are out-of-the-box for whole-cell-based antibiotic screening and design. It is open-source, thus allowing for the incorporation of new features for flexible use. Unlike most software programs based on Linux and command lines, this application equipped with a Qt-based graphical user interface can be run on personal computers with multiple operating systems, making it much easier to use for experimental scientists. The software and related materials are freely available at GitHub (https://github.com/taoshen99/AutoMolDesigner) and Zenodo (https://zenodo.org/record/10097899).
Collapse
Affiliation(s)
- Tao Shen
- State Key Laboratory of Bioactive Substance and Function of Natural Medicines, Institute of Materia Medica, Chinese Academy of Medical Sciences and Peking Union Medical College, Beijing 100050, China
| | - Jiale Guo
- State Key Laboratory of Bioactive Substance and Function of Natural Medicines, Institute of Materia Medica, Chinese Academy of Medical Sciences and Peking Union Medical College, Beijing 100050, China
| | - Zunsheng Han
- State Key Laboratory of Bioactive Substance and Function of Natural Medicines, Institute of Materia Medica, Chinese Academy of Medical Sciences and Peking Union Medical College, Beijing 100050, China
| | - Gao Zhang
- State Key Laboratory of Bioactive Substance and Function of Natural Medicines, Institute of Materia Medica, Chinese Academy of Medical Sciences and Peking Union Medical College, Beijing 100050, China
| | - Qingxin Liu
- State Key Laboratory of Bioactive Substance and Function of Natural Medicines, Institute of Materia Medica, Chinese Academy of Medical Sciences and Peking Union Medical College, Beijing 100050, China
- School of Pharmacy, Jiangsu Ocean University, Lianyungang, Jiangsu 222005, China
| | - Xinxin Si
- School of Pharmacy, Jiangsu Ocean University, Lianyungang, Jiangsu 222005, China
| | - Dongmei Wang
- State Key Laboratory of Bioactive Substance and Function of Natural Medicines, Institute of Materia Medica, Chinese Academy of Medical Sciences and Peking Union Medical College, Beijing 100050, China
| | - Song Wu
- State Key Laboratory of Bioactive Substance and Function of Natural Medicines, Institute of Materia Medica, Chinese Academy of Medical Sciences and Peking Union Medical College, Beijing 100050, China
| | - Jie Xia
- State Key Laboratory of Bioactive Substance and Function of Natural Medicines, Institute of Materia Medica, Chinese Academy of Medical Sciences and Peking Union Medical College, Beijing 100050, China
| |
Collapse
|
4
|
Cao H, Peng J, Zhou Z, Yang Z, Wang L, Sun Y, Wang Y, Liang Y. Investigation of the Binding Fraction of PFAS in Human Plasma and Underlying Mechanisms Based on Machine Learning and Molecular Dynamics Simulation. ENVIRONMENTAL SCIENCE & TECHNOLOGY 2023; 57:17762-17773. [PMID: 36282672 DOI: 10.1021/acs.est.2c04400] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/16/2023]
Abstract
More than 7000 per- and polyfluorinated alkyl substances (PFAS) have been documented in the U.S. Environmental Protection Agency's CompTox Chemicals database. These PFAS can be used in a broad range of industrial and consumer applications but may pose potential environmental issues and health risks. However, little is known about emerging PFAS bioaccumulation to assess their chemical safety. This study focuses specifically on the large and high-quality data set of fluorochemicals from the related environmental and pharmaceutical chemicals databases, and machine learning (ML) models were developed for the classification prediction of the unbound fraction of compounds in plasma. A comprehensive evaluation of the ML models shows that the best blending model yields an accuracy of 0.901 for the test set. The predictions suggest that most PFAS (∼92%) have a high binding fraction in plasma. Introduction of alkaline amino groups is likely to reduce the binding affinities of PFAS with plasma proteins. Molecular dynamics simulations indicate a clear distinction between the high and low binding fractions of PFAS. These computational workflows can be used to predict the bioaccumulation of emerging PFAS and are also helpful for the molecular design of PFAS to prevent the release of high-bioaccumulation compounds into the environment.
Collapse
Affiliation(s)
- Huiming Cao
- Hubei Key Laboratory of Environmental and Health Effects of Persistent Toxic Substances, School of Environment and Health, Jianghan University, Wuhan 430056, China
| | - Jianhua Peng
- Hubei Key Laboratory of Environmental and Health Effects of Persistent Toxic Substances, School of Environment and Health, Jianghan University, Wuhan 430056, China
| | - Zhen Zhou
- Hubei Key Laboratory of Environmental and Health Effects of Persistent Toxic Substances, School of Environment and Health, Jianghan University, Wuhan 430056, China
| | - Zeguo Yang
- Hubei Key Laboratory of Environmental and Health Effects of Persistent Toxic Substances, School of Environment and Health, Jianghan University, Wuhan 430056, China
| | - Ling Wang
- Hubei Key Laboratory of Environmental and Health Effects of Persistent Toxic Substances, School of Environment and Health, Jianghan University, Wuhan 430056, China
| | - Yuzhen Sun
- Hubei Key Laboratory of Environmental and Health Effects of Persistent Toxic Substances, School of Environment and Health, Jianghan University, Wuhan 430056, China
| | - Yawei Wang
- State Key Laboratory of Environmental Chemistry and Ecotoxicology, Research Center for Eco-Environmental Sciences, Chinese Academy of Sciences, Beijing 100085, China
| | - Yong Liang
- Hubei Key Laboratory of Environmental and Health Effects of Persistent Toxic Substances, School of Environment and Health, Jianghan University, Wuhan 430056, China
| |
Collapse
|
5
|
Wu W, Qian J, Liang C, Yang J, Ge G, Zhou Q, Guan X. GeoDILI: A Robust and Interpretable Model for Drug-Induced Liver Injury Prediction Using Graph Neural Network-Based Molecular Geometric Representation. Chem Res Toxicol 2023; 36:1717-1730. [PMID: 37839069 DOI: 10.1021/acs.chemrestox.3c00199] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 10/17/2023]
Abstract
Drug-induced liver injury (DILI) is a significant cause of drug failure and withdrawal due to liver damage. Accurate prediction of hepatotoxic compounds is crucial for safe drug development. Several DILI prediction models have been published, but they are built on different data sets, making it difficult to compare model performance. Moreover, most existing models are based on molecular fingerprints or descriptors, neglecting molecular geometric properties and lacking interpretability. To address these limitations, we developed GeoDILI, an interpretable graph neural network that uses a molecular geometric representation. First, we utilized a geometry-based pretrained molecular representation and optimized it on the DILI data set to improve predictive performance. Second, we leveraged gradient information to obtain high-precision atomic-level weights and deduce the dominant substructure. We benchmarked GeoDILI against recently published DILI prediction models, as well as popular GNN models and fingerprint-based machine learning models using the same data set, showing superior predictive performance of our proposed model. We applied the interpretable method in the DILI data set and derived seven precise and mechanistically elucidated structural alerts. Overall, GeoDILI provides a promising approach for accurate and interpretable DILI prediction with potential applications in drug discovery and safety assessment. The data and source code are available at GitHub repository (https://github.com/CSU-QJY/GeoDILI).
Collapse
Affiliation(s)
- Wenxuan Wu
- Institute of Interdisciplinary Integrative Medicine Research, Shanghai University of Traditional Chinese Medicine, Shanghai 201203, China
| | - Jiayu Qian
- School of Mathematics and Statistics, Central South University, Changsha, Hunan 410083, China
| | - Changjie Liang
- Institute of Interdisciplinary Integrative Medicine Research, Shanghai University of Traditional Chinese Medicine, Shanghai 201203, China
| | - Jingya Yang
- School of Mathematics and Statistics, Central South University, Changsha, Hunan 410083, China
| | - Guangbo Ge
- Institute of Interdisciplinary Integrative Medicine Research, Shanghai University of Traditional Chinese Medicine, Shanghai 201203, China
| | - Qingping Zhou
- School of Mathematics and Statistics, Central South University, Changsha, Hunan 410083, China
| | - Xiaoqing Guan
- Institute of Interdisciplinary Integrative Medicine Research, Shanghai University of Traditional Chinese Medicine, Shanghai 201203, China
| |
Collapse
|
6
|
Khaouane A, Khaouane L, Ferhat S, Hanini S. Deep Learning for Drug Development: Using CNNs in MIA-QSAR to Predict Plasma Protein Binding of Drugs. AAPS PharmSciTech 2023; 24:232. [PMID: 37964128 DOI: 10.1208/s12249-023-02686-6] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/21/2023] [Accepted: 10/24/2023] [Indexed: 11/16/2023] Open
Abstract
Predicting plasma protein binding (PPB) is crucial in drug development due to its profound impact on drug efficacy and safety. In our study, we employed a convolutional neural network (CNN) as a tool to extract valuable information from the molecular structures of 100 different drugs. These extracted features were then used as inputs for a feedforward network to predict the PPB of each drug. Through this approach, we successfully obtained 10 specific numerical features from each drug's molecular structure, which represent fundamental aspects of their molecular composition. Leveraging the CNN's ability to capture these features significantly improved the precision of our predictions. Our modeling results revealed impressive accuracy, with an R2 train value of 0.89 for the training dataset, a [Formula: see text] of 0.98, a [Formula: see text] of 0.931 for the external validation dataset, and a low cross-validation mean squared error (CV-MSE) of 0.0213. These metrics highlight the effectiveness of our deep learning techniques in the fields of pharmacokinetics and drug development. This study makes a substantial contribution to the expanding body of research exploring the application of artificial intelligence (AI) and machine learning in drug development. By adeptly capturing and utilizing molecular features, our method holds promise for enhancing drug efficacy and safety assessments in pharmaceutical research. These findings underscore the potential for future investigations in this exciting and transformative field.
Collapse
Affiliation(s)
- Affaf Khaouane
- Laboratory of Biomaterial and Transport Phenomena (LBMPT), University of Médéa, pole urbain, 26000, Médéa, Algeria.
| | - Latifa Khaouane
- Laboratory of Biomaterial and Transport Phenomena (LBMPT), University of Médéa, pole urbain, 26000, Médéa, Algeria
| | - Samira Ferhat
- Laboratory of Biomaterial and Transport Phenomena (LBMPT), University of Médéa, pole urbain, 26000, Médéa, Algeria
| | - Salah Hanini
- Laboratory of Biomaterial and Transport Phenomena (LBMPT), University of Médéa, pole urbain, 26000, Médéa, Algeria
| |
Collapse
|
7
|
Wang Y, Huang M, Deng H, Li W, Wu Z, Tang Y, Liu G. Identification of vital chemical information via visualization of graph neural networks. Brief Bioinform 2023; 24:6936421. [PMID: 36537081 DOI: 10.1093/bib/bbac577] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/01/2022] [Revised: 11/02/2022] [Accepted: 11/25/2022] [Indexed: 12/24/2022] Open
Abstract
Qualitative or quantitative prediction models of structure-activity relationships based on graph neural networks (GNNs) are prevalent in drug discovery applications and commonly have excellently predictive power. However, the network information flows of GNNs are highly complex and accompanied by poor interpretability. Unfortunately, there are relatively less studies on GNN attributions, and their developments in drug research are still at the early stages. In this work, we adopted several advanced attribution techniques for different GNN frameworks and applied them to explain multiple drug molecule property prediction tasks, enabling the identification and visualization of vital chemical information in the networks. Additionally, we evaluated them quantitatively with attribution metrics such as accuracy, sparsity, fidelity and infidelity, stability and sensitivity; discussed their applicability and limitations; and provided an open-source benchmark platform for researchers. The results showed that all attribution techniques were effective, while those directly related to the predicted labels, such as integrated gradient, preferred to have better attribution performance. These attribution techniques we have implemented could be directly used for the vast majority of chemical GNN interpretation tasks.
Collapse
Affiliation(s)
- Yimeng Wang
- Shanghai Frontiers Science Center of Optogenetic Techniques for Cell Metabolism, School of Pharmacy, East China University of Science and Technology, 130 Meilong Road, Shanghai 200237, China
| | - Mengting Huang
- Shanghai Frontiers Science Center of Optogenetic Techniques for Cell Metabolism, School of Pharmacy, East China University of Science and Technology, 130 Meilong Road, Shanghai 200237, China
| | - Hua Deng
- Shanghai Frontiers Science Center of Optogenetic Techniques for Cell Metabolism, School of Pharmacy, East China University of Science and Technology, 130 Meilong Road, Shanghai 200237, China
| | - Weihua Li
- Shanghai Frontiers Science Center of Optogenetic Techniques for Cell Metabolism, School of Pharmacy, East China University of Science and Technology, 130 Meilong Road, Shanghai 200237, China
| | - Zengrui Wu
- Shanghai Frontiers Science Center of Optogenetic Techniques for Cell Metabolism, School of Pharmacy, East China University of Science and Technology, 130 Meilong Road, Shanghai 200237, China
| | - Yun Tang
- Shanghai Frontiers Science Center of Optogenetic Techniques for Cell Metabolism, School of Pharmacy, East China University of Science and Technology, 130 Meilong Road, Shanghai 200237, China
| | - Guixia Liu
- Shanghai Frontiers Science Center of Optogenetic Techniques for Cell Metabolism, School of Pharmacy, East China University of Science and Technology, 130 Meilong Road, Shanghai 200237, China
| |
Collapse
|
8
|
Recent Studies of Artificial Intelligence on In Silico Drug Distribution Prediction. Int J Mol Sci 2023; 24:ijms24031815. [PMID: 36768139 PMCID: PMC9915725 DOI: 10.3390/ijms24031815] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/14/2022] [Revised: 01/11/2023] [Accepted: 01/13/2023] [Indexed: 01/19/2023] Open
Abstract
Drug distribution is an important process in pharmacokinetics because it has the potential to influence both the amount of medicine reaching the active sites and the effectiveness as well as safety of the drug. The main causes of 90% of drug failures in clinical development are lack of efficacy and uncontrolled toxicity. In recent years, several advances and promising developments in drug distribution property prediction have been achieved, especially in silico, which helped to drastically reduce the time and expense of screening undesired drug candidates. In this study, we provide comprehensive knowledge of drug distribution background, influencing factors, and artificial intelligence-based distribution property prediction models from 2019 to the present. Additionally, we gathered and analyzed public databases and datasets commonly utilized by the scientific community for distribution prediction. The distribution property prediction performance of five large ADMET prediction tools is mentioned as a benchmark for future research. On this basis, we also offer future challenges in drug distribution prediction and research directions. We hope that this review will provide researchers with helpful insight into distribution prediction, thus facilitating the development of innovative approaches for drug discovery.
Collapse
|