1
Chen S, Gao N, Li C, Zhai F, Jiang X, Zhang P, Guan J, Li K, Xiang R, Ling G. DrugSK: A Stacked Ensemble Learning Framework for Predicting Drug Combinations of Multiple Diseases. J Chem Inf Model 2024; 64:5317-5327. [PMID: 38900583 DOI: 10.1021/acs.jcim.4c00296] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/22/2024]
Abstract
Combination therapy is an area of continuing exploration in medicine, with the core goals of improving treatment efficacy, reducing adverse reactions, and optimizing clinical outcomes. Machine learning holds great promise for improving the prediction of synergistic drug combinations. However, most studies focus on synergy prediction models for a single disease or involve excessive feature categories, making it challenging to predict the majority of new drugs. To address these challenges, the DrugSK model was developed; it uses SMILES-BERT to extract structural information from 3492 drugs and is trained on reactions from 48,756 drug combinations. DrugSK is an ensemble learning model capable of predicting interactions among various drug categories. First, the primary learners are trained on the initial data set: random forest, support vector machine, and XGBoost models serve as primary learners, and logistic regression serves as the secondary learner. A new data set, consisting of each primary model's predictions, is then generated to train the level 2 learner. Finally, the results are filtered using logistic regression. Furthermore, the combination of the new antibacterial drug Drafloxacin with other antibacterial agents was tested. The synergistic effect of Drafloxacin and Isavuconazonium against Candida albicans was confirmed, offering guidance for the clinical treatment of skin infection. DrugSK's predictions proved accurate in practical application, and the model can also predict the probability of the outcome. In addition, a tendency of Drafloxacin to be synergistic with antifungal drugs was found. The development of DrugSK will provide a new blueprint for predicting drug combination synergies.
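The two-level scheme described in this abstract can be sketched with scikit-learn's `StackingClassifier`. This is only an illustration under stated assumptions, not the authors' implementation: the data are synthetic stand-ins for drug-pair feature vectors, `GradientBoostingClassifier` is swapped in for XGBoost so that only scikit-learn is needed, and every hyperparameter and the names `build_stack` and `run_demo` are hypothetical.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import (GradientBoostingClassifier,
                              RandomForestClassifier, StackingClassifier)
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC


def build_stack(seed=0):
    """Three level 1 learners feeding a logistic-regression level 2 learner."""
    level1 = [
        ("rf", RandomForestClassifier(n_estimators=50, random_state=seed)),
        ("svm", make_pipeline(StandardScaler(),
                              SVC(probability=True, random_state=seed))),
        # Assumption: gradient boosting stands in for XGBoost in this sketch.
        ("gb", GradientBoostingClassifier(random_state=seed)),
    ]
    return StackingClassifier(
        estimators=level1,
        final_estimator=LogisticRegression(),
        cv=5,                          # out-of-fold predictions train level 2
        stack_method="predict_proba",  # level 2 sees class probabilities
    )


def run_demo(seed=0):
    # Synthetic stand-in for drug-pair features and synergy labels.
    X, y = make_classification(n_samples=300, n_features=20,
                               n_informative=8, random_state=seed)
    X_tr, X_te, y_tr, y_te = train_test_split(
        X, y, test_size=0.25, random_state=seed, stratify=y)
    model = build_stack(seed).fit(X_tr, y_tr)
    meta = model.transform(X_te)       # level 1 predictions seen by level 2
    proba = model.predict_proba(X_te)  # probability of each outcome
    return proba, meta, model.score(X_te, y_te)


proba, meta, accuracy = run_demo()
print(meta.shape, proba.shape, round(accuracy, 3))
```

Training the level 2 learner on cross-validated (out-of-fold) predictions, as `cv=5` does here, keeps it from simply learning how well the level 1 models memorized their own training data.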
Affiliation(s)
- Siqi Chen
  - College of Medical Devices, Shenyang Pharmaceutical University, 103 Wenhua Road, Shenyang 110016, China
- Nan Gao
  - Wuya College of Innovation, Shenyang Pharmaceutical University, 103 Wenhua Road, Shenyang 110016, China
- Chunzhi Li
  - College of Medical Devices, Shenyang Pharmaceutical University, 103 Wenhua Road, Shenyang 110016, China
- Fei Zhai
  - College of Medical Devices, Shenyang Pharmaceutical University, 103 Wenhua Road, Shenyang 110016, China
- Xiwei Jiang
  - College of Medical Devices, Shenyang Pharmaceutical University, 103 Wenhua Road, Shenyang 110016, China
- Peng Zhang
  - Wuya College of Innovation, Shenyang Pharmaceutical University, 103 Wenhua Road, Shenyang 110016, China
- Jibin Guan
  - Masonic Cancer Center, University of Minnesota, Minneapolis, Minnesota 55455, United States
- Kefeng Li
  - Center for Artificial Intelligence-Driven Drug Discovery, Faculty of Applied Sciences, Macao Polytechnic University, Macau SR 999708, China
- Rongwu Xiang
  - College of Medical Devices, Shenyang Pharmaceutical University, 103 Wenhua Road, Shenyang 110016, China
  - Liaoning Medical Big Data and Artificial Intelligence Engineering Technology Research Center, Shenyang 110016, China
- Guixia Ling
  - College of Medical Devices, Shenyang Pharmaceutical University, 103 Wenhua Road, Shenyang 110016, China
  - Wuya College of Innovation, Shenyang Pharmaceutical University, 103 Wenhua Road, Shenyang 110016, China
2
Ding Y, Qiang B, Chen Q, Liu Y, Zhang L, Liu Z. Exploring Chemical Reaction Space with Machine Learning Models: Representation and Feature Perspective. J Chem Inf Model 2024; 64:2955-2970. [PMID: 38489239 DOI: 10.1021/acs.jcim.4c00004] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 03/17/2024]
Abstract
Chemical reactions serve as foundational building blocks for organic chemistry and drug design. In the era of large AI models, data-driven approaches have emerged to innovate the design of novel reactions, optimize existing ones for higher yields, and discover new pathways for synthesizing chemical structures comprehensively. To effectively address these challenges with machine learning models, it is imperative to derive robust and informative representations or engage in feature engineering using extensive data sets of reactions. This work aims to provide a comprehensive review of established reaction featurization approaches, offering insights into the selection of representations and the design of features for a wide array of tasks. The advantages and limitations of employing SMILES, molecular fingerprints, molecular graphs, and physics-based properties are meticulously elaborated. Solutions to bridge the gap between different representations will also be critically evaluated. Additionally, we introduce a new frontier in chemical reaction pretraining, holding promise as an innovative yet unexplored avenue.
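As a rough illustration of two of the representations named above, the sketch below parses a reaction SMILES string (`reactants>agents>products`, molecules separated by `.`) and builds a difference vector, products minus reactants, which is one common way to turn molecule-level fingerprints into a reaction-level feature. The hashed character n-gram counts are a toy stand-in for a real molecular fingerprint, and all function names are hypothetical; a cheminformatics toolkit would normally supply the fingerprints.

```python
import zlib


def parse_reaction_smiles(rxn):
    """Split 'reactants>agents>products' into three lists of molecule SMILES."""
    parts = rxn.split(">")
    if len(parts) != 3:
        raise ValueError("expected exactly two '>' separators")
    return tuple([m for m in part.split(".") if m] for part in parts)


def hashed_counts(smiles_list, n_bits=64, n=3):
    """Toy count vector over hashed character n-grams (not a real fingerprint)."""
    vec = [0] * n_bits
    for smi in smiles_list:
        for i in range(max(len(smi) - n + 1, 1)):
            gram = smi[i:i + n]
            # crc32 is used because it is stable across Python processes.
            vec[zlib.crc32(gram.encode()) % n_bits] += 1
    return vec


def difference_vector(rxn, n_bits=64):
    """Product counts minus reactant counts; agents are ignored here."""
    reactants, _agents, products = parse_reaction_smiles(rxn)
    r = hashed_counts(reactants, n_bits)
    p = hashed_counts(products, n_bits)
    return [pi - ri for pi, ri in zip(p, r)]


# Esterification of acetic acid with ethanol, written as reaction SMILES.
esterification = "CC(=O)O.OCC>>CC(=O)OCC.O"
print(parse_reaction_smiles(esterification))
print(sum(abs(v) for v in difference_vector(esterification)))
```

A reaction whose products equal its reactants yields an all-zero vector, which is the property that lets difference vectors emphasize what a transformation changes.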
Affiliation(s)
- Yuheng Ding
  - Department of Pharmaceutical Science, Peking University, Beijing 100191, China
- Bo Qiang
  - Department of Pharmaceutical Science, Peking University, Beijing 100191, China
- Qixuan Chen
  - Department of Pharmaceutical Science, Peking University, Beijing 100191, China
- Yiqiao Liu
  - Department of Pharmaceutical Science, Peking University, Beijing 100191, China
- Liangren Zhang
  - Department of Pharmaceutical Science, Peking University, Beijing 100191, China
- Zhenming Liu
  - Department of Pharmaceutical Science, Peking University, Beijing 100191, China
3
Sidorov P, Tsuji N. A Primer on 2D Descriptors in Selectivity Modeling for Asymmetric Catalysis. Chemistry 2024; 30:e202302837. [PMID: 38010242 DOI: 10.1002/chem.202302837] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/31/2023] [Revised: 11/21/2023] [Accepted: 11/23/2023] [Indexed: 11/29/2023]
Abstract
Machine learning has permeated all fields of research, including chemistry, and is now an integral part of the design of novel compounds with desired properties. In the field of asymmetric catalysis, the preference still lies with models based on a physical understanding of the catalysis phenomenon and the electronic and steric properties of catalysts. However, such models require quantum chemical calculations and are thus limited by their computational cost. Here, we highlight the recent advances in modeling catalyst selectivity by using the 2D structures of catalysts and substrates. While these have a less explicit mechanistic connection to the modeled property, 2D descriptors, such as topological indices, molecular fingerprints, and fragments, offer the tremendous advantages of low cost and high speed of calculations. This makes them optimal for the in silico screening of large amounts of data. We provide an overview of the common quantitative structure-property relationship workflow, model building and validation techniques, applications of these methodologies in asymmetric catalysis design, and an outlook on improving the understanding of 2D-based models.
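To make the low cost of 2D descriptors concrete, here is a minimal sketch of one classic topological index, the Wiener index (the sum of shortest-path bond counts over all pairs of heavy atoms). It is chosen here only as a familiar example of the descriptor class; the abstract does not single it out. The adjacency dictionaries are hand-written hydrogen-suppressed graphs, and the function name is hypothetical.

```python
from collections import deque


def wiener_index(adjacency):
    """Sum of shortest-path lengths over all unordered atom pairs."""
    total = 0
    for source in adjacency:
        # Breadth-first search gives bond-count distances from `source`.
        dist = {source: 0}
        queue = deque([source])
        while queue:
            node = queue.popleft()
            for nbr in adjacency[node]:
                if nbr not in dist:
                    dist[nbr] = dist[node] + 1
                    queue.append(nbr)
        total += sum(dist.values())
    return total // 2  # every pair was counted once from each end


# Hydrogen-suppressed carbon skeletons of two C4H10 isomers.
n_butane = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
isobutane = {0: [1, 2, 3], 1: [0], 2: [0], 3: [0]}

print(wiener_index(n_butane))   # 10
print(wiener_index(isobutane))  # 9
```

The two isomers share a molecular formula but differ in branching, and the index separates them using nothing more than graph connectivity.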
Affiliation(s)
- Pavel Sidorov
  - Institute for Chemical Reaction Design and Discovery (WPI-ICReDD), Hokkaido University, Sapporo, 001-0021, Japan
- Nobuya Tsuji
  - Institute for Chemical Reaction Design and Discovery (WPI-ICReDD), Hokkaido University, Sapporo, 001-0021, Japan
4
Guo J. Improving structure-based protein-ligand affinity prediction by graph representation learning and ensemble learning. PLoS One 2024; 19:e0296676. [PMID: 38232063 DOI: 10.1371/journal.pone.0296676] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/06/2023] [Accepted: 12/15/2023] [Indexed: 01/19/2024] [Open Access]
Abstract
Predicting protein-ligand binding affinity presents a viable solution for accelerating the discovery of new lead compounds. The recent widespread application of machine learning approaches, especially graph neural networks, has brought new advancements in this field. However, some existing structure-based methods treat protein macromolecules and ligand small molecules in the same way and ignore the data heterogeneity, potentially leading to incomplete exploration of the biochemical information of ligands. In this work, we propose LGN, a graph neural network-based fusion model with extra ligand feature extraction to effectively capture local features and global features within the protein-ligand complex, and make use of interaction fingerprints. By combining the ligand-based features and interaction fingerprints, LGN achieves Pearson correlation coefficients of up to 0.842 on the PDBbind 2016 core set, compared to 0.807 when using the features of complex graphs alone. Finally, we verify the rationality and generalizability of our model through comprehensive experiments. We also compare our model with state-of-the-art baseline methods, which validates the superiority of our model. To reduce the impact of data similarity, we increase the robustness of the model by incorporating ensemble learning.
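The metric quoted above, the Pearson correlation coefficient between predicted and measured affinities, can be computed as in the sketch below, together with a plain averaging ensemble. Averaging is only one simple form of ensemble learning and is an assumption here, since the abstract does not say how LGN combines its models; the function names are hypothetical and the numbers are made-up toy values, not results from the paper.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sd_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    sd_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    if sd_x == 0 or sd_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / (sd_x * sd_y)


def ensemble_mean(predictions_per_model):
    """Average the per-sample predictions of several models."""
    return [sum(col) / len(col) for col in zip(*predictions_per_model)]


# Toy values for illustration only (hypothetical affinities, arbitrary units).
measured = [4.0, 5.5, 6.0, 7.5, 9.0]
model_a = [4.5, 5.0, 6.5, 7.0, 8.0]
model_b = [3.5, 6.0, 5.5, 8.0, 9.5]

averaged = ensemble_mean([model_a, model_b])
print(pearson_r(measured, averaged))
```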
Affiliation(s)
- Jia Guo
  - Chongqing Institute of Green and Intelligent Technology, Chinese Academy of Sciences, Beijing, P.R. China
  - Chongqing School, University of Chinese Academy of Sciences, Chongqing, China
5
Zankov D, Madzhidov T, Polishchuk P, Sidorov P, Varnek A. Multi-Instance Learning Approach to the Modeling of Enantioselectivity of Conformationally Flexible Organic Catalysts. J Chem Inf Model 2023; 63:6629-6641. [PMID: 37902548 DOI: 10.1021/acs.jcim.3c00393] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Indexed: 10/31/2023]
Abstract
Computational design of chiral organic catalysts for asymmetric synthesis is a promising technology that can significantly reduce the material and human resources required for the preparation of enantiopure compounds. Herein, for the modeling of catalysts' enantioselectivity, we propose to use the multi-instance learning approach accounting for multiple catalyst conformers and requiring neither conformer selection nor their spatial alignment. A catalyst was represented by an ensemble of conformers, each encoded by three-dimensional (3D) pmapper descriptors. A catalyzed reactant transformation was converted into a single molecular graph, a condensed graph of reaction, encoded by 2D fragment descriptors. A whole chemical reaction was finally encoded by concatenated 3D catalyst and 2D transformation descriptors. The performance of the proposed method was demonstrated in the modeling of the enantioselectivity of homogeneous and phase-transfer reactions and compared with the state-of-the-art approaches.
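The encoding described above can be pictured as a "bag" of instances: one instance per catalyst conformer, each formed by concatenating that conformer's 3D descriptor vector with the shared 2D descriptor vector of the transformation. The sketch below builds such a bag and collapses it with mean pooling. Mean pooling is an assumption made purely for illustration, since the abstract does not state how instances are aggregated, and the function names and descriptor values are hypothetical.

```python
def build_bag(conformer_descriptors, transformation_descriptor):
    """One instance per conformer: 3D conformer vector + shared 2D vector."""
    if not conformer_descriptors:
        raise ValueError("a catalyst needs at least one conformer")
    width = len(conformer_descriptors[0])
    if any(len(vec) != width for vec in conformer_descriptors):
        raise ValueError("all conformer vectors must have the same length")
    shared = list(transformation_descriptor)
    return [list(vec) + shared for vec in conformer_descriptors]


def mean_pool(bag):
    """Collapse a bag of instances into one vector by averaging each column."""
    return [sum(column) / len(column) for column in zip(*bag)]


# Hypothetical numbers: two conformers with 2 descriptors each, plus a
# single 2D fragment descriptor for the transformation.
bag = build_bag([[1.0, 2.0], [3.0, 4.0]], [0.5])
print(bag)             # [[1.0, 2.0, 0.5], [3.0, 4.0, 0.5]]
print(mean_pool(bag))  # [2.0, 3.0, 0.5]
```

Because the label (the enantioselectivity) is attached to the whole bag rather than to any single conformer, no conformer has to be selected or aligned beforehand, which is the property the abstract highlights.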
Affiliation(s)
- Dmitry Zankov
  - Laboratory of Chemoinformatics, University of Strasbourg, Strasbourg 67081, France
  - Institute for Chemical Reaction Design and Discovery (WPI-ICReDD), Hokkaido University, Sapporo 001-0021, Japan
- Timur Madzhidov
  - Chemistry Solutions, Elsevier Ltd., Oxford OX5 1GB, United Kingdom
- Pavel Polishchuk
  - Institute of Molecular and Translational Medicine, Palacký University, Olomouc 77900, Czech Republic
- Pavel Sidorov
  - Institute for Chemical Reaction Design and Discovery (WPI-ICReDD), Hokkaido University, Sapporo 001-0021, Japan
- Alexandre Varnek
  - Laboratory of Chemoinformatics, University of Strasbourg, Strasbourg 67081, France
  - Institute for Chemical Reaction Design and Discovery (WPI-ICReDD), Hokkaido University, Sapporo 001-0021, Japan
6
Feature Selection for the Interpretation of Antioxidant Mechanisms in Plant Phenolics. Molecules 2023; 28:1454. [PMID: 36771125 PMCID: PMC9921549 DOI: 10.3390/molecules28031454] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 12/21/2022] [Revised: 01/16/2023] [Accepted: 01/31/2023] [Indexed: 02/05/2023] [Open Access]
Abstract
Antioxidants, represented by plant phenolics, protect living tissues by scavenging reactive oxygen species through diverse reaction mechanisms. Research on antioxidants is often individualized, for example, focusing on the evaluation of their activity against a single reactive oxygen species or examining the antioxidant properties of compounds with similar structures. In this study, multivariate analysis was used to comprehensively examine antioxidant properties. Eighteen features were selected to explain the results of the antioxidant capacity tests. These selected features were then evaluated by supervised learning, using the results of the antioxidant capacity assays. Dimension-reduction techniques were also used to represent the compound space with antioxidants as a two-dimensional distribution. A small amount of data obtained from several assays provided us with comprehensive information on the relationships between the structures and activities of antioxidants.