1
Tsutsui Y, Yanaka I, Takeda K, Kondo M, Takizawa S, Kojima R, Konishi A, Yasuda M. Selective recognition between aromatics and aliphatics by cage-shaped borates supported by a machine learning approach. Org Biomol Chem 2024; 22:4283-4291. [PMID: 38602393] [DOI: 10.1039/d4ob00408f] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 04/12/2024]
Abstract
Selective recognition between hydrocarbon moieties is a longstanding challenge. Although we developed a π-pocket Lewis acid catalyst with high selectivity for aromatic aldehydes over aliphatic ones, a general strategy for catalyst design remains elusive. To translate molecular recognition based on multiple cooperative non-covalent interactions within the π-pocket into rational catalyst design, we herein demonstrate Lewis acid catalysts with improved selectivity, guided by a machine learning (ML) ensemble algorithm that combines random forest, AdaBoost, and XGBoost. Using 7963 explanatory variables extracted from model hetero-Diels-Alder reactions, the ensemble algorithm predicted the chemoselectivity of catalysts that were not in the training data, and experiments confirmed the predictions. The proposed catalyst shows the highest selective recognition, reminiscent of enzymatic catalysis. Additionally, SHapley Additive exPlanations (SHAP) analysis suggested that the selectivity originates from the polarizability and three-dimensional size of the catalyst. This insight leads to rational design guidelines for Lewis acid catalysts based on dispersion forces.
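The SHAP analysis mentioned above attributes a model's prediction to its individual input features using Shapley values. As a minimal sketch of that idea only (not the authors' pipeline, which applied SHAP to an ML ensemble over 7963 variables), the Python below computes exact Shapley values by enumerating every feature coalition, assuming that features absent from a coalition are replaced by a single baseline value. The two features and `toy_model` are hypothetical stand-ins for descriptors such as polarizability and three-dimensional size.

```python
from itertools import combinations
from math import factorial


def shapley_values(model, x, baseline):
    """Exact Shapley attributions for a single prediction model(x).

    Features outside a coalition take their baseline value (a simple
    single-reference choice; other value functions are possible).
    """
    n = len(x)

    def value(coalition):
        # Keep coalition features from x, fill the rest from the baseline.
        z = [x[i] if i in coalition else baseline[i] for i in range(n)]
        return model(z)

    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            # Shapley weight |S|! (n - |S| - 1)! / n! for coalitions S of this size.
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = set(subset)
                # Marginal contribution of feature i when added to coalition S.
                phi[i] += weight * (value(s | {i}) - value(s))
    return phi


def toy_model(z):
    # Hypothetical selectivity score from two hypothetical descriptors,
    # z[0] ~ polarizability and z[1] ~ size, including an interaction term.
    return 2.0 * z[0] + 0.5 * z[1] + z[0] * z[1]


if __name__ == "__main__":
    phi = shapley_values(toy_model, [1.0, 2.0], [0.0, 0.0])
    print(phi)  # [3.0, 2.0]; sums to toy_model(x) - toy_model(baseline) = 5.0
```

The attributions always sum to the prediction minus the baseline prediction (the efficiency property), and the interaction term is split between the two features. Exact enumeration needs a number of model evaluations that grows as 2^n in the feature count, so it is only practical for a handful of features; SHAP implementations for tree ensembles rely on specialized algorithms instead.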
Affiliation(s)
- Yuya Tsutsui
  - Department of Applied Chemistry, Graduate School of Engineering, Osaka University, Suita, 565-0871, Japan
- Issei Yanaka
  - Department of Engineering, Graduate School of Integrated Science and Technology, Shizuoka University, Hamamatsu, 432-8561, Japan
- Kazuhiro Takeda
  - Department of Engineering, Graduate School of Integrated Science and Technology, Shizuoka University, Hamamatsu, 432-8561, Japan
- Masaru Kondo
  - School of Pharmaceutical Sciences, University of Shizuoka, 52-1 Yada, Suruga-ku, Shizuoka 422-8526, Japan
- Ryosuke Kojima
  - Department of Biomedical Data Intelligence, Graduate School of Medicine, Kyoto University, Sakyo-ku, 606-8507, Japan
- Akihito Konishi
  - Department of Applied Chemistry, Graduate School of Engineering, Osaka University, Suita, 565-0871, Japan
  - Innovative Catalysis Science Division, Institute for Open and Transdisciplinary Research Initiatives (ICS-OTRI), Osaka University, Suita, 565-0871, Japan
- Makoto Yasuda
  - Department of Applied Chemistry, Graduate School of Engineering, Osaka University, Suita, 565-0871, Japan
  - Innovative Catalysis Science Division, Institute for Open and Transdisciplinary Research Initiatives (ICS-OTRI), Osaka University, Suita, 565-0871, Japan
2
Cysewski P, Jeliński T, Przybyłek M. Finding the Right Solvent: A Novel Screening Protocol for Identifying Environmentally Friendly and Cost-Effective Options for Benzenesulfonamide. Molecules 2023; 28:5008. [PMID: 37446671] [DOI: 10.3390/molecules28135008] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/31/2023] [Revised: 06/23/2023] [Accepted: 06/24/2023] [Indexed: 07/15/2023] [Open Access]
Abstract
This study investigated the solubility of benzenesulfonamide (BSA), used as a model compound, with experimental and computational methods. New experimental solubility data were collected in DMSO, DMF, 4FM, and their binary mixtures with water. The predictive model was built from the best-performing regression models trained on the available experimental data, whose hyperparameters were optimized with newly developed Python code. To evaluate the models, a novel scoring function was formulated that accounts not only for accuracy but also for the bias-variance tradeoff, assessed through learning curve analysis. An ensemble approach was adopted by selecting the top-performing regression models on the test and validation subsets. The resulting model accurately back-calculated the experimental data and was used to predict the solubility of BSA in 2067 potential solvents. Analysis of the entire solvent space focused on identifying solvents with high solubility, low environmental impact, and affordability, yielding a refined list of candidates that meet all three requirements. The proposed procedure is generally applicable and can significantly improve the quality and speed of experimental solvent screening.
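The last two steps described above, averaging the top-ranked regression models and then filtering the solvent space against three criteria, can be sketched in Python as follows. This is an illustrative outline under stated assumptions, not the authors' code: the validation score used for ranking stands in for their learning-curve-based scoring function (whose exact form is not given here), and the `Candidate` fields, thresholds, and solvent names are hypothetical placeholders.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence, Tuple


@dataclass(frozen=True)
class Candidate:
    name: str
    predicted_solubility: float  # ensemble output; units depend on the model (hypothetical)
    env_penalty: float           # hypothetical environmental-impact score, lower is better
    price: float                 # hypothetical cost per unit volume, lower is better


def ensemble_predict(scored_models: Sequence[Tuple[float, Callable]], x, top_k: int = 3) -> float:
    """Average the predictions of the top_k models ranked by a validation score (higher is better)."""
    ranked = sorted(scored_models, key=lambda pair: pair[0], reverse=True)[:top_k]
    return sum(model(x) for _, model in ranked) / len(ranked)


def screen(candidates: Sequence[Candidate], min_solubility: float,
           max_env_penalty: float, max_price: float) -> List[Candidate]:
    """Keep candidates that meet all three requirements, most soluble first."""
    passing = [
        c for c in candidates
        if c.predicted_solubility >= min_solubility
        and c.env_penalty <= max_env_penalty
        and c.price <= max_price
    ]
    return sorted(passing, key=lambda c: c.predicted_solubility, reverse=True)


if __name__ == "__main__":
    models = [(0.9, lambda x: 1.0), (0.8, lambda x: 3.0), (0.1, lambda x: 100.0)]
    print(ensemble_predict(models, x=None, top_k=2))  # 2.0 (the weakest model is excluded)

    pool = [
        Candidate("solvent_A", -1.0, 2.0, 10.0),
        Candidate("solvent_B", -0.5, 5.0, 5.0),   # fails the environmental criterion
        Candidate("solvent_C", -2.5, 1.0, 3.0),   # fails the solubility criterion
        Candidate("solvent_D", -0.8, 1.5, 8.0),
    ]
    print([c.name for c in screen(pool, -1.5, 3.0, 12.0)])  # ['solvent_D', 'solvent_A']
```

Ranking on the score alone (through `key`) avoids ever comparing the model objects themselves. Hard thresholds mirror the "meet all three requirements" criterion; a Pareto-front filter would be an alternative if the criteria should be traded off against each other rather than imposed as cutoffs.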
Affiliation(s)
- Piotr Cysewski
  - Department of Physical Chemistry, Pharmacy Faculty, Collegium Medicum of Bydgoszcz, Nicolaus Copernicus University in Toruń, Kurpińskiego 5, 85-096 Bydgoszcz, Poland
- Tomasz Jeliński
  - Department of Physical Chemistry, Pharmacy Faculty, Collegium Medicum of Bydgoszcz, Nicolaus Copernicus University in Toruń, Kurpińskiego 5, 85-096 Bydgoszcz, Poland
- Maciej Przybyłek
  - Department of Physical Chemistry, Pharmacy Faculty, Collegium Medicum of Bydgoszcz, Nicolaus Copernicus University in Toruń, Kurpińskiego 5, 85-096 Bydgoszcz, Poland
3
Li M, Zeng M, Zhang H, Chen H, Guan L. Biological Activity Predictions of Ligands Based on Hybrid Molecular Fingerprinting and Ensemble Learning. ACS Omega 2023; 8:5561-5570. [PMID: 36816680] [PMCID: PMC9933080] [DOI: 10.1021/acsomega.2c06944] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Received: 10/27/2022] [Accepted: 12/23/2022] [Indexed: 06/18/2023]
Abstract
Predicting the biological activity of ligands is an important research direction that can improve the efficiency and likelihood of success of drug screening. However, traditional prediction methods suffer from complex modeling and low screening efficiency, and machine learning is regarded as an important direction for solving these problems in the near future. This paper proposes a machine learning model with high predictive accuracy and stable prediction ability: the back propagation neural network cross-support vector regression model (BPCSVR). After comparing multiple molecular descriptors, the MACCS and ECFP6 fingerprints were selected as inputs, and the stability of the model's predictions was improved by integrating multiple models and correcting similar samples. Leave-one-out cross-validation was performed on 3038 samples from six data sets, with the coefficient of determination, root mean square error, and absolute error as evaluation metrics. Comparison with multiple classes of models shows that BPCSVR has stable prediction ability across the different data sets and higher prediction accuracy than the other models compared.
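Two ingredients of the evaluation described above, fingerprint similarity and leave-one-out cross-validation scored by R², RMSE, and absolute error, can be illustrated with the short Python sketch below. It does not reproduce BPCSVR: as a plainly swapped-in stand-in model it uses a similarity-weighted k-nearest-neighbour regressor. Fingerprints are assumed to be sets of on-bit indices (as one might derive from MACCS keys or ECFP6 bit vectors generated by a cheminformatics toolkit, not shown), and "absolute error" is assumed here to mean mean absolute error.

```python
import math
from typing import Dict, FrozenSet, List, Sequence, Tuple

Sample = Tuple[FrozenSet[int], float]  # (fingerprint on-bits, measured activity)


def tanimoto(a: FrozenSet[int], b: FrozenSet[int]) -> float:
    """Tanimoto similarity |A & B| / |A | B| for binary fingerprints stored as bit-index sets."""
    if not a and not b:
        return 1.0  # convention: two empty fingerprints are treated as identical
    shared = len(a & b)
    return shared / (len(a) + len(b) - shared)


def knn_predict(train: Sequence[Sample], query: FrozenSet[int], k: int = 3) -> float:
    """Similarity-weighted mean activity of the k most similar training compounds."""
    nearest = sorted(train, key=lambda s: tanimoto(s[0], query), reverse=True)[:k]
    weights = [tanimoto(fp, query) for fp, _ in nearest]
    total = sum(weights)
    if total == 0.0:
        # No bit overlap with any neighbour: fall back to an unweighted mean.
        return sum(y for _, y in nearest) / len(nearest)
    return sum(w * y for w, (_, y) in zip(weights, nearest)) / total


def loocv_metrics(data: List[Sample], k: int = 3) -> Dict[str, float]:
    """Leave-one-out CV: predict each sample from all the others, then score the predictions."""
    y_true, y_pred = [], []
    for i, (fp, y) in enumerate(data):
        train = data[:i] + data[i + 1:]  # hold out sample i
        y_true.append(y)
        y_pred.append(knn_predict(train, fp, k))
    n = len(y_true)
    mean_y = sum(y_true) / n
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    return {
        "r2": 1.0 - ss_res / ss_tot if ss_tot > 0 else float("nan"),
        "rmse": math.sqrt(ss_res / n),
        "mae": sum(abs(t - p) for t, p in zip(y_true, y_pred)) / n,
    }


if __name__ == "__main__":
    toy = [  # hypothetical fingerprints and activities, for illustration only
        (frozenset({1, 2, 3}), 5.0),
        (frozenset({1, 2, 4}), 5.5),
        (frozenset({7, 8, 9}), 8.0),
        (frozenset({7, 8, 10}), 8.4),
    ]
    print(loocv_metrics(toy, k=1))
```

Leave-one-out CV on n samples requires n separate fits; that is trivial for k-nearest neighbours, which has no training step, but can become expensive for neural-network models on a few thousand samples.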
4
Jiao Z, Zhang Z, Jung S, Wang Q. Machine learning based quantitative consequence prediction models for toxic dispersion casualty. J Loss Prev Process Ind 2022. [DOI: 10.1016/j.jlp.2022.104952] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 12/12/2022]
5
Yang JR, Chen Q, Wang H, Hu XY, Guo YM, Chen JZ. Reliable CA-(Q)SAR generation based on entropy weight optimized by grid search and correction factors. Comput Biol Med 2022; 146:105573. [DOI: 10.1016/j.compbiomed.2022.105573] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/10/2022] [Revised: 03/31/2022] [Accepted: 04/26/2022] [Indexed: 11/03/2022]
6
Escobar-Hernandez HU, Pérez LM, Hu P, Soto FA, Papadaki MI, Zhou HC, Wang Q. Thermal Stability of Metal–Organic Frameworks (MOFs): Concept, Determination, and Model Prediction Using Computational Chemistry and Machine Learning. Ind Eng Chem Res 2022. [DOI: 10.1021/acs.iecr.2c00561] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Indexed: 11/30/2022]
Affiliation(s)
- Harold U. Escobar-Hernandez
  - Artie McFerrin Department of Chemical Engineering, Texas A&M University, College Station, Texas 77843-3122, United States
- Lisa M. Pérez
  - Division of Research, High Performance Research Computing, Texas A&M University, College Station, Texas 77843-3361, United States
- Pingfan Hu
  - Artie McFerrin Department of Chemical Engineering, Texas A&M University, College Station, Texas 77843-3122, United States
- Fernando A. Soto
  - Energy Engineering, Penn State Greater Allegheny, McKeesport, Pennsylvania 15132, United States
- Maria I. Papadaki
  - Department of Environmental & Natural Resources Management, University of Patras, Agrinio GR30100, Greece
- Hong-Cai Zhou
  - Department of Chemistry, Texas A&M University, College Station, Texas 77843-3255, United States
- Qingsheng Wang
  - Artie McFerrin Department of Chemical Engineering, Texas A&M University, College Station, Texas 77843-3122, United States
7
Terrell E. Estimation of Hansen solubility parameters with regularized regression for biomass conversion products: An application of adaptable group contribution. Chem Eng Sci 2022. [DOI: 10.1016/j.ces.2021.117184] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Indexed: 11/30/2022]