1
Taouti MM, Selmane N, Cheknane A, Benaya N, Hilal HS. DFT and machine learning integration to predict efficiency of modified metal-free dyes in DSSCs. J Mol Graph Model 2025; 136:108975. [PMID: 39938140 DOI: 10.1016/j.jmgm.2025.108975] [Received: 09/04/2024] [Revised: 01/15/2025] [Accepted: 02/04/2025] [Indexed: 02/14/2025]
Abstract
Power conversion efficiency (PCE) prediction in dye-sensitized solar cells (DSSCs) increasingly relies on computation and machine learning, lowering experimental demands and accelerating materials discovery. In this work we combined quantum-chemical descriptors, computed via density-functional theory (DFT), with cheminformatic descriptors generated using the Mordred library to train two machine learning models. Random Forest (RF) and XGBoost models were trained on a dataset of 40 dyes together with their experimental PCEs from the literature. Model stability was investigated using multiple random states (30, 38, 42 and 50). The trained models were used to evaluate newly engineered dyes, and the predictions were then validated through electronic structure analysis. The novel dyes are derivatives of: (E)-10-methyl-9-(3-(10-methylacridin-9(10H)-ylidene)prop-1-en-1-yl)acridin-10-ium (C-PE3), 10-methyl-9-((1E,3E)-5-(10-methylacridin-9(10H)-ylidene)penta-1,3-dien-1-yl)acridin-10-ium (C-PE5) and 10-methyl-9-((1E,3E,5E)-7-(10-methylacridin-9(10H)-ylidene)hepta-1,3,5-trien-1-yl)acridin-10-ium (C-PE7). XGBoost performed best under a random state of 38, achieving R2 = 0.8904 and RMSE = 0.0038. Both models identified C3-PE5 and C3-PE7 as the most promising candidates, with predicted PCEs of 5.49 % and 5.43 %, respectively. By integrating DFT/cheminformatics and machine learning techniques, this study enabled PCE prediction with no need for experimental input.
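The seed-stability check mentioned in this abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' pipeline: the data are synthetic, an ordinary least-squares line stands in for Random Forest and XGBoost, and the helper names (`r2_score`, `rmse`, `fit_line`, `evaluate_seed`) are hypothetical. Only the random states 30, 38, 42 and 50 and the dataset size of 40 come from the abstract.

```python
import math
import random

SEEDS = [30, 38, 42, 50]  # random states named in the abstract


def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def rmse(y_true, y_pred):
    """Root-mean-square error."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def fit_line(xs, ys):
    """Ordinary least squares for y = a*x + b (assumed stand-in for RF/XGBoost)."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    a = sxy / sxx
    return a, my - a * mx


def evaluate_seed(xs, ys, seed, test_fraction=0.2):
    """Shuffle with the given seed, split, fit on train, score on test."""
    idx = list(range(len(xs)))
    random.Random(seed).shuffle(idx)
    n_test = max(2, int(len(idx) * test_fraction))
    test, train = idx[:n_test], idx[n_test:]
    a, b = fit_line([xs[i] for i in train], [ys[i] for i in train])
    y_true = [ys[i] for i in test]
    y_pred = [a * xs[i] + b for i in test]
    return r2_score(y_true, y_pred), rmse(y_true, y_pred)


# Synthetic data: 40 samples (same size as the dye set), one made-up descriptor.
data_rng = random.Random(0)
xs = [data_rng.uniform(0.0, 1.0) for _ in range(40)]
ys = [3.0 * x + 1.0 + data_rng.gauss(0.0, 0.1) for x in xs]

results = {seed: evaluate_seed(xs, ys, seed) for seed in SEEDS}
for seed, (r2, err) in results.items():
    print(f"seed={seed}  R2={r2:.3f}  RMSE={err:.3f}")
```

A large spread in R2 or RMSE between seeds would suggest that, with only 40 samples, the reported score depends strongly on which dyes land in the test split.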
Affiliation(s)
- Mohammed Madani Taouti
  - Laboratoire Matériaux, Systèmes Énergétiques, Énergies Renouvelables et gestion de l'Énergie (LMSEERGE). Université Amar Telidji de Laghouat. Bd des Martyrs BP37G, Laghouat, 03000, Algeria
- Naceur Selmane
  - Laboratoire Matériaux, Systèmes Énergétiques, Énergies Renouvelables et gestion de l'Énergie (LMSEERGE). Université Amar Telidji de Laghouat. Bd des Martyrs BP37G, Laghouat, 03000, Algeria.
- Ali Cheknane
  - Laboratoire Matériaux, Systèmes Énergétiques, Énergies Renouvelables et gestion de l'Énergie (LMSEERGE). Université Amar Telidji de Laghouat. Bd des Martyrs BP37G, Laghouat, 03000, Algeria.
- Noureddine Benaya
  - Plateau Technique d'analyse Physico-chimique-laghouat-(PTAPC-L), Université Amar Telidji de Laghouat. Bd des Martyrs BP37G, Laghouat, 03000, Algeria
- Hikmat S Hilal
  - SSERL, Department of Chemistry, An-Najah National University, Nablus, P400, Palestine.
2
Yang H, Wang Y, Zhao J, Li P, Li Z, Li L, Fan B, Wang F. Data-driven pipeline modeling for predicting unknown protein adulteration in dairy products. Food Chem 2025; 471:142736. [PMID: 39787999 DOI: 10.1016/j.foodchem.2024.142736] [Received: 08/13/2024] [Revised: 12/23/2024] [Accepted: 12/30/2024] [Indexed: 01/12/2025]
Abstract
To preemptively predict unknown protein adulterants in food and curb food fraud at its origin, data-driven models were developed using three machine learning (ML) algorithms. Among these, the random forest (RF)-based model performed best, with accuracies of 96.2 %, 95.1 %, and 88.0 % in identifying odorless, tasteless, and colorless adulterants, respectively. These best-performing models were then applied to external prediction, identifying 51 potential adulterants. From these, two cost-effective candidates were selected for adulteration tests. While there was no significant sensory difference between adulterated and unadulterated milk powder, the protein content in the adulterated milk powder increased. This study offers a proactive strategy to combat food fraud effectively.
Affiliation(s)
- Huihui Yang
  - Institute of Food Science and Technology, Chinese Academy of Agricultural Sciences (CAAS), Beijing 100193, PR China
- Yutang Wang
  - Institute of Food Science and Technology, Chinese Academy of Agricultural Sciences (CAAS), Beijing 100193, PR China; Western Agricultural Research Center, Chinese Academy of Agricultural Sciences, Changji 831100, PR China.
- Jinyong Zhao
  - Institute of Food Science and Technology, Chinese Academy of Agricultural Sciences (CAAS), Beijing 100193, PR China
- Ping Li
  - Institute of Food Science and Technology, Chinese Academy of Agricultural Sciences (CAAS), Beijing 100193, PR China
- Zhixiang Li
  - Institute of Food Science and Technology, Chinese Academy of Agricultural Sciences (CAAS), Beijing 100193, PR China
- Long Li
  - Institute of Food Science and Technology, Chinese Academy of Agricultural Sciences (CAAS), Beijing 100193, PR China.
- Bei Fan
  - Institute of Food Science and Technology, Chinese Academy of Agricultural Sciences (CAAS), Beijing 100193, PR China.
- Fengzhong Wang
  - Institute of Food Science and Technology, Chinese Academy of Agricultural Sciences (CAAS), Beijing 100193, PR China.
3
Farrukh A, Shaaban IA, Assiri MA, Tahir MH, El-Bahy ZM. UV/visible absorption maxima prediction of water-soluble organic compounds and generation of library of new organic compounds. SPECTROCHIMICA ACTA. PART A, MOLECULAR AND BIOMOLECULAR SPECTROSCOPY 2025; 328:125453. [PMID: 39571212 DOI: 10.1016/j.saa.2024.125453] [Received: 07/30/2024] [Revised: 11/15/2024] [Accepted: 11/16/2024] [Indexed: 12/10/2024]
Abstract
In this study, UV/visible absorption maxima of organic compounds are predicted with the help of machine learning (ML). Four ML models were evaluated, and the gradient boosting model performed best. We also analyzed feature importance. Using Python-based tools, we generated and visualized a new set of 5,000 organic compounds. These compounds were screened based on their predicted UV/visible absorption maxima, selecting those with red-shifted absorption. The assessment of synthetic accessibility indicated that most of the chosen compounds are relatively easy to synthesize.
Affiliation(s)
- Aftab Farrukh
  - Department of Physics, PMAS-Arid Agriculture University, Rawalpindi 46300, Pakistan
- Ibrahim A Shaaban
  - Department of Chemistry, Faculty of Science, King Khalid University, P.O. Box 960, Abha 61421, Saudi Arabia; Research Center for Advanced Materials Science (RCAMS), King Khalid University, P.O. Box 960, Abha 61421, Saudi Arabia
- Mohammed A Assiri
  - Department of Chemistry, Faculty of Science, King Khalid University, P.O. Box 960, Abha 61421, Saudi Arabia; Research Center for Advanced Materials Science (RCAMS), King Khalid University, P.O. Box 960, Abha 61421, Saudi Arabia
- Mudassir Hussain Tahir
  - Research Faculty of Agriculture, Field Science Center for Northern Biosphere, Hokkaido University, Sapporo Hokkaido, 060-8589, 060-0811, Japan.
- Zeinhom M El-Bahy
  - Department of Chemistry, Faculty of Science, Al-Azhar University, Nasr City 11884, Cairo, Egypt
4
McDonald M, Koscher BA, Canty RB, Zhang J, Ning A, Jensen KF. Bayesian Optimization over Multiple Experimental Fidelities Accelerates Automated Discovery of Drug Molecules. ACS CENTRAL SCIENCE 2025; 11:346-356. [PMID: 40028358 PMCID: PMC11869128 DOI: 10.1021/acscentsci.4c01991] [Received: 11/20/2024] [Revised: 01/28/2025] [Accepted: 01/29/2025] [Indexed: 03/05/2025]
Abstract
Different experiments of differing fidelities are commonly used in the search for new drug molecules. In classic experimental funnels, libraries of molecules undergo sequential rounds of virtual, coarse, and refined experimental screenings, with each level balanced between the cost of experiments and the number of molecules screened. Bayesian optimization offers an alternative approach, using iterative experiments to locate optimal molecules with fewer experiments than large-scale screening, but without the ability to weigh the costs and benefits of different types of experiments. In this work, we combine the multifidelity approach of the experimental funnel with Bayesian optimization to search for drug molecules iteratively, taking full advantage of different types of experiments, their costs, and the quality of the data they produce. We first demonstrate the utility of the multifidelity Bayesian optimization (MF-BO) approach on a series of drug targets with data reported in ChEMBL, emphasizing what properties of the chemical search space result in substantial acceleration with MF-BO. Then we integrate the MF-BO experiment selection algorithm into an autonomous molecular discovery platform to illustrate the prospective search for new histone deacetylase inhibitors using docking scores, single-point percent inhibitions, and dose-response IC50 values as low-, medium-, and high-fidelity experiments. A chemical search space with appropriate diversity and fidelity correlation for use with MF-BO was constructed with a genetic generative algorithm. The MF-BO integrated platform then docked more than 3,500 molecules, automatically synthesized and screened more than 120 molecules for percent inhibition, and selected a handful of molecules for manual evaluation at the highest fidelity. Many of the molecules screened have never been reported in any capacity. 
At the end of the search, several new histone deacetylase inhibitors were found with submicromolar inhibition, free of problematic hydroxamate moieties that constrain the use of current inhibitors.
Affiliation(s)
- Matthew A. McDonald
  - Massachusetts Institute of Technology, Department of Chemical Engineering, 77 Massachusetts Avenue, Cambridge, Massachusetts 02139, United States
  - Drexel University, Department of Chemical and Biological Engineering, 3101 Ludlow St, Philadelphia, Pennsylvania 19104, United States
- Brent A. Koscher
  - Massachusetts Institute of Technology, Department of Chemical Engineering, 77 Massachusetts Avenue, Cambridge, Massachusetts 02139, United States
- Richard B. Canty
  - Massachusetts Institute of Technology, Department of Chemical Engineering, 77 Massachusetts Avenue, Cambridge, Massachusetts 02139, United States
- Jason Zhang
  - Massachusetts Institute of Technology, Department of Chemical Engineering, 77 Massachusetts Avenue, Cambridge, Massachusetts 02139, United States
- Angelina Ning
  - Massachusetts Institute of Technology, Department of Chemical Engineering, 77 Massachusetts Avenue, Cambridge, Massachusetts 02139, United States
- Klavs F. Jensen
  - Massachusetts Institute of Technology, Department of Chemical Engineering, 77 Massachusetts Avenue, Cambridge, Massachusetts 02139, United States
5
Wang Q, Wang B, Hou T, Ma F, Chang H, Dong Z, Wan Y. Screening estimates of bioaccumulation factors for 4950 per- and polyfluoroalkyl substances in aquatic species. JOURNAL OF HAZARDOUS MATERIALS 2025; 489:137672. [PMID: 40010215 DOI: 10.1016/j.jhazmat.2025.137672] [Received: 10/22/2024] [Revised: 02/17/2025] [Accepted: 02/17/2025] [Indexed: 02/28/2025]
Abstract
The considerable variability in bioaccumulation factors (BAFs) of per- and polyfluoroalkyl substances (PFAS) across aquatic species, driven by the diversity of PFAS, complex water conditions, and species differences, underscores the resource-intensive nature of relying on experimental data. To develop a robust and effective approach for predicting BAFs, a predictive framework using a three-level stacking deep ensemble learning model was established. Initially, we compiled a substantial dataset of BAFs, encompassing a wide variety of PFAS across both marine and freshwater species. The stacking model demonstrated strong performance, achieving R-squared (R2) values of 0.94 and 0.89, and root-mean-square errors (RMSE) of 0.88 and 1.17 for training and testing, respectively. External validation revealed that 60 % and 90 % of predictions fell within 2-fold and 4-fold differences, respectively, from the observed values. Using this model, we predicted BAFs for 4950 PFAS in 54 global edible fish species, with the predicted median BAF values ranging from 22 L/kg to 477.09 L/kg. The results indicated that PFAS with multiple functional groups (e.g., benzene rings and ketones) exhibited higher BAFs. Finally, an accessible online tool (https://pfasbaf.hhra.net/) was launched to facilitate BAF predictions. This newly released application promises to offer valuable support for environmental risk management and policymaking efforts.
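The "within 2-fold and 4-fold" criterion used for external validation can be illustrated with a short sketch. It assumes the comparison is made on linear-scale, strictly positive values; the abstract does not say whether the authors worked on a linear or log scale, and the function name `within_fold` and the sample numbers are made up for illustration.

```python
def within_fold(observed, predicted, fold):
    """Return the fraction of predictions within `fold`-fold of the observation.

    A prediction counts as within n-fold when max(pred/obs, obs/pred) <= n.
    Both values must be strictly positive (assumed linear-scale quantities).
    """
    hits = 0
    for obs, pred in zip(observed, predicted):
        if obs <= 0 or pred <= 0:
            raise ValueError("fold difference is defined only for positive values")
        ratio = max(pred / obs, obs / pred)
        if ratio <= fold:
            hits += 1
    return hits / len(observed)


# Made-up observed and predicted values, for illustration only.
observed = [10.0, 100.0, 50.0, 200.0, 8.0]
predicted = [15.0, 350.0, 50.0, 90.0, 40.0]

print(within_fold(observed, predicted, 2))  # 0.4
print(within_fold(observed, predicted, 4))  # 0.8
```

Taking the larger of the two ratios makes the measure symmetric, so over- and under-prediction by the same factor are penalized equally.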
Affiliation(s)
- Qi Wang
  - Beijing Key Lab for Source Control Technology of Water Pollution, College of Environmental Sciences & Engineering, Beijing Forestry University, Beijing 100083, China
- Bixuan Wang
  - School of Materials Science and Engineering, Beihang University, Beijing 100191, China
- Ting Hou
  - The Bureau of Ecology and Environment of the Wulanchabu, Wuluanchabu 012000, China
- Fujun Ma
  - State Key Laboratory of Environmental Criteria and Risk Assessment, Chinese Research Academy of Environmental Sciences, Beijing 100012, China
- Hong Chang
  - Beijing Key Lab for Source Control Technology of Water Pollution, College of Environmental Sciences & Engineering, Beijing Forestry University, Beijing 100083, China.
- Zhaomin Dong
  - School of Materials Science and Engineering, Beihang University, Beijing 100191, China; School of Public Health, Southeast University, Nanjing 210000, China.
- Yi Wan
  - Laboratory for Earth Surface Processes, College of Urban and Environmental Sciences, Peking University, Beijing 100871, China
6
Makarov DM, Ksenofontov AA, Budkov YA. Consensus Modeling for Predicting Chemical Binding to Transthyretin as the Winning Solution of the Tox24 Challenge. Chem Res Toxicol 2025. [PMID: 39969008 DOI: 10.1021/acs.chemrestox.4c00421] [Indexed: 02/20/2025]
Abstract
The utilization of predictive methodologies for the assessment of toxicological properties represents an alternative approach that facilitates the identification of safe compounds while concurrently reducing the financial costs associated with the process. The objective of the Tox24 Challenge was to assess the progress in computational methods for predicting the activity of chemical binding to transthyretin (TTR). In order to fulfill the requirements of this task, the data set, measured by the Environmental Protection Agency, consisted of 1512 chemical substances of diverse nature. This paper describes the model that won the Tox24 Challenge and the steps taken for its further improvement. The Transformer convolutional neural network (CNN) model achieved the best performance as a standalone solution. Meanwhile, a multitask model built on a graph CNN, trained using 11 additional acute systemic toxicity data sets with increased weighting on the TTR binding activity, showed comparable results on the blind test set. The winning solution was a consensus model consisting of two catBoost models with OEstate and Mold2 descriptor sets, as well as two transformer-based models. The improvement of this solution involved adding a fifth model based on multitask learning using the graph CNN method, which led to a reduction in RMSE on the blind test set to 20.3%. The winning model was developed using the OCHEM web platform and is available online at https://ochem.eu/article/162082.
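The idea behind a consensus model can be sketched as a simple average of per-compound predictions from several models. This is an assumption-laden illustration: the abstract does not state how the individual models were combined, so the unweighted mean, the function names `consensus` and `rmse`, and all numbers below are hypothetical.

```python
import math


def consensus(predictions_per_model):
    """Average the per-compound predictions of several models (unweighted mean)."""
    n_models = len(predictions_per_model)
    return [sum(column) / n_models for column in zip(*predictions_per_model)]


def rmse(y_true, y_pred):
    """Root-mean-square error between observed and predicted values."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


# Made-up activity values (in percent) for three compounds and two models.
y_true = [10.0, 50.0, 90.0]
model_a = [20.0, 45.0, 95.0]
model_b = [4.0, 60.0, 80.0]

y_consensus = consensus([model_a, model_b])

print(f"model A   RMSE = {rmse(y_true, model_a):.2f}")
print(f"model B   RMSE = {rmse(y_true, model_b):.2f}")
print(f"consensus RMSE = {rmse(y_true, y_consensus):.2f}")
```

In this toy case the two models err in opposite directions, so averaging cancels part of the error. Real consensus models benefit only to the extent that their members make partly independent mistakes.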
Affiliation(s)
- Dmitriy M Makarov
  - G. A. Krestov Institute of Solution Chemistry of the Russian Academy of Sciences, Ivanovo 153045, Russia
- Alexander A Ksenofontov
  - G. A. Krestov Institute of Solution Chemistry of the Russian Academy of Sciences, Ivanovo 153045, Russia
- Yury A Budkov
  - G. A. Krestov Institute of Solution Chemistry of the Russian Academy of Sciences, Ivanovo 153045, Russia
  - Laboratory of Computational Physics, HSE University, Tallinskaya st. 34, Moscow 123458, Russia
  - School of Applied Mathematics, HSE University, Tallinskaya st. 34, Moscow 123458, Russia
7
Ahmad N, Eid G, El-Toony MM, Mahmood A. Harnessing machine learning for the rational design of high-performance fluorescent dyes. SPECTROCHIMICA ACTA. PART A, MOLECULAR AND BIOMOLECULAR SPECTROSCOPY 2025; 334:125918. [PMID: 39986253 DOI: 10.1016/j.saa.2025.125918] [Received: 11/04/2024] [Revised: 02/11/2025] [Accepted: 02/16/2025] [Indexed: 02/24/2025]
Abstract
The design of fluorescent dyes with optimized performance is crucial for advancements in various fields, including bioimaging, diagnostics, and optoelectronics. Traditional approaches to dye design often rely on trial-and-error experimentation, which can be time-consuming and resource-intensive. For each property, 42 ML models were tried and the best one was selected. The gradient boosting regressor was the best model for predicting excitation values, while the extra trees regressor was the best model for predicting emission values. A database of 5000 new dyes was generated and analyzed, and 30 dyes with higher excitation and emission values were selected. Synthetic accessibility analysis of these 30 dyes indicated that the majority are easy to synthesize. Our results demonstrate that ML-assisted design can significantly accelerate the discovery process, reduce the need for costly experimental iterations, and lead to the development of dyes with tailored properties for specific applications.
Affiliation(s)
- Nafees Ahmad
  - College of Chemistry and Chemical Engineering, Central South University, Changsha 410083, China
- Ghada Eid
  - Physics Department, Faculty of Sciences and Arts, King Khalid University, Tehama Branch, Saudi Arabia
- Mohamed M El-Toony
  - Chemistry Department, Faculty of Sciences and Arts, King Khalid University, Tehama Branch, Saudi Arabia
- Asif Mahmood
  - Key Laboratory of Cluster Science of Ministry of Education, Beijing Key Laboratory of Photoelectronic/Electrophotonic Conversion Materials, School of Chemistry and Chemical Engineering, Beijing Institute of Technology, Beijing 100081, China.
8
Jiang P, Xu Y, Rao K, Ma M, Wang Z. Systematic evaluation of sampling rate influences and variability in POCIS using meta-analysis and quantitative structure property relationship (QSPR). ENVIRONMENTAL POLLUTION (BARKING, ESSEX : 1987) 2025; 367:125666. [PMID: 39793645 DOI: 10.1016/j.envpol.2025.125666] [Received: 08/26/2024] [Revised: 01/07/2025] [Accepted: 01/07/2025] [Indexed: 01/13/2025]
Abstract
Despite the significant benefits of aquatic passive sampling (low detection limits and time-weighted average concentrations), the use of passive samplers is impeded by uncertainties, particularly concerning the accuracy of sampling rates. This study employed a systematic evaluation approach based on the combination of meta-analysis and quantitative structure-property relationships (QSPR) models to address these issues. A comprehensive meta-analysis based on extensive data from 298 studies on the Polar Organic Chemical Integrative Sampler (POCIS) identified essential configuration parameters, including the receiving phase (type, mass) and the diffusion-limiting membrane (type, thickness, pore size), as key factors influencing uptake kinetic parameters. The incomplete availability of these details across studies potentially impacts data reproducibility and comparability. The subsequent meta-regression and subgroup analysis were performed to reveal the most significant factors contributing to sampling rate variability and inter-study heterogeneity. The flow rate and octanol-water partitioning (Kow or pH-dependent Dow) were identified from all environmental factors and chemical properties. Furthermore, the impact of chemical properties on the sampling rates of POCIS was predicted by Quantitative Structure-Property Relationship (QSPR) models using 2D descriptors and random forest regression. The analysis highlighted that the electrotopological state and molecular mass are the most important chemical properties influencing the sampling rate. This study systematically unraveled the most important impact factors on reliable estimates of passive sampling rates, and these causes of uncertainty should be further considered in aquatic monitoring and assessment with passive samplers.
Affiliation(s)
- Peiyu Jiang
  - Key Laboratory of Drinking Water Science and Technology, Research Center for Eco-Environmental Sciences, Chinese Academy of Sciences, Beijing, 100085, China; College of Resources and Environment, University of Chinese Academy of Sciences, Beijing, 100049, China
- Yiping Xu
  - Key Laboratory of Drinking Water Science and Technology, Research Center for Eco-Environmental Sciences, Chinese Academy of Sciences, Beijing, 100085, China.
- Kaifeng Rao
  - Key Laboratory of Drinking Water Science and Technology, Research Center for Eco-Environmental Sciences, Chinese Academy of Sciences, Beijing, 100085, China
- Mei Ma
  - Key Laboratory of Drinking Water Science and Technology, Research Center for Eco-Environmental Sciences, Chinese Academy of Sciences, Beijing, 100085, China; College of Resources and Environment, University of Chinese Academy of Sciences, Beijing, 100049, China
- Zijian Wang
  - Research Center for Eco-Environmental Sciences, Chinese Academy of Sciences, Beijing, 100085, China
9
Taghavi A, Springer NA, Zanon PRA, Li Y, Li C, Childs-Disney JL, Disney MD. The evolution and application of RNA-focused small molecule libraries. RSC Chem Biol 2025:d4cb00272e. [PMID: 39957993 PMCID: PMC11824871 DOI: 10.1039/d4cb00272e] [Received: 11/11/2024] [Accepted: 02/06/2025] [Indexed: 02/18/2025]
Abstract
RNA structure plays a role in nearly every disease. Therefore, approaches that identify tractable small molecule chemical matter that targets RNA and affects its function would transform drug discovery. Despite this potential, discovery of RNA-targeted small molecule chemical probes and medicines remains in its infancy. Advances in RNA-focused libraries are key to enable more successful primary screens and to define structure-activity relationships amongst hit molecules. In this review, we describe how RNA-focused small molecule libraries have been used and evolved over time and provide underlying principles for their application to develop bioactive small molecules. We also describe areas that need further investigation to advance the field, including generation of larger data sets to inform machine learning approaches.
Affiliation(s)
- Amirhossein Taghavi
  - Department of Chemistry, The Herbert Wertheim UF Scripps Institute for Biomedical Innovation and Technology 130 Scripps Way Jupiter FL 33458 USA
- Noah A Springer
  - Department of Chemistry, The Herbert Wertheim UF Scripps Institute for Biomedical Innovation and Technology 130 Scripps Way Jupiter FL 33458 USA
  - Department of Chemistry, The Scripps Research Institute 130 Scripps Way Jupiter FL 33458 USA
- Patrick R A Zanon
  - Department of Chemistry, The Herbert Wertheim UF Scripps Institute for Biomedical Innovation and Technology 130 Scripps Way Jupiter FL 33458 USA
- Yanjun Li
  - Department of Medicinal Chemistry, Center for Natural Products, Drug Discovery and Development, The University of Florida Gainesville FL 32610 USA
  - Department of Computer & Information Science & Engineering, University of Florida Gainesville FL 32611 USA
- Chenglong Li
  - Department of Medicinal Chemistry, Center for Natural Products, Drug Discovery and Development, The University of Florida Gainesville FL 32610 USA
- Jessica L Childs-Disney
  - Department of Chemistry, The Herbert Wertheim UF Scripps Institute for Biomedical Innovation and Technology 130 Scripps Way Jupiter FL 33458 USA
- Matthew D Disney
  - Department of Chemistry, The Herbert Wertheim UF Scripps Institute for Biomedical Innovation and Technology 130 Scripps Way Jupiter FL 33458 USA
  - Department of Chemistry, The Scripps Research Institute 130 Scripps Way Jupiter FL 33458 USA
10
Beck AG, Fine J, Lam YH, Sherer EC, Regalado EL, Aggarwal P. Dedenser: A Python Package for Clustering and Downsampling Chemical Libraries. J Chem Inf Model 2025; 65:1053-1060. [PMID: 39883037 DOI: 10.1021/acs.jcim.4c01980] [Indexed: 01/31/2025]
Abstract
The screening of chemical libraries is an essential starting point in the drug discovery process. While some researchers desire a more thorough screening of drug targets against a narrower scope of molecules, it is not uncommon for diverse screening sets to be favored during the early stages of drug discovery. However, a cost burden is associated with the screening of molecules, with potential drawbacks if particular areas of chemical space are needlessly overrepresented. To facilitate triaged sampling of chemical libraries and other collections of molecules, we have developed Dedenser, a tool for the downsampling of chemical clusters. Dedenser functions by reducing the membership of clusters within chemical point clouds while maintaining the initial topology or distribution in chemical space. Dedenser is a Python package that utilizes Hierarchical Density-Based Spatial Clustering of Applications with Noise to first identify clusters present in 3D chemical point clouds and then downsamples by applying Poisson disk sampling to clusters based on either their volume or density in chemical space. A command line interface tool and graphic user interface are available with Dedenser, which allow for the generation of chemical point clouds, using Mordred for QSAR descriptor calculations and uniform manifold approximation and projection for 3D embedding, as well as visualization. We hope that Dedenser will serve the community by enabling quick access to reduced collections of molecules that are representative of larger sets and selecting even distributions of molecules within clusters rather than single representative molecules from clusters. All code for Dedenser is open source and available at https://github.com/MSDLLCpapers/dedenser.
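The downsampling step described in this abstract can be illustrated with a naive Poisson disk thinning of a 3D point cloud. This sketch is not Dedenser's code or API: it skips the HDBSCAN clustering stage, treats the whole cloud as one already-identified cluster, uses a fixed `radius` rather than one derived from cluster volume or density, and the function name `poisson_disk_downsample` is hypothetical.

```python
import math
import random


def poisson_disk_downsample(points, radius, seed=0):
    """Greedy Poisson disk thinning of an existing point set.

    Points are visited in a seeded random order; a point is kept only if it
    lies at least `radius` away from every point kept so far.
    """
    order = list(points)
    random.Random(seed).shuffle(order)
    kept = []
    for p in order:
        if all(math.dist(p, q) >= radius for q in kept):
            kept.append(p)
    return kept


# Synthetic "cluster": 500 points drawn from a 3D Gaussian (illustration only).
rng = random.Random(1)
cloud = [tuple(rng.gauss(0.0, 1.0) for _ in range(3)) for _ in range(500)]

sample = poisson_disk_downsample(cloud, radius=0.5)
print(f"kept {len(sample)} of {len(cloud)} points")
```

Because every kept point enforces an exclusion sphere, dense regions are thinned heavily while sparse regions keep most of their points, which roughly preserves the outline of the cluster in chemical space. The pairwise check is quadratic in the worst case; a spatial grid would be the usual optimization for large clouds.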
Affiliation(s)
- Armen G Beck
  - Analytical Research & Development, MRL, Merck & Co., Inc., Rahway, New Jersey 07065, United States
- Jonathan Fine
  - Analytical Research & Development, MRL, Merck & Co., Inc., Rahway, New Jersey 07065, United States
- Yu-Hong Lam
  - Modeling and Informatics, MRL, Merck & Co., Inc., Rahway, New Jersey 07065, United States
- Edward C Sherer
  - Analytical Research & Development, MRL, Merck & Co., Inc., Rahway, New Jersey 07065, United States
- Erik L Regalado
  - Analytical Research & Development, MRL, Merck & Co., Inc., Rahway, New Jersey 07065, United States
- Pankaj Aggarwal
  - Analytical Research & Development, MRL, Merck & Co., Inc., Rahway, New Jersey 07065, United States
11
Hua PX, Huang Z, Xu ZY, Zhao Q, Ye CY, Wang YF, Xu YH, Fu Y, Ding H. An active representation learning method for reaction yield prediction with small-scale data. Commun Chem 2025; 8:42. [PMID: 39929993 PMCID: PMC11811124 DOI: 10.1038/s42004-025-01434-0] [Received: 09/16/2024] [Accepted: 01/27/2025] [Indexed: 02/13/2025]
Abstract
Reaction optimization plays an essential role in chemical research and industrial production. To explore a large reaction system, a practical issue is how to reduce the heavy experimental load for finding the high-yield conditions. In this paper, we present an efficient machine learning tool called "RS-Coreset", where the key idea is to take advantage of deep representation learning techniques to guide an interactive procedure for representing the full reaction space. Our proposed tool only uses small-scale data, say 2.5% to 5% of the instances, to predict the yields of the reaction space. We validate the performance on three public datasets and achieve state-of-the-art results. Moreover, we apply this tool to assist the realistic exploration of the Lewis base-boryl radicals enabled dechlorinative coupling reactions in our lab. The tool can help us to effectively predict the yields and even discover several feasible reaction combinations that were overlooked in previous articles.
Affiliation(s)
- Peng-Xiang Hua
  - School of Computer Science and Technology, University of Science and Technology of China, Hefei, Anhui, 230026, China
- Zhen Huang
  - School of Computer Science and Technology, University of Science and Technology of China, Hefei, Anhui, 230026, China
- Zhe-Yuan Xu
  - Key Laboratory of Precision and Intelligent Chemistry, CAS Key Laboratory of Urban Pollutant Conversion, Anhui Province Key Laboratory of Biomass Clean Energy, Department of Chemistry, University of Science and Technology of China, Hefei, Anhui, 230026, China
- Qiang Zhao
  - Key Laboratory of Precision and Intelligent Chemistry, CAS Key Laboratory of Urban Pollutant Conversion, Anhui Province Key Laboratory of Biomass Clean Energy, Department of Chemistry, University of Science and Technology of China, Hefei, Anhui, 230026, China
- Chen-Yang Ye
  - Key Laboratory of Precision and Intelligent Chemistry, CAS Key Laboratory of Urban Pollutant Conversion, Anhui Province Key Laboratory of Biomass Clean Energy, Department of Chemistry, University of Science and Technology of China, Hefei, Anhui, 230026, China
- Yi-Feng Wang
  - Key Laboratory of Precision and Intelligent Chemistry, CAS Key Laboratory of Urban Pollutant Conversion, Anhui Province Key Laboratory of Biomass Clean Energy, Department of Chemistry, University of Science and Technology of China, Hefei, Anhui, 230026, China.
- Yun-He Xu
  - Key Laboratory of Precision and Intelligent Chemistry, CAS Key Laboratory of Urban Pollutant Conversion, Anhui Province Key Laboratory of Biomass Clean Energy, Department of Chemistry, University of Science and Technology of China, Hefei, Anhui, 230026, China.
- Yao Fu
  - Key Laboratory of Precision and Intelligent Chemistry, CAS Key Laboratory of Urban Pollutant Conversion, Anhui Province Key Laboratory of Biomass Clean Energy, Department of Chemistry, University of Science and Technology of China, Hefei, Anhui, 230026, China.
- Hu Ding
  - School of Computer Science and Technology, University of Science and Technology of China, Hefei, Anhui, 230026, China.
12
|
Noreldeen HAA. Enhancing lipid identification in LC-HRMS data through machine learning-based retention time prediction. J Chromatogr A 2025; 1742:465650. [PMID: 39798479 DOI: 10.1016/j.chroma.2024.465650] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/05/2024] [Revised: 12/12/2024] [Accepted: 12/30/2024] [Indexed: 01/15/2025]
Abstract
The comprehensive identification of peaks in untargeted lipidomics using LC-MS/MS remains a significant challenge. Confidence in lipid annotation can be greatly improved by integrating a highly accurate machine learning-based retention time prediction model. Such an approach enables the identification of lipids for understanding pathogenic mechanisms, biomarker discovery, and drug screening. In this study, we developed a machine learning model to predict retention times and facilitate lipid peak annotations in LC-MS-based untargeted lipidomics. Our model achieved high correlation coefficients of 0.998 and 0.990, with mean absolute errors (MAE) of 0.107 min and 0.240 min for the training and test sets, respectively. External validation showed similarly strong performance, with correlations of 0.991 and 0.978, and MAE values of 0.241 min and 0.270 min. We also compared the impact of molecular descriptors and molecular fingerprints on the model's performance, finding that molecular descriptors outperformed molecular fingerprints across all datasets when using Random Forest (RF) for model construction. Notably, this retention time calibration model demonstrates robust performance across chromatographic systems with comparable gradients and flow rates. Overall, this machine learning model enhances lipid annotation accuracy and reduces errors in untargeted lipidomics, improving data analysis across multiple datasets.
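The two figures of merit quoted in this abstract, the correlation coefficient and the mean absolute error (MAE), can be computed as in the minimal sketch below. It assumes the correlation is Pearson's r, which the abstract does not state, and the retention times are made-up values for illustration, not data from the study.

```python
from math import sqrt


def mean_absolute_error(observed, predicted):
    """Average absolute difference between observed and predicted values."""
    return sum(abs(o - p) for o, p in zip(observed, predicted)) / len(observed)


def pearson_r(observed, predicted):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(observed)
    mean_o = sum(observed) / n
    mean_p = sum(predicted) / n
    cov = sum((o - mean_o) * (p - mean_p) for o, p in zip(observed, predicted))
    var_o = sum((o - mean_o) ** 2 for o in observed)
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    return cov / sqrt(var_o * var_p)


# Hypothetical retention times in minutes (illustration only).
observed_rt = [1.0, 2.0, 3.0, 4.0]
predicted_rt = [1.1, 1.9, 3.2, 3.8]

mae = mean_absolute_error(observed_rt, predicted_rt)
r = pearson_r(observed_rt, predicted_rt)
print(f"MAE = {mae:.3f} min, r = {r:.3f}")
```

Reporting both numbers is useful because a high correlation alone does not rule out a systematic offset in predicted retention time, whereas the MAE measures the error in minutes directly.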
13
Tahir MH, Farrukh A, Alqahtany FZ, Badshah A, Shaaban IA, Assiri MA. Accelerated discovery of polymer donors for organic solar cells through machine learning: From library creation to performance forecasting. Spectrochimica Acta. Part A, Molecular and Biomolecular Spectroscopy 2025; 326:125298. [PMID: 39447304 DOI: 10.1016/j.saa.2024.125298] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/20/2024] [Revised: 09/10/2024] [Accepted: 10/16/2024] [Indexed: 10/26/2024]
Abstract
The design of novel polymer donors for organic solar cells has been a major research focus for decades, but discovering unique materials remains challenging due to the high cost of experimentation. In this study, machine learning models are employed to predict power conversion efficiency (PCE), with Mordred descriptors used for model training. Among the four machine learning models evaluated, the gradient boosting regressor emerged as the best-performing model. Additionally, a chemical library of polymer donors was generated and analyzed using various measures. The 30 donors with the highest PCE are selected and their synthetic accessibility is evaluated. Similarity analysis indicates strong resemblance among the selected polymer donors.
Affiliation(s)
- Mudassir Hussain Tahir
  - Research Faculty of Agriculture, Field Science Center for Northern Biosphere, Hokkaido University, Sapporo, Hokkaido 060-8589, 060-0811, Japan
- Aftab Farrukh
  - Department of Physics, PMAS-Arid Agriculture University, Rawalpindi 46300, Pakistan
- Faleh Zafer Alqahtany
  - Department of Chemistry, College of Science, University of Bisha, Bisha, Saudi Arabia
- Amir Badshah
  - Department of Chemistry, Kohat University of Science and Technology, Kohat 26000, Pakistan.
- Ibrahim A Shaaban
  - Department of Chemistry, Faculty of Science, King Khalid University, P.O. Box 960, Abha 61421, Saudi Arabia; Research Center for Advanced Materials Science (RCAMS), King Khalid University, P.O. Box 960, Abha 61421, Saudi Arabia
- Mohammed A Assiri
  - Department of Chemistry, Faculty of Science, King Khalid University, P.O. Box 960, Abha 61421, Saudi Arabia; Research Center for Advanced Materials Science (RCAMS), King Khalid University, P.O. Box 960, Abha 61421, Saudi Arabia
14
Faris A, Ibrahim IM, Alnajjar R, Hadni H, Bhat MA, Yaseen M, Chakraborty S, Alsakhen N, Shamkh IM, Mabood F, M Naglah A, Ullah I, Ziedan N, Elhallaoui M. QSAR-driven screening uncovers and designs novel pyrimidine-4,6-diamine derivatives as potent JAK3 inhibitors. J Biomol Struct Dyn 2025; 43:757-786. [PMID: 38059345 DOI: 10.1080/07391102.2023.2283168] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 08/01/2023] [Accepted: 11/08/2023] [Indexed: 12/08/2023]
Abstract
This study presents an integrated methodology that combines a range of computational techniques to facilitate the design and prediction of new inhibitors targeting the JAK3/STAT pathway. The methodology encompasses QSAR analysis, pharmacophore modeling, ADMET prediction, covalent docking, molecular dynamics (MD) simulations, and the calculation of binding free energies (MM/GBSA). A QSAR model was built using multiple linear regression (MLR). The initial MLR model was further refined with an artificial neural network (ANN) to minimize predictive errors. Both MLR and ANN performed well, with R2 values of 0.89 and 0.95, respectively. The model's precision was assessed via leave-one-out cross-validation (CV), yielding a Q2 value of 0.65, supplemented by rigorous Y-randomization. The pharmacophore model effectively differentiated between active and inactive drugs, identifying potential JAK3 inhibitors, and demonstrated validity with an ROC value of 0.86. The newly discovered and designed inhibitors exhibited high inhibitory potency, ranging from 6 to 8, as predicted by the QSAR models. Comparative analysis with FDA-approved Tofacitinib revealed that the new compounds exhibited promising ADMET properties and strong covalent docking (CovDock) interactions. The stability of the newly discovered and designed inhibitors within the JAK3 binding site was confirmed through 500 ns MD simulations, while MM/GBSA calculations supported their binding affinity. Additionally, a retrosynthetic study was conducted to facilitate the synthesis of these potential JAK3/STAT inhibitors. The overall integrated approach demonstrates the feasibility of designing novel JAK3/STAT inhibitors with robust efficacy and excellent ADMET characteristics that surpass Tofacitinib by a significant margin. Communicated by Ramaswamy H. Sarma.
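The leave-one-out Q2 mentioned in this abstract can be illustrated with a minimal sketch. It makes several assumptions that are not taken from the study: a single descriptor rather than the multi-descriptor MLR model, the common convention Q2 = 1 - PRESS / SS_tot with SS_tot computed around the mean of all responses, and made-up descriptor and activity values.

```python
def fit_line(x, y):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def loo_q2(x, y):
    """Leave-one-out cross-validated Q2 for a one-descriptor linear model."""
    press = 0.0  # predictive residual sum of squares
    for i in range(len(x)):
        # Refit the model with sample i held out, then predict sample i.
        slope, intercept = fit_line(x[:i] + x[i + 1:], y[:i] + y[i + 1:])
        press += (y[i] - (slope * x[i] + intercept)) ** 2
    mean_y = sum(y) / len(y)
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return 1.0 - press / ss_tot


# Hypothetical descriptor values and activities (illustration only).
descriptor = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
activity = [6.1, 6.3, 6.9, 7.0, 7.6, 7.8]
print(f"LOO Q2 = {loo_q2(descriptor, activity):.3f}")
```

Because every prediction is made for a compound the model has not seen, Q2 is typically lower than the fitted R2, which is consistent with the gap between the two values reported above.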
Affiliation(s)
- Abdelmoujoud Faris
  - LIMAS, Department of Chemical Sciences, Faculty of Sciences Dhar El Mahraz, Sidi Mohamed Ben Abdellah University, Fez, Morocco
- Ibrahim M Ibrahim
  - Biophysics Department, Faculty of Science, Cairo University, Cairo, Egypt
- Radwan Alnajjar
  - Department of Chemistry, Faculty of Science, University of Benghazi, Benghazi, Libya
- Hanine Hadni
  - LIMAS, Department of Chemical Sciences, Faculty of Sciences Dhar El Mahraz, Sidi Mohamed Ben Abdellah University, Fez, Morocco
- Mashooq Ahmad Bhat
  - Department of Pharmaceutical Chemistry, College of Pharmacy, King Saud University, Riyadh, Saudi Arabia
- Muhammad Yaseen
  - Institute of Chemical Sciences, University of Swat, Main Campus, Charbagh, Swat, Pakistan
- Souvik Chakraborty
  - Department of Physiology, Bhairab Ganguly College, Belghoria, Kolkata, West Bengal, India
- Nada Alsakhen
  - Department of Chemistry, Faculty of Science, The Hashemite University, Zarqa, Jordan
- Israa M Shamkh
  - Botany and Microbiology Department, Faculty of Science, Cairo University, Cairo, Egypt
- Fazal Mabood
  - Institute of Chemical Sciences, University of Swat, Main Campus, Charbagh, Swat, Pakistan
- Ahmed M Naglah
  - Department of Pharmaceutical Chemistry, College of Pharmacy, King Saud University, Riyadh, Saudi Arabia
- Ihsan Ullah
  - Institute of Chemical Sciences, University of Swat, Main Campus, Charbagh, Swat, Pakistan
- Noha Ziedan
  - Department of Physical, Mathematical and Engineering Science, Faculty of Science, Business and Enterprise, University of Chester, Chester, UK
- Menana Elhallaoui
  - LIMAS, Department of Chemical Sciences, Faculty of Sciences Dhar El Mahraz, Sidi Mohamed Ben Abdellah University, Fez, Morocco
15
Amin SA, Sessa L, Gayen S, Piotto S. PPARγ modulator predictor (PGMP_v1): chemical space exploration and computational insights for enhanced type 2 diabetes mellitus management. Mol Divers 2025:10.1007/s11030-025-11118-5. [PMID: 39891837 DOI: 10.1007/s11030-025-11118-5] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/18/2024] [Accepted: 01/15/2025] [Indexed: 02/03/2025]
Abstract
Peroxisome proliferator-activated receptor gamma (PPARγ) plays a critical role in adipocyte differentiation and enhances insulin sensitivity. In contemporary drug discovery, in silico design strategies offer significant advantages by revealing essential structural insights for lead optimization. The study is guided by two main objectives: (i) a ligand-based approach to explore the chemical space of PPARγ modulators followed by molecular docking ensembles (MDEs) to investigate ligand-binding interactions, (ii) the development of a supervised ML model for a large dataset of compounds targeting PPARγ. Additionally, the combination of chemical space networks with ML models enables the rapid screening and prediction of PPARγ modulators. These modeling analyses will assist medicinal chemists in designing more potent PPARγ modulators. To further enhance accessibility for the scientific community, we developed an online tool, "PGMP_v1," aimed at prospective screening for PPARγ modulators. The tool "PGMP_v1" is available at the provided link https://github.com/Amincheminfom/PGMP_v1 . The integration of these computational methods has uncovered crucial structural motifs that are essential for PPARγ activity, advancing the development of more effective modulators in the future.
Affiliation(s)
- Sk Abdul Amin
  - Department of Pharmacy, University of Salerno, Via Giovanni Paolo II 132, 84084, Fisciano, SA, Italy.
- Lucia Sessa
  - Department of Pharmacy, University of Salerno, Via Giovanni Paolo II 132, 84084, Fisciano, SA, Italy
- Shovanlal Gayen
  - Department of Pharmaceutical Technology, Jadavpur University, Kolkata, West Bengal, 700032, India
- Stefano Piotto
  - Department of Pharmacy, University of Salerno, Via Giovanni Paolo II 132, 84084, Fisciano, SA, Italy
16
Monem S, Abdel-Hamid AH, Hassanien AE. Drug toxicity prediction model based on enhanced graph neural network. Comput Biol Med 2025; 185:109614. [PMID: 39721415 DOI: 10.1016/j.compbiomed.2024.109614] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/30/2024] [Revised: 12/15/2024] [Accepted: 12/21/2024] [Indexed: 12/28/2024]
Abstract
Prediction of drug toxicity remains a significant challenge and an essential process in drug discovery. Traditional machine learning algorithms struggle to capture the full scope of molecular structure features, limiting their effectiveness in toxicity prediction. Graph neural networks offer a promising solution by effectively extracting drug features from their molecular graphs. However, existing graph learning algorithms fail to account for the interaction features between graph nodes and the indirect edges connecting them. This paper proposes an enhanced graph neural network algorithm that employs multi-view features for each node, capturing the feature interactions between each node and its neighbors. Additionally, the adjacency matrix is preprocessed to handle indirect edge interactions. A pooling technique is then applied to aggregate node features, followed by normalization and an activation layer. To further enhance the proposed algorithm, multi-scale attention is applied to learn graph features at different scales, utilizing weights to understand intricate relationships among node feature vectors. The proposed algorithm is evaluated using eight toxicity datasets, covering binary classification, multi-task multi-class, and regression tasks. For binary classification, the Tox21, AMES, Skin reaction, Carcinogens, and DILI datasets are tested. For multi-task multi-class, the ToxCast dataset is applied, and for regression, the LD50 and hERG datasets are tested. The proposed algorithm is compared with well-known algorithms including Graph Convolution Network, Graph Attention Network, Graph Isomorphism Network, Enhanced Graph Isomorphism Network, and Graph Total Variation. For the classification task, the proposed algorithm achieves ROC-AUC scores of 0.752 for Tox21, 0.775 for AMES, 0.707 for Skin reaction, 0.845 for Carcinogens, 0.92 for DILI, and 0.691 for the ToxCast dataset. For the regression task, the algorithm attains mean square errors of 0.896 for the LD50 dataset and 0.766 for the hERG dataset. These results demonstrate an improvement over the compared algorithms across all evaluated datasets.
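The two evaluation metrics reported in this abstract can be sketched as follows. ROC-AUC is computed here from its probabilistic definition (the chance that a randomly chosen positive is scored above a randomly chosen negative, with ties counted as one half), which is an illustrative choice and not necessarily how the authors computed it. All labels, scores, and values are made up.

```python
def roc_auc(labels, scores):
    """ROC-AUC as the fraction of positive/negative pairs ranked correctly."""
    positives = [s for label, s in zip(labels, scores) if label == 1]
    negatives = [s for label, s in zip(labels, scores) if label == 0]
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # ties count as half a correct ranking
    return wins / (len(positives) * len(negatives))


def mean_squared_error(observed, predicted):
    """Average squared difference between observed and predicted values."""
    return sum((o - p) ** 2 for o, p in zip(observed, predicted)) / len(observed)


# Hypothetical binary toxicity labels and model scores (illustration only).
labels = [1, 0, 1, 0]
scores = [0.9, 0.8, 0.4, 0.1]
print(roc_auc(labels, scores))

# Hypothetical regression targets and predictions (illustration only).
print(mean_squared_error([1.0, 2.0], [1.5, 2.5]))
```

The pairwise loop is quadratic in the number of samples, so it is only suitable for small examples; a rank-based formulation gives the same value more efficiently on large datasets.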
Affiliation(s)
- Samar Monem
  - Mathematics and Computer Science Department, Faculty of Science, Beni-Suef University, 62521, Beni-Suef, Egypt.
- Alaa H Abdel-Hamid
  - Mathematics and Computer Science Department, Faculty of Science, Beni-Suef University, 62521, Beni-Suef, Egypt.
17
Goel M, Amawate A, Singh A, Bagler G. ToxinPredictor: Computational models to predict the toxicity of molecules. Chemosphere 2025; 370:143900. [PMID: 39701316 DOI: 10.1016/j.chemosphere.2024.143900] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/16/2024] [Revised: 12/02/2024] [Accepted: 12/03/2024] [Indexed: 12/21/2024]
Abstract
Predicting the toxicity of molecules is essential in fields like drug discovery, environmental protection, and industrial chemical management. While traditional experimental methods are time-consuming and costly, computational models offer an efficient alternative. In this study, we introduce ToxinPredictor, a machine learning-based model to predict the toxicity of small molecules using their structural properties. The model was trained on a curated dataset of 7550 toxic and 6514 non-toxic molecules, leveraging feature selection techniques like Boruta and PCA. The best-performing model, a Support Vector Machine (SVM), achieved state-of-the-art results with an AUROC of 91.7%, F1-score of 84.9%, and accuracy of 85.4%, outperforming existing solutions. SHAP analysis was applied to the SVM model to identify the most important molecular descriptors contributing to toxicity predictions, enhancing interpretability. Despite challenges related to data quality, ToxinPredictor provides a reliable framework for toxicity risk assessment, paving the way for safer drug development and improved environmental health assessments. We also created a user-friendly webserver, ToxinPredictor (https://cosylab.iiitd.edu.in/toxinpredictor) to facilitate the search and prediction of toxic compounds.
Affiliation(s)
- Mansi Goel
  - Infosys Centre for Artificial Intelligence, Indraprastha Institute of Information Technology Delhi (IIIT-Delhi), New Delhi, 110020, India; Department of Computational Biology, Indraprastha Institute of Information Technology Delhi (IIIT-Delhi), New Delhi, 110020, India; Center of Excellence in Healthcare, Indraprastha Institute of Information Technology Delhi (IIIT-Delhi), New Delhi, 110020, India
- Arav Amawate
  - Department of Computer Science, Indraprastha Institute of Information Technology Delhi (IIIT-Delhi), New Delhi, 110020, India
- Angadjeet Singh
  - Department of Computer Science, Indraprastha Institute of Information Technology Delhi (IIIT-Delhi), New Delhi, 110020, India
- Ganesh Bagler
  - Infosys Centre for Artificial Intelligence, Indraprastha Institute of Information Technology Delhi (IIIT-Delhi), New Delhi, 110020, India; Department of Computational Biology, Indraprastha Institute of Information Technology Delhi (IIIT-Delhi), New Delhi, 110020, India; Center of Excellence in Healthcare, Indraprastha Institute of Information Technology Delhi (IIIT-Delhi), New Delhi, 110020, India.
18
Zheng Y, Thakolkaran P, Biswal AK, Smith JA, Lu Z, Zheng S, Nguyen BH, Kumar S, Vashisth A. AI-Guided Inverse Design and Discovery of Recyclable Vitrimeric Polymers. Advanced Science (Weinheim, Baden-Wurttemberg, Germany) 2025; 12:e2411385. [PMID: 39686685 PMCID: PMC11809429 DOI: 10.1002/advs.202411385] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/16/2024] [Revised: 11/23/2024] [Indexed: 12/18/2024]
Abstract
Vitrimers are a new, exciting class of sustainable polymers with healing abilities due to their dynamic covalent adaptive networks. However, a limited choice of constituent molecules restricts their property space and potential applications. To overcome this challenge, an innovative approach coupling molecular dynamics (MD) simulations and a novel graph variational autoencoder (VAE) model for inverse design of vitrimer chemistries with desired glass transition temperature (Tg) is presented. The first diverse vitrimer dataset of one million chemistries is curated and Tg for 8,424 of them is calculated by high-throughput MD simulations calibrated by a Gaussian process model. The proposed VAE employs dual graph encoders and a latent dimension overlapping scheme which allows for individual representation of multi-component vitrimers. High accuracy and efficiency of the framework are demonstrated by discovering novel vitrimers with desirable Tg beyond the training regime. To validate the effectiveness of the framework in experiments, vitrimer chemistries are generated with a target Tg = 323 K. By incorporating chemical intuition, a novel vitrimer with Tg of 311-317 K is synthesized, experimentally demonstrating healability and flowability. The proposed framework offers an exciting tool for polymer chemists to design and synthesize novel, sustainable polymers for various applications.
Affiliation(s)
- Yiwen Zheng
  - Department of Mechanical Engineering, University of Washington, Seattle, WA 98195, USA
- Prakash Thakolkaran
  - Department of Materials Science and Engineering, Delft University of Technology, Delft, 2628 CD, The Netherlands
- Agni K. Biswal
  - Department of Mechanical Engineering, University of Washington, Seattle, WA 98195, USA
- Jake A. Smith
  - Microsoft Research, Redmond, WA 98052, USA
  - Paul G. Allen School of Computer Science and Engineering, University of Washington, Seattle, WA 98195, USA
- Ziheng Lu
  - Microsoft Research Asia, Beijing 100080, China
- Bichlien H. Nguyen
  - Microsoft Research, Redmond, WA 98052, USA
  - Paul G. Allen School of Computer Science and Engineering, University of Washington, Seattle, WA 98195, USA
- Siddhant Kumar
  - Department of Materials Science and Engineering, Delft University of Technology, Delft, 2628 CD, The Netherlands
- Aniruddh Vashisth
  - Department of Mechanical Engineering, University of Washington, Seattle, WA 98195, USA
19
Łapińska N, Szlęk J, Pacławski A, Mendyk A. Machine Learning Tool for New Selective Serotonin and Serotonin-Norepinephrine Reuptake Inhibitors. Molecules 2025; 30:637. [PMID: 39942741 PMCID: PMC11819831 DOI: 10.3390/molecules30030637] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/06/2024] [Revised: 01/17/2025] [Accepted: 01/24/2025] [Indexed: 02/16/2025] [Open Access]
Abstract
Depression, a serious mood disorder, affects about 5% of the population. Currently, there are two groups of antidepressants that are the first-line treatment for depressive disorder: selective serotonin reuptake inhibitors and serotonin-norepinephrine reuptake inhibitors. The aim of the study was to develop Quantitative Structure-Activity Relationship (QSAR) models for serotonin (SERT) and norepinephrine (NET) transporters to predict the affinity and inhibition potential of new molecules. Models were developed using the Automated Machine Learning tool Mljar based on 80% of the dataset according to 10-fold cross-validation and externally validated on the remaining 20% of data. The molecular representation featured two-dimensional Mordred descriptors. For each model, Shapley additive explanations analysis was performed to clarify the influence of the descriptors on the models' predictions. Based on the final QSAR models, the following results were obtained: NET and pIC50 value RMSEtest = 0.678, R2test = 0.640; NET and pKi RMSEtest = 0.590, R2test = 0.709; SERT and pIC50 RMSEtest = 0.645, R2test = 0.678; SERT and pKi value RMSEtest = 0.540, R2test = 0.828. QSAR models for serotonin and norepinephrine transporters have been made available in a new module of the SerotoninAI application to enhance usability for scientists.
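The validation scheme described in this abstract (80% of the data used for model development under 10-fold cross-validation, the remaining 20% held out for external validation) can be sketched with index bookkeeping alone. The function names, the sample count of 100, and the seed are hypothetical; this is not the Mljar workflow used in the study.

```python
import random


def train_test_split_indices(n_samples, test_fraction=0.2, seed=0):
    """Shuffle sample indices and split them into development and test sets."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    n_test = int(round(n_samples * test_fraction))
    return indices[n_test:], indices[:n_test]


def k_fold(indices, k=10):
    """Yield (training, validation) index lists for k-fold cross-validation."""
    folds = [indices[i::k] for i in range(k)]
    for i in range(k):
        validation = folds[i]
        training = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield training, validation


# Hypothetical dataset of 100 compounds (illustration only).
train_idx, test_idx = train_test_split_indices(100)
splits = list(k_fold(train_idx, k=10))
print(len(train_idx), len(test_idx), len(splits))
```

The key property is that the 20% test indices never appear in any cross-validation fold, so the test-set RMSE and R2 are measured on compounds that played no part in model selection.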
Affiliation(s)
- Natalia Łapińska
  - Department of Pharmaceutical Technology and Biopharmaceutics, Jagiellonian University Medical College, 30-688 Kraków, Poland; (N.Ł.); (A.P.); (A.M.)
- Jakub Szlęk
  - Department of Pharmaceutical Technology and Biopharmaceutics, Jagiellonian University Medical College, 30-688 Kraków, Poland; (N.Ł.); (A.P.); (A.M.)
  - Bioinformatics and In Silico Analysis Laboratory, Center for the Development of Therapies for Civilization and Age-Related Diseases (CDT-CARD), 8 Skawińska St., 31-066 Kraków, Poland
- Adam Pacławski
  - Department of Pharmaceutical Technology and Biopharmaceutics, Jagiellonian University Medical College, 30-688 Kraków, Poland; (N.Ł.); (A.P.); (A.M.)
- Aleksander Mendyk
  - Department of Pharmaceutical Technology and Biopharmaceutics, Jagiellonian University Medical College, 30-688 Kraków, Poland; (N.Ł.); (A.P.); (A.M.)
20
Qu X, Jiang C, Shan M, Ke W, Chen J, Zhao Q, Hu Y, Liu J, Qin LP, Cheng G. Prediction of Proteolysis-Targeting Chimeras Retention Time Using XGBoost Model Incorporated with Chromatographic Conditions. J Chem Inf Model 2025; 65:613-625. [PMID: 39786356 DOI: 10.1021/acs.jcim.4c01732] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 01/12/2025]
Abstract
Proteolysis-targeting chimeras (PROTACs) are heterobifunctional molecules that target undruggable proteins, enhance selectivity and prevent target accumulation through catalytic activity. The unique structure of PROTACs presents challenges in structural identification and drug design. Liquid chromatography (LC), combined with mass spectrometry (MS), enhances compound annotation by providing essential retention time (RT) data, especially when MS alone is insufficient. However, predicting RT for PROTACs remains challenging. To address this, we compiled the PROTAC-RT data set from literature and evaluated the performance of four machine learning algorithms, namely extreme gradient boosting (XGBoost), random forest (RF), K-nearest neighbor (KNN) and support vector machines (SVM), and one deep learning model, a fully connected neural network (FCNN), using 24 molecular fingerprints and descriptors. Through screening combinations of molecular fingerprints, descriptors and chromatographic condition descriptors (CCs), we developed an optimized XGBoost model (XGBoost + moe206 + Path + Charge + CCs) that achieved an R2 of 0.958 ± 0.027 and an RMSE of 0.934 ± 0.412. After hyperparameter tuning, the model's R2 improved to 0.963 ± 0.023, with an RMSE of 0.896 ± 0.374. The model showed strong predictive accuracy under new chromatographic separation conditions and was validated using six experimentally determined compounds. SHapley Additive exPlanations (SHAP) analysis not only highlights the advantages of XGBoost but also emphasizes the importance of CCs and molecular features, such as bond variability, van der Waals surface area, and atomic charge states. The optimized XGBoost model combines moe206, path, charge descriptors, and CCs, providing a fast and precise method for predicting the RT of PROTAC compounds, thus facilitating their annotation.
Affiliation(s)
- Xinhao Qu
  - School of Pharmaceutical Sciences, Zhejiang Chinese Medical University, Hangzhou 310053, People's Republic of China
- Chen Jiang
  - School of Pharmaceutical Sciences, Zhejiang Chinese Medical University, Hangzhou 310053, People's Republic of China
  - Universal Identification Technology (Hangzhou) Co., Ltd., Hangzhou 311199, China
- Mengyi Shan
  - School of Pharmaceutical Sciences, Zhejiang Chinese Medical University, Hangzhou 310053, People's Republic of China
- Wenhao Ke
  - School of Pharmaceutical Science and Technology, Hangzhou Institute for Advanced Study, UCAS, 1 Xiangshanzhi Road, Hangzhou 310024, China
- Jing Chen
  - School of Pharmaceutical Sciences, Zhejiang Chinese Medical University, Hangzhou 310053, People's Republic of China
  - College of Pharmaceutical Sciences, Zhejiang University, Hangzhou, Zhejiang 310058, People's Republic of China
- Qiming Zhao
  - School of Pharmaceutical Sciences, Zhejiang Chinese Medical University, Hangzhou 310053, People's Republic of China
- Youhong Hu
  - School of Pharmaceutical Science and Technology, Hangzhou Institute for Advanced Study, UCAS, 1 Xiangshanzhi Road, Hangzhou 310024, China
- Jia Liu
  - School of Pharmaceutical Science and Technology, Hangzhou Institute for Advanced Study, UCAS, 1 Xiangshanzhi Road, Hangzhou 310024, China
- Lu-Ping Qin
  - School of Pharmaceutical Sciences, Zhejiang Chinese Medical University, Hangzhou 310053, People's Republic of China
- Gang Cheng
  - School of Pharmaceutical Sciences, Zhejiang Chinese Medical University, Hangzhou 310053, People's Republic of China
21
Zhang B, Quan L, Zhang Z, Cao L, Chen Q, Peng L, Wang J, Jiang Y, Nie L, Li G, Wu T, Lyu Q. MVCL-DTI: Predicting Drug-Target Interactions Using a Multiview Contrastive Learning Model on a Heterogeneous Graph. J Chem Inf Model 2025; 65:1009-1026. [PMID: 39812134 DOI: 10.1021/acs.jcim.4c02073] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 01/16/2025]
Abstract
Accurate prediction of drug-target interactions (DTIs) is pivotal for accelerating the processes of drug discovery and drug repurposing. MVCL-DTI, a novel model leveraging heterogeneous graphs for predicting DTIs, tackles the challenge of synthesizing information from varied biological subnetworks. It integrates neighbor view, meta-path view, and diffusion view to capture semantic features and employs an attention-based contrastive learning approach, along with a multiview attention-weighted fusion module, to effectively integrate and adaptively weight the information from the different views. Tested on benchmark data sets under various conditions, including varying positive-to-negative sample ratios, hard negative sampling, and masking of known DTIs at different ratios as well as of redundant DTIs under various similarity metrics, MVCL-DTI exhibits strong, robust generalization. The model is then employed to predict novel DTIs, with a particular focus on COVID-19-related drugs, highlighting its practical applicability. Ultimately, through feature visualization and computational properties analysis, we have pinpointed critical elements, including Gene Ontology and substituent nodes, along with a proper initialization strategy, underscoring their vital role in DTI prediction tasks.
Affiliation(s)
- Bei Zhang
  - School of Computer Science and Technology, Soochow University, Jiangsu 215006, China
  - China Mobile (Suzhou) Software Technology Company Limited, Suzhou 215163, China
- Lijun Quan
  - School of Computer Science and Technology, Soochow University, Jiangsu 215006, China
  - Collaborative Innovation Center of Novel Software Technology and Industrialization, Jiangsu 210000, China
- Zhijun Zhang
  - School of Computer Science and Technology, Soochow University, Jiangsu 215006, China
- Lexin Cao
  - School of Computer Science and Technology, Soochow University, Jiangsu 215006, China
- Qiufeng Chen
  - School of Computer Science and Technology, Soochow University, Jiangsu 215006, China
- Liangchen Peng
  - School of Computer Science and Technology, Soochow University, Jiangsu 215006, China
- Junkai Wang
  - School of Computer Science and Technology, Soochow University, Jiangsu 215006, China
- Yelu Jiang
  - School of Computer Science and Technology, Soochow University, Jiangsu 215006, China
- Liangpeng Nie
  - School of Computer Science and Technology, Soochow University, Jiangsu 215006, China
- Geng Li
  - School of Computer Science and Technology, Soochow University, Jiangsu 215006, China
- Tingfang Wu
  - School of Computer Science and Technology, Soochow University, Jiangsu 215006, China
  - Collaborative Innovation Center of Novel Software Technology and Industrialization, Jiangsu 210000, China
- Qiang Lyu
  - School of Computer Science and Technology, Soochow University, Jiangsu 215006, China
  - Collaborative Innovation Center of Novel Software Technology and Industrialization, Jiangsu 210000, China
22
Zhang M, Deng Y, Zhou Q, Gao J, Zhang D, Pan X. Advancing micro-nano supramolecular assembly mechanisms of natural organic matter by machine learning for unveiling environmental geochemical processes. Environmental Science. Processes & Impacts 2025; 27:24-45. [PMID: 39745028 DOI: 10.1039/d4em00662c] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 01/23/2025]
Abstract
The nano-self-assembly of natural organic matter (NOM) profoundly influences the occurrence and fate of NOM and pollutants in large-scale complex environments. Machine learning (ML) offers a promising and robust tool for interpreting and predicting the processes, structures and environmental effects of NOM self-assembly. This review seeks to provide a tutorial-like compilation of data source determination, algorithm selection, model construction, interpretability analyses, applications and challenges for big-data-based ML aiming at elucidating NOM self-assembly mechanisms in environments. The results from advanced nano-submicron-scale spatial chemical analytical technologies are suggested as input data, as they combine information on molecular interactions with structural visualization. The existing ML algorithms need to handle multi-scale and multi-modal data, necessitating the development of new algorithmic frameworks. Interpretable supervised models are crucial owing to their strong capacity for quantifying the structure-property-effect relationships and bridging the gap between purely data-driven ML and complicated NOM assembly practice. The necessity and challenges of adopting ML to understand the geochemical behaviors and bioavailability of pollutants, as well as the elemental cycling processes in environments that result from NOM self-assembly patterns, are then discussed. Finally, a research framework integrating ML, experiments and theoretical simulation is proposed for comprehensively and efficiently understanding the environmental issues that involve NOM self-assembly.
Affiliation(s)
- Ming Zhang
- College of Geoinformatics, Zhejiang University of Technology, Hangzhou, 310014, P. R. China.
- Yihui Deng
- College of Environment, Zhejiang University of Technology, Hangzhou, 310014, P. R. China.
- Qianwei Zhou
- College of Computer Science and Technology, Zhejiang University of Technology, Hangzhou 310023, P. R. China
- Jing Gao
- College of Environment, Zhejiang University of Technology, Hangzhou, 310014, P. R. China.
- Daoyong Zhang
- College of Geoinformatics, Zhejiang University of Technology, Hangzhou, 310014, P. R. China.
- Xiangliang Pan
- College of Environment, Zhejiang University of Technology, Hangzhou, 310014, P. R. China.
|
23
|
Yamane F, Ikemura K, Kondo M, Ueno M, Okuda M. Identification of dequalinium as a potent inhibitor of human organic cation transporter 2 by machine learning based QSAR model. Sci Rep 2025; 15:2581. [PMID: 39833227 PMCID: PMC11746930 DOI: 10.1038/s41598-024-79377-0] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/07/2024] [Accepted: 11/08/2024] [Indexed: 01/22/2025] Open
Abstract
Human organic cation transporter 2 (hOCT2/SLC22A2) is a key drug transporter that facilitates the transport of endogenous and exogenous organic cations. Because hOCT2 is responsible for the development of adverse effects caused by platinum-based anti-cancer agents, drugs with OCT2 inhibitory effects may serve as prophylactic agents against the toxicity of platinum-based anti-cancer agents. In the present study, we established a machine learning-based quantitative structure-activity relationship (QSAR) model for hOCT2 inhibitors based on the public ChEMBL database and explored novel hOCT2 inhibitors among the FDA-approved drugs. Using our QSAR model, we identified 162 candidate hOCT2 inhibitors among the FDA-approved drugs registered in the DrugBank database. After manual selection and in vitro assays, we found that dequalinium, a quaternary ammonium cation antimicrobial agent, is a potent hOCT2 inhibitor (IC50 = 88.16 ± 7.14 nM). Moreover, dequalinium inhibited hOCT2-mediated transport of platinum anti-cancer agents (cisplatin and oxaliplatin) in a concentration-dependent manner. Our study is the first to demonstrate the construction of a novel machine learning-based QSAR model for hOCT2 inhibitors and identify a novel hOCT2 inhibitor among FDA-approved drugs using this model.
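The concentration-dependent inhibition described above can be illustrated with a simple Hill-type inhibition curve. This is a minimal sketch, not the authors' analysis: the abstract reports only the IC50 (88.16 nM), so the Hill slope of 1, the function name, and the example concentrations are assumptions made here for illustration.

```python
def fraction_activity_remaining(conc_nm: float, ic50_nm: float, hill_slope: float = 1.0) -> float:
    """Fraction of uninhibited transport left at inhibitor concentration conc_nm.

    Uses the standard Hill-type inhibition form 1 / (1 + (C / IC50)^h).
    The default slope h = 1 is an assumption, not a value from the abstract.
    """
    if conc_nm < 0 or ic50_nm <= 0:
        raise ValueError("concentration must be >= 0 and IC50 must be > 0")
    return 1.0 / (1.0 + (conc_nm / ic50_nm) ** hill_slope)


IC50_NM = 88.16  # mean IC50 reported in the abstract (nM)

# Illustrative concentrations (nM); these are not the concentrations used in the study.
for conc in (10.0, IC50_NM, 1000.0):
    remaining = fraction_activity_remaining(conc, IC50_NM)
    print(f"{conc:8.2f} nM -> {remaining:.1%} activity remaining")
```

By construction, the curve passes through 50 % remaining activity at the IC50, which is what the reported value means.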
Affiliation(s)
- Fumihiro Yamane
- Department of Hospital Pharmacy, School of Pharmaceutical Sciences, Osaka University, Suita, Osaka, 565-0871, Japan
- Kenji Ikemura
- Department of Pharmacy, Osaka University Hospital, 2-15 Yamadaoka, Suita, Osaka, 565-0871, Japan.
- Masayoshi Kondo
- Department of Hospital Pharmacy, School of Pharmaceutical Sciences, Osaka University, Suita, Osaka, 565-0871, Japan
- Manami Ueno
- Department of Hospital Pharmacy, School of Pharmaceutical Sciences, Osaka University, Suita, Osaka, 565-0871, Japan
- Masahiro Okuda
- Department of Pharmacy, Osaka University Hospital, 2-15 Yamadaoka, Suita, Osaka, 565-0871, Japan
|
24
|
Nakajima H, Murata C, Noto N, Saito S. Database Construction for the Virtual Screening of the Ruthenium-Catalyzed Hydrogenation of Ketones. J Org Chem 2025; 90:1054-1060. [PMID: 39762115 DOI: 10.1021/acs.joc.4c02347] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 01/18/2025]
Abstract
During the recent development of machine-learning (ML) methods for organic synthesis, the value of "failed experiments" has increasingly been acknowledged. Accordingly, we have developed an exhaustive database comprising 300 entries of experimental data obtained by performing ruthenium-catalyzed hydrogenation reactions using 10 ketones as substrates and 30 phosphine ligands. After evaluating the predictive performance of ML models using the constructed database, we conducted a virtual screening of commercially available phosphine ligands. For the virtual screening, we utilized several models, such as histogram-based gradient boosting and Ridge regression, combined with the Mordred descriptors and MACCSKeys, respectively. The disclosed approach resulted in the identification of high-performance phosphine ligands, and the rationale behind the predictions in the virtual screening was analyzed using SHAP.
Affiliation(s)
- Haruno Nakajima
- Graduate School of Science, Nagoya University, Nagoya 464-8602, Japan
- Chihaya Murata
- Graduate School of Science, Nagoya University, Nagoya 464-8602, Japan
- Naoki Noto
- Integrated Research Consortium on Chemical Sciences (IRCCS), Nagoya University, Nagoya 464-8602, Japan
- Susumu Saito
- Graduate School of Science, Nagoya University, Nagoya 464-8602, Japan
- Integrated Research Consortium on Chemical Sciences (IRCCS), Nagoya University, Nagoya 464-8602, Japan
|
25
|
Haas BC, Hardy MA, Sowndarya S V S, Adams K, Coley CW, Paton RS, Sigman MS. Rapid prediction of conformationally-dependent DFT-level descriptors using graph neural networks for carboxylic acids and alkyl amines. DIGITAL DISCOVERY 2025; 4:222-233. [PMID: 39664609 PMCID: PMC11626426 DOI: 10.1039/d4dd00284a] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/30/2024] [Accepted: 11/27/2024] [Indexed: 12/13/2024]
Abstract
Data-driven reaction discovery and development is a growing field that relies on the use of molecular descriptors to capture key information about substrates, ligands, and targets. Broad adaptation of this strategy is hindered by the associated computational cost of descriptor calculation, especially when considering conformational flexibility. Descriptor libraries can be precomputed agnostic of application to reduce the computational burden of data-driven reaction development. However, as one often applies these models to evaluate novel hypothetical structures, it would be ideal to predict the descriptors of compounds on-the-fly. Herein, we report DFT-level descriptor libraries for conformational ensembles of 8528 carboxylic acids and 8172 alkyl amines towards this goal. Employing 2D and 3D graph neural network architectures trained on these libraries culminated in the development of predictive models for molecule-level descriptors, as well as the bond- and atom-level descriptors for the conserved reactive site (carboxylic acid or amine). The predictions were confirmed to be robust for an external validation set of medicinally-relevant carboxylic acids and alkyl amines. Additionally, a retrospective study correlating the rate of amide coupling reactions demonstrated the suitability of the predicted DFT-level descriptors for downstream applications. Ultimately, these models enable high-fidelity predictions for a vast number of potential substrates, greatly increasing accessibility to the field of data-driven reaction development.
Affiliation(s)
- Brittany C Haas
- Department of Chemistry, University of Utah Salt Lake City Utah 84112 USA
- Melissa A Hardy
- Department of Chemistry, University of Utah Salt Lake City Utah 84112 USA
- Shree Sowndarya S V
- Department of Chemistry, Colorado State University Fort Collins Colorado 80523 USA
- Keir Adams
- Department of Chemical Engineering, Massachusetts Institute of Technology Cambridge Massachusetts 02139 USA
- Connor W Coley
- Department of Chemical Engineering, Massachusetts Institute of Technology Cambridge Massachusetts 02139 USA
- Department of Electrical Engineering and Computer Science, Massachusetts Institute of Technology Cambridge Massachusetts 02139 USA
- Robert S Paton
- Department of Chemistry, Colorado State University Fort Collins Colorado 80523 USA
- Matthew S Sigman
- Department of Chemistry, University of Utah Salt Lake City Utah 84112 USA
|
26
|
Bornschlegl AJ, Duchstein P, Wu J, Rocha-Ortiz JS, Caicedo-Reina M, Ortiz A, Insuasty B, Zahn D, Lüer L, Brabec CJ. An Automated Workflow to Discover the Structure-Stability Relations for Radiation Hard Molecular Semiconductors. J Am Chem Soc 2025; 147:1957-1967. [PMID: 39752396 DOI: 10.1021/jacs.4c14824] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 01/16/2025]
Abstract
Emerging photovoltaics for outer space applications are one of the many examples where radiation hard molecular semiconductors are essential. However, due to a lack of general design principles, their resilience against extra-terrestrial high-energy radiation cannot currently be predicted. In this work, the discovery of radiation hard materials is accelerated by combining the strengths of high-throughput experimentation, lab automation and machine learning. In this way, a large material library of more than 130 organic hole transport materials is automatically processed, degraded, and measured. The materials are degraded under ultraviolet-C (UVC) light in a nitrogen atmosphere, serving as the conditions for electromagnetic radiation hardness tests. A value closely related to the differential quantum yield for photodegradation is extracted from the evolution of the UV-visible (UV-vis) spectra over time and used as a stability target. Following this procedure, a stability ranking spanning over 3 orders of magnitude was obtained. By combining Gaussian Process Regression based on predictors from structural fingerprints with manual filtering of the materials by features, structure-stability relations for UVC stable materials could be found: fused aromatic ring clusters are beneficial, whereas thiophene, methoxy and vinylene groups are detrimental. By comparing the UV-vis spectra of the degraded material in film and solution, bond cleavage could be identified as the leading degradation mechanism. Even though UVC light can in principle break most organic bonds, the stable materials are able to distribute and dissipate the energy well enough that the chemical structures remain stable. The established predictive model quantifies the effect of specific molecular features on UVC stability, allowing chemists to consider UVC stability in their molecular design strategy. In the future, a larger data set will allow the inverse design of molecular semiconductors that show high performance and radiation hardness at the same time.
Affiliation(s)
- Andreas J Bornschlegl
- Institute of Materials for Electronics and Energy Technology (i-MEET), Department of Materials Science and Engineering, Friedrich-Alexander-Universität Erlangen-Nürnberg (FAU), Martensstraße 7, 91058 Erlangen, Germany
- Patrick Duchstein
- Chair for Theoretical Chemistry/Computer Chemistry Center (CCC), Department of Chemistry and Pharmacy, Friedrich-Alexander-Universität Erlangen-Nürnberg (FAU), Nägelsbachstraße 25, 91052 Erlangen, Germany
- Jianchang Wu
- Institute of Materials for Electronics and Energy Technology (i-MEET), Department of Materials Science and Engineering, Friedrich-Alexander-Universität Erlangen-Nürnberg (FAU), Martensstraße 7, 91058 Erlangen, Germany
- Helmholtz-Institute Erlangen-Nürnberg for Renewable Energy (HI ERN), Forschungszentrum Jülich GmbH, Immerwahrstraße 2, 91058 Erlangen, Germany
- Juan S Rocha-Ortiz
- Institute of Materials for Electronics and Energy Technology (i-MEET), Department of Materials Science and Engineering, Friedrich-Alexander-Universität Erlangen-Nürnberg (FAU), Martensstraße 7, 91058 Erlangen, Germany
- Helmholtz-Institute Erlangen-Nürnberg for Renewable Energy (HI ERN), Forschungszentrum Jülich GmbH, Immerwahrstraße 2, 91058 Erlangen, Germany
- Mauricio Caicedo-Reina
- Heterocyclic Compounds Research Group, Department of Chemistry, Universidad del Valle, Calle 13 #100-00, 25360 Cali, Colombia
- Alejandro Ortiz
- Heterocyclic Compounds Research Group, Department of Chemistry, Universidad del Valle, Calle 13 #100-00, 25360 Cali, Colombia
- Center for Research and Innovation in Bioinformatics and Photonics-CIBioFi, Calle 13 #100-00, Edificio E-20, No. 1069, 25360 Cali, Colombia
- Braulio Insuasty
- Heterocyclic Compounds Research Group, Department of Chemistry, Universidad del Valle, Calle 13 #100-00, 25360 Cali, Colombia
- Center for Research and Innovation in Bioinformatics and Photonics-CIBioFi, Calle 13 #100-00, Edificio E-20, No. 1069, 25360 Cali, Colombia
- Dirk Zahn
- Chair for Theoretical Chemistry/Computer Chemistry Center (CCC), Department of Chemistry and Pharmacy, Friedrich-Alexander-Universität Erlangen-Nürnberg (FAU), Nägelsbachstraße 25, 91052 Erlangen, Germany
- Larry Lüer
- Institute of Materials for Electronics and Energy Technology (i-MEET), Department of Materials Science and Engineering, Friedrich-Alexander-Universität Erlangen-Nürnberg (FAU), Martensstraße 7, 91058 Erlangen, Germany
- Helmholtz-Institute Erlangen-Nürnberg for Renewable Energy (HI ERN), Forschungszentrum Jülich GmbH, Immerwahrstraße 2, 91058 Erlangen, Germany
- Christoph J Brabec
- Institute of Materials for Electronics and Energy Technology (i-MEET), Department of Materials Science and Engineering, Friedrich-Alexander-Universität Erlangen-Nürnberg (FAU), Martensstraße 7, 91058 Erlangen, Germany
- Helmholtz-Institute Erlangen-Nürnberg for Renewable Energy (HI ERN), Forschungszentrum Jülich GmbH, Immerwahrstraße 2, 91058 Erlangen, Germany
|
27
|
Zhu M, Xiao Z, Zhang T, Lu G. Construction of interpretable ensemble learning models for predicting bioaccumulation parameters of organic chemicals in fish. JOURNAL OF HAZARDOUS MATERIALS 2025; 482:136606. [PMID: 39579709 DOI: 10.1016/j.jhazmat.2024.136606] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/10/2024] [Revised: 11/14/2024] [Accepted: 11/19/2024] [Indexed: 11/25/2024]
Abstract
Accurate prediction of bioaccumulation parameters is essential for assessing exposure, hazards, and risks of chemicals. However, the majority of prediction models for bioaccumulation parameters are individual models based on a single algorithm and lack model interpretation, resulting in unsatisfactory prediction accuracy due to inherent constraints of the algorithm and weak interpretability. Ensemble learning (EL), which combines multiple algorithms, coupled with the SHapley Additive exPlanation (SHAP) method, may overcome these limitations. Herein, EL models were constructed for three bioaccumulation parameters using datasets covering 2496 chemicals. The EL models demonstrated superior prediction accuracy compared to both individual models developed in this study and those from previous research, achieving a coefficient of determination of up to 0.861 on the validation sets. Applicability domains were characterized using a structure-activity landscape-based (abbreviated as ADSAL) methodology. The optimal EL models, together with the ADSAL, were successfully used to predict bioaccumulation parameters for 4374 chemicals included in the Inventory of Existing Chemical Substances of China. Model interpretation using the SHAP method offered insight into key features influencing bioaccumulation potential, including hydrophobicity, water solubility, polarizability, ionization potential, weight, and volume of molecules. Overall, the study provides data and models to support the sound management and risk assessment of chemicals.
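The idea that an ensemble can outperform its individual members, as measured by the coefficient of determination, can be sketched with a toy example. This is an illustration only: the abstract does not say how the EL models combine their base learners, so simple prediction averaging is an assumption here, and all numbers are made up.

```python
from statistics import fmean


def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_y = fmean(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def average_ensemble(*prediction_sets):
    """Combine base-model predictions by averaging them per sample (assumed strategy)."""
    return [fmean(preds) for preds in zip(*prediction_sets)]


# Made-up target values and base-model predictions, for illustration only.
y_true = [1.0, 2.0, 3.0, 4.0]
model_a = [1.5, 2.5, 3.5, 4.5]  # hypothetical model that overshoots
model_b = [0.5, 1.5, 2.5, 3.5]  # hypothetical model that undershoots
ensemble = average_ensemble(model_a, model_b)

for name, preds in (("model A", model_a), ("model B", model_b), ("ensemble", ensemble)):
    print(f"{name}: R2 = {r_squared(y_true, preds):.3f}")
```

In this contrived case the two base models err in opposite directions, so averaging cancels their errors; real base learners are correlated and the gain is smaller.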
Affiliation(s)
- Minghua Zhu
- Key Laboratory of Integrated Regulation and Resources Development of Shallow Lakes of Ministry of Education, Hohai University, Nanjing 210098, China; College of Environment, Hohai University, Nanjing 210098, China
- Zijun Xiao
- Key Laboratory of Industrial Ecology and Environmental Engineering (Ministry of Education), Dalian Key Laboratory on Chemicals Risk Control and Pollution Prevention Technology, School of Environmental Science and Technology, Dalian University of Technology, Dalian 116024, China
- Tao Zhang
- State Key Laboratory of Urban Water Resources and Environment, School of Environment, Harbin Institute of Technology, Harbin 150090, China
- Guanghua Lu
- Key Laboratory of Integrated Regulation and Resources Development of Shallow Lakes of Ministry of Education, Hohai University, Nanjing 210098, China; College of Environment, Hohai University, Nanjing 210098, China.
|
28
|
Baei B, Askari P, Askari FS, Kiani SJ, Mohebbi A. Pharmacophore modeling and QSAR analysis of anti-HBV flavonols. PLoS One 2025; 20:e0316765. [PMID: 39804828 PMCID: PMC11730388 DOI: 10.1371/journal.pone.0316765] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/11/2024] [Accepted: 12/15/2024] [Indexed: 01/16/2025] Open
Abstract
Due to its global burden, targeting hepatitis B virus (HBV) infection in humans is crucial. Herbal medicine has long been significant, with flavonoids demonstrating promising results. Hence, the present study aimed to establish a way of identifying flavonoids with anti-HBV activities. Flavonoid structures with anti-HBV activities were retrieved. A flavonol-based pharmacophore model was established using LigandScout v4.4. Screening was performed using the PharmIt server. A QSAR equation was developed and validated with independent sets of compounds. The applicability domain (AD) was defined using Euclidean distance calculations for model validation. The best model, consisting of 57 features, was generated. High-throughput screening (HTS) using the flavonol-based model resulted in 509 unique hits. The model's accuracy was further validated using a set of FDA-approved chemicals, demonstrating a sensitivity of 71% and a specificity of 100%. Additionally, the QSAR model with two predictors, x4a and qed, exhibited solid predictive performance, with an adjusted R2 of 0.85 and a Q2 of 0.90. PCA revealed essential patterns and relationships within the dataset, with the first two components explaining nearly 98% of the total variance. Current HBV therapies tend to fail to provide a complete cure, emphasizing the need for new therapies. This study highlights flavonols as potential anti-HBV medicines, presenting a supplementary option for existing therapy. The QSAR model has been validated with two separate chemical sets, guaranteeing its reproducibility and usefulness for other flavonols by utilizing the predictive characteristics of X4A and qed. These results provide new possibilities for discovering future anti-HBV drugs by integrating modeling and experimental research.
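A distance-based applicability domain of the kind mentioned above can be sketched as follows. The abstract says only that Euclidean distances were used; the centroid-based formulation, the mean-plus-k-standard-deviations threshold, the function names, and the two-descriptor toy data are all assumptions made for illustration.

```python
import math
from statistics import fmean, pstdev


def centroid(rows):
    """Column-wise mean of the training descriptor matrix."""
    return [fmean(col) for col in zip(*rows)]


def fit_domain(train_rows, k=3.0):
    """Return (centroid, threshold), where threshold = mean + k * std of the
    training distances to the centroid. The choice of k is an assumption."""
    center = centroid(train_rows)
    distances = [math.dist(row, center) for row in train_rows]
    return center, fmean(distances) + k * pstdev(distances)


def in_domain(query, center, threshold):
    """A query compound is inside the domain if it lies within the threshold distance."""
    return math.dist(query, center) <= threshold


# Made-up, already-scaled values for two descriptors per training compound.
train = [(0.0, 0.0), (2.0, 0.0), (0.0, 2.0), (2.0, 2.0)]
center, threshold = fit_domain(train)

print(in_domain((1.0, 1.0), center, threshold))  # near the training data
print(in_domain((5.0, 5.0), center, threshold))  # far from the training data
```

In practice the descriptors would be scaled to comparable ranges first, since Euclidean distance is otherwise dominated by the descriptor with the largest numeric spread.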
Affiliation(s)
- Basireh Baei
- Infectious Disease Research Center, Golestan University of Medical Sciences, Gorgan, Iran
- Parnia Askari
- Department of Life and Science, York University, Toronto, Ontario, Canada
- Seyed Jalal Kiani
- Department of Virology, School of Medicine, Iran University of Medical Sciences, Tehran, Iran
- Alireza Mohebbi
- Vista Aria Rena Gene Inc., Gorgan, Golestan, Iran
- Department of Virology, School of Medicine, Iran University of Medical Sciences, Tehran, Iran
|
29
|
Bediaga-Bañeres H, Moreno-Benítez I, Arrasate S, Pérez-Álvarez L, Halder AK, Cordeiro MNDS, González-Díaz H, Vilas-Vilela JL. Artificial Intelligence-Driven Modeling for Hydrogel Three-Dimensional Printing: Computational and Experimental Cases of Study. Polymers (Basel) 2025; 17:121. [PMID: 39795524 PMCID: PMC11723248 DOI: 10.3390/polym17010121] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/18/2024] [Revised: 12/26/2024] [Accepted: 12/29/2024] [Indexed: 01/13/2025] Open
Abstract
Determining the values of various properties for new bio-inks for 3D printing is a very important task in the design of new materials. For this purpose, a large number of experimental works have been consulted, and a database with more than 1200 bioprinting tests has been created. These tests cover different combinations of conditions in terms of print pressure, temperature, and needle values, for example. These data are difficult to deal with in terms of determining combinations of conditions to optimize the tests and analyze new options. The best model demonstrated a specificity (Sp) of 88.4% and a sensitivity (Sn) of 86.2% in the training series while achieving an Sp of 85.9% and an Sn of 80.3% in the external validation series. This model utilizes operators based on perturbation theory to analyze the complexity of the data. For comparative purposes, neural networks have been used, and very similar results have been obtained. The developed tool could easily be applied to predict the properties of bioprinting assays in silico. These findings could significantly improve the efficiency and accuracy of predictive models in bioprinting without resorting to trial-and-error tests, thereby saving time and funds. Ultimately, this tool may help pave the way for advances in personalized medicine and tissue engineering.
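For readers unfamiliar with the Sn and Sp figures quoted above, the following minimal sketch shows how sensitivity and specificity are computed from binary labels. The labels are made up and the function name is hypothetical; nothing here reproduces the study's data or model.

```python
def sensitivity_specificity(y_true, y_pred):
    """Return (sensitivity, specificity) for binary labels coded as 1 (positive) and 0 (negative)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("need at least one positive and one negative example")
    sensitivity = tp / (tp + fn)  # share of true positives that were detected
    specificity = tn / (tn + fp)  # share of true negatives that were rejected
    return sensitivity, specificity


# Made-up outcomes for eight hypothetical printing tests.
observed = [1, 1, 1, 1, 0, 0, 0, 0]
predicted = [1, 1, 1, 0, 0, 0, 0, 1]

sn, sp = sensitivity_specificity(observed, predicted)
print(f"Sn = {sn:.1%}, Sp = {sp:.1%}")
```

Reporting both values on the training and external validation series, as the abstract does, guards against a model that looks accurate only because one class dominates.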
Affiliation(s)
- Harbil Bediaga-Bañeres
- Department of Physical Chemistry, University of Basque Country UPV/EHU, 48940 Leioa, Spain; (H.B.-B.); (L.P.-Á.)
- Isabel Moreno-Benítez
- Department of Organic and Inorganic Chemistry, University of Basque Country UPV/EHU, 48940 Leioa, Spain; (S.A.); (H.G.-D.)
- Sonia Arrasate
- Department of Organic and Inorganic Chemistry, University of Basque Country UPV/EHU, 48940 Leioa, Spain; (S.A.); (H.G.-D.)
- Leyre Pérez-Álvarez
- Department of Physical Chemistry, University of Basque Country UPV/EHU, 48940 Leioa, Spain; (H.B.-B.); (L.P.-Á.)
- BCMaterials, Basque Center for Materials, Applications and Nanostructures, UPV/EHU Science Park, 48940 Leioa, Spain
- Amit K. Halder
- LAQV-REQUIMTE, Department of Chemistry and Biochemistry, Faculty of Sciences, University of Porto, 4169-007 Porto, Portugal; (A.K.H.); (M.N.D.S.C.)
- Dr. B. C. Roy College of Pharmacy and Allied Health Sciences, Durgapur 713206, India
- M. Natalia D. S. Cordeiro
- LAQV-REQUIMTE, Department of Chemistry and Biochemistry, Faculty of Sciences, University of Porto, 4169-007 Porto, Portugal; (A.K.H.); (M.N.D.S.C.)
- Humberto González-Díaz
- Department of Organic and Inorganic Chemistry, University of Basque Country UPV/EHU, 48940 Leioa, Spain; (S.A.); (H.G.-D.)
- Basque Center for Biophysics, CSIC-UPV/EHU, 48940 Leioa, Spain
- IKERBASQUE, Basque Foundation for Science, 48011 Bilbao, Spain
- José Luis Vilas-Vilela
- Department of Physical Chemistry, University of Basque Country UPV/EHU, 48940 Leioa, Spain; (H.B.-B.); (L.P.-Á.)
- BCMaterials, Basque Center for Materials, Applications and Nanostructures, UPV/EHU Science Park, 48940 Leioa, Spain
|
30
|
Murray JD, Bennett-Lenane H, O’Dwyer PJ, Griffin BT. Establishing a Pharmacoinformatics Repository of Approved Medicines: A Database to Support Drug Product Development. Mol Pharm 2025; 22:408-423. [PMID: 39705554 PMCID: PMC11707741 DOI: 10.1021/acs.molpharmaceut.4c00991] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/31/2024] [Revised: 12/09/2024] [Accepted: 12/10/2024] [Indexed: 12/22/2024]
Abstract
Advanced predictive modeling approaches have harnessed data to fuel important innovations at all stages of drug development. However, the need for a machine-readable drug product library which consolidates many aspects of formulation design and performance remains largely unmet. This study presents a scripted, reproducible approach to database curation and explores its potential to streamline oral medicine development. The Product Information files for all centrally authorized drug products containing a small molecule active ingredient were retrieved programmatically from the European Medicines Agency Web site. Text processing isolated relevant information, including the maximum clinical dose, dosage form, route of administration, excipients, and pharmacokinetic performance. Chemical and bioactivity data were integrated through automated linking to external curated databases. The capability of this database to inform oral medicine development was assessed in the context of drug-likeness evaluation, excipient selection, and prediction of oral fraction absorbed. Existing filters of drug-likeness, such as the Rule of Five, were found to poorly capture the chemical space of marketed oral drug products. Association rule learning identified frequent patterns in tablet formulation compositions that can be used to establish excipient combinations that have seen clinical success. Binary prediction models of oral fraction absorbed constructed exclusively from regulatory data achieved acceptable performance (balanced accuracy on the test set = 0.725), demonstrating its modelability and potential for use during early stage molecule prioritization tasks. This study illustrates the impact of highly linked drug product data in accelerating clinical translation and underlines the ongoing need for accuracy and completeness of data reported in the regulatory datasphere.
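The Rule of Five filter mentioned above is simple enough to sketch directly. The thresholds below are the standard Lipinski criteria (molecular weight at most 500 Da, logP at most 5, at most 5 hydrogen-bond donors, at most 10 hydrogen-bond acceptors); the class and function names, the one-violation tolerance as a default, and the descriptor values are assumptions for illustration and do not come from the study's database.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Descriptors:
    mol_weight: float       # molecular weight in Da
    logp: float             # octanol-water partition coefficient (log scale)
    h_bond_donors: int
    h_bond_acceptors: int


def ro5_violations(d: Descriptors) -> int:
    """Count how many of the four Rule of Five criteria a molecule violates."""
    return sum([
        d.mol_weight > 500,
        d.logp > 5,
        d.h_bond_donors > 5,
        d.h_bond_acceptors > 10,
    ])


def passes_ro5(d: Descriptors, max_violations: int = 1) -> bool:
    """Common reading of the rule: at most one violation is tolerated."""
    return ro5_violations(d) <= max_violations


# Hypothetical descriptor sets, not taken from any real product.
small_molecule = Descriptors(mol_weight=300.0, logp=2.0, h_bond_donors=2, h_bond_acceptors=5)
large_molecule = Descriptors(mol_weight=900.0, logp=6.5, h_bond_donors=6, h_bond_acceptors=12)

for label, d in (("small", small_molecule), ("large", large_molecule)):
    print(label, ro5_violations(d), passes_ro5(d))
```

A filter of this form is a hard cut on four descriptors, which is one reason it can miss marketed oral products that sit outside those bounds, as the abstract reports.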
Affiliation(s)
- Jack D. Murray
- School of Pharmacy, University College Cork, College Road, Cork T12 K8AF, Ireland
- Patrick J. O’Dwyer
- School of Pharmacy, University College Cork, College Road, Cork T12 K8AF, Ireland
- Brendan T. Griffin
- School of Pharmacy, University College Cork, College Road, Cork T12 K8AF, Ireland
|
31
|
Haas BC, Kalyani D, Sigman MS. Applying statistical modeling strategies to sparse datasets in synthetic chemistry. SCIENCE ADVANCES 2025; 11:eadt3013. [PMID: 39742471 DOI: 10.1126/sciadv.adt3013] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/20/2024] [Accepted: 11/20/2024] [Indexed: 01/03/2025]
Abstract
The application of statistical modeling in organic chemistry is emerging as a standard practice for probing structure-activity relationships and as a predictive tool for many optimization objectives. This review is intended as a tutorial for those entering the area of statistical modeling in chemistry. We provide case studies to highlight the considerations and approaches that can be used to successfully analyze datasets in low data regimes, a common situation given the experimental demands of organic chemistry. Statistical modeling hinges on the data (what is being modeled), descriptors (how data are represented), and algorithms (how data are modeled). Herein, we focus on how various reaction outputs (e.g., yield, rate, selectivity, solubility, stability, and turnover number) and data structures (e.g., binned, heavily skewed, and distributed) influence the choice of algorithm used for constructing predictive and chemically insightful statistical models.
Affiliation(s)
- Brittany C Haas
- Department of Chemistry, University of Utah, Salt Lake City, UT 84112, USA
- Matthew S Sigman
- Department of Chemistry, University of Utah, Salt Lake City, UT 84112, USA
|
32
|
Kim H, Shim H, Ranganath A, He S, Stevenson G, Allen JE. Protein-ligand binding affinity prediction using multi-instance learning with docking structures. Front Pharmacol 2025; 15:1518875. [PMID: 39830331 PMCID: PMC11738626 DOI: 10.3389/fphar.2024.1518875] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/29/2024] [Accepted: 11/26/2024] [Indexed: 01/22/2025] Open
Abstract
Introduction: Recent advances in 3D structure-based deep learning approaches demonstrate improved accuracy in predicting protein-ligand binding affinity in drug discovery. These methods complement physics-based computational modeling such as molecular docking for virtual high-throughput screening. Despite recent advances and improved predictive performance, most methods in this category primarily rely on utilizing co-crystal complex structures and experimentally measured binding affinities as both input and output data for model training. Nevertheless, co-crystal complex structures are not readily available and the inaccurate predicted structures from molecular docking can degrade the accuracy of the machine learning methods. Methods: We introduce a novel structure-based inference method utilizing multiple molecular docking poses for each complex entity. Our proposed method employs multi-instance learning with an attention network to predict binding affinity from a collection of docking poses. Results: We validate our method using multiple datasets, including PDBbind and compounds targeting the main protease of SARS-CoV-2. The results demonstrate that our method leveraging docking poses is competitive with other state-of-the-art inference models that depend on co-crystal structures. Discussion: This method offers binding affinity prediction without requiring co-crystal structures, thereby increasing its applicability to protein targets lacking such data.
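The core of attention-based multi-instance learning, pooling a bag of docking poses into a single representation, can be sketched in a few lines. This is a simplified illustration, not the authors' network: in a trained model the attention scores are produced by learned layers from each pose embedding, whereas here both the embeddings and the scores are made-up inputs, and the function names are hypothetical.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of raw attention scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_pool(pose_embeddings, attention_scores):
    """Weighted sum of pose embeddings, with weights given by softmax(attention_scores).

    Returns (pooled_embedding, weights). One embedding per docking pose is assumed.
    """
    weights = softmax(attention_scores)
    dim = len(pose_embeddings[0])
    pooled = [
        sum(w * emb[i] for w, emb in zip(weights, pose_embeddings))
        for i in range(dim)
    ]
    return pooled, weights


# Two made-up pose embeddings for one protein-ligand complex.
poses = [[1.0, 0.0], [0.0, 1.0]]

# Equal scores: every pose contributes equally, so the result is the plain mean.
pooled_equal, weights_equal = attention_pool(poses, [0.0, 0.0])

# A much higher score for the first pose makes it dominate the pooled vector.
pooled_skewed, weights_skewed = attention_pool(poses, [10.0, 0.0])

print(pooled_equal, weights_equal)
print(pooled_skewed, weights_skewed)
```

The pooled vector would then feed a regression head that outputs the predicted affinity; the attention weights let the model down-weight poses that look implausible instead of trusting any single docking result.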
Affiliation(s)
- Hyojin Kim
- Center for Applied Scientific Computing, Lawrence Livermore National Laboratory, Livermore, CA, United States
- Heesung Shim
- Biosciences and Biotechnology Division, Lawrence Livermore National Laboratory, Livermore, CA, United States
- Aditya Ranganath
- Center for Applied Scientific Computing, Lawrence Livermore National Laboratory, Livermore, CA, United States
- Stewart He
- Global Security Computing Applications Division, Lawrence Livermore National Laboratory, Livermore, CA, United States
- Garrett Stevenson
- Computational Engineering Division, Lawrence Livermore National Laboratory, Livermore, CA, United States
- Jonathan E. Allen
- Global Security Computing Applications Division, Lawrence Livermore National Laboratory, Livermore, CA, United States
|
33
|
Saylor DM, Elder RM, Duelge K, Ranasinghe Arachchige NPR, Simon DD, Wickramasekara S, Young JA. Inter-laboratory study for extraction testing of medical devices. J Pharm Biomed Anal 2025; 252:116496. [PMID: 39405789 DOI: 10.1016/j.jpba.2024.116496] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/29/2024] [Revised: 09/30/2024] [Accepted: 10/01/2024] [Indexed: 11/07/2024]
Abstract
Biocompatibility evaluation of medical devices often relies on chemical testing according to ISO 10993-18 as a critical component for consideration. However, the precision associated with these non-targeted chemical characterization assessments has not been well established. Therefore, we have conducted a study to characterize intra-laboratory (repeatability) and inter-laboratory (reproducibility) variability associated with chemical testing of extractables from polymeric materials. To accomplish this, this study focused on two polymers, each with nine chemicals that were intentionally compounded into the materials. Eight different laboratories performed extraction testing in two solvents and subsequently characterized the extracts using gas chromatography and liquid chromatography methods. Analysis of the resulting data revealed the central 90 % range for the repeatability and reproducibility relative standard deviations are (0.09, 0.22) and (0.30, 0.85), respectively, for the participating laboratory methods. This finding implies that if the same sample was tested by two different laboratories using the same extraction conditions, there is 95 % confidence for 95 % of systems that the test results could exhibit differences up to 240 %. While the study was not designed to evaluate the relative impact of specific underlying factors that may contribute to variability in quantitation, the data obtained suggest the variability associated with analytical method alone is a substantial contribution to the overall variability. The relatively large reproducibility limits we observed may have significant implications where variability in extraction measurements can impact aspects of biocompatibility risk evaluation, such as exposure dose estimation and chemical equivalence assessments.
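One plausible way to connect the reproducibility RSD range to the 240 % figure is the conventional reproducibility limit, which bounds the difference between two single results with about 95 % probability as 1.96 x sqrt(2) x s_R (roughly 2.8 x s_R). The sketch below assumes that convention and applies it to the upper end of the reported central 90 % range; the abstract does not state the exact calculation, so treat this as an interpretation, and the function name is hypothetical.

```python
import math


def reproducibility_limit(relative_sd: float, z: float = 1.96) -> float:
    """Relative difference between two single results that is exceeded only about
    5 % of the time, assuming normally distributed results with the given RSD.

    The factor z * sqrt(2) arises because the difference of two independent
    results has a standard deviation of sqrt(2) times the single-result value.
    """
    return z * math.sqrt(2) * relative_sd


# Upper end of the central 90 % range of reproducibility RSDs from the abstract.
UPPER_REPRODUCIBILITY_RSD = 0.85

limit = reproducibility_limit(UPPER_REPRODUCIBILITY_RSD)
print(f"reproducibility limit: {limit:.0%} of the measured value")
```

Under these assumptions the limit comes out close to the 240 % quoted in the abstract, and the upper end of a central 90 % range corresponds to the 95th percentile, which matches the "95 % of systems" wording.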
Affiliation(s)
- David M Saylor
  - Center for Devices and Radiological Health, FDA, Silver Spring, MD 20993, United States
- Robert M Elder
  - Center for Devices and Radiological Health, FDA, Silver Spring, MD 20993, United States
- Kaleb Duelge
  - Center for Devices and Radiological Health, FDA, Silver Spring, MD 20993, United States
- David D Simon
  - Center for Devices and Radiological Health, FDA, Silver Spring, MD 20993, United States
- Joshua A Young
  - Center for Devices and Radiological Health, FDA, Silver Spring, MD 20993, United States

34
Pang X, He X, Yang Y, Wang L, Sun Y, Cao H, Liang Y. NeuTox 2.0: A hybrid deep learning architecture for screening potential neurotoxicity of chemicals based on multimodal feature fusion. ENVIRONMENT INTERNATIONAL 2025; 195:109244. [PMID: 39742830 DOI: 10.1016/j.envint.2024.109244] [Received: 09/03/2024] [Revised: 12/09/2024] [Accepted: 12/25/2024] [Indexed: 01/04/2025]
Abstract
Chemically induced neurotoxicity is a critical aspect of chemical safety assessment. Traditional and costly experimental methods call for the development of high-throughput virtual screening. However, the small datasets of neurotoxicity have limited the application of advanced deep learning techniques. The current study developed a hybrid deep learning architecture, NeuTox 2.0, through multimodal feature fusion for enhanced prediction accuracy and generalization ability. We incorporated transfer learning based on self-supervised learning, graph neural networks, and molecular fingerprints/descriptors. Four datasets were used to profile neurotoxicity; these were related to blood-brain barrier permeability, neuronal cytotoxicity, microelectrode array-based neural activity, and mammalian neurotoxicity. Comprehensive performance evaluations demonstrated that NeuTox 2.0 has relatively higher predictive capability across all statistical metrics. Specifically, NeuTox 2.0 exhibits remarkable performance in three of the four datasets. In the BBB dataset, although it does not outperform the PaDEL descriptor model, its performance closely approximates that of the top single-modal model. The ablation experiments indicated that NeuTox 2.0 can learn the deeper structural differences of molecules from various feature extractions and capture complex interactions and mapping relationships between various modalities, thereby improving performance for neurotoxicity prediction. Evaluations of anti-noise ability indicated that NeuTox 2.0 has excellent noise resistance relative to traditional machine learning. We applied the NeuTox 2.0 model to predict the neurotoxicity of 315,790 compounds in the REACH database. The results showed that 701 compounds exhibited potential neurotoxicity in the four neurotoxicity-related predictions. In conclusion, NeuTox 2.0 can be used as an efficient tool for early neurotoxicity screening of environmental chemicals.
Affiliation(s)
- Xudi Pang
  - Hubei Key Laboratory of Environmental and Health Effects of Persistent Toxic Substances, School of Environment and Health, Jianghan University, Wuhan 430056, China
- Xuejun He
  - Hubei Key Laboratory of Environmental and Health Effects of Persistent Toxic Substances, School of Environment and Health, Jianghan University, Wuhan 430056, China
- Ying Yang
  - Hubei Key Laboratory of Environmental and Health Effects of Persistent Toxic Substances, School of Environment and Health, Jianghan University, Wuhan 430056, China
- Ling Wang
  - Hubei Key Laboratory of Environmental and Health Effects of Persistent Toxic Substances, School of Environment and Health, Jianghan University, Wuhan 430056, China
- Yuzhen Sun
  - Hubei Key Laboratory of Environmental and Health Effects of Persistent Toxic Substances, School of Environment and Health, Jianghan University, Wuhan 430056, China
- Huiming Cao
  - Hubei Key Laboratory of Environmental and Health Effects of Persistent Toxic Substances, School of Environment and Health, Jianghan University, Wuhan 430056, China
- Yong Liang
  - Hubei Key Laboratory of Environmental and Health Effects of Persistent Toxic Substances, School of Environment and Health, Jianghan University, Wuhan 430056, China

35
Arora S, Mittal A, Duari S, Chauhan S, Dixit NK, Mohanty SK, Sharma A, Solanki S, Sharma AK, Gautam V, Gahlot PS, Satija S, Nanshi J, Kapoor N, Cb L, Sengupta D, Mehrotra P, Ghosh TS, Ahuja G. Discovering geroprotectors through the explainable artificial intelligence-based platform AgeXtend. NATURE AGING 2025; 5:144-161. [PMID: 39627462 DOI: 10.1038/s43587-024-00763-4] [Received: 10/27/2023] [Accepted: 10/25/2024] [Indexed: 01/24/2025]
Abstract
Aging involves metabolic changes that lead to reduced cellular fitness, yet the role of many metabolites in aging is unclear. Understanding the mechanisms of known geroprotective molecules reveals insights into metabolic networks regulating aging and aids in identifying additional geroprotectors. Here we present AgeXtend, an artificial intelligence (AI)-based multimodal geroprotector prediction platform that leverages bioactivity data of known geroprotectors. AgeXtend encompasses modules that predict geroprotective potential, assess toxicity and identify target proteins and potential mechanisms. We found that AgeXtend accurately identified the pro-longevity effects of known geroprotectors excluded from training data, such as metformin and taurine. Using AgeXtend, we screened ~1.1 billion compounds and identified numerous potential geroprotectors, which we validated using yeast and Caenorhabditis elegans lifespan assays, as well as exploring microbiome-derived metabolites. Finally, we evaluated endogenous metabolites predicted as senomodulators using senescence assays in human fibroblasts, highlighting AgeXtend's potential to reveal unidentified geroprotectors and provide insights into aging mechanisms.
Affiliation(s)
- Sakshi Arora
  - Department of Computational Biology, Indraprastha Institute of Information Technology-Delhi (IIIT-Delhi), New Delhi, India
- Aayushi Mittal
  - Department of Computational Biology, Indraprastha Institute of Information Technology-Delhi (IIIT-Delhi), New Delhi, India
- Subhadeep Duari
  - Department of Computational Biology, Indraprastha Institute of Information Technology-Delhi (IIIT-Delhi), New Delhi, India
- Sonam Chauhan
  - Department of Computational Biology, Indraprastha Institute of Information Technology-Delhi (IIIT-Delhi), New Delhi, India
- Nilesh Kumar Dixit
  - Department of Computational Biology, Indraprastha Institute of Information Technology-Delhi (IIIT-Delhi), New Delhi, India
- Sanjay Kumar Mohanty
  - Department of Computational Biology, Indraprastha Institute of Information Technology-Delhi (IIIT-Delhi), New Delhi, India
- Arushi Sharma
  - Department of Computational Biology, Indraprastha Institute of Information Technology-Delhi (IIIT-Delhi), New Delhi, India
- Saveena Solanki
  - Department of Computational Biology, Indraprastha Institute of Information Technology-Delhi (IIIT-Delhi), New Delhi, India
- Anmol Kumar Sharma
  - Department of Computational Biology, Indraprastha Institute of Information Technology-Delhi (IIIT-Delhi), New Delhi, India
- Vishakha Gautam
  - Department of Computational Biology, Indraprastha Institute of Information Technology-Delhi (IIIT-Delhi), New Delhi, India
- Pushpendra Singh Gahlot
  - Department of Computational Biology, Indraprastha Institute of Information Technology-Delhi (IIIT-Delhi), New Delhi, India
- Shiva Satija
  - Department of Computational Biology, Indraprastha Institute of Information Technology-Delhi (IIIT-Delhi), New Delhi, India
- Jeet Nanshi
  - Department of Computational Biology, Indraprastha Institute of Information Technology-Delhi (IIIT-Delhi), New Delhi, India
- Nikita Kapoor
  - Department of Computational Biology, Indraprastha Institute of Information Technology-Delhi (IIIT-Delhi), New Delhi, India
- Lavanya Cb
  - Department of Computational Biology, Indraprastha Institute of Information Technology-Delhi (IIIT-Delhi), New Delhi, India
- Debarka Sengupta
  - Department of Computational Biology, Indraprastha Institute of Information Technology-Delhi (IIIT-Delhi), New Delhi, India
- Parul Mehrotra
  - Kusuma School of Biological Sciences, Indian Institute of Technology, New Delhi, India
- Tarini Shankar Ghosh
  - Department of Computational Biology, Indraprastha Institute of Information Technology-Delhi (IIIT-Delhi), New Delhi, India
- Gaurav Ahuja
  - Department of Computational Biology, Indraprastha Institute of Information Technology-Delhi (IIIT-Delhi), New Delhi, India

36
Caniceiro AB, Orzeł U, Rosário-Ferreira N, Filipek S, Moreira IS. Leveraging Artificial Intelligence in GPCR Activation Studies: Computational Prediction Methods as Key Drivers of Knowledge. Methods Mol Biol 2025; 2870:183-220. [PMID: 39543036 DOI: 10.1007/978-1-0716-4213-9_10] [Indexed: 11/17/2024]
Abstract
G protein-coupled receptors (GPCRs) are key molecules involved in cellular signaling and are attractive targets for pharmacological intervention. This chapter explores the range of algorithms used to predict GPCRs' activation states and examines the pharmaceutical implications of these predictions. Our primary objective is to show how artificial intelligence (AI) is key in GPCR research to reveal the intricate dynamics of activation and inactivation processes, shedding light on the complex regulatory mechanisms of this vital protein family. We describe several computational strategies that leverage diverse structural data from the Protein Data Bank, molecular dynamics simulations, or ligand-based methods to predict the activation states of GPCRs. We demonstrate how the integration of AI into GPCR research not only enhances our understanding of their dynamic properties but also presents immense potential for driving pharmaceutical research and development, offering promising new avenues in the search for newer, better therapeutic agents.
Affiliation(s)
- Ana B Caniceiro
  - Department of Life Sciences, University of Coimbra, Coimbra, Portugal
  - CNC-UC - Center for Neuroscience and Cell Biology, University of Coimbra, Coimbra, Portugal
- Urszula Orzeł
  - Department of Life Sciences, University of Coimbra, Coimbra, Portugal
  - CNC-UC - Center for Neuroscience and Cell Biology, University of Coimbra, Coimbra, Portugal
  - Faculty of Chemistry, Biological and Chemical Research Centre, University of Warsaw, Warsaw, Poland
- Nícia Rosário-Ferreira
  - CNC-UC - Center for Neuroscience and Cell Biology, University of Coimbra, Coimbra, Portugal
  - CIBB - Centre for Innovative Biomedicine and Biotechnology, University of Coimbra, Coimbra, Portugal
- Sławomir Filipek
  - Faculty of Chemistry, Biological and Chemical Research Centre, University of Warsaw, Warsaw, Poland
- Irina S Moreira
  - Department of Life Sciences, University of Coimbra, Coimbra, Portugal
  - CNC-UC - Center for Neuroscience and Cell Biology, University of Coimbra, Coimbra, Portugal
  - CIBB - Centre for Innovative Biomedicine and Biotechnology, University of Coimbra, Coimbra, Portugal

37
Han Z, Xia Z, Xia J, Tetko IV, Wu S. The state-of-the-art machine learning model for plasma protein binding prediction: Computational modeling with OCHEM and experimental validation. Eur J Pharm Sci 2025; 204:106946. [PMID: 39490636 DOI: 10.1016/j.ejps.2024.106946] [Received: 07/30/2024] [Revised: 10/18/2024] [Accepted: 10/23/2024] [Indexed: 11/05/2024]
Abstract
Plasma protein binding (PPB) is closely related to pharmacokinetics, pharmacodynamics and drug toxicity. Existing models for predicting PPB often suffer from low prediction accuracy and poor interpretability, especially for high-PPB compounds, and are most often not experimentally validated. Here, we carried out a strict data curation protocol and applied consensus modeling to obtain a model with coefficients of determination of 0.90 and 0.91 on the training set and the test set, respectively. This model (available on the OCHEM platform https://ochem.eu/article/29) was further retrospectively validated on a set of 63 poly-fluorinated molecules and prospectively validated on a set of 25 highly diverse compounds, and its performance on both sets was superior to that of previously reported models. Furthermore, we identified the physicochemical and structural characteristics of high- and low-PPB molecules for further structural optimization. Finally, we provide practical and detailed recommendations for structural optimization to decrease the PPB of lead compounds.
Affiliation(s)
- Zunsheng Han
  - State Key Laboratory of Bioactive Substance and Function of Natural Medicines, Institute of Materia Medica, Chinese Academy of Medical Sciences and Peking Union Medical College, Beijing 100050, China
- Zhonghua Xia
  - Institute of Structural Biology, Molecular Targets and Therapeutics Center, Helmholtz Munich - German Research Center for Environmental Health (GmbH), Ingolstädter Landstraße 1, 85764 Neuherberg, Germany
- Jie Xia
  - State Key Laboratory of Bioactive Substance and Function of Natural Medicines, Institute of Materia Medica, Chinese Academy of Medical Sciences and Peking Union Medical College, Beijing 100050, China
- Igor V Tetko
  - Institute of Structural Biology, Molecular Targets and Therapeutics Center, Helmholtz Munich - German Research Center for Environmental Health (GmbH), Ingolstädter Landstraße 1, 85764 Neuherberg, Germany; BIGCHEM GmbH, Valerystr. 49, 85716 Unterschleißheim, Germany
- Song Wu
  - State Key Laboratory of Bioactive Substance and Function of Natural Medicines, Institute of Materia Medica, Chinese Academy of Medical Sciences and Peking Union Medical College, Beijing 100050, China

38
Sobańska AW, Orlikowska A, Famulska K, Bošnjak L, Bosiljevac D, Rasztawicka A, Sobański AM. Systematic Study of Steroid Drugs' Ability to Cross Biomembranes-The Possible Environmental Impact and Health Risks Associated with Exposure During Pregnancy. MEMBRANES 2024; 15:4. [PMID: 39852245 PMCID: PMC11766822 DOI: 10.3390/membranes15010004] [Received: 11/10/2024] [Revised: 12/24/2024] [Accepted: 12/25/2024] [Indexed: 01/26/2025]
Abstract
Thirty-seven steroid drugs of different types were investigated in silico for their environmental and pharmacokinetic properties (partition between soil and water, bioaccumulation in aquatic organisms, ability to be absorbed from the gastrointestinal tract and to cross biological barriers: skin, blood-brain barrier and placenta) using on-line tools and novel QSAR models. The same drugs were studied by molecular docking for their ability to interact with two enzymes, glutathione S-transferase (GST) and human N-acetyltransferase 2 (NAT2), which are involved in the placenta's protective system against harmful xenobiotics. Steroid drugs are released to the environment from households, hospitals, manufacturing plants and farms (e.g., with natural fertilizers), and they can affect aquatic life (reproduction and development of aquatic organisms) even at sub-ng/L concentrations. It was established that the majority of the studied drugs are mobile in soil, so they may reach surface waters far from the point of discharge, e.g., from farming; however, only a few of them are likely to bioaccumulate. All of them can be absorbed orally or through the skin, and they are also expected to cross the placenta. Over 30% of the studied compounds are likely to pass through the blood-brain barrier (although five compounds in this group are likely P-gp substrates, which may reduce their activity in the central nervous system); they also have very high affinity for both studied enzymes.
Affiliation(s)
- Anna W. Sobańska
  - Department of Analytical Chemistry, Faculty of Pharmacy, Medical University of Lodz, 90-151 Lodz, Poland
- Aleksandra Orlikowska
  - Faculty of Pharmacy, Medical University of Lodz, 90-151 Lodz, Poland
- Karolina Famulska
  - Faculty of Pharmacy, Medical University of Lodz, 90-151 Lodz, Poland
- Lovro Bošnjak
  - Faculty of Pharmacy and Biochemistry, University of Zagreb, 10000 Zagreb, Croatia
- Domagoj Bosiljevac
  - Faculty of Pharmacy and Biochemistry, University of Zagreb, 10000 Zagreb, Croatia
- Aleksandra Rasztawicka
  - Faculty of Pharmacy, Medical University of Lodz, 90-151 Lodz, Poland

39
Ramahi ADA, Shinde VV, Pearce TC, Sinka IC. Virtual screening of drug materials for pharmaceutical tablet manufacturability with reference to sticking. Int J Pharm 2024; 667:124722. [PMID: 39293578 DOI: 10.1016/j.ijpharm.2024.124722] [Received: 05/14/2024] [Revised: 09/12/2024] [Accepted: 09/13/2024] [Indexed: 09/20/2024]
Abstract
The manufacturing of pharmaceutical solid dosage forms, such as tablets, involves a large number of successive processing operations, including crystallisation of the drug substance, granulation, drying, milling, mixing of the formulation, and compaction. Each step is fraught with manufacturing problems. Undesired adhesion of powders to the surface of the compaction tooling, known as sticking, is a frequent and highly disruptive problem that occurs at the very end of the process chain, when the tablet is formed. As alternatives to mechanistic approaches to sticking, we introduce two different machine learning strategies to predict sticking directly from the chemical formula of the drug substance, represented by molecular descriptors. An empirical database of sticking behaviour was developed and used to train the machine learning (ML) algorithms to predict sticking characteristics from molecular descriptors. The ML model successfully classified the sticking/non-sticking behaviour of powders with 100% separation. Predictions were made for materials in the Handbook of Pharmaceutical Excipients and a subset of molecules included in the ChEMBL database, demonstrating the potential use of machine learning approaches to screen for sticking propensity early during drug discovery and development. This is the first time molecular descriptors and machine learning have been used to predict and screen for sticking behaviour. The method has the potential to transform the development of medicines by providing manufacturability information at the drug screening stage and is potentially applicable to other manufacturing problems controlled by the chemistry of the drug substance.

40
Soares R, Azevedo L, Vasconcelos V, Pratas D, Sousa SF, Carneiro J. Machine Learning-Driven Discovery and Database of Cyanobacteria Bioactive Compounds: A Resource for Therapeutics and Bioremediation. J Chem Inf Model 2024; 64:9576-9593. [PMID: 39602490 DOI: 10.1021/acs.jcim.4c00995] [Indexed: 11/29/2024]
Abstract
Cyanobacteria strains have the potential to produce bioactive compounds that can be used in therapeutics and bioremediation. Therefore, compiling all information about these compounds to consider their value as bioresources for industrial and research applications is essential. In this study, a searchable, updated, curated, and downloadable database of cyanobacteria bioactive compounds was designed, along with a machine-learning model to predict the targets of newly discovered molecules. A Python programming protocol obtained 3431 cyanobacteria bioactive compounds, 373 unique protein targets, and 3027 molecular descriptors. PaDEL-descriptor, Mordred, and Drugtax software were used to calculate the chemical descriptors for each bioactive compound record in the database. The descriptors were then used to determine the most promising protein targets for human therapeutic approaches and environmental bioremediation using the best machine learning (ML) model. The creation of our database, coupled with the integration of computational docking protocols, represents an innovative approach to understanding the potential of cyanobacteria bioactive compounds. This resource, adhering to the findability, accessibility, interoperability, and reuse of digital assets (FAIR) principles, is an excellent tool for pharmaceutical and bioremediation researchers. Moreover, its capacity to facilitate the exploration of specific compounds' interactions with environmental pollutants is a significant advancement, aligning with the increasing reliance on data science and machine learning to address environmental challenges. This study is a notable step forward in leveraging cyanobacteria for both therapeutic and ecological sustainability.
Affiliation(s)
- Renato Soares
  - CIIMAR, Interdisciplinary Centre of Marine and Environmental Research, University of Porto, Terminal de Cruzeiros do Porto de Leixões, Av. General Norton de Matos, s/n, Porto 4450-208, Portugal
  - Department of Biology, Faculty of Sciences, University of Porto, Rua do Campo Alegre s/n, Porto 4169-007, Portugal
  - LAQV/REQUIMTE, BioSIM - Department of Biomedicine, Faculty of Medicine, University of Porto, Porto 4200-319, Portugal
- Luísa Azevedo
  - UMIB-Unit for Multidisciplinary Research in Biomedicine, ICBAS - School of Medicine and Biomedical Sciences, University of Porto, Porto 4050-313, Portugal
  - ITR - Laboratory for Integrative and Translational Research in Population Health, Porto 4050-313, Portugal
- Vitor Vasconcelos
  - CIIMAR, Interdisciplinary Centre of Marine and Environmental Research, University of Porto, Terminal de Cruzeiros do Porto de Leixões, Av. General Norton de Matos, s/n, Porto 4450-208, Portugal
  - Department of Biology, Faculty of Sciences, University of Porto, Rua do Campo Alegre s/n, Porto 4169-007, Portugal
- Diogo Pratas
  - IEETA, Institute of Electronics and Informatics Engineering of Aveiro, University of Aveiro, Aveiro 3810-193, Portugal
  - DETI, Department of Electronics, Telecommunications and Informatics, University of Aveiro, Aveiro 3810-193, Portugal
  - DoV, Department of Virology, University of Helsinki, Helsinki 00100, Finland
- Sérgio F Sousa
  - LAQV/REQUIMTE, BioSIM - Department of Biomedicine, Faculty of Medicine, University of Porto, Porto 4200-319, Portugal
- João Carneiro
  - CIIMAR, Interdisciplinary Centre of Marine and Environmental Research, University of Porto, Terminal de Cruzeiros do Porto de Leixões, Av. General Norton de Matos, s/n, Porto 4450-208, Portugal

41
Cheng S, Zhang Q, Min H, Jiang W, Liu J, Liu C, Wang Z. Development of a Predictive Model for N-Dealkylation of Amine Contaminants Based on Machine Learning Methods. TOXICS 2024; 12:931. [PMID: 39771146 PMCID: PMC11728645 DOI: 10.3390/toxics12120931] [Received: 11/14/2024] [Revised: 12/14/2024] [Accepted: 12/20/2024] [Indexed: 01/16/2025]
Abstract
Amines are widespread environmental pollutants that may pose health risks. Specifically, the N-dealkylation of amines mediated by cytochrome P450 enzymes (P450) could influence their metabolic transformation safety. However, conventional experimental and computational chemistry methods make it difficult to conduct high-throughput screening of N-dealkylation of emerging amine contaminants. Machine learning has been widely used to identify sources of environmental pollutants and predict their toxicity. However, its application in screening critical biotransformation pathways for organic pollutants has been rarely reported. In this study, we first constructed a large dataset comprising 286 emerging amine pollutants through a thorough search of databases and literature. Then, we applied four machine learning methods-random forest, gradient boosting decision tree, extreme gradient boosting, and multi-layer perceptron-to develop binary classification models for N-dealkylation. These models were based on seven carefully selected molecular descriptors that represent reactivity-fit and structural-fit. Among the predictive models, the extreme gradient boosting shows the highest prediction accuracy of 81.0%. The SlogP_VSA2 descriptor is the primary factor influencing predictions of N-dealkylation metabolism. Then an ensemble model was generated that uses a consensus strategy to integrate three different algorithms, whose performance is generally better than any single algorithm, with an accuracy rate of 86.2%. Therefore, the classification model developed in this work can provide methodological support for the high-throughput screening of N-dealkylation of amine pollutants.
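The abstract says only that the ensemble model uses "a consensus strategy" over three algorithms; the exact rule is not stated. A minimal sketch of one common choice, simple majority voting, is given below as an assumption rather than the authors' method. The function names and the toy labels are hypothetical and are not data from the study.

```python
from collections import Counter


def consensus_vote(model_predictions):
    """Return the majority label per sample across several models.

    `model_predictions` is a list with one list of labels per model,
    all of the same length. With three binary classifiers a tie
    cannot occur.
    """
    n_samples = len(model_predictions[0])
    if any(len(p) != n_samples for p in model_predictions):
        raise ValueError("all models must predict the same samples")
    return [Counter(votes).most_common(1)[0][0]
            for votes in zip(*model_predictions)]


def accuracy(predicted, observed):
    """Fraction of samples whose predicted label matches the observed one."""
    hits = sum(p == o for p, o in zip(predicted, observed))
    return hits / len(observed)


# Toy labels (1 = undergoes N-dealkylation, 0 = does not); illustrative only.
observed = [1, 0, 1, 1, 0]
model_a = [1, 0, 1, 1, 1]   # wrong on the last sample
model_b = [1, 1, 1, 1, 0]   # wrong on the second sample
model_c = [0, 0, 1, 1, 0]   # wrong on the first sample

ensemble = consensus_vote([model_a, model_b, model_c])
print(ensemble, accuracy(ensemble, observed))
```

In this toy case each model errs on a different sample, so the vote corrects all three mistakes. A consensus can only outperform its members in this way when their errors are not strongly correlated.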
Affiliation(s)
- Shiyang Cheng
  - School of Environment and Spatial Informatics, China University of Mining and Technology, Xuzhou 221116, China
- Qihang Zhang
  - School of Environment and Spatial Informatics, China University of Mining and Technology, Xuzhou 221116, China
- Hao Min
  - School of Environment and Spatial Informatics, China University of Mining and Technology, Xuzhou 221116, China
- Wenhui Jiang
  - School of Environment and Spatial Informatics, China University of Mining and Technology, Xuzhou 221116, China
- Jueting Liu
  - School of Computer Science and Technology, China University of Mining and Technology, Xuzhou 221116, China
- Chunsheng Liu
  - School of Environmental Studies, China University of Geosciences, Wuhan 430079, China
- Zehua Wang
  - The Department of Electrical and Computer Engineering, University of British Columbia, Vancouver, BC V6T 1Z4, Canada

42
Liu J, Peeples J, Sayes CM. Evaluation of Machine Learning Based QSAR Models for the Classification of Lung Surfactant Inhibitors. ENVIRONMENT & HEALTH (WASHINGTON, D.C.) 2024; 2:912-917. [PMID: 39722839 PMCID: PMC11667287 DOI: 10.1021/envhealth.4c00118] [Received: 06/19/2024] [Revised: 08/22/2024] [Accepted: 08/26/2024] [Indexed: 12/28/2024]
Abstract
Inhaled chemicals can cause dysfunction in the lung surfactant, a protein-lipid complex with critical biophysical and biochemical functions. This inhibition has many structure-related and dose-dependent mechanisms, making hazard identification challenging. We developed quantitative structure-activity relationships for predicting lung surfactant inhibition using machine learning. Logistic regression, support vector machines, random forest, gradient-boosted trees, prior-data-fitted networks, and multilayer perceptron were evaluated as methods. Multilayer perceptron had the strongest performance with 96% accuracy and an F1 score of 0.97. Support vector machines and logistic regression also performed well with lower computation costs. This serves as a proof-of-concept for efficient hazard screening in the emerging area of lung surfactant inhibition.
Affiliation(s)
- James Y. Liu
  - Department of Environmental Science, Baylor University, Waco, Texas 76798-7266, United States
- Joshua Peeples
  - Department of Electrical & Computer Engineering, Texas A&M University, College Station, Texas 77845, United States
- Christie M. Sayes
  - Department of Environmental Science, Baylor University, Waco, Texas 76798-7266, United States

43
Selvam S, Balaji PD, Sohn H, Madhavan T. AISMPred: A Machine Learning Approach for Predicting Anti-Inflammatory Small Molecules. Pharmaceuticals (Basel) 2024; 17:1693. [PMID: 39770535 PMCID: PMC11676721 DOI: 10.3390/ph17121693] [Received: 10/22/2024] [Revised: 11/21/2024] [Accepted: 12/04/2024] [Indexed: 01/11/2025]
Abstract
Background/Objectives: Inflammation serves as a vital response to diverse harmful stimuli such as infections, toxins, or tissue injuries, aiding in the elimination of pathogens and tissue repair. However, persistent inflammation can lead to chronic diseases. Peptide therapeutics have gained attention for their specificity in targeting cells, yet their development remains costly and time-consuming. Small molecules, with their stability, low immunogenicity, and oral bioavailability, have therefore become a focal point, motivating the prediction of anti-inflammatory small molecules (AISMs). Methods: In this study, we introduce a computational method called AISMPred, designed to classify AISMs and non-AISMs. To develop this approach, we constructed a dataset comprising 1750 AISMs and non-AISMs, each annotated with IC50 values sourced from the PubChem BioAssay database. We computed two distinct types of molecular descriptors using the PaDEL and Mordred tools. These descriptors were then concatenated to form a hybrid feature set. The SVC-L1 regularization method was used to select the optimal features for developing robust machine learning (ML) models. Five conventional ML classifiers were employed: RF, ET, KNN, LR, and Ensemble methods. Results: A total of 15 ML models were developed using 2D, FP, and Hybrid feature sets, with the ET model with hybrid features achieving the highest accuracy of 92% and an AUC of 0.97 on the independent test dataset. Conclusions: This study provides an effective method for screening AISMs, potentially impacting drug discovery and design.
Affiliation(s)
- Subathra Selvam
  - Computational Biology Laboratory, Department of Genetic Engineering, School of Bioengineering, SRM Institute of Science and Technology, Kattankulathur, Chengalpattu 603203, Tamil Nadu, India
- Priya Dharshini Balaji
  - Computational Biology Laboratory, Department of Genetic Engineering, School of Bioengineering, SRM Institute of Science and Technology, Kattankulathur, Chengalpattu 603203, Tamil Nadu, India
- Honglae Sohn
  - Department of Chemistry, Chosun University, Gwangju 501-759, Republic of Korea
- Thirumurthy Madhavan
  - Computational Biology Laboratory, Department of Genetic Engineering, School of Bioengineering, SRM Institute of Science and Technology, Kattankulathur, Chengalpattu 603203, Tamil Nadu, India

44
Sobańska AW, Sobański AM. Organic Sunscreens-Is Their Placenta Permeability the Only Issue Associated with Exposure During Pregnancy? In Silico Studies of Sunscreens' Placenta Permeability and Interactions with Selected Placental Enzymes. Molecules 2024; 29:5836. [PMID: 39769924 PMCID: PMC11728689 DOI: 10.3390/molecules29245836] [Received: 11/04/2024] [Revised: 12/08/2024] [Accepted: 12/09/2024] [Indexed: 01/16/2025]
Abstract
One of the functions of the placenta is to protect the fetus against harmful xenobiotics. The protective mechanisms of the placenta are based on enzymes, e.g., antioxidant enzymes from the glutathione S-transferase (GST) group or human N-acetyltransferase 2 (NAT2). Many organic sunscreens are known to cross biological barriers: they are detected in mother's milk, semen, umbilical cord blood, and placental tissues. Some organic sunscreens are able to cross the placenta and to interfere with fetal development; they are known or suspected endocrine disruptors or neurotoxins. In this study, 16 organic sunscreens were investigated in the context of their placenta permeability and their interactions with the glutathione S-transferase and human N-acetyltransferase 2 enzymes present in the human placenta. Binary permeability models based on discriminant analysis and artificial neural networks indicated that the majority of the studied compounds are likely to cross the placenta by passive diffusion. Molecular docking analysis suggested that some sunscreens show stronger affinity for glutathione S-transferase and human N-acetyltransferase 2 than the native ligands (glutathione and Coenzyme A for GST and NAT2, respectively); it is therefore possible that they are able to reduce the enzymes' protective activity. It was established that sunscreens bind to the studied enzymes mainly through alkyl, hydrogen-bond, van der Waals, π-π, π-alkyl, and π-sulfur interactions. To conclude, sunscreens may become stressors affecting humans by different mechanisms and at different stages of development.
Affiliation(s)
- Anna W. Sobańska
- Department of Analytical Chemistry, Medical University of Lodz, Muszyńskiego 1, 90-151 Lodz, Poland
45
Kawagoe R, Ando T, Matsuzawa NN, Maeshima H, Kaneko H. Exploring Molecular Descriptors and Acquisition Functions in Bayesian Optimization for Designing Molecules with Low Hole Reorganization Energy. ACS Omega 2024; 9:48844-48854. [PMID: 39676955 PMCID: PMC11635491 DOI: 10.1021/acsomega.4c09124] [Received: 10/07/2024] [Revised: 11/19/2024] [Accepted: 11/22/2024] [Indexed: 12/17/2024]
Abstract
Organic semiconductors have been widely studied owing to their potential applications in various devices, such as field-effect transistors, light-emitting diodes, solar cells, and image sensors. However, their carrier mobility is significantly lower than that of silicon, a widely used inorganic semiconductor. To address this limitation, these molecules should be explored further. Hole reorganization energy is known to influence carrier mobility: lower energy results in higher mobility. This study uses Bayesian optimization (BO) to identify molecules with low hole reorganization energies. While several acquisition functions (AFs), including probability of improvement, expected improvement, and mutual information, have been proposed for use in BO, it is well established that the performance of AFs can vary depending on the data set. We evaluate the performance of AFs applied to a data set of organic semiconductor molecules and propose a novel approach that alternates the use of AFs in the BO process. Our findings indicate that alternating AFs in BO enhances the stability of the search for molecules with low reorganization energy.
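To make the idea of alternating acquisition functions concrete, here is a minimal standard-library sketch for a minimization target such as hole reorganization energy. It assumes a surrogate model has already produced a predicted mean and standard deviation for each candidate molecule. The round-robin schedule, the function names, and the toy numbers are illustrative assumptions; the abstract does not state how the authors alternate AFs, and mutual information is omitted here for brevity.

```python
import math


def _pdf(z):
    """Standard normal probability density."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def _cdf(z):
    """Standard normal cumulative distribution."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def probability_of_improvement(mu, sigma, best):
    """PI for minimization: probability that the candidate beats `best`."""
    if sigma <= 0.0:
        return 1.0 if mu < best else 0.0
    return _cdf((best - mu) / sigma)


def expected_improvement(mu, sigma, best):
    """EI for minimization: expected amount by which the candidate beats `best`."""
    if sigma <= 0.0:
        return max(best - mu, 0.0)
    z = (best - mu) / sigma
    return (best - mu) * _cdf(z) + sigma * _pdf(z)


# Assumed schedule: cycle through the AFs, one per BO iteration.
ACQUISITIONS = [probability_of_improvement, expected_improvement]


def pick_next(iteration, mus, sigmas, best):
    """Return (index of the candidate to evaluate next, name of the AF used)."""
    af = ACQUISITIONS[iteration % len(ACQUISITIONS)]
    scores = [af(m, s, best) for m, s in zip(mus, sigmas)]
    return max(range(len(scores)), key=scores.__getitem__), af.__name__


# Hypothetical surrogate predictions (eV) for three candidates.
mus = [0.30, 0.25, 0.28]
sigmas = [0.01, 0.01, 0.10]
best = 0.27  # lowest reorganization energy observed so far

for it in range(2):
    print(it, pick_next(it, mus, sigmas, best))
```

In this toy case PI favors the candidate that is confidently a little better than the incumbent, while EI favors the uncertain candidate with a larger potential gain, so alternating the two mixes exploitation and exploration across iterations.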
Affiliation(s)
- Rinta Kawagoe
- Department of Applied Chemistry, School of Science and Technology, Meiji University, 1-1-1 Higashi-Mita, Tama-ku, Kawasaki, Kanagawa 214-8571, Japan
- Tatsuhito Ando
- Engineering Division, Panasonic Industry Co., Ltd., Kadoma, Osaka 571-8506, Japan
- Hiroyuki Maeshima
- Engineering Division, Panasonic Industry Co., Ltd., Kadoma, Osaka 571-8506, Japan
- Hiromasa Kaneko
- Department of Applied Chemistry, School of Science and Technology, Meiji University, 1-1-1 Higashi-Mita, Tama-ku, Kawasaki, Kanagawa 214-8571, Japan
46
Mohammed I, Sagurthi SR. Current Approaches and Strategies Applied in First-in-class Drug Discovery. ChemMedChem 2024:e202400639. [PMID: 39648151 DOI: 10.1002/cmdc.202400639] [Received: 08/16/2024] [Revised: 11/30/2024] [Accepted: 12/05/2024] [Indexed: 12/10/2024]
Abstract
First-in-class drug discovery (FICDD) offers novel therapies, new biological targets and mechanisms of action (MOAs) toward targeting various diseases and provides opportunities to understand unexplored biology and to target unmet diseases. Current screening approaches followed in FICDD for discovery of hit and lead molecules can be broadly categorized and discussed under phenotypic drug discovery (PDD) and target-based drug discovery (TBDD). Each category has been further classified and described with suitable examples from the literature outlining the current trends in screening approaches applied in small molecule drug discovery (SMDD). Similarly, recent applications of functional genomics, structural biology, artificial intelligence (AI), machine learning (ML), and other such advanced approaches in FICDD have also been highlighted in the article. Further, some of the current medicinal chemistry strategies applied during discovery of hits and optimization studies such as hit-to-lead (HTL) and lead optimization (LO) have been simultaneously overviewed in this article.
Affiliation(s)
- Idrees Mohammed
- Drug Design & Molecular Medicine Laboratory, Department of Genetics & Biotechnology, Osmania University, Hyderabad, 500007, Telangana, India
- Someswar Rao Sagurthi
- Drug Design & Molecular Medicine Laboratory, Department of Genetics & Biotechnology, Osmania University, Hyderabad, 500007, Telangana, India
- Special Center for Molecular Medicine, Jawaharlal Nehru University, New Delhi, 110067, India
47
Kim JY, Khan SA, Vlachos DG. Similarity-Based Machine Learning for Small Data Sets: Predicting Biolubricant Base Oil Viscosities. J Phys Chem B 2024; 128:11963-11970. [PMID: 39579140 DOI: 10.1021/acs.jpcb.4c06687] [Indexed: 11/25/2024]
Abstract
Machine learning (ML) has been successfully applied to learn patterns in experimental chemical data to predict molecular properties. However, experimental data can be time-consuming and expensive to obtain and, as a result, it is often scarce. Several ML methods face challenges when trained with limited data. Here, we introduce a similarity-based machine learning approach that enables precise model training on small data sets while requiring fewer features and enhancing prediction accuracy. We group molecules with similar structures, represented by molecular fingerprints, and use these groups to train separate ML models for each group. We first validate our method on larger data sets of dynamic viscosity and aqueous solubility, demonstrating comparable or better performance than traditional approaches while requiring fewer features. We then apply the validated methodology to predict the kinematic viscosity of biolubricant base oil molecules at 40 °C (KV40), where experimental data is particularly limited. Our method shows noticeable model performance improvement for KV40 prediction compared to transfer learning and the standard Random Forest. This approach provides a robust framework for limited data that can be readily generalized to a diverse range of molecular data sets especially when clear structural patterns exist in the data set.
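The grouping step described above can be sketched as follows, using only the Python standard library. Fingerprints are represented as sets of "on" bit indices, molecules are grouped by Tanimoto similarity with a simple leader-clustering rule, and each group gets its own model. The clustering rule, the 0.5 threshold, the per-group mean predictor (a stand-in for a real regressor), and all names and toy values are assumptions for illustration; the abstract does not specify these details.

```python
def tanimoto(a, b):
    """Tanimoto similarity between two fingerprints given as sets of on-bits."""
    union = len(a | b)
    return len(a & b) / union if union else 1.0


def leader_cluster(fps, threshold=0.5):
    """Assign each fingerprint to the first leader it resembles, else start a new group."""
    leaders, labels = [], []
    for fp in fps:
        for group, leader in enumerate(leaders):
            if tanimoto(fp, leader) >= threshold:
                labels.append(group)
                break
        else:
            leaders.append(fp)
            labels.append(len(leaders) - 1)
    return leaders, labels


def fit_group_means(labels, ys):
    """Train one trivial model per group: the mean target value of its members."""
    totals = {}
    for group, y in zip(labels, ys):
        s, n = totals.get(group, (0.0, 0))
        totals[group] = (s + y, n + 1)
    return {group: s / n for group, (s, n) in totals.items()}


def predict(fp, leaders, group_models):
    """Route a new molecule to its most similar group and use that group's model."""
    group = max(range(len(leaders)), key=lambda i: tanimoto(fp, leaders[i]))
    return group_models[group]


# Hypothetical fingerprints and viscosity-like targets.
fps = [{1, 2, 3}, {1, 2, 4}, {7, 8, 9}, {7, 8, 10}]
ys = [10.0, 12.0, 30.0, 34.0]

leaders, labels = leader_cluster(fps)
models = fit_group_means(labels, ys)
print(labels, models, predict({7, 8, 11}, leaders, models))
```

With real data the sets would come from a cheminformatics toolkit's fingerprint routine, and each group's mean would be replaced by a regressor trained only on that group's molecules.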
Affiliation(s)
- Jae Young Kim
- Department of Chemical and Biomolecular Engineering, University of Delaware, Newark, Delaware 19716, United States
- Delaware Energy Institute (DEI), University of Delaware, 221 Academy St., Newark, Delaware 19716, United States
- Salman A Khan
- Delaware Energy Institute (DEI), University of Delaware, 221 Academy St., Newark, Delaware 19716, United States
- Dionisios G Vlachos
- Department of Chemical and Biomolecular Engineering, University of Delaware, Newark, Delaware 19716, United States
- Delaware Energy Institute (DEI), University of Delaware, 221 Academy St., Newark, Delaware 19716, United States
48
Qin W, Zheng S, Guo K, Yang M, Fang J. Predicting reaction kinetics of reactive bromine species with organic compounds by machine learning: Feature combination and knowledge transfer with reactive chlorine species. J Hazard Mater 2024; 480:136410. [PMID: 39509874 DOI: 10.1016/j.jhazmat.2024.136410] [Received: 08/26/2024] [Revised: 10/19/2024] [Accepted: 11/04/2024] [Indexed: 11/15/2024]
Abstract
Reactive bromine species (RBS) such as bromine atom (Br•) and dibromine radical (Br2•-) are important oxidative species accounting for the transformation of organic compounds in bromide-containing water. This study developed quantitative structure-activity relationship (QSAR) models to predict second-order rate constants (k) of RBS by machine learning (ML) and conducted knowledge transfer between RBS and reactive chlorine species (RCS, e.g., Cl• and Cl2•-) to improve model performance. The ML-based models (RMSEtest = 0.476–0.712) outperformed the multiple linear regression-based models (RMSEtest = 0.572–3.68) for predicting k of RBS. In addition, the combination of molecular fingerprints (MFs) and quantum descriptors (QDs) as input features improved the performance of ML-based models (RMSEtest = 0.476–0.712) compared to those developed by MFs (RMSEtest = 0.524–0.834) or QDs (RMSEtest = 0.572–0.806) alone. EHOMO and Egap were identified as the most important features affecting k of RBS based on SHAP analysis. A unified model integrating the datasets of four reactive halogen species (RHS: Br•, Br2•-, Cl• and Cl2•-) was further developed (R²test = 0.802), which showed better predictive performance than the individual models (R²test = 0.521–0.776). Meanwhile, the effect of knowledge transfer among RHS on model performance varied: it improved for Br•/Cl•, was mixed for Br•/Br2•- and Cl•/Cl2•-, and worsened for Br2•-/Cl2•-. This study provides useful tools for predicting k of RHS in aqueous environments.
Affiliation(s)
- Wenlei Qin
- Guangdong Provincial Key Laboratory of Environmental Pollution Control and Remediation Technology, School of Environmental Science and Engineering, Sun Yat-Sen University, Guangzhou 510275, China
- Shanshan Zheng
- School of Biomedical Engineering, Shenzhen University Medical School, Shenzhen 518060, China
- Kaiheng Guo
- Guangdong Provincial Key Laboratory of Environmental Pollution Control and Remediation Technology, School of Environmental Science and Engineering, Sun Yat-Sen University, Guangzhou 510275, China
- Ming Yang
- HFI Huafu International, Guangzhou 510641, China
- Jingyun Fang
- Guangdong Provincial Key Laboratory of Environmental Pollution Control and Remediation Technology, School of Environmental Science and Engineering, Sun Yat-Sen University, Guangzhou 510275, China
49
Correia J, Capela J, Rocha M. DeepMol: an automated machine and deep learning framework for computational chemistry. J Cheminform 2024; 16:136. [PMID: 39639396 PMCID: PMC11622685 DOI: 10.1186/s13321-024-00937-7] [Received: 06/05/2024] [Accepted: 11/25/2024] [Indexed: 12/07/2024]
Abstract
The domain of computational chemistry has experienced a significant evolution due to the introduction of Machine Learning (ML) technologies. Despite its potential to revolutionize the field, researchers are often encumbered by obstacles, such as the complexity of selecting optimal algorithms, the automation of data pre-processing steps, the necessity for adaptive feature engineering, and the assurance of model performance consistency across different datasets. Addressing these issues head-on, DeepMol stands out as an Automated ML (AutoML) tool by automating critical steps of the ML pipeline. DeepMol rapidly and automatically identifies the most effective data representation, pre-processing methods and model configurations for a specific molecular property/activity prediction problem. On 22 benchmark datasets, DeepMol obtained competitive pipelines compared with those requiring time-consuming feature engineering, model design and selection processes. As one of the first AutoML tools specifically developed for the computational chemistry domain, DeepMol stands out with its open-source code, in-depth tutorials, detailed documentation, and examples of real-world applications, all available at https://github.com/BioSystemsUM/DeepMol and https://deepmol.readthedocs.io/en/latest/. By introducing AutoML as a groundbreaking feature in computational chemistry, DeepMol establishes itself as the pioneering state-of-the-art tool in the field.
Scientific contribution: DeepMol aims to provide an integrated framework of AutoML for computational chemistry. DeepMol provides a more robust alternative to other tools with its integrated pipeline serialization, enabling seamless deployment using the fit, transform, and predict paradigms. It uniquely supports both conventional and deep learning models for regression, classification and multi-task, offering unmatched flexibility compared to other AutoML tools. DeepMol's predefined configurations and customizable objective functions make it accessible to users at all skill levels while enabling efficient and reproducible workflows. Benchmarking on diverse datasets demonstrated its ability to deliver optimized pipelines and superior performance across various molecular machine-learning tasks.
Affiliation(s)
- João Correia
- CEB - Centre of Biological Engineering, University of Minho, Braga, Portugal
- João Capela
- CEB - Centre of Biological Engineering, University of Minho, Braga, Portugal
- Miguel Rocha
- CEB - Centre of Biological Engineering, University of Minho, Braga, Portugal
- LABBELS - Associate Laboratory, Braga/Guimarães, Portugal
50
Elhadi A, Zhao D, Ali N, Sun F, Zhong S. Multi-method computational evaluation of the inhibitors against leucine-rich repeat kinase 2 G2019S mutant for Parkinson's disease. Mol Divers 2024; 28:4181-4197. [PMID: 38396210 DOI: 10.1007/s11030-024-10808-w] [Received: 09/05/2023] [Accepted: 01/07/2024] [Indexed: 02/25/2024]
Abstract
Leucine-rich repeat kinase 2 G2019S mutant (LRRK2 G2019S) is a potential target for Parkinson's disease therapy. In this work, LRRK2 G2019S inhibitors were evaluated computationally via a combined approach comprising a preliminary screening of a large compound database by similarity and pharmacophore, a secondary selection by structure-based affinity prediction and molecular docking, and a rescoring treatment for the final selection. MD simulations and MM/GBSA calculations were performed to check the agreement between the different prediction methods for these inhibitors. A total of 331 experimental ligands were collected, and 170 were used to build the structure-activity relationship. Eight representative ligand structural models were employed in similarity searching and pharmacophore screening over 14 million compounds. The process for selecting proper molecular descriptors provides a successful example that can serve as a general strategy in QSAR modelling. The rescoring used in this work presents a useful alternative treatment for ranking and selection.
Affiliation(s)
- Ahmed Elhadi
- School of Bioengineering, Dalian University of Technology, Dalian, 116024, Liaoning, People's Republic of China
- Dan Zhao
- School of Bioengineering, Dalian University of Technology, Dalian, 116024, Liaoning, People's Republic of China
- Noman Ali
- School of Bioengineering, Dalian University of Technology, Dalian, 116024, Liaoning, People's Republic of China
- Fusheng Sun
- School of Bioengineering, Dalian University of Technology, Dalian, 116024, Liaoning, People's Republic of China
- Shijun Zhong
- School of Bioengineering, Dalian University of Technology, Dalian, 116024, Liaoning, People's Republic of China