1. Saylor DM, Elder RM, Duelge K, Ranasinghe Arachchige NPR, Simon DD, Wickramasekara S, Young JA. Inter-laboratory study for extraction testing of medical devices. J Pharm Biomed Anal 2025; 252:116496. [PMID: 39405789 DOI: 10.1016/j.jpba.2024.116496] [Received: 07/29/2024] [Revised: 09/30/2024] [Accepted: 10/01/2024] [Indexed: 11/07/2024]
Abstract
Biocompatibility evaluation of medical devices often relies on chemical testing according to ISO 10993-18 as a critical component. However, the precision associated with these non-targeted chemical characterization assessments has not been well established. Therefore, we conducted a study to characterize the intra-laboratory (repeatability) and inter-laboratory (reproducibility) variability associated with chemical testing of extractables from polymeric materials. The study focused on two polymers, each with nine chemicals that were intentionally compounded into the materials. Eight different laboratories performed extraction testing in two solvents and subsequently characterized the extracts using gas chromatography and liquid chromatography methods. Analysis of the resulting data revealed that the central 90 % ranges for the repeatability and reproducibility relative standard deviations are (0.09, 0.22) and (0.30, 0.85), respectively, for the participating laboratory methods. This finding implies that if the same sample were tested by two different laboratories using the same extraction conditions, there is 95 % confidence for 95 % of systems that the test results could exhibit differences up to 240 %. While the study was not designed to evaluate the relative impact of specific underlying factors that may contribute to variability in quantitation, the data obtained suggest that the variability associated with the analytical method alone contributes substantially to the overall variability. The relatively large reproducibility limits we observed may have significant implications where variability in extraction measurements can impact aspects of biocompatibility risk evaluation, such as exposure dose estimation and chemical equivalence assessments.
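The step from a relative standard deviation to a "differences up to" figure can be illustrated with a short calculation. The sketch below assumes the conventional limit factor 1.96 × √2 (about 2.8) used for repeatability and reproducibility limits; the abstract does not state which formula the authors applied, so this is only one plausible reading.

```python
import math

# Assumption: the conventional limit factor 1.96 * sqrt(2) (about 2.8), applied to a
# relative standard deviation. The abstract does not state the formula actually used.
LIMIT_FACTOR = 1.96 * math.sqrt(2.0)


def relative_limit(rsd: float) -> float:
    """Relative difference between two results not expected to be exceeded at ~95 % confidence."""
    return LIMIT_FACTOR * rsd


# Endpoints of the central 90 % ranges quoted in the abstract.
repeatability_rsd = (0.09, 0.22)
reproducibility_rsd = (0.30, 0.85)

for label, (low, high) in (
    ("repeatability", repeatability_rsd),
    ("reproducibility", reproducibility_rsd),
):
    print(f"{label}: {relative_limit(low):.0%} to {relative_limit(high):.0%}")
```

Under this assumption, the upper reproducibility endpoint (0.85) gives a limit of roughly 2.4 times the measured value, which is of the same size as the 240 % quoted in the abstract.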
Affiliation(s)
- David M Saylor
  - Center for Devices and Radiological Health, FDA, Silver Spring, MD 20993, United States
- Robert M Elder
  - Center for Devices and Radiological Health, FDA, Silver Spring, MD 20993, United States
- Kaleb Duelge
  - Center for Devices and Radiological Health, FDA, Silver Spring, MD 20993, United States
- David D Simon
  - Center for Devices and Radiological Health, FDA, Silver Spring, MD 20993, United States
- Joshua A Young
  - Center for Devices and Radiological Health, FDA, Silver Spring, MD 20993, United States
2. Ma L, Yan Y, Dai S, Shao D, Yi S, Wang J, Li J, Yan J. Research on prediction of human oral bioavailability of drugs based on improved deep forest. J Mol Graph Model 2024; 133:108851. [PMID: 39232489 DOI: 10.1016/j.jmgm.2024.108851] [Received: 06/06/2024] [Revised: 08/22/2024] [Accepted: 08/26/2024] [Indexed: 09/06/2024]
Abstract
Human oral bioavailability is a crucial factor in drug discovery. In recent years, researchers have constructed a variety of prediction models. However, given the limited size of human oral bioavailability data sets, making accurate predictions from small samples has become a critical issue in the field. The deep forest model, with its adaptively determined number of cascade levels, can perform exceptionally well even on small-scale data. However, the original deep forest suffers from an unbalanced multi-grained scanning process and premature stopping of cascade forest training. In this paper, we propose a human oral bioavailability prediction method based on an improved deep forest, called balanced multi-grained scanning mapping cascade forest (bgmc-forest). First, Mordred descriptors are used for feature extraction. Enhanced features are then obtained by the improved balanced multi-grained scanning, which solves the problem of missing features at both ends. Finally, the prediction results are obtained by a feature-mapping cascade forest, which is based on principal component analysis and cascade forests and ensures the effectiveness of the cascade forest. The superiority of the model constructed in this paper is demonstrated through comparative experiments, while the effectiveness of the improved modules is verified through ablation experiments. Finally, the decision-making process of the model is explained by the SHapley Additive exPlanations (SHAP) interpretation algorithm.
Affiliation(s)
- Lei Ma
  - Kunming University of Science and Technology, Kunming, CN 650500, China
- Yukun Yan
  - Kunming University of Science and Technology, Kunming, CN 650500, China
- Shaoxing Dai
  - Kunming University of Science and Technology, Kunming, CN 650500, China
- Dangguo Shao
  - Kunming University of Science and Technology, Kunming, CN 650500, China
- Sanli Yi
  - Kunming University of Science and Technology, Kunming, CN 650500, China
- Jiawei Wang
  - Kunming University of Science and Technology, Kunming, CN 650500, China
- Jingtao Li
  - Kunming University of Science and Technology, Kunming, CN 650500, China
- Jiangkai Yan
  - Kunming University of Science and Technology, Kunming, CN 650500, China
3. Boichenko DS, Kolomoets NI, Boiko DA, Galushko AS, Posvyatenko AV, Kolesnikov AE, Egorova KS, Ananikov VP. Build-a-Bio-Strip: An Online Platform for Rapid Toxicity Assessment in Chemical Synthesis. J Chem Inf Model 2024. [PMID: 39488853 DOI: 10.1021/acs.jcim.4c01381] [Indexed: 11/05/2024]
Abstract
The increasing need to understand and control the environmental impact of chemical processes has revealed the challenge in efficient evaluation of toxicity of the vast number of chemical compounds and their varying effects on biological systems. In this study, we introduce "Build-a-bio-Strip", a novel online service designed to carry out a quick initial analysis of the toxic impact of chemical processes. This platform enables users to automatically generate toxicity characteristics of chemical reactions using their own data on cytotoxicity or median lethal doses of the substances involved or computational predictions based on SMILES strings. The service calculates the toxicity metrics such as bio-Factors and cytotoxicity potentials, which can be used to identify the substances with significant contributions to the overall toxicity of a particular process. This facilitates the selection of safer synthetic routes and the optimization of chemical processes from a toxicity perspective. "Build-a-bio-Strip" represents a step toward safer and more sustainable chemical practices. It is available free-of-charge at http://app.ananikovlab.ai:8080/.
Affiliation(s)
- Dmitry S Boichenko
  - Zelinsky Institute of Organic Chemistry, Russian Academy of Sciences, Moscow 119991, Russia
  - Department of Chemistry, Lomonosov Moscow State University, Leninskie Gory GSP-1, 1-3, Moscow 119991, Russia
- Nikita I Kolomoets
  - Zelinsky Institute of Organic Chemistry, Russian Academy of Sciences, Moscow 119991, Russia
- Daniil A Boiko
  - Zelinsky Institute of Organic Chemistry, Russian Academy of Sciences, Moscow 119991, Russia
- Alexey S Galushko
  - Zelinsky Institute of Organic Chemistry, Russian Academy of Sciences, Moscow 119991, Russia
- Alexandra V Posvyatenko
  - Zelinsky Institute of Organic Chemistry, Russian Academy of Sciences, Moscow 119991, Russia
  - Dmitry Rogachev National Medical Research Center of Pediatric Hematology, Oncology and Immunology, Ministry of Health of Russian Federation, Moscow 117198, Russia
- Andrey E Kolesnikov
  - Zelinsky Institute of Organic Chemistry, Russian Academy of Sciences, Moscow 119991, Russia
- Ksenia S Egorova
  - Zelinsky Institute of Organic Chemistry, Russian Academy of Sciences, Moscow 119991, Russia
- Valentine P Ananikov
  - Zelinsky Institute of Organic Chemistry, Russian Academy of Sciences, Moscow 119991, Russia
4. Park J, Sorourifar F, Muthyala MR, Houser AM, Tuttle M, Paulson JA, Zhang S. Zero-Shot Discovery of High-Performance, Low-Cost Organic Battery Materials Using Machine Learning. J Am Chem Soc 2024. [PMID: 39484799 DOI: 10.1021/jacs.4c11663] [Indexed: 11/03/2024]
Abstract
Organic electrode materials (OEMs), composed of abundant elements such as carbon, nitrogen, and oxygen, offer sustainable alternatives to conventional electrode materials that depend on finite metal resources. The vast structural diversity of organic compounds provides a virtually unlimited design space; however, exploring this space through Edisonian trial-and-error approaches is costly and time-consuming. In this work, we develop a new framework, SPARKLE, that combines computational chemistry, molecular generation, and machine learning to achieve zero-shot predictions of OEMs that simultaneously balance reward (specific energy), risk (solubility), and cost (synthesizability). We demonstrate that SPARKLE significantly outperforms alternative black-box machine learning algorithms on interpolation and extrapolation tasks. By deploying SPARKLE over a design space of more than 670,000 organic compounds, we identified ≈5000 novel OEM candidates. Twenty-seven of them were synthesized and fabricated into coin-cell batteries for experimental testing. Among SPARKLE-discovered OEMs, 62.9% exceeded benchmark performance metrics, representing a 3-fold improvement over OEMs selected by human intuition alone (20.8% based on six years of prior lab experience). The top-performing OEMs among the 27 candidates exhibit specific energy and cycling stability that surpass the state-of-the-art while being synthesizable at a fraction of the cost.
Affiliation(s)
- Jaehyun Park
  - Department of Chemistry & Biochemistry, The Ohio State University, 100 West 18th Avenue, Columbus, Ohio 43210, United States
- Farshud Sorourifar
  - Department of Chemical and Biomolecular Engineering, The Ohio State University, 151 W. Woodruff Avenue, Columbus, Ohio 43210, United States
- Madhav R Muthyala
  - Department of Chemical and Biomolecular Engineering, The Ohio State University, 151 W. Woodruff Avenue, Columbus, Ohio 43210, United States
- Abigail M Houser
  - Department of Chemistry & Biochemistry, The Ohio State University, 100 West 18th Avenue, Columbus, Ohio 43210, United States
- Madison Tuttle
  - Department of Chemistry & Biochemistry, The Ohio State University, 100 West 18th Avenue, Columbus, Ohio 43210, United States
- Joel A Paulson
  - Department of Chemical and Biomolecular Engineering, The Ohio State University, 151 W. Woodruff Avenue, Columbus, Ohio 43210, United States
- Shiyu Zhang
  - Department of Chemistry & Biochemistry, The Ohio State University, 100 West 18th Avenue, Columbus, Ohio 43210, United States
5. Baruah O, Parasar U, Borphukan A, Phukan B, Bharali P, Nagamani S, Mahanta HJ. Integrating (deep) machine learning and cheminformatics for predicting human intestinal absorption of small molecules. Comput Biol Chem 2024; 113:108270. [PMID: 39481232 DOI: 10.1016/j.compbiolchem.2024.108270] [Received: 08/05/2024] [Revised: 09/30/2024] [Accepted: 10/23/2024] [Indexed: 11/02/2024]
Abstract
The oral route is the preferred route for drug delivery, and oral drugs therefore represent the largest share of the pharmaceutical market. Human intestinal absorption (HIA) is closely related to oral bioavailability, making it an important factor in predicting drug absorption. In this study, we focus on predicting drug permeability at HIA as a marker for oral bioavailability. A set of 2648 compounds was collected from both early and recent works and curated to build a robust dataset. Five machine learning (ML) algorithms were trained on a set of molecular descriptors of these compounds, selected after rigorous feature engineering. Additionally, two deep learning models, one based on a graph convolution neural network (GCNN) and one on a graph attention network (GAT), were developed using the same set of compounds to exploit the predictive power of automatically extracted features. The numerical analyses show that, out of the five ML models, Random Forest and LightGBM predicted with an accuracy of 87.71 % and 86.04 % on the test set and 81.43 % and 77.30 % on the external validation set, respectively. With the GCNN- and GAT-based models, the final accuracy achieved was 77.69 % and 78.58 % on the test set and 79.29 % and 79.42 % on the external validation set, respectively. We believe that deploying these models for screening oral drugs can provide promising results and have therefore deposited the dataset and models on the GitHub platform (https://github.com/hridoy69/HIA).
Affiliation(s)
- Orchid Baruah
  - Department of Information Technology, The Assam Kaziranga University, Jorhat, Assam 785006, India
- Upashya Parasar
  - Department of Information Technology, The Assam Kaziranga University, Jorhat, Assam 785006, India
- Anirban Borphukan
  - Department of Information Technology, The Assam Kaziranga University, Jorhat, Assam 785006, India
- Bikram Phukan
  - Advanced Computation and Data Sciences Division, CSIR North East Institute of Science and Technology, Jorhat, Assam 785006, India
- Pankaj Bharali
  - Centre for Infectious Diseases, CSIR North East Institute of Science and Technology, Jorhat, Assam 785006, India; Academy of Scientific and Innovative Research (AcSIR), Ghaziabad, Uttar Pradesh 201002, India
- Selvaraman Nagamani
  - Advanced Computation and Data Sciences Division, CSIR North East Institute of Science and Technology, Jorhat, Assam 785006, India; Academy of Scientific and Innovative Research (AcSIR), Ghaziabad, Uttar Pradesh 201002, India
- Hridoy Jyoti Mahanta
  - Advanced Computation and Data Sciences Division, CSIR North East Institute of Science and Technology, Jorhat, Assam 785006, India; Academy of Scientific and Innovative Research (AcSIR), Ghaziabad, Uttar Pradesh 201002, India
6. Han Z, Xia Z, Xia J, Tetko IV, Wu S. The state-of-the-art machine learning model for Plasma Protein Binding Prediction: computational modeling with OCHEM and experimental validation. Eur J Pharm Sci 2024; 204:106946. [PMID: 39490636 DOI: 10.1016/j.ejps.2024.106946] [Received: 07/30/2024] [Revised: 10/18/2024] [Accepted: 10/23/2024] [Indexed: 11/05/2024]
Abstract
Plasma protein binding (PPB) is closely related to pharmacokinetics, pharmacodynamics and drug toxicity. Existing models for predicting PPB often suffer from low prediction accuracy and poor interpretability, especially for high-PPB compounds, and are most often not experimentally validated. Here, we carried out a strict data curation protocol and applied consensus modeling to obtain a model with a coefficient of determination of 0.90 on the training set and 0.91 on the test set. This model (available on the OCHEM platform https://ochem.eu/article/29) was further retrospectively validated for a set of 63 poly-fluorinated molecules and prospectively validated for a set of 25 highly diverse compounds, and its performance for both these sets was superior to that of previously reported models. Furthermore, we identified the physicochemical and structural characteristics of high- and low-PPB molecules for further structural optimization. Finally, we provide practical and detailed recommendations for structural optimization to decrease the PPB of lead compounds.
Affiliation(s)
- Zunsheng Han
  - State Key Laboratory of Bioactive Substance and Function of Natural Medicines, Institute of Materia Medica, Chinese Academy of Medical Sciences and Peking Union Medical College, Beijing 100050, China
- Zhonghua Xia
  - Institute of Structural Biology, Molecular Targets and Therapeutics Center, Helmholtz Munich - German Research Center for Environmental Health (GmbH), Ingolstädter Landstraße 1, 85764 Neuherberg, Germany
- Jie Xia
  - State Key Laboratory of Bioactive Substance and Function of Natural Medicines, Institute of Materia Medica, Chinese Academy of Medical Sciences and Peking Union Medical College, Beijing 100050, China
- Igor V Tetko
  - Institute of Structural Biology, Molecular Targets and Therapeutics Center, Helmholtz Munich - German Research Center for Environmental Health (GmbH), Ingolstädter Landstraße 1, 85764 Neuherberg, Germany; BIGCHEM GmbH, Valerystr. 49, 85716 Unterschleißheim, Germany
- Song Wu
  - State Key Laboratory of Bioactive Substance and Function of Natural Medicines, Institute of Materia Medica, Chinese Academy of Medical Sciences and Peking Union Medical College, Beijing 100050, China
7. Park J, Lee W, Kim J. Large-Scale Construction and Analysis of Amorphous Porous Polymer Network Materials. ACS Appl Mater Interfaces 2024; 16:57190-57199. [PMID: 39388380 DOI: 10.1021/acsami.4c13221] [Indexed: 10/12/2024]
Abstract
In recent decades, data-driven methodologies have emerged as irreplaceable tools in materials science, particularly for elucidating structure-property relationships and facilitating the discovery of novel materials. However, despite the rapid development witnessed in other domains, amorphous materials have received relatively less attention in this context. The disordered atomic structure of amorphous materials resulting from irreversible reactions between building blocks has posed a difficulty in structural modeling, leading to a lack of databases that accurately reflect the amorphous nature of these materials. In this work, a database composed of 10,237 porous polymer networks (PPNs) was constructed from self-assembly simulations, resulting in the largest database of PPNs considering their amorphous characteristics. Through the distinct differences observed in comparison with existing databases, we emphasize that carefully considering the structural disorder of PPNs is essential for accurately characterizing their chemical behaviors. Machine learning models trained on the constructed database have confirmed that the macroscopic properties of amorphous PPNs can be predicted solely from the atomic structures of their monomers, implying that the characteristics of previously unseen PPNs can be assessed without the need for additional self-assembly simulations.
Affiliation(s)
- Junkil Park
  - Department of Chemical and Biomolecular Engineering, Korea Advanced Institute of Science and Technology (KAIST), 291 Daehak-ro, Yuseong-gu, Daejeon 34141, Republic of Korea
- Wonseok Lee
  - Department of Chemical and Biomolecular Engineering, Korea Advanced Institute of Science and Technology (KAIST), 291 Daehak-ro, Yuseong-gu, Daejeon 34141, Republic of Korea
- Jihan Kim
  - Department of Chemical and Biomolecular Engineering, Korea Advanced Institute of Science and Technology (KAIST), 291 Daehak-ro, Yuseong-gu, Daejeon 34141, Republic of Korea
8. Yang B, Schaefer AJ, Small BL, Leseberg JA, Bischof SM, Webster-Gardiner MS, Ess DH. Experimentally-based Fe-catalyzed ethene oligomerization machine learning model provides highly accurate prediction of propagation/termination selectivity. Chem Sci 2024:d4sc03433c. [PMID: 39449687 PMCID: PMC11495513 DOI: 10.1039/d4sc03433c] [Received: 05/25/2024] [Accepted: 10/09/2024] [Indexed: 10/26/2024] [Open Access]
Abstract
Linear α-olefins (1-alkenes) are critical comonomers for ethene copolymerization. A major impediment in the development of new homogeneous Fe catalysts for ethene oligomerization to produce comonomers and other important commercial products is the prediction of the propagation versus termination rates that control the α-olefin distribution (e.g., 1-butene through 1-decene), which is often referred to as a K-value. Because the transition states for propagation versus termination are generally separated by less than 1 kcal mol⁻¹ in energy, this selectivity cannot be accurately predicted by either DFT or wavefunction methods (even DLPNO-CCSD(T)). Therefore, we developed a machine learning model with sub-kcal mol⁻¹ accuracy, based on several hundred experimental selectivity values and straightforward 2D chemical and physical features, that enables the prediction of α-olefin distribution K-values. As part of our model, we developed a new ad hoc feature that boosted the model performance. This machine learning model captures the effects of a broad range of ligand architectures and chemically nonintuitive trends in oligomerization selectivity. Our machine learning model was experimentally validated by prediction of a K-value for a new Fe phosphaneyl-pyridinyl-quinoline catalyst, followed by experimental measurement that showed precise agreement. In addition to quantitative predictions, we demonstrate how this machine learning model can provide qualitative catalyst design using proximity-of-pairs-type analysis.
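Why sub-kcal accuracy matters here can be seen from the exponential dependence of a rate ratio on a barrier difference. The sketch below is an illustration only: it assumes transition-state-theory behavior, k_prop / k_term = exp(ΔΔG‡ / RT), a Schulz-Flory-style definition K = k_prop / (k_prop + k_term), and a temperature of 298.15 K. None of these choices is taken from the abstract.

```python
import math

R_KCAL = 1.987204e-3  # gas constant in kcal mol^-1 K^-1


def rate_ratio(ddg_kcal: float, temperature_k: float = 298.15) -> float:
    """k_prop / k_term when the termination barrier lies ddg_kcal above propagation.

    Assumes simple transition-state-theory behavior; the temperature is illustrative.
    """
    return math.exp(ddg_kcal / (R_KCAL * temperature_k))


def k_value(ddg_kcal: float, temperature_k: float = 298.15) -> float:
    """Assumed Schulz-Flory-style K = k_prop / (k_prop + k_term)."""
    ratio = rate_ratio(ddg_kcal, temperature_k)
    return ratio / (1.0 + ratio)


# Scan barrier differences in the sub-kcal to ~1 kcal/mol range.
for ddg in (0.0, 0.5, 1.0, 1.5):
    print(f"ddG = {ddg:.1f} kcal/mol -> K = {k_value(ddg):.2f}")
```

Under these assumptions, shifting the barrier difference by half a kcal mol⁻¹ moves K by a tenth or more, which is why an error of about 1 kcal mol⁻¹ in a computed barrier difference is too coarse for predicting the product distribution.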
Affiliation(s)
- Bo Yang
  - Department of Chemistry and Biochemistry, Brigham Young University Provo Utah 84602 USA
- Anthony J Schaefer
  - Department of Chemistry and Biochemistry, Brigham Young University Provo Utah 84602 USA
- Brooke L Small
  - Research & Technology, Chevron Phillips Chemical 1862 Kingwood Drive Kingwood Texas 77339 USA
- Julie A Leseberg
  - Research & Technology, Chevron Phillips Chemical 1862 Kingwood Drive Kingwood Texas 77339 USA
- Steven M Bischof
  - Research & Technology, Chevron Phillips Chemical 1862 Kingwood Drive Kingwood Texas 77339 USA
- Daniel H Ess
  - Department of Chemistry and Biochemistry, Brigham Young University Provo Utah 84602 USA
9. Ambe K, Nakamori M, Tohno R, Suzuki K, Sasaki T, Tohkin M, Yoshinari K. Machine Learning-Based In Silico Prediction of the Inhibitory Activity of Chemical Substances Against Rat and Human Cytochrome P450s. Chem Res Toxicol 2024. [PMID: 39427263 DOI: 10.1021/acs.chemrestox.4c00168] [Indexed: 10/22/2024]
Abstract
The prediction of cytochrome P450 inhibition by a computational (quantitative) structure-activity relationship approach using chemical structure information and machine learning would be useful for toxicity research as a simple and rapid in silico tool. However, there are few in silico models focusing on the species differences between rat and human in P450 inhibition. This study aimed to establish in silico models to classify chemical substances as inhibitors or non-inhibitors of various rat and human P450s, using only molecular descriptors. Using the in-house test results from our in vitro experiments, we used 326 substances for model construction and internal validation. Apart from these 326 substances, 60 substances were used as an external validation data set. We focused on seven rat P450s (CYP1A1, CYP1A2, CYP2B1, CYP2C6, CYP2D1, CYP2E1, and CYP3A2) and 11 human P450s (CYP1A1, CYP1A2, CYP1B1, CYP2A6, CYP2B6, CYP2C8, CYP2C9, CYP2C19, CYP2D6, CYP2E1, and CYP3A4). Most of the models established using XGBoost showed an area under the receiver operating characteristic curve (ROC-AUC) of 0.8 or more in the internal validation. When we set an applicability domain for the models and confirmed their generalization performance through external validation, most of the models showed an ROC-AUC of 0.7 or more. Interestingly, for CYP1A1 and CYP1A2, we discovered that a human P450 inhibitory activity model can predict rat P450 inhibitory activity and vice versa. These models are the first attempts to predict inhibitory activity against a wide variety of P450s in both rats and humans using chemical structure information. Our experimental results and in silico models should help provide supporting information on species similarities and differences in chemical-induced toxicity.
Affiliation(s)
- Kaori Ambe
  - Department of Regulatory Science, Graduate School of Pharmaceutical Sciences, Nagoya City University, Nagoya 4678603, Japan
- Mizuki Nakamori
  - Department of Regulatory Science, Graduate School of Pharmaceutical Sciences, Nagoya City University, Nagoya 4678603, Japan
- Riku Tohno
  - Department of Regulatory Science, Graduate School of Pharmaceutical Sciences, Nagoya City University, Nagoya 4678603, Japan
- Kotaro Suzuki
  - Department of Regulatory Science, Graduate School of Pharmaceutical Sciences, Nagoya City University, Nagoya 4678603, Japan
- Takamitsu Sasaki
  - Laboratory of Molecular Toxicology, School of Pharmaceutical Sciences, University of Shizuoka, Shizuoka 4228526, Japan
- Masahiro Tohkin
  - Department of Regulatory Science, Graduate School of Pharmaceutical Sciences, Nagoya City University, Nagoya 4678603, Japan
- Kouichi Yoshinari
  - Laboratory of Molecular Toxicology, School of Pharmaceutical Sciences, University of Shizuoka, Shizuoka 4228526, Japan
10. Martinez AD, Navajas-Guerrero A, Bediaga-Bañeres H, Sánchez-Bodón J, Ortiz P, Vilas-Vilela JL, Moreno-Benitez I, Gil-Lopez S. AI-Driven Insight into Polycarbonate Synthesis from CO2: Database Construction and Beyond. Polymers (Basel) 2024; 16:2936. [PMID: 39458764 PMCID: PMC11511479 DOI: 10.3390/polym16202936] [Received: 09/12/2024] [Revised: 10/16/2024] [Accepted: 10/17/2024] [Indexed: 10/28/2024] [Open Access]
Abstract
Recent advancements in materials science have garnered significant attention within the research community. Over the past decade, substantial efforts have been directed towards the exploration of innovative methodologies for developing new materials. These efforts encompass enhancements to existing products or processes and the design of novel materials. Of particular significance is the synthesis of specific polymers through the copolymerization of epoxides with CO2. However, several uncertainties emerge in this chemical process, including challenges associated with successful polymerization and the properties of the resulting materials. These uncertainties render the design of new polymers a trial-and-error endeavor in which failed experiments consume significant financial, human, and time resources. Artificial Intelligence (AI) emerges as a promising technology to mitigate these drawbacks during the experimental phase. Nonetheless, the availability of high-quality data remains crucial, posing particular challenges in the context of polymeric materials, mainly because of the stochastic nature of polymers, which impedes their homogeneous representation, and the variation in their properties based on their processing. In this study, the first dataset linking the structure of the epoxy comonomer, the catalyst employed, and the experimental conditions of polymerization to the reaction's success is described. A novel analytical pipeline based on ML to effectively exploit the constructed database is introduced. The initial results underscore the importance of addressing the dimensionality problem. The outcomes derived from the proposed analytical pipeline, which infers the molecular weight, polydispersity index, and conversion rate, demonstrate promising fit values for all target parameters. The best results are measured in terms of the coefficient of determination (R²) between real and predicted values for all three target quantities. The best proposed solution provides an R² of 0.79, 0.86, and 0.93 for the molecular weight, polydispersity index, and conversion rate, respectively. The proposed analytical pipeline is automated (including AutoML techniques for ML model hyperparameter tuning), allowing easy scalability as the database grows and laying the foundation for future research.
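For readers less familiar with the metric, the coefficient of determination between real and predicted values can be computed as below. This is a minimal sketch using the common definition 1 − SS_res / SS_tot; the abstract does not say which variant the authors used, and the numbers in the example are hypothetical, not data from the study.

```python
def r_squared(observed, predicted):
    """Coefficient of determination, 1 - SS_res / SS_tot (common definition, assumed here)."""
    if len(observed) != len(predicted) or not observed:
        raise ValueError("observed and predicted must be non-empty and of equal length")
    mean_observed = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_observed) ** 2 for o in observed)
    if ss_tot == 0:
        raise ValueError("R^2 is undefined when all observed values are identical")
    return 1.0 - ss_res / ss_tot


# Hypothetical conversion rates (fractions), for illustration only.
real_values = [0.20, 0.45, 0.60, 0.90]
model_values = [0.25, 0.40, 0.65, 0.85]
print(f"R^2 = {r_squared(real_values, model_values):.3f}")
```

With this definition, a value of 1 means the predictions match the real values exactly, and a value of 0 means the model does no better than always predicting the mean of the real values.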
Affiliation(s)
- Aritz D. Martinez
  - TECNALIA, Basque Research & Technology Alliance (BRTA), Technological Park of Bizkaia, 48160 Derio, Spain
- Adriana Navajas-Guerrero
  - TECNALIA, Basque Research & Technology Alliance (BRTA), Technological Park of Bizkaia, 48160 Derio, Spain
- Harbil Bediaga-Bañeres
  - Grupo de Química Macromolecular (LABQUIMAC), Departamento de Química Física, Facultad de Ciencia y Tecnología, Universidad del País Vasco UPV/EHU, 48940 Leioa, Spain
- Julia Sánchez-Bodón
  - Grupo de Química Macromolecular (LABQUIMAC), Departamento de Química Física, Facultad de Ciencia y Tecnología, Universidad del País Vasco UPV/EHU, 48940 Leioa, Spain
- Pablo Ortiz
  - TECNALIA, Basque Research & Technology Alliance (BRTA), Technological Park of Bizkaia, 48160 Derio, Spain
- Jose Luis Vilas-Vilela
  - Grupo de Química Macromolecular (LABQUIMAC), Departamento de Química Física, Facultad de Ciencia y Tecnología, Universidad del País Vasco UPV/EHU, 48940 Leioa, Spain
  - BCMaterials, Basque Center for Materials, Applications and Nanostructures, UPV/EHU Science Park, 48940 Leioa, Spain
- Isabel Moreno-Benitez
  - Grupo de Química Macromolecular (LABQUIMAC), Departamento de Química Orgánica e Inorgánica, Facultad de Ciencia y Tecnología, Universidad del País Vasco UPV/EHU, 48940 Leioa, Spain
- Sergio Gil-Lopez
  - TECNALIA, Basque Research & Technology Alliance (BRTA), Technological Park of Bizkaia, 48160 Derio, Spain
11. Tahir MH, Farrukh A, Alqahtany FZ, Badshah A, Shaaban IA, Assiri MA. Accelerated discovery of polymer donors for organic solar cells through machine learning: From library creation to performance forecasting. Spectrochim Acta A Mol Biomol Spectrosc 2024; 326:125298. [PMID: 39447304 DOI: 10.1016/j.saa.2024.125298] [Received: 07/20/2024] [Revised: 09/10/2024] [Accepted: 10/16/2024] [Indexed: 10/26/2024]
Abstract
The design of novel polymer donors for organic solar cells has been a major research focus for decades, but discovering unique materials remains challenging due to the high cost of experimentation. In this study, machine learning models are employed to predict power conversion efficiency (PCE), with Mordred descriptors used for model training. Among the four machine learning models evaluated, the gradient boosting regressor emerged as the best-performing model. Additionally, a chemical library of polymer donors was generated and analyzed using various measures. The 30 donors with the highest predicted PCE were selected and their synthetic accessibility was evaluated. Similarity analysis indicated considerable resemblance among the selected polymer donors.
Affiliation(s)
- Mudassir Hussain Tahir
- Research Faculty of Agriculture, Field Science Center for Northern Biosphere, Hokkaido University, Sapporo, Hokkaido 060-8589, 060-0811, Japan
| | - Aftab Farrukh
- Department of Physics, PMAS-Arid Agriculture University, Rawalpindi 46300, Pakistan
| | - Faleh Zafer Alqahtany
- Department of Chemistry, College of Science, University of Bisha, Bisha, Saudi Arabia
| | - Amir Badshah
- Department of Chemistry, Kohat University of Science and Technology, Kohat 26000, Pakistan.
| | - Ibrahim A Shaaban
- Department of Chemistry, Faculty of Science, King Khalid University, P.O. Box 960, Abha 61421, Saudi Arabia; Research Center for Advanced Materials Science (RCAMS), King Khalid University, P.O. Box 960, Abha 61421, Saudi Arabia
| | - Mohammed A Assiri
- Department of Chemistry, Faculty of Science, King Khalid University, P.O. Box 960, Abha 61421, Saudi Arabia; Research Center for Advanced Materials Science (RCAMS), King Khalid University, P.O. Box 960, Abha 61421, Saudi Arabia
| |
|
12
|
Collins JW, Ebrahimkhani M, Ramirez D, Deiloff J, Gonzalez M, Abedi M, Philippe-Venec L, Cole BM, Moore B, Nwankwo JO. Attentive graph neural network models for the prediction of blood brain barrier permeability. BIORXIV : THE PREPRINT SERVER FOR BIOLOGY 2024:2024.10.12.617907. [PMID: 39463958 PMCID: PMC11507759 DOI: 10.1101/2024.10.12.617907] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/29/2024]
Abstract
The blood brain barrier's (BBB) unique endothelial cells and tight junctions selectively regulate passage of molecules to the central nervous system (CNS) to prevent pathogen entry and maintain neural homeostasis. Various neurological conditions and neurodegenerative diseases benefit from small molecules capable of BBB penetration (BBBP) to elicit a therapeutic effect. Predicting BBBP often involves in silico assessment of molecular properties such as lipophilicity (log P) and polar surface area (PSA) using the CNS multiparameter optimization (MPO) method. This study curated an open-source dataset to rigorously benchmark machine learning (ML) and neural network (NN) models against each other and against MPO for predicting BBBP. Our analysis demonstrated that AI models, especially attentive NNs using stereochemical features, significantly outperform MPO in predicting BBBP. An attentive graph neural network (GNN), which we refer to as CANDID-CNS™, achieved a 0.23-0.26 higher AUROC score than MPO on full test sets, and a 0.17-0.19 higher score on stereoisomer-filtered subsets. For stereoisomers that differ in BBBP, which MPO cannot distinguish, attentive GNNs classify correctly with AUROC and MCC metrics comparable to or better than MPO's AUROC and MCC on less difficult test molecules. These findings suggest that integrating attentive GNN models into pharmaceutical drug discovery processes can substantially improve prediction rates, and thereby reduce the timeline and cost, and increase the probability of success, of designing brain penetrant therapeutics for the treatment of a wide variety of neurological and neurodegenerative diseases.
|
13
|
Malm L, Liigand J, Aalizadeh R, Alygizakis N, Ng K, Fro̷kjær EE, Nanusha MY, Hansen M, Plassmann M, Bieber S, Letzel T, Balest L, Abis PP, Mazzetti M, Kasprzyk-Hordern B, Ceolotto N, Kumari S, Hann S, Kochmann S, Steininger-Mairinger T, Soulier C, Mascolo G, Murgolo S, Garcia-Vara M, López de Alda M, Hollender J, Arturi K, Coppola G, Peruzzo M, Joerss H, van der Neut-Marchand C, Pieke EN, Gago-Ferrero P, Gil-Solsona R, Licul-Kucera V, Roscioli C, Valsecchi S, Luckute A, Christensen JH, Tisler S, Vughs D, Meekel N, Talavera Andújar B, Aurich D, Schymanski EL, Frigerio G, Macherius A, Kunkel U, Bader T, Rostkowski P, Gundersen H, Valdecanas B, Davis WC, Schulze B, Kaserzon S, Pijnappels M, Esperanza M, Fildier A, Vulliet E, Wiest L, Covaci A, Macan Schönleben A, Belova L, Celma A, Bijlsma L, Caupos E, Mebold E, Le Roux J, Troia E, de Rijke E, Helmus R, Leroy G, Haelewyck N, Chrastina D, Verwoert M, Thomaidis NS, Kruve A. Quantification Approaches in Non-Target LC/ESI/HRMS Analysis: An Interlaboratory Comparison. Anal Chem 2024; 96:16215-16226. [PMID: 39353203 PMCID: PMC11483430 DOI: 10.1021/acs.analchem.4c02902] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/05/2024] [Revised: 09/16/2024] [Accepted: 09/16/2024] [Indexed: 10/04/2024]
Abstract
Nontargeted screening (NTS) utilizing liquid chromatography electrospray ionization high-resolution mass spectrometry (LC/ESI/HRMS) is increasingly used to identify environmental contaminants. Major differences in the ionization efficiency of compounds in ESI/HRMS result in widely varying responses and complicate quantitative analysis. Despite an increasing number of methods for quantification without authentic standards in NTS, the approaches are evaluated on limited and diverse data sets with varying chemical coverage collected on different instruments, complicating an unbiased comparison. In this interlaboratory comparison, organized by the NORMAN Network, we evaluated the accuracy and performance variability of five quantification approaches across 41 NTS methods from 37 laboratories. Three approaches are based on surrogate standard quantification (parent-transformation product, structurally similar, or close eluting) and two on predicted ionization efficiencies (RandFor-IE and MLR-IE). Briefly, HPLC grade water, tap water, and surface water spiked with 45 compounds at 2 concentration levels were analyzed together with 41 calibrants at 6 known concentrations by the laboratories using in-house NTS workflows. The accuracy of the approaches was evaluated by comparing the estimated and spiked concentrations across quantification approaches, instrumentation, and laboratories. The RandFor-IE approach performed best with a reported mean prediction error of 15× and over 83% of compounds quantified within 10× error. Despite different instrumentation and workflows, the performance was stable across laboratories and did not depend on the complexity of water matrices.
Affiliation(s)
- Louise Malm
- Department of Materials and Environmental Chemistry, Stockholm University, Svante Arrhenius väg 16, 11418 Stockholm, Sweden
| | | | - Reza Aalizadeh
- Laboratory of Analytical Chemistry, Department of Chemistry, National and Kapodistrian University of Athens, Panepistimiopolis Zografou, 15771 Athens, Greece
- Department of Environmental Health Sciences, Yale School of Public Health, Yale University, New Haven, Connecticut 06510, United States
| | - Nikiforos Alygizakis
- Laboratory of Analytical Chemistry, Department of Chemistry, National and Kapodistrian University of Athens, Panepistimiopolis Zografou, 15771 Athens, Greece
- Environmental Institute, Okružná 784/42, 97241 Koš, Slovak Republic
| | - Kelsey Ng
- Environmental Institute, Okružná 784/42, 97241 Koš, Slovak Republic
- RECETOX, Faculty of Science, Masaryk University, Kamenice 753/5, Building D29, 62500 Brno, Czech Republic
| | - Emil Egede Fro̷kjær
- Environmental Metabolomics Lab, Aarhus University, Frederiksborgsvej 399, 4000 Roskilde, Denmark
| | - Mulatu Yohannes Nanusha
- Environmental Metabolomics Lab, Aarhus University, Frederiksborgsvej 399, 4000 Roskilde, Denmark
| | - Martin Hansen
- Environmental Metabolomics Lab, Aarhus University, Frederiksborgsvej 399, 4000 Roskilde, Denmark
| | - Merle Plassmann
- Department of Environmental Science, Stockholm University, Svante Arrhenius väg 8, 11418 Stockholm, Sweden
| | - Stefan Bieber
- Analytisches Forschungsinstitut für Non-Target Screening GmbH (AFIN-TS), Am Mittleren Moos 48, 86167 Augsburg, Germany
| | - Thomas Letzel
- Analytisches Forschungsinstitut für Non-Target Screening GmbH (AFIN-TS), Am Mittleren Moos 48, 86167 Augsburg, Germany
| | - Lydia Balest
- Acquedotto Pugliese SpA - Direzione Laboratori e Controllo Igienico Sanitario (DIRLC), 70123 Bari, Italy
| | - Pier Paolo Abis
- Acquedotto Pugliese SpA - Direzione Laboratori e Controllo Igienico Sanitario (DIRLC), 70123 Bari, Italy
| | - Michele Mazzetti
- Agenzia Regionale per l’Ambiente Toscana, Via G. Marradi 114, 57126 Livorno, Italy
| | - Barbara Kasprzyk-Hordern
- Department of Chemistry, University of Bath, Bath BA2 7AY, U.K.
- Institute for Sustainability, Bath BA2 7AY, U.K.
| | - Nicola Ceolotto
- Department of Chemistry, University of Bath, Bath BA2 7AY, U.K.
- Institute for Sustainability, Bath BA2 7AY, U.K.
| | - Sangeeta Kumari
- Department of Chemistry, Vienna, BOKU University, Muthgasse 18, 1190 Vienna, Austria
| | - Stephan Hann
- Department of Chemistry, Vienna, BOKU University, Muthgasse 18, 1190 Vienna, Austria
| | - Sven Kochmann
- Department of Chemistry, Vienna, BOKU University, Muthgasse 18, 1190 Vienna, Austria
| | | | - Coralie Soulier
- BRGM, 3 avenue Claude Guillemin, BP36009, 45060 Orléans Cedex 2, France
| | - Giuseppe Mascolo
- Water Research Institute (IRSA), National Research Council (CNR), Via F. De Blasio, 5, 70132 Bari, Italy
- Research Institute for Geo-Hydrological Protection (IRPI), National Research Council (CNR), Via Amendola, 122/I, 70126 Bari, Italy
| | - Sapia Murgolo
- Water Research Institute (IRSA), National Research Council (CNR), Via F. De Blasio, 5, 70132 Bari, Italy
| | - Manuel Garcia-Vara
- Water, Environmental and Food Chemistry Unit, Institute of Environmental Assessment and Water Research, C/Jordi Girona 18-26, ES 08034 Barcelona, Spain
| | - Miren López de Alda
- Water, Environmental and Food Chemistry Unit, Institute of Environmental Assessment and Water Research, C/Jordi Girona 18-26, ES 08034 Barcelona, Spain
| | - Juliane Hollender
- Eawag, Swiss Federal Institute of Aquatic Science and Technology, Überlandstrasse 133, 8600 Dübendorf, Switzerland
| | - Katarzyna Arturi
- Eawag, Swiss Federal Institute of Aquatic Science and Technology, Überlandstrasse 133, 8600 Dübendorf, Switzerland
| | - Gianluca Coppola
- White Lab Srl, Via Mons. Rodolfi 22, 36022 San Giuseppe de Cassola (VI), Italy
| | - Massimo Peruzzo
- White Lab Srl, Via Mons. Rodolfi 22, 36022 San Giuseppe de Cassola (VI), Italy
| | - Hanna Joerss
- Department for Organic Environmental Chemistry, Helmholtz Centre Hereon, Max-Planck-Str. 1, 21502 Geesthacht, Germany
| | | | - Eelco N. Pieke
- Het Waterlaboratorium, J.W. Lucasweg 2, 2031 BE Haarlem, The Netherlands
| | - Pablo Gago-Ferrero
- Human Exposure to Organic Pollutants Unit, Institute of Environmental Assessment and Water Research, C/Jordi Girona 18-26, ES 08034 Barcelona, Spain
| | - Ruben Gil-Solsona
- Human Exposure to Organic Pollutants Unit, Institute of Environmental Assessment and Water Research, C/Jordi Girona 18-26, ES 08034 Barcelona, Spain
| | - Viktória Licul-Kucera
- Institute for Analytical Research, Hochschulen Fresenius gem. Trägergesellschaft mbH, 65510 Idstein, Germany
- Institute for Biodiversity and Ecosystem Dynamics, University of Amsterdam, 1012 WP Amsterdam, Netherlands
| | - Claudio Roscioli
- Water Research Institute (IRSA), National Research Council of Italy (CNR), via del Mulino, 19, 20861 Brugherio, MB, Italy
| | - Sara Valsecchi
- Water Research Institute (IRSA), National Research Council of Italy (CNR), via del Mulino, 19, 20861 Brugherio, MB, Italy
| | - Austeja Luckute
- Analytical Chemistry Group, Department of Plant and Environmental Sciences, University of Copenhagen, Thorvaldsenvej 40, 1871 Frederiksberg, Denmark
| | - Jan H. Christensen
- Analytical Chemistry Group, Department of Plant and Environmental Sciences, University of Copenhagen, Thorvaldsenvej 40, 1871 Frederiksberg, Denmark
| | - Selina Tisler
- Analytical Chemistry Group, Department of Plant and Environmental Sciences, University of Copenhagen, Thorvaldsenvej 40, 1871 Frederiksberg, Denmark
| | - Dennis Vughs
- KWR Water Research Institute, Groningenhaven 7, 3433 PE Nieuwegein, The Netherlands
| | - Nienke Meekel
- KWR Water Research Institute, Groningenhaven 7, 3433 PE Nieuwegein, The Netherlands
| | - Begoña Talavera Andújar
- Luxembourg Centre for Systems Biomedicine (LCSB), University of Luxembourg, 6, Avenue du Swing, L-4367 Belvaux, Luxembourg
| | - Dagny Aurich
- Luxembourg Centre for Systems Biomedicine (LCSB), University of Luxembourg, 6, Avenue du Swing, L-4367 Belvaux, Luxembourg
| | - Emma L. Schymanski
- Luxembourg Centre for Systems Biomedicine (LCSB), University of Luxembourg, 6, Avenue du Swing, L-4367 Belvaux, Luxembourg
| | - Gianfranco Frigerio
- Luxembourg Centre for Systems Biomedicine (LCSB), University of Luxembourg, 6, Avenue du Swing, L-4367 Belvaux, Luxembourg
- Center for Omics Sciences (COSR), IRCCS San Raffaele Scientific Institute, 20132 Milan, Italy
| | - André Macherius
- Bavarian Environment Agency, Bürgermeister-Ulrich-Str. 160, 86179 Augsburg, Germany
| | - Uwe Kunkel
- Bavarian Environment Agency, Bürgermeister-Ulrich-Str. 160, 86179 Augsburg, Germany
| | - Tobias Bader
- Laboratory for Operation Control and Research, Zweckverband Landeswasserversorgung, Am Spitzigen Berg 1, 89129 Langenau, Germany
| | | | | | | | - W. Clay Davis
- US National Institute of Standards and Technology, 331 Fort Johnson Rd, 29412 Charleston, South Carolina, United States
| | - Bastian Schulze
- Queensland Alliance for Environmental Health Sciences, The University of Queensland, Woolloongabba, Queensland 4102, Australia
| | - Sarit Kaserzon
- Queensland Alliance for Environmental Health Sciences, The University of Queensland, Woolloongabba, Queensland 4102, Australia
| | - Martijn Pijnappels
- Ministry of Infrastructure and Water Management, Rijkswaterstaat Laboratory, Zuiderwagenplein 2, 8224 AD Lelystad, The Netherlands
| | - Mar Esperanza
- SUEZ-CIRSEE, 38 rue du president Wilson, 78230 Le Pecq, France
| | - Aurélie Fildier
- Universite Claude Bernard Lyon 1, CNRS, ISA, UMR5280, 5 rue de la Doua, F-69100 Villeurbanne, France
| | - Emmanuelle Vulliet
- Universite Claude Bernard Lyon 1, CNRS, ISA, UMR5280, 5 rue de la Doua, F-69100 Villeurbanne, France
| | - Laure Wiest
- Universite Claude Bernard Lyon 1, CNRS, ISA, UMR5280, 5 rue de la Doua, F-69100 Villeurbanne, France
| | - Adrian Covaci
- Toxicological Centre, University of Antwerp, Universiteitsplein 1, 2610 Antwerp, Belgium
| | | | - Lidia Belova
- Toxicological Centre, University of Antwerp, Universiteitsplein 1, 2610 Antwerp, Belgium
| | - Alberto Celma
- Environmental and Public Health Analytical Chemistry, Research Institute for Pesticides and Water, University Jaume I, 12006 Castelló, Spain
- Department of Aquatic Sciences and Assessment, Swedish University of Agricultural Sciences, 75007 Uppsala, Sweden
| | - Lubertus Bijlsma
- Environmental and Public Health Analytical Chemistry, Research Institute for Pesticides and Water, University Jaume I, 12006 Castelló, Spain
| | - Emilie Caupos
- LEESU, Univ Paris Est Creteil, Ecole des Ponts, F-94010 Creteil, France
- Univ Paris Est Creteil, CNRS, OSU-EFLUVE, F-94010 Creteil, France
| | | | - Julien Le Roux
- LEESU, Univ Paris Est Creteil, Ecole des Ponts, F-94010 Creteil, France
| | - Eugenie Troia
- IBED Environmental Chemistry and Mass Spectrometry Laboratories, University of Amsterdam, Science Park 904, 1098 XH Amsterdam, The Netherlands
| | - Eva de Rijke
- IBED Environmental Chemistry and Mass Spectrometry Laboratories, University of Amsterdam, Science Park 904, 1098 XH Amsterdam, The Netherlands
| | - Rick Helmus
- IBED Environmental Chemistry and Mass Spectrometry Laboratories, University of Amsterdam, Science Park 904, 1098 XH Amsterdam, The Netherlands
| | - Gaëla Leroy
- VEOLIA Recherche et Innovation, Chemin de la Digue, 78600 Maisons-Laffitte, France
| | - Niels Haelewyck
- Vlaamse Milieumaatschappij, Raymonde de Larochelaan 1, 9051 Gent, Sint-Denijs-Westerem, Belgium
| | - David Chrastina
- T. G. Masaryk Water Research Institute, p. r. i., Macharova 5, 70200 Ostrava, Czech Republic
| | - Milan Verwoert
- WLN, Rijksstraatweg 85, 9756 AD Glimmen, Groningen, The Netherlands
| | - Nikolaos S. Thomaidis
- Laboratory of Analytical Chemistry, Department of Chemistry, National and Kapodistrian University of Athens, Panepistimiopolis Zografou, 15771 Athens, Greece
| | - Anneli Kruve
- Department of Materials and Environmental Chemistry, Stockholm University, Svante Arrhenius väg 16, 11418 Stockholm, Sweden
- Department of Environmental Science, Stockholm University, Svante Arrhenius väg 8, 11418 Stockholm, Sweden
| |
|
14
|
Zhao J, Hermans E, Sepassi K, Tistaert C, Bergström CAS, Ahmad M, Larsson P. Effect of Data Quality and Data Quantity on the Estimation of Intrinsic Solubility: Analysis Based on a Single-Source Data Set. Mol Pharm 2024; 21:5261-5271. [PMID: 39267585 PMCID: PMC11462503 DOI: 10.1021/acs.molpharmaceut.4c00685] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/20/2024] [Revised: 09/05/2024] [Accepted: 09/05/2024] [Indexed: 09/17/2024]
Abstract
Aqueous solubility is one of the most important physicochemical properties of drug molecules and a major driving force for oral drug absorption. To date, the performance of in silico models for the estimation of solubility for novel chemical space is limited. To investigate possible reasons and remedies for this, the Johnson and Johnson in-house aqueous solubility data, covering over 40,000 compounds, were leveraged. All data were generated through the same high-throughput assay, providing a unique opportunity to explore the relationship between data quality, quantity, and model estimations. Six intrinsic solubility data sets with different sizes and noise levels were generated by making use of three different approaches: (i) inclusion or exclusion of amorphous solid residue, (ii) measured or experimental log D to identify the intrinsic solubility, and (iii) adopting or omitting a quality check process in the data processing workflow. A random forest regressor was trained on the data sets with three different sets of descriptors calculated from RDKit, ADMET predictor, or Mordred, and the performances were evaluated with nested cross-validation as well as ten refined test sets. The models confirm, as expected, that for the same data set size, high-quality data lead to better model performance; nevertheless, models trained on larger data sets containing analytical variability can give estimations as accurate as those of models trained on small, clean, and diverse data sets. However, noise introduced by including measurements with amorphous solid present after the solubility measurement in the training data set cannot be overcome by increasing data size, because it introduces a systematic positive error in the data set, confirming the importance of critical data review. Finally, two top-performing models were tested on the first test set from the second solubility challenge, achieving RMSE values of 0.74 and 0.72, with 46 and 48% of predictions within log S ± 0.5, respectively. These results demonstrated improved performance compared to those reported in the findings of the competition, highlighting that a single-source curated data set can enhance the prediction of intrinsic solubility.
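The two figures of merit quoted in this abstract (RMSE and the percentage of predictions within log S ± 0.5) can be computed as in the minimal sketch below. It assumes that "log S ± 0.5" means the fraction of compounds whose absolute prediction error is at most 0.5 log units; the function names and the sample values are illustrative and do not come from the study.

```python
import math
from typing import Sequence


def rmse(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Root mean square error between measured and predicted log S values."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    squared = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared) / len(squared))


def fraction_within(y_true: Sequence[float], y_pred: Sequence[float],
                    tol: float = 0.5) -> float:
    """Fraction of predictions whose absolute error is at most `tol` log units."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    hits = sum(1 for t, p in zip(y_true, y_pred) if abs(t - p) <= tol)
    return hits / len(y_true)


# Illustrative (made-up) log S values, not data from the paper.
measured = [-3.0, -4.2, -5.1, -2.7]
predicted = [-3.3, -5.0, -5.0, -4.0]
print(f"RMSE = {rmse(measured, predicted):.2f}")
print(f"within ±0.5 = {100 * fraction_within(measured, predicted):.0f}%")
```

Reporting both metrics together is useful because RMSE is dominated by large outliers, whereas the within-tolerance fraction describes how many individual predictions are practically usable.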
Affiliation(s)
- Jiaxi Zhao
- Department of Pharmacy, Uppsala University, 751 23 Uppsala, Sweden
| | - Eline Hermans
- Pharmaceutical & Material Sciences, Janssen Pharmaceutica NV, B-2340 Beerse, Belgium
| | - Kia Sepassi
- Discovery Pharmaceutics, Janssen Research & Development, LLC, La Jolla, California 92121, United States
| | - Christophe Tistaert
- Pharmaceutical & Material Sciences, Janssen Pharmaceutica NV, B-2340 Beerse, Belgium
| | | | - Mazen Ahmad
- In Silico Discovery, Janssen Pharmaceutica NV, B-2340 Beerse, Belgium
| | - Per Larsson
- Department of Pharmacy, Uppsala University, 751 23 Uppsala, Sweden
| |
|
15
|
Liu Y, Yoshizawa AC, Ling Y, Okuda S. Insights into predicting small molecule retention times in liquid chromatography using deep learning. J Cheminform 2024; 16:113. [PMID: 39375739 PMCID: PMC11460055 DOI: 10.1186/s13321-024-00905-1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/05/2024] [Accepted: 09/13/2024] [Indexed: 10/09/2024] Open
Abstract
In untargeted metabolomics, structures of small molecules are annotated using liquid chromatography-mass spectrometry by leveraging information from the molecular retention time (RT) in the chromatogram and m/z (formerly called "mass-to-charge ratio") in the mass spectrum. However, correct identification of metabolites is challenging due to the vast array of small molecules. Therefore, various in silico tools for mass spectrometry peak alignment and compound prediction have been developed; however, the list of candidate compounds remains extensive. Accurate RT prediction is important to exclude false candidates and facilitate metabolite annotation. Recent advancements in artificial intelligence (AI) have led to significant breakthroughs in the use of deep learning models in various fields. Release of a large RT dataset has mitigated the bottlenecks limiting the application of deep learning models, thereby improving their application in RT prediction tasks. This review lists the databases that can be used to expand training datasets and addresses the issue of molecular representation inconsistencies in datasets. It also discusses the application of AI technology for RT prediction, particularly in the 5 years following the release of the METLIN small molecule RT dataset. This review provides a comprehensive overview of the AI applications used for RT prediction, highlighting the progress and remaining challenges. SCIENTIFIC CONTRIBUTION: This article focuses on the advancements in small molecule retention time prediction in computational metabolomics over the past five years, with a particular emphasis on the application of AI technologies in this field. It reviews the publicly available datasets for small molecule retention time, the molecular representation methods, and the AI algorithms applied in recent studies. Furthermore, it discusses the effectiveness of these models in assisting with the annotation of small molecule structures and the challenges that must be addressed to achieve practical applications.
Affiliation(s)
- Yuting Liu
- Medical AI Center, Niigata University School of Medicine, Niigata City, Niigata, 951-8514, Japan
| | - Akiyasu C Yoshizawa
- Medical AI Center, Niigata University School of Medicine, Niigata City, Niigata, 951-8514, Japan
| | - Yiwei Ling
- Medical AI Center, Niigata University School of Medicine, Niigata City, Niigata, 951-8514, Japan
| | - Shujiro Okuda
- Medical AI Center, Niigata University School of Medicine, Niigata City, Niigata, 951-8514, Japan.
| |
|
16
|
Chen LY, Li YP. Machine learning-guided strategies for reaction conditions design and optimization. Beilstein J Org Chem 2024; 20:2476-2492. [PMID: 39376489 PMCID: PMC11457048 DOI: 10.3762/bjoc.20.212] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/28/2024] [Accepted: 09/19/2024] [Indexed: 10/09/2024] Open
Abstract
This review surveys the recent advances and challenges in predicting and optimizing reaction conditions using machine learning techniques. The paper emphasizes the importance of acquiring and processing large and diverse datasets of chemical reactions, and the use of both global and local models to guide the design of synthetic processes. Global models exploit the information from comprehensive databases to suggest general reaction conditions for new reactions, while local models fine-tune the specific parameters for a given reaction family to improve yield and selectivity. The paper also identifies the current limitations and opportunities in this field, such as the data quality and availability, and the integration of high-throughput experimentation. The paper demonstrates how the combination of chemical engineering, data science, and ML algorithms can enhance the efficiency and effectiveness of reaction conditions design, and enable novel discoveries in synthetic chemistry.
Affiliation(s)
- Lung-Yi Chen
- Department of Chemical Engineering, National Taiwan University, No. 1, Sec. 4, Roosevelt Road, Taipei 10617, Taiwan
| | - Yi-Pei Li
- Department of Chemical Engineering, National Taiwan University, No. 1, Sec. 4, Roosevelt Road, Taipei 10617, Taiwan
- Taiwan International Graduate Program on Sustainable Chemical Science and Technology (TIGP-SCST), No. 128, Sec. 2, Academia Road, Taipei 11529, Taiwan
| |
|
17
|
Esposito J, Kakar J, Khokhar T, Noll-Walker T, Omar F, Christen A, James Cleaves H, Sandora M. Comparing the complexity of written and molecular symbolic systems. Biosystems 2024; 244:105297. [PMID: 39154841 DOI: 10.1016/j.biosystems.2024.105297] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/24/2024] [Revised: 08/11/2024] [Accepted: 08/11/2024] [Indexed: 08/20/2024]
Abstract
Symbolic systems (SSs) are uniquely products of living systems, such that symbolism and life may be inextricably intertwined phenomena. Within a given SS, there is a range of symbol complexity over which signaling is functionally optimized. This range exists relative to a complex and potentially infinitely large background of latent, unused symbol space. Understanding how symbol sets sample this latent space is relevant to diverse fields including biochemistry and linguistics. We quantitatively explored the graphic complexity of two biosemiotic systems: genetically encoded amino acids (GEAAs) and written language. Molecular and graphical notions of complexity are highly correlated for GEAAs and written language. Symbol sets are generally neither minimally nor maximally complex relative to their latent spaces, but exist across an objectively definable distribution, with the GEAAs having especially low complexity. The selection pressures guiding these disparate systems are explicable by symbol production and disambiguation efficiency. These selection pressures may be universal, offer a quantifiable metric for comparison, and suggest that all life in the Universe may discover optimal symbol set complexity distributions with respect to their latent spaces. If so, the "complexity" of individual components of SSs may not be as strong a biomarker as symbol set complexity distribution.
Affiliation(s)
- Julia Esposito
- Blue Marble Space Institute of Science, Seattle, WA, USA
| | - Jyotika Kakar
- Blue Marble Space Institute of Science, Seattle, WA, USA; Department of Computer Engineering, University of Mumbai, MH, India
| | - Tasneem Khokhar
- Blue Marble Space Institute of Science, Seattle, WA, USA; Department of Physics and Astronomy, University of California, Irvine, CA, USA
| | | | - Fatima Omar
- Blue Marble Space Institute of Science, Seattle, WA, USA; Jodrell Bank Centre for Astrophysics, The University of Manchester, Oxford Road, Manchester, M13 9PL, UK
| | - Anna Christen
- Blue Marble Space Institute of Science, Seattle, WA, USA
| | - H James Cleaves
- Department of Chemistry, Howard University, Washington, DC, 20059, USA; Blue Marble Space Institute of Science, Seattle, WA, USA; Earth-Life Science Institute, Tokyo Institute of Technology, Meguro-ku, Tokyo, Japan.
| | | |
|
18
|
Vyas SK, Das A, Suryanarayana Murty U, Dixit VA. Sulfotransferase-mediated phase II drug metabolism prediction of substrates and sites using accessibility and reactivity-based algorithms. Mol Inform 2024; 43:e202400008. [PMID: 39110066 DOI: 10.1002/minf.202400008] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/05/2024] [Revised: 05/18/2024] [Accepted: 06/24/2024] [Indexed: 10/16/2024]
Abstract
Sulfotransferases (SULTs) are a major class of phase II metabolic enzymes, contributing ~20 % of the phase II metabolism of FDA-approved drugs. Ignoring the potential for SULT-mediated metabolism leaves a strong potential for drug-drug interactions, often causing late-stage drug discovery failures or black-box warnings on FDA labels. The existing models use only accessibility descriptors and machine learning (ML) methods for class and site of sulfonation (SOS) predictions for SULT. In this study, a variety of accessibility, reactivity, and hybrid models and algorithms have been developed to make accurate substrate and SOS predictions. Unlike the literature models, a reactivity parameter for the aliphatic or aromatic hydroxyl groups (R/Ar-O-H), the bond dissociation energy (BDE), gave accurate models with a true positive rate (TPR) of 0.84 for SOS predictions. We offer mechanistic insights to explain these novel findings, which are not recognized in the literature. Accessibility parameters such as the ratio of the Chemgauss4 score (CGS) to molecular weight (MW), CGS/MW, and the distance from the cofactor (Dis) were essential for class predictions and showed TPR=0.72. Substrates consistently had lower BDE, Dis, and CGS/MW than non-substrates. Hybrid models also performed acceptably for SOS predictions. Using the best models, the algorithms gave an acceptable performance in class prediction (TPR=0.62, false positive rate (FPR)=0.24, balanced accuracy (BA)=0.69) and SOS prediction (TPR=0.98, FPR=0.60, and BA=0.69). Adding a rule-based method improved the algorithm's TPR, FPR, and BA. Validation using an external dataset of drug-like compounds gave class prediction TPR=0.67 and FPR=0.00, and SOS prediction TPR=0.80 and FPR=0.44, for the best algorithm. Comparisons with standard ML models also show that our algorithm achieves higher predictive performance for classification on external datasets. Overall, these models and algorithms (SOS predictor) give accurate substrate class and site (SOS) predictions for SULT-mediated Phase II metabolism and will be valuable to the drug discovery community in academia and industry. The SOS predictor is freely available for academic/non-profit research via the GitHub link.
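The balanced accuracy values in this abstract can be cross-checked from the quoted rates. The sketch below assumes the standard definition, BA = (TPR + TNR) / 2 with TNR = 1 − FPR; the abstract does not state which definition the authors used, and the function name is illustrative.

```python
def balanced_accuracy(tpr: float, fpr: float) -> float:
    """Balanced accuracy as the mean of sensitivity (TPR) and specificity (1 - FPR)."""
    if not (0.0 <= tpr <= 1.0 and 0.0 <= fpr <= 1.0):
        raise ValueError("rates must lie in [0, 1]")
    return (tpr + (1.0 - fpr)) / 2.0


# Rates quoted in the abstract for the best models.
class_ba = balanced_accuracy(tpr=0.62, fpr=0.24)  # class prediction
sos_ba = balanced_accuracy(tpr=0.98, fpr=0.60)    # site-of-sulfonation prediction
print(f"class prediction BA = {class_ba:.2f}")
print(f"SOS prediction BA   = {sos_ba:.2f}")
```

Under this definition both cases come out at 0.69, matching the abstract. The SOS result shows why BA is reported alongside TPR: a very high TPR (0.98) can coexist with a high FPR (0.60), which BA penalizes.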
Affiliation(s)
- Shivam Kumar Vyas
- Department of Medicinal Chemistry, Department of Pharmaceuticals, Ministry of Chemicals & Fertilizers, Govt. of India, Sila Katamur (Halugurisuk), P.O.: Changsari, Dist: Kamrup, Pin, National Institute of Pharmaceutical Education and Research, Guwahati, (NIPER Guwahati), Guwahati, Assam, 781101, India
| | - Avik Das
- Department of Pharmacy, Birla Institute of Technology and Sciences Pilani (BITS-Pilani), Vidya Vihar Campus 41, Pilani, Rajasthan, 333031, India
- Current address: Department of Primary Intelligence, IQVIA, Sarjapur-Marathahalli Outer Ring Road Embassy Tech Square, Bangalore, 560103 Karnataka, India
| | - Upadhyayula Suryanarayana Murty
- Department of Medicinal Chemistry, Department of Pharmaceuticals, Ministry of Chemicals & Fertilizers, Govt. of India, Sila Katamur (Halugurisuk), P.O.: Changsari, Dist: Kamrup, Pin, National Institute of Pharmaceutical Education and Research, Guwahati, (NIPER Guwahati), Guwahati, Assam, 781101, India
| | - Vaibhav A Dixit
- Department of Medicinal Chemistry, Department of Pharmaceuticals, Ministry of Chemicals & Fertilizers, Govt. of India, Sila Katamur (Halugurisuk), P.O.: Changsari, Dist: Kamrup, Pin, National Institute of Pharmaceutical Education and Research, Guwahati, (NIPER Guwahati), Guwahati, Assam, 781101, India
| |
|
19
|
Monem S, Hassanien AE, Abdel-Hamid AH. A multi-view feature representation for predicting drugs combination synergy based on ensemble and multi-task attention models. J Cheminform 2024; 16:110. [PMID: 39334437 PMCID: PMC11438216 DOI: 10.1186/s13321-024-00903-3] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/11/2024] [Accepted: 09/08/2024] [Indexed: 09/30/2024] Open
Abstract
This paper proposes a novel multi-view ensemble predictor model that is designed to address the challenge of determining synergistic drug combinations by predicting both the synergy score value and the synergy class label of drug combinations with cancer cell lines. The proposed methodology involves representing drug features through four distinct views: Simplified Molecular-Input Line-Entry System (SMILES) features, molecular graph features, fingerprint features, and drug-target features. Cell line features, in turn, are captured through four views: gene expression features, copy number features, mutation features, and proteomics features. To prevent overfitting of the model, two techniques are employed. First, each view feature of a drug is paired with each corresponding cell line view and input into a multi-task attention deep learning model. This multi-task model is trained to simultaneously predict both the synergy score value and synergy class label. This process results in sixteen input view features being fed into the multi-task model, producing sixteen prediction values. Subsequently, these prediction values are utilized as inputs for an ensemble model, which outputs the final prediction value. The 'MVME' model is assessed using the O'Neil dataset, which includes 38 distinct drugs combined across 39 distinct cancer cell lines to output 22,737 drug combination pairs. For the synergy score value, the proposed model scores a mean square error (MSE) of 206.57, a root mean square error (RMSE) of 14.30, and a Pearson score of 0.76. For the synergy class label, the model scores 0.90 for accuracy, 0.96 for precision, 0.57 for kappa, 0.96 for the area under the ROC curve (ROC-AUC), and 0.88 for the area under the precision-recall curve (PR-AUC).
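As a reading aid for the regression metrics above, the sketch below shows how RMSE relates to MSE. The per-fold MSE values are hypothetical and were chosen only so that their mean equals the reported 206.57; the abstract does not say how the two figures were aggregated, so the fold-averaging scenario is an assumption, not the authors' procedure.

```python
import math

# Hypothetical per-fold MSE values (illustrative only); their mean is 206.57.
fold_mse = [180.0, 206.57, 233.14]

mean_mse = sum(fold_mse) / len(fold_mse)

# RMSE computed from the pooled (mean) MSE.
rmse_of_mean_mse = math.sqrt(mean_mse)

# RMSE computed per fold and then averaged. By Jensen's inequality this
# is never larger than the square root of the mean MSE.
mean_of_fold_rmse = sum(math.sqrt(m) for m in fold_mse) / len(fold_mse)

print(f"sqrt(mean MSE)    = {rmse_of_mean_mse:.2f}")
print(f"mean of fold RMSE = {mean_of_fold_rmse:.2f}")
```

The square root of 206.57 is about 14.37, slightly above the reported RMSE of 14.30. Averaging RMSE over folds or repeats is one way such a gap can arise, though the abstract alone does not confirm it.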
Affiliation(s)
- Samar Monem
- Mathematics and Computer Science Department, Faculty of Science, Beni-Suef University, Beni-Suef, 62521, Egypt.
| | | | - Alaa H Abdel-Hamid
- Mathematics and Computer Science Department, Faculty of Science, Beni-Suef University, Beni-Suef, 62521, Egypt
| |
|
20
|
Bachorz RA, Nowak D, Ratajewski M. QSPRmodeler - An open source application for molecular predictive analytics. FRONTIERS IN BIOINFORMATICS 2024; 4:1441024. [PMID: 39391332 PMCID: PMC11464749 DOI: 10.3389/fbinf.2024.1441024] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/30/2024] [Accepted: 08/27/2024] [Indexed: 10/12/2024] Open
Abstract
The drug design process can be successfully supported using a variety of in silico methods. Some of these are oriented toward molecular property prediction, which is a key step in the early drug discovery stage. Before experimental validation, drug candidates are usually compared with known experimental data. Technically, this can be achieved using machine learning approaches, in which selected experimental data are used to train the predictive models. The proposed Python software is designed for this purpose. It supports the entire workflow of molecular data processing, starting from raw data preparation followed by molecular descriptor creation and machine learning model training. The predictive capabilities of the resulting models were carefully validated internally and externally. These models can be easily applied to new compounds, including within more complex workflows involving generative approaches.
Affiliation(s)
- Rafał A. Bachorz
- Institute of Medical Biology, Polish Academy of Sciences, Łódź, Poland
| | - Damian Nowak
- Institute of Medical Biology, Polish Academy of Sciences, Łódź, Poland
- Department of Quantum Chemistry, Faculty of Chemistry, Adam Mickiewicz University, Poznań, Poland
| | - Marcin Ratajewski
- Institute of Medical Biology, Polish Academy of Sciences, Łódź, Poland
| |
|
21
|
Ramahi A, Shinde V, Pearce T, Sinka C. Virtual screening of drug materials for pharmaceutical tablet manufacturability with reference to sticking. Int J Pharm 2024:124722. [PMID: 39293578 DOI: 10.1016/j.ijpharm.2024.124722] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/14/2024] [Revised: 09/12/2024] [Accepted: 09/13/2024] [Indexed: 09/20/2024]
Abstract
The manufacturing of pharmaceutical solid dosage forms, such as tablets, involves a large number of successive processing operations including crystallisation of the drug substance, granulation, drying, milling, mixing of the formulation, and compaction. Each step is fraught with manufacturing problems. Undesired adhesion of powders to the surface of the compaction tooling, known as sticking, is a frequent and highly disruptive problem that occurs at the very end of the process chain when the tablet is formed. As an alternative to the mechanistic approaches to address sticking, we introduce two different machine learning strategies to predict sticking directly from the chemical formula of the drug substance, represented by molecular descriptors. An empirical database for sticking behaviour was developed and used to train the machine learning (ML) algorithms to predict sticking properties from molecular descriptors. The ML model has successfully classified sticking/non-sticking behaviour of powders with 100% separation. Predictions were made for materials in the Handbook of Pharmaceutical Excipients and a subset of molecules included in the ChEMBL database, demonstrating the potential use of machine learning approaches to screen for sticking propensity early, at the drug discovery and development stages. This is the first time molecular descriptors and machine learning have been used to predict and screen for sticking behaviour. The method has the potential to transform the development of medicines by providing manufacturability information at the drug screening stage and is potentially applicable to other manufacturing problems controlled by the chemistry of the drug substance.
|
22
|
Xiao Z, Zhu M, Chen J, You Z. Integrated Transfer Learning and Multitask Learning Strategies to Construct Graph Neural Network Models for Predicting Bioaccumulation Parameters of Chemicals. ENVIRONMENTAL SCIENCE & TECHNOLOGY 2024; 58:15650-15660. [PMID: 39051472 DOI: 10.1021/acs.est.4c02421] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 07/27/2024]
Abstract
Accurate prediction of parameters related to the environmental exposure of chemicals is crucial for the sound management of chemicals. However, the lack of large data sets for training models may result in poor prediction accuracy and robustness. Herein, an integrated transfer learning (TL) and multitask learning (MTL) strategy was proposed for constructing a graph neural network (GNN) model (abbreviated as the TL-MTL-GNN model) using n-octanol/water partition coefficients as a source domain. The TL-MTL-GNN model was trained to predict three bioaccumulation parameters based on enlarged data sets that cover 2496 compounds with at least one bioaccumulation parameter. Results show that the TL-MTL-GNN model outperformed single-task GNN models with and without TL, as well as conventional machine learning models trained with molecular descriptors or fingerprints. Applicability domains were characterized by a state-of-the-art structure-activity landscape-based (abbreviated as ADSAL) methodology. The TL-MTL-GNN model coupled with the optimal ADSAL was employed to predict bioaccumulation parameters for around 60,000 chemicals, with more than 13,000 compounds identified as bioaccumulative chemicals. The high predictive accuracy and robustness of the TL-MTL-GNN model demonstrate the feasibility of integrating the TL and MTL strategies in modeling small-sized data sets. The strategy holds significant potential for addressing small data challenges in modeling environmental chemicals.
Affiliation(s)
- Zijun Xiao
- Key Laboratory of Industrial Ecology and Environmental Engineering (Ministry of Education), Dalian Key Laboratory on Chemicals Risk Control and Pollution Prevention Technology, School of Environmental Science and Technology, Dalian University of Technology, Dalian 116024, China
| | - Minghua Zhu
- Key Laboratory of Industrial Ecology and Environmental Engineering (Ministry of Education), Dalian Key Laboratory on Chemicals Risk Control and Pollution Prevention Technology, School of Environmental Science and Technology, Dalian University of Technology, Dalian 116024, China
- Key Laboratory of Integrated Regulation and Resources Development of Shallow Lakes of Ministry of Education, College of Environment, Hohai University, Nanjing 210098, China
| | - Jingwen Chen
- Key Laboratory of Industrial Ecology and Environmental Engineering (Ministry of Education), Dalian Key Laboratory on Chemicals Risk Control and Pollution Prevention Technology, School of Environmental Science and Technology, Dalian University of Technology, Dalian 116024, China
| | - Zecang You
- Key Laboratory of Industrial Ecology and Environmental Engineering (Ministry of Education), Dalian Key Laboratory on Chemicals Risk Control and Pollution Prevention Technology, School of Environmental Science and Technology, Dalian University of Technology, Dalian 116024, China
| |
|
23
|
Stankovic B, Marinkovic F. A novel procedure for selection of molecular descriptors: QSAR model for mutagenicity of nitroaromatic compounds. ENVIRONMENTAL SCIENCE AND POLLUTION RESEARCH INTERNATIONAL 2024; 31:54603-54617. [PMID: 39207617 DOI: 10.1007/s11356-024-34800-x] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/19/2024] [Accepted: 08/22/2024] [Indexed: 09/04/2024]
Abstract
Nitroaromatic compounds (NACs) stand out as pervasive organic pollutants, prompting an imperative need to investigate their hazardous effects. Computational chemistry methods play a crucial role in this exploration, offering a safer and more time-efficient approach, mandated by various legislations. In this study, our focus lay on the development of transparent, interpretable, reproducible, and publicly available methodologies aimed at deriving quantitative structure-activity relationship models and testing them by modelling the mutagenicity of NACs against the Salmonella typhimurium TA100 strain. Descriptors were selected from Mordred and RDKit molecular descriptors, along with several quantum chemistry descriptors. For that purpose, the genetic algorithm (GA), as the most widely used method in the literature, and three alternative algorithms (Boruta, Featurewiz, and ForwardSelector) combined with the forward stepwise selection technique were used. The construction of models utilized the multiple linear regression method, with subsequent scrutiny of fitting and predictive performance, reliability, and robustness through various statistical validation criteria. The models were ranked by the Multi-Criteria Decision Making procedure. Findings have revealed that the proposed methodology for descriptor selection outperforms GA, with Featurewiz showing a slight advantage over Boruta and ForwardSelector. These constructed models can serve as valuable tools for the quick and reliable prediction of NACs mutagenicity.
Affiliation(s)
- Branislav Stankovic
- Department for Nuclear and Plasma Physics, Vinča Institute of Nuclear Sciences -National Institute of the Republic of Serbia, University of Belgrade, Belgrade, Serbia.
| | | |
|
24
|
Tom G, Schmid SP, Baird SG, Cao Y, Darvish K, Hao H, Lo S, Pablo-García S, Rajaonson EM, Skreta M, Yoshikawa N, Corapi S, Akkoc GD, Strieth-Kalthoff F, Seifrid M, Aspuru-Guzik A. Self-Driving Laboratories for Chemistry and Materials Science. Chem Rev 2024; 124:9633-9732. [PMID: 39137296 PMCID: PMC11363023 DOI: 10.1021/acs.chemrev.4c00055] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 08/15/2024]
Abstract
Self-driving laboratories (SDLs) promise an accelerated application of the scientific method. Through the automation of experimental workflows, along with autonomous experimental planning, SDLs hold the potential to greatly accelerate research in chemistry and materials discovery. This review provides an in-depth analysis of the state-of-the-art in SDL technology, its applications across various scientific disciplines, and the potential implications for research and industry. This review additionally provides an overview of the enabling technologies for SDLs, including their hardware, software, and integration with laboratory infrastructure. Most importantly, this review explores the diverse range of scientific domains where SDLs have made significant contributions, from drug discovery and materials science to genomics and chemistry. We provide a comprehensive review of existing real-world examples of SDLs, their different levels of automation, and the challenges and limitations associated with each domain.
Affiliation(s)
- Gary Tom
- Department of Chemistry, University of Toronto, 80 St. George St, Toronto, Ontario M5S 3H6, Canada
- Department of Computer Science, University of Toronto, 40 St. George St, Toronto, Ontario M5S 2E4, Canada
- Vector Institute for Artificial Intelligence, 661 University Ave Suite 710, Toronto, Ontario M5G 1M1, Canada
| | - Stefan P. Schmid
- Department of Chemistry and Applied Biosciences, ETH Zurich, Vladimir-Prelog-Weg 1, CH-8093 Zurich, Switzerland
| | - Sterling G. Baird
- Acceleration Consortium, 80 St. George St, Toronto, Ontario M5S 3H6, Canada
| | - Yang Cao
- Department of Chemistry, University of Toronto, 80 St. George St, Toronto, Ontario M5S 3H6, Canada
- Department of Computer Science, University of Toronto, 40 St. George St, Toronto, Ontario M5S 2E4, Canada
- Acceleration Consortium, 80 St. George St, Toronto, Ontario M5S 3H6, Canada
| | - Kourosh Darvish
- Department of Computer Science, University of Toronto, 40 St. George St, Toronto, Ontario M5S 2E4, Canada
- Vector Institute for Artificial Intelligence, 661 University Ave Suite 710, Toronto, Ontario M5G 1M1, Canada
- Acceleration Consortium, 80 St. George St, Toronto, Ontario M5S 3H6, Canada
| | - Han Hao
- Department of Chemistry, University of Toronto, 80 St. George St, Toronto, Ontario M5S 3H6, Canada
- Department of Computer Science, University of Toronto, 40 St. George St, Toronto, Ontario M5S 2E4, Canada
- Acceleration Consortium, 80 St. George St, Toronto, Ontario M5S 3H6, Canada
| | - Stanley Lo
- Department of Chemistry, University of Toronto, 80 St. George St, Toronto, Ontario M5S 3H6, Canada
| | - Sergio Pablo-García
- Department of Chemistry, University of Toronto, 80 St. George St, Toronto, Ontario M5S 3H6, Canada
- Department of Computer Science, University of Toronto, 40 St. George St, Toronto, Ontario M5S 2E4, Canada
| | - Ella M. Rajaonson
- Department of Chemistry, University of Toronto, 80 St. George St, Toronto, Ontario M5S 3H6, Canada
- Vector Institute for Artificial Intelligence, 661 University Ave Suite 710, Toronto, Ontario M5G 1M1, Canada
| | - Marta Skreta
- Department of Computer Science, University of Toronto, 40 St. George St, Toronto, Ontario M5S 2E4, Canada
- Vector Institute for Artificial Intelligence, 661 University Ave Suite 710, Toronto, Ontario M5G 1M1, Canada
| | - Naruki Yoshikawa
- Department of Computer Science, University of Toronto, 40 St. George St, Toronto, Ontario M5S 2E4, Canada
- Vector Institute for Artificial Intelligence, 661 University Ave Suite 710, Toronto, Ontario M5G 1M1, Canada
| | - Samantha Corapi
- Department of Chemistry, University of Toronto, 80 St. George St, Toronto, Ontario M5S 3H6, Canada
| | - Gun Deniz Akkoc
- Forschungszentrum Jülich GmbH, Helmholtz Institute for Renewable Energy Erlangen-Nürnberg, Cauerstr. 1, 91058 Erlangen, Germany
- Department of Chemical and Biological Engineering, Friedrich-Alexander Universität Erlangen-Nürnberg, Egerlandstr. 3, 91058 Erlangen, Germany
| | - Felix Strieth-Kalthoff
- Department of Chemistry, University of Toronto, 80 St. George St, Toronto, Ontario M5S 3H6, Canada
- Department of Computer Science, University of Toronto, 40 St. George St, Toronto, Ontario M5S 2E4, Canada
- School of Mathematics and Natural Sciences, University of Wuppertal, Gaußstraße 20, 42119 Wuppertal, Germany
| | - Martin Seifrid
- Department of Chemistry, University of Toronto, 80 St. George St, Toronto, Ontario M5S 3H6, Canada
- Department of Computer Science, University of Toronto, 40 St. George St, Toronto, Ontario M5S 2E4, Canada
- Department of Materials Science and Engineering, North Carolina State University, Raleigh, North Carolina 27695, United States of America
| | - Alán Aspuru-Guzik
- Department of Chemistry, University of Toronto, 80 St. George St, Toronto, Ontario M5S 3H6, Canada
- Department of Computer Science, University of Toronto, 40 St. George St, Toronto, Ontario M5S 2E4, Canada
- Vector Institute for Artificial Intelligence, 661 University Ave Suite 710, Toronto, Ontario M5G 1M1, Canada
- Acceleration Consortium, 80 St. George St, Toronto, Ontario M5S 3H6, Canada
- Department of Chemical Engineering & Applied Chemistry, University of Toronto, Toronto, Ontario M5S 3E5, Canada
- Department of Materials Science & Engineering, University of Toronto, Toronto, Ontario M5S 3E4, Canada
- Lebovic Fellow, Canadian Institute for Advanced Research (CIFAR), 661 University Ave, Toronto, Ontario M5G 1M1, Canada
| |
|
25
|
Arab I, Laukens K, Bittremieux W. Semisupervised Learning to Boost hERG, Nav1.5, and Cav1.2 Cardiac Ion Channel Toxicity Prediction by Mining a Large Unlabeled Small Molecule Data Set. J Chem Inf Model 2024; 64:6410-6420. [PMID: 39110924 DOI: 10.1021/acs.jcim.4c01102] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 08/27/2024]
Abstract
Predicting drug toxicity is a critical aspect of ensuring patient safety during the drug design process. Although conventional machine learning techniques have shown some success in this field, the scarcity of annotated toxicity data poses a significant challenge in enhancing models' performance. In this study, we explore the potential of leveraging large unlabeled small molecule data sets using semisupervised learning to improve drug cardiotoxicity predictive performance across three cardiac ion channel targets: the voltage-gated potassium channel (hERG), the voltage-gated sodium channel (Nav1.5), and the voltage-gated calcium channel (Cav1.2). We extensively mined the ChEMBL database, comprising approximately 2 million small molecules, and then employed semisupervised learning to construct robust classification models for this purpose. We achieved a performance boost on highly diverse (i.e., structurally dissimilar) test data sets across all three targets. Using our built models, we screened the whole ChEMBL database and a large set of FDA-approved drugs, identifying several compounds with potential cardiac ion channel activity. To ensure broad accessibility and usability for both technical and nontechnical users, we developed a cross-platform graphical user interface that allows users to make predictions and gain insights into the cardiotoxicity of drugs and other small molecules. The software is made available as open source under the permissive MIT license at https://github.com/issararab/CToxPred2.
Affiliation(s)
- Issar Arab
- Department of Computer Science, University of Antwerp, 2020 Antwerp, Belgium
- Biomedical Informatics Network Antwerpen (biomina), 2020 Antwerp, Belgium
| | - Kris Laukens
- Department of Computer Science, University of Antwerp, 2020 Antwerp, Belgium
- Biomedical Informatics Network Antwerpen (biomina), 2020 Antwerp, Belgium
| | - Wout Bittremieux
- Department of Computer Science, University of Antwerp, 2020 Antwerp, Belgium
- Biomedical Informatics Network Antwerpen (biomina), 2020 Antwerp, Belgium
| |
|
26
|
Yang Y, Gan W, Lin L, Wang L, Wu J, Luo J. Identification of Active Molecules against Thrombocytopenia through Machine Learning. J Chem Inf Model 2024; 64:6506-6520. [PMID: 39109515 DOI: 10.1021/acs.jcim.4c00718] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 08/27/2024]
Abstract
Thrombocytopenia, which is associated with thrombopoietin (TPO) deficiency, presents very limited treatment options and can lead to life-threatening complications. Discovering new therapeutic agents against thrombocytopenia has proven to be a challenging task using traditional screening approaches. Fortunately, machine learning (ML) techniques offer a rapid avenue for exploring chemical space, thereby increasing the likelihood of uncovering new drug candidates. In this study, we focused on computational modeling for drug-induced megakaryocyte differentiation and platelet production using ML methods, aiming to gain insights into the structural characteristics of hematopoietic activity. We developed 112 different classifiers by combining eight ML algorithms with 14 molecule features. The top-performing model achieved good results on both 5-fold cross-validation (with an accuracy of 81.6% and MCC value of 0.589) and external validation (with an accuracy of 83.1% and MCC value of 0.642). Additionally, by leveraging the Shapley additive explanations method, the best model provided quantitative assessments of molecular properties and structures that significantly contributed to the predictions. Furthermore, we employed an ensemble strategy to integrate predictions from multiple models and performed in silico predictions for new molecules with potential activity against thrombocytopenia, sourced from traditional Chinese medicine and the Drug Repurposing Hub. The findings of this study could offer valuable insights into the structural characteristics and computational prediction of thrombopoiesis inducers.
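The abstract reports both accuracy and the Matthews correlation coefficient (MCC). A minimal sketch of how the two are computed from a binary confusion matrix follows; the counts and the function name `mcc_from_counts` are hypothetical illustrations, not values or code from the study.

```python
import math


def mcc_from_counts(tp: int, tn: int, fp: int, fn: int) -> float:
    """Matthews correlation coefficient from binary confusion-matrix counts.

    Returns 0.0 when any marginal total is zero (a common convention).
    """
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        return 0.0
    return (tp * tn - fp * fn) / denom


# Hypothetical counts for illustration only (100 molecules in total).
tp, tn, fp, fn = 45, 38, 7, 10

accuracy = (tp + tn) / (tp + tn + fp + fn)
mcc = mcc_from_counts(tp, tn, fp, fn)

print(f"accuracy = {accuracy:.3f}, MCC = {mcc:.3f}")
```

MCC uses all four cells of the confusion matrix, so it can sit well below accuracy when the classes are imbalanced, which is a common reason to report the two together.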
Affiliation(s)
- Youyou Yang
- Department of Pharmacology, School of Pharmacy, Southwest Medical University, Luzhou 646000, China
| | - Wenli Gan
- Department of Pharmacology, School of Pharmacy, Southwest Medical University, Luzhou 646000, China
| | - Lei Lin
- School of Public Health, Southwest Medical University, Luzhou 646000, China
| | - Long Wang
- Department of Pharmacology, School of Pharmacy, Southwest Medical University, Luzhou 646000, China
| | - Jianming Wu
- Basic Medical College, Southwest Medical University, Luzhou 646000, China
| | - Jiesi Luo
- Basic Medical College, Southwest Medical University, Luzhou 646000, China
- State Key Laboratory of Southwestern Chinese Medicine Resources, Chengdu University of Traditional Chinese Medicine, Chengdu 610075, China
| |
|
27
|
Khan MZI, Ren JN, Cao C, Ye HYX, Wang H, Guo YM, Yang JR, Chen JZ. Comprehensive hepatotoxicity prediction: ensemble model integrating machine learning and deep learning. Front Pharmacol 2024; 15:1441587. [PMID: 39234116 PMCID: PMC11373136 DOI: 10.3389/fphar.2024.1441587] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/31/2024] [Accepted: 07/24/2024] [Indexed: 09/06/2024] Open
Abstract
Background: Chemicals may lead to acute liver injuries, posing a serious threat to human health. Achieving the precise safety profile of a compound is challenging due to the complex and expensive testing procedures. In silico approaches will aid in identifying the potential risk of drug candidates in the initial stage of drug development, thus mitigating the developmental cost. Methods: In the current study, QSAR models were developed for hepatotoxicity predictions using an ensemble strategy to integrate machine learning (ML) and deep learning (DL) algorithms using various molecular features. A large dataset of 2588 chemicals and drugs was randomly divided into training (80%) and test (20%) sets, followed by the training of individual base models using diverse machine learning or deep learning algorithms based on three different kinds of descriptors and fingerprints. Feature selection approaches were employed to optimize the models based on model performance. Hybrid ensemble approaches were further utilized to determine the method with the best performance. Results: The voting ensemble classifier emerged as the optimal model, achieving a prediction accuracy of 80.26%, an AUC of 82.84%, and a recall of over 93%, followed by the bagging and stacking ensemble classifiers. The model was further verified by an external test set, internal 10-fold cross-validation, and rigorous benchmark training, exhibiting much better reliability than the published models. Conclusion: The proposed ensemble model offers a dependable assessment with good performance for predicting the risk of chemicals and drugs inducing liver damage.
Affiliation(s)
| | - Jia-Nan Ren
- College of Pharmaceutical Sciences, Zhejiang University, Hangzhou, China
| | - Cheng Cao
- College of Pharmaceutical Sciences, Zhejiang University, Hangzhou, China
- Polytechnic Institute, Zhejiang University, Hangzhou, China
| | - Hong-Yu-Xiang Ye
- College of Pharmaceutical Sciences, Zhejiang University, Hangzhou, China
| | - Hao Wang
- College of Pharmaceutical Sciences, Zhejiang University, Hangzhou, China
| | - Ya-Min Guo
- College of Pharmaceutical Sciences, Zhejiang University, Hangzhou, China
| | - Jin-Rong Yang
- College of Pharmaceutical Sciences, Zhejiang University, Hangzhou, China
- Polytechnic Institute, Zhejiang University, Hangzhou, China
| | - Jian-Zhong Chen
- College of Pharmaceutical Sciences, Zhejiang University, Hangzhou, China
| |
|
28
|
Bosten E, Pardon M, Chen K, Koppen V, Van Herck G, Hellings M, Cabooter D. Assisted Active Learning for Model-Based Method Development in Liquid Chromatography. Anal Chem 2024; 96:13699-13709. [PMID: 38979746 DOI: 10.1021/acs.analchem.4c02700] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 07/10/2024]
Abstract
In recent decades, there has been a growing interest in fully automated methods for tackling complex optimization problems across various fields. Active learning (AL) and its variant, assisted active learning (AAL), incorporating guidance or assistance from external sources into the learning process, play key roles in this automation by enabling the autonomous selection of optimal experimental conditions to efficiently explore the problem space. These approaches are particularly valuable in situations wherein experimentation is costly or time-consuming. This study explores the application of AAL in model-based method development (MD) for liquid chromatography (LC) by using Bayesian statistics to incorporate historical data and analyte information for the generation of initial retention models. The process involves updating the model parameters based on new experiments, coupled with an active data selection method to choose the most informative experiment to run in a subsequent step. This iterative process balances model exploitation and experimental exploration until a satisfactory separation is achieved. The effectiveness of this approach is demonstrated via two practical examples, resulting in optimized separations in a limited number of experiments by optimizing the gradient slope. It is shown that the ability of AAL to leverage past knowledge and compound information to improve accuracy and reduce experimental runs offers a flexible alternative approach to fixed design methods.
Affiliation(s)
- Emery Bosten
- Department for Pharmaceutical and Pharmacological Sciences, Pharmaceutical Analysis, University of Leuven (KU Leuven), Herestraat 49, 3000 Leuven, Belgium
- Therapeutics Development & Supply, Janssen Pharmaceutica, Turnhoutseweg 30, B-2340 Beerse, Belgium
| | - Marie Pardon
- Department for Pharmaceutical and Pharmacological Sciences, Pharmaceutical Analysis, University of Leuven (KU Leuven), Herestraat 49, 3000 Leuven, Belgium
| | - Kai Chen
- Therapeutics Development & Supply, Janssen Pharmaceutica, Turnhoutseweg 30, B-2340 Beerse, Belgium
| | - Valerie Koppen
- Therapeutics Development & Supply, Janssen Pharmaceutica, Turnhoutseweg 30, B-2340 Beerse, Belgium
| | - Gerd Van Herck
- Therapeutics Development & Supply, Janssen Pharmaceutica, Turnhoutseweg 30, B-2340 Beerse, Belgium
| | - Mario Hellings
- Therapeutics Development & Supply, Janssen Pharmaceutica, Turnhoutseweg 30, B-2340 Beerse, Belgium
| | - Deirdre Cabooter
- Department for Pharmaceutical and Pharmacological Sciences, Pharmaceutical Analysis, University of Leuven (KU Leuven), Herestraat 49, 3000 Leuven, Belgium
| |
|
29
|
Seal S, Williams D, Hosseini-Gerami L, Mahale M, Carpenter AE, Spjuth O, Bender A. Improved Detection of Drug-Induced Liver Injury by Integrating Predicted In Vivo and In Vitro Data. Chem Res Toxicol 2024; 37:1290-1305. [PMID: 38981058 PMCID: PMC11337212 DOI: 10.1021/acs.chemrestox.4c00015] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/10/2024] [Revised: 06/27/2024] [Accepted: 07/01/2024] [Indexed: 07/11/2024]
Abstract
Drug-induced liver injury (DILI) has been a significant challenge in drug discovery, often leading to clinical trial failures and necessitating drug withdrawals. Over the last decade, the existing suite of in vitro proxy-DILI assays has generally improved at identifying compounds with hepatotoxicity. However, there is considerable interest in enhancing the in silico prediction of DILI because it allows for evaluating large sets of compounds more quickly and cost-effectively, particularly in the early stages of projects. In this study, we investigate ML models for DILI prediction that first predict nine proxy-DILI labels and then use them as features in addition to chemical structural features to predict DILI. The features include in vitro (e.g., mitochondrial toxicity, bile salt export pump inhibition) data, in vivo (e.g., preclinical rat hepatotoxicity studies) data, pharmacokinetic parameters of maximum concentration, structural fingerprints, and physicochemical parameters. We trained DILI-prediction models on 888 compounds from the DILI data set (composed of DILIst and DILIrank) and tested them on a held-out external test set of 223 compounds from the DILI data set. The best model, DILIPredictor, attained an AUC-PR of 0.79. This model enabled the detection of the top 25 toxic compounds (2.68 LR+, positive likelihood ratio) compared to models using only structural features (1.65 LR+ score). Using feature interpretation from DILIPredictor, we identified the chemical substructures causing DILI and differentiated cases of DILI caused by compounds in animals but not in humans. For example, DILIPredictor correctly recognized 2-butoxyethanol as nontoxic in humans despite its hepatotoxicity in mouse models. Overall, the DILIPredictor model improves the detection of compounds causing DILI with an improved differentiation between animal and human sensitivity and the potential for mechanism evaluation.
DILIPredictor required only chemical structures as input for prediction and is publicly available at https://broad.io/DILIPredictor for use via web interface and with all code available for download.
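The positive likelihood ratio (LR+) quoted in this abstract is, by its standard definition, sensitivity divided by the false positive rate. The following is a minimal sketch of that calculation only; the function name and the confusion-matrix counts are hypothetical and are not taken from the paper or from the DILIPredictor code.

```python
def positive_likelihood_ratio(tp: int, fp: int, tn: int, fn: int) -> float:
    """Return LR+ = sensitivity / (1 - specificity) from confusion-matrix counts."""
    sensitivity = tp / (tp + fn)          # true positive rate
    false_positive_rate = fp / (fp + tn)  # equals 1 - specificity
    if false_positive_rate == 0:
        # No false positives: the ratio is unbounded.
        return float("inf")
    return sensitivity / false_positive_rate


# Hypothetical counts for illustration only (not from the study).
print(round(positive_likelihood_ratio(tp=40, fp=15, tn=35, fn=10), 2))  # about 2.67
```

An LR+ above 1 means a positive prediction raises the odds that a compound is truly toxic; the larger the value, the more informative a positive call is.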
Affiliation(s)
- Srijit Seal
- Yusuf Hamied Department of Chemistry, University of Cambridge, Lensfield Rd, Cambridge CB2 1EW, United Kingdom
- Imaging Platform, Broad Institute of MIT and Harvard, Cambridge, Massachusetts 02141, United States
| | - Dominic Williams
- Safety Innovation, Clinical Pharmacology and Safety Sciences, AstraZeneca, Cambridge CB4 0FZ, United Kingdom
- Quantitative Biology, Discovery Sciences, R&D, AstraZeneca, Cambridge CB4 0FZ, United Kingdom
| | - Layla Hosseini-Gerami
- Ignota Laboratories, County Hall, Westminster Bridge Rd, London SE1 7PB, United Kingdom
| | - Manas Mahale
- Bombay College of Pharmacy Kalina Santacruz (E), Mumbai 400 098, India
| | - Anne E. Carpenter
- Imaging Platform, Broad Institute of MIT and Harvard, Cambridge, Massachusetts 02141, United States
| | - Ola Spjuth
- Department of Pharmaceutical Biosciences and Science for Life Laboratory, Uppsala University, Box 591, Uppsala SE-75124, Sweden
| | - Andreas Bender
- Yusuf Hamied Department of Chemistry, University of Cambridge, Lensfield Rd, Cambridge CB2 1EW, United Kingdom
| |
|
30
|
Ávila-Avilés RD, Bahena-Culhuac E, Hernández-Hernández JM. (-)-Epicatechin metabolites as a GPER ligands: a theoretical perspective. Mol Divers 2024:10.1007/s11030-024-10968-9. [PMID: 39153018 DOI: 10.1007/s11030-024-10968-9] [Received: 05/02/2024] [Accepted: 08/08/2024] [Indexed: 08/19/2024]
Abstract
Dietary habits and nutritional quality significantly impact health and disease. Here we delve into the intricate relationship between dietary habits, nutritional quality, and their direct impact on health and homeostasis, focusing on (-)-Epicatechin, a natural flavanol found in various foods such as green tea and cocoa and known for its positive effects on cardiovascular health and diabetes prevention. The investigation encompasses the absorption, metabolism, and distribution of (-)-Epicatechin in the human body, revealing a diverse array of metabolites in the circulatory system. Notably, (-)-Epicatechin demonstrates an ability to activate endothelial nitric oxide synthase (eNOS) through the G protein-coupled estrogen receptor (GPER). While the precise role of GPER and its interaction with classical estrogen receptors (ERs) remains under scrutiny, the study employs computational methods, including density functional theory (DFT), molecular docking, and molecular dynamics simulations, to assess the physicochemical properties and binding affinities of key (-)-Epicatechin metabolites with GPER. DFT analysis revealed distinct physicochemical properties among metabolites, influencing their reactivity and stability. Rigid and flexible molecular docking demonstrated varying binding affinities, with some metabolites surpassing (-)-Epicatechin. Molecular dynamics simulations highlighted potential binding pose variations, while MMGBSA analysis provided insights into the energetics of GPER-metabolite interactions. The outcomes elucidate distinct interactions, providing insights into potential molecular mechanisms underlying the effects of (-)-Epicatechin across varied biological contexts.
Affiliation(s)
- Rodolfo Daniel Ávila-Avilés
- Laboratory of Epigenetics of Skeletal Muscle Regeneration, Department of Genetics and Molecular Biology, Centre for Research and Advanced Studies of IPN (CINVESTAV), Mexico City, Mexico
- Transdisciplinary Research for Drug Discovery, Sociedad Mexicana de Epigenética y Medicina Regenerativa A. C. (SMEYMER), Mexico City, Mexico
| | - Erick Bahena-Culhuac
- Laboratory of Epigenetics of Skeletal Muscle Regeneration, Department of Genetics and Molecular Biology, Centre for Research and Advanced Studies of IPN (CINVESTAV), Mexico City, Mexico
- Transdisciplinary Research for Drug Discovery, Sociedad Mexicana de Epigenética y Medicina Regenerativa A. C. (SMEYMER), Mexico City, Mexico
| | - J Manuel Hernández-Hernández
- Laboratory of Epigenetics of Skeletal Muscle Regeneration, Department of Genetics and Molecular Biology, Centre for Research and Advanced Studies of IPN (CINVESTAV), Mexico City, Mexico.
| |
|
31
|
Moreira-Filho JT, Ranganath D, Conway M, Schmitt C, Kleinstreuer N, Mansouri K. Democratizing cheminformatics: interpretable chemical grouping using an automated KNIME workflow. J Cheminform 2024; 16:101. [PMID: 39152469 PMCID: PMC11330086 DOI: 10.1186/s13321-024-00894-1] [Received: 03/28/2024] [Accepted: 08/06/2024] [Indexed: 08/19/2024]
Abstract
With the increased availability of chemical data in public databases, innovative techniques and algorithms have emerged for the analysis, exploration, visualization, and extraction of information from these data. One such technique is chemical grouping, where chemicals with common characteristics are categorized into distinct groups based on physicochemical properties, use, biological activity, or a combination. However, existing tools for chemical grouping often require specialized programming skills or the use of commercial software packages. To address these challenges, we developed a user-friendly chemical grouping workflow implemented in KNIME, a free, open-source, low/no-code, data analytics platform. The workflow serves as an all-encompassing tool, expertly incorporating a range of processes such as molecular descriptor calculation, feature selection, dimensionality reduction, hyperparameter search, and supervised and unsupervised machine learning methods, enabling effective chemical grouping and visualization of results. Furthermore, we implemented tools for interpretation, identifying key molecular descriptors for the chemical groups, and using natural language summaries to clarify the rationale behind these groupings. The workflow was designed to run seamlessly in both the KNIME local desktop version and KNIME Server WebPortal as a web application. It incorporates interactive interfaces and guides to assist users in a step-by-step manner. We demonstrate the utility of this workflow through a case study using an eye irritation and corrosion dataset.
Scientific contributions
This work presents a novel, comprehensive chemical grouping workflow in KNIME, enhancing accessibility by integrating a user-friendly graphical interface that eliminates the need for extensive programming skills. This workflow uniquely combines several features such as automated molecular descriptor calculation, feature selection, dimensionality reduction, and machine learning algorithms (both supervised and unsupervised), with hyperparameter optimization to refine chemical grouping accuracy. Moreover, we have introduced an innovative interpretative step and natural language summaries to elucidate the underlying reasons for chemical groupings, significantly advancing the usability of the tool and interpretability of the results.
Affiliation(s)
- José T Moreira-Filho
- National Toxicology Program Interagency Center for the Evaluation of Alternative Toxicological Methods, Division of Translational Toxicology, National Institute of Environmental Health Sciences, Research Triangle Park, North Carolina, USA.
| | - Dhruv Ranganath
- University of North Carolina at Chapel Hill, Chapel Hill, North Carolina, USA
| | - Mike Conway
- National Institute of Environmental Health Sciences, Research Triangle Park, North Carolina, USA
| | - Charles Schmitt
- Division of Translational Toxicology, National Institute of Environmental Health Sciences, Research Triangle Park, North Carolina, USA
| | - Nicole Kleinstreuer
- National Toxicology Program Interagency Center for the Evaluation of Alternative Toxicological Methods, Division of Translational Toxicology, National Institute of Environmental Health Sciences, Research Triangle Park, North Carolina, USA
| | - Kamel Mansouri
- National Toxicology Program Interagency Center for the Evaluation of Alternative Toxicological Methods, Division of Translational Toxicology, National Institute of Environmental Health Sciences, Research Triangle Park, North Carolina, USA.
| |
|
32
|
Beck AG, Fine J, Aggarwal P, Regalado EL, Levorse D, De Jesus Silva J, Sherer EC. Machine learning models and performance dependency on 2D chemical descriptor space for retention time prediction of pharmaceuticals. J Chromatogr A 2024; 1730:465109. [PMID: 38968662 DOI: 10.1016/j.chroma.2024.465109] [Received: 03/25/2024] [Revised: 06/17/2024] [Accepted: 06/18/2024] [Indexed: 07/07/2024]
Abstract
The predictive modeling of liquid chromatography methods can be an invaluable asset, potentially saving countless hours of labor while also reducing solvent consumption and waste. Tasks such as physicochemical screening and preliminary method screening systems where large amounts of chromatography data are collected from fast and routine operations are particularly well suited for both leveraging large datasets and benefiting from predictive models. Therefore, the generation of predictive models for retention time is an active area of development. However, for these predictive models to gain acceptance, researchers first must have confidence in model performance and the computational cost of building them should be minimal. In this study, a simple and cost-effective workflow for the development of machine learning models to predict retention time using only Molecular Operating Environment 2D descriptors as input for support vector regression is developed. Furthermore, we investigated the relative performance of models based on molecular descriptor space by utilizing uniform manifold approximation and projection and clustering with Gaussian mixture models to identify chemically distinct clusters. Results outlined herein demonstrate that local models trained on clusters in chemical space perform equivalently when compared to models trained on all data. Through 10-fold cross-validation on a comprehensive set containing 67,950 of our company's proprietary analytes, these models achieved coefficients of determination of 0.84 and 3 % error in terms of retention time. This promising statistical significance is found to translate from cross-validation to prospective prediction on an external test set of pharmaceutically relevant analytes. The observed equivalency of global and local modeling of large datasets is retained with METLIN's SMRT dataset, thereby confirming the wider applicability of the developed machine learning workflows for global models.
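The two figures of merit reported above, the coefficient of determination and a percentage error in retention time, can be computed as in the sketch below. The abstract does not state how the percentage error was defined, so treating it as a mean absolute percentage error is an assumption; the function names and the retention times are illustrative, not data from the study.

```python
def r_squared(y_true: list[float], y_pred: list[float]) -> float:
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    if ss_tot == 0:
        raise ValueError("R^2 is undefined when all observed values are equal")
    return 1 - ss_res / ss_tot


def mean_absolute_percentage_error(y_true: list[float], y_pred: list[float]) -> float:
    """Mean of |observed - predicted| / |observed|, expressed in percent."""
    ratios = [abs(t - p) / abs(t) for t, p in zip(y_true, y_pred)]
    return 100 * sum(ratios) / len(ratios)


# Hypothetical observed and predicted retention times in minutes.
observed = [1.0, 2.0, 3.0, 4.0]
predicted = [1.1, 1.9, 3.2, 3.8]
print(r_squared(observed, predicted))
print(mean_absolute_percentage_error(observed, predicted))
```

In a cross-validation setting these metrics would be computed on each held-out fold and then averaged, so that no prediction is scored on data the model was trained on.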
Affiliation(s)
- Armen G Beck
- Analytical Research & Development, MRL, Merck & Co., Inc., Rahway, NJ 07065, USA
| | - Jonathan Fine
- Analytical Research & Development, MRL, Merck & Co., Inc., Rahway, NJ 07065, USA
| | - Pankaj Aggarwal
- Analytical Research & Development, MRL, Merck & Co., Inc., Rahway, NJ 07065, USA.
| | - Erik L Regalado
- Analytical Research & Development, MRL, Merck & Co., Inc., Rahway, NJ 07065, USA
| | - Dorothy Levorse
- Analytical Research & Development, MRL, Merck & Co., Inc., Rahway, NJ 07065, USA
| | | | - Edward C Sherer
- Analytical Research & Development, MRL, Merck & Co., Inc., Rahway, NJ 07065, USA
| |
|
33
|
Kong MM, Wei T, Liu B, Xi ZX, Ding JT, Liu X, Li K, Qin TL, Qian ZY, Wu WC, Wu JZ, Li WL. Discovery of novel ULK1 inhibitors through machine learning-guided virtual screening and biological evaluation. Future Med Chem 2024; 16:1821-1837. [PMID: 39145469 PMCID: PMC11485869 DOI: 10.1080/17568919.2024.2385288] [Received: 11/02/2023] [Accepted: 07/16/2024] [Indexed: 08/16/2024]
Abstract
Aim: Build a virtual screening model for ULK1 inhibitors based on artificial intelligence.
Materials & methods: Machine learning and deep learning classification models were built and combined with molecular docking and biological evaluation to screen ULK1 inhibitors from 13 million compounds. Molecular dynamics was used to explore the binding mechanism of active compounds.
Results & conclusion: Possibly due to the limited training data available, machine learning models significantly outperform deep learning models. Among them, the Naive Bayes model has the best performance. Through virtual screening, we obtained three inhibitors with IC50 values at the μM level, all of which bind well to ULK1. This study provides an efficient virtual screening model and three promising compounds for the study of ULK1 inhibitors.
Affiliation(s)
- Miao-Miao Kong
- The First Affiliated Hospital of Wenzhou Medical University, Wenzhou, Zhejiang, 325035, China
- Oujiang Laboratory (Zhejiang Lab for Regenerative Medicine, Vision & Brain Health), Wenzhou, Zhejiang, 325000, China
| | - Tao Wei
- School of Pharmaceutical Sciences, Wenzhou Medical University, Wenzhou, Zhejiang, 325035, China
| | - Bo Liu
- Faculty of Applied Sciences, Macao Polytechnic University, Macao, SAR, China
| | - Zi-Xuan Xi
- The First Affiliated Hospital of Wenzhou Medical University, Wenzhou, Zhejiang, 325035, China
| | - Jun-Tao Ding
- The First Affiliated Hospital of Wenzhou Medical University, Wenzhou, Zhejiang, 325035, China
| | - Xin Liu
- School of Pharmaceutical Sciences, Wenzhou Medical University, Wenzhou, Zhejiang, 325035, China
| | - Ke Li
- The First Affiliated Hospital of Wenzhou Medical University, Wenzhou, Zhejiang, 325035, China
| | - Tian-Li Qin
- The First Affiliated Hospital of Wenzhou Medical University, Wenzhou, Zhejiang, 325035, China
| | - Zhen-Yong Qian
- The First Affiliated Hospital of Wenzhou Medical University, Wenzhou, Zhejiang, 325035, China
| | - Wen-Can Wu
- The Eye Hospital, School of Ophthalmology & Optometry, Wenzhou Medical University, Wenzhou, 325027, China
| | - Jian-Zhang Wu
- Oujiang Laboratory (Zhejiang Lab for Regenerative Medicine, Vision & Brain Health), Wenzhou, Zhejiang, 325000, China
- The Eye Hospital, School of Ophthalmology & Optometry, Wenzhou Medical University, Wenzhou, 325027, China
| | - Wu-Lan Li
- The First Affiliated Hospital of Wenzhou Medical University, Wenzhou, Zhejiang, 325035, China
| |
|
34
|
Siddiqui H, Usmani T. Interpretable AI and Machine Learning Classification for Identifying High-Efficiency Donor-Acceptor Pairs in Organic Solar Cells. ACS Omega 2024; 9:34445-34455. [PMID: 39157121 PMCID: PMC11325493 DOI: 10.1021/acsomega.4c02157] [Received: 03/05/2024] [Revised: 06/08/2024] [Accepted: 06/13/2024] [Indexed: 08/20/2024]
Abstract
To enhance the efficiency of organic solar cells, accurately predicting the efficiency of new pairs of donor and acceptor materials is crucial. Presently, most machine learning studies rely on regression models, which often struggle to establish clear rules for distinguishing between high- and low-performing donor-acceptor pairs. This study proposes a novel approach by integrating interpretable AI, specifically using Shapley values, with four supervised machine learning classification models, namely, support vector machines, decision trees, random forest, and gradient boosting. These models aim to identify high-efficiency donor-acceptor pairs based solely on chemical structures and to extract important features that establish general design principles for distinguishing between high- and low-efficiency pairs. For validation purposes, an unsupervised machine learning algorithm utilizing loading vectors obtained from the principal component analysis is employed to identify crucial features associated with high-efficiency donor-acceptor pairs. Interestingly, the features identified by the supervised machine learning approach were found to be a subset of those identified by the unsupervised method. Noteworthy features include the van der Waals surface area, partial equalization of orbital electronegativity, Moreau-Broto autocorrelation, and molecular substructures. Leveraging these features, a backward-working model can be developed, facilitating exploration across a wide array of materials used in organic solar cells. This innovative approach will help navigate the vast chemical compound space of donor and acceptor materials essential in creating high-efficiency organic solar cells.
Affiliation(s)
- Hamza Siddiqui
- Organic PV Lab, Integral University, Lucknow 226026, India
| | - Tahsin Usmani
- Organic PV Lab, Integral University, Lucknow 226026, India
| |
|
35
|
Zheng T, Mitchell JBO, Dobson S. Revisiting the Application of Machine Learning Approaches in Predicting Aqueous Solubility. ACS Omega 2024; 9:35209-35222. [PMID: 39157153 PMCID: PMC11325511 DOI: 10.1021/acsomega.4c06163] [Received: 07/03/2024] [Revised: 07/19/2024] [Accepted: 07/22/2024] [Indexed: 08/20/2024]
Abstract
The solubility of chemical substances in water is a critical parameter in pharmaceutical development, environmental chemistry, agrochemistry, and other fields; however, accurately predicting it remains a challenge. This study aims to evaluate and compare the effectiveness of some of the most popular machine learning modeling methods and molecular featurization techniques in predicting aqueous solubility. Although these methods were not implemented in a competitive environment, some of their performance surpassed previous benchmarks, offering gradual but significant improvements. Our results show that methods based on graph convolution and graph attention mechanisms demonstrated exceptional predictive abilities with high-quality data sets, albeit with a sensitivity to data noise and errors. In contrast, models leveraging molecular descriptors not only provided better interpretability but also showed more resilience when dealing with inherent noise and errors in data. Our analysis of over 4000 molecular descriptors used in various models identified that approximately 800 of these descriptors make a significant contribution to solubility prediction. These insights offer guidance and direction for future developments in solubility prediction.
Affiliation(s)
- Tianyuan Zheng
- School of Computer Science, University of St Andrews, St Andrews, Fife KY16 9SX, U.K.
| | - John B. O. Mitchell
- EaStCHEM School of Chemistry, University of St Andrews, St Andrews, Fife KY16 9ST, U.K.
| | - Simon Dobson
- School of Computer Science, University of St Andrews, St Andrews, Fife KY16 9SX, U.K.
| |
|
36
|
Banerjee A, Sharma A, Kamble P, Garg P. Prediction of Mycobacterium tuberculosis cell wall permeability using machine learning methods. Mol Divers 2024; 28:2317-2329. [PMID: 39133353 DOI: 10.1007/s11030-024-10952-3] [Received: 05/09/2024] [Accepted: 07/26/2024] [Indexed: 08/13/2024]
Abstract
Tuberculosis (TB), caused by the bacterium Mycobacterium tuberculosis (M. tb), continues to pose a significant worldwide health threat. The advent of drug-resistant strains of the disease highlights the critical need for novel treatments. The unique cell wall of M. tb provides an extra layer of protection for the bacteria, and hence only compounds that can penetrate this barrier can reach their targets within the bacterial cell wall. This work presents the creation of a reliable machine learning (ML) model to predict the mycobacterial cell wall permeability of small molecules. Four ML algorithms, namely Random Forest, Support Vector Machines (SVM), k-nearest Neighbour (k-NN), and Logistic Regression, were trained on a dataset of 5368 compounds. RDKit and Mordred toolkits were used to calculate features. To determine the most effective model, various performance metrics were used, such as accuracy, precision, recall, F1 score, and area under the receiver operating characteristic curve. The best-performing model was further refined with hyperparameter tuning and tenfold cross-validation. The SVM model with filtering outperformed the other machine learning models and demonstrated 80.26% and 81.13% accuracy on the test and validation datasets, respectively. The study also provided insights into the molecular descriptors that play the most important role in predicting the ability of a molecule to pass the M. tb cell wall, which could guide future compound design. The model is available at https://github.com/PGlab-NIPER/MTB_Permeability .
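Tenfold cross-validation, mentioned above, partitions the data into ten folds and holds each fold out once while training on the other nine. The sketch below shows only that index bookkeeping in plain Python; the function name, the seed, and the sample count are hypothetical, and the authors' actual pipeline (available at the repository linked above) may split the data differently.

```python
import random
from typing import Iterator


def k_fold_indices(n_samples: int, k: int = 10, seed: int = 0) -> Iterator[tuple[list[int], list[int]]]:
    """Yield (train_indices, test_indices) pairs for k-fold cross-validation."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)  # shuffle once so folds are random but reproducible
    # Deal indices round-robin so fold sizes differ by at most one.
    folds = [indices[i::k] for i in range(k)]
    for held_out in range(k):
        test = folds[held_out]
        train = [idx for f, fold in enumerate(folds) if f != held_out for idx in fold]
        yield train, test


# Illustrative use with a hypothetical dataset of 23 compounds.
for train, test in k_fold_indices(23, k=10):
    print(len(train), len(test))
```

Each compound lands in exactly one test fold, so averaging a metric such as accuracy over the ten folds uses every sample for evaluation once and only once.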
Affiliation(s)
- Aritra Banerjee
- Department of Pharmacoinformatics, National Institute of Pharmaceutical Education and Research, S. A. S. Nagar, Punjab, 160 062, India
| | - Anju Sharma
- Department of Pharmacoinformatics, National Institute of Pharmaceutical Education and Research, S. A. S. Nagar, Punjab, 160 062, India
| | - Pradnya Kamble
- Department of Pharmacoinformatics, National Institute of Pharmaceutical Education and Research, S. A. S. Nagar, Punjab, 160 062, India
| | - Prabha Garg
- Department of Pharmacoinformatics, National Institute of Pharmaceutical Education and Research, S. A. S. Nagar, Punjab, 160 062, India.
| |
|
37
|
Srisongkram T. DeepRA: A novel deep learning-read-across framework and its application in non-sugar sweeteners mutagenicity prediction. Comput Biol Med 2024; 178:108731. [PMID: 38870727 DOI: 10.1016/j.compbiomed.2024.108731] [Received: 04/05/2024] [Revised: 05/07/2024] [Accepted: 06/08/2024] [Indexed: 06/15/2024]
Abstract
Non-sugar sweeteners (NSSs), or artificial sweeteners, have been used as food chemicals since World War II. NSSs, however, also raise concerns about their mutagenicity. Evaluating the mutagenic potential of NSSs is crucial for food safety; this step is needed for every new chemical registration in the food and pharmaceutical industries. A computational assessment requires less time, less money, and fewer animals than in vivo experiments; thus, this study developed a novel computational method, called DeepRA, that combines an ensemble convolutional deep neural network with read-across algorithms to classify the mutagenicity of chemicals. The mutagenicity data were obtained from the curated Ames test data set. The DeepRA model was developed using both molecular descriptors and molecular fingerprints. The resulting DeepRA model provides accurate and reliable mutagenicity classification on an independent test set. This model was then used to examine NSS-related chemicals, enabling the evaluation of mutagenicity of NSS-like substances. Finally, the model is publicly available at https://github.com/taraponglab/deepra for further use in chemical regulation and risk assessment.
Affiliation(s)
- Tarapong Srisongkram
- Division of Pharmaceutical Chemistry, Faculty of Pharmaceutical Sciences, Khon Kaen University, 40002, Thailand.
| |
|
38
|
Hao Y, Li B, Huang D, Wu S, Wang T, Fu L, Liu X. Developing a Semi-Supervised Approach Using a PU-Learning-Based Data Augmentation Strategy for Multitarget Drug Discovery. Int J Mol Sci 2024; 25:8239. [PMID: 39125808 PMCID: PMC11312053 DOI: 10.3390/ijms25158239] [Received: 06/21/2024] [Revised: 07/26/2024] [Accepted: 07/26/2024] [Indexed: 08/12/2024]
Abstract
Multifactorial diseases demand therapeutics that can modulate multiple targets for enhanced safety and efficacy, yet the clinical approval of multitarget drugs remains rare. The integration of machine learning (ML) and deep learning (DL) in drug discovery has revolutionized virtual screening. This study investigates the synergy between ML/DL methodologies, molecular representations, and data augmentation strategies. Notably, we found that SVM can match or even surpass the performance of state-of-the-art DL methods. However, conventional data augmentation often involves a trade-off between the true positive rate and false positive rate. To address this, we introduce Negative-Augmented PU-bagging (NAPU-bagging) SVM, a novel semi-supervised learning framework. By leveraging ensemble SVM classifiers trained on resampled bags containing positive, negative, and unlabeled data, our approach is capable of managing false positive rates while maintaining high recall rates. We applied this method to the identification of multitarget-directed ligands (MTDLs), where high recall rates are critical for compiling a list of interaction candidate compounds. Case studies demonstrate that NAPU-bagging SVM can identify structurally novel MTDL hits for ALK-EGFR with favorable docking scores and binding modes, as well as pan-agonists for dopamine receptors. The NAPU-bagging SVM methodology should serve as a promising avenue to virtual screening, especially for the discovery of MTDLs.
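The bagging idea described above, ensembles trained on resampled bags that mix positive, negative, and unlabeled data, can be illustrated with the generic PU-bagging scheme sketched below. This is not the authors' NAPU-bagging implementation: a simple centroid-distance scorer stands in for the SVM base learner so the example needs only the standard library, unlabeled points drawn into a bag are treated as provisional negatives, and all names and data are hypothetical.

```python
import math
import random


def centroid(points: list[list[float]]) -> list[float]:
    """Component-wise mean of a non-empty list of equal-length vectors."""
    return [sum(p[d] for p in points) / len(points) for d in range(len(points[0]))]


def pu_bagging_scores(positives, negatives, unlabeled, n_bags=50, bag_size=3, seed=0):
    """Average out-of-bag scores for unlabeled points; higher means more positive-like.

    Each bag combines all known negatives with a random subset of the unlabeled
    points (treated as provisional negatives). A stand-in centroid scorer is fit
    per bag and applied only to the unlabeled points left out of that bag.
    """
    rng = random.Random(seed)
    pos_center = centroid(positives)
    totals = [0.0] * len(unlabeled)
    counts = [0] * len(unlabeled)
    for _ in range(n_bags):
        in_bag = set(rng.sample(range(len(unlabeled)), bag_size))
        neg_center = centroid(negatives + [unlabeled[i] for i in in_bag])
        for i, x in enumerate(unlabeled):
            if i in in_bag:
                continue  # score only out-of-bag points
            totals[i] += math.dist(x, neg_center) - math.dist(x, pos_center)
            counts[i] += 1
    # A point that was never out of bag has no score.
    return [t / c if c else None for t, c in zip(totals, counts)]


# Hypothetical one-dimensional descriptors: actives cluster near 5, inactives near 0.
known_pos = [[5.0], [5.2], [4.8]]
known_neg = [[0.0], [0.2]]
unlabeled = [[5.1], [4.9], [0.1], [-0.1], [0.3], [5.3]]
print(pu_bagging_scores(known_pos, known_neg, unlabeled))
```

Averaging only out-of-bag scores keeps a hidden positive from being penalized by the bags in which it was itself labeled negative, which is how this family of methods aims to preserve recall on unlabeled data.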
Affiliation(s)
- Yang Hao
- Wisdom Lake Academy of Pharmacy, Xi’an Jiaotong-Liverpool University, Suzhou 215123, China; (Y.H.); (B.L.); (S.W.); (T.W.); (L.F.)
- Institute of Systems, Molecular and Integrative Biology, University of Liverpool, Liverpool L69 7ZX, UK
| | - Bo Li
- Wisdom Lake Academy of Pharmacy, Xi’an Jiaotong-Liverpool University, Suzhou 215123, China; (Y.H.); (B.L.); (S.W.); (T.W.); (L.F.)
- Institute of Systems, Molecular and Integrative Biology, University of Liverpool, Liverpool L69 7ZX, UK
| | - Daiyun Huang
- Wisdom Lake Academy of Pharmacy, Xi’an Jiaotong-Liverpool University, Suzhou 215123, China; (Y.H.); (B.L.); (S.W.); (T.W.); (L.F.)
- School of Life Sciences, Fudan University, Shanghai 200092, China
| | - Sijin Wu
- Wisdom Lake Academy of Pharmacy, Xi’an Jiaotong-Liverpool University, Suzhou 215123, China; (Y.H.); (B.L.); (S.W.); (T.W.); (L.F.)
| | - Tianjun Wang
- Wisdom Lake Academy of Pharmacy, Xi’an Jiaotong-Liverpool University, Suzhou 215123, China; (Y.H.); (B.L.); (S.W.); (T.W.); (L.F.)
- Institute of Systems, Molecular and Integrative Biology, University of Liverpool, Liverpool L69 7ZX, UK
| | - Lei Fu
- Wisdom Lake Academy of Pharmacy, Xi’an Jiaotong-Liverpool University, Suzhou 215123, China; (Y.H.); (B.L.); (S.W.); (T.W.); (L.F.)
| | - Xin Liu
- Wisdom Lake Academy of Pharmacy, Xi’an Jiaotong-Liverpool University, Suzhou 215123, China; (Y.H.); (B.L.); (S.W.); (T.W.); (L.F.)
| |
|
39
|
Wu T, Zhou M, Zou J, Chen Q, Qian F, Kurths J, Liu R, Tang Y. AI-guided few-shot inverse design of HDP-mimicking polymers against drug-resistant bacteria. Nat Commun 2024; 15:6288. [PMID: 39060236 PMCID: PMC11282099 DOI: 10.1038/s41467-024-50533-4] [Received: 09/26/2023] [Accepted: 07/11/2024] [Indexed: 07/28/2024]
Abstract
Host defense peptide (HDP)-mimicking polymers are promising therapeutic alternatives to antibiotics and have large-scale untapped potential. Artificial intelligence (AI) exhibits promising performance on large-scale chemical-content design; however, existing AI methods struggle with the scarcity of data in each family of HDP-mimicking polymers (<10²), far fewer than in public polymer datasets (>10⁵), and with multiple constraints on properties and structures when exploring high-dimensional polymer space. Herein, we develop a universal AI-guided few-shot inverse design framework by designing multi-modal representations to enrich polymer information for predictions and creating a graph grammar distillation for chemical space restriction to improve the efficiency of multi-constrained polymer generation with reinforcement learning. Taking HDP-mimicking β-amino acid polymers as an example, we successfully simulate predictions of over 10⁵ polymers and identify 83 optimal polymers. Furthermore, we synthesize an optimal polymer DM0.8iPen0.2 and find that this polymer exhibits broad-spectrum and potent antibacterial activity against multiple clinically isolated antibiotic-resistant pathogens, validating the effectiveness of the AI-guided design strategy.
Affiliation(s)
- Tianyu Wu
- Key Laboratory of Smart Manufacturing in Energy Chemical Process, East China University of Science and Technology, Shanghai, 200237, China
| | - Min Zhou
- State Key Laboratory of Bioreactor Engineering, East China University of Science and Technology, Shanghai, 200237, China
| | - Jingcheng Zou
- Shanghai Frontiers Science Center of Optogenetic Techniques for Cell Metabolism, Frontiers Science Center for Materiobiology and Dynamic Chemistry, Key Laboratory for Ultrafine Materials of Ministry of Education, Research Center for Biomedical Materials of Ministry of Education, School of Materials Science and Engineering, East China University of Science and Technology, Shanghai, 200237, China
| | - Qi Chen
- Shanghai Frontiers Science Center of Optogenetic Techniques for Cell Metabolism, Frontiers Science Center for Materiobiology and Dynamic Chemistry, Key Laboratory for Ultrafine Materials of Ministry of Education, Research Center for Biomedical Materials of Ministry of Education, School of Materials Science and Engineering, East China University of Science and Technology, Shanghai, 200237, China
| | - Feng Qian
- Key Laboratory of Smart Manufacturing in Energy Chemical Process, East China University of Science and Technology, Shanghai, 200237, China
| | - Jürgen Kurths
- Potsdam Institute for Climate Impact Research (PIK), Potsdam, 14473, Germany
- Institut für Physik, Humboldt-Universität zu Berlin, Berlin, 10115, Germany
- The Research Institute of Intelligent Complex Systems, Fudan University, Shanghai, 200433, China
| | - Runhui Liu
- State Key Laboratory of Bioreactor Engineering, East China University of Science and Technology, Shanghai, 200237, China.
- Shanghai Frontiers Science Center of Optogenetic Techniques for Cell Metabolism, Frontiers Science Center for Materiobiology and Dynamic Chemistry, Key Laboratory for Ultrafine Materials of Ministry of Education, Research Center for Biomedical Materials of Ministry of Education, School of Materials Science and Engineering, East China University of Science and Technology, Shanghai, 200237, China.
| | - Yang Tang
- Key Laboratory of Smart Manufacturing in Energy Chemical Process, East China University of Science and Technology, Shanghai, 200237, China.
| |
|
40
|
Xu Y, Ma S, Cui H, Chen J, Xu S, Gong F, Golubovic A, Zhou M, Wang KC, Varley A, Lu RXZ, Wang B, Li B. AGILE platform: a deep learning powered approach to accelerate LNP development for mRNA delivery. Nat Commun 2024; 15:6305. [PMID: 39060305 PMCID: PMC11282250 DOI: 10.1038/s41467-024-50619-z] [Received: 01/29/2024] [Accepted: 07/09/2024] [Indexed: 07/28/2024]
Abstract
Ionizable lipid nanoparticles (LNPs) are seeing widespread use in mRNA delivery, notably in SARS-CoV-2 mRNA vaccines. However, the expansion of mRNA therapies beyond COVID-19 is impeded by the absence of LNPs tailored for diverse cell types. In this study, we present the AI-Guided Ionizable Lipid Engineering (AGILE) platform, a synergistic combination of deep learning and combinatorial chemistry. AGILE streamlines ionizable lipid development with efficient library design, in silico lipid screening via deep neural networks, and adaptability to diverse cell lines. Using AGILE, we rapidly design, synthesize, and evaluate ionizable lipids for mRNA delivery, selecting from a vast library. Intriguingly, AGILE reveals cell-specific preferences for ionizable lipids, indicating tailoring for optimal delivery to varying cell types. These highlight AGILE's potential in expediting the development of customized LNPs, addressing the complex needs of mRNA delivery in clinical practice, thereby broadening the scope and efficacy of mRNA therapies.
Affiliation(s)
- Yue Xu
- Leslie Dan Faculty of Pharmacy, University of Toronto, Toronto, ON, Canada
| | - Shihao Ma
- Department of Computer Science, University of Toronto, Toronto, ON, Canada
- Vector Institute for Artificial Intelligence, Toronto, ON, Canada
- Peter Munk Cardiac Centre, University Health Network, Toronto, ON, Canada
| | - Haotian Cui
- Department of Computer Science, University of Toronto, Toronto, ON, Canada
- Vector Institute for Artificial Intelligence, Toronto, ON, Canada
- Peter Munk Cardiac Centre, University Health Network, Toronto, ON, Canada
| | - Jingan Chen
- Leslie Dan Faculty of Pharmacy, University of Toronto, Toronto, ON, Canada
- Institute of Biomedical Engineering, University of Toronto, Toronto, ON, Canada
| | - Shufen Xu
- Leslie Dan Faculty of Pharmacy, University of Toronto, Toronto, ON, Canada
| | - Fanglin Gong
- Leslie Dan Faculty of Pharmacy, University of Toronto, Toronto, ON, Canada
- Institute of Biomedical Engineering, University of Toronto, Toronto, ON, Canada
| | - Alex Golubovic
- Leslie Dan Faculty of Pharmacy, University of Toronto, Toronto, ON, Canada
| | - Muye Zhou
- Leslie Dan Faculty of Pharmacy, University of Toronto, Toronto, ON, Canada
| | - Kevin Chang Wang
- Leslie Dan Faculty of Pharmacy, University of Toronto, Toronto, ON, Canada
| | - Andrew Varley
- Leslie Dan Faculty of Pharmacy, University of Toronto, Toronto, ON, Canada
| | - Rick Xing Ze Lu
- Leslie Dan Faculty of Pharmacy, University of Toronto, Toronto, ON, Canada
| | - Bo Wang
- Department of Computer Science, University of Toronto, Toronto, ON, Canada.
- Vector Institute for Artificial Intelligence, Toronto, ON, Canada.
- Peter Munk Cardiac Centre, University Health Network, Toronto, ON, Canada.
- Princess Margaret Cancer Center, University Health Network, Toronto, ON, Canada.
| | - Bowen Li
- Leslie Dan Faculty of Pharmacy, University of Toronto, Toronto, ON, Canada.
- Institute of Biomedical Engineering, University of Toronto, Toronto, ON, Canada.
- Department of Chemistry, University of Toronto, Toronto, ON, Canada.
- Department of Laboratory Medicine and Pathobiology, University of Toronto, Toronto, ON, Canada.
| |
|
41
|
Li J, Yanagisawa K, Akiyama Y. CycPeptMP: enhancing membrane permeability prediction of cyclic peptides with multi-level molecular features and data augmentation. Brief Bioinform 2024; 25:bbae417. [PMID: 39210505 PMCID: PMC11361855 DOI: 10.1093/bib/bbae417] [Received: 01/10/2024] [Revised: 07/23/2024] [Accepted: 08/22/2024] [Indexed: 09/04/2024] Open
Abstract
Cyclic peptides are versatile therapeutic agents that boast high binding affinity, minimal toxicity, and the potential to engage challenging protein targets. However, the pharmaceutical utility of cyclic peptides is limited by their low membrane permeability, an essential indicator of oral bioavailability and intracellular targeting. Current machine learning-based models of cyclic peptide permeability show variable performance owing to the limitations of experimental data. Furthermore, these methods use features derived from the whole molecule that have traditionally been used to predict small molecules and ignore the unique structural properties of cyclic peptides. This study presents CycPeptMP: an accurate and efficient method to predict cyclic peptide membrane permeability. We designed features for cyclic peptides at the atom-, monomer-, and peptide-levels and seamlessly integrated these into a fusion model using deep learning technology. Additionally, we applied various data augmentation techniques to enhance model training efficiency using the latest data. The fusion model exhibited excellent prediction performance for the logarithm of permeability, with a mean absolute error of 0.355 and correlation coefficient of 0.883. Ablation studies demonstrated that all feature levels contributed and were relatively essential to predicting membrane permeability, confirming the effectiveness of augmentation to improve prediction accuracy. A comparison with a molecular dynamics-based method showed that CycPeptMP accurately predicted peptide permeability, which is otherwise difficult to predict using simulations.
Affiliation(s)
- Jianan Li
- Department of Computer Science, School of Computing, Tokyo Institute of Technology, Tokyo 1528550, Japan
| | - Keisuke Yanagisawa
- Department of Computer Science, School of Computing, Tokyo Institute of Technology, Tokyo 1528550, Japan
- Middle-Molecule ITbased Drug Discovery Laboratory (MIDL), Tokyo Institute of Technology, Tokyo 1528550, Japan
| | - Yutaka Akiyama
- Department of Computer Science, School of Computing, Tokyo Institute of Technology, Tokyo 1528550, Japan
- Middle-Molecule ITbased Drug Discovery Laboratory (MIDL), Tokyo Institute of Technology, Tokyo 1528550, Japan
| |
|
42
|
Androutsos L, Pallante L, Bompotas A, Stojceski F, Grasso G, Piga D, Di Benedetto G, Alexakos C, Kalogeras A, Theofilatos K, Deriu MA, Mavroudi S. Predicting multiple taste sensations with a multiobjective machine learning method. NPJ Sci Food 2024; 8:47. [PMID: 39054312 PMCID: PMC11272927 DOI: 10.1038/s41538-024-00287-6] [Received: 09/19/2023] [Accepted: 07/05/2024] [Indexed: 07/27/2024] Open
Abstract
Taste perception plays a pivotal role in guiding nutrient intake and aiding in the avoidance of potentially harmful substances through five basic tastes - sweet, bitter, umami, salty, and sour. Taste perception originates from molecular interactions in the oral cavity between taste receptors and chemical tastants. Hence, the recognition of taste receptors and the subsequent perception of taste heavily rely on the physicochemical properties of food ingredients. In recent years, several advances have been made towards the development of machine learning-based algorithms to classify chemical compounds' tastes using their molecular structures. Despite the great efforts, there remains significant room for improvement in developing multi-class models to predict the entire spectrum of basic tastes. Here, we present a multi-class predictor aimed at distinguishing bitter, sweet, and umami, from other taste sensations. The development of a multi-class taste predictor paves the way for a comprehensive understanding of the chemical attributes associated with each fundamental taste. It also opens the potential for integration into the evolving realm of multi-sensory perception, which encompasses visual, tactile, and olfactory sensations to holistically characterize flavour perception. This concept holds promise for introducing innovative methodologies in the rational design of foods, including pre-determining specific tastes and engineering complementary diets to augment traditional pharmacological treatments.
Affiliation(s)
| | - Lorenzo Pallante
- PolitoBIOMedLab, Department of Mechanical and Aerospace Engineering, Politecnico di Torino, Torino, 10129, Italy
| | - Agorakis Bompotas
- Industrial Systems Institute, Athena Research Center, 265 04, Patras, Greece
| | - Filip Stojceski
- Department of Innovative Technologies, Dalle Molle Institute for Artificial Intelligence, Lugano-Viganello, 6962, Switzerland
| | - Gianvito Grasso
- Department of Innovative Technologies, Dalle Molle Institute for Artificial Intelligence, Lugano-Viganello, 6962, Switzerland
| | - Dario Piga
- Department of Innovative Technologies, Dalle Molle Institute for Artificial Intelligence, Lugano-Viganello, 6962, Switzerland
| | | | - Christos Alexakos
- Industrial Systems Institute, Athena Research Center, 265 04, Patras, Greece
| | | | | | - Marco A Deriu
- PolitoBIOMedLab, Department of Mechanical and Aerospace Engineering, Politecnico di Torino, Torino, 10129, Italy
| | - Seferina Mavroudi
- InSyBio PC, Patras, 265 04, Greece
- Department of Nursing, University of Patras, 265 04, Patras, Greece
| |
|
43
|
Xu S, Zhu Z, Delafield DG, Rigby MJ, Lu G, Braun M, Puglielli L, Li L. Spatially and temporally probing distinctive glycerophospholipid alterations in Alzheimer's disease mouse brain via high-resolution ion mobility-enabled sn-position resolved lipidomics. Nat Commun 2024; 15:6252. [PMID: 39048572 PMCID: PMC11269705 DOI: 10.1038/s41467-024-50299-9] [Received: 08/17/2023] [Accepted: 07/08/2024] [Indexed: 07/27/2024] Open
Abstract
Dysregulated glycerophospholipid (GP) metabolism in the brain is associated with the progression of neurodegenerative diseases including Alzheimer's disease (AD). Routine liquid chromatography-mass spectrometry (LC-MS)-based large-scale lipidomic methods often fail to elucidate subtle yet important structural features such as sn-position, hindering the precise interrogation of GP molecules. Leveraging high-resolution demultiplexing (HRdm) ion mobility spectrometry (IMS), we develop a four-dimensional (4D) lipidomic strategy to resolve GP sn-position isomers. We further construct a comprehensive experimental 4D GP database of 498 GPs identified from the mouse brain and an in-depth extended 4D library of 2500 GPs predicted by machine learning, enabling automated profiling of GPs with detailed acyl chain sn-position assignment. Analyzing three mouse brain regions (hippocampus, cerebellum, and cortex), we successfully identify a total of 592 GPs including 130 pairs of sn-position isomers. Further temporal GPs analysis in the three functional brain regions illustrates their metabolic alterations in AD progression.
Affiliation(s)
- Shuling Xu
- School of Pharmacy, University of Wisconsin-Madison, Madison, WI, 53705, USA
| | - Zhijun Zhu
- Department of Chemistry, University of Wisconsin-Madison, Madison, WI, 53706, USA
| | - Daniel G Delafield
- Department of Chemistry, University of Wisconsin-Madison, Madison, WI, 53706, USA
| | - Michael J Rigby
- Department of Medicine, University of Wisconsin-Madison, Madison, WI, 53705, USA
- Waisman Center, University of Wisconsin-Madison, Madison, WI, 53705, USA
- Neuroscience Training Program, University of Wisconsin-Madison, Madison, WI, 53705, USA
| | - Gaoyuan Lu
- School of Pharmacy, University of Wisconsin-Madison, Madison, WI, 53705, USA
| | - Megan Braun
- Department of Medicine, University of Wisconsin-Madison, Madison, WI, 53705, USA
- Waisman Center, University of Wisconsin-Madison, Madison, WI, 53705, USA
- Neuroscience Training Program, University of Wisconsin-Madison, Madison, WI, 53705, USA
| | - Luigi Puglielli
- Department of Medicine, University of Wisconsin-Madison, Madison, WI, 53705, USA
- Waisman Center, University of Wisconsin-Madison, Madison, WI, 53705, USA
- Geriatric Research Education Clinical Center, Veterans Affairs Medical Center, Madison, WI, 53705, USA
| | - Lingjun Li
- School of Pharmacy, University of Wisconsin-Madison, Madison, WI, 53705, USA.
- Department of Chemistry, University of Wisconsin-Madison, Madison, WI, 53706, USA.
- Lachman Institute for Pharmaceutical Development, School of Pharmacy, University of Wisconsin-Madison, Madison, WI, 53705, USA.
- Wisconsin Center for NanoBioSystems, School of Pharmacy, University of Wisconsin- Madison, Madison, WI, 53705, USA.
| |
|
44
|
Liang T, Liu W, Tan K, Wu A, Lu X. Advancing Ionic Liquid Research with pSCNN: A Novel Approach for Accurate Normal Melting Temperature Predictions. ACS Omega 2024; 9:31694-31702. [PMID: 39072063 PMCID: PMC11270577 DOI: 10.1021/acsomega.4c02393] [Received: 03/11/2024] [Revised: 04/12/2024] [Accepted: 06/25/2024] [Indexed: 07/30/2024]
Abstract
Ionic liquids (ILs), known for their distinct and tunable properties, offer a broad spectrum of potential applications across various fields, including chemistry, materials science, and energy storage. However, practical applications of ILs are often limited by their unfavorable physicochemical properties. Experimental screening becomes impractical due to the vast number of potential IL combinations. Therefore, the development of a robust and efficient model for predicting IL properties is imperative. As the melting point is the defining feature of ILs, it is of practical significance to establish an accurate yet efficient model to predict the normal melting point of ILs (Tm), which may facilitate the discovery and design of novel ILs for specific applications. In this study, we presented a pseudo-Siamese convolution neural network (pSCNN) inspired by SCNN and focused on the Tm. Utilizing a data set of 3098 ILs, we systematically assess various deep learning models (ANN, pSCNN, and Transformer-CNF), along with molecular descriptors (ECFP fingerprint and Mordred properties), for their performance in predicting the Tm of ILs. Remarkably, among the investigated modeling schemes, the pSCNN, coupled with filtered Mordred descriptors, demonstrates superior performance, yielding mean absolute error (MAE) and root-mean-square error (RMSE) values of 24.36 and 31.56 °C, respectively. Feature analysis further highlights the effectiveness of the pSCNN model. Moreover, the pSCNN method, with a pair of inputs, can be extended beyond ionic liquid melting point prediction.
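The two error metrics quoted in this abstract can be illustrated with a minimal sketch. The melting points below are made-up values chosen for easy arithmetic, not data from the study, and the function names are hypothetical.

```python
def mean_absolute_error(y_true, y_pred):
    """Average absolute difference between measured and predicted values."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def root_mean_square_error(y_true, y_pred):
    """Square root of the average squared difference; penalizes large misses more."""
    return (sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)) ** 0.5


# Illustrative melting points in degrees Celsius (assumed values, not from the paper)
measured = [10.0, 50.0, 90.0, 130.0]
predicted = [20.0, 40.0, 120.0, 130.0]

mae = mean_absolute_error(measured, predicted)
rmse = root_mean_square_error(measured, predicted)
print(f"MAE = {mae:.2f} C, RMSE = {rmse:.2f} C")
```

RMSE is never smaller than MAE on the same data, so a gap between the two (as in the reported 24.36 versus 31.56 °C) suggests that some predictions miss by considerably more than the average.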
Affiliation(s)
- Tao Liang
- State Key Laboratory of Physical Chemistry of Solid Surface, Fujian Provincial Key Laboratory for Theoretical and Computational Chemistry, Department of Chemistry, College of Chemistry and Chemical Engineering, Xiamen University, Xiamen 361005, P. R. China
| | - Wei Liu
- State Key Laboratory of Physical Chemistry of Solid Surface, Fujian Provincial Key Laboratory for Theoretical and Computational Chemistry, Department of Chemistry, College of Chemistry and Chemical Engineering, Xiamen University, Xiamen 361005, P. R. China
| | - Kai Tan
- State Key Laboratory of Physical Chemistry of Solid Surface, Fujian Provincial Key Laboratory for Theoretical and Computational Chemistry, Department of Chemistry, College of Chemistry and Chemical Engineering, Xiamen University, Xiamen 361005, P. R. China
| | - Anan Wu
- State Key Laboratory of Physical Chemistry of Solid Surface, Fujian Provincial Key Laboratory for Theoretical and Computational Chemistry, Department of Chemistry, College of Chemistry and Chemical Engineering, Xiamen University, Xiamen 361005, P. R. China
| | - Xin Lu
- State Key Laboratory of Physical Chemistry of Solid Surface, Fujian Provincial Key Laboratory for Theoretical and Computational Chemistry, Department of Chemistry, College of Chemistry and Chemical Engineering, Xiamen University, Xiamen 361005, P. R. China
| |
|
45
|
He S, Nader K, Abarrategi JS, Bediaga H, Nocedo-Mena D, Ascencio E, Casanola-Martin GM, Castellanos-Rubio I, Insausti M, Rasulev B, Arrasate S, González-Díaz H. NANO.PTML model for read-across prediction of nanosystems in neurosciences. computational model and experimental case of study. J Nanobiotechnology 2024; 22:435. [PMID: 39044265 PMCID: PMC11267683 DOI: 10.1186/s12951-024-02660-9] [Received: 04/18/2024] [Accepted: 06/24/2024] [Indexed: 07/25/2024] Open
Abstract
Neurodegenerative diseases involve progressive neuronal death. Traditional treatments often struggle due to solubility, bioavailability, and crossing the Blood-Brain Barrier (BBB). Nanoparticles (NPs) in the biomedical field are garnering growing attention as carriers of neurodegenerative disease drugs (NDDs) to the central nervous system. Here, we introduced computational and experimental analysis. In the computational study, a specific IFPTML technique was used, which combined Information Fusion (IF) + Perturbation Theory (PT) + Machine Learning (ML) to select the most promising Nanoparticle Neuronal Disease Drug Delivery (N2D3) systems. For the application of the IFPTML model in nanoscience, NANO.PTML is used. The IF process was carried out between 4403 NDDs assays and 260 cytotoxicity NP assays, producing a dataset of 500,000 cases. The optimal IFPTML was the Decision Tree (DT) algorithm, which showed satisfactory performance with specificity values of 96.4% and 96.2%, and sensitivity values of 79.3% and 75.7%, in the training (375k/75%) and validation (125k/25%) sets, respectively. Moreover, the DT model obtained Area Under Receiver Operating Characteristic (AUROC) scores of 0.97 and 0.96 in the training and validation series, highlighting its effectiveness in classification tasks. In the experimental part, two samples of NPs (Fe3O4_A and Fe3O4_B) were synthesized by thermal decomposition of an iron(III) oleate (FeOl) precursor and structurally characterized by different methods. Additionally, in order to make the as-synthesized hydrophobic NPs (Fe3O4_A and Fe3O4_B) soluble in water, the amphiphilic CTAB (Cetyl Trimethyl Ammonium Bromide) molecule was employed. Therefore, to conduct a study with a wider range of NP system variants, an illustrative simulation experiment was performed using the IFPTML-DT model. For this, a prediction dataset of 500,000 cases was created. The outcome of this experiment highlighted certain NANO.PTML systems as promising candidates for further investigation. The NANO.PTML approach holds potential to accelerate experimental investigations and offer initial insights into various NP and NDDs compounds, serving as an efficient alternative to time-consuming trial-and-error procedures.
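The sensitivity and specificity figures reported in this abstract are derived from a binary confusion matrix. A minimal sketch follows, assuming labels coded as 1 (positive) and 0 (negative); the label vectors are invented for illustration and the function name is hypothetical.

```python
def sensitivity_specificity(y_true, y_pred):
    """Return (sensitivity, specificity) for binary labels coded as 0/1.

    Assumes at least one positive and one negative case in y_true;
    otherwise one of the denominators would be zero.
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # true negatives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
    return tp / (tp + fn), tn / (tn + fp)


# Invented labels: 4 positive cases and 6 negative cases
observed = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
predicted = [1, 1, 1, 0, 0, 0, 0, 0, 0, 1]

sens, spec = sensitivity_specificity(observed, predicted)
print(f"sensitivity = {sens:.3f}, specificity = {spec:.3f}")
```

A model with high specificity but lower sensitivity, as reported here, rarely raises false alarms but misses a larger share of true positives.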
Affiliation(s)
- Shan He
- Department of Coatings and Polymeric Materials, North Dakota State University, Fargo, ND, 58108, USA
- Department of Organic and Inorganic Chemistry, University of Basque Country UPV/EHU, Leioa, 48940, Spain
- IKERDATA S.L., ZITEK, UPV/EHU, Rectorate Building, nº 6, Leioa, 48940, Greater Bilbao, Basque Country, Spain
| | - Karam Nader
- Department of Organic and Inorganic Chemistry, University of Basque Country UPV/EHU, Leioa, 48940, Spain
| | - Julen Segura Abarrategi
- Department of Organic and Inorganic Chemistry, University of Basque Country UPV/EHU, Leioa, 48940, Spain
| | - Harbil Bediaga
- IKERDATA S.L., ZITEK, UPV/EHU, Rectorate Building, nº 6, Leioa, 48940, Greater Bilbao, Basque Country, Spain
| | - Deyani Nocedo-Mena
- Faculty of Physical Mathematical Sciences, Autonomous University of Nuevo León, San Nicolás de los Garza, 66455, Nuevo León, México
| | - Estefania Ascencio
- Department of Coatings and Polymeric Materials, North Dakota State University, Fargo, ND, 58108, USA
- Department of Organic and Inorganic Chemistry, University of Basque Country UPV/EHU, Leioa, 48940, Spain
- IKERDATA S.L., ZITEK, UPV/EHU, Rectorate Building, nº 6, Leioa, 48940, Greater Bilbao, Basque Country, Spain
| | - Gerardo M Casanola-Martin
- Department of Coatings and Polymeric Materials, North Dakota State University, Fargo, ND, 58108, USA
| | - Idoia Castellanos-Rubio
- Department of Organic and Inorganic Chemistry, University of Basque Country UPV/EHU, Leioa, 48940, Spain.
| | - Maite Insausti
- Department of Organic and Inorganic Chemistry, University of Basque Country UPV/EHU, Leioa, 48940, Spain
- BCMaterials, Basque Center for Materials, Applications and Nanostructures, Leioa, 48940, Spain
| | - Bakhtiyor Rasulev
- Department of Coatings and Polymeric Materials, North Dakota State University, Fargo, ND, 58108, USA
| | - Sonia Arrasate
- Department of Organic and Inorganic Chemistry, University of Basque Country UPV/EHU, Leioa, 48940, Spain.
| | - Humberto González-Díaz
- Department of Organic and Inorganic Chemistry, University of Basque Country UPV/EHU, Leioa, 48940, Spain
- BIOFISIKA: Basque Center for Biophysics CSIC, University of The Basque Country (UPV/EHU), Barrio Sarriena s/n, Leioa, 48940, Bizkaia, Basque Country, Spain
- IKERBASQUE, Basque Foundation for Science, Bilbao, 48011, Biscay, Spain
| |
|
46
|
Tahıl G, Delorme F, Le Berre D, Monflier É, Sayede A, Tilloy S. Stereoisomers Are Not Machine Learning's Best Friends. J Chem Inf Model 2024; 64:5451-5469. [PMID: 38949069 DOI: 10.1021/acs.jcim.4c00318] [Indexed: 07/02/2024]
Abstract
This study addresses the challenge of accurately identifying stereoisomers in cheminformatics, which originates from our objective to apply machine learning to predict the association constant between cyclodextrin and a guest. Identifying stereoisomers is indeed crucial for machine learning applications. Current tools offer various molecular descriptors, including their textual representation as Isomeric SMILES that can distinguish stereoisomers. However, such representation is text-based and does not have a fixed size, so a conversion is needed to make it usable to machine learning approaches. Word embedding techniques can be used to solve this problem. Mol2vec, a word embedding approach for molecules, offers such a conversion. Unfortunately, it cannot distinguish between stereoisomers due to its inability to capture the spatial configuration of molecular structures. This study proposes several approaches that use word embedding techniques to handle molecular discrimination using stereochemical information on molecules or considering Isomeric SMILES notation as a text in Natural Language Processing. Our aim is to generate a distinct vector for each unique molecule, correctly identifying stereoisomer information in cheminformatics. The proposed approaches are then compared to our original machine learning task: predicting the association constant between cyclodextrin and a guest molecule.
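The core difficulty described in this abstract can be seen with plain string handling. The sketch below is not Mol2vec and not the authors' method; it only shows that isomeric SMILES strings for a pair of stereoisomers become identical once the stereo marks are discarded, which is effectively what a stereo-agnostic featurization does. The helper name `strip_stereo` is hypothetical, and the string substitution is a simplification that covers only tetrahedral tags and double-bond direction marks.

```python
import re


def strip_stereo(smiles: str) -> str:
    """Drop stereo marks from a SMILES string (simplified illustration).

    Removes tetrahedral chirality tags (@, @@) and the bond-direction
    marks (/ and backslash) used for double-bond geometry.
    """
    without_chirality = smiles.replace("@", "")
    return re.sub(r"[/\\]", "", without_chirality)


# The two enantiomers of alanine differ only in the chirality tag
enantiomer_a = "C[C@@H](C(=O)O)N"
enantiomer_b = "C[C@H](C(=O)O)N"

# The E and Z isomers of 1,2-difluoroethene differ only in bond direction
isomer_e = "F/C=C/F"
isomer_z = "F/C=C\\F"

print(enantiomer_a != enantiomer_b)                              # distinct as written
print(strip_stereo(enantiomer_a) == strip_stereo(enantiomer_b))  # identical once stripped
print(strip_stereo(isomer_e) == strip_stereo(isomer_z))          # identical once stripped
```

Any embedding built from the stripped form would assign the same vector to both isomers, which is the failure mode the study sets out to avoid.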
Affiliation(s)
- Gökhan Tahıl
- Univ. Artois, CNRS, Centre de Recherche en Informatique de Lens (CRIL), F-62300 Lens, France
- Univ. Artois, CNRS, Centrale Lille, Univ. Lille, UMR 8181, Unité de Catalyse et Chimie du Solide (UCCS), rue Jean Souvraz, SP 18, F-62307 Lens Cedex, France
| | - Fabien Delorme
- Univ. Artois, CNRS, Centre de Recherche en Informatique de Lens (CRIL), F-62300 Lens, France
| | - Daniel Le Berre
- Univ. Artois, CNRS, Centre de Recherche en Informatique de Lens (CRIL), F-62300 Lens, France
| | - Éric Monflier
- Univ. Artois, CNRS, Centrale Lille, Univ. Lille, UMR 8181, Unité de Catalyse et Chimie du Solide (UCCS), rue Jean Souvraz, SP 18, F-62307 Lens Cedex, France
| | - Adlane Sayede
- Univ. Artois, CNRS, Centrale Lille, Univ. Lille, UMR 8181, Unité de Catalyse et Chimie du Solide (UCCS), rue Jean Souvraz, SP 18, F-62307 Lens Cedex, France
| | - Sébastien Tilloy
- Univ. Artois, CNRS, Centrale Lille, Univ. Lille, UMR 8181, Unité de Catalyse et Chimie du Solide (UCCS), rue Jean Souvraz, SP 18, F-62307 Lens Cedex, France
| |
|
47
|
Ayala-Orozco C, Teimouri H, Medvedeva A, Li B, Lathem A, Li G, Kolomeisky AB, Tour JM. Chemoinformatics Insights on Molecular Jackhammers and Cancer Cells. J Chem Inf Model 2024; 64:5570-5579. [PMID: 38958581 DOI: 10.1021/acs.jcim.4c00806] [Indexed: 07/04/2024]
Abstract
One of the most challenging tasks in modern medicine is to find novel efficient cancer therapeutic methods with minimal side effects. The recent discovery of several classes of organic molecules known as "molecular jackhammers" is a promising development in this direction. It is known that these molecules can directly target and eliminate cancer cells with no impact on healthy tissues. However, the underlying microscopic picture remains poorly understood. We present a study that utilizes theoretical analysis together with experimental measurements to clarify the microscopic aspects of jackhammers' anticancer activities. Our physical-chemical approach combines statistical analysis with chemoinformatics methods to design and optimize molecular jackhammers. By correlating specific physical-chemical properties of these molecules with their abilities to kill cancer cells, several important structural features are identified and discussed. Although our theoretical analysis enhances understanding of the molecular interactions of jackhammers, it also highlights the need for further research to comprehensively elucidate their mechanisms and to develop a robust physical-chemical framework for the rational design of targeted anticancer drugs.
Affiliation(s)
| | - Hamid Teimouri
- Department of Chemistry, Rice University, Houston, Texas 77005, United States
- Center for Theoretical Biological Physics, Rice University, Houston, Texas 77005, United States
| | - Angela Medvedeva
- Department of Chemistry, Rice University, Houston, Texas 77005, United States
- Center for Theoretical Biological Physics, Rice University, Houston, Texas 77005, United States
| | - Bowen Li
- Department of Chemistry, Rice University, Houston, Texas 77005, United States
| | - Alex Lathem
- Department of Chemistry, Rice University, Houston, Texas 77005, United States
| | - Gang Li
- Department of Chemistry, Rice University, Houston, Texas 77005, United States
| | - Anatoly B Kolomeisky
- Department of Chemistry, Rice University, Houston, Texas 77005, United States
- Center for Theoretical Biological Physics, Rice University, Houston, Texas 77005, United States
- Department of Chemical and Biomolecular Engineering, Rice University, Houston, Texas 77005, United States
| | - James M Tour
- Department of Chemistry, Rice University, Houston, Texas 77005, United States
- Department of Physics and Astronomy, Rice University, Houston, Texas 77005, United States
- Department of Materials Science and NanoEngineering, Rice University, Houston, Texas 77005, United States
- Smalley-Curl Institute, Rice University, Houston, Texas 77005, United States
- Rice Advanced Materials Institute, Rice University, Houston, Texas 77005, United States
| |
|
48
|
Zhang M, Hiki Y, Funahashi A, Kobayashi TJ. A deep position-encoding model for predicting olfactory perception from molecular structures and electrostatics. NPJ Syst Biol Appl 2024; 10:76. [PMID: 39019918 PMCID: PMC11255234 DOI: 10.1038/s41540-024-00401-0] [Received: 07/13/2023] [Accepted: 06/27/2024] [Indexed: 07/19/2024] Open
Abstract
Predicting olfactory perceptions from odorant molecules is challenging due to the complex and potentially discontinuous nature of the perceptual space for smells. In this study, we introduce a deep learning model, Mol-PECO (Molecular Representation by Positional Encoding of Coulomb Matrix), designed to predict olfactory perceptions based on molecular structures and electrostatics. Mol-PECO learns an efficient embedding of molecules by utilizing the Coulomb matrix, which encodes atomic coordinates and charges, as an alternative to the adjacency matrix, and its Laplacian eigenfunctions as the positional encoding of atoms. With a comprehensive dataset of odor molecules and descriptors, Mol-PECO outperforms traditional machine learning methods using molecular fingerprints and graph neural networks based on adjacency matrices. The embeddings learned by Mol-PECO effectively capture the odor space, enabling global clustering of descriptors and local retrieval of similar odorants. This work contributes to a deeper understanding of the olfactory sense and its mechanisms.
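For readers unfamiliar with the Coulomb matrix, the sketch below builds one using the commonly cited definition: diagonal entries 0.5 * Z_i^2.4 and off-diagonal entries Z_i * Z_j / |R_i - R_j|, with nuclear charges Z and atomic positions R. Whether Mol-PECO uses exactly this form, nuclear or partial charges, or a particular unit system is not stated in the abstract, so those choices are assumptions here, as are the function name and the two-atom example geometry.

```python
import math


def coulomb_matrix(charges, coords):
    """Build a Coulomb matrix from charges Z and 3D coordinates R.

    Diagonal:     M[i][i] = 0.5 * Z_i ** 2.4
    Off-diagonal: M[i][j] = Z_i * Z_j / |R_i - R_j|
    """
    n = len(charges)
    matrix = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i == j:
                matrix[i][j] = 0.5 * charges[i] ** 2.4
            else:
                distance = math.dist(coords[i], coords[j])
                matrix[i][j] = charges[i] * charges[j] / distance
    return matrix


# Hypothetical example: two hydrogen atoms 1.4 length units apart
h2 = coulomb_matrix([1, 1], [(0.0, 0.0, 0.0), (0.0, 0.0, 1.4)])
for row in h2:
    print(row)
```

Unlike an adjacency matrix, every off-diagonal entry is non-zero, so the representation carries information about through-space distances as well as bonded neighbours.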
Affiliation(s)
- Mengji Zhang
- School of Biomedical Engineering, Shanghai Jiao Tong University, Shanghai, China.
- Institute of Industrial Science, The University of Tokyo, Tokyo, Japan.
| | - Yusuke Hiki
- Department of Biosciences and Informatics, Keio University, Yokohama, Japan
| | - Akira Funahashi
- Department of Biosciences and Informatics, Keio University, Yokohama, Japan
| | | |
|
49
|
Howard JR, Shuluk JR, Bhakare A, Anslyn EV. Data-science-guided calibration curve prediction of an MLCT-based ee determination assay for chiral amines. Chem 2024; 10:2074-2088. [PMID: 39006239 PMCID: PMC11243635 DOI: 10.1016/j.chempr.2024.05.009] [Indexed: 07/16/2024]
Abstract
Circular dichroism (CD) based enantiomeric excess (ee) determination assays are optical alternatives to chromatographic ee determination in high-throughput screening (HTS) applications. However, the implementation of these assays requires calibration experiments using enantioenriched materials. We present a data-driven approach that circumvents the need for chiral resolution and calibration experiments for an octahedral Fe(II) complex (1) used for the ee determination of α-chiral primary amines. By computationally parameterizing the imine ligands formed in the assay conditions, a model of the circular dichroism (CD) response of the Fe(II) assembly was developed. Using this model, calibration curves were generated for four analytes and compared to experimentally generated curves. In a single-blind ee determination study, the ee values of unknown samples were determined within 9% mean absolute error, which rivals the error using experimentally generated calibration curves.
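As a reminder of the quantities involved, enantiomeric excess is the signed difference between the two enantiomer amounts divided by their sum, expressed as a percentage. The sketch below also shows how a calibration curve is inverted to read ee from a measured signal. It assumes a linear calibration for simplicity, which the abstract does not state; the slope, signal values, and function names are hypothetical.

```python
def enantiomeric_excess(r_amount, s_amount):
    """Signed ee in percent: positive when the R enantiomer is in excess."""
    return 100.0 * (r_amount - s_amount) / (r_amount + s_amount)


def ee_from_signal(signal, slope, intercept=0.0):
    """Invert an assumed linear calibration: signal = slope * ee + intercept."""
    return (signal - intercept) / slope


# A 75:25 mixture of R and S corresponds to 50% ee
ee_known = enantiomeric_excess(0.75, 0.25)

# Hypothetical calibration: 0.25 signal units per percent ee, zero intercept
ee_measured = ee_from_signal(12.5, slope=0.25)

print(ee_known, ee_measured)
```

The approach in the abstract replaces the experimentally measured calibration parameters with values predicted from computed ligand descriptors, so that the inversion step can be carried out without enantioenriched standards.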
Affiliation(s)
- James R. Howard
- Department of Chemistry, The University of Texas at Austin, Austin, TX 78705 (USA)
| | - Julia R. Shuluk
- Department of Chemistry, The University of Texas at Austin, Austin, TX 78705 (USA)
| | - Arya Bhakare
- Department of Chemistry, The University of Texas at Austin, Austin, TX 78705 (USA)
| | - Eric V. Anslyn
- Department of Chemistry, The University of Texas at Austin, Austin, TX 78705 (USA)
- Lead contact
| |
|
50
|
Abou Hajal A, Bryce RA, Amor BB, Atatreh N, Ghattas MA. Boosting the Accuracy and Chemical Space Coverage of the Detection of Small Colloidal Aggregating Molecules Using the BAD Molecule Filter. J Chem Inf Model 2024; 64:4991-5005. [PMID: 38920403 DOI: 10.1021/acs.jcim.4c00363] [Indexed: 06/27/2024]
Abstract
The ability to conduct effective high-throughput screening (HTS) campaigns in drug discovery is often hampered by false positives caused by small colloidally aggregating molecules (SCAMs). SCAMs can produce artifactual hits in HTS by nonspecific inhibition of the protein target. In this work, we present a new computational prediction tool for detecting SCAMs based on their 2D chemical structure. The tool, called the boosted aggregation detection (BAD) molecule filter, employs decision tree ensemble methods, namely, the CatBoost classifier and the light gradient-boosting machine, to significantly improve the detection of SCAMs. In developing the filter, we explore three approaches, each tailored for specific drug discovery needs: models trained on individual data sets, a consensus approach using these models, and a merged data set approach. The individual data set method emerged as most effective, achieving 93% sensitivity and 90% specificity, outperforming existing state-of-the-art models by 20% and 5%, respectively. The consensus models offer broader chemical space coverage, exceeding 90% for all testing sets. This coverage is particularly important for early stage medicinal chemistry projects and provides information on the applicability domain. Meanwhile, the merged data set models demonstrated robust performance, with a notable sensitivity of 79% in the comprehensive 10-fold cross-validation test set. A SHAP analysis of model features indicates the importance of hydrophobicity and molecular complexity as primary factors influencing the aggregation propensity. The BAD molecule filter is publicly accessible at https://molmodlab-aau.com/Tools.html. This filter provides a new, more robust tool for aggregate prediction in the early stages of drug discovery to optimize hit rates and reduce associated testing and validation overheads.
Affiliation(s)
- Abdallah Abou Hajal
  - College of Pharmacy, Al Ain University, Abu Dhabi 112612, United Arab Emirates
  - AAU Health and Biomedical Research Center, Al Ain University, Abu Dhabi 112612, United Arab Emirates
- Richard A Bryce
  - Division of Pharmacy and Optometry, School of Health Sciences, University of Manchester, Oxford Road, Manchester M13 9PL, U.K.
- Boulbaba Ben Amor
  - Core42, Inception/G42, Abu Dhabi 2282, United Arab Emirates
  - IMT Nord Europe, Villeneuve D'Ascq 59650, France
- Noor Atatreh
  - College of Pharmacy, Al Ain University, Abu Dhabi 112612, United Arab Emirates
  - AAU Health and Biomedical Research Center, Al Ain University, Abu Dhabi 112612, United Arab Emirates
- Mohammad A Ghattas
  - College of Pharmacy, Al Ain University, Abu Dhabi 112612, United Arab Emirates
  - AAU Health and Biomedical Research Center, Al Ain University, Abu Dhabi 112612, United Arab Emirates
|