1
Viesi E, Perricone U, Aloy P, Giugno R. APBIO: bioactive profiling of air pollutants through inferred bioactivity signatures and prediction of novel target interactions. J Cheminform 2025; 17:13. [PMID: 39891207 PMCID: PMC11786462 DOI: 10.1186/s13321-025-00961-1] [Received: 09/19/2024] [Accepted: 01/20/2025] [Indexed: 02/03/2025] Open
Abstract
More sophisticated representations of compounds attempt to incorporate not only information on the structure and physicochemical properties of molecules, but also knowledge about their biological traits, leading to the so-called bioactivity profile. The bioactive profiling of air pollutants is challenging and crucial, as their biological activity and toxicological effects have not yet been investigated in depth, and further exploration could shed light on the impact of air pollution on complex disorders. Therefore, a biological signature that simultaneously captures the chemistry and the biology of small molecules may be beneficial in predicting the behaviour of such ligands towards a protein target. Moreover, the interactivity between biological entities can be represented through combined feature vectors that can be given as input to a machine learning (ML) model to capture the underlying interaction. To this end, we propose a chemogenomic approach, called Air Pollutant Bioactivity (APBIO), which integrates compound bioactivity signatures and target sequence descriptors to train ML classifiers that are subsequently used to predict potential compound-target interactions (CTIs). We report the performance of the proposed methodology and, via external validation sets, demonstrate that it outperforms existing molecular representations in terms of model generalizability. We have also developed a publicly available Streamlit application for APBIO at ap-bio.streamlit.app, allowing users to predict associations between investigated compounds and protein targets.
Scientific contribution: We derived ex novo bioactivity signatures for air pollutant molecules to capture their biological behaviour and associations with protein targets. The proposed chemogenomic methodology enables the prediction of novel CTIs for known or similar compounds and targets through well-established and efficient ML models, deepening our insight into the molecular interactions and mechanisms that may have a deleterious impact on human biological systems.
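The "combined feature vectors" mentioned in the abstract can be illustrated with a minimal sketch. It assumes the simplest possible combination, plain concatenation of a compound vector and a target vector; the abstract does not state how APBIO combines them, and all names and numbers below (`pair_features`, `build_dataset`, the toy vectors) are hypothetical.

```python
from typing import Dict, List, Tuple


def pair_features(compound_sig: List[float], target_desc: List[float]) -> List[float]:
    """Concatenate a compound vector and a target vector into one pair vector.

    Assumption: concatenation is used here only for illustration.
    """
    return list(compound_sig) + list(target_desc)


def build_dataset(
    compounds: Dict[str, List[float]],
    targets: Dict[str, List[float]],
    pairs: List[Tuple[str, str, int]],
) -> Tuple[List[List[float]], List[int]]:
    """Turn labelled (compound, target, label) triples into X and y."""
    X, y = [], []
    for compound_id, target_id, label in pairs:
        X.append(pair_features(compounds[compound_id], targets[target_id]))
        y.append(label)
    return X, y


# Toy, made-up vectors: 3-dimensional compound signatures, 2-dimensional target descriptors.
compounds = {"c1": [0.1, -0.3, 0.8], "c2": [0.5, 0.2, -0.1]}
targets = {"t1": [0.25, 0.75], "t2": [0.6, 0.4]}
pairs = [("c1", "t1", 1), ("c1", "t2", 0), ("c2", "t2", 1)]

X, y = build_dataset(compounds, targets, pairs)
```

Each row of `X` describes one compound-target pair, so any standard binary classifier could in principle be trained on `X` and `y`.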
Affiliation(s)
- Eva Viesi
- Department of Computer Science, University of Verona, Verona, Italy.
- Institute for Research in Biomedicine (IRB Barcelona), The Barcelona Institute of Science and Technology, Barcelona, Catalonia, Spain.
- NBFC, National Biodiversity Future Center, Palermo, Italy.
- Ugo Perricone
- Molecular Informatics Unit, Ri.MED Foundation, Palermo, Italy
- NBFC, National Biodiversity Future Center, Palermo, Italy
- Patrick Aloy
- Institute for Research in Biomedicine (IRB Barcelona), The Barcelona Institute of Science and Technology, Barcelona, Catalonia, Spain
- Institució Catalana de Recerca i Estudis Avançats (ICREA), Barcelona, Catalonia, Spain
- Rosalba Giugno
- Department of Computer Science, University of Verona, Verona, Italy
- NBFC, National Biodiversity Future Center, Palermo, Italy
2
Schaufelberger L, Blaskovits JT, Laplaza R, Jorner K, Corminboeuf C. Inverse Design of Singlet-Fission Materials with Uncertainty-Controlled Genetic Optimization. Angew Chem Int Ed Engl 2025; 64:e202415056. [PMID: 39321389 PMCID: PMC11735885 DOI: 10.1002/anie.202415056] [Received: 08/07/2024] [Revised: 09/20/2024] [Accepted: 09/24/2024] [Indexed: 09/27/2024]
Abstract
Singlet fission has shown potential for boosting the efficiency of solar cells, but the scarcity of suitable molecular materials hinders its implementation. We introduce an uncertainty-controlled genetic algorithm (ucGA) based on ensemble machine learning predictions from different molecular representations that concurrently optimizes excited state energies, synthesizability, and exciton size for the discovery of singlet fission materials. The ucGA allows us to efficiently explore the chemical space spanned by the reFORMED fragment database, which consists of 45,000 cores and 5,000 substituents derived from crystallographic structures assembled in the FORMED repository. Running the ucGA in an exploitative setup performs local optimization on variations of known singlet fission scaffolds, such as acenes. In an explorative mode, hitherto unknown candidates displaying excellent excited state properties for singlet fission are generated. We suggest a class of heteroatom-rich mesoionic compounds as acceptors for charge-transfer mediated singlet fission. When included in larger donor-acceptor systems, these units exhibit localization of the triplet state, favorable diradicaloid character and suitable triplet energies for exciton injection into semiconductor solar cells.
Affiliation(s)
- Luca Schaufelberger
- École polytechnique fédérale de Lausanne (EPFL), Institute of Chemical Sciences and Engineering, Lausanne, Switzerland, CH-1015
- J. Terence Blaskovits
- École polytechnique fédérale de Lausanne (EPFL), Institute of Chemical Sciences and Engineering, Lausanne, Switzerland, CH-1015
- Ruben Laplaza
- École polytechnique fédérale de Lausanne (EPFL), Institute of Chemical Sciences and Engineering, Lausanne, Switzerland, CH-1015
- National Center for Competence in Research – Catalysis (NCCR-Catalysis), École polytechnique fédérale de Lausanne (EPFL), Lausanne, Switzerland, CH-1015
- Kjell Jorner
- ETH Zürich, Institute of Chemical and Bioengineering, Department of Chemistry and Applied Biosciences, Vladimir-Prelog-Weg 1, Zürich, Switzerland, CH-8093
- Clemence Corminboeuf
- École polytechnique fédérale de Lausanne (EPFL), Institute of Chemical Sciences and Engineering, Lausanne, Switzerland, CH-1015
- National Center for Competence in Research – Catalysis (NCCR-Catalysis), École polytechnique fédérale de Lausanne (EPFL), Lausanne, Switzerland, CH-1015
3
Lanini J, Huynh MTD, Scebba G, Schneider N, Rodríguez-Pérez R. UNIQUE: A Framework for Uncertainty Quantification Benchmarking. J Chem Inf Model 2024; 64:8379-8386. [PMID: 39542432 PMCID: PMC11600502 DOI: 10.1021/acs.jcim.4c01578] [Received: 09/01/2024] [Revised: 10/17/2024] [Accepted: 10/30/2024] [Indexed: 11/17/2024]
Abstract
Machine learning (ML) models have become key in decision-making for many disciplines, including drug discovery and medicinal chemistry. ML models are generally evaluated prior to their usage in high-stakes decisions, such as compound synthesis or experimental testing. However, no ML model is robust or predictive in all real-world scenarios. Therefore, uncertainty quantification (UQ) in ML predictions has gained importance in recent years. Many investigations have focused on developing methodologies that provide accurate uncertainty estimates for ML-based predictions. Unfortunately, there is no UQ strategy that consistently provides robust estimates of a model's applicability to new samples. Depending on the dataset, prediction task, and algorithm, accurate uncertainty estimates might be infeasible to obtain. Moreover, the optimum UQ metric also varies across applications, and previous investigations have shown a lack of consistency across benchmarks. Herein, the UNIQUE (UNcertaInty QUantification bEnchmarking) framework is introduced to facilitate a comparison of UQ strategies in ML-based predictions. This Python library unifies the benchmarking of multiple UQ metrics, including the calculation of nonstandard UQ metrics (combining information from the dataset and model), and provides a comprehensive evaluation. In this framework, UQ metrics are evaluated for different application scenarios, e.g., eliminating the predictions with the lowest confidence or obtaining a reliable uncertainty estimate for an acquisition function. Taken together, this library will help to standardize UQ investigations and evaluate new methodologies.
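One of the application scenarios named in the abstract, eliminating the predictions with the lowest confidence, can be sketched generically as follows. This is not the UNIQUE library's API; the function names and toy numbers are assumptions made for illustration only.

```python
from typing import List, Sequence


def mae(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Mean absolute error."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def mae_after_rejection(
    y_true: Sequence[float],
    y_pred: Sequence[float],
    uncertainty: Sequence[float],
    keep_fraction: float,
) -> float:
    """MAE computed on the `keep_fraction` most confident predictions.

    A useful uncertainty estimate should make the error drop as the
    least confident predictions are removed.
    """
    if not 0.0 < keep_fraction <= 1.0:
        raise ValueError("keep_fraction must be in (0, 1]")
    # Sort sample indices from most to least confident (low to high uncertainty).
    order: List[int] = sorted(range(len(y_true)), key=lambda i: uncertainty[i])
    n_keep = max(1, int(round(keep_fraction * len(order))))
    kept = order[:n_keep]
    return mae([y_true[i] for i in kept], [y_pred[i] for i in kept])


# Toy data: the two largest errors also carry the largest uncertainty.
y_true = [1.0, 2.0, 3.0, 4.0]
y_pred = [1.1, 2.1, 2.0, 6.0]
uncertainty = [0.1, 0.2, 0.8, 0.9]

mae_all = mae_after_rejection(y_true, y_pred, uncertainty, keep_fraction=1.0)
mae_half = mae_after_rejection(y_true, y_pred, uncertainty, keep_fraction=0.5)
```

Comparing `mae_all` with `mae_half` gives a quick check of whether an uncertainty estimate ranks errors sensibly on a given dataset.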
Affiliation(s)
- Jessica Lanini
- Novartis Biomedical Research, Novartis Campus, 4002 Basel, Switzerland
- Gaetano Scebba
- Novartis Biomedical Research, Novartis Campus, 4002 Basel, Switzerland
- Nadine Schneider
- Novartis Biomedical Research, Novartis Campus, 4002 Basel, Switzerland
4
Ambe K, Nakamori M, Tohno R, Suzuki K, Sasaki T, Tohkin M, Yoshinari K. Machine Learning-Based In Silico Prediction of the Inhibitory Activity of Chemical Substances Against Rat and Human Cytochrome P450s. Chem Res Toxicol 2024; 37:1843-1850. [PMID: 39427263 DOI: 10.1021/acs.chemrestox.4c00168] [Indexed: 10/22/2024]
Abstract
The prediction of cytochrome P450 inhibition by a computational (quantitative) structure-activity relationship approach using chemical structure information and machine learning would be useful for toxicity research as a simple and rapid in silico tool. However, there are few in silico models focusing on the species differences between rat and human in P450 inhibition. This study aimed to establish in silico models to classify chemical substances as inhibitors or non-inhibitors of various rat and human P450s, using only molecular descriptors. Using in-house results from our in vitro experiments, we used 326 substances for model construction and internal validation. A further 60 substances were used as an external validation data set. We focused on seven rat P450s (CYP1A1, CYP1A2, CYP2B1, CYP2C6, CYP2D1, CYP2E1, and CYP3A2) and 11 human P450s (CYP1A1, CYP1A2, CYP1B1, CYP2A6, CYP2B6, CYP2C8, CYP2C9, CYP2C19, CYP2D6, CYP2E1, and CYP3A4). Most of the models established using XGBoost showed an area under the receiver operating characteristic curve (ROC-AUC) of 0.8 or more in the internal validation. When we set an applicability domain for the models and confirmed their generalization performance through external validation, most of the models showed an ROC-AUC of 0.7 or more. Interestingly, for CYP1A1 and CYP1A2, we discovered that a human P450 inhibitory activity model can predict rat P450 inhibitory activity and vice versa. These models are the first attempts to predict inhibitory activity against a wide variety of P450s in both rats and humans using chemical structure information. Our experimental results and in silico models should help to provide supporting information on species similarities and differences in chemical-induced toxicity.
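For readers unfamiliar with the ROC-AUC values quoted in the abstract, the metric can be computed directly from its pairwise definition: the fraction of (inhibitor, non-inhibitor) pairs in which the inhibitor receives the higher score, with ties counted as one half. The sketch below uses made-up labels and scores, not data from the study.

```python
from typing import Sequence


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """ROC-AUC as the probability that a positive outranks a negative."""
    pos = [s for l, s in zip(labels, scores) if l == 1]
    neg = [s for l, s in zip(labels, scores) if l == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative sample")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # ties count as half a win
    return wins / (len(pos) * len(neg))


# Hypothetical classifier scores: 1 = inhibitor, 0 = non-inhibitor.
labels = [1, 1, 0, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.4, 0.3, 0.1]

auc = roc_auc(labels, scores)  # 8 of the 9 positive-negative pairs are ordered correctly
```

The quadratic loop is only meant to make the definition explicit; rank-based implementations are preferable for large data sets.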
Affiliation(s)
- Kaori Ambe
- Department of Regulatory Science, Graduate School of Pharmaceutical Sciences, Nagoya City University, Nagoya 4678603, Japan
- Mizuki Nakamori
- Department of Regulatory Science, Graduate School of Pharmaceutical Sciences, Nagoya City University, Nagoya 4678603, Japan
- Riku Tohno
- Department of Regulatory Science, Graduate School of Pharmaceutical Sciences, Nagoya City University, Nagoya 4678603, Japan
- Kotaro Suzuki
- Department of Regulatory Science, Graduate School of Pharmaceutical Sciences, Nagoya City University, Nagoya 4678603, Japan
- Takamitsu Sasaki
- Laboratory of Molecular Toxicology, School of Pharmaceutical Sciences, University of Shizuoka, Shizuoka 4228526, Japan
- Masahiro Tohkin
- Department of Regulatory Science, Graduate School of Pharmaceutical Sciences, Nagoya City University, Nagoya 4678603, Japan
- Kouichi Yoshinari
- Laboratory of Molecular Toxicology, School of Pharmaceutical Sciences, University of Shizuoka, Shizuoka 4228526, Japan
5
Gadaleta D, Garcia de Lomana M, Serrano-Candelas E, Ortega-Vallbona R, Gozalbes R, Roncaglioni A, Benfenati E. Quantitative structure-activity relationships of chemical bioactivity toward proteins associated with molecular initiating events of organ-specific toxicity. J Cheminform 2024; 16:122. [PMID: 39501321 PMCID: PMC11539312 DOI: 10.1186/s13321-024-00917-x] [Received: 08/06/2024] [Accepted: 10/18/2024] [Indexed: 11/08/2024] Open
Abstract
The adverse outcome pathway (AOP) concept has gained attention as a way to explore the mechanism of chemical toxicity. In this study, quantitative structure-activity relationship (QSAR) models were developed to predict compound activity toward protein targets relevant to molecular initiating events (MIEs) upstream of organ-specific toxicities, namely liver steatosis, cholestasis, nephrotoxicity, neural tube closure defects, and cognitive functional defects. Utilizing bioactivity data from the ChEMBL 33 database, various machine learning algorithms, chemical features and methods to assess prediction reliability were compared and applied to develop robust models to predict compound activity. The results demonstrate high predictive performance across multiple targets, with balanced accuracy exceeding 0.80 for the majority of models. Furthermore, stability checks confirmed the consistency of predictive performance across multiple training-test splits. The results obtained by using QSAR predictions to identify known markers of adversities highlighted the utility of the models for risk assessment and for prioritizing compounds for further experimental evaluation.
Scientific contribution: The work describes the development of QSAR models as tools for screening chemicals with potential systemic toxicity, thus contributing to resource savings and providing indications for further, better-targeted testing. This study provides advances in the field of computational modeling of MIEs and of information from AOPs, which is still relatively young and unexplored. The comprehensive modeling procedure is highly generalizable and offers a robust framework for predicting a wide range of toxicological endpoints.
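Balanced accuracy, the metric quoted in the abstract, is the mean of sensitivity and specificity, which makes it more informative than plain accuracy when inactive compounds dominate. A minimal sketch with made-up labels (not data from the study):

```python
from typing import Sequence


def balanced_accuracy(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Mean of sensitivity (recall on actives) and specificity (recall on inactives)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("both classes must be present in y_true")
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return (sensitivity + specificity) / 2.0


# Imbalanced toy set: 2 actives, 8 inactives; the model misses one active.
y_true = [1, 1, 0, 0, 0, 0, 0, 0, 0, 0]
y_pred = [1, 0, 0, 0, 0, 0, 0, 0, 0, 0]

plain_accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
bal_acc = balanced_accuracy(y_true, y_pred)
```

Here plain accuracy is 0.9 while balanced accuracy is 0.75, because half of the actives were missed.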
Affiliation(s)
- Domenico Gadaleta
- Laboratory of Environmental Chemistry and Toxicology, Department of Environmental Health Sciences, Istituto Di Ricerche Farmacologiche Mario Negri IRCCS, Milan, Italy.
- Marina Garcia de Lomana
- Bayer AG, Machine Learning Research, Research & Development, Pharmaceuticals, Berlin, Germany
- Eva Serrano-Candelas
- ProtoQSAR SL, CEEI (Centro Europeo de Empresas Innovadoras), Parque Tecnológico de Valencia, Paterna, Valencia, Spain
- Rita Ortega-Vallbona
- ProtoQSAR SL, CEEI (Centro Europeo de Empresas Innovadoras), Parque Tecnológico de Valencia, Paterna, Valencia, Spain
- Rafael Gozalbes
- ProtoQSAR SL, CEEI (Centro Europeo de Empresas Innovadoras), Parque Tecnológico de Valencia, Paterna, Valencia, Spain
- Alessandra Roncaglioni
- Laboratory of Environmental Chemistry and Toxicology, Department of Environmental Health Sciences, Istituto Di Ricerche Farmacologiche Mario Negri IRCCS, Milan, Italy
- Emilio Benfenati
- Laboratory of Environmental Chemistry and Toxicology, Department of Environmental Health Sciences, Istituto Di Ricerche Farmacologiche Mario Negri IRCCS, Milan, Italy
6
Ghislat G, Hernandez-Hernandez S, Piyawajanusorn C, Ballester PJ. Data-centric challenges with the application and adoption of artificial intelligence for drug discovery. Expert Opin Drug Discov 2024; 19:1297-1307. [PMID: 39316009 DOI: 10.1080/17460441.2024.2403639] [Received: 07/09/2024] [Accepted: 09/09/2024] [Indexed: 09/25/2024]
Abstract
INTRODUCTION: Artificial intelligence (AI) is exhibiting tremendous potential to reduce the massive costs and long timescales of drug discovery. There are, however, important challenges currently limiting the impact and scope of AI models.
AREAS COVERED: In this perspective, the authors discuss a range of data issues (bias, inconsistency, skewness, irrelevance, small size, high dimensionality), how they challenge AI models, and which issue-specific mitigations have been effective. Next, they point out the challenges faced by uncertainty quantification techniques aimed at enhancing and trusting the predictions from these AI models. They also discuss how conceptual errors, unrealistic benchmarks and performance misestimation can confound the evaluation of models and thus their development. Lastly, the authors explain how human bias, whether from AI experts or drug discovery experts, constitutes another challenge that can be alleviated by gaining more prospective experience.
EXPERT OPINION: AI models are often developed to excel on retrospective benchmarks unlikely to anticipate their prospective performance. As a result, only a few of these models are ever reported to have prospective value (e.g. by discovering potent and innovative drug leads for a therapeutic target). The authors have discussed what can go wrong in practice with AI for drug discovery. The authors hope that this will help inform the decisions of editors, funders, investors, and researchers working in this area.
Affiliation(s)
- Ghita Ghislat
- Department of Life Sciences, Imperial College London, London, UK
7
Roy J, Roy K. Insights into nanoparticle toxicity against aquatic organisms using multivariate regression, read-across, and ML algorithms: Predictive models for Daphnia magna and Danio rerio. Aquat Toxicol 2024; 276:107114. [PMID: 39396443 DOI: 10.1016/j.aquatox.2024.107114] [Received: 08/01/2024] [Revised: 09/14/2024] [Accepted: 10/02/2024] [Indexed: 10/15/2024]
Abstract
The production of nanoparticles (NPs) has recently become more prevalent owing to their numerous applications in the fast-growing nanotechnology industry. Although nanoparticles have growing applications, there is a significant concern over their environmental impact due to their inevitable release into the environment. With the increasing risk to aquatic organisms, Daphnia magna and zebrafish (Danio rerio) have been preferred as important freshwater model organisms for risk assessment and ecotoxicological studies on metal oxide-based nanoparticles (MeOxNPs) in aquatic environments. It is unfeasible to assess the risks associated with every single NP through in vivo or in vitro experiments. As an alternative, in silico approaches are employed to evaluate NP toxicity. To evaluate the performance of such approaches, we have collected data from databases and literature reviews to develop models based on multivariate regression, the read-across (RA) approach, and machine learning (ML) algorithms, following the principles of the OECD (Organization for Economic Cooperation and Development) for QSAR modeling. This work has aimed to investigate which features are important drivers of nanotoxicity in D. magna and Danio rerio using simple periodic table-derived descriptors. Further, we have examined the effectiveness of read-across-derived similarity measures compared to traditional QSAR models. The results obtained from model 1 indicate that the nanoparticles' size, the number of metals, the core environment of the metal present in the metal oxide, and the oxidation number of the metal play a key role in the final expression of toxicity of nanoparticles to D. magna. On the other hand, higher molecular weight, the core of the metal, and the presence of oxygen influence the enzyme inhibition activity. The enzyme inhibition is correlated with the ability of zebrafish embryos to hatch, and therefore, the inhibition of ZHE1 seems to be the factor driving hatch delay. The study emphasized the importance of developing transferable, reproducible, and easily interpretable models for the early identification of nanoparticle features contributing to aquatic toxicity.
Affiliation(s)
- Joyita Roy
- Drug Theoretics and Cheminformatics Laboratory, Department of Pharmaceutical Technology, Jadavpur University, Kolkata, 700032, India
- Kunal Roy
- Drug Theoretics and Cheminformatics Laboratory, Department of Pharmaceutical Technology, Jadavpur University, Kolkata, 700032, India.
8
Venkatraman V, Gaiser J, Demekas D, Roy A, Xiong R, Wheeler TJ. Do Molecular Fingerprints Identify Diverse Active Drugs in Large-Scale Virtual Screening? (No). Pharmaceuticals (Basel) 2024; 17:992. [PMID: 39204097 PMCID: PMC11356940 DOI: 10.3390/ph17080992] [Received: 06/08/2024] [Revised: 07/18/2024] [Accepted: 07/23/2024] [Indexed: 09/03/2024] Open
Abstract
Computational approaches for small-molecule drug discovery now regularly scale to the consideration of libraries containing billions of candidate small molecules. One promising approach to increase the speed of evaluating billion-molecule libraries is to develop succinct representations of each molecule that enable the rapid identification of molecules with similar properties. Molecular fingerprints are thought to provide a mechanism for producing such representations. Here, we explore the utility of commonly used fingerprints in the context of predicting similar molecular activity. We show that fingerprint similarity provides little discriminative power between active and inactive molecules for a target protein based on a known active: while they may sometimes provide some enrichment for active molecules in a drug screen, a screened data set will still be dominated by inactive molecules. We also demonstrate that high-similarity actives appear to share a scaffold with the query active, meaning that they could more easily be identified by structural enumeration. Furthermore, even when limited to only active molecules, fingerprint similarity values do not correlate with compound potency. In sum, these results highlight the need for a new wave of molecular representations that will improve the capacity to detect biologically active molecules based on their similarity to other such molecules.
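Fingerprint similarity of the kind evaluated in this abstract is commonly measured with the Tanimoto coefficient on the sets of "on" bits. The sketch below assumes that measure (the abstract does not name one) and uses made-up bit sets; `query`, `library` and the molecule names are hypothetical.

```python
from typing import Dict, List, Set


def tanimoto(a: Set[int], b: Set[int]) -> float:
    """Tanimoto coefficient |A & B| / |A | B| between two sets of on-bits."""
    if not a and not b:
        return 0.0  # convention chosen here for two empty fingerprints
    return len(a & b) / len(a | b)


def rank_by_similarity(query: Set[int], library: Dict[str, Set[int]]) -> List[str]:
    """Library molecule names sorted from most to least similar to the query."""
    return sorted(library, key=lambda name: tanimoto(query, library[name]), reverse=True)


# Toy fingerprints given as the indices of bits that are set.
query = {1, 4, 7, 9}
library = {
    "mol_a": {1, 4, 7, 9, 12},  # shares 4 of 5 distinct bits with the query
    "mol_b": {1, 4, 20},        # shares 2 of 5 distinct bits
    "mol_c": {30, 31},          # shares nothing
}

ranked = rank_by_similarity(query, library)
```

A similarity-based screen would take the top of `ranked` as candidate actives, which is the procedure whose discriminative power the abstract calls into question.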
Affiliation(s)
- Vishwesh Venkatraman
- Department of Chemistry, Norwegian University of Science and Technology, 7034 Trondheim, Norway
- Jeremiah Gaiser
- School of Information, University of Arizona, Tucson, AZ 85721, USA
- Daphne Demekas
- R. Ken Coit College of Pharmacy, University of Arizona, Tucson, AZ 85721, USA
- Amitava Roy
- Rocky Mountain Laboratories, Bioinformatics and Computational Biosciences Branch, Office of Cyber Infrastructure and Computational Biology, National Institute of Allergy and Infectious Diseases, National Institutes of Health, Hamilton, MT 59840, USA
- Department of Biomedical and Pharmaceutical Sciences, University of Montana, Missoula, MT 59812, USA
- Rui Xiong
- Department of Pharmacology & Toxicology, University of Arizona, Tucson, AZ 85721, USA
- Travis J. Wheeler
- R. Ken Coit College of Pharmacy, University of Arizona, Tucson, AZ 85721, USA
9
Lenhof K, Eckhart L, Rolli LM, Lenhof HP. Trust me if you can: a survey on reliability and interpretability of machine learning approaches for drug sensitivity prediction in cancer. Brief Bioinform 2024; 25:bbae379. [PMID: 39101498 PMCID: PMC11299037 DOI: 10.1093/bib/bbae379] [Received: 03/07/2024] [Revised: 07/08/2024] [Accepted: 07/19/2024] [Indexed: 08/06/2024] Open
Abstract
With the ever-increasing number of artificial intelligence (AI) systems, mitigating risks associated with their use has become one of the most urgent scientific and societal issues. To this end, the European Union passed the EU AI Act, proposing solution strategies that can be summarized under the umbrella term trustworthiness. In anti-cancer drug sensitivity prediction, machine learning (ML) methods are developed for application in medical decision support systems, which require an extraordinary level of trustworthiness. This review offers an overview of the ML landscape of methods for anti-cancer drug sensitivity prediction, including a brief introduction to the four major ML realms (supervised, unsupervised, semi-supervised, and reinforcement learning). In particular, we address the question to what extent trustworthiness-related properties, more specifically, interpretability and reliability, have been incorporated into anti-cancer drug sensitivity prediction methods over the previous decade. In total, we analyzed 36 papers with approaches for anti-cancer drug sensitivity prediction. Our results indicate that the need for reliability has hardly been addressed so far. Interpretability, on the other hand, has often been considered for model development. However, the concept is rather used intuitively, lacking clear definitions. Thus, we propose an easily extensible taxonomy for interpretability, unifying all prevalent connotations explicitly or implicitly used within the field.
Affiliation(s)
- Kerstin Lenhof
- Center for Bioinformatics, Chair for Bioinformatics, Saarland Informatics Campus (E2.1) Saarland University, Campus, D-66123 Saarbrücken, Saarland, Germany
- Lea Eckhart
- Center for Bioinformatics, Chair for Bioinformatics, Saarland Informatics Campus (E2.1) Saarland University, Campus, D-66123 Saarbrücken, Saarland, Germany
- Lisa-Marie Rolli
- Center for Bioinformatics, Chair for Bioinformatics, Saarland Informatics Campus (E2.1) Saarland University, Campus, D-66123 Saarbrücken, Saarland, Germany
- Hans-Peter Lenhof
- Center for Bioinformatics, Chair for Bioinformatics, Saarland Informatics Campus (E2.1) Saarland University, Campus, D-66123 Saarbrücken, Saarland, Germany
10
Dutschmann TM, Schlenker V, Baumann K. Chemoinformatic regression methods and their applicability domain. Mol Inform 2024; 43:e202400018. [PMID: 38803302 DOI: 10.1002/minf.202400018] [Received: 01/16/2024] [Revised: 03/24/2024] [Accepted: 03/25/2024] [Indexed: 05/29/2024]
Abstract
The growing interest in chemoinformatic model uncertainty calls for a summary of the most widely used regression techniques and how to estimate their reliability. Regression models learn a mapping from the space of explanatory variables to the space of continuous output values. Among other limitations, the predictive performance of the model is restricted by the training data used for model fitting. Identification of unusual objects by outlier detection methods can improve model performance. Additionally, proper model evaluation necessitates defining the limitations of the model, often called the applicability domain. Comparable to certain classifiers, some regression techniques come with built-in methods or augmentations to quantify their (un)certainty, while others rely on generic procedures. The theoretical background of their working principles and how to deduce specific and general definitions for their domain of applicability shall be explained.
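As an illustration of a generic applicability domain definition of the kind the abstract alludes to, the sketch below flags a query as in-domain when its mean distance to its k nearest training points does not exceed the largest such distance observed within the training set. This particular rule, the Euclidean metric, and all names and numbers are assumptions for illustration; the abstract does not specify any of them.

```python
import math
from typing import List, Sequence


def euclid(a: Sequence[float], b: Sequence[float]) -> float:
    """Euclidean distance between two descriptor vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def knn_distance(x: Sequence[float], train: List[Sequence[float]], k: int) -> float:
    """Mean distance from x to its k nearest training points."""
    distances = sorted(euclid(x, t) for t in train)
    return sum(distances[:k]) / k


def ad_threshold(train: List[Sequence[float]], k: int) -> float:
    """Largest leave-one-out k-NN distance found inside the training set."""
    return max(
        knn_distance(x, train[:i] + train[i + 1:], k) for i, x in enumerate(train)
    )


def in_domain(x: Sequence[float], train: List[Sequence[float]], k: int, threshold: float) -> bool:
    """True if x is at least as close to the training data as the training points are to each other."""
    return knn_distance(x, train, k) <= threshold


# Toy 2-D descriptor space: training compounds on the corners of a unit square.
train = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
k = 2
threshold = ad_threshold(train, k)

inside = in_domain([0.5, 0.5], train, k, threshold)   # centre of the square
outside = in_domain([5.0, 5.0], train, k, threshold)  # far from all training data
```

Predictions for compounds flagged as outside the domain would then be reported with a warning or withheld, depending on the use case.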
Affiliation(s)
- Thomas-Martin Dutschmann
- Institute of Medicinal and Pharmaceutical Chemistry, University of Technology Braunschweig, 38106, Braunschweig, Germany
- Valerie Schlenker
- Institute of Medicinal and Pharmaceutical Chemistry, University of Technology Braunschweig, 38106, Braunschweig, Germany
- Knut Baumann
- Institute of Medicinal and Pharmaceutical Chemistry, University of Technology Braunschweig, 38106, Braunschweig, Germany
11
Balraadjsing S, Peijnenburg WJGM, Vijver MG. Building species trait-specific nano-QSARs: Model stacking, navigating model uncertainties and limitations, and the effect of dataset size. Environ Int 2024; 188:108764. [PMID: 38788418 DOI: 10.1016/j.envint.2024.108764] [Received: 04/02/2024] [Revised: 05/17/2024] [Accepted: 05/19/2024] [Indexed: 05/26/2024]
Abstract
A strong need exists for broadly applicable nano-QSARs, capable of predicting toxicological outcomes towards untested species and nanomaterials, under different environmental conditions. Existing nano-QSARs are generally limited to only a few species but the inclusion of species characteristics into models can aid in making them applicable to multiple species, even when toxicity data is not available for biological species. Species traits were used to create classification- and regression machine learning models to predict acute toxicity towards aquatic species for metallic nanomaterials. Afterwards, the individual classification- and regression models were stacked into a meta-model to improve performance. Additionally, the uncertainty and limitations of the models were assessed in detail (beyond the OECD principles) and it was investigated whether models would benefit from the addition of more data. Results showed a significant improvement in model performance following model stacking. Investigation of model uncertainties and limitations highlighted the discrepancy between the applicability domain and accuracy of predictions. Data points outside of the assessed chemical space did not have higher likelihoods of generating inadequate predictions or vice versa. It is therefore concluded that the applicability domain does not give complete insight into the uncertainty of predictions and instead the generation of prediction intervals can help in this regard. Furthermore, results indicated that an increase of the dataset size did not improve model performance. This implies that larger dataset sizes may not necessarily improve model performance while in turn also meaning that large datasets are not necessarily required for prediction of acute toxicity with nano-QSARs.
Affiliation(s)
- Surendra Balraadjsing
- Institute of Environmental Sciences (CML), Leiden University, PO Box 9518, 2300 RA Leiden, the Netherlands.
- Willie J G M Peijnenburg
- Institute of Environmental Sciences (CML), Leiden University, PO Box 9518, 2300 RA Leiden, the Netherlands; Centre for Safety of Substances and Products, National Institute of Public Health and the Environment (RIVM), PO Box 1, 3720 BA Bilthoven, the Netherlands
- Martina G Vijver
- Institute of Environmental Sciences (CML), Leiden University, PO Box 9518, 2300 RA Leiden, the Netherlands
12
Barbosa H, Espinoza GZ, Amaral M, de Castro Levatti EV, Abiuzi MB, Veríssimo GC, Fernandes PDO, Maltarollo VG, Tempone AG, Honorio KM, Lago JHG. Andrographolide: A Diterpenoid from Cymbopogon schoenanthus Identified as a New Hit Compound against Trypanosoma cruzi Using Machine Learning and Experimental Approaches. J Chem Inf Model 2024; 64:2565-2576. [PMID: 38148604 DOI: 10.1021/acs.jcim.3c01410] [Indexed: 12/28/2023]
Abstract
American trypanosomiasis, also known as Chagas disease, is caused by the protozoan Trypanosoma cruzi and has limited treatment options. Natural products offer various structurally complex metabolites with biological activities, including those with anti-T. cruzi potential. The discovery and development of prototypes based on natural products frequently involve multiple phases that could be facilitated by machine learning techniques, which provide a fast and efficient method for selecting new hit candidates. Using Random Forest and k-Nearest Neighbors, two models were constructed to predict the biological activity of natural products from plants against intracellular amastigotes of T. cruzi. The diterpenoid andrographolide was identified by virtual screening as a promising hit compound. It was subsequently isolated from Cymbopogon schoenanthus and chemically characterized by spectral data analysis. Andrographolide was evaluated against trypomastigote and amastigote forms of T. cruzi, showing IC50 values of 29.4 and 2.9 μM, respectively, while the standard drug benznidazole displayed IC50 values of 17.7 and 5.0 μM, respectively. Additionally, the isolated compound exhibited reduced cytotoxicity (CC50 = 92.8 μM) against mammalian cells and afforded a selectivity index (SI) of 32, similar to that of benznidazole (SI = 39). From the in silico analyses, we can conclude that andrographolide fulfills many of the requirements set by DNDi for a hit compound. Therefore, this work successfully obtained machine learning models capable of predicting the activity of compounds against intracellular forms of T. cruzi.
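The selectivity index quoted above can be checked with a short calculation. The sketch below assumes the common definition SI = CC50 / IC50 and assumes that the amastigote IC50 (2.9 μM) is the value used, since that choice reproduces the reported SI of 32; the function name is hypothetical and not taken from the paper.

```python
def selectivity_index(cc50_um, ic50_um):
    """Selectivity index, assumed here to be CC50 / IC50 (both in micromolar)."""
    return cc50_um / ic50_um


# Values reported in the abstract for andrographolide (amastigote form).
si_andrographolide = selectivity_index(cc50_um=92.8, ic50_um=2.9)
print(round(si_andrographolide))  # 32
```

The CC50 of benznidazole is not given in the abstract, so its SI of 39 cannot be recomputed in the same way.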
Affiliation(s)
- Henrique Barbosa
  - Center for Natural and Human Sciences, Federal University of ABC, São Paulo 09210-180, Brazil
- Maiara Amaral
  - Laboratory of Pathophysiology, Butantan Institute, São Paulo 05503-900, Brazil
- Gabriel Correa Veríssimo
  - Department of Pharmaceutical Products, Federal University of Minas Gerais, Minas Gerais, 31270-901, Brazil
- Kathia Maria Honorio
  - Center for Natural and Human Sciences, Federal University of ABC, São Paulo 09210-180, Brazil
  - School of Arts, Science, and Humanities, University of São Paulo, São Paulo 03828-000, Brazil
13
Jimenes-Vargas K, Pazos A, Munteanu CR, Perez-Castillo Y, Tejera E. Prediction of compound-target interaction using several artificial intelligence algorithms and comparison with a consensus-based strategy. J Cheminform 2024; 16:27. [PMID: 38449058 PMCID: PMC10919000 DOI: 10.1186/s13321-024-00816-1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/20/2023] [Accepted: 02/15/2024] [Indexed: 03/08/2024] Open
Abstract
For understanding a chemical compound's mechanism of action and its side effects, as well as for drug discovery, it is crucial to predict its possible protein targets. This study examines 15 developed target-centric models (TCM) employing different molecular descriptions and machine learning algorithms. They were contrasted with 17 third-party models implemented as web tools (WTCM). In both sets of models, consensus strategies were implemented as a potential improvement over individual predictions. The findings indicate that TCM reach F1-score values greater than 0.8. Comparing both approaches, the best TCM achieves values of 0.75, 0.61, 0.25 and 0.38 for true positive/negative rates (TPR, TNR) and false negative/positive rates (FNR, FPR), outperforming the best WTCM. Moreover, the consensus strategy gives its most relevant results in the top 20% of target profiles: the TCM consensus reaches TPR and FNR values of 0.98 and 0, while for WTCM it reaches values of 0.75 and 0.24. The implemented computational tool with the TCM and their consensus strategy is available at https://bioquimio.udla.edu.ec/tidentification01/. Scientific Contribution: We compare and discuss the performances of 17 public compound-target interaction prediction models and 15 new constructions. We also explore a compound-target interaction prioritization strategy using a consensus approach, and we analyze the challenges involved in interaction modeling.
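As a reading aid for the four rates quoted above, the sketch below computes TPR, FNR, TNR and FPR from confusion-matrix counts. The counts are hypothetical, chosen only so that the rates land near the reported values; they are not data from the study. By definition TPR + FNR = 1 and TNR + FPR = 1, so the reported TNR of 0.61 and FPR of 0.38 presumably sum to 0.99 only because of rounding.

```python
def confusion_rates(tp, fn, tn, fp):
    """Return the four standard rates from confusion-matrix counts."""
    positives = tp + fn  # all truly interacting pairs
    negatives = tn + fp  # all truly non-interacting pairs
    return {
        "TPR": tp / positives,
        "FNR": fn / positives,
        "TNR": tn / negatives,
        "FPR": fp / negatives,
    }


# Hypothetical counts, for illustration only.
rates = confusion_rates(tp=75, fn=25, tn=61, fp=39)
print(rates)
```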
Affiliation(s)
- Karina Jimenes-Vargas
  - Bio-Cheminformatics Research Group, Universidad de Las Américas, Quito, 170504, Ecuador
  - Department of Computer Science and Information Technologies, Faculty of Computer Science, Universidade da Coruña, Campus Elviña s/n, 15071, A Coruña, Spain
- Alejandro Pazos
  - Department of Computer Science and Information Technologies, Faculty of Computer Science, Universidade da Coruña, Campus Elviña s/n, 15071, A Coruña, Spain
  - CITIC-Research Center of Information and Communication Technologies, Universidade da Coruña, 15071, A Coruña, Spain
  - Biomedical Research Institute of A Coruña (INIBIC), University Hospital Complex of A Coruña (CHUAC), 15006, A Coruña, Spain
- Cristian R Munteanu
  - Department of Computer Science and Information Technologies, Faculty of Computer Science, Universidade da Coruña, Campus Elviña s/n, 15071, A Coruña, Spain
  - CITIC-Research Center of Information and Communication Technologies, Universidade da Coruña, 15071, A Coruña, Spain
  - Biomedical Research Institute of A Coruña (INIBIC), University Hospital Complex of A Coruña (CHUAC), 15006, A Coruña, Spain
- Eduardo Tejera
  - Bio-Cheminformatics Research Group, Universidad de Las Américas, Quito, 170504, Ecuador
14
|
Mora JR, Marquez EA, Pérez-Pérez N, Contreras-Torres E, Perez-Castillo Y, Agüero-Chapin G, Martinez-Rios F, Marrero-Ponce Y, Barigye SJ. Rethinking the applicability domain analysis in QSAR models. J Comput Aided Mol Des 2024; 38:9. [PMID: 38351144 DOI: 10.1007/s10822-024-00550-8] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/14/2023] [Accepted: 02/05/2024] [Indexed: 02/16/2024]
Abstract
Notwithstanding the wide adoption of the OECD principles (or best practices) for QSAR modeling, disparities between in silico predictions and experimental results are frequent, suggesting that model predictions are often too optimistic. Of these OECD principles, the applicability domain (AD) estimation has been recognized in several reports in the literature to be one of the most challenging, implying that the actual reliability measures of model predictions are often unreliable. Applying tree-based error analysis workflows on 5 QSAR models reported in the literature and available in the QsarDB repository, i.e., androgen receptor bioactivity (agonists, antagonists, and binders, respectively) and membrane permeability (highest membrane permeability and the intrinsic permeability), we demonstrate that predictions erroneously tagged as reliable (AD prediction errors) overwhelmingly correspond to instances in subspaces (cohorts) with the highest prediction error rates, highlighting the inhomogeneity of the AD space. In this sense, we call for more stringent AD analysis guidelines which require the incorporation of model error analysis schemes, to provide critical insight on the reliability of underlying AD algorithms. Additionally, any selected AD method should be rigorously validated to demonstrate its suitability for the model space over which it is applied. These steps will ultimately contribute to more accurate estimations of the reliability of model predictions. Finally, error analysis may also be useful in "rational" model refinement in that data expansion efforts and model retraining are focused on cohorts with the highest error rates.
Affiliation(s)
- Jose R Mora
  - Departamento de Ingeniería Química, Universidad San Francisco de Quito (USFQ), Instituto de Simulación Computacional (ISC-USFQ), Diego de Robles y Vía Interoceánica, Quito, 170901, Ecuador
- Edgar A Marquez
  - Grupo de Investigaciones en Química Y Biología, Departamento de Química Y Biología, Facultad de Ciencias Básicas, Universidad del Norte, Carrera 51B, Km 5, vía Puerto Colombia, Barranquilla, 081007, Colombia
  - Departamento de Ciencias de la Computación, Centro de Investigación Científica y de Educación Superior de Ensenada (CICESE), Cátedras Conacyt, Ensenada, Baja California, México
- Noel Pérez-Pérez
  - Colegio de Ciencias e Ingenierías "El Politécnico", Universidad San Francisco de Quito (USFQ), Quito, Ecuador
- Ernesto Contreras-Torres
  - Grupo de Medicina Molecular y Traslacional (MeM&T), Universidad San Francisco de Quito, Escuela de Medicina, Colegio de Ciencias de la Salud (COCSA), Av. Interoceánica Km 12 1/2 y Av. Florencia, 17, Quito, 1200-841, Ecuador
- Yunierkis Perez-Castillo
  - Bio-Chemoinformatics Research Group, Escuela de Ciencias Físicas y Matemáticas, Universidad de Las Américas, Quito, 170504, Ecuador
- Guillermin Agüero-Chapin
  - CIIMAR - Interdisciplinary Centre of Marine and Environmental Research, University of Porto, Terminal de Cruzeiros do Porto de Leixões, Av. General Norton de Matos s/n, Porto, 4450-208, Portugal
  - Department of Biology, Faculty of Sciences, University of Porto, Rua do Campo Alegre, Porto, 4169-007, Portugal
- Felix Martinez-Rios
  - Facultad de Ingeniería, Universidad Panamericana, CDMX, Augusto Rodin No. 498, Insurgentes Mixcoac, Benito Juárez, Ciudad de México, 03920, México
- Yovani Marrero-Ponce
  - Grupo de Medicina Molecular y Traslacional (MeM&T), Universidad San Francisco de Quito, Escuela de Medicina, Colegio de Ciencias de la Salud (COCSA), Av. Interoceánica Km 12 1/2 y Av. Florencia, 17, Quito, 1200-841, Ecuador
  - Facultad de Ingeniería, Universidad Panamericana, CDMX, Augusto Rodin No. 498, Insurgentes Mixcoac, Benito Juárez, Ciudad de México, 03920, México
  - Computer-Aided Molecular "Biosilico" Discovery and Bioinformatics Research International Network (CAMD-BIR IN), Cumbayá, Quito, Ecuador
- Stephen J Barigye
  - Departamento de Química Física Aplicada, Facultad de Ciencias, Universidad Autónoma de Madrid (UAM), Madrid, 28049, Spain
15
Martinez-Mayorga K, Rosas-Jiménez JG, Gonzalez-Ponce K, López-López E, Neme A, Medina-Franco JL. The pursuit of accurate predictive models of the bioactivity of small molecules. Chem Sci 2024; 15:1938-1952. [PMID: 38332817 PMCID: PMC10848664 DOI: 10.1039/d3sc05534e] [Citation(s) in RCA: 5] [Impact Index Per Article: 5.0] [Received: 10/18/2023] [Accepted: 01/09/2024] [Indexed: 02/10/2024] Open
Abstract
Property prediction is a key interest in chemistry. For several decades there has been continued, incremental development of mathematical models to predict properties. As more data are generated and accumulated, there seem to be more opportunities to develop models with increased accuracy. The same is true if one considers the large developments in machine and deep learning models. However, alongside these opportunities and developments, issues and challenges remain and, with more data, new challenges emerge, such as the quality, quantity, and reliability of the data, and model reproducibility. Herein, we discuss the status of the accuracy of predictive models and present the authors' perspective on the direction of the field, emphasizing good practices. We focus on predictive models of bioactive properties of small molecules relevant for drug discovery, agrochemicals, food chemistry, natural product research, and related fields.
Affiliation(s)
- Karina Martinez-Mayorga
  - Institute of Chemistry, Merida Unit, National Autonomous University of Mexico Merida-Tetiz Highway, Km. 4.5 Ucu Yucatan Mexico
  - Institute for Applied Mathematics and Systems, Merida Research Unit, National Autonomous University of Mexico Sierra Papacal Merida Yucatan Mexico
- José G Rosas-Jiménez
  - Department of Theoretical Biophysics, IMPRS on Cellular Biophysics Max-von-Laue Strasse 3 Frankfurt am Main 60438 Germany
- Karla Gonzalez-Ponce
  - Institute of Chemistry, Merida Unit, National Autonomous University of Mexico Merida-Tetiz Highway, Km. 4.5 Ucu Yucatan Mexico
- Edgar López-López
  - Department of Chemistry and Graduate Program in Pharmacology, Center for Research and Advanced Studies of the National Polytechnic Institute Mexico City 07000 Mexico
  - DIFACQUIM Research Group, Department of Pharmacy, School of Chemistry National Autonomous University of Mexico Mexico City 04510 Mexico
- Antonio Neme
  - Institute for Applied Mathematics and Systems, Merida Research Unit, National Autonomous University of Mexico Sierra Papacal Merida Yucatan Mexico
- José L Medina-Franco
  - DIFACQUIM Research Group, Department of Pharmacy, School of Chemistry National Autonomous University of Mexico Mexico City 04510 Mexico
16
Serafim MSM, Kronenberger T, Rocha REO, Rosa ADRA, Mello TLG, Poso A, Ferreira RS, Abrahão JS, Kroon EG, Mota BEF, Maltarollo VG. Aminopyrimidine Derivatives as Multiflavivirus Antiviral Compounds Identified from a Consensus Virtual Screening Approach. J Chem Inf Model 2024; 64:393-411. [PMID: 38194508 DOI: 10.1021/acs.jcim.3c01505] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 01/11/2024]
Abstract
Around three billion people are at risk of infection by the dengue virus (DENV) and potentially other flaviviruses. Worldwide outbreaks of DENV, Zika virus (ZIKV), and yellow fever virus (YFV), the lack of antiviral drugs, and limitations on vaccine usage emphasize the need for novel antiviral research. Here, we propose a consensus virtual screening approach to discover potential inhibitors of the NS3 protease (NS3pro) of different flaviviruses. We employed an in silico combination of a hologram quantitative structure-activity relationship (HQSAR) model and molecular docking on characterized binding sites followed by molecular dynamics (MD) simulations, which filtered a data set of 7.6 million compounds to 2,775 hits. Lastly, docking and MD simulations selected six final potential NS3pro inhibitors with stable interactions throughout the simulations. Five compounds had their antiviral activity confirmed against ZIKV, YFV, DENV-2, and DENV-3 (ranging from 4.21 ± 0.14 to 37.51 ± 0.8 μM), displaying aggregator characteristics for enzymatic inhibition against ZIKV NS3pro (ranging from 28 ± 7 to 70 ± 7 μM). Taken together, the compounds identified in this approach may contribute to the design of promising candidates to treat different flavivirus infections.
Affiliation(s)
- Mateus Sá Magalhães Serafim
  - Departamento de Microbiologia, Instituto de Ciências Biológicas, Universidade Federal de Minas Gerais (UFMG), Belo Horizonte, MG 31270-901, Brazil
- Thales Kronenberger
  - Institute of Pharmacy, Pharmaceutical/Medicinal Chemistry and Tübingen Center for Academic Drug Discovery (TüCAD2), Eberhard Karls University Tübingen, Auf der Morgenstelle 8, Tübingen 72076, Germany
  - Excellence Cluster "Controlling Microbes to Fight Infections" (CMFI), Tübingen 72076, Germany
  - School of Pharmacy, Faculty of Health Sciences, University of Eastern Finland, Kuopio 70211, Finland
- Rafael Eduardo Oliveira Rocha
  - Departamento de Bioquímica e Imunologia, Instituto de Ciências Biológicas, Universidade Federal de Minas Gerais (UFMG), Belo Horizonte, MG 31270-901, Brazil
- Amanda Del Rio Abreu Rosa
  - Departamento de Bioquímica e Imunologia, Instituto de Ciências Biológicas, Universidade Federal de Minas Gerais (UFMG), Belo Horizonte, MG 31270-901, Brazil
- Thaysa Lara Gonçalves Mello
  - Departamento de Produtos Farmacêuticos, Faculdade de Farmácia, Universidade Federal de Minas Gerais (UFMG), Belo Horizonte, MG 31270-901, Brazil
- Antti Poso
  - Institute of Pharmacy, Pharmaceutical/Medicinal Chemistry and Tübingen Center for Academic Drug Discovery (TüCAD2), Eberhard Karls University Tübingen, Auf der Morgenstelle 8, Tübingen 72076, Germany
  - School of Pharmacy, Faculty of Health Sciences, University of Eastern Finland, Kuopio 70211, Finland
  - Department of Medical Oncology and Pneumology, University Hospital of Tübingen, Tübingen 70211, Germany
- Rafaela Salgado Ferreira
  - Departamento de Bioquímica e Imunologia, Instituto de Ciências Biológicas, Universidade Federal de Minas Gerais (UFMG), Belo Horizonte, MG 31270-901, Brazil
- Jonatas Santos Abrahão
  - Departamento de Microbiologia, Instituto de Ciências Biológicas, Universidade Federal de Minas Gerais (UFMG), Belo Horizonte, MG 31270-901, Brazil
- Erna Geessien Kroon
  - Departamento de Microbiologia, Instituto de Ciências Biológicas, Universidade Federal de Minas Gerais (UFMG), Belo Horizonte, MG 31270-901, Brazil
- Bruno Eduardo Fernandes Mota
  - Departamento de Análises Clínicas e Toxicológicas, Faculdade de Farmácia, Universidade Federal de Minas Gerais (UFMG), Belo Horizonte, MG 31270-901, Brazil
- Vinícius Gonçalves Maltarollo
  - Departamento de Produtos Farmacêuticos, Faculdade de Farmácia, Universidade Federal de Minas Gerais (UFMG), Belo Horizonte, MG 31270-901, Brazil
17
Trinh XT, Chien PN, Long NV, Van Anh LT, Giang NN, Nam SY, Myung Y. Development of predictive models for lymphedema by using blood tests and therapy data. Sci Rep 2023; 13:19720. [PMID: 37957217 PMCID: PMC10643602 DOI: 10.1038/s41598-023-46567-1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/15/2023] [Accepted: 11/02/2023] [Indexed: 11/15/2023] Open
Abstract
Lymphedema is a disease that refers to tissue swelling caused by an accumulation of protein-rich fluid that is usually drained through the lymphatic system. Detection of lymphedema is often based on expensive diagnoses such as bioimpedance spectroscopy, shear wave elastography, computed tomography, etc. In current machine learning models for lymphedema prediction, reliance on observable symptoms reported by patients introduces the possibility of errors in patient-input data. Moreover, these symptoms are often absent during the initial stages of lymphedema, creating challenges in its early detection. Identifying lymphedema before these observable symptoms manifest would greatly benefit patients by potentially minimizing the discomfort caused by these symptoms. In this study, we propose to use new data, such as complete blood count, serum, and therapy data, to develop predictive models for lymphedema. This approach aims to compensate for the limitations of using only observable symptoms data. We collected data from 2137 patients, including 356 patients with lymphedema and 1781 patients without lymphedema, with the lymphedema status of each patient confirmed by clinicians. The data for each patient included: (1) a complete blood count (CBC) test, (2) a serum test, and (3) therapy information. We used various machine learning algorithms (i.e. random forest, gradient boosting, decision tree, logistic regression, and artificial neural network) to develop predictive models on the training dataset (i.e. 80% of the data) and evaluated the models on the external validation dataset (i.e. 20% of the data). After selecting the best predictive models, we created a web application to aid medical doctors and clinicians in the rapid screening of lymphedema patients. A dataset of 2137 patients was assembled from Seoul National University Bundang Hospital. 
Predictive models based on the random forest algorithm exhibited satisfactory performance (balanced accuracy = 87.0 ± 0.7%, sensitivity = 84.3 ± 0.6%, specificity = 89.1 ± 1.5%, precision = 97.4 ± 0.7%, F1 score = 90.4 ± 0.4%, and AUC = 0.931 ± 0.007). We developed a web application to facilitate the swift screening of lymphedema among medical practitioners: https://snubhtxt.shinyapps.io/SNUBH_Lymphedema . Our study introduces a novel tool for the early detection of lymphedema and establishes the foundation for future investigations into predicting different stages of the condition.
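The reported metrics can be cross-checked against each other using the standard definitions of F1 score and balanced accuracy. The sketch below is only a consistency check on the reported mean values, not a reproduction of the study's evaluation; the function names are hypothetical. Recombining the means gives an F1 score of about 90.4% and a balanced accuracy of about 86.7%, the latter within the reported 87.0 ± 0.7%, which is plausible if the metrics were averaged over repeated runs.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall (sensitivity)."""
    return 2 * precision * recall / (precision + recall)


def balanced_accuracy(sensitivity, specificity):
    """Arithmetic mean of sensitivity and specificity."""
    return (sensitivity + specificity) / 2


# Mean values reported in the abstract, in percent.
f1 = f1_score(precision=97.4, recall=84.3)
bal_acc = balanced_accuracy(sensitivity=84.3, specificity=89.1)
print(round(f1, 1), round(bal_acc, 1))  # 90.4 86.7
```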
Affiliation(s)
- Xuan-Tung Trinh
  - Department of Plastic and Reconstructive Surgery, Seoul National University Bundang Hospital, Seongnam, 13620, Republic of Korea
- Pham Ngoc Chien
  - Department of Plastic and Reconstructive Surgery, Seoul National University Bundang Hospital, Seongnam, 13620, Republic of Korea
- Nguyen-Van Long
  - Department of Plastic and Reconstructive Surgery, Seoul National University Bundang Hospital, Seongnam, 13620, Republic of Korea
- Le Thi Van Anh
  - Department of Plastic and Reconstructive Surgery, Seoul National University Bundang Hospital, Seongnam, 13620, Republic of Korea
- Nguyen Ngan Giang
  - Department of Plastic and Reconstructive Surgery, Seoul National University Bundang Hospital, Seongnam, 13620, Republic of Korea
  - Department of Medical Device Development, College of Medicine, Seoul National University, Seoul, 03080, Republic of Korea
- Sun-Young Nam
  - Department of Plastic and Reconstructive Surgery, Seoul National University Bundang Hospital, Seongnam, 13620, Republic of Korea
- Yujin Myung
  - Department of Plastic and Reconstructive Surgery, Seoul National University Bundang Hospital, Seongnam, 13620, Republic of Korea
18
Ilnicka A, Schneider G. Designing molecules with autoencoder networks. Nat Comput Sci 2023; 3:922-933. [PMID: 38177601 DOI: 10.1038/s43588-023-00548-6] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/01/2023] [Accepted: 10/03/2023] [Indexed: 01/06/2024]
Abstract
Autoencoders are versatile tools in molecular informatics. These unsupervised neural networks serve diverse tasks such as data-driven molecular representation and constructive molecular design. This Review explores their algorithmic foundations and applications in drug discovery, highlighting the most active areas of development and the contributions autoencoder networks have made in advancing this field. We also explore the challenges and prospects concerning the utilization of autoencoders and the various adaptations of this neural network architecture in molecular design.
Affiliation(s)
- Agnieszka Ilnicka
  - Department of Chemistry and Applied Biosciences, ETH Zurich, Zurich, Switzerland
- Gisbert Schneider
  - Department of Chemistry and Applied Biosciences, ETH Zurich, Zurich, Switzerland
19
Abdullah Z, Chee HY, Yusof R, Mohd Fauzi F. Finding Lead Compounds for Dengue Antivirals from a Collection of Old Drugs through In Silico Target Prediction and Subsequent In Vitro Validation. ACS Omega 2023; 8:32483-32497. [PMID: 37720780 PMCID: PMC10500654 DOI: 10.1021/acsomega.3c02607] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/16/2023] [Accepted: 07/14/2023] [Indexed: 09/19/2023]
Abstract
Dengue virus (DENV) infection is one of the most widespread flavivirus infections. Despite its potential fatality, no antiviral treatment is currently available for the disease. Hence, this study aimed to repurpose old drugs as novel DENV NS3 inhibitors. Ligand-based (L-B) and proteochemometric (PCM) prediction models were built using 62,354 bioactivity data points to screen for potential NS3 inhibitors. Selected drugs were then subjected to the foci forming unit reduction assay (FFURA) and a protease inhibition assay. Finally, molecular docking was performed to validate these results. The in silico studies revealed that both models performed well in the internal and external validations. However, the L-B model showed better accuracy in the external validation in terms of its sensitivity (0.671). In the in vitro validation, all drugs (zileuton, trimethadione, and linalool) were able to moderately inhibit viral activity at the highest concentration tested. Zileuton showed results comparable with linalool when tested at 2 mM against the DENV NS3 protease, with reductions in protease activity of 17.89 and 18.42%, respectively. Two new compounds were also proposed through the combination of the selected drugs: ziltri (zileuton + trimethadione) and zilool (zileuton + linalool). The molecular docking study confirms the in vitro observations, as all drugs and proposed compounds were able to achieve binding affinity ≥ -4.1 kcal/mol, with ziltri showing the highest affinity at -7.7 kcal/mol, surpassing the control, panduratin A. The occupation of both S1 and S2 subpockets of NS2B-NS3 may be essential and a reason for the lower binding energy shown by the proposed compounds compared to the screened drugs. Based on the results, this study provided five potential new lead compounds (ziltri, zilool, zileuton, linalool, and trimethadione) for DENV that could be modified further.
Affiliation(s)
- Zafirah Liyana Abdullah
  - Department of Pharmaceutical Life Sciences, Faculty of Pharmacy, Universiti Teknologi MARA Selangor, Puncak Alam Campus, 42300 Bandar Puncak Alam, Selangor, Malaysia
- Hui-Yee Chee
  - Department of Medical Microbiology, Faculty of Medicine and Health Sciences, Universiti Putra Malaysia, 43400 UPM Serdang, Selangor, Malaysia
- Rohana Yusof
  - Department of Molecular Medicine, Faculty of Medicine, University of Malaya, 50603 Kuala Lumpur, Malaysia
- Fazlin Mohd Fauzi
  - Department of Pharmacology and Pharmaceutical Chemistry, Faculty of Pharmacy, UiTM Selangor, Puncak Alam Campus, 42300 Bandar Puncak Alam, Selangor, Malaysia
  - Collaborative Drug Discovery Research, Faculty of Pharmacy, Universiti Teknologi MARA Selangor, Puncak Alam Campus, 42300 Bandar Puncak Alam, Selangor, Malaysia
20
Rojas C, Ballabio D, Consonni V, Suárez-Estrella D, Todeschini R. Classification-based machine learning approaches to predict the taste of molecules: A review. Food Res Int 2023; 171:113036. [PMID: 37330849 DOI: 10.1016/j.foodres.2023.113036] [Citation(s) in RCA: 9] [Impact Index Per Article: 4.5] [Received: 02/14/2023] [Revised: 05/02/2023] [Accepted: 05/22/2023] [Indexed: 06/19/2023]
Abstract
The capacity to discriminate safe from dangerous compounds has played an important role in the evolution of species, including human beings. Highly evolved senses such as taste allow humans to navigate and survive in the environment through information that reaches the brain as electrical pulses. Specifically, taste receptors provide multiple bits of information about the substances that are introduced orally. These substances may or may not be pleasant, depending on the taste responses that they trigger. Tastes have been classified into basic (sweet, bitter, umami, sour and salty) or non-basic (astringent, chilling, cooling, heating, pungent), while some compounds are considered multitaste, taste modifiers or tasteless. Classification-based machine learning approaches are useful tools for developing mathematical relationships that predict the taste class of new molecules from their chemical structure. This work reviews the history of multicriteria quantitative structure-taste relationship modelling, starting from the first ligand-based (LB) classifier proposed in 1980 by Lemont B. Kier and concluding with the most recent studies published in 2022.
Affiliation(s)
- Cristian Rojas
  - Grupo de Investigación en Quimiometría y QSAR, Facultad de Ciencia y Tecnología, Universidad del Azuay, Av. 24 de Mayo 7-77 y Hernán Malo, Cuenca 010107, Ecuador
- Davide Ballabio
  - Milano Chemometrics and QSAR Research Group, Department of Earth and Environmental Sciences, University of Milano-Bicocca, P.za della Scienza 1-20126, Milano, Italy
- Viviana Consonni
  - Milano Chemometrics and QSAR Research Group, Department of Earth and Environmental Sciences, University of Milano-Bicocca, P.za della Scienza 1-20126, Milano, Italy
- Diego Suárez-Estrella
  - Grupo de Investigación en Quimiometría y QSAR, Facultad de Ciencia y Tecnología, Universidad del Azuay, Av. 24 de Mayo 7-77 y Hernán Malo, Cuenca 010107, Ecuador
- Roberto Todeschini
  - Milano Chemometrics and QSAR Research Group, Department of Earth and Environmental Sciences, University of Milano-Bicocca, P.za della Scienza 1-20126, Milano, Italy
21
Sahoo AK, Baskaran SP, Chivukula N, Kumar K, Samal A. Analysis of structure-activity and structure-mechanism relationships among thyroid stimulating hormone receptor binding chemicals by leveraging the ToxCast library. RSC Adv 2023; 13:23461-23471. [PMID: 37546222 PMCID: PMC10401517 DOI: 10.1039/d3ra04452a] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 07/04/2023] [Accepted: 07/31/2023] [Indexed: 08/08/2023] Open
Abstract
The thyroid stimulating hormone receptor (TSHR) is crucial in thyroid hormone production in humans, and dysregulation in TSHR activation can lead to adverse health effects such as hypothyroidism and Graves' disease. Further, animal studies have shown that binding of endocrine disrupting chemicals (EDCs) to TSHR can lead to developmental toxicity. Hence, several such chemicals have been screened for their adverse physiological effects in human cell lines via high-throughput assays in the ToxCast project. The invaluable data generated by the ToxCast project have enabled the development of toxicity predictors, but their predictive ability can be limited by the heterogeneity in structure-activity relationships among chemicals. Here, we systematically investigated the heterogeneity in structure-activity as well as structure-mechanism relationships among the TSHR binding chemicals from ToxCast. By employing a structure-activity similarity (SAS) map, we identified 79 activity cliffs among 509 chemicals in the TSHR agonist dataset and 69 activity cliffs among 650 chemicals in the TSHR antagonist dataset. Further, by using the matched molecular pair (MMP) approach, we found that the resultant activity cliffs (MMP-cliffs) are a subset of the activity cliffs identified via the SAS map approach. Subsequently, by leveraging ToxCast mechanism of action (MOA) annotations for chemicals common to both TSHR agonist and TSHR antagonist datasets, we identified 3 chemical pairs as strong MOA-cliffs and 19 chemical pairs as weak MOA-cliffs. In conclusion, the insights from this systematic investigation of the TSHR binding chemicals are likely to inform ongoing efforts towards development of better predictive toxicity models for characterization of the chemical exposome.
Affiliation(s)
- Ajaya Kumar Sahoo
  - The Institute of Mathematical Sciences (IMSc) Chennai 600113 India
  - Homi Bhabha National Institute (HBNI) Mumbai 400094 India
- Shanmuga Priya Baskaran
  - The Institute of Mathematical Sciences (IMSc) Chennai 600113 India
  - Homi Bhabha National Institute (HBNI) Mumbai 400094 India
- Nikhil Chivukula
  - The Institute of Mathematical Sciences (IMSc) Chennai 600113 India
  - Homi Bhabha National Institute (HBNI) Mumbai 400094 India
- Kishan Kumar
  - The Institute of Mathematical Sciences (IMSc) Chennai 600113 India
- Areejit Samal
  - The Institute of Mathematical Sciences (IMSc) Chennai 600113 India
  - Homi Bhabha National Institute (HBNI) Mumbai 400094 India
22
Oršolić D, Šmuc T. Dynamic applicability domain (dAD): compound-target binding affinity estimates with local conformal prediction. Bioinformatics 2023; 39:btad465. [PMID: 37594752 PMCID: PMC10457664 DOI: 10.1093/bioinformatics/btad465] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 08/30/2022] [Revised: 04/26/2023] [Accepted: 08/17/2023] [Indexed: 08/19/2023] Open
Abstract
MOTIVATION: Increasing efforts are being made in the field of machine learning to advance the learning of robust and accurate models from experimentally measured data and enable more efficient drug discovery processes. The prediction of binding affinity is one of the most frequent tasks of compound bioactivity modelling. Learned models for binding affinity prediction are assessed by their average performance on unseen samples, but point predictions are typically not provided with a rigorous confidence assessment. Approaches such as the conformal predictor framework equip conventional models with a more rigorous assessment of confidence for individual point predictions. In this article, we extend the inductive conformal prediction framework for interaction data, in particular the compound-target binding affinity prediction task. The new framework is based on dynamically defined calibration sets that are specific for each testing pair and provides prediction assessment in the context of calibration pairs from its compound-target neighbourhood, enabling improved estimates based on the local properties of the prediction model. RESULTS: The effectiveness of the approach is benchmarked on several publicly available datasets and tested in realistic use-case scenarios with increasing levels of difficulty on a complex compound-target binding affinity space. We demonstrate that in such scenarios, a novel approach combining the applicability domain paradigm with the conformal prediction framework produces superior confidence assessment with valid and more informative prediction regions compared to other 'state-of-the-art' conformal prediction approaches. AVAILABILITY AND IMPLEMENTATION: Dataset and the code are available on GitHub (https://github.com/mlkr-rbi/dAD).
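To make the idea of a calibration set chosen per test sample more concrete, here is a deliberately simplified sketch. It is not the dAD algorithm: the one-dimensional descriptor `x`, the absolute-difference distance, the neighbourhood size `k`, and the toy numbers are all assumptions made for illustration. The only part taken from standard inductive conformal regression is the rank-based quantile of absolute residuals.

```python
import math


def local_conformal_interval(test_x, test_pred, calibration, k=5, alpha=0.2):
    """Prediction interval built from the k calibration samples nearest to test_x.

    calibration: list of (x, y_true, y_pred) tuples held out from model training.
    The 1-D descriptor x and the distance |x - test_x| are hypothetical stand-ins
    for a real compound-target neighbourhood definition.
    """
    nearest = sorted(calibration, key=lambda c: abs(c[0] - test_x))[:k]
    scores = sorted(abs(y - p) for _, y, p in nearest)  # nonconformity scores
    n = len(scores)
    rank = math.ceil((n + 1) * (1 - alpha))  # conformal quantile rank
    # With too few calibration samples the requested confidence is unattainable.
    half_width = scores[rank - 1] if rank <= n else math.inf
    return test_pred - half_width, test_pred + half_width


# Hypothetical calibration data: small residuals near x = 0, large ones near x = 5
calibration = [
    (0.0, 5.0, 5.1), (0.1, 5.2, 5.0), (0.2, 5.1, 5.4), (0.3, 5.5, 5.1), (0.4, 5.0, 5.5),
    (5.0, 7.0, 5.0), (5.1, 3.0, 5.5), (5.2, 8.0, 5.0), (5.3, 2.0, 5.5), (5.4, 9.0, 5.0),
]
narrow = local_conformal_interval(0.2, 5.0, calibration)
wide = local_conformal_interval(5.2, 5.0, calibration)
print(narrow, wide)
```

The same point prediction receives a narrow interval in a region where the model calibrates well and a wide one where nearby calibration errors are large, which is the intuition behind locally defined calibration sets.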
Affiliation(s)
- Davor Oršolić
  - Division of Electronics, Ruđer Bošković Institute, Bijenička cesta 54, Zagreb 10000, Croatia
- Tomislav Šmuc
  - Division of Electronics, Ruđer Bošković Institute, Bijenička cesta 54, Zagreb 10000, Croatia
|
23
|
Pacureanu L, Bora A, Crisan L. New Insights on the Activity and Selectivity of MAO-B Inhibitors through In Silico Methods. Int J Mol Sci 2023; 24:ijms24119583. [PMID: 37298535 DOI: 10.3390/ijms24119583] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/01/2023] [Revised: 05/29/2023] [Accepted: 05/30/2023] [Indexed: 06/12/2023] Open
Abstract
To facilitate the identification of novel MAO-B inhibitors, we elaborated a consolidated computational approach, including a pharmacophoric atom-based 3D quantitative structure-activity relationship (QSAR) model, activity cliffs, fingerprint, and molecular docking analysis on a dataset of 126 molecules. An AAHR.2 hypothesis with two hydrogen bond acceptors (A), one hydrophobic (H), and one aromatic ring (R) supplied a statistically significant 3D QSAR model reflected by the parameters: R² = 0.900 (training set); Q² = 0.774 and Pearson's R = 0.884 (test set), stability s = 0.736. Hydrophobic and electron-withdrawing fields portrayed the relationships between structural characteristics and inhibitory activity. The quinolin-2-one scaffold has a key role in selectivity towards MAO-B with an AUC of 0.962, as retrieved by ECFP4 analysis. Two activity cliffs showing meaningful potency variation in the MAO-B chemical space were observed. The docking study revealed interactions with crucial residues TYR:435, TYR:326, CYS:172, and GLN:206 responsible for MAO-B activity. Molecular docking is in consensus with and complementary to pharmacophoric 3D QSAR, ECFP4, and MM-GBSA analysis. The computational scenario provided here will assist chemists in quickly designing and predicting new potent and selective candidates as MAO-B inhibitors for MAO-B-driven diseases. This approach can also be used to identify MAO-B inhibitors from other libraries or screen top molecules for other targets involved in suitable diseases.
Affiliation(s)
- Liliana Pacureanu
  - "Coriolan Dragulescu" Institute of Chemistry, 24 Mihai Viteazu Ave., 300223 Timisoara, Romania
- Alina Bora
  - "Coriolan Dragulescu" Institute of Chemistry, 24 Mihai Viteazu Ave., 300223 Timisoara, Romania
- Luminita Crisan
  - "Coriolan Dragulescu" Institute of Chemistry, 24 Mihai Viteazu Ave., 300223 Timisoara, Romania
|
24
|
Liu W, Wang Z, Chen J, Tang W, Wang H. Machine Learning Model for Screening Thyroid Stimulating Hormone Receptor Agonists Based on Updated Datasets and Improved Applicability Domain Metrics. Chem Res Toxicol 2023. [PMID: 37209109 DOI: 10.1021/acs.chemrestox.3c00074] [Citation(s) in RCA: 5] [Impact Index Per Article: 2.5] [Indexed: 05/22/2023]
Abstract
Machine learning (ML) models for screening endocrine-disrupting chemicals (EDCs), such as thyroid stimulating hormone receptor (TSHR) agonists, are essential for sound management of chemicals. Previous models for screening TSHR agonists were built on imbalanced datasets and lacked applicability domain (AD) characterization essential for regulatory application. Herein, an updated TSHR agonist dataset was built, for which the ratio of active to inactive compounds greatly increased to 1:2.6, and chemical spaces of structure-activity landscapes (SALs) were enhanced. Resulting models based on 7 molecular representations and 4 ML algorithms were proven to outperform previous ones. Weighted similarity density (ρs) and weighted inconsistency of activities (IA) were proposed to characterize the SALs, and a state-of-the-art AD characterization methodology ADSAL{ρs, IA} was established. An optimal classifier developed with PubChem fingerprints and the random forest algorithm, coupled with ADSAL{ρs ≥ 0.15, IA ≤ 0.65}, exhibited good performance on the validation set with the area under the receiver operating characteristic curve being 0.984 and balanced accuracy being 0.941 and identified 90 TSHR agonist classes that could not be found previously. The classifier together with the ADSAL{ρs, IA} may serve as efficient tools for screening EDCs, and the AD characterization methodology may be applied to other ML models.
Affiliation(s)
- Wenjia Liu
  - Key Laboratory of Industrial Ecology and Environmental Engineering (Ministry of Education), Dalian Key Laboratory on Chemicals Risk Control and Pollution Prevention Technology, School of Environmental Science and Technology, Dalian University of Technology, Dalian 116024, China
- Zhongyu Wang
  - Key Laboratory of Industrial Ecology and Environmental Engineering (Ministry of Education), Dalian Key Laboratory on Chemicals Risk Control and Pollution Prevention Technology, School of Environmental Science and Technology, Dalian University of Technology, Dalian 116024, China
- Jingwen Chen
  - Key Laboratory of Industrial Ecology and Environmental Engineering (Ministry of Education), Dalian Key Laboratory on Chemicals Risk Control and Pollution Prevention Technology, School of Environmental Science and Technology, Dalian University of Technology, Dalian 116024, China
- Weihao Tang
  - Key Laboratory of Industrial Ecology and Environmental Engineering (Ministry of Education), Dalian Key Laboratory on Chemicals Risk Control and Pollution Prevention Technology, School of Environmental Science and Technology, Dalian University of Technology, Dalian 116024, China
- Haobo Wang
  - Key Laboratory of Industrial Ecology and Environmental Engineering (Ministry of Education), Dalian Key Laboratory on Chemicals Risk Control and Pollution Prevention Technology, School of Environmental Science and Technology, Dalian University of Technology, Dalian 116024, China
|
25
|
Dutschmann TM, Kinzel L, Ter Laak A, Baumann K. Large-scale evaluation of k-fold cross-validation ensembles for uncertainty estimation. J Cheminform 2023; 15:49. [PMID: 37118768 PMCID: PMC10142532 DOI: 10.1186/s13321-023-00709-9] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 09/03/2022] [Accepted: 03/10/2023] [Indexed: 04/30/2023] Open
Abstract
It is insightful to report an estimator that describes how certain a model is in a prediction, in addition to the prediction itself. For regression tasks, most approaches implement a variation of the ensemble method, with few exceptions. Instead of a single estimator, a group of estimators yields several predictions for an input. The uncertainty can then be quantified by measuring the disagreement between the predictions, for example by the standard deviation. In theory, ensembles should not only provide uncertainties but also boost the predictive performance by reducing errors arising from variance. Despite the development of novel methods, they are still considered the "gold standard" for quantifying the uncertainty of regression models. Subsampling-based methods to obtain ensembles can be applied to all models, regardless of whether they are related to deep learning or traditional machine learning. However, little attention has been given to the question of whether the ensemble method is applicable to virtually all scenarios occurring in the field of cheminformatics. In a broad and diverse evaluation, ensembles are assessed on 32 datasets of different sizes and modeling difficulty, ranging from physicochemical properties to biological activities. For increasing ensemble sizes with up to 200 members, the predictive performance as well as the applicability as uncertainty estimator are shown for all combinations of five modeling techniques and four molecular featurizations. Useful recommendations were derived for practitioners regarding the success and minimum size of ensembles, depending on whether predictive performance or uncertainty quantification is of more importance for the task at hand.
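The ensemble recipe described in this abstract (several estimators, disagreement measured by the standard deviation) can be sketched as follows. This is a minimal illustration, not the study's setup: the one-variable least-squares model, the function names, the fold count, and the toy data are all assumptions chosen to keep the example dependency-free.

```python
import random
import statistics


def fit_line(points):
    """Ordinary least-squares fit y = a*x + b for a list of (x, y) points."""
    xs, ys = zip(*points)
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    a = sum((x - mx) * (y - my) for x, y in points) / sxx
    return a, my - a * mx


def kfold_ensemble(points, k=5, seed=0):
    """Train k models, each on the data with one of k folds left out."""
    shuffled = points[:]
    random.Random(seed).shuffle(shuffled)
    folds = [shuffled[i::k] for i in range(k)]
    models = []
    for i in range(k):
        train = [p for j, fold in enumerate(folds) if j != i for p in fold]
        models.append(fit_line(train))
    return models


def predict_with_uncertainty(models, x):
    """Ensemble mean as the prediction, standard deviation as the uncertainty."""
    preds = [a * x + b for a, b in models]
    return statistics.fmean(preds), statistics.stdev(preds)


# Hypothetical toy data: y = 2x + 1 with a small deterministic perturbation
data = [(x, 2 * x + 1 + 0.3 * (-1) ** x) for x in range(20)]
models = kfold_ensemble(data)
mean, spread = predict_with_uncertainty(models, 10)
print(f"prediction {mean:.2f} +/- {spread:.2f}")
```

Repeating the k-fold partition with different seeds would give larger ensembles; the sketch stops at a single partition to stay short.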
Affiliation(s)
- Thomas-Martin Dutschmann
  - Institute of Medicinal and Pharmaceutical Chemistry, University of Technology Braunschweig, Beethovenstrasse 55, 38106, Brunswick, Germany
- Lennart Kinzel
  - Institute of Medicinal and Pharmaceutical Chemistry, University of Technology Braunschweig, Beethovenstrasse 55, 38106, Brunswick, Germany
- Antonius Ter Laak
  - Bayer AG, Research & Development, Pharmaceuticals, Muellerstrasse 178, 13353, Berlin, Germany
- Knut Baumann
  - Institute of Medicinal and Pharmaceutical Chemistry, University of Technology Braunschweig, Beethovenstrasse 55, 38106, Brunswick, Germany
|
26
|
Poongavanam V, Kölling F, Giese A, Göller AH, Lehmann L, Meibom D, Kihlberg J. Predictive Modeling of PROTAC Cell Permeability with Machine Learning. ACS Omega 2023; 8:5901-5916. [PMID: 36816707 PMCID: PMC9933238 DOI: 10.1021/acsomega.2c07717] [Citation(s) in RCA: 6] [Impact Index Per Article: 3.0] [Received: 12/03/2022] [Accepted: 01/19/2023] [Indexed: 06/18/2023]
Abstract
Approaches for predicting proteolysis targeting chimera (PROTAC) cell permeability are of major interest to reduce resource-demanding synthesis and testing of low-permeable PROTACs. We report a comprehensive investigation of the scope and limitations of machine learning-based binary classification models developed using 17 simple descriptors for large and structurally diverse sets of cereblon (CRBN) and von Hippel-Lindau (VHL) PROTACs. For the VHL PROTAC set, k-nearest neighbor and random forest models performed best and predicted the permeability of a blinded test set with >80% accuracy (κ ≥ 0.57). Models retrained by combining the original training and the blinded test set performed equally well for a second blinded VHL set. However, models for CRBN PROTACs were less successful, mainly due to the imbalanced nature of the CRBN datasets. All descriptors contributed to the models, but size and lipophilicity were the most important. We conclude that properly trained machine learning models can be integrated as effective filters in the PROTAC design process.
Affiliation(s)
- Florian Kölling
  - Computational Molecular Design, Bayer AG, 42096 Wuppertal, Germany
- Anja Giese
  - Drug Discovery Sciences, Bayer AG, 13342 Berlin, Germany
- Lutz Lehmann
  - Drug Discovery Sciences, Bayer AG, 42113 Wuppertal, Germany
- Daniel Meibom
  - Drug Discovery Sciences, Bayer AG, 42113 Wuppertal, Germany
- Jan Kihlberg
  - Department of Chemistry-BMC, Box 576, Uppsala University, 75123 Uppsala, Sweden
|
27
|
Kato M, Yanai T. Pulled fly balls are harder to catch: a game analysis with a machine learning approach. Sports Engineering 2022. [DOI: 10.1007/s12283-022-00373-6] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/17/2022]
Abstract
Two hypotheses were tested: (1) the deflecting motion of fly balls caused by aerodynamic effects varies between the pull side and opposite side of the fair territory, and (2) the probability of flyout is lower on the pull side than the opposite side in Japan’s professional baseball games. From all radar-tracking outputs of official games in 2018–2019, fly balls that resulted in outs or base hits were selected for analysis (N = 25,413), and indices representing horizontal and vertical deflecting motions of fly balls were computed and compared between the pull side and opposite side. A machine learning algorithm was used to construct a model to predict the probability of flyout from the kinematic characteristics of fly balls. Flyout zones where the probability of flyout was > 0.6 were computed for a systematically constructed set of fly balls having identical distribution between the pull side and opposite side. The results showed that: (1) most fly balls landing on the opposite side deflected in the same direction whereas the pulled fly balls deflected in either direction, (2) the pulled low fly balls had greater variability in the deflecting motions than their opposite-side counterparts, (3) the overall probability of flyout of the low fly balls was lower on the pull side (0.41) than the opposite side (0.49), and (4) the flyout zone of an outfielder on the pull side (mean = 698 m²) for low fly balls was smaller than that of the others (≥ 779 m²). The hypotheses were supported. The pulled low fly balls had substantial variations in the direction and magnitude of deflections, which might have reduced the flyout zone on the pull side.
|
28
|
Bertato L, Chirico N, Papa E. Predicting the Bioconcentration Factor in Fish from Molecular Structures. Toxics 2022; 10:toxics10100581. [PMID: 36287860 PMCID: PMC9610932 DOI: 10.3390/toxics10100581] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 08/28/2022] [Revised: 09/25/2022] [Accepted: 09/26/2022] [Indexed: 05/14/2023]
Abstract
The bioconcentration factor (BCF) is one of the metrics used to evaluate the potential of a substance to bioaccumulate in aquatic organisms. In this work, linear and non-linear regression QSARs were developed for the prediction of log BCF using different computational approaches, starting from a large and structurally heterogeneous dataset. The new MLR-OLS and ANN regression models fit well, with R² values of 0.62 and 0.70, respectively, and have comparable external predictivity, with R²ext of 0.64 and 0.65 (RMSEext of 0.78 and 0.76), respectively. Furthermore, linear and non-linear classification models were developed using the regulatory threshold BCF > 2000. A class-balanced subset was used to develop classification models, which were applied to chemicals not used to create the QSARs. These classification models are characterized by external and internal accuracy up to 84% and 90%, respectively, and sensitivity and specificity up to 90% and 80%, respectively. QSARs presented in this work are validated according to regulatory requirements and their quality is in line with other tools available for the same endpoint and dataset, with the advantage of low complexity and easy application through the software QSAR-ME Profiler. These QSARs can be used as alternatives for, or in combination with, existing models to support bioaccumulation assessment procedures.
|
29
|
Morita K, Mizuno T, Kusuhara H. Investigation of a Data Split Strategy Involving the Time Axis in Adverse Event Prediction Using Machine Learning. J Chem Inf Model 2022; 62:3982-3992. [PMID: 35971760 DOI: 10.1021/acs.jcim.2c00765] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 01/18/2023]
Abstract
Adverse events are a serious issue in drug development, and many prediction methods using machine learning have been developed. The random split cross-validation is the de facto standard for model building and evaluation in machine learning, but care should be taken in adverse event prediction because this approach does not strictly match the real-world situation. The time split, which uses the time axis, is considered suitable for real-world prediction. However, the differences in model performance obtained using the time and random splits are not clear due to the lack of comparable studies. To understand the differences, we compared the model performance between the time and random splits using nine types of compound information as input, eight adverse events as targets, and six machine learning algorithms. The random split showed higher area under the curve values than did the time split for six of eight targets. The chemical spaces of the training and test datasets of the time split were similar, suggesting that the concept of applicability domain is insufficient to explain the differences derived from the splitting. The area under the curve differences were smaller for the protein interaction than for the other datasets. Subsequent detailed analyses suggested the danger of confounding in the use of knowledge-based information in the time split. These findings indicate the importance of understanding the differences between the time and random splits in adverse event prediction and suggest that appropriate use of the splitting strategies and interpretation of results are necessary for the real-world prediction of adverse events. We provide the analysis code and datasets used in the present study at https://github.com/mizuno-group/AE_prediction.
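The difference between the two splitting strategies compared in this abstract can be shown with a short sketch. The record layout (a dict with `id` and `date`), the cutoff date, and the test fraction are hypothetical; the authors' own code is in the repository linked above and is not reproduced here.

```python
import random
from datetime import date


def random_split(records, test_fraction=0.25, seed=0):
    """Shuffle the records and hold out a fixed fraction, ignoring dates."""
    shuffled = records[:]
    random.Random(seed).shuffle(shuffled)
    n_test = round(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


def time_split(records, cutoff):
    """Train on everything before the cutoff date, test on everything after."""
    train = [r for r in records if r["date"] < cutoff]
    test = [r for r in records if r["date"] >= cutoff]
    return train, test


# Hypothetical records: one compound per year, 2015 to 2022
records = [{"id": i, "date": date(2015 + i, 6, 1)} for i in range(8)]

r_train, r_test = random_split(records)
t_train, t_test = time_split(records, cutoff=date(2021, 1, 1))
print([r["id"] for r in r_test], [r["id"] for r in t_test])
```

With the time split, every test record is newer than every training record, which mirrors prospective use; the random split can place newer records in the training set and older ones in the test set.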
Affiliation(s)
- Katsuhisa Morita
  - Graduate School of Pharmaceutical Sciences, The University of Tokyo, Bunkyo-ku, Tokyo 113-0033, Japan
- Tadahaya Mizuno
  - Graduate School of Pharmaceutical Sciences, The University of Tokyo, Bunkyo-ku, Tokyo 113-0033, Japan
- Hiroyuki Kusuhara
  - Graduate School of Pharmaceutical Sciences, The University of Tokyo, Bunkyo-ku, Tokyo 113-0033, Japan
|
30
|
Morger A, Garcia de Lomana M, Norinder U, Svensson F, Kirchmair J, Mathea M, Volkamer A. Studying and mitigating the effects of data drifts on ML model performance at the example of chemical toxicity data. Sci Rep 2022; 12:7244. [PMID: 35508546 PMCID: PMC9068909 DOI: 10.1038/s41598-022-09309-3] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Received: 09/28/2021] [Accepted: 03/17/2022] [Indexed: 11/09/2022] Open
Abstract
Machine learning models are widely applied to predict molecular properties or the biological activity of small molecules on a specific protein. Models can be integrated in a conformal prediction (CP) framework which adds a calibration step to estimate the confidence of the predictions. CP models present the advantage of ensuring a predefined error rate under the assumption that test and calibration set are exchangeable. In cases where the test data have drifted away from the descriptor space of the training data, or where assay setups have changed, this assumption might not be fulfilled and the models are not guaranteed to be valid. In this study, the performance of internally valid CP models when applied to either newer time-split data or to external data was evaluated. In detail, temporal data drifts were analysed based on twelve datasets from the ChEMBL database. In addition, discrepancies between models trained on publicly-available data and applied to proprietary data for the liver toxicity and MNT in vivo endpoints were investigated. In most cases, a drastic decrease in the validity of the models was observed when applied to the time-split or external (holdout) test sets. To overcome the decrease in model validity, a strategy for updating the calibration set with data more similar to the holdout set was investigated. Updating the calibration set generally improved the validity, restoring it completely to its expected value in many cases. The restored validity is the first requisite for applying the CP models with confidence. However, the increased validity comes at the cost of a decrease in model efficiency, as more predictions are identified as inconclusive. This study presents a strategy to recalibrate CP models to mitigate the effects of data drifts. Updating the calibration sets without having to retrain the model has proven to be a useful approach to restore the validity of most models.
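The calibration step and the effect of swapping the calibration set can be illustrated with a small class-conditional conformal classifier. This is a sketch under assumptions, not the study's implementation: the nonconformity score is taken to be one minus the predicted class probability, and every score and the significance level below are made-up numbers.

```python
def p_value(cal_scores, test_score):
    """Share of calibration nonconformity scores at least as large as the test score."""
    greater_equal = sum(s >= test_score for s in cal_scores)
    return (greater_equal + 1) / (len(cal_scores) + 1)


def prediction_set(cal_scores_by_class, test_scores_by_class, significance=0.2):
    """Labels whose p-value exceeds the significance level.

    An empty set, or a set holding both labels, is an inconclusive prediction.
    """
    return {
        label
        for label, score in test_scores_by_class.items()
        if p_value(cal_scores_by_class[label], score) > significance
    }


# Hypothetical nonconformity scores (assumed here to be 1 - predicted probability)
original_cal = {
    "active": [0.05, 0.10, 0.15, 0.20, 0.25, 0.30, 0.35, 0.40, 0.45],
    "inactive": [0.05, 0.10, 0.15, 0.20, 0.25, 0.30, 0.35, 0.40, 0.45],
}
# Calibration set updated with data resembling the drifted holdout set
updated_cal = {
    "active": original_cal["active"],
    "inactive": [0.55, 0.60, 0.65, 0.70, 0.75, 0.80, 0.85, 0.90, 0.95],
}
test_scores = {"active": 0.10, "inactive": 0.90}

print(prediction_set(original_cal, test_scores))
print(prediction_set(updated_cal, test_scores))
```

Only the calibration scores change between the two calls; the underlying model and the test scores stay fixed. In this toy case the updated calibration set turns a single-label prediction into a two-label, inconclusive one, matching the trade-off between validity and efficiency that the abstract describes.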
Affiliation(s)
- Andrea Morger
  - In Silico Toxicology and Structural Bioinformatics, Institute of Physiology, Charité Universitätsmedizin Berlin, Berlin, 10117, Germany
- Marina Garcia de Lomana
  - BASF SE, 67056, Ludwigshafen, Germany
  - Division of Pharmaceutical Chemistry, Department of Pharmaceutical Sciences, University of Vienna, Vienna, 1090, Austria
- Ulf Norinder
  - Department of Pharmaceutical Biosciences, Uppsala University, Uppsala, 751 24, Sweden
  - Dept Computer and Systems Sciences, Stockholm University, Kista, 164 07, Sweden
  - MTM Research Centre, School of Science and Technology, 701 82, Örebro, Sweden
- Fredrik Svensson
  - Alzheimer's Research UK UCL Drug Discovery Institute, London, WC1E 6BT, UK
- Johannes Kirchmair
  - Division of Pharmaceutical Chemistry, Department of Pharmaceutical Sciences, University of Vienna, Vienna, 1090, Austria
- Andrea Volkamer
  - In Silico Toxicology and Structural Bioinformatics, Institute of Physiology, Charité Universitätsmedizin Berlin, Berlin, 10117, Germany
|
31
|
Schieferdecker S, Bernal FA, Wojtas KP, Keiff F, Li Y, Dahse HM, Kloss F. Development of Predictive Classification Models for Whole Cell Antimycobacterial Activity of Benzothiazinones. J Med Chem 2022; 65:6748-6763. [PMID: 35502994 DOI: 10.1021/acs.jmedchem.2c00098] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Indexed: 12/21/2022]
Abstract
Nitrobenzothiazinones (BTZs) are a very potent class of antibiotics against Mycobacterium tuberculosis. However, relationships between their structural properties and whole cell activity remain poorly predictable. Herein, we present the synthesis and antimycobacterial evaluation of a diverse set of BTZs. High potency was predominantly achieved by piperidine and piperazine substitutions, whereupon three compounds were identified as promising candidates, showing preferable metabolic stability. Lack of correlation between potency and calculated binding energies suggested that target inhibition is not the only requirement to obtain suitable antimycobacterial agents. In contrast, prediction of whole cell activity class was successfully accomplished by extensively validated machine learning models. The performance of the superior model was further verified by >70% correct class predictions for a large set of reported BTZs. Our generated model is thus a key prerequisite to streamline lead optimization endeavors, particularly regarding the improvement of overall hit rates in whole cell antimycobacterial assays.
Affiliation(s)
- Sebastian Schieferdecker
  - Transfer Group Anti-infectives, Leibniz Institute for Natural Product Research and Infection Biology-Hans Knöll Institute, Beutenbergstr. 11a, 07745 Jena, Germany
- Freddy A Bernal
  - Transfer Group Anti-infectives, Leibniz Institute for Natural Product Research and Infection Biology-Hans Knöll Institute, Beutenbergstr. 11a, 07745 Jena, Germany
- K Philip Wojtas
  - Transfer Group Anti-infectives, Leibniz Institute for Natural Product Research and Infection Biology-Hans Knöll Institute, Beutenbergstr. 11a, 07745 Jena, Germany
- François Keiff
  - Transfer Group Anti-infectives, Leibniz Institute for Natural Product Research and Infection Biology-Hans Knöll Institute, Beutenbergstr. 11a, 07745 Jena, Germany
- Yan Li
  - Transfer Group Anti-infectives, Leibniz Institute for Natural Product Research and Infection Biology-Hans Knöll Institute, Beutenbergstr. 11a, 07745 Jena, Germany
- Hans-Martin Dahse
  - Department Infection Biology, Leibniz Institute for Natural Product Research and Infection Biology-Hans Knöll Institute, Beutenbergstr. 11a, 07745 Jena, Germany
- Florian Kloss
  - Transfer Group Anti-infectives, Leibniz Institute for Natural Product Research and Infection Biology-Hans Knöll Institute, Beutenbergstr. 11a, 07745 Jena, Germany
|
32
|
Yang ZY, Fu L, Lu AP, Liu S, Hou TJ, Cao DS. Semi-automated workflow for molecular pair analysis and QSAR-assisted transformation space expansion. J Cheminform 2021; 13:86. [PMID: 34774096 PMCID: PMC8590336 DOI: 10.1186/s13321-021-00564-6] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Received: 07/08/2021] [Accepted: 10/30/2021] [Indexed: 12/01/2022] Open
Abstract
In the process of drug discovery, the optimization of lead compounds has always been a challenge faced by pharmaceutical chemists. Matched molecular pair analysis (MMPA), a promising tool to efficiently extract and summarize the relationship between structural transformation and property change, is suitable for local structural optimization tasks. In particular, the integration of MMPA with QSAR modeling can further strengthen the utility of MMPA in molecular optimization navigation. In this study, a new semi-automated procedure based on KNIME was developed to support MMPA on both large- and small-scale datasets, including molecular preparation, QSAR model construction, applicability domain evaluation, and MMP calculation and application. Two examples covering regression and classification tasks are provided to illustrate the importance of MMPA and to demonstrate the reliability and utility of this MMPA-by-QSAR pipeline.
Affiliation(s)
- Zi-Yi Yang
  - Xiangya School of Pharmaceutical Sciences, Central South University, Changsha, 410013, Hunan, People's Republic of China
  - Hunan Key Laboratory of Diagnostic and Therapeutic Drug Research for Chronic Diseases, Changsha, 410013, Hunan, China
- Li Fu
  - Xiangya School of Pharmaceutical Sciences, Central South University, Changsha, 410013, Hunan, People's Republic of China
  - Hunan Key Laboratory of Diagnostic and Therapeutic Drug Research for Chronic Diseases, Changsha, 410013, Hunan, China
- Ai-Ping Lu
  - Institute for Advancing Translational Medicine in Bone & Joint Diseases, School of Chinese Medicine, Hong Kong Baptist University, Hong Kong, 999077, SAR, People's Republic of China
- Shao Liu
  - Department of Pharmacy, Xiangya Hospital, Central South University, Changsha, 410008, Hunan, People's Republic of China
- Ting-Jun Hou
  - Hangzhou Institute of Innovative Medicine, College of Pharmaceutical Sciences, Zhejiang University, Hangzhou, 310058, Zhejiang, People's Republic of China
- Dong-Sheng Cao
  - Xiangya School of Pharmaceutical Sciences, Central South University, Changsha, 410013, Hunan, People's Republic of China
  - Hunan Key Laboratory of Diagnostic and Therapeutic Drug Research for Chronic Diseases, Changsha, 410013, Hunan, China
  - Institute for Advancing Translational Medicine in Bone & Joint Diseases, School of Chinese Medicine, Hong Kong Baptist University, Hong Kong, 999077, SAR, People's Republic of China
|
33
|
Mathai N, Chen Y, Kirchmair J. Validation strategies for target prediction methods. Brief Bioinform 2021; 21:791-802. [PMID: 31220208 PMCID: PMC7299289 DOI: 10.1093/bib/bbz026] [Citation(s) in RCA: 36] [Impact Index Per Article: 9.0] [Received: 11/26/2018] [Revised: 01/14/2019] [Accepted: 02/17/2019] [Indexed: 12/11/2022] Open
Abstract
Computational methods for target prediction, based on molecular similarity and network-based approaches, machine learning, docking and others, have evolved as valuable and powerful tools to aid the challenging task of mode of action identification for bioactive small molecules such as drugs and drug-like compounds. Critical to discerning the scope and limitations of a target prediction method is understanding how its performance was evaluated and reported. Ideally, large-scale prospective experiments are conducted to validate the performance of a model; however, this expensive and time-consuming endeavor is often not feasible. Therefore, to estimate the predictive power of a method, statistical validation based on retrospective knowledge is commonly used. There are multiple statistical validation techniques that vary in rigor. In this review we discuss the validation strategies employed, highlighting the usefulness and constraints of the validation schemes and metrics that are employed to measure and describe performance. We address the limitations of measuring only generalized performance, given that the underlying bioactivity and structural data are biased towards certain small-molecule scaffolds and target families, and suggest additional aspects of performance to consider in order to produce more detailed and realistic estimates of predictive power. Finally, we describe the validation strategies that were employed by some of the most thoroughly validated and accessible target prediction methods.
Collapse
Affiliation(s)
- Neann Mathai
- Department of Chemistry, University of Bergen, Bergen, Norway.,Computational Biology Unit (CBU), University of Bergen, Bergen, Norway.,Center for Bioinformatics (ZBH), Department of Computer Science, Faculty of Mathematics, Informatics and Natural Sciences, Universität Hamburg, Hamburg, Germany
| | - Ya Chen
- Center for Bioinformatics (ZBH), Department of Computer Science, Faculty of Mathematics, Informatics and Natural Sciences, Universität Hamburg, Hamburg, Germany
| | - Johannes Kirchmair
- Department of Chemistry, University of Bergen, Bergen, Norway.,Computational Biology Unit (CBU), University of Bergen, Bergen, Norway.,Center for Bioinformatics (ZBH), Department of Computer Science, Faculty of Mathematics, Informatics and Natural Sciences, Universität Hamburg, Hamburg, Germany
| |
|
34
|
Meyer H, Pebesma E. Predicting into unknown space? Estimating the area of applicability of spatial prediction models. Methods Ecol Evol 2021. [DOI: 10.1111/2041-210x.13650] [Citation(s) in RCA: 61] [Impact Index Per Article: 15.3] [Indexed: 11/29/2022]
Affiliation(s)
- Hanna Meyer
- Institute of Landscape Ecology, Westfälische Wilhelms-Universität Münster, Münster, Germany
| | - Edzer Pebesma
- Institute for Geoinformatics, Westfälische Wilhelms-Universität Münster, Münster, Germany
| |
|
35
|
Zhang X, Zhao P, Wang Z, Xu X, Liu G, Tang Y, Li W. In Silico Prediction of CYP2C8 Inhibition with Machine-Learning Methods. Chem Res Toxicol 2021; 34:1850-1859. [PMID: 34255486 DOI: 10.1021/acs.chemrestox.1c00078] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.3] [Indexed: 01/01/2023]
Abstract
Cytochrome P450 2C8 (CYP2C8) is a major drug-metabolizing enzyme in humans and is responsible for the metabolism of ∼5% drugs in clinical use. Thus, inhibition of CYP2C8, which causes potential adverse drug events, cannot be neglected. The in vitro drug interaction studies guidelines for industry issued by the FDA also point out that it needs to be determined whether investigated drugs are CYP2C8 inhibitors before clinical trials. However, current studies mainly focus on predicting the inhibitors of other major P450 enzymes, and the importance of CYP2C8 inhibition has been overlooked. Therefore, there is a need to develop models for identifying potential CYP2C8 inhibition. In this study, in silico classification models for predicting CYP2C8 inhibition were built by five machine-learning methods combined with nine molecular fingerprints. The performance of the models built was evaluated by test and external validation sets. The best model had AUC values of 0.85 and 0.90 for the test and external validation sets, respectively. The applicability domain was analyzed based on the molecular similarity and exhibited an impact on the improvement of prediction accuracy. Furthermore, several representative privileged substructures such as 1H-benzo[d]imidazole, 1-phenyl-1H-pyrazole, and quinoline were identified by information gain and substructure frequency analysis. Overall, our results would be helpful for the prediction of CYP2C8 inhibition.
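A similarity-based applicability domain of the kind this abstract mentions can be sketched as follows. This is only an illustration: the abstract does not say which similarity measure or cutoff the authors used, so the Tanimoto coefficient, the nearest-neighbour rule, the `threshold` value, and the toy fingerprints (sets of on-bit indices) are all assumptions.

```python
def tanimoto(a: set, b: set) -> float:
    """Tanimoto similarity between two fingerprints given as sets of on-bit indices."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def in_domain(query: set, training: list, threshold: float = 0.3) -> bool:
    """Assumed rule: a query is inside the applicability domain if its most
    similar training compound reaches the (hypothetical) similarity threshold."""
    return max(tanimoto(query, fp) for fp in training) >= threshold


# Hypothetical fingerprints, for illustration only.
training = [{1, 2, 3, 4}, {2, 3, 5, 8}]
print(in_domain({1, 2, 3, 9}, training))   # shares 3 of 5 bits with the first compound
print(in_domain({20, 21, 22}, training))   # shares no bits with any training compound
```

Predictions for queries that fall outside the domain would typically be flagged or discarded rather than reported.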
Affiliation(s)
- Xiaoxiao Zhang
- Shanghai Key Laboratory of New Drug Design, School of Pharmacy, East China University of Science and Technology, Shanghai 200237, China
| | - Piaopiao Zhao
- Shanghai Key Laboratory of New Drug Design, School of Pharmacy, East China University of Science and Technology, Shanghai 200237, China
| | - Zhiyuan Wang
- Shanghai Key Laboratory of New Drug Design, School of Pharmacy, East China University of Science and Technology, Shanghai 200237, China
| | - Xuan Xu
- Shanghai Key Laboratory of New Drug Design, School of Pharmacy, East China University of Science and Technology, Shanghai 200237, China
| | - Guixia Liu
- Shanghai Key Laboratory of New Drug Design, School of Pharmacy, East China University of Science and Technology, Shanghai 200237, China
| | - Yun Tang
- Shanghai Key Laboratory of New Drug Design, School of Pharmacy, East China University of Science and Technology, Shanghai 200237, China
| | - Weihua Li
- Shanghai Key Laboratory of New Drug Design, School of Pharmacy, East China University of Science and Technology, Shanghai 200237, China
| |
|
36
|
Moreira-Filho JT, Silva AC, Dantas RF, Gomes BF, Souza Neto LR, Brandao-Neto J, Owens RJ, Furnham N, Neves BJ, Silva-Junior FP, Andrade CH. Schistosomiasis Drug Discovery in the Era of Automation and Artificial Intelligence. Front Immunol 2021; 12:642383. [PMID: 34135888 PMCID: PMC8203334 DOI: 10.3389/fimmu.2021.642383] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Received: 12/15/2020] [Accepted: 04/30/2021] [Indexed: 12/20/2022] Open
Abstract
Schistosomiasis is a parasitic disease caused by trematode worms of the genus Schistosoma and affects over 200 million people worldwide. The control and treatment of this neglected tropical disease is based on a single drug, praziquantel, which raises concerns about the development of drug resistance. This, and the lack of efficacy of praziquantel against juvenile worms, highlights the urgency for new antischistosomal therapies. In this review we focus on innovative approaches to the identification of antischistosomal drug candidates, including the use of automated assays, fragment-based screening, computer-aided and artificial intelligence-based computational methods. We highlight the current developments that may contribute to optimizing research outputs and lead to more effective drugs for this highly prevalent disease, in a more cost-effective drug discovery endeavor.
Affiliation(s)
- José T. Moreira-Filho
- LabMol – Laboratory for Molecular Modeling and Drug Design, Faculdade de Farmácia, Universidade Federal de Goiás – UFG, Goiânia, Brazil
| | - Arthur C. Silva
- LabMol – Laboratory for Molecular Modeling and Drug Design, Faculdade de Farmácia, Universidade Federal de Goiás – UFG, Goiânia, Brazil
| | - Rafael F. Dantas
- LaBECFar – Laboratório de Bioquímica Experimental e Computacional de Fármacos, Instituto Oswaldo Cruz, Fundação Oswaldo Cruz, Rio de Janeiro, Brazil
| | - Barbara F. Gomes
- LaBECFar – Laboratório de Bioquímica Experimental e Computacional de Fármacos, Instituto Oswaldo Cruz, Fundação Oswaldo Cruz, Rio de Janeiro, Brazil
| | - Lauro R. Souza Neto
- LaBECFar – Laboratório de Bioquímica Experimental e Computacional de Fármacos, Instituto Oswaldo Cruz, Fundação Oswaldo Cruz, Rio de Janeiro, Brazil
| | - Jose Brandao-Neto
- Diamond Light Source Ltd., Didcot, United Kingdom
- Research Complex at Harwell, Didcot, United Kingdom
| | - Raymond J. Owens
- The Rosalind Franklin Institute, Harwell, United Kingdom
- Division of Structural Biology, The Wellcome Centre for Human Genetics, University of Oxford, Oxford, United Kingdom
| | - Nicholas Furnham
- Department of Infection Biology, Faculty of Infectious and Tropical Diseases, London School of Hygiene and Tropical Medicine, London, United Kingdom
| | - Bruno J. Neves
- LabMol – Laboratory for Molecular Modeling and Drug Design, Faculdade de Farmácia, Universidade Federal de Goiás – UFG, Goiânia, Brazil
| | - Floriano P. Silva-Junior
- LaBECFar – Laboratório de Bioquímica Experimental e Computacional de Fármacos, Instituto Oswaldo Cruz, Fundação Oswaldo Cruz, Rio de Janeiro, Brazil
| | - Carolina H. Andrade
- LabMol – Laboratory for Molecular Modeling and Drug Design, Faculdade de Farmácia, Universidade Federal de Goiás – UFG, Goiânia, Brazil
| |
|
37
|
KC GB, Bocci G, Verma S, Hassan MM, Holmes J, Yang JJ, Sirimulla S, Oprea TI. A machine learning platform to estimate anti-SARS-CoV-2 activities. Nat Mach Intell 2021. [DOI: 10.1038/s42256-021-00335-w] [Citation(s) in RCA: 20] [Impact Index Per Article: 5.0] [Indexed: 12/11/2022]
|
38
|
Morger A, Svensson F, Arvidsson McShane S, Gauraha N, Norinder U, Spjuth O, Volkamer A. Assessing the calibration in toxicological in vitro models with conformal prediction. J Cheminform 2021; 13:35. [PMID: 33926567 PMCID: PMC8082859 DOI: 10.1186/s13321-021-00511-5] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.3] [Received: 02/07/2021] [Accepted: 04/10/2021] [Indexed: 11/11/2022] Open
Abstract
Machine learning methods are widely used in drug discovery and toxicity prediction. While showing overall good performance in cross-validation studies, their predictive power (often) drops in cases where the query samples have drifted from the training data’s descriptor space. Thus, the assumption for applying machine learning algorithms, that training and test data stem from the same distribution, might not always be fulfilled. In this work, conformal prediction is used to assess the calibration of the models. Deviations from the expected error may indicate that training and test data originate from different distributions. Exemplified on the Tox21 datasets, composed of chronologically released Tox21Train, Tox21Test and Tox21Score subsets, we observed that while internally valid models could be trained using cross-validation on Tox21Train, predictions on the external Tox21Score data resulted in higher error rates than expected. To improve the prediction on the external sets, a strategy exchanging the calibration set with more recent data, such as Tox21Test, has successfully been introduced. We conclude that conformal prediction can be used to diagnose data drifts and other issues related to model calibration. The proposed improvement strategy—exchanging the calibration data only—is convenient as it does not require retraining of the underlying model.
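The calibration check described in this abstract can be illustrated with a minimal inductive conformal prediction sketch. It is not the authors' implementation: the nonconformity scores are assumed to come from an already trained model (for example, one minus the predicted probability of a label), and all function names and numbers are hypothetical.

```python
def p_value(cal_scores, score):
    """Conformal p-value: share of calibration nonconformity scores that are
    at least as large as the test score (with the usual +1 correction)."""
    return (sum(s >= score for s in cal_scores) + 1) / (len(cal_scores) + 1)


def prediction_set(cal_scores_by_label, test_scores_by_label, eps):
    """Labels whose p-value exceeds the significance level eps."""
    return {
        label
        for label, score in test_scores_by_label.items()
        if p_value(cal_scores_by_label[label], score) > eps
    }


def error_rate(prediction_sets, true_labels):
    """Fraction of samples whose true label is missing from the prediction set.
    For a well-calibrated model this should stay close to eps; a clearly
    higher value on external data hints at drift between the data sets."""
    misses = sum(y not in s for s, y in zip(prediction_sets, true_labels))
    return misses / len(true_labels)


# Hypothetical per-label calibration scores and one test compound.
cal = {0: [0.1, 0.2, 0.3, 0.4], 1: [0.2, 0.5, 0.6, 0.9]}
print(prediction_set(cal, {0: 0.35, 1: 0.95}, eps=0.25))
```

Because the p-values depend only on the calibration scores, swapping in a more recent calibration set changes the prediction sets without retraining the underlying model, which is the property the abstract relies on.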
Affiliation(s)
- Andrea Morger
- In Silico Toxicology and Structural Bioinformatics, Institute of Physiology, Charité Universitätsmedizin, Berlin, Germany
| | - Fredrik Svensson
- Alzheimer's Research UK UCL Drug Discovery Institute, London, WC1E 6BT, UK
| | - Staffan Arvidsson McShane
- Department of Pharmaceutical Biosciences and Science for Life Laboratory, Uppsala University, 751 24, Uppsala, Sweden
| | - Niharika Gauraha
- Department of Pharmaceutical Biosciences and Science for Life Laboratory, Uppsala University, 751 24, Uppsala, Sweden; Division of Computational Science and Technology, KTH, 100 44, Stockholm, Sweden
| | - Ulf Norinder
- Department of Pharmaceutical Biosciences and Science for Life Laboratory, Uppsala University, 751 24, Uppsala, Sweden; Dept. Computer and Systems Sciences, Stockholm University, Box 7003, 164 07, Kista, Sweden; MTM Research Centre, School of Science and Technology, Örebro University, 70 182, Örebro, Sweden
| | - Ola Spjuth
- Department of Pharmaceutical Biosciences and Science for Life Laboratory, Uppsala University, 751 24, Uppsala, Sweden
| | - Andrea Volkamer
- In Silico Toxicology and Structural Bioinformatics, Institute of Physiology, Charité Universitätsmedizin, Berlin, Germany
| |
|
39
|
Halder AK, Dias Soeiro Cordeiro MN. QSAR-Co-X: an open source toolkit for multitarget QSAR modelling. J Cheminform 2021; 13:29. [PMID: 33858509 PMCID: PMC8048082 DOI: 10.1186/s13321-021-00508-0] [Citation(s) in RCA: 11] [Impact Index Per Article: 2.8] [Received: 12/06/2020] [Accepted: 03/31/2021] [Indexed: 12/02/2022] Open
Abstract
Quantitative structure activity relationships (QSAR) modelling is a well-known computational tool, often used in a wide variety of applications. Yet one of the major drawbacks of conventional QSAR modelling is that models are set up based on a limited number of experimental and/or theoretical conditions. To overcome this, the so-called multitasking or multitarget QSAR (mt-QSAR) approaches have emerged as new computational tools able to integrate diverse chemical and biological data into a single model equation, thus extending and improving the reliability of this type of modelling. We have developed QSAR-Co-X, an open source Python-based toolkit (available to download at https://github.com/ncordeirfcup/QSAR-Co-X) for supporting mt-QSAR modelling following the Box-Jenkins moving average approach. The new toolkit embodies several functionalities for dataset selection and curation plus computation of descriptors, for setting up linear and non-linear models, as well as for a comprehensive results analysis. The workflow within this toolkit is guided by a cohort of multiple statistical parameters and graphical outputs for assessing both the predictivity and the robustness of the derived mt-QSAR models. To monitor and demonstrate the functionalities of the designed toolkit, four case-studies pertaining to previously reported datasets are examined here. We believe that this new toolkit, along with our previously launched QSAR-Co code, will significantly contribute to making mt-QSAR modelling widely and routinely applicable.
Affiliation(s)
- Amit Kumar Halder
- LAQV@REQUIMTE/Faculty of Sciences, University of Porto, 4169-007, Porto, Portugal.
| | | |
|
40
|
Mazzolari A, Sommaruga L, Pedretti A, Vistoli G. MetaTREE, a Novel Database Focused on Metabolic Trees, Predicts an Important Detoxification Mechanism: The Glutathione Conjugation. Molecules 2021; 26:2098. [PMID: 33917533 PMCID: PMC8038802 DOI: 10.3390/molecules26072098] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 02/23/2021] [Revised: 03/22/2021] [Accepted: 03/30/2021] [Indexed: 02/07/2023] Open
Abstract
(1) Background: Data accuracy plays a key role in determining model performance, and the field of metabolism prediction suffers from a lack of truly reliable data. To enhance the accuracy of metabolic data, we recently proposed a manually curated database collected by a meta-analysis of the specialized literature (MetaQSAR). Here we aim to further increase data accuracy by focusing on publications reporting exhaustive metabolic trees. This selection should indeed reduce the number of false negative data. (2) Methods: A new metabolic database (MetaTREE) was thus collected and utilized to extract a dataset for metabolic data concerning glutathione conjugation (MT-dataset). After proper pre-processing, this dataset, along with the corresponding dataset extracted from MetaQSAR (MQ-dataset), was utilized to develop binary classification models using a random forest algorithm. (3) Results: The comparison of the models generated by the two collected datasets reveals the better performances reached by the MT-dataset (MCC raised from 0.63 to 0.67, sensitivity from 0.56 to 0.58). The analysis of the applicability domain also confirms that the model based on the MT-dataset shows a more robust predictive power with a larger applicability domain. (4) Conclusions: These results confirm that focusing on metabolic trees represents a convenient approach to increase data accuracy by reducing the false negative cases. The encouraging performances shown by the models developed from the MT-dataset invite the use of MetaTREE for predictive studies in the field of xenobiotic metabolism.
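For readers unfamiliar with the two metrics quoted in this abstract, the following sketch shows how the Matthews correlation coefficient (MCC) and sensitivity are computed from a binary confusion matrix. The counts are hypothetical and are not taken from the MetaTREE study.

```python
import math


def mcc(tp: int, tn: int, fp: int, fn: int) -> float:
    """Matthews correlation coefficient; returns 0.0 when it is undefined."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return 0.0 if denom == 0 else (tp * tn - fp * fn) / denom


def sensitivity(tp: int, fn: int) -> float:
    """True positive rate: share of actual positives that were predicted positive."""
    return tp / (tp + fn) if (tp + fn) else 0.0


# Hypothetical confusion-matrix counts, for illustration only.
print(mcc(tp=58, tn=90, fp=10, fn=42))
print(sensitivity(tp=58, fn=42))
```

MCC uses all four cells of the confusion matrix, so it is less easily inflated by class imbalance than accuracy; sensitivity alone says how many true positives are recovered, which is why a reduction in false negatives shows up there first.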
Affiliation(s)
- Angelica Mazzolari
- Dipartimento di Scienze Farmaceutiche, Università degli Studi di Milano, Via Mangiagalli 25, I-20133 Milano, Italy
| | | | | | | |
|
41
|
Huang DZ, Baber JC, Bahmanyar SS. The challenges of generalizability in artificial intelligence for ADME/Tox endpoint and activity prediction. Expert Opin Drug Discov 2021; 16:1045-1056. [PMID: 33739897 DOI: 10.1080/17460441.2021.1901685] [Citation(s) in RCA: 11] [Impact Index Per Article: 2.8] [Indexed: 02/05/2023]
Abstract
Introduction: Artificial intelligence (AI) has seen a massive resurgence in recent years with wide successes in computer vision, natural language processing, and games. The similar creation of robust and accurate AI models for ADME/Tox endpoint and activity prediction would be revolutionary to drug discovery pipelines. There have been numerous demonstrations of successful applications, but a key challenge remains: how generalizable are these predictive models? Areas covered: The authors present a summary of current promising components of AI models in the context of early drug discovery where ADME/Tox endpoint and activity prediction is the main driver of the iterative drug design process. Following that is a review of applicability domains and dataset construction considerations which determine generalizability bottlenecks for AI deployment. Further reviewed is the role of promising learning frameworks (multitask, transfer, and meta learning) which leverage auxiliary data to overcome issues of generalizability. Expert opinion: The authors conclude that the most promising direction toward integrating reliable and informative AI models into the drug discovery pipeline is a conjunction of learned feature representations, deep learning, and novel learning frameworks. Such a solution would address the sparse and incomplete datasets that are available for key endpoints related to drug discovery.
Affiliation(s)
| | - J Christian Baber
- Scientific Informatics, Global Head of Scientific Informatics, Scientific Informatics, Takeda Pharmaceuticals, Cambridge, MA, USA
| | - Sogole Sami Bahmanyar
- Computational Chemistry, Director of Computational Sciences, Computational Chemistry, Takeda Pharmaceuticals, San Diego, USA
| |
|
42
|
Lima MNN, Borba JVB, Cassiano GC, Mottin M, Mendonça SS, Silva AC, Tomaz KCP, Calit J, Bargieri DY, Costa FTM, Andrade CH. Artificial Intelligence Applied to the Rapid Identification of New Antimalarial Candidates with Dual-Stage Activity. ChemMedChem 2021; 16:1093-1103. [PMID: 33247522 DOI: 10.1002/cmdc.202000685] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.5] [Received: 08/31/2002] [Revised: 11/16/2020] [Indexed: 01/06/2023]
Abstract
Increasing reports of multidrug-resistant malaria parasites urge the discovery of new effective drugs with different chemical scaffolds. Protein kinases play a key role in many cellular processes such as signal transduction and cell division, making them interesting targets in many diseases. Protein kinase 7 (PK7) is an orphan kinase from the Plasmodium genus, essential for the sporogonic cycle of these parasites. Here, we applied a robust and integrative artificial intelligence-assisted virtual-screening (VS) approach using shape-based and machine learning models to identify new potential PK7 inhibitors with in vitro antiplasmodial activity. Eight virtual hits were experimentally evaluated, and compound LabMol-167 inhibited ookinete conversion of Plasmodium berghei and blood stages of Plasmodium falciparum at nanomolar concentrations with low cytotoxicity in mammalian cells. As PK7 does not have an essential role in the Plasmodium blood stage and our virtual screening strategy aimed for both PK7 and blood-stage inhibition, we conducted an in silico target fishing approach and propose that this compound might also inhibit P. falciparum PK5, acting as a possible dual-target inhibitor. Finally, docking studies of LabMol-167 with P. falciparum PK7 and PK5 proteins highlighted key interactions for further hit-to-lead optimization.
Affiliation(s)
- Marilia N N Lima
- LabMol - Laboratory for Molecular Modeling and Drug Design, Faculty of Pharmacy, Federal University of Goias, Rua 240, qd. 87, Goiânia, GO, 74605-170, Brazil
| | - Joyce V B Borba
- LabMol - Laboratory for Molecular Modeling and Drug Design, Faculty of Pharmacy, Federal University of Goias, Rua 240, qd. 87, Goiânia, GO, 74605-170, Brazil; Laboratory of Tropical Diseases - Prof. Dr. Luiz Jacintho da Silva, Department of Genetics Evolution, Microbiology and Immunology, Institute of Biology, 13083-970, Campinas, SP, Brazil
| | - Gustavo C Cassiano
- Laboratory of Tropical Diseases - Prof. Dr. Luiz Jacintho da Silva, Department of Genetics Evolution, Microbiology and Immunology, Institute of Biology, 13083-970, Campinas, SP, Brazil; Global Health and Tropical Medicine (GHTM), Instituto de Higiene e Medicina Tropical, Universidade Nova de Lisboa, Lisbon, Portugal
| | - Melina Mottin
- LabMol - Laboratory for Molecular Modeling and Drug Design, Faculty of Pharmacy, Federal University of Goias, Rua 240, qd. 87, Goiânia, GO, 74605-170, Brazil
| | - Sabrina S Mendonça
- LabMol - Laboratory for Molecular Modeling and Drug Design, Faculty of Pharmacy, Federal University of Goias, Rua 240, qd. 87, Goiânia, GO, 74605-170, Brazil
| | - Arthur C Silva
- LabMol - Laboratory for Molecular Modeling and Drug Design, Faculty of Pharmacy, Federal University of Goias, Rua 240, qd. 87, Goiânia, GO, 74605-170, Brazil
| | - Kaira C P Tomaz
- Laboratory of Tropical Diseases - Prof. Dr. Luiz Jacintho da Silva, Department of Genetics Evolution, Microbiology and Immunology, Institute of Biology, 13083-970, Campinas, SP, Brazil
| | - Juliana Calit
- Department of Parasitology, Institute of Biomedical Sciences, University of São Paulo, 05508-000, São Paulo, SP, Brazil
| | - Daniel Y Bargieri
- Department of Parasitology, Institute of Biomedical Sciences, University of São Paulo, 05508-000, São Paulo, SP, Brazil
| | - Fabio T M Costa
- Laboratory of Tropical Diseases - Prof. Dr. Luiz Jacintho da Silva, Department of Genetics Evolution, Microbiology and Immunology, Institute of Biology, 13083-970, Campinas, SP, Brazil
| | - Carolina H Andrade
- LabMol - Laboratory for Molecular Modeling and Drug Design, Faculty of Pharmacy, Federal University of Goias, Rua 240, qd. 87, Goiânia, GO, 74605-170, Brazil; Laboratory of Tropical Diseases - Prof. Dr. Luiz Jacintho da Silva, Department of Genetics Evolution, Microbiology and Immunology, Institute of Biology, 13083-970, Campinas, SP, Brazil
| |
|
43
|
Cooper K, Baddeley C, French B, Gibson K, Golden J, Lee T, Pierre S, Weiss B, Yang J. Novel Development of Predictive Feature Fingerprints to Identify Chemistry-Based Features for the Effective Drug Design of SARS-CoV-2 Target Antagonists and Inhibitors Using Machine Learning. ACS Omega 2021; 6:4857-4877. [PMID: 33644594 PMCID: PMC7905939 DOI: 10.1021/acsomega.0c05303] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.3] [Received: 12/01/2020] [Accepted: 01/25/2021] [Indexed: 05/04/2023]
Abstract
A unique approach to bioactivity and chemical data curation coupled with random forest analyses has led to a series of target-specific and cross-validated predictive feature fingerprints (PFF) that have high predictability across multiple therapeutic targets and disease stages involved in the severe acute respiratory syndrome due to coronavirus 2 (SARS-CoV-2)-induced COVID-19 pandemic, which include plasma kallikrein, human immunodeficiency virus (HIV)-protease, nonstructural protein (NSP)5, NSP12, Janus kinase (JAK) family, and AT-1. The approach was highly accurate in determining the matched target for the different compound sets and suggests that the models could be used for virtual screening of target-specific compound libraries. The curation-modeling process was successfully applied to a SARS-CoV-2 phenotypic screen and could be used for predictive bioactivity estimation and prioritization for clinical trial selection; virtual screening of drug libraries for the repurposing of drug molecules; and analysis and direction of proprietary data sets.
Affiliation(s)
- Kelvin Cooper
- KC Pharma Consulting, 1513 Harbor Drive, Sarasota, Florida 34239, United States
| | - Christopher Baddeley
- CAS, A Division of the American Chemical Society, 2540 Olentangy River Road, Columbus, Ohio 43210-3012, United States
| | - Bernie French
- Tasseogen Inc., 300 Mainsail Drive, Westerville, Ohio 43018, United States
| | - Katherine Gibson
- CAS, A Division of the American Chemical Society, 2540 Olentangy River Road, Columbus, Ohio 43210-3012, United States
| | - James Golden
- WorldQuant Predictive, 575 Fifth Avenue, New York, New York 10017, United States
| | - Thiam Lee
- WorldQuant Predictive, 575 Fifth Avenue, New York, New York 10017, United States
| | - Sadrach Pierre
- WorldQuant Predictive, 575 Fifth Avenue, New York, New York 10017, United States
| | - Brent Weiss
- CAS, A Division of the American Chemical Society, 2540 Olentangy River Road, Columbus, Ohio 43210-3012, United States
| | - Jason Yang
- WorldQuant Predictive, 575 Fifth Avenue, New York, New York 10017, United States
|
44
|
Jorner K, Brinck T, Norrby PO, Buttar D. Machine learning meets mechanistic modelling for accurate prediction of experimental activation energies. Chem Sci 2021; 12:1163-1175. [PMID: 36299676 PMCID: PMC9528810 DOI: 10.1039/d0sc04896h] [Citation(s) in RCA: 80] [Impact Index Per Article: 20.0] [Received: 09/04/2020] [Accepted: 11/02/2020] [Indexed: 12/19/2022] Open
Abstract
Accurate prediction of chemical reactions in solution is challenging for current state-of-the-art approaches based on transition state modelling with density functional theory. Models based on machine learning have emerged as a promising alternative to address these problems, but these models currently lack the precision to give crucial information on the magnitude of barrier heights, influence of solvents and catalysts and extent of regio- and chemoselectivity. Here, we construct hybrid models which combine the traditional transition state modelling and machine learning to accurately predict reaction barriers. We train a Gaussian Process Regression model to reproduce high-quality experimental kinetic data for the nucleophilic aromatic substitution reaction and use it to predict barriers with a mean absolute error of 0.77 kcal mol⁻¹ for an external test set. The model was further validated on regio- and chemoselectivity prediction on patent reaction data and achieved a competitive top-1 accuracy of 86%, despite not being trained explicitly for this task. Importantly, the model gives error bars for its predictions that can be used for risk assessment by the end user. Hybrid models emerge as the preferred alternative for accurate reaction prediction in the very common low-data situation where only 100-150 rate constants are available for a reaction class. With recent advances in deep learning for quickly predicting barriers and transition state geometries from density functional theory, we envision that hybrid models will soon become a standard alternative to complement current machine learning approaches based on ground-state physical organic descriptors or structural information such as molecular graphs or fingerprints.
Affiliation(s)
- Kjell Jorner
- Early Chemical Development, Pharmaceutical Sciences, R&D, AstraZeneca, Macclesfield, UK
| | - Tore Brinck
- Applied Physical Chemistry, Department of Chemistry, CBH, KTH Royal Institute of Technology, Stockholm, Sweden
| | - Per-Ola Norrby
- Data Science & Modelling, Pharmaceutical Sciences, R&D, AstraZeneca, Gothenburg, Sweden
| | - David Buttar
- Early Chemical Development, Pharmaceutical Sciences, R&D, AstraZeneca, Macclesfield, UK
| |
|
45
|
Zhao P, Peng Y, Xu X, Wang Z, Wu Z, Li W, Tang Y, Liu G. In silico prediction of mitochondrial toxicity of chemicals using machine learning methods. J Appl Toxicol 2021; 41:1518-1526. [PMID: 33469990 DOI: 10.1002/jat.4141] [Citation(s) in RCA: 16] [Impact Index Per Article: 4.0] [Received: 11/10/2020] [Revised: 12/15/2020] [Accepted: 12/30/2020] [Indexed: 12/16/2022]
Abstract
Mitochondria are important organelles in human cells, providing more than 95% of the energy. However, some drugs and environmental chemicals could induce mitochondrial dysfunction, which might cause complex diseases and even worsen the condition of patients with mitochondrial damage. Some drugs have been withdrawn from the market due to their severe mitochondrial toxicity, such as troglitazone. Therefore, there is an urgent need to develop models that could accurately predict the mitochondrial toxicity of chemicals. In this paper, suitable data were obtained from literature and databases first. Then nine types of fingerprints were used to characterize these compounds. Finally, different algorithms were used to build models. Meanwhile, the applicability domain of the prediction models was defined. We have also explored the structural alerts of mitochondrial toxicity, which would be helpful for medicinal chemists to better predict mitochondrial toxicity and further optimize lead compounds.
Affiliation(s)
- Piaopiao Zhao
- Shanghai Key Laboratory of New Drug Design, School of Pharmacy, East China University of Science and Technology, Shanghai, China
| | - Yayuan Peng
- Shanghai Key Laboratory of New Drug Design, School of Pharmacy, East China University of Science and Technology, Shanghai, China
| | - Xuan Xu
- Shanghai Key Laboratory of New Drug Design, School of Pharmacy, East China University of Science and Technology, Shanghai, China
| | - Zhiyuan Wang
- Shanghai Key Laboratory of New Drug Design, School of Pharmacy, East China University of Science and Technology, Shanghai, China
| | - Zengrui Wu
- Shanghai Key Laboratory of New Drug Design, School of Pharmacy, East China University of Science and Technology, Shanghai, China
| | - Weihua Li
- Shanghai Key Laboratory of New Drug Design, School of Pharmacy, East China University of Science and Technology, Shanghai, China
| | - Yun Tang
- Shanghai Key Laboratory of New Drug Design, School of Pharmacy, East China University of Science and Technology, Shanghai, China
| | - Guixia Liu
- Shanghai Key Laboratory of New Drug Design, School of Pharmacy, East China University of Science and Technology, Shanghai, China
| |
|
46
|
Ambe K, Ohya K, Takada W, Suzuki M, Tohkin M. In Silico Approach to Predict Severe Cutaneous Adverse Reactions Using the Japanese Adverse Drug Event Report Database. Clin Transl Sci 2021; 14:756-763. [PMID: 33417306 PMCID: PMC7993315 DOI: 10.1111/cts.12944] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 09/08/2020] [Accepted: 10/08/2020] [Indexed: 11/30/2022] Open
Abstract
Severe cutaneous adverse reactions (SCARs), such as Stevens–Johnson syndrome/toxic epidermal necrolysis and drug‐induced hypersensitivity syndrome, are rare and occasionally fatal. However, it is difficult to detect SCARs at the drug development stage, necessitating a new approach for prediction. Therefore, in this study, using the chemical structure information of SCAR‐causative drugs from the Japanese Adverse Drug Event Report (JADER) database, we tried to develop a predictive classification model of SCAR through deep learning. In the JADER database from 2004 to 2017, we defined 185 SCAR‐positive drugs and 195 SCAR‐negative drugs using proportional reporting ratios as the signal detection method, and the total number of reports. These SCAR‐positive and SCAR‐negative drugs were randomly divided into the training dataset for model construction and the test dataset for evaluation. The model performance was evaluated in the independent test dataset inside the applicability domain (AD), which is the chemical space for reliable prediction results. Using the deep learning model with molecular descriptors as the drug structure information, the area under the curve was 0.76 for the 148 drugs of the test dataset inside the AD. The method developed in the present study allows for utilizing the JADER database for SCAR classification, with potential to improve screening efficiency in the development of new drugs. This method may also help to noninvasively identify the causative drug, and help assess the causality between drugs and SCARs in postmarketing surveillance.
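The proportional reporting ratio (PRR) named in this abstract as the signal detection method has a standard definition over a 2x2 table of spontaneous reports, sketched below. The counts are hypothetical, and the abstract does not state which cutoffs the authors applied to label a drug SCAR-positive or SCAR-negative, so no threshold is shown.

```python
def prr(a: int, b: int, c: int, d: int) -> float:
    """Proportional reporting ratio from a 2x2 report table.

    a: reports of the event for the drug of interest
    b: reports of other events for the drug of interest
    c: reports of the event for all other drugs
    d: reports of other events for all other drugs
    """
    if a + b == 0 or c == 0:
        raise ValueError("PRR is undefined for these counts")
    return (a / (a + b)) / (c / (c + d))


# Hypothetical counts: 20 of 100 reports for the drug mention the event,
# against 100 of 10,000 reports for all other drugs (ratio of about 20).
print(prr(a=20, b=80, c=100, d=9900))
```

A ratio well above 1 means the event is reported disproportionately often for the drug compared with the rest of the database.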
Affiliation(s)
- Kaori Ambe
- Department of Regulatory Science, Graduate School of Pharmaceutical Sciences, Nagoya City University, Nagoya, Japan
- Kazuyuki Ohya
- Department of Regulatory Science, Graduate School of Pharmaceutical Sciences, Nagoya City University, Nagoya, Japan
- Waki Takada
- Department of Regulatory Science, Graduate School of Pharmaceutical Sciences, Nagoya City University, Nagoya, Japan
- Masaharu Suzuki
- Department of Regulatory Science, Graduate School of Pharmaceutical Sciences, Nagoya City University, Nagoya, Japan
- Masahiro Tohkin
- Department of Regulatory Science, Graduate School of Pharmaceutical Sciences, Nagoya City University, Nagoya, Japan
|
47
|
Jiménez-Luna J, Grisoni F, Schneider G. Drug discovery with explainable artificial intelligence. Nat Mach Intell 2020. [DOI: 10.1038/s42256-020-00236-4] [Citation(s) in RCA: 152] [Impact Index Per Article: 30.4] [Indexed: 12/17/2022]
|
48
|
Rakhimbekova A, Madzhidov TI, Nugmanov RI, Gimadiev TR, Baskin II, Varnek A. Comprehensive Analysis of Applicability Domains of QSPR Models for Chemical Reactions. Int J Mol Sci 2020; 21:E5542. [PMID: 32756326 PMCID: PMC7432167 DOI: 10.3390/ijms21155542] [Citation(s) in RCA: 29] [Impact Index Per Article: 5.8] [Received: 07/13/2020] [Revised: 07/27/2020] [Accepted: 07/30/2020] [Indexed: 01/28/2023]
Abstract
Defining a model's applicability domain (AD) is an active research topic in chemoinformatics. Although many AD definitions for models predicting properties of molecules (Quantitative Structure-Activity/Property Relationship (QSAR/QSPR) models) have been described in the literature, none for chemical reactions (Quantitative Reaction-Property Relationships (QRPR)) has been reported to date. A chemical reaction is a much more complex object than an individual molecule, and its yield and its thermodynamic and kinetic characteristics depend not only on the structures of reactants and products but also on experimental conditions. The performance of QRPR models largely depends on the way the chemical transformation is encoded. In this study, various AD definition methods extensively used in QSAR/QSPR studies of individual molecules, as well as several novel approaches suggested in this work for reactions, were benchmarked on several reaction datasets. Their abilities to exclude wrong reaction types, increase coverage, improve model performance, and detect Y-outliers were tested. As a result, several "best" AD definitions for QRPR models predicting reaction characteristics have been identified and tested on a previously published external dataset with a clear AD definition problem.
Affiliation(s)
- Assima Rakhimbekova
- A.M. Butlerov Institute of Chemistry, Kazan Federal University, 420008 Kazan, Russia
- Timur I. Madzhidov
- A.M. Butlerov Institute of Chemistry, Kazan Federal University, 420008 Kazan, Russia
- Ramil I. Nugmanov
- A.M. Butlerov Institute of Chemistry, Kazan Federal University, 420008 Kazan, Russia
- Timur R. Gimadiev
- Institute for Chemical Reaction Design and Discovery, Hokkaido University, Sapporo 001-0021, Japan
- Igor I. Baskin
- A.M. Butlerov Institute of Chemistry, Kazan Federal University, 420008 Kazan, Russia
- Faculty of Physics, Moscow State University, 119234 Moscow, Russia
- Laboratory of Chemoinformatics, UMR 7140 CNRS, University of Strasbourg, 67000 Strasbourg, France
- Alexandre Varnek
- Institute for Chemical Reaction Design and Discovery, Hokkaido University, Sapporo 001-0021, Japan
- Laboratory of Chemoinformatics, UMR 7140 CNRS, University of Strasbourg, 67000 Strasbourg, France
|
49
|
Quantification of extra virgin olive oil adulteration using smartphone videos. Talanta 2020; 216:120920. [DOI: 10.1016/j.talanta.2020.120920] [Citation(s) in RCA: 21] [Impact Index Per Article: 4.2] [Received: 11/03/2019] [Revised: 03/09/2020] [Accepted: 03/10/2020] [Indexed: 11/22/2022]
|
50
|
Tang W, Chen J, Hong H. Discriminant models on mitochondrial toxicity improved by consensus modeling and resolving imbalance in training. Chemosphere 2020; 253:126768. [PMID: 32464767 DOI: 10.1016/j.chemosphere.2020.126768] [Citation(s) in RCA: 18] [Impact Index Per Article: 3.6] [Received: 02/12/2020] [Revised: 04/08/2020] [Accepted: 04/08/2020] [Indexed: 06/11/2023]
Abstract
Humans and animals may be exposed to tens of thousands of natural and synthetic chemicals during their lifespan. It is difficult to assess risk for all the chemicals with experimental toxicity tests. An alternative approach is to use computational toxicology methods such as quantitative structure-activity relationship (QSAR) modeling. Mitochondrial toxicity is involved in many diseases such as cancer, neurodegeneration, type 2 diabetes, cardiovascular diseases and autoimmune diseases. Thus, it is important to rapidly and efficiently identify chemicals with mitochondrial toxicity. In this study, five machine learning algorithms and twelve types of molecular fingerprints were employed to generate QSAR discriminant models for mitochondrial toxicity. A threshold moving method was adopted to resolve the imbalance issue in the training data. Consensus of the models by an averaging probability strategy improved prediction performance. The best model has correct classification rates of 81.8% and 88.3% in ten-fold cross validation and external validation, respectively. Substructures such as phenol, carboxylic acid, nitro and arylchloride were found informative through analysis of information gain and frequency of substructures. The results demonstrate that resolving imbalance in training and building consensus models can improve classification rates for mitochondrial toxicity prediction.
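The two ideas named in this abstract, consensus by averaging probabilities and threshold moving, can be illustrated with a small sketch. The function names, the probabilities, and the cutoff of 0.3 are all hypothetical; the abstract does not report how the authors selected their threshold or which models entered the consensus.

```python
from typing import List


def consensus_probability(model_probs: List[List[float]]) -> List[float]:
    """Average the positive-class probability of each sample across models.

    model_probs[m][i] is the probability that model m assigns to sample i.
    """
    if not model_probs:
        raise ValueError("at least one model is required")
    n_models = len(model_probs)
    n_samples = len(model_probs[0])
    if any(len(p) != n_samples for p in model_probs):
        raise ValueError("all models must score the same samples")
    return [sum(p[i] for p in model_probs) / n_models for i in range(n_samples)]


def classify_with_threshold(probs: List[float], threshold: float) -> List[int]:
    """Label a sample positive (1) when its probability reaches the threshold."""
    return [1 if p >= threshold else 0 for p in probs]


# Hypothetical outputs of two models on two samples.
probs = consensus_probability([[0.125, 0.5], [0.375, 0.25]])

# The default cutoff of 0.5 misses the second sample; a lower, "moved"
# cutoff (0.3 here, chosen only for illustration) labels it positive.
default_labels = classify_with_threshold(probs, threshold=0.5)
moved_labels = classify_with_threshold(probs, threshold=0.3)
print(probs, default_labels, moved_labels)
```

Lowering the cutoff trades specificity for sensitivity on the minority class without resampling the training data, which is why it is a common way to handle class imbalance.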
Affiliation(s)
- Weihao Tang
- Key Laboratory of Industrial Ecology and Environmental Engineering (MOE), School of Environmental Science and Technology, Dalian University of Technology, Dalian, 116024, China
- Jingwen Chen
- Key Laboratory of Industrial Ecology and Environmental Engineering (MOE), School of Environmental Science and Technology, Dalian University of Technology, Dalian, 116024, China
- Huixiao Hong
- National Center for Toxicological Research, U.S. Food and Drug Administration, 3900 NCTR Rd, Jefferson, AR, 72079, USA
|