1
Pandey SK, Roy K. Development of a read-across-derived classification model for the predictions of mutagenicity data and its comparison with traditional QSAR models and expert systems. Toxicology 2023; 500:153676. [PMID: 37993082 DOI: 10.1016/j.tox.2023.153676] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/09/2023] [Revised: 11/06/2023] [Accepted: 11/17/2023] [Indexed: 11/24/2023]
Abstract
Mutagenicity is considered an important endpoint from the regulatory, environmental and medical points of view. Due to the large number of compounds that may be of concern and the enormous expense (in terms of time, money, and animals) associated with rodent mutagenicity bioassays, this endpoint is a major target for the development of alternative approaches for screening and prediction. Most older expert systems and quantitative structure-activity relationship (QSAR) models may show reduced performance over time when applied to newer chemical candidates; thus, researchers constantly try to improve the modeling strategies. In our report, we initially performed traditional classification-based linear discriminant analysis (LDA) QSAR modeling using the benchmark Ames dataset of diverse chemicals (6512 compounds) to recognize the relationship between the molecules and their potential mutagenic behavior. The classical LDA QSAR model was developed from a selected set of 2D descriptors, using a total of 31 descriptors identified from the analysis of the most discriminating features. Additionally, we used similarity-derived features obtained from read-across (RA) to develop an RA-based QSAR model. The developed RA-based LDA QSAR model has better predictivity, transferability, and interpretability than the classical LDA QSAR model, and it uses far fewer descriptors. Different machine learning (ML) models were also developed using the descriptors appearing in the read-across-based LDA QSAR model for comparative studies. We checked the prediction quality for 216 true external set compounds using the novel similarity-derived RA model. The performance of the OECD toolbox was also compared with that of the RA-derived LDA QSAR model on the true external set. The current study aimed to explore the significance of the read-across-based algorithm and its application to the most current experimental mutagenicity data to complement already available expert systems.
Affiliation(s)
- Sapna Kumari Pandey
  - Drug Theoretics and Cheminformatics Laboratory, Department of Pharmaceutical Technology, Jadavpur University, Kolkata 700032, India
- Kunal Roy
  - Drug Theoretics and Cheminformatics Laboratory, Department of Pharmaceutical Technology, Jadavpur University, Kolkata 700032, India
2
Seal S, Yang H, Trapotsi MA, Singh S, Carreras-Puigvert J, Spjuth O, Bender A. Merging bioactivity predictions from cell morphology and chemical fingerprint models using similarity to training data. J Cheminform 2023; 15:56. [PMID: 37268960 DOI: 10.1186/s13321-023-00723-x] [Citation(s) in RCA: 9] [Impact Index Per Article: 9.0] [Received: 02/05/2023] [Accepted: 04/20/2023] [Indexed: 06/04/2023] [Open access]
Abstract
The applicability domain of machine learning models trained on structural fingerprints for the prediction of biological endpoints is often limited by the lack of diversity in the chemical space of the training data. In this work, we developed similarity-based merger models that combine the outputs of individual models trained on cell morphology (based on Cell Painting) and chemical structure (based on chemical fingerprints) with the structural and morphological similarities of the compounds in the test dataset to compounds in the training dataset. We built these similarity-based merger models as logistic regression models that use the predictions and similarities as features, and predicted assay hit calls for 177 assays from ChEMBL, PubChem and the Broad Institute (where the required Cell Painting annotations were available). We found that the similarity-based merger models outperformed other models, with an additional 20% of assays (79 out of 177) reaching an AUC > 0.70, compared with 65 out of 177 assays using structural models and 50 out of 177 assays using Cell Painting models. Our results demonstrated that similarity-based merger models combining structure and cell morphology models can more accurately predict a wide range of biological assay outcomes and further expand the applicability domain by better extrapolating to new structural and morphology spaces.
Affiliation(s)
- Srijit Seal
  - Yusuf Hamied Department of Chemistry, University of Cambridge, Cambridge, UK
- Hongbin Yang
  - Yusuf Hamied Department of Chemistry, University of Cambridge, Cambridge, UK
- Maria-Anna Trapotsi
  - Yusuf Hamied Department of Chemistry, University of Cambridge, Cambridge, UK
- Satvik Singh
  - Department of Applied Mathematics and Theoretical Physics (DAMTP), University of Cambridge, Cambridge, UK
- Jordi Carreras-Puigvert
  - Department of Pharmaceutical Biosciences and Science for Life Laboratory, Uppsala University, Uppsala, Sweden
- Ola Spjuth
  - Department of Pharmaceutical Biosciences and Science for Life Laboratory, Uppsala University, Uppsala, Sweden
- Andreas Bender
  - Yusuf Hamied Department of Chemistry, University of Cambridge, Cambridge, UK
3
Lemée P, Fessard V, Habauzit D. Prioritization of mycotoxins based on mutagenicity and carcinogenicity evaluation using combined in silico QSAR methods. Environmental Pollution (Barking, Essex: 1987) 2023; 323:121284. [PMID: 36804886 DOI: 10.1016/j.envpol.2023.121284] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/26/2022] [Revised: 02/01/2023] [Accepted: 02/12/2023] [Indexed: 06/18/2023]
Abstract
Mycotoxins and their metabolites are a family of compounds with a great diversity of both structures and biological properties. Information on their toxicity is spread across several databases and the scientific literature. Given the number of molecules and their structural diversity, the cost and time required for hazard evaluation of each compound are unrealistic. To that end, new approach methodologies (NAMs) can be applied to evaluate such a large set of molecules. Among them, quantitative structure-activity relationship (QSAR) in silico models could be useful to predict the mutagenic and carcinogenic properties of mycotoxins. First, a complete list of 904 mycotoxins and metabolites was built. Then, some known mycotoxins were used to determine the best QSAR tools for mutagenicity and carcinogenicity predictions. The best tool was further applied to the whole list of 904 mycotoxins. In the end, 95 mycotoxins were identified as both mutagenic and carcinogenic and should be prioritized for further evaluation.
Affiliation(s)
- Pierre Lemée
  - ANSES (French Agency for Food, Environmental and Occupational Health & Safety), Toxicology of Contaminants Unit, Fougères, France
- Valérie Fessard
  - ANSES (French Agency for Food, Environmental and Occupational Health & Safety), Toxicology of Contaminants Unit, Fougères, France
- Denis Habauzit
  - ANSES (French Agency for Food, Environmental and Occupational Health & Safety), Toxicology of Contaminants Unit, Fougères, France
4
Qazi S, Khanna K, Raza K. Dihydroquercetin (DHQ) has the potential to promote apoptosis in ovarian cancer cells: An in silico and in vitro study. J Mol Struct 2023. [DOI: 10.1016/j.molstruc.2022.134093] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/27/2022]
5
Luque Ruiz I, Gómez-Nieto MÁ. Rivality index neighbourhood algorithm with density and distances weighted schemes for the building of robust QSAR classification models with high reliable applicability domain. SAR and QSAR in Environmental Research 2019; 30:587-615. [PMID: 31469296 DOI: 10.1080/1062936x.2019.1644666] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.4] [Received: 06/15/2019] [Accepted: 07/14/2019] [Indexed: 06/10/2023]
Abstract
The rivality index (RI) is a normalized distance measurement between a molecule and its first nearest neighbours that provides a robust prediction of the activity of a molecule based on the known activity of those neighbours. Negative values of the RI describe molecules that would be correctly classified by a statistical algorithm; conversely, positive values describe molecules that classification algorithms would detect as outliers. In this paper, we describe a classification algorithm based on the RI and propose four weighting schemes (kernels) for its calculation, each measuring different characteristics of the neighbourhood of every molecule in the dataset at established values of the neighbour threshold. The results show that the proposed RI-based classification algorithm generates more reliable and robust classification models than many of the most widely used machine learning algorithms. These results have been validated and corroborated using 20 balanced and unbalanced benchmark datasets of different sizes and modelability. The classification models generated provide valuable information about the molecules of the dataset, the applicability domain of the models and the reliability of the predictions.
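The sign convention described in this abstract can be illustrated with a small sketch. The abstract does not give the formula, so the form used here, (d_same - d_rival) / (d_same + d_rival) over the nearest same-class and nearest other-class neighbours, is an assumption, as are the function names, the Euclidean distance and the toy data.

```python
import math

def euclidean(a, b):
    """Euclidean distance between two equal-length descriptor vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def rivality_index(i, points, labels, dist=euclidean):
    """Assumed form of the index: (d_same - d_rival) / (d_same + d_rival).

    d_same  = distance to the nearest neighbour with the same label
    d_rival = distance to the nearest neighbour with a different label
    The value lies in [-1, 1]; negative means the same-class neighbour is closer.
    Each class needs at least two members, otherwise min() has nothing to scan.
    """
    d_same = min(dist(points[i], p) for j, p in enumerate(points)
                 if j != i and labels[j] == labels[i])
    d_rival = min(dist(points[i], p) for j, p in enumerate(points)
                  if labels[j] != labels[i])
    total = d_same + d_rival
    return 0.0 if total == 0 else (d_same - d_rival) / total

# Hypothetical 2D descriptors: the last point carries label 1 but sits among label 0.
points = [(0.0, 0.0), (0.1, 0.0), (1.0, 1.0), (1.1, 1.0), (0.05, 0.1)]
labels = [0, 0, 1, 1, 1]
ri = [rivality_index(i, points, labels) for i in range(len(points))]
```

In this toy set the last molecule gets a positive value (a likely outlier), while the molecules surrounded by their own class get negative values.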
Affiliation(s)
- I Luque Ruiz
  - Department of Computing and Numerical Analysis, Campus de Rabanales, University of Córdoba, Córdoba, Spain
- M Á Gómez-Nieto
  - Department of Computing and Numerical Analysis, Campus de Rabanales, University of Córdoba, Córdoba, Spain
6
Luque Ruiz I, Gómez-Nieto MÁ. Building of Robust and Interpretable QSAR Classification Models by Means of the Rivality Index. J Chem Inf Model 2019; 59:2785-2804. [DOI: 10.1021/acs.jcim.9b00264] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.2] [Indexed: 12/13/2022]
Affiliation(s)
- Irene Luque Ruiz
  - Department of Computing and Numerical Analysis, University of Córdoba, Albert Einstein Building, Campus de Rabanales, E-14071, Córdoba, Spain
- Miguel Ángel Gómez-Nieto
  - Department of Computing and Numerical Analysis, University of Córdoba, Albert Einstein Building, Campus de Rabanales, E-14071, Córdoba, Spain
7
Ruiz IL, Gómez-Nieto MÁ. Study of the Applicability Domain of the QSAR Classification Models by Means of the Rivality and Modelability Indexes. Molecules 2018; 23:2756. [PMID: 30356020 PMCID: PMC6278359 DOI: 10.3390/molecules23112756] [Citation(s) in RCA: 20] [Impact Index Per Article: 3.3] [Received: 09/26/2018] [Revised: 10/14/2018] [Accepted: 10/22/2018] [Indexed: 11/30/2022] [Open access]
Abstract
The reliability of a QSAR classification model depends on its capacity to make confident predictions for new compounds not considered in the building of the model. The results of this external validation process show the applicability domain (AD) of the QSAR model and, therefore, the robustness of the model in predicting the property/activity of new molecules. In this paper we propose the use of the rivality and modelability indexes to study whether a dataset can be correctly modeled by a QSAR algorithm and to predict the reliability of the built model when predicting the property/activity of new molecules. The calculation of these indexes has a very low computational cost and does not require building a model, which makes them good tools for analysing datasets in the first stages of building QSAR classification models. In our study, we selected two benchmark datasets with a similar number of molecules but very different modelability, and we corroborated the predictive capacity of the rivality and modelability indexes with respect to classification models built using Support Vector Machine and Random Forest algorithms with 5-fold cross-validation and leave-one-out techniques. The results have shown the excellent ability of both indexes to predict outliers and the applicability domain of the QSAR classification models. In all cases, these values accurately predicted the statistical parameters of the QSAR models generated by the algorithms.
Affiliation(s)
- Irene Luque Ruiz
  - Department of Computing and Numerical Analysis, Campus Universitario de Rabanales, Albert Einstein Building, University of Córdoba, E-14071 Córdoba, Spain
- Miguel Ángel Gómez-Nieto
  - Department of Computing and Numerical Analysis, Campus Universitario de Rabanales, Albert Einstein Building, University of Córdoba, E-14071 Córdoba, Spain
8
Schyman P, Liu R, Desai V, Wallqvist A. vNN Web Server for ADMET Predictions. Front Pharmacol 2017; 8:889. [PMID: 29255418 PMCID: PMC5722789 DOI: 10.3389/fphar.2017.00889] [Citation(s) in RCA: 124] [Impact Index Per Article: 17.7] [Received: 09/22/2017] [Accepted: 11/20/2017] [Indexed: 11/23/2022] [Open access]
Abstract
In drug development, early assessments of pharmacokinetic and toxic properties are important stepping stones to avoid costly and unnecessary failures. Considerable progress has recently been made in the development of computer-based (in silico) models to estimate such properties. Nonetheless, such models can be further improved in terms of their ability to make predictions more rapidly, easily, and with greater reliability. To address this issue, we have used our vNN method to develop 15 absorption, distribution, metabolism, excretion, and toxicity (ADMET) prediction models. These models quickly assess some of the most important properties of potential drug candidates, including their cytotoxicity, mutagenicity, cardiotoxicity, drug-drug interactions, microsomal stability, and likelihood of causing drug-induced liver injury. Here we summarize the ability of each of these models to predict such properties and discuss their overall performance. All of these ADMET models are publicly available on our website (https://vnnadmet.bhsai.org/), which also offers the capability of using the vNN method to customize and build new models.
Affiliation(s)
- Patric Schyman
  - DoD Biotechnology High Performance Computing Software Applications Institute, Telemedicine and Advanced Technology Research Center, U.S. Army Medical Research and Materiel Command, Fort Detrick, MD, United States
- Ruifeng Liu
  - DoD Biotechnology High Performance Computing Software Applications Institute, Telemedicine and Advanced Technology Research Center, U.S. Army Medical Research and Materiel Command, Fort Detrick, MD, United States
- Valmik Desai
  - DoD Biotechnology High Performance Computing Software Applications Institute, Telemedicine and Advanced Technology Research Center, U.S. Army Medical Research and Materiel Command, Fort Detrick, MD, United States
- Anders Wallqvist
  - DoD Biotechnology High Performance Computing Software Applications Institute, Telemedicine and Advanced Technology Research Center, U.S. Army Medical Research and Materiel Command, Fort Detrick, MD, United States
9
Gadaleta D, Porta N, Vrontaki E, Manganelli S, Manganaro A, Sello G, Honma M, Benfenati E. Integrating computational methods to predict mutagenicity of aromatic azo compounds. Journal of Environmental Science and Health. Part C, Environmental Carcinogenesis & Ecotoxicology Reviews 2017; 35:239-257. [PMID: 29027864 DOI: 10.1080/10590501.2017.1391521] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.9] [Indexed: 06/07/2023]
Abstract
Azo dyes have several industrial uses. However, these azo dyes and their degradation products showed mutagenicity, inducing damage in environmental and human systems. Computational methods are proposed as cheap and rapid alternatives to predict the toxicity of azo dyes. A benchmark dataset of Ames data for 354 azo dyes was employed to develop three classification strategies using knowledge-based methods and docking simulations. Results were compared and integrated with three models from the literature, developing a series of consensus strategies. The good results confirm the usefulness of in silico methods as a support for experimental methods to predict the mutagenicity of azo compounds.
Affiliation(s)
- Domenico Gadaleta
  - Laboratory of Environmental Chemistry and Toxicology, Department of Environmental Health Sciences, IRCCS - Istituto di Ricerche Farmacologiche Mario Negri, Milano, Italy
- Nicola Porta
  - Laboratory of Environmental Chemistry and Toxicology, Department of Environmental Health Sciences, IRCCS - Istituto di Ricerche Farmacologiche Mario Negri, Milano, Italy
- Eleni Vrontaki
  - Laboratory of Environmental Chemistry and Toxicology, Department of Environmental Health Sciences, IRCCS - Istituto di Ricerche Farmacologiche Mario Negri, Milano, Italy
  - Laboratory of Organic Chemistry, Department of Chemistry, National and Kapodistrian University of Athens, Athens, Greece
- Serena Manganelli
  - Laboratory of Environmental Chemistry and Toxicology, Department of Environmental Health Sciences, IRCCS - Istituto di Ricerche Farmacologiche Mario Negri, Milano, Italy
- Guido Sello
  - Department of Chemistry, University of Milano, Milan, Italy
- Masamitsu Honma
  - Division of Genetics & Mutagenesis, National Institute of Health Sciences, Setagaya-ku, Tokyo, Japan
- Emilio Benfenati
  - Laboratory of Environmental Chemistry and Toxicology, Department of Environmental Health Sciences, IRCCS - Istituto di Ricerche Farmacologiche Mario Negri, Milano, Italy
10
Liu R, AbdulHameed MDM, Wallqvist A. Molecular Structure-Based Large-Scale Prediction of Chemical-Induced Gene Expression Changes. J Chem Inf Model 2017; 57:2194-2202. [PMID: 28796500 DOI: 10.1021/acs.jcim.7b00281] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.4] [Indexed: 02/07/2023]
Abstract
The quantitative structure-activity relationship (QSAR) approach has been used to model a wide range of chemical-induced biological responses. However, it had not been utilized to model chemical-induced genomewide gene expression changes until very recently, owing to the complexity of training and evaluating a very large number of models. To address this issue, we examined the performance of a variable nearest neighbor (v-NN) method that uses information on near neighbors conforming to the principle that similar structures have similar activities. Using a data set of gene expression signatures of 13,150 compounds derived from cell-based measurements in the NIH Library of Integrated Network-based Cellular Signatures program, we were able to make predictions for 62% of the compounds in a 10-fold cross-validation test, with a correlation coefficient of 0.61 between the predicted and experimentally derived signatures, a reproducibility rivaling that of high-throughput gene expression measurements. To evaluate the utility of the predicted gene expression signatures, we compared the predicted and experimentally derived signatures in their ability to identify drugs known to cause specific liver, kidney, and heart injuries. Overall, the predicted and experimentally derived signatures had similar receiver operating characteristics, whose areas under the curve ranged from 0.71 to 0.77 and 0.70 to 0.73, respectively, across the three organ injury models. However, detailed analyses of enrichment curves indicate that signatures predicted from multiple near neighbors outperformed those derived from experiments, suggesting that averaging information from near neighbors may help improve the signal from gene expression measurements. Our results demonstrate that the v-NN method can serve as a practical approach for modeling large-scale, genomewide, chemical-induced gene expression changes.
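As a rough illustration of why a variable nearest neighbor scheme covers only part of a data set (62% in this abstract), the sketch below predicts only when at least one training compound lies within a distance threshold and otherwise abstains. The Gaussian-style weighting, the parameter names `d0` and `h`, the scalar activity values and the toy fingerprints are all assumptions made for illustration; the abstract does not specify them.

```python
import math

def tanimoto_distance(a, b):
    """1 - Tanimoto similarity between two fingerprints given as sets of on-bits."""
    union = len(a | b)
    return 0.0 if union == 0 else 1.0 - len(a & b) / union

def vnn_predict(query, training, d0=0.6, h=0.3):
    """Distance-weighted average over all neighbours within d0; None if there are none.

    training: list of (fingerprint_set, activity) pairs.
    d0 (distance cutoff) and h (smoothing factor) are hypothetical values.
    """
    num = den = 0.0
    for fp, y in training:
        d = tanimoto_distance(query, fp)
        if d <= d0:
            w = math.exp(-((d / h) ** 2))  # closer neighbours weigh more
            num += w * y
            den += w
    return None if den == 0.0 else num / den

# Hypothetical fingerprints (sets of on-bit indices) with scalar activities.
training = [({1, 2, 3, 4}, 1.0), ({1, 2, 3, 5}, 0.0), ({7, 8, 9}, 1.0)]
inside = vnn_predict({1, 2, 3, 4}, training)   # has neighbours within d0
outside = vnn_predict({20, 21}, training)      # no neighbours: abstains
```

Tightening `d0` in such a scheme trades coverage for reliability: fewer queries receive a prediction, but each prediction rests on closer analogues.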
Affiliation(s)
- Ruifeng Liu
  - Department of Defense Biotechnology High Performance Computing Software Applications Institute, Telemedicine and Advanced Technology Research Center, U.S. Army Medical Research and Materiel Command, Fort Detrick, Maryland 21702, United States
- Mohamed Diwan M AbdulHameed
  - Department of Defense Biotechnology High Performance Computing Software Applications Institute, Telemedicine and Advanced Technology Research Center, U.S. Army Medical Research and Materiel Command, Fort Detrick, Maryland 21702, United States
- Anders Wallqvist
  - Department of Defense Biotechnology High Performance Computing Software Applications Institute, Telemedicine and Advanced Technology Research Center, U.S. Army Medical Research and Materiel Command, Fort Detrick, Maryland 21702, United States
11
Gütlein M, Kramer S. Filtered circular fingerprints improve either prediction or runtime performance while retaining interpretability. J Cheminform 2016; 8:60. [PMID: 27853484 PMCID: PMC5088672 DOI: 10.1186/s13321-016-0173-z] [Citation(s) in RCA: 17] [Impact Index Per Article: 2.1] [Received: 05/19/2016] [Accepted: 10/18/2016] [Indexed: 11/10/2022] [Open access]
Abstract
BACKGROUND Even though circular fingerprints were first introduced more than 50 years ago, they are still widely used for building highly predictive, state-of-the-art (Q)SAR models. Historically, these structural fragments were designed to search large molecular databases. Hence, to derive a compact representation, circular fingerprint fragments are often folded to comparatively short bit-strings. However, folding fingerprints introduces bit collisions, and therefore adds noise to the encoded structural information and removes its interpretability. Both representations, folded as well as unprocessed fingerprints, are often used for (Q)SAR modeling. RESULTS We show that it can be preferable to build (Q)SAR models with circular fingerprint fragments that have been filtered by supervised feature selection, instead of applying folded or all fragments. Compared to folded fingerprints, filtered fingerprints significantly increase predictive performance and remain unambiguous and interpretable. Compared to unprocessed fingerprints, filtered fingerprints reduce the computational effort and are a more compact and less redundant feature representation. Depending on the selected learning algorithm, filtering yields about equally predictive (Q)SAR models. We demonstrate the suitability of filtered fingerprints for (Q)SAR modeling by presenting our freely available web service Collision-free Filtered Circular Fingerprints that provides rationales for predictions by highlighting important structural features in the query compound (see http://coffer.informatik.uni-mainz.de). CONCLUSIONS Circular fingerprints are potent structural features that yield highly predictive models and encode interpretable structural information. However, to not lose interpretability, circular fingerprints should not be folded when building prediction models. Our experiments show that filtering is a suitable option to reduce the high computational effort when working with all fingerprint fragments. Additionally, our experiments suggest that the area under the precision-recall curve is a more sensible statistic for validating (Q)SAR models for virtual screening than the area under the ROC curve or other measures for early recognition.
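The bit-collision problem mentioned in this abstract can be shown with a few lines of code. This is a minimal sketch that assumes folding is done by taking a fragment's integer hash modulo the bit-string length; the fragment identifiers and the 1024-bit length are hypothetical, not taken from the paper.

```python
def fold(fragment_ids, n_bits):
    """Map each fragment hash to a bit position; return {bit: [fragments at that bit]}."""
    bits = {}
    for frag in fragment_ids:
        bits.setdefault(frag % n_bits, []).append(frag)
    return bits

# Hypothetical integer hashes of five distinct structural fragments.
fragments = [3, 17, 1027, 2051, 4098]
folded = fold(fragments, 1024)

# Bits that encode more than one fragment can no longer be traced back to a single one.
collisions = {bit: frags for bit, frags in folded.items() if len(frags) > 1}
```

Here three different fragments land on the same bit, so a model that finds that bit important cannot say which fragment is responsible. Keeping unfolded fragments, or only those retained by feature selection, avoids that ambiguity.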
Affiliation(s)
- Martin Gütlein
  - Chair of Data Mining, Institute of Computer Science, Johannes Gutenberg - Universität Mainz, Staudingerweg 9, 55128 Mainz, Germany
- Stefan Kramer
  - Chair of Data Mining, Institute of Computer Science, Johannes Gutenberg - Universität Mainz, Staudingerweg 9, 55128 Mainz, Germany
12
Manganelli S, Benfenati E, Manganaro A, Kulkarni S, Barton-Maclaren TS, Honma M. New Quantitative Structure-Activity Relationship Models Improve Predictability of Ames Mutagenicity for Aromatic Azo Compounds. Toxicol Sci 2016; 153:316-26. [PMID: 27413112 DOI: 10.1093/toxsci/kfw125] [Citation(s) in RCA: 19] [Impact Index Per Article: 2.4] [Indexed: 11/14/2022] [Open access]
Abstract
Existing Quantitative Structure-Activity Relationship (QSAR) models have limited predictive capabilities for aromatic azo compounds. In this study, 2 new models were built to predict Ames mutagenicity of this class of compounds. The first one made use of descriptors based on simplified molecular input-line entry system (SMILES), calculated with the CORAL software. The second model was based on the k-nearest neighbors algorithm. The statistical quality of the predictions from single models was satisfactory. The performance further improved when the predictions from these models were combined. The prediction results from other QSAR models for mutagenicity were also evaluated. Most of the existing models were found to be good at finding toxic compounds but resulted in many false positive predictions. The 2 new models specific for this class of compounds avoid this problem thanks to a larger set of related compounds as training set and improved algorithms.
Affiliation(s)
- Serena Manganelli
  - Department of Environmental Health Sciences, Laboratory of Environmental Chemistry and Toxicology, IRCCS-Istituto di Ricerche Farmacologiche Mario Negri, via La Masa 19, Milano 20156, Italy
- Emilio Benfenati
  - Department of Environmental Health Sciences, Laboratory of Environmental Chemistry and Toxicology, IRCCS-Istituto di Ricerche Farmacologiche Mario Negri, via La Masa 19, Milano 20156, Italy
- Alberto Manganaro
  - Department of Environmental Health Sciences, Laboratory of Environmental Chemistry and Toxicology, IRCCS-Istituto di Ricerche Farmacologiche Mario Negri, via La Masa 19, Milano 20156, Italy
- Sunil Kulkarni
  - Existing Substances Risk Assessment Bureau, Health Canada, Ottawa, Ontario, Canada
- Masamitsu Honma
  - Division of Genetics & Mutagenesis, National Institute of Health Sciences, 1-18-1 Kamiyoga, Setagaya-Ku, Tokyo 158-8501, Japan
13
Schyman P, Liu R, Wallqvist A. General Purpose 2D and 3D Similarity Approach to Identify hERG Blockers. J Chem Inf Model 2016; 56:213-22. [DOI: 10.1021/acs.jcim.5b00616] [Citation(s) in RCA: 14] [Impact Index Per Article: 1.8] [Indexed: 01/05/2023]
Affiliation(s)
- Patric Schyman
  - DoD Biotechnology High Performance Computing Software Applications Institute, Telemedicine and Advanced Technology Research Center, U.S. Army Medical Research and Materiel Command, 2405 Whittier Drive, Frederick, Maryland 21702, United States
- Ruifeng Liu
  - DoD Biotechnology High Performance Computing Software Applications Institute, Telemedicine and Advanced Technology Research Center, U.S. Army Medical Research and Materiel Command, 2405 Whittier Drive, Frederick, Maryland 21702, United States
- Anders Wallqvist
  - DoD Biotechnology High Performance Computing Software Applications Institute, Telemedicine and Advanced Technology Research Center, U.S. Army Medical Research and Materiel Command, 2405 Whittier Drive, Frederick, Maryland 21702, United States
14
Muegge I, Mukherjee P. An overview of molecular fingerprint similarity search in virtual screening. Expert Opin Drug Discov 2015; 11:137-48. [PMID: 26558489 DOI: 10.1517/17460441.2016.1117070] [Citation(s) in RCA: 119] [Impact Index Per Article: 13.2] [Indexed: 12/20/2022]
Abstract
INTRODUCTION A central premise of medicinal chemistry is that structurally similar molecules exhibit similar biological activities. Molecular fingerprints encode properties of small molecules and assess their similarities computationally through bit string comparisons. Based on the similarity to a biologically active template, molecular fingerprint methods allow for identifying additional compounds with a higher chance of displaying similar biological activities against the same target - a process commonly referred to as virtual screening (VS). AREAS COVERED This article focuses on fingerprint similarity searches in the context of compound selection for enhancing hit sets, comparing compound decks, and VS. In addition, the authors discuss the application of fingerprints in predictive modeling. EXPERT OPINION Fingerprint similarity search methods are especially useful in VS if only a few unrelated ligands are known for a given target and therefore more complex and information rich methods such as pharmacophore searches or structure-based design are not applicable. In addition, fingerprint methods are used in characterizing properties of compound collections such as chemical diversity, density in chemical space, and content of biologically active molecules (biodiversity). Such assessments are important for deciding what compounds to experimentally screen, to purchase, or to assemble in a virtual compound deck for in silico screening or de novo design.
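The bit-string comparison behind a fingerprint similarity search can be sketched briefly. The example below assumes fingerprints are represented as sets of on-bit indices and uses the Tanimoto coefficient, a common choice for this task; the compound names and bit patterns are hypothetical.

```python
def tanimoto(a, b):
    """Tanimoto coefficient: shared on-bits divided by all on-bits in either fingerprint."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0

def rank_by_similarity(template, library):
    """Return (name, similarity) pairs sorted from most to least similar to the template."""
    scored = [(name, tanimoto(template, fp)) for name, fp in library.items()]
    return sorted(scored, key=lambda item: item[1], reverse=True)

# Hypothetical active template and a small screening library.
template = {1, 4, 9, 16, 25}
library = {
    "cpd_A": {1, 4, 9, 16, 36},
    "cpd_B": {2, 3, 5},
    "cpd_C": {1, 4, 9, 16, 25},
}
ranked = rank_by_similarity(template, library)
```

In a virtual screen the top of such a ranking would be the compounds selected for testing, on the premise that they are more likely to share the template's activity.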
Affiliation(s)
- Ingo Muegge
  - Boehringer Ingelheim Pharmaceuticals, Department of Small Molecule Discovery Research, Ridgefield, CT, USA
- Prasenjit Mukherjee
  - Boehringer Ingelheim Pharmaceuticals, Department of Small Molecule Discovery Research, Ridgefield, CT, USA
15
Liu R, Schyman P, Wallqvist A. Critically Assessing the Predictive Power of QSAR Models for Human Liver Microsomal Stability. J Chem Inf Model 2015; 55:1566-75. [PMID: 26170251 DOI: 10.1021/acs.jcim.5b00255] [Citation(s) in RCA: 23] [Impact Index Per Article: 2.6] [Indexed: 11/28/2022]
Abstract
To lower the possibility of late-stage failures in the drug development process, an up-front assessment of absorption, distribution, metabolism, elimination, and toxicity is commonly implemented through a battery of in silico and in vitro assays. As in vitro data are accumulated, in silico quantitative structure-activity relationship (QSAR) models can be trained and used to assess compounds even before they are synthesized. Even though it is generally recognized that QSAR model performance deteriorates over time, rigorous independent studies of model performance deterioration are typically hindered by the lack of publicly available large data sets of structurally diverse compounds. Here, we investigated predictive properties of QSAR models derived from an assembly of publicly available human liver microsomal (HLM) stability data using variable nearest neighbor (v-NN) and random forest (RF) methods. In particular, we evaluated the degree of time-dependent model performance deterioration. Our results show that when evaluated by 10-fold cross-validation with all available HLM data randomly distributed among 10 equal-sized validation groups, we achieved high-quality model performance from both machine-learning methods. However, when we developed HLM models based on when the data appeared and tried to predict data published later, we found that neither method produced predictive models and that their applicability was dramatically reduced. On the other hand, when a small percentage of randomly selected compounds from data published later were included in the training set, performance of both machine-learning methods improved significantly. The implication is that (1) QSAR model quality should be analyzed in a time-dependent manner to assess true predictive power and (2) it is imperative to retrain models with any up-to-date experimental data to ensure maximum applicability.
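The difference between the two evaluation schemes contrasted in this abstract can be sketched as two ways of splitting the same records. The record layout, field names, cutoff year and split fraction below are hypothetical and serve only to illustrate the idea.

```python
import random

def time_split(records, cutoff_year):
    """Train on everything published up to cutoff_year, test on everything later."""
    train = [r for r in records if r["year"] <= cutoff_year]
    test = [r for r in records if r["year"] > cutoff_year]
    return train, test

def random_split(records, test_fraction, seed=0):
    """Shuffle with a fixed seed and hold out a fraction, ignoring publication date."""
    shuffled = records[:]
    random.Random(seed).shuffle(shuffled)
    n_test = round(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]

# Hypothetical measurements tagged with their publication year.
records = [
    {"id": "c1", "year": 2008}, {"id": "c2", "year": 2009},
    {"id": "c3", "year": 2011}, {"id": "c4", "year": 2012},
    {"id": "c5", "year": 2013}, {"id": "c6", "year": 2014},
]
train_t, test_t = time_split(records, cutoff_year=2012)
train_r, test_r = random_split(records, test_fraction=1 / 3)
```

A random split lets later chemistry leak into training, which tends to flatter the model; the time split mimics prospective use, where test compounds always postdate the training data.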
Affiliation(s)
- Ruifeng Liu
- DoD Biotechnology High Performance Computing Software Applications Institute, Telemedicine and Advanced Technology Research Center, U.S. Army Medical Research and Materiel Command, MCMR-TT, 504 Scott Street, Fort Detrick, Maryland 21702-5012, United States
- Patric Schyman
- DoD Biotechnology High Performance Computing Software Applications Institute, Telemedicine and Advanced Technology Research Center, U.S. Army Medical Research and Materiel Command, MCMR-TT, 504 Scott Street, Fort Detrick, Maryland 21702-5012, United States
- Anders Wallqvist
- DoD Biotechnology High Performance Computing Software Applications Institute, Telemedicine and Advanced Technology Research Center, U.S. Army Medical Research and Materiel Command, MCMR-TT, 504 Scott Street, Fort Detrick, Maryland 21702-5012, United States
16
Sheridan RP. The Relative Importance of Domain Applicability Metrics for Estimating Prediction Errors in QSAR Varies with Training Set Diversity. J Chem Inf Model 2015; 55:1098-107. [DOI: 10.1021/acs.jcim.5b00110] [Citation(s) in RCA: 41] [Impact Index Per Article: 4.6] [Indexed: 11/29/2022]
Affiliation(s)
- Robert P. Sheridan
- Cheminformatics Department, RY800B-305, Merck Research Laboratories, Rahway, New Jersey 07065, United States
17
An evaluation of in-house and off-the-shelf in silico models: implications on guidance for mutagenicity assessment. Regul Toxicol Pharmacol 2015; 71:388-97. [PMID: 25656493 DOI: 10.1016/j.yrtph.2015.01.010] [Citation(s) in RCA: 35] [Impact Index Per Article: 3.9] [Received: 11/12/2014] [Revised: 01/20/2015] [Accepted: 01/21/2015] [Indexed: 11/22/2022]
Abstract
The evaluation of impurities for genotoxicity using in silico models is commonplace and has become accepted by regulatory agencies. Recently, the ICH M7 Step 4 guidance was published and requires two complementary models for genotoxicity assessments. Over the last ten years, many companies have developed their own internal genotoxicity models built using both public and in-house chemical structures and bacterial mutagenicity data. However, the proprietary nature of internal structures prevents sharing of data and the full OECD compliance of such models. This analysis investigated whether using in-house internal compounds for training models is needed and substantially impacts the results of in silico genotoxicity assessments, or whether using commercial-off-the-shelf (COTS) packages such as Derek Nexus or Leadscope provides adequate performance. We demonstrated that supplementation of COTS packages with a Support Vector Machine (SVM) QSAR model trained on combined in-house and public data does, in fact, improve coverage and accuracy, and reduces the number of compounds needing experimental assessment, i.e., the liability load. This result indicates that there is added value in models trained on both internal and public structures and that incorporating such models as part of a consensus approach improves the overall evaluation. Lastly, we optimized an in silico consensus decision-making approach utilizing two COTS models and an internal (SVM) model to minimize false negatives.
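The abstract does not state the optimized consensus rule, so the following is only one plausible sketch of a three-model consensus that favors few false negatives: a compound is called positive if any model that makes a prediction flags it. The "any positive" rule, the model keys (`cots_a`, `cots_b`, `internal_svm`), and the use of `None` for an out-of-domain result are assumptions made for illustration.

```python
def consensus_call(predictions):
    """Combine per-model calls into one call.

    predictions maps a model name to True (mutagenic), False (non-mutagenic),
    or None (no prediction, e.g. the compound is outside the model's domain).
    Assumed rule: any positive vote makes the consensus positive.
    """
    calls = [p for p in predictions.values() if p is not None]
    if not calls:
        return "no call"
    return "positive" if any(calls) else "negative"


def false_negative_rate(calls, truths):
    """Fraction of truly mutagenic compounds that were called negative."""
    positives = [c for c, t in zip(calls, truths) if t]
    if not positives:
        return 0.0
    return sum(c == "negative" for c in positives) / len(positives)


# Hypothetical predictions for three compounds (illustrative data only).
compounds = [
    {"cots_a": False, "cots_b": False, "internal_svm": True},
    {"cots_a": False, "cots_b": None, "internal_svm": False},
    {"cots_a": None, "cots_b": None, "internal_svm": None},
]
calls = [consensus_call(p) for p in compounds]
```

Under this rule a "no call" compound would still need experimental assessment, so coverage and false-negative rate have to be read together.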
18
Ovchinnikova SI, Bykov AA, Tsivadze AY, Dyachkov EP, Kireeva NV. Supervised extensions of chemography approaches: case studies of chemical liabilities assessment. J Cheminform 2014; 6:20. [PMID: 24868246 PMCID: PMC4018504 DOI: 10.1186/1758-2946-6-20] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.3] [Received: 11/25/2013] [Accepted: 04/28/2014] [Indexed: 12/04/2022] Open
Abstract
Chemical liabilities, such as adverse effects and toxicity, play a significant role in the modern drug discovery process. In silico assessment of chemical liabilities is an important step aimed at reducing costs and animal testing by complementing or replacing in vitro and in vivo experiments. Herein, we propose an approach combining several classification and chemography methods to predict chemical liabilities and to interpret the results in terms of how structural changes in compounds affect their pharmacological profile. To our knowledge, the supervised extension of Generative Topographic Mapping is proposed here for the first time as an effective new chemography method. A new approach for mapping new data using supervised Isomap without rebuilding models from scratch is also proposed. Two approaches for estimating a model's applicability domain are used in our study, to our knowledge for the first time in chemoinformatics. The structural alerts responsible for the negative characteristics of the pharmacological profile of chemical compounds were identified through model interpretation.
Affiliation(s)
- Svetlana I Ovchinnikova
- Frumkin Institute of Physical Chemistry and Electrochemistry RAS, Leninsky pr-t 31-4, 119071 Moscow, Russia
- Moscow Institute of Physics and Technology, Institutsky per., 9, 141700 Dolgoprudny, Russia
- Arseniy A Bykov
- Frumkin Institute of Physical Chemistry and Electrochemistry RAS, Leninsky pr-t 31-4, 119071 Moscow, Russia
- Moscow Institute of Physics and Technology, Institutsky per., 9, 141700 Dolgoprudny, Russia
- Aslan Yu Tsivadze
- Frumkin Institute of Physical Chemistry and Electrochemistry RAS, Leninsky pr-t 31-4, 119071 Moscow, Russia
- Evgeny P Dyachkov
- Kurnakov Institute of General and Inorganic Chemistry RAS, Leninsky pr-t 31, 119071 Moscow, Russia
- Natalia V Kireeva
- Frumkin Institute of Physical Chemistry and Electrochemistry RAS, Leninsky pr-t 31-4, 119071 Moscow, Russia
- Moscow Institute of Physics and Technology, Institutsky per., 9, 141700 Dolgoprudny, Russia