1
Shah SK, Chaple DD, Masand VH, Jawarkar RD, Chaudhari S, Abiramasundari A, Zaki MEA, Al-Hussain SA. Multi-Target In-Silico modeling strategies to discover novel angiotensin converting enzyme and neprilysin dual inhibitors. Sci Rep 2024; 14:15991. [PMID: 38987327 PMCID: PMC11237057 DOI: 10.1038/s41598-024-66230-7] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/04/2024] [Accepted: 06/28/2024] [Indexed: 07/12/2024]
Abstract
Cardiovascular diseases, including heart failure, stroke, and hypertension, affect 608 million people worldwide and cause 32% of deaths. Combination therapy, involving concurrent Renin-Angiotensin-Aldosterone System (RAAS) and Neprilysin inhibition, is required in 60% of patients. This study introduces a novel multi-target in-silico modeling technique (mt-QSAR) to evaluate inhibitory potential against Neprilysin and Angiotensin-converting enzymes. Using both linear (GA-LDA) and non-linear (RF) algorithms, mt-QSAR classification models were developed from 983 chemicals to predict inhibitory effects on Neprilysin and Angiotensin-converting enzymes. The Box-Jenkins method, feature selection methods, and machine learning algorithms were employed to obtain the most predictive model, with ~90% overall accuracy. Additionally, the study applied the developed mt-QSAR models and molecular docking to the virtual screening of designed scaffolds (chalcone and its analogues, 1,3-thiazole, 1,3,4-thiadiazole). The identified virtual hits underwent successive filtration steps, incorporating assessments of drug-likeness, ADMET profiles, and synthetic accessibility tools. Finally, molecular dynamics simulations were used to identify and rank the most favourable compounds. The data acquired from this study may provide crucial direction for the identification of new multi-targeted cardiovascular inhibitors.
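The abstract names the Box-Jenkins method without giving its formula. In mt-QSAR work of this kind it is commonly applied as a moving-average operator that turns each raw descriptor into a deviation from the mean of that descriptor under the same experimental condition (here, the target): Δ(D_i)c_j = D_i − avg(D_i)c_j. The sketch below assumes the variant in which the average is taken over compounds annotated as active under condition c_j; the record layout and the function name `box_jenkins_deviations` are hypothetical, not taken from the paper.

```python
from collections import defaultdict
from statistics import mean


def box_jenkins_deviations(records):
    """Return deviation descriptors Delta(D_i)c_j = D_i - avg(D_i)c_j.

    Each record is a dict with hypothetical keys:
      'descriptors': {descriptor name: value} (same names in every record)
      'condition':   experimental condition c_j (e.g. target 'ACE' or 'NEP')
      'active':      True if annotated as active under that condition
    Assumption: avg(D_i)c_j is the mean over ACTIVE records sharing c_j.
    """
    # Gather descriptor values of active compounds, grouped by condition.
    values = defaultdict(lambda: defaultdict(list))
    for rec in records:
        if rec["active"]:
            for name, value in rec["descriptors"].items():
                values[rec["condition"]][name].append(value)

    # Condition-specific average of each descriptor.
    averages = {
        cond: {name: mean(vals) for name, vals in by_name.items()}
        for cond, by_name in values.items()
    }

    deviations = []
    for rec in records:
        cond = rec["condition"]
        if cond not in averages:
            raise ValueError(f"no active compounds recorded for condition {cond!r}")
        avg = averages[cond]
        # Subtract the condition-specific mean from every raw descriptor.
        deviations.append(
            {name: value - avg[name] for name, value in rec["descriptors"].items()}
        )
    return deviations
```

As written, the averages come from whatever records are passed in; in a real workflow one would presumably compute them on the training set only and reuse them for test compounds, so that no test information leaks into the descriptors.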
Affiliation(s)
- Sapan K Shah
- Department of Pharmaceutical Chemistry, Priyadarshini J. L. College of Pharmacy, Hingna Road, Nagpur, 440016, Maharashtra, India
- Dinesh D Chaple
- Department of Pharmaceutical Chemistry, Priyadarshini J. L. College of Pharmacy, Hingna Road, Nagpur, 440016, Maharashtra, India
- Vijay H Masand
- Department of Chemistry, Vidya Bharati Mahavidyalaya, Amravati, 444602, Maharashtra, India
- Rahul D Jawarkar
- Department of Medicinal Chemistry and Drug Discovery, Dr. Rajendra Gode Institute of Pharmacy, University Mardi Road, Amravati, 444603, India
- Somdatta Chaudhari
- Department of Pharmaceutical Chemistry, Modern College of Pharmacy, Nigdi, Pune, India
- Magdi E A Zaki
- Department of Chemistry, College of Science, Imam Mohammad Ibn Saud Islamic University, Riyadh, 11623, Saudi Arabia
- Sami A Al-Hussain
- Department of Chemistry, College of Science, Imam Mohammad Ibn Saud Islamic University, Riyadh, 11623, Saudi Arabia
2
Ancajas CMF, Oyedele AS, Butt CM, Walker AS. Advances, opportunities, and challenges in methods for interrogating the structure activity relationships of natural products. Nat Prod Rep 2024. [PMID: 38912779 DOI: 10.1039/d4np00009a] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/25/2024]
Abstract
Time span in literature: 1985 to early 2024. Natural products play a key role in drug discovery, both as a direct source of drugs and as a starting point for the development of synthetic compounds. Most natural products are not suitable for use as drugs without further modification due to insufficient activity or poor pharmacokinetic properties. Choosing what modifications to make requires an understanding of the compound's structure-activity relationships. Use of structure-activity relationships is commonplace and essential in medicinal chemistry campaigns applied to human-designed synthetic compounds. Structure-activity relationships have also been used to improve the properties of natural products, but several challenges still limit these efforts. Here, we review methods for studying the structure-activity relationships of natural products and their limitations. Specifically, we will discuss how synthesis, including total synthesis, late-stage derivatization, chemoenzymatic synthetic pathways, and engineering and genome mining of biosynthetic pathways can be used to produce natural product analogs and discuss the challenges of each of these approaches. Finally, we will discuss computational methods including machine learning methods for analyzing the relationship between biosynthetic genes and product activity, computer aided drug design techniques, and interpretable artificial intelligence approaches towards elucidating structure-activity relationships from models trained to predict bioactivity from chemical structure. Our focus will be on these latter topics as their applications for natural products have not been extensively reviewed. We suggest that these methods are all complementary to each other, and that only collaborative efforts using a combination of these techniques will result in a full understanding of the structure-activity relationships of natural products.
Affiliation(s)
- Caitlin M Butt
- Department of Chemistry, Vanderbilt University, Nashville, TN, USA
- Allison S Walker
- Department of Chemistry, Vanderbilt University, Nashville, TN, USA
- Department of Biological Sciences, Vanderbilt University, Nashville, TN, USA
- Department of Pathology, Microbiology, and Immunology, Vanderbilt University Medical Center, Nashville, TN, USA
3
Shirasawa R, Takaki K, Miyao T. Generalizability Improvement of Interpretable Symbolic Regression Models for Quantitative Structure-Activity Relationships. ACS Omega 2024; 9:9463-9474. [PMID: 38434845 PMCID: PMC10905595 DOI: 10.1021/acsomega.3c09047] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/13/2023] [Revised: 01/20/2024] [Accepted: 01/26/2024] [Indexed: 03/05/2024]
Abstract
In the pursuit of optimal quantitative structure-activity relationship (QSAR) models, two key factors are paramount: the robustness of predictive ability and the interpretability of the model. Symbolic regression (SR) searches for mathematical expressions that explain a training data set; thus, the models provided by SR are globally interpretable. We previously proposed an SR method that generates human-interpretable expressions. This study introduces an enhanced symbolic regression method, termed filter-induced genetic programming 2 (FIGP2), as an extension of our previously proposed SR method. FIGP2 is designed to improve the generalizability of SR models and to be applicable to data sets in which cost-intensive descriptors are employed. The FIGP2 method incorporates two major improvements: a modified domain filter to eradicate diverging expressions based on optimal calculation, and a stability metric to penalize expressions that would lead to overfitting. Our retrospective comparative analysis using 12 structure-activity relationship data sets revealed that FIGP2 surpassed the previously proposed SR method and conventional modeling methods, such as support vector regression and multivariate linear regression, in terms of predictive performance. The mathematical expressions generated by FIGP2 were relatively simple and did not diverge within the domain of the function. Taken together, FIGP2 can be used to build interpretable regression models with predictive ability.
Affiliation(s)
- Raku Shirasawa
- Graduate School of Science and Technology, Nara Institute of Science and Technology, 8916-5 Takayama-cho, Ikoma, Nara 630-0192, Japan
- Advanced Research Laboratory, Technology Infrastructure Center, Technology Platform, Sony Group Corporation, Atsugi Tec., 4-14-1 Asahi-cho, Atsugi-shi, Kanagawa 243-0014, Japan
- Katsushi Takaki
- Graduate School of Science and Technology, Nara Institute of Science and Technology, 8916-5 Takayama-cho, Ikoma, Nara 630-0192, Japan
- Tomoyuki Miyao
- Graduate School of Science and Technology, Nara Institute of Science and Technology, 8916-5 Takayama-cho, Ikoma, Nara 630-0192, Japan
- Data Science Center, Nara Institute of Science and Technology, 8916-5 Takayama-cho, Ikoma, Nara 630-0192, Japan
4
Lei L, Zhang L, Han Z, Chen Q, Liao P, Wu D, Tai J, Xie B, Su Y. Advancing chronic toxicity risk assessment in freshwater ecology by molecular characterization-based machine learning. Environ Pollut 2024; 342:123093. [PMID: 38072027 DOI: 10.1016/j.envpol.2023.123093] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/04/2023] [Revised: 11/30/2023] [Accepted: 12/02/2023] [Indexed: 01/26/2024]
Abstract
The continuously increasing production of various chemicals and their release into the environment have raised concerns about potential negative effects on ecological health. However, traditional labor-intensive assessment methods cannot evaluate these hazards effectively and rapidly, especially chronic risk. In this study, machine learning (ML) was employed to construct quantitative structure-activity relationship (QSAR) models that predict chronic toxicity to aquatic organisms from the molecular characteristics of pollutants, namely molecular descriptors, fingerprints, and graphs. The limited dataset size prevented the graph attention network (GAT) model built on molecular graphs from showing notable advantages. Considering computational efficiency and performance (R² = 0.78; RMSE = 0.77), XGBoost (XGB) with molecular descriptors was selected to build reliable QSAR-ML models for predicting chronic toxicity from small- or medium-sized tabular data. Further kernel density estimation analysis confirmed the high accuracy of the model for pollutant concentrations ranging from 10⁻³ to 10² mg/L, a range that covers most environmental scenarios. Model interpretation identified SlogP and exposure duration as the primary influential factors. SlogP, representing the distribution coefficient of a molecule between lipophilic and hydrophilic environments, had a negative effect on the toxicity outcomes. Additionally, the exposure duration played a crucial role in determining chronic toxicity. Finally, chronic toxicity data for bisphenol A validated the robustness and reliability of the model established in this research. Our study provides a robust and feasible methodology for chronic ecological risk evaluation of various types of pollutants and could facilitate and increase the use of ML applications in environmental fields.
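The R² and RMSE values quoted above are standard regression metrics. The following minimal, stdlib-only sketch shows their usual definitions; it is not the authors' evaluation code, the toy numbers are invented for illustration, and RMSE is expressed in the units of the modelled toxicity endpoint, which the abstract does not state.

```python
from math import sqrt


def rmse(y_true, y_pred):
    """Root-mean-square error between observed and predicted values."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    return sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    mean_obs = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))  # residual sum of squares
    ss_tot = sum((t - mean_obs) ** 2 for t in y_true)           # total sum of squares
    return 1.0 - ss_res / ss_tot


observed = [1.0, 2.0, 3.0, 4.0]
predicted = [1.0, 2.0, 3.0, 5.0]
print(r2_score(observed, predicted))  # 0.8
print(rmse(observed, predicted))      # 0.5
```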
Affiliation(s)
- Lang Lei
- Shanghai Engineering Research Center of Biotransformation of Organic Solid Waste, School of Ecological and Environmental Sciences, East China Normal University, Shanghai, 200241, China
- Liangmao Zhang
- Shanghai Engineering Research Center of Biotransformation of Organic Solid Waste, School of Ecological and Environmental Sciences, East China Normal University, Shanghai, 200241, China
- Zhibang Han
- Shanghai Engineering Research Center of Biotransformation of Organic Solid Waste, School of Ecological and Environmental Sciences, East China Normal University, Shanghai, 200241, China
- Qirui Chen
- Shanghai Engineering Research Center of Biotransformation of Organic Solid Waste, School of Ecological and Environmental Sciences, East China Normal University, Shanghai, 200241, China
- Pengcheng Liao
- Shanghai Engineering Research Center of Biotransformation of Organic Solid Waste, School of Ecological and Environmental Sciences, East China Normal University, Shanghai, 200241, China
- Dong Wu
- Shanghai Engineering Research Center of Biotransformation of Organic Solid Waste, School of Ecological and Environmental Sciences, East China Normal University, Shanghai, 200241, China; Chongqing Key Laboratory of Precision Optics, Chongqing Institute of East China Normal University, Chongqing, 401120, China; Shanghai Institute of Pollution Control and Ecological Security, Shanghai, 200092, China
- Jun Tai
- Shanghai Environmental Sanitation Engineering Design Institute Co., Ltd., Shanghai, 200232, China
- Bing Xie
- Shanghai Engineering Research Center of Biotransformation of Organic Solid Waste, School of Ecological and Environmental Sciences, East China Normal University, Shanghai, 200241, China; Shanghai Institute of Pollution Control and Ecological Security, Shanghai, 200092, China
- Yinglong Su
- Shanghai Engineering Research Center of Biotransformation of Organic Solid Waste, School of Ecological and Environmental Sciences, East China Normal University, Shanghai, 200241, China; Chongqing Key Laboratory of Precision Optics, Chongqing Institute of East China Normal University, Chongqing, 401120, China; Shanghai Institute of Pollution Control and Ecological Security, Shanghai, 200092, China
5
Kotli M, Piir G, Maran U. Pesticide effect on earthworm lethality via interpretable machine learning. J Hazard Mater 2024; 461:132577. [PMID: 37793249 DOI: 10.1016/j.jhazmat.2023.132577] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/19/2023] [Revised: 09/15/2023] [Accepted: 09/16/2023] [Indexed: 10/06/2023]
Abstract
Earthworms are among the most important animals (invertebrates) for soil health. Many chemical substances released into nature for agricultural development, such as pesticides, may have unwanted effects on these organisms. It is essential first to understand the extent of the impact of chemicals on soil health and then to make proper decisions for regulatory or commercial purposes. We hypothesize that there is an expressible quantitative structure-activity relationship (QSAR) between the structure of pesticide compounds and their acute toxicity to the earthworm species Eisenia fetida. Describing this relationship allows for a better assessment of the impact of chemicals on this earthworm. To describe it, a dataset of chemicals was collected from open-access sources to develop a mathematical model. A novel approach, combining a genetic algorithm and Bayesian optimization, was used to select structural features for the model and to optimize model parameters. The final QSAR classification model was created with the Random Forest algorithm and exhibited good prediction accuracy: 0.78 on the training set and 0.80 on the test set. The model representation follows the FAIR principles and is available on QsarDB.org.
Affiliation(s)
- Mihkel Kotli
- University of Tartu, Institute of Chemistry, Tartu, Estonia
- Geven Piir
- University of Tartu, Institute of Chemistry, Tartu, Estonia
- Uko Maran
- University of Tartu, Institute of Chemistry, Tartu, Estonia
6
Takkar P, Singh B, Pani B, Kumar R. Design, synthesis and in silico evaluation of newer 1,4-dihydropyridine based amlodipine bio-isosteres as promising antihypertensive agents. RSC Adv 2023; 13:34239-34248. [PMID: 38020040 PMCID: PMC10664005 DOI: 10.1039/d3ra06387a] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/19/2023] [Accepted: 11/13/2023] [Indexed: 12/01/2023]
Abstract
Hypertension remains a major global health concern, prompting ongoing research into innovative therapeutic approaches. This research encompasses the strategic design, synthesis, and computational assessment of a novel series of 1,4-dihydropyridine-based scaffolds, with the objective of developing promising antihypertensive agents as viable alternatives to well-established dihydropyridine-based drugs such as amlodipine, felodipine, and nicardipine. The crystal structure of the lead compound, determined by X-ray crystallography, offers crucial insights into its 3D conformation and intermolecular interactions. In silico molecular docking experiments against the calcium channel responsible for blood pressure regulation revealed better docking scores for all bioisosteres P1-P14 than for the standard, amlodipine, indicating their potential for enhanced therapeutic efficacy. Extensive ADMET profiling and structure-activity relationship (SAR) analysis elucidated favourable pharmacokinetic properties and essential structural modifications influencing antihypertensive effectiveness. Specifically, hybrids P6-P10, P12 and P14 complied with Lipinski's rules and exhibited drug-likeness attributes, including high GI absorption and no blood-brain barrier (BBB) permeation. In particular, P7, which is crystalline, showed the highest binding affinity for the calcium channel together with an excellent ADMET profile. The findings highlight the significance of the triazole-tethered aryl/heteroaryl ring in the synthesized hybrids, providing a foundation for further preclinical and clinical translation as antihypertensive medications.
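Compliance with Lipinski's rules, as reported for P6-P10, P12 and P14, is usually checked against the rule-of-five thresholds: molecular weight ≤ 500 Da, logP ≤ 5, at most 5 hydrogen-bond donors and at most 10 acceptors, with a compound still accepted if it breaks no more than one rule (Lipinski's original formulation). The abstract does not say which variant the authors applied. This sketch assumes the properties have already been computed elsewhere, and the class and function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class MolecularProperties:
    """Precomputed properties; how they are calculated is outside this sketch."""
    mol_weight: float  # Da
    logp: float        # octanol/water partition coefficient (log scale)
    h_donors: int      # hydrogen-bond donors
    h_acceptors: int   # hydrogen-bond acceptors


def lipinski_violations(props: MolecularProperties) -> int:
    """Count how many rule-of-five thresholds are exceeded."""
    checks = [
        props.mol_weight > 500,
        props.logp > 5,
        props.h_donors > 5,
        props.h_acceptors > 10,
    ]
    return sum(checks)  # True counts as 1


def passes_rule_of_five(props: MolecularProperties, max_violations: int = 1) -> bool:
    """Accept compounds breaking at most `max_violations` rules (1 in the original rule)."""
    return lipinski_violations(props) <= max_violations


# Hypothetical property values, not data from the paper.
print(passes_rule_of_five(MolecularProperties(450.0, 3.2, 2, 6)))  # True
```

Setting `max_violations=0` gives the stricter reading sometimes used in screening pipelines.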
Affiliation(s)
- Priya Takkar
- Bio-Organic Laboratory, Department of Chemistry, University of Delhi, Delhi 110007, India
- Bholey Singh
- Swami Shraddhanand College, Alipur, University of Delhi, Delhi 110036, India
- Balaram Pani
- Bhaskaracharya College of Applied Sciences, University of Delhi, Dwarka Sector-2, New Delhi 110075, India
- Rakesh Kumar
- Bio-Organic Laboratory, Department of Chemistry, University of Delhi, Delhi 110007, India
7
Ida T, Kojima H, Hori Y. Predicting and analyzing organic reaction pathways by combining machine learning and reaction network approaches. Chem Commun (Camb) 2023; 59:12439-12442. [PMID: 37773321 DOI: 10.1039/d3cc03890d] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/01/2023]
Abstract
A learning model is proposed that predicts both products and reaction pathways by combining machine learning and reaction network approaches. Trained on 50 fundamental organic reactions, the model predicted the products and pathways of 35 test reactions with a top-5 accuracy of 68.6%. The model identified key fragment structures of the intermediates, which could be classified into several basic reaction rules of organic chemistry, such as the Markovnikov rule.
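Top-5 accuracy counts a test reaction as correct when the true product (or pathway) appears among the model's five highest-ranked candidates. The following minimal sketch of the metric uses hypothetical names and assumes each prediction is a list of candidates ordered from most to least likely. It also checks the arithmetic behind the reported figure: with 35 test reactions, 68.6% corresponds to 24 hits.

```python
def top_k_accuracy(ranked_candidates, true_answers, k=5):
    """Fraction of cases whose true answer appears among the first k ranked candidates."""
    if len(ranked_candidates) != len(true_answers) or not true_answers:
        raise ValueError("need one ranking per true answer, and at least one case")
    hits = sum(
        1 for ranking, truth in zip(ranked_candidates, true_answers) if truth in ranking[:k]
    )
    return hits / len(true_answers)


# Arithmetic check on the reported figure: 24 of 35 test reactions
# gives 24 / 35 = 0.6857..., i.e. 68.6% when rounded to one decimal.
print(round(24 / 35 * 100, 1))  # 68.6
```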
Affiliation(s)
- Tomonori Ida
- Division of Material Chemistry, Graduate School of Natural Science and Technology, Kanazawa University, Kanazawa 920-1192, Japan
- Honoka Kojima
- Division of Material Chemistry, Graduate School of Natural Science and Technology, Kanazawa University, Kanazawa 920-1192, Japan
- Yuta Hori
- Center for Computational Sciences, University of Tsukuba, Tsukuba 305-8577, Japan
8
Tamang JSD, Banerjee S, Baidya SK, Ghosh B, Adhikari N, Jha T. Employing comparative QSAR techniques for the recognition of dibenzofuran and dibenzothiophene derivatives toward MMP-12 inhibition. J Biomol Struct Dyn 2023:1-17. [PMID: 37498149 DOI: 10.1080/07391102.2023.2239923] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/26/2023] [Accepted: 07/17/2023] [Indexed: 07/28/2023]
Abstract
Among the matrix metalloproteinases (MMPs), MMP-12 is a potential target for cancer and other diseases. However, no MMP-12 inhibitor has passed clinical trials to date. Therefore, designing potential MMP-12 inhibitors as new drug molecules can provide effective therapeutic strategies for several diseases. In this study, a series of dibenzofuran and dibenzothiophene derivatives were subjected to different 2D- and 3D-QSAR techniques to identify the structural contributions that most strongly influence MMP-12 inhibitory activity. These techniques identified several structural attributes of these compounds that are responsible for their MMP-12 inhibition. The carboxylic acid group may enhance binding to the catalytic Zn²⁺ ion at the MMP-12 active site. In addition, the i-propyl sulfonamido carboxylic acid function contributed positively toward MMP-12 inhibition. Moreover, the dibenzofuran moiety conferred stable binding at the S1' pocket for higher MMP-12 inhibition. Steric and hydrophobic groups were found to be favourable near the furan ring substituted at the dibenzofuran moiety. Complementing these ligand-based approaches, molecular docking and molecular dynamics (MD) simulation studies not only elucidated the importance of several aspects of these MMP-12 inhibitors but also supported the findings of the QSAR studies and their influence on MMP-12 inhibition. The MD simulation study also revealed stable and compact binding of these compounds at the MMP-12 active site. Therefore, the findings of these validated ligand-based and structure-based molecular modeling studies can aid the development of selective and potent lead molecules for the treatment of MMP-12-associated diseases. Communicated by Ramaswamy H. Sarma.
Affiliation(s)
- Jigme Sangay Dorjay Tamang
- Natural Science Laboratory, Division of Medicinal and Pharmaceutical Chemistry, Department of Pharmaceutical Technology, Jadavpur University, Kolkata, India
- Suvankar Banerjee
- Natural Science Laboratory, Division of Medicinal and Pharmaceutical Chemistry, Department of Pharmaceutical Technology, Jadavpur University, Kolkata, India
- Sandip Kumar Baidya
- Natural Science Laboratory, Division of Medicinal and Pharmaceutical Chemistry, Department of Pharmaceutical Technology, Jadavpur University, Kolkata, India
- Balaram Ghosh
- Epigenetic Research Laboratory, Department of Pharmacy, Birla Institute of Technology and Science-Pilani, Hyderabad Campus, Shamirpet, Hyderabad, India
- Nilanjan Adhikari
- Natural Science Laboratory, Division of Medicinal and Pharmaceutical Chemistry, Department of Pharmaceutical Technology, Jadavpur University, Kolkata, India
- Tarun Jha
- Natural Science Laboratory, Division of Medicinal and Pharmaceutical Chemistry, Department of Pharmaceutical Technology, Jadavpur University, Kolkata, India
9
Akinola LK, Uzairu A, Shallangwa GA, Abechi SE. Development and Validation of Predictive Quantitative Structure-Activity Relationship Models for Estrogenic Activities of Hydroxylated Polychlorinated Biphenyls. Environ Toxicol Chem 2023; 42:823-834. [PMID: 36692119 DOI: 10.1002/etc.5566] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/05/2022] [Revised: 11/17/2022] [Accepted: 01/18/2023] [Indexed: 06/17/2023]
Abstract
Disruption of the endocrine system by hydroxylated polychlorinated biphenyls (OH-PCBs) is hypothesized, among other potential mechanisms, to be mediated via nuclear receptor binding. Due to the high cost and lengthy time required to produce high-quality experimental data, empirical data to support the nuclear receptor binding hypothesis are in short supply. In the present study, two quantitative structure-activity relationship models were developed for predicting the estrogenic activities of OH-PCBs. Findings revealed that model I (for the estrogen receptor α dataset) contained five two-dimensional (2D) descriptors belonging to the classes autocorrelation, Burden modified eigenvalues, chi path, and atom type electrotopological state, whereas model II (for the estrogen receptor β dataset) contained three 2D and three 3D descriptors belonging to the classes autocorrelation, atom type electrotopological state, and radial distribution function. The internal and external validation metrics reported for models I and II indicate that both models are robust, reliable, and suitable for predicting the estrogenic activities of untested OH-PCB congeners. © 2023 SETAC.
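The abstract does not name its internal validation metrics. One that is routinely reported for QSAR models is the leave-one-out cross-validated Q², defined as Q²_LOO = 1 − PRESS / SS_tot, where PRESS is the sum of squared leave-one-out prediction errors. The sketch below assumes this metric and, for brevity, a single-descriptor least-squares model. Models I and II use five and six descriptors, so a multivariate fit would have to replace the hypothetical `fit_line` helper.

```python
def fit_line(xs, ys):
    """Ordinary least squares for one descriptor: returns (slope, intercept)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def q2_loo(xs, ys):
    """Leave-one-out cross-validated Q^2 = 1 - PRESS / SS_tot (xs, ys are lists)."""
    mean_y = sum(ys) / len(ys)
    press = 0.0
    for i in range(len(xs)):
        # Refit without compound i, then predict compound i.
        train_x = xs[:i] + xs[i + 1:]
        train_y = ys[:i] + ys[i + 1:]
        slope, intercept = fit_line(train_x, train_y)
        press += (ys[i] - (slope * xs[i] + intercept)) ** 2
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return 1.0 - press / ss_tot
```

Unlike the fitted R², Q²_LOO only rewards a model for predicting compounds it has not seen, which is why it is commonly reported alongside external-set statistics.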
Affiliation(s)
- Lukman K Akinola
- Department of Chemistry, Ahmadu Bello University, Zaria, Nigeria
- Department of Chemistry, Bauchi State University, Gadau, Nigeria
- Adamu Uzairu
- Department of Chemistry, Ahmadu Bello University, Zaria, Nigeria
- Stephen E Abechi
- Department of Chemistry, Ahmadu Bello University, Zaria, Nigeria
10
Belfield SJ, Cronin MTD, Enoch SJ, Firman JW. Guidance for good practice in the application of machine learning in development of toxicological quantitative structure-activity relationships (QSARs). PLoS One 2023; 18:e0282924. [PMID: 37163504 PMCID: PMC10171609 DOI: 10.1371/journal.pone.0282924] [Citation(s) in RCA: 4] [Impact Index Per Article: 4.0] [Received: 10/11/2022] [Accepted: 02/26/2023] [Indexed: 05/12/2023]
Abstract
Recent years have seen a substantial growth in the adoption of machine learning approaches for the purposes of quantitative structure-activity relationship (QSAR) development. This trend has coincided with a desire to shift the focus of methodology employed within chemical safety assessment: away from traditional reliance upon animal-intensive in vivo protocols, and towards increased application of in silico (or computational) predictive toxicology. With QSAR central amongst the techniques applied in this area, the emergence of algorithms trained through machine learning for toxicity estimation has, quite naturally, followed. On account of the pattern-recognition capabilities of the underlying methods, the statistical power of the ensuing models is potentially considerable, appropriate for handling even vast, heterogeneous datasets. However, such potency comes at a price, manifesting as general practical deficits in the reproducibility, interpretability and generalisability of the resulting tools. Unsurprisingly, these deficits have hindered broader uptake (most notably within a regulatory setting). Areas of uncertainty liable to accompany (and hence detract from the applicability of) toxicological QSAR have previously been highlighted, together with suggestions for "best practice" aimed at mitigating their influence. However, the scope of such exercises has remained limited to "classical" QSAR, that is, QSAR conducted through linear regression and related techniques using comparatively few features or descriptors. Accordingly, the intention of this study has been to extend the remit of best practice guidance, so as to address concerns specific to the employment of machine learning within the field.
In doing so, the impact of strategies aimed at enhancing the transparency (feature importance, feature reduction), generalisability (cross-validation) and predictive power (hyperparameter optimisation) of algorithms, trained upon real toxicity data through six common learning approaches, is evaluated.
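Cross-validation, one of the strategies evaluated above for generalisability, repeatedly holds out part of the data for testing. The following is a minimal sketch of shuffled k-fold splitting; it is not the study's protocol, the choice of k = 5 is an assumption, and the fixed seed reflects the reproducibility concern the abstract raises. The function name is hypothetical.

```python
import random


def k_fold_splits(n_samples, k=5, seed=0):
    """Return k (train_indices, test_indices) pairs; each sample is tested exactly once."""
    if not 2 <= k <= n_samples:
        raise ValueError("k must be between 2 and n_samples")
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)  # fixed seed keeps the split reproducible

    # Spread the remainder so fold sizes differ by at most one.
    base, extra = divmod(n_samples, k)
    splits, start = [], 0
    for fold in range(k):
        size = base + (1 if fold < extra else 0)
        test = indices[start:start + size]
        train = indices[:start] + indices[start + size:]
        splits.append((sorted(train), sorted(test)))
        start += size
    return splits
```

A model is then trained on each `train` set and scored on the matching `test` set, and the k scores are averaged. Hyperparameter optimisation, if any, should happen inside each training fold to avoid optimistic estimates.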
Affiliation(s)
- Samuel J Belfield
- School of Pharmacy and Biomolecular Sciences, Liverpool John Moores University, Liverpool, United Kingdom
- Mark T D Cronin
- School of Pharmacy and Biomolecular Sciences, Liverpool John Moores University, Liverpool, United Kingdom
- Steven J Enoch
- School of Pharmacy and Biomolecular Sciences, Liverpool John Moores University, Liverpool, United Kingdom
- James W Firman
- School of Pharmacy and Biomolecular Sciences, Liverpool John Moores University, Liverpool, United Kingdom
11
Abstract
Machine learning and artificial intelligence approaches have revolutionized multiple disciplines, including toxicology. This review summarizes representative recent applications of machine learning and artificial intelligence approaches in different areas of toxicology, including physiologically based pharmacokinetic (PBPK) modeling, quantitative structure-activity relationship modeling for toxicity prediction, adverse outcome pathway analysis, high-throughput screening, toxicogenomics, big data and toxicological databases. By leveraging machine learning and artificial intelligence approaches, it is now possible to develop PBPK models for hundreds of chemicals efficiently, to create in silico models that predict toxicity for a large number of chemicals with accuracies similar to those of in vivo animal experiments, and to rapidly analyze large amounts of different types of data (toxicogenomics, high-content image data, etc.) to generate new insights into toxicity mechanisms, which was not possible with manual approaches in the past. To continue advancing the field of toxicological sciences, several challenges should be considered: (1) not all machine learning models are equally useful for a particular type of toxicology data, and thus it is important to test different methods to determine the optimal approach; (2) current toxicity prediction focuses mainly on bioactivity classification (yes/no), so additional studies are needed to predict the intensity of effect or dose-response relationships; (3) as more data become available, it is crucial to perform rigorous data quality checks and to develop infrastructure to store, share, analyze, evaluate, and manage big data; and (4) it is important to convert machine learning models into user-friendly interfaces to facilitate their application by both computational and bench scientists.
Affiliation(s)
- Zhoumeng Lin
- Department of Environmental and Global Health, College of Public Health and Health Professions, University of Florida, Gainesville, FL, 32610, USA; Center for Environmental and Human Toxicology, University of Florida, FL, 32608, USA
- Wei-Chun Chou
- Department of Environmental and Global Health, College of Public Health and Health Professions, University of Florida, Gainesville, FL, 32610, USA; Center for Environmental and Human Toxicology, University of Florida, FL, 32608, USA
12
Akinola LK, Uzairu A, Shallangwa GA, Abechi SE. Quantitative structure-activity relationship modeling of hydroxylated polychlorinated biphenyls as constitutive androstane receptor agonists. Struct Chem 2022. [DOI: 10.1007/s11224-022-01992-2] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/18/2022]
13
Kaboudi N, Shayanfar A. Predicting the Drug Clearance Pathway with Structural Descriptors. Eur J Drug Metab Pharmacokinet 2022; 47:363-369. [PMID: 35147854 DOI: 10.1007/s13318-021-00748-3] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Accepted: 12/12/2021] [Indexed: 11/30/2022]
Abstract
BACKGROUND AND OBJECTIVE: Clearance, by renal elimination or hepatic metabolism, is one of the most important pharmacokinetic parameters of a drug. It allows the half-life, bioavailability, and drug-drug interactions to be predicted, and it can also affect the dose regimen of a drug. Predicting the clearance pathways of new chemical candidates during drug development is vital in order to minimize the risks of possible side effects and drug interactions. Many in vivo methods have been established to predict drug clearance in humans; these rely primarily on data from in vivo studies in preclinical species (mainly rats, dogs, and monkeys) and are time consuming and expensive. The aim of this study was to find the relationship between the structural parameters of drugs and their clearance pathways. METHODS: The clearance pathway of each drug was obtained from the literature. Various structural descriptors [Abraham solvation parameters, topological polar surface area, numbers of hydrogen-bond donors and acceptors, number of rotatable bonds, molecular weight, logarithm of the partition coefficient (logP), and logarithm of the distribution coefficient at pH 7.4 (logD7.4)] were applied to develop a mechanistic model for predicting clearance pathways. RESULTS: The results of this study indicate that compounds with logD7.4 > 1 or with zero or one hydrogen-bond donor undergo hepatic metabolism, whereas the clearance pathway for chemicals with logD7.4 < -2 is renal elimination. Furthermore, logistic regression models based on five structural parameters for compounds with -2 < logD7.4 < 1 could be used as a clearance pathway prediction tool. The overall prediction accuracies of the first and second models were 84.8% and 84.4%, respectively. CONCLUSION: The developed model can be used to find the clearance pathways of new drug candidates with acceptable accuracy.
The main descriptors that are used to evaluate this parameter are the hydrophobicity and the number of hydrogen-bonding functional groups of the compound.
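The threshold rules quoted in the abstract can be written as a small triage function. This is a minimal sketch, assuming the hepatic rule is checked before the renal rule (the abstract does not say how a conflict such as logD7.4 < −2 with a single hydrogen-bond donor is resolved) and that everything else is handed to the study's logistic model, which is not reproduced here. The function name, labels, and example inputs are hypothetical.

```python
# Hypothetical rule-based triage using the thresholds quoted in the abstract.
# Assumption: the hepatic rule takes precedence over the renal rule.
HEPATIC = "hepatic metabolism"
RENAL = "renal elimination"
UNDETERMINED = "undetermined (apply logistic model)"


def clearance_pathway(logd74: float, hbd: int) -> str:
    """Classify a compound from logD at pH 7.4 and its H-bond donor count."""
    if logd74 > 1 or hbd <= 1:
        # Lipophilic compounds, or those with zero or one H-bond donor
        return HEPATIC
    if logd74 < -2:
        # Highly hydrophilic compounds
        return RENAL
    # Intermediate region (boundary values included here by assumption):
    # the paper applies a five-descriptor logistic model, not shown.
    return UNDETERMINED


if __name__ == "__main__":
    # Invented inputs, not data from the study
    for logd, donors in [(2.3, 3), (-0.5, 1), (-3.1, 4), (0.2, 3)]:
        print(logd, donors, "->", clearance_pathway(logd, donors))
```

Checking the simple rules first keeps the statistical model for the ambiguous middle band only, which mirrors the two-stage structure described in the abstract.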
Affiliation(s)
- Navid Kaboudi
- Student Research Committee, Tabriz University of Medical Sciences, Tabriz, Iran
- Biotechnology Research Center, Tabriz University of Medical Sciences, Tabriz, Iran
- Ali Shayanfar
- Pharmaceutical Analysis Research Center, Tabriz University of Medical Sciences, Tabriz, Iran
- Editorial Office of Pharmaceutical Sciences Journal, Faculty of Pharmacy, Tabriz University of Medical Sciences, Tabriz, Iran
14
Rodríguez-Pérez R, Bajorath J. Explainable Machine Learning for Property Predictions in Compound Optimization. J Med Chem 2021; 64:17744-17752. [PMID: 34902252 DOI: 10.1021/acs.jmedchem.1c01789] [Citation(s) in RCA: 24] [Impact Index Per Article: 8.0] [Indexed: 11/28/2022]
Abstract
The prediction of compound properties from chemical structure is a main task for machine learning (ML) in medicinal chemistry. ML is often applied to large data sets in applications such as compound screening, virtual library enumeration, or generative chemistry. Although desirable, a detailed understanding of ML model decisions is typically not required in these cases. By contrast, compound optimization efforts rely on small data sets to identify structural modifications leading to desired property profiles. In this situation, if ML is applied, one is usually reluctant to make decisions based on predictions that cannot be rationalized. Only a few ML methods are interpretable. However, explanatory approaches can be applied to yield insights into complex ML model decisions. Herein, methodologies for better understanding ML models or explaining individual predictions are reviewed, and current challenges in integrating ML into medicinal chemistry programs, as well as future opportunities, are discussed.
Affiliation(s)
- Raquel Rodríguez-Pérez
- Department of Life Science Informatics and Data Science, B-IT, LIMES Program Unit Chemical Biology and Medicinal Chemistry, Rheinische Friedrich-Wilhelms-Universität, Friedrich-Hirzebruch-Allee 6, D-53115 Bonn, Germany
- Novartis Institutes for Biomedical Research, Novartis Campus, CH-4002 Basel, Switzerland
- Jürgen Bajorath
- Department of Life Science Informatics and Data Science, B-IT, LIMES Program Unit Chemical Biology and Medicinal Chemistry, Rheinische Friedrich-Wilhelms-Universität, Friedrich-Hirzebruch-Allee 6, D-53115 Bonn, Germany
15
De Jesus Silva J, Bartalucci N, Jelier B, Grosslight S, Gensch T, Schünemann C, Müller B, Kamer PCJ, Copéret C, Sigman MS, Togni A. Development and Molecular Understanding of a Pd‐Catalyzed Cyanation of Aryl Boronic Acids Enabled by High‐Throughput Experimentation and Data Analysis. Helv Chim Acta 2021. [DOI: 10.1002/hlca.202100200] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Indexed: 01/04/2023]
Affiliation(s)
- Jordan De Jesus Silva
- Department of Chemistry and Applied Biosciences, ETH Zürich, Vladimir-Prelog-Weg 1–5, CH-8093 Zürich, Switzerland
- Niccolò Bartalucci
- Department of Chemistry and Applied Biosciences, ETH Zürich, Vladimir-Prelog-Weg 1–5, CH-8093 Zürich, Switzerland
- Benson Jelier
- Department of Chemistry and Applied Biosciences, ETH Zürich, Vladimir-Prelog-Weg 1–5, CH-8093 Zürich, Switzerland
- Samantha Grosslight
- Department of Chemistry, University of Utah, 315 South 1400 East, Salt Lake City, Utah 84112, United States
- Tobias Gensch
- Department of Chemistry, University of Utah, 315 South 1400 East, Salt Lake City, Utah 84112, United States
- Department of Chemistry, TU Berlin, Straße des 17. Juni 135, DE-10623 Berlin, Germany
- Claas Schünemann
- Leibniz-Institute for Catalysis e. V., Albert-Einstein-Straße 29a, DE-18059 Rostock, Germany
- Bernd Müller
- Leibniz-Institute for Catalysis e. V., Albert-Einstein-Straße 29a, DE-18059 Rostock, Germany
- Paul C. J. Kamer
- Leibniz-Institute for Catalysis e. V., Albert-Einstein-Straße 29a, DE-18059 Rostock, Germany
- Christophe Copéret
- Department of Chemistry and Applied Biosciences, ETH Zürich, Vladimir-Prelog-Weg 1–5, CH-8093 Zürich, Switzerland
- Matthew S. Sigman
- Department of Chemistry, University of Utah, 315 South 1400 East, Salt Lake City, Utah 84112, United States
- Antonio Togni
- Department of Chemistry and Applied Biosciences, ETH Zürich, Vladimir-Prelog-Weg 1–5, CH-8093 Zürich, Switzerland
16
Medina-Franco JL, Sánchez-Cruz N, López-López E, Díaz-Eufracio BI. Progress on open chemoinformatic tools for expanding and exploring the chemical space. J Comput Aided Mol Des 2021; 36:341-354. [PMID: 34143323 PMCID: PMC8211976 DOI: 10.1007/s10822-021-00399-1] [Citation(s) in RCA: 19] [Impact Index Per Article: 6.3] [Received: 05/10/2021] [Accepted: 06/14/2021] [Indexed: 01/10/2023]
Abstract
The concept of chemical space is a cornerstone of chemoinformatics, with broad conceptual and practical applicability in many areas of chemistry, including drug design and discovery. One of its most significant impacts is in the study of structure-property relationships, where the property can be a biological activity or any other characteristic of interest to a particular chemistry discipline. The chemical space is highly dependent on the molecular representation, which is also a cornerstone concept in computational chemistry. Herein, we discuss recent progress on chemoinformatic tools developed to expand and characterize the chemical space of compound data sets using different types of molecular representations, generate visual representations of such spaces, and explore structure-property relationships in the context of chemical spaces. We emphasize the development of methods and freely available tools focusing on drug discovery applications. We also comment on the general advantages and shortcomings of using freely available and easy-to-use tools and discuss the value of such open resources for research, education, and scientific dissemination.
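Because the chemical space depends on the chosen molecular representation, a common starting point for mapping it is a pairwise similarity matrix over binary fingerprints. The sketch below assumes each molecule is already encoded as a set of "on" fingerprint bit indices (generating fingerprints with a cheminformatics toolkit is out of scope) and uses the Tanimoto coefficient; the bit sets are invented for illustration.

```python
from typing import List, Set


def tanimoto(a: Set[int], b: Set[int]) -> float:
    """Tanimoto (Jaccard) similarity between two sets of on-bit indices."""
    union = len(a | b)
    if union == 0:
        # Convention assumed here: two empty fingerprints score 0.0
        return 0.0
    return len(a & b) / union


def similarity_matrix(fps: List[Set[int]]) -> List[List[float]]:
    """Symmetric pairwise similarity matrix, a basis for chemical-space maps."""
    n = len(fps)
    sim = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i, n):
            sim[i][j] = sim[j][i] = tanimoto(fps[i], fps[j])
    return sim


if __name__ == "__main__":
    # Invented on-bit sets for three hypothetical molecules
    fps = [{1, 2, 3}, {2, 3, 4}, {7, 8}]
    for row in similarity_matrix(fps):
        print([round(v, 2) for v in row])
```

Such a matrix (or the corresponding distances, 1 − similarity) is what dimensionality-reduction or network-based visualizations typically consume; swapping the fingerprint changes the matrix and therefore the apparent shape of the chemical space.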
Affiliation(s)
- José L Medina-Franco
- DIFACQUIM Research Group, Department of Pharmacy, School of Chemistry, Universidad Nacional Autónoma de México, 04510, Mexico City, Mexico
- Norberto Sánchez-Cruz
- DIFACQUIM Research Group, Department of Pharmacy, School of Chemistry, Universidad Nacional Autónoma de México, 04510, Mexico City, Mexico
- Edgar López-López
- DIFACQUIM Research Group, Department of Pharmacy, School of Chemistry, Universidad Nacional Autónoma de México, 04510, Mexico City, Mexico
- Departamento de Química y Programa de Posgrado en Farmacología, Centro de Investigación y de Estudios Avanzados del Instituto Politécnico Nacional, Apartado 14-740, 07000, Mexico City, Mexico
- Bárbara I Díaz-Eufracio
- DIFACQUIM Research Group, Department of Pharmacy, School of Chemistry, Universidad Nacional Autónoma de México, 04510, Mexico City, Mexico
17
Gajewicz-Skretna A, Kar S, Piotrowska M, Leszczynski J. The kernel-weighted local polynomial regression (KwLPR) approach: an efficient, novel tool for development of QSAR/QSAAR toxicity extrapolation models. J Cheminform 2021; 13:9. [PMID: 33579384 PMCID: PMC7881668 DOI: 10.1186/s13321-021-00484-5] [Citation(s) in RCA: 6] [Impact Index Per Article: 2.0] [Received: 10/24/2020] [Accepted: 01/11/2021] [Indexed: 11/10/2022]
Abstract
The ability to accurately predict the biological response (biological activity/property/toxicity) of a given chemical makes quantitative structure‐activity/property/toxicity relationship (QSAR/QSPR/QSTR) models unique among in silico tools. In addition, experimental data for selected species can be used as an independent variable, along with other structural and physicochemical variables, to predict the response for different species, formulating the quantitative activity–activity relationship (QAAR)/quantitative structure–activity–activity relationship (QSAAR) approach. Irrespective of the model type, the quality and reliability of a developed model need to be checked through multiple stringent classical validation metrics. Among these, error-based metrics are particularly significant, since the basic idea of a good predictive model is to improve prediction quality by lowering the predicted residuals for new query compounds. Following this concept, we assessed the predictive quality of QSAR and QSAAR models built with the kernel-weighted local polynomial regression (KwLPR) approach against traditional linear and non-linear regression approaches such as multiple linear regression (MLR) and k nearest neighbors (kNN). Five datasets previously modeled with linear and non-linear regression methods were used to implement the KwLPR approach, and their validation metrics were compared. In all five cases, the KwLPR-based models gave better results than the traditional approaches. The focus of the present study is not to develop a better or improved QSAR/QSAAR model than the previous ones, but to demonstrate the advantages, predictive power, and reliability of the KwLPR algorithm and to establish it as a novel, powerful cheminformatic tool. To facilitate the use of the KwLPR algorithm for QSAR/QSPR/QSTR/QSAAR modeling, the authors provide an in-house KwLPR.RMD script written in the open-source R programming language.
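Kernel-weighted local polynomial regression fits a separate weighted polynomial around each query point, with weights that decay with distance from it. The sketch below is a minimal one-descriptor, degree-1 (local linear) version with a Gaussian kernel and a fixed bandwidth; these choices are assumptions for illustration and do not reproduce the authors' KwLPR.RMD script, which handles multiple descriptors and its own tuning.

```python
import math
from typing import Sequence


def gaussian_kernel(u: float) -> float:
    """Unnormalized Gaussian kernel; the normalizing constant cancels in the fit."""
    return math.exp(-0.5 * u * u)


def local_linear_predict(x: Sequence[float], y: Sequence[float],
                         x0: float, bandwidth: float) -> float:
    """Predict y at x0 by weighted least squares of y ~ a + b * (x - x0).

    The fitted intercept a is the local estimate at x0.
    """
    w = [gaussian_kernel((xi - x0) / bandwidth) for xi in x]
    d = [xi - x0 for xi in x]
    # Entries of the 2x2 weighted normal equations [s0 s1; s1 s2][a; b] = [t0; t1]
    s0 = sum(w)
    s1 = sum(wi * di for wi, di in zip(w, d))
    s2 = sum(wi * di * di for wi, di in zip(w, d))
    t0 = sum(wi * yi for wi, yi in zip(w, y))
    t1 = sum(wi * di * yi for wi, di, yi in zip(w, d, y))
    det = s0 * s2 - s1 * s1
    if det == 0:
        raise ValueError("degenerate neighbourhood; increase the bandwidth")
    return (s2 * t0 - s1 * t1) / det


if __name__ == "__main__":
    # Invented descriptor/response pairs following a noise-free curve y = x**2
    xs = [0.0, 0.5, 1.0, 1.5, 2.0, 2.5, 3.0]
    ys = [v * v for v in xs]
    print(round(local_linear_predict(xs, ys, 1.75, bandwidth=0.5), 3))
```

Because each fit is local, a purely linear trend is reproduced exactly, while curvature is captured by how the local slope changes with x0; the bandwidth governs the trade-off between bias and variance.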
Affiliation(s)
- Agnieszka Gajewicz-Skretna
- Laboratory of Environmental Chemometrics, Faculty of Chemistry, University of Gdansk, Wita Stwosza 63, 80-308, Gdansk, Poland
- Supratik Kar
- Interdisciplinary Center for Nanotoxicity, Department of Chemistry, Physics and Atmospheric Sciences, Jackson State University, 1400 J. R. Lynch Street, P. O. Box 17910, Jackson, MS, 39217, USA
- Magdalena Piotrowska
- Laboratory of Environmental Chemometrics, Faculty of Chemistry, University of Gdansk, Wita Stwosza 63, 80-308, Gdansk, Poland
- Jerzy Leszczynski
- Interdisciplinary Center for Nanotoxicity, Department of Chemistry, Physics and Atmospheric Sciences, Jackson State University, 1400 J. R. Lynch Street, P. O. Box 17910, Jackson, MS, 39217, USA
18
Antiplasmodial activity of sulfonylhydrazones: in vitro and in silico approaches. Future Med Chem 2020; 13:233-250. [PMID: 33295837 DOI: 10.4155/fmc-2020-0229] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/17/2022]
Abstract
Malaria is still a life-threatening public health issue, and the upsurge of resistant strains requires continuous generation of active molecules. In this work, 35 sulfonylhydrazone derivatives were synthesized and evaluated against Plasmodium falciparum chloroquine-sensitive (3D7) and resistant (W2) strains. The most promising compound, 5b, had an IC50 of 0.22 μM against W2 and was less cytotoxic and 26-fold more selective than chloroquine. The structure-activity relationship model, statistical analysis and molecular modeling studies suggested that antiplasmodial activity was related to hydrogen bond acceptor count, molecular weight and partition coefficient of octanol/water and displacement of frontier orbitals to the heteroaromatic ring beside the imine bond. This study demonstrates that the synthesized molecules with a simple scaffold allow the hit-to-lead process for new antimalarials to commence.
19
Russo DP, Yan X, Shende S, Huang H, Yan B, Zhu H. Virtual Molecular Projections and Convolutional Neural Networks for the End-to-End Modeling of Nanoparticle Activities and Properties. Anal Chem 2020; 92:13971-13979. [PMID: 32970421 DOI: 10.1021/acs.analchem.0c02878] [Citation(s) in RCA: 11] [Impact Index Per Article: 2.8] [Indexed: 12/17/2022]
Abstract
Digitalizing complex nanostructures into data structures suitable for machine learning modeling without losing nanostructure information has been a major challenge. Deep learning frameworks, particularly convolutional neural networks (CNNs), are especially adept at handling multidimensional and complex inputs. In this study, CNNs were applied for the modeling of nanoparticle activities exclusively from nanostructures. The nanostructures were represented by virtual molecular projections, a multidimensional digitalization of nanostructures, and used as input data to train CNNs. To this end, 77 nanoparticles with various activities and/or physicochemical property results were used for modeling. The resulting CNN model predictions show high correlations with the experimental results. An analysis of a trained CNN quantitatively showed that neurons were able to recognize distinct nanostructure features critical to activities and physicochemical properties. This "end-to-end" deep learning approach is well suited to digitalize complex nanostructures for data-driven machine learning modeling and can be broadly applied to rationally design nanoparticles with desired activities.
Affiliation(s)
- Daniel P Russo
- Center for Computational and Integrative Biology, Rutgers University, 201 S Broadway, Camden, New Jersey 08103, United States
- Xiliang Yan
- Center for Computational and Integrative Biology, Rutgers University, 201 S Broadway, Camden, New Jersey 08103, United States
- Institute of Environmental Research at Greater Bay, Key Laboratory for Water Quality and Conservation of the Pearl River Delta, Ministry of Education, Guangzhou University, Guangzhou 510006, China
- Sunil Shende
- Center for Computational and Integrative Biology, Rutgers University, 201 S Broadway, Camden, New Jersey 08103, United States
- Department of Computer Science, Rutgers University, 227 Penn Street, Camden, New Jersey 08102, United States
- Bing Yan
- Institute of Environmental Research at Greater Bay, Key Laboratory for Water Quality and Conservation of the Pearl River Delta, Ministry of Education, Guangzhou University, Guangzhou 510006, China
- School of Environmental Science and Engineering, Shandong University, Jinan 250100, China
- Hao Zhu
- Center for Computational and Integrative Biology, Rutgers University, 201 S Broadway, Camden, New Jersey 08103, United States
- Department of Chemistry, Rutgers University, 315 Penn Street, Camden, New Jersey 08102, United States
20
Structural analysis of arylsulfonamide-based carboxylic acid derivatives: a QSAR study to identify the structural contributors toward their MMP-9 inhibition. Struct Chem 2020. [DOI: 10.1007/s11224-020-01635-4] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.3] [Indexed: 12/18/2022]
21
Zhang X, Xu J, Yang J, Chen L, Zhou H, Liu X, Li H, Lin T, Ying Y. Understanding the learning mechanism of convolutional neural networks in spectral analysis. Anal Chim Acta 2020; 1119:41-51. [DOI: 10.1016/j.aca.2020.03.055] [Citation(s) in RCA: 38] [Impact Index Per Article: 9.5] [Received: 11/11/2019] [Revised: 02/27/2020] [Accepted: 03/29/2020] [Indexed: 11/16/2022]
22
Luo Y, Gopaluni B, Xu Y, Cao L, Zhu QX. A Novel Approach to Alarm Causality Analysis Using Active Dynamic Transfer Entropy. Ind Eng Chem Res 2020. [DOI: 10.1021/acs.iecr.9b06262] [Citation(s) in RCA: 14] [Impact Index Per Article: 3.5] [Indexed: 11/29/2022]
Affiliation(s)
- Yi Luo
- College of Information Science & Technology, Beijing University of Chemical Technology, Beijing 100029, China
- Engineering Research Center of Intelligent PSE, Ministry of Education of China, Beijing 100029, China
- Department of Chemical and Biological Engineering, University of British Columbia, Vancouver, BC, Canada
- Bhushan Gopaluni
- Department of Chemical and Biological Engineering, University of British Columbia, Vancouver, BC, Canada
- Yuan Xu
- College of Information Science & Technology, Beijing University of Chemical Technology, Beijing 100029, China
- Engineering Research Center of Intelligent PSE, Ministry of Education of China, Beijing 100029, China
- Liang Cao
- Department of Chemical and Biological Engineering, University of British Columbia, Vancouver, BC, Canada
- Qun-Xiong Zhu
- College of Information Science & Technology, Beijing University of Chemical Technology, Beijing 100029, China
- Engineering Research Center of Intelligent PSE, Ministry of Education of China, Beijing 100029, China
23
Barnard AS, Motevalli B, Parker AJ, Fischer JM, Feigl CA, Opletal G. Nanoinformatics, and the big challenges for the science of small things. Nanoscale 2019; 11:19190-19201. [PMID: 31397835 DOI: 10.1039/c9nr05912a] [Citation(s) in RCA: 18] [Impact Index Per Article: 3.6] [Indexed: 06/10/2023]
Abstract
The combination of computational chemistry and computational materials science with machine learning and artificial intelligence provides a powerful way of relating structural features of nanomaterials to functional properties. However, combining these fundamentally different scientific approaches is not as straightforward as it seems. Machine learning methods were developed for large data sets with small numbers of consistent features. Typically, nanomaterials data sets are small, with high dimensionality and high variance in the feature space, and suffer from numerous destructive biases. None of the established data science or machine learning methods in widespread use today were devised with (nano)materials data sets in mind, but there are ways to overcome these challenges and use them reliably. In this review, we discuss domain-specific constraints on data-driven nanomaterials design and explore the differences between nanomaterials simulation and nanoinformatics that can be leveraged for greater impact.
Affiliation(s)
- A S Barnard
- CSIRO Data61, Docklands, Victoria, Australia
- B Motevalli
- CSIRO Data61, Docklands, Victoria, Australia
- A J Parker
- CSIRO Data61, Docklands, Victoria, Australia
- J M Fischer
- CSIRO Data61, Docklands, Victoria, Australia
- C A Feigl
- CSIRO Data61, Docklands, Victoria, Australia
- G Opletal
- CSIRO Data61, Docklands, Victoria, Australia
24
Halder AK, Giri AK, Cordeiro MNDS. Multi-Target Chemometric Modelling, Fragment Analysis and Virtual Screening with ERK Inhibitors as Potential Anticancer Agents. Molecules 2019; 24:molecules24213909. [PMID: 31671605 PMCID: PMC6864583 DOI: 10.3390/molecules24213909] [Citation(s) in RCA: 15] [Impact Index Per Article: 3.0] [Received: 09/25/2019] [Revised: 10/21/2019] [Accepted: 10/25/2019] [Indexed: 02/07/2023]
Abstract
Two isoforms of extracellular regulated kinase (ERK), namely ERK-1 and ERK-2, are associated with several cellular processes, the aberration of which leads to cancer. ERK-1/2 inhibitors are thus considered potential agents for cancer therapy. Multitarget quantitative structure–activity relationship (mt-QSAR) models based on the Box–Jenkins approach were developed with a dataset containing 6400 ERK inhibitors assayed under different experimental conditions. The first mt-QSAR model, a linear model, was built with linear discriminant analysis (LDA) and provided information regarding the structural requirements for better activity. This linear model was also used for a fragment analysis to estimate the contributions of ring fragments towards ERK inhibition. Then, the random forest (RF) technique was employed to produce highly predictive non-linear mt-QSAR models, which were used to screen the Asinex kinase library and identify the most promising virtual hits. The fragment analysis results justified the selection of the hits retrieved through this virtual screening. The hits were subsequently subjected to molecular docking and molecular dynamics simulations to understand their possible interactions with ERK enzymes. The present work, which uses in-silico techniques such as multitarget chemometric modelling, fragment analysis, virtual screening, molecular docking and dynamics, may provide important guidelines to facilitate the discovery of novel ERK inhibitors.
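As commonly formulated for mt-QSAR, the Box–Jenkins moving-average step re-expresses each descriptor D_i as its deviation from the mean of that descriptor over the compounds annotated as active under the same experimental condition c_j, that is ΔD_i(c_j) = D_i − avg(D_i)_{c_j}, so that one model can be trained across all conditions. The minimal sketch below assumes the averages come from active training compounds only and that each compound carries a single condition label (here, the target isoform); the labels and descriptor values are invented, and real workflows may combine several condition elements.

```python
from typing import Dict, List, Sequence


def condition_means(X: Sequence[Sequence[float]], conditions: Sequence[str],
                    labels: Sequence[int]) -> Dict[str, List[float]]:
    """Mean descriptor vector of the active (label == 1) compounds per condition."""
    sums: Dict[str, List[float]] = {}
    counts: Dict[str, int] = {}
    for x, c, label in zip(X, conditions, labels):
        if label != 1:
            continue  # only actives define the reference average
        if c not in sums:
            sums[c] = [0.0] * len(x)
            counts[c] = 0
        sums[c] = [s + v for s, v in zip(sums[c], x)]
        counts[c] += 1
    return {c: [s / counts[c] for s in sums[c]] for c in sums}


def deviation_descriptors(X: Sequence[Sequence[float]], conditions: Sequence[str],
                          means: Dict[str, List[float]]) -> List[List[float]]:
    """Deviation descriptors D_i - avg(D_i) for each compound's own condition."""
    # A KeyError here means a condition had no active training compounds.
    return [[v - m for v, m in zip(x, means[c])] for x, c in zip(X, conditions)]


if __name__ == "__main__":
    X = [[1.0, 2.0], [3.0, 4.0], [5.0, 0.0], [2.0, 2.0]]  # invented descriptors
    cond = ["ERK-1", "ERK-1", "ERK-2", "ERK-2"]
    y = [1, 1, 1, 0]
    means = condition_means(X, cond, y)
    print(means)
    print(deviation_descriptors(X, cond, means))
```

In such a workflow the means would be computed on the training set only and then applied unchanged to test and screening compounds, so that no information leaks from held-out data.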
Affiliation(s)
- Amit Kumar Halder
- Department of Chemistry and Biochemistry, University of Porto, 4169-007 Porto, Portugal
- Amal Kanta Giri
- Department of Chemistry and Biochemistry, University of Porto, 4169-007 Porto, Portugal
25
Rodríguez-Pérez R, Bajorath J. Interpretation of Compound Activity Predictions from Complex Machine Learning Models Using Local Approximations and Shapley Values. J Med Chem 2019; 63:8761-8777. [PMID: 31512867 DOI: 10.1021/acs.jmedchem.9b01101] [Citation(s) in RCA: 138] [Impact Index Per Article: 27.6] [Indexed: 01/27/2023]
Abstract
In qualitative or quantitative studies of structure-activity relationships (SARs), machine learning (ML) models are trained to recognize structural patterns that differentiate between active and inactive compounds. Understanding model decisions is challenging but of critical importance to guide compound design. Moreover, the interpretation of ML results provides an additional level of model validation based on expert knowledge. A number of complex ML approaches, especially deep learning (DL) architectures, have a distinctive black-box character. Herein, a locally interpretable explanatory method termed Shapley additive explanations (SHAP) is introduced for rationalizing activity predictions of any ML algorithm, regardless of its complexity. Models resulting from random forest (RF), nonlinear support vector machine (SVM), and deep neural network (DNN) learning are interpreted, and structural patterns determining the predicted probability of activity are identified and mapped onto test compounds. The results indicate that SHAP has high potential for rationalizing predictions of complex ML models.
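SHAP attributes a prediction to individual features using Shapley values from cooperative game theory. The sketch below computes them exactly by enumerating every feature coalition, replacing "absent" features with a fixed baseline vector; this baseline-replacement value function and the toy model are assumptions for illustration, not the SHAP library's API. Exact enumeration is exponential in the number of features, which is why practical SHAP implementations rely on approximations or model-specific algorithms.

```python
from itertools import combinations
from math import factorial
from typing import Callable, List, Sequence


def shapley_values(f: Callable[[List[float]], float],
                   x: Sequence[float],
                   baseline: Sequence[float]) -> List[float]:
    """Exact Shapley values of f at x relative to a baseline input."""
    n = len(x)

    def value(coalition: set) -> float:
        # Coalition members keep their actual values; the rest take the
        # baseline (an assumed value function).
        z = [x[j] if j in coalition else baseline[j] for j in range(n)]
        return f(z)

    phi = []
    for i in range(n):
        others = [j for j in range(n) if j != i]
        total = 0.0
        for size in range(n):  # coalition sizes 0 .. n-1 (without feature i)
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = set(subset)
                total += weight * (value(s | {i}) - value(s))
        phi.append(total)
    return phi


def toy_model(z: List[float]) -> float:
    """Invented scoring function: linear terms plus one interaction."""
    return 2.0 * z[0] + 3.0 * z[1] + z[0] * z[2]


if __name__ == "__main__":
    x, base = [1.0, 2.0, 3.0], [0.0, 0.0, 0.0]
    phi = shapley_values(toy_model, x, base)
    print(phi, sum(phi), toy_model(x) - toy_model(base))
```

The last line illustrates the additivity property that makes SHAP explanations easy to read: the attributions sum to the difference between the prediction and the baseline prediction.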
Affiliation(s)
- Raquel Rodríguez-Pérez
- Department of Life Science Informatics, B-IT, LIMES Program Unit Chemical Biology and Medicinal Chemistry, Rheinische Friedrich-Wilhelms-Universität, Endenicher Allee 19c, D-53115 Bonn, Germany
- Department of Medicinal Chemistry, Boehringer Ingelheim Pharma GmbH & Co. KG, Birkendorfer Straße 65, 88397 Biberach an der Riß, Germany
- Jürgen Bajorath
- Department of Life Science Informatics, B-IT, LIMES Program Unit Chemical Biology and Medicinal Chemistry, Rheinische Friedrich-Wilhelms-Universität, Endenicher Allee 19c, D-53115 Bonn, Germany
26
Halder AK, Cordeiro MNDS. Development of Multi-Target Chemometric Models for the Inhibition of Class I PI3K Enzyme Isoforms: A Case Study Using QSAR-Co Tool. Int J Mol Sci 2019; 20:ijms20174191. [PMID: 31461863 PMCID: PMC6747073 DOI: 10.3390/ijms20174191] [Citation(s) in RCA: 14] [Impact Index Per Article: 2.8] [Received: 07/14/2019] [Revised: 08/23/2019] [Accepted: 08/24/2019] [Indexed: 12/12/2022]
Abstract
The present work aims at establishing multi-target chemometric models using the recently launched quantitative structure–activity relationship (QSAR)-Co tool for predicting the activity of inhibitor compounds against different isoforms of phosphoinositide 3-kinase (PI3K) under various experimental conditions. Inhibitors of class I PI3K isoforms have emerged as potential therapeutic agents for the treatment of various disorders, especially cancer. The cell-based enzyme inhibition assay results of PI3K inhibitors were curated from the ChEMBL database. Factors that may significantly alter the assay outcomes, such as the nature and mutation of cell lines, were considered as important experimental elements for mt-QSAR model development. The models were developed using two machine learning techniques as implemented in QSAR-Co: linear discriminant analysis (LDA) and random forest (RF). Both techniques led to models with high accuracy (ca. 90%). Several molecular fragments were extracted from the current dataset, and their quantitative contributions to the inhibitory activity against all the proteins and experimental conditions under study were calculated. This case study also demonstrates the utility of the QSAR-Co tool in solving multi-factorial and complex chemometric problems. Additionally, the combination of in silico methods employed in this work can serve as a valuable guideline to speed up the early discovery of PI3K inhibitors.
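One of the two techniques named above is linear discriminant analysis. As a minimal illustration (not the QSAR-Co implementation), the two-class Fisher discriminant below works with two descriptors, inverts the pooled within-class scatter matrix in closed form, and places the decision threshold halfway between the projected class means, which assumes equal class priors. The toy data are invented, and descriptor selection (for example, the genetic algorithm in GA-LDA) is not shown.

```python
from typing import Sequence, Tuple

Point = Tuple[float, float]


def _mean(points: Sequence[Point]) -> Point:
    n = len(points)
    return (sum(p[0] for p in points) / n, sum(p[1] for p in points) / n)


def fit_fisher_lda(X: Sequence[Point], y: Sequence[int]) -> Tuple[Point, float]:
    """Return the weight vector w and decision threshold of a two-class LDA."""
    cls0 = [x for x, lab in zip(X, y) if lab == 0]
    cls1 = [x for x, lab in zip(X, y) if lab == 1]
    mu0, mu1 = _mean(cls0), _mean(cls1)
    # Pooled within-class scatter matrix S_w (2x2)
    s = [[0.0, 0.0], [0.0, 0.0]]
    for pts, mu in ((cls0, mu0), (cls1, mu1)):
        for p in pts:
            d = (p[0] - mu[0], p[1] - mu[1])
            for i in range(2):
                for j in range(2):
                    s[i][j] += d[i] * d[j]
    det = s[0][0] * s[1][1] - s[0][1] * s[1][0]
    if det == 0:
        raise ValueError("singular within-class scatter; descriptors are collinear")
    inv = [[s[1][1] / det, -s[0][1] / det], [-s[1][0] / det, s[0][0] / det]]
    diff = (mu1[0] - mu0[0], mu1[1] - mu0[1])
    # Fisher direction w = S_w^-1 (mu1 - mu0)
    w = (inv[0][0] * diff[0] + inv[0][1] * diff[1],
         inv[1][0] * diff[0] + inv[1][1] * diff[1])
    proj0 = w[0] * mu0[0] + w[1] * mu0[1]
    proj1 = w[0] * mu1[0] + w[1] * mu1[1]
    return w, (proj0 + proj1) / 2.0


def predict(w: Point, threshold: float, x: Point) -> int:
    """1 = predicted active, 0 = predicted inactive."""
    return 1 if w[0] * x[0] + w[1] * x[1] > threshold else 0


if __name__ == "__main__":
    X = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (3.0, 3.0), (4.0, 3.0), (3.0, 4.0)]
    y = [0, 0, 0, 1, 1, 1]
    w, t = fit_fisher_lda(X, y)
    print(w, t, [predict(w, t, p) for p in [(0.5, 0.5), (3.5, 3.5)]])
```

The fitted weights are directly readable as the direction in descriptor space that best separates the classes, which is one reason LDA is often paired with a less interpretable non-linear model such as RF.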
Affiliation(s)
- Amit Kumar Halder
- Department of Chemistry and Biochemistry, University of Porto, 4169-007 Porto, Portugal
27
Ciallella HL, Zhu H. Advancing Computational Toxicology in the Big Data Era by Artificial Intelligence: Data-Driven and Mechanism-Driven Modeling for Chemical Toxicity. Chem Res Toxicol 2019; 32:536-547. [PMID: 30907586 DOI: 10.1021/acs.chemrestox.8b00393] [Citation(s) in RCA: 76] [Impact Index Per Article: 15.2] [Indexed: 12/21/2022]
Abstract
In 2016, the Frank R. Lautenberg Chemical Safety for the 21st Century Act became the first US legislation to advance chemical safety evaluations by using novel testing approaches that reduce the testing of vertebrate animals. Central to this mission is the advancement of computational toxicology and artificial intelligence approaches for implementing innovative testing methods. In the current big data era, the terms volume (amount of data), velocity (growth of data), and variety (diversity of sources) have been used to characterize the currently available chemical, in vitro, and in vivo data for toxicity modeling purposes. Furthermore, as suggested by various scientists, the variability (internal consistency or lack thereof) of publicly available data pools, such as PubChem, also presents significant computational challenges. Novel artificial intelligence approaches based on massive public toxicity data are urgently needed to generate new predictive models for chemical toxicity evaluations and to make the developed models applicable as alternatives for evaluating untested compounds. In this process, traditional approaches (e.g., QSAR) based purely on chemical structures have been replaced by newly designed data-driven and mechanism-driven modeling. The resulting models realize the concept of the adverse outcome pathway (AOP): they can not only directly evaluate the toxicity potential of new compounds but also illustrate relevant toxicity mechanisms. The recent advancement of computational toxicology in the big data era has paved the road to future toxicity testing, which will significantly impact public health.
28
Abstract
Beyond finding inhibitors that show high binding affinity to the respective target, there is the challenge of optimizing their properties with respect to metabolic and toxicological issues, as well as further off-target effects. To reduce the experimental effort of synthesizing and testing actual substances in corresponding assays, virtual screening has become an indispensable toolbox in preclinical development. Its scope of application covers the prediction of molecular properties, including solubility, metabolic liability, and binding to antitargets such as the hERG channel. Furthermore, the prediction of binding sites and druggable targets are emerging aspects of virtual screening. Issues with currently applied computational models, including machine learning algorithms, are outlined, such as limits on prediction accuracy and overfitting.
29
Polishchuk P. Interpretation of Quantitative Structure–Activity Relationship Models: Past, Present, and Future. J Chem Inf Model 2017; 57:2618-2639. [DOI: 10.1021/acs.jcim.7b00274] [Citation(s) in RCA: 120] [Impact Index Per Article: 17.1] [Indexed: 02/01/2023]
Affiliation(s)
- Pavel Polishchuk
- Institute of Molecular and Translational Medicine, Faculty of Medicine and Dentistry, Palacký University and University Hospital in Olomouc, Hněvotínská 1333/5, 779 00 Olomouc, Czech Republic
30
Marchese Robinson RL, Palczewska A, Palczewski J, Kidley N. Comparison of the Predictive Performance and Interpretability of Random Forest and Linear Models on Benchmark Data Sets. J Chem Inf Model 2017; 57:1773-1792. [PMID: 28715209 DOI: 10.1021/acs.jcim.6b00753] [Citation(s) in RCA: 59] [Impact Index Per Article: 8.4] [Indexed: 11/29/2022]
Abstract
The ability to interpret the predictions made by quantitative structure-activity relationships (QSARs) offers a number of advantages. While QSARs built using nonlinear modeling approaches, such as the popular Random Forest algorithm, might sometimes be more predictive than those built using linear modeling approaches, their predictions have been perceived as difficult to interpret. However, a growing number of approaches have been proposed for interpreting nonlinear QSAR models in general and Random Forest in particular. In the current work, we compare the performance of Random Forest to those of two widely used linear modeling approaches: linear Support Vector Machines (SVMs) (or Support Vector Regression (SVR)) and partial least-squares (PLS). We compare their performance in terms of their predictivity as well as the chemical interpretability of the predictions using novel scoring schemes for assessing heat map images of substructural contributions. We critically assess different approaches for interpreting Random Forest models as well as for obtaining predictions from the forest. We assess the models on a large number of widely employed public-domain benchmark data sets corresponding to regression and binary classification problems of relevance to hit identification and toxicology. We conclude that Random Forest typically yields comparable or possibly better predictive performance than the linear modeling approaches and that its predictions may also be interpreted in a chemically and biologically meaningful way. In contrast to earlier work looking at interpretation of nonlinear QSAR models, we directly compare two methodologically distinct approaches for interpreting Random Forest models. The approaches for interpreting Random Forest assessed in our article were implemented using open-source programs that we have made available to the community. 
These programs are the rfFC package ( https://r-forge.r-project.org/R/?group_id=1725 ) for the R statistical programming language and the Python program HeatMapWrapper [ https://doi.org/10.5281/zenodo.495163 ] for heat map generation.
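One family of Random Forest interpretation schemes decomposes each prediction into a bias term plus per-descriptor contributions: walking down a tree, each split credits the change in node mean to the descriptor used at that split, and forest-level contributions are averaged over the trees. The pure-Python sketch below illustrates this decomposition on hand-built regression trees; the dictionary tree format and all values are hypothetical, and it is not the rfFC implementation.

```python
from typing import Any, Dict, List, Sequence, Tuple

# Hypothetical node format: internal nodes have "value" (mean training response),
# "feature", "threshold", "left", "right"; leaves have only "value".
Node = Dict[str, Any]


def tree_contributions(root: Node, x: Sequence[float],
                       n_features: int) -> Tuple[float, List[float]]:
    """Return (bias, per-feature contributions) for one regression tree."""
    contrib = [0.0] * n_features
    node = root
    while "feature" in node:
        f = node["feature"]
        child = node["left"] if x[f] <= node["threshold"] else node["right"]
        # Credit the change in node mean to the feature used at this split
        contrib[f] += child["value"] - node["value"]
        node = child
    return root["value"], contrib


def forest_contributions(trees: Sequence[Node], x: Sequence[float],
                         n_features: int) -> Tuple[float, List[float]]:
    """Average bias and contributions over trees; bias + sum = forest prediction."""
    bias_sum, totals = 0.0, [0.0] * n_features
    for tree in trees:
        b, c = tree_contributions(tree, x, n_features)
        bias_sum += b
        totals = [a + v for a, v in zip(totals, c)]
    k = len(trees)
    return bias_sum / k, [v / k for v in totals]


if __name__ == "__main__":
    tree1 = {"value": 5.0, "feature": 0, "threshold": 0.5,
             "left": {"value": 2.0},
             "right": {"value": 8.0, "feature": 1, "threshold": 1.0,
                       "left": {"value": 6.0}, "right": {"value": 10.0}}}
    tree2 = {"value": 4.0}  # a trivial single-leaf tree
    bias, contrib = forest_contributions([tree1, tree2], [1.0, 2.0], 2)
    print(bias, contrib, bias + sum(contrib))
```

Because each tree's contributions telescope from the root mean to the leaf mean, the bias plus the summed contributions always equals the forest's prediction, which is what allows contributions to be mapped back onto substructures as heat maps.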
Affiliation(s)
- Richard L Marchese Robinson
- Syngenta Ltd., Jealott's Hill International Research Centre, Bracknell, Berkshire RG42 6EY, United Kingdom; School of Pharmacy and Biomolecular Sciences, Liverpool John Moores University, James Parsons Building, Byrom Street, Liverpool L3 3AF, United Kingdom
- Anna Palczewska
- Department of Computing, University of Bradford, Bradford BD7 1DP, United Kingdom
- Jan Palczewski
- School of Mathematics, University of Leeds, Leeds LS2 9JT, United Kingdom
- Nathan Kidley
- Syngenta Ltd., Jealott's Hill International Research Centre, Bracknell, Berkshire RG42 6EY, United Kingdom
31
Koutsoukas A, Monaghan KJ, Li X, Huan J. Deep-learning: investigating deep neural networks hyper-parameters and comparison of performance to shallow methods for modeling bioactivity data. J Cheminform 2017; 9:42. [PMID: 29086090 PMCID: PMC5489441 DOI: 10.1186/s13321-017-0226-y] [Citation(s) in RCA: 112] [Impact Index Per Article: 16.0] [Received: 09/30/2016] [Accepted: 05/27/2017] [Indexed: 01/03/2023]
Abstract
Background: In recent years, research in artificial neural networks has resurged, now under the deep-learning (DL) umbrella, and grown extremely popular. The recently reported success of DL techniques in crowd-sourced QSAR and predictive toxicology competitions has showcased these methods as powerful tools in drug-discovery and toxicology research. The aim of this work was twofold: first, a large number of hyper-parameter configurations were explored to investigate how they affect the performance of deep neural networks (DNNs) and could act as starting points when tuning DNNs; second, DNN performance was compared to that of popular methods widely employed in cheminformatics, namely Naïve Bayes (NB), k-nearest neighbor (kNN), random forest (RF) and support vector machines (SVM). Moreover, the robustness of machine learning methods to different levels of artificially introduced noise was assessed. The open-source Caffe deep-learning framework and modern NVidia GPUs were used to carry out this study, allowing a large number of DNN configurations to be explored. Results: We show that feed-forward deep neural networks are capable of achieving strong classification performance and, when optimized, outperform shallow methods across diverse activity classes. Hyper-parameters found to play a critical role are the activation function, dropout regularization, the number of hidden layers and the number of neurons. Compared with the other methods, tuned DNNs were found to statistically outperform them, with p < 0.01 based on the Wilcoxon statistical test. On average, DNNs achieved a Matthews correlation coefficient (MCC) 0.149 units higher than NB, 0.092 higher than kNN, 0.052 higher than SVM with a linear kernel, 0.021 higher than RF and 0.009 higher than SVM with a radial basis function kernel.
When exploring robustness to noise, non-linear methods were found to perform well at low levels of noise (20% or lower); however, at higher levels of noise (above 30%), the Naïve Bayes method performed well and, at the highest noise level (50%), even outperformed more sophisticated methods across several datasets. Electronic supplementary material: The online version of this article (doi:10.1186/s13321-017-0226-y) contains supplementary material, which is available to authorized users.
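The comparison above is expressed as differences in the Matthews correlation coefficient (MCC), which ranges from -1 (total disagreement) through 0 (chance level) to 1 (perfect prediction). As an illustration only, the sketch below computes MCC from binary confusion-matrix counts using the standard definition; the function name `mcc` and all counts are hypothetical and are not taken from the study, which used its own evaluation pipeline.

```python
import math


def mcc(tp: int, tn: int, fp: int, fn: int) -> float:
    """Matthews correlation coefficient from binary confusion-matrix counts.

    Hypothetical helper for illustration; not code from the cited study.
    """
    numerator = tp * tn - fp * fn
    denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # Common convention: return 0 when any row or column total is zero.
    if denominator == 0:
        return 0.0
    return numerator / denominator


# Hypothetical counts for two classifiers evaluated on the same test set.
mcc_a = mcc(tp=80, tn=90, fp=10, fn=20)
mcc_b = mcc(tp=75, tn=88, fp=12, fn=25)
print(f"difference in MCC: {mcc_a - mcc_b:.3f}")
```

A statement such as "0.021 MCC units higher than RF" refers to a difference of this kind, averaged over the datasets studied.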
Affiliation(s)
- Alexios Koutsoukas
- Department of Electrical Engineering and Computer Sciences, University of Kansas, Lawrence, KS, 66047-7621, USA
- Keith J Monaghan
- Department of Electrical Engineering and Computer Sciences, University of Kansas, Lawrence, KS, 66047-7621, USA
- Xiaoli Li
- Department of Electrical Engineering and Computer Sciences, University of Kansas, Lawrence, KS, 66047-7621, USA
- Jun Huan
- Department of Electrical Engineering and Computer Sciences, University of Kansas, Lawrence, KS, 66047-7621, USA.
32
Structural, Physicochemical and Stereochemical Interpretation of QSAR Models Based on Simplex Representation of Molecular Structure. Challenges and Advances in Computational Chemistry and Physics 2017. [DOI: 10.1007/978-3-319-56850-8_4] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.4] [Indexed: 12/23/2022]
33
Hanser T, Barber C, Marchaland JF, Werner S. Applicability domain: towards a more formal definition. SAR and QSAR in Environmental Research 2016; 27:893-909. [PMID: 27827546 DOI: 10.1080/1062936x.2016.1250229] [Citation(s) in RCA: 67] [Impact Index Per Article: 8.4] [Received: 08/15/2016] [Accepted: 10/16/2016] [Indexed: 06/06/2023]
Abstract
In recent years the applicability domain (AD) of a prediction system has become an important concern in (Q)SAR modelling, especially in the context of human safety assessment. Today AD is an active research topic, and many methods have been designed to estimate the adequacy of a model and the confidence in its outcome for a given prediction task. Unfortunately, the wide spectrum of techniques developed for this purpose is based on various definitions of the concept of AD, often taking into account different types of information. This variety of methodologies confuses the end users and makes the comparison of the AD for different models almost impossible. In this article, we demonstrate that AD is not a monolithic concept and can be broken down into three well-defined sub-domains assessing confidence at the model, prediction and decision levels, respectively. By leveraging this separation of concerns we have an opportunity to clarify, formalize and extend the definition of AD. We propose a framework that captures this new vision with the aim to initiate a global effort to converge towards a common AD definition within the (Q)SAR community.
Affiliation(s)
- T Hanser
- Research Group, Lhasa Limited (UK), Leeds, UK
- C Barber
- Research Group, Lhasa Limited (UK), Leeds, UK
- S Werner
- Research Group, Lhasa Limited (UK), Leeds, UK
34
Falchi F, Bertozzi SM, Ottonello G, Ruda GF, Colombano G, Fiorelli C, Martucci C, Bertorelli R, Scarpelli R, Cavalli A, Bandiera T, Armirotti A. Kernel-Based, Partial Least Squares Quantitative Structure-Retention Relationship Model for UPLC Retention Time Prediction: A Useful Tool for Metabolite Identification. Anal Chem 2016; 88:9510-9517. [DOI: 10.1021/acs.analchem.6b02075] [Citation(s) in RCA: 32] [Impact Index Per Article: 4.0] [Indexed: 12/18/2022]
Affiliation(s)
- Federico Falchi
- Drug Discovery and Development Department, Fondazione Istituto Italiano di Tecnologia, Via Morego 30, 16163 Genova, Italy
- Sine Mandrup Bertozzi
- Drug Discovery and Development Department, Fondazione Istituto Italiano di Tecnologia, Via Morego 30, 16163 Genova, Italy
- Giuliana Ottonello
- Drug Discovery and Development Department, Fondazione Istituto Italiano di Tecnologia, Via Morego 30, 16163 Genova, Italy
- Gian Filippo Ruda
- Drug Discovery and Development Department, Fondazione Istituto Italiano di Tecnologia, Via Morego 30, 16163 Genova, Italy
- Giampiero Colombano
- Drug Discovery and Development Department, Fondazione Istituto Italiano di Tecnologia, Via Morego 30, 16163 Genova, Italy
- Claudio Fiorelli
- Drug Discovery and Development Department, Fondazione Istituto Italiano di Tecnologia, Via Morego 30, 16163 Genova, Italy
- Cataldo Martucci
- Drug Discovery and Development Department, Fondazione Istituto Italiano di Tecnologia, Via Morego 30, 16163 Genova, Italy
- Rosalia Bertorelli
- Drug Discovery and Development Department, Fondazione Istituto Italiano di Tecnologia, Via Morego 30, 16163 Genova, Italy
- Rita Scarpelli
- Drug Discovery and Development Department, Fondazione Istituto Italiano di Tecnologia, Via Morego 30, 16163 Genova, Italy
- Andrea Cavalli
- Drug Discovery and Development Department, Fondazione Istituto Italiano di Tecnologia, Via Morego 30, 16163 Genova, Italy
- Department of Pharmacy and Biotechnology, University of Bologna, Via Belmeloro 6, 40126 Bologna, Italy
- Tiziano Bandiera
- Drug Discovery and Development Department, Fondazione Istituto Italiano di Tecnologia, Via Morego 30, 16163 Genova, Italy
- Andrea Armirotti
- Drug Discovery and Development Department, Fondazione Istituto Italiano di Tecnologia, Via Morego 30, 16163 Genova, Italy
35
Hong H, Shen J, Ng HW, Sakkiah S, Ye H, Ge W, Gong P, Xiao W, Tong W. A Rat α-Fetoprotein Binding Activity Prediction Model to Facilitate Assessment of the Endocrine Disruption Potential of Environmental Chemicals. International Journal of Environmental Research and Public Health 2016; 13:372. [PMID: 27023588 PMCID: PMC4847034 DOI: 10.3390/ijerph13040372] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.6] [Received: 01/27/2016] [Revised: 03/10/2016] [Accepted: 03/22/2016] [Indexed: 11/21/2022]
Abstract
Endocrine disruptors such as polychlorinated biphenyls (PCBs), diethylstilbestrol (DES) and dichlorodiphenyltrichloroethane (DDT) are agents that interfere with the endocrine system and cause adverse health effects, and they have raised considerable public health concern. One mechanism of endocrine disruption is the binding of endocrine disruptors to hormone receptors in target cells. Entry of endocrine disruptors into target cells is a precondition of endocrine disruption. The binding capability of a chemical with proteins in the blood affects its entry into target cells and is therefore very informative for assessing the potential endocrine disruption of chemicals. α-Fetoprotein is one of the major serum proteins and binds a variety of chemicals such as estrogens. To better facilitate assessment of endocrine disruption by environmental chemicals, we developed a model for predicting α-fetoprotein binding activity using a novel pattern recognition method (Decision Forest) and molecular descriptors calculated from two-dimensional structures by Mold² software. The predictive capability of the model was evaluated through internal validation using 125 training chemicals (average balanced accuracy of 69%) and external validation using 22 chemicals (balanced accuracy of 71%). Prediction confidence analysis revealed that the model performed much better for high-confidence predictions. Our results indicate that the model is useful for endocrine disruption risk assessment of environmental chemicals when predictions have high confidence, although increasing the number of training chemicals is needed to improve it.
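Balanced accuracy, the metric reported above, is the mean of sensitivity and specificity, so a model cannot score well simply by favouring the majority class. A minimal sketch, assuming binary confusion-matrix counts; the function name `balanced_accuracy` and the counts are hypothetical (chosen only to total 22 chemicals, like the external set) and are not the study's results.

```python
def balanced_accuracy(tp: int, tn: int, fp: int, fn: int) -> float:
    """Mean of sensitivity (true positive rate) and specificity (true negative rate).

    Hypothetical helper for illustration; not code from the cited study.
    """
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("both classes must be present in the evaluation set")
    sensitivity = tp / (tp + fn)  # fraction of binders correctly predicted
    specificity = tn / (tn + fp)  # fraction of non-binders correctly predicted
    return (sensitivity + specificity) / 2


# Hypothetical counts for a 22-chemical test set (12 binders, 10 non-binders).
print(balanced_accuracy(tp=9, tn=6, fp=4, fn=3))
```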
Affiliation(s)
- Huixiao Hong
- Division of Bioinformatics and Biostatistics, National Center for Toxicological Research, U.S. Food and Drug Administration, Jefferson, AR 72079, USA.
- Jie Shen
- Division of Bioinformatics and Biostatistics, National Center for Toxicological Research, U.S. Food and Drug Administration, Jefferson, AR 72079, USA.
- Hui Wen Ng
- Division of Bioinformatics and Biostatistics, National Center for Toxicological Research, U.S. Food and Drug Administration, Jefferson, AR 72079, USA.
- Sugunadevi Sakkiah
- Division of Bioinformatics and Biostatistics, National Center for Toxicological Research, U.S. Food and Drug Administration, Jefferson, AR 72079, USA.
- Hao Ye
- Division of Bioinformatics and Biostatistics, National Center for Toxicological Research, U.S. Food and Drug Administration, Jefferson, AR 72079, USA.
- Weigong Ge
- Division of Bioinformatics and Biostatistics, National Center for Toxicological Research, U.S. Food and Drug Administration, Jefferson, AR 72079, USA.
- Ping Gong
- Environmental Laboratory, U.S. Army Engineer Research and Development Center, 3909 Halls Ferry Road, Vicksburg, MS 39180, USA.
- Wenming Xiao
- Division of Bioinformatics and Biostatistics, National Center for Toxicological Research, U.S. Food and Drug Administration, Jefferson, AR 72079, USA.
- Weida Tong
- Division of Bioinformatics and Biostatistics, National Center for Toxicological Research, U.S. Food and Drug Administration, Jefferson, AR 72079, USA.
36
Cherkasov A, Muratov EN, Fourches D, Varnek A, Baskin II, Cronin M, Dearden J, Gramatica P, Martin YC, Todeschini R, Consonni V, Kuz'min VE, Cramer R, Benigni R, Yang C, Rathman J, Terfloth L, Gasteiger J, Richard A, Tropsha A. QSAR modeling: where have you been? Where are you going to? J Med Chem 2014; 57:4977-5010. [PMID: 24351051 PMCID: PMC4074254 DOI: 10.1021/jm4004285] [Citation(s) in RCA: 1040] [Impact Index Per Article: 104.0] [Indexed: 02/08/2023]
Abstract
Quantitative structure-activity relationship modeling is one of the major computational tools employed in medicinal chemistry. However, throughout its entire history it has drawn both praise and criticism concerning its reliability, limitations, successes, and failures. In this paper, we discuss (i) the development and evolution of QSAR; (ii) the current trends, unsolved problems, and pressing challenges; and (iii) several novel and emerging applications of QSAR modeling. Throughout this discussion, we provide guidelines for QSAR development, validation, and application, which are summarized in best practices for building rigorously validated and externally predictive QSAR models. We hope that this Perspective will help communication between computational and experimental chemists toward collaborative development and use of QSAR models. We also believe that the guidelines presented here will help journal editors and reviewers apply more stringent scientific standards to manuscripts reporting new QSAR studies, as well as encourage the use of high quality, validated QSARs for regulatory decision making.
Affiliation(s)
- Artem Cherkasov
- Vancouver Prostate Centre, University of British Columbia, Vancouver, BC, V6H3Z6, Canada
- Eugene N. Muratov
- Laboratory for Molecular Modeling, UNC Eshelman School of Pharmacy, University of North Carolina, Chapel Hill, NC, 27599, USA
- Department of Molecular Structure and Cheminformatics, A.V. Bogatsky Physical-Chemical Institute National Academy of Sciences of Ukraine, Odessa, 65080, Ukraine
- Denis Fourches
- Laboratory for Molecular Modeling, UNC Eshelman School of Pharmacy, University of North Carolina, Chapel Hill, NC, 27599, USA
- Alexandre Varnek
- Department of Chemistry, L. Pasteur University of Strasbourg, Strasbourg, 67000, France
- Igor I. Baskin
- Department of Physics, Lomonosov Moscow State University, Moscow, 119991, Russia
- Mark Cronin
- School of Pharmacy and Biomolecular Sciences, Liverpool John Moores University, Liverpool L3 3AF, UK
- John Dearden
- School of Pharmacy and Biomolecular Sciences, Liverpool John Moores University, Liverpool L3 3AF, UK
- Paola Gramatica
- Department of Structural and Functional Biology, University of Insubria, Varese, 21100, Italy
- Roberto Todeschini
- Milano Chemometrics and QSAR Research Group, University of Milano-Bicocca, Milan, 20126, Italy
- Viviana Consonni
- Milano Chemometrics and QSAR Research Group, University of Milano-Bicocca, Milan, 20126, Italy
- Victor E. Kuz'min
- Department of Molecular Structure and Cheminformatics, A.V. Bogatsky Physical-Chemical Institute National Academy of Sciences of Ukraine, Odessa, 65080, Ukraine
- Romualdo Benigni
- Environment and Health Department, Istituto Superiore di Sanità, Rome, 00161, Italy
- James Rathman
- Altamira LLC, Columbus OH 43235, USA
- Department of Chemical and Biomolecular Engineering, the Ohio State University, Columbus, OH 43215, USA
- Ann Richard
- National Center for Computational Toxicology, U.S. Environmental Protection Agency, Research Triangle Park, NC, 27519, USA
- Alexander Tropsha
- Laboratory for Molecular Modeling, UNC Eshelman School of Pharmacy, University of North Carolina, Chapel Hill, NC, 27599, USA
37
Impact of distance-based metric learning on classification and visualization model performance and structure–activity landscapes. J Comput Aided Mol Des 2014; 28:61-73. [DOI: 10.1007/s10822-014-9719-1] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.6] [Received: 03/15/2013] [Accepted: 01/24/2014] [Indexed: 10/25/2022]
38
Cumming JG, Davis AM, Muresan S, Haeberlein M, Chen H. Chemical predictive modelling to improve compound quality. Nat Rev Drug Discov 2014; 12:948-62. [PMID: 24287782 DOI: 10.1038/nrd4128] [Citation(s) in RCA: 156] [Impact Index Per Article: 15.6] [Indexed: 11/09/2022]
Abstract
The 'quality' of small-molecule drug candidates, encompassing aspects including their potency, selectivity and ADMET (absorption, distribution, metabolism, excretion and toxicity) characteristics, is a key factor influencing the chances of success in clinical trials. Importantly, such characteristics are under the control of chemists during the identification and optimization of lead compounds. Here, we discuss the application of computational methods, particularly quantitative structure-activity relationships (QSARs), in guiding the selection of higher-quality drug candidates, as well as cultural factors that may have affected their use and impact.
Affiliation(s)
- John G Cumming
- Chemistry Innovation Centre, Discovery Sciences, AstraZeneca R&D, Alderley Park, Macclesfield SK10 4TG, UK
39
Dander A, Mueller LA, Gallasch R, Pabinger S, Emmert-Streib F, Graber A, Dehmer M. [COMMODE] a large-scale database of molecular descriptors using compounds from PubChem. Source Code for Biology and Medicine 2013; 8:22. [PMID: 24225386 PMCID: PMC3831596 DOI: 10.1186/1751-0473-8-22] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Received: 08/30/2013] [Accepted: 10/29/2013] [Indexed: 11/11/2022]
Abstract
Background: Molecular descriptors have been extensively used in the field of structure-oriented drug design and structural chemistry. They have been applied in QSPR and QSAR models to predict ADME-Tox properties, which specify essential features for drugs. Molecular descriptors capture chemical and structural information, but investigating their interpretation and meaning remains very challenging. Results: This paper introduces a large-scale database of molecular descriptors called COMMODE, containing more than 25 million compounds originating from PubChem. About 2500 DRAGON descriptors have been calculated for all compounds and integrated into this database, which is accessible through a web interface at http://commode.i-med.ac.at.
Affiliation(s)
- Matthias Dehmer
- UMIT, Division for Bioinformatics and Translational Research, Eduard Wallnoefer Zentrum 1, A-6060 Hall in Tyrol, Austria.
40
Polishchuk PG, Kuz'min VE, Artemenko AG, Muratov EN. Universal Approach for Structural Interpretation of QSAR/QSPR Models. Mol Inform 2013; 32:843-53. [DOI: 10.1002/minf.201300029] [Citation(s) in RCA: 45] [Impact Index Per Article: 4.1] [Received: 02/20/2013] [Accepted: 07/29/2013] [Indexed: 11/07/2022]
41
Phatak SS, Stephan CC, Cavasotto CN. High-throughput and in silico screenings in drug discovery. Expert Opin Drug Discov 2013; 4:947-59. [PMID: 23480542 DOI: 10.1517/17460440903190961] [Citation(s) in RCA: 55] [Impact Index Per Article: 5.0] [Indexed: 11/05/2022]
Abstract
BACKGROUND: In the current situation of weak drug pipelines, impending patent expiration of several blockbuster drugs, industry consolidation and changing business models that target special diseases such as cancer, diabetes, Alzheimer's and obesity, the pharmaceutical industry is under intense pressure to generate a strong drug pipeline distinguished by better productivity, diversity and cost-effectiveness. The goal is to discover high-quality leads in the initial stages of the development cycle in order to minimize the costs associated with failures at later ones. OBJECTIVE: Thus, there is great interest in further developing and optimizing high-throughput screening and in silico screening, the two methods responsible for generating most lead compounds. Although high-throughput screening is the predominant starting point for discovery programs, in silico methods, with their more rational approach, have gradually made inroads in expediting the drug discovery and development process. CONCLUSION: Modern drug discovery strategies use both methods in tandem or iteratively. This review primarily provides a succinct overview and comparison of experimental and in silico screening techniques; selected case studies in which both methods were used in concert, to investigate their performance and complementary nature; and a statement on near-future developments in experimental and in silico approaches.
Affiliation(s)
- Sharangdhar S Phatak
- The University of Texas Health Science Center at Houston, School of Health Information Sciences, 7000 Fannin, Suite 860B, Houston, TX 77030, USA; +1 713 500 3934; +1 713 500 3907
42
Abstract
Understanding structure-activity relationships (SARs) for a given set of molecules allows one to rationally explore chemical space and develop a chemical series optimizing multiple physicochemical and biological properties simultaneously, for instance, improving potency, reducing toxicity, and ensuring sufficient bioavailability. In silico methods allow rapid and efficient characterization of SARs and facilitate building a variety of models to capture and encode one or more SARs, which can then be used to predict activities for new molecules. By coupling these methods with in silico modifications of structures, one can easily prioritize large screening decks or even generate new compounds de novo and ascertain whether they belong to the SAR being studied. Computational methods can provide a guide for the experienced user by integrating and summarizing large amounts of preexisting data to suggest useful structural modifications. This chapter highlights the different types of SAR modeling methods and how they support the task of exploring chemical space to elucidate and optimize SARs in a drug discovery setting. In addition to considering modeling algorithms, I briefly discuss how to use databases as a source of SAR data to inform and enhance the exploration of SAR trends. I also review common modeling techniques that are used to encode SARs, recent work in the area of structure-activity landscapes, the role of SAR databases, and alternative approaches to exploring SAR data that do not involve explicit model development.
Affiliation(s)
- Rajarshi Guha
- NIH Center for Advancing Translational Science, Rockville, MD, USA
43
Lamchouri F, Toufik H, Elmalki Z, Bouzzine SM, Ait Malek H, Hamidi M, Bouachrine M. Quantitative structure–activity relationship of antitumor and neurotoxic β-carbolines alkaloids: nine harmine derivatives. Research on Chemical Intermediates 2012. [DOI: 10.1007/s11164-012-0752-1] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.0] [Indexed: 11/27/2022]
44
Varnek A, Baskin I. Machine learning methods for property prediction in chemoinformatics: Quo Vadis? J Chem Inf Model 2012; 52:1413-37. [PMID: 22582859 DOI: 10.1021/ci200409x] [Citation(s) in RCA: 148] [Impact Index Per Article: 12.3] [Indexed: 01/19/2023]
Abstract
This paper focuses on modern approaches to machine learning, most of which are as yet used infrequently or not at all in chemoinformatics. Machine learning methods are characterized in terms of the "modes of statistical inference" and "modeling levels" nomenclature and by considering different facets of modeling with respect to input/output matching, data types, model duality, and model inference. Particular attention is paid to new approaches and concepts that may provide efficient solutions to common problems in chemoinformatics: improving the predictive performance of structure-property (activity) models, generating structures possessing desirable properties, model applicability domain, modeling properties with functional endpoints (e.g., phase diagrams and dose-response curves), and accounting for multiple molecular species (e.g., conformers or tautomers).
Affiliation(s)
- Alexandre Varnek
- Laboratoire d'Infochimie, UMR 7177 CNRS, Université de Strasbourg, 4, rue B. Pascal, Strasbourg 67000, France.
45
Powerful Integrative Tool Combining Structure Generator and Chemical Space Visualization. Journal of Computer Aided Chemistry 2012. [DOI: 10.2751/jcac.13.1] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.2] [Indexed: 11/08/2022]
46
Sukumar N, Krein MP, Embrechts MJ. Predictive cheminformatics in drug discovery: statistical modeling for analysis of micro-array and gene expression data. Methods Mol Biol 2012; 910:165-94. [PMID: 22821597 DOI: 10.1007/978-1-61779-965-5_9] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.3] [Indexed: 01/23/2023]
Abstract
The vast amounts of chemical and biological data available through robotic high-throughput assays and micro-array technologies require computational techniques for visualization, analysis, and predictive modeling. Predictive cheminformatics and bioinformatics employ statistical methods to mine these data for hidden correlations and to retrieve molecules or genes with desirable biological activity from large databases for the purpose of drug development. While many statistical methods are commonly employed and widely accessible, their proper use requires due consideration of data representation and preprocessing, model validation and domain of applicability estimation, similarity assessment, the nature of the structure-activity landscape, and model interpretation. This chapter reviews these considerations in light of the current state of the art in statistical modeling and summarizes best practices in predictive cheminformatics.
Affiliation(s)
- N Sukumar
- Rensselaer Exploratory Center for Cheminformatics Research and Department of Chemistry and Chemical Biology, Rensselaer Polytechnic Institute, Troy, NY, USA.
47
Hutter MC. Determining the Degree of Randomness of Descriptors in Linear Regression Equations with Respect to the Data Size. J Chem Inf Model 2011; 51:3099-104. [DOI: 10.1021/ci200403j] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.5] [Indexed: 11/30/2022]
Affiliation(s)
- Michael C. Hutter
- Center for Bioinformatics, Campus Building E2.1, Saarland University, 66123 Saarbrücken, Germany
48
Carbon-Mangels M, Hutter MC. Selecting Relevant Descriptors for Classification by Bayesian Estimates: A Comparison with Decision Trees and Support Vector Machines Approaches for Disparate Data Sets. Mol Inform 2011; 30:885-95. [PMID: 27468108 DOI: 10.1002/minf.201100069] [Citation(s) in RCA: 23] [Impact Index Per Article: 1.8] [Received: 04/28/2011] [Accepted: 08/19/2011] [Indexed: 11/12/2022]
Abstract
Classification algorithms suffer from the curse of dimensionality, which leads to overfitting, particularly if the problem is over-determined. It is therefore of particular interest to identify the most relevant descriptors in order to reduce model complexity. We applied Bayesian estimates to model the probability distributions of descriptor values used for binary classification, using n-fold cross-validation. As a measure of the discriminative power of the classifiers, the symmetric form of the Kullback-Leibler divergence of their probability distributions was computed. We found that the most relevant descriptors possess a Gaussian-like distribution of values, show the largest divergences, and therefore appear most often in the cross-validation scenario. The results were compared with those of the LASSO feature selection method applied to multiple decision trees and support vector machine approaches for data sets of substrates and nonsubstrates of three cytochrome P450 isoenzymes, which comprise strongly unbalanced compound distributions. In contrast to decision trees and support vector machines, the performance of Bayesian estimates is less affected by unbalanced data sets. This strategy reveals those descriptors that allow a simple linear separation of the classes, whereas the superior accuracy of decision trees and support vector machines can be attributed to nonlinear separation, which in turn makes them more prone to overfitting.
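The symmetric Kullback-Leibler divergence named above has a closed form when both class-conditional distributions of a descriptor are univariate Gaussians. The sketch below assumes that simplification (motivated by the abstract's observation that the most relevant descriptors have Gaussian-like value distributions) and takes the symmetric form as KL(p||q) + KL(q||p); some authors use the average of the two instead. It is an illustration, not the authors' Bayesian estimation procedure; the function names, descriptor names and class statistics are hypothetical.

```python
import math


def kl_gaussian(mu_p: float, sigma_p: float, mu_q: float, sigma_q: float) -> float:
    """KL(p || q) for univariate normals p = N(mu_p, sigma_p^2) and q = N(mu_q, sigma_q^2)."""
    return (
        math.log(sigma_q / sigma_p)
        + (sigma_p**2 + (mu_p - mu_q) ** 2) / (2 * sigma_q**2)
        - 0.5
    )


def symmetric_kl_gaussian(mu_p: float, sigma_p: float, mu_q: float, sigma_q: float) -> float:
    """Symmetric KL divergence, taken here as KL(p || q) + KL(q || p)."""
    return kl_gaussian(mu_p, sigma_p, mu_q, sigma_q) + kl_gaussian(mu_q, sigma_q, mu_p, sigma_p)


# Hypothetical per-class (mean, std) of three descriptors: (substrates, nonsubstrates).
descriptor_stats = {
    "desc_A": ((1.2, 0.5), (0.3, 0.6)),
    "desc_B": ((0.9, 1.0), (0.8, 1.1)),
    "desc_C": ((2.0, 0.4), (2.1, 0.9)),
}

# Rank descriptors by how strongly their class distributions diverge.
ranking = sorted(
    descriptor_stats,
    key=lambda name: symmetric_kl_gaussian(*descriptor_stats[name][0], *descriptor_stats[name][1]),
    reverse=True,
)
print(ranking)
```

Note that the divergence grows not only with the gap between class means but also with a mismatch in spreads, as with the hypothetical `desc_C`, whose means almost coincide.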
Affiliation(s)
- Miriam Carbon-Mangels
- Section of Biostatistics, Paul-Ehrlich-Institut, Federal Institute for Vaccines and Biomedicines, Paul-Ehrlich-Straße 51-59, 63225 Langen, Germany
- Michael C Hutter
- Center for Bioinformatics, Saarland University, Campus Building E2.1, 66123 Saarbrücken, Germany; phone/fax: +49 681 302 70703/70702
49
Soto AJ, Vazquez GE, Strickert M, Ponzoni I. Target-Driven Subspace Mapping Methods and Their Applicability Domain Estimation. Mol Inform 2011; 30:779-89. [DOI: 10.1002/minf.201100053] [Citation(s) in RCA: 16] [Impact Index Per Article: 1.2] [Received: 04/05/2011] [Accepted: 05/26/2011] [Indexed: 11/06/2022]
50
Abstract
'It is better to be useful than perfect'. This review attempts to critically cover and assess the currently available approaches and tools to answer the crucial question: Is it possible (and if it is, to what extent is it possible) to predict in vivo metabolites and their abundances on the basis of in vitro and preclinical animal studies? In preclinical drug development, it is possible to produce metabolite patterns from a candidate drug by virtual means (i.e., in silico models), but these are not yet validated. However, they may be useful to cover the potential range of metabolites. In vitro metabolite patterns and apparent relative abundances are produced by various in vitro systems employing tissue preparations (mainly liver) and in most cases using liquid chromatography-mass spectrometry analytical techniques for tentative identification. The pattern of the metabolites produced depends on the enzyme source; the most comprehensive source of drug-metabolizing enzymes is cultured human hepatocytes, followed by liver homogenate fortified with appropriate cofactors. For specific purposes, such as the identification of metabolizing enzyme(s), recombinant enzymes can be used. Metabolite data from animal in vitro and in vivo experiments, despite known species differences, may help pinpoint metabolites that are not apparently produced in in vitro human systems, or suggest alternative experimental approaches. The range of metabolites detected provides clues regarding the enzymes attacking the molecule under study. We also discuss established approaches to identify the major enzymes. The last question, regarding reliability and robustness of metabolite extrapolations from in vitro to in vivo, both qualitatively and quantitatively, cannot be easily answered. There are a number of examples in the literature suggesting that extrapolations are generally useful, but there are only a few systematic and comprehensive studies to validate in vitro-in vivo extrapolations. 
In conclusion, extrapolation from preclinical metabolite data to the in vivo situation is certainly useful, but it is not known to what extent.