1. Dutschmann TM, Schlenker V, Baumann K. Chemoinformatic regression methods and their applicability domain. Mol Inform 2024; 43:e202400018. [PMID: 38803302 DOI: 10.1002/minf.202400018] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/16/2024] [Revised: 03/24/2024] [Accepted: 03/25/2024] [Indexed: 05/29/2024]
Abstract
The growing interest in chemoinformatic model uncertainty calls for a summary of the most widely used regression techniques and how to estimate their reliability. Regression models learn a mapping from the space of explanatory variables to the space of continuous output values. Among other limitations, the predictive performance of the model is restricted by the training data used for model fitting. Identification of unusual objects by outlier detection methods can improve model performance. Additionally, proper model evaluation necessitates defining the limitations of the model, often called the applicability domain. Comparable to certain classifiers, some regression techniques come with built-in methods or augmentations to quantify their (un)certainty, while others rely on generic procedures. The theoretical background of their working principles and how to deduce specific and general definitions for their domain of applicability shall be explained.
Affiliation(s)
- Thomas-Martin Dutschmann: Institute of Medicinal and Pharmaceutical Chemistry, University of Technology Braunschweig, 38106, Braunschweig, Germany
- Valerie Schlenker: Institute of Medicinal and Pharmaceutical Chemistry, University of Technology Braunschweig, 38106, Braunschweig, Germany
- Knut Baumann: Institute of Medicinal and Pharmaceutical Chemistry, University of Technology Braunschweig, 38106, Braunschweig, Germany
2. Kumar S, Bhowmik R, Oh JM, Abdelgawad MA, Ghoneim MM, Al-Serwi RH, Kim H, Mathew B. Machine learning driven web-based app platform for the discovery of monoamine oxidase B inhibitors. Sci Rep 2024; 14:4868. [PMID: 38418571 PMCID: PMC10901862 DOI: 10.1038/s41598-024-55628-y] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/04/2024] [Accepted: 02/26/2024] [Indexed: 03/01/2024] Open
Abstract
Monoamine oxidases (MAOs), specifically MAO-A and MAO-B, play important roles in the breakdown of monoamine neurotransmitters. Therefore, MAO inhibitors are crucial for treating various neurodegenerative disorders, including Parkinson's disease (PD), Alzheimer's disease (AD), and amyotrophic lateral sclerosis (ALS). In this study, we developed a novel cheminformatics pipeline by generating three diverse molecular feature-based machine learning-assisted quantitative structural activity relationship (ML-QSAR) models concerning MAO-B inhibition. PubChem fingerprints, substructure fingerprints, and one-dimensional (1D) and two-dimensional (2D) molecular descriptors were implemented to unravel the structural insights responsible for decoding the origin of MAO-B inhibition in 249 non-redundant molecules. Based on a random forest ML algorithm, the final PubChem fingerprint, substructure fingerprint, and 1D and 2D molecular descriptor prediction models demonstrated significant robustness, with correlation coefficients of 0.9863, 0.9796, and 0.9852, respectively. The significant features of each predictive model responsible for MAO-B inhibition were extracted using a comprehensive variance importance plot (VIP) and correlation matrix analysis. The final predictive models were further developed as a web application, MAO-B-pred (https://mao-b-pred.streamlit.app/), to allow users to predict the bioactivity of molecules against MAO-B. Molecular docking and dynamics studies were conducted to gain insight into the atomic-level molecular interactions between the ligand-receptor complexes. These findings were compared with the structural features obtained from the ML-QSAR models, which supported the mechanistic understanding of the binding phenomena. The presented models have the potential to serve as tools for identifying crucial molecular characteristics for the rational design of MAO-B target inhibitors, which may be used to develop effective drugs for neurodegenerative disorders.
Affiliation(s)
- Sunil Kumar: Department of Pharmaceutical Chemistry, Amrita School of Pharmacy, Amrita Vishwa Vidyapeetham, AIMS Health Sciences Campus, Kochi, India
- Ratul Bhowmik: Department of Pharmaceutical Chemistry, School of Pharmaceutical Education and Research, Jamia Hamdard, New Delhi, India
- Jong Min Oh: Department of Pharmacy, and Research Institute of Life Pharmaceutical Sciences, Sunchon National University, Suncheon, 57922, Republic of Korea
- Mohamed A Abdelgawad: Department of Pharmaceutical Chemistry, College of Pharmacy, Jouf University, 72341, Sakaka, Aljouf, Saudi Arabia
- Mohammed M Ghoneim: Department of Pharmacy Practice, College of Pharmacy, AlMaarefa University, 13713, Ad Diriyah, Riyadh, Saudi Arabia
- Rasha Hamed Al-Serwi: Department of Basic Dental Sciences, College of Dentistry, Princess Nourah Bint Abdulrahman University, P.O. Box 84428, 11671, Riyadh, Saudi Arabia
- Hoon Kim: Department of Pharmacy, and Research Institute of Life Pharmaceutical Sciences, Sunchon National University, Suncheon, 57922, Republic of Korea
- Bijo Mathew: Department of Pharmaceutical Chemistry, Amrita School of Pharmacy, Amrita Vishwa Vidyapeetham, AIMS Health Sciences Campus, Kochi, India
3. Jia X, Wang T, Zhu H. Advancing Computational Toxicology by Interpretable Machine Learning. Environmental Science & Technology 2023; 57:17690-17706. [PMID: 37224004 PMCID: PMC10666545 DOI: 10.1021/acs.est.3c00653] [Citation(s) in RCA: 32] [Impact Index Per Article: 16.0] [Received: 01/25/2023] [Revised: 05/05/2023] [Accepted: 05/05/2023] [Indexed: 05/26/2023]
Abstract
Chemical toxicity evaluations for drugs, consumer products, and environmental chemicals have a critical impact on human health. Traditional animal models to evaluate chemical toxicity are expensive, time-consuming, and often fail to detect toxicants in humans. Computational toxicology is a promising alternative approach that utilizes machine learning (ML) and deep learning (DL) techniques to predict the toxicity potentials of chemicals. Although the applications of ML- and DL-based computational models in chemical toxicity predictions are attractive, many toxicity models are "black boxes" in nature and difficult to interpret by toxicologists, which hampers the chemical risk assessments using these models. The recent progress of interpretable ML (IML) in the computer science field meets this urgent need to unveil the underlying toxicity mechanisms and elucidate the domain knowledge of toxicity models. In this review, we focused on the applications of IML in computational toxicology, including toxicity feature data, model interpretation methods, use of knowledge base frameworks in IML development, and recent applications. The challenges and future directions of IML modeling in toxicology are also discussed. We hope this review can encourage efforts in developing interpretable models with new IML algorithms that can assist new chemical assessments by illustrating toxicity mechanisms in humans.
Affiliation(s)
- Xuelian Jia: Department of Chemistry and Biochemistry, Rowan University, Glassboro, New Jersey 08028, United States
- Tong Wang: Department of Chemistry and Biochemistry, Rowan University, Glassboro, New Jersey 08028, United States
- Hao Zhu: Department of Chemistry and Biochemistry, Rowan University, Glassboro, New Jersey 08028, United States
4. Hunt ML, Blackburn GA, Siriwardena GM, Carrasco L, Rowland CS. Using satellite data to assess spatial drivers of bird diversity. Remote Sensing in Ecology and Conservation 2023; 9:483-500. [PMID: 38505567 PMCID: PMC10946777 DOI: 10.1002/rse2.322] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/22/2022] [Revised: 11/15/2022] [Accepted: 11/28/2022] [Indexed: 03/21/2024]
Abstract
Birds are useful indicators of overall biodiversity, which continues to decline globally, despite targets to reduce its loss. The aim of this paper is to understand the importance of different spatial drivers for modelling bird distributions. Specifically, it assesses the importance of satellite-derived measures of habitat productivity, heterogeneity and landscape structure for modelling bird diversity across Great Britain. Random forest (RF) regression is used to assess the extent to which a combination of satellite-derived covariates explains woodland and farmland bird diversity and richness. Feature contribution analysis is then applied to assess the relationships between the response variable and the covariates in the final RF models. We show that much of the variation in farmland and woodland bird distributions is explained (R² 0.64-0.77) using monthly habitat-specific productivity values and landscape structure (FRAGSTATS) metrics. The analysis highlights important spatial drivers of bird species richness and diversity, including high productivity grassland during spring for farmland birds and woodland patch edge length for woodland birds. The feature contribution provides insight into the form of the relationship between the spatial drivers and bird richness and diversity, including when a particular spatial driver affects bird richness positively or negatively. For example, for woodland bird diversity, the May 80th percentile Normalized Difference Vegetation Index (NDVI) for broadleaved woodland has a strong positive effect on bird richness when NDVI is >0.7 and a strong negative effect below. If relationships such as these are stable over time, they offer a useful analytical tool for understanding and comparing the influence of different spatial drivers.
Affiliation(s)
- Merryn L. Hunt: UK Centre for Ecology & Hydrology, Lancaster Environment Centre, Lancaster University, Lancaster, LA1 4YQ, United Kingdom
- Gavin M. Siriwardena: British Trust for Ornithology, The Nunnery, Thetford, Norfolk, IP24 2PU, United Kingdom
- Clare S. Rowland: UK Centre for Ecology & Hydrology, Lancaster Environment Centre, Lancaster University, Lancaster, LA1 4YQ, United Kingdom
5. Belfield SJ, Cronin MTD, Enoch SJ, Firman JW. Guidance for good practice in the application of machine learning in development of toxicological quantitative structure-activity relationships (QSARs). PLoS One 2023; 18:e0282924. [PMID: 37163504 PMCID: PMC10171609 DOI: 10.1371/journal.pone.0282924] [Citation(s) in RCA: 11] [Impact Index Per Article: 5.5] [Received: 10/11/2022] [Accepted: 02/26/2023] [Indexed: 05/12/2023] Open
Abstract
Recent years have seen substantial growth in the adoption of machine learning approaches for the purposes of quantitative structure-activity relationship (QSAR) development. This trend has coincided with a desire to shift the focus of methodology employed within chemical safety assessment: away from traditional reliance upon animal-intensive in vivo protocols, and towards increased application of in silico (or computational) predictive toxicology. With QSAR central amongst the techniques applied in this area, algorithms trained through machine learning with the objective of toxicity estimation have, quite naturally, emerged. On account of the pattern-recognition capabilities of the underlying methods, the statistical power of the ensuing models is potentially considerable, and appropriate for the handling even of vast, heterogeneous datasets. However, such potency comes at a price: it manifests as the general practical deficits observed with respect to the reproducibility, interpretability and generalisability of the resulting tools. Unsurprisingly, these elements have served to hinder broader uptake (most notably within a regulatory setting). Areas of uncertainty liable to accompany (and hence detract from the applicability of) toxicological QSAR have previously been highlighted, accompanied by suggestions for "best practice" aimed at mitigating their influence. However, the scope of such exercises has remained limited to "classical" QSAR, that is, QSAR conducted through use of linear regression and related techniques, with the adoption of comparatively few features or descriptors. Accordingly, the intention of this study has been to extend the remit of best practice guidance, so as to address concerns specific to the employment of machine learning within the field. In doing so, the impact of strategies aimed at enhancing the transparency (feature importance, feature reduction), generalisability (cross-validation) and predictive power (hyperparameter optimisation) of algorithms, trained upon real toxicity data through six common learning approaches, is evaluated.
Affiliation(s)
- Samuel J Belfield: School of Pharmacy and Biomolecular Sciences, Liverpool John Moores University, Liverpool, United Kingdom
- Mark T D Cronin: School of Pharmacy and Biomolecular Sciences, Liverpool John Moores University, Liverpool, United Kingdom
- Steven J Enoch: School of Pharmacy and Biomolecular Sciences, Liverpool John Moores University, Liverpool, United Kingdom
- James W Firman: School of Pharmacy and Biomolecular Sciences, Liverpool John Moores University, Liverpool, United Kingdom
6. Feng D, Baumgartner R. A closer look at the kernels generated by the decision and regression tree ensembles. Stat Biopharm Res 2022. [DOI: 10.1080/19466315.2022.2150680] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 12/13/2022]
Affiliation(s)
- Dai Feng: Data and Statistical Sciences, AbbVie Inc., North Chicago, IL, USA
7. An Effective Ensemble Machine Learning Approach to Classify Breast Cancer Based on Feature Selection and Lesion Segmentation Using Preprocessed Mammograms. Biology 2022; 11:biology11111654. [PMID: 36421368 PMCID: PMC9687739 DOI: 10.3390/biology11111654] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.7] [Received: 10/04/2022] [Revised: 10/30/2022] [Accepted: 11/09/2022] [Indexed: 11/16/2022]
Abstract
Background: Breast cancer, behind skin cancer, is the second most frequent malignancy among women, initiated by unregulated cell division in breast tissues. Although early mammogram screening and treatment result in decreased mortality, differentiating cancer cells from surrounding tissues is often fallible, resulting in erroneous diagnosis. Method: The mammography dataset is used to categorize breast cancer into four classes with low computational complexity, introducing a feature extraction-based approach with machine learning (ML) algorithms. After artefact removal and the preprocessing of the mammograms, the dataset is augmented with seven augmentation techniques. The region of interest (ROI) is extracted by employing several algorithms including a dynamic thresholding method. Sixteen geometrical features are extracted from the ROI, and eleven ML algorithms are investigated with these features. Three ensemble models are generated from these ML models employing the stacking method, where the first ensemble model is built by stacking ML models with an accuracy of over 90% and the accuracy thresholds for generating the rest of the ensemble models are >95% and >96%. Five feature selection methods with fourteen configurations are applied to improve the performance. Results: The Random Forest Importance algorithm, with a threshold of 0.045, produces 10 features that achieve the highest performance, with 98.05% test accuracy, by stacking the Random Forest and XGB classifiers, each having an accuracy above 96%. Furthermore, with K-fold cross-validation, consistent performance is observed across all K values ranging from 3 to 30. Moreover, the proposed strategy combining image processing, feature extraction and ML has a proven high accuracy in classifying breast cancer.
8. Beers AT, Frey SN. Greater sage‐grouse habitat selection varies across the marginal habitat of its lagging range margin. Ecosphere 2022. [DOI: 10.1002/ecs2.4146] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/09/2022] Open
Affiliation(s)
- Aidan T. Beers: Department of Wildland Resources, Utah State University, Logan, Utah, USA
- Shandra N. Frey: Department of Wildland Resources, Utah State University, Logan, Utah, USA
9. Du S, Wang X, Wang R, Lu L, Luo Y, You G, Wu S. Machine-learning-assisted molecular design of phenylnaphthylamine-type antioxidants. Phys Chem Chem Phys 2022; 24:13399-13410. [PMID: 35608602 DOI: 10.1039/d2cp00083k] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Indexed: 11/21/2022]
Abstract
In this study, a total of 302 molecular structures of phenylnaphthylamine antioxidants based on N-phenyl-1-naphthylamine and N-phenyl-2-naphthylamine skeletons with various substituents were modeled by exhaustive methods. Antioxidant parameters, including the hydrogen dissociation energy, solubility parameter, and binding energy, were calculated through molecular simulations. Then, a group decomposition scheme was determined to decompose 302 antioxidants. The antioxidant parameters and decomposition results constituted machine-learning data sets. Using an artificial neural network model, a correlation coefficient between the predicted and true values above 0.88 and an average relative error within 6% were achieved. Random forest models were used to analyze the factors affecting antioxidant activity from chemical and physical perspectives; the results showed that amino and alkyl groups were conducive to improving antioxidant performance. Moreover, substituent positions 1, 7, and 10 of N-phenyl-1-naphthylamine and 3, 7, and 10 of N-phenyl-2-naphthylamine were found to be the optimal positions for modifications to improve antioxidant activity. Two potentially efficient phenylnaphthylamine antioxidant structures were proposed and their antioxidant parameters were also calculated; the hydrogen dissociation energy and solubility parameter decreased by more than 9% and 7%, respectively, whereas the binding energy increased by more than 16% compared with the benchmark of N-phenyl-1-naphthylamine. These results indicate that molecular simulation and machine learning could provide alternative tools for the molecular design of new antioxidants.
Affiliation(s)
- Shanda Du: State Key Laboratory of Organic-Inorganic Composites, Beijing University of Chemical Technology, Beijing 100029, China
- Xiujuan Wang: Key Laboratory of Rubber-Plastics, Ministry of Education/Shandong Provincial Key Laboratory of Rubber-Plastics, Qingdao University of Science & Technology, Qingdao 266042, China
- Runguo Wang: State Key Laboratory of Organic-Inorganic Composites, Beijing University of Chemical Technology, Beijing 100029, China
- Ling Lu: State Key Laboratory of Organic-Inorganic Composites, Beijing University of Chemical Technology, Beijing 100029, China
- Yanlong Luo: College of Science, Nanjing Forestry University, Nanjing 210037, China
- Guohua You: College of Information Science and Technology, Beijing University of Chemical Technology, Beijing 100029, China
- Sizhu Wu: State Key Laboratory of Organic-Inorganic Composites, Beijing University of Chemical Technology, Beijing 100029, China
10. Using adverse outcome pathways to contextualise (Q)SAR predictions for reproductive toxicity – A case study with aromatase inhibition. Reprod Toxicol 2022; 108:43-55. [DOI: 10.1016/j.reprotox.2022.01.004] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 10/29/2021] [Revised: 01/14/2022] [Accepted: 01/21/2022] [Indexed: 12/22/2022]
11. Dekina S. QSAR analysis of the effect of metal ions on the peptidase activity of Bacillus thuringiensis var. israelensis IMV B-7465. Innovative Biosystems and Bioengineering 2021. [DOI: 10.20535/ibb.2021.5.4.243373] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/31/2022] Open
Abstract
Background. The catalytic activity of enzymes, which is their most important characteristic, can change significantly under the influence of effectors, for example, metal ions, and is the subject of special studies that are important for biochemistry, biotechnology, medicine, and other branches of science. Usually, the activity of enzymes in the presence of metals is assessed by the change in the rate of the enzymatic reaction. However, conducting similar experimental studies, especially for new enzymes, as in the case of peptidase Bacillus thuringiensis var. israelensis IMV B-7465, requires significant resources and extensive kinetic research. Therefore, it is advisable to use the methods of computer chemistry, the basic task of which is to search for the structure-property relationship, to build a model that can, with a high degree of probability, assess the effect of metal ions on the activity of peptidase.
Objective: to develop QSAR models to analyze and predict the effect of metal ions on the activity of peptidase Bacillus thuringiensis var. israelensis IMV B-7465.
Methods: the effect of metal ions was studied by determining the proteolytic activity of peptidase after joint incubation for 30 min in 0.0167 M Tris-HCl buffer solution (pH 7.5, 37 °C). The final concentration of the metal chlorides (Li+, Na+, K+, Cs+, Cu2+, Be2+, Mg2+, Ca2+, Sr2+, Ba2+, Zn2+, Cd2+, Hg2+, Cr3+, Mn2+, Co2+, Ni2+) in the buffer solution was 4 mmol/dm3. To search for the quantitative "structure-property" relationship, we used reference data on the properties of metal ions together with the trend vector and random forest methods.
Results: regarding the effect of metal ions on the proteolytic activity of peptidase Bacillus thuringiensis var. israelensis IMV B-7465, some metal ions (Li+, Mn2+, and Co2+) activated the peptidase, while others (Cu2+, Be2+, Cd2+, Hg2+, Cr3+) inhibited the enzyme activity. Adequate statistical models without classification errors and prediction errors for the test set were constructed using the nonlinear trend vector and random forest methods. Both models show that the most important characteristics of metal ions that affect enzyme activity are electronegativity (ENPol), first ionization potential (IP1), the entropy of ions in aqueous solution (S) and the electron affinity energy (Eae).
Conclusions: methods of QSAR analysis in combination with the nonlinear trend vector and random forest methods make it possible to adequately describe the influence of metal ions on the activity of peptidase Bacillus thuringiensis var. israelensis IMV B-7465 by means of descriptors that reflect a certain balance of their electron-donor and electron-acceptor properties (electronegativity, first ionization potential, electron affinity energy) and the degree of structuring of the hydrate shell (entropy of solvation). Both statistical methods give similar values for the importance of descriptors, but only the trend vector method allows the direction of influence of specific characteristics of ions to be analyzed.
12. Mater AC, Coote ML. Explainable Molecular Sets: Using Information Theory to Generate Meaningful Descriptions of Groups of Molecules. J Chem Inf Model 2021; 61:4877-4889. [PMID: 34636543 DOI: 10.1021/acs.jcim.1c00519] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/28/2022]
Abstract
Algorithmically identifying the meaningful similarities between an assortment of molecules is a critical chemical problem, and one which is only gaining in relevance as data-driven chemistry continues to progress. Effectively addressing this challenge can be achieved through a reformulation of the problem into information theory, cluster-based supervised classification, and the implementation of key concepts, particularly information entropy and mutual information. These concepts are combined with unsupervised learning atop learned chemical spaces to generate meaningful labels for arbitrary collections of molecules. An open-source and highly extensible codebase is provided to undertake these experiments, demonstrate the viability of the approach on known clusters, and glean insights into the learned representations of chemical space within message-passing neural networks, an architecture not readily permitting interpretability. This approach facilitates the interoperability between human chemical knowledge and the algorithmically derived insights, which will continue to become more prevalent in the coming years.
Affiliation(s)
- Adam C Mater: Research School of Chemistry, Australian National University, Canberra, Australian Capital Territory 2601, Australia
- Michelle L Coote: Research School of Chemistry, Australian National University, Canberra, Australian Capital Territory 2601, Australia
13. Machine Learning augmented docking studies of aminothioureas at the SARS-CoV-2-ACE2 interface. PLoS One 2021; 16:e0256834. [PMID: 34499662 PMCID: PMC8428716 DOI: 10.1371/journal.pone.0256834] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 06/12/2021] [Accepted: 08/16/2021] [Indexed: 11/19/2022] Open
Abstract
The current pandemic outbreak clearly indicated the urgent need for tools allowing fast predictions of bioactivity of a large number of compounds, either available or at least synthesizable. In the computational chemistry toolbox, several such tools are available, with the main ones being docking and structure-activity relationship modeling either by classical linear QSAR or Machine Learning techniques. In this contribution, we focus on the comparison of the results obtained using different docking protocols on the example of the search for bioactivity of compounds containing N-N-C(S)-N scaffold at the S-protein of SARS-CoV-2 virus with ACE2 human receptor interface. Based on over 1800 structures in the training set we have predicted binding properties of the complete set of nearly 600000 structures from the same class using the Machine Learning Random Forest Regressor approach.
14. Ye Z, Yang W, Yang Y, Ouyang D. Interpretable machine learning methods for in vitro pharmaceutical formulation development. Food Frontiers 2021. [DOI: 10.1002/fft2.78] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.0] [Indexed: 12/21/2022] Open
Affiliation(s)
- Zhuyifan Ye: State Key Laboratory of Quality Research in Chinese Medicine, Institute of Chinese Medical Sciences (ICMS), University of Macau, Macau, China
- Wenmian Yang: State Key Laboratory of Internet of Things for Smart City, University of Macau, Macau, China
- Yilong Yang: School of Software, Beihang University, Beijing, China
- Defang Ouyang: State Key Laboratory of Quality Research in Chinese Medicine, Institute of Chinese Medical Sciences (ICMS), University of Macau, Macau, China
15. Ding J, Xu N, Nguyen MT, Qiao Q, Shi Y, He Y, Shao Q. Machine learning for molecular thermodynamics. Chin J Chem Eng 2021. [DOI: 10.1016/j.cjche.2020.10.044] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.5] [Indexed: 12/12/2022]
16. Sato A, Miyao T, Jasial S, Funatsu K. Comparing predictive ability of QSAR/QSPR models using 2D and 3D molecular representations. J Comput Aided Mol Des 2021; 35:179-193. [PMID: 33392949 DOI: 10.1007/s10822-020-00361-7] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.8] [Received: 07/05/2020] [Accepted: 11/12/2020] [Indexed: 11/27/2022]
Abstract
Quantitative structure-activity relationship (QSAR) and quantitative structure-property relationship (QSPR) models predict biological activity and molecular property based on the numerical relationship between chemical structures and activity (property) values. Molecular representations are of importance in QSAR/QSPR analysis. Topological information of molecular structures is usually utilized (2D representations) for this purpose. However, conformational information seems important because molecules are in the three-dimensional space. As a three-dimensional molecular representation applicable to diverse compounds, similarity between a test molecule and a set of reference molecules has been previously proposed. This 3D representation was found to be effective on virtual screening for early enrichment of active compounds. In this study, we introduced the 3D representation into QSAR/QSPR modeling (regression tasks). Furthermore, we investigated relative merits of 3D representations over 2D in terms of the diversity of training data sets. For the prediction task of quantum mechanics-based properties, the 3D representations were superior to 2D. For predicting activity of small molecules against specific biological targets, no consistent trend was observed in the difference of performance using the two types of representations, irrespective of the diversity of training data sets.
Affiliation(s)
- Akinori Sato: Graduate School of Science and Technology, Nara Institute of Science and Technology, 8916-5 Takayama-cho, Ikoma, Nara, 630-0192, Japan
- Tomoyuki Miyao: Graduate School of Science and Technology, Nara Institute of Science and Technology, 8916-5 Takayama-cho, Ikoma, Nara, 630-0192, Japan; Data Science Center, Nara Institute of Science and Technology, 8916-5 Takayama-cho, Ikoma, Nara, 630-0192, Japan
- Swarit Jasial: Graduate School of Science and Technology, Nara Institute of Science and Technology, 8916-5 Takayama-cho, Ikoma, Nara, 630-0192, Japan; Data Science Center, Nara Institute of Science and Technology, 8916-5 Takayama-cho, Ikoma, Nara, 630-0192, Japan
- Kimito Funatsu: Data Science Center, Nara Institute of Science and Technology, 8916-5 Takayama-cho, Ikoma, Nara, 630-0192, Japan; Department of Chemical System Engineering, School of Engineering, The University of Tokyo, 7-3-1 Hongo, Bunkyo-ku, Tokyo, 113-8656, Japan
17. Coley CW, Eyke NS, Jensen KF. Autonomous Discovery in the Chemical Sciences Part I: Progress. Angew Chem Int Ed Engl 2020; 59:22858-22893. [DOI: 10.1002/anie.201909987] [Citation(s) in RCA: 100] [Impact Index Per Article: 20.0] [Received: 08/06/2019] [Indexed: 01/05/2023]
Affiliation(s)
- Connor W. Coley
- Department of Chemical Engineering Massachusetts Institute of Technology Cambridge MA 02139 USA
| | - Natalie S. Eyke
- Department of Chemical Engineering Massachusetts Institute of Technology Cambridge MA 02139 USA
| | - Klavs F. Jensen
- Department of Chemical Engineering Massachusetts Institute of Technology Cambridge MA 02139 USA
|
18
|
Coley CW, Eyke NS, Jensen KF. Autonome Entdeckung in den chemischen Wissenschaften, Teil I: Fortschritt [Autonomous Discovery in the Chemical Sciences, Part I: Progress]. Angew Chem Int Ed Engl 2020. [DOI: 10.1002/ange.201909987] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.4] [Indexed: 12/14/2022]
Affiliation(s)
- Connor W. Coley
- Department of Chemical Engineering Massachusetts Institute of Technology Cambridge MA 02139 USA
| | - Natalie S. Eyke
- Department of Chemical Engineering Massachusetts Institute of Technology Cambridge MA 02139 USA
| | - Klavs F. Jensen
- Department of Chemical Engineering Massachusetts Institute of Technology Cambridge MA 02139 USA
|
19
|
Cortés-Ciriano I, Škuta C, Bender A, Svozil D. QSAR-derived affinity fingerprints (part 2): modeling performance for potency prediction. J Cheminform 2020; 12:41. [PMID: 33431016 PMCID: PMC7339533 DOI: 10.1186/s13321-020-00444-5] [Citation(s) in RCA: 10] [Impact Index Per Article: 2.0] [Received: 08/07/2019] [Accepted: 05/16/2020] [Indexed: 01/22/2023] Open
Abstract
Affinity fingerprints report the activity of small molecules across a set of assays, and thus permit to gather information about the bioactivities of structurally dissimilar compounds, where models based on chemical structure alone are often limited, and model complex biological endpoints, such as human toxicity and in vitro cancer cell line sensitivity. Here, we propose to model in vitro compound activity using computationally predicted bioactivity profiles as compound descriptors. To this aim, we apply and validate a framework for the calculation of QSAR-derived affinity fingerprints (QAFFP) using a set of 1360 QSAR models generated using Ki, Kd, IC50 and EC50 data from ChEMBL database. QAFFP thus represent a method to encode and relate compounds on the basis of their similarity in bioactivity space. To benchmark the predictive power of QAFFP we assembled IC50 data from ChEMBL database for 18 diverse cancer cell lines widely used in preclinical drug discovery, and 25 diverse protein target data sets. This study complements part 1 where the performance of QAFFP in similarity searching, scaffold hopping, and bioactivity classification is evaluated. Despite being inherently noisy, we show that using QAFFP as descriptors leads to errors in prediction on the test set in the ~ 0.65-0.95 pIC50 units range, which are comparable to the estimated uncertainty of bioactivity data in ChEMBL (0.76-1.00 pIC50 units). We find that the predictive power of QAFFP is slightly worse than that of Morgan2 fingerprints and 1D and 2D physicochemical descriptors, with an effect size in the 0.02-0.08 pIC50 units range. Including QSAR models with low predictive power in the generation of QAFFP does not lead to improved predictive power. Given that the QSAR models we used to compute the QAFFP were selected on the basis of data availability alone, we anticipate better modeling results for QAFFP generated using more diverse and biologically meaningful targets. 
Data sets and Python code are publicly available at https://github.com/isidroc/QAFFP_regression.
Affiliation(s)
- Isidro Cortés-Ciriano
- Centre for Molecular Informatics, Department of Chemistry, University of Cambridge, Lensfield Road, Cambridge, CB2 1EW, UK
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Hinxton, CB10 1SD, UK
| | - Ctibor Škuta
- CZ-OPENSCREEN: National Infrastructure for Chemical Biology, Institute of Molecular Genetics of the ASCR, v. v. i., Vídeňská 1083, 142 20, Prague, Czech Republic
| | - Andreas Bender
- Centre for Molecular Informatics, Department of Chemistry, University of Cambridge, Lensfield Road, Cambridge, CB2 1EW, UK
| | - Daniel Svozil
- CZ-OPENSCREEN: National Infrastructure for Chemical Biology, Institute of Molecular Genetics of the ASCR, v. v. i., Vídeňská 1083, 142 20, Prague, Czech Republic
- CZ-OPENSCREEN: National Infrastructure for Chemical Biology, Department of Informatics and Chemistry, Faculty of Chemical Technology, University of Chemistry and Technology Prague, Technická 5, 166 28, Prague, Czech Republic
|
20
|
Mozafari Z, Arab Chamjangali M, Beglari M, Doosti R. The efficiency of ligand-receptor interaction information alone as new descriptors in QSAR modeling via random forest artificial neural network. Chem Biol Drug Des 2020; 96:812-824. [PMID: 32259386 DOI: 10.1111/cbdd.13690] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.6] [Received: 10/27/2019] [Revised: 02/15/2020] [Accepted: 03/15/2020] [Indexed: 11/28/2022]
Abstract
A new approach is introduced for the construction of a predictive quantitative structure-activity relationship model in which only ligand-receptor (LR) interaction features are used as relevant descriptors. This approach combines the benefit of the random forest (RF) as a new variable selection method with the intrinsic capability of the artificial neural network (ANN). The interaction information of the ligand-receptor (LR) complex was used as molecular docking descriptors. The most relevant descriptors were selected using the RF technique and used as inputs of ANN. The proposed RF ANN (RF-LM-ANN) method was optimized and then evaluated by the prediction of pEC50 for some of the azine derivatives as non-nucleoside reverse transcriptase inhibitors. RF-LM-ANN model under the optimal conditions was evaluated using internal (validation) and external test sets. The determination coefficients of the external test and validation sets were 0.88 and 0.89, respectively. The mean square deviation (MSE) values for the prediction of biological activities in the external test and validation sets were found to be 0.10 and 0.11, respectively. The results obtained demonstrated the good prediction ability and high generalizability of the proposed RF-LM-ANN model based on the MMDs alone.
Affiliation(s)
- Zeinab Mozafari
- Department of Chemistry, Shahrood University of Technology, Shahrood, Iran
| | - Mozhgan Beglari
- Department of Chemistry, Shahrood University of Technology, Shahrood, Iran
| | - Rahele Doosti
- Department of Chemistry, Shahrood University of Technology, Shahrood, Iran
|
21
|
Insights into features and lead optimization of novel type 1½ inhibitors of p38α mitogen-activated protein kinase using QSAR, quantum mechanics, bioisostere replacement and ADMET studies. RESULTS IN CHEMISTRY 2020. [DOI: 10.1016/j.rechem.2020.100044] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.6] [Indexed: 11/19/2022] Open
|
22
|
Majumdar S, Basak SC, Lungu CN, Diudea MV, Grunwald GD. Finding Needles in a Haystack: Determining Key Molecular Descriptors Associated with the Blood-brain Barrier Entry of Chemical Compounds Using Machine Learning. Mol Inform 2019; 38:e1800164. [PMID: 31322827 DOI: 10.1002/minf.201800164] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.7] [Received: 12/01/2018] [Accepted: 04/11/2019] [Indexed: 12/23/2022]
Abstract
In this paper we used two sets of calculated molecular descriptors to predict blood-brain barrier (BBB) entry of a collection of 415 chemicals. The set of 579 descriptors were calculated by Schrodinger and TopoCluj software. Polly and Triplet software were used to calculate the second set of 198 descriptors. Following this, modelling and a two-deep, repeated external validation method was used for QSAR formulation. Results show that both sets of descriptors individually and their combination give models of reasonable prediction accuracy. We also uncover the effectiveness of a variable selection approach, by showing that for one of our descriptor sets, the top 5 % predictors in terms of random forest variable importance are able to provide a better performing model than the model with all predictors. The top influential descriptors indicate important aspects of molecular structural features that govern BBB entry of chemicals.
Affiliation(s)
- Subhabrata Majumdar
- University of Florida Informatics Institute, 432 Newell Dr, CISE Bldg E251, Gainesville, FL 32611, USA
- Currently at: AT&T Labs Research
| | - Subhash C Basak
- Department of Chemistry and Biochemistry, University of Minnesota, 246 Chemistry Building, 1039 University Drive, Duluth, MN 55812, USA
| | - Claudiu N Lungu
- Department of Chemistry, Babes-Bolyai University, Strada Arany János 11, Cluj-Napoca, 400028, Romania
| | - Mircea V Diudea
- Department of Chemistry, Babes-Bolyai University, Strada Arany János 11, Cluj-Napoca, 400028, Romania
| | - Gregory D Grunwald
- Natural Resources Research Institute, University of Minnesota, 5013 Miller Trunk Highway, Duluth, MN 55811, USA
|
23
|
Understanding Collective Human Mobility Spatiotemporal Patterns on Weekdays from Taxi Origin-Destination Point Data. SENSORS 2019; 19:s19122812. [PMID: 31238525 PMCID: PMC6630456 DOI: 10.3390/s19122812] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.5] [Received: 05/17/2019] [Revised: 06/15/2019] [Accepted: 06/19/2019] [Indexed: 11/17/2022]
Abstract
With the availability of large geospatial datasets, the study of collective human mobility spatiotemporal patterns provides a new way to explore urban spatial environments from the perspective of residents. In this paper, we constructed a classification model for mobility patterns that is suitable for taxi OD (Origin-Destination) point data, and it is comprised of three parts. First, a new aggregate unit, which uses a road intersection as the constraint condition, is designed for the analysis of the taxi OD point data. Second, the time series similarity measurement is improved by adding a normalization procedure and time windows to address the particular characteristics of the taxi time series data. Finally, the DBSCAN algorithm is used to classify the time series into different mobility patterns based on a proximity index that is calculated using the improved similarity measurement. In addition, we used the random forest algorithm to establish a correlation model between the mobility patterns and the regional functional characteristics. Based on the taxi OD point data from Nanjing, we delimited seven mobility patterns and illustrated that the regional functions have obvious driving effects on these mobility patterns. These findings are applicable to urban planning, traffic management and planning, and land use analyses in the future.
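The clustering step described in this abstract can be illustrated with a minimal sketch. It assumes plain z-score normalization and Euclidean distance as the similarity measure, which stand in for (and are not) the authors' improved time-window measurement. The toy series, `eps`, and `min_pts` values are hypothetical.

```python
import math

def z_normalize(series):
    """Rescale a series to zero mean and unit variance (shape only, not volume)."""
    n = len(series)
    mean = sum(series) / n
    std = math.sqrt(sum((v - mean) ** 2 for v in series) / n)
    if std == 0.0:
        return [0.0] * n
    return [(v - mean) / std for v in series]

def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def dbscan(points, eps, min_pts):
    """Textbook DBSCAN. Returns one label per point: 0, 1, ... for clusters, -1 for noise."""
    labels = [None] * len(points)  # None means "not visited yet"

    def neighbours(i):
        # Includes the point itself, so min_pts counts the point too.
        return [j for j in range(len(points))
                if euclidean(points[i], points[j]) <= eps]

    cluster = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        seeds = neighbours(i)
        if len(seeds) < min_pts:
            labels[i] = -1  # provisional noise; may become a border point later
            continue
        cluster += 1
        labels[i] = cluster
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster  # noise reached from a core point becomes a border point
            if labels[j] is not None:
                continue
            labels[j] = cluster
            nj = neighbours(j)
            if len(nj) >= min_pts:  # j is a core point, so keep expanding
                queue.extend(nj)
    return labels

# Hypothetical trip counts in four time slots for seven spatial units.
raw = [
    [10, 50, 20, 10], [20, 100, 40, 20], [12, 48, 22, 11],  # early peak
    [10, 20, 50, 10], [20, 40, 100, 20], [5, 11, 24, 6],    # late peak
    [50, 10, 10, 50],                                       # unlike the others
]
normalized = [z_normalize(s) for s in raw]
labels = dbscan(normalized, eps=0.5, min_pts=2)
```

Normalizing first means that units with the same temporal shape but different trip volumes fall into the same pattern, which is the motivation the abstract gives for adding a normalization procedure.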
|
24
|
Pal R, Jana G, Sural S, Chattaraj PK. Hydrophobicity versus electrophilicity: A new protocol toward quantitative structure-toxicity relationship. Chem Biol Drug Des 2018; 93:1083-1095. [DOI: 10.1111/cbdd.13428] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.9] [Received: 08/20/2018] [Revised: 10/08/2018] [Accepted: 10/18/2018] [Indexed: 11/30/2022]
Affiliation(s)
- Ranita Pal
- Department of Chemistry and Center for Theoretical Studies; Indian Institute of Technology Kharagpur; Kharagpur India
| | - Gourhari Jana
- Department of Chemistry and Center for Theoretical Studies; Indian Institute of Technology Kharagpur; Kharagpur India
| | - Shamik Sural
- Department of Computer Science and Engineering; Indian Institute of Technology Kharagpur; Kharagpur India
| | - Pratim Kumar Chattaraj
- Department of Chemistry and Center for Theoretical Studies; Indian Institute of Technology Kharagpur; Kharagpur India
- Department of Chemistry; Indian Institute of Technology Bombay; Powai, Mumbai India
|
25
|
Machine Learning Reveals Missing Edges and Putative Interaction Mechanisms in Microbial Ecosystem Networks. mSystems 2018; 3:mSystems00181-18. [PMID: 30417106 PMCID: PMC6208640 DOI: 10.1128/msystems.00181-18] [Citation(s) in RCA: 32] [Impact Index Per Article: 4.6] [Received: 09/07/2018] [Accepted: 09/26/2018] [Indexed: 12/21/2022] Open
Abstract
Microbes affect each other's growth in multiple, often elusive, ways. The ensuing interdependencies form complex networks, believed to reflect taxonomic composition as well as community-level functional properties and dynamics. The elucidation of these networks is often pursued by measuring pairwise interactions in coculture experiments. However, the combinatorial complexity precludes an exhaustive experimental analysis of pairwise interactions, even for moderately sized microbial communities. Here, we used a machine learning random forest approach to address this challenge. In particular, we show how partial knowledge of a microbial interaction network, combined with trait-level representations of individual microbial species, can provide accurate inference of missing edges in the network and putative mechanisms underlying the interactions. We applied our algorithm to three case studies: an experimentally mapped network of interactions between auxotrophic Escherichia coli strains, a community of soil microbes, and a large in silico network of metabolic interdependencies between 100 human gut-associated bacteria. For this last case, 5% of the network was sufficient to predict the remaining 95% with 80% accuracy, and the mechanistic hypotheses produced by the algorithm accurately reflected known metabolic exchanges. Our approach, broadly applicable to any microbial or other ecological network, may drive the discovery of new interactions and new molecular mechanisms, both for therapeutic interventions involving natural communities and for the rational design of synthetic consortia. IMPORTANCE Different organisms in a microbial community may drastically affect each other's growth phenotypes, significantly affecting the community dynamics, with important implications for human and environmental health. 
Novel culturing methods and the decreasing costs of sequencing will gradually enable high-throughput measurements of pairwise interactions in systematic coculturing studies. However, a thorough characterization of all interactions that occur within a microbial community is greatly limited both by the combinatorial complexity of possible assortments and by the limited biological insight that interaction measurements typically provide without laborious specific follow-ups. Here, we show how a simple and flexible formal representation of microbial pairs can be used for the classification of interactions via machine learning. The approach we propose predicts with high accuracy the outcome of yet-to-be performed experiments and generates testable hypotheses about the mechanisms of specific interactions.
|
26
|
Cardoso‐Silva J, Papadatos G, Papageorgiou LG, Tsoka S. Optimal Piecewise Linear Regression Algorithm for QSAR Modelling. Mol Inform 2018; 38:e1800028. [DOI: 10.1002/minf.201800028] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.7] [Received: 03/06/2018] [Accepted: 08/02/2018] [Indexed: 12/20/2022]
Affiliation(s)
- Jonathan Cardoso‐Silva
- Department of Informatics, Faculty of Natural and Mathematical Sciences, King's College London, Bush House London WC2B 4BG UK
| | - George Papadatos
- European Molecular Biology Laboratory – European Bioinformatics Institute, Wellcome Trust Genome Campus Hinxton, Cambridge CB10 1SD UK
- GlaxoSmithKline Gunnels Wood Road Stevenage, Hertfordshire SG1 2NY UK
| | - Lazaros G. Papageorgiou
- Centre for Process Systems Engineering, Department of Chemical Engineering, University College London Torrington Place London WC1E 7JE UK
| | - Sophia Tsoka
- Department of Informatics, Faculty of Natural and Mathematical Sciences, King's College London, Bush House London WC2B 4BG UK
|
27
|
Ghasemi F, Mehridehnavi A, Fassihi A, Pérez-Sánchez H. Deep neural network in QSAR studies using deep belief network. Appl Soft Comput 2018. [DOI: 10.1016/j.asoc.2017.09.040] [Citation(s) in RCA: 38] [Impact Index Per Article: 5.4] [Indexed: 10/18/2022]
|
28
|
Polishchuk P. Interpretation of Quantitative Structure–Activity Relationship Models: Past, Present, and Future. J Chem Inf Model 2017; 57:2618-2639. [DOI: 10.1021/acs.jcim.7b00274] [Citation(s) in RCA: 120] [Impact Index Per Article: 15.0] [Indexed: 02/01/2023]
Affiliation(s)
- Pavel Polishchuk
- Institute of Molecular and Translational Medicine, Faculty of Medicine and Dentistry, Palacký University and University Hospital in Olomouc, Hněvotínská 1333/5, 779 00 Olomouc, Czech Republic
|
29
|
Marchese Robinson RL, Palczewska A, Palczewski J, Kidley N. Comparison of the Predictive Performance and Interpretability of Random Forest and Linear Models on Benchmark Data Sets. J Chem Inf Model 2017; 57:1773-1792. [PMID: 28715209 DOI: 10.1021/acs.jcim.6b00753] [Citation(s) in RCA: 59] [Impact Index Per Article: 7.4] [Indexed: 11/29/2022]
Abstract
The ability to interpret the predictions made by quantitative structure-activity relationships (QSARs) offers a number of advantages. While QSARs built using nonlinear modeling approaches, such as the popular Random Forest algorithm, might sometimes be more predictive than those built using linear modeling approaches, their predictions have been perceived as difficult to interpret. However, a growing number of approaches have been proposed for interpreting nonlinear QSAR models in general and Random Forest in particular. In the current work, we compare the performance of Random Forest to those of two widely used linear modeling approaches: linear Support Vector Machines (SVMs) (or Support Vector Regression (SVR)) and partial least-squares (PLS). We compare their performance in terms of their predictivity as well as the chemical interpretability of the predictions using novel scoring schemes for assessing heat map images of substructural contributions. We critically assess different approaches for interpreting Random Forest models as well as for obtaining predictions from the forest. We assess the models on a large number of widely employed public-domain benchmark data sets corresponding to regression and binary classification problems of relevance to hit identification and toxicology. We conclude that Random Forest typically yields comparable or possibly better predictive performance than the linear modeling approaches and that its predictions may also be interpreted in a chemically and biologically meaningful way. In contrast to earlier work looking at interpretation of nonlinear QSAR models, we directly compare two methodologically distinct approaches for interpreting Random Forest models. The approaches for interpreting Random Forest assessed in our article were implemented using open-source programs that we have made available to the community. 
These programs are the rfFC package (https://r-forge.r-project.org/R/?group_id=1725) for the R statistical programming language and the Python program HeatMapWrapper [https://doi.org/10.5281/zenodo.495163] for heat map generation.
Affiliation(s)
- Richard L Marchese Robinson
- Syngenta Ltd., Jealott's Hill International Research Centre, Bracknell, Berkshire RG42 6EY, United Kingdom
- School of Pharmacy and Biomolecular Sciences, Liverpool John Moores University, James Parsons Building, Byrom Street, Liverpool L3 3AF, United Kingdom
| | - Anna Palczewska
- Department of Computing, University of Bradford, Bradford BD7 1DP, United Kingdom
| | - Jan Palczewski
- School of Mathematics, University of Leeds, Leeds LS2 9JT, United Kingdom
| | - Nathan Kidley
- Syngenta Ltd., Jealott's Hill International Research Centre, Bracknell, Berkshire RG42 6EY, United Kingdom
|
30
|
Cassano A, Robinson RLM, Palczewska A, Puzyn T, Gajewicz A, Tran L, Manganelli S, Cronin MT. Comparing the CORAL and Random Forest Approaches for Modelling the In Vitro Cytotoxicity of Silica Nanomaterials. Altern Lab Anim 2016; 44:533-556. [DOI: 10.1177/026119291604400603] [Citation(s) in RCA: 22] [Impact Index Per Article: 2.4] [Indexed: 01/29/2023]
Abstract
Nanotechnology is one of the most important technological developments of the 21st century. In silico methods to predict toxicity, such as quantitative structure–activity relationships (QSARs), promote the safe-by-design approach for the development of new materials, including nanomaterials. In this study, a set of cytotoxicity experimental data corresponding to 19 data points for silica nanomaterials were investigated, to compare the widely employed CORAL and Random Forest approaches in terms of their usefulness for developing so-called ‘nano-QSAR’ models. ‘External’ leave-one-out cross-validation (LOO) analysis was performed, to validate the two different approaches. An analysis of variable importance measures and signed feature contributions for both algorithms was undertaken, in order to interpret the models developed. CORAL showed a more pronounced difference between the average coefficient of determination (R²) for training and for LOO (0.83 and 0.65 for training and LOO, respectively), compared to Random Forest (0.87 and 0.78 without bootstrap sampling, 0.90 and 0.78 with bootstrap sampling), which may be due to overfitting. With regard to the physicochemical properties of the nanomaterials, the aspect ratio and zeta potential were found to be the two most important variables for Random Forest, and the average feature contributions calculated for the corresponding descriptors were consistent with the clear trends observed in the data set: less negative zeta potential values and lower aspect ratio values were associated with higher cytotoxicity. In contrast, CORAL failed to capture these trends.
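The gap between training and leave-one-out (LOO) coefficients of determination that this abstract reports can be reproduced in miniature. The sketch below is only an illustration under stated assumptions: it uses a one-descriptor least-squares line instead of CORAL or Random Forest, the toy data are invented, and the LOO score uses the mean of all observed values in its denominator, which is one common convention among several.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = a + b * x."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx
    return my - b * mx, b

def r_squared(y_true, y_pred):
    """1 - SS_res / SS_tot, with SS_tot taken around the mean of y_true."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot

def training_and_loo_r2(xs, ys):
    # Training R^2: fit on all points, predict the same points.
    a, b = fit_line(xs, ys)
    r2_train = r_squared(ys, [a + b * x for x in xs])
    # LOO R^2: each point is predicted by a model that never saw it.
    loo_pred = []
    for i in range(len(xs)):
        a_i, b_i = fit_line(xs[:i] + xs[i + 1:], ys[:i] + ys[i + 1:])
        loo_pred.append(a_i + b_i * xs[i])
    return r2_train, r_squared(ys, loo_pred)

# Hypothetical descriptor values and responses.
xs = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
ys = [1.2, 1.9, 3.2, 3.8, 5.1, 5.9]
r2_train, r2_loo = training_and_loo_r2(xs, ys)
```

For least squares the LOO value can never exceed the training value. A large drop between the two, as reported for CORAL, is the usual signal that a model fits its training points more closely than it generalizes.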
Affiliation(s)
- Antonio Cassano
- School of Pharmacy and Biomolecular Sciences, Liverpool John Moores University, Liverpool, UK
| | - Tomasz Puzyn
- Laboratory of Environmental Chemistry, University of Gdansk, Gdansk, Poland
| | - Agnieszka Gajewicz
- Laboratory of Environmental Chemistry, University of Gdansk, Gdansk, Poland
| | - Lang Tran
- Institute of Occupational Medicine, Edinburgh, Midlothian, UK
| | - Mark T.D. Cronin
- School of Pharmacy and Biomolecular Sciences, Liverpool John Moores University, Liverpool, UK
|
31
|
Venkatraman V, Alsberg BK. Quantitative structure-property relationship modelling of thermal decomposition temperatures of ionic liquids. J Mol Liq 2016. [DOI: 10.1016/j.molliq.2016.08.023] [Citation(s) in RCA: 20] [Impact Index Per Article: 2.2] [Indexed: 10/21/2022]
|
32
|
Alves V, Muratov E, Capuzzi S, Politi R, Low Y, Braga R, Zakharov AV, Sedykh A, Mokshyna E, Farag S, Andrade C, Kuz'min V, Fourches D, Tropsha A. Alarms about structural alerts. GREEN CHEMISTRY : AN INTERNATIONAL JOURNAL AND GREEN CHEMISTRY RESOURCE : GC 2016; 18:4348-4360. [PMID: 28503093 PMCID: PMC5423727 DOI: 10.1039/c6gc01492e] [Citation(s) in RCA: 62] [Impact Index Per Article: 6.9] [Indexed: 05/30/2023]
Abstract
Structural alerts are widely accepted in chemical toxicology and regulatory decision support as a simple and transparent means to flag potential chemical hazards or group compounds into categories for read-across. However, there has been a growing concern that alerts disproportionally flag too many chemicals as toxic, which questions their reliability as toxicity markers. Conversely, the rigorously developed and properly validated statistical QSAR models can accurately and reliably predict the toxicity of a chemical; however, their use in regulatory toxicology has been hampered by the lack of transparency and interpretability. We demonstrate that contrary to the common perception of QSAR models as "black boxes" they can be used to identify statistically significant chemical substructures (QSAR-based alerts) that influence toxicity. We show through several case studies, however, that the mere presence of structural alerts in a chemical, irrespective of the derivation method (expert-based or QSAR-based), should be perceived only as hypotheses of possible toxicological effect. We propose a new approach that synergistically integrates structural alerts and rigorously validated QSAR models for a more transparent and accurate safety assessment of new chemicals.
Affiliation(s)
- Vinicius Alves
- Laboratory for Molecular Modeling, Division of Chemical Biology and Medicinal Chemistry, UNC Eshelman School of Pharmacy, University of North Carolina, Chapel Hill, NC, 27599, USA
- Laboratory for Molecular Modeling and Design, Department of Pharmacy, Federal University of Goias, Goiania, GO, 74605-170, Brazil
| | - Eugene Muratov
- Laboratory for Molecular Modeling, Division of Chemical Biology and Medicinal Chemistry, UNC Eshelman School of Pharmacy, University of North Carolina, Chapel Hill, NC, 27599, USA
- Department of Chemical Technology, Odessa National Polytechnic University, Odessa, 65000, Ukraine
| | - Stephen Capuzzi
- Laboratory for Molecular Modeling, Division of Chemical Biology and Medicinal Chemistry, UNC Eshelman School of Pharmacy, University of North Carolina, Chapel Hill, NC, 27599, USA
| | - Regina Politi
- Laboratory for Molecular Modeling, Division of Chemical Biology and Medicinal Chemistry, UNC Eshelman School of Pharmacy, University of North Carolina, Chapel Hill, NC, 27599, USA
| | - Yen Low
- Netflix, San Francisco, CA 94123, USA
| | - Rodolpho Braga
- Laboratory for Molecular Modeling and Design, Department of Pharmacy, Federal University of Goias, Goiania, GO, 74605-170, Brazil
| | - Alexey V. Zakharov
- National Center for Advancing Translational Sciences (NCATS), National Institutes of Health, Rockville, MD 20850, USA
| | - Elena Mokshyna
- Laboratory of Theoretical Chemistry, A.V. Bogatsky Physical-Chemical Institute NAS of Ukraine, Odessa, 65080, Ukraine
| | - Sherif Farag
- Laboratory for Molecular Modeling, Division of Chemical Biology and Medicinal Chemistry, UNC Eshelman School of Pharmacy, University of North Carolina, Chapel Hill, NC, 27599, USA
| | - Carolina Andrade
- Laboratory for Molecular Modeling and Design, Department of Pharmacy, Federal University of Goias, Goiania, GO, 74605-170, Brazil
| | - Victor Kuz'min
- Laboratory of Theoretical Chemistry, A.V. Bogatsky Physical-Chemical Institute NAS of Ukraine, Odessa, 65080, Ukraine
| | - Denis Fourches
- Department of Chemistry and Bioinformatics Research Center, North Carolina State University, Raleigh, NC, 27695, USA
| | - Alexander Tropsha
- Laboratory for Molecular Modeling, Division of Chemical Biology and Medicinal Chemistry, UNC Eshelman School of Pharmacy, University of North Carolina, Chapel Hill, NC, 27599, USA
|
33
|
Polishchuk P, Tinkov O, Khristova T, Ognichenko L, Kosinskaya A, Varnek A, Kuz’min V. Structural and Physico-Chemical Interpretation (SPCI) of QSAR Models and Its Comparison with Matched Molecular Pair Analysis. J Chem Inf Model 2016; 56:1455-69. [DOI: 10.1021/acs.jcim.6b00371] [Citation(s) in RCA: 29] [Impact Index Per Article: 3.2] [Indexed: 02/02/2023]
Affiliation(s)
- Pavel Polishchuk
- Institute of Molecular and Translational Medicine, Faculty of Medicine and Dentistry, Palacký University and University Hospital in Olomouc, Hněvotínská 1333/5, 779 00 Olomouc, Czech Republic
- A. V. Bogatsky Physico-Chemical Institute of National Academy of Sciences of Ukraine, Lustdorfskaya doroga 86, 65080 Odessa, Ukraine
| | - Oleg Tinkov
- T. G. Shevchenko Transdniestria State University, ul. 25 Oktyabrya 107, 3300 Tiraspol, Transdniestria, Republic of Moldova
| | - Tatiana Khristova
- A. V. Bogatsky Physico-Chemical Institute of National Academy of Sciences of Ukraine, Lustdorfskaya doroga 86, 65080 Odessa, Ukraine
- Laboratoire de Chémoinformatique, UMR 7140 CNRS, Université de Strasbourg, 1 rue Blaise Pascal, 67000 Strasbourg, France
| | - Ludmila Ognichenko
- A. V. Bogatsky Physico-Chemical Institute of National Academy of Sciences of Ukraine, Lustdorfskaya doroga 86, 65080 Odessa, Ukraine
| | - Anna Kosinskaya
- A. V. Bogatsky Physico-Chemical Institute of National Academy of Sciences of Ukraine, Lustdorfskaya doroga 86, 65080 Odessa, Ukraine
| | - Alexandre Varnek
- Laboratoire de Chémoinformatique, UMR 7140 CNRS, Université de Strasbourg, 1 rue Blaise Pascal, 67000 Strasbourg, France
- Laboratory of Chemoinformatics and Molecular Modeling, Butlerov Institute of Chemistry, Kazan Federal University, Kremlevskaya 18, Kazan, Russia
| | - Victor Kuz’min
- A. V. Bogatsky Physico-Chemical Institute of National Academy of Sciences of Ukraine, Lustdorfskaya doroga 86, 65080 Odessa, Ukraine
|
34
|
Yilmaz H, Sizochenko N, Rasulev B, Toropov A, Guzel Y, Kuz'min V, Leszczynska D, Leszczynski J. Amino substituted nitrogen heterocycle ureas as kinase insert domain containing receptor (KDR) inhibitors: Performance of structure–activity relationship approaches. J Food Drug Anal 2015; 23:168-175. [PMID: 28911371 PMCID: PMC9351780 DOI: 10.1016/j.jfda.2015.03.001] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.6] [Indexed: 11/17/2022] Open
Abstract
A quantitative structure–activity relationship (QSAR) study was performed on a set of amino-substituted nitrogen heterocyclic urea derivatives. Two novel approaches were applied: (1) the simplified molecular input-line entry systems (SMILES) based optimal descriptors approach; and (2) the fragment-based simplex representation of molecular structure (SiRMS) approach. Comparison with the classic scheme of building up the model and balance of correlation (BC) for optimal descriptors approach shows that the BC scheme provides more robust predictions than the classic scheme for the considered pIC50 of the heterocyclic urea derivatives. Comparison of the SMILES-based optimal descriptors and SiRMS approaches has confirmed good performance of both techniques in prediction of kinase insert domain containing receptor (KDR) inhibitory activity, expressed as a logarithm of inhibitory concentration (pIC50) of studied compounds.
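The abstract describes pIC50 only as "a logarithm of inhibitory concentration". By the usual convention, pIC50 is the negative base-10 logarithm of the IC50 expressed in mol/L, so more potent compounds have higher values. A minimal sketch of that conversion follows. The function name and the choice of nanomolar input are assumptions made for illustration and are not taken from the paper.

```python
import math

def pic50_from_ic50_nm(ic50_nm):
    """Convert an IC50 given in nanomolar to pIC50 = -log10(IC50 in mol/L).

    Since 1 nM = 1e-9 mol/L, -log10(ic50_nm * 1e-9) equals 9 - log10(ic50_nm).
    """
    if ic50_nm <= 0:
        raise ValueError("IC50 must be a positive concentration")
    return 9.0 - math.log10(ic50_nm)

# Hypothetical compounds: a 100 nM inhibitor and a 1 nM inhibitor.
weaker = pic50_from_ic50_nm(100.0)
stronger = pic50_from_ic50_nm(1.0)
```

A tenfold drop in IC50 raises pIC50 by exactly one unit, which is why prediction errors for such models are commonly quoted in pIC50 (log) units.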
Affiliation(s)
- Hayriye Yilmaz
- Kayseri Vocational School, Biomedical Devices and Technologies, Erciyes University, 38039, Kayseri, Turkey; Interdisciplinary Center for Nanotoxicity, Department of Chemistry and Biochemistry, Jackson State University, Jackson, MS, 39217, USA
| | - Natalia Sizochenko
- Interdisciplinary Center for Nanotoxicity, Department of Chemistry and Biochemistry, Jackson State University, Jackson, MS, 39217, USA; Odessa I.I. Mechnikov National University, Department of Chemistry, Dvoryanskaya Street, 2, 65082, Odessa, Ukraine
| | - Bakhtiyor Rasulev
- Interdisciplinary Center for Nanotoxicity, Department of Chemistry and Biochemistry, Jackson State University, Jackson, MS, 39217, USA
| | - Andrey Toropov
- Laboratory of Environmental Chemistry and Toxicology, IRCCS-Istituto di Ricerche Farmacologiche Mario Negri, 20156, Via La Masa 19, Milano, Italy
| | - Yahya Guzel
- Department of Chemistry, Faculty of Science, Erciyes University, 38039, Kayseri, Turkey
| | - Viktor Kuz'min
- Odessa I.I. Mechnikov National University, Department of Chemistry, Dvoryanskaya Street, 2, 65082, Odessa, Ukraine
| | - Danuta Leszczynska
- Department of Civil and Environmental Engineering, Jackson State University, Jackson, MS, 39217, USA
| | - Jerzy Leszczynski
- Interdisciplinary Center for Nanotoxicity, Department of Chemistry and Biochemistry, Jackson State University, Jackson, MS, 39217, USA.
| |
|
35
|
Welling SH, Clemmensen LKH, Buckley ST, Hovgaard L, Brockhoff PB, Refsgaard HHF. In silico modelling of permeation enhancement potency in Caco-2 monolayers based on molecular descriptors and random forest. Eur J Pharm Biopharm 2015; 94:152-9. [PMID: 26004819 DOI: 10.1016/j.ejpb.2015.05.012] [Citation(s) in RCA: 14] [Impact Index Per Article: 1.4] [Received: 01/30/2015] [Revised: 05/14/2015] [Accepted: 05/17/2015] [Indexed: 10/23/2022]
Abstract
Structural traits of permeation enhancers are important determinants of their capacity to promote enhanced drug absorption. Therefore, in order to obtain a better understanding of structure-activity relationships for permeation enhancers, a Quantitative Structure-Activity Relationship (QSAR) model has been developed. The random forest-QSAR model was based upon Caco-2 data for 41 surfactant-like permeation enhancers from Whitehead et al. (2008) and molecular descriptors calculated from their structure. The QSAR model was validated by two test sets: (i) an eleven-compound experimental set with Caco-2 data and (ii) nine compounds with Caco-2 data from literature. Feature contributions, a recently developed diagnostic tool, were applied to elucidate the contribution of individual molecular descriptors to the predicted potency. Feature contributions provided easily interpretable suggestions of important structural properties for potent permeation enhancers such as segregation of hydrophilic and lipophilic domains. Focusing on surfactant-like properties, it is possible to model the potency of the complex pharmaceutical excipients, permeation enhancers. For the first time, a QSAR model has been developed for permeation enhancement. The model is a valuable in silico approach for both screening of new permeation enhancers and physicochemical optimisation of surfactant enhancer systems.
Affiliation(s)
- Søren H Welling
- Global Research, Novo Nordisk A/S, Novo Nordisk Park, 2760 Måløv, Denmark; Technical University of Denmark, DTU Compute, 2800 Kgs. Lyngby, Denmark
| | | | - Stephen T Buckley
- Global Research, Novo Nordisk A/S, Novo Nordisk Park, 2760 Måløv, Denmark
| | - Lars Hovgaard
- Global Research, Novo Nordisk A/S, Novo Nordisk Park, 2760 Måløv, Denmark
| | - Per B Brockhoff
- Technical University of Denmark, DTU Compute, 2800 Kgs. Lyngby, Denmark
| | | |
|
36
|
Prediction of binding affinity and efficacy of thyroid hormone receptor ligands using QSAR and structure-based modeling methods. Toxicol Appl Pharmacol 2014; 280:177-89. [PMID: 25058446 DOI: 10.1016/j.taap.2014.07.009] [Citation(s) in RCA: 24] [Impact Index Per Article: 2.2] [Received: 03/03/2014] [Revised: 07/10/2014] [Accepted: 07/11/2014] [Indexed: 12/12/2022]
Abstract
The thyroid hormone receptor (THR) is an important member of the nuclear receptor family that can be activated by endocrine disrupting chemicals (EDC). Quantitative Structure-Activity Relationship (QSAR) models have been developed to facilitate the prioritization of THR-mediated EDC for the experimental validation. The largest database of binding affinities available at the time of the study for ligand binding domain (LBD) of THRβ was assembled to generate both continuous and classification QSAR models with an external accuracy of R²=0.55 and CCR=0.76, respectively. In addition, for the first time a QSAR model was developed to predict binding affinities of antagonists inhibiting the interaction of coactivators with the AF-2 domain of THRβ (R²=0.70). Furthermore, molecular docking studies were performed for a set of THRβ ligands (57 agonists and 15 antagonists of LBD, 210 antagonists of the AF-2 domain, supplemented by putative decoys/non-binders) using several THRβ structures retrieved from the Protein Data Bank. We found that two agonist-bound THRβ conformations could effectively discriminate their corresponding ligands from presumed non-binders. Moreover, one of the agonist conformations could discriminate agonists from antagonists. Finally, we have conducted virtual screening of a chemical library compiled by the EPA as part of the Tox21 program to identify potential THRβ-mediated EDCs using both QSAR models and docking. We concluded that the library is unlikely to have any EDC that would bind to the THRβ. Models developed in this study can be employed either to identify environmental chemicals interacting with the THR or, conversely, to eliminate the THR-mediated mechanism of action for chemicals of concern.
|
37
|
Cherkasov A, Muratov EN, Fourches D, Varnek A, Baskin II, Cronin M, Dearden J, Gramatica P, Martin YC, Todeschini R, Consonni V, Kuz'min VE, Cramer R, Benigni R, Yang C, Rathman J, Terfloth L, Gasteiger J, Richard A, Tropsha A. QSAR modeling: where have you been? Where are you going to? J Med Chem 2014; 57:4977-5010. [PMID: 24351051 PMCID: PMC4074254 DOI: 10.1021/jm4004285] [Citation(s) in RCA: 1106] [Impact Index Per Article: 100.5] [Indexed: 02/08/2023]
Abstract
Quantitative structure-activity relationship modeling is one of the major computational tools employed in medicinal chemistry. However, throughout its entire history it has drawn both praise and criticism concerning its reliability, limitations, successes, and failures. In this paper, we discuss (i) the development and evolution of QSAR; (ii) the current trends, unsolved problems, and pressing challenges; and (iii) several novel and emerging applications of QSAR modeling. Throughout this discussion, we provide guidelines for QSAR development, validation, and application, which are summarized in best practices for building rigorously validated and externally predictive QSAR models. We hope that this Perspective will help communications between computational and experimental chemists toward collaborative development and use of QSAR models. We also believe that the guidelines presented here will help journal editors and reviewers apply more stringent scientific standards to manuscripts reporting new QSAR studies, as well as encourage the use of high quality, validated QSARs for regulatory decision making.
Affiliation(s)
- Artem Cherkasov
- Vancouver Prostate Centre, University of British Columbia, Vancouver, BC, V6H3Z6, Canada
| | - Eugene N. Muratov
- Laboratory for Molecular Modeling, UNC Eshelman School of Pharmacy, University of North Carolina, Chapel Hill, NC, 27599, USA
- Department of Molecular Structure and Cheminformatics, A.V. Bogatsky Physical-Chemical Institute National Academy of Sciences of Ukraine, Odessa, 65080, Ukraine
| | - Denis Fourches
- Laboratory for Molecular Modeling, UNC Eshelman School of Pharmacy, University of North Carolina, Chapel Hill, NC, 27599, USA
| | - Alexandre Varnek
- Department of Chemistry, L. Pasteur University of Strasbourg, Strasbourg, 67000, France
| | - Igor I. Baskin
- Department of Physics, Lomonosov Moscow State University, Moscow, 119991, Russia
| | - Mark Cronin
- School of Pharmacy and Biomolecular Sciences, Liverpool John Moores University, Liverpool L33AF, UK
| | - John Dearden
- School of Pharmacy and Biomolecular Sciences, Liverpool John Moores University, Liverpool L33AF, UK
| | - Paola Gramatica
- Department of Structural and Functional Biology, University of Insubria, Varese, 21100, Italy
| | | | - Roberto Todeschini
- Milano Chemometrics and QSAR Research Group, University of Milano-Bicocca, Milan, 20126, Italy
| | - Viviana Consonni
- Milano Chemometrics and QSAR Research Group, University of Milano-Bicocca, Milan, 20126, Italy
| | - Victor E. Kuz'min
- Department of Molecular Structure and Cheminformatics, A.V. Bogatsky Physical-Chemical Institute National Academy of Sciences of Ukraine, Odessa, 65080, Ukraine
| | | | - Romualdo Benigni
- Environment and Health Department, Istituto Superiore di Sanita’, Rome, 00161, Italy
| | | | - James Rathman
- Altamira LLC, Columbus OH 43235, USA
- Department of Chemical and Biomolecular Engineering, the Ohio State University, Columbus, OH 43215, USA
| | | | | | - Ann Richard
- National Center for Computational Toxicology, U.S. Environmental Protection Agency, Research Triangle Park, NC, 27519, USA
| | - Alexander Tropsha
- Laboratory for Molecular Modeling, UNC Eshelman School of Pharmacy, University of North Carolina, Chapel Hill, NC, 27599, USA
| |
|
38
|
Mitchell JBO. Machine learning methods in chemoinformatics. Wiley Interdisciplinary Reviews: Computational Molecular Science 2014; 4:468-481. [PMID: 25285160 PMCID: PMC4180928 DOI: 10.1002/wcms.1183] [Citation(s) in RCA: 249] [Impact Index Per Article: 22.6] [Indexed: 12/18/2022]
Abstract
Machine learning algorithms are generally developed in computer science or adjacent disciplines and find their way into chemical modeling by a process of diffusion. Though particular machine learning methods are popular in chemoinformatics and quantitative structure-activity relationships (QSAR), many others exist in the technical literature. This discussion is methods-based and focused on some algorithms that chemoinformatics researchers frequently use. It makes no claim to be exhaustive. We concentrate on methods for supervised learning, predicting the unknown property values of a test set of instances, usually molecules, based on the known values for a training set. Particularly relevant approaches include Artificial Neural Networks, Random Forest, Support Vector Machine, k-Nearest Neighbors and naïve Bayes classifiers.
|
39
|
Palczewska A, Palczewski J, Marchese Robinson R, Neagu D. Interpreting Random Forest Classification Models Using a Feature Contribution Method. Integration of Reusable Systems 2014. [DOI: 10.1007/978-3-319-04717-1_9] [Citation(s) in RCA: 47] [Impact Index Per Article: 4.3] [Indexed: 12/13/2022]
|
40
|
Polishchuk PG, Kuz'min VE, Artemenko AG, Muratov EN. Universal Approach for Structural Interpretation of QSAR/QSPR Models. Mol Inform 2013; 32:843-53. [DOI: 10.1002/minf.201300029] [Citation(s) in RCA: 45] [Impact Index Per Article: 3.8] [Received: 02/20/2013] [Accepted: 07/29/2013] [Indexed: 11/07/2022]
|
41
|
Marcou G, Horvath D, Solov'ev V, Arrault A, Vayer P, Varnek A. Interpretability of SAR/QSAR Models of any Complexity by Atomic Contributions. Mol Inform 2012; 31:639-42. [DOI: 10.1002/minf.201100136] [Citation(s) in RCA: 26] [Impact Index Per Article: 2.0] [Received: 08/06/2011] [Accepted: 05/29/2012] [Indexed: 01/22/2023]
|
42
|
Ognichenko LN, Kuz'min VE, Gorb L, Hill FC, Artemenko AG, Polischuk PG, Leszczynski J. QSPR Prediction of Lipophilicity for Organic Compounds Using Random Forest Technique on the Basis of Simplex Representation of Molecular Structure. Mol Inform 2012; 31:273-80. [PMID: 27477097 DOI: 10.1002/minf.201100102] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.6] [Received: 06/23/2011] [Accepted: 02/05/2012] [Indexed: 11/08/2022]
Abstract
The relationship between the octanol-water partition coefficient for more than twelve thousand organic compounds and their structures was investigated using a QSPR approach based on Simplex Representation of Molecular Structure (SiRMS). The dataset used in our study included 10973 compounds with experimental values of lipophilicity (LogKow) for different chemical compounds. The Random Forest (RF) method was used for statistical modeling at the 2D level of representation of molecular structure. The developed models are adequate and were successfully validated with external test sets. The proposed models have a clear interpretation due to the use of the simplex representation of molecular structure and predict the LogKow values with the accuracy of the best modern models. Thus the QSPR models proposed in this study represent a powerful and easy-to-use virtual screening tool that can be recommended for prediction of the octanol-water partition coefficient.
Affiliation(s)
- Liudmyla N Ognichenko
- Laboratory of Theoretical Chemistry, Department of Molecular Structure, A.V. Bogatsky Physical-Chemical Institute, National Academy of Science of Ukraine, Ukraine, Odessa, 65080, Lustdorfskaya Doroga 86
| | - Victor E Kuz'min
- Laboratory of Theoretical Chemistry, Department of Molecular Structure, A.V. Bogatsky Physical-Chemical Institute, National Academy of Science of Ukraine, Ukraine, Odessa, 65080, Lustdorfskaya Doroga 86
| | - Leonid Gorb
- Badger Technical Services, LLC, Vicksburg, Mississippi, USA
| | - Frances C Hill
- US Army ERDC, 3532 Manor Dr, Vicksburg, Mississippi, 39180, USA
| | - Anatoly G Artemenko
- Laboratory of Theoretical Chemistry, Department of Molecular Structure, A.V. Bogatsky Physical-Chemical Institute, National Academy of Science of Ukraine, Ukraine, Odessa, 65080, Lustdorfskaya Doroga 86
| | - Pavel G Polischuk
- Laboratory of Theoretical Chemistry, Department of Molecular Structure, A.V. Bogatsky Physical-Chemical Institute, National Academy of Science of Ukraine, Ukraine, Odessa, 65080, Lustdorfskaya Doroga 86
| | - Jerzy Leszczynski
- US Army ERDC, 3532 Manor Dr, Vicksburg, Mississippi, 39180, USA; Interdisciplinary Center for Nanotoxicity, Department of Chemistry and Biochemistry, Jackson State University, Jackson, Mississippi, 39217, USA
| |
|
43
|
Powerful Integrative Tool Combining Structure Generator and Chemical Space Visualization. Journal of Computer Aided Chemistry 2012. [DOI: 10.2751/jcac.13.1] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.2] [Indexed: 11/08/2022]
|