1
Grassano JS, Pickering I, Roitberg AE, González Lebrero MC, Estrin DA, Semelak JA. Assessment of Embedding Schemes in a Hybrid Machine Learning/Classical Potentials (ML/MM) Approach. J Chem Inf Model 2024; 64:4047-4058. [PMID: 38710065 DOI: 10.1021/acs.jcim.4c00478] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 05/08/2024]
Abstract
Machine learning (ML) methods have reached high accuracy levels for the prediction of in vacuo molecular properties. However, the simulation of large systems solely through ML methods (such as those based on neural network potentials) is still a challenge. In this context, one of the most promising frameworks for integrating ML schemes in the simulation of complex molecular systems is the so-called ML/MM approach. These multiscale approaches combine ML methods with classical force fields (MM), in the same spirit as the successful hybrid quantum mechanics-molecular mechanics methods (QM/MM). The key issue for such ML/MM methods is an adequate description of the coupling between the region of the system described by ML and the region described at the MM level. In the context of QM/MM schemes, the main ingredient of the interaction is electrostatic, and the state of the art is the so-called electrostatic embedding. In this study, we analyze the quality of simpler mechanical embedding-based approaches, specifically focusing on their application within an ML/MM framework utilizing atomic partial charges derived in vacuo. Taking as reference electrostatic embedding calculations performed at a QM(DFT)/MM level, we explore different atomic charge schemes, as well as a polarization correction computed using atomic polarizabilities. Our benchmark data set comprises a set of about 80k small organic structures from the ANI-1x and ANI-2x databases, solvated in water. The results suggest that the minimal basis iterative stockholder (MBIS) atomic charges yield the best agreement with the reference coupling energy. Remarkable enhancements are achieved by including a simple polarization correction.
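As a rough illustration of the mechanical-embedding idea described in this abstract, the sketch below computes a point-charge Coulomb coupling between an ML region and an MM region, plus a simple induced-dipole polarization term. It is not taken from the paper: the function names, the data layout, the unit constant, and the specific polarization formula (-1/2 Σ α|E|²) are assumptions chosen for illustration, and the published scheme may differ in form and detail.

```python
import math

# Assumed conversion constant: e^2/Å to kcal/mol (a commonly quoted value).
COULOMB_K = 332.0637


def coulomb_coupling(ml_atoms, mm_atoms):
    """Mechanical-embedding coupling: sum of q_i * q_j / r_ij over ML-MM pairs.

    ml_atoms and mm_atoms are lists of (charge, (x, y, z)) tuples,
    with charges in e and coordinates in Å (hypothetical layout).
    """
    energy = 0.0
    for qi, ri in ml_atoms:
        for qj, rj in mm_atoms:
            energy += COULOMB_K * qi * qj / math.dist(ri, rj)
    return energy


def polarization_correction(ml_polarizabilities, mm_atoms):
    """Illustrative correction: -1/2 * sum_i alpha_i * |E_i|^2.

    E_i is the field of the MM point charges at ML atom i.
    ml_polarizabilities is a list of (alpha in Å^3, (x, y, z)) tuples.
    """
    energy = 0.0
    for alpha, ri in ml_polarizabilities:
        field = [0.0, 0.0, 0.0]
        for qj, rj in mm_atoms:
            d = math.dist(ri, rj)
            for k in range(3):
                field[k] += qj * (ri[k] - rj[k]) / d**3
        energy -= 0.5 * COULOMB_K * alpha * sum(f * f for f in field)
    return energy


# Toy system: one ML atom at the origin, one MM point charge 2 Å away.
ml = [(1.0, (0.0, 0.0, 0.0))]
ml_alpha = [(1.0, (0.0, 0.0, 0.0))]
mm = [(-1.0, (2.0, 0.0, 0.0))]
e_coul = coulomb_coupling(ml, mm)
e_pol = polarization_correction(ml_alpha, mm)
```

The polarization term is always non-positive, so in this sketch it can only lower the coupling energy relative to the fixed-charge estimate.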
Affiliation(s)
- Juan S Grassano
- Facultad de Ciencias Exactas y Naturales, Departamento de Química Inorgánica, Analítica y Química Física, Universidad de Buenos Aires, Intendente Güiraldes 2160, Buenos Aires C1428EHA, Argentina
- CONICET─Universidad de Buenos Aires, Instituto de Química-Física de los Materiales, Medio Ambiente y Energía (INQUIMAE), Ciudad Universitaria, Pabellón 2, Buenos Aires C1428EHA, Argentina
- Ignacio Pickering
- Department of Chemistry, University of Florida, Gainesville, Florida 32611, United States
- Adrian E Roitberg
- CONICET─Universidad de Buenos Aires, Instituto de Química-Física de los Materiales, Medio Ambiente y Energía (INQUIMAE), Ciudad Universitaria, Pabellón 2, Buenos Aires C1428EHA, Argentina
- Department of Chemistry, University of Florida, Gainesville, Florida 32611, United States
- Mariano C González Lebrero
- Facultad de Ciencias Exactas y Naturales, Departamento de Química Inorgánica, Analítica y Química Física, Universidad de Buenos Aires, Intendente Güiraldes 2160, Buenos Aires C1428EHA, Argentina
- CONICET─Universidad de Buenos Aires, Instituto de Química-Física de los Materiales, Medio Ambiente y Energía (INQUIMAE), Ciudad Universitaria, Pabellón 2, Buenos Aires C1428EHA, Argentina
- Dario A Estrin
- Facultad de Ciencias Exactas y Naturales, Departamento de Química Inorgánica, Analítica y Química Física, Universidad de Buenos Aires, Intendente Güiraldes 2160, Buenos Aires C1428EHA, Argentina
- CONICET─Universidad de Buenos Aires, Instituto de Química-Física de los Materiales, Medio Ambiente y Energía (INQUIMAE), Ciudad Universitaria, Pabellón 2, Buenos Aires C1428EHA, Argentina
- Jonathan A Semelak
- Facultad de Ciencias Exactas y Naturales, Departamento de Química Inorgánica, Analítica y Química Física, Universidad de Buenos Aires, Intendente Güiraldes 2160, Buenos Aires C1428EHA, Argentina
- CONICET─Universidad de Buenos Aires, Instituto de Química-Física de los Materiales, Medio Ambiente y Energía (INQUIMAE), Ciudad Universitaria, Pabellón 2, Buenos Aires C1428EHA, Argentina
2
Cao X, Luo W, Liu H. A prediction model for CO2/CO adsorption performance on binary alloys based on machine learning. RSC Adv 2024; 14:12235-12246. [PMID: 38628487 PMCID: PMC11019484 DOI: 10.1039/d4ra00710g] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/28/2024] [Accepted: 04/08/2024] [Indexed: 04/19/2024] Open
Abstract
Despite the rapid development of computational methods, including density functional theory (DFT), predicting the performance of a catalytic material merely based on its atomic arrangements remains challenging. Although quantum mechanics-based methods can model 'real' materials with dopants, grain boundaries, and interfaces with acceptable accuracy, the high demand for computational resources no longer meets the needs of modern scientific research. On the other hand, machine learning (ML) methods can accelerate the screening of alloy-based catalytic materials. In this study, an ML model was developed to predict the CO2 and CO adsorption affinity on single-atom doped binary alloys based on the thermochemical properties of component metals. By using a greedy algorithm, the best combination of features was determined, and the ML model was trained and verified based on a data set containing 78 alloys on which the adsorption energy values of CO2 and CO were calculated from DFT. Comparison between predicted and DFT calculated adsorption energy values suggests that the extreme gradient boosting (XGBoost) algorithm has excellent generalization performance, and the R-squared (R2) values for CO2 and CO adsorption energy prediction are 0.96 and 0.91, respectively. The errors of predicted adsorption energy are 0.138 eV and 0.075 eV for CO2 and CO, respectively. This model can be expected to advance our understanding of structure-property relationships at the fundamental level and be used in large-scale screening of alloy-based catalysts.
Affiliation(s)
- Xiaofeng Cao
- School of Chemistry and Chemical Engineering, Southwest Petroleum University Chengdu 610500 P. R. China
- Wenjia Luo
- School of Chemistry and Chemical Engineering, Southwest Petroleum University Chengdu 610500 P. R. China
- Huimin Liu
- School of Chemistry and Chemical Engineering, Southwest Petroleum University Chengdu 610500 P. R. China
3
Mohanty S, Stevenson J, Browning AR, Jacobson L, Leswing K, Halls MD, Afzal MAF. Development of scalable and generalizable machine learned force field for polymers. Sci Rep 2023; 13:17251. [PMID: 37821501 PMCID: PMC10567837 DOI: 10.1038/s41598-023-43804-5] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/22/2023] [Accepted: 09/28/2023] [Indexed: 10/13/2023] Open
Abstract
Understanding and predicting the properties of polymers is vital to developing tailored polymer molecules for desired applications. Classical force fields may fail to capture key properties, for example, the transport properties of certain polymer systems such as polyethylene glycol. As a solution, we present an alternative potential energy surface, a charge recursive neural network (QRNN) model trained on DFT calculations made on smaller atomic clusters that generalizes well to oligomers comprising larger atomic clusters or longer chains. We demonstrate the validity of the polymer QRNN workflow by modeling the oligomers of ethylene glycol. We apply two rounds of active learning (addition of new training clusters based on current model performance) and implement a novel model training approach that uses partial charges from a semi-empirical method. Our developed QRNN model for polymers produces stable molecular dynamics (MD) simulation trajectories and captures the dynamics of polymer chains, as indicated by the striking agreement with experimental values. Our model allows working on much larger systems than allowed by DFT simulations, while providing a more accurate force field than classical force fields, which offers a promising avenue for large-scale molecular simulations of polymeric systems.
4
Wang N, Zhang Y, Wang W, Ye Z, Chen H, Hu G, Ouyang D. How can machine learning and multiscale modeling benefit ocular drug development? Adv Drug Deliv Rev 2023; 196:114772. [PMID: 36906232 DOI: 10.1016/j.addr.2023.114772] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/16/2022] [Revised: 02/06/2023] [Accepted: 03/05/2023] [Indexed: 03/12/2023]
Abstract
The eyes possess sophisticated physiological structures, diverse disease targets, limited drug delivery space, distinctive barriers, and complicated biomechanical processes, requiring a more in-depth understanding of the interactions between drug delivery systems and biological systems for ocular formulation development. However, the tiny size of the eyes makes sampling difficult and invasive studies costly and ethically constrained. Developing ocular formulations following conventional trial-and-error formulation and manufacturing process screening procedures is inefficient. Along with the popularity of computational pharmaceutics, non-invasive in silico modeling & simulation offer new opportunities for the paradigm shift of ocular formulation development. The current work first systematically reviews the theoretical underpinnings, advanced applications, and unique advantages of data-driven machine learning and multiscale simulation approaches represented by molecular simulation, mathematical modeling, and pharmacokinetic (PK)/pharmacodynamic (PD) modeling for ocular drug development. Following this, a new computer-driven framework for rational pharmaceutical formulation design is proposed, inspired by the potential of in silico explorations in understanding drug delivery details and facilitating drug formulation design. Lastly, to promote the paradigm shift, integrated in silico methodologies were highlighted, and discussions on data challenges, model practicality, personalized modeling, regulatory science, interdisciplinary collaboration, and talent training were conducted in detail with a view to achieving more efficient objective-oriented pharmaceutical formulation design.
Affiliation(s)
- Nannan Wang
- State Key Laboratory of Quality Research in Chinese Medicine, Institute of Chinese Medical Sciences (ICMS), University of Macau, Macau, China
- Yunsen Zhang
- State Key Laboratory of Quality Research in Chinese Medicine, Institute of Chinese Medical Sciences (ICMS), University of Macau, Macau, China
- Wei Wang
- State Key Laboratory of Quality Research in Chinese Medicine, Institute of Chinese Medical Sciences (ICMS), University of Macau, Macau, China
- Zhuyifan Ye
- State Key Laboratory of Quality Research in Chinese Medicine, Institute of Chinese Medical Sciences (ICMS), University of Macau, Macau, China
- Hongyu Chen
- State Key Laboratory of Quality Research in Chinese Medicine, Institute of Chinese Medical Sciences (ICMS), University of Macau, Macau, China; Faculty of Science and Technology (FST), University of Macau, Macau, China
- Guanghui Hu
- Faculty of Science and Technology (FST), University of Macau, Macau, China
- Defang Ouyang
- State Key Laboratory of Quality Research in Chinese Medicine, Institute of Chinese Medical Sciences (ICMS), University of Macau, Macau, China; Department of Public Health and Medicinal Administration, Faculty of Health Sciences (FHS), University of Macau, Macau, China.
5
Walden DM, Bundey Y, Jagarapu A, Antontsev V, Chakravarty K, Varshney J. Molecular Simulation and Statistical Learning Methods toward Predicting Drug-Polymer Amorphous Solid Dispersion Miscibility, Stability, and Formulation Design. Molecules 2021; 26:E182. [PMID: 33401494 PMCID: PMC7794704 DOI: 10.3390/molecules26010182] [Citation(s) in RCA: 25] [Impact Index Per Article: 8.3] [Received: 12/10/2020] [Revised: 12/28/2020] [Accepted: 12/29/2020] [Indexed: 12/20/2022] Open
Abstract
Amorphous solid dispersions (ASDs) have emerged as widespread formulations for drug delivery of poorly soluble active pharmaceutical ingredients (APIs). Predicting the API solubility with various carriers in the API-carrier mixture and the principal API-carrier non-bonding interactions are critical factors for rational drug development and formulation decisions. Experimental determination of these interactions, solubility, and dissolution mechanisms is time-consuming, costly, and reliant on trial and error. To that end, molecular modeling has been applied to simulate ASD properties and mechanisms. Quantum mechanical methods elucidate the strength of API-carrier non-bonding interactions, while molecular dynamics simulations model and predict ASD physical stability, solubility, and dissolution mechanisms. Statistical learning models have been recently applied to the prediction of a variety of drug formulation properties and show immense potential for continued application in the understanding and prediction of ASD solubility. Continued theoretical progress and computational applications will accelerate lead compound development before clinical trials. This article reviews in silico research for the rational formulation design of low-solubility drugs. Pertinent theoretical groundwork is presented, modeling applications and limitations are discussed, and the prospective clinical benefits of accelerated ASD formulation are envisioned.
Affiliation(s)
- Jyotika Varshney
- VeriSIM Life Inc., 1 Sansome St, Suite 3500, San Francisco, CA 94104, USA; (D.M.W.); (Y.B.); (A.J.); (V.A.); (K.C.)
6
Hutchinson ST, Kobayashi R. Solvent-Specific Featurization for Predicting Free Energies of Solvation through Machine Learning. J Chem Inf Model 2019; 59:1338-1346. [DOI: 10.1021/acs.jcim.8b00901] [Citation(s) in RCA: 20] [Impact Index Per Article: 4.0] [Indexed: 01/13/2023]
Affiliation(s)
- Samuel T. Hutchinson
- Research School of Chemistry, Australian National University, Canberra, ACT 2601, Australia
- Rika Kobayashi
- Research School of Chemistry, Australian National University, Canberra, ACT 2601, Australia
- ANU Supercomputer Facility, Leonard Huxley Bldg 56, Mills Rd, Canberra, ACT 2601, Australia
7
Prediction of N-Methyl-D-Aspartate Receptor GluN1-Ligand Binding Affinity by a Novel SVM-Pose/SVM-Score Combinatorial Ensemble Docking Scheme. Sci Rep 2017; 7:40053. [PMID: 28059133 PMCID: PMC5216401 DOI: 10.1038/srep40053] [Citation(s) in RCA: 15] [Impact Index Per Article: 2.1] [Received: 08/19/2016] [Accepted: 11/30/2016] [Indexed: 01/24/2023] Open
Abstract
The glycine-binding site of the N-methyl-D-aspartate receptor (NMDAR) subunit GluN1 is a potential pharmacological target for neurodegenerative disorders. A novel combinatorial ensemble docking scheme using ligand and protein conformation ensembles and customized support vector machine (SVM)-based models to select the docked pose and to predict the docking score was generated for predicting the NMDAR GluN1-ligand binding affinity. The predicted root mean square deviation (RMSD) values in pose by SVM-Pose models were found to be in good agreement with the observed values (n = 30, r2 = 0.928–0.988, = 0.894–0.954, RMSE = 0.002–0.412, s = 0.001–0.214), and the predicted pKi values by SVM-Score were found to be in good agreement with the observed values for the training samples (n = 24, r2 = 0.967, = 0.899, RMSE = 0.295, s = 0.170) and test samples (n = 13, q2 = 0.894, RMSE = 0.437, s = 0.202). When subjected to various statistical validations, the developed SVM-Pose and SVM-Score models consistently met the most stringent criteria. A mock test asserted the predictivity of this novel docking scheme. Collectively, this accurate novel combinatorial ensemble docking scheme can be used to predict the NMDAR GluN1-ligand binding affinity for facilitating drug discovery.
8
Modeling & Informatics at Vertex Pharmaceuticals Incorporated: our philosophy for sustained impact. J Comput Aided Mol Des 2016; 31:293-300. [DOI: 10.1007/s10822-016-9994-0] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.5] [Received: 07/27/2016] [Accepted: 11/17/2016] [Indexed: 11/26/2022]
9
Gillet VJ, Holliday JD, Willett P. Chemoinformatics at the University of Sheffield 2002-2014. Mol Inform 2016; 34:598-607. [PMID: 27490711 DOI: 10.1002/minf.201500004] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.4] [Received: 01/09/2015] [Accepted: 03/13/2015] [Indexed: 11/09/2022]
Abstract
This paper summarises work in chemoinformatics carried out in the Information School of the University of Sheffield during the period 2002-2014. Research studies are described on fingerprint-based similarity searching, data fusion, applications of reduced graphs and pharmacophore mapping, and on the School's teaching in chemoinformatics.
Affiliation(s)
- Valerie J Gillet
- Information School, University of Sheffield, Regent Court, 211 Portobello, Sheffield S1 4DP, UK
- John D Holliday
- Information School, University of Sheffield, Regent Court, 211 Portobello, Sheffield S1 4DP, UK
- Peter Willett
- Information School, University of Sheffield, Regent Court, 211 Portobello, Sheffield S1 4DP, UK.
10
Holliday JD, Sani N, Willett P. Calculation of substructural analysis weights using a genetic algorithm. J Chem Inf Model 2015; 55:214-21. [PMID: 25615712 DOI: 10.1021/ci500540s] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.2] [Indexed: 11/28/2022]
Abstract
This work describes a genetic algorithm for the calculation of substructural analysis weights for use in ligand-based virtual screening. The algorithm is simple in concept and effective in operation, with simulated virtual screening experiments using the MDDR and WOMBAT data sets showing it to be superior to substructural analysis weights based on a naive Bayesian classifier.
Affiliation(s)
- John D Holliday
- Information School, University of Sheffield, 211 Portobello Street, Sheffield S1 4DP, United Kingdom
11
MacCuish JD, MacCuish NE. Chemoinformatics applications of cluster analysis. Wiley Interdiscip Rev Comput Mol Sci 2013. [DOI: 10.1002/wcms.1152] [Citation(s) in RCA: 17] [Impact Index Per Article: 1.5] [Indexed: 01/02/2023]
12
Affiliation(s)
- Peter Willett
- Information School, University of Sheffield, 211 Portobello Street, Sheffield S1 4DP, United Kingdom.
13
Multiple search methods for similarity-based virtual screening: analysis of search overlap and precision. J Cheminform 2011; 3:29. [PMID: 21824430 PMCID: PMC3195112 DOI: 10.1186/1758-2946-3-29] [Citation(s) in RCA: 23] [Impact Index Per Article: 1.8] [Received: 06/29/2011] [Accepted: 08/08/2011] [Indexed: 11/28/2022] Open
Abstract
Background: Data fusion methods are widely used in virtual screening, and make the implicit assumption that the more often a molecule is retrieved in multiple similarity searches, the more likely it is to be active. This paper tests the correctness of this assumption. Results: Sets of 25 searches using either the same reference structure and 25 different similarity measures (similarity fusion) or 25 different reference structures and the same similarity measure (group fusion) show that large numbers of unique molecules are retrieved by just a single search, but that the numbers of unique molecules decrease very rapidly as more searches are considered. This rapid decrease is accompanied by a rapid increase in the fraction of those retrieved molecules that are active. There is an approximately log-log relationship between the numbers of different molecules retrieved and the number of searches carried out, and a rationale for this power-law behaviour is provided. Conclusions: Using multiple searches provides a simple way of increasing the precision of a similarity search, and thus provides a justification for the use of data fusion methods in virtual screening.
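The relationship this abstract reports, between how many searches retrieve a molecule and how likely it is to be active, can be illustrated with a small counting sketch. The function names, the toy hit lists, and the set of actives are hypothetical and are not data from the paper.

```python
from collections import Counter


def retrieval_counts(search_results):
    """Count, for each molecule, how many searches retrieved it."""
    counts = Counter()
    for hits in search_results:
        counts.update(set(hits))  # a molecule counts once per search
    return counts


def precision_by_min_count(counts, actives, n_searches):
    """Fraction of actives among molecules retrieved by at least k searches."""
    precision = {}
    for k in range(1, n_searches + 1):
        retrieved = {mol for mol, c in counts.items() if c >= k}
        if retrieved:
            precision[k] = len(retrieved & actives) / len(retrieved)
    return precision


# Toy data: three searches over molecule identifiers, two known actives.
searches = [["a", "b", "c"], ["a", "b", "d"], ["a", "e", "f"]]
actives = {"a", "b"}
counts = retrieval_counts(searches)
precision = precision_by_min_count(counts, actives, len(searches))
```

In this toy case the number of molecules retrieved by at least k searches shrinks as k grows while the active fraction rises, which is the qualitative pattern the abstract describes.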
14
15
Willett P. Chemoinformatics: a history. Wiley Interdiscip Rev Comput Mol Sci 2011. [DOI: 10.1002/wcms.1] [Citation(s) in RCA: 38] [Impact Index Per Article: 2.9] [Indexed: 11/05/2022]
Affiliation(s)
- Peter Willett
- Information School, University of Sheffield, Sheffield S1 4DP, UK
16
Luo X, Krumrine JR, Shenvi AB, Pierson ME, Bernstein PR. Calculation and application of activity discriminants in lead optimization. J Mol Graph Model 2010; 29:372-81. [PMID: 20800520 DOI: 10.1016/j.jmgm.2010.07.005] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/28/2010] [Revised: 07/10/2010] [Accepted: 07/14/2010] [Indexed: 11/18/2022]
Abstract
We present a technique for computing activity discriminants of in vitro (pharmacological, DMPK, and safety) assays and the application to the prediction of in vitro activities of proposed synthetic targets during the lead optimization phase of drug discovery projects. This technique emulates how medicinal chemists perform SAR analysis and activity prediction. The activity discriminants that are functions of 6 commonly used medicinal chemistry descriptors can be interpreted easily by medicinal chemists. Further, visualization with Spotfire allows medicinal chemists to analyze how the query molecule is related to compounds tested previously, and to evaluate easily the relevance of the activity discriminants to the activities of the query molecule. Validation with all compounds synthesized and tested in AstraZeneca Wilmington since 2006 demonstrates that this approach is useful for prioritizing new synthetic targets for synthesis.
Affiliation(s)
- Xincai Luo
- Department of Chemistry, AstraZeneca Pharmaceuticals, 1800 Concord Pike, Wilmington, DE 19850, USA.
17
Arif SM, Holliday JD, Willett P. Inverse Frequency Weighting of Fragments for Similarity-Based Virtual Screening. J Chem Inf Model 2010; 50:1340-9. [DOI: 10.1021/ci1001235] [Citation(s) in RCA: 13] [Impact Index Per Article: 0.9] [Indexed: 11/29/2022]
Affiliation(s)
- Shereena M. Arif
- Faculty of Information Science and Technology, National University of Malaysia, 43600 UKM Bangi, Malaysia and Information School, University of Sheffield, Sheffield S10 2TN, United Kingdom
- John D. Holliday
- Faculty of Information Science and Technology, National University of Malaysia, 43600 UKM Bangi, Malaysia and Information School, University of Sheffield, Sheffield S10 2TN, United Kingdom
- Peter Willett
- Faculty of Information Science and Technology, National University of Malaysia, 43600 UKM Bangi, Malaysia and Information School, University of Sheffield, Sheffield S10 2TN, United Kingdom
18
Abstract
This chapter reviews the use of molecular fingerprints for chemical similarity searching. The fingerprints encode the presence of 2D substructural fragments in a molecule, and the similarity between a pair of molecules is a function of the number of fragments that they have in common. Although this provides a very simple way of estimating the degree of structural similarity between two molecules, it has been found to provide an effective and an efficient tool for searching large chemical databases. The review describes the historical development of similarity searching since it was first described in the mid-1980s, reviews the many different coefficients, representations, and weightings that can be combined to form a similarity measure, describes quantitative measures of the effectiveness of similarity searching, and concludes by looking at current developments based on the use of data fusion and machine learning techniques.
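A minimal sketch of the fragment-based similarity idea summarised above: each molecule is represented as a set of fragment identifiers, and similarity is a function of the fragments two molecules share. The Tanimoto coefficient used here is one widely used choice, assumed for illustration; the chapter itself reviews many coefficients, representations, and weightings. The fragment labels and function names are hypothetical.

```python
def tanimoto(fragments_a, fragments_b):
    """Tanimoto coefficient: shared fragments / fragments present in either."""
    a, b = set(fragments_a), set(fragments_b)
    if not a and not b:
        return 0.0  # convention chosen here for two empty fingerprints
    common = len(a & b)
    return common / (len(a) + len(b) - common)


def rank_by_similarity(query, database):
    """Return database keys sorted by decreasing similarity to the query."""
    return sorted(
        database,
        key=lambda name: tanimoto(query, database[name]),
        reverse=True,
    )


# Toy fingerprints: sets of hypothetical fragment identifiers.
query = {"C-C", "C=O", "C-N"}
database = {
    "mol1": {"C-C", "C=O", "C-N"},
    "mol2": {"C-C", "C=O", "C-Cl"},
    "mol3": {"S-S"},
}
ranking = rank_by_similarity(query, database)
```

Ranking a database this way places the most structurally similar molecules first, which is the basis of similarity searching in virtual screening.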
Affiliation(s)
- Peter Willett
- Department of Information Studies, The University of Sheffield, Sheffield, UK
19
Puzyn T, Leszczynska D, Leszczynski J. Toward the development of "nano-QSARs": advances and challenges. Small 2009; 5:2494-509. [PMID: 19787675 DOI: 10.1002/smll.200900179] [Citation(s) in RCA: 139] [Impact Index Per Article: 9.3] [Indexed: 05/03/2023]
Abstract
The most significant achievements and challenges relating to the application of the quantitative structure-activity relationship (QSAR) approach in the risk assessment of nanometer-sized materials are highlighted. Recent advances are discussed in the context of "classical" QSAR methodology. The possible ways for the structural characterization of compounds existing at the nanoscale (at least one dimension of 100 nm or less) are briefly reviewed. The applicability of the existing toxicological data for developing QSAR models is evaluated. Finally, the existing models are presented. The need to develop new interpretative descriptors for the nanosystems is also highlighted. It is suggested that, due to high variability in the molecular structures and different mechanisms of toxicity, individual classes of nanoparticles should be modeled separately.
Affiliation(s)
- Tomasz Puzyn
- Interdisciplinary Nanotoxicity Center, Department of Chemistry, Jackson State University, 1325 Lynch St, Jackson, MS 39217-0510, USA
20
Gardiner EJ, Gillet VJ, Haranczyk M, Hert J, Holliday JD, Malim N, Patel Y, Willett P. Turbo similarity searching: Effect of fingerprint and dataset on virtual-screening performance. Stat Anal Data Min 2009. [DOI: 10.1002/sam.10037] [Citation(s) in RCA: 32] [Impact Index Per Article: 2.1] [Indexed: 11/08/2022]
21
Arif SM, Holliday JD, Willett P. Analysis and use of fragment-occurrence data in similarity-based virtual screening. J Comput Aided Mol Des 2009; 23:655-68. [DOI: 10.1007/s10822-009-9285-0] [Citation(s) in RCA: 25] [Impact Index Per Article: 1.7] [Received: 02/09/2009] [Accepted: 05/19/2009] [Indexed: 12/01/2022]
22
Enhancing the Effectiveness of Fingerprint-Based Virtual Screening: Use of Turbo Similarity Searching and of Fragment Frequencies of Occurrence. 2009. [DOI: 10.1007/978-3-642-04031-3_35] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.1]
23
Willett P. From chemical documentation to chemoinformatics: 50 years of chemical information science. J Inf Sci 2008. [DOI: 10.1177/0165551507084631] [Citation(s) in RCA: 32] [Impact Index Per Article: 2.0] [Indexed: 11/16/2022]
Abstract
This paper summarizes the historical development of the discipline that is now called 'chemoinformatics'. It shows how this has evolved, principally as a result of technological developments in chemistry and biology during the past decade, from long-established techniques for the modelling and searching of chemical molecules. A total of 30 papers, the earliest dating back to 1957, are briefly summarized to highlight some of the key publications and to show the development of the discipline.
24
Schroeter TS, Schwaighofer A, Mika S, Ter Laak A, Suelzle D, Ganzer U, Heinrich N, Müller KR. Estimating the domain of applicability for machine learning QSAR models: a study on aqueous solubility of drug discovery molecules. J Comput Aided Mol Des 2007; 21:651-64. [DOI: 10.1007/s10822-007-9160-9] [Citation(s) in RCA: 26] [Impact Index Per Article: 1.5] [Received: 03/12/2007] [Accepted: 06/11/2007] [Indexed: 11/29/2022]
25
Schroeter TS, Schwaighofer A, Mika S, Ter Laak A, Suelzle D, Ganzer U, Heinrich N, Müller KR. Estimating the domain of applicability for machine learning QSAR models: a study on aqueous solubility of drug discovery molecules. J Comput Aided Mol Des 2007; 21:485-98. [PMID: 17632688 DOI: 10.1007/s10822-007-9125-z] [Citation(s) in RCA: 24] [Impact Index Per Article: 1.4] [Received: 03/12/2007] [Accepted: 06/11/2007] [Indexed: 10/23/2022]
Abstract
We investigate the use of different Machine Learning methods to construct models for aqueous solubility. Models are based on about 4000 compounds, including an in-house set of 632 drug discovery molecules of Bayer Schering Pharma. For each method, we also consider an appropriate method to obtain error bars, in order to estimate the domain of applicability (DOA) for each model. Here, we investigate error bars from a Bayesian model (Gaussian Process (GP)), an ensemble based approach (Random Forest), and approaches based on the Mahalanobis distance to training data (for Support Vector Machine and Ridge Regression models). We evaluate all approaches in terms of their prediction accuracy (in cross-validation, and on an external validation set of 536 molecules) and in how far the individual error bars can faithfully represent the actual prediction error.
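One of the ideas named in this abstract, ensemble-based error bars, can be sketched in a few lines: the spread of the individual ensemble members' predictions is taken as the error bar, and the error bars are then checked against the actual prediction errors. This is an illustrative sketch only; the function names, the use of the population standard deviation, and the toy numbers are assumptions, not the authors' procedure.

```python
import statistics


def ensemble_prediction(member_predictions):
    """Return (mean, spread) over the predictions of the ensemble members."""
    mean = statistics.fmean(member_predictions)
    spread = statistics.pstdev(member_predictions)  # used as the error bar
    return mean, spread


def coverage(predictions, observed, n_sigma=1.0):
    """Fraction of observed values within n_sigma error bars of the mean."""
    inside = sum(
        1
        for (mean, spread), y in zip(predictions, observed)
        if abs(y - mean) <= n_sigma * spread
    )
    return inside / len(observed)


# Toy example: two compounds, three hypothetical ensemble members each.
preds = [
    ensemble_prediction([1.0, 2.0, 3.0]),
    ensemble_prediction([0.0, 0.1, -0.1]),
]
observed = [2.5, 1.0]
frac_inside = coverage(preds, observed)
```

If the error bars were faithful and the errors roughly Gaussian, about 68% of observed values would be expected to fall within one error bar; a much lower coverage suggests the error bars are overconfident.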