1
Hunklinger A, Hartog P, Šícho M, Godin G, Tetko IV. The openOCHEM consensus model is the best-performing open-source predictive model in the First EUOS/SLAS joint compound solubility challenge. SLAS Discovery: Advancing Life Sciences R & D 2024; 29:100144. [PMID: 38316342 DOI: 10.1016/j.slasd.2024.01.005] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/18/2023] [Revised: 01/06/2024] [Accepted: 01/22/2024] [Indexed: 02/07/2024]
Abstract
The EUOS/SLAS challenge aimed to facilitate the development of reliable algorithms to predict the aqueous solubility of small molecules using experimental data from 100K compounds. In total, a hundred teams took part in the challenge to predict low-, medium- and high-solubility compounds as measured by a nephelometry assay. This article describes the winning model, which was developed using the publicly available Online CHEmical database and Modeling environment (OCHEM), available at https://ochem.eu/article/27. We describe in detail the assumptions and steps used to select the methods, descriptors and strategy that contributed to the winning solution. In particular, we show that a consensus of 28 models calculated using descriptor-based and representation-learning methods obtained the best score, which was higher than the scores of the individual approaches and of the consensus models developed within each individual approach. Combining diverse models decreased both the bias and the variance of the individual models and yielded the highest score. The model based on Transformer CNN achieved the best individual score, highlighting the power of Natural Language Processing (NLP) methods. Including information about aleatoric uncertainty would help contestants better understand and use the challenge data.
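A minimal Python sketch of why averaging diverse models can lower error: when two models err in opposite directions, their unweighted mean cancels part of the error. The predictions below are hypothetical, the challenge itself scored a three-class task, and the paper's actual consensus over 28 OCHEM models may combine outputs differently; the sketch only shows the variance-reduction effect of simple averaging.

```python
from statistics import mean


def consensus(predictions_per_model: list[list[float]]) -> list[float]:
    """Unweighted consensus: average each compound's prediction over all models."""
    return [mean(values) for values in zip(*predictions_per_model)]


def mse(predicted: list[float], observed: list[float]) -> float:
    """Mean squared error between predictions and observations."""
    return mean((p - o) ** 2 for p, o in zip(predicted, observed))


observed = [1.0, 2.0, 3.0]
model_a = [1.5, 1.5, 3.5]  # hypothetical predictions
model_b = [0.5, 2.5, 2.5]  # hypothetical predictions with errors of opposite sign
combined = consensus([model_a, model_b])
print(mse(model_a, observed), mse(model_b, observed), mse(combined, observed))
```

Here the two errors cancel exactly; with real models the errors are only partly uncorrelated, so the gain is smaller but usually still present.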
Affiliation(s)
- Andrea Hunklinger
- Institute of Structural Biology, Molecular Targets and Therapeutics Center, Helmholtz Munich-Deutsches Forschungszentrum für Gesundheit und Umwelt (GmbH), DE-85764 Neuherberg, Germany
- Peter Hartog
- Institute of Structural Biology, Molecular Targets and Therapeutics Center, Helmholtz Munich-Deutsches Forschungszentrum für Gesundheit und Umwelt (GmbH), DE-85764 Neuherberg, Germany
- Martin Šícho
- Leiden Academic Centre for Drug Research, Leiden University, 55 Einsteinweg, 2333 CC Leiden, the Netherlands; CZ-OPENSCREEN: National Infrastructure for Chemical Biology, Department of Informatics and Chemistry, Faculty of Chemical Technology, University of Chemistry and Technology Prague, Technická 5, 166 28, Prague, Czech Republic
- Guillaume Godin
- dsm-firmenich SA, Rue de la Bergère 7, CH-1242 Satigny, Switzerland
- Igor V Tetko
- Institute of Structural Biology, Molecular Targets and Therapeutics Center, Helmholtz Munich-Deutsches Forschungszentrum für Gesundheit und Umwelt (GmbH), DE-85764 Neuherberg, Germany; BIGCHEM GmbH, Valerystr. 49, DE-85716 Unterschleißheim, Germany.
2
Stienstra CMK, Ieritano C, Haack A, Hopkins WS. Bridging the Gap between Differential Mobility, Log S, and Log P Using Machine Learning and SHAP Analysis. Anal Chem 2023. [PMID: 37384824 DOI: 10.1021/acs.analchem.3c00921] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 07/01/2023]
Abstract
Aqueous solubility, log S, and the water-octanol partition coefficient, log P, are physicochemical properties that are used to screen the viability of drug candidates and to estimate mass transport in the environment. In this work, differential mobility spectrometry (DMS) experiments performed in microsolvating environments are used to train machine learning (ML) frameworks that predict the log S and log P of various molecule classes. In lieu of a consistent source of experimentally measured log S and log P values, the OPERA package was used to evaluate the aqueous solubility and hydrophobicity of 333 analytes. With ion mobility/DMS data (e.g., CCS, dispersion curves) as input, we used ML regressors and ensemble stacking to derive relationships with a high degree of explainability, as assessed via SHapley Additive exPlanations (SHAP) analysis. After 5-fold random cross-validation, the DMS-based regression models returned scores of R2 = 0.67 and RMSE = 1.03 ± 0.10 for log S predictions and R2 = 0.67 and RMSE = 1.20 ± 0.10 for log P. SHAP analysis reveals that the regressors strongly weighted gas-phase clustering in log P correlations. Adding structural descriptors (e.g., the number of aromatic carbons) improved log S predictions to RMSE = 0.84 ± 0.07 and R2 = 0.78. Similarly, log P predictions using the same data resulted in an RMSE of 0.83 ± 0.04 and R2 = 0.84. The SHAP analysis of the log P models highlights the need for additional experimental parameters describing hydrophobic interactions. These results were achieved with a smaller dataset (333 instances) and minimal structural correlation compared with purely structure-based models, underscoring the value of employing DMS data in predictive models.
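The ± values quoted for 5-fold random cross-validation are commonly the spread of the per-fold scores. The sketch below assumes that reading (mean and sample standard deviation of the per-fold test RMSE); the abstract does not define it. `fit` and `predict` are hypothetical placeholders for any regressor, and the baseline used in the usage lines simply predicts the training-set mean.

```python
import math
import random
from statistics import mean, stdev


def k_fold_indices(n, k=5, seed=0):
    """Randomly shuffle indices 0..n-1 and yield (train, test) index lists for k folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]  # k disjoint folds of near-equal size
    for i in range(k):
        train = [j for f, fold in enumerate(folds) if f != i for j in fold]
        yield train, folds[i]


def rmse(predicted, observed):
    return math.sqrt(mean((p - o) ** 2 for p, o in zip(predicted, observed)))


def cross_validated_rmse(x, y, fit, predict, k=5, seed=0):
    """Return the mean and sample standard deviation of the per-fold test RMSE."""
    scores = []
    for train, test in k_fold_indices(len(y), k, seed):
        model = fit([x[i] for i in train], [y[i] for i in train])
        scores.append(rmse([predict(model, x[i]) for i in test], [y[i] for i in test]))
    return mean(scores), stdev(scores)


# Hypothetical baseline "model": always predicts the training-set mean.
def fit_mean(xs, ys):
    return mean(ys)


def predict_mean(model, xi):
    return model


x = list(range(333))  # 333 analytes, as in the study; values are made up
y = [0.1 * xi for xi in x]
print(cross_validated_rmse(x, y, fit_mean, predict_mean))
```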
Affiliation(s)
- Cailum M K Stienstra
- Department of Chemistry, University of Waterloo, Waterloo, Ontario N2L 3G1, Canada
- Christian Ieritano
- Department of Chemistry, University of Waterloo, Waterloo, Ontario N2L 3G1, Canada
- Alexander Haack
- Department of Chemistry, University of Waterloo, Waterloo, Ontario N2L 3G1, Canada
- W Scott Hopkins
- Department of Chemistry, University of Waterloo, Waterloo, Ontario N2L 3G1, Canada
- Watermine Innovation, Waterloo, Ontario N0B 2T0, Canada
- Centre for Eye and Vision Research, Hong Kong Science Park, New Territories 999077, Hong Kong
3
Zamora WJ, Viayna A, Pinheiro S, Curutchet C, Bisbal L, Ruiz R, Ràfols C, Luque FJ. Prediction of toluene/water partition coefficients in the SAMPL9 blind challenge: assessment of machine learning and IEF-PCM/MST continuum solvation models. Phys Chem Chem Phys 2023. [PMID: 37376995 DOI: 10.1039/d3cp01428b] [Citation(s) in RCA: 4] [Impact Index Per Article: 4.0] [Indexed: 06/29/2023]
Abstract
In recent years, partition systems other than the widely used biphasic n-octanol/water system have received increased attention as a way to gain insight into the molecular features that dictate the lipophilicity of compounds. In particular, the difference between n-octanol/water and toluene/water partition coefficients has proven to be a valuable descriptor of the propensity of molecules to form intramolecular hydrogen bonds and to exhibit chameleon-like properties that modulate solubility and permeability. In this context, this study reports the experimental toluene/water partition coefficients (log Ptol/w) for a series of 16 drugs that were selected as an external test set for the Statistical Assessment of the Modeling of Proteins and Ligands (SAMPL) blind challenge. The computational community has used this external set to calibrate their methods in the current edition (SAMPL9) of the contest. The study also investigates the performance of two computational strategies for predicting log Ptol/w. The first relies on two machine learning (ML) models, which combine a selection of 11 molecular descriptors with either multiple linear regression (MLR) or random forest regression (RFR) and are built on a dataset of 252 experimental log Ptol/w values. The second parametrizes the IEF-PCM/MST continuum solvation model from B3LYP/6-31G(d) calculations to predict the solvation free energies of 163 compounds in toluene and benzene. The performance of the ML and IEF-PCM/MST models has been assessed against external test sets, including the compounds that define the SAMPL9 log Ptol/w challenge. The results are used to discuss the merits and weaknesses of the two computational approaches.
Affiliation(s)
- William J Zamora
- CBio3 Laboratory, School of Chemistry, University of Costa Rica, San Pedro, San José, Costa Rica.
- Laboratory of Computational Toxicology and Artificial Intelligence (LaToxCIA), Biological Testing Laboratory (LEBi), University of Costa Rica, San Pedro, San José, Costa Rica
- Advanced Computing Lab (CNCA), National High Technology Center (CeNAT), Pavas, San José, Costa Rica
- Antonio Viayna
- Departament de Nutrició, Ciències de l'Alimentació i Gastronomia, Facultat de Farmàcia i Ciències de l'Alimentació, Universitat de Barcelona (UB), Av. Prat de la Riba 171, 08921 Santa Coloma de Gramenet, Spain.
- Institut de Biomedicina (IBUB), Universitat de Barcelona (UB), Barcelona, Spain
- Institut de Química Teòrica i Computacional (IQTC-UB), Universitat de Barcelona (UB), Barcelona, Spain
- Silvana Pinheiro
- CBio3 Laboratory, School of Chemistry, University of Costa Rica, San Pedro, San José, Costa Rica.
- Laboratory of Computational Toxicology and Artificial Intelligence (LaToxCIA), Biological Testing Laboratory (LEBi), University of Costa Rica, San Pedro, San José, Costa Rica
- Carles Curutchet
- Institut de Química Teòrica i Computacional (IQTC-UB), Universitat de Barcelona (UB), Barcelona, Spain
- Departament de Farmàcia i Tecnologia Farmacèutica, i Fisicoquímica, Facultat de Farmàcia i Ciències de l'Alimentació, Universitat de Barcelona (UB), Av. Joan XXIII 27-31, 08028, Barcelona, Spain
- Laia Bisbal
- Institut de Biomedicina (IBUB), Universitat de Barcelona (UB), Barcelona, Spain
- Departament d'Enginyeria Química i Química Analítica, Universitat de Barcelona (UB), Martí i Franquès 1-11, 08028 Barcelona, Spain.
- Rebeca Ruiz
- Pion Inc., Forest Row Business Park, Forest Row RH18 5DW, UK
- Clara Ràfols
- Institut de Biomedicina (IBUB), Universitat de Barcelona (UB), Barcelona, Spain
- Departament d'Enginyeria Química i Química Analítica, Universitat de Barcelona (UB), Martí i Franquès 1-11, 08028 Barcelona, Spain.
- F Javier Luque
- Departament de Nutrició, Ciències de l'Alimentació i Gastronomia, Facultat de Farmàcia i Ciències de l'Alimentació, Universitat de Barcelona (UB), Av. Prat de la Riba 171, 08921 Santa Coloma de Gramenet, Spain.
- Institut de Biomedicina (IBUB), Universitat de Barcelona (UB), Barcelona, Spain
- Institut de Química Teòrica i Computacional (IQTC-UB), Universitat de Barcelona (UB), Barcelona, Spain
4
Zhu X, Polyakov VR, Bajjuri K, Hu H, Maderna A, Tovee CA, Ward SC. Building Machine Learning Small Molecule Melting Points and Solubility Models Using CCDC Melting Points Dataset. J Chem Inf Model 2023; 63:2948-2959. [PMID: 37125691 DOI: 10.1021/acs.jcim.3c00308] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 05/02/2023]
Abstract
Predicting the solubility of small molecules is difficult because reliable and consistent experimental solubility data are lacking. For a molecule in a crystal lattice to dissolve, it must first dissociate from the lattice and then be solvated. The melting point of a compound is proportional to its lattice energy, and the octanol-water partition coefficient (log P) is a measure of the compound's solvation efficiency. The CCDC's melting point dataset of almost one hundred thousand compounds was used to create widely applicable machine learning models of small-molecule melting points. Using the general solubility equation, the aqueous thermodynamic solubilities of the same compounds can then be predicted. The global model can easily be localized by adding further melting point measurements for a chemical series of interest.
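The general solubility equation links the two steps of dissolution to measurable quantities: a melting-point term for leaving the crystal lattice and log P for solvation. A minimal sketch of the standard Yalkowsky form, log S = 0.5 - 0.01(MP - 25) - log P with MP in °C and S in mol/L, is shown below; the melting-point term is set to zero for compounds that are liquid at 25 °C. Whether the paper uses exactly this form or a refitted variant is not stated in the abstract.

```python
def gse_log_s(melting_point_c: float, log_p: float) -> float:
    """General solubility equation: log10 of molar aqueous solubility.

    The melting-point term is zero for liquids (melting point below 25 °C).
    """
    lattice_term = 0.01 * max(melting_point_c - 25.0, 0.0)  # cost of leaving the crystal
    return 0.5 - lattice_term - log_p                        # log P accounts for solvation


# Hypothetical input: a solid melting at 125 °C with log P = 2.
print(gse_log_s(125.0, 2.0))  # -2.5
```

In this scheme, a melting-point model (as trained on the CCDC data) supplies the first input, and a measured or predicted log P supplies the second.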
Affiliation(s)
- Xiangwei Zhu
- Sutro Biopharma, 111 Oyster Point Blvd, South San Francisco, California 94080, United States
- Valery R Polyakov
- Sutro Biopharma, 111 Oyster Point Blvd, South San Francisco, California 94080, United States
- Krishna Bajjuri
- Sutro Biopharma, 111 Oyster Point Blvd, South San Francisco, California 94080, United States
- Huiyong Hu
- Sutro Biopharma, 111 Oyster Point Blvd, South San Francisco, California 94080, United States
- Andreas Maderna
- Sutro Biopharma, 111 Oyster Point Blvd, South San Francisco, California 94080, United States
- Clare A Tovee
- Cambridge Crystallographic Data Centre, 12 Union Road, Cambridge CB2 1EZ, U.K
- Suzanna C Ward
- Cambridge Crystallographic Data Centre, 12 Union Road, Cambridge CB2 1EZ, U.K
5
Burns MJ, Andrews IX, Baumann JC, Elliott EL, Fennell JW, Kallemeyn JM, Lemaire S, Murphy NS, Palacio M, Raw SA, Roberts AJ, Moura Rocha NF, Schils D, Oestrich RS, Shannon-Little AL, Stevenson N, Talavera P, Teasdale A, Urquhart MW, Waechter F. Establishing Best Practice for the Application and Support of Solubility Purge Factors. Org Process Res Dev 2023. [DOI: 10.1021/acs.oprd.2c00360] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 03/18/2023]
Affiliation(s)
- Ian X. Andrews
- GlaxoSmithKline, Collegeville, Pennsylvania 19426, United States
- Eric L. Elliott
- Takeda Pharmaceuticals, Cambridge, Massachusetts 02139, United States
- Sebastien Lemaire
- Johnson & Johnson Pharmaceutical Research and Development, 2340 Beerse, Belgium
- Steven A. Raw
- Pharmaceutical Technology and Development, AstraZeneca, Macclesfield Campus, Charter Way, Macclesfield, Cheshire SK10 2NA, United Kingdom
- Andrew Teasdale
- Pharmaceutical Technology and Development, AstraZeneca, Macclesfield Campus, Charter Way, Macclesfield, Cheshire SK10 2NA, United Kingdom
6
Kenney DH, Paffenroth RC, Timko MT, Teixeira AR. Dimensionally reduced machine learning model for predicting single component octanol-water partition coefficients. J Cheminform 2023; 15:9. [PMID: 36658606 PMCID: PMC9854055 DOI: 10.1186/s13321-022-00660-1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/26/2022] [Accepted: 11/25/2022] [Indexed: 01/20/2023] [Open Access]
Abstract
MF-LOGP, a new method for determining single-component octanol-water partition coefficients (log P) that uses the molecular formula as its only input, is presented. Octanol-water partition coefficients are useful in many applications, ranging from environmental fate to drug delivery. Currently, partition coefficients are either measured experimentally or predicted as a function of structural fragments, topological descriptors, or thermodynamic properties known or calculated from precise molecular structures. The MF-LOGP method presented here differs from classical methods in that it does not require any structural information and uses the molecular formula as the sole model input. MF-LOGP is therefore useful when the structure is unknown or when low-dimensional, easily automatable, and computationally inexpensive calculations are required. MF-LOGP is a random forest algorithm that is trained and tested on 15,377 data points, using 10 features derived from the molecular formula to make log P predictions. On an independent validation set of 2713 data points, MF-LOGP achieved an average RMSE = 0.77 ± 0.007, MAE = 0.52 ± 0.003, and R² = 0.83 ± 0.003. This performance falls within the range reported in the published literature for conventional higher-dimensional models (RMSE = 0.42-1.54, MAE = 0.09-1.07, and R² = 0.32-0.95). Compared with existing models, MF-LOGP requires at most ten features and no structural information, providing a practical yet predictive tool. The development of MF-LOGP lays the groundwork for more physical prediction models that leverage big-data analytical methods or address complex multicomponent mixtures.
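MF-LOGP's defining idea is that every model input is derived from the molecular formula alone. The abstract does not list the ten features, so the sketch below uses ten element counts chosen purely for illustration (the `ELEMENTS` list is hypothetical); the resulting vectors would then be passed to a random-forest regressor, which is not shown. The parser handles only simple formulas without parentheses, isotopes or charges.

```python
import re
from collections import Counter

# Hypothetical feature set: ten element counts. The actual MF-LOGP features may differ.
ELEMENTS = ["C", "H", "N", "O", "S", "P", "F", "Cl", "Br", "I"]


def parse_formula(formula: str) -> Counter:
    """Count atoms in a simple molecular formula such as 'C9H8O4'."""
    counts = Counter()
    # Each match is an element symbol (capital + optional lowercase) and an optional count.
    for symbol, number in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        counts[symbol] += int(number) if number else 1
    return counts


def formula_features(formula: str) -> list[int]:
    """Fixed-length feature vector: one count per element in ELEMENTS."""
    counts = parse_formula(formula)
    return [counts.get(element, 0) for element in ELEMENTS]


print(formula_features("C9H8O4"))  # aspirin
print(formula_features("CH2Cl2"))  # dichloromethane
```

Because no connectivity is used, isomers with the same formula receive identical features, which is the trade-off the abstract accepts in exchange for needing no structure.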
Affiliation(s)
- David H. Kenney
- Department of Chemical Engineering, Worcester Polytechnic Institute, Worcester, MA 01609 USA
- Randy C. Paffenroth
- Department of Mathematical Sciences, Worcester Polytechnic Institute, Worcester, MA 01609 USA
- Michael T. Timko
- Department of Chemical Engineering, Worcester Polytechnic Institute, Worcester, MA 01609 USA
- Andrew R. Teixeira
- Department of Chemical Engineering, Worcester Polytechnic Institute, Worcester, MA 01609 USA
7
Jia Q, Ni Y, Liu Z, Gu X, Cui Z, Fan M, Zhu Q, Wang Y, Ma J. Fast Prediction of Lipophilicity of Organofluorine Molecules: Deep Learning-Derived Polarity Characters and Experimental Tests. J Chem Inf Model 2022; 62:4928-4936. [PMID: 36223527 DOI: 10.1021/acs.jcim.2c01201] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/28/2022]
Abstract
Fast and accurate estimation of lipophilicity for organofluorine molecules is in great demand for accelerating drug and materials discovery. A lipophilicity data set of organofluorine molecules (OFL data set) containing 1907 samples is constructed through density functional theory (DFT) calculations and experimental measurements. An efficient and interpretable model, called PoLogP, is developed to predict the n-octanol/water partition coefficient, log Po/w, of organofluorine molecules from polarization descriptors, which combine polarity descriptors (the molecular polarity index and the molecular polarizability, α) with a hydrogen-bond (HB) index consisting of the numbers of donors (NHBD) and acceptors (NHBA and NHB-FA). PoLogP, with its combination of polarity descriptors, is shown to perform better for fluorine-containing molecules than the dipole moment (μ) alone. With the aid of a multilevel attention graph convolutional neural network model, the polarity descriptors of organofluorine molecules can be generated quickly with DFT-level accuracy from the topological molecular graph alone. PoLogP is further validated on synthesized organofluorine molecules and on 2626 non-fluorinated molecules with satisfactory accuracy, highlighting its potential for high-throughput screening of functional molecules with the desired solubility in various solvent media.
Affiliation(s)
- Qingqing Jia
- Key Laboratory of Mesoscopic Chemistry of Ministry of Education, School of Chemistry and Chemical Engineering, Nanjing University, Nanjing 210023, P. R. China
- Yifan Ni
- Jiangsu Key Laboratory of Advanced Organic Materials, School of Chemistry and Chemical Engineering, Nanjing University, Nanjing 210023, P. R. China
- Ziteng Liu
- Key Laboratory of Mesoscopic Chemistry of Ministry of Education, School of Chemistry and Chemical Engineering, Nanjing University, Nanjing 210023, P. R. China
- Xu Gu
- Key Laboratory of Mesoscopic Chemistry of Ministry of Education, School of Chemistry and Chemical Engineering, Nanjing University, Nanjing 210023, P. R. China
- Ziyi Cui
- Key Laboratory of Mesoscopic Chemistry of Ministry of Education, School of Chemistry and Chemical Engineering, Nanjing University, Nanjing 210023, P. R. China
- Mengting Fan
- Key Laboratory of Mesoscopic Chemistry of Ministry of Education, School of Chemistry and Chemical Engineering, Nanjing University, Nanjing 210023, P. R. China
- Qiang Zhu
- Key Laboratory of Mesoscopic Chemistry of Ministry of Education, School of Chemistry and Chemical Engineering, Nanjing University, Nanjing 210023, P. R. China
- Yi Wang
- Jiangsu Key Laboratory of Advanced Organic Materials, School of Chemistry and Chemical Engineering, Nanjing University, Nanjing 210023, P. R. China
- Jing Ma
- Key Laboratory of Mesoscopic Chemistry of Ministry of Education, School of Chemistry and Chemical Engineering, Nanjing University, Nanjing 210023, P. R. China; Jiangsu Key Laboratory of Advanced Organic Materials, School of Chemistry and Chemical Engineering, Nanjing University, Nanjing 210023, P. R. China
8
Zhu Q, Jia Q, Liu Z, Ge Y, Gu X, Cui Z, Fan M, Ma J. Molecular partition coefficient from machine learning with polarization and entropy embedded atom-centered symmetry functions. Phys Chem Chem Phys 2022; 24:23082-23088. [PMID: 36134471 DOI: 10.1039/d2cp02648a] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Indexed: 11/21/2022]
Abstract
Efficient prediction of the partition coefficient (log P) between polar and non-polar phases could shorten the drug and materials design cycle. In this work, a descriptor named 〈q - ACSFs〉conf is proposed to account for explicit polarization effects in the polar phase and for the conformational ensemble of energetic and entropic significance in the non-polar phase. Polarization effects are included by embedding partial charges, derived directly from force fields or quantum chemistry calculations, into atom-centered symmetry functions (ACSFs); entropy effects are included by averaging over different conformations, taken from the similarity matrix, according to their Boltzmann distribution. The model was trained with high-dimensional neural networks (HDNNs) on the public PhysProp dataset (41 039 samples). Satisfactory log P prediction performance was achieved on three other datasets, namely Martel (707 molecules), Star & Non-Star (266) and Huuskonen (1870). The 〈q - ACSFs〉conf model was also applicable to n-carboxylic acids with 2 to 14 carbon atoms and to 54 kinds of organic solvent. The method is easy to apply to systems of arbitrary size and gives a transferable, atom-based partition coefficient.
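The Boltzmann-averaging step can be sketched as follows: each conformer's descriptor vector is weighted by exp(-E_i/RT), normalized over all conformers. How the paper selects conformers from its similarity matrix, and which energies it uses, are not given in the abstract; here the conformer energies (kcal/mol) and descriptor vectors are hypothetical inputs, and `R_KCAL` is the gas constant in kcal/(mol·K).

```python
import math

R_KCAL = 1.987204e-3  # gas constant, kcal/(mol*K)


def boltzmann_weights(energies_kcal, temperature=298.15):
    """Normalized Boltzmann populations of conformers with the given relative energies."""
    kt = R_KCAL * temperature
    e_min = min(energies_kcal)  # shift by the minimum for numerical stability
    factors = [math.exp(-(e - e_min) / kt) for e in energies_kcal]
    z = sum(factors)            # partition function over the conformer set
    return [f / z for f in factors]


def boltzmann_average(descriptors, energies_kcal, temperature=298.15):
    """Population-weighted average of per-conformer descriptor vectors."""
    weights = boltzmann_weights(energies_kcal, temperature)
    n_features = len(descriptors[0])
    return [sum(w * d[k] for w, d in zip(weights, descriptors)) for k in range(n_features)]


# Three hypothetical conformers with two-component descriptors.
conformers = [[0.2, 1.0], [0.4, 0.8], [0.9, 0.1]]
energies = [0.0, 0.5, 2.0]
print(boltzmann_average(conformers, energies))
```

Low-energy conformers dominate the average, while a conformer 2 kcal/mol higher contributes only a few percent at room temperature.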
Affiliation(s)
- Qiang Zhu
- Key Laboratory of Mesoscopic Chemistry of Ministry of Education, Institute of Theoretical and Computational Chemistry, School of Chemistry and Chemical Engineering, Nanjing University, Nanjing, 210023, P. R. China.
- Qingqing Jia
- Key Laboratory of Mesoscopic Chemistry of Ministry of Education, Institute of Theoretical and Computational Chemistry, School of Chemistry and Chemical Engineering, Nanjing University, Nanjing, 210023, P. R. China.
- Ziteng Liu
- Key Laboratory of Mesoscopic Chemistry of Ministry of Education, Institute of Theoretical and Computational Chemistry, School of Chemistry and Chemical Engineering, Nanjing University, Nanjing, 210023, P. R. China.
- Yang Ge
- Key Laboratory of Mesoscopic Chemistry of Ministry of Education, Institute of Theoretical and Computational Chemistry, School of Chemistry and Chemical Engineering, Nanjing University, Nanjing, 210023, P. R. China.
- Xu Gu
- Key Laboratory of Mesoscopic Chemistry of Ministry of Education, Institute of Theoretical and Computational Chemistry, School of Chemistry and Chemical Engineering, Nanjing University, Nanjing, 210023, P. R. China.
- Ziyi Cui
- Key Laboratory of Mesoscopic Chemistry of Ministry of Education, Institute of Theoretical and Computational Chemistry, School of Chemistry and Chemical Engineering, Nanjing University, Nanjing, 210023, P. R. China.
- Mengting Fan
- Key Laboratory of Mesoscopic Chemistry of Ministry of Education, Institute of Theoretical and Computational Chemistry, School of Chemistry and Chemical Engineering, Nanjing University, Nanjing, 210023, P. R. China.
- Jing Ma
- Key Laboratory of Mesoscopic Chemistry of Ministry of Education, Institute of Theoretical and Computational Chemistry, School of Chemistry and Chemical Engineering, Nanjing University, Nanjing, 210023, P. R. China.
9
MRlogP: Transfer Learning Enables Accurate logP Prediction Using Small Experimental Training Datasets. Processes (Basel) 2021. [DOI: 10.3390/pr9112029] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Indexed: 11/16/2022] [Open Access]
Abstract
Small-molecule lipophilicity is often included in generalized rules for medicinal chemistry. These rules aim to reduce time, effort, costs, and attrition rates in drug discovery by allowing compounds to be rejected or prioritized without the need for synthesis and testing. The availability of abundant, high-quality training data can be a major limiting factor in building effective property predictors with machine learning. We use transfer learning to get around this problem: the model is first trained on a large amount of low-accuracy predicted logP values and then fine-tuned on a small, accurate dataset of 244 druglike compounds. The resulting MRlogP is a neural network-based logP predictor capable of outperforming state-of-the-art freely available logP prediction methods for druglike small molecules. MRlogP achieves average root-mean-squared errors of 0.988 and 0.715 against druglike molecules from Reaxys and PHYSPROP, respectively. We have made the trained neural network predictor and all associated code for descriptor generation freely available. In addition, MRlogP can be used online via a web interface.
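The pretrain-then-fine-tune workflow can be illustrated at toy scale. MRlogP is a neural network; this sketch swaps in a two-parameter linear model trained by full-batch gradient descent, purely to show the mechanics: weights are first fitted to a large set of cheap, systematically biased predicted values and then used as the starting point for a short run on a few accurate measurements. All data and the `train_linear` helper are hypothetical.

```python
def train_linear(weights, data, lr=0.05, epochs=2000):
    """Full-batch gradient descent on mean squared error for y ~ w0 + w1 * x."""
    w0, w1 = weights
    n = len(data)
    for _ in range(epochs):
        residuals = [(w0 + w1 * x - y, x) for x, y in data]
        w0 -= lr * 2.0 / n * sum(r for r, _ in residuals)
        w1 -= lr * 2.0 / n * sum(r * x for r, x in residuals)
    return w0, w1


def mse_loss(weights, data):
    w0, w1 = weights
    return sum((w0 + w1 * x - y) ** 2 for x, y in data) / len(data)


# Step 1 (pretraining): many cheap but biased "predicted" values (hypothetical).
large_predicted = [(i / 10, 0.5 + 0.8 * (i / 10)) for i in range(41)]
pretrained = train_linear((0.0, 0.0), large_predicted)

# Step 2 (fine-tuning): a few accurate "experimental" values (hypothetical),
# starting from the pretrained weights and using only a short training run.
small_accurate = [(1.0, 1.0), (2.0, 2.0), (3.0, 3.0)]
fine_tuned = train_linear(pretrained, small_accurate, epochs=50)
print(pretrained, fine_tuned)
```

In a real network the pretrained weights encode general structure-property trends that a small accurate set could not teach on its own; the linear toy only shows how the two training stages are chained.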
10
Gorzalczany SB, Rodriguez Basso AG. Strategies to apply 3Rs in preclinical testing. Pharmacol Res Perspect 2021; 9:e00863. [PMID: 34609088 PMCID: PMC8491455 DOI: 10.1002/prp2.863] [Citation(s) in RCA: 25] [Impact Index Per Article: 8.3] [Received: 12/11/2020] [Accepted: 08/13/2021] [Indexed: 12/12/2022] [Open Access]
Abstract
Animal experimentation has been fundamental in biological and biomedical research. In vivo testing is necessary to guarantee the quality, efficacy and/or safety of products intended for use in humans; however, for over 60 years, alternative methods have been developed in response to the need to reduce the number of animals used in experimentation and to guarantee their welfare, resorting to animal models only when strictly necessary. The three Rs (Replacement, Reduction, and Refinement) seek to ensure the rational and respectful use of laboratory animals while keeping bioethical considerations in view. This article describes different approaches to applying the 3Rs in preclinical experimentation for either research or regulatory purposes.
Affiliation(s)
- Susana B. Gorzalczany
- Universidad de Buenos Aires, Facultad de Farmacia y Bioquímica, Pharmacology Department, Buenos Aires, Argentina
- Angeles G. Rodriguez Basso
- Universidad de Buenos Aires, Facultad de Farmacia y Bioquímica, Pharmacology Department, Buenos Aires, Argentina
11
Grant J, Özkan A, Oh C, Mahajan G, Prantil-Baun R, Ingber DE. Simulating drug concentrations in PDMS microfluidic organ chips. Lab Chip 2021; 21:3509-3519. [PMID: 34346471 PMCID: PMC8440455 DOI: 10.1039/d1lc00348h] [Citation(s) in RCA: 45] [Impact Index Per Article: 15.0] [Indexed: 05/20/2023]
Abstract
Microfluidic organ-on-a-chip (Organ Chip) cell culture devices are often fabricated from polydimethylsiloxane (PDMS) because it is biocompatible, transparent, elastomeric, and oxygen permeable; however, hydrophobic small molecules can absorb into PDMS, which makes it challenging to predict drug responses. Here, we describe a combined simulation and experimental approach to predict the spatial and temporal concentration profile of a drug under continuous dosing in a PDMS Organ Chip containing two parallel channels separated by a porous membrane lined with cultured cells, without prior knowledge of the drug's log P value. First, a three-dimensional finite element model of drug loss into the chip was developed that incorporates absorption, adsorption, convection, and diffusion, and simulates changes in drug levels over time and space as a function of potential PDMS diffusion coefficients and log P values. By then experimentally measuring the diffusivity of the compound in PDMS and determining its partition coefficient through mass spectrometric analysis of the drug concentration in the channel outflow, it is possible to estimate the effective log P range of the compound. The diffusion and partition coefficients were derived experimentally for amodiaquine, an antimalarial drug and potential SARS-CoV-2 therapeutic, and incorporated into the model to quantitatively estimate the drug-specific concentration profile over time, as measured in human lung airway chips lined with bronchial epithelium interfaced with pulmonary microvascular endothelium. The same strategy can be applied to any device geometry, surface treatment, or in vitro microfluidic model to simulate the spatial and temporal gradient of a drug in 3D without prior knowledge of its partition coefficient or its rate of diffusion in PDMS. Thus, this approach may expand the use of PDMS Organ Chip devices for various forms of drug testing.
Affiliation(s)
- Jennifer Grant
- Wyss Institute for Biologically Inspired Engineering, Harvard University, Boston, MA 02115, USA.
- Alican Özkan
- Wyss Institute for Biologically Inspired Engineering, Harvard University, Boston, MA 02115, USA.
- Crystal Oh
- Wyss Institute for Biologically Inspired Engineering, Harvard University, Boston, MA 02115, USA.
- Gautam Mahajan
- Wyss Institute for Biologically Inspired Engineering, Harvard University, Boston, MA 02115, USA.
- Rachelle Prantil-Baun
- Wyss Institute for Biologically Inspired Engineering, Harvard University, Boston, MA 02115, USA.
- Donald E Ingber
- Wyss Institute for Biologically Inspired Engineering, Harvard University, Boston, MA 02115, USA.
- Harvard John A. Paulson School of Engineering and Applied Sciences, Harvard University, Cambridge, MA 02138, USA
- Vascular Biology Program and Department of Surgery, Harvard Medical School and Boston Children's Hospital, Boston, MA 02115, USA
12
Lopez K, Pinheiro S, Zamora WJ. Multiple linear regression models for predicting the n‑octanol/water partition coefficients in the SAMPL7 blind challenge. J Comput Aided Mol Des 2021; 35:923-931. [PMID: 34251523 PMCID: PMC8273033 DOI: 10.1007/s10822-021-00409-2] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 04/05/2021] [Accepted: 07/05/2021] [Indexed: 01/19/2023]
Abstract
A multiple linear regression model called MLR-3 is used to predict the experimental n-octanol/water partition coefficient (log PN) of 22 N-sulfonamides proposed by the organizers of the SAMPL7 blind challenge. MLR-3 was trained on 82 molecules, including drug-like sulfonamides and small organic molecules resembling the main functional groups present in the challenge dataset. Our model, submitted as "TFE-MLR", had a root-mean-square error of 0.58 and a mean absolute error of 0.41 in log P units, achieving the highest accuracy among the empirical methods and among all ranked submissions. Overall, the results support the appropriateness of the MLR-3 multiple linear regression approach for computing the n-octanol/water partition coefficient of sulfonamide-bearing compounds. More broadly, the strong performance of empirical methodologies, with 75% of the ranked submissions achieving root-mean-square errors < 1 log P unit, supports the suitability of these strategies for obtaining accurate and fast predictions of physicochemical properties such as the partition coefficients of bioorganic compounds.
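A minimal sketch of fitting a multiple linear regression by ordinary least squares and scoring it with RMSE and MAE, the two metrics quoted above. The descriptors behind MLR-3 are not listed in the abstract, so the design matrix and targets here are hypothetical; the normal equations are solved with a small Gaussian elimination to keep the example dependency-free (a library least-squares routine would be the usual choice in practice).

```python
import math


def solve(a, b):
    """Solve a small dense system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back-substitution
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def fit_mlr(descriptors, targets):
    """OLS with an intercept via the normal equations (X^T X) beta = X^T y."""
    rows = [[1.0] + list(d) for d in descriptors]
    p = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * y for r, y in zip(rows, targets)) for i in range(p)]
    return solve(xtx, xty)


def predict(coefs, d):
    return coefs[0] + sum(c * v for c, v in zip(coefs[1:], d))


def rmse_mae(predicted, observed):
    errors = [p - o for p, o in zip(predicted, observed)]
    rmse = math.sqrt(sum(e * e for e in errors) / len(errors))
    mae = sum(abs(e) for e in errors) / len(errors)
    return rmse, mae


# Hypothetical two-descriptor training set and log P values.
train_x = [(0, 0), (1, 0), (0, 1), (1, 1), (2, 3)]
train_y = [1.0, 3.0, 0.5, 2.5, 3.5]
coefs = fit_mlr(train_x, train_y)
print(coefs, rmse_mae([predict(coefs, d) for d in train_x], train_y))
```

RMSE weights large errors more heavily than MAE, which is why the reported RMSE (0.58) exceeds the MAE (0.41).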
Affiliation(s)
- Kenneth Lopez
- School of Chemistry, University of Costa Rica, San Pedro, San José, Costa Rica
- Silvana Pinheiro
- Institute of Exact and Natural Sciences, Federal University of Pará, Belém, Pará, 66075-110, Brazil
- William J Zamora
- School of Chemistry, University of Costa Rica, San Pedro, San José, Costa Rica.
- Advanced Computing Lab (CNCA), National High Technology Center (CeNAT-CONARE), Pavas, San José, Costa Rica.
13
Lim H, Jung Y. MLSolvA: solvation free energy prediction from pairwise atomistic interactions by machine learning. J Cheminform 2021; 13:56. [PMID: 34332634 PMCID: PMC8325294 DOI: 10.1186/s13321-021-00533-z] [Citation(s) in RCA: 13] [Impact Index Per Article: 4.3] [Received: 02/04/2021] [Accepted: 07/15/2021] [Indexed: 01/04/2023]
Abstract
Recent advances in machine learning (ML) technologies and their applications have led to the development of diverse structure-property relationship models for crucial chemical properties, one of which is the solvation free energy. Here, we introduce a novel ML-based solvation model that calculates the solvation energy from pairwise atomistic interactions. The novelty of the proposed model lies in its simple architecture: two encoding functions extract atomic feature vectors from the given chemical structures, and the inner product between two atomistic feature vectors gives their interaction. Evaluated on 6239 experimental measurements, the model achieves outstanding performance. Owing to its solvent-non-specific nature, it also remains transferable as the training data are enlarged. An analysis of the interaction map shows that our model has significant potential for producing group contributions to the solvation energy. This indicates that the model provides not only predictions of target properties but also more detailed physicochemical insights.
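The abstract describes interactions computed as inner products between atomic feature vectors produced by two encoders. The sketch below assumes the encoders have already run, so their outputs are given as plain lists of hypothetical numbers. It also assumes, for illustration only, that the solvation energy is the sum over all solute-solvent atom pairs of those inner products. The published model's encoders and any scaling or readout details are not reproduced, and the function names are hypothetical.

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def interaction_map(solute_atoms, solvent_atoms):
    """Pairwise interactions: entry [i][j] = <solute atom i, solvent atom j>."""
    return [[dot(u, v) for v in solvent_atoms] for u in solute_atoms]


def solvation_energy(solute_atoms, solvent_atoms):
    """Assumed readout: sum of all pairwise solute-solvent interactions."""
    return sum(sum(row) for row in interaction_map(solute_atoms, solvent_atoms))


def atom_contributions(solute_atoms, solvent_atoms):
    """Per-solute-atom share of the energy (row sums), the kind of quantity
    that could be aggregated into group contributions."""
    return [sum(row) for row in interaction_map(solute_atoms, solvent_atoms)]


# Hypothetical 2-D feature vectors standing in for encoder outputs.
solute = [[1.0, 0.0], [0.0, 1.0]]
solvent = [[2.0, 3.0]]
print(interaction_map(solute, solvent))
print(solvation_energy(solute, solvent))
print(atom_contributions(solute, solvent))
```

Every interaction is a plain inner product, so the same function accepts any solvent's feature vectors. This is one way to read the "solvent-non-specific" property the abstract mentions.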
Affiliation(s)
- Hyuntae Lim
- Department of Chemistry, Seoul National University, Seoul, 08826, South Korea
- YounJoon Jung
- Department of Chemistry, Seoul National University, Seoul, 08826, South Korea.
14
Multitask machine learning models for predicting lipophilicity (logP) in the SAMPL7 challenge. J Comput Aided Mol Des 2021; 35:901-909. [PMID: 34273053 PMCID: PMC8367913 DOI: 10.1007/s10822-021-00405-6] [Citation(s) in RCA: 10] [Impact Index Per Article: 3.3] [Received: 02/24/2021] [Accepted: 06/22/2021] [Indexed: 12/22/2022]
Abstract
Accurate prediction of lipophilicity (logP) from molecular structure is a well-established field, and logP predictions are often used to drive drug discovery projects forward. Driven by the SAMPL7 challenge, in this manuscript we describe the steps taken to construct a novel machine learning model that predicts and generalizes well. The model is based on the recently described Directed Message Passing Neural Networks (D-MPNNs). Further enhancements included the addition of datasets from ChEMBL (RMSE improvement of 0.03) and the addition of helper tasks (RMSE improvement of 0.04). To the best of our knowledge, the concept of adding predictions from other models (Simulations Plus logP and logD@pH7.4) as helper tasks is novel and could be applied in a broader context. The final model that we used to participate in the challenge ranked 2nd of 17 ranked submissions, with an RMSE of 0.66 and an MAE of 0.48 (submission: Chemprop). The model also works well on other datasets. Applied retrospectively to the SAMPL6 challenge, it would have ranked first among all submissions (RMSE of 0.35). Although our model works well, we conclude with suggestions that are expected to improve it further.
15
Donyapour N, Dickson A. Predicting partition coefficients for the SAMPL7 physical property challenge using the ClassicalGSG method. J Comput Aided Mol Des 2021; 35:819-830. [PMID: 34181200 PMCID: PMC8295205 DOI: 10.1007/s10822-021-00400-x] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Received: 02/25/2021] [Accepted: 06/17/2021] [Indexed: 02/02/2023]
Abstract
The prediction of log P values is one part of the Statistical Assessment of the Modeling of Proteins and Ligands (SAMPL) blind challenges. Here, we use a molecular graph representation method called Geometric Scattering for Graphs (GSG) to transform atomic attributes into molecular features. The atomic attributes used here are parameters from classical molecular force fields, including partial charges and Lennard-Jones interaction parameters. The molecular features from GSG are used as inputs to neural networks trained on a "master" dataset comprising over 41,000 unique log P values. The molecular targets in the SAMPL7 log P prediction challenge were distinctive in that they all contained a sulfonyl moiety. This motivated a set of ClassicalGSG submissions in which predictors were trained on different subsets of the master dataset, filtered according to chemical type and/or the presence of the sulfonyl moiety. Our ranked prediction obtained 5th place with an RMSE of 0.77 log P units and an MAE of 0.62. One of our non-ranked predictions achieved first place among all submissions, with an RMSE of 0.55 and an MAE of 0.44. After the challenge concluded, we also examined the performance of open-source force field parameters that allow for an end-to-end log P predictor model: General AMBER Force Field (GAFF), Universal Force Field (UFF), Merck Molecular Force Field 94 (MMFF94) and Ghemical. We find that ClassicalGSG models trained with atomic attributes from MMFF94 can yield more accurate predictions than those trained with CGenFF atomic attributes.
Affiliation(s)
- Nazanin Donyapour
- Department of Computational Mathematics, Science and Engineering, Michigan State University, East Lansing, MI, USA
- Alex Dickson
- Department of Computational Mathematics, Science and Engineering, Michigan State University, East Lansing, MI, USA.
- Department of Biochemistry and Molecular Biology, Michigan State University, East Lansing, MI, USA.
16
Donyapour N, Hirn MJ, Dickson A. ClassicalGSG: Prediction of log P using classical molecular force fields and geometric scattering for graphs. J Comput Chem 2021; 42:1006-1017. [PMID: 33786857 PMCID: PMC8062296 DOI: 10.1002/jcc.26519] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 11/23/2020] [Revised: 02/11/2021] [Accepted: 02/21/2021] [Indexed: 12/15/2022]
Abstract
This work examines methods for predicting the partition coefficient (log P) for a dataset of small molecules. Here, we use atomic attributes such as radius and partial charge, which are typically used as force field parameters in classical molecular dynamics simulations. These atomic attributes are transformed into index-invariant molecular features using a recently developed method called geometric scattering for graphs (GSG). We call this approach "ClassicalGSG" and examine its performance under a broad range of conditions and hyperparameters. We train ClassicalGSG log P predictors with neural networks using 10,722 molecules from the OpenChem dataset and apply them to predict the log P values from four independent test sets. The ClassicalGSG method's performance is compared to a baseline model that employs graph convolutional networks. Our results show that the best prediction accuracies are obtained using atomic attributes generated with the CHARMM generalized force field and 2D molecular structures.
Affiliation(s)
- Nazanin Donyapour
- Department of Computational Mathematics, Science and Engineering, Michigan State University, East Lansing, Michigan, USA
- Matthew J. Hirn
- Department of Computational Mathematics, Science and Engineering, Michigan State University, East Lansing, Michigan, USA
- Department of Mathematics, Michigan State University, East Lansing, Michigan, USA
- Center for Quantum Computing, Science and Engineering, Michigan State University, East Lansing, Michigan, USA
- Alex Dickson
- Department of Computational Mathematics, Science and Engineering, Michigan State University, East Lansing, Michigan, USA
- Department of Biochemistry and Molecular Biology, Michigan State University, East Lansing, Michigan, USA
17
Zhao ZW, Omar ÖH, Padula D, Geng Y, Troisi A. Computational Identification of Novel Families of Nonfullerene Acceptors by Modification of Known Compounds. J Phys Chem Lett 2021; 12:5009-5015. [PMID: 34018746 DOI: 10.1021/acs.jpclett.1c01010] [Citation(s) in RCA: 6] [Impact Index Per Article: 2.0] [Indexed: 06/12/2023]
Abstract
We considered a database of tens of thousands of known organic semiconductors and identified those compounds with computed electronic properties (orbital energies, excited state energies, and oscillator strengths) that would make them suitable as nonfullerene electron acceptors in organic solar cells. The range of parameters for the desirable acceptors is determined from a set of experimentally characterized high-efficiency nonfullerene acceptors. This search leads to ∼30 lead compounds never considered before for organic photovoltaic applications. We then proceed to modify these compounds to bring their computed solubility in line with that of the best small-molecule nonfullerene acceptors. A further refinement of the search can be based on additional properties like the reorganization energy for chemical reduction. This simple strategy, which relies on a few easily computable parameters and can be expanded to a larger set of molecules, enables the identification of completely new chemical families to be explored experimentally.
Affiliation(s)
- Zhi-Wen Zhao
- Institute of Functional Material Chemistry, Faculty of Chemistry, Northeast Normal University, Changchun 130024, Jilin, P. R. China
- Ömer H Omar
- Department of Chemistry, University of Liverpool, Liverpool L69 3BX, U.K
- Daniele Padula
- Dipartimento di Biotecnologie, Chimica e Farmacia, Università di Siena, via A. Moro 2, Siena 53100, Italy
- Yun Geng
- Institute of Functional Material Chemistry, Faculty of Chemistry, Northeast Normal University, Changchun 130024, Jilin, P. R. China
- Alessandro Troisi
- Department of Chemistry, University of Liverpool, Liverpool L69 3BX, U.K
18
Predicting the membrane permeability of organic fluorescent probes by the deep neural network based lipophilicity descriptor DeepFl-LogP. Sci Rep 2021; 11:6991. [PMID: 33772099 PMCID: PMC7997998 DOI: 10.1038/s41598-021-86460-3] [Citation(s) in RCA: 9] [Impact Index Per Article: 3.0] [Received: 12/01/2020] [Accepted: 03/16/2021] [Indexed: 01/17/2023]
Abstract
Light microscopy has become an indispensable tool for the life sciences, as it enables the rapid acquisition of three-dimensional images from the interior of living cells and tissues. Over the last decades, super-resolution light microscopy techniques have been developed that offer a resolution up to an order of magnitude higher than that of conventional light microscopy. These techniques require labelling of cellular structures with fluorescent probes that have specific properties. Because the probes are supplied from outside, they have to cross cell membranes. Currently, major efforts are being made to develop probes that can cross cell membranes and exhibit the photophysical properties required for super-resolution imaging. However, probe development still relies on tedious and time-consuming manual screening. An accurate computer-based model that predicts cell permeability from chemical structure would therefore be an invaluable asset for the development of fluorescent probes. Unfortunately, current models based on multiple molecular descriptors are not well suited for this task: they are laborious to use and only moderately accurate. Here, we present a novel fragment-based lipophilicity descriptor, DeepFL-LogP, which was developed on the basis of a deep neural network. DeepFL-LogP exhibits excellent correlation with experimental partition coefficient reference data for drug-like substances (R² = 0.892 and MSE = 0.359). Furthermore, a simple threshold permeability model based on this descriptor categorizes the permeability of fluorescent probes with 96% accuracy. This novel descriptor is expected to greatly simplify and speed up the development of novel cell-permeable fluorophores.
19
Plante J, Caine BA, Popelier PLA. Enhancing Carbon Acid pKa Prediction by Augmentation of Sparse Experimental Datasets with Accurate AIBL (QM) Derived Values. Molecules 2021; 26:1048. [PMID: 33671348 PMCID: PMC7922142 DOI: 10.3390/molecules26041048] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 12/20/2020] [Revised: 02/08/2021] [Accepted: 02/11/2021] [Indexed: 11/25/2022]
Abstract
The prediction of the aqueous pKa of carbon acids by Quantitative Structure-Property Relationship (QSPR) or cheminformatics-based methods is a rather arduous problem. The main obstacle is a shortage of high-quality experimental data points measured under homogeneous conditions, which prevents a good global model from being generated. In our computationally efficient pKa prediction method, we generate an atom-type feature vector, called a distance spectrum, from the assigned ionisation atom. We then learn coefficients for those atom-types that show the impact each atom-type has on the pKa of the ionisable centre. In the current work, we augment our dataset with pKa values from a series of high-performing local models derived from the Ab Initio Bond Lengths (AIBL) method. We find that distilling the knowledge available from multiple models into one general model reduces the prediction error for an external test set, compared with using literature experimental data alone.
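The abstract describes the distance spectrum only briefly. One plausible reading, sketched below as an assumption, is to count atoms of each type at each topological (bond-count) distance from the ionisable atom and feed those counts to a linear model with learned per-feature coefficients. The molecule, atom typing, `max_distance` cutoff and coefficient values are hypothetical, and so are the function names. The published atom types and any distance weighting may differ.

```python
from collections import Counter, deque


def topological_distances(adjacency, start):
    """Breadth-first search: number of bonds from `start` to every reachable atom."""
    dist = {start: 0}
    queue = deque([start])
    while queue:
        atom = queue.popleft()
        for neighbour in adjacency[atom]:
            if neighbour not in dist:
                dist[neighbour] = dist[atom] + 1
                queue.append(neighbour)
    return dist


def distance_spectrum(atom_types, adjacency, ionisable_atom, max_distance=4):
    """Count (atom type, distance) pairs around the ionisable atom (itself excluded)."""
    dist = topological_distances(adjacency, ionisable_atom)
    return Counter((atom_types[a], d) for a, d in dist.items() if 0 < d <= max_distance)


def predict_pka(spectrum, coefficients, intercept):
    """Linear model: intercept plus one learned coefficient per (type, distance) feature."""
    return intercept + sum(coefficients.get(key, 0.0) * n for key, n in spectrum.items())


# Heavy atoms of acetone; atom 0 is the alpha carbon that loses a proton.
atom_types = ["C", "C", "O", "C"]
adjacency = {0: [1], 1: [0, 2, 3], 2: [1], 3: [1]}
spectrum = distance_spectrum(atom_types, adjacency, ionisable_atom=0)
print(spectrum)
print(predict_pka(spectrum, {("O", 2): -5.0}, intercept=25.0))  # hypothetical coefficients
```

A linear model over such counts keeps every learned coefficient interpretable as the contribution of one atom type at one distance. This is consistent with the abstract's description of coefficients that "show the impact each atom-type has".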
Affiliation(s)
- Jeffrey Plante
- Lhasa Limited, Granary Wharf House, 2 Canal Wharf, Leeds LS11 5PS, UK
- Beth A. Caine
- Manchester Institute of Biotechnology (MIB), 131 Princess Street, Manchester M1 7DN, UK
- Paul L. A. Popelier
- Manchester Institute of Biotechnology (MIB), 131 Princess Street, Manchester M1 7DN, UK
- Department of Chemistry, University of Manchester, Oxford Road, Manchester M13 9PL, UK
20
Ponting DJ, van Deursen R, Ott MA. Machine Learning Predicts Degree of Aromaticity from Structural Fingerprints. J Chem Inf Model 2020; 60:4560-4568. [PMID: 32966076 DOI: 10.1021/acs.jcim.0c00483] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Indexed: 11/28/2022]
Abstract
Predicting whether a compound is "aromatic" is at first glance a relatively simple task: does it obey Hückel's rule (planar cyclic π-system with 4n + 2 electrons) or not? However, aromaticity is far from a binary property. Different systems that obey Hückel's rule, and are thus classified as aromatic, show distinct variations in chemical and biological behavior. To capture this, the aromaticity of each molecule in a large public dataset was quantified by an extension of the work of Raczyńska et al. Building on these data, a method is proposed for machine learning the degree of aromaticity of each aromatic ring in a molecule. Categories are derived from the numeric results, which allows structural patterns to be differentiated between them. This gives a better representation of the underlying chemical and biological behavior in expert and (Q)SAR systems.
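The binary test that the abstract argues is insufficient can be written down directly. The check below encodes Hückel's rule as stated: a planar, cyclic π-system with 4n + 2 electrons. Deciding planarity and counting π electrons from a real structure is outside its scope, so those values are supplied by hand here. The function names are hypothetical, and this is not the paper's graded aromaticity model.

```python
def satisfies_4n_plus_2(pi_electrons: int) -> bool:
    """True if pi_electrons equals 4n + 2 for some integer n >= 0."""
    return pi_electrons >= 2 and (pi_electrons - 2) % 4 == 0


def huckel_aromatic(is_cyclic: bool, is_planar: bool, pi_electrons: int) -> bool:
    """Binary Hückel test: a planar, cyclic pi-system with 4n + 2 electrons."""
    return is_cyclic and is_planar and satisfies_4n_plus_2(pi_electrons)


# (cyclic, planar, pi electrons) supplied by hand for a few textbook rings.
examples = {
    "benzene": (True, True, 6),
    "cyclopropenyl cation": (True, True, 2),
    "cyclobutadiene": (True, True, 4),
    "cyclooctatetraene (tub-shaped)": (True, False, 8),
}
for name, (cyclic, planar, pi) in examples.items():
    print(name, huckel_aromatic(cyclic, planar, pi))
```

Every ring that passes gets the same answer, True. This binary outcome is exactly what motivates the paper's degree-of-aromaticity categories.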
Affiliation(s)
- David J Ponting
- Lhasa Limited, Granary Wharf House, 2 Canal Wharf, Leeds LS11 5PS, United Kingdom
- Ruud van Deursen
- Lhasa Limited, Granary Wharf House, 2 Canal Wharf, Leeds LS11 5PS, United Kingdom
- Martin A Ott
- Lhasa Limited, Granary Wharf House, 2 Canal Wharf, Leeds LS11 5PS, United Kingdom
21
Sleight TW, Khanna V, Gilbertson LM, Ng CA. Network Analysis for Prioritizing Biodegradation Metabolites of Polycyclic Aromatic Hydrocarbons. Environ Sci Technol 2020; 54:10735-10744. [PMID: 32692172 DOI: 10.1021/acs.est.0c02217] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.8] [Indexed: 06/11/2023]
Abstract
Polycyclic aromatic hydrocarbons (PAHs) are a diverse group of environmental contaminants released during the combustion of organic materials and the production and utilization of fossil fuels. Once released, PAHs deposit in soil and water bodies where they are subjected to environmental transport and transformations. As they degrade, intermediate transformation products may play an important role in their environmental impact. However, studying the effects of these degradation products has proven challenging because of the complexity, transience, and low concentration of many intermediates. Herein, a novel integration of a pathway prediction system and network theory was developed and applied to a set of four PAHs to demonstrate a possible solution to this challenge. Network analysis techniques were employed to refine the thousands of potential outputs and elucidate compounds of interest. Using these tools, we determined correlations between PAH degradation network data and intermediate metabolite structures, gaining information about the chemical characteristics of compounds based on their placement within the degradation network. Upon applying our developed filtering algorithm, we are able to predict up to 48% of the most common transformation products identified in a comprehensive empirical literature review. Additionally, our integrated approach uncovers potential metabolites which connect those found by past empirical studies but are currently undetected, thereby filling in the gaps of information in PAH degradation pathways.
Affiliation(s)
- Trevor W Sleight
- Department of Civil and Environmental Engineering, University of Pittsburgh, Benedum Hall, 3700 O'Hara Street, Pittsburgh, Pennsylvania 15261, United States
- Vikas Khanna
- Department of Civil and Environmental Engineering, University of Pittsburgh, Benedum Hall, 3700 O'Hara Street, Pittsburgh, Pennsylvania 15261, United States
- Secondary Appointment, Department of Chemical and Petroleum Engineering, University of Pittsburgh, Benedum Hall, 3700 O'Hara Street, Pittsburgh, Pennsylvania 15261, United States
- Leanne M Gilbertson
- Department of Civil and Environmental Engineering, University of Pittsburgh, Benedum Hall, 3700 O'Hara Street, Pittsburgh, Pennsylvania 15261, United States
- Secondary Appointment, Department of Chemical and Petroleum Engineering, University of Pittsburgh, Benedum Hall, 3700 O'Hara Street, Pittsburgh, Pennsylvania 15261, United States
- Carla A Ng
- Department of Civil and Environmental Engineering, University of Pittsburgh, Benedum Hall, 3700 O'Hara Street, Pittsburgh, Pennsylvania 15261, United States
- Secondary Appointment, Department of Environmental and Occupational Health, Graduate School of Public Health, University of Pittsburgh, 130 De Soto Street, Pittsburgh, Pennsylvania 15261, United States
22
Prasad S, Brooks BR. A deep learning approach for the blind logP prediction in SAMPL6 challenge. J Comput Aided Mol Des 2020; 34:535-542. [PMID: 32002779 PMCID: PMC8689685 DOI: 10.1007/s10822-020-00292-3] [Citation(s) in RCA: 13] [Impact Index Per Article: 3.3] [Received: 10/16/2019] [Accepted: 01/17/2020] [Indexed: 12/14/2022]
Abstract
The water-octanol partition coefficient is a measure of the lipophilicity of a molecule and is important in the field of drug discovery. A novel method for computational prediction of the logarithm of the partition coefficient (logP) has been developed using molecular fingerprints and a deep neural network. The machine learning model was trained on a dataset of 12,000 molecules and tested on 2000 molecules. In this article, we present our results for the blind prediction of logP in the SAMPL6 challenge. While the best submission achieved an RMSE of 0.41 logP units, our submission had an RMSE of 0.61 logP units. Overall, we ranked in the top quarter of the 92 submissions. Our results show that the deep learning model can be used as a fast, accurate and robust method for high-throughput prediction of the logP of small molecules.
Affiliation(s)
- Samarjeet Prasad
- Biophysics and Biophysical Chemistry, The Johns Hopkins University, School of Medicine, Baltimore, MD, 21205, USA.
- Laboratory of Computational Biology, National Heart, Lung and Blood Institute, National Institutes of Health, Bethesda, MD, 20814, USA.
- Bernard R Brooks
- Laboratory of Computational Biology, National Heart, Lung and Blood Institute, National Institutes of Health, Bethesda, MD, 20814, USA
23
Yau E, Olivares-Morales A, Gertz M, Parrott N, Darwich AS, Aarons L, Ogungbenro K. Global Sensitivity Analysis of the Rodgers and Rowland Model for Prediction of Tissue:Plasma Partitioning Coefficients: Assessment of the Key Physiological and Physicochemical Factors That Determine Small-Molecule Tissue Distribution. AAPS J 2020; 22:41. [PMID: 32016678 DOI: 10.1208/s12248-020-0418-7] [Citation(s) in RCA: 15] [Impact Index Per Article: 3.8] [Received: 10/14/2019] [Accepted: 01/07/2020] [Indexed: 12/14/2022]
Abstract
In physiologically based pharmacokinetic (PBPK) modelling, three factors generally hinder simultaneous estimation of uncertain and/or unknown parameters: the large number of input parameters, the limited amount of available data and the structural complexity of the model. Such parameters are generally subject to estimation, but the approaches taken for parameter estimation vary widely. Global sensitivity analysis is proposed as a method to systematically determine the most influential parameters that can be subject to estimation. Herein, a global sensitivity analysis was conducted to identify the key drug and physiological parameters influencing drug disposition in PBPK models and to potentially reduce PBPK model dimensionality. The impact of these parameters was evaluated on the tissue-to-unbound plasma partition coefficients (Kpus) predicted by the Rodgers and Rowland model. The evaluation used Latin hypercube sampling combined with partial rank correlation coefficients (PRCC). For most drug classes, PRCC showed that LogP and the fraction unbound in plasma (fup) were generally the most influential parameters for Kpu predictions. For strong bases, blood:plasma partitioning was one of the most influential parameters. Uncertainty in tissue composition parameters had a large impact on Kpu and Vss predictions for all classes. Among tissue composition parameters, changes in Kpu outputs were especially attributed to changes in tissue acidic phospholipid concentrations and in extracellular protein tissue:plasma ratio values. In conclusion, this work demonstrates how to handle parameter estimation and dimensionality reduction in PBPK models. Less influential parameters might be assigned fixed values depending on the parameter space, while influential parameters could be subject to parameter estimation.
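The abstract pairs Latin hypercube sampling (LHS) with partial rank correlation coefficients (PRCC). A minimal sketch of both steps follows, in plain Python. LHS draws exactly one sample from each equal-width stratum of every parameter. PRCC rank-transforms inputs and output, removes the linear effect of the other inputs' ranks from both sides, and correlates the residuals. The `toy_model` below is a hypothetical stand-in for the Rodgers and Rowland Kpu equations, which are not reproduced here, and the sample size and bounds are arbitrary. All function names are hypothetical.

```python
import math
import random


def latin_hypercube(n_samples, bounds, rng):
    """Sample each of n_samples equal strata exactly once per parameter;
    strata are shuffled independently for every parameter."""
    columns = []
    for low, high in bounds:
        width = (high - low) / n_samples
        strata = list(range(n_samples))
        rng.shuffle(strata)
        columns.append([low + (s + rng.random()) * width for s in strata])
    return [list(row) for row in zip(*columns)]


def ranks(values):
    """0-based rank transform; assumes no ties (true for continuous samples)."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    r = [0.0] * len(values)
    for position, i in enumerate(order):
        r[i] = float(position)
    return r


def solve(a, b):
    """Gaussian elimination with partial pivoting for the small normal equations."""
    n = len(a)
    m = [list(row) + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def residuals(y, Z):
    """Residuals of the least-squares regression of y on the columns of Z plus an intercept."""
    rows = [[1.0] + list(z) for z in Z]
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * t for r, t in zip(rows, y)) for i in range(k)]
    beta = solve(xtx, xty)
    return [t - sum(b * v for b, v in zip(beta, r)) for t, r in zip(y, rows)]


def pearson(a, b):
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    return cov / math.sqrt(sum((x - ma) ** 2 for x in a) * sum((y - mb) ** 2 for y in b))


def prcc(samples, outputs):
    """Partial rank correlation of each input with the output, controlling for the others."""
    n, k = len(samples), len(samples[0])
    rank_cols = [ranks([s[j] for s in samples]) for j in range(k)]
    rank_y = ranks(outputs)
    result = []
    for j in range(k):
        others = [[rank_cols[m][i] for m in range(k) if m != j] for i in range(n)]
        result.append(pearson(residuals(rank_cols[j], others), residuals(rank_y, others)))
    return result


def toy_model(p):
    """Hypothetical stand-in for a Kpu prediction: strongly driven by p[0], weakly by p[1]."""
    return 10.0 * p[0] + 0.1 * p[1]


rng = random.Random(0)
bounds = [(0.0, 1.0), (0.0, 1.0)]
samples = latin_hypercube(200, bounds, rng)
outputs = [toy_model(p) for p in samples]
print([round(c, 3) for c in prcc(samples, outputs)])
```

Ranking makes PRCC sensitive to monotonic rather than strictly linear effects. Stratified sampling covers each parameter's range evenly with fewer model runs than plain random sampling.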
Affiliation(s)
- Estelle Yau
- Centre for Applied Pharmacokinetic Research (CAPKR), The University of Manchester, Manchester, UK
- Roche Pharma and Early Development, Pharmaceutical Sciences, Roche Innovation Center Basel, F. Hoffmann-La Roche Ltd, Grenzacherstrasse 124, 4070, Basel, Switzerland
- Andrés Olivares-Morales
- Roche Pharma and Early Development, Pharmaceutical Sciences, Roche Innovation Center Basel, F. Hoffmann-La Roche Ltd, Grenzacherstrasse 124, 4070, Basel, Switzerland.
- Michael Gertz
- Roche Pharma and Early Development, Pharmaceutical Sciences, Roche Innovation Center Basel, F. Hoffmann-La Roche Ltd, Grenzacherstrasse 124, 4070, Basel, Switzerland
- Neil Parrott
- Roche Pharma and Early Development, Pharmaceutical Sciences, Roche Innovation Center Basel, F. Hoffmann-La Roche Ltd, Grenzacherstrasse 124, 4070, Basel, Switzerland
- Adam S Darwich
- Centre for Applied Pharmacokinetic Research (CAPKR), The University of Manchester, Manchester, UK
- Logistics and Informatics in Health Care, School of Engineering Sciences in Chemistry, Biotechnology and Health (CBH), KTH Royal Institute of Technology, Stockholm, Sweden
- Leon Aarons
- Roche Pharma and Early Development, Pharmaceutical Sciences, Roche Innovation Center Basel, F. Hoffmann-La Roche Ltd, Grenzacherstrasse 124, 4070, Basel, Switzerland
- Kayode Ogungbenro
- Roche Pharma and Early Development, Pharmaceutical Sciences, Roche Innovation Center Basel, F. Hoffmann-La Roche Ltd, Grenzacherstrasse 124, 4070, Basel, Switzerland
24
Lui R, Guan D, Matthews S. A comparison of molecular representations for lipophilicity quantitative structure-property relationships with results from the SAMPL6 logP Prediction Challenge. J Comput Aided Mol Des 2020; 34:523-534. [PMID: 31933037 DOI: 10.1007/s10822-020-00279-0] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.3] [Received: 10/16/2019] [Accepted: 01/08/2020] [Indexed: 12/20/2022]
Abstract
Effective representation of a molecule is required to develop useful quantitative structure-property relationships (QSPR) for accurate prediction of chemical properties. The octanol-water partition coefficient logP, a measure of lipophilicity, is an important property for pharmacological and toxicological endpoints used in the pharmaceutical and regulatory spheres. We compare physicochemical descriptors, structural keys, and circular fingerprints in their ability to effectively represent a chemical space and characterise molecular features that correlate with lipophilicity. Exploratory landscape continuity analyses revealed that whole-molecule physicochemical descriptors could map together compounds that were similar in both molecular features and logP. This indicates a higher potential for use in logP QSPRs than the substructural approach of structural keys and circular fingerprints. Indeed, logP QSPR models parameterised by physicochemical descriptors consistently achieved the lowest error. Our best-performing model was a stochastic gradient descent-optimised multilinear regression with 1438 descriptors, returning an internal benchmark RMSE of 1.03 log units. This corroborates the well-established notion that lipophilicity is an additive, whole-molecule property. We externally tested the model by participating in the 2019 SAMPL6 logP Prediction Challenge and blindly predicting logP for 11 protein kinase inhibitor fragment-like molecules. Our model returned an RMSE of 0.49 log units, placing eighth overall and third in the empirical methods category (submission ID 'hdpuj'). Permutation feature importance analyses revealed that physicochemical descriptors could characterise predictive molecular features highly relevant to the kinase inhibitor fragment-like molecules.
Affiliation(s)
- Raymond Lui
- Pharmacoinformatics Laboratory, Discipline of Pharmacology, School of Medical Sciences, Faculty of Medicine and Health, The University of Sydney, Sydney, NSW, 2006, Australia
- Davy Guan
- Pharmacoinformatics Laboratory, Discipline of Pharmacology, School of Medical Sciences, Faculty of Medicine and Health, The University of Sydney, Sydney, NSW, 2006, Australia
- Slade Matthews
- Pharmacoinformatics Laboratory, Discipline of Pharmacology, School of Medical Sciences, Faculty of Medicine and Health, The University of Sydney, Sydney, NSW, 2006, Australia.
25
Esaki T, Ohashi R, Watanabe R, Natsume-Kitatani Y, Kawashima H, Nagao C, Mizuguchi K. Computational Model To Predict the Fraction of Unbound Drug in the Brain. J Chem Inf Model 2019; 59:3251-3261. [DOI: 10.1021/acs.jcim.9b00180] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.6] [Indexed: 11/30/2022]
Affiliation(s)
- Tsuyoshi Esaki
- Laboratory of Bioinformatics, National Institutes of Biomedical Innovation, Health and Nutrition, 7-6-8 Saito-Asagi, Osaka, Ibaraki 567-0085, Japan
- Rikiya Ohashi
- Laboratory of Bioinformatics, National Institutes of Biomedical Innovation, Health and Nutrition, 7-6-8 Saito-Asagi, Osaka, Ibaraki 567-0085, Japan
- Discovery Technology Laboratories, Mitsubishi Tanabe Pharma Corporation, 2-2-50 Kawagishi, Toda, Saitama 335-8505, Japan
- Reiko Watanabe
- Laboratory of Bioinformatics, National Institutes of Biomedical Innovation, Health and Nutrition, 7-6-8 Saito-Asagi, Osaka, Ibaraki 567-0085, Japan
- Yayoi Natsume-Kitatani
- Laboratory of Bioinformatics, National Institutes of Biomedical Innovation, Health and Nutrition, 7-6-8 Saito-Asagi, Osaka, Ibaraki 567-0085, Japan
- Laboratory of In-silico Drug Design, Center of Drug Design Research, National Institutes of Biomedical Innovation, Health and Nutrition, 7-6-8 Saito-Asagi, Osaka, Ibaraki 567-0085, Japan
- Hitoshi Kawashima
- Laboratory of Bioinformatics, National Institutes of Biomedical Innovation, Health and Nutrition, 7-6-8 Saito-Asagi, Osaka, Ibaraki 567-0085, Japan
- Chioko Nagao
- Laboratory of Bioinformatics, National Institutes of Biomedical Innovation, Health and Nutrition, 7-6-8 Saito-Asagi, Osaka, Ibaraki 567-0085, Japan
- Laboratory of In-silico Drug Design, Center of Drug Design Research, National Institutes of Biomedical Innovation, Health and Nutrition, 7-6-8 Saito-Asagi, Osaka, Ibaraki 567-0085, Japan
- Kenji Mizuguchi
- Laboratory of Bioinformatics, National Institutes of Biomedical Innovation, Health and Nutrition, 7-6-8 Saito-Asagi, Osaka, Ibaraki 567-0085, Japan
- Laboratory of In-silico Drug Design, Center of Drug Design Research, National Institutes of Biomedical Innovation, Health and Nutrition, 7-6-8 Saito-Asagi, Osaka, Ibaraki 567-0085, Japan
26
Hanser T, Steinmetz FP, Plante J, Rippmann F, Krier M. Avoiding hERG-liability in drug design via synergetic combinations of different (Q)SAR methodologies and data sources: a case study in an industrial setting. J Cheminform 2019; 11:9. [PMID: 30712151 PMCID: PMC6689868 DOI: 10.1186/s13321-019-0334-y] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.2] [Received: 09/26/2018] [Accepted: 01/25/2019] [Indexed: 11/25/2022]
Abstract
In this paper, we explore the impact of combining different in silico prediction approaches and data sources on the predictive performance of the resulting system. We use inhibition of the hERG ion channel as the endpoint for this study, as it constitutes a key safety concern in drug development and a potential cause of attrition. We will show that combining data sources can improve the relevance of the training set with regard to the target chemical space, leading to improved performance. Similarly, we will demonstrate that combining multiple statistical models with one another, and with expert systems, can lead to positive synergistic effects when the confidence in the predictions of the merged systems is taken into account. The best combinations analyzed display good hERG predictivity. Finally, this work demonstrates the suitability of the SOHN methodology for building models for receptor-based endpoints such as hERG inhibition when the appropriate pharmacophoric descriptors are used.
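The abstract attributes gains to combining models and expert systems while taking prediction confidence into account, but it does not give the combination rule. The sketch below shows one generic possibility, a confidence-weighted vote that abstains when the evidence is weak or balanced. The `Prediction` record, source labels, confidence scale and `min_total_confidence` threshold are all hypothetical. This is not the SOHN methodology or the paper's scheme.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Prediction:
    source: str        # hypothetical label, e.g. "statistical model" or "expert system"
    positive: bool     # predicted hERG inhibitor?
    confidence: float  # assumed to lie in [0, 1]


def combine(predictions: List[Prediction], min_total_confidence: float = 0.5) -> Optional[bool]:
    """Confidence-weighted vote; returns None (abstain) when evidence is weak or tied."""
    support = sum(p.confidence for p in predictions if p.positive)
    against = sum(p.confidence for p in predictions if not p.positive)
    if support + against < min_total_confidence or support == against:
        return None
    return support > against


print(combine([Prediction("statistical model", True, 0.8),
               Prediction("expert system", False, 0.3)]))
```

Abstaining instead of forcing a call is one simple way to let low-confidence predictions weaken the combined output instead of flipping it.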