1
Gheta SKO, Bonin A, Gerlach T, Göller AH. Predicting absolute aqueous solubility by applying a machine learning model for an artificially liquid-state as proxy for the solid-state. J Comput Aided Mol Des 2023; 37:765-789. [PMID: 37878216 DOI: 10.1007/s10822-023-00538-w] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/23/2023] [Accepted: 10/02/2023] [Indexed: 10/26/2023]
Abstract
In this study, we use machine learning algorithms with QM-derived COSMO-RS descriptors, along with Morgan fingerprints, to predict the absolute solubility of drug-like compounds. The QM-derived descriptors account for the molecular properties of the solute, i.e., the solute-solute interactions in an artificial-liquid-state (super-cooled liquid), and the solute-solvent interactions in solution. We employ two main approaches to predict solubility: (i) a hypothetical pathway that involves melting the solute at room temperature T = T¯ ([Formula: see text]) and mixing the artificially liquid solute into the solvent ([Formula: see text]). In this approach [Formula: see text] is predicted using machine learning models, and the [Formula: see text] is obtained from COSMO-RS calculations; (ii) direct solubility prediction using machine learning algorithms. The models were trained on a large number of Bayer in-house compounds for which water solubility data is available at physiological pH of 6.5 and ambient temperature. We also evaluated our models using external datasets from a solubility challenge. Our models present great improvements compared to the absolute solubility prediction with the QSAR model for the artificial liquid state as implemented in the COSMOtherm software, for both in-house and external datasets. We are furthermore able to demonstrate the superiority of QM-derived descriptors compared to cheminformatics descriptors. We finally present low-cost alternative models using fragment-based COSMOquick calculations with only marginal reduction in the quality of predicted solubility.
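The hypothetical melt-then-mix pathway in approach (i) can be illustrated with a short sketch. It assumes, beyond what the abstract states, that the two free-energy terms are additive, are given in kJ/mol, and that the mixing term is the excess chemical potential of the liquid solute in the solvent relative to the pure liquid, so that the mole-fraction solubility is x = exp(-(ΔG_fus + ΔG_mix)/RT). The function and parameter names are hypothetical and are not taken from the paper or from COSMOtherm.

```python
import math

R_KJ_PER_MOL_K = 8.314462618e-3  # gas constant in kJ/(mol*K)


def log10_mole_fraction_solubility(dg_fus_kj_mol, dg_mix_kj_mol, temperature_k=298.15):
    """Combine a fusion and a mixing free energy into log10 of the mole-fraction solubility.

    Assumption (not from the source): both terms are additive along the cycle
    solid -> supercooled liquid -> solution, so x = exp(-(dG_fus + dG_mix) / (R*T)).
    """
    dg_total = dg_fus_kj_mol + dg_mix_kj_mol
    return -dg_total / (R_KJ_PER_MOL_K * temperature_k * math.log(10))


if __name__ == "__main__":
    # Illustrative inputs only; they are not values reported in the paper.
    print(log10_mole_fraction_solubility(dg_fus_kj_mol=10.0, dg_mix_kj_mol=15.0))
```

In a workflow like the one described, the first argument would come from a trained machine learning model and the second from a COSMO-RS calculation; here both are plain numbers.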
Affiliation(s)
- Sadra Kashef Ol Gheta
- Bayer AG, Pharmaceuticals, R&D, Computational Molecular Design, 42096, Wuppertal, Germany
- Anne Bonin
- Bayer AG, Pharmaceuticals, R&D, Computational Molecular Design, 42096, Wuppertal, Germany
- Thomas Gerlach
- Bayer AG, Crop Science, R&D, Digital Transformation, 40789, Monheim, Germany
- Bayer AG, Engineering & Technology, Thermal Separation Technologies, 51368, Leverkusen, Germany
- Andreas H Göller
- Bayer AG, Pharmaceuticals, R&D, Computational Molecular Design, 42096, Wuppertal, Germany
2
Montecinos R, Diaz-Wilson F, Bravo-Sepulveda A, Salas CO, Recabarren-Gajardo G, Nome F. Investigation about the complexation of trimethylammonium-derived pillar[5]arene with indole and azaindole derivatives. J Phys Org Chem 2018. [DOI: 10.1002/poc.3889] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Indexed: 11/09/2022]
Affiliation(s)
- Rodrigo Montecinos
- Facultad de Química; Pontificia Universidad Católica de Chile; Santiago Chile
- Cristian O. Salas
- Facultad de Química; Pontificia Universidad Católica de Chile; Santiago Chile
- Faruk Nome
- Department of Chemistry; Federal University of Santa Catarina; Florianopolis Santa Catarina Brazil
3
Csermely P, Korcsmáros T, Kiss HJM, London G, Nussinov R. Structure and dynamics of molecular networks: a novel paradigm of drug discovery: a comprehensive review. Pharmacol Ther 2013; 138:333-408. [PMID: 23384594 PMCID: PMC3647006 DOI: 10.1016/j.pharmthera.2013.01.016] [Citation(s) in RCA: 512] [Impact Index Per Article: 46.5] [Received: 01/22/2013] [Accepted: 01/22/2013] [Indexed: 02/02/2023]
Abstract
Despite considerable progress in genome- and proteome-based high-throughput screening methods and in rational drug design, the increase in approved drugs in the past decade did not match the increase of drug development costs. Network description and analysis not only give a systems-level understanding of drug action and disease complexity, but can also help to improve the efficiency of drug design. We give a comprehensive assessment of the analytical tools of network topology and dynamics. The state-of-the-art use of chemical similarity, protein structure, protein-protein interaction, signaling, genetic interaction and metabolic networks in the discovery of drug targets is summarized. We propose that network targeting follows two basic strategies. The "central hit strategy" selectively targets central nodes/edges of the flexible networks of infectious agents or cancer cells to kill them. The "network influence strategy" works against other diseases, where an efficient reconfiguration of rigid networks needs to be achieved by targeting the neighbors of central nodes/edges. It is shown how network techniques can help in the identification of single-target, edgetic, multi-target and allo-network drug target candidates. We review the recent boom in network methods helping hit identification, lead selection optimizing drug efficacy, as well as minimizing side-effects and drug toxicity. Successful network-based drug development strategies are shown through the examples of infections, cancer, metabolic diseases, neurodegenerative diseases and aging. Summarizing >1200 references we suggest an optimized protocol of network-aided drug development, and provide a list of systems-level hallmarks of drug quality. Finally, we highlight network-related drug development trends helping to achieve these hallmarks by a cohesive, global approach.
Affiliation(s)
- Peter Csermely
- Department of Medical Chemistry, Semmelweis University, P.O. Box 260, H-1444 Budapest 8, Hungary.
4
Luo X, Krumrine JR, Shenvi AB, Pierson ME, Bernstein PR. Calculation and application of activity discriminants in lead optimization. J Mol Graph Model 2010; 29:372-81. [PMID: 20800520 DOI: 10.1016/j.jmgm.2010.07.005] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/28/2010] [Revised: 07/10/2010] [Accepted: 07/14/2010] [Indexed: 11/18/2022]
Abstract
We present a technique for computing activity discriminants of in vitro (pharmacological, DMPK, and safety) assays and the application to the prediction of in vitro activities of proposed synthetic targets during the lead optimization phase of drug discovery projects. This technique emulates how medicinal chemists perform SAR analysis and activity prediction. The activity discriminants that are functions of 6 commonly used medicinal chemistry descriptors can be interpreted easily by medicinal chemists. Further, visualization with Spotfire allows medicinal chemists to analyze how the query molecule is related to compounds tested previously, and to evaluate easily the relevance of the activity discriminants to the activities of the query molecule. Validation with all compounds synthesized and tested in AstraZeneca Wilmington since 2006 demonstrates that this approach is useful for prioritizing new synthetic targets for synthesis.
Affiliation(s)
- Xincai Luo
- Department of Chemistry, AstraZeneca Pharmaceuticals, 1800 Concord Pike, Wilmington, DE 19850, USA.
5
Fjodorova N, Vračko M, Novič M, Roncaglioni A, Benfenati E. New public QSAR model for carcinogenicity. Chem Cent J 2010; 4 Suppl 1:S3. [PMID: 20678182 PMCID: PMC2913330 DOI: 10.1186/1752-153x-4-s1-s3] [Citation(s) in RCA: 78] [Impact Index Per Article: 5.6] [Indexed: 01/09/2023] Open
Abstract
BACKGROUND: One of the main goals of the new chemical regulation REACH (Registration, Evaluation and Authorization of Chemicals) is to fill the gaps in data on the properties of chemicals affecting human health. (Q)SAR models are accepted as a suitable source of information. The EU-funded CAESAR project aimed to develop models for the prediction of 5 endpoints for regulatory purposes. Carcinogenicity is one of the endpoints under consideration.
RESULTS: Models for the prediction of carcinogenic potency according to the specific requirements of chemical regulation were developed. A dataset of 805 non-congeneric chemicals extracted from the Carcinogenic Potency Database (CPDBAS) was used. The Counter Propagation Artificial Neural Network (CP ANN) algorithm was implemented. Two alternative models for predicting carcinogenicity are described in the article. The first model employed eight MDL descriptors (model A) and the second twelve Dragon descriptors (model B). CAESAR's models have been assessed according to the OECD principles for the validation of QSAR. A wide series of statistical checks was used to assess model validity. Models A and B yielded training-set accuracy (644 compounds) of 91% and 89%, respectively; test-set accuracy (161 compounds) was 73% and 69%, and specificity was 69% and 61%, respectively. Sensitivity in both cases was 75%. The accuracy of the leave-20%-out cross-validation for the training set of models A and B was 66% and 62%, respectively. To verify whether the models perform correctly on new compounds, external validation was carried out on an external test set of 738 compounds. For models A and B respectively, external validation gave an accuracy of 61.4% and 60.0%, a sensitivity of 64.0% and 61.8%, and a specificity of 58.9% and 58.4%.
CONCLUSION: Carcinogenicity is a particularly important endpoint, and QSAR models are not expected to replace human experts' opinions and conventional methods. However, we believe that a combination of several methods will provide useful support to the overall evaluation of carcinogenicity. In the present paper, models for the classification of carcinogenic compounds using MDL and Dragon descriptors were developed. The models could be used to set priorities among chemicals for further testing. The models at the CAESAR site were implemented in Java and are publicly accessible.
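The accuracy, sensitivity and specificity figures quoted in this abstract follow from the confusion matrix of a binary classifier. A minimal sketch of that arithmetic is given below; the function name and the example counts are hypothetical and are not taken from the CAESAR models or their datasets.

```python
def classification_metrics(tp, fn, tn, fp):
    """Return accuracy, sensitivity and specificity from confusion-matrix counts.

    tp/fn: carcinogens predicted as positive/negative.
    tn/fp: non-carcinogens predicted as negative/positive.
    """
    total = tp + fn + tn + fp
    return {
        "accuracy": (tp + tn) / total,   # fraction of all compounds classified correctly
        "sensitivity": tp / (tp + fn),   # fraction of true positives recovered
        "specificity": tn / (tn + fp),   # fraction of true negatives recovered
    }


if __name__ == "__main__":
    # Hypothetical counts for illustration only.
    print(classification_metrics(tp=30, fn=10, tn=45, fp=15))
```

Reporting all three together matters because accuracy alone can hide a model that favors one class.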
Affiliation(s)
- Natalja Fjodorova
- National Institute of Chemistry, Hajdrihova 19, SI-1001 Ljubljana, Slovenia
- Marjan Vračko
- National Institute of Chemistry, Hajdrihova 19, SI-1001 Ljubljana, Slovenia
- Marjana Novič
- National Institute of Chemistry, Hajdrihova 19, SI-1001 Ljubljana, Slovenia
- Alessandra Roncaglioni
- Institute for Pharmacological Research "Mario Negri", Via La Masa 19, 20156 Milan, Italy
- Emilio Benfenati
- Institute for Pharmacological Research "Mario Negri", Via La Masa 19, 20156 Milan, Italy
6
Fjodorova N, Vracko M, Novic M, Roncaglioni A, Benfenati E. New public QSAR model for carcinogenicity. Chem Cent J 2010. [PMID: 20678182 DOI: 10.1186/1752-153x-4-s1-s3] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Indexed: 11/10/2022] Open
Affiliation(s)
- Natalja Fjodorova
- National Institute of Chemistry, Hajdrihova 19, SI-1001 Ljubljana, Slovenia.
7
Kertesz TM, Hall LH, Hill DW, Grant DF. CE50: quantifying collision induced dissociation energy for small molecule characterization and identification. J Am Soc Mass Spectrom 2009; 20:1759-1767. [PMID: 19616966 DOI: 10.1016/j.jasms.2009.06.002] [Citation(s) in RCA: 79] [Impact Index Per Article: 5.3] [Received: 03/06/2009] [Revised: 05/28/2009] [Accepted: 06/02/2009] [Indexed: 05/28/2023]
Abstract
Survival yield analysis is routinely used in mass spectroscopy as a tool for assessing precursor ion stability and internal energy. Because ion internal energy and decomposition reaction rates are dependent on chemical structure, we reasoned that survival yield curves should be compound-specific and therefore useful for chemical identification. In this study, a quantitative approach for analyzing the correlation between survival yield and collision energy was developed and validated. This method is based on determining the collision energy (CE) at which the survival yield is 50% (CE50) and, further, provides slope and intercept values for each survival yield curve. In initial experiments using a defined set of homologous compounds, we found that CE50 values were easily determined, quantitative, highly reproducible, and could discriminate between structural and even positional isomers. Further analysis demonstrated that CE50 values were independent of cone potential and orthogonal to compound mass. Experimentally determined CE50 values for a diverse set of 54 compounds were correlated to Molconn molecular structure descriptors. The resulting model yielded a statistically significant linear correlation between experimental and calculated CE50 values and identified several structural characteristics related to precursor ion stability and fragmentation mechanism. Thus, the CE50 is a promising method for compound identification and discrimination.
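One way to picture the CE50 quantity described in this abstract is a straight-line fit through the steep part of a survival yield curve, solved for the energy at which the yield is 50%. The sketch below makes that assumption explicitly; the abstract does not spell out the authors' fitting procedure, and the function names and data points are hypothetical.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def estimate_ce50(collision_energies, survival_yields):
    """Return (CE50, slope, intercept), assuming the curve is roughly linear near 50%."""
    slope, intercept = fit_line(collision_energies, survival_yields)
    ce50 = (0.5 - intercept) / slope  # energy at which the fitted yield equals 0.5
    return ce50, slope, intercept


if __name__ == "__main__":
    # Made-up points from the falling region of a survival yield curve.
    energies = [10.0, 20.0, 30.0]
    yields = [0.9, 0.5, 0.1]
    print(estimate_ce50(energies, yields))
```

A real survival yield curve is sigmoidal, so a fit like this is only reasonable over points close to the midpoint.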
Affiliation(s)
- Tzipporah M Kertesz
- Department of Pharmaceutical Sciences, University of Connecticut, Storrs, Connecticut 06269-3092, USA
8
Kenny PW. Hydrogen Bonding, Electrostatic Potential, and Molecular Design. J Chem Inf Model 2009; 49:1234-44. [DOI: 10.1021/ci9000234] [Citation(s) in RCA: 120] [Impact Index Per Article: 8.0] [Indexed: 01/26/2023]
9
On the importance of topological descriptors in understanding structure–property relationships. J Comput Aided Mol Des 2008; 22:441-60. [DOI: 10.1007/s10822-008-9204-9] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.4] [Received: 10/01/2007] [Accepted: 02/20/2008] [Indexed: 10/22/2022]
10
Dong PP, Zhang YY, Ge GB, Ai CZ, Liu Y, Yang L, Liu CX. Modeling resistance index of taxoids to MCF-7 cell lines using ANN together with electrotopological state descriptors. Acta Pharmacol Sin 2008; 29:385-96. [PMID: 18298905 DOI: 10.1111/j.1745-7254.2008.00746.x] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.2] [Indexed: 11/29/2022] Open
Abstract
AIM: To develop an artificial neural network model for predicting the resistance index (RI) of taxoids. METHODS: A dataset of 63 experimental data points was compiled from published studies and randomly subdivided into training and external test sets. Electrotopological state (E-state) indices were calculated to characterize molecular structure, together with a principal component analysis to reduce the variable space and analyze the relative importance of the E-state indices. A back-propagation neural network technique was used to build the models. Five-fold cross-validation was performed, and 5 models with different compound composition in the training and validation sets were built. The independent external test set was used to evaluate the predictive ability of the models. RESULTS: The final model performed well, with a cross-validation Q²cv of 0.62, an external testing R² of 0.84, and a slope of the regression line through the origin for the testing set of 0.9933. CONCLUSION: The quantitative structure-activity relationship model can predict the RI with reasonable accuracy, which will aid in the development of new anti-multidrug resistance taxoids.
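The validation statistics named in this abstract (a coefficient of determination and the slope of a regression line through the origin) can be computed as in the sketch below. It assumes the common convention of regressing observed on predicted values for the slope; the abstract does not say which convention the authors used, and the function names and example numbers are hypothetical.

```python
def coefficient_of_determination(observed, predicted):
    """1 - SS_res / SS_tot; called Q2 when the predictions come from cross-validation."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot


def slope_through_origin(observed, predicted):
    """Slope k of the zero-intercept line observed = k * predicted (least squares)."""
    numerator = sum(o * p for o, p in zip(observed, predicted))
    denominator = sum(p * p for p in predicted)
    return numerator / denominator


if __name__ == "__main__":
    # Made-up resistance index values for illustration only.
    obs = [1.0, 2.0, 3.0, 4.0]
    pred = [1.1, 1.9, 3.2, 3.8]
    print(coefficient_of_determination(obs, pred), slope_through_origin(obs, pred))
```

A slope close to 1 together with a high R² suggests predictions that are neither systematically inflated nor deflated.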
Affiliation(s)
- Pei-pei Dong
- Laboratory of Pharmaceutical Resource Discovery, Dalian Institute of Chemical Physics, Chinese Academy of Sciences, Dalian 116023, China
11
Spasov B, Hall L. Modeling Dipeptides as ACE Inhibitors and Bitter-Tasting Compounds by Means of E-State Structure-Information Representation. Chem Biodivers 2007; 4:2528-39. [DOI: 10.1002/cbdv.200790206] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Indexed: 11/05/2022]