1
Ferraz-Caetano J, Teixeira F, Cordeiro MNDS. Data-driven, explainable machine learning model for predicting volatile organic compounds' standard vaporization enthalpy. Chemosphere 2024; 359:142257. [PMID: 38719116 DOI: 10.1016/j.chemosphere.2024.142257] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/23/2024] [Revised: 04/18/2024] [Accepted: 05/04/2024] [Indexed: 05/21/2024]
Abstract
The accurate prediction of standard vaporization enthalpy (ΔvapHm°) for volatile organic compounds (VOCs) is of paramount importance in environmental chemistry, industrial applications and regulatory compliance. As an alternative to traditional experimental methods for determining ΔvapHm° of VOCs, machine learning (ML) models enable high-throughput, cost-effective property estimation. Despite rising momentum, existing ML algorithms still show limitations in prediction accuracy and in the breadth of their chemical applicability. In this work, we present a data-driven, explainable supervised ML model to predict ΔvapHm° of VOCs. The model was built on an established experimental database of 2410 unique molecules and 223 VOCs categorized by chemical groups. Using supervised ML regression algorithms, we found that the Random Forest successfully predicted VOCs' ΔvapHm° with a mean absolute error of 3.02 kJ mol⁻¹ and a 95% test score. The model was validated through the prediction of ΔvapHm° for a known database of VOCs and through molecular group hold-out tests. Through chemical feature importance analysis, this explainable model revealed that VOC polarizability, connectivity indexes and electrotopological state are key for the model's prediction accuracy. We thus present a replicable and explainable model, which can be further expanded towards the prediction of other thermodynamic properties of VOCs.
Affiliation(s)
- José Ferraz-Caetano
  - LAQV-REQUIMTE - Department of Chemistry and Biochemistry - Faculty of Sciences, University of Porto - Rua do Campo Alegre, S/N, 4169-007, Porto, Portugal.
- Filipe Teixeira
  - CQUM - Centre of Chemistry, University of Minho, Campus de Gualtar, 4710-057, Braga, Portugal
- M Natália D S Cordeiro
  - LAQV-REQUIMTE - Department of Chemistry and Biochemistry - Faculty of Sciences, University of Porto - Rua do Campo Alegre, S/N, 4169-007, Porto, Portugal.
2
Abstract
Graph entropy is an important measure of the evolution and complexity of networks. A bipartite graph is a special network and an important mathematical model for system resource allocation and management. In reality, a network system usually has obvious directionality. The direction of the network, or its movement trend, can be described with a spectrum index. However, little research has been done on the eigenvalue-based entropy of directed bipartite networks. In this study, based on the adjacency matrix, the in-degree Laplacian matrix and the in-degree signless Laplacian matrix of a directed bipartite graph, we defined the eigenvalue-based entropy for the directed bipartite network. Using the eigenvalue-based entropy, we described the evolution law of the directed bipartite network structure. To account for the direction and bipartite features of the directed bipartite network, we improved the generation algorithm of the undirected network. We then constructed the directed bipartite nearest-neighbor coupling network, directed bipartite small-world network, directed bipartite scale-free network, and directed bipartite random network. In the proposed model, the spectrum of those directed bipartite networks is used to describe their directionality and bipartite property. Moreover, eigenvalue-based entropy is empirically studied on a real-world directed movie recommendation network, in which the law of eigenvalue-based entropy is observed. That is, if the eigenvalue-based entropy value of the recommendation system is large, the evolution of the movie recommendation network becomes disordered, whereas if the eigenvalue-based entropy value is small, the structural evolution of the movie recommendation network tends to be regular. The simulation experiment shows that the eigenvalue-based entropy value in the real directed bipartite network lies between the values of a directed bipartite small-world network and a directed bipartite scale-free network. This shows that the real directed bipartite network has the structural properties of the two typical directed bipartite networks. The coexistence of the small-world and scale-free phenomena in the real network is consistent with the evolution law of typical network models. The experimental results show the validity and rationality of the definition of eigenvalue-based entropy, which serves as a tool in the analysis of directed bipartite networks.
3
Sun Y, Zhao H, Liang J, Ma X. Eigenvalue-based entropy in directed complex networks. PLoS One 2021; 16:e0251993. [PMID: 34153043 PMCID: PMC8216510 DOI: 10.1371/journal.pone.0251993] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/30/2021] [Accepted: 04/19/2021] [Indexed: 11/18/2022] [Open Access]
Abstract
Entropy is an important index for describing the structure, function, and evolution of networks. Existing research on entropy has primarily been applied to undirected networks. Compared with an undirected network, a directed network involves a special asymmetric transfer. Research on the entropy of directed networks is therefore significant for effectively quantifying the structural information of the whole network. Typical complex network models include the nearest-neighbor coupling network, small-world network, scale-free network, and random network. These network models are abstracted as undirected graphs without considering the direction of node connections. For complex networks, modeling that accounts for the direction of connections between nodes is extremely challenging. In this paper, based on these typical models of complex networks, a directed network model that considers the in-direction of node connections is proposed, and the eigenvalue entropies of three matrices in the directed network are defined and studied: the adjacency matrix, the in-degree Laplacian matrix, and the in-degree signless Laplacian matrix. The eigenvalue-based entropies of the three matrices are calculated for directed nearest-neighbor coupling, directed small-world, directed scale-free, and directed random networks. A simulation experiment on a real directed network shows that its eigenvalue entropy lies between the eigenvalue entropies of the directed scale-free network and the directed small-world network.
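The three matrices named in this abstract can be illustrated with a small sketch. It is not the authors' code: the edge convention (`adj[i][j] == 1` means an edge from i to j, so in-degrees are column sums) and the entropy form (Shannon entropy of eigenvalue magnitudes normalized to sum to one) are assumptions made here for illustration, and the function names are hypothetical. The eigenvalues are passed in directly; for the directed 3-cycle used below they are the cube roots of unity.

```python
import cmath
import math


def in_degree_matrices(adj):
    """Return (L, Q) = (D_in - A, D_in + A) for a 0/1 adjacency matrix.

    Assumed convention: adj[i][j] == 1 means an edge i -> j, so the
    in-degree of node j is the sum of column j.
    """
    n = len(adj)
    in_deg = [sum(adj[i][j] for i in range(n)) for j in range(n)]
    lap = [[(in_deg[i] if i == j else 0) - adj[i][j] for j in range(n)]
           for i in range(n)]
    signless = [[(in_deg[i] if i == j else 0) + adj[i][j] for j in range(n)]
                for i in range(n)]
    return lap, signless


def eigenvalue_entropy(eigenvalues):
    """Shannon entropy of normalized eigenvalue magnitudes (assumed form)."""
    mags = [abs(ev) for ev in eigenvalues]
    total = sum(mags)
    if total == 0:
        return 0.0
    probs = [m / total for m in mags if m > 0]
    return -sum(p * math.log(p) for p in probs)


# Directed 3-cycle: 0 -> 1 -> 2 -> 0.
cycle = [[0, 1, 0],
         [0, 0, 1],
         [1, 0, 0]]
lap, signless = in_degree_matrices(cycle)

# The adjacency matrix of a directed n-cycle is a cyclic permutation
# matrix, whose eigenvalues are the n-th roots of unity.
cycle_eigs = [cmath.exp(2j * math.pi * k / 3) for k in range(3)]
entropy = eigenvalue_entropy(cycle_eigs)
```

All three eigenvalue magnitudes are equal in this toy case, so the entropy takes its maximum value for three eigenvalues, ln 3. For larger graphs the eigenvalues would come from a numerical eigen-solver.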
Affiliation(s)
- Yan Sun
  - School of Computer, Qinghai Normal University, Xining, China
  - School of Computer, Qinghai Nationality University, Xining, China
  - The State Key Laboratory of Tibetan Intelligent Information Processing and Application, Xining, Qinghai, China
- Haixing Zhao
  - School of Computer, Qinghai Normal University, Xining, China
  - The State Key Laboratory of Tibetan Intelligent Information Processing and Application, Xining, Qinghai, China
  - * E-mail:
- Jing Liang
  - School of Computer, Qinghai Normal University, Xining, China
  - The State Key Laboratory of Tibetan Intelligent Information Processing and Application, Xining, Qinghai, China
- Xiujuan Ma
  - School of Computer, Qinghai Normal University, Xining, China
  - The State Key Laboratory of Tibetan Intelligent Information Processing and Application, Xining, Qinghai, China
4
Rajabi M, Shafiei F. Structure–property relationships of aliphatic esters using topological descriptors and backward-multiple linear regression method. J Chin Chem Soc-Taip 2020. [DOI: 10.1002/jccs.201900528] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Indexed: 11/11/2022]
Affiliation(s)
- Mehdi Rajabi
  - Department of Chemistry, Science Faculty, Arak Branch, Islamic Azad University, Arak, Iran
- Fatemeh Shafiei
  - Department of Chemistry, Science Faculty, Arak Branch, Islamic Azad University, Arak, Iran
5
Toropov AA, Toropova AP. QSPR/QSAR: State-of-Art, Weirdness, the Future. Molecules 2020; 25:E1292. [PMID: 32178379 PMCID: PMC7143984 DOI: 10.3390/molecules25061292] [Citation(s) in RCA: 29] [Impact Index Per Article: 7.3] [Received: 02/11/2020] [Revised: 03/06/2020] [Accepted: 03/10/2020] [Indexed: 12/15/2022] [Open Access]
Abstract
The ability of quantitative structure-property/activity relationships (QSPRs/QSARs) to serve epistemological processes in the natural sciences is discussed. Research results in this area contain some contradictions, which can sometimes be classified as paradoxes or "weirdness". These points are often ignored; here, they are listed and briefly commented on. In addition, hypotheses on the future evolution of QSPR/QSAR theory and practice are suggested. In particular, the possibility of extending the scope of QSPR/QSAR problems by searching for the "statistical similarity" of different endpoints is suggested and illustrated by an example for endpoints that are relatively distant from each other, namely (i) mutagenicity, (ii) anticancer activity, and (iii) blood-brain barrier.
Affiliation(s)
- Alla P. Toropova
  - Laboratory of Environmental Chemistry and Toxicology, Department of Environmental Health Science, Istituto di Ricerche Farmacologiche Mario Negri IRCCS, Via Mario Negri 2, 20156 Milano, Italy
6
A study of the Immune Epitope Database for some fungi species using network topological indices. Mol Divers 2017; 21:713-718. [DOI: 10.1007/s11030-017-9749-4] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.6] [Received: 12/19/2016] [Accepted: 05/09/2017] [Indexed: 10/19/2022]
7
Keshavarz MH, Pouretedal HR, Ghaedsharafi AR, Taghizadeh SE. Simple Method for Prediction of the Standard Gibbs Free Energy of Formation of Energetic Compounds. Propellants Explosives Pyrotechnics 2014. [DOI: 10.1002/prep.201400032] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.4] [Indexed: 11/09/2022]
8
Maleki A, Daraei H, Alaei L, Faraji A. Comparison of QSAR models based on combinations of genetic algorithm, stepwise multiple linear regression, and artificial neural network methods to predict Kd of some derivatives of aromatic sulfonamides as carbonic anhydrase II inhibitors. Russian Journal of Bioorganic Chemistry 2014. [DOI: 10.1134/s106816201306006x] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.5] [Indexed: 11/23/2022]
9
Atabati M, Emamalizadeh R. The Hydrogen Perturbation in Molecular Connectivity Indices and their Application to a QSPR Study. J Solution Chem 2012. [DOI: 10.1007/s10953-012-9919-z] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.3] [Indexed: 11/28/2022]
10
Khajeh A, Modarress H. Quantitative Structure–Property Relationship Prediction of Gas Heat Capacity for Organic Compounds. Ind Eng Chem Res 2012. [DOI: 10.1021/ie301317f] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.3] [Indexed: 11/28/2022]
Affiliation(s)
- Aboozar Khajeh
  - Department of Chemical Engineering, Amirkabir University of Technology (Tehran Polytechnic), Hafez Avenue, 15914 Tehran, Iran
- Hamid Modarress
  - Department of Chemical Engineering, Amirkabir University of Technology (Tehran Polytechnic), Hafez Avenue, 15914 Tehran, Iran
11
Gupta M, Gupta S, Dureja H, Madan AK. Superaugmented Eccentric Distance Sum Connectivity Indices: Novel Highly Discriminating Topological Descriptors for QSAR/QSPR. Chem Biol Drug Des 2011; 79:38-52. [DOI: 10.1111/j.1747-0285.2011.01264.x] [Citation(s) in RCA: 11] [Impact Index Per Article: 0.8] [Indexed: 12/18/2022]
12
Li X, Rankin SE. Influence of unlimited 3-membered ring cyclization on a multiscale dynamic Monte Carlo/continuum model of drying and curing in sol–gel silica films. Chem Eng Sci 2011. [DOI: 10.1016/j.ces.2010.11.007] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.2] [Indexed: 11/24/2022]
13
Pérez-Montoto LG, Santana L, González-Díaz H. Scoring function for DNA-drug docking of anticancer and antiparasitic compounds based on spectral moments of 2D lattice graphs for molecular dynamics trajectories. Eur J Med Chem 2009; 44:4461-9. [PMID: 19604606 PMCID: PMC7127518 DOI: 10.1016/j.ejmech.2009.06.011] [Citation(s) in RCA: 11] [Impact Index Per Article: 0.7] [Received: 05/08/2009] [Revised: 06/04/2009] [Accepted: 06/05/2009] [Indexed: 02/02/2023]
Abstract
We introduce here a new class of invariants for MD trajectories based on the spectral moments πk(L) of the Markov matrix associated with lattice network-like (LN) graph representations of Molecular Dynamics (MD) trajectories. The procedure embeds the MD energy profiles in a 2D Cartesian coordinate system using simple heuristic rules. At the same time, we associate the LN with a Markov matrix that describes the probabilities of passing from one state to another in the new 2D space. We construct this type of LN for 422 MD trajectories obtained in DNA-drug docking experiments of 57 furocoumarins. The combined use of psoralens plus ultraviolet light (UVA) radiation is known as PUVA therapy. PUVA is effective in the treatment of skin diseases such as psoriasis and mycosis fungoides. PUVA is also useful to treat human platelet (PTL) concentrates in order to eliminate Leishmania spp. and Trypanosoma cruzi. These parasites cause leishmaniosis (a dangerous skin and visceral disease) and Chagas disease, respectively, and may circulate in blood products collected from infected donors. We included in this study both linear (psoralens) and angular (angelicins) furocoumarins. In the study, we grouped the LNs into two sets; set1: DNA-drug complex MD trajectories for active compounds, and set2: MD trajectories of non-active compounds or non-optimal MD trajectories of active compounds. We calculated the respective πk(L) values for all these LNs and used them as inputs to train a new classifier that discriminates set1 from set2 cases. In the training series the model correctly classifies 79 out of 80 (specificity = 98.75%) set1 and 226 out of 238 (sensitivity = 94.96%) set2 trajectories. In the independent validation series the model correctly classifies 26 out of 26 (specificity = 100%) set1 and 75 out of 78 (sensitivity = 96.15%) set2 trajectories. We propose this new model as a scoring function to guide DNA-docking studies in the drug design of new coumarins for anticancer or antiparasitic PUVA therapy.
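As a rough illustration of the descriptors named in this abstract, the sketch below computes spectral moments in the standard sense, the trace of the k-th power of a matrix, for a small row-stochastic (Markov) matrix. It assumes that this textbook definition is what πk(L) denotes; the construction of the lattice network and of its Markov matrix from an MD trajectory is not reproduced, and the function names and the toy matrix are hypothetical.

```python
def mat_mul(a, b):
    """Multiply two square matrices given as lists of rows."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def spectral_moments(markov, max_order):
    """Return [Tr(P^1), ..., Tr(P^max_order)] for a row-stochastic matrix P."""
    for row in markov:
        if abs(sum(row) - 1.0) > 1e-9:
            raise ValueError("each row of a Markov matrix must sum to 1")
    n = len(markov)
    power = [row[:] for row in markov]  # current power, starting at P^1
    moments = []
    for _ in range(max_order):
        moments.append(sum(power[i][i] for i in range(n)))
        power = mat_mul(power, markov)
    return moments


# Toy two-state chain that always jumps to the other state.
toy = [[0.0, 1.0],
       [1.0, 0.0]]
toy_moments = spectral_moments(toy, 4)
```

For this toy chain the odd moments vanish and the even moments equal 2, because the walker can return to its starting state only after an even number of steps. A vector of such moments per trajectory is the kind of input a classifier could be trained on.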
Affiliation(s)
- Lázaro G. Pérez-Montoto
  - Department of Microbiology & Parasitology, and Department of Organic Chemistry
  - Faculty of Pharmacy, University of Santiago de Compostela, 15782, Spain
- Lourdes Santana
  - Faculty of Pharmacy, University of Santiago de Compostela, 15782, Spain
- Humberto González-Díaz
  - Department of Microbiology & Parasitology, and Department of Organic Chemistry
  - Faculty of Pharmacy, University of Santiago de Compostela, 15782, Spain
14
Pérez-Montoto LG, Dea-Ayuela MA, Prado-Prado FJ, Bolas-Fernández F, Ubeira FM, González-Díaz H. Study of peptide fingerprints of parasite proteins and drug-DNA interactions with Markov-Mean-Energy invariants of biopolymer molecular-dynamic lattice networks. Polymer 2009; 50:3857-3870. [PMID: 32287404 PMCID: PMC7111648 DOI: 10.1016/j.polymer.2009.05.055] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Received: 02/10/2009] [Revised: 05/06/2009] [Accepted: 05/14/2009] [Indexed: 11/26/2022]
Abstract
Since the advent of Molecular Dynamics (MD) in biopolymer science with the study by Karplus et al. on protein dynamics, MD has become by far the foremost well-established computational technique to investigate the structure and function of biomolecules and their respective complexes and interactions. The analysis of MD trajectories (MDTs) remains, however, the greatest challenge and requires a great deal of insight, experience, and effort. Here, we introduce a new class of invariants for MDTs based on the spatial distribution of Mean-Energy values ξk(L) on a 2D Euclidean space representation of the MDTs. The procedure forces one MD trajectory to fold into a 2D Cartesian coordinate system using a step-by-step procedure driven by simple rules. The ξk(L) values are invariants of a Markov matrix (¹Π), which describes the probabilities of transition between two states in the new 2D space and is associated with a graph representation of MDTs similar to the lattice networks (LNs) of DNA and protein sequences. We also introduce a new algorithm to perform phylogenetic analysis of peptides based on MDTs instead of the sequence of the polypeptide. In a first experiment, we illustrate this algorithm for 35 peptides present in the Peptide Mass Fingerprint (PMF) of a new protein of Leishmania infantum studied in this work. We report, for the first time, 2D Electrophoresis isolation, MALDI TOF Mass Spectroscopy characterization, and MASCOT search results for this PMF. In a second experiment, we construct the LNs for 422 MDTs obtained in DNA-Drug Docking simulations of the interaction of 57 anticancer furocoumarins with a DNA oligonucleotide. We calculated the respective ξk(L) values for all these LNs and used them as inputs to train a new classifier with Accuracy = 85.44% and 84.91% in training and validation, respectively. The new model can be used as a scoring function to guide DNA-Drug Docking studies in the drug design of new coumarins for PUVA therapy. The new phylogenetic analysis algorithms encode information different from sequence similarity and may be used to analyze MDTs obtained in docking or modeling experiments for any class of biopolymers. The work opens new perspectives on the analysis and applications of MD in polymer sciences.
Affiliation(s)
- Lázaro Guillermo Pérez-Montoto
  - Department of Microbiology and Parasitology, Faculty of Pharmacy, University of Santiago de Compostela, 15782 Santiago de Compostela, Spain
  - Department of Organic Chemistry, Faculty of Pharmacy, University of Santiago de Compostela, 15782 Santiago de Compostela, Spain
- María Auxiliadora Dea-Ayuela
  - Departamento de Atención Sanitaria, Salud Pública y Sanidad Animal, Facultad CC Experimentales y de La Salud, Universidad CEU Cardenal Herrera, 46113 Moncada (Valencia), Spain
- Francisco J. Prado-Prado
  - Department of Microbiology and Parasitology, Faculty of Pharmacy, University of Santiago de Compostela, 15782 Santiago de Compostela, Spain
  - Department of Organic Chemistry, Faculty of Pharmacy, University of Santiago de Compostela, 15782 Santiago de Compostela, Spain
- Florencio M. Ubeira
  - Department of Microbiology and Parasitology, Faculty of Pharmacy, University of Santiago de Compostela, 15782 Santiago de Compostela, Spain
- Humberto González-Díaz
  - Department of Microbiology and Parasitology, Faculty of Pharmacy, University of Santiago de Compostela, 15782 Santiago de Compostela, Spain (corresponding author)
15
Skvortsova MI, Palyulin VA, Zefirov NS. Design of topological indices: computer-oriented approach. SAR and QSAR in Environmental Research 2009; 20:357-377. [PMID: 19544196 DOI: 10.1080/10629360902949161] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Indexed: 05/27/2023]
Abstract
A novel method that models human logic is suggested for constructing topological indices (TIs) of molecular graphs. This method is described in terms of a block scheme consisting of mutually connected elementary blocks. In each block, simple transformations of a molecular graph are carried out. A variant of the transformation is selected from the list of possible variants. Every TI is obtained as a result of the sequential execution of a number of operations, corresponding to some 'walk' on the block scheme. This walk can be selected either randomly or by the investigator. The suggested method can serve as a basis for the development of a computer program for the automatic construction of any number of TIs of differing nature. By this process one can also obtain TIs that are unlikely to be constructed manually, due to their complexity. The set of obtained TIs may be used for building structure-property models. In the case of an unsatisfactory result, the obtained set of TIs may be changed using the described generator of TIs. A number of examples of the application of the suggested approach to building QSAR/QSPR models are given.
Affiliation(s)
- M I Skvortsova
  - Laboratory of Mathematical Chemistry and Computer Synthesis, N.D. Zelinsky Institute of Organic Chemistry, Moscow, Russia
16
García I, Munteanu CR, Fall Y, Gómez G, Uriarte E, González-Díaz H. QSAR and complex network study of the chiral HMGR inhibitor structural diversity. Bioorg Med Chem 2009; 17:165-75. [DOI: 10.1016/j.bmc.2008.11.007] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.4] [Received: 05/23/2008] [Revised: 10/31/2008] [Accepted: 11/06/2008] [Indexed: 10/21/2022]
17
Jover J, Bosque R, Martinho Simões JA, Sales J. Estimation of enthalpies of formation of organometallic compounds from their molecular structures. J Organomet Chem 2008. [DOI: 10.1016/j.jorganchem.2008.01.021] [Citation(s) in RCA: 14] [Impact Index Per Article: 0.9] [Indexed: 10/22/2022]
18
González-Díaz H, González-Díaz Y, Santana L, Ubeira FM, Uriarte E. Proteomics, networks and connectivity indices. Proteomics 2008; 8:750-78. [DOI: 10.1002/pmic.200700638] [Citation(s) in RCA: 170] [Impact Index Per Article: 10.6] [Indexed: 12/19/2022]
19
Predicting anti-HIV-1 activity of 6-arylbenzonitriles: Computational approach using superaugmented eccentric connectivity topochemical indices. J Mol Graph Model 2008; 26:1020-9. [DOI: 10.1016/j.jmgm.2007.08.008] [Citation(s) in RCA: 53] [Impact Index Per Article: 3.3] [Received: 04/09/2007] [Revised: 08/20/2007] [Accepted: 08/25/2007] [Indexed: 11/23/2022]
20
Dureja H, Madan AK. Superaugmented eccentric connectivity indices: new-generation highly discriminating topological descriptors for QSAR/QSPR modeling. Med Chem Res 2007. [DOI: 10.1007/s00044-007-9032-9] [Citation(s) in RCA: 26] [Impact Index Per Article: 1.5] [Indexed: 11/25/2022]
21
Zhokhova NI, Palyulin VA, Baskin II, Zefirov AN, Zefirov NS. Fragment descriptors in the QSPR method: Their use for calculating the enthalpies of vaporization of organic substances. Russian Journal of Physical Chemistry A 2007. [DOI: 10.1134/s0036024407010037] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.4] [Indexed: 11/23/2022]
22
Yan A. Modeling of Gibbs Energy of Formation of Organic Compounds by Linear and Nonlinear Methods. J Chem Inf Model 2006; 46:2299-304. [PMID: 17125172 DOI: 10.1021/ci0600105] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.4] [Indexed: 11/28/2022]
Abstract
Two quantitative models for the prediction of the Gibbs energy of formation (ΔGf°) of 177 organic compounds were developed. These molecules contain elements such as H, C, N, O, F, S, Cl, and Br, with the molecular weight in the range of 16.04-202.25. The molecules were represented by six selected 2D-structure descriptors. At first, the complex relationship between ΔGf° and the six selected input descriptors was depicted by a two-dimensional Kohonen's self-organizing neural network (KohNN) map; on the basis of the KohNN map, the whole data set was split into a training set consisting of 130 compounds and a test set (or a validation set and a test set) including 47 compounds. Then, ΔGf° was predicted using a multilinear regression (MLR) analysis and a back-propagation (BPG) neural network. For 177 organic compounds, root-mean-square deviations of 17.8 and 15.4 kcal mol⁻¹ were achieved by MLR and the BPG neural network, respectively.
Affiliation(s)
- Aixia Yan
  - State Key Laboratory of Chemical Resource Engineering, Department of Pharmaceutical Engineering, P.O. Box 53, Beijing University of Chemical Technology, 15 BeiSanHuan East Road, Beijing 100029, P. R. China.
23
Cerruela García G, Luque Ruiz I, Gómez-Nieto MA, Cabrero Doncel JA, Guevara Plaza A. From Wiener index to molecules. J Chem Inf Model 2005; 45:231-8. [PMID: 15807483 DOI: 10.1021/ci049788l] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.3] [Indexed: 11/29/2022]
Abstract
In this paper we present an algorithm for the generation of molecular graphs with a given value of the Wiener index. The high number of graphs for a given value of the Wiener index is reduced thanks to the application of a set of heuristics that take into account the structural characteristics of the molecules. The selection of parameters such as the interval of values for the Wiener index, the diversity and occurrence of atoms and bonds, the size and number of cycles, and the presence of structural patterns guides the processing of the heuristics, generating molecular graphs with a considerable saving in computational cost. The modularity in the design of the algorithm allows it to be used as a pattern for the development of other algorithms based on different topological invariants, which allows for its use in areas of interest such as combinatorial databases and screening in chemical databases.
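For orientation, the sketch below computes the Wiener index itself, that is, the sum of shortest-path distances over all unordered pairs of vertices in a hydrogen-suppressed molecular graph. It covers only this forward direction, not the inverse generation algorithm the abstract describes; the adjacency-dictionary representation and the function name are illustrative choices.

```python
from collections import deque


def wiener_index(adjacency):
    """Sum of shortest-path distances over all unordered vertex pairs.

    adjacency maps each vertex to its neighbors in an undirected,
    connected, unweighted graph (for example a carbon skeleton).
    """
    total = 0
    for source in adjacency:
        # Breadth-first search gives shortest distances from source.
        dist = {source: 0}
        queue = deque([source])
        while queue:
            u = queue.popleft()
            for v in adjacency[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        if len(dist) != len(adjacency):
            raise ValueError("graph must be connected")
        total += sum(dist.values())
    return total // 2  # each pair was counted from both endpoints


# Carbon skeletons of the two C4H10 isomers.
n_butane = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
isobutane = {0: [1, 2, 3], 1: [0], 2: [0], 3: [0]}
w_chain = wiener_index(n_butane)      # 1+2+3+1+2+1 = 10
w_branched = wiener_index(isobutane)  # 3*1 + 3*2 = 9
```

The two isomers give different values (10 and 9), while many unrelated graphs can share one value, which is why an inverse search from index to structure needs pruning heuristics of the kind the abstract mentions.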
Affiliation(s)
- Gonzalo Cerruela García
  - Department of Computing and Numerical Analysis, University of Córdoba, Campus Universitario de Rabanales, Albert Einstein Building, E-14071 Córdoba, Spain
24
Marrero-Ponce Y. Linear Indices of the “Molecular Pseudograph's Atom Adjacency Matrix”: Definition, Significance-Interpretation, and Application to QSAR Analysis of Flavone Derivatives as HIV-1 Integrase Inhibitors. J Chem Inf Comput Sci 2004; 44:2010-26. [PMID: 15554670 DOI: 10.1021/ci049950k] [Citation(s) in RCA: 71] [Impact Index Per Article: 3.6] [Indexed: 11/30/2022]
Abstract
This report describes a new set of molecular descriptors of relevance to QSAR/QSPR studies and drug design, atom linear indices fk(xi). These atomic-level chemical descriptors are based on the calculation of linear maps on ℝⁿ [fk(xi): ℝⁿ → ℝⁿ] in the canonical basis. In this context, the kth power of the molecular pseudograph's atom adjacency matrix [Mk(G)] denotes the matrix of fk(xi) with respect to the canonical basis. In addition, a local-fragment (atom-type) formalism was developed. The kth atom-type linear indices are calculated by summing the kth atom linear indices of all atoms of the same atom type in the molecules. Moreover, total (whole-molecule) linear indices are also proposed. This descriptor is a linear functional (linear form) on ℝⁿ. That is, the kth total linear index is a linear map from ℝⁿ to the scalar ℝ [fk(x): ℝⁿ → ℝ]. Thus, the kth total linear indices are calculated by summing the atom linear indices of all atoms in the molecule. The features of the kth total and local linear indices are illustrated by examples of various types of molecular structures, including chain lengthening, branching, heteroatom content, and multiple bonds. Additionally, the linear independence of the local linear indices from other 0D, 1D, 2D, and 3D molecular descriptors is demonstrated by using principal component analysis for 42 very heterogeneous molecules. Much redundancy and overlap was found between total linear indices and most of the other structural indices presently in use in QSPR/QSAR practice. By contrast, the information carried by atom-type linear indices was strikingly different from that codified in most of the 229 0D-3D molecular descriptors used in this study. It is concluded that the local linear indices are independent indices containing important structural information to be used in QSPR/QSAR and drug design studies. In this sense, atom, atom-type, and total linear indices were used for the prediction of pIC50 values for the cleavage process of a set of flavone-derivative inhibitors of HIV-1 integrase. The quantitative models found are significant from a statistical point of view (R of 0.965, 0.902, and 0.927, respectively) and permit a clear interpretation of the studied properties in terms of the structural features of molecules. A LOO cross-validation procedure revealed that the regression models had fairly good predictability (q² of 0.679, 0.543, and 0.721, respectively). The comparison with other approaches reveals good behavior of the proposed method. The approach described in this paper appears to be an excellent alternative or guide for the discovery and optimization of new lead compounds.
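The construction in this abstract, atom linear indices obtained from the kth power of an adjacency matrix and total indices obtained by summing over atoms, can be sketched as follows. The sketch assumes a plain 0/1 adjacency matrix and a placeholder property vector of ones; the pseudograph matrix of the paper (with its own treatment of multiple bonds and diagonal entries) and the real atomic property weights are not reproduced, and all names are hypothetical.

```python
def mat_vec(matrix, vector):
    """Multiply a square matrix (list of rows) by a vector."""
    return [sum(matrix[i][j] * vector[j] for j in range(len(vector)))
            for i in range(len(matrix))]


def atom_linear_indices(adjacency, x, k):
    """Return M^k x: one k-th order linear index per atom."""
    result = list(x)
    for _ in range(k):
        result = mat_vec(adjacency, result)
    return result


def total_linear_index(adjacency, x, k):
    """k-th total linear index: sum of the atom linear indices."""
    return sum(atom_linear_indices(adjacency, x, k))


# Three-atom chain (e.g. a propane skeleton) with a placeholder
# property vector of ones; real use would supply atomic properties.
chain = [[0, 1, 0],
         [1, 0, 1],
         [0, 1, 0]]
ones = [1, 1, 1]
atom_k1 = atom_linear_indices(chain, ones, 1)   # [1, 2, 1]
totals = [total_linear_index(chain, ones, k) for k in range(3)]  # [3, 4, 6]
```

Summing the atom indices over only the atoms of one type, rather than over all atoms, would give the atom-type (local) indices the abstract describes.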
Affiliation(s)
- Yovani Marrero-Ponce
  - Department of Pharmacy, Faculty of Chemical-Pharmacy, and Department of Drug Design, Chemical Bioactive Center, Central University of Las Villas, Santa Clara, 54830, Villa Clara, Cuba.
25
Predicting anti-HIV activity of phenethylthiazolethiourea (PETT) analogs: computational approach using Wiener's topochemical index. J Mol Struct THEOCHEM 2004. [DOI: 10.1016/j.theochem.2004.01.052] [Citation(s) in RCA: 30] [Impact Index Per Article: 1.5] [Indexed: 11/19/2022]
26
Yang C, Zhong C. Modified Connectivity Indices and Their Application to QSPR Study. J Chem Inf Comput Sci 2003; 43:1998-2004. [PMID: 14632450 DOI: 10.1021/ci034093q] [Citation(s) in RCA: 11] [Impact Index Per Article: 0.5] [Indexed: 11/29/2022]
Abstract
A modified adjacency matrix was developed to delineate the chemical graph of a compound, in which the element a(ii) along the diagonal of the matrix reflects the numbers of lone-pair electrons and pi bonds of the ith atom, and the off-diagonal element a(ij) characterizes whether the jth non-hydrogen atom is bonded to the ith non-hydrogen atom as well as the number of hydrogen atoms bonded to the jth non-hydrogen atom. The corresponding vertex-degree matrix can distinguish the non-hydrogen atoms in the compound better than that derived from the original adjacency matrix. Based on the newly proposed adjacency matrix, modified molecular connectivity indices (mMCIs) were proposed as structural descriptors for organic compounds and applied to QSPR studies of the boiling point, molar volume and molar refraction of alkanes, alkenes and alcohols. The results show that, in most cases, the mMCIs give better correlations than the original molecular connectivity indices (MCIs) and are particularly suitable for distinguishing isomers.
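For context, the original (unmodified) first-order molecular connectivity index that the mMCIs build on can be sketched as follows. This is the classical Randić-type index on a hydrogen-suppressed graph and not the modified index of this paper, whose matrix elements are not fully specified in the abstract. The function name and the bond-list input format are assumptions made for this example.

```python
import math


def randic_index(n_atoms, bonds):
    """First-order connectivity index: sum over bonds of 1/sqrt(d_i * d_j),
    where d_i is the number of non-hydrogen neighbours of atom i."""
    degree = [0] * n_atoms
    for i, j in bonds:
        degree[i] += 1
        degree[j] += 1
    return sum(1.0 / math.sqrt(degree[i] * degree[j]) for i, j in bonds)


# Two C4H10 isomers as hydrogen-suppressed graphs (atoms numbered from 0).
n_butane = [(0, 1), (1, 2), (2, 3)]   # linear chain
isobutane = [(0, 1), (0, 2), (0, 3)]  # central carbon with three methyls

chi_n_butane = randic_index(4, n_butane)
chi_isobutane = randic_index(4, isobutane)
```

The two isomers give different values (about 1.914 for the chain and about 1.732 for the branched skeleton), which is the kind of isomer discrimination the abstract refers to. Because vertex degrees in this classical form ignore heteroatoms, lone pairs and pi bonds, it cannot separate atoms that the modified matrix is designed to tell apart.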
Affiliation(s)
- Chunsheng Yang
- Department of Chemical Engineering, PO Box 100, Beijing University of Chemical Technology, Beijing 100029, China
|
27
|
Total and Local Quadratic Indices of the “Molecular Pseudograph’s Atom Adjacency Matrix”. Application to Prediction of Caco-2 Permeability of Drugs. Int J Mol Sci 2003. [DOI: 10.3390/i4080512] [Citation(s) in RCA: 47] [Impact Index Per Article: 2.2] [Indexed: 11/16/2022] Open
|
28
|
Estrada E. Generalized Graph Matrix, Graph Geometry, Quantum Chemistry, and Optimal Description of Physicochemical Properties. J Phys Chem A 2003. [DOI: 10.1021/jp0346561] [Citation(s) in RCA: 20] [Impact Index Per Article: 1.0] [Indexed: 11/28/2022]
Affiliation(s)
- Ernesto Estrada
- Safety & Environmental Assurance Centre (SEAC), Unilever, Colworth House, Sharnbrook, Bedford MK44 1LQ, U.K
|
29
|
Total and Local Quadratic Indices of the Molecular Pseudograph's Atom Adjacency Matrix: Applications to the Prediction of Physical Properties of Organic Compounds. Molecules 2003. [PMCID: PMC6146921 DOI: 10.3390/80900687] [Citation(s) in RCA: 68] [Impact Index Per Article: 3.2] [Indexed: 11/21/2022] Open
Abstract
A novel topological approach for obtaining a family of new molecular descriptors is proposed. In this connection, a vector space E (molecular vector space), whose elements are organic molecules, is defined as a “direct sum” of different ℜi spaces. In this way, molecules having a total of i atoms can be represented as elements (vectors) of the vector spaces ℜi (i = 1, 2, 3, ..., n, where n is the number of atoms in the molecule). In these spaces, the components of the vectors are atomic properties that characterize each kind of atom. The total quadratic indices are based on the calculation of mathematical quadratic forms. These forms are functions of the k-th power of the molecular pseudograph’s atom adjacency matrix (M). For simplicity, canonical bases are selected as the quadratic forms’ bases. These indices were generalized to “higher analogues” as number sequences. In addition, this paper introduces a local approach (local invariant) for molecular quadratic indices. This approach is based mainly on the use of a local matrix [M^k(G, F_R)], which is obtained from the k-th power [M^k(G)] of the atom adjacency matrix M. M^k(G, F_R) includes the elements of the fragment of interest and those that are connected with it through paths of length k. Finally, total (and local) quadratic indices have been used in QSPR studies of four series of organic compounds. The quantitative models found are statistically significant and permit a clear interpretation of the studied properties in terms of the structural features of the molecules. External prediction series and cross-validation procedures (leave-one-out and leave-group-out) were used to assess model predictability. The reported method gave results similar to those of other topological approaches.
The results obtained were the following: a) seven physical properties of 74 normal and branched alkanes (boiling points, molar volumes, molar refractions, heats of vaporization, critical temperatures, critical pressures and surface tensions) were well modeled (R>0.98, q²>0.95) by the total quadratic indices. The overall MAEs of 5-fold cross-validation were 2.11 °C, 0.53 cm³, 0.032 cm³, 0.32 kJ/mol, 5.34 °C, 0.64 atm and 0.23 dyn/cm for each property, respectively; b) boiling points of 58 alkyl alcohols were also well described by the present approach; two QSPR models were obtained: the first was developed using the complete set of 58 alcohols [R=0.9938, q²=0.986, s=4.006 °C, overall MAE of 5-fold cross-validation=3.824 °C] and the second was developed using 29 compounds as a training set [R=0.9979, q²=0.992, s=2.97 °C, overall MAE of 5-fold cross-validation=2.580 °C] and 29 compounds as a test set [R=0.9938, s=3.17 °C]; c) good relationships were obtained for the boiling points of cycloalkanes (using 80 and 26 cycloalkanes in the training and test sets, respectively) with 2 and 5 total quadratic indices: [Training set: R=0.9823 (q²=0.961 and overall MAE of 5-fold cross-validation=6.429 °C) and R=0.9927 (q²=0.977 and overall MAE of 5-fold cross-validation=4.801 °C); Test set: R=0.9726 and R=0.9927]; and d) the linear model developed to describe the boiling points of 70 organic compounds containing aromatic rings showed good statistical features, with a squared correlation coefficient (R²) of 0.981 (s=7.61 °C). Internal validation procedures (q²=0.9763 and overall MAE of 5-fold cross-validation=7.34 °C) allowed the predictability and robustness of the model to be assessed. The predictive performance of the obtained QSPR model was also tested on an extra set of 20 aromatic organic compounds (R=0.9930 and s=7.8280 °C).
The results obtained are valid to establish that these new indices fulfill some of the ideal requirements proposed by Randić for a new molecular descriptor.
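The total quadratic index described in this abstract, a quadratic form in the k-th power of the adjacency matrix, can be sketched as follows. This is an illustrative reading rather than the authors' code: it assumes a loop-free hydrogen-suppressed graph, uses hypothetical atomic property values in `x`, and omits the local (fragment) variant because the abstract does not fully define the local matrix. Function names are invented for this example.

```python
def mat_vec(M, x):
    """Multiply a square matrix (list of rows) by a vector."""
    return [sum(m * v for m, v in zip(row, x)) for row in M]


def total_quadratic_index(M, x, k):
    """kth total quadratic index, taken here as the quadratic form
    x^T M^k x evaluated in the canonical basis."""
    y = list(x)
    for _ in range(k):
        y = mat_vec(M, y)      # y = M^k x after the loop
    return sum(xi * yi for xi, yi in zip(x, y))


# Three-atom chain (path graph 0-1-2) with placeholder property values.
M = [[0, 1, 0],
     [1, 0, 1],
     [0, 1, 0]]
x = [1.0, 1.0, 2.0]  # hypothetical atomic properties (assumption)

# The "number sequence" of indices for k = 0, 1, 2.
q_sequence = [total_quadratic_index(M, x, k) for k in range(3)]
```

For these placeholder values the sequence is 6, 6, 11. The k = 0 term is the sum of squared atomic properties, while higher k values mix in contributions from atoms connected by walks of length k.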
|
30
|
Ivanciuc O, Ivanciuc T, Cabrol-Bass D. QSAR for dihydrofolate reductase inhibitors with molecular graph structural descriptors. Journal of Molecular Structure: THEOCHEM 2002. [DOI: 10.1016/s0166-1280(01)00772-2] [Citation(s) in RCA: 10] [Impact Index Per Article: 0.5] [Indexed: 10/27/2022]
|
31
|
Ivanciuc O, Klein DJ. Computing Wiener-type indices for virtual combinatorial libraries generated from heteroatom-containing building blocks. Journal of Chemical Information and Computer Sciences 2002; 42:8-22. [PMID: 11855961 DOI: 10.1021/ci010072p] [Citation(s) in RCA: 16] [Impact Index Per Article: 0.7] [Indexed: 11/30/2022]
Abstract
The expensive and time-consuming process of drug lead discovery is significantly accelerated by efficiently screening molecular libraries with a high structural diversity and selecting subsets of molecules according to their similarity toward specific collections of active compounds. To characterize molecular similarity/diversity or to quantify the drug-like character of compounds, the process of screening virtual and synthetic combinatorial libraries uses various classes of structural descriptors, such as structure keys, fingerprints, graph invariants, and various topological indices computed from atomic connectivities or graph distances. In this paper we present efficient algorithms for computing several distance-based topological indices of a molecular graph from the distance invariants of its subgraphs. The procedures use vertex- and edge-weighted molecular graphs representing organic compounds containing heteroatoms and multiple bonds. These equations offer an effective way to compute the Wiener index, even/odd Wiener index, and resistance-distance index for weighted molecular graphs. The proposed algorithms are especially efficient for computing distance-based structural descriptors in combinatorial libraries without actually generating the compounds, because only distance-based indices of the building blocks are needed to generate the topological indices of any compound assembled from them.
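As a baseline for the distance-based indices mentioned here, the plain Wiener index of an unweighted, connected molecular graph can be computed directly from shortest-path distances. The sketch below is this brute-force definition only; it is not the building-block algorithm of the paper and does not handle vertex or edge weights. The function names and the bond-list input format are assumptions made for this example.

```python
from collections import deque


def bfs_distances(adjacency, start):
    """Shortest-path distances (in bonds) from one atom to all reachable atoms."""
    dist = {start: 0}
    queue = deque([start])
    while queue:
        u = queue.popleft()
        for v in adjacency[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def wiener_index(n_atoms, bonds):
    """Wiener index: sum of topological distances over all unordered atom pairs.
    Assumes the graph is connected and unweighted."""
    adjacency = {i: [] for i in range(n_atoms)}
    for i, j in bonds:
        adjacency[i].append(j)
        adjacency[j].append(i)
    ordered_pair_sum = sum(
        sum(bfs_distances(adjacency, s).values()) for s in range(n_atoms)
    )
    return ordered_pair_sum // 2  # each pair was counted twice


# Hydrogen-suppressed graphs of two C4H10 isomers.
w_n_butane = wiener_index(4, [(0, 1), (1, 2), (2, 3)])
w_isobutane = wiener_index(4, [(0, 1), (0, 2), (0, 3)])
```

The linear chain gives 10 and the branched skeleton gives 9. This direct approach needs the full graph of every compound, which is the cost the paper's subgraph-based equations are meant to avoid for large combinatorial libraries.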
Affiliation(s)
- Ovidiu Ivanciuc
- Department of Marine Sciences, Texas A&M University at Galveston, Fort Crockett Campus, 5007 Avenue U, Galveston, Texas 77551, USA.
|