1
Morales N, Valdés-Muñoz E, González J, Valenzuela-Hormazábal P, Palma JM, Galarza C, Catagua-González Á, Yáñez O, Pereira A, Bustos D. Machine Learning-Driven Classification of Urease Inhibitors Leveraging Physicochemical Properties as Effective Filter Criteria. Int J Mol Sci 2024; 25:4303. [PMID: 38673888 PMCID: PMC11049951 DOI: 10.3390/ijms25084303] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/15/2024] [Revised: 04/03/2024] [Accepted: 04/08/2024] [Indexed: 04/28/2024]
Abstract
Urease, a pivotal enzyme in nitrogen metabolism, plays a crucial role in various microorganisms, including the pathogenic Helicobacter pylori. Inhibiting urease activity offers a promising approach to combating infections and associated ailments, such as chronic kidney diseases and gastric cancer. However, identifying potent urease inhibitors remains challenging due to resistance issues that hinder traditional approaches. Recently, machine learning (ML)-based models have demonstrated the ability to predict the bioactivity of molecules rapidly and effectively. In this study, we present ML models designed to predict urease inhibitors by leveraging essential physicochemical properties. The methodological approach involved constructing a dataset of urease inhibitors through an extensive literature search. Subsequently, these inhibitors were characterized based on physicochemical properties calculations. An exploratory data analysis was then conducted to identify and analyze critical features. Ultimately, 252 classification models were trained, utilizing a combination of seven ML algorithms, three attribute selection methods, and six different strategies for categorizing inhibitory activity. The investigation unveiled discernible trends distinguishing urease inhibitors from non-inhibitors. This differentiation enabled the identification of essential features that are crucial for precise classification. Through a comprehensive comparison of ML algorithms, tree-based methods like random forest, decision tree, and XGBoost exhibited superior performance. Additionally, incorporating the "chemical family type" attribute significantly enhanced model accuracy. Strategies involving a gray-zone categorization demonstrated marked improvements in predictive precision. This research underscores the transformative potential of ML in predicting urease inhibitors. 
The meticulous methodology outlined herein offers actionable insights for developing robust predictive models within biochemical systems.
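The gray-zone categorization mentioned in this abstract can be illustrated with a minimal sketch. It assumes activity is reported as an IC50 value in µM and uses two hypothetical cut-offs (`active_max`, `inactive_min`); neither the thresholds nor the function name come from the paper, which does not state its cut-offs in the abstract.

```python
from typing import Optional


def label_activity(ic50_um: float,
                   active_max: float = 10.0,
                   inactive_min: float = 50.0) -> Optional[str]:
    """Label one compound; return None when it falls in the gray zone.

    Both thresholds are illustrative assumptions, not values from the paper.
    """
    if ic50_um <= active_max:
        return "inhibitor"
    if ic50_um >= inactive_min:
        return "non-inhibitor"
    return None  # intermediate activity: excluded from training


# Hypothetical measurements (compound name -> IC50 in µM).
measurements = {"cpd_a": 2.5, "cpd_b": 25.0, "cpd_c": 120.0}
labels = {name: label_activity(value) for name, value in measurements.items()}

# Keep only compounds with an unambiguous class label.
training_set = {name: lab for name, lab in labels.items() if lab is not None}
print(training_set)
```

Dropping the intermediate compounds widens the gap between the two classes, which is one plausible reason such strategies can improve classification precision, at the cost of a smaller training set.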
Affiliation(s)
- Natalia Morales
- Magíster en Ciencias de la Computación, Universidad Católica del Maule, Talca 3460000, Chile
- Elizabeth Valdés-Muñoz
- Doctorado en Biotecnología Traslacional, Centro de Biotecnología de los Recursos Naturales, Universidad Católica del Maule, Talca 3480094, Chile
- Jaime González
- Magíster en Ciencias de la Computación, Universidad Católica del Maule, Talca 3460000, Chile
- Paulina Valenzuela-Hormazábal
- Departamento de Farmacología, Facultad de Ciencias Biológicas, Universidad de Concepción, Concepción 4030000, Chile
- Jonathan M. Palma
- Facultad de Ingeniería, Universidad de Talca, Curicó 3344158, Chile
- Christian Galarza
- Departamento de Matemáticas, Facultad de Ciencias Naturales y Matemáticas, Escuela Superior Politécnica del Litoral, Guayaquil EC090903, Ecuador
- Ángel Catagua-González
- Departamento de Matemáticas, Facultad de Ciencias Naturales y Matemáticas, Escuela Superior Politécnica del Litoral, Guayaquil EC090903, Ecuador
- Osvaldo Yáñez
- Núcleo de Investigación en Data Science, Facultad de Ingeniería y Negocios, Universidad de las Américas, Santiago 7500000, Chile
- Alfredo Pereira
- Facultad de Ingeniería, Arquitectura y Diseño, Universidad San Sebastián, Bellavista 7, Santiago 8420524, Chile
- Daniel Bustos
- Laboratorio de Bioinformática y Química Computacional, Departamento de Medicina Traslacional, Facultad de Medicina, Universidad Católica del Maule, Talca 3480094, Chile
2
Rudrapal M, Kirboga KK, Abdalla M, Maji S. Explainable artificial intelligence-assisted virtual screening and bioinformatics approaches for effective bioactivity prediction of phenolic cyclooxygenase-2 (COX-2) inhibitors using PubChem molecular fingerprints. Mol Divers 2024:10.1007/s11030-023-10782-9. [PMID: 38200203 DOI: 10.1007/s11030-023-10782-9] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/11/2023] [Accepted: 11/22/2023] [Indexed: 01/12/2024]
Abstract
Cyclooxygenase-2 (COX-2) inhibitors are nonsteroidal anti-inflammatory drugs that treat inflammation, pain and fever. This study determined the interaction mechanisms of COX-2 inhibitors and the molecular properties needed to design new drug candidates. Using machine learning and explainable AI methods, the inhibition activity of 1488 molecules was modelled, and essential properties were identified. These properties included aromatic rings, nitrogen-containing functional groups and aliphatic hydrocarbons. They affected the water solubility, hydrophobicity and binding affinity of COX-2 inhibitors. The binding mode, stability and ADME properties of 16 ligands bound to the Cyclooxygenase active site of COX-2 were investigated by molecular docking, molecular dynamics simulation and MM-GBSA analysis. The results showed that ligand 339,222 was the most stable and effective COX-2 inhibitor. It inhibited prostaglandin synthesis by disrupting the protein conformation of COX-2. It had good ADME properties and high clinical potential. This study demonstrated the potential of machine learning and bioinformatics methods in discovering COX-2 inhibitors.
Affiliation(s)
- Mithun Rudrapal
- Department of Pharmaceutical Sciences, School of Biotechnology and Pharmaceutical Sciences, Vignan's Foundation for Science, Technology & Research (Deemed to Be University), Guntur, 522213, India.
- Kevser Kübra Kirboga
- Informatics Institute, Istanbul Technical University, 34469, Maslak, Istanbul, Turkey.
- Bioengineering Department, Bilecik Seyh Edebali University, 11230, Bilecik, Turkey.
- Mohnad Abdalla
- Pediatric Research Institute, Children's Hospital Affiliated to Shandong University, Jinan, 250022, Shandong, People's Republic of China
- Siddhartha Maji
- Department of Chemistry, Oklahoma State University, Stillwater, OK, USA
3
Regioselective C-H arylation of imidazoles employing macrocyclic palladium(II) complex of organoselenium ligand. J Organomet Chem 2021. [DOI: 10.1016/j.jorganchem.2021.121907] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 01/14/2023]
4
Gupta P, Mohanty D. SMMPPI: a machine learning-based approach for prediction of modulators of protein-protein interactions and its application for identification of novel inhibitors for RBD:hACE2 interactions in SARS-CoV-2. Brief Bioinform 2021; 22:6220172. [PMID: 33839740 PMCID: PMC8083326 DOI: 10.1093/bib/bbab111] [Citation(s) in RCA: 15] [Impact Index Per Article: 5.0] [Received: 11/04/2020] [Revised: 02/18/2021] [Accepted: 03/12/2021] [Indexed: 11/30/2022]
Abstract
Small molecule modulators of protein–protein interactions (PPIs) are being pursued as novel anticancer, antiviral and antimicrobial drug candidates. We have utilized a large data set of experimentally validated PPI modulators and developed machine learning classifiers for prediction of new small molecule modulators of PPI. Our analysis reveals that using random forest (RF) classifier, general PPI Modulators independent of PPI family can be predicted with ROC-AUC higher than 0.9, when training and test sets are generated by random split. The performance of the classifier on data sets very different from those used in training has also been estimated by using different state of the art protocols for removing various types of bias in division of data into training and test sets. The family-specific PPIM predictors developed in this work for 11 clinically important PPI families also have prediction accuracies of above 90% in majority of the cases. All these ML-based predictors have been implemented in a freely available software named SMMPPI for prediction of small molecule modulators for clinically relevant PPIs like RBD:hACE2, Bromodomain_Histone, BCL2-Like_BAX/BAK, LEDGF_IN, LFA_ICAM, MDM2-Like_P53, RAS_SOS1, XIAP_Smac, WDR5_MLL1, KEAP1_NRF2 and CD4_gp120. We have identified novel chemical scaffolds as inhibitors for RBD_hACE PPI involved in host cell entry of SARS-CoV-2. Docking studies for some of the compounds reveal that they can inhibit RBD_hACE2 interaction by high affinity binding to interaction hotspots on RBD. Some of these new scaffolds have also been found in SARS-CoV-2 viral growth inhibitors reported recently; however, it is not known if these molecules inhibit the entry phase.
Affiliation(s)
- Debasisa Mohanty
- Bioinformatics & Computational Biology research group at NII, New Delhi 110067, India
5
Palladium complexes of chalcogenoethanamine (S/Se) bidentate ligands: Applications in catalytic arylation of C-H and O-H bonds. Polyhedron 2020. [DOI: 10.1016/j.poly.2020.114597] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.3] [Indexed: 12/13/2022]
6
Keyvanpour MR, Shirzad MB. An Analysis of QSAR Research Based on Machine Learning Concepts. Curr Drug Discov Technol 2020; 18:17-30. [PMID: 32178612 DOI: 10.2174/1570163817666200316104404] [Citation(s) in RCA: 14] [Impact Index Per Article: 3.5] [Received: 04/08/2019] [Revised: 08/22/2019] [Accepted: 10/28/2019] [Indexed: 11/22/2022]
Abstract
Quantitative Structure-Activity Relationship (QSAR) is a popular approach developed to correlate chemical molecules with their biological activities based on their chemical structures. Machine learning techniques have proved to be promising solutions to QSAR modeling. Due to the significant role of machine learning strategies in QSAR modeling, this area of research has attracted much attention from researchers. A considerable amount of literature has been published on machine learning based QSAR modeling methodologies whilst this domain still suffers from lack of a recent and comprehensive analysis of these algorithms. This study systematically reviews the application of machine learning algorithms in QSAR, aiming to provide an analytical framework. For this purpose, we present a framework called 'ML-QSAR'. This framework has been designed for future research to: a) facilitate the selection of proper strategies among existing algorithms according to the application area requirements, b) help to develop and ameliorate current methods and c) providing a platform to study existing methodologies comparatively. In ML-QSAR, first a structured categorization is depicted which studied the QSAR modeling research based on machine models. Then several criteria are introduced in order to assess the models. Finally, inspired by aforementioned criteria the qualitative analysis is carried out.
Affiliation(s)
- Mehrnoush Barani Shirzad
- Data Mining Research Laboratory, Department of Computer Engineering, Alzahra University, Tehran, Iran
7
Bhatt R, Bhuvanesh N, Sharma KN, Joshi H. Palladium Complexes of Thio/Seleno-Ether Containing N-Heterocyclic Carbenes: Efficient and Reusable Catalyst for Regioselective C-H Bond Arylation. Eur J Inorg Chem 2020. [DOI: 10.1002/ejic.201901259] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.8] [Indexed: 12/20/2022]
Affiliation(s)
- Ramprasad Bhatt
- Department of Chemistry; Birla Institute of Technology and Science; Pilani Campus 333031 Pilani India
- Nattamai Bhuvanesh
- Department of Chemistry; Texas A&M University; PO Box 30012 College Station 77842-3012 Texas USA
- Kamal Nayan Sharma
- Department of Chemistry; Malaviya National Institute of Technology Jaipur; J.L.N. Marg 302017 Jaipur Rajasthan India
- Department of Chemistry; ASAS, Amity University Haryana (AUH); Manesar; 122413 Gurgaon India
- Hemant Joshi
- Department of Chemistry; School of Chemical Sciences and Pharmacy; Central University of Rajasthan; NH-8, Bandarsindri 305817 Ajmer Rajasthan India
8
Qin Z, Xi Y, Zhang S, Tu G, Yan A. Classification of Cyclooxygenase-2 Inhibitors Using Support Vector Machine and Random Forest Methods. J Chem Inf Model 2019; 59:1988-2008. [PMID: 30762371 DOI: 10.1021/acs.jcim.8b00876] [Citation(s) in RCA: 20] [Impact Index Per Article: 4.0] [Indexed: 02/01/2023]
Abstract
This work reports the classification study conducted on the biggest COX-2 inhibitor data set so far. Using 2925 diverse COX-2 inhibitors collected from 168 pieces of literature, we applied machine learning methods, support vector machine (SVM) and random forest (RF), to develop 12 classification models. The best SVM and RF models resulted in MCC values of 0.73 and 0.72, respectively. The 2925 COX-2 inhibitors were reduced to a data set of 1630 molecules by removing intermediately active inhibitors, and 12 new classification models were constructed, yielding MCC values above 0.72. The best MCC value of the external test set was predicted to be 0.68 by the RF model using ECFP_4 fingerprints. Moreover, the 2925 COX-2 inhibitors were clustered into eight subsets, and the structural features of each subset were investigated. We identified substructures important for activity including halogen, carboxyl, sulfonamide, and methanesulfonyl groups, as well as the aromatic nitrogen atoms. The models developed in this study could serve as useful tools for compound screening prior to lab tests.
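The Matthews correlation coefficient (MCC) used to compare the models in this abstract can be computed directly from confusion-matrix counts. The sketch below is a generic illustration of the standard formula; the function name and the example counts are made up for demonstration and are not taken from the study.

```python
import math


def mcc_from_counts(tp: int, tn: int, fp: int, fn: int) -> float:
    """Matthews correlation coefficient from binary confusion-matrix counts.

    Returns 0.0 when any marginal total is zero (the usual convention),
    since the coefficient is undefined in that case.
    """
    numerator = tp * tn - fp * fn
    denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return numerator / denominator if denominator else 0.0


# Hypothetical test-set counts, chosen only to illustrate the calculation.
score = mcc_from_counts(tp=45, tn=40, fp=10, fn=5)
print(round(score, 3))
```

MCC ranges from -1 (total disagreement) through 0 (no better than chance) to +1 (perfect prediction), and unlike plain accuracy it stays informative when actives and inactives are unevenly represented.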
Affiliation(s)
- Zijian Qin
- State Key Laboratory of Chemical Resource Engineering, Department of Pharmaceutical Engineering, Beijing University of Chemical Technology, P.O. Box 53, 15 BeiSanHuan East Road, Beijing 100029, P. R. China
- Yao Xi
- State Key Laboratory of Chemical Resource Engineering, Department of Pharmaceutical Engineering, Beijing University of Chemical Technology, P.O. Box 53, 15 BeiSanHuan East Road, Beijing 100029, P. R. China
- Shengde Zhang
- State Key Laboratory of Chemical Resource Engineering, Department of Pharmaceutical Engineering, Beijing University of Chemical Technology, P.O. Box 53, 15 BeiSanHuan East Road, Beijing 100029, P. R. China
- Guiping Tu
- State Key Laboratory of Chemical Resource Engineering, Department of Pharmaceutical Engineering, Beijing University of Chemical Technology, P.O. Box 53, 15 BeiSanHuan East Road, Beijing 100029, P. R. China
- Aixia Yan
- State Key Laboratory of Chemical Resource Engineering, Department of Pharmaceutical Engineering, Beijing University of Chemical Technology, P.O. Box 53, 15 BeiSanHuan East Road, Beijing 100029, P. R. China
9
Bhaskar R, Sharma AK, Singh AK. Palladium(II) Complexes of N-Heterocyclic Carbene Amidates Derived from Chalcogenated Acetamide-Functionalized 1H-Benzimidazolium Salts: Recyclable Catalyst for Regioselective Arylation of Imidazoles under Aerobic Conditions. Organometallics 2018. [DOI: 10.1021/acs.organomet.8b00246] [Citation(s) in RCA: 26] [Impact Index Per Article: 4.3] [Indexed: 12/22/2022]
Affiliation(s)
- Renu Bhaskar
- Department of Chemistry, Indian Institute of Technology Delhi, New Delhi 110016, India
- Alpesh K. Sharma
- Department of Chemistry, Indian Institute of Technology Delhi, New Delhi 110016, India
- Ajai K. Singh
- Department of Chemistry, Indian Institute of Technology Delhi, New Delhi 110016, India
10
Pérez DJ, Sarabia O, Villanueva-García M, Pineda-Urbina K, Ramos-Organillo Á, Gonzalez-Gonzalez J, Gómez-Sandoval Z, Razo-Hernández RS. In silico receptor-based drug design of X,Y-benzenesulfonamide derivatives as selective COX-2 inhibitors. CR CHIM 2017. [DOI: 10.1016/j.crci.2016.05.015] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.1] [Indexed: 01/15/2023]
11
Quantitative structure activity relationship and docking studies of imidazole-based derivatives as P-glycoprotein inhibitors. Med Chem Res 2014. [DOI: 10.1007/s00044-014-1029-6] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.7] [Indexed: 10/25/2022]
12
Models for anti-inflammatory activity of 8-substituted-4-anilino-6-aminoquinoline-3-carbonitriles. Med Chem Res 2012. [DOI: 10.1007/s00044-011-9613-5] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/18/2022]
13
Cheng Z, Zhang Y, Zhou C. QSAR Models for Phosphoramidate Prodrugs of 2′-Methylcytidine as Inhibitors of Hepatitis C Virus Based on PSO Boosting. Chem Biol Drug Des 2011; 78:948-59. [DOI: 10.1111/j.1747-0285.2011.01236.x] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.5] [Indexed: 11/28/2022]
14
Models for anti-tumor activity of bisphosphonates using refined topochemical descriptors. THE SCIENCE OF NATURE - NATURWISSENSCHAFTEN 2011; 98:871-87. [PMID: 21892780 DOI: 10.1007/s00114-011-0839-3] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.2] [Received: 04/25/2011] [Revised: 08/16/2011] [Accepted: 08/17/2011] [Indexed: 10/17/2022]
Abstract
An in silico approach comprising of decision tree (DT), random forest (RF) and moving average analysis (MAA) was successfully employed for development of models for prediction of anti-tumor activity of bisphosphonates. A dataset consisting of 65 analogues of both nitrogen-containing and non-nitrogen-containing bisphosphonates was selected for the present study. Four refinements of eccentric distance sum topochemical index termed as augmented eccentric distance sum topochemical indices 1-4 [formula: see text] have been proposed so as to significantly augment discriminating power. Proposed topological indices (TIs) along with the exiting TIs (>1,400) were subsequently utilized for development of models for prediction of anti-tumor activity of bisphosphonates. A total of 43 descriptors of diverse nature, from a large pool of molecular descriptors, calculated through E-Dragon software (version 1.0) and an in-house computer program were selected for development of suitable models by employing DT, RF and MAA. DT identified two TIs as most important and classified the analogues of the dataset with an accuracy of 97% in training set and 90.7% in tenfold cross-validated set. Random forest correctly classified the analogues with an accuracy of 89.2%. Four independent models developed through MAA predicted the activity of analogues of the dataset with an accuracy of 87.6% to 89%. The statistical significance of proposed models was assessed through intercorrelation analysis, specificity, sensitivity and Matthew's correlation coefficient. The proposed models offer a vast potential for providing lead structures for development of potent anti-tumor agents for treatment of cancer that has spread to the bone.
15
Cheng Z, Zhang Y, Fu W. Predictive QSAR models of 3-acylamino-2-aminopropionic acid derivatives as partial agonists of the glycine site on the NMDA receptor. Med Chem Res 2010. [DOI: 10.1007/s00044-010-9464-5] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.1] [Indexed: 02/03/2023]
16
Goyal RK, Dureja H, Singh G, Madan AK. Models for antitubercular activity of 5′-O-[(N-Acyl)sulfamoyl]adenosines. Sci Pharm 2010; 78:791-820. [PMID: 21179317 PMCID: PMC3007618 DOI: 10.3797/scipharm.1006-03] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.3] [Received: 06/13/2010] [Accepted: 08/12/2010] [Indexed: 11/26/2022]
Abstract
The relationship between topological indices and antitubercular activity of 5′-O-[(N-Acyl)sulfamoyl]adenosines has been investigated. A data set consisting of 31 analogues of 5′-O-[(N-Acyl)sulfamoyl]adenosines was selected for the present study. The values of numerous topostructural and topochemical indices for each of 31 differently substituted analogues of the data set were computed using an in-house computer program. Resulting data was analyzed and suitable models were developed through decision tree, random forest and moving average analysis (MAA). The goodness of the models was assessed by calculating overall accuracy of prediction, sensitivity, specificity and Matthews correlation coefficient. Pendentic eccentricity index, a novel highly discriminating, non-correlating pendenticity based topochemical descriptor, was also conceptualized and successfully utilized for the development of a model for antitubercular activity of 5′-O-[(N-Acyl)sulfamoyl]adenosines. The proposed index exhibited not only high sensitivity towards both the presence as well as relative position(s) of pendent/heteroatom(s) but also led to significant reduction in degeneracy. Random forest correctly classified the analogues into active and inactive with an accuracy of 67.74%. A decision tree was also employed for determining the importance of molecular descriptors. The decision tree learned the information from the input data with an accuracy of 100% and correctly predicted the cross-validated (10 fold) data with accuracy up to 77.4%. Statistical significance of proposed models was also investigated using intercorrelation analysis. Accuracy of prediction of proposed MAA models ranged from 90.4 to 91.6%.
Affiliation(s)
- Rakesh K Goyal
- Faculty of Pharmaceutical Sciences, Pt. B.D. Sharma University of Health Sciences, Rohtak, 124 001, India.
17
Classification of 5-HT1A receptor ligands on the basis of their binding affinities by using PSO-Adaboost-SVM. Int J Mol Sci 2009; 10:3316-3337. [PMID: 20111683 PMCID: PMC2812826 DOI: 10.3390/ijms10083316] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.5] [Received: 06/30/2009] [Revised: 07/20/2009] [Accepted: 07/22/2009] [Indexed: 12/31/2022]
Abstract
In the present work, the support vector machine (SVM) and Adaboost-SVM have been used to develop a classification model as a potential screening mechanism for a novel series of 5-HT1A selective ligands. Each compound is represented by calculated structural descriptors that encode topological features. The particle swarm optimization (PSO) and the stepwise multiple linear regression (Stepwise-MLR) methods have been used to search descriptor space and select the descriptors which are responsible for the inhibitory activity of these compounds. The model containing seven descriptors found by Adaboost-SVM, has showed better predictive capability than the other models. The total accuracy in prediction for the training and test set is 100.0% and 95.0% for PSO-Adaboost-SVM, 99.1% and 92.5% for PSO-SVM, 99.1% and 82.5% for Stepwise-MLR-Adaboost-SVM, 99.1% and 77.5% for Stepwise-MLR-SVM, respectively. The results indicate that Adaboost-SVM can be used as a useful modeling tool for QSAR studies.
18
Li S, Xi L, Wang C, Li J, Lei B, Liu H, Yao X. A novel method for protein-ligand binding affinity prediction and the related descriptors exploration. J Comput Chem 2009; 30:900-9. [DOI: 10.1002/jcc.21078] [Citation(s) in RCA: 22] [Impact Index Per Article: 1.5] [Indexed: 01/14/2023]
19
Clark RD. A ligand's-eye view of protein binding. J Comput Aided Mol Des 2008; 22:507-21. [PMID: 18217215 DOI: 10.1007/s10822-008-9177-8] [Citation(s) in RCA: 18] [Impact Index Per Article: 1.1] [Received: 11/15/2007] [Accepted: 01/09/2008] [Indexed: 11/24/2022]
Abstract
Docking tools created for structure-based design and virtual screening have also been used to automate ligand alignment for comparative molecular field analysis (CoMFA). Models based on such alignments have been compared with those obtained based solely on shared ligand substructures, but such comparisons have generally failed to distinguish between conformational specification (alignment in the internal coordinate space) and embedding in a shared external frame of reference (Cartesian alignment). Here, large sets of inhibitors were docked into two cyclooxygenase and two reverse transcriptase crystal structures, and the poses generated were evaluated in terms of the CoMFA models they produced. Realigning the conformers obtained by docking by rigid-body rotation and translation to overlay their common substructures improved model statistics and interpretability, provided the protein structure used for docking was reasonably appropriate to the ligands being considered.
Affiliation(s)
- Robert D Clark
- Tripos Informatics Research Center, 1699 South Hanley Road, Saint Louis, MO, 63144, USA.
20
Li H, Yap CW, Ung CY, Xue Y, Li ZR, Han LY, Lin HH, Chen YZ. Machine learning approaches for predicting compounds that interact with therapeutic and ADMET related proteins. J Pharm Sci 2007; 96:2838-60. [PMID: 17786989 DOI: 10.1002/jps.20985] [Citation(s) in RCA: 46] [Impact Index Per Article: 2.7] [Indexed: 02/06/2023]
Abstract
Computational methods for predicting compounds of specific pharmacodynamic and ADMET (absorption, distribution, metabolism, excretion and toxicity) property are useful for facilitating drug discovery and evaluation. Recently, machine learning methods such as neural networks and support vector machines have been explored for predicting inhibitors, antagonists, blockers, agonists, activators and substrates of proteins related to specific therapeutic and ADMET property. These methods are particularly useful for compounds of diverse structures to complement QSAR methods, and for cases of unavailable receptor 3D structure to complement structure-based methods. A number of studies have demonstrated the potential of these methods for predicting such compounds as substrates of P-glycoprotein and cytochrome P450 CYP isoenzymes, inhibitors of protein kinases and CYP isoenzymes, and agonists of serotonin receptor and estrogen receptor. This article is intended to review the strategies, current progresses and underlying difficulties in using machine learning methods for predicting these protein binders and as potential virtual screening tools. Algorithms for proper representation of the structural and physicochemical properties of compounds are also evaluated.
Affiliation(s)
- H Li
- Bioinformatics and Drug Design Group, Department of Pharmacy and Department of Computational Science, National University of Singapore, Blk S16, Level 8, 3 Science Drive 2, Singapore 117543, Singapore
21
Silakari P, Shrivastava SD, Silakari G, Kohli DV, Rambabu G, Srivastava S, Shrivastava SK, Silakari O. QSAR analysis of 1,3-diaryl-4,5,6,7-tetrahydro-2H-isoindole derivatives as selective COX-2 inhibitors. Eur J Med Chem 2007; 43:1559-69. [PMID: 18023931 DOI: 10.1016/j.ejmech.2007.09.028] [Citation(s) in RCA: 10] [Impact Index Per Article: 0.6] [Received: 03/07/2007] [Revised: 09/15/2007] [Accepted: 09/27/2007] [Indexed: 10/22/2022]
Abstract
Quantitative structure-activity relationship (QSAR) analysis was performed on a series of 1,3-diaryl-4,5,6,7-tetrahydro-2H-isoindole for their cyclooxygenase-2 (COX-2) inhibition. QSAR investigations were based on Hansch's extra thermodynamic multi-parameter approach and receptor surface analysis (RSA). QSAR investigations reveal that steric and electrostatic interactions are primarily responsible for COX-2 enzyme-ligand interaction. QSAR model derived from Hansch analysis demonstrated that COX-2 inhibitory activity is correlated with sum of atomic polarizability (Apol), number of hydrogen-bond donor groups (HBD), energy of the highest occupied molecular orbital (HOMO), desolvation free energy for water (F(H2O)) and fraction of areas of molecular shadow in the XY and ZX planes over area of enclosing rectangle (Sxyf and Sxzf) with r ranges 0.870-0.904. The best model was obtained from RSA model having r = 0.940 with good predictive ability (predicted compounds in training set and test set within ±1.0 unit of pIC50) and can be used in designing better selective COX-2 inhibitors among the congeners in future.
Affiliation(s)
- Pratigya Silakari
- Department of Chemistry, Dr HS Gour University, Madhya Pradesh, India
22
Liu H, Papa E, Walker JD, Gramatica P. In silico screening of estrogen-like chemicals based on different nonlinear classification models. J Mol Graph Model 2007; 26:135-44. [PMID: 17293141 DOI: 10.1016/j.jmgm.2007.01.003] [Citation(s) in RCA: 33] [Impact Index Per Article: 1.9] [Received: 10/13/2006] [Revised: 01/10/2007] [Accepted: 01/12/2007] [Indexed: 01/28/2023]
Abstract
Increasing concern is being shown by the scientific community, government regulators, and the public about endocrine-disrupting chemicals that are adversely affecting human and wildlife health through a variety of mechanisms. There is a great need for an effective means of rapidly assessing endocrine-disrupting activity, especially estrogen-simulating activity, because of the large number of such chemicals in the environment. In this study, quantitative structure activity relationship (QSAR) models were developed to quickly and effectively identify possible estrogen-like chemicals based on 232 structurally-diverse chemicals (training set) by using several nonlinear classification methodologies (least-square support vector machine (LS-SVM), counter-propagation artificial neural network (CP-ANN), and k nearest neighbour (kNN)) based on molecular structural descriptors. The models were externally validated by 87 chemicals (prediction set) not included in the training set. All three methods can give satisfactory prediction results both for training and prediction sets, and the most accurate model was obtained by the LS-SVM approach through the comparison of performance. In addition, our model was also applied to about 58,000 discrete organic chemicals; about 76% were predicted not to bind to Estrogen Receptor. The obtained results indicate that the proposed QSAR models are robust, widely applicable and could provide a feasible and practical tool for the rapid screening of potential estrogens.
Affiliation(s)
- Huanxiang Liu
- Department of Structural and Functional Biology, QSAR Research Unit in Environmental Chemistry and Ecotoxicology, University of Insubria, via Dunant 3, 21100 Varese, Italy
|
23
|
Pugazhenth D, Rajagopala S. Machine Learning Technique Approaches in Drug Discovery, Design and Development. Information Technology Journal 2007. [DOI: 10.3923/itj.2007.718.724] [Citation(s) in RCA: 15] [Impact Index Per Article: 0.9] [Indexed: 11/15/2022]
|
24
|
Li S, Yao X, Liu H, Li J, Fan B. Prediction of T-cell epitopes based on least squares support vector machines and amino acid properties. Anal Chim Acta 2007; 584:37-42. [PMID: 17386582 DOI: 10.1016/j.aca.2006.11.037] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.5] [Received: 09/08/2006] [Revised: 11/07/2006] [Accepted: 11/08/2006] [Indexed: 10/23/2022]
Abstract
T lymphocytes (T cells) are a very important component of the human immune system. A T cell possesses a receptor (TCR) that is specific for foreign epitopes, which take the form of short peptides bound to the major histocompatibility complex (MHC). When a T cell receives the message about the peptides bound to MHC, it activates the immune system, resulting in the disposal of the immunogen. The antigenic determinant recognized and bound by the T-cell receptor is known as a T-cell epitope. The accurate prediction of T-cell epitopes is crucial for vaccine development and clinical immunology. For the first time, we developed new models using least squares support vector machines (LSSVM) and amino acid properties for T-cell epitope prediction. A dataset of 203 short peptides (167 non-epitopes and 36 epitopes) was used as the input and was randomly divided into a training set and a test set. The models based on LSSVM and amino acid properties were evaluated using leave-one-out cross-validation and the predictive ability on the test set, obtaining areas under the ROC curve of 0.9875 and 0.9734, respectively. This result is more satisfactory than those reported before. In particular, the true-positive accuracy is markedly enhanced.
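The evaluation protocol mentioned in the abstract above (leave-one-out cross-validation scored by area under the ROC curve) can be sketched as follows. This is only an illustration under stated assumptions: a simple nearest-centroid scorer stands in for the LSSVM model, and the six two-dimensional points are hypothetical toy data, not peptide descriptors from the cited study.

```python
def auc(scores, labels):
    """Area under the ROC curve, computed as the fraction of
    (positive, negative) pairs ranked correctly; ties count as half."""
    pos = [s for s, l in zip(scores, labels) if l == 1]
    neg = [s for s, l in zip(scores, labels) if l == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def centroid(rows):
    """Component-wise mean of a list of points."""
    return [sum(col) / len(rows) for col in zip(*rows)]

def sq_dist(a, b):
    """Squared Euclidean distance."""
    return sum((u - v) ** 2 for u, v in zip(a, b))

def centroid_score(x, X, y):
    """Stand-in scorer: higher means closer to the positive-class centroid."""
    c_pos = centroid([r for r, l in zip(X, y) if l == 1])
    c_neg = centroid([r for r, l in zip(X, y) if l == 0])
    return sq_dist(x, c_neg) - sq_dist(x, c_pos)

def leave_one_out_scores(X, y):
    """Score each sample with a model built from all the other samples."""
    scores = []
    for i in range(len(X)):
        X_rest = X[:i] + X[i + 1:]
        y_rest = y[:i] + y[i + 1:]
        scores.append(centroid_score(X[i], X_rest, y_rest))
    return scores

# Hypothetical toy data: label 1 = "epitope", label 0 = "non-epitope".
X = [(0.0, 0.1), (0.2, 0.0), (0.1, 0.3), (0.9, 1.0), (1.0, 0.8), (0.8, 0.9)]
y = [0, 0, 0, 1, 1, 1]
loo_scores = leave_one_out_scores(X, y)
loo_auc = auc(loo_scores, y)
```

Because AUC depends only on how positives rank against negatives, it is less sensitive than raw accuracy to class imbalance such as the 167 non-epitopes versus 36 epitopes in the dataset described above.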
Affiliation(s)
- Shuyan Li
- Department of Chemistry, Lanzhou University, Lanzhou 730000, China
|
25
|
Maldonado AG, Doucet JP, Petitjean M, Fan BT. Molecular similarity and diversity in chemoinformatics: from theory to applications. Mol Divers 2006; 10:39-79. [PMID: 16404528 DOI: 10.1007/s11030-006-8697-1] [Citation(s) in RCA: 179] [Impact Index Per Article: 9.9] [Received: 12/17/2004] [Accepted: 06/14/2005] [Indexed: 01/04/2023]
Abstract
This review is dedicated to a survey on molecular similarity and diversity. Key findings reported in recent investigations are selectively highlighted and summarized. Although this overview is mainly centered on chemoinformatics, applications in other areas (pharmaceutical and medical chemistry, combinatorial chemistry, chemical database management, etc.) are also introduced. The approaches used to define and describe the concepts of molecular similarity and diversity in the context of chemoinformatics are discussed in the first part of this review. In the second and third parts, we introduce the descriptions and analyses of different methods and techniques. Finally, current applications and problems are enumerated and discussed in the last part.
Affiliation(s)
- Ana G Maldonado
- ITODYS, Université Paris 7--Denis Diderot, CNRS UMR-7086, 1 rue Guy de la Brosse, 75005, Paris, France
|
26
|
Advances in the Application of Machine Learning Techniques in Drug Discovery, Design and Development. Advances in Intelligent and Soft Computing 2006. [DOI: 10.1007/978-3-540-36266-1_10] [Citation(s) in RCA: 21] [Impact Index Per Article: 1.2] [Indexed: 12/15/2022]
|
27
|
Liu HX, Yao XJ, Zhang RS, Liu MC, Hu ZD, Fan BT. Prediction of the tissue/blood partition coefficients of organic compounds based on the molecular structure using least-squares support vector machines. J Comput Aided Mol Des 2005; 19:499-508. [PMID: 16317501 DOI: 10.1007/s10822-005-9003-5] [Citation(s) in RCA: 27] [Impact Index Per Article: 1.4] [Received: 03/27/2005] [Accepted: 07/06/2005] [Indexed: 11/29/2022]
Abstract
An accurate nonlinear model for predicting the tissue/blood partition coefficients (PC) of organic compounds in different tissues was developed for the first time based on least-squares support vector machines (LS-SVM), a novel machine learning technique, using the compounds' molecular descriptors calculated from the structure alone and the composition features of tissues. The heuristic method (HM) was used to select the appropriate molecular descriptors and build the linear model. The prediction result of the LS-SVM model is much better than that obtained by the HM method, and the tissue/blood partition coefficients predicted by the LS-SVM model are in good agreement with the experimental values. This shows that a nonlinear model can simulate the relationship between the structural descriptors, the tissue composition, and the tissue/blood partition coefficients more accurately, and that LS-SVM is a powerful and promising tool for predicting the tissue/blood partition behaviour of compounds. Furthermore, this paper provides a new and effective method for predicting the tissue/blood partition behaviour of compounds in different tissues from their structures and gives some insight into structural features related to the partition process of organic compounds in different tissues.
Affiliation(s)
- H X Liu
- Department of Chemistry, Lanzhou University, 730000, Lanzhou, China
|
28
|
Liu H, Yao X, Zhang R, Liu M, Hu Z, Fan B. Accurate Quantitative Structure−Property Relationship Model To Predict the Solubility of C60 in Various Solvents Based on a Novel Approach Using a Least-Squares Support Vector Machine. J Phys Chem B 2005; 109:20565-71. [PMID: 16853662 DOI: 10.1021/jp052223n] [Citation(s) in RCA: 108] [Impact Index Per Article: 5.7] [Indexed: 01/04/2023]
Abstract
A least-squares support vector machine (LSSVM) was used for the first time as a novel machine-learning technique for the prediction of the solubility of C60 in a large number of diverse solvents, using as inputs molecular descriptors calculated from the molecular structure alone with the software CODESSA. The heuristic method of CODESSA was used to select the correlated descriptors and build the linear model. Both the linear and the nonlinear models give very satisfactory prediction results: the squared correlation coefficient R² was 0.892 and 0.903, and the root-mean-square error was 0.126 and 0.116, respectively, for the whole data set. The prediction result of the LSSVM model is better than that obtained by the heuristic method and the reference, which shows that LSSVM is a useful tool for predicting the solubility of C60. In addition, this paper provides a new and effective method for predicting the solubility of C60 from its structures and gives some insight into the structural features related to the solubility of C60 in different solvents.
Affiliation(s)
- Huanxiang Liu
- Department of Chemistry, Lanzhou University, Lanzhou 730000, People's Republic of China
|
29
|
Li H, Yap CW, Ung CY, Xue Y, Cao ZW, Chen YZ. Effect of Selection of Molecular Descriptors on the Prediction of Blood−Brain Barrier Penetrating and Nonpenetrating Agents by Statistical Learning Methods. J Chem Inf Model 2005; 45:1376-84. [PMID: 16180914 DOI: 10.1021/ci050135u] [Citation(s) in RCA: 112] [Impact Index Per Article: 5.9] [Indexed: 01/04/2023]
Abstract
The ability or inability of a drug to penetrate into the brain is a key consideration in drug design. Drugs for treating central nervous system (CNS) disorders need to be able to penetrate the blood-brain barrier (BBB). BBB nonpenetration is desirable for non-CNS-targeting drugs to minimize potential CNS-related side effects. Computational methods have been employed for the prediction of BBB-penetrating (BBB+) and -nonpenetrating (BBB-) agents at impressive accuracies of 75-92% and 60-80%, respectively. However, the majority of these studies give a substantially lower BBB- accuracy, and thus overall accuracy, than the BBB+ accuracy. This work examined whether proper selection of molecular descriptors can improve both the BBB- and the overall accuracies of statistical learning methods. The methods tested include logistic regression, linear discriminant analysis, k nearest neighbor, C4.5 decision tree, probabilistic neural network, and support vector machine. Molecular descriptors were selected by using a feature selection method, recursive feature elimination (RFE). Results obtained with 415 BBB+ and BBB- agents show that RFE substantially improves both the BBB- and the overall accuracy for all of the methods studied. This suggests that statistical learning methods combined with proper feature selection are potentially useful for facilitating a more balanced and improved prediction of BBB+ and BBB- agents.
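Recursive feature elimination, the selection method named in the abstract above, can be sketched in a few lines: fit a model, discard the feature with the smallest absolute weight, and refit on what remains. The sketch below is an assumption-laden illustration rather than the study's procedure: it uses a small hand-written logistic regression as the ranking model, drops one feature per round, and runs on hypothetical three-feature toy data in which feature 0 determines the label, feature 1 is weakly related to it, and feature 2 is pure noise.

```python
import math

def train_logistic(X, y, epochs=200, lr=0.5):
    """Fit logistic regression by batch gradient descent; returns (weights, bias)."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            z = sum(wj * xj for wj, xj in zip(w, xi)) + b
            err = 1.0 / (1.0 + math.exp(-z)) - yi
            for j in range(d):
                grad_w[j] += err * xi[j]
            grad_b += err
        w = [wj - lr * g / n for wj, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b

def rfe(X, y, n_keep):
    """Recursive feature elimination: refit, drop the feature with the
    smallest absolute weight, and repeat until n_keep features remain."""
    active = list(range(len(X[0])))
    eliminated = []
    while len(active) > n_keep:
        X_sub = [[row[j] for j in active] for row in X]
        w, _ = train_logistic(X_sub, y)
        weakest = min(range(len(active)), key=lambda k: abs(w[k]))
        eliminated.append(active.pop(weakest))
    return active, eliminated

# Hypothetical toy data: (feature 0, feature 1, label), each row duplicated
# with feature 2 set to -1 and +1 so that feature 2 carries no signal.
base = [(-1.0, -0.5, 0), (-1.0, -0.5, 0), (-1.0, 0.5, 0),
        (1.0, -0.5, 1), (1.0, 0.5, 1), (1.0, 0.5, 1)]
X, y = [], []
for x0, x1, label in base:
    for x2 in (-1.0, 1.0):
        X.append([x0, x1, x2])
        y.append(label)

selected, eliminated = rfe(X, y, n_keep=1)
```

Ranking by absolute weight is only meaningful when features are on comparable scales, so real descriptor sets would normally be standardized first. Refitting after each removal is what distinguishes RFE from a one-shot ranking, since the remaining weights can shift once a correlated feature is gone.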
Affiliation(s)
- Hu Li
- Bioinformatics and Drug Design Group, Department of Computational Science, National University of Singapore, Blk SOC1, Level 7, 3 Science Drive 2, Singapore 117543
|
30
|
Park H, Lee S. Free energy perturbation approach to the critical assessment of selective cyclooxygenase-2 inhibitors. J Comput Aided Mol Des 2005; 19:17-31. [PMID: 16059664 DOI: 10.1007/s10822-005-0098-5] [Citation(s) in RCA: 14] [Impact Index Per Article: 0.7] [Received: 10/07/2004] [Accepted: 12/29/2004] [Indexed: 11/24/2022]
Abstract
The discovery of selective cyclooxygenase-2 (COX-2) inhibitors represents a major achievement of the efforts over the past few decades to develop therapeutic treatments for inflammation. To gain insights into designing new COX-2-selective inhibitors, we address the energetic and structural basis for the selective inhibition of COX isozymes by means of a combined computational protocol involving docking experiments, force field design for the heme prosthetic group, and free energy perturbation (FEP) simulation. We consider both COX-2- and COX-1-selective inhibitors, taking the V523I mutant of COX-2 to be a relevant structural model for COX-1, as confirmed by a variety of experimental and theoretical evidence. For all COX-2-selective inhibitors under consideration, we find that free energies of binding become less favorable as the receptor changes from COX-2 to COX-1, due to the weakening and/or loss of hydrogen bond and hydrophobic interactions that stabilize the inhibitors in the COX-2 active site. On the other hand, COX-1-selective oxicam inhibitors gain extra stabilization energy with the change of residue 523 from valine to isoleucine because of the formation of new hydrogen bonds in the enzyme-inhibitor complexes. The utility of the combined computational approach, as a valuable tool for in silico screening of COX-2-selective inhibitors, is further exemplified by identifying the physicochemical origins of the enantiospecific selective inhibition of COX-2 by alpha-substituted indomethacin ethanolamide inhibitors.
Affiliation(s)
- Hwangseo Park
- School of Chemistry and Molecular Engineering, and Center for Molecular Catalysis, Seoul National University, Seoul 151-747, South Korea.
|