1
Yu T, Nantasenamat C, Anuwongcharoen N, Piacham T. Machine Learning Approaches to Investigate the Structure-Activity Relationship of Angiotensin-Converting Enzyme Inhibitors. ACS OMEGA 2023; 8:43500-43510. [PMID: 38027387 PMCID: PMC10666249 DOI: 10.1021/acsomega.3c03225] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 05/16/2023] [Revised: 10/25/2023] [Accepted: 10/31/2023] [Indexed: 12/01/2023]
Abstract
Angiotensin-converting enzyme inhibitors (ACEIs) play a crucial role in treating conditions such as hypertension, heart failure, and kidney diseases. Nevertheless, the ACEIs currently on the market are linked to a variety of adverse effects, including renal insufficiency, which restricts their usage. There is thus an urgent need to optimize the currently available ACEIs. This study presents a structure-activity relationship investigation of ACEIs, employing machine learning to analyze data sets sourced from the ChEMBL database. Exploratory data analysis was performed to visualize the physicochemical properties of the compounds by investigating the distributions, patterns, and statistical significance among the different bioactivity groups. Scaffold analysis identified 9 representative Murcko scaffolds with frequencies ≥10. Scaffold diversity analysis revealed that active ACEIs had greater scaffold diversity than their intermediate and inactive counterparts, indicating the significance of performing lead optimization on scaffolds of active ACEIs. Scaffolds 1, 3, 6, and 8 are unfavorable in comparison with scaffolds 2, 4, 5, 7, and 9. QSAR investigation of the compiled data set of 549 compounds led to the selection of Mordred descriptors and the Random Forest algorithm as the best model, which afforded robust model performance (accuracy: 0.981, 0.77, and 0.745; MCC: 0.972, 0.658, and 0.617 for the training set, 10-fold cross-validation set, and testing set, respectively). To enhance the model's robustness and predictability, we reduced the chemical diversity of the input compounds by using only the compounds matching the 9 most prevalent Murcko scaffolds (168 in total), followed by a QSAR model investigation using Mordred descriptors and the extreme gradient boosting (XGBoost) algorithm (accuracy: 0.973, 0.849, and 0.823; MCC: 0.959, 0.786, and 0.742 for the training set, 10-fold cross-validation set, and testing set, respectively).
Further illustration of the structure-activity relationship using SALI plots has enabled the identification of clusters of compounds that create activity cliffs. These findings, as presented in this study, contribute to the advancement of drug discovery and the optimization of ACEIs.
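The SALI plots mentioned in this abstract rest on the Structure-Activity Landscape Index, commonly written as SALI(i, j) = |A_i - A_j| / (1 - sim(i, j)). The sketch below is only an illustration of that formula, not the authors' pipeline: the compound names, activities, and fingerprint bit sets are hypothetical toy values, and Tanimoto similarity on sets of on-bits is assumed as the similarity measure.

```python
from itertools import combinations


def tanimoto(fp_a, fp_b):
    """Tanimoto similarity between two fingerprints given as sets of on-bit indices."""
    union = len(fp_a | fp_b)
    return len(fp_a & fp_b) / union if union else 1.0


def sali(act_i, act_j, sim):
    """SALI = |A_i - A_j| / (1 - sim); infinite when fingerprints are identical but activities differ."""
    diff = abs(act_i - act_j)
    if sim >= 1.0:
        return float("inf") if diff > 0 else 0.0
    return diff / (1.0 - sim)


def sali_pairs(data):
    """Return (name_i, name_j, SALI) for every compound pair, highest SALI first."""
    pairs = []
    for (n1, (a1, f1)), (n2, (a2, f2)) in combinations(data.items(), 2):
        pairs.append((n1, n2, sali(a1, a2, tanimoto(f1, f2))))
    return sorted(pairs, key=lambda p: p[2], reverse=True)


# Hypothetical toy data: name -> (pIC50, set of fingerprint on-bits)
compounds = {
    "cpd_A": (8.0, {1, 2, 3, 4}),
    "cpd_B": (5.0, {1, 2, 3, 5}),
    "cpd_C": (7.5, {7, 8, 9}),
}

ranked = sali_pairs(compounds)
for name_i, name_j, value in ranked:
    print(f"{name_i} / {name_j}: SALI = {value:.2f}")
```

Pairs at the top of the ranking are structurally similar but differ strongly in activity, which is what an activity cliff looks like in this representation.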
Affiliation(s)
- Tianshi Yu: Center of Data Mining and Biomedical Informatics, Faculty of Medical Technology, Mahidol University, Bangkok 10700, Thailand
- Chanin Nantasenamat: Streamlit Open Source, Snowflake Inc., San Mateo, California 94402, United States
- Nuttapat Anuwongcharoen: Center of Data Mining and Biomedical Informatics, Faculty of Medical Technology, Mahidol University, Bangkok 10700, Thailand
- Theeraphon Piacham: Department of Clinical Microbiology and Applied Technology, Faculty of Medical Technology, Mahidol University, Bangkok 10700, Thailand
2
Yu T, Nantasenamat C, Kachenton S, Anuwongcharoen N, Piacham T. Cheminformatic Analysis and Machine Learning Modeling to Investigate Androgen Receptor Antagonists to Combat Prostate Cancer. ACS OMEGA 2023; 8:6729-6742. [PMID: 36844574 PMCID: PMC9948163 DOI: 10.1021/acsomega.2c07346] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Received: 11/16/2022] [Accepted: 02/01/2023] [Indexed: 06/18/2023]
Abstract
Prostate cancer (PCa) is a leading cause of cancer mortality among males. There have been numerous studies to develop antagonists against the androgen receptor (AR), a crucial therapeutic target for PCa. This study is a systematic cheminformatic analysis and machine learning modeling effort to study the chemical space, scaffolds, structure-activity relationship, and landscape of human AR antagonists. The final data set comprises 1678 molecules. Visualization of physicochemical properties showed that molecules from the potent/active class generally have a slightly lower molecular weight (MW), octanol-water partition coefficient (log P), number of hydrogen-bond acceptors (nHA), number of rotatable bonds (nRot), and topological polar surface area (TPSA) than molecules from the intermediate/inactive class. The chemical space visualization in the principal component analysis (PCA) plot shows substantially overlapping distributions between potent/active class molecules and intermediate/inactive class molecules; potent/active class molecules are densely clustered, while intermediate/inactive class molecules are widely and sparsely distributed. Murcko scaffold analysis has shown low scaffold diversity in general, and the scaffold diversity of potent/active class molecules is even lower than that of intermediate/inactive class molecules, indicating the necessity of developing molecules with novel scaffolds. Furthermore, scaffold visualization has identified 16 representative Murcko scaffolds. Among them, scaffolds 1, 2, 3, 4, 7, 8, 10, 11, 15, and 16 are highly favorable scaffolds due to their high scaffold enrichment factor values. Based on scaffold analysis, their local structure-activity relationships (SARs) were investigated and summarized. In addition, the global SAR landscape was explored by quantitative structure-activity relationship (QSAR) modeling and structure-activity landscape visualization.
A QSAR classification model incorporating all of the 1678 molecules stands out as the best model from a total of 12 candidate models for AR antagonists (built on PubChem fingerprint, extra trees algorithm, accuracy for training set: 0.935, 10-fold cross-validation set: 0.735 and test set: 0.756). Deeper insights into the structure-activity landscape highlighted a total of seven significant activity cliff (AC) generators (ChEMBL molecule IDs: 160257, 418198, 4082265, 348918, 390728, 4080698, and 6530), which provide valuable SAR information for medicinal chemistry. The findings in this study provide new insights and guidelines for hit identification and lead optimization for the development of novel AR antagonists.
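The abstract ranks scaffolds by their scaffold enrichment factor without defining it. One common definition, assumed here and not confirmed by the source, is the fraction of actives among molecules carrying a scaffold divided by the fraction of actives in the whole data set. A minimal sketch with hypothetical scaffold labels and activity classes:

```python
from collections import Counter


def scaffold_enrichment(records, active_label="active"):
    """Enrichment factor per scaffold.

    records: iterable of (scaffold_id, activity_class) pairs.
    EF(scaffold) = (actives with scaffold / molecules with scaffold)
                   / (actives overall / molecules overall)
    This definition is an assumption made for illustration.
    """
    records = list(records)
    total = len(records)
    n_active = sum(1 for _, cls in records if cls == active_label)
    if total == 0 or n_active == 0:
        raise ValueError("need at least one molecule and one active molecule")
    baseline = n_active / total
    per_scaffold = Counter(s for s, _ in records)
    active_per_scaffold = Counter(s for s, cls in records if cls == active_label)
    return {s: (active_per_scaffold[s] / n) / baseline for s, n in per_scaffold.items()}


# Hypothetical toy data: scaffold S1 is mostly active, S2 mostly inactive.
toy = (
    [("S1", "active")] * 3 + [("S1", "inactive")] * 1
    + [("S2", "active")] * 1 + [("S2", "inactive")] * 3
)

ef = scaffold_enrichment(toy)
for scaffold, value in sorted(ef.items()):
    print(f"{scaffold}: EF = {value:.2f}")
```

Under this definition, a value above 1 means the scaffold is over-represented among actives relative to the data set as a whole, which matches the sense in which the abstract calls high-EF scaffolds favorable.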
Affiliation(s)
- Tianshi Yu: Center of Data Mining and Biomedical Informatics, Faculty of Medical Technology, Mahidol University, Bangkok 10700, Thailand
- Chanin Nantasenamat: Streamlit Open Source, Snowflake Inc., San Mateo, California 94402, United States
- Supicha Kachenton: Department of Clinical Microbiology and Applied Technology, Faculty of Medical Technology, Mahidol University, Bangkok 10700, Thailand
- Nuttapat Anuwongcharoen: Center of Data Mining and Biomedical Informatics, Faculty of Medical Technology, Mahidol University, Bangkok 10700, Thailand
- Theeraphon Piacham: Department of Clinical Microbiology and Applied Technology, Faculty of Medical Technology, Mahidol University, Bangkok 10700, Thailand
3
Spiegel J, Senderowitz H. Towards an Enrichment Optimization Algorithm (EOA)-based Target Specific Docking Functions for Virtual Screening. Mol Inform 2022; 41:e2200034. [PMID: 35790469 PMCID: PMC9786651 DOI: 10.1002/minf.202200034] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/10/2022] [Accepted: 07/05/2022] [Indexed: 12/30/2022]
Abstract
Docking-based virtual screening (VS) is a common starting point in many drug discovery projects. While ligand-based approaches may sometimes provide better results, the advantage of docking lies in its ability to provide reliable ligand binding modes and approximate binding free energies, two factors that are important for hit selection and optimization. Most docking programs were developed to be as general as possible, and consequently their performance on specific targets may be sub-optimal. With this in mind, in this work we present a method for the development of target-specific scoring functions using our recently reported Enrichment Optimization Algorithm (EOA). EOA derives QSAR models in the form of multiple linear regression (MLR) equations by optimizing an enrichment-like metric. Since EOA requires target-specific active and inactive (or decoy) compounds, we retrieved such data for six targets from the DUD-E database and used them to re-derive the weights associated with the components that make up GOLD's ChemPLP scoring function, yielding target-specific, modified functions. We then used the original ChemPLP function in small-scale VS experiments on the six targets and subsequently rescored the resulting poses with the modified functions. In addition, we used the modified functions for compound re-docking. We found that in many, although not all, cases, either rescoring the original ChemPLP poses or repeating the entire docking process with the modified functions yielded better results in terms of AUC and EF1%, two metrics commonly used for evaluating VS performance. While work on additional data sets and docking tools is clearly required, we propose that the results obtained thus far hint at the potential benefits of using EOA-based optimization for the derivation of target-specific functions in the context of virtual screening. To this end, we discuss the downsides of the method and how it could be improved.
Affiliation(s)
- Jacob Spiegel: Department of Chemistry, Bar-Ilan University, Ramat-Gan 5290002, Israel
4
Spiegel J, Senderowitz H. A Comparison between Enrichment Optimization Algorithm (EOA)-Based and Docking-Based Virtual Screening. Int J Mol Sci 2021; 23:43. [PMID: 35008467 PMCID: PMC8744642 DOI: 10.3390/ijms23010043] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 11/24/2021] [Revised: 12/18/2021] [Accepted: 12/19/2021] [Indexed: 12/30/2022]
Abstract
Virtual screening (VS) is a well-established method in the initial stages of many drug and material design projects. VS is typically performed using structure-based approaches such as molecular docking, or various ligand-based approaches. Most docking tools were designed to be as global as possible, and consequently only require knowledge of the 3D structure of the biotarget. In contrast, many ligand-based approaches (e.g., 3D-QSAR and pharmacophore) require prior development of project-specific predictive models. Depending on the type of model (e.g., classification or regression), predictive ability is typically evaluated using metrics of performance on either the training set (e.g., $Q^2_{CV}$) or the test set (e.g., specificity, selectivity, or $Q^2_{F1}$/$Q^2_{F2}$/$Q^2_{F3}$). However, none of these metrics were developed with VS in mind, and consequently their ability to reliably assess the performance of a model in the context of VS is at best limited. With this in mind, we have recently reported the development of the enrichment optimization algorithm (EOA). EOA derives QSAR models in the form of multiple linear regression (MLR) equations for VS by optimizing an enrichment-based metric in the space of the descriptors. Here we present an improved version of the algorithm which better handles active compounds and which also takes into account information on inactive (either known inactive or decoy) compounds. We compared the improved EOA in small-scale VS experiments with three common docking tools, namely, Glide-SP, GOLD, and AutoDock Vina, employing five molecular targets (acetylcholinesterase, human immunodeficiency virus type 1 protease, MAP kinase p38 alpha, urokinase-type plasminogen activator, and trypsin I). We found that EOA consistently outperformed all docking tools in terms of the area under the ROC curve (AUC) and EF1% metrics, which measure the overall and initial success of the VS process, respectively.
This was the case when the docking metrics were calculated based on a consensus approach and when they were calculated based on two different sets of single crystal structures. Finally, we propose that EOA could be combined with molecular docking to derive target-specific scoring functions.
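The two evaluation metrics named in this abstract, AUC and EF1%, can be computed directly from a ranked list of screened compounds. The sketch below is a generic illustration, not the authors' code. It assumes that higher scores mean "more likely active" (docking scores often use the opposite sign convention and would need negating), and the ranked toy library is hypothetical.

```python
import math


def roc_auc(scores, labels):
    """AUC as the probability that a random active outscores a random inactive (ties count 0.5)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one active and one inactive compound")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def enrichment_factor(scores, labels, fraction=0.01):
    """EF at a given fraction: hit rate in the top-ranked fraction divided by the overall hit rate."""
    n = len(scores)
    n_top = max(1, math.ceil(fraction * n))
    ranked = sorted(zip(scores, labels), key=lambda t: t[0], reverse=True)
    hits_top = sum(y for _, y in ranked[:n_top])
    return (hits_top / n_top) / (sum(labels) / n)


# Hypothetical library of 200 compounds, already scored; actives sit at ranks 1, 2, 51, and 151.
n_compounds = 200
active_positions = {0, 1, 50, 150}
scores = [n_compounds - i for i in range(n_compounds)]          # strictly decreasing scores
labels = [1 if i in active_positions else 0 for i in range(n_compounds)]

auc = roc_auc(scores, labels)
ef1 = enrichment_factor(scores, labels, fraction=0.01)
print(f"AUC = {auc:.3f}, EF1% = {ef1:.1f}")
```

AUC summarizes the whole ranking, whereas EF1% only looks at the top 1% of the list, which is why the abstract describes them as measuring overall and initial success, respectively.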
Affiliation(s)
- Hanoch Senderowitz: Department of Chemistry, Bar-Ilan University, Ramat-Gan 5290002, Israel
5
Deep Learning Approach for Discovery of In Silico Drugs for Combating COVID-19. JOURNAL OF HEALTHCARE ENGINEERING 2021; 2021:6668985. [PMID: 34326978 PMCID: PMC8302400 DOI: 10.1155/2021/6668985] [Citation(s) in RCA: 12] [Impact Index Per Article: 4.0] [Received: 12/30/2020] [Accepted: 07/08/2021] [Indexed: 12/26/2022]
Abstract
Early diagnosis of pandemic diseases such as COVID-19 can prove beneficial in dealing with difficult situations and helping radiologists and other experts manage staffing more effectively. The application of deep learning techniques for genetics, microscopy, and drug discovery has created a global impact. It can enhance and speed up the process of medical research and development of vaccines, which is required for pandemics such as COVID-19. However, current drugs such as remdesivir and clinical trials of other chemical compounds have not shown many impressive results. Therefore, it can take more time to provide effective treatment or drugs. In this paper, a deep learning approach based on logistic regression, SVM, Random Forest, and QSAR modeling is suggested. QSAR modeling is done to find the drug targets with protein interaction along with the calculation of binding affinities. Then deep learning models were used for training the molecular descriptor dataset for the robust discovery of drugs and feature extraction for combating COVID-19. Results have shown more significant binding affinities (greater than −18) for many molecules that can be used to block the multiplication of SARS-CoV-2, responsible for COVID-19.
6
Pinacho-Castellanos SA, García-Jacas CR, Gilson MK, Brizuela CA. Alignment-Free Antimicrobial Peptide Predictors: Improving Performance by a Thorough Analysis of the Largest Available Data Set. J Chem Inf Model 2021; 61:3141-3157. [PMID: 34081438 DOI: 10.1021/acs.jcim.1c00251] [Citation(s) in RCA: 12] [Impact Index Per Article: 4.0] [Indexed: 01/13/2023]
Abstract
In the last two decades, a large number of machine-learning-based predictors for the activities of antimicrobial peptides (AMPs) have been proposed. These predictors differ from one another in the learning method and in the training and testing data sets used. Unfortunately, the training data sets present several drawbacks, such as low representativeness of the experimentally validated AMP space and duplicated peptide sequences between negative and positive data sets. These limitations lower the confidence with which most of these approaches can be used in prospective studies. To address these weaknesses, we propose novel modeling and assessing data sets from the largest experimentally validated nonredundant peptide data set reported to date. From these novel data sets, alignment-free quantitative sequence-activity models (AF-QSAMs) based on Random Forest are created to identify general AMPs and their antibacterial, antifungal, antiparasitic, and antiviral functional types. An applicability domain analysis is carried out to determine the reliability of the predictions obtained, which, to the best of our knowledge, is performed for the first time for AMP recognition. A benchmark is carried out between the models proposed and several models from the literature that are freely available in 13 programs (ClassAMP, iAMP-2L, ADAM, MLAMP, AMPScanner v2.0, AntiFP, AMPfun, PEPred-suite, AxPEP, CAMPR3, iAMPpred, APIN, and Meta-iAVP). The models proposed are those with the best performance in all of the endpoints modeled, while most of the methods from the literature have weak-to-random predictive agreements. The models proposed are also assessed through Y-scrambling and repeated k-fold cross-validation tests, demonstrating that the outcomes obtained by them are not given by chance. Three chemometric analyses also confirmed the relevance of the peptide descriptors used in the modeling.
Therefore, it can be concluded that the models built by fixing the drawbacks existing in the literature contribute to identifying antibacterial, antifungal, antiparasitic, and antiviral peptides with high effectivity and reliability. Models are freely available via the AMPDiscover tool at https://biocom-ampdiscover.cicese.mx/.
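The Y-scrambling test mentioned in this abstract checks whether a model's performance could arise by chance: the labels are randomly permuted, the model is re-evaluated, and the scrambled scores are compared with the score on the true labels. The sketch below illustrates only that idea. It swaps in a leave-one-out 1-nearest-neighbour classifier as a small, dependency-free stand-in for the Random Forest models of the paper, and the descriptor vectors and labels are hypothetical.

```python
import random


def loo_accuracy_1nn(X, y):
    """Leave-one-out accuracy of a 1-nearest-neighbour classifier (squared Euclidean distance)."""
    correct = 0
    for i, xi in enumerate(X):
        nearest = min(
            (j for j in range(len(X)) if j != i),
            key=lambda j: sum((a - b) ** 2 for a, b in zip(xi, X[j])),
        )
        correct += y[nearest] == y[i]
    return correct / len(X)


def y_scrambling(X, y, n_rounds=20, seed=0):
    """Return (accuracy with true labels, list of accuracies with permuted labels)."""
    rng = random.Random(seed)
    real = loo_accuracy_1nn(X, y)
    scrambled = []
    for _ in range(n_rounds):
        y_perm = list(y)
        rng.shuffle(y_perm)          # break the descriptor-label relationship
        scrambled.append(loo_accuracy_1nn(X, y_perm))
    return real, scrambled


# Hypothetical descriptors: two well-separated clusters, one per class.
X = [[0.0, 0.1], [0.1, 0.0], [0.2, 0.1], [0.1, 0.2],
     [5.0, 5.1], [5.1, 5.0], [5.2, 5.1], [5.1, 5.2]]
y = [0, 0, 0, 0, 1, 1, 1, 1]

real_acc, scrambled_accs = y_scrambling(X, y)
print(f"true labels: {real_acc:.2f}")
print(f"scrambled mean: {sum(scrambled_accs) / len(scrambled_accs):.2f}")
```

A model that has learned a genuine descriptor-activity relationship should score clearly better on the true labels than on the scrambled ones.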
Affiliation(s)
- Sergio A Pinacho-Castellanos: Departamento de Ciencias de la Computación, Centro de Investigación Científica y de Educación Superior de Ensenada (CICESE), 22860 Ensenada, Baja California, México; Centro de Investigación y Desarrollo de Tecnología Digital (CITEDI), Instituto Politécnico Nacional (IPN), 22435 Tijuana, Baja California, México
- César R García-Jacas: Cátedras CONACYT-Departamento de Ciencias de la Computación, Centro de Investigación Científica y de Educación Superior de Ensenada (CICESE), 22860 Ensenada, Baja California, México
- Michael K Gilson: Skaggs School of Pharmacy and Pharmaceutical Sciences, University of California, San Diego, La Jolla, California 92093, United States
- Carlos A Brizuela: Departamento de Ciencias de la Computación, Centro de Investigación Científica y de Educación Superior de Ensenada (CICESE), 22860 Ensenada, Baja California, México
7
Tarasova O, Poroikov V. Machine Learning in Discovery of New Antivirals and Optimization of Viral Infections Therapy. Curr Med Chem 2021; 28:7840-7861. [PMID: 33949929 DOI: 10.2174/0929867328666210504114351] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 11/10/2020] [Revised: 02/13/2021] [Accepted: 02/24/2021] [Indexed: 11/22/2022]
Abstract
Computational approaches now play an important role in the design of new drug-like compounds and the optimization of pharmacotherapeutic treatment of diseases. The growth of viral infections, including those caused by the Human Immunodeficiency Virus (HIV), Ebola virus, the recently detected coronavirus, and some others, leads to many newly infected people with a high risk of death or severe complications. A huge amount of chemical, biological, and clinical data is at the disposal of researchers. Therefore, there are many opportunities to use machine learning approaches to find relationships between particular features of chemical data and the antiviral activity of biologically active compounds. Biological and clinical data can also be used for building models to predict relationships between viral genotype and drug resistance, which might help determine the clinical outcome of treatment. In the current study, we consider machine-learning approaches in the antiviral research carried out during the past decade. We review in detail the application of machine-learning methods for the design of new potential antiviral agents and vaccines, drug resistance prediction, and analysis of virus-host interactions. Our review also covers the prospects of using machine-learning approaches for antiviral research, including Dengue, Ebola viruses, Influenza A, Human Immunodeficiency Virus, coronaviruses, and some others.
Affiliation(s)
- Olga Tarasova: Department of Bioinformatics, Institute of Biomedical Chemistry, Moscow, Russian Federation
- Vladimir Poroikov: Department of Bioinformatics, Institute of Biomedical Chemistry, Moscow, Russian Federation