1
Mansouri K, Taylor K, Auerbach S, Ferguson S, Frawley R, Hsieh JH, Jahnke G, Kleinstreuer N, Mehta S, Moreira-Filho JT, Parham F, Rider C, Rooney AA, Wang A, Sutherland V. Unlocking the Potential of Clustering and Classification Approaches: Navigating Supervised and Unsupervised Chemical Similarity. Environmental Health Perspectives 2024; 132:85002. [PMID: 39106156 DOI: 10.1289/ehp14001] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 08/09/2024]
Abstract
BACKGROUND: The field of toxicology has witnessed substantial advancements in recent years, particularly with the adoption of new approach methodologies (NAMs) to understand and predict chemical toxicity. Class-based methods such as clustering and classification are key to NAMs development and application, aiding the understanding of hazard and risk concerns associated with groups of chemicals without additional laboratory work. Advances in computational chemistry, data generation and availability, and machine learning algorithms represent important opportunities for continued improvement of these techniques to optimize their utility for specific regulatory and research purposes. However, due to their intricacy, deep understanding and careful selection are imperative to align the adequate methods with their intended applications.
OBJECTIVES: This commentary aims to deepen the understanding of class-based approaches by elucidating the pivotal role of chemical similarity (structural and biological) in clustering and classification approaches (CCAs). It addresses the dichotomy between general end point-agnostic similarity, often entailing unsupervised analysis, and end point-specific similarity necessitating supervised learning. The goal is to highlight the nuances of these approaches, their applications, and common misuses.
DISCUSSION: Understanding similarity is pivotal in toxicological research involving CCAs. The effectiveness of these approaches depends on the right definition and measure of similarity, which varies based on context and objectives of the study. This choice is influenced by how chemical structures are represented and the respective labels indicating biological activity, if applicable. The distinction between unsupervised clustering and supervised classification methods is vital, requiring the use of end point-agnostic vs. end point-specific similarity definition. Separate use or combination of these methods requires careful consideration to prevent bias and ensure relevance for the goal of the study. Unsupervised methods use end point-agnostic similarity measures to uncover general structural patterns and relationships, aiding hypothesis generation and facilitating exploration of datasets without the need for predefined labels or explicit guidance. Conversely, supervised techniques demand end point-specific similarity to group chemicals into predefined classes or to train classification models, allowing accurate predictions for new chemicals. Misuse can arise when unsupervised methods are applied to end point-specific contexts, like analog selection in read-across, leading to erroneous conclusions. This commentary provides insights into the significance of similarity and its role in supervised classification and unsupervised clustering approaches. https://doi.org/10.1289/EHP14001.
Affiliation(s)
- Kamel Mansouri
  - Division of Translational Toxicology, National Institute of Environmental Health Sciences, Research Triangle Park, North Carolina, USA
- Kyla Taylor
  - Division of Translational Toxicology, National Institute of Environmental Health Sciences, Research Triangle Park, North Carolina, USA
- Scott Auerbach
  - Division of Translational Toxicology, National Institute of Environmental Health Sciences, Research Triangle Park, North Carolina, USA
- Stephen Ferguson
  - Division of Translational Toxicology, National Institute of Environmental Health Sciences, Research Triangle Park, North Carolina, USA
- Rachel Frawley
  - Division of Translational Toxicology, National Institute of Environmental Health Sciences, Research Triangle Park, North Carolina, USA
- Jui-Hua Hsieh
  - Division of Translational Toxicology, National Institute of Environmental Health Sciences, Research Triangle Park, North Carolina, USA
- Gloria Jahnke
  - Division of Translational Toxicology, National Institute of Environmental Health Sciences, Research Triangle Park, North Carolina, USA
- Nicole Kleinstreuer
  - Division of Translational Toxicology, National Institute of Environmental Health Sciences, Research Triangle Park, North Carolina, USA
- Suril Mehta
  - Division of Translational Toxicology, National Institute of Environmental Health Sciences, Research Triangle Park, North Carolina, USA
- José T Moreira-Filho
  - Division of Translational Toxicology, National Institute of Environmental Health Sciences, Research Triangle Park, North Carolina, USA
- Fred Parham
  - Division of Translational Toxicology, National Institute of Environmental Health Sciences, Research Triangle Park, North Carolina, USA
- Cynthia Rider
  - Division of Translational Toxicology, National Institute of Environmental Health Sciences, Research Triangle Park, North Carolina, USA
- Andrew A Rooney
  - Division of Translational Toxicology, National Institute of Environmental Health Sciences, Research Triangle Park, North Carolina, USA
- Amy Wang
  - Division of Translational Toxicology, National Institute of Environmental Health Sciences, Research Triangle Park, North Carolina, USA
- Vicki Sutherland
  - Division of Translational Toxicology, National Institute of Environmental Health Sciences, Research Triangle Park, North Carolina, USA
2
Zięba A, Bartuzi D, Stępnicki P, Matosiuk D, Wróbel TM, Laitinen T, Castro M, Kaczor AA. Discovery and in vitro Evaluation of Novel Serotonin 5-HT2A Receptor Ligands Identified Through Virtual Screening. ChemMedChem 2024; 19:e202400080. [PMID: 38619283 DOI: 10.1002/cmdc.202400080] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/26/2024] [Revised: 04/12/2024] [Accepted: 04/12/2024] [Indexed: 04/16/2024]
Abstract
The 5-HT2A receptor is a molecular target of high pharmacological importance. Ligands of this protein, particularly atypical antipsychotics, are useful in the treatment of numerous mental disorders, including schizophrenia and major depressive disorder. Structure-based virtual screening using a 5-HT2A receptor complex was performed to identify novel ligands for the 5-HT2A receptor, serving as potential antidepressants. From the Enamine screening library, containing over 4 million compounds, 48 molecules were selected for subsequent experimental validation. These compounds were tested against the 5-HT2A receptor in radioligand binding assays. From the tested batch, six molecules were identified as ligands of the main molecular target and were forwarded to a more detailed in vitro profiling. This included radioligand binding assays at 5-HT1A, 5-HT7, and D2 receptors and functional studies at 5-HT2A receptors. These compounds were confirmed to show a binding affinity for at least one of the targets tested in vitro. The success rate for the inactive template-based screening reached 17%, while it was 9% for the active template-based screening. Similarity and fragment analysis indicated the structural novelty of the identified compounds. Pharmacokinetics for these molecules was determined using in silico approaches.
Affiliation(s)
- Agata Zięba
  - Department of Synthesis and Chemical Technology of Pharmaceutical Substances with Computer Modelling Lab, Faculty of Pharmacy, Medical University of Lublin, 4A Chodźki St., 20059, Lublin, Poland
- Damian Bartuzi
  - Department of Synthesis and Chemical Technology of Pharmaceutical Substances with Computer Modelling Lab, Faculty of Pharmacy, Medical University of Lublin, 4A Chodźki St., 20059, Lublin, Poland
  - Science for Life Laboratory, Department of Cell and Molecular Biology, Uppsala University, 75124, Uppsala, Sweden
- Piotr Stępnicki
  - Department of Synthesis and Chemical Technology of Pharmaceutical Substances with Computer Modelling Lab, Faculty of Pharmacy, Medical University of Lublin, 4A Chodźki St., 20059, Lublin, Poland
- Dariusz Matosiuk
  - Department of Synthesis and Chemical Technology of Pharmaceutical Substances with Computer Modelling Lab, Faculty of Pharmacy, Medical University of Lublin, 4A Chodźki St., 20059, Lublin, Poland
- Tomasz M Wróbel
  - Department of Synthesis and Chemical Technology of Pharmaceutical Substances with Computer Modelling Lab, Faculty of Pharmacy, Medical University of Lublin, 4A Chodźki St., 20059, Lublin, Poland
- Tuomo Laitinen
  - School of Pharmacy, University of Eastern Finland, Yliopistonranta 1, P.O. Box 1627, 70211, Kuopio, Finland
- Marián Castro
  - Department of Pharmacology, Universidade de Santiago de Compostela, Center for Research in Molecular Medicine and Chronic Diseases (CIMUS), Avda. de Barcelona, 15782, Santiago de Compostela, Spain
  - Instituto de Investigación Sanitaria de Santiago de Compostela (IDIS), Travesía da Choupana s/n, E-15706, Santiago de Compostela, Spain
- Agnieszka A Kaczor
  - Department of Synthesis and Chemical Technology of Pharmaceutical Substances with Computer Modelling Lab, Faculty of Pharmacy, Medical University of Lublin, 4A Chodźki St., 20059, Lublin, Poland
  - School of Pharmacy, University of Eastern Finland, Yliopistonranta 1, P.O. Box 1627, 70211, Kuopio, Finland
3
Croy A. From Local Atomic Environments to Molecular Information Entropy. ACS Omega 2024; 9:20616-20622. [PMID: 38737089 PMCID: PMC11080039 DOI: 10.1021/acsomega.4c02770] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/22/2024] [Revised: 04/01/2024] [Accepted: 04/05/2024] [Indexed: 05/14/2024]
Abstract
The similarity of local atomic environments is an important concept in many machine learning techniques, which find applications in computational chemistry and material science. Here, we present and discuss a connection between the information entropy and the similarity matrix of a molecule. The resulting entropy can be used as a measure of the complexity of a molecule. Exemplarily, we introduce and evaluate two specific choices for defining the similarity: one is based on a SMILES representation of local substructures, and the other is based on the SOAP kernel. By tuning the sensitivity of the latter, we can achieve good agreement between the respective entropies. Finally, we consider the entropy of two molecules in a mixture. The gain of entropy due to the mixing can be used as a similarity measure of the molecules. We compare this measure to the average and best-match kernel. The results indicate a connection between the different approaches and demonstrate the usefulness and broad applicability of the similarity-based entropy approach.
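As a rough illustration of how an entropy can be derived from a similarity matrix, the sketch below uses a similarity-sensitive entropy of the Leinster-Cobbold form, H = -Σ_i p_i log(Σ_j K_ij p_j), with uniform weights over atoms. Whether this matches the exact definition used in the paper is an assumption; the function name `similarity_entropy` and the toy matrices are hypothetical and chosen only for illustration.

```python
import math

def similarity_entropy(K, p=None):
    """Similarity-sensitive entropy for a similarity matrix K (values in [0, 1]).

    Assumption: uniform weights p_i = 1/n unless given explicitly.
    """
    n = len(K)
    if p is None:
        p = [1.0 / n] * n
    entropy = 0.0
    for i in range(n):
        # Weighted similarity of environment i to all environments.
        ordinariness = sum(K[i][j] * p[j] for j in range(n))
        entropy -= p[i] * math.log(ordinariness)
    return entropy

# Four mutually dissimilar environments: reduces to the Shannon entropy log(4).
identity = [[1.0 if i == j else 0.0 for j in range(4)] for i in range(4)]
# Four identical environments: no diversity, so the entropy vanishes.
all_same = [[1.0] * 4 for _ in range(4)]

h_distinct = similarity_entropy(identity)
h_same = similarity_entropy(all_same)
```

In this form, a molecule whose atoms all sit in similar environments scores low and a molecule with many distinct environments scores high, which is the sense in which the abstract describes entropy as a complexity measure.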
Affiliation(s)
- Alexander Croy
  - Institute of Physical Chemistry, Friedrich Schiller University Jena, 07737 Jena, Germany
4
Kirchoff KE, Wellnitz J, Hochuli JE, Maxfield T, Popov KI, Gomez S, Tropsha A. Utilizing Low-Dimensional Molecular Embeddings for Rapid Chemical Similarity Search. ADVANCES IN INFORMATION RETRIEVAL : ... EUROPEAN CONFERENCE ON IR RESEARCH, ECIR ... PROCEEDINGS. EUROPEAN CONFERENCE ON IR RESEARCH 2024; 14609:34-49. [PMID: 38585224 PMCID: PMC10998712 DOI: 10.1007/978-3-031-56060-6_3] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 04/09/2024]
Abstract
Nearest neighbor-based similarity searching is a common task in chemistry, with notable use cases in drug discovery. Yet, some of the most commonly used approaches for this task still leverage a brute-force approach. In practice this can be computationally costly and overly time-consuming, due in part to the sheer size of modern chemical databases. Previous computational advancements for this task have generally relied on improvements to hardware or dataset-specific tricks that lack generalizability. Approaches that leverage lower-complexity searching algorithms remain relatively underexplored. However, many of these algorithms are approximate solutions and/or struggle with typical high-dimensional chemical embeddings. Here we evaluate whether a combination of low-dimensional chemical embeddings and a k-d tree data structure can achieve fast nearest neighbor queries while maintaining performance on standard chemical similarity search benchmarks. We examine different dimensionality reductions of standard chemical embeddings as well as a learned, structurally-aware embedding (SmallSA) for this task. With this framework, searches on over one billion chemicals execute in less than a second on a single CPU core, five orders of magnitude faster than the brute-force approach. We also demonstrate that SmallSA achieves competitive performance on chemical similarity benchmarks.
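The idea of replacing a brute-force scan with a k-d tree over low-dimensional embeddings can be sketched in a few lines of plain Python. This is a minimal illustration, not the authors' implementation: the random 3-dimensional vectors stand in for learned chemical embeddings, and the function names are hypothetical.

```python
import math
import random

def build_kdtree(points, depth=0):
    """Recursively build a k-d tree; each node is (point, left, right)."""
    if not points:
        return None
    axis = depth % len(points[0])
    points = sorted(points, key=lambda p: p[axis])
    mid = len(points) // 2
    return (points[mid],
            build_kdtree(points[:mid], depth + 1),
            build_kdtree(points[mid + 1:], depth + 1))

def nearest(node, query, depth=0, best=None):
    """Return (distance, point) for the exact nearest neighbour of query."""
    if node is None:
        return best
    point, left, right = node
    dist = math.dist(point, query)
    if best is None or dist < best[0]:
        best = (dist, point)
    axis = depth % len(query)
    diff = query[axis] - point[axis]
    near, far = (left, right) if diff < 0 else (right, left)
    best = nearest(near, query, depth + 1, best)
    # Visit the far side only if the splitting plane is closer than the best hit.
    if abs(diff) < best[0]:
        best = nearest(far, query, depth + 1, best)
    return best

def brute_force_nearest(points, query):
    """Reference answer: scan every point."""
    return min((math.dist(p, query), p) for p in points)

random.seed(0)
# Placeholder data: 500 random 3-D vectors in place of real embeddings.
embeddings = [tuple(random.random() for _ in range(3)) for _ in range(500)]
tree = build_kdtree(embeddings)
query = (0.5, 0.5, 0.5)
kd_hit = nearest(tree, query)
bf_hit = brute_force_nearest(embeddings, query)
```

The pruning step is what makes low dimensionality matter: in high-dimensional spaces the splitting plane is almost always closer than the current best hit, so few branches are skipped and the tree degrades toward a full scan.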
Affiliation(s)
- Shawn Gomez
  - Department of Pharmacology, UNC Chapel Hill
  - Joint Department of Biomedical Engineering at UNC Chapel Hill and NCSU
5
Liu T, Zhang B, Gao Y, Zhang X, Tong J, Li Z. Identification of ACHE as the hub gene targeting solasonine associated with non-small cell lung cancer (NSCLC) using integrated bioinformatics analysis. PeerJ 2023; 11:e16195. [PMID: 37842037 PMCID: PMC10573390 DOI: 10.7717/peerj.16195] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/12/2023] [Accepted: 09/06/2023] [Indexed: 10/17/2023] [Open Access]
Abstract
Background: Solasonine, as a major biological component of Solanum nigrum L., has demonstrated anticancer effects against several malignancies. However, little is understood regarding its biological target and mechanism in non-small cell lung cancer (NSCLC).
Methods: We conducted an analysis on transcriptomic data to identify differentially expressed genes (DEGs), and employed an artificial intelligence (AI) strategy to predict the target protein for solasonine. Subsequently, genetic dependency analysis and molecular docking were performed, with Acetylcholinesterase (ACHE) selected as a pivotal marker for solasonine. We then employed a range of bioinformatic approaches to explore the relationship between ACHE and solasonine. Furthermore, we investigated the impact of solasonine on A549 cells, a human lung cancer cell line. Cell inhibition of A549 cells following solasonine treatment was analyzed using the CCK8 assay. Additionally, we assessed the protein expression of ACHE, as well as markers associated with apoptosis and inflammation, using western blotting. To investigate their functions, we employed a plasmid-based ACHE overexpression system. Finally, we performed dynamics simulations to simulate the interaction mode between solasonine and ACHE.
Results: The results of the genetic dependency analysis revealed that ACHE could be identified as the pivotal target with the highest docking affinity. The cell experiments yielded significant findings, as evidenced by the negative regulatory effect of solasonine treatment on tumor cells, as demonstrated by the CCK8 assay. Western blotting analysis revealed that solasonine treatment resulted in the downregulation of the Bcl-2/Bax ratio and upregulation of cleaved caspase-3 protein expression levels. Moreover, we observed that ACHE overexpression promoted the expression of the Bcl-2/Bax ratio and decreased cleaved caspase-3 expression in the OE-ACHE group. Notably, solasonine treatment rescued the Bcl-2/Bax ratio and cleaved caspase-3 expression in OE-ACHE cells compared to OE-ACHE cells without solasonine treatment, suggesting that solasonine induces apoptosis. Besides, solasonine exhibited its anti-inflammatory effects by inhibiting P38 MAPK. This was supported by the decline in protein levels of IL-1β and TNF-α, as well as the phosphorylated forms of JNK and P38 MAPK. The results from the molecular docking and dynamics simulations further confirmed the potent binding affinity and effective inhibitory action between solasonine and ACHE.
Conclusions: The findings of the current investigation show that solasonine exerts its pro-apoptosis and anti-inflammatory effects by suppressing the expression of ACHE.
Affiliation(s)
- Tong Liu
  - Anhui University of Chinese Medicine, Hefei, Anhui, China
  - Key Laboratory of Xin’An Medicine, Ministry of Education, Hefei, Anhui, China
- Boke Zhang
  - The First Affiliated Hospital of Anhui University of Chinese Medicine, Hefei, Anhui, China
- Yating Gao
  - The First Affiliated Hospital of Anhui University of Chinese Medicine, Hefei, Anhui, China
- Xingxing Zhang
  - The First Affiliated Hospital of Anhui University of Chinese Medicine, Hefei, Anhui, China
- Jiabing Tong
  - Anhui University of Chinese Medicine, Hefei, Anhui, China
  - The First Affiliated Hospital of Anhui University of Chinese Medicine, Hefei, Anhui, China
  - Key Laboratory of Anhui Provincial Department of Education, Hefei, Anhui, China
  - Center for Xin’an Medicine and Modernization of Traditional Chinese Medicine, Institute of Health and Medicine, Hefei Comprehensive National Science Center, Hefei, Anhui, China
- Zegeng Li
  - Anhui University of Chinese Medicine, Hefei, Anhui, China
  - The First Affiliated Hospital of Anhui University of Chinese Medicine, Hefei, Anhui, China
  - Key Laboratory of Anhui Provincial Department of Education, Hefei, Anhui, China
6
Spiers RC, Norby C, Kalivas JH. Physicochemical Responsive Integrated Similarity Measure (PRISM) for a Comprehensive Quantitative Perspective of Sample Similarity Dynamically Assessed with NIR Spectra. Anal Chem 2023; 95:12776-12784. [PMID: 37594455 DOI: 10.1021/acs.analchem.3c01616] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 08/19/2023]
Abstract
Determining sample similarity underlies many foundational principles in analytical chemistry. For example, calibration models are unsuitable to predict outliers. Calibration transfer methods assume a moderate degree of sample and measurement dissimilarities between a calibration set and target prediction samples. Classification approaches link target sample similarities to groups of similar class samples. Although similarity is ubiquitous in analytical chemistry and everyday life, quantifying sample similarity is without a straightforward solution, especially when target domain samples are unlabeled and the only known features are measurable, such as spectra (the focus of this paper). The process proposed to assess sample similarity integrates spectral similarity information with contextual considerations among source analyte contents, model, and analyte predictions. This hybrid approach, named the physicochemical responsive integrated similarity measure (PRISM), amplifies hidden-but-essential physicochemical properties encoded within respective spectra. PRISM is tested on four near-infrared (NIR) data sets for four diverse application areas to show efficacy. These applications are the assessment of prediction reliability and model updating for model generalizability, outlier detection, and basic matrix matching evaluation. Discussion is provided on adapting PRISM to classification problems. Results indicate that PRISM collects large amounts of similarity information and effectively integrates it to produce a quantitative similarity evaluation between the target sample and a source domain. The approach is also useful for biological samples with additional physicochemical variations. While PRISM is dynamically tested on NIR data, parts of PRISM were previously applied to other data types, and PRISM should be applicable to other measurement systems perturbed by matrix effects.
Affiliation(s)
- Robert C Spiers
  - Department of Chemistry, Idaho State University, Pocatello, Idaho 83209, United States
- Callan Norby
  - Department of Chemistry, Idaho State University, Pocatello, Idaho 83209, United States
- John H Kalivas
  - Department of Chemistry, Idaho State University, Pocatello, Idaho 83209, United States
7
Niazi SK, Mariam Z. Recent Advances in Machine-Learning-Based Chemoinformatics: A Comprehensive Review. Int J Mol Sci 2023; 24:11488. [PMID: 37511247 PMCID: PMC10380192 DOI: 10.3390/ijms241411488] [Citation(s) in RCA: 7] [Impact Index Per Article: 7.0] [Received: 06/09/2023] [Revised: 06/30/2023] [Accepted: 07/12/2023] [Indexed: 07/30/2023] [Open Access]
Abstract
In modern drug discovery, the combination of chemoinformatics and quantitative structure-activity relationship (QSAR) modeling has emerged as a formidable alliance, enabling researchers to harness the vast potential of machine learning (ML) techniques for predictive molecular design and analysis. This review delves into the fundamental aspects of chemoinformatics, elucidating the intricate nature of chemical data and the crucial role of molecular descriptors in unveiling the underlying molecular properties. Molecular descriptors, including 2D fingerprints and topological indices, in conjunction with the structure-activity relationships (SARs), are pivotal in unlocking the pathway to small-molecule drug discovery. Technical intricacies of developing robust ML-QSAR models, including feature selection, model validation, and performance evaluation, are discussed herein. Various ML algorithms, such as regression analysis and support vector machines, are showcased in the text for their ability to predict and comprehend the relationships between molecular structures and biological activities. This review serves as a comprehensive guide for researchers, providing an understanding of the synergy between chemoinformatics, QSAR, and ML. By embracing these cutting-edge technologies, predictive molecular analysis holds promise for expediting the discovery of novel therapeutic agents in the pharmaceutical sciences.
Affiliation(s)
- Sarfaraz K Niazi
  - College of Pharmacy, University of Illinois, Chicago, IL 61820, USA
- Zamara Mariam
  - School of Interdisciplinary Engineering & Sciences (SINES), National University of Sciences & Technology (NUST), Islamabad 24090, Pakistan
8
Quartier J, Lapteva M, Boulaguiem Y, Guerrier S, Kalia YN. Influence of Molecular Structure and Physicochemical Properties of Immunosuppressive Drugs on Micelle Formulation Characteristics and Cutaneous Delivery. Pharmaceutics 2023; 15:1278. [PMID: 37111763 PMCID: PMC10142028 DOI: 10.3390/pharmaceutics15041278] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/17/2023] [Revised: 04/12/2023] [Accepted: 04/14/2023] [Indexed: 04/29/2023] [Open Access]
Abstract
The aim of this study was to investigate whether subtle differences in molecular properties affected polymeric micelle characteristics and their ability to deliver poorly water-soluble drugs into the skin. D-α-tocopherol-polyethylene glycol 1000 was used to prepare micelles containing ascomycin-derived immunosuppressants, sirolimus (SIR), pimecrolimus (PIM) and tacrolimus (TAC), which have similar structures and physicochemical properties and have dermatological applications. Micelle formulations were prepared by thin-film hydration and extensively characterized. Cutaneous delivery and biodistribution were determined and compared. Sub-10 nm micelles were obtained for the three immunosuppressants with incorporation efficiencies >85%. However, differences were observed for drug loading, stability (at the highest concentration), and their in vitro release kinetics. These were attributed to differences in drug aqueous solubility and lipophilicity. Differences between the cutaneous biodistribution profiles and drug deposition in the different skin compartments pointed to the impact of differences in thermodynamic activity. Therefore, despite their structural similarities, SIR, TAC and PIM did not demonstrate the same behaviour either in the micelles or when applied to the skin. These outcomes indicate that polymeric micelles should be optimized even for closely related drug molecules and support the hypothesis that drugs are released from micelles prior to skin penetration.
Affiliation(s)
- Julie Quartier
  - School of Pharmaceutical Sciences, University of Geneva, CMU-1 rue Michel Servet, 1211 Geneva, Switzerland
  - Institute of Pharmaceutical Sciences of Western Switzerland, University of Geneva, CMU-1 rue Michel Servet, 1211 Geneva, Switzerland
- Maria Lapteva
  - School of Pharmaceutical Sciences, University of Geneva, CMU-1 rue Michel Servet, 1211 Geneva, Switzerland
  - Institute of Pharmaceutical Sciences of Western Switzerland, University of Geneva, CMU-1 rue Michel Servet, 1211 Geneva, Switzerland
- Younes Boulaguiem
  - Geneva School of Economics and Management, University of Geneva, 40 Boulevard du Pont d'Arve, 1204 Geneva, Switzerland
- Stéphane Guerrier
  - School of Pharmaceutical Sciences, University of Geneva, CMU-1 rue Michel Servet, 1211 Geneva, Switzerland
  - Institute of Pharmaceutical Sciences of Western Switzerland, University of Geneva, CMU-1 rue Michel Servet, 1211 Geneva, Switzerland
  - Geneva School of Economics and Management, University of Geneva, 40 Boulevard du Pont d'Arve, 1204 Geneva, Switzerland
- Yogeshvar N Kalia
  - School of Pharmaceutical Sciences, University of Geneva, CMU-1 rue Michel Servet, 1211 Geneva, Switzerland
  - Institute of Pharmaceutical Sciences of Western Switzerland, University of Geneva, CMU-1 rue Michel Servet, 1211 Geneva, Switzerland
9
Trudeau SJ, Hwang H, Mathur D, Begum K, Petrey D, Murray D, Honig B. PrePCI: A structure- and chemical similarity-informed database of predicted protein compound interactions. Protein Sci 2023; 32:e4594. [PMID: 36776141 PMCID: PMC10019447 DOI: 10.1002/pro.4594] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/13/2022] [Revised: 02/07/2023] [Accepted: 02/09/2023] [Indexed: 02/14/2023]
Abstract
We describe the Predicting Protein-Compound Interactions (PrePCI) database which comprises over 5 billion predicted interactions between 6.8 million chemical compounds and 19,797 human proteins. PrePCI relies on a proteome-wide database of structural models based on both traditional modeling techniques and the AlphaFold Protein Structure Database. Sequence- and structural similarity-based metrics are established between template proteins, T, in the Protein Data Bank that bind compounds, C, and query proteins in the model database, Q. When the metrics exceed threshold values, it is assumed that C also binds to Q with a likelihood ratio (LR) derived from machine learning. If the relationship is based on structural similarity, the LR is based on a scoring function that measures the extent to which C is compatible with the binding site of Q as described in the LT-scanner algorithm. For every predicted complex derived in this way, chemical similarity based on the Tanimoto coefficient identifies other small molecules that may bind to Q. An overall LR for the binding of C to Q is obtained from Naive Bayesian statistics. The PrePCI database can be queried by entering a UniProt ID or gene name for a protein to obtain a list of compounds predicted to bind to it along with associated LRs. Alternatively, entering an identifier for the compound outputs a list of proteins it is predicted to bind. Specific applications of the database to lead discovery, elucidation of drug mechanism of action, and biological function annotation are described.
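Two of the ingredients named in this abstract, the Tanimoto coefficient and a naive Bayes combination of likelihood ratios, are simple enough to sketch. The code below is only an illustration under stated assumptions: fingerprints are represented as sets of "on" bit indices, the individual likelihood ratios are made-up numbers, and combining them by multiplication reflects the standard naive Bayes independence assumption, not PrePCI's actual scoring functions.

```python
def tanimoto(fp_a, fp_b):
    """Tanimoto coefficient of two fingerprints given as sets of 'on' bit indices."""
    if not fp_a and not fp_b:
        return 0.0  # convention chosen here for two empty fingerprints
    shared = len(fp_a & fp_b)
    return shared / (len(fp_a) + len(fp_b) - shared)

def combined_likelihood_ratio(ratios):
    """Naive Bayes combination: evidence assumed independent, so LRs multiply."""
    result = 1.0
    for lr in ratios:
        result *= lr
    return result

# Hypothetical fingerprints: 3 shared bits out of 6 distinct bits overall.
known_binder = {1, 4, 7, 9, 12}
candidate = {1, 4, 7, 10}
similarity = tanimoto(known_binder, candidate)

# Hypothetical per-evidence likelihood ratios (e.g. sequence, structure, chemistry).
overall_lr = combined_likelihood_ratio([2.0, 5.0, 1.5])
```

A likelihood ratio above 1 means the evidence favours binding; because the ratios multiply, several weak but concordant sources can yield a strong overall prediction.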
Affiliation(s)
- Stephen J. Trudeau
  - Department of Systems Biology, Columbia University Irving Medical Center, New York, New York, USA
  - Integrated Graduate Program in Cellular, Molecular and Biomedical Studies (CMBS), Columbia University Irving Medical Center, New York, New York, USA
- Howook Hwang
  - Department of Systems Biology, Columbia University Irving Medical Center, New York, New York, USA
  - Schrodinger, Inc., New York, New York, USA
- Deepika Mathur
  - Department of Systems Biology, Columbia University Irving Medical Center, New York, New York, USA
  - Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, New York, New York, USA
  - Department of Psychiatry, Icahn School of Medicine at Mount Sinai, New York, New York, USA
- Kamrun Begum
  - Department of Systems Biology, Columbia University Irving Medical Center, New York, New York, USA
- Donald Petrey
  - Department of Systems Biology, Columbia University Irving Medical Center, New York, New York, USA
- Diana Murray
  - Department of Systems Biology, Columbia University Irving Medical Center, New York, New York, USA
- Barry Honig
  - Department of Systems Biology, Columbia University Irving Medical Center, New York, New York, USA
  - Department of Biochemistry and Molecular Biophysics, Columbia University Irving Medical Center, New York, New York, USA
  - Department of Medicine, Columbia University, New York, New York, USA
  - Zuckerman Mind Brain and Behavior Institute, Columbia University, New York, New York, USA
10
Lazare J, Tebes-Stevens C, Weber EJ. A multiple linear regression approach to the estimation of carboxylic acid ester and lactone alkaline hydrolysis rate constants. SAR and QSAR in Environmental Research 2023; 34:183-210. [PMID: 36951517 PMCID: PMC10547131 DOI: 10.1080/1062936x.2023.2188608] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/16/2023] [Accepted: 02/25/2023] [Indexed: 05/03/2023]
Abstract
Pesticides, pharmaceuticals, and other organic contaminants often undergo hydrolysis when released into the environment; therefore, measured or estimated hydrolysis rates are needed to assess their environmental persistence. An intuitive multiple linear regression (MLR) approach was used to develop robust QSARs for predicting base-catalyzed rate constants of carboxylic acid esters (CAEs) and lactones. We explored various combinations of independent descriptors, resulting in four primary models (two for lactones and two for CAEs), with a total of 15 and 11 parameters included in the CAE and lactone QSAR models, respectively. The most significant descriptors include pKa, electronegativity, charge density, and steric parameters. Model performance is assessed using Drug Theoretics and Cheminformatics Laboratory's DTC-QSAR tool, demonstrating high accuracy for both internal validation (r² = 0.93 and RMSE = 0.41-0.43 for CAEs; r² = 0.90-0.93 and RMSE = 0.38-0.46 for lactones) and external validation (r² = 0.93 and RMSE = 0.43-0.45 for CAEs; r² = 0.94-0.98 and RMSE = 0.33-0.41 for lactones). The developed models require only low-cost computational resources and have substantially improved performance compared to existing hydrolysis rate prediction models (HYDROWIN and SPARC).
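For readers unfamiliar with the two validation statistics quoted above, the following sketch shows how RMSE and r² are commonly computed from observed and predicted values. It assumes r² denotes the coefficient of determination (1 minus the ratio of residual to total sum of squares); the abstract does not state which definition the DTC-QSAR tool uses, and the numbers below are invented for illustration.

```python
import math

def rmse(observed, predicted):
    """Root-mean-square error between observed and predicted values."""
    n = len(observed)
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(observed, predicted)) / n)

def r_squared(observed, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot

# Invented log rate constants: three perfect predictions and one off by 1.0.
observed = [1.0, 2.0, 3.0, 4.0]
predicted = [1.0, 2.0, 3.0, 5.0]

error = rmse(observed, predicted)      # sqrt(1/4) = 0.5
fit = r_squared(observed, predicted)   # 1 - 1/5 = 0.8
```

RMSE is expressed in the units of the modeled quantity, so an RMSE near 0.4 on a logarithmic rate constant corresponds to predictions typically within a factor of about 2.5 of the measured value.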
Affiliation(s)
- Jovian Lazare
- Oak Ridge Institute for Science and Education (ORISE), hosted at U.S. Environmental Protection Agency, Athens, Georgia 30605, United States
| | - Caroline Tebes-Stevens
- Center for Environmental Measurement and Modeling, United States Environmental Protection Agency, Athens, Georgia 30605, United States
| | - Eric J. Weber
- Center for Environmental Measurement and Modeling, United States Environmental Protection Agency, Athens, Georgia 30605, United States
| |
|
11
|
Szwabowski GL, Baker DL, Parrill AL. Application of computational methods for class A GPCR Ligand discovery. J Mol Graph Model 2023; 121:108434. [PMID: 36841204 DOI: 10.1016/j.jmgm.2023.108434] [Citation(s) in RCA: 3] [Impact Index Per Article: 3.0] [Received: 10/03/2022] [Revised: 02/11/2023] [Accepted: 02/13/2023] [Indexed: 02/22/2023]
Abstract
G protein-coupled receptors (GPCR) are integral membrane proteins of considerable interest as targets for drug development due to their role in transmitting cellular signals in a multitude of biological processes. Of the six classes categorizing GPCR (A, B, C, D, E, and F), class A contains the largest number of therapeutically relevant GPCR. Despite their importance as drug targets, many challenges exist for the discovery of novel class A GPCR ligands serving as drug precursors. Though knowledge of the structural and functional characteristics of GPCR has grown significantly over the past 20 years, a large portion of GPCR lack reported, experimentally determined structures. Furthermore, many GPCR have no known endogenous and/or synthetic ligands, limiting further exploration of their biochemical, cellular, and physiological roles. While many successes in GPCR ligand discovery have resulted from experimental high-throughput screening, computational methods have played an increasingly important role in GPCR ligand identification in the past decade. Here we discuss computational techniques applied to GPCR ligand discovery. This review summarizes class A GPCR structure/function and provides an overview of many obstacles currently faced in GPCR ligand discovery. Furthermore, we discuss applications and recent successes of computational techniques used to predict GPCR structure as well as present a summary of ligand- and structure-based methods used to identify potential GPCR ligands. Finally, we discuss computational hit list generation and refinement and provide comprehensive workflows for GPCR ligand identification.
Affiliation(s)
| | - Daniel L Baker
- Department of Chemistry, The University of Memphis, Memphis, TN, 38152, USA
| | - Abby L Parrill
- Department of Chemistry, The University of Memphis, Memphis, TN, 38152, USA.
| |
|
12
|
Lungu CN, Mangalagiu V, Mangalagiu II, Mehedinti MC. Benzoquinoline Chemical Space: A Helpful Approach in Antibacterial and Anticancer Drug Design. Molecules 2023; 28:1069. [PMID: 36770739 PMCID: PMC9921191 DOI: 10.3390/molecules28031069] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/01/2022] [Revised: 01/09/2023] [Accepted: 01/16/2023] [Indexed: 01/24/2023] Open
Abstract
Benzoquinolines are used in many drug design projects as starting molecules subject to derivatization. This computational study aims to characterize the benzoquinone drug space to ease future drug design processes based on these molecules. The drug space is composed of all benzoquinones that are active on topoisomerase II and ATP synthase. Topological, chemical, and bioactivity spaces are explored using computational methodologies based on virtual screening, scaffold hopping, and molecular docking, respectively. Topological space is a geometrical space in which the elements composing it can be defined as a set of neighbors (which satisfy a particular axiom). In such a space, a chemical space can be defined as the property space spanned by all possible molecules and chemical compounds adhering to a given set of construction principles and boundary conditions. In this chemical space, the potentially pharmacologically active molecules form the bioactivity space. Results show a poly-morphological chemical space that suggests distinct characteristics. The chemical space is correlated with properties such as steric energy, the number of hydrogen bonds, the presence of halogen atoms, and membrane permeability-related properties. Lastly, novel chemical compounds (such as oxadiazole methylbenzamide and fluoro methylcyclohexane diene) with drug-like potential, active on TOPO II and ATP synthase, have been identified.
Affiliation(s)
- Claudiu N. Lungu
- Department of Surgery, Emergency Country Clinical Hospital, 800010 Galati, Romania
- Faculty of Chemistry, Alexandru Ioan Cuza University of Iasi, 11 Carol 1st Bvd, 700506 Iasi, Romania
- Department of Morphological and Functional Science, University of Medicine and Pharmacy, Dunarea de Jos, 800017 Galati, Romania
- Correspondence: (C.N.L.); (I.I.M.)
| | - Violeta Mangalagiu
- Faculty of Chemistry, Alexandru Ioan Cuza University of Iasi, 11 Carol 1st Bvd, 700506 Iasi, Romania
- Faculty of Food Engineering, Stefan cel Mare University of Suceava, 13 Universitatii Str., 720229 Suceava, Romania
| | - Ionel I. Mangalagiu
- Faculty of Chemistry, Alexandru Ioan Cuza University of Iasi, 11 Carol 1st Bvd, 700506 Iasi, Romania
- Institute of Interdisciplinary Research-CERNESIM Centre, Alexandru Ioan Cuza University of Iasi, 11 Carol I, 700506 Iasi, Romania
- Correspondence: (C.N.L.); (I.I.M.)
| | - Mihaela C. Mehedinti
- Faculty of Chemistry, Alexandru Ioan Cuza University of Iasi, 11 Carol 1st Bvd, 700506 Iasi, Romania
- Department of Morphological and Functional Science, University of Medicine and Pharmacy, Dunarea de Jos, 800017 Galati, Romania
| |
|
13
|
Xenos A, Malod-Dognin N, Zambrana C, Pržulj N. Integrated Data Analysis Uncovers New COVID-19 Related Genes and Potential Drug Re-Purposing Candidates. Int J Mol Sci 2023; 24:1431. [PMID: 36674947 PMCID: PMC9863794 DOI: 10.3390/ijms24021431] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/22/2022] [Revised: 12/23/2022] [Accepted: 01/09/2023] [Indexed: 01/12/2023] Open
Abstract
The COVID-19 pandemic is an acute and rapidly evolving global health crisis. To better understand this disease's molecular basis and design therapeutic strategies, we built upon the recently proposed concept of an integrated cell, iCell, fusing three omics, tissue-specific human molecular interaction networks. We applied this methodology to construct infected and control iCells using gene expression data from patient samples and three cell lines. We found large differences between patient-based and cell line-based iCells (both infected and control), suggesting that cell lines are ill-suited to studying this disease. We compared patient-based infected and control iCells and uncovered genes whose functioning (wiring patterns in iCells) is altered by the disease. We validated in the literature that 18 out of the top 20 of the most rewired genes are indeed COVID-19-related. Since only three of these genes are targets of approved drugs, we applied another data fusion step to predict drugs for re-purposing. We confirmed with molecular docking that the predicted drugs can bind to their predicted targets. Our most interesting prediction is artenimol, an antimalarial agent targeting ZFP62, one of our newly identified COVID-19-related genes. This drug is a derivative of artemisinin drugs that are already under clinical investigation for their potential role in the treatment of COVID-19. Our results demonstrate further applicability of the iCell framework for integrative comparative studies of human diseases.
Affiliation(s)
- Alexandros Xenos
- Barcelona Supercomputing Center (BSC), 08034 Barcelona, Spain
- Department of Computer Science, Universitat Politecnica de Catalunya (UPC), 08034 Barcelona, Spain
| | - Noël Malod-Dognin
- Barcelona Supercomputing Center (BSC), 08034 Barcelona, Spain
- Department of Computer Science, University College London, London WC1E 6BT, UK
| | - Carme Zambrana
- Barcelona Supercomputing Center (BSC), 08034 Barcelona, Spain
- Department of Computer Science, Universitat Politecnica de Catalunya (UPC), 08034 Barcelona, Spain
| | - Nataša Pržulj
- Barcelona Supercomputing Center (BSC), 08034 Barcelona, Spain
- Department of Computer Science, University College London, London WC1E 6BT, UK
- ICREA, Pg. Lluís Companys 23, 08010 Barcelona, Spain
- Correspondence:
| |
|
14
|
Chaikuad A, Merk D. An Introduction to Chemogenomics. Methods Mol Biol 2023; 2706:1-10. [PMID: 37558937 DOI: 10.1007/978-1-0716-3397-7_1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 08/11/2023]
Abstract
Chemogenomics is an innovative approach in chemical biology that synergizes combinatorial chemistry and genomic and proteomic biology to systematically study the response of a biological system to a set of compounds, which can aid the identification and validation of biological targets as well as biologically active small-molecule agents responsible for a phenotypic outcome. Central to this strategy is a collection of chemically diverse compounds, a so-called chemogenomics library. Selecting and annotating compounds for inclusion in such a set from the vast pool of available chemogenomic candidates presents a challenge, but optimal compound selection is critical for the success of chemogenomics. The library can be used in a wide variety of research applications, from biological mechanism deconvolution to drug discovery. However, phenotypic screening methods are typically required to be high-throughput and equipped with a systematic analysis of complex biological-chemical interactions. This chapter provides a general outline of the chemogenomics approach, including the concept and critical steps in all stages of this innovative chemical biology strategy.
Affiliation(s)
- Apirat Chaikuad
- Institute of Pharmaceutical Chemistry, Goethe University Frankfurt, Frankfurt, Germany
| | - Daniel Merk
- Institute of Pharmaceutical Chemistry, Goethe University Frankfurt, Frankfurt, Germany.
- Department of Pharmacy, Ludwig-Maximilians-Universität München, Munich, Germany.
| |
|
15
|
Zhao Z, Bourne PE. Harnessing systematic protein-ligand interaction fingerprints for drug discovery. Drug Discov Today 2022; 27:103319. [PMID: 35850431 DOI: 10.1016/j.drudis.2022.07.004] [Citation(s) in RCA: 14] [Impact Index Per Article: 7.0] [Received: 02/13/2022] [Revised: 07/04/2022] [Accepted: 07/12/2022] [Indexed: 12/15/2022]
Abstract
Determining protein-ligand interaction characteristics and mechanisms is crucial to the drug discovery process. Here, we review recent progress and successful applications of a systematic protein-ligand interaction fingerprint (IFP) approach for investigating proteome-wide protein-ligand interactions for drug development. Specifically, we review the use of this IFP approach for revealing polypharmacology across the kinome, predicting promising targets from which to design allosteric inhibitors and covalent kinase inhibitors, uncovering the binding mechanisms of drugs of interest, and demonstrating resistant mechanisms of specific drugs. Together, we demonstrate that the IFP strategy is efficient and practical for drug design research for protein kinases as targets and is extensible to other protein families.
Affiliation(s)
- Zheng Zhao
- School of Data Science and Department of Biomedical Engineering, University of Virginia, Charlottesville, VA 22904, USA.
| | - Philip E Bourne
- School of Data Science and Department of Biomedical Engineering, University of Virginia, Charlottesville, VA 22904, USA.
| |
|
16
|
Rauf A, Ishtiaq M, Muhammad MH, Siddiqui MK, Rubbab Q. Algebraic Polynomial Based Topological Study of Graphite Carbon Nitride (g-C3N4) Molecular Structure. Polycycl Aromat Compd 2022. [DOI: 10.1080/10406638.2021.1934044] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Indexed: 10/21/2022]
Affiliation(s)
- Abdul Rauf
- Department of Computer Science and Mathematics, Air University Multan Campus, Multan, Pakistan
| | - Muhammad Ishtiaq
- Department of Computer Science and Mathematics, Air University Multan Campus, Multan, Pakistan
| | | | - Muhammad Kamran Siddiqui
- College of chemistry, School of Chemical Engineering and Energy, Zhengzhou University, Zhengzhou, China
| | - Qammar Rubbab
- Department of Mathematics, Comsats University Islamabad, Lahore Campus, Lahore, Pakistan
- Department of Mathematics, The Woman University Multan, Multan, Pakistan
| |
|
17
|
Ishtiaq M, Rauf A, Rubbab Q, Siddiqui MK, Rehman AU, Cancan M. A Degree Based Topological Study of Two Carbon Nanosheets VC5C7 and HC5C7. Polycycl Aromat Compd 2022. [DOI: 10.1080/10406638.2021.1901125] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/21/2022]
Affiliation(s)
- Muhammad Ishtiaq
- Faculty of Agriculture and Environmental Sciences, Muhammad Nawaz Sharif University of Agriculture, Multan, Pakistan
| | - Abdul Rauf
- Department of Computer Science and Mathematics, Air University Multan Campus, Multan, Pakistan
| | - Qammar Rubbab
- Department of Mathematics, The Woman University Multan, Multan, Pakistan
| | | | - Ata-ur- Rehman
- Department of Electrical Engineering (RCET), University of Engineering and Technology, Lahore, Pakistan
| | - Murat Cancan
- Faculty of Education, Van Yuzuncu Yil University, Van, Turkey
| |
|
18
|
Sharma K, Bhat VK. On Topological Descriptors of Polycyclic Aromatic Benzenoid Systems. Polycycl Aromat Compd 2022. [DOI: 10.1080/10406638.2022.2086273] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/18/2022]
Affiliation(s)
- Karnika Sharma
- School of Mathematics, Shri Mata Vaishno Devi University, Jammu And Kashmir, India
| | - Vijay Kumar Bhat
- School of Mathematics, Shri Mata Vaishno Devi University, Jammu And Kashmir, India
| |
|
19
|
Lenci E, Trabocchi A. Diversity-Oriented Synthesis and Chemoinformatics: A Fruitful Synergy towards Better Chemical Libraries. European J Org Chem 2022. [DOI: 10.1002/ejoc.202200575] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/06/2022]
Affiliation(s)
- Elena Lenci
- Università degli Studi di Firenze, Department of Chemistry, Via della Lastruccia 13, 50019 Sesto Fiorentino, Italy
| | - Andrea Trabocchi
- Università degli Studi di Firenze (University of Florence), Department of Chemistry "Ugo Schiff", Italy
| |
|
20
|
Procacci P. Relative Binding Free Energy between Chemically Distant Compounds Using a Bidirectional Nonequilibrium Approach. J Chem Theory Comput 2022; 18:4014-4026. [PMID: 35642423 PMCID: PMC9202353 DOI: 10.1021/acs.jctc.2c00295] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Indexed: 12/02/2022]
Abstract
In the context of advanced hit-to-lead drug design based on atomistic molecular dynamics simulations, we propose a dual topology alchemical approach for calculating the relative binding free energy (RBFE) between two chemically distant compounds. The method (termed NE-RBFE) relies on the enhanced sampling of the end-states in bulk and in the bound state via Hamiltonian Replica Exchange, alchemically connected by a series of independent and fast nonequilibrium (NE) simulations. The technique has been implemented in a bidirectional fashion, applying the Crooks theorem to the NE work distributions for RBFE predictions. The dissipation of the NE process, negatively affecting accuracy, has been minimized by introducing a smooth regularization based on shifted electrostatic and Lennard-Jones nonbonded potentials. As a challenging testbed, we have applied our method to the calculation of the RBFEs in the recent host–guest SAMPL international contest, featuring a macrocyclic host with guests varying in the net charge, volume, and chemical fingerprints. Closure validation has been successfully verified in cycles involving compounds with disparate Tanimoto coefficients, volume, and net charge. NE-RBFE is specifically tailored for massively parallel facilities and can be used with little or no code modification on most of the popular software packages supporting nonequilibrium alchemical simulations, such as Gromacs, Amber, NAMD, or OpenMM. The proposed methodology bypasses most of the entanglements and limitations of the standard single topology RBFE approach for strictly congeneric series based on free-energy perturbation, such as slowly relaxing cavity water, sampling issues along the alchemical stratification, and the need for highly overlapping molecular fingerprints.
Affiliation(s)
- Piero Procacci
- Dipartimento di Chimica "Ugo Schiff", Università degli Studi di Firenze, Via della Lastruccia 3, 50019 Sesto Fiorentino, Italy
| |
|
21
|
Hutter MC. Differential Multimolecule Fingerprint for Similarity Search: Making Use of Active and Inactive Compound Sets in Virtual Screening. J Chem Inf Model 2022; 62:2726-2736. [PMID: 35613341 DOI: 10.1021/acs.jcim.2c00242] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/29/2022]
Abstract
In conventional fingerprint methods, the similarity between two molecules is calculated using the Tanimoto index as a numerical criterion. Thus, the query molecules in virtual screening should be most representative of the wanted compound class at hand. In the concept introduced here, all available active molecules form a multimolecule fingerprint in which the appearing features are weighted according to their respective frequency. The features of inactive molecules are treated likewise and the resulting values are subtracted from those of the active ones. The obtained differential multimolecule fingerprint (DMMFP) is thus specific for the respective class of compounds. To account for the noninteger representation within this fingerprint, a modified Sørensen-Dice coefficient is used to compute the similarity. Potentially active molecules yield positive scores, whereas presumably inactive ones are denoted by negative values. The concept was applied to Angiotensin-converting enzyme (ACE) inhibitors, β2-adrenoceptor ligands, leukotriene A4 hydrolase inhibitors, dopamine D3 antagonists, and cytochrome CYP2C9 substrates, for which experimental binding affinities are known and was tested against decoys from DUD-E and a further background database consisting of molecules from the dark chemical matter, which comprises compounds that appear as frequent hitters across multiple assays. Using the 166 publicly available keys of the MACCS fingerprint and the larger PubChem fingerprint, actives were recovered with very high sensitivity. Furthermore, three marketed ACE inhibitors as well as the carbonic anhydrase II inhibitor dorzolamide were detected in the dark chemical matter data set. For comparison, the DMMFP was also used with a Bayesian classifier, for which the specificity (correctly classified inactives) and likewise the accuracy was superior. Conversely, the similarity score produced by the Sørensen-Dice coefficient showed its potential for the early recognition of (potentially) active molecules.
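The construction described in this abstract can be illustrated with a minimal sketch. It assumes binary fingerprints given as lists of 0/1 values; the function names are hypothetical, and the scoring formula is only one plausible Dice-like form chosen for illustration, since the abstract does not state the exact modified Sørensen-Dice coefficient used in the paper.

```python
from typing import List, Sequence


def feature_frequencies(fingerprints: Sequence[Sequence[int]]) -> List[float]:
    """Fraction of molecules in the set that have each feature bit switched on."""
    n = len(fingerprints)
    length = len(fingerprints[0])
    return [sum(fp[i] for fp in fingerprints) / n for i in range(length)]


def differential_fingerprint(
    actives: Sequence[Sequence[int]], inactives: Sequence[Sequence[int]]
) -> List[float]:
    """Per-feature frequency among actives minus frequency among inactives."""
    freq_active = feature_frequencies(actives)
    freq_inactive = feature_frequencies(inactives)
    return [a - b for a, b in zip(freq_active, freq_inactive)]


def dice_like_score(query: Sequence[int], dmmfp: Sequence[float]) -> float:
    """Signed Dice-like similarity (assumed form, not the paper's formula).

    2 * sum(q_i * w_i) / (sum(q_i) + sum(|w_i|)); positive when the query
    mostly carries features enriched in actives, negative otherwise.
    """
    numerator = 2.0 * sum(q * w for q, w in zip(query, dmmfp))
    denominator = sum(query) + sum(abs(w) for w in dmmfp)
    return numerator / denominator if denominator else 0.0
```

With this assumed form, a query sharing features enriched among the actives receives a positive score and one sharing features enriched among the inactives receives a negative score, which mirrors the sign convention the abstract describes.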
Affiliation(s)
- Michael C Hutter
- Center for Bioinformatics, Saarland University, Campus E2.1, 66123 Saarbruecken, Germany
| |
|
22
|
Aliagas I, Gobbi A, Lee ML, Sellers BD. Comparison of logP and logD correction models trained with public and proprietary data sets. J Comput Aided Mol Des 2022; 36:253-262. [PMID: 35359246 DOI: 10.1007/s10822-022-00450-9] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/05/2021] [Accepted: 03/15/2022] [Indexed: 10/18/2022]
Abstract
In drug discovery, partition and distribution coefficients, logP and logD for octanol/water, are widely used as metrics of the lipophilicity of molecules, which in turn has a strong influence on the bioactivity and bioavailability of potential drugs. There are a variety of established methods, mostly fragment or atom-based, to calculate logP, while logD prediction generally relies on calculated logP and pKa for the estimation of neutral and ionized populations at a given pH. Algorithms such as ClogP have limitations generally leading to systematic errors for chemically related molecules, while pKa estimation is generally more difficult due to the interplay of electronic, inductive and conjugation effects for ionizable moieties. We propose an integrated machine learning QSAR modeling approach to predict logD by training the model with experimental data while using ClogP and pKa predicted by commercial software as model descriptors. By optimizing the loss function for the ClogD calculated by the software, we build a correction model that incorporates both descriptors from the software and available experimental logD data. Additionally, we calculate logP from the logD model using the software-predicted pKa values. Here, we have trained models using publicly or commercially available logD data to show that this approach can improve on commercial software predictions of lipophilicity. When applied to other logD data sets, this approach extends the domain of applicability of logD and logP predictions over commercial software. The performance of these models compares favorably with that of models built with a larger set of proprietary logD data.
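The dependence of logD on logP and pKa mentioned in this abstract can be made concrete with the textbook relation for a single ionizable group. This is a minimal sketch, not the correction model proposed in the paper: it assumes a monoprotic compound and that only the neutral species partitions into octanol, and the function names are hypothetical.

```python
import math


def logd_monoprotic_acid(logp: float, pka: float, ph: float) -> float:
    """logD of a monoprotic acid, assuming only the neutral form enters octanol.

    The neutral fraction is 1 / (1 + 10**(pH - pKa)).
    """
    return logp - math.log10(1.0 + 10.0 ** (ph - pka))


def logd_monoprotic_base(logp: float, pka: float, ph: float) -> float:
    """logD of a monoprotic base; pka is that of the conjugate acid.

    The neutral fraction is 1 / (1 + 10**(pKa - pH)).
    """
    return logp - math.log10(1.0 + 10.0 ** (pka - ph))
```

At pH equal to pKa half of the compound is ionized, so under these assumptions logD sits log10(2), about 0.30 log units, below logP.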
Affiliation(s)
- Ignacio Aliagas
- Discovery Chemistry, Genentech Inc, 1 DNA Way, South San Francisco, CA, 94080, USA.
| | - Alberto Gobbi
- Discovery Chemistry, Genentech Inc, 1 DNA Way, South San Francisco, CA, 94080, USA
| | - Man-Ling Lee
- Discovery Chemistry, Genentech Inc, 1 DNA Way, South San Francisco, CA, 94080, USA
| | - Benjamin D Sellers
- Discovery Chemistry, Genentech Inc, 1 DNA Way, South San Francisco, CA, 94080, USA
| |
|
23
|
Martínez-Martínez CT, Méndez-Bermúdez JA, Rodrigues FA, Estrada E. Nonuniform random graphs on the plane: A scaling study. Phys Rev E 2022; 105:034304. [PMID: 35428102 DOI: 10.1103/physreve.105.034304] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/07/2021] [Accepted: 02/27/2022] [Indexed: 06/14/2023]
Abstract
We consider random geometric graphs on the plane characterized by a nonuniform density of vertices. In particular, we introduce a graph model where $n$ vertices are independently distributed in the unit disk with positions, in polar coordinates $(l,\theta)$, obeying the probability density functions $\rho(l)$ and $\rho(\theta)$. Here we choose $\rho(l)$ as a normal distribution with zero mean and variance $\sigma\in(0,\infty)$ and $\rho(\theta)$ as a uniform distribution in the interval $\theta\in[0,2\pi)$. Then, two vertices are connected by an edge if their Euclidean distance is less than or equal to the connection radius $\ell$. We characterize the topological properties of this random graph model, which depends on the parameter set $(n,\sigma,\ell)$, by the use of the average degree $\langle k\rangle$ and the number of nonisolated vertices $V_{\times}$, while we approach their spectral properties with two measures on the graph adjacency matrix: the ratio of consecutive eigenvalue spacings $r$ and the Shannon entropy $S$ of eigenvectors. First we propose a heuristic expression for $\langle k(n,\sigma,\ell)\rangle$. Then, we look for the scaling properties of the normalized average measure $\langle\overline{X}\rangle$ (where $X$ stands for $V_{\times}$, $r$, and $S$) over graph ensembles. We demonstrate that the scaling parameter of $\langle\overline{V_{\times}}\rangle=\langle V_{\times}\rangle/n$ is indeed $\langle k\rangle$, with $\langle\overline{V_{\times}}\rangle\approx 1-\exp(-\langle k\rangle)$. Meanwhile, the scaling parameter of both $\langle\overline{r}\rangle$ and $\langle\overline{S}\rangle$ is proportional to $n^{-\gamma}\langle k\rangle$ with $\gamma\approx 0.16$.
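The graph model in this abstract can be sketched in a few lines using only the Python standard library. Several details below are assumptions made for illustration, because the abstract does not specify them: negative radial draws are folded with an absolute value, draws that fall outside the unit disk are rejected and redrawn, and all function names are hypothetical.

```python
import math
import random
from typing import List, Tuple

Point = Tuple[float, float]


def sample_vertices(n: int, sigma: float, rng: random.Random) -> List[Point]:
    """Draw n points in the unit disk: radius from N(0, variance sigma), uniform angle."""
    std = math.sqrt(sigma)  # the abstract gives sigma as a variance
    points: List[Point] = []
    while len(points) < n:
        radius = abs(rng.gauss(0.0, std))  # assumption: fold negative draws
        if radius > 1.0:
            continue  # assumption: reject draws outside the unit disk
        theta = 2.0 * math.pi * rng.random()  # uniform on [0, 2*pi)
        points.append((radius * math.cos(theta), radius * math.sin(theta)))
    return points


def vertex_degrees(points: List[Point], connection_radius: float) -> List[int]:
    """Connect two vertices when their Euclidean distance is <= connection_radius."""
    degrees = [0] * len(points)
    for i in range(len(points)):
        for j in range(i + 1, len(points)):
            if math.dist(points[i], points[j]) <= connection_radius:
                degrees[i] += 1
                degrees[j] += 1
    return degrees


def average_degree(degrees: List[int]) -> float:
    """Mean vertex degree of one graph realization."""
    return sum(degrees) / len(degrees)


def nonisolated_count(degrees: List[int]) -> int:
    """Number of vertices with at least one neighbor."""
    return sum(1 for d in degrees if d > 0)
```

Averaging `average_degree` and `nonisolated_count(...) / n` over many seeded realizations would give ensemble estimates that could be compared against the relation $1-\exp(-\langle k\rangle)$ quoted in the abstract.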
Affiliation(s)
- C T Martínez-Martínez
- Instituto de Física, Benemérita Universidad Autónoma de Puebla, Apartado Postal J-48, Puebla 72570, Mexico
| | - J A Méndez-Bermúdez
- Instituto de Física, Benemérita Universidad Autónoma de Puebla, Apartado Postal J-48, Puebla 72570, Mexico
| | - Francisco A Rodrigues
- Departamento de Matemática Aplicada e Estatística, Instituto de Ciências Matemáticas e de Computação, Universidade de São Paulo-Campus de São Carlos, Caixa Postal 668, 13560-970 São Carlos, São Paulo, Brazil
| | - Ernesto Estrada
- Institute for Cross-Disciplinary Physics and Complex Systems (IFISC-CSIC-UIB), Campus Universitat de les Illes Balears, E-07122 Palma de Mallorca, Spain
| |
|
24
|
Ajjarapu SM, Tiwari A, Ramteke PW, Singh DB, Kumar S. Ligand-based drug designing. Bioinformatics 2022. [DOI: 10.1016/b978-0-323-89775-4.00018-3] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/20/2022] Open
|
25
|
Abstract
Vanadium is a biologically active element with significant industrial and biological applications. Vanadium is found in a variety of minerals and fossil fuels, the most common of which are sandstones, crude oil, and coal. Topological descriptors are numerical values assigned to molecular structures that can predict certain of their physical and chemical properties. In this paper, we have studied topological descriptors of the vanadium carbide structure based on ev and ve degrees. In particular, we have computed the closed forms of the Zagreb, Randić, geometric-arithmetic, and atom-bond connectivity (ABC) indices of the vanadium carbide structure based on ev and ve degrees. This kind of study may be useful for understanding the biological and chemical behavior of the structure.
|
26
|
Rica E, Álvarez S, Serratosa F. Ligand-Based Virtual Screening Based on the Graph Edit Distance. Int J Mol Sci 2021; 22:12751. [PMID: 34884555 PMCID: PMC8658044 DOI: 10.3390/ijms222312751] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 10/05/2021] [Revised: 11/12/2021] [Accepted: 11/13/2021] [Indexed: 11/25/2022] Open
Abstract
Chemical compounds can be represented as attributed graphs. An attributed graph is a mathematical model of an object composed of two types of representations: nodes and edges. Nodes are individual components, and edges are relations between these components. In this case, pharmacophore-type node descriptions are represented by nodes and chemical bonds by edges. If we want to obtain the bioactivity dissimilarity between two chemical compounds, a distance between attributed graphs can be used. The Graph Edit Distance allows computing this distance, and it is defined as the cost of transforming one graph into another. Nevertheless, to define this dissimilarity, the transformation cost must be properly tuned. The aim of this paper is to analyse the structural-based screening methods to verify the quality of the Harper transformation costs proposal and to present an algorithm to learn these transformation costs such that the bioactivity dissimilarity is properly defined in a ligand-based virtual screening application. The goodness of the dissimilarity is represented by the classification accuracy. Six publicly available datasets (CAPST, DUD-E, GLL&GDD, NRLiSt-BDB, MUV and ULS-UDS) have been used to validate our methodology and show that with our learned costs, we obtain the highest ratios in identifying the bioactivity similarity in a structurally diverse group of molecules.
Affiliation(s)
- Elena Rica
- Departament d’Enginyeria Informàtica i Matemàtiques, Universitat Rovira i Virgili, 43007 Tarragona, Spain; (S.Á.); (F.S.)
| | | | | |
|
27
|
Flores-Gallegos N. Rényi's divergence as a chemical similarity criterion. JOURNAL OF MATHEMATICAL CHEMISTRY 2021; 60:239-254. [PMID: 34840396 PMCID: PMC8607974 DOI: 10.1007/s10910-021-01307-6] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/09/2021] [Accepted: 11/09/2021] [Indexed: 06/13/2023]
Abstract
In this work, a new version of Rényi's divergence is presented. The expression obtained is used as a tool to identify molecules that could share some chemical or structural properties, and a data set of 1641 molecules is used in this study. Our results suggest that this new form of Rényi divergence could be a useful tool that will eventually permit complementary studies in which the main goal is to obtain molecules with similar properties.
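For orientation, the standard Rényi divergence of order $\alpha$ between two discrete distributions $P=(p_i)$ and $Q=(q_i)$ is given below. This is the textbook definition only; the abstract does not state the modified expression derived in the paper, so nothing here should be read as the authors' new form.

```latex
D_{\alpha}(P \,\|\, Q) \;=\; \frac{1}{\alpha - 1}\,
  \ln \sum_{i} p_i^{\alpha}\, q_i^{\,1-\alpha},
\qquad \alpha > 0,\ \alpha \neq 1,
\qquad
\lim_{\alpha \to 1} D_{\alpha}(P \,\|\, Q) \;=\; \sum_{i} p_i \ln \frac{p_i}{q_i}.
```

The limit $\alpha \to 1$ recovers the Kullback-Leibler divergence, and $D_{\alpha}(P\,\|\,Q)=0$ when the two distributions coincide, which is what makes a divergence of this family usable as a dissimilarity score.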
Affiliation(s)
- N. Flores-Gallegos
- Centro Universitario de los Valles, Universidad de Guadalajara, Carretera Guadalajara - Ameca Km. 45.5, C.P. 46600 Ameca, Jalisco Mexico
| |
|
28
|
Ishtiaq M, Rauf A, Rubbab Q, Siddiqui MK, Ibrahim H. Algebraic Polynomial Based Topological Properties of Anti-Tumor Drug; Hyaluronic Acid-Doxorubicin (HAD). Polycycl Aromat Compd 2021. [DOI: 10.1080/10406638.2021.1995011] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 10/19/2022]
Affiliation(s)
- Muhammad Ishtiaq
- Department of Mathematics, Air University Islamabad, Multan Campus, Multan, Punjab, Pakistan
| | - Abdul Rauf
- Department of Mathematics, Air University Islamabad, Multan Campus, Multan, Punjab, Pakistan
| | - Qammar Rubbab
- Department of Mathematics, The Woman University Multan, Multan, Punjab, Pakistan
| | - Muhammad Kamran Siddiqui
- Department of Mathematics, Comsats University Islamabad, Lahore Campus, Lahore, Punjab, Pakistan
| | - Humaira Ibrahim
- Department of Mathematics, Air University Islamabad, Multan Campus, Multan, Punjab, Pakistan
| |
|
29
|
Parisutham N, Rethnasamy N. Eigenvector centrality based algorithm for finding a maximal common connected vertex induced molecular substructure of two chemical graphs. J Mol Struct 2021. [DOI: 10.1016/j.molstruc.2021.130980] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Indexed: 11/25/2022]
|
30
|
Gantzer P, Creton B, Nieto-Draghi C. Comparisons of Molecular Structure Generation Methods Based on Fragment Assemblies and Genetic Graphs. J Chem Inf Model 2021; 61:4245-4258. [PMID: 34405674 DOI: 10.1021/acs.jcim.1c00803] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 11/29/2022]
Abstract
The use of quantitative structure-property relationships (QSPRs) has helped in predicting molecular properties for several decades, while the automatic design of new molecular structures is still emerging. The choice of algorithms to generate molecules is not obvious and is related to several factors such as the desired chemical diversity (according to an initial dataset's content) and the level of construction (the use of atoms, fragments, pattern-based methods). In this paper, we address the problem of molecular structure generation by revisiting two approaches: fragment-based methods (FMs) and genetic-based methods (GMs). We define a set of indices to compare generation methods on a specific task. New indices inform about the explored data space (coverage), compare how the data space is explored (representativeness), and quantify the ratio of molecules satisfying requirements (generation specificity) without the use of a database composed of real chemicals as a reference. These indices were employed to compare generations of molecules fulfilling the desired property criterion, evaluated by QSPR.
Affiliation(s)
- Philippe Gantzer
- IFP Energies nouvelles, 1 et 4 avenue de Bois-Préau, 92852 Rueil-Malmaison, France
| | - Benoit Creton
- IFP Energies nouvelles, 1 et 4 avenue de Bois-Préau, 92852 Rueil-Malmaison, France
| | - Carlos Nieto-Draghi
- IFP Energies nouvelles, 1 et 4 avenue de Bois-Préau, 92852 Rueil-Malmaison, France
| |
|
31
|
Zambrana C, Xenos A, Böttcher R, Malod-Dognin N, Pržulj N. Network neighbors of viral targets and differentially expressed genes in COVID-19 are drug target candidates. Sci Rep 2021; 11:18985. [PMID: 34556735 PMCID: PMC8460804 DOI: 10.1038/s41598-021-98289-x] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/05/2021] [Accepted: 08/23/2021] [Indexed: 12/12/2022] Open
Abstract
The COVID-19 pandemic is raging. It revealed the importance of rapid scientific advancement towards understanding and treating new diseases. To address this challenge, we adapt an explainable artificial intelligence algorithm for data fusion and utilize it on new omics data on viral-host interactions, human protein interactions, and drugs to better understand SARS-CoV-2 infection mechanisms and predict new drug-target interactions for COVID-19. We discover that in the human interactome, the human proteins targeted by SARS-CoV-2 proteins and the genes that are differentially expressed after the infection have common neighbors central in the interactome that may be key to the disease mechanisms. We uncover 185 new drug-target interactions targeting 49 of these key genes and suggest re-purposing of 149 FDA-approved drugs, including drugs targeting VEGF and nitric oxide signaling, whose pathways coincide with the observed COVID-19 symptoms. Our integrative methodology is universal and can enable insight into this and other serious diseases.
Affiliation(s)
| | | | | | - Noël Malod-Dognin
- Barcelona Supercomputing Center, Barcelona, Spain
- Department of Computer Science, University College London, London, WC1E 6BT, UK
| | - Nataša Pržulj
- Barcelona Supercomputing Center, Barcelona, Spain.
- Department of Computer Science, University College London, London, WC1E 6BT, UK.
- ICREA, Pg. Lluís Companys 23, Barcelona, Spain.
| |
|
32
|
Moran KR, Dunson D, Wheeler MW, Herring AH. Bayesian joint modeling of chemical structure and dose response curves. Ann Appl Stat 2021; 15:1405-1430. [DOI: 10.1214/21-aoas1461] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/19/2022]
Affiliation(s)
| | - David Dunson
- Department of Statistical Science, Duke University
| | - Matthew W. Wheeler
- Biostatistics and Computational Biology Branch, National Institute of Environmental Health Sciences
| | | |
|
33
|
Clark AE, Adams H, Hernandez R, Krylov AI, Niklasson AMN, Sarupria S, Wang Y, Wild SM, Yang Q. The Middle Science: Traversing Scale In Complex Many-Body Systems. ACS CENTRAL SCIENCE 2021; 7:1271-1287. [PMID: 34471670 PMCID: PMC8393217 DOI: 10.1021/acscentsci.1c00685] [Citation(s) in RCA: 15] [Impact Index Per Article: 5.0] [Reference Citation Analysis] [Abstract] [Grants] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 05/04/2023]
Abstract
A roadmap is developed that integrates simulation methodology and data science methods to target new theories that traverse the multiple length- and time-scale features of many-body phenomena.
Affiliation(s)
- Aurora E. Clark
- Department of Chemistry, Washington State University, Pullman, Washington 99163, United States
| | - Henry Adams
- Department of Mathematics, Colorado State University, Fort Collins, Colorado 80523, United States
| | - Rigoberto Hernandez
- Departments of Chemistry, Chemical and Biomolecular Engineering, and Materials Science and Engineering, Johns Hopkins University, Baltimore, Maryland 21218, United States
| | - Anna I. Krylov
- Department of Chemistry, University of Southern California, Los Angeles, California 90089, United States
| | - Anders M. N. Niklasson
- Theoretical Division, Los Alamos National Laboratory, Los Alamos, New Mexico 87545, United States
| | - Sapna Sarupria
- Department of Chemical and Biomolecular Engineering, Center for Optical Materials Science and Engineering Technologies (COMSET), Clemson University, Clemson, South Carolina 29670, United States
- Department of Chemistry, University of Minnesota, Minneapolis, Minnesota 55455, United States
| | - Yusu Wang
- Halıcıoğlu Data Science Institute, University of California, San Diego, La Jolla, California 92093, United States
| | - Stefan M. Wild
- Mathematics and Computer Science Division, Argonne National Laboratory, Lemont, Illinois 60439, United States
| | - Qian Yang
- Computer Science and Engineering Department, University of Connecticut, Storrs, Connecticut 06269-4155, United States
| |
|
34
|
Sagandykova G, Buszewski B. Perspectives and recent advances in quantitative structure-retention relationships for high performance liquid chromatography. How far are we? Trends Analyt Chem 2021. [DOI: 10.1016/j.trac.2021.116294] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/21/2022]
|
35
|
Ekaney LYE, Eni DB, Ntie-Kang F. Chemical similarity methods for analyzing secondary metabolite structures. PHYSICAL SCIENCES REVIEWS 2021. [DOI: 10.1515/psr-2018-0129] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/15/2022]
Abstract
The relation that exists between the structure of a compound and its function is an integral part of chemoinformatics. The similarity principle states that “structurally similar molecules tend to have similar properties and similar molecules exert similar biological activities”. The similarity of the molecules can either be studied at the structure level or at the descriptor level (properties level). Generally, the objective of chemical similarity measures is to enhance prediction of the biological activities of molecules. In this article, an overview of various methods used to compare the similarity between metabolite structures has been provided, including two-dimensional (2D) and three-dimensional (3D) approaches. The focus has been on describing the methods, e.g., fingerprint-based similarity in which the molecules under study are first fragmented and their fingerprints are computed, 2D structural similarity by comparing the Tanimoto coefficients and Euclidean distances, as well as the use of physicochemical property descriptor-based similarity methods. The similarity between molecules could also be measured by using data mining (clustering) techniques, e.g. by using virtual screening (VS)-based similarity methods. In this approach, the molecules with the desired descriptors and/or structures are screened from large databases. Lastly, SMILES-based chemical similarity search is an important method for studying the exact structure search, substructure search and also descriptor similarity. The use of a particular method depends upon the requirements of the researcher.
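As a minimal sketch of the two pairwise measures named in this abstract, the following Python computes a Tanimoto coefficient on binary fingerprints and a Euclidean distance on descriptor vectors. Representing a fingerprint as a set of "on" bit indices, the function names, and the convention for two empty fingerprints are assumptions made for illustration; none of them come from the article.

```python
import math


def tanimoto(fp_a, fp_b):
    """Tanimoto coefficient c / (a + b - c) for two binary fingerprints.

    Each fingerprint is given as a set of "on" bit indices (an assumed
    representation). a and b are the bit counts, c the shared bits.
    """
    shared = len(fp_a & fp_b)
    total = len(fp_a) + len(fp_b) - shared
    # Two empty fingerprints are treated as identical (a convention chosen here).
    return shared / total if total else 1.0


def euclidean(desc_a, desc_b):
    """Euclidean distance between two equal-length descriptor vectors."""
    if len(desc_a) != len(desc_b):
        raise ValueError("descriptor vectors must have the same length")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(desc_a, desc_b)))
```

Note that the Tanimoto coefficient is a similarity (1 means identical bit sets) while the Euclidean measure is a distance (0 means identical vectors), so the two rank molecules in opposite directions.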
Affiliation(s)
- Lena Y. E. Ekaney
- Faculty of Science, Department of Chemistry , University of Buea , P.O. Box 63 , Buea , Cameroon
| | - Donatus B. Eni
- Faculty of Science, Department of Chemistry , University of Buea , P.O. Box 63 , Buea , Cameroon
- Department of Inorganic Chemistry, Faculty of Science , University of Yaoundé I , Yaoundé , Cameroon
| | - Fidele Ntie-Kang
- Faculty of Science, Department of Chemistry , University of Buea , P.O. Box 63 , Buea , Cameroon
- Department of Pharmaceutical Chemistry , Martin-Luther University Halle-Wittenberg , Kurt-Mothes-Str. 3 , Halle (Saale) , 06120 Germany
- Department of Informatics and Chemistry , University of Chemistry and Technology Prague , Technická 5 Prague 6 , Dejvice , 166 28 Czech Republic
| |
|
36
|
Artificial intelligence in drug design: algorithms, applications, challenges and ethics. FUTURE DRUG DISCOVERY 2021. [DOI: 10.4155/fdd-2020-0028] [Citation(s) in RCA: 6] [Impact Index Per Article: 2.0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/12/2022] Open
Abstract
The discovery paradigm of drugs is rapidly growing due to advances in machine learning (ML) and artificial intelligence (AI). This review covers myriad faces of AI and ML in drug design. There is a plethora of AI algorithms, the most common of which are summarized in this review. In addition, AI is fraught with challenges that are highlighted along with plausible solutions to them. Examples are provided to illustrate the use of AI and ML in drug discovery and in predicting drug properties such as binding affinities and interactions, solubility, toxicology, blood–brain barrier permeability and chemical properties. The review also includes examples depicting the implementation of AI and ML in tackling intractable diseases such as COVID-19, cancer and Alzheimer’s disease. Ethical considerations and future perspectives of AI are also covered in this review.
|
37
|
Chen SB, Rauf A, Ishtiaq M, Naeem M, Aslam A. On ve-degree- and ev-degree-based topological properties of crystallographic structure of cuprite Cu2O. OPEN CHEM 2021. [DOI: 10.1515/chem-2021-0051] [Citation(s) in RCA: 7] [Impact Index Per Article: 2.3] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/15/2022] Open
Abstract
In the study of chemical graph theory, an enormous number of research analyses have confirmed that the characteristics of chemicals are closely connected with their atomic structure. Topological indices are critical tools for analyzing these chemical substances through the essential topology of their structures. Topological descriptors are significant numerical quantities, or invariants, in the field of chemical graph theory. In this study, we have studied the chemical graph of the crystal structure of copper oxide ($\mathrm{Cu_2O}$), and further, we have calculated the ev-degree- and ve-degree-based topological indices of the copper oxide chemical graph. This kind of study may be useful for understanding the atomic mechanisms of corrosion and stress–corrosion cracking of copper.
Affiliation(s)
- Shu-Bo Chen
- School of Science, Hunan City University , Yiyang 413000 , People’s Republic of China
| | - Abdul Rauf
- Department of Mathematics, Air University Multan Campus , Multan , Pakistan
| | - Muhammad Ishtiaq
- Muhammad Nawaz Sharif University of Agriculture , Multan , Pakistan
| | - Muhammad Naeem
- Department of Mathematics, Air University Multan Campus , Multan , Pakistan
| | - Adnan Aslam
- Department of Natural Sciences and Humanities, University of Engineering and Technology , Lahore , Pakistan (RCET) , Pakistan
| |
|
38
|
Abdo A, Pupin M. LINGO-DL: a text-based approach for molecular similarity searching. J Comput Aided Mol Des 2021; 35:657-665. [PMID: 33797669 DOI: 10.1007/s10822-021-00383-9] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/18/2020] [Accepted: 03/26/2021] [Indexed: 11/24/2022]
Abstract
The line notations of chemical structures are more compact than graphs and connection tables, so they can be useful for storing and transferring a large number of molecular structures. The simplified molecular-input line-entry system (SMILES) representation is the most extensively used, as it is much easier to utilise and comprehend than others, and it can be generated automatically from connection tables. A SMILES string encodes the molecular structure. It has been used by an existing method, LINGO, to calculate molecular similarities and predict structure-related properties. The LINGO method decomposes a canonical SMILES into a set of substrings of four characters referred to as LINGOs. The purpose of the LINGO method is to measure the similarity between a pair of molecules by comparing the LINGOs that occur in each molecule. This paper aims to introduce an alternative version of the LINGO method using LINGOs of different lengths, called LINGO-DL. LINGO-DL is based on the fragmentation of canonical SMILES into substrings of three different lengths rather than the single length used in the LINGO method. Retrospective virtual screening experiments with the MDDR, DUD, and MUV datasets show that LINGO-DL outperforms the LINGO method, especially when the active molecules being sought have a high degree of structural heterogeneity.
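The substring decomposition described in this abstract can be sketched roughly as follows. This is an illustration under stated assumptions, not the published implementation: the count-based overlap score is one plausible choice (the abstract does not give the scoring formula), the lengths `(3, 4, 5)` in the multi-length call are hypothetical (the abstract only says three lengths are used), and the inputs are assumed to be canonical SMILES already, with no preprocessing applied.

```python
from collections import Counter


def lingo_counts(smiles, lengths=(4,)):
    """Count every overlapping substring of the given lengths in a SMILES string.

    lengths=(4,) mirrors the four-character LINGOs described above; passing
    several lengths pools substrings of all of them (hypothetical variant).
    """
    counts = Counter()
    for q in lengths:
        for i in range(len(smiles) - q + 1):
            counts[smiles[i:i + q]] += 1
    return counts


def lingo_similarity(smiles_a, smiles_b, lengths=(4,)):
    """Average per-substring count agreement, in [0, 1].

    Assumed score: for each substring seen in either molecule, take
    1 - |n_a - n_b| / (n_a + n_b), then average over all such substrings.
    """
    a = lingo_counts(smiles_a, lengths)
    b = lingo_counts(smiles_b, lengths)
    keys = set(a) | set(b)
    if not keys:
        return 0.0  # both strings shorter than every requested length
    return sum(1 - abs(a[k] - b[k]) / (a[k] + b[k]) for k in keys) / len(keys)
```

For example, `lingo_similarity("CCCCO", "CCCCN")` compares single-length LINGOs, while `lingo_similarity("CCCCO", "CCCCN", lengths=(3, 4, 5))` pools three lengths in the spirit of the multi-length idea.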
Affiliation(s)
- Ammar Abdo
- Universite de Lille, Villeneuve d'Ascq cedex, France.
| | - Maude Pupin
- Universite de Lille, Villeneuve d'Ascq cedex, France
| |
|
39
|
Zhou H, Cao H, Skolnick J. FRAGSITE: A Fragment-Based Approach for Virtual Ligand Screening. J Chem Inf Model 2021; 61:2074-2089. [PMID: 33724022 DOI: 10.1021/acs.jcim.0c01160] [Citation(s) in RCA: 13] [Impact Index Per Article: 4.3] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 01/09/2023]
Abstract
To reduce time and cost, virtual ligand screening (VLS) often precedes experimental ligand screening in modern drug discovery. Traditionally, high-resolution structure-based docking approaches rely on experimental structures, while ligand-based approaches need known binders to the target protein and only explore their nearby chemical space. In contrast, our structure-based FINDSITEcomb2.0 approach takes advantage of predicted, low-resolution structures and information from ligands that bind distantly related proteins whose binding sites are similar to the target protein. Using a boosted tree regression machine learning framework, we significantly improved FINDSITEcomb2.0 by integrating ligand fragment scores as encoded by molecular fingerprints with the global ligand similarity scores of FINDSITEcomb2.0. The new approach, FRAGSITE, exploits our observation that ligand fragments, e.g., rings, tend to interact with stereochemically conserved protein subpockets that also occur in evolutionarily unrelated proteins. FRAGSITE was benchmarked on the 102 protein DUD-E set, where any template protein with sequence identity >30% to the target was excluded. Within the top 100 ranked molecules, FRAGSITE improves VLS precision and recall by 14.3 and 18.5%, respectively, relative to FINDSITEcomb2.0. Moreover, the mean top 1% enrichment factor increases from 25.2 to 30.2. On average, both outperform state-of-the-art deep learning-based methods such as AtomNet. On the more challenging unbiased set LIT-PCBA, FRAGSITE also shows better performance than ligand similarity-based and docking approaches such as two-dimensional ECFP4 and Surflex-Dock v.3066. On a subset of 23 targets from DEKOIS 2.0, FRAGSITE shows much better performance than the boosted tree regression-based vScreenML scoring function. Experimental testing of FRAGSITE's predictions shows that it has more hits and covers a more diverse region of chemical space than FINDSITEcomb2.0. For the two proteins that were experimentally tested, DHFR, a well-studied protein that catalyzes the conversion of dihydrofolate to tetrahydrofolate, and the kinase ACVR1, FRAGSITE identified new small-molecule nanomolar binders. Interestingly, one new binder of DHFR is a kinase inhibitor predicted to bind in a new subpocket. For ACVR1, FRAGSITE identified new molecules that have diverse scaffolds and estimated nanomolar to micromolar affinities. Thus, FRAGSITE shows significant improvement over prior state-of-the-art ligand virtual screening approaches. A web server is freely available for academic users at http://sites.gatech.edu/cssb/FRAGSITE.
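The benchmark figures quoted in this abstract (precision and recall within the top-ranked molecules, and the top 1% enrichment factor) follow standard virtual-screening definitions, which can be sketched as below. The function names and the input format, a list of 0/1 activity labels already sorted from best to worst score, are assumptions for illustration; the abstract does not describe how the authors computed these values.

```python
import math


def precision_recall_at_k(ranked_labels, k):
    """Precision and recall among the top-k entries of a ranked screening list.

    ranked_labels: 1 for an active molecule, 0 for a decoy, best score first.
    """
    top = ranked_labels[:k]
    hits = sum(top)
    total_actives = sum(ranked_labels)
    precision = hits / len(top) if top else 0.0
    recall = hits / total_actives if total_actives else 0.0
    return precision, recall


def enrichment_factor(ranked_labels, fraction):
    """Enrichment factor for the top `fraction` of the list (0.01 for top 1%).

    Ratio of the active rate in the top slice to the active rate overall,
    so a value of 1.0 corresponds to random ranking.
    """
    n = len(ranked_labels)
    total_actives = sum(ranked_labels)
    if n == 0 or total_actives == 0:
        return 0.0
    n_top = max(1, math.ceil(fraction * n))
    hits = sum(ranked_labels[:n_top])
    return (hits / n_top) / (total_actives / n)
```

Under these definitions, the reported increase of the mean top 1% enrichment factor from 25.2 to 30.2 means actives are roughly 30 times more concentrated in the top 1% than in the library as a whole.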
Affiliation(s)
- Hongyi Zhou
- Center for the Study of Systems Biology, School of Biological Sciences, Georgia Institute of Technology, 950 Atlantic Drive, NW, Atlanta, Georgia 30332-2000, United States
| | - Hongnan Cao
- Center for the Study of Systems Biology, School of Biological Sciences, Georgia Institute of Technology, 950 Atlantic Drive, NW, Atlanta, Georgia 30332-2000, United States
| | - Jeffrey Skolnick
- Center for the Study of Systems Biology, School of Biological Sciences, Georgia Institute of Technology, 950 Atlantic Drive, NW, Atlanta, Georgia 30332-2000, United States
| |
|
40
|
Rauf A, Ishtiaq M, Siddiqui MK. Topological Study of Hydroxychloroquine Conjugated Molecular Structure Used for Novel Coronavirus (COVID-19) Treatment. Polycycl Aromat Compd 2021. [PMCID: PMC7852296 DOI: 10.1080/10406638.2021.1873807] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/15/2022]
Abstract
The novel coronavirus disease 2019 (COVID-19) is a pandemic caused by a mutating and recombining virus that potentially spreads from infected persons in droplet-generated forms; it has affected more than 200 countries and endangers the entire globe. There is no clear strategy for the care of COVID-19 cases, and experts across the globe are working actively to develop medicinal or antiviral drugs. On the basis of recent clinical findings and recommendations, the study examined a variety of new medications that have shown antiviral activity against SARS-CoV-2. Among other drugs, the antimalarial medications chloroquine (CQ) and hydroxychloroquine (HCQ) have gained significant publicity for their promising effects against SARS-CoV-2. Linking a bioactive substance to a biocompatible polymer typically provides various advantages, such as improved drug solubilization, improved modification, precise restriction, and controlled discharge. An enormous number of medical analyses have confirmed that the characteristics of medical drugs are closely connected with their atomic structure, so medication properties can be obtained by considering the atomic structure of the corresponding drugs. Calculating the topological indices of a medication structure gives researchers a better understanding of the physical and bio-organic attributes of drugs. Ev-degree- and ve-degree-based topological indices are two novel kinds of degree-based indices recently defined in graph theory, in parallel to their classical degree-based counterparts. In this paper, we have computed topological indices based on ev-degree and ve-degree for the Hydroxyethyl Starch and Hydroxychloroquine (HCQ-HEC) bioconjugate molecular structure.
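The abstract does not restate what ev-degree and ve-degree are. A minimal sketch, assuming the definitions commonly used for these quantities (the ve-degree of a vertex counts the distinct edges incident to its closed neighborhood; the ev-degree of an edge counts the vertices in the union of the closed neighborhoods of its two endpoints), could look like this. The helper names and the edge-list input are hypothetical, and the graph is assumed simple with each undirected edge listed once.

```python
def build_adjacency(edges):
    """Adjacency sets for a simple undirected graph given as (u, v) pairs."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def ve_degree(edges, adj, v):
    """Number of edges with at least one endpoint in the closed neighborhood of v."""
    closed = adj[v] | {v}
    return sum(1 for a, b in edges if a in closed or b in closed)


def ev_degree(adj, u, v):
    """Number of vertices in the union of the closed neighborhoods of u and v."""
    return len(adj[u] | adj[v] | {u, v})
```

Topological indices of the kind computed in the paper are then sums of simple functions of these degrees over all vertices or edges of the molecular graph; which functions are used is not stated in the abstract.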
Affiliation(s)
- Abdul Rauf
- Department of Computer Science and Engineering, Air University Multan Campus, Multan, Pakistan
| | - Muhammad Ishtiaq
- Department of Computer Science and Engineering, Air University Multan Campus, Multan, Pakistan
| | | |
|
41
|
Chu YM, Muhammad MH, Rauf A, Ishtiaq M, Siddiqui MK. Topological Study of Polycyclic Graphite Carbon Nitride. Polycycl Aromat Compd 2020. [DOI: 10.1080/10406638.2020.1857271] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 10/22/2022]
Affiliation(s)
- Yu-Ming Chu
- Department of Mathematics, Huzhou University, Huzhou, PR China
- Hunan Provincial Key Laboratory of Mathematical Modeling and Analysis in Engineering, Changsha University of Science & Technology, Changsha, PR China
| | - Mehwish Hussain Muhammad
- College of chemistry, School of Chemical Engineering and Energy, Zhengzhou University, Zhengzhou, China
| | - Abdul Rauf
- Department of Computer Science and Engineering, Air University Multan Campus, Multan, Pakistan
| | - Muhammad Ishtiaq
- Department of Computer Science and Engineering, Air University Multan Campus, Multan, Pakistan
| | | |
|
42
|
Zhao D, Siddiqui MK, Cheema IZ, Muhammad MH, Rauf A, Ishtiaq M. On Molecular Descriptors of Polycyclic Aromatic Hydrocarbon. Polycycl Aromat Compd 2020. [DOI: 10.1080/10406638.2020.1867203] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 01/09/2023]
Affiliation(s)
- Dongming Zhao
- School of Automation, Wuhan University of Technology, Wuhan, China
| | | | - Imran Zulfiqar Cheema
- Department of Mathematics, Comsats University Islamabad, Lahore Campus, Lahore, Pakistan
| | - Mehwish Hussain Muhammad
- College of chemistry, School of Chemical Engineering and Energy, Zhengzhou University, Zhengzhou, China
| | - Abdul Rauf
- Department of Computer Science and Engineering, Air University Multan Campus, Multan, Pakistan
| | - Muhammad Ishtiaq
- Department of Computer Science and Engineering, Air University Multan Campus, Multan, Pakistan
| |
|
43
|
Saldívar-González FI, Huerta-García CS, Medina-Franco JL. Chemoinformatics-based enumeration of chemical libraries: a tutorial. J Cheminform 2020; 12:64. [PMID: 33372622 PMCID: PMC7590480 DOI: 10.1186/s13321-020-00466-z] [Citation(s) in RCA: 24] [Impact Index Per Article: 6.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/22/2020] [Accepted: 10/05/2020] [Indexed: 11/10/2022] Open
Abstract
Virtual compound libraries are increasingly being used in computer-assisted drug discovery applications and have led to numerous successful cases. This paper aims to examine the fundamental concepts of library design and describe how to enumerate virtual libraries using open source tools. To exemplify the enumeration of chemical libraries, we emphasize the use of pre-validated or reported reactions and accessible chemical reagents. This tutorial shows a step-by-step procedure for anyone interested in designing and building chemical libraries with or without chemoinformatics experience. The aim is to explore various methodologies proposed by synthetic organic chemists and explore affordable chemical space using open-access chemoinformatics tools. As part of the tutorial, we discuss three examples of design: a Diversity-Oriented-Synthesis library based on lactams, a bis-heterocyclic combinatorial library, and a set of target-oriented molecules: isoindolinone based compounds as potential acetylcholinesterase inhibitors. This manuscript also seeks to contribute to the critical task of teaching and learning chemoinformatics.
Affiliation(s)
- Fernanda I. Saldívar-González
- DIFACQUIM Research Group, School of Chemistry, Department of Pharmacy, Universidad Nacional Autónoma de México, Avenida Universidad 3000, 04510 Mexico, Mexico
| | - C. Sebastian Huerta-García
- School of Chemistry, Department of Pharmacy, Universidad Nacional Autónoma de México, Avenida Universidad 3000, 04510 Mexico, Mexico
| | - José L. Medina-Franco
- DIFACQUIM Research Group, School of Chemistry, Department of Pharmacy, Universidad Nacional Autónoma de México, Avenida Universidad 3000, 04510 Mexico, Mexico
| |
|
44
|
Antelo-Collado A, Carrasco-Velar R, García-Pedrajas N, Cerruela-García G. Maximum common property: a new approach for molecular similarity. J Cheminform 2020; 12:61. [PMID: 33372638 PMCID: PMC7547443 DOI: 10.1186/s13321-020-00462-3] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/17/2020] [Accepted: 09/14/2020] [Indexed: 12/02/2022] Open
Abstract
The maximum common property similarity (MCPhd) method is presented using descriptors as a new approach to determine the similarity between two chemical compounds or molecular graphs. This method uses the concept of maximum common property arising from the concept of maximum common substructure and is based on the electrotopographic state index for atoms. A new algorithm to quantify the similarity values of chemical structures based on the presented maximum common property concept is also developed in this paper. To verify the validity of this approach, the similarity of a sample of compounds with antimalarial activity is calculated and compared with the results obtained by four different similarity methods: the small molecule subgraph detector (SMSD), molecular fingerprint based (OBabel_FP2), ISIDA descriptors and shape-feature similarity (SHAFTS). The results obtained by the MCPhd method differ significantly from those obtained by the compared methods, improving the quantification of the similarity. A major advantage of the proposed method is that it helps to understand the analogy or proximity between physicochemical properties of the molecular fragments or subgraphs compared with the biological response or biological activity. In this new approach, more than one property can potentially be used. The method can be considered a hybrid procedure because it combines the descriptor and fragment approaches.
Affiliation(s)
- Aurelio Antelo-Collado
- University of Informatics Science, Carretera San Antonio de los Baños Km. 2 1/2 , Boyeros, La Habana, Cuba, Havana, Cuba
| | - Ramón Carrasco-Velar
- University of Informatics Science, Carretera San Antonio de los Baños Km. 2 1/2 , Boyeros, La Habana, Cuba, Havana, Cuba
| | - Nicolás García-Pedrajas
- Department of Computing and Numerical Analysis, University of Cordoba, Campus de Rabanales, Albert Einstein Building, E-14071 Córdoba, Spain
| | - Gonzalo Cerruela-García
- Department of Computing and Numerical Analysis, University of Cordoba, Campus de Rabanales, Albert Einstein Building, E-14071 Córdoba, Spain
| |
|
45
|
Aguilar-Sánchez R, Méndez-Bermúdez JA, Rodrigues FA, Sigarreta JM. Topological versus spectral properties of random geometric graphs. Phys Rev E 2020; 102:042306. [PMID: 33212571 DOI: 10.1103/physreve.102.042306] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/05/2020] [Accepted: 09/27/2020] [Indexed: 11/07/2022]
Abstract
In this work we perform a detailed statistical analysis of topological and spectral properties of random geometric graphs (RGGs), a graph model used to study the structure and dynamics of complex systems embedded in a two-dimensional space. RGGs, G(n,ℓ), consist of n vertices uniformly and independently distributed on the unit square, where two vertices are connected by an edge if their Euclidean distance is less than or equal to the connection radius ℓ ∈ [0, √2]. To evaluate the topological properties of RGGs we chose two well-known topological indices, the Randić index R(G) and the harmonic index H(G). We characterize the spectral and eigenvector properties of the corresponding randomly weighted adjacency matrices by the use of random matrix theory measures: the ratio between consecutive eigenvalue spacings, the inverse participation ratios, and the information or Shannon entropies S(G). First, we review the scaling properties of the averaged measures, topological and spectral, on RGGs. Then we show that (i) the averaged-scaled indices, 〈R(G)〉 and 〈H(G)〉, are highly correlated with the average number of nonisolated vertices 〈V_×(G)〉; and (ii) surprisingly, the averaged-scaled Shannon entropy 〈S(G)〉 is also highly correlated with 〈V_×(G)〉. Therefore, we suggest that very reliable predictions of eigenvector properties of RGGs could be made by computing topological indices.
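The graph model and the two topological indices named in this abstract can be sketched in a few lines. The construction follows the description above (uniform points on the unit square, an edge when the Euclidean distance is at most the connection radius), and the indices use their standard definitions, R(G) as the sum over edges of 1/√(d_u d_v) and H(G) as the sum over edges of 2/(d_u + d_v). The function names and the `seed` parameter are assumptions; the averaging, scaling, and random-matrix measures studied in the paper are not reproduced here.

```python
import math
import random


def edges_within_radius(points, radius):
    """Connect every pair of points whose Euclidean distance is <= radius."""
    return [
        (i, j)
        for i in range(len(points))
        for j in range(i + 1, len(points))
        if math.dist(points[i], points[j]) <= radius
    ]


def random_geometric_graph(n, radius, seed=None):
    """Sample n uniform points on the unit square and return (points, edges)."""
    rng = random.Random(seed)
    points = [(rng.random(), rng.random()) for _ in range(n)]
    return points, edges_within_radius(points, radius)


def vertex_degrees(n, edges):
    """Degree of each of the n vertices."""
    deg = [0] * n
    for u, v in edges:
        deg[u] += 1
        deg[v] += 1
    return deg


def randic_index(n, edges):
    """R(G): sum over edges uv of 1 / sqrt(d_u * d_v)."""
    deg = vertex_degrees(n, edges)
    return sum(1 / math.sqrt(deg[u] * deg[v]) for u, v in edges)


def harmonic_index(n, edges):
    """H(G): sum over edges uv of 2 / (d_u + d_v)."""
    deg = vertex_degrees(n, edges)
    return sum(2 / (deg[u] + deg[v]) for u, v in edges)
```

The number of nonisolated vertices that the abstract correlates these indices with is simply the count of nonzero entries in `vertex_degrees(n, edges)`.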
Affiliation(s)
- R Aguilar-Sánchez
- Facultad de Ciencias Químicas, Benemérita Universidad Autónoma de Puebla, Puebla 72570, Mexico
| | - J A Méndez-Bermúdez
- Departamento de Matemática Aplicada e Estatística, Instituto de Ciências Matemáticas e de Computação, Universidade de São Paulo - Campus de São Carlos, Caixa Postal 668, 13560-970 São Carlos, SP, Brazil
- Instituto de Física, Benemérita Universidad Autónoma de Puebla, Apartado Postal J-48, Puebla 72570, Mexico
| | - Francisco A Rodrigues
- Departamento de Matemática Aplicada e Estatística, Instituto de Ciências Matemáticas e de Computação, Universidade de São Paulo - Campus de São Carlos, Caixa Postal 668, 13560-970 São Carlos, SP, Brazil
| | - José M Sigarreta
- Facultad de Matemáticas, Universidad Autónoma de Guerrero, Carlos E. Adame No.54 Col. Garita, Acapulco Gro. 39650, Mexico
| |
|
46
|
Chu YM, Rauf A, Ishtiaq M, Siddiqui MK, Muhammad MH. Topological Properties of Polycyclic Aromatic Nanostars Dendrimers. Polycycl Aromat Compd 2020. [DOI: 10.1080/10406638.2020.1821227] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 10/23/2022]
Affiliation(s)
- Yu-Ming Chu
- Department of Mathematics, Huzhou University, Huzhou, P. R. China
- Hunan Provincial Key Laboratory of Mathematical Modeling and Analysis in Engineering, Changsha University of Science & Technology, Changsha, P. R. China
| | - Abdul Rauf
- Department of Computer Science and Engineering, Air University Multan Campus, Multan, Pakistan
| | - Muhammad Ishtiaq
- Department of Computer Science and Engineering, Air University Multan Campus, Multan, Pakistan
| | | | - Mehwish Hussain Muhammad
- College of chemistry, School of Chemical Engineering and Energy, Zhengzhou University, Zhengzhou, China
| |
|
47
|
Monomer structure fingerprints: an extension of the monomer composition version for peptide databases. J Comput Aided Mol Des 2020; 34:1147-1156. [PMID: 32812076 DOI: 10.1007/s10822-020-00336-8] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/06/2020] [Accepted: 08/12/2020] [Indexed: 10/23/2022]
Abstract
Previously, a fingerprint based on the monomer composition (MCFP) of nonribosomal peptides (NRPs) was introduced. MCFP is a novel method for obtaining a representative description of NRP structures from their monomer composition in fingerprint form. Effective screening and prediction of biological activities has been obtained with the Norine NRP database. In this paper, we present an extension of the MCFP fingerprint. This extension is based on adding a few columns to the fingerprint, representing monomer clusters, 2D structures, peptide categories, and peptide diversity. All these data have been extracted from the NRP structure. Experiments with the Norine NRP database showed that the extended MCFP, which can be called Monomer Structure FingerPrint (MSFP), produced high prediction accuracy (> 95%) together with a high recall rate (86%) when used for prediction and similarity searching. From this study it appeared that MSFP, mainly built from monomer composition, can be substantially improved by adding more columns representing useful information about the monomer composition and 2D structure of NRPs.
|
48
|
Ahamed TKS, Muraleedharan K. A cheminformatic study on chemical space characterization and diversity analysis of 5-LOX inhibitors. J Mol Graph Model 2020; 100:107699. [PMID: 32799052 DOI: 10.1016/j.jmgm.2020.107699] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/18/2020] [Revised: 06/19/2020] [Accepted: 07/10/2020] [Indexed: 10/23/2022]
Abstract
Blocking 5-lipoxygenase (5-LOX) catalyzed leukotriene biosynthesis has been recognized for the past few decades as a promising therapeutic strategy for acute inflammatory, allergic, and respiratory diseases. Because of the toxicity of the FDA-approved 5-LOX inhibitor zileuton, the scientific community has sought novel 5-LOX inhibitors. As a result, a substantial amount of structure-activity information on 5-LOX inhibitors has been released and stored in public databases. In this study, we aimed at a comprehensive cheminformatic characterization of the diversity and complexity of the chemical space of inhibitors of 5-LOX and of its activating protein FLAP, comparing it with the approved drug space and a virtual LOX library. The visual representation of the property space indicates that some compounds in the 5-LOX inhibitor space broaden the traditional medicinal space. The structural diversity of the databases is computed using complementary approaches, including Physicochemical Property (PCP) descriptors, molecular fingerprints, and molecular scaffolds. With the apparent exception of approved drugs, the 5-LOX dataset shows more diversity than the FLAP and virtual LOX library sets. This study identified the underlying patterns in the chemical and pharmacological property space that were decisive for the discovery and development of 5-LOX inhibitors.
Affiliation(s)
- K Muraleedharan
- Department of Chemistry, University of Calicut, Malappuram, 673635, India.
|
49
|
Ball N, Madden J, Paini A, Mathea M, Palmer AD, Sperber S, Hartung T, van Ravenzwaay B. Key read across framework components and biology based improvements. MUTATION RESEARCH-GENETIC TOXICOLOGY AND ENVIRONMENTAL MUTAGENESIS 2020; 853:503172. [DOI: 10.1016/j.mrgentox.2020.503172] [Citation(s) in RCA: 10] [Impact Index Per Article: 2.5] [Received: 01/20/2020] [Revised: 03/09/2020] [Accepted: 03/11/2020] [Indexed: 12/18/2022]
|
50
|
Cai ZQ, Rauf A, Ishtiaq M, Siddiqui MK. On Ve-Degree and Ev-Degree Based Topological Properties of Silicon Carbide Si2C3-II[p, q]. Polycycl Aromat Compd 2020. [DOI: 10.1080/10406638.2020.1747095] [Citation(s) in RCA: 11] [Impact Index Per Article: 2.8] [Indexed: 10/24/2022]
Affiliation(s)
- Zheng-Qun Cai
- School of Foreign Studies, Anhui Jianzhu University, Hefei, PR China
- Abdul Rauf
- Department of Computer Science and Engineering, Air University Multan Campus, Multan, Pakistan
- Muhammad Ishtiaq
- Department of Computer Science and Engineering, Air University Multan Campus, Multan, Pakistan
|