1
Song C, Qiu J, Luo M, Fu Y, Hu S, Liu W, Zhang D, Chen M, Cao Z, Yang X, Ke B. Identification of N-(((1S,3R,5S)-adamantan-1-yl)methyl)-3-((4-chlorophenyl)sulfonyl)benzenesulfonamide as novel Nav1.8 inhibitor with analgesic profile. Bioorg Med Chem Lett 2024; 110:129862. [PMID: 38944398 DOI: 10.1016/j.bmcl.2024.129862] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/24/2024] [Revised: 06/13/2024] [Accepted: 06/25/2024] [Indexed: 07/01/2024]
Abstract
Chronic pain is a common and challenging clinical problem that significantly impacts patients' quality of life. The sodium channel Nav1.8 plays a crucial role in the onset and development of chronic pain, making it one of the key targets for its treatment. In this article, we combined virtual screening with cell membrane chromatography to establish a novel method for rapid high-throughput screening of selective Nav1.8 inhibitors. Using this approach, we identified a small-molecule compound, 6, which not only demonstrated high affinity and inhibitory activity against Nav1.8 but also significantly inhibited complete Freund's adjuvant (CFA)-induced chronic inflammatory pain. Compared with the positive control VX-150, compound 6 showed a more prolonged analgesic effect, making it a promising Nav1.8 inhibitor candidate with potential clinical applications. This discovery provides a new therapeutic option for the treatment of chronic pain.
Affiliation(s)
- Chi Song
- Department of Anesthesiology, Laboratory of Anesthesia and Critical Care Medicine, National-Local Joint Engineering Research Centre of Translational Medicine of Anesthesiology, West China Hospital, Sichuan University, Chengdu 610041 Sichuan, China
- Jie Qiu
- Department of Anesthesiology, Laboratory of Anesthesia and Critical Care Medicine, National-Local Joint Engineering Research Centre of Translational Medicine of Anesthesiology, West China Hospital, Sichuan University, Chengdu 610041 Sichuan, China
- Menglan Luo
- Department of Anesthesiology, Laboratory of Anesthesia and Critical Care Medicine, National-Local Joint Engineering Research Centre of Translational Medicine of Anesthesiology, West China Hospital, Sichuan University, Chengdu 610041 Sichuan, China
- Yihang Fu
- Department of Anesthesiology, Laboratory of Anesthesia and Critical Care Medicine, National-Local Joint Engineering Research Centre of Translational Medicine of Anesthesiology, West China Hospital, Sichuan University, Chengdu 610041 Sichuan, China
- Shilong Hu
- Department of Anesthesiology, Laboratory of Anesthesia and Critical Care Medicine, National-Local Joint Engineering Research Centre of Translational Medicine of Anesthesiology, West China Hospital, Sichuan University, Chengdu 610041 Sichuan, China
- Wencheng Liu
- Department of Anesthesiology, Laboratory of Anesthesia and Critical Care Medicine, National-Local Joint Engineering Research Centre of Translational Medicine of Anesthesiology, West China Hospital, Sichuan University, Chengdu 610041 Sichuan, China
- Di Zhang
- Department of Anesthesiology, Laboratory of Anesthesia and Critical Care Medicine, National-Local Joint Engineering Research Centre of Translational Medicine of Anesthesiology, West China Hospital, Sichuan University, Chengdu 610041 Sichuan, China
- Meiyuan Chen
- Department of Anesthesiology, Laboratory of Anesthesia and Critical Care Medicine, National-Local Joint Engineering Research Centre of Translational Medicine of Anesthesiology, West China Hospital, Sichuan University, Chengdu 610041 Sichuan, China
- Zhihua Cao
- Department of Anesthesiology, Laboratory of Anesthesia and Critical Care Medicine, National-Local Joint Engineering Research Centre of Translational Medicine of Anesthesiology, West China Hospital, Sichuan University, Chengdu 610041 Sichuan, China
- Xi Yang
- Department of Anesthesiology, Sichuan Provincial People's Hospital, University of Electronic Science and Technology of China, Chengdu, China
- Bowen Ke
- Department of Anesthesiology, Laboratory of Anesthesia and Critical Care Medicine, National-Local Joint Engineering Research Centre of Translational Medicine of Anesthesiology, West China Hospital, Sichuan University, Chengdu 610041 Sichuan, China.
2
Whitehouse AJ, Sanchez-Martinez M, Salehi SM, Kurbatova N, Dean E. Open-Source Approach to GPU-Accelerated Substructure Search. J Chem Inf Model 2024. [PMID: 39225069 DOI: 10.1021/acs.jcim.4c00679] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 09/04/2024]
Abstract
Chemical substructure search is a critical task in medicinal chemistry and small-molecule drug discovery, enabling the retrieval of molecules from databases based on specific chemical features. Although systems exist for this purpose, fast and efficient searching remains a challenge, particularly as data storage migrates to the cloud and introduces new complexities. This study provides a comprehensive analysis of chemical substructure searches, showcasing the benefits of graphics processing unit (GPU)-accelerated fingerprint screening. The research highlights strategies for optimizing performance and advances substructure searching, a pivotal aspect of drug discovery and molecular research. The accessible and scalable nature of the proposed approach makes it a valuable resource for scientists aiming to enhance their substructure search capabilities.
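Fingerprint screening works as a prefilter because of a simple bit rule: for fingerprints designed for substructure screening, if a query is a substructure of a molecule, every bit set in the query's fingerprint is also set in the molecule's fingerprint. The CPU sketch below assumes fingerprints stored as Python integers and uses hypothetical toy values; it illustrates the screening idea only and is not the authors' GPU implementation.

```python
from typing import Dict, List


def screen_candidates(query_fp: int, library: Dict[str, int]) -> List[str]:
    """Return IDs whose fingerprint has every bit that is set in the query fingerprint.

    This is only a prefilter: molecules that pass still need an exact
    substructure (subgraph isomorphism) check afterwards.
    """
    return [mol_id for mol_id, fp in library.items() if (fp & query_fp) == query_fp]


# Toy 8-bit fingerprints stored as Python ints (hypothetical values, not real molecules).
library = {
    "mol_a": 0b1010_0110,
    "mol_b": 0b0011_0010,
    "mol_c": 0b1111_1111,
}
query = 0b0011_0010

print(screen_candidates(query, library))  # ['mol_b', 'mol_c']
```

The bitwise test is cheap and independent per molecule, which is what makes this step a natural fit for massively parallel hardware; only the survivors go on to the expensive exact match.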
Affiliation(s)
- Andrew J Whitehouse
- Zifo Technologies Ltd, Office 7, 37-39 Shakespeare Street, Southport, Merseyside PR8 5AB, U.K
- Seyedeh Maryam Salehi
- Zifo Technologies Ltd, Office 7, 37-39 Shakespeare Street, Southport, Merseyside PR8 5AB, U.K
- Natalja Kurbatova
- Zifo Technologies Ltd, Office 7, 37-39 Shakespeare Street, Southport, Merseyside PR8 5AB, U.K
- Euan Dean
- Zifo Technologies Ltd, Office 7, 37-39 Shakespeare Street, Southport, Merseyside PR8 5AB, U.K
3
López-Pérez K, Kim TD, Miranda-Quintana RA. iSIM: instant similarity. Digital Discovery 2024; 3:1160-1171. [PMID: 38873032 PMCID: PMC11167700 DOI: 10.1039/d4dd00041b] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/06/2024] [Accepted: 05/06/2024] [Indexed: 06/15/2024]
Abstract
The quantification of molecular similarity has been part of cheminformatics since its beginning. Although several similarity indices and molecular representations have been reported, all of them ultimately reduce to calculating the similarity of only two objects at a time. Hence, obtaining the average similarity of a set of molecules requires computing all pairwise comparisons, so the computational cost scales quadratically with the number of molecules. Here we propose an exact alternative to this problem: iSIM (instant similarity). iSIM compares multiple molecules at the same time and yields the same value as the average of the pairwise comparisons of molecules represented by binary fingerprints and real-value descriptors. In this work, we introduce the mathematical framework and several applications of iSIM in chemical sampling, visualization, diversity selection, and clustering.
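The column-sum idea behind iSIM can be illustrated with the Russell-Rao index, where equality with the pairwise average follows from counting alone: a fingerprint column with k on-bits contributes C(k, 2) shared on-bits summed over all pairs. This is an illustrative sketch using plain 0/1 lists; it does not reproduce the paper's formulas for other indices such as Tanimoto, and the fingerprints are hypothetical.

```python
from itertools import combinations
from math import comb
from typing import Sequence


def isim_russell_rao(fps: Sequence[Sequence[int]]) -> float:
    """Average pairwise Russell-Rao similarity computed from column sums only (O(N*M))."""
    n, m = len(fps), len(fps[0])
    column_sums = [sum(fp[j] for fp in fps) for j in range(m)]
    # A column with k on-bits is shared by C(k, 2) pairs of molecules.
    shared_on_bits = sum(comb(k, 2) for k in column_sums)
    return shared_on_bits / (m * comb(n, 2))


def pairwise_russell_rao(fps: Sequence[Sequence[int]]) -> float:
    """Reference O(N^2 * M) average over all pairs, for comparison."""
    m = len(fps[0])
    sims = [sum(a & b for a, b in zip(x, y)) / m for x, y in combinations(fps, 2)]
    return sum(sims) / len(sims)


fps = [
    [1, 0, 1, 1],
    [1, 1, 0, 1],
    [0, 1, 1, 1],
]
print(isim_russell_rao(fps), pairwise_russell_rao(fps))  # 0.5 0.5
```

The first function touches each fingerprint once, while the second enumerates every pair, which is the quadratic cost the abstract refers to.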
Affiliation(s)
- Kenneth López-Pérez
- Department of Chemistry and Quantum Theory Project, University of Florida Gainesville Florida 32611 USA
- Taewon D Kim
- Department of Chemistry and Quantum Theory Project, University of Florida Gainesville Florida 32611 USA
4
Liu H, Chen P, Hu B, Wang S, Wang H, Luan J, Wang J, Lin B, Cheng M. FaissMolLib: An efficient and easy deployable tool for ligand-based virtual screening. Comput Biol Chem 2024; 110:108057. [PMID: 38581840 DOI: 10.1016/j.compbiolchem.2024.108057] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/23/2023] [Revised: 03/06/2024] [Accepted: 03/20/2024] [Indexed: 04/08/2024]
Abstract
Virtual screening based on molecular similarity and fingerprints is crucial in drug design, target prediction, and ADMET prediction, aiding in identifying potential hits and optimizing lead compounds. However, challenges such as the lack of comprehensive open-source molecular fingerprint databases and of efficient search methods for virtual screening are prevalent. To address these issues, we introduce FaissMolLib, an open-source virtual screening tool that integrates 2.8 million compounds from the ChEMBL and ZINC databases. Notably, FaissMolLib employs the highly efficient Faiss search algorithm, which outperforms the Tanimoto algorithm in identifying similar molecules, showing tighter clustering in scatter plots and a lower mean, standard deviation, and variance in key molecular properties. This enables FaissMolLib to screen 2.8 million compounds in just 0.05 seconds, offering researchers an efficient, easily deployable solution for virtual screening on laptops and for building unique compound databases. This advancement holds great potential for accelerating drug discovery efforts and enhancing chemical data analysis. FaissMolLib is freely available at http://liuhaihan.gnway.cc:80. The code and dataset of FaissMolLib are freely available at https://github.com/Superhaihan/FiassMolLib.
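As a point of reference for the comparison in this abstract, an exhaustive Tanimoto similarity search can be sketched in a few lines of standard-library Python. This is neither FaissMolLib's code nor the Faiss API; it is the brute-force baseline that indexing libraries aim to accelerate, with fingerprints assumed to be stored as Python integers and toy values that are purely hypothetical.

```python
import heapq
from typing import Dict, List, Tuple


def tanimoto(a: int, b: int) -> float:
    """Tanimoto (Jaccard) similarity of two bit-vector fingerprints stored as ints."""
    union = bin(a | b).count("1")
    return bin(a & b).count("1") / union if union else 1.0


def top_k_similar(query: int, library: Dict[str, int], k: int = 2) -> List[Tuple[str, float]]:
    """Exhaustive scan returning the k most similar library entries, best first."""
    scored = ((tanimoto(query, fp), mol_id) for mol_id, fp in library.items())
    return [(mol_id, score) for score, mol_id in heapq.nlargest(k, scored)]


# Hypothetical 4-bit toy fingerprints.
library = {"a": 0b1110, "b": 0b1100, "c": 0b0001}
print(top_k_similar(0b1110, library))  # [('a', 1.0), ('b', 0.6666666666666666)]
```

Every query here costs one similarity evaluation per library entry; approximate or vectorized indexes trade some of that exactness for speed at the multi-million-compound scale the abstract reports.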
Affiliation(s)
- Haihan Liu
- Key Laboratory of Structure-Based Drug Design & Discovery of Ministry of Education, Shenyang Pharmaceutical University, Shenyang 110016, People's Republic of China; Key Laboratory of Intelligent Drug Design and New Drug Discovery of Liaoning Province, Shenyang Pharmaceutical University, Shenyang 110016, People's Republic of China; School of Pharmaceutical Engineering, Shenyang Pharmaceutical University, Shenyang 110016, People's Republic of China
- Peiying Chen
- Key Laboratory of Structure-Based Drug Design & Discovery of Ministry of Education, Shenyang Pharmaceutical University, Shenyang 110016, People's Republic of China; Key Laboratory of Intelligent Drug Design and New Drug Discovery of Liaoning Province, Shenyang Pharmaceutical University, Shenyang 110016, People's Republic of China; School of Pharmaceutical Engineering, Shenyang Pharmaceutical University, Shenyang 110016, People's Republic of China
- Baichun Hu
- Key Laboratory of Structure-Based Drug Design & Discovery of Ministry of Education, Shenyang Pharmaceutical University, Shenyang 110016, People's Republic of China; Key Laboratory of Intelligent Drug Design and New Drug Discovery of Liaoning Province, Shenyang Pharmaceutical University, Shenyang 110016, People's Republic of China; School of Pharmaceutical Engineering, Shenyang Pharmaceutical University, Shenyang 110016, People's Republic of China
- Shizun Wang
- Key Laboratory of Structure-Based Drug Design & Discovery of Ministry of Education, Shenyang Pharmaceutical University, Shenyang 110016, People's Republic of China; Key Laboratory of Intelligent Drug Design and New Drug Discovery of Liaoning Province, Shenyang Pharmaceutical University, Shenyang 110016, People's Republic of China; School of Pharmaceutical Engineering, Shenyang Pharmaceutical University, Shenyang 110016, People's Republic of China
- Hanxun Wang
- Key Laboratory of Structure-Based Drug Design & Discovery of Ministry of Education, Shenyang Pharmaceutical University, Shenyang 110016, People's Republic of China; Key Laboratory of Intelligent Drug Design and New Drug Discovery of Liaoning Province, Shenyang Pharmaceutical University, Shenyang 110016, People's Republic of China; School of Pharmaceutical Engineering, Shenyang Pharmaceutical University, Shenyang 110016, People's Republic of China
- Jiasi Luan
- Key Laboratory of Structure-Based Drug Design & Discovery of Ministry of Education, Shenyang Pharmaceutical University, Shenyang 110016, People's Republic of China; Key Laboratory of Intelligent Drug Design and New Drug Discovery of Liaoning Province, Shenyang Pharmaceutical University, Shenyang 110016, People's Republic of China; School of Medical Devices, Shenyang Pharmaceutical University, Shenyang 110016, People's Republic of China
- Jian Wang
- Key Laboratory of Structure-Based Drug Design & Discovery of Ministry of Education, Shenyang Pharmaceutical University, Shenyang 110016, People's Republic of China; Key Laboratory of Intelligent Drug Design and New Drug Discovery of Liaoning Province, Shenyang Pharmaceutical University, Shenyang 110016, People's Republic of China; School of Pharmaceutical Engineering, Shenyang Pharmaceutical University, Shenyang 110016, People's Republic of China
- Bin Lin
- Key Laboratory of Structure-Based Drug Design & Discovery of Ministry of Education, Shenyang Pharmaceutical University, Shenyang 110016, People's Republic of China; Key Laboratory of Intelligent Drug Design and New Drug Discovery of Liaoning Province, Shenyang Pharmaceutical University, Shenyang 110016, People's Republic of China; School of Pharmaceutical Engineering, Shenyang Pharmaceutical University, Shenyang 110016, People's Republic of China
- Maosheng Cheng
- Key Laboratory of Structure-Based Drug Design & Discovery of Ministry of Education, Shenyang Pharmaceutical University, Shenyang 110016, People's Republic of China; Key Laboratory of Intelligent Drug Design and New Drug Discovery of Liaoning Province, Shenyang Pharmaceutical University, Shenyang 110016, People's Republic of China; School of Pharmaceutical Engineering, Shenyang Pharmaceutical University, Shenyang 110016, People's Republic of China
5
Krishnan SR, Bung N, Srinivasan R, Roy A. Target-specific novel molecules with their recipe: Incorporating synthesizability in the design process. J Mol Graph Model 2024; 129:108734. [PMID: 38442440 DOI: 10.1016/j.jmgm.2024.108734] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/03/2023] [Revised: 02/14/2024] [Accepted: 02/15/2024] [Indexed: 03/07/2024]
Abstract
The application of artificial intelligence (AI) in drug discovery has led to several success stories in recent years. While traditional methods mostly relied on screening large chemical libraries for early-stage drug design, de novo design can help identify novel target-specific molecules by sampling from a much larger chemical space. Although this has increased the possibility of finding diverse and novel molecules in previously unexplored chemical space, it has also posed a great challenge for medicinal chemists, who must synthesize at least some of the de novo designed molecules for experimental validation. To address this challenge, we propose a novel forward synthesis-based generative AI method for exploring synthesizable chemical space. The method uses a structure-based drug design framework in which the target protein structure and a target-specific seed fragment from co-crystal structures serve as the initial inputs. If a target-specific fragment is unavailable, a random fragment from a purchasable fragment library can be used instead. Template-based forward synthesis route prediction and molecule generation are then performed in parallel using Monte Carlo Tree Search (MCTS), in which the subsequent fragments for molecule growth are again drawn from a purchasable fragment library. The reward for each MCTS iteration is computed with a drug-target affinity (DTA) model based on the docking pose of the generated reaction intermediates at the binding site of the target protein. Because it can design novel, target-specific, synthesizable molecules, the proposed method overcomes one of the major obstacles facing AI-based drug design approaches.
Affiliation(s)
- Navneet Bung
- TCS Research (Life Sciences Division), Tata Consultancy Services Limited, Hyderabad, 500081, India
- Rajgopal Srinivasan
- TCS Research (Life Sciences Division), Tata Consultancy Services Limited, Hyderabad, 500081, India
- Arijit Roy
- TCS Research (Life Sciences Division), Tata Consultancy Services Limited, Hyderabad, 500081, India.
6
Thomas JR, Shelton C, Murphy J, Brittain S, Bray MA, Aspesi P, Concannon J, King FJ, Ihry RJ, Ho DJ, Henault M, Hadjikyriacou A, Neri M, Sigoillot FD, Pham HT, Shum M, Barys L, Jones MD, Martin EJ, Blechschmidt A, Rieffel S, Troxler TJ, Mapa FA, Jenkins JL, Jain RK, Kutchukian PS, Schirle M, Renner S. Enhancing the Small-Scale Screenable Biological Space beyond Known Chemogenomics Libraries with Gray Chemical Matter─Compounds with Novel Mechanisms from High-Throughput Screening Profiles. ACS Chem Biol 2024; 19:938-952. [PMID: 38565185 PMCID: PMC11040606 DOI: 10.1021/acschembio.3c00737] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/05/2023] [Revised: 02/28/2024] [Accepted: 03/01/2024] [Indexed: 04/04/2024]
Abstract
Phenotypic assays have become an established approach to drug discovery. Greater disease relevance is often achieved through cellular models with increased complexity and more detailed readouts, such as gene expression or advanced imaging. However, the intricate nature and cost of these assays limit their screening capacity, often restricting screens to small, well-characterized compound sets such as chemogenomics libraries. Here, we outline a cheminformatics approach to identify a small set of compounds with likely novel mechanisms of action (MoAs), expanding the MoA search space for throughput-limited phenotypic assays. Our approach is based on mining existing large-scale phenotypic high-throughput screening (HTS) data. It identifies chemotypes that exhibit selectivity across multiple cell-based assays and are characterized by persistent and broad structure-activity relationships (SAR). We validate the effectiveness of our approach in broad cellular profiling assays (Cell Painting, DRUG-seq, and Promotor Signature Profiling) and chemical proteomics experiments. These experiments revealed that the compounds behave similarly to known chemogenetic libraries, but with a notable bias toward novel protein targets. To foster collaboration and advance research in this area, we have curated a public set of such compounds based on the PubChem BioAssay dataset and made it available to the scientific community.
Affiliation(s)
- Jason R. Thomas
- Novartis Biomedical Research, Cambridge, Massachusetts 02139, United States
- Claude Shelton
- Novartis Biomedical Research, Cambridge, Massachusetts 02139, United States
- Jason Murphy
- Novartis Biomedical Research, Cambridge, Massachusetts 02139, United States
- Scott Brittain
- Novartis Biomedical Research, Cambridge, Massachusetts 02139, United States
- Mark-Anthony Bray
- Novartis Biomedical Research, Cambridge, Massachusetts 02139, United States
- Peter Aspesi
- Novartis Biomedical Research, Cambridge, Massachusetts 02139, United States
- John Concannon
- Novartis Biomedical Research, Cambridge, Massachusetts 02139, United States
- Frederick J. King
- Novartis Biomedical Research, San Diego, California 92121, United States
- Robert J. Ihry
- Novartis Biomedical Research, San Diego, California 92121, United States
- Daniel J. Ho
- Novartis Biomedical Research, San Diego, California 92121, United States
- Martin Henault
- Novartis Biomedical Research, Cambridge, Massachusetts 02139, United States
- Marilisa Neri
- Novartis Biomedical Research, Basel 4056, Switzerland
- Helen T. Pham
- Novartis Biomedical Research, Cambridge, Massachusetts 02139, United States
- Matthew Shum
- Novartis Biomedical Research, Cambridge, Massachusetts 02139, United States
- Louise Barys
- Novartis Biomedical Research, Basel 4056, Switzerland
- Michael D. Jones
- Novartis Biomedical Research, Cambridge, Massachusetts 02139, United States
- Eric J. Martin
- Novartis Biomedical Research, Emeryville, California 94608, United States
- Felipa A. Mapa
- Novartis Biomedical Research, Cambridge, Massachusetts 02139, United States
- Jeremy L. Jenkins
- Novartis Biomedical Research, Cambridge, Massachusetts 02139, United States
- Rishi K. Jain
- Novartis Biomedical Research, Cambridge, Massachusetts 02139, United States
- Markus Schirle
- Novartis Biomedical Research, Cambridge, Massachusetts 02139, United States
7
Marin E, Kovaleva M, Kadukova M, Mustafin K, Khorn P, Rogachev A, Mishin A, Guskov A, Borshchevskiy V. Regression-Based Active Learning for Accessible Acceleration of Ultra-Large Library Docking. J Chem Inf Model 2024; 64:2612-2623. [PMID: 38157481 PMCID: PMC11005039 DOI: 10.1021/acs.jcim.3c01661] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/13/2023] [Revised: 11/28/2023] [Accepted: 12/04/2023] [Indexed: 01/03/2024]
Abstract
Structure-based drug discovery is a process for both hit finding and optimization that relies on a validated three-dimensional model of a target biomolecule, used to rationalize the structure-function relationship for this particular target. Ultralarge virtual screening has recently emerged as an approach for the rapid discovery of high-affinity hit compounds, but it requires substantial computational resources. This study shows that active learning with simple linear regression models can accelerate virtual screening, retrieving up to 90% of the top 1% of the docking hit list after docking just 10% of the ligands. The results demonstrate that complex models, such as deep learning approaches, are unnecessary for predicting the imprecise results of ligand docking with a low sampling depth. Furthermore, we explore active learning meta-parameters and find that constant batch size models with a simple ensembling method provide the best ligand retrieval rate. Finally, our approach is validated on an ultralarge virtual screening data set, retrieving 70% of the top 0.05% of ligands after screening only 2% of the library. Altogether, this work provides a computationally accessible approach for accelerated virtual screening that can serve as a blueprint for the future design of low-compute agents that explore chemical space via large-scale accelerated docking. Given recent breakthroughs in protein structure prediction, this method can significantly increase accessibility for the academic community and aid in the rapid discovery of high-affinity hit compounds for various targets.
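A minimal sketch of greedy active learning with a linear model: dock a random batch, fit a ridge regression on the scores obtained so far, then dock the ligands with the best predicted scores, and repeat. Everything here is an assumption for illustration: the numeric featurization, the `dock` callable (a hypothetical stand-in for a real docking run), the lower-is-better score convention, and single-model greedy selection. The paper's features, batch schedules, and ensembling are not reproduced.

```python
import random
from typing import Callable, Dict, List, Sequence

Vector = List[float]


def fit_ridge(X: Sequence[Vector], y: Sequence[float], lam: float = 1e-3) -> Vector:
    """Ridge regression weights (bias first) via the normal equations and Gauss-Jordan elimination."""
    rows = [[1.0, *x] for x in X]  # prepend a bias column
    d = len(rows[0])
    # Augmented matrix [X^T X + lam * I | X^T y]
    A = [[sum(r[i] * r[j] for r in rows) + (lam if i == j else 0.0) for j in range(d)]
         + [sum(r[i] * t for r, t in zip(rows, y))] for i in range(d)]
    for col in range(d):
        pivot = max(range(col, d), key=lambda r: abs(A[r][col]))  # partial pivoting
        A[col], A[pivot] = A[pivot], A[col]
        p = A[col][col]
        A[col] = [v / p for v in A[col]]
        for r in range(d):
            if r != col:
                f = A[r][col]
                A[r] = [a - f * b for a, b in zip(A[r], A[col])]
    return [A[i][d] for i in range(d)]


def predict(w: Vector, x: Vector) -> float:
    return w[0] + sum(wi * xi for wi, xi in zip(w[1:], x))


def active_learning_screen(
    features: Sequence[Vector],
    dock: Callable[[int], float],  # hypothetical docking oracle: ligand index -> score
    batch_size: int,
    n_rounds: int,
    seed: int = 0,
) -> Dict[int, float]:
    """Return {ligand index: docking score} for every ligand that was actually docked."""
    rng = random.Random(seed)
    pool = list(range(len(features)))
    rng.shuffle(pool)
    scores = {i: dock(i) for i in pool[:batch_size]}  # round 0: random batch
    for _ in range(n_rounds - 1):
        ids = list(scores)
        w = fit_ridge([features[i] for i in ids], [scores[i] for i in ids])
        remaining = [i for i in range(len(features)) if i not in scores]
        remaining.sort(key=lambda i: predict(w, features[i]))  # best (lowest) predicted first
        for i in remaining[:batch_size]:
            scores[i] = dock(i)
    return scores
```

With `batch_size=25` and `n_rounds=4` on a 1,000-ligand pool, only 10% of the pool is docked, mirroring the budget in the abstract; the surrogate fit is a small linear solve, so its cost is negligible next to the docking calls it replaces.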
Affiliation(s)
- Egor Marin
- Research Center for Molecular Mechanisms of Aging and Age-related Diseases, Moscow Institute of Physics and Technology, Dolgoprudny 141701, Russia
- Margarita Kovaleva
- Research Center for Molecular Mechanisms of Aging and Age-related Diseases, Moscow Institute of Physics and Technology, Dolgoprudny 141701, Russia
- Maria Kadukova
- Research Center for Molecular Mechanisms of Aging and Age-related Diseases, Moscow Institute of Physics and Technology, Dolgoprudny 141701, Russia
- University Grenoble Alpes, Inria, CNRS, Grenoble INP, LJK, 38000 Grenoble, France
- Khalid Mustafin
- Research Center for Molecular Mechanisms of Aging and Age-related Diseases, Moscow Institute of Physics and Technology, Dolgoprudny 141701, Russia
- Polina Khorn
- Research Center for Molecular Mechanisms of Aging and Age-related Diseases, Moscow Institute of Physics and Technology, Dolgoprudny 141701, Russia
- Andrey Rogachev
- Research Center for Molecular Mechanisms of Aging and Age-related Diseases, Moscow Institute of Physics and Technology, Dolgoprudny 141701, Russia
- Joint Institute for Nuclear Research, Dubna 141980, Russian Federation
- Alexey Mishin
- Research Center for Molecular Mechanisms of Aging and Age-related Diseases, Moscow Institute of Physics and Technology, Dolgoprudny 141701, Russia
- Albert Guskov
- Groningen Biomolecular Sciences and Biotechnology Institute, University of Groningen, Nijenborgh 4, 9747 AG Groningen, The Netherlands
- Valentin Borshchevskiy
- Research Center for Molecular Mechanisms of Aging and Age-related Diseases, Moscow Institute of Physics and Technology, Dolgoprudny 141701, Russia
- Joint Institute for Nuclear Research, Dubna 141980, Russian Federation
8
Vogt M. Chemoinformatic approaches for navigating large chemical spaces. Expert Opin Drug Discov 2024; 19:403-414. [PMID: 38300511 DOI: 10.1080/17460441.2024.2313475] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/12/2023] [Accepted: 01/30/2024] [Indexed: 02/02/2024]
Abstract
INTRODUCTION Large chemical spaces (CSs) include traditional large compound collections, combinatorial libraries covering billions to trillions of molecules, DNA-encoded chemical libraries comprising complete combinatorial CSs in a single mixture, and virtual CSs explored by generative models. The diverse nature of these types of CSs requires different chemoinformatic approaches for navigation. AREAS COVERED An overview of different types of large CSs is provided. Molecular representations and similarity metrics suitable for large CS exploration are discussed. A summary of CS navigation in generative models is provided. Methods for characterizing and comparing CSs are discussed. EXPERT OPINION The size of large CSs might restrict navigation to specialized algorithms and limit it to neighborhoods of structurally similar molecules. Efficient navigation of large CSs requires not only methods that scale with size but also smart approaches that focus on better, but not necessarily larger, molecule selections. Deep generative models aim to provide such approaches by implicitly learning features relevant to targeted biological properties. It is unclear whether these models can fulfill this ideal, because validation remains difficult as long as the covered CSs stay mainly virtual, without experimental verification.
Affiliation(s)
- Martin Vogt
- Department of Life Science Informatics, B-IT, LIMES Program Unit Chemical Biology and Medicinal Chemistry, Rheinische Friedrich-Wilhelms-Universität, Bonn, Germany
9
Bachorz RA, Pastwińska J, Nowak D, Karaś K, Karwaciak I, Ratajewski M. The application of machine learning methods to the prediction of novel ligands for RORγ/RORγT receptors. Comput Struct Biotechnol J 2023; 21:5491-5505. [PMID: 38022699 PMCID: PMC10663739 DOI: 10.1016/j.csbj.2023.10.021] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/09/2023] [Revised: 10/11/2023] [Accepted: 10/11/2023] [Indexed: 12/01/2023]
Abstract
In this work, we developed and applied a computational procedure for creating and validating predictive models capable of estimating the biological activity of ligands. The combination of modern machine learning methods, experimental data, and an appropriate choice of molecular descriptors led to a set of well-performing models. We thoroughly explored both the methodological space and various options for constructing a chemical feature space. The resulting models were applied to the virtual screening of the ZINC20 database to identify new, biologically active ligands of RORγ receptors, a subfamily of nuclear receptors. Based on the known ligands of RORγ, we selected candidates and calculated their predicted activities with the best-performing models. Two candidates were chosen for experimental verification. One of them was confirmed to induce the biological activity of the RORγ receptors, which we consider proof of the efficacy of the proposed methodology.
Affiliation(s)
- Rafał A. Bachorz
- Institute of Medical Biology, Polish Academy of Sciences, Lodowa 106, Łódź, 93-232, Poland
- Joanna Pastwińska
- Institute of Medical Biology, Polish Academy of Sciences, Lodowa 106, Łódź, 93-232, Poland
- Damian Nowak
- Institute of Medical Biology, Polish Academy of Sciences, Lodowa 106, Łódź, 93-232, Poland
- Kaja Karaś
- Institute of Medical Biology, Polish Academy of Sciences, Lodowa 106, Łódź, 93-232, Poland
- Iwona Karwaciak
- Institute of Medical Biology, Polish Academy of Sciences, Lodowa 106, Łódź, 93-232, Poland
- Marcin Ratajewski
- Institute of Medical Biology, Polish Academy of Sciences, Lodowa 106, Łódź, 93-232, Poland
10
Sivula T, Yetukuri L, Kalliokoski T, Käsnänen H, Poso A, Pöhner I. Machine Learning-Boosted Docking Enables the Efficient Structure-Based Virtual Screening of Giga-Scale Enumerated Chemical Libraries. J Chem Inf Model 2023; 63:5773-5783. [PMID: 37655823 PMCID: PMC10523430 DOI: 10.1021/acs.jcim.3c01239] [Citation(s) in RCA: 5] [Impact Index Per Article: 5.0] [Received: 08/06/2023] [Indexed: 09/02/2023]
Abstract
The emergence of ultra-large screening libraries, filled to the brim with billions of readily available compounds, poses a growing challenge for docking-based virtual screening. Machine learning (ML)-boosted strategies such as the tool HASTEN combine rapid ML prediction with brute-force docking of small fractions of such libraries to increase screening throughput and take on giga-scale libraries. In our case study of an antibacterial chaperone and an antiviral kinase, we first generated a brute-force docking baseline for 1.56 billion compounds in the Enamine REAL lead-like library with the fast Glide high-throughput virtual screening protocol. With HASTEN, we observed robust recall of 90% of the true 1000 top-scoring virtual hits for both targets when docking only 1% of the entire library. This 99% reduction in the required docking experiments significantly shortens the screening time. For the kinase target, the use of a hydrogen bonding constraint resulted in a large proportion of unsuccessful docking attempts and hampered ML predictions. We demonstrate the potential for optimizing the treatment of failed compounds in ML-boosted screening, and we benchmark and showcase HASTEN as a fast and robust tool in a growing arsenal of approaches for unlocking the chemical space covered by giga-scale screening libraries in everyday drug discovery campaigns.
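The recall figure in this abstract compares the compounds docked in the ML-boosted run against the true top-scoring hits from the brute-force baseline. A minimal sketch of that metric, assuming lower docking scores are better and using hypothetical compound IDs and scores (this is not HASTEN's code):

```python
from typing import Dict, Iterable


def top_k_recall(true_scores: Dict[str, float], docked_ids: Iterable[str], k: int) -> float:
    """Fraction of the true top-k compounds (lowest scores = best) that were actually docked."""
    true_top = set(sorted(true_scores, key=true_scores.get)[:k])
    return len(true_top & set(docked_ids)) / k


# Hypothetical brute-force baseline scores and a partial (ML-selected) docking set.
baseline = {"cpd1": -9.1, "cpd2": -8.7, "cpd3": -7.0, "cpd4": -5.2}
print(top_k_recall(baseline, ["cpd1", "cpd3"], k=2))  # 0.5
```

At the scale in the abstract, docking 1% of 1.56 billion compounds means roughly 15.6 million docking runs, and a recall of 0.9 with k = 1000 corresponds to 900 of the baseline's top 1000 compounds being found.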
Affiliation(s)
- Toni Sivula
- School of Pharmacy, University of Eastern Finland, Kuopio FI-70211, Finland
- Tuomo Kalliokoski
- Computational Medicine Design, Orion Pharma, Orionintie 1A, Espoo FI-02101, Finland
- Heikki Käsnänen
- Computational Medicine Design, Orion Pharma, Orionintie 1A, Espoo FI-02101, Finland
- Antti Poso
- School of Pharmacy, University of Eastern Finland, Kuopio FI-70211, Finland
- Department of Pharmaceutical and Medicinal Chemistry, Institute of Pharmaceutical Sciences, Eberhard Karls University, Tübingen DE-72076, Germany
- Cluster of Excellence iFIT (EXC 2180) “Image-Guided and Functionally Instructed Tumor Therapies”, University of Tübingen, Tübingen DE-72076, Germany
- Tübingen Center for Academic Drug Discovery & Development (TüCAD2), Tübingen DE-72076, Germany
- Ina Pöhner
- School of Pharmacy, University of Eastern Finland, Kuopio FI-70211, Finland
11
Rajan K, Brinkhaus HO, Agea MI, Zielesny A, Steinbeck C. DECIMER.ai: an open platform for automated optical chemical structure identification, segmentation and recognition in scientific publications. Nat Commun 2023; 14:5045. [PMID: 37598180 PMCID: PMC10439916 DOI: 10.1038/s41467-023-40782-0] [Citation(s) in RCA: 5] [Impact Index Per Article: 5.0] [Received: 03/07/2023] [Accepted: 08/09/2023] [Indexed: 08/21/2023]
Abstract
The number of publications describing chemical structures has increased steadily over the last decades. However, the majority of published chemical information is currently not available in machine-readable form in public databases. It remains a challenge to automate the process of information extraction in a way that requires less manual intervention, especially the mining of chemical structure depictions. As an open-source platform that leverages recent advancements in deep learning, computer vision, and natural language processing, DECIMER.ai (Deep lEarning for Chemical IMagE Recognition) strives to automatically segment, classify, and translate chemical structure depictions from the printed literature. The segmentation and classification tools are the only openly available packages of their kind, and the optical chemical structure recognition (OCSR) core application yields outstanding performance on all benchmark datasets. The source code, the trained models and the datasets developed in this work have been published under permissive licences. An instance of the DECIMER web application is available at https://decimer.ai.
Affiliation(s)
- Kohulan Rajan
- Institute for Inorganic and Analytical Chemistry, Friedrich Schiller University Jena, Lessingstr. 8, 07743, Jena, Germany
- Henning Otto Brinkhaus
- Institute for Inorganic and Analytical Chemistry, Friedrich Schiller University Jena, Lessingstr. 8, 07743, Jena, Germany
- M Isabel Agea
- Department of Informatics and Chemistry, Faculty of Chemical Technology, University of Chemistry and Technology Prague, Technicka 5, 166 28, Prague, Czech Republic
- Achim Zielesny
- Institute for Bioinformatics and Chemoinformatics, Westphalian University of Applied Sciences, August-Schmidt-Ring 10, 45665, Recklinghausen, Germany
- Christoph Steinbeck
- Institute for Inorganic and Analytical Chemistry, Friedrich Schiller University Jena, Lessingstr. 8, 07743, Jena, Germany
12
Jung S, Vatheuer H, Czodrowski P. VSFlow: an open-source ligand-based virtual screening tool. J Cheminform 2023; 15:40. [PMID: 37004101 PMCID: PMC10064649 DOI: 10.1186/s13321-023-00703-1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/24/2022] [Accepted: 02/18/2023] [Indexed: 04/03/2023]
Abstract
Ligand-based virtual screening is a widespread method in modern drug design. It allows rapid screening of large compound databases to identify similar structures. Here we report an open-source command-line tool that supports substructure-, fingerprint- and shape-based virtual screening. Most of the implemented features rely entirely on the RDKit cheminformatics framework. VSFlow accepts a wide range of input file formats and is highly customizable. Additionally, the screening results can be quickly visualized as PDF and/or PyMOL files.
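Fingerprint-based screening of this kind typically ranks library compounds by Tanimoto similarity to a query. The minimal sketch below uses plain Python sets of on-bit indices instead of RDKit fingerprint objects, so it shows only the scoring logic; `tanimoto` and `screen` are illustrative names, not VSFlow functions, and the 0.5 default threshold is an arbitrary assumption.

```python
from typing import Dict, FrozenSet, List, Tuple


def tanimoto(a: FrozenSet[int], b: FrozenSet[int]) -> float:
    """Tanimoto similarity of two fingerprints given as sets of on-bit indices."""
    if not a and not b:
        return 0.0  # convention chosen here for two empty fingerprints
    shared = len(a & b)
    return shared / (len(a) + len(b) - shared)


def screen(query: FrozenSet[int], library: Dict[str, FrozenSet[int]],
           threshold: float = 0.5) -> List[Tuple[str, float]]:
    """Return (name, similarity) for library entries at or above threshold, best first."""
    scored = [(name, tanimoto(query, fp)) for name, fp in library.items()]
    hits = [(name, sim) for name, sim in scored if sim >= threshold]
    return sorted(hits, key=lambda hit: hit[1], reverse=True)


query = frozenset({1, 2, 3, 4})
library = {
    "A": frozenset({1, 2, 3, 4}),  # identical bits -> 1.0
    "B": frozenset({1, 2, 3, 5}),  # 3 shared / 5 in union -> 0.6
    "C": frozenset({7, 8}),        # nothing shared -> 0.0
}
print(screen(query, library))  # [('A', 1.0), ('B', 0.6)]
```

In practice the on-bit sets would come from a fingerprint generator such as RDKit's Morgan fingerprints; only the ranking step is shown here.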
Affiliation(s)
- Sascha Jung
- Department of Chemistry and Chemical Biology, TU Dortmund University, Otto-Hahn-Straße 6, 44227 Dortmund, Germany
- Helge Vatheuer
- Department of Chemistry and Chemical Biology, TU Dortmund University, Otto-Hahn-Straße 6, 44227 Dortmund, Germany
- Paul Czodrowski
- Department of Chemistry, Johannes Gutenberg University Mainz, Duesbergweg 10-14, 55128 Mainz, Germany
13
Atas Guvenilir H, Doğan T. How to approach machine learning-based prediction of drug/compound-target interactions. J Cheminform 2023; 15:16. [PMID: 36747300 PMCID: PMC9901167 DOI: 10.1186/s13321-023-00689-w] [Citation(s) in RCA: 3] [Impact Index Per Article: 3.0] [Received: 09/02/2022] [Accepted: 01/30/2023] [Indexed: 02/08/2023]
Abstract
The identification of drug/compound-target interactions (DTIs) constitutes the basis of drug discovery, for which computational predictive approaches have been developed. As a relatively new data-driven paradigm, proteochemometric (PCM) modeling utilizes both protein and compound properties as a pair at the input level and processes them via statistical/machine learning. The representation of input samples (i.e., proteins and their ligands) in the form of quantitative feature vectors is crucial for the extraction of interaction-related properties during the artificial learning and subsequent prediction of DTIs. Lately, the representation learning approach, in which input samples are automatically featurized via training and applying a machine/deep learning model, has been utilized in biomedical sciences. In this study, we performed a comprehensive investigation of different computational approaches/techniques for protein featurization (including both conventional approaches and the novel learned embeddings), data preparation and exploration, machine learning-based modeling, and performance evaluation with the aim of achieving better data representations and more successful learning in DTI prediction. For this, we first constructed realistic and challenging benchmark datasets on small, medium, and large scales to be used as reliable gold standards for specific DTI modeling tasks. We developed and applied a network analysis-based splitting strategy to divide datasets into structurally different training and test folds. Using these datasets together with various featurization methods, we trained and tested DTI prediction models and evaluated their performance from different angles. 
Our main findings can be summarized in three points: (i) randomly splitting datasets into train and test folds leads to near-complete data memorization and highly over-optimistic results, and should therefore be avoided; (ii) learned protein sequence embeddings work well in DTI prediction and offer high potential, even though interaction-related properties (e.g., structures) of proteins are not used during their self-supervised model training; and (iii) during the learning process, PCM models tend to rely heavily on compound features while partially ignoring protein features, primarily due to the inherent bias in DTI data, indicating the need for new and unbiased datasets. We hope this study will aid researchers in designing robust and high-performing data-driven DTI prediction systems that have real-world translational value in drug discovery.
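Finding (i) argues against random splits. One way to keep structurally related compounds on the same side of a split is to treat compound similarity as a graph and assign whole connected components to a fold. The sketch below is a simplified stand-in for the authors' network analysis-based strategy, assuming similarity edges (for example, compound pairs above some fingerprint-similarity cutoff) have already been computed; all names are hypothetical.

```python
import random
from typing import Dict, Iterable, List, Set, Tuple


def connected_components(nodes: Iterable[str], edges: Iterable[Tuple[str, str]]) -> List[Set[str]]:
    """Union-find grouping of compounds linked directly or transitively by similarity edges."""
    parent = {n: n for n in nodes}

    def find(x: str) -> str:
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving keeps trees shallow
            x = parent[x]
        return x

    for a, b in edges:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_a] = root_b
    groups: Dict[str, Set[str]] = {}
    for n in parent:
        groups.setdefault(find(n), set()).add(n)
    return list(groups.values())


def component_split(compounds: List[str], edges: List[Tuple[str, str]],
                    test_fraction: float = 0.2, seed: int = 0) -> Tuple[Set[str], Set[str]]:
    """Fill the test fold with whole components until it reaches test_fraction; rest go to train."""
    components = connected_components(compounds, edges)
    random.Random(seed).shuffle(components)
    target = test_fraction * len(compounds)
    train: Set[str] = set()
    test: Set[str] = set()
    for comp in components:
        (test if len(test) < target else train).update(comp)
    return train, test


compounds = ["a", "b", "c", "d", "e"]
edges = [("a", "b"), ("c", "d")]  # a~b and c~d are structurally similar
train, test = component_split(compounds, edges, test_fraction=0.4)
print(sorted(train), sorted(test))
```

Every compound-target pair then inherits the fold of its compound. Large components can make the test fold overshoot `test_fraction`, which is the price of never splitting a component.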
Affiliation(s)
- Heval Atas Guvenilir
- Biological Data Science Laboratory, Department of Computer Engineering, Hacettepe University, Ankara, Turkey
- Department of Health Informatics, Graduate School of Informatics, METU, Ankara, Turkey
- Tunca Doğan
- Biological Data Science Laboratory, Department of Computer Engineering, Hacettepe University, Ankara, Turkey
- Institute of Informatics, Hacettepe University, Ankara, Turkey
- Department of Bioinformatics, Graduate School of Health Sciences, Hacettepe University, Ankara, Turkey
14
Lehtola S, Karttunen AJ. Free and open source software for computational chemistry education. WIREs Computational Molecular Science 2022. [DOI: 10.1002/wcms.1610] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Indexed: 01/04/2023]
Affiliation(s)
- Susi Lehtola
- Molecular Sciences Software Institute, Blacksburg, Virginia, USA
- Antti J. Karttunen
- Department of Chemistry and Materials Science, Aalto University, Espoo, Finland
15
Warr WA, Nicklaus MC, Nicolaou CA, Rarey M. Exploration of Ultralarge Compound Collections for Drug Discovery. J Chem Inf Model 2022; 62:2021-2034. [PMID: 35421301 DOI: 10.1021/acs.jcim.2c00224] [Citation(s) in RCA: 46] [Impact Index Per Article: 23.0] [Indexed: 02/07/2023]
Abstract
Designing new medicines more cheaply and quickly is tightly linked to the quest to explore chemical space more widely and efficiently. Chemical space is monumentally large, but recent advances in computer software and hardware have enabled researchers to navigate virtual chemical spaces containing billions of chemical structures. This review specifically concerns collections of many millions or even billions of enumerated chemical structures, as well as even larger chemical spaces that are not fully enumerated. We present examples of chemical libraries and spaces and the means used to construct them, and we discuss new technologies for searching huge libraries and for searching combinatorially in chemical space. We also cover chemical-space navigation techniques and consider new approaches to de novo drug design and the impact of the "autonomous laboratory" on the synthesis of designed compounds. Finally, we summarize some other challenges and opportunities for the future.
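The distinction between enumerated libraries and combinatorial spaces that are not fully enumerated can be illustrated with a toy space defined by reactions and their building-block lists. Under that assumption, the size of the space is a sum of products that can be computed without listing a single product, while individual members are generated lazily only when needed. The reaction and reagent names below are placeholders, and real spaces also apply reaction rules and compatibility filters, so a count like this is an upper bound.

```python
from itertools import islice, product
from math import prod
from typing import Dict, Iterator, List, Tuple

# Hypothetical toy space: each reaction combines one building block from each reagent list.
SPACE: Dict[str, List[List[str]]] = {
    "amide_coupling": [["acid1", "acid2", "acid3"], ["amine1", "amine2"]],
    "suzuki_coupling": [["boronic1", "boronic2"], ["halide1", "halide2", "halide3", "halide4"]],
}


def space_size(space: Dict[str, List[List[str]]]) -> int:
    """Count products as a sum over reactions of the product of reagent-list sizes."""
    return sum(prod(len(reagents) for reagents in lists) for lists in space.values())


def iter_members(space: Dict[str, List[List[str]]]) -> Iterator[Tuple[str, Tuple[str, ...]]]:
    """Lazily yield (reaction, building-block combination) pairs, one at a time."""
    for reaction, lists in space.items():
        for combo in product(*lists):
            yield reaction, combo


print(space_size(SPACE))                     # 3*2 + 2*4 = 14
print(list(islice(iter_members(SPACE), 2)))  # first two members only
```

With thousands of building blocks per reagent list, the same arithmetic quickly reaches billions, which is why such spaces are searched combinatorially rather than enumerated.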
Affiliation(s)
- Wendy A Warr
- Wendy Warr & Associates, 6 Berwick Court, Holmes Chapel, Crewe, Cheshire CW4 7HZ, United Kingdom
- Marc C Nicklaus
- NCI, NIH, CADD Group, NCI-Frederick, Frederick, Maryland 21702, United States
- Christos A Nicolaou
- Discovery Chemistry, Lilly Research Laboratories, Eli Lilly and Company, Indianapolis, Indiana 46285, United States
- Matthias Rarey
- Universität Hamburg, ZBH Center for Bioinformatics, 20146 Hamburg, Germany
16
Machine Learning-Based Retention Time Prediction of Trimethylsilyl Derivatives of Metabolites. Biomedicines 2022; 10:biomedicines10040879. [PMID: 35453629 PMCID: PMC9024754 DOI: 10.3390/biomedicines10040879] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Received: 03/17/2022] [Revised: 04/04/2022] [Accepted: 04/06/2022] [Indexed: 11/16/2022]
Abstract
In gas chromatography–mass spectrometry-based untargeted metabolomics, metabolites are identified by comparing mass spectra and chromatographic retention time with reference databases or standard materials. To this end, machine learning has been used to predict the retention time of metabolites lacking reference data. However, retention time prediction for trimethylsilyl derivatives of metabolites, which are typically analyzed by gas chromatography in untargeted metabolomics, has been poorly explored. Here, we provide a rationalized framework for machine learning-based retention time prediction of trimethylsilyl derivatives of metabolites in gas chromatography. We compared different machine learning paradigms and explored how the computational molecular structure representation used to train the prediction models (fingerprint class and fingerprint calculation software) influences performance. We tested predicted retention times with chemical ionization and electron impact ionization sources in simulated and real cases: machine learning ranked correct identities well, although its power to filter out false identities was limited when a spectrum or a monoisotopic mass matched multiple candidates. Specifically, machine learning prediction yielded median absolute and relative retention index (relative retention time) errors of 37.1 retention index units and 2%, respectively. In addition, fingerprint class and fingerprint calculation software, as well as the molecular structural similarity between the training set and the test or real-case sets, proved to be critical modulators of prediction performance. Finally, we leveraged the structural similarity between the training set and the test or real-case sets to determine the probability that the prediction error is below a specific threshold.
Overall, our study demonstrates that predicted retention time can provide insights into the true structure of unknown metabolites by ranking candidate identities from most to least plausible, and it sets guidelines for assessing confidence in metabolite identification using predicted retention time data.
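Retention index units place an analyte on a scale defined by an n-alkane ladder, with consecutive n-alkanes 100 units apart, so a median error of 37.1 units is a little over a third of that spacing. The abstract does not state which index definition was used; the sketch below assumes the linear (van den Dool and Kratz) index commonly used for temperature-programmed GC, and computes the median absolute and relative errors of the kind reported above. Function names and the alkane retention times are illustrative.

```python
from bisect import bisect_right
from statistics import median
from typing import Dict, List


def linear_retention_index(t_x: float, alkane_rts: Dict[int, float]) -> float:
    """Linear (van den Dool and Kratz) retention index for temperature-programmed GC.

    alkane_rts maps n-alkane carbon number -> retention time (strictly increasing).
    """
    carbons = sorted(alkane_rts)
    times = [alkane_rts[c] for c in carbons]
    if len(times) < 2 or not times[0] <= t_x <= times[-1]:
        raise ValueError("need at least two alkanes bracketing the analyte")
    i = bisect_right(times, t_x) - 1   # last alkane eluting at or before t_x
    i = min(i, len(times) - 2)         # at the last alkane, reuse the final bracket
    n, m = carbons[i], carbons[i + 1]
    t_n, t_m = times[i], times[i + 1]
    # Interpolate linearly between the bracketing alkanes (100 units per carbon).
    return 100 * (n + (m - n) * (t_x - t_n) / (t_m - t_n))


def error_summary(predicted: List[float], observed: List[float]) -> Dict[str, float]:
    """Median absolute error (RI units) and median relative error (fraction)."""
    abs_err = [abs(p - o) for p, o in zip(predicted, observed)]
    rel_err = [abs(p - o) / o for p, o in zip(predicted, observed)]
    return {"median_abs": median(abs_err), "median_rel": median(rel_err)}


ladder = {10: 5.0, 11: 6.0, 12: 7.5}          # hypothetical retention times (min)
print(linear_retention_index(6.75, ladder))   # halfway between C11 and C12 -> 1150.0
print(error_summary([1000, 1210, 1500], [1020, 1200, 1500]))
```

Using the carbon-number difference `m - n` lets the same formula handle ladders with gaps (for example, even-numbered alkanes only).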
17
Doğan T, Akhan Güzelcan E, Baumann M, Koyas A, Atas H, Baxendale IR, Martin M, Cetin-Atalay R. Protein domain-based prediction of drug/compound-target interactions and experimental validation on LIM kinases. PLoS Comput Biol 2021; 17:e1009171. [PMID: 34843456 PMCID: PMC8659301 DOI: 10.1371/journal.pcbi.1009171] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 06/06/2021] [Revised: 12/09/2021] [Accepted: 11/09/2021] [Indexed: 12/23/2022]
Abstract
Predictive approaches such as virtual screening have been used in drug discovery with the objective of reducing development time and costs. Current machine learning and network-based approaches have issues related to generalization, usability, or model interpretability, especially due to the complexity of target proteins' structure/function and bias in training datasets. Here, we propose a new method, "DRUIDom" (DRUg Interacting Domain prediction), that identifies bio-interactions between drug candidate compounds and targets by utilizing the domain modularity of proteins, to overcome problems associated with current approaches. DRUIDom is composed of two methodological steps. First, ligands/compounds are statistically mapped to structural domains of their target proteins, with the aim of identifying their interactions. Accordingly, other proteins containing the same mapped domain or domain pair become new candidate targets for the corresponding compounds. Next, a million-scale dataset of small molecule compounds, including those mapped to domains in the previous step, is clustered based on molecular similarity, and the domain associations are propagated to other compounds within the same clusters. Experimentally verified bioactivity data points, obtained from public databases, are meticulously filtered to construct datasets of active/interacting and inactive/non-interacting drug/compound-target pairs (~2.9M data points), and used as training data for calculating parameters of compound-domain mappings, which led to 27,032 high-confidence associations between 250 domains and 8,165 compounds, and a finalized output of ~5 million new compound-protein interactions. DRUIDom is experimentally validated by syntheses and bioactivity analyses of compounds predicted to target LIM-kinase proteins, which play critical roles in the regulation of cell motility, cell cycle progression, and differentiation through actin filament dynamics.
We showed that LIMK-inhibitor-2 and its derivatives significantly block cancer cell migration through inhibition of LIMK phosphorylation and the downstream protein cofilin. One of the derivative compounds (LIMKi-2d) was identified as a promising candidate due to its action on resistant Mahlavu liver cancer cells. The results demonstrated that DRUIDom can be exploited to identify drug candidate compounds for intended targets and to predict new target proteins based on the defined compound-domain relationships. Datasets, results, and the source code of DRUIDom are fully available at: https://github.com/cansyl/DRUIDom.
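The two DRUIDom steps described above (mapping compounds to domains, then propagating those associations within similarity clusters and on to other domain-containing proteins) can be sketched with plain dictionaries. This is a simplified illustration under stated assumptions: cluster labels and compound-domain mappings are given as input, only single domains (not domain pairs) are handled, and the statistical mapping itself is omitted. Compound and domain labels are hypothetical.

```python
from typing import Dict, Set


def propagate_domains(compound_domains: Dict[str, Set[str]],
                      clusters: Dict[str, int]) -> Dict[str, Set[str]]:
    """Give each clustered compound the union of domains mapped to any member of its cluster."""
    cluster_domains: Dict[int, Set[str]] = {}
    for compound, domains in compound_domains.items():
        if compound in clusters:
            cluster_domains.setdefault(clusters[compound], set()).update(domains)
    return {compound: set(cluster_domains.get(label, set()))
            for compound, label in clusters.items()}


def candidate_targets(compound_domains: Dict[str, Set[str]],
                      protein_domains: Dict[str, Set[str]]) -> Dict[str, Set[str]]:
    """Proteins containing any domain associated with a compound become candidate targets."""
    return {compound: {protein for protein, pdoms in protein_domains.items() if domains & pdoms}
            for compound, domains in compound_domains.items()}


# Hypothetical inputs: cpdA was mapped to a kinase domain in step 1.
mapped = {"cpdA": {"kinase_domain"}}
clusters = {"cpdA": 1, "cpdB": 1, "cpdC": 2}   # cpdB is similar to cpdA; cpdC is not
proteins = {"LIMK1": {"LIM", "PDZ", "kinase_domain"}, "protX": {"GPCR_7TM"}}

propagated = propagate_domains(mapped, clusters)
print(candidate_targets(propagated, proteins))
```

Here cpdB picks up LIMK1 as a candidate target only through its cluster membership, which is the effect the propagation step is meant to have.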
Affiliation(s)
- Tunca Doğan
- Department of Computer Engineering, Hacettepe University, Ankara, Turkey
- Institute of Informatics, Hacettepe University, Ankara, Turkey
- CanSyL, Graduate School of Informatics, Middle East Technical University, Ankara, Turkey
- Ece Akhan Güzelcan
- CanSyL, Graduate School of Informatics, Middle East Technical University, Ankara, Turkey
- Center for Genomics and Rare Diseases & Biobank for Rare Diseases, Hacettepe University, Ankara, Turkey
- Marcus Baumann
- School of Chemistry, University College Dublin, Dublin, Ireland
- Altay Koyas
- CanSyL, Graduate School of Informatics, Middle East Technical University, Ankara, Turkey
- Heval Atas
- CanSyL, Graduate School of Informatics, Middle East Technical University, Ankara, Turkey
- Ian R. Baxendale
- Department of Chemistry, University of Durham, Durham, United Kingdom
- Maria Martin
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, Cambridge, United Kingdom
- Rengul Cetin-Atalay
- CanSyL, Graduate School of Informatics, Middle East Technical University, Ankara, Turkey
- Section of Pulmonary and Critical Care Medicine, University of Chicago, Chicago, Illinois, United States of America
18
Sharma S, Arya A, Cruz R, Cleaves II HJ. Automated Exploration of Prebiotic Chemical Reaction Space: Progress and Perspectives. Life (Basel) 2021; 11:1140. [PMID: 34833016 PMCID: PMC8624352 DOI: 10.3390/life11111140] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 09/07/2021] [Revised: 10/15/2021] [Accepted: 10/18/2021] [Indexed: 12/12/2022]
Abstract
Prebiotic chemistry often involves the study of complex systems of chemical reactions that form large networks of many diverse species. Such complex systems may have given rise to emergent phenomena that ultimately led to the origin of life on Earth. The environmental conditions and processes involved in this emergence may not be fully reproducible, making it difficult for experimentalists to study prebiotic systems in laboratory simulations. Computational chemistry offers efficient ways to study such chemical systems and to identify those most likely to display complex properties associated with life. Here, we review tools and techniques for modelling prebiotic chemical reaction networks and outline possible ways to identify self-replicating features that are central to many origin-of-life models.
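Automated exploration of a reaction network often proceeds generation by generation: apply reaction rules to every combination of known species, add any new products, and repeat until nothing new appears or a generation limit is reached. The sketch below shows that loop with a deliberately toy ligation rule on strings standing in for oligomers; it is a generic illustration under that assumption, not the algorithm of any specific tool covered by the review.

```python
from typing import Callable, Iterable, List, Set, Tuple

Reaction = Tuple[Tuple[str, str], str]  # ((reactant_a, reactant_b), product)


def ligation(a: str, b: str, max_len: int = 3) -> List[str]:
    """Toy rule: join two 'oligomers' end to end if the result has at most max_len units."""
    return [a + b] if len(a) + len(b) <= max_len else []


def expand_network(seeds: Iterable[str], rule: Callable[[str, str], List[str]],
                   max_generations: int) -> Tuple[Set[str], Set[Reaction]]:
    """Apply a binary rule to all ordered species pairs, one generation at a time."""
    species: Set[str] = set(seeds)
    reactions: Set[Reaction] = set()
    for _ in range(max_generations):
        new_species: Set[str] = set()
        for a in sorted(species):
            for b in sorted(species):
                for product in rule(a, b):
                    reactions.add(((a, b), product))
                    if product not in species:
                        new_species.add(product)
        if not new_species:
            break  # closure reached: no rule application yields anything new
        species |= new_species
    return species, reactions


species, reactions = expand_network({"A", "B"}, ligation, max_generations=10)
print(len(species), len(reactions))
```

Even this toy network grows combinatorially (2 seeds give 14 species and 20 reactions before the length cap stops it), which is why real explorations rely on rule filters and generation limits.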
Affiliation(s)
- Siddhant Sharma
- Blue Marble Space Institute of Science, Seattle, WA 98154, USA
- Department of Biochemistry, Deshbandhu College, University of Delhi, New Delhi 110019, India
- Department of Chemistry and Chemical Engineering, Chalmers University of Technology, SE-412 96 Gothenburg, Sweden
- Aayush Arya
- Blue Marble Space Institute of Science, Seattle, WA 98154, USA
- Department of Physics, Lovely Professional University, Jalandhar-Delhi GT Road, Phagwara 144001, India
- Romulo Cruz
- Blue Marble Space Institute of Science, Seattle, WA 98154, USA
- Big Data Laboratory, Information and Communications Technology Center (CTIC), National University of Engineering, Amaru 210, Lima 15333, Peru
- Henderson James Cleaves II
- Blue Marble Space Institute of Science, Seattle, WA 98154, USA
- Earth-Life Science Institute, Tokyo Institute of Technology, Tokyo 152-8550, Japan
19
Mathai N, Stork C, Kirchmair J. BonMOLière: Small-Sized Libraries of Readily Purchasable Compounds, Optimized to Produce Genuine Hits in Biological Screens across the Protein Space. Int J Mol Sci 2021; 22:ijms22157773. [PMID: 34360558 PMCID: PMC8346018 DOI: 10.3390/ijms22157773] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 06/24/2021] [Revised: 07/13/2021] [Accepted: 07/15/2021] [Indexed: 12/21/2022]
Abstract
Experimental screening of large sets of compounds against macromolecular targets is a key strategy to identify novel bioactivities. However, large-scale screening requires substantial experimental resources and is time-consuming and challenging. Therefore, small to medium-sized compound libraries with a high chance of producing genuine hits on an arbitrary protein of interest would be of great value to fields related to early drug discovery, in particular biochemical and cell research. Here, we present a computational approach that incorporates drug-likeness, predicted bioactivities, biological space coverage, and target novelty, to generate optimized compound libraries with maximized chances of producing genuine hits for a wide range of proteins. The computational approach evaluates drug-likeness with a set of established rules, predicts bioactivities with a validated, similarity-based approach, and optimizes the composition of small sets of compounds towards maximum target coverage and novelty. We found that, in comparison to the random selection of compounds for a library, our approach generates substantially improved compound sets. Quantified as the "fitness" of compound libraries, the calculated improvements ranged from +60% (for a library of 15,000 compounds) to +184% (for a library of 1000 compounds). The best of the optimized compound libraries prepared in this work are available for download as a dataset bundle ("BonMOLière").
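The library-composition step can be illustrated with the classic greedy heuristic for maximum coverage: repeatedly add the compound whose predicted targets add the most not-yet-covered proteins. This is a swapped-in, simplified technique; the published approach optimizes a broader fitness that also weighs drug-likeness and target novelty, and the predicted target sets here are assumed inputs with hypothetical names.

```python
from typing import Dict, List, Set


def greedy_library(predicted_targets: Dict[str, Set[str]], size: int) -> List[str]:
    """Greedy maximum coverage: pick up to `size` compounds covering the most targets."""
    chosen: List[str] = []
    covered: Set[str] = set()
    remaining = dict(predicted_targets)
    while remaining and len(chosen) < size:
        # Compound adding the most uncovered targets; ties broken by name order.
        best = max(sorted(remaining), key=lambda c: len(remaining[c] - covered))
        if not remaining[best] - covered:
            break  # no remaining compound adds new targets
        chosen.append(best)
        covered |= remaining.pop(best)
    return chosen


predicted = {
    "cpd1": {"T1", "T2"},
    "cpd2": {"T2", "T3", "T4"},
    "cpd3": {"T1"},
    "cpd4": {"T5"},
}
print(greedy_library(predicted, size=3))  # ['cpd2', 'cpd1', 'cpd4']
```

Greedy selection is a standard approximation for maximum coverage, guaranteed to reach at least 1 - 1/e of the optimal coverage; whether the authors used it is not stated in the abstract.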
Affiliation(s)
- Neann Mathai
- Computational Biology Unit (CBU) and Department of Chemistry, University of Bergen, N-5020 Bergen, Norway
- Conrad Stork
- Center for Bioinformatics (ZBH), Department of Informatics, Universität Hamburg, 20146 Hamburg, Germany
- Johannes Kirchmair
- Computational Biology Unit (CBU) and Department of Chemistry, University of Bergen, N-5020 Bergen, Norway
- Division of Pharmaceutical Chemistry, Department of Pharmaceutical Sciences, University of Vienna, 1090 Vienna, Austria
20
Accelerating Population Count with a Hardware Co-Processor for MicroBlaze. Journal of Low Power Electronics and Applications 2021. [DOI: 10.3390/jlpea11020020] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Indexed: 11/17/2022]
Abstract
This paper proposes a Field-Programmable Gate Array (FPGA)-based hardware accelerator that assists the embedded MicroBlaze soft-core processor in calculating population count (the number of set bits in a binary word). Population count is frequently needed in cyber-physical systems and can be applied to large data sets, for example in molecular similarity search in cheminformatics or in computations performed by binarized neural networks. The MicroBlaze instruction set architecture (ISA) does not support this operation natively, so the count has to be realized either as a sequence of native instructions (in software) or in parallel in a dedicated hardware accelerator. Different hardware accelerator architectures are analyzed and compared with one another and with a software implementation of population count on MicroBlaze. Experimental results with large vector lengths (up to 2^17) demonstrate that the best hardware accelerator with DMA (Direct Memory Access) is ~31 times faster than the best software version running on MicroBlaze. The proposed architectures are scalable and can easily be adjusted to both smaller and larger input vector lengths. The entire system was implemented and tested on a Nexys-4 prototyping board containing a low-cost/low-power Artix-7 FPGA.
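Because MicroBlaze lacks a native population-count instruction, a software version must emulate it with ordinary instructions. Two classic emulations are sketched below in Python for readability: a bit-by-bit loop, and Kernighan's method, which clears the lowest set bit on each iteration and therefore loops once per set bit. The Tanimoto helper shows why popcount matters for the molecular similarity search mentioned above. These are illustrations, not the paper's MicroBlaze code.

```python
def popcount_loop(x: int) -> int:
    """Count set bits by testing the lowest bit and shifting right, once per bit position."""
    if x < 0:
        raise ValueError("expects a non-negative integer")
    count = 0
    while x:
        count += x & 1
        x >>= 1
    return count


def popcount_kernighan(x: int) -> int:
    """Count set bits by clearing the lowest set bit; loops once per set bit."""
    if x < 0:
        raise ValueError("expects a non-negative integer")
    count = 0
    while x:
        x &= x - 1
        count += 1
    return count


def tanimoto_bits(a: int, b: int) -> float:
    """Tanimoto similarity of two fingerprints packed into integers."""
    union = popcount_kernighan(a | b)
    return popcount_kernighan(a & b) / union if union else 0.0


print(popcount_loop(0b1011), popcount_kernighan(0b1011))  # 3 3
print(tanimoto_bits(0b1110, 0b0111))                      # 2 shared / 4 in union -> 0.5
```

A hardware accelerator removes the per-bit or per-set-bit loop entirely by counting many bits in parallel, which is where the reported speed-up comes from.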
21
Čmelo I, Voršilák M, Svozil D. Profiling and analysis of chemical compounds using pointwise mutual information. J Cheminform 2021; 13:3. [PMID: 33423694 PMCID: PMC7798221 DOI: 10.1186/s13321-020-00483-y] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 10/05/2020] [Accepted: 12/24/2020] [Indexed: 12/21/2022]
Abstract
Pointwise mutual information (PMI) is a measure of association used in information theory. In this paper, PMI is used to characterize several publicly available databases (DrugBank, ChEMBL, PubChem and ZINC) in terms of the association strength between compound structural features, resulting in database PMI interrelation profiles. Substructure fragments obtained by encoding individual compounds as MACCS, PubChemKey and ECFP fingerprints serve as structural features. The analysis of publicly available databases reveals, in accord with other studies, unusual properties of DrugBank compounds, which further confirms the validity of the PMI profiling approach. Z-standardized relative feature tightness (ZRFT), a PMI-derived measure that quantifies how well a given compound's feature combinations fit those in a particular compound set, is applied to the analysis of compound synthetic accessibility (SA), as well as to the classification of compounds as easy (ES) and hard (HS) to synthesize. ZRFT value distributions are compared with those of SYBA and SAScore. The analysis of ZRFT values of structurally complex compounds in the SAVI database reveals oligopeptide structures that are mispredicted by SAScore as HS, while correctly predicted by ZRFT and SYBA as ES. Compared to SAScore, SYBA and random forest, ZRFT predictions are less accurate, though by a narrow margin (AccZRFT = 94.5%, AccSYBA = 98.8%, AccSAScore = 99.0%, AccRF = 97.3%). However, ZRFT's ability to distinguish between ES and HS compounds is surprisingly high considering that, while SYBA, SAScore and random forest are dedicated SA models, ZRFT is a generic measure that merely quantifies the strength of interrelations between structural feature pairs. The results presented in the current work indicate that structural feature co-occurrence, quantified by PMI or ZRFT, contains a significant amount of information relevant to the physico-chemical properties of organic compounds.
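For two structural features x and y, PMI compares how often they occur together with what independence would predict: PMI(x, y) = log[p(x, y) / (p(x) p(y))]. The sketch below estimates the probabilities as fractions of compounds containing each feature or feature pair, with compounds represented as sets of feature labels; the base-2 logarithm and the function name are assumptions, and the ZRFT standardization is not reproduced.

```python
import math
from itertools import combinations
from typing import Dict, FrozenSet, Iterable, Tuple


def pmi_profile(compounds: Iterable[FrozenSet[str]]) -> Dict[Tuple[str, str], float]:
    """PMI (base 2) for every feature pair that co-occurs in at least one compound."""
    comps = list(compounds)
    n = len(comps)
    single: Dict[str, int] = {}
    pair: Dict[Tuple[str, str], int] = {}
    for feats in comps:
        for f in feats:
            single[f] = single.get(f, 0) + 1
        for a, b in combinations(sorted(feats), 2):  # sorted -> canonical pair order
            pair[(a, b)] = pair.get((a, b), 0) + 1
    # PMI(x, y) = log2( p(x, y) / (p(x) * p(y)) ), probabilities = fraction of compounds
    return {
        (a, b): math.log2((count / n) / ((single[a] / n) * (single[b] / n)))
        for (a, b), count in pair.items()
    }


# Hypothetical compounds described by substructure-feature labels.
compounds = [frozenset({"x", "y"}), frozenset({"x", "y"}), frozenset({"x"}), frozenset({"z"})]
profile = pmi_profile(compounds)
print(round(profile[("x", "y")], 3))  # log2(0.5 / (0.75 * 0.5)) = log2(4/3)
```

Positive values mean two features co-occur more often than independence predicts, negative values less often; pairs that never co-occur (PMI of minus infinity) are simply omitted here.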
Affiliation(s)
- I. Čmelo
- CZ-OPENSCREEN National Infrastructure for Chemical Biology, Department of Informatics and Chemistry, Faculty of Chemical Technology, University of Chemistry and Technology Prague, Technická 5, 166 28 Prague, Czech Republic
- M. Voršilák
- CZ-OPENSCREEN National Infrastructure for Chemical Biology, Department of Informatics and Chemistry, Faculty of Chemical Technology, University of Chemistry and Technology Prague, Technická 5, 166 28 Prague, Czech Republic
- CZ-OPENSCREEN National Infrastructure for Chemical Biology, Institute of Molecular Genetics of the ASCR v. v. i., Vídeňská 1083, 142 20 Prague 4, Czech Republic
- D. Svozil
- CZ-OPENSCREEN National Infrastructure for Chemical Biology, Department of Informatics and Chemistry, Faculty of Chemical Technology, University of Chemistry and Technology Prague, Technická 5, 166 28 Prague, Czech Republic
- CZ-OPENSCREEN National Infrastructure for Chemical Biology, Institute of Molecular Genetics of the ASCR v. v. i., Vídeňská 1083, 142 20 Prague 4, Czech Republic
22
Zhu CJ, Song M, Liu Q, Becquey C, Bi J. Benchmark on Indexing Algorithms for Accelerating Molecular Similarity Search. J Chem Inf Model 2020; 60:6167-6184. [PMID: 33095006 DOI: 10.1021/acs.jcim.0c00393] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Indexed: 01/04/2023]
Abstract
Structurally similar analogues of given query compounds can be rapidly retrieved from chemical databases by molecular similarity search. However, the computational cost of an exhaustive similarity search over a large compound database is high. Although the latest indexing algorithms can greatly speed up the search process, they are not readily applicable to molecular similarity search because they lack an implementation of the Tanimoto similarity metric. In this paper, we first implement Python or C++ code to enable Tanimoto similarity search via several recent indexing algorithms, such as Hnsw and Onng. Moreover, there is increasing interest in computational communities in developing robust benchmarking systems to assess the performance of various computational algorithms. Here, we provide a benchmark to evaluate the molecular similarity search performance of these recent indexing algorithms. To avoid potential package dependency issues, two separate benchmarks are built on currently popular container technologies, Docker and Singularity. Singularity is a relatively new container framework designed specifically for high-performance computing (HPC) platforms; it needs neither privileged permissions nor a separate daemon process. Both benchmarks are extensible to incorporate other new indexing algorithms, benchmarking data sets, and customized parameter settings. Our results demonstrate that graph-based methods, such as Hnsw and Onng, consistently achieve the best trade-off between search effectiveness and search efficiency. The source code of the entire benchmark system can be downloaded from https://github.uconn.edu/mldrugdiscovery/MssBenchmark.
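The graph-based indexes benchmarked here (Hnsw, Onng) are approximate nearest-neighbour methods. A simpler, exact way to avoid scoring every fingerprint exploits a well-known bound on Tanimoto similarity: T(a, b) ≤ min(|a|, |b|) / max(|a|, |b|), where |a| is the number of set bits. Sorting the database by popcount then lets a threshold search skip every fingerprint whose bit count is out of range. This is a swapped-in illustration of pruning, not one of the benchmarked algorithms; fingerprints are assumed to be Python ints and the class name is hypothetical.

```python
import math
from bisect import bisect_left, bisect_right
from typing import List, Tuple


def popcount(x: int) -> int:
    return bin(x).count("1")


def tanimoto(a: int, b: int) -> float:
    union = popcount(a | b)
    return popcount(a & b) / union if union else 0.0


class PopcountIndex:
    """Exact threshold search over fingerprints stored as Python ints."""

    def __init__(self, fingerprints: List[int]) -> None:
        # (popcount, original position, fingerprint), sorted by popcount.
        self.entries = sorted((popcount(fp), i, fp) for i, fp in enumerate(fingerprints))
        self.counts = [c for c, _, _ in self.entries]

    def search(self, query: int, threshold: float) -> List[Tuple[int, float]]:
        if not 0 < threshold <= 1:
            raise ValueError("threshold must be in (0, 1]")
        q = popcount(query)
        # Tanimoto <= min(q, c) / max(q, c), so only popcounts c in [t*q, q/t] can
        # reach the threshold. A small epsilon keeps the window slightly loose to
        # absorb float rounding; exact scores are checked below anyway.
        lo = math.ceil(threshold * q - 1e-9)
        hi = math.floor(q / threshold + 1e-9)
        start = bisect_left(self.counts, lo)
        stop = bisect_right(self.counts, hi)
        hits = []
        for _, i, fp in self.entries[start:stop]:
            s = tanimoto(query, fp)
            if s >= threshold:
                hits.append((i, s))
        return sorted(hits, key=lambda h: (-h[1], h[0]))


fingerprints = [0b1111, 0b1110, 0b1, 0b11110000]
index = PopcountIndex(fingerprints)
print(index.search(0b1111, threshold=0.7))  # [(0, 1.0), (1, 0.75)]
```

In this example the single-bit fingerprint is never scored, because its popcount of 1 falls outside the window [3, 5] implied by a 0.7 threshold.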
23
Tomberg A, Boström J. Can easy chemistry produce complex, diverse, and novel molecules? Drug Discov Today 2020; 25:2174-2181. [DOI: 10.1016/j.drudis.2020.09.027] [Citation(s) in RCA: 17] [Impact Index Per Article: 4.3] [Received: 07/05/2020] [Revised: 08/27/2020] [Accepted: 09/25/2020] [Indexed: 11/24/2022]
24
Dalke A. Correction to: The chemfp project. J Cheminform 2020. [PMCID: PMC7523378 DOI: 10.1186/s13321-020-00459-y] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 12/02/2022]