1
Martínez-Urrutia F, Medina-Franco JL. BIOMX-DB: A web application for the BIOFACQUIM natural product database. Mol Inform 2024:e202400060. [PMID: 38837557 DOI: 10.1002/minf.202400060] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/16/2024] [Revised: 03/23/2024] [Accepted: 03/30/2024] [Indexed: 06/07/2024]
Abstract
Natural product databases are an integral part of chemoinformatics and computer-aided drug design. Despite their pivotal role, few projects in Latin America, particularly in Mexico, provide accessible tools of this nature. Herein, we introduce BIOMX-DB, an open and freely accessible web-based database designed to address this gap. BIOMX-DB enhances the features of the existing Mexican natural product database, BIOFACQUIM, by incorporating advanced search, filtering, and download capabilities. The user-friendly interface of BIOMX-DB aims to provide an intuitive experience for researchers. BIOMX-DB is freely available at www.biomx-db.com.
Affiliation(s)
- Fernando Martínez-Urrutia
- DIFACQUIM Research Group, Department of Pharmacy, School of Chemistry, Universidad Nacional Autónoma de México, Mexico City, Mexico
- José L Medina-Franco
- DIFACQUIM Research Group, Department of Pharmacy, School of Chemistry, Universidad Nacional Autónoma de México, Mexico City, Mexico
2
Xing H, Cai P, Liu D, Han M, Liu J, Le Y, Zhang D, Hu QN. High-throughput prediction of enzyme promiscuity based on substrate-product pairs. Brief Bioinform 2024; 25:bbae089. [PMID: 38487850 PMCID: PMC10940840 DOI: 10.1093/bib/bbae089] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/30/2023] [Revised: 01/20/2024] [Accepted: 02/03/2024] [Indexed: 03/18/2024] Open
Abstract
The screening of enzymes for catalyzing specific substrate-product pairs is often constrained in the realms of metabolic engineering and synthetic biology. Existing tools based on substrate and reaction similarity predominantly rely on prior knowledge, demonstrating limited extrapolative capabilities and an inability to incorporate custom candidate-enzyme libraries. Addressing these limitations, we have developed the Substrate-product Pair-based Enzyme Promiscuity Prediction (SPEPP) model. This innovative approach utilizes transfer learning and transformer architecture to predict enzyme promiscuity, thereby elucidating the intricate interplay between enzymes and substrate-product pairs. SPEPP exhibited robust predictive ability, eliminating the need for prior knowledge of reactions and allowing users to define their own candidate-enzyme libraries. It can be seamlessly integrated into various applications, including metabolic engineering, de novo pathway design, and hazardous material degradation. To better assist metabolic engineers in designing and refining biochemical pathways, particularly those without programming skills, we also designed EnzyPick, an easy-to-use web server for enzyme screening based on SPEPP. EnzyPick is accessible at http://www.biosynther.com/enzypick/.
Affiliation(s)
- Huadong Xing
- CAS Key Laboratory of Computational Biology, CAS Key Laboratory of Nutrition, Metabolism and Food Safety, Shanghai Institute of Nutrition and Health, University of Chinese Academy of Sciences, Chinese Academy of Sciences, Shanghai 200031, China
- Pengli Cai
- CAS Key Laboratory of Computational Biology, CAS Key Laboratory of Nutrition, Metabolism and Food Safety, Shanghai Institute of Nutrition and Health, University of Chinese Academy of Sciences, Chinese Academy of Sciences, Shanghai 200031, China
- Dongliang Liu
- CAS Key Laboratory of Computational Biology, CAS Key Laboratory of Nutrition, Metabolism and Food Safety, Shanghai Institute of Nutrition and Health, University of Chinese Academy of Sciences, Chinese Academy of Sciences, Shanghai 200031, China
- Mengying Han
- CAS Key Laboratory of Computational Biology, CAS Key Laboratory of Nutrition, Metabolism and Food Safety, Shanghai Institute of Nutrition and Health, University of Chinese Academy of Sciences, Chinese Academy of Sciences, Shanghai 200031, China
- Juan Liu
- Institute of Artificial Intelligence, School of Computer Science, Wuhan University, Wuhan 430072, China
- Yingying Le
- CAS Key Laboratory of Computational Biology, CAS Key Laboratory of Nutrition, Metabolism and Food Safety, Shanghai Institute of Nutrition and Health, University of Chinese Academy of Sciences, Chinese Academy of Sciences, Shanghai 200031, China
- Dachuan Zhang
- Institute of Environmental Engineering, ETH Zurich, Laura-Hezner-Weg 7, 8093 Zurich, Switzerland
- Qian-Nan Hu
- CAS Key Laboratory of Computational Biology, CAS Key Laboratory of Nutrition, Metabolism and Food Safety, Shanghai Institute of Nutrition and Health, University of Chinese Academy of Sciences, Chinese Academy of Sciences, Shanghai 200031, China
3
Wang R, Feng H, Wei GW. ChatGPT in Drug Discovery: A Case Study on Anticocaine Addiction Drug Development with Chatbots. J Chem Inf Model 2023; 63:7189-7209. [PMID: 37956228 PMCID: PMC11021135 DOI: 10.1021/acs.jcim.3c01429] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Indexed: 11/15/2023]
Abstract
The birth of ChatGPT, a cutting-edge language model-based chatbot developed by OpenAI, ushered in a new era in AI. However, due to potential pitfalls, its role in rigorous scientific research is not clear yet. This paper vividly showcases its innovative application within the field of drug discovery. Focused specifically on developing anticocaine addiction drugs, the study employs GPT-4 as a virtual guide, offering strategic and methodological insights to researchers working on generative models for drug candidates. The primary objective is to generate optimal drug-like molecules with desired properties. By leveraging the capabilities of ChatGPT, the study introduces a novel approach to the drug discovery process. This symbiotic partnership between AI and researchers transforms how drug development is approached. Chatbots become facilitators, steering researchers toward innovative methodologies and productive paths for creating effective drug candidates. This research sheds light on the collaborative synergy between human expertise and AI assistance, wherein ChatGPT's cognitive abilities enhance the design and development of pharmaceutical solutions. This paper not only explores the integration of advanced AI in drug discovery but also reimagines the landscape by advocating for AI-powered chatbots as trailblazers in revolutionizing therapeutic innovation.
Affiliation(s)
- Rui Wang
- Department of Mathematics, Michigan State University, East Lansing, Michigan 48824, United States
- Hongsong Feng
- Department of Mathematics, Michigan State University, East Lansing, Michigan 48824, United States
- Guo-Wei Wei
- Department of Mathematics, Michigan State University, East Lansing, Michigan 48824, United States
- Department of Biochemistry and Molecular Biology, Michigan State University, East Lansing, Michigan 48824, United States
- Department of Electrical and Computer Engineering, Michigan State University, East Lansing, Michigan 48824, United States
4
Probst D. An explainability framework for deep learning on chemical reactions exemplified by enzyme-catalysed reaction classification. J Cheminform 2023; 15:113. [PMID: 37996942 PMCID: PMC10668483 DOI: 10.1186/s13321-023-00784-y] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/22/2023] [Accepted: 11/13/2023] [Indexed: 11/25/2023] Open
Abstract
Assigning or proposing a catalysing enzyme given a chemical or biochemical reaction is of great interest to life sciences and chemistry alike. The exploration and design of metabolic pathways and the challenge of finding more sustainable enzyme-catalysed alternatives to traditional organic reactions are just two examples of tasks that require an association between reaction and enzyme. However, given the lack of large and balanced annotated data sets of enzyme-catalysed reactions, assigning an enzyme to a reaction still relies on expert-curated rules and databases. Here, we present a data-driven explainable human-in-the-loop machine learning approach to support and ultimately automate the association of a catalysing enzyme with a given biochemical reaction. In addition, the proposed method is capable of predicting enzymes as candidate catalysts for organic reactions amenable to biocatalysis. Finally, the introduced explainability and visualisation methods can easily be generalised to support other machine-learning approaches involving chemical and biochemical reactions.
Affiliation(s)
- Daniel Probst
- Signal Processing Laboratory 2, Institute of Electrical and Micro Engineering, School of Engineering, EPFL, Rte Cantonale, 1015, Lausanne, Vaud, Switzerland.
5
Rodrigues CHM, Ascher DB. CSM-Potential2: A comprehensive deep learning platform for the analysis of protein interacting interfaces. Proteins 2023. [PMID: 37870486 DOI: 10.1002/prot.26615] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 06/16/2023] [Revised: 10/04/2023] [Accepted: 10/05/2023] [Indexed: 10/24/2023]
Abstract
Proteins are molecular machinery that participate in virtually all essential biological functions within the cell, which are tightly related to their 3D structure. The importance of understanding the protein structure-function relationship is highlighted by the exponential growth of experimental structures, which has been greatly expanded by recent breakthroughs in protein structure prediction, most notably RosettaFold and AlphaFold2. These advances have prompted the development of several computational approaches that leverage these data sources to explore potential biological interactions. However, most methods are generally limited to analysis of single types of interactions, such as protein-protein or protein-ligand interactions, and their complexity limits their usability to expert users. Here we report CSM-Potential2, a deep learning platform for the analysis of binding interfaces on protein structures. In addition to prediction of protein-protein interaction binding sites and classification of biological ligands, our new platform incorporates prediction of interactions with nucleic acids at the residue level and allows for ligand transplantation based on sequence and structure similarity to experimentally determined structures. We anticipate our platform to be a valuable resource that provides easy access to a range of state-of-the-art methods to expert and non-expert users for the study of biological interactions. Our tool is freely available as an easy-to-use web server and API at https://biosig.lab.uq.edu.au/csm_potential.
Affiliation(s)
- Carlos H M Rodrigues
- Computational Biology and Clinical Informatics, Baker Heart and Diabetes Institute, Melbourne, Victoria, Australia
- School of Chemistry and Molecular Biosciences, University of Queensland, Brisbane, Queensland, Australia
- David B Ascher
- Computational Biology and Clinical Informatics, Baker Heart and Diabetes Institute, Melbourne, Victoria, Australia
- School of Chemistry and Molecular Biosciences, University of Queensland, Brisbane, Queensland, Australia
6
Diedrich K, Krause B, Berg O, Rarey M. PoseEdit: enhanced ligand binding mode communication by interactive 2D diagrams. J Comput Aided Mol Des 2023; 37:491-503. [PMID: 37515714 PMCID: PMC10440272 DOI: 10.1007/s10822-023-00522-4] [Citation(s) in RCA: 7] [Impact Index Per Article: 7.0] [Received: 06/16/2023] [Accepted: 07/13/2023] [Indexed: 07/31/2023]
Abstract
In this article, we present PoseEdit, a new, interactive frontend of the popular pose visualization tool PoseView. PoseEdit automatically produces high-quality 2D diagrams of intermolecular interactions in 3D binding sites calculated from ligands in complex with protein, DNA, and RNA. The PoseView diagrams have been improved in several aspects, most notably in their interactivity. Thanks to the easy-to-use 2D editor of PoseEdit, the diagrams are extensively editable and extendible by the user, can be merged with other diagrams, and even be created from scratch. A large variety of graphical objects in the diagram can be moved, rotated, selected and highlighted, mirrored, removed, or even newly added. Furthermore, PoseEdit enables a synchronized 2D-3D view of macromolecule-ligand complexes simplifying the analysis of structural features and interactions. The representation of individual diagram objects regarding their visualized chemical properties, like stereochemistry, and general graphical styles, like the color of interactions, can additionally be edited. The primary objective of PoseEdit is to support scientists with an enhanced way to communicate ligand binding mode information through graphical 2D representations optimized with the scientist's input in accordance with objective criteria and individual needs. PoseEdit is freely available on the ProteinsPlus web server ( https://proteins.plus ).
Affiliation(s)
- Konrad Diedrich
- Universität Hamburg, ZBH-Center for Bioinformatics, 20146, Hamburg, Germany
- Bennet Krause
- Universität Hamburg, ZBH-Center for Bioinformatics, 20146, Hamburg, Germany
- Capgemini, 10785, Berlin, Germany
- Ole Berg
- Universität Hamburg, ZBH-Center for Bioinformatics, 20146, Hamburg, Germany
- Matthias Rarey
- Universität Hamburg, ZBH-Center for Bioinformatics, 20146, Hamburg, Germany.
7
Rath S, Panda S, Sacchettini JC, Berthel SJ. DAIKON: A Data Acquisition, Integration, and Knowledge Capture Web Application for Target-Based Drug Discovery. ACS Pharmacol Transl Sci 2023; 6:1043-1051. [PMID: 37470023 PMCID: PMC10353056 DOI: 10.1021/acsptsci.3c00034] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/19/2023] [Indexed: 07/21/2023]
Abstract
Primitive data organization practices struggle to deliver at the scale and consistency required to meet multidisciplinary collaborations in drug discovery. For effective data sharing and coordination, a unified platform that can collect and analyze scientific information is essential. We present DAIKON, an open-source framework that integrates targets, screens, and hits, and manages projects within a target-based drug discovery portfolio. Its knowledge capture components enable teams to record subsequent molecules as their properties improve, facilitate team collaboration through discussion threads, and include modules that visually illustrate the progress of each target as it advances through the pipeline. It serves as a repository for scientists sourcing data from Mycobrowser, UniProt, and PDB. The goal is to globalize several variations of the drug-discovery program without compromising local aspects of specific workflows. DAIKON is modularized by abstracting the database and creating separate layers for entities, business logic, infrastructure, APIs, and frontend, with each tier allowing for extensions. Using Docker, the framework is packaged into two solutions: daikon-server-core and daikon-client. Organizations may deploy the project to on-premises servers or a VPC. Active-Directory/SSO is supported for user administration. End users can access the application with a web browser. Currently, DAIKON is implemented in the TB Drug Accelerator program (TBDA).
Affiliation(s)
- Siddhant Rath
- Department of Biochemistry & Biophysics, Texas A&M University, College Station, Texas 77843, United States
- Saswati Panda
- Department of Biochemistry & Biophysics, Texas A&M University, College Station, Texas 77843, United States
- James C. Sacchettini
- Department of Biochemistry & Biophysics, Texas A&M University, College Station, Texas 77843, United States
8
Design of New Dispersants Using Machine Learning and Visual Analytics. Polymers (Basel) 2023; 15:polym15051324. [PMID: 36904566 PMCID: PMC10007083 DOI: 10.3390/polym15051324] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/26/2023] [Revised: 02/23/2023] [Accepted: 02/25/2023] [Indexed: 03/09/2023] Open
Abstract
Artificial intelligence (AI) is an emerging technology that is revolutionizing the discovery of new materials. One key application of AI is virtual screening of chemical libraries, which enables the accelerated discovery of materials with desired properties. In this study, we developed computational models to predict the dispersancy efficiency of oil and lubricant additives, a critical property in their design that can be estimated through a quantity named blotter spot. We propose a comprehensive approach that combines machine learning techniques with visual analytics strategies in an interactive tool that supports domain experts' decision-making. We evaluated the proposed models quantitatively and illustrated their benefits through a case study. Specifically, we analyzed a series of virtual polyisobutylene succinimide (PIBSI) molecules derived from a known reference substrate. Our best-performing probabilistic model was Bayesian Additive Regression Trees (BART), which achieved a mean absolute error of 5.50±0.34 and a root mean square error of 7.56±0.47, as estimated through 5-fold cross-validation. To facilitate future research, we have made the dataset, including the potential dispersants used for modeling, publicly available. Our approach can help accelerate the discovery of new oil and lubricant additives, and our interactive tool can aid domain experts in making informed decisions based on blotter spot and other key properties.
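As a rough illustration of how the two error metrics quoted in this abstract can be estimated with 5-fold cross-validation, the sketch below computes the mean absolute error and root mean square error per fold and reports their mean and standard deviation. It assumes the "±" values are spreads across folds, which the abstract does not state. The mean-predicting baseline, the function names, and the toy numbers are hypothetical stand-ins; they are not the BART model or the dataset from the study.

```python
import math
import statistics


def mae(y_true, y_pred):
    """Mean absolute error: average of |y - y_hat|."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def rmse(y_true, y_pred):
    """Root mean square error: square root of the average squared residual."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def k_fold_indices(n_samples, k=5):
    """Split indices 0..n_samples-1 into k interleaved, disjoint test folds."""
    return [list(range(start, n_samples, k)) for start in range(k)]


def cross_validate(xs, ys, fit, k=5):
    """Return ((mean MAE, std MAE), (mean RMSE, std RMSE)) across k folds."""
    maes, rmses = [], []
    for test_idx in k_fold_indices(len(ys), k):
        held_out = set(test_idx)
        train_x = [x for i, x in enumerate(xs) if i not in held_out]
        train_y = [y for i, y in enumerate(ys) if i not in held_out]
        predict = fit(train_x, train_y)  # train only on the other k-1 folds
        y_true = [ys[i] for i in test_idx]
        y_pred = [predict(xs[i]) for i in test_idx]
        maes.append(mae(y_true, y_pred))
        rmses.append(rmse(y_true, y_pred))
    return ((statistics.mean(maes), statistics.stdev(maes)),
            (statistics.mean(rmses), statistics.stdev(rmses)))


def fit_mean_baseline(train_x, train_y):
    """Hypothetical stand-in model: always predicts the training mean."""
    mean_y = statistics.mean(train_y)
    return lambda x: mean_y


# Synthetic toy values, for illustration only (not data from the study).
xs = list(range(10))
ys = [float(v) for v in (3, 5, 4, 6, 8, 7, 9, 11, 10, 12)]
(mae_mean, mae_std), (rmse_mean, rmse_std) = cross_validate(xs, ys, fit_mean_baseline)
print(f"MAE  {mae_mean:.2f} +/- {mae_std:.2f}")
print(f"RMSE {rmse_mean:.2f} +/- {rmse_std:.2f}")
```

Because squaring weights large residuals more heavily, RMSE is never smaller than MAE on the same predictions, which is consistent with the ordering of the two values reported above.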
9
XSMILES: interactive visualization for molecules, SMILES and XAI attribution scores. J Cheminform 2023; 15:2. [PMID: 36609340 PMCID: PMC9817292 DOI: 10.1186/s13321-022-00673-w] [Citation(s) in RCA: 5] [Impact Index Per Article: 5.0] [Received: 09/30/2022] [Accepted: 12/17/2022] [Indexed: 01/09/2023] Open
Abstract
BACKGROUND Explainable artificial intelligence (XAI) methods have shown increasing applicability in chemistry. In this context, visualization techniques can highlight regions of a molecule to reveal their influence over a predicted property. For this purpose, some XAI techniques calculate attribution scores associated with tokens of SMILES strings or with atoms of a molecule. While an association of a score with an atom can be directly visually represented on a molecule diagram, scores computed for SMILES non-atom tokens cannot. For instance, a substring [N+] contains 3 non-atom tokens, i.e., [, +, and ], and their attributions, depending on the model, are not necessarily revealing an influence of the nitrogen atom over the predicted property; for that reason, it is not possible to represent the scores on a molecule diagram. Moreover, SMILES notation is complex, foregrounding the need for techniques to facilitate the analysis of explanations associated with its tokens. RESULTS We propose XSMILES, an interactive visualization technique, to explore explainable artificial intelligence attribution scores and support the interpretation of SMILES. Users can input any type of score attributed to atom and non-atom tokens and visualize them on top of a 2D molecule diagram coordinated with a bar chart that represents a SMILES string. We demonstrate how attributions calculated for SMILES strings can be evaluated and better interpreted through interactivity with two use cases. CONCLUSIONS Data scientists can use XSMILES to understand their models' behavior and compare multiple modeling approaches. The tool provides a set of parameters to adapt the visualization to users' needs and it can be integrated into different platforms. We believe XSMILES can support data scientists to develop, improve, and communicate their models by making it easier to identify patterns and compare attributions through interactive exploratory visualization.
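The atom versus non-atom distinction described in this abstract can be made concrete with a small sketch. It assumes a deliberately simplified, character-level tokenizer (only Cl and Br are treated as two-letter symbols, and every letter counts as an atom, so hydrogens inside brackets are also labelled atoms). The pattern and function names are hypothetical and are not taken from XSMILES or from any particular model's tokenizer.

```python
import re

# Hypothetical, simplified tokenizer: match the two-letter halogens first,
# then fall back to one character per token.
TOKEN_PATTERN = re.compile(r"Cl|Br|.")


def tokenize(smiles):
    """Split a SMILES string into character-level tokens."""
    return TOKEN_PATTERN.findall(smiles)


def is_atom_token(token):
    """Letters denote atoms (lowercase = aromatic); everything else is syntax."""
    return token.isalpha()


def split_tokens(smiles):
    """Return (atom tokens, non-atom tokens) in their original order."""
    tokens = tokenize(smiles)
    atoms = [t for t in tokens if is_atom_token(t)]
    non_atoms = [t for t in tokens if not is_atom_token(t)]
    return atoms, non_atoms


atoms, non_atoms = split_tokens("[N+]")
print(atoms)      # ['N']
print(non_atoms)  # ['[', '+', ']']
```

A score attached to `N` maps onto a drawn atom, whereas scores attached to `[`, `+`, and `]` have no atom to sit on, which is the gap the bar-chart view of the SMILES string is meant to cover.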
10
Allard PM, Gaudry A, Quirós-Guerrero LM, Rutz A, Dounoue-Kubo M, Walker TWN, Defossez E, Long C, Grondin A, David B, Wolfender JL. Open and reusable annotated mass spectrometry dataset of a chemodiverse collection of 1,600 plant extracts. Gigascience 2022; 12:giac124. [PMID: 36649739 PMCID: PMC9845059 DOI: 10.1093/gigascience/giac124] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 05/18/2022] [Revised: 09/15/2022] [Accepted: 11/29/2022] [Indexed: 01/19/2023] Open
Abstract
As privileged structures, natural products often display potent biological activities. However, the discovery of novel bioactive scaffolds is often hampered by the chemical complexity of the biological matrices they are found in. Large natural extract collections are thus extremely valuable for their chemical novelty potential but also complicated to exploit in the frame of drug-discovery projects. In the end, it is the pure chemical substances that are desired for structural determination purposes and bioactivity evaluation. Researchers interested in the exploration of large and chemodiverse extract collections should thus establish strategies aiming to efficiently tackle such chemical complexity and access these structures. Establishing carefully crafted digital layers documenting the spectral and chemical complexity as well as bioactivity results of natural extracts collections can help prioritize time-consuming but mandatory isolation efforts. In this note, we report the results of our initial exploration of a collection of 1,600 plant extracts in the frame of a drug-discovery effort. After describing the taxonomic coverage of this collection, we present the results of its liquid chromatography high-resolution mass spectrometric profiling and the exploitation of these profiles using computational solutions. The resulting annotated mass spectral dataset and associated chemical and taxonomic metadata are made available to the community, and data reuse cases are proposed. We are currently continuing our exploration of this plant extract collection for drug-discovery purposes (notably looking for novel antitrypanosomatids, anti-infective and prometabolic compounds) and ecometabolomics insights. We believe that such a dataset can be exploited and reused by researchers interested in computational natural products exploration.
Affiliation(s)
- Pierre-Marie Allard
- Institute of Pharmaceutical Sciences of Western Switzerland, University of Geneva, 1211 Geneva, Switzerland
- School of Pharmaceutical Sciences, University of Geneva, 1211 Geneva, Switzerland
- Department of Biology, University of Fribourg, 1700 Fribourg, Switzerland
- Arnaud Gaudry
- Institute of Pharmaceutical Sciences of Western Switzerland, University of Geneva, 1211 Geneva, Switzerland
- School of Pharmaceutical Sciences, University of Geneva, 1211 Geneva, Switzerland
- Luis-Manuel Quirós-Guerrero
- Institute of Pharmaceutical Sciences of Western Switzerland, University of Geneva, 1211 Geneva, Switzerland
- School of Pharmaceutical Sciences, University of Geneva, 1211 Geneva, Switzerland
- Adriano Rutz
- Institute of Pharmaceutical Sciences of Western Switzerland, University of Geneva, 1211 Geneva, Switzerland
- School of Pharmaceutical Sciences, University of Geneva, 1211 Geneva, Switzerland
- Miwa Dounoue-Kubo
- Faculty of Pharmaceutical Sciences, Tokushima Bunri University, 770-8514 Tokushima, Japan
- Tom W N Walker
- Institute of Biology, University of Neuchâtel, 2000 Neuchâtel, Switzerland
- Emmanuel Defossez
- Department of Biology, University of Fribourg, 1700 Fribourg, Switzerland
- Institute of Biology, University of Neuchâtel, 2000 Neuchâtel, Switzerland
- Christophe Long
- Direction Scientifique Naturactive, Pierre Fabre Medicament, 81100 Castres, France
- Antonio Grondin
- Green Mission Pierre Fabre, Institut de Recherche Pierre Fabre, 31562 Toulouse, France
- Bruno David
- Green Mission Pierre Fabre, Institut de Recherche Pierre Fabre, 31562 Toulouse, France
- Jean-Luc Wolfender
- Institute of Pharmaceutical Sciences of Western Switzerland, University of Geneva, 1211 Geneva, Switzerland
- School of Pharmaceutical Sciences, University of Geneva, 1211 Geneva, Switzerland
11
d'Oelsnitz S, Love JD, Diaz DJ, Ellington AD. GroovDB: A Database of Ligand-Inducible Transcription Factors. ACS Synth Biol 2022; 11:3534-3537. [PMID: 36178800 DOI: 10.1021/acssynbio.2c00382] [Citation(s) in RCA: 7] [Impact Index Per Article: 3.5] [Indexed: 01/24/2023]
Abstract
Genetic biosensors are integral to synthetic biology. In particular, ligand-inducible prokaryotic transcription factors are frequently used in high-throughput screening, for dynamic feedback regulation, as multilayer logic gates, and in diagnostic applications. In order to provide a curated source that users can rely on for engineering applications, we have developed GroovDB (available at https://groov.bio), a Web-accessible database of ligand-inducible transcription factors that contains all information necessary to build chemically responsive genetic circuits, including biosensor sequence, ligand, and operator data. Ligand and DNA interaction data have been verified against the literature, while an automated data curation pipeline is used to programmatically fetch metadata, structural information, and references for every entry. A custom tool to visualize the natural genetic context of biosensor entries provides potential insights into alternative ligands and systems biology.
Affiliation(s)
- Simon d'Oelsnitz
- Department of Molecular Biosciences, University of Texas at Austin, Austin, Texas 78712, United States
- Joshua D Love
- Independent Web Developer, Bentonville, Arkansas 72712, United States
- Daniel J Diaz
- Department of Chemistry, University of Texas at Austin, Austin, Texas 78712, United States
- Andrew D Ellington
- Department of Molecular Biosciences, University of Texas at Austin, Austin, Texas 78712, United States
12
Panei FP, Torchet R, Ménager H, Gkeka P, Bonomi M. HARIBOSS: a curated database of RNA-small molecules structures to aid rational drug design. Bioinformatics 2022; 38:4185-4193. [PMID: 35799352 DOI: 10.1093/bioinformatics/btac483] [Citation(s) in RCA: 4] [Impact Index Per Article: 2.0] [Received: 05/27/2022] [Revised: 07/04/2022] [Accepted: 07/06/2022] [Indexed: 12/24/2022] Open
Abstract
MOTIVATION RNA molecules are implicated in numerous fundamental biological processes and many human pathologies, such as cancer, neurodegenerative disorders, muscular diseases and bacterial infections. Modulating the mode of action of disease-implicated RNA molecules can lead to the discovery of new therapeutical agents and even address pathologies linked to 'undruggable' protein targets. This modulation can be achieved by direct targeting of RNA with small molecules. As of today, only a few RNA-targeting small molecules are used clinically. One of the main obstacles that have hampered the development of a rational drug design protocol to target RNA with small molecules is the lack of a comprehensive understanding of the molecular mechanisms at the basis of RNA-small molecule (RNA-SM) recognition. RESULTS Here, we present Harnessing RIBOnucleic acid-Small molecule Structures (HARIBOSS), a curated collection of RNA-SM structures determined by X-ray crystallography, nuclear magnetic resonance spectroscopy and cryo-electron microscopy. HARIBOSS facilitates the exploration of drug-like compounds known to bind RNA, the analysis of ligands and pockets properties and ultimately the development of in silico strategies to identify RNA-targeting small molecules. AVAILABILITY AND IMPLEMENTATION HARIBOSS can be explored via a web interface available at http://hariboss.pasteur.cloud. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
Affiliation(s)
- F P Panei
- Sanofi, R&D, Data & In Silico Sciences, 91385 Chilly Mazarin, France
- Department of Structural Biology and Chemistry, Institut Pasteur, Université Paris Cité, CNRS UMR 3528, 75015 Paris, France
- Ecole Doctorale Complexité du Vivant, Sorbonne Université, 75005 Paris, France
- R Torchet
- Institut Pasteur, Université Paris Cité, Bioinformatics and Biostatistics Hub, F-75015 Paris, France
- H Ménager
- Institut Pasteur, Université Paris Cité, Bioinformatics and Biostatistics Hub, F-75015 Paris, France
- P Gkeka
- Sanofi, R&D, Data & In Silico Sciences, 91385 Chilly Mazarin, France
- M Bonomi
- Department of Structural Biology and Chemistry, Institut Pasteur, Université Paris Cité, CNRS UMR 3528, 75015 Paris, France
13
Terlouw BR, Vromans SPJM, Medema MH. PIKAChU: a Python-based informatics kit for analysing chemical units. J Cheminform 2022; 14:34. [PMID: 35672769 PMCID: PMC9172152 DOI: 10.1186/s13321-022-00616-5] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/08/2022] [Accepted: 05/21/2022] [Indexed: 11/15/2022] Open
Abstract
As efforts to computationally describe and simulate the biochemical world become more commonplace, computer programs that are capable of in silico chemistry play an increasingly important role in biochemical research. While such programs exist, they are often dependency-heavy, difficult to navigate, or not written in Python, the programming language of choice for bioinformaticians. Here, we introduce PIKAChU (Python-based Informatics Kit for Analysing CHemical Units): a cheminformatics toolbox with few dependencies implemented in Python. PIKAChU builds comprehensive molecular graphs from SMILES strings, which allow for easy downstream analysis and visualisation of molecules. While the molecular graphs PIKAChU generates are extensive, storing and inferring information on aromaticity, chirality, charge, hybridisation and electron orbitals, PIKAChU limits itself to applications that will be sufficient for most casual users and downstream Python-based tools and databases, such as Morgan fingerprinting, similarity scoring, substructure matching and customisable visualisation. In addition, it comes with a set of functions that assists in the easy implementation of reaction mechanisms. Its minimalistic design makes PIKAChU straightforward to use and install, in stark contrast to many existing toolkits, which are more difficult to navigate and come with a plethora of dependencies that may cause compatibility issues with downstream tools. As such, PIKAChU provides an alternative for researchers for whom basic cheminformatic processing suffices, and can be easily integrated into downstream bioinformatics and cheminformatics tools. PIKAChU is available at https://github.com/BTheDragonMaster/pikachu.
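The similarity scoring mentioned in this abstract is commonly done by comparing fingerprints with the Tanimoto coefficient. The sketch below shows that calculation in plain Python under stated assumptions: each fingerprint is represented as a set of "on" bit indices, the bit values are made up for illustration, and the function name is hypothetical. It does not call PIKAChU and makes no claim about how PIKAChU implements or exposes its own scoring.

```python
def tanimoto(bits_a, bits_b):
    """Tanimoto coefficient |A & B| / |A | B| for two sets of on-bit indices."""
    if not bits_a and not bits_b:
        # Assumed convention: two empty fingerprints are treated as identical.
        return 1.0
    shared = len(bits_a & bits_b)
    return shared / (len(bits_a) + len(bits_b) - shared)


# Made-up bit sets standing in for fingerprints of two molecules.
fingerprint_a = {12, 87, 301, 455}
fingerprint_b = {12, 87, 455, 902, 1003}

print(tanimoto(fingerprint_a, fingerprint_b))  # 0.5
```

The score ranges from 0 (no shared bits) to 1 (identical bit sets), so it can be used directly to rank candidate molecules against a query.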
|
14
|
Rutz A, Sorokina M, Galgonek J, Mietchen D, Willighagen E, Gaudry A, Graham JG, Stephan R, Page R, Vondrášek J, Steinbeck C, Pauli GF, Wolfender JL, Bisson J, Allard PM. The LOTUS initiative for open knowledge management in natural products research. eLife 2022; 11:e70780. [PMID: 35616633 PMCID: PMC9135406 DOI: 10.7554/elife.70780] [Citation(s) in RCA: 74] [Impact Index Per Article: 37.0] [Received: 05/28/2021] [Accepted: 03/22/2022] [Indexed: 12/17/2022] Open
Abstract
Contemporary bioinformatic and chemoinformatic capabilities hold promise to reshape knowledge management, analysis and interpretation of data in natural products research. Currently, reliance on a disparate set of non-standardized, insular, and specialized databases presents a series of challenges for data access, both within the discipline and for integration and interoperability between related fields. The fundamental elements of exchange are referenced structure-organism pairs that establish relationships between distinct molecular structures and the living organisms from which they were identified. Consolidating and sharing such information via an open platform has strong transformative potential for natural products research and beyond. This is the ultimate goal of the newly established LOTUS initiative, which has now completed the first steps toward the harmonization, curation, validation and open dissemination of 750,000+ referenced structure-organism pairs. LOTUS data is hosted on Wikidata and regularly mirrored on https://lotus.naturalproducts.net. Data sharing within the Wikidata framework broadens data access and interoperability, opening new possibilities for community curation and evolving publication models. Furthermore, embedding LOTUS data into the vast Wikidata knowledge graph will facilitate new biological and chemical insights. The LOTUS initiative represents an important advancement in the design and deployment of a comprehensive and collaborative natural products knowledge base.
Affiliation(s)
- Adriano Rutz
- School of Pharmaceutical Sciences, University of Geneva, Geneva, Switzerland
- Institute of Pharmaceutical Sciences of Western Switzerland, University of Geneva, Geneva, Switzerland
| | - Maria Sorokina
- Institute for Inorganic and Analytical Chemistry, Friedrich-Schiller-University Jena, Jena, Germany
| | - Jakub Galgonek
- Institute of Organic Chemistry and Biochemistry of the CAS, Prague, Czech Republic
| | - Daniel Mietchen
- Ronin Institute, Montclair, United States
- Leibniz Institute of Freshwater Ecology and Inland Fisheries, Berlin, Germany
- School of Data Science, University of Virginia, Charlottesville, United States
| | - Egon Willighagen
- Department of Bioinformatics-BiGCaT, Maastricht University, Maastricht, Netherlands
| | - Arnaud Gaudry
- School of Pharmaceutical Sciences, University of Geneva, Geneva, Switzerland
- Institute of Pharmaceutical Sciences of Western Switzerland, University of Geneva, Geneva, Switzerland
| | - James G Graham
- Center for Natural Product Technologies and WHO Collaborating Centre for Traditional Medicine (WHO CC/TRM), Pharmacognosy Institute; College of Pharmacy, University of Illinois at Chicago, Chicago, United States
- Department of Pharmaceutical Sciences, College of Pharmacy, University of Illinois at Chicago, Chicago, United States
| | - Ralf Stephan
- Ontario Institute for Cancer Research (OICR), University Ave Suite, Toronto, Canada
| | | | - Jiří Vondrášek
- Institute of Organic Chemistry and Biochemistry of the CAS, Prague, Czech Republic
| | - Christoph Steinbeck
- Institute for Inorganic and Analytical Chemistry, Friedrich-Schiller-University Jena, Jena, Germany
| | - Guido F Pauli
- Center for Natural Product Technologies and WHO Collaborating Centre for Traditional Medicine (WHO CC/TRM), Pharmacognosy Institute; College of Pharmacy, University of Illinois at Chicago, Chicago, United States
- Department of Pharmaceutical Sciences, College of Pharmacy, University of Illinois at Chicago, Chicago, United States
| | - Jean-Luc Wolfender
- School of Pharmaceutical Sciences, University of Geneva, Geneva, Switzerland
- Institute of Pharmaceutical Sciences of Western Switzerland, University of Geneva, Geneva, Switzerland
| | - Jonathan Bisson
- Center for Natural Product Technologies and WHO Collaborating Centre for Traditional Medicine (WHO CC/TRM), Pharmacognosy Institute; College of Pharmacy, University of Illinois at Chicago, Chicago, United States
- Department of Pharmaceutical Sciences, College of Pharmacy, University of Illinois at Chicago, Chicago, United States
| | - Pierre-Marie Allard
- School of Pharmaceutical Sciences, University of Geneva, Geneva, Switzerland
- Institute of Pharmaceutical Sciences of Western Switzerland, University of Geneva, Geneva, Switzerland
- Department of Biology, University of Fribourg, Fribourg, Switzerland
| |
|
15
|
Rodrigues CHM, Ascher DB. CSM-Potential: mapping protein interactions and biological ligands in 3D space using geometric deep learning. Nucleic Acids Res 2022; 50:W204-W209. [PMID: 35609999 PMCID: PMC9252741 DOI: 10.1093/nar/gkac381] [Citation(s) in RCA: 4] [Impact Index Per Article: 2.0] [Received: 03/24/2022] [Revised: 04/19/2022] [Accepted: 05/05/2022] [Indexed: 11/13/2022] Open
Abstract
Recent advances in protein structural modelling have enabled the accurate prediction of the holo 3D structures of almost any protein, however protein function is intrinsically linked to the interactions it makes. While a number of computational approaches have been proposed to explore potential biological interactions, they have been limited to specific interactions, and have not been readily accessible for non-experts or use in bioinformatics pipelines. Here we present CSM-Potential, a geometric deep learning approach to identify regions of a protein surface that are likely to mediate protein-protein and protein-ligand interactions in order to provide a link between 3D structure and biological function. Our method has shown robust performance, outperforming existing methods for both predictive tasks. By assessing the performance of CSM-Potential on independent blind tests, we show that our method was able to achieve ROC AUC values of up to 0.81 for the identification of potential protein-protein binding sites, and up to 0.96 accuracy on biological ligand classification. Our method is freely available as a user-friendly and easy-to-use web server and API at http://biosig.unimelb.edu.au/csm_potential.
Affiliation(s)
- Carlos H M Rodrigues
- Computational Biology and Clinical Informatics, Baker Heart and Diabetes Institute, Melbourne, Victoria, Australia; School of Chemistry and Molecular Biosciences, University of Queensland, Brisbane, Queensland, Australia
| | - David B Ascher
- Computational Biology and Clinical Informatics, Baker Heart and Diabetes Institute, Melbourne, Victoria, Australia; School of Chemistry and Molecular Biosciences, University of Queensland, Brisbane, Queensland, Australia
| |
|
16
|
Warr WA, Nicklaus MC, Nicolaou CA, Rarey M. Exploration of Ultralarge Compound Collections for Drug Discovery. J Chem Inf Model 2022; 62:2021-2034. [PMID: 35421301 DOI: 10.1021/acs.jcim.2c00224] [Citation(s) in RCA: 46] [Impact Index Per Article: 23.0] [Indexed: 02/07/2023]
Abstract
Designing new medicines more cheaply and quickly is tightly linked to the quest of exploring chemical space more widely and efficiently. Chemical space is monumentally large, but recent advances in computer software and hardware have enabled researchers to navigate virtual chemical spaces containing billions of chemical structures. This review specifically concerns collections of many millions or even billions of enumerated chemical structures as well as even larger chemical spaces that are not fully enumerated. We present examples of chemical libraries and spaces and the means used to construct them, and we discuss new technologies for searching huge libraries and for searching combinatorially in chemical space. We also cover space navigation techniques and consider new approaches to de novo drug design and the impact of the "autonomous laboratory" on synthesis of designed compounds. Finally, we summarize some other challenges and opportunities for the future.
Affiliation(s)
- Wendy A Warr
- Wendy Warr & Associates, 6 Berwick Court, Holmes Chapel, Crewe, Cheshire CW4 7HZ, United Kingdom
| | - Marc C Nicklaus
- NCI, NIH, CADD Group, NCI-Frederick, Frederick, Maryland 21702, United States
| | - Christos A Nicolaou
- Discovery Chemistry, Lilly Research Laboratories, Eli Lilly and Company, Indianapolis, Indiana 46285, United States
| | - Matthias Rarey
- Universität Hamburg, ZBH Center for Bioinformatics, 20146 Hamburg, Germany
| |
|
17
|
Expanding biochemical knowledge and illuminating metabolic dark matter with ATLASx. Nat Commun 2022; 13:1560. [PMID: 35322036 PMCID: PMC8943196 DOI: 10.1038/s41467-022-29238-z] [Citation(s) in RCA: 11] [Impact Index Per Article: 5.5] [Received: 08/19/2021] [Accepted: 03/07/2022] [Indexed: 12/23/2022] Open
Abstract
Metabolic “dark matter” describes currently unknown metabolic processes, which form a blind spot in our general understanding of metabolism and slow down the development of biosynthetic cell factories and naturally derived pharmaceuticals. Mapping the dark matter of metabolism remains an open challenge that can be addressed globally and systematically by existing computational solutions. In this work, we use 489 generalized enzymatic reaction rules to map both known and unknown metabolic processes around a biochemical database of 1.5 million biological compounds. We predict over 5 million reactions and integrate nearly 2 million naturally and synthetically-derived compounds into the global network of biochemical knowledge, named ATLASx. ATLASx is available to researchers as a powerful online platform that supports the prediction and analysis of biochemical pathways and evaluates the biochemical vicinity of molecule classes (https://lcsb-databases.epfl.ch/Atlas2). “Mapping the dark matter of metabolism remains an open challenge that can be addressed globally and systematically by existing computational solutions. Here the authors present ATLASx, a repository of known and predicted enzymatic reaction, connecting millions of compounds to help synthetic biologists and metabolic engineers to design and explore metabolic pathways.”
|
18
|
Rodrigues CHM, Pires DEV, Ascher DB. pdCSM-PPI: Using Graph-Based Signatures to Identify Protein-Protein Interaction Inhibitors. J Chem Inf Model 2021; 61:5438-5445. [PMID: 34719929 DOI: 10.1021/acs.jcim.1c01135] [Citation(s) in RCA: 11] [Impact Index Per Article: 3.7] [Indexed: 11/29/2022]
Abstract
Protein-protein interactions are promising sites for development of selective drugs; however, they have generally been viewed as challenging targets. Molecules targeting protein-protein interactions tend to be larger and more lipophilic than other drug-like molecules, mimicking the properties of interacting interfaces. Here, we propose a machine learning approach that uses a graph-based representation of small molecules to guide identification of inhibitors modulating protein-protein interactions, pdCSM-PPI. This approach was applied to 21 different PPI targets. We developed interaction-specific models that were able to accurately identify active compounds achieving MCC and F1 scores up to 1, and Pearson's correlations up to 0.87, outperforming previous approaches. Using insights from these individual models, we developed a generic protein-protein interaction modulator predictive model, which accurately predicted IC50 with a Pearson's correlation of 0.64 on a low redundancy blind test. Importantly, we were able to accurately identify active from inactive compounds, achieving an AUC of 0.77 and sensitivity and specificity of 76% and 78%, respectively. We believe pdCSM-PPI will be an important tool to help guide more efficient screening of new PPI inhibitors; it is freely available as an easy-to-use web server and API at http://biosig.unimelb.edu.au/pdcsm_ppi.
Affiliation(s)
- Carlos H M Rodrigues
- Systems and Computational Biology, Bio21 Institute, University of Melbourne, Parkville 3052, Victoria, Australia; Computational Biology and Clinical Informatics, Baker Heart and Diabetes Institute, Melbourne 3004, Victoria, Australia; School of Chemistry and Molecular Biosciences, The University of Queensland, Brisbane 4072, Australia
| | - Douglas E V Pires
- Systems and Computational Biology, Bio21 Institute, University of Melbourne, Parkville 3052, Victoria, Australia; Computational Biology and Clinical Informatics, Baker Heart and Diabetes Institute, Melbourne 3004, Victoria, Australia; School of Chemistry and Molecular Biosciences, The University of Queensland, Brisbane 4072, Australia; School of Computing and Information Systems, University of Melbourne, Parkville 3052, Victoria, Australia
| | - David B Ascher
- Systems and Computational Biology, Bio21 Institute, University of Melbourne, Parkville 3052, Victoria, Australia; Computational Biology and Clinical Informatics, Baker Heart and Diabetes Institute, Melbourne 3004, Victoria, Australia; School of Chemistry and Molecular Biosciences, The University of Queensland, Brisbane 4072, Australia
| |
|
19
|
Xu Z, Wauchope OR, Frank AT. Navigating Chemical Space by Interfacing Generative Artificial Intelligence and Molecular Docking. J Chem Inf Model 2021; 61:5589-5600. [PMID: 34633194 DOI: 10.1021/acs.jcim.1c00746] [Citation(s) in RCA: 11] [Impact Index Per Article: 3.7] [Indexed: 11/29/2022]
Abstract
Here, we report the implementation and application of a simple, structure-aware framework to generate target-specific screening libraries. Our approach combines advances in generative artificial intelligence (AI) with conventional molecular docking to explore chemical space conditioned on the unique physicochemical properties of the active site of a biomolecular target. As a demonstration, we used our framework, which we refer to as sample-and-dock, to construct focused libraries for cyclin-dependent kinase type-2 (CDK2) and the active site of the main protease (Mpro) of the SARS-CoV-2 virus. We envision that the sample-and-dock framework could be used to generate theoretical maps of the chemical space specific to a given target and so provide information about its molecular recognition characteristics.
Affiliation(s)
- Ziqiao Xu
- Chemistry Department, University of Michigan, 930 North University Avenue, Ann Arbor, Michigan 48109, United States
| | - Orrette R Wauchope
- Department of Natural Sciences, City University of New York, Baruch College, New York, New York 10010, United States
| | - Aaron T Frank
- Biophysics Program, University of Michigan, 930 North University Avenue, Ann Arbor, Michigan 48109, United States
| |
|
20
|
Sharma S, Arya A, Cruz R, Cleaves II HJ. Automated Exploration of Prebiotic Chemical Reaction Space: Progress and Perspectives. Life (Basel) 2021; 11:1140. [PMID: 34833016 PMCID: PMC8624352 DOI: 10.3390/life11111140] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 09/07/2021] [Revised: 10/15/2021] [Accepted: 10/18/2021] [Indexed: 12/12/2022] Open
Abstract
Prebiotic chemistry often involves the study of complex systems of chemical reactions that form large networks with a large number of diverse species. Such complex systems may have given rise to emergent phenomena that ultimately led to the origin of life on Earth. The environmental conditions and processes involved in this emergence may not be fully recapitulable, making it difficult for experimentalists to study prebiotic systems in laboratory simulations. Computational chemistry offers efficient ways to study such chemical systems and identify the ones most likely to display complex properties associated with life. Here, we review tools and techniques for modelling prebiotic chemical reaction networks and outline possible ways to identify self-replicating features that are central to many origin-of-life models.
Affiliation(s)
- Siddhant Sharma
- Blue Marble Space Institute of Science, Seattle, WA 98154, USA
- Department of Biochemistry, Deshbandhu College, University of Delhi, New Delhi 110019, India
- Department of Chemistry and Chemical Engineering, Chalmers University of Technology, SE-412 96 Gothenburg, Sweden
| | - Aayush Arya
- Blue Marble Space Institute of Science, Seattle, WA 98154, USA
- Department of Physics, Lovely Professional University, Jalandhar-Delhi GT Road, Phagwara 144001, India
| | - Romulo Cruz
- Blue Marble Space Institute of Science, Seattle, WA 98154, USA
- Big Data Laboratory, Information and Communications Technology Center (CTIC), National University of Engineering, Amaru 210, Lima 15333, Peru
| | - Henderson James Cleaves II
- Blue Marble Space Institute of Science, Seattle, WA 98154, USA
- Earth-Life Science Institute, Tokyo Institute of Technology, Tokyo 152-8550, Japan
| |
|
21
|
Capecchi A, Reymond JL. Classifying natural products from plants, fungi or bacteria using the COCONUT database and machine learning. J Cheminform 2021; 13:82. [PMID: 34663470 PMCID: PMC8524952 DOI: 10.1186/s13321-021-00559-3] [Citation(s) in RCA: 13] [Impact Index Per Article: 4.3] [Received: 06/25/2021] [Accepted: 10/02/2021] [Indexed: 01/13/2023] Open
Abstract
Natural products (NPs) represent one of the most important resources for discovering new drugs. Here we asked whether NP origin can be assigned from their molecular structure in a subset of 60,171 NPs in the recently reported Collection of Open Natural Products (COCONUT) database assigned to plants, fungi, or bacteria. Visualizing this subset in an interactive tree-map (TMAP) calculated using MAP4 (MinHashed atom pair fingerprint) clustered NPs according to their assigned origin ( https://tm.gdb.tools/map4/coconut_tmap/ ), and a support vector machine (SVM) trained with MAP4 correctly assigned the origin for 94% of plant, 89% of fungal, and 89% of bacterial NPs in this subset. An online tool based on an SVM trained with the entire subset correctly assigned the origin of further NPs with similar performance ( https://np-svm-map4.gdb.tools/ ). Origin information might be useful when searching for biosynthetic genes of NPs isolated from plants but produced by endophytic microorganisms.
Affiliation(s)
- Alice Capecchi
- Department of Chemistry, Biochemistry and Pharmaceutical Sciences, University of Bern, Freiestrasse 3, 3012, Bern, Switzerland
| | - Jean-Louis Reymond
- Department of Chemistry, Biochemistry and Pharmaceutical Sciences, University of Bern, Freiestrasse 3, 3012, Bern, Switzerland
| |
|
22
|
SimilarityLab: Molecular Similarity for SAR Exploration and Target Prediction on the Web. Processes (Basel) 2021. [DOI: 10.3390/pr9091520] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 12/23/2022] Open
Abstract
Exploration of chemical space around hit, experimental, and known active compounds is an important step in the early stages of drug discovery. In academia, where access to chemical synthesis efforts is restricted in comparison to the pharma-industry, hits from primary screens are typically followed up through purchase and testing of similar compounds, before further funding is sought to begin medicinal chemistry efforts. Rapid exploration of druglike similars and structure–activity relationship profiles can be achieved through our new webservice SimilarityLab. In addition to searching for commercially available molecules similar to a query compound, SimilarityLab also enables the search of compounds with recorded activities, generating consensus counts of activities, which enables target and off-target prediction. In contrast to other online offerings utilizing the USRCAT similarity measure, SimilarityLab’s set of commercially available small molecules is consistently updated, currently containing over 12.7 million unique small molecules, and not relying on published databases which may be many years out of date. This ensures researchers have access to up-to-date chemistries and synthetic processes enabling greater diversity and access to a wider area of commercial chemical space. All source code is available in the SimilarityLab source repository.
|
23
|
Přívratský J, Novák J. MassSpecBlocks: a web-based tool to create building blocks and sequences of nonribosomal peptides and polyketides for tandem mass spectra analysis. J Cheminform 2021; 13:51. [PMID: 34233741 PMCID: PMC8265115 DOI: 10.1186/s13321-021-00530-2] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 05/03/2021] [Accepted: 06/30/2021] [Indexed: 11/16/2022] Open
Abstract
Nonribosomal peptides and polyketides are natural products commonly synthesized by microorganisms. They are widely used in medicine, agriculture, environmental protection, and other fields. The structures of natural products are often analyzed by high-resolution tandem mass spectrometry, which becomes more popular with its increasing availability. However, the characterization of nonribosomal peptides and polyketides from tandem mass spectra is a nontrivial task because they are composed of many uncommon building blocks in addition to proteinogenic amino acids. Moreover, many of them have cyclic and branch-cyclic structures. Here, we introduce MassSpecBlocks – an open-source and web-based tool that converts the input chemical structures in SMILES format into sequences of building blocks. The structures can be searched in public databases PubChem, ChemSpider, ChEBI, NP Atlas, COCONUT, and Norine and edited in a user-friendly graphical interface. Although MassSpecBlocks can serve as a stand-alone database, our primary goal was to enable easy construction of custom sequence and building block databases, which can be used to annotate mass spectra in CycloBranch software. CycloBranch is an open-source, cross-platform, and stand-alone tool that we recently released for annotating spectra of linear, cyclic, branched, and branch-cyclic nonribosomal peptides and polyketide siderophores. The sequences and building blocks created in MassSpecBlocks can be easily exported into a plain text format used by CycloBranch. MassSpecBlocks is available online or can be installed entirely offline. It offers a REST API to cooperate with other tools.
Affiliation(s)
- Jan Přívratský
- Faculty of Information Technology, Czech Technical University in Prague, Thákurova 9, 160 00, Prague, Czech Republic
| | - Jiří Novák
- Faculty of Information Technology, Czech Technical University in Prague, Thákurova 9, 160 00, Prague, Czech Republic; Institute of Microbiology, Czech Academy of Sciences, Vídeňská 1083, 142 20, Prague, Czech Republic
| |
|
24
|
Green H, Durrant JD. DeepFrag: An Open-Source Browser App for Deep-Learning Lead Optimization. J Chem Inf Model 2021; 61:2523-2529. [PMID: 34029094 PMCID: PMC8243318 DOI: 10.1021/acs.jcim.1c00103] [Citation(s) in RCA: 11] [Impact Index Per Article: 3.7] [Received: 01/30/2021] [Indexed: 11/28/2022]
Abstract
Lead optimization, a critical step in early stage drug discovery, involves making chemical modifications to a small-molecule ligand to improve properties such as binding affinity. We recently developed DeepFrag, a deep-learning model capable of recommending such modifications. Though a powerful hypothesis-generating tool, DeepFrag is currently implemented in Python and so requires a certain degree of computational expertise. To encourage broader adoption, we have created the DeepFrag browser app, which provides a user-friendly graphical user interface that runs the DeepFrag model in users' web browsers. The browser app does not require users to upload their molecular structures to a third-party server, nor does it require the separate installation of any third-party software. We are hopeful that the app will be a useful tool for both researchers and students. It can be accessed free of charge, without registration, at http://durrantlab.com/deepfrag. The source code is also available at http://git.durrantlab.com/jdurrant/deepfrag-app, released under the terms of the open-source Apache License, Version 2.0.
Affiliation(s)
- Harrison Green
- Department of Biological Sciences, University of Pittsburgh, Pittsburgh, Pennsylvania 15260, United States
| | - Jacob D. Durrant
- Department of Biological Sciences, University of Pittsburgh, Pittsburgh, Pennsylvania 15260, United States
| |
|
25
|
Hu B, Lin A, Brinson LC. ChemProps: A RESTful API enabled database for composite polymer name standardization. J Cheminform 2021; 13:22. [PMID: 33712066 PMCID: PMC7955638 DOI: 10.1186/s13321-021-00502-6] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 12/10/2020] [Accepted: 03/01/2021] [Indexed: 11/24/2022] Open
Abstract
The inconsistency of polymer indexing caused by the lack of uniformity in expression of polymer names is a major challenge for widespread use of polymer related data resources and limits broad application of materials informatics for innovation in broad classes of polymer science and polymeric based materials. The current solution of using a variety of different chemical identifiers has proven insufficient to address the challenge and is not intuitive for researchers. This work proposes a multi-algorithm-based mapping methodology entitled ChemProps that is optimized to solve the polymer indexing issue with easy-to-update design both in depth and in width. RESTful API is enabled for lightweight data exchange and easy integration across data systems. A weight factor is assigned to each algorithm to generate scores for candidate chemical names and optimized to maximize the minimum value of the score difference between the ground truth chemical name and the other candidate chemical names. Ten-fold validation is utilized on the 160 training data points to prevent overfitting issues. The obtained set of weight factors achieves a 100% test accuracy on the 54 test data points. The weight factors will evolve as ChemProps grows. With ChemProps, other polymer databases can remove duplicate entries and enable a more accurate “search by SMILES” function by using ChemProps as a common name-to-SMILES translator through API calls. ChemProps is also an excellent tool for auto-populating polymer properties thanks to its easy-to-update design.
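The weighting scheme described in the abstract (one weight per matching algorithm, chosen to maximize the minimum score gap between the ground-truth name and every other candidate) can be sketched as follows. This is an illustration under stated assumptions, not the ChemProps implementation: the coarse grid search, the constraint that weights sum to 1, the function names, and the two training samples are all hypothetical.

```python
from itertools import product


def weighted_score(scores, weights):
    """Combine per-algorithm scores for one candidate name into a single score."""
    return sum(w * s for w, s in zip(weights, scores))


def margin(sample, weights):
    """Score gap between the ground-truth name and the best competing candidate."""
    truth = weighted_score(sample["truth"], weights)
    best_other = max(weighted_score(s, weights) for s in sample["others"])
    return truth - best_other


def fit_weights(samples, n_algorithms, steps=10):
    """Grid-search weights summing to 1 that maximize the smallest margin."""
    best_weights, best_min_margin = None, float("-inf")
    for counts in product(range(steps + 1), repeat=n_algorithms):
        if sum(counts) != steps:
            continue  # keep only weight vectors that sum to 1
        weights = tuple(c / steps for c in counts)
        worst = min(margin(s, weights) for s in samples)
        if worst > best_min_margin:
            best_weights, best_min_margin = weights, worst
    return best_weights, best_min_margin


# Hypothetical training samples: per-algorithm scores for the true name and rivals.
samples = [
    {"truth": (0.9, 0.2), "others": [(0.5, 0.6)]},
    {"truth": (0.4, 0.8), "others": [(0.6, 0.1)]},
]

weights, min_margin = fit_weights(samples, n_algorithms=2)
print(weights, min_margin)
```

A positive minimum margin means every training name is ranked first under the fitted weights; a grid search is used here only because it is easy to read, and any max-min optimizer could take its place.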
Affiliation(s)
- Bingyin Hu
- Department of Mechanical Engineering and Materials Science, Duke University, Durham, NC, 27708, USA
| | - Anqi Lin
- Department of Mechanical Engineering and Materials Science, Duke University, Durham, NC, 27708, USA
| | - L Catherine Brinson
- Department of Mechanical Engineering and Materials Science, Duke University, Durham, NC, 27708, USA.
| |
Collapse
|
26
|
|
27
|
Schwaller P, Probst D, Vaucher AC, Nair VH, Kreutter D, Laino T, Reymond JL. Mapping the space of chemical reactions using attention-based neural networks. Nat Mach Intell 2021. [DOI: 10.1038/s42256-020-00284-w] [Citation(s) in RCA: 50] [Impact Index Per Article: 16.7] [Indexed: 11/09/2022]
|
28
|
Torchet R, Druart K, Ruano LC, Moine-Franel A, Borges H, Doppelt-Azeroual O, Brancotte B, Mareuil F, Nilges M, Ménager H, Sperandio O. The iPPI-DB initiative: A Community-centered database of Protein-Protein Interaction modulators. Bioinformatics 2021; 37:89-96. [PMID: 33416858 PMCID: PMC8034526 DOI: 10.1093/bioinformatics/btaa1091] [Citation(s) in RCA: 13] [Impact Index Per Article: 4.3] [Received: 09/04/2020] [Revised: 11/25/2020] [Accepted: 12/23/2020] [Indexed: 11/13/2022] Open
Abstract
MOTIVATION: One avenue to address the paucity of clinically testable targets is to reinvestigate the druggable genome by tackling complicated types of targets such as Protein-Protein Interactions (PPIs). Given the challenge to target those interfaces with small chemical compounds, it has become clear that learning from successful examples of PPI modulation is a powerful strategy. Freely-accessible databases of PPI modulators that provide the community with tractable chemical and pharmacological data, as well as powerful tools to query them, are therefore essential to stimulate new drug discovery projects on PPI targets. RESULTS: Here, we present the new version iPPI-DB, our manually curated database of PPI modulators. In this completely redesigned version of the database, we introduce a new web interface relying on crowdsourcing for the maintenance of the database. This interface was created to enable community contributions, whereby external experts can suggest new database entries. Moreover, the data model, the graphical interface, and the tools to query the database have been completely modernized and improved. We added new PPI modulators, new PPI targets, and extended our focus to stabilizers of PPIs as well. AVAILABILITY AND IMPLEMENTATION: The iPPI-DB server is available at https://ippidb.pasteur.fr. The source code for this server is available at https://gitlab.pasteur.fr/ippidb/ippidb-web/ and is distributed under GPL licence (http://www.gnu.org/licences/gpl). Queries can be shared through persistent links according to the FAIR data standards. Data can be downloaded from the website as csv files. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Rachel Torchet
- Hub de Bioinformatique et Biostatistique-Département Biologie Computationnelle, Institut Pasteur, USR 3756 CNRS, Paris, France
| | - Karen Druart
- Department of Structural Biology and Chemistry, Institut Pasteur, Paris, 75015, France
| | - Luis Checa Ruano
- Department of Structural Biology and Chemistry, Institut Pasteur, Paris, 75015, France
| | | | - Hélène Borges
- Department of Structural Biology and Chemistry, Institut Pasteur, Paris, 75015, France
| | - Olivia Doppelt-Azeroual
- Hub de Bioinformatique et Biostatistique-Département Biologie Computationnelle, Institut Pasteur, USR 3756 CNRS, Paris, France
| | - Bryan Brancotte
- Hub de Bioinformatique et Biostatistique-Département Biologie Computationnelle, Institut Pasteur, USR 3756 CNRS, Paris, France
| | - Fabien Mareuil
- Hub de Bioinformatique et Biostatistique-Département Biologie Computationnelle, Institut Pasteur, USR 3756 CNRS, Paris, France
| | - Michael Nilges
- Department of Structural Biology and Chemistry, Institut Pasteur, Paris, 75015, France
| | - Hervé Ménager
- Hub de Bioinformatique et Biostatistique-Département Biologie Computationnelle, Institut Pasteur, USR 3756 CNRS, Paris, France
| | - Olivier Sperandio
- Department of Structural Biology and Chemistry, Institut Pasteur, Paris, 75015, France
| |
|
29
|
Poirier M, Pujol-Giménez J, Manatschal C, Bühlmann S, Embaby A, Javor S, Hediger MA, Reymond JL. Pyrazolyl-pyrimidones inhibit the function of human solute carrier protein SLC11A2 (hDMT1) by metal chelation. RSC Med Chem 2020; 11:1023-1031. [PMID: 33479694 PMCID: PMC7649969 DOI: 10.1039/d0md00085j] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/11/2020] [Accepted: 05/06/2020] [Indexed: 12/22/2022] Open
Abstract
Solute carrier proteins (SLCs) control fluxes of ions and molecules across biological membranes and represent an emerging class of drug targets. SLC11A2 (hDMT1) mediates intestinal iron uptake and its inhibition might be used to treat iron overload diseases such as hereditary hemochromatosis. Here we report a micromolar (IC50 = 1.1 μM) pyrazolyl-pyrimidone inhibitor of radiolabeled iron uptake in hDMT1 overexpressing HEK293 cells acting by a non-competitive mechanism, which however does not affect the electrophysiological properties of the transporter. Isothermal titration calorimetry, competition with calcein, induced precipitation of radioactive iron and cross inhibition of the unrelated iron transporter SLC39A8 (hZIP8) indicate that inhibition is mediated by metal chelation. Mapping the chemical space of thousands of pyrazolo-pyrimidones and similar 2,2'-diazabiaryls in ChEMBL suggests that their reported activities might partly reflect metal chelation. Such metal chelating groups are not listed in pan-assay interference compounds (PAINS) but should be checked when addressing SLCs.
Affiliation(s)
- Marion Poirier
- Department of Chemistry and Biochemistry , University of Bern , Freiestrasse 3 , 3012 Bern , Switzerland .
| | - Jonai Pujol-Giménez
- Institute of Biochemistry and Molecular Medicine , University of Bern , Bühlstrasse 28 , 3012 Bern , Switzerland
- Membrane Transport Discovery Lab , Department of Nephrology and Hypertension , Inselspital , University of Bern Kinderklinik , Freiburgstrasse 15 , 3010 Bern , Switzerland .
- Department of Biomedical Research , University of Bern , Murtenstrasse 35 , 3008 Bern , Switzerland
| | - Cristina Manatschal
- Department of Biochemistry , University of Zürich , Winterthurerstrasse 190 , Zürich , Switzerland
| | - Sven Bühlmann
- Department of Chemistry and Biochemistry , University of Bern , Freiestrasse 3 , 3012 Bern , Switzerland .
| | - Ahmed Embaby
- Department of Chemistry and Biochemistry , University of Bern , Freiestrasse 3 , 3012 Bern , Switzerland .
| | - Sacha Javor
- Department of Chemistry and Biochemistry , University of Bern , Freiestrasse 3 , 3012 Bern , Switzerland .
| | - Matthias A Hediger
- Institute of Biochemistry and Molecular Medicine , University of Bern , Bühlstrasse 28 , 3012 Bern , Switzerland
- Membrane Transport Discovery Lab , Department of Nephrology and Hypertension , Inselspital , University of Bern Kinderklinik , Freiburgstrasse 15 , 3010 Bern , Switzerland .
- Department of Biomedical Research , University of Bern , Murtenstrasse 35 , 3008 Bern , Switzerland
| | - Jean-Louis Reymond
- Department of Chemistry and Biochemistry , University of Bern , Freiestrasse 3 , 3012 Bern , Switzerland .
| |
|
30
|
Borrel A, Auerbach SS, Houck KA, Kleinstreuer NC. Tox21BodyMap: a webtool to map chemical effects on the human body. Nucleic Acids Res 2020; 48:W472-W476. [PMID: 32491175 PMCID: PMC7319561 DOI: 10.1093/nar/gkaa433] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/10/2020] [Revised: 04/21/2020] [Accepted: 05/11/2020] [Indexed: 09/29/2023] Open
Abstract
To support rapid chemical toxicity assessment and mechanistic hypothesis generation, here we present an intuitive webtool allowing a user to identify target organs in the human body where a substance is estimated to be more likely to produce effects. This tool, called Tox21BodyMap, incorporates results of 9,270 chemicals tested in the United States federal Tox21 research consortium in 971 high-throughput screening (HTS) assays whose targets were mapped onto human organs using organ-specific gene expression data. Via Tox21BodyMap's interactive tools, users can visualize chemical target specificity by organ system, and implement different filtering criteria by changing gene expression thresholds and activity concentration parameters. Dynamic network representations, data tables, and plots with comprehensive activity summaries across all Tox21 HTS assay targets provide an overall picture of chemical bioactivity. Tox21BodyMap webserver is available at https://sandbox.ntp.niehs.nih.gov/bodymap/.
|
31
|
Abstract
The adsorption of a dye to a metal oxide surface such as TiO2, NiO and ZnO leads to deprotonation and often undesirable aggregation of dye molecules, which in turn impacts the photophysical properties of the dye. While controlled aggregation is useful for some applications, it can result in lower performance for dye-sensitized solar cells. To understand this phenomenon better, we have conducted an extensive search of the literature and identified over 4000 records of absorption spectra in solution and after adsorption onto metal oxide. The total data set comprises over 3500 unique compounds, with observed absorption maxima in solution and after adsorption on the semiconductor electrode. This data may serve to provide further insight into the structure-property relationships governing dye-aggregation behaviour.
|
32
|
MBLinhibitors.com, a Website Resource Offering Information and Expertise for the Continued Development of Metallo-β-Lactamase Inhibitors. Biomolecules 2020; 10:biom10030459. [PMID: 32188106 PMCID: PMC7175331 DOI: 10.3390/biom10030459] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.8] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/17/2020] [Revised: 03/09/2020] [Accepted: 03/12/2020] [Indexed: 12/29/2022] Open
Abstract
In an effort to facilitate the discovery of new, improved inhibitors of the metallo-β-lactamases (MBLs), a new, interactive website called MBLinhibitors.com was developed. Despite considerable efforts from the science community, there are no clinical inhibitors of the MBLs, which are now produced by human pathogens. The website, MBLinhibitors.com, contains a searchable database of known MBL inhibitors, and inhibitors can be searched by chemical name, chemical formula, chemical structure, Simplified Molecular-Input Line-Entry System (SMILES) format, and by the MBL on which studies were conducted. The site will also highlight a “MBL Inhibitor of the Month”, and researchers are invited to submit compounds for this feature. Importantly, MBLinhibitors.com was designed to encourage collaboration, and researchers are invited to submit their new compounds, using the “Submit” function on the site, as well as their expertise using the “Collaboration” function. The intention is for this site to be interactive, and the site will be improved in the future as researchers use the site and suggest improvements. It is hoped that MBLinhibitors.com will serve as the one-stop site for any important information on MBL inhibitors and will aid in the discovery of a clinically useful MBL inhibitor.
|
33
|
Probst D, Reymond JL. Visualization of very large high-dimensional data sets as minimum spanning trees. J Cheminform 2020; 12:12. [PMID: 33431043 PMCID: PMC7015965 DOI: 10.1186/s13321-020-0416-x] [Citation(s) in RCA: 116] [Impact Index Per Article: 29.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/09/2020] [Accepted: 02/04/2020] [Indexed: 01/10/2023] Open
Abstract
The chemical sciences are producing an unprecedented amount of large, high-dimensional data sets containing chemical structures and associated properties. However, there are currently no algorithms to visualize such data while preserving both global and local features with a sufficient level of detail to allow for human inspection and interpretation. Here, we propose a solution to this problem with a new data visualization method, TMAP, capable of representing data sets of up to millions of data points and arbitrary high dimensionality as a two-dimensional tree (http://tmap.gdb.tools). Visualizations based on TMAP are better suited than t-SNE or UMAP for the exploration and interpretation of large data sets due to their tree-like nature, increased local and global neighborhood and structure preservation, and the transparency of the methods the algorithm is based on. We apply TMAP to the most used chemistry data sets including databases of molecules such as ChEMBL, FDB17, the Natural Products Atlas, DSSTox, as well as to the MoleculeNet benchmark collection of data sets. We also show its broad applicability with further examples from biology, particle physics, and literature.
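As a rough illustration of the minimum-spanning-tree idea named in the title, the sketch below builds a tree over a handful of points with Prim's algorithm on exact Euclidean distances. It is not the TMAP implementation: the function name `minimum_spanning_tree` and the toy points are assumptions made for this example, and a dense all-pairs scan like this one would not scale to the millions of data points the abstract mentions.

```python
import math


def minimum_spanning_tree(points):
    """Return the MST edges (i, j, distance) of a small point set.

    Illustrative only: uses Prim's algorithm with an O(n^2) scan over
    exact Euclidean distances between all points.
    """
    n = len(points)
    if n == 0:
        return []
    in_tree = [False] * n
    best_dist = [math.inf] * n   # cheapest known link into the tree
    best_parent = [-1] * n       # tree vertex providing that link
    best_dist[0] = 0.0           # start growing the tree from point 0
    edges = []
    for _ in range(n):
        # Pick the vertex outside the tree with the cheapest link.
        u = min((i for i in range(n) if not in_tree[i]),
                key=lambda i: best_dist[i])
        in_tree[u] = True
        if best_parent[u] >= 0:
            edges.append((best_parent[u], u, best_dist[u]))
        # Relax the links from the new tree vertex to outside vertices.
        for v in range(n):
            if not in_tree[v]:
                d = math.dist(points[u], points[v])
                if d < best_dist[v]:
                    best_dist[v] = d
                    best_parent[v] = u
    return edges


# Toy data (hypothetical): two tight pairs of points far apart.
toy_points = [(0.0, 0.0), (1.0, 0.0), (5.0, 0.0), (5.0, 1.0)]
toy_edges = minimum_spanning_tree(toy_points)
```

A tree over n points always has n - 1 edges, which is what makes the result easy to lay out and inspect compared with a full neighbor graph.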
Affiliation(s)
- Daniel Probst
- Department of Chemistry and Biochemistry, University of Bern, Freiestrasse 3, 3012, Bern, Switzerland.
| | - Jean-Louis Reymond
- Department of Chemistry and Biochemistry, University of Bern, Freiestrasse 3, 3012, Bern, Switzerland.
| |
|
34
|
Urán Landaburu L, Berenstein AJ, Videla S, Maru P, Shanmugam D, Chernomoretz A, Agüero F. TDR Targets 6: driving drug discovery for human pathogens through intensive chemogenomic data integration. Nucleic Acids Res 2020; 48:D992-D1005. [PMID: 31680154 PMCID: PMC7145610 DOI: 10.1093/nar/gkz999] [Citation(s) in RCA: 16] [Impact Index Per Article: 4.0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/13/2019] [Revised: 10/11/2019] [Accepted: 10/21/2019] [Indexed: 12/12/2022] Open
Abstract
The volume of biological, chemical and functional data deposited in the public domain is growing rapidly, thanks to next generation sequencing and highly-automated screening technologies. These datasets represent invaluable resources for drug discovery, particularly for less studied neglected disease pathogens. To leverage these datasets, smart and intensive data integration is required to guide computational inferences across diverse organisms. The TDR Targets chemogenomics resource integrates genomic data from human pathogens and model organisms along with information on bioactive compounds and their annotated activities. This report highlights the latest updates on the available data and functionality in TDR Targets 6. Based on chemogenomic network models providing links between inhibitors and targets, the database now incorporates network-driven target prioritizations, and novel visualizations of network subgraphs displaying chemical- and target-similarity neighborhoods along with associated target-compound bioactivity links. Available data can be browsed and queried through a new user interface that allows users to perform prioritizations of protein targets and chemical inhibitors. As such, TDR Targets now facilitates the investigation of drug repurposing against pathogen targets, which can potentially help in identifying candidate targets for bioactive compounds with previously unknown targets. TDR Targets is available at https://tdrtargets.org.
Affiliation(s)
- Lionel Urán Landaburu
- Instituto de Investigaciones Biotecnológicas “Rodolfo Ugalde” (IIB), Universidad de San Martín, San Martín, B1650HMP, Buenos Aires, Argentina
- Instituto de Investigaciones Biotecnológicas (IIBIO), Consejo Nacional de Investigaciones Científicas y Técnicas (CONICET), San Martín, B1650HMP Buenos Aires, Argentina
| | - Ariel J Berenstein
- Fundación Instituto Leloir, Patricias Argentinas 435, Ciudad Autónoma de Buenos Aires, Argentina
| | - Santiago Videla
- Fundación Instituto Leloir, Patricias Argentinas 435, Ciudad Autónoma de Buenos Aires, Argentina
| | - Parag Maru
- Biochemical Sciences Division, CSIR- National Chemical Laboratory, Pune, India
- Faculty of Sciences, Academy of Scientific and Innovative Research (AcSIR), Ghaziabad, India
| | - Dhanasekaran Shanmugam
- Biochemical Sciences Division, CSIR- National Chemical Laboratory, Pune, India
- Faculty of Sciences, Academy of Scientific and Innovative Research (AcSIR), Ghaziabad, India
| | - Ariel Chernomoretz
- Fundación Instituto Leloir, Patricias Argentinas 435, Ciudad Autónoma de Buenos Aires, Argentina
- Departamento de Física, Facultad de Ciencias Exactas y Naturales, Universidad de Buenos Aires, C1428EGA, Ciudad Autónoma de Buenos Aires, Argentina
| | - Fernán Agüero
- Instituto de Investigaciones Biotecnológicas “Rodolfo Ugalde” (IIB), Universidad de San Martín, San Martín, B1650HMP, Buenos Aires, Argentina
- Instituto de Investigaciones Biotecnológicas (IIBIO), Consejo Nacional de Investigaciones Científicas y Técnicas (CONICET), San Martín, B1650HMP Buenos Aires, Argentina
| |
|
35
|
Capecchi A, Zhang A, Reymond JL. Populating Chemical Space with Peptides Using a Genetic Algorithm. J Chem Inf Model 2020; 60:121-132. [PMID: 31868369 DOI: 10.1021/acs.jcim.9b01014] [Citation(s) in RCA: 14] [Impact Index Per Article: 3.5] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/13/2022]
Abstract
In drug discovery, one uses chemical space as a concept to organize molecules according to their structures and properties. One often would like to generate new possible molecules at a specific location in the chemical space marked by a molecule of interest. Herein, we report the peptide design genetic algorithm (PDGA, code available at https://github.com/reymond-group/PeptideDesignGA ), a computational tool capable of producing peptide sequences of various topologies (linear, cyclic/polycyclic, or dendritic) in proximity of any molecule of interest in a chemical space defined by macromolecule extended atom-pair fingerprint (MXFP), an atom-pair fingerprint describing molecular shape and pharmacophores. We show that the PDGA generates high-similarity analogues of bioactive peptides with diverse peptide chain topologies and of nonpeptide target molecules. We illustrate the chemical space accessible by the PDGA with an interactive 3D map of the MXFP property space available at http://faerun.gdb.tools/ . The PDGA should be generally useful to generate peptides at any location in the chemical space.
Affiliation(s)
- Alice Capecchi
- Department of Chemistry and Biochemistry , University of Bern , Freiestrasse 3 , 3012 Bern , Switzerland
| | - Alain Zhang
- Department of Chemistry and Biochemistry , University of Bern , Freiestrasse 3 , 3012 Bern , Switzerland
| | - Jean-Louis Reymond
- Department of Chemistry and Biochemistry , University of Bern , Freiestrasse 3 , 3012 Bern , Switzerland
| |
|
36
|
Luque Ruiz I, Gómez-Nieto MÁ. Rivality index neighbourhood algorithm with density and distances weighted schemes for the building of robust QSAR classification models with high reliable applicability domain. SAR AND QSAR IN ENVIRONMENTAL RESEARCH 2019; 30:587-615. [PMID: 31469296 DOI: 10.1080/1062936x.2019.1644666] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.4] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Received: 06/15/2019] [Accepted: 07/14/2019] [Indexed: 06/10/2023]
Abstract
The rivality index (RI) is a normalized distance measurement between a molecule and its first nearest neighbours, providing a robust prediction of the activity of a molecule based on the known activity of its nearest neighbours. Negative values of the RI describe molecules that would be correctly classified by a statistical algorithm and, vice versa, positive values of this index describe those molecules detected as outliers by the classification algorithms. In this paper, we describe a classification algorithm based on the RI and propose four weighted schemes (kernels) for its calculation, based on measuring different characteristics of the neighbourhood of each molecule of the dataset at established values of the neighbour threshold. The results obtained demonstrate that the proposed classification algorithm, based on the RI, generates more reliable and robust classification models than many of the most widely used and well-known machine learning algorithms. These results have been validated and corroborated using 20 balanced and unbalanced benchmark datasets of different sizes and modelability. The classification models generated provide valuable information about the molecules of the dataset, the applicability domain of the models and the reliability of the predictions.
Affiliation(s)
- I Luque Ruiz
- Department of Computing and Numerical Analysis, Campus de Rabanales, University of Córdoba , Córdoba , Spain
| | - M Á Gómez-Nieto
- Department of Computing and Numerical Analysis, Campus de Rabanales, University of Córdoba , Córdoba , Spain
| |
|
37
|
Capecchi A, Awale M, Probst D, Reymond JL. PubChem and ChEMBL beyond Lipinski. Mol Inform 2019; 38:e1900016. [PMID: 30844149 DOI: 10.1002/minf.201900016] [Citation(s) in RCA: 19] [Impact Index Per Article: 3.8] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/30/2019] [Accepted: 02/18/2019] [Indexed: 12/13/2022]
Abstract
Seven million of the currently 94 million entries in the PubChem database break at least one of the four Lipinski constraints for oral bioavailability, 183,185 of which are also found in the ChEMBL database. These non-Lipinski PubChem (NLP) and ChEMBL (NLC) subsets are interesting because they contain new modalities that can display biological properties not accessible to small molecule drugs. Unfortunately, the current search tools in PubChem and ChEMBL are designed for small molecules and are not well suited to explore these subsets, which therefore remain poorly appreciated. Herein we report MXFP (macromolecule extended atom-pair fingerprint), a 217-D fingerprint tailored to analyze large molecules in terms of molecular shape and pharmacophores. We implement MXFP in two web-based applications, the first one to visualize NLP and NLC interactively using Faerun (http://faerun.gdb.tools/), the second one to perform MXFP nearest neighbor searches in NLP and NLC (http://similaritysearch.gdb.tools/). We show that these tools provide a meaningful insight into the diversity of large molecules in NLP and NLC. The interactive tools presented here are publicly available at http://gdb.unibe.ch and can be used freely to explore and better understand the diversity of non-Lipinski molecules in PubChem and ChEMBL.
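To make "break at least one of the four Lipinski constraints" concrete, here is a minimal sketch of such a filter. It assumes the commonly cited rule-of-five thresholds (molecular weight at most 500 Da, logP at most 5, at most 5 hydrogen-bond donors, at most 10 hydrogen-bond acceptors); the abstract does not state the exact criteria or descriptor software the authors used, and the `Descriptors` class, function names, and example values are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Descriptors:
    """Precomputed descriptors for one molecule (hypothetical container)."""
    mol_weight: float   # molecular weight in Da
    logp: float         # octanol-water partition coefficient
    h_donors: int       # hydrogen-bond donor count
    h_acceptors: int    # hydrogen-bond acceptor count


def lipinski_violations(d):
    """List which of the four assumed rule-of-five limits are exceeded."""
    violations = []
    if d.mol_weight > 500:
        violations.append("mol_weight")
    if d.logp > 5:
        violations.append("logp")
    if d.h_donors > 5:
        violations.append("h_donors")
    if d.h_acceptors > 10:
        violations.append("h_acceptors")
    return violations


def is_non_lipinski(d):
    """True if the molecule breaks at least one constraint."""
    return len(lipinski_violations(d)) >= 1


# Made-up descriptor values, for illustration only.
small_molecule = Descriptors(mol_weight=320.4, logp=2.1, h_donors=2, h_acceptors=5)
large_peptide = Descriptors(mol_weight=1450.7, logp=-3.0, h_donors=18, h_acceptors=22)
```

Under these assumed thresholds, `small_molecule` passes all four checks while `large_peptide` would fall into a non-Lipinski subset.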
Affiliation(s)
- Alice Capecchi
- Department of Chemistry and Biochemistry, University of Bern, Freiestrasse 3, 3012, Bern, Switzerland
| | - Mahendra Awale
- Department of Chemistry and Biochemistry, University of Bern, Freiestrasse 3, 3012, Bern, Switzerland
| | - Daniel Probst
- Department of Chemistry and Biochemistry, University of Bern, Freiestrasse 3, 3012, Bern, Switzerland
| | - Jean-Louis Reymond
- Department of Chemistry and Biochemistry, University of Bern, Freiestrasse 3, 3012, Bern, Switzerland
| |
|
38
|
Awale M, Reymond JL. Polypharmacology Browser PPB2: Target Prediction Combining Nearest Neighbors with Machine Learning. J Chem Inf Model 2018; 59:10-17. [PMID: 30558418 DOI: 10.1021/acs.jcim.8b00524] [Citation(s) in RCA: 74] [Impact Index Per Article: 12.3] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 02/06/2023]
Abstract
Here we report PPB2 as a target prediction tool assigning targets to a query molecule based on ChEMBL data. PPB2 computes ligand similarities using molecular fingerprints encoding composition (MQN), molecular shape and pharmacophores (Xfp), and substructures (ECfp4) and features an unprecedented combination of nearest neighbor (NN) searches and Naïve Bayes (NB) machine learning, together with simple NN searches, NB and Deep Neural Network (DNN) machine learning models as further options. Although NN(ECfp4) gives the best results in terms of recall in a 10-fold cross-validation study, combining NN searches with NB machine learning provides superior precision statistics, as well as better results in a case study predicting off-targets of a recently reported TRPV6 calcium channel inhibitor, illustrating the value of this combined approach. PPB2 is available to assess possible off-targets of small molecule drug-like compounds by public access at http://gdb.unibe.ch .
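The nearest-neighbor part of such a target prediction can be sketched in a few lines. This is a toy illustration, not PPB2's code: fingerprints are represented as sets of "on" bit indices, similarity is the Tanimoto coefficient (a common choice for binary fingerprints; the abstract does not say which measure PPB2 uses for each fingerprint), and the reference records and target names are hypothetical. The Naïve Bayes and neural network stages are not shown.

```python
def tanimoto(a, b):
    """Tanimoto similarity between two fingerprints given as sets of on-bits."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def predict_targets(query, reference, k=3):
    """Return the targets of the k reference molecules most similar to the query.

    `reference` is a list of (fingerprint, target_name) pairs. Targets are
    reported once each, in order of decreasing similarity.
    """
    ranked = sorted(reference,
                    key=lambda record: tanimoto(query, record[0]),
                    reverse=True)
    targets = []
    for _fingerprint, target in ranked[:k]:
        if target not in targets:
            targets.append(target)
    return targets


# Hypothetical reference set: (fingerprint bits, annotated target).
reference_set = [
    ({1, 2, 3, 4}, "target_A"),
    ({1, 2, 9}, "target_B"),
    ({7, 8}, "target_C"),
]
query_fp = {1, 2, 3}
predicted = predict_targets(query_fp, reference_set, k=2)
```

Here the query shares three of four bits with the first record and two of four with the second, so those two targets are returned in that order.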
Affiliation(s)
- Mahendra Awale
- Department of Chemistry and Biochemistry, National Center of Competence in Research NCCR TransCure , University of Berne , Freiestrasse 3 , 3012 Berne , Switzerland
| | - Jean-Louis Reymond
- Department of Chemistry and Biochemistry, National Center of Competence in Research NCCR TransCure , University of Berne , Freiestrasse 3 , 3012 Berne , Switzerland
| |
|