1
Vogt M. Chemoinformatic approaches for navigating large chemical spaces. Expert Opin Drug Discov 2024; 19:403-414. [PMID: 38300511 DOI: 10.1080/17460441.2024.2313475] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/12/2023] [Accepted: 01/30/2024] [Indexed: 02/02/2024]
Abstract
INTRODUCTION Large chemical spaces (CSs) include traditional large compound collections, combinatorial libraries covering billions to trillions of molecules, DNA-encoded chemical libraries comprising complete combinatorial CSs in a single mixture, and virtual CSs explored by generative models. The diverse nature of these types of CSs requires different chemoinformatic approaches for navigation. AREAS COVERED An overview of different types of large CSs is provided. Molecular representations and similarity metrics suitable for large CS exploration are discussed. A summary of navigation of CSs in generative models is provided. Methods for characterizing and comparing CSs are discussed. EXPERT OPINION The size of large CSs might restrict navigation to specialized algorithms and limit it to considering neighborhoods of structurally similar molecules. Efficient navigation of large CSs not only requires methods that scale with size but also requires smart approaches that focus on better but not necessarily larger molecule selections. Deep generative models aim to provide such approaches by implicitly learning features relevant for targeted biological properties. It is unclear whether these models can fulfill this ideal, as validation is difficult as long as the covered CSs remain mainly virtual without experimental verification.
Affiliation(s)
- Martin Vogt
  - Department of Life Science Informatics, B-IT, LIMES Program Unit Chemical Biology and Medicinal Chemistry, Rheinische Friedrich-Wilhelms-Universität, Bonn, Germany
2
López-Pérez K, López-López E, Medina-Franco JL, Miranda-Quintana RA. Sampling and Mapping Chemical Space with Extended Similarity Indices. Molecules 2023; 28:6333. [PMID: 37687162 PMCID: PMC10489020 DOI: 10.3390/molecules28176333] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/25/2023] [Revised: 08/24/2023] [Accepted: 08/26/2023] [Indexed: 09/10/2023]
Abstract
Visualization of the chemical space is useful in many aspects of chemistry, including compound library design, diversity analysis, and exploring structure-property relationships, to name a few. Examples of notable research areas where the visualization of chemical space has strong applications are drug discovery and natural product research. However, the sheer volume of even comparatively small sub-sections of chemical space implies that we need to use approximations when navigating through chemical space. ChemMaps is a visualization methodology that approximates the distribution of compounds in large datasets based on the selection of satellite compounds that yield a similar mapping of the whole dataset when principal component analysis on a similarity matrix is performed. Here, we show how the recently proposed extended similarity indices can help find regions that are relevant to sample satellites and reduce the amount of high-dimensional data needed to describe a library's chemical space.
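The satellite idea in this abstract can be illustrated with a small sketch. It is not the ChemMaps implementation: the fingerprints are toy sets of on-bit indices, the satellites are picked by hand rather than sampled with extended similarity indices, the names `tanimoto` and `satellite_matrix` are hypothetical, and the principal component analysis step is left out. It only shows how comparing every compound against k satellites gives an N x k similarity matrix instead of the full N x N one.

```python
from typing import Dict, FrozenSet, List

Fingerprint = FrozenSet[int]  # toy fingerprint: indices of the bits that are set


def tanimoto(a: Fingerprint, b: Fingerprint) -> float:
    """Tanimoto (Jaccard) similarity of two on-bit sets."""
    union = len(a | b)
    return len(a & b) / union if union else 1.0


def satellite_matrix(
    library: Dict[str, Fingerprint], satellites: List[str]
) -> Dict[str, List[float]]:
    """Similarity of every compound to each satellite (N rows, k columns)."""
    return {
        name: [tanimoto(fp, library[s]) for s in satellites]
        for name, fp in library.items()
    }


# Hypothetical four-compound library; the bit indices carry no chemical meaning.
library = {
    "cpd_a": frozenset({1, 2, 3, 4}),
    "cpd_b": frozenset({1, 2, 3, 5}),
    "cpd_c": frozenset({7, 8, 9}),
    "cpd_d": frozenset({7, 8, 10}),
}
satellites = ["cpd_a", "cpd_c"]  # hand-picked here, an assumption of this sketch
reduced = satellite_matrix(library, satellites)
```

Each row of `reduced` has one column per satellite, so a downstream projection would work on 4 x 2 values here rather than 4 x 4. How well such a reduced matrix reproduces the full mapping depends on how the satellites are chosen, which is the question the paper addresses.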
Affiliation(s)
- Kenneth López-Pérez
  - Department of Chemistry and Quantum Theory Project, University of Florida, Gainesville, FL 32611, USA
- Edgar López-López
  - DIFACQUIM Research Group, Department of Pharmacy, National Autonomous University of Mexico, Mexico City 04510, Mexico
  - Department of Chemistry and Graduate Program in Pharmacology, Center for Research and Advanced Studies of the National Polytechnic Institute, Mexico City 07000, Mexico
- José L. Medina-Franco
  - DIFACQUIM Research Group, Department of Pharmacy, National Autonomous University of Mexico, Mexico City 04510, Mexico
3
Mukherjee G, Braka A, Wu S. Quantifying Functional-Group-like Structural Fragments in Molecules and Its Applications in Drug Design. J Chem Inf Model 2023; 63:2073-2083. [PMID: 36881497 DOI: 10.1021/acs.jcim.3c00050] [Citation(s) in RCA: 4] [Impact Index Per Article: 4.0] [Indexed: 03/08/2023]
Abstract
A functional group in a molecule is a structural fragment consisting of a few atoms or a single atom that imparts reactivity to a molecule. Hence, defining functional groups is crucial in chemistry to predict the properties and reactivities of molecules. However, there is no established method in the literature for defining functional groups based on reactivity parameters. In this work, we addressed this issue by designing a set of predefined structural fragments along with reactivity parameters like electron conjugation and ring strain. This approach uses bond orders and atom connectivities to quantify the presence of these fragments within an organic molecule based on given input molecular coordinates. To assess the effectiveness of this approach, we performed a case study to show the benefits of using these newly designed structural fragments instead of traditional fingerprint-based methods for grouping potential COX1/COX2 inhibitors by screening an approved drug library against the aspirin molecule. The structural fragment-based model for ternary classification of rat oral LD50 of chemicals showed performance similar to the fingerprint-based models. In evaluating the regression model performance for aqueous solubility, log(S), our approach outperformed the fingerprint-based model.
Affiliation(s)
- Goutam Mukherjee
  - R&D Center, PharmCADD Co. Ltd., 12F, 331, Jungang-daero, Dong-gu, Busan 48792, Republic of Korea
- Abdennour Braka
  - R&D Center, PharmCADD Co. Ltd., 12F, 331, Jungang-daero, Dong-gu, Busan 48792, Republic of Korea
- Sangwook Wu
  - R&D Center, PharmCADD Co. Ltd., 12F, 331, Jungang-daero, Dong-gu, Busan 48792, Republic of Korea
  - Department of Physics, Pukyong National University, Busan 48513, Republic of Korea
4
Saldívar-González FI, Medina-Franco JL. Approaches for enhancing the analysis of chemical space for drug discovery. Expert Opin Drug Discov 2022; 17:789-798. [PMID: 35640229 DOI: 10.1080/17460441.2022.2084608] [Citation(s) in RCA: 5] [Impact Index Per Article: 2.5] [Indexed: 12/22/2022]
Abstract
INTRODUCTION Chemical space is a powerful, general, and practical conceptual framework in drug discovery and other areas in chemistry that addresses the diversity of molecules and it has various applications. Moreover, chemical space is a cornerstone of chemoinformatics as a scientific discipline. In response to the increase in the set of chemical compounds in databases, generators of chemical structures, and tools to calculate molecular descriptors, novel approaches to generate visual representations of chemical space in low dimensions are emerging and evolving. Such approaches include a wide range of commercial and free applications, software, and open-source methods. AREAS COVERED The current state of chemical space in drug design and discovery is reviewed. The topics discussed herein include advances for efficient navigation in chemical space, the use of this concept in assessing the diversity of different data sets, exploring structure-property/activity relationships for one or multiple endpoints, and compound library design. Recent advances in methodologies for generating visual representations of chemical space have been highlighted, thereby emphasizing open-source methods. EXPERT OPINION Quantitative and qualitative generation and analysis of chemical space require novel approaches for handling the increasing number of molecules and their information available in chemical databases (including emerging ultra-large libraries). In addition, it is of utmost importance to note that chemical space is a conceptual framework that goes beyond visual representation in low dimensions. However, the graphical representation of chemical space has several practical applications in drug discovery and beyond.
Affiliation(s)
- Fernanda I Saldívar-González
  - DIFACQUIM Research Group, Department of Pharmacy, School of Chemistry, Universidad Nacional Autónoma de México, Avenida Universidad 3000, Mexico City 04510, Mexico
- José L Medina-Franco
  - DIFACQUIM Research Group, Department of Pharmacy, School of Chemistry, Universidad Nacional Autónoma de México, Avenida Universidad 3000, Mexico City 04510, Mexico
5
Saldívar-González FI, Aldas-Bulos VD, Medina-Franco JL, Plisson F. Natural product drug discovery in the artificial intelligence era. Chem Sci 2022; 13:1526-1546. [PMID: 35282622 PMCID: PMC8827052 DOI: 10.1039/d1sc04471k] [Citation(s) in RCA: 50] [Impact Index Per Article: 25.0] [Received: 08/13/2021] [Accepted: 12/10/2021] [Indexed: 12/19/2022]
Abstract
Natural products (NPs) are primarily recognized as privileged structures to interact with protein drug targets. Their unique characteristics and structural diversity continue to marvel scientists for developing NP-inspired medicines, even though the pharmaceutical industry has largely given up. High-performance computer hardware, extensive storage, accessible software and affordable online education have democratized the use of artificial intelligence (AI) in many sectors and research areas. The last decades have introduced natural language processing and machine learning algorithms, two subfields of AI, to tackle NP drug discovery challenges and open up opportunities. In this article, we review and discuss the rational applications of AI approaches developed to assist in discovering bioactive NPs and capturing the molecular "patterns" of these privileged structures for combinatorial design or target selectivity.
Affiliation(s)
- F I Saldívar-González
  - DIFACQUIM Research Group, School of Chemistry, Department of Pharmacy, Universidad Nacional Autónoma de México, Avenida Universidad 3000, 04510 Mexico, Mexico
- V D Aldas-Bulos
  - Unidad de Genómica Avanzada, Laboratorio Nacional de Genómica para la Biodiversidad (Langebio), Centro de Investigación y de Estudios Avanzados del IPN, Irapuato, Guanajuato, Mexico
- J L Medina-Franco
  - DIFACQUIM Research Group, School of Chemistry, Department of Pharmacy, Universidad Nacional Autónoma de México, Avenida Universidad 3000, 04510 Mexico, Mexico
- F Plisson
  - CONACYT - Unidad de Genómica Avanzada, Laboratorio Nacional de Genómica para la Biodiversidad (Langebio), Centro de Investigación y de Estudios Avanzados del IPN, Irapuato, Guanajuato, Mexico
6
Santiago Á, Guzmán-Ocampo DC, Aguayo-Ortiz R, Dominguez L. Characterizing the Chemical Space of γ-Secretase Inhibitors and Modulators. ACS Chem Neurosci 2021; 12:2765-2775. [PMID: 34291906 DOI: 10.1021/acschemneuro.1c00313] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Indexed: 12/23/2022]
Abstract
γ-Secretase (GS) is one of the most attractive molecular targets for the treatment of Alzheimer's disease (AD). Its key role in the final step of amyloid-β peptides generation and its relationship in the cascade of events for disease development have caught the attention of many pharmaceutical groups. Over the past years, different inhibitors and modulators have been evaluated as promising therapeutics against AD. However, despite the great chemical diversity of the reported compounds, a global classification and visual representation of the chemical space for GS inhibitors and modulators remain unavailable. In the present work, we carried out a two-dimensional (2D) chemical space analysis from different classes and subclasses of GS inhibitors and modulators based on their structural similarity. Along with the novel structural information available for GS complexes, our analysis opens the possibility to identify compounds with high molecular similarity, critical to finding new chemical structures through the optimization of existing compounds and relating them with a potential binding site.
Affiliation(s)
- Ángel Santiago
  - Departamento de Fisicoquímica, Facultad de Química, Universidad Nacional Autónoma de México, Mexico City 04510, Mexico
- Dulce C. Guzmán-Ocampo
  - Departamento de Fisicoquímica, Facultad de Química, Universidad Nacional Autónoma de México, Mexico City 04510, Mexico
- Rodrigo Aguayo-Ortiz
  - Departamento de Farmacia, Facultad de Química, Universidad Nacional Autónoma de México, Mexico City 04510, Mexico
- Laura Dominguez
  - Departamento de Fisicoquímica, Facultad de Química, Universidad Nacional Autónoma de México, Mexico City 04510, Mexico
7
Medina-Franco JL, Sánchez-Cruz N, López-López E, Díaz-Eufracio BI. Progress on open chemoinformatic tools for expanding and exploring the chemical space. J Comput Aided Mol Des 2021; 36:341-354. [PMID: 34143323 PMCID: PMC8211976 DOI: 10.1007/s10822-021-00399-1] [Citation(s) in RCA: 19] [Impact Index Per Article: 6.3] [Received: 05/10/2021] [Accepted: 06/14/2021] [Indexed: 01/10/2023]
Abstract
The concept of chemical space is a cornerstone in chemoinformatics, and it has broad conceptual and practical applicability in many areas of chemistry, including drug design and discovery. One of the most considerable impacts is in the study of structure-property relationships where the property can be a biological activity or any other characteristic of interest to a particular chemistry discipline. The chemical space is highly dependent on the molecular representation that is also a cornerstone concept in computational chemistry. Herein, we discuss the recent progress on chemoinformatic tools developed to expand and characterize the chemical space of compound data sets using different types of molecular representations, generate visual representations of such spaces, and explore structure-property relationships in the context of chemical spaces. We emphasize the development of methods and freely available tools focusing on drug discovery applications. We also comment on the general advantages and shortcomings of using freely available and easy-to-use tools and discuss the value of using such open resources for research, education, and scientific dissemination.
Affiliation(s)
- José L Medina-Franco
  - DIFACQUIM Research Group, Department of Pharmacy, School of Chemistry, Universidad Nacional Autónoma de México, 04510, Mexico City, Mexico
- Norberto Sánchez-Cruz
  - DIFACQUIM Research Group, Department of Pharmacy, School of Chemistry, Universidad Nacional Autónoma de México, 04510, Mexico City, Mexico
- Edgar López-López
  - DIFACQUIM Research Group, Department of Pharmacy, School of Chemistry, Universidad Nacional Autónoma de México, 04510, Mexico City, Mexico
  - Departamento de Química y Programa de Posgrado en Farmacología, Centro de Investigación y de Estudios Avanzados del Instituto Politécnico Nacional, Apartado 14-740, 07000, Mexico City, Mexico
- Bárbara I Díaz-Eufracio
  - DIFACQUIM Research Group, Department of Pharmacy, School of Chemistry, Universidad Nacional Autónoma de México, 04510, Mexico City, Mexico
8
Medina-Franco JL, Saldívar-González FI. Cheminformatics to Characterize Pharmacologically Active Natural Products. Biomolecules 2020; 10:E1566. [PMID: 33213003 PMCID: PMC7698493 DOI: 10.3390/biom10111566] [Citation(s) in RCA: 24] [Impact Index Per Article: 6.0] [Received: 10/12/2020] [Revised: 11/11/2020] [Accepted: 11/14/2020] [Indexed: 12/19/2022]
Abstract
Natural products have a significant role in drug discovery. Natural products have distinctive chemical structures that have contributed to identifying and developing drugs for different therapeutic areas. Moreover, natural products are significant sources of inspiration or starting points to develop new therapeutic agents. Natural products such as peptides and macrocycles, and other compounds with unique features represent attractive sources to address complex diseases. Computational approaches that use chemoinformatics and molecular modeling methods contribute to speed up natural product-based drug discovery. Several research groups have recently used computational methodologies to organize data, interpret results, generate and test hypotheses, filter large chemical databases before the experimental screening, and design experiments. This review discusses a broad range of chemoinformatics applications to support natural product-based drug discovery. We emphasize profiling natural product data sets in terms of diversity; complexity; acid/base; absorption, distribution, metabolism, excretion, and toxicity (ADME/Tox) properties; and fragment analysis. Novel techniques for the visual representation of the chemical space are also discussed.
Affiliation(s)
- José L. Medina-Franco
  - DIFACQUIM Research Group, Department of Pharmacy, School of Chemistry, Universidad Nacional Autónoma de México, Avenida Universidad 3000, Mexico City 04510, Mexico
9
Capecchi A, Probst D, Reymond JL. One molecular fingerprint to rule them all: drugs, biomolecules, and the metabolome. J Cheminform 2020; 12:43. [PMID: 33431010 PMCID: PMC7291580 DOI: 10.1186/s13321-020-00445-4] [Citation(s) in RCA: 134] [Impact Index Per Article: 33.5] [Received: 03/18/2020] [Accepted: 06/04/2020] [Indexed: 02/08/2023]
Abstract
BACKGROUND Molecular fingerprints are essential cheminformatics tools for virtual screening and mapping chemical space. Among the different types of fingerprints, substructure fingerprints perform best for small molecules such as drugs, while atom-pair fingerprints are preferable for large molecules such as peptides. However, no available fingerprint achieves good performance on both classes of molecules. RESULTS Here we set out to design a new fingerprint suitable for both small and large molecules by combining substructure and atom-pair concepts. Our quest resulted in a new fingerprint called MinHashed atom-pair fingerprint up to a diameter of four bonds (MAP4). In this fingerprint the circular substructures with radii of r = 1 and r = 2 bonds around each atom in an atom-pair are written as two pairs of SMILES, each pair being combined with the topological distance separating the two central atoms. These so-called atom-pair molecular shingles are hashed, and the resulting set of hashes is MinHashed to form the MAP4 fingerprint. MAP4 significantly outperforms all other fingerprints on an extended benchmark that combines the Riniker and Landrum small molecule benchmark with a peptide benchmark recovering BLAST analogs from either scrambled or point mutation analogs. MAP4 furthermore produces well-organized chemical space tree-maps (TMAPs) for databases as diverse as DrugBank, ChEMBL, SwissProt and the Human Metabolome Database (HMDB), and differentiates between all metabolites in HMDB, over 70% of which are indistinguishable from their nearest neighbor using substructure fingerprints. CONCLUSION MAP4 is a new molecular fingerprint suitable for drugs, biomolecules, and the metabolome and can be adopted as a universal fingerprint to describe and search chemical space.
The source code is available at https://github.com/reymond-group/map4 and interactive MAP4 similarity search tools and TMAPs for various databases are accessible at http://map-search.gdb.tools/ and http://tm.gdb.tools/map4/.
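The hashing and MinHash steps described in this abstract can be sketched generically as follows. This is not the MAP4 code from the repository above: the shingle strings are made-up placeholders rather than real atom-pair shingles, the function names are hypothetical, and salted SHA-1 stands in for whatever hash family the authors use. The sketch only shows how a set of string shingles becomes a fixed-length signature whose slot-wise agreement estimates Jaccard similarity.

```python
import hashlib
from typing import Iterable, List


def _hash(shingle: str, seed: int) -> int:
    """Deterministic 64-bit hash of a shingle, salted with the seed."""
    digest = hashlib.sha1(f"{seed}:{shingle}".encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")


def minhash_signature(shingles: Iterable[str], n_perm: int = 128) -> List[int]:
    """One minimum per salted hash function gives an n_perm-long signature."""
    items = set(shingles)
    if not items:
        raise ValueError("need at least one shingle")
    return [min(_hash(s, seed) for s in items) for seed in range(n_perm)]


def estimated_jaccard(sig_a: List[int], sig_b: List[int]) -> float:
    """Fraction of signature slots that agree, an estimate of set overlap."""
    if len(sig_a) != len(sig_b):
        raise ValueError("signatures must have the same length")
    return sum(x == y for x, y in zip(sig_a, sig_b)) / len(sig_a)


# Placeholder shingles in a "fragment|distance|fragment" shape (illustrative only).
mol_1 = {"CC|2|CO", "CO|3|CN", "CC|4|CN"}
mol_2 = {"CC|2|CO", "CO|3|CN", "CC|5|CS"}
mol_3 = {"c1ccccc1|1|CF", "CF|2|CCl"}

sig_1 = minhash_signature(mol_1)
sig_2 = minhash_signature(mol_2)
sig_3 = minhash_signature(mol_3)
```

`estimated_jaccard(sig_1, sig_2)` approximates the true overlap of the two shingle sets (2 shared out of 4 distinct), with an error that shrinks as `n_perm` grows. Because signatures have a fixed length regardless of molecule size, they can be compared and indexed uniformly for small molecules and peptides alike.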
Affiliation(s)
- Alice Capecchi
  - Department of Chemistry and Biochemistry, University of Bern, Freiestrasse 3, 3012, Bern, Switzerland
- Daniel Probst
  - Department of Chemistry and Biochemistry, University of Bern, Freiestrasse 3, 3012, Bern, Switzerland
- Jean-Louis Reymond
  - Department of Chemistry and Biochemistry, University of Bern, Freiestrasse 3, 3012, Bern, Switzerland
10
Capecchi A, Zhang A, Reymond JL. Populating Chemical Space with Peptides Using a Genetic Algorithm. J Chem Inf Model 2020; 60:121-132. [PMID: 31868369 DOI: 10.1021/acs.jcim.9b01014] [Citation(s) in RCA: 14] [Impact Index Per Article: 3.5] [Indexed: 12/13/2022]
Abstract
In drug discovery, one uses chemical space as a concept to organize molecules according to their structures and properties. One often would like to generate new possible molecules at a specific location in the chemical space marked by a molecule of interest. Herein, we report the peptide design genetic algorithm (PDGA, code available at https://github.com/reymond-group/PeptideDesignGA), a computational tool capable of producing peptide sequences of various topologies (linear, cyclic/polycyclic, or dendritic) in proximity of any molecule of interest in a chemical space defined by macromolecule extended atom-pair fingerprint (MXFP), an atom-pair fingerprint describing molecular shape and pharmacophores. We show that the PDGA generates high-similarity analogues of bioactive peptides with diverse peptide chain topologies and of nonpeptide target molecules. We illustrate the chemical space accessible by the PDGA with an interactive 3D map of the MXFP property space available at http://faerun.gdb.tools/. The PDGA should be generally useful to generate peptides at any location in the chemical space.
Affiliation(s)
- Alice Capecchi
  - Department of Chemistry and Biochemistry, University of Bern, Freiestrasse 3, 3012 Bern, Switzerland
- Alain Zhang
  - Department of Chemistry and Biochemistry, University of Bern, Freiestrasse 3, 3012 Bern, Switzerland
- Jean-Louis Reymond
  - Department of Chemistry and Biochemistry, University of Bern, Freiestrasse 3, 3012 Bern, Switzerland
11
Naveja JJ, Medina-Franco JL. Finding Constellations in Chemical Space Through Core Analysis. Front Chem 2019; 7:510. [PMID: 31380353 PMCID: PMC6646408 DOI: 10.3389/fchem.2019.00510] [Citation(s) in RCA: 30] [Impact Index Per Article: 6.0] [Received: 05/14/2019] [Accepted: 07/03/2019] [Indexed: 12/15/2022]
Abstract
Herein we introduce the constellation plots as a general approach that merges different and complementary molecular representations to enhance the information contained in a visual representation and analysis of chemical space. The method is based on a combination of a sub-structure based representation and classification of compounds with a "classical" coordinate-based representation of chemical space. A distinctive outcome of the method is that organizing the compounds in analog series leads to the formation of groups of molecules, aka "constellations" in chemical space. The novel approach is general and can be used to rapidly identify, for instance, insightful and "bright" Structure-Activity Relationships (StARs) in chemical space that are easy to interpret. This kind of analysis is expected to be especially useful for lead identification in large datasets of unannotated molecules, such as those obtained through high-throughput screening. We demonstrate the application of the method using two datasets of focused inhibitors designed against DNMTs and AKT1.
Affiliation(s)
- J. Jesús Naveja
  - PECEM, School of Medicine, Universidad Nacional Autónoma de México, Mexico City, Mexico
  - Department of Pharmacy, School of Chemistry, Universidad Nacional Autónoma de México, Mexico City, Mexico
- José L. Medina-Franco
  - Department of Pharmacy, School of Chemistry, Universidad Nacional Autónoma de México, Mexico City, Mexico
12
Saldívar-González FI, Pilón-Jiménez BA, Medina-Franco JL. Chemical space of naturally occurring compounds. Physical Sciences Reviews 2019. [DOI: 10.1515/psr-2018-0103] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.2] [Indexed: 12/27/2022]
Abstract
The chemical space of naturally occurring compounds is vast and diverse. Other than biologics, naturally occurring small molecules include a large variety of compounds covering natural products from different sources such as plant, marine, and fungi, to name a few, and several food chemicals. The systematic exploration of the chemical space of naturally occurring compounds has significant implications in many areas of research including but not limited to drug discovery, nutrition, and bio- and chemical diversity analysis. The exploration of the coverage and diversity of the chemical space of compound databases can be carried out in different ways. The approach will largely depend on the criteria to define the chemical space, which are commonly selected based on the goals of the study. This chapter discusses major compound databases of natural products and cheminformatics strategies that have been used to characterize the chemical space of natural products. Recent exemplary studies of the chemical space of natural products from different sources and their relationships with other compounds are also discussed. We also present novel chemical descriptors and data mining approaches that are emerging to characterize the chemical space of naturally occurring compounds.
13
Capecchi A, Awale M, Probst D, Reymond JL. PubChem and ChEMBL beyond Lipinski. Mol Inform 2019; 38:e1900016. [PMID: 30844149 DOI: 10.1002/minf.201900016] [Citation(s) in RCA: 19] [Impact Index Per Article: 3.8] [Received: 01/30/2019] [Accepted: 02/18/2019] [Indexed: 12/13/2022]
Abstract
Seven million of the currently 94 million entries in the PubChem database break at least one of the four Lipinski constraints for oral bioavailability, 183,185 of which are also found in the ChEMBL database. These non-Lipinski PubChem (NLP) and ChEMBL (NLC) subsets are interesting because they contain new modalities that can display biological properties not accessible to small molecule drugs. Unfortunately, the current search tools in PubChem and ChEMBL are designed for small molecules and are not well suited to explore these subsets, which therefore remain poorly appreciated. Herein we report MXFP (macromolecule extended atom-pair fingerprint), a 217-D fingerprint tailored to analyze large molecules in terms of molecular shape and pharmacophores. We implement MXFP in two web-based applications, the first one to visualize NLP and NLC interactively using Faerun (http://faerun.gdb.tools/), the second one to perform MXFP nearest neighbor searches in NLP and NLC (http://similaritysearch.gdb.tools/). We show that these tools provide a meaningful insight into the diversity of large molecules in NLP and NLC. The interactive tools presented here are publicly available at http://gdb.unibe.ch and can be used freely to explore and better understand the diversity of non-Lipinski molecules in PubChem and ChEMBL.
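The "breaks at least one of the four Lipinski constraints" criterion can be written down as a short check. This is a minimal sketch under stated assumptions, not the authors' filtering code: it uses the thresholds as commonly stated (molecular weight at most 500 Da, logP at most 5, at most 5 hydrogen-bond donors, at most 10 hydrogen-bond acceptors), it takes descriptors as already computed by some other tool, and the `Descriptors` class, function names, and example values are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Descriptors:
    """Precomputed properties; how they are calculated is outside this sketch."""
    mol_weight: float   # Da
    logp: float         # octanol-water partition coefficient (calculated)
    h_donors: int       # hydrogen-bond donors
    h_acceptors: int    # hydrogen-bond acceptors


def lipinski_violations(d: Descriptors) -> int:
    """Count how many of the four commonly stated constraints are broken."""
    return sum(
        [
            d.mol_weight > 500,
            d.logp > 5,
            d.h_donors > 5,
            d.h_acceptors > 10,
        ]
    )


def is_non_lipinski(d: Descriptors) -> bool:
    """True if at least one constraint is broken (the subset criterion above)."""
    return lipinski_violations(d) >= 1


# Illustrative values only; they are not taken from PubChem or ChEMBL.
small_molecule = Descriptors(mol_weight=180.2, logp=1.2, h_donors=1, h_acceptors=4)
large_peptide = Descriptors(mol_weight=1200.0, logp=3.0, h_donors=5, h_acceptors=12)
```

Here `is_non_lipinski(small_molecule)` is `False`, while `large_peptide` breaks the molecular weight and acceptor limits and would fall into a non-Lipinski subset. Results on real data depend on the descriptor definitions used, in particular which logP estimate and which donor/acceptor counting scheme.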
Affiliation(s)
- Alice Capecchi
  - Department of Chemistry and Biochemistry, University of Bern, Freiestrasse 3, 3012, Bern, Switzerland
- Mahendra Awale
  - Department of Chemistry and Biochemistry, University of Bern, Freiestrasse 3, 3012, Bern, Switzerland
- Daniel Probst
  - Department of Chemistry and Biochemistry, University of Bern, Freiestrasse 3, 3012, Bern, Switzerland
- Jean-Louis Reymond
  - Department of Chemistry and Biochemistry, University of Bern, Freiestrasse 3, 3012, Bern, Switzerland
14
Naveja JJ, Rico-Hidalgo MP, Medina-Franco JL. Analysis of a large food chemical database: chemical space, diversity, and complexity. F1000Res 2018; 7:Chem Inf Sci-993. [PMID: 30135721 PMCID: PMC6081979 DOI: 10.12688/f1000research.15440.1] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.8] [Accepted: 06/27/2018] [Indexed: 12/18/2022]
Abstract
Background: Food chemicals are a cornerstone in the food industry. However, their chemical diversity has been explored on a limited basis; for instance, previous analyses of food-related databases covered up to 2,200 molecules. The goal of this work was to quantify the chemical diversity of chemical compounds stored in FooDB, a database with nearly 24,000 food chemicals. Methods: The visual representation of the chemical space of FooDB was done with ChemMaps, a novel approach based on the concept of chemical satellites. The large food chemical database was profiled based on physicochemical properties, molecular complexity and scaffold content. The global diversity of FooDB was characterized using Consensus Diversity Plots. Results: It was found that compounds in FooDB are very diverse in terms of properties and structure, with a large structural complexity. It was also found that one third of the food chemicals are acyclic molecules and ring-containing molecules are mostly monocyclic, with several scaffolds common to natural products in other databases. Conclusions: To the best of our knowledge, this is the first analysis of the chemical diversity and complexity of FooDB. This study represents a step further to the emerging field of "Food Informatics". Future studies should directly compare the chemical structures of the molecules in FooDB with other compound databases, for instance, drug-like databases and natural products collections.
Affiliation(s)
- J. Jesús Naveja
  - PECEM, Faculty of Medicine, Universidad Nacional Autónoma de México, Mexico City, 04510, Mexico
  - Department of Pharmacy, School of Chemistry, Universidad Nacional Autónoma de México, Mexico City, 04510, Mexico
- Mariel P. Rico-Hidalgo
  - Department of Pharmacy, School of Chemistry, Universidad Nacional Autónoma de México, Mexico City, 04510, Mexico
- José L. Medina-Franco
  - Department of Pharmacy, School of Chemistry, Universidad Nacional Autónoma de México, Mexico City, 04510, Mexico
15
Naveja JJ, Rico-Hidalgo MP, Medina-Franco JL. Analysis of a large food chemical database: chemical space, diversity, and complexity. F1000Res 2018; 7:Chem Inf Sci-993. [PMID: 30135721 PMCID: PMC6081979 DOI: 10.12688/f1000research.15440.2] [Citation(s) in RCA: 37] [Impact Index Per Article: 6.2] [Accepted: 08/07/2018] [Indexed: 12/18/2022]
Abstract
Background: Food chemicals are a cornerstone in the food industry. However, their chemical diversity has been explored on a limited basis; for instance, previous analyses of food-related databases covered up to 2,200 molecules. The goal of this work was to quantify the chemical diversity of chemical compounds stored in FooDB, a database with nearly 24,000 food chemicals. Methods: The visual representation of the chemical space of FooDB was done with ChemMaps, a novel approach based on the concept of chemical satellites. The large food chemical database was profiled based on physicochemical properties, molecular complexity and scaffold content. The global diversity of FooDB was characterized using Consensus Diversity Plots. Results: It was found that compounds in FooDB are very diverse in terms of properties and structure, with a large structural complexity. It was also found that one third of the food chemicals are acyclic molecules and ring-containing molecules are mostly monocyclic, with several scaffolds common to natural products in other databases. Conclusions: To the best of our knowledge, this is the first analysis of the chemical diversity and complexity of FooDB. This study represents a step further to the emerging field of "Food Informatics". Future studies should directly compare the chemical structures of the molecules in FooDB with other compound databases, for instance, drug-like databases and natural products collections. An additional future direction of this work is to use the list of 3,228 polyphenolic compounds identified in this work to enhance the on-going polyphenol-protein interactome studies.
Affiliation(s)
- J. Jesús Naveja
  - PECEM, Faculty of Medicine, Universidad Nacional Autónoma de México, Mexico City, 04510, Mexico
  - Department of Pharmacy, School of Chemistry, Universidad Nacional Autónoma de México, Mexico City, 04510, Mexico
- Mariel P. Rico-Hidalgo
  - Department of Pharmacy, School of Chemistry, Universidad Nacional Autónoma de México, Mexico City, 04510, Mexico
- José L. Medina-Franco
  - Department of Pharmacy, School of Chemistry, Universidad Nacional Autónoma de México, Mexico City, 04510, Mexico
|
16
|
Naveja JJ, Medina-Franco JL. Insights from pharmacological similarity of epigenetic targets in epipolypharmacology. Drug Discov Today 2018; 23:141-150. [DOI: 10.1016/j.drudis.2017.10.006] [Citation(s) in RCA: 19] [Impact Index Per Article: 3.2] [Received: 08/03/2017] [Revised: 09/05/2017] [Accepted: 10/05/2017] [Indexed: 01/10/2023]
|
17
|
Naveja JJ, Oviedo-Osornio CI, Trujillo-Minero NN, Medina-Franco JL. Chemoinformatics: a perspective from an academic setting in Latin America. Mol Divers 2017; 22:247-258. [PMID: 29204824 DOI: 10.1007/s11030-017-9802-3] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.9] [Received: 10/05/2017] [Accepted: 11/26/2017] [Indexed: 12/13/2022]
Abstract
This perspective discusses the current progress of a chemoinformatics group at a major university in Latin America. Three major aspects are discussed critically: research, education, and collaboration with industry and other public research networks. An overview of progress in applied research and in the development of research concepts is also presented. Efforts to teach chemoinformatics at the undergraduate and graduate levels are discussed. We address how partnerships with industry and other not-for-profit research institutions not only bring additional sources of funding but, more importantly, increase the impact of the multidisciplinary work and give students exposure to other research environments. We also discuss the main perspectives and challenges that remain to be addressed in these settings.
Affiliation(s)
- J Jesús Naveja
  - School of Chemistry, Department of Pharmacy, Universidad Nacional Autónoma de México, Avenida Universidad 3000, 04510, Mexico City, Mexico
  - PECEM, Facultad de Medicina, Universidad Nacional Autónoma de México, Avenida Universidad 3000, 04510, Mexico City, Mexico
- C Iluhí Oviedo-Osornio
  - School of Chemistry, Department of Pharmacy, Universidad Nacional Autónoma de México, Avenida Universidad 3000, 04510, Mexico City, Mexico
- Nicole N Trujillo-Minero
  - School of Chemistry, Department of Pharmacy, Universidad Nacional Autónoma de México, Avenida Universidad 3000, 04510, Mexico City, Mexico
- José L Medina-Franco
  - School of Chemistry, Department of Pharmacy, Universidad Nacional Autónoma de México, Avenida Universidad 3000, 04510, Mexico City, Mexico
|
18
|
Naveja JJ, Medina-Franco JL. ChemMaps: Towards an approach for visualizing the chemical space based on adaptive satellite compounds. F1000Res 2017; 6. [PMID: 28794856 PMCID: PMC5538041 DOI: 10.12688/f1000research.12095.2] [Citation(s) in RCA: 18] [Impact Index Per Article: 2.6] [Accepted: 08/03/2017] [Indexed: 01/22/2023] Open
Abstract
We present a novel approach called ChemMaps for visualizing chemical space based on the similarity matrix of compound datasets generated from molecular fingerprint similarity. The method uses a ‘satellites’ approach, where satellites are, in principle, molecules whose similarity to the rest of the molecules in the database provides sufficient information for generating a visualization of the chemical space. Such an approach could help make chemical space visualizations more efficient. We describe a proof-of-principle application of the method to various databases that have different diversity measures. Unsurprisingly, we found that the method works better with databases that have low 2D diversity. 3D diversity played a secondary role, although it seems to become more relevant as 2D diversity increases. For less diverse datasets, taking as few as 25% of the compounds as satellites seems to be sufficient for a fair depiction of the chemical space. We propose iteratively increasing the number of satellites by 5% of the whole database and stopping when the new and the prior chemical spaces correlate highly. This Research Note represents a first exploratory step, prior to the full application of this method to several datasets.
Affiliation(s)
- J Jesús Naveja
  - Department of Pharmacy, School of Chemistry, Universidad Nacional Autónoma de México, Mexico City, 04510, Mexico
  - PECEM, Faculty of Medicine, Universidad Nacional Autónoma de México, Mexico City, 04510, Mexico
- José L Medina-Franco
  - Department of Pharmacy, School of Chemistry, Universidad Nacional Autónoma de México, Mexico City, 04510, Mexico
|