1
Medina-Franco JL, López-López E, Andrade E, Ruiz-Azuara L, Frei A, Guan D, Zuegg J, Blaskovich MA. Bridging informatics and medicinal inorganic chemistry: toward a database of metallodrugs and metallodrug candidates. Drug Discov Today 2022; 27:1420-1430. [DOI: 10.1016/j.drudis.2022.02.021] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Received: 09/27/2021] [Revised: 11/04/2021] [Accepted: 02/22/2022] [Indexed: 12/11/2022]
2
Medina-Franco JL, Sánchez-Cruz N, López-López E, Díaz-Eufracio BI. Progress on open chemoinformatic tools for expanding and exploring the chemical space. J Comput Aided Mol Des 2021; 36:341-354. [PMID: 34143323 PMCID: PMC8211976 DOI: 10.1007/s10822-021-00399-1] [Citation(s) in RCA: 19] [Impact Index Per Article: 6.3] [Received: 05/10/2021] [Accepted: 06/14/2021] [Indexed: 01/10/2023]
Abstract
The concept of chemical space is a cornerstone in chemoinformatics, and it has broad conceptual and practical applicability in many areas of chemistry, including drug design and discovery. One of the most considerable impacts is in the study of structure-property relationships where the property can be a biological activity or any other characteristic of interest to a particular chemistry discipline. The chemical space is highly dependent on the molecular representation that is also a cornerstone concept in computational chemistry. Herein, we discuss the recent progress on chemoinformatic tools developed to expand and characterize the chemical space of compound data sets using different types of molecular representations, generate visual representations of such spaces, and explore structure-property relationships in the context of chemical spaces. We emphasize the development of methods and freely available tools focusing on drug discovery applications. We also comment on the general advantages and shortcomings of using freely available and easy-to-use tools and discuss the value of using such open resources for research, education, and scientific dissemination.
Affiliation(s)
- José L Medina-Franco
  - DIFACQUIM Research Group, Department of Pharmacy, School of Chemistry, Universidad Nacional Autónoma de México, 04510, Mexico City, Mexico
- Norberto Sánchez-Cruz
  - DIFACQUIM Research Group, Department of Pharmacy, School of Chemistry, Universidad Nacional Autónoma de México, 04510, Mexico City, Mexico
- Edgar López-López
  - DIFACQUIM Research Group, Department of Pharmacy, School of Chemistry, Universidad Nacional Autónoma de México, 04510, Mexico City, Mexico
  - Departamento de Química y Programa de Posgrado en Farmacología, Centro de Investigación y de Estudios Avanzados del Instituto Politécnico Nacional, Apartado 14-740, 07000, Mexico City, Mexico
- Bárbara I Díaz-Eufracio
  - DIFACQUIM Research Group, Department of Pharmacy, School of Chemistry, Universidad Nacional Autónoma de México, 04510, Mexico City, Mexico
3
Applicability Domain of Active Learning in Chemical Probe Identification: Convergence in Learning from Non-Specific Compounds and Decision Rule Clarification. Molecules 2019; 24:2716. [PMID: 31357419 PMCID: PMC6696588 DOI: 10.3390/molecules24152716] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.4] [Received: 06/30/2019] [Revised: 07/19/2019] [Accepted: 07/24/2019] [Indexed: 12/27/2022] [Open Access]
Abstract
Efficient identification of chemical probes for the manipulation and understanding of biological systems demands specificity for target proteins. Computational means to optimize candidate compound selection for experimental selectivity evaluation are being sought. The active learning virtual screening method has demonstrated the ability to efficiently converge on predictive models with reduced datasets, though its applicability domain to probe identification has yet to be determined. In this article, we challenge active learning’s ability to predict inhibitory bioactivity profiles of selective compounds when learning from chemogenomic features found in non-selective ligand-target pairs. Comparison of controls versus multiple molecule representations de-convolutes factors contributing to predictive capability. Experiments using the matrix metalloproteinase family demonstrate maximum probe bioactivity prediction achieved from only approximately 20% of non-probe bioactivity; this data volume is consistent with prior chemogenomic active learning studies despite the increased difficulty from chemical biology experimental settings used here. Feature weight analyses are combined with a custom visualization to unambiguously detail how active learning arrives at classification decisions, yielding clarified expectations for chemogenomic modeling. The results influence tactical decisions for computational probe design and discovery.
5
Computationally derived compound profiling matrices. Future Sci OA 2018; 4:FSO327. [PMID: 30271615 PMCID: PMC6153460 DOI: 10.4155/fsoa-2018-0050] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/27/2018] [Accepted: 06/11/2018] [Indexed: 11/17/2022] [Open Access]
Abstract
Aim: Screening of compounds against panels of targets yields profiling matrices. Such matrices are excellent test cases for the analysis and prediction of ligand–target interactions. We made three matrices freely available that were extracted from public screening data. Methodology: A new algorithm was used to derive complete profiling matrices from assay data. Data: Two profiling matrices were derived from confirmatory assays containing 53 different targets and 109,925 and 143,310 distinct compounds, respectively. A third matrix was extracted from primary screening assays covering 171 different targets and 224,251 compounds. Next steps: Profiling matrices can be used to test computational chemogenomics methods for their ability to predict ligand–target pairs. Additional matrices will be generated for individual target families.
Screening of a given number of small molecules in different assays produces a so-called profiling matrix. This matrix reports, for each compound, activity or inactivity in all assays. Such profiling matrices are frequently produced in the pharmaceutical industry but rarely disclosed. We have recently reported a computational methodology to derive such matrices from independently conducted biological assays. Herein, we describe three large profiling matrices we have extracted from many experimental screens and made publicly available. These matrices should be helpful to investigators studying the interactions of small molecules with different biological targets.
Shown is a small compound profiling matrix resulting from assaying four compounds (rows) against four target proteins (columns). ‘+’ and ‘−’ signs denote compound activity and inactivity, respectively.
6
Vogt M, Jasial S, Bajorath J. Extracting Compound Profiling Matrices from Screening Data. ACS Omega 2018; 3:4706-4712. [PMID: 30023898 PMCID: PMC6044819 DOI: 10.1021/acsomega.8b00461] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.0] [Received: 03/12/2018] [Accepted: 04/20/2018] [Indexed: 05/11/2023]
Abstract
Compound profiling matrices record assay results for compound libraries tested against panels of targets. In addition to their relevance for exploring structure-activity relationships, such matrices are of considerable interest for chemoinformatic and chemogenomic applications. For example, profiling matrices provide a valuable data resource for the development and evaluation of machine learning approaches for multitask activity prediction. However, experimental compound profiling matrices are rare in the public domain. Although they are generated in pharmaceutical settings, they are typically not disclosed. Herein, we present an algorithm for the generation of large profiling matrices, for example, containing more than 100 000 compounds exhaustively tested against 50 to 100 targets. The new methodology is a variant of biclustering algorithms originally introduced for large-scale analysis of genomics data. Our approach is applied here to assays from the PubChem BioAssay database and generates profiling matrices of increasing assay or compound coverage by iterative removal of entities that limit coverage. Weight settings control final matrix size by preferentially retaining assays or compounds. In addition, the methodology can also be applied to generate matrices enriched with active entries representing above-average assay hit rates.
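The idea of reaching a complete matrix by iteratively removing the entities that limit coverage can be illustrated with a small greedy sketch. This is not the algorithm of the cited paper: the function name `extract_complete_matrix`, the scoring rule, and the way the two weights are applied are all assumptions made for illustration only.

```python
def extract_complete_matrix(tested, compound_weight=1.0, assay_weight=1.0):
    """Greedy sketch (hypothetical, not the published method).

    tested: dict mapping compound id -> set of assay ids in which it was tested.
    Returns (compounds, assays) such that every retained compound was tested
    in every retained assay. A larger weight protects that entity type.
    """
    compounds = set(tested)
    assays = set().union(*tested.values()) if tested else set()
    while compounds and assays:
        # Count untested cells per compound and per assay in the current submatrix.
        missing_c = {c: len(assays - tested[c]) for c in compounds}
        missing_a = {a: sum(1 for c in compounds if a not in tested[c])
                     for a in assays}
        worst_c = max(sorted(compounds), key=lambda c: missing_c[c])
        worst_a = max(sorted(assays), key=lambda a: missing_a[a])
        if missing_c[worst_c] == 0 and missing_a[worst_a] == 0:
            break  # the submatrix is complete
        # Fraction of missing cells, down-weighted for the protected entity type.
        c_score = missing_c[worst_c] / len(assays) / compound_weight
        a_score = missing_a[worst_a] / len(compounds) / assay_weight
        if c_score >= a_score:
            compounds.remove(worst_c)
        else:
            assays.remove(worst_a)
    return sorted(compounds), sorted(assays)


# Toy screening data: c3 lacks assay a3, c4 was only tested in a3.
tested = {
    "c1": {"a1", "a2", "a3"},
    "c2": {"a1", "a2", "a3"},
    "c3": {"a1", "a2"},
    "c4": {"a3"},
}
default_result = extract_complete_matrix(tested)
compound_favoring = extract_complete_matrix(tested, compound_weight=2.0)
print(default_result)
print(compound_favoring)
```

With equal weights the sketch drops the two sparsely tested compounds and keeps all three assays; raising `compound_weight` instead sacrifices one assay to retain a third compound, which mirrors the trade-off between assay and compound coverage described in the abstract.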
7
Kooistra AJ, Vass M, McGuire R, Leurs R, de Esch IJP, Vriend G, Verhoeven S, de Graaf C. 3D-e-Chem: Structural Cheminformatics Workflows for Computer-Aided Drug Discovery. ChemMedChem 2018; 13:614-626. [PMID: 29337438 PMCID: PMC5900740 DOI: 10.1002/cmdc.201700754] [Citation(s) in RCA: 15] [Impact Index Per Article: 2.5] [Received: 12/05/2017] [Revised: 01/11/2018] [Indexed: 01/06/2023]
Abstract
eScience technologies are needed to process the information available in many heterogeneous types of protein-ligand interaction data and to capture these data into models that enable the design of efficacious and safe medicines. Here we present scientific KNIME tools and workflows that enable the integration of chemical, pharmacological, and structural information for: i) structure-based bioactivity data mapping, ii) structure-based identification of scaffold replacement strategies for ligand design, iii) ligand-based target prediction, iv) protein sequence-based binding site identification and ligand repurposing, and v) structure-based pharmacophore comparison for ligand repurposing across protein families. The modular setup of the workflows and the use of well-established standards allows the re-use of these protocols and facilitates the design of customized computer-aided drug discovery workflows.
Affiliation(s)
- Albert J. Kooistra
  - Centre for Molecular and Biomolecular Informatics (CMBI), Radboud University Medical Center (RadboudUMC), Nijmegen, The Netherlands
  - Division of Medicinal Chemistry, Faculty of Science, Amsterdam Institute for Molecules, Medicines and Systems (AIMMS), Vrije Universiteit Amsterdam, Amsterdam, The Netherlands
- Márton Vass
  - Division of Medicinal Chemistry, Faculty of Science, Amsterdam Institute for Molecules, Medicines and Systems (AIMMS), Vrije Universiteit Amsterdam, Amsterdam, The Netherlands
- Ross McGuire
  - Centre for Molecular and Biomolecular Informatics (CMBI), Radboud University Medical Center (RadboudUMC), Nijmegen, The Netherlands
  - BioAxis Research, Pivot Park, Oss, The Netherlands
- Rob Leurs
  - Division of Medicinal Chemistry, Faculty of Science, Amsterdam Institute for Molecules, Medicines and Systems (AIMMS), Vrije Universiteit Amsterdam, Amsterdam, The Netherlands
- Iwan J. P. de Esch
  - Division of Medicinal Chemistry, Faculty of Science, Amsterdam Institute for Molecules, Medicines and Systems (AIMMS), Vrije Universiteit Amsterdam, Amsterdam, The Netherlands
- Gert Vriend
  - Centre for Molecular and Biomolecular Informatics (CMBI), Radboud University Medical Center (RadboudUMC), Nijmegen, The Netherlands
- Chris de Graaf
  - Division of Medicinal Chemistry, Faculty of Science, Amsterdam Institute for Molecules, Medicines and Systems (AIMMS), Vrije Universiteit Amsterdam, Amsterdam, The Netherlands
8
Rakers C, Najnin RA, Polash AH, Takeda S, Brown J. Chemogenomic Active Learning's Domain of Applicability on Small, Sparse qHTS Matrices: A Study Using Cytochrome P450 and Nuclear Hormone Receptor Families. ChemMedChem 2018; 13:511-521. [DOI: 10.1002/cmdc.201700677] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.8] [Received: 10/31/2017] [Revised: 12/04/2017] [Indexed: 01/21/2023]
Affiliation(s)
- Christin Rakers
  - Institute of Transformative bio-Molecules, WPI-ITbM; Nagoya University; Furo-cho Chikusa-ku Nagoya 464-8602 Japan
- Rifat Ara Najnin
  - Department of Radiation Genetics; Kyoto University Graduate School of Medicine; Sakyo, Yoshida-konoemachi Building D, 3F Kyoto 606-8501 Japan
- Ahsan Habib Polash
  - Department of Radiation Genetics; Kyoto University Graduate School of Medicine; Sakyo, Yoshida-konoemachi Building D, 3F Kyoto 606-8501 Japan
- Shunichi Takeda
  - Department of Radiation Genetics; Kyoto University Graduate School of Medicine; Sakyo, Yoshida-konoemachi Building D, 3F Kyoto 606-8501 Japan
- J.B. Brown
  - Laboratory for Molecular Biosciences; Kyoto University Graduate School of Medicine; Yoshida-konoemachi Building E 606-8501 Kyoto Sakyo Japan
9
Abstract
Aim: Computational chemogenomics models the compound–protein interaction space, typically for drug discovery, where existing methods predominantly either incorporate increasing numbers of bioactivity samples or focus on specific subfamilies of proteins and ligands. As an alternative to modeling entire large datasets at once, active learning adaptively incorporates a minimum of informative examples for modeling, yielding compact but high quality models. Results/methodology: We assessed active learning for protein/target family-wide chemogenomic modeling by replicate experiment. Results demonstrate that small yet highly predictive models can be extracted from only 10–25% of large bioactivity datasets, irrespective of molecule descriptors used. Conclusion: Chemogenomic active learning identifies small subsets of ligand–target interactions in a large screening database that lead to knowledge discovery and highly predictive models.
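The adaptive selection described in this abstract can be sketched as a generic pool-based active learning loop. The abstract does not specify the classifier or the selection rule, so everything below is an assumption for illustration: a toy nearest-centroid model stands in for the real classifier, the example closest to the decision boundary is picked at each step, and the names `active_learning`, `margin`, and `budget_fraction` are hypothetical. It assumes a binary labeling.

```python
import math
import random


def centroid(points):
    """Component-wise mean of a list of equal-length feature tuples."""
    n = len(points)
    return tuple(sum(p[i] for p in points) / n for i in range(len(points[0])))


def fit(X, y):
    """Toy stand-in model: one centroid per class label."""
    return {label: centroid([x for x, l in zip(X, y) if l == label])
            for label in set(y)}


def predict(model, x):
    """Assign the label of the nearest class centroid."""
    return min(model, key=lambda label: math.dist(x, model[label]))


def margin(model, x):
    """Gap between the two nearest centroids; small means uncertain."""
    d = sorted(math.dist(x, c) for c in model.values())
    return d[1] - d[0]


def active_learning(X, y, budget_fraction=0.2, seed=0):
    """Label only a fraction of the pool, always taking the least certain example."""
    rng = random.Random(seed)
    # Seed the model with one randomly chosen example per class.
    labeled = [rng.choice([i for i, l in enumerate(y) if l == label])
               for label in sorted(set(y))]
    pool = [i for i in range(len(X)) if i not in labeled]
    budget = max(len(labeled), int(budget_fraction * len(X)))
    while len(labeled) < budget and pool:
        model = fit([X[i] for i in labeled], [y[i] for i in labeled])
        pick = min(pool, key=lambda i: margin(model, X[i]))
        pool.remove(pick)      # the "oracle" reveals y[pick]
        labeled.append(pick)
    return fit([X[i] for i in labeled], [y[i] for i in labeled]), labeled


# Toy data: two well-separated 5 x 5 grids standing in for inactive/active pairs.
grid = [(0.2 * i, 0.2 * j) for i in range(5) for j in range(5)]
X = grid + [(a + 5.0, b + 5.0) for a, b in grid]
y = [0] * len(grid) + [1] * len(grid)

model, labeled = active_learning(X, y, budget_fraction=0.2)
accuracy = sum(predict(model, x) == t for x, t in zip(X, y)) / len(X)
print(len(labeled), accuracy)
```

On this deliberately easy toy set, the model trained on 20% of the examples classifies the full pool correctly; real bioactivity data would of course be far noisier, and the fraction needed is an empirical question of the kind the cited study addresses.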
10
Small Random Forest Models for Effective Chemogenomic Active Learning. Journal of Computer Aided Chemistry 2017. [DOI: 10.2751/jcac.18.124] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.7] [Indexed: 11/08/2022]
11
Balfer J, Hu Y, Bajorath J. Compound Structure-Independent Activity Prediction in High-Dimensional Target Space. Mol Inform 2014; 33:544-58. [PMID: 27486040 DOI: 10.1002/minf.201400051] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/05/2014] [Accepted: 05/20/2014] [Indexed: 11/10/2022]
Abstract
Profiling of compound libraries against arrays of targets has become an important approach in pharmaceutical research. The prediction of multi-target compound activities also represents an attractive task for machine learning with potential for drug discovery applications. Herein, we have explored activity prediction in high-dimensional target space. Different types of models were derived to predict multi-target activities. The models included naïve Bayesian (NB) and support vector machine (SVM) classifiers based upon compound structure information and NB models derived on the basis of activity profiles, without considering compound structure. Because the latter approach can be applied to incomplete training data and principally depends on the feature independence assumption, SVM modeling was not applicable in this case. Furthermore, iterative hybrid NB models making use of both activity profiles and compound structure information were built. In high-dimensional target space, NB models utilizing activity profile data were found to yield more accurate activity predictions than structure-based NB and SVM models or hybrid models. An in-depth analysis of activity profile-based models revealed the presence of correlation effects across different targets and rationalized prediction accuracy. Taken together, the results indicate that activity profile information can be effectively used to predict the activity of test compounds against novel targets.
Affiliation(s)
- Jenny Balfer
  - Department of Life Science Informatics, Bonn-Aachen International Center for Information Technology, Rheinische Friedrich-Wilhelms-Universität Bonn, Dahlmannstr. 2, D-53113 Bonn, Germany; tel: +49-228-2699-306; fax: +49-228-2699-341
- Ye Hu
  - Department of Life Science Informatics, Bonn-Aachen International Center for Information Technology, Rheinische Friedrich-Wilhelms-Universität Bonn, Dahlmannstr. 2, D-53113 Bonn, Germany; tel: +49-228-2699-306; fax: +49-228-2699-341
- Jürgen Bajorath
  - Department of Life Science Informatics, Bonn-Aachen International Center for Information Technology, Rheinische Friedrich-Wilhelms-Universität Bonn, Dahlmannstr. 2, D-53113 Bonn, Germany; tel: +49-228-2699-306; fax: +49-228-2699-341
12
Hu Y, Gupta-Ostermann D, Bajorath J. Exploring compound promiscuity patterns and multi-target activity spaces. Comput Struct Biotechnol J 2014; 9:e201401003. [PMID: 24688751 PMCID: PMC3962225 DOI: 10.5936/csbj.201401003] [Citation(s) in RCA: 39] [Impact Index Per Article: 3.9] [Received: 11/26/2013] [Revised: 01/13/2014] [Accepted: 01/17/2014] [Indexed: 11/23/2022] [Open Access]
Abstract
Compound promiscuity is rationalized as the specific interaction of a small molecule with multiple biological targets (as opposed to non-specific binding events) and represents the molecular basis of polypharmacology, an emerging theme in drug discovery and chemical biology. This concise review focuses on recent studies that have provided a detailed picture of the degree of promiscuity among different categories of small molecules. In addition, an exemplary computational approach is discussed that is designed to navigate multi-target activity spaces populated with various compounds.
Affiliation(s)
- Ye Hu
  - Department of Life Science Informatics, B-IT, LIMES Program Unit Chemical Biology and Medicinal Chemistry, Rheinische Friedrich-Wilhelms-Universität, Dahlmannstr. 2, D-53113 Bonn, Germany (these authors contributed equally to this work)
- Disha Gupta-Ostermann
  - Department of Life Science Informatics, B-IT, LIMES Program Unit Chemical Biology and Medicinal Chemistry, Rheinische Friedrich-Wilhelms-Universität, Dahlmannstr. 2, D-53113 Bonn, Germany (these authors contributed equally to this work)
- Jürgen Bajorath
  - Department of Life Science Informatics, B-IT, LIMES Program Unit Chemical Biology and Medicinal Chemistry, Rheinische Friedrich-Wilhelms-Universität, Dahlmannstr. 2, D-53113 Bonn, Germany
13
Medina-Franco JL, Méndez-Lucio O, Martinez-Mayorga K. The Interplay Between Molecular Modeling and Chemoinformatics to Characterize Protein–Ligand and Protein–Protein Interactions Landscapes for Drug Discovery. Advances in Protein Chemistry and Structural Biology 2014; 96:1-37. [DOI: 10.1016/bs.apcsb.2014.06.001] [Citation(s) in RCA: 17] [Impact Index Per Article: 1.7] [Indexed: 12/13/2022]
14
Fourches D. Cheminformatics: At the Crossroad of Eras. Challenges and Advances in Computational Chemistry and Physics 2014. [DOI: 10.1007/978-94-017-9257-8_16] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.4] [Indexed: 01/28/2023]