1
Spiers RC, Kalivas JH. Reliable Model Selection without Reference Values by Utilizing Model Diversity with Prediction Similarity. J Chem Inf Model 2021; 61:2220-2230. [PMID: 33900749 DOI: 10.1021/acs.jcim.0c01493]
Abstract
Predictive modeling (calibration or training) with various data formats, such as near-infrared (NIR) spectra and quantitative structure-activity relationship (QSAR) data, provides essential information if a proper model is selected. Similarly, with a general model selection approach, spectral model maintenance (updating) from original modeling conditions to new conditions can be performed for dynamic modeling. Fundamental modeling (partial least-squares (PLS) and others) and maintenance processes (domain adaptation or transfer learning and others) require selection of tuning parameter values to isolate models that can accurately predict new samples or molecules, e.g., the number of PLS latent variables used to predict analyte concentration. Regardless of the modeling task, model selection is complex and lacks a reliable protocol. Tuning parameter selection typically depends on only one model quality measure, which assesses model bias using prediction accuracy. Developed in this paper is a generic model selection process using concepts from consensus modeling and QSAR activity landscapes. It is a consensus filtering approach, termed MDPS, that prioritizes model diversity (MD) while conserving prediction similarity (PS), fused with a common bias-variance trade-off measure. A significant feature of MDPS is that no cross-validation scheme is needed, because models are selected relative to predicting new samples or molecules, i.e., model selection uses unlabeled samples (without reference values) for active predictions. The versatility and reliability of MDPS model selection are shown using four NIR data sets and a QSAR data set. The study also substantiates the Rashomon effect, in which there is no single best model tuning parameter value that provides accurate predictions.
Affiliation(s)
- Robert C Spiers
  - Department of Chemistry, Idaho State University, Pocatello, Idaho 83209, United States
- John H Kalivas
  - Department of Chemistry, Idaho State University, Pocatello, Idaho 83209, United States
2
Maggiora G, Medina-Franco JL, Iqbal J, Vogt M, Bajorath J. From Qualitative to Quantitative Analysis of Activity and Property Landscapes. J Chem Inf Model 2020; 60:5873-5880. [PMID: 33205984 DOI: 10.1021/acs.jcim.0c01249]
Abstract
Activity or, more generally, property landscapes (PLs) have been regarded as an attractive way to visualize and explore structure-property relationships (SPRs) contained in large data sets of chemical compounds. For graphical analysis, three-dimensional representations reminiscent of natural landscapes are particularly intuitive. So far, the use of such landscape models has essentially been confined to qualitative assessment. We describe recent efforts to analyze PLs in a more quantitative manner, which make it possible to calculate topographical similarity values for comparing landscape models as a measure of their relative SPR information content.
Affiliation(s)
- Gerald Maggiora
  - University of Arizona BIO5 Institute, 1657 East Helen Street, Tucson, Arizona 85721-0240, United States
- José L Medina-Franco
  - Department of Pharmacy, School of Chemistry, Universidad Nacional Autónoma de México, Mexico City 04510, Mexico
- Javed Iqbal
  - Department of Life Science Informatics, B-IT LIMES Program Unit Chemical Biology and Medicinal Chemistry, Rheinische Friedrich-Wilhelms-Universität, Endenicher Allee 19c, Bonn D-53115, Germany
- Martin Vogt
  - Department of Life Science Informatics, B-IT LIMES Program Unit Chemical Biology and Medicinal Chemistry, Rheinische Friedrich-Wilhelms-Universität, Endenicher Allee 19c, Bonn D-53115, Germany
- Jürgen Bajorath
  - Department of Life Science Informatics, B-IT LIMES Program Unit Chemical Biology and Medicinal Chemistry, Rheinische Friedrich-Wilhelms-Universität, Endenicher Allee 19c, Bonn D-53115, Germany
3
Iqbal J, Vogt M, Bajorath J. Quantitative Comparison of Three-Dimensional Activity Landscapes of Compound Data Sets Based upon Topological Features. ACS Omega 2020; 5:24111-24117. [PMID: 32984733 PMCID: PMC7513547 DOI: 10.1021/acsomega.0c03659] [Received: 07/30/2020] [Accepted: 08/27/2020]
Abstract
Visualization of structure-activity relationships (SARs) in compound data sets substantially contributes to their systematic analysis. For SAR visualization, different types of activity landscape (AL) representations have been introduced. Three-dimensional (3D) AL models in which an activity hypersurface is constructed in chemical space are particularly intuitive because these 3D ALs are reminiscent of "true" (geographical) landscapes. Accordingly, the topologies of 3D AL representations can be immediately associated with different SAR characteristics of compound data sets. However, the comparison of 3D ALs has thus far been confined to visual inspection and qualitative analysis. We have focused on image analysis as a possible approach to facilitate a quantitative comparison of 3D ALs, which would further increase their utility for SAR exploration. Herein, we introduce a new computational methodology for quantifying topological relationships between 3D ALs. Images of color-coded 3D ALs were converted into top-down views of these ALs. From transformed images, different categories of shape features were systematically extracted, and multilevel shape correspondence was determined as a measure of AL similarity. This made it possible to differentiate between 3D ALs in quantitative terms.
Affiliation(s)
- Javed Iqbal
  - Department of Life Science Informatics, B-IT, LIMES Program Unit Chemical Biology and Medicinal Chemistry, Rheinische Friedrich-Wilhelms-Universität, Endenicher Allee 19c, D-53115 Bonn, Germany
- Martin Vogt
  - Department of Life Science Informatics, B-IT, LIMES Program Unit Chemical Biology and Medicinal Chemistry, Rheinische Friedrich-Wilhelms-Universität, Endenicher Allee 19c, D-53115 Bonn, Germany
- Jürgen Bajorath
  - Department of Life Science Informatics, B-IT, LIMES Program Unit Chemical Biology and Medicinal Chemistry, Rheinische Friedrich-Wilhelms-Universität, Endenicher Allee 19c, D-53115 Bonn, Germany
4
Iqbal J, Vogt M, Bajorath J. Computational Method for Quantitative Comparison of Activity Landscapes on the Basis of Image Data. Molecules 2020; 25:E3952. [PMID: 32872506 PMCID: PMC7504767 DOI: 10.3390/molecules25173952] [Received: 07/20/2020] [Revised: 08/21/2020] [Accepted: 08/27/2020]
Abstract
Activity landscape (AL) models are used for visualizing and interpreting structure-activity relationships (SARs) in compound datasets. Therefore, ALs are designed to present chemical similarity and compound potency information in context. Different two- or three-dimensional (2D or 3D) AL representations have been introduced. For SAR analysis, 3D AL models are particularly intuitive. In these models, an interpolated potency surface is added as a third dimension to a 2D projection of chemical space. Accordingly, AL topology can be associated with characteristic SAR features. Going beyond visualization and a qualitative assessment of SARs, it would be very helpful to compare 3D ALs of different datasets in more quantitative terms. However, quantitative AL analysis is still in its infancy. Recently, it has been shown that 3D AL models with pre-defined topologies can be correctly classified using machine learning. Classification was facilitated on the basis of AL image feature representations learned with convolutional neural networks. Therefore, we have further investigated image analysis for quantitative comparison of 3D ALs and devised an approach to determine (dis)similarity relationships for ALs representing different compound datasets. Herein, we report this approach and demonstrate proof-of-principle. The methodology makes it possible to computationally compare 3D ALs and quantify topological differences reflecting varying SAR information content. For SAR exploration in drug design, this adds a quantitative measure of AL (dis)similarity to graphical analysis.
Affiliation(s)
- Javed Iqbal
  - Department of Life Science Informatics, B-IT, LIMES Program Unit Chemical Biology and Medicinal Chemistry, Rheinische Friedrich-Wilhelms-Universität, Endenicher Allee 19c, D-53115 Bonn, Germany
- Martin Vogt
  - Department of Life Science Informatics, B-IT, LIMES Program Unit Chemical Biology and Medicinal Chemistry, Rheinische Friedrich-Wilhelms-Universität, Endenicher Allee 19c, D-53115 Bonn, Germany
- Jürgen Bajorath
  - Department of Life Science Informatics, B-IT, LIMES Program Unit Chemical Biology and Medicinal Chemistry, Rheinische Friedrich-Wilhelms-Universität, Endenicher Allee 19c, D-53115 Bonn, Germany
5
Iqbal J, Vogt M, Bajorath J. Activity landscape image analysis using convolutional neural networks. J Cheminform 2020; 12:34. [PMID: 33431003 PMCID: PMC7236149 DOI: 10.1186/s13321-020-00436-5] [Received: 12/24/2019] [Accepted: 04/30/2020]
Abstract
Activity landscapes (ALs) are graphical representations that combine compound similarity and activity data. ALs are constructed for visualizing local and global structure–activity relationships (SARs) contained in compound data sets. Three-dimensional (3D) ALs are reminiscent of geographical maps, where differences in landscape topology mirror different SAR characteristics. 3D AL models can be stored as differently formatted images and are thus amenable to image analysis approaches, which have thus far not been considered in the context of graphical SAR analysis. In this proof-of-concept study, 3D ALs were constructed for a variety of compound activity classes, and 3D AL image variants of varying topology and information content were generated and classified. To this end, convolutional neural networks (CNNs) were initially applied to images of original 3D AL models, taken from different viewpoints, with color-coding reflecting compound potency information. Images of 3D AL models were transformed into variants from which one-dimensional features were extracted. Other machine learning approaches, including support vector machine (SVM) and random forest (RF) algorithms, were applied to derive models on the basis of such features. In addition, SVM and RF models were trained using other features obtained from images through edge filtering. Machine learning was able to accurately distinguish between 3D AL image variants with different topology and information content. Overall, CNNs, which directly learned feature representations from 3D AL images, achieved the highest classification accuracy. Predictive performance for CNN, SVM, and RF models was highest for image variants emphasizing topological elevation. In addition, SVM models trained on rudimentary images from edge filtering classified such images with high accuracy, which further supported the critical role of altitude-dependent topological features for image analysis and predictions.
Taken together, the findings of our proof-of-concept investigation indicate that image analysis has considerable potential for graphical SAR exploration to systematically infer different SAR characteristics from topological features of 3D ALs.
Affiliation(s)
- Javed Iqbal
  - Department of Life Science Informatics, B-IT, LIMES Program Unit Chemical Biology and Medicinal Chemistry, Rheinische Friedrich-Wilhelms-Universität, Endenicher Allee 19c, 53115, Bonn, Germany
- Martin Vogt
  - Department of Life Science Informatics, B-IT, LIMES Program Unit Chemical Biology and Medicinal Chemistry, Rheinische Friedrich-Wilhelms-Universität, Endenicher Allee 19c, 53115, Bonn, Germany
- Jürgen Bajorath
  - Department of Life Science Informatics, B-IT, LIMES Program Unit Chemical Biology and Medicinal Chemistry, Rheinische Friedrich-Wilhelms-Universität, Endenicher Allee 19c, 53115, Bonn, Germany
6
Medina-Franco JL, Naveja JJ, López-López E. Reaching for the bright StARs in chemical space. Drug Discov Today 2019; 24:2162-2169. [PMID: 31557448 DOI: 10.1016/j.drudis.2019.09.013] [Received: 08/10/2019] [Revised: 09/10/2019] [Accepted: 09/17/2019]
Abstract
Visualization of activity data in chemical space is common in drug discovery. Navigating chemical space systematically is not trivial, given its size and huge coverage. To this end, data visualization methods have been developed that chart biological activity in chemical space. Herein, we review progress in different visualization approaches for exploring chemical space with the aim of deriving insightful structure-activity relationships (SARs). We discuss recent methods including consensus diversity plots, ChemMaps, and constellation plots. Several of the methods we review can be extended to analyze other properties of interest in medicinal chemistry, such as structure-toxicity relationships, and can be adapted to postprocess results of virtual screening (VS) of large compound libraries.
Affiliation(s)
- José L Medina-Franco
  - Department of Pharmacy, School of Chemistry, Universidad Nacional Autónoma de México, Avenida Universidad 3000, Mexico City 04510, Mexico
- J Jesús Naveja
  - Department of Pharmacy, School of Chemistry, Universidad Nacional Autónoma de México, Avenida Universidad 3000, Mexico City 04510, Mexico; PECEM, School of Medicine, Universidad Nacional Autónoma de México, Mexico City 04510, Mexico
- Edgar López-López
  - Department of Pharmacy, School of Chemistry, Universidad Nacional Autónoma de México, Avenida Universidad 3000, Mexico City 04510, Mexico