1
Macorano A, Mazzolari A, Malloci G, Pedretti A, Vistoli G, Gervasoni S. An improved dataset of force fields, electronic and physicochemical descriptors of metabolic substrates. Sci Data 2024; 11:929. [PMID: 39191771 PMCID: PMC11349763 DOI: 10.1038/s41597-024-03707-0] [Received: 03/20/2024] [Accepted: 07/30/2024] [Indexed: 08/29/2024] Open
Abstract
In silico prediction of xenobiotic metabolism is an important strategy to accelerate the drug discovery process, as candidate compounds often fail in clinical phases due to their poor pharmacokinetic profiles. Here we present MetaQM, a dataset of quantum-mechanical (QM) optimized metabolic substrates, including force field parameters, electronic and physicochemical properties. MetaQM comprises 2054 metabolic substrates extracted from the MetaQSAR database. We provide QM-optimized geometries, General Amber Force Field (FF) parameters for all studied molecules, and an extended set of structural and physicochemical descriptors as calculated by DFT and PM7 methods. The generated data can be used in different types of analysis. FF parameters can be applied to perform classical molecular mechanics calculations as exemplified by the validating molecular dynamics simulations reported here. The calculated descriptors can represent input features for developing improved predictive models for metabolism and drug design, as exemplified in this work. Finally, the QM-optimized molecular structures are valuable starting points for both ligand- and structure-based analyses such as pharmacophore mapping and docking simulations.
Affiliation(s)
- Alessio Macorano
  - Dipartimento di Scienze Farmaceutiche, Università degli Studi di Milano, via Mangiagalli 25, 20133, Milano, Italy
- Angelica Mazzolari
  - Dipartimento di Scienze Farmaceutiche, Università degli Studi di Milano, via Mangiagalli 25, 20133, Milano, Italy
- Giuliano Malloci
  - Dipartimento di Fisica, Università degli Studi di Cagliari, Cittadella Universitaria, S.P. Monserrato-Sestu Km 0.7, I-09042, Monserrato, CA, Italy
- Alessandro Pedretti
  - Dipartimento di Scienze Farmaceutiche, Università degli Studi di Milano, via Mangiagalli 25, 20133, Milano, Italy
- Giulio Vistoli
  - Dipartimento di Scienze Farmaceutiche, Università degli Studi di Milano, via Mangiagalli 25, 20133, Milano, Italy
- Silvia Gervasoni
  - Dipartimento di Fisica, Università degli Studi di Cagliari, Cittadella Universitaria, S.P. Monserrato-Sestu Km 0.7, I-09042, Monserrato, CA, Italy
2
Manelfi C, Tazzari V, Lunghini F, Cerchia C, Fava A, Pedretti A, Stouten PFW, Vistoli G, Beccari AR. "DompeKeys": a set of novel substructure-based descriptors for efficient chemical space mapping, development and structural interpretation of machine learning models, and indexing of large databases. J Cheminform 2024; 16:21. [PMID: 38395961 PMCID: PMC10893756 DOI: 10.1186/s13321-024-00813-4] [Received: 10/17/2023] [Accepted: 02/10/2024] [Indexed: 02/25/2024] Open
Abstract
The conversion of chemical structures into computer-readable descriptors, able to capture key structural aspects, is of pivotal importance in the field of cheminformatics and computer-aided drug design. Molecular fingerprints represent a widely employed class of descriptors; however, their generation process is time-consuming for large databases and is susceptible to bias. Therefore, descriptors able to accurately detect predefined structural fragments and devoid of lengthy generation procedures would be highly desirable. To meet additional needs, such descriptors should also be interpretable by medicinal chemists, and suitable for indexing databases with trillions of compounds. To this end, as an integral part of EXSCALATE, Dompé's end-to-end drug discovery platform, we developed the DompeKeys (DK), a new substructure-based descriptor set, which encodes the chemical features that characterize compounds of pharmaceutical interest. DK represent an exhaustive collection of curated SMARTS strings, defining chemical features at different levels of complexity, from specific functional groups and structural patterns to simpler pharmacophoric points, corresponding to a network of hierarchically interconnected substructures. Because of their extended and hierarchical structure, DK can be used, with good performance, in different kinds of applications. In particular, we demonstrate how they are very well suited for effective mapping of chemical space, as well as substructure search and virtual screening. Notably, the incorporation of DK yields highly performing machine learning models for the prediction of both compounds' activity and metabolic reaction occurrence.
The protocol to generate the DK is freely available at https://dompekeys.exscalate.eu and is fully integrated with the Molecular Anatomy protocol for the generation and analysis of hierarchically interconnected molecular scaffolds and frameworks, thus providing a comprehensive and flexible tool for drug design applications.
Affiliation(s)
- Candida Manelfi
  - EXSCALATE, Dompé Farmaceutici SpA, Via Tommaso de Amicis 95, 80123, Napoli, Italy
- Valerio Tazzari
  - EXSCALATE, Dompé Farmaceutici SpA, Via Tommaso de Amicis 95, 80123, Napoli, Italy
- Filippo Lunghini
  - EXSCALATE, Dompé Farmaceutici SpA, Via Tommaso de Amicis 95, 80123, Napoli, Italy
- Carmen Cerchia
  - Department of Pharmacy, University of Naples "Federico II", Via D. Montesano 49, 80131, Napoli, Italy
- Anna Fava
  - EXSCALATE, Dompé Farmaceutici SpA, Via Tommaso de Amicis 95, 80123, Napoli, Italy
- Alessandro Pedretti
  - Dipartimento di Scienze Farmaceutiche, Università degli Studi di Milano, Via Mangiagalli, 25, 20133, Milano, Italy
- Pieter F W Stouten
  - EXSCALATE, Dompé Farmaceutici SpA, Via Tommaso de Amicis 95, 80123, Napoli, Italy
  - Stouten Pharma Consultancy BV, Kempenarestraat 47, 2860, Sint-Katelijne-Waver, Belgium
- Giulio Vistoli
  - Dipartimento di Scienze Farmaceutiche, Università degli Studi di Milano, Via Mangiagalli, 25, 20133, Milano, Italy
3
Chen Y, Seidel T, Jacob RA, Hirte S, Mazzolari A, Pedretti A, Vistoli G, Langer T, Miljković F, Kirchmair J. Active Learning Approach for Guiding Site-of-Metabolism Measurement and Annotation. J Chem Inf Model 2024; 64:348-358. [PMID: 38170877 PMCID: PMC10806800 DOI: 10.1021/acs.jcim.3c01588] [Received: 09/30/2023] [Revised: 11/30/2023] [Accepted: 12/18/2023] [Indexed: 01/05/2024]
Abstract
The ability to determine and predict metabolically labile atom positions in a molecule (also called "sites of metabolism" or "SoMs") is of high interest to the design and optimization of bioactive compounds, such as drugs, agrochemicals, and cosmetics. In recent years, several in silico models for SoM prediction have become available, many of which include a machine-learning component. The bottleneck in advancing these approaches is the coverage of distinct atom environments and rare and complex biotransformation events with high-quality experimental data. Pharmaceutical companies typically have measured metabolism data available for several hundred to several thousand compounds. However, even for metabolism experts, interpreting these data and assigning SoMs are challenging and time-consuming. Therefore, a significant proportion of the potential of the existing metabolism data, particularly in machine learning, remains dormant. Here, we report on the development and validation of an active learning approach that identifies the most informative atoms across molecular data sets for SoM annotation. The active learning approach, built on a highly efficient reimplementation of SoM predictor FAME 3, enables experts to prioritize their SoM experimental measurements and annotation efforts on the most rewarding atom environments. We show that this active learning approach yields competitive SoM predictors while requiring the annotation of only 20% of the atom positions required by FAME 3. The source code of the approach presented in this work is publicly available.
Affiliation(s)
- Ya Chen
  - Department of Pharmaceutical Sciences, Division of Pharmaceutical Chemistry, Faculty of Life Sciences, University of Vienna, Josef-Holaubek-Platz 2, 1090 Vienna, Austria
- Thomas Seidel
  - Department of Pharmaceutical Sciences, Division of Pharmaceutical Chemistry, Faculty of Life Sciences, University of Vienna, Josef-Holaubek-Platz 2, 1090 Vienna, Austria
  - Christian Doppler Laboratory for Molecular Informatics in the Biosciences, Department for Pharmaceutical Sciences, University of Vienna, 1090 Vienna, Austria
- Roxane Axel Jacob
  - Department of Pharmaceutical Sciences, Division of Pharmaceutical Chemistry, Faculty of Life Sciences, University of Vienna, Josef-Holaubek-Platz 2, 1090 Vienna, Austria
  - Christian Doppler Laboratory for Molecular Informatics in the Biosciences, Department for Pharmaceutical Sciences, University of Vienna, 1090 Vienna, Austria
  - Vienna Doctoral School of Pharmaceutical, Nutritional and Sport Sciences (PhaNuSpo), University of Vienna, 1090 Vienna, Austria
- Steffen Hirte
  - Department of Pharmaceutical Sciences, Division of Pharmaceutical Chemistry, Faculty of Life Sciences, University of Vienna, Josef-Holaubek-Platz 2, 1090 Vienna, Austria
  - Vienna Doctoral School of Pharmaceutical, Nutritional and Sport Sciences (PhaNuSpo), University of Vienna, 1090 Vienna, Austria
- Angelica Mazzolari
  - Dipartimento di Scienze Farmaceutiche, Università degli Studi di Milano, I-20133 Milano, Italy
- Alessandro Pedretti
  - Dipartimento di Scienze Farmaceutiche, Università degli Studi di Milano, I-20133 Milano, Italy
- Giulio Vistoli
  - Dipartimento di Scienze Farmaceutiche, Università degli Studi di Milano, I-20133 Milano, Italy
- Thierry Langer
  - Department of Pharmaceutical Sciences, Division of Pharmaceutical Chemistry, Faculty of Life Sciences, University of Vienna, Josef-Holaubek-Platz 2, 1090 Vienna, Austria
  - Christian Doppler Laboratory for Molecular Informatics in the Biosciences, Department for Pharmaceutical Sciences, University of Vienna, 1090 Vienna, Austria
- Filip Miljković
  - Medicinal Chemistry, Research and Early Development, Cardiovascular, Renal and Metabolism (CVRM), BioPharmaceuticals R&D, AstraZeneca, Pepparedsleden 1, SE-43183 Gothenburg, Sweden
- Johannes Kirchmair
  - Department of Pharmaceutical Sciences, Division of Pharmaceutical Chemistry, Faculty of Life Sciences, University of Vienna, Josef-Holaubek-Platz 2, 1090 Vienna, Austria
  - Christian Doppler Laboratory for Molecular Informatics in the Biosciences, Department for Pharmaceutical Sciences, University of Vienna, 1090 Vienna, Austria