1
Feng B, Yu H, Dong X, Díaz-Holguín A, Hu H. Identification of bioactive compounds with popular single-atom modifications: Comprehensive analysis and implications for compound design. Eur J Med Chem 2025; 283:117051. [PMID: 39631098 DOI: 10.1016/j.ejmech.2024.117051] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/27/2024] [Revised: 10/28/2024] [Accepted: 11/11/2024] [Indexed: 12/07/2024]
Abstract
The extensive bioactivity data available in public databases, such as ChEMBL, have facilitated in-depth structure-activity relationship (SAR) analyses, which are essential for comprehensively understanding the impact of molecular modifications on biological activity. A central strategy in SAR analysis is the assessment of molecular similarity. Several approaches preferred by medicinal chemists have been developed to efficiently capture structurally related compounds on a large scale. We previously introduced four types of single-atom modifications (SAMs), a popular molecular editing strategy in hit-to-lead and lead optimization processes, as a chemical similarity criterion and conducted a systematic analysis of their application in compound design. In this study, we expanded the analysis to cover 10 common SAMs, including carbon-nitrogen (N↔C), O↔C, N↔O, S↔O, as well as simpler modifications such as OH↔H, CH3↔H, and halogen-hydrogen (F, Cl, Br, I↔H) exchanges. Leveraging high-confidence bioactivity data from ChEMBL (version 34), we assembled a comprehensive dataset comprising 374,979 SAM pairs. Following an evaluation of the frequency of these SAM types in medicinal chemistry efforts, we focused on SAM-induced activity cliffs (ACs), yielding over 7400 ACs and substantially expanding the current knowledge base of ACs associated with single-atom changes. Furthermore, structural analysis of these ACs, supported by experimental data, provides critical insights into the role of single-atom modifications in modulating compound activity, offering practical guidance for the structure-based optimization of molecular properties in drug development. We provide open access to all identified ACs along with their associated structural information.
Affiliation(s)
- Bo Feng
  - Department of Pharmacy, The Affiliated Hospital of Yangzhou University, Yangzhou University, Yangzhou, 225000, PR China
- Hui Yu
  - Information School, University of Sheffield, 211 Portobello, Sheffield, S1 4DP, UK
- Xu Dong
  - Department of Pharmacy, The Affiliated Hospital of Yangzhou University, Yangzhou University, Yangzhou, 225000, PR China
- Alejandro Díaz-Holguín
  - Science for Life Laboratory, Department of Cell and Molecular Biology, Uppsala University, BMC, Box 596, SE-751 24, Uppsala, Sweden
- Huabin Hu
  - Department of Pharmacy, The Affiliated Hospital of Yangzhou University, Yangzhou University, Yangzhou, 225000, PR China; Science for Life Laboratory, Department of Cell and Molecular Biology, Uppsala University, BMC, Box 596, SE-751 24, Uppsala, Sweden; Centre for Cancer Drug Discovery, Division of Cancer Therapeutics, The Institute of Cancer Research, London, UK.
2
Zhao B, Xu W, Guan J, Zhou S. Molecular property prediction based on graph structure learning. Bioinformatics 2024; 40:btae304. [PMID: 38710497 PMCID: PMC11112045 DOI: 10.1093/bioinformatics/btae304] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/27/2023] [Revised: 04/06/2024] [Accepted: 05/03/2024] [Indexed: 05/08/2024] [Open access]
Abstract
MOTIVATION Molecular property prediction (MPP) is a fundamental but challenging task in the computer-aided drug discovery process. A growing number of recent works employ graph-based models for MPP and have achieved considerable progress in improving prediction performance. However, current models often ignore relationships between molecules, which could also be helpful for MPP. RESULTS To this end, we propose a graph structure learning (GSL) based MPP approach, called GSL-MPP. Specifically, we first apply a graph neural network (GNN) over molecular graphs to extract molecular representations. Then, with molecular fingerprints, we construct a molecule similarity graph (MSG). Following that, we conduct GSL on the MSG, i.e. molecule-level GSL, to get the final molecular embeddings, which result from fusing the GNN-encoded molecular representations with the relationships among molecules, that is, from combining intra-molecule and inter-molecule information. Finally, we use these molecular embeddings to perform MPP. Extensive experiments on 10 benchmark datasets show that our method achieves state-of-the-art performance in most cases, especially on classification tasks. Further visualization studies also demonstrate the good molecular representations of our method. AVAILABILITY AND IMPLEMENTATION Source code is available at https://github.com/zby961104/GSL-MPP.
Affiliation(s)
- Bangyi Zhao
  - Shanghai Key Lab of Intelligent Information Processing, and School of Computer Science, Fudan University, Shanghai 200438, China
- Weixia Xu
  - Shanghai Key Lab of Intelligent Information Processing, and School of Computer Science, Fudan University, Shanghai 200438, China
- Jihong Guan
  - Department of Computer Science and Technology, Tongji University, Shanghai 201804, China
- Shuigeng Zhou
  - Shanghai Key Lab of Intelligent Information Processing, and School of Computer Science, Fudan University, Shanghai 200438, China
3
Daoud S, Taha M. Protein characteristics substantially influence the propensity of activity cliffs among kinase inhibitors. Sci Rep 2024; 14:9058. [PMID: 38643174 PMCID: PMC11032345 DOI: 10.1038/s41598-024-59501-w] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/09/2023] [Accepted: 04/11/2024] [Indexed: 04/22/2024] [Open access]
Abstract
Activity cliffs (ACs) are pairs of structurally similar molecules with significantly different affinities for a biotarget, posing a challenge in computer-assisted drug discovery. This study focuses on protein kinases, significant therapeutic targets, with some exhibiting ACs while others do not despite numerous inhibitors. The hypothesis that the presence of ACs is dependent on the target protein and its complete structural context is explored. Machine learning models were developed to link protein properties to ACs, revealing specific tripeptide sequences and overall protein properties as critical factors in ACs occurrence. The study highlights the importance of considering the entire protein matrix rather than just the binding site in understanding ACs. This research provides valuable insights for drug discovery and design, paving the way for addressing ACs-related challenges in modern computational approaches.
Affiliation(s)
- Safa Daoud
  - Department of Pharmaceutical Chemistry and Pharmacognosy, Faculty of Pharmacy, Applied Sciences Private University, Amman, Jordan.
- Mutasem Taha
  - Department of Pharmaceutical Sciences, Faculty of Pharmacy, University of Jordan, Amman, Jordan.
4
Danishuddin, Malik MZ, Kashif M, Haque S, Kim JJ. Exploring chemical space, scaffold diversity, and activity landscape of spleen tyrosine kinase active inhibitors. SAR and QSAR in Environmental Research 2024; 35:325-342. [PMID: 38690773 DOI: 10.1080/1062936x.2024.2345618] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/12/2024] [Accepted: 04/14/2024] [Indexed: 05/03/2024]
Abstract
This study aims to comprehensively characterize 576 inhibitors targeting Spleen Tyrosine Kinase (SYK), a non-receptor tyrosine kinase primarily found in haematopoietic cells, with significant relevance to B-cell receptor function. The objective is to gain insights into the structural requirements essential for potent activity, with implications for various therapeutic applications. Through chemoinformatic analyses, we focus on exploring the chemical space, scaffold diversity, and structure-activity relationships (SAR). By leveraging ECFP4 and MACCS fingerprints, we elucidate the relationships between chemical compounds and visualize the resulting network using RDKit and NetworkX. Additionally, compound clustering and visualization of the associated chemical space aid in understanding overall diversity. The outcomes include identifying consensus diversity patterns to assess global chemical space diversity. Furthermore, incorporating pairwise activity differences enhances the activity landscape visualization, revealing heterogeneous SAR patterns. The dataset analysed in this work contains three activity cliff generators (CHEMBL3415598, CHEMBL4780257, and CHEMBL3265037): compounds with high affinity for SYK that are structurally very similar to analogues with notable potency differences. Overall, this study provides a critical analysis of SYK inhibitors, uncovering potential scaffolds and chemical moieties crucial for their activity, thereby advancing the understanding of their therapeutic potential.
Affiliation(s)
- Danishuddin
  - Department of Biotechnology, Yeungnam University, Gyeongsan, Republic of Korea
- M Z Malik
  - Department of Genetics and Bioinformatics, Dasman Diabetes Institute (DDI), Dasman, Kuwait
- M Kashif
  - School of Computational and Integrative Sciences, Jawaharlal Nehru University, New Delhi, India
- S Haque
  - Research and Scientific Studies Unit, College of Nursing and Health Sciences, Jazan University, Jazan, Saudi Arabia
  - Centre of Medical and Bio-Allied Health Sciences Research, Ajman University, Ajman, United Arab Emirates
- J J Kim
  - Department of Biotechnology, Yeungnam University, Gyeongsan, Republic of Korea
5
Martinez-Mayorga K, Rosas-Jiménez JG, Gonzalez-Ponce K, López-López E, Neme A, Medina-Franco JL. The pursuit of accurate predictive models of the bioactivity of small molecules. Chem Sci 2024; 15:1938-1952. [PMID: 38332817 PMCID: PMC10848664 DOI: 10.1039/d3sc05534e] [Citation(s) in RCA: 3] [Impact Index Per Article: 3.0] [Received: 10/18/2023] [Accepted: 01/09/2024] [Indexed: 02/10/2024] [Open access]
Abstract
Property prediction is a key interest in chemistry. For several decades there has been a continued and incremental development of mathematical models to predict properties. As more data are generated and accumulated, there seem to be more areas of opportunity to develop models with increased accuracy. The same is true if one considers the large developments in machine and deep learning models. However, alongside these areas of opportunity and development, issues and challenges remain and, with more data, new challenges emerge, such as the quality, quantity, and reliability of the data, and model reproducibility. Herein, we discuss the status of the accuracy of predictive models and present the authors' perspective on the direction of the field, emphasizing good practices. We focus on predictive models of bioactive properties of small molecules relevant for drug discovery, agrochemicals, food chemistry, natural product research, and related fields.
Affiliation(s)
- Karina Martinez-Mayorga
  - Institute of Chemistry, Merida Unit, National Autonomous University of Mexico Merida-Tetiz Highway, Km. 4.5 Ucu Yucatan Mexico
  - Institute for Applied Mathematics and Systems, Merida Research Unit, National Autonomous University of Mexico Sierra Papacal Merida Yucatan Mexico
- José G Rosas-Jiménez
  - Department of Theoretical Biophysics, IMPRS on Cellular Biophysics Max-von-Laue Strasse 3 Frankfurt am Main 60438 Germany
- Karla Gonzalez-Ponce
  - Institute of Chemistry, Merida Unit, National Autonomous University of Mexico Merida-Tetiz Highway, Km. 4.5 Ucu Yucatan Mexico
- Edgar López-López
  - Department of Chemistry and Graduate Program in Pharmacology, Center for Research and Advanced Studies of the National Polytechnic Institute Mexico City 07000 Mexico
  - DIFACQUIM Research Group, Department of Pharmacy, School of Chemistry National Autonomous University of Mexico Mexico City 04510 Mexico
- Antonio Neme
  - Institute for Applied Mathematics and Systems, Merida Research Unit, National Autonomous University of Mexico Sierra Papacal Merida Yucatan Mexico
- José L Medina-Franco
  - DIFACQUIM Research Group, Department of Pharmacy, School of Chemistry National Autonomous University of Mexico Mexico City 04510 Mexico
6
Bertin P, Rector-Brooks J, Sharma D, Gaudelet T, Anighoro A, Gross T, Martínez-Peña F, Tang EL, Suraj MS, Regep C, Hayter JBR, Korablyov M, Valiante N, van der Sloot A, Tyers M, Roberts CES, Bronstein MM, Lairson LL, Taylor-King JP, Bengio Y. RECOVER identifies synergistic drug combinations in vitro through sequential model optimization. Cell Reports Methods 2023; 3:100599. [PMID: 37797618 PMCID: PMC10626197 DOI: 10.1016/j.crmeth.2023.100599] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 11/03/2022] [Revised: 08/30/2023] [Accepted: 09/06/2023] [Indexed: 10/07/2023]
Abstract
For large libraries of small molecules, exhaustive combinatorial chemical screens become infeasible to perform when considering a range of disease models, assay conditions, and dose ranges. Deep learning models have achieved state-of-the-art results in silico for the prediction of synergy scores. However, databases of drug combinations are biased toward synergistic agents and results do not generalize out of distribution. During 5 rounds of experimentation, we employ sequential model optimization with a deep learning model to select drug combinations increasingly enriched for synergism and active against a cancer cell line, evaluating only ∼5% of the total search space. Moreover, we find that learned drug embeddings (using structural information) begin to reflect biological mechanisms. In silico benchmarking suggests search queries are ∼5-10× enriched for highly synergistic drug combinations by using sequential rounds of evaluation when compared with random selection, or ∼3× when using a pretrained model.
Affiliation(s)
- Paul Bertin
  - Mila, the Quebec AI Institute, Montreal, QC, Canada
- Eileen L Tang
  - Department of Chemistry, The Scripps Research Institute, La Jolla, CA, USA
- Almer van der Sloot
  - IRIC, Institute for Research in Immunology and Cancer, Université de Montréal, Montreal, QC, Canada
- Mike Tyers
  - Program in Molecular Medicine, Peter Gilgan Centre for Research and Learning, The Hospital for Sick Children, 686 Bay Street, Toronto, ON M5G 0A4, Canada
- Michael M Bronstein
  - Relation Therapeutics, London, UK; Department of Computer Science, University of Oxford, Oxford, UK
- Luke L Lairson
  - Department of Chemistry, The Scripps Research Institute, La Jolla, CA, USA
7
Schür C, Gasser L, Perez-Cruz F, Schirmer K, Baity-Jesi M. A benchmark dataset for machine learning in ecotoxicology. Sci Data 2023; 10:718. [PMID: 37853023 PMCID: PMC10584858 DOI: 10.1038/s41597-023-02612-2] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Received: 05/31/2023] [Accepted: 09/28/2023] [Indexed: 10/20/2023] [Open access]
Abstract
The use of machine learning for predicting ecotoxicological outcomes is promising, but underutilized. The curation of data with informative features requires both expertise in machine learning and a strong biological and ecotoxicological background, which we consider a barrier to entry for this kind of research. Additionally, model performances can only be compared across studies when the same dataset, cleaning, and splittings were used. Therefore, we provide ADORE, an extensive and well-described dataset on acute aquatic toxicity in three relevant taxonomic groups (fish, crustaceans, and algae). The core dataset describes ecotoxicological experiments and is expanded with phylogenetic and species-specific data on the species as well as chemical properties and molecular representations. Apart from challenging other researchers to try to achieve the best model performances across the whole dataset, we propose specific relevant challenges on subsets of the data and include datasets and splittings corresponding to each of these challenges, as well as an in-depth characterization and discussion of train-test splitting approaches.
Affiliation(s)
- Christoph Schür
  - Eawag, Swiss Federal Institute of Aquatic Science and Technology, Dübendorf, Switzerland.
- Lilian Gasser
  - Swiss Data Science Center (SDSC), Zürich, Switzerland
- Fernando Perez-Cruz
  - Swiss Data Science Center (SDSC), Zürich, Switzerland
  - ETH Zürich: Department of Computer Science, Zürich, Switzerland
- Kristin Schirmer
  - Eawag, Swiss Federal Institute of Aquatic Science and Technology, Dübendorf, Switzerland
  - ETH Zürich: Department of Environmental Systems Science, Zürich, Switzerland
  - EPF Lausanne, School of Architecture, Civil and Environmental Engineering, Lausanne, Switzerland
- Marco Baity-Jesi
  - Eawag, Swiss Federal Institute of Aquatic Science and Technology, Dübendorf, Switzerland
8
Han R, Yoon H, Kim G, Lee H, Lee Y. Revolutionizing Medicinal Chemistry: The Application of Artificial Intelligence (AI) in Early Drug Discovery. Pharmaceuticals (Basel) 2023; 16:1259. [PMID: 37765069 PMCID: PMC10537003 DOI: 10.3390/ph16091259] [Citation(s) in RCA: 11] [Impact Index Per Article: 5.5] [Received: 07/27/2023] [Revised: 08/24/2023] [Accepted: 09/04/2023] [Indexed: 09/29/2023] [Open access]
Abstract
Artificial intelligence (AI) has permeated various sectors, including the pharmaceutical industry and research, where it has been utilized to efficiently identify new chemical entities with desirable properties. The application of AI algorithms to drug discovery presents both remarkable opportunities and challenges. This review article focuses on the transformative role of AI in medicinal chemistry. We delve into the applications of machine learning and deep learning techniques in drug screening and design, discussing their potential to expedite the early drug discovery process. In particular, we provide a comprehensive overview of the use of AI algorithms in predicting protein structures, drug-target interactions, and molecular properties such as drug toxicity. While AI has accelerated the drug discovery process, data quality issues and technological constraints remain challenges. Nonetheless, new relationships and methods have been unveiled, demonstrating AI's expanding potential in predicting and understanding drug interactions and properties. For its full potential to be realized, interdisciplinary collaboration is essential. This review underscores AI's growing influence on the future trajectory of medicinal chemistry and stresses the importance of ongoing synergies between computational and domain experts.
Affiliation(s)
- Yoonji Lee
  - College of Pharmacy, Chung-Ang University, Seoul 06974, Republic of Korea
9
Dablander M, Hanser T, Lambiotte R, Morris GM. Exploring QSAR models for activity-cliff prediction. J Cheminform 2023; 15:47. [PMID: 37069675 PMCID: PMC10107580 DOI: 10.1186/s13321-023-00708-w] [Citation(s) in RCA: 8] [Impact Index Per Article: 4.0] [Received: 01/26/2023] [Accepted: 03/10/2023] [Indexed: 04/19/2023] [Open access]
Abstract
INTRODUCTION AND METHODOLOGY Pairs of similar compounds that only differ by a small structural modification but exhibit a large difference in their binding affinity for a given target are known as activity cliffs (ACs). It has been hypothesised that QSAR models struggle to predict ACs and that ACs thus form a major source of prediction error. However, the AC-prediction power of modern QSAR methods and its quantitative relationship to general QSAR-prediction performance is still underexplored. We systematically construct nine distinct QSAR models by combining three molecular representation methods (extended-connectivity fingerprints, physicochemical-descriptor vectors and graph isomorphism networks) with three regression techniques (random forests, k-nearest neighbours and multilayer perceptrons); we then use each resulting model to classify pairs of similar compounds as ACs or non-ACs and to predict the activities of individual molecules in three case studies: dopamine receptor D2, factor Xa, and SARS-CoV-2 main protease. RESULTS AND CONCLUSIONS Our results provide strong support for the hypothesis that QSAR models indeed frequently fail to predict ACs. We observe low AC-sensitivity amongst the evaluated models when the activities of both compounds are unknown, but a substantial increase in AC-sensitivity when the actual activity of one of the compounds is given. Graph isomorphism features are found to be competitive with or superior to classical molecular representations for AC-classification and can thus be employed as baseline AC-prediction models or simple compound-optimisation tools. For general QSAR-prediction, however, extended-connectivity fingerprints still consistently deliver the best performance amongst the tested input representations. A potential future pathway to improve QSAR-modelling performance might be the development of techniques to increase AC-sensitivity.
Affiliation(s)
- Markus Dablander
  - Mathematical Institute, University of Oxford, Andrew Wiles Building, Radcliffe Observatory Quarter (550), Woodstock Road, Oxford, OX2 6GG, UK
- Thierry Hanser
  - Lhasa Limited, Granary Wharf House, 2 Canal Wharf, Leeds, LS11 5PS, UK
- Renaud Lambiotte
  - Mathematical Institute, University of Oxford, Andrew Wiles Building, Radcliffe Observatory Quarter (550), Woodstock Road, Oxford, OX2 6GG, UK
- Garrett M Morris
  - Department of Statistics, University of Oxford, 24-29 St Giles', Oxford, OX1 3LB, UK.
10
Chiodi D, Ishihara Y. "Magic Chloro": Profound Effects of the Chlorine Atom in Drug Discovery. J Med Chem 2023; 66:5305-5331. [PMID: 37014977 DOI: 10.1021/acs.jmedchem.2c02015] [Citation(s) in RCA: 60] [Impact Index Per Article: 30.0] [Indexed: 04/06/2023]
Abstract
Chlorine is one of the most common atoms present in small-molecule drugs beyond carbon, hydrogen, nitrogen, and oxygen. There are currently more than 250 FDA-approved chlorine-containing drugs, yet the beneficial effect of the chloro substituent has not yet been reviewed. The seemingly simple substitution of a hydrogen atom (R = H) with a chlorine atom (R = Cl) can result in remarkable improvements in potency of up to 100,000-fold and can lead to profound effects on pharmacokinetic parameters including clearance, half-life, and drug exposure in vivo. Following the literature terminology of the "magic methyl effect" in drugs, the term "magic chloro effect" has been coined herein. Although reports of 500-fold or 1000-fold potency improvements are often serendipitous discoveries that can be considered "magical" rather than planned, hypotheses made to explain the magic chloro effect can lead to lessons that accelerate the cycle of drug discovery.
Affiliation(s)
- Debora Chiodi
  - Department of Chemistry, The Scripps Research Institute, 10550 North Torrey Pines Road, La Jolla, California 92037, United States
- Yoshihiro Ishihara
  - Department of Chemistry, Vividion Therapeutics, 5820 Nancy Ridge Drive, San Diego, California 92121, United States
11
Béquignon OJM, Bongers BJ, Jespers W, IJzerman AP, van der Water B, van Westen GJP. Papyrus: a large-scale curated dataset aimed at bioactivity predictions. J Cheminform 2023; 15:3. [PMID: 36609528 PMCID: PMC9824924 DOI: 10.1186/s13321-022-00672-x] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/10/2022] [Accepted: 12/17/2022] [Indexed: 01/07/2023] [Open access]
Abstract
With the ongoing rapid growth of publicly available ligand-protein bioactivity data, there is a trove of valuable data that can be used to train a plethora of machine-learning algorithms. However, not all data is equal in terms of size and quality and a significant portion of researchers' time is needed to adapt the data to their needs. On top of that, finding the right data for a research question can often be a challenge on its own. To meet these challenges, we have constructed the Papyrus dataset. Papyrus is comprised of around 60 million data points. This dataset contains multiple large publicly available datasets such as ChEMBL and ExCAPE-DB combined with several smaller datasets containing high-quality data. The aggregated data has been standardised and normalised in a manner that is suitable for machine learning. We show how data can be filtered in a variety of ways and also perform some examples of quantitative structure-activity relationship analyses and proteochemometric modelling. Our ambition is that this pruned data collection constitutes a benchmark set that can be used for constructing predictive models, while also providing an accessible data source for research.
Affiliation(s)
- O. J. M. Béquignon
  - Division of Drug Discovery and Safety, Leiden Academic Centre for Drug Research, Leiden University, Leiden, The Netherlands
- B. J. Bongers
  - Division of Drug Discovery and Safety, Leiden Academic Centre for Drug Research, Leiden University, Leiden, The Netherlands
- W. Jespers
  - Division of Drug Discovery and Safety, Leiden Academic Centre for Drug Research, Leiden University, Leiden, The Netherlands
- A. P. IJzerman
  - Division of Drug Discovery and Safety, Leiden Academic Centre for Drug Research, Leiden University, Leiden, The Netherlands
- B. van der Water
  - Division of Drug Discovery and Safety, Leiden Academic Centre for Drug Research, Leiden University, Leiden, The Netherlands
- G. J. P. van Westen
  - Division of Drug Discovery and Safety, Leiden Academic Centre for Drug Research, Leiden University, Leiden, The Netherlands
12
Isomeric Activity Cliffs-A Case Study for Fluorine Substitution of Aminergic G Protein-Coupled Receptor Ligands. Molecules 2023; 28:molecules28020490. [PMID: 36677547 PMCID: PMC9863698 DOI: 10.3390/molecules28020490] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 11/18/2022] [Revised: 12/30/2022] [Accepted: 01/01/2023] [Indexed: 01/06/2023] [Open access]
Abstract
Currently, G protein-coupled receptors (GPCRs) constitute a significant group of membrane-bound receptors representing more than 30% of therapeutic targets. Fluorine is commonly used in designing highly active biological compounds, as evidenced by the steadily increasing number of fluorine-containing drugs approved by the Food and Drug Administration (FDA). Herein, we identified and analyzed 898 target-based F-containing isomeric analog sets for SAR analysis (FiSAR sets) in the ChEMBL database, active against 33 different aminergic GPCRs and comprising a total of 2163 fluorinated (1201 unique) compounds. We found that 30 FiSAR sets contain activity cliffs (ACs), defined as pairs of structurally similar compounds showing significant differences in affinity (≥50-fold change), where a change of fluorine position may lead to up to a 1300-fold change in potency. The analysis of matched molecular pair (MMP) networks indicated that the fluorination of aromatic rings showed no clear trend toward a positive or negative effect on affinity. Additionally, we propose an in silico workflow (including induced-fit docking, molecular dynamics, quantum polarized ligand docking, and binding free energy calculations based on the Generalized-Born Surface-Area (GBSA) model) to score the fluorine positions in the molecule.
13
van Tilborg D, Alenicheva A, Grisoni F. Exposing the Limitations of Molecular Machine Learning with Activity Cliffs. J Chem Inf Model 2022; 62:5938-5951. [PMID: 36456532 PMCID: PMC9749029 DOI: 10.1021/acs.jcim.2c01073] [Citation(s) in RCA: 50] [Impact Index Per Article: 16.7] [Received: 08/23/2022] [Indexed: 12/03/2022]
Abstract
Machine learning has become a crucial tool in drug discovery and chemistry at large, e.g., to predict molecular properties, such as bioactivity, with high accuracy. However, activity cliffs, pairs of molecules that are highly similar in their structure but exhibit large differences in potency, have received limited attention for their effect on model performance. Not only are these edge cases informative for molecule discovery and optimization but also models that are well equipped to accurately predict the potency of activity cliffs have increased potential for prospective applications. Our work aims to fill the current knowledge gap on best-practice machine learning methods in the presence of activity cliffs. We benchmarked a total of 24 machine and deep learning approaches on curated bioactivity data from 30 macromolecular targets for their performance on activity cliff compounds. While all methods struggled in the presence of activity cliffs, machine learning approaches based on molecular descriptors outperformed more complex deep learning methods. Our findings highlight large case-by-case differences in performance, advocating for (a) the inclusion of dedicated "activity-cliff-centered" metrics during model development and evaluation and (b) the development of novel algorithms to better predict the properties of activity cliffs. To this end, the methods, metrics, and results of this study have been encapsulated into an open-access benchmarking platform named MoleculeACE (Activity Cliff Estimation, available on GitHub at: https://github.com/molML/MoleculeACE). MoleculeACE is designed to steer the community toward addressing the pressing but overlooked limitation of molecular machine learning models posed by activity cliffs.
Collapse
Affiliation(s)
- Derek van Tilborg
  - Institute for Complex Molecular Systems and Dept. Biomedical Engineering, Eindhoven University of Technology, 5612AZ Eindhoven, The Netherlands
  - Centre for Living Technologies, Alliance TU/e, WUR, UU, UMC Utrecht, 3584CB Utrecht, The Netherlands
- Francesca Grisoni
  - Institute for Complex Molecular Systems and Dept. Biomedical Engineering, Eindhoven University of Technology, 5612AZ Eindhoven, The Netherlands
  - Centre for Living Technologies, Alliance TU/e, WUR, UU, UMC Utrecht, 3584CB Utrecht, The Netherlands
|
14
|
López-López E, Fernández-de Gortari E, Medina-Franco JL. Yes SIR! On the structure-inactivity relationships in drug discovery. Drug Discov Today 2022; 27:2353-2362. [PMID: 35561964 DOI: 10.1016/j.drudis.2022.05.005] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 02/02/2022] [Revised: 04/09/2022] [Accepted: 05/05/2022] [Indexed: 12/12/2022]
Abstract
In analogy with structure-activity relationships (SARs), which are at the core of medicinal chemistry, studying structure-inactivity relationships (SIRs) is essential to understanding and predicting biological activity. Current computational methods should predict or distinguish 'activity' and 'inactivity' with the same confidence because the two concepts are complementary. However, the lack of inactivity data, particularly in the public domain, limits the development of predictive models and their broad application. In this review, we encourage the scientific community to disclose and analyze high-confidence activity data for compounds labeled both 'active' and 'inactive'.
Affiliation(s)
- Edgar López-López
  - DIFACQUIM Research Group, Department of Pharmacy, School of Chemistry, Universidad Nacional Autónoma de México, Mexico City 04510, Mexico
  - Department of Chemistry and Graduate Program in Pharmacology, Center for Research and Advanced Studies of the National Polytechnic Institute, Mexico City 07000, Mexico
- Eli Fernández-de Gortari
  - Department of Nanosafety, International Iberian Nanotechnology Laboratory, Braga 4715-330, Portugal
- José L Medina-Franco
  - DIFACQUIM Research Group, Department of Pharmacy, School of Chemistry, Universidad Nacional Autónoma de México, Mexico City 04510, Mexico
|
15
|
Exploiting activity cliffs for building pharmacophore models and comparison with other pharmacophore generation methods: sphingosine kinase 1 as case study. J Comput Aided Mol Des 2022; 36:39-62. [PMID: 35059939 DOI: 10.1007/s10822-021-00435-0] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 07/14/2021] [Accepted: 11/24/2021] [Indexed: 12/20/2022]
|
16
|
Congenericity of Claimed Compounds in Patent Applications. Molecules 2021; 26:5253. [PMID: 34500686 PMCID: PMC8433967 DOI: 10.3390/molecules26175253] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Received: 07/23/2021] [Revised: 08/17/2021] [Accepted: 08/18/2021] [Indexed: 12/04/2022]
Abstract
A method is presented to quantitatively analyze the degree of congenericity of claimed compounds in patent applications. The approach successfully differentiates patents exemplified with highly congeneric compounds of a structurally compact and well-defined chemical series from patents containing a more diverse set of compounds around a more vaguely described patent claim. An application to 750 common patents available in SureChEMBL, SureChEMBLccs and ChEMBL is presented, and the congenericity of patent compounds in those different sources is discussed.
|
17
|
Medina-Franco JL, Sánchez-Cruz N, López-López E, Díaz-Eufracio BI. Progress on open chemoinformatic tools for expanding and exploring the chemical space. J Comput Aided Mol Des 2021; 36:341-354. [PMID: 34143323 PMCID: PMC8211976 DOI: 10.1007/s10822-021-00399-1] [Citation(s) in RCA: 19] [Impact Index Per Article: 4.8] [Received: 05/10/2021] [Accepted: 06/14/2021] [Indexed: 01/10/2023]
Abstract
The concept of chemical space is a cornerstone in chemoinformatics, and it has broad conceptual and practical applicability in many areas of chemistry, including drug design and discovery. One of its most considerable impacts is in the study of structure-property relationships, where the property can be a biological activity or any other characteristic of interest to a particular chemistry discipline. Chemical space is highly dependent on the molecular representation, which is itself a cornerstone concept in computational chemistry. Herein, we discuss the recent progress on chemoinformatic tools developed to expand and characterize the chemical space of compound data sets using different types of molecular representations, generate visual representations of such spaces, and explore structure-property relationships in the context of chemical spaces. We emphasize the development of methods and freely available tools focusing on drug discovery applications. We also comment on the general advantages and shortcomings of using freely available and easy-to-use tools and discuss the value of using such open resources for research, education, and scientific dissemination.
Affiliation(s)
- José L Medina-Franco
  - DIFACQUIM Research Group, Department of Pharmacy, School of Chemistry, Universidad Nacional Autónoma de México, 04510, Mexico City, Mexico
- Norberto Sánchez-Cruz
  - DIFACQUIM Research Group, Department of Pharmacy, School of Chemistry, Universidad Nacional Autónoma de México, 04510, Mexico City, Mexico
- Edgar López-López
  - DIFACQUIM Research Group, Department of Pharmacy, School of Chemistry, Universidad Nacional Autónoma de México, 04510, Mexico City, Mexico
  - Departamento de Química y Programa de Posgrado en Farmacología, Centro de Investigación y de Estudios Avanzados del Instituto Politécnico Nacional, Apartado 14-740, 07000, Mexico City, Mexico
- Bárbara I Díaz-Eufracio
  - DIFACQUIM Research Group, Department of Pharmacy, School of Chemistry, Universidad Nacional Autónoma de México, 04510, Mexico City, Mexico
|
18
|
Cheminformatic Profiling and Hit Prioritization of Natural Products with Activities against Methicillin-Resistant Staphylococcus aureus (MRSA). Molecules 2021; 26:3674. [PMID: 34208597 PMCID: PMC8246317 DOI: 10.3390/molecules26123674] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.8] [Received: 03/15/2021] [Revised: 04/28/2021] [Accepted: 05/08/2021] [Indexed: 12/14/2022]
Abstract
Several natural products (NPs) have displayed varying in vitro activities against methicillin-resistant Staphylococcus aureus (MRSA). However, few of these compounds have been developed into potential antimicrobial drug candidates. This may be due to the high cost and the tedious, time-consuming process of conducting the necessary preclinical tests on these compounds. In this study, cheminformatic profiling was performed on 111 anti-MRSA NPs (AMNPs), using a few orally administered conventional drugs for MRSA (CDs) as reference, to identify compounds with prospects to become drug candidates. This was followed by prioritizing these hits and identifying the liabilities among the AMNPs for possible optimization. Cheminformatic profiling revealed that most of the AMNPs were within the required drug-like region of the investigated properties. For example, more than 76% of the AMNPs showed compliance with the Lipinski, Veber, and Egan predictive rules for oral absorption and permeability. About 34% of the AMNPs showed the prospect to penetrate the blood–brain barrier (BBB), an advantage over the CDs, which generally do not permeate the BBB. The analysis of toxicity revealed that 59% of the AMNPs might have negligible or no toxicity risks. Structure–activity relationship (SAR) analysis revealed chemical groups that may be determinants of the reported bioactivity of the compounds. A hit prioritization strategy using a novel “desirability scoring function” was able to identify AMNPs with the desired drug-likeness. Hit optimization strategies implemented on AMNPs with poor desirability scores led to the design of two compounds with improved desirability scores.
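As a rough illustration of the rule-based compliance check mentioned in the abstract above, the sketch below counts violations of the commonly cited Lipinski rule-of-five thresholds (molecular weight at most 500 Da, logP at most 5, at most 5 hydrogen-bond donors, at most 10 hydrogen-bond acceptors). The abstract does not state the thresholds or tooling the authors used, so the thresholds, the tolerance of one violation, the class and function names, and the descriptor values are all assumptions; the Veber and Egan rules are not covered.

```python
from dataclasses import dataclass


@dataclass
class Descriptors:
    """Precomputed molecular descriptors (values here are hypothetical)."""
    mol_weight: float       # molecular weight in Da
    logp: float             # octanol-water partition coefficient
    h_bond_donors: int
    h_bond_acceptors: int


def lipinski_violations(d: Descriptors) -> int:
    """Count how many of the four rule-of-five criteria are violated."""
    checks = [
        d.mol_weight > 500,
        d.logp > 5,
        d.h_bond_donors > 5,
        d.h_bond_acceptors > 10,
    ]
    return sum(checks)


def passes_lipinski(d: Descriptors, max_violations: int = 1) -> bool:
    """Treat a compound as compliant if it has at most max_violations violations."""
    return lipinski_violations(d) <= max_violations


# Hypothetical descriptor values, not taken from the cited study.
small_molecule = Descriptors(mol_weight=350.4, logp=2.1, h_bond_donors=2, h_bond_acceptors=5)
large_molecule = Descriptors(mol_weight=812.0, logp=6.3, h_bond_donors=7, h_bond_acceptors=12)

print(lipinski_violations(small_molecule), passes_lipinski(small_molecule))
print(lipinski_violations(large_molecule), passes_lipinski(large_molecule))
```

In practice the descriptor values would come from a cheminformatics toolkit rather than being typed in by hand.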
|