1
|
Xia Y, Pan X, Shen HB. Heterogeneous sampled subgraph neural networks with knowledge distillation to enhance double-blind compound-protein interaction prediction. Structure 2024; 32:611-620.e4. [PMID: 38447575 DOI: 10.1016/j.str.2024.02.004] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/29/2023] [Revised: 12/18/2023] [Accepted: 02/08/2024] [Indexed: 03/08/2024]
Abstract
Identifying binding compounds against a target protein is crucial for large-scale virtual screening in drug development. Recently, network-based methods have been developed for compound-protein interaction (CPI) prediction. However, they are difficult to be applied to unseen (i.e., never-seen-before) proteins and compounds. In this study, we propose SgCPI to incorporate local known interacting networks to predict CPI interactions. SgCPI randomly samples the local CPI network of the query compound-protein pair as a subgraph and applies a heterogeneous graph neural network (HGNN) to embed the active/inactive message of the subgraph. For unseen compounds and proteins, SgCPI-KD takes SgCPI as the teacher model to distillate its knowledge by estimating the potential neighbors. Experimental results indicate: (1) the sampled subgraphs of the CPI network introduce efficient knowledge for unseen molecular prediction with the HGNNs, and (2) the knowledge distillation strategy is beneficial to the double-blind interaction prediction by estimating molecular neighbors and distilling knowledge.
Collapse
Affiliation(s)
- Ying Xia
- Institute of Image Processing and Pattern Recognition, Shanghai Jiao Tong University, and Key Laboratory of System Control and Information Processing, Ministry of Education of China, Shanghai 200240, China
| | - Xiaoyong Pan
- Institute of Image Processing and Pattern Recognition, Shanghai Jiao Tong University, and Key Laboratory of System Control and Information Processing, Ministry of Education of China, Shanghai 200240, China.
| | - Hong-Bin Shen
- Institute of Image Processing and Pattern Recognition, Shanghai Jiao Tong University, and Key Laboratory of System Control and Information Processing, Ministry of Education of China, Shanghai 200240, China.
| |
Collapse
|
2
|
Kumar N, Acharya V. Advances in machine intelligence-driven virtual screening approaches for big-data. Med Res Rev 2024; 44:939-974. [PMID: 38129992 DOI: 10.1002/med.21995] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/12/2022] [Revised: 07/15/2023] [Accepted: 10/29/2023] [Indexed: 12/23/2023]
Abstract
Virtual screening (VS) is an integral and ever-evolving domain of drug discovery framework. The VS is traditionally classified into ligand-based (LB) and structure-based (SB) approaches. Machine intelligence or artificial intelligence has wide applications in the drug discovery domain to reduce time and resource consumption. In combination with machine intelligence algorithms, VS has emerged into revolutionarily progressive technology that learns within robust decision orders for data curation and hit molecule screening from large VS libraries in minutes or hours. The exponential growth of chemical and biological data has evolved as "big-data" in the public domain demands modern and advanced machine intelligence-driven VS approaches to screen hit molecules from ultra-large VS libraries. VS has evolved from an individual approach (LB and SB) to integrated LB and SB techniques to explore various ligand and target protein aspects for the enhanced rate of appropriate hit molecule prediction. Current trends demand advanced and intelligent solutions to handle enormous data in drug discovery domain for screening and optimizing hits or lead with fewer or no false positive hits. Following the big-data drift and tremendous growth in computational architecture, we presented this review. Here, the article categorized and emphasized individual VS techniques, detailed literature presented for machine learning implementation, modern machine intelligence approaches, and limitations and deliberated the future prospects.
Collapse
Affiliation(s)
- Neeraj Kumar
- Artificial Intelligence for Computational Biology Lab (AICoB), Biotechnology Division, CSIR-Institute of Himalayan Bioresource Technology, Palampur, Himachal Pradesh, India
- Academy of Scientific and Innovative Research, Ghaziabad, India
| | - Vishal Acharya
- Artificial Intelligence for Computational Biology Lab (AICoB), Biotechnology Division, CSIR-Institute of Himalayan Bioresource Technology, Palampur, Himachal Pradesh, India
- Academy of Scientific and Innovative Research, Ghaziabad, India
| |
Collapse
|
3
|
P V, Mohanan M, U K S, E Pa S, U C A J. Graph Attention Network based mapping of knowledge relations between chemical spaces of Nuclear factor kappa B and Centella asiatica. Comput Biol Chem 2023; 107:107955. [PMID: 37734134 DOI: 10.1016/j.compbiolchem.2023.107955] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/01/2022] [Revised: 08/02/2023] [Accepted: 09/07/2023] [Indexed: 09/23/2023]
Abstract
The confounding nature of the innate immunity target Nuclear Factor kappa B (NF-κB) and its interaction with Centella asiatica (CA) molecules necessitate the intervention of advanced technologies, such as deep learning methods. The integration of chemical space concepts with deep learning technologies is a new way of knowledge mapping used to explore drug-target interactions, especially in molecular libraries derived from traditional medicine based molecular sources. The current constraint of virtual screening for mechanistic target hunting is the use of a binary classification model that includes active and inactive molecules from in vitro experiments to explore drug-target interaction. This study aims to explore the regulatory nature of the molecules from the inhibition and activation of the NF-κB bioassay data set and map this information for a knowledge-based analysis against the molecules of CA, a low-growing tropical plant. This finding has led to a new direction in the field, transitioning from the conventional active-inactive framework to a more comprehensive active-inactive-regulatory model. This approach can be thoroughly explored by leveraging a graph-based deep learning system. The study presents an innovative approach using a Graph Attention Network (GAT) to rank CA molecules in chemical space based on their similarity with NF-κB bioassay molecules, enabling the efficient analysis of complex relationships between molecules and their regulatory function. Graph Attention Network (GAT) overcomes the limitations of traditional deep learning models such as Convolutional Neural Network (CNN) and Recurrent Neural Network (RNN) in handling non-Euclidean graph data and allows for a more precise understanding of similarity ranking by utilizing molecular graphs and attention behavior. By measuring similarity and arranging a matrix of similarity ranking based on GAT, deep neural ranking-based algorithms confirmed the regulatory behaviour of an innate immunity target NF-κB with the support of underlying inverse mapping in the surjective chemical spaces of NF-κB bioassays and CA molecular spaces. Overall, the study introduces new techniques for exploring the regulatory behaviour of complex targets like NF-κB. We then used t-SNE for clustering in chemical space and scaffold hunting for scaffold property analysis and identified nine CA molecules that exhibit regulatory behavior of NF-κB target and are recommended for further investigation.
Collapse
Affiliation(s)
- Vivek P
- UL Research Center, UL Cyber Park Calicut, India
| | | | | | - Sandesh E Pa
- UL Research Center, UL Cyber Park Calicut, India
| | - Jaleel U C A
- OSPF-NIAS Drug DIscovery Lab, National Institute of Advanced Studies, Indian Institute of Science Campus, Bengaluru, India
| |
Collapse
|
4
|
Hagg A, Kirschner KN. Open-Source Machine Learning in Computational Chemistry. J Chem Inf Model 2023; 63:4505-4532. [PMID: 37466636 PMCID: PMC10430767 DOI: 10.1021/acs.jcim.3c00643] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/04/2023] [Indexed: 07/20/2023]
Abstract
The field of computational chemistry has seen a significant increase in the integration of machine learning concepts and algorithms. In this Perspective, we surveyed 179 open-source software projects, with corresponding peer-reviewed papers published within the last 5 years, to better understand the topics within the field being investigated by machine learning approaches. For each project, we provide a short description, the link to the code, the accompanying license type, and whether the training data and resulting models are made publicly available. Based on those deposited in GitHub repositories, the most popular employed Python libraries are identified. We hope that this survey will serve as a resource to learn about machine learning or specific architectures thereof by identifying accessible codes with accompanying papers on a topic basis. To this end, we also include computational chemistry open-source software for generating training data and fundamental Python libraries for machine learning. Based on our observations and considering the three pillars of collaborative machine learning work, open data, open source (code), and open models, we provide some suggestions to the community.
Collapse
Affiliation(s)
- Alexander Hagg
- Institute
of Technology, Resource and Energy-Efficient Engineering (TREE), University of Applied Sciences Bonn-Rhein-Sieg, 53757 Sankt Augustin, Germany
- Department
of Electrical Engineering, Mechanical Engineering and Technical Journalism, University of Applied Sciences Bonn-Rhein-Sieg, 53757 Sankt Augustin, Germany
| | - Karl N. Kirschner
- Institute
of Technology, Resource and Energy-Efficient Engineering (TREE), University of Applied Sciences Bonn-Rhein-Sieg, 53757 Sankt Augustin, Germany
- Department
of Computer Science, University of Applied
Sciences Bonn-Rhein-Sieg, 53757 Sankt Augustin, Germany
| |
Collapse
|
5
|
Gu Y, Li J, Kang H, Zhang B, Zheng S. Employing Molecular Conformations for Ligand-Based Virtual Screening with Equivariant Graph Neural Network and Deep Multiple Instance Learning. Molecules 2023; 28:5982. [PMID: 37630234 PMCID: PMC10459669 DOI: 10.3390/molecules28165982] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/04/2023] [Revised: 07/27/2023] [Accepted: 08/03/2023] [Indexed: 08/27/2023] Open
Abstract
Ligand-based virtual screening (LBVS) is a promising approach for rapid and low-cost screening of potentially bioactive molecules in the early stage of drug discovery. Compared with traditional similarity-based machine learning methods, deep learning frameworks for LBVS can more effectively extract high-order molecule structure representations from molecular fingerprints or structures. However, the 3D conformation of a molecule largely influences its bioactivity and physical properties, and has rarely been considered in previous deep learning-based LBVS methods. Moreover, the relative bioactivity benchmark dataset is still lacking. To address these issues, we introduce a novel end-to-end deep learning architecture trained from molecular conformers for LBVS. We first extracted molecule conformers from multiple public molecular bioactivity data and consolidated them into a large-scale bioactivity benchmark dataset, which totally includes millions of endpoints and molecules corresponding to 954 targets. Then, we devised a deep learning-based LBVS called EquiVS to learn molecule representations from conformers for bioactivity prediction. Specifically, graph convolutional network (GCN) and equivariant graph neural network (EGNN) are sequentially stacked to learn high-order molecule-level and conformer-level representations, followed with attention-based deep multiple-instance learning (MIL) to aggregate these representations and then predict the potential bioactivity for the query molecule on a given target. We conducted various experiments to validate the data quality of our benchmark dataset, and confirmed EquiVS achieved better performance compared with 10 traditional machine learning or deep learning-based LBVS methods. Further ablation studies demonstrate the significant contribution of molecular conformation for bioactivity prediction, as well as the reasonability and non-redundancy of deep learning architecture in EquiVS. Finally, a model interpretation case study on CDK2 shows the potential of EquiVS in optimal conformer discovery. The overall study shows that our proposed benchmark dataset and EquiVS method have promising prospects in virtual screening applications.
Collapse
Affiliation(s)
- Yaowen Gu
- Institute of Medical Information (IMI), Chinese Academy of Medical Sciences and Peking Union Medical College (CAMS & PUMC), Beijing 100020, China; (Y.G.); (J.L.); (H.K.)
- Department of Chemistry, New York University, New York, NY 10027, USA
| | - Jiao Li
- Institute of Medical Information (IMI), Chinese Academy of Medical Sciences and Peking Union Medical College (CAMS & PUMC), Beijing 100020, China; (Y.G.); (J.L.); (H.K.)
| | - Hongyu Kang
- Institute of Medical Information (IMI), Chinese Academy of Medical Sciences and Peking Union Medical College (CAMS & PUMC), Beijing 100020, China; (Y.G.); (J.L.); (H.K.)
- Department of Biomedical Engineering, School of Life Science, Beijing Institute of Technology, Beijing 100081, China
| | - Bowen Zhang
- Beijing StoneWise Technology Co., Ltd., Beijing 100080, China;
| | - Si Zheng
- Institute of Medical Information (IMI), Chinese Academy of Medical Sciences and Peking Union Medical College (CAMS & PUMC), Beijing 100020, China; (Y.G.); (J.L.); (H.K.)
- Institute for Artificial Intelligence, Department of Computer Science and Technology, BNRist, Tsinghua University, Beijing 100084, China
| |
Collapse
|
6
|
Yang J, Cai Y, Zhao K, Xie H, Chen X. Concepts and applications of chemical fingerprint for hit and lead screening. Drug Discov Today 2022; 27:103356. [PMID: 36113834 DOI: 10.1016/j.drudis.2022.103356] [Citation(s) in RCA: 17] [Impact Index Per Article: 8.5] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/07/2021] [Revised: 07/28/2022] [Accepted: 09/08/2022] [Indexed: 11/22/2022]
Abstract
Molecular fingerprints are used to represent chemical (structural, physicochemical, etc.) properties of large-scale chemical sets in a low computational cost way. They have a prominent role in transforming chemical data sets into consistent input formats (bit strings or numeric values) suitable for in silico approaches. In this review, we summarize and classify common and state-of-the-art fingerprints into eight different types (dictionary based, circular, topological, pharmacophore, protein-ligand interaction, shape based, reinforced, and multi). We also highlight applications of fingerprints in early drug research and development (R&D). Thus, this review provides a guide for the selection of appropriate fingerprints of compounds (or ligand-protein complexes) for use in drug R&D.
Collapse
Affiliation(s)
- Jingbo Yang
- Department of Pharmagenomics, College of Bioinformatics Science and Technology, Harbin Medical University, 150081 Harbin, Heilongjiang, China
| | - Yiyang Cai
- Department of Pharmagenomics, College of Bioinformatics Science and Technology, Harbin Medical University, 150081 Harbin, Heilongjiang, China
| | - Kairui Zhao
- Department of Pharmagenomics, College of Bioinformatics Science and Technology, Harbin Medical University, 150081 Harbin, Heilongjiang, China
| | - Hongbo Xie
- Department of Pharmagenomics, College of Bioinformatics Science and Technology, Harbin Medical University, 150081 Harbin, Heilongjiang, China.
| | - Xiujie Chen
- Department of Pharmagenomics, College of Bioinformatics Science and Technology, Harbin Medical University, 150081 Harbin, Heilongjiang, China.
| |
Collapse
|
7
|
Callis TB, Garrett TR, Montgomery AP, Danon JJ, Kassiou M. Recent Scaffold Hopping Applications in Central Nervous System Drug Discovery. J Med Chem 2022; 65:13483-13504. [PMID: 36206553 DOI: 10.1021/acs.jmedchem.2c00969] [Citation(s) in RCA: 14] [Impact Index Per Article: 7.0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/12/2022]
Abstract
The concept of bioisosterism and the implementation of bioisosteric replacement is fundamental to medicinal chemistry. The exploration of bioisosteres is often used to probe key structural features of candidate pharmacophores and enhance pharmacokinetic properties. As the understanding of bioisosterism has evolved, capabilities to undertake more ambitious bioisosteric replacements have emerged. Scaffold hopping is a broadly used term in the literature referring to a variety of different bioisosteric replacement strategies, ranging from simple heterocyclic replacements to topological structural overhauls. In this work, we have highlighted recent applications of scaffold hopping in the central nervous system drug discovery space. While we have highlighted the benefits of using scaffold hopping approaches in central nervous system drug discovery, these are also widely applicable to other medicinal chemistry fields. We also recommend a shift toward the use of more refined and meaningful terminology within the realm of scaffold hopping.
Collapse
Affiliation(s)
- Timothy B Callis
- School of Chemistry, The University of Sydney, Sydney, NSW 2006, Australia
| | - Taylor R Garrett
- School of Chemistry, The University of Sydney, Sydney, NSW 2006, Australia
| | | | - Jonathan J Danon
- School of Chemistry, The University of Sydney, Sydney, NSW 2006, Australia
| | - Michael Kassiou
- School of Chemistry, The University of Sydney, Sydney, NSW 2006, Australia
| |
Collapse
|
8
|
Gorgulla C, Jayaraj A, Fackeldey K, Arthanari H. Emerging frontiers in virtual drug discovery: From quantum mechanical methods to deep learning approaches. Curr Opin Chem Biol 2022; 69:102156. [PMID: 35576813 PMCID: PMC9990419 DOI: 10.1016/j.cbpa.2022.102156] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/05/2021] [Revised: 03/16/2022] [Accepted: 04/07/2022] [Indexed: 11/19/2022]
Abstract
Virtual screening-based approaches to discover initial hit and lead compounds have the potential to reduce both the cost and time of early drug discovery stages, as well as to find inhibitors for even challenging target sites such as protein-protein interfaces. Here in this review, we provide an overview of the progress that has been made in virtual screening methodology and technology on multiple fronts in recent years. The advent of ultra-large virtual screens, in which hundreds of millions to billions of compounds are screened, has proven to be a powerful approach to discover highly potent hit compounds. However, these developments are just the tip of the iceberg, with new technologies and methods emerging to propel the field forward. Examples include novel machine-learning approaches, which can reduce the computational costs of virtual screening dramatically, while progress in quantum-mechanical approaches can increase the accuracy of predictions of various small molecule properties.
Collapse
Affiliation(s)
- Christoph Gorgulla
- Department of Biological Chemistry and Molecular Pharmacology, Blavatnik Institute, Harvard Medical School (HMS), Boston, MA, USA; Department of Physics, Faculty of Arts and Sciences, Harvard University, Cambridge, MA, USA; Department of Cancer Biology, Dana-Farber Cancer Institute (DFCI), Boston, MA, USA
| | | | - Konstantin Fackeldey
- Institute of Mathematics, Technical University Berlin, Berlin, Germany; Zuse Institute Berlin, Berlin, Germany
| | - Haribabu Arthanari
- Department of Biological Chemistry and Molecular Pharmacology, Blavatnik Institute, Harvard Medical School (HMS), Boston, MA, USA; Department of Cancer Biology, Dana-Farber Cancer Institute (DFCI), Boston, MA, USA.
| |
Collapse
|
9
|
Zheng S, Lei Z, Ai H, Chen H, Deng D, Yang Y. Deep scaffold hopping with multimodal transformer neural networks. J Cheminform 2021; 13:87. [PMID: 34774103 PMCID: PMC8590293 DOI: 10.1186/s13321-021-00565-5] [Citation(s) in RCA: 14] [Impact Index Per Article: 4.7] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/07/2021] [Accepted: 10/31/2021] [Indexed: 11/10/2022] Open
Abstract
Scaffold hopping is a central task of modern medicinal chemistry for rational drug design, which aims to design molecules of novel scaffolds sharing similar target biological activities toward known hit molecules. Traditionally, scaffolding hopping depends on searching databases of available compounds that can't exploit vast chemical space. In this study, we have re-formulated this task as a supervised molecule-to-molecule translation to generate hopped molecules novel in 2D structure but similar in 3D structure, as inspired by the fact that candidate compounds bind with their targets through 3D conformations. To efficiently train the model, we curated over 50 thousand pairs of molecules with increased bioactivity, similar 3D structure, but different 2D structure from public bioactivity database, which spanned 40 kinases commonly investigated by medicinal chemists. Moreover, we have designed a multimodal molecular transformer architecture by integrating molecular 3D conformer through a spatial graph neural network and protein sequence information through Transformer. The trained DeepHop model was shown able to generate around 70% molecules having improved bioactivity together with high 3D similarity but low 2D scaffold similarity to the template molecules. This ratio was 1.9 times higher than other state-of-the-art deep learning methods and rule- and virtual screening-based methods. Furthermore, we demonstrated that the model could generalize to new target proteins through fine-tuning with a small set of active compounds. Case studies have also shown the advantages and usefulness of DeepHop in practical scaffold hopping scenarios.
Collapse
Affiliation(s)
- Shuangjia Zheng
- School of Data and Computer Science, Sun Yat-Sen University, China, 132 East Circle at University City, Guangzhou, 510006, China
| | - Zengrong Lei
- Fermion Technology Co., Ltd, 1088 Newport East Road, Guangzhou, 510335, China
| | - Haitao Ai
- Fermion Technology Co., Ltd, 1088 Newport East Road, Guangzhou, 510335, China
| | - Hongming Chen
- Centre of Chemistry and Chemical Biology, Guangzhou Regenerative Medicine and Health Guangdong Laboratory, Guangzhou, 510530, China
| | - Daiguo Deng
- Fermion Technology Co., Ltd, 1088 Newport East Road, Guangzhou, 510335, China.
| | - Yuedong Yang
- School of Data and Computer Science, Sun Yat-Sen University, China, 132 East Circle at University City, Guangzhou, 510006, China.
| |
Collapse
|
10
|
Sicho M, Liu X, Svozil D, van Westen GJP. GenUI: interactive and extensible open source software platform for de novo molecular generation and cheminformatics. J Cheminform 2021; 13:73. [PMID: 34563271 PMCID: PMC8465716 DOI: 10.1186/s13321-021-00550-y] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/27/2021] [Accepted: 09/05/2021] [Indexed: 03/05/2023] Open
Abstract
Many contemporary cheminformatics methods, including computer-aided de novo drug design, hold promise to significantly accelerate and reduce the cost of drug discovery. Thanks to this attractive outlook, the field has thrived and in the past few years has seen an especially significant growth, mainly due to the emergence of novel methods based on deep neural networks. This growth is also apparent in the development of novel de novo drug design methods with many new generative algorithms now available. However, widespread adoption of new generative techniques in the fields like medicinal chemistry or chemical biology is still lagging behind the most recent developments. Upon taking a closer look, this fact is not surprising since in order to successfully integrate the most recent de novo drug design methods in existing processes and pipelines, a close collaboration between diverse groups of experimental and theoretical scientists needs to be established. Therefore, to accelerate the adoption of both modern and traditional de novo molecular generators, we developed Generator User Interface (GenUI), a software platform that makes it possible to integrate molecular generators within a feature-rich graphical user interface that is easy to use by experts of diverse backgrounds. GenUI is implemented as a web service and its interfaces offer access to cheminformatics tools for data preprocessing, model building, molecule generation, and interactive chemical space visualization. Moreover, the platform is easy to extend with customizable frontend React.js components and backend Python extensions. GenUI is open source and a recently developed de novo molecular generator, DrugEx, was integrated as a proof of principle. In this work, we present the architecture and implementation details of GenUI and discuss how it can facilitate collaboration in the disparate communities interested in de novo molecular generation and computer-aided drug discovery.
Collapse
Affiliation(s)
- M. Sicho
- CZ-OPENSCREEN: National Infrastructure for Chemical Biology, Department of Informatics and Chemistry, Faculty of Chemical Technology, University of Chemistry and Technology Prague, Technická 5, 166 28 Prague, Czech Republic
| | - X. Liu
- Computational Drug Discovery, Drug Discovery and Safety, Leiden Academic Centre for Drug Research, Einsteinweg 55, Leiden, The Netherlands
| | - D. Svozil
- CZ-OPENSCREEN: National Infrastructure for Chemical Biology, Department of Informatics and Chemistry, Faculty of Chemical Technology, University of Chemistry and Technology Prague, Technická 5, 166 28 Prague, Czech Republic
- CZ-OPENSCREEN: National Infrastructure for Chemical Biology, Institute of Molecular Genetics of the ASCR, v. v. i., Vídeňská 1083, 142 20 Prague 4, Czech Republic
| | - G. J. P. van Westen
- Computational Drug Discovery, Drug Discovery and Safety, Leiden Academic Centre for Drug Research, Einsteinweg 55, Leiden, The Netherlands
| |
Collapse
|
11
|
Menke J, Massa J, Koch O. Natural product scores and fingerprints extracted from artificial neural networks. Comput Struct Biotechnol J 2021; 19:4593-4602. [PMID: 34584636 PMCID: PMC8445839 DOI: 10.1016/j.csbj.2021.07.032] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/31/2021] [Revised: 07/26/2021] [Accepted: 07/26/2021] [Indexed: 11/21/2022] Open
Abstract
Due to their desirable properties, natural products are an important ligand class for medicinal chemists. However, due to their structural distinctiveness, traditional cheminformatic approaches, like ligand-based virtual screening, often perform worse for natural products. Based on our recent work, we evaluated the ability of neural networks to generate fingerprints more appropriate for use with natural products. A manually curated dataset of natural products and synthetic decoys was used to train a multi-layer perceptron network and an autoencoder-like network. In-depth analysis showed that the extracted natural product-specific neural fingerprint outperforms traditional as well as natural product-specific fingerprints on three datasets. Further, we explored how the activations from the output layer of a network can work as a novel natural product likeness score. Overall, two natural product-specific datasets were generated, which are publicly available together with the code to create the fingerprints and the novel natural product likeness score.
Collapse
Affiliation(s)
- Janosch Menke
- Institute of Pharmaceutical and Medicinal Chemistry, Westfälische Wilhelms-Universität Münster, Corrensstraße 48, 48149 Münster, Germany
| | - Joana Massa
- Institute of Pharmaceutical and Medicinal Chemistry, Westfälische Wilhelms-Universität Münster, Corrensstraße 48, 48149 Münster, Germany
| | - Oliver Koch
- Institute of Pharmaceutical and Medicinal Chemistry, Westfälische Wilhelms-Universität Münster, Corrensstraße 48, 48149 Münster, Germany
- Center for Multiscale Theory and Computation, Westfälische Wilhelms-Universität Münster, Corrensstraße 48, 48149 Münster, Germany
| |
Collapse
|
12
|
Amendola G, Cosconati S. PyRMD: A New Fully Automated AI-Powered Ligand-Based Virtual Screening Tool. J Chem Inf Model 2021; 61:3835-3845. [PMID: 34270903 DOI: 10.1021/acs.jcim.1c00653] [Citation(s) in RCA: 18] [Impact Index Per Article: 6.0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/28/2022]
Abstract
Artificial intelligence (AI) algorithms are dramatically redefining the current drug discovery landscape by boosting the efficiency of its various steps. Still, their implementation often requires a certain level of expertise in AI paradigms and coding. This often prevents the use of these powerful methodologies by non-expert users involved in the design of new biologically active compounds. Here, the random matrix discriminant (RMD) algorithm, a high-performance AI method specifically tailored for the identification of new ligands, was implemented in a new fully automated tool, PyRMD. This ligand-based virtual screening tool can be trained using target bioactivity data directly downloaded from the ChEMBL repository without manual intervention. The software automatically splits the available training compounds into active and inactive sets and learns the distinctive chemical features responsible for the compounds' activity/inactivity. PyRMD was designed to easily screen millions of compounds in hours through an automated workflow and intuitive input files, allowing fine tuning of each parameter of the calculation. Additionally, PyRMD features a wealth of benchmark metrics, to accurately probe the model performance, which were used here to gauge its predictive potential and limitations. PyRMD is freely available on GitHub (https://github.com/cosconatilab/PyRMD) as an open-source tool.
Collapse
Affiliation(s)
- Giorgio Amendola
- DiSTABiF, University of Campania Luigi Vanvitelli, Via Vivaldi 43, 81100 Caserta, Italy
| | - Sandro Cosconati
- DiSTABiF, University of Campania Luigi Vanvitelli, Via Vivaldi 43, 81100 Caserta, Italy
| |
Collapse
|
13
|
Nakano H, Miyao T, Swarit J, Funatsu K. Sparse Topological Pharmacophore Graphs for Interpretable Scaffold Hopping. J Chem Inf Model 2021; 61:3348-3360. [PMID: 34264667 DOI: 10.1021/acs.jcim.1c00409] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/30/2022]
Abstract
The aim of scaffold hopping (SH) is to find compounds consisting of different scaffolds from those in already known active compounds, giving an opportunity for unexplored regions of chemical space. We previously demonstrated the usefulness of pharmacophore graphs (PhGs) for this purpose through proof-of-concept virtual screening experiments. PhGs consist of nodes and edges corresponding to pharmacophoric features (PFs) and their topological distances. Although PhGs were effective in SH, they are hard to interpret as they are complete graphs. Herein, we introduce an intuitive representation of a molecule, termed as sparse pharmacophore graphs (SPhG) by keeping the topological distances among PFs as much as possible while reducing the number of edges in the graphs. Several benchmark calculations quantitatively confirmed the sparseness of the graphs and the preservation of topological distances among pharmacophoric points. As proof-of-concept applications, virtual screening (VS) trials for SH were conducted using active and inactive compounds from ChEMBL and PubChem databases for three biological targets: thrombin, tyrosine kinase ABL1, and κ-opioid receptor. The performances of VS were comparable with using fully connected PhGs. Furthermore, highly ranked SPhGs were interpretable for the three biological targets, in particular for thrombin, for which selected SPhGs were in agreement with the structure-based interpretation.
Collapse
Affiliation(s)
- Hiroshi Nakano
- Graduate School of Science and Technology, Nara Institute of Science and Technology, 8916-5 Takayama-cho, Ikoma, Nara 630-0192, Japan
| | - Tomoyuki Miyao
- Graduate School of Science and Technology, Nara Institute of Science and Technology, 8916-5 Takayama-cho, Ikoma, Nara 630-0192, Japan.,Data Science Center, Nara Institute of Science and Technology, 8916-5 Takayama-cho, Ikoma, Nara 630-0192, Japan
| | - Jasial Swarit
- Graduate School of Science and Technology, Nara Institute of Science and Technology, 8916-5 Takayama-cho, Ikoma, Nara 630-0192, Japan.,Data Science Center, Nara Institute of Science and Technology, 8916-5 Takayama-cho, Ikoma, Nara 630-0192, Japan
| | - Kimito Funatsu
- Graduate School of Science and Technology, Nara Institute of Science and Technology, 8916-5 Takayama-cho, Ikoma, Nara 630-0192, Japan.,Data Science Center, Nara Institute of Science and Technology, 8916-5 Takayama-cho, Ikoma, Nara 630-0192, Japan.,Department of Chemical System Engineering, School of Engineering, The University of Tokyo, 7-3-1 Hongo, Bunkyo-ku, Tokyo 113-8656, Japan
| |
Collapse
|