1
Riaz IB, Khan MA, Haddad TC. Potential application of artificial intelligence in cancer therapy. Curr Opin Oncol 2024; 36:437-448. [PMID: 39007164 DOI: 10.1097/cco.0000000000001068] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 07/16/2024]
Abstract
PURPOSE OF REVIEW This review underscores the critical role of artificial intelligence in cancer care, and the challenges associated with its widespread adoption, in enhancing disease management, streamlining clinical processes, optimizing retrieval of health information, and generating and synthesizing evidence. RECENT FINDINGS Advances in artificial intelligence models and the development of digital biomarkers and diagnostics are applicable across the cancer continuum, from early detection to survivorship care. Additionally, generative artificial intelligence shows promise for streamlining clinical documentation and patient communications, generating structured data for clinical trial matching, automating cancer registries, and facilitating advanced clinical decision support. Widespread adoption of artificial intelligence has been slow because of concerns about data diversity and data shift, model reliability and algorithmic bias, legal oversight, and high information technology and infrastructure costs. SUMMARY Artificial intelligence models have significant potential to transform cancer care. Efforts are underway to deploy artificial intelligence models in cancer practice, evaluate their clinical impact, and enhance their fairness and explainability. Standardized guidelines for the ethical integration of artificial intelligence models into cancer care pathways and clinical operations are needed. Clear governance and oversight will be necessary for clinicians, scientists, and patients to trust artificial intelligence-assisted cancer care.
Affiliation(s)
- Irbaz Bin Riaz
- Department of AI and Informatics, Mayo Clinic, Minnesota
- Division of Hematology and Oncology, Mayo Clinic, Phoenix, Arizona
- Tufia C Haddad
- Department of Oncology, Mayo Clinic, Rochester, Minnesota, USA
2
Fallani A, Medrano Sandonas L, Tkatchenko A. Inverse mapping of quantum properties to structures for chemical space of small organic molecules. Nat Commun 2024; 15:6061. [PMID: 39025883 PMCID: PMC11258234 DOI: 10.1038/s41467-024-50401-1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/14/2023] [Accepted: 07/01/2024] [Indexed: 07/20/2024] [Open Access]
Abstract
Computer-driven molecular design combines the principles of chemistry, physics, and artificial intelligence to identify chemical compounds with tailored properties. While quantum-mechanical (QM) methods, coupled with machine learning, already offer a direct mapping from 3D molecular structures to their properties, effective methodologies for the inverse mapping in chemical space remain elusive. We address this challenge by demonstrating the possibility of parametrizing a chemical space with a finite set of QM properties. Our proof-of-concept implementation achieves an approximate property-to-structure mapping, the QIM model (which stands for "Quantum Inverse Mapping"), by forcing a variational auto-encoder with a property encoder to obtain a common internal representation for both structures and properties. After validating this mapping for small drug-like molecules, we illustrate its capabilities with an explainability study as well as by the generation of de novo molecular structures with targeted properties and transition pathways between conformational isomers. Our findings thus provide a proof-of-principle demonstration aiming to enable the inverse property-to-structure design in diverse chemical spaces.
Affiliation(s)
- Alessio Fallani
- Department of Physics and Materials Science, University of Luxembourg, L-1511, Luxembourg City, Luxembourg.
- Leonardo Medrano Sandonas
- Department of Physics and Materials Science, University of Luxembourg, L-1511, Luxembourg City, Luxembourg.
- Institute for Materials Science and Max Bergmann Center of Biomaterials, TU Dresden, 01062, Dresden, Germany.
- Alexandre Tkatchenko
- Department of Physics and Materials Science, University of Luxembourg, L-1511, Luxembourg City, Luxembourg.
3
Odje F, Meijer D, von Coburg E, van der Hooft JJJ, Dunst S, Medema MH, Volkamer A. Unleashing the potential of cell painting assays for compound activities and hazards prediction. Front Toxicol 2024; 6:1401036. [PMID: 39086553 PMCID: PMC11288911 DOI: 10.3389/ftox.2024.1401036] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/14/2024] [Accepted: 06/14/2024] [Indexed: 08/02/2024] [Open Access]
Abstract
The cell painting (CP) assay has emerged as a powerful imaging-based high-throughput phenotypic profiling (HTPP) tool that provides comprehensive input data for in silico prediction of compound activities and potential hazards in drug discovery and toxicology. CP enables rapid, multiplexed investigation of various molecular mechanisms for thousands of compounds at the single-cell level. The resulting large volumes of image data offer great opportunities but also pose challenges for image and data analysis routines as well as for property prediction models. This review addresses the integration of CP-based phenotypic data, together with or in place of structural information on compounds, into machine learning (ML) and deep learning (DL) models that predict compound activities for various human-relevant disease endpoints and identify the underlying modes of action (MoA) while avoiding unnecessary animal testing. The successful combination of CP with powerful ML/DL models promises further advances in understanding how cells respond to compounds, guiding therapeutic development and risk assessment. This review therefore highlights the importance of unlocking the potential of CP assays combined with molecular fingerprints for compound evaluation and discusses the current challenges associated with this approach.
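Cell Painting profiles are often normalized against negative-control wells (for example DMSO) before modeling. One simple way to use them "together with" structural information is early fusion, which concatenates the normalized morphological features with a molecular fingerprint. The sketch below illustrates that idea under those assumptions. The function names are hypothetical, the robust z-score (median/MAD) is one common normalization choice rather than one prescribed by the review, and the review may discuss other normalization and fusion strategies.

```python
import statistics
from typing import List, Sequence


def robust_z(profile: Sequence[float], controls: List[Sequence[float]]) -> List[float]:
    """Per-feature robust z-score against control wells: (x - median) / (1.4826 * MAD)."""
    scores = []
    for j, value in enumerate(profile):
        column = [c[j] for c in controls]
        med = statistics.median(column)
        mad = statistics.median([abs(v - med) for v in column])
        scale = 1.4826 * mad if mad > 0 else 1.0  # constant control feature: leave unscaled
        scores.append((value - med) / scale)
    return scores


def fuse_features(profile: Sequence[float], controls: List[Sequence[float]],
                  fingerprint: Sequence[int]) -> List[float]:
    """Early fusion: normalized morphology features followed by fingerprint bits."""
    return robust_z(profile, controls) + [float(bit) for bit in fingerprint]


# Hypothetical example: two morphology features, three control wells, a 3-bit fingerprint.
controls = [[1.0, 10.0], [2.0, 10.0], [3.0, 10.0]]
print(fuse_features([2.5, 12.0], controls, (1, 0, 1)))
```

The factor 1.4826 makes the MAD a consistent estimate of the standard deviation for normally distributed data, so the robust z-scores are on roughly the same scale as ordinary z-scores.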
Affiliation(s)
- Floriane Odje
- Data Driven Drug Design, Center for Bioinformatics, Saarland University, Saarbrücken, Germany
- David Meijer
- Bioinformatics Group, Wageningen University, Wageningen, Netherlands
- Elena von Coburg
- Department Experimental Toxicology and ZEBET, German Federal Institute for Risk Assessment (BfR), German Centre for the Protection of Laboratory Animals (Bf3R), Berlin, Germany
- Sebastian Dunst
- Department Experimental Toxicology and ZEBET, German Federal Institute for Risk Assessment (BfR), German Centre for the Protection of Laboratory Animals (Bf3R), Berlin, Germany
- Marnix H. Medema
- Bioinformatics Group, Wageningen University, Wageningen, Netherlands
- Andrea Volkamer
- Data Driven Drug Design, Center for Bioinformatics, Saarland University, Saarbrücken, Germany
4
Fredin Haslum J, Lardeau CH, Karlsson J, Turkki R, Leuchowius KJ, Smith K, Müllers E. Cell Painting-based bioactivity prediction boosts high-throughput screening hit-rates and compound diversity. Nat Commun 2024; 15:3470. [PMID: 38658534 PMCID: PMC11043326 DOI: 10.1038/s41467-024-47171-1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/03/2023] [Accepted: 03/22/2024] [Indexed: 04/26/2024] [Open Access]
Abstract
Identifying active compounds for a target is a time- and resource-intensive task in early drug discovery. Accurate bioactivity prediction using morphological profiles could streamline the process, enabling smaller, more focused compound screens. We investigate the potential of deep learning on unrefined single-concentration activity readouts and Cell Painting data to predict compound activity across 140 diverse assays. We observe an average ROC-AUC of 0.744 ± 0.108, with 62% of assays achieving ≥0.7, 30% ≥0.8, and 7% ≥0.9. In many cases, high prediction performance can be achieved using only brightfield images instead of multichannel fluorescence images. A comprehensive analysis shows that Cell Painting-based bioactivity prediction is robust across assay types, technologies, and target classes, with cell-based assays and kinase targets being particularly well suited for prediction. Experimental validation confirms the enrichment of active compounds. Our findings indicate that models trained on Cell Painting data, combined with a small set of single-concentration data points, can reliably predict the activity of a compound library across diverse targets and assays while maintaining high hit rates and scaffold diversity. This approach has the potential to reduce the size of screening campaigns, saving time and resources, and to enable primary screening with more complex assays.
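The per-assay ROC-AUC values and the threshold breakdown reported above (the share of assays reaching 0.7, 0.8 and 0.9) can be computed from model scores and binary activity labels. Below is a minimal sketch using the rank-based (Mann-Whitney) form of ROC-AUC. The function names and toy assays are hypothetical, and this is not the authors' evaluation code.

```python
from typing import Dict, List, Sequence, Tuple


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Probability that a random active outscores a random inactive (ties count 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one active and one inactive compound")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def summarize(aucs: List[float],
              thresholds: Tuple[float, ...] = (0.7, 0.8, 0.9)) -> Dict[float, float]:
    """Fraction of assays whose ROC-AUC meets or exceeds each threshold."""
    return {t: sum(a >= t for a in aucs) / len(aucs) for t in thresholds}


# Hypothetical toy assays: (activity labels, model scores) per assay.
toy_assays = {
    "assay_A": ([1, 1, 0, 0], [0.9, 0.8, 0.3, 0.1]),
    "assay_B": ([1, 0, 1, 0], [0.6, 0.7, 0.4, 0.2]),
}
aucs = [roc_auc(labels, scores) for labels, scores in toy_assays.values()]
print(aucs)             # [1.0, 0.5]
print(summarize(aucs))  # {0.7: 0.5, 0.8: 0.5, 0.9: 0.5}
```

The pairwise loop costs O(n_active × n_inactive). For screens of realistic size, a sort-based implementation such as scikit-learn's `roc_auc_score` is the practical choice.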
Affiliation(s)
- Johan Fredin Haslum
- KTH Royal Institute of Technology, Stockholm, Sweden
- Science for Life Laboratory, Stockholm, Sweden
- Research and Early Development, Cardiovascular, Renal and Metabolism (CVRM), BioPharmaceuticals R&D, AstraZeneca, Gothenburg, Sweden
- Johan Karlsson
- Discovery Sciences, R&D, AstraZeneca, Gothenburg, Sweden
- Riku Turkki
- Discovery Sciences, R&D, AstraZeneca, Gothenburg, Sweden
- Kevin Smith
- KTH Royal Institute of Technology, Stockholm, Sweden
- Science for Life Laboratory, Stockholm, Sweden
- Erik Müllers
- Research and Early Development, Cardiovascular, Renal and Metabolism (CVRM), BioPharmaceuticals R&D, AstraZeneca, Gothenburg, Sweden
5
Thomas JR, Shelton C, Murphy J, Brittain S, Bray MA, Aspesi P, Concannon J, King FJ, Ihry RJ, Ho DJ, Henault M, Hadjikyriacou A, Neri M, Sigoillot FD, Pham HT, Shum M, Barys L, Jones MD, Martin EJ, Blechschmidt A, Rieffel S, Troxler TJ, Mapa FA, Jenkins JL, Jain RK, Kutchukian PS, Schirle M, Renner S. Enhancing the Small-Scale Screenable Biological Space beyond Known Chemogenomics Libraries with Gray Chemical Matter─Compounds with Novel Mechanisms from High-Throughput Screening Profiles. ACS Chem Biol 2024; 19:938-952. [PMID: 38565185 PMCID: PMC11040606 DOI: 10.1021/acschembio.3c00737] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/05/2023] [Revised: 02/28/2024] [Accepted: 03/01/2024] [Indexed: 04/04/2024]
Abstract
Phenotypic assays have become an established approach to drug discovery. Greater disease relevance is often achieved through cellular models with increased complexity and more detailed readouts, such as gene expression or advanced imaging. However, the intricate nature and cost of these assays limit their screening capacity, often restricting screens to small, well-characterized compound sets such as chemogenomics libraries. Here, we outline a cheminformatics approach to identify a small set of compounds with likely novel mechanisms of action (MoAs), expanding the MoA search space for throughput-limited phenotypic assays. Our approach is based on mining existing large-scale phenotypic high-throughput screening (HTS) data. It enables the identification of chemotypes that are selective across multiple cell-based assays and are characterized by persistent, broad structure-activity relationships (SAR). We validate the effectiveness of our approach in broad cellular profiling assays (Cell Painting, DRUG-seq, and Promotor Signature Profiling) and chemical proteomics experiments. These experiments revealed that the compounds behave similarly to known chemogenomics libraries but with a notable bias toward novel protein targets. To foster collaboration and advance research in this area, we have curated a public set of such compounds based on the PubChem BioAssay dataset and made it available to the scientific community.
Affiliation(s)
- Jason R. Thomas
- Novartis Biomedical Research, Cambridge, Massachusetts 02139, United States
- Claude Shelton
- Novartis Biomedical Research, Cambridge, Massachusetts 02139, United States
- Jason Murphy
- Novartis Biomedical Research, Cambridge, Massachusetts 02139, United States
- Scott Brittain
- Novartis Biomedical Research, Cambridge, Massachusetts 02139, United States
- Mark-Anthony Bray
- Novartis Biomedical Research, Cambridge, Massachusetts 02139, United States
- Peter Aspesi
- Novartis Biomedical Research, Cambridge, Massachusetts 02139, United States
- John Concannon
- Novartis Biomedical Research, Cambridge, Massachusetts 02139, United States
- Frederick J. King
- Novartis Biomedical Research, San Diego, California 92121, United States
- Robert J. Ihry
- Novartis Biomedical Research, San Diego, California 92121, United States
- Daniel J. Ho
- Novartis Biomedical Research, San Diego, California 92121, United States
- Martin Henault
- Novartis Biomedical Research, Cambridge, Massachusetts 02139, United States
- Marilisa Neri
- Novartis Biomedical Research, Basel 4056, Switzerland
- Helen T. Pham
- Novartis Biomedical Research, Cambridge, Massachusetts 02139, United States
- Matthew Shum
- Novartis Biomedical Research, Cambridge, Massachusetts 02139, United States
- Louise Barys
- Novartis Biomedical Research, Basel 4056, Switzerland
- Michael D. Jones
- Novartis Biomedical Research, Cambridge, Massachusetts 02139, United States
- Eric J. Martin
- Novartis Biomedical Research, Emeryville, California 94608, United States
- Felipa A. Mapa
- Novartis Biomedical Research, Cambridge, Massachusetts 02139, United States
- Jeremy L. Jenkins
- Novartis Biomedical Research, Cambridge, Massachusetts 02139, United States
- Rishi K. Jain
- Novartis Biomedical Research, Cambridge, Massachusetts 02139, United States
- Markus Schirle
- Novartis Biomedical Research, Cambridge, Massachusetts 02139, United States
6
Hassan J, Saeed SM, Deka L, Uddin MJ, Das DB. Applications of Machine Learning (ML) and Mathematical Modeling (MM) in Healthcare with Special Focus on Cancer Prognosis and Anticancer Therapy: Current Status and Challenges. Pharmaceutics 2024; 16:260. [PMID: 38399314 PMCID: PMC10892549 DOI: 10.3390/pharmaceutics16020260] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/08/2023] [Revised: 01/29/2024] [Accepted: 02/07/2024] [Indexed: 02/25/2024] [Open Access]
Abstract
The value of data-driven high-throughput analytical techniques, which gave rise to computational oncology, is undisputed, and machine learning (ML) and mathematical modeling (MM) techniques are now widely used. These two approaches have fueled advances in cancer research and eventually led to the uptake of telemedicine in cancer care. Diagnostic, prognostic, and treatment work across different types of cancer research requires vast databases of varied, high-dimensional information, and this information can only be managed by automated systems built with ML and MM. In addition, MM is being used to probe the relationship between the pharmacokinetics and pharmacodynamics (PK/PD) of anticancer agents to improve cancer treatment. It is also being incorporated at every step of cancer-related research and development and into routine patient care to refine existing treatment models. This review consolidates the advances in and benefits of ML and MM techniques, with a special focus on cancer prognosis and anticancer therapy, and identifies challenges (data quantity, ethical considerations, and data privacy) that current studies have yet to fully address.
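As one concrete instance of the PK/PD modeling mentioned above, a textbook one-compartment intravenous-bolus model can be coupled to an Emax effect model. This is a generic sketch with hypothetical parameter values, not a model taken from the review. Real anticancer PK/PD models are usually multi-compartment and fitted to patient data.

```python
import math
from dataclasses import dataclass


@dataclass
class OneCompartmentIV:
    """One-compartment model after an intravenous bolus with first-order elimination."""

    dose_mg: float
    volume_l: float        # apparent volume of distribution
    clearance_l_h: float   # systemic clearance

    @property
    def k_el(self) -> float:
        """Elimination rate constant k = CL / V, in 1/h."""
        return self.clearance_l_h / self.volume_l

    def concentration(self, t_h: float) -> float:
        """Plasma concentration C(t) = (Dose / V) * exp(-k * t), in mg/L."""
        return (self.dose_mg / self.volume_l) * math.exp(-self.k_el * t_h)

    def half_life(self) -> float:
        """Elimination half-life t1/2 = ln 2 / k, in h."""
        return math.log(2.0) / self.k_el


def emax_effect(conc: float, e_max: float, ec50: float) -> float:
    """Emax pharmacodynamic model: E = Emax * C / (EC50 + C)."""
    return e_max * conc / (ec50 + conc)


# Hypothetical parameters, for illustration only.
pk = OneCompartmentIV(dose_mg=100.0, volume_l=10.0, clearance_l_h=2.0)
for t in (0.0, pk.half_life(), 12.0):
    c = pk.concentration(t)
    print(f"t={t:5.2f} h  C={c:5.2f} mg/L  effect={emax_effect(c, e_max=100.0, ec50=5.0):5.1f}%")
```

Linking the two functions through the concentration is what makes this a PK/PD model. The PK part says how much drug is present over time, and the PD part maps that exposure to an effect.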
Affiliation(s)
- Jasmin Hassan
- Drug Delivery & Therapeutics Lab, Dhaka 1212, Bangladesh; (J.H.); (S.M.S.)
- Lipika Deka
- Faculty of Computing, Engineering and Media, De Montfort University, Leicester LE1 9BH, UK
- Md Jasim Uddin
- Department of Pharmaceutical Technology, Faculty of Pharmacy, Universiti Malaya, Kuala Lumpur 50603, Malaysia
- Diganta B. Das
- Department of Chemical Engineering, Loughborough University, Loughborough LE11 3TU, UK
7
Feng D, Liu B, Chen Z, Xu J, Geng M, Duan W, Ai J, Zhang H. Discovery of hematopoietic progenitor kinase 1 inhibitors using machine learning-based screening and free energy perturbation. J Biomol Struct Dyn 2024:1-13. [PMID: 38198294 DOI: 10.1080/07391102.2024.2301754] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/12/2023] [Accepted: 12/30/2023] [Indexed: 01/12/2024]
Abstract
Hematopoietic progenitor kinase 1 (HPK1) is a key negative regulator of T-cell receptor (TCR) signaling and a promising target for cancer immunotherapy. The development of novel HPK1 inhibitors is challenging yet promising. In this study, we combined machine learning (ML)-based virtual screening with free energy perturbation (FEP) calculations to identify novel HPK1 inhibitors. ML-based screening yielded 10 potent HPK1 inhibitors (IC50 < 1 μM). FEP-guided modification of an in-house false-positive hit, DW21302, revealed that a single key atom change could trigger an activity cliff. The resulting compound, DW21302-A, was a potent HPK1 inhibitor (IC50 = 2.1 nM) that potently inhibited cellular HPK1 signaling and enhanced T-cell function. Molecular dynamics (MD) simulations and ADME predictions supported DW21302-A as a candidate compound. This study provides new strategies and chemical scaffolds for HPK1 inhibitor development. Communicated by Ramaswamy H. Sarma.
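FEP calculations rest on free-energy perturbation theory. Its simplest estimator is Zwanzig's exponential average, ΔG = −RT ln⟨exp(−ΔU/RT)⟩₀, where ΔU is the energy difference between end states evaluated on configurations sampled from state 0. Production FEP workflows typically use many intermediate λ windows and more robust estimators such as BAR or MBAR. The sketch below therefore illustrates only the core formula and the conversion from ΔΔG to a fold change in affinity. The inputs are synthetic, and none of this is the study's protocol.

```python
import math
from typing import Sequence

R_KCAL = 0.0019872041  # gas constant, kcal/(mol*K)


def zwanzig_delta_g(delta_u: Sequence[float], temperature_k: float = 298.15) -> float:
    """Free-energy difference (kcal/mol) from the Zwanzig exponential average.

    delta_u holds U_1(x) - U_0(x) for configurations x sampled in state 0.
    A log-sum-exp shift keeps the exponentials from overflowing.
    """
    if not delta_u:
        raise ValueError("need at least one energy difference")
    beta = 1.0 / (R_KCAL * temperature_k)
    exponents = [-beta * du for du in delta_u]
    shift = max(exponents)
    log_mean = shift + math.log(sum(math.exp(e - shift) for e in exponents) / len(exponents))
    return -log_mean / beta


def lambda_path_delta_g(windows: Sequence[Sequence[float]], temperature_k: float = 298.15) -> float:
    """Sum of per-window estimates; window i holds U_(i+1) - U_i sampled at lambda_i."""
    return sum(zwanzig_delta_g(w, temperature_k) for w in windows)


def affinity_fold_change(ddg_kcal: float, temperature_k: float = 298.15) -> float:
    """Fold change in Kd implied by ddG = RT * ln(Kd_B / Kd_A)."""
    return math.exp(ddg_kcal / (R_KCAL * temperature_k))


# Synthetic energy differences (kcal/mol), for illustration only.
print(zwanzig_delta_g([0.8, 1.1, 0.9, 1.4]))
print(affinity_fold_change(1.364))  # about a tenfold change in Kd at 298 K
```

At 298 K, RT ln 10 ≈ 1.36 kcal/mol, so a relative binding free energy of a few kcal/mol already corresponds to a change in affinity of orders of magnitude. That scale helps explain how a single-atom modification can produce an activity cliff.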
Affiliation(s)
- Dazhi Feng
- Department of Medicinal Chemistry, Shanghai Institute of Materia Medica (SIMM), Chinese Academy of Sciences, Shanghai, China
- State Key Laboratory of Natural Medicines and Department of Medicinal Chemistry, China Pharmaceutical University, Nanjing, China
- Bo Liu
- Division of Antitumor Pharmacology, State Key Laboratory of Drug Research, Shanghai Institute of Materia Medica (SIMM), Chinese Academy of Sciences, Shanghai, China
- Zhiwei Chen
- Department of Medicinal Chemistry, Shanghai Institute of Materia Medica (SIMM), Chinese Academy of Sciences, Shanghai, China
- University of Chinese Academy of Sciences, Beijing, China
- Jinyi Xu
- State Key Laboratory of Natural Medicines and Department of Medicinal Chemistry, China Pharmaceutical University, Nanjing, China
- Meiyu Geng
- Division of Antitumor Pharmacology, State Key Laboratory of Drug Research, Shanghai Institute of Materia Medica (SIMM), Chinese Academy of Sciences, Shanghai, China
- University of Chinese Academy of Sciences, Beijing, China
- Shandong Laboratory of Yantai Drug Discovery, Bohai Rim Advanced Research Institute for Drug Discovery, Yantai, Shandong, China
- Wenhu Duan
- Department of Medicinal Chemistry, Shanghai Institute of Materia Medica (SIMM), Chinese Academy of Sciences, Shanghai, China
- University of Chinese Academy of Sciences, Beijing, China
- Shandong Laboratory of Yantai Drug Discovery, Bohai Rim Advanced Research Institute for Drug Discovery, Yantai, Shandong, China
- Jing Ai
- Division of Antitumor Pharmacology, State Key Laboratory of Drug Research, Shanghai Institute of Materia Medica (SIMM), Chinese Academy of Sciences, Shanghai, China
- University of Chinese Academy of Sciences, Beijing, China
- Hefeng Zhang
- Department of Medicinal Chemistry, Shanghai Institute of Materia Medica (SIMM), Chinese Academy of Sciences, Shanghai, China
8
Yu L, He X, Fang X, Liu L, Liu J. Deep Learning with Geometry-Enhanced Molecular Representation for Augmentation of Large-Scale Docking-Based Virtual Screening. J Chem Inf Model 2023; 63:6501-6514. [PMID: 37882338 DOI: 10.1021/acs.jcim.3c01371] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/27/2023]
Abstract
Structure-based virtual screening has been a crucial tool in drug discovery for decades. However, as chemical space expands, existing structure-based virtual screening techniques based on molecular docking and scoring struggle to handle ultralarge libraries with billions of entries because of their high computational cost. To address this challenge, researchers have turned to machine learning to make structure-based virtual screening efficient enough to explore this vast chemical space. In such approaches, compounds are usually treated as sequential strings or two-dimensional topological graphs, which limits their ability to incorporate three-dimensional structural information for downstream tasks. We herein propose a novel deep learning protocol, GEM-Screen, which uses the geometry-enhanced molecular representation of compounds docked to a specific target. Through an active learning strategy, it is trained on the docking scores of a small fraction of a library and approximates the docking outcomes of the remaining entries. Applied to virtual screening campaigns against the AmpC and D4 targets, GEM-Screen enriches more than 90% of the hit scaffolds for AmpC, and more than 80% of the hit scaffolds for D4, within the top 4% of model predictions. GEM-Screen can be used together with traditional docking programs so that only the top-ranked compounds are docked. This avoids exhaustive docking of the whole library and allows top-scoring compounds to be discovered from billion-entry libraries rapidly and accurately.
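The general pattern described above can be sketched as a generic active-learning loop: dock a small fraction of the library, fit a surrogate on the scores, and send only the surrogate's top-ranked compounds to the expensive docking step. In this sketch, `dock` is a hypothetical stand-in for a docking program. Compounds are plain feature vectors rather than geometry-enhanced representations, and the surrogate is a simple k-nearest-neighbour regressor rather than GEM-Screen's deep network.

```python
import math
import random
from typing import Callable, Dict, List, Sequence

Vector = Sequence[float]


def knn_predict(x: Vector, docked: Dict[int, float], library: List[Vector], k: int = 3) -> float:
    """Surrogate score: mean docking score of the k already-docked compounds nearest to x."""
    nearest = sorted(docked, key=lambda i: math.dist(x, library[i]))[:k]
    return sum(docked[i] for i in nearest) / len(nearest)


def active_learning_screen(
    library: List[Vector],
    dock: Callable[[Vector], float],
    initial: int = 20,
    batch: int = 10,
    rounds: int = 5,
    seed: int = 0,
) -> Dict[int, float]:
    """Dock a random seed set, then repeatedly dock the compounds the surrogate ranks best.

    `dock` is a hypothetical stand-in for a docking program; lower scores are better.
    Returns {library index: docking score} for every compound actually docked.
    """
    rng = random.Random(seed)
    docked = {i: dock(library[i]) for i in rng.sample(range(len(library)), initial)}
    for _ in range(rounds):
        remaining = [i for i in range(len(library)) if i not in docked]
        if not remaining:
            break
        remaining.sort(key=lambda i: knn_predict(library[i], docked, library))
        for i in remaining[:batch]:
            docked[i] = dock(library[i])  # only top-ranked compounds reach the costly step
    return docked


# Toy library and a toy "docking score" (distance to an arbitrary optimum).
rng = random.Random(42)
library = [(rng.uniform(-5, 5), rng.uniform(-5, 5)) for _ in range(500)]
toy_dock = lambda x: math.dist(x, (2.0, -1.0))
docked = active_learning_screen(library, toy_dock)
print(len(docked), min(docked.values()))  # 70 compounds docked instead of 500
```

The cost saving comes from the docking budget: after five rounds only 70 of the 500 toy compounds have been docked. The surrogate, which is cheap to evaluate, ranks everything else.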
Affiliation(s)
- Lan Yu
- School of Science, China Pharmaceutical University, Nanjing 210009, China
- Xiao He
- Shanghai Engineering Research Center of Molecular Therapeutics and New Drug Development, Shanghai Frontiers Science Center of Molecule Intelligent Syntheses, School of Chemistry and Molecular Engineering, East China Normal University, Shanghai 200062, China
- New York University-East China Normal University Center for Computational Chemistry, New York University Shanghai, Shanghai 200062, China
- Xiaomin Fang
- Baidu International Technology (Shenzhen) Co., Ltd., Shenzhen 518063, China
- Lihang Liu
- Baidu International Technology (Shenzhen) Co., Ltd., Shenzhen 518063, China
- Jinfeng Liu
- School of Science, China Pharmaceutical University, Nanjing 210009, China
- School of Basic Medicine and Clinical Pharmacy, China Pharmaceutical University, Nanjing 210009, China
9
Xiaolin X, Xiaozhi L, Guoping H, Hongwei L, Jinkuo G, Xiyun B, Zhen T, Xiaofang M, Yanxia L, Na X, Chunyan Z, Rui G, Kuan W, Cheng Z, Cuancuan W, Mingyong L, Xinping D. Overfit deep neural network for predicting drug-target interactions. iScience 2023; 26:107646. [PMID: 37680476 PMCID: PMC10480310 DOI: 10.1016/j.isci.2023.107646] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/12/2022] [Revised: 06/28/2023] [Accepted: 08/11/2023] [Indexed: 09/09/2023] [Open Access]
Abstract
Drug-target interaction (DTI) prediction is an important step in drug discovery. Because traditional biological experiments and high-throughput screening are costly and time-consuming, many deep learning models have been developed for this task. Conventionally, overfitting is avoided when training deep learning models; here, we instead propose a simple framework, called OverfitDTI, for DTI prediction. In OverfitDTI, a deep neural network (DNN) model is deliberately overfit to thoroughly learn the features of the chemical space of drugs and the biological space of targets. The weights of the trained DNN model form an implicit representation of the nonlinear relationship between drugs and targets. On three public datasets, the overfit DNN models fit this nonlinear relationship with high accuracy. We identified fifteen compounds that interacted with TEK, a receptor tyrosine kinase that contributes to vascular homeostasis, and two of the predicted compounds, AT9283 and dorsomorphin, were experimentally shown to inhibit TEK in human umbilical vein endothelial cells (HUVECs).
Affiliation(s)
- Xiao Xiaolin
- Department of Cardiology, Tianjin Fifth Central Hospital, Tianjin, China
- Tianjin Key Laboratory of Epigenetics for Organ Development of Premature Infants, Tianjin Fifth Central Hospital, Tianjin, China
- Central Laboratory, Tianjin Fifth Central Hospital, Tianjin, China
- Liu Xiaozhi
- Tianjin Key Laboratory of Epigenetics for Organ Development of Premature Infants, Tianjin Fifth Central Hospital, Tianjin, China
- Central Laboratory, Tianjin Fifth Central Hospital, Tianjin, China
- He Guoping
- Geriatrics Department, Traditional Chinese Medicine Hospital of Binhai New Area, Tianjin, China
- Liu Hongwei
- School of Clinical Medicine, North China University of Science and Technology, Tangshan, Hebei, China
- Department of Anesthesiology, Tangshan Maternal and Child Health Hospital, Tangshan, Hebei, China
- Guo Jinkuo
- Tianjin Key Laboratory of Epigenetics for Organ Development of Premature Infants, Tianjin Fifth Central Hospital, Tianjin, China
- College of Food Science and Engineering, Tianjin University of Science & Technology, Tianjin, China
- Bian Xiyun
- Tianjin Key Laboratory of Epigenetics for Organ Development of Premature Infants, Tianjin Fifth Central Hospital, Tianjin, China
- Central Laboratory, Tianjin Fifth Central Hospital, Tianjin, China
- Tian Zhen
- Deepwater Technology Research Institute, China National Offshore Oil Corporation, Tianjin, China
- Ma Xiaofang
- Tianjin Key Laboratory of Epigenetics for Organ Development of Premature Infants, Tianjin Fifth Central Hospital, Tianjin, China
- Central Laboratory, Tianjin Fifth Central Hospital, Tianjin, China
- Li Yanxia
- Tianjin Key Laboratory of Epigenetics for Organ Development of Premature Infants, Tianjin Fifth Central Hospital, Tianjin, China
- Central Laboratory, Tianjin Fifth Central Hospital, Tianjin, China
- Xue Na
- Tianjin Key Laboratory of Epigenetics for Organ Development of Premature Infants, Tianjin Fifth Central Hospital, Tianjin, China
- Central Laboratory, Tianjin Fifth Central Hospital, Tianjin, China
- Zhang Chunyan
- Tianjin Key Laboratory of Epigenetics for Organ Development of Premature Infants, Tianjin Fifth Central Hospital, Tianjin, China
- Central Laboratory, Tianjin Fifth Central Hospital, Tianjin, China
- Gao Rui
- Tianjin Key Laboratory of Epigenetics for Organ Development of Premature Infants, Tianjin Fifth Central Hospital, Tianjin, China
- Wang Kuan
- Department of Cardiology, Tianjin Fifth Central Hospital, Tianjin, China
- Zhang Cheng
- Department of Cardiology, Tianjin Fifth Central Hospital, Tianjin, China
- Wang Cuancuan
- Department of Cardiology, Tianjin Fifth Central Hospital, Tianjin, China
- Liu Mingyong
- Tianjin Key Laboratory of Epigenetics for Organ Development of Premature Infants, Tianjin Fifth Central Hospital, Tianjin, China
- Department of Urology, Tianjin Fifth Central Hospital, Tianjin, China
- Du Xinping
- Department of Cardiology, Tianjin Fifth Central Hospital, Tianjin, China
- Tianjin Key Laboratory of Epigenetics for Organ Development of Premature Infants, Tianjin Fifth Central Hospital, Tianjin, China
- College of Food Science and Engineering, Tianjin University of Science & Technology, Tianjin, China
10
Seifermann M, Reiser P, Friederich P, Levkin PA. High-Throughput Synthesis and Machine Learning Assisted Design of Photodegradable Hydrogels. Small Methods 2023; 7:e2300553. [PMID: 37287430 DOI: 10.1002/smtd.202300553] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/27/2023] [Indexed: 06/09/2023]
Abstract
Because of the large chemical space, the design of functional and responsive soft materials poses many challenges but also offers a wide range of opportunities in terms of attainable properties. Herein, an experimental workflow for miniaturized combinatorial high-throughput screening of functional hydrogel libraries is reported. Data from analyzing the photodegradation of more than 900 different hydrogel pads are used to train a machine learning model for automated decision making. Through iterative model optimization based on Bayesian optimization, a substantial improvement in response properties is achieved, expanding the range of material properties obtainable within the chemical space of hydrogels in this study. This demonstrates the potential of combining miniaturized high-throughput experiments with smart optimization algorithms for cost- and time-efficient optimization of material properties.
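The iterative loop described above follows the standard Bayesian optimization pattern: measure, update a probabilistic model, and choose the next candidate. Below is a minimal sketch, assuming a Gaussian-process surrogate with a squared-exponential kernel and an expected-improvement acquisition over a discrete candidate set. The objective is a synthetic stand-in for a measured photodegradation response. None of the names, kernel settings, or parameters come from the paper.

```python
import math

import numpy as np


def rbf_kernel(a: np.ndarray, b: np.ndarray, length_scale: float = 0.2) -> np.ndarray:
    """Squared-exponential kernel between the rows of a (n, d) and b (m, d)."""
    sq_dist = ((a[:, None, :] - b[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-0.5 * sq_dist / length_scale**2)


def gp_posterior(x_train, y_train, x_query, jitter=1e-6):
    """Mean and standard deviation of a zero-mean GP fitted to standardized targets."""
    y_mean, y_std = y_train.mean(), y_train.std() + 1e-12
    y = (y_train - y_mean) / y_std
    k = rbf_kernel(x_train, x_train) + jitter * np.eye(len(x_train))
    k_star = rbf_kernel(x_train, x_query)
    chol = np.linalg.cholesky(k)  # k = chol @ chol.T
    alpha = np.linalg.solve(chol.T, np.linalg.solve(chol, y))
    v = np.linalg.solve(chol, k_star)
    var = np.clip(1.0 - (v**2).sum(axis=0), 1e-12, None)  # prior variance is 1 for this kernel
    return k_star.T @ alpha * y_std + y_mean, np.sqrt(var) * y_std


def expected_improvement(mean, std, best):
    """Expected improvement over the best observed value (maximization)."""
    z = (mean - best) / std
    cdf = 0.5 * (1.0 + np.array([math.erf(v / math.sqrt(2.0)) for v in z]))
    pdf = np.exp(-0.5 * z**2) / math.sqrt(2.0 * math.pi)
    return (mean - best) * cdf + std * pdf


def bayesian_optimize(objective, candidates, n_init=4, n_iter=10, seed=0):
    """Measure candidates one at a time, each chosen by maximizing expected improvement."""
    rng = np.random.default_rng(seed)
    tested = [int(i) for i in rng.choice(len(candidates), size=n_init, replace=False)]
    scores = [objective(candidates[i]) for i in tested]
    for _ in range(n_iter):
        mean, std = gp_posterior(candidates[tested], np.array(scores), candidates)
        ei = expected_improvement(mean, std, max(scores))
        ei[tested] = -np.inf  # never re-measure a formulation already tested
        nxt = int(np.argmax(ei))
        tested.append(nxt)
        scores.append(objective(candidates[nxt]))
    best = int(np.argmax(scores))
    return candidates[tested[best]], scores[best]


# Hypothetical 1-D formulation variable and synthetic response peaking at 0.3.
grid = np.linspace(0.0, 1.0, 101)[:, None]
response = lambda x: -float((x[0] - 0.3) ** 2)
x_best, y_best = bayesian_optimize(response, grid)
print(x_best, y_best)
```

Expected improvement balances exploitation, where the predicted mean is high, against exploration, where the posterior uncertainty is large. This balance is what lets such loops improve a property with relatively few experiments.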
Affiliation(s)
- Maximilian Seifermann
- Institute of Biological and Chemical Systems-Functional Molecular Systems, Karlsruhe Institute of Technology, Hermann-von-Helmholtz-Platz 1, 76344, Eggenstein-Leopoldshafen, Germany
- Patrick Reiser
- Institute of Nanotechnology, Karlsruhe Institute of Technology, Hermann-von-Helmholtz-Platz 1, 76344, Eggenstein-Leopoldshafen, Germany
- Institute of Theoretical Informatics, Karlsruhe Institute of Technology, Am Fasanengarten 5, 76131, Karlsruhe, Germany
- Pascal Friederich
- Institute of Nanotechnology, Karlsruhe Institute of Technology, Hermann-von-Helmholtz-Platz 1, 76344, Eggenstein-Leopoldshafen, Germany
- Institute of Theoretical Informatics, Karlsruhe Institute of Technology, Am Fasanengarten 5, 76131, Karlsruhe, Germany
- Pavel A Levkin
- Institute of Biological and Chemical Systems-Functional Molecular Systems, Karlsruhe Institute of Technology, Hermann-von-Helmholtz-Platz 1, 76344, Eggenstein-Leopoldshafen, Germany
- Institute of Organic Chemistry, Karlsruhe Institute of Technology, Fritz-Haber-Weg 6, Karlsruhe, Germany
11
Combining metabolome and clinical indicators with machine learning provides some promising diagnostic markers to precisely detect smear-positive/negative pulmonary tuberculosis. BMC Infect Dis 2022; 22:707. [PMID: 36008772 PMCID: PMC9403968 DOI: 10.1186/s12879-022-07694-8] [Citation(s) in RCA: 6] [Impact Index Per Article: 3.0] [Received: 01/07/2022] [Accepted: 08/22/2022] [Indexed: 11/30/2022] [Open Access]
Abstract
Background Tuberculosis (TB) had been the leading lethal infectious disease worldwide for a long time (2014–2019) until the COVID-19 global pandemic, and it is still one of the top 10 death causes worldwide. One important reason why there are so many TB patients and death cases in the world is because of the difficulties in precise diagnosis of TB using common detection methods, especially for some smear-negative pulmonary tuberculosis (SNPT) cases. The rapid development of metabolome and machine learning offers a great opportunity for precision diagnosis of TB. However, the metabolite biomarkers for the precision diagnosis of smear-positive and smear-negative pulmonary tuberculosis (SPPT/SNPT) remain to be uncovered. In this study, we combined metabolomics and clinical indicators with machine learning to screen out newly diagnostic biomarkers for the precise identification of SPPT and SNPT patients. Methods Untargeted plasma metabolomic profiling was performed for 27 SPPT patients, 37 SNPT patients and controls. The orthogonal partial least squares-discriminant analysis (OPLS-DA) was then conducted to screen differential metabolites among the three groups. Metabolite enriched pathways, random forest (RF), support vector machines (SVM) and multilayer perceptron neural network (MLP) were performed using Metaboanalyst 5.0, “caret” R package, “e1071” R package and “Tensorflow” Python package, respectively. Results Metabolomic analysis revealed significant enrichment of fatty acid and amino acid metabolites in the plasma of SPPT and SNPT patients, where SPPT samples showed a more serious dysfunction in fatty acid and amino acid metabolisms. Further RF analysis revealed four optimized diagnostic biomarker combinations including ten features (two lipid/lipid-like molecules and seven organic acids/derivatives, and one clinical indicator) for the identification of SPPT, SNPT patients and controls with high accuracy (83–93%), which were further verified by SVM and MLP. 
Among them, MLP displayed the best classification performance for the simultaneous, precise identification of the three groups (94.74%), suggesting some advantage of MLP over RF/SVM. Conclusions Our findings reveal the plasma metabolomic characteristics of SPPT and SNPT patients, provide novel, promising diagnostic markers for the precision diagnosis of various types of TB, and show the potential of machine learning for screening biomarkers from big data. Supplementary Information The online version contains supplementary material available at 10.1186/s12879-022-07694-8.
12
He K. Pharmacological affinity fingerprints derived from bioactivity data for the identification of designer drugs. J Cheminform 2022; 14:35. [PMID: 35672835 PMCID: PMC9171973 DOI: 10.1186/s13321-022-00607-6] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 02/02/2022] [Accepted: 05/05/2022] [Indexed: 12/15/2022]
Abstract
Facing the continuous emergence of new psychoactive substances (NPS) and their threat to public health, more effective methods for NPS prediction and identification are critical. In this study, the pharmacological affinity fingerprints (Ph-fp) of NPS compounds were predicted by Random Forest classification models using bioactivity data from the ChEMBL database. The binary Ph-fp is a vector consisting of a compound's activity against a list of molecular targets reported to be responsible for the pharmacological effects of NPS. The performance of Ph-fp in similarity searching and unsupervised clustering was assessed and compared to that of the 2D structure fingerprints Morgan and MACCS (the 1024-bit ECFP4 and the 166-bit SMARTS-based MACCS implementations of RDKit). The performance in retrieving compounds according to their pharmacological categorizations is influenced by the predicted active assay counts in Ph-fp and by the choice of similarity metric. Overall, the comparative unsupervised clustering analysis supports using a classification model with Morgan fingerprints as input to construct Ph-fp. This combination gives satisfactory clustering performance based on external and internal clustering validation indices.
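The fingerprint construction and comparison described above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the author's code: the six-target panel, the probabilities and the 0.5 cut-off are hypothetical, the per-target probabilities are assumed to come from already-trained classifiers (Random Forest models in the study), and Tanimoto is used here as just one possible similarity metric.

```python
from typing import Dict, List

# Hypothetical target panel; the actual Ph-fp target list is defined in the paper.
TARGETS: List[str] = ["5-HT2A", "DAT", "NET", "SERT", "CB1", "MOR"]


def affinity_fingerprint(probs: Dict[str, float], cutoff: float = 0.5) -> List[int]:
    """Binary affinity fingerprint: one bit per target, set when the per-target
    classifier's predicted probability of activity reaches the cutoff."""
    return [1 if probs.get(t, 0.0) >= cutoff else 0 for t in TARGETS]


def tanimoto(a: List[int], b: List[int]) -> float:
    """Tanimoto similarity of two equal-length binary vectors (0.0 if both are empty)."""
    both = sum(1 for x, y in zip(a, b) if x and y)
    either = sum(1 for x, y in zip(a, b) if x or y)
    return both / either if either else 0.0


# Usage with made-up probabilities for two compounds.
query = affinity_fingerprint({"DAT": 0.91, "NET": 0.78, "SERT": 0.12})
candidate = affinity_fingerprint({"DAT": 0.85, "NET": 0.40, "SERT": 0.66})
print(query, candidate, tanimoto(query, candidate))
```

Because each bit corresponds to a named target, a shared bit can be read directly as "both compounds are predicted to act on this target", which is the interpretability argument for affinity fingerprints over purely structural ones.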
Affiliation(s)
- Kedan He
- Physical Sciences, Eastern Connecticut State University, 83 Windham St, Willimantic, CT, 06226, USA.
13
Wet-dry-wet drug screen leads to the synthesis of TS1, a novel compound reversing lung fibrosis through inhibition of myofibroblast differentiation. Cell Death Dis 2021; 13:2. [PMID: 34916483 PMCID: PMC8677786 DOI: 10.1038/s41419-021-04439-4] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/06/2021] [Revised: 11/18/2021] [Accepted: 11/29/2021] [Indexed: 11/09/2022]
Abstract
Therapies that halt the progression of fibrosis are limited and ineffective. Activated myofibroblasts are emerging as important targets in the progression of fibrotic diseases. Previously, we performed a high-throughput screen on lung fibroblasts and subsequently demonstrated that the inhibition of myofibroblast activation is able to prevent lung fibrosis in bleomycin-treated mice. High-throughput screens are an ideal method of repurposing drugs, yet they have an intrinsic limitation: the size of the library itself. Here, we exploited the data from our "wet" screen and used "dry" machine learning analysis to virtually screen millions of compounds, identifying novel anti-fibrotic hits that target myofibroblast differentiation, many of which were structurally related to dopamine. We synthesized and validated several compounds ex vivo ("wet") and confirmed that both dopamine and its derivative TS1 are powerful inhibitors of myofibroblast activation. We further used RNAi-mediated knock-down and demonstrated that both molecules act through the dopamine receptor 3 and exert their anti-fibrotic effect by inhibiting the canonical transforming growth factor β pathway. Furthermore, molecular modelling confirmed the capability of TS1 to bind both human and mouse dopamine receptor 3. The anti-fibrotic effect on human cells was confirmed using primary fibroblasts from idiopathic pulmonary fibrosis patients. Finally, TS1 prevented and reversed disease progression in a murine model of lung fibrosis. Both our interdisciplinary approach and our novel compound TS1 are promising tools for understanding and combating lung fibrosis.
14
Mohanty E, Mohanty A. Role of artificial intelligence in peptide vaccine design against RNA viruses. Informatics in Medicine Unlocked 2021; 26:100768. [PMID: 34722851 PMCID: PMC8536498 DOI: 10.1016/j.imu.2021.100768] [Citation(s) in RCA: 7] [Impact Index Per Article: 2.3] [Received: 08/30/2021] [Revised: 10/16/2021] [Accepted: 10/16/2021] [Indexed: 01/18/2023]
Abstract
RNA viruses have a high rate of replication and mutation that helps them adapt to changing environmental conditions. Many viral mutants cause severe and lethal diseases. Vaccines, on the other hand, can protect us from infectious diseases by eliciting pathogen-specific antibody or cell-mediated immune responses. While there are a few reviews on the use of artificial intelligence (AI) for SARS-CoV-2 vaccine development, none focus on peptide vaccination for RNA viruses and the important role AI plays in it. Peptide vaccination, which is gradually coming to be recognized as a safe and effective strategy, has the capacity to overcome the problem of mutant escape, which SARS-CoV-2 vaccines currently in circulation also face. Here we review the present state of peptide vaccines developed using mathematical and computational statistics methods to prevent the spread of disease caused by RNA viruses. We also focus on the importance and current stage of AI and mathematical evolutionary modeling using machine learning tools in the establishment of these new peptide vaccines for the control of viral disease.
Affiliation(s)
- Eileena Mohanty
- Trident School of Biotech Sciences, Trident Academy of Creative Technology (TACT), Bhubaneswar, Odisha, 751024, India
- Anima Mohanty
- School of Biotechnology (KSBT), KIIT University-2, Bhubaneswar, 751024, India
15
Aghamiri SS, Amin R, Helikar T. Recent applications of quantitative systems pharmacology and machine learning models across diseases. J Pharmacokinet Pharmacodyn 2021; 49:19-37. [PMID: 34671863 PMCID: PMC8528185 DOI: 10.1007/s10928-021-09790-9] [Citation(s) in RCA: 12] [Impact Index Per Article: 4.0] [Received: 06/25/2021] [Accepted: 10/07/2021] [Indexed: 12/29/2022]
Abstract
Quantitative systems pharmacology (QSP) is a quantitative and mechanistic platform describing the phenotypic interaction between drugs, biological networks, and disease conditions to predict optimal therapeutic response. In this meta-analysis study, we review the utility of the QSP platform in drug development and therapeutic strategies based on recent publications (2019-2021). We gathered recent original QSP models and described the diversity of their applications based on therapeutic areas, methodologies, software platforms, and functionalities. The collection and investigation of these publications can assist in providing a repository of recent QSP studies to facilitate the discovery and further reusability of QSP models. Our review shows that the largest number of QSP efforts in recent years is in Immuno-Oncology. We also addressed the benefits of integrative approaches in this field by presenting the applications of Machine Learning methods for drug discovery and QSP models. Based on this meta-analysis, we discuss the advantages and limitations of QSP models and propose fields where the QSP approach constitutes a valuable interface for more investigations to tackle complex diseases and improve drug development.
Affiliation(s)
- Sara Sadat Aghamiri
- Department of Biochemistry, University of Nebraska-Lincoln, Lincoln, NE, USA
- Rada Amin
- Department of Biochemistry, University of Nebraska-Lincoln, Lincoln, NE, USA.
- Tomáš Helikar
- Department of Biochemistry, University of Nebraska-Lincoln, Lincoln, NE, USA.
16
Wilm A, Garcia de Lomana M, Stork C, Mathai N, Hirte S, Norinder U, Kühnl J, Kirchmair J. Predicting the Skin Sensitization Potential of Small Molecules with Machine Learning Models Trained on Biologically Meaningful Descriptors. Pharmaceuticals (Basel) 2021; 14:ph14080790. [PMID: 34451887 PMCID: PMC8402010 DOI: 10.3390/ph14080790] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Received: 07/14/2021] [Revised: 08/03/2021] [Accepted: 08/06/2021] [Indexed: 02/06/2023]
Abstract
In recent years, a number of machine learning models for predicting the skin sensitization potential of small organic molecules have been reported and made available. These models generally perform well within their applicability domains but, because they rely on molecular fingerprints and other non-intuitive descriptors, their interpretability is limited. The aim of this work is to develop a strategy to replace the non-intuitive features with predicted outcomes of bioassays. We show that such replacement is indeed possible and that as few as ten interpretable, predicted bioactivities are sufficient to reach competitive performance. On a holdout data set of 257 compounds, the best model (“Skin Doctor CP:Bio”) obtained an efficiency of 0.82 and an MCC of 0.52 (at the significance level of 0.20). Skin Doctor CP:Bio is available free of charge for academic research. The modeling strategies explored in this work are easily transferable and could be adopted for the development of more interpretable machine learning models for the prediction of the bioactivity and toxicity of small organic compounds.
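The efficiency and significance level quoted above come from conformal prediction. As a hedged sketch (not the Skin Doctor CP:Bio implementation), the code below computes class-conditional (Mondrian) inductive conformal p-values for a binary classifier, forms prediction sets at a chosen significance level, and takes efficiency as the fraction of single-label sets, which is one common definition. The nonconformity score (one minus the predicted probability of the label) and the toy calibration data are assumptions for illustration.

```python
from typing import Dict, List, Sequence, Set, Tuple


def nonconformity(prob_of_label: float) -> float:
    """Low predicted probability for a label means the example is unusual for it."""
    return 1.0 - prob_of_label


def mondrian_p_values(calib: Sequence[Tuple[int, float]], test_prob1: float) -> Dict[int, float]:
    """Class-conditional conformal p-values for labels 0 and 1.

    calib holds (true label, predicted probability of class 1) pairs from a
    calibration set that the underlying model was not trained on.
    """
    p_values: Dict[int, float] = {}
    for label in (0, 1):
        # Scores of calibration examples of this class, w.r.t. their own label.
        scores = [nonconformity(p1 if y == 1 else 1.0 - p1) for y, p1 in calib if y == label]
        test_score = nonconformity(test_prob1 if label == 1 else 1.0 - test_prob1)
        at_least_as_strange = sum(1 for s in scores if s >= test_score)
        p_values[label] = (at_least_as_strange + 1) / (len(scores) + 1)
    return p_values


def prediction_set(p_values: Dict[int, float], significance: float) -> Set[int]:
    """All labels whose p-value exceeds the significance level."""
    return {label for label, p in p_values.items() if p > significance}


def efficiency(sets: List[Set[int]]) -> float:
    """Fraction of prediction sets that contain exactly one label."""
    return sum(1 for s in sets if len(s) == 1) / len(sets)


# Toy calibration data: three class-1 and three class-0 compounds.
calib = [(1, 0.9), (1, 0.8), (1, 0.6), (0, 0.2), (0, 0.3), (0, 0.7)]
p = mondrian_p_values(calib, test_prob1=0.95)
print(p, prediction_set(p, 0.20), prediction_set(p, 0.30))
```

With only three calibration compounds per class, the smallest attainable p-value is 0.25, so at significance 0.20 this test compound receives the uninformative set {0, 1}; realistic calibration sets are much larger, which is what makes single-label (efficient) predictions possible.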
Affiliation(s)
- Anke Wilm
- Center for Bioinformatics (ZBH), Department of Informatics, Universität Hamburg, 20146 Hamburg, Germany
- HITeC e.V., 22527 Hamburg, Germany
- Marina Garcia de Lomana
- Department of Pharmaceutical Sciences, Faculty of Life Sciences, University of Vienna, 1090 Vienna, Austria
- Conrad Stork
- Center for Bioinformatics (ZBH), Department of Informatics, Universität Hamburg, 20146 Hamburg, Germany
- Neann Mathai
- Computational Biology Unit (CBU), Department of Chemistry, University of Bergen, N-5020 Bergen, Norway
- Steffen Hirte
- Department of Pharmaceutical Sciences, Faculty of Life Sciences, University of Vienna, 1090 Vienna, Austria
- Ulf Norinder
- MTM Research Centre, School of Science and Technology, Örebro University, SE-70182 Örebro, Sweden
- Department of Computer and Systems Sciences, Stockholm University, SE-16407 Kista, Sweden
- Department of Pharmaceutical Biosciences, Uppsala University, SE-75124 Uppsala, Sweden
- Jochen Kühnl
- Front End Innovation, Beiersdorf AG, 22529 Hamburg, Germany
- Johannes Kirchmair
- Center for Bioinformatics (ZBH), Department of Informatics, Universität Hamburg, 20146 Hamburg, Germany
- Department of Pharmaceutical Sciences, Faculty of Life Sciences, University of Vienna, 1090 Vienna, Austria
- Correspondence: ; Tel.: +43-1-4277-55104
17
Garcia de Lomana M, Morger A, Norinder U, Buesen R, Landsiedel R, Volkamer A, Kirchmair J, Mathea M. ChemBioSim: Enhancing Conformal Prediction of In Vivo Toxicity by Use of Predicted Bioactivities. J Chem Inf Model 2021; 61:3255-3272. [PMID: 34153183 PMCID: PMC8317154 DOI: 10.1021/acs.jcim.1c00451] [Citation(s) in RCA: 6] [Impact Index Per Article: 2.0] [Received: 04/20/2021] [Indexed: 02/07/2023]
Abstract
Computational methods such as machine learning approaches have a strong track record of success in predicting the outcomes of in vitro assays. In contrast, their ability to predict in vivo endpoints is more limited due to the high number of parameters and processes that may influence the outcome. Recent studies have shown that the combination of chemical and biological data can yield better models for in vivo endpoints. The ChemBioSim approach presented in this work aims to enhance the performance of conformal prediction models for in vivo endpoints by combining chemical information with (predicted) bioactivity assay outcomes. Three in vivo toxicological endpoints, capturing genotoxic (MNT), hepatic (DILI), and cardiological (DICC) issues, were selected for this study due to their high relevance for the registration and authorization of new compounds. Since the sparsity of available biological assay data is challenging for predictive modeling, predicted bioactivity descriptors were introduced instead. Thus, a machine learning model was trained for each of the 373 collected biological assays and applied to the compounds of the in vivo toxicity data sets. Besides the chemical descriptors (molecular fingerprints and physicochemical properties), these predicted bioactivities served as descriptors for the models of the three in vivo endpoints. For this study, a workflow based on a conformal prediction framework (a method for confidence estimation) built on random forest models was developed. Furthermore, the most relevant chemical and bioactivity descriptors for each in vivo endpoint were preselected with lasso models. The incorporation of bioactivity descriptors increased the mean F1 score of the MNT model from 0.61 to 0.70 and that of the DICC model from 0.72 to 0.82, while the mean efficiencies increased by roughly 0.10 for both endpoints. In contrast, for the DILI endpoint, no significant improvement in model performance was observed. 
Besides pure performance improvements, an analysis of the most important bioactivity features allowed the detection of novel and less intuitive relationships between the predicted biological assay outcomes used as descriptors and the in vivo endpoints. This study shows how the prediction of in vivo toxicity endpoints can be improved by incorporating biological information, which is not necessarily captured by chemical descriptors, into an automated workflow without adding experimental workload, because predicted rather than measured bioactivity assay outcomes were used as descriptors. All bioactivity CP models for deriving the predicted bioactivities, as well as the in vivo toxicity CP models, can be freely downloaded from https://doi.org/10.5281/zenodo.4761225.
Affiliation(s)
- Marina Garcia de Lomana
- BASF SE, Ludwigshafen am Rhein 67063, Germany
- Department of Pharmaceutical Sciences, Faculty of Life Sciences, University of Vienna, Vienna 1090, Austria
- Andrea Morger
- In Silico Toxicology and Structural Bioinformatics, Institute of Physiology, Charité Universitätsmedizin Berlin, Charitéplatz 1, Berlin 10117, Germany
- Ulf Norinder
- MTM Research Centre, School of Science and Technology, Örebro University, Örebro SE-70182, Sweden
- Andrea Volkamer
- In Silico Toxicology and Structural Bioinformatics, Institute of Physiology, Charité Universitätsmedizin Berlin, Charitéplatz 1, Berlin 10117, Germany
- Johannes Kirchmair
- Department of Pharmaceutical Sciences, Faculty of Life Sciences, University of Vienna, Vienna 1090, Austria
18
Esposito C, Landrum GA, Schneider N, Stiefl N, Riniker S. GHOST: Adjusting the Decision Threshold to Handle Imbalanced Data in Machine Learning. J Chem Inf Model 2021; 61:2623-2640. [PMID: 34100609 DOI: 10.1021/acs.jcim.1c00160] [Citation(s) in RCA: 43] [Impact Index Per Article: 14.3] [Indexed: 12/21/2022]
Abstract
Machine learning classifiers trained on class-imbalanced data are prone to overpredicting the majority class. This leads to a higher misclassification rate for the minority class, which in many real-world applications is the class of interest. For binary data, the classification threshold is set to 0.5 by default, which is often not ideal for imbalanced data. Adjusting the decision threshold is a good strategy for dealing with the class imbalance problem. In this work, we present two different automated procedures for selecting the optimal decision threshold for imbalanced classification. A major advantage of our procedures is that they require neither retraining of the machine learning models nor resampling of the training data. The first approach is specific to random forest (RF), while the second approach, named GHOST, can potentially be applied to any machine learning classifier. We tested these procedures on 138 public drug discovery data sets containing structure-activity data for a variety of pharmaceutical targets. We show that both thresholding methods significantly improve the performance of RF. We tested GHOST with four different classifiers in combination with two molecular descriptors and found that most classifiers benefit from threshold optimization. GHOST also outperformed other strategies, including random undersampling and conformal prediction. Finally, we show that our thresholding procedures can be effectively applied to real-world drug discovery projects, where the imbalance and characteristics of the data vary greatly between the training and test sets.
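The threshold-selection idea can be illustrated with a short sketch in the spirit of GHOST. It is a simplification under stated assumptions, not the published implementation: each candidate threshold is scored by Cohen's kappa on random subsets of the training set, reusing probabilities the model has already produced (for a random forest, out-of-bag estimates would be a natural choice), and the threshold with the highest median kappa is kept. The subset count, subset fraction and candidate grid are illustrative defaults.

```python
import random
import statistics
from typing import List, Sequence


def cohen_kappa(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Cohen's kappa for binary (0/1) labels; 0.0 when chance agreement is total."""
    n = len(y_true)
    agree = sum(1 for t, p in zip(y_true, y_pred) if t == p) / n
    true_pos = sum(y_true) / n
    pred_pos = sum(y_pred) / n
    expected = true_pos * pred_pos + (1 - true_pos) * (1 - pred_pos)
    return 0.0 if expected == 1.0 else (agree - expected) / (1 - expected)


def select_threshold(
    y_train: Sequence[int],
    probs: Sequence[float],  # already-computed positive-class probabilities
    thresholds: Sequence[float],
    n_subsets: int = 100,
    subset_frac: float = 0.2,
    seed: int = 0,
) -> float:
    """Return the threshold with the highest median kappa over random subsets.

    No retraining or resampling of the training data is needed: only the
    stored probabilities are re-thresholded.
    """
    rng = random.Random(seed)
    idx = list(range(len(y_train)))
    size = max(1, int(subset_frac * len(idx)))
    subsets = [rng.sample(idx, size) for _ in range(n_subsets)]
    best_t, best_score = 0.5, float("-inf")
    for t in thresholds:
        kappas: List[float] = []
        for sub in subsets:
            y = [y_train[i] for i in sub]
            pred = [1 if probs[i] >= t else 0 for i in sub]
            kappas.append(cohen_kappa(y, pred))
        score = statistics.median(kappas)
        if score > best_score:  # ties keep the earlier (lower) threshold
            best_t, best_score = t, score
    return best_t


# Toy imbalanced set: 10 actives scored 0.3, 90 inactives scored 0.1.
y_train = [1] * 10 + [0] * 90
probs = [0.3] * 10 + [0.1] * 90
print(select_threshold(y_train, probs, thresholds=[0.1, 0.2, 0.3, 0.4, 0.5]))
```

In this toy case the default 0.5 labels every compound inactive (kappa 0), whereas a lower threshold separates the classes perfectly, which is the kind of shift the abstract describes for imbalanced drug discovery data.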
Affiliation(s)
- Carmen Esposito
- Laboratory of Physical Chemistry, ETH Zurich, Vladimir-Prelog-Weg 2, 8093 Zurich, Switzerland
- Gregory A Landrum
- Laboratory of Physical Chemistry, ETH Zurich, Vladimir-Prelog-Weg 2, 8093 Zurich, Switzerland
- T5 Informatics GmbH, Spalenring 11, 4055 Basel, Switzerland
- Nadine Schneider
- Novartis Institutes for BioMedical Research, Novartis Pharma AG, Novartis Campus, 4002 Basel, Switzerland
- Nikolaus Stiefl
- Novartis Institutes for BioMedical Research, Novartis Pharma AG, Novartis Campus, 4002 Basel, Switzerland
- Sereina Riniker
- Laboratory of Physical Chemistry, ETH Zurich, Vladimir-Prelog-Weg 2, 8093 Zurich, Switzerland
19
Biological activity-based modeling identifies antiviral leads against SARS-CoV-2. Nat Biotechnol 2021; 39:747-753. [PMID: 33623157 PMCID: PMC9843700 DOI: 10.1038/s41587-021-00839-1] [Citation(s) in RCA: 33] [Impact Index Per Article: 11.0] [Received: 10/19/2020] [Accepted: 01/25/2021] [Indexed: 01/29/2023]
Abstract
Computational approaches for drug discovery, such as quantitative structure-activity relationship (QSAR) modeling, rely on the structural similarity of small molecules to infer biological activity but are often limited to identifying new drug candidates in chemical space close to known ligands. Here we report a biological activity-based modeling (BABM) approach, in which compound activity profiles established across multiple assays are used as signatures to predict compound activity in other assays or against a new target. This approach was validated by identifying candidate antivirals for Zika and Ebola viruses based on high-throughput screening data. BABM models were then applied to predict 311 compounds with potential activity against severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2). Of the predicted compounds, 32% had antiviral activity in a cell culture live virus assay, with the most potent compounds showing half-maximal inhibitory concentrations in the nanomolar range. Most of the confirmed anti-SARS-CoV-2 compounds were found to be viral entry inhibitors and/or autophagy modulators. The confirmed compounds have the potential to be further developed into anti-SARS-CoV-2 therapies.
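To make the idea of activity profiles as signatures concrete, the sketch below predicts a compound's activity in a target assay from reference compounds whose profiles across other assays resemble it. It is a generic, hypothetical nearest-neighbour scheme with invented assay names and values, not the authors' BABM model: similarity is taken as the Pearson correlation over assays measured for both compounds, and the prediction is the similarity-weighted mean over the k most similar positively correlated neighbours.

```python
import math
from typing import Dict, List, Optional, Tuple

# Activity profile: assay name -> activity value (None = not tested).
Profile = Dict[str, Optional[float]]


def profile_similarity(a: Profile, b: Profile) -> float:
    """Pearson correlation over assays measured for both compounds (0.0 if undefined)."""
    shared = [k for k in a if k in b and a[k] is not None and b[k] is not None]
    if len(shared) < 2:
        return 0.0
    xs = [a[k] for k in shared]
    ys = [b[k] for k in shared]
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy) if sx and sy else 0.0


def predict_activity(query: Profile, reference: Dict[str, Profile], target_assay: str, k: int = 3) -> Optional[float]:
    """Similarity-weighted mean target-assay activity of the k most similar compounds."""
    scored: List[Tuple[float, float]] = []
    for profile in reference.values():
        value = profile.get(target_assay)
        if value is None:
            continue
        sim = profile_similarity(query, profile)
        if sim > 0:  # ignore unrelated or anti-correlated profiles
            scored.append((sim, value))
    top = sorted(scored, reverse=True)[:k]
    total = sum(s for s, _ in top)
    return sum(s * v for s, v in top) / total if total else None


# Invented profiles over assays A-C; "T" plays the role of the new target assay.
query = {"A": 1.0, "B": 2.0, "C": 3.0}
reference = {
    "cpd1": {"A": 1.0, "B": 2.0, "C": 3.0, "T": 10.0},
    "cpd2": {"A": 3.0, "B": 2.0, "C": 1.0, "T": 0.0},  # anti-correlated, ignored
    "cpd3": {"A": 2.0, "B": 4.0, "C": 6.0, "T": 8.0},
}
print(predict_activity(query, reference, target_assay="T"))
```

The key property illustrated is that no chemical structure is used at all: compounds are compared only through their biological responses, which is why such approaches can reach beyond the chemical space of known ligands.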
20
Discovery of Novel eEF2K Inhibitors Using HTS Fingerprint Generated from Predicted Profiling of Compound-Protein Interactions. Medicines 2021; 8:medicines8050023. [PMID: 34065377 PMCID: PMC8161098 DOI: 10.3390/medicines8050023] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 03/26/2021] [Revised: 04/24/2021] [Accepted: 05/18/2021] [Indexed: 11/29/2022]
Abstract
Background: Eukaryotic elongation factor 2 kinase (eEF2K) regulates the elongation stage of protein synthesis by phosphorylating eEF2, a process related to various diseases including cancer and cardiovascular and neurodegenerative diseases. In this study, we describe the identification of novel eEF2K inhibitors using high-throughput screening fingerprints (HTSFP) generated from predicted profiling of compound-protein interactions (CPIs). Methods: We utilized computationally generated HTSFPs referred to as chemical genomics-based fingerprints (CGBFP). HTSFPs are generally generated from the data of multiple biochemical or cell-based assays, whereas CGBFPs are generated from computational prediction of CPIs using the Chemical Genomics-Based Virtual Screening (CGBVS) method. Therefore, CGBFPs are free of the missing information that mainly results from the absence of assay data. Results: Chemogenomics-Based Similarity Profiling (CGBSP) of the screening library (2.6 million compounds) yielded 27 compounds, which were evaluated for in vitro eEF2K inhibitory activity. Three compounds of interest were identified. Compounds 2 (IC50 = 11.05 μM) and 4 (IC50 = 43.54 μM) are thieno[2,3-b]pyridine derivatives that share their scaffold with a known eEF2K inhibitor, while compound 13 (IC50 = 70.13 μM) is a new thiophene-2-amine-type eEF2K inhibitor. Conclusions: CGBSP provided an efficient strategy for identifying novel eEF2K inhibitors and yielded useful scaffolds for optimization.
21
Kamerzell TJ, Middaugh CR. Prediction Machines: Applied Machine Learning for Therapeutic Protein Design and Development. J Pharm Sci 2020; 110:665-681. [PMID: 33278409 DOI: 10.1016/j.xphs.2020.11.034] [Citation(s) in RCA: 11] [Impact Index Per Article: 2.8] [Received: 10/29/2020] [Revised: 11/27/2020] [Accepted: 11/27/2020] [Indexed: 12/11/2022]
Abstract
The rapid growth in technological advances and in the quantity of scientific data over the past decade has led to several challenges, including data storage and analysis. Accurate models of complex datasets were previously difficult to develop and interpret. However, improvements in machine learning algorithms have since enabled unparalleled classification and prediction capabilities. Machine learning is applied throughout diverse industries owing to the ease of use and interpretability of its algorithms. In this review, we describe popular machine learning algorithms and highlight their application in pharmaceutical protein development. Machine learning models have now been applied to better understand the nonlinear concentration-dependent viscosity of protein solutions, predict protein oxidation and deamidation rates, classify sub-visible particles and compare the physical stability of proteins. We also applied several machine learning algorithms to previously published data and describe models with improved predictions and classification. The authors hope that this review can be used as a resource by others and will encourage continued application of machine learning algorithms to problems in pharmaceutical protein development.
Affiliation(s)
- Tim J Kamerzell
- Department of Pharmaceutical Chemistry, The University of Kansas, Lawrence, KS, USA; Division of Internal Medicine, HCA MidWest Health, Overland Park, KS, USA.
- C Russell Middaugh
- Department of Pharmaceutical Chemistry, The University of Kansas, Lawrence, KS, USA
22
Early lung cancer diagnostic biomarker discovery by machine learning methods. Transl Oncol 2020; 14:100907. [PMID: 33217646 PMCID: PMC7683339 DOI: 10.1016/j.tranon.2020.100907] [Citation(s) in RCA: 70] [Impact Index Per Article: 17.5] [Received: 06/04/2020] [Revised: 08/21/2020] [Accepted: 09/25/2020] [Indexed: 02/07/2023]
Abstract
Early diagnosis could improve lung cancer survival rate. The availability of blood-based screening could increase lung cancer patient uptake. An interdisciplinary mechanism combines metabolomics and machine learning methods. Metabolic biomarkers could be potential screening biomarkers for early detection of lung cancer. Naïve Bayes is recommended as an exploitable tool for early lung tumor prediction.
Early diagnosis has been shown to improve the survival rate of lung cancer patients. The availability of blood-based screening could increase early lung cancer patient uptake. Our present study attempted to discover plasma metabolites of Chinese patients as diagnostic biomarkers for lung cancer. In this work, we used a pioneering interdisciplinary approach, applied to lung cancer for the first time, that combines metabolomics and machine learning methods to detect early lung cancer diagnostic biomarkers. We enrolled a total of 110 lung cancer patients and 43 healthy individuals in our study. The levels of 61 plasma metabolites were obtained from a targeted metabolomic study using LC-MS/MS. A specific combination of six metabolic biomarkers notably enabled discrimination between stage I lung cancer patients and healthy individuals (AUC = 0.989, Sensitivity = 98.1%, Specificity = 100.0%). The top 5 metabolic biomarkers by relative importance, identified with the FCBF algorithm, could also be potential screening biomarkers for the early detection of lung cancer. Naïve Bayes is recommended as an exploitable tool for early lung tumor prediction. This research will provide strong support for the feasibility of blood-based screening and bring a more accurate, rapid and integrated application tool for early lung cancer diagnosis. The proposed interdisciplinary method could be adapted to cancers other than lung cancer.
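Naïve Bayes, recommended above, classifies a sample by combining class priors with per-feature likelihoods under an assumption that features are independent within each class. The sketch below is a minimal Gaussian variant written from scratch; the two-metabolite data are invented, and the abstract does not state which Naïve Bayes variant or preprocessing the authors used.

```python
import math
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple


class GaussianNaiveBayes:
    """Minimal Gaussian naive Bayes: each feature is modeled as an independent
    normal distribution within each class."""

    def fit(self, X: Sequence[Sequence[float]], y: Sequence[str]) -> "GaussianNaiveBayes":
        by_class: Dict[str, List[Sequence[float]]] = defaultdict(list)
        for row, label in zip(X, y):
            by_class[label].append(row)
        self.priors: Dict[str, float] = {}
        self.stats: Dict[str, List[Tuple[float, float]]] = {}
        for label, rows in by_class.items():
            self.priors[label] = len(rows) / len(X)
            self.stats[label] = []
            for col in zip(*rows):  # iterate over features (columns)
                mean = sum(col) / len(col)
                # Small variance floor avoids division by zero for constant features.
                var = sum((v - mean) ** 2 for v in col) / len(col) + 1e-9
                self.stats[label].append((mean, var))
        return self

    def log_posterior(self, row: Sequence[float]) -> Dict[str, float]:
        """Unnormalized log posterior (log prior + sum of log likelihoods) per class."""
        scores: Dict[str, float] = {}
        for label, params in self.stats.items():
            s = math.log(self.priors[label])
            for x, (mean, var) in zip(row, params):
                s += -0.5 * math.log(2 * math.pi * var) - (x - mean) ** 2 / (2 * var)
            scores[label] = s
        return scores

    def predict(self, row: Sequence[float]) -> str:
        scores = self.log_posterior(row)
        return max(scores, key=scores.get)


# Invented levels of two plasma metabolites (arbitrary units).
X = [[1.0, 5.0], [1.2, 5.1], [0.9, 4.9], [3.0, 2.0], [3.1, 2.2], [2.9, 1.9]]
y = ["healthy"] * 3 + ["cancer"] * 3
model = GaussianNaiveBayes().fit(X, y)
print(model.predict([1.1, 5.0]), model.predict([3.0, 2.1]))
```

Working in log space keeps the product of many small likelihoods numerically stable, which matters once dozens of metabolites are used as features.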
23
Hsieh JH, Sedykh A, Mutlu E, Germolec DR, Auerbach SS, Rider CV. Harnessing In Silico, In Vitro, and In Vivo Data to Understand the Toxicity Landscape of Polycyclic Aromatic Compounds (PACs). Chem Res Toxicol 2020; 34:268-285. [PMID: 33063992 DOI: 10.1021/acs.chemrestox.0c00213] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.0] [Indexed: 01/13/2023]
Abstract
Polycyclic aromatic compounds (PACs) are compounds with a minimum of two six-atom aromatic fused rings. PACs arise from incomplete combustion or thermal decomposition of organic matter and are ubiquitous in the environment. Within PACs, carcinogenicity is generally regarded as the most important public health concern. However, toxicity in other systems (reproductive and developmental toxicity, immunotoxicity) has also been reported. Despite the large number of PACs identified in the environment, research on exposure to and health effects of PACs has focused on a relatively limited subset, namely polycyclic aromatic hydrocarbons (PAHs), the PACs with only carbon and hydrogen atoms. To triage the vast number of remaining PACs for more resource-intensive testing, we developed a data-driven approach to contextualize the hazard characterization of PACs by leveraging the available data from various data streams (in silico toxicity, in vitro activity, structural fingerprints, and in vivo data availability). The PACs were clustered on the basis of their in silico toxicity profiles containing predictions from 8 different categories (carcinogenicity, cardiotoxicity, developmental toxicity, genotoxicity, hepatotoxicity, neurotoxicity, reproductive toxicity, and urinary toxicity). We found that PACs with the same parent structure (e.g., fluorene) could have diverse in silico toxicity profiles. In contrast, PACs with similar substituted groups (e.g., alkylated-PAHs) or heterocycles (e.g., N-PACs) with varying ring sizes could have similar in silico toxicity profiles, suggesting that these groups are better candidates for toxicity read-across analysis. The clusters/regions associated with certain in silico toxicity, in vitro activity, and structural fingerprints were identified. 
We found that genotoxicity/carcinogenicity (in silico toxicity) and xenobiotic homeostasis and stress response (in vitro activity), respectively, dominate the toxicity/activity variation seen in the PACs. The "hot spots" with enriched toxicity/activity in conjunction with availability of in vivo carcinogenicity data revealed regions of either data-poor (hydroxylated-PAHs) or data-rich (unsubstituted, parent PAHs) PACs. These regions offer potential targets for prioritization of further in vivo assessment and for chemical read-across efforts. The analysis results are searchable through an interactive web application (https://ntp.niehs.nih.gov/go/pacs_tableau), allowing for alternative hypothesis generation.
Affiliation(s)
- Jui-Hua Hsieh
- Division of the National Toxicology Program, National Institute of Environmental Health Sciences, National Institutes of Health, Durham, North Carolina 27709, United States
- Esra Mutlu
- Division of the National Toxicology Program, National Institute of Environmental Health Sciences, National Institutes of Health, Durham, North Carolina 27709, United States
- Dori R Germolec
- Division of the National Toxicology Program, National Institute of Environmental Health Sciences, National Institutes of Health, Durham, North Carolina 27709, United States
- Scott S Auerbach
- Division of the National Toxicology Program, National Institute of Environmental Health Sciences, National Institutes of Health, Durham, North Carolina 27709, United States
- Cynthia V Rider
- Division of the National Toxicology Program, National Institute of Environmental Health Sciences, National Institutes of Health, Durham, North Carolina 27709, United States
24
Stojanović L, Popović M, Tijanić N, Rakočević G, Kalinić M. Improved Scaffold Hopping in Ligand-Based Virtual Screening Using Neural Representation Learning. J Chem Inf Model 2020; 60:4629-4639. [DOI: 10.1021/acs.jcim.0c00622] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.0] [Indexed: 11/29/2022]
Affiliation(s)
- Miloš Popović
- Totient, Inc., Sinđelićeva 9, 11000 Belgrade, Serbia
- Marko Kalinić
- Totient, Inc., Sinđelićeva 9, 11000 Belgrade, Serbia
25
Raschka S, Kaufman B. Machine learning and AI-based approaches for bioactive ligand discovery and GPCR-ligand recognition. Methods 2020; 180:89-110. [PMID: 32645448 PMCID: PMC8457393 DOI: 10.1016/j.ymeth.2020.06.016] [Citation(s) in RCA: 35] [Impact Index Per Article: 8.8] [Received: 01/17/2020] [Revised: 06/23/2020] [Accepted: 06/23/2020] [Indexed: 02/06/2023]
Abstract
In the last decade, machine learning and artificial intelligence applications have received a significant boost in performance and attention in both academic research and industry. The success behind most of the recent state-of-the-art methods can be attributed to the latest developments in deep learning. When applied to various scientific domains that are concerned with the processing of non-tabular data, for example, image or text, deep learning has been shown to outperform not only conventional machine learning but also highly specialized tools developed by domain experts. This review aims to summarize AI-based research for GPCR bioactive ligand discovery with a particular focus on the most recent achievements and research trends. To make this article accessible to a broad audience of computational scientists, we provide instructive explanations of the underlying methodology, including overviews of the most commonly used deep learning architectures and feature representations of molecular data. We highlight the latest AI-based research that has led to the successful discovery of GPCR bioactive ligands. However, an equal focus of this review is on the discussion of machine learning-based technology that has been applied to ligand discovery in general and has the potential to pave the way for successful GPCR bioactive ligand discovery in the future. This review concludes with a brief outlook highlighting the recent research trends in deep learning, such as active learning and semi-supervised learning, which have great potential for advancing bioactive ligand discovery.
Affiliation(s)
- Sebastian Raschka
- University of Wisconsin-Madison, Department of Statistics, United States
- Benjamin Kaufman
- University of Wisconsin-Madison, Department of Biostatistics and Medical Informatics, United States

26
Škuta C, Cortés-Ciriano I, Dehaen W, Kříž P, van Westen GJP, Tetko IV, Bender A, Svozil D. QSAR-derived affinity fingerprints (part 1): fingerprint construction and modeling performance for similarity searching, bioactivity classification and scaffold hopping. J Cheminform 2020; 12:39. [PMID: 33431038 PMCID: PMC7260783 DOI: 10.1186/s13321-020-00443-6] [Citation(s) in RCA: 20] [Impact Index Per Article: 5.0] [Received: 08/07/2019] [Accepted: 05/16/2020] [Indexed: 02/11/2023]
Abstract
An affinity fingerprint is a vector of a compound's affinities or potencies against a reference panel of protein targets. Here, we present the QAFFP fingerprint, a 440-element in silico QSAR-based affinity fingerprint whose components are predicted by Random Forest regression models trained on bioactivity data from the ChEMBL database. Both real-valued (rv-QAFFP) and binary (b-QAFFP) versions of the QAFFP fingerprint were implemented, and their performance in similarity searching, biological activity classification and scaffold hopping was assessed and compared to that of the 1024-bit Morgan2 fingerprint (the RDKit implementation of the ECFP4 fingerprint). In both similarity searching and biological activity classification, the QAFFP fingerprint yields retrieval rates, measured by AUC (~ 0.65 and ~ 0.70 for similarity searching depending on data sets, and ~ 0.85 for classification) and EF5 (~ 4.67 and ~ 5.82 for similarity searching depending on data sets, and ~ 2.10 for classification), comparable to those of the Morgan2 fingerprint (similarity searching AUC of ~ 0.57 and ~ 0.66, and EF5 of ~ 4.09 and ~ 6.41, depending on data sets; classification AUC of ~ 0.87, and EF5 of ~ 2.16). However, the QAFFP fingerprint outperforms the Morgan2 fingerprint in scaffold hopping, retrieving 1146 of the 1749 existing scaffolds, whereas the Morgan2 fingerprint retrieves only 864.
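The EF5 values quoted above are enrichment factors at the top 5% of a ranked list. A minimal Python sketch, assuming the common definition (the active rate among the top-ranked 5% of compounds divided by the active rate of the whole data set) rather than the authors' exact code; the function name and the toy ranking are hypothetical.

```python
from typing import Sequence


def enrichment_factor(scores: Sequence[float], labels: Sequence[int], fraction: float = 0.05) -> float:
    """Enrichment factor at the top `fraction` of a ranked list (hypothetical helper).

    EF = (actives in top n / n) / (all actives / N), where N is the number of
    ranked compounds and n = fraction * N rounded to the nearest integer (at least 1).
    """
    if len(scores) != len(labels) or not scores:
        raise ValueError("scores and labels must be non-empty and of equal length")
    total_actives = sum(labels)
    if total_actives == 0:
        raise ValueError("at least one active compound is required")
    n_total = len(scores)
    n_top = max(1, round(fraction * n_total))
    # Rank compounds by descending score (e.g. fingerprint similarity to a query).
    ranked = sorted(range(n_total), key=lambda i: scores[i], reverse=True)
    actives_in_top = sum(labels[i] for i in ranked[:n_top])
    return (actives_in_top / n_top) / (total_actives / n_total)


if __name__ == "__main__":
    # Toy ranking: 100 compounds, 5 actives, 2 of which land in the top 5.
    scores = [100 - i for i in range(100)]
    labels = [1 if i in (1, 3, 50, 60, 70) else 0 for i in range(100)]
    print(enrichment_factor(scores, labels, 0.05))  # 8.0
```

An EF of 1 corresponds to a ranking no better than random; the same ranking function can be fed similarities computed from either fingerprint type when comparing them.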
Affiliation(s)
- C Škuta
- CZ-OPENSCREEN: National Infrastructure for Chemical Biology, Institute of Molecular Genetics of the ASCR, v. v. i., Vídeňská 1083, 142 20, Prague 4, Czech Republic
- I Cortés-Ciriano
- Centre for Molecular Informatics, Department of Chemistry, University of Cambridge, Lensfield Road, Cambridge, CB2 1EW, UK
- W Dehaen
- CZ-OPENSCREEN: National Infrastructure for Chemical Biology, Institute of Molecular Genetics of the ASCR, v. v. i., Vídeňská 1083, 142 20, Prague 4, Czech Republic
- CZ-OPENSCREEN: National Infrastructure for Chemical Biology, Department of Informatics and Chemistry, Faculty of Chemical Technology, University of Chemistry and Technology Prague, Technická 5, 166 28, Prague, Czech Republic
- P Kříž
- Department of Mathematics, Faculty of Chemical Engineering, University of Chemistry and Technology Prague, Technická 5, 166 28, Prague, Czech Republic
- G J P van Westen
- Computational Drug Discovery, Drug Discovery and Safety, LACDR, Leiden University, Einsteinweg 55, 2333 CC, Leiden, The Netherlands
- I V Tetko
- Helmholtz Zentrum Muenchen - German Research Center for Environmental Health (GmbH) and BIGCHEM GmbH, Ingolstaedter Landstrasse 1, 85764, Neuherberg, Germany
- A Bender
- Centre for Molecular Informatics, Department of Chemistry, University of Cambridge, Lensfield Road, Cambridge, CB2 1EW, UK
- D Svozil
- CZ-OPENSCREEN: National Infrastructure for Chemical Biology, Institute of Molecular Genetics of the ASCR, v. v. i., Vídeňská 1083, 142 20, Prague 4, Czech Republic
- CZ-OPENSCREEN: National Infrastructure for Chemical Biology, Department of Informatics and Chemistry, Faculty of Chemical Technology, University of Chemistry and Technology Prague, Technická 5, 166 28, Prague, Czech Republic

27
Norinder U, Spjuth O, Svensson F. Using Predicted Bioactivity Profiles to Improve Predictive Modeling. J Chem Inf Model 2020; 60:2830-2837. [PMID: 32374618 DOI: 10.1021/acs.jcim.0c00250] [Citation(s) in RCA: 10] [Impact Index Per Article: 2.5] [Indexed: 12/28/2022]
Abstract
Predictive modeling is a cornerstone in early drug development. Using information for multiple domains or across prediction tasks has the potential to improve the performance of predictive modeling. However, aggregating data often leads to incomplete data matrices that might be limiting for modeling. In line with previous studies, we show that by generating predicted bioactivity profiles, and using these as additional features, prediction accuracy of biological endpoints can be improved. Using conformal prediction, a type of confidence predictor, we present a robust framework for the calculation of these profiles and the evaluation of their impact. We report on the outcomes from several approaches to generate the predicted profiles on 16 datasets in cytotoxicity and bioactivity and show that efficiency is improved the most when including the p-values from conformal prediction as bioactivity profiles.
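The abstract reports that conformal-prediction p-values work best as predicted bioactivity profiles. A minimal sketch of how such p-values can be obtained, assuming an inductive, class-conditional (Mondrian) conformal predictor whose nonconformity score is one minus the underlying classifier's predicted probability; this is a common choice, not necessarily the study's exact configuration, and both function names are hypothetical.

```python
from typing import Dict, List, Sequence


def conformal_p_value(calibration_scores: Sequence[float], test_score: float) -> float:
    """Non-smoothed conformal p-value.

    Counts calibration nonconformity scores at least as large as the test
    score; the +1 terms account for the test object itself.
    """
    n_at_least = sum(1 for s in calibration_scores if s >= test_score)
    return (n_at_least + 1) / (len(calibration_scores) + 1)


def mondrian_p_values(
    cal_probs: List[Dict[int, float]],
    cal_labels: List[int],
    test_probs: Dict[int, float],
) -> Dict[int, float]:
    """Class-conditional (Mondrian) p-values for one test compound.

    Each dict maps a class label to the underlying model's predicted
    probability; nonconformity is 1 - probability of the candidate class.
    """
    p_values: Dict[int, float] = {}
    for label, prob in test_probs.items():
        # Compare only with calibration compounds whose true class is `label`.
        cal_scores = [
            1.0 - probs[label]
            for probs, y in zip(cal_probs, cal_labels)
            if y == label
        ]
        p_values[label] = conformal_p_value(cal_scores, 1.0 - prob)
    return p_values


if __name__ == "__main__":
    cal_probs = [{0: 0.1, 1: 0.9}, {0: 0.4, 1: 0.6}, {0: 0.8, 1: 0.2}, {0: 0.6, 1: 0.4}]
    cal_labels = [1, 1, 0, 0]
    # One test compound; its per-class p-values could serve as extra features.
    print(mondrian_p_values(cal_probs, cal_labels, {0: 0.3, 1: 0.7}))
```

In a profile-based setup along the lines the abstract describes, one would repeat this for each auxiliary endpoint model and concatenate the resulting p-values onto the compound's descriptor for the target model.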
Affiliation(s)
- Ulf Norinder
- Department of Computer and Systems Sciences, Stockholm University, Box 7003, SE-164 07 Kista, Sweden
- Department of Pharmaceutical Biosciences, Uppsala University, Box 591, SE-75124 Uppsala, Sweden
- MTM Research Centre, School of Science and Technology, Örebro University, SE-70182 Örebro, Sweden
- Ola Spjuth
- Department of Pharmaceutical Biosciences, Uppsala University, Box 591, SE-75124 Uppsala, Sweden
- Science for Life Laboratory, Uppsala University, Box 591, SE-75124 Uppsala, Sweden
- Fredrik Svensson
- The Alzheimer's Research UK University College London Drug Discovery Institute, The Cruciform Building, Gower Street, WC1E 6BT London, U.K

28
Wang Y, Chen Z, Bian F, Shang L, Zhu K, Zhao Y. Advances of droplet-based microfluidics in drug discovery. Expert Opin Drug Discov 2020; 15:969-979. [DOI: 10.1080/17460441.2020.1758663] [Citation(s) in RCA: 12] [Impact Index Per Article: 3.0] [Indexed: 10/24/2022]
Affiliation(s)
- Yuetong Wang
- State Key Laboratory of Bioelectronics, School of Biological Science and Medical Engineering, Southeast University, Nanjing, China
- Zhuoyue Chen
- State Key Laboratory of Bioelectronics, School of Biological Science and Medical Engineering, Southeast University, Nanjing, China
- Feika Bian
- State Key Laboratory of Bioelectronics, School of Biological Science and Medical Engineering, Southeast University, Nanjing, China
- Luoran Shang
- Zhongshan-Xuhui Hospital, Fudan University, and the Shanghai Key Laboratory of Medical Epigenetics, Institutes of Biomedical Sciences, Fudan University, Shanghai, China
- Kaixuan Zhu
- School of Electrical and Information Engineering, Suzhou Institute of Technology, Jiangsu University of Science and Technology, Zhangjiagang, China
- Yuanjin Zhao
- State Key Laboratory of Bioelectronics, School of Biological Science and Medical Engineering, Southeast University, Nanjing, China

29
Réda C, Kaufmann E, Delahaye-Duriez A. Machine learning applications in drug development. Comput Struct Biotechnol J 2019; 18:241-252. [PMID: 33489002 PMCID: PMC7790737 DOI: 10.1016/j.csbj.2019.12.006] [Citation(s) in RCA: 66] [Impact Index Per Article: 13.2] [Received: 09/29/2019] [Revised: 12/10/2019] [Accepted: 12/10/2019] [Indexed: 02/07/2023]
Abstract
Given the huge amount of biological and medical data available today, along with well-established machine learning algorithms, the design of largely automated drug development pipelines can now be envisioned. These pipelines may guide, or speed up, drug discovery; provide a better understanding of diseases and associated biological phenomena; and help plan preclinical wet-lab experiments and even future clinical trials. This automation of the drug development process might be key to addressing the low productivity that pharmaceutical companies currently face. In this survey, we particularly focus on two classes of methods, sequential learning and recommender systems, which are active fields of biomedical research.
Affiliation(s)
- Clémence Réda
- NeuroDiderot, UMR 1141, Inserm, Université de Paris, Sorbonne Paris Cité, Hôpital Robert Debré, 48, boulevard Sérurier, Paris 75019, France
- Université Paris Diderot, Université de Paris, Sorbonne Paris Cité, 5, rue Thomas Mann, Paris 75013, France
- Emilie Kaufmann
- Univ. Lille, CNRS, Centrale Lille, Inria, UMR 9189 - CRIStAL - Centre de Recherche en Informatique Signal et Automatique de Lille, F-59000 Lille, France
- Andrée Delahaye-Duriez
- NeuroDiderot, UMR 1141, Inserm, Université de Paris, Sorbonne Paris Cité, Hôpital Robert Debré, 48, boulevard Sérurier, Paris 75019, France
- Université Paris 13, Sorbonne Paris Cité, UFR de santé, médecine et biologie humaine, Bobigny 93000, France
- Service histologie-embryologie-cytogénétique-biologie de la reproduction-CECOS, Hôpital Jean Verdier, AP-HP, Bondy 93140, France

30

31
Vamathevan J, Clark D, Czodrowski P, Dunham I, Ferran E, Lee G, Li B, Madabhushi A, Shah P, Spitzer M, Zhao S. Applications of machine learning in drug discovery and development. Nat Rev Drug Discov 2019; 18:463-477. [PMID: 30976107 DOI: 10.1038/s41573-019-0024-5] [Citation(s) in RCA: 979] [Impact Index Per Article: 195.8] [Indexed: 02/07/2023]
Abstract
Drug discovery and development pipelines are long, complex and depend on numerous factors. Machine learning (ML) approaches provide a set of tools that can improve discovery and decision making for well-specified questions with abundant, high-quality data. Opportunities to apply ML occur in all stages of drug discovery. Examples include target validation, identification of prognostic biomarkers and analysis of digital pathology data in clinical trials. Applications have ranged in context and methodology, with some approaches yielding accurate predictions and insights. The challenges of applying ML lie primarily with the lack of interpretability and repeatability of ML-generated results, which may limit their application. In all areas, systematic and comprehensive high-dimensional data still need to be generated. With ongoing efforts to tackle these issues, as well as increasing awareness of the factors needed to validate ML approaches, the application of ML can promote data-driven decision making and has the potential to speed up the process and reduce failure rates in drug discovery and development.
Affiliation(s)
- Jessica Vamathevan
- European Molecular Biology Laboratory, European Bioinformatics Institute, Cambridge, UK
- Dominic Clark
- European Molecular Biology Laboratory, European Bioinformatics Institute, Cambridge, UK
- Ian Dunham
- Open Targets and European Molecular Biology Laboratory, European Bioinformatics Institute, Cambridge, UK
- Edgardo Ferran
- European Molecular Biology Laboratory, European Bioinformatics Institute, Cambridge, UK
- George Lee
- Bristol-Myers Squibb, Princeton, NJ, USA
- Bin Li
- Takeda Pharmaceuticals International Co., Cambridge, MA, USA
- Anant Madabhushi
- Case Western Reserve University, Cleveland, OH, USA
- Louis Stokes Cleveland Veterans Affairs Medical Center, Cleveland, OH, USA
- Michaela Spitzer
- Open Targets and European Molecular Biology Laboratory, European Bioinformatics Institute, Cambridge, UK
- Shanrong Zhao
- Pfizer Worldwide Research and Development, Cambridge, MA, USA

32
David L, Arús-Pous J, Karlsson J, Engkvist O, Bjerrum EJ, Kogej T, Kriegl JM, Beck B, Chen H. Applications of Deep-Learning in Exploiting Large-Scale and Heterogeneous Compound Data in Industrial Pharmaceutical Research. Front Pharmacol 2019; 10:1303. [PMID: 31749705 PMCID: PMC6848277 DOI: 10.3389/fphar.2019.01303] [Citation(s) in RCA: 26] [Impact Index Per Article: 5.2] [Received: 08/07/2019] [Accepted: 10/14/2019] [Indexed: 12/21/2022]
Abstract
In recent years, the development of high-throughput screening (HTS) technologies and their establishment in an industrialized environment have given scientists the possibility to test millions of molecules and profile them against a multitude of biological targets in a short period of time, generating data at a much faster pace and with a higher quality than before. Besides the structure-activity data from traditional bioassays, more complex assays such as transcriptomics profiling or imaging have also been established as routine profiling experiments thanks to the advancement of Next Generation Sequencing or automated microscopy technologies. In industrial pharmaceutical research, these technologies are typically established in conjunction with automated platforms in order to enable efficient handling of screening collections of thousands to millions of compounds. To exploit the ever-growing amount of data that are generated by these approaches, computational techniques are constantly evolving. In this regard, artificial intelligence technologies such as deep learning and machine learning methods play a key role in cheminformatics and bio-image analytics fields to address activity prediction, scaffold hopping, de novo molecule design, reaction/retrosynthesis predictions, or high content screening analysis. Herein we summarize the current state of analyzing large-scale compound data in industrial pharmaceutical research and describe the impact it has had on the drug discovery process over the last two decades, with a specific focus on deep-learning technologies.
Affiliation(s)
- Laurianne David
- Hit Discovery, Discovery Sciences, Biopharmaceutical R&D, AstraZeneca, Gothenburg, Sweden
- Department of Life Science Informatics, B-IT, Rheinische Friedrich-Wilhelms-Universität Bonn, Bonn, Germany
- Josep Arús-Pous
- Hit Discovery, Discovery Sciences, Biopharmaceutical R&D, AstraZeneca, Gothenburg, Sweden
- Department of Chemistry and Biochemistry, University of Bern, Bern, Switzerland
- Johan Karlsson
- Quantitative Biology, Discovery Sciences, Biopharmaceutical R&D, AstraZeneca, Gothenburg, Sweden
- Ola Engkvist
- Hit Discovery, Discovery Sciences, Biopharmaceutical R&D, AstraZeneca, Gothenburg, Sweden
- Esben Jannik Bjerrum
- Hit Discovery, Discovery Sciences, Biopharmaceutical R&D, AstraZeneca, Gothenburg, Sweden
- Thierry Kogej
- Hit Discovery, Discovery Sciences, Biopharmaceutical R&D, AstraZeneca, Gothenburg, Sweden
- Jan M. Kriegl
- Department of Medicinal Chemistry, Boehringer Ingelheim Pharma GmbH & Co. KG, Biberach an der Riss, Germany
- Bernd Beck
- Department of Medicinal Chemistry, Boehringer Ingelheim Pharma GmbH & Co. KG, Biberach an der Riss, Germany
- Hongming Chen
- Hit Discovery, Discovery Sciences, Biopharmaceutical R&D, AstraZeneca, Gothenburg, Sweden
- Chemistry and Chemical Biology Centre, Guangzhou Regenerative Medicine and Health – Guangdong Laboratory, Guangzhou, China

33
Lu Y, Anand S, Shirley W, Gedeck P, Kelley BP, Skolnik S, Rodde S, Nguyen M, Lindvall M, Jia W. Prediction of pKa Using Machine Learning Methods with Rooted Topological Torsion Fingerprints: Application to Aliphatic Amines. J Chem Inf Model 2019; 59:4706-4719. [DOI: 10.1021/acs.jcim.9b00498] [Citation(s) in RCA: 15] [Impact Index Per Article: 3.0] [Indexed: 11/27/2022]
Affiliation(s)
- Yipin Lu
- Novartis Institutes for Biomedical Research, 5300 Chiron Way, Emeryville, California 94608, United States
- Shankara Anand
- Novartis Institutes for Biomedical Research, 5300 Chiron Way, Emeryville, California 94608, United States
- William Shirley
- Novartis Institutes for Biomedical Research, 5300 Chiron Way, Emeryville, California 94608, United States
- Peter Gedeck
- Novartis Institutes for Biomedical Research, 5300 Chiron Way, Emeryville, California 94608, United States
- Brian P. Kelley
- Novartis Institutes for Biomedical Research, 250 Massachusetts Avenue, Cambridge, Massachusetts 02139, United States
- Suzanne Skolnik
- Novartis Institutes for Biomedical Research, 250 Massachusetts Avenue, Cambridge, Massachusetts 02139, United States
- Stephane Rodde
- Novartis Institutes for Biomedical Research, Postfach, CH-4002 Basel, Switzerland
- Mai Nguyen
- Novartis Institutes for Biomedical Research, 5300 Chiron Way, Emeryville, California 94608, United States
- Mika Lindvall
- Novartis Institutes for Biomedical Research, 5300 Chiron Way, Emeryville, California 94608, United States
- Weiping Jia
- Novartis Institutes for Biomedical Research, 5300 Chiron Way, Emeryville, California 94608, United States

34
Laufkötter O, Sturm N, Bajorath J, Chen H, Engkvist O. Combining structural and bioactivity-based fingerprints improves prediction performance and scaffold hopping capability. J Cheminform 2019; 11:54. [PMID: 31396716 PMCID: PMC6686534 DOI: 10.1186/s13321-019-0376-1] [Citation(s) in RCA: 18] [Impact Index Per Article: 3.6] [Received: 02/19/2019] [Accepted: 07/31/2019] [Indexed: 11/29/2022]
Abstract
This study aims to improve upon existing activity prediction methods by augmenting chemical structure fingerprints with bioactivity-based fingerprints derived from high-throughput screening (HTS) data (HTSFPs), thereby showcasing the benefits of combining different descriptor types. This type of descriptor would be applied in an iterative screening scenario for more targeted compound set selection. The HTSFPs were generated from HTS data obtained from PubChem and combined with an ECFP4 structural fingerprint. The bioactivity-structure hybrid (BaSH) fingerprint was benchmarked against the individual ECFP4 and HTSFP fingerprints. Their performance was evaluated via retrospective analysis of a subset of the PubChem HTS data. Results showed that the BaSH fingerprint has improved predictive performance as well as scaffold hopping capability. The BaSH fingerprint identified unique compounds compared to both the ECFP4 and the HTSFP fingerprint, indicating synergistic effects between the two fingerprints. A feature importance analysis showed that a small subset of the HTSFP features contributes most to the overall performance of the BaSH fingerprint. This hybrid approach allows for activity prediction of compounds with only sparse HTSFPs due to the supporting effect of the structural fingerprint.
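The abstract describes the BaSH fingerprint as an ECFP4 structural fingerprint combined with HTS-derived bioactivity features, and notes that it still supports compounds with sparse HTS profiles. A minimal sketch of one way to build such a hybrid descriptor, assuming plain concatenation and imputation of untested assays with a constant; both are illustrative assumptions rather than the published recipe, and `hybrid_fingerprint` is a hypothetical name. In practice the ECFP4-like bits would come from a cheminformatics toolkit such as RDKit (Morgan fingerprint, radius 2).

```python
import numpy as np


def hybrid_fingerprint(structural_bits, hts_values, fill_value: float = 0.0) -> np.ndarray:
    """Concatenate a structural bit vector with an HTS-derived bioactivity vector.

    `hts_values` holds one entry per HTS assay, with NaN for assays in which the
    compound was not tested; NaNs are replaced by `fill_value` so that every
    compound receives a descriptor of the same length.
    """
    structural = np.asarray(structural_bits, dtype=float)
    bioactivity = np.nan_to_num(np.asarray(hts_values, dtype=float), nan=fill_value)
    return np.concatenate([structural, bioactivity])


if __name__ == "__main__":
    ecfp_bits = [1, 0, 1, 1]           # toy stand-in for a folded ECFP4 bit vector
    hts_profile = [0.8, np.nan, -1.2]  # toy per-assay scores; NaN = untested
    print(hybrid_fingerprint(ecfp_bits, hts_profile))
```

Because the structural block is always fully populated, a model trained on the concatenated vector can still rank compounds whose HTS block is mostly imputed, which is the "supporting effect" the abstract refers to.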
Affiliation(s)
- Oliver Laufkötter
- Hit Discovery, Discovery Sciences, R&D, AstraZeneca, Gothenburg, Sweden
- Department of Life Science Informatics, B-IT, LIMES Program Unit Chemical Biology and Medicinal Chemistry, Rheinische Friedrich-Wilhelms-Universität, Bonn, Germany
- Noé Sturm
- Hit Discovery, Discovery Sciences, R&D, AstraZeneca, Gothenburg, Sweden
- Jürgen Bajorath
- Department of Life Science Informatics, B-IT, LIMES Program Unit Chemical Biology and Medicinal Chemistry, Rheinische Friedrich-Wilhelms-Universität, Bonn, Germany
- Hongming Chen
- Hit Discovery, Discovery Sciences, R&D, AstraZeneca, Gothenburg, Sweden
- Ola Engkvist
- Hit Discovery, Discovery Sciences, R&D, AstraZeneca, Gothenburg, Sweden

35
Predicting kinase inhibitors using bioactivity matrix derived informer sets. PLoS Comput Biol 2019; 15:e1006813. [PMID: 31381559 PMCID: PMC6695194 DOI: 10.1371/journal.pcbi.1006813] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.4] [Received: 01/24/2019] [Revised: 08/15/2019] [Accepted: 07/13/2019] [Indexed: 12/21/2022]
Abstract
Prediction of compounds that are active against a desired biological target is a common step in drug discovery efforts. Virtual screening methods seek some active-enriched fraction of a library for experimental testing. Where data are too scarce to train supervised learning models for compound prioritization, initial screening must provide the necessary data. Commonly, such an initial library is selected on the basis of chemical diversity by some pseudo-random process (for example, the first few plates of a larger library) or by selecting an entire smaller library. These approaches may not produce a sufficient number or diversity of actives. An alternative approach is to select an informer set of screening compounds on the basis of chemogenomic information from previous testing of compounds against a large number of targets. We compare different ways of using chemogenomic data to choose a small informer set of compounds based on previously measured bioactivity data. We develop this Informer-Based-Ranking (IBR) approach using the Published Kinase Inhibitor Sets (PKIS) as the chemogenomic data to select the informer sets. We test the informer compounds on a target that is not part of the chemogenomic data, then predict the activity of the remaining compounds based on the experimental informer data and the chemogenomic data. Through new chemical screening experiments, we demonstrate the utility of IBR strategies in a prospective test on three kinase targets not included in the PKIS. In the early stages of drug discovery efforts, computational models are used to predict activity and prioritize compounds for experimental testing. New targets commonly lack the data necessary to build effective models, and the screening needed to generate that experimental data can be costly. We seek to improve the efficiency of the initial screening phase, and of the process of prioritizing compounds for subsequent screening. 
We choose a small informer set of compounds based on publicly available prior screening data on distinct targets. We then collect experimental data on these informer compounds and use that data to predict the activity of other compounds in the set for the target of interest. Computational and statistical tools are needed to identify informer compounds and to prioritize other compounds for subsequent phases of screening. We find that selection of informer compounds on the basis of bioactivity data from previous screening efforts is superior to the traditional approach of selection of a chemically diverse subset of compounds. We demonstrate the success of this approach in retrospective tests on the Published Kinase Inhibitor Sets (PKIS) chemogenomic data and in prospective experimental screens against three additional non-human kinase targets.

36
Jansen JM, De Pascale G, Fong S, Lindvall M, Moser HE, Pfister K, Warne B, Wartchow C. Biased Complement Diversity Selection for Effective Exploration of Chemical Space in Hit-Finding Campaigns. J Chem Inf Model 2019; 59:1709-1714. [DOI: 10.1021/acs.jcim.9b00048] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.2] [Indexed: 01/02/2023]
Affiliation(s)
- Johanna M. Jansen
- Novartis Institutes for BioMedical Research, 5300 Chiron Way, Emeryville, California 94608, United States
- Gianfranco De Pascale
- Novartis Institutes for BioMedical Research, 5300 Chiron Way, Emeryville, California 94608, United States
- Susan Fong
- Novartis Institutes for BioMedical Research, 5300 Chiron Way, Emeryville, California 94608, United States
- Mika Lindvall
- Novartis Institutes for BioMedical Research, 5300 Chiron Way, Emeryville, California 94608, United States
- Heinz E. Moser
- Novartis Institutes for BioMedical Research, 5300 Chiron Way, Emeryville, California 94608, United States
- Keith Pfister
- Novartis Institutes for BioMedical Research, 5300 Chiron Way, Emeryville, California 94608, United States
- Bob Warne
- Novartis Institutes for BioMedical Research, 5300 Chiron Way, Emeryville, California 94608, United States
- Charles Wartchow
- Novartis Institutes for BioMedical Research, 5300 Chiron Way, Emeryville, California 94608, United States

37
Sturm N, Sun J, Vandriessche Y, Mayr A, Klambauer G, Carlsson L, Engkvist O, Chen H. Application of Bioactivity Profile-Based Fingerprints for Building Machine Learning Models. J Chem Inf Model 2018; 59:962-972. [DOI: 10.1021/acs.jcim.8b00550] [Citation(s) in RCA: 14] [Impact Index Per Article: 2.3] [Indexed: 01/22/2023]
Affiliation(s)
- Noé Sturm
- Hit Discovery, Discovery Sciences, IMED Biotech Unit, AstraZeneca, Pepparedsleden 1, 43153 Mölndal, Sweden
- Jiangming Sun
- Hit Discovery, Discovery Sciences, IMED Biotech Unit, AstraZeneca, Pepparedsleden 1, 43153 Mölndal, Sweden
- Yves Vandriessche
- Intel Corporation, Data Center Group, Veldkant 31, 2550 Kontich, Belgium
- Andreas Mayr
- LIT AI Lab & Institute for Machine Learning, Johannes Kepler University Linz, Altenbergerstr 69, 4040 Linz, Austria
- Günter Klambauer
- LIT AI Lab & Institute for Machine Learning, Johannes Kepler University Linz, Altenbergerstr 69, 4040 Linz, Austria
- Lars Carlsson
- Quantitative Biology, Discovery Sciences, IMED Biotech Unit, AstraZeneca, Pepparedsleden 1, 43153 Mölndal, Sweden
- Ola Engkvist
- Hit Discovery, Discovery Sciences, IMED Biotech Unit, AstraZeneca, Pepparedsleden 1, 43153 Mölndal, Sweden
- Hongming Chen
- Hit Discovery, Discovery Sciences, IMED Biotech Unit, AstraZeneca, Pepparedsleden 1, 43153 Mölndal, Sweden

38
Mason DJ, Eastman RT, Lewis RPI, Stott IP, Guha R, Bender A. Using Machine Learning to Predict Synergistic Antimalarial Compound Combinations With Novel Structures. Front Pharmacol 2018; 9:1096. [PMID: 30333748 PMCID: PMC6176478 DOI: 10.3389/fphar.2018.01096] [Citation(s) in RCA: 21] [Impact Index Per Article: 3.5] [Received: 05/02/2018] [Accepted: 09/07/2018] [Indexed: 01/28/2023]
Abstract
The parasite Plasmodium falciparum is the most lethal species of Plasmodium to cause serious malaria infection in humans, and with resistance developing rapidly, novel treatment modalities are currently being sought, one of which is combinations of existing compounds. The discovery of combinations of antimalarial drugs that act synergistically with one another is hence of great importance; however, an exhaustive experimental screen of large drug space in a pairwise manner is not an option. In this study we apply our machine learning approach, Combination Synergy Estimation (CoSynE), which can predict novel synergistic drug interactions using only prior experimental combination screening data and knowledge of compound molecular structures, to a dataset of 1,540 antimalarial drug combinations in which 22.2% were synergistic. Cross validation of our model showed that synergistic CoSynE predictions are enriched 2.74 × compared to random selection when both compounds in a predicted combination are known from other combinations among the training data, 2.36 × when only one compound is known from the training data, and 1.5 × for entirely novel combinations. We prospectively validated our model by making predictions for 185 combinations of 23 entirely novel compounds. CoSynE predicted 20 combinations to be synergistic, which was experimentally validated for nine of them (45%), corresponding to an enrichment of 1.70 × compared to random selection from this prospective data set. Such enrichment corresponds to a 41% reduction in experimental effort. Interestingly, we found that pairwise screening of the compounds CoSynE individually predicted to be synergistic would result in an enrichment of 1.36 × compared to random selection, indicating that synergy among compound combinations is not a random event.
The nine novel and correctly predicted synergistic compound combinations mainly (where sufficient bioactivity information is available) consist of efflux or transporter inhibitors (such as hydroxyzine), combined with compounds exhibiting antimalarial activity alone (such as sorafenib, apicidin, or dihydroergotamine). However, not all compound synergies could be rationalized easily in this way. Overall, this study highlights the potential for predictive modeling to expedite the discovery of novel drug combinations in the fight against antimalarial resistance, while the underlying approach is also generally applicable.
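The 41% figure follows from the 1.70× enrichment: if random selection needs 1.70 times as many experiments to find the same number of synergistic pairs, the saving is 1 − 1/1.70 ≈ 0.41. A small Python check of that arithmetic; the abstract does not state the base synergy rate of the prospective set, so the value below is only what 45% precision and 1.70× enrichment imply, and both function names are hypothetical.

```python
def enrichment(hits: int, n_selected: int, base_rate: float) -> float:
    """Hit rate among the selected combinations relative to random selection."""
    return (hits / n_selected) / base_rate


def effort_reduction(enrichment_factor: float) -> float:
    """Fraction of experiments saved when finding the same number of hits.

    Random selection needs `enrichment_factor` times as many tests, so the
    saving is 1 - 1/enrichment_factor.
    """
    return 1.0 - 1.0 / enrichment_factor


if __name__ == "__main__":
    precision = 9 / 20                   # 9 of 20 predicted synergies confirmed
    implied_base_rate = precision / 1.70  # implied, not reported in the abstract
    print(round(implied_base_rate, 3))       # 0.265
    print(round(effort_reduction(1.70), 3))  # 0.412
```

The same relation applies to the cross-validation enrichments: for example, 2.74× would correspond to roughly 1 − 1/2.74 ≈ 0.64 of the screening effort saved, under the same assumption that random selection is the baseline.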
Affiliation(s)
- Daniel J Mason
- Department of Chemistry, Centre for Molecular Informatics, University of Cambridge, Cambridge, United Kingdom
- Healx Ltd., Cambridge, United Kingdom
- Richard T Eastman
- Division of Preclinical Innovation, National Center for Advancing Translational Sciences, National Institutes of Health, Rockville, MD, United States
- Richard P I Lewis
- Department of Chemistry, Centre for Molecular Informatics, University of Cambridge, Cambridge, United Kingdom
- Ian P Stott
- Unilever Research and Development, Wirral, United Kingdom
- Rajarshi Guha
- Division of Preclinical Innovation, National Center for Advancing Translational Sciences, National Institutes of Health, Rockville, MD, United States
- Andreas Bender
- Department of Chemistry, Centre for Molecular Informatics, University of Cambridge, Cambridge, United Kingdom
39
Paricharak S, Méndez-Lucio O, Chavan Ravindranath A, Bender A, IJzerman AP, van Westen GJP. Data-driven approaches used for compound library design, hit triage and bioactivity modeling in high-throughput screening. Brief Bioinform 2018; 19:277-285. [PMID: 27789427 PMCID: PMC6018726 DOI: 10.1093/bib/bbw105] [Citation(s) in RCA: 15] [Impact Index Per Article: 2.5] [Received: 07/19/2016] [Revised: 09/26/2016] [Indexed: 12/25/2022]
Abstract
High-throughput screening (HTS) campaigns are routinely performed in pharmaceutical companies to explore the activity profiles of chemical libraries and identify promising candidates for further investigation. With the aim of improving hit rates in these campaigns, data-driven approaches have been used to design relevant compound screening collections, enable effective hit triage and perform activity modeling for compound prioritization. Remarkable progress has been made in activity modeling since the recent introduction of large-scale bioactivity-based compound similarity metrics. This is evidenced by increased hit rates in iterative screening strategies and by novel insights into compound mode of action obtained through activity modeling. Here, we provide an overview of developments in data-driven approaches, elaborate on the novel activity modeling techniques and screening paradigms explored, and outline their significance in HTS.
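The HTS fingerprint is one widely used form of bioactivity-based compound similarity. Each compound is described by its active/inactive outcomes across historical assays, and two compounds are compared with a Tanimoto coefficient. The review does not prescribe this exact formulation. The minimal Python illustration below assumes binary outcomes and compares compounds only on assays in which both were tested. The assay IDs are hypothetical.

```python
from typing import Dict

# Hypothetical activity outcomes: assay ID -> True (active) or False (inactive).
# Assays in which a compound was not tested are simply absent from its profile.
Profile = Dict[str, bool]


def hts_fp_similarity(a: Profile, b: Profile) -> float:
    """Tanimoto similarity of two activity profiles over the assays both compounds were tested in."""
    shared = a.keys() & b.keys()
    actives_a = {assay for assay in shared if a[assay]}
    actives_b = {assay for assay in shared if b[assay]}
    union = actives_a | actives_b
    if not union:
        return 0.0  # neither compound is active in any shared assay: no evidence of similarity
    return len(actives_a & actives_b) / len(union)


cpd_x = {"A1": True, "A2": False, "A3": True, "A4": True}
cpd_y = {"A1": True, "A2": False, "A3": False, "A5": True}
print(hts_fp_similarity(cpd_x, cpd_y))  # 0.5: shared assays A1-A3, actives {A1, A3} vs {A1}
```

Restricting the comparison to shared assays keeps untested assays from being counted as inactive. This matters because historical screening matrices are sparse.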
Affiliation(s)
- Shardul Paricharak
- Centre for Molecular Informatics, Department of Chemistry, University of Cambridge, Lensfield Road, Cambridge, United Kingdom
- Division of Medicinal Chemistry, Leiden Academic Centre for Drug Research, Leiden University, RA Leiden, The Netherlands
- Oscar Méndez-Lucio
- Centre for Molecular Informatics, Department of Chemistry, University of Cambridge, Lensfield Road, Cambridge, United Kingdom
- Facultad de Química, Departamento de Farmacia, Universidad Nacional Autónoma de México, Avenida Universidad 3000, Mexico City, Mexico
- Aakash Chavan Ravindranath
- Centre for Molecular Informatics, Department of Chemistry, University of Cambridge, Lensfield Road, Cambridge, United Kingdom
- Andreas Bender
- Centre for Molecular Informatics, Department of Chemistry, University of Cambridge, Lensfield Road, Cambridge, United Kingdom
- Adriaan P IJzerman
- Division of Medicinal Chemistry, Leiden Academic Centre for Drug Research, Leiden University, RA Leiden, The Netherlands
- Gerard J P van Westen
- Division of Medicinal Chemistry, Leiden Academic Centre for Drug Research, Leiden University, RA Leiden, The Netherlands
40
Cortes Cabrera A, Petrone PM. Optimal HTS Fingerprint Definitions by Using a Desirability Function and a Genetic Algorithm. J Chem Inf Model 2018; 58:641-646. [DOI: 10.1021/acs.jcim.7b00447] [Citation(s) in RCA: 12] [Impact Index Per Article: 2.0] [Indexed: 12/29/2022]
Affiliation(s)
- Alvaro Cortes Cabrera
- GSK Medicines Research Centre, Gunnels Wood Road, Stevenage, Hertfordshire SG1 2NY, U.K.
- Paula M. Petrone
- BarcelonaBeta Brain Research Center, Carrer de Wellington, 30, 08005 Barcelona, Spain
41
42
Pertusi DA, O’Donnell G, Homsher MF, Solly K, Patel A, Stahler SL, Riley D, Finley MF, Finger EN, Adam GC, Meng J, Bell DJ, Zuck PD, Hudak EM, Weber MJ, Nothstein JE, Locco L, Quinn C, Amoss A, Squadroni B, Hartnett M, Heo MR, White T, May SA, Boots E, Roberts K, Cocchiarella P, Wolicki A, Kreamer A, Kutchukian PS, Wassermann AM, Uebele VN, Glick M, Rusinko A, Culberson JC. Prospective Assessment of Virtual Screening Heuristics Derived Using a Novel Fusion Score. SLAS Discov 2017; 22:995-1006. [DOI: 10.1177/2472555217706058] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.4] [Indexed: 12/12/2022]
Abstract
High-throughput screening (HTS) is a widespread method in early drug discovery for identifying promising chemical matter that modulates a target or phenotype of interest. Because HTS campaigns involve screening millions of compounds, it is often desirable to begin screening with a subset of the full collection. Virtual screening methods then prioritize likely active compounds in the remaining collection in an iterative process. In this approach, orthogonal virtual screening methods are often applied, so the hits prioritized by the different methods must themselves be prioritized. Here, we introduce a novel method of fusing these prioritizations and benchmark it prospectively on 17 screening campaigns using virtual screening methods in three descriptor spaces. We found that the fusion approach retrieves 15% to 65% more active chemical series than any single machine-learning method and that appropriately weighting the contributions of similarity and machine-learning scoring techniques can increase enrichment by 1% to 19%. We also use fusion scoring to evaluate the tradeoff between screening more chemical matter initially and screening replicate samples to prevent false positives, and find that the former option leads to the retrieval of more active chemical series. These results provide guidelines that can increase the rate of identification of promising active compounds in future iterative screens.
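The abstract does not give the fusion formula. As a rough illustration of fusing prioritizations from orthogonal methods, the Python sketch below substitutes a different technique: a weighted average of percentile ranks. Adjustable weights stand in for the similarity versus machine-learning weighting that the authors tune. This is not the authors' fusion score, and the scores are made up.

```python
from typing import Dict, List


def percentile_ranks(scores: Dict[str, float]) -> Dict[str, float]:
    """Map each compound to a rank-based value in (0, 1]; a higher method score gives a value closer to 1."""
    ordered = sorted(scores, key=scores.get)  # ascending by method score
    n = len(ordered)
    return {cpd: (i + 1) / n for i, cpd in enumerate(ordered)}


def fuse(method_scores: List[Dict[str, float]], weights: List[float]) -> List[str]:
    """Weighted average of per-method percentile ranks; returns compounds best-first.

    Assumes every method scores the same set of compounds.
    """
    total = sum(weights)
    fused: Dict[str, float] = {}
    for scores, w in zip(method_scores, weights):
        for cpd, r in percentile_ranks(scores).items():
            fused[cpd] = fused.get(cpd, 0.0) + w * r / total
    return sorted(fused, key=fused.get, reverse=True)


similarity = {"c1": 0.9, "c2": 0.4, "c3": 0.7}  # hypothetical similarity-search scores
ml_model = {"c1": 0.2, "c2": 0.8, "c3": 0.6}    # hypothetical machine-learning scores
print(fuse([similarity, ml_model], [0.7, 0.3]))  # ['c1', 'c3', 'c2']
print(fuse([similarity, ml_model], [0.3, 0.7]))  # ['c2', 'c3', 'c1']
```

Working on ranks avoids comparing similarity and model scores that live on different scales. Ties in the raw scores are broken arbitrarily here, which a real implementation would need to handle.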
Affiliation(s)
- Dante A. Pertusi
- Modeling and Informatics, Merck & Co., Inc., West Point, PA, USA
- Gregory O’Donnell
- Screening and Protein Sciences, Merck & Co., Inc., North Wales, PA, USA
- Merck & Co., Inc., West Point, PA, USA
- Michelle F. Homsher
- Screening and Protein Sciences, Merck & Co., Inc., North Wales, PA, USA
- Merck & Co., Inc., West Point, PA, USA
- Kelli Solly
- Screening and Protein Sciences, Merck & Co., Inc., North Wales, PA, USA
- Merck & Co., Inc., West Point, PA, USA
- Amita Patel
- Screening and Protein Sciences, Merck & Co., Inc., North Wales, PA, USA
- Merck & Co., Inc., West Point, PA, USA
- Shannon L. Stahler
- Screening and Protein Sciences, Merck & Co., Inc., North Wales, PA, USA
- Merck & Co., Inc., West Point, PA, USA
- Daniel Riley
- Screening and Protein Sciences, Merck & Co., Inc., North Wales, PA, USA
- Merck & Co., Inc., West Point, PA, USA
- Michael F. Finley
- Screening and Protein Sciences, Merck & Co., Inc., North Wales, PA, USA
- Discovery Sciences, Janssen Research and Development LLC, Spring House, PA, USA
- Eleftheria N. Finger
- Screening and Protein Sciences, Merck & Co., Inc., North Wales, PA, USA
- Discovery & Preclinical Development, GlaxoSmithKline, Collegeville, PA, USA
- Gregory C. Adam
- Screening and Protein Sciences, Merck & Co., Inc., North Wales, PA, USA
- Merck & Co., Inc., West Point, PA, USA
- Juncai Meng
- Screening and Protein Sciences, Merck & Co., Inc., North Wales, PA, USA
- David J. Bell
- Screening and Protein Sciences, Merck & Co., Inc., North Wales, PA, USA
- Merck & Co., Inc., North Wales, PA, USA
- Paul D. Zuck
- Merck & Co., Inc., North Wales, PA, USA
- Automation and Engineering, Merck & Co., Inc., North Wales, PA, USA
- Edward M. Hudak
- Discovery Sample Management, Merck & Co., Inc., North Wales, PA, USA
- Michael J. Weber
- Automation and Engineering, Merck & Co., Inc., North Wales, PA, USA
- Jennifer E. Nothstein
- Merck & Co., Inc., West Point, PA, USA
- Automation and Engineering, Merck & Co., Inc., North Wales, PA, USA
- Louis Locco
- Automation and Engineering, Merck & Co., Inc., North Wales, PA, USA
- Carissa Quinn
- Discovery Sciences, Janssen Research and Development LLC, Spring House, PA, USA
- Automation and Engineering, Merck & Co., Inc., North Wales, PA, USA
- Adam Amoss
- Automation and Engineering, Merck & Co., Inc., North Wales, PA, USA
- Brian Squadroni
- Merck & Co., Inc., West Point, PA, USA
- Automation and Engineering, Merck & Co., Inc., North Wales, PA, USA
- Michelle Hartnett
- Discovery Sciences, Janssen Research and Development LLC, Spring House, PA, USA
- Automation and Engineering, Merck & Co., Inc., North Wales, PA, USA
- Mee Ra Heo
- Screening and Protein Sciences, Merck & Co., Inc., North Wales, PA, USA
- Merck & Co., Inc., North Wales, PA, USA
- Tara White
- Discovery Sample Management, Merck & Co., Inc., North Wales, PA, USA
- S. Alex May
- Automation and Engineering, Merck & Co., Inc., North Wales, PA, USA
- Evelyn Boots
- Screening and Protein Sciences, Merck & Co., Inc., North Wales, PA, USA
- Kenneth Roberts
- Automation and Engineering, Merck & Co., Inc., North Wales, PA, USA
- Alex Wolicki
- Screening and Protein Sciences, Merck & Co., Inc., North Wales, PA, USA
- Anthony Kreamer
- Screening and Protein Sciences, Merck & Co., Inc., North Wales, PA, USA
- Merck & Co., Inc., Kenilworth, NJ, USA
- Victor N. Uebele
- Screening and Protein Sciences, Merck & Co., Inc., North Wales, PA, USA
- Merck & Co., Inc., North Wales, PA, USA
- Meir Glick
- Modeling and Informatics, Merck & Co., Inc., Boston, MA, USA
- Andrew Rusinko
- Modeling and Informatics, Merck & Co., Inc., West Point, PA, USA
43
Kutchukian PS, Warren L, Magliaro BC, Amoss A, Cassaday JA, O’Donnell G, Squadroni B, Zuck P, Pascarella D, Culberson JC, Cooke AJ, Hurzy D, Schlegel KAS, Thomson F, Johnson EN, Uebele VN, Hermes JD, Parmentier-Batteur S, Finley M. Iterative Focused Screening with Biological Fingerprints Identifies Selective Asc-1 Inhibitors Distinct from Traditional High Throughput Screening. ACS Chem Biol 2017; 12:519-527. [PMID: 28032990 DOI: 10.1021/acschembio.6b00913] [Citation(s) in RCA: 14] [Impact Index Per Article: 2.0] [Indexed: 12/27/2022]
Abstract
N-methyl-D-aspartate receptors (NMDARs) mediate glutamatergic signaling that is critical to cognitive processes in the central nervous system, and NMDAR hypofunction is thought to contribute to the cognitive impairment observed in both schizophrenia and Alzheimer's disease. One approach to enhancing NMDAR function is to increase the concentration of an NMDAR coagonist, such as glycine or D-serine, in the synaptic cleft. Inhibition of alanine-serine-cysteine transporter-1 (Asc-1), the primary transporter of D-serine, is attractive because the transporter is localized to neurons in brain regions critical to cognitive function, including the hippocampus and cortical layers III and IV, and is colocalized with D-serine and NMDARs. To identify novel Asc-1 inhibitors, two different screening approaches were performed with whole-cell amino acid uptake in heterologous cells stably expressing human Asc-1: (1) a high-throughput screen (HTS) of 3 million compounds measuring [³⁵S]L-cysteine uptake into cells attached to scintillation proximity assay beads in a 1536-well format and (2) an iterative focused screen (IFS) of a 45 000-compound diversity set using a [³H]D-serine uptake assay with a liquid scintillation plate reader in a 384-well format. Counter screens that removed nonspecific inhibitors of radioactive amino acid uptake were critically important for both screening approaches. Furthermore, a 15 000-compound expansion step, which incorporated both on- and off-target data into chemical and biological fingerprint-based models for the selection of additional hits, enabled the identification of novel Asc-1-selective chemical matter from the IFS that was not identified in the full-collection HTS.
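The expansion step can be pictured as two moves. First, discard primary hits that also hit the counter screen. Then pick untested compounds that resemble the remaining selective hits. The authors used chemical and biological fingerprint-based models. The Python sketch below substitutes a simpler nearest-neighbour rule (maximum Tanimoto similarity to any selective hit), and its fingerprints are hypothetical sets of on-bit indices. It therefore shows only the shape of the workflow, not the published models.

```python
from typing import Dict, FrozenSet, List, Set

Fingerprint = FrozenSet[int]  # hypothetical: indices of "on" bits


def tanimoto(a: Fingerprint, b: Fingerprint) -> float:
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def select_expansion(
    primary_hits: Set[str],
    counter_screen_hits: Set[str],
    fps: Dict[str, Fingerprint],
    untested: List[str],
    k: int,
) -> List[str]:
    """Pick the k untested compounds most similar to any selective hit."""
    selective = primary_hits - counter_screen_hits  # drop nonspecific uptake inhibitors
    if not selective:
        return []

    def best_sim(cpd: str) -> float:
        return max(tanimoto(fps[cpd], fps[h]) for h in selective)

    return sorted(untested, key=best_sim, reverse=True)[:k]


fps = {
    "h1": frozenset({1, 2, 3}), "h2": frozenset({7, 8}),  # primary hits
    "u1": frozenset({1, 2, 4}), "u2": frozenset({7, 8, 9}), "u3": frozenset({5, 6}),
}
# h2 also inhibits uptake in the counter screen, so only h1 guides the expansion
print(select_expansion({"h1", "h2"}, {"h2"}, fps, ["u1", "u2", "u3"], k=1))  # ['u1']
```

Without the counter-screen filter, the nonspecific hit h2 would pull its own neighbour u2 to the top, which is the failure mode the counter screens guard against.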
Affiliation(s)
- Peter S. Kutchukian
- Modeling and Informatics, Merck & Co., Inc., MRL, Boston, Massachusetts, United States
- Lee Warren
- Neuroscience, Merck & Co., Inc., MRL, West Point, Pennsylvania, United States
- Brian C. Magliaro
- Pharmacology, Merck & Co., Inc., MRL, West Point, Pennsylvania, United States
- Adam Amoss
- Screening and Protein Sciences, Merck & Co., Inc., MRL, North Wales, Pennsylvania, United States
- Jason A. Cassaday
- Screening and Protein Sciences, Merck & Co., Inc., MRL, North Wales, Pennsylvania, United States
- Gregory O’Donnell
- Screening and Protein Sciences, Merck & Co., Inc., MRL, North Wales, Pennsylvania, United States
- Brian Squadroni
- Screening and Protein Sciences, Merck & Co., Inc., MRL, North Wales, Pennsylvania, United States
- Paul Zuck
- Screening and Protein Sciences, Merck & Co., Inc., MRL, North Wales, Pennsylvania, United States
- Danette Pascarella
- Pharmacology, Merck & Co., Inc., MRL, West Point, Pennsylvania, United States
- J. Chris Culberson
- Modeling and Informatics, Merck & Co., Inc., MRL, West Point, Pennsylvania, United States
- Andrew J. Cooke
- Chemistry, Merck & Co., Inc., MRL, West Point, Pennsylvania, United States
- Danielle Hurzy
- Chemistry, Merck & Co., Inc., MRL, West Point, Pennsylvania, United States
- Fiona Thomson
- Neuroscience, Merck & Co., Inc., MRL, West Point, Pennsylvania, United States
- Eric N. Johnson
- Screening and Protein Sciences, Merck & Co., Inc., MRL, North Wales, Pennsylvania, United States
- Victor N. Uebele
- Screening and Protein Sciences, Merck & Co., Inc., MRL, North Wales, Pennsylvania, United States
- Jeffrey D. Hermes
- Screening and Protein Sciences, Merck & Co., Inc., MRL, North Wales, Pennsylvania, United States
- Michael Finley
- Screening and Protein Sciences, Merck & Co., Inc., MRL, North Wales, Pennsylvania, United States
44
Merget B, Turk S, Eid S, Rippmann F, Fulle S. Profiling Prediction of Kinase Inhibitors: Toward the Virtual Assay. J Med Chem 2016; 60:474-485. [PMID: 27966949 DOI: 10.1021/acs.jmedchem.6b01611] [Citation(s) in RCA: 73] [Impact Index Per Article: 9.1] [Indexed: 12/26/2022]
Abstract
Kinome-wide screening would have the advantage of providing structure-activity relationships against hundreds of targets simultaneously. Here, we report the generation of ligand-based activity prediction models for over 280 kinases by applying machine learning methods to an extensive data set of proprietary bioactivity data combined with open data. High model quality (AUC > 0.7) was achieved for ∼200 kinases by (1) combining open with proprietary data, (2) choosing random forest over the alternative machine learning methods tested, and (3) balancing the training data sets. Tests on left-out and external data indicate high value for virtual screening projects. Importantly, the derived models are evenly distributed across the kinome tree, allowing reliable profiling prediction for all kinase branches. The prediction quality was further improved by employing experimental bioactivity fingerprints of a small kinase subset. Overall, the generated models can support various hit identification tasks, including virtual screening, compound repurposing, and the detection of potential off-targets.
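Two ingredients named here, balancing the training data and judging models by ROC AUC against a 0.7 threshold, each fit in a few lines of Python. The abstract does not say how the sets were balanced, so the sketch below assumes random undersampling of the majority class purely for illustration. AUC is computed from its rank definition, the probability that a random active outscores a random inactive, rather than with any particular library.

```python
import random
from typing import List, Sequence


def undersample(labels: Sequence[int], seed: int = 0) -> List[int]:
    """Indices of a class-balanced subset, made by randomly dropping majority-class examples."""
    pos = [i for i, y in enumerate(labels) if y == 1]
    neg = [i for i, y in enumerate(labels) if y == 0]
    n = min(len(pos), len(neg))
    rng = random.Random(seed)
    return sorted(rng.sample(pos, n) + rng.sample(neg, n))


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Probability that a random active scores higher than a random inactive (ties count 1/2)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


imbalanced = [1, 0, 0, 0, 1, 0]
balanced = undersample(imbalanced, seed=1)
print(len(balanced), sum(imbalanced[i] for i in balanced))  # 4 2

labels = [1, 0, 1, 0, 0]
scores = [0.9, 0.3, 0.4, 0.5, 0.1]
auc = roc_auc(labels, scores)
print(round(auc, 3), "passes" if auc > 0.7 else "fails")  # 0.833 passes
```

The pairwise AUC is quadratic in the number of compounds, which is fine for a demonstration. A sort-based computation scales better on kinome-sized data.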
Affiliation(s)
- Benjamin Merget
- BioMed X Innovation Center, Im Neuenheimer Feld 515, 69120 Heidelberg, Germany
- Samo Turk
- BioMed X Innovation Center, Im Neuenheimer Feld 515, 69120 Heidelberg, Germany
- Sameh Eid
- BioMed X Innovation Center, Im Neuenheimer Feld 515, 69120 Heidelberg, Germany
- Friedrich Rippmann
- Global Computational Chemistry, Merck KGaA, Frankfurter Strasse 250, 64293 Darmstadt, Germany
- Simone Fulle
- BioMed X Innovation Center, Im Neuenheimer Feld 515, 69120 Heidelberg, Germany
45
Cortes Cabrera A, Lucena-Agell D, Redondo-Horcajo M, Barasoain I, Díaz JF, Fasching B, Petrone PM. Aggregated Compound Biological Signatures Facilitate Phenotypic Drug Discovery and Target Elucidation. ACS Chem Biol 2016; 11:3024-3034. [PMID: 27564241 DOI: 10.1021/acschembio.6b00358] [Citation(s) in RCA: 17] [Impact Index Per Article: 2.1] [Indexed: 12/17/2022]
Abstract
Predicting the cellular response of compounds is a challenge central to the discovery of new drugs. Compound biological signatures have emerged as a way of representing the perturbation produced by a compound in the cell. However, their ability to encode specific phenotypic information and to generate tangible predictions remains unknown, mainly because of the inherent noise in such data sets. In this work, we statistically aggregate signals from several compound biological signatures to find compounds that produce a desired phenotype in the cell. We exploit this method in two applications relevant to phenotypic screening in drug discovery programs: target-independent hit expansion and target identification. As a result, we present here (i) novel nanomolar inhibitors of cellular division that reproduce the phenotype and the mode of action of reference natural products and (ii) blockers of the NKCC1 cotransporter for autism spectrum disorders. Our results were confirmed in both cellular and biochemical assays of the respective projects. In addition, these examples provided novel insights into the information content and biological significance of compound biological signatures from HTS, and into their applicability to drug discovery in general. For target identification, we show that novel targets can be successfully predicted for drugs by reporting new activities for nimedipine, fluspirilene, and pimozide, and by providing a rationale for repurposing and side effects. Our results highlight the opportunities for reusing public bioactivity data in prospective drug discovery, including scenarios where the effective target or mode of action of a particular molecule is not known, such as in phenotypic screening campaigns.
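The abstract says that signals from several compound biological signatures are statistically aggregated but does not name the statistic. Stouffer's method is one standard way to pool evidence of this kind: standardize each signature's scores to z-scores, sum them, and divide by the square root of the number of signatures. The Python sketch below applies that substituted technique to hypothetical similarity values against a reference compound. It is not the authors' procedure.

```python
import math
from statistics import mean, pstdev
from typing import Dict, List


def zscores(values: Dict[str, float]) -> Dict[str, float]:
    """Standardize one signature's scores across compounds (population standard deviation)."""
    data = list(values.values())
    mu, sd = mean(data), pstdev(data)
    if sd == 0:
        return {c: 0.0 for c in values}  # a flat signature carries no information
    return {c: (v - mu) / sd for c, v in values.items()}


def stouffer(per_signature: List[Dict[str, float]]) -> Dict[str, float]:
    """Combine per-signature z-scores into one score per compound present in every signature."""
    standardized = [zscores(s) for s in per_signature]
    shared = set.intersection(*(set(s) for s in per_signature))
    k = len(per_signature)
    return {c: sum(z[c] for z in standardized) / math.sqrt(k) for c in shared}


# Hypothetical similarity of three library compounds to a reference compound in two signatures
signature_1 = {"a": 0.9, "b": 0.1, "c": 0.5}
signature_2 = {"a": 0.7, "b": 0.3, "c": 0.2}
combined = stouffer([signature_1, signature_2])
print(sorted(combined, key=combined.get, reverse=True))  # ['a', 'c', 'b']
```

Standardizing first keeps a signature with a wide score range from dominating the others. The price is that each signature's scores are assumed to be roughly comparable once centred and scaled.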
Affiliation(s)
- Alvaro Cortes Cabrera
- Pharma Research & Early Development Informatics (pREDi), Roche Innovation Center Basel, Basel, Switzerland
- Daniel Lucena-Agell
- Laboratory of Microtubule Stabilizing Agents, Department of Physical and Chemical Biology, Centro de Investigaciones Biológicas, CIB, CSIC, Madrid, Spain
- Mariano Redondo-Horcajo
- Laboratory of Microtubule Stabilizing Agents, Department of Physical and Chemical Biology, Centro de Investigaciones Biológicas, CIB, CSIC, Madrid, Spain
- Isabel Barasoain
- Laboratory of Microtubule Stabilizing Agents, Department of Physical and Chemical Biology, Centro de Investigaciones Biológicas, CIB, CSIC, Madrid, Spain
- José Fernando Díaz
- Laboratory of Microtubule Stabilizing Agents, Department of Physical and Chemical Biology, Centro de Investigaciones Biológicas, CIB, CSIC, Madrid, Spain
- Bernhard Fasching
- Medicinal Chemistry, Pharma Research & Early Development (pRED), Roche Innovation Center Basel, Basel, Switzerland
- Paula M. Petrone
- Pharma Research & Early Development Informatics (pREDi), Roche Innovation Center Basel, Basel, Switzerland
46
Gütlein M, Kramer S. Filtered circular fingerprints improve either prediction or runtime performance while retaining interpretability. J Cheminform 2016; 8:60. [PMID: 27853484 PMCID: PMC5088672 DOI: 10.1186/s13321-016-0173-z] [Citation(s) in RCA: 17] [Impact Index Per Article: 2.1] [Received: 05/19/2016] [Accepted: 10/18/2016] [Indexed: 11/10/2022]
Abstract
BACKGROUND Even though circular fingerprints were first introduced more than 50 years ago, they are still widely used for building highly predictive, state-of-the-art (Q)SAR models. Historically, these structural fragments were designed to search large molecular databases. Hence, to derive a compact representation, circular fingerprint fragments are often folded to comparatively short bit strings. However, folding fingerprints introduces bit collisions, which add noise to the encoded structural information and remove its interpretability. Both representations, folded as well as unprocessed fingerprints, are often used for (Q)SAR modeling. RESULTS We show that it can be preferable to build (Q)SAR models with circular fingerprint fragments that have been filtered by supervised feature selection, instead of using folded fingerprints or all fragments. Compared to folded fingerprints, filtered fingerprints significantly increase predictive performance and remain unambiguous and interpretable. Compared to unprocessed fingerprints, filtered fingerprints reduce the computational effort and are a more compact and less redundant feature representation. Depending on the selected learning algorithm, filtering yields (Q)SAR models that are about as predictive as those built with all fragments. We demonstrate the suitability of filtered fingerprints for (Q)SAR modeling by presenting our freely available web service Collision-free Filtered Circular Fingerprints, which provides rationales for predictions by highlighting important structural features in the query compound (see http://coffer.informatik.uni-mainz.de). CONCLUSIONS Circular fingerprints are potent structural features that yield highly predictive models and encode interpretable structural information. However, to retain interpretability, circular fingerprints should not be folded when building prediction models. Our experiments show that filtering is a suitable option to reduce the high computational effort of working with all fingerprint fragments.
Additionally, our experiments suggest that the area under the precision-recall curve is a more sensible statistic for validating (Q)SAR models for virtual screening than the area under the ROC curve or other measures of early recognition.
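The collision problem is easy to see in code. In the Python sketch below, an unfolded circular fingerprint is a set of hashed fragment IDs. The IDs are hypothetical integers, not output from any real toolkit. Folding maps each ID to a bit by taking it modulo the fingerprint length, so unrelated fragments can end up sharing a bit. The filtering function is a deliberately simple stand-in for supervised feature selection: it keeps fragments whose frequency differs between actives and inactives. The abstract does not specify the paper's actual selection method.

```python
from collections import Counter
from typing import List, Set


def fold(fragment_ids: Set[int], n_bits: int) -> Set[int]:
    """Fold hashed fragment IDs into a fixed-length bit string (set of on-bit positions)."""
    return {f % n_bits for f in fragment_ids}


def collisions(fragment_ids: Set[int], n_bits: int) -> int:
    """Number of fragments that land on a bit already occupied by another fragment."""
    return len(fragment_ids) - len(fold(fragment_ids, n_bits))


def filter_fragments(actives: List[Set[int]], inactives: List[Set[int]], min_diff: float) -> Set[int]:
    """Keep unfolded fragments whose frequency differs between the classes by at least min_diff.

    Both lists must be non-empty.
    """
    fa = Counter(f for fp in actives for f in fp)
    fi = Counter(f for fp in inactives for f in fp)
    return {
        f for f in fa.keys() | fi.keys()
        if abs(fa[f] / len(actives) - fi[f] / len(inactives)) >= min_diff
    }


fragments = {3, 1027, 2051, 77}        # hypothetical hashed fragment IDs for one molecule
print(sorted(fold(fragments, 1024)))   # [3, 77]: bit 3 now stands for three fragments
print(collisions(fragments, 1024))     # 2

actives = [{1, 2}, {1, 3}]
inactives = [{2, 4}, {3, 4}]
print(sorted(filter_fragments(actives, inactives, min_diff=0.5)))  # [1, 4]
```

The filtered fragments remain individual, unambiguous IDs, which is the interpretability advantage the abstract describes. A set bit in the folded version cannot be traced back to a single fragment.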
Affiliation(s)
- Martin Gütlein
- Chair of Data Mining, Institute of Computer Science, Johannes Gutenberg - Universität Mainz, Staudingerweg 9, 55128 Mainz, Germany
- Stefan Kramer
- Chair of Data Mining, Institute of Computer Science, Johannes Gutenberg - Universität Mainz, Staudingerweg 9, 55128 Mainz, Germany
47
O'Hagan S, Kell DB. MetMaxStruct: A Tversky-Similarity-Based Strategy for Analysing the (Sub)Structural Similarities of Drugs and Endogenous Metabolites. Front Pharmacol 2016; 7:266. [PMID: 27597830 PMCID: PMC4992690 DOI: 10.3389/fphar.2016.00266] [Citation(s) in RCA: 19] [Impact Index Per Article: 2.4] [Received: 11/18/2015] [Accepted: 08/08/2016] [Indexed: 12/23/2022]
Abstract
BACKGROUND Previous studies compared the molecular similarity of marketed drugs and endogenous human metabolites (endogenites), using a series of fingerprint-type encodings, variously ranked and clustered using the Tanimoto (Jaccard) similarity coefficient (TS). Because this coefficient gives equal weight to all parts of the encoding (and hence to different substructures in the molecule), it may not be optimal, since in many cases not all parts of the molecule will bind to their macromolecular targets. Unsupervised methods alone cannot uncover this. Here we explore the kinds of differences that may be observed when the TS is replaced (in a manner more akin to semi-supervised learning) by variants of the asymmetric Tversky (TV) similarity, which includes α and β parameters. RESULTS Dramatic differences are observed in (i) the drug-endogenite similarity heatmaps, (ii) the cumulative "greatest similarity" curves, and (iii) the fraction of drugs with a Tversky similarity to a metabolite exceeding a given value when the Tversky α and β parameters are varied from their Tanimoto values. The same is true when the sum of the α and β parameters is varied. A clear trend toward increased endogenite-likeness of marketed drugs is observed when α or β adopt values nearer the extremes of their range, and when their sum is smaller. The kinds of molecules exhibiting the greatest similarity to two interrogating drug molecules (chlorpromazine and clozapine) also vary, both in nature and in the values of their similarity, as α and β are varied. The same is true for the converse, when drugs are interrogated with an endogenite. The fraction of drugs with a Tversky similarity to a molecule in a library exceeding a given value depends on the contents of that library, and α and β may be "tuned" accordingly, in a semi-supervised manner.
At some values of α and β, drug discovery library candidates or natural products can "look" much more like (i.e., have a numerical similarity much closer to) drugs than do even endogenites. CONCLUSIONS Overall, the Tversky similarity metrics provide a more useful range of examples of molecular similarity than does the simpler Tanimoto similarity, and they help to draw attention to molecular similarities that would not be recognized if Tanimoto alone were used. Hence, the Tversky similarity metrics are likely to be of significant value in many general problems in cheminformatics.
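For two feature sets A and B, the Tversky similarity is S = |A ∩ B| / (|A ∩ B| + α|A − B| + β|B − A|). Setting α = β = 1 recovers the Tanimoto coefficient and α = β = 0.5 gives the Dice coefficient. Unequal α and β make the measure asymmetric. Below is a minimal Python sketch, with fingerprints represented as sets of hypothetical feature indices.

```python
def tversky(a: set, b: set, alpha: float, beta: float) -> float:
    """Tversky similarity of feature sets a and b; alpha weights a's unique features, beta b's."""
    common = len(a & b)
    denom = common + alpha * len(a - b) + beta * len(b - a)
    return common / denom if denom else 0.0


a = {1, 2, 3, 4}  # e.g. features of a drug
b = {3, 4, 5}     # e.g. features of an endogenous metabolite
print(tversky(a, b, 1.0, 1.0))                                      # 0.4, identical to Tanimoto
print(round(tversky(a, b, 0.5, 0.5), 4))                            # 0.5714, identical to Dice
print(tversky(a, b, 1.0, 0.0), round(tversky(a, b, 0.0, 1.0), 4))  # 0.5 0.6667
```

With α = 0 and β = 1, the score is the fraction of b's features that are found in a (2 of 3 here). This asymmetry is why varying α and β changes which molecules appear most similar.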
Affiliation(s)
- Steve O'Hagan
- School of Chemistry, The University of Manchester, Manchester, UK
- The Manchester Institute of Biotechnology, The University of Manchester, Manchester, UK
- Manchester Centre for Synthetic Biology of Fine and Speciality Chemicals, The University of Manchester, Manchester, UK
- Douglas B. Kell
- School of Chemistry, The University of Manchester, Manchester, UK
- The Manchester Institute of Biotechnology, The University of Manchester, Manchester, UK
- Manchester Centre for Synthetic Biology of Fine and Speciality Chemicals, The University of Manchester, Manchester, UK
48
Paricharak S, IJzerman AP, Jenkins JL, Bender A, Nigsch F. Data-Driven Derivation of an "Informer Compound Set" for Improved Selection of Active Compounds in High-Throughput Screening. J Chem Inf Model 2016; 56:1622-1630. [PMID: 27487177 DOI: 10.1021/acs.jcim.6b00244] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.6] [Indexed: 11/29/2022]
Abstract
Despite the usefulness of high-throughput screening (HTS) in drug discovery, for some systems, low assay throughput or high screening cost can prohibit the screening of large numbers of compounds. In such cases, iterative cycles of screening involving active learning (AL) are employed, creating the need for smaller "informer sets" that can be routinely screened to build predictive models for selecting compounds from the screening collection for follow-up screens. Here, we present a data-driven derivation of an informer compound set with improved predictivity of active compounds in HTS, and we validate its benefit over randomly selected training sets on 46 PubChem assays comprising at least 300,000 compounds and covering a wide range of assay biology. Compared with randomly selected training sets, the informer compound set improved BEDROC(α = 100), PRAUC, and ROCAUC values, averaged over all assays, by 0.024, 0.014, and 0.016, respectively, all with paired t-test p-values <10⁻¹⁵. A per-assay assessment showed that BEDROC(α = 100), which is of particular relevance for early retrieval of actives, improved for 38 of 46 assays, increasing the success rate of smaller follow-up screens. Overall, we showed that an informer set derived from historical HTS activity data can be employed for routine small-scale exploratory screening in an assay-agnostic fashion. This approach led to a consistent improvement in hit rates in follow-up screens without compromising scaffold retrieval. The informer set is adjustable in size depending on the number of compounds one intends to screen, as performance gains are realized for sets with more than 3,000 compounds, and it is therefore applicable to a variety of situations. Finally, our results indicate that random sampling may not adequately cover descriptor space, drawing attention to the importance of training set composition for predicting actives.
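BEDROC is the metric the authors single out for early retrieval. The Python sketch below computes it as the robust initial enhancement (RIE), an exponentially weighted sum over the ranks of the actives, rescaled to [0, 1] by the RIE's minimum and maximum attainable values. This is equivalent to the original definition by Truchon and Bayly. The sketch is independent of the study's code. With α = 100, about 80% of the weight falls on the top 1.6% of the ranked list, because the fraction x solving (1 − e^(−αx))/(1 − e^(−α)) = 0.8 is about ln 5/α.

```python
import math
from typing import Sequence


def bedroc(is_active_ranked: Sequence[bool], alpha: float = 100.0) -> float:
    """BEDROC for a ranked list, best-scored compound first.

    Computed as the RIE rescaled between its minimum and maximum attainable values.
    Very large alpha overflows math.exp; alpha = 100 is fine.
    """
    n = len(is_active_ranked)
    ranks = [i + 1 for i, active in enumerate(is_active_ranked) if active]  # 1-based ranks
    n_act = len(ranks)
    if n_act == 0 or n_act == n:
        raise ValueError("need at least one active and one inactive compound")
    ra = n_act / n
    observed = sum(math.exp(-alpha * r / n) for r in ranks)
    expected = ra * (1 - math.exp(-alpha)) / (math.exp(alpha / n) - 1)  # mean under random ranking
    rie = observed / expected
    rie_max = (1 - math.exp(-alpha * ra)) / (ra * (1 - math.exp(-alpha)))
    rie_min = (1 - math.exp(alpha * ra)) / (ra * (1 - math.exp(alpha)))
    value = (rie - rie_min) / (rie_max - rie_min)
    return min(1.0, max(0.0, value))  # clamp tiny floating-point overshoot


best = [True, True] + [False] * 8
worst = [False] * 8 + [True, True]
print(round(bedroc(best), 6), round(bedroc(worst), 6))  # 1.0 0.0
```

Because the weight decays so quickly at α = 100, two rankings with identical ROC AUC can have very different BEDROC values if one places its actives slightly later.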
Affiliation(s)
- Shardul Paricharak
- Centre for Molecular Informatics, Department of Chemistry, University of Cambridge, Lensfield Road, CB2 1EW, Cambridge, United Kingdom
- Division of Medicinal Chemistry, Leiden Academic Centre for Drug Research, Leiden University, P.O. Box 9502, 2300 RA Leiden, The Netherlands
- Developmental & Molecular Pathways, Novartis Institutes for BioMedical Research, Novartis Pharma AG, Novartis Campus, 4056 Basel, Switzerland
- Adriaan P IJzerman
- Division of Medicinal Chemistry, Leiden Academic Centre for Drug Research, Leiden University, P.O. Box 9502, 2300 RA Leiden, The Netherlands
- Jeremy L Jenkins
- Developmental & Molecular Pathways, Novartis Institutes for BioMedical Research, Cambridge, Massachusetts 02139, United States
- Andreas Bender
- Centre for Molecular Informatics, Department of Chemistry, University of Cambridge, Lensfield Road, CB2 1EW, Cambridge, United Kingdom
- Florian Nigsch
- Developmental & Molecular Pathways, Novartis Institutes for BioMedical Research, Novartis Pharma AG, Novartis Campus, 4056 Basel, Switzerland
49
Raevsky OA, Polianczyk DE, Mukhametov A, Grigorev VY. Assessment of the classification abilities of the CNS multi-parametric optimization approach by the method of logistic regression. SAR QSAR Environ Res 2016; 27:629-635. [PMID: 27477321 DOI: 10.1080/1062936x.2016.1212922] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/29/2016] [Accepted: 07/11/2016] [Indexed: 06/06/2023]
Abstract
The ability of the CNS multi-parametric optimization (CNS MPO) approach to classify "CNS drugs/CNS candidates" was assessed by logistic regression. Five of the six physicochemical properties, used separately (topological polar surface area, number of hydrogen-bond donor atoms, basicity, and lipophilicity of the compound in neutral form and at pH 7.4), provided recognition accuracy below 60%. Only molecular weight (MW) correctly classified two-thirds of the studied compounds. Aggregating all six properties into the MPO score did not improve the classification, which was worse than the classification obtained with MW alone. Our results demonstrate the shortcomings of the CNS MPO approach; in its current form it is not very useful for the computer-aided design of new, effective CNS drugs.
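In the commonly described form of a CNS MPO-style score, each of the six properties is mapped onto [0, 1] by a desirability function and the results are summed into a score from 0 to 6. The function is monotonic for most properties and hump-shaped for TPSA. The Python sketch below shows only that structure. The cut-off values are illustrative assumptions, not quoted from the original CNS MPO publication or from this study, and should be checked against the source before any real use.

```python
from typing import Dict, Tuple


def decreasing(x: float, good: float, bad: float) -> float:
    """Desirability 1 at or below `good`, 0 at or above `bad`, linear in between."""
    if x <= good:
        return 1.0
    if x >= bad:
        return 0.0
    return (bad - x) / (bad - good)


def hump(x: float, bad_low: float, good_low: float, good_high: float, bad_high: float) -> float:
    """Desirability 1 inside [good_low, good_high], falling linearly to 0 at bad_low and bad_high."""
    if x <= bad_low or x >= bad_high:
        return 0.0
    if good_low <= x <= good_high:
        return 1.0
    if x < good_low:
        return (x - bad_low) / (good_low - bad_low)
    return (bad_high - x) / (bad_high - good_high)


# Illustrative (assumed) cut-offs: property -> (fully desirable at or below, undesirable at or above)
MONOTONIC: Dict[str, Tuple[float, float]] = {
    "clogp": (3.0, 5.0), "clogd": (2.0, 4.0), "mw": (360.0, 500.0),
    "hbd": (0.5, 3.5), "pka": (8.0, 10.0),
}
TPSA_HUMP = (20.0, 40.0, 90.0, 120.0)  # assumed TPSA window


def mpo_score(props: Dict[str, float]) -> float:
    """Sum of six desirabilities, ranging from 0 to 6."""
    score = sum(decreasing(props[name], *cut) for name, cut in MONOTONIC.items())
    return score + hump(props["tpsa"], *TPSA_HUMP)


compound = {"clogp": 2.5, "clogd": 3.0, "mw": 430.0, "hbd": 2.0, "pka": 9.0, "tpsa": 60.0}
print(mpo_score(compound))  # 4.0
```

Because the summed score discards which property contributed what, it is easy to see how a single informative descriptor such as MW could classify better on its own. That is consistent with what the study reports.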
Affiliation(s)
- O A Raevsky
- Department of Computer-aided Molecular Design, Institute of Physiologically Active Compounds of the Russian Academy of Science, Chernogolovka, Russian Federation
- D E Polianczyk
- Department of Computer-aided Molecular Design, Institute of Physiologically Active Compounds of the Russian Academy of Science, Chernogolovka, Russian Federation
- A Mukhametov
- Department of Computer-aided Molecular Design, Institute of Physiologically Active Compounds of the Russian Academy of Science, Chernogolovka, Russian Federation
- V Y Grigorev
- Department of Computer-aided Molecular Design, Institute of Physiologically Active Compounds of the Russian Academy of Science, Chernogolovka, Russian Federation
50
Wang Y, Cornett A, King FJ, Mao Y, Nigsch F, Paris CG, McAllister G, Jenkins JL. Evidence-Based and Quantitative Prioritization of Tool Compounds in Phenotypic Drug Discovery. Cell Chem Biol 2016; 23:862-874. [PMID: 27427232 DOI: 10.1016/j.chembiol.2016.05.016] [Citation(s) in RCA: 40] [Impact Index Per Article: 5.0] [Received: 06/01/2015] [Revised: 04/29/2016] [Accepted: 05/13/2016] [Indexed: 01/07/2023]
Abstract
The use of potent and selective chemical tools with well-defined targets can help elucidate the biological processes driving phenotypes in phenotypic screens. However, identifying selective compounds en masse to create targeted screening sets is non-trivial. A systematic approach to prioritizing probes is needed to prevent the repeated use of published but unselective compounds. Here we performed a meta-analysis of integrated, large-scale, heterogeneous bioactivity data to create an evidence-based, quantitative metric that systematically ranks tool compounds for their targets. We then tested this tool score (TS) on hundreds of compounds by assessing their activity profiles in a panel of 41 cell-based pathway assays. We demonstrate that high-TS tools show more reliably selective phenotypic profiles than lower-TS compounds. Additionally, we highlight frequently tested compounds that are non-selective tools, and we distinguish target-family polypharmacology from cross-family promiscuity. TS can therefore be used to prioritize compounds from heterogeneous databases for phenotypic screening.
Affiliation(s)
- Yuan Wang
- Novartis Institutes for BioMedical Research Inc., 250 Massachusetts Avenue, Cambridge, MA 02139, USA
- Allen Cornett
- Novartis Institutes for BioMedical Research Inc., 250 Massachusetts Avenue, Cambridge, MA 02139, USA
- Fred J King
- Genomics Institute of the Novartis Research Foundation, 10675 John Jay Hopkins Drive, San Diego, CA 92121, USA
- Yi Mao
- Harvard T.H. Chan School of Public Health, 677 Huntington Avenue, Boston, MA 02115, USA
- Florian Nigsch
- Novartis Institutes for BioMedical Research, Novartis Pharma AG, Novartis Campus, Basel 4056, Switzerland
- C Gregory Paris
- Novartis Institutes for BioMedical Research Inc., 250 Massachusetts Avenue, Cambridge, MA 02139, USA
- Gregory McAllister
- Novartis Institutes for BioMedical Research Inc., 250 Massachusetts Avenue, Cambridge, MA 02139, USA
- Jeremy L Jenkins
- Novartis Institutes for BioMedical Research Inc., 250 Massachusetts Avenue, Cambridge, MA 02139, USA