1. Guichaoua G, Pinel P, Hoffmann B, Azencott CA, Stoven V. Drug-Target Interactions Prediction at Scale: The Komet Algorithm with the LCIdb Dataset. J Chem Inf Model 2024; 64:6938-6956. [PMID: 39237105 PMCID: PMC11423346 DOI: 10.1021/acs.jcim.4c00422] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 09/07/2024]
Abstract
Drug-target interactions (DTIs) prediction algorithms are used at various stages of the drug discovery process. In this context, specific problems such as deorphanization of a new therapeutic target or target identification of a drug candidate arising from phenotypic screens require large-scale predictions across the protein and molecule spaces. DTI prediction heavily relies on supervised learning algorithms that use known DTIs to learn associations between molecule and protein features, allowing for the prediction of new interactions based on learned patterns. The algorithms must be broadly applicable to enable reliable predictions, even in regions of the protein or molecule spaces where data may be scarce. In this paper, we address two key challenges to fulfill these goals: building large, high-quality training datasets and designing prediction methods that can scale, in order to be trained on such large datasets. First, we introduce LCIdb, a curated, large-sized dataset of DTIs, offering extensive coverage of both the molecule and druggable protein spaces. Notably, LCIdb contains a much higher number of molecules than publicly available benchmarks, expanding coverage of the molecule space. Second, we propose Komet (Kronecker Optimized METhod), a DTI prediction pipeline designed for scalability without compromising performance. Komet leverages a three-step framework, incorporating efficient computation choices tailored for large datasets and involving the Nyström approximation. Specifically, Komet employs a Kronecker interaction module for (molecule, protein) pairs, which efficiently captures determinants in DTIs, and whose structure allows for reduced computational complexity and quasi-Newton optimization, ensuring that the model can handle large training sets, without compromising on performance. Our method is implemented in open-source software, leveraging GPU parallel computation for efficiency. 
We demonstrate the interest of our pipeline on various datasets, showing that Komet displays superior scalability and prediction performance compared to state-of-the-art deep learning approaches. Additionally, we illustrate the generalization properties of Komet by showing its performance on an external dataset, and on the publicly available LH benchmark designed for scaffold hopping problems. Komet is available open source at https://komet.readthedocs.io and all datasets, including LCIdb, can be found at https://zenodo.org/records/10731712.
Affiliation(s)
- Gwenn Guichaoua
  - Center for Computational Biology (CBIO), Mines Paris-PSL, 75006 Paris, France
  - Institut Curie, Université PSL, 75005 Paris, France
  - INSERM U900, 75005 Paris, France
- Philippe Pinel
  - Center for Computational Biology (CBIO), Mines Paris-PSL, 75006 Paris, France
  - Institut Curie, Université PSL, 75005 Paris, France
  - INSERM U900, 75005 Paris, France
  - Iktos SAS, 75017 Paris, France
- Chloé-Agathe Azencott
  - Center for Computational Biology (CBIO), Mines Paris-PSL, 75006 Paris, France
  - Institut Curie, Université PSL, 75005 Paris, France
  - INSERM U900, 75005 Paris, France
- Véronique Stoven
  - Center for Computational Biology (CBIO), Mines Paris-PSL, 75006 Paris, France
  - Institut Curie, Université PSL, 75005 Paris, France
  - INSERM U900, 75005 Paris, France
2. Oprea TI, Bologa C, Holmes J, Mathias S, Metzger VT, Waller A, Yang JJ, Leach AR, Jensen LJ, Kelleher KJ, Sheils TK, Mathé E, Avram S, Edwards JS. Overview of the Knowledge Management Center for Illuminating the Druggable Genome. Drug Discov Today 2024; 29:103882. [PMID: 38218214 PMCID: PMC10939799 DOI: 10.1016/j.drudis.2024.103882] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/18/2023] [Revised: 12/22/2023] [Accepted: 01/09/2024] [Indexed: 01/15/2024]
Abstract
The Knowledge Management Center (KMC) for the Illuminating the Druggable Genome (IDG) project aims to aggregate, update, and articulate protein-centric data knowledge for the entire human proteome, with emphasis on the understudied proteins from the three IDG protein families. KMC collates and analyzes data from over 70 resources to compile the Target Central Resource Database (TCRD), which is the web-based informatics platform (Pharos). These data include experimental, computational, and text-mined information on protein structures, compound interactions, and disease and phenotype associations. Based on this knowledge, proteins are classified into different Target Development Levels (TDLs) for identification of understudied targets. Additional work by the KMC focuses on enriching target knowledge and producing DrugCentral and other data visualization tools for expanding investigation of understudied targets.
Affiliation(s)
- Tudor I Oprea
  - Translational Informatics Division, Department of Internal Medicine, University of New Mexico, Albuquerque, NM, USA
- Cristian Bologa
  - Translational Informatics Division, Department of Internal Medicine, University of New Mexico, Albuquerque, NM, USA
- Jayme Holmes
  - Translational Informatics Division, Department of Internal Medicine, University of New Mexico, Albuquerque, NM, USA
- Stephen Mathias
  - Translational Informatics Division, Department of Internal Medicine, University of New Mexico, Albuquerque, NM, USA
- Vincent T Metzger
  - Translational Informatics Division, Department of Internal Medicine, University of New Mexico, Albuquerque, NM, USA
- Anna Waller
  - Translational Informatics Division, Department of Internal Medicine, University of New Mexico, Albuquerque, NM, USA
- Jeremy J Yang
  - Translational Informatics Division, Department of Internal Medicine, University of New Mexico, Albuquerque, NM, USA
- Andrew R Leach
  - European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, UK
- Lars Juhl Jensen
  - Novo Nordisk Foundation Center for Protein Research, Faculty of Health and Medical Sciences, University of Copenhagen, Copenhagen, Denmark
- Keith J Kelleher
  - National Center for Advancing Translational Sciences (NCATS), NIH, Bethesda, MD, USA
- Timothy K Sheils
  - National Center for Advancing Translational Sciences (NCATS), NIH, Bethesda, MD, USA
- Ewy Mathé
  - National Center for Advancing Translational Sciences (NCATS), NIH, Bethesda, MD, USA
- Sorin Avram
  - Coriolan Dragulescu Institute of Chemistry, Timisoara, Romania
- Jeremy S Edwards
  - Translational Informatics Division, Department of Internal Medicine, University of New Mexico, Albuquerque, NM, USA
  - Department of Chemistry and Chemical Biology, University of New Mexico, Albuquerque, NM, USA
3. Harding SD, Armstrong JF, Faccenda E, Southan C, Alexander SPH, Davenport AP, Spedding M, Davies JA. The IUPHAR/BPS Guide to PHARMACOLOGY in 2024. Nucleic Acids Res 2024; 52:D1438-D1449. [PMID: 37897341 PMCID: PMC10767925 DOI: 10.1093/nar/gkad944] [Citation(s) in RCA: 44] [Impact Index Per Article: 44.0] [Received: 09/14/2023] [Revised: 10/09/2023] [Accepted: 10/18/2023] [Indexed: 10/30/2023] Open
Abstract
The IUPHAR/BPS Guide to PHARMACOLOGY (GtoPdb; https://www.guidetopharmacology.org) is an open-access, expert-curated, online database that provides succinct overviews and key references for pharmacological targets and their recommended experimental ligands. It includes over 3039 protein targets and 12 163 ligand molecules, including approved drugs, small molecules, peptides and antibodies. Here, we report recent developments to the resource and describe expansion in content over the six database releases made during the last two years. The database update section of this paper focuses on two areas relating to important global health challenges. The first, SARS-CoV-2 COVID-19, remains a major concern and we describe our efforts to expand the database to include a new family of coronavirus proteins. The second area is antimicrobial resistance, for which we have extended our coverage of antibacterials in partnership with AntibioticDB, a collaboration that has continued through support from GARDP. We discuss other areas of curation and also focus on our external links to resources such as PubChem that bring important synergies to the resources.
Affiliation(s)
- Simon D Harding
  - Centre for Discovery Brain Science, Deanery of Biomedical Sciences, University of Edinburgh, Edinburgh EH8 9XD, UK
- Jane F Armstrong
  - Centre for Discovery Brain Science, Deanery of Biomedical Sciences, University of Edinburgh, Edinburgh EH8 9XD, UK
- Elena Faccenda
  - Centre for Discovery Brain Science, Deanery of Biomedical Sciences, University of Edinburgh, Edinburgh EH8 9XD, UK
- Christopher Southan
  - Centre for Discovery Brain Science, Deanery of Biomedical Sciences, University of Edinburgh, Edinburgh EH8 9XD, UK
- Stephen P H Alexander
  - School of Life Sciences, University of Nottingham Medical School, Nottingham NG7 2UH, UK
- Anthony P Davenport
  - Experimental Medicine and Immunotherapeutics, University of Cambridge, Cambridge CB2 0QQ, UK
- Jamie A Davies
  - Centre for Discovery Brain Science, Deanery of Biomedical Sciences, University of Edinburgh, Edinburgh EH8 9XD, UK
4. Gu Y, Li J, Kang H, Zhang B, Zheng S. Employing Molecular Conformations for Ligand-Based Virtual Screening with Equivariant Graph Neural Network and Deep Multiple Instance Learning. Molecules 2023; 28:5982. [PMID: 37630234 PMCID: PMC10459669 DOI: 10.3390/molecules28165982] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/04/2023] [Revised: 07/27/2023] [Accepted: 08/03/2023] [Indexed: 08/27/2023] Open
Abstract
Ligand-based virtual screening (LBVS) is a promising approach for rapid and low-cost screening of potentially bioactive molecules in the early stage of drug discovery. Compared with traditional similarity-based machine learning methods, deep learning frameworks for LBVS can more effectively extract high-order molecule structure representations from molecular fingerprints or structures. However, the 3D conformation of a molecule largely influences its bioactivity and physical properties, and has rarely been considered in previous deep learning-based LBVS methods. Moreover, a relevant bioactivity benchmark dataset is still lacking. To address these issues, we introduce a novel end-to-end deep learning architecture trained from molecular conformers for LBVS. We first extracted molecule conformers from multiple public molecular bioactivity data and consolidated them into a large-scale bioactivity benchmark dataset, which in total includes millions of endpoints and molecules corresponding to 954 targets. Then, we devised a deep learning-based LBVS method called EquiVS to learn molecule representations from conformers for bioactivity prediction. Specifically, graph convolutional network (GCN) and equivariant graph neural network (EGNN) are sequentially stacked to learn high-order molecule-level and conformer-level representations, followed by attention-based deep multiple-instance learning (MIL) to aggregate these representations and then predict the potential bioactivity for the query molecule on a given target. We conducted various experiments to validate the data quality of our benchmark dataset, and confirmed EquiVS achieved better performance compared with 10 traditional machine learning or deep learning-based LBVS methods. Further ablation studies demonstrate the significant contribution of molecular conformation for bioactivity prediction, as well as the reasonability and non-redundancy of deep learning architecture in EquiVS. 
Finally, a model interpretation case study on CDK2 shows the potential of EquiVS in optimal conformer discovery. The overall study shows that our proposed benchmark dataset and EquiVS method have promising prospects in virtual screening applications.
Affiliation(s)
- Yaowen Gu
  - Institute of Medical Information (IMI), Chinese Academy of Medical Sciences and Peking Union Medical College (CAMS & PUMC), Beijing 100020, China
  - Department of Chemistry, New York University, New York, NY 10027, USA
- Jiao Li
  - Institute of Medical Information (IMI), Chinese Academy of Medical Sciences and Peking Union Medical College (CAMS & PUMC), Beijing 100020, China
- Hongyu Kang
  - Institute of Medical Information (IMI), Chinese Academy of Medical Sciences and Peking Union Medical College (CAMS & PUMC), Beijing 100020, China
  - Department of Biomedical Engineering, School of Life Science, Beijing Institute of Technology, Beijing 100081, China
- Bowen Zhang
  - Beijing StoneWise Technology Co., Ltd., Beijing 100080, China
- Si Zheng
  - Institute of Medical Information (IMI), Chinese Academy of Medical Sciences and Peking Union Medical College (CAMS & PUMC), Beijing 100020, China
  - Institute for Artificial Intelligence, Department of Computer Science and Technology, BNRist, Tsinghua University, Beijing 100084, China
5. Avram S, Wilson TB, Curpan R, Halip L, Borota A, Bora A, Bologa C, Holmes J, Knockel J, Yang J, Oprea T. DrugCentral 2023 extends human clinical data and integrates veterinary drugs. Nucleic Acids Res 2022; 51:D1276-D1287. [PMID: 36484092 PMCID: PMC9825566 DOI: 10.1093/nar/gkac1085] [Citation(s) in RCA: 12] [Impact Index Per Article: 6.0] [Received: 09/15/2022] [Revised: 10/20/2022] [Accepted: 12/02/2022] [Indexed: 12/14/2022] Open
Abstract
DrugCentral monitors new drug approvals and standardizes drug information. The current update contains 285 drugs (131 for human use). New additions include: (i) the integration of veterinary drugs (154 for animal use only), (ii) the addition of 66 documented off-label uses and (iii) the identification of adverse drug events from pharmacovigilance data for pediatric and geriatric patients. Additional enhancements include chemical substructure searching using SMILES and 'Target Cards' based on UniProt accession codes. Statistics of interest include the following: (i) 60% of the covered drugs are on-market drugs with expired patent and exclusivity coverage, 17% are off-market, and 23% are on-market drugs with active patents and exclusivity coverage; (ii) 59% of the drugs are oral, 33% are parenteral and 18% topical, at the level of the active ingredients; (iii) only 3% of all drugs are for animal use only; however, 61% of the veterinary drugs are also approved for human use; (iv) dogs, cats and horses are by far the most represented target species for veterinary drugs; (v) the physicochemical property profile of animal drugs is very similar to that of human drugs. Use cases include azaperone, the only sedative approved for swine, and ruxolitinib, a Janus kinase inhibitor.
Affiliation(s)
- Ramona Curpan
  - Department of Computational Chemistry, “Coriolan Dragulescu” Institute of Chemistry, 24 Mihai Viteazu Blvd, Timişoara, Timiş 300223, Romania
- Liliana Halip
  - Department of Computational Chemistry, “Coriolan Dragulescu” Institute of Chemistry, 24 Mihai Viteazu Blvd, Timişoara, Timiş 300223, Romania
- Ana Borota
  - Department of Computational Chemistry, “Coriolan Dragulescu” Institute of Chemistry, 24 Mihai Viteazu Blvd, Timişoara, Timiş 300223, Romania
- Alina Bora
  - Department of Computational Chemistry, “Coriolan Dragulescu” Institute of Chemistry, 24 Mihai Viteazu Blvd, Timişoara, Timiş 300223, Romania
- Cristian G Bologa
  - Translational Informatics Division, Department of Internal Medicine, University of New Mexico Health Sciences Center, 700 Camino de Salud NE, Albuquerque, NM 87106, USA
- Jayme Holmes
  - Translational Informatics Division, Department of Internal Medicine, University of New Mexico Health Sciences Center, 700 Camino de Salud NE, Albuquerque, NM 87106, USA
- Jeffrey Knockel
  - Department of Computer Science, University of New Mexico, 1901 Redondo S Dr, Albuquerque, NM 87106, USA
- Jeremy J Yang
  - Translational Informatics Division, Department of Internal Medicine, University of New Mexico Health Sciences Center, 700 Camino de Salud NE, Albuquerque, NM 87106, USA
- Tudor I Oprea
  - To whom correspondence should be addressed. Tel: +1 505 925 7529; Fax: +1 505 925 7625;
6. A Consensus Compound/Bioactivity Dataset for Data-Driven Drug Design and Chemogenomics. Molecules 2022; 27:2513. [PMID: 35458710 PMCID: PMC9028877 DOI: 10.3390/molecules27082513] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 03/02/2022] [Revised: 03/31/2022] [Accepted: 04/10/2022] [Indexed: 02/01/2023] Open
Abstract
Publicly available compound and bioactivity databases provide an essential basis for data-driven applications in life-science research and drug design. By analyzing several bioactivity repositories, we discovered differences in compound and target coverage advocating the combined use of data from multiple sources. Using data from ChEMBL, PubChem, IUPHAR/BPS, BindingDB, and Probes & Drugs, we assembled a consensus dataset focusing on small molecules with bioactivity on human macromolecular targets. This allowed an improved coverage of compound space and targets, and an automated comparison and curation of structural and bioactivity data to reveal potentially erroneous entries and increase confidence. The consensus dataset comprised more than 1.1 million compounds with over 10.9 million bioactivity data points with annotations on assay type and bioactivity confidence, providing a useful ensemble for computational applications in drug design and chemogenomics.
7. Harding SD, Armstrong J, Faccenda E, Southan C, Alexander SPH, Davenport AP, Pawson A, Spedding M, Davies J. The IUPHAR/BPS guide to PHARMACOLOGY in 2022: curating pharmacology for COVID-19, malaria and antibacterials. Nucleic Acids Res 2022; 50:D1282-D1294. [PMID: 34718737 PMCID: PMC8689838 DOI: 10.1093/nar/gkab1010] [Citation(s) in RCA: 84] [Impact Index Per Article: 42.0] [Received: 09/15/2021] [Revised: 10/11/2021] [Accepted: 10/12/2021] [Indexed: 11/12/2022] Open
Abstract
The IUPHAR/BPS Guide to PHARMACOLOGY (GtoPdb; www.guidetopharmacology.org) is an open-access, expert-curated database of molecular interactions between ligands and their targets. We describe expansion in content over nine database releases made during the last two years, which has focussed on three main areas of infection. The COVID-19 pandemic continues to have a major impact on health worldwide. GtoPdb has sought to support the wider research community to understand the pharmacology of emerging drug targets for SARS-CoV-2 as well as potential targets in the host to block viral entry and reduce the adverse effects of infection in patients with COVID-19. We describe how the database rapidly evolved to include a new family of Coronavirus proteins. Malaria remains a global threat to half the population of the world. Our database content continues to be enhanced through our collaboration with Medicines for Malaria Venture (MMV) on the IUPHAR/MMV Guide to MALARIA PHARMACOLOGY (www.guidetomalariapharmacology.org). Antibiotic resistance is also a growing threat to global health. In response, we have extended our coverage of antibacterials in partnership with AntibioticDB.
Affiliation(s)
- Simon D Harding
  - Deanery of Biomedical Sciences, University of Edinburgh, Edinburgh EH8 9XD, UK
- Jane F Armstrong
  - Deanery of Biomedical Sciences, University of Edinburgh, Edinburgh EH8 9XD, UK
- Elena Faccenda
  - Deanery of Biomedical Sciences, University of Edinburgh, Edinburgh EH8 9XD, UK
- Christopher Southan
  - Deanery of Biomedical Sciences, University of Edinburgh, Edinburgh EH8 9XD, UK
- Stephen P H Alexander
  - School of Life Sciences, University of Nottingham Medical School, Nottingham NG7 2UH, UK
- Anthony P Davenport
  - Experimental Medicine and Immunotherapeutics, University of Cambridge, Cambridge CB2 0QQ, UK
- Adam J Pawson
  - Deanery of Biomedical Sciences, University of Edinburgh, Edinburgh EH8 9XD, UK
- Jamie A Davies
  - Deanery of Biomedical Sciences, University of Edinburgh, Edinburgh EH8 9XD, UK