1
Liu R, Clayton J, Shen M, Bhatnagar S, Shen J. Machine Learning Models to Interrogate Proteome-Wide Covalent Ligandabilities Directed at Cysteines. JACS Au 2024; 4:1374-1384. [PMID: 38665640 PMCID: PMC11040703 DOI: 10.1021/jacsau.3c00749] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/28/2023] [Revised: 02/22/2024] [Accepted: 02/23/2024] [Indexed: 04/28/2024]
Abstract
Machine learning (ML) identification of covalently ligandable sites may accelerate targeted covalent inhibitor design and help expand the druggable proteome space. Here, we report the rigorous development and validation of tree-based models and convolutional neural networks (CNNs) trained on a newly curated database (LigCys3D) of over 1,000 liganded cysteines in nearly 800 proteins, represented by over 10,000 three-dimensional structures in the Protein Data Bank (PDB). In unseen tests, the tree models and CNNs yielded areas under the receiver operating characteristic curve (AUCs) of 94% and 93%, respectively. Based on AlphaFold2-predicted structures, the ML models recapitulated the newly liganded cysteines in the PDB with recall values above 90%. To assist the covalent drug discovery community, we report the predicted ligandable cysteines in 392 human kinases and their locations in the sequence-aligned kinase structure, including the PH and SH2 domains. Furthermore, we disseminate a searchable online database, LigCys3D (https://ligcys.computchem.org/), and a web prediction server, DeepCys (https://deepcys.computchem.org/), both of which will be continuously updated and improved by including newly published experimental data. The present work represents an important step toward the ML-led integration of big genome data and structure models to annotate the human proteome space for next-generation covalent drug discovery.
Affiliation(s)
- Ruibin Liu
  - Department of Pharmaceutical Sciences, University of Maryland School of Pharmacy, Baltimore, Maryland 21201, United States
- Joseph Clayton
  - Department of Pharmaceutical Sciences, University of Maryland School of Pharmacy, Baltimore, Maryland 21201, United States
  - Division of Applied Regulatory Science, Office of Clinical Pharmacology, Center for Drug Evaluation and Research, U.S. Food and Drug Administration, Silver Spring, Maryland 20993, United States
- Mingzhe Shen
  - Department of Pharmaceutical Sciences, University of Maryland School of Pharmacy, Baltimore, Maryland 21201, United States
- Shubham Bhatnagar
  - Department of Computer Science, University of Maryland at College Park, College Park, Maryland 20742, United States
- Jana Shen
  - Department of Pharmaceutical Sciences, University of Maryland School of Pharmacy, Baltimore, Maryland 21201, United States
2
Chen B, Pan Z, Mou M, Zhou Y, Fu W. Is fragment-based graph a better graph-based molecular representation for drug design? A comparison study of graph-based models. Comput Biol Med 2024; 169:107811. [PMID: 38168647 DOI: 10.1016/j.compbiomed.2023.107811] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/27/2023] [Revised: 11/23/2023] [Accepted: 12/03/2023] [Indexed: 01/05/2024]
Abstract
Graph neural networks (GNNs) have gained significant traction in various sectors of AI-driven drug design. In recent years, integrating fragmentation concepts into GNNs has emerged as a potent strategy to improve the efficacy of molecular generative models. Nonetheless, challenges such as symmetry breaking and the potential misrepresentation of intricate cycles and undefined functional groups raise questions about the superiority of fragment-based graph representation over traditional methods. In our research, we undertook a rigorous evaluation, comparing the predictive performance of eight models, developed using deep learning algorithms, across 12 benchmark datasets that span a range of properties. These models encompass established methods such as GCN, AttentiveFP, and D-MPNN, as well as innovative fragment-based representation techniques. Our results indicate that fragment-based methodologies, notably PharmHGT, significantly improve model performance and interpretability, particularly when limited data are available. However, when extensive training data are available, fragment-based molecular graph representations may not necessarily outperform traditional methods. In summary, we posit that the integration of fragmentation, as an avant-garde technique in drug design, holds considerable promise for the future of AI-enhanced drug design.
Affiliation(s)
- Baiyu Chen
  - Department of Medicinal Chemistry, School of Pharmacy, Fudan University, 202103, China
- Ziqi Pan
  - College of Pharmaceutical Sciences, The Second Affiliated Hospital, Zhejiang University School of Medicine, Zhejiang University, Hangzhou, 310058, China
- Minjie Mou
  - College of Pharmaceutical Sciences, The Second Affiliated Hospital, Zhejiang University School of Medicine, Zhejiang University, Hangzhou, 310058, China
- Yuan Zhou
  - College of Pharmaceutical Sciences, The Second Affiliated Hospital, Zhejiang University School of Medicine, Zhejiang University, Hangzhou, 310058, China
- Wei Fu
  - Department of Medicinal Chemistry, School of Pharmacy, Fudan University, 202103, China
3
Liu R, Clayton J, Shen M, Bhatnagar S, Shen J. Machine Learning Models to Interrogate Proteomewide Covalent Ligandabilities Directed at Cysteines. bioRxiv 2024:2023.08.17.553742. [PMID: 37662346 PMCID: PMC10473668 DOI: 10.1101/2023.08.17.553742] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 09/05/2023]
Abstract
Machine learning (ML) identification of covalently ligandable sites may accelerate targeted covalent inhibitor design and help expand the druggable proteome space. Here we report the rigorous development and validation of tree-based models and convolutional neural networks (CNNs) trained on a newly curated database (LigCys3D) of over 1,000 liganded cysteines in nearly 800 proteins, represented by over 10,000 three-dimensional structures in the Protein Data Bank (PDB). In unseen tests, the tree models and CNNs yielded AUCs (areas under the receiver operating characteristic curve) of 94% and 93%, respectively. Based on AlphaFold2-predicted structures, the ML models recapitulated the newly liganded cysteines in the PDB with recall values above 90%. To assist the covalent drug discovery community, we report the predicted ligandable cysteines in 392 human kinases and their locations in the sequence-aligned kinase structure, including the PH and SH2 domains. Furthermore, we disseminate a searchable online database, LigCys3D (https://ligcys.computchem.org/), and a web prediction server, DeepCys (https://deepcys.computchem.org/), both of which will be continuously updated and improved by including newly published experimental data. The present work represents a first step towards the ML-led integration of big genome data and structure models to annotate the human proteome space for next-generation covalent drug discovery.
Affiliation(s)
- Ruibin Liu
  - Department of Pharmaceutical Sciences, University of Maryland School of Pharmacy, Baltimore, MD 21201, USA
- Joseph Clayton
  - Department of Pharmaceutical Sciences, University of Maryland School of Pharmacy, Baltimore, MD 21201, USA
  - Division of Applied Regulatory Science, Office of Clinical Pharmacology, Center for Drug Evaluation and Research, U.S. Food and Drug Administration, Silver Spring, MD 20993, USA
- Mingzhe Shen
  - Department of Pharmaceutical Sciences, University of Maryland School of Pharmacy, Baltimore, MD 21201, USA
- Shubham Bhatnagar
  - Department of Computer Science, University of Maryland at College Park, College Park, MD 20742, USA
- Jana Shen
  - Department of Pharmaceutical Sciences, University of Maryland School of Pharmacy, Baltimore, MD 21201, USA
4
Du H, Jiang D, Zhang O, Wu Z, Gao J, Zhang X, Wang X, Deng Y, Kang Y, Li D, Pan P, Hsieh CY, Hou T. A flexible data-free framework for structure-based de novo drug design with reinforcement learning. Chem Sci 2023; 14:12166-12181. [PMID: 37969589 PMCID: PMC10631243 DOI: 10.1039/d3sc04091g] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/06/2023] [Accepted: 10/11/2023] [Indexed: 11/17/2023] [Open Access]
Abstract
Contemporary structure-based molecular generative methods have demonstrated their potential to model the geometric and energetic complementarity between ligands and receptors, thereby facilitating the design of molecules with favorable binding affinity and target specificity. Despite the introduction of deep generative models for molecular generation, the atom-wise generation paradigm, which partially contradicts chemical intuition, limits the validity and synthetic accessibility of the generated molecules. Additionally, the dependence of deep learning models on large-scale structural data has hindered their adaptability across different targets. To overcome these challenges, we present a novel search-based framework, 3D-MCTS, for structure-based de novo drug design. Unlike prevailing atom-centric methods, 3D-MCTS employs a fragment-based molecular editing strategy. Fragments decomposed from small-molecule drugs are recombined under predefined retrosynthetic rules, offering improved drug-likeness and synthesizability and overcoming the inherent limitations of atom-based approaches. By combining multi-threaded parallel simulations with a real-time, energy constraint-based pruning strategy, 3D-MCTS achieves remarkable efficiency. At a fixed computational cost, it outperforms other state-of-the-art (SOTA) methods by producing molecules with enhanced binding affinity. Furthermore, its fragment-based approach ensures the generation of more dependable binding conformations, with a success rate 43.6% higher than that of other SOTA methods. This advantage becomes even more pronounced for targets that deviate significantly from the training dataset. 3D-MCTS achieves thirty times more hits with high binding affinity than traditional virtual screening methods, demonstrating its superior ability to explore chemical space. Moreover, the flexibility of our framework makes it easy to incorporate domain knowledge during the process, thereby enabling the generation of molecules with desirable pharmacophores and enhanced binding affinity. The adaptability of 3D-MCTS is further showcased in metalloprotein applications, highlighting its potential across various drug design scenarios.
Affiliation(s)
- Hongyan Du
  - College of Pharmaceutical Sciences, Zhejiang University, Hangzhou 310058, Zhejiang, China
- Dejun Jiang
  - College of Pharmaceutical Sciences, Zhejiang University, Hangzhou 310058, Zhejiang, China
- Odin Zhang
  - College of Pharmaceutical Sciences, Zhejiang University, Hangzhou 310058, Zhejiang, China
- Zhenxing Wu
  - College of Pharmaceutical Sciences, Zhejiang University, Hangzhou 310058, Zhejiang, China
- Junbo Gao
  - College of Pharmaceutical Sciences, Zhejiang University, Hangzhou 310058, Zhejiang, China
- Xujun Zhang
  - College of Pharmaceutical Sciences, Zhejiang University, Hangzhou 310058, Zhejiang, China
- Xiaorui Wang
  - Hangzhou Carbonsilicon AI Technology Co., Ltd, Hangzhou 310018, Zhejiang, China
  - Dr. Neher's Biophysics Laboratory for Innovative Drug Discovery, State Key Laboratory of Quality Research in Chinese Medicine, Macau Institute for Applied Research in Medicine and Health, Macau University of Science and Technology, Macao 999078, China
- Yafeng Deng
  - Hangzhou Carbonsilicon AI Technology Co., Ltd, Hangzhou 310018, Zhejiang, China
- Yu Kang
  - College of Pharmaceutical Sciences, Zhejiang University, Hangzhou 310058, Zhejiang, China
- Dan Li
  - College of Pharmaceutical Sciences, Zhejiang University, Hangzhou 310058, Zhejiang, China
- Peichen Pan
  - College of Pharmaceutical Sciences, Zhejiang University, Hangzhou 310058, Zhejiang, China
- Chang-Yu Hsieh
  - College of Pharmaceutical Sciences, Zhejiang University, Hangzhou 310058, Zhejiang, China
- Tingjun Hou
  - College of Pharmaceutical Sciences, Zhejiang University, Hangzhou 310058, Zhejiang, China
5
White MEH, Gil J, Tate EW. Proteome-wide structural analysis identifies warhead- and coverage-specific biases in cysteine-focused chemoproteomics. Cell Chem Biol 2023; 30:828-838.e4. [PMID: 37451266 DOI: 10.1016/j.chembiol.2023.06.021] [Citation(s) in RCA: 3] [Impact Index Per Article: 3.0] [Received: 11/10/2022] [Revised: 03/20/2023] [Accepted: 06/23/2023] [Indexed: 07/18/2023]
Abstract
Covalent drug discovery has undergone a resurgence over the past two decades, and reactive cysteine profiling has emerged in parallel as a platform for ligand discovery through on- and off-target profiling; however, the scope of this approach has not been fully explored at the whole-proteome level. We combined AlphaFold2-predicted side-chain accessibilities for >95% of the human proteome with a meta-analysis of eighteen public cysteine profiling datasets, totaling 44,187 unique cysteine residues, revealing accessibility biases in sampled cysteines primarily dictated by warhead chemistry. Analysis of >3.5 million cysteine-fragment interactions further showed that hit elaboration and optimization drive increased bias against buried cysteine residues. Based on these data, we suggest that current profiling approaches cover a small proportion of potential ligandable cysteine residues and propose future directions for increasing coverage, focusing on high-priority residues and depth. All analyses and resources produced are freely available and extendable to other reactive amino acids.
Affiliation(s)
- Matthew E H White
  - Department of Chemistry, Molecular Sciences Research Hub, Imperial College London, London W12 0BZ, UK; MRC London Institute of Medical Sciences (LMS), London W12 0NN, UK
- Jesús Gil
  - MRC London Institute of Medical Sciences (LMS), London W12 0NN, UK; Institute of Clinical Sciences (ICS), Faculty of Medicine, Imperial College London, London W12 0NN, UK
- Edward W Tate
  - Department of Chemistry, Molecular Sciences Research Hub, Imperial College London, London W12 0BZ, UK; The Francis Crick Institute, London NW1 1AT, UK
6
Kołat D, Zhao LY, Kciuk M, Płuciennik E, Kałuzińska-Kołat Ż. AP-2δ Is the Most Relevant Target of AP-2 Family-Focused Cancer Therapy and Affects Genome Organization. Cells 2022; 11:4124. [PMID: 36552887 PMCID: PMC9776946 DOI: 10.3390/cells11244124] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Received: 10/26/2022] [Revised: 11/26/2022] [Accepted: 12/15/2022] [Indexed: 12/24/2022] [Open Access]
Abstract
Formerly regarded as "undruggable" proteins, transcription factors (TFs) are now under investigation for targeted therapy. In cancer, this may alter, inter alia, immune evasion or replicative immortality, which are implicated in genome organization, a process that accompanies multi-step tumorigenesis and frequently develops in a non-random manner. Still, targeting-related research on some TFs is scarce, e.g., among AP-2 proteins, which are known for their altered functionality in cancer and their prognostic importance. Using public repositories, bioinformatics tools, and RNA-seq data, the present study examined the ligandability of all AP-2 members and selected the best one, which was then investigated in terms of mutations, targets, co-activators, correlated genes, and impact on genome organization. AP-2 proteins were found to share the conserved "TF_AP-2" domain but showed different binding characteristics and evolution. Among them, AP-2δ not only has the highest number of post-translational modifications and extended strands but also contains a specific histidine-rich region and a cleft that can receive a ligand. Uterine, colon, lung, and stomach tumors are most susceptible to AP-2δ mutations, which also co-depend with cancer hallmark genes and drug targets. Some AP-2δ targets were located proximally in the spatial genome or served as co-factors of the genes regulated by AP-2δ. Correlation and functional analyses suggested that AP-2δ affects various processes, including genome organization, via its targets; this was ultimately verified in lung adenocarcinoma using expression and immunohistochemistry data for chromosomal conformation-related genes. In conclusion, AP-2δ affects chromosomal conformation and is the most appropriate target for cancer therapy focused on the AP-2 family.
Affiliation(s)
- Damian Kołat (corresponding author)
  - Department of Experimental Surgery, Medical University of Lodz, 90-136 Lodz, Poland
- Lin-Yong Zhao
  - Gastric Cancer Center and Laboratory of Gastric Cancer, State Key Laboratory of Biotherapy, West China Hospital, Sichuan University, and Collaborative Innovation Centre for Biotherapy, Chengdu 610041, China
- Mateusz Kciuk
  - Department of Molecular Biotechnology and Genetics, University of Lodz, 90-237 Lodz, Poland
  - Doctoral School of Exact and Natural Sciences, University of Lodz, 90-237 Lodz, Poland
- Elżbieta Płuciennik
  - Department of Functional Genomics, Medical University of Lodz, 90-752 Lodz, Poland