1
Chen S, Zhong F. GPCRSPACE: A New GPCR Real Expanded Library Based on Large Language Models Architecture and Positive Sample Machine Learning Strategies. J Med Chem 2024; 67:16912-16922. [PMID: 39288965 DOI: 10.1021/acs.jmedchem.4c01983] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 09/19/2024]
Abstract
The quest for novel therapeutics targeting G protein-coupled receptors (GPCRs), essential in numerous physiological processes, is crucial in drug discovery. Despite the abundance of GPCR-targeting drugs, many receptors lack selective modulators, indicating a significant untapped therapeutic potential. To bridge this gap, we introduce GPCRSPACE, a novel GPCR-focused purchasable real chemical library developed using the G protein-coupled receptors large language models (GPCR LLM) architecture. Different from traditional machine learning models, GPCR LLM uses a positive sample machine learning strategy for training and does not need to construct any negative samples. This not only reduces false negatives but also reduces the time to label negative samples. GPCR LLM accelerates the identification and screening of potential GPCR-interactive compounds by learning the chemical space of GPCR-targeting molecules. GPCRSPACE, built on GPCR LLM, outperforms existing chemical data sets in synthesizability, structural diversity, and GPCR-likeness, making it a valuable tool for GPCR drug discovery.
Affiliation(s)
- Shiming Chen
- Fujian Key Laboratory of Drug Target Discovery and Structural and Functional Research, School of Pharmacy, Fujian Medical University, Fuzhou 350122, China
- Feisheng Zhong
- Fujian Key Laboratory of Drug Target Discovery and Structural and Functional Research, School of Pharmacy, Fujian Medical University, Fuzhou 350122, China
2
Yang Y, Qiu Y, Hu J, Rosen-Zvi M, Guan Q, Cheng F. A deep learning framework combining molecular image and protein structural representations identifies candidate drugs for pain. CELL REPORTS METHODS 2024:100865. [PMID: 39341201 DOI: 10.1016/j.crmeth.2024.100865] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/23/2024] [Revised: 07/11/2024] [Accepted: 09/03/2024] [Indexed: 09/30/2024]
Abstract
Artificial intelligence (AI) and deep learning technologies hold promise for identifying effective drugs for human diseases, including pain. Here, we present an interpretable deep-learning-based ligand image- and receptor's three-dimensional (3D)-structure-aware framework to predict compound-protein interactions (LISA-CPI). LISA-CPI integrates an unsupervised deep-learning-based molecular image representation (ImageMol) of ligands and an advanced AlphaFold2-based algorithm (Evoformer). We demonstrated that LISA-CPI achieved ∼20% improvement in the average mean absolute error (MAE) compared to state-of-the-art models on experimental CPIs connecting 104,969 ligands and 33 G-protein-coupled receptors (GPCRs). Using LISA-CPI, we prioritized potential repurposable drugs (e.g., methylergometrine) and identified candidate gut-microbiota-derived metabolites (e.g., citicoline) for potential treatment of pain via specifically targeting human GPCRs. In summary, we presented that the integration of molecular image and protein 3D structural representations using a deep learning framework offers a powerful computational drug discovery tool for treating pain and other complex diseases if broadly applied.
Affiliation(s)
- Yuxin Yang
- Cleveland Clinic Genome Center, Lerner Research Institute, Cleveland Clinic, Cleveland, OH 44195, USA; Department of Computer Science, Kent State University, Kent, OH 44242, USA; Genomic Medicine Institute, Lerner Research Institute, Cleveland Clinic, Cleveland, OH 44195, USA
- Yunguang Qiu
- Genomic Medicine Institute, Lerner Research Institute, Cleveland Clinic, Cleveland, OH 44195, USA
- Jianying Hu
- IBM T.J. Watson Research Center, Yorktown Heights, NY 10598, USA
- Michal Rosen-Zvi
- AI for Accelerated Healthcare and Life Sciences Discovery, IBM Research-Israel, Haifa 3498825, Israel
- Qiang Guan
- Department of Computer Science, Kent State University, Kent, OH 44242, USA.
- Feixiong Cheng
- Cleveland Clinic Genome Center, Lerner Research Institute, Cleveland Clinic, Cleveland, OH 44195, USA; Genomic Medicine Institute, Lerner Research Institute, Cleveland Clinic, Cleveland, OH 44195, USA; Department of Molecular Medicine, Cleveland Clinic Lerner College of Medicine, Case Western Reserve University, Cleveland, OH 44195, USA.
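
The abstract of entry 2 reports a relative improvement in mean absolute error (MAE). As a minimal sketch of how such a comparison is usually computed, the following uses made-up affinity values; the function names and all numbers are illustrative assumptions and are not taken from the paper.

```python
def mean_absolute_error(y_true, y_pred):
    """Average absolute difference between measured and predicted values."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def relative_improvement(baseline_mae, new_mae):
    """Fractional reduction in MAE relative to a baseline model."""
    return (baseline_mae - new_mae) / baseline_mae


# Hypothetical measured and predicted binding affinities (illustration only).
measured = [6.0, 7.0, 8.0, 5.0]
baseline_pred = [6.5, 7.5, 7.5, 5.5]
new_pred = [6.4, 7.4, 7.6, 5.4]

baseline_mae = mean_absolute_error(measured, baseline_pred)
new_mae = mean_absolute_error(measured, new_pred)
improvement = relative_improvement(baseline_mae, new_mae)
```

A lower MAE is better, so a positive `improvement` means the new model's predictions are closer to the measured values on average.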
3
Jang H, Seo S, Park S, Kim BJ, Choi GW, Choi J, Park C. De novo drug design through gradient-based regularized search in information-theoretically controlled latent space. J Comput Aided Mol Des 2024; 38:32. [PMID: 39190191 DOI: 10.1007/s10822-024-00571-3] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/28/2024] [Accepted: 07/31/2024] [Indexed: 08/28/2024]
Abstract
Over the last decade, automatic chemical design frameworks for discovering molecules with drug-like properties have significantly progressed. Among them, the variational autoencoder (VAE) is a cutting-edge approach that models the tractable latent space of the molecular space. In particular, the usage of a VAE along with a property estimator has attracted considerable interest because it enables gradient-based optimization of a given molecule. However, although successful results have been achieved experimentally, the theoretical background and prerequisites for the correct operation of this method have not yet been clarified. In view of the above, we theoretically analyze and rigorously reconstruct the entire framework. From the perspective of parameterized distribution and the information theory, we first describe how the previous model overcomes the limitations of the beta VAE in discovering molecules with the desired properties. Furthermore, we describe the prerequisites for training the above model. Next, from the log-likelihood perspective of each term, we reformulate the objectives for exploring latent space to generate drug-like molecules. The distributional constraints are defined in this study, which will break away from the invalid molecular search. We demonstrated that our model could discover a novel chemical compound for targeting BCL-2 family proteins in de novo approach. Through the theoretical analysis and practical implementation, the importance of the aforementioned prerequisites and constraints to operate the model was verified.
Affiliation(s)
- Hyosoon Jang
- Graduate School of AI, POSTECH, 77 Cheongam-Ro, Pohang, 37673, Gyeongbuk, Republic of Korea
- Sangmin Seo
- Department of Computer Science, Yonsei University, Yonsei-ro 50, Seodaemun-gu, Seoul, 03722, Republic of Korea
- Sanghyun Park
- Department of Computer Science, Yonsei University, Yonsei-ro 50, Seodaemun-gu, Seoul, 03722, Republic of Korea
- Byung Ju Kim
- UBLBio Corporation, Yeongtong-ro 237, Suwon, 16679, Gyeonggi-do, Republic of Korea
- Geon-Woo Choi
- Department of Medical Bigdata Convergence, Kangwon National University, 1 Kangwondaehak-gil, Chuncheon, 24341, Gangwon-do, Republic of Korea
- Jonghwan Choi
- College of Information Science, Hallym University, 1 Hallymdaehak-gil, Chuncheon, 24252, Gangwon-do, Republic of Korea.
- Chihyun Park
- Department of Medical Bigdata Convergence, Kangwon National University, 1 Kangwondaehak-gil, Chuncheon, 24341, Gangwon-do, Republic of Korea.
- Department of Computer Science and Engineering, Kangwon National University, 1 Kangwondaehak-gil, Chuncheon, 24341, Gangwon-do, Republic of Korea.
4
Oh M, Shen M, Liu R, Stavitskaya L, Shen J. Machine Learned Classification of Ligand Intrinsic Activities at Human μ-Opioid Receptor. ACS Chem Neurosci 2024; 15:2842-2852. [PMID: 38990780 DOI: 10.1021/acschemneuro.4c00212] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 07/13/2024]
Abstract
Opioids are small-molecule agonists of μ-opioid receptor (μOR), while reversal agents such as naloxone are antagonists of μOR. Here, we developed machine learning (ML) models to classify the intrinsic activities of ligands at the human μOR based on the SMILES strings and two-dimensional molecular descriptors. We first manually curated a database of 983 small molecules with measured Emax values at the human μOR. Analysis of the chemical space allowed identification of dominant scaffolds and structurally similar agonists and antagonists. Decision tree models and directed message passing neural networks (MPNNs) were then trained to classify agonistic and antagonistic ligands. The hold-out test AUCs (areas under the receiver operator curves) of the extra-tree (ET) and MPNN models are 91.5 ± 3.9% and 91.8 ± 4.4%, respectively. To overcome the challenge of a small data set, a student-teacher learning method called tritraining with disagreement was tested using an unlabeled data set comprised of 15,816 ligands of human, mouse, and rat μOR, κOR, and δOR. We found that the tritraining scheme was able to increase the hold-out AUC of MPNN models to as high as 95.7%. Our work demonstrates the feasibility of developing ML models to accurately predict the intrinsic activities of μOR ligands, even with limited data. We envisage potential applications of these models in evaluating uncharacterized substances for public safety risks and discovering new therapeutic agents to counteract opioid overdoses.
Affiliation(s)
- Myongin Oh
- Division of Applied Regulatory Science, Office of Clinical Pharmacology, Center for Drug Evaluation and Research, United States Food and Drug Administration, Silver Spring, Maryland 20993, United States
- Department of Pharmaceutical Sciences, University of Maryland School of Pharmacy, Baltimore, Maryland 21201, United States
- Maximilian Shen
- Department of Electrical and Computer Engineering, University of Maryland, College Park, Maryland 20742, United States
- Ruibin Liu
- Department of Pharmaceutical Sciences, University of Maryland School of Pharmacy, Baltimore, Maryland 21201, United States
- Lidiya Stavitskaya
- Division of Applied Regulatory Science, Office of Clinical Pharmacology, Center for Drug Evaluation and Research, United States Food and Drug Administration, Silver Spring, Maryland 20993, United States
- Jana Shen
- Department of Pharmaceutical Sciences, University of Maryland School of Pharmacy, Baltimore, Maryland 21201, United States
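
The abstract of entry 4 evaluates agonist/antagonist classifiers by hold-out AUC. As a minimal sketch of what that metric measures, the AUC can be computed as the probability that a randomly chosen positive example scores higher than a randomly chosen negative one (ties count as one half). The labels and scores below are invented for illustration and are not data from the paper.

```python
def roc_auc(labels, scores):
    """Area under the ROC curve via pairwise comparison of positives and negatives."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0   # positive correctly ranked above negative
            elif p == n:
                wins += 0.5   # ties count as half
    return wins / (len(pos) * len(neg))


# Hypothetical hold-out set: 1 = agonist, 0 = antagonist (illustration only).
labels = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.7, 0.3, 0.2]
auc = roc_auc(labels, scores)
```

The pairwise form is quadratic in the number of examples, which is acceptable for small hold-out sets; for larger sets a rank-based computation gives the same value more efficiently.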
5
Bernett J, Blumenthal DB, Grimm DG, Haselbeck F, Joeres R, Kalinina OV, List M. Guiding questions to avoid data leakage in biological machine learning applications. Nat Methods 2024; 21:1444-1453. [PMID: 39122953 DOI: 10.1038/s41592-024-02362-y] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 01/10/2024] [Accepted: 06/26/2024] [Indexed: 08/12/2024]
Abstract
Machine learning methods for extracting patterns from high-dimensional data are very important in the biological sciences. However, in certain cases, real-world applications cannot confirm the reported prediction performance. One of the main reasons for this is data leakage, which can be seen as the illicit sharing of information between the training data and the test data, resulting in performance estimates that are far better than the performance observed in the intended application scenario. Data leakage can be difficult to detect in biological datasets due to their complex dependencies. With this in mind, we present seven questions that should be asked to prevent data leakage when constructing machine learning models in biological domains. We illustrate the usefulness of our questions by applying them to nontrivial examples. Our goal is to raise awareness of potential data leakage problems and to promote robust and reproducible machine learning-based research in biology.
Affiliation(s)
- Judith Bernett
- TUM School of Life Sciences, Technical University of Munich, Freising, Germany
- David B Blumenthal
- Department Artificial Intelligence in Biomedical Engineering, Friedrich-Alexander-Universität Erlangen-Nürnberg, Erlangen, Germany.
- Dominik G Grimm
- TUM Campus Straubing for Biotechnology and Sustainability, Technical University of Munich, Straubing, Germany.
- Bioinformatics, Weihenstephan-Triesdorf University of Applied Sciences, Straubing, Germany.
- TUM School of Computation, Information and Technology, Technical University of Munich, Garching, Germany.
- Florian Haselbeck
- TUM Campus Straubing for Biotechnology and Sustainability, Technical University of Munich, Straubing, Germany
- Bioinformatics, Weihenstephan-Triesdorf University of Applied Sciences, Straubing, Germany
- Smart Farming, Weihenstephan-Triesdorf University of Applied Sciences, Freising, Germany
- Roman Joeres
- Department of Chemistry and Molecular Biology, University of Gothenburg, Gothenburg, Sweden
- Wallenberg Centre for Molecular and Translational Medicine, University of Gothenburg, Gothenburg, Sweden
- Helmholtz Institute for Pharmaceutical Research Saarland (HIPS), Helmholtz Centre for Infection Research (HZI), Saarbrücken, Germany
- Center for Bioinformatics, Saarland University, Saarbrücken, Germany
- Olga V Kalinina
- Helmholtz Institute for Pharmaceutical Research Saarland (HIPS), Helmholtz Centre for Infection Research (HZI), Saarbrücken, Germany.
- Center for Bioinformatics, Saarland University, Saarbrücken, Germany.
- Medical Faculty, Saarland University, Homburg, Germany.
- Markus List
- TUM School of Life Sciences, Technical University of Munich, Freising, Germany.
- Munich Data Science Institute (MDSI), Technical University of Munich, Garching, Germany.
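
The abstract of entry 5 describes data leakage as information shared between training and test data. One common mitigation, shown here as a minimal sketch and not as one of the paper's seven questions (which the abstract does not list), is to split by group so that related samples never straddle the train/test boundary. The grouping key (a scaffold label), the helper name `group_split`, and the toy data are all assumptions made for illustration.

```python
import random
from collections import defaultdict


def group_split(samples, group_of, test_fraction=0.2, seed=0):
    """Split samples so that every group lands entirely in train or in test."""
    groups = defaultdict(list)
    for sample in samples:
        groups[group_of(sample)].append(sample)
    keys = sorted(groups)                 # deterministic order before shuffling
    random.Random(seed).shuffle(keys)
    target = test_fraction * len(samples)
    train, test = [], []
    for key in keys:
        # Fill the test set with whole groups until it reaches the target size.
        (test if len(test) < target else train).extend(groups[key])
    return train, test


# Hypothetical data: (molecule id, scaffold id); five scaffolds, four molecules each.
samples = [("mol%d" % i, "scaffold%d" % (i % 5)) for i in range(20)]
train, test = group_split(samples, group_of=lambda s: s[1])
```

Because whole groups are assigned at once, the realised test fraction can differ from `test_fraction` when group sizes are uneven.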
6
Nguyen ATN, Nguyen DTN, Koh HY, Toskov J, MacLean W, Xu A, Zhang D, Webb GI, May LT, Halls ML. The application of artificial intelligence to accelerate G protein-coupled receptor drug discovery. Br J Pharmacol 2024; 181:2371-2384. [PMID: 37161878 DOI: 10.1111/bph.16140] [Citation(s) in RCA: 5] [Impact Index Per Article: 5.0] [Received: 09/26/2022] [Revised: 04/14/2023] [Accepted: 04/27/2023] [Indexed: 05/11/2023]
Abstract
The application of artificial intelligence (AI) approaches to drug discovery for G protein-coupled receptors (GPCRs) is a rapidly expanding area. Artificial intelligence can be used at multiple stages during the drug discovery process, from aiding our understanding of the fundamental actions of GPCRs to the discovery of new ligand-GPCR interactions or the prediction of clinical responses. Here, we provide an overview of the concepts behind artificial intelligence, including the subfields of machine learning and deep learning. We summarise the published applications of artificial intelligence to different stages of the GPCR drug discovery process. Finally, we reflect on the benefits and limitations of artificial intelligence and share our vision for the exciting potential for further development of applications to aid GPCR drug discovery. In addition to making the drug discovery process "faster, smarter and cheaper," we anticipate that the application of artificial intelligence will create exciting new opportunities for GPCR drug discovery. LINKED ARTICLES: This article is part of a themed issue Therapeutic Targeting of G Protein-Coupled Receptors: hot topics from the Australasian Society of Clinical and Experimental Pharmacologists and Toxicologists 2021 Virtual Annual Scientific Meeting. To view the other articles in this section visit http://onlinelibrary.wiley.com/doi/10.1111/bph.v181.14/issuetoc.
Affiliation(s)
- Anh T N Nguyen
- Drug Discovery Biology Theme, Monash Institute of Pharmaceutical Sciences, Monash University, Parkville, Victoria, Australia
- Diep T N Nguyen
- Department of Information Technology, Faculty of Engineering and Technology, Vietnam National University, Cau Giay, Hanoi, Vietnam
- Huan Yee Koh
- Drug Discovery Biology Theme, Monash Institute of Pharmaceutical Sciences, Monash University, Parkville, Victoria, Australia
- Monash Data Futures Institute and Department of Data Science and Artificial Intelligence, Monash University, Clayton, Victoria, Australia
- Jason Toskov
- Monash DeepNeuron, Monash University, Clayton, Victoria, Australia
- William MacLean
- Monash DeepNeuron, Monash University, Clayton, Victoria, Australia
- Andrew Xu
- Monash DeepNeuron, Monash University, Clayton, Victoria, Australia
- Daokun Zhang
- Drug Discovery Biology Theme, Monash Institute of Pharmaceutical Sciences, Monash University, Parkville, Victoria, Australia
- Monash Data Futures Institute and Department of Data Science and Artificial Intelligence, Monash University, Clayton, Victoria, Australia
- Geoffrey I Webb
- Monash Data Futures Institute and Department of Data Science and Artificial Intelligence, Monash University, Clayton, Victoria, Australia
- Lauren T May
- Drug Discovery Biology Theme, Monash Institute of Pharmaceutical Sciences, Monash University, Parkville, Victoria, Australia
- Michelle L Halls
- Drug Discovery Biology Theme, Monash Institute of Pharmaceutical Sciences, Monash University, Parkville, Victoria, Australia
7
Oh M, Shen M, Liu R, Stavitskaya L, Shen J. Machine Learned Classification of Ligand Intrinsic Activities at Human μ-Opioid Receptor. bioRxiv 2024:2024.04.07.588485. [PMID: 38645122 PMCID: PMC11030315 DOI: 10.1101/2024.04.07.588485] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 04/23/2024]
Abstract
Opioids are small-molecule agonists of μ-opioid receptor (μOR), while reversal agents such as naloxone are antagonists of μOR. Here we developed machine learning (ML) models to classify the intrinsic activities of ligands at the human μOR based on the SMILES strings and two-dimensional molecular descriptors. We first manually curated a database of 983 small molecules with measured Emax values at the human μOR. Analysis of the chemical space allowed identification of dominant scaffolds and structurally similar agonists and antagonists. Decision tree models and directed message passing neural networks (MPNNs) were then trained to classify agonistic and antagonistic ligands. The hold-out test AUCs (areas under the receiver operator curves) of the extra-tree (ET) and MPNN models are 91.5 ± 3.9% and 91.8 ± 4.4%, respectively. To overcome the challenge of a small dataset, a student-teacher learning method called tri-training with disagreement was tested using an unlabeled dataset comprised of 15,816 ligands of human, mouse, or rat μOR, κOR, or δOR. We found that the tri-training scheme was able to increase the hold-out AUC of MPNN to as high as 95.7%. Our work demonstrates the feasibility of developing ML models to accurately predict the intrinsic activities of μOR ligands, even with limited data. We envisage potential applications of these models in evaluating uncharacterized substances for public safety risks and discovering new therapeutic agents to counteract opioid overdoses.
Affiliation(s)
- Myongin Oh
- Division of Applied Regulatory Science, Office of Clinical Pharmacology, Center for Drug Evaluation and Research, United States Food and Drug Administration, Silver Spring, MD 20993, United States
- Department of Pharmaceutical Sciences, University of Maryland School of Pharmacy, Baltimore, MD 21201, United States
- Maximilian Shen
- Department of Electrical and Computer Engineering, University of Maryland, College Park, MD
- Ruibin Liu
- Department of Pharmaceutical Sciences, University of Maryland School of Pharmacy, Baltimore, MD 21201, United States
- Lidiya Stavitskaya
- Division of Applied Regulatory Science, Office of Clinical Pharmacology, Center for Drug Evaluation and Research, United States Food and Drug Administration, Silver Spring, MD 20993, United States
- Jana Shen
- Department of Pharmaceutical Sciences, University of Maryland School of Pharmacy, Baltimore, MD 21201, United States
8
Yin Y, Hu H, Yang J, Ye C, Goh WWB, Kong AWK, Wu J. OLB-AC: toward optimizing ligand bioactivities through deep graph learning and activity cliffs. Bioinformatics 2024; 40:btae365. [PMID: 38889277 PMCID: PMC11208724 DOI: 10.1093/bioinformatics/btae365] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/19/2023] [Revised: 05/14/2024] [Accepted: 06/14/2024] [Indexed: 06/20/2024]
Abstract
MOTIVATION: Deep graph learning (DGL) has been widely employed in the realm of ligand-based virtual screening. Within this field, a key hurdle is the existence of activity cliffs (ACs), where minor chemical alterations can lead to significant changes in bioactivity. In response, several DGL models have been developed to enhance ligand bioactivity prediction in the presence of ACs. Yet, there remains a largely unexplored opportunity within ACs for optimizing ligand bioactivity, making it an area ripe for further investigation.
RESULTS: We present a novel approach to simultaneously predict and optimize ligand bioactivities through DGL and ACs (OLB-AC). OLB-AC possesses the capability to optimize ligand molecules located near ACs, providing a direct reference for optimizing ligand bioactivities with the matching of original ligands. To accomplish this, a novel attentive graph reconstruction neural network and ligand optimization scheme are proposed. Attentive graph reconstruction neural network reconstructs original ligands and optimizes them through adversarial representations derived from their bioactivity prediction process. Experimental results on nine drug targets reveal that out of the 667 molecules generated through OLB-AC optimization on datasets comprising 974 low-activity, noninhibitor, or highly toxic ligands, 49 are recognized as known highly active, inhibitor, or nontoxic ligands beyond the datasets' scope. The 27 out of 49 matched molecular pairs generated by OLB-AC reveal novel transformations not present in their training sets. The adversarial representations employed for ligand optimization originate from the gradients of bioactivity predictions. Therefore, we also assess OLB-AC's prediction accuracy across 33 different bioactivity datasets. Results show that OLB-AC achieves the best Pearson correlation coefficient (r²) on 27/33 datasets, with an average improvement of 7.2%-22.9% against the state-of-the-art bioactivity prediction methods.
AVAILABILITY AND IMPLEMENTATION: The code and dataset developed in this work are available at github.com/Yueming-Yin/OLB-AC.
Affiliation(s)
- Yueming Yin
- School of Telecommunications and Information Engineering, Nanjing University of Posts and Telecommunications, Nanjing 210003, China
- College of Computing and Data Science, Nanyang Technological University, 639798, Singapore
- Haifeng Hu
- School of Telecommunications and Information Engineering, Nanjing University of Posts and Telecommunications, Nanjing 210003, China
- Jitao Yang
- School of Telecommunications and Information Engineering, Nanjing University of Posts and Telecommunications, Nanjing 210003, China
- Chun Ye
- School of Telecommunications and Information Engineering, Nanjing University of Posts and Telecommunications, Nanjing 210003, China
- Wilson Wen Bin Goh
- Lee Kong Chian School of Medicine, Nanyang Technological University, 637551, Singapore
- School of Biological Sciences, Nanyang Technological University, 637551, Singapore
- Center for Biomedical Informatics, Nanyang Technological University, 637551, Singapore
- Center for AI in Medicine, Nanyang Technological University, 639798, Singapore
- Division of Neurology, Department of Brain Sciences, Faculty of Medicine, Imperial College London, London W12 0NN, U.K
- Adams Wai-Kin Kong
- College of Computing and Data Science, Nanyang Technological University, 639798, Singapore
- Jiansheng Wu
- School of Computer Science, Nanjing University of Posts and Telecommunications, Nanjing 210023, China
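
The abstract of entry 8 compares bioactivity predictors by their squared Pearson correlation. As a minimal sketch of that metric, the following computes the Pearson coefficient r between measured and predicted values and then squares it. The function name and the values are illustrative assumptions, not results or code from the OLB-AC repository.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for constant input")
    return sxy / math.sqrt(sxx * syy)


# Hypothetical measured vs. predicted bioactivities (illustration only).
measured = [5.0, 6.0, 7.0, 8.0]
predicted = [5.5, 6.5, 7.5, 8.5]
r = pearson_r(measured, predicted)
r_squared = r ** 2
```

Note that a constant offset, as in this toy example, leaves r unchanged, which is why correlation is usually reported alongside an error metric such as MAE or RMSE.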
9
Fryer E, Guha S, Rogel-Hernandez LE, Logan-Garbisch T, Farah H, Rezaei E, Mollhoff IN, Nekimken AL, Xu A, Seyahi LS, Fechner S, Druckmann S, Clandinin TR, Rhee SY, Goodman MB. A high-throughput behavioral screening platform for measuring chemotaxis by C. elegans. PLoS Biol 2024; 22:e3002672. [PMID: 38935621 PMCID: PMC11210793 DOI: 10.1371/journal.pbio.3002672] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/18/2023] [Accepted: 05/11/2024] [Indexed: 06/29/2024]
Abstract
Throughout history, humans have relied on plants as a source of medication, flavoring, and food. Plants synthesize large chemical libraries and release many of these compounds into the rhizosphere and atmosphere where they affect animal and microbe behavior. To survive, nematodes must have evolved the sensory capacity to distinguish plant-made small molecules (SMs) that are harmful and must be avoided from those that are beneficial and should be sought. This ability to classify chemical cues as a function of their value is fundamental to olfaction and represents a capacity shared by many animals, including humans. Here, we present an efficient platform based on multiwell plates, liquid handling instrumentation, inexpensive optical scanners, and bespoke software that can efficiently determine the valence (attraction or repulsion) of single SMs in the model nematode, Caenorhabditis elegans. Using this integrated hardware-wetware-software platform, we screened 90 plant SMs and identified 37 that attracted or repelled wild-type animals but had no effect on mutants defective in chemosensory transduction. Genetic dissection indicates that for at least 10 of these SMs, response valence emerges from the integration of opposing signals, arguing that olfactory valence is often determined by integrating chemosensory signals over multiple lines of information. This study establishes that C. elegans is an effective discovery engine for determining chemotaxis valence and for identifying natural products detected by the chemosensory nervous system.
Affiliation(s)
- Emily Fryer
- Department of Molecular and Cellular Physiology, Stanford University, Stanford, California, United States of America
- Department of Plant Biology, Carnegie Institution for Science, Stanford, California, United States of America
- Sujay Guha
- Department of Molecular and Cellular Physiology, Stanford University, Stanford, California, United States of America
- Lucero E. Rogel-Hernandez
- Department of Molecular and Cellular Physiology, Stanford University, Stanford, California, United States of America
- Theresa Logan-Garbisch
- Department of Molecular and Cellular Physiology, Stanford University, Stanford, California, United States of America
- Neurosciences Graduate Program, Stanford University, Stanford, California, United States of America
- Hodan Farah
- Department of Molecular and Cellular Physiology, Stanford University, Stanford, California, United States of America
- Department of Plant Biology, Carnegie Institution for Science, Stanford, California, United States of America
- Ehsan Rezaei
- Department of Molecular and Cellular Physiology, Stanford University, Stanford, California, United States of America
- Iris N. Mollhoff
- Department of Biology, Stanford University, Stanford, California, United States of America
- Adam L. Nekimken
- Department of Molecular and Cellular Physiology, Stanford University, Stanford, California, United States of America
- Department of Mechanical Engineering, Stanford University, Stanford, California, United States of America
- Angela Xu
- Department of Plant Biology, Carnegie Institution for Science, Stanford, California, United States of America
- Lara Selin Seyahi
- Department of Molecular and Cellular Physiology, Stanford University, Stanford, California, United States of America
- Department of Plant Biology, Carnegie Institution for Science, Stanford, California, United States of America
- Sylvia Fechner
- Department of Molecular and Cellular Physiology, Stanford University, Stanford, California, United States of America
- Shaul Druckmann
- Department of Neurobiology, Stanford University, Stanford, California, United States of America
- Thomas R. Clandinin
- Department of Neurobiology, Stanford University, Stanford, California, United States of America
- Seung Y. Rhee
- Department of Plant Biology, Carnegie Institution for Science, Stanford, California, United States of America
- Miriam B. Goodman
- Department of Molecular and Cellular Physiology, Stanford University, Stanford, California, United States of America
10
Qiu Y, Hou Y, Gohel D, Zhou Y, Xu J, Bykova M, Yang Y, Leverenz JB, Pieper AA, Nussinov R, Caldwell JZK, Brown JM, Cheng F. Systematic characterization of multi-omics landscape between gut microbial metabolites and GPCRome in Alzheimer's disease. Cell Rep 2024; 43:114128. [PMID: 38652661 DOI: 10.1016/j.celrep.2024.114128] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/22/2023] [Revised: 03/06/2024] [Accepted: 04/03/2024] [Indexed: 04/25/2024]
Abstract
Shifts in the magnitude and nature of gut microbial metabolites have been implicated in Alzheimer's disease (AD), but the host receptors that sense and respond to these metabolites are largely unknown. Here, we develop a systems biology framework that integrates machine learning and multi-omics to identify molecular relationships of gut microbial metabolites with non-olfactory G-protein-coupled receptors (termed the "GPCRome"). We evaluate 1.09 million metabolite-protein pairs connecting 408 human GPCRs and 335 gut microbial metabolites. Using genetics-derived Mendelian randomization and integrative analyses of human brain transcriptomic and proteomic profiles, we identify orphan GPCRs (i.e., GPR84) as potential drug targets in AD and that triacanthine experimentally activates GPR84. We demonstrate that phenethylamine and agmatine significantly reduce tau hyperphosphorylation (p-tau181 and p-tau205) in AD patient induced pluripotent stem cell-derived neurons. This study demonstrates a systems biology framework to uncover the GPCR targets of human gut microbiota in AD and other complex diseases if broadly applied.
Affiliation(s)
- Yunguang Qiu
- Cleveland Clinic Genome Center, Lerner Research Institute, Cleveland Clinic, Cleveland, OH 44195, USA; Genomic Medicine Institute, Lerner Research Institute, Cleveland Clinic, Cleveland, OH 44195, USA
- Yuan Hou
- Cleveland Clinic Genome Center, Lerner Research Institute, Cleveland Clinic, Cleveland, OH 44195, USA; Genomic Medicine Institute, Lerner Research Institute, Cleveland Clinic, Cleveland, OH 44195, USA
| | - Dhruv Gohel
- Cleveland Clinic Genome Center, Lerner Research Institute, Cleveland Clinic, Cleveland, OH 44195, USA; Genomic Medicine Institute, Lerner Research Institute, Cleveland Clinic, Cleveland, OH 44195, USA
| | - Yadi Zhou
- Cleveland Clinic Genome Center, Lerner Research Institute, Cleveland Clinic, Cleveland, OH 44195, USA; Genomic Medicine Institute, Lerner Research Institute, Cleveland Clinic, Cleveland, OH 44195, USA
| | - Jielin Xu
- Cleveland Clinic Genome Center, Lerner Research Institute, Cleveland Clinic, Cleveland, OH 44195, USA; Genomic Medicine Institute, Lerner Research Institute, Cleveland Clinic, Cleveland, OH 44195, USA
| | - Marina Bykova
- Cleveland Clinic Genome Center, Lerner Research Institute, Cleveland Clinic, Cleveland, OH 44195, USA; Genomic Medicine Institute, Lerner Research Institute, Cleveland Clinic, Cleveland, OH 44195, USA
| | - Yuxin Yang
- Cleveland Clinic Genome Center, Lerner Research Institute, Cleveland Clinic, Cleveland, OH 44195, USA; Genomic Medicine Institute, Lerner Research Institute, Cleveland Clinic, Cleveland, OH 44195, USA
| | - James B Leverenz
- Lou Ruvo Center for Brain Health, Neurological Institute, Cleveland Clinic, Cleveland, OH 44195, USA; Department of Molecular Medicine, Cleveland Clinic Lerner College of Medicine, Case Western Reserve University, Cleveland, OH 44195, USA
| | - Andrew A Pieper
- Brain Health Medicines Center, Harrington Discovery Institute, University Hospitals Cleveland Medical Center, Cleveland, OH 44106, USA; Department of Psychiatry, Case Western Reserve University, Cleveland, OH 44106, USA; Geriatric Psychiatry, GRECC, Louis Stokes Cleveland VA Medical Center, Cleveland, OH 44106, USA; Institute for Transformative Molecular Medicine, School of Medicine, Case Western Reserve University, Cleveland, OH 44106, USA; Department of Neurosciences, Case Western Reserve University, School of Medicine, Cleveland, OH 44106, USA; Department of Pathology, Case Western Reserve University, School of Medicine, Cleveland, OH 44106, USA
| | - Ruth Nussinov
- Computational Structural Biology Section, Frederick National Laboratory for Cancer Research in the Cancer Innovation Laboratory, National Cancer Institute, Frederick, MD 21702, USA; Department of Human Molecular Genetics and Biochemistry, Sackler School of Medicine, Tel Aviv University, Tel Aviv 69978, Israel
| | - Jessica Z K Caldwell
- Department of Molecular Medicine, Cleveland Clinic Lerner College of Medicine, Case Western Reserve University, Cleveland, OH 44195, USA; Lou Ruvo Center for Brain Health, Neurological Institute, Cleveland Clinic, Las Vegas, NV 89106, USA
| | - J Mark Brown
- Department of Molecular Medicine, Cleveland Clinic Lerner College of Medicine, Case Western Reserve University, Cleveland, OH 44195, USA; Department of Cancer Biology, Lerner Research Institute Cleveland Clinic, Cleveland, OH 44195, USA; Center for Microbiome and Human Health, Lerner Research Institute, Cleveland Clinic, Cleveland, OH 44195, USA
| | - Feixiong Cheng
- Cleveland Clinic Genome Center, Lerner Research Institute, Cleveland Clinic, Cleveland, OH 44195, USA; Genomic Medicine Institute, Lerner Research Institute, Cleveland Clinic, Cleveland, OH 44195, USA; Department of Molecular Medicine, Cleveland Clinic Lerner College of Medicine, Case Western Reserve University, Cleveland, OH 44195, USA; Case Comprehensive Cancer Center, Case Western Reserve University, School of Medicine, Cleveland, OH 44106, USA.
| |
|
11
|
Zhang S, Tian X, Chen C, Su Y, Huang W, Lv X, Chen C, Li H. AIGO-DTI: Predicting Drug-Target Interactions Based on Improved Drug Properties Combined with Adaptive Iterative Algorithms. J Chem Inf Model 2024; 64:4373-4384. [PMID: 38743013 DOI: 10.1021/acs.jcim.4c00584] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 05/16/2024]
Abstract
Artificial intelligence-based methods for predicting drug-target interactions (DTIs) aim to explore reliable drug candidate targets rapidly and cost-effectively to accelerate the drug development process. However, current methods are often limited by the topological regularities of drug molecules, making them difficult to generalize to a broader chemical space. Additionally, the use of similarity to measure DTI network links often introduces noise, leading to false DTI relationships and affecting the prediction accuracy. To address these issues, this study proposes an Adaptive Iterative Graph Optimization (AIGO)-DTI prediction framework. This framework integrates atomic cluster information and enhances molecular features through the design of functional group prompts and graph encoders, optimizing the construction of DTI association networks. Furthermore, the optimization of graph structure is transformed into a node similarity learning problem, utilizing multihead similarity metric functions to iteratively update the network structure to improve the quality of DTI information. Experimental results demonstrate the outstanding performance of AIGO-DTI on multiple public data sets and label reversal data sets. Case studies, molecular docking, and existing research validate its effectiveness and reliability. Overall, the method proposed in this study can construct comprehensive and reliable DTI association network information, providing new graphing and optimization strategies for DTI prediction, which contribute to efficient drug development and reduce target discovery costs.
Affiliation(s)
- Sizhe Zhang
- College of Software, Xinjiang University, Urumqi, 830046 Xinjiang, China
| | - Xuecong Tian
- College of Information Science and Engineering, Xinjiang University, Urumqi, 830046 Xinjiang, China
| | - Chen Chen
- College of Information Science and Engineering, Xinjiang University, Urumqi, 830046 Xinjiang, China
| | - Ying Su
- College of Information Science and Engineering, Xinjiang University, Urumqi, 830046 Xinjiang, China
| | - Wanhua Huang
- College of Information Science and Engineering, Xinjiang University, Urumqi, 830046 Xinjiang, China
| | - Xiaoyi Lv
- College of Software, Xinjiang University, Urumqi, 830046 Xinjiang, China
| | - Cheng Chen
- College of Software, Xinjiang University, Urumqi, 830046 Xinjiang, China
| | - Hongyi Li
- Xinjiang University, Urumqi, 830046 Xinjiang, China
| |
|
12
|
Zhang H, Fan H, Wang J, Hou T, Saravanan KM, Xia W, Kan HW, Li J, Zhang JZH, Liang X, Chen Y. Revolutionizing GPCR-ligand predictions: DeepGPCR with experimental validation for high-precision drug discovery. Brief Bioinform 2024; 25:bbae281. [PMID: 38864340 PMCID: PMC11167311 DOI: 10.1093/bib/bbae281] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/28/2024] [Revised: 05/05/2024] [Accepted: 05/29/2024] [Indexed: 06/13/2024] Open
Abstract
G-protein coupled receptors (GPCRs), crucial in various diseases, are the targets of over 40% of approved drugs. However, the reliable acquisition of experimental GPCR structures is hindered by their lipid-embedded conformations. Traditional protein-ligand interaction models falter on GPCR-drug interactions because of limited and low-quality structures. Generalized models, trained on soluble protein-ligand pairs, are also inadequate. To address these issues, we developed two models, DeepGPCR_BC for binary classification and DeepGPCR_RG for affinity prediction. These models use non-structural GPCR-ligand interaction data, leveraging graph convolutional networks and mol2vec techniques to represent binding pockets and ligands as graphs. This approach significantly speeds up predictions while preserving critical physical-chemical and spatial information. In independent tests, DeepGPCR_BC surpassed Autodock Vina and Schrödinger Dock with an area under the curve of 0.72, accuracy of 0.68 and true positive rate of 0.73, whereas DeepGPCR_RG demonstrated a Pearson correlation of 0.39 and root mean squared error of 1.34. We applied these models to screen drug candidates for GPR35 (Q9HC97), yielding promising results with three (F545-1970, K297-0698, S948-0241) out of eight candidates. Furthermore, we also successfully obtained six active inhibitors for GLP-1R. Our GPCR-specific models pave the way for efficient and accurate large-scale virtual screening, potentially revolutionizing drug discovery in the GPCR field.
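For readers unfamiliar with the metrics quoted in this abstract (area under the curve, accuracy, true positive rate, Pearson correlation, root mean squared error), the following is a minimal sketch of how each can be computed. It is not the DeepGPCR code; the function names, the 0.5 decision threshold, and the toy data are assumptions made only for illustration.

```python
import math

def roc_auc(labels, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly; ties count 0.5."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def accuracy_and_tpr(labels, scores, threshold=0.5):
    """Accuracy and true positive rate after thresholding the scores (threshold is an assumption)."""
    preds = [1 if s >= threshold else 0 for s in scores]
    correct = sum(p == y for p, y in zip(preds, labels))
    true_pos = sum(p == 1 and y == 1 for p, y in zip(preds, labels))
    return correct / len(labels), true_pos / sum(labels)

def pearson_r(x, y):
    """Pearson correlation between predicted and measured affinities."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def rmse(x, y):
    """Root mean squared error between predicted and measured affinities."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)) / len(x))

# Toy data (hypothetical, not taken from the paper).
toy_labels = [1, 1, 1, 0, 0, 0]
toy_scores = [0.9, 0.6, 0.4, 0.7, 0.3, 0.2]
toy_auc = roc_auc(toy_labels, toy_scores)
toy_acc, toy_tpr = accuracy_and_tpr(toy_labels, toy_scores)
```

The first two functions apply to a binary classifier such as DeepGPCR_BC, and the last two to a regression model such as DeepGPCR_RG.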
Affiliation(s)
- Haiping Zhang
- Faculty of Synthetic Biology and Institute of Synthetic Biology, Shenzhen Institute of Advanced Technology, Chinese Academy of Sciences, No. 1068 Xueyuan Boulevard, Nanshan District, Shenzhen 518055, Guangdong Province, China
| | - Hongjie Fan
- Ganjiang Chinese Medicine Innovation Center, Xinqizhou East Road 888, Ganjiang New Area, Nanchang 330000, China
| | - Jixia Wang
- Ganjiang Chinese Medicine Innovation Center, Xinqizhou East Road 888, Ganjiang New Area, Nanchang 330000, China
- CAS Key Laboratory of Separation Science for Analytical Chemistry, Dalian Institute of Chemical Physics, Chinese Academy of Sciences, No. 457 Zhongshan Road, Dalian 116023, China
| | - Tao Hou
- Ganjiang Chinese Medicine Innovation Center, Xinqizhou East Road 888, Ganjiang New Area, Nanchang 330000, China
- CAS Key Laboratory of Separation Science for Analytical Chemistry, Dalian Institute of Chemical Physics, Chinese Academy of Sciences, No. 457 Zhongshan Road, Dalian 116023, China
| | - Konda Mani Saravanan
- Department of Biotechnology, Bharath Institute of Higher Education and Research, Agharam Road 173, Selaiyur, Chennai, Tamil Nadu 600073, India
| | - Wei Xia
- Faculty of Synthetic Biology and Institute of Synthetic Biology, Shenzhen Institute of Advanced Technology, Chinese Academy of Sciences, No. 1068 Xueyuan Boulevard, Nanshan District, Shenzhen 518055, Guangdong Province, China
| | - Hei Wun Kan
- Faculty of Synthetic Biology and Institute of Synthetic Biology, Shenzhen Institute of Advanced Technology, Chinese Academy of Sciences, No. 1068 Xueyuan Boulevard, Nanshan District, Shenzhen 518055, Guangdong Province, China
| | - Junxin Li
- Shenzhen Laboratory of Human Antibody Engineering, Institute of Biomedicine and Biotechnology, Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences, No. 1068 Xueyuan Boulevard, Nanshan District, Shenzhen 518055, Guangdong Province, China
| | - John Z H Zhang
- Faculty of Synthetic Biology and Institute of Synthetic Biology, Shenzhen Institute of Advanced Technology, Chinese Academy of Sciences, No. 1068 Xueyuan Boulevard, Nanshan District, Shenzhen 518055, Guangdong Province, China
| | - Xinmiao Liang
- Ganjiang Chinese Medicine Innovation Center, Xinqizhou East Road 888, Ganjiang New Area, Nanchang 330000, China
- CAS Key Laboratory of Separation Science for Analytical Chemistry, Dalian Institute of Chemical Physics, Chinese Academy of Sciences, No. 457 Zhongshan Road, Dalian 116023, China
| | - Yang Chen
- Ganjiang Chinese Medicine Innovation Center, Xinqizhou East Road 888, Ganjiang New Area, Nanchang 330000, China
- CAS Key Laboratory of Separation Science for Analytical Chemistry, Dalian Institute of Chemical Physics, Chinese Academy of Sciences, No. 457 Zhongshan Road, Dalian 116023, China
| |
|
13
|
Kumar N, Acharya V. Advances in machine intelligence-driven virtual screening approaches for big-data. Med Res Rev 2024; 44:939-974. [PMID: 38129992 DOI: 10.1002/med.21995] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/12/2022] [Revised: 07/15/2023] [Accepted: 10/29/2023] [Indexed: 12/23/2023]
Abstract
Virtual screening (VS) is an integral and ever-evolving domain of the drug discovery framework. VS is traditionally classified into ligand-based (LB) and structure-based (SB) approaches. Machine intelligence or artificial intelligence has wide applications in the drug discovery domain to reduce time and resource consumption. In combination with machine intelligence algorithms, VS has emerged as a revolutionary, progressive technology that learns within robust decision orders for data curation and hit molecule screening from large VS libraries in minutes or hours. The exponential growth of chemical and biological data, which has evolved into "big data" in the public domain, demands modern and advanced machine intelligence-driven VS approaches to screen hit molecules from ultra-large VS libraries. VS has evolved from individual approaches (LB and SB) to integrated LB and SB techniques that explore various ligand and target protein aspects for an enhanced rate of appropriate hit molecule prediction. Current trends demand advanced and intelligent solutions to handle enormous data in the drug discovery domain for screening and optimizing hits or leads with fewer or no false positive hits. Following the big-data drift and tremendous growth in computational architecture, we present this review. Here, the article categorizes and emphasizes individual VS techniques, details the literature on machine learning implementation, modern machine intelligence approaches, and limitations, and deliberates on future prospects.
Affiliation(s)
- Neeraj Kumar
- Artificial Intelligence for Computational Biology Lab (AICoB), Biotechnology Division, CSIR-Institute of Himalayan Bioresource Technology, Palampur, Himachal Pradesh, India
- Academy of Scientific and Innovative Research, Ghaziabad, India
| | - Vishal Acharya
- Artificial Intelligence for Computational Biology Lab (AICoB), Biotechnology Division, CSIR-Institute of Himalayan Bioresource Technology, Palampur, Himachal Pradesh, India
- Academy of Scientific and Innovative Research, Ghaziabad, India
| |
|
14
|
Toshchakov VY. Peptide-Based Inhibitors of the Induced Signaling Protein Interactions: Current State and Prospects. BIOCHEMISTRY. BIOKHIMIIA 2024; 89:784-798. [PMID: 38880642 DOI: 10.1134/s000629792405002x] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/05/2024] [Revised: 02/29/2024] [Accepted: 03/12/2024] [Indexed: 06/18/2024]
Abstract
Formation of transient protein complexes in response to activation of cellular receptors is a common mechanism by which cells respond to external stimuli. This article presents the concept of blocking interactions of signaling proteins with peptide inhibitors and describes the progress achieved to date in the development of signaling inhibitors that act by blocking signal-dependent protein interactions.
Affiliation(s)
- Vladimir Y Toshchakov
- Sirius University of Science and Technology, Sirius Federal Territory, Krasnodar Region, 354340, Russia.
| |
|
15
|
Fryer E, Guha S, Rogel-Hernandez LE, Logan-Garbisch T, Farah H, Rezaei E, Mollhoff IN, Nekimken AL, Xu A, Selin Seyahi L, Fechner S, Druckmann S, Clandinin TR, Rhee SY, Goodman MB. An efficient behavioral screening platform classifies natural products and other chemical cues according to their chemosensory valence in C. elegans. BIORXIV : THE PREPRINT SERVER FOR BIOLOGY 2024:2023.06.02.542933. [PMID: 37333363 PMCID: PMC10274637 DOI: 10.1101/2023.06.02.542933] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Indexed: 06/20/2023]
Abstract
Throughout history, humans have relied on plants as a source of medication, flavoring, and food. Plants synthesize large chemical libraries and release many of these compounds into the rhizosphere and atmosphere where they affect animal and microbe behavior. To survive, nematodes must have evolved the sensory capacity to distinguish plant-made small molecules (SMs) that are harmful and must be avoided from those that are beneficial and should be sought. This ability to classify chemical cues as a function of their value is fundamental to olfaction, and represents a capacity shared by many animals, including humans. Here, we present an efficient platform based on multi-well plates, liquid handling instrumentation, inexpensive optical scanners, and bespoke software that can efficiently determine the valence (attraction or repulsion) of single SMs in the model nematode, Caenorhabditis elegans. Using this integrated hardware-wetware-software platform, we screened 90 plant SMs and identified 37 that attracted or repelled wild-type animals, but had no effect on mutants defective in chemosensory transduction. Genetic dissection indicates that for at least 10 of these SMs, response valence emerges from the integration of opposing signals, arguing that olfactory valence is often determined by integrating chemosensory signals over multiple lines of information. This study establishes that C. elegans is an effective discovery engine for determining chemotaxis valence and for identifying natural products detected by the chemosensory nervous system.
Affiliation(s)
- Emily Fryer
- Department of Plant Biology, Carnegie Institution for Science
- Department of Molecular and Cellular Physiology, Stanford University
| | - Sujay Guha
- Department of Molecular and Cellular Physiology, Stanford University
| | | | - Theresa Logan-Garbisch
- Department of Molecular and Cellular Physiology, Stanford University
- Neurosciences Graduate Program, Stanford University
| | - Hodan Farah
- Department of Plant Biology, Carnegie Institution for Science
- Department of Molecular and Cellular Physiology, Stanford University
| | - Ehsan Rezaei
- Department of Molecular and Cellular Physiology, Stanford University
| | - Iris N. Mollhoff
- Department of Plant Biology, Carnegie Institution for Science
- Department of Molecular and Cellular Physiology, Stanford University
- Department of Biology, Stanford University
| | - Adam L. Nekimken
- Department of Molecular and Cellular Physiology, Stanford University
- Department of Mechanical Engineering, Stanford University
| | - Angela Xu
- Department of Plant Biology, Carnegie Institution for Science
| | - Lara Selin Seyahi
- Department of Plant Biology, Carnegie Institution for Science
- Department of Molecular and Cellular Physiology, Stanford University
| | - Sylvia Fechner
- Department of Molecular and Cellular Physiology, Stanford University
| | | | | | - Seung Y. Rhee
- Department of Plant Biology, Carnegie Institution for Science
| | - Miriam B. Goodman
- Department of Molecular and Cellular Physiology, Stanford University
| |
|
16
|
Chen S, Li M, Semenov I. MFA-DTI: Drug-target interaction prediction based on multi-feature fusion adopted framework. Methods 2024; 224:79-92. [PMID: 38430967 DOI: 10.1016/j.ymeth.2024.02.008] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/31/2023] [Revised: 02/16/2024] [Accepted: 02/23/2024] [Indexed: 03/05/2024] Open
Abstract
The identification of drug-target interactions (DTI) is a valuable step in the drug discovery and repositioning process. However, traditional laboratory experiments are time-consuming and expensive. Computational methods have streamlined research to determine DTIs. The application of deep learning methods has significantly improved the prediction performance for DTIs. Modern deep learning methods can leverage multiple sources of information, including sequence data that contains biological structural information, and interaction data. While useful, these methods cannot be effectively applied to each type of information individually (e.g., chemical structure and interaction network) and do not take into account the specificity of DTI data such as low- or zero-interaction biological entities. To overcome these limitations, we propose a method called MFA-DTI (Multi-feature Fusion Adopted framework for DTI). MFA-DTI consists of three modules: an interaction graph learning module that processes the interaction network to generate interaction vectors, a chemical structure learning module that extracts features from the chemical structure, and a fusion module that combines these features for the final prediction. To validate the performance of MFA-DTI, we conducted experiments on six public datasets under different settings. The results indicate that the proposed method is highly effective in various settings and outperforms state-of-the-art methods.
Affiliation(s)
- Siqi Chen
- School of Information Science and Engineering, Chongqing Jiaotong University, Chongqing, 400074, China.
| | - Minghui Li
- Beidahuang Industry Group General Hospital, Harbin, 150006, China
| | - Ivan Semenov
- College of Intelligence and Computing, Tianjin University, Tianjin, 300072, China
| |
|
17
|
Wang M, Wang J, Rong Z, Wang L, Xu Z, Zhang L, He J, Li S, Cao L, Hou Y, Li K. A bidirectional interpretable compound-protein interaction prediction framework based on cross attention. Comput Biol Med 2024; 172:108239. [PMID: 38460309 DOI: 10.1016/j.compbiomed.2024.108239] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/31/2023] [Revised: 02/25/2024] [Accepted: 02/26/2024] [Indexed: 03/11/2024]
Abstract
The identification of compound-protein interactions (CPIs) plays a vital role in drug discovery. However, the huge cost and labor-intensive nature of in vitro and in vivo experiments make it urgent for researchers to develop novel CPI prediction methods. Although emerging deep learning methods have achieved promising performance in CPI prediction, they also face ongoing challenges: (i) providing bidirectional interpretability from both the chemical and biological perspective for the prediction results; (ii) comprehensively evaluating model generalization performance; (iii) demonstrating the practical applicability of these models. To overcome the challenges posed by current deep learning methods, we propose a cross multi-head attention oriented bidirectional interpretable CPI prediction model (CmhAttCPI). First, CmhAttCPI takes molecular graphs and protein sequences as inputs, utilizing the GCW module to learn atom features and the CNN module to learn residue features, respectively. Second, the model applies a cross multi-head attention module to compute attention weights for atoms and residues. Finally, CmhAttCPI employs a fully connected neural network to predict scores for CPIs. We evaluated the performance of CmhAttCPI on balanced datasets and imbalanced datasets. The results consistently show that CmhAttCPI outperforms multiple state-of-the-art methods. We constructed three scenarios based on compound and protein clustering and comprehensively evaluated the model generalization ability within these scenarios. The results demonstrate that the generalization ability of CmhAttCPI surpasses that of other models. Besides, the visualizations of attention weights reveal that CmhAttCPI provides chemical and biological interpretation for CPI prediction. Moreover, case studies confirm the practical applicability of CmhAttCPI in discovering anticancer candidates.
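The cross multi-head attention step described in this abstract can be pictured with a small sketch. The code below is a simplified illustration and not the CmhAttCPI implementation: it omits the learned query/key/value projections a trained model would have, it assumes the feature dimension divides evenly by the number of heads, and all function names and toy features are hypothetical.

```python
import math

def softmax(row):
    """Numerically stable softmax over one list of scores."""
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]

def cross_attention(queries, keys, values):
    """Scaled dot-product attention: queries from one modality, keys/values from the other."""
    d = len(queries[0])
    weights = []
    for q in queries:
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in keys]
        weights.append(softmax(scores))
    context = [
        [sum(w * v[j] for w, v in zip(row, values)) for j in range(len(values[0]))]
        for row in weights
    ]
    return context, weights

def split_heads(vectors, n_heads):
    """Slice every feature vector into n_heads equal chunks (dimension must be divisible)."""
    size = len(vectors[0]) // n_heads
    return [[v[h * size:(h + 1) * size] for v in vectors] for h in range(n_heads)]

def cross_multi_head_attention(atoms, residues, n_heads=2):
    """Atoms attend to residues; returns concatenated context and head-averaged weights."""
    per_head = [
        cross_attention(a, r, r)
        for a, r in zip(split_heads(atoms, n_heads), split_heads(residues, n_heads))
    ]
    context = [sum((ctx[i] for ctx, _ in per_head), []) for i in range(len(atoms))]
    avg_weights = [
        [sum(w[i][j] for _, w in per_head) / n_heads for j in range(len(residues))]
        for i in range(len(atoms))
    ]
    return context, avg_weights

# Toy features (hypothetical): 3 atoms and 5 residues, each with 4 features.
toy_atoms = [[0.1, 0.2, 0.3, 0.4], [0.5, 0.1, 0.0, 0.2], [0.3, 0.3, 0.3, 0.3]]
toy_residues = [[0.2, 0.1, 0.4, 0.0], [0.0, 0.3, 0.1, 0.5], [0.6, 0.2, 0.2, 0.1],
                [0.1, 0.1, 0.1, 0.1], [0.4, 0.0, 0.3, 0.2]]
atom_context, atom_to_residue = cross_multi_head_attention(toy_atoms, toy_residues)
residue_context, residue_to_atom = cross_multi_head_attention(toy_residues, toy_atoms)
```

Calling the function in both directions, as in the last two lines, yields one weight matrix per direction; weights of this kind are what an attention-based model can visualize for atom-level and residue-level interpretation.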
Affiliation(s)
- Meng Wang
- School of Public Health, Harbin Medical University, Harbin, 150081, China
| | - Jianmin Wang
- School of Integrative Biotechnology and Translational Medicine, Yonsei University, Incheon, 21983, Republic of Korea
| | - Zhiwei Rong
- School of Public Health, Peking University, Beijing, 100871, China
| | - Liuying Wang
- School of Public Health, Harbin Medical University, Harbin, 150081, China
| | - Zhenyi Xu
- School of Public Health, Harbin Medical University, Harbin, 150081, China
| | - Liuchao Zhang
- School of Public Health, Harbin Medical University, Harbin, 150081, China
| | - Jia He
- School of Public Health, Harbin Medical University, Harbin, 150081, China
| | - Shuang Li
- School of Public Health, Harbin Medical University, Harbin, 150081, China
| | - Lei Cao
- School of Public Health, Harbin Medical University, Harbin, 150081, China
| | - Yan Hou
- School of Public Health, Peking University, Beijing, 100871, China
| | - Kang Li
- School of Public Health, Harbin Medical University, Harbin, 150081, China.
| |
|
18
|
Yamane H, Ishida T. Helix encoder: a compound-protein interaction prediction model specifically designed for class A GPCRs. FRONTIERS IN BIOINFORMATICS 2023; 3:1193025. [PMID: 37304403 PMCID: PMC10250622 DOI: 10.3389/fbinf.2023.1193025] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/24/2023] [Accepted: 05/15/2023] [Indexed: 06/13/2023] Open
Abstract
Class A G protein-coupled receptors (GPCRs) represent the largest class of GPCRs. They are essential targets of drug discovery and thus various computational approaches have been applied to predict their ligands. However, there are a large number of orphan receptors in class A GPCRs and it is difficult to use a general protein-specific supervised prediction scheme. Therefore, the compound-protein interaction (CPI) prediction approach has been considered one of the most suitable for class A GPCRs. However, the accuracy of CPI prediction is still insufficient. Current CPI prediction models generally employ the whole protein sequence as the input because it is difficult to identify the important regions in general proteins. In contrast, it is well-known that only a few transmembrane helices of class A GPCRs play a critical role in ligand binding. Therefore, using such domain knowledge, the CPI prediction performance could be improved by developing an encoding method that is specifically designed for this family. In this study, we developed a protein sequence encoder called the Helix encoder, which takes only a protein sequence of transmembrane regions of class A GPCRs as input. The performance evaluation showed that the proposed model achieved a higher prediction accuracy compared to a prediction model using the entire protein sequence. Additionally, our analysis indicated that several extracellular loops are also important for the prediction, as reported in several biological studies.
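The idea of feeding only transmembrane regions to the encoder can be illustrated with a short preprocessing sketch. This is not the authors' Helix encoder: the helper names, the integer encoding with 0 reserved for padding, the toy sequence, and the helix boundaries are all assumptions for illustration, and real boundaries would have to come from an annotation source.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # 20 standard residues; index 0 is reserved for padding

def extract_tm_segments(sequence, tm_ranges):
    """Cut out transmembrane segments given 1-based inclusive (start, end) ranges."""
    segments = []
    for start, end in tm_ranges:
        if not (1 <= start <= end <= len(sequence)):
            raise ValueError(f"range ({start}, {end}) is outside the sequence")
        segments.append(sequence[start - 1:end])
    return segments

def encode_segment(segment, max_len):
    """Map residues to integers 1..20 and right-pad with 0 up to max_len."""
    if len(segment) > max_len:
        raise ValueError("segment is longer than max_len")
    ids = [AMINO_ACIDS.index(aa) + 1 for aa in segment]
    return ids + [0] * (max_len - len(ids))

def helix_input(sequence, tm_ranges, max_len):
    """Build one fixed-length integer row per transmembrane helix."""
    return [encode_segment(s, max_len) for s in extract_tm_segments(sequence, tm_ranges)]

# Toy example (hypothetical sequence and helix boundaries, not a real GPCR).
toy_sequence = "MKTAYIAKQRQISFVKSHFSRQ"
toy_ranges = [(3, 7), (12, 15)]
toy_segments = extract_tm_segments(toy_sequence, toy_ranges)
toy_encoded = helix_input(toy_sequence, toy_ranges, max_len=5)
```

A downstream model would then consume one row per helix instead of the full-length sequence, which is the contrast the abstract draws with whole-sequence CPI models.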
|
19
|
Hu J, Yu W, Pang C, Jin J, Pham NT, Manavalan B, Wei L. DrugormerDTI: Drug Graphormer for drug-target interaction prediction. Comput Biol Med 2023; 161:106946. [PMID: 37244151 DOI: 10.1016/j.compbiomed.2023.106946] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 02/19/2023] [Revised: 03/29/2023] [Accepted: 04/15/2023] [Indexed: 05/29/2023]
Abstract
Drug-target interaction (DTI) prediction is a crucial task in drug discovery. Existing computational methods accelerate drug discovery in this respect. However, most of them suffer from low feature representation ability, significantly affecting the predictive performance. To address the problem, we propose a novel neural network architecture named DrugormerDTI, which uses Graph Transformer to learn both sequential and topological information through the input molecule graph and Resudual2vec to learn the underlying relation between residues from proteins. By conducting ablation experiments, we verify the importance of each part of DrugormerDTI. We also demonstrate the good feature extraction and expression capabilities of our model by comparing the mapping results of the attention layer with molecular docking results. Experimental results show that our proposed model performs better than baseline methods on four benchmarks. We demonstrate that the introduction of Graph Transformer and the design of residue are appropriate for drug-target prediction.
Affiliation(s)
- Jiayue Hu
- School of Software, Shandong University, Jinan, China; Joint SDU-NTU Centre for Artificial Intelligence Research (C-FAIR), Shandong University, Jinan, China
| | - Wang Yu
- School of Software, Shandong University, Jinan, China; Joint SDU-NTU Centre for Artificial Intelligence Research (C-FAIR), Shandong University, Jinan, China
| | - Chao Pang
- School of Software, Shandong University, Jinan, China; Joint SDU-NTU Centre for Artificial Intelligence Research (C-FAIR), Shandong University, Jinan, China
| | - Junru Jin
- School of Software, Shandong University, Jinan, China; Joint SDU-NTU Centre for Artificial Intelligence Research (C-FAIR), Shandong University, Jinan, China
| | - Nhat Truong Pham
- Computational Biology and Bioinformatics Laboratory, Department of Integrative Biotechnology, College of Biotechnology and Bioengineering, Sungkyunkwan University, Suwon, 16419, Gyeonggi-do, South Korea
| | - Balachandran Manavalan
- Computational Biology and Bioinformatics Laboratory, Department of Integrative Biotechnology, College of Biotechnology and Bioengineering, Sungkyunkwan University, Suwon, 16419, Gyeonggi-do, South Korea.
| | - Leyi Wei
- School of Software, Shandong University, Jinan, China; Joint SDU-NTU Centre for Artificial Intelligence Research (C-FAIR), Shandong University, Jinan, China.
| |
|
20
|
Fierro F, Peri L, Hübner H, Tabor-Schkade A, Waterloo L, Löber S, Pfeiffer T, Weikert D, Dingjan T, Margulis E, Gmeiner P, Niv MY. Inhibiting a promiscuous GPCR: iterative discovery of bitter taste receptor ligands. Cell Mol Life Sci 2023; 80:114. [PMID: 37012410 PMCID: PMC11072104 DOI: 10.1007/s00018-023-04765-0] [Citation(s) in RCA: 9] [Impact Index Per Article: 9.0] [Received: 12/06/2022] [Revised: 03/09/2023] [Accepted: 03/21/2023] [Indexed: 04/05/2023]
Abstract
The human GPCR family comprises circa 800 members, activated by hundreds of thousands of compounds. Bitter taste receptors, TAS2Rs, constitute a large and distinct subfamily, expressed orally and extra-orally and involved in physiological and pathological conditions. TAS2R14 is the most promiscuous member, with over 150 agonists and 3 antagonists known prior to this study. Due to the scarcity of inhibitors and to the importance of chemical probes for exploring TAS2R14 functions, we aimed to discover new ligands for this receptor, with emphasis on antagonists. To cope with the lack of an experimental structure of the receptor, we used a mixed experimental/computational methodology which iteratively improved the performance of the predicted structure. The increasing number of active compounds, obtained here through experimental screening of an FDA-approved drug library and through chemically synthesized flufenamic acid derivatives, enabled the refinement of the binding pocket, which in turn improved the reliability of structure-based virtual screening. This mixed approach led to the identification of 10 new antagonists and 200 new agonists of TAS2R14, illustrating the untapped potential of rigorous medicinal chemistry for TAS2Rs. 9% of the ~1800 pharmaceutical drugs tested here activate TAS2R14, nine of them at sub-micromolar concentrations. The iterative framework suggested residues involved in the activation process, is suitable for expanding bitter and bitter-masking chemical space, and is applicable to other promiscuous GPCRs lacking experimental structures.
Affiliation(s)
- Fabrizio Fierro
- The Institute of Biochemistry, Food Science and Nutrition, Robert H. Smith Faculty of Agriculture, Food and Environment, The Hebrew University of Jerusalem, Rehovot, Israel
| | - Lior Peri
- The Institute of Biochemistry, Food Science and Nutrition, Robert H. Smith Faculty of Agriculture, Food and Environment, The Hebrew University of Jerusalem, Rehovot, Israel
| | - Harald Hübner
- Department of Chemistry and Pharmacy, Medicinal Chemistry, Friedrich-Alexander-Universität Erlangen-Nürnberg, Nikolaus-Fiebiger-Str. 10, 91058, Erlangen, Germany
| | - Alina Tabor-Schkade
- Department of Chemistry and Pharmacy, Medicinal Chemistry, Friedrich-Alexander-Universität Erlangen-Nürnberg, Nikolaus-Fiebiger-Str. 10, 91058, Erlangen, Germany
| | - Lukas Waterloo
- Department of Chemistry and Pharmacy, Medicinal Chemistry, Friedrich-Alexander-Universität Erlangen-Nürnberg, Nikolaus-Fiebiger-Str. 10, 91058, Erlangen, Germany
| | - Stefan Löber
- Department of Chemistry and Pharmacy, Medicinal Chemistry, Friedrich-Alexander-Universität Erlangen-Nürnberg, Nikolaus-Fiebiger-Str. 10, 91058, Erlangen, Germany
| | - Tara Pfeiffer
- Department of Chemistry and Pharmacy, Medicinal Chemistry, Friedrich-Alexander-Universität Erlangen-Nürnberg, Nikolaus-Fiebiger-Str. 10, 91058, Erlangen, Germany
| | - Dorothee Weikert
- Department of Chemistry and Pharmacy, Medicinal Chemistry, Friedrich-Alexander-Universität Erlangen-Nürnberg, Nikolaus-Fiebiger-Str. 10, 91058, Erlangen, Germany
| | - Tamir Dingjan
- The Institute of Biochemistry, Food Science and Nutrition, Robert H. Smith Faculty of Agriculture, Food and Environment, The Hebrew University of Jerusalem, Rehovot, Israel
| | - Eitan Margulis
- The Institute of Biochemistry, Food Science and Nutrition, Robert H. Smith Faculty of Agriculture, Food and Environment, The Hebrew University of Jerusalem, Rehovot, Israel
| | - Peter Gmeiner
- Department of Chemistry and Pharmacy, Medicinal Chemistry, Friedrich-Alexander-Universität Erlangen-Nürnberg, Nikolaus-Fiebiger-Str. 10, 91058, Erlangen, Germany.
| | - Masha Y Niv
- The Institute of Biochemistry, Food Science and Nutrition, Robert H. Smith Faculty of Agriculture, Food and Environment, The Hebrew University of Jerusalem, Rehovot, Israel.
| |
|
21
|
Remington JM, McKay KT, Beckage NB, Ferrell JB, Schneebeli ST, Li J. GPCRLigNet: rapid screening for GPCR active ligands using machine learning. J Comput Aided Mol Des 2023; 37:147-156. [PMID: 36840893 PMCID: PMC10379640 DOI: 10.1007/s10822-023-00497-2] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/16/2022] [Accepted: 02/03/2023] [Indexed: 02/26/2023]
Abstract
Molecules with bioactivity towards G protein-coupled receptors represent a subset of the vast space of small drug-like molecules. Here, we compare machine learning models, including dilated graph convolutional networks, that conduct binary classification to quickly identify molecules with activity towards G protein-coupled receptors. The models are trained and validated using a large set of over 600,000 active, inactive, and decoy compounds. The best performing machine learning model, dubbed GPCRLigNet, was a surprisingly simple feedforward dense neural network mapping from Morgan fingerprints to activity. Incorporation of GPCRLigNet into a high-throughput virtual screening workflow is demonstrated with molecular docking towards a particular G protein-coupled receptor, the pituitary adenylate cyclase-activating polypeptide receptor type 1. Through rigorous comparison of docking scores for molecules selected with and without using GPCRLigNet, we demonstrate an enrichment of potentially potent molecules using GPCRLigNet. This work provides a proof of principle that GPCRLigNet can effectively hone the chemical search space towards ligands with G protein-coupled receptor activity.
Affiliation(s)
- Jacob M Remington
- Department of Chemistry, University of Vermont, Burlington, VT, 05405, USA
| | - Kyle T McKay
- Department of Chemistry, University of Vermont, Burlington, VT, 05405, USA
| | - Noah B Beckage
- Department of Chemistry, University of Vermont, Burlington, VT, 05405, USA
| | - Jonathon B Ferrell
- Department of Chemistry, University of Vermont, Burlington, VT, 05405, USA
| | - Severin T Schneebeli
- Department of Chemistry, University of Vermont, Burlington, VT, 05405, USA; Department of Industrial and Physical Pharmacy, Department of Chemistry, Purdue University, West Lafayette, IN, 47906, USA; Department of Pathology, University of Vermont, Burlington, VT, 05405, USA
| | - Jianing Li
- Department of Chemistry, University of Vermont, Burlington, VT, 05405, USA; Department of Pathology, University of Vermont, Burlington, VT, 05405, USA; Department of Medicinal Chemistry and Molecular Pharmacology, Purdue University, West Lafayette, IN, 47906, USA.
| |
|
22
|
Choi J, Seo S, Choi S, Piao S, Park C, Ryu SJ, Kim BJ, Park S. ReBADD-SE: Multi-objective molecular optimisation using SELFIES fragment and off-policy self-critical sequence training. Comput Biol Med 2023; 157:106721. [PMID: 36913852 DOI: 10.1016/j.compbiomed.2023.106721] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/20/2022] [Revised: 02/11/2023] [Accepted: 02/26/2023] [Indexed: 03/02/2023]
Abstract
The discovery of drugs to selectively remove disease-related cells is challenging in computer-aided drug design. Many studies have proposed multi-objective molecular generation methods and demonstrated their superiority using the public benchmark dataset for kinase inhibitor generation tasks. However, the dataset does not contain many molecules that violate Lipinski's rule of five. Thus, it remains unclear whether existing methods are effective in generating molecules violating the rule, such as navitoclax. To address this, we analysed the limitations of existing methods and propose a multi-objective molecular generation method with a novel parsing algorithm for molecular string representation and a modified reinforcement learning method for the efficient training of multi-objective molecular optimisation. The proposed model had success rates of 84% in GSK3b+JNK3 inhibitor generation and 99% in Bcl-2 family inhibitor generation tasks.
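The abstract above turns on whether generated molecules violate Lipinski's rule of five. As a minimal sketch, the rule can be checked from four precomputed descriptors. The function name and the descriptor values below are hypothetical and purely illustrative; they are not taken from the paper or its benchmark.

```python
def lipinski_violations(mol_weight, logp, h_donors, h_acceptors):
    """Count rule-of-five violations from precomputed descriptors.

    Thresholds follow the standard statement of the rule:
    molecular weight <= 500 Da, logP <= 5,
    hydrogen-bond donors <= 5, hydrogen-bond acceptors <= 10.
    """
    return sum([
        mol_weight > 500,
        logp > 5,
        h_donors > 5,
        h_acceptors > 10,
    ])


# Hypothetical descriptor sets, for illustration only.
small_molecule = lipinski_violations(350.0, 2.1, 2, 5)
large_molecule = lipinski_violations(950.0, 8.5, 2, 11)
print(small_molecule, large_molecule)
```

In practice the descriptors would come from a cheminformatics toolkit; a compound is commonly regarded as failing the rule when it has more than one violation.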
Affiliation(s)
- Jonghwan Choi
- Department of Computer Science, Yonsei University, Yonsei-ro 50, Seodaemun-gu, 03722, Seoul, Republic of Korea; UBLBio Corporation, Yeongtong-ro 237, Suwon, 16679, Gyeonggi-do, Republic of Korea.
| | - Sangmin Seo
- Department of Computer Science, Yonsei University, Yonsei-ro 50, Seodaemun-gu, 03722, Seoul, Republic of Korea; UBLBio Corporation, Yeongtong-ro 237, Suwon, 16679, Gyeonggi-do, Republic of Korea
| | - Seungyeon Choi
- Department of Computer Science, Yonsei University, Yonsei-ro 50, Seodaemun-gu, 03722, Seoul, Republic of Korea
| | - Shengmin Piao
- Department of Computer Science, Yonsei University, Yonsei-ro 50, Seodaemun-gu, 03722, Seoul, Republic of Korea
| | - Chihyun Park
- Department of Computer Science and Engineering, Kangwon National University, Chuncheon-si, 24341, Kangwon-do, Republic of Korea; UBLBio Corporation, Yeongtong-ro 237, Suwon, 16679, Gyeonggi-do, Republic of Korea
| | - Sung Jin Ryu
- UBLBio Corporation, Yeongtong-ro 237, Suwon, 16679, Gyeonggi-do, Republic of Korea
| | - Byung Ju Kim
- UBLBio Corporation, Yeongtong-ro 237, Suwon, 16679, Gyeonggi-do, Republic of Korea
| | - Sanghyun Park
- Department of Computer Science, Yonsei University, Yonsei-ro 50, Seodaemun-gu, 03722, Seoul, Republic of Korea.
| |
|
23
|
Chandra A, Tünnermann L, Löfstedt T, Gratz R. Transformer-based deep learning for predicting protein properties in the life sciences. eLife 2023; 12:e82819. [PMID: 36651724 PMCID: PMC9848389 DOI: 10.7554/elife.82819] [Citation(s) in RCA: 14] [Impact Index Per Article: 14.0] [Received: 08/22/2022] [Accepted: 01/06/2023] [Indexed: 01/19/2023] Open
Abstract
Recent developments in deep learning, coupled with an increasing number of sequenced proteins, have led to a breakthrough in life science applications, in particular in protein property prediction. There is hope that deep learning can close the gap between the number of sequenced proteins and proteins with known properties based on lab experiments. Language models from the field of natural language processing have gained popularity for protein property predictions and have led to a new computational revolution in biology, where old prediction results are being improved regularly. Such models can learn useful multipurpose representations of proteins from large open repositories of protein sequences and can be used, for instance, to predict protein properties. The field of natural language processing is growing quickly because of developments in a class of models based on one particular model, the Transformer. We review recent developments and the use of large-scale Transformer models in applications for predicting protein characteristics and how such models can be used to predict, for example, post-translational modifications. We review shortcomings of other deep learning models and explain how the Transformer models have quickly proven to be a very promising way to unravel information hidden in the sequences of amino acids.
Affiliation(s)
- Abel Chandra
- Department of Computing Science, Umeå University, Umeå, Sweden
| | - Laura Tünnermann
- Umeå Plant Science Centre (UPSC), Department of Forest Genetics and Plant Physiology, Swedish University of Agricultural Sciences, Umeå, Sweden
| | - Tommy Löfstedt
- Department of Computing Science, Umeå University, Umeå, Sweden
| | - Regina Gratz
- Umeå Plant Science Centre (UPSC), Department of Forest Genetics and Plant Physiology, Swedish University of Agricultural Sciences, Umeå, Sweden
- Department of Forest Ecology and Management, Swedish University of Agricultural Sciences, Umeå, Sweden
| |
|
24
|
Huang S, Zheng S, Chen R. Multi-source transfer learning with Graph Neural Network for excellent modelling the bioactivities of ligands targeting orphan G protein-coupled receptors. MATHEMATICAL BIOSCIENCES AND ENGINEERING : MBE 2023; 20:2588-2608. [PMID: 36899548 DOI: 10.3934/mbe.2023121] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/18/2023]
Abstract
G protein-coupled receptors (GPCRs) have been the targets for more than 40% of the currently approved drugs. Although neural networks can effectively improve the accuracy of biological activity prediction, the results are unsatisfactory on the limited datasets available for orphan GPCRs (oGPCRs). To this end, we proposed Multi-source Transfer Learning with Graph Neural Network, called MSTL-GNN, to bridge this gap. Firstly, there are three ideal sources of data for transfer learning: oGPCRs, experimentally validated GPCRs, and invalidated GPCRs similar to the former. Secondly, the ligands in SMILES format are converted to graphs, which serve as the input to a Graph Neural Network (GNN) and ensemble learning to improve prediction accuracy. Finally, our experiments show that MSTL-GNN remarkably improves the prediction of GPCR ligand activity values compared with previous studies. On the two evaluation indexes we adopted, R2 and root-mean-square error (RMSE), MSTL-GNN improved on the state-of-the-art work by up to 67.13% and 17.22% on average, respectively. The effectiveness of MSTL-GNN in the field of GPCR drug discovery with limited data also paves the way for other similar application scenarios.
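The two evaluation indexes named in the abstract above, R2 and RMSE, can be computed as in the following minimal sketch. The function name and the activity values are hypothetical and purely illustrative; they are not data from the paper.

```python
import math


def r2_and_rmse(y_true, y_pred):
    """Return (R^2, RMSE) for paired measured and predicted values."""
    n = len(y_true)
    mean_true = sum(y_true) / n
    # Residual sum of squares: squared prediction errors.
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    # Total sum of squares: spread of the measured values around their mean.
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot, math.sqrt(ss_res / n)


# Hypothetical activity values, for illustration only.
measured = [5.0, 6.0, 7.0, 8.0]
predicted = [5.5, 6.0, 6.5, 8.0]
r2, rmse = r2_and_rmse(measured, predicted)
print(round(r2, 3), round(rmse, 3))
```

A higher R2 and a lower RMSE both indicate a better fit, so an "improvement" on the two indexes moves them in opposite directions.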
Affiliation(s)
- Shizhen Huang
- College of Physics and Information Engineering, Fuzhou University, Fuzhou 350116, China
| | - ShaoDong Zheng
- College of Physics and Information Engineering, Fuzhou University, Fuzhou 350116, China
- VeriMake Innovation Lab, Nanjing Renmian Integrated Circuit Co., Ltd., Nanjing 210088, China
| | - Ruiqi Chen
- VeriMake Innovation Lab, Nanjing Renmian Integrated Circuit Co., Ltd., Nanjing 210088, China
| |
|
25
|
Nguyen NQ, Jang G, Kim H, Kang J. Perceiver CPI: a nested cross-attention network for compound-protein interaction prediction. Bioinformatics 2022; 39:6842322. [PMID: 36416124 PMCID: PMC9848062 DOI: 10.1093/bioinformatics/btac731] [Citation(s) in RCA: 8] [Impact Index Per Article: 4.0] [Received: 05/31/2022] [Revised: 10/18/2022] [Accepted: 11/22/2022] [Indexed: 11/24/2022] Open
Abstract
MOTIVATION Compound-protein interaction (CPI) plays an essential role in drug discovery and is performed via expensive molecular docking simulations. Many artificial intelligence-based approaches have been proposed in this regard. Recently, two types of models have accomplished promising results in exploiting molecular information: graph convolutional neural networks that construct a learned molecular representation from a graph structure (atoms and bonds), and neural networks that can be applied to compute on descriptors or fingerprints of molecules. However, the superiority of one method over the other is yet to be determined. Modern studies have endeavored to aggregate information that is extracted from compounds and proteins to form the CPI task. Nonetheless, these approaches have used a simple concatenation to combine them, which cannot fully capture the interaction between such information. RESULTS We propose the Perceiver CPI network, which adopts a cross-attention mechanism to improve the learning ability of the representation of drug and target interactions and exploits the rich information obtained from extended-connectivity fingerprints to improve the performance. We evaluated Perceiver CPI on three main datasets, Davis, KIBA and Metz, to compare the performance of our proposed model with that of state-of-the-art methods. The proposed method achieved satisfactory performance and exhibited significant improvements over previous approaches in all experiments. AVAILABILITY AND IMPLEMENTATION Perceiver CPI is available at https://github.com/dmis-lab/PerceiverCPI. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Ngoc-Quang Nguyen
- Department of Computer Science and Engineering, Korea University, Seoul 02841, Republic of Korea
| | - Gwanghoon Jang
- Department of Computer Science and Engineering, Korea University, Seoul 02841, Republic of Korea
| | - Hajung Kim
- Interdisciplinary Graduate Program in Bioinformatics, Korea University, Seoul 02841, Republic of Korea
| | | |
|
26
|
G-Protein Coupled Receptors in Human Sperm: An In Silico Approach to Identify Potential Modulatory Targets. Molecules 2022; 27:molecules27196503. [PMID: 36235040 PMCID: PMC9571544 DOI: 10.3390/molecules27196503] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 07/26/2022] [Revised: 09/28/2022] [Accepted: 09/28/2022] [Indexed: 11/09/2022] Open
Abstract
G protein-coupled receptors (GPCRs) are involved in several physiological processes, and they represent the largest family of drug targets to date. However, the presence and function of these receptors are poorly described in human spermatozoa. Here, we aimed to identify and characterize the GPCRs present in human spermatozoa and perform an in silico analysis to understand their potential role in sperm functions. The human sperm proteome, including proteomic studies in which the criteria used for protein identification were set as <5% FDR and a minimum of 2 peptide matches per protein, was crossed with the list of GPCRs retrieved from the GLASS and GPCRdb databases. A total of 71 GPCRs were identified in human spermatozoa, of which 7 had selective expression in male tissues (epididymis, seminal vesicles, and testis), and 9 were associated with male infertility defects in mice. Additionally, ADRA2A, AGTR1, AGTR2, FZD3, and GLP1R were already associated with sperm-specific functions such as sperm capacitation, acrosome reaction, and motility, representing potential targets to modulate and improve sperm function. Finally, the protein-protein interaction network for the human sperm GPCRs revealed that 24 GPCRs interact with 49 proteins involved in crucial processes for sperm formation, maturation, and fertilization. This approach allowed the identification of 8 relevant GPCRs (ADGRE5, ADGRL2, GLP1R, AGTR2, CELSR2, FZD3, CELSR3, and GABBR1) present in human spermatozoa that can be the subject of further investigation as potential modulatory targets to treat male infertility or to develop new non-hormonal male contraceptives.
|
27
|
Cheng Z, Zhao Q, Li Y, Wang J. IIFDTI: predicting drug-target interactions through interactive and independent features based on attention mechanism. Bioinformatics 2022; 38:4153-4161. [PMID: 35801934 DOI: 10.1093/bioinformatics/btac485] [Citation(s) in RCA: 11] [Impact Index Per Article: 5.5] [Received: 12/07/2021] [Revised: 05/02/2022] [Accepted: 07/07/2022] [Indexed: 12/24/2022] Open
Abstract
MOTIVATION Identifying drug-target interactions is a crucial step for drug discovery and design. Traditional biochemical experiments are credible to accurately validate drug-target interactions. However, they are also extremely laborious, time-consuming and expensive. With the collection of more validated biomedical data and the advancement of computing technology, the computational methods based on chemogenomics gradually attract more attention, which guide the experimental verifications. RESULTS In this study, we propose an end-to-end deep learning-based method named IIFDTI to predict drug-target interactions (DTIs) based on independent features of drug-target pairs and interactive features of their substructures. First, the interactive features of substructures between drugs and targets are extracted by the bidirectional encoder-decoder architecture. The independent features of drugs and targets are extracted by the graph neural networks and convolutional neural networks, respectively. Then, all extracted features are fused and inputted into fully connected dense layers in downstream tasks for predicting DTIs. IIFDTI takes into account the independent features of drugs/targets and simulates the interactive features of the substructures from the biological perspective. Multiple experiments show that IIFDTI outperforms the state-of-the-art methods in terms of the area under the receiver operating characteristics curve (AUC), the area under the precision-recall curve (AUPR), precision, and recall on benchmark datasets. In addition, the mapped visualizations of attention weights indicate that IIFDTI has learned the biological knowledge insights, and two case studies illustrate the capabilities of IIFDTI in practical applications. AVAILABILITY AND IMPLEMENTATION The data and codes underlying this article are available in Github at https://github.com/czjczj/IIFDTI. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Zhongjian Cheng
- Hunan Provincial Key Lab on Bioinformatics, School of Computer Science and Engineering, Central South University, Changsha 410083, China
| | - Qichang Zhao
- Hunan Provincial Key Lab on Bioinformatics, School of Computer Science and Engineering, Central South University, Changsha 410083, China
| | - Yaohang Li
- Department of Computer Science, Old Dominion University, Norfolk, VA 23529, USA
| | - Jianxin Wang
- Hunan Provincial Key Lab on Bioinformatics, School of Computer Science and Engineering, Central South University, Changsha 410083, China
| |
|
28
|
What Makes GPCRs from Different Families Bind to the Same Ligand? Biomolecules 2022; 12:biom12070863. [PMID: 35883418 PMCID: PMC9313020 DOI: 10.3390/biom12070863] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Received: 05/15/2022] [Revised: 06/09/2022] [Accepted: 06/19/2022] [Indexed: 12/10/2022] Open
Abstract
G protein-coupled receptors (GPCRs) are the largest class of cell-surface receptor proteins with important functions in signal transduction and often serve as therapeutic drug targets. With the rapidly growing public data on three dimensional (3D) structures of GPCRs and GPCR-ligand interactions, computational prediction of GPCR ligand binding becomes a convincing option to high throughput screening and other experimental approaches during the beginning phases of ligand discovery. In this work, we set out to computationally uncover and understand the binding of a single ligand to GPCRs from several different families. Three-dimensional structural comparisons of the GPCRs that bind to the same ligand revealed local 3D structural similarities and often these regions overlap with locations of binding pockets. These pockets were found to be similar (based on backbone geometry and side-chain orientation using APoc), and they correlate positively with electrostatic properties of the pockets. Moreover, the more similar the pockets, the more likely a ligand binding to the pockets will interact with similar residues, have similar conformations, and produce similar binding affinities across the pockets. These findings can be exploited to improve protein function inference, drug repurposing and drug toxicity prediction, and accelerate the development of new drugs.
|
29
|
Yin Y, Hu H, Yang Z, Jiang F, Huang Y, Wu J. AFSE: towards improving model generalization of deep graph learning of ligand bioactivities targeting GPCR proteins. Brief Bioinform 2022; 23:6554127. [PMID: 35348582 DOI: 10.1093/bib/bbac077] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/09/2021] [Revised: 02/12/2022] [Accepted: 02/14/2022] [Indexed: 11/14/2022] Open
Abstract
Ligand molecules naturally constitute a graph structure. Recently, many excellent deep graph learning (DGL) methods have been proposed and used to model ligand bioactivities, which is critical for the virtual screening of drug hits from compound databases of interest. However, pharmacists find that these well-trained DGL models usually struggle to achieve satisfactory performance in real virtual screening of drug candidates. The main challenges are that the datasets used for training are small and biased, and that activity cliff cases within them worsen model performance. These challenges cause predictors to overfit the training data and generalize poorly in real virtual screening scenarios. Thus, we proposed a novel algorithm named adversarial feature subspace enhancement (AFSE). AFSE dynamically generates abundant representations in a new feature subspace via bi-directional adversarial learning, and then minimizes the maximum loss of molecular divergence and bioactivity to ensure local smoothness of model outputs and significantly enhance the generalization of DGL models in predicting ligand bioactivities. Benchmark tests were implemented on seven state-of-the-art open-source DGL models with the potential of modeling ligand bioactivities, and precisely evaluated by multiple criteria. The results indicate that, on almost all 33 GPCR datasets and seven DGL models, AFSE greatly improved their enrichment factor (top-10%, 20% and 30%), which is the most important evaluation in virtual screening of hits from compound databases, while ensuring superior performance on RMSE and $r^2$. The web server of AFSE is freely available at http://noveldelta.com/AFSE for academic purposes.
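The top-10%, 20% and 30% evaluation mentioned in the abstract above can be illustrated with one common definition of the enrichment factor: the hit rate among the top-ranked fraction divided by the hit rate in the whole set. This is a sketch under that assumption; the exact definition used by the AFSE authors is not given in the abstract, and the function name and data below are hypothetical.

```python
def enrichment_factor(scores, is_active, top_fraction):
    """Hit rate in the top-ranked fraction divided by the overall hit rate."""
    n = len(scores)
    n_top = max(1, int(round(n * top_fraction)))
    # Rank compounds from highest to lowest predicted score.
    ranked = sorted(zip(scores, is_active), key=lambda pair: pair[0], reverse=True)
    hits_top = sum(active for _, active in ranked[:n_top])
    hits_all = sum(is_active)
    return (hits_top / n_top) / (hits_all / n)


# Hypothetical screening result: 10 compounds, 3 of them active (1 = active).
scores = [10, 9, 8, 7, 6, 5, 4, 3, 2, 1]
labels = [1, 1, 0, 0, 1, 0, 0, 0, 0, 0]
ef_10 = enrichment_factor(scores, labels, 0.10)
ef_20 = enrichment_factor(scores, labels, 0.20)
ef_30 = enrichment_factor(scores, labels, 0.30)
print(ef_10, ef_20, ef_30)
```

A value above 1 means the ranking concentrates actives near the top better than random selection would.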
Affiliation(s)
- Yueming Yin
- School of Telecommunications and Information Engineering, Nanjing University of Posts and Telecommunications, Nanjing 210003, China
| | - Haifeng Hu
- School of Telecommunications and Information Engineering, Nanjing University of Posts and Telecommunications, Nanjing 210003, China
| | - Zhen Yang
- School of Telecommunications and Information Engineering, Nanjing University of Posts and Telecommunications, Nanjing 210003, China; National Engineering Research Center of Communications and Networking, Nanjing University of Posts and Telecommunications, Nanjing 210003, China
| | - Feihu Jiang
- School of Telecommunications and Information Engineering, Nanjing University of Posts and Telecommunications, Nanjing 210003, China
| | - Yihe Huang
- School of Telecommunications and Information Engineering, Nanjing University of Posts and Telecommunications, Nanjing 210003, China
| | - Jiansheng Wu
- School of Geographic and Biologic Information, Nanjing University of Posts and Telecommunications, Nanjing 210023, China; Smart Health Big Data Analysis and Location Services Engineering Research Center of Jiangsu Province, Nanjing University of Posts and Telecommunications, Nanjing 210023, China
| |
|
30
|
Cai T, Abbu KA, Liu Y, Xie L. DeepREAL: A Deep Learning Powered Multi-scale Modeling Framework for Predicting Out-of-distribution Ligand-induced GPCR Activity. Bioinformatics 2022; 38:2561-2570. [PMID: 35274689 PMCID: PMC9048666 DOI: 10.1093/bioinformatics/btac154] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/25/2021] [Revised: 02/18/2022] [Accepted: 03/10/2022] [Indexed: 11/20/2022] Open
Abstract
MOTIVATION Drug discovery has witnessed intensive exploration of predictive modeling of drug–target physical interactions over two decades. However, a critical knowledge gap needs to be filled for correlating drug–target interactions with clinical outcomes: predicting genome-wide receptor activities or function selectivity, especially agonist versus antagonist, induced by novel chemicals. Two major obstacles compound the difficulty of this task: known data on receptor activity are far too scarce to train a robust model in light of genome-scale applications, and real-world applications need to deploy a model on data from various shifted distributions. RESULTS To address these challenges, we have developed an end-to-end deep learning framework, DeepREAL, for multi-scale modeling of genome-wide ligand-induced receptor activities. DeepREAL utilizes self-supervised learning on tens of millions of protein sequences and pre-trained binary interaction classification to solve the data distribution shift and data scarcity problems. Extensive benchmark studies on G-protein coupled receptors (GPCRs), which simulate real-world scenarios, demonstrate that DeepREAL achieves state-of-the-art performance in out-of-distribution settings. DeepREAL can be extended to other gene families beyond GPCRs. AVAILABILITY AND IMPLEMENTATION All data used were downloaded from Pfam (Mistry et al., 2020), GLASS (Chan et al., 2015) and IUPHAR/BPS, and from Sakamuru et al. (2021). Readers are directed to the official websites for the original data. Code is available on GitHub at https://github.com/XieResearchGroup/DeepREAL. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Tian Cai
- Ph.D. Program in Computer Science, The Graduate Center, The City University of New York, New York, 10016, USA
| | - Kyra Alyssa Abbu
- Department of Computer Science, Hunter College, The City University of New York, New York, 10065, USA
| | - Yang Liu
- Department of Computer Science, Hunter College, The City University of New York, New York, 10065, USA
| | - Lei Xie
- Ph.D. Program in Computer Science, The Graduate Center, The City University of New York, New York, 10016, USA; Department of Computer Science, Hunter College, The City University of New York, New York, 10065, USA; Helen and Robert Appel Alzheimer's Disease Research Institute, Feil Family Brain & Mind Research Institute, Weill Cornell Medicine, Cornell University, New York, 10021, USA
| |
|
31
|
Wu J, Lan C, Mei Z, Chen X, Zhu Y, Hu H, Diao Y. Transfer learning with molecular graph convolutional networks for accurate modelling and representation of bioactivities of ligands targeting GPCRs without sufficient data. Comput Biol Chem 2022; 98:107664. [DOI: 10.1016/j.compbiolchem.2022.107664] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/25/2021] [Revised: 02/23/2022] [Accepted: 03/06/2022] [Indexed: 11/29/2022]
|
32
|
The SwissSimilarity 2021 Web Tool: Novel Chemical Libraries and Additional Methods for an Enhanced Ligand-Based Virtual Screening Experience. Int J Mol Sci 2022; 23:ijms23020811. [PMID: 35054998 PMCID: PMC8776004 DOI: 10.3390/ijms23020811] [Citation(s) in RCA: 47] [Impact Index Per Article: 23.5] [Received: 12/20/2021] [Revised: 01/06/2022] [Accepted: 01/07/2022] [Indexed: 01/27/2023] Open
Abstract
Hit finding, scaffold hopping, and structure–activity relationship studies are important tasks in rational drug discovery. Implementation of these tasks strongly depends on the availability of compounds similar to a known bioactive molecule. SwissSimilarity is a web tool for low-to-high-throughput virtual screening of multiple chemical libraries to find molecules similar to a compound of interest. According to the similarity principle, the output list of molecules generated by SwissSimilarity is expected to be enriched in compounds that are likely to share common protein targets with the query molecule and that can, therefore, be acquired and tested experimentally in priority. Compound libraries available for screening using SwissSimilarity include approved drugs, clinical candidates, known bioactive molecules, commercially available and synthetically accessible compounds. The first version of SwissSimilarity launched in 2015 made use of various 2D and 3D molecular descriptors, including path-based FP2 fingerprints and ElectroShape vectors. However, during the last few years, new fingerprinting methods for molecular description have been developed or have become popular. Here we would like to announce the launch of the new version of the SwissSimilarity web tool, which features additional 2D and 3D methods for estimation of molecular similarity: extended-connectivity, MinHash, 2D pharmacophore, extended reduced graph, and extended 3D fingerprints. Moreover, it is now possible to screen for molecular structures having the same scaffold as the query compound. Additionally, all compound libraries available for screening in SwissSimilarity have been updated, and several new ones have been added to the list. Finally, the interface of the website has been comprehensively rebuilt to provide a better user experience. The new version of SwissSimilarity is freely available starting from December 2021.
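Fingerprint-based similarity screening of the kind described in the abstract above can be sketched with the Tanimoto coefficient over sets of "on" fingerprint bits. The abstract does not state which coefficient SwissSimilarity applies to each fingerprint type, so the choice of Tanimoto is an assumption; the bit sets and compound names are hypothetical.

```python
def tanimoto(bits_a, bits_b):
    """Tanimoto coefficient: shared on-bits divided by the union of on-bits."""
    a, b = set(bits_a), set(bits_b)
    union = len(a | b)
    return len(a & b) / union if union else 0.0


# Hypothetical fingerprints given as sets of on-bit indices.
query = {1, 4, 9, 16, 25}
library = {
    "cmpd_A": {1, 4, 9, 16, 36},
    "cmpd_B": {2, 3, 5},
    "cmpd_C": {1, 4, 9, 16, 25},
}

# Rank library compounds from most to least similar to the query.
ranked = sorted(library, key=lambda name: tanimoto(query, library[name]), reverse=True)
print(ranked)
```

Real fingerprints such as extended-connectivity fingerprints would be generated by a cheminformatics toolkit, but the ranking step has the same shape.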
|
33
|
Zhang H, Chen S. Cyclic peptide drugs approved in the last two decades (2001-2021). RSC Chem Biol 2022; 3:18-31. [PMID: 35128405 PMCID: PMC8729179 DOI: 10.1039/d1cb00154j] [Citation(s) in RCA: 128] [Impact Index Per Article: 64.0] [Received: 07/26/2021] [Accepted: 11/05/2021] [Indexed: 01/01/2023] Open
Abstract
In contrast to the major families of small molecules and antibodies, cyclic peptides, as a family of synthesizable macromolecules, have distinct biochemical and therapeutic properties for pharmaceutical applications. Cyclic peptide-based drugs have increasingly been developed in the past two decades, confirming the common perception that cyclic peptides combine the high binding affinity and low metabolic toxicity of antibodies with the good stability and ease of manufacture of small molecules. Natural peptides were the major source of cyclic peptide drugs in the last century, and cyclic peptides derived from novel screening and cyclization strategies are the new source. In this review, we discuss and summarize 18 cyclic peptides approved for clinical use in the past two decades to provide a better understanding of cyclic peptide development and to inspire new perspectives. The purpose of the present review is to promote efforts to resolve the challenges in developing more effective cyclic peptide drugs.
Affiliation(s)
- Huiya Zhang
- Biotech Drug Research Center, Shanghai Institute of Materia Medica, Chinese Academy of Sciences Shanghai 201203 China
| | - Shiyu Chen
- Biotech Drug Research Center, Shanghai Institute of Materia Medica, Chinese Academy of Sciences Shanghai 201203 China
| |
|
34
|
Binding site identification of G protein-coupled receptors through a 3D Zernike polynomials-based method: application to C. elegans olfactory receptors. J Comput Aided Mol Des 2022; 36:11-24. [PMID: 34977999 PMCID: PMC8831295 DOI: 10.1007/s10822-021-00434-1] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Received: 06/15/2021] [Accepted: 11/18/2021] [Indexed: 11/01/2022]
Abstract
Studying the binding processes of G protein-coupled receptors (GPCRs) is of particular interest, both to better understand the molecular mechanisms that regulate signaling between the extracellular and intracellular environments and for drug design purposes. In this study, we propose a new computational approach for identifying the binding site for a specific ligand on a GPCR. The method is based on Zernike polynomials and performs the ligand-GPCR association through a shape complementarity analysis of the local molecular surfaces. The method is parameter-free and, working on hundreds of experimentally determined GPCR-ligand complexes, it can distinguish binding pockets from randomly sampled regions on the receptor surface, obtaining an area under the ROC curve of 0.77. Given its importance both as a model organism and in terms of applications, we then investigated the olfactory receptors of C. elegans, building a list of associations between 21 GPCRs belonging to its olfactory neurons and a set of possible ligands. Thus, we can not only carry out rapid and efficient screenings of drugs proposed for GPCRs, key targets in many pathologies, but we have also laid the groundwork for computational mutagenesis processes aimed at increasing or decreasing the binding affinity between ligands and receptors.
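The area under the ROC curve quoted in this abstract can be read as the probability that a true binding pocket receives a higher score than a randomly sampled surface region. The sketch below computes that quantity directly from pairwise comparisons. It is a generic illustration rather than the authors' pipeline, and the score lists and the name `roc_auc` are hypothetical.

```python
def roc_auc(pos_scores, neg_scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a correct pair.
    """
    if not pos_scores or not neg_scores:
        raise ValueError("need at least one positive and one negative score")
    wins = 0.0
    for p in pos_scores:
        for n in neg_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))


# Hypothetical complementarity scores: higher means a better shape match.
pocket_scores = [0.9, 0.8, 0.4]        # true binding pockets
random_patch_scores = [0.7, 0.3, 0.2]  # randomly sampled surface regions
auc = roc_auc(pocket_scores, random_patch_scores)
```

An AUC of 0.5 corresponds to random ranking and 1.0 to perfect separation, so the reported 0.77 sits between the two.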
|
35
|
Aroankins TS, Murali SK, Fenton RA, Wu Q. The Hydrogen-Coupled Oligopeptide Membrane Cotransporter Pept2 is SUMOylated in Kidney Distal Convoluted Tubule Cells. Front Mol Biosci 2021; 8:790606. [PMID: 34881291 PMCID: PMC8646034 DOI: 10.3389/fmolb.2021.790606] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/07/2021] [Accepted: 11/08/2021] [Indexed: 11/13/2022] Open
Abstract
Protein post-translational modification by the Small Ubiquitin-like MOdifier (SUMO) on lysine residues is a reversible process highly important for transcription and protein stability. In the kidney, SUMOylation appears to be important for the cellular response to aldosterone. Therefore, in this study, we generated a SUMOylation profile of the aldosterone-sensitive kidney distal convoluted tubule (DCT) as a basis for understanding SUMOylation events in this cell type. Using mass spectrometry-based proteomics, 1037 SUMO1 and 552 SUMO2 sites, corresponding to 546 SUMO1 and 356 SUMO2 proteins, were identified from a modified mouse kidney DCT cell line (mpkDCT). SUMOylation of the renal hydrogen-coupled oligopeptide and drug co-transporter (Pept2) at one site (K139) was found to be highly regulated by aldosterone. Using immunolabelling of mouse kidney sections Pept2 was localized to DCT cells in vivo. Aldosterone stimulation of mpkDCT cell lines expressing wild-type Pept2 or mutant K139R-Pept2, post-transcriptionally increased Pept2 expression up to four-fold. Aldosterone decreased wild-type Pept2 abundance in the apical membrane domain of mpkDCT cells, but this response was absent in K139R-Pept2 expressing cells. In summary, we have generated a SUMOylation landscape of the mouse DCT and determined that SUMOylation plays an important role in the physiological regulation of Pept2 trafficking by aldosterone.
Affiliation(s)
- Takwa S Aroankins
- Department of Biomedicine, Aarhus University, Aarhus, Denmark; Department of Anesthesiology and Intensive Care, Sahlgrenska University Hospital, Sahlgrenska Academy at University of Gothenburg, Gothenburg, Sweden
| | | | - Robert A Fenton
- Department of Biomedicine, Aarhus University, Aarhus, Denmark
| | - Qi Wu
- Department of Biomedicine, Aarhus University, Aarhus, Denmark
| |
|
36
|
Harini K, Jayashree S, Tiwari V, Vishwanath S, Sowdhamini R. Ligand Docking Methods to Recognize Allosteric Inhibitors for G-Protein-Coupled Receptors. Bioinform Biol Insights 2021; 15:11779322211037769. [PMID: 34733103 PMCID: PMC8558589 DOI: 10.1177/11779322211037769] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 03/11/2021] [Accepted: 07/20/2021] [Indexed: 11/30/2022] Open
Abstract
G-protein-coupled receptors (GPCRs) are membrane proteins that play an important role in many cellular processes and are excellent drug targets. Despite the existence of several US Food and Drug Administration (FDA)-approved GPCR-targeting drugs, side effects owing to the nonspecific nature of drug binding remain a continuing challenge. We have investigated the diversity of the ligand binding site for this class of proteins against their cognate ligands using computational docking, even where their structures are already known in the ligand-complexed form. The cognate ligands of some of these receptors dock at an allosteric binding site with a better score than at the conserved site. Interestingly, amino acid residues at such allosteric binding sites are not conserved across GPCR subfamilies. Such a computational approach can assist in the prediction of specific allosteric binders for GPCRs.
Affiliation(s)
- K Harini
- Department of Bioinformatics, National Centre for Biological Sciences, Bangalore, India
| | - S Jayashree
- Department of Biotechnology, Vellore Institute of Technology, Vellore, India; Royal Melbourne Institute of Technology (RMIT) University, Melbourne, VIC, Australia
| | - Vikas Tiwari
- Department of Bioinformatics, National Centre for Biological Sciences, Bangalore, India
| | - Sneha Vishwanath
- Department of Biophysics, Indian Institute of Science, Bangalore, India; Department of Zoology, University of Cambridge, Cambridge, UK
| | - Ramanathan Sowdhamini
- Department of Bioinformatics, National Centre for Biological Sciences, Bangalore, India
| |
|
37
|
Yin Y, Hu H, Yang Z, Xu H, Wu J. RealVS: Toward Enhancing the Precision of Top Hits in Ligand-Based Virtual Screening of Drug Leads from Large Compound Databases. J Chem Inf Model 2021; 61:4924-4939. [PMID: 34619030 DOI: 10.1021/acs.jcim.1c01021] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Indexed: 12/19/2022]
Abstract
Accurate modeling of compound bioactivities is essential for the virtual screening of drug leads. In real-world scenarios, pharmacists tend to choose from the top-k hit compounds ranked by predicted bioactivities from a large database with interest to continue wet experiments for drug discovery. Significant improvement of the precision of the top hits in ligand-based virtual screening of drug leads is more valuable than conventional schemes for accurately predicting the bioactivities of all compounds from a large database. Here, we proposed a new method, RealVS, to significantly improve the top hits' precision and learn interpretable key substructures associated with compound bioactivities. The features of RealVS involve the following points. (1) Abundant transferable information from the source domain was introduced for alleviating the insufficiency of inactive ligands associated with drug targets. (2) The adversarial domain alignment was adopted to fit the distribution of generated features of compounds from the training data set and that from the screening database for greater model generalization ability. (3) A novel objective function was proposed to simultaneously optimize the classification loss, regression loss, and adversarial loss, where most inactive ligands tend to be screened out before activity regression prediction. (4) Graph attention networks were adopted for learning key substructures associated with ligand bioactivities for better model interpretability. The results on a large number of benchmark data sets show that our method has significantly improved the precision of top hits under various k values in ligand-based virtual screening of drug leads from large compound databases, which is of great value in real-world scenarios. The web server of RealVS is freely available at noveldelta.com/RealVS for academic purposes, where virtual screening of hits from large compound databases is accessible.
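The quantity this abstract emphasizes, the precision of the top-k hits, can be made concrete with a short sketch. It assumes each compound has a predicted score and a binary activity label. The function name `precision_at_k` and the toy data are hypothetical and are not taken from RealVS itself.

```python
def precision_at_k(scores, labels, k):
    """Fraction of true actives among the k highest-scoring compounds."""
    if k <= 0:
        raise ValueError("k must be positive")
    if not scores or len(scores) != len(labels):
        raise ValueError("scores and labels must be non-empty and equally long")
    # Rank compounds by predicted score, highest first.
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    top = ranked[:k]
    return sum(label for _, label in top) / len(top)


# Toy screening result: 1 = active, 0 = inactive (hypothetical values).
predicted = [0.9, 0.8, 0.7, 0.4, 0.2]
active = [1, 0, 1, 1, 0]
p_at_1 = precision_at_k(predicted, active, 1)
p_at_2 = precision_at_k(predicted, active, 2)
p_at_3 = precision_at_k(predicted, active, 3)
```

Unlike a global regression error, this metric only rewards getting the head of the ranked list right, which matches the scenario of selecting a few compounds for wet-lab follow-up.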
Affiliation(s)
- Yueming Yin
- College of Telecommunications and Information Engineering, Nanjing University of Posts and Telecommunications, Nanjing 210003, China
| | - Haifeng Hu
- College of Telecommunications and Information Engineering, Nanjing University of Posts and Telecommunications, Nanjing 210003, China
| | - Zhen Yang
- National Engineering Research Center of Communications and Networking, Nanjing University of Posts and Telecommunications, Nanjing 210003, China
| | - Huajian Xu
- College of Telecommunications and Information Engineering, Nanjing University of Posts and Telecommunications, Nanjing 210003, China
| | - Jiansheng Wu
- School of Geographic and Biologic Information, Nanjing University of Posts and Telecommunications, Nanjing 210003, China
| |
|
38
|
Hu T, Zheng G, Xue D, Zhao S, Li F, Zhou F, Zhao F, Xie L, Tian C, Hua T, Zhao S, Xu Y, Zhong G, Liu ZJ, Makriyannis A, Stevens RC, Tao H. Rational Remodeling of Atypical Scaffolds for the Design of Photoswitchable Cannabinoid Receptor Tools. J Med Chem 2021; 64:13752-13765. [PMID: 34477367 DOI: 10.1021/acs.jmedchem.1c01088] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Indexed: 11/28/2022]
Abstract
Azobenzene-embedded photoswitchable ligands are widely used chemical tools in photopharmacological studies. Current approaches to azobenzene introduction rely mainly on the isosteric replacement of typical azologable groups. However, atypical scaffolds, which are chemically in the overwhelming majority, may offer more opportunities for photoswitch remodeling. Herein, we investigate the rational remodeling of atypical scaffolds for azobenzene introduction, as exemplified in the development of photoswitchable ligands for the cannabinoid receptor 2 (CB2). Based on the analysis of residue-type clusters surrounding the binding pocket, we conclude that among the three representative atypical arms of the CB2 antagonist AM10257, the adamantyl arm is the most appropriate for azobenzene remodeling. Optimizing the spacer length and attachment position revealed AzoLig 9, with excellent thermal bistability, decent photopharmacological switchability between its two configurations, and high subtype selectivity. This structure-guided approach gives new impetus to the extension of chemical space for tool customization in increasingly diversified photopharmacological studies and beyond.
Affiliation(s)
- Tao Hu
- iHuman Institute, ShanghaiTech University, Pudong, Shanghai 201210, China; School of Life Science and Technology, ShanghaiTech University, Pudong, Shanghai 201210, China; CAS Center for Excellence in Molecular Cell Science, Shanghai Institute of Biochemistry and Cell Biology, University of Chinese Academy of Sciences, Beijing 100049, China
| | - Guoxun Zheng
- iHuman Institute, ShanghaiTech University, Pudong, Shanghai 201210, China
| | - Dongxiang Xue
- iHuman Institute, ShanghaiTech University, Pudong, Shanghai 201210, China
| | - Simeng Zhao
- iHuman Institute, ShanghaiTech University, Pudong, Shanghai 201210, China
| | - Fei Li
- iHuman Institute, ShanghaiTech University, Pudong, Shanghai 201210, China
| | - Fang Zhou
- iHuman Institute, ShanghaiTech University, Pudong, Shanghai 201210, China
| | - Fei Zhao
- iHuman Institute, ShanghaiTech University, Pudong, Shanghai 201210, China
| | - Linshan Xie
- iHuman Institute, ShanghaiTech University, Pudong, Shanghai 201210, China; School of Life Science and Technology, ShanghaiTech University, Pudong, Shanghai 201210, China
| | - Cuiping Tian
- iHuman Institute, ShanghaiTech University, Pudong, Shanghai 201210, China
| | - Tian Hua
- iHuman Institute, ShanghaiTech University, Pudong, Shanghai 201210, China; School of Life Science and Technology, ShanghaiTech University, Pudong, Shanghai 201210, China
| | - Suwen Zhao
- iHuman Institute, ShanghaiTech University, Pudong, Shanghai 201210, China; School of Life Science and Technology, ShanghaiTech University, Pudong, Shanghai 201210, China
| | - Yueming Xu
- iHuman Institute, ShanghaiTech University, Pudong, Shanghai 201210, China
| | - Guisheng Zhong
- iHuman Institute, ShanghaiTech University, Pudong, Shanghai 201210, China; School of Life Science and Technology, ShanghaiTech University, Pudong, Shanghai 201210, China
| | - Zhi-Jie Liu
- iHuman Institute, ShanghaiTech University, Pudong, Shanghai 201210, China; School of Life Science and Technology, ShanghaiTech University, Pudong, Shanghai 201210, China
| | - Alexandros Makriyannis
- Center for Drug Discovery, Department of Pharmaceutical Sciences and Department of Chemistry and Chemical Biology, Northeastern University, Boston, Massachusetts 02115, United States
| | - Raymond C Stevens
- iHuman Institute, ShanghaiTech University, Pudong, Shanghai 201210, China; School of Life Science and Technology, ShanghaiTech University, Pudong, Shanghai 201210, China; Departments of Biological Sciences and Chemistry, Bridge Institute, USC Michelson Center for Convergent Bioscience, University of Southern California, Los Angeles, California 90089, United States
| | - Houchao Tao
- iHuman Institute, ShanghaiTech University, Pudong, Shanghai 201210, China
| |
|
39
|
Maciuszek M, Ortega-Gomez A, Maas SL, Garrido-Mesa J, Ferraro B, Perretti M, Merritt A, Nicolaes GAF, Soehnlein O, Chapman TM. Design, synthesis, and biological evaluation of novel pyrrolidinone small-molecule Formyl peptide receptor 2 agonists. Eur J Med Chem 2021; 226:113805. [PMID: 34536667 DOI: 10.1016/j.ejmech.2021.113805] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 04/09/2021] [Revised: 08/23/2021] [Accepted: 08/24/2021] [Indexed: 10/20/2022]
Abstract
A series of Formyl peptide receptor 2 small molecule agonists with a pyrrolidinone scaffold, derived from a combination of pharmacophore modelling and docking studies, were designed and synthesized. The GLASS (GPCR-Ligand Association) database was screened using a pharmacophore model. The most promising novel ligand structures were chosen and then tested in cellular assays (calcium mobilization and β-arrestin assays). Amongst the selected ligands, two pyrrolidinone compounds (7 and 8) turned out to be the most active. Moreover compound 7 was able to reduce the number of adherent neutrophils in a human neutrophil static adhesion assay which indicates its anti-inflammatory and proresolving properties. Further exploration and optimization of new ligands showed that heterocyclic rings, e.g. pyrazole directly connected to the pyrrolidinone scaffold, provide good stability and a boost in the agonistic activity. The compounds of most interest (7 and 30) were tested in an ERK phosphorylation assay, demonstrating selectivity towards FPR2 over FPR1. Compound 7 was examined in an in vivo mouse pharmacokinetic study. Compound 7 may be a valuable in vivo tool and help improve understanding of the role of the FPR2 receptor in the resolution of inflammation process.
Affiliation(s)
- Monika Maciuszek
- LifeArc, Accelerator Building, Open Innovation Campus, Stevenage, UK; The William Harvey Research Institute, Barts and the London School of Medicine, Queen Mary University of London, London, UK.
| | - Almudena Ortega-Gomez
- Institute for Cardiovascular Prevention (IPEK), LMU Munich Hospital, Munich, Germany
| | - Sanne L Maas
- Institute for Cardiovascular Prevention (IPEK), LMU Munich Hospital, Munich, Germany
| | - Jose Garrido-Mesa
- The William Harvey Research Institute, Barts and the London School of Medicine, Queen Mary University of London, London, UK
| | - Bartolo Ferraro
- Institute for Cardiovascular Prevention (IPEK), LMU Munich Hospital, Munich, Germany
| | - Mauro Perretti
- The William Harvey Research Institute, Barts and the London School of Medicine, Queen Mary University of London, London, UK
| | - Andy Merritt
- LifeArc, Accelerator Building, Open Innovation Campus, Stevenage, UK
| | - Gerry A F Nicolaes
- CARIM - School for Cardiovascular Sciences Department of Biochemistry, Maastricht University, Maastricht, Netherlands
| | - Oliver Soehnlein
- Institute for Cardiovascular Prevention (IPEK), LMU Munich Hospital, Munich, Germany; Department of Physiology and Pharmacology (FyFa), Karolinska Institute, Stockholm, Sweden; Institute for Experimental Pathology (ExPat), Centre for Molecular Biology of Inflammation, University of Münster, Münster, Germany
| | - Timothy M Chapman
- LifeArc, Accelerator Building, Open Innovation Campus, Stevenage, UK
| |
|
40
|
Monticolo F, Chiusano ML. Computational Approaches for Cancer-Fighting: From Gene Expression to Functional Foods. Cancers (Basel) 2021; 13:4207. [PMID: 34439361 PMCID: PMC8393935 DOI: 10.3390/cancers13164207] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 07/07/2021] [Revised: 08/13/2021] [Accepted: 08/17/2021] [Indexed: 01/22/2023] Open
Abstract
It is today widely accepted that a healthy diet helps reduce the risk of cancer or its deleterious effects. Nutrigenomics studies are therefore being conducted to test the effects of nutrients at the molecular level and contribute to the search for anti-cancer treatments. These efforts are expanding the precious source of information necessary for the selection of natural compounds useful for the design of novel drugs or functional foods. Here we present a computational study to select new candidate compounds that could play a role in cancer prevention and care. Starting from a dataset of genes that are co-expressed in programmed cell death experiments, we investigated nutrigenomics treatments inducing apoptosis and searched for compounds that produce the same expression pattern. Subsequently, we selected cancer types where the genes showed an opposite expression pattern and confirmed that the apoptotic/nutrigenomics expression trend was associated with significantly better survival in cancer-affected patients. Furthermore, we considered the functional interactors of the genes as defined by public protein-protein interaction data and inferred their involvement in cancers and/or in programmed cell death. We identified 7 genes and, from available nutrigenomics experiments, 6 compounds effective on their expression. These 6 compounds were used to identify, by ligand-based virtual screening, additional molecules with similar structure. We checked for ADME criteria and selected 23 natural compounds representing suitable candidates for further testing of their efficacy in apoptosis induction. Because these compounds occur in natural resources, the presented results could support the development of novel drugs and/or the design of functional foods.
Affiliation(s)
| | - Maria Luisa Chiusano
- Department of Agricultural Sciences, Università degli Studi di Napoli Federico II, Via Università 100, 80055 Portici, Italy
| |
|
41
|
Song B, Li Z, Lin X, Wang J, Wang T, Fu X. Pretraining model for biological sequence data. Brief Funct Genomics 2021; 20:181-195. [PMID: 34050350 PMCID: PMC8194843 DOI: 10.1093/bfgp/elab025] [Citation(s) in RCA: 14] [Impact Index Per Article: 4.7] [Received: 03/23/2021] [Revised: 04/13/2021] [Accepted: 04/21/2021] [Indexed: 12/26/2022] Open
Abstract
With the development of high-throughput sequencing technology, biological sequence data reflecting life information have become increasingly accessible. Particularly against the background of the COVID-19 pandemic, biological sequence data play an important role in detecting diseases, analyzing mechanisms, and discovering specific drugs. In recent years, pretraining models that emerged in natural language processing have attracted widespread attention in many research fields, not only because they decrease training cost but also because they improve performance on downstream tasks. Pretraining models are used to embed biological sequences and extract features from large biological sequence corpora to comprehensively understand the biological sequence data. In this survey, we provide a broad review of pretraining models for biological sequence data. We first introduce biological sequences and corresponding datasets, including brief descriptions and accessible links. Subsequently, we systematically summarize popular pretraining models for biological sequences in four categories: CNN, word2vec, LSTM, and Transformer. Then, we present some applications of the proposed pretraining models to downstream tasks to explain the role of pretraining models. Next, we provide a novel pretraining scheme for protein sequences and a multitask benchmark for protein pretraining models. Finally, we discuss the challenges and future directions in pretraining models for biological sequences.
Affiliation(s)
| | | | | | | | | | - Xiangzheng Fu
- Corresponding author: Xiangzheng Fu, College of Information Science and Engineering, Hunan University, Changsha, Hunan, China. Tel: 86-0731-88821907; E-mail:
| |
|
42
|
Killoran MP, Levin S, Boursier ME, Zimmerman K, Hurst R, Hall MP, Machleidt T, Kirkland TA, Friedman Ohana R. An Integrated Approach toward NanoBRET Tracers for Analysis of GPCR Ligand Engagement. Molecules 2021; 26:2857. [PMID: 34065854 PMCID: PMC8151276 DOI: 10.3390/molecules26102857] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 04/15/2021] [Revised: 05/06/2021] [Accepted: 05/07/2021] [Indexed: 01/22/2023] Open
Abstract
Gaining insight into the pharmacology of ligand engagement with G-protein coupled receptors (GPCRs) under biologically relevant conditions is vital to both drug discovery and basic research. NanoLuc-based bioluminescence resonance energy transfer (NanoBRET) monitoring competitive binding between fluorescent tracers and unmodified test compounds has emerged as a robust and sensitive method to quantify ligand engagement with specific GPCRs genetically fused to NanoLuc luciferase or the luminogenic HiBiT peptide. However, development of fluorescent tracers is often challenging and remains the principal bottleneck for this approach. One way to alleviate the burden of developing a specific tracer for each receptor is using promiscuous tracers, which is made possible by the intrinsic specificity of BRET. Here, we devised an integrated tracer discovery workflow that couples machine learning-guided in silico screening for scaffolds displaying promiscuous binding to GPCRs with a blend of synthetic strategies to rapidly generate multiple tracer candidates. Subsequently, these candidates were evaluated for binding in a NanoBRET ligand-engagement screen across a library of HiBiT-tagged GPCRs. Employing this workflow, we generated several promiscuous fluorescent tracers that can effectively engage multiple GPCRs, demonstrating the efficiency of this approach. We believe that this workflow has the potential to accelerate discovery of NanoBRET fluorescent tracers for GPCRs and other target classes.
Affiliation(s)
- Michael P. Killoran
- Promega Corporation, 2800 Woods Hollow, Fitchburg, WI 53711, USA; (M.P.K.); (M.E.B.); (K.Z.); (R.H.); (M.P.H.); (T.M.)
| | - Sergiy Levin
- Promega Biosciences LLC, 277 Granada Drive, San Luis Obispo, CA 93401, USA; (S.L.); (T.A.K.)
| | - Michelle E. Boursier
- Promega Corporation, 2800 Woods Hollow, Fitchburg, WI 53711, USA; (M.P.K.); (M.E.B.); (K.Z.); (R.H.); (M.P.H.); (T.M.)
| | - Kristopher Zimmerman
- Promega Corporation, 2800 Woods Hollow, Fitchburg, WI 53711, USA; (M.P.K.); (M.E.B.); (K.Z.); (R.H.); (M.P.H.); (T.M.)
| | - Robin Hurst
- Promega Corporation, 2800 Woods Hollow, Fitchburg, WI 53711, USA; (M.P.K.); (M.E.B.); (K.Z.); (R.H.); (M.P.H.); (T.M.)
| | - Mary P. Hall
- Promega Corporation, 2800 Woods Hollow, Fitchburg, WI 53711, USA; (M.P.K.); (M.E.B.); (K.Z.); (R.H.); (M.P.H.); (T.M.)
| | - Thomas Machleidt
- Promega Corporation, 2800 Woods Hollow, Fitchburg, WI 53711, USA; (M.P.K.); (M.E.B.); (K.Z.); (R.H.); (M.P.H.); (T.M.)
| | - Thomas A. Kirkland
- Promega Biosciences LLC, 277 Granada Drive, San Luis Obispo, CA 93401, USA; (S.L.); (T.A.K.)
| | - Rachel Friedman Ohana
- Promega Corporation, 2800 Woods Hollow, Fitchburg, WI 53711, USA; (M.P.K.); (M.E.B.); (K.Z.); (R.H.); (M.P.H.); (T.M.)
- Correspondence: ; Tel.: +1-608-274-1181
| |
|
43
|
Lee T, Lee S, Kang M, Kim S. Deep hierarchical embedding for simultaneous modeling of GPCR proteins in a unified metric space. Sci Rep 2021; 11:9543. [PMID: 33953216 PMCID: PMC8100104 DOI: 10.1038/s41598-021-88623-8] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 01/24/2021] [Accepted: 04/13/2021] [Indexed: 11/23/2022] Open
Abstract
GPCR proteins belong to diverse families of proteins that are defined at multiple hierarchical levels. Inspecting relationships between GPCR proteins within the hierarchical structure is important, since characteristics of a protein can be inferred from proteins with similar hierarchical information. However, modeling of GPCR families has been performed separately at each of the family, subfamily, and sub-subfamily levels. Relationships between GPCR proteins are ignored in these approaches because they process the information in the proteins with several disconnected models. In this study, we propose DeepHier, a deep learning model that simultaneously learns representations of the GPCR family hierarchy from protein sequences with a single unified model. A novel loss term based on metric learning is introduced to incorporate hierarchical relations between proteins. We tested our approach using a public GPCR sequence dataset. Metric distances in the deep feature space corresponded to the hierarchical family relations between GPCR proteins. Furthermore, we demonstrated that downstream tasks such as phylogenetic reconstruction and motif discovery are feasible in the constructed embedding space. These results show that hierarchical relations between sequences were successfully captured in both technical and biological aspects.
Affiliation(s)
- Taeheon Lee
- Looxid Labs, Seoul, 06628, Republic of Korea
| | - Sangseon Lee
- BK21 FOUR Intelligence Computing, Seoul National University, Seoul, 08826, Republic of Korea
| | - Minji Kang
- Department of Computer Science, Stanford University, Stanford, CA, 94305, USA
| | - Sun Kim
- Bioinformatics Institute, Seoul National University, Seoul, 08826, Republic of Korea; Department of Computer Science and Engineering, Seoul National University, Seoul, 08826, Republic of Korea; Interdisciplinary Program in Bioinformatics, Seoul National University, Seoul, 08826, Republic of Korea; Institute of Engineering Research, Seoul National University, Seoul, 08826, Republic of Korea.
| |
|
44
|
GPCR_LigandClassify.py; a rigorous machine learning classifier for GPCR targeting compounds. Sci Rep 2021; 11:9510. [PMID: 33947911 PMCID: PMC8097070 DOI: 10.1038/s41598-021-88939-5] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 04/05/2020] [Accepted: 04/12/2021] [Indexed: 02/02/2023] Open
Abstract
The current study describes the construction of various ligand-based machine learning models to be used for drug repurposing against the family of G-protein coupled receptors (GPCRs). In building these models, we collected > 500,000 data points, encompassing experimentally measured molecular association data of > 160,000 unique ligands against > 250 GPCRs. These data points were retrieved from the GPCR-Ligand Association (GLASS) database. We used diverse molecular featurization methods to describe the input molecules. Multiple supervised ML algorithms were developed, tested, and compared for their accuracy, F scores, and Matthews correlation coefficient (MCC) scores. Our data suggest that, combined with molecular fingerprinting, ensemble decision tree and gradient boosted tree ML algorithms come close in accuracy to the more sophisticated deep neural net (DNN)-based algorithms. On a test dataset, these models displayed excellent performance, reaching ~90% classification accuracy. Additionally, we showcase a few examples where our models were able to identify interesting connections between known drugs from the DrugBank database and members of the GPCR family of receptors. Our findings are in excellent agreement with previously reported experimental observations in the literature. We hope the models presented in this paper synergize with the ongoing interest in applying machine learning modeling to drug repurposing and computational drug discovery in general.
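Accuracy and the Matthews correlation coefficient, two of the metrics named in this abstract, can diverge on imbalanced data, so a worked example may help. The sketch below computes both from binary predictions using the standard MCC formula. The label vectors and the function names are hypothetical and are not drawn from GPCR_LigandClassify.py.

```python
import math


def confusion_counts(y_true, y_pred):
    """Return (tp, tn, fp, fn) for binary labels encoded as 0/1."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp, tn, fp, fn


def matthews_corrcoef(tp, tn, fp, fn):
    """MCC = (tp*tn - fp*fn) / sqrt((tp+fp)(tp+fn)(tn+fp)(tn+fn))."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # Common convention: report 0 when any marginal count is zero.
    return 0.0 if denom == 0 else (tp * tn - fp * fn) / denom


# Hypothetical ligand labels: 1 = binds the receptor class, 0 = does not.
y_true = [1, 1, 1, 0, 0, 0, 1, 0]
y_pred = [1, 1, 0, 0, 0, 1, 1, 0]
counts = confusion_counts(y_true, y_pred)
accuracy = (counts[0] + counts[1]) / len(y_true)
mcc = matthews_corrcoef(*counts)
```

MCC ranges from -1 to 1 and uses all four confusion-matrix cells, which is why it is often reported alongside accuracy and F scores.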
|
45
|
Cai T, Lim H, Abbu KA, Qiu Y, Nussinov R, Xie L. MSA-Regularized Protein Sequence Transformer toward Predicting Genome-Wide Chemical-Protein Interactions: Application to GPCRome Deorphanization. J Chem Inf Model 2021; 61:1570-1582. [PMID: 33757283 PMCID: PMC8154251 DOI: 10.1021/acs.jcim.0c01285] [Citation(s) in RCA: 12] [Impact Index Per Article: 4.0] [Received: 11/04/2020] [Indexed: 01/14/2023]
Abstract
Small molecules play a critical role in modulating biological systems. Knowledge of chemical-protein interactions helps address fundamental and practical questions in biology and medicine. However, with the rapid emergence of newly sequenced genes, the endogenous or surrogate ligands of a vast number of proteins remain unknown. Homology modeling and machine learning are two major methods for assigning new ligands to a protein, but they mostly fail when sequence homology between an unannotated protein and those with known functions or structures is low. In this study, we develop a new deep learning framework to predict chemical binding to evolutionarily divergent unannotated proteins, whose ligands cannot be reliably predicted by existing methods. By incorporating evolutionary information into self-supervised learning of unlabeled protein sequences, we develop a novel method, distilled sequence alignment embedding (DISAE), for protein sequence representation. DISAE can utilize all protein sequences and their multiple sequence alignment (MSA) to capture functional relationships between proteins without knowledge of their structure and function. Following DISAE pretraining, we devise a module-based fine-tuning strategy for the supervised learning of chemical-protein interactions. In the benchmark studies, DISAE significantly improves the generalizability of machine learning models and outperforms the state-of-the-art methods by a large margin. Comprehensive ablation studies suggest that the use of MSA, sequence distillation, and triplet pretraining critically contributes to the success of DISAE. The interpretability analysis of DISAE suggests that it learns biologically meaningful information. We further use DISAE to assign ligands to human orphan G-protein coupled receptors (GPCRs) and to cluster the human GPCRome by integrating their phylogenetic and ligand relationships. The promising results of DISAE open an avenue for exploring the chemical landscape of entire sequenced genomes.
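The abstract mentions "triplet pretraining" without giving details. One plausible reading, sketched here purely as an assumption, is that each protein sequence is split into three-residue tokens before being fed to a sequence model. The function name `to_triplets`, the `overlapping` switch, and the example sequence are hypothetical and are not taken from the paper.

```python
def to_triplets(sequence, overlapping=False):
    """Split an amino-acid string into 3-residue tokens.

    With overlapping=False the window moves 3 residues at a time and any
    trailing 1-2 residues are dropped; with overlapping=True it moves 1.
    Whether DISAE uses either scheme is an assumption of this sketch.
    """
    step = 1 if overlapping else 3
    return [sequence[i:i + 3] for i in range(0, len(sequence) - 2, step)]

def build_vocabulary(token_lists):
    """Map every distinct token to an integer id, in sorted order."""
    tokens = sorted({tok for toks in token_lists for tok in toks})
    return {tok: idx for idx, tok in enumerate(tokens)}

# Invented example sequence, for illustration only.
seq = "MKTAYIAK"
non_overlapping = to_triplets(seq)
sliding = to_triplets(seq, overlapping=True)
vocab = build_vocabulary([sliding])
print(non_overlapping, sliding, vocab)
```

Tokenizing by triplets shortens the input by roughly a factor of three in the non-overlapping case, at the cost of a larger vocabulary than single-residue tokens.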
Affiliation(s)
- Tian Cai
- Ph.D. Program in Computer Science, The Graduate Center, The City University of New York, New York, New York 10016, United States
| | - Hansaim Lim
- Ph.D. Program in Biochemistry, The Graduate Center, The City University of New York, New York, New York 10016, United States
| | - Kyra Alyssa Abbu
- Department of Computer Science, Hunter College, The City University of New York, New York, New York 10065, United States
| | - Yue Qiu
- Ph.D. Program in Biology, The Graduate Center, The City University of New York, New York, New York 10016, United States
| | - Ruth Nussinov
- Computational Structural Biology Section, Basic Science Program, Frederick National Laboratory for Cancer Research, Frederick, Maryland 21702, United States
- Department of Human Molecular Genetics and Biochemistry, Sackler School of Medicine, Tel Aviv University, Tel Aviv 69978, Israel
| | - Lei Xie
- Ph.D. Program in Computer Science, The Graduate Center, The City University of New York, New York, New York 10016, United States
- Ph.D. Program in Biochemistry, The Graduate Center, The City University of New York, New York, New York 10016, United States
- Department of Computer Science, Hunter College, The City University of New York, New York, New York 10065, United States
- Ph.D. Program in Biology, The Graduate Center, The City University of New York, New York, New York 10016, United States
- Helen and Robert Appel Alzheimer’s Disease Research Institute, Feil Family Brain & Mind Research Institute, Weill Cornell Medicine, Cornell University, New York, New York 10021, United States
| |
|
46
|
Chemogenomic approach to identifying nematode chemoreceptor drug targets in the entomopathogenic nematode Heterorhabditis bacteriophora. Comput Biol Chem 2021; 92:107464. [PMID: 33667976 DOI: 10.1016/j.compbiolchem.2021.107464] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/28/2020] [Revised: 02/18/2021] [Accepted: 02/22/2021] [Indexed: 11/21/2022]
Abstract
Parasitic nematodes constitute one of the major threats to human health, causing diseases of major socioeconomic importance worldwide. Recent estimates indicate that more than 1 billion people are infected with parasitic nematodes around the world. Current measures to combat parasitic nematode infections include anthelmintic drugs. However, heavy exposure to anthelmintics has selected populations of livestock parasitic nematodes that are no longer susceptible to the drugs, rendering several anthelmintics useless for parasitic nematode control in many areas of the world. The rapidity with which anthelmintic resistance developed in response to these drugs suggests that increasing the selective pressure on human parasitic nematodes will also rapidly generate resistant worm populations. Therefore, development of new anthelmintics is of major importance before resistance becomes widespread in human parasitic nematode populations. G-Protein Coupled Receptors (GPCRs) represent an important target for many pharmacological interventions due to their ubiquitous expression in various cell types. GPCRs contribute to numerous physiological processes, and their ligand binding sites located on cell surfaces make them accessible targets and attractive substrates in terms of druggability. In fact, ∼35% of Food and Drug Administration (FDA) and European Medicines Agency (EMA) approved drugs target GPCRs and their associated proteins, with over 300 additional drugs targeting GPCRs at the clinical trial stage. Nematode Chemosensory GPCRs (NemChRs) are unique to nematodes, and therefore represent ideal substrates for target-based drug discovery. Here we set out to identify NemChRs that are transcriptionally active inside the host, and to use these NemChRs in a reverse pharmacological screen to impede parasitic development. Our data identified several NemChRs, and we focused on one that was expressed in neuronal cells and exhibited the highest fold change in transcription after host activation. Next, we performed homology modelling and molecular dynamics simulations of this NemChR in order to conduct a virtual screening campaign to identify candidate drug targets which were ranked and selected for experimental testing in bioassays. Taken together, our results identify and characterize a candidate NemChR drug target, and provide a chemogenomic pipeline for identifying nematicide substrates.
|
47
|
Wang C, Kurgan L. Survey of Similarity-Based Prediction of Drug-Protein Interactions. Curr Med Chem 2021; 27:5856-5886. [PMID: 31393241 DOI: 10.2174/0929867326666190808154841] [Citation(s) in RCA: 24] [Impact Index Per Article: 8.0] [Received: 11/07/2017] [Revised: 04/16/2018] [Accepted: 10/23/2018] [Indexed: 12/20/2022]
Abstract
Therapeutic activity of a significant majority of drugs is determined by their interactions with proteins. Databases of drug-protein interactions (DPIs) primarily focus on the therapeutic protein targets, while knowledge of the off-targets is fragmented and partial. One way to bridge this knowledge gap is to employ computational methods to predict protein targets for a given drug molecule, or interacting drugs for given protein targets. We survey a comprehensive set of 35 methods that were published in high-impact venues and that predict DPIs based on similarity between drugs and similarity between protein targets. We analyze the internal databases of known DPIs that these methods utilize to compute similarities, and investigate how they are linked to the 12 publicly available source databases. We discuss the contents, impact, and relationships between these internal and source databases, as well as the timeline of their releases and publications. The 35 predictors exploit and often combine three types of similarities that consider drug structures, drug profiles, and target sequences. We review the predictive architectures of these methods and their impact, and we explain how their internal DPI databases are linked to the source databases. We also include a detailed timeline of the development of these predictors and discuss the underlying limitations of the current resources and predictive tools. Finally, we provide several recommendations concerning the future development of the related databases and methods.
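The drug-structure similarity idea surveyed here can be illustrated with a nearest-neighbour sketch: a query drug inherits candidate targets from the known drugs it most resembles. This is a generic illustration under stated assumptions, not any of the 35 surveyed methods. Fingerprints are represented as sets of "on" bit indices, similarity is the Tanimoto coefficient, and every drug name, target name, and fingerprint below is invented.

```python
from typing import Dict, List, Set, Tuple

def tanimoto(a: Set[int], b: Set[int]) -> float:
    """Tanimoto (Jaccard) similarity between two sets of 'on' fingerprint bits."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def rank_targets(
    query_fp: Set[int],
    known_dpis: Dict[str, Tuple[Set[int], Set[str]]],
) -> List[Tuple[str, float]]:
    """Score each target by the most similar known drug that binds it."""
    scores: Dict[str, float] = {}
    for fingerprint, targets in known_dpis.values():
        sim = tanimoto(query_fp, fingerprint)
        for target in targets:
            scores[target] = max(scores.get(target, 0.0), sim)
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)

# Hypothetical internal database: drug -> (fingerprint bits, known targets).
known_dpis = {
    "drug_A": ({1, 2, 3, 4}, {"T1"}),
    "drug_B": ({3, 4, 5, 6}, {"T2"}),
    "drug_C": ({7, 8}, {"T3"}),
}
query_fp = {1, 2, 3, 5}
ranking = rank_targets(query_fp, known_dpis)
print(ranking)
```

Taking the maximum over neighbours is only one possible aggregation. Methods that also use drug profiles or target-sequence similarity would combine several such scores.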
Affiliation(s)
- Chen Wang
- Department of Computer Science, Virginia Commonwealth University, Richmond, VA 23284, United States
| | - Lukasz Kurgan
- Department of Computer Science, Virginia Commonwealth University, Richmond, VA 23284, United States
| |
|
48
|
|
49
|
Tiss A, Ben Boubaker R, Henrion D, Guissouma H, Chabbert M. Homology Modeling of Class A G-Protein-Coupled Receptors in the Age of the Structure Boom. Methods Mol Biol 2021; 2315:73-97. [PMID: 34302671 DOI: 10.1007/978-1-0716-1468-6_5] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Indexed: 12/31/2022]
Abstract
With 700 members, G protein-coupled receptors (GPCRs) of the rhodopsin family (class A) form the largest membrane receptor family in humans and are the target of about 30% of presently available pharmaceutical drugs. The recent boom in GPCR structures led to the structural resolution of 57 unique receptors in different states (39 receptors in the inactive state only, 2 receptors in the active state only, and 16 receptors in different activation states). In spite of these tremendous advances, most computational studies on GPCRs, including molecular dynamics simulations, virtual screening, and drug design, rely on GPCR models obtained by homology modeling. In this protocol, we detail the different steps of homology modeling with the MODELLER software, from template selection to model evaluation. The present structure boom provides closely related templates for most receptors. If some loops are not resolved in these templates, the numerous available structures in most cases make it possible to find loop templates of similar length for equivalent loops. At the same time, however, the large number of putative templates leads to model ambiguities whose resolution may require additional information based on multiple sequence alignments or molecular dynamics simulations. Using the modeling of the human bradykinin receptor B1 as a case study, we show how several templates are managed by MODELLER, and how the choice of template(s) and of template fragments can improve the quality of the models. We also give examples of how additional information and tools help the user to resolve ambiguities in GPCR modeling.
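Template selection, the first step mentioned in this protocol, is commonly guided by sequence identity between the target and each candidate template. The sketch below ranks candidates by percent identity over pairwise-aligned sequences. It does not use MODELLER and is not the authors' procedure. The aligned strings and the names `template_1` and `template_2` are invented, and identity is computed only over columns where neither sequence has a gap, which is one of several conventions.

```python
def percent_identity(aligned_target, aligned_template):
    """Percent identity over columns where neither sequence has a gap ('-')."""
    if len(aligned_target) != len(aligned_template):
        raise ValueError("aligned sequences must have equal length")
    columns = [
        (a, b)
        for a, b in zip(aligned_target, aligned_template)
        if a != "-" and b != "-"
    ]
    if not columns:
        return 0.0
    matches = sum(1 for a, b in columns if a == b)
    return 100.0 * matches / len(columns)

def rank_templates(alignments):
    """Sort {name: (aligned_target, aligned_template)} by identity, best first."""
    scored = [
        (name, percent_identity(target, template))
        for name, (target, template) in alignments.items()
    ]
    return sorted(scored, key=lambda item: item[1], reverse=True)

# Hypothetical pairwise alignments of one target against two templates.
alignments = {
    "template_1": ("MKT-AYIAK", "MKTLAYLAK"),
    "template_2": ("MKT-AYIAK", "MRT-GYIGK"),
}
ranked = rank_templates(alignments)
print(ranked)
```

Global identity is only a first filter. As the abstract notes, loop coverage and activation state can also influence which template, or combination of template fragments, gives the best model.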
Affiliation(s)
- Asma Tiss
- UMR CNRS 6015 - INSERM 1083, Laboratoire MITOVASC, Université d'Angers, Angers, France; Laboratoire de Génétique, Immunologie et Pathologies Humaines, Département de Biologie, Faculté des Sciences de Tunis, Université de Tunis El Manar, Tunis, Tunisie
| | - Rym Ben Boubaker
- UMR CNRS 6015 - INSERM 1083, Laboratoire MITOVASC, Université d'Angers, Angers, France
| | - Daniel Henrion
- UMR CNRS 6015 - INSERM 1083, Laboratoire MITOVASC, Université d'Angers, Angers, France
| | - Hajer Guissouma
- Laboratoire de Génétique, Immunologie et Pathologies Humaines, Département de Biologie, Faculté des Sciences de Tunis, Université de Tunis El Manar, Tunis, Tunisie
| | - Marie Chabbert
- UMR CNRS 6015 - INSERM 1083, Laboratoire MITOVASC, Université d'Angers, Angers, France
| |
|
50
|
Paki R, Nourani E, Farajzadeh D. Classification of G protein-coupled receptors using attention mechanism. GENE REPORTS 2020. [DOI: 10.1016/j.genrep.2020.100882] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/23/2022]
|