1
Sirugue L, Langenfeld F, Lagarde N, Montes M. PLO3S: Protein LOcal Surficial Similarity Screening. Comput Struct Biotechnol J 2024; 26:1-10. [PMID: 38189058 PMCID: PMC10770625 DOI: 10.1016/j.csbj.2023.12.002] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/14/2022] [Revised: 12/01/2023] [Accepted: 12/03/2023] [Indexed: 01/09/2024] Open
Abstract
Studying protein molecular surfaces helps to better understand and predict protein interactions. Several surface-comparison methods developed in computer vision can be applied to protein molecular surfaces. The present work proposes a method based on the Wave Kernel Signature: Protein LOcal Surficial Similarity Screening (PLO3S). The PLO3S descriptor, called Surface Wave Interpolated Maps (SWIM), is a local surface shape descriptor projected onto a unit sphere and then mapped onto a 2D plane. PLO3S rapidly compares protein surface shapes through local comparisons, so that large protein surface datasets can be filtered in virtual screening protocols for protein structures.
Affiliation(s)
- Léa Sirugue
- Laboratoire GBCM, EA7528, Conservatoire National des Arts et Métiers, Hesam Université, 2, rue Conté, Paris, 75003, France
- Florent Langenfeld
- Laboratoire GBCM, EA7528, Conservatoire National des Arts et Métiers, Hesam Université, 2, rue Conté, Paris, 75003, France
- Nathalie Lagarde
- Laboratoire GBCM, EA7528, Conservatoire National des Arts et Métiers, Hesam Université, 2, rue Conté, Paris, 75003, France
- Matthieu Montes
- Laboratoire GBCM, EA7528, Conservatoire National des Arts et Métiers, Hesam Université, 2, rue Conté, Paris, 75003, France
2
Bai Q, Xu T, Huang J, Pérez-Sánchez H. Geometric deep learning methods and applications in 3D structure-based drug design. Drug Discov Today 2024; 29:104024. [PMID: 38759948 DOI: 10.1016/j.drudis.2024.104024] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/23/2024] [Revised: 05/02/2024] [Accepted: 05/10/2024] [Indexed: 05/19/2024]
Abstract
3D structure-based drug design (SBDD) is considered a challenging and rational way for innovative drug discovery. Geometric deep learning is a promising approach that solves the accurate model training of 3D SBDD through building neural network models to learn non-Euclidean data, such as 3D molecular graphs and manifold data. Here, we summarize geometric deep learning methods and applications that contain 3D molecular representations, equivariant graph neural networks (EGNNs), and six generative model methods [diffusion model, flow-based model, generative adversarial networks (GANs), variational autoencoder (VAE), autoregressive models, and energy-based models]. Our review provides insights into geometric deep learning methods and advanced applications of 3D SBDD that will be of relevance for the drug discovery community.
Affiliation(s)
- Qifeng Bai
- School of Basic Medical Sciences, Lanzhou University, Lanzhou 730000, Gansu, PR China.
- Junzhou Huang
- Department of Computer Science and Engineering, the University of Texas at Arlington, Arlington, TX 76019, USA
- Horacio Pérez-Sánchez
- Structural Bioinformatics and High Performance Computing Research Group (BIO-HPC), Computer Engineering Department, UCAM Universidad Católica de Murcia, Murcia 30107, Spain.
3
Li C, Yao J, Wei W, Niu Z, Zeng X, Li J, Wang J. Geometry-Based Molecular Generation With Deep Constrained Variational Autoencoder. IEEE Transactions on Neural Networks and Learning Systems 2024; 35:4852-4861. [PMID: 35171779 DOI: 10.1109/tnnls.2022.3147790] [Citation(s) in RCA: 4] [Impact Index Per Article: 4.0] [Indexed: 06/14/2023]
Abstract
Finding target molecules with specific chemical properties plays a decisive role in drug development. We proposed GEOM-CVAE, a constrained variational autoencoder based on geometric representation for molecular generation with specific properties, which is protein-context-dependent. In terms of machine learning, it includes continuous feature embedding encoder and molecular generation decoder. Our key contribution is to propose an efficient geometric embedding method, including the spatial structure representations of drug molecule (converting the 3-D coordinates into image) and the geometric graph representations of protein target (modeling the protein surface as a mesh). The 3-D geometric information is vital to successful molecular generation, which is different from previous molecular generative methods based on 1-D or 2-D. Our model framework generates specific molecules in two phases, by first generating special image with molecular 3-D information to learn latent representations and generating molecules with constrained condition based on geometric graph convolution for specific protein and then inputting the generated structural molecules into a parser network for obtaining Simplified Molecular Input Line Entry System (SMILES) strings. Our model achieves competitive performance that implies its potential effectiveness to enable the exploration of the vast chemical space for drug discovery.
4
Wang K, Yin Z, Sang C, Xia W, Wang Y, Sun T, Xu X. Geometric deep learning for the prediction of magnesium-binding sites in RNA structures. Int J Biol Macromol 2024; 262:130150. [PMID: 38365157 DOI: 10.1016/j.ijbiomac.2024.130150] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/30/2023] [Revised: 01/24/2024] [Accepted: 02/11/2024] [Indexed: 02/18/2024]
Abstract
Magnesium ions (Mg²⁺) are essential for the folding, functional expression, and structural stability of RNA molecules. However, predicting Mg²⁺-binding sites in RNA molecules based solely on RNA structures is still challenging. The molecular surface, characterized by a continuous shape with geometric and chemical properties, is important for RNA modelling and carries essential information for understanding the interactions between RNAs and Mg²⁺ ions. Here, we propose an approach named RNA-magnesium ion surface interaction fingerprinting (RMSIF), a geometric deep learning-based conceptual framework to predict magnesium ion binding sites in RNA structures. To evaluate the performance of RMSIF, we systematically enumerated decoy Mg²⁺ ions across a full-space grid within the range of 2 to 10 Å from the RNA molecule and made predictions accordingly. Visualization techniques were used to validate the prediction results and calculate success rates. Comparative assessments against state-of-the-art methods like MetalionRNA, MgNet, and Metal3DRNA revealed that RMSIF achieved superior success rates and accuracy in predicting Mg²⁺-binding sites. Additionally, in terms of the spatial distribution of Mg²⁺ ions within the RNA structures, a majority were situated in the deep grooves, while a minority occupied the shallow grooves. Collectively, the conceptual framework developed in this study holds promise for advancing insights into drug design, RNA co-transcriptional folding, and structure prediction.
Affiliation(s)
- Kang Wang
- Department of Physics, Zhejiang University of Science and Technology, Hangzhou 310008, China
- Zuode Yin
- Institute of Bioinformatics and Medical Engineering, Jiangsu University of Technology, Changzhou 213001, China
- Chunjiang Sang
- Department of Physics, Zhejiang University of Science and Technology, Hangzhou 310008, China
- Wentao Xia
- Department of Physics, Zhejiang University of Science and Technology, Hangzhou 310008, China
- Yan Wang
- Department of Physics, Zhejiang University of Science and Technology, Hangzhou 310008, China
- Tingting Sun
- Department of Physics, Zhejiang University of Science and Technology, Hangzhou 310008, China.
- Xiaojun Xu
- Institute of Bioinformatics and Medical Engineering, Jiangsu University of Technology, Changzhou 213001, China.
5
Goulard Coderc de Lacam E, Roux B, Chipot C. Classifying Protein-Protein Binding Affinity with Free-Energy Calculations and Machine Learning Approaches. J Chem Inf Model 2024; 64:1081-1091. [PMID: 38272021 DOI: 10.1021/acs.jcim.3c01586] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 01/27/2024]
Abstract
Understanding the intricate phenomenon of neuronal wiring in the brain is of great interest in neuroscience. In the fruit fly, Drosophila melanogaster, the Dpr-DIP interactome has been identified to play an important role in this process. However, experimental data suggest that a merely limited subset of complexes, essentially 57 out of a total of 231, exhibit strong binding affinity. In this work, we sought to identify the residue-level molecular basis underlying the difference in binding affinity using a state-of-the-art methodology consisting of standard binding free-energy calculations with a geometrical route and machine learning (ML) techniques. We determined the binding affinity for two complexes using statistical mechanics simulations, achieving an excellent reproduction of the experimental data. Moreover, we predicted the binding free energy for two additional low-affinity complexes, devoid of experimental estimation, while simultaneously identifying key residues for the binding. Furthermore, through the use of ML algorithms, linear discriminant analysis, and random forest, we achieved remarkable accuracy, as high as 0.99, in discerning between strong (cognate) and weak (noncognate) binders. The presented ML approach encompasses easily transferable input features, enabling its broad application to any interactome while facilitating the identification of pivotal residues critical for binding interactions. The predictive power of the generated model was probed on similar protein families from 13 diverse species. Our ML model exhibited commendable performance on these additional data sets, showcasing its reliability and robustness across the species barrier.
Affiliation(s)
- Emma Goulard Coderc de Lacam
- Laboratoire International Associé Centre National de la Recherche Scientifique et University of Illinois at Urbana-Champaign, Unité Mixte de Recherche no. 7019, Université de Lorraine, B.P. 70239, 54506 Vandœuvre-lès-Nancy Cedex, France
- Benoît Roux
- Department of Biochemistry and Molecular Biology, The University of Chicago, 929 E. 57th Street W225, Chicago, Illinois 60637, United States
- Department of Chemistry, The University of Chicago, 5735 S Ellis Avenue, Chicago, Illinois 60637, United States
- Christophe Chipot
- Laboratoire International Associé Centre National de la Recherche Scientifique et University of Illinois at Urbana-Champaign, Unité Mixte de Recherche no. 7019, Université de Lorraine, B.P. 70239, 54506 Vandœuvre-lès-Nancy Cedex, France
- Department of Biochemistry and Molecular Biology, The University of Chicago, 929 E. 57th Street W225, Chicago, Illinois 60637, United States
- Theoretical and Computational Biophysics Group, Beckman Institute, and Department of Physics, University of Illinois at Urbana-Champaign, Urbana, Illinois 61820, United States
- Department of Chemistry, The University of Hawai'i at Mānoa, 2545 McCarthy Mall, Honolulu, Hawaii 96822, United States
6
Yuan M, Shen A, Fu K, Guan J, Ma Y, Qiao Q, Wang M. ProteinMAE: masked autoencoder for protein surface self-supervised learning. Bioinformatics 2023; 39:btad724. [PMID: 38019955 PMCID: PMC10713117 DOI: 10.1093/bioinformatics/btad724] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/30/2023] [Revised: 10/27/2023] [Accepted: 11/28/2023] [Indexed: 12/01/2023] Open
Abstract
SUMMARY: The biological functions of proteins are determined by the chemical and geometric properties of their surfaces. Recently, with the booming progress of deep learning, a series of learning-based surface descriptors have been proposed and achieved inspirational performance in many tasks such as protein design, protein-protein interaction prediction, etc. However, they are still limited by the problem of label scarcity, since the labels are typically obtained through wet experiments. Inspired by the great success of self-supervised learning in natural language processing and computer vision, we introduce ProteinMAE, a self-supervised framework specifically designed for protein surface representation to mitigate label scarcity. Specifically, we propose an efficient network and utilize a large number of accessible unlabeled protein data to pretrain it by self-supervised learning. Then we use the pretrained weights as initialization and fine-tune the network on downstream tasks. To demonstrate the effectiveness of our method, we conduct experiments on three different downstream tasks including binding site identification in protein surface, ligand-binding protein pocket classification, and protein-protein interaction prediction. The extensive experiments show that our method not only successfully improves the network's performance on all downstream tasks, but also achieves competitive performance with state-of-the-art methods. Moreover, our proposed network also exhibits significant advantages in terms of computational cost, which only requires less than a tenth of memory cost of previous methods.
AVAILABILITY AND IMPLEMENTATION: https://github.com/phdymz/ProteinMAE.
Affiliation(s)
- Mingzhi Yuan
- Digital Medical Research Center, School of Basic Medical Sciences, Fudan University, Shanghai 200032, China
- Shanghai Key Laboratory of Medical Image Computing and Computer Assisted Intervention, Fudan University, Shanghai 200032, China
- Ao Shen
- Digital Medical Research Center, School of Basic Medical Sciences, Fudan University, Shanghai 200032, China
- Shanghai Key Laboratory of Medical Image Computing and Computer Assisted Intervention, Fudan University, Shanghai 200032, China
- Kexue Fu
- Digital Medical Research Center, School of Basic Medical Sciences, Fudan University, Shanghai 200032, China
- Shanghai Key Laboratory of Medical Image Computing and Computer Assisted Intervention, Fudan University, Shanghai 200032, China
- Jiaming Guan
- Digital Medical Research Center, School of Basic Medical Sciences, Fudan University, Shanghai 200032, China
- Shanghai Key Laboratory of Medical Image Computing and Computer Assisted Intervention, Fudan University, Shanghai 200032, China
- Yingfan Ma
- Digital Medical Research Center, School of Basic Medical Sciences, Fudan University, Shanghai 200032, China
- Shanghai Key Laboratory of Medical Image Computing and Computer Assisted Intervention, Fudan University, Shanghai 200032, China
- Qin Qiao
- Digital Medical Research Center, School of Basic Medical Sciences, Fudan University, Shanghai 200032, China
- Shanghai Key Laboratory of Medical Image Computing and Computer Assisted Intervention, Fudan University, Shanghai 200032, China
- Manning Wang
- Digital Medical Research Center, School of Basic Medical Sciences, Fudan University, Shanghai 200032, China
- Shanghai Key Laboratory of Medical Image Computing and Computer Assisted Intervention, Fudan University, Shanghai 200032, China
7
Li P, Liu ZP. GeoBind: segmentation of nucleic acid binding interface on protein surface with geometric deep learning. Nucleic Acids Res 2023; 51:e60. [PMID: 37070217 PMCID: PMC10250245 DOI: 10.1093/nar/gkad288] [Citation(s) in RCA: 5] [Impact Index Per Article: 5.0] [Received: 01/09/2023] [Revised: 03/21/2023] [Accepted: 04/06/2023] [Indexed: 04/19/2023] Open
Abstract
Unveiling the nucleic acid binding sites of a protein helps reveal its regulatory functions in vivo. Current methods encode protein sites from the handcrafted features of their local neighbors and recognize them via a classification, which are limited in expressive ability. Here, we present GeoBind, a geometric deep learning method for predicting nucleic binding sites on protein surface in a segmentation manner. GeoBind takes the whole point clouds of protein surface as input and learns the high-level representation based on the aggregation of their neighbors in local reference frames. Testing GeoBind on benchmark datasets, we demonstrate GeoBind is superior to state-of-the-art predictors. Specific case studies are performed to show the powerful ability of GeoBind to explore molecular surfaces when deciphering proteins with multimer formation. To show the versatility of GeoBind, we further extend GeoBind to five other types of ligand binding sites prediction tasks and achieve competitive performances.
Affiliation(s)
- Pengpai Li
- Department of Biomedical Engineering, School of Control Science and Engineering, Shandong University, Jinan, Shandong 250061, China
- Zhi-Ping Liu
- Department of Biomedical Engineering, School of Control Science and Engineering, Shandong University, Jinan, Shandong 250061, China
8
Gainza P, Wehrle S, Van Hall-Beauvais A, Marchand A, Scheck A, Harteveld Z, Buckley S, Ni D, Tan S, Sverrisson F, Goverde C, Turelli P, Raclot C, Teslenko A, Pacesa M, Rosset S, Georgeon S, Marsden J, Petruzzella A, Liu K, Xu Z, Chai Y, Han P, Gao GF, Oricchio E, Fierz B, Trono D, Stahlberg H, Bronstein M, Correia BE. De novo design of protein interactions with learned surface fingerprints. Nature 2023; 617:176-184. [PMID: 37100904 PMCID: PMC10131520 DOI: 10.1038/s41586-023-05993-x] [Citation(s) in RCA: 44] [Impact Index Per Article: 44.0] [Received: 06/16/2022] [Accepted: 03/21/2023] [Indexed: 04/28/2023]
Abstract
Physical interactions between proteins are essential for most biological processes governing life [1]. However, the molecular determinants of such interactions have been challenging to understand, even as genomic, proteomic and structural data increase. This knowledge gap has been a major obstacle for the comprehensive understanding of cellular protein-protein interaction networks and for the de novo design of protein binders that are crucial for synthetic biology and translational applications [2-9]. Here we use a geometric deep-learning framework operating on protein surfaces that generates fingerprints to describe geometric and chemical features that are critical to drive protein-protein interactions [10]. We hypothesized that these fingerprints capture the key aspects of molecular recognition that represent a new paradigm in the computational design of novel protein interactions. As a proof of principle, we computationally designed several de novo protein binders to engage four protein targets: SARS-CoV-2 spike, PD-1, PD-L1 and CTLA-4. Several designs were experimentally optimized, whereas others were generated purely in silico, reaching nanomolar affinity with structural and mutational characterization showing highly accurate predictions. Overall, our surface-centric approach captures the physical and chemical determinants of molecular recognition, enabling an approach for the de novo design of protein interactions and, more broadly, of artificial proteins with function.
Affiliation(s)
- Pablo Gainza
- Laboratory of Protein Design and Immunoengineering, Institute of Bioengineering, École Polytechnique Fédérale de Lausanne, Lausanne, Switzerland
- Swiss Institute of Bioinformatics, Lausanne, Switzerland
- Monte Rosa Therapeutics, Basel, Switzerland
- Sarah Wehrle
- Laboratory of Protein Design and Immunoengineering, Institute of Bioengineering, École Polytechnique Fédérale de Lausanne, Lausanne, Switzerland
- Swiss Institute of Bioinformatics, Lausanne, Switzerland
- Alexandra Van Hall-Beauvais
- Laboratory of Protein Design and Immunoengineering, Institute of Bioengineering, École Polytechnique Fédérale de Lausanne, Lausanne, Switzerland
- Swiss Institute of Bioinformatics, Lausanne, Switzerland
- Anthony Marchand
- Laboratory of Protein Design and Immunoengineering, Institute of Bioengineering, École Polytechnique Fédérale de Lausanne, Lausanne, Switzerland
- Swiss Institute of Bioinformatics, Lausanne, Switzerland
- Andreas Scheck
- Laboratory of Protein Design and Immunoengineering, Institute of Bioengineering, École Polytechnique Fédérale de Lausanne, Lausanne, Switzerland
- Swiss Institute of Bioinformatics, Lausanne, Switzerland
- Zander Harteveld
- Laboratory of Protein Design and Immunoengineering, Institute of Bioengineering, École Polytechnique Fédérale de Lausanne, Lausanne, Switzerland
- Swiss Institute of Bioinformatics, Lausanne, Switzerland
- Stephen Buckley
- Laboratory of Protein Design and Immunoengineering, Institute of Bioengineering, École Polytechnique Fédérale de Lausanne, Lausanne, Switzerland
- Swiss Institute of Bioinformatics, Lausanne, Switzerland
- Dongchun Ni
- Laboratory of Biological Electron Microscopy, Institute of Physics, School of Basic Science, École Polytechnique Fédérale de Lausanne, Lausanne, Switzerland
- Department of Fundamental Microbiology, Faculty of Biology and Medicine, University of Lausanne, Lausanne, Switzerland
- Shuguang Tan
- CAS Key Laboratory of Pathogen Microbiology and Immunology, Institute of Microbiology, Chinese Academy of Sciences, Beijing, China
- Freyr Sverrisson
- Laboratory of Protein Design and Immunoengineering, Institute of Bioengineering, École Polytechnique Fédérale de Lausanne, Lausanne, Switzerland
- Swiss Institute of Bioinformatics, Lausanne, Switzerland
- Casper Goverde
- Laboratory of Protein Design and Immunoengineering, Institute of Bioengineering, École Polytechnique Fédérale de Lausanne, Lausanne, Switzerland
- Swiss Institute of Bioinformatics, Lausanne, Switzerland
- Priscilla Turelli
- Laboratory of Virology and Genetics, School of Life Sciences, École Polytechnique Fédérale de Lausanne, Lausanne, Switzerland
- Charlène Raclot
- Laboratory of Virology and Genetics, School of Life Sciences, École Polytechnique Fédérale de Lausanne, Lausanne, Switzerland
- Alexandra Teslenko
- Laboratory of Biophysical Chemistry of Macromolecules, School of Basic Sciences, Institute of Chemical Sciences and Engineering (ISIC), École Polytechnique Fédérale de Lausanne, Lausanne, Switzerland
- Martin Pacesa
- Laboratory of Protein Design and Immunoengineering, Institute of Bioengineering, École Polytechnique Fédérale de Lausanne, Lausanne, Switzerland
- Swiss Institute of Bioinformatics, Lausanne, Switzerland
- Stéphane Rosset
- Laboratory of Protein Design and Immunoengineering, Institute of Bioengineering, École Polytechnique Fédérale de Lausanne, Lausanne, Switzerland
- Swiss Institute of Bioinformatics, Lausanne, Switzerland
- Sandrine Georgeon
- Laboratory of Protein Design and Immunoengineering, Institute of Bioengineering, École Polytechnique Fédérale de Lausanne, Lausanne, Switzerland
- Swiss Institute of Bioinformatics, Lausanne, Switzerland
- Jane Marsden
- Laboratory of Protein Design and Immunoengineering, Institute of Bioengineering, École Polytechnique Fédérale de Lausanne, Lausanne, Switzerland
- Swiss Institute of Bioinformatics, Lausanne, Switzerland
- Aaron Petruzzella
- Swiss Institute for Experimental Cancer Research, School of Life Sciences, École Polytechnique Fédérale de Lausanne, Lausanne, Switzerland
- Kefang Liu
- CAS Key Laboratory of Pathogen Microbiology and Immunology, Institute of Microbiology, Chinese Academy of Sciences, Beijing, China
- Zepeng Xu
- CAS Key Laboratory of Pathogen Microbiology and Immunology, Institute of Microbiology, Chinese Academy of Sciences, Beijing, China
- Yan Chai
- CAS Key Laboratory of Pathogen Microbiology and Immunology, Institute of Microbiology, Chinese Academy of Sciences, Beijing, China
- Pu Han
- CAS Key Laboratory of Pathogen Microbiology and Immunology, Institute of Microbiology, Chinese Academy of Sciences, Beijing, China
- George F Gao
- CAS Key Laboratory of Pathogen Microbiology and Immunology, Institute of Microbiology, Chinese Academy of Sciences, Beijing, China
- Elisa Oricchio
- Swiss Institute for Experimental Cancer Research, School of Life Sciences, École Polytechnique Fédérale de Lausanne, Lausanne, Switzerland
- Beat Fierz
- Laboratory of Biophysical Chemistry of Macromolecules, School of Basic Sciences, Institute of Chemical Sciences and Engineering (ISIC), École Polytechnique Fédérale de Lausanne, Lausanne, Switzerland
- Didier Trono
- Laboratory of Virology and Genetics, School of Life Sciences, École Polytechnique Fédérale de Lausanne, Lausanne, Switzerland
- Henning Stahlberg
- Laboratory of Biological Electron Microscopy, Institute of Physics, School of Basic Science, École Polytechnique Fédérale de Lausanne, Lausanne, Switzerland
- Department of Fundamental Microbiology, Faculty of Biology and Medicine, University of Lausanne, Lausanne, Switzerland
- Bruno E Correia
- Laboratory of Protein Design and Immunoengineering, Institute of Bioengineering, École Polytechnique Fédérale de Lausanne, Lausanne, Switzerland.
- Swiss Institute of Bioinformatics, Lausanne, Switzerland.
9
Chen W, Liu X, Zhang S, Chen S. Artificial intelligence for drug discovery: Resources, methods, and applications. Molecular Therapy. Nucleic Acids 2023; 31:691-702. [PMID: 36923950 PMCID: PMC10009646 DOI: 10.1016/j.omtn.2023.02.019] [Citation(s) in RCA: 30] [Impact Index Per Article: 30.0] [Indexed: 02/19/2023]
Abstract
Conventional wet laboratory testing, validations, and synthetic procedures are costly and time-consuming for drug discovery. Advancements in artificial intelligence (AI) techniques have revolutionized their applications to drug discovery. Combined with accessible data resources, AI techniques are changing the landscape of drug discovery. In the past decades, a series of AI-based models have been developed for various steps of drug discovery. These models have been used as complements of conventional experiments and have accelerated the drug discovery process. In this review, we first introduced the widely used data resources in drug discovery, such as ChEMBL and DrugBank, followed by the molecular representation schemes that convert data into computer-readable formats. Meanwhile, we summarized the algorithms used to develop AI-based models for drug discovery. Subsequently, we discussed the applications of AI techniques in pharmaceutical analysis including predicting drug toxicity, drug bioactivity, and drug physicochemical property. Furthermore, we introduced the AI-based models for de novo drug design, drug-target structure prediction, drug-target interaction, and binding affinity prediction. Moreover, we also highlighted the advanced applications of AI in drug synergism/antagonism prediction and nanomedicine design. Finally, we discussed the challenges and future perspectives on the applications of AI to drug discovery.
Affiliation(s)
- Wei Chen
- State Key Laboratory of Southwestern Chinese Medicine Resources, Innovative Institute of Chinese Medicine and Pharmacy, Chengdu University of Traditional Chinese Medicine, Chengdu 611137, China; Institute of Herbgenomics, Chengdu University of Traditional Chinese Medicine, Chengdu 611137, China
- Xuesong Liu
- College of Pharmaceutical Sciences, Zhejiang University, Hangzhou 310058, China
- Sanyin Zhang
- State Key Laboratory of Southwestern Chinese Medicine Resources, Innovative Institute of Chinese Medicine and Pharmacy, Chengdu University of Traditional Chinese Medicine, Chengdu 611137, China; Institute of Herbgenomics, Chengdu University of Traditional Chinese Medicine, Chengdu 611137, China
- Shilin Chen
- State Key Laboratory of Southwestern Chinese Medicine Resources, Innovative Institute of Chinese Medicine and Pharmacy, Chengdu University of Traditional Chinese Medicine, Chengdu 611137, China; Institute of Herbgenomics, Chengdu University of Traditional Chinese Medicine, Chengdu 611137, China
10
Durairaj J, de Ridder D, van Dijk AD. Beyond sequence: Structure-based machine learning. Comput Struct Biotechnol J 2022; 21:630-643. [PMID: 36659927 PMCID: PMC9826903 DOI: 10.1016/j.csbj.2022.12.039] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/26/2022] [Revised: 12/21/2022] [Accepted: 12/21/2022] [Indexed: 12/31/2022] Open
Abstract
Recent breakthroughs in protein structure prediction demarcate the start of a new era in structural bioinformatics. Combined with various advances in experimental structure determination and the uninterrupted pace at which new structures are published, this promises an age in which protein structure information is as prevalent and ubiquitous as sequence. Machine learning in protein bioinformatics has been dominated by sequence-based methods, but this is now changing to make use of the deluge of rich structural information as input. Machine learning methods making use of structures are scattered across literature and cover a number of different applications and scopes; while some try to address questions and tasks within a single protein family, others aim to capture characteristics across all available proteins. In this review, we look at the variety of structure-based machine learning approaches, how structures can be used as input, and typical applications of these approaches in protein biology. We also discuss current challenges and opportunities in this all-important and increasingly popular field.
Affiliation(s)
- Janani Durairaj
- Biozentrum, University of Basel, Basel, Switzerland
- Bioinformatics Group, Department of Plant Sciences, Wageningen University and Research, Wageningen, the Netherlands
- Dick de Ridder
- Bioinformatics Group, Department of Plant Sciences, Wageningen University and Research, Wageningen, the Netherlands
- Aalt D.J. van Dijk
- Bioinformatics Group, Department of Plant Sciences, Wageningen University and Research, Wageningen, the Netherlands
11
Chirasani VR, Wang J, Sha C, Raup-Konsavage W, Vrana K, Dokholyan NV. Whole proteome mapping of compound-protein interactions. Current Research in Chemical Biology 2022; 2:100035. [PMID: 38125869 PMCID: PMC10732549 DOI: 10.1016/j.crchbi.2022.100035] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Indexed: 12/23/2023]
Abstract
Off-target binding is one of the primary causes of toxic side effects of drugs in clinical development, resulting in failures of clinical trials. While off-target drug binding is a known phenomenon, experimental identification of the undesired protein binders can be prohibitively expensive due to the large pool of possible biological targets. Here, we propose a new strategy combining chemical similarity principle and deep learning to enable proteome-wide mapping of compound-protein interactions. We have developed a pipeline to identify the targets of bioactive molecules by matching them with chemically similar annotated "bait" compounds and ranking them with deep learning. We have constructed a user-friendly web server for drug-target identification based on chemical similarity (DRIFT) to perform searches across annotated bioactive compound datasets, thus enabling high-throughput, multi-ligand target identification, as well as chemical fragmentation of target-binding moieties.
Affiliation(s)
- Venkat R. Chirasani
- Department of Pharmacology, Penn State College of Medicine, Hershey, PA, 17033, USA
- Department of Biochemistry and Biophysics, University of North Carolina at Chapel Hill, Chapel Hill, NC, 27599, USA
| | - Jian Wang
- Department of Pharmacology, Penn State College of Medicine, Hershey, PA, 17033, USA
| | - Congzhou Sha
- Department of Pharmacology, Penn State College of Medicine, Hershey, PA, 17033, USA
| | | | - Kent Vrana
- Department of Pharmacology, Penn State College of Medicine, Hershey, PA, 17033, USA
| | - Nikolay V. Dokholyan
- Department of Pharmacology, Penn State College of Medicine, Hershey, PA, 17033, USA
- Department of Biochemistry & Molecular Biology, Penn State College of Medicine, Hershey, PA, 17033, USA
- Department of Chemistry, Pennsylvania State University, University Park, PA, 16802, USA
- Department of Biomedical Engineering, Pennsylvania State University, University Park, PA, 16802, USA
| |
|
12
|
Gupta A, Mukherjee A. Capturing surface complementarity in proteins using unsupervised learning and robust curvature measure. Proteins 2022; 90:1669-1683. [DOI: 10.1002/prot.26345] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/02/2021] [Revised: 03/06/2022] [Accepted: 04/01/2022] [Indexed: 11/07/2022]
Affiliation(s)
- Abhijit Gupta
- Department of Chemistry Indian Institute of Science Education and Research Pune Maharashtra India
| | - Arnab Mukherjee
- Department of Chemistry Indian Institute of Science Education and Research Pune Maharashtra India
| |
|
13
|
Protein–Protein Interaction Prediction for Targeted Protein Degradation. Int J Mol Sci 2022; 23:ijms23137033. [PMID: 35806036 PMCID: PMC9266413 DOI: 10.3390/ijms23137033] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 05/10/2022] [Revised: 06/17/2022] [Accepted: 06/18/2022] [Indexed: 02/04/2023] Open
Abstract
Protein–protein interactions (PPIs) play a fundamental role in various biological functions; thus, detecting PPI sites is essential for understanding diseases and developing new drugs. PPI prediction is of particular relevance for the development of drugs employing targeted protein degradation, as their efficacy relies on the formation of a stable ternary complex involving two proteins. However, experimental methods to detect PPI sites are both costly and time-intensive. In recent years, machine learning-based methods have been developed as screening tools. While they are computationally more efficient than traditional docking methods and thus allow rapid execution, these tools have so far primarily been based on sequence information, and they are therefore limited in their ability to address spatial requirements. In addition, they have to date not been applied to targeted protein degradation. Here, we present a new deep learning architecture based on the concept of graph representation learning that can predict interaction sites and interactions of proteins based on their surface representations. We demonstrate that our model reaches state-of-the-art performance using AUROC scores on the established MaSIF dataset. We furthermore introduce a new dataset with more diverse protein interactions and show that our model generalizes well to this new data. These generalization capabilities allow our model to predict the PPIs relevant for targeted protein degradation, which we show by demonstrating the high accuracy of our model for PPI prediction on the available ternary complex data. Our results suggest that PPI prediction models can be a valuable tool for screening protein pairs while developing new drugs for targeted protein degradation.
|
14
|
Casadio R, Martelli PL, Savojardo C. Machine learning solutions for predicting protein–protein interactions. WIREs Computational Molecular Science 2022. [DOI: 10.1002/wcms.1618] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/18/2022]
Affiliation(s)
- Rita Casadio
- Biocomputing Group University of Bologna Bologna Italy
| | | | | |
|
15
|
Scheck A, Rosset S, Defferrard M, Loukas A, Bonet J, Vandergheynst P, Correia BE. RosettaSurf-A surface-centric computational design approach. PLoS Comput Biol 2022; 18:e1009178. [PMID: 35294435 PMCID: PMC9015148 DOI: 10.1371/journal.pcbi.1009178] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 06/14/2021] [Revised: 04/18/2022] [Accepted: 02/21/2022] [Indexed: 11/19/2022] Open
Abstract
Proteins are typically represented by discrete atomic coordinates providing an accessible framework to describe different conformations. However, in some fields proteins are more accurately represented as near-continuous surfaces, as these are imprinted with geometric (shape) and chemical (electrostatics) features of the underlying protein structure. Protein surfaces are dependent on their chemical composition and ultimately determine protein function, acting as the interface that engages in interactions with other molecules. In the past, such representations were utilized to compare protein structures on global and local scales and have shed light on functional properties of proteins. Here we describe RosettaSurf, a surface-centric computational design protocol that focuses on the molecular surface shape and electrostatic properties as a means for protein engineering, offering a unique approach for the design of proteins and their functions. The RosettaSurf protocol combines the explicit optimization of molecular surface features with a global scoring function during the sequence design process, diverging from the typical design approaches that rely solely on an energy scoring function. With this computational approach, we attempt to address a fundamental problem in protein design related to the design of functional sites in proteins, even when structurally similar templates are absent in the characterized structural repertoire. Surface-centric design exploits the premise that molecular surfaces are, to a certain extent, independent of the underlying sequence and backbone configuration, meaning that different sequences in different proteins may present similar surfaces. We benchmarked RosettaSurf on various sequence recovery datasets and showcased its design capabilities by generating epitope mimics that were biochemically validated. Overall, our results indicate that the explicit optimization of surface features may lead to new routes for the design of functional proteins.
Affiliation(s)
- Andreas Scheck
- Institute of Bioengineering, École Polytechnique Fédérale de Lausanne, Lausanne, Switzerland
- Swiss Institute of Bioinformatics (SIB), Lausanne, Switzerland
| | - Stéphane Rosset
- Institute of Bioengineering, École Polytechnique Fédérale de Lausanne, Lausanne, Switzerland
- Swiss Institute of Bioinformatics (SIB), Lausanne, Switzerland
| | - Michaël Defferrard
- Signal Processing Laboratory (LTS2), École Polytechnique Fédérale de Lausanne, Lausanne, Switzerland
| | - Andreas Loukas
- Signal Processing Laboratory (LTS2), École Polytechnique Fédérale de Lausanne, Lausanne, Switzerland
| | - Jaume Bonet
- Institute of Bioengineering, École Polytechnique Fédérale de Lausanne, Lausanne, Switzerland
- Swiss Institute of Bioinformatics (SIB), Lausanne, Switzerland
| | - Pierre Vandergheynst
- Signal Processing Laboratory (LTS2), École Polytechnique Fédérale de Lausanne, Lausanne, Switzerland
| | - Bruno E. Correia
- Institute of Bioengineering, École Polytechnique Fédérale de Lausanne, Lausanne, Switzerland
- Swiss Institute of Bioinformatics (SIB), Lausanne, Switzerland
| |
|
16
|
Li S, Cai C, Gong J, Liu X, Li H. A fast protein binding site comparison algorithm for proteome-wide protein function prediction and drug repurposing. Proteins 2021; 89:1541-1556. [PMID: 34245187 DOI: 10.1002/prot.26176] [Citation(s) in RCA: 7] [Impact Index Per Article: 2.3] [Received: 02/09/2021] [Revised: 06/26/2021] [Accepted: 06/30/2021] [Indexed: 01/18/2023]
Abstract
The expansion of three-dimensional protein structures and enhanced computing power have significantly facilitated our understanding of protein sequence/structure/function relationships. A challenge in structural genomics is to predict the function of uncharacterized proteins. Protein function deconvolution based on global sequence or structural homology is impracticable when a protein relates to no other proteins with known function, and in such cases, functional relationships can be established by detecting their local ligand binding site similarity. Here, we introduce a sequence order-independent comparison algorithm, PocketShape, for structural proteome-wide exploration of protein functional sites by fully considering the geometry of the backbones, orientation of the sidechains, and physicochemical properties of the pocket-lining residues. PocketShape is efficient in distinguishing similar from dissimilar ligand binding site pairs, retrieving 99.3% of the similar pairs while rejecting 100% of the dissimilar pairs on a dataset containing 1538 binding site pairs. This method successfully classifies 83 enzyme structures with diverse functions into 12 clusters, in close accordance with the actual Structural Classification of Proteins (SCOP) classification. PocketShape also outperforms other methods in protein profiling based on experimental data. Potential new applications for representative SARS-CoV-2 drugs Remdesivir and 11a are predicted. The high accuracy and time-efficient characteristics of PocketShape will undoubtedly make it a promising complementary tool for proteome-wide protein function inference and drug repurposing study.
Affiliation(s)
- Shiliang Li
- State Key Laboratory of Bioreactor Engineering, Shanghai Key Laboratory of New Drug Design, School of Pharmacy, East China University of Science and Technology, Shanghai, China
| | - Chaoqian Cai
- State Key Laboratory of Bioreactor Engineering, Shanghai Key Laboratory of New Drug Design, School of Pharmacy, East China University of Science and Technology, Shanghai, China.,School of Information Science and Engineering, East China University of Science and Technology, Shanghai, China
| | - Jiayu Gong
- State Key Laboratory of Bioreactor Engineering, Shanghai Key Laboratory of New Drug Design, School of Pharmacy, East China University of Science and Technology, Shanghai, China.,School of Information Science and Engineering, East China University of Science and Technology, Shanghai, China
| | - Xiaofeng Liu
- State Key Laboratory of Bioreactor Engineering, Shanghai Key Laboratory of New Drug Design, School of Pharmacy, East China University of Science and Technology, Shanghai, China
| | - Honglin Li
- State Key Laboratory of Bioreactor Engineering, Shanghai Key Laboratory of New Drug Design, School of Pharmacy, East China University of Science and Technology, Shanghai, China.,School of Information Science and Engineering, East China University of Science and Technology, Shanghai, China.,Research and Development Department, Jiangzhong Pharmaceutical Co., Ltd., Nanchang, China
| |
|
17
|
Liu Q, Wang PS, Zhu C, Gaines BB, Zhu T, Bi J, Song M. OctSurf: Efficient hierarchical voxel-based molecular surface representation for protein-ligand affinity prediction. J Mol Graph Model 2021; 105:107865. [PMID: 33640787 DOI: 10.1016/j.jmgm.2021.107865] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 11/17/2020] [Revised: 02/03/2021] [Accepted: 02/04/2021] [Indexed: 10/22/2022]
Abstract
Voxel-based 3D convolutional neural networks (CNNs) have been applied to predict protein-ligand binding affinity. However, the memory usage and computation cost of these voxel-based approaches increase cubically with respect to spatial resolution and sometimes make volumetric CNNs intractable at higher resolutions. Therefore, it is necessary to develop memory-efficient alternatives that can accelerate the convolutional operation on 3D volumetric representations of the protein-ligand interaction. In this study, we implement a novel volumetric representation, OctSurf, to characterize the 3D molecular surface of protein binding pockets and bound ligands. The OctSurf surface representation is built based on the octree data structure, which has been widely used in computer graphics to efficiently represent and store 3D object data. Vanilla 3D-CNN approaches often divide the 3D space of objects into equal-sized voxels. In contrast, OctSurf recursively partitions the 3D space containing the protein-ligand pocket into eight subspaces called octants. Only those octants containing van der Waals surface points of protein or ligand atoms undergo the recursive subdivision process until they reach the predefined octree depth, whereas unoccupied octants are kept intact to reduce the memory cost. Resulting non-empty leaf octants approximate molecular surfaces of the protein pocket and bound ligands. These surface octants, along with their chemical and geometric features, are used as the input to 3D-CNNs. Two kinds of CNN architectures, VGG and ResNet, are applied to the OctSurf representation to predict binding affinity. The OctSurf representation consumes much less memory than the conventional voxel representation at the same resolution. By restricting the convolution operation to only octants of the smallest size, our method also alleviates the overall computational overhead of CNN. 
A series of experiments are performed to demonstrate the disk storage and computational efficiency of the proposed learning method. Our code is available at the following GitHub repository: https://github.uconn.edu/mldrugdiscovery/OctSurf.
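The recursive subdivision described in this abstract can be sketched as follows. This is an illustrative toy, not the OctSurf implementation: it assumes the surface is given as a list of 3D points inside a cubic bounding box, and the function name `occupied_leaves` and its parameters are hypothetical. Only octants that contain at least one surface point are split further; empty octants are left unsplit and are simply not stored.

```python
from typing import Dict, List, Tuple

Point = Tuple[float, float, float]
Octant = Tuple[Point, float]  # (corner with the smallest coordinates, edge length)


def occupied_leaves(
    points: List[Point], origin: Point, size: float, depth: int, max_depth: int
) -> List[Octant]:
    """Return the non-empty leaf octants at max_depth that contain surface points."""
    if not points:
        return []  # empty octant: never subdivided, never stored
    if depth == max_depth:
        return [(origin, size)]
    half = size / 2.0
    children: Dict[Tuple[int, int, int], List[Point]] = {}
    for p in points:
        # Child index (0 or 1) along each axis; clamp points on the far face.
        idx = tuple(min(int((p[i] - origin[i]) // half), 1) for i in range(3))
        children.setdefault(idx, []).append(p)
    leaves: List[Octant] = []
    for idx, child_points in children.items():
        child_origin = tuple(origin[i] + idx[i] * half for i in range(3))
        leaves.extend(
            occupied_leaves(child_points, child_origin, half, depth + 1, max_depth)
        )
    return leaves


# Two toy "surface points" in a unit cube, subdivided to depth 3.
surface = [(0.1, 0.1, 0.1), (0.9, 0.9, 0.9)]
leaves = occupied_leaves(surface, (0.0, 0.0, 0.0), 1.0, 0, 3)
dense_voxels = 8 ** 3  # a uniform grid at the same resolution
```

At depth 3 a uniform grid holds 512 voxels, while the sketch stores only the two occupied leaf octants, which illustrates the cubic growth of the dense representation that the abstract contrasts with the octree.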
Affiliation(s)
- Qinqing Liu
- Department of Computer Science and Engineering, University of Connecticut, Storrs, CT 06279, USA
| | | | - Chunjiang Zhu
- Department of Computer Science and Engineering, University of Connecticut, Storrs, CT 06279, USA
| | - Blake Blumenfeld Gaines
- Department of Computer Science and Engineering, University of Connecticut, Storrs, CT 06279, USA
| | - Tan Zhu
- Department of Computer Science and Engineering, University of Connecticut, Storrs, CT 06279, USA
| | - Jinbo Bi
- Department of Computer Science and Engineering, University of Connecticut, Storrs, CT 06279, USA; Department of Biomedical Engineering, University of Connecticut, Storrs, CT 06279, USA
| | - Minghu Song
- Department of Biomedical Engineering, University of Connecticut, Storrs, CT 06279, USA.
| |
|
18
|
Deciphering interaction fingerprints from protein molecular surfaces using geometric deep learning. Nat Methods 2019; 17:184-192. [DOI: 10.1038/s41592-019-0666-6] [Citation(s) in RCA: 172] [Impact Index Per Article: 34.4] [Received: 04/11/2019] [Accepted: 10/28/2019] [Indexed: 02/05/2023]
|
19
|
Sato A, Tanimura N, Honma T, Konagaya A. Significance of Data Selection in Deep Learning for Reliable Binding Mode Prediction of Ligands in the Active Site of CYP3A4. Chem Pharm Bull (Tokyo) 2019; 67:1183-1190. [PMID: 31423003 DOI: 10.1248/cpb.c19-00443] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Indexed: 11/22/2022]
Abstract
For rational drug design, it is essential to predict the binding mode of protein-ligand complexes. Although various machine learning-based models have been reported that use convolutional neural networks (deep learning) to predict binding modes from three-dimensional structures, there are few detailed reports on how best to construct and use datasets. Here, we examined how different datasets affected the prediction of the binding mode of CYP3A4 by a three-dimensional neural network when the number of crystal structures for the target protein was limited. We used four different training datasets: one large, general dataset containing various protein complexes and three smaller, more specific datasets containing complexes with CYP3A4-like pockets, complexes with CYP3A4-binding ligands, and complexes with CYP protein family members. We then trained models with different combinations of datasets with or without subsequent fine-tuning and evaluated the binding mode prediction performance of each model. The best model with respect to the area under the receiver operating characteristic curve (ROC AUC) was obtained by training with a combination of the general protein and CYP family datasets. However, the ROC AUC-recall balanced model was obtained by training with this combination of datasets followed by fine-tuning with the CYP3A4-binding ligands dataset. Our results suggest that datasets that balance protein functionality and data size are important for optimizing binding mode prediction performance. In addition, datasets with large median binding pocket sizes may be important for the binding mode prediction specifically of CYP3A4.
Affiliation(s)
- Atsuko Sato
- School of Computing, Department of Computer Science, Tokyo Institute of Technology
| | - Naoki Tanimura
- Science Solutions Division, Mizuho Information & Research Institute, Inc
| | - Teruki Honma
- School of Computing, Department of Computer Science, Tokyo Institute of Technology.,Center for Biosystems Dynamics Research, RIKEN.,Medical Sciences Innovation Hub Program, RIKEN
| | - Akihiko Konagaya
- School of Computing, Department of Computer Science, Tokyo Institute of Technology
| |
|
20
|
Wang C, Aleksandrov AA, Yang Z, Forouhar F, Proctor EA, Kota P, An J, Kaplan A, Khazanov N, Boël G, Stockwell BR, Senderowitz H, Dokholyan NV, Riordan JR, Brouillette CG, Hunt JF. Ligand binding to a remote site thermodynamically corrects the F508del mutation in the human cystic fibrosis transmembrane conductance regulator. J Biol Chem 2018; 293:17685-17704. [PMID: 29903914 PMCID: PMC6240863 DOI: 10.1074/jbc.ra117.000819] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.3] [Received: 11/06/2017] [Revised: 05/31/2018] [Indexed: 01/07/2023] Open
Abstract
Many disease-causing mutations impair protein stability. Here, we explore a thermodynamic strategy to correct the disease-causing F508del mutation in the human cystic fibrosis transmembrane conductance regulator (hCFTR). F508del destabilizes nucleotide-binding domain 1 (hNBD1) in hCFTR relative to an aggregation-prone intermediate. We developed a fluorescence self-quenching assay for compounds that prevent aggregation of hNBD1 by stabilizing its native conformation. Unexpectedly, we found that dTTP and nucleotide analogs with exocyclic methyl groups bind to hNBD1 more strongly than ATP and preserve electrophysiological function of full-length F508del-hCFTR channels at temperatures up to 37 °C. Furthermore, nucleotides that increase open-channel probability, which reflects stabilization of an interdomain interface to hNBD1, thermally protect full-length F508del-hCFTR even when they do not stabilize isolated hNBD1. Therefore, stabilization of hNBD1 itself or of one of its interdomain interfaces by a small molecule indirectly offsets the destabilizing effect of the F508del mutation on full-length hCFTR. These results indicate that high-affinity binding of a small molecule to a remote site can correct a disease-causing mutation. We propose that the strategies described here should be applicable to identifying small molecules to help manage other human diseases caused by mutations that destabilize native protein conformation.
Affiliation(s)
- Chi Wang
- From the Departments of Biological Sciences and
| | - Andrei A. Aleksandrov
- the Department of Biochemistry and Biophysics, School of Medicine, University of North Carolina, Chapel Hill, North Carolina 27599
| | - Zhengrong Yang
- the Department of Chemistry, University of Alabama, Birmingham, Alabama 35294, and
| | | | - Elizabeth A. Proctor
- the Department of Biochemistry and Biophysics, School of Medicine, University of North Carolina, Chapel Hill, North Carolina 27599
| | - Pradeep Kota
- the Department of Biochemistry and Biophysics, School of Medicine, University of North Carolina, Chapel Hill, North Carolina 27599
| | - Jianli An
- the Department of Chemistry, University of Alabama, Birmingham, Alabama 35294, and
| | - Anna Kaplan
- From the Departments of Biological Sciences and
| | - Netaly Khazanov
- the Department of Chemistry, Bar-Ilan University, Ramat-Gan 5290002, Israel
| | | | - Brent R. Stockwell
- From the Departments of Biological Sciences and Chemistry, Columbia University, New York, New York 10027
| | - Hanoch Senderowitz
- the Department of Chemistry, Bar-Ilan University, Ramat-Gan 5290002, Israel
| | - Nikolay V. Dokholyan
- the Department of Biochemistry and Biophysics, School of Medicine, University of North Carolina, Chapel Hill, North Carolina 27599
| | - John R. Riordan
- the Department of Biochemistry and Biophysics, School of Medicine, University of North Carolina, Chapel Hill, North Carolina 27599
| | | | - John F. Hunt
- From the Departments of Biological Sciences and
- To whom correspondence should be addressed. Tel.: 212-854-5443; Fax: 212-865-8246
| |
|
21
|
Budowski-Tal I, Kolodny R, Mandel-Gutfreund Y. A Novel Geometry-Based Approach to Infer Protein Interface Similarity. Sci Rep 2018; 8:8192. [PMID: 29844500 PMCID: PMC5974305 DOI: 10.1038/s41598-018-26497-z] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.7] [Received: 05/24/2017] [Accepted: 05/10/2018] [Indexed: 11/21/2022] Open
Abstract
The protein interface is key to understanding protein function, providing vital insight into how proteins interact with each other and with other molecules. Over the years, many computational methods to compare protein structures have been developed, yet evaluating interface similarity remains a very difficult task. Here, we present PatchBag, a geometry-based method for efficient comparison of protein surfaces and interfaces. PatchBag is a Bag-Of-Words approach, which represents complex objects as vectors, enabling highly efficient searches for interface similarity. Using a novel framework for evaluating interface similarity, we show that PatchBag performance is comparable to state-of-the-art alignment-based structural comparison methods. The great advantage of PatchBag is that it does not rely on sequence or fold information, thus enabling the detection of similarities between interfaces in unrelated proteins. We propose that PatchBag can contribute to revealing novel evolutionary and functional relationships between protein interfaces.
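The Bag-Of-Words idea mentioned in this abstract can be sketched generically. The example below assumes that each surface patch has already been reduced to a small numeric descriptor and that a codebook of representative patches is available; how PatchBag actually describes patches, builds its library, and scores similarity is not stated in the abstract. The names `bag_of_words`, `nearest_word`, and `cosine_similarity` are hypothetical.

```python
import math
from typing import List, Sequence


def nearest_word(descriptor: Sequence[float], codebook: List[Sequence[float]]) -> int:
    """Index of the codebook entry closest to the descriptor (squared Euclidean)."""
    return min(
        range(len(codebook)),
        key=lambda k: sum((d - c) ** 2 for d, c in zip(descriptor, codebook[k])),
    )


def bag_of_words(
    patches: List[Sequence[float]], codebook: List[Sequence[float]]
) -> List[float]:
    """Normalized histogram of codebook words over all patches of one surface."""
    counts = [0.0] * len(codebook)
    for patch in patches:
        counts[nearest_word(patch, codebook)] += 1.0
    total = sum(counts)
    return [c / total for c in counts] if total else counts


def cosine_similarity(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity between two histograms (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


# Toy codebook and two interfaces whose patches appear in a different order.
codebook = [(0.0, 0.0), (1.0, 1.0)]
interface_a = [(0.1, 0.0), (0.9, 1.0), (1.0, 1.1)]
interface_b = [(1.0, 0.9), (0.0, 0.1), (1.1, 1.0)]
vec_a = bag_of_words(interface_a, codebook)
vec_b = bag_of_words(interface_b, codebook)
similarity = cosine_similarity(vec_a, vec_b)
```

Because each interface becomes a fixed-length vector that ignores patch order, comparison needs no alignment, which is consistent with the efficiency and sequence independence claimed in the abstract.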
Affiliation(s)
- Inbal Budowski-Tal
- Faculty of Biology, Technion, Israel Institute of Technology, Haifa, 3200003, Israel.,Department of Computer Science, University of Haifa, Mount Carmel, Haifa, 3498838, Israel
| | - Rachel Kolodny
- Department of Computer Science, University of Haifa, Mount Carmel, Haifa, 3498838, Israel.
| | - Yael Mandel-Gutfreund
- Faculty of Biology, Technion, Israel Institute of Technology, Haifa, 3200003, Israel.
| |
|
22
|
Axenopoulos A, Rafailidis D, Papadopoulos G, Houstis EN, Daras P. Similarity Search of Flexible 3D Molecules Combining Local and Global Shape Descriptors. IEEE/ACM Transactions on Computational Biology and Bioinformatics 2016; 13:954-970. [PMID: 26561479 DOI: 10.1109/tcbb.2015.2498553] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.4] [Indexed: 06/05/2023]
Abstract
In this paper, a framework for shape-based similarity search of 3D molecular structures is presented. The proposed framework exploits simultaneously the discriminative capabilities of a global, a local, and a hybrid local-global shape feature to produce a geometric descriptor that achieves higher retrieval accuracy than each feature does separately. Global and hybrid features are extracted using pairwise computations of diffusion distances between the points of the molecular surface, while the local feature is based on accumulating pairwise relations among oriented surface points into local histograms. The local features are integrated into a global descriptor vector using the bag-of-features approach. Due to the intrinsic property of its constituting shape features to be invariant to articulations of the 3D objects, the framework is appropriate for similarity search of flexible 3D molecules, while at the same time it is also accurate in retrieving rigid 3D molecules. The proposed framework is evaluated in flexible and rigid shape matching of 3D protein structures as well as in shape-based virtual screening of large ligand databases with quite promising results.
|
23
|
Qiu T, Xiao H, Zhang Q, Qiu J, Yang Y, Wu D, Cao Z, Zhu R. Proteochemometric modeling of the antigen-antibody interaction: new fingerprints for antigen, antibody and epitope-paratope interaction. PLoS One 2015; 10:e0122416. [PMID: 25901362 PMCID: PMC4406442 DOI: 10.1371/journal.pone.0122416] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.2] [Received: 11/23/2014] [Accepted: 02/20/2015] [Indexed: 01/12/2023] Open
Abstract
Despite the high specificity between antigen and antibody binding, similar epitopes can be recognized or cross-neutralized by paratopes of antibodies with different binding affinities. How to accurately characterize this slight variation, which may or may not change the antigen-antibody binding affinity, is a key issue in this area. In this report, by combining a cylinder model with a shell structure model, a new fingerprint was introduced to describe both the structural and physical-chemical features of the antigen and antibody proteins. Furthermore, besides the description of individual proteins, the specific epitope-paratope interaction fingerprint (EPIF) was developed to reflect the bond and the environment of the antigen-antibody interface. Finally, Proteochemometric Modeling of the antigen-antibody interaction was established and evaluated on 429 antigen-antibody complexes. By using only protein descriptors, our model achieved the best performance ($R^2 = 0.91$, $Q^2_{\text{test}} = 0.68$) among peers. Further, together with EPIF as a new cross-term, our model ($R^2 = 0.92$, $Q^2_{\text{test}} = 0.74$) can significantly outperform peers with multiplication of ligand and protein descriptors as a cross-term ($R^2 \le 0.81$, $Q^2_{\text{test}} \le 0.44$). Results illustrated that: 1) our newly designed protein fingerprints and EPIF can better describe the antigen-antibody interaction; 2) EPIF is a better and more specific cross-term in Proteochemometric Modeling for antigen-antibody interaction. The fingerprints designed in this study will assist in the description of antigen-antibody binding, and in the future they may be a valuable help for high-throughput antibody screening. The algorithm is freely available on request.
Affiliation(s)
- Tianyi Qiu
- Department of Bioinformatics, School of Life Sciences and Technology, Tongji University, Shanghai 200092, China
| | - Han Xiao
- Department of Computer Science, University of Helsinki, Helsinki, FI-00014, Finland
| | - Qingchen Zhang
- Department of Bioinformatics, School of Life Sciences and Technology, Tongji University, Shanghai 200092, China
| | - Jingxuan Qiu
- Department of Bioinformatics, School of Life Sciences and Technology, Tongji University, Shanghai 200092, China
| | - Yiyan Yang
- Department of Bioinformatics, School of Life Sciences and Technology, Tongji University, Shanghai 200092, China
| | - Dingfeng Wu
- Department of Bioinformatics, School of Life Sciences and Technology, Tongji University, Shanghai 200092, China
| | - Zhiwei Cao
- Department of Bioinformatics, School of Life Sciences and Technology, Tongji University, Shanghai 200092, China
- Shanghai Center for Bioinformation Technology, Shanghai 201203, China
- * E-mail: (RZ); (ZC)
| | - Ruixin Zhu
- Department of Bioinformatics, School of Life Sciences and Technology, Tongji University, Shanghai 200092, China
- School of Pharmacy, Liaoning University of Traditional Chinese Medicine, Dalian 116600, Liaoning, China
- * E-mail: (RZ); (ZC)
| |
|
24
|
Pang B, Schlessman D, Kuang X, Zhao N, Shyu D, Korkin D, Shyu CR. An Integrated Approach to Sequence-Independent Local Alignment of Protein Binding Sites. IEEE/ACM Transactions on Computational Biology and Bioinformatics 2015; 12:298-308. [PMID: 26357218 DOI: 10.1109/tcbb.2014.2355208] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/05/2023]
Abstract
Accurate alignment of protein-protein binding sites can aid in protein docking studies and constructing templates for predicting structure of protein complexes, along with in-depth understanding of evolutionary and functional relationships. However, over the past three decades, structural alignment algorithms have focused predominantly on global alignments with little effort on the alignment of local interfaces. In this paper, we introduce the PBSalign (Protein-protein Binding Site alignment) method, which integrates techniques in graph theory, 3D localized shape analysis, geometric scoring, and utilization of physicochemical and geometrical properties. Computational results demonstrate that PBSalign is capable of identifying similar homologous and analogous binding sites accurately and performing alignments with better geometric match measures than existing protein-protein interface comparison tools. The proportion of better alignment quality generated by PBSalign is 46, 56, and 70 percent more than iAlign as judged by the average match index (MI), similarity index (SI), and structural alignment score (SAS), respectively. PBSalign provides the life science community an efficient and accurate solution to binding-site alignment while striking the balance between topological details and computational complexity.
|
25
|
Ito JI, Ikeda K, Yamada K, Mizuguchi K, Tomii K. PoSSuM v.2.0: data update and a new function for investigating ligand analogs and target proteins of small-molecule drugs. Nucleic Acids Res 2014; 43:D392-8. [PMID: 25404129 PMCID: PMC4383952 DOI: 10.1093/nar/gku1144] [Citation(s) in RCA: 21] [Impact Index Per Article: 2.1] [Indexed: 01/13/2023] Open
Abstract
PoSSuM (http://possum.cbrc.jp/PoSSuM/) is a database for detecting similar small-molecule binding sites on proteins. Since its initial release in 2011, PoSSuM has grown to provide information related to 49 million pairs of similar binding sites discovered among 5.5 million known and putative binding sites. This enlargement of the database is expected to enhance opportunities for biological and pharmaceutical applications, such as predictions of new functions and drug discovery. In this release, we have provided a new service named PoSSuM drug search (PoSSuMds) at http://possum.cbrc.jp/PoSSuM/drug_search/, in which we selected 194 approved drug compounds retrieved from ChEMBL, and detected their known binding pockets and pockets that are similar to them. Users can access and download all of the search results via a new web interface, which is useful for finding ligand analogs as well as potential target proteins. Furthermore, PoSSuMds enables users to explore the binding pocket universe within PoSSuM. Additionally, we have improved the web interface with new functions, including sortable tables and a viewer for visualizing and downloading superimposed pockets.
Affiliation(s)
- Jun-ichi Ito
- Laboratory of Bioinformatics, National Institute of Biomedical Innovation (NIBIO), 7-6-8 Saito-Asagi, Ibaraki, Osaka 567-0085, Japan; Computational Biology Research Center (CBRC), National Institute of Advanced Industrial Science and Technology (AIST), 2-4-7 Aomi, Koto-ku, Tokyo 135-0064, Japan
- Kazuyoshi Ikeda
- Laboratory of Bioinformatics, National Institute of Biomedical Innovation (NIBIO), 7-6-8 Saito-Asagi, Ibaraki, Osaka 567-0085, Japan; Computational Biology Research Center (CBRC), National Institute of Advanced Industrial Science and Technology (AIST), 2-4-7 Aomi, Koto-ku, Tokyo 135-0064, Japan; Drug Discovery Informatics Group, System Solution Division, Level Five Co. Ltd., Shiodome Shibarikyu Bldg., 1-2-3 Kaigan, Minato-ku, Tokyo 105-0022, Japan
- Kazunori Yamada
- Computational Biology Research Center (CBRC), National Institute of Advanced Industrial Science and Technology (AIST), 2-4-7 Aomi, Koto-ku, Tokyo 135-0064, Japan
- Kenji Mizuguchi
- Laboratory of Bioinformatics, National Institute of Biomedical Innovation (NIBIO), 7-6-8 Saito-Asagi, Ibaraki, Osaka 567-0085, Japan
- Kentaro Tomii
- Computational Biology Research Center (CBRC), National Institute of Advanced Industrial Science and Technology (AIST), 2-4-7 Aomi, Koto-ku, Tokyo 135-0064, Japan
|
26
|
Koromyslova AD, Chugunov AO, Efremov RG. Deciphering fine molecular details of proteins' structure and function with a Protein Surface Topography (PST) method. J Chem Inf Model 2014; 54:1189-99. [PMID: 24689707 DOI: 10.1021/ci500158y] [Citation(s) in RCA: 24] [Impact Index Per Article: 2.4] [Indexed: 12/27/2022]
Abstract
Molecular surfaces are the key players in biomolecular recognition and interactions. Nowadays, it is trivial to visualize a molecular surface and surface-distributed properties in three-dimensional space. However, such a representation tends to be biased and ambiguous when thorough analysis is required. We present a new method to create and manipulate 2D spherical projection maps of entire protein surfaces: protein surface topography (PST). It permits visualization and thoughtful analysis of surface properties. PST helps to easily portray conformational transitions, analyze proteins' properties and their dynamic behavior, improve docking performance, and reveal common patterns and dissimilarities in molecular surfaces of related bioactive peptides. This paper describes basic usage of PST with three examples: conformational transitions of small G-proteins, mapping of the caspase-1 intersubunit interface, and intrinsic "complementarity" in the conotoxin-acetylcholine binding protein complex. We suggest that PST is a beneficial approach for structure-function studies of bioactive peptides and small proteins.
Affiliation(s)
- Anna D Koromyslova
- M. M. Shemyakin and Yu. A. Ovchinnikov Institute of Bioorganic Chemistry, Russian Academy of Sciences, 117997, Moscow, Russia
|
27
|
Wang HW, Chu CH, Wang WC, Pai TW. A local average distance descriptor for flexible protein structure comparison. BMC Bioinformatics 2014; 15:95. [PMID: 24694083 PMCID: PMC3992163 DOI: 10.1186/1471-2105-15-95] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.7] [Received: 09/21/2013] [Accepted: 03/22/2014] [Indexed: 11/10/2022]
Abstract
BACKGROUND Protein structures are flexible and often show conformational changes upon binding to other molecules to exert biological functions. As protein structures correlate with characteristic functions, structure comparison allows classification and prediction of proteins of undefined functions. However, most comparison methods treat proteins as rigid bodies and cannot effectively retrieve similarities between proteins with large conformational changes. RESULTS In this paper, we propose a novel descriptor, local average distance (LAD), based on either the geodesic distances (GDs) or Euclidean distances (EDs), for pairwise flexible protein structure comparison. The proposed method was compared with 7 structural alignment methods and 7 shape descriptors on two datasets comprising hinge bending motions from the MolMovDB, and the results have shown that our method outperformed all other methods at retrieving similar structures in terms of precision-recall curve, retrieval success rate, R-precision, mean average precision and F1-measure. CONCLUSIONS Both ED- and GD-based LAD descriptors are effective for searching deformed structures and overcome the problems of self-connection caused by a large bending motion. We have also demonstrated that the ED-based LAD is more robust than the GD-based descriptor. The proposed algorithm provides an alternative approach for blasting structure databases, discovering previously unknown conformational relationships, and reorganizing protein structure classification.
Affiliation(s)
- Tun-Wen Pai
- Department of Computer Science and Engineering, National Taiwan Ocean University, Keelung, Taiwan
|
28
|
Jalencas X, Mestres J. Identification of Similar Binding Sites to Detect Distant Polypharmacology. Mol Inform 2013; 32:976-90. [PMID: 27481143 DOI: 10.1002/minf.201300082] [Citation(s) in RCA: 41] [Impact Index Per Article: 3.7] [Received: 05/01/2013] [Accepted: 07/29/2013] [Indexed: 01/19/2023]
Abstract
The ability of small molecules to interact with multiple proteins is referred to as polypharmacology. This property is often linked to the therapeutic action of drugs but it is known also to be responsible for many of their side effects. Because of its importance, the development of computational methods that can predict drug polypharmacology has become an important line of research that led recently to the identification of many novel targets for known drugs. Nowadays, the majority of these methods are based on measuring the similarity of a query molecule against the hundreds of thousands of molecules for which pharmacological data on thousands of proteins are available in public sources. However, similarity-based methods are inherently biased by the chemical coverage offered by the active molecules present in those public repositories, which limits significantly their capacity to predict interactions with proteins structurally and functionally unrelated to any of the already known targets for drugs. It is in this respect that structure-based methods aiming at identifying similar binding sites may offer an alternative complementary means to ligand-based methods for detecting distant polypharmacology. The different existing approaches to binding site detection, representation, comparison, and fragmentation are reviewed and recent successful applications presented.
Affiliation(s)
- Xavier Jalencas
- Systems Pharmacology, Research Program on Biomedical Informatics (GRIB), IMIM Hospital del Mar Research Institute & University Pompeu Fabra, Parc de Recerca Biomèdica, Doctor Aiguader 88, 08003 Barcelona, Catalonia, Spain; fax: +34 93 3160550
- Jordi Mestres
- Systems Pharmacology, Research Program on Biomedical Informatics (GRIB), IMIM Hospital del Mar Research Institute & University Pompeu Fabra, Parc de Recerca Biomèdica, Doctor Aiguader 88, 08003 Barcelona, Catalonia, Spain; fax: +34 93 3160550
|
29
|
von Behren MM, Volkamer A, Henzler AM, Schomburg KT, Urbaczek S, Rarey M. Fast protein binding site comparison via an index-based screening technology. J Chem Inf Model 2013; 53:411-22. [PMID: 23390978 DOI: 10.1021/ci300469h] [Citation(s) in RCA: 20] [Impact Index Per Article: 1.8] [Indexed: 01/25/2023]
Abstract
We present TrixP, a new index-based method for fast protein binding site comparison and function prediction. TrixP determines binding site similarities based on the comparison of descriptors that encode pharmacophoric and spatial features. Therefore, it adopts the efficient core components of TrixX, a structure-based virtual screening technology for large compound libraries. TrixP expands this technology by new components in order to allow a screening of protein libraries. TrixP accounts for the inherent flexibility of proteins employing a partial shape matching routine. After the identification of structures with matching pharmacophoric features and geometric shape, TrixP superimposes the binding sites and, finally, assesses their similarity according to the fit of pharmacophoric properties. TrixP is able to find analogies between closely and distantly related binding sites. Recovery rates of 81.8% for similar binding site pairs, assisted by rejecting rates of 99.5% for dissimilar pairs on a test data set containing 1331 pairs, confirm this ability. TrixP exclusively identifies members of the same protein family on top ranking positions out of a library consisting of 9802 binding sites. Furthermore, 30 predicted kinase binding sites can almost perfectly be classified into their known subfamilies.
Affiliation(s)
- Mathias M von Behren
- Center for Bioinformatics, University of Hamburg, Bundesstrasse 43, 20146 Hamburg, Germany
|
30
|
Kastritis PL, Bonvin AMJJ. On the binding affinity of macromolecular interactions: daring to ask why proteins interact. J R Soc Interface 2012; 10:20120835. [PMID: 23235262 PMCID: PMC3565702 DOI: 10.1098/rsif.2012.0835] [Citation(s) in RCA: 276] [Impact Index Per Article: 23.0] [Indexed: 01/13/2023]
Abstract
Interactions between proteins are orchestrated in a precise and time-dependent manner, underlying cellular function. The binding affinity, defined as the strength of these interactions, is translated into physico-chemical terms in the dissociation constant (Kd), the latter being an experimental measure that determines whether an interaction will be formed in solution or not. Predicting binding affinity from structural models has been a matter of active research for more than 40 years because of its fundamental role in drug development. However, all available approaches are incapable of predicting the binding affinity of protein–protein complexes from coordinates alone. Here, we examine both theoretical and experimental limitations that complicate the derivation of structure–affinity relationships. Most work so far has concentrated on binary interactions. Systems of increased complexity are far from being understood. The main physico-chemical measure that relates to binding affinity is the buried surface area, but it does not hold for flexible complexes. For the latter, there must be a significant entropic contribution that will have to be approximated in the future. We foresee that any theoretical modelling of these interactions will have to follow an integrative approach considering the biology, chemistry and physics that underlie protein–protein recognition.
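As a small worked illustration of how a dissociation constant translates into physico-chemical terms, the sketch below converts Kd into a standard binding free energy using the textbook relation ΔG° = RT ln(Kd/c°), with c° = 1 M. The function name, the default temperature of 298.15 K, and the choice of kcal/mol are assumptions made for this example and are not taken from the article.

```python
import math

# Gas constant in kcal/(mol*K); the unit choice is an assumption for this sketch.
GAS_CONSTANT_KCAL = 1.987204e-3


def binding_free_energy(kd_molar: float, temperature_k: float = 298.15) -> float:
    """Standard binding free energy (kcal/mol) from a dissociation constant in mol/L.

    Uses dG = R * T * ln(Kd / c0) with the standard concentration c0 = 1 M,
    so tighter binding (smaller Kd) gives a more negative value.
    """
    if kd_molar <= 0:
        raise ValueError("Kd must be positive")
    return GAS_CONSTANT_KCAL * temperature_k * math.log(kd_molar)


if __name__ == "__main__":
    # Millimolar, micromolar and nanomolar binders.
    for kd in (1e-3, 1e-6, 1e-9):
        print(f"Kd = {kd:.0e} M -> dG = {binding_free_energy(kd):.2f} kcal/mol")
```

At room temperature each tenfold decrease in Kd corresponds to roughly 1.4 kcal/mol of additional binding free energy, which helps explain why small errors in predicted energies translate into large errors in predicted affinities.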
Affiliation(s)
- Panagiotis L Kastritis
- Bijvoet Center for Biomolecular Research, Faculty of Science, Chemistry, Utrecht University, Padualaan 8, Utrecht, The Netherlands
|
31
|
Abstract
The identification and application of druggable pockets of targets play a key role in in silico drug design, which is a fundamental step in structure-based drug design. Herein, recent progress and developments in the computational analysis of pockets are covered. Pockets at protein-protein interfaces (PPI) are also considered, to further explore the pocket space for drug discovery. We present two case studies targeting the kinetic pockets generated by normal mode analysis and by molecular dynamics, respectively, in which we focus on incorporating pocket flexibility into two-dimensional virtual screening with both affinity and specificity. We applied the specificity and affinity (SPA) score to quantitatively estimate affinity and evaluate specificity using the intrinsic specificity ratio (ISR) as a quantitative criterion. In one of the two cases, we also included some applications of pockets located at dimer interfaces to emphasize the role of PPI in drug discovery. This review summarizes the current status of the pocket issue and presents some prospective avenues of further inquiry.
Affiliation(s)
- Xiliang Zheng
- State Key Laboratory of Electroanalytical Chemistry, Changchun Institute of Applied Chemistry, Chinese Academy of Sciences, 5625 Renmin Street, Changchun, Jilin, 130022, People's Republic of China
|
32
|
Desaphy J, Azdimousa K, Kellenberger E, Rognan D. Comparison and druggability prediction of protein-ligand binding sites from pharmacophore-annotated cavity shapes. J Chem Inf Model 2012; 52:2287-99. [PMID: 22834646 DOI: 10.1021/ci300184x] [Citation(s) in RCA: 83] [Impact Index Per Article: 6.9] [Indexed: 01/22/2023]
Abstract
Estimating the pairwise similarity of protein-ligand binding sites is a fast and efficient way of predicting cross-reactivity and putative side effects of drug candidates. Among the many tools available, three-dimensional (3D) alignment-dependent methods are usually slow and based on simplified representations of binding site atoms or surfaces. On the other hand, fast and efficient alignment-free methods have recently been described but suffer from a lack of interpretability. We herewith present a novel binding site description (VolSite), coupled to an alignment and comparison tool (Shaper) combining the speed of alignment-free methods with the interpretability of alignment-dependent approaches. It is based on the comparison of negative images of binding cavities encoding both shape and pharmacophoric properties at regularly spaced grid points. Shaper approximates the resulting molecular shape with a smooth Gaussian function and aligns protein binding sites by optimizing their volume overlap. VolSite and Shaper were successfully applied to compare protein-ligand binding sites and to predict their structural druggability.
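The idea of scoring two cavities by the overlap of smooth Gaussian shapes can be sketched as follows. This is not the Shaper implementation: it assumes each cavity is a list of grid points, that every point carries an identical isotropic Gaussian of width `sigma`, and that similarity is reported as a Tanimoto ratio of overlap integrals. The function names are hypothetical, and the optimization over rigid-body poses that an alignment tool would perform is omitted; only the scoring of one fixed pose is shown.

```python
import math
from itertools import product

Point = tuple[float, float, float]


def gaussian_overlap(a: list[Point], b: list[Point], sigma: float = 1.0) -> float:
    """Overlap integral of two densities, each a sum of isotropic Gaussians.

    For two Gaussians exp(-|r - p|^2 / (2 sigma^2)) centred at p and q, the
    integral of their product is (pi sigma^2)^(3/2) * exp(-|p - q|^2 / (4 sigma^2)).
    """
    prefactor = (math.pi * sigma**2) ** 1.5
    return sum(
        prefactor * math.exp(-math.dist(p, q) ** 2 / (4.0 * sigma**2))
        for p, q in product(a, b)
    )


def shape_tanimoto(a: list[Point], b: list[Point], sigma: float = 1.0) -> float:
    """Tanimoto similarity of two point sets under the Gaussian overlap."""
    o_ab = gaussian_overlap(a, b, sigma)
    o_aa = gaussian_overlap(a, a, sigma)
    o_bb = gaussian_overlap(b, b, sigma)
    return o_ab / (o_aa + o_bb - o_ab)


if __name__ == "__main__":
    cavity = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0), (0.0, 1.5, 0.0)]
    shifted = [(x + 1.0, y, z) for x, y, z in cavity]
    print(shape_tanimoto(cavity, cavity))   # identical pose
    print(shape_tanimoto(cavity, shifted))  # same shape, displaced by 1 unit
```

Because the score is a smooth function of the coordinates, it is well suited to gradient-based pose optimization, which is the usual motivation for Gaussian shape models.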
Affiliation(s)
- Jérémy Desaphy
- Laboratory of Therapeutic Innovation, UMR 7200 Université de Strasbourg/CNRS, Medalis Drug Discovery Center, F-67400 Illkirch, France
|
33
|
Pang B, Kuang X, Zhao N, Korkin D, Shyu CR. PBSword: a web server for searching similar protein-protein binding sites. Nucleic Acids Res 2012; 40:W428-34. [PMID: 22689645 PMCID: PMC3394332 DOI: 10.1093/nar/gks527] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.3] [Indexed: 12/02/2022]
Abstract
PBSword is a web server designed for efficient and accurate comparisons and searches of geometrically similar protein–protein binding sites from a large-scale database. The basic idea of PBSword is that each protein binding site is first represented by a high-dimensional vector of ‘visual words’, which characterizes both the global and local shape features of the binding site. It then uses a scalable indexing technique to search for those binding sites whose visual words representations are similar to that of the query binding site. Our system is able to return ranked results of binding sites in a short time from a database of 194 322 domain–domain binding sites. PBSword supports query by protein ID and by new structures uploaded by users. PBSword is a useful tool to investigate functional connections among proteins based on the local structures of binding sites and has potential applications to protein–protein docking and drug discovery. The system is hosted at http://pbs.rnet.missouri.edu.
Affiliation(s)
- Bin Pang
- Informatics Institute and Department of Computer Science, University of Missouri, Columbia, MO, USA
|
34
|
Pang B, Zhao N, Korkin D, Shyu CR. Fast protein binding site comparisons using visual words representation. Bioinformatics 2012; 28:1345-52. [PMID: 22492639 DOI: 10.1093/bioinformatics/bts138] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.8] [Indexed: 11/12/2022]
Abstract
MOTIVATION Finding geometrically similar protein binding sites is crucial for understanding protein functions and can provide valuable information for protein-protein docking and drug discovery. As the number of known protein-protein interaction structures has dramatically increased, a high-throughput and accurate protein binding site comparison method is essential. Traditional alignment-based methods can provide accurate correspondence between the binding sites but are computationally expensive. RESULTS In this article, we present a novel method for the comparisons of protein binding sites using a 'visual words' representation (PBSword). We first extract geometric features of binding site surfaces and build a vocabulary of visual words by clustering a large set of feature descriptors. We then describe a binding site surface with a high-dimensional vector that encodes the frequency of visual words, enhanced by the spatial relationships among them. Finally, we measure the similarity of binding sites by utilizing metric space operations, which provide speedy comparisons between protein binding sites. Our experimental results show that PBSword achieves a comparable classification accuracy to an alignment-based method and improves accuracy of a feature-based method by 36% on a non-redundant dataset. PBSword also exhibits a significant efficiency improvement over an alignment-based method.
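The 'visual words' pipeline described above (quantize local descriptors against a vocabulary, build a word-frequency vector, compare vectors with a metric) can be sketched in a few lines. This is a minimal illustration, not the PBSword code: it assumes the vocabulary has already been built by clustering, uses toy 2D descriptors, picks the L1 distance as the metric, and leaves out the spatial relationships among words that the abstract mentions. All function names are hypothetical.

```python
import math

Vector = tuple[float, ...]


def nearest_word(descriptor: Vector, vocabulary: list[Vector]) -> int:
    """Index of the vocabulary centroid closest to the descriptor."""
    return min(range(len(vocabulary)),
               key=lambda i: math.dist(descriptor, vocabulary[i]))


def word_histogram(descriptors: list[Vector], vocabulary: list[Vector]) -> list[float]:
    """Normalized frequency of each visual word among a site's descriptors."""
    counts = [0] * len(vocabulary)
    for d in descriptors:
        counts[nearest_word(d, vocabulary)] += 1
    total = sum(counts) or 1  # avoid dividing by zero for an empty site
    return [c / total for c in counts]


def l1_distance(h1: list[float], h2: list[float]) -> float:
    """L1 (Manhattan) distance between two histograms; 0 means identical."""
    return sum(abs(x - y) for x, y in zip(h1, h2))


if __name__ == "__main__":
    # Toy vocabulary of three words; assumed to come from a prior clustering step.
    vocabulary = [(0.0, 0.0), (1.0, 1.0), (5.0, 5.0)]
    site_a = [(0.1, 0.0), (0.9, 1.1), (1.2, 0.8), (4.8, 5.1)]
    site_b = [(0.0, 0.2), (5.1, 4.9), (5.2, 5.3), (4.7, 5.0)]
    h_a = word_histogram(site_a, vocabulary)
    h_b = word_histogram(site_b, vocabulary)
    print(h_a, h_b, l1_distance(h_a, h_b))
```

Once every site is reduced to a fixed-length vector, comparison no longer needs an alignment, which is where the speed advantage over alignment-based methods comes from.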
Affiliation(s)
- Bin Pang
- Informatics Institute, University of Missouri, Columbia, MO 65211, USA
|
35
|
Vlachakis D, Tsiliki G, Tsagkrasoulis D, Carvalho CS, Megalooikonomou V, Kossida S. Speeding up the drug discovery process: structural similarity searches using molecular surfaces. EMBnet J 2012; 18:6-9. [PMID: 31440460 DOI: 10.14806/ej.18.1.501] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.3] [Indexed: 11/27/2022]
Affiliation(s)
- Dimitrios Vlachakis
- Bioinformatics & Medical Informatics Laboratory, Biomedical Research Foundation of the Academy of Athens, Athens
- Georgia Tsiliki
- Bioinformatics & Medical Informatics Laboratory, Biomedical Research Foundation of the Academy of Athens, Athens
- Dimosthenis Tsagkrasoulis
- Bioinformatics & Medical Informatics Laboratory, Biomedical Research Foundation of the Academy of Athens, Athens
- Carla Sofia Carvalho
- Bioinformatics & Medical Informatics Laboratory, Biomedical Research Foundation of the Academy of Athens, Athens
- Vasileios Megalooikonomou
- Bioinformatics & Medical Informatics Laboratory, Biomedical Research Foundation of the Academy of Athens, Athens
- Sofia Kossida
- Bioinformatics & Medical Informatics Laboratory, Biomedical Research Foundation of the Academy of Athens, Athens
|
36
|
Protein surface characterization using an invariant descriptor. Int J Biomed Imaging 2011; 2011:918978. [PMID: 22144981 PMCID: PMC3227456 DOI: 10.1155/2011/918978] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.2] [Received: 07/06/2011] [Accepted: 08/14/2011] [Indexed: 11/17/2022]
Abstract
Aim. To develop a new invariant descriptor for the characterization of protein surfaces, suitable for various analysis tasks, such as protein functional classification, and search and retrieval of protein surfaces over a large database. Methods. We start with a local descriptor of selected circular patches on the protein surface. The descriptor records the distance distribution between the central residue and the residues within the patch, keeping track of the number of particular pairwise residue cooccurrences in the patch. A global descriptor for the entire protein surface is then constructed by combining information from the local descriptors. Our method is novel in its focus on residue-specific distance distributions, and the use of residue-distance co-occurrences as the basis for the proposed protein surface descriptors. Results. Results are presented for protein classification and for retrieval for three protein families. For the three families, we obtained an area under the curve for precision and recall ranging from 0.6494 (without residue co-occurrences) to 0.6683 (with residue co-occurrences). Large-scale screening using two other protein families placed related family members at the top of the rank, with a number of uncharacterized proteins also retrieved. Comparative results with other proposed methods are included.
|
37
|
Ito JI, Tabei Y, Shimizu K, Tsuda K, Tomii K. PoSSuM: a database of similar protein-ligand binding and putative pockets. Nucleic Acids Res 2011; 40:D541-8. [PMID: 22135290 PMCID: PMC3245044 DOI: 10.1093/nar/gkr1130] [Citation(s) in RCA: 40] [Impact Index Per Article: 3.1] [Indexed: 11/12/2022]
Abstract
Numerous potential ligand-binding sites are available today, along with hundreds of thousands of known binding sites observed in the PDB. Exhaustive similarity search for such vastly numerous binding site pairs is useful to predict protein functions and to enable rapid screening of target proteins for drug design. Existing databases of ligand-binding sites are of limited scale. For example, SitesBase covers only ~33,000 known binding sites. Protein function inference and drug discovery, however, demand a much more comprehensive database including known and putative binding sites. Using a novel algorithm, we conducted a large-scale all-pairs similarity search for 1.8 million known and potential binding sites in the PDB, and discovered over 14 million similar pairs of binding sites. Here, we present the results as a relational database, Pocket Similarity Search using Multiple-sketches (PoSSuM), including all the discovered pairs with annotations of various types. PoSSuM enables rapid exploration of similar binding sites among structures with different global folds as well as similar ones. Moreover, PoSSuM is useful for predicting the binding ligand for unbound structures, which provides important clues for characterizing protein structures with unclear functions. The PoSSuM database is freely available at http://possum.cbrc.jp/PoSSuM/.
Affiliation(s)
- Jun-Ichi Ito
- Department of Computational Biology, Graduate School of Frontier Sciences, The University of Tokyo, 5-1-5 Kashiwanoha, Kashiwa, Chiba 277-8568, Japan
|
38
|
Ito JI, Tabei Y, Shimizu K, Tomii K, Tsuda K. PDB-scale analysis of known and putative ligand-binding sites with structural sketches. Proteins 2011; 80:747-63. [PMID: 22113700 DOI: 10.1002/prot.23232] [Citation(s) in RCA: 21] [Impact Index Per Article: 1.6] [Received: 12/30/2010] [Revised: 10/13/2011] [Accepted: 10/18/2011] [Indexed: 11/06/2022]
Abstract
Computational investigation of protein functions is one of the most urgent and demanding tasks in the field of structural bioinformatics. Exhaustive pairwise comparison of known and putative ligand-binding sites, across protein families and folds, is essential in elucidating the biological functions and evolutionary relationships of proteins. Given the vast amounts of data available now, existing 3D structural comparison methods are not adequate due to their computation time complexity. In this article, we propose a new bit string representation of binding sites called structural sketches, which is obtained by random projections of triplet descriptors. It allows us to use ultra-fast all-pair similarity search methods for strings with strictly controlled error rates. Exhaustive comparison of 1.2 million known and putative binding sites finished in ∼30 h on a single core to yield 88 million similar binding site pairs. Careful investigation of 3.5 million pairs verified by TM-align revealed several notable analogous sites across distinct protein families or folds. In particular, we succeeded in finding highly plausible functions of several pockets via strong structural analogies. These results indicate that our method is a promising tool for functional annotation of binding sites derived from structural genomics projects.
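A bit-string sketch obtained by random projections can be illustrated with sign random projections, one common way of turning a real-valued descriptor vector into a short binary code whose Hamming distance tracks the angle between the original vectors. Whether the article uses exactly this scheme is an assumption; the function names, the 32-bit sketch length, and the toy 4-dimensional vectors below are all hypothetical.

```python
import random


def make_projections(dim: int, n_bits: int, seed: int = 0) -> list[list[float]]:
    """Draw n_bits random Gaussian directions in a dim-dimensional space."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(n_bits)]


def sketch(vector: list[float], projections: list[list[float]]) -> int:
    """Pack one bit per projection: 1 if the dot product is non-negative."""
    bits = 0
    for row in projections:
        dot = sum(w * x for w, x in zip(row, vector))
        bits = (bits << 1) | (1 if dot >= 0.0 else 0)
    return bits


def hamming(a: int, b: int) -> int:
    """Number of differing bits between two sketches."""
    return bin(a ^ b).count("1")


if __name__ == "__main__":
    projections = make_projections(dim=4, n_bits=32)
    site_a = [1.0, 0.5, -2.0, 3.0]
    site_b = [1.1, 0.4, -1.9, 3.2]   # small perturbation of site_a
    site_c = [-3.0, 2.0, 1.0, -0.5]  # unrelated direction
    print(hamming(sketch(site_a, projections), sketch(site_b, projections)))
    print(hamming(sketch(site_a, projections), sketch(site_c, projections)))
```

Comparing two sketches costs one XOR and a bit count, which is what makes all-pairs search over millions of sites feasible on modest hardware.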
Affiliation(s)
- Jun-Ichi Ito
- Department of Computational Biology, Graduate School of Frontier Sciences, The University of Tokyo, Kashiwa, Chiba 277-8568, Japan
|
39
|
Merelli I, Cozzi P, D'Agostino D, Clematis A, Milanesi L. Image-based surface matching algorithm oriented to structural biology. IEEE/ACM Trans Comput Biol Bioinform 2011; 8:1004-1016. [PMID: 21566253 DOI: 10.1109/tcbb.2010.21] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.5] [Indexed: 05/30/2023]
Abstract
Emerging technologies for structure matching based on surface descriptions have demonstrated their effectiveness in many research fields. In particular, they can be successfully applied to in silico studies of structural biology. Protein activities, in fact, are related to the external characteristics of these macromolecules and the ability to match surfaces can be important to infer information about their possible functions and interactions. In this work, we present a surface-matching algorithm, based on encoding the outer morphology of proteins in images of local description, which allows us to establish point-to-point correlations among macromolecular surfaces using image-processing functions. Discarding methods relying on biological analysis of atomic structures and expensive computational approaches based on energetic studies, this algorithm can successfully be used for macromolecular recognition by employing local surface features. Results demonstrate that the proposed algorithm can be employed both to identify surface similarities in context of macromolecular functional analysis and to screen possible protein interactions to predict pairing capability.
Affiliation(s)
- Ivan Merelli
- Institute for Biomedical Technologies, Italian National Research Council, Segrate (Milan), Italy.
|
40
|
Abstract
MOTIVATION Identifying the location of binding sites on proteins is of fundamental importance for a wide range of applications, including molecular docking, de novo drug design, structure identification and comparison of functional sites. Here we present Erebus, a web server that searches the entire Protein Data Bank for a given substructure defined by a set of atoms of interest, such as the binding scaffolds for small molecules. The identified substructure contains atoms having the same names, belonging to the same amino acids and separated by the same distances (within a given tolerance) as the atoms of the query structure. The accuracy of a match is measured by the root-mean-square deviation or by the normal weight with a given variance. Tests show that our approach can reliably locate rigid binding scaffolds of drugs and metal ions. AVAILABILITY AND IMPLEMENTATION We provide this service through a web server at http://erebus.dokhlab.org.
Affiliation(s)
- David Shirvanyants
- Department of Biochemistry and Biophysics, University of North Carolina at Chapel Hill, NC 27599-7260, USA
|
41
|
Yin S, Dokholyan NV. Fingerprint-based structure retrieval using electron density. Proteins 2011; 79:1002-9. [PMID: 21287628 DOI: 10.1002/prot.22941] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.6] [Received: 04/13/2010] [Revised: 10/08/2010] [Accepted: 11/05/2010] [Indexed: 12/14/2022]
Abstract
We present a computational approach that can quickly search a large protein structural database to identify structures that fit a given electron density, such as determined by cryo-electron microscopy. We use geometric invariants (fingerprints) constructed using 3D Zernike moments to describe the electron density, and reduce the problem of fitting of the structure to the electron density to simple fingerprint comparison. Using this approach, we are able to screen the entire Protein Data Bank and identify structures that fit two experimental electron densities determined by cryo-electron microscopy.
Affiliation(s)
- Shuangye Yin
- Department of Biochemistry and Biophysics, University of North Carolina at Chapel Hill, Chapel Hill, North Carolina 27599-7260, USA
|
42
|
Martin J. Beauty is in the eye of the beholder: proteins can recognize binding sites of homologous proteins in more than one way. PLoS Comput Biol 2010; 6:e1000821. [PMID: 20585553 PMCID: PMC2887470 DOI: 10.1371/journal.pcbi.1000821] [Citation(s) in RCA: 16] [Impact Index Per Article: 1.1] [Received: 02/05/2010] [Accepted: 05/18/2010] [Indexed: 11/18/2022]
Abstract
Understanding the mechanisms of protein-protein interaction is a fundamental problem with many practical applications. The fact that different proteins can bind similar partners suggests that convergently evolved binding interfaces are reused in different complexes. A set of protein complexes composed of non-homologous domains interacting with homologous partners at equivalent binding sites was collected in 2006, offering an opportunity to investigate this point. We considered 433 pairs of protein-protein complexes from the ABAC database (AB and AC binary protein complexes sharing a homologous partner A) and analyzed the extent of physico-chemical similarity at the atomic and residue level at the protein-protein interface. Homologous partners of the complexes were superimposed using Multiprot, and similar atoms at the interface were quantified using a five class grouping scheme and a distance cut-off. We found that the number of interfacial atoms with similar properties is systematically lower in the non-homologous proteins than in the homologous ones. We assessed the significance of the similarity by bootstrapping the atomic properties at the interfaces. We found that the similarity of binding sites is very significant between homologous proteins, as expected, but generally insignificant between the non-homologous proteins that bind to homologous partners. Furthermore, evolutionarily conserved residues are not colocalized within the binding sites of non-homologous proteins. We could only identify a limited number of cases of structural mimicry at the interface, suggesting that this property is less generic than previously thought. Our results support the hypothesis that different proteins can interact with similar partners using alternate strategies, but do not support convergent evolution.
Affiliation(s)
- Juliette Martin
- Université de Lyon, Lyon, France; Université Lyon 1, IFR 128, CNRS, UMR 5086 Institut de Biologie et Chimie des Protéines (IBCP), Lyon, France.
|
43
|
Weill N, Rognan D. Alignment-free ultra-high-throughput comparison of druggable protein-ligand binding sites. J Chem Inf Model 2010; 50:123-35. [PMID: 20058856 DOI: 10.1021/ci900349y] [Citation(s) in RCA: 95] [Impact Index Per Article: 6.8] [Indexed: 11/28/2022]
Abstract
Inferring the biological function of a protein from its three-dimensional structure as well as explaining why a drug may bind to various targets is of crucial importance to modern drug discovery. Here we present a generic 4833-integer vector describing druggable protein-ligand binding sites that can be applied to any protein and any binding cavity. The fingerprint registers counts of pharmacophoric triplets from the Cα atomic coordinates of binding-site-lining residues. Starting from a customized data set of diverse protein-ligand binding site pairs, the most appropriate metric and a similarity threshold could be defined for similar binding sites. The method (FuzCav) has been used in various scenarios: (i) screening a collection of 6000 binding sites for similarity to different queries; (ii) classifying protein families (serine endopeptidases, protein kinases) by binding site diversity; (iii) discriminating adenine-binding cavities from decoys. The fingerprint generation and comparison support ultra-high throughput (ca. 1000 measures/s), do not require prior alignment of protein binding sites, and are able to detect local similarity among subpockets. The method is thus particularly well suited to the functional annotation of novel genomic structures with low sequence identity to known X-ray templates.
Affiliation(s)
- Nathanaël Weill
- Structural Chemogenomics, Laboratory of Therapeutic Innovation, UMR 7200 CNRS-UdS (Universite de Strasbourg), F-67400 Illkirch, France
|