1
Wang X, Gao X, Fan X, Huai Z, Zhang G, Yao M, Wang T, Huang X, Lai L. WUREN: Whole-modal union representation for epitope prediction. Comput Struct Biotechnol J 2024; 23:2122-2131. [PMID: 38817963 PMCID: PMC11137340 DOI: 10.1016/j.csbj.2024.05.023] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/26/2024] [Revised: 05/14/2024] [Accepted: 05/14/2024] [Indexed: 06/01/2024] Open
Abstract
B-cell epitope identification plays a vital role in the development of vaccines, therapies, and diagnostic tools. Currently, molecular docking tools in B-cell epitope prediction are heavily influenced by empirical parameters and require significant computational resources, rendering a great challenge to meet large-scale prediction demands. When predicting epitopes from antigen-antibody complex, current artificial intelligence algorithms cannot accurately implement the prediction due to insufficient protein feature representations, indicating novel algorithm is desperately needed for efficient protein information extraction. In this paper, we introduce a multimodal model called WUREN (Whole-modal Union Representation for Epitope predictioN), which effectively combines sequence, graph, and structural features. It achieved AUC-PR scores of 0.213 and 0.193 on the solved structures and AlphaFold-generated structures, respectively, for the independent test proteins selected from DiscoTope3 benchmark. Our findings indicate that WUREN is an efficient feature extraction model for protein complexes, with the generalizable application potential in the development of protein-based drugs. Moreover, the streamlined framework of WUREN could be readily extended to model similar biomolecules, such as nucleic acids, carbohydrates, and lipids.
Affiliation(s)
- Xuezhe Fan
  - XtalPi Innovation Center, Beijing, China
- Zhe Huai
  - XtalPi Innovation Center, Beijing, China
- Lipeng Lai
  - XtalPi Innovation Center, Beijing, China
2
Wang C, Wang J, Song W, Luo G, Jiang T. EpiScan: accurate high-throughput mapping of antibody-specific epitopes using sequence information. NPJ Syst Biol Appl 2024; 10:101. [PMID: 39251627 PMCID: PMC11383971 DOI: 10.1038/s41540-024-00432-7] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/14/2023] [Accepted: 08/27/2024] [Indexed: 09/11/2024] Open
Abstract
The identification of antibody-specific epitopes on virus proteins is crucial for vaccine development and drug design. Nonetheless, traditional wet-lab approaches for the identification of epitopes are both costly and labor-intensive, underscoring the need for the development of efficient and cost-effective computational tools. Here, EpiScan, an attention-based deep learning framework for predicting antibody-specific epitopes, is presented. EpiScan adopts a multi-input and single-output strategy by designing independent blocks for different parts of antibodies, including variable heavy chain (VH), variable light chain (VL), complementary determining regions (CDRs), and framework regions (FRs). The block predictions are weighted and integrated for the prediction of potential epitopes. Using multiple experimental data samples, we show that EpiScan, which only uses antibody sequence information, can accurately map epitopes on specific antigen structures. The antibody-specific epitopes on the receptor binding domain (RBD) of SARS coronavirus 2 (SARS-CoV-2) were located by EpiScan, and the potentially valuable vaccine epitope was identified. EpiScan can expedite the epitope mapping process for high-throughput antibody sequencing data, supporting vaccine design and drug development. Availability: For the convenience of related wet-experimental researchers, the source code and web server of EpiScan are publicly available at https://github.com/gzBiomedical/EpiScan .
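The weighting-and-integration step described in the abstract can be illustrated with a minimal sketch. The abstract does not state the weighting scheme, so the function name `integrate_blocks`, the weights, and the per-residue scores below are all hypothetical; this is a plain weighted average, not EpiScan's implementation.

```python
import numpy as np

def integrate_blocks(block_probs, weights):
    """Combine per-residue epitope probabilities from several blocks
    by a weighted average (hypothetical scheme, for illustration only)."""
    names = sorted(block_probs)
    stacked = np.stack([np.asarray(block_probs[n], dtype=float) for n in names])
    w = np.array([weights[n] for n in names], dtype=float)
    w = w / w.sum()  # normalise so the combined score stays in [0, 1]
    return w @ stacked

# Made-up scores for five antigen residues from four antibody-derived blocks.
block_scores = {
    "VH": [0.9, 0.2, 0.1, 0.7, 0.3],
    "VL": [0.8, 0.1, 0.2, 0.6, 0.4],
    "CDR": [0.95, 0.3, 0.1, 0.8, 0.2],
    "FR": [0.5, 0.4, 0.3, 0.5, 0.4],
}
# Assumed weights; in a trained model these would be learned.
block_weights = {"VH": 1.0, "VL": 1.0, "CDR": 2.0, "FR": 0.5}

combined = integrate_blocks(block_scores, block_weights)
print(combined)
```

Normalising the weights keeps the combined value interpretable as a probability-like score whenever each block's output lies in [0, 1].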
Affiliation(s)
- Chuan Wang
  - School of Life Sciences, Sun Yat-sen University, Guangzhou, China
  - Guangzhou National Laboratory, Guangzhou, China
- Wenjun Song
  - Guangzhou National Laboratory, Guangzhou, China
  - Institute of Integration of Traditional and Western Medicine, The First Affiliated Hospital of Guangzhou Medical University, Guangzhou, China
- Guanzheng Luo
  - School of Life Sciences, Sun Yat-sen University, Guangzhou, China
- Taijiao Jiang
  - Guangzhou National Laboratory, Guangzhou, China
  - State Key Laboratory of Respiratory Disease, The Key laboratory of Advanced Interdisciplinary Studies Center, the First Affiliated Hospital of Guangzhou Medical University, Guangzhou, China
3
Nasaev SS, Mukanov AR, Mishkorez IV, Kuznetsov II, Leibin IV, Dolgusheva VA, Pavlyuk GA, Manasyan AL, Veselovsky AV. Molecular Modeling Methods in the Development of Affine and Specific Protein-Binding Agents. BIOCHEMISTRY. BIOKHIMIIA 2024; 89:1451-1473. [PMID: 39245455 DOI: 10.1134/s0006297924080066] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/12/2024] [Revised: 06/12/2024] [Accepted: 07/11/2024] [Indexed: 09/10/2024]
Abstract
High-affinity and specific agents are widely applied in various areas, including diagnostics, scientific research, and disease therapy (as drugs and drug delivery systems). It takes significant time to develop them. For this reason, development of high-affinity agents extensively utilizes computer methods at various stages for the analysis and modeling of these molecules. The review describes the main affinity and specific agents, such as monoclonal antibodies and their fragments, antibody mimetics, aptamers, and molecularly imprinted polymers. The methods of their obtaining as well as their main advantages and disadvantages are briefly described, with special attention focused on the molecular modeling methods used for their analysis and development.
Affiliation(s)
- Artem R Mukanov
  - Research & Development Department, Xelari Ltd., Moscow, 121601, Russia
- Ivan V Mishkorez
  - Research & Development Department, Xelari Ltd., Moscow, 121601, Russia
  - Institute of Biomedical Chemistry, Moscow, 119121, Russia
- Ivan I Kuznetsov
  - Research & Development Department, Xelari Ltd., Moscow, 121601, Russia
- Iosif V Leibin
  - Skolkovo Institute of Science and Technology, Skolkovo Innovation Center, Moscow, 121205, Russia
- Gleb A Pavlyuk
  - Research & Development Department, Xelari Ltd., Moscow, 121601, Russia
- Artem L Manasyan
  - Research & Development Department, Xelari Ltd., Moscow, 121601, Russia
4
Pegoraro M, Dominé C, Rodolà E, Veličković P, Deac A. Geometric epitope and paratope prediction. Bioinformatics 2024; 40:btae405. [PMID: 38984742 PMCID: PMC11245313 DOI: 10.1093/bioinformatics/btae405] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/16/2023] [Revised: 05/14/2024] [Accepted: 07/09/2024] [Indexed: 07/11/2024] Open
Abstract
MOTIVATION: Identifying the binding sites of antibodies is essential for developing vaccines and synthetic antibodies. In this article, we investigate the optimal representation for predicting the binding sites in the two molecules and emphasize the importance of geometric information. RESULTS: Specifically, we compare different geometric deep learning methods applied to proteins' inner (I-GEP) and outer (O-GEP) structures. We incorporate 3D coordinates and spectral geometric descriptors as input features to fully leverage the geometric information. Our research suggests that different geometrical representation information is useful for different tasks. Surface-based models are more efficient in predicting the binding of the epitope, while graph models are better in paratope prediction, both achieving significant performance improvements. Moreover, we analyze the impact of structural changes in antibodies and antigens resulting from conformational rearrangements or reconstruction errors. Through this investigation, we showcase the robustness of geometric deep learning methods and spectral geometric descriptors to such perturbations. AVAILABILITY AND IMPLEMENTATION: The Python code for the models, together with the data and the processing pipeline, is open-source and available at https://github.com/Marco-Peg/GEP.
Affiliation(s)
- Marco Pegoraro
  - Department of Computer Science, Sapienza University of Rome, 00185, Italy
- Clémentine Dominé
  - Gatsby Computational Neuroscience Unit, University College London, W1T 4JG, United Kingdom
- Emanuele Rodolà
  - Department of Computer Science, Sapienza University of Rome, 00185, Italy
- Andreea Deac
  - Département d’informatique et de recherche opérationelle, Université de Montréal, QC H2S 3H1, Canada
5
Jamasb AR, Morehead A, Joshi CK, Zhang Z, Didi K, Mathis S, Harris C, Tang J, Cheng J, Liò P, Blundell TL. Evaluating Representation Learning on the Protein Structure Universe. ARXIV 2024:arXiv:2406.13864v1. [PMID: 38947934 PMCID: PMC11213157] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 07/02/2024]
Abstract
We introduce ProteinWorkshop, a comprehensive benchmark suite for representation learning on protein structures with Geometric Graph Neural Networks. We consider large-scale pre-training and downstream tasks on both experimental and predicted structures to enable the systematic evaluation of the quality of the learned structural representation and their usefulness in capturing functional relationships for downstream tasks. We find that: (1) large-scale pretraining on AlphaFold structures and auxiliary tasks consistently improve the performance of both rotation-invariant and equivariant GNNs, and (2) more expressive equivariant GNNs benefit from pretraining to a greater extent compared to invariant models. We aim to establish a common ground for the machine learning and computational biology communities to rigorously compare and advance protein structure representation learning. Our open-source codebase reduces the barrier to entry for working with large protein structure datasets by providing: (1) storage-efficient dataloaders for large-scale structural databases including AlphaFoldDB and ESM Atlas, as well as (2) utilities for constructing new tasks from the entire PDB. ProteinWorkshop is available at: github.com/a-r-j/ProteinWorkshop.
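The distinction between rotation-invariant and equivariant features mentioned in the abstract can be checked numerically on toy coordinates. The sketch below is not ProteinWorkshop code; the function names and the random "residue" coordinates are assumptions chosen for illustration. Pairwise distances serve as an invariant feature, and centroid-relative positions as an equivariant one.

```python
import numpy as np

def random_rotation(rng):
    """Draw a proper 3D rotation matrix (orthogonal, determinant +1)."""
    q, _ = np.linalg.qr(rng.normal(size=(3, 3)))
    if np.linalg.det(q) < 0:
        q[:, 0] = -q[:, 0]  # flip one axis to turn a reflection into a rotation
    return q

def pairwise_distances(coords):
    """Invariant feature: distances do not change when the structure rotates."""
    diff = coords[:, None, :] - coords[None, :, :]
    return np.linalg.norm(diff, axis=-1)

def centered(coords):
    """Equivariant feature: centroid-relative vectors rotate with the structure."""
    return coords - coords.mean(axis=0)

rng = np.random.default_rng(0)
coords = rng.normal(size=(10, 3))  # toy coordinates for ten residues
rot = random_rotation(rng)
rotated = coords @ rot.T

# Invariance: f(R x) == f(x).  Equivariance: g(R x) == R g(x).
invariant_ok = np.allclose(pairwise_distances(rotated), pairwise_distances(coords))
equivariant_ok = np.allclose(centered(rotated), centered(coords) @ rot.T)
print(invariant_ok, equivariant_ok)
```

An invariant model discards orientation entirely, whereas an equivariant model carries directional information that transforms consistently with the input.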
6
Richardson E, Trevizani R, Greenbaum JA, Carter H, Nielsen M, Peters B. The receiver operating characteristic curve accurately assesses imbalanced datasets. PATTERNS (NEW YORK, N.Y.) 2024; 5:100994. [PMID: 39005487 PMCID: PMC11240176 DOI: 10.1016/j.patter.2024.100994] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 12/01/2023] [Revised: 03/05/2024] [Accepted: 05/03/2024] [Indexed: 07/16/2024]
Abstract
Many problems in biology require looking for a "needle in a haystack," corresponding to a binary classification where there are a few positives within a much larger set of negatives, which is referred to as a class imbalance. The receiver operating characteristic (ROC) curve and the associated area under the curve (AUC) have been reported as ill-suited to evaluate prediction performance on imbalanced problems where there is more interest in performance on the positive minority class, while the precision-recall (PR) curve is preferable. We show via simulation and a real case study that this is a misinterpretation of the difference between the ROC and PR spaces, showing that the ROC curve is robust to class imbalance, while the PR curve is highly sensitive to class imbalance. Furthermore, we show that class imbalance cannot be easily disentangled from classifier performance measured via PR-AUC.
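The claim about ROC and PR behaviour under class imbalance can be reproduced in spirit with a small simulation. This is a minimal sketch and not the authors' simulation: the Gaussian score model, the sample sizes, and the function names are assumptions. The same classifier (identical score distributions for each class) is scored at a 1:1 and a 1:100 class ratio.

```python
import numpy as np

def roc_auc(labels, scores):
    """ROC-AUC via the rank-sum statistic (assumes no tied scores)."""
    labels = np.asarray(labels, dtype=bool)
    scores = np.asarray(scores, dtype=float)
    order = np.argsort(scores)
    ranks = np.empty(len(scores), dtype=float)
    ranks[order] = np.arange(1, len(scores) + 1)
    n_pos = labels.sum()
    n_neg = len(labels) - n_pos
    return (ranks[labels].sum() - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)

def average_precision(labels, scores):
    """Area under the PR curve, estimated as mean precision at each positive."""
    labels = np.asarray(labels, dtype=bool)
    order = np.argsort(-np.asarray(scores, dtype=float))
    hits = labels[order]
    precision_at_k = np.cumsum(hits) / np.arange(1, len(hits) + 1)
    return precision_at_k[hits].mean()

def simulate(n_pos, n_neg, seed=0):
    """Positives score ~ N(1, 1), negatives ~ N(0, 1): classifier quality is fixed."""
    rng = np.random.default_rng(seed)
    pos = rng.normal(1.0, 1.0, n_pos)
    neg = rng.normal(0.0, 1.0, n_neg)
    scores = np.concatenate([pos, neg])
    labels = np.concatenate([np.ones(n_pos, dtype=bool), np.zeros(n_neg, dtype=bool)])
    return roc_auc(labels, scores), average_precision(labels, scores)

balanced = simulate(2000, 2000)      # 1:1 class ratio
imbalanced = simulate(2000, 200000)  # 1:100 class ratio
print("ROC-AUC:", balanced[0], imbalanced[0])
print("PR-AUC: ", balanced[1], imbalanced[1])
```

Under these assumptions the ROC-AUC stays roughly constant across the two ratios while the PR-AUC falls sharply, which is the behaviour the abstract describes.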
Affiliation(s)
- Eve Richardson
  - Center for Infectious Disease and Vaccine Research, La Jolla Institute for Immunology, La Jolla, CA, USA
- Raphael Trevizani
  - Center for Infectious Disease and Vaccine Research, La Jolla Institute for Immunology, La Jolla, CA, USA
  - Fiocruz Ceará, Fundação Oswaldo Cruz, Rua São José s/n, Precabura, Eusébio/CE, Brazil
- Jason A Greenbaum
  - Center for Infectious Disease and Vaccine Research, La Jolla Institute for Immunology, La Jolla, CA, USA
- Hannah Carter
  - Department of Medicine, University of California, La Jolla, CA, USA
- Morten Nielsen
  - Department of Health Technology, Section for Bioinformatics, Technical University of Denmark, Lyngby, Denmark
- Bjoern Peters
  - Center for Infectious Disease and Vaccine Research, La Jolla Institute for Immunology, La Jolla, CA, USA
7
Joubbi S, Micheli A, Milazzo P, Maccari G, Ciano G, Cardamone D, Medini D. Antibody design using deep learning: from sequence and structure design to affinity maturation. Brief Bioinform 2024; 25:bbae307. [PMID: 38960409 PMCID: PMC11221890 DOI: 10.1093/bib/bbae307] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/03/2024] [Revised: 05/20/2024] [Accepted: 06/12/2024] [Indexed: 07/05/2024] Open
Abstract
Deep learning has achieved impressive results in various fields such as computer vision and natural language processing, making it a powerful tool in biology. Its applications now encompass cellular image classification, genomic studies and drug discovery. While drug development traditionally focused deep learning applications on small molecules, recent innovations have incorporated it in the discovery and development of biological molecules, particularly antibodies. Researchers have devised novel techniques to streamline antibody development, combining in vitro and in silico methods. In particular, computational power expedites lead candidate generation, scaling and potential antibody development against complex antigens. This survey highlights significant advancements in protein design and optimization, specifically focusing on antibodies. This includes various aspects such as design, folding, antibody-antigen interactions docking and affinity maturation.
Affiliation(s)
- Sara Joubbi
  - Department of Computer Science, University of Pisa, Largo B. Pontecorvo, 3, 56127, Pisa, Italy
  - Data Science for Health (DaScH) Lab, Fondazione Toscana Life Sciences, Via Fiorentina, 1, 53100, Siena, Italy
- Alessio Micheli
  - Department of Computer Science, University of Pisa, Largo B. Pontecorvo, 3, 56127, Pisa, Italy
- Paolo Milazzo
  - Department of Computer Science, University of Pisa, Largo B. Pontecorvo, 3, 56127, Pisa, Italy
- Giuseppe Maccari
  - Data Science for Health (DaScH) Lab, Fondazione Toscana Life Sciences, Via Fiorentina, 1, 53100, Siena, Italy
- Giorgio Ciano
  - Data Science for Health (DaScH) Lab, Fondazione Toscana Life Sciences, Via Fiorentina, 1, 53100, Siena, Italy
- Dario Cardamone
  - Data Science for Health (DaScH) Lab, Fondazione Toscana Life Sciences, Via Fiorentina, 1, 53100, Siena, Italy
- Duccio Medini
  - Data Science for Health (DaScH) Lab, Fondazione Toscana Life Sciences, Via Fiorentina, 1, 53100, Siena, Italy
8
Jia P, Zhang F, Wu C, Li M. A comprehensive review of protein-centric predictors for biomolecular interactions: from proteins to nucleic acids and beyond. Brief Bioinform 2024; 25:bbae162. [PMID: 38739759 PMCID: PMC11089422 DOI: 10.1093/bib/bbae162] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/01/2024] [Revised: 02/17/2024] [Accepted: 03/31/2024] [Indexed: 05/16/2024] Open
Abstract
Proteins interact with diverse ligands to perform a large number of biological functions, such as gene expression and signal transduction. Accurate identification of these protein-ligand interactions is crucial to the understanding of molecular mechanisms and the development of new drugs. However, traditional biological experiments are time-consuming and expensive. With the development of high-throughput technologies, an increasing amount of protein data is available. In the past decades, many computational methods have been developed to predict protein-ligand interactions. Here, we review a comprehensive set of over 160 protein-ligand interaction predictors, which cover protein-protein, protein-nucleic acid, protein-peptide and protein-other ligands (nucleotide, heme, ion) interactions. We have carried out a comprehensive analysis of the above four types of predictors from several significant perspectives, including their inputs, feature profiles, models, availability, etc. The current methods primarily rely on protein sequences, especially utilizing evolutionary information. The significant improvement in predictions is attributed to deep learning methods. Additionally, sequence-based pretrained models and structure-based approaches are emerging as new trends.
Affiliation(s)
- Pengzhen Jia
  - School of Computer Science and Engineering, Central South University, 932 Lushan Road(S), Changsha 410083, China
- Fuhao Zhang
  - School of Computer Science and Engineering, Central South University, 932 Lushan Road(S), Changsha 410083, China
  - College of Information Engineering, Northwest A&F University, No. 3 Taicheng Road, Yangling, Shaanxi 712100, China
- Chaojin Wu
  - School of Computer Science and Engineering, Central South University, 932 Lushan Road(S), Changsha 410083, China
- Min Li
  - School of Computer Science and Engineering, Central South University, 932 Lushan Road(S), Changsha 410083, China
9
Chu L, Ruffolo JA, Harmalkar A, Gray JJ. Flexible protein-protein docking with a multitrack iterative transformer. Protein Sci 2024; 33:e4862. [PMID: 38148272 PMCID: PMC10804679 DOI: 10.1002/pro.4862] [Citation(s) in RCA: 3] [Impact Index Per Article: 3.0] [Received: 07/09/2023] [Revised: 11/17/2023] [Accepted: 12/06/2023] [Indexed: 12/28/2023]
Abstract
Conventional protein-protein docking algorithms usually rely on heavy candidate sampling and reranking, but these steps are time-consuming and hinder applications that require high-throughput complex structure prediction, for example, structure-based virtual screening. Existing deep learning methods for protein-protein docking, despite being much faster, suffer from low docking success rates. In addition, they simplify the problem to assume no conformational changes within any protein upon binding (rigid docking). This assumption precludes applications when binding-induced conformational changes play a role, such as allosteric inhibition or docking from uncertain unbound model structures. To address these limitations, we present GeoDock, a multitrack iterative transformer network to predict a docked structure from separate docking partners. Unlike deep learning models for protein structure prediction that input multiple sequence alignments, GeoDock inputs just the sequences and structures of the docking partners, which suits the tasks when the individual structures are given. GeoDock is flexible at the protein residue level, allowing the prediction of conformational changes upon binding. On the Database of Interacting Protein Structures (DIPS) test set, GeoDock achieves a 43% top-1 success rate, outperforming all other tested methods. However, in the standard DIPS train/test splits, we discovered contamination of close homologs in the training set. After decontaminating the training set, the success rate is 31%. On the DB5.5 test set and a benchmark dataset of antibody-antigen complexes, GeoDock outperforms the deep learning models trained using the same dataset but falls behind most of the conventional methods and AlphaFold-Multimer. GeoDock attains an average inference speed of under 1 s on a single GPU, enabling its application in large-scale structure screening. 
Although binding-induced conformational changes are still a challenge owing to limited training and evaluation data, our architecture sets up the foundation to capture this backbone flexibility. Code and a demonstration Jupyter notebook are available at https://github.com/Graylab/GeoDock.
Affiliation(s)
- Lee-Shin Chu
  - Department of Chemical and Biomolecular Engineering, Johns Hopkins University, Baltimore, Maryland, USA
- Jeffrey A. Ruffolo
  - Program in Molecular Biophysics, Johns Hopkins University, Baltimore, Maryland, USA
- Ameya Harmalkar
  - Department of Chemical and Biomolecular Engineering, Johns Hopkins University, Baltimore, Maryland, USA
- Jeffrey J. Gray
  - Department of Chemical and Biomolecular Engineering, Johns Hopkins University, Baltimore, Maryland, USA
  - Program in Molecular Biophysics, Johns Hopkins University, Baltimore, Maryland, USA
10
Vottero P, Olivetti EC, D'Agostino LC, Di Grazia L, Vezzetti E, Aminpour M, Tuszynski JA, Marcolin F. Understanding the contagiousness of Covid-19 strains: A geometric approach. J Mol Graph Model 2024; 126:108670. [PMID: 37984193 DOI: 10.1016/j.jmgm.2023.108670] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/30/2023] [Revised: 11/06/2023] [Accepted: 11/07/2023] [Indexed: 11/22/2023]
Abstract
Protein-protein interaction occurs on surface patches with some degree of complementary geometric and chemical features. Building on this understanding, this study endeavors to characterize the spike protein of the SARS-CoV-2 virus at the morphological and geometrical levels in its Alpha, Delta, and Omicron variants. In particular, the affinity between different SARS-CoV-2 spike proteins and the ACE2 receptor present on the membrane of the human respiratory system cells is investigated. To achieve an adequate degree of geometrical accuracy, the 3D depth maps of the proteins in exam are filtered by developing an ad-hoc convolutional filter with a kernel implemented as a sphere of varying radius, simulating a ball rolling on the surface (similar to the 'rolling ball' filter). This ball ideally models a hypothetical molecule that could interface with the protein and is inspired by the geometric approach to macromolecule-ligand interactions proposed by Kuntz et al. in 1982. The aim is to mitigate the imperfections and to obtain a smoother surface that could be studied from a geometrical perspective for binding purposes. A set of geometric descriptors, borrowed from the 3D face analysis context is then mapped point-by-point onto protein depth maps. Following a feature extraction phase inspired by Histogram of Oriented Gradients and Local Binary Patterns, the final histogram features are used as input for a Support Vector Machine classifier to automatically classify the proteins according to their surface affinity, where a similarity in shape is observed between ACE2 and the spike protein of the SARS-CoV-2 Omicron variant. Finally, Root Mean Square Error analysis is used to quantify the geometrical affinity between the ACE2 receptor and the respective Receptor Binding Domains of the three SARS-CoV-2 variants, culminating in a geometrical explanation for the higher contagiousness of Omicron relative to the other variants under study.
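Two ingredients named in the abstract, a sphere-shaped filter kernel applied to a depth map and a Root Mean Square Error comparison between surfaces, can be sketched on synthetic data. This is only an illustration under stated assumptions: the kernel here is a normalised hemisphere-height weighting applied by ordinary convolution, which approximates but is not the authors' ad-hoc filter (a classical rolling-ball filter is a morphological operation), and the depth maps are synthetic rather than protein surfaces. All function names are hypothetical.

```python
import numpy as np

def sphere_kernel(radius):
    """Weights follow the height of a hemisphere of the given pixel radius,
    normalised to sum to 1 (an assumed stand-in for a sphere-shaped kernel)."""
    r = int(radius)
    y, x = np.mgrid[-r:r + 1, -r:r + 1]
    height_sq = radius ** 2 - x ** 2 - y ** 2
    kernel = np.sqrt(np.clip(height_sq, 0, None))
    return kernel / kernel.sum()

def smooth(depth, radius):
    """Convolve a 2D depth map with the sphere kernel, using edge padding."""
    depth = np.asarray(depth, dtype=float)
    kernel = sphere_kernel(radius)
    r = kernel.shape[0] // 2
    padded = np.pad(depth, r, mode="edge")
    out = np.zeros_like(depth)
    rows, cols = depth.shape
    for dy in range(kernel.shape[0]):
        for dx in range(kernel.shape[1]):
            out += kernel[dy, dx] * padded[dy:dy + rows, dx:dx + cols]
    return out

def rmse(a, b):
    """Root mean square error between two depth maps of equal shape."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    return float(np.sqrt(np.mean((a - b) ** 2)))

# Synthetic smooth surface plus noise standing in for surface imperfections.
rng = np.random.default_rng(0)
yy, xx = np.mgrid[0:64, 0:64]
clean = np.sin(xx / 10.0) + np.cos(yy / 12.0)
noisy = clean + rng.normal(0.0, 0.2, clean.shape)
smoothed = smooth(noisy, radius=3)

print("RMSE noisy vs clean:   ", rmse(noisy, clean))
print("RMSE smoothed vs clean:", rmse(smoothed, clean))
```

A larger radius smooths more aggressively, which mirrors the idea of probing the surface with a larger hypothetical ligand, at the cost of blurring fine geometric detail.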
Affiliation(s)
- Paola Vottero
  - Department of Biomedical Engineering, University of Alberta, Edmonton, AB, T6G 2V2, Canada
- Elena Carlotta Olivetti
  - Department of Management and Production Engineering, Politecnico di Torino, Corso Duca degli Abruzzi 24, 10129, Turin, Italy
- Lucia Chiara D'Agostino
  - Department of Management and Production Engineering, Politecnico di Torino, Corso Duca degli Abruzzi 24, 10129, Turin, Italy
- Luca Di Grazia
  - Department of Computer Science, University of Stuttgart, Universitätsstr. 38, 70569, Stuttgart, Germany
- Enrico Vezzetti
  - Department of Management and Production Engineering, Politecnico di Torino, Corso Duca degli Abruzzi 24, 10129, Turin, Italy
- Maral Aminpour
  - Department of Biomedical Engineering, University of Alberta, Edmonton, AB, T6G 2V2, Canada
- Jacek Adam Tuszynski
  - Department of Physics, University of Alberta, Edmonton, AB, T6G 2H7, Canada
  - Department of Mechanical and Aerospace Engineering, Politecnico di Torino, Corso Duca degli Abruzzi 24, 10129, Turin, Italy
  - Department of Data Science and Engineering, The Silesian University of Technology, Gliwice, Poland
- Federica Marcolin
  - Department of Management and Production Engineering, Politecnico di Torino, Corso Duca degli Abruzzi 24, 10129, Turin, Italy
11
Khakzad H, Igashov I, Schneuing A, Goverde C, Bronstein M, Correia B. A new age in protein design empowered by deep learning. Cell Syst 2023; 14:925-939. [PMID: 37972559 DOI: 10.1016/j.cels.2023.10.006] [Citation(s) in RCA: 10] [Impact Index Per Article: 10.0] [Received: 03/02/2023] [Revised: 06/22/2023] [Accepted: 10/11/2023] [Indexed: 11/19/2023]
Abstract
The rapid progress in the field of deep learning has had a significant impact on protein design. Deep learning methods have recently produced a breakthrough in protein structure prediction, leading to the availability of high-quality models for millions of proteins. Along with novel architectures for generative modeling and sequence analysis, they have revolutionized the protein design field in the past few years remarkably by improving the accuracy and ability to identify novel protein sequences and structures. Deep neural networks can now learn and extract the fundamental features of protein structures, predict how they interact with other biomolecules, and have the potential to create new effective drugs for treating disease. As their applicability in protein design is rapidly growing, we review the recent developments and technology in deep learning methods and provide examples of their performance to generate novel functional proteins.
Affiliation(s)
- Hamed Khakzad
  - Université de Lorraine, CNRS, Inria, LORIA, 54000 Nancy, France
  - École Polytechnique Fédérale de Lausanne, Lausanne, Switzerland
  - Swiss Institute of Bioinformatics (SIB), Lausanne, Switzerland
- Ilia Igashov
  - École Polytechnique Fédérale de Lausanne, Lausanne, Switzerland
  - Swiss Institute of Bioinformatics (SIB), Lausanne, Switzerland
- Arne Schneuing
  - École Polytechnique Fédérale de Lausanne, Lausanne, Switzerland
  - Swiss Institute of Bioinformatics (SIB), Lausanne, Switzerland
- Casper Goverde
  - École Polytechnique Fédérale de Lausanne, Lausanne, Switzerland
  - Swiss Institute of Bioinformatics (SIB), Lausanne, Switzerland
- Bruno Correia
  - École Polytechnique Fédérale de Lausanne, Lausanne, Switzerland
  - Swiss Institute of Bioinformatics (SIB), Lausanne, Switzerland
12
Mou M, Pan Z, Zhou Z, Zheng L, Zhang H, Shi S, Li F, Sun X, Zhu F. A Transformer-Based Ensemble Framework for the Prediction of Protein-Protein Interaction Sites. RESEARCH (WASHINGTON, D.C.) 2023; 6:0240. [PMID: 37771850 PMCID: PMC10528219 DOI: 10.34133/research.0240] [Citation(s) in RCA: 26] [Impact Index Per Article: 26.0] [Received: 08/02/2023] [Accepted: 09/08/2023] [Indexed: 09/30/2023]
Abstract
The identification of protein-protein interaction (PPI) sites is essential in the research of protein function and the discovery of new drugs. So far, a variety of computational tools based on machine learning have been developed to accelerate the identification of PPI sites. However, existing methods suffer from the low predictive accuracy or the limited scope of application. Specifically, some methods learned only global or local sequential features, leading to low predictive accuracy, while others achieved improved performance by extracting residue interactions from structures but were limited in their application scope for the serious dependence on precise structure information. There is an urgent need to develop a method that integrates comprehensive information to realize proteome-wide accurate profiling of PPI sites. Herein, a novel ensemble framework for PPI sites prediction, EnsemPPIS, was therefore proposed based on transformer and gated convolutional networks. EnsemPPIS can effectively capture not only global and local patterns but also residue interactions. Specifically, EnsemPPIS was unique in (a) extracting residue interactions from protein sequences with transformer and (b) further integrating global and local sequential features with the ensemble learning strategy. Compared with various existing methods, EnsemPPIS exhibited either superior performance or broader applicability on multiple PPI sites prediction tasks. Moreover, pattern analysis based on the interpretability of EnsemPPIS demonstrated that EnsemPPIS was fully capable of learning residue interactions within the local structure of PPI sites using only sequence information. The web server of EnsemPPIS is freely available at http://idrblab.org/ensemppis.
Affiliation(s)
- Minjie Mou
- College of Pharmaceutical Sciences, The Second Affiliated Hospital, Zhejiang University School of Medicine, National Key Laboratory of Advanced Drug Delivery and Release Systems, Zhejiang University, Hangzhou 310058, China
| | - Ziqi Pan
- College of Pharmaceutical Sciences, The Second Affiliated Hospital, Zhejiang University School of Medicine, National Key Laboratory of Advanced Drug Delivery and Release Systems, Zhejiang University, Hangzhou 310058, China
| | - Zhimeng Zhou
- College of Pharmaceutical Sciences, The Second Affiliated Hospital, Zhejiang University School of Medicine, National Key Laboratory of Advanced Drug Delivery and Release Systems, Zhejiang University, Hangzhou 310058, China
| | - Lingyan Zheng
- College of Pharmaceutical Sciences, The Second Affiliated Hospital, Zhejiang University School of Medicine, National Key Laboratory of Advanced Drug Delivery and Release Systems, Zhejiang University, Hangzhou 310058, China
| | - Hanyu Zhang
- College of Pharmaceutical Sciences, The Second Affiliated Hospital, Zhejiang University School of Medicine, National Key Laboratory of Advanced Drug Delivery and Release Systems, Zhejiang University, Hangzhou 310058, China
| | - Shuiyang Shi
- College of Pharmaceutical Sciences, The Second Affiliated Hospital, Zhejiang University School of Medicine, National Key Laboratory of Advanced Drug Delivery and Release Systems, Zhejiang University, Hangzhou 310058, China
| | - Fengcheng Li
- College of Pharmaceutical Sciences, The Second Affiliated Hospital, Zhejiang University School of Medicine, National Key Laboratory of Advanced Drug Delivery and Release Systems, Zhejiang University, Hangzhou 310058, China
| | - Xiuna Sun
- College of Pharmaceutical Sciences, The Second Affiliated Hospital, Zhejiang University School of Medicine, National Key Laboratory of Advanced Drug Delivery and Release Systems, Zhejiang University, Hangzhou 310058, China
| | - Feng Zhu
- College of Pharmaceutical Sciences, The Second Affiliated Hospital, Zhejiang University School of Medicine, National Key Laboratory of Advanced Drug Delivery and Release Systems, Zhejiang University, Hangzhou 310058, China
- Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, Alibaba-Zhejiang University Joint Research Center of Future Digital Healthcare, Hangzhou 330110, China
| |
|
13
|
Ghoreyshi ZS, George JT. Quantitative approaches for decoding the specificity of the human T cell repertoire. Front Immunol 2023; 14:1228873. [PMID: 37781387 PMCID: PMC10539903 DOI: 10.3389/fimmu.2023.1228873] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 05/25/2023] [Accepted: 08/17/2023] [Indexed: 10/03/2023] Open
Abstract
T cell receptor (TCR)-peptide-major histocompatibility complex (pMHC) interactions play a vital role in initiating immune responses against pathogens, and the specificity of TCR-pMHC interactions is crucial for developing optimized therapeutic strategies. The advent of high-throughput immunological and structural evaluation of TCR and pMHC has provided an abundance of data for computational approaches that aim to predict favorable TCR-pMHC interactions. Current models are constructed using information on protein sequence, structures, or a combination of both, and utilize a variety of statistical learning-based approaches for identifying the rules governing specificity. This review examines the current theoretical, computational, and deep learning approaches for identifying TCR-pMHC recognition pairs, placing emphasis on each method's mathematical approach, predictive performance, and limitations.
Affiliation(s)
- Zahra S. Ghoreyshi
- Department of Biomedical Engineering, Texas A&M University, College Station, TX, United States
| | - Jason T. George
- Department of Biomedical Engineering, Texas A&M University, College Station, TX, United States
- Engineering Medicine Program, Texas A&M University, Houston, TX, United States
- Center for Theoretical Biological Physics, Rice University, Houston, TX, United States
| |
|
14
|
Guarra F, Colombo G. Computational Methods in Immunology and Vaccinology: Design and Development of Antibodies and Immunogens. J Chem Theory Comput 2023; 19:5315-5333. [PMID: 37527403 PMCID: PMC10448727 DOI: 10.1021/acs.jctc.3c00513] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/17/2023] [Indexed: 08/03/2023]
Abstract
The design of new biomolecules able to harness immune mechanisms for the treatment of diseases is a prime challenge for computational and simulative approaches. For instance, in recent years, antibodies have emerged as an important class of therapeutics against a spectrum of pathologies. In cancer, immune-inspired approaches are witnessing a surge thanks to a better understanding of tumor-associated antigens and the mechanisms of their engagement or evasion from the human immune system. Here, we provide a summary of the main state-of-the-art computational approaches that are used to design antibodies and antigens, and in parallel, we review key methodologies for epitope identification for both B- and T-cell mediated responses. A special focus is devoted to the description of structure- and physics-based models, privileged over purely sequence-based approaches. We discuss the implications of novel methods in engineering biomolecules with tailored immunological properties for possible therapeutic uses. Finally, we highlight the extraordinary challenges and opportunities presented by the possible integration of structure- and physics-based methods with emerging Artificial Intelligence technologies for the prediction and design of novel antigens, epitopes, and antibodies.
Affiliation(s)
- Federica Guarra
- Department of Chemistry, University of Pavia, Via Taramelli 12, 27100 Pavia, Italy
| | - Giorgio Colombo
- Department of Chemistry, University of Pavia, Via Taramelli 12, 27100 Pavia, Italy
| |
|
15
|
Roche R, Moussad B, Shuvo MH, Bhattacharya D. E(3) equivariant graph neural networks for robust and accurate protein-protein interaction site prediction. PLoS Comput Biol 2023; 19:e1011435. [PMID: 37651442 PMCID: PMC10499216 DOI: 10.1371/journal.pcbi.1011435] [Citation(s) in RCA: 3] [Impact Index Per Article: 3.0] [Received: 03/10/2023] [Revised: 09/13/2023] [Accepted: 08/15/2023] [Indexed: 09/02/2023] Open
Abstract
Artificial intelligence-powered protein structure prediction methods have led to a paradigm-shift in computational structural biology, yet contemporary approaches for predicting the interfacial residues (i.e., sites) of protein-protein interaction (PPI) still rely on experimental structures. Recent studies have demonstrated benefits of employing graph convolution for PPI site prediction, but ignore symmetries naturally occurring in 3-dimensional space and act only on experimental coordinates. Here we present EquiPPIS, an E(3) equivariant graph neural network approach for PPI site prediction. EquiPPIS employs symmetry-aware graph convolutions that transform equivariantly with translation, rotation, and reflection in 3D space, providing richer representations for molecular data compared to invariant convolutions. EquiPPIS substantially outperforms state-of-the-art approaches based on the same experimental input, and exhibits remarkable robustness by attaining better accuracy with predicted structural models from AlphaFold2 than what existing methods can achieve even with experimental structures. Freely available at https://github.com/Bhattacharya-Lab/EquiPPIS, EquiPPIS enables accurate PPI site prediction at scale.
Affiliation(s)
- Rahmatullah Roche
- Department of Computer Science, Virginia Tech, Blacksburg, Virginia, United States of America
| | - Bernard Moussad
- Department of Computer Science, Virginia Tech, Blacksburg, Virginia, United States of America
| | - Md Hossain Shuvo
- Department of Computer Science, Virginia Tech, Blacksburg, Virginia, United States of America
| | - Debswapna Bhattacharya
- Department of Computer Science, Virginia Tech, Blacksburg, Virginia, United States of America
| |
|
16
|
Sunny S, Prakash PB, Gopakumar G, Jayaraj PB. DeepBindPPI: Protein-Protein Binding Site Prediction Using Attention Based Graph Convolutional Network. Protein J 2023; 42:276-287. [PMID: 37198346 PMCID: PMC10191823 DOI: 10.1007/s10930-023-10121-9] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Accepted: 04/25/2023] [Indexed: 05/19/2023]
Abstract
Because protein-protein interactions are central to the body's defence mechanisms, considerable effort has gone into characterizing their attributes, including binding affinity and binding region. Contemporary strategies for binding site prediction rely largely on deep learning techniques but have turned out to be low-precision models. Because laboratory experiments for drug discovery use this information, increased false positives devalue the computational methods. This emphasizes the need to develop enhanced strategies. DeepBindPPI employs deep learning to predict the binding regions of proteins, particularly antigen-antibody interaction sites. The results obtained are applied in a docking environment to confirm their correctness. An integration of a graph convolutional network with an attention mechanism predicts interacting amino acids with improved precision. The model learns the determining factors in interaction from a general pool of proteins and is then fine-tuned using antigen-antibody data. Comparison of the proposed method with existing techniques shows that the developed model has comparable performance. The use of a separate spatial network improved the precision of the proposed method from 0.4 to 0.5. An attempt to utilize the interface information for docking using the HDOCK server gives promising results, with high-quality structures appearing in the top 10 ranks.
Affiliation(s)
- Sharon Sunny
- Department of CSE, National Institute of Technology, Calicut, Kerala 673601 India
| | | | - G. Gopakumar
- Department of CSE, National Institute of Technology, Calicut, Kerala 673601 India
| | - P. B. Jayaraj
- Department of CSE, National Institute of Technology, Calicut, Kerala 673601 India
| |
|
17
|
Dewey JA, Delalande C, Azizi SA, Lu V, Antonopoulos D, Babnigg G. Molecular Glue Discovery: Current and Future Approaches. J Med Chem 2023; 66:9278-9296. [PMID: 37437222 PMCID: PMC10805529 DOI: 10.1021/acs.jmedchem.3c00449] [Citation(s) in RCA: 21] [Impact Index Per Article: 21.0] [Indexed: 07/14/2023]
Abstract
The intracellular interactions of biomolecules can be maneuvered to redirect signaling, reprogram the cell cycle, or decrease infectivity using only a few dozen atoms. Such "molecular glues," which can drive both novel and known interactions between protein partners, represent an enticing therapeutic strategy. Here, we review the methods and approaches that have led to the identification of small-molecule molecular glues. We first classify current FDA-approved molecular glues to facilitate the selection of discovery methods. We then survey two broad discovery method strategies, where we highlight the importance of factors such as experimental conditions, software packages, and genetic tools for success. We hope that this curation of methodologies for directed discovery will inspire diverse research efforts targeting a multitude of human diseases.
Affiliation(s)
- Jeffrey A Dewey
- Biosciences Division, Argonne National Laboratory, Lemont, Illinois 60439, United States
| | - Clémence Delalande
- Department of Chemistry, University of Chicago, Chicago, Illinois 60637, United States
| | - Saara-Anne Azizi
- Pritzker School of Medicine, University of Chicago, Chicago, Illinois 60637, United States
| | - Vivian Lu
- Department of Chemistry, University of Chicago, Chicago, Illinois 60637, United States
| | - Dionysios Antonopoulos
- Biosciences Division, Argonne National Laboratory, Lemont, Illinois 60439, United States
| | - Gyorgy Babnigg
- Biosciences Division, Argonne National Laboratory, Lemont, Illinois 60439, United States
| |
|
18
|
Lee M. Recent Advances in Deep Learning for Protein-Protein Interaction Analysis: A Comprehensive Review. Molecules 2023; 28:5169. [PMID: 37446831 DOI: 10.3390/molecules28135169] [Citation(s) in RCA: 12] [Impact Index Per Article: 12.0] [Received: 05/30/2023] [Revised: 06/30/2023] [Accepted: 06/30/2023] [Indexed: 07/15/2023] Open
Abstract
Deep learning, a potent branch of artificial intelligence, is steadily leaving its transformative imprint across multiple disciplines. Within computational biology, it is expediting progress in the understanding of Protein-Protein Interactions (PPIs), key components governing a wide array of biological functionalities. Hence, an in-depth exploration of PPIs is crucial for decoding the intricate biological system dynamics and unveiling potential avenues for therapeutic interventions. As the deployment of deep learning techniques in PPI analysis proliferates at an accelerated pace, there exists an immediate demand for an exhaustive review that encapsulates and critically assesses these novel developments. Addressing this requirement, this review offers a detailed analysis of the literature from 2021 to 2023, highlighting the cutting-edge deep learning methodologies harnessed for PPI analysis. Thus, this review stands as a crucial reference for researchers in the discipline, presenting an overview of the recent studies in the field. This consolidation helps elucidate the dynamic paradigm of PPI analysis, the evolution of deep learning techniques, and their interdependent dynamics. This scrutiny is expected to serve as a vital aid for researchers, both well-established and newcomers, assisting them in maneuvering the rapidly shifting terrain of deep learning applications in PPI analysis.
Affiliation(s)
- Minhyeok Lee
- School of Electrical and Electronics Engineering, Chung-Ang University, Seoul 06974, Republic of Korea
| |
|
19
|
Chu LS, Ruffolo JA, Harmalkar A, Gray JJ. Flexible Protein-Protein Docking with a Multi-Track Iterative Transformer. bioRxiv 2023:2023.06.29.547134. [PMID: 37425754 PMCID: PMC10327054 DOI: 10.1101/2023.06.29.547134] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 07/11/2023]
Abstract
Conventional protein-protein docking algorithms usually rely on heavy candidate sampling and re-ranking, but these steps are time-consuming and hinder applications that require high-throughput complex structure prediction, e.g., structure-based virtual screening. Existing deep learning methods for protein-protein docking, despite being much faster, suffer from low docking success rates. In addition, they simplify the problem to assume no conformational changes within any protein upon binding (rigid docking). This assumption precludes applications when binding-induced conformational changes play a role, such as allosteric inhibition or docking from uncertain unbound model structures. To address these limitations, we present GeoDock, a multi-track iterative transformer network to predict a docked structure from separate docking partners. Unlike deep learning models for protein structure prediction that input multiple sequence alignments (MSAs), GeoDock inputs just the sequences and structures of the docking partners, which suits the tasks when the individual structures are given. GeoDock is flexible at the protein residue level, allowing the prediction of conformational changes upon binding. For a benchmark set of rigid targets, GeoDock obtains a 41% success rate, outperforming all the other tested methods. For a more challenging benchmark set of flexible targets, GeoDock achieves a similar number of top-model successes as the traditional method ClusPro [1], but fewer than ReplicaDock2 [2]. GeoDock attains an average inference speed of under one second on a single GPU, enabling its application in large-scale structure screening. Although binding-induced conformational changes are still a challenge owing to limited training and evaluation data, our architecture sets up the foundation to capture this backbone flexibility. Code and a demonstration Jupyter notebook are available at https://github.com/Graylab/GeoDock.
Affiliation(s)
- Lee-Shin Chu
- Department of Chemical and Biomolecular Engineering, Johns Hopkins University, Baltimore, MD 21218, USA
| | - Jeffrey A Ruffolo
- Program in Molecular Biophysics, Johns Hopkins University, Baltimore, MD 21218, USA
| | - Ameya Harmalkar
- Department of Chemical and Biomolecular Engineering, Johns Hopkins University, Baltimore, MD 21218, USA
| | - Jeffrey J Gray
- Department of Chemical and Biomolecular Engineering, Johns Hopkins University, Baltimore, MD 21218, USA
- Program in Molecular Biophysics, Johns Hopkins University, Baltimore, MD 21218, USA
| |
|
20
|
Choi J. Narrow funnel-like interaction energy distribution is an indicator of specific protein interaction partner. iScience 2023; 26:106911. [PMID: 37305691 PMCID: PMC10250834 DOI: 10.1016/j.isci.2023.106911] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/14/2023] [Revised: 04/28/2023] [Accepted: 05/12/2023] [Indexed: 06/13/2023] Open
Abstract
Protein interaction networks underlie countless biological mechanisms. However, most protein interaction predictions are based either on biological evidence, which is biased toward well-known protein interactions, or on physical evidence, which exhibits low accuracy for weak interactions and requires high computational power. In this study, a novel method is proposed to predict protein interaction partners by investigating narrow funnel-like interaction energy distributions. It is demonstrated that various protein interactions, including those of kinases and E3 ubiquitin ligases, have narrow funnel-like interaction energy distributions. To analyze the protein interaction distribution, modified iRMS and TM-score scores are introduced. Using these scores, an algorithm and a deep learning model for predicting protein interaction partners and the substrates of kinases and E3 ubiquitin ligases were developed. The prediction accuracy was similar to or even better than that of yeast two-hybrid screening. Ultimately, this knowledge-free protein interaction prediction method will broaden our understanding of protein interaction networks.
Affiliation(s)
- Juyoung Choi
- Department of Life Science, Sogang University, Seoul 04017, South Korea
| |
|
21
|
Shuvo MH, Karim M, Roche R, Bhattacharya D. PIQLE: protein-protein interface quality estimation by deep graph learning of multimeric interaction geometries. Bioinform Adv 2023; 3:vbad070. [PMID: 37351310 PMCID: PMC10281963 DOI: 10.1093/bioadv/vbad070] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/22/2023] [Revised: 05/17/2023] [Accepted: 06/01/2023] [Indexed: 06/24/2023]
Abstract
Motivation: Accurate modeling of the protein-protein interaction interface is essential for high-quality protein complex structure prediction. Existing approaches for estimating the quality of a predicted protein complex structural model utilize only the physicochemical properties or energetic contributions of the interacting atoms, ignoring evolutionary information and inter-atomic multimeric geometries, including interaction distance and orientations. Results: Here, we present PIQLE, a deep graph learning method for protein-protein interface quality estimation. PIQLE leverages multimeric interaction geometries and evolutionary information along with sequence- and structure-derived features to estimate the quality of individual interactions between the interfacial residues using a multi-head graph attention network and then probabilistically combines the estimated quality for scoring the overall interface. Experimental results show that PIQLE consistently outperforms existing state-of-the-art methods including DProQA, TRScore, GNN-DOVE and DOVE on multiple independent test datasets across a wide range of evaluation metrics. Our ablation study and comparison with the self-assessment module of AlphaFold-Multimer repurposed for protein complex scoring reveal that the performance gains are connected to the effectiveness of the multi-head graph attention network in leveraging multimeric interaction geometries and evolutionary information along with other sequence- and structure-derived features adopted in PIQLE. Availability and implementation: An open-source software implementation of PIQLE is freely available at https://github.com/Bhattacharya-Lab/PIQLE. Supplementary information: Supplementary data are available at Bioinformatics Advances online.
Affiliation(s)
- Md Hossain Shuvo
- Department of Computer Science, Virginia Tech, Blacksburg, VA 24061, USA
| | - Mohimenul Karim
- Department of Computer Science, Virginia Tech, Blacksburg, VA 24061, USA
| | - Rahmatullah Roche
- Department of Computer Science, Virginia Tech, Blacksburg, VA 24061, USA
| | | |
|
22
|
Shanshal M, Caimi PF, Adjei AA, Ma WW. T-Cell Engagers in Solid Cancers-Current Landscape and Future Directions. Cancers (Basel) 2023; 15:2824. [PMID: 37345160 DOI: 10.3390/cancers15102824] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/05/2023] [Revised: 05/15/2023] [Accepted: 05/15/2023] [Indexed: 06/23/2023] Open
Abstract
Monoclonal antibody treatment initially heralded an era of molecularly targeted therapy in oncology and is now widely applied in modulating anti-cancer immunity by targeting programmed cell receptors (PD-1, PD-L1), cytotoxic T-lymphocyte-associated protein 4 (CTLA-4) and, more recently, lymphocyte-activation gene 3 (LAG3). Chimeric antigen receptor T-cell therapy (CAR-T) recently proved to be a valid approach to inducing anti-cancer immunity by directly modifying the host's immune cells. However, such cell-based therapy requires extensive resources such as leukapheresis, ex vivo modification and expansion of cytotoxic T-cells and current Good Manufacturing Practice (cGMP) laboratories and presents significant logistical challenges. Bi-/trispecific antibody technology is a novel pharmaceutical approach to facilitate the engagement of effector immune cells to potentially multiple cancer epitopes, e.g., the recently approved blinatumomab. This opens the opportunity to develop 'off-the-shelf' anti-cancer agents that achieve similar and/or complementary anti-cancer effects as those of modified immune cell therapy. The majority of bi-/trispecific antibodies target the tumor-associated antigens (TAA) located on the extracellular surface of cancer cells. The extracellular antigens represent just a small percentage of known TAAs and are often associated with higher toxicities because some of them are expressed on normal cells (off-target toxicity). In contrast, the targeting of intracellular TAAs such as mutant RAS and TP53 may lead to fewer off-target toxicities while still achieving the desired antitumor efficacy (on-target toxicity). Here, we provide a comprehensive review on the emerging field of bi-/tri-specific T-cell engagers and potential therapeutic opportunities.
Affiliation(s)
| | | | | | - Wen Wee Ma
- Cleveland Clinic, Cleveland, OH 44195, USA
| |
|
23
|
Saldinger JC, Raymond M, Elvati P, Violi A. Domain-agnostic predictions of nanoscale interactions in proteins and nanoparticles. Nat Comput Sci 2023; 3:393-402. [PMID: 38177838 DOI: 10.1038/s43588-023-00438-x] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 08/02/2022] [Accepted: 03/24/2023] [Indexed: 01/06/2024]
Abstract
Although challenging, the accurate and rapid prediction of nanoscale interactions has broad applications for numerous biological processes and material properties. While several models have been developed to predict the interaction of specific biological components, they use system-specific information that hinders their application to more general materials. Here we present NeCLAS, a general and efficient machine learning pipeline that predicts the location of nanoscale interactions, providing human-intelligible predictions. NeCLAS outperforms current nanoscale prediction models for generic nanoparticles up to 10-20 nm, reproducing interactions for biological and non-biological systems. Two aspects contribute to these results: a low-dimensional representation of nanoparticles and molecules (to reduce the effect of data uncertainty), and environmental features (to encode the physicochemical neighborhood at multiple scales). This framework has several applications, from basic research to rapid prototyping and design in nanobiotechnology.
Affiliation(s)
| | - Matt Raymond
- Electrical Engineering and Computer Science, University of Michigan, Ann Arbor, MI, USA
| | - Paolo Elvati
- Mechanical Engineering, University of Michigan, Ann Arbor, MI, USA
| | - Angela Violi
- Chemical Engineering, University of Michigan, Ann Arbor, MI, USA
- Electrical Engineering and Computer Science, University of Michigan, Ann Arbor, MI, USA
- Mechanical Engineering, University of Michigan, Ann Arbor, MI, USA
- Biophysics Program, University of Michigan, Ann Arbor, MI, USA
| |
|
24
|
Krapp LF, Abriata LA, Cortés Rodriguez F, Dal Peraro M. PeSTo: parameter-free geometric deep learning for accurate prediction of protein binding interfaces. Nat Commun 2023; 14:2175. [PMID: 37072397 PMCID: PMC10113261 DOI: 10.1038/s41467-023-37701-8] [Citation(s) in RCA: 20] [Impact Index Per Article: 20.0] [Received: 02/06/2023] [Accepted: 03/28/2023] [Indexed: 04/20/2023] Open
Abstract
Proteins are essential molecular building blocks of life, responsible for most biological functions as a result of their specific molecular interactions. However, predicting their binding interfaces remains a challenge. In this study, we present a geometric transformer that acts directly on atomic coordinates labeled only with element names. The resulting model, the Protein Structure Transformer (PeSTo), surpasses the current state of the art in predicting protein-protein interfaces and can also predict and differentiate between interfaces involving nucleic acids, lipids, ions, and small molecules with high confidence. Its low computational cost enables processing high volumes of structural data, such as molecular dynamics ensembles, allowing for the discovery of interfaces that remain otherwise inconspicuous in static experimentally solved structures. Moreover, the growing foldome provided by de novo structural predictions can be easily analyzed, providing new opportunities to uncover unexplored biology.
Affiliation(s)
- Lucien F Krapp
- Institute of Bioengineering, School of Life Sciences, Ecole Fédérale de Lausanne (EPFL) and Swiss Institute of Bioinformatics (SIB), Lausanne, 1015, Switzerland
| | - Luciano A Abriata
- Institute of Bioengineering, School of Life Sciences, Ecole Fédérale de Lausanne (EPFL) and Swiss Institute of Bioinformatics (SIB), Lausanne, 1015, Switzerland
| | - Fabio Cortés Rodriguez
- Institute of Bioengineering, School of Life Sciences, Ecole Fédérale de Lausanne (EPFL) and Swiss Institute of Bioinformatics (SIB), Lausanne, 1015, Switzerland
| | - Matteo Dal Peraro
- Institute of Bioengineering, School of Life Sciences, Ecole Fédérale de Lausanne (EPFL) and Swiss Institute of Bioinformatics (SIB), Lausanne, 1015, Switzerland
| |
|
25
|
Isert C, Atz K, Schneider G. Structure-based drug design with geometric deep learning. Curr Opin Struct Biol 2023; 79:102548. [PMID: 36842415 DOI: 10.1016/j.sbi.2023.102548] [Citation(s) in RCA: 31] [Impact Index Per Article: 31.0] [Received: 10/24/2022] [Revised: 01/16/2023] [Accepted: 01/24/2023] [Indexed: 02/26/2023]
Abstract
Structure-based drug design uses three-dimensional geometric information of macromolecules, such as proteins or nucleic acids, to identify suitable ligands. Geometric deep learning, an emerging concept of neural-network-based machine learning, has been applied to macromolecular structures. This review provides an overview of the recent applications of geometric deep learning in bioorganic and medicinal chemistry, highlighting its potential for structure-based drug discovery and design. Emphasis is placed on molecular property prediction, ligand binding site and pose prediction, and structure-based de novo molecular design. The current challenges and opportunities are highlighted, and a forecast of the future of geometric deep learning for drug discovery is presented.
Affiliation(s)
- Clemens Isert
- ETH Zurich, Department of Chemistry and Applied Biosciences, Vladimir-Prelog-Weg 4, Zurich, 8093, Switzerland
| | - Kenneth Atz
- ETH Zurich, Department of Chemistry and Applied Biosciences, Vladimir-Prelog-Weg 4, Zurich, 8093, Switzerland
| | - Gisbert Schneider
- ETH Zurich, Department of Chemistry and Applied Biosciences, Vladimir-Prelog-Weg 4, Zurich, 8093, Switzerland; ETH Singapore SEC Ltd, 1 CREATE Way, #06-01 CREATE Tower, Singapore, 8093, Singapore.
| |
|
26
|
Hurtado J, Flynn C, Lee JH, Salcedo EC, Cottrell CA, Skog PD, Burton DR, Nemazee D, Schief WR, Landais E, Sok D, Briney B. Efficient isolation of rare B cells using next-generation antigen barcoding. Front Cell Infect Microbiol 2023; 12:962945. [PMID: 36968243 PMCID: PMC10036767 DOI: 10.3389/fcimb.2022.962945] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/06/2022] [Accepted: 12/28/2022] [Indexed: 03/12/2023] Open
Abstract
The ability to efficiently isolate antigen-specific B cells in high throughput will greatly accelerate the discovery of therapeutic monoclonal antibodies (mAbs) and catalyze rational vaccine development. Traditional mAb discovery is a costly and labor-intensive process, although recent advances in single-cell genomics using emulsion microfluidics allow simultaneous processing of thousands of individual cells. Here we present a streamlined method for isolation and analysis of large numbers of antigen-specific B cells, including next generation antigen barcoding and an integrated computational framework for B cell multi-omics. We demonstrate the power of this approach by recovering thousands of antigen-specific mAbs, including the efficient isolation of extremely rare precursors of VRC01-class and IOMA-class broadly neutralizing HIV mAbs.
Affiliation(s)
- Jonathan Hurtado
- Department of Immunology and Microbiology, Scripps Research, La Jolla, CA, United States
- Center for Viral Systems Biology, Scripps Research, La Jolla, CA, United States
- Consortium for HIV/AIDS Vaccine Development, Scripps Research, La Jolla, CA, United States
| | - Claudia Flynn
- Department of Immunology and Microbiology, Scripps Research, La Jolla, CA, United States
- Consortium for HIV/AIDS Vaccine Development, Scripps Research, La Jolla, CA, United States
- IAVI Neutralizing Antibody Center, Scripps Research, La Jolla, CA, United States
| | - Jeong Hyun Lee
- Consortium for HIV/AIDS Vaccine Development, Scripps Research, La Jolla, CA, United States
- IAVI Neutralizing Antibody Center, Scripps Research, La Jolla, CA, United States
- International AIDS Vaccine Initiative, New York, NY, United States
| | - Eugenia C. Salcedo
- Consortium for HIV/AIDS Vaccine Development, Scripps Research, La Jolla, CA, United States
- IAVI Neutralizing Antibody Center, Scripps Research, La Jolla, CA, United States
- International AIDS Vaccine Initiative, New York, NY, United States
| | - Christopher A. Cottrell
- Department of Immunology and Microbiology, Scripps Research, La Jolla, CA, United States
- Consortium for HIV/AIDS Vaccine Development, Scripps Research, La Jolla, CA, United States
- IAVI Neutralizing Antibody Center, Scripps Research, La Jolla, CA, United States
- International AIDS Vaccine Initiative, New York, NY, United States
| | - Patrick D. Skog
- Department of Immunology and Microbiology, Scripps Research, La Jolla, CA, United States
| | - Dennis R. Burton
- Department of Immunology and Microbiology, Scripps Research, La Jolla, CA, United States
- Consortium for HIV/AIDS Vaccine Development, Scripps Research, La Jolla, CA, United States
- IAVI Neutralizing Antibody Center, Scripps Research, La Jolla, CA, United States
- Ragon Institute of Massachusetts General Hospital (MGH), Massachusetts Institute of Technology (MIT) and Harvard, Cambridge, MA, United States
| | - David Nemazee
- Department of Immunology and Microbiology, Scripps Research, La Jolla, CA, United States
| | - William R. Schief
- Department of Immunology and Microbiology, Scripps Research, La Jolla, CA, United States
- Consortium for HIV/AIDS Vaccine Development, Scripps Research, La Jolla, CA, United States
- IAVI Neutralizing Antibody Center, Scripps Research, La Jolla, CA, United States
- International AIDS Vaccine Initiative, New York, NY, United States
- Ragon Institute of Massachusetts General Hospital (MGH), Massachusetts Institute of Technology (MIT) and Harvard, Cambridge, MA, United States
| | - Elise Landais
- Consortium for HIV/AIDS Vaccine Development, Scripps Research, La Jolla, CA, United States
- IAVI Neutralizing Antibody Center, Scripps Research, La Jolla, CA, United States
- International AIDS Vaccine Initiative, New York, NY, United States
| | - Devin Sok
- Consortium for HIV/AIDS Vaccine Development, Scripps Research, La Jolla, CA, United States
- IAVI Neutralizing Antibody Center, Scripps Research, La Jolla, CA, United States
- International AIDS Vaccine Initiative, New York, NY, United States
| | - Bryan Briney
- Department of Immunology and Microbiology, Scripps Research, La Jolla, CA, United States
- Center for Viral Systems Biology, Scripps Research, La Jolla, CA, United States
- Consortium for HIV/AIDS Vaccine Development, Scripps Research, La Jolla, CA, United States
- San Diego Center for AIDS Research, Scripps Research, La Jolla, CA, United States
| |
Collapse
|
27
|
Rui H, Ashton KS, Min J, Wang C, Potts PR. Protein-protein interfaces in molecular glue-induced ternary complexes: classification, characterization, and prediction. RSC Chem Biol 2023; 4:192-215. [PMID: 36908699 PMCID: PMC9994104 DOI: 10.1039/d2cb00207h] [Citation(s) in RCA: 13] [Impact Index Per Article: 13.0] [Received: 09/27/2022] [Accepted: 01/02/2023] [Indexed: 01/04/2023]
Abstract
Molecular glues are a class of small molecules that stabilize the interactions between proteins. Naturally occurring molecular glues are present in many areas of biology where they serve as central regulators of signaling pathways. Importantly, several clinical compounds act as molecular glue degraders that stabilize interactions between E3 ubiquitin ligases and target proteins, leading to their degradation. Molecular glues hold promise as a new generation of therapeutic agents, including those molecular glue degraders that can redirect the protein degradation machinery in a precise way. However, rational discovery of molecular glues is difficult in part due to the lack of understanding of the protein-protein interactions they stabilize. In this review, we summarize the structures of known molecular glue-induced ternary complexes and the interface properties. Detailed analysis shows different mechanisms of ternary structure formation. Additionally, we also review computational approaches for predicting protein-protein interfaces and highlight the promises and challenges. This information will ultimately help inform future approaches for rational molecular glue discovery.
Affiliation(s)
- Huan Rui
  - Center for Research Acceleration by Digital Innovation, Amgen Research, Thousand Oaks, CA 91320, USA
- Kate S Ashton
  - Medicinal Chemistry, Amgen Research, Thousand Oaks, CA 91320, USA
- Jaeki Min
  - Induced Proximity Platform, Amgen Research, Thousand Oaks, CA 91320, USA
- Connie Wang
  - Digital, Technology & Innovation, Amgen, Thousand Oaks, CA 91320, USA
28
Shuvo MH, Karim M, Roche R, Bhattacharya D. PIQLE: protein-protein interface quality estimation by deep graph learning of multimeric interaction geometries. bioRxiv 2023:2023.02.14.528528. [PMID: 36824789 PMCID: PMC9949034 DOI: 10.1101/2023.02.14.528528] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 02/17/2023]
Abstract
Accurate modeling of protein-protein interaction interfaces is essential for high-quality protein complex structure prediction. Existing approaches for estimating the quality of a predicted protein complex structural model utilize only the physicochemical properties or energetic contributions of the interacting atoms, ignoring evolutionary information and inter-atomic multimeric geometries, including interaction distances and orientations. Here we present PIQLE, a deep graph learning method for protein-protein interface quality estimation. PIQLE leverages multimeric interaction geometries and evolutionary information along with sequence- and structure-derived features to estimate the quality of the individual interactions between the interfacial residues using a multihead graph attention network and then probabilistically combines the estimated quality of the interfacial residues for scoring the overall interface. Experimental results show that PIQLE consistently outperforms existing state-of-the-art methods on multiple independent test datasets across a wide range of evaluation metrics. Our ablation study reveals that the performance gains are connected to the effectiveness of the multihead graph attention network in leveraging multimeric interaction geometries and evolutionary information along with other sequence- and structure-derived features adopted in PIQLE. An open-source software implementation of PIQLE, licensed under the GNU General Public License v3, is freely available at https://github.com/Bhattacharya-Lab/PIQLE .
Affiliation(s)
- Md Hossain Shuvo
  - Department of Computer Science, Virginia Tech, Blacksburg, VA 24061, United States of America
- Mohimenul Karim
  - Department of Computer Science, Virginia Tech, Blacksburg, VA 24061, United States of America
- Rahmatullah Roche
  - Department of Computer Science, Virginia Tech, Blacksburg, VA 24061, United States of America
- Debswapna Bhattacharya
  - Department of Computer Science, Virginia Tech, Blacksburg, VA 24061, United States of America
29
Rogers JR, Nikolényi G, AlQuraishi M. Growing ecosystem of deep learning methods for modeling protein-protein interactions. Protein Eng Des Sel 2023; 36:gzad023. [PMID: 38102755 DOI: 10.1093/protein/gzad023] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/10/2023] [Revised: 12/06/2023] [Accepted: 12/07/2023] [Indexed: 12/17/2023]
Abstract
Numerous cellular functions rely on protein-protein interactions. Efforts to comprehensively characterize them remain challenged, however, by the diversity of molecular recognition mechanisms employed within the proteome. Deep learning has emerged as a promising approach for tackling this problem by exploiting both experimental data and basic biophysical knowledge about protein interactions. Here, we review the growing ecosystem of deep learning methods for modeling protein interactions, highlighting the diversity of these biophysically informed models and their respective trade-offs. We discuss recent successes in using representation learning to capture complex features pertinent to predicting protein interactions and interaction sites, geometric deep learning to reason over protein structures and predict complex structures, and generative modeling to design de novo protein assemblies. We also outline some of the outstanding challenges and promising new directions. Opportunities abound to discover novel interactions, elucidate their physical mechanisms, and engineer binders to modulate their functions using deep learning and, ultimately, unravel how protein interactions orchestrate complex cellular behaviors.
Affiliation(s)
- Julia R Rogers
  - Department of Systems Biology, Columbia University, New York, NY 10032, USA
- Gergő Nikolényi
  - Department of Systems Biology, Columbia University, New York, NY 10032, USA
30
Predicting unseen antibodies’ neutralizability via adaptive graph neural networks. Nat Mach Intell 2022. [DOI: 10.1038/s42256-022-00553-w] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/09/2022]
31
Xu Z, Ismanto HS, Zhou H, Saputri DS, Sugihara F, Standley DM. Advances in antibody discovery from human BCR repertoires. Front Bioinform 2022; 2:1044975. [PMID: 36338807 PMCID: PMC9631452 DOI: 10.3389/fbinf.2022.1044975] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/15/2022] [Accepted: 10/11/2022] [Indexed: 11/06/2022]
Abstract
Antibodies make up an important and growing class of compounds used for the diagnosis or treatment of disease. While traditional antibody discovery utilized immunization of animals to generate lead compounds, technological innovations have made it possible to search for antibodies targeting a given antigen within the repertoires of B cells in humans. Here we group these innovations into four broad categories: cell sorting allows the collection of cells enriched in specificity to one or more antigens; BCR sequencing can be performed on bulk mRNA, genomic DNA or on paired (heavy-light) mRNA; BCR repertoire analysis generally involves clustering BCRs into specificity groups or more in-depth modeling of antibody-antigen interactions, such as antibody-specific epitope predictions; validation of antibody-antigen interactions requires expression of antibodies, followed by antigen binding assays or epitope mapping. Together with innovations in deep learning, these technologies will contribute to the future discovery of diagnostic and therapeutic antibodies directly from humans.
Affiliation(s)
- Zichang Xu
  - Department of Genome Informatics, Research Institute for Microbial Diseases, Osaka University, Suita, Japan
- Hendra S. Ismanto
  - Department of Genome Informatics, Research Institute for Microbial Diseases, Osaka University, Suita, Japan
- Hao Zhou
  - Department of Genome Informatics, Research Institute for Microbial Diseases, Osaka University, Suita, Japan
- Dianita S. Saputri
  - Department of Genome Informatics, Research Institute for Microbial Diseases, Osaka University, Suita, Japan
- Fuminori Sugihara
  - Core Instrumentation Facility, Immunology Frontier Research Center, Osaka University, Suita, Japan
- Daron M. Standley
  - Department of Genome Informatics, Research Institute for Microbial Diseases, Osaka University, Suita, Japan
  - Department Systems Immunology, Immunology Frontier Research Center, Osaka University, Suita, Japan
32
Skolnick J, Zhou H. Implications of the Essential Role of Small Molecule Ligand Binding Pockets in Protein-Protein Interactions. J Phys Chem B 2022; 126:6853-6867. [PMID: 36044742 PMCID: PMC9484464 DOI: 10.1021/acs.jpcb.2c04525] [Citation(s) in RCA: 4] [Impact Index Per Article: 2.0] [Received: 06/29/2022] [Revised: 08/18/2022] [Indexed: 11/28/2022]
Abstract
Protein-protein interactions (PPIs) and protein-metabolite interactions play a key role in many biochemical processes, yet they are often viewed as being independent. However, the fact that small molecule drugs have been successful in inhibiting PPIs suggests a deeper relationship between protein pockets that bind small molecules and PPIs. We demonstrate that 2/3 of PPI interfaces, including antibody-epitope interfaces, contain at least one significant small molecule ligand binding pocket. In a representative library of 50 distinct protein-protein interactions involving hundreds of mutations, >75% of hot spot residues overlap with small molecule ligand binding pockets. Hence, ligand binding pockets play an essential role in PPIs. In representative cases, evolutionarily unrelated monomers that are involved in different multimeric interactions yet share the same pocket are predicted to bind the same metabolites/drugs; these results are confirmed by examples in the PDB. Thus, the binding of a metabolite can shift the equilibrium between monomers and multimers. This implicit coupling of PPI equilibria, termed "metabolic entanglement", was successfully employed to suggest novel functional relationships among protein multimers that do not directly interact. Thus, the current work provides an approach to unify metabolomics and protein interactomics.
Affiliation(s)
- Jeffrey Skolnick
  - Center for the Study of Systems Biology, School of Biological Sciences, Georgia Institute of Technology, 950 Atlantic Drive, NW, Atlanta, Georgia 30332, United States
- Hongyi Zhou
  - Center for the Study of Systems Biology, School of Biological Sciences, Georgia Institute of Technology, 950 Atlantic Drive, NW, Atlanta, Georgia 30332, United States
33
Nussinov R, Zhang M, Liu Y, Jang H. AlphaFold, Artificial Intelligence (AI), and Allostery. J Phys Chem B 2022; 126:6372-6383. [PMID: 35976160 PMCID: PMC9442638 DOI: 10.1021/acs.jpcb.2c04346] [Citation(s) in RCA: 42] [Impact Index Per Article: 21.0] [Received: 06/22/2022] [Revised: 08/03/2022] [Indexed: 02/08/2023]
Abstract
AlphaFold has burst into our lives. A powerful algorithm that underscores the strength of biological sequence data and artificial intelligence (AI). AlphaFold has appended projects and research directions. The database it has been creating promises an untold number of applications with vast potential impacts that are still difficult to surmise. AI approaches can revolutionize personalized treatments and usher in better-informed clinical trials. They promise to make giant leaps toward reshaping and revamping drug discovery strategies, selecting and prioritizing combinations of drug targets. Here, we briefly overview AI in structural biology, including in molecular dynamics simulations and prediction of microbiota-human protein-protein interactions. We highlight the advancements accomplished by the deep-learning-powered AlphaFold in protein structure prediction and their powerful impact on the life sciences. At the same time, AlphaFold does not resolve the decades-long protein folding challenge, nor does it identify the folding pathways. The models that AlphaFold provides do not capture conformational mechanisms like frustration and allostery, which are rooted in ensembles, and controlled by their dynamic distributions. Allostery and signaling are properties of populations. AlphaFold also does not generate ensembles of intrinsically disordered proteins and regions, instead describing them by their low structural probabilities. Since AlphaFold generates single ranked structures, rather than conformational ensembles, it cannot elucidate the mechanisms of allosteric activating driver hotspot mutations nor of allosteric drug resistance. However, by capturing key features, deep learning techniques can use the single predicted conformation as the basis for generating a diverse ensemble.
Affiliation(s)
- Ruth Nussinov
  - Computational Structural Biology Section, Frederick National Laboratory for Cancer Research, Frederick, Maryland 21702, United States
  - Department of Human Molecular Genetics and Biochemistry, Sackler School of Medicine, Tel Aviv University, Tel Aviv 69978, Israel
- Mingzhen Zhang
  - Computational Structural Biology Section, Frederick National Laboratory for Cancer Research, Frederick, Maryland 21702, United States
- Yonglan Liu
  - Cancer Innovation Laboratory, National Cancer Institute, Frederick, Maryland 21702, United States
- Hyunbum Jang
  - Computational Structural Biology Section, Frederick National Laboratory for Cancer Research, Frederick, Maryland 21702, United States
34
Nguyen TM, Nguyen T, Tran T. Mitigating cold-start problems in drug-target affinity prediction with interaction knowledge transferring. Brief Bioinform 2022; 23:bbac269. [PMID: 35788823 PMCID: PMC9353967 DOI: 10.1093/bib/bbac269] [Citation(s) in RCA: 6] [Impact Index Per Article: 3.0] [Received: 03/10/2022] [Revised: 05/20/2022] [Accepted: 06/08/2022] [Indexed: 12/04/2022]
Abstract
Predicting drug-target interactions is crucial for drug discovery as well as drug repurposing. Machine learning is commonly used for the drug-target affinity (DTA) problem. However, machine learning models face the cold-start problem, in which model performance drops when predicting the interaction of a novel drug or target. Previous works try to solve the cold-start problem by learning the drug or target representation using unsupervised learning. While the drug or target representation can be learned in an unsupervised manner, it still lacks the interaction information, which is critical in drug-target interaction. To incorporate the interaction information into the drug and protein interaction, we proposed using transfer learning from chemical-chemical interaction (CCI) and protein-protein interaction (PPI) tasks to the drug-target interaction task. The representation learned by the CCI and PPI tasks can be transferred smoothly to the DTA task due to the similar nature of the tasks. The results on the DTA datasets show that our proposed method has advantages compared to other pre-training methods in the DTA task.
Affiliation(s)
- Tri Minh Nguyen
  - Applied Artificial Intelligence Institute, Deakin University, Victoria, Australia
- Thin Nguyen
  - Applied Artificial Intelligence Institute, Deakin University, Victoria, Australia
- Truyen Tran
  - Applied Artificial Intelligence Institute, Deakin University, Victoria, Australia
35
Protein–Protein Interaction Prediction for Targeted Protein Degradation. Int J Mol Sci 2022; 23:ijms23137033. [PMID: 35806036 PMCID: PMC9266413 DOI: 10.3390/ijms23137033] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 05/10/2022] [Revised: 06/17/2022] [Accepted: 06/18/2022] [Indexed: 02/04/2023]
Abstract
Protein–protein interactions (PPIs) play a fundamental role in various biological functions; thus, detecting PPI sites is essential for understanding diseases and developing new drugs. PPI prediction is of particular relevance for the development of drugs employing targeted protein degradation, as their efficacy relies on the formation of a stable ternary complex involving two proteins. However, experimental methods to detect PPI sites are both costly and time-intensive. In recent years, machine learning-based methods have been developed as screening tools. While they are computationally more efficient than traditional docking methods and thus allow rapid execution, these tools have so far primarily been based on sequence information, and they are therefore limited in their ability to address spatial requirements. In addition, they have to date not been applied to targeted protein degradation. Here, we present a new deep learning architecture based on the concept of graph representation learning that can predict interaction sites and interactions of proteins based on their surface representations. We demonstrate that our model reaches state-of-the-art performance using AUROC scores on the established MaSIF dataset. We furthermore introduce a new dataset with more diverse protein interactions and show that our model generalizes well to this new data. These generalization capabilities allow our model to predict the PPIs relevant for targeted protein degradation, which we show by demonstrating the high accuracy of our model for PPI prediction on the available ternary complex data. Our results suggest that PPI prediction models can be a valuable tool for screening protein pairs while developing new drugs for targeted protein degradation.
36
Hummer AM, Abanades B, Deane CM. Advances in computational structure-based antibody design. Curr Opin Struct Biol 2022; 74:102379. [PMID: 35490649 DOI: 10.1016/j.sbi.2022.102379] [Citation(s) in RCA: 32] [Impact Index Per Article: 16.0] [Received: 12/13/2021] [Revised: 02/28/2022] [Accepted: 03/17/2022] [Indexed: 12/12/2022]
Abstract
Antibodies are currently the most important class of biotherapeutics and are used to treat numerous diseases. Recent advances in computational methods are ushering in a new era of antibody design, driven in part by accurate structure prediction. Previously, structure-based antibody design has been limited to a relatively small number of cases where accurate structures or models of both the target antigen and antibody were available. As we move towards a time where it is possible to accurately model most antibodies and antigens, and to reliably predict their binding site, there is vast potential for true computational antibody design. In this review, we describe the latest methods that promise to launch a paradigm shift towards entirely in silico structure-based antibody design.
Affiliation(s)
- Alissa M Hummer
  - Oxford Protein Informatics Group, Department of Statistics, University of Oxford, Oxford OX1 3LB, UK. https://twitter.com/@AlissaHummer
- Brennan Abanades
  - Oxford Protein Informatics Group, Department of Statistics, University of Oxford, Oxford OX1 3LB, UK. https://twitter.com/@brennanaba
- Charlotte M Deane
  - Oxford Protein Informatics Group, Department of Statistics, University of Oxford, Oxford OX1 3LB, UK.
37
Taneishi K, Tsuchiya Y. Structure-based analyses of gut microbiome-related proteins by neural networks and molecular dynamics simulations. Curr Opin Struct Biol 2022; 73:102336. [DOI: 10.1016/j.sbi.2022.102336] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/13/2021] [Revised: 11/18/2021] [Accepted: 01/14/2022] [Indexed: 11/03/2022]
38
Lim H, Cankara F, Tsai CJ, Keskin O, Nussinov R, Gursoy A. Artificial intelligence approaches to human-microbiome protein–protein interactions. Curr Opin Struct Biol 2022; 73:102328. [DOI: 10.1016/j.sbi.2022.102328] [Citation(s) in RCA: 6] [Impact Index Per Article: 3.0] [Received: 08/18/2021] [Revised: 12/01/2021] [Accepted: 12/31/2021] [Indexed: 02/08/2023]
39
Lee D, Xiong D, Wierbowski S, Li L, Liang S, Yu H. Deep learning methods for 3D structural proteome and interactome modeling. Curr Opin Struct Biol 2022; 73:102329. [PMID: 35139457 PMCID: PMC8957610 DOI: 10.1016/j.sbi.2022.102329] [Citation(s) in RCA: 14] [Impact Index Per Article: 7.0] [Received: 09/24/2021] [Revised: 12/05/2021] [Accepted: 12/31/2021] [Indexed: 12/19/2022]
Abstract
Bolstered by recent methodological and hardware advances, deep learning has increasingly been applied to biological problems and structural proteomics. Such approaches have achieved remarkable improvements over traditional machine learning methods in tasks ranging from protein contact map prediction to protein folding, prediction of protein-protein interaction interfaces, and characterization of protein-drug binding pockets. In particular, emergence of ab initio protein structure prediction methods including AlphaFold2 has revolutionized protein structural modeling. From a protein function perspective, numerous deep learning methods have facilitated deconvolution of the exact amino acid residues and protein surface regions responsible for binding other proteins or small molecule drugs. In this review, we provide a comprehensive overview of recent deep learning methods applied in structural proteomics.
40
Mahita J, Kim DG, Son S, Choi Y, Kim HS, Bailey-Kellogg C. Computational epitope binning reveals functional equivalence of sequence-divergent paratopes. Comput Struct Biotechnol J 2022; 20:2169-2180. [PMID: 35615020 PMCID: PMC9118127 DOI: 10.1016/j.csbj.2022.04.036] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 01/09/2022] [Revised: 04/27/2022] [Accepted: 04/27/2022] [Indexed: 11/26/2022]
Abstract
Epitope binning groups target-specific protein binders recognizing the same binding region. The “Epibin” method utilizes docking models to computationally predict competition and identify bins. Epibin recapitulated binding competition of repebody variants as determined by immunoassays. In addition, Epibin enabled identification of ‘paratope-equivalent’ residues in sequence-dissimilar variants. Computational epitope binning can scale to allow characterization of entire antigen-specific antibody repertoires.
The therapeutic efficacy of a protein binder largely depends on two factors: its binding site and its binding affinity. Advances in in vitro library display screening and next-generation sequencing have enabled accelerated development of strong binders, yet identifying their binding sites still remains a major challenge. The differentiation, or “binning”, of binders into different groups that recognize distinct binding sites on their target is a promising approach that facilitates high-throughput screening of binders that may show different biological activity. Here we study the extent to which the information contained in the amino acid sequences comprising a set of target-specific binders can be leveraged to bin them, inferring functional equivalence of their binding regions, or paratopes, based directly on comparison of the sequences, their modeled structures, or their modeled interactions. Using a leucine-rich repeat binding scaffold known as a “repebody” as the source of diversity in recognition against interleukin-6 (IL-6), we show that the “Epibin” approach introduced here effectively utilized structural modelling and docking to extract specificity information encoded in the repebody amino acid sequences and thereby successfully recapitulate IL-6 binding competition observed in immunoassays. Furthermore, our computational binning provided a basis for designing in vitro mutagenesis experiments to pinpoint specificity-determining residues. Finally, we demonstrate that the Epibin approach can extend to antibodies, retrospectively comparing its predictions to results from antigen-specific antibody competition studies. The study thus demonstrates the utility of modeling structure and binding from the amino acid sequences of different binders against the same target, and paves the way for larger-scale binning and analysis of entire repertoires.
41
BIPSPI+: Mining Type-Specific Datasets of Protein Complexes to Improve Protein Binding Site Prediction. J Mol Biol 2022; 434:167556. [DOI: 10.1016/j.jmb.2022.167556] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 11/27/2021] [Revised: 03/12/2022] [Accepted: 03/16/2022] [Indexed: 11/20/2022]
42
Stringer B, de Ferrante H, Abeln S, Heringa J, Feenstra KA, Haydarlou R. PIPENN: protein interface prediction from sequence with an ensemble of neural nets. Bioinformatics 2022; 38:2111-2118. [PMID: 35150231 PMCID: PMC9004643 DOI: 10.1093/bioinformatics/btac071] [Citation(s) in RCA: 8] [Impact Index Per Article: 4.0] [Received: 09/03/2021] [Revised: 01/16/2022] [Accepted: 02/04/2022] [Indexed: 02/03/2023]
Abstract
MOTIVATION: The interactions between proteins and other molecules are essential to many biological and cellular processes. Experimental identification of interface residues is a time-consuming, costly and challenging task, while protein sequence data are ubiquitous. Consequently, many computational and machine learning approaches have been developed over the years to predict such interface residues from sequence. However, the effectiveness of different Deep Learning (DL) architectures and learning strategies for protein-protein, protein-nucleotide and protein-small molecule interface prediction has not yet been investigated in great detail. Therefore, we here explore the prediction of protein interface residues using six DL architectures and various learning strategies with sequence-derived input features.
RESULTS: We constructed a large dataset dubbed BioDL, comprising protein-protein interactions from the PDB, and DNA/RNA and small molecule interactions from the BioLip database. We also constructed six DL architectures, and evaluated them on the BioDL benchmarks. This shows that no single architecture performs best on all instances. An ensemble architecture, which combines all six architectures, does consistently achieve peak prediction accuracy. We confirmed these results on the published benchmark set by Zhang and Kurgan (ZK448), and on our own existing curated homo- and heteromeric protein interaction dataset. Our PIPENN sequence-based ensemble predictor outperforms current state-of-the-art sequence-based protein interface predictors on ZK448 on all interaction types, achieving an AUC-ROC of 0.718 for protein-protein, 0.823 for protein-nucleotide and 0.842 for protein-small molecule.
AVAILABILITY AND IMPLEMENTATION: Source code and datasets are available at https://github.com/ibivu/pipenn/.
SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
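For readers less familiar with the metric, the AUC-ROC values reported in the PIPENN abstract can be read as the probability that a randomly chosen interface residue receives a higher predicted score than a randomly chosen non-interface residue. The sketch below illustrates that rank-based definition only; the function name `auc_roc` and the toy labels and scores are assumptions made for illustration and are not taken from PIPENN or its benchmarks.

```python
def auc_roc(labels, scores):
    """Rank-based AUC-ROC: fraction of (positive, negative) pairs in which
    the positive example has the higher score; ties count as half a win."""
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative label")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Hypothetical per-residue labels (1 = interface) and predicted scores.
toy_labels = [1, 0, 1, 0, 0]
toy_scores = [0.9, 0.4, 0.35, 0.2, 0.1]
print(round(auc_roc(toy_labels, toy_scores), 3))
```

The pairwise loop is quadratic in the number of residues, which is fine for a small illustration; a sort-based implementation would be the usual choice for full benchmark sets.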
Affiliation(s)
- Hans de Ferrante
  - Department of Computer Science, IBIVU—Center for Integrative Bioinformatics, Vrije Universiteit, 1081HV Amsterdam, The Netherlands
- Sanne Abeln
  - Department of Computer Science, IBIVU—Center for Integrative Bioinformatics, Vrije Universiteit, 1081HV Amsterdam, The Netherlands
- Jaap Heringa
  - Department of Computer Science, IBIVU—Center for Integrative Bioinformatics, Vrije Universiteit, 1081HV Amsterdam, The Netherlands
- K Anton Feenstra
  - Department of Computer Science, IBIVU—Center for Integrative Bioinformatics, Vrije Universiteit, 1081HV Amsterdam, The Netherlands
43
Mahbub S, Bayzid MS. EGRET: edge aggregated graph attention networks and transfer learning improve protein-protein interaction site prediction. Brief Bioinform 2022; 23:6518045. [PMID: 35106547 DOI: 10.1093/bib/bbab578] [Citation(s) in RCA: 15] [Impact Index Per Article: 7.5] [Received: 06/13/2021] [Revised: 11/25/2021] [Accepted: 12/16/2021] [Indexed: 12/18/2022]
Abstract
MOTIVATION Protein-protein interactions (PPIs) are central to most biological processes. However, reliable identification of PPI sites using conventional experimental methods is slow and expensive. Therefore, great efforts are being put into computational methods to identify PPI sites. RESULTS We present Edge Aggregated GRaph Attention NETwork (EGRET), a highly accurate deep learning-based method for PPI site prediction, where we have used an edge aggregated graph attention network to effectively leverage the structural information. We, for the first time, have used transfer learning in PPI site prediction. Our proposed edge aggregated network, together with transfer learning, has achieved notable improvement over the best alternate methods. Furthermore, we systematically investigated EGRET's network behavior to provide insights about the causes of its decisions. AVAILABILITY EGRET is freely available as an open source project at https://github.com/Sazan-Mahbub/EGRET. CONTACT shams_bayzid@cse.buet.ac.bd.
Affiliation(s)
- Sazan Mahbub
- Department of Computer Science, University of Maryland, College Park, Maryland 20742, USA
| | - Md Shamsuzzoha Bayzid
- Department of Computer Science and Engineering, Bangladesh University of Engineering and Technology, Dhaka-1205, Bangladesh
| |
|
44
|
Xie Z, Xu J. Deep graph learning of inter-protein contacts. Bioinformatics 2021; 38:947-953. [PMID: 34755837 PMCID: PMC8796373 DOI: 10.1093/bioinformatics/btab761] [Citation(s) in RCA: 29] [Impact Index Per Article: 9.7] [Received: 08/13/2021] [Revised: 10/06/2021] [Accepted: 11/04/2021] [Indexed: 02/03/2023]
Abstract
MOTIVATION Inter-protein (interfacial) contact prediction is very useful for in silico structural characterization of protein-protein interactions. Although deep learning has been applied to this problem, its accuracy is not as good as intra-protein contact prediction. RESULTS We propose a new deep learning method GLINTER (Graph Learning of INTER-protein contacts) for interfacial contact prediction of dimers, leveraging a rotational invariant representation of protein tertiary structures and a pretrained language model of multiple sequence alignments. Tested on the 13th and 14th CASP-CAPRI datasets, the average top L/10 precision achieved by GLINTER is 54% on the homodimers and 52% on all the dimers, much higher than 30% obtained by the latest deep learning method DeepHomo on the homodimers and 15% obtained by BIPSPI on all the dimers. Our experiments show that GLINTER-predicted contacts help improve selection of docking decoys. AVAILABILITY AND IMPLEMENTATION The software is available at https://github.com/zw2x/glinter. The datasets are available at https://github.com/zw2x/glinter/data. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
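The "top L/10 precision" quoted in this abstract can be illustrated with a minimal sketch. The abstract does not define L for a pair of chains, so the code below assumes L is the length of the shorter chain; the function names and the nested-list score matrix are hypothetical and are not taken from the GLINTER code base.

```python
from typing import List, Set, Tuple


def top_k_precision(scores: List[List[float]],
                    true_contacts: Set[Tuple[int, int]],
                    k: int) -> float:
    """Fraction of the k highest-scoring residue pairs that are true contacts."""
    # Flatten the inter-protein score matrix into (score, i, j) triples.
    ranked = sorted(
        ((s, i, j) for i, row in enumerate(scores) for j, s in enumerate(row)),
        key=lambda t: t[0],
        reverse=True,
    )
    top = ranked[:k]
    if not top:
        return 0.0
    hits = sum((i, j) in true_contacts for _, i, j in top)
    return hits / len(top)


def top_l_over_10_precision(scores: List[List[float]],
                            true_contacts: Set[Tuple[int, int]]) -> float:
    """Top L/10 precision, with L assumed to be the shorter chain length."""
    # Assumption: L = min(rows, columns); the source does not specify this.
    length = min(len(scores), len(scores[0]))
    k = max(1, length // 10)
    return top_k_precision(scores, true_contacts, k)
```

With a 20 x 30 score matrix, for example, this evaluates the 2 highest-scoring residue pairs; a reported precision of 54% would mean that, averaged over targets, about half of such top-ranked pairs are real interfacial contacts.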
Affiliation(s)
- Ziwei Xie
- Toyota Technological Institute at Chicago, Chicago, IL 60637, USA
| | - Jinbo Xu
- To whom correspondence should be addressed.
| |
|
45
|
Karakulak T, Rifaioglu AS, Rodrigues JPGLM, Karaca E. Predicting the Specificity-Determining Positions of Receptor Tyrosine Kinase Axl. Front Mol Biosci 2021; 8:658906. [PMID: 34195226 PMCID: PMC8236827 DOI: 10.3389/fmolb.2021.658906] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 01/26/2021] [Accepted: 04/20/2021] [Indexed: 11/22/2022]
Abstract
Owing to its clinical significance, modulation of functionally relevant amino acids in protein-protein complexes has attracted a great deal of attention. To this end, many approaches have been proposed to predict the partner-selecting amino acid positions in evolutionarily close complexes. These approaches can be grouped into sequence-based machine learning and structure-based energy-driven methods. In this work, we assessed these methods’ ability to map the specificity-determining positions of Axl, a receptor tyrosine kinase involved in cancer progression and immune system diseases. For sequence-based predictions, we used SDPpred, Multi-RELIEF, and Sequence Harmony. For structure-based predictions, we utilized HADDOCK refinement and molecular dynamics simulations. As a result, we observed that (i) sequence-based methods overpredict partner-selecting residues of Axl and that (ii) combining Multi-RELIEF with HADDOCK-based predictions provides the key Axl residues, covered by the extensive molecular dynamics simulations. Expanding on these results, we propose that a sequence-structure-based approach is necessary to determine specificity-determining positions of Axl, which can guide the development of therapeutic molecules to combat Axl misregulation.
Affiliation(s)
- Tülay Karakulak
- Izmir Biomedicine and Genome Center, Izmir, Turkey; Izmir International Biomedicine and Genome Institute, Dokuz Eylul University, Izmir, Turkey; Institute of Molecular Life Sciences, University of Zurich, Zurich, Switzerland; Department of Pathology and Molecular Pathology, University Hospital Zurich, Zurich, Switzerland; Swiss Institute of Bioinformatics, Lausanne, Switzerland
| | - Ahmet Sureyya Rifaioglu
- Department of Electrical-Electronics Engineering, İskenderun Technical University, Hatay, Turkey
| | - João P G L M Rodrigues
- Department of Structural Biology, Stanford University School of Medicine, Stanford, CA, United States
| | - Ezgi Karaca
- Izmir Biomedicine and Genome Center, Izmir, Turkey; Izmir International Biomedicine and Genome Institute, Dokuz Eylul University, Izmir, Turkey
| |
|