1
Hashemi N, Hao B, Ignatov M, Paschalidis IC, Vakili P, Vajda S, Kozakov D. Improved prediction of MHC-peptide binding using protein language models. Front Bioinform 2023; 3:1207380. [PMID: 37663788 PMCID: PMC10469926 DOI: 10.3389/fbinf.2023.1207380] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/17/2023] [Accepted: 08/04/2023] [Indexed: 09/05/2023]
Abstract
Major histocompatibility complex Class I (MHC-I) molecules bind to peptides derived from intracellular antigens and present them on the surface of cells, allowing the immune system (T cells) to detect them. Elucidating the process of this presentation is essential for regulation and potential manipulation of the cellular immune system. Predicting whether a given peptide binds to an MHC molecule is an important step in the above process and has motivated the introduction of many computational approaches to address this problem. NetMHCpan, a pan-specific model for predicting binding of peptides to any MHC molecule, is one of the most widely used methods and focuses on solving this binary classification problem using shallow neural networks. The recent successful results of Deep Learning (DL) methods, especially Natural Language Processing (NLP)-based pretrained models in various applications, including protein structure determination, motivated us to explore their use in this problem. Specifically, we consider the application of deep learning models pretrained on large datasets of protein sequences to predict MHC Class I-peptide binding. Using the standard performance metrics in this area, and the same training and test sets, we show that our models outperform NetMHCpan4.1, currently considered the state of the art.
Affiliation(s)
- Nasser Hashemi
  - Division of Systems Engineering, Boston University, Boston, MA, United States
- Boran Hao
  - Department of Electrical and Computer Engineering, Boston University, Boston, MA, United States
- Mikhail Ignatov
  - Department of Applied Mathematics and Statistics, Stony Brook University, Stony Brook, NY, United States
  - Laufer Center for Physical and Quantitative Biology, Stony Brook University, Stony Brook, NY, United States
- Ioannis Ch. Paschalidis
  - Division of Systems Engineering, Boston University, Boston, MA, United States
  - Department of Electrical and Computer Engineering, Boston University, Boston, MA, United States
  - Department of Biomedical Engineering, Boston University, Boston, MA, United States
- Pirooz Vakili
  - Division of Systems Engineering, Boston University, Boston, MA, United States
- Sandor Vajda
  - Division of Systems Engineering, Boston University, Boston, MA, United States
  - Department of Biomedical Engineering, Boston University, Boston, MA, United States
  - Department of Chemistry, Boston University, Boston, MA, United States
- Dima Kozakov
  - Department of Applied Mathematics and Statistics, Stony Brook University, Stony Brook, NY, United States
  - Laufer Center for Physical and Quantitative Biology, Stony Brook University, Stony Brook, NY, United States
  - Department of Biomedical Engineering, Boston University, Boston, MA, United States
2
Pollard ZA, Roshandelpoor A, Vakili P, Ryan E, Goldfarb JL. Towards Tunable Polymer Foam Fabrication: A Case Study to Advance Green Materials Development in Limited Data Scenarios. AIChE J 2022. [DOI: 10.1002/aic.17984] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 12/15/2022]
Affiliation(s)
- Zoe A. Pollard
  - Department of Biological and Environmental Engineering, Cornell University, Ithaca, NY
- Pirooz Vakili
  - Division of Systems Engineering, Boston University, Boston, MA
  - Department of Mechanical Engineering, Boston University, Boston, MA
- Emily Ryan
  - Department of Mechanical Engineering, Boston University, Boston, MA
  - Division of Materials Science and Engineering, Boston University, Boston, MA
- Jillian L. Goldfarb
  - Department of Biological and Environmental Engineering, Cornell University, Ithaca, NY
3
Sotudian S, Desta IT, Hashemi N, Zarbafian S, Kozakov D, Vakili P, Vajda S, Paschalidis IC. Improved cluster ranking in protein-protein docking using a regression approach. Comput Struct Biotechnol J 2021; 19:2269-2278. [PMID: 33995918 PMCID: PMC8102165 DOI: 10.1016/j.csbj.2021.04.028] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 01/31/2021] [Revised: 04/08/2021] [Accepted: 04/09/2021] [Indexed: 11/21/2022]
Abstract
We develop a Regression-based Ranking by Pairwise Cluster Comparisons (RRPCC) method to rank clusters of similar protein complex conformations generated by an underlying docking program. The method leverages robust regression to predict the relative quality difference between any pair of clusters and combines these pairwise assessments to form a ranked list of clusters, from higher to lower quality. We apply RRPCC to clusters produced by the automated docking server ClusPro and, depending on the training/validation strategy, we show improvement by 24-100% in ranking acceptable or better quality clusters first, and by 15-100% in ranking medium or better quality clusters first. We compare the RRPCC-ClusPro combination to a number of alternatives, and show that very different machine learning approaches to scoring docked structures yield similar success rates. Finally, we discuss the current limitations on sampling and scoring, looking ahead to further improvements. Interestingly, some features important for improved scoring are internal energy terms that occur only due to the local energy minimization applied in the refinement stage following rigid body docking.
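As a rough illustration of how pairwise assessments can be combined into a ranked list, the sketch below sums each cluster's predicted quality differences against the others and sorts by that total. The aggregation rule, the function name `rank_from_pairwise`, and the toy numbers are assumptions made for illustration; the abstract does not state how RRPCC combines its pairwise predictions.

```python
from typing import Dict, List, Tuple


def rank_from_pairwise(
    n_clusters: int, diffs: Dict[Tuple[int, int], float]
) -> List[int]:
    """Rank clusters from predicted pairwise quality differences.

    diffs[(i, j)] is a predicted quality difference "i minus j"
    (positive means cluster i looks better than cluster j).
    Hypothetical aggregation: sum each cluster's differences.
    """
    score = [0.0] * n_clusters
    for (i, j), d in diffs.items():
        score[i] += d  # i gains when it is predicted better than j
        score[j] -= d  # j loses by the same amount
    # Highest aggregate score first; ties broken by cluster index.
    return sorted(range(n_clusters), key=lambda c: (-score[c], c))


# Toy predictions for three clusters (made-up numbers).
toy_diffs = {(0, 1): -0.4, (0, 2): 0.3, (1, 2): 0.6}
ranking = rank_from_pairwise(3, toy_diffs)
print(ranking)
```

In this toy case cluster 1 wins both of its comparisons and is ranked first, while cluster 2 loses both and is ranked last.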
Affiliation(s)
- Nasser Hashemi
  - Division of Systems Engineering, Boston University, Boston, USA
- Dima Kozakov
  - Laufer Center for Physical and Quantitative Biology, Institute for Advanced Computational Sciences, Stony Brook University, Stony Brook, USA
- Pirooz Vakili
  - Division of Systems Engineering, Boston University, Boston, USA
- Sandor Vajda
  - Department of Biomedical Engineering, Boston University
  - Department of Chemistry, Boston University
- Ioannis Ch. Paschalidis
  - Division of Systems Engineering, Boston University, Boston, USA
  - Department of Biomedical Engineering, Boston University
  - Department of Electrical & Computer Engineering, and Faculty for Computing & Data Sciences, Boston University
4
Mirzaei H, Zarbafian S, Villar E, Mottarella S, Beglov D, Vajda S, Paschalidis IC, Vakili P, Kozakov D. Energy Minimization on Manifolds for Docking Flexible Molecules. J Chem Theory Comput 2016; 11:1063-76. [PMID: 26478722 DOI: 10.1021/ct500155t] [Citation(s) in RCA: 20] [Impact Index Per Article: 2.5] [Indexed: 01/22/2023]
Abstract
In this paper, we extend a recently introduced rigid body minimization algorithm, defined on manifolds, to the problem of minimizing the energy of interacting flexible molecules. The goal is to integrate moving the ligand in six dimensional rotational/translational space with internal rotations around rotatable bonds within the two molecules. We show that adding rotational degrees of freedom to the rigid moves of the ligand results in an overall optimization search space that is a manifold to which our manifold optimization approach can be extended. The effectiveness of the method is shown for three different docking problems of increasing complexity. First, we minimize the energy of fragment-size ligands with a single rotatable bond as part of a protein mapping method developed for the identification of binding hot spots. Second, we consider energy minimization for docking a flexible ligand to a rigid protein receptor, an approach frequently used in existing methods. In the third problem, we account for flexibility in both the ligand and the receptor. Results show that minimization using the manifold optimization algorithm is substantially more efficient than minimization using a traditional all-atom optimization algorithm while producing solutions of comparable quality. In addition to the specific problems considered, the method is general enough to be used in a large class of applications such as docking multidomain proteins with flexible hinges. The code is available under open source license (at http://cluspro.bu.edu/Code/Code_Rigtree.tar) and with minimal effort can be incorporated into any molecular modeling package.
5
Mamonov AB, Moghadasi M, Mirzaei H, Zarbafian S, Grove LE, Bohnuud T, Vakili P, Paschalidis IC, Vajda S, Kozakov D. Focused grid-based resampling for protein docking and mapping. J Comput Chem 2016; 37:961-70. [PMID: 26837000 PMCID: PMC4814242 DOI: 10.1002/jcc.24273] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.6] [Received: 03/29/2015] [Revised: 08/31/2015] [Accepted: 09/26/2015] [Indexed: 12/27/2022]
Abstract
The fast Fourier transform (FFT) sampling algorithm has been used with success in application to protein-protein docking and for protein mapping, the latter docking a variety of small organic molecules for the identification of binding hot spots on the target protein. Here we explore the local rather than global usage of the FFT sampling approach in docking applications. If the global FFT based search yields a near-native cluster of docked structures for a protein complex, then focused resampling of the cluster generally leads to a substantial increase in the number of conformations close to the native structure. In protein mapping, focused resampling of the selected hot spot regions generally reveals further hot spots that, while not as strong as the primary hot spots, also contribute to ligand binding. The detection of additional ligand binding regions is shown by the improved overlap between hot spots and bound ligands.
Affiliation(s)
- Artem B. Mamonov
  - Department of Biomedical Engineering, Boston University, Boston, MA 02215
- Mohammad Moghadasi
  - Center for Information and Systems Engineering, Boston University, Boston, MA 02215
- Hanieh Mirzaei
  - Center for Information and Systems Engineering, Boston University, Boston, MA 02215
- Shahrooz Zarbafian
  - Department of Mechanical Engineering, Boston University, Boston, MA 02215
- Laurie E. Grove
  - Department of Sciences, Wentworth Institute of Technology, Boston, MA 02115, USA
- Tanggis Bohnuud
  - Department of Biomedical Engineering, Boston University, Boston, MA 02215
- Pirooz Vakili
  - Center for Information and Systems Engineering, Boston University, Boston, MA 02215
  - Department of Mechanical Engineering, Boston University, Boston, MA 02215
- Ioannis Ch. Paschalidis
  - Center for Information and Systems Engineering, Boston University, Boston, MA 02215
  - Department of Electrical and Computer Engineering, Boston University, Boston, MA 02215
- Sandor Vajda
  - Department of Biomedical Engineering, Boston University, Boston, MA 02215
  - Center for Information and Systems Engineering, Boston University, Boston, MA 02215
  - Department of Chemistry, Boston University, Boston, MA 02215
- Dima Kozakov
  - Department of Biomedical Engineering, Boston University, Boston, MA 02215
  - Department of Applied Mathematics and Statistics, Stony Brook University, Stony Brook, NY 11790
6
Kozakov D, Vakili P, Paschalidis IC, Vajda S. 46 Encounter complexes and dimensionality reduction in protein-protein association. J Biomol Struct Dyn 2015. [DOI: 10.1080/07391102.2015.1032595] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/23/2022]
7
Moghadasi M, Mirzaei H, Mamonov A, Vakili P, Vajda S, Paschalidis IC, Kozakov D. The impact of side-chain packing on protein docking refinement. J Chem Inf Model 2015; 55:872-81. [PMID: 25714358 PMCID: PMC4734134 DOI: 10.1021/ci500380a] [Citation(s) in RCA: 14] [Impact Index Per Article: 1.6] [Indexed: 01/24/2023]
Abstract
We study the impact of optimizing the side-chain positions in the interface region between two proteins during the process of binding. Mathematically, the problem is similar to side-chain prediction, which has been extensively explored in the process of protein structure prediction. The protein-protein docking application, however, has a number of characteristics that necessitate different algorithmic and implementation choices. In this work, we implement a distributed approximate algorithm that can be implemented on multiprocessor architectures and enables a trade-off between accuracy and running speed. We report computational results on benchmarks of enzyme-inhibitor and other types of complexes, establishing that the side-chain flexibility our algorithm introduces substantially improves the performance of docking protocols. Furthermore, we establish that the inclusion of unbound side-chain conformers in the side-chain positioning problem is critical in these performance improvements. The code is available to the community under open source license.
Affiliation(s)
- Mohammad Moghadasi
  - Division of Systems Engineering & Center for Information and Systems Engineering
- Hanieh Mirzaei
  - Division of Systems Engineering & Center for Information and Systems Engineering
- Pirooz Vakili
  - Division of Systems Engineering, and Department of Mechanical Engineering
- Ioannis Ch. Paschalidis
  - Department of Electrical and Computer Engineering, Division of Systems Engineering, and Department of Biomedical Engineering
8
Vakili P, Mirzaei H, Zarbafian S, Paschalidis IC, Kozakov D, Vajda S. Optimization on the space of rigid and flexible motions: an alternative manifold optimization approach. Proc IEEE Conf Decis Control 2015; 2014:5825-5830. [PMID: 25774073 DOI: 10.1109/cdc.2014.7040301] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Indexed: 11/09/2022]
Abstract
In this paper we consider the problem of minimization of a cost function that depends on the location and poses of one or more rigid bodies, or bodies that consist of rigid parts hinged together. We present a unified setting for formulating this problem as an optimization on an appropriately defined manifold for which efficient manifold optimizations can be developed. This setting is based on a Lie group representation of the rigid movements of a body that is different from what is commonly used for this purpose. We illustrate this approach by using the steepest descent algorithm on the manifold of the search space and specify conditions for its convergence.
Affiliation(s)
- Pirooz Vakili
  - Dept. of Mechanical Eng. and Division of Systems Eng., Boston University
- Dima Kozakov
  - D. Kozakov and S. Vajda are with the Dept. of Biomedical Eng., Boston University
- Sandor Vajda
  - D. Kozakov and S. Vajda are with the Dept. of Biomedical Eng., Boston University
9
Nan F, Moghadasi M, Vakili P, Vajda S, Kozakov D, Paschalidis IC. A Subspace Semi-Definite programming-based Underestimation (SSDU) method for stochastic global optimization in protein docking. Proc IEEE Conf Decis Control 2014; 2014:4623-4628. [PMID: 25914440 PMCID: PMC4405505 DOI: 10.1109/cdc.2014.7040111] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Indexed: 06/04/2023]
Abstract
We propose a new stochastic global optimization method targeting protein docking problems. The method is based on finding a general convex polynomial underestimator to the binding energy function in a permissive subspace that possesses a funnel-like structure. We use Principal Component Analysis (PCA) to determine such permissive subspaces. The problem of finding the general convex polynomial underestimator is reduced into the problem of ensuring that a certain polynomial is a Sum-of-Squares (SOS), which can be done via semi-definite programming. The underestimator is then used to bias sampling of the energy function in order to recover a deep minimum. We show that the proposed method significantly improves the quality of docked conformations compared to existing methods.
Affiliation(s)
- Feng Nan
  - Division of Systems Engineering, Boston University
- Pirooz Vakili
  - Department of Mechanical Engineering and Division of Systems Engineering, Boston University
- Sandor Vajda
  - Department of Biomedical Engineering, Boston University
- Dima Kozakov
  - Department of Biomedical Engineering, Boston University
- Ioannis Ch. Paschalidis
  - Corresponding author. Department of Electrical & Computer Engineering, and Division of Systems Engineering, Boston University, 8 Mary's St., Boston, MA 02215, http://ionia.bu.edu/
10
Chowdhury R, Beglov D, Moghadasi M, Paschalidis IC, Vakili P, Vajda S, Bajaj C, Kozakov D. Efficient Maintenance and Update of Nonbonded Lists in Macromolecular Simulations. J Chem Theory Comput 2014; 10:4449-4454. [PMID: 25328494 PMCID: PMC4196749 DOI: 10.1021/ct400474w] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Received: 06/06/2013] [Indexed: 11/28/2022]
Abstract
Molecular mechanics and dynamics simulations use distance-based cutoff approximations for faster computation of pairwise van der Waals and electrostatic energy terms. These approximations traditionally use a precalculated and periodically updated list of interacting atom pairs, known as the "nonbonded neighborhood lists" or nblists, in order to reduce the overhead of finding atom pairs that are within the distance cutoff. The size of nblists grows linearly with the number of atoms in the system and superlinearly with the distance cutoff, and as a result, they require a significant amount of memory for large molecular systems. The high space usage leads to poor cache performance, which slows computation for large distance cutoffs. Also, the high cost of updates means that one cannot afford to keep the data structure always synchronized with the configuration of the molecules when efficiency is at stake. We propose a dynamic octree data structure for implicit maintenance of nblists using space linear in the number of atoms but independent of the distance cutoff. The list can be updated very efficiently as the coordinates of atoms change during the simulation. Unlike explicit nblists, a single octree works for all distance cutoffs. In addition, the octree is a cache-friendly data structure, and hence, it is less prone to cache miss slowdowns on modern memory hierarchies than nblists. Octrees use almost 2 orders of magnitude less memory, which is crucial for simulation of large systems, and while they are comparable in performance to nblists when the distance cutoff is small, they outperform nblists for larger systems and large cutoffs. Our tests show that the octree implementation is approximately 1.5 times faster in practical use case scenarios as compared to nblists.
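To make the memory argument concrete, the following minimal sketch builds an explicit nblist by brute force and compares its size at two cutoffs. It illustrates only the baseline data structure that the abstract argues against, not the octree the authors propose; the function name `build_nblist` and the lattice "system" are assumptions chosen so the counts are easy to verify by hand.

```python
from itertools import combinations, product


def build_nblist(coords, cutoff):
    """Return all index pairs (i, j), i < j, within `cutoff` of each other."""
    cutoff_sq = cutoff * cutoff
    pairs = []
    for (i, a), (j, b) in combinations(enumerate(coords), 2):
        dist_sq = sum((p - q) ** 2 for p, q in zip(a, b))
        if dist_sq <= cutoff_sq:
            pairs.append((i, j))
    return pairs


# Toy "system": 216 atoms on a 6 x 6 x 6 cubic lattice with unit spacing.
atoms = [(float(x), float(y), float(z)) for x, y, z in product(range(6), repeat=3)]

short_list = build_nblist(atoms, cutoff=1.0)
long_list = build_nblist(atoms, cutoff=2.0)
print(len(short_list), len(long_list))
```

On this lattice, doubling the cutoff grows the explicit list by more than a factor of four, which is the kind of superlinear growth that motivates a structure whose size does not depend on the cutoff.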
Affiliation(s)
- Rezaul Chowdhury
  - Computer Science Department, Stony Brook University, Stony Brook, New York 11790, United States
- Dmitri Beglov
  - Department of Mechanical Engineering, Division of Systems Engineering, and Department of Electrical and Computer Engineering, Boston University, Boston, Massachusetts 02215, United States
- Mohammad Moghadasi
  - Department of Mechanical Engineering, Division of Systems Engineering, and Department of Electrical and Computer Engineering, Boston University, Boston, Massachusetts 02215, United States
- Ioannis Ch Paschalidis
  - Department of Mechanical Engineering, Division of Systems Engineering, and Department of Electrical and Computer Engineering, Boston University, Boston, Massachusetts 02215, United States
- Pirooz Vakili
  - Department of Mechanical Engineering, Division of Systems Engineering, and Department of Electrical and Computer Engineering, Boston University, Boston, Massachusetts 02215, United States
- Sandor Vajda
  - Department of Mechanical Engineering, Division of Systems Engineering, and Department of Electrical and Computer Engineering, Boston University, Boston, Massachusetts 02215, United States
- Chandrajit Bajaj
  - Department of Computer Science, University of Texas at Austin, Austin, Texas 78712, United States
- Dima Kozakov
  - Department of Mechanical Engineering, Division of Systems Engineering, and Department of Electrical and Computer Engineering, Boston University, Boston, Massachusetts 02215, United States
11
Kozakov D, Li K, Hall DR, Beglov D, Zheng J, Vakili P, Schueler-Furman O, Paschalidis IC, Clore GM, Vajda S. Encounter complexes and dimensionality reduction in protein-protein association. eLife 2014; 3:e01370. [PMID: 24714491 PMCID: PMC3978769 DOI: 10.7554/elife.01370] [Citation(s) in RCA: 50] [Impact Index Per Article: 5.0] [Indexed: 11/19/2022]
Abstract
An outstanding challenge has been to understand the mechanism whereby proteins associate. We report here the results of exhaustively sampling the conformational space in protein–protein association using a physics-based energy function. The agreement between experimental intermolecular paramagnetic relaxation enhancement (PRE) data and the PRE profiles calculated from the docked structures shows that the method captures both specific and non-specific encounter complexes. To explore the energy landscape in the vicinity of the native structure, the nonlinear manifold describing the relative orientation of two solid bodies is projected onto a Euclidean space in which the shape of low energy regions is studied by principal component analysis. Results show that the energy surface is canyon-like, with a smooth funnel within a two dimensional subspace capturing over 75% of the total motion. Thus, proteins tend to associate along preferred pathways, similar to sliding of a protein along DNA in the process of protein-DNA recognition.
DOI: http://dx.doi.org/10.7554/eLife.01370.001
Proteins rarely act alone. Instead, they tend to bind to other proteins to form structures known as complexes. When two proteins come together to form a complex, they twist and turn through a series of intermediate states before they form the actual complex. These intermediate states are difficult to study because they don’t last for very long, which means that our knowledge of how complexes are formed remains incomplete. One promising approach for studying the formation of complexes is called paramagnetic relaxation enhancement. In this technique certain areas in one of the proteins are labelled with magnetic particles, which produce signals when the two proteins are close to each other. Repeating the measurement several times with the magnetic particles in different positions provides information about the overall structure of the complex.
Computational modelling can then be used to work out the fine details of the structure, including the shapes of the intermediate structures made by the proteins as they interact. A computer method called docking can be used to predict the most favourable positions that the proteins can take, relative to one another, in a complex. This involves calculating the energy contained in the system, with the correct structure having the lowest energy. Docking methods also predict protein models with slightly higher energies, but with structures that are radically different. Modellers usually ignore these structures, but comparing the docking results to paramagnetic relaxation enhancement data, Kozakov et al. found that these structures actually represent the intermediate states. Analysing the structure of the intermediate states revealed that the movement of the two proteins relative to one another is severely restricted as they form the final complex. Kozakov et al. found that proteins associate along preferred pathways, similar to the way a protein slides along DNA in the process of protein-DNA recognition. Knowing that the movement of the proteins is restricted in this way will enable researchers to improve the efficiency of docking calculations.
DOI: http://dx.doi.org/10.7554/eLife.01370.002
Affiliation(s)
- Dima Kozakov
  - Department of Biomedical Engineering, Boston University, Boston, United States
12
Moghadasi M, Kozakov D, Vakili P, Vajda S, Paschalidis IC. A New Distributed Algorithm for Side-Chain Positioning in the Process of Protein Docking. Proc IEEE Conf Decis Control 2013:739-744. [PMID: 24844567 PMCID: PMC4024309 DOI: 10.1109/cdc.2013.6759970] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.2] [Indexed: 06/03/2023]
Abstract
Side-chain positioning (SCP) is an important component of computational protein docking methods. Existing SCP methods and available software have been designed for protein folding applications where side-chain positioning is also important. As a result they do not take into account significant special structure that SCP for docking exhibits. We propose a new algorithm which poses SCP as a Maximum Weighted Independent Set (MWIS) problem on an appropriately constructed graph. We develop an approximate algorithm which solves a relaxation of the MWIS and then rounds the solution to obtain a high-quality feasible solution to the problem. The algorithm is fully distributed and can be executed on a large network of processing nodes requiring only local information and message-passing between neighboring nodes. Motivated by the special structure in docking, we establish optimality guarantees for a certain class of graphs. Our results on a benchmark set of enzyme-inhibitor protein complexes show that our predictions are close to the native structure and are comparable to the ones obtained by a state-of-the-art method. The results are substantially improved if rotamers from unbound protein structures are included in the search. We also establish that the use of our SCP algorithm substantially improves docking results.
Affiliation(s)
- Ioannis Ch. Paschalidis
  - Corresponding author: Dept. of Electrical & Computer Eng., Boston University, 8 Mary’s St., Boston, MA 02215
13
Mirzaei H, Villar E, Mottarella S, Beglov D, Paschalidis IC, Vajda S, Kozakov D, Vakili P. Flexible Refinement of Protein-Ligand Docking on Manifolds. Proc IEEE Conf Decis Control 2013:1392-1397. [PMID: 24830567 DOI: 10.1109/cdc.2013.6760077] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.2] [Indexed: 11/08/2022]
Abstract
Our work is motivated by energy minimization of biological macromolecules, an essential step in computational docking. By allowing some ligand flexibility, we generalize a recently introduced novel representation of rigid body minimization as an optimization on the [Formula: see text] manifold, rather than on the commonly used Special Euclidean group SE(3). We show that the resulting flexible docking can also be formulated as an optimization on a Lie group that is the direct product of simpler Lie groups for which geodesics and exponential maps can be easily obtained. Our computational results for a local optimization algorithm developed based on this formulation show that it is about an order of magnitude faster than the state-of-the-art local minimization algorithms for computational protein-small molecule docking.
Affiliation(s)
- Dmitri Beglov
  - D. Kozakov, D. Beglov, and S. Vajda are with the Dept. of Biomedical Eng., Boston University
- Ioannis Ch Paschalidis
  - Dept. of Electrical & Computer Eng., and Division of Systems Eng., Boston University, 8 Mary's St., Boston, MA 02215
- Sandor Vajda
  - D. Kozakov, D. Beglov, and S. Vajda are with the Dept. of Biomedical Eng., Boston University
- Dima Kozakov
  - D. Kozakov, D. Beglov, and S. Vajda are with the Dept. of Biomedical Eng., Boston University
14
Moghadasi M, Kozakov D, Mamonov AB, Vakili P, Vajda S, Paschalidis IC. A Message Passing Approach to Side Chain Positioning with Applications in Protein Docking Refinement. Proc IEEE Conf Decis Control 2012:2310-2315. [PMID: 23515575 DOI: 10.1109/cdc.2012.6426600] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.3] [Indexed: 11/09/2022]
Abstract
We introduce a message-passing algorithm to solve the Side Chain Positioning (SCP) problem. SCP is a crucial component of protein docking refinement, which is a key step of an important class of problems in computational structural biology called protein docking. We model SCP as a combinatorial optimization problem and formulate it as a Maximum Weighted Independent Set (MWIS) problem. We then employ a modified and convergent belief-propagation algorithm to solve a relaxation of MWIS and develop randomized estimation heuristics that use the relaxed solution to obtain an effective MWIS feasible solution. Using a benchmark set of protein complexes we demonstrate that our approach leads to more accurate docking predictions compared to a baseline algorithm that does not solve the SCP.
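The MWIS formulation can be sketched on a toy graph in which each node is a (residue, rotamer) choice and each edge joins two choices that cannot coexist. The instance, its weights, and the clash edge below are hypothetical, and the solver is a plain exhaustive search rather than the belief-propagation relaxation described in the abstract; it is meant only to show what a feasible and optimal selection looks like.

```python
from itertools import combinations

# Hypothetical toy instance: node -> weight (higher is better).
# Node names encode (residue, rotamer), e.g. "A1" = residue A, rotamer 1.
weights = {"A1": 3.0, "A2": 2.0, "B1": 2.5, "B2": 1.0}

# Edges mark pairs that cannot be chosen together: two rotamers of the
# same residue, or two rotamers assumed to clash sterically.
edges = {
    frozenset(("A1", "A2")),  # same residue
    frozenset(("B1", "B2")),  # same residue
    frozenset(("A1", "B1")),  # assumed steric clash
}


def is_independent(nodes):
    """True if no two selected nodes share an edge."""
    return all(frozenset(p) not in edges for p in combinations(nodes, 2))


def max_weight_independent_set():
    """Exhaustive search; only practical for very small graphs."""
    best, best_weight = (), 0.0
    names = sorted(weights)
    for k in range(1, len(names) + 1):
        for subset in combinations(names, k):
            w = sum(weights[n] for n in subset)
            if w > best_weight and is_independent(subset):
                best, best_weight = subset, w
    return set(best), best_weight


chosen, total = max_weight_independent_set()
print(chosen, total)
```

Here the heaviest single node, A1, is not part of the optimum: because A1 clashes with B1, the best selection pairs A2 with B1. Picking rotamers greedily by weight would miss this, which is one reason the problem is treated as a global optimization.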
15
Mirzaei H, Kozakov D, Beglov D, Paschalidis IC, Vajda S, Vakili P. A New Approach to Rigid Body Minimization with Application to Molecular Docking. Proc IEEE Conf Decis Control 2012:2983-2988. [PMID: 24763338 DOI: 10.1109/cdc.2012.6426267] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.3] [Indexed: 11/07/2022]
Abstract
Our work is motivated by energy minimization in the space of rigid affine transformations of macromolecules, an essential step in computational protein-protein docking. We introduce a novel representation of rigid body motion that leads to a natural formulation of the energy minimization problem as an optimization on the [Formula: see text] manifold, rather than the commonly used SE(3). The new representation avoids the complications associated with optimization on the SE(3) manifold and provides additional flexibility for optimization that is not available in that formulation. The approach is applicable to general rigid body minimization problems. Our computational results for a local optimization algorithm based on the new approach show that it is about an order of magnitude faster than state-of-the-art local minimization algorithms for computational protein-protein docking.
Affiliation(s)
- Dima Kozakov
- D. Kozakov, D. Beglov, and S. Vajda are with the Dept. of Biomedical Eng., Boston University, {midas, dbeglov, vajda}@bu.edu
- Dmitri Beglov
- D. Kozakov, D. Beglov, and S. Vajda are with the Dept. of Biomedical Eng., Boston University, {midas, dbeglov, vajda}@bu.edu
- Sandor Vajda
- D. Kozakov, D. Beglov, and S. Vajda are with the Dept. of Biomedical Eng., Boston University, {midas, dbeglov, vajda}@bu.edu
16
Mirzaei H, Beglov D, Paschalidis IC, Vajda S, Vakili P, Kozakov D. Rigid Body Energy Minimization on Manifolds for Molecular Docking. J Chem Theory Comput 2012; 8:4374-4380. [PMID: 23382659 DOI: 10.1021/ct300272j] [Citation(s) in RCA: 21] [Impact Index Per Article: 1.8] [Indexed: 01/22/2023]
Abstract
Virtually all docking methods include some local continuous minimization of an energy/scoring function in order to remove steric clashes and obtain more reliable energy values. In this paper, we describe an efficient rigid-body optimization algorithm that, compared to the most widely used algorithms, converges approximately an order of magnitude faster to conformations with equal or slightly lower energy. The space of rigid body transformations is a nonlinear manifold, namely, a space which locally resembles a Euclidean space. We use a canonical parametrization of the manifold, called the exponential parametrization, to map the Euclidean tangent space of the manifold onto the manifold itself. Thus, we locally transform the rigid body optimization to an optimization over a Euclidean space where basic optimization algorithms are applicable. Compared to commonly used methods, this formulation substantially reduces the dimension of the search space. As a result, it requires far fewer costly function and gradient evaluations and leads to a more efficient algorithm. We have selected the L-BFGS quasi-Newton method for local optimization since it uses only gradient information to obtain second-order information about the energy function and avoids the far more costly direct Hessian evaluations. Two applications, one in protein-protein docking and the other in protein-small molecule interactions, are presented as part of macromolecular docking protocols. The code is available to the community under an open source license and can be incorporated into any molecular modeling package with minimal effort.
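As a rough illustration of the exponential parametrization mentioned in the abstract, the sketch below maps a 3-vector in the tangent space to a rotation matrix using Rodrigues' formula and applies it together with a translation. The function names and the choice to treat the translation as a plain added vector are assumptions made for this example; the abstract does not specify how the published code handles the rotation center or the translational part.

```python
import math


def exp_so3(w):
    """Map a rotation vector w (axis times angle) to a 3x3 rotation matrix."""
    wx, wy, wz = w
    theta = math.sqrt(wx * wx + wy * wy + wz * wz)
    # Skew-symmetric matrix K such that K @ v equals the cross product w x v.
    K = [[0.0, -wz, wy], [wz, 0.0, -wx], [-wy, wx, 0.0]]
    if theta < 1e-12:
        a, b = 1.0, 0.5  # limits of sin(t)/t and (1 - cos(t))/t^2 as t -> 0
    else:
        a = math.sin(theta) / theta
        b = (1.0 - math.cos(theta)) / (theta * theta)
    K2 = [[sum(K[i][k] * K[k][j] for k in range(3)) for j in range(3)]
          for i in range(3)]
    # Rodrigues' formula: R = I + a*K + b*K^2.
    return [[(1.0 if i == j else 0.0) + a * K[i][j] + b * K2[i][j]
             for j in range(3)] for i in range(3)]


def apply_rigid(w, t, point):
    """Rotate a point by exp_so3(w), then translate it by t."""
    R = exp_so3(w)
    return tuple(sum(R[i][j] * point[j] for j in range(3)) + t[i]
                 for i in range(3))


# A quarter turn about the z axis followed by a unit shift along x.
moved = apply_rigid((0.0, 0.0, math.pi / 2), (1.0, 0.0, 0.0), (1.0, 0.0, 0.0))
print(moved)
```

With this map, a local optimizer such as L-BFGS can work on the six unconstrained numbers in `w` and `t` rather than on a constrained rotation matrix.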
Affiliation(s)
- Hanieh Mirzaei
- Division of Systems Engineering, Department of Biomedical Engineering, Department of Electrical and Computer Engineering, and Department of Mechanical Engineering, Boston University, Boston, USA
17
Kozakov D, Hall DR, Beglov D, Brenke R, Comeau SR, Shen Y, Li K, Zheng J, Vakili P, Paschalidis IC, Vajda S. Achieving reliability and high accuracy in automated protein docking: ClusPro, PIPER, SDU, and stability analysis in CAPRI rounds 13-19. Proteins 2011; 78:3124-30. [PMID: 20818657 DOI: 10.1002/prot.22835] [Citation(s) in RCA: 193] [Impact Index Per Article: 14.8] [Indexed: 11/05/2022]
Abstract
Our approach to protein-protein docking includes three main steps. First, we run PIPER, a rigid body docking program based on the Fast Fourier Transform (FFT) correlation approach, extended to use pairwise interaction potentials. Second, the 1000 best energy conformations are clustered, and the 30 largest clusters are retained for refinement. Third, the stability of the clusters is analyzed by short Monte Carlo simulations, and the structures are refined by the medium-range optimization method SDU. The first two steps of this approach are implemented in the ClusPro 2.0 protein-protein docking server. Despite being fully automated, the last step is computationally too expensive to be included in the server. When comparing the models obtained in CAPRI rounds 13-19 by ClusPro, by the refinement of the ClusPro predictions, and by all predictor groups, we arrived at three conclusions. First, for the first time in CAPRI history, our automated ClusPro server was able to compete with the best human predictor groups. Second, selecting the top ranked models, our current protocol reliably generates high-quality structures of protein-protein complexes from the structures of separately crystallized proteins, even in the absence of biological information, provided that there is limited backbone conformational change. Third, despite occasional successes, homology modeling requires further improvement to achieve reliable docking results.
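The second step (cluster the best-scoring conformations, keep the largest clusters) can be sketched with a simple greedy scheme. The abstract does not state the clustering rule, so the neighbor-count strategy, the `radius` cutoff, and the one-dimensional toy "poses" below are assumptions for illustration only; a real run would use pairwise RMSD between docked structures.

```python
def greedy_cluster(dist, radius, max_clusters):
    """Repeatedly pick the pose with the most unassigned neighbors within radius."""
    remaining = set(range(len(dist)))
    clusters = []
    while remaining and len(clusters) < max_clusters:
        def members(i):
            # Unassigned poses within the cutoff of pose i (includes i itself).
            return [j for j in remaining if dist[i][j] <= radius]

        # Ties are broken in favor of the lowest index.
        center = max(sorted(remaining), key=lambda i: len(members(i)))
        group = sorted(members(center))
        clusters.append((center, group))
        remaining -= set(group)
    return clusters


# Hypothetical 1-D stand-ins for poses; distances play the role of pairwise RMSD.
positions = [0.0, 0.1, 0.2, 5.0, 5.1, 9.0]
dist = [[abs(p - q) for q in positions] for p in positions]
clusters = greedy_cluster(dist, radius=0.25, max_clusters=2)
print(clusters)
```

Clusters come out in order of decreasing size, so capping `max_clusters` corresponds to retaining only the most populated ones, as the abstract describes for the 30 largest clusters.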
Affiliation(s)
- Dima Kozakov
- BioMolecular Engineering Research Center, Boston University, Boston, Massachusetts 02215, USA.
18
Shen Y, Paschalidis IC, Vakili P, Vajda S. Protein docking by the underestimation of free energy funnels in the space of encounter complexes. PLoS Comput Biol 2008; 4:e1000191. [PMID: 18846200 PMCID: PMC2538569 DOI: 10.1371/journal.pcbi.1000191] [Citation(s) in RCA: 38] [Impact Index Per Article: 2.4] [Received: 06/02/2008] [Accepted: 08/22/2008] [Indexed: 11/19/2022] Open
Abstract
Similarly to protein folding, the association of two proteins is driven by a free energy funnel, determined by favorable interactions in some neighborhood of the native state. We describe a docking method based on stochastic global minimization of funnel-shaped energy functions in the space of rigid body motions (SE(3)) while accounting for flexibility of the interface side chains. The method, called semi-definite programming-based underestimation (SDU), employs a general quadratic function to underestimate a set of local energy minima and uses the resulting underestimator to bias further sampling. While SDU effectively minimizes functions with funnel-shaped basins, its application to docking in the rotational and translational space SE(3) is not straightforward due to the geometry of that space. We introduce a strategy that uses separate independent variables for side-chain optimization, center-to-center distance of the two proteins, and five angular descriptors of the relative orientations of the molecules. The removal of the center-to-center distance turns out to vastly improve the efficiency of the search, because the five-dimensional space now exhibits a well-behaved energy surface suitable for underestimation. This algorithm explores the free energy surface spanned by encounter complexes that correspond to local free energy minima and shows similarity to the model of macromolecular association that proceeds through a series of collisions. Results for standard protein docking benchmarks establish that in this space the free energy landscape is a funnel in a reasonably broad neighborhood of the native state and that the SDU strategy can generate docking predictions with less than 5 Å ligand interface Cα root-mean-square deviation while achieving an approximately 20-fold efficiency gain compared to Monte Carlo methods. 
Protein–protein interactions play a central role in various aspects of the structural and functional organization of the cell, and their elucidation is crucial for a better understanding of processes such as metabolic control, signal transduction, and gene regulation. Genomewide proteomics studies, primarily yeast two-hybrid assays, will provide an increasing list of interacting proteins, but only a small fraction of the potential complexes will be amenable to direct experimental analysis. Thus, it is important to develop computational docking methods that can elucidate the details of specific interactions at the atomic level. Protein–protein docking generally starts with a rigid body search that generates a large number of docked conformations with good shape, electrostatic, and chemical complementarity. The conformations are clustered to obtain a manageable number of models, but the current methods are unable to select the most likely structure among these models. Here we describe a refinement algorithm that, applied to the individual clusters, improves the quality of the models. The better models are suitable for higher-accuracy energy calculation, thereby increasing the chances that near-native structures can be identified, and thus the refinement increases the reliability of the entire docking algorithm.
Affiliation(s)
- Yang Shen
- BioMolecular Engineering Research Center, Department of Biomedical Engineering, Boston University, Boston, Massachusetts, United States of America
- Center for Information and Systems Engineering, Boston University, Boston, Massachusetts, United States of America
- Ioannis Ch. Paschalidis
- Center for Information and Systems Engineering, Boston University, Boston, Massachusetts, United States of America
- Division of Systems Engineering, Boston University, Boston, Massachusetts, United States of America
- Department of Electrical and Computer Engineering, Boston University, Boston, Massachusetts, United States of America
- Pirooz Vakili
- Center for Information and Systems Engineering, Boston University, Boston, Massachusetts, United States of America
- Division of Systems Engineering, Boston University, Boston, Massachusetts, United States of America
- Department of Mechanical Engineering, Boston University, Boston, Massachusetts, United States of America
- Sandor Vajda
- BioMolecular Engineering Research Center, Department of Biomedical Engineering, Boston University, Boston, Massachusetts, United States of America
19
Paschalidis IC, Shen Y, Vakili P, Vajda S. Protein-protein docking with reduced potentials by exploiting multi-dimensional energy funnels. Conf Proc IEEE Eng Med Biol Soc 2007; 2006:5330-3. [PMID: 17946298 DOI: 10.1109/iembs.2006.260790] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.2] [Indexed: 11/08/2022]
Abstract
We propose a new computational approach for protein docking exploiting energy funnels in the 6-dimensional space of translations and rotations of the ligand with respect to the receptor. Our approach consists of a series of translational and orientational moves of the ligand towards the receptor. Each move is performed using a global optimization method we have developed, the semi-definite underestimation (SDU) method, which can exploit a funnel-like energy function. We compared our approach with Monte Carlo on a set of 10 protein complexes using two residue-level potentials. To achieve the same level of performance (producing a near-native complex with RMSD ≤ 3 Å), our approach reduces energy evaluations by more than a factor of two, on average.
Affiliation(s)
- Ioannis Ch Paschalidis
- Center for Information & Systems Eng., and Dept. of Manufacturing Eng., Boston University, Brookline, MA 02446, USA.
20
Paschalidis IC, Shen Y, Vakili P, Vajda S. SDU: A Semidefinite Programming-Based Underestimation Method for Stochastic Global Optimization in Protein Docking. IEEE Trans Automat Contr 2007; 52:664-676. [PMID: 19759849 PMCID: PMC2744142 DOI: 10.1109/tac.2007.894518] [Citation(s) in RCA: 14] [Impact Index Per Article: 0.8] [Indexed: 05/28/2023]
Abstract
This paper introduces a new stochastic global optimization method targeting protein-protein docking problems, an important class of problems in computational structural biology. The method is based on finding general convex quadratic underestimators of the funnel-like binding energy function. Finding the optimum underestimator requires solving a semidefinite programming problem, hence the name semidefinite programming-based underestimation (SDU). The underestimator is used to bias sampling in the search region. It is established that under appropriate conditions SDU locates the global energy minimum with probability approaching one as the sample size grows. A detailed comparison of SDU with the related convex global underestimator (CGU) method and computational results for protein-protein docking problems are provided.
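The underestimation idea can be shown in one dimension, where no semidefinite solver is needed. The sketch below is a stand-in, not the SDU algorithm: it fits U(x) = a(x - c)^2 + b with a ≥ 0 by grid search so that U lies on or below every sampled energy while the total gap to the samples is as small as possible. The candidate grids, sample values, and function name are assumptions chosen for illustration.

```python
def fit_underestimator(xs, fs, curvatures, centers):
    """Grid-search a convex quadratic that stays below all samples with minimal total gap."""
    best = None
    for a in curvatures:
        for c in centers:
            # Largest offset b that keeps a*(x - c)^2 + b <= f at every sample.
            b = min(f - a * (x - c) ** 2 for x, f in zip(xs, fs))
            gap = sum(f - (a * (x - c) ** 2 + b) for x, f in zip(xs, fs))
            if best is None or gap < best[0]:
                best = (gap, a, c, b)
    return best


# Hypothetical local-minimum energies sampled from a rough funnel centered near x = 1.
xs = [-1.0, 0.0, 1.0, 2.0, 3.0]
fs = [2.5, -1.0, -2.0, -0.7, 2.0]
gap, a, c, b = fit_underestimator(xs, fs, [0.5, 1.0, 2.0], [0.0, 1.0, 2.0])
print(gap, a, c, b)
```

The minimizer `c` of the fitted quadratic indicates where the funnel bottom is likely to be, which is the region a biased sampler would visit next. In the multi-dimensional setting of the paper, the curvature becomes a positive semidefinite matrix and the fit is posed as a semidefinite program.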
Affiliation(s)
- Ioannis Ch. Paschalidis
- Center for Information and Systems Engineering, and Department of Manufacturing Engineering, and the Department of Electrical and Computer Engineering, Boston University, Boston, MA 02215 USA
- Yang Shen
- Center for Information and Systems Engineering, and Department of Manufacturing Engineering, Boston University, Boston, MA 02215 USA
- Pirooz Vakili
- Center for Information and Systems Engineering, and Department of Manufacturing Engineering, Boston University, Boston, MA 02215 USA
- Sandor Vajda
- Department of Biomedical Engineering, Boston University, Boston, MA 02215 USA