1
|
Villalobos-Alva J, Ochoa-Toledo L, Villalobos-Alva MJ, Aliseda A, Pérez-Escamirosa F, Altamirano-Bustamante NF, Ochoa-Fernández F, Zamora-Solís R, Villalobos-Alva S, Revilla-Monsalve C, Kemper-Valverde N, Altamirano-Bustamante MM. Protein Science Meets Artificial Intelligence: A Systematic Review and a Biochemical Meta-Analysis of an Inter-Field. Front Bioeng Biotechnol 2022; 10:788300. [PMID: 35875501 PMCID: PMC9301016 DOI: 10.3389/fbioe.2022.788300] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/02/2021] [Accepted: 05/25/2022] [Indexed: 11/23/2022] Open
Abstract
Proteins are some of the most fascinating and challenging molecules in the universe, and they pose a big challenge for artificial intelligence. The implementation of machine learning/AI in protein science gives rise to a world of knowledge adventures in the workhorse of the cell and proteome homeostasis, which are essential for making life possible. This opens up epistemic horizons thanks to a coupling of human tacit-explicit knowledge with machine learning power, the benefits of which are already tangible, such as important advances in protein structure prediction. Moreover, the driving force behind the protein processes of self-organization, adjustment, and fitness requires a space corresponding to gigabytes of life data in its order of magnitude. There are many tasks such as novel protein design, protein folding pathways, and synthetic metabolic routes, as well as protein-aggregation mechanisms, pathogenesis of protein misfolding and disease, and proteostasis networks that are currently unexplored or unrevealed. In this systematic review and biochemical meta-analysis, we aim to contribute to bridging the gap in what we call the binomial of artificial intelligence (AI) and protein science (PS), a growing research enterprise with exciting and promising biotechnological and biomedical applications. We undertake our task by exploring "the state of the art" in AI and machine learning (ML) applications to protein science in the scientific literature to address some critical research questions in this domain, including: What kind of tasks are already explored by ML approaches to protein sciences? What are the most common ML algorithms and databases used? What is the situational diagnostic of the AI-PS inter-field? What do ML processing steps have in common? We also formulate novel questions, such as: Is it possible to discover what the rules of protein evolution are with the binomial AI-PS? How do protein folding pathways evolve? What are the rules that dictate the folds? 
What are the minimal nuclear protein structures? How do protein aggregates form and why do they exhibit different toxicities? What are the structural properties of amyloid proteins? How can we design an effective proteostasis network to deal with misfolded proteins? We are a cross-functional group of scientists from several academic disciplines, and we have conducted the systematic review using a variant of the PICO and PRISMA approaches. The search was carried out in four databases (PubMed, Bireme, OVID, and EBSCO Web of Science), resulting in 144 research articles. After three rounds of quality screening, 93 articles were finally selected for further analysis. A summary of our findings is as follows: regarding AI applications, there are mainly four types: 1) genomics, 2) protein structure and function, 3) protein design and evolution, and 4) drug design. In terms of the ML algorithms and databases used, supervised learning was the most common approach (85%). As for the databases used for the ML models, PDB and UniprotKB/Swissprot were the most common ones (21 and 8%, respectively). Moreover, we identified that approximately 63% of the articles organized their results into three steps, which we labeled pre-process, process, and post-process. A few studies combined data from several databases or created their own databases after the pre-process. Our main finding is that, as of today, there are no research road maps serving as guides to address gaps in our knowledge of the AI-PS binomial. All research efforts to collect, integrate multidimensional data features, and then analyze and validate them are, so far, uncoordinated and scattered throughout the scientific literature without a clear epistemic goal or connection between the studies. 
Therefore, our main contribution to the scientific literature is to offer a road map to help solve problems in drug design, protein structures, design, and function prediction while also presenting the "state of the art" on research in the AI-PS binomial until February 2021. Thus, we pave the way toward future advances in the synthetic redesign of novel proteins and protein networks and artificial metabolic pathways, learning lessons from nature for the welfare of humankind. Many of the novel proteins and metabolic pathways are currently non-existent in nature, nor are they used in the chemical industry or biomedical field.
Affiliation(s)
- Jalil Villalobos-Alva
  - Unidad de Investigación en Enfermedades Metabólicas, Centro Médico Nacional Siglo XXI, Instituto Mexicano del Seguro Social, Mexico City, Mexico
- Luis Ochoa-Toledo
  - Instituto de Ciencias Aplicadas y Tecnología (ICAT), Universidad Nacional Autónoma de México (UNAM), Mexico City, Mexico
- Mario Javier Villalobos-Alva
  - Unidad de Investigación en Enfermedades Metabólicas, Centro Médico Nacional Siglo XXI, Instituto Mexicano del Seguro Social, Mexico City, Mexico
- Atocha Aliseda
  - Instituto de Investigaciones Filosóficas, Universidad Nacional Autónoma de México (UNAM), Mexico City, Mexico
- Fernando Pérez-Escamirosa
  - Instituto de Ciencias Aplicadas y Tecnología (ICAT), Universidad Nacional Autónoma de México (UNAM), Mexico City, Mexico
- Francine Ochoa-Fernández
  - Unidad de Investigación en Enfermedades Metabólicas, Centro Médico Nacional Siglo XXI, Instituto Mexicano del Seguro Social, Mexico City, Mexico
- Ricardo Zamora-Solís
  - Unidad de Investigación en Enfermedades Metabólicas, Centro Médico Nacional Siglo XXI, Instituto Mexicano del Seguro Social, Mexico City, Mexico
- Sebastián Villalobos-Alva
  - Unidad de Investigación en Enfermedades Metabólicas, Centro Médico Nacional Siglo XXI, Instituto Mexicano del Seguro Social, Mexico City, Mexico
- Cristina Revilla-Monsalve
  - Unidad de Investigación en Enfermedades Metabólicas, Centro Médico Nacional Siglo XXI, Instituto Mexicano del Seguro Social, Mexico City, Mexico
- Nicolás Kemper-Valverde
  - Instituto de Ciencias Aplicadas y Tecnología (ICAT), Universidad Nacional Autónoma de México (UNAM), Mexico City, Mexico
- Myriam M. Altamirano-Bustamante
  - Unidad de Investigación en Enfermedades Metabólicas, Centro Médico Nacional Siglo XXI, Instituto Mexicano del Seguro Social, Mexico City, Mexico
|
2
|
Abstract
Atom pairwise potential functions make up an essential part of many scoring functions for protein decoy detection. With the development of machine learning (ML) tools, there are multiple ways to combine potential functions to create novel ML models and methods. Potential function parameters can be easily extracted; however, it is usually hard to directly obtain the calculated atom pairwise energies from scoring functions. Amber, as one of the most popular suites of modeling programs, has an extensive history and library of force field potential functions. In this work, we directly used the force field parameters in ff94 and ff14SB from Amber and encoded them to calculate atom pairwise energies for different interactions. Two sets of structures (a single amino acid set and a dipeptide set) were used to evaluate the performance of our encoded Amber potentials. Comparing the energy terms obtained from our encoding with those from Amber, we find energy differences within ±0.06 kcal/mol for all tested structures. Previously we have shown that the Random Forest (RF) model can help to emphasize more important atom pairwise interactions and ignore insignificant ones [Pei, J.; Zheng, Z.; Merz, K. M. J. Chem. Inf. Model. 2019, 59, 1919-1929]. Here, as an example of combining ML methods with traditional potential functions, we followed the same workflow to combine the RF models with force field potential functions from Amber. To determine the performance of our RF models with force field potential functions, 224 different protein native-decoy systems were used as our training and testing sets. We find that the RF models with ff94 and ff14SB force field parameters outperformed all other scoring functions (RF models with KECSA2, RWplus, DFIRE, dDFIRE, and GOAP) considered in this work for native structure detection, and they performed similarly in detecting the best decoy. 
Through inclusion of best-decoy-to-decoy comparisons in building our RF models, we were able to generate models that outperformed the score functions tested herein both on accuracy and best decoy detection, again showing the performance and flexibility of our RF models to tackle this problem. Finally, the importance of the RF algorithm and of the force field parameters was also tested, and the comparison results suggest that both are important, with the ML scoring function achieving its best performance only when the two are combined. All code and data used in this work are available at https://github.com/JunPei000/FFENCODER_for_Protein_Folding_Pose_Selection.
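As a rough illustration of what "encoding force field parameters to calculate atom pairwise energies" can look like, the following minimal sketch evaluates the nonbonded part of an Amber-style energy for a single atom pair: a 12-6 Lennard-Jones term with Lorentz-Berthelot combining rules plus a point-charge Coulomb term. It is not the authors' FFENCODER code. The function names and the parameter values are hypothetical, and bonded terms, 1-4 scaling, and cutoffs are deliberately left out.

```python
import math

# Conventional Coulomb constant in kcal*Angstrom/(mol*e^2), equal to 18.2223**2.
COULOMB_K = 332.0522

def lj_energy(r, rmin_half_i, eps_i, rmin_half_j, eps_j):
    """12-6 Lennard-Jones energy (kcal/mol) with Lorentz-Berthelot combining rules."""
    rmin = rmin_half_i + rmin_half_j      # distance at the energy minimum
    eps = math.sqrt(eps_i * eps_j)        # well depth of the pair
    ratio6 = (rmin / r) ** 6
    return eps * (ratio6 * ratio6 - 2.0 * ratio6)

def coulomb_energy(r, q_i, q_j):
    """Point-charge electrostatic energy (kcal/mol) for charges in units of e."""
    return COULOMB_K * q_i * q_j / r

def pair_energy(xyz_i, xyz_j, atom_i, atom_j):
    """Sum of the Lennard-Jones and Coulomb terms for one atom pair."""
    r = math.dist(xyz_i, xyz_j)
    lj = lj_energy(r, atom_i["rmin_half"], atom_i["eps"],
                   atom_j["rmin_half"], atom_j["eps"])
    return lj + coulomb_energy(r, atom_i["charge"], atom_j["charge"])

# Hypothetical parameters, chosen for illustration only (not taken from ff94 or ff14SB).
atom_a = {"rmin_half": 1.9, "eps": 0.1, "charge": 0.2}
atom_b = {"rmin_half": 1.9, "eps": 0.1, "charge": -0.2}
energy = pair_energy((0.0, 0.0, 0.0), (3.8, 0.0, 0.0), atom_a, atom_b)
```

Per-pair energies of this kind could then serve as input features for an ML model such as a random forest, which is the combination the abstract describes.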
Affiliation(s)
- Jun Pei
  - Department of Chemistry and the Department of Biochemistry and Molecular Biology, Michigan State University, 578 South Shaw Lane, East Lansing, Michigan 48824, United States
- Lin Frank Song
  - Department of Chemistry and the Department of Biochemistry and Molecular Biology, Michigan State University, 578 South Shaw Lane, East Lansing, Michigan 48824, United States
- Kenneth M Merz
  - Department of Chemistry and the Department of Biochemistry and Molecular Biology, Michigan State University, 578 South Shaw Lane, East Lansing, Michigan 48824, United States
|
3
|
Methods for the Refinement of Protein Structure 3D Models. Int J Mol Sci 2019; 20:ijms20092301. [PMID: 31075942 PMCID: PMC6539982 DOI: 10.3390/ijms20092301] [Citation(s) in RCA: 36] [Impact Index Per Article: 6.0] [Received: 04/02/2019] [Revised: 04/24/2019] [Accepted: 05/07/2019] [Indexed: 12/25/2022] Open
Abstract
The refinement of predicted 3D protein models is crucial in bringing them closer to experimental accuracy for further computational studies. Refinement approaches can be divided into two main stages: the sampling and scoring stages. Sampling strategies, such as the popular Molecular Dynamics (MD)-based protocols, aim to generate improved 3D models. However, generating 3D models that are closer to the native structure than the initial model remains challenging, as structural deviations from the native basin can be encountered due to force-field inaccuracies. Therefore, different restraint strategies have been applied in order to avoid deviations away from the native structure. For example, the accurate prediction of local errors and/or contacts in the initial models can be used to guide restraints. MD-based protocols, using physics-based force fields and smart restraints, have made significant progress towards a more consistent refinement of 3D models. The scoring stage, which includes energy functions and Model Quality Assessment Programs (MQAPs), is used to discriminate near-native conformations from non-native conformations. Nevertheless, there are often very small differences among generated 3D models in refinement pipelines, which makes model discrimination and selection problematic. For this reason, the identification of the most native-like conformations remains a major challenge.
|
4
|
Xu G, Ma T, Wang Q, Ma J. OPUS-SSF: A side-chain-inclusive scoring function for ranking protein structural models. Protein Sci 2019; 28:1157-1162. [PMID: 30919509 DOI: 10.1002/pro.3608] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.8] [Received: 12/26/2018] [Revised: 03/21/2019] [Accepted: 03/27/2019] [Indexed: 12/21/2022]
Abstract
We introduce a side-chain-inclusive scoring function, named OPUS-SSF, for ranking protein structural models. The method builds a scoring function based on the native distributions of the coordinate components of certain anchoring points in a local molecular system for peptide segments of 5, 7, 9, and 11 residues in length. Differing from our previous OPUS-CSF [Xu et al., Protein Sci. 2018; 27: 286-292], which exclusively uses main chain information, OPUS-SSF employs anchoring points on side chains so that the effect of side chains is taken into account. The performance of OPUS-SSF was tested on 15 decoy sets containing a total of 603 proteins, and 571 of them had their native structures recognized from their decoys. Similar to OPUS-CSF, OPUS-SSF does not employ the Boltzmann formula in constructing scoring functions. The results indicate that OPUS-SSF has achieved a significant improvement in decoy recognition and that it should be a very useful tool for protein structural prediction and modeling.
Affiliation(s)
- Gang Xu
  - School of Life Sciences, Tsinghua University, Beijing 100084, People's Republic of China
- Tianqi Ma
  - Applied Physics Program, Rice University, Houston, Texas 77005
  - Department of Bioengineering, Rice University, Houston, Texas 77005
- Qinghua Wang
  - Verna and Marrs McLean Department of Biochemistry and Molecular Biology, Baylor College of Medicine, Houston, Texas 77030
- Jianpeng Ma
  - School of Life Sciences, Tsinghua University, Beijing 100084, People's Republic of China
  - Applied Physics Program, Rice University, Houston, Texas 77005
  - Department of Bioengineering, Rice University, Houston, Texas 77005
  - Verna and Marrs McLean Department of Biochemistry and Molecular Biology, Baylor College of Medicine, Houston, Texas 77030
|
5
|
Pei J, Zheng Z, Merz KM. Random Forest Refinement of the KECSA2 Knowledge-Based Scoring Function for Protein Decoy Detection. J Chem Inf Model 2019; 59:1919-1929. [DOI: 10.1021/acs.jcim.8b00734] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.7] [Indexed: 11/29/2022]
Affiliation(s)
- Jun Pei
  - Department of Chemistry, Michigan State University, 578 S. Shaw Lane, East Lansing, Michigan 48824, United States
- Zheng Zheng
  - Department of Chemistry, Michigan State University, 578 S. Shaw Lane, East Lansing, Michigan 48824, United States
- Kenneth M. Merz
  - Department of Chemistry, Michigan State University, 578 S. Shaw Lane, East Lansing, Michigan 48824, United States
  - Institute for Cyber Enabled Research, Michigan State University, 567 Wilson Road, East Lansing, Michigan 48824, United States
|
6
|
Chu H, Liu H. TetraBASE: A Side Chain-Independent Statistical Energy for Designing Realistically Packed Protein Backbones. J Chem Inf Model 2018; 58:430-442. [PMID: 29314837 DOI: 10.1021/acs.jcim.7b00677] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.4] [Indexed: 11/30/2022]
Abstract
Constructing backbone structures of high designability is a primary aspect of computational protein design. We report here a side chain-independent statistical energy that aims at realistic modeling of through-space packing of polypeptide backbones. To mitigate the lack of explicit amino acid side chains, the model treats the interbackbone site packing as being dependent on peptide local conformation. In addition, new variables suitable for statistical analysis, one for relative orientation and another for distance, have been introduced to represent the intersite geometry based on the asymmetrical tetrahedron organization of distinct chemical groups surrounding the Cα-carbon atoms. The resulting tetrahedron-based backbone statistical energy (tetraBASE) model has been used to optimize the tertiary organizations of secondary structure elements (SSEs) of designated types with Monte Carlo simulated annealing, starting from artificial initial configurations. The tetraBASE minimum energy structures can reproduce SSE packing frequently observed in native proteins with atomic root-mean-square deviations of 1-2 Å. The model has also been tested by examining the stability of native SSE arrangements under tetraBASE. The results suggest that the tetraBASE model can be used to effectively represent interbackbone packing when designing backbone structures without explicitly knowing side chain types.
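The Monte Carlo simulated annealing mentioned above can be sketched generically as follows. This is only an outline of the optimization scheme under simple assumptions (Metropolis acceptance, geometric cooling); the energy function here is a hypothetical one-dimensional toy, not the tetraBASE energy, and all names and default values are illustrative.

```python
import math
import random

def simulated_annealing(energy, propose, x0, t_start=2.0, t_end=0.01,
                        n_steps=5000, seed=0):
    """Minimize `energy` with Metropolis moves under a geometric cooling schedule."""
    rng = random.Random(seed)
    x, e = x0, energy(x0)
    best_x, best_e = x, e
    for step in range(n_steps):
        # Temperature decays geometrically from t_start to t_end.
        t = t_start * (t_end / t_start) ** (step / (n_steps - 1))
        cand = propose(x, rng)
        e_cand = energy(cand)
        # Always accept downhill moves; accept uphill moves with Boltzmann probability.
        if e_cand <= e or rng.random() < math.exp(-(e_cand - e) / t):
            x, e = cand, e_cand
            if e < best_e:
                best_x, best_e = x, e
    return best_x, best_e

def toy_energy(x):
    """Hypothetical stand-in for a statistical energy; minimum at x = 3."""
    return (x - 3.0) ** 2

def toy_move(x, rng):
    """Small random perturbation of the current configuration."""
    return x + rng.uniform(-0.5, 0.5)

best_x, best_e = simulated_annealing(toy_energy, toy_move, x0=0.0)
```

In a backbone design setting, `x` would be a set of SSE positions and orientations, and `propose` would apply rigid-body perturbations instead of a scalar shift.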
Affiliation(s)
- Huanyu Chu
  - School of Life Sciences, University of Science and Technology of China, 230027 Hefei, Anhui, China
  - Hefei National Laboratory for Physical Sciences at the Microscales, 230027 Hefei, Anhui, China
- Haiyan Liu
  - School of Life Sciences, University of Science and Technology of China, 230027 Hefei, Anhui, China
  - Hefei National Laboratory for Physical Sciences at the Microscales, 230027 Hefei, Anhui, China
  - Collaborative Innovation Center of Chemistry for Life Sciences, 230027 Hefei, Anhui, China
|
7
|
Xu G, Ma T, Zang T, Wang Q, Ma J. OPUS-CSF: A C-atom-based scoring function for ranking protein structural models. Protein Sci 2017; 27:286-292. [PMID: 29047165 PMCID: PMC5734313 DOI: 10.1002/pro.3327] [Citation(s) in RCA: 15] [Impact Index Per Article: 1.9] [Received: 07/12/2017] [Revised: 10/14/2017] [Accepted: 10/16/2017] [Indexed: 12/12/2022]
Abstract
We report a C‐atom‐based scoring function, named OPUS‐CSF, for ranking protein structural models. Rather than using the traditional Boltzmann formula, we built a scoring function (CSF score) based on the native distributions (derived from the entire PDB) of coordinate components of mainchain C (carbonyl) atoms on selected residues of peptide segments of 5, 7, 9, and 11 residues in length. In testing OPUS‐CSF on decoy recognition, it recognized at most 257 native structures out of 278 targets in 11 commonly used decoy sets, significantly outperforming other popular all‐atom empirical potentials. The average correlation coefficient with TM‐score was also comparable with those of other potentials. OPUS‐CSF is a highly coarse‐grained scoring function that requires only partial mainchain information as input and is very fast. Thus, it is suitable for applications at an early stage of structure building.
Affiliation(s)
- Gang Xu
  - School of Life Sciences, Tsinghua University, Beijing, China
- Tianqi Ma
  - Applied Physics Program, Rice University, Houston, Texas
  - Department of Bioengineering, Rice University, Houston, Texas
- Tianwu Zang
  - Applied Physics Program, Rice University, Houston, Texas
  - Department of Bioengineering, Rice University, Houston, Texas
- Qinghua Wang
  - Verna and Marrs McLean Department of Biochemistry and Molecular Biology, Baylor College of Medicine, One Baylor Plaza, BCM-125, Houston, Texas
- Jianpeng Ma
  - School of Life Sciences, Tsinghua University, Beijing, China
  - Applied Physics Program, Rice University, Houston, Texas
  - Department of Bioengineering, Rice University, Houston, Texas
  - Verna and Marrs McLean Department of Biochemistry and Molecular Biology, Baylor College of Medicine, One Baylor Plaza, BCM-125, Houston, Texas
|
8
|
Xu G, Ma T, Zang T, Sun W, Wang Q, Ma J. OPUS-DOSP: A Distance- and Orientation-Dependent All-Atom Potential Derived from Side-Chain Packing. J Mol Biol 2017; 429:3113-3120. [PMID: 28864201 DOI: 10.1016/j.jmb.2017.08.013] [Citation(s) in RCA: 23] [Impact Index Per Article: 2.9] [Received: 03/04/2017] [Revised: 07/27/2017] [Accepted: 08/22/2017] [Indexed: 01/18/2023]
Abstract
We report a new distance- and orientation-dependent, all-atom statistical potential derived from side-chain packing, named OPUS-DOSP, for protein structure modeling. The framework of OPUS-DOSP is based on OPUS-PSP, previously developed by us [JMB (2008), 376, 288-301], with refinements and new features. In particular, the distance or orientation contribution is considered depending on the range of contact distance. A new auxiliary function is also introduced into the energy function, in addition to the traditional Boltzmann term, in order to adjust the contributions of extreme cases. OPUS-DOSP was tested on 11 decoy sets commonly used for statistical potential benchmarking. Among 278 native structures, 239 and 249 native structures were recognized by OPUS-DOSP without and with the auxiliary function, respectively. The results show that OPUS-DOSP has an increased decoy recognition capability compared with that of other relevant potentials to date.
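For readers unfamiliar with the "traditional Boltzmann term" referred to above, the sketch below shows the usual inverse-Boltzmann construction of a distance-dependent statistical potential, E(r) = -kT ln(P_obs(r) / P_ref(r)), from binned contact counts. It is a generic illustration, not the OPUS-DOSP energy function: the auxiliary function and the orientation terms are not modeled, and the counts, the pseudocount, and the function name are assumptions made for the example.

```python
import math

KT = 0.593  # approximate kT in kcal/mol near 298 K

def boltzmann_potential(observed_counts, reference_counts, pseudocount=1.0):
    """Per-bin energy -kT * ln(P_obs / P_ref) from binned pair-distance counts.

    The pseudocount (an assumption of this sketch) keeps empty bins finite.
    """
    n_bins = len(observed_counts)
    obs_total = sum(observed_counts) + pseudocount * n_bins
    ref_total = sum(reference_counts) + pseudocount * n_bins
    energies = []
    for obs, ref in zip(observed_counts, reference_counts):
        p_obs = (obs + pseudocount) / obs_total
        p_ref = (ref + pseudocount) / ref_total
        energies.append(-KT * math.log(p_obs / p_ref))
    return energies

# Hypothetical counts for three distance bins of one atom-type pair.
energies = boltzmann_potential([10, 30, 10], [20, 10, 20])
```

Bins that are over-represented in native structures relative to the reference state receive negative (favorable) energies, and under-represented bins receive positive ones.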
Affiliation(s)
- Gang Xu
  - School of Life Sciences, Tsinghua University, Beijing 100084, China
- Tianqi Ma
  - Applied Physics Program, Rice University, Houston, TX 77005, United States
  - Department of Bioengineering, Rice University, Houston, TX 77005, United States
- Tianwu Zang
  - Applied Physics Program, Rice University, Houston, TX 77005, United States
  - Department of Bioengineering, Rice University, Houston, TX 77005, United States
- Weitao Sun
  - Zhou Pei-Yuan Center for Applied Mathematics, Tsinghua University, Beijing 100084, China
- Qinghua Wang
  - Verna and Marrs McLean Department of Biochemistry and Molecular Biology, Baylor College of Medicine, Houston, TX 77030, United States
- Jianpeng Ma
  - School of Life Sciences, Tsinghua University, Beijing 100084, China
  - Applied Physics Program, Rice University, Houston, TX 77005, United States
  - Department of Bioengineering, Rice University, Houston, TX 77005, United States
  - Verna and Marrs McLean Department of Biochemistry and Molecular Biology, Baylor College of Medicine, Houston, TX 77030, United States
|
9
|
Borguesan B, Inostroza-Ponta M, Dorn M. NIAS-Server: Neighbors Influence of Amino acids and Secondary Structures in Proteins. J Comput Biol 2017; 24:255-265. [DOI: 10.1089/cmb.2016.0074] [Citation(s) in RCA: 16] [Impact Index Per Article: 2.0] [Indexed: 11/13/2022] Open
Affiliation(s)
- Bruno Borguesan
  - Institute of Informatics, Federal University of Rio Grande do Sul, Porto Alegre, Brazil
- Mario Inostroza-Ponta
  - Departamento de Ingeniería Informática, Center for Biotechnology and Bioengineering, Universidad de Santiago de Chile, Santiago, Chile
- Márcio Dorn
  - Institute of Informatics, Federal University of Rio Grande do Sul, Porto Alegre, Brazil
|
10
|
|
11
|
Elhefnawy W, Chen L, Han Y, Li Y. ICOSA: A Distance-Dependent, Orientation-Specific Coarse-Grained Contact Potential for Protein Structure Modeling. J Mol Biol 2015; 427:2562-2576. [DOI: 10.1016/j.jmb.2015.05.022] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.9] [Received: 04/10/2015] [Accepted: 05/21/2015] [Indexed: 11/16/2022]
|
12
|
Zheng F, Zhang J, Grigoryan G. Tertiary Structural Propensities Reveal Fundamental Sequence/Structure Relationships. Structure 2015; 23:961-971. [DOI: 10.1016/j.str.2015.03.015] [Citation(s) in RCA: 15] [Impact Index Per Article: 1.5] [Received: 12/09/2014] [Revised: 03/02/2015] [Accepted: 03/22/2015] [Indexed: 02/08/2023]
|
13
|
Subramaniam S, Senes A. Backbone dependency further improves side chain prediction efficiency in the Energy-based Conformer Library (bEBL). Proteins 2014; 82:3177-87. [PMID: 25212195 DOI: 10.1002/prot.24685] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.1] [Received: 07/02/2014] [Revised: 08/21/2014] [Accepted: 09/03/2014] [Indexed: 12/11/2022]
Abstract
Side chain optimization is an integral component of many protein modeling applications. In these applications, the conformational freedom of the side chains is often explored using libraries of discrete, frequently occurring conformations. Because side chain optimization can pose a computationally intensive combinatorial problem, the nature of these conformer libraries is important for ensuring efficiency and accuracy in side chain prediction. We have previously developed an innovative method to create a conformer library with enhanced performance. The Energy-based Library (EBL) was obtained by analyzing the energetic interactions between conformers and a large number of natural protein environments from crystal structures. This process guided the selection of conformers with the highest propensity to fit into spaces that should accommodate a side chain. Because the method requires a large crystallographic data-set, the EBL was created in a backbone-independent fashion. However, it is well established that side chain conformation is strongly dependent on the local backbone geometry, and that backbone-dependent libraries are more efficient in side chain optimization. Here we present the backbone-dependent EBL (bEBL), whose conformers are independently sorted for each populated region of Ramachandran space. The resulting library closely mirrors the local backbone-dependent distribution of side chain conformation. Compared to the EBL, we demonstrate that the bEBL uses fewer conformers to produce similar side chain prediction outcomes, thus further improving performance with respect to the already efficient backbone-independent version of the library.
Affiliation(s)
- Sabareesh Subramaniam
  - Department of Biochemistry, University of Wisconsin-Madison, Madison, Wisconsin, 53706
|
14
|
Moal IH, Fernandez-Recio J. Intermolecular Contact Potentials for Protein-Protein Interactions Extracted from Binding Free Energy Changes upon Mutation. J Chem Theory Comput 2013; 9:3715-27. [PMID: 26584123 DOI: 10.1021/ct400295z] [Citation(s) in RCA: 34] [Impact Index Per Article: 2.8] [Indexed: 01/31/2023]
Abstract
Understanding and predicting the energetics of protein-protein interactions is fundamental to the structural modeling of protein complexes. Binding free energy can be approximated as a sum of pairwise atomic or residue contact energies, which are commonly inferred from contact frequencies observed in experimental protein structures. However, such statistically inferred potentials require certain assumptions and approximations. Here, we explore the possibility of deriving atomic and residue contact potentials directly from experimental binding free energy changes following mutation and present a number of such potentials. The first set of potentials is obtained by unweighted least-squares fitting and bootstrap aggregating. The second set is calculated using a weighting scheme optimized against absolute binding affinity data, so as to account for the over-representation of certain complexes, residues, and families of interactions. The congruence of the potentials with known physical chemistry is investigated. The potentials are further validated by ranking and clustering protein-protein docking poses.
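The first fitting strategy described above (unweighted least squares combined with bootstrap aggregating) can be sketched as follows, assuming each mutation is summarized by the change in the number of contacts of each type and that the free energy change is a linear sum of per-contact energies. This is an illustrative outline rather than the authors' procedure: the data are synthetic, the small ridge term is an assumption added only to keep resampled systems solvable, and all function names are hypothetical.

```python
import random

def solve_linear(matrix, vector):
    """Solve matrix @ x = vector by Gaussian elimination with partial pivoting."""
    n = len(vector)
    a = [row[:] + [vector[i]] for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        for r in range(col + 1, n):
            factor = a[r][col] / a[col][col]
            for c in range(col, n + 1):
                a[r][c] -= factor * a[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(a[i][c] * x[c] for c in range(i + 1, n))
        x[i] = (a[i][n] - tail) / a[i][i]
    return x

def least_squares(rows, targets, ridge=1e-8):
    """Unweighted least squares via the normal equations (tiny ridge for stability)."""
    n = len(rows[0])
    ata = [[sum(r[i] * r[j] for r in rows) + (ridge if i == j else 0.0)
            for j in range(n)] for i in range(n)]
    atb = [sum(r[i] * t for r, t in zip(rows, targets)) for i in range(n)]
    return solve_linear(ata, atb)

def bagged_potentials(rows, targets, n_models=50, seed=0):
    """Average the fitted contact energies over bootstrap resamples of the data."""
    rng = random.Random(seed)
    n = len(rows)
    totals = [0.0] * len(rows[0])
    for _ in range(n_models):
        idx = [rng.randrange(n) for _ in range(n)]
        coef = least_squares([rows[i] for i in idx], [targets[i] for i in idx])
        totals = [t + c for t, c in zip(totals, coef)]
    return [t / n_models for t in totals]

# Synthetic example: two contact types with "true" energies -0.5 and 0.3 kcal/mol.
contact_changes = [[1, 0], [0, 1], [1, 1], [2, -1], [-1, 2], [1, -1]]
ddg = [-0.5 * a + 0.3 * b for a, b in contact_changes]
estimated = bagged_potentials(contact_changes, ddg)
```

With real mutation data the targets would be measured binding free energy changes, and the spread of coefficients across bootstrap replicas gives a rough sense of how well each contact energy is determined.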
Affiliation(s)
- Iain H Moal
  - Joint BSC-IRB Research Program in Computational Biology, Life Science Department, Barcelona Supercomputing Center, C/Jordi Girona 29, 08034 Barcelona, Spain
- Juan Fernandez-Recio
  - Joint BSC-IRB Research Program in Computational Biology, Life Science Department, Barcelona Supercomputing Center, C/Jordi Girona 29, 08034 Barcelona, Spain
|
15
|
Li Z, Yang Y, Zhan J, Dai L, Zhou Y. Energy functions in de novo protein design: current challenges and future prospects. Annu Rev Biophys 2013; 42:315-35. [PMID: 23451890 DOI: 10.1146/annurev-biophys-083012-130315] [Citation(s) in RCA: 65] [Impact Index Per Article: 5.4] [Indexed: 12/26/2022]
Abstract
In the past decade, a concerted effort to successfully capture specific tertiary packing interactions produced specific three-dimensional structures for many de novo designed proteins that are validated by nuclear magnetic resonance and/or X-ray crystallographic techniques. However, the success rate of computational design remains low. In this review, we provide an overview of experimentally validated, de novo designed proteins and compare four available programs, RosettaDesign, EGAD, Liang-Grishin, and RosettaDesign-SR, by assessing designed sequences computationally. Computational assessment includes the recovery of native sequences, the calculation of sizes of hydrophobic patches and total solvent-accessible surface area, and the prediction of structural properties such as intrinsic disorder, secondary structures, and three-dimensional structures. This computational assessment, together with a recent community-wide experiment in assessing scoring functions for interface design, suggests that the next-generation protein-design scoring function will come from the right balance of complementary interaction terms. Such balance may be found when more negative experimental data become available as part of a training set.
Affiliation(s)
- Zhixiu Li
  - School of Informatics, Indiana University-Purdue University, Indianapolis, Indiana 46202, USA
|
16
|
Kuroda D, Shirai H, Jacobson MP, Nakamura H. Computer-aided antibody design. Protein Eng Des Sel 2012; 25:507-21. [PMID: 22661385 PMCID: PMC3449398 DOI: 10.1093/protein/gzs024] [Citation(s) in RCA: 173] [Impact Index Per Article: 13.3] [Received: 04/14/2012] [Revised: 04/14/2012] [Accepted: 04/19/2012] [Indexed: 11/12/2022] Open
Abstract
Recent clinical trials using antibodies with low toxicity and high efficiency have raised expectations for the development of next-generation protein therapeutics. However, the process of obtaining therapeutic antibodies remains time consuming and empirical. This review summarizes recent progress in the field of computer-aided antibody development, focusing mainly on antibody modeling, which is divided essentially into two parts: (i) modeling the antigen-binding site, also called the complementarity determining regions (CDRs), and (ii) predicting the relative orientations of the variable heavy (V(H)) and light (V(L)) chains. Among the six CDR loops, the greatest challenge is predicting the conformation of CDR-H3, which is the most important in antigen recognition. Further computational methods could be used in drug development based on crystal structures or homology models, including antibody-antigen docking and energy calculations with approximate potential functions. These methods should guide experimental studies to improve the affinities and physicochemical properties of antibodies. Finally, several successful examples of in silico structure-based antibody designs are reviewed. We also briefly review structure-based antigen or immunogen design, with application to rational vaccine development.
Affiliation(s)
- Daisuke Kuroda
- Institute for Protein Research, Osaka University, 3-2 Yamadaoka, Suita, Osaka, Japan.
17
Yu CM, Peng HP, Chen IC, Lee YC, Chen JB, Tsai KC, Chen CT, Chang JY, Yang EW, Hsu PC, Jian JW, Hsu HJ, Chang HJ, Hsu WL, Huang KF, Ma AC, Yang AS. Rationalization and design of the complementarity determining region sequences in an antibody-antigen recognition interface. PLoS One 2012; 7:e33340. [PMID: 22457753 PMCID: PMC3310866 DOI: 10.1371/journal.pone.0033340] [Citation(s) in RCA: 37] [Impact Index Per Article: 2.8] [Received: 08/02/2011] [Accepted: 02/14/2012] [Indexed: 12/01/2022]
Abstract
Protein-protein interactions are critical determinants in biological systems. Engineered proteins binding to specific areas on protein surfaces could lead to therapeutics or diagnostics for treating diseases in humans. But designing epitope-specific protein-protein interactions with computational atomistic interaction free energy remains a difficult challenge. Here we show that, with the antibody-VEGF (vascular endothelial growth factor) interaction as a model system, the experimentally observed amino acid preferences in the antibody-antigen interface can be rationalized with 3-dimensional distributions of interacting atoms derived from the database of protein structures. Machine learning models established on the rationalization can be generalized to design amino acid preferences in antibody-antigen interfaces, for which the experimental validations are tractable with current high throughput synthetic antibody display technologies. Leave-one-out cross validation on the benchmark system yielded the accuracy, precision, recall (sensitivity) and specificity of the overall binary predictions to be 0.69, 0.45, 0.63, and 0.71 respectively, and the overall Matthews correlation coefficient of the 20 amino acid types in the 24 interface CDR positions was 0.312. The structure-based computational antibody design methodology was further tested with other antibodies binding to VEGF. The results indicate that the methodology could provide alternatives to the current antibody technologies based on animal immune systems in engineering therapeutic and diagnostic antibodies against predetermined antigen epitopes.
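The performance figures quoted in this abstract (accuracy, precision, recall, specificity, and the Matthews correlation coefficient) all follow from the four counts of a binary confusion matrix. The sketch below uses the standard textbook definitions; the function name `binary_metrics` is illustrative, and how the authors aggregated predictions across the 20 amino acid types and 24 CDR positions is not stated here, so that step is not reproduced.

```python
import math


def binary_metrics(tp, fp, tn, fn):
    """Standard binary-classification metrics from confusion-matrix counts.

    tp/fp/tn/fn are the numbers of true positives, false positives,
    true negatives and false negatives. Ratios with an empty
    denominator are reported as 0.0 (a convention chosen for this sketch).
    """
    total = tp + fp + tn + fn
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0  # also called sensitivity
    specificity = tn / (tn + fp) if (tn + fp) else 0.0
    # Matthews correlation coefficient: ranges from -1 to +1.
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {
        "accuracy": accuracy,
        "precision": precision,
        "recall": recall,
        "specificity": specificity,
        "mcc": mcc,
    }
```

A combination such as precision 0.45 with specificity 0.71 typically indicates that negatives outnumber positives in the benchmark, which is why MCC is often reported alongside accuracy on imbalanced data.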
Affiliation(s)
- Chung-Ming Yu
- Genomics Research Center, Academia Sinica, Taipei, Taiwan
- Hung-Pin Peng
- Genomics Research Center, Academia Sinica, Taipei, Taiwan
- Institute of Biomedical Informatics, National Yang-Ming University, Taipei, Taiwan
- Bioinformatics Program, Taiwan International Graduate Program, Institute of Information Science, Academia Sinica, Taipei, Taiwan
- Ing-Chien Chen
- Genomics Research Center, Academia Sinica, Taipei, Taiwan
- Yu-Ching Lee
- Genomics Research Center, Academia Sinica, Taipei, Taiwan
- Jun-Bo Chen
- Genomics Research Center, Academia Sinica, Taipei, Taiwan
- Department of Computer Science, National Tsing-Hua University, Hsinchu, Taiwan
- Ching-Tai Chen
- Genomics Research Center, Academia Sinica, Taipei, Taiwan
- Institute of Bioinformatics and Systems Biology, National Chiao-Tung University, Hsinchu, Taiwan
- Institute of Information Sciences, Academia Sinica, Taipei, Taiwan
- Bioinformatics Program, Taiwan International Graduate Program, Institute of Information Science, Academia Sinica, Taipei, Taiwan
- Jeng-Yih Chang
- Genomics Research Center, Academia Sinica, Taipei, Taiwan
- Ei-Wen Yang
- Genomics Research Center, Academia Sinica, Taipei, Taiwan
- Institute of Information Sciences, Academia Sinica, Taipei, Taiwan
- Po-Chiang Hsu
- Genomics Research Center, Academia Sinica, Taipei, Taiwan
- Jhih-Wei Jian
- Genomics Research Center, Academia Sinica, Taipei, Taiwan
- Institute of Biomedical Informatics, National Yang-Ming University, Taipei, Taiwan
- Bioinformatics Program, Taiwan International Graduate Program, Institute of Information Science, Academia Sinica, Taipei, Taiwan
- Hung-Ju Hsu
- Genomics Research Center, Academia Sinica, Taipei, Taiwan
- Graduate Institute of Life Sciences, National Defense University, Taipei, Taiwan
- Hung-Ju Chang
- Genomics Research Center, Academia Sinica, Taipei, Taiwan
- Institute of Biochemical Science, National Taiwan University, Taipei, Taiwan
- Chemical Biology and Molecular Biophysics Program, Taiwan International Graduate Program, Institute of Biological Chemistry, Academia Sinica, Taipei, Taiwan
- Wen-Lian Hsu
- Institute of Information Sciences, Academia Sinica, Taipei, Taiwan
- Kai-Fa Huang
- Institute of Biological Chemistry, Academia Sinica, Taipei, Taiwan
- Alex Che Ma
- Genomics Research Center, Academia Sinica, Taipei, Taiwan
- An-Suei Yang
- Genomics Research Center, Academia Sinica, Taipei, Taiwan
18
Sundaramurthy P, Sreenivasan R, Shameer K, Gakkhar S, Sowdhamini R. HORIBALFRE program: Higher Order Residue Interactions Based ALgorithm for Fold REcognition. Bioinformation 2011; 7:352-9. [PMID: 22355236 PMCID: PMC3280490 DOI: 10.6026/97320630007352] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/21/2011] [Accepted: 11/24/2011] [Indexed: 11/23/2022]
Abstract
Understanding the functional and structural implications of a protein encoded in novel genes using function association or fold recognition approaches remains a challenging task in the current era of genomes, metagenomes and personal genomes. In an attempt to enhance potential-based fold-recognition methods in recognizing remote homology between proteins, we propose a new approach, "Higher Order Residue Interaction Based ALgorithm for Fold REcognition (HORIBALFRE)". Higher order residue interactions refer to a class of interactions in protein structures mediated by C(α) or C(β) atoms within a pre-defined distance cut-off. Higher order residue interactions (pairwise, triplet and quadruplet interactions) play a vital role in attaining the stable conformation of a protein structure. In HORIBALFRE, we incorporated the potential contributions from two-body (pairwise), three-body (triplet) and four-body (quadruplet) interactions to implement a new fold recognition algorithm. The core of the HORIBALFRE algorithm includes the potentials generated from a library of protein structures derived from the manually curated CAMPASS database of structure-based sequence alignments. We used Fischer's dataset, with 68 templates and 56 target sequences, derived from the SCOP database and performed one-against-all sequence alignment using TCoffee. Various potentials were derived using custom scripts and these potentials were incorporated in the HORIBALFRE algorithm. In this manuscript, we report the outline of a novel fold recognition algorithm and initial results. Our results show that inclusion of the quadruplet class of higher order residue interactions improves fold recognition.
Affiliation(s)
- Pandurangan Sundaramurthy
- National Center for Biological Sciences, Tata Institute of Fundamental Research, GKVK Campus, Bellary Road, Bangalore - 560065, India
- Department of Mathematics, Indian Institute of Technology Roorkee, Roorkee -247667, India
- Raashi Sreenivasan
- National Center for Biological Sciences, Tata Institute of Fundamental Research, GKVK Campus, Bellary Road, Bangalore - 560065, India
- Centre for Biotechnology, Anna University, Chennai - 600025, India
- University of Wisconsin-Madison, Madison, WI 53706-1481, USA
- Division of Cardiovascular Diseases, Mayo Clinic, Rochester, MN 55901, USA
- Khader Shameer
- National Center for Biological Sciences, Tata Institute of Fundamental Research, GKVK Campus, Bellary Road, Bangalore - 560065, India
- Authors contributed equally to this work
- Sunita Gakkhar
- Department of Mathematics, Indian Institute of Technology Roorkee, Roorkee -247667, India
- Ramanathan Sowdhamini
- National Center for Biological Sciences, Tata Institute of Fundamental Research, GKVK Campus, Bellary Road, Bangalore - 560065, India
19
Hu X, Hu H, Beratan DN, Yang W. A gradient-directed Monte Carlo approach for protein design. J Comput Chem 2010; 31:2164-8. [PMID: 20186860 DOI: 10.1002/jcc.21506] [Citation(s) in RCA: 12] [Impact Index Per Article: 0.8] [Indexed: 11/07/2022]
Abstract
We develop a new global optimization strategy, gradient-directed Monte Carlo (GDMC) sampling, to optimize protein sequence for a target structure using RosettaDesign. GDMC significantly improves the sampling of sequence space, compared to the classical Monte Carlo search protocol, for a fixed backbone conformation as well as for the simultaneous optimization of sequence and structure. As such, GDMC sampling enhances the efficiency of protein design.
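For orientation, the classical Monte Carlo search that GDMC is compared against can be sketched as a Metropolis walk over sequence space on a fixed backbone. The sketch below is neither GDMC nor RosettaDesign: `toy_energy`, the 20-letter `ALPHABET`, the temperature and the step count are hypothetical stand-ins chosen only to make the example self-contained.

```python
import math
import random

ALPHABET = "ACDEFGHIKLMNPQRSTVWY"  # one-letter codes of the 20 standard amino acids


def toy_energy(seq, target):
    """Hypothetical stand-in for a design energy: mismatches against a target string."""
    return sum(a != b for a, b in zip(seq, target))


def metropolis_design(energy, length, steps=2000, temperature=0.5, seed=0):
    """Classical Metropolis Monte Carlo search over sequences of a fixed length.

    `energy` maps a list of residues to a number (lower is better).
    Returns the best sequence seen and its energy.
    """
    rng = random.Random(seed)
    seq = [rng.choice(ALPHABET) for _ in range(length)]
    current = energy(seq)
    best_seq, best = list(seq), current
    for _ in range(steps):
        pos = rng.randrange(length)
        old = seq[pos]
        seq[pos] = rng.choice(ALPHABET)  # propose a single-site mutation
        trial = energy(seq)
        delta = trial - current
        # Accept downhill moves always, uphill moves with Boltzmann probability.
        if delta <= 0 or rng.random() < math.exp(-delta / temperature):
            current = trial
            if current < best:
                best_seq, best = list(seq), current
        else:
            seq[pos] = old  # reject: restore the previous residue
    return "".join(best_seq), best
```

Calling `metropolis_design(lambda s: toy_energy(s, "ACDEFGHIKL"), 10)` returns the lowest-energy sequence visited. According to the abstract, a gradient-directed scheme improves on this kind of search by sampling sequence space more effectively than undirected random mutations.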
Affiliation(s)
- Xiangqian Hu
- Department of Chemistry, French Family Science Center, Duke University, Durham, North Carolina 27708-0346, USA
20
Dai L, Yang Y, Kim HR, Zhou Y. Improving computational protein design by using structure-derived sequence profile. Proteins 2010; 78:2338-48. [PMID: 20544969 DOI: 10.1002/prot.22746] [Citation(s) in RCA: 21] [Impact Index Per Article: 1.4] [Indexed: 12/13/2022]
Abstract
Designing a protein sequence that will fold into a predefined structure is of both practical and fundamental interest. Many successful computational designs in the last decade resulted from improved understanding of hydrophobic and polar interactions between side chains of amino acid residues in stabilizing protein tertiary structures. However, the coupling between main-chain backbone structure and local sequence has yet to be fully addressed. Here, we attempt to account for such coupling by using a sequence profile derived from the sequences of five-residue fragments in a fragment library that are structurally matched to the five-residue segments contained in a target structure. We further introduced a term to reduce low-complexity regions of designed sequences. These two terms, together with optimized reference states for amino-acid residues, were implemented in the RosettaDesign program. The new method, called RosettaDesign-SR, yields a 12% increase (from 34% to 46%) in the fraction of proteins whose designed sequences are more than 35% identical to wild-type sequences. Meanwhile, it reduces by 8% (from 22% to 14%) the number of designed sequences that are not homologous to any known protein sequences according to psi-blast. More importantly, the sequences designed by RosettaDesign-SR have 2-3% more polar residues at the surface and core regions of proteins, and these surface and core polar residues have about 4% higher sequence identity to wild-type sequences than those designed by RosettaDesign. Thus, the proteins designed by RosettaDesign-SR should be less likely to aggregate and more likely to have unique structures due to more specific polar interactions.
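The headline statistic in this abstract, the fraction of proteins whose designed sequence is more than 35% identical to the wild type, can be illustrated with a position-wise identity calculation. This is a minimal sketch that assumes fixed-backbone design, so designed and wild-type sequences have equal length and need no alignment; the function names are illustrative and are not taken from RosettaDesign-SR.

```python
def sequence_identity(designed, wild_type):
    """Fraction of positions at which two equal-length sequences agree."""
    if len(designed) != len(wild_type):
        raise ValueError("sequences must have equal length (no alignment is performed)")
    if not wild_type:
        raise ValueError("sequences must be non-empty")
    matches = sum(a == b for a, b in zip(designed, wild_type))
    return matches / len(wild_type)


def fraction_above(pairs, threshold=0.35):
    """Fraction of (designed, wild_type) pairs whose identity exceeds the threshold."""
    hits = sum(sequence_identity(d, w) > threshold for d, w in pairs)
    return hits / len(pairs)
```

For example, `sequence_identity("ACDE", "ACDF")` evaluates to 0.75, so that pair would count toward the fraction above the 35% threshold.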
Affiliation(s)
- Liang Dai
- School of Informatics, Indiana University Purdue University, Indianapolis, Indiana 46202, USA
21
Klenin K, Strodel B, Wales DJ, Wenzel W. Modelling proteins: conformational sampling and reconstruction of folding kinetics. Biochimica et Biophysica Acta - Proteins and Proteomics 2010; 1814:977-1000. [PMID: 20851219 DOI: 10.1016/j.bbapap.2010.09.006] [Citation(s) in RCA: 24] [Impact Index Per Article: 1.6] [Received: 06/22/2010] [Revised: 09/03/2010] [Accepted: 09/05/2010] [Indexed: 01/08/2023]
Abstract
In the last decades biomolecular simulation has made tremendous inroads to help elucidate biomolecular processes in-silico. Despite enormous advances in molecular dynamics techniques and the available computational power, many problems involve long time scales and large-scale molecular rearrangements that are still difficult to sample adequately. In this review we therefore summarise recent efforts to fundamentally improve this situation by decoupling the sampling of the energy landscape from the description of the kinetics of the process. Recent years have seen the emergence of many advanced sampling techniques, which permit efficient characterisation of the relevant family of molecular conformations by dispensing with the details of the short-term kinetics of the process. Because these methods generate thermodynamic information at best, they must be complemented by techniques to reconstruct the kinetics of the process using the ensemble of relevant conformations. Here we review recent advances for both types of methods and discuss their perspectives to permit efficient and accurate modelling of large-scale conformational changes in biomolecules. This article is part of a Special Issue entitled: Protein Dynamics: Experimental and Computational Approaches.
Affiliation(s)
- Konstantin Klenin
- Steinbuch Centre for Computing, Karlsruhe Institute of Technology, P.O. Box 3640, D-76021 Karlsruhe, Germany
22
Solis AD, Rackovsky SR. Information-theoretic analysis of the reference state in contact potentials used for protein structure prediction. Proteins 2010; 78:1382-97. [PMID: 20034109 DOI: 10.1002/prot.22652] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.3] [Indexed: 11/10/2022]
Abstract
Using information-theoretic concepts, we examine the role of the reference state, a crucial component of empirical potential functions, in protein fold recognition. We derive an information-based connection between the probability distribution functions of the reference state and those that characterize the decoy set used in threading. In examining commonly used contact reference states, we find that the quasi-chemical approximation is informatically superior to other variant models designed to include characteristics of real protein chains, such as finite length and variable amino acid composition from protein to protein. We observe that in these variant models, the total divergence, the operative function that quantifies discrimination, decreases along with threading performance. We find that any amount of nativeness encoded in the reference state model does not significantly improve threading performance. A promising avenue for the development of better potentials is suggested by our information-theoretic analysis of the action of contact potentials on individual protein sequences. Our results show that contact potentials perform better when the compositional properties of the data set used to derive the score function probabilities are similar to the properties of the sequence of interest. The results also suggest using only sequences of similar composition in deriving contact potentials, so as to tailor the contact potential specifically for a test sequence.
Affiliation(s)
- Armando D Solis
- Department of Pharmacology and Systems Therapeutics, Mount Sinai School of Medicine, New York, New York 10029, USA.
23
DeBartolo J, Hocky G, Wilde M, Xu J, Freed KF, Sosnick TR. Protein structure prediction enhanced with evolutionary diversity: SPEED. Protein Sci 2010; 19:520-34. [PMID: 20066664 DOI: 10.1002/pro.330] [Citation(s) in RCA: 19] [Impact Index Per Article: 1.3] [Indexed: 12/31/2022]
Abstract
For naturally occurring proteins, similar sequence implies similar structure. Consequently, multiple sequence alignments (MSAs) often are used in template-based modeling of protein structure and have been incorporated into fragment-based assembly methods. Our previous homology-free structure prediction study introduced an algorithm that mimics the folding pathway by coupling the formation of secondary and tertiary structure. Moves in the Monte Carlo procedure involve only a change in a single pair of phi,psi backbone dihedral angles that are obtained from a Protein Data Bank-based distribution appropriate for each amino acid, conditional on the type and conformation of the flanking residues. We improve this method by using MSAs to enrich the sampling distribution, but in a manner that does not require structural knowledge of any protein sequence (i.e., not homologous fragment insertion). In combination with other tools, including clustering and refinement, the accuracies of the predicted secondary and tertiary structures are substantially improved and a global and position-resolved measure of confidence is introduced for the accuracy of the predictions. Performance of the method in the Critical Assessment of Structure Prediction (CASP8) is discussed.
Affiliation(s)
- Joe DeBartolo
- Department of Biochemistry and Molecular Biology, University of Chicago, Chicago, Illinois 60637, USA
24
Sundaramurthy P, Shameer K, Sreenivasan R, Gakkhar S, Sowdhamini R. HORI: a web server to compute Higher Order Residue Interactions in protein structures. BMC Bioinformatics 2010; 11 Suppl 1:S24. [PMID: 20122196 PMCID: PMC3009495 DOI: 10.1186/1471-2105-11-s1-s24] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.5] [Indexed: 11/10/2022]
Abstract
BACKGROUND: Folding of a protein into its three-dimensional structure is influenced by both local and global interactions within the protein. Higher order residue interactions, such as pairwise, triplet and quadruplet ones, play a vital role in attaining the stable conformation of the protein structure. It is generally agreed that higher order interactions make a significant contribution to the potential energy landscape of folded proteins, and it is therefore important to identify them to estimate their contributions to the overall stability of a protein structure. RESULTS: We developed HORI [Higher order residue interactions in proteins], a web server for the calculation of global and local higher order interactions in protein structures. The basic algorithm of HORI is based on the classical concept of four-body nearest-neighbour propensities of amino-acid residues. It has been shown that higher order residue interactions up to the level of quadruplet interactions play a major role in the three-dimensional structure of proteins and are an important feature that can be used in protein structure analysis. CONCLUSION: The HORI server will be a useful resource for the structural bioinformatics community to perform analysis on protein structures based on higher order residue interactions. It is a highly interactive web server designed in three modules that enables the user to analyse higher order residue interactions in protein structures. The HORI server is available from the URL: http://caps.ncbs.res.in/hori.
Affiliation(s)
- Pandurangan Sundaramurthy
- National Centre for Biological Sciences (TIFR), GKVK Campus, Bellary Road, Bangalore, 560065, India.
25
Ruiz-Blanco Yasser B, García Y, Sotomayor-Torres C, Yovani MP. New set of 2D/3D thermodynamic indices for proteins. A formalism based on “Molten Globule” theory. ACTA ACUST UNITED AC 2010. [DOI: 10.1016/j.phpro.2010.10.013] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.4] [Indexed: 10/18/2022]
26
Statistical theory of neutral protein evolution by random site mutations. J Chem Sci 2009. [DOI: 10.1007/s12039-009-0105-9] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/20/2022]
27
Bhattacherjee A, Biswas P. Combinatorial design of protein sequences with applications to lattice and real proteins. J Chem Phys 2009; 131:125101. [DOI: 10.1063/1.3236519] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.3] [Indexed: 11/14/2022]
28
Cohen M, Potapov V, Schreiber G. Four distances between pairs of amino acids provide a precise description of their interaction. PLoS Comput Biol 2009; 5:e1000470. [PMID: 19680437 PMCID: PMC2715887 DOI: 10.1371/journal.pcbi.1000470] [Citation(s) in RCA: 24] [Impact Index Per Article: 1.5] [Received: 05/06/2009] [Accepted: 07/15/2009] [Indexed: 11/18/2022]
Abstract
The three-dimensional structures of proteins are stabilized by the interactions between amino acid residues. Here we report a method where four distances are calculated between any two side chains to provide an exact spatial definition of their bonds. The data were binned into a four-dimensional grid and compared to a random model, from which the preference for specific four-distances was calculated. A clear relation between the quality of the experimental data and the tightness of the distance distribution was observed, with crystal structure data providing far tighter distance distributions than NMR data. Since the four-distance data have higher information content than classical bond descriptions, we were able to identify many unique inter-residue features not found previously in proteins. For example, we found that the side chains of Arg, Glu, Val and Leu are not symmetrical with respect to the interactions of their head groups. The described method may be developed into a function that accurately models protein structures computationally.
Affiliation(s)
- Mati Cohen
- Department of Biological Chemistry, Weizmann Institute of Science, Rehovot, Israel
- Vladimir Potapov
- Department of Biological Chemistry, Weizmann Institute of Science, Rehovot, Israel
- Gideon Schreiber
- Department of Biological Chemistry, Weizmann Institute of Science, Rehovot, Israel
29
Bhattacherjee A, Biswas P. Statistical Theory of Protein Sequence Design by Random Mutation. J Phys Chem B 2009; 113:5520-7. [DOI: 10.1021/jp810515s] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.6] [Indexed: 11/29/2022]
Affiliation(s)
- Parbati Biswas
- Department of Chemistry, University of Delhi, Delhi-110007
30
Li Q, Zhou C, Liu H. Fragment-based local statistical potentials derived by combining an alphabet of protein local structures with secondary structures and solvent accessibilities. Proteins 2009; 74:820-36. [PMID: 18704928 DOI: 10.1002/prot.22191] [Citation(s) in RCA: 15] [Impact Index Per Article: 0.9] [Indexed: 11/08/2022]
Abstract
General and transferable statistical potentials to quantify the compatibility between local structures and local sequences of peptide fragments in proteins were derived. In the derivation, structure clusters of fragments are obtained by clustering five-residue fragments in native proteins based on their conformations represented by a local structure alphabet (de Brevern et al., Proteins 2000;41:271-287), secondary structure states, and solvent accessibilities. On the basis of the native sequences of the structurally clustered fragments, the probabilities of different amino acid sequences were estimated for each structure cluster. From the sequence probabilities, statistical energies as a function of sequence for a given structure were directly derived. The same sequence probabilities were employed in a database-matching approach to derive statistical energies as a function of local structure for a given sequence. Compared with prior models of local statistical potentials, we provided an integrated approach in which local conformations and local environments are treated jointly, structures are treated in units of fragments instead of individual residues so that coupling between the conformations of adjacent residues is included, and strong interdependences between the conformations of overlapping or neighboring fragment units are also considered. In tests including fragment threading, pseudosequence design, and local structure predictions, the potentials performed at least comparably and, in most cases, better than a number of existing models applicable to the same contexts indicating the advantages of such an integrated approach for deriving local potentials and suggesting applicability of the statistical potentials derived here in sequence designs and structure predictions.
Affiliation(s)
- Quan Li
- School of Life Sciences, and Hefei National Laboratory for Physical Sciences at Microscale, University of Science and Technology of China, Hefei, Anhui 230027, China
31
Chiu YY, Hwang JK, Yang JM. Soft energy function and generic evolutionary method for discriminating native from nonnative protein conformations. J Comput Chem 2008; 29:1364-73. [PMID: 18181137 DOI: 10.1002/jcc.20897] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/09/2022]
Abstract
We have developed a soft energy function, termed GEMSCORE, for protein structure prediction, which is one of the emergent issues in computational biology. GEMSCORE consists of the van der Waals potential, the hydrogen-bonding potential and the solvent potential, with 12 parameters that are optimized using a generic evolutionary method. GEMSCORE successfully identifies 86 native proteins among 96 target proteins on six decoy sets drawn from more than 70,000 near-native structures. For these six benchmark datasets, the predictive performance of GEMSCORE, based on native structure ranking and Z-scores, was superior to that of eight other energy functions. Our method is based solely on a simple, linear function and is thus considerably faster than other methods that rely on additional complex calculations. In addition, GEMSCORE ranked 17 and 2 native structures first and second, respectively, among 21 targets in CASP6 (Critical Assessment of Techniques for Protein Structure Prediction). These results suggest that GEMSCORE is fast and performs well in discriminating between native and nonnative structures among thousands of protein structure candidates. We believe that GEMSCORE is robust and should be a useful energy function for protein structure prediction.
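The two evaluation measures named in this abstract, native structure ranking and Z-score, can be illustrated as follows. This is a generic sketch of how decoy-discrimination benchmarks are commonly scored, not the GEMSCORE code; the function names are illustrative, and the use of the population standard deviation is an assumption of this example.

```python
import statistics


def native_z_score(native_energy, decoy_energies):
    """Z-score of the native energy relative to the decoy energy distribution.

    More negative values mean the native structure is better separated
    from the decoys (lower energy is assumed to be better).
    """
    mean = statistics.fmean(decoy_energies)
    std = statistics.pstdev(decoy_energies)  # population standard deviation
    if std == 0:
        raise ValueError("decoy energies have zero variance")
    return (native_energy - mean) / std


def native_rank(native_energy, decoy_energies):
    """1-based rank of the native structure when sorted by ascending energy."""
    return 1 + sum(e < native_energy for e in decoy_energies)
```

A native structure counts as "identified" when `native_rank` returns 1, which is the sense in which 86 of 96 targets are reported above.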
Affiliation(s)
- Yi-yuan Chiu
- Institute of Bioinformatics, National Chiao Tung University, Hsinchu 30050, Taiwan
32
Rakhmanov SV, Makeev VJ. Stochastic modeling of noninteracting probes in the protein structure space for construction of knowledge-based potentials for atom-atom interactions. Biophysics (Nagoya-shi) 2008. [DOI: 10.1134/s0006350908030019] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/23/2022]
33
Abstract
A distance-dependent knowledge-based potential for protein-protein interactions is derived and tested for application in protein design. Information on residue-type-specific C(alpha) and C(beta) pair distances is extracted from complex crystal structures in the Protein Data Bank and used in the form of radial distribution functions. The use of only backbone and C(beta) position information allows generation of relative protein-protein orientation poses with minimal sidechain information. Further coarse-graining can be done simply in the same theoretical framework to give potentials for residues of known type interacting with unknown type, as in a one-sided interface design problem. Both interface design via pose generation followed by sidechain repacking and localized protein-protein docking tests are performed on 39 nonredundant antibody-antigen complexes for which crystal structures are available. As a reference, Lennard-Jones potentials that are not specific to residue type and that bias toward varying degrees of residue pair separation are used as controls. For interface design, the knowledge-based potentials give the best combination of consistently designable poses, low RMSD to the known structure, and more tightly bound interfaces with no added computational cost. 77% of the poses could be designed to give complexes with negative free energies of binding. Generally, larger interface separation promotes designability, but weakens the binding of the resulting designs. A localized docking test shows that the knowledge-based nature of the potentials improves performance and compares respectably with more sophisticated all-atom potentials.
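A common way to turn radial distribution functions into a distance-dependent knowledge-based potential is Boltzmann inversion, E(r) = -kT ln g(r), where g(r) is the ratio of observed to reference pair-distance probabilities in each distance bin. The abstract does not give the exact functional form used, so the sketch below is an assumption-laden illustration: the function name, the pseudocount and the choice of reference counts are all hypothetical.

```python
import math


def rdf_potential(observed_counts, reference_counts, kT=1.0, pseudocount=1e-6):
    """Boltzmann-inverted potential per distance bin.

    observed_counts[i]: residue-pair distances observed in bin i
    reference_counts[i]: distances expected in bin i under the reference state
    Returns one energy per bin, E_i = -kT * ln(P_obs_i / P_ref_i).
    The pseudocount keeps empty bins finite (a choice made for this sketch).
    """
    obs_total = sum(observed_counts)
    ref_total = sum(reference_counts)
    energies = []
    for obs, ref in zip(observed_counts, reference_counts):
        p_obs = obs / obs_total + pseudocount
        p_ref = ref / ref_total + pseudocount
        energies.append(-kT * math.log(p_obs / p_ref))
    return energies
```

Bins where a residue pair is seen more often than the reference predicts receive negative (favorable) energies, and under-populated bins receive positive ones.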
Affiliation(s)
- Louis A Clark
- Biogen Idec Inc., Protein Engineering Group, Cambridge, Massachusetts 02142, USA.
34
Feng Y, Kloczkowski A, Jernigan RL. Four-body contact potentials derived from two protein datasets to discriminate native structures from decoys. Proteins 2007; 68:57-66. [PMID: 17393455 DOI: 10.1002/prot.21362] [Citation(s) in RCA: 62] [Impact Index Per Article: 3.4] [Indexed: 11/10/2022]
Abstract
Two-body inter-residue contact potentials for proteins have often been extracted and extensively used for threading. Here, we have developed a new scheme to derive four-body contact potentials as a way to consider protein interactions in a more cooperative model. We use several datasets of protein native structures to demonstrate that around 500 chains are sufficient to provide a good estimate of these four-body contact potentials by obtaining convergent threading results. We have also deliberately chosen two sets of protein native structures differing in resolution, one with all chains' resolution better than 1.5 Å and the other with 94.2% of the structures having a resolution worse than 1.5 Å, to investigate whether potentials from well-refined protein datasets perform better in threading. However, potentials from well-refined proteins did not generate statistically significantly better threading results. Our four-body contact potentials can discriminate well between native structures and partially unfolded or deliberately misfolded structures. Compared with another set of four-body contact potentials derived by using a Delaunay tessellation algorithm, our four-body contact potentials appear to offer a better characterization of the interactions between backbones and side chains and provide better threading results, somewhat complementary to those found using other potentials.
Affiliation(s)
- Yaping Feng
- Department of Biochemistry, Biophysics, and Molecular Biology, Iowa State University, Ames, Iowa 50011-0320, USA
35
OPUS-PSP: an orientation-dependent statistical all-atom potential derived from side-chain packing. J Mol Biol 2007; 376:288-301. [PMID: 18177896 DOI: 10.1016/j.jmb.2007.11.033] [Citation(s) in RCA: 148] [Impact Index Per Article: 8.2] [Received: 09/24/2007] [Revised: 11/06/2007] [Accepted: 11/13/2007] [Indexed: 11/22/2022]
Abstract
Here we report an orientation-dependent statistical all-atom potential derived from side-chain packing, named OPUS-PSP. It features a basis set of 19 rigid-body blocks extracted from the chemical structures of all 20 amino acid residues. The potential is generated from the orientation-specific packing statistics of pairs of those blocks in a non-redundant structural database. The purpose of such an approach is to capture the essential elements of orientation dependence in molecular packing interactions. Tests of OPUS-PSP on commonly used decoy sets demonstrate that it significantly outperforms most of the existing knowledge-based potentials in terms of both its ability to recognize native structures and consistency in achieving high Z-scores across decoy sets. As OPUS-PSP excludes interactions among main-chain atoms, its success highlights the crucial importance of side-chain packing in forming native protein structures. Moreover, OPUS-PSP does not explicitly include solvation terms, and thus the potential should perform well when the solvation effect is difficult to determine, such as in membrane proteins. Overall, OPUS-PSP is a generally applicable potential for protein structure modeling, especially for handling side-chain conformations, one of the most difficult steps in high-accuracy protein structure prediction and refinement.
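The Z-score criterion mentioned in this abstract can be illustrated with a short sketch. It assumes the common convention in which the native energy is compared with the mean and standard deviation of the decoy energies (more negative is better); the abstract does not state the exact convention, and the function name and energies below are made up for illustration.

```python
from statistics import mean, pstdev

def native_z_score(native_energy, decoy_energies):
    """Z-score of the native energy relative to the decoy energy distribution.

    Assumed convention: Z = (E_native - mean(E_decoys)) / std(E_decoys),
    so a strongly negative Z means the native structure is well separated.
    """
    spread = pstdev(decoy_energies)
    if spread == 0:
        raise ValueError("decoy energies have zero spread")
    return (native_energy - mean(decoy_energies)) / spread

# Hypothetical energies in arbitrary units, for illustration only.
decoys = [-90.0, -95.0, -100.0, -105.0, -110.0]
z = native_z_score(-130.0, decoys)
print(f"Z-score of native: {z:.2f}")
```

A potential that recognizes native structures consistently would give strongly negative Z-scores of this kind on every decoy set, not just on a few.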
|
36
|
Fung HK, Floudas CA, Taylor MS, Zhang L, Morikis D. Toward full-sequence de novo protein design with flexible templates for human beta-defensin-2. Biophys J 2007; 94:584-99. [PMID: 17827237 PMCID: PMC2157230 DOI: 10.1529/biophysj.107.110627] [Citation(s) in RCA: 40] [Impact Index Per Article: 2.2] [Indexed: 01/12/2023] Open
Abstract
In this article, we introduce and apply our de novo protein design framework, which observes true backbone flexibility, to the redesign of human beta-defensin-2, a 41-residue cationic antimicrobial peptide of the innate immune system. The flexible design templates are generated using molecular dynamics simulations with both Generalized Born implicit solvation and explicit water molecules. These backbone templates were employed in addition to the x-ray crystal structure for designing human beta-defensin-2. The computational efficiency of our framework was demonstrated with the full-sequence design of the peptide with flexible backbone templates, corresponding to the mutation of all positions except the native cysteines.
Affiliation(s)
- Ho Ki Fung
- Department of Chemical Engineering, Princeton University, Princeton, New Jersey, USA
|
37
|
Wu Y, Lu M, Chen M, Li J, Ma J. OPUS-Ca: a knowledge-based potential function requiring only Calpha positions. Protein Sci 2007; 16:1449-63. [PMID: 17586777 PMCID: PMC2206690 DOI: 10.1110/ps.072796107] [Citation(s) in RCA: 52] [Impact Index Per Article: 2.9] [Indexed: 10/23/2022]
Abstract
In this paper, we report a knowledge-based potential function, named the OPUS-Ca potential, that requires only Calpha positions as input. The contributions from other atomic positions were established from pseudo-positions artificially built from a Calpha trace for auxiliary purposes. The potential function is formed based on seven major representative molecular interactions in proteins: distance-dependent pairwise energy with orientational preference, hydrogen bonding energy, short-range energy, packing energy, tri-peptide packing energy, three-body energy, and solvation energy. From the testing of decoy recognition on a number of commonly used decoy sets, it is shown that the new potential function outperforms all known Calpha-based potentials and most other coarse-grained ones that require more information than Calpha positions. We hope that this potential function adds a new tool for protein structural modeling.
Affiliation(s)
- Yinghao Wu
- Department of Bioengineering, Rice University, Houston, TX 77005, USA
|
38
|
Rakhmanov SV, Makeev VJ. Atomic hydration potentials using a Monte Carlo Reference State (MCRS) for protein solvation modeling. BMC Structural Biology 2007; 7:19. [PMID: 17397537 PMCID: PMC1852318 DOI: 10.1186/1472-6807-7-19] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.4] [Received: 09/15/2006] [Accepted: 03/30/2007] [Indexed: 11/10/2022]
Abstract
Background: Accurate description of protein interaction with aqueous solvent is crucial for modeling of protein folding, protein-protein interaction, and drug design. Efforts to build a working description of solvation, both by continuous models and by molecular dynamics, yield controversial results. Specifically constructed knowledge-based potentials appear to be promising for accounting for the solvation at the molecular level, yet have not been used for this purpose.
Results: We developed original knowledge-based potentials to study protein hydration at the level of atom contacts. The potentials were obtained using a new Monte Carlo reference state (MCRS), which simulates the expected probability density of atom-atom contacts via exhaustive sampling of structure space with random probes. Using the MCRS allowed us to calculate the expected atom contact densities with high resolution over a broad distance range including very short distances. Knowledge-based potentials for hydration of protein atoms of different types were obtained based on frequencies of their contacts at different distances with protein-bound water molecules, in a non-redundant training database of 1776 proteins with known 3D structures. Protein hydration sites were predicted in a test set of 12 proteins with experimentally determined water locations. The MCRS greatly improves prediction of water locations over existing methods. In addition, the contribution of the energy of macromolecular solvation to the total folding free energy was estimated, and tested in fold recognition experiments. The correct folds were preferred over all the misfolded decoys for the majority of proteins from the improved Rosetta decoy set based on the structure hydration energy alone.
Conclusion: MCRS atomic hydration potentials provide a detailed distance-dependent description of hydropathies of individual protein atoms. This allows placement of water molecules on the surface of proteins and in protein interfaces with much higher precision. The potentials provide a means to estimate the total solvation energy for a protein structure, in many cases achieving a successful fold recognition. Possible applications of atomic hydration potentials to structure verification, protein folding and stability, and protein-protein interactions are discussed.
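Knowledge-based potentials of this kind are usually built by comparing observed contact frequencies with those expected under a reference state. The sketch below assumes the standard log-odds (inverse Boltzmann) form, E(r) = -kT ln(f_obs(r) / f_ref(r)); the abstract does not give the exact functional form used with the MCRS, and all counts, bin choices, and names here are hypothetical.

```python
import math

KT = 0.592  # kcal/mol near 298 K (assumed temperature scale)

def log_odds_potential(observed_counts, reference_counts, kT=KT):
    """Per-distance-bin potential from observed vs. reference contact counts.

    Both count lists are normalized to frequencies first. Bins with no
    observed contacts get +inf; a real derivation would smooth them instead.
    """
    total_obs = sum(observed_counts)
    total_ref = sum(reference_counts)
    energies = []
    for obs, ref in zip(observed_counts, reference_counts):
        f_obs = obs / total_obs
        f_ref = ref / total_ref
        if f_obs <= 0:
            energies.append(math.inf)
        else:
            energies.append(-kT * math.log(f_obs / f_ref))
    return energies

# Hypothetical atom-water contact counts in four distance bins.
observed = [5, 40, 30, 25]    # counts seen in the structure database
reference = [25, 25, 25, 25]  # counts expected from the reference state
potential = log_odds_potential(observed, reference)
print([round(e, 3) for e in potential])
```

Bins that are populated more often than the reference predicts come out favorable (negative), and depleted bins come out unfavorable (positive).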
Affiliation(s)
- Sergei V Rakhmanov
- Institute of Genetics and Selection of Industrial Microorganisms, State Research Centre GosNIIgenetika, 1st Dorozhny proezd, 1, Moscow, Russia
- Vsevolod J Makeev
- Institute of Genetics and Selection of Industrial Microorganisms, State Research Centre GosNIIgenetika, 1st Dorozhny proezd, 1, Moscow, Russia
- Engelhardt Institute of Molecular Biology, Russian Academy of Sciences, Vavilova str. 32, Moscow, Russia
|
39
|
Deechongkit S, Aoki KH, Park SS, Kerwin BA. Biophysical comparability of the same protein from different manufacturers: a case study using Epoetin alfa from Epogen and Eprex. J Pharm Sci 2006; 95:1931-43. [PMID: 16850392 DOI: 10.1002/jps.20649] [Citation(s) in RCA: 55] [Impact Index Per Article: 2.9] [Indexed: 11/08/2022]
Abstract
This study focuses on the development and application of biophysical methodology to characterize conformations of Epogen and Eprex, the injectable formulations of recombinant human Epoetin alfa produced by different manufacturers and commonly used for the treatment of renal anemia. In these studies Eprex, from prefilled syringes, and Epogen bulk product formulated in a buffer similar to the Eprex formulation, were purified by anion-exchange chromatography. Analytical ultracentrifugation studies of the purified main peak from each sample demonstrated that Epogen contains a single component with an s value of 2.51, while Eprex contains a single component with the same molecular weight but with an s value of 2.44, suggesting a slight difference in hydrodynamic structure. The degree of alpha-helicity was compared by far-UV circular dichroism and showed slight differences. Intrinsic tryptophan fluorescence and near-UV circular dichroism were assessed and demonstrated additional differences between the proteins. Finally, the global stability of the proteins was assessed by thermal unfolding monitored by far-UV circular dichroism. The Epoetin alfa of Epogen demonstrated complete reversibility, while the Epoetin alfa purified from Eprex demonstrated only 80%-85% thermal reversibility when heated to 100 °C. Together the data indicate that the proteins are not structurally identical.
Affiliation(s)
- Songpon Deechongkit
- Department of Pharmaceutics, Amgen, Inc., One Amgen Center Drive, Thousand Oaks, California 91320, USA
|
40
|
Poole AM, Ranganathan R. Knowledge-based potentials in protein design. Curr Opin Struct Biol 2006; 16:508-13. [PMID: 16843652 DOI: 10.1016/j.sbi.2006.06.013] [Citation(s) in RCA: 72] [Impact Index Per Article: 3.8] [Received: 06/07/2006] [Revised: 06/07/2006] [Accepted: 06/30/2006] [Indexed: 02/03/2023]
Abstract
Knowledge-based potentials are statistical parameters derived from databases of known protein properties that empirically capture aspects of the physical chemistry of protein structure and function. These potentials play a key role in protein design by improving the accuracy of physics-based models of interatomic interactions and enhancing the computational efficiency of the design process by limiting the complexity of searching sequence space. Recently, knowledge-based potentials (in isolation or in combination with physics-based potentials) have been applied to the modification of existing protein function, the redesign of natural protein folds and the complete design of a non-natural protein fold. In addition, knowledge-based potentials appear to be providing important information about the global topology of amino acid interactions in natural proteins. A detailed study of the methods and products of these protein design efforts promises to greatly expand our understanding of proteins and the evolutionary process that created them.
Affiliation(s)
- Alan M Poole
- Howard Hughes Medical Institute, Department of Pharmacology and the Green Comprehensive Center Division for Systems Biology, University of Texas Southwestern Medical Center, Dallas, TX 75390-9050, USA
|
41
|
Grigoryan G, Zhou F, Lustig SR, Ceder G, Morgan D, Keating AE. Ultra-fast evaluation of protein energies directly from sequence. PLoS Comput Biol 2006; 2:e63. [PMID: 16789811 PMCID: PMC1479088 DOI: 10.1371/journal.pcbi.0020063] [Citation(s) in RCA: 37] [Impact Index Per Article: 1.9] [Received: 01/12/2006] [Accepted: 04/24/2006] [Indexed: 11/22/2022] Open
Abstract
The structure, function, stability, and many other properties of a protein in a fixed environment are fully specified by its sequence, but in a manner that is difficult to discern. We present a general approach for rapidly mapping sequences directly to their energies on a pre-specified rigid backbone, an important sub-problem in computational protein design and in some methods for protein structure prediction. The cluster expansion (CE) method that we employ can, in principle, be extended to model any computable or measurable protein property directly as a function of sequence. Here we show how CE can be applied to the problem of computational protein design, and use it to derive excellent approximations of physical potentials. The approach provides several attractive advantages. First, following a one-time derivation of a CE expansion, the amount of time necessary to evaluate the energy of a sequence adopting a specified backbone conformation is reduced by a factor of 10⁷ compared to standard full-atom methods for the same task. Second, the agreement between two full-atom methods that we tested and their CE sequence-based expressions is very high (root mean square deviation 1.1–4.7 kcal/mol, R² = 0.7–1.0). Third, the functional form of the CE energy expression is such that individual terms of the expansion have clear physical interpretations. We derived expressions for the energies of three classic protein design targets—a coiled coil, a zinc finger, and a WW domain—as functions of sequence, and examined the most significant terms. Single-residue and residue-pair interactions are sufficient to accurately capture the energetics of the dimeric coiled coil, whereas higher-order contributions are important for the two more globular folds. For the task of designing novel zinc-finger sequences, a CE-derived energy function provides significantly better solutions than a standard design protocol, in comparable computation time. Given these advantages, CE is likely to find many uses in computational structural modeling.
Many applications in computational structural biology involve evaluating the energy of a protein adopting a specific structure. A variety of functions are used for this purpose. Statistical potentials are fast to evaluate but do not have a clear biophysical basis, whereas physics-based functions consist of well-defined terms that can be costly to compute. This paper describes how the theory of cluster expansion, originally developed to describe the energies of alloys, can be applied to generate a physical potential for proteins that is extremely fast to evaluate. Cluster expansion is a way of representing a property of a system as a discrete function of its degrees of freedom. In this paper, it is used for the problem of protein design, where the energy is determined by the identities and conformations of amino acids at different sites on a fixed protein backbone. Application of cluster expansion to three small protein folds—the α-helical coiled coil, the zinc finger, and the WW domain—shows that protein sequence can be mapped directly to energy using a surprisingly simple function that maintains high accuracy. Promising results on these small systems suggest that the theory may have utility for macromolecular modeling more generally.
Affiliation(s)
- Gevorg Grigoryan
- Department of Biology, Massachusetts Institute of Technology, Cambridge, Massachusetts, United States of America
- Fei Zhou
- Department of Physics, Massachusetts Institute of Technology, Cambridge, Massachusetts, United States of America
- Steve R Lustig
- DuPont Central Research and Development, Experimental Station, Wilmington, Delaware, United States of America
- Gerbrand Ceder
- Department of Material Science and Engineering, Massachusetts Institute of Technology, Cambridge, Massachusetts, United States of America
- Dane Morgan
- Department of Material Science and Engineering, University of Wisconsin, Madison, Wisconsin, United States of America
- Amy E Keating
- Department of Biology, Massachusetts Institute of Technology, Cambridge, Massachusetts, United States of America
- * To whom correspondence should be addressed. E-mail:
|
42
|
Abstract
We propose a novel and flexible derivation scheme of statistical, database-derived potentials, which allows one to take simultaneously into account specific correlations between several sequence and structure descriptors. This scheme leads to the decomposition of the total folding free energy of a protein into a sum of lower order terms, thereby giving the possibility to analyze independently each contribution and clarify its significance and importance, to avoid overcounting certain contributions, and to deal more efficiently with the limited size of the database. In addition, this derivation scheme appears to be quite general, since many previously developed potentials can be expressed as particular cases of our formalism. We use this formalism as a framework to generate different residue-based energy functions, whose performances are assessed on the basis of their ability to discriminate genuine proteins from decoy models. The optimal potential is generated as a combination of several coupling terms, measuring correlations between residue types, backbone torsion angles, solvent accessibilities, relative positions along the sequence, and interresidue distances. This potential outperforms all tested residue-based potentials, and even several atom-based potentials. Its incorporation in algorithms aiming at predicting protein structure and stability should therefore substantially improve their performances.
Affiliation(s)
- Y Dehouck
- Unité de Bioinformatique génomique et structurale, Université Libre de Bruxelles, 1050 Brussels, Belgium.
|
43
|
Zhou F, Grigoryan G, Lustig SR, Keating AE, Ceder G, Morgan D. Coarse-graining protein energetics in sequence variables. Physical Review Letters 2005; 95:148103. [PMID: 16241695 DOI: 10.1103/physrevlett.95.148103] [Citation(s) in RCA: 12] [Impact Index Per Article: 0.6] [Received: 03/17/2005] [Indexed: 05/05/2023]
Abstract
We show that cluster expansions (CE), previously used to model solid-state materials with binary or ternary configurational disorder, can be extended to the protein design problem. We present a generalized CE framework, in which properties such as energy can be unambiguously expanded in the amino-acid sequence space. The CE coarse-grains over nonsequence degrees of freedom (e.g., side-chain conformations) and thereby simplifies the problem of designing proteins, or predicting the compatibility of a sequence with a given structure, by many orders of magnitude. The CE is physically transparent, and can be evaluated through linear regression on the energies of training sequences. We show, as an example, that good prediction accuracy is obtained with up to pairwise interactions for a coiled-coil backbone, and that triplet interactions are important in the energetics of a more globular zinc-finger backbone.
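A cluster expansion truncated at pair terms can be read as a constant plus single-site and site-pair contributions that depend only on the amino acids present. The following is a minimal sketch of evaluating such an expression; the coefficient values, the three-residue toy sequence, and the names `ce_energy`, `site_terms`, and `pair_terms` are hypothetical. In practice the coefficients would come from the regression on training-sequence energies that the abstract mentions, and triplet terms would be added where needed.

```python
from itertools import combinations

def ce_energy(sequence, constant, site_terms, pair_terms):
    """Evaluate a cluster expansion truncated at pairwise terms.

    site_terms maps (position, residue) -> contribution.
    pair_terms maps (pos_i, pos_j, residue_i, residue_j) -> contribution.
    Missing keys are treated as zero contribution.
    """
    energy = constant
    for i, residue in enumerate(sequence):
        energy += site_terms.get((i, residue), 0.0)
    for i, j in combinations(range(len(sequence)), 2):
        energy += pair_terms.get((i, j, sequence[i], sequence[j]), 0.0)
    return energy

# Hypothetical coefficients (arbitrary energy units), for illustration only.
constant = -10.0
site_terms = {(0, "L"): -1.5, (1, "E"): -0.5, (2, "K"): -0.3}
pair_terms = {(1, 2, "E", "K"): -1.2}  # e.g. a favorable E-K pairing

energy_lek = ce_energy("LEK", constant, site_terms, pair_terms)
energy_lea = ce_energy("LEA", constant, site_terms, pair_terms)
print(energy_lek, energy_lea)
```

Because evaluation is only a handful of table lookups per sequence, scoring many candidate sequences on a fixed backbone becomes very cheap once the coefficients are known.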
Affiliation(s)
- Fei Zhou
- Department of Physics, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, USA
|
44
|
Ota N, Agard DA. Intramolecular signaling pathways revealed by modeling anisotropic thermal diffusion. J Mol Biol 2005; 351:345-54. [PMID: 16005893 DOI: 10.1016/j.jmb.2005.05.043] [Citation(s) in RCA: 180] [Impact Index Per Article: 9.0] [Received: 10/04/2004] [Revised: 04/21/2005] [Accepted: 05/19/2005] [Indexed: 11/20/2022]
Abstract
A variety of experimental evidence suggests that rapid, long-range propagation of conformational changes through the core of proteins plays a vital role in allosteric communication. Here, we describe a non-equilibrium molecular dynamics simulation method, anisotropic thermal diffusion (ATD), which allowed us to observe a dominant intramolecular signaling pathway in PSD-95, a member of the PDZ domain protein family. The observed pathway is in good accordance with a pathway previously inferred using a multiple sequence analysis of 276 PDZ domain proteins. In comparison with conventional solution molecular dynamics methods, the ATD method provides greatly enhanced signal-to-noise, allowing long-distance correlations to be observed clearly. The ATD method requires neither a large number of homologous proteins, nor extremely long simulation times to obtain a complete signaling pathway within a protein. Therefore, the ATD method should prove to be a powerful and general complement to experimental efforts to understand the physical basis of intramolecular signaling.
Affiliation(s)
- Nobuyuki Ota
- Howard Hughes Medical Institute and Department of Biochemistry and Biophysics, University of California, San Francisco, CA 94143-2240, USA
|
45
|
Minshull J, Ness JE, Gustafsson C, Govindarajan S. Predicting enzyme function from protein sequence. Curr Opin Chem Biol 2005; 9:202-9. [PMID: 15811806 DOI: 10.1016/j.cbpa.2005.02.003] [Citation(s) in RCA: 27] [Impact Index Per Article: 1.4] [Indexed: 11/30/2022]
Abstract
There are two main reasons to try to predict an enzyme's function from its sequence. The first is to identify the components and thus the functional capabilities of an organism; the second is to create enzymes with specific properties. Genomics, expression analysis, proteomics and metabonomics are largely directed towards understanding how information flows from DNA sequence to protein functions within an organism. This review focuses on information flow in the opposite direction: the applicability of what is being learned from natural enzymes to improve methods for catalyst design.
|
46
|
|
47
|
Dehouck Y, Gilis D, Rooman M. Database-derived potentials dependent on protein size for in silico folding and design. Biophys J 2005; 87:171-81. [PMID: 15240455 PMCID: PMC1304340 DOI: 10.1529/biophysj.103.037861] [Citation(s) in RCA: 15] [Impact Index Per Article: 0.8] [Indexed: 11/18/2022] Open
Abstract
Knowledge-based potentials are widely used in simulations of protein folding, structure prediction, and protein design. Their advantages include limited computational requirements and the ability to deal with low-resolution protein models compatible with long-scale simulations. Their drawbacks include their dependence on specific features of the dataset from which they are derived, such as the size of the proteins it contains, and the fact that their physical meaning is still a subject of debate. We address these issues by probing the theoretical validity of these potentials as mean-force potentials that take the solvent implicitly into account and involve entropic contributions due to atomic degrees of freedom and solvation. The dependence on the size of the system is checked on distance-dependent amino acid pair potentials, derived from six protein structure sets containing proteins of increasing length N. For large inter-residue distances, they are found to display the theoretically predicted 1/N behavior weighted by a factor depending on the boundaries and the compressibility of the system. For short distances, different trends are observed according to the nature of the residue pairs and their ability to form, for example, electrostatic, cation-pi or pi-pi interactions, or hydrophobic packing. The results of this analysis are used to devise a novel protein size-dependent distance potential, which displays an improved performance in discriminating native sequence-structure matches among decoy models.
Affiliation(s)
- Yves Dehouck
- Bioinformatique Génomique et Structurale, Université Libre de Bruxelles, Brussels, Belgium.
|
48
|
O'Donoghue P, Luthey-Schulten Z. Evolutionary profiles derived from the QR factorization of multiple structural alignments gives an economy of information. J Mol Biol 2005; 346:875-94. [PMID: 15713469 DOI: 10.1016/j.jmb.2004.11.053] [Citation(s) in RCA: 44] [Impact Index Per Article: 2.2] [Received: 08/11/2004] [Revised: 11/11/2004] [Accepted: 11/17/2004] [Indexed: 11/22/2022]
Abstract
We present a new algorithm, based on the multidimensional QR factorization, to remove redundancy from a multiple structural alignment by choosing representative protein structures that best preserve the phylogenetic tree topology of the homologous group. The classical QR factorization with pivoting, developed as a fast numerical solution to eigenvalue and linear least-squares problems of the form Ax=b, was designed to re-order the columns of A by increasing linear dependence. Removing the most linearly dependent columns from A leads to the formation of a minimal basis set which well spans the phase space of the problem at hand. By recasting the problem of redundancy in multiple structural alignments into this framework, in which the matrix A now describes the multiple alignment, we adapted the QR factorization to produce a minimal basis set of protein structures which best spans the evolutionary (phase) space. The non-redundant and representative profiles obtained from this procedure, termed evolutionary profiles, are shown in initial results to outperform well-tested profiles in homology detection searches over a large sequence database. A measure of structural similarity between homologous proteins, Q(H), is presented. By properly accounting for the effect and presence of gaps, a phylogenetic tree computed using this metric is shown to be congruent with the maximum-likelihood sequence-based phylogeny. The results indicate that evolutionary information is indeed recoverable from the comparative analysis of protein structure alone. Applications of the QR ordering and this structural similarity metric to analyze the evolution of structure among key, universally distributed proteins involved in translation, and to the selection of representatives from an ensemble of NMR structures are also discussed.
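The column re-ordering idea behind QR with pivoting can be sketched in a few lines. The code below is not the multidimensional QR of the paper; it is a plain greedy Gram-Schmidt stand-in that reproduces the classical pivot order: at each step it picks the column with the largest remaining norm, then removes that direction from all other columns. The feature vectors and the name `pivot_order` are hypothetical.

```python
import math

def pivot_order(columns, tol=1e-12):
    """Order columns from most independent to most redundant.

    Returns (order, norms), where norms[k] is the residual norm of the
    k-th chosen column at the moment it was picked. Columns picked with a
    near-zero residual are linear combinations of earlier ones.
    """
    residual = [[float(x) for x in col] for col in columns]
    remaining = list(range(len(residual)))
    order, norms = [], []
    while remaining:
        # Pivot: column with the largest remaining (squared) norm.
        pivot = max(remaining, key=lambda j: sum(x * x for x in residual[j]))
        norm = math.sqrt(sum(x * x for x in residual[pivot]))
        order.append(pivot)
        norms.append(norm)
        remaining.remove(pivot)
        if norm <= tol:
            continue  # nothing left to project out
        q = [x / norm for x in residual[pivot]]
        for j in remaining:
            proj = sum(a * b for a, b in zip(q, residual[j]))
            residual[j] = [a - proj * b for a, b in zip(residual[j], q)]
    return order, norms

# Hypothetical descriptors for four structures; column 1 is a mix of 0 and 2.
columns = [
    (1.0, 0.0, 0.0),
    (0.9, 0.1, 0.0),
    (0.0, 2.0, 0.0),
    (0.0, 0.0, 0.5),
]
order, norms = pivot_order(columns)
print(order, [round(n, 3) for n in norms])
```

Keeping only the leading entries of `order` gives a non-redundant representative set, which is the role the evolutionary profiles play in the abstract.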
Affiliation(s)
- Patrick O'Donoghue
- Department of Chemistry, University of Illinois at Urbana-Champaign, 600 S. Mathews, Urbana, IL 61801, USA
|
49
|
Pei J, Grishin NV. Combining evolutionary and structural information for local protein structure prediction. Proteins 2004; 56:782-94. [PMID: 15281130 DOI: 10.1002/prot.20158] [Citation(s) in RCA: 21] [Impact Index Per Article: 1.0] [Indexed: 11/09/2022]
Abstract
We study the effects of various factors in representing and combining evolutionary and structural information for local protein structural prediction based on fragment selection. We prepare databases of fragments from a set of non-redundant protein domains. For each fragment, evolutionary information is derived from homologous sequences and represented as estimated effective counts and frequencies of amino acids (evolutionary frequencies) at each position. Position-specific amino acid preferences called structural frequencies are derived from statistical analysis of discrete local structural environments in database structures. Our method for local structure prediction is based on ranking and selecting database fragments that are most similar to a target fragment. Using secondary structure type as a local structural property, we test our method in a number of settings. The major findings are: (1) the COMPASS-type scoring function for fragment similarity comparison gives better prediction accuracy than three other tested scoring functions for profile-profile comparison. We show that the COMPASS-type scoring function can be derived both in the probabilistic framework and in the framework of statistical potentials. (2) Using the evolutionary frequencies of database fragments gives better prediction accuracy than using structural frequencies. (3) Finer definition of local environments, such as including more side-chain solvent accessibility classes and considering the backbone conformations of neighboring residues, gives increasingly better prediction accuracy using structural frequencies. (4) Combining evolutionary and structural frequencies of database fragments, either in a linear fashion or using a pseudocount mixture formula, results in improvement of prediction accuracy. Combination at the log-odds score level is not as effective as combination at the frequency level. 
This suggests that there might be better ways of combining sequence and structural information than the commonly used linear combination of log-odds scores. Our method of fragment selection and frequency combination gives reasonable results of secondary structure prediction tested on 56 CASP5 targets (average SOV score 0.77), suggesting that it is a valid method for local protein structure prediction. A mixture of predicted structural frequencies and evolutionary frequencies improves the quality of local profile-to-profile alignment by COMPASS.
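The two ways of combining frequencies named in finding (4) can be made concrete with a small sketch. The abstract does not give the formulas, so both functions below are assumptions: a plain weighted average, and a common pseudocount form in which the structural frequencies act as pseudocounts of total weight `beta` added to `n_eff` effective observed counts. The three-letter alphabet and all numbers are made up.

```python
def linear_mix(evolutionary, structural, weight):
    """Weighted average of two frequency vectors (weight applies to evolutionary)."""
    return {aa: weight * evolutionary[aa] + (1.0 - weight) * structural[aa]
            for aa in evolutionary}

def pseudocount_mix(evolutionary, structural, n_eff, beta):
    """Treat structural frequencies as pseudocounts with total weight beta.

    With many effective counts (large n_eff) the result approaches the
    evolutionary frequencies; with few, the structural prior dominates.
    """
    return {aa: (n_eff * evolutionary[aa] + beta * structural[aa]) / (n_eff + beta)
            for aa in evolutionary}

# Hypothetical position-specific frequencies over a toy alphabet.
evolutionary = {"A": 0.7, "G": 0.2, "L": 0.1}
structural = {"A": 0.2, "G": 0.2, "L": 0.6}

mixed_linear = linear_mix(evolutionary, structural, weight=0.5)
mixed_pseudo = pseudocount_mix(evolutionary, structural, n_eff=3.0, beta=1.0)
print(mixed_linear, mixed_pseudo)
```

Both combinations operate on frequencies and still sum to one, in contrast to adding log-odds scores, which the abstract reports to be less effective.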
Affiliation(s)
- Jimin Pei
- Department of Biochemistry, University of Texas Southwestern Medical Center, Dallas, Texas 75390-9050, USA
|
50
|
Gromiha MM, Selvaraj S. Inter-residue interactions in protein folding and stability. Progress in Biophysics and Molecular Biology 2004; 86:235-77. [PMID: 15288760 DOI: 10.1016/j.pbiomolbio.2003.09.003] [Citation(s) in RCA: 207] [Impact Index Per Article: 9.9] [Indexed: 12/01/2022]
Abstract
During the process of protein folding, the amino acid residues along the polypeptide chain interact with each other in a cooperative manner to form the stable native structure. Knowledge of inter-residue interactions in protein structures helps in understanding the mechanism of protein folding and stability. In this review, we introduce the classification of inter-residue interactions into short, medium and long range based on a simple geometric approach. The features of these interactions in different structural classes of globular and membrane proteins, and in various folds have been delineated. The development of contact potentials and the application of inter-residue contacts for predicting the structural class and secondary structures of globular proteins, solvent accessibility, fold recognition and ab initio tertiary structure prediction have been evaluated. Further, the relationship between inter-residue contacts and protein-folding rates has been highlighted. Moreover, the importance of inter-residue interactions in protein-folding kinetics and for understanding the stability of proteins has been discussed. In essence, the information gained from the studies on inter-residue interactions provides valuable insights for understanding protein folding and de novo protein design.
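A geometric classification of this kind can be sketched as follows. The abstract does not state the thresholds, so the values here are assumptions: two residues are taken to be in contact when their Cα atoms lie within 8 Å, and the contact is called short range for a sequence separation of 1 or 2, medium range for 3 or 4, and long range beyond 4. The coordinates are a made-up planar hairpin, not a real structure.

```python
import math

def classify_contacts(ca_coords, cutoff=8.0):
    """Count short-, medium- and long-range contacts between C-alpha atoms.

    Assumed thresholds: contact if distance <= cutoff (angstroms);
    separation 1-2 -> short, 3-4 -> medium, >4 -> long.
    """
    counts = {"short": 0, "medium": 0, "long": 0}
    n = len(ca_coords)
    for i in range(n):
        for j in range(i + 1, n):
            if math.dist(ca_coords[i], ca_coords[j]) > cutoff:
                continue
            separation = j - i
            if separation <= 2:
                counts["short"] += 1
            elif separation <= 4:
                counts["medium"] += 1
            else:
                counts["long"] += 1
    return counts

# Toy U-shaped chain with 3.8 angstrom spacing (hypothetical coordinates).
toy_chain = [
    (0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0),
    (7.6, 3.8, 0.0), (3.8, 3.8, 0.0), (0.0, 3.8, 0.0),
]
contact_counts = classify_contacts(toy_chain)
print(contact_counts)
```

Counts of this type, computed per residue or per protein, are the raw quantities that the review relates to structural class, folding rate, and stability.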
Affiliation(s)
- M Michael Gromiha
- Computational Biology Research Center, National Institute of Advanced Industrial Science and Technology, Aomi Frontier Building 17F, 2-43 Aomi, Koto-ku, Tokyo 135-0064, Japan.
|