1
Zhang Y, Dong M, Deng J, Wu J, Zhao Q, Gao X, Xiong D. Graph masked self-distillation learning for prediction of mutation impact on protein-protein interactions. Commun Biol 2024; 7:1400. [PMID: 39462102 PMCID: PMC11513059 DOI: 10.1038/s42003-024-07066-9] [Received: 04/09/2024] [Accepted: 10/14/2024] [Indexed: 10/28/2024]
Abstract
Assessing mutation impact on the binding affinity change (ΔΔG) of protein-protein interactions (PPIs) plays a crucial role in unraveling structural-functional intricacies of proteins and developing innovative protein designs. In this study, we present a deep learning framework, PIANO, for improved prediction of ΔΔG in PPIs. The PIANO framework leverages a graph masked self-distillation scheme for protein structural geometric representation pre-training, which effectively captures the structural context representations surrounding mutation sites, and makes predictions using a multi-branch network consisting of multiple encoders for amino acids, atoms, and protein sequences. Extensive experiments demonstrated its superior prediction performance and the capability of pre-trained encoder in capturing meaningful representations. Compared to previous methods, PIANO can be widely applied on both holo complex structures and apo monomer structures. Moreover, we illustrated the practical applicability of PIANO in highlighting pathogenic mutations and crucial proteins, and distinguishing de novo mutations in disease cases and controls in PPI systems. Overall, PIANO offers a powerful deep learning tool, which may provide valuable insights into the study of drug design, therapeutic intervention, and protein engineering.
Affiliation(s)
- Yuan Zhang
  - Key Laboratory of Intelligent Computing and Information Processing of Ministry of Education, Xiangtan University, Xiangtan, 411105, China
- Mingyuan Dong
  - Key Laboratory of Intelligent Computing and Information Processing of Ministry of Education, Xiangtan University, Xiangtan, 411105, China
- Junsheng Deng
  - Key Laboratory of Intelligent Computing and Information Processing of Ministry of Education, Xiangtan University, Xiangtan, 411105, China
- Jiafeng Wu
  - Key Laboratory of Intelligent Computing and Information Processing of Ministry of Education, Xiangtan University, Xiangtan, 411105, China
- Qiuye Zhao
  - Department of Computational Biology, Cornell University, Ithaca, NY, 14853, USA.
  - Weill Institute for Cell and Molecular Biology, Cornell University, Ithaca, NY, 14853, USA.
- Xieping Gao
  - Hunan Provincial Key Laboratory of Intelligent Computing and Language Information Processing, Hunan Normal University, Changsha, 410081, China.
- Dapeng Xiong
  - Department of Computational Biology, Cornell University, Ithaca, NY, 14853, USA.
  - Weill Institute for Cell and Molecular Biology, Cornell University, Ithaca, NY, 14853, USA.
2
Li D, Zhu Y, Zhang W, Liu J, Yang X, Liu Z, Wei D. AI Prediction of Structural Stability of Nanoproteins Based on Structures and Residue Properties by Mean Pooled Dual Graph Convolutional Network. Interdiscip Sci 2024:10.1007/s12539-024-00662-7. [PMID: 39367992 DOI: 10.1007/s12539-024-00662-7] [Received: 04/21/2024] [Revised: 09/18/2024] [Accepted: 09/22/2024] [Indexed: 10/07/2024]
Abstract
The structural stability of proteins is an important topic in various fields such as biotechnology, pharmaceuticals, and enzymology. Specifically, understanding the structural stability of proteins is crucial for protein design. Artificial design, while pursuing high thermodynamic stability and rigidity of proteins, inevitably sacrifices biological functions closely related to protein flexibility. The highest thermodynamic stability is not always optimal for proteins to perform their biological functions. Extensive theoretical and experimental screening is often required to obtain stable protein structures. Thus, it becomes critically important to develop a stability prediction model based on the balance between protein stability and bioactivity. To design protein drugs with better functionality in a broader structural space, a novel protein structural stability predictor called PSSP has been developed in this study. PSSP is a mean pooled dual graph convolutional network (GCN) model based on sequence characteristics and secondary structure, distance matrix, graph, and residue properties of a nanoprotein to provide rapid prediction and judgment. This model exhibits excellent robustness in predicting the structural stability of nanoproteins. Compared with previous artificial intelligence algorithms, the results indicate that this model can provide a rapid and accurate assessment of the structural stability of artificially designed proteins, which shows great promise for promoting the robust development of protein design.
Affiliation(s)
- Daixi Li
  - Institute of Biothermal Engineering, University of Shanghai for Science and Technology, Shanghai, 20093, China.
  - Pengcheng Laboratory, Shenzhen, 518055, China.
- Yuqi Zhu
  - Institute of Biothermal Engineering, University of Shanghai for Science and Technology, Shanghai, 20093, China
- Wujie Zhang
  - Chemical and Biomolecular Engineering Program, Physics and Chemistry Department, Milwaukee School of Engineering, Milwaukee, 53202, USA
- Jing Liu
  - Institute of Biothermal Engineering, University of Shanghai for Science and Technology, Shanghai, 20093, China
- Xiaochen Yang
  - Institute of Biothermal Engineering, University of Shanghai for Science and Technology, Shanghai, 20093, China
- Zhihong Liu
  - Pingshan Translational Medicine Center, Shenzhen Bay Laboratory, Shenzhen, 518118, China
- Dongqing Wei
  - Pengcheng Laboratory, Shenzhen, 518055, China
  - State Key Laboratory of Microbial Metabolism, Shanghai-Islamabad-Belgrade Joint Innovation, Center On Antibacterial Resistances, Joint International Research Laboratory of Metabolic and Developmental Sciences, School of Life Sciences and Biotechnology, Shanghai Jiao Tong University, Shanghai, 200240, China
3
Datta Darshan VM, Arumugam N, Almansour AI, Sivaramakrishnan V, Kanchi S. In silico energetic and molecular dynamic simulations studies demonstrate potential effect of the point mutations with implications for protein engineering in BDNF. Int J Biol Macromol 2024; 271:132247. [PMID: 38750847 DOI: 10.1016/j.ijbiomac.2024.132247] [Received: 03/07/2024] [Revised: 05/01/2024] [Accepted: 05/07/2024] [Indexed: 05/26/2024]
Abstract
Protein engineering by directed evolution is time-consuming. Hence, in silico techniques like FoldX-Yasara for ∆∆G calculation, and SNPeffect for predicting propensity for aggregation, amyloid formation, and chaperone binding are employed to design proteins. Here, we used in silico techniques to engineer BDNF-NTF3 interaction and validated it using mutations with known functional implications for NGF dimer. The structures of three mutants representing a positive, negative, or neutral ∆∆G involving two interface residues in BDNF and two mutations representing a neutral and positive ∆∆G in NGF, which is aligned with BDNF, were selected for molecular dynamics (MD) simulation. Our MD results indicate that the secondary structure of individual protomers of the positive and negative mutants displayed a similar or different conformation from the NTF3 monomer, respectively. The positive mutants showed fewer hydrophobic interactions and more hydrogen bonds compared to the wild-type, negative, and neutral mutants with similar SASA, suggesting solvent-mediated disruption of hydrogen-bonded interactions. Similar results were obtained for mutations with known functional implications for NGF and BDNF. The results suggest that mutations with known effects in homologous proteins could help in validation, and in silico directed evolution experiments could be a viable alternative to the experimental technique used for protein engineering.
Affiliation(s)
- V M Datta Darshan
  - Disease Biology Lab, Department of Biosciences, Sri Sathya Sai Institute of Higher Learning, Prasanthi Nilayam, Andhra Pradesh 515134, India
- Natarajan Arumugam
  - Department of Chemistry, College of Science, King Saud University, P.O. Box 2455, Riyadh 11451, Saudi Arabia
- Abdulrahman I Almansour
  - Department of Chemistry, College of Science, King Saud University, P.O. Box 2455, Riyadh 11451, Saudi Arabia
- Venketesh Sivaramakrishnan
  - Disease Biology Lab, Department of Biosciences, Sri Sathya Sai Institute of Higher Learning, Prasanthi Nilayam, Andhra Pradesh 515134, India.
- Subbarao Kanchi
  - Department of Physics, Sri Sathya Sai Institute of Higher Learning, Prasanthi Nilayam, Andhra Pradesh 515134, India.
4
Islam S, Pantazes RJ. Developing similarity matrices for antibody-protein binding interactions. PLoS One 2023; 18:e0293606. [PMID: 37883504 PMCID: PMC10602319 DOI: 10.1371/journal.pone.0293606] [Received: 05/16/2023] [Accepted: 10/17/2023] [Indexed: 10/28/2023]
Abstract
The inventions of AlphaFold and RoseTTAFold are revolutionizing computational protein science due to their abilities to reliably predict protein structures. Their unprecedented successes are due to the parallel consideration of several types of information, one of which is protein sequence similarity information. Sequence homology has been studied for many decades and depends on similarity matrices to define how similar or different protein sequences are to one another. A natural extension of predicting protein structures is predicting the interactions between proteins, but similarity matrices for protein-protein interactions do not exist. This study conducted a mutational analysis of 384 non-redundant antibody-protein antigen complexes to calculate antibody-protein interaction similarity matrices. Every important residue in each antibody and each antigen was mutated to each of the other 19 commonly occurring amino acids and the percentage changes in interaction energies were calculated using three force fields: CHARMM, Amber, and Rosetta. The data were used to construct six interaction similarity matrices, one for antibodies and another for antigens using each force field. The matrices exhibited both commonalities, such as mutations of aromatic and charged residues being the most detrimental, and differences, such as Rosetta predicting mutations of serines to be better tolerated than either Amber or CHARMM. A comparison to nine previously published similarity matrices for protein sequences revealed that the new interaction matrices are more similar to one another than they are to any of the previous matrices. The created similarity matrices can be used in force field specific applications to help guide decisions regarding mutations in protein-protein binding interfaces.
Affiliation(s)
- Sumaiya Islam
  - Department of Chemical Engineering, Auburn University, Auburn, Alabama, United States of America
- Robert J. Pantazes
  - Department of Chemical Engineering, Auburn University, Auburn, Alabama, United States of America
5
Nordquist E, Zhang G, Barethiya S, Ji N, White KM, Han L, Jia Z, Shi J, Cui J, Chen J. Incorporating physics to overcome data scarcity in predictive modeling of protein function: A case study of BK channels. PLoS Comput Biol 2023; 19:e1011460. [PMID: 37713443 PMCID: PMC10529646 DOI: 10.1371/journal.pcbi.1011460] [Received: 06/24/2023] [Revised: 09/27/2023] [Accepted: 08/24/2023] [Indexed: 09/17/2023]
Abstract
Machine learning has played transformative roles in numerous chemical and biophysical problems such as protein folding where large amounts of data exist. Nonetheless, many important problems remain challenging for data-driven machine learning approaches due to the limitation of data scarcity. One approach to overcome data scarcity is to incorporate physical principles such as through molecular modeling and simulation. Here, we focus on the big potassium (BK) channels that play important roles in cardiovascular and neural systems. Many mutants of BK channel are associated with various neurological and cardiovascular diseases, but the molecular effects are unknown. The voltage gating properties of BK channels have been characterized for 473 site-specific mutations experimentally over the last three decades; yet, these functional data by themselves remain far too sparse to derive a predictive model of BK channel voltage gating. Using physics-based modeling, we quantify the energetic effects of all single mutations on both open and closed states of the channel. Together with dynamic properties derived from atomistic simulations, these physical descriptors allow the training of random forest models that could reproduce unseen experimentally measured shifts in gating voltage, ∆V1/2, with an RMSE ~ 32 mV and correlation coefficient of R ~ 0.7. Importantly, the model appears capable of uncovering nontrivial physical principles underlying the gating of the channel, including a central role of hydrophobic gating. The model was further evaluated using four novel mutations of L235 and V236 on the S5 helix, mutations of which are predicted to have opposing effects on V1/2 and suggest a key role of S5 in mediating voltage sensor-pore coupling. The measured ∆V1/2 agree quantitatively with prediction for all four mutations, with a high correlation of R = 0.92 and RMSE = 18 mV. Therefore, the model can capture nontrivial voltage gating properties in regions where few mutations are known. The success of predictive modeling of BK voltage gating demonstrates the potential of combining physics and statistical learning for overcoming data scarcity in nontrivial protein function prediction.
Affiliation(s)
- Erik Nordquist
  - Department of Chemistry, University of Massachusetts Amherst, Amherst, Massachusetts, United States of America
- Guohui Zhang
  - Department of Biomedical Engineering, Center for the Investigation of Membrane Excitability Disorders, Cardiac Bioelectricity and Arrhythmia Center, Washington University in St. Louis, St. Louis, Missouri, United States of America
- Shrishti Barethiya
  - Department of Chemistry, University of Massachusetts Amherst, Amherst, Massachusetts, United States of America
- Nathan Ji
  - Department of Biology, Boston College, Chestnut Hill, Massachusetts, United States of America
- Kelli M. White
  - Department of Biomedical Engineering, Center for the Investigation of Membrane Excitability Disorders, Cardiac Bioelectricity and Arrhythmia Center, Washington University in St. Louis, St. Louis, Missouri, United States of America
- Lu Han
  - Department of Biomedical Engineering, Center for the Investigation of Membrane Excitability Disorders, Cardiac Bioelectricity and Arrhythmia Center, Washington University in St. Louis, St. Louis, Missouri, United States of America
- Zhiguang Jia
  - Department of Chemistry, University of Massachusetts Amherst, Amherst, Massachusetts, United States of America
- Jingyi Shi
  - Department of Biomedical Engineering, Center for the Investigation of Membrane Excitability Disorders, Cardiac Bioelectricity and Arrhythmia Center, Washington University in St. Louis, St. Louis, Missouri, United States of America
- Jianmin Cui
  - Department of Biomedical Engineering, Center for the Investigation of Membrane Excitability Disorders, Cardiac Bioelectricity and Arrhythmia Center, Washington University in St. Louis, St. Louis, Missouri, United States of America
- Jianhan Chen
  - Department of Chemistry, University of Massachusetts Amherst, Amherst, Massachusetts, United States of America
6
Chen J, Woldring DR, Huang F, Huang X, Wei GW. Topological deep learning based deep mutational scanning. Comput Biol Med 2023; 164:107258. [PMID: 37506452 PMCID: PMC10528359 DOI: 10.1016/j.compbiomed.2023.107258] [Received: 05/09/2023] [Revised: 06/28/2023] [Accepted: 07/08/2023] [Indexed: 07/30/2023]
Abstract
High-throughput deep mutational scanning (DMS) experiments have significantly impacted protein engineering, drug discovery, immunology, cancer biology, and evolutionary biology by enabling the systematic understanding of protein functions. However, the mutational space associated with proteins is astronomically large, making it overwhelming for current experimental capabilities. Therefore, alternative methods for DMS are imperative. We propose a topological deep learning (TDL) paradigm to facilitate in silico DMS. We utilize a new topological data analysis (TDA) technique based on the persistent spectral theory, also known as persistent Laplacian, to capture both topological invariants and the homotopic shape evolution of data. To validate our TDL-DMS model, we use SARS-CoV-2 datasets and show excellent accuracy and reliability for binding interface mutations. This finding is significant for SARS-CoV-2 variant forecasting and designing effective antibodies and vaccines. Our proposed model is expected to have a significant impact on drug discovery, vaccine design, precision medicine, and protein engineering.
Affiliation(s)
- Jiahui Chen
  - Department of Mathematical Sciences, University of Arkansas, Fayetteville, AR 72701, USA
- Daniel R Woldring
  - Department of Chemical Engineering, Michigan State University, East Lansing, MI 48824, USA
- Faqing Huang
  - Department of Chemistry and Biochemistry, University of Southern Mississippi, Hattiesburg, MS 39406, USA
- Xuefei Huang
  - Department of Chemistry, Michigan State University, MI 48824, USA; Department of Biomedical Engineering, Michigan State University, East Lansing, MI 48824, USA; The Institute for Quantitative Health Science and Engineering, Michigan State University, East Lansing, MI 48824, USA
- Guo-Wei Wei
  - Department of Mathematics, Michigan State University, East Lansing, MI 48824, USA; Department of Electrical and Computer Engineering, Michigan State University, East Lansing, MI 48824, USA; Department of Biochemistry and Molecular Biology, Michigan State University, East Lansing, MI 48824, USA.
7
Li J, Kang G, Wang J, Yuan H, Wu Y, Meng S, Wang P, Zhang M, Wang Y, Feng Y, Huang H, de Marco A. Affinity maturation of antibody fragments: A review encompassing the development from random approaches to computational rational optimization. Int J Biol Macromol 2023; 247:125733. [PMID: 37423452 DOI: 10.1016/j.ijbiomac.2023.125733] [Received: 04/03/2023] [Revised: 07/04/2023] [Accepted: 07/06/2023] [Indexed: 07/11/2023]
Abstract
Routinely screened antibody fragments usually require further in vitro maturation to achieve the desired biophysical properties. Blind in vitro strategies can produce improved ligands by introducing random mutations into the original sequences and selecting the resulting clones under increasingly stringent conditions. Rational approaches take an alternative perspective that aims first at identifying the specific residues potentially involved in the control of biophysical mechanisms, such as affinity or stability, and then at evaluating which mutations could improve those characteristics. Understanding antigen-antibody interactions is instrumental to developing this process, the reliability of which consequently depends strongly on the quality and completeness of the structural information. Recently, methods based on deep learning approaches critically improved the speed and accuracy of model building and are promising tools for accelerating the docking step. Here, we review the features of the available bioinformatic instruments and analyze the reports illustrating the results obtained with their application to optimize antibody fragments, and nanobodies in particular. Finally, the emerging trends and open questions are summarized.
Affiliation(s)
- Jiaqi Li
  - School of Chemical Engineering and Technology, Tianjin University, Tianjin 300350, China; Frontiers Science Center for Synthetic Biology and Key Laboratory of Systems Bioengineering (Ministry of Education), Tianjin University, Tianjin 300072, China
- Guangbo Kang
  - School of Chemical Engineering and Technology, Tianjin University, Tianjin 300350, China; Frontiers Science Center for Synthetic Biology and Key Laboratory of Systems Bioengineering (Ministry of Education), Tianjin University, Tianjin 300072, China
- Jiewen Wang
  - School of Chemical Engineering and Technology, Tianjin University, Tianjin 300350, China; Frontiers Science Center for Synthetic Biology and Key Laboratory of Systems Bioengineering (Ministry of Education), Tianjin University, Tianjin 300072, China
- Haibin Yuan
  - School of Chemical Engineering and Technology, Tianjin University, Tianjin 300350, China; Frontiers Science Center for Synthetic Biology and Key Laboratory of Systems Bioengineering (Ministry of Education), Tianjin University, Tianjin 300072, China
- Yili Wu
  - Zhejiang Provincial Clinical Research Center for Mental Disorders, School of Mental Health and the Affiliated Kangning Hospital, Institute of Aging, Key Laboratory of Alzheimer's Disease of Zhejiang Province, Wenzhou Medical University, Oujiang Laboratory, Wenzhou, Zhejiang 325035, China
- Shuxian Meng
  - School of Chemical Engineering and Technology, Tianjin University, Tianjin 300350, China
- Ping Wang
  - New Technology R&D Department, Tianjin Modern Innovative TCM Technology Company Limited, Tianjin 300392, China
- Miao Zhang
  - School of Chemical Engineering and Technology, Tianjin University, Tianjin 300350, China; China Resources Biopharmaceutical Company Limited, Beijing 100029, China
- Yuli Wang
  - School of Chemical Engineering and Technology, Tianjin University, Tianjin 300350, China; Tianjin Pharmaceutical Da Ren Tang Group Corporation Limited, Traditional Chinese Pharmacy Research Institute, Tianjin Key Laboratory of Quality Control in Chinese Medicine, Tianjin 300457, China; State Key Laboratory of Drug Delivery Technology and Pharmacokinetics, Tianjin Institute of Pharmaceutical Research, Tianjin 300193, China
- Yuanhang Feng
  - School of Chemical Engineering and Technology, Tianjin University, Tianjin 300350, China
- He Huang
  - School of Chemical Engineering and Technology, Tianjin University, Tianjin 300350, China; Frontiers Science Center for Synthetic Biology and Key Laboratory of Systems Bioengineering (Ministry of Education), Tianjin University, Tianjin 300072, China.
- Ario de Marco
  - Laboratory for Environmental and Life Sciences, University of Nova Gorica, Nova Gorica, Slovenia.
8
Nordquist E, Zhang G, Barethiya S, Ji N, White KM, Han L, Jia Z, Shi J, Cui J, Chen J. Incorporating physics to overcome data scarcity in predictive modeling of protein function: a case study of BK channels. BIORXIV : THE PREPRINT SERVER FOR BIOLOGY 2023:2023.06.24.546384. [PMID: 37425916 PMCID: PMC10327070 DOI: 10.1101/2023.06.24.546384] [Indexed: 07/11/2023]
Abstract
Machine learning has played transformative roles in numerous chemical and biophysical problems such as protein folding where large amounts of data exist. Nonetheless, many important problems remain challenging for data-driven machine learning approaches due to the limitation of data scarcity. One approach to overcome data scarcity is to incorporate physical principles such as through molecular modeling and simulation. Here, we focus on the big potassium (BK) channels that play important roles in cardiovascular and neural systems. Many mutants of BK channel are associated with various neurological and cardiovascular diseases, but the molecular effects are unknown. The voltage gating properties of BK channels have been characterized for 473 site-specific mutations experimentally over the last three decades; yet, these functional data by themselves remain far too sparse to derive a predictive model of BK channel voltage gating. Using physics-based modeling, we quantify the energetic effects of all single mutations on both open and closed states of the channel. Together with dynamic properties derived from atomistic simulations, these physical descriptors allow the training of random forest models that could reproduce unseen experimentally measured shifts in gating voltage, ΔV1/2, with an RMSE ∼ 32 mV and correlation coefficient of R ∼ 0.7. Importantly, the model appears capable of uncovering nontrivial physical principles underlying the gating of the channel, including a central role of hydrophobic gating. The model was further evaluated using four novel mutations of L235 and V236 on the S5 helix, mutations of which are predicted to have opposing effects on V1/2 and suggest a key role of S5 in mediating voltage sensor-pore coupling. The measured ΔV1/2 agree quantitatively with prediction for all four mutations, with a high correlation of R = 0.92 and RMSE = 18 mV. Therefore, the model can capture nontrivial voltage gating properties in regions where few mutations are known. The success of predictive modeling of BK voltage gating demonstrates the potential of combining physics and statistical learning for overcoming data scarcity in nontrivial protein function prediction.
Author Summary
Deep machine learning has brought many exciting breakthroughs in chemistry, physics and biology. These models require large amounts of training data and struggle when the data is scarce. The latter is true for predictive modeling of the function of complex proteins such as ion channels, where only hundreds of mutational data may be available. Using the big potassium (BK) channel as a biologically important model system, we demonstrate that a reliable predictive model of its voltage gating property could be derived from only 473 mutational data by incorporating physics-derived features, which include dynamic properties from molecular dynamics simulations and energetic quantities from Rosetta mutation calculations. We show that the final random forest model captures key trends and hotspots in mutational effects of BK voltage gating, such as the important role of pore hydrophobicity. A particularly curious prediction is that mutations of two adjacent residues on the S5 helix would always have opposite effects on the gating voltage, which was confirmed by experimental characterization of four novel mutations. The current work demonstrates the importance and effectiveness of incorporating physics in predictive modeling of protein function with scarce data.
Affiliation(s)
- Erik Nordquist
  - Department of Chemistry, University of Massachusetts Amherst, Amherst, Massachusetts, USA
- Guohui Zhang
  - Department of Biomedical Engineering, Center for the Investigation of Membrane Excitability Disorders, Cardiac Bioelectricity and Arrhythmia Center, Washington University in St. Louis, St. Louis, Missouri, USA
- Shrishti Barethiya
  - Department of Chemistry, University of Massachusetts Amherst, Amherst, Massachusetts, USA
- Nathan Ji
  - Department of Biology, Boston College, Chestnut Hill, Massachusetts, USA
- Kelli M White
  - Department of Biomedical Engineering, Center for the Investigation of Membrane Excitability Disorders, Cardiac Bioelectricity and Arrhythmia Center, Washington University in St. Louis, St. Louis, Missouri, USA
- Lu Han
  - Department of Biomedical Engineering, Center for the Investigation of Membrane Excitability Disorders, Cardiac Bioelectricity and Arrhythmia Center, Washington University in St. Louis, St. Louis, Missouri, USA
- Zhiguang Jia
  - Department of Chemistry, University of Massachusetts Amherst, Amherst, Massachusetts, USA
- Jingyi Shi
  - Department of Biomedical Engineering, Center for the Investigation of Membrane Excitability Disorders, Cardiac Bioelectricity and Arrhythmia Center, Washington University in St. Louis, St. Louis, Missouri, USA
- Jianmin Cui
  - Department of Biomedical Engineering, Center for the Investigation of Membrane Excitability Disorders, Cardiac Bioelectricity and Arrhythmia Center, Washington University in St. Louis, St. Louis, Missouri, USA
- Jianhan Chen
  - Department of Chemistry, University of Massachusetts Amherst, Amherst, Massachusetts, USA
9
Dutagaci B, Duan B, Qiu C, Kaplan CD, Feig M. Characterization of RNA polymerase II trigger loop mutations using molecular dynamics simulations and machine learning. PLoS Comput Biol 2023; 19:e1010999. [PMID: 36947548 PMCID: PMC10069792 DOI: 10.1371/journal.pcbi.1010999] [Received: 08/15/2022] [Revised: 04/03/2023] [Accepted: 03/06/2023] [Indexed: 03/23/2023]
Abstract
Catalysis and fidelity of multisubunit RNA polymerases rely on a highly conserved active site domain called the trigger loop (TL), which achieves roles in transcription through conformational changes and interaction with NTP substrates. The mutations of TL residues cause distinct effects on catalysis including hypo- and hyperactivity and altered fidelity. We applied molecular dynamics simulation (MD) and machine learning (ML) techniques to characterize TL mutations in the Saccharomyces cerevisiae RNA Polymerase II (Pol II) system. We did so to determine relationships between individual mutations and phenotypes and to associate phenotypes with MD simulated structural alterations. Using fitness values of mutants under various stress conditions, we modeled phenotypes along a spectrum of continuous values. We found that ML could predict the phenotypes with an R² of 0.68 from amino acid sequences alone. It was more difficult to incorporate MD data to improve predictions from machine learning, presumably because MD data is too noisy and possibly too incomplete to directly infer functional phenotypes. However, a variational auto-encoder model based on the MD data allowed the clustering of mutants with different phenotypes based on structural details. Overall, we found that a subset of loss-of-function (LOF) and lethal mutations tended to increase distances of TL residues to the NTP substrate, while another subset of LOF and lethal substitutions tended to confer an increase in distances between TL and bridge helix (BH). In contrast, some of the gain-of-function (GOF) mutants appear to cause disruption of hydrophobic contacts among TL and nearby helices.
Collapse
Affiliation(s)
- Bercem Dutagaci
  - Department of Molecular and Cell Biology, University of California Merced, Merced, California, United States of America
- Bingbing Duan
  - Department of Biological Sciences, University of Pittsburgh, Pittsburgh, Pennsylvania, United States of America
- Chenxi Qiu
  - Department of Genetics, Harvard Medical School, Boston, Massachusetts, United States of America
- Craig D. Kaplan
  - Department of Biological Sciences, University of Pittsburgh, Pittsburgh, Pennsylvania, United States of America
- Michael Feig
  - Department of Biochemistry and Molecular Biology, Michigan State University, East Lansing, Michigan, United States of America
|
10
|
Durairaj J, de Ridder D, van Dijk AD. Beyond sequence: Structure-based machine learning. Comput Struct Biotechnol J 2022; 21:630-643. [PMID: 36659927 PMCID: PMC9826903 DOI: 10.1016/j.csbj.2022.12.039] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/26/2022] [Revised: 12/21/2022] [Accepted: 12/21/2022] [Indexed: 12/31/2022]
Abstract
Recent breakthroughs in protein structure prediction demarcate the start of a new era in structural bioinformatics. Combined with various advances in experimental structure determination and the uninterrupted pace at which new structures are published, this promises an age in which protein structure information is as prevalent and ubiquitous as sequence. Machine learning in protein bioinformatics has been dominated by sequence-based methods, but this is now changing to make use of the deluge of rich structural information as input. Machine learning methods making use of structures are scattered across literature and cover a number of different applications and scopes; while some try to address questions and tasks within a single protein family, others aim to capture characteristics across all available proteins. In this review, we look at the variety of structure-based machine learning approaches, how structures can be used as input, and typical applications of these approaches in protein biology. We also discuss current challenges and opportunities in this all-important and increasingly popular field.
Affiliation(s)
- Janani Durairaj
  - Biozentrum, University of Basel, Basel, Switzerland
  - Bioinformatics Group, Department of Plant Sciences, Wageningen University and Research, Wageningen, the Netherlands
- Dick de Ridder
  - Bioinformatics Group, Department of Plant Sciences, Wageningen University and Research, Wageningen, the Netherlands
- Aalt D.J. van Dijk
  - Bioinformatics Group, Department of Plant Sciences, Wageningen University and Research, Wageningen, the Netherlands
|
11
|
Katsonis P, Wilhelm K, Williams A, Lichtarge O. Genome interpretation using in silico predictors of variant impact. Hum Genet 2022; 141:1549-1577. [PMID: 35488922 PMCID: PMC9055222 DOI: 10.1007/s00439-022-02457-6] [Citation(s) in RCA: 28] [Impact Index Per Article: 14.0] [Received: 08/02/2021] [Accepted: 04/17/2022] [Indexed: 02/06/2023]
Abstract
Estimating the effects of variants found in disease driver genes opens the door to personalized therapeutic opportunities. Clinical associations and laboratory experiments can only characterize a tiny fraction of all the available variants, leaving the majority as variants of unknown significance (VUS). In silico methods bridge this gap by providing instant estimates on a large scale, most often based on the numerous genetic differences between species. Despite concerns that these methods may lack reliability in individual subjects, their numerous practical applications over cohorts suggest they are already helpful and have a role to play in genome interpretation when used at the proper scale and context. In this review, we aim to gain insights into the training and validation of these variant effect predicting methods and illustrate representative types of experimental and clinical applications. Objective performance assessments using various datasets that are not yet published indicate the strengths and limitations of each method. These show that cautious use of in silico variant impact predictors is essential for addressing genome interpretation challenges.
Affiliation(s)
- Panagiotis Katsonis
  - Department of Molecular and Human Genetics, Baylor College of Medicine, One Baylor Plaza, Houston, TX, 77030, USA
- Kevin Wilhelm
  - Graduate School of Biomedical Sciences, Baylor College of Medicine, One Baylor Plaza, Houston, TX, 77030, USA
- Amanda Williams
  - Department of Molecular and Human Genetics, Baylor College of Medicine, One Baylor Plaza, Houston, TX, 77030, USA
- Olivier Lichtarge
  - Department of Molecular and Human Genetics, Baylor College of Medicine, One Baylor Plaza, Houston, TX, 77030, USA
  - Department of Biochemistry, Human Genetics and Molecular Biology, Baylor College of Medicine, One Baylor Plaza, Houston, TX, 77030, USA
  - Department of Pharmacology, Baylor College of Medicine, One Baylor Plaza, Houston, TX, 77030, USA
  - Computational and Integrative Biomedical Research Center, Baylor College of Medicine, One Baylor Plaza, Houston, TX, 77030, USA
|
12
|
Liu Y, Yeung WSB, Chiu PCN, Cao D. Computational approaches for predicting variant impact: An overview from resources, principles to applications. Front Genet 2022; 13:981005. [PMID: 36246661 PMCID: PMC9559863 DOI: 10.3389/fgene.2022.981005] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/29/2022] [Accepted: 08/08/2022] [Indexed: 11/13/2022]
Abstract
One objective of human genetics is to unveil the variants that contribute to human diseases. With the rapid development and wide use of next-generation sequencing (NGS), massive genomic sequence data have been created, making personal genetic information available. Conventional experimental evidence is critical in establishing the relationship between sequence variants and phenotype, but it is obtained with low efficiency. Because comprehensive databases and resources presenting clinical and experimental evidence on genotype-phenotype relationships are lacking, and because variants found by NGS keep accumulating, many computational tools that predict the impact of variants on phenotype have been developed to bridge the gap. In this review, we present a brief introduction to and discussion of the computational approaches for variant impact prediction. Taking a novel approach, we mainly focus on methods for predicting the impact of non-synonymous single-nucleotide variants (nsSNVs) and categorize them into six classes. Their underlying rationale and constraints, together with the concerns and remedies raised by comparative studies, are discussed. We also present how the predictive approaches are employed in different areas of research. Although diverse constraints exist, computational predictive approaches are indispensable in exploring genotype-phenotype relationships.
Affiliation(s)
- Ye Liu
  - Shenzhen Key Laboratory of Fertility Regulation, Reproductive Medicine Center, The University of Hong Kong-Shenzhen Hospital, Shenzhen, China
- William S. B. Yeung
  - Shenzhen Key Laboratory of Fertility Regulation, Reproductive Medicine Center, The University of Hong Kong-Shenzhen Hospital, Shenzhen, China
  - Department of Obstetrics and Gynaecology, Li Ka Shing Faculty of Medicine, The University of Hong Kong, Hong Kong SAR, China
- Philip C. N. Chiu
  - Shenzhen Key Laboratory of Fertility Regulation, Reproductive Medicine Center, The University of Hong Kong-Shenzhen Hospital, Shenzhen, China
  - Department of Obstetrics and Gynaecology, Li Ka Shing Faculty of Medicine, The University of Hong Kong, Hong Kong SAR, China
- Dandan Cao
  - Shenzhen Key Laboratory of Fertility Regulation, Reproductive Medicine Center, The University of Hong Kong-Shenzhen Hospital, Shenzhen, China
|
13
|
Mariz BDP, Carvalho S, Batalha IL, Pina AS. Artificial enzymes bringing together computational design and directed evolution. Org Biomol Chem 2021; 19:1915-1925. [PMID: 33443278 DOI: 10.1039/d0ob02143a] [Citation(s) in RCA: 15] [Impact Index Per Article: 5.0] [Indexed: 12/13/2022]
Abstract
Enzymes are proteins that catalyse chemical reactions and, as such, have been widely used to facilitate a variety of natural and industrial processes, dating back to ancient times. In fact, the global enzymes market is projected to reach $10.5 billion in 2024. The development of computational and DNA editing tools boosted the creation of artificial enzymes (de novo enzymes): synthetic or organic molecules created to present abiological catalytic functions. These novel catalysts seek to expand the catalytic power offered by nature through new functions and properties. In this manuscript, we discuss the advantages of combining computational design with directed evolution for the development of artificial enzymes, and how this strategy fills the gaps that each method presents individually by providing key insights about the sequence-function relationship. We also review examples, and the respective strategies, where this approach has enabled the creation of artificial enzymes with promising catalytic activity. Such key enabling technologies are opening new windows of opportunity in a variety of industries, including pharmaceutical, chemical, biofuels, and food, contributing towards more sustainable development.
Affiliation(s)
- Beatriz de Pina Mariz
  - UCIBIO, Chemistry Department, School of Science and Technology, NOVA University of Lisbon, 2829-516 Caparica, Portugal
- Sara Carvalho
  - UCIBIO, Chemistry Department, School of Science and Technology, NOVA University of Lisbon, 2829-516 Caparica, Portugal
- Iris L Batalha
  - Nanoscience Centre, Department of Engineering, University of Cambridge, 11 J.J. Thomson Avenue, Cambridge, CB3 0FF, UK
- Ana Sofia Pina
  - UCIBIO, Chemistry Department, School of Science and Technology, NOVA University of Lisbon, 2829-516 Caparica, Portugal
|
14
|
Tripathi S, Dsouza NR, Urrutia R, Zimmermann MT. Structural bioinformatics enhances mechanistic interpretation of genomic variation, demonstrated through the analyses of 935 distinct RAS family mutations. Bioinformatics 2021; 37:1367-1375. [PMID: 33226070 PMCID: PMC8208742 DOI: 10.1093/bioinformatics/btaa972] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 07/29/2020] [Revised: 10/04/2020] [Accepted: 11/11/2020] [Indexed: 12/26/2022]
Abstract
MOTIVATION: Protein-coding genetic alterations are frequently observed in Clinical Genetics, but the high yield of variants of uncertain significance remains a limitation in decision making. RAS-family GTPases are cancer drivers, but only 54 variants, across all family members, fall within well-known hotspots. However, extensive sequencing has identified 881 non-hotspot variants for which significance remains to be investigated. RESULTS: Here, we evaluate 935 missense variants from seven RAS genes, observed in cancer, RASopathies and the healthy adult population. We characterized hotspot variants, previously studied experimentally, using 63 sequence- and 3D structure-based scores, chosen for their breadth of biophysical properties. Applying the scores that display the best correlation with experimental measures, we report new, valuable mechanistic inferences for both hotspot and non-hotspot variants. Moreover, we demonstrate that 3D scores have little-to-no correlation with those based on DNA sequence, which are commonly used in Clinical Genetics. Thus, combined, this new knowledge bears significant relevance. AVAILABILITY AND IMPLEMENTATION: All genomic and 3D scores, and markdown for generating figures, are provided in our supplemental data. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Swarnendu Tripathi
  - Bioinformatics Research and Development Laboratory, Genomic Sciences and Precision Medicine Center, Milwaukee, WI 53226, USA; Precision Medicine Simulation Unit, Genomic Sciences and Precision Medicine Center, Milwaukee, WI 53226, USA
- Nikita R Dsouza
  - Bioinformatics Research and Development Laboratory, Genomic Sciences and Precision Medicine Center, Milwaukee, WI 53226, USA; Precision Medicine Simulation Unit, Genomic Sciences and Precision Medicine Center, Milwaukee, WI 53226, USA
- Raul Urrutia
  - Precision Medicine Simulation Unit, Genomic Sciences and Precision Medicine Center, Milwaukee, WI 53226, USA; Department of Surgery, Genomic Sciences and Precision Medicine Center, Milwaukee, WI 53226, USA
- Michael T Zimmermann
  - Bioinformatics Research and Development Laboratory, Genomic Sciences and Precision Medicine Center, Milwaukee, WI 53226, USA; Precision Medicine Simulation Unit, Genomic Sciences and Precision Medicine Center, Milwaukee, WI 53226, USA; Clinical and Translational Sciences Institute, Genomic Sciences and Precision Medicine Center, Milwaukee, WI 53226, USA; Department of Biochemistry, Medical College of Wisconsin, Milwaukee, WI 53226, USA
|
15
|
Abbasi WA, Abbas SA, Andleeb S. PANDA: Predicting the change in proteins binding affinity upon mutations by finding a signal in primary structures. J Bioinform Comput Biol 2021; 19:2150015. [PMID: 34126874 DOI: 10.1142/s0219720021500153] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Indexed: 11/18/2022]
Abstract
Accurately determining a change in protein binding affinity upon mutations is important to find novel therapeutics and to assist mutagenesis studies. Determining the change in binding affinity upon mutations requires sophisticated, expensive, and time-consuming wet-lab experiments, which can be supported with computational methods. Most of the available computational prediction techniques depend on protein structures, which limits their applicability to protein complexes with known 3D structures. In this work, we explore the sequence-based prediction of change in protein binding affinity upon mutation and question the effectiveness of the $k$-fold cross-validation (CV) across mutations adopted in previous studies for assessing how well such predictors generalize to complexes with no known mutation during training. We have used protein sequence information instead of protein structures, along with machine learning techniques, to predict the change in protein binding affinity upon mutation. Our proposed sequence-based predictor of change in protein binding affinity, called PANDA, performs comparably to the existing methods when gauged through an appropriate CV scheme and an external independent test dataset. On an external test dataset, our proposed method gives a maximum Pearson correlation coefficient of 0.52, compared with a maximum Pearson correlation coefficient of 0.59 for the state-of-the-art protein structure-based method MutaBind. Our proposed protein sequence-based method for predicting a change in binding affinity upon mutations has wide applicability and performance comparable to existing protein structure-based methods. We made PANDA easily accessible through a cloud-based webserver and made the Python code available at https://sites.google.com/view/wajidarshad/software and https://github.com/wajidarshad/panda, respectively.
Affiliation(s)
- Wajid Arshad Abbasi
  - Computational Biology and Data Analysis Lab., Department of Computer Sciences & Information Technology, King Abdullah Campus, University of Azad Jammu & Kashmir, Muzaffarabad, AJ&K 13100, Pakistan
- Syed Ali Abbas
  - Computational Biology and Data Analysis Lab., Department of Computer Sciences & Information Technology, King Abdullah Campus, University of Azad Jammu & Kashmir, Muzaffarabad, AJ&K 13100, Pakistan
- Saiqa Andleeb
  - Biotechnology Lab., Department of Zoology, King Abdullah Campus, University of Azad Jammu & Kashmir, Muzaffarabad, AJ&K 13100, Pakistan
|
16
|
Wätzig H, Hoffstedt M, Krebs F, Minkner R, Scheller C, Zagst H. Protein analysis and stability: Overcoming trial-and-error by grouping according to physicochemical properties. J Chromatogr A 2021; 1649:462234. [PMID: 34038775 DOI: 10.1016/j.chroma.2021.462234] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 02/24/2021] [Revised: 04/27/2021] [Accepted: 04/28/2021] [Indexed: 12/15/2022]
Abstract
Today proteins are possibly the most important class of substances. Yet new tasks for proteins are still often solved by trial-and-error approaches. In some areas, however, these euphemistically named "screening approaches" are not suitable. Stability tests, for example, simply take too long and therefore require a more strategic, target-orientated concept. Such a concept is available: group proteins according to their physicochemical properties and then pull out the right drawer for each new task. These properties include size, then charge and hydrophobicity as well as their patchiness, and the degree of order. In addition, solubility, the content of (free) enthalpy, aromatic-amino-acid and α/β frequency, helix capping and the corresponding patchiness, the number of specific motifs and domains, and the typical concentration range can help to discriminate between different groups of proteins. Analyzing correlations will reduce the number of parameters needed, and additional parameters, which may still be undiscovered at present, can be identified by looking at protein subgroups with similar physicochemical properties that still behave heterogeneously. Step by step, the methodology will be improved. Protein stability will possibly be the driver of this process, but all other areas, such as production, purification and analytics, including sample pre-treatment and the choice of appropriate separation conditions for, e.g., chromatography and electrophoresis, will profit from a rational strategy.
Affiliation(s)
- Hermann Wätzig
  - Technische Universität Braunschweig, Institute of Medicinal and Pharmaceutical Chemistry, Beethovenstraße 55, Braunschweig 38106, Germany
- Marc Hoffstedt
  - Technische Universität Braunschweig, Institute of Medicinal and Pharmaceutical Chemistry, Beethovenstraße 55, Braunschweig 38106, Germany
- Finja Krebs
  - Technische Universität Braunschweig, Institute of Medicinal and Pharmaceutical Chemistry, Beethovenstraße 55, Braunschweig 38106, Germany
- Robert Minkner
  - Technische Universität Braunschweig, Institute of Medicinal and Pharmaceutical Chemistry, Beethovenstraße 55, Braunschweig 38106, Germany
- Christin Scheller
  - Technische Universität Braunschweig, Institute of Medicinal and Pharmaceutical Chemistry, Beethovenstraße 55, Braunschweig 38106, Germany
- Holger Zagst
  - Technische Universität Braunschweig, Institute of Medicinal and Pharmaceutical Chemistry, Beethovenstraße 55, Braunschweig 38106, Germany
|
17
|
Durairaj J, Melillo E, Bouwmeester HJ, Beekwilder J, de Ridder D, van Dijk ADJ. Integrating structure-based machine learning and co-evolution to investigate specificity in plant sesquiterpene synthases. PLoS Comput Biol 2021; 17:e1008197. [PMID: 33750949 PMCID: PMC8016262 DOI: 10.1371/journal.pcbi.1008197] [Citation(s) in RCA: 10] [Impact Index Per Article: 3.3] [Received: 07/21/2020] [Revised: 04/01/2021] [Accepted: 02/15/2021] [Indexed: 12/19/2022]
Abstract
Sesquiterpene synthases (STSs) catalyze the formation of a large class of plant volatiles called sesquiterpenes. While thousands of putative STS sequences from diverse plant species are available, only a small number of them have been functionally characterized. Sequence identity-based screening for desired enzymes, often used in biotechnological applications, is difficult to apply here as STS sequence similarity is strongly affected by species. This calls for more sophisticated computational methods for functionality prediction. We investigate the specificity of precursor cation formation in these elusive enzymes. By inspecting multi-product STSs, we demonstrate that STSs have a strong selectivity towards one precursor cation. We use a machine learning approach combining sequence and structure information to accurately predict precursor cation specificity for STSs across all plant species. We combine this with a co-evolutionary analysis on the wealth of uncharacterized putative STS sequences, to pinpoint residues and distant functional contacts influencing cation formation and reaction pathway selection. These structural factors can be used to predict and engineer enzymes with specific functions, as we demonstrate by predicting and characterizing two novel STSs from Citrus bergamia.
Affiliation(s)
- Janani Durairaj
  - Bioinformatics Group, Department of Plant Sciences, Wageningen University and Research, Wageningen, The Netherlands
- Harro J. Bouwmeester
  - Swammerdam Institute for Life Sciences, University of Amsterdam, Amsterdam, The Netherlands
- Jules Beekwilder
  - Bioscience, Wageningen Plant Research, Wageningen University and Research, Wageningen, The Netherlands
  - Laboratory of Plant Physiology, Department of Plant Sciences, Wageningen University and Research, Wageningen, The Netherlands
- Dick de Ridder
  - Bioinformatics Group, Department of Plant Sciences, Wageningen University and Research, Wageningen, The Netherlands
- Aalt D. J. van Dijk
  - Bioinformatics Group, Department of Plant Sciences, Wageningen University and Research, Wageningen, The Netherlands
  - Biometris, Department of Plant Sciences, Wageningen University and Research, Wageningen, The Netherlands
|
18
|
Strokach A, Lu TY, Kim PM. ELASPIC2 (EL2): Combining Contextualized Language Models and Graph Neural Networks to Predict Effects of Mutations. J Mol Biol 2021; 433:166810. [PMID: 33450251 DOI: 10.1016/j.jmb.2021.166810] [Citation(s) in RCA: 15] [Impact Index Per Article: 5.0] [Received: 10/26/2020] [Revised: 12/19/2020] [Accepted: 01/03/2021] [Indexed: 12/21/2022]
Abstract
The ELASPIC web server allows users to evaluate the effect of mutations on protein folding and protein-protein interaction on a proteome-wide scale. It uses homology models of proteins and protein-protein interactions, which have been precalculated for several proteomes, and machine learning models, which integrate structural information with sequence conservation scores, in order to make its predictions. Since the original publication of the ELASPIC web server, several advances have motivated a revisiting of the problem of mutation effect prediction. First, progress in neural network architectures and self-supervised pre-training has resulted in models which provide more informative embeddings of protein sequence and structure than those used by the original version of ELASPIC. Second, the amount of training data has increased several-fold, largely driven by advances in deep mutational scanning and other multiplexed assays of variant effect. Here, we describe two machine learning models which leverage these recent advances in order to achieve superior accuracy in predicting the effect of mutation on protein folding and protein-protein interaction. The models incorporate features generated using pre-trained transformer- and graph convolution-based neural networks, and are trained to optimize a ranking objective function, which permits the use of heterogeneous training data. The outputs from the new models have been incorporated into the ELASPIC web server, available at http://elaspic.kimlab.org.
Affiliation(s)
- Alexey Strokach
  - Department of Computer Science, University of Toronto, Toronto, ON M5S 3E1, Canada
- Tian Yu Lu
  - Department of Computer Science, University of Toronto, Toronto, ON M5S 3E1, Canada
- Philip M Kim
  - Department of Computer Science, University of Toronto, Toronto, ON M5S 3E1, Canada; Donnelly Centre for Cellular and Biomolecular Research, University of Toronto, Toronto, ON M5S 3E1, Canada; Department of Molecular Genetics, University of Toronto, Toronto, ON M5S 3E1, Canada
|
19
|
Hameduh T, Haddad Y, Adam V, Heger Z. Homology modeling in the time of collective and artificial intelligence. Comput Struct Biotechnol J 2020; 18:3494-3506. [PMID: 33304450 PMCID: PMC7695898 DOI: 10.1016/j.csbj.2020.11.007] [Citation(s) in RCA: 53] [Impact Index Per Article: 13.3] [Received: 08/14/2020] [Revised: 11/04/2020] [Accepted: 11/04/2020] [Indexed: 12/12/2022]
Abstract
Homology modeling is a method for building protein 3D structures from the protein primary sequence, utilizing prior knowledge gained from structural similarities with other proteins. The homology modeling process is done in sequential steps: the sequence/structure alignment is optimized, then a backbone is built, and later side-chains are added. Once the low-homology loops are modeled, the whole 3D structure is optimized and validated. In the past three decades, a few collective and collaborative initiatives allowed for continuous progress in both homology and ab initio modeling. Critical Assessment of protein Structure Prediction (CASP) is a worldwide community experiment that has historically recorded the progress in this field. Folding@Home and Rosetta@Home are examples of crowd-sourcing initiatives where the community shares computational resources, whereas RosettaCommons is an example of an initiative where a community shares a codebase for the development of computational algorithms. Foldit is another initiative, in which participants compete with each other in a protein folding video game to predict 3D structure. In the past few years, deep machine learning of contact maps was introduced to the 3D structure prediction process, adding more information and increasing the accuracy of models significantly. In this review, we take the reader on a journey of exploration from the beginnings to the most recent turnabouts, which have revolutionized the field of homology modeling. Moreover, we discuss the new trends emerging in this rapidly growing field.
Affiliation(s)
- Tareq Hameduh
  - Department of Chemistry and Biochemistry, Mendel University in Brno, Zemedelska 1, CZ-613 00 Brno, Czech Republic
- Yazan Haddad
  - Department of Chemistry and Biochemistry, Mendel University in Brno, Zemedelska 1, CZ-613 00 Brno, Czech Republic
  - Central European Institute of Technology, Brno University of Technology, Purkynova 656/123, 612 00 Brno, Czech Republic
- Vojtech Adam
  - Department of Chemistry and Biochemistry, Mendel University in Brno, Zemedelska 1, CZ-613 00 Brno, Czech Republic
  - Central European Institute of Technology, Brno University of Technology, Purkynova 656/123, 612 00 Brno, Czech Republic
- Zbynek Heger
  - Department of Chemistry and Biochemistry, Mendel University in Brno, Zemedelska 1, CZ-613 00 Brno, Czech Republic
  - Central European Institute of Technology, Brno University of Technology, Purkynova 656/123, 612 00 Brno, Czech Republic
|
20
|
Huang X, Zheng W, Pearce R, Zhang Y. SSIPe: accurately estimating protein-protein binding affinity change upon mutations using evolutionary profiles in combination with an optimized physical energy function. Bioinformatics 2020; 36:2429-2437. [PMID: 31830252 DOI: 10.1093/bioinformatics/btz926] [Citation(s) in RCA: 29] [Impact Index Per Article: 7.3] [Received: 07/27/2019] [Revised: 11/08/2019] [Accepted: 12/09/2019] [Indexed: 11/13/2022]
Abstract
MOTIVATION: Most proteins perform their biological functions through interactions with other proteins in cells. Amino acid mutations, especially those occurring at protein interfaces, can change the stability of protein-protein interactions (PPIs) and impact their functions, which may cause various human diseases. Quantitative estimation of the binding affinity changes (ΔΔGbind) caused by mutations can provide critical information for protein function annotation and genetic disease diagnoses. RESULTS: We present SSIPe, which combines protein interface profiles, collected from structural and sequence homology searches, with a physics-based energy function for accurate ΔΔGbind estimation. To offset the statistical limits of the PPI structure and sequence databases, amino acid-specific pseudocounts were introduced to enhance the profile accuracy. SSIPe was evaluated on large-scale experimental data containing 2204 mutations from 177 proteins, where training and test datasets were stringently separated with the sequence identity between proteins from the two datasets below 30%. The Pearson correlation coefficient between estimated and experimental ΔΔGbind was 0.61 with a root-mean-square-error of 1.93 kcal/mol, which was significantly better than the other methods. Detailed data analyses revealed that the major advantage of SSIPe over other traditional approaches lies in the novel combination of the physical energy function with the new knowledge-based interface profile. SSIPe also considerably outperformed a former profile-based method (BindProfX) due to the newly introduced sequence profiles and optimized pseudocount technique that allows for consideration of amino acid-specific prior mutation probabilities. AVAILABILITY AND IMPLEMENTATION: Web-server/standalone program, source code and datasets are freely available at https://zhanglab.ccmb.med.umich.edu/SSIPe and https://github.com/tommyhuangthu/SSIPe.
SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Wei Zheng
  - Department of Computational Medicine and Bioinformatics
- Robin Pearce
  - Department of Computational Medicine and Bioinformatics
- Yang Zhang
  - Department of Computational Medicine and Bioinformatics; Department of Biological Chemistry, University of Michigan, Ann Arbor, MI 48109, USA
|
21
|
Meseguer A, Dominguez L, Bota PM, Aguirre‐Plans J, Bonet J, Fernandez‐Fuentes N, Oliva B. Using collections of structural models to predict changes of binding affinity caused by mutations in protein-protein interactions. Protein Sci 2020; 29:2112-2130. [PMID: 32797645 PMCID: PMC7513729 DOI: 10.1002/pro.3930] [Citation(s) in RCA: 12] [Impact Index Per Article: 3.0] [Received: 04/27/2020] [Revised: 08/04/2020] [Accepted: 08/05/2020] [Indexed: 12/24/2022]
Abstract
Protein-protein interactions (PPIs) are involved in all the molecular aspects that take place both inside and outside cells. However, determining the structure and affinity of PPIs experimentally is expensive and time-consuming. Therefore, the development of computational tools, as a complement to experimental methods, is fundamental. Here, we present a computational suite, MODPIN, to model and predict the changes of binding affinity of PPIs. In this approach we use homology modeling to derive the structures of PPIs and score them using state-of-the-art scoring functions. We explore the conformational space of PPIs by generating not a single structural model but a collection of structural models with different conformations based on several templates. We apply the approach to predict the changes in free energy upon mutations and splicing variants in large datasets of PPIs to statistically quantify the quality and accuracy of the predictions. As an example, we use MODPIN to study the effect of mutations in the interaction between colicin endonuclease 9 and colicin endonuclease 2 immune protein from Escherichia coli. Finally, we have compared our results with other state-of-the-art methods.
Affiliation(s)
- Alberto Meseguer
- Structural Bioinformatics Group, Research Programme on Biomedical Informatics, Department of Experimental and Health Science, Universitat Pompeu Fabra, Barcelona, Catalonia, Spain
| | - Lluis Dominguez
- Integrative Biomedical Informatics Group (GRIB‐IMIM). Department of Experimental and Life Sciences, Universitat Pompeu Fabra, Barcelona, Catalonia, Spain
| | - Patricia M. Bota
- Structural Bioinformatics Group, Research Programme on Biomedical Informatics, Department of Experimental and Health Science, Universitat Pompeu Fabra, Barcelona, Catalonia, Spain
- Department of Biosciences, Universitat de Vic‐Universitat Central de Catalunya, Vic, Catalonia, Spain
| | - Joaquim Aguirre‐Plans
- Structural Bioinformatics Group, Research Programme on Biomedical Informatics, Department of Experimental and Health Science, Universitat Pompeu Fabra, Barcelona, Catalonia, Spain
| | - Jaume Bonet
- Structural Bioinformatics Group, Research Programme on Biomedical Informatics, Department of Experimental and Health Science, Universitat Pompeu Fabra, Barcelona, Catalonia, Spain
| | - Narcis Fernandez‐Fuentes
- Department of Biosciences, Universitat de Vic‐Universitat Central de Catalunya, Vic, Catalonia, Spain
- Institute of Biological, Environmental and Rural Sciences, Aberystwyth University, Aberystwyth, UK
| | - Baldo Oliva
- Structural Bioinformatics Group, Research Programme on Biomedical Informatics, Department of Experimental and Health Science, Universitat Pompeu Fabra, Barcelona, Catalonia, Spain
| |
|
22
|
Jemimah S, Sekijima M, Gromiha MM. ProAffiMuSeq: sequence-based method to predict the binding free energy change of protein-protein complexes upon mutation using functional classification. Bioinformatics 2020; 36:1725-1730. [PMID: 31713585 DOI: 10.1093/bioinformatics/btz829] [Citation(s) in RCA: 14] [Impact Index Per Article: 3.5] [Received: 07/17/2019] [Revised: 10/23/2019] [Accepted: 11/11/2019] [Indexed: 12/20/2022] Open
Abstract
MOTIVATION Protein-protein interactions are essential for the cell and mediate various functions. However, mutations can disrupt these interactions and may cause diseases. Currently available computational methods require a complex structure as input for predicting the change in binding affinity. Further, they have not included the functional class information for the protein-protein complex. To address this, we have developed a method, ProAffiMuSeq, which predicts the change in binding free energy using sequence-based features and functional class. RESULTS Our method shows an average correlation between predicted and experimentally determined ΔΔG of 0.73 and mean absolute error (MAE) of 0.86 kcal/mol in 10-fold cross-validation and correlation of 0.75 with MAE of 0.94 kcal/mol in the test dataset. ProAffiMuSeq was also tested on an external validation set and showed results comparable to structure-based methods. Our method can be used for large-scale analysis of disease-causing mutations in protein-protein complexes without structural information. AVAILABILITY AND IMPLEMENTATION Users can access the method at https://web.iitm.ac.in/bioinfo2/proaffimuseq/. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Sherlyn Jemimah
- Department of Biotechnology, Bhupat and Jyoti Mehta School of Biosciences, Indian Institute of Technology Madras, Chennai 600036, India
| | - Masakazu Sekijima
- Advanced Computational Drug Discovery Unit, Tokyo Institute of Technology, Midori-ku, Kanagawa 226-8503, Yokohama, Japan
| | - M Michael Gromiha
- Department of Biotechnology, Bhupat and Jyoti Mehta School of Biosciences, Indian Institute of Technology Madras, Chennai 600036, India.,Advanced Computational Drug Discovery Unit, Tokyo Tech World Research Hub Initiative (WRHI), Institute of Innovative Research, Tokyo Institute of Technology, Midori-ku, Kanagawa 226-8503, Yokohama, Japan
| |
|
23
|
Gado JE, Beckham GT, Payne CM. Improving Enzyme Optimum Temperature Prediction with Resampling Strategies and Ensemble Learning. J Chem Inf Model 2020; 60:4098-4107. [DOI: 10.1021/acs.jcim.0c00489] [Citation(s) in RCA: 10] [Impact Index Per Article: 2.5] [Indexed: 01/24/2023]
Affiliation(s)
- Japheth E. Gado
- Department of Chemical and Materials Engineering, University of Kentucky, Lexington, Kentucky 40506, United States
- National Bioenergy Center, National Renewable Energy Laboratory, Golden, Colorado 80401, United States
| | - Gregg T. Beckham
- National Bioenergy Center, National Renewable Energy Laboratory, Golden, Colorado 80401, United States
| | - Christina M. Payne
- Department of Chemical and Materials Engineering, University of Kentucky, Lexington, Kentucky 40506, United States
| |
|
24
|
Marabotti A, Scafuri B, Facchiano A. Predicting the stability of mutant proteins by computational approaches: an overview. Brief Bioinform 2020; 22:5850907. [PMID: 32496523 DOI: 10.1093/bib/bbaa074] [Citation(s) in RCA: 29] [Impact Index Per Article: 7.3] [Received: 02/12/2020] [Revised: 04/07/2020] [Accepted: 04/10/2020] [Indexed: 01/06/2023] Open
Abstract
A very large number of computational methods to predict the change in thermodynamic stability of proteins due to mutations have been developed during the last 30 years, and many different web servers are currently available. Nevertheless, most of them suffer from severe drawbacks that decrease their general reliability and, consequently, their applicability to different goals such as protein engineering or the predictions of the effects of mutations in genetic diseases. In this review, we have summarized all the main approaches used to develop these tools, with a survey of the web servers currently available. Moreover, we have also reviewed the different assessments made during the years, in order to allow the reader to check directly the different performances of these tools, to select the one that best fits his/her needs, and to help naïve users in finding the best option for their needs.
|
25
|
Broom A, Trainor K, Jacobi Z, Meiering EM. Computational Modeling of Protein Stability: Quantitative Analysis Reveals Solutions to Pervasive Problems. Structure 2020; 28:717-726.e3. [DOI: 10.1016/j.str.2020.04.003] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.0] [Received: 12/22/2019] [Revised: 03/26/2020] [Accepted: 04/06/2020] [Indexed: 12/20/2022]
|
26
|
Akdel M, Durairaj J, de Ridder D, van Dijk ADJ. Caretta - A multiple protein structure alignment and feature extraction suite. Comput Struct Biotechnol J 2020; 18:981-992. [PMID: 32368333 PMCID: PMC7186369 DOI: 10.1016/j.csbj.2020.03.011] [Citation(s) in RCA: 15] [Impact Index Per Article: 3.8] [Received: 10/14/2019] [Revised: 02/01/2020] [Accepted: 03/13/2020] [Indexed: 02/06/2023] Open
Abstract
The vast number of protein structures currently available opens exciting opportunities for machine learning on proteins, aimed at predicting and understanding functional properties. In particular, in combination with homology modelling, it is now possible to not only use sequence features as input for machine learning, but also structure features. However, in order to do so, robust multiple structure alignments are imperative. Here we present Caretta, a multiple structure alignment suite meant for homologous but sequentially divergent protein families which consistently returns accurate alignments with a higher coverage than current state-of-the-art tools. Caretta is available as a GUI and command-line application and additionally outputs an aligned structure feature matrix for a given set of input structures, which can readily be used in downstream steps for supervised or unsupervised machine learning. We show Caretta’s performance on two benchmark datasets, and present an example application of Caretta in predicting the conformational state of cyclin-dependent kinases.
Affiliation(s)
- Mehmet Akdel
- Bioinformatics Group, Department of Plant Sciences, Wageningen University and Research, The Netherlands
| | - Janani Durairaj
- Bioinformatics Group, Department of Plant Sciences, Wageningen University and Research, The Netherlands
| | - Dick de Ridder
- Bioinformatics Group, Department of Plant Sciences, Wageningen University and Research, The Netherlands
| | - Aalt D J van Dijk
- Bioinformatics Group, Department of Plant Sciences, Wageningen University and Research, The Netherlands.,Mathematical and Statistical Methods - Biometris, Department of Plant Sciences, Wageningen University and Research, The Netherlands
| |
|
27
|
Huang P, Chu SKS, Frizzo HN, Connolly MP, Caster RW, Siegel JB. Evaluating Protein Engineering Thermostability Prediction Tools Using an Independently Generated Dataset. ACS OMEGA 2020; 5:6487-6493. [PMID: 32258884 PMCID: PMC7114132 DOI: 10.1021/acsomega.9b04105] [Citation(s) in RCA: 36] [Impact Index Per Article: 9.0] [Received: 12/02/2019] [Accepted: 03/06/2020] [Indexed: 05/04/2023]
Abstract
Engineering proteins to enhance thermal stability is a widely utilized approach for creating industrially relevant biocatalysts. The development of new experimental datasets and computational tools to guide these engineering efforts remains an active area of research. Thus, to complement the previously reported measures of T50 and kinetic constants, we are reporting an expansion of our previously published dataset of mutants for β-glucosidase to include both measures of TM and ΔΔG. For a set of 51 mutants, we found that T50 and TM are moderately correlated, with a Pearson correlation coefficient and Spearman's rank coefficient of 0.58 and 0.47, respectively, indicating that the two methods capture different physical features. The performance of predicted stability using nine computational tools was also evaluated on the dataset of 51 mutants, none of which are found to be strong predictors of the observed changes in T50, TM, or ΔΔG. Furthermore, the ability of the nine algorithms to predict the production of isolatable soluble protein was examined, which revealed that Rosetta ΔΔG, FoldX, DeepDDG, PoPMuSiC, and SDM were capable of predicting if a mutant could be produced and isolated as a soluble protein. These results further highlight the need for new algorithms for predicting modest, yet important, changes in thermal stability as well as a new utility for current algorithms for prescreening designs for the production of mutants that maintain fold and soluble production properties.
Affiliation(s)
- Peishan Huang
- Biophysics Graduate Group, University of California, Davis 95616, California, United States
| | - Simon K. S. Chu
- Biophysics Graduate Group, University of California, Davis 95616, California, United States
| | - Henrique N. Frizzo
- Genome Center, University of California, Davis 95616, California, United States
| | - Morgan P. Connolly
- Microbiology Graduate Group, University of California, Davis 95616, California, United States
| | - Ryan W. Caster
- Genome Center, University of California, Davis 95616, California, United States
| | - Justin B. Siegel
- Genome Center, University of California, Davis 95616, California, United States
- Department of Biochemistry & Molecular Medicine, University of California, Davis 95616, California, United States
- Department of Chemistry, University of California, Davis 95616, California, United States
| |
|
28
|
Raja M, Kinne RKH. Mechanistic Insights into Protein Stability and Self-aggregation in GLUT1 Genetic Variants Causing GLUT1-Deficiency Syndrome. J Membr Biol 2020; 253:87-99. [PMID: 32025761 PMCID: PMC7150661 DOI: 10.1007/s00232-020-00108-3] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.0] [Received: 10/19/2019] [Accepted: 01/14/2020] [Indexed: 12/23/2022]
Abstract
Human sodium-independent glucose cotransporter 1 (hGLUT1) has been studied for its tetramerization and multimerization at the cell surface. Homozygous or compound heterozygous mutations in hGLUT1 elicit GLUT1-deficiency syndrome (GLUT1-DS), a metabolic disorder, which results in impaired glucose transport into the brain. Reduced cell surface expression or loss of function has been shown for some GLUT1 mutants. However, the mechanism by which deleterious mutations affect protein structure, conformational stability and GLUT1 oligomerization is not known and requires investigation. In this review, we combined previous knowledge of GLUT1 mutations with the hGLUT1 crystal structure to analyze native interactions and several natural single-point mutations. The modeling of the native hGLUT1 structure confirmed the roles of native residues in forming a range of side-chain interactions. Interestingly, the modeled mutants pointed to the formation of a variety of non-native novel interactions, altering interaction networks and potentially eliciting protein misfolding. Self-aggregation of the last part of hGLUT1 was predicted using a protein aggregation prediction tool. Furthermore, an increase in aggregation potential in the aggregation-prone regions was estimated for several mutants, suggesting increased aggregation of misfolded protein. Protein stability change analysis predicted that GLUT1 mutant proteins are unstable. Combining GLUT1 oligomerization behavior with our modeling, aggregation prediction, and protein stability analyses, this work provides a state-of-the-art view of GLUT1 genetic mutations that could destabilize native interactions, generate novel interactions, trigger protein misfolding, and enhance protein aggregation in a disease state.
Affiliation(s)
- Mobeen Raja
- Max Planck Institute of Molecular Physiology, Otto-Hahn-Strasse 11, 44227 Dortmund, Germany
- Algonquin College, 1385 Woodroffe Avenue, Ottawa, ON K2G 1V8 Canada
| | - Rolf K. H. Kinne
- Max Planck Institute of Molecular Physiology, Otto-Hahn-Strasse 11, 44227 Dortmund, Germany
| |
|
29
|
Galano-Frutos JJ, García-Cebollada H, Sancho J. Molecular dynamics simulations for genetic interpretation in protein coding regions: where we are, where to go and when. Brief Bioinform 2019; 22:3-19. [PMID: 31813950 DOI: 10.1093/bib/bbz146] [Citation(s) in RCA: 11] [Impact Index Per Article: 2.2] [Received: 07/23/2019] [Revised: 09/22/2019] [Accepted: 10/25/2019] [Indexed: 12/18/2022] Open
Abstract
The increasing ease with which massive genetic information can be obtained from patients or healthy individuals has stimulated the development of interpretive bioinformatics tools as aids in clinical practice. Most such tools analyze evolutionary information and simple physical-chemical properties to predict whether replacement of one amino acid residue with another will be tolerated or cause disease. Those approaches achieve up to 80-85% accuracy as binary classifiers (neutral/pathogenic). As such accuracy is insufficient to base medical decisions on, and it does not appear to be increasing, more precise methods, such as full-atom molecular dynamics (MD) simulations in explicit solvent, are also discussed. Then, to describe the goal of interpreting human genetic variations at large scale through MD simulations, we restrictively refer to all possible protein variants carrying single-amino-acid substitutions arising from single-nucleotide variations as the human variome. We calculate its size and develop a simple model that allows calculating the simulation time needed to have a 0.99 probability of observing unfolding events of any unstable variant. The knowledge of that time enables performing a binary classification of the variants (stable-potentially neutral/unstable-pathogenic). Our model indicates that the human variome cannot be simulated with present computing capabilities. However, if they continue to increase as per Moore's law, it could be simulated (at 65°C) spending only 3 years in the task if we started in 2031. The simulation of individual protein variomes is achievable in short times starting at present. International coordination seems appropriate to embark upon massive MD simulations of protein variants.
Affiliation(s)
- Juan J Galano-Frutos
- Protein Folding and Molecular Design (ProtMol) group at BIFI, University of Zaragoza
| | | | - Javier Sancho
- Protein Folding and Molecular Design (ProtMol) group at BIFI, University of Zaragoza
| |
|
30
|
Jankauskaite J, Jiménez-García B, Dapkunas J, Fernández-Recio J, Moal IH. SKEMPI 2.0: an updated benchmark of changes in protein-protein binding energy, kinetics and thermodynamics upon mutation. Bioinformatics 2019; 35:462-469. [PMID: 30020414 PMCID: PMC6361233 DOI: 10.1093/bioinformatics/bty635] [Citation(s) in RCA: 161] [Impact Index Per Article: 32.2] [Received: 05/16/2018] [Accepted: 07/17/2018] [Indexed: 11/18/2022] Open
Abstract
Motivation Understanding the relationship between the sequence, structure, binding energy, binding kinetics and binding thermodynamics of protein–protein interactions is crucial to understanding cellular signaling, the assembly and regulation of molecular complexes, the mechanisms through which mutations lead to disease, and protein engineering. Results We present SKEMPI 2.0, a major update to our database of binding free energy changes upon mutation for structurally resolved protein–protein interactions. This version now contains manually curated binding data for 7085 mutations, an increase of 133%, including changes in kinetics for 1844 mutations, enthalpy and entropy changes for 443 mutations, and 440 mutations that abolish detectable binding. Availability and implementation The database is available as supplementary data and at https://life.bsc.es/pid/skempi2/. Supplementary information Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Justina Jankauskaite
- Institute of Biotechnology, Life Sciences Center, Vilnius University, Vilnius, Lithuania
| | - Brian Jiménez-García
- Barcelona Supercomputing Center (BSC), Barcelona, Spain.,Bijvoet Center for Biomolecular Research, Faculty of Science, Utrecht University, Utrecht, the Netherlands
| | - Justas Dapkunas
- Institute of Biotechnology, Life Sciences Center, Vilnius University, Vilnius, Lithuania
| | - Juan Fernández-Recio
- Barcelona Supercomputing Center (BSC), Barcelona, Spain.,Institut de Biologia Molecular de Barcelona (IBMB), CSIC, Barcelona, Spain
| | - Iain H Moal
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Hinxton, Cambridge, UK
| |
|
31
|
Kuenze G, Duran AM, Woods H, Brewer KR, McDonald EF, Vanoye CG, George AL, Sanders CR, Meiler J. Upgraded molecular models of the human KCNQ1 potassium channel. PLoS One 2019; 14:e0220415. [PMID: 31518351 PMCID: PMC6743773 DOI: 10.1371/journal.pone.0220415] [Citation(s) in RCA: 24] [Impact Index Per Article: 4.8] [Received: 05/16/2019] [Accepted: 07/15/2019] [Indexed: 11/29/2022] Open
Abstract
The voltage-gated potassium channel KCNQ1 (KV7.1) assembles with the KCNE1 accessory protein to generate the slow delayed rectifier current, IKS, which is critical for membrane repolarization as part of the cardiac action potential. Loss-of-function (LOF) mutations in KCNQ1 are the most common cause of congenital long QT syndrome (LQTS), type 1 LQTS, an inherited genetic predisposition to cardiac arrhythmia and sudden cardiac death. A detailed structural understanding of KCNQ1 is needed to elucidate the molecular basis for KCNQ1 LOF in disease and to enable structure-guided design of new anti-arrhythmic drugs. In this work, advanced structural models of human KCNQ1 in the resting/closed and activated/open states were developed by Rosetta homology modeling guided by newly available experimentally-based templates: X. laevis KCNQ1 and various resting voltage sensor structures. Using molecular dynamics (MD) simulations, the capacity of the models to describe experimentally established channel properties, including state-dependent voltage sensor gating charge interactions and pore conformations, PIP2 binding sites, and voltage sensor–pore domain interactions, was validated. Rosetta energy calculations were applied to assess the utility of each model in interpreting mutation-evoked KCNQ1 dysfunction by predicting the change in protein thermodynamic stability for 50 experimentally characterized KCNQ1 variants with mutations located in the voltage-sensing domain. Energetic destabilization was successfully predicted for folding-defective KCNQ1 LOF mutants whereas wild type-like mutants exhibited no significant energetic frustrations, which supports growing evidence that mutation-induced protein destabilization is an especially common cause of KCNQ1 dysfunction. The new KCNQ1 Rosetta models provide helpful tools in the study of the structural basis for KCNQ1 function and can be used to generate hypotheses to explain KCNQ1 dysfunction.
Affiliation(s)
- Georg Kuenze
- Center for Structural Biology, Vanderbilt University, Nashville, Tennessee, United States of America
- Department of Chemistry, Vanderbilt University, Nashville, Tennessee, United States of America
| | - Amanda M. Duran
- Center for Structural Biology, Vanderbilt University, Nashville, Tennessee, United States of America
- Department of Chemistry, Vanderbilt University, Nashville, Tennessee, United States of America
| | - Hope Woods
- Center for Structural Biology, Vanderbilt University, Nashville, Tennessee, United States of America
- Department of Chemistry, Vanderbilt University, Nashville, Tennessee, United States of America
| | - Kathryn R. Brewer
- Center for Structural Biology, Vanderbilt University, Nashville, Tennessee, United States of America
- Department of Biochemistry, Vanderbilt University, Nashville, Tennessee, United States of America
| | - Eli Fritz McDonald
- Center for Structural Biology, Vanderbilt University, Nashville, Tennessee, United States of America
- Department of Chemistry, Vanderbilt University, Nashville, Tennessee, United States of America
| | - Carlos G. Vanoye
- Department of Pharmacology, Northwestern University Feinberg School of Medicine, Chicago, Illinois, United States of America
| | - Alfred L. George
- Department of Pharmacology, Northwestern University Feinberg School of Medicine, Chicago, Illinois, United States of America
| | - Charles R. Sanders
- Center for Structural Biology, Vanderbilt University, Nashville, Tennessee, United States of America
- Department of Biochemistry, Vanderbilt University, Nashville, Tennessee, United States of America
| | - Jens Meiler
- Center for Structural Biology, Vanderbilt University, Nashville, Tennessee, United States of America
- Department of Chemistry, Vanderbilt University, Nashville, Tennessee, United States of America
- Department of Pharmacology, Vanderbilt University, Nashville, Tennessee, United States of America
| |
|
32
|
Savojardo C, Petrosino M, Babbi G, Bovo S, Corbi-Verge C, Casadio R, Fariselli P, Folkman L, Garg A, Karimi M, Katsonis P, Kim PM, Lichtarge O, Martelli PL, Pasquo A, Pal D, Shen Y, Strokach AV, Turina P, Zhou Y, Andreoletti G, Brenner S, Chiaraluce R, Consalvi V, Capriotti E. Evaluating the predictions of the protein stability change upon single amino acid substitutions for the FXN CAGI5 challenge. Hum Mutat 2019; 40:1392-1399. [PMID: 31209948 PMCID: PMC6744327 DOI: 10.1002/humu.23843] [Citation(s) in RCA: 15] [Impact Index Per Article: 3.0] [Received: 04/16/2019] [Revised: 06/02/2019] [Accepted: 06/09/2019] [Indexed: 12/31/2022]
Abstract
Frataxin (FXN) is a highly conserved protein found in prokaryotes and eukaryotes that is required for efficient regulation of cellular iron homeostasis. Experimental evidence associates amino acid substitutions of the FXN to Friedreich Ataxia, a neurodegenerative disorder. Recently, new thermodynamic experiments have been performed to study the impact of somatic variations identified in cancer tissues on protein stability. The Critical Assessment of Genome Interpretation (CAGI) data provider at the University of Rome measured the unfolding free energy of a set of variants (FXN challenge data set) with far-UV circular dichroism and intrinsic fluorescence spectra. These values have been used to calculate the change in unfolding free energy between the variant and wild-type proteins at zero concentration of denaturant (ΔΔG^H2O). The FXN challenge data set, composed of eight amino acid substitutions, was used to evaluate the performance of the current computational methods for predicting the ΔΔG^H2O value associated with the variants and to classify them as destabilizing and not destabilizing. For the fifth edition of CAGI, six independent research groups from Asia, Australia, Europe, and North America submitted 12 sets of predictions from different approaches. In this paper, we report the results of our assessment and discuss the limitations of the tested algorithms.
Affiliation(s)
- Castrense Savojardo
- Biocomputing Group, Department of Pharmacy and Biotechnology, University of Bologna, Italy
| | - Maria Petrosino
- Department of Biochemical Sciences “A. Rossi Fanelli”, Sapienza University of Roma, Roma, Italy
| | - Giulia Babbi
- Biocomputing Group, Department of Pharmacy and Biotechnology, University of Bologna, Italy
| | - Samuele Bovo
- Biocomputing Group, Department of Pharmacy and Biotechnology, University of Bologna, Italy
| | - Carles Corbi-Verge
- Donnelly Center for Cellular and Biomolecular Research, University of Toronto, 160 College St, Toronto, ON M5S 3E1, Canada
| | - Rita Casadio
- Biocomputing Group, Department of Pharmacy and Biotechnology, University of Bologna, Italy
- Institute of Biomembranes, Bioenergetics and Molecular Biotechnologies (IBIOM), Italian National Research Council (CNR), Bari, Italy
| | - Piero Fariselli
- Department of Medical Sciences University of Torino, 10126 Torino, Italy
| | - Lukas Folkman
- School of Information and Communication Technology, Griffith University, Parklands Dr, Southport, QLD 4222, Australia
| | - Aditi Garg
- Department of Computational and Data Sciences. Indian Institute of Science, Bengaluru 560 012, India
| | - Mostafa Karimi
- Department of Electrical and Computer Engineering, Texas A&M University, College Station, TX 77840, USA
| | - Panagiotis Katsonis
- Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, Texas 77030, USA
| | - Philip M. Kim
- Donnelly Center for Cellular and Biomolecular Research, University of Toronto, 160 College St, Toronto, ON M5S 3E1, Canada
- Department of Molecular Genetics, University of Toronto, 1 King’s College Cir, Toronto, ON M5S 1A8, Canada
- Department of Computer Science, University of Toronto, 214 College St, Toronto, ON M5T 3A1, Canada
| | - Olivier Lichtarge
- Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, Texas 77030, USA
- Department of Biochemistry & Molecular Biology, Baylor College of Medicine, Houston, Texas 77030, USA
- Department of Pharmacology, Baylor College of Medicine, Houston, Texas 77030, USA
- Computational and Integrative Biomedical Research Center, Baylor College of Medicine, Houston, Texas 77030, USA
| | - Pier Luigi Martelli
- Biocomputing Group, Department of Pharmacy and Biotechnology, University of Bologna, Italy
| | - Alessandra Pasquo
- ENEA CR Frascati, Diagnostics and Metrology Laboratory,FSN-TECFIS-DIM, Frascati, Italy
| | - Debnath Pal
- Department of Computational and Data Sciences. Indian Institute of Science, Bengaluru 560 012, India
| | - Yang Shen
- Department of Electrical and Computer Engineering, Texas A&M University, College Station, TX 77840, USA
| | - Alexey V. Strokach
- Department of Computer Science, University of Toronto, 214 College St, Toronto, ON M5T 3A1, Canada
| | - Paola Turina
- Department of Pharmacy and Biotechnology, University of Bologna, 40126 Bologna, Italy
| | - Yaoqi Zhou
- School of Information and Communication Technology, Griffith University, Parklands Dr, Southport, QLD 4222, Australia
- Institute for Glycomics, Griffith University, Parklands Dr, Southport QLD 4222, Australia
| | - Gaia Andreoletti
- Department of Plant and Microbial Biology, University of California, Berkeley, CA 94720, USA
| | - Steven Brenner
- Department of Plant and Microbial Biology, University of California, Berkeley, CA 94720, USA
| | - Roberta Chiaraluce
- Department of Biochemical Sciences “A. Rossi Fanelli”, Sapienza University of Roma, Roma, Italy
| | - Valerio Consalvi
- Department of Biochemical Sciences “A. Rossi Fanelli”, Sapienza University of Roma, Roma, Italy
| | - Emidio Capriotti
- Department of Pharmacy and Biotechnology, University of Bologna, 40126 Bologna, Italy
| |
|
33
|
Strokach A, Corbi-Verge C, Kim PM. Predicting changes in protein stability caused by mutation using sequence-and structure-based methods in a CAGI5 blind challenge. Hum Mutat 2019; 40:1414-1423. [PMID: 31243847 PMCID: PMC6744338 DOI: 10.1002/humu.23852] [Citation(s) in RCA: 26] [Impact Index Per Article: 5.2] [Received: 01/29/2019] [Revised: 05/16/2019] [Accepted: 06/24/2019] [Indexed: 12/26/2022]
Abstract
Predicting the impact of mutations on proteins remains an important problem. As part of the CAGI5 frataxin challenge, we evaluate the accuracy with which Provean, FoldX, and ELASPIC can predict changes in the Gibbs free energy of a protein using a limited data set of eight mutations. We find that different methods have distinct strengths and limitations, with no method being strictly superior to other methods on all metrics. ELASPIC achieves the highest accuracy while also providing a web interface which simplifies the evaluation and analysis of mutations. FoldX is slightly less accurate than ELASPIC but is easier to run locally, as it does not depend on external tools or datasets. Provean achieves reasonable results while being computationally less expensive than the other methods and not requiring a structure of the protein. In addition to methods submitted to the CAGI5 community experiment, and with the aim to inform about other methods with high accuracy, we also evaluate predictions made by Rosetta's ddg_monomer protocol, Rosetta's cartesian_ddg protocol, and thermodynamic integration calculations using the Amber package. ELASPIC still achieves the highest accuracy, while Rosetta's cartesian_ddg protocol appears to perform best in capturing the overall trend in the data.
Affiliation(s)
- Alexey Strokach
- Department of Computer Science, University of Toronto, Toronto, Ontario, Canada
| | - Carles Corbi-Verge
- Donnelly Centre for Cellular and Biomolecular Research, Faculty of Medicine, University of Toronto, Toronto, Ontario, Canada
| | - Philip M Kim
- Department of Computer Science, University of Toronto, Toronto, Ontario, Canada
- Donnelly Centre for Cellular and Biomolecular Research, Faculty of Medicine, University of Toronto, Toronto, Ontario, Canada
- Department of Molecular Genetics, University of Toronto, Toronto, Ontario, Canada
| |
|
34
|
Framework Mutations of the 10-1074 bnAb Increase Conformational Stability, Manufacturability, and Stability While Preserving Full Neutralization Activity. J Pharm Sci 2019; 109:233-246. [PMID: 31348937 PMCID: PMC6941225 DOI: 10.1016/j.xphs.2019.07.009] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.6] [Received: 05/29/2019] [Revised: 07/08/2019] [Accepted: 07/17/2019] [Indexed: 01/06/2023]
Abstract
The broadly neutralizing anti-HIV antibody, 10-1074, is a highly somatically hypermutated IgG1 being developed for prophylaxis in sub-Saharan Africa. A series of algorithms were applied to identify potentially destabilizing residues in the framework of the Fv region. Of 17 residues defined, a variant was identified encompassing 1 light and 3 heavy chain residues, with significantly increased conformational stability while maintaining full neutralization activity. Central to the stabilization was the replacement of the heavy chain residue T108 with R108 at the base of the CDR3 loop which allowed for the formation of a nascent salt bridge with heavy chain residue D137. Three additional mutations were necessary to confer increased conformational stability as evidenced by differential scanning fluorimetry and isothermal chemical unfolding. In addition, we observed increased stability during low pH incubation in which 40% of the parental monomer aggregated while the combinatorial variant showed no increase in aggregation. Incubation of the variant at 100 mg/mL for 6 weeks at 40°C showed a 9-fold decrease in subvisible particles ≥2 μm relative to the parental molecule. Stability-based designs have also translated to improved pharmacokinetics. Together, these data show that increasing conformational stability of the Fab can have profound effects on the manufacturability and long-term stability of a monoclonal antibody.
|
35
|
Geng C, Xue LC, Roel‐Touris J, Bonvin AMJJ. Finding the ΔΔG spot: Are predictors of binding affinity changes upon mutations in protein–protein interactions ready for it? Wiley Interdisciplinary Reviews: Computational Molecular Science 2019. [DOI: 10.1002/wcms.1410] [Citation(s) in RCA: 43] [Impact Index Per Article: 8.6] [Indexed: 01/01/2023]
Affiliation(s)
- Cunliang Geng
- Bijvoet Center for Biomolecular Research, Faculty of Science - Chemistry, Utrecht University, Utrecht, The Netherlands
| | - Li C. Xue
- Bijvoet Center for Biomolecular Research, Faculty of Science - Chemistry, Utrecht University, Utrecht, The Netherlands
| | - Jorge Roel‐Touris
- Bijvoet Center for Biomolecular Research, Faculty of Science - Chemistry, Utrecht University, Utrecht, The Netherlands
| | - Alexandre M. J. J. Bonvin
- Bijvoet Center for Biomolecular Research, Faculty of Science - Chemistry, Utrecht University, Utrecht, The Netherlands
| |
|
36
|
Strokach A, Corbi-Verge C, Teyra J, Kim PM. Predicting the Effect of Mutations on Protein Folding and Protein-Protein Interactions. Methods Mol Biol 2019; 1851:1-17. [PMID: 30298389 DOI: 10.1007/978-1-4939-8736-8_1] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.8] [Indexed: 06/08/2023]
Abstract
The function of a protein is largely determined by its three-dimensional structure and its interactions with other proteins. Changes to a protein's amino acid sequence can alter its function by perturbing the energy landscapes of protein folding and binding. Many tools have been developed to predict the energetic effect of amino acid changes, utilizing features describing the sequence of a protein, the structure of a protein, or both. Those tools can have many applications, such as distinguishing between deleterious and benign mutations and designing proteins and peptides with attractive properties. In this chapter, we describe how to use one of such tools, ELASPIC, to predict the effect of mutations on the stability of proteins and the affinity between proteins, in the context of a human protein-protein interaction network. ELASPIC uses a wide range of sequential and structural features to predict the change in the Gibbs free energy for protein folding and protein-protein interactions. It can be used both through a web server and as a stand-alone application. Since ELASPIC was trained using homology models and not crystal structures, it can be applied to a much broader range of proteins than traditional methods. It can leverage precalculated sequence alignments, homology models, and other features, in order to drastically lower the amount of time required to evaluate individual mutations and make tractable the analysis of millions of mutations affecting the majority of proteins in a genome.
Affiliation(s)
- Alexey Strokach
- Department of Computer Science, University of Toronto, Toronto, ON, Canada
- Terrence Donnelly Centre for Cellular and Biomolecular Research, University of Toronto, Toronto, ON, Canada
| | - Carles Corbi-Verge
- Terrence Donnelly Centre for Cellular and Biomolecular Research, University of Toronto, Toronto, ON, Canada
| | - Joan Teyra
- Terrence Donnelly Centre for Cellular and Biomolecular Research, University of Toronto, Toronto, ON, Canada
| | - Philip M Kim
- Department of Computer Science, University of Toronto, Toronto, ON, Canada.
- Terrence Donnelly Centre for Cellular and Biomolecular Research, University of Toronto, Toronto, ON, Canada.
- Department of Molecular Genetics, University of Toronto, Toronto, ON, Canada.
| |
|
37
|
Geng C, Vangone A, Folkers GE, Xue LC, Bonvin AMJJ. iSEE: Interface structure, evolution, and energy-based machine learning predictor of binding affinity changes upon mutations. Proteins 2018; 87:110-119. [PMID: 30417935 PMCID: PMC6587874 DOI: 10.1002/prot.25630] [Citation(s) in RCA: 40] [Impact Index Per Article: 6.7] [Received: 05/31/2018] [Revised: 10/19/2018] [Accepted: 11/05/2018] [Indexed: 02/06/2023]
Abstract
Quantitative evaluation of binding affinity changes upon mutations is crucial for protein engineering and drug design. Machine learning‐based methods are gaining increasing momentum in this field. Due to the limited amount of experimental data, using a small number of sensitive predictive features is vital to the generalization and robustness of such machine learning methods. Here we introduce a fast and reliable predictor of binding affinity changes upon single point mutation, based on a random forest approach. Our method, iSEE, uses a limited number of interface Structure, Evolution, and Energy‐based features for the prediction. iSEE achieves, using only 31 features, a high prediction performance with a Pearson correlation coefficient (PCC) of 0.80 and a root mean square error of 1.41 kcal/mol on a diverse training dataset consisting of 1102 mutations in 57 protein‐protein complexes. It competes with existing state‐of‐the‐art methods on two blind test datasets. Predictions for a new dataset of 487 mutations in 56 protein complexes from the recently published SKEMPI 2.0 database reveal that none of the current methods perform well (PCC < 0.42), although their combination does improve the predictions. Feature analysis for iSEE underlines the significance of evolutionary conservation for quantitative prediction of mutation effects. As an application example, we perform a full mutation scanning of the interface residues in the MDM2–p53 complex.
Affiliation(s)
- Cunliang Geng
- Bijvoet Center for Biomolecular Research, Faculty of Science - Chemistry, Utrecht University, Utrecht, The Netherlands
| | - Anna Vangone
- Bijvoet Center for Biomolecular Research, Faculty of Science - Chemistry, Utrecht University, Utrecht, The Netherlands; Roche Pharmaceutical Research and Early Development, Large Molecule Research, Roche Innovation Center Penzberg, Penzberg, Germany
| | - Gert E Folkers
- Bijvoet Center for Biomolecular Research, Faculty of Science - Chemistry, Utrecht University, Utrecht, The Netherlands
| | - Li C Xue
- Bijvoet Center for Biomolecular Research, Faculty of Science - Chemistry, Utrecht University, Utrecht, The Netherlands
| | - Alexandre M J J Bonvin
- Bijvoet Center for Biomolecular Research, Faculty of Science - Chemistry, Utrecht University, Utrecht, The Netherlands
| |
|
38
|
A comparative analysis of KMT2D missense variants in Kabuki syndrome, cancers and the general population. J Hum Genet 2018; 64:161-170. [PMID: 30459467 DOI: 10.1038/s10038-018-0536-6] [Citation(s) in RCA: 22] [Impact Index Per Article: 3.7] [Received: 09/11/2018] [Revised: 10/10/2018] [Accepted: 10/19/2018] [Indexed: 12/21/2022]
Abstract
Determining the clinical significance of germline and somatic KMT2D missense variants (MVs) in Kabuki syndrome (KS) and cancers can be challenging. We analysed 1920 distinct KMT2D MVs that included 1535 germline MVs in controls (Control-MVs), 584 somatic MVs in cancers (Cancer-MVs) and 201 MVs in individuals with KS (KS-MVs). The proportion of MVs likely to affect splicing was significantly higher for Cancer-MVs and KS-MVs than for Control-MVs (p = 0.000018). Our analysis identified significant clustering of Cancer-MVs and KS-MVs in the PHD#3 and #4, RING#4 and SET domains. Areas of enrichment restricted to just Cancer-MVs (FYR-C and between amino acids 3043-3248) or KS-MVs (coiled-coil#5, FYR-N and between amino acids 4995-5090) were also found. Cancer-MVs and KS-MVs tended to affect more conserved residues (lower BLOSUM scores, p < 0.001 and p = 0.007). KS-MVs are more likely to increase the energy for protein folding (higher ELASPIC ∆∆G scores, p = 0.03). Cancer-MVs are more likely to disrupt protein interactions (higher StructMAn scores, p = 0.019). We reclassify several presumed pathogenic MVs as benign or as variants of uncertain significance. We raise the possibility of as yet unrecognised 'non-KS' phenotype(s) associated with some germline pathogenic KMT2D MVs. Overall, this work provides insights into the disease mechanism of KMT2D variants and can be extended to other genes, mutations in which also cause developmental syndromes and cancer.
|
39
|
Jetha A, Thorsteinson N, Jmeian Y, Jeganathan A, Giblin P, Fransson J. Homology modeling and structure-based design improve hydrophobic interaction chromatography behavior of integrin binding antibodies. MAbs 2018; 10:890-900. [PMID: 30110240 PMCID: PMC6152428 DOI: 10.1080/19420862.2018.1475871] [Citation(s) in RCA: 24] [Impact Index Per Article: 4.0] [Indexed: 12/27/2022] Open
Abstract
Monoclonal antibody (mAb) candidates from high-throughput screening or binding affinity optimization often contain mutations leading to liabilities for further development of the antibody, such as aggregation-prone regions and lack of solubility. In this work, we optimized a candidate integrin α11-binding mAb for developability using molecular modeling, rational design, and hydrophobic interaction chromatography (HIC). A homology model of the parental mAb Fv region was built, and this revealed hydrophobic patches on the surface of the complementarity-determining region loops. A series of 97 variants of the residues primarily responsible for the hydrophobic patches were expressed and their HIC retention times (RT) were measured. As intended, many of the computationally designed variants reduced the HIC RT compared to the parental mAb, and mutating residues that contributed most to hydrophobic patches had the greatest effect on HIC RT. A retrospective analysis was then performed where 3-dimensional protein property descriptors were evaluated for their ability to predict HIC RT using the current series of mAbs. The same descriptors were used to train a simple multi-parameter protein quantitative structure-property relationship model on these data, producing an improved correlation. We also extended this analysis to recently published HIC data for 137 clinical mAb candidates as well as 31 adnectin variants, and found that the surface area of hydrophobic patches averaged over a molecular dynamics sample consistently correlated with the experimental data across a diverse set of biotherapeutics.
Affiliation(s)
- Arif Jetha
- Department of Antibody Discovery and Development, Northern Biologics Inc., Toronto, Ontario, Canada
| | - Nels Thorsteinson
- Department of Scientific Services, Chemical Computing Group ULC, Montreal, Quebec, Canada
| | - Yazen Jmeian
- Department of Antibody Discovery and Development, Northern Biologics Inc., Toronto, Ontario, Canada
| | - Ajitha Jeganathan
- Department of Antibody Discovery and Development, Northern Biologics Inc., Toronto, Ontario, Canada
| | - Patricia Giblin
- Department of Antibody Discovery and Development, Northern Biologics Inc., Toronto, Ontario, Canada
| | - Johan Fransson
- Department of Antibody Discovery and Development, Northern Biologics Inc., Toronto, Ontario, Canada
| |
|
40
|
Taipale M. Disruption of protein function by pathogenic mutations: common and uncommon mechanisms. Biochem Cell Biol 2018; 97:46-57. [PMID: 29693415 DOI: 10.1139/bcb-2018-0007] [Citation(s) in RCA: 12] [Impact Index Per Article: 2.0] [Indexed: 12/25/2022] Open
Abstract
Mutations in protein-coding regions underlie almost all Mendelian disorders, drive tumorigenesis, and contribute to susceptibility to common diseases. Despite the great diversity of diseases that are caused by coding mutations, the cellular processes that affect, and are affected by, pathogenic variants at the molecular level are fundamentally conserved. Experimental and computational approaches have revealed that a substantial fraction of disease mutations are not simple loss-of-function alleles. Rather, these pathogenic variants disrupt protein function in more subtle ways by tuning protein folding pathways, altering subcellular trafficking, interrupting signaling cascades, and rewiring highly connected interaction networks. Focusing mainly on Mendelian disorders, this review discusses the common mechanisms by which deleterious mutations disrupt protein function and how these disruptions can be exploited in the development of novel therapies.
Affiliation(s)
- Mikko Taipale
- Donnelly Centre, Department of Molecular Genetics, University of Toronto, Toronto, ON M5S 3E1, Canada; Molecular Architecture of Life Program, Canadian Institute for Advanced Research, Toronto, ON M5S 1M1, Canada
| |
|
41
|
Nailwal M, Chauhan JB. Computational Analysis of High-Risk SNPs in Human DBY Gene Responsible for Male Infertility: A Functional and Structural Impact. Interdiscip Sci 2018. [PMID: 29520635 DOI: 10.1007/s12539-018-0290-7] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Indexed: 11/29/2022]
Abstract
BACKGROUND: DEAD-box helicase 3, Y-linked (DBY) is a candidate gene of the AZF region that is involved in spermatogenesis. Mutations in the DBY gene may disrupt spermatogenesis and lead to infertility in men. Distinguishing functionally neutral mutations from disease-causing mutations is the biggest challenge in human genetic variation analysis. Owing to the importance of DBY in male infertility, functional analysis was carried out to reveal the association between genetic mutation and phenotypic variation through various in silico approaches. METHODS: The present study analyzed the functional consequences of the nsSNPs in the human DBY gene using SIFT, PolyPhen 2, PROVEAN, SNAP2, PMut, nsSNPAnalyzer, PhD-SNP and SNPs&GO, along with stability analysis through I-Mutant2.0, MuPro and iPTREE-STAB. The conservation of amino acid residues, biophysical properties and conserved domains of the DBY protein were analyzed using various computational tools. The 3D structure of the protein was generated using SPARKS-X and validated using RAMPAGE. RESULTS: Out of 1130 SNPs reported in dbSNP, only one nsSNP (G300D) was found to have a functional effect on the stability as well as the function of the DBY protein. The results showed the presence of G300 in the putative structure of the DBY domain. CONCLUSION: To the best of our knowledge, this is the first study to detect a pathologically significant nsSNP (G300D) in DBY through a computational approach, which may be useful in drug discovery studies.
Affiliation(s)
- Mili Nailwal
- P.G. Department of Genetics, Ashok and Rita Patel Institute of Integrated Study and Research in Biotechnology and Allied Sciences (ARIBAS), New Vallabh Vidyanagar, Dist-Anand, Gujarat, 388121, India
| | - Jenabhai B Chauhan
- P.G. Department of Genetics, Ashok and Rita Patel Institute of Integrated Study and Research in Biotechnology and Allied Sciences (ARIBAS), New Vallabh Vidyanagar, Dist-Anand, Gujarat, 388121, India.
| |
|
42
|
Kampmeyer C, Nielsen SV, Clausen L, Stein A, Gerdes AM, Lindorff-Larsen K, Hartmann-Petersen R. Blocking protein quality control to counter hereditary cancers. Genes Chromosomes Cancer 2017; 56:823-831. [PMID: 28779490 DOI: 10.1002/gcc.22487] [Citation(s) in RCA: 19] [Impact Index Per Article: 2.7] [Received: 05/01/2017] [Revised: 07/29/2017] [Accepted: 08/01/2017] [Indexed: 12/28/2022] Open
Abstract
Inhibitors of molecular chaperones and the ubiquitin-proteasome system have already been clinically implemented to counter certain cancers, including multiple myeloma and mantle cell lymphoma. The efficacy of this treatment relies on genomic alterations in cancer cells causing a proteostatic imbalance, which makes them more dependent on protein quality control (PQC) mechanisms than normal cells. Accordingly, blocking PQC, e.g. by proteasome inhibitors, may cause a lethal proteotoxic crisis in cancer cells, while leaving normal cells unaffected. Evidence, however, suggests that the PQC system operates by following a better-safe-than-sorry principle and is thus prone to target proteins that are only slightly structurally perturbed, but still functional. Accordingly, implementing PQC inhibitors may also, through an entirely different mechanism, hold potential for other cancers. Several inherited cancer susceptibility syndromes, such as Lynch syndrome and von Hippel-Lindau disease, are caused by missense mutations in tumor suppressor genes, and in some cases, the resulting amino acid substitutions in the encoded proteins cause the cellular PQC system to target them for degradation, although they may still retain function. As a consequence of this over-meticulous PQC mechanism, the cell may end up with an insufficient amount of the abnormal, but functional, protein, which in turn leads to a loss-of-function phenotype and manifestation of the disease. Increasing the amounts of such proteins by stabilizing with chemical chaperones, or by targeting molecular chaperones or the ubiquitin-proteasome system, may thus avert or delay the disease onset. Here, we review the potential of targeting the PQC system in hereditary cancer susceptibility syndromes.
Affiliation(s)
- Caroline Kampmeyer
- Linderstrøm-Lang Centre for Protein Science, Department of Biology, University of Copenhagen, Ole Maaløes Vej 5, Copenhagen, DK-2200, Denmark
| | - Sofie V Nielsen
- Linderstrøm-Lang Centre for Protein Science, Department of Biology, University of Copenhagen, Ole Maaløes Vej 5, Copenhagen, DK-2200, Denmark
| | - Lene Clausen
- Linderstrøm-Lang Centre for Protein Science, Department of Biology, University of Copenhagen, Ole Maaløes Vej 5, Copenhagen, DK-2200, Denmark
| | - Amelie Stein
- Linderstrøm-Lang Centre for Protein Science, Department of Biology, University of Copenhagen, Ole Maaløes Vej 5, Copenhagen, DK-2200, Denmark
| | - Anne-Marie Gerdes
- Department of Clinical Genetics, Rigshospitalet, Blegdamsvej 9, Copenhagen, DK-2100, Denmark
| | - Kresten Lindorff-Larsen
- Linderstrøm-Lang Centre for Protein Science, Department of Biology, University of Copenhagen, Ole Maaløes Vej 5, Copenhagen, DK-2200, Denmark
| | - Rasmus Hartmann-Petersen
- Linderstrøm-Lang Centre for Protein Science, Department of Biology, University of Copenhagen, Ole Maaløes Vej 5, Copenhagen, DK-2200, Denmark
| |
|
43
|
Broom A, Jacobi Z, Trainor K, Meiering EM. Computational tools help improve protein stability but with a solubility tradeoff. J Biol Chem 2017; 292:14349-14361. [PMID: 28710274 DOI: 10.1074/jbc.m117.784165] [Citation(s) in RCA: 77] [Impact Index Per Article: 11.0] [Received: 03/03/2017] [Revised: 07/11/2017] [Indexed: 01/18/2023] Open
Abstract
Accurately predicting changes in protein stability upon amino acid substitution is a much sought after goal. Destabilizing mutations are often implicated in disease, whereas stabilizing mutations are of great value for industrial and therapeutic biotechnology. Increasing protein stability is an especially challenging task, with random substitution yielding stabilizing mutations in only ∼2% of cases. To overcome this bottleneck, computational tools that aim to predict the effect of mutations have been developed; however, achieving accuracy and consistency remains challenging. Here, we combined 11 freely available tools into a meta-predictor (meieringlab.uwaterloo.ca/stabilitypredict/). Validation against ∼600 experimental mutations indicated that our meta-predictor has improved performance over any of the individual tools. The meta-predictor was then used to recommend 10 mutations in a previously designed protein of moderate thermodynamic stability, ThreeFoil. Experimental characterization showed that four mutations increased protein stability and could be amplified through ThreeFoil's structural symmetry to yield several multiple mutants with >2-kcal/mol stabilization. By avoiding residues within functional sites, we could maintain ThreeFoil's glycan-binding capacity. Despite successfully achieving substantial stabilization, however, almost all mutations decreased protein solubility, the most common cause of protein design failure. Examination of the 600-mutation data set revealed that stabilizing mutations on the protein surface tend to increase hydrophobicity and that the individual tools favor this approach to gain stability. Thus, whereas currently available tools can increase protein stability and combining them into a meta-predictor yields enhanced reliability, improvements to the potentials/force fields underlying these tools are needed to avoid gaining protein stability at the cost of solubility.
Affiliation(s)
- Aron Broom
- Department of Chemistry, University of Waterloo, Waterloo, Ontario N2L 3G1, Canada
| | - Zachary Jacobi
- Department of Chemistry, University of Waterloo, Waterloo, Ontario N2L 3G1, Canada
| | - Kyle Trainor
- Department of Chemistry, University of Waterloo, Waterloo, Ontario N2L 3G1, Canada
| | | |
|
44
|
Cang Z, Wei GW. TopologyNet: Topology based deep convolutional and multi-task neural networks for biomolecular property predictions. PLoS Comput Biol 2017; 13:e1005690. [PMID: 28749969 PMCID: PMC5549771 DOI: 10.1371/journal.pcbi.1005690] [Citation(s) in RCA: 161] [Impact Index Per Article: 23.0] [Received: 04/06/2017] [Revised: 08/08/2017] [Accepted: 07/18/2017] [Indexed: 11/18/2022] Open
Abstract
Although deep learning approaches have had tremendous success in image, video and audio processing, computer vision, and speech recognition, their applications to three-dimensional (3D) biomolecular structural data sets have been hindered by the geometric and biological complexity. To address this problem we introduce the element-specific persistent homology (ESPH) method. ESPH represents 3D complex geometry by one-dimensional (1D) topological invariants and retains important biological information via a multichannel image-like representation. This representation reveals hidden structure-function relationships in biomolecules. We further integrate ESPH and deep convolutional neural networks to construct a multichannel topological neural network (TopologyNet) for the predictions of protein-ligand binding affinities and protein stability changes upon mutation. To overcome the deep learning limitations from small and noisy training sets, we propose a multi-task multichannel topological convolutional neural network (MM-TCNN). We demonstrate that TopologyNet outperforms the latest methods in the prediction of protein-ligand binding affinities, mutation induced globular protein folding free energy changes, and mutation induced membrane protein folding free energy changes. Availability: weilab.math.msu.edu/TDL/.
Affiliation(s)
- Zixuan Cang
- Department of Mathematics, Michigan State University, East Lansing, MI 48824, USA
| | - Guo-Wei Wei
- Department of Mathematics, Michigan State University, East Lansing, MI 48824, USA
- Department of Biochemistry and Molecular Biology, Michigan State University, East Lansing, MI 48824, USA
- Department of Electrical and Computer Engineering, Michigan State University, East Lansing, MI 48824, USA
| |
|
45
|
Steinbrecher T, Zhu C, Wang L, Abel R, Negron C, Pearlman D, Feyfant E, Duan J, Sherman W. Predicting the Effect of Amino Acid Single-Point Mutations on Protein Stability: Large-Scale Validation of MD-Based Relative Free Energy Calculations. J Mol Biol 2017; 429:948-963. [DOI: 10.1016/j.jmb.2016.12.007] [Citation(s) in RCA: 56] [Impact Index Per Article: 8.0] [Received: 08/09/2016] [Revised: 12/02/2016] [Accepted: 12/02/2016] [Indexed: 12/22/2022]
|
46
|
Li M, Goncearenco A, Panchenko AR. Annotating Mutational Effects on Proteins and Protein Interactions: Designing Novel and Revisiting Existing Protocols. Methods Mol Biol 2017; 1550:235-260. [PMID: 28188534 PMCID: PMC5388446 DOI: 10.1007/978-1-4939-6747-6_17] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.7] [Indexed: 05/02/2023]
Abstract
In this review we describe a protocol to annotate the effects of missense mutations on proteins, their functions, stability, and binding. For this purpose we present a collection of the most comprehensive databases that store different types of sequencing data on missense mutations, and we discuss their relationships, possible intersections, and unique features. Next, we suggest an annotation workflow using state-of-the-art methods and highlight their usability, advantages, and limitations for different cases. Finally, we address the particularly difficult problem of deciphering the molecular mechanisms of mutations on proteins and protein complexes to understand the origins and mechanisms of diseases.
Affiliation(s)
- Minghui Li
- National Center for Biotechnology Information, National Institutes of Health, Bethesda, MD, 20894, USA
| | - Alexander Goncearenco
- National Center for Biotechnology Information, National Institutes of Health, Bethesda, MD, 20894, USA
| | - Anna R Panchenko
- National Center for Biotechnology Information, National Institutes of Health, Bethesda, MD, 20894, USA.
| |
|
47
|
Gromiha MM, Yugandhar K, Jemimah S. Protein-protein interactions: scoring schemes and binding affinity. Curr Opin Struct Biol 2016; 44:31-38. [PMID: 27866112 DOI: 10.1016/j.sbi.2016.10.016] [Citation(s) in RCA: 80] [Impact Index Per Article: 10.0] [Received: 07/29/2016] [Revised: 09/30/2016] [Accepted: 10/25/2016] [Indexed: 01/16/2023]
Abstract
Protein-protein interactions mediate several cellular functions, which can be understood from the information obtained using the three-dimensional structures of protein-protein complexes and binding affinity data. This review focuses on computational aspects of predicting the best native-like complex structure and binding affinities. The first part covers the prediction of protein-protein complex structures and the advantages of conformational searching and scoring functions in protein-protein docking. The second part is devoted to various aspects of protein-protein interaction thermodynamics, such as databases for binding affinities and other thermodynamic parameters, computational methods to predict the binding affinity using either the three-dimensional structures of complexes or amino acid sequences, and change in binding affinities of the complexes upon mutations. We provide the latest developments on protein-protein docking and binding affinity studies along with a list of available computational resources for understanding protein-protein interactions.
Affiliation(s)
- M Michael Gromiha
- Department of Biotechnology, Bhupat and Jyoti Mehta School of Biosciences, Indian Institute of Technology Madras, Chennai 600036, Tamil Nadu, India.
| | - K Yugandhar
- Department of Biotechnology, Bhupat and Jyoti Mehta School of Biosciences, Indian Institute of Technology Madras, Chennai 600036, Tamil Nadu, India
| | - Sherlyn Jemimah
- Department of Biotechnology, Bhupat and Jyoti Mehta School of Biosciences, Indian Institute of Technology Madras, Chennai 600036, Tamil Nadu, India
| |
|
48
|
Kroncke BM, Duran AM, Mendenhall JL, Meiler J, Blume JD, Sanders CR. Documentation of an Imperative To Improve Methods for Predicting Membrane Protein Stability. Biochemistry 2016; 55:5002-9. [PMID: 27564391 PMCID: PMC5024705 DOI: 10.1021/acs.biochem.6b00537] [Citation(s) in RCA: 36] [Impact Index Per Article: 4.5] [Indexed: 12/21/2022]
Abstract
There is a compelling and growing need to accurately predict the impact of amino acid mutations on protein stability for problems in personalized medicine and other applications. Here the ability of 10 computational tools to accurately predict mutation-induced perturbation of folding stability (ΔΔG) for membrane proteins of known structure was assessed. All methods for predicting ΔΔG values performed significantly worse when applied to membrane proteins than when applied to soluble proteins, yielding estimated concordance, Pearson, and Spearman correlation coefficients of <0.4 for membrane proteins. Rosetta and PROVEAN showed a modest ability to classify mutations as destabilizing (ΔΔG < −0.5 kcal/mol), with a 7 in 10 chance of correctly discriminating a randomly chosen destabilizing variant from a randomly chosen stabilizing variant. However, even this performance is significantly worse than for soluble proteins. This study highlights the need for further development of reliable and reproducible methods for predicting thermodynamic folding stability in membrane proteins.
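The "7 in 10 chance" quoted in this abstract matches the pairwise reading of the area under the ROC curve (AUC): the fraction of (destabilizing, stabilizing) pairs in which the predictor ranks the destabilizing variant as more destabilizing. The sketch below is only an illustration of that reading, not the authors' analysis. The function name `pairwise_auc`, the convention that a higher score means "more destabilizing", and the scores themselves are assumptions made up for the example.

```python
from itertools import product


def pairwise_auc(scores_destabilizing, scores_stabilizing):
    """Fraction of (destabilizing, stabilizing) pairs ranked correctly.

    Assumption: a higher score means "predicted more destabilizing".
    Tied pairs count as half a correct ranking.
    """
    pairs = list(product(scores_destabilizing, scores_stabilizing))
    if not pairs:
        raise ValueError("need at least one variant in each class")
    wins = sum(1.0 if d > s else 0.5 if d == s else 0.0 for d, s in pairs)
    return wins / len(pairs)


# Hypothetical predictor scores, chosen only for illustration.
destabilizing = [0.9, 0.4, 0.7, 0.2, 0.6]
stabilizing = [0.5, 0.3]

auc = pairwise_auc(destabilizing, stabilizing)
print(f"pairwise AUC = {auc:.2f}")  # 7 of the 10 pairs are ranked correctly
```

An AUC of 0.5 corresponds to random ranking and 1.0 to perfect separation, so a value near 0.7 is consistent with the "modest ability" described above.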
Affiliation(s)
- Brett M Kroncke
- Department of Biochemistry, ‡Center for Structural Biology, §Departments of Chemistry, Pharmacology, and Bioinformatics, and ∥Department of Biostatistics, Vanderbilt University, Nashville, Tennessee 37240, United States
- Amanda M Duran
- Department of Biochemistry, ‡Center for Structural Biology, §Departments of Chemistry, Pharmacology, and Bioinformatics, and ∥Department of Biostatistics, Vanderbilt University, Nashville, Tennessee 37240, United States
- Jeffrey L Mendenhall
- Department of Biochemistry, ‡Center for Structural Biology, §Departments of Chemistry, Pharmacology, and Bioinformatics, and ∥Department of Biostatistics, Vanderbilt University, Nashville, Tennessee 37240, United States
- Jens Meiler
- Department of Biochemistry, ‡Center for Structural Biology, §Departments of Chemistry, Pharmacology, and Bioinformatics, and ∥Department of Biostatistics, Vanderbilt University, Nashville, Tennessee 37240, United States
- Jeffrey D Blume
- Department of Biochemistry, ‡Center for Structural Biology, §Departments of Chemistry, Pharmacology, and Bioinformatics, and ∥Department of Biostatistics, Vanderbilt University, Nashville, Tennessee 37240, United States
- Charles R Sanders
- Department of Biochemistry, ‡Center for Structural Biology, §Departments of Chemistry, Pharmacology, and Bioinformatics, and ∥Department of Biostatistics, Vanderbilt University, Nashville, Tennessee 37240, United States
|
49
|
Geng C, Vangone A, Bonvin AMJJ. Exploring the interplay between experimental methods and the performance of predictors of binding affinity change upon mutations in protein complexes. Protein Eng Des Sel 2016; 29:291-299. [PMID: 27284087 DOI: 10.1093/protein/gzw020] [Citation(s) in RCA: 20] [Impact Index Per Article: 2.5] [Received: 04/29/2016] [Accepted: 05/09/2016] [Indexed: 11/14/2022] Open
Abstract
Reliable prediction of binding affinity changes (ΔΔG) upon mutations in protein complexes relies not only on the performance of computational methods but also on the availability and quality of experimental data. Binding affinity changes can be measured by various experimental methods with different accuracies and limitations. To understand the impact of these on the prediction of binding affinity change, we present the Database of binding Affinity Change Upon Mutation (DACUM), a database of 1872 binding affinity changes upon single-point mutations, a subset of the SKEMPI database (Moal, I.H. and Fernández-Recio, J. Bioinformatics 2012;28:2600-2607) extended with information on the experimental methods used for ΔΔG measurements. The ΔΔG data were classified into different data sets based on the experimental method used and the position of the mutation (interface and non-interface). We tested the prediction performance of the original HADDOCK score, a newly trained version of it and mutation Cutoff Scanning Matrix (Pires, D.E.V., Ascher, D.B. and Blundell, T.L. Bioinformatics 2014;30:335-342), one of the best reported ΔΔG predictors so far, on these various data sets. Our results demonstrate a strong impact of the experimental methods on the performance of binding affinity change predictors for protein complexes. This underscores the importance of properly considering and carefully choosing experimental methods in the development of novel binding affinity change predictors. The DACUM database is available online at https://github.com/haddocking/DACUM.
Affiliation(s)
- Cunliang Geng
- Computational Structural Biology Group, Bijvoet Center for Biomolecular Research, Faculty of Science-Chemistry, Utrecht University, Padualaan 8, Utrecht 3584 CH, The Netherlands
- Anna Vangone
- Computational Structural Biology Group, Bijvoet Center for Biomolecular Research, Faculty of Science-Chemistry, Utrecht University, Padualaan 8, Utrecht 3584 CH, The Netherlands
- Alexandre M J J Bonvin
- Computational Structural Biology Group, Bijvoet Center for Biomolecular Research, Faculty of Science-Chemistry, Utrecht University, Padualaan 8, Utrecht 3584 CH, The Netherlands
|
50
|
Dourado DFAR, Flores SC. Modeling and fitting protein-protein complexes to predict change of binding energy. Sci Rep 2016; 6:25406. [PMID: 27173910 PMCID: PMC4865953 DOI: 10.1038/srep25406] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.4] [Received: 01/04/2016] [Accepted: 04/18/2016] [Indexed: 01/18/2023] Open
Abstract
It is possible to accurately and economically predict change in protein-protein interaction energy upon mutation (ΔΔG), when a high-resolution structure of the complex is available. This is of growing usefulness for design of high-affinity or otherwise modified binding proteins for therapeutic, diagnostic, industrial, and basic science applications. Recently the field has begun to pursue ΔΔG prediction for homology modeled complexes, but so far this has worked mostly for cases of high sequence identity. If the interacting proteins have been crystallized in free (uncomplexed) form, in a majority of cases it is possible to find a structurally similar complex which can be used as the basis for template-based modeling. We describe how to use MMB to create such models, and then use them to predict ΔΔG, using a dataset consisting of free target structures, co-crystallized template complexes with sequence identity with respect to the targets as low as 44%, and experimental ΔΔG measurements. We obtain similar results by fitting to a low-resolution cryo-EM density map. Results suggest that other structural constraints may lead to a similar outcome, making the method even more broadly applicable.
Affiliation(s)
- Daniel F A R Dourado
- Department of Cell and Molecular Biology, Computational and Systems Biology, Uppsala University, Biomedical Center Box 596, 751 24, Uppsala, Sweden
- Samuel Coulbourn Flores
- Department of Cell and Molecular Biology, Computational and Systems Biology, Uppsala University, Biomedical Center Box 596, 751 24, Uppsala, Sweden
|