1
Wang R, Zhang W, Ma H, Zou D, Zhang Z, Wang S. Structural insights into the binding of zoledronic acid with RANKL via computational simulations. Front Mol Biosci 2022; 9:992473. [PMID: 36200071 PMCID: PMC9527314 DOI: 10.3389/fmolb.2022.992473] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Received: 07/12/2022] [Accepted: 09/05/2022] [Indexed: 12/03/2022] [Open Access]
Abstract
Zoledronic acid (ZOL) inhibits receptor activator of nuclear factor-κB ligand (RANKL) and reduces bone turnover. This plays an important role in the development of bisphosphonate-related osteonecrosis of the jaw (BRONJ). Previous reports have shown that ZOL binds to the enzyme farnesyl pyrophosphate synthase (FPPS) to block its activity. However, the mechanism of action of ZOL and its interaction with RANKL remain unclear. In this study, we confirmed that ZOL significantly suppressed bone remodeling in ZOL-treated rats, investigated whether ZOL could bind to RANKL, and examined the interactions between these molecules at the atomic level. A surface plasmon resonance (SPR) assay validated that ZOL binds directly to RANKL in a dose-dependent manner, and the equilibrium dissociation constant was calculated (KD = 2.28 × 10⁻⁴ M). We then used molecular docking simulation to predict the binding site and analyze the binding characteristics of ZOL and RANKL. Through molecular dynamics simulation, we confirmed the stable binding between ZOL and RANKL and observed their dynamic interactions over time. Binding free energy calculations and their decomposition yielded a binding free energy of −70.67 ± 2.62 kJ/mol for the RANKL–ZOL complex. We identified the key residues of RANKL in the binding region: Tyr217(A), Val277(A), Gly278(A), Val277(B), Gly278(B), and Tyr215(C). Taken together, our results demonstrated the direct interaction between ZOL and RANKL, indicating that the pharmacological action of ZOL might be closely related to RANKL. The design of novel small molecules targeting RANKL might reduce the occurrence of BRONJ.
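As a rough cross-check of the two affinity figures in this abstract, the SPR constant can be converted to a standard binding free energy. The temperature (298.15 K) and the 1 M standard concentration c° are assumptions made here for illustration; the abstract states neither.

$$
\Delta G^{\circ} = RT \ln\frac{K_D}{c^{\circ}}
= \left(8.314\times10^{-3}\ \mathrm{kJ\,mol^{-1}\,K^{-1}}\right)\left(298.15\ \mathrm{K}\right)\ln\left(2.28\times10^{-4}\right)
\approx -20.8\ \mathrm{kJ\,mol^{-1}}
$$

Under these assumptions the SPR-derived value is smaller in magnitude than the simulated −70.67 kJ/mol. Free energies computed from simulation often differ from experimentally derived ones in absolute terms, so the two numbers are probably best compared qualitatively.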
Affiliation(s)
- Ruijie Wang
- Department of Oral Surgery, Shanghai Ninth People’s Hospital, Shanghai Jiao Tong University School of Medicine, Shanghai, China
- Shanghai Key Laboratory of Stomatology, Research Unit of Oral and Maxillofacial Regenerative Medicine, National Center for Stomatology, National Clinical Research Center for Oral Diseases, College of Stomatology, Chinese Academy of Medical Sciences, Shanghai Jiao Tong University, Shanghai, China
- Wenjie Zhang
- Shanghai Key Laboratory of Stomatology, Research Unit of Oral and Maxillofacial Regenerative Medicine, National Center for Stomatology, National Clinical Research Center for Oral Diseases, College of Stomatology, Chinese Academy of Medical Sciences, Shanghai Jiao Tong University, Shanghai, China
- Department of Prosthodontics, Shanghai Ninth People’s Hospital, Shanghai Jiao Tong University School of Medicine, Shanghai, China
- Hailong Ma
- Shanghai Key Laboratory of Stomatology, Research Unit of Oral and Maxillofacial Regenerative Medicine, National Center for Stomatology, National Clinical Research Center for Oral Diseases, College of Stomatology, Chinese Academy of Medical Sciences, Shanghai Jiao Tong University, Shanghai, China
- Department of Oral Maxillofacial-Head and Neck Oncology, Shanghai Ninth People’s Hospital, Shanghai Jiao Tong University School of Medicine, Shanghai, China
- Duohong Zou
- Department of Oral Surgery, Shanghai Ninth People’s Hospital, Shanghai Jiao Tong University School of Medicine, Shanghai, China
- Shanghai Key Laboratory of Stomatology, Research Unit of Oral and Maxillofacial Regenerative Medicine, National Center for Stomatology, National Clinical Research Center for Oral Diseases, College of Stomatology, Chinese Academy of Medical Sciences, Shanghai Jiao Tong University, Shanghai, China
- Zhiyuan Zhang
- Shanghai Key Laboratory of Stomatology, Research Unit of Oral and Maxillofacial Regenerative Medicine, National Center for Stomatology, National Clinical Research Center for Oral Diseases, College of Stomatology, Chinese Academy of Medical Sciences, Shanghai Jiao Tong University, Shanghai, China
- Department of Oral Maxillofacial-Head and Neck Oncology, Shanghai Ninth People’s Hospital, Shanghai Jiao Tong University School of Medicine, Shanghai, China
- *Correspondence: Zhiyuan Zhang; Shaoyi Wang
- Shaoyi Wang
- Department of Oral Surgery, Shanghai Ninth People’s Hospital, Shanghai Jiao Tong University School of Medicine, Shanghai, China
- Shanghai Key Laboratory of Stomatology, Research Unit of Oral and Maxillofacial Regenerative Medicine, National Center for Stomatology, National Clinical Research Center for Oral Diseases, College of Stomatology, Chinese Academy of Medical Sciences, Shanghai Jiao Tong University, Shanghai, China
- *Correspondence: Zhiyuan Zhang; Shaoyi Wang
2
Akhter N, Kabir KL, Chennupati G, Vangara R, Alexandrov BS, Djidjev H, Shehu A. Improved Protein Decoy Selection via Non-Negative Matrix Factorization. IEEE/ACM Transactions on Computational Biology and Bioinformatics 2022; 19:1670-1682. [PMID: 33400654 DOI: 10.1109/tcbb.2020.3049088] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/12/2023]
Abstract
A central challenge in protein modeling research and protein structure prediction in particular is known as decoy selection. The problem refers to selecting biologically-active/native tertiary structures among a multitude of physically-realistic structures generated by template-free protein structure prediction methods. Research on decoy selection is active. Clustering-based methods are popular, but they fail to identify good/near-native decoys on datasets where near-native decoys are severely under-sampled by a protein structure prediction method. Reasonable progress is reported by methods that additionally take into account the internal energy of a structure and employ it to identify basins in the energy landscape organizing the multitude of decoys. These methods, however, incur significant time costs for extracting basins from the landscape. In this paper, we propose a novel decoy selection method based on non-negative matrix factorization. We demonstrate that our method outperforms energy landscape-based methods. In particular, the proposed method addresses both the time cost issue and the challenge of identifying good decoys in a sparse dataset, successfully recognizing near-native decoys for both easy and hard protein targets.
3
Olechnovič K, Venclovas Č. VoroContacts: a tool for the analysis of interatomic contacts in macromolecular structures. Bioinformatics 2021; 37:4873-4875. [PMID: 34132767 DOI: 10.1093/bioinformatics/btab448] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 02/03/2021] [Revised: 05/03/2021] [Accepted: 06/14/2021] [Indexed: 11/12/2022] [Open Access]
Abstract
Summary: VoroContacts is a versatile tool for computing and analyzing contact surface areas (CSAs) and solvent accessible surface areas (SASAs) for 3D structures of proteins, nucleic acids and their complexes at atomic resolution. CSAs and SASAs are derived using Voronoi tessellation of the 3D structure, represented as a collection of atomic balls. The VoroContacts web server features a highly configurable query interface, which enables on-the-fly analysis of contacts for a selected set of atoms and allows filtering interatomic contacts by their type, surface areas, distance between contacting atoms and sequence separation between contacting residues. The VoroContacts functionality is also implemented as part of the standalone Voronota package, enabling batch processing. Availability and implementation: https://bioinformatics.lt/wtsam/vorocontacts. Supplementary information: Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Kliment Olechnovič
- Institute of Biotechnology, Life Sciences Center, Vilnius University, Saulėtekio 7, Vilnius, LT-10257, Lithuania
- Česlovas Venclovas
- Institute of Biotechnology, Life Sciences Center, Vilnius University, Saulėtekio 7, Vilnius, LT-10257, Lithuania
4
Alam FF, Shehu A. Unsupervised multi-instance learning for protein structure determination. J Bioinform Comput Biol 2021; 19:2140002. [PMID: 33568002 DOI: 10.1142/s0219720021400023] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Indexed: 11/18/2022]
Abstract
Many regions of the protein universe remain inaccessible by wet-laboratory or computational structure determination methods. A significant challenge in elucidating these dark regions in silico relates to the ability to discriminate relevant structure(s) among many structures/decoys computed for a protein of interest, a problem known as decoy selection. Clustering decoys based on geometric similarity remains popular. However, it is unclear how exactly to exploit the groups of decoys revealed via clustering to select individual structures for prediction. In this paper, we provide an intuitive formulation of the decoy selection problem as an instance of unsupervised multi-instance learning. We address the problem in three stages, first organizing given decoys of a protein molecule into bags, then identifying relevant bags, and finally drawing individual instances from these bags to offer as prediction. We propose both non-parametric and parametric algorithms for drawing individual instances. Our evaluation utilizes two datasets, one benchmark dataset of ensembles of decoys for a varied list of protein molecules, and a dataset of decoy ensembles for targets drawn from recent CASP competitions. A comparative analysis with state-of-the-art methods reveals that the proposed approach outperforms existing methods, thus warranting further investigation of multi-instance learning to advance our treatment of decoy selection.
Affiliation(s)
- Fardina Fathmiul Alam
- Department of Computer Science, George Mason University, Fairfax, Virginia 22030, USA
- Amarda Shehu
- Department of Computer Science, George Mason University, Fairfax, Virginia 22030, USA
5
Akhter N, Chennupati G, Djidjev H, Shehu A. Decoy selection for protein structure prediction via extreme gradient boosting and ranking. BMC Bioinformatics 2020; 21:189. [PMID: 33297949 PMCID: PMC7724862 DOI: 10.1186/s12859-020-3523-9] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.0] [Received: 04/15/2020] [Accepted: 04/29/2020] [Indexed: 11/10/2022] [Open Access]
Abstract
Background: Identifying one or more biologically-active/native decoys from millions of non-native decoys is one of the major challenges in computational structural biology. The extreme imbalance between positive and negative samples (native and non-native decoys) in a decoy set makes the problem even more complicated. Consensus methods show varied success in handling the challenge of decoy selection despite some issues associated with clustering large decoy sets and decoy sets that do not show much structural similarity. Recent investigations into energy landscape-based decoy selection approaches show promise. However, lack of generalization over varied test cases remains a bottleneck for these methods. Results: We propose a novel decoy selection method, ML-Select, a machine learning framework that exploits the energy landscape associated with the structure space probed through template-free decoy generation. The proposed method outperforms both clustering and energy ranking-based methods while consistently offering better performance on varied test cases. Moreover, ML-Select shows promising results even for decoy sets consisting mostly of low-quality decoys. Conclusions: ML-Select is a useful method for decoy selection. This work suggests further research into more effective ways to adopt machine learning frameworks for robust decoy selection in template-free protein structure prediction.
Affiliation(s)
- Nasrin Akhter
- Department of Computer Science, George Mason University, Fairfax, 22030, VA, USA
- Gopinath Chennupati
- Information Sciences (CCS-3) Group, Los Alamos National Laboratory, Bikini Atoll Rd., Los Alamos, 87545, USA
- Hristo Djidjev
- Information Sciences (CCS-3) Group, Los Alamos National Laboratory, Bikini Atoll Rd., Los Alamos, 87545, USA
- Amarda Shehu
- Department of Computer Science, George Mason University, Fairfax, 22030, VA, USA
- Department of Bioengineering, George Mason University, Fairfax, 22030, VA, USA
- School of Systems Biology, George Mason University, Manassas, 20110, VA, USA
6
Ribeiro J, Ríos-Vera C, Melo F, Schüller A. Calculation of accurate interatomic contact surface areas for the quantitative analysis of non-bonded molecular interactions. Bioinformatics 2020; 35:3499-3501. [PMID: 30698657 PMCID: PMC6748739 DOI: 10.1093/bioinformatics/btz062] [Citation(s) in RCA: 49] [Impact Index Per Article: 12.3] [Received: 08/09/2018] [Revised: 12/24/2018] [Accepted: 01/24/2019] [Indexed: 12/02/2022] [Open Access]
Abstract
Summary: Intra- and intermolecular contact surfaces are routinely calculated for a large array of applications in bioinformatics but are typically approximated from differential solvent accessible surface area calculations and not calculated directly. These approximations do not properly take the effects of neighboring atoms into account and tend to deviate considerably from the true contact surface. We implemented an extension of the original Shrake-Rupley algorithm to accurately estimate interatomic contact surface areas of molecular structures and complexes. Our extended algorithm is able to calculate the contact area of an atom to all nearby atoms by directly calculating overlapping surface patches, taking into account the possible shielding effects of neighboring atoms. Here, we present a versatile software tool and web server for the calculation of contact surface areas, as well as buried surface areas and solvent accessible surface areas (SASA) for different types of biomolecules, such as proteins, nucleic acids and small organic molecules. Detailed results are provided in tab-separated values format for analysis and Protein Databank files for visualization. Direct contact surface area calculation resulted in improved accuracy in a benchmark with a non-redundant set of 245 protein–DNA complexes. SASA-based approximations underestimated protein–DNA contact surfaces on average by 40%. This software tool may be useful for surface-based intra- and intermolecular interaction analyses and scoring function development. Availability and implementation: A web server, stand-alone binaries for Linux, MacOS and Windows and C++ source code are freely available from http://schuellerlab.org/dr_sasa/. Supplementary information: Supplementary data are available at Bioinformatics online.
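For orientation, the classical Shrake-Rupley procedure that this abstract builds on, together with the differential-SASA estimate of buried surface that it criticizes, can be sketched as follows. This is a minimal illustration only, not the dr_sasa implementation or its extended contact-patch algorithm; the function names, the 1.4 Å probe radius and the 960-point sphere are assumptions chosen for the example.

```python
import math

def sphere_points(n):
    """Roughly uniform points on the unit sphere (golden-spiral lattice)."""
    golden = math.pi * (3.0 - math.sqrt(5.0))
    pts = []
    for i in range(n):
        z = 1.0 - 2.0 * (i + 0.5) / n
        r = math.sqrt(max(0.0, 1.0 - z * z))
        phi = golden * i
        pts.append((r * math.cos(phi), r * math.sin(phi), z))
    return pts

def shrake_rupley(atoms, probe=1.4, n_points=960):
    """Per-atom SASA in Å^2. atoms: list of (x, y, z, radius) in Å.

    Hypothetical helper: test points are placed on each atom's
    probe-expanded sphere and counted as exposed when they fall outside
    every other atom's expanded sphere.
    """
    unit = sphere_points(n_points)
    areas = []
    for i, (xi, yi, zi, ri) in enumerate(atoms):
        big_r = ri + probe
        exposed = 0
        for ux, uy, uz in unit:
            px, py, pz = xi + big_r * ux, yi + big_r * uy, zi + big_r * uz
            buried = False
            for j, (xj, yj, zj, rj) in enumerate(atoms):
                if j == i:
                    continue
                cut = rj + probe
                if (px - xj) ** 2 + (py - yj) ** 2 + (pz - zj) ** 2 < cut * cut:
                    buried = True
                    break
            if not buried:
                exposed += 1
        areas.append(4.0 * math.pi * big_r * big_r * exposed / n_points)
    return areas

def buried_surface(group_a, group_b, **kw):
    """Differential-SASA estimate: SASA(A) + SASA(B) - SASA(A and B together)."""
    alone = sum(shrake_rupley(group_a, **kw)) + sum(shrake_rupley(group_b, **kw))
    together = sum(shrake_rupley(group_a + group_b, **kw))
    return alone - together

if __name__ == "__main__":
    a = [(0.0, 0.0, 0.0, 1.7)]
    b = [(3.0, 0.0, 0.0, 1.7)]
    print(buried_surface(a, b))
```

The differential estimate only reports how much surface disappears on complex formation; it does not say which atom pairs are in contact, which is the gap the direct contact-patch calculation described in the abstract is meant to close.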
Affiliation(s)
- Judemir Ribeiro
- Department of Molecular Genetics and Microbiology, School of Biological Sciences, Pontificia Universidad Católica de Chile, Santiago, Chile
- Carlos Ríos-Vera
- Department of Molecular Genetics and Microbiology, School of Biological Sciences, Pontificia Universidad Católica de Chile, Santiago, Chile
- Francisco Melo
- Department of Molecular Genetics and Microbiology, School of Biological Sciences, Pontificia Universidad Católica de Chile, Santiago, Chile
- Andreas Schüller
- Department of Molecular Genetics and Microbiology, School of Biological Sciences, Pontificia Universidad Católica de Chile, Santiago, Chile
7
Tadepalli S, Akhter N, Barbara D, Shehu A. Anomaly Detection-Based Recognition of Near-Native Protein Structures. IEEE Trans Nanobioscience 2020; 19:562-570. [PMID: 32340957 DOI: 10.1109/tnb.2020.2990642] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/07/2022]
Abstract
The three-dimensional structures populated by a protein molecule determine to a great extent its biological activities. The rich information encoded by protein structure on protein function continues to motivate the development of computational approaches for determining functionally-relevant structures. The majority of structures generated in silico are not relevant. Discriminating relevant/native protein structures from non-native ones is an outstanding challenge in computational structural biology. Inherently, this is a recognition problem that can be addressed under the umbrella of machine learning. In this paper, based on the premise that near-native structures are effectively anomalies, we build on the concept of anomaly detection in machine learning. We propose methods that automatically select relevant subsets, as well as methods that select a single structure to offer as prediction. Evaluations are carried out on benchmark datasets and demonstrate that the proposed methods advance the state of the art. The presented results motivate further building on and adapting concepts and techniques from machine learning to improve recognition of near-native structures in protein structure prediction.
8
Akhter N, Chennupati G, Kabir KL, Djidjev H, Shehu A. Unsupervised and Supervised Learning over the Energy Landscape for Protein Decoy Selection. Biomolecules 2019; 9:E607. [PMID: 31615116 PMCID: PMC6843838 DOI: 10.3390/biom9100607] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.8] [Received: 08/05/2019] [Revised: 10/03/2019] [Accepted: 10/04/2019] [Indexed: 11/17/2022] [Open Access]
Abstract
The energy landscape that organizes microstates of a molecular system and governs the underlying molecular dynamics exposes the relationship between molecular form/structure, changes to form, and biological activity or function in the cell. However, several challenges stand in the way of leveraging energy landscapes for relating structure and structural dynamics to function. Energy landscapes are high-dimensional, multi-modal, and often overly-rugged. Deep wells or basins in them do not always correspond to stable structural states but are instead the result of inherent inaccuracies in semi-empirical molecular energy functions. Due to these challenges, energetics is typically ignored in computational approaches addressing long-standing central questions in computational biology, such as protein decoy selection. In the latter, the goal is to determine over a possibly large number of computationally-generated three-dimensional structures of a protein those structures that are biologically-active/native. In recent work, we have recast our attention on the protein energy landscape and its role in helping us to advance decoy selection. Here, we summarize some of our successes so far in this direction via unsupervised learning. More importantly, we further advance the argument that the energy landscape holds valuable information to aid and advance the state of protein decoy selection via novel machine learning methodologies that leverage supervised learning. Our focus in this article is on decoy selection for the purpose of a rigorous, quantitative evaluation of how leveraging protein energy landscapes advances an important problem in protein modeling. However, the ideas and concepts presented here are generally useful to make discoveries in studies aiming to relate molecular structure and structural dynamics to function.
Affiliation(s)
- Nasrin Akhter
- Department of Computer Science, George Mason University, Fairfax, VA 22030, USA.
- Gopinath Chennupati
- Information Sciences (CCS-3) Group, Los Alamos National Laboratory, Los Alamos, NM 87545, USA.
- Kazi Lutful Kabir
- Department of Computer Science, George Mason University, Fairfax, VA 22030, USA.
- Hristo Djidjev
- Information Sciences (CCS-3) Group, Los Alamos National Laboratory, Los Alamos, NM 87545, USA.
- Amarda Shehu
- Department of Computer Science, George Mason University, Fairfax, VA 22030, USA.
- Center for Adaptive Human-Machine Partnership, George Mason University, Fairfax, VA 22030, USA.
- Department of Bioengineering, George Mason University, Fairfax, VA 22030, USA.
- School of Systems Biology, George Mason University, Fairfax, VA 22030, USA.
9
Yu Z, Yao Y, Deng H, Yi M. ANDIS: an atomic angle- and distance-dependent statistical potential for protein structure quality assessment. BMC Bioinformatics 2019; 20:299. [PMID: 31159742 PMCID: PMC6547486 DOI: 10.1186/s12859-019-2898-y] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.6] [Received: 11/28/2018] [Accepted: 05/13/2019] [Indexed: 01/05/2023] [Open Access]
Abstract
Background: Knowledge-based statistical potentials have been widely used in protein structure modeling and model quality assessment. They are commonly evaluated on their ability to recognize native structures and to discriminate decoys. However, these two abilities are found to be mutually exclusive in many statistical potentials. Results: We developed an atomic ANgle- and DIStance-dependent (ANDIS) statistical potential for protein structure quality assessment, with the distance cutoff as a tunable parameter. When the distance cutoff is ≤9.0 Å, “effective atomic interaction” is employed to enhance the ability of native recognition. For a distance cutoff of ≥10 Å, the distance-dependent atom-pair potential with random-walk reference state is combined to strengthen the ability of decoy discrimination. Benchmark tests on 632 structural decoy sets from diverse sources demonstrate that ANDIS outperforms other state-of-the-art potentials in both native recognition and decoy discrimination. Conclusions: Distance cutoff is a crucial parameter for distance-dependent statistical potentials. A lower distance cutoff is better for native recognition, while a higher one is favorable for decoy discrimination. The ANDIS potential is freely available as a standalone application at http://qbp.hzau.edu.cn/ANDIS/. Electronic supplementary material: The online version of this article (10.1186/s12859-019-2898-y) contains supplementary material, which is available to authorized users.
Affiliation(s)
- Zhongwang Yu
- Department of Physics, College of Science, Huazhong Agricultural University, Wuhan, 430070, China
- Yuangen Yao
- Department of Physics, College of Science, Huazhong Agricultural University, Wuhan, 430070, China
- Haiyou Deng
- Department of Physics, College of Science, Huazhong Agricultural University, Wuhan, 430070, China
- Institute of Applied Physics, Huazhong Agricultural University, Wuhan, 430070, China
- Ming Yi
- Department of Physics, College of Science, Huazhong Agricultural University, Wuhan, 430070, China
- Institute of Applied Physics, Huazhong Agricultural University, Wuhan, 430070, China
10
Battisti A, Zamuner S, Sarti E, Laio A. Toward a unified scoring function for native state discrimination and drug-binding pocket recognition. Phys Chem Chem Phys 2019; 20:17148-17155. [PMID: 29900428 DOI: 10.1039/c7cp08170g] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.6] [Indexed: 01/24/2023]
Abstract
Protein folding and receptor-ligand recognition are fundamental processes for any living organism. Although folding and ligand recognition are based on the same chemistry, the existing empirical scoring functions target just one problem: predicting the correct fold or the correct binding pose. We here introduce a statistical potential which considers moieties as fundamental units. The scoring function is able to deal with both folding and ligand pocket recognition problems with a performance comparable to the scoring functions specifically tailored for one of the two tasks. We foresee that the capability of the new scoring function to tackle both problems in a unified framework will be key to dealing with induced fit phenomena, in which a target protein significantly changes its conformation upon binding. Moreover, the new scoring function might be useful in docking protocols for intrinsically disordered proteins, whose flexibility cannot be handled with the available docking software.
Affiliation(s)
- Anna Battisti
- International School for Advanced Studies (SISSA), Via Bonomea 265, I-34136 Trieste, Italy.
11
Frappier V, Jenson JM, Zhou J, Grigoryan G, Keating AE. Tertiary Structural Motif Sequence Statistics Enable Facile Prediction and Design of Peptides that Bind Anti-apoptotic Bfl-1 and Mcl-1. Structure 2019; 27:606-617.e5. [PMID: 30773399 PMCID: PMC6447450 DOI: 10.1016/j.str.2019.01.008] [Citation(s) in RCA: 23] [Impact Index Per Article: 4.6] [Received: 11/04/2018] [Revised: 12/20/2018] [Accepted: 01/18/2019] [Indexed: 12/25/2022]
Abstract
Understanding the relationship between protein sequence and structure well enough to design new proteins with desired functions is a longstanding goal in protein science. Here, we show that recurring tertiary structural motifs (TERMs) in the PDB provide rich information for protein-peptide interaction prediction and design. TERM statistics can be used to predict peptide binding energies for Bcl-2 family proteins as accurately as widely used structure-based tools. Furthermore, design using TERM energies (dTERMen) rapidly and reliably generates high-affinity peptide binders of anti-apoptotic proteins Bfl-1 and Mcl-1 with just 15%-38% sequence identity to any known native Bcl-2 family protein ligand. High-resolution structures of four designed peptides bound to their targets provide opportunities to analyze the strengths and limitations of the computational design method. Our results support dTERMen as a powerful approach that can complement existing tools for protein engineering.
Affiliation(s)
- Vincent Frappier
- Department of Biology, Massachusetts Institute of Technology, Cambridge, MA 02139, USA
- Justin M Jenson
- Department of Biology, Massachusetts Institute of Technology, Cambridge, MA 02139, USA
- Jianfu Zhou
- Department of Computer Science, Dartmouth College, Hanover, NH 03755, USA
- Gevorg Grigoryan
- Department of Computer Science, Dartmouth College, Hanover, NH 03755, USA; Institute for Quantitative Biomedical Sciences, Dartmouth College, Hanover, NH 03755, USA; Department of Biological Sciences, Dartmouth College, Hanover, NH 03755, USA.
- Amy E Keating
- Department of Biology, Massachusetts Institute of Technology, Cambridge, MA 02139, USA; Department of Biological Engineering, Massachusetts Institute of Technology, Cambridge, MA 02139, USA; Koch Center for Integrative Cancer Research, Massachusetts Institute of Technology, Cambridge, MA 02139, USA.
12
Qi HW, Kulik HJ. Evaluating Unexpectedly Short Non-covalent Distances in X-ray Crystal Structures of Proteins with Electronic Structure Analysis. J Chem Inf Model 2019; 59:2199-2211. [DOI: 10.1021/acs.jcim.9b00144] [Citation(s) in RCA: 30] [Impact Index Per Article: 6.0] [Indexed: 01/01/2023]
Affiliation(s)
- Helena W. Qi
- Department of Chemical Engineering, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, United States
- Department of Chemistry, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, United States
- Heather J. Kulik
- Department of Chemical Engineering, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, United States
13
An Energy Landscape Treatment of Decoy Selection in Template-Free Protein Structure Prediction. Computation 2018. [DOI: 10.3390/computation6020039] [Citation(s) in RCA: 12] [Impact Index Per Article: 2.0] [Indexed: 11/16/2022]
14
A missense mutation in MYH1 is associated with susceptibility to immune-mediated myositis in Quarter Horses. Skelet Muscle 2018; 8:7. [PMID: 29510741 PMCID: PMC5838957 DOI: 10.1186/s13395-018-0155-0] [Citation(s) in RCA: 26] [Impact Index Per Article: 4.3] [Received: 10/20/2017] [Accepted: 02/25/2018] [Indexed: 12/30/2022] [Open Access]
Abstract
Background: The cause of immune-mediated myositis (IMM), characterized by recurrent, rapid-onset muscle atrophy in Quarter Horses (QH), is unknown. The histopathologic hallmark of IMM is lymphocytic infiltration of myofibers. The purpose of this study was to identify putative functional variants associated with equine IMM. Methods: A genome-wide association (GWA) study was performed on 36 IMM QHs and 54 breed-matched unaffected QHs from the same environment using the Equine SNP50 and SNP70 genotyping arrays. Results: A mixed model analysis identified nine SNPs within a ~2.87 Mb region on chr11 that were significantly (P_unadjusted < 1.4 × 10⁻⁶) associated with the IMM phenotype. Associated haplotypes within this region encompassed 38 annotated genes, including four myosin genes (MYH1, MYH2, MYH3, and MYH13). Whole genome sequencing of four IMM and four unaffected QHs identified a single segregating nonsynonymous E321G mutation in MYH1 encoding myosin heavy chain 2X. Genotyping of an additional 35 IMM and 22 unaffected QHs confirmed an association (P = 2.9 × 10⁻⁵), and the putative mutation was absent in 175 horses from 21 non-QH breeds. Lymphocytic infiltrates occurred in type 2X myofibers and the proportion of 2X fibers was decreased in the presence of inflammation. Protein modeling and contact/stability analysis identified 14 residues affected by the mutation, which significantly decreased stability. Conclusions: We conclude that a mutation in MYH1 is highly associated with susceptibility to the IMM phenotype in QH-related breeds. This is the first report of a mutation in MYH1 and the first link between a skeletal muscle myosin mutation and autoimmune disease. Electronic supplementary material: The online version of this article (10.1186/s13395-018-0155-0) contains supplementary material, which is available to authorized users.
15
Congenital Sucrase-isomaltase Deficiency: A Novel Compound Heterozygous Mutation Causing Aberrant Protein Localization. J Pediatr Gastroenterol Nutr 2017; 64:770-776. [PMID: 27749612 PMCID: PMC8176889 DOI: 10.1097/mpg.0000000000001424] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.9] [Indexed: 12/10/2022]
Abstract
OBJECTIVES Congenital diarrheal disorders are a group of inherited enteropathies presenting in early life and requiring parenteral nutrition. In most cases, genetics may be the key to a precise diagnosis. We present an infant girl with chronic congenital diarrhea that resolved after introduction of a fructose-based formula but who had no identified mutation in the SLC5A1 gene. Using whole exome sequencing (WES), we identified other mutations that better dictated dietary adjustments. METHODS WES of the patient and her parents was performed. The analysis focused on a recessive model including compound heterozygous mutations. Sanger sequencing was used to validate identified mutations and to screen the patient's newborn sister and grandparents. Expression and localization analyses were performed in the patient's duodenal biopsies using immunohistochemistry. RESULTS Using WES, we identified a new compound heterozygous mutation in the sucrase-isomaltase (SI) gene: a maternally inherited, known V577G mutation and a novel, paternally inherited C1531W mutation. Importantly, the newborn sister carried similar compound heterozygous mutations. Computational predictions suggest that both mutations highly destabilize the protein. SI expression and localization studies determined that the mutated SI protein was not expressed on the brush border membrane in the patient's duodenal biopsies, verifying the diagnosis of congenital sucrase-isomaltase deficiency (CSID). CONCLUSIONS The novel compound heterozygous V577G/C1531W SI mutations lead to a lack of SI expression in the duodenal brush border, confirming the diagnosis of CSID. These cases of CSID extend the molecular spectrum of this condition, further directing a more adequate dietary intervention for the patient and her newborn sibling.
|
16
|
Olechnovič K, Venclovas Č. VoroMQA: Assessment of protein structure quality using interatomic contact areas. Proteins 2017; 85:1131-1145. [DOI: 10.1002/prot.25278] [Citation(s) in RCA: 104] [Impact Index Per Article: 14.9] [Received: 11/18/2016] [Revised: 01/13/2017] [Accepted: 02/21/2017] [Indexed: 12/14/2022]
Affiliation(s)
- Kliment Olechnovič
- Institute of Biotechnology, Vilnius University; Saulėtekio 7 LT-10257 Vilnius Lithuania
- Faculty of Mathematics and Informatics; Vilnius University; Naugarduko 24 LT-03225 Vilnius Lithuania
- Česlovas Venclovas
- Institute of Biotechnology, Vilnius University; Saulėtekio 7 LT-10257 Vilnius Lithuania
|
17
|
Solomon O, Kunik V, Simon A, Kol N, Barel O, Lev A, Amariglio N, Somech R, Rechavi G, Eyal E. G23D: Online tool for mapping and visualization of genomic variants on 3D protein structures. BMC Genomics 2016; 17:681. [PMID: 27565432 PMCID: PMC5002099 DOI: 10.1186/s12864-016-3028-0] [Citation(s) in RCA: 16] [Impact Index Per Article: 2.0] [Received: 05/24/2016] [Accepted: 08/19/2016] [Indexed: 11/10/2022] Open
Abstract
Background Evaluation of the possible implications of genomic variants is an increasingly important task in the current high-throughput sequencing era. Structural information, however, is still not routinely exploited during this evaluation process. The main reasons are the partial structural coverage of the human proteome and the lack of tools that conveniently convert genomic positions, which are the frequent output of genomic pipelines, to protein and structure coordinates. Results We present G23D, a tool for conversion of human genomic coordinates to protein coordinates and protein structures. G23D allows mapping of genomic positions/variants on evolutionarily related (and not only identical) protein three-dimensional (3D) structures as well as on theoretical models. By doing so, it significantly extends the space of variants for which structural insight is feasible. To facilitate interpretation of the variant consequence, pathogenic variants, functional sites and polymorphism sites are displayed on protein sequence and structure diagrams alongside the input variants. G23D also provides modeling of the mutant structure, analysis of intra-protein contacts and instant access to functional predictions and predictions of thermo-stability changes. G23D is available at http://www.sheba-cancer.org.il/G23D. Conclusions G23D extends the fraction of variants for which structural analysis is applicable and provides better and faster access to structural data for biologists and geneticists who routinely work with genomic information.
Affiliation(s)
- Oz Solomon
- Cancer Research Center, Sheba Medical Center, Ramat-Gan, Israel; The Mina and Everard Goodman Faculty of Life Sciences, Bar-Ilan University, Ramat-Gan, Israel
- Vered Kunik
- Cancer Research Center, Sheba Medical Center, Ramat-Gan, Israel
- Amos Simon
- Pediatric Immunology Service, Jeffrey Modell Foundation, Sheba Medical Center, Ramat-Gan, Israel
- Nitzan Kol
- Cancer Research Center, Sheba Medical Center, Ramat-Gan, Israel
- Ortal Barel
- Cancer Research Center, Sheba Medical Center, Ramat-Gan, Israel
- Atar Lev
- Pediatric Immunology Service, Jeffrey Modell Foundation, Sheba Medical Center, Ramat-Gan, Israel
- Ninette Amariglio
- Cancer Research Center, Sheba Medical Center, Ramat-Gan, Israel; The Mina and Everard Goodman Faculty of Life Sciences, Bar-Ilan University, Ramat-Gan, Israel
- Raz Somech
- Edmond and Lily Safra Children's Hospital, Sheba Medical Center, Ramat-Gan, Israel; Sackler School of Medicine, Tel Aviv University, Tel Aviv, Israel
- Gidi Rechavi
- Cancer Research Center, Sheba Medical Center, Ramat-Gan, Israel; Sackler School of Medicine, Tel Aviv University, Tel Aviv, Israel
- Eran Eyal
- Cancer Research Center, Sheba Medical Center, Ramat-Gan, Israel
|
18
|
Verdonk ML, Ludlow RF, Giangreco I, Rathi PC. Protein–Ligand Informatics Force Field (PLIff): Toward a Fully Knowledge Driven “Force Field” for Biomolecular Interactions. J Med Chem 2016; 59:6891-902. [DOI: 10.1021/acs.jmedchem.6b00716] [Citation(s) in RCA: 21] [Impact Index Per Article: 2.6] [Indexed: 01/01/2023]
Affiliation(s)
- Marcel L. Verdonk
- Astex Pharmaceuticals, 436 Cambridge Science Park, Milton Road, Cambridge CB4 0QA, United Kingdom
- R. Frederick Ludlow
- Astex Pharmaceuticals, 436 Cambridge Science Park, Milton Road, Cambridge CB4 0QA, United Kingdom
- Ilenia Giangreco
- Astex Pharmaceuticals, 436 Cambridge Science Park, Milton Road, Cambridge CB4 0QA, United Kingdom
- Dipartimento Farmaco-Chimico, University of Bari, Via Orabona 4, I-70125 Bari, Italy
- Prakash Chandra Rathi
- Astex Pharmaceuticals, 436 Cambridge Science Park, Milton Road, Cambridge CB4 0QA, United Kingdom
|
19
|
Elhefnawy W, Chen L, Han Y, Li Y. ICOSA: A Distance-Dependent, Orientation-Specific Coarse-Grained Contact Potential for Protein Structure Modeling. J Mol Biol 2015; 427:2562-2576. [DOI: 10.1016/j.jmb.2015.05.022] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.0] [Received: 04/10/2015] [Accepted: 05/21/2015] [Indexed: 11/16/2022]
|
20
|
Aumentado-Armstrong TT, Istrate B, Murgita RA. Algorithmic approaches to protein-protein interaction site prediction. Algorithms Mol Biol 2015; 10:7. [PMID: 25713596 PMCID: PMC4338852 DOI: 10.1186/s13015-015-0033-9] [Citation(s) in RCA: 44] [Impact Index Per Article: 4.9] [Received: 08/23/2014] [Accepted: 01/07/2015] [Indexed: 12/19/2022] Open
Abstract
Interaction sites on protein surfaces mediate virtually all biological activities, and their identification holds promise for disease treatment and drug design. Novel algorithmic approaches for the prediction of these sites have been produced at a rapid rate, and the field has seen significant advancement over the past decade. However, the most current methods have not yet been reviewed in a systematic and comprehensive fashion. Herein, we describe the intricacies of the biological theory, datasets, and features required for modern protein-protein interaction site (PPIS) prediction, and present an integrative analysis of the state-of-the-art algorithms and their performance. First, the major sources of data used by predictors are reviewed, including training sets, evaluation sets, and methods for their procurement. Then, the features employed and their importance in the biological characterization of PPISs are explored. This is followed by a discussion of the methodologies adopted in contemporary prediction programs, as well as their relative performance on the datasets most recently used for evaluation. In addition, the potential utility that PPIS identification holds for rational drug design, hotspot prediction, and computational molecular docking is described. Finally, an analysis of the most promising areas for future development of the field is presented.
|
21
|
Jayaram B, Dhingra P, Mishra A, Kaushik R, Mukherjee G, Singh A, Shekhar S. Bhageerath-H: a homology/ab initio hybrid server for predicting tertiary structures of monomeric soluble proteins. BMC Bioinformatics 2014; 15 Suppl 16:S7. [PMID: 25521245 PMCID: PMC4290660 DOI: 10.1186/1471-2105-15-s16-s7] [Citation(s) in RCA: 29] [Impact Index Per Article: 2.9] [Indexed: 02/06/2023] Open
Abstract
BACKGROUND The advent of the human genome sequencing project has led to a spurt in the number of protein sequences in the databanks. The success of structure-based drug discovery hinges heavily on the availability of structures. Despite significant progress in the area of experimental protein structure determination, the sequence-structure gap is continually widening. Data-driven, homology-based computational methods have proved successful in predicting tertiary structures for sequences sharing medium to high sequence similarities. With dwindling similarities of query sequences, advanced homology/ab initio hybrid approaches are being explored to solve the structure prediction problem. Here we describe Bhageerath-H, a homology/ab initio hybrid software/server for predicting protein tertiary structures, with advancing drug design attempts as one of the goals. RESULTS The Bhageerath-H web server was validated on 75 CASP10 targets, which showed TM-scores ≥ 0.5 in 91% of the cases and Cα RMSDs ≤ 5 Å from the native in 58% of the targets, which is well above the CASP10 water mark. Comparison with some leading servers demonstrated the uniqueness of the hybrid methodology in effectively sampling conformational space, scoring best decoys and refining low resolution models to high and medium resolution. CONCLUSION The Bhageerath-H methodology is web enabled for the scientific community as a freely accessible web server. The methodology is fielded in the ongoing CASP11 experiment.
|
22
|
Frappier V, Najmanovich R. Vibrational entropy differences between mesophile and thermophile proteins and their use in protein engineering. Protein Sci 2014; 24:474-83. [PMID: 25367089 DOI: 10.1002/pro.2592] [Citation(s) in RCA: 16] [Impact Index Per Article: 1.6] [Received: 08/31/2014] [Revised: 10/16/2014] [Accepted: 10/17/2014] [Indexed: 11/10/2022]
Abstract
We recently introduced ENCoM, an elastic network atomic contact model, as the first coarse-grained normal mode analysis method that accounts for the nature of amino acids and can predict the effect of mutations on thermostability based on changes in vibrational entropy. In this proof-of-concept article, we use pairs of mesophile and thermophile homolog proteins with identical structures to determine if a measure of vibrational entropy based on normal mode analysis can discriminate thermophile from mesophile proteins. We observe that in around 60% of cases, thermophile proteins are more rigid at equivalent temperatures than their mesophile counterparts, and this difference can guide the design of proteins to increase their thermostability through a series of mutations. We observe that mutations separating thermophile proteins from their mesophile orthologs contribute independently to a decrease in vibrational entropy and discuss the application and implications of this methodology for protein engineering.
Affiliation(s)
- Vincent Frappier
- Department of Biochemistry, Faculty of Medicine and Health Sciences, University of Sherbrooke, J1H 5N4, Quebec, Canada
|
23
|
Mishra A, Rana PS, Mittal A, Jayaram B. D2N: Distance to the native. Biochimica et Biophysica Acta - Proteins and Proteomics 2014; 1844:1798-807. [DOI: 10.1016/j.bbapap.2014.07.010] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.9] [Received: 04/08/2014] [Revised: 07/03/2014] [Accepted: 07/15/2014] [Indexed: 12/26/2022]
|
24
|
Liu Y, Zeng J, Gong H. Improving the orientation-dependent statistical potential using a reference state. Proteins 2014; 82:2383-93. [DOI: 10.1002/prot.24600] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.9] [Received: 02/26/2014] [Revised: 04/30/2014] [Accepted: 05/05/2014] [Indexed: 12/23/2022]
Affiliation(s)
- Yufeng Liu
- MOE Key Laboratory of Bioinformatics; School of Life Sciences, Tsinghua University; Beijing 100084 China
- Jianyang Zeng
- Institute for Interdisciplinary Information Sciences, Tsinghua University; Beijing 100084 China
- Haipeng Gong
- MOE Key Laboratory of Bioinformatics; School of Life Sciences, Tsinghua University; Beijing 100084 China
|
25
|
Mahalingam R, Peng HP, Yang AS. Prediction of fatty acid-binding residues on protein surfaces with three-dimensional probability distributions of interacting atoms. Biophys Chem 2014; 192:10-9. [PMID: 24934883 DOI: 10.1016/j.bpc.2014.05.002] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.7] [Received: 04/08/2014] [Revised: 05/22/2014] [Accepted: 05/22/2014] [Indexed: 10/25/2022]
Abstract
Protein-fatty acid interaction is vital for many cellular processes, and understanding this interaction is important for functional annotation as well as drug discovery. In this work, we present a method for predicting fatty acid (FA)-binding residues by using three-dimensional probability density distributions of interacting atoms of FAs on protein surfaces, which are derived from known protein-FA complex structures. A machine learning algorithm was established to learn the characteristic patterns of the probability density maps specific to the FA-binding sites. The predictor was trained with five-fold cross validation on a non-redundant training set and then evaluated on an independent test set as well as on a dataset of holo-apo pairs. The results showed good accuracy in predicting the FA-binding residues. Further, the predictor developed in this study is implemented as an online server which is freely accessible at the following website: http://ismblab.genomics.sinica.edu.tw/.
Affiliation(s)
- Hung-Pin Peng
- Genomics Research Center, Academia Sinica, Taipei 115, Taiwan; Institute of Biomedical Informatics, National Yang-Ming University, Taipei 11221, Taiwan; Bioinformatics Program, Taiwan International Graduate Program, Institute of Information Science, Academia Sinica, Taipei 115, Taiwan
- An-Suei Yang
- Genomics Research Center, Academia Sinica, Taipei 115, Taiwan
|
26
|
Sun J, Kudahl UJ, Simon C, Cao Z, Reinherz EL, Brusic V. Large-scale analysis of B-cell epitopes on influenza virus hemagglutinin - implications for cross-reactivity of neutralizing antibodies. Front Immunol 2014; 5:38. [PMID: 24570677 PMCID: PMC3916768 DOI: 10.3389/fimmu.2014.00038] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.2] [Received: 09/05/2013] [Accepted: 01/22/2014] [Indexed: 11/13/2022] Open
Abstract
Influenza viruses continue to cause substantial morbidity and mortality worldwide. Rapid mutation of the genes encoding influenza virus surface proteins results in increasing resistance to current vaccines and available antiviral drugs. Broadly neutralizing antibodies (bnAbs) represent targets for prophylactic and therapeutic treatments of influenza. We performed a systematic bioinformatics study of cross-reactivity of neutralizing antibodies (nAbs) against influenza virus surface glycoprotein hemagglutinin (HA). This study utilized the available crystal structures of HA complexed with the antibodies for the analysis of tens of thousands of HA sequences. The detailed description of B-cell epitopes, measurement of epitope area similarity among different strains, and estimation of antibody neutralizing coverage provide insights into the cross-reactivity status of existing nAbs against influenza virus. We have developed a method to assess the likely cross-reactivity potential of bnAbs for influenza strains, either newly emerged or existing. Our method catalogs influenza strains by a new concept named discontinuous peptide, and then provides an assessment of cross-reactivity. Potentially cross-reactive strains are those that share 100% identity with experimentally verified neutralized strains. By cataloging influenza strains and their B-cell epitopes for known bnAbs, our method provides guidance for selection of representative strains for further experimental design. With knowledge of sequences, their B-cell epitopes, and differences between historical influenza strains, we enhance our preparedness and the ability to respond to emerging pandemic threats.
Affiliation(s)
- Jing Sun
- Cancer Vaccine Center, Dana-Farber Cancer Institute, Harvard Medical School, Boston, MA, USA; Department of Medicine, Harvard Medical School, Boston, MA, USA
- Ulrich J Kudahl
- Cancer Vaccine Center, Dana-Farber Cancer Institute, Harvard Medical School, Boston, MA, USA; Center for Biological Sequence Analysis, Technical University of Denmark, Lyngby, Denmark
- Christian Simon
- Cancer Vaccine Center, Dana-Farber Cancer Institute, Harvard Medical School, Boston, MA, USA; Center for Biological Sequence Analysis, Technical University of Denmark, Lyngby, Denmark
- Zhiwei Cao
- School of Life Sciences and Technology, Tongji University, Shanghai, China
- Ellis L Reinherz
- Cancer Vaccine Center, Dana-Farber Cancer Institute, Harvard Medical School, Boston, MA, USA; Department of Medicine, Harvard Medical School, Boston, MA, USA; Laboratory of Immunobiology, Department of Medical Oncology, Dana-Farber Cancer Institute, Harvard Medical School, Boston, MA, USA
- Vladimir Brusic
- Cancer Vaccine Center, Dana-Farber Cancer Institute, Harvard Medical School, Boston, MA, USA; Department of Medicine, Harvard Medical School, Boston, MA, USA
|
27
|
Dong GQ, Fan H, Schneidman-Duhovny D, Webb B, Sali A. Optimized atomic statistical potentials: assessment of protein interfaces and loops. Bioinformatics 2013; 29:3158-66. [PMID: 24078704 PMCID: PMC3842762 DOI: 10.1093/bioinformatics/btt560] [Citation(s) in RCA: 98] [Impact Index Per Article: 8.9] [Received: 06/19/2013] [Revised: 08/13/2013] [Accepted: 09/22/2013] [Indexed: 01/16/2023] Open
Abstract
MOTIVATION Statistical potentials have been widely used for modeling whole proteins and their parts (e.g. sidechains and loops) as well as interactions between proteins, nucleic acids and small molecules. Here, we formulate the statistical potentials entirely within a statistical framework, avoiding questionable statistical mechanical assumptions and approximations, including a definition of the reference state. RESULTS We derive a general Bayesian framework for inferring statistically optimized atomic potentials (SOAP) in which the reference state is replaced with data-driven 'recovery' functions. Moreover, we restrain the relative orientation between two covalent bonds instead of a simple distance between two atoms, in an effort to capture orientation-dependent interactions such as hydrogen bonds. To demonstrate this general approach, we computed statistical potentials for protein-protein docking (SOAP-PP) and loop modeling (SOAP-Loop). For docking, a near-native model is within the top 10 scoring models in 40% of the PatchDock benchmark cases, compared with 23 and 27% for the state-of-the-art ZDOCK and FireDock scoring functions, respectively. Similarly, for modeling 12-residue loops in the PLOP benchmark, the average main-chain root mean square deviation of the best scored conformations by SOAP-Loop is 1.5 Å, close to the average root mean square deviation of the best sampled conformations (1.2 Å) and significantly better than that selected by Rosetta (2.1 Å), DFIRE (2.3 Å), DOPE (2.5 Å) and PLOP scoring functions (3.0 Å). Our Bayesian framework may also result in more accurate statistical potentials for additional modeling applications, thus affording better leverage of the experimentally determined protein structures. AVAILABILITY AND IMPLEMENTATION SOAP-PP and SOAP-Loop are available as part of MODELLER (http://salilab.org/modeller).
Affiliation(s)
- Guang Qiang Dong
- Department of Bioengineering and Therapeutic Sciences, Department of Pharmaceutical Chemistry and California Institute for Quantitative Biosciences (QB3), University of California, San Francisco, CA 94158, USA
|
28
|
Li J, Mach P, Koehl P. Measuring the shapes of macromolecules - and why it matters. Comput Struct Biotechnol J 2013; 8:e201309001. [PMID: 24688748 PMCID: PMC3962087 DOI: 10.5936/csbj.201309001] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.6] [Received: 08/26/2013] [Revised: 11/22/2013] [Accepted: 11/22/2013] [Indexed: 11/22/2022] Open
Abstract
The molecular basis of life rests on the activity of biological macromolecules, mostly nucleic acids and proteins. A perhaps surprising finding that crystallized over the last handful of decades is that geometric reasoning plays a major role in our attempt to understand these activities. In this paper, we address this connection between geometry and biology, focusing on methods for measuring and characterizing the shapes of macromolecules. We briefly review existing numerical and analytical approaches that solve these problems. We cover in more detail our own work in this field, focusing on the alpha shape theory, as it provides a unifying mathematical framework that enables the analytical calculation of the surface area and volume of a macromolecule represented as a union of balls, the detection of pockets and cavities in the molecule, and the quantification of contacts between the atomic balls. We have shown that each of these quantities can be related to physical properties of the molecule under study and ultimately provides insight into its activity. We conclude with a brief description of new challenges for the alpha shape theory in modern structural biology.
Affiliation(s)
- Jie Li
- Genome Center, University of California, Davis, 451 Health Sciences Drive, Davis, CA 95616, United States
- Paul Mach
- Graduate Group of Applied Mathematics, University of California, Davis, 1, Shields Ave, Davis, CA, 95616, United States
- Patrice Koehl
- Department of Computer Science and Genome Center, University of California, Davis, 1, Shields Ave, Davis, CA, 95616, United States
|
29
|
Mirzaie M, Sadeghi M. Delaunay-based nonlocal interactions are sufficient and accurate in protein fold recognition. Proteins 2013; 82:415-23. [DOI: 10.1002/prot.24407] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.6] [Received: 04/15/2013] [Revised: 08/12/2013] [Accepted: 08/21/2013] [Indexed: 01/05/2023]
Affiliation(s)
- Mehdi Mirzaie
- Department of Basic Sciences, Faculty of Paramedical Sciences; Shahid Beheshti University of Medical Sciences; Tehran Iran
- Department of Bioinformatics; School of Computer Science, Institute for Research in Fundamental Sciences (IPM); Tehran Iran
- Mehdi Sadeghi
- Department of Bioinformatics, National Institute of Genetic Engineering and Biotechnology; Tehran Iran
|
30
|
Olsen LR, Kudahl UJ, Simon C, Sun J, Schönbach C, Reinherz EL, Zhang GL, Brusic V. BlockLogo: visualization of peptide and sequence motif conservation. J Immunol Methods 2013; 400-401:37-44. [PMID: 24001880 DOI: 10.1016/j.jim.2013.08.014] [Citation(s) in RCA: 20] [Impact Index Per Article: 1.8] [Received: 04/23/2013] [Revised: 08/20/2013] [Accepted: 08/25/2013] [Indexed: 12/21/2022]
Abstract
BlockLogo is a web-server application for the visualization of protein and nucleotide fragments, continuous protein sequence motifs, and discontinuous sequence motifs using calculation of block entropy from multiple sequence alignments. The user input consists of a multiple sequence alignment, selection of motif positions, type of sequence, and output format definition. The output has BlockLogo along with the sequence logo, and a table of motif frequencies. We deployed BlockLogo as an online application and have demonstrated its utility through examples that show visualization of T-cell epitopes and B-cell epitopes (both continuous and discontinuous). Our additional example shows a visualization and analysis of structural motifs that determine the specificity of peptide binding to HLA-DR molecules. The BlockLogo server also employs selected experimentally validated prediction algorithms to enable on-the-fly prediction of MHC binding affinity to 15 common HLA class I and class II alleles as well as visual analysis of discontinuous epitopes from multiple sequence alignments. It enables the visualization and analysis of structural and functional motifs that are usually described as regular expressions. It provides a compact view of discontinuous motifs composed of distant positions within biological sequences. BlockLogo is available at: http://research4.dfci.harvard.edu/cvc/blocklogo/ and http://met-hilab.bu.edu/blocklogo/.
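The block entropy idea mentioned in this abstract can be illustrated with a minimal sketch. It assumes that a "block" is the string of residues found at the selected alignment columns and that entropy is plain Shannon entropy over the observed block variants; the abstract does not give BlockLogo's exact formula, and the function names `block_frequencies` and `block_entropy` are hypothetical, not part of the BlockLogo server.

```python
from collections import Counter
from math import log2


def block_frequencies(alignment, positions):
    """Return {block: fraction} for the blocks read at the given columns.

    alignment: list of equal-length aligned sequences (strings).
    positions: 0-based column indices; they need not be contiguous,
               which is how a discontinuous motif is represented here.
    """
    blocks = ["".join(seq[i] for i in positions) for seq in alignment]
    total = len(blocks)
    return {block: n / total for block, n in Counter(blocks).items()}


def block_entropy(alignment, positions):
    """Shannon entropy in bits over block variants (assumed definition)."""
    freqs = block_frequencies(alignment, positions)
    return -sum(p * log2(p) for p in freqs.values())


if __name__ == "__main__":
    # Toy alignment, invented for illustration only.
    aln = ["ACDE", "ACDF", "GCDE", "GCDF"]
    print(block_frequencies(aln, [0, 3]))
    print(block_entropy(aln, [0, 3]))
```

Treating the selected columns as one unit, rather than summing per-column entropies, keeps the co-occurrence of residues at distant positions visible, which is the point of a discontinuous motif.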
Affiliation(s)
- Lars Rønn Olsen
- Cancer Vaccine Center, Dana-Farber Cancer Institute, Harvard Medical School, Boston, MA, USA; Bioinformatics Centre, Department of Biology, University of Copenhagen, Copenhagen, Denmark
|
31
|
Using the unfolded state as the reference state improves the performance of statistical potentials. Biophys J 2013. [PMID: 23199923 DOI: 10.1016/j.bpj.2012.09.023] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.4] [Indexed: 12/29/2022] Open
Abstract
Distance-dependent statistical potentials are an important class of energy functions extensively used in modeling protein structures and energetics. These potentials are obtained by statistically analyzing the proximity of atoms in all combinatorial amino-acid pairs in proteins with known structures. In model evaluation, the value of a reference state is usually subtracted from the statistical potential for better selectivity. An ideal reference state should include the general chemical properties of polypeptide chains so that only the unique factors stabilizing the native structures are retained after calibrating on the reference state. However, reference states available as of this writing rarely model specific chemical constraints of peptide bonds and therefore poorly reflect the behavior of polypeptide chains. In this work, we proposed a statistical potential based on unfolded state ensemble (SPOUSE), where the reference state is summarized from the unfolded state ensembles of proteins produced according to the statistical coil model. Due to its better representation of the features of polypeptides, SPOUSE outperforms three of the most widely used distance-dependent potentials not only in native conformation identification, but also in the selection of close-to-native models and in correlation coefficients between energy and model error. Furthermore, SPOUSE shows promise for further improvement through integration with orientation-dependent side-chain potentials.
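The role of the reference state in a distance-dependent potential can be sketched with the generic inverse-Boltzmann form, E(r) = -kT ln[P_obs(r) / P_ref(r)]. The sketch below is not the SPOUSE implementation: the reference counts are simply passed in, whereas the abstract derives them from unfolded-state ensembles; the function name, the pseudocount, and the kT value (about 0.593 kcal/mol near 298 K) are assumptions made for illustration.

```python
import math

KT_KCAL_PER_MOL = 0.593  # assumed: k_B * T near 298 K, in kcal/mol


def statistical_potential(observed_counts, reference_counts,
                          kT=KT_KCAL_PER_MOL, pseudocount=1.0):
    """Inverse-Boltzmann energy per distance bin for one atom-pair type.

    observed_counts:  pair counts per distance bin in known structures.
    reference_counts: pair counts per bin in the chosen reference state
                      (same number of bins as observed_counts).
    pseudocount:      added to every bin so empty bins do not give log(0).
    """
    n_bins = len(observed_counts)
    obs_total = sum(observed_counts) + pseudocount * n_bins
    ref_total = sum(reference_counts) + pseudocount * n_bins
    energies = []
    for n_obs, n_ref in zip(observed_counts, reference_counts):
        p_obs = (n_obs + pseudocount) / obs_total
        p_ref = (n_ref + pseudocount) / ref_total
        # Negative energy: the bin is more populated than the reference predicts.
        energies.append(-kT * math.log(p_obs / p_ref))
    return energies


if __name__ == "__main__":
    # Invented counts for two distance bins, for illustration only.
    print(statistical_potential([30, 10], [10, 30]))
```

Because the reference enters only through the ratio inside the logarithm, swapping in a different reference state changes the energies without changing the counting over known structures.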
|
32
|
Capturing native/native like structures with a physico-chemical metric (pcSM) in protein folding. Biochimica et Biophysica Acta - Proteins and Proteomics 2013; 1834:1520-31. [PMID: 23665455 DOI: 10.1016/j.bbapap.2013.04.023] [Citation(s) in RCA: 17] [Impact Index Per Article: 1.5] [Received: 02/15/2013] [Revised: 04/12/2013] [Accepted: 04/15/2013] [Indexed: 12/15/2022]
Abstract
Specification of the three-dimensional structure of a protein from its amino acid sequence, also called a "Grand Challenge" problem, has eluded a solution for over six decades. A modestly successful strategy has evolved over the last couple of decades based on the development of scoring functions (e.g. mimicking free energy) that can capture native or native-like structures from an ensemble of decoys generated as plausible candidates for the native structure. A scoring function must be fast enough in discriminating the native from unfolded/misfolded structures, and requires validation on large data sets to generate sufficient confidence in the score. Here we develop a scoring function called pcSM that detects the true native structure in the top 5 with 93% accuracy from an ensemble of candidate structures. If we eliminate the native from the ensemble of decoys, then pcSM is able to capture a near-native structure (RMSD ≤ 5 Å) in the top 10 with 86% accuracy. The parameters considered in pcSM are a C-alpha Euclidean metric, secondary structural propensity, surface areas and an intramolecular energy function. pcSM has been tested on 415 systems consisting of 142,698 decoys (public and CASP; the largest reported hitherto in the literature). The average rank for the native is 2.38, a significant improvement over that existing in the literature. In-silico protein structure prediction requires robust scoring technique(s). Therefore, pcSM is easily amenable to integration into a successful protein structure prediction strategy. The tool is freely available at http://www.scfbio-iitd.res.in/software/pcsm.jsp.
|
33
|
Potapov V, Edelman M, Sobolev V. Residue-residue contacts: application to analysis of secondary structure interactions. Methods Mol Biol 2013; 932:159-173. [PMID: 22987352 DOI: 10.1007/978-1-62703-065-6_10] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Indexed: 06/01/2023]
Abstract
Protein structures and their complexes are formed and stabilized by interactions, both inside and outside of the protein. Analysis of such interactions helps in understanding different levels of structures (secondary, super-secondary, and oligomeric states). It can also assist molecular biologists in understanding structural consequences of modifying proteins and/or ligands. In this chapter, our definition of atom-atom and residue-residue contacts is described and applied to analysis of protein-protein interactions in dimeric β-sandwich proteins.
|
34
|
An analytical method for computing atomic contact areas in biomolecules. J Comput Chem 2012; 34:105-20. [DOI: 10.1002/jcc.23111] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.7] [Received: 06/20/2012] [Revised: 08/07/2012] [Indexed: 11/07/2022]
|
35
|
Maia JDC, Urquiza Carvalho GA, Mangueira CP, Santana SR, Cabral LAF, Rocha GB. GPU Linear Algebra Libraries and GPGPU Programming for Accelerating MOPAC Semiempirical Quantum Chemistry Calculations. J Chem Theory Comput 2012; 8:3072-81. [PMID: 26605718 DOI: 10.1021/ct3004645] [Citation(s) in RCA: 208] [Impact Index Per Article: 17.3] [Indexed: 11/29/2022]
Abstract
In this study, we present some modifications in the semiempirical quantum chemistry MOPAC2009 code that accelerate single-point energy calculations (1SCF) of medium-size (up to 2500 atoms) molecular systems using GPU coprocessors and multithreaded shared-memory CPUs. Our modifications consisted of using a combination of highly optimized linear algebra libraries for both CPU (LAPACK and BLAS from Intel MKL) and GPU (MAGMA and CUBLAS) to hasten time-consuming parts of MOPAC such as the pseudodiagonalization, full diagonalization, and density matrix assembling. We have shown that it is possible to obtain large speedups just by using CPU serial linear algebra libraries in the MOPAC code. As a special case, we show a speedup of up to 14 times for a methanol simulation box containing 2400 atoms and 4800 basis functions, with even greater gains in performance when using multithreaded CPUs (2.1 times in relation to the single-threaded CPU code using linear algebra libraries) and GPUs (3.8 times). This degree of acceleration opens new perspectives for modeling larger structures which appear in inorganic chemistry (such as zeolites and MOFs), biochemistry (such as polysaccharides, small proteins, and DNA fragments), and materials science (such as nanotubes and fullerenes). In addition, we believe that this parallel (GPU-GPU) MOPAC code will make it feasible to use semiempirical methods in lengthy molecular simulations using both hybrid QM/MM and QM/QM potentials.
Affiliation(s)
- Gabriel Aires Urquiza Carvalho
  - Departamento de Química, CCEN, Universidade Federal da Paraíba, Caixa Postal: 5093, CEP: 58051-970, João Pessoa/PB, Brazil
- Sidney Ramos Santana
  - Departamento de Química, CCEN, Universidade Federal da Paraíba, Caixa Postal: 5093, CEP: 58051-970, João Pessoa/PB, Brazil
- Gerd B Rocha
  - Departamento de Química, CCEN, Universidade Federal da Paraíba, Caixa Postal: 5093, CEP: 58051-970, João Pessoa/PB, Brazil
|
36
|
Tsai KC, Jian JW, Yang EW, Hsu PC, Peng HP, Chen CT, Chen JB, Chang JY, Hsu WL, Yang AS. Prediction of carbohydrate binding sites on protein surfaces with 3-dimensional probability density distributions of interacting atoms. PLoS One 2012; 7:e40846. [PMID: 22848404 PMCID: PMC3405063 DOI: 10.1371/journal.pone.0040846] [Citation(s) in RCA: 21] [Impact Index Per Article: 1.8] [Received: 12/09/2011] [Accepted: 06/13/2012] [Indexed: 11/22/2022] Open
Abstract
Non-covalent protein-carbohydrate interactions mediate molecular targeting in many biological processes. Prediction of non-covalent carbohydrate binding sites on protein surfaces not only provides insights into the functions of the query proteins; information on key carbohydrate-binding residues could suggest site-directed mutagenesis experiments, design therapeutics targeting carbohydrate-binding proteins, and provide guidance in engineering protein-carbohydrate interactions. In this work, we show that non-covalent carbohydrate binding sites on protein surfaces can be predicted with relatively high accuracy when the query protein structures are known. The prediction capabilities were based on a novel encoding scheme of the three-dimensional probability density maps describing the distributions of 36 non-covalent interacting atom types around protein surfaces. One machine learning model was trained for each of the 30 protein atom types. The machine learning algorithms predicted tentative carbohydrate binding sites on query proteins by recognizing the characteristic interacting atom distribution patterns specific for carbohydrate binding sites from known protein structures. The prediction results for all protein atom types were integrated into surface patches as tentative carbohydrate binding sites based on normalized prediction confidence level. The prediction capabilities of the predictors were benchmarked by a 10-fold cross validation on 497 non-redundant proteins with known carbohydrate binding sites. The predictors were further tested on an independent test set with 108 proteins. The residue-based Matthews correlation coefficient (MCC) for the independent test was 0.45, with prediction precision and sensitivity (or recall) of 0.45 and 0.49 respectively. In addition, 111 unbound carbohydrate-binding protein structures for which the structures were determined in the absence of the carbohydrate ligands were predicted with the trained predictors. The overall prediction MCC was 0.49. Independent tests on anti-carbohydrate antibodies showed that the carbohydrate antigen binding sites were predicted with comparable accuracy. These results demonstrate that the predictors are among the best in carbohydrate binding site predictions to date.
Affiliation(s)
- Jhih-Wei Jian
  - Genomics Research Center, Academia Sinica, Taipei, Taiwan
  - Bioinformatics Program, Taiwan International Graduate Program, Institute of Information Science, Academia Sinica, Taipei, Taiwan
  - Institute of Biomedical Informatics, National Yang-Ming University, Taipei, Taiwan
- Ei-Wen Yang
  - Genomics Research Center, Academia Sinica, Taipei, Taiwan
  - Institute of Information Sciences, Academia Sinica, Taipei, Taiwan
- Po-Chiang Hsu
  - Genomics Research Center, Academia Sinica, Taipei, Taiwan
- Hung-Pin Peng
  - Genomics Research Center, Academia Sinica, Taipei, Taiwan
  - Bioinformatics Program, Taiwan International Graduate Program, Institute of Information Science, Academia Sinica, Taipei, Taiwan
  - Institute of Biomedical Informatics, National Yang-Ming University, Taipei, Taiwan
- Ching-Tai Chen
  - Genomics Research Center, Academia Sinica, Taipei, Taiwan
  - Bioinformatics Program, Taiwan International Graduate Program, Institute of Information Science, Academia Sinica, Taipei, Taiwan
  - Institute of Bioinformatics and Systems Biology, National Chiao-Tung University, Hsinchu, Taiwan
- Jun-Bo Chen
  - Genomics Research Center, Academia Sinica, Taipei, Taiwan
  - Department of Computer Science, National Tsing-Hua University, Hsinchu, Taiwan
- Jeng-Yih Chang
  - Genomics Research Center, Academia Sinica, Taipei, Taiwan
- Wen-Lian Hsu
  - Institute of Information Sciences, Academia Sinica, Taipei, Taiwan
- An-Suei Yang
  - Genomics Research Center, Academia Sinica, Taipei, Taiwan
  - * E-mail:
|
37
|
Chen CT, Peng HP, Jian JW, Tsai KC, Chang JY, Yang EW, Chen JB, Ho SY, Hsu WL, Yang AS. Protein-protein interaction site predictions with three-dimensional probability distributions of interacting atoms on protein surfaces. PLoS One 2012; 7:e37706. [PMID: 22701576 PMCID: PMC3368894 DOI: 10.1371/journal.pone.0037706] [Citation(s) in RCA: 23] [Impact Index Per Article: 1.9] [Received: 12/09/2011] [Accepted: 04/23/2012] [Indexed: 11/18/2022] Open
Abstract
Protein-protein interactions are key to many biological processes. Computational methodologies devised to predict protein-protein interaction (PPI) sites on protein surfaces are important tools in providing insights into the biological functions of proteins and in developing therapeutics targeting the protein-protein interaction sites. One of the general features of PPI sites is that the core regions from the two interacting protein surfaces are complementary to each other, similar to the interior of proteins in packing density and in the physicochemical nature of the amino acid composition. In this work, we simulated the physicochemical complementarities by constructing three-dimensional probability density maps of non-covalent interacting atoms on the protein surfaces. The interacting probabilities were derived from the interior of known structures. Machine learning algorithms were applied to learn the characteristic patterns of the probability density maps specific to the PPI sites. The trained predictors for PPI sites were cross-validated with the training cases (consisting of 432 proteins) and were tested on an independent dataset (consisting of 142 proteins). The residue-based Matthews correlation coefficient for the independent test set was 0.423; the accuracy, precision, sensitivity, and specificity were 0.753, 0.519, 0.677, and 0.779 respectively. The benchmark results indicate that the optimized machine learning models are among the best predictors in identifying PPI sites on protein surfaces. In particular, the PPI site prediction accuracy increases with increasing size of the PPI site and with increasing hydrophobicity in amino acid composition of the PPI interface; the core interface regions are more likely to be recognized with high prediction confidence. The results indicate that the physicochemical complementarity patterns on protein surfaces are important determinants in PPIs, and a substantial portion of the PPI sites can be predicted correctly with the physicochemical complementarity features based on the non-covalent interaction data derived from protein interiors.
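The residue-based metrics reported in this abstract (accuracy, precision, sensitivity, specificity, Matthews correlation coefficient) all follow from a binary confusion matrix. The sketch below uses the standard textbook definitions with hypothetical counts; it is not the authors' evaluation code, and `binary_metrics` is an illustrative name.

```python
import math
from typing import Dict


def binary_metrics(tp: int, fp: int, tn: int, fn: int) -> Dict[str, float]:
    """Standard binary classification metrics from confusion-matrix counts."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return {
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
        "precision": tp / (tp + fp),
        "sensitivity": tp / (tp + fn),   # also called recall
        "specificity": tn / (tn + fp),
        # MCC is conventionally set to 0 when any marginal total is zero.
        "mcc": (tp * tn - fp * fn) / denom if denom else 0.0,
    }


# Hypothetical residue counts: interface residues are the positive class.
metrics = binary_metrics(tp=60, fp=40, tn=160, fn=40)
print(metrics)
```

Because interface residues are usually the minority class, MCC tends to be more informative than accuracy for this kind of benchmark.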
Affiliation(s)
- Ching-Tai Chen
  - Genomics Research Center, Academia Sinica, Taipei, Taiwan
  - Institute of Bioinformatics and Systems Biology, National Chiao-Tung University, Hsinchu, Taiwan
  - Institute of Information Science, Academia Sinica, Taipei, Taiwan
  - Bioinformatics Program, Taiwan International Graduate Program, Institute of Information Science, Academia Sinica, Taipei, Taiwan
- Hung-Pin Peng
  - Genomics Research Center, Academia Sinica, Taipei, Taiwan
  - Institute of Biomedical Informatics, National Yang-Ming University, Taipei, Taiwan
  - Bioinformatics Program, Taiwan International Graduate Program, Institute of Information Science, Academia Sinica, Taipei, Taiwan
- Jhih-Wei Jian
  - Genomics Research Center, Academia Sinica, Taipei, Taiwan
  - Institute of Biomedical Informatics, National Yang-Ming University, Taipei, Taiwan
  - Bioinformatics Program, Taiwan International Graduate Program, Institute of Information Science, Academia Sinica, Taipei, Taiwan
- Jeng-Yih Chang
  - Genomics Research Center, Academia Sinica, Taipei, Taiwan
- Ei-Wen Yang
  - Genomics Research Center, Academia Sinica, Taipei, Taiwan
  - Institute of Information Science, Academia Sinica, Taipei, Taiwan
- Jun-Bo Chen
  - Genomics Research Center, Academia Sinica, Taipei, Taiwan
  - Department of Computer Science, National Tsing-Hua University, Hsinchu, Taiwan
- Shinn-Ying Ho
  - Institute of Bioinformatics and Systems Biology, National Chiao-Tung University, Hsinchu, Taiwan
- Wen-Lian Hsu
  - Institute of Information Science, Academia Sinica, Taipei, Taiwan
  - * E-mail: (AY); (WH)
- An-Suei Yang
  - Genomics Research Center, Academia Sinica, Taipei, Taiwan
  - * E-mail: (AY); (WH)
|
38
|
Cossio P, Granata D, Laio A, Seno F, Trovato A. A simple and efficient statistical potential for scoring ensembles of protein structures. Sci Rep 2012. [DOI: 10.1038/srep00351] [Citation(s) in RCA: 44] [Impact Index Per Article: 3.7] [Indexed: 11/09/2022] Open
|
39
|
Ramaraj T, Angel T, Dratz EA, Jesaitis AJ, Mumey B. Antigen-antibody interface properties: composition, residue interactions, and features of 53 non-redundant structures. Biochim Biophys Acta Proteins Proteom 2012; 1824:520-32. [PMID: 22246133 DOI: 10.1016/j.bbapap.2011.12.007] [Citation(s) in RCA: 115] [Impact Index Per Article: 9.6] [Received: 01/30/2011] [Revised: 12/22/2011] [Accepted: 12/23/2011] [Indexed: 11/17/2022]
Abstract
The structures of protein antigen-antibody (Ag-Ab) interfaces contain information about how Ab recognize Ag as well as how Ag are folded to present surfaces for Ag recognition. As such, the Ab surface holds information about Ag folding that resides with the Ab-Ag interface residues and how they interact. In order to gain insight into the nature of such interactions, a data set comprised of 53 non-redundant 3D structures of Ag-Ab complexes was analyzed. We assessed the physical and biochemical features of the Ag-Ab interfaces and the degree to which favored interactions exist between amino acid residues on the corresponding interface surfaces. Amino acid compositional analysis of the interfaces confirmed the dominance of TYR in the Ab paratope-containing surface (PCS), with almost two fold greater abundance than any other residue. Additionally TYR had a much higher than expected presence in the PCS compared to the surface of the whole antibody (defined as the occurrence propensity), along with aromatics PHE, TRP, and to a lesser degree HIS and ILE. In the Ag epitope-containing surface (ECS), there were slightly increased occurrence propensities of TRP and TYR relative to the whole Ag surface, implying an increased significance over the compositionally most abundant LYS>ASN>GLU>ASP>ARG. This examination encompasses a large, diverse set of unique Ag-Ab crystal structures that help explain the biological range and specificity of Ag-Ab interactions. This analysis may also provide a measure of the significance of individual amino acid residues in phage display analysis of Ag binding.
|
40
|
Mirzaie M, Sadeghi M. Distance-dependent atomic knowledge-based force in protein fold recognition. Proteins 2012; 80:683-90. [DOI: 10.1002/prot.24011] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.6] [Received: 06/13/2011] [Revised: 11/15/2011] [Accepted: 12/06/2011] [Indexed: 11/08/2022]
|
41
|
Tian L, Wu A, Cao Y, Dong X, Hu Y, Jiang T. NCACO-score: an effective main-chain dependent scoring function for structure modeling. BMC Bioinformatics 2011; 12:208. [PMID: 21612673 PMCID: PMC3123610 DOI: 10.1186/1471-2105-12-208] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.3] [Received: 01/20/2011] [Accepted: 05/26/2011] [Indexed: 11/10/2022] Open
Abstract
Background: Development of effective scoring functions is a critical component to the success of protein structure modeling. Previously, many efforts have been dedicated to the development of scoring functions. Despite these efforts, development of an effective scoring function that can achieve both good accuracy and fast speed still presents a grand challenge.
Results: Based on a coarse-grained representation of a protein structure by using only four main-chain atoms: N, Cα, C and O, we develop a knowledge-based scoring function, called NCACO-score, that integrates different structural information to rapidly model protein structure from sequence. In testing on the Decoys'R'Us sets, we found that NCACO-score can effectively recognize native conformers from their decoys. Furthermore, we demonstrate that NCACO-score can effectively guide fragment assembly for protein structure prediction, which has achieved a good performance in building the structure models for hard targets from CASP8 in terms of both accuracy and speed.
Conclusions: Although NCACO-score is developed based on a coarse-grained model, it is able to discriminate native conformers from decoy conformers with high accuracy. NCACO is a very effective scoring function for structure modeling.
Affiliation(s)
- Liqing Tian
  - National Laboratory of Biomacromolecules, Institute of Biophysics, Chinese Academy of Sciences, Beijing, China
|
42
|
Rykunov D, Fiser A. New statistical potential for quality assessment of protein models and a survey of energy functions. BMC Bioinformatics 2010; 11:128. [PMID: 20226048 PMCID: PMC2853469 DOI: 10.1186/1471-2105-11-128] [Citation(s) in RCA: 72] [Impact Index Per Article: 5.1] [Received: 10/06/2009] [Accepted: 03/12/2010] [Indexed: 11/30/2022] Open
Abstract
Background: Scoring functions, such as molecular mechanics force fields and statistical potentials, are fundamentally important tools in protein structure modeling and quality assessment.
Results: The performances of a number of publicly available scoring functions are compared with statistical rigor, with an emphasis on knowledge-based potentials. We explored the effect on accuracy of alternative choices for representing interaction center types and other features of scoring functions, such as using information on solvent accessibility and on torsion angles, and accounting for secondary structure preferences and side chain orientation. Partially based on the observations made, we present a novel residue-based statistical potential, which employs a shuffled reference state definition and takes into account the mutual orientation of residue side chains. Atom- and residue-level statistical potentials and Linux executables to calculate the energy of a given protein proposed in this work can be downloaded from http://www.fiserlab.org/potentials.
Conclusions: Among the most influential terms we observed a critical role of a proper reference state definition and the benefits of including information about the microenvironment of interaction centers. Molecular mechanical potentials were also tested and found to be over-sensitive to small local imperfections in a structure, requiring unfeasibly long energy relaxation before energy scores started to correlate with model quality.
Affiliation(s)
- Dmitry Rykunov
  - Department of Systems and Computational Biology, Albert Einstein College of Medicine, 1300 Morris Park Ave, Bronx, NY 10461, USA
|
43
|
Abstract
We developed and tested RAPTOR++ in CASP8 for protein structure prediction. RAPTOR++ contains four modules: threading, model quality assessment, multiple protein alignment, and template-free modeling. RAPTOR++ first threads a target protein to all the templates using three methods and then predicts the quality of the 3D model implied by each alignment using a model quality assessment method. Based upon the predicted quality, RAPTOR++ employs different strategies as follows. If multiple alignments have good quality, RAPTOR++ builds a multiple protein alignment between the target and top templates and then generates a 3D model using MODELLER. If all the alignments have very low quality, RAPTOR++ uses template-free modeling. Otherwise, RAPTOR++ submits a threading-generated 3D model with the best quality. RAPTOR++ was not ready for the first 1/3 targets and was under development during the whole CASP8 season. The template-based and template-free modeling modules in RAPTOR++ are not closely integrated. We are using our template-free modeling technique to refine template-based models.
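The three-way strategy selection described in this abstract can be sketched as a small decision function. The quality scale (0 to 1), the two thresholds, the reading of "multiple" as "at least two", and the name `choose_strategy` are all assumptions made for illustration; the abstract does not state how RAPTOR++ defines good or very low quality.

```python
from typing import Sequence


def choose_strategy(
    predicted_quality: Sequence[float],
    good: float = 0.6,   # hypothetical cutoff for a "good" alignment
    poor: float = 0.3,   # hypothetical cutoff for a "very low quality" alignment
) -> str:
    """Pick a modeling route from the predicted quality of each alignment."""
    n_good = sum(1 for q in predicted_quality if q >= good)
    if n_good >= 2:
        # Several good alignments: build a multiple protein alignment
        # and generate the 3D model from it.
        return "multiple-template modeling"
    if all(q < poor for q in predicted_quality):
        # Every alignment is very poor: fall back to template-free modeling.
        return "template-free modeling"
    # Otherwise submit the threading-generated model with the best quality.
    return "best single threading model"


print(choose_strategy([0.8, 0.7, 0.2]))
print(choose_strategy([0.1, 0.2]))
print(choose_strategy([0.8, 0.1]))
```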
Affiliation(s)
- Jinbo Xu
  - Toyota Technological Institute at Chicago, Illinois 60637, USA.
|
44
|
Arab S, Sadeghi M, Eslahchi C, Pezeshk H, Sheari A. A pairwise residue contact area-based mean force potential for discrimination of native protein structure. BMC Bioinformatics 2010; 11:16. [PMID: 20064218 PMCID: PMC2821318 DOI: 10.1186/1471-2105-11-16] [Citation(s) in RCA: 16] [Impact Index Per Article: 1.1] [Received: 06/13/2009] [Accepted: 01/09/2010] [Indexed: 11/21/2022] Open
Abstract
Background: An energy function that can distinguish a correct protein fold from incorrect ones is very important for protein structure prediction and protein folding. Knowledge-based mean force potentials are certainly the most popular type of interaction function for protein threading. They are derived from statistical analyses of interacting groups in experimentally determined protein structures. These potentials are developed at the atom or the amino acid level. Based on orientation-dependent contact area, a new type of knowledge-based mean force potential has been developed.
Results: We developed a new approach to calculate a knowledge-based potential of mean force, using pairwise residue contact area. To test the performance of our approach, we applied it to several decoy sets to measure its ability to discriminate native structure from decoys. This potential was able to distinguish native structures from the decoys in most cases. Further, the calculated Z-scores were quite high for all protein datasets.
Conclusions: This knowledge-based potential of mean force can be used in protein structure prediction, fold recognition, comparative modelling and molecular recognition. The program is available at http://www.bioinf.cs.ipm.ac.ir/softwares/surfield
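The Z-score mentioned in this abstract is commonly computed as the separation between the native energy and the decoy energy distribution. The sketch below assumes one common convention (lower energy is better, Z is positive when the native lies below the decoy mean, population standard deviation); the abstract does not say which convention the authors used, and the energies are hypothetical.

```python
import statistics
from typing import Sequence


def native_z_score(native_energy: float, decoy_energies: Sequence[float]) -> float:
    """Separation of the native from the decoys in units of the decoy spread.

    Assumed convention: larger positive values mean better discrimination.
    """
    mean = statistics.mean(decoy_energies)
    spread = statistics.pstdev(decoy_energies)  # population standard deviation
    return (mean - native_energy) / spread


# Hypothetical energies in arbitrary units.
z = native_z_score(-10.0, [-4.0, -2.0, 0.0, 2.0, 4.0])
print(round(z, 3))
```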
Affiliation(s)
- Shahriar Arab
  - Department of Bioinformatics, Institute of Biochemistry and Biophysics, University of Tehran, Tehran, Iran
|
45
|
Bahadur RP, Chakrabarti P. Discriminating the native structure from decoys using scoring functions based on the residue packing in globular proteins. BMC Struct Biol 2009; 9:76. [PMID: 20038291 PMCID: PMC2809062 DOI: 10.1186/1472-6807-9-76] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.5] [Received: 07/11/2009] [Accepted: 12/28/2009] [Indexed: 11/14/2022]
Abstract
Background: Setting the rules for the identification of a stable conformation of a protein is of utmost importance for the efficient generation of structures in computer simulation. For structure prediction, a considerable number of possible models are generated from which the best model has to be selected.
Results: Two scoring functions, Rs and Rp, based on the consideration of packing of residues, which indicate if the conformation of an amino acid sequence is native-like, are presented. These are defined using the solvent accessible surface area (ASA) and the partner number (PN) (other residues that are within 4.5 Å) of a particular residue. The two functions evaluate the deviation from the average packing properties (ASA or PN) of all residues in a polypeptide chain corresponding to a model of its three-dimensional structure. While simple in concept and computationally less intensive, both functions are at least as efficient as any other energy function in discriminating the native structure from decoys in a large number of standard decoy sets, as well as on models submitted for the targets of CASP7. Rs appears to be slightly more effective than Rp, as determined by the number of times the native structure possesses the minimum value for the function and its separation from the average value for the decoys.
Conclusion: Two parameters, Rs and Rp, are discussed that can very efficiently recognize the native fold for a sequence from an ensemble of decoy structures. Unlike many other algorithms that rely on the use of a composite scoring function, these are based on a single parameter, viz., the accessible surface area (or the number of residues in contact), but are still able to capture the essential attribute of the native fold.
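The partner number (PN) used in this abstract, the count of other residues within 4.5 Å of a given residue, can be sketched as follows. The abstract does not specify which atoms define the distance, so this example assumes two residues are partners when any pair of their atoms lies within the cutoff; the coordinates and the name `partner_numbers` are hypothetical, and the Rs and Rp functions themselves are not reproduced here.

```python
import math
from typing import List, Sequence, Tuple

Atom = Tuple[float, float, float]  # Cartesian coordinates in angstroms


def partner_numbers(residues: Sequence[Sequence[Atom]], cutoff: float = 4.5) -> List[int]:
    """Number of partner residues for each residue.

    Assumption: residues i and j are partners if any atom of i is within
    `cutoff` angstroms of any atom of j.
    """
    counts = [0] * len(residues)
    for i in range(len(residues)):
        for j in range(i + 1, len(residues)):
            in_contact = any(
                math.dist(a, b) <= cutoff
                for a in residues[i]
                for b in residues[j]
            )
            if in_contact:
                counts[i] += 1
                counts[j] += 1
    return counts


# Toy chain: one pseudo-atom per residue, placed along the x axis.
toy = [[(0.0, 0.0, 0.0)], [(3.0, 0.0, 0.0)], [(6.0, 0.0, 0.0)], [(20.0, 0.0, 0.0)]]
pn = partner_numbers(toy)
print(pn)
```

The all-pairs loop is quadratic in the number of atoms; a spatial grid or k-d tree would be the usual choice for full-size proteins.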
Affiliation(s)
- Ranjit Prasad Bahadur
  - Department of Biochemistry, Bose Institute, P-1/12 CIT Scheme VIIM, Calcutta 700 054, India
  - Current address: Department of Biotechnology, Indian Institute of Technology, Kharagpur 721302, West Bengal, India
- Pinak Chakrabarti
  - Department of Biochemistry, Bose Institute, P-1/12 CIT Scheme VIIM, Calcutta 700 054, India
|
46
|
Májek P, Elber R. A coarse-grained potential for fold recognition and molecular dynamics simulations of proteins. Proteins 2009; 76:822-36. [PMID: 19291741 DOI: 10.1002/prot.22388] [Citation(s) in RCA: 43] [Impact Index Per Article: 2.9] [Indexed: 11/11/2022]
Abstract
A coarse-grained potential for protein simulations and fold ranking is presented. The potential is based on a two-point model of individual amino acids and a specific implementation of hydrogen bonding. Parameters are determined for distance-dependent pair interactions, pseudo bonds, angles, and torsions. A scaling factor for a hydrogen bonding term is also determined. Iterative sampling for 4867 proteins reproduces distributions of internal coordinates and distances observed in the Protein Data Bank. The adjustment of the potential and resampling are in the spirit of the generalized ensemble approach. No native structure information (e.g., secondary structure) is used in the calculation of the potential or in the simulation of a particular protein. The potential is subject to two tests as follows: (i) simulations of 956 globular proteins in the neighborhood of their native folds (these proteins were not used in the training set) and (ii) discrimination between native and decoy structures for 2470 proteins with 305,000 decoys and the "Decoys 'R' Us" dataset. In the first test, 58% of tested proteins stay within 5 Å of the native fold in Molecular Dynamics simulations of more than 20 nanoseconds using the new potential. The potential is also useful in differentiating between correct and approximate folds, providing significant signal for structure prediction algorithms. Sampling with the potential consistently regenerates the distribution of distances and internal coordinates it learned. Nevertheless, during Molecular Dynamics simulations structures are found that reproduce the learned distributions but are far from the native fold.
Affiliation(s)
- Peter Májek
  - Department of Computer Science, Upson Hall 4130, Cornell University, Ithaca, New York 14853-7501, USA
|
47
|
Gao X, Xu J, Li SC, Li M. Predicting local quality of a sequence-structure alignment. J Bioinform Comput Biol 2009; 7:789-810. [PMID: 19785046 DOI: 10.1142/s0219720009004345] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.2] [Received: 01/16/2009] [Revised: 04/06/2009] [Accepted: 04/07/2009] [Indexed: 11/18/2022]
Abstract
Although protein structure prediction has made great progress in recent years, a protein model derived from automated prediction methods is subject to various errors. As methods for structure prediction develop, a continuing problem is how to evaluate the quality of a protein model, especially to identify some well-predicted regions of the model, so that the structural biology community can benefit from the automated structure prediction. It is also important to identify badly-predicted regions in a model so that some refinement measurements can be applied to it. We present two complementary techniques, FragQA and PosQA, to accurately predict local quality of a sequence-structure (i.e. sequence-template) alignment generated by comparative modeling (i.e. homology modeling and threading). FragQA and PosQA predict local quality from two different perspectives. Different from existing methods, FragQA directly predicts cRMSD between a continuously aligned fragment determined by an alignment and the corresponding fragment in the native structure, while PosQA predicts the quality of an individual aligned position. Both FragQA and PosQA use an SVM (Support Vector Machine) regression method to perform prediction using similar information extracted from a single given alignment. Experimental results demonstrate that FragQA performs well on predicting local fragment quality, and PosQA outperforms two top-notch methods, ProQres and ProQprof. Our results indicate that (1) local quality can be predicted well; (2) local sequence evolutionary information (i.e. sequence similarity) is the major factor in predicting local quality; and (3) structural information such as solvent accessibility and secondary structure helps to improve the prediction performance.
Affiliation(s)
- Xin Gao
  - David R. Cheriton School of Computer Science, University of Waterloo, 200 University Avenue West, Waterloo, Ontario, N2L 3G1, Canada.
|
48
|
Ma J. Explicit orientation dependence in empirical potentials and its significance to side-chain modeling. Acc Chem Res 2009; 42:1087-96. [PMID: 19445451 DOI: 10.1021/ar900009e] [Citation(s) in RCA: 15] [Impact Index Per Article: 1.0] [Indexed: 11/28/2022]
Abstract
Protein structure modeling and prediction have important applications throughout the biological sciences, from the design of pharmaceuticals to the elucidation of enzyme mechanisms. At the core of most protein modeling is an energy function, the minimum of which represents the free energy "cost" for forming a correct protein structure. The most commonly used energy functions are knowledge-based statistical potential functions; that is, they are empirically derived from statistical analysis of a set of high-resolution protein structures. When that kind of potential function is constructed, the anisotropic orientation dependence between the interacting groups is a critical component for accurately representing key molecular interactions, such as those involved in protein side-chain packing. In the literature, however, many potential functions are limited in their ability to describe orientation dependence. In all-atom potentials, they typically ignore heterogeneous chemical-bond connectivity. In coarse-grained potentials, such as (semi)-residue-based potentials, the simplified representation of residues often reduces the sensitivity of the potential to side-chain orientation. Recently, in an effort to maximally capture the orientation dependence in side-chain interactions, a new type of all-atom statistical potential was developed: OPUS-PSP (potential derived from side-chain packing). The key feature of this potential is its explicit description of orientation dependence in molecular interactions, which is achieved with a basis set of 19 rigid-body blocks extracted from the chemical structures of 20 amino acid residues. This basis set is specifically designed to maximally capture the essential elements of orientation dependence in molecular packing interactions. The potential is constructed from the orientation-specific packing statistics of pairs of those blocks in a nonredundant structural database. On decoy set tests, OPUS-PSP significantly outperforms most of the existing knowledge-based potentials in terms of both its ability to recognize native structures and its consistency in achieving high Z scores across decoy sets. The application of OPUS-PSP to conformational modeling of side chains has led to another method, called OPUS-Rota. In terms of combined speed and accuracy, OPUS-Rota outperforms all of the other methods in modeling side-chain conformation. In this Account, we briefly outline the basic scheme of the OPUS-PSP potential and its application to side-chain modeling via OPUS-Rota. Future perspectives on the modeling of orientation dependence are also discussed. The computer programs for OPUS-PSP and OPUS-Rota can be downloaded at http://sigler.bioch.bcm.tmc.edu/MaLab. They are free for academic users.
Affiliation(s)
- Jianpeng Ma
- Department of Biochemistry and Molecular Biology, Baylor College of Medicine, One Baylor Plaza, Houston, Texas 77030, and Department of Bioengineering, Rice University, Houston, Texas 77005
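The decoy-set evaluation mentioned in the abstract above can be illustrated with a minimal sketch. It assumes one common sign convention, in which the Z score is the mean decoy energy minus the native energy, divided by the standard deviation of the decoy energies, so that a larger value means better discrimination. The abstract does not state which convention OPUS-PSP uses, and the function names and energy values here are hypothetical.

```python
from statistics import mean, pstdev


def native_z_score(native_energy, decoy_energies):
    """Return (mean decoy energy - native energy) / decoy standard deviation.

    Assumed convention: larger values mean the native structure is better
    separated from the decoys. This is not taken from the OPUS-PSP paper.
    """
    if len(decoy_energies) < 2:
        raise ValueError("need at least two decoy energies")
    spread = pstdev(decoy_energies)
    if spread == 0:
        raise ValueError("decoy energies are all identical")
    return (mean(decoy_energies) - native_energy) / spread


def native_rank(native_energy, decoy_energies):
    """Rank of the native structure; 1 means it has the lowest energy."""
    return 1 + sum(1 for energy in decoy_energies if energy < native_energy)


# Hypothetical energies in arbitrary units, for illustration only.
decoys = [-2.0, 0.0, 2.0]
print(native_z_score(-10.0, decoys))
print(native_rank(-10.0, decoys))
```

A potential "recognizes" the native structure in this sense when the rank is 1, and the Z score indicates how wide the margin is.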
|
49
|
Maupetit J, Tuffery P, Derreumaux P. A coarse-grained protein force field for folding and structure prediction. Proteins 2009; 69:394-408. [PMID: 17600832 DOI: 10.1002/prot.21505] [Citation(s) in RCA: 164] [Impact Index Per Article: 10.9] [Indexed: 12/22/2022]
Abstract
We have revisited the protein coarse-grained optimized potential for efficient structure prediction (OPEP). The training and validation sets consist of 13 and 16 protein targets. Because optimization depends on details of how the ensemble of decoys is sampled, trial conformations are generated by molecular dynamics, threading, greedy, and Monte Carlo simulations, or taken from publicly available databases. The OPEP parameters are varied by a genetic algorithm using a scoring function which requires that the native structure has the lowest energy, and the native-like structures have energy higher than the native structure but lower than the remote conformations. Overall, we find that OPEP correctly identifies 24 native or native-like states for 29 targets and has very similar capability to the all-atom discrete optimized protein energy model (DOPE), found recently to outperform five currently used energy models.
Affiliation(s)
- Julien Maupetit
- Equipe de Bioinformatique Génomique et Moléculaire, INSERM E0346, Université Paris 7, Tour 53-54, 2 place Jussieu, 75251 Paris, Cedex 05, France
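The energy ordering required by the scoring function in the abstract above (native lowest, native-like structures next, remote conformations highest) can be sketched as a simple check. This is only an illustration of the stated criterion, not the OPEP scoring function itself, whose form the abstract does not give. All names and energies below are hypothetical.

```python
def ordering_satisfied(native, native_like, remote):
    """Check that native < every native-like energy < every remote energy."""
    if not native_like or not remote:
        raise ValueError("need at least one native-like and one remote energy")
    native_is_lowest = native < min(native_like) and native < min(remote)
    native_like_below_remote = max(native_like) < min(remote)
    return native_is_lowest and native_like_below_remote


def count_satisfied(targets):
    """Count targets whose (native, native_like, remote) energies are ordered."""
    return sum(1 for target in targets if ordering_satisfied(*target))


# Hypothetical energies for three targets, in arbitrary units.
targets = [
    (-5.0, [-4.0, -3.5], [-1.0, 0.0]),  # ordering holds
    (-5.0, [-4.0, 0.5], [-1.0]),        # a native-like decoy is above a remote one
    (-2.0, [-4.0], [-1.0]),             # native is not the lowest
]
print(count_satisfied(targets), "of", len(targets), "targets satisfy the ordering")
```

A count of this kind could serve as the fitness that a genetic algorithm tries to increase while varying the force-field parameters, although the fitness actually used by the authors may differ.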
|
50
|
Sun J, Wu D, Xu T, Wang X, Xu X, Tao L, Li YX, Cao ZW. SEPPA: a computational server for spatial epitope prediction of protein antigens. Nucleic Acids Res 2009; 37:W612-6. [PMID: 19465377 PMCID: PMC2703964 DOI: 10.1093/nar/gkp417] [Citation(s) in RCA: 107] [Impact Index Per Article: 7.1] [Indexed: 11/16/2022] Open
Abstract
In recent years, considerable effort has gone into conformational epitope prediction, because antigen proteins usually bind antibodies through an assembly of sequentially discontinuous and structurally compact surface residues. Currently, only a few methods for spatial epitope prediction are available, and they focus on single-residue propensity scales or on clustering of continuous segments. SEPPA introduces the concept of a ‘unit patch of residue triangle’ to better describe the local spatial context on the protein surface. In addition, SEPPA incorporates a clustering coefficient to describe the spatial compactness of surface residues. Validated on independent testing datasets, SEPPA gave an average AUC value over 0.742 and produced a successful pick-up rate of 96.64%. Compared with its peers, SEPPA shows significant improvement over other popular methods such as CEP, DiscoTope and BEpro. In addition, the threshold scores for given levels of accuracy, sensitivity and specificity are provided online to indicate the confidence level of the spatial epitope identification. The web server can be accessed at http://lifecenter.sgst.cn/seppa/index.php. Batch query is supported.
Affiliation(s)
- Jing Sun
- Department of Biomedical Engineering, School of Life Sciences and Technology, Tongji University, Shanghai 200092, China
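The clustering coefficient mentioned in the abstract above can be illustrated with the standard local clustering coefficient from graph theory: for a residue with k neighbors, it is the fraction of the k(k-1)/2 possible neighbor pairs that are themselves in contact. Whether SEPPA uses exactly this definition is an assumption, and the contact map and residue labels below are hypothetical.

```python
from itertools import combinations


def clustering_coefficient(contacts, residue):
    """Local clustering coefficient of one residue in a contact graph.

    `contacts` maps each residue label to the set of residues it touches.
    Residues with fewer than two neighbors are given a coefficient of 0.0.
    """
    neighbors = contacts.get(residue, set())
    k = len(neighbors)
    if k < 2:
        return 0.0
    linked_pairs = sum(
        1
        for a, b in combinations(sorted(neighbors), 2)
        if b in contacts.get(a, set())
    )
    return 2.0 * linked_pairs / (k * (k - 1))


# Hypothetical symmetric contact map for four surface residues.
contacts = {
    "A": {"B", "C", "D"},
    "B": {"A", "C"},
    "C": {"A", "B"},
    "D": {"A"},
}
for residue in sorted(contacts):
    print(residue, clustering_coefficient(contacts, residue))
```

Values near 1 indicate a tightly packed local neighborhood, which is the kind of spatial compactness the abstract associates with epitope residues.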
|