1
|
Kaushik R, Zhang KYJ. ProFitFun: a protein tertiary structure fitness function for quantifying the accuracies of model structures. Bioinformatics 2022; 38:369-376. [PMID: 34542606 DOI: 10.1093/bioinformatics/btab666] [Citation(s) in RCA: 4] [Impact Index Per Article: 2.0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/23/2021] [Revised: 09/06/2021] [Accepted: 09/16/2021] [Indexed: 02/03/2023] Open
Abstract
MOTIVATION An accurate estimation of the quality of protein model structures typifies as a cornerstone in protein structure prediction regimes. Despite the recent groundbreaking success in the field of protein structure prediction, there are certain prospects for the improvement in model quality estimation at multiple stages of protein structure prediction and thus, to further push the prediction accuracy. Here, a novel approach, named ProFitFun, for assessing the quality of protein models is proposed by harnessing the sequence and structural features of experimental protein structures in terms of the preferences of backbone dihedral angles and relative surface accessibility of their amino acid residues at the tripeptide level. The proposed approach leverages upon the backbone dihedral angle and surface accessibility preferences of the residues by accounting for its N-terminal and C-terminal neighbors in the protein structure. These preferences are used to evaluate protein structures through a machine learning approach and tested on an extensive dataset of diverse proteins. RESULTS The approach was extensively validated on a large test dataset (n = 25 005) of protein structures, comprising 23 661 models of 82 non-homologous proteins and 1344 non-homologous experimental structures. In addition, an external dataset of 40 000 models of 200 non-homologous proteins was also used for the validation of the proposed method. Both datasets were further used for benchmarking the proposed method with four different state-of-the-art methods for protein structure quality assessment. In the benchmarking, the proposed method outperformed some state-of-the-art methods in terms of Spearman's and Pearson's correlation coefficients, average GDT-TS loss, sum of z-scores and average absolute difference of predictions over corresponding observed values. The high accuracy of the proposed approach promises a potential use of the sequence and structural features in computational protein design. AVAILABILITY AND IMPLEMENTATION http://github.com/KYZ-LSB/ProTerS-FitFun. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
Collapse
Affiliation(s)
- Rahul Kaushik
- Laboratory for Structural Bioinformatics, Center for Biosystems Dynamics Research, RIKEN, Yokohama, Kanagawa 230-0045, Japan
| | - Kam Y J Zhang
- Laboratory for Structural Bioinformatics, Center for Biosystems Dynamics Research, RIKEN, Yokohama, Kanagawa 230-0045, Japan
| |
Collapse
|
2
|
Kukimoto-Niino M, Katsura K, Kaushik R, Ehara H, Yokoyama T, Uchikubo-Kamo T, Nakagawa R, Mishima-Tsumagari C, Yonemochi M, Ikeda M, Hanada K, Zhang KYJ, Shirouzu M. Cryo-EM structure of the human ELMO1-DOCK5-Rac1 complex. SCIENCE ADVANCES 2021; 7:7/30/eabg3147. [PMID: 34290093 PMCID: PMC8294757 DOI: 10.1126/sciadv.abg3147] [Citation(s) in RCA: 9] [Impact Index Per Article: 3.0] [Reference Citation Analysis] [Abstract] [Grants] [Track Full Text] [Subscribe] [Scholar Register] [Received: 12/28/2020] [Accepted: 06/03/2021] [Indexed: 05/28/2023]
Abstract
The dedicator of cytokinesis (DOCK) family of guanine nucleotide exchange factors (GEFs) promotes cell motility, phagocytosis, and cancer metastasis through activation of Rho guanosine triphosphatases. Engulfment and cell motility (ELMO) proteins are binding partners of DOCK and regulate Rac activation. Here, we report the cryo-electron microscopy structure of the active ELMO1-DOCK5 complex bound to Rac1 at 3.8-Å resolution. The C-terminal region of ELMO1, including the pleckstrin homology (PH) domain, aids in the binding of the catalytic DOCK homology region 2 (DHR-2) domain of DOCK5 to Rac1 in its nucleotide-free state. A complex α-helical scaffold between ELMO1 and DOCK5 stabilizes the binding of Rac1. Mutagenesis studies revealed that the PH domain of ELMO1 enhances the GEF activity of DOCK5 through specific interactions with Rac1. The structure provides insights into how ELMO modulates the biochemical activity of DOCK and how Rac selectivity is achieved by ELMO.
Collapse
Affiliation(s)
- Mutsuko Kukimoto-Niino
- RIKEN Center for Biosystems Dynamics Research, 1-7-22 Suehiro-cho, Tsurumi-ku, Yokohama, Kanagawa 230-0045, Japan
| | - Kazushige Katsura
- RIKEN Center for Biosystems Dynamics Research, 1-7-22 Suehiro-cho, Tsurumi-ku, Yokohama, Kanagawa 230-0045, Japan
| | - Rahul Kaushik
- RIKEN Center for Biosystems Dynamics Research, 1-7-22 Suehiro-cho, Tsurumi-ku, Yokohama, Kanagawa 230-0045, Japan
| | - Haruhiko Ehara
- RIKEN Center for Biosystems Dynamics Research, 1-7-22 Suehiro-cho, Tsurumi-ku, Yokohama, Kanagawa 230-0045, Japan
| | - Takeshi Yokoyama
- RIKEN Center for Biosystems Dynamics Research, 1-7-22 Suehiro-cho, Tsurumi-ku, Yokohama, Kanagawa 230-0045, Japan
- Graduate School of Life Sciences, Tohoku University, 2-1-1 Katahira, Aoba-ku, Sendai, Miyagi 980-8577, Japan
| | - Tomomi Uchikubo-Kamo
- RIKEN Center for Biosystems Dynamics Research, 1-7-22 Suehiro-cho, Tsurumi-ku, Yokohama, Kanagawa 230-0045, Japan
| | - Reiko Nakagawa
- RIKEN Center for Biosystems Dynamics Research, 2-2-3 Minatojima-minamimachi, Chuo-ku, Kobe, Hyogo 650-0047, Japan
| | - Chiemi Mishima-Tsumagari
- RIKEN Center for Biosystems Dynamics Research, 1-7-22 Suehiro-cho, Tsurumi-ku, Yokohama, Kanagawa 230-0045, Japan
| | - Mayumi Yonemochi
- RIKEN Center for Biosystems Dynamics Research, 1-7-22 Suehiro-cho, Tsurumi-ku, Yokohama, Kanagawa 230-0045, Japan
| | - Mariko Ikeda
- RIKEN Center for Biosystems Dynamics Research, 1-7-22 Suehiro-cho, Tsurumi-ku, Yokohama, Kanagawa 230-0045, Japan
| | - Kazuharu Hanada
- RIKEN Center for Biosystems Dynamics Research, 1-7-22 Suehiro-cho, Tsurumi-ku, Yokohama, Kanagawa 230-0045, Japan
| | - Kam Y J Zhang
- RIKEN Center for Biosystems Dynamics Research, 1-7-22 Suehiro-cho, Tsurumi-ku, Yokohama, Kanagawa 230-0045, Japan
| | - Mikako Shirouzu
- RIKEN Center for Biosystems Dynamics Research, 1-7-22 Suehiro-cho, Tsurumi-ku, Yokohama, Kanagawa 230-0045, Japan.
| |
Collapse
|
3
|
Zhao KL, Liu J, Zhou XG, Su JZ, Zhang Y, Zhang GJ. MMpred: a distance-assisted multimodal conformation sampling for de novo protein structure prediction. Bioinformatics 2021; 37:4350-4356. [PMID: 34185079 DOI: 10.1093/bioinformatics/btab484] [Citation(s) in RCA: 11] [Impact Index Per Article: 3.7] [Reference Citation Analysis] [Abstract] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/01/2021] [Revised: 06/22/2021] [Accepted: 06/28/2021] [Indexed: 11/12/2022] Open
Abstract
MOTIVATION The mathematically optimal solution in computational protein folding simulations does not always correspond to the native structure, due to the imperfection of the energy force fields. There is therefore a need to search for more diverse suboptimal solutions in order to identify the states close to the native. We propose a novel multimodal optimization protocol to improve the conformation sampling efficiency and modeling accuracy of de novo protein structure folding simulations. RESULTS A distance-assisted multimodal optimization sampling algorithm, MMpred, is proposed for de novo protein structure prediction. The protocol consists of three stages. In the first modal exploration stage, a structural similarity evaluation model DMscore is designed to control the diversity of conformations, generating a population of diverse structures in different low-energy basins. In the second modal maintaining stage, an adaptive clustering algorithm MNDcluster is proposed to divide the populations and merge the modal by adjusting the annealing temperature to locate the promising basins. In the last stage of modal exploitation, a greedy search strategy is used to accelerate the convergence of the modal. Distance constraint information is used to construct the conformation scoring model to guide sampling. MMpred is tested on 320 non-redundant proteins, where MMpred obtains models with TM-score ≥ 0.5 on 268 cases, which is 20.3% higher than that of Rosetta guided with the same distance constraints. In addition, on 320 benchmark proteins, the average TM-score of the enhanced version of MMpred (E-MMpred) is 0.732 on the best model, which is comparable to trRosetta (0.730). AVAILABILITY The source code and executable are freely available at https://github.com/iobio-zjut/MMpred. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
Collapse
Affiliation(s)
- Kai-Long Zhao
- College of Information Engineering, Zhejiang University of Technology, Hangzhou 310023, China
| | - Jun Liu
- College of Information Engineering, Zhejiang University of Technology, Hangzhou 310023, China
| | - Xiao-Gen Zhou
- Department of Computational Medicine and Bioinformatics, University of Michigan, 100 Washtenaw, Ann Arbor, MI 48109-2218, USA
| | - Jian-Zhong Su
- School of Biomedical Engineering, School of Ophthalmology and Optometry and Eye Hospital, Wenzhou Medical University, Wenzhou 325011, Zhejiang, China
| | - Yang Zhang
- Department of Computational Medicine and Bioinformatics, University of Michigan, 100 Washtenaw, Ann Arbor, MI 48109-2218, USA
| | - Gui-Jun Zhang
- College of Information Engineering, Zhejiang University of Technology, Hangzhou 310023, China
| |
Collapse
|
4
|
Tadepalli S, Akhter N, Barbara D, Shehu A. Anomaly Detection-Based Recognition of Near-Native Protein Structures. IEEE Trans Nanobioscience 2020; 19:562-570. [PMID: 32340957 DOI: 10.1109/tnb.2020.2990642] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/07/2022]
Abstract
The three-dimensional structures populated by a protein molecule determine to a great extent its biological activities. The rich information encoded by protein structure on protein function continues to motivate the development of computational approaches for determining functionally-relevant structures. The majority of structures generated in silico are not relevant. Discriminating relevant/native protein structures from non-native ones is an outstanding challenge in computational structural biology. Inherently, this is a recognition problem that can be addressed under the umbrella of machine learning. In this paper, based on the premise that near-native structures are effectively anomalies, we build on the concept of anomaly detection in machine learning. We propose methods that automatically select relevant subsets, as well as methods that select a single structure to offer as prediction. Evaluations are carried out on benchmark datasets and demonstrate that the proposed methods advance the state of the art. The presented results motivate further building on and adapting concepts and techniques from machine learning to improve recognition of near-native structures in protein structure prediction.
Collapse
|
5
|
Wu H, Huang H, Lu W, Fu Q, Ding Y, Qiu J, Li H. Ranking near-native candidate protein structures via random forest classification. BMC Bioinformatics 2019; 20:683. [PMID: 31874596 PMCID: PMC6929337 DOI: 10.1186/s12859-019-3257-8] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.2] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 01/12/2023] Open
Abstract
Background In ab initio protein-structure predictions, a large set of structural decoys are often generated, with the requirement to select best five or three candidates from the decoys. The clustered central structures with the most number of neighbors are frequently regarded as the near-native protein structures with the lowest free energy; however, limitations in clustering methods and three-dimensional structural-distance assessments make identifying exact order of the best five or three near-native candidate structures difficult. Results To address this issue, we propose a method that re-ranks the candidate structures via random forest classification using intra- and inter-cluster features from the results of the clustering. Comparative analysis indicated that our method was better able to identify the order of the candidate structures as comparing with current methods SPICKR, Calibur, and Durandal. The results confirmed that the identification of the first model were closer to the native structure in 12 of 43 cases versus four for SPICKER, and the same as the native structure in up to 27 of 43 cases versus 14 for Calibur and up to eight of 43 cases versus two for Durandal. Conclusions In this study, we presented an improved method based on random forest classification to transform the problem of re-ranking the candidate structures by an binary classification. Our results indicate that this method is a powerful method for the problem and the effect of this method is better than other methods.
Collapse
Affiliation(s)
- Hongjie Wu
- School of Electronic and Information Engineering, Suzhou University of Science and Technology, Suzhou, 215009, China
| | - Hongmei Huang
- School of Electronic and Information Engineering, Suzhou University of Science and Technology, Suzhou, 215009, China
| | - Weizhong Lu
- School of Electronic and Information Engineering, Suzhou University of Science and Technology, Suzhou, 215009, China.
| | - Qiming Fu
- School of Electronic and Information Engineering, Suzhou University of Science and Technology, Suzhou, 215009, China.,Jiangsu Province Key Laboratory of Intelligent Building Energy Efficiency, Suzhou University of Science and Technology, Suzhou, 215009, China
| | - Yijie Ding
- School of Electronic and Information Engineering, Suzhou University of Science and Technology, Suzhou, 215009, China
| | - Jing Qiu
- School of Electronic and Information Engineering, Suzhou University of Science and Technology, Suzhou, 215009, China
| | - Haiou Li
- School of Electronic and Information Engineering, Suzhou University of Science and Technology, Suzhou, 215009, China
| |
Collapse
|
6
|
Akhter N, Chennupati G, Kabir KL, Djidjev H, Shehu A. Unsupervised and Supervised Learning over theEnergy Landscape for Protein Decoy Selection. Biomolecules 2019; 9:E607. [PMID: 31615116 PMCID: PMC6843838 DOI: 10.3390/biom9100607] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.8] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/05/2019] [Revised: 10/03/2019] [Accepted: 10/04/2019] [Indexed: 11/17/2022] Open
Abstract
The energy landscape that organizes microstates of a molecular system and governs theunderlying molecular dynamics exposes the relationship between molecular form/structure, changesto form, and biological activity or function in the cell. However, several challenges stand in the wayof leveraging energy landscapes for relating structure and structural dynamics to function. Energylandscapes are high-dimensional, multi-modal, and often overly-rugged. Deep wells or basins inthem do not always correspond to stable structural states but are instead the result of inherentinaccuracies in semi-empirical molecular energy functions. Due to these challenges, energeticsis typically ignored in computational approaches addressing long-standing central questions incomputational biology, such as protein decoy selection. In the latter, the goal is to determine over apossibly large number of computationally-generated three-dimensional structures of a protein thosestructures that are biologically-active/native. In recent work, we have recast our attention on theprotein energy landscape and its role in helping us to advance decoy selection. Here, we summarizesome of our successes so far in this direction via unsupervised learning. More importantly, we furtheradvance the argument that the energy landscape holds valuable information to aid and advance thestate of protein decoy selection via novel machine learning methodologies that leverage supervisedlearning. Our focus in this article is on decoy selection for the purpose of a rigorous, quantitativeevaluation of how leveraging protein energy landscapes advances an important problem in proteinmodeling. However, the ideas and concepts presented here are generally useful to make discoveriesin studies aiming to relate molecular structure and structural dynamics to function.
Collapse
Affiliation(s)
- Nasrin Akhter
- Department of Computer Science, George Mason University, Fairfax, VA 22030, USA.
| | - Gopinath Chennupati
- Information Sciences (CCS-3) Group, Los Alamos National Laboratory, Los Alamos, NM 87545, USA.
| | - Kazi Lutful Kabir
- Department of Computer Science, George Mason University, Fairfax, VA 22030, USA.
| | - Hristo Djidjev
- Information Sciences (CCS-3) Group, Los Alamos National Laboratory, Los Alamos, NM 87545, USA.
| | - Amarda Shehu
- Department of Computer Science, George Mason University, Fairfax, VA 22030, USA.
- Center for Adaptive Human-Machine Partnership, George Mason University, Fairfax, VA 22030, USA.
- Department of Bioengineering, George Mason University, Fairfax, VA 22030, USA.
- School of Systems Biology, George Mason University, Fairfax, VA 22030, USA.
| |
Collapse
|
7
|
An Energy Landscape Treatment of Decoy Selection in Template-Free Protein Structure Prediction. COMPUTATION 2018. [DOI: 10.3390/computation6020039] [Citation(s) in RCA: 12] [Impact Index Per Article: 2.0] [Reference Citation Analysis] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 11/16/2022]
|
8
|
From Extraction of Local Structures of Protein Energy Landscapes to Improved Decoy Selection in Template-Free Protein Structure Prediction. Molecules 2018; 23:molecules23010216. [PMID: 29351266 PMCID: PMC6017496 DOI: 10.3390/molecules23010216] [Citation(s) in RCA: 25] [Impact Index Per Article: 4.2] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/08/2017] [Accepted: 12/11/2017] [Indexed: 11/17/2022] Open
Abstract
Due to the essential role that the three-dimensional conformation of a protein plays in regulating interactions with molecular partners, wet and dry laboratories seek biologically-active conformations of a protein to decode its function. Computational approaches are gaining prominence due to the labor and cost demands of wet laboratory investigations. Template-free methods can now compute thousands of conformations known as decoys, but selecting native conformations from the generated decoys remains challenging. Repeatedly, research has shown that the protein energy functions whose minima are sought in the generation of decoys are unreliable indicators of nativeness. The prevalent approach ignores energy altogether and clusters decoys by conformational similarity. Complementary recent efforts design protein-specific scoring functions or train machine learning models on labeled decoys. In this paper, we show that an informative consideration of energy can be carried out under the energy landscape view. Specifically, we leverage local structures known as basins in the energy landscape probed by a template-free method. We propose and compare various strategies of basin-based decoy selection that we demonstrate are superior to clustering-based strategies. The presented results point to further directions of research for improving decoy selection, including the ability to properly consider the multiplicity of native conformations of proteins.
Collapse
|
9
|
Simoncini D, Schiex T, Zhang KYJ. Balancing exploration and exploitation in population-based sampling improves fragment-based de novo protein structure prediction. Proteins 2017; 85:852-858. [PMID: 28066917 DOI: 10.1002/prot.25244] [Citation(s) in RCA: 19] [Impact Index Per Article: 2.7] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/13/2016] [Revised: 11/29/2016] [Accepted: 12/18/2016] [Indexed: 01/17/2023]
Abstract
Conformational search space exploration remains a major bottleneck for protein structure prediction methods. Population-based meta-heuristics typically enable the possibility to control the search dynamics and to tune the balance between local energy minimization and search space exploration. EdaFold is a fragment-based approach that can guide search by periodically updating the probability distribution over the fragment libraries used during model assembly. We implement the EdaFold algorithm as a Rosetta protocol and provide two different probability update policies: a cluster-based variation (EdaRosec ) and an energy-based one (EdaRoseen ). We analyze the search dynamics of our new Rosetta protocols and show that EdaRosec is able to provide predictions with lower C αRMSD to the native structure than EdaRoseen and Rosetta AbInitio Relax protocol. Our software is freely available as a C++ patch for the Rosetta suite and can be downloaded from http://www.riken.jp/zhangiru/software/. Our protocols can easily be extended in order to create alternative probability update policies and generate new search dynamics. Proteins 2017; 85:852-858. © 2016 Wiley Periodicals, Inc.
Collapse
Affiliation(s)
- David Simoncini
- INRA MIAT, UR 875, Castanet-Tolosan Cedex, 31326, France.,Structural Bioinformatics Team, Division of Structural and Synthetic Biology, Center for Life Science Technologies, RIKEN, 1-7-22 Suehiro, Yokohama, Kanagawa, 230-0045, Japan
| | - Thomas Schiex
- INRA MIAT, UR 875, Castanet-Tolosan Cedex, 31326, France
| | - Kam Y J Zhang
- Structural Bioinformatics Team, Division of Structural and Synthetic Biology, Center for Life Science Technologies, RIKEN, 1-7-22 Suehiro, Yokohama, Kanagawa, 230-0045, Japan
| |
Collapse
|
10
|
Simoncini D, Nakata H, Ogata K, Nakamura S, Zhang KY. Quality Assessment of Predicted Protein Models Using Energies Calculated by the Fragment Molecular Orbital Method. Mol Inform 2015; 34:97-104. [PMID: 27490032 DOI: 10.1002/minf.201400108] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.3] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/01/2014] [Accepted: 10/13/2014] [Indexed: 12/12/2022]
Abstract
Protein structure prediction directly from sequences is a very challenging problem in computational biology. One of the most successful approaches employs stochastic conformational sampling to search an empirically derived energy function landscape for the global energy minimum state. Due to the errors in the empirically derived energy function, the lowest energy conformation may not be the best model. We have evaluated the use of energy calculated by the fragment molecular orbital method (FMO energy) to assess the quality of predicted models and its ability to identify the best model among an ensemble of predicted models. The fragment molecular orbital method implemented in GAMESS was used to calculate the FMO energy of predicted models. When tested on eight protein targets, we found that the model ranking based on FMO energies is better than that based on empirically derived energies when there is sufficient diversity among these models. This model diversity can be estimated prior to the FMO energy calculations. Our result demonstrates that the FMO energy calculated by the fragment molecular orbital method is a practical and promising measure for the assessment of protein model quality and the selection of the best protein model among many generated.
Collapse
Affiliation(s)
- David Simoncini
- Structural Bioinformatics Team, Division of Structural and Synthetic Biology, Center for Life Science Technologies, RIKEN, 1-7-22 Suehiro, Yokohama, Kanagawa 230-0045, Japan phone: +81(0)45-503-9560/fax: +81(0)45-503-9559.,Present address: Mathématiques et Informatique Appliquées de Toulouse, Unité de Recherche 875, Institut National de la Recherche Agronomique, F-31320 Castanet-Tolosan, France
| | - Hiroya Nakata
- RIKEN Research Cluster for Innovation, 2-1 Hirosawa, Wako, Saitama 351-0198 Japan phone/fax: +81(0)48-467-9477/+81(0)48-467-8503.,Department of Biomolecular Engineering, Tokyo Institute of Technology, 4259 Nagatsutacho, Midori-ku, Yokohama, Kanagawa 226-8501, Japan.,Japan Society for the Promotion of Science, Kojimachi Business Center Building, 5-3-1 Kojimachi, Chiyoda-ku, Tokyo 102-0083, Japan
| | - Koji Ogata
- RIKEN Research Cluster for Innovation, 2-1 Hirosawa, Wako, Saitama 351-0198 Japan phone/fax: +81(0)48-467-9477/+81(0)48-467-8503
| | - Shinichiro Nakamura
- RIKEN Research Cluster for Innovation, 2-1 Hirosawa, Wako, Saitama 351-0198 Japan phone/fax: +81(0)48-467-9477/+81(0)48-467-8503.
| | - Kam Yj Zhang
- Structural Bioinformatics Team, Division of Structural and Synthetic Biology, Center for Life Science Technologies, RIKEN, 1-7-22 Suehiro, Yokohama, Kanagawa 230-0045, Japan phone: +81(0)45-503-9560/fax: +81(0)45-503-9559.
| |
Collapse
|
11
|
Sun HP, Huang Y, Wang XF, Zhang Y, Shen HB. Improving accuracy of protein contact prediction using balanced network deconvolution. Proteins 2015; 83:485-96. [PMID: 25524593 DOI: 10.1002/prot.24744] [Citation(s) in RCA: 21] [Impact Index Per Article: 2.3] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/31/2014] [Revised: 11/20/2014] [Accepted: 12/02/2014] [Indexed: 12/28/2022]
Abstract
Residue contact map is essential for protein three-dimensional structure determination. But most of the current contact prediction methods based on residue co-evolution suffer from high false-positives as introduced by indirect and transitive contacts (i.e., residues A-B and B-C are in contact, but A-C are not). Built on the work by Feizi et al. (Nat Biotechnol 2013; 31:726-733), which demonstrated a general network model to distinguish direct dependencies by network deconvolution, this study presents a new balanced network deconvolution (BND) algorithm to identify optimized dependency matrix without limit on the eigenvalue range in the applied network systems. The algorithm was used to filter contact predictions of five widely used co-evolution methods. On the test of proteins from three benchmark datasets of the 9th critical assessment of protein structure prediction (CASP9), CASP10, and PSICOV (precise structural contact prediction using sparse inverse covariance estimation) database experiments, the BND can improve the medium- and long-range contact predictions at the L/5 cutoff by 55.59% and 47.68%, respectively, without additional central processing unit cost. The improvement is statistically significant, with a P-value < 5.93 × 10(-3) in the Student's t-test. A further comparison with the ab initio structure predictions in CASPs showed that the usefulness of the current co-evolution-based contact prediction to the three-dimensional structure modeling relies on the number of homologous sequences existing in the sequence databases. BND can be used as a general contact refinement method, which is freely available at: http://www.csbio.sjtu.edu.cn/bioinf/BND/.
Collapse
Affiliation(s)
- Hai-Ping Sun
- Institute of Image Processing and Pattern Recognition, Shanghai Jiao Tong University, Shanghai, 200240, China; Key Laboratory of System Control and Information Processing, Ministry of Education of China, Shanghai, 200240, China
| | | | | | | | | |
Collapse
|
12
|
Shrestha R, Zhang KYJ. A fragmentation and reassembly method for ab initio phasing. ACTA ACUST UNITED AC 2015; 71:304-12. [PMID: 25664740 DOI: 10.1107/s1399004714025449] [Citation(s) in RCA: 14] [Impact Index Per Article: 1.6] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/24/2014] [Accepted: 11/20/2014] [Indexed: 11/10/2022]
Abstract
Ab initio phasing with de novo models has become a viable approach for structural solution from protein crystallographic diffraction data. This approach takes advantage of the known protein sequence information, predicts de novo models and uses them for structure determination by molecular replacement. However, even the current state-of-the-art de novo modelling method has a limit as to the accuracy of the model predicted, which is sometimes insufficient to be used as a template for successful molecular replacement. A fragment-assembly phasing method has been developed that starts from an ensemble of low-accuracy de novo models, disassembles them into fragments, places them independently in the crystallographic unit cell by molecular replacement and then reassembles them into a whole structure that can provide sufficient phase information to enable complete structure determination by automated model building. Tests on ten protein targets showed that the method could solve structures for eight of these targets, although the predicted de novo models cannot be used as templates for successful molecular replacement since the best model for each target is on average more than 4.0 Å away from the native structure. The method has extended the applicability of the ab initio phasing by de novo models approach. The method can be used to solve structures when the best de novo models are still of low accuracy.
Collapse
Affiliation(s)
- Rojan Shrestha
- Structural Bioinformatics Team, Division of Structural and Synthetic Biology, Center for Life Science Technologies, RIKEN, Yokohama, Kanagawa 230-0045, Japan
| | - Kam Y J Zhang
- Structural Bioinformatics Team, Division of Structural and Synthetic Biology, Center for Life Science Technologies, RIKEN, Yokohama, Kanagawa 230-0045, Japan
| |
Collapse
|
13
|
Shrestha R, Zhang KYJ. Improving fragment quality for de novo structure prediction. Proteins 2014; 82:2240-52. [PMID: 24753351 DOI: 10.1002/prot.24587] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/28/2014] [Revised: 04/03/2014] [Accepted: 04/15/2014] [Indexed: 11/08/2022]
Abstract
De novo structure prediction can be defined as a search in conformational space under the guidance of an energy function. The most successful de novo structure prediction methods, such as Rosetta, assemble the fragments from known structures to reduce the search space. Therefore, the fragment quality is an important factor in structure prediction. In our study, a method is proposed to generate a new set of fragments from the lowest energy de novo models. These fragments were subsequently used to predict the next-round of models. In a benchmark of 30 proteins, the new set of fragments showed better performance when used to predict de novo structures. The lowest energy model predicted using our method was closer to native structure than Rosetta for 22 proteins. Following a similar trend, the best model among top five lowest energy models predicted using our method was closer to native structure than Rosetta for 20 proteins. In addition, our experiment showed that the C-alpha root mean square deviation was improved from 5.99 to 5.03 Å on average compared to Rosetta when the lowest energy models were picked as the best predicted models.
Collapse
Affiliation(s)
- Rojan Shrestha
- Zhang Initiative Research Unit, Institute Laboratories, RIKEN, 2-1 Hirosawa, Wako, Saitama, 351-0198, Japan; Department of Computational Biology, Graduate School of Frontier Sciences, The University of Tokyo, 5-1-5 Kashiwanoha, Kashiwa, Chiba, 277-0882, Japan
| | | |
Collapse
|
14
|
Simoncini D, Zhang KYJ. Efficient sampling in fragment-based protein structure prediction using an estimation of distribution algorithm. PLoS One 2013; 8:e68954. [PMID: 23935913 PMCID: PMC3723781 DOI: 10.1371/journal.pone.0068954] [Citation(s) in RCA: 21] [Impact Index Per Article: 1.9] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/18/2013] [Accepted: 06/07/2013] [Indexed: 11/19/2022] Open
Abstract
Fragment assembly is a powerful method of protein structure prediction that builds protein models from a pool of candidate fragments taken from known structures. Stochastic sampling is subsequently used to refine the models. The structures are first represented as coarse-grained models and then as all-atom models for computational efficiency. Many models have to be generated independently due to the stochastic nature of the sampling methods used to search for the global minimum in a complex energy landscape. In this paper we present EdaFold(AA), a fragment-based approach which shares information between the generated models and steers the search towards native-like regions. A distribution over fragments is estimated from a pool of low energy all-atom models. This iteratively-refined distribution is used to guide the selection of fragments during the building of models for subsequent rounds of structure prediction. The use of an estimation of distribution algorithm enabled EdaFold(AA) to reach lower energy levels and to generate a higher percentage of near-native models. [Formula: see text] uses an all-atom energy function and produces models with atomic resolution. We observed an improvement in energy-driven blind selection of models on a benchmark of EdaFold(AA) in comparison with the [Formula: see text] AbInitioRelax protocol.
Collapse
Affiliation(s)
- David Simoncini
- Zhang Initiative Research Unit, Institute Laboratories, RIKEN, Wako, Saitama, Japan
| | - Kam Y. J. Zhang
- Zhang Initiative Research Unit, Institute Laboratories, RIKEN, Wako, Saitama, Japan
- * E-mail:
| |
Collapse
|
15
|
Zhou J, Wishart DS. An improved method to detect correct protein folds using partial clustering. BMC Bioinformatics 2013; 14:11. [PMID: 23323835 PMCID: PMC3626854 DOI: 10.1186/1471-2105-14-11] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.1] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/06/2012] [Accepted: 12/13/2012] [Indexed: 11/23/2022] Open
Abstract
Background Structure-based clustering is commonly used to identify correct protein folds among candidate folds (also called decoys) generated by protein structure prediction programs. However, traditional clustering methods exhibit a poor runtime performance on large decoy sets. We hypothesized that a more efficient “partial“ clustering approach in combination with an improved scoring scheme could significantly improve both the speed and performance of existing candidate selection methods. Results We propose a new scheme that performs rapid but incomplete clustering on protein decoys. Our method detects structurally similar decoys (measured using either Cα RMSD or GDT-TS score) and extracts representatives from them without assigning every decoy to a cluster. We integrated our new clustering strategy with several different scoring functions to assess both the performance and speed in identifying correct or near-correct folds. Experimental results on 35 Rosetta decoy sets and 40 I-TASSER decoy sets show that our method can improve the correct fold detection rate as assessed by two different quality criteria. This improvement is significantly better than two recently published clustering methods, Durandal and Calibur-lite. Speed and efficiency testing shows that our method can handle much larger decoy sets and is up to 22 times faster than Durandal and Calibur-lite. Conclusions The new method, named HS-Forest, avoids the computationally expensive task of clustering every decoy, yet still allows superior correct-fold selection. Its improved speed, efficiency and decoy-selection performance should enable structure prediction researchers to work with larger decoy sets and significantly improve their ab initio structure prediction performance.
Collapse
Affiliation(s)
- Jianjun Zhou
- JHK Co., Ltd., 2049 Heping Road, Shenzhen, Guangdong 518010, China
| | | |
Collapse
|
16
|
Abstract
De novo protein structure prediction often generates a large population of candidates (models), and then selects near-native models through clustering. Existing structural model clustering methods are time consuming due to pairwise distance calculation between models. In this paper, we present a novel method for fast model clustering without losing the clustering accuracy. Instead of the commonly used pairwise root mean square deviation and TM-score values, we propose two new distance measures, Dscore1 and Dscore2, based on the comparison of the protein distance matrices for describing the difference and the similarity among models, respectively. The analysis indicates that both the correlation between Dscore1 and root mean square deviation and the correlation between Dscore2 and TM-score are high. Compared to the existing methods with calculation time quadratic to the number of models, our Dscore1-based clustering achieves a linearly time complexity while obtaining almost the same accuracy for near-native model selection. By using Dscore2 to select representatives of clusters, we can further improve the quality of the representatives with little increase in computing time. In addition, for large size (~500 k) models, we can give a fast data visualization based on the Dscore distribution in seconds to minutes. Our method has been implemented in a package named MUFOLD-CL, available at http://mufold.org/clustering.php.
Collapse
Affiliation(s)
- Jingfen Zhang
- Department of Computer Science and Christopher S. Bond Life Sciences Center, University of Missouri, Columbia, MO 65201, USA
| | | |
Collapse
|
17
|
Shrestha R, Simoncini D, Zhang KYJ. Error-estimation-guided rebuilding ofde novomodels increases the success rate ofab initiophasing. ACTA CRYSTALLOGRAPHICA SECTION D: BIOLOGICAL CRYSTALLOGRAPHY 2012; 68:1522-34. [DOI: 10.1107/s0907444912037961] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.5] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Received: 06/01/2012] [Accepted: 09/04/2012] [Indexed: 11/10/2022]
Abstract
Recent advancements in computational methods for protein-structure prediction have made it possible to generate the high-qualityde novomodels required forab initiophasing of crystallographic diffraction data using molecular replacement. Despite those encouraging achievements inab initiophasing usingde novomodels, its success is limited only to those targets for which high-qualityde novomodels can be generated. In order to increase the scope of targets to whichab initiophasing withde novomodels can be successfully applied, it is necessary to reduce the errors in thede novomodels that are used as templates for molecular replacement. Here, an approach is introduced that can identify and rebuild the residues with larger errors, which subsequently reduces the overall Cαroot-mean-square deviation (CA-RMSD) from the native protein structure. The error in a predicted model is estimated from the average pairwise geometric distance per residue computed among selected lowest energy coarse-grained models. This score is subsequently employed to guide a rebuilding process that focuses on more error-prone residues in the coarse-grained models. This rebuilding methodology has been tested on ten protein targets that were unsuccessful using previous methods. The average CA-RMSD of the coarse-grained models was improved from 4.93 to 4.06 Å. For those models with CA-RMSD less than 3.0 Å, the average CA-RMSD was improved from 3.38 to 2.60 Å. These rebuilt coarse-grained models were then converted into all-atom models and refined to produce improvedde novomodels for molecular replacement. Seven diffraction data sets were successfully phased using rebuiltde novomodels, indicating the improved quality of these rebuiltde novomodels and the effectiveness of the rebuilding process. Software implementing this method, calledMORPHEUS, can be downloaded from http://www.riken.jp/zhangiru/software.html.
Collapse
|
18
|
Simoncini D, Berenger F, Shrestha R, Zhang KYJ. A probabilistic fragment-based protein structure prediction algorithm. PLoS One 2012; 7:e38799. [PMID: 22829868 PMCID: PMC3400640 DOI: 10.1371/journal.pone.0038799] [Citation(s) in RCA: 38] [Impact Index Per Article: 3.2] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/09/2012] [Accepted: 05/10/2012] [Indexed: 11/23/2022] Open
Abstract
Conformational sampling is one of the bottlenecks in fragment-based protein structure prediction approaches. They generally start with a coarse-grained optimization where mainchain atoms and centroids of side chains are considered, followed by a fine-grained optimization with an all-atom representation of proteins. It is during this coarse-grained phase that fragment-based methods sample intensely the conformational space. If the native-like region is sampled more, the accuracy of the final all-atom predictions may be improved accordingly. In this work we present EdaFold, a new method for fragment-based protein structure prediction based on an Estimation of Distribution Algorithm. Fragment-based approaches build protein models by assembling short fragments from known protein structures. Whereas the probability mass functions over the fragment libraries are uniform in the usual case, we propose an algorithm that learns from previously generated decoys and steers the search toward native-like regions. A comparison with Rosetta AbInitio protocol shows that EdaFold is able to generate models with lower energies and to enhance the percentage of near-native coarse-grained decoys on a benchmark of proteins. The best coarse-grained models produced by both methods were refined into all-atom models and used in molecular replacement. All atom decoys produced out of EdaFold’s decoy set reach high enough accuracy to solve the crystallographic phase problem by molecular replacement for some test proteins. EdaFold showed a higher success rate in molecular replacement when compared to Rosetta. Our study suggests that improving low resolution coarse-grained decoys allows computational methods to avoid subsequent sampling issues during all-atom refinement and to produce better all-atom models. EdaFold can be downloaded from http://www.riken.jp/zhangiru/software/.
Collapse
Affiliation(s)
- David Simoncini
- Zhang Initiative Research Unit, Advanced Science Institute, RIKEN, Wako, Saitama, Japan
| | - Francois Berenger
- Zhang Initiative Research Unit, Advanced Science Institute, RIKEN, Wako, Saitama, Japan
| | - Rojan Shrestha
- Zhang Initiative Research Unit, Advanced Science Institute, RIKEN, Wako, Saitama, Japan
- Department of Computational Biology, Graduate School of Frontier Sciences, The University of Tokyo, Kashiwa, Chiba, Japan
| | - Kam Y. J. Zhang
- Zhang Initiative Research Unit, Advanced Science Institute, RIKEN, Wako, Saitama, Japan
- Department of Computational Biology, Graduate School of Frontier Sciences, The University of Tokyo, Kashiwa, Chiba, Japan
- * E-mail:
| |
Collapse
|
19
|
Harder T, Borg M, Boomsma W, Røgen P, Hamelryck T. Fast large-scale clustering of protein structures using Gauss integrals. Bioinformatics 2011; 28:510-5. [DOI: 10.1093/bioinformatics/btr692] [Citation(s) in RCA: 23] [Impact Index Per Article: 1.8] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/14/2022] Open
|
20
|
Berenger F, Shrestha R, Zhou Y, Simoncini D, Zhang KYJ. Durandal: fast exact clustering of protein decoys. J Comput Chem 2011; 33:471-4. [PMID: 22120171 DOI: 10.1002/jcc.21988] [Citation(s) in RCA: 20] [Impact Index Per Article: 1.5] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/20/2011] [Revised: 09/16/2011] [Accepted: 10/11/2011] [Indexed: 11/11/2022]
Abstract
In protein folding, clustering is commonly used as one way to identify the best decoy produced. Initializing the pairwise distance matrix for a large decoy set is computationally expensive. We have proposed a fast method that works even on large decoy sets. This method is implemented in a software called Durandal. Durandal has been shown to be consistently faster than other software performing fast exact clustering. In some cases, Durandal can even outperform the speed of an approximate method. Durandal uses the triangular inequality to accelerate exact clustering, without compromising the distance function. Recently, we have further enhanced the performance of Durandal by incorporating a Quaternion-based characteristic polynomial method that has increased the speed of Durandal between 13% and 27% compared with the previous version. Durandal source code is available under the GNU General Public License at http://www.riken.jp/zhangiru/software/durandal_released_qcp.tgz. Alternatively, a compiled version of Durandal is also distributed with the nightly builds of the Phenix (http://www.phenix-online.org/) crystallographic software suite (Adams et al., Acta Crystallogr Sect D 2010, 66, 213).
Collapse
Affiliation(s)
- Francois Berenger
- Zhang Initiative Research Unit, Advanced Science Institute, RIKEN, 2-1 Hirosawa, Wako, Saitama 351-0198, Japan
| | | | | | | | | |
Collapse
|