1
Heinzinger M, Rost B. Artificial Intelligence Learns Protein Prediction. Cold Spring Harb Perspect Biol 2024; 16:a041458. [PMID: 38858069 PMCID: PMC11368192 DOI: 10.1101/cshperspect.a041458] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/12/2024]
Abstract
From AlphaGo through Stable Diffusion to ChatGPT, the recent decade of exponential advances in artificial intelligence (AI) has been altering life. In parallel, advances in computational biology are beginning to decode the language of life: AlphaFold2 leaped forward in protein structure prediction, and protein language models (pLMs) replaced expertise and evolutionary information from multiple sequence alignments with information learned from recurring patterns in databases of billions of proteins without experimental annotations other than the amino acid sequences. None of those tools could have been developed 10 years ago; all will increase the wealth of experimental data and speed up the cycle from idea to proof. AI is affecting molecular and medical biology in giant steps, and the most important might be the leap toward more powerful protein design.
Affiliation(s)
- Michael Heinzinger
  - Technical University of Munich (TUM) School of Computation, Information and Technology (CIT), Bioinformatics and Computational Biology - i12, 85748 Garching/Munich, Germany
- Burkhard Rost
  - Technical University of Munich (TUM) School of Computation, Information and Technology (CIT), Bioinformatics and Computational Biology - i12, 85748 Garching/Munich, Germany
  - Institute for Advanced Study (TUM-IAS), 85748 Garching/Munich, Germany
  - TUM School of Life Sciences Weihenstephan (WZW), 85354 Freising, Germany
  - Department of Biochemistry and Molecular Biophysics, Columbia University, New York, New York 10032, USA
2
Zhai J, Wang W, Zhao R, Sun D, Lu D, Gong X. BDM: An Assessment Metric for Protein Complex Structure Models Based on Distance Difference Matrix. Interdiscip Sci 2024; 16:677-687. [PMID: 38536590 DOI: 10.1007/s12539-024-00622-1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/17/2023] [Revised: 02/07/2024] [Accepted: 02/17/2024] [Indexed: 09/19/2024]
Abstract
Protein complex structure prediction is an important problem in computational biology. While significant progress has been made for protein monomers, accurate evaluation of protein complexes remains challenging. Existing assessment methods in CASP lack dedicated metrics for evaluating complexes. DockQ, a widely used metric, has some limitations. In this study, we propose a novel metric called BDM (Based on Distance difference Matrix) for assessing predicted protein complex structures. Our approach utilizes a distance difference matrix derived from comparing real and predicted protein structures, establishing a linear correlation with Root Mean Square Deviation (RMSD). BDM overcomes limitations associated with receptor-ligand differentiation and eliminates the requirement for structure alignment, making it a more effective and efficient metric. Evaluation of BDM using CASP14 and CASP15 test sets demonstrates superior performance compared to the official CASP scoring. BDM provides accurate and reasonable assessments of predicted protein complexes. Wide adoption of BDM has the potential to advance protein complex structure prediction and facilitate related research across scientific domains. Code is available at http://mialab.ruc.edu.cn/BDMServer/.
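To illustrate why a distance difference matrix needs no structure alignment, the following minimal sketch compares the internal pairwise distances of a reference and a predicted structure. It is not the published BDM score, whose exact formula is not given in the abstract; the function names and the simple mean summary are assumptions made for illustration only.

```python
import math


def distance_matrix(coords):
    """Pairwise Euclidean distances between 3D points (e.g. C-alpha atoms)."""
    return [[math.dist(a, b) for b in coords] for a in coords]


def distance_difference_matrix(reference, predicted):
    """Absolute difference between the two internal distance matrices.

    Both inputs must list the same residues in the same order.
    """
    if len(reference) != len(predicted):
        raise ValueError("structures must contain the same number of points")
    d_ref = distance_matrix(reference)
    d_pred = distance_matrix(predicted)
    n = len(reference)
    return [[abs(d_ref[i][j] - d_pred[i][j]) for j in range(n)] for i in range(n)]


def mean_difference(ddm):
    """Hypothetical summary: mean over the upper triangle of the matrix."""
    n = len(ddm)
    if n < 2:
        return 0.0
    total = sum(ddm[i][j] for i in range(n) for j in range(i + 1, n))
    return total / (n * (n - 1) / 2)


if __name__ == "__main__":
    reference = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0)]
    # Rigidly rotated (90 degrees about z) and translated copy of the reference.
    moved = [(-y + 5.0, x + 5.0, z + 5.0) for x, y, z in reference]
    print(mean_difference(distance_difference_matrix(reference, moved)))
```

Because internal distances do not change under rotation or translation, the rigidly moved copy yields a difference of essentially zero without any superposition step, whereas RMSD would first require an optimal alignment.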
Affiliation(s)
- Jiaqi Zhai
  - Institute for Mathematical Sciences, Renmin University of China, Beijing, 100872, China
- Wenda Wang
  - Institute for Mathematical Sciences, Renmin University of China, Beijing, 100872, China
- Ranxi Zhao
  - Institute for Mathematical Sciences, Renmin University of China, Beijing, 100872, China
- Daiwen Sun
  - Institute for Mathematical Sciences, Renmin University of China, Beijing, 100872, China
- Da Lu
  - Institute for Mathematical Sciences, Renmin University of China, Beijing, 100872, China
- Xinqi Gong
  - Institute for Mathematical Sciences, Renmin University of China, Beijing, 100872, China
3
Zheng W, Wuyun Q, Freddolino PL, Zhang Y. Integrating deep learning, threading alignments, and a multi-MSA strategy for high-quality protein monomer and complex structure prediction in CASP15. Proteins 2023; 91:1684-1703. [PMID: 37650367 PMCID: PMC10840719 DOI: 10.1002/prot.26585] [Citation(s) in RCA: 14] [Impact Index Per Article: 14.0] [Received: 04/13/2023] [Revised: 08/04/2023] [Accepted: 08/14/2023] [Indexed: 09/01/2023]
Abstract
We report the results of the "UM-TBM" and "Zheng" groups in CASP15 for protein monomer and complex structure prediction. These prediction sets were obtained using the D-I-TASSER and DMFold-Multimer algorithms, respectively. For monomer structure prediction, D-I-TASSER introduced four new features during CASP15: (i) a multiple sequence alignment (MSA) generation protocol that combines multi-source MSA searching and a structural modeling-based MSA ranker; (ii) attention-network based spatial restraints; (iii) a multi-domain module containing domain partition and arrangement for domain-level templates and spatial restraints; (iv) an optimized I-TASSER-based folding simulation system for full-length model creation guided by a combination of deep learning restraints, threading alignments, and knowledge-based potentials. For 47 free modeling targets in CASP15, the final models predicted by D-I-TASSER showed average TM-score 19% higher than the standard AlphaFold2 program. We thus showed that traditional Monte Carlo-based folding simulations, when appropriately coupled with deep learning algorithms, can generate models with improved accuracy over end-to-end deep learning methods alone. For protein complex structure prediction, DMFold-Multimer generated models by integrating a new MSA generation algorithm (DeepMSA2) with the end-to-end modeling module from AlphaFold2-Multimer. For the 38 complex targets, DMFold-Multimer generated models with an average TM-score of 0.83 and Interface Contact Score of 0.60, both significantly higher than those of competing complex prediction tools. Our analyses on complexes highlighted the critical role played by MSA generating, ranking, and pairing in protein complex structure prediction. We also discuss future room for improvement in the areas of viral protein modeling and complex model ranking.
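The TM-scores quoted above are length-normalized similarity values between 0 and 1. As a rough illustration, the sketch below evaluates the standard TM-score formula for one fixed superposition, given the distances between aligned residue pairs. It does not perform the superposition search that TM-score programs use to maximize the score, and the 0.5 Å floor on d0 for short chains is an assumption modeled on common implementations; `d0` and `tm_score_fixed` are hypothetical names.

```python
def d0(target_length):
    """Length-dependent distance scale (in angstroms) of the TM-score."""
    if target_length <= 15:
        return 0.5  # assumption: floor used for very short chains
    return max(0.5, 1.24 * (target_length - 15) ** (1.0 / 3.0) - 1.8)


def tm_score_fixed(distances, target_length):
    """TM-score for one fixed superposition.

    distances: distance (angstroms) of each aligned residue pair.
    target_length: number of residues in the target (reference) structure.
    """
    if target_length <= 0:
        raise ValueError("target_length must be positive")
    scale = d0(target_length)
    terms = (1.0 / (1.0 + (d / scale) ** 2) for d in distances)
    return sum(terms) / target_length


if __name__ == "__main__":
    # 100-residue target, every aligned pair exactly d0 apart.
    print(tm_score_fixed([d0(100)] * 100, 100))
```

Residues that are not aligned contribute nothing to the sum but still count in the normalization, so a perfect match over half of the target gives a score of 0.5.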
Affiliation(s)
- Wei Zheng
  - Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, Michigan 48109, USA
  - Department of Biological Chemistry, University of Michigan, Ann Arbor, Michigan 48109, USA
- Qiqige Wuyun
  - Department of Computer Science and Engineering, Michigan State University, East Lansing, MI 48824, USA
- Peter L Freddolino
  - Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, Michigan 48109, USA
  - Department of Biological Chemistry, University of Michigan, Ann Arbor, Michigan 48109, USA
- Yang Zhang
  - Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, Michigan 48109, USA
  - Department of Biological Chemistry, University of Michigan, Ann Arbor, Michigan 48109, USA
  - Department of Computer Science, School of Computing, National University of Singapore, 117417 Singapore
  - Cancer Science Institute of Singapore, National University of Singapore, 117599, Singapore
  - Department of Biochemistry, Yong Loo Lin School of Medicine, National University of Singapore, 117596, Singapore
4
Bartuzi D, Kaczor AA, Matosiuk D. Illuminating the "Twilight Zone": Advances in Difficult Protein Modeling. Methods Mol Biol 2023; 2627:25-40. [PMID: 36959440 DOI: 10.1007/978-1-0716-2974-1_2] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Indexed: 03/25/2023]
Abstract
Homology modeling was long considered a method of choice in tertiary protein structure prediction. However, it used to provide models of acceptable quality only when templates with appreciable sequence identity with a target could be found. The threshold value was long assumed to be around 20-30%. Below this level, the obtained sequence identity came dangerously close to values that can be obtained by chance when aligning random, unrelated sequences. In these cases, other approaches, including ab initio folding simulations or fragment assembly, were usually employed. The most recent editions of the CASP and CAMEO community-wide assessments of modeling methods have brought some surprising outcomes, proving that many more clues can be inferred from protein sequence analyses than previously thought. In this chapter, we focus on recent advances in the field of difficult protein modeling, pushing the threshold deep into the "twilight zone", with particular attention devoted to improvements in applications of machine learning and model evaluation.
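The sequence identity threshold discussed above can be made concrete with a small sketch that computes percent identity from two already-aligned sequences. Several conventions exist for the denominator; counting only columns where neither sequence has a gap is an assumption made here, and the function names and the 30% default (the upper end of the quoted 20-30% range) are hypothetical.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percent of gap-free alignment columns holding identical residues."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    matches = 0
    for a, b in zip(aligned_a, aligned_b):
        if a == gap or b == gap:
            continue  # skip columns with a gap in either sequence
        compared += 1
        if a == b:
            matches += 1
    if compared == 0:
        return 0.0
    return 100.0 * matches / compared


def below_threshold(identity, threshold=30.0):
    """True when the identity falls under the assumed homology-modeling threshold."""
    return identity < threshold


if __name__ == "__main__":
    identity = percent_identity("ACDE-G", "ACNEFG")
    print(identity, below_threshold(identity))
```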
Affiliation(s)
- Damian Bartuzi
  - Department of Synthesis and Chemical Technology of Pharmaceutical Substances with Computer Modelling Laboratory, Medical University of Lublin, Lublin, Poland
- Agnieszka A Kaczor
  - Department of Synthesis and Chemical Technology of Pharmaceutical Substances with Computer Modelling Laboratory, Medical University of Lublin, Lublin, Poland
  - University of Eastern Finland, School of Pharmacy, Kuopio, Finland
- Dariusz Matosiuk
  - Department of Synthesis and Chemical Technology of Pharmaceutical Substances with Computer Modelling Laboratory, Medical University of Lublin, Lublin, Poland
5
Structure-Aware Mycobacterium tuberculosis Functional Annotation Uncloaks Resistance, Metabolic, and Virulence Genes. mSystems 2021; 6:e0067321. [PMID: 34726489 PMCID: PMC8562490 DOI: 10.1128/msystems.00673-21] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Indexed: 12/30/2022]
Abstract
Accurate and timely functional genome annotation is essential for translating basic pathogen research into clinically impactful advances. Here, through literature curation and structure-function inference, we systematically update the functional genome annotation of Mycobacterium tuberculosis virulent type strain H37Rv. First, we systematically curated annotations for 589 genes from 662 publications, including 282 gene products absent from leading databases. Second, we modeled 1,711 underannotated proteins and developed a semiautomated pipeline that captured shared function between 400 protein models and structural matches of known function on Protein Data Bank, including drug efflux proteins, metabolic enzymes, and virulence factors. In aggregate, these structure- and literature-derived annotations update 940/1,725 underannotated H37Rv genes and generate hundreds of functional hypotheses. Retrospectively applying the annotation to a recent whole-genome transposon mutant screen provided missing function for 48% (13/27) of underannotated genes altering antibiotic efficacy and 33% (23/69) required for persistence during mouse tuberculosis (TB) infection. Prospective application of the protein models enabled us to functionally interpret novel laboratory generated pyrazinamide (PZA)-resistant mutants of unknown function, which implicated the emerging coenzyme A depletion model of PZA action in the mutants’ PZA resistance. Our findings demonstrate the functional insight gained by integrating structural modeling and systematic literature curation, even for widely studied microorganisms. Functional annotations and protein structure models are available at https://tuberculosis.sdsu.edu/H37Rv in human- and machine-readable formats.
IMPORTANCE Mycobacterium tuberculosis, the primary causative agent of tuberculosis, kills more humans than any other infectious bacterium. Yet 40% of its genome is functionally uncharacterized, leaving much about the genetic basis of its resistance to antibiotics, capacity to withstand host immunity, and basic metabolism yet undiscovered. Irregular literature curation for functional annotation contributes to this gap. We systematically curated functions from literature and structural similarity for over half of poorly characterized genes, expanding the functionally annotated Mycobacterium tuberculosis proteome. Applying this updated annotation to recent in vivo functional screens added functional information to dozens of clinically pertinent proteins described as having unknown function. Integrating the annotations with a prospective functional screen identified new mutants resistant to a first-line TB drug, supporting an emerging hypothesis for its mode of action. These improvements in functional interpretation of clinically informative studies underscore the translational value of this functional knowledge. Structure-derived annotations identify hundreds of high-confidence candidates for mechanisms of antibiotic resistance, virulence factors, and basic metabolism and other functions key in clinical and basic tuberculosis research. More broadly, they provide a systematic framework for improving prokaryotic reference annotations.
6
Jiang H, Fan X. The Two-Step Clustering Approach for Metastable States Learning. Int J Mol Sci 2021; 22:6576. [PMID: 34205252 PMCID: PMC8233889 DOI: 10.3390/ijms22126576] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 05/31/2021] [Revised: 06/14/2021] [Accepted: 06/14/2021] [Indexed: 01/20/2023]
Abstract
Understanding the energy landscape and the conformational dynamics is crucial for studying many biological or chemical processes, such as protein-protein interaction and RNA folding. Molecular Dynamics (MD) simulations have been a major source of dynamic structure. Although many methods have been proposed for learning metastable states from MD data, some key problems still need further investigation. Here, we give a brief review of recent progress in this field, with an emphasis on some popular methods belonging to a two-step clustering framework, and hope to draw more researchers to contribute to this area.
Affiliation(s)
- Hangjin Jiang
  - Center for Data Science, Zhejiang University, Hangzhou 310058, China
- Xiaodan Fan
  - Department of Statistics, The Chinese University of Hong Kong, Hong Kong, China
7
Suh D, Lee JW, Choi S, Lee Y. Recent Applications of Deep Learning Methods on Evolution- and Contact-Based Protein Structure Prediction. Int J Mol Sci 2021; 22:6032. [PMID: 34199677 PMCID: PMC8199773 DOI: 10.3390/ijms22116032] [Citation(s) in RCA: 9] [Impact Index Per Article: 3.0] [Received: 05/16/2021] [Revised: 05/29/2021] [Accepted: 05/29/2021] [Indexed: 01/23/2023]
Abstract
Recent advances in deep learning methods have influenced many aspects of scientific research, including the study of the protein system. The prediction of proteins' 3D structural components is now heavily dependent on machine learning techniques that interpret how protein sequences and their homology govern the inter-residue contacts and structural organization. In particular, methods employing deep neural networks have had a significant impact on the recent CASP13 and CASP14 competitions. Here, we explore the recent applications of deep learning methods in the protein structure prediction area. We also look at the potential opportunities for deep learning methods to identify unknown protein structures and functions to be discovered and help guide drug-target interactions. Although significant problems still need to be addressed, we expect these techniques in the near future to play crucial roles in protein structural bioinformatics as well as in drug discovery.
Affiliation(s)
- Donghyuk Suh
  - Global AI Drug Discovery Center, School of Pharmaceutical Sciences, College of Pharmacy and Graduate, Ewha Womans University, Seoul 03760, Korea
- Jai Woo Lee
  - Global AI Drug Discovery Center, School of Pharmaceutical Sciences, College of Pharmacy and Graduate, Ewha Womans University, Seoul 03760, Korea
- Sun Choi
  - Global AI Drug Discovery Center, School of Pharmaceutical Sciences, College of Pharmacy and Graduate, Ewha Womans University, Seoul 03760, Korea
- Yoonji Lee
  - College of Pharmacy, Chung-Ang University, Seoul 06974, Korea
8
Takei Y, Ishida T. P3CMQA: Single-Model Quality Assessment Using 3DCNN with Profile-Based Features. Bioengineering (Basel) 2021; 8:bioengineering8030040. [PMID: 33808604 PMCID: PMC8003382 DOI: 10.3390/bioengineering8030040] [Citation(s) in RCA: 7] [Impact Index Per Article: 2.3] [Received: 01/29/2021] [Revised: 03/12/2021] [Accepted: 03/16/2021] [Indexed: 11/16/2022]
Abstract
Model quality assessment (MQA), which selects near-native structures from structure models, is an important process in protein tertiary structure prediction. The three-dimensional convolution neural network (3DCNN) was applied to the task, but the performance was comparable to existing methods because it used only atom-type features as the input. Thus, we added sequence profile-based features, which are also used in other methods, to improve the performance. We developed a single-model MQA method for protein structures based on 3DCNN using sequence profile-based features, namely, P3CMQA. Performance evaluation using a CASP13 dataset showed that profile-based features improved the assessment performance, and the proposed method was better than currently available single-model MQA methods, including the previous 3DCNN-based method. We also implemented a web-interface of the method to make it more user-friendly.
Affiliation(s)
- Yuma Takei
  - Department of Computer Science, School of Computing, Tokyo Institute of Technology, Ookayama, Meguro-ku, Tokyo 152-8550, Japan
  - Real World Big-Data Computation Open Innovation Laboratory (RWBC-OIL), National Institute of Advanced Industrial Science and Technology (AIST), Aomi, Koto-ku, Tokyo 135-0064, Japan
- Takashi Ishida
  - Department of Computer Science, School of Computing, Tokyo Institute of Technology, Ookayama, Meguro-ku, Tokyo 152-8550, Japan
9
Santhoshkumar R, Yusuf A. In silico structural modeling and analysis of physicochemical properties of curcumin synthase (CURS1, CURS2, and CURS3) proteins of Curcuma longa. J Genet Eng Biotechnol 2020; 18:24. [PMID: 32617758 PMCID: PMC7332660 DOI: 10.1186/s43141-020-00041-x] [Citation(s) in RCA: 10] [Impact Index Per Article: 2.5] [Received: 11/13/2019] [Accepted: 06/05/2020] [Indexed: 12/15/2022]
Abstract
Background Pharmaceutically important curcuminoid synthesis in C. longa is controlled by the CURS1, CURS2, and CURS3 genes. The present study determined the physicochemical properties and structural characteristics, including the secondary and 3D structure, of the CURS proteins. The primary, secondary, and tertiary structures of the CURS proteins were modeled and characterized using multiple bioinformatics tools such as ExPasy ProtParam tools, self-optimized prediction method with alignment (SOPMA), PSIPRED, and SWISS-MODEL. The predicted secondary structure of curcumin synthase showed α-helix and random coil as the major components. The reliability of the modeled structure was confirmed using the PROCHECK and QMEAN programs. Results CURS1 has a molecular weight of 21093.19 Da, a theoretical pI of 4.93, and an aliphatic index of 99.19. CURS2 and CURS3 have molecular weights of 20266.13 Da and 20629.52 Da, theoretical pI values of 5.28 and 4.96, and aliphatic indices of 89.30 and 86.37, respectively. In the predicted secondary structure of the CURS proteins, alpha helices made up 42.72, 41.38, and 44.74% of CURS1, CURS2, and CURS3, respectively, and random coils 24.87, 31.03, and 17.89%. The extended strands were 16.24, 19.40, and 17.89%. The QMEAN Z-score is −0.83, −0.89, and −1.09 for CURS1, CURS2, and CURS3, respectively. Conclusion Prediction of the 3D model of a protein by in silico analysis is a highly challenging aspect to confirm the NMR or X-ray crystallographic data. This report can contribute to the understanding of the structure, physicochemical properties, structural motifs, and protein-protein interaction of CURS1, CURS2, and CURS3.
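The aliphatic index reported above is a simple function of amino acid composition. As a minimal sketch, the code below applies the usual definition (Ikai's formula, with mole percentages of Ala, Val, Ile, and Leu weighted by 1, 2.9, 3.9, and 3.9) to a one-letter sequence. The toy sequence is hypothetical and is not a CURS protein, so the output is not expected to match the values in the abstract.

```python
def aliphatic_index(sequence):
    """Aliphatic index: X(Ala) + 2.9*X(Val) + 3.9*(X(Ile) + X(Leu)).

    X(...) is the mole percent of the residue in the sequence.
    """
    seq = sequence.upper()
    if not seq:
        raise ValueError("sequence must not be empty")

    def mole_percent(residue):
        return 100.0 * seq.count(residue) / len(seq)

    return (
        mole_percent("A")
        + 2.9 * mole_percent("V")
        + 3.9 * (mole_percent("I") + mole_percent("L"))
    )


if __name__ == "__main__":
    toy_sequence = "MAVILGKAVL"  # hypothetical example sequence
    print(round(aliphatic_index(toy_sequence), 2))
```

Higher values indicate a larger relative volume of aliphatic side chains, which is commonly read as a sign of greater thermostability.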
Affiliation(s)
- R Santhoshkumar
  - Interuniversity Centre for Plant Biotechnology, Department of Botany, University of Calicut, Malappuram, Kerala, 673635, India
- A Yusuf
  - Interuniversity Centre for Plant Biotechnology, Department of Botany, University of Calicut, Malappuram, Kerala, 673635, India
10
Al Nasr K, Al-Haija QA. Forecasting the Growth of Structures from NMR and X-Ray Crystallography Experiments Released Per Year. Journal of Information & Knowledge Management 2020. [DOI: 10.1142/s0219649220400043] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.0] [Indexed: 11/18/2022]
Abstract
In this paper, we present a forecasting scheme for the growth of molecular structures from NMR and X-ray Crystallography experimental techniques released every year by employing an autoregressive (AR) process. The proposed scheme maximises the forecasting accuracy by utilising the optimal AR process order. The optimal model order was derived as the model with the least prediction error. The proposed scheme has been employed to model and predict the annual growth of structures from NMR and X-ray Crystallography experiments for the next decade 2019–2028 using the time series of the past 43 years of both experimental datasets. The experimental results showed that the optimal model order to estimate both datasets was [Formula: see text], which corresponds to a forecasting accuracy of [Formula: see text] for both datasets. Such a high level of accuracy is attributed to the degree of linearity between consecutive elements of the original time series. Hence, the forecasting results reveal exponentially increasing growth in the annual number of structures released from both NMR and X-ray Crystallography experiments.
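The idea of choosing the AR order with the least prediction error can be sketched as follows. The abstract does not state the fitting method or the error criterion, so ordinary least squares and a one-step-ahead mean squared error on held-out final points are assumptions here, and all function names are hypothetical. No real deposition counts are used.

```python
def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def fit_ar(series, order):
    """Least-squares fit of x_t = c + a_1 x_{t-1} + ... + a_p x_{t-p}.

    Returns [c, a_1, ..., a_p].
    """
    rows = [
        [1.0] + [series[t - k] for k in range(1, order + 1)]
        for t in range(order, len(series))
    ]
    targets = series[order:]
    size = order + 1
    # Normal equations (A^T A) w = A^T y.
    ata = [[sum(r[i] * r[j] for r in rows) for j in range(size)] for i in range(size)]
    aty = [sum(r[i] * y for r, y in zip(rows, targets)) for i in range(size)]
    return solve(ata, aty)


def predict_next(history, coeffs):
    """One-step-ahead forecast from the most recent values in history."""
    order = len(coeffs) - 1
    return coeffs[0] + sum(coeffs[k] * history[-k] for k in range(1, order + 1))


def holdout_error(series, order, n_test):
    """Mean squared one-step error on the last n_test points."""
    coeffs = fit_ar(series[:-n_test], order)
    errors = [
        (series[t] - predict_next(series[:t], coeffs)) ** 2
        for t in range(len(series) - n_test, len(series))
    ]
    return sum(errors) / len(errors)


def best_order(series, max_order, n_test):
    """AR order in 1..max_order with the smallest held-out error."""
    return min(range(1, max_order + 1), key=lambda p: holdout_error(series, p, n_test))
```

A call such as `best_order(yearly_counts, max_order=5, n_test=5)` would return the selected order for a hypothetical list `yearly_counts`. Note that a series that follows a low-order recursion exactly makes the higher-order normal equations singular, so real use would need a more robust solver.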
Affiliation(s)
- Kamal Al Nasr
  - Department of Computer Science, Tennessee State University, Nashville, TN, USA
  - University of Texas, San Antonio, TX, USA
- Qasem Abu Al-Haija
  - Department of Computer and Information Systems Engineering (CISE), Tennessee State University, Nashville, TN, USA
11
Self-organized emergence of folded protein-like network structures from geometric constraints. PLoS One 2020; 15:e0229230. [PMID: 32106258 PMCID: PMC7046222 DOI: 10.1371/journal.pone.0229230] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 12/02/2019] [Accepted: 01/31/2020] [Indexed: 12/13/2022]
Abstract
The intricate three-dimensional geometries of protein tertiary structures underlie protein function and emerge through a folding process from one-dimensional chains of amino acids. The exact spatial sequence and configuration of amino acids, the biochemical environment and the temporal sequence of distinct interactions yield a complex folding process that cannot yet be easily tracked for all proteins. To gain qualitative insights into the fundamental mechanisms behind the folding dynamics and generic features of the folded structure, we propose a simple model of structure formation that takes into account only fundamental geometric constraints and otherwise assumes randomly paired connections. We find that despite its simplicity, the model results in a network ensemble consistent with key overall features of the ensemble of Protein Residue Networks we obtained from more than 1000 biological protein geometries as available through the Protein Data Bank. Specifically, the distribution of the number of interaction neighbors a unit (amino acid) has, the scaling of the structure’s spatial extent with chain length, the eigenvalue spectrum and the scaling of the smallest relaxation time with chain length are all consistent between model and real proteins. These results indicate that geometric constraints alone may already account for a number of generic features of protein tertiary structures.
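One of the quantities compared above, the distribution of the number of interaction neighbors per residue, can be illustrated with a small sketch that builds a residue contact network from coordinates. How the authors define an interaction is not stated in the abstract; the distance cutoff of 7.0 Å and the function names below are assumptions chosen for illustration.

```python
import math
from collections import Counter


def contact_degrees(coords, cutoff=7.0):
    """Number of neighbors within the cutoff for each residue position.

    coords: one 3D point per residue (e.g. C-alpha atoms), in angstroms.
    cutoff: assumed contact distance, not taken from the paper.
    """
    n = len(coords)
    degrees = [0] * n
    for i in range(n):
        for j in range(i + 1, n):
            if math.dist(coords[i], coords[j]) <= cutoff:
                degrees[i] += 1
                degrees[j] += 1
    return degrees


def degree_distribution(degrees):
    """Fraction of residues having each neighbor count."""
    counts = Counter(degrees)
    total = len(degrees)
    return {k: counts[k] / total for k in sorted(counts)}


if __name__ == "__main__":
    # Hypothetical straight chain with 4 angstroms between consecutive residues.
    chain = [(4.0 * i, 0.0, 0.0) for i in range(4)]
    print(degree_distribution(contact_degrees(chain, cutoff=5.0)))
```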
12
Zheng W, Zhang C, Bell EW, Zhang Y. I-TASSER gateway: A protein structure and function prediction server powered by XSEDE. Future Generations Computer Systems: FGCS 2019; 99:73-85. [PMID: 31427836 PMCID: PMC6699767 DOI: 10.1016/j.future.2019.04.011] [Citation(s) in RCA: 64] [Impact Index Per Article: 12.8] [Indexed: 05/12/2023]
Abstract
There is an increasing gap between the number of known protein sequences and the number of proteins with experimentally characterized structure and function. To alleviate this issue, we have developed the I-TASSER gateway, an online server for automated and reliable protein structure and function prediction. For a given sequence, I-TASSER starts with template recognition from a known structure library, followed by full-length atomic model construction by iterative assembly simulations of the continuous structural fragments excised from the template alignments. Functional insights are then derived from comparative matching of the predicted model with a library of proteins with known function. The I-TASSER pipeline has been recently integrated with the XSEDE Gateway system to accommodate pressing demand from the user community and increasing computing costs. This report summarizes the configuration of the I-TASSER Gateway with the XSEDE-Comet supercomputer cluster, together with an overview of the I-TASSER method and milestones of its development.
13
Methods for the Refinement of Protein Structure 3D Models. Int J Mol Sci 2019; 20:ijms20092301. [PMID: 31075942 PMCID: PMC6539982 DOI: 10.3390/ijms20092301] [Citation(s) in RCA: 36] [Impact Index Per Article: 7.2] [Received: 04/02/2019] [Revised: 04/24/2019] [Accepted: 05/07/2019] [Indexed: 12/25/2022]
Abstract
The refinement of predicted 3D protein models is crucial in bringing them closer towards experimental accuracy for further computational studies. Refinement approaches can be divided into two main stages: the sampling and scoring stages. Sampling strategies, such as the popular Molecular Dynamics (MD)-based protocols, aim to generate improved 3D models. However, generating 3D models that are closer to the native structure than the initial model remains challenging, as structural deviations from the native basin can be encountered due to force-field inaccuracies. Therefore, different restraint strategies have been applied in order to avoid deviations away from the native structure. For example, the accurate prediction of local errors and/or contacts in the initial models can be used to guide restraints. MD-based protocols, using physics-based force fields and smart restraints, have made significant progress towards a more consistent refinement of 3D models. The scoring stage, which includes energy functions and Model Quality Assessment Programs (MQAPs), is used to discriminate near-native conformations from non-native conformations. Nevertheless, there are often very small differences among generated 3D models in refinement pipelines, which makes model discrimination and selection problematic. For this reason, the identification of the most native-like conformations remains a major challenge.
14
Robertson JC, Perez A, Dill KA. MELD × MD Folds Nonthreadables, Giving Native Structures and Populations. J Chem Theory Comput 2018; 14:6734-6740. [PMID: 30407805 DOI: 10.1021/acs.jctc.8b00886] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.5] [Indexed: 11/28/2022]
Abstract
A current challenge is to compute the native structures of proteins from their amino acid sequences. A main approach of bioinformatics is threading, in which a protein to be predicted is computationally threaded onto protein fragments of similar sequence having an already known structure. However, ∼15% of proteins cannot be folded in this way; this has been called the glass ceiling, and the proteins are called nonthreadables. For these, physical molecular dynamics (MD) modeling is promising because it does not require templates. We find that MD, when used with an accelerator called MELD, can fold many nonthreadables. For 41 nonthreadable proteins with fewer than 125 residues, MELD-accelerated MD (MELD × MD) folds 20 of them to better than 4 Å error. In 10 cases, MELD × MD succeeds even when the force field does not properly encode the native state. In 11 cases, MELD × MD foretells its own success; seeing large Boltzmann populations in the simulations predicts it has converged to the correct native state. MELD × MD acceleration can be applied to a broad physical protein modeling range.
Affiliation(s)
- James C Robertson
  - Laufer Center for Physical and Quantitative Biology, Stony Brook University, Stony Brook, New York 11794, United States
- Alberto Perez
  - Laufer Center for Physical and Quantitative Biology, Stony Brook University, Stony Brook, New York 11794, United States
- Ken A Dill
  - Laufer Center for Physical and Quantitative Biology, Stony Brook University, Stony Brook, New York 11794, United States
  - Department of Chemistry, Stony Brook University, Stony Brook, New York 11794, United States
  - Department of Physics and Astronomy, Stony Brook University, Stony Brook, New York 11794, United States
15
A Critical Note on Symmetry Contact Artifacts and the Evaluation of the Quality of Homology Models. Symmetry (Basel) 2018. [DOI: 10.3390/sym10010025] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Indexed: 11/16/2022]
16
Abid H, Harigua-Souiai E, Mejri T, Barhoumi M, Guizani I. Leishmania infantum 5'-Methylthioadenosine Phosphorylase presents relevant structural divergence to constitute a potential drug target. BMC Structural Biology 2017; 17:9. [PMID: 29258562 PMCID: PMC5738077 DOI: 10.1186/s12900-017-0079-7] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Received: 07/11/2017] [Accepted: 11/21/2017] [Indexed: 11/10/2022]
Abstract
BACKGROUND: The 5'-methylthioadenosine phosphorylase (MTAP), an enzyme involved in purine and polyamine metabolism and in the methionine salvage pathway, is considered a potential drug target against cancer and trypanosomiasis. Trypanosoma and Leishmania parasites lack de novo purine pathways and rely on purine salvage pathways to meet their requirements. Herein, we propose the first comprehensive bioinformatic and structural characterization of the putative Leishmania infantum MTAP (LiMTAP), using a comparative computational approach. RESULTS: Sequence analysis showed that LiMTAP shared higher identity rates with the Trypanosoma brucei (TbMTAP) and human (huMTAP) homologs than with the human purine nucleoside phosphorylase (huPNP). A motif search using MEME identified more common patterns and higher relatedness of the parasite proteins to huMTAP than to huPNP. The 3D structures of LiMTAP and TbMTAP were predicted by homology modeling and compared to the crystal structure of huMTAP. These models presented conserved secondary structures compared to huMTAP, with a similar topology corresponding to the Rossmann fold. This confirmed that both LiMTAP and TbMTAP are members of the NP-I family. In comparison to huMTAP, the 3D model of LiMTAP showed an additional α-helix at the C-terminal extremity. One peptide located in this specific region was used to generate a specific antibody to LiMTAP. In comparison with the active site (AS) of huMTAP, the parasite ASs presented significant differences in shape and electrostatic potentials (EPs). Molecular docking of 5'-methylthioadenosine (MTA) and 5'-hydroxyethylthio-adenosine (HETA) in the ASs of the three proteins predicted differential binding modes and interactions when comparing the parasite proteins to the human orthologue.
CONCLUSIONS: This study highlighted significant structural peculiarities, corresponding to functionally relevant sequence divergence in LiMTAP, making it a potential drug target against Leishmania.
Affiliation(s)
- Hela Abid
- Laboratory of Molecular Epidemiology and Experimental Pathology (LR11IPT04/ LR16IPT04), Institut Pasteur de Tunis, Université de Tunis El Manar, Tunis, Tunisia
- Faculté des Sciences de Bizerte, Université de Carthage, Tunis, Tunisia
| | - Emna Harigua-Souiai
- Laboratory of Molecular Epidemiology and Experimental Pathology (LR11IPT04/ LR16IPT04), Institut Pasteur de Tunis, Université de Tunis El Manar, Tunis, Tunisia
| | - Thouraya Mejri
- Laboratory of Molecular Epidemiology and Experimental Pathology (LR11IPT04/ LR16IPT04), Institut Pasteur de Tunis, Université de Tunis El Manar, Tunis, Tunisia
| | - Mourad Barhoumi
- Laboratory of Molecular Epidemiology and Experimental Pathology (LR11IPT04/ LR16IPT04), Institut Pasteur de Tunis, Université de Tunis El Manar, Tunis, Tunisia
| | - Ikram Guizani
- Laboratory of Molecular Epidemiology and Experimental Pathology (LR11IPT04/ LR16IPT04), Institut Pasteur de Tunis, Université de Tunis El Manar, Tunis, Tunisia.
| |
|
17
|
Gadzała M, Kalinowska B, Banach M, Konieczny L, Roterman I. Determining protein similarity by comparing hydrophobic core structure. Heliyon 2017; 3:e00235. [PMID: 28217749 PMCID: PMC5300504 DOI: 10.1016/j.heliyon.2017.e00235] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.6] [Received: 10/01/2016] [Revised: 12/06/2016] [Accepted: 01/19/2017] [Indexed: 12/19/2022]
Abstract
Formal assessment of structural similarity is - next to protein structure prediction - arguably the most important unsolved problem in proteomics. In this paper we propose a similarity criterion based on commonalities between the proteins' hydrophobic cores. The hydrophobic core emerges as a result of conformational changes through which each residue reaches its intended position in the protein body. A quantitative criterion based on this phenomenon has been proposed in the framework of the CASP challenge. The structure of the hydrophobic core - including the placement and scope of any deviations from the idealized model - may indirectly point to areas of importance from the point of view of the protein's biological function. Our analysis focuses on an arbitrarily selected target from the CASP11 challenge. The proposed measure, while compliant with CASP criteria (70-80% correlation), involves certain adjustments which acknowledge the presence of factors other than simple spatial arrangement of solids.
Affiliation(s)
- M. Gadzała
- AGH - Academic Computer Center − Cyfronet, Nawojki 11, Kraków 30-950, Poland
| | - B. Kalinowska
- Faculty of Physics, Astronomy, Applied Computer Science − Jagiellonian University, Łojasiewicza 11, Kraków 30-348, Poland
| | - M. Banach
- Department of Bioinformatics and Telemedicine, Jagiellonian University − Medical College, Łazarza 16, Krakow 31-530, Poland
| | - L. Konieczny
- Chair of Medical Biochemistry, Jagiellonian University − Medical College, Kopernika 7, Kraków 31-034, Poland
| | - I. Roterman
- Department of Bioinformatics and Telemedicine, Jagiellonian University − Medical College, Łazarza 16, Krakow 31-530, Poland
| |
|
18
|
Wei L, Zou Q. Recent Progress in Machine Learning-Based Methods for Protein Fold Recognition. Int J Mol Sci 2016; 17:ijms17122118. [PMID: 27999256 PMCID: PMC5187918 DOI: 10.3390/ijms17122118] [Citation(s) in RCA: 67] [Impact Index Per Article: 8.4] [Received: 11/16/2016] [Revised: 12/03/2016] [Accepted: 12/11/2016] [Indexed: 01/22/2023]
Abstract
Knowledge on protein folding has a profound impact on understanding the heterogeneity and molecular function of proteins, further facilitating drug design. Predicting the 3D structure (fold) of a protein is a key problem in molecular biology. Determination of the fold of a protein mainly relies on molecular experimental methods. With the development of next-generation sequencing techniques, the discovery of new protein sequences has been rapidly increasing. With such a great number of proteins, the use of experimental techniques to determine protein folding is extremely difficult because these techniques are time consuming and expensive. Thus, developing computational prediction methods that can automatically, rapidly, and accurately classify unknown protein sequences into specific fold categories is urgently needed. Computational recognition of protein folds has been a recent research hotspot in bioinformatics and computational biology. Many computational efforts have been made, generating a variety of computational prediction methods. In this review, we conduct a comprehensive survey of recent computational methods, especially machine learning-based methods, for protein fold recognition. This review is anticipated to assist researchers in their pursuit to systematically understand the computational recognition of protein folds.
Affiliation(s)
- Leyi Wei
- School of Computer Science and Technology, Tianjin University, Tianjin 300354, China.
| | - Quan Zou
- School of Computer Science and Technology, Tianjin University, Tianjin 300354, China.
| |
|
19
|
Gebala M, Bonilla S, Bisaria N, Herschlag D. Does Cation Size Affect Occupancy and Electrostatic Screening of the Nucleic Acid Ion Atmosphere? J Am Chem Soc 2016; 138:10925-34. [PMID: 27479701 PMCID: PMC5010015 DOI: 10.1021/jacs.6b04289] [Citation(s) in RCA: 42] [Impact Index Per Article: 5.3] [Received: 05/03/2016] [Indexed: 01/14/2023]
Abstract
Electrostatics are central to all aspects of nucleic acid behavior, including their folding, condensation, and binding to other molecules, and the energetics of these processes are profoundly influenced by the ion atmosphere that surrounds nucleic acids. Given the highly complex and dynamic nature of the ion atmosphere, understanding its properties and effects will require synergy between computational modeling and experiment. Prior computational models and experiments suggest that cation occupancy in the ion atmosphere depends on the size of the cation. However, the computational models have not been independently tested, and the experimentally observed effects were small. Here, we evaluate a computational model of ion size effects by experimentally testing a blind prediction made from that model, and we present additional experimental results that extend our understanding of the ion atmosphere. Giambasu et al. developed and implemented a three-dimensional reference interaction site (3D-RISM) model for monovalent cations surrounding DNA and RNA helices, and this model predicts that Na⁺ would outcompete Cs⁺ by 1.8-2.1-fold; i.e., with Cs⁺ in 2-fold excess of Na⁺ the ion atmosphere would contain an equal number of each cation (Nucleic Acids Res. 2015, 43, 8405). However, our ion counting experiments indicate that there is no significant preference for Na⁺ over Cs⁺. There is an ∼25% preferential occupancy of Li⁺ over larger cations in the ion atmosphere but, counter to general expectations from existing models, no size dependence for the other alkali metal ions. Further, we followed the folding of the P4-P6 RNA and showed that differences in folding with different alkali metal ions observed at high concentration arise from cation-anion interactions and not cation size effects. Overall, our results provide a critical test of a computational prediction, fundamental information about ion atmosphere properties, and parameters that will aid in the development of next-generation nucleic acid computational models.
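The arithmetic behind the "2-fold excess gives equal numbers" statement in this abstract can be checked with a small Python sketch. It assumes a simple linear competition model in which the occupancy ratio equals a preference factor times the bulk concentration ratio. That model, the function name, and the example concentrations are illustrative assumptions, not the 3D-RISM calculation or the ion counting analysis.

```python
def occupancy_ratio(preference, conc_na, conc_cs):
    """Ratio of Na+ to Cs+ counts in the ion atmosphere.

    Assumes linear competition: ratio = preference * [Na+] / [Cs+],
    where `preference` is the fold-preference for Na+ over Cs+.
    Concentrations may be in any unit as long as both use the same one.
    """
    if conc_na < 0 or conc_cs <= 0:
        raise ValueError("concentrations must be positive")
    return preference * conc_na / conc_cs


# Predicted 2-fold preference, Cs+ in 2-fold excess: equal occupancy.
print(occupancy_ratio(2.0, conc_na=10.0, conc_cs=20.0))  # prints: 1.0
# No preference (as the experiments indicate), same excess: Cs+ dominates.
print(occupancy_ratio(1.0, conc_na=10.0, conc_cs=20.0))  # prints: 0.5
```

Under this assumption, the predicted 1.8-2.1-fold preference and the measured absence of preference differ by about a factor of two in the expected Na⁺/Cs⁺ ratio at the same bulk concentrations.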
Affiliation(s)
- Magdalena Gebala
- Department of Biochemistry, Stanford University, Stanford, California 94305, United States
| | - Steve Bonilla
- Department of Chemical Engineering, Stanford University, Stanford, California 94305, United States
| | - Namita Bisaria
- Department of Biochemistry, Stanford University, Stanford, California 94305, United States
| | - Daniel Herschlag
- Department of Biochemistry, Stanford University, Stanford, California 94305, United States
- Department of Chemistry, Stanford University, Stanford, California 94305, United States
- ChEM-H Institute, Stanford University, Stanford, California 94305, United States
| |
|
20
|
Moult J, Fidelis K, Kryshtafovych A, Schwede T, Tramontano A. Critical assessment of methods of protein structure prediction: Progress and new directions in round XI. Proteins 2016; 84 Suppl 1:4-14. [PMID: 27171127 DOI: 10.1002/prot.25064] [Citation(s) in RCA: 149] [Impact Index Per Article: 18.6] [Received: 02/29/2016] [Revised: 04/29/2016] [Accepted: 05/08/2016] [Indexed: 12/15/2022]
Abstract
Modeling of protein structure from amino acid sequence now plays a major role in structural biology. Here we report new developments and progress from the CASP11 community experiment, assessing the state of the art in structure modeling. Notable points include the following: (1) New methods for predicting three dimensional contacts resulted in a few spectacular template free models in this CASP, whereas models based on sequence homology to proteins with experimental structure continue to be the most accurate. (2) Refinement of initial protein models, primarily using molecular dynamics related approaches, has now advanced to the point where the best methods can consistently (though slightly) improve nearly all models. (3) The use of relatively sparse NMR constraints dramatically improves the accuracy of models, and another type of sparse data, chemical crosslinking, introduced in this CASP, also shows promise for producing better models. (4) A new emphasis on modeling protein complexes, in collaboration with CAPRI, has produced interesting results, but also shows the need for more focus on this area. (5) Methods for estimating the accuracy of models have advanced to the point where they are of considerable practical use. (6) A first assessment demonstrates that models can sometimes successfully address biological questions that motivate experimental structure determination. (7) There is continuing progress in accuracy of modeling regions of structure not directly available by comparative modeling, while there is marginal or no progress in some other areas. Proteins 2016; 84(Suppl 1):4-14. © 2016 Wiley Periodicals, Inc.
Affiliation(s)
- John Moult
- Institute for Bioscience and Biotechnology Research and Department of Cell Biology and Molecular Genetics, University of Maryland, Rockville, Maryland, 20850.
| | - Krzysztof Fidelis
- Genome Center, University of California, Davis, Davis, California, 95616
| | | | - Torsten Schwede
- Biozentrum & SIB Swiss Institute of Bioinformatics, University of Basel, Basel, Switzerland
| | - Anna Tramontano
- Department of Physics and Istituto Pasteur - Fondazione Cenci Bolognetti, Sapienza University of Rome, Rome, Italy
| |
|
21
|
Busato M, Giorgetti A. Structural modeling of G-protein coupled receptors: An overview on automatic web-servers. Int J Biochem Cell Biol 2016; 77:264-74. [PMID: 27102413 DOI: 10.1016/j.biocel.2016.04.004] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.3] [Received: 02/05/2016] [Revised: 04/09/2016] [Accepted: 04/15/2016] [Indexed: 12/27/2022]
Abstract
Despite the significant efforts and discoveries during the last few years in G protein-coupled receptor (GPCR) expression and crystallization, the receptors with known structures to date are limited to a small fraction of human GPCRs. The lack of experimental three-dimensional structures of the receptors represents a strong limitation that hampers a deep understanding of their function. Computational techniques are thus a valid alternative strategy to model three-dimensional structures. Indeed, recent advances in the field, together with extraordinary developments in crystallography, in particular due to its ability to capture GPCRs in different activation states, have led to encouraging results in the generation of accurate models. This has prompted the community of modelers to render their methods publicly available through dedicated databases and web-servers. Here, we present an extensive overview of these services, focusing on their advantages, drawbacks, and their role in successful applications. Future challenges in the field of GPCR modeling, such as the prediction of long loop regions and the modeling of receptor activation states, are presented as well.
Affiliation(s)
- Mirko Busato
- Department of Biotechnology, University of Verona, Strada le Grazie 15, 37134 Verona, Italy.
| | - Alejandro Giorgetti
- Department of Biotechnology, University of Verona, Strada le Grazie 15, 37134 Verona, Italy; Computational Biomedicine, Institute for Advanced Simulation IAS-5 and Computational Biomedicine, Institute of Neuroscience and Medicine INM-9, Forschungszentrum Jülich, Germany.
| |
|
22
|
Roy S, Guzzi PH. Biological Network Inference from Microarray Data, Current Solutions, and Assessments. Methods Mol Biol 2016; 1375:155-167. [PMID: 26507508 DOI: 10.1007/7651_2015_284] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.4] [Indexed: 06/05/2023]
Abstract
Currently in bioinformatics and systems biology there is a growing interest in the analysis of associations among biological molecules at the network level. A main line of research in this area is the inference of biological networks from experimental data. Biological network inference aims to reconstruct the network of interactions (or associations) among biological molecules (e.g., genes or proteins) starting from experimental observations. The current scenario is characterized by a growing number of inference algorithms, while little attention has been paid to the determination of fair assessments and comparisons. Current assessments are usually based on the comparison of the algorithms using reference networks or gold standard datasets. Here we survey some selected inference algorithms and compare current assessments. We also present a systematic listing of freely available inference and assessment tools for easy reference. Finally, we outline some possible future directions of research, such as the use of prior knowledge in the assessment process.
Affiliation(s)
- Swarup Roy
- Department of Information Technology, North-Eastern Hill University, Shillong, India.
| | - Pietro Hiram Guzzi
- Department of Surgical and Medical Sciences, University of Catanzaro, Catanzaro, Italy.
| |
|
23
|
Lee TV, Johnson RD, Arcus VL, Lott JS. Prediction of the substrate for nonribosomal peptide synthetase (NRPS) adenylation domains by virtual screening. Proteins 2015; 83:2052-66. [PMID: 26358936 DOI: 10.1002/prot.24922] [Citation(s) in RCA: 14] [Impact Index Per Article: 1.6] [Received: 05/25/2015] [Revised: 08/19/2015] [Accepted: 08/28/2015] [Indexed: 12/28/2022]
Abstract
Nonribosomal peptide synthetases (NRPSs) synthesize a diverse array of bioactive small peptides, many of which are used in medicine. There is considerable interest in predicting NRPS substrate specificity in order to facilitate investigation of the many "cryptic" NRPS genes that have not been linked to any known product. However, the current sequence similarity-based methods are unable to produce reliable predictions when there is a lack of prior specificity data, which is a particular problem for fungal NRPSs. We conducted virtual screening on the specificity-determining domain of NRPSs, the adenylation domain, and found that virtual screening using experimentally determined structures results in good enrichment of the cognate substrate. Our results indicate that the conformation of the adenylation domain and in particular the conformation of a key conserved aromatic residue is important in determining the success of the virtual screening. When homology models of NRPS adenylation domains of known specificity, rather than experimentally determined structures, were built and used for virtual screening, good enrichment of the cognate substrate was also achieved in many cases. However, the accuracy of the models was key to the reliability of the predictions and there was a large variation in the results when different models of the same domain were used. This virtual screening approach is promising and is able to produce enrichment of the cognate substrates in many cases, but improvements in building and assessing homology models are required before the approach can be reliably applied to these models.
Affiliation(s)
- T Verne Lee
- School of Biological Sciences, University of Auckland, Auckland, New Zealand
- Maurice Wilkins Centre for Molecular Biodiscovery, School of Biological Sciences, University of Auckland, Auckland, New Zealand
| | - Richard D Johnson
- AgResearch Limited, Grasslands Research Centre, Palmerston North, New Zealand
| | - Vickery L Arcus
- Maurice Wilkins Centre for Molecular Biodiscovery, School of Biological Sciences, University of Auckland, Auckland, New Zealand
- Department of Biological Sciences, University of Waikato, Hamilton, New Zealand
| | - J Shaun Lott
- School of Biological Sciences, University of Auckland, Auckland, New Zealand
- Maurice Wilkins Centre for Molecular Biodiscovery, School of Biological Sciences, University of Auckland, Auckland, New Zealand
| |
|
24
|
Three-dimensional protein structure prediction: Methods and computational strategies. Comput Biol Chem 2014; 53PB:251-276. [DOI: 10.1016/j.compbiolchem.2014.10.001] [Citation(s) in RCA: 121] [Impact Index Per Article: 12.1] [Received: 06/12/2014] [Revised: 10/03/2014] [Accepted: 10/07/2014] [Indexed: 01/01/2023]
|
25
|
Abstract
BACKGROUND: Recognizing the correct structural fold among known template protein structures for a target protein (i.e., fold recognition) is essential for template-based protein structure modeling. Since the fold recognition problem can be defined as a binary classification problem of predicting whether or not the unknown fold of a target protein is similar to an already known template protein structure in a library, machine learning methods have been effectively applied to tackle this problem. In our work, we developed RF-Fold, which uses random forest, one of the most powerful and scalable machine learning classification methods, to recognize protein folds. RESULTS: RF-Fold consists of hundreds of decision trees that can be trained efficiently on very large datasets to make accurate predictions on a highly imbalanced dataset. We evaluated RF-Fold on the standard Lindahl benchmark dataset, comprising 976 × 975 target-template protein pairs, through cross-validation. Compared with 17 different fold recognition methods, the performance of RF-Fold is generally comparable to the best performance at every difficulty level, from the easiest (family) through the medium-hard (superfamily) to the hardest (fold). Based on the top-one template protein ranked by RF-Fold, the correct recognition rate is 84.5%, 63.4%, and 40.8% at the family, superfamily, and fold levels, respectively. Based on the top-five template protein folds ranked by RF-Fold, the correct recognition rate increases to 91.5%, 79.3%, and 58.3% at the family, superfamily, and fold levels. CONCLUSIONS: The good performance achieved by RF-Fold demonstrates the random forest's effectiveness for protein fold recognition.
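The top-one and top-five recognition rates quoted in this abstract can be computed along the following lines. This Python sketch assumes that each target already has its templates ranked by a classifier score (best first) and a known set of correct templates. The function name, data layout, and toy identifiers are hypothetical; this is not RF-Fold's evaluation code.

```python
def top_k_recognition_rate(rankings, correct, k):
    """Fraction of targets with at least one correct template in the top k.

    rankings: dict mapping target id -> list of template ids, best first.
    correct:  dict mapping target id -> set of template ids that share the
              target's family, superfamily, or fold (whichever is evaluated).
    """
    if k < 1 or not rankings:
        raise ValueError("need k >= 1 and at least one target")
    hits = sum(
        1
        for target, ranked in rankings.items()
        if any(template in correct[target] for template in ranked[:k])
    )
    return hits / len(rankings)


# Toy data: four targets, three templates each (all ids are made up).
rankings = {
    "t1": ["a", "b", "c"],
    "t2": ["b", "a", "c"],
    "t3": ["c", "b", "a"],
    "t4": ["a", "c", "b"],
}
correct = {"t1": {"a"}, "t2": {"a"}, "t3": {"a"}, "t4": {"b"}}

print(top_k_recognition_rate(rankings, correct, k=1))  # prints: 0.25
print(top_k_recognition_rate(rankings, correct, k=2))  # prints: 0.5
```

Because a larger k can only add hits, the rate never decreases as k grows, which matches the pattern of the top-five rates exceeding the top-one rates.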
Affiliation(s)
- Taeho Jo
- Department of Computer Science, Informatics Institute, C. Bond Life Science Center, University of Missouri, Columbia, MO 65211, USA
| | - Jianlin Cheng
- Department of Computer Science, Informatics Institute, C. Bond Life Science Center, University of Missouri, Columbia, MO 65211, USA
| |
|
26
|
Cao R, Wang Z, Wang Y, Cheng J. SMOQ: a tool for predicting the absolute residue-specific quality of a single protein model with support vector machines. BMC Bioinformatics 2014; 15:120. [PMID: 24776231 PMCID: PMC4013430 DOI: 10.1186/1471-2105-15-120] [Citation(s) in RCA: 87] [Impact Index Per Article: 8.7] [Received: 12/17/2013] [Accepted: 04/15/2014] [Indexed: 01/19/2023]
Abstract
Background: It is important to predict the quality of a protein structural model before its native structure is known. Methods that can predict the absolute local quality of individual residues in a single protein model are rare, yet particularly needed for using, ranking, and refining protein models. Results: We developed a machine learning tool (SMOQ) that can predict the distance deviation of each residue in a single protein model. SMOQ uses support vector machines (SVM) with protein sequence and structural features (i.e., the basic feature set), including amino acid sequence, secondary structures, solvent accessibilities, and residue-residue contacts, to make predictions. We also trained an SVM model with two additional features (profiles and SOV scores) on 20 CASP8 targets and found that including them improves performance only when the real deviations between native and model are higher than 5 Å. The released SMOQ tool uses the basic feature set trained on 85 CASP8 targets. Moreover, SMOQ implements a way to convert predicted local quality scores into a global quality score. SMOQ was tested on the 84 CASP9 single-domain targets. The average difference between the residue-specific distance deviation predicted by our method and the actual distance deviation on the test data is 2.637 Å. The global quality prediction accuracy of the tool is comparable to other good tools on the same benchmark. Conclusion: SMOQ is a useful tool for protein single model quality assessment. Its source code and executable are available at: http://sysbio.rnet.missouri.edu/multicom_toolbox/.
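Two quantities mentioned in this abstract, the average difference between predicted and actual per-residue deviations and a local-to-global score conversion, can be sketched in Python as follows. The abstract does not state which conversion SMOQ uses, so the S-score-style transform 1 / (1 + (d / d0)²) and the default d0 = 5.0 Å are assumptions made here for illustration, as are the function names.

```python
def mean_absolute_error(predicted, actual):
    """Average absolute difference between per-residue deviations (in Å)."""
    if not predicted or len(predicted) != len(actual):
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(p - a) for p, a in zip(predicted, actual)) / len(predicted)


def global_score(deviations, d0=5.0):
    """Map per-residue deviations (Å) to a single score in (0, 1].

    Assumed transform: mean of 1 / (1 + (d / d0)**2); d0 is illustrative.
    """
    if not deviations:
        raise ValueError("deviations must not be empty")
    return sum(1.0 / (1.0 + (d / d0) ** 2) for d in deviations) / len(deviations)


# Toy values in Å, made up for the example.
print(mean_absolute_error([1.0, 2.0, 6.0], [2.0, 2.0, 4.0]))  # prints: 1.0
print(global_score([0.0, 5.0]))  # prints: 0.75
```

With this transform a residue with zero deviation contributes 1 and a residue at d0 contributes 0.5, so large local errors are down-weighted rather than dominating the global score.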
Affiliation(s)
| | | | | | - Jianlin Cheng
- Department of Computer Science, Informatics Institute, Christopher S. Bond Life Science Center, University of Missouri, Columbia, MO 65211, USA.
| |
|
27
|
Moult J, Fidelis K, Kryshtafovych A, Schwede T, Tramontano A. Critical assessment of methods of protein structure prediction (CASP)--round x. Proteins 2014. [PMID: 24344053 DOI: 10.1002/prot.24452.critical] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Indexed: 04/16/2023]
Abstract
This article is an introduction to the special issue of the journal PROTEINS, dedicated to the tenth Critical Assessment of Structure Prediction (CASP) experiment to assess the state of the art in protein structure modeling. The article describes the conduct of the experiment, the categories of prediction included, and outlines the evaluation and assessment procedures. The 10 CASP experiments span almost 20 years of progress in the field of protein structure modeling, and there have been enormous advances in methods and model accuracy in that period. Notable in this round is the first sustained improvement of models with refinement methods, using molecular dynamics. For the first time, we tested the ability of modeling methods to make use of sparse experimental three-dimensional contact information, such as may be obtained from new experimental techniques, with encouraging results. On the other hand, new contact prediction methods, though holding considerable promise, have yet to make an impact in CASP testing. The nature of CASP targets has been changing in recent CASPs, reflecting shifts in experimental structural biology, with more irregular structures, more multi-domain and multi-subunit structures, and less standard versions of known folds. When allowance is made for these factors, we continue to see steady progress in the overall accuracy of models, particularly resulting from improvement of non-template regions.
Affiliation(s)
- John Moult
- Institute for Bioscience and Biotechnology Research and Department of Cell Biology and Molecular Genetics, University of Maryland, Rockville, Maryland, 20850
| | | | | | | | | |
|
28
|
Moult J, Fidelis K, Kryshtafovych A, Schwede T, Tramontano A. Critical assessment of methods of protein structure prediction (CASP)--round x. Proteins 2014; 82 Suppl 2:1-6. [PMID: 24344053 PMCID: PMC4394854 DOI: 10.1002/prot.24452] [Citation(s) in RCA: 312] [Impact Index Per Article: 31.2] [Received: 10/16/2013] [Accepted: 10/21/2013] [Indexed: 12/28/2022]
Affiliation(s)
- John Moult
- Institute for Bioscience and Biotechnology Research, and Department of Cell Biology and Molecular Genetics, University of Maryland, Rockville, Maryland 20850
| | | | | | - Torsten Schwede
- University of Basel, Biozentrum & SIB Swiss Institute of Bioinformatics, Basel, Switzerland
| | - Anna Tramontano
- Department of Physics and Istituto Pasteur-Fondazione Cenci Bolognetti, Sapienza University of Rome, 00185 Rome, Italy
| |
|
29
|
Terashi G, Nakamura Y, Shimoyama H, Takeda-Shitaka M. Quality Assessment Methods for 3D Protein Structure Models Based on a Residue–Residue Distance Matrix Prediction. Chem Pharm Bull (Tokyo) 2014; 62:744-53. [PMID: 25087626 DOI: 10.1248/cpb.c13-00973] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.5] [Indexed: 11/22/2022]
|
30
|
Abstract
This article is an introduction to the special issue of the journal PROTEINS, dedicated to the tenth Critical Assessment of Structure Prediction (CASP) experiment to assess the state of the art in protein structure modeling. The article describes the conduct of the experiment, the categories of prediction included, and outlines the evaluation and assessment procedures. The 10 CASP experiments span almost 20 years of progress in the field of protein structure modeling, and there have been enormous advances in methods and model accuracy in that period. Notable in this round is the first sustained improvement of models with refinement methods, using molecular dynamics. For the first time, we tested the ability of modeling methods to make use of sparse experimental three-dimensional contact information, such as may be obtained from new experimental techniques, with encouraging results. On the other hand, new contact prediction methods, though holding considerable promise, have yet to make an impact in CASP testing. The nature of CASP targets has been changing in recent CASPs, reflecting shifts in experimental structural biology, with more irregular structures, more multi-domain and multi-subunit structures, and less standard versions of known folds. When allowance is made for these factors, we continue to see steady progress in the overall accuracy of models, particularly resulting from improvement of non-template regions.
|
31
|
Mirzaie M, Sadeghi M. Delaunay-based nonlocal interactions are sufficient and accurate in protein fold recognition. Proteins 2013; 82:415-23. [DOI: 10.1002/prot.24407] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.6] [Received: 04/15/2013] [Revised: 08/12/2013] [Accepted: 08/21/2013] [Indexed: 01/05/2023]
Affiliation(s)
- Mehdi Mirzaie
- Department of Basic Sciences, Faculty of Paramedical Sciences; Shahid Beheshti University of Medical Sciences; Tehran Iran
- Department of Bioinformatics; School of Computer Science, Institute for Research in Fundamental Sciences (IPM); Tehran Iran
- Mehdi Sadeghi
- Department of Bioinformatics, National Institute of Genetic Engineering and Biotechnology; Tehran Iran
|
32
|
Krupa P, Sieradzan AK, Rackovsky S, Baranowski M, Ołdziej S, Scheraga HA, Liwo A, Czaplewski C. Improvement of the treatment of loop structures in the UNRES force field by inclusion of coupling between backbone- and side-chain-local conformational states. J Chem Theory Comput 2013; 9. [PMID: 24273465 DOI: 10.1021/ct4004977] [Citation(s) in RCA: 30] [Impact Index Per Article: 2.7] [Indexed: 11/30/2022]
Abstract
The UNited RESidue (UNRES) coarse-grained model of polypeptide chains, developed in our laboratory, enables us to carry out millisecond-scale molecular-dynamics simulations of large proteins effectively. It performs well in ab initio predictions of protein structure, as demonstrated in the last Community Wide Experiment on the Critical Assessment of Techniques for Protein Structure Prediction (CASP10). However, the resolution of the simulated structure is too coarse, especially in loop regions, which results from insufficient specificity of the model of local interactions. To improve the representation of local interactions, in this work we introduced new side-chain-backbone correlation potentials, derived from a statistical analysis of loop regions of 4585 proteins. To obtain sufficient statistics, we reduced the set of amino-acid-residue types to five groups, derived in our earlier work on structurally optimized reduced alphabets, based on a statistical analysis of the properties of amino-acid structures. The new correlation potentials are expressed as one-dimensional Fourier series in the virtual-bond-dihedral angles involving side-chain centroids. The weight of these new terms was determined by a trial-and-error method, in which Multiplexed Replica Exchange Molecular Dynamics (MREMD) simulations were run on selected test proteins. The best average root-mean-square deviations (RMSDs) of the calculated structures from the experimental structures below the folding-transition temperatures were obtained with the weight of the new side-chain-backbone correlation potentials equal to 0.57. The resulting conformational ensembles were analyzed in detail by using the Weighted Histogram Analysis Method (WHAM) and Ward's minimum-variance clustering. 
This analysis showed that the RMSDs from the experimental structures dropped by 0.5 Å on average, compared to simulations without the new terms, and the deviation of individual residues in the loop region of the computed structures from their counterparts in the experimental structures (after optimum superposition of the calculated and experimental structure) decreased by up to 8 Å. Consequently, the new terms improve the representation of local structure.
Affiliation(s)
- Paweł Krupa
- Faculty of Chemistry, University of Gdańsk, Wita Stwosza 63, 80-952 Gdańsk, Poland; Baker Laboratory of Chemistry and Chemical Biology, Cornell University, Ithaca, N.Y., 14853-1301, U.S.A.
- Adam K Sieradzan
- Faculty of Chemistry, University of Gdańsk, Wita Stwosza 63, 80-952 Gdańsk, Poland
- S Rackovsky
- Baker Laboratory of Chemistry and Chemical Biology, Cornell University, Ithaca, N.Y., 14853-1301, U.S.A.; Dept. of Pharmacology and Systems Therapeutics, The Icahn School of Medicine at Mount Sinai, One Gustave L. Levy Place, New York, NY 10029, U.S.A.
- Maciej Baranowski
- Intercollegiate Faculty of Biotechnology, University of Gdańsk and Medical University of Gdańsk, Kładki 24, 80-922 Gdańsk, Poland
- Stanisław Ołdziej
- Intercollegiate Faculty of Biotechnology, University of Gdańsk and Medical University of Gdańsk, Kładki 24, 80-922 Gdańsk, Poland
- Harold A Scheraga
- Baker Laboratory of Chemistry and Chemical Biology, Cornell University, Ithaca, N.Y., 14853-1301, U.S.A.
- Adam Liwo
- Faculty of Chemistry, University of Gdańsk, Wita Stwosza 63, 80-952 Gdańsk, Poland
- Cezary Czaplewski
- Faculty of Chemistry, University of Gdańsk, Wita Stwosza 63, 80-952 Gdańsk, Poland
|
33
|
Chida A, Zhang YQ, Harrison R. Enhanced Encoding with Improved Fuzzy Decision Tree Testing Using CASP Templates. IEEE Comput Intell M 2012. [DOI: 10.1109/mci.2012.2215134] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/10/2022]
|
34
|
Ray A, Lindahl E, Wallner B. Improved model quality assessment using ProQ2. BMC Bioinformatics 2012; 13:224. [PMID: 22963006 PMCID: PMC3584948 DOI: 10.1186/1471-2105-13-224] [Citation(s) in RCA: 150] [Impact Index Per Article: 12.5] [Received: 05/11/2012] [Accepted: 09/07/2012] [Indexed: 11/19/2022] Open
Abstract
Background: Employing methods to assess the quality of modeled protein structures is now standard practice in bioinformatics. In a broad sense, the techniques can be divided into methods relying on consensus prediction on the one hand, and single-model methods on the other. Consensus methods frequently perform very well when there is a clear consensus, but this is not always the case. In particular, they frequently fail in selecting the best possible model in the hard cases (lacking consensus) or in the easy cases where models are very similar. In contrast, single-model methods do not suffer from these drawbacks and could potentially be applied on any protein of interest to assess quality or as a scoring function for sampling-based refinement. Results: Here, we present a new single-model method, ProQ2, based on ideas from its predecessor, ProQ. ProQ2 is a model quality assessment algorithm that uses support vector machines to predict local as well as global quality of protein models. Improved performance is obtained by combining previously used features with updated structural and predicted features. The most important contribution can be attributed to the use of profile weighting of the residue-specific features and the use of features averaged over the whole model even though the prediction is still local. Conclusions: ProQ2 is significantly better than its predecessors at detecting high quality models, improving the sum of Z-scores for the selected first-ranked models by 20% and 32% compared to the second-best single-model method in CASP8 and CASP9, respectively. The absolute quality assessment of the models at both local and global level is also improved. The Pearson’s correlation between the correct and local predicted score is improved from 0.59 to 0.70 on CASP8 and from 0.62 to 0.68 on CASP9; for global score to the correct GDT_TS from 0.75 to 0.80 and from 0.77 to 0.80 again compared to the second-best single methods in CASP8 and CASP9, respectively. ProQ2 is available at http://proq2.wallnerlab.org.
Affiliation(s)
- Arjun Ray
- Department of Theoretical Physics & Swedish eScience Research Center, Royal Institute of Technology, Stockholm, Sweden
|
35
|
Maadooliat M, Gao X, Huang JZ. Assessing protein conformational sampling methods based on bivariate lag-distributions of backbone angles. Brief Bioinform 2012; 14:724-36. [PMID: 22926831 DOI: 10.1093/bib/bbs052] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.8] [Indexed: 01/20/2023] Open
Abstract
Despite considerable progress in the past decades, protein structure prediction remains one of the major unsolved problems in computational biology. Angular-sampling-based methods have been extensively studied recently due to their ability to capture the continuous conformational space of protein structures. The literature has focused on using a variety of parametric models of the sequential dependencies between angle pairs along the protein chains. In this article, we present a thorough review of angular-sampling-based methods by assessing three main questions: What is the best distribution type to model the protein angles? What is a reasonable number of components in a mixture model that should be considered to accurately parameterize the joint distribution of the angles? And what is the order of the local sequence-structure dependency that should be considered by a prediction method? We assess the model fits for different methods using bivariate lag-distributions of the dihedral/planar angles. Moreover, the main information across the lags can be extracted using a technique called Lag singular value decomposition (LagSVD), which considers the joint distribution of the dihedral/planar angles over different lags using a nonparametric approach and monitors the behavior of the lag-distribution of the angles using singular value decomposition. As a result, we developed graphical tools and numerical measurements to compare and evaluate the performance of different model fits. Furthermore, we developed a web-tool (http://www.stat.tamu.edu/~madoliat/LagSVD) that can be used to produce informative animations.
Affiliation(s)
- Mehdi Maadooliat
- Mathematical and Computer Sciences and Engineering Division, 4700 King Abdullah University of Science and Technology, Thuwal 23955-6900, Kingdom of Saudi Arabia. Jianhua Z. Huang, Department of Statistics, 447 Blocker Building, Texas A&M University, 3143 TAMU, College Station, TX 77843-3143, USA
|
36
|
Abstract
Functional characterization of proteins, one of the major issues in molecular biology, remains unsolved because of resource and technical limitations of experimental structure determination methods. A methodology for accurately predicting protein conformations from sequence alone is therefore emerging as the primary modeling goal of researchers today. The global blind protein structure prediction experiment, the Critical Assessment of Structure Prediction (CASP), critically assesses modeling methodologies to track algorithmic progress. However, success is still impeded by inadequate modeling methodologies and several key technical gaps. The upcoming CASP needs to focus on key issues to bridge gaps that are major, though often considered trivial, and so mark out the way toward a consistently accurate prediction methodology in the near future. The major problems that cause predicted models to diverge from their native states are therefore highlighted, along with suggestions for more stringent and reliable assessment in CASP.
Affiliation(s)
- Ashish Runthala
- Biological Sciences, Faculty Division III, Birla Institute of Technology & Science, Pilani, Rajasthan, India.
|
37
|
Corbeil CR, Williams CI, Labute P. Variability in docking success rates due to dataset preparation. J Comput Aided Mol Des 2012; 26:775-86. [PMID: 22566074 PMCID: PMC3397132 DOI: 10.1007/s10822-012-9570-1] [Citation(s) in RCA: 281] [Impact Index Per Article: 23.4] [Received: 12/23/2011] [Accepted: 04/03/2012] [Indexed: 01/22/2023]
Abstract
The results of cognate docking with the prepared Astex dataset provided by the organizers of the "Docking and Scoring: A Review of Docking Programs" session at the 241st ACS national meeting are presented. The MOE software with the newly developed GBVI/WSA dG scoring function is used throughout the study. For 80 % of the Astex targets, the MOE docker produces a top-scoring pose within 2 Å of the X-ray structure. For 91 % of the targets a pose within 2 Å of the X-ray structure is produced in the top 30 poses. Docking failures, defined as cases where the top scoring pose is greater than 2 Å from the experimental structure, are shown to be largely due to the absence of bound waters in the source dataset, highlighting the need to include these and other crucial information in future standardized sets. Docking success is shown to depend heavily on data preparation. A "dataset preparation" error of 0.5 kcal/mol is shown to cause fluctuations of over 20 % in docking success rates.
Affiliation(s)
- Christopher R Corbeil
- Chemical Computing Group, Suite 910, 1010 Sherbrooke Street West, Montreal, QC, H3A 2R7, Canada.
|
38
|
Rodrigues JPGLM, Trellet M, Schmitz C, Kastritis P, Karaca E, Melquiond ASJ, Bonvin AMJJ. Clustering biomolecular complexes by residue contacts similarity. Proteins 2012; 80:1810-7. [PMID: 22489062 DOI: 10.1002/prot.24078] [Citation(s) in RCA: 60] [Impact Index Per Article: 5.0] [Received: 12/12/2011] [Revised: 03/14/2012] [Accepted: 03/30/2012] [Indexed: 01/01/2023]
Abstract
Inaccuracies in computational molecular modeling methods are often counterweighed by brute-force generation of a plethora of putative solutions. These are then typically sieved via structural clustering based on similarity measures such as the root mean square deviation (RMSD) of atomic positions. Albeit widely used, these measures suffer from several theoretical and technical limitations (e.g., choice of regions for fitting) that impair their application in multicomponent systems (N > 2), large-scale studies (e.g., interactomes), and other time-critical scenarios. We present here a simple similarity measure for structural clustering based on atomic contacts--the fraction of common contacts--and compare it with the most used similarity measure of the protein docking community--interface backbone RMSD. We show that this method produces very compact clusters in remarkably short time when applied to a collection of binary and multicomponent protein-protein and protein-DNA complexes. Furthermore, it allows easy clustering of similar conformations of multicomponent symmetrical assemblies in which chain permutations can occur. Simple contact-based metrics should be applicable to other structural biology clustering problems, in particular for time-critical or large-scale endeavors.
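The fraction of common contacts described in this abstract can be illustrated with a minimal Python sketch. It rests on assumptions that are not in the abstract: each residue is reduced to a single representative point, a contact is any inter-chain residue pair within a 5.0 Å cutoff, and the score is taken relative to the first model's contact set (so it is not symmetric). The function names are hypothetical and this is not the authors' implementation.

```python
import math
from itertools import product


def contacts(chain_a, chain_b, cutoff=5.0):
    """Return the set of (i, j) index pairs whose points lie within `cutoff`.

    chain_a and chain_b are lists of (x, y, z) tuples, one point per residue
    (an illustrative simplification; the cutoff value is also an assumption).
    """
    found = set()
    for (i, p), (j, q) in product(enumerate(chain_a), enumerate(chain_b)):
        if math.dist(p, q) <= cutoff:
            found.add((i, j))
    return found


def fraction_common_contacts(reference, other):
    """Fraction of the reference model's contacts that also occur in `other`."""
    if not reference:
        return 0.0
    return len(reference & other) / len(reference)


# Two toy models of the same two-chain complex: chain A is fixed,
# chain B is placed differently in each model.
chain_a = [(0.0, 0.0, 0.0), (10.0, 0.0, 0.0)]
model_1_b = [(0.0, 3.0, 0.0), (10.0, 3.0, 0.0)]
model_2_b = [(0.0, 3.0, 0.0), (10.0, 9.0, 0.0)]

c1 = contacts(chain_a, model_1_b)
c2 = contacts(chain_a, model_2_b)
print(fraction_common_contacts(c1, c2))
print(fraction_common_contacts(c2, c1))
```

Because the score only compares sets of contact labels, no structural superposition is needed, which is consistent with the speed advantage over RMSD-based clustering that the abstract reports.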
Affiliation(s)
- João P G L M Rodrigues
- Bijvoet Center for Biomolecular Research, Faculty of Science, Utrecht University, 3584 CH Utrecht, The Netherlands
|
39
|
Cheng J, Li J, Wang Z, Eickholt J, Deng X. The MULTICOM toolbox for protein structure prediction. BMC Bioinformatics 2012; 13:65. [PMID: 22545707 PMCID: PMC3495398 DOI: 10.1186/1471-2105-13-65] [Citation(s) in RCA: 24] [Impact Index Per Article: 2.0] [Received: 01/20/2012] [Accepted: 04/30/2012] [Indexed: 12/31/2022] Open
Abstract
Background As genome sequencing is becoming routine in biomedical research, the total number of protein sequences is increasing exponentially, recently reaching over 108 million. However, only a tiny portion of these proteins (i.e. ~75,000 or < 0.07%) have solved tertiary structures determined by experimental techniques. The gap between protein sequence and structure continues to enlarge rapidly as the throughput of genome sequencing techniques is much higher than that of protein structure determination techniques. Computational software tools for predicting protein structure and structural features from protein sequences are crucial to make use of this vast repository of protein resources. Results To meet the need, we have developed a comprehensive MULTICOM toolbox consisting of a set of protein structure and structural feature prediction tools. These tools include secondary structure prediction, solvent accessibility prediction, disorder region prediction, domain boundary prediction, contact map prediction, disulfide bond prediction, beta-sheet topology prediction, fold recognition, multiple template combination and alignment, template-based tertiary structure modeling, protein model quality assessment, and mutation stability prediction. Conclusions These tools have been rigorously tested by many users in the last several years and/or during the last three rounds of the Critical Assessment of Techniques for Protein Structure Prediction (CASP7-9) from 2006 to 2010, achieving state-of-the-art or near performance. In order to facilitate bioinformatics research and technological development in the field, we have made the MULTICOM toolbox freely available as web services and/or software packages for academic use and scientific research. It is available at http://sysbio.rnet.missouri.edu/multicom_toolbox/.
Affiliation(s)
- Jianlin Cheng
- Department of Computer Science, University of Missouri-Columbia, Columbia, MO 65211, USA.
|
40
|
Xu D, Zhang Y. Ab initio protein structure assembly using continuous structure fragments and optimized knowledge-based force field. Proteins 2012; 80:1715-35. [PMID: 22411565 DOI: 10.1002/prot.24065] [Citation(s) in RCA: 596] [Impact Index Per Article: 49.7] [Received: 11/07/2011] [Revised: 01/23/2012] [Accepted: 03/03/2012] [Indexed: 11/09/2022]
Abstract
Ab initio protein folding is one of the major unsolved problems in computational biology owing to the difficulties in force field design and conformational search. We developed a novel program, QUARK, for template-free protein structure prediction. Query sequences are first broken into fragments of 1-20 residues where multiple fragment structures are retrieved at each position from unrelated experimental structures. Full-length structure models are then assembled from fragments using replica-exchange Monte Carlo simulations, which are guided by a composite knowledge-based force field. A number of novel energy terms and Monte Carlo movements are introduced and the particular contributions to enhancing the efficiency of both force field and search engine are analyzed in detail. QUARK prediction procedure is depicted and tested on the structure modeling of 145 nonhomologous proteins. Although no global templates are used and all fragments from experimental structures with template modeling score >0.5 are excluded, QUARK can successfully construct 3D models of correct folds in one-third cases of short proteins up to 100 residues. In the ninth community-wide Critical Assessment of protein Structure Prediction experiment, QUARK server outperformed the second and third best servers by 18 and 47% based on the cumulative Z-score of global distance test-total scores in the FM category. Although ab initio protein folding remains a significant challenge, these data demonstrate new progress toward the solution of the most important problem in the field.
Affiliation(s)
- Dong Xu
- Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, Michigan 48109, USA
|
41
|
Chatterjee S, Bhattacharyya M, Vishveshwara S. Network properties of protein-decoy structures. J Biomol Struct Dyn 2012; 29:606-22. [DOI: 10.1080/07391102.2011.672625] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.8] [Indexed: 10/28/2022]
|
42
|
Kim JH, Lim JW, Lee SW, Kim KR, No KT. Prediction of Binding Mode between Chemokine Receptor CCR2 and Its Known Antagonists using Ligand Supported Homology Modeling. B Korean Chem Soc 2012. [DOI: 10.5012/bkcs.2012.33.2.717] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Indexed: 11/20/2022]
|
43
|
Srivastava M, Gupta SK, Abhilash PC, Singh N. Structure prediction and binding sites analysis of curcin protein of Jatropha curcas using computational approaches. J Mol Model 2011; 18:2971-9. [PMID: 22146985 DOI: 10.1007/s00894-011-1320-0] [Citation(s) in RCA: 25] [Impact Index Per Article: 1.9] [Received: 06/09/2011] [Accepted: 11/22/2011] [Indexed: 11/29/2022]
Abstract
Ribosome inactivating proteins (RIPs) are defense proteins in a number of higher-plant species that are directly targeted toward herbivores. Jatropha curcas is one of the biodiesel plants having RIPs. The Jatropha seed meal, after extraction of oil, is rich in curcin, a highly toxic RIP similar to ricin, which makes it unsuitable for animal feed. Although the toxicity of curcin is well documented in the literature, its detailed toxic properties and 3D structure have not been determined by X-ray crystallography, NMR spectroscopy or any in silico techniques to date. To this end, the structure of curcin was modeled by a composite approach of 3D structure prediction using threading and ab initio modeling. Model quality was assessed by methods that include Ramachandran plot analysis and Qmean score estimation. Further, we applied the protein-ligand docking approach to identify the r-RNA binding residue of curcin. The present work provides the first structural insight into the binding mode of r-RNA adenine to the curcin protein and forms the basis for designing future inhibitors of curcin. Cloning of a future peptide inhibitor within J. curcas can produce non-toxic varieties of J. curcas, which would make the seed-cake suitable as animal feed without curcin detoxification.
Affiliation(s)
- Mugdha Srivastava
- Eco-Auditing Laboratory, National Botanical Research Institute, CSIR, Lucknow, 226001 Uttar Pradesh, India.
|
44
|
Bardwell DA, Adjiman CS, Arnautova YA, Bartashevich E, Boerrigter SXM, Braun DE, Cruz-Cabeza AJ, Day GM, Della Valle RG, Desiraju GR, van Eijck BP, Facelli JC, Ferraro MB, Grillo D, Habgood M, Hofmann DWM, Hofmann F, Jose KVJ, Karamertzanis PG, Kazantsev AV, Kendrick J, Kuleshova LN, Leusen FJJ, Maleev AV, Misquitta AJ, Mohamed S, Needs RJ, Neumann MA, Nikylov D, Orendt AM, Pal R, Pantelides CC, Pickard CJ, Price LS, Price SL, Scheraga HA, van de Streek J, Thakur TS, Tiwari S, Venuti E, Zhitkov IK. Towards crystal structure prediction of complex organic compounds--a report on the fifth blind test. Acta Crystallographica. Section B, Structural Science 2011; 67:535-51. [PMID: 22101543 PMCID: PMC3222142 DOI: 10.1107/s0108768111042868] [Citation(s) in RCA: 247] [Impact Index Per Article: 19.0] [Received: 08/01/2011] [Accepted: 10/16/2011] [Indexed: 12/04/2022]
Abstract
Following on from the success of the previous crystal structure prediction blind tests (CSP1999, CSP2001, CSP2004 and CSP2007), a fifth such collaborative project (CSP2010) was organized at the Cambridge Crystallographic Data Centre. A range of methodologies was used by the participating groups in order to evaluate the ability of the current computational methods to predict the crystal structures of the six organic molecules chosen as targets for this blind test. The first four targets, two rigid molecules, one semi-flexible molecule and a 1:1 salt, matched the criteria for the targets from CSP2007, while the last two targets belonged to two new challenging categories - a larger, much more flexible molecule and a hydrate with more than one polymorph. Each group submitted three predictions for each target it attempted. There was at least one successful prediction for each target, and two groups were able to successfully predict the structure of the large flexible molecule as their first place submission. The results show that while not as many groups successfully predicted the structures of the three smallest molecules as in CSP2007, there is now evidence that methodologies such as dispersion-corrected density functional theory (DFT-D) are able to reliably do so. The results also highlight the many challenges posed by more complex systems and show that there are still issues to be overcome.
Affiliation(s)
- David A Bardwell
- Cambridge Crystallographic Data Centre, 12 Union Road, Cambridge CB2 1EZ, England.
|
45
|
Poleksic A. Optimal pairwise alignment of fixed protein structures in subquadratic time. J Bioinform Comput Biol 2011; 9:367-82. [DOI: 10.1142/s0219720011005562] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.5] [Received: 01/12/2011] [Revised: 03/30/2011] [Accepted: 04/11/2011] [Indexed: 11/18/2022]
Abstract
The problem of finding an optimal structural alignment for a pair of superimposed proteins is often amenable to the Smith–Waterman dynamic programming algorithm, which runs in time proportional to the product of lengths of the sequences being aligned. While the quadratic running time is acceptable for computing a single alignment of two fixed protein structures, the time complexity becomes a bottleneck when running the Smith–Waterman routine multiple times in order to find a globally optimal superposition and alignment of the input proteins. We present a subquadratic running time algorithm capable of computing an alignment that optimizes one of the most widely used measures of protein structure similarity, defined as the number of pairs of residues in two proteins that can be superimposed under a predefined distance cutoff. The algorithm presented in this article can be used to significantly improve the speed–accuracy tradeoff in a number of popular protein structure alignment methods.
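For orientation, the quadratic baseline that this abstract contrasts itself with can be sketched in Python as follows. The sketch is a simplified, gap-penalty-free dynamic program that maximizes the number of sequentially aligned residue pairs lying within a distance cutoff, for two structures that are already superimposed. The cutoff value, the one-point-per-residue representation, and the function name are assumptions for illustration; the subquadratic algorithm of the article is not reproduced here.

```python
import math


def max_pairs_within_cutoff(coords_a, coords_b, cutoff=3.0):
    """Largest number of order-preserving residue pairs within `cutoff`.

    coords_a and coords_b are lists of (x, y, z) tuples for two structures
    that are assumed to be superimposed already. Runs in O(n * m) time,
    i.e. proportional to the product of the two chain lengths.
    """
    n, m = len(coords_a), len(coords_b)
    # best[i][j] = best score using the first i residues of A and first j of B
    best = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            close = math.dist(coords_a[i - 1], coords_b[j - 1]) <= cutoff
            best[i][j] = max(
                best[i - 1][j],                   # leave residue i of A unaligned
                best[i][j - 1],                   # leave residue j of B unaligned
                best[i - 1][j - 1] + int(close),  # align residue i with residue j
            )
    return best[n][m]


structure_a = [(0.0, 0.0, 0.0), (4.0, 0.0, 0.0), (8.0, 0.0, 0.0)]
structure_b = [(0.0, 1.0, 0.0), (8.0, 1.0, 0.0)]
print(max_pairs_within_cutoff(structure_a, structure_b))
```

Repeating this routine for many candidate superpositions is what makes the quadratic cost a bottleneck, as the abstract notes.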
Affiliation(s)
- Aleksandar Poleksic
- Department of Computer Science, University of Northern Iowa, Cedar Falls, Iowa 50613, USA
|
46
|
Menon R, Roy A, Mukherjee S, Belkin S, Zhang Y, Omenn GS. Functional implications of structural predictions for alternative splice proteins expressed in Her2/neu-induced breast cancers. J Proteome Res 2011; 10:5503-11. [PMID: 22003824 DOI: 10.1021/pr200772w] [Citation(s) in RCA: 39] [Impact Index Per Article: 3.0] [Indexed: 12/22/2022]
Abstract
Alternative splicing allows a single gene to generate multiple mRNA transcripts, which can be translated into functionally diverse proteins. However, experimentally determined structures of protein splice isoforms are rare, and homology modeling methods are poor at predicting atomic-level structural differences because of high sequence identity. Here we exploit the state-of-the-art structure prediction method I-TASSER to analyze the structural and functional consequences of alternative splicing of proteins differentially expressed in a breast cancer model. We first successfully benchmarked the I-TASSER pipeline for structure modeling of all seven pairs of protein splice isoforms, which are known to have experimentally solved structures. We then modeled three cancer-related variant pairs reported to have opposite functions. In each pair, we observed structural differences in regions where the presence or absence of a motif can directly influence the distinctive functions of the variants. Finally, we applied the method to five splice variants overexpressed in mouse Her2/neu mammary tumor: anxa6, calu, cdc42, ptbp1, and tax1bp3. Despite >75% sequence identity between the variants, structural differences were observed in biologically important regions of these protein pairs. These results demonstrate the feasibility of integrating proteomic analysis with structure-based conformational predictions of differentially expressed alternative splice variants in cancers and other conditions.
Affiliation(s)
- Rajasree Menon
- Center for Computational Medicine and Bioinformatics, University of Michigan, 100 Washtenaw Avenue, Ann Arbor, Michigan 48109-2218, United States.
|
47
|
Moult J, Fidelis K, Kryshtafovych A, Tramontano A. Critical assessment of methods of protein structure prediction (CASP)--round IX. Proteins 2011; 79 Suppl 10:1-5. [PMID: 21997831 DOI: 10.1002/prot.23200] [Citation(s) in RCA: 177] [Impact Index Per Article: 13.6] [Received: 09/07/2011] [Accepted: 09/12/2011] [Indexed: 12/16/2022]
Abstract
This article is an introduction to the special issue of the journal PROTEINS, dedicated to the ninth Critical Assessment of Structure Prediction (CASP) experiment to assess the state of the art in protein structure modeling. The article describes the conduct of the experiment, the categories of prediction included, and outlines the evaluation and assessment procedures. Methods for modeling protein structure continue to advance, although at a more modest pace than in the early CASP experiments. CASP developments of note are indications of improvement in model accuracy for some classes of target, an improved ability to choose the most accurate of a set of generated models, and evidence of improvement in accuracy for short "new fold" models. In addition, a new analysis of regions of models not derivable from the most obvious template structure has revealed better performance than expected.
Affiliation(s)
- John Moult
- Institute for Bioscience and Biotechnology Research, and Department of Cell Biology and Molecular Genetics, University of Maryland, Rockville, MD 20850, USA.
|
48
|
Wang Q, Vantasin K, Xu D, Shang Y. MUFOLD-WQA: A new selective consensus method for quality assessment in protein structure prediction. Proteins 2011; 79 Suppl 10:185-95. [PMID: 21997748 DOI: 10.1002/prot.23185] [Citation(s) in RCA: 21] [Impact Index Per Article: 1.6] [Received: 02/02/2011] [Revised: 08/25/2011] [Accepted: 08/27/2011] [Indexed: 11/07/2022]
Abstract
Assessing the quality of predicted models is essential in protein tertiary structure prediction. In the past critical assessment of techniques for protein structure prediction (CASP) experiments, consensus quality assessment (QA) methods have been shown to be very effective, outperforming single-model methods and other competing approaches by a large margin. In the consensus QA approach, the quality score of a model is typically estimated based on its pair-wise structure similarity to a set of reference models. In CASP8, the differences among the top QA servers were mostly in the selection of the reference models. In this article, we present a new consensus method "SelCon" based on two key ideas: (1) to adaptively select appropriate reference models based on the attributes of the whole set of predicted models and (2) to weigh different reference models differently, and in particular not to use models that are too similar or too different from the candidate model as its references. We have developed several reference selection functions in SelCon and obtained improved QA results over existing QA methods in experiments using CASP7 and CASP8 data. In the recently completed CASP9 in 2010, the new method was implemented in our MUFOLD-WQA server. Both the official CASP9 assessment and our in-house evaluation showed that MUFOLD-WQA performed very well and achieved top performances in both the global structure QA and top-model selection category in CASP9.
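The general consensus idea in this abstract, including the exclusion of references that are too similar or too different, might be sketched in Python as below. This is only an illustration under stated assumptions: the pairwise similarity matrix is taken as given (for example GDT_TS-like values scaled to 0..1), the `low` and `high` thresholds are hypothetical fixed numbers, and all retained references are weighted equally. SelCon's actual adaptive selection functions and weights are not described in the abstract and are not reproduced here.

```python
def consensus_scores(similarity, low=0.0, high=1.0):
    """Score each model by its mean similarity to the selected reference models.

    similarity[i][j] is the pairwise structural similarity between models i
    and j. For model i, a reference j is used only if low <= similarity <= high
    (hypothetical fixed thresholds standing in for an adaptive selection rule).
    A model with no admissible reference receives a score of 0.0.
    """
    n = len(similarity)
    scores = []
    for i in range(n):
        refs = [
            similarity[i][j]
            for j in range(n)
            if j != i and low <= similarity[i][j] <= high
        ]
        scores.append(sum(refs) / len(refs) if refs else 0.0)
    return scores


# Toy symmetric similarity matrix for three predicted models.
sim = [
    [1.0, 0.8, 0.2],
    [0.8, 1.0, 0.4],
    [0.2, 0.4, 1.0],
]
plain = consensus_scores(sim)                         # every other model is a reference
selective = consensus_scores(sim, low=0.3, high=0.9)  # drop very dissimilar references
print(plain)
print(selective)
```

In the toy data, dropping the dissimilar reference changes which model ranks first, which illustrates why the choice of reference set matters for consensus QA.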
Affiliation(s)
- Qingguo Wang
- Department of Computer Science, University of Missouri, Columbia, MO 65211, USA
|
49
|
Kuziemko A, Honig B, Petrey D. Using structure to explore the sequence alignment space of remote homologs. PLoS Comput Biol 2011; 7:e1002175. [PMID: 21998567 PMCID: PMC3188491 DOI: 10.1371/journal.pcbi.1002175] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.7] [Received: 02/01/2011] [Accepted: 07/14/2011] [Indexed: 11/18/2022] Open
Abstract
Protein structure modeling by homology requires an accurate sequence alignment between the query protein and its structural template. However, sequence alignment methods based on dynamic programming (DP) are typically unable to generate accurate alignments for remote sequence homologs, thus limiting the applicability of modeling methods. A central problem is that the alignment that is “optimal” in terms of the DP score does not necessarily correspond to the alignment that produces the most accurate structural model. That is, the correct alignment based on structural superposition will generally have a lower score than the optimal alignment obtained from sequence. Variations of the DP algorithm have been developed that generate alternative alignments that are “suboptimal” in terms of the DP score, but these still encounter difficulties in detecting the correct structural alignment. We present here a new alternative sequence alignment method that relies heavily on the structure of the template. By initially aligning the query sequence to individual fragments in secondary structure elements and combining high-scoring fragments that pass basic tests for “modelability”, we can generate accurate alignments within a small ensemble. Our results suggest that the set of sequences that can currently be modeled by homology can be greatly extended.
It has been suggested that, for nearly every protein sequence, there is already a protein with a similar structure in current protein structure databases. However, with poor or undetectable sequence relationships, it is expected that accurate alignments and models cannot be generated. Here we show that this is not the case, and that whenever structural relationship exists, there are usually local sequence relationships that can be used to generate an accurate alignment, no matter what the global sequence identity. However, this requires an alternative to the traditional dynamic programming algorithm and the consideration of a small ensemble of alignments. We present an algorithm, S4, and demonstrate that it is capable of generating accurate alignments in nearly all cases where a structural relationship exists between two proteins. Our results thus constitute an important advance in the full exploitation of the information in structural databases. That is, the expectation of an accurate alignment suggests that a meaningful model can be generated for nearly every sequence for which a suitable template exists.
Affiliation(s)
- Andrew Kuziemko
- Howard Hughes Medical Institute, Department of Biochemistry and Molecular Biophysics, Columbia University, New York, New York, United States of America
- Center for Computational Biology and Bioinformatics, Columbia University, New York, New York, United States of America
- Barry Honig
- Howard Hughes Medical Institute, Department of Biochemistry and Molecular Biophysics, Columbia University, New York, New York, United States of America
- Center for Computational Biology and Bioinformatics, Columbia University, New York, New York, United States of America
- Donald Petrey
- Howard Hughes Medical Institute, Department of Biochemistry and Molecular Biophysics, Columbia University, New York, New York, United States of America
- Center for Computational Biology and Bioinformatics, Columbia University, New York, New York, United States of America
|
50
|
Li Y, Zhang J, Tai D, Middaugh CR, Zhang Y, Fang J. PROTS: a fragment based protein thermo-stability potential. Proteins 2011; 80:81-92. [PMID: 21976375 DOI: 10.1002/prot.23163] [Citation(s) in RCA: 25] [Impact Index Per Article: 1.9] [Received: 05/20/2011] [Revised: 07/18/2011] [Accepted: 07/31/2011] [Indexed: 12/30/2022]
Abstract
Designing proteins with enhanced thermo-stability has been a main focus of protein engineering because of its theoretical and practical significance. Despite extensive studies in past years, a general strategy for stabilizing proteins remains elusive. Thus, effective and robust computational algorithms for designing thermo-stable proteins are in critical demand. Here we report PROTS, a sequential and structural four-residue fragment-based protein thermo-stability potential. PROTS is derived from a nonredundant representative collection of thousands of thermophilic and mesophilic protein structures and a large set of point mutations with experimentally determined changes in melting temperature. To the best of our knowledge, PROTS is the first protein stability predictor based on integrated analysis and mining of these two types of data. Besides conventional cross-validation and blind testing, we introduce hypothetical reverse mutations as a means of testing the robustness of protein thermo-stability predictors. In all tests, PROTS demonstrates the ability to reliably predict mutation-induced thermo-stability changes as well as to classify thermophilic and mesophilic proteins. In addition, this white-box predictor allows easy interpretation of the factors that influence mutation-induced protein stability changes at the residue level.
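The reverse-mutation test mentioned in this abstract can be sketched as follows. This is an illustration under stated assumptions, not the PROTS protocol: it assumes that the predicted change for a mutation and for its reverse (applied to the mutant sequence) should cancel, and the names `apply_mutation`, `reverse_mutation`, `antisymmetry_error`, and both toy predictors are hypothetical.

```python
from typing import Callable, Tuple

# (wild-type residue, 1-based position, mutant residue)
Mutation = Tuple[str, int, str]


def apply_mutation(seq: str, mut: Mutation) -> str:
    """Return the sequence carrying the point mutation."""
    wt, pos, new = mut
    if seq[pos - 1] != wt:
        raise ValueError(f"expected {wt} at position {pos}, found {seq[pos - 1]}")
    return seq[: pos - 1] + new + seq[pos:]


def reverse_mutation(mut: Mutation) -> Mutation:
    """Swap wild-type and mutant residues at the same position."""
    wt, pos, new = mut
    return (new, pos, wt)


def antisymmetry_error(
    predict: Callable[[str, Mutation], float], seq: str, mut: Mutation
) -> float:
    """Absolute deviation from forward + reverse == 0 (0.0 means consistent)."""
    forward = predict(seq, mut)
    backward = predict(apply_mutation(seq, mut), reverse_mutation(mut))
    return abs(forward + backward)


# Toy per-residue scale and predictors, invented purely for this example.
TOY_SCALE = {"A": 1.0, "G": 0.0, "V": 2.0, "M": 1.5}


def toy_predict(seq: str, mut: Mutation) -> float:
    wt, _, new = mut
    return TOY_SCALE[new] - TOY_SCALE[wt]  # antisymmetric by construction


def biased_predict(seq: str, mut: Mutation) -> float:
    return toy_predict(seq, mut) - 0.5  # constant bias breaks antisymmetry


consistent = antisymmetry_error(toy_predict, "MAGV", ("A", 2, "V"))
inconsistent = antisymmetry_error(biased_predict, "MAGV", ("A", 2, "V"))
```

A predictor that systematically favors one direction, such as the biased toy above, shows a nonzero error on this check even if it performs well on a test set dominated by mutations of one sign.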
Affiliation(s)
- Yunqi Li
- Applied Bioinformatics Laboratory, the University of Kansas, Lawrence, Kansas 66047, USA
|