1
Yang Z, Zeng X, Zhao Y, Chen R. AlphaFold2 and its applications in the fields of biology and medicine. Signal Transduct Target Ther 2023; 8:115. [PMID: 36918529 PMCID: PMC10011802 DOI: 10.1038/s41392-023-01381-z] [Citation(s) in RCA: 100] [Impact Index Per Article: 100.0] [Received: 10/29/2022] [Revised: 12/27/2022] [Accepted: 02/16/2023] [Indexed: 03/16/2023]
Abstract
AlphaFold2 (AF2) is an artificial intelligence (AI) system developed by DeepMind that can predict three-dimensional (3D) structures of proteins from amino acid sequences with atomic-level accuracy. Protein structure prediction is one of the most challenging problems in computational biology and chemistry, and has puzzled scientists for 50 years. The advent of AF2 represents unprecedented progress in protein structure prediction and has attracted much attention. The subsequent release of AF2-predicted structures for more than 200 million proteins further aroused great enthusiasm in the scientific community, especially in the fields of biology and medicine. AF2 is expected to have a significant impact on structural biology and on research areas that need protein structure information, such as drug discovery, protein design, and prediction of protein function. Although AF2 was developed only recently, there are already quite a few studies applying it in biology and medicine, many of which provide preliminary evidence of its potential. To better understand AF2 and promote its applications, this article summarizes the principle and system architecture of AF2 as well as the reasons for its success, and focuses in particular on reviewing its applications in the fields of biology and medicine. Limitations of current AF2 predictions are also discussed.
Affiliation(s)
- Zhenyu Yang
  - West China Biomedical Big Data Center, West China Hospital, Sichuan University, Chengdu, 610041, China.
- Xiaoxi Zeng
  - West China Biomedical Big Data Center, West China Hospital, Sichuan University, Chengdu, 610041, China.
- Yi Zhao
  - West China Biomedical Big Data Center, West China Hospital, Sichuan University, Chengdu, 610041, China.
  - Key Laboratory of Intelligent Information Processing, Advanced Computer Research Center, Institute of Computing Technology, Chinese Academy of Sciences, Beijing, 100190, China.
- Runsheng Chen
  - West China Biomedical Big Data Center, West China Hospital, Sichuan University, Chengdu, 610041, China.
  - Key Laboratory of RNA Biology, Center for Big Data Research in Health, Institute of Biophysics, Chinese Academy of Sciences, Beijing, 100101, China.
  - Pingshan Translational Medicine Center, Shenzhen Bay Laboratory, Shenzhen, 518118, China.
2
Nagaraju M, Liu H. A scoring function for the prediction of protein complex interfaces based on the neighborhood preferences of amino acids. Acta Crystallogr D Struct Biol 2023; 79:31-39. [PMID: 36601805 DOI: 10.1107/s2059798322011858] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/18/2022] [Accepted: 12/13/2022] [Indexed: 12/24/2022]
Abstract
Proteins often assemble into functional complexes, the structures of which are more difficult to obtain than those of the individual protein molecules. Given the structures of the subunits, it is possible to predict plausible complex models via computational methods such as molecular docking. Assessing the quality of the predicted models is crucial to obtain correct complex structures. Here, an energy-scoring function was developed based on the interfacial residues of structures in the Protein Data Bank. The statistically derived energy function (Nepre) imitates the neighborhood preferences of amino acids, including the types and relative positions of neighboring residues. Based on the preference statistics, a program iNepre was implemented and its performance was evaluated with several benchmarking decoy data sets. The results show that iNepre scores are powerful in model ranking to select the best protein complex structures.
Affiliation(s)
- Mulpuri Nagaraju
  - Complex Systems Division, Beijing Computational Science Research Center, Beijing 100193, People's Republic of China.
- Haiguang Liu
  - Complex Systems Division, Beijing Computational Science Research Center, Beijing 100193, People's Republic of China.
3
Bongirwar V, Mokhade AS. Different methods, techniques and their limitations in protein structure prediction: A review. Progress in Biophysics and Molecular Biology 2022; 173:72-82. [PMID: 35588858 DOI: 10.1016/j.pbiomolbio.2022.05.002] [Citation(s) in RCA: 8] [Impact Index Per Article: 4.0] [Received: 08/23/2021] [Revised: 04/16/2022] [Accepted: 05/11/2022] [Indexed: 11/17/2022]
Abstract
Because of the increase in different types of diseases in human habitats, the demand for designing various types of drugs is also increasing. Proteins and their structures play a very important role in drug design. Therefore, researchers from different areas such as mathematics, medicine, and computer science are teaming up to find better solutions in this field. In this paper, we discuss different methods of secondary and tertiary protein structure prediction (PSP), along with the limitations of the different approaches. The types of datasets used in PSP are also discussed. The paper also describes performance measures for evaluating the prediction accuracy of PSP methods. Various software packages and servers that predict the protein structure for an input protein sequence are available for download. These tools also help to compare the performance of any new algorithm with other available methods. Details of these tools are also given in this paper.
Affiliation(s)
- A S Mokhade
  - Visvesvaraya National Institute of Technology, Nagpur, India.
4
Machine learning/molecular dynamic protein structure prediction approach to investigate the protein conformational ensemble. Sci Rep 2022; 12:10018. [PMID: 35705565 PMCID: PMC9200820 DOI: 10.1038/s41598-022-13714-z] [Citation(s) in RCA: 12] [Impact Index Per Article: 6.0] [Received: 11/12/2021] [Accepted: 05/11/2022] [Indexed: 11/25/2022]
Abstract
Proteins exist in several different conformations. These structural changes are often associated with fluctuations at the residue level. Recent findings show that co-evolutionary analysis coupled with machine-learning techniques improves the precision by providing quantitative distance predictions between pairs of residues. The predicted statistical distance distribution from Multi Sequence Analysis reveals the presence of different local maxima, suggesting the flexibility of key residue pairs. Here we investigate the ability of residue-residue distance prediction to provide insights into the protein conformational ensemble. We apply deep learning approaches combined with mechanistic modeling to a set of proteins that experimentally showed conformational changes. The predicted protein models were filtered based on energy scores and RMSD clustering, and the lowest-energy structure in each cluster was selected as its centroid. These models were compared to the experimental-Molecular Dynamics (MD) relaxed structure by analyzing the backbone residue torsional distribution and the sidechain orientations. Our pipeline retrieves the structural dynamics represented experimentally by different X-ray conformations of the same sequence, as well as the conformational space observed in the MD simulations. We show the potential correlation between the experimental structure dynamics and the predicted model ensemble, demonstrating the susceptibility of the current state-of-the-art methods in protein folding and dynamics prediction and pointing out areas for improvement.
5
Peng CX, Zhou XG, Zhang GJ. De novo Protein Structure Prediction by Coupling Contact With Distance Profile. IEEE/ACM Transactions on Computational Biology and Bioinformatics 2022; 19:395-406. [PMID: 32750861 DOI: 10.1109/tcbb.2020.3000758] [Citation(s) in RCA: 5] [Impact Index Per Article: 2.5] [Indexed: 06/11/2023]
Abstract
De novo protein structure prediction is a challenging problem that requires both an accurate energy function and an efficient conformation sampling method. In this study, a de novo structure prediction method, named CoDiFold, is proposed. In CoDiFold, contacts and distance profiles are combined into the Rosetta low-resolution energy function to improve the accuracy of the energy function. As a result, the correlation between energy and root mean square deviation (RMSD) is improved. In addition, a population-based multi-mutation strategy is designed to balance the exploration and exploitation of conformation space sampling. The average RMSD of the models generated by the proposed protocol is decreased by 49.24 and 45.21 percent in the test set with 43 proteins compared with those of the Rosetta and QUARK de novo protocols, respectively. The results also demonstrate that the structures predicted by the proposed CoDiFold are comparable to those of the state-of-the-art methods for the 10 FM targets of CASP13. The source code and executable versions are freely available at http://github.com/iobio-zjut/CoDiFold.
6
Xia YH, Peng CX, Zhou XG, Zhang GJ. A Sequential Niche Multimodal Conformational Sampling Algorithm for Protein Structure Prediction. Bioinformatics 2021; 37:4357-4365. [PMID: 34245242 DOI: 10.1093/bioinformatics/btab500] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Received: 12/22/2020] [Revised: 06/23/2021] [Accepted: 07/05/2021] [Indexed: 11/13/2022]
Abstract
MOTIVATION Massive local minima on the protein energy landscape often cause traditional conformational sampling algorithms to be easily trapped in local basin regions, because they find it difficult to overcome high-energy barriers. Also, the lowest energy conformation may not correspond to the native structure due to the inaccuracy of energy models. This study investigates whether these two problems can be alleviated by a sequential niche technique without loss of accuracy. RESULTS A sequential niche multimodal conformational sampling algorithm for protein structure prediction (SNfold) is proposed in this study. In SNfold, a derating function is designed based on the knowledge learned from the previous sampling and used to construct a series of sampling-guided energy functions. These functions then help the sampling algorithm overcome high-energy barriers and avoid the re-sampling of the explored regions. In inaccurate protein energy models, the high-energy conformation that may correspond to the native structure can be sampled with successively updated sampling-guided energy functions. The proposed SNfold is tested on 300 benchmark proteins, 24 CASP13 and 19 CASP14 FM targets. Results show that SNfold correctly folds (TM-score ≥ 0.5) 231 out of 300 proteins. In particular, compared with Rosetta restrained by distance (Rosetta-dist), SNfold achieves higher average TM-score and improves the sampling efficiency by more than 100 times. On several CASP FM targets, SNfold also shows good performance compared with four state-of-the-art servers in CASP. As a plug-in conformational sampling algorithm, SNfold can be extended to other protein structure prediction methods. AVAILABILITY The source code and executable versions are freely available at https://github.com/iobio-zjut/SNfold. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
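The sequential niche idea in this abstract can be illustrated with a toy example. The sketch below is not SNfold's derating function or energy model: the 1D double-well `energy`, the power-law `derating_penalty`, its `radius`, `height`, and `alpha` parameters, and the exhaustive grid search that stands in for a conformational sampler are all assumptions chosen for illustration. It shows only the general mechanism: each located minimum adds a penalty to the guided energy, so the next search round is pushed toward an unexplored basin.

```python
def energy(x):
    """Toy 1D double-well energy with the deeper well near x = -1 (hypothetical)."""
    return (x * x - 1.0) ** 2 + 0.2 * x


def derating_penalty(x, niche_center, radius=0.5, height=2.0, alpha=2.0):
    """Penalty that peaks at a visited minimum and vanishes outside its niche radius."""
    distance = abs(x - niche_center)
    if distance >= radius:
        return 0.0
    return height * (1.0 - distance / radius) ** alpha


def guided_energy(x, visited):
    """Original energy plus one penalty term per previously located minimum."""
    return energy(x) + sum(derating_penalty(x, center) for center in visited)


def sequential_niche_search(n_rounds=2, lo=-2.0, hi=2.0, n_grid=4001):
    """Run repeated searches, updating the guided energy after each round.

    A grid search replaces a real sampler so that the example is deterministic.
    """
    step = (hi - lo) / (n_grid - 1)
    grid = [lo + i * step for i in range(n_grid)]
    visited = []
    for _ in range(n_rounds):
        best = min(grid, key=lambda x: guided_energy(x, visited))
        visited.append(best)
    return visited


minima = sequential_niche_search()
print(minima)
```

The first round finds the deeper well on the left; after that basin is penalized, the second round settles in the shallower well on the right instead of re-sampling the first.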
Affiliation(s)
- Yu-Hao Xia
  - College of Information Engineering, Zhejiang University of Technology, Hangzhou, 310023, China.
- Chun-Xiang Peng
  - College of Information Engineering, Zhejiang University of Technology, Hangzhou, 310023, China.
- Xiao-Gen Zhou
  - Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, MI, 48109-2218, USA.
- Gui-Jun Zhang
  - College of Information Engineering, Zhejiang University of Technology, Hangzhou, 310023, China.
7
Kwon Y, Lee J. MolFinder: an evolutionary algorithm for the global optimization of molecular properties and the extensive exploration of chemical space using SMILES. J Cheminform 2021; 13:24. [PMID: 33736687 PMCID: PMC7977239 DOI: 10.1186/s13321-021-00501-7] [Citation(s) in RCA: 19] [Impact Index Per Article: 6.3] [Received: 11/02/2020] [Accepted: 02/27/2021] [Indexed: 12/22/2022]
Abstract
Here, we introduce a new molecule optimization method, MolFinder, based on an efficient global optimization algorithm, the conformational space annealing algorithm, and the SMILES representation. MolFinder efficiently finds diverse molecules with desired properties without any training or a large molecular database. Compared with recently proposed reinforcement-learning-based molecule optimization algorithms, MolFinder consistently outperforms them in terms of both the optimization of a given target property and the generation of a set of diverse and novel molecules. The efficiency of MolFinder demonstrates that combinatorial optimization using the SMILES representation is a promising approach for molecule optimization, which has not been well investigated despite its simplicity. We believe that our results shed light on new possibilities for advances in molecule optimization methods.
Affiliation(s)
- Yongbeom Kwon
  - Department of Chemistry, Division of Chemistry and Biochemistry, Kangwon National University, 1 Gangwondaehak-gil, Chuncheon, 24341, Republic of Korea.
  - Arontier Inc., 15F, 241, Gangnam-daero, Seocho-gu, Seoul, 06735, Republic of Korea.
- Juyong Lee
  - Department of Chemistry, Division of Chemistry and Biochemistry, Kangwon National University, 1 Gangwondaehak-gil, Chuncheon, 24341, Republic of Korea.
8
Zhang GJ, Wang XQ, Ma LF, Wang LJ, Hu J, Zhou XG. Two-Stage Distance Feature-based Optimization Algorithm for De novo Protein Structure Prediction. IEEE/ACM Transactions on Computational Biology and Bioinformatics 2020; 17:2119-2130. [PMID: 31107659 DOI: 10.1109/tcbb.2019.2917452] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 06/09/2023]
Abstract
De novo protein structure prediction can be treated as a conformational space optimization problem under the guidance of an energy function. However, it is challenging to design an accurate energy function that ensures low-energy conformations are close to native structures. Fortunately, recent studies have shown that the accuracy of de novo protein structure prediction can be significantly improved by integrating residue-residue distance information. In this paper, a two-stage distance feature-based optimization algorithm (TDFO) for de novo protein structure prediction is proposed within the framework of an evolutionary algorithm. In TDFO, a similarity model is first designed using feature information extracted from distance profiles by the bisecting K-means algorithm. A similarity model-based selection strategy is then developed to guide the conformation search and thus improve the quality of the predicted models. Moreover, global and local mutation strategies are designed, and a state estimation strategy is proposed to strike a trade-off between the exploration and exploitation of the search space. Experimental results on 35 benchmark proteins show that the proposed TDFO can improve prediction accuracy for a large portion of the test proteins.
9
Chen X, Song S, Ji J, Tang Z, Todo Y. Incorporating a multiobjective knowledge-based energy function into differential evolution for protein structure prediction. Inf Sci (N Y) 2020. [DOI: 10.1016/j.ins.2020.06.003] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Indexed: 01/17/2023]
10
Lee J, Brooks BR. Direct global optimization of Onsager-Machlup action using Action-CSA. Chem Phys 2020. [DOI: 10.1016/j.chemphys.2020.110768] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/26/2022]
11
Liu S, Xiang X, Gao X, Liu H. Neighborhood Preference of Amino Acids in Protein Structures and its Applications in Protein Structure Assessment. Sci Rep 2020; 10:4371. [PMID: 32152349 PMCID: PMC7062742 DOI: 10.1038/s41598-020-61205-w] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Received: 06/19/2019] [Accepted: 02/24/2020] [Indexed: 12/02/2022]
Abstract
Amino acids form protein 3D structures in unique manners such that the folded structure is stable and functional under physiological conditions. Non-specific and non-covalent interactions between amino acids exhibit neighborhood preferences. Based on structural information from the protein data bank, a statistical energy function was derived to quantify amino acid neighborhood preferences. The neighborhood of one amino acid is defined by its contacting residues, and the energy function is determined by the neighboring residue types and relative positions. The neighborhood preference of amino acids was exploited to facilitate structural quality assessment, which was implemented in the neighborhood preference program NEPRE. The source codes are available via https://github.com/LiuLab-CSRC/NePre.
Affiliation(s)
- Siyuan Liu
  - Complex Systems Division, Beijing Computational Science Research Center, Haidian, Beijing, 100193, China.
  - School of Software Engineering, University of Science and Technology of China, Hefei, Anhui, 230026, China.
- Xilun Xiang
  - Complex Systems Division, Beijing Computational Science Research Center, Haidian, Beijing, 100193, China.
  - School of Software Engineering, University of Science and Technology of China, Hefei, Anhui, 230026, China.
- Xiang Gao
  - Complex Systems Division, Beijing Computational Science Research Center, Haidian, Beijing, 100193, China.
  - School of Software Engineering, University of Science and Technology of China, Hefei, Anhui, 230026, China.
- Haiguang Liu
  - Complex Systems Division, Beijing Computational Science Research Center, Haidian, Beijing, 100193, China.
  - Physics Department, Beijing Normal University, Haidian, Beijing, 100875, China.
12
Revisiting the "satisfaction of spatial restraints" approach of MODELLER for protein homology modeling. PLoS Comput Biol 2019; 15:e1007219. [PMID: 31846452 PMCID: PMC6938380 DOI: 10.1371/journal.pcbi.1007219] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.2] [Received: 06/25/2019] [Revised: 12/31/2019] [Accepted: 11/13/2019] [Indexed: 01/02/2023]
Abstract
The most frequently used approach for protein structure prediction is currently homology modeling. The 3D model building phase of this methodology is critical for obtaining an accurate and biologically useful prediction. The most widely employed tool to perform this task is MODELLER. This program implements the “modeling by satisfaction of spatial restraints” strategy and its core algorithm has not been altered significantly since the early 1990s. In this work, we have explored the idea of modifying MODELLER with two effective, yet computationally light strategies to improve its 3D modeling performance. Firstly, we have investigated how the level of accuracy in the estimation of structural variability between a target protein and its templates in the form of σ values profoundly influences 3D modeling. We show that the σ values produced by MODELLER are on average weakly correlated to the true level of structural divergence between target-template pairs and that increasing this correlation greatly improves the program’s predictions, especially in multiple-template modeling. Secondly, we have inquired into how the incorporation of statistical potential terms (such as the DOPE potential) in MODELLER’s objective function positively impacts 3D modeling quality by providing a small but consistent improvement in metrics such as GDT-HA and lDDT and a large increase in stereochemical quality. Python modules to harness this second strategy are freely available at https://github.com/pymodproject/altmod. In summary, we show that there is considerable room for improving MODELLER in terms of 3D modeling quality and we propose strategies that could be pursued in order to further increase its performance.
Proteins are fundamental biological molecules that carry out countless activities in living beings. Since the function of proteins is dictated by their three-dimensional atomic structures, acquiring structural details of proteins provides deep insights into their function. Currently, the most frequently used computational approach for protein structure prediction is template-based modeling. In this approach, a target protein is modeled using the experimentally-derived structural information of a template protein assumed to have a similar structure to the target. MODELLER is the most frequently used program for template-based 3D model building. Despite its success, its predictions are not always accurate enough to be useful in Biomedical Research. Here, we show that it is possible to greatly increase the performance of MODELLER by modifying two aspects of its algorithm. First, we demonstrate that providing the program with accurate estimations of local target-template structural divergence greatly increases the quality of its predictions. Additionally, we show that modifying MODELLER’s scoring function with statistical potential energetic terms also helps to improve modeling quality. This work will be useful in future research, since it reports practical strategies to improve the performance of this core tool in Structural Bioinformatics.
13
Li ZW, Sun K, Hao XH, Hu J, Ma LF, Zhou XG, Zhang GJ. Loop Enhanced Conformational Resampling Method for Protein Structure Prediction. IEEE Trans Nanobioscience 2019; 18:567-577. [PMID: 31180866 DOI: 10.1109/tnb.2019.2922101] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/06/2022]
Abstract
Protein structure prediction has been a long-standing problem for the past decades. In particular, the loop region structure remains an obstacle to forming an accurate protein tertiary structure because of its flexibility. In this study, a differential evolution method guided by Rama torsion angle and secondary structure features, named RSDE, is proposed to predict three-dimensional structure by exploiting the loop region structure. In RSDE, the structure of the loop region is improved by two operators: a loop-based cross operator, which interchanges the configuration of a randomly selected loop region between individuals, and a loop-based mutate operator, which incorporates torsion angle features into conformational sampling. A stochastic ranking selective strategy is designed to select conformations with low energy and near-native structure. Moreover, a conformational resampling method, which uses previously learned knowledge to guide subsequent sampling, is proposed to improve the sampling efficiency. Experiments on a total of 28 test proteins reveal that the proposed RSDE is effective and can obtain native-like models.
14
Song S, Ji J, Chen X, Gao S, Tang Z, Todo Y. Adoption of an improved PSO to explore a compound multi-objective energy function in protein structure prediction. Appl Soft Comput 2018. [DOI: 10.1016/j.asoc.2018.07.042] [Citation(s) in RCA: 18] [Impact Index Per Article: 3.0] [Indexed: 12/26/2022]
15
Gao S, Song S, Cheng J, Todo Y, Zhou M. Incorporation of Solvent Effect into Multi-Objective Evolutionary Algorithm for Improved Protein Structure Prediction. IEEE/ACM Transactions on Computational Biology and Bioinformatics 2018; 15:1365-1378. [PMID: 28534784 DOI: 10.1109/tcbb.2017.2705094] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.3] [Indexed: 06/07/2023]
Abstract
The problem of predicting the three-dimensional (3-D) structure of a protein from its one-dimensional sequence has been called the "holy grail of molecular biology", and it has become an important part of structural genomics projects. Despite the rapid developments in computer technology and computational intelligence, it remains challenging and fascinating. In this paper, we propose a multi-objective evolutionary algorithm to solve it. We decompose the protein energy function of the Chemistry at HARvard Macromolecular Mechanics (CHARMM) force field into bond and non-bond energies as the first and second objectives. Considering the effect of solvent, we adopt the solvent-accessible surface area as the third objective. We use 66 benchmark proteins to verify the proposed method and obtain better or competitive results in comparison with the existing methods. The results suggest that it is necessary to incorporate the effect of solvent into a multi-objective evolutionary algorithm to improve protein structure prediction in terms of accuracy and efficiency.
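The three-objective formulation described in this abstract relies on comparing candidate conformations by Pareto dominance rather than by a single summed energy. The following is a minimal generic sketch of that comparison, not the authors' algorithm: the objective tuples (bond energy, non-bond energy, solvent-accessible surface area) hold made-up values, and treating all three as quantities to minimize is an assumption.

```python
def dominates(a, b):
    """True if objective vector a is no worse than b in every objective
    and strictly better in at least one (all objectives minimized)."""
    no_worse = all(x <= y for x, y in zip(a, b))
    strictly_better = any(x < y for x, y in zip(a, b))
    return no_worse and strictly_better


def pareto_front(candidates):
    """Return the candidates that are not dominated by any other candidate."""
    return [
        c for c in candidates
        if not any(dominates(other, c) for other in candidates if other is not c)
    ]


# Hypothetical (bond energy, non-bond energy, SASA) values for four conformations.
conformations = [
    (10.0, -5.0, 300.0),
    (12.0, -6.0, 280.0),
    (11.0, -4.0, 310.0),  # worse than the first conformation in all three objectives
    (9.0, -5.5, 320.0),
]

front = pareto_front(conformations)
print(front)
```

Only the third conformation is discarded; the other three each win on at least one objective, which is why a multi-objective method keeps a set of trade-off solutions instead of a single minimum.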
16
AIMOES: Archive information assisted multi-objective evolutionary strategy for ab initio protein structure prediction. Knowl Based Syst 2018. [DOI: 10.1016/j.knosys.2018.01.028] [Citation(s) in RCA: 40] [Impact Index Per Article: 6.7] [Indexed: 01/02/2023]
17
Joo K, Heo S, Joung I, Hong SH, Lee SJ, Lee J. Data-assisted protein structure modeling by global optimization in CASP12. Proteins 2018; 86 Suppl 1:240-246. [PMID: 29341255 DOI: 10.1002/prot.25457] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.8] [Received: 07/31/2017] [Revised: 12/29/2017] [Accepted: 01/08/2018] [Indexed: 12/26/2022]
Abstract
In CASP12, two types of data-assisted protein structure modeling were tested. Either SAXS experimental data or cross-linking experimental data were provided for a selected number of CASP12 targets, which CASP12 predictors could use for better protein structure modeling. We devised two separate energy terms for SAXS data and cross-linking data to drive the model structures toward more native-like structures that satisfied the given experimental data as much as possible. In CASP11, we successfully performed protein structure modeling using simulated sparse and ambiguously assigned NOE data and/or correct residue-residue contact information, where the only energy term that folded the protein into its native structure was the term derived from the given experimental data. However, the two types of experimental data provided in CASP12 were far from sufficient to fold the target protein into its native structure, because SAXS data provide only the overall shape of the molecule and the cross-linking contact information provides only very low-resolution distance information. For this reason, we combined the SAXS or cross-linking energy term with our regular modeling energy function, which includes both the template energy term and the de novo energy terms. By optimizing the newly formulated energy function, we obtained protein models that fit the provided SAXS data better than the X-ray structure of the target. However, the improvement of the model relative to the one modeled without the SAXS data was not significant. Consistent structural improvement was achieved by incorporating cross-linking data into the protein structure modeling.
Affiliation(s)
- Keehyoung Joo
  - Center for In Silico Protein Science, Korea Institute for Advanced Study, Seoul, 02455, South Korea.
  - Center for Advanced Computation, Korea Institute for Advanced Study, Seoul, 02455, South Korea.
- Seungryong Heo
  - School of Computational Sciences, Korea Institute for Advanced Study, Seoul, 02455, South Korea.
- InSuk Joung
  - Center for In Silico Protein Science, Korea Institute for Advanced Study, Seoul, 02455, South Korea.
- Seung Hwan Hong
  - School of Computational Sciences, Korea Institute for Advanced Study, Seoul, 02455, South Korea.
- Sung Jong Lee
  - Center for In Silico Protein Science, Korea Institute for Advanced Study, Seoul, 02455, South Korea.
  - The Research Institute for Basic Sciences, Changwon National University, Changwon-Si, Gyeongsangnam-do, South Korea.
- Jooyoung Lee
  - Center for In Silico Protein Science, Korea Institute for Advanced Study, Seoul, 02455, South Korea.
  - Center for Advanced Computation, Korea Institute for Advanced Study, Seoul, 02455, South Korea.
  - School of Computational Sciences, Korea Institute for Advanced Study, Seoul, 02455, South Korea.
18
Hong SH, Joung I, Flores-Canales JC, Manavalan B, Cheng Q, Heo S, Kim JY, Lee SY, Nam M, Joo K, Lee IH, Lee SJ, Lee J. Protein structure modeling and refinement by global optimization in CASP12. Proteins 2017; 86 Suppl 1:122-135. [PMID: 29159837 DOI: 10.1002/prot.25426] [Citation(s) in RCA: 14] [Impact Index Per Article: 2.0] [Received: 07/18/2017] [Revised: 11/10/2017] [Accepted: 11/16/2017] [Indexed: 11/09/2022]
Abstract
For protein structure modeling in the CASP12 experiment, we have developed a new protocol based on our previous CASP11 approach. The global optimization method of conformational space annealing (CSA) was applied to 3 stages of modeling: multiple sequence-structure alignment, three-dimensional (3D) chain building, and side-chain re-modeling. For better template selection and model selection, we updated our model quality assessment (QA) method with the newly developed SVMQA (support vector machine for quality assessment). For 3D chain building, we updated our energy function by including restraints generated from predicted residue-residue contacts. New energy terms for the predicted secondary structure and predicted solvent accessible surface area were also introduced. For difficult targets, we proposed a new method, LEEab, where the template term played a less significant role than it did in LEE, complemented by increased contributions from other terms such as the predicted contact term. For TBM (template-based modeling) targets, LEE performed better than LEEab, but for FM targets, LEEab was better. For model refinement, we modified our CASP11 molecular dynamics (MD) based protocol by using explicit solvents and tuning down restraint weights. Refinement results from MD simulations that used a new augmented statistical energy term in the force field were quite promising. Finally, when using inaccurate information (such as the predicted contacts), it was important to use the Lorentzian function for which the maximal penalty arising from wrong information is always bounded.
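The closing remark about bounded penalties can be made concrete with a small comparison. The abstract does not give the functional form used, so the sketch below assumes one common Lorentzian-type restraint, `weight * x**2 / (1 + x**2)` with `x = (d - d0) / sigma`; the function names, parameters, and distances are hypothetical. It contrasts this form with a harmonic restraint, whose penalty grows without limit when a predicted contact is wrong.

```python
def harmonic_penalty(d, d0, sigma=1.0, weight=1.0):
    """Quadratic restraint: the penalty grows without bound as d moves away from d0."""
    x = (d - d0) / sigma
    return weight * x * x


def lorentzian_penalty(d, d0, sigma=1.0, weight=1.0):
    """Lorentzian-type restraint (assumed form): the penalty saturates below `weight`."""
    x = (d - d0) / sigma
    return weight * x * x / (1.0 + x * x)


# Hypothetical restraint: a predicted contact at 8 Å, evaluated at increasing violations.
for distance in (8.0, 9.0, 12.0, 40.0):
    print(
        distance,
        round(harmonic_penalty(distance, 8.0), 3),
        round(lorentzian_penalty(distance, 8.0), 3),
    )
```

With the harmonic form, a single false contact between distant residues can dominate the total energy; with the saturating form, its contribution never exceeds `weight`, so correct restraints can still outvote it.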
Affiliation(s)
- Seung Hwan Hong
- Center for In Silico Protein Science, Korea Institute for Advanced Study, Seoul, South Korea.,School of Computational Sciences, Korea Institute for Advanced Study, Seoul, South Korea
| | - InSuk Joung
- Center for In Silico Protein Science, Korea Institute for Advanced Study, Seoul, South Korea.,School of Computational Sciences, Korea Institute for Advanced Study, Seoul, South Korea
| | - Jose C Flores-Canales
- Center for In Silico Protein Science, Korea Institute for Advanced Study, Seoul, South Korea.,School of Computational Sciences, Korea Institute for Advanced Study, Seoul, South Korea
| | - Balachandran Manavalan
- Center for In Silico Protein Science, Korea Institute for Advanced Study, Seoul, South Korea.,School of Computational Sciences, Korea Institute for Advanced Study, Seoul, South Korea
| | - Qianyi Cheng
- Center for In Silico Protein Science, Korea Institute for Advanced Study, Seoul, South Korea.,School of Computational Sciences, Korea Institute for Advanced Study, Seoul, South Korea
| | - Seungryong Heo
- Center for In Silico Protein Science, Korea Institute for Advanced Study, Seoul, South Korea
| | - Jong Yun Kim
- Center for In Silico Protein Science, Korea Institute for Advanced Study, Seoul, South Korea
| | - Sun Young Lee
- Center for In Silico Protein Science, Korea Institute for Advanced Study, Seoul, South Korea
| | - Mikyung Nam
- Center for In Silico Protein Science, Korea Institute for Advanced Study, Seoul, South Korea
| | - Keehyoung Joo
- Center for In Silico Protein Science, Korea Institute for Advanced Study, Seoul, South Korea.,Center for Advanced Computation, Korea Institute for Advanced Study, Seoul, South Korea
| | - In-Ho Lee
- Center for In Silico Protein Science, Korea Institute for Advanced Study, Seoul, South Korea.,Korea Research Institute of Standards and Science (KRISS), Daejeon, South Korea
| | - Sung Jong Lee
- Center for In Silico Protein Science, Korea Institute for Advanced Study, Seoul, South Korea.,The Research Institute for Basic Sciences, Changwon National University, Changwon-Si, Gyeongsangnam-do, South Korea
| | - Jooyoung Lee
- Center for In Silico Protein Science, Korea Institute for Advanced Study, Seoul, South Korea.,School of Computational Sciences, Korea Institute for Advanced Study, Seoul, South Korea.,Center for Advanced Computation, Korea Institute for Advanced Study, Seoul, South Korea
| |
|
19
|
Zhang GJ, Zhou XG, Yu XF, Hao XH, Yu L. Enhancing Protein Conformational Space Sampling Using Distance Profile-Guided Differential Evolution. IEEE/ACM Trans Comput Biol Bioinform 2017; 14:1288-1301. [PMID: 28113726 DOI: 10.1109/tcbb.2016.2566617] [Citation(s) in RCA: 18] [Impact Index Per Article: 2.6] [Indexed: 06/06/2023]
Abstract
De novo protein structure prediction aims to search for low-energy conformations, following the thermodynamic hypothesis that places native conformations at the global minimum of the protein energy surface. However, the native conformation is not necessarily located in the lowest-energy regions owing to the inaccuracies of the energy model. This study presents a differential evolution algorithm that uses a distance profile-based selection strategy to effectively sample conformations with reasonable structures. In the proposed algorithm, besides energy, the residue-residue distance is considered another measure of the conformation. When the trial conformation yields a larger energy value than that of the target, the average distance errors of decoys between the distance of each residue pair and the corresponding distance in the distance profiles are first calculated. Then, if the trial conformation obtains a lower average distance error than the target conformation, its distance acceptance probability is designed based on the distance profiles. The trial conformation is accepted to the next generation in accordance with its distance acceptance probability. By using the dual constraints of energy and distance in guiding sampling, the algorithm can sample conformations with lower energies and more reasonable structures. Experimental results on 28 benchmark proteins show that the proposed algorithm can effectively predict near-native protein structures.
|
20
|
Hao XH, Zhang GJ, Zhou XG. Conformational Space Sampling Method Using Multi-Subpopulation Differential Evolution for De novo Protein Structure Prediction. IEEE Trans Nanobioscience 2017; 16:618-633. [DOI: 10.1109/tnb.2017.2749243] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.0] [Indexed: 11/08/2022]
|
21
|
Inverse Resolution Limit of Partition Density and Detecting Overlapping Communities by Link-Surprise. Sci Rep 2017; 7:12399. [PMID: 28963540 PMCID: PMC5622083 DOI: 10.1038/s41598-017-12432-1] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.1] [Received: 04/24/2017] [Accepted: 09/04/2017] [Indexed: 11/16/2022] Open
Abstract
Finding overlapping communities of complex networks remains a challenge in network science. To address this challenge, one widely used approach is to find communities of links by optimizing an objective function, partition density. In this study, we show that partition density suffers from an inverse resolution limit: it has a strong preference for triangles. This resolution limit makes partition density an improper objective function for global optimization. The conditions under which partition density prefers triangles to larger link community structures are analytically derived and confirmed with global optimization calculations using synthetic and real-world networks. To overcome this limitation of partition density, we suggest an alternative measure, Link Surprise, to find link communities, which is suitable for global optimization. Benchmark studies demonstrate that global optimization of Link Surprise yields meaningful and more accurate link community structures than partition density optimization.
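To make the triangle preference concrete, here is a small sketch of partition density. The abstract does not state the formula, so this assumes the commonly used link-community definition, D = (2/M) Σ_c m_c (m_c − (n_c − 1)) / ((n_c − 2)(n_c − 1)), where community c has m_c links and n_c nodes and M is the total number of links; the function name and the toy graphs are illustrative only.

```python
def partition_density(communities):
    """Partition density of a link partition (assumed standard definition).

    communities: list of link communities, each a list of (u, v) edges.
    """
    total_links = sum(len(c) for c in communities)
    if total_links == 0:
        return 0.0
    acc = 0.0
    for edges in communities:
        m_c = len(edges)
        n_c = len({node for edge in edges for node in edge})
        if n_c <= 2:
            continue  # a single link contributes zero by convention
        acc += m_c * (m_c - (n_c - 1)) / ((n_c - 2) * (n_c - 1))
    return 2.0 * acc / total_links


# Two triangles sharing node 2 (a "bowtie").
left = [(0, 1), (1, 2), (0, 2)]
right = [(2, 3), (3, 4), (2, 4)]

print(partition_density([left + right]))  # one 5-node community
print(partition_density([left, right]))   # split into two triangles
```

Under this definition a triangle already attains the maximum per-community density of 1, so splitting a sparser community into triangles raises the score, which is one way to read the bias described above.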
|
22
|
Lee J, Lee IH, Joung I, Lee J, Brooks BR. Finding multiple reaction pathways via global optimization of action. Nat Commun 2017; 8:15443. [PMID: 28548089 PMCID: PMC5458546 DOI: 10.1038/ncomms15443] [Citation(s) in RCA: 25] [Impact Index Per Article: 3.6] [Received: 12/09/2016] [Accepted: 03/24/2017] [Indexed: 12/25/2022] Open
Abstract
Global searching for reaction pathways is a long-standing challenge in computational chemistry and biology. Most existing approaches perform only local searches due to computational complexity. Here we present a computational approach, Action-CSA, to find multiple diverse reaction pathways connecting fixed initial and final states through global optimization of the Onsager-Machlup action using the conformational space annealing (CSA) method. Action-CSA successfully overcomes large energy barriers via crossovers and mutations of pathways and finds all possible pathways of small systems without initial guesses on pathways. The rank order and the transition time distribution of multiple pathways are in good agreement with those of long Langevin dynamics simulations. The lowest action folding pathway of FSD-1 is consistent with recent experiments. The results show that Action-CSA is an efficient and robust computational approach to study the multiple pathways of complex reactions and large-scale conformational changes.
Affiliation(s)
- Juyong Lee
- Laboratory of Computational Biology, National Heart, Lung, and Blood Institute (NHLBI), National Institutes of Health (NIH), Bethesda, Maryland 20892, USA
| | - In-Ho Lee
- Center for Materials Genome, Korea Research Institute of Standards and Science, Daejeon 34113, Republic of Korea
- Center for In Silico Protein Science, School of Computational Science, Korea Institute for Advanced Study, Seoul 02455, Republic of Korea
| | - InSuk Joung
- Center for In Silico Protein Science, School of Computational Science, Korea Institute for Advanced Study, Seoul 02455, Republic of Korea
- School of Computational Sciences, Korea Institute for Advanced Study, Seoul 02455, Republic of Korea
| | - Jooyoung Lee
- Center for In Silico Protein Science, School of Computational Science, Korea Institute for Advanced Study, Seoul 02455, Republic of Korea
- School of Computational Sciences, Korea Institute for Advanced Study, Seoul 02455, Republic of Korea
| | - Bernard R. Brooks
- Laboratory of Computational Biology, National Heart, Lung, and Blood Institute (NHLBI), National Institutes of Health (NIH), Bethesda, Maryland 20892, USA
| |
|
23
|
Heo S, Lee J, Joo K, Shin HC, Lee J. Protein Loop Structure Prediction Using Conformational Space Annealing. J Chem Inf Model 2017; 57:1068-1078. [DOI: 10.1021/acs.jcim.6b00742] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.7] [Indexed: 11/28/2022]
Affiliation(s)
- Seungryong Heo
- School of Systems Biomedical Science, Soongsil University, Seoul 06978, Korea
| | - Juyong Lee
- Laboratory of Computational Biology, National Heart, Lung, and Blood Institute, National Institutes of Health, Bethesda, Maryland 20892, United States
| | | | - Hang-Cheol Shin
- School of Systems Biomedical Science, Soongsil University, Seoul 06978, Korea
| | | |
|
24
|
Absolute binding free energy calculations of CBClip host-guest systems in the SAMPL5 blind challenge. J Comput Aided Mol Des 2016; 31:71-85. [PMID: 27677749 DOI: 10.1007/s10822-016-9968-2] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.1] [Received: 06/20/2016] [Accepted: 09/08/2016] [Indexed: 12/11/2022]
Abstract
Herein, we report the absolute binding free energy calculations of CBClip complexes in the SAMPL5 blind challenge. Initial conformations of CBClip complexes were obtained using docking and molecular dynamics simulations. Free energy calculations were performed using thermodynamic integration (TI) with soft-core potentials and Bennett's acceptance ratio (BAR) method based on a serial insertion scheme. We compared the results obtained with TI simulations with soft-core potentials and Hamiltonian replica exchange simulations with the serial insertion method combined with the BAR method. The results show that the difference between the two methods can be mainly attributed to the van der Waals free energies, suggesting that either the simulations used for TI or the simulations used for BAR, or both, are not fully converged and that the two sets of simulations may have sampled different phase space regions. The penalty scores of force field parameters of the 10 guest molecules provided by CHARMM Generalized Force Field can be an indicator of the accuracy of binding free energy calculations. Among our submissions, the combination of docking and TI performed best, which yielded the root mean square deviation of 2.94 kcal/mol and an average unsigned error of 3.41 kcal/mol for the ten guest molecules. These values were best overall among all participants. However, our submissions had little correlation with experiments.
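For readers unfamiliar with thermodynamic integration, the core numerical step is an integral of the ensemble-averaged derivative ⟨∂U/∂λ⟩ over the coupling parameter λ from 0 to 1. The sketch below uses a plain trapezoidal rule; the function name, the λ schedule, and the sample values are hypothetical and are not taken from the study, which additionally relies on soft-core potentials that are not modeled here.

```python
def ti_free_energy(lambdas, mean_dudl):
    """Trapezoidal estimate of the integral of <dU/dlambda> over lambda."""
    if len(lambdas) != len(mean_dudl) or len(lambdas) < 2:
        raise ValueError("need matching lists with at least two lambda windows")
    total = 0.0
    for i in range(len(lambdas) - 1):
        width = lambdas[i + 1] - lambdas[i]
        total += 0.5 * width * (mean_dudl[i] + mean_dudl[i + 1])
    return total


if __name__ == "__main__":
    # Made-up window averages in kcal/mol, for illustration only.
    lambdas = [0.0, 0.25, 0.5, 0.75, 1.0]
    mean_dudl = [-12.0, -7.5, -4.0, -1.5, 0.5]
    print(ti_free_energy(lambdas, mean_dudl))
```

In practice each window average carries sampling error, so an incompletely converged window shifts the integral directly, which is consistent with the convergence concern raised in the abstract.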
|
25
|
Lamiable A, Thevenet P, Tufféry P. A critical assessment of hidden markov model sub-optimal sampling strategies applied to the generation of peptide 3D models. J Comput Chem 2016; 37:2006-16. [PMID: 27317417 DOI: 10.1002/jcc.24422] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.1] [Received: 02/16/2016] [Revised: 05/03/2016] [Accepted: 05/17/2016] [Indexed: 12/23/2022]
Abstract
Hidden Markov Model derived structural alphabets are a probabilistic framework in which the complete conformational space of a peptidic chain is described in terms of probability distributions that can be sampled to identify conformations of largest probabilities. Here, we assess how three strategies to sample sub-optimal conformations (Viterbi k-best, forward backtrack, and a taboo sampling approach) can lead to the efficient generation of peptide conformations. We show that the diversity of sampling is essential to compensate for biases introduced in the estimates of the probabilities, and we find that only the forward backtrack and taboo sampling strategies can efficiently generate native or near-native models. Finally, we also find that such approaches are as efficient as former protocols, while being one order of magnitude faster, opening the door to the large-scale de novo modeling of peptides and mini-proteins. © 2016 Wiley Periodicals, Inc.
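The forward backtrack idea can be sketched for a generic discrete HMM: run the forward recursion, then draw states from the end backwards so that each sampled path follows the posterior over paths. This is a textbook version under simplifying assumptions (small state space, no scaling against underflow, hypothetical toy parameters); it is not the structural-alphabet implementation assessed in the paper.

```python
import random


def forward(init, trans, emit, obs):
    """Forward probabilities alpha[t][i] = P(obs[0..t], state_t = i)."""
    n = len(init)
    alpha = [[init[i] * emit[i][obs[0]] for i in range(n)]]
    for o in obs[1:]:
        prev = alpha[-1]
        alpha.append([
            emit[j][o] * sum(prev[i] * trans[i][j] for i in range(n))
            for j in range(n)
        ])
    return alpha


def sample_path(init, trans, emit, obs, rng=random):
    """Draw one state path from P(path | obs) by stochastic backtracking."""
    alpha = forward(init, trans, emit, obs)
    n = len(init)
    states = list(range(n))
    # The last state is drawn in proportion to the final forward column.
    path = [rng.choices(states, weights=alpha[-1])[0]]
    # Earlier states: P(s_t = i | s_{t+1} = j, obs) is proportional to
    # alpha[t][i] * trans[i][j].
    for t in range(len(obs) - 2, -1, -1):
        nxt = path[-1]
        weights = [alpha[t][i] * trans[i][nxt] for i in range(n)]
        path.append(rng.choices(states, weights=weights)[0])
    path.reverse()
    return path


if __name__ == "__main__":
    # Toy two-state model with hypothetical parameters.
    init = [0.6, 0.4]
    trans = [[0.7, 0.3], [0.4, 0.6]]
    emit = [[0.9, 0.1], [0.2, 0.8]]
    rng = random.Random(0)
    for _ in range(3):
        print(sample_path(init, trans, emit, [0, 0, 1, 1], rng))
```

Unlike Viterbi k-best, which enumerates the top-scoring paths deterministically, repeated calls here return a diverse set of paths weighted by their posterior probability, which is the kind of diversity the abstract credits with compensating for estimation bias.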
Affiliation(s)
- A Lamiable
- INSERM UMR-S 973, Université Paris Diderot, Sorbonne Paris Cité
| | - P Thevenet
- INSERM UMR-S 973, Université Paris Diderot, Sorbonne Paris Cité
| | - P Tufféry
- INSERM UMR-S 973, Université Paris Diderot, Sorbonne Paris Cité
| |
|
26
|
Kinch LN, Li W, Monastyrskyy B, Kryshtafovych A, Grishin NV. Evaluation of free modeling targets in CASP11 and ROLL. Proteins 2016; 84 Suppl 1:51-66. [PMID: 26677002 DOI: 10.1002/prot.24973] [Citation(s) in RCA: 65] [Impact Index Per Article: 8.1] [Received: 09/28/2015] [Accepted: 12/12/2015] [Indexed: 12/25/2022]
Abstract
We present an assessment of 'template-free modeling' (FM) in CASP11 and ROLL. Community-wide server performance suggested that the use of automated scores similar to those of previous CASPs would provide a good system for evaluating performance, even in the absence of comprehensive manual assessment. The CASP11 FM category included several outstanding examples, including successful prediction by the Baker group of a 256-residue target (T0806-D1) that lacked sequence similarity to any existing template. The top server model prediction by Zhang's Quark, which was apparently selected and refined by several manual groups, encompassed the entire fold of target T0837-D1. Methods from the same two groups tended to dominate overall CASP11 FM and ROLL rankings. Comparison of top FM predictions with those from the previous CASP experiment revealed progress in the category, particularly reflected in high prediction accuracy for larger protein domains. FM prediction models for two cases were sufficient to provide functional insights that were otherwise not obtainable by traditional sequence analysis methods. Importantly, CASP11 abstracts revealed that alignment-based contact prediction methods brought about much of the CASP11 progress, producing both of the functionally relevant models as well as several of the other outstanding structure predictions. These methodological advances enabled de novo modeling of much larger domain structures than was previously possible and allowed prediction of functional sites. Proteins 2016; 84(Suppl 1):51-66. © 2015 Wiley Periodicals, Inc.
Affiliation(s)
- Lisa N Kinch
- Howard Hughes Medical Institute, University of Texas Southwestern Medical Center at Dallas, 6001 Forest Park Road, Dallas, Texas 75390-9050.
| | - Wenlin Li
- Department of Biophysics and Department of Biochemistry, University of Texas Southwestern Medical Center at Dallas, 6001 Forest Park Road, Dallas, Texas 75390-9050
| | - Bohdan Monastyrskyy
- Genome Center, University of California, 451 Health Sciences Drive, Davis, California 95616
| | - Andriy Kryshtafovych
- Genome Center, University of California, 451 Health Sciences Drive, Davis, California 95616
| | - Nick V Grishin
- Howard Hughes Medical Institute, University of Texas Southwestern Medical Center at Dallas, 6001 Forest Park Road, Dallas, Texas 75390-9050.,Department of Biophysics and Department of Biochemistry, University of Texas Southwestern Medical Center at Dallas, 6001 Forest Park Road, Dallas, Texas 75390-9050
| |
|
27
|
Joo K, Joung I, Cheng Q, Lee SJ, Lee J. Contact-assisted protein structure modeling by global optimization in CASP11. Proteins 2016; 84 Suppl 1:189-99. [DOI: 10.1002/prot.24975] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.3] [Received: 05/15/2015] [Revised: 11/24/2015] [Accepted: 12/12/2015] [Indexed: 11/09/2022]
Affiliation(s)
- Keehyoung Joo
- Center for in Silico Protein Science, Korea Institute for Advanced Study; Seoul 130-722 Korea
- Center for Advanced Computation, Korea Institute for Advanced Study; Seoul 130-722 Korea
| | - InSuk Joung
- Center for in Silico Protein Science, Korea Institute for Advanced Study; Seoul 130-722 Korea
- School of Computational Sciences, Korea Institute for Advanced Study; Seoul 130-722 Korea
| | - Qianyi Cheng
- Center for in Silico Protein Science, Korea Institute for Advanced Study; Seoul 130-722 Korea
- School of Computational Sciences, Korea Institute for Advanced Study; Seoul 130-722 Korea
| | - Sung Jong Lee
- Center for in Silico Protein Science, Korea Institute for Advanced Study; Seoul 130-722 Korea
- Department of Physics; University of Suwon; Hwaseong-Si Gyeonggi-do 445-743 Korea
| | - Jooyoung Lee
- Center for in Silico Protein Science, Korea Institute for Advanced Study; Seoul 130-722 Korea
- Center for Advanced Computation, Korea Institute for Advanced Study; Seoul 130-722 Korea
- School of Computational Sciences, Korea Institute for Advanced Study; Seoul 130-722 Korea
| |
|
28
|
Joung I, Lee SY, Cheng Q, Kim JY, Joo K, Lee SJ, Lee J. Template-free modeling by LEE and LEER in CASP11. Proteins 2015; 84 Suppl 1:118-30. [DOI: 10.1002/prot.24944] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.8] [Received: 05/15/2015] [Revised: 08/26/2015] [Accepted: 10/11/2015] [Indexed: 12/25/2022]
Affiliation(s)
- InSuk Joung
- Center for In Silico Protein Science, Korea Institute for Advanced Study; Seoul 130-722 Korea
- School of Computational Sciences; Korea Institute for Advanced Study; Seoul 130-722 Korea
| | - Sun Young Lee
- Center for In Silico Protein Science, Korea Institute for Advanced Study; Seoul 130-722 Korea
| | - Qianyi Cheng
- Center for In Silico Protein Science, Korea Institute for Advanced Study; Seoul 130-722 Korea
- School of Computational Sciences; Korea Institute for Advanced Study; Seoul 130-722 Korea
| | - Jong Yun Kim
- Center for In Silico Protein Science, Korea Institute for Advanced Study; Seoul 130-722 Korea
| | - Keehyoung Joo
- Center for In Silico Protein Science, Korea Institute for Advanced Study; Seoul 130-722 Korea
- Center for Advanced Computation, Korea Institute for Advanced Study; Seoul 130-722 Korea
| | - Sung Jong Lee
- Center for In Silico Protein Science, Korea Institute for Advanced Study; Seoul 130-722 Korea
- Department of Physics; University of Suwon; Hwaseong-Si Gyeonggi-Do 445-743 Korea
| | - Jooyoung Lee
- Center for In Silico Protein Science, Korea Institute for Advanced Study; Seoul 130-722 Korea
- School of Computational Sciences; Korea Institute for Advanced Study; Seoul 130-722 Korea
- Center for Advanced Computation, Korea Institute for Advanced Study; Seoul 130-722 Korea
| |
|
29
|
Joo K, Joung I, Lee J, Lee J, Lee W, Brooks B, Lee SJ, Lee J. Protein structure determination by conformational space annealing using NMR geometric restraints. Proteins 2015; 83:2251-62. [PMID: 26454251 DOI: 10.1002/prot.24941] [Citation(s) in RCA: 15] [Impact Index Per Article: 1.7] [Received: 01/23/2015] [Revised: 09/19/2015] [Accepted: 09/22/2015] [Indexed: 11/06/2022]
Abstract
We have carried out numerical experiments to investigate whether the global optimization method of conformational space annealing (CSA) can improve NMR protein structure determination over existing PDB structures. NMR protein structure determination is driven by the optimization of collective multiple restraints arising from experimental data and the basic stereochemical properties of a protein-like molecule. By rigorous and straightforward application of CSA to the identical NMR experimental data used to generate existing PDB structures, we redetermined 56 recent PDB protein structures starting from fully randomized structures. The quality of CSA-generated structures and existing PDB structures was assessed by multiobjective functions in terms of their consistency with experimental data and the requirements of protein-like stereochemistry. In 54 out of 56 cases, CSA-generated structures were better than existing PDB structures in the Pareto-dominant manner, while in the remaining two cases, it was a tie with mixed results. As a whole, all structural features tested improved in a statistically meaningful manner. The most improved feature was the Ramachandran favored portion of backbone torsion angles, with about 8.6% improvement from 88.9% to 97.5% (P-value < 10^-17). We show that by straightforward application of CSA to the efficient global optimization of an energy function, NMR structures will be of better quality than existing PDB structures.
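The "Pareto-dominant" comparison can be read as follows: one structure beats another only if it is at least as good on every quality measure and strictly better on at least one; otherwise the pair is a tie with mixed results. The sketch below assumes every score is oriented so that lower is better; the function names and example numbers are hypothetical and do not correspond to the study's actual measures.

```python
def dominates(a, b):
    """True if score vector a Pareto-dominates b (lower is better everywhere)."""
    if len(a) != len(b):
        raise ValueError("score vectors must have the same length")
    no_worse = all(x <= y for x, y in zip(a, b))
    strictly_better = any(x < y for x, y in zip(a, b))
    return no_worse and strictly_better


def compare(a, b):
    """Classify a pair as 'a', 'b', or 'tie' (mixed or identical results)."""
    if dominates(a, b):
        return "a"
    if dominates(b, a):
        return "b"
    return "tie"


if __name__ == "__main__":
    # Hypothetical (restraint violation, stereochemistry outliers) pairs.
    print(compare([0.8, 2.0], [1.1, 3.5]))  # first is better on both counts
    print(compare([0.8, 4.0], [1.1, 3.5]))  # mixed result, reported as a tie
```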
Affiliation(s)
- Keehyoung Joo
- Center for in Silico Protein Science, Korea Institute for Advanced Study, Seoul, 130-722, Korea.,Center for Advanced Computation, Korea Institute for Advanced Study, Seoul, 130-722, Korea
| | - InSuk Joung
- Center for in Silico Protein Science, Korea Institute for Advanced Study, Seoul, 130-722, Korea.,School of Computational Sciences, Korea Institute for Advanced Study, Seoul, 130-722, Korea
| | - Jinhyuk Lee
- Korean Bioinformation Center (KOBIC), Korea Research Institute of Bioscience and Biotechnology, Daejeon, 305-806, Korea.,Department of Nanobiotechnology and Bioinformatics, University of Sciences and Technology, Daejeon, 305-350, Korea
| | - Jinwoo Lee
- Department of Mathematics, Kwangwoon University, Nowon-Gu, Seoul, 139-701, Korea
| | - Weontae Lee
- Department of Biochemistry, Yonsei University, Seodaemun-Gu, Seoul, 120-749, Korea
| | - Bernard Brooks
- Laboratory of Computational Biology, National Heart, Lung, and Blood Institute, National Institutes of Health, Bethesda, 20852
| | - Sung Jong Lee
- Center for in Silico Protein Science, Korea Institute for Advanced Study, Seoul, 130-722, Korea.,Department of Physics, University of Suwon, Hwaseong-Si, Gyeonggi-Do, 445-743, Korea
| | - Jooyoung Lee
- Center for in Silico Protein Science, Korea Institute for Advanced Study, Seoul, 130-722, Korea.,School of Computational Sciences, Korea Institute for Advanced Study, Seoul, 130-722, Korea
| |
|
30
|
Joo K, Joung I, Lee SY, Kim JY, Cheng Q, Manavalan B, Joung JY, Heo S, Lee J, Nam M, Lee IH, Lee SJ, Lee J. Template based protein structure modeling by global optimization in CASP11. Proteins 2015; 84 Suppl 1:221-32. [PMID: 26329522 DOI: 10.1002/prot.24917] [Citation(s) in RCA: 26] [Impact Index Per Article: 2.9] [Received: 05/15/2015] [Revised: 08/04/2015] [Accepted: 08/21/2015] [Indexed: 11/11/2022]
Abstract
For the template-based modeling (TBM) of CASP11 targets, we have developed three new protein modeling protocols (nns for server prediction and LEE and LEER for human prediction) by improving upon our previous CASP protocols (CASP7 through CASP10). We applied the powerful global optimization method of conformational space annealing to three stages of optimization, including multiple sequence-structure alignment, three-dimensional (3D) chain building, and side-chain remodeling. For more successful fold recognition, a new alignment method called CRFalign was developed. It can incorporate sensitive positional and environmental dependence in alignment scores as well as strong nonlinear correlations among various features. Modifications and adjustments were made to the form of the energy function and weight parameters pertaining to the chain building procedure. For the side-chain remodeling step, residue-type dependence was introduced to the cutoff value that determines the entry of a rotamer to the side-chain modeling library. The improved performance of the nns server method is attributed to successful fold recognition achieved by combining several methods including CRFalign and to the current modeling formulation that can incorporate native-like structural aspects present in multiple templates. The LEE protocol is identical to the nns one except that CASP11-released server models are used as templates. The success of LEE in utilizing CASP11 server models indicates that proper template screening and template clustering assisted by appropriate cluster ranking promises a new direction to enhance protein 3D modeling. Proteins 2016; 84(Suppl 1):221-232. © 2015 Wiley Periodicals, Inc.
Affiliation(s)
- Keehyoung Joo
- Center for in Silico Protein Science, Korea Institute for Advanced Study, Seoul, 130-722, Korea.,Center for Advanced Computation, Korea Institute for Advanced Study, Seoul, 130-722, Korea
| | - InSuk Joung
- Center for in Silico Protein Science, Korea Institute for Advanced Study, Seoul, 130-722, Korea.,School of Computational Sciences, Korea Institute for Advanced Study, Seoul, 130-722, Korea
| | - Sun Young Lee
- Center for in Silico Protein Science, Korea Institute for Advanced Study, Seoul, 130-722, Korea
| | - Jong Yun Kim
- Center for in Silico Protein Science, Korea Institute for Advanced Study, Seoul, 130-722, Korea
| | - Qianyi Cheng
- Center for in Silico Protein Science, Korea Institute for Advanced Study, Seoul, 130-722, Korea.,School of Computational Sciences, Korea Institute for Advanced Study, Seoul, 130-722, Korea
| | - Balachandran Manavalan
- Center for in Silico Protein Science, Korea Institute for Advanced Study, Seoul, 130-722, Korea.,School of Computational Sciences, Korea Institute for Advanced Study, Seoul, 130-722, Korea
| | - Jong Young Joung
- School of Computational Sciences, Korea Institute for Advanced Study, Seoul, 130-722, Korea
| | - Seungryong Heo
- Center for in Silico Protein Science, Korea Institute for Advanced Study, Seoul, 130-722, Korea
| | - Juyong Lee
- Laboratory of Computational Biology, National Heart, Lung, and Blood Institute, National Institutes of Health, Bethesda, Maryland, 20852
| | - Mikyung Nam
- Center for in Silico Protein Science, Korea Institute for Advanced Study, Seoul, 130-722, Korea
| | - In-Ho Lee
- Center for in Silico Protein Science, Korea Institute for Advanced Study, Seoul, 130-722, Korea.,Korea Research Institute of Standards and Science (KRISS), Seoul, 305-600, Korea
| | - Sung Jong Lee
- Center for in Silico Protein Science, Korea Institute for Advanced Study, Seoul, 130-722, Korea.,Department of Physics, University of Suwon, Hwaseong-Si, Gyeonggi-Do, 445-743, Korea
| | - Jooyoung Lee
- Center for in Silico Protein Science, Korea Institute for Advanced Study, Seoul, 130-722, Korea.,Center for Advanced Computation, Korea Institute for Advanced Study, Seoul, 130-722, Korea.,School of Computational Sciences, Korea Institute for Advanced Study, Seoul, 130-722, Korea
| |
|
31
|
Vallat B, Madrid-Aliste C, Fiser A. Modularity of Protein Folds as a Tool for Template-Free Modeling of Structures. PLoS Comput Biol 2015; 11:e1004419. [PMID: 26252221 PMCID: PMC4529212 DOI: 10.1371/journal.pcbi.1004419] [Citation(s) in RCA: 17] [Impact Index Per Article: 1.9] [Received: 03/10/2015] [Accepted: 06/30/2015] [Indexed: 12/25/2022] Open
Abstract
Predicting the three-dimensional structure of proteins from their amino acid sequences remains a challenging problem in molecular biology. While the current structural coverage of proteins is almost exclusively provided by template-based techniques, the modeling of the rest of the protein sequences increasingly requires template-free methods. However, template-free modeling methods are much less reliable and are usually applicable only to smaller proteins, leaving much room for improvement. We present here a novel computational method that uses a library of supersecondary structure fragments, known as Smotifs, to model protein structures. The library of Smotifs has saturated over time, providing a theoretical foundation for efficient modeling. The method relies on weak sequence signals from remotely related protein structures to create a library of Smotif fragments specific to the target protein sequence. This Smotif library is exploited in a fragment assembly protocol to sample decoys, which are assessed by a composite scoring function. Since the Smotif fragments are larger than the ones used in other fragment-based methods, the proposed modeling algorithm, SmotifTF, can employ exhaustive sampling during decoy assembly. SmotifTF successfully predicts the overall fold of the target proteins in about 50% of the test cases and performs competitively when compared to other state-of-the-art prediction methods, especially when the sequence signal to remote homologs is diminishing. Smotif-based modeling is complementary to current prediction methods and provides a promising direction in addressing the structure prediction problem, especially when targeting larger proteins for modeling.
Each protein folds into a unique three-dimensional structure that enables it to carry out its biological function. Knowledge of the atomic details of protein structures is therefore a key to understanding their function. Advances in high-throughput experimental technologies have led to an exponential increase in the availability of known protein sequences. Although strong progress has been made in experimental protein structure determination, it remains a fact that more than 99% of structural information is provided by computational modeling methods. We describe here a novel structure prediction method, SmotifTF, which uses a unique library of known protein fragments to assemble the three-dimensional structure of a sequence. The fragment library has saturated over time and therefore provides a complete set of building blocks required for model building. The method performs competitively compared to existing methods of structure prediction.
Affiliation(s)
- Brinda Vallat
- Department of Systems and Computational Biology, Albert Einstein College of Medicine, Bronx, New York, New York, United States of America
| | - Carlos Madrid-Aliste
- Department of Systems and Computational Biology, Albert Einstein College of Medicine, Bronx, New York, New York, United States of America
| | - Andras Fiser
- Department of Systems and Computational Biology, Albert Einstein College of Medicine, Bronx, New York, New York, United States of America
| |
|
32
|
Shen Y, Maupetit J, Derreumaux P, Tufféry P. Improved PEP-FOLD Approach for Peptide and Miniprotein Structure Prediction. J Chem Theory Comput 2014; 10:4745-58. [DOI: 10.1021/ct500592m] [Citation(s) in RCA: 416] [Impact Index Per Article: 41.6] [Indexed: 12/21/2022]
Affiliation(s)
- Yimin Shen
- INSERM U973, MTi, F-75205 Paris, France
- Univ Paris Diderot, Sorbonne Paris Cité, F-75205 Paris, France
- Julien Maupetit
- Laboratoire de Biochimie Théorique, UPR 9080 CNRS, Institut de Biologie Physico-Chimique, F-75005 Paris, France
- Univ Paris Diderot, Sorbonne Paris Cité, F-75205 Paris, France
- Philippe Derreumaux
- Laboratoire de Biochimie Théorique, UPR 9080 CNRS, Institut de Biologie Physico-Chimique, F-75005 Paris, France
- Institut Universitaire de France, 103 Boulevard Saint-Michel, 75005, Paris, France
- Univ Paris Diderot, Sorbonne Paris Cité, F-75205 Paris, France
- Pierre Tufféry
- INSERM U973, MTi, F-75205 Paris, France
- Univ Paris Diderot, Sorbonne Paris Cité, F-75205 Paris, France
|
33
|
Lee J, Gross SP, Lee J. Improved network community structure improves function prediction. Sci Rep 2014; 3:2197. [PMID: 23852097 PMCID: PMC3711050 DOI: 10.1038/srep02197] [Citation(s) in RCA: 26] [Impact Index Per Article: 2.6] [Received: 11/09/2012] [Accepted: 06/24/2013] [Indexed: 12/15/2022]
Abstract
We are overwhelmed by experimental data, and need better ways to understand large interaction datasets. While clustering related nodes in such networks, known as community detection, appears to be a promising approach, detecting such communities is computationally difficult. Further, how best to use such community information has not been determined. Here, within the context of protein function prediction, we address both issues. First, we apply a novel method that generates better modularity solutions than the current state of the art. Second, we develop a better method to use this community information to predict proteins' functions. We discuss when and why this community information is important. Our results should be useful for two distinct scientific communities: first, those using various cost functions to detect community structure, where our new optimization approach will improve solutions, and second, those working to extract novel functional information about individual nodes from large interaction datasets.
Affiliation(s)
- Juyong Lee
- School of Computational Sciences, Korea Institute for Advanced Study, Seoul, Korea.
|
34
|
Shen Y, Picord G, Guyon F, Tuffery P. Detecting protein candidate fragments using a structural alphabet profile comparison approach. PLoS One 2013; 8:e80493. [PMID: 24303019 PMCID: PMC3841190 DOI: 10.1371/journal.pone.0080493] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.2] [Received: 06/17/2013] [Accepted: 10/03/2013] [Indexed: 01/28/2023]
Abstract
Predicting accurate fragments from sequence has recently become a critical step for protein structure modeling, as protein fragment assembly techniques are presently among the most efficient approaches for de novo prediction. A key step in these approaches is, given the sequence of a protein to model, the identification of relevant fragments (candidate fragments) from a collection of the available 3D structures. These fragments can then be assembled to produce a model of the complete structure of the protein of interest. The search for candidate fragments is classically achieved by considering local sequence similarity using profile comparison, or threading approaches. In the present study, we introduce a new profile comparison approach that, instead of using amino acid profiles, is based on the use of predicted structural alphabet profiles, where structural alphabet profiles contain information related to the 3D local shapes associated with the sequences. We show that structural alphabet profile-profile comparison can be used efficiently to retrieve accurate structural fragments, and we introduce a fully new protocol for the detection of candidate fragments. It identifies fragments specific to each position of the sequence, with sizes varying between 6 and 27 amino acids. We find that it outperforms present state-of-the-art approaches in terms of (i) the accuracy of the fragments identified and (ii) the rate of true positives identified, while having a high coverage score. We illustrate the relevance of the approach on complete target sets of the two previous Critical Assessment of Techniques for Protein Structure Prediction (CASP) rounds 9 and 10. A web server for the approach is freely available at http://bioserv.rpbs.univ-paris-diderot.fr/SAFrag.
Affiliation(s)
- Yimin Shen
- INSERM, U973, MTi, Paris, France
- Univ Paris Diderot, Sorbonne Paris Cité, Paris, France
- Géraldine Picord
- INSERM, U973, MTi, Paris, France
- Univ Paris Diderot, Sorbonne Paris Cité, Paris, France
- Frédéric Guyon
- INSERM, U973, MTi, Paris, France
- Univ Paris Diderot, Sorbonne Paris Cité, Paris, France
- Pierre Tuffery
- INSERM, U973, MTi, Paris, France
- Univ Paris Diderot, Sorbonne Paris Cité, Paris, France
- RPBS, Paris, France
|
35
|
Joo K, Lee J, Sim S, Lee SY, Lee K, Heo S, Lee IH, Lee SJ, Lee J. Protein structure modeling for CASP10 by multiple layers of global optimization. Proteins 2013; 82 Suppl 2:188-95. [PMID: 23966235 DOI: 10.1002/prot.24397] [Citation(s) in RCA: 36] [Impact Index Per Article: 3.3] [Received: 04/03/2013] [Revised: 07/09/2013] [Accepted: 08/09/2013] [Indexed: 11/06/2022]
Abstract
In the template-based modeling (TBM) category of the CASP10 experiment, we introduced a new protocol called protein modeling system (PMS) to generate accurate protein structures in terms of side-chains as well as backbone trace. In the new protocol, a global optimization algorithm, called conformational space annealing (CSA), is applied to the three layers of the TBM procedure: multiple sequence-structure alignment, 3D chain building, and side-chain re-modeling. For 3D chain building, we developed a new energy function which includes new distance restraint terms of Lorentzian type (derived from multiple templates), and new energy terms that combine (physical) energy terms such as dynamic fragment assembly (DFA) energy, DFIRE statistical potential energy, hydrogen bonding term, etc. These physical energy terms are expected to guide the structure modeling, especially for loop regions where no template structures are available. In addition, we developed a new quality assessment method based on a random forest machine learning algorithm to screen templates, multiple alignments, and final models. For TBM targets of CASP10, we find that, due to the combination of three stages of CSA global optimization and quality assessment, the modeling accuracy of PMS improves at each additional stage of the protocol. It is especially noteworthy that the side-chains of the final PMS models are far more accurate than those of the models in the intermediate steps.
Affiliation(s)
- Keehyoung Joo
- Center for In Silico Protein Science, Korea Institute for Advanced Study, Dongdaemun-gu, Seoul, 130-722, Korea; Center for Advanced Computation, Korea Institute for Advanced Study, Dongdaemun-gu, Seoul, 130-722, Korea
|
36
|
Valentin JB, Andreetta C, Boomsma W, Bottaro S, Ferkinghoff-Borg J, Frellsen J, Mardia KV, Tian P, Hamelryck T. Formulation of probabilistic models of protein structure in atomic detail using the reference ratio method. Proteins 2013; 82:288-99. [PMID: 23934827 DOI: 10.1002/prot.24386] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.8] [Received: 05/28/2013] [Revised: 07/02/2013] [Accepted: 07/18/2013] [Indexed: 01/10/2023]
Abstract
We propose a method to formulate probabilistic models of protein structure in atomic detail, for a given amino acid sequence, based on Bayesian principles, while retaining a close link to physics. We start from two previously developed probabilistic models of protein structure on a local length scale, which concern the dihedral angles in main chain and side chains, respectively. Conceptually, this constitutes a probabilistic and continuous alternative to the use of discrete fragment and rotamer libraries. The local model is combined with a nonlocal model that involves a small number of energy terms according to a physical force field, and some information on the overall secondary structure content. In this initial study we focus on the formulation of the joint model and the evaluation of the use of an energy vector as a descriptor of a protein's nonlocal structure; hence, we derive the parameters of the nonlocal model from the native structure without loss of generality. The local and nonlocal models are combined using the reference ratio method, which is a well-justified probabilistic construction. For evaluation, we use the resulting joint models to predict the structure of four proteins. The results indicate that the proposed method and the probabilistic models show considerable promise for probabilistic protein structure prediction and related applications.
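For orientation, the reference ratio construction mentioned in this abstract is usually written as a density over the fine-grained variable that is reweighted by a ratio of densities over a coarse-grained descriptor. The form below is a sketch under assumed notation, not the paper's own: x is the atomic conformation, s the amino acid sequence, e(x) the energy vector, p_L the local model (with p_L(e | s) the distribution of e that the local model induces), and p_N the nonlocal model over e.

```latex
p(x \mid s) \;=\; \frac{p_{N}\big(e(x) \mid s\big)}{p_{L}\big(e(x) \mid s\big)} \; p_{L}(x \mid s)
```

Under this construction the marginal distribution of e(x) matches p_N, while the local model still governs how conformations are distributed for a fixed value of e.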
Affiliation(s)
- Jan B Valentin
- The Bioinformatics Centre, Department of Biology, University of Copenhagen, Copenhagen, Denmark
|
37
|
Wang H, He Z, Zhang C, Zhang L, Xu D. Transmembrane protein alignment and fold recognition based on predicted topology. PLoS One 2013; 8:e69744. [PMID: 23894534 PMCID: PMC3716705 DOI: 10.1371/journal.pone.0069744] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.2] [Received: 11/29/2012] [Accepted: 06/15/2013] [Indexed: 11/18/2022]
Abstract
BACKGROUND: Although Transmembrane Proteins (TMPs) are highly important in various biological processes and pharmaceutical developments, general prediction of TMP structures is still far from satisfactory. Because TMPs have significantly different physicochemical properties from soluble proteins, current protein structure prediction tools for soluble proteins may not work well for TMPs. With the increasing number of experimental TMP structures available, template-based methods have the potential to become broadly applicable for TMP structure prediction. However, the current fold recognition methods for TMPs are not as well developed as they are for soluble proteins. METHODOLOGY: We developed a novel TMP Fold Recognition method, TMFR, to recognize TMP folds based on sequence-to-structure pairwise alignment. The method utilizes topology-based features in alignment together with sequence profile and solvent accessibility. It also incorporates a gap penalty that depends on predicted topology structure segments. Given the difference between α-helical transmembrane proteins (αTMPs) and β-strand transmembrane proteins (βTMPs), parameters of the scoring functions are trained separately for these two protein categories using 58 αTMPs and 17 βTMPs in a non-redundant training dataset. RESULTS: We compared our method with HHalign, a leading alignment tool, using a non-redundant testing dataset including 72 αTMPs and 30 βTMPs. Our method achieved 10% and 9% better accuracies than HHalign in αTMPs and βTMPs, respectively. The raw score generated by TMFR is negatively correlated with the structure similarity between the target and the template, which indicates its effectiveness for fold recognition. The result demonstrates that TMFR provides an effective TMP-specific fold recognition and alignment method.
Affiliation(s)
- Han Wang
- School of Computer Science and Information Technology, Northeast Normal University, Changchun, People’s Republic of China
- Department of Computer Science, Christopher S. Bond Life Sciences Center, University of Missouri, Columbia, Missouri, United States of America
- Zhiquan He
- Department of Computer Science, Christopher S. Bond Life Sciences Center, University of Missouri, Columbia, Missouri, United States of America
- Chao Zhang
- Department of Computer Science, Christopher S. Bond Life Sciences Center, University of Missouri, Columbia, Missouri, United States of America
- Li Zhang
- School of Computer Science and Engineering, Changchun University of Technology, Changchun, People’s Republic of China
- Dong Xu
- Department of Computer Science, Christopher S. Bond Life Sciences Center, University of Missouri, Columbia, Missouri, United States of America
|
38
|
Lee J, Lee J. Hidden information revealed by optimal community structure from a protein-complex bipartite network improves protein function prediction. PLoS One 2013; 8:e60372. [PMID: 23577106 PMCID: PMC3618231 DOI: 10.1371/journal.pone.0060372] [Citation(s) in RCA: 26] [Impact Index Per Article: 2.4] [Received: 11/14/2012] [Accepted: 02/25/2013] [Indexed: 11/18/2022]
Abstract
The task of extracting the maximal amount of information from a biological network, for example predicting the function of a protein from a protein-protein interaction (PPI) network, has drawn much attention from researchers. It is well known that biological networks consist of modules/communities, sets of nodes that are more densely inter-connected among themselves than with the rest of the network. However, practical applications of the community information have been rather limited. For protein function prediction on a network, it has been shown that none of the existing community-based protein function prediction methods outperforms a simple neighbor-based method. Recently, we have shown that proper utilization of a highly optimal modularity community structure for protein function prediction can outperform neighbor-assisted methods. In this study, we propose two function prediction approaches on bipartite networks that consider the community structure information as well as the neighbor information from the network: 1) a simple screening method and 2) a random forest based method. We demonstrate that our community-assisted methods outperform neighbor-assisted methods and that the random forest method yields the best performance. In addition, we show that using the optimal community structure information is essential for more accurate function prediction for the protein-complex bipartite network of Saccharomyces cerevisiae. Community detection can be carried out either by using a modified modularity to deal with the original bipartite network or by first projecting the network into a single-mode network (i.e., a PPI network) and then applying community detection to the reduced network. We find that the projection leads to a significant loss of information. Since our prediction methods rely only on the network topology, they can be applied to various fields where an efficient network-based analysis is required.
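As a point of reference for the "simple neighbor-based method" used as a baseline in this abstract, the following is a minimal sketch of majority voting over annotated neighbors. It is an illustrative assumption rather than the authors' implementation: the function name `predict_function`, the dictionary-based network representation, and the alphabetical tie-break are all hypothetical choices.

```python
from collections import Counter


def predict_function(protein, neighbors, known_functions):
    """Return the function most common among a protein's annotated neighbors.

    neighbors: dict mapping a protein to an iterable of neighboring proteins.
    known_functions: dict mapping a protein to an iterable of function labels.
    Returns None when no neighbor carries an annotation.
    """
    votes = Counter()
    for neighbor in neighbors.get(protein, ()):
        for label in known_functions.get(neighbor, ()):
            votes[label] += 1
    if not votes:
        return None
    best = max(votes.values())
    # Break ties alphabetically so the result is deterministic (an arbitrary choice).
    return min(label for label, count in votes.items() if count == best)


# Toy network: P1 is unannotated and has three annotated neighbors.
toy_neighbors = {"P1": ["P2", "P3", "P4"]}
toy_functions = {
    "P2": ["transport"],
    "P3": ["transport", "binding"],
    "P4": ["kinase"],
}
prediction = predict_function("P1", toy_neighbors, toy_functions)
```

A community-assisted variant would restrict or reweight the voters by community membership; how that is done in the cited work is not specified in the abstract.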
Affiliation(s)
- Juyong Lee
- School of Computational Sciences, Korea Institute for Advanced Study, Seoul, Korea
- Jooyoung Lee
- School of Computational Sciences, Korea Institute for Advanced Study, Seoul, Korea
|
39
|
Caudron B, Jestin J. Sequence criteria for the anti-parallel character of protein beta-strands. J Theor Biol 2012; 315:146-9. [DOI: 10.1016/j.jtbi.2012.09.011] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.7] [Received: 07/12/2012] [Revised: 09/10/2012] [Accepted: 09/12/2012] [Indexed: 12/17/2022]
|
40
|
Lee J, Gross SP, Lee J. Modularity optimization by conformational space annealing. Phys Rev E Stat Nonlin Soft Matter Phys 2012; 85:056702. [PMID: 23004898 DOI: 10.1103/physreve.85.056702] [Citation(s) in RCA: 25] [Impact Index Per Article: 2.1] [Received: 02/21/2012] [Indexed: 06/01/2023]
Abstract
We propose a modularity optimization method, Mod-CSA, based on a stochastic global optimization algorithm, conformational space annealing (CSA). Our method outperforms simulated annealing in terms of both efficiency and accuracy, finding higher modularity partitions with fewer computational resources. The high modularity values found by our method are higher than, or equal to, the largest values previously reported. In addition, the method can be combined with other heuristic methods and implemented in a parallel fashion, making it applicable to large graphs with more than 10,000 nodes.
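The quantity being optimized here can be made concrete with a short sketch. The code below evaluates the standard Newman-Girvan modularity Q of a given partition for an undirected, unweighted graph without self-loops; it only scores a partition and does not implement CSA or Mod-CSA. The function name `modularity` and the edge-list representation are assumptions made for illustration.

```python
from collections import defaultdict


def modularity(edges, community_of):
    """Newman-Girvan modularity Q of a partition.

    edges: list of (u, v) pairs for an undirected, unweighted graph.
    community_of: dict mapping each node to its community label.
    Q = sum over communities c of [ e_c / m - (d_c / (2 m))^2 ], where m is the
    number of edges, e_c the number of edges inside c, and d_c the total degree
    of the nodes in c.
    """
    m = len(edges)
    if m == 0:
        raise ValueError("graph has no edges")
    internal = defaultdict(int)    # e_c: edges with both endpoints in c
    degree_sum = defaultdict(int)  # d_c: summed degree of the nodes in c
    for u, v in edges:
        cu, cv = community_of[u], community_of[v]
        degree_sum[cu] += 1
        degree_sum[cv] += 1
        if cu == cv:
            internal[cu] += 1
    return sum(
        internal[c] / m - (degree_sum[c] / (2 * m)) ** 2 for c in degree_sum
    )


# Two triangles joined by a single bridging edge (2, 3).
toy_edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
two_groups = {0: "a", 1: "a", 2: "a", 3: "b", 4: "b", 5: "b"}
one_group = {node: "all" for node in range(6)}
q_split = modularity(toy_edges, two_groups)   # 5/14 for this toy graph
q_single = modularity(toy_edges, one_group)   # 0 when everything is one community
```

Any optimizer, whether simulated annealing or CSA, would search over `community_of` assignments to maximize this score.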
Affiliation(s)
- Juyong Lee
- School of Computational Sciences, Korea Institute of Advanced Study, Seoul, Korea.
|
41
|
Fan H, Periole X, Mark AE. Mimicking the action of folding chaperones by Hamiltonian replica-exchange molecular dynamics simulations: application in the refinement of de novo models. Proteins 2012; 80:1744-54. [PMID: 22411697 DOI: 10.1002/prot.24068] [Citation(s) in RCA: 14] [Impact Index Per Article: 1.2] [Received: 12/07/2011] [Revised: 02/11/2012] [Accepted: 03/03/2012] [Indexed: 12/25/2022]
Abstract
The efficiency of using a variant of Hamiltonian replica-exchange molecular dynamics (Chaperone H-replica-exchange molecular dynamics [CH-REMD]) for the refinement of protein structural models generated de novo is investigated. In CH-REMD, the interaction between the protein and its environment, specifically, the electrostatic interaction between the protein and the solvating water, is varied leading to cycles of partial unfolding and refolding mimicking some aspects of folding chaperones. In 10 of the 15 cases examined, the CH-REMD approach sampled structures in which the root-mean-square deviation (RMSD) of secondary structure elements (SSE-RMSD) with respect to the experimental structure was more than 1.0 Å lower than the initial de novo model. In 14 of the 15 cases, the improvement was more than 0.5 Å. The ability of three different statistical potentials to identify near-native conformations was also examined. Little correlation between the SSE-RMSD of the sampled structures with respect to the experimental structure and any of the scoring functions tested was found. The most effective scoring function tested was the DFIRE potential. Using the DFIRE potential, the SSE-RMSD of the best scoring structures was on average 0.3 Å lower than the initial model. Overall the work demonstrates that targeted enhanced-sampling techniques such as CH-REMD can lead to the systematic refinement of protein structural models generated de novo but that improved potentials for the identification of near-native structures are still needed.
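The RMSD figures quoted in this abstract can be illustrated with a minimal sketch of the root-mean-square deviation between two matched coordinate sets. Two assumptions are made loudly here: the structures are taken to be already superposed (no optimal alignment such as the Kabsch procedure is performed), and restricting the atoms to secondary structure elements, as SSE-RMSD would require, is left to the caller. The function name `rmsd` is hypothetical.

```python
import math


def rmsd(coords_a, coords_b):
    """RMSD between two equally sized lists of (x, y, z) coordinates.

    Assumes atom i in coords_a corresponds to atom i in coords_b and that the
    two structures have already been superposed.
    """
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("coordinate lists must be non-empty and equally long")
    total = 0.0
    for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b):
        total += (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
    return math.sqrt(total / len(coords_a))


# Toy example: every atom of the model is displaced by 1 Å along z.
model = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0), (3.0, 0.0, 0.0)]
reference = [(0.0, 0.0, 1.0), (1.5, 0.0, 1.0), (3.0, 0.0, 1.0)]
deviation = rmsd(model, reference)
```

An improvement of "more than 1.0 Å" in the abstract's sense would correspond to a refined model whose value is lower than that of the initial model by that margin, measured against the same experimental reference.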
Affiliation(s)
- Hao Fan
- Department of Pharmaceutical Chemistry, University of California at San Francisco, San Francisco, California 94158-2330, USA
|
42
|
Cetin H, Sasaki TN, Sasai M. The Fragment-based Consistency Score in Model Quality Assessment for De Novo Prediction of Protein Structures. Chem-Bio Informatics Journal 2011. [DOI: 10.1273/cbij.11.63] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/22/2022]
Affiliation(s)
- Hikmet Cetin
- Department of Computational Science and Engineering, Nagoya University
- Masaki Sasai
- Department of Computational Science and Engineering, Nagoya University
- School of Computational Sciences, Korea Institute for Advanced Study
- Okazaki Institute for Integrative Bioscience
|