1
Lee SJ, Joo K, Sim S, Lee J, Lee IH, Lee J. CRFalign: A Sequence-Structure Alignment of Proteins Based on a Combination of HMM-HMM Comparison and Conditional Random Fields. Molecules (Basel, Switzerland) 2022; 27:molecules27123711. [PMID: 35744836 PMCID: PMC9231382 DOI: 10.3390/molecules27123711] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 03/23/2022] [Revised: 06/03/2022] [Accepted: 06/07/2022] [Indexed: 11/16/2022]
Abstract
Sequence–structure alignment for protein sequences is an important task for the template-based modeling of 3D structures of proteins. Building a reliable sequence–structure alignment is a challenging problem, especially for remote homologue target proteins. We built a method of sequence–structure alignment called CRFalign, which improves upon a base alignment model based on HMM-HMM comparison by employing pairwise conditional random fields in combination with nonlinear scoring functions of structural and sequence features. The nonlinear scoring part is implemented by a set of gradient boosted regression trees. In addition to sequence profile features, various position-dependent structural features are employed, including secondary structures and solvent accessibilities. Training is performed on reference alignments at the superfamily level or in the twilight zone, chosen from the SABmark benchmark set. We found that the CRFalign method produces a relative improvement in average alignment accuracy on validation sets of the SABmark benchmark. We also tested CRFalign on 51 sequence–structure pairs involving 15 FM target domains of CASP14, where CRFalign improved the average modeling accuracy on these hard targets (TM-CRFalign ≃42.94%) compared with HHalign (TM-HHalign ≃39.05%) and MRFalign (TM-MRFalign ≃36.93%). CRFalign was incorporated into our template search framework, called CRFpred, and was tested on a random set of 300 target proteins consisting of Easy, Medium and Hard sets, where it showed reasonable template search performance.
Affiliation(s)
- Sung Jong Lee
  - Basic Science Institute, Changwon National University, Changwon 51140, Korea
- Keehyoung Joo
  - Center for Advanced Computation, Korea Institute for Advanced Study, Seoul 02455, Korea
- Juyong Lee
  - Department of Chemistry, Kangwon National University, Chuncheon 24341, Korea
- In-Ho Lee
  - Korea Research Institute of Standards and Science (KRISS), Daejeon 34113, Korea
- Jooyoung Lee
  - School of Computational Sciences, Korea Institute for Advanced Study, Seoul 02455, Korea
2
Lubecka EA, Liwo A. ESCASA: Analytical estimation of atomic coordinates from coarse-grained geometry for nuclear-magnetic-resonance-assisted protein structure modeling. I. Backbone and Hβ protons. J Comput Chem 2021; 42:1579-1589. [PMID: 34048074 DOI: 10.1002/jcc.26695] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 03/15/2021] [Revised: 05/06/2021] [Accepted: 05/11/2021] [Indexed: 12/13/2022]
Abstract
A method for the estimation of coordinates of atoms in proteins from coarse-grained geometry by simple analytical formulas (ESCASA), for use in nuclear-magnetic-resonance (NMR) data-assisted coarse-grained simulations of proteins, is proposed. In this paper, the formulas for the backbone Hα and amide (HN) protons, and the side-chain Hβ protons, given the Cα-trace, have been derived and parameterized by using the interproton distances calculated from a set of 140 high-resolution non-homologous protein structures. The mean standard deviation over all types of proton pairs in the set was 0.44 Å after fitting. Validation against a set of 41 proteins with NMR-determined structures, which were not considered in parameterization, resulted in average standard deviation from average proton-proton distances of the NMR-determined structures of 0.25 Å, compared to 0.21 Å obtained with the PULCHRA all-atom-chain reconstruction algorithm and to the 0.12 Å standard deviation of the average-structure proton-proton distance of NMR-determined ensembles. The formulas provide analytical forces and can, therefore, be used in coarse-grained molecular dynamics.
Affiliation(s)
- Emilia A Lubecka
  - Faculty of Electronics, Telecommunications and Informatics, Gdańsk University of Technology, Gdańsk, Poland
- Adam Liwo
  - Faculty of Chemistry, University of Gdańsk, Gdańsk, Poland
3
Karczyńska AS, Ziȩba K, Uciechowska U, Mozolewska MA, Krupa P, Lubecka EA, Lipska AG, Sikorska C, Samsonov SA, Sieradzan AK, Giełdoń A, Liwo A, Ślusarz R, Ślusarz M, Lee J, Joo K, Czaplewski C. Improved Consensus-Fragment Selection in Template-Assisted Prediction of Protein Structures with the UNRES Force Field in CASP13. J Chem Inf Model 2020; 60:1844-1864. [PMID: 31999919 PMCID: PMC7588044 DOI: 10.1021/acs.jcim.9b00864] [Citation(s) in RCA: 10] [Impact Index Per Article: 2.5] [Indexed: 11/30/2022]
Abstract
The method for protein-structure prediction, which combines the physics-based coarse-grained UNRES force field with knowledge-based modeling, has been developed further and tested in the 13th Community Wide Experiment on the Critical Assessment of Techniques for Protein Structure Prediction (CASP13). The method implements restraints from the consensus fragments common to server models. In this work, the server models to derive fragments have been chosen on the basis of quality assessment; a fully automatic fragment-selection procedure has been introduced, and Dynamic Fragment Assembly pseudopotentials have been fully implemented. The Global Distance Test Score (GDT_TS), averaged over our “Model 1” predictions, increased by over 10 units with respect to CASP12 for the free-modeling category to reach 40.82. Our “Model 1” predictions ranked 20 and 14 for all and free-modeling targets, respectively (upper 20.2% and 14.3% of all models submitted to CASP13 in these categories, respectively), compared to 27 (upper 21.1%) and 24 (upper 18.9%) in CASP12, respectively. For oligomeric targets, the Interface Patch Similarity (IPS) and Interface Contact Similarity (ICS) averaged over our best oligomer models increased from 0.28 to 0.36 and from 12.4 to 17.8, respectively, from CASP12 to CASP13, and top-ranking models of 2 targets (H0968 and T0997o) were obtained (none in CASP12). The improvement of our method in CASP13 over CASP12 was ascribed to the combined effect of the overall enhancement of server-model quality, our success in selecting server models and fragments to derive restraints, and improvements of the restraint and potential-energy functions.
Affiliation(s)
- Karolina Ziȩba
  - Faculty of Chemistry, University of Gdańsk, Wita Stwosza 63, Gdańsk 80-308, Poland
- Urszula Uciechowska
  - Faculty of Chemistry, University of Gdańsk, Wita Stwosza 63, Gdańsk 80-308, Poland
- Magdalena A Mozolewska
  - Institute of Computer Science, Polish Academy of Sciences, ul. Jana Kazimierza 5, Warsaw PL-02668, Poland
- Paweł Krupa
  - Institute of Physics, Polish Academy of Sciences, Aleja Lotników 32/46, Warsaw PL-02668, Poland
- Emilia A Lubecka
  - Institute of Informatics, Faculty of Mathematics, Physics, and Informatics, University of Gdańsk, Wita Stwosza 57, Gdańsk 80-308, Poland
- Agnieszka G Lipska
  - Faculty of Chemistry, University of Gdańsk, Wita Stwosza 63, Gdańsk 80-308, Poland
- Celina Sikorska
  - Faculty of Chemistry, University of Gdańsk, Wita Stwosza 63, Gdańsk 80-308, Poland
- Sergey A Samsonov
  - Faculty of Chemistry, University of Gdańsk, Wita Stwosza 63, Gdańsk 80-308, Poland
- Adam K Sieradzan
  - Faculty of Chemistry, University of Gdańsk, Wita Stwosza 63, Gdańsk 80-308, Poland
  - School of Computational Sciences, Korea Institute for Advanced Study, 85 Hoegiro, Dongdaemun-gu, Seoul 130-722, Republic of Korea
- Artur Giełdoń
  - Faculty of Chemistry, University of Gdańsk, Wita Stwosza 63, Gdańsk 80-308, Poland
- Adam Liwo
  - Faculty of Chemistry, University of Gdańsk, Wita Stwosza 63, Gdańsk 80-308, Poland
  - School of Computational Sciences, Korea Institute for Advanced Study, 85 Hoegiro, Dongdaemun-gu, Seoul 130-722, Republic of Korea
- Rafał Ślusarz
  - Faculty of Chemistry, University of Gdańsk, Wita Stwosza 63, Gdańsk 80-308, Poland
- Magdalena Ślusarz
  - Faculty of Chemistry, University of Gdańsk, Wita Stwosza 63, Gdańsk 80-308, Poland
- Jooyoung Lee
  - School of Computational Sciences, Korea Institute for Advanced Study, 85 Hoegiro, Dongdaemun-gu, Seoul 130-722, Republic of Korea
- Keehyoung Joo
  - Center for Advanced Computation, Korea Institute for Advanced Study, 85 Hoegiro, Dongdaemun-gu, Seoul 130-722, Republic of Korea
- Cezary Czaplewski
  - Faculty of Chemistry, University of Gdańsk, Wita Stwosza 63, Gdańsk 80-308, Poland
4
Revisiting the "satisfaction of spatial restraints" approach of MODELLER for protein homology modeling. PLoS Comput Biol 2019; 15:e1007219. [PMID: 31846452 PMCID: PMC6938380 DOI: 10.1371/journal.pcbi.1007219] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.2] [Received: 06/25/2019] [Revised: 12/31/2019] [Accepted: 11/13/2019] [Indexed: 01/02/2023]
Abstract
The most frequently used approach for protein structure prediction is currently homology modeling. The 3D model building phase of this methodology is critical for obtaining an accurate and biologically useful prediction. The most widely employed tool to perform this task is MODELLER. This program implements the “modeling by satisfaction of spatial restraints” strategy and its core algorithm has not been altered significantly since the early 1990s. In this work, we have explored the idea of modifying MODELLER with two effective, yet computationally light strategies to improve its 3D modeling performance. Firstly, we have investigated how the level of accuracy in the estimation of structural variability between a target protein and its templates, in the form of σ values, profoundly influences 3D modeling. We show that the σ values produced by MODELLER are on average weakly correlated to the true level of structural divergence between target-template pairs and that increasing this correlation greatly improves the program’s predictions, especially in multiple-template modeling. Secondly, we have inquired into how the incorporation of statistical potential terms (such as the DOPE potential) in MODELLER’s objective function positively impacts 3D modeling quality by providing a small but consistent improvement in metrics such as GDT-HA and lDDT and a large increase in stereochemical quality. Python modules to harness this second strategy are freely available at https://github.com/pymodproject/altmod. In summary, we show that there is ample room for improving MODELLER in terms of 3D modeling quality and we propose strategies that could be pursued in order to further increase its performance.
Proteins are fundamental biological molecules that carry out countless activities in living beings. Since the function of proteins is dictated by their three-dimensional atomic structures, acquiring structural details of proteins provides deep insights into their function. Currently, the most frequently used computational approach for protein structure prediction is template-based modeling. In this approach, a target protein is modeled using the experimentally derived structural information of a template protein assumed to have a similar structure to the target. MODELLER is the most frequently used program for template-based 3D model building. Despite its success, its predictions are not always accurate enough to be useful in biomedical research. Here, we show that it is possible to greatly increase the performance of MODELLER by modifying two aspects of its algorithm. First, we demonstrate that providing the program with accurate estimations of local target-template structural divergence greatly increases the quality of its predictions. Additionally, we show that modifying MODELLER’s scoring function with statistical potential energetic terms also helps to improve modeling quality. This work will be useful in future research, since it reports practical strategies to improve the performance of this core tool in structural bioinformatics.
5
Kim Y, You HJ, Park SH, Kim MS, Chae H, Park J, Jekarl DW, Kim J, Kwon A, Choi H, Kim Y, Paek AR, Lee A, Kim JM, Park SY, Kim Y, Joo K, Jung J, Chung SH, Mok JW, Kim M. A Mutation in ZNF143 as a Novel Candidate Gene for Endothelial Corneal Dystrophy. J Clin Med 2019; 8:jcm8081174. [PMID: 31390831 PMCID: PMC6723187 DOI: 10.3390/jcm8081174] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.4] [Received: 07/04/2019] [Revised: 08/02/2019] [Accepted: 08/04/2019] [Indexed: 02/06/2023]
Abstract
Corneal dystrophies (CDs) are a diverse group of inherited disorders with a heterogeneous genetic background. Here, we report the identification of a novel ZNF143 heterozygous missense mutation in three individuals of the same family with clinical and pathological features that are consistent with endothelial CD. Ophthalmologic examination revealed diffuse corneal clouding and edema with decreased endothelial cell density. Pathological findings showed increased corneal thickness due to edema of basal epithelial cells and stroma, and abnormal metaplastic endothelium with stratified epithelium-like changes. Patients’ metaplastic corneal endothelial cells expressed predominantly cytokeratin 7, cytokeratin 19, and E-cadherin. Although Sanger sequencing did not detect any mutation associated with endothelial CDs, whole exome sequencing identified the ZNF143 c.937G>C p.(Asp313His) mutation as a candidate gene for our patients’ endothelial CD. In vitro functional studies demonstrated that mutant ZNF143 promoted the mesenchymal-to-epithelial transition; it upregulated the expression of genes associated with epithelialization in human corneal endothelial cells. Additionally, proinflammatory cytokine responsive genes were significantly enriched after mutant ZNF143 transfection, which may contribute to the severe phenotype of the three patients. These findings link a mutation in ZNF143 with endothelial CD for the first time.
Affiliation(s)
- Yonggoo Kim
  - Department of Laboratory Medicine, College of Medicine, The Catholic University of Korea, Seoul 06591, Korea
  - Catholic Genetic Laboratory Center, Seoul St. Mary's Hospital, College of Medicine, The Catholic University of Korea, Seoul 06591, Korea
- Hye Jin You
  - Cancer Cell and Molecular Biology Branch, Division of Cancer Biology, National Cancer Center, Gyeonggi-do 10408, Korea
- Shin Hae Park
  - Department of Ophthalmology and Visual Science, Seoul St. Mary's Hospital, College of Medicine, The Catholic University of Korea, Seoul 06591, Korea
- Man Soo Kim
  - Department of Ophthalmology and Visual Science, Seoul St. Mary's Hospital, College of Medicine, The Catholic University of Korea, Seoul 06591, Korea
- Hyojin Chae
  - Department of Laboratory Medicine, College of Medicine, The Catholic University of Korea, Seoul 06591, Korea
  - Catholic Genetic Laboratory Center, Seoul St. Mary's Hospital, College of Medicine, The Catholic University of Korea, Seoul 06591, Korea
- Joonhong Park
  - Department of Laboratory Medicine, College of Medicine, The Catholic University of Korea, Seoul 06591, Korea
  - Catholic Genetic Laboratory Center, Seoul St. Mary's Hospital, College of Medicine, The Catholic University of Korea, Seoul 06591, Korea
- Dong Wook Jekarl
  - Department of Laboratory Medicine, College of Medicine, The Catholic University of Korea, Seoul 06591, Korea
  - Catholic Genetic Laboratory Center, Seoul St. Mary's Hospital, College of Medicine, The Catholic University of Korea, Seoul 06591, Korea
- Jiyeon Kim
  - Catholic Genetic Laboratory Center, Seoul St. Mary's Hospital, College of Medicine, The Catholic University of Korea, Seoul 06591, Korea
- Ahlm Kwon
  - Catholic Genetic Laboratory Center, Seoul St. Mary's Hospital, College of Medicine, The Catholic University of Korea, Seoul 06591, Korea
- Hayoung Choi
  - Catholic Genetic Laboratory Center, Seoul St. Mary's Hospital, College of Medicine, The Catholic University of Korea, Seoul 06591, Korea
- Yeojae Kim
  - Catholic Genetic Laboratory Center, Seoul St. Mary's Hospital, College of Medicine, The Catholic University of Korea, Seoul 06591, Korea
- A Rome Paek
  - Cancer Cell and Molecular Biology Branch, Division of Cancer Biology, National Cancer Center, Gyeonggi-do 10408, Korea
- Ahwon Lee
  - Department of Hospital Pathology, College of Medicine, The Catholic University of Korea, Seoul 06591, Korea
- Seon Young Park
  - Department of Life Systems, Sookmyung Women's University, Seoul 04312, Korea
- Yonghwan Kim
  - Department of Life Systems, Sookmyung Women's University, Seoul 04312, Korea
- Keehyoung Joo
  - Center for in Silico Protein Science, Korea Institute for Advanced Study, Seoul 02455, Korea
  - Center for Advanced Computation, Korea Institute for Advanced Study, Seoul 02455, Korea
  - Center for Advanced Computation, Korea Institute for Advanced Study, Seoul 02455, Korea
- So-Hyang Chung
  - Department of Ophthalmology and Visual Science, Seoul St. Mary's Hospital, College of Medicine, The Catholic University of Korea, Seoul 06591, Korea
  - Catholic Institutes of Visual Science, The Catholic University of Korea, Seoul 06591, Korea
- Jee Won Mok
  - Catholic Institutes of Visual Science, The Catholic University of Korea, Seoul 06591, Korea
- Myungshin Kim
  - Department of Laboratory Medicine, College of Medicine, The Catholic University of Korea, Seoul 06591, Korea
  - Catholic Genetic Laboratory Center, Seoul St. Mary's Hospital, College of Medicine, The Catholic University of Korea, Seoul 06591, Korea
6
Sieradzan AK, Bogunia M, Mech P, Ganzynkowicz R, Giełdoń A, Liwo A, Makowski M. Introduction of Phosphorylated Residues into the UNRES Coarse-Grained Model: Toward Modeling of Signaling Processes. J Phys Chem B 2019; 123:5721-5729. [PMID: 31194908 DOI: 10.1021/acs.jpcb.9b03799] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.8] [Indexed: 01/05/2023]
Abstract
Phosphorylated proteins take part in many signaling pathways and play a key role in homeostasis regulation. The all-atom force fields enable us to study the systems containing phosphorylated proteins, but they are limited to short time scales. In this paper, we report the extension of the physics-based coarse-grained UNRES force field to treat systems with phosphorylated amino-acid residues. To derive the respective potentials, appropriate physics-based analytical expressions were fitted to the potentials of mean force of systems modeling phosphorylated amino-acid residues computed in our previous work and implemented in UNRES. The extended UNRES performed well in ab initio simulations of two miniproteins containing phosphorylated residues, strongly suggesting that realistic large-scale simulations of processes involving phosphorylated proteins, especially signaling processes, are now possible.
Affiliation(s)
- Adam K Sieradzan
  - Faculty of Chemistry, University of Gdańsk, ul. Wita Stwosza 63, 80-308 Gdańsk, Poland
- Małgorzata Bogunia
  - Faculty of Chemistry, University of Gdańsk, ul. Wita Stwosza 63, 80-308 Gdańsk, Poland
- Paulina Mech
  - Faculty of Chemistry, University of Gdańsk, ul. Wita Stwosza 63, 80-308 Gdańsk, Poland
- Robert Ganzynkowicz
  - Faculty of Chemistry, University of Gdańsk, ul. Wita Stwosza 63, 80-308 Gdańsk, Poland
- Artur Giełdoń
  - Faculty of Chemistry, University of Gdańsk, ul. Wita Stwosza 63, 80-308 Gdańsk, Poland
- Adam Liwo
  - Faculty of Chemistry, University of Gdańsk, ul. Wita Stwosza 63, 80-308 Gdańsk, Poland
- Mariusz Makowski
  - Faculty of Chemistry, University of Gdańsk, ul. Wita Stwosza 63, 80-308 Gdańsk, Poland
7
Use of the UNRES force field in template-assisted prediction of protein structures and the refinement of server models: Test with CASP12 targets. J Mol Graph Model 2018; 83:92-99. [DOI: 10.1016/j.jmgm.2018.05.008] [Citation(s) in RCA: 16] [Impact Index Per Article: 2.7] [Received: 12/28/2017] [Revised: 05/18/2018] [Accepted: 05/20/2018] [Indexed: 11/22/2022]
8
Manavalan B, Lee J. SVMQA: support-vector-machine-based protein single-model quality assessment. Bioinformatics 2018; 33:2496-2503. [PMID: 28419290 DOI: 10.1093/bioinformatics/btx222] [Citation(s) in RCA: 130] [Impact Index Per Article: 21.7] [Received: 12/15/2016] [Accepted: 04/12/2017] [Indexed: 01/03/2023]
Abstract
Motivation: The accurate ranking of predicted structural models and the selection of the best model from a given candidate pool remain open problems in the field of structural bioinformatics. The quality assessment (QA) methods used to address these problems can be grouped into two categories: consensus methods and single-model methods. Consensus methods in general perform better and attain higher correlation between predicted and true quality measures. However, these methods frequently fail to generate proper quality scores for native-like structures which are distinct from the rest of the pool. Conversely, single-model methods do not suffer from this drawback and are better suited for real-life applications where many models from various sources may not be readily available.
Results: In this study, we developed a support-vector-machine-based single-model global quality assessment (SVMQA) method. For a given protein model, the SVMQA method predicts TM-score and GDT_TS score based on a feature vector containing statistical potential energy terms and consistency-based terms between the actual structural features (extracted from the three-dimensional coordinates) and predicted values (from primary sequence). We trained SVMQA using CASP8, CASP9 and CASP10 targets and determined the machine parameters by 10-fold cross-validation. We evaluated the performance of our SVMQA method on various benchmarking datasets. Results show that SVMQA outperformed the existing best single-model QA methods both in ranking provided protein models and in selecting the best model from the pool. According to the CASP12 assessment, SVMQA was the best method in selecting good-quality models from decoys in terms of GDTloss.
Availability and implementation: The SVMQA method can be freely downloaded from http://lee.kias.re.kr/SVMQA/SVMQA_eval.tar.gz.
Contact: jlee@kias.re.kr.
Supplementary information: Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Balachandran Manavalan
  - Center for In Silico Protein Science and School of Computational Sciences, Korea Institute for Advanced Study, Seoul 130-722, Korea
- Jooyoung Lee
  - Center for In Silico Protein Science and School of Computational Sciences, Korea Institute for Advanced Study, Seoul 130-722, Korea
9
Heo L, Feig M. PREFMD: a web server for protein structure refinement via molecular dynamics simulations. Bioinformatics 2018; 34:1063-1065. [PMID: 29126101 PMCID: PMC5860225 DOI: 10.1093/bioinformatics/btx726] [Citation(s) in RCA: 21] [Impact Index Per Article: 3.5] [Received: 07/10/2017] [Revised: 10/04/2017] [Accepted: 11/07/2017] [Indexed: 11/13/2022]
Abstract
Summary: Refinement of protein structure models is a long-standing problem in structural bioinformatics. Molecular dynamics-based methods have emerged as an avenue to achieve consistent refinement. The PREFMD web server implements an optimized protocol based on the method successfully tested in CASP11. Validation with recent CASP refinement targets shows consistent and more significant improvement in global structure accuracy over other state-of-the-art servers.
Availability and implementation: PREFMD is freely available as a web server at http://feiglab.org/prefmd. Scripts for running PREFMD as a stand-alone package are available at https://github.com/feiglab/prefmd.git.
Contact: feig@msu.edu.
Supplementary information: Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Lim Heo
  - Biochemistry and Molecular Biology, Michigan State University, East Lansing, MI, USA
- Michael Feig
  - Biochemistry and Molecular Biology, Michigan State University, East Lansing, MI, USA
10
Kundrotas PJ, Anishchenko I, Badal VD, Das M, Dauzhenka T, Vakser IA. Modeling CAPRI targets 110-120 by template-based and free docking using contact potential and combined scoring function. Proteins 2018; 86 Suppl 1:302-310. [PMID: 28905425 PMCID: PMC5820180 DOI: 10.1002/prot.25380] [Citation(s) in RCA: 17] [Impact Index Per Article: 2.8] [Received: 06/27/2017] [Revised: 08/25/2017] [Accepted: 09/10/2017] [Indexed: 01/12/2023]
Abstract
The paper presents an analysis of our template-based and free docking predictions in the joint CASP12/CAPRI37 round. A new scoring function for template-based docking was developed, benchmarked on the Dockground resource, and applied to the targets. The results showed that the function successfully discriminates incorrect docking predictions. In correctly predicted targets, the scoring function was complemented by other considerations, such as consistency of the oligomeric states among templates, similarity of the biological functions, biological interface relevance, etc. The scoring function still does not distinguish biological from crystal packing interfaces well, and needs further development for the docking of bundles of α-helices. In the case of the trimeric targets, sequence-based methods did not find common templates, despite similarity of the structures, suggesting complementary use of structure- and sequence-based alignments in comparative docking. The results showed that if a good docking template is found, an accurate model of the interface can be built even from largely inaccurate models of individual subunits. Free docking, however, is very sensitive to the quality of the individual models. Nevertheless, our newly developed contact potential detected approximate locations of the binding sites.
Affiliation(s)
- Petras J. Kundrotas
  - Center for Computational Biology and Department of Molecular Biosciences, The University of Kansas, Lawrence, Kansas 66045, USA
- Varsha D. Badal
  - Center for Computational Biology and Department of Molecular Biosciences, The University of Kansas, Lawrence, Kansas 66045, USA
- Madhurima Das
  - Center for Computational Biology and Department of Molecular Biosciences, The University of Kansas, Lawrence, Kansas 66045, USA
- Taras Dauzhenka
  - Center for Computational Biology and Department of Molecular Biosciences, The University of Kansas, Lawrence, Kansas 66045, USA
- Ilya A. Vakser
  - Center for Computational Biology and Department of Molecular Biosciences, The University of Kansas, Lawrence, Kansas 66045, USA
11
Liu T, Ish‐Shalom S, Torng W, Lafita A, Bock C, Mort M, Cooper DN, Bliven S, Capitani G, Mooney SD, Altman RB. Biological and functional relevance of CASP predictions. Proteins 2018; 86 Suppl 1:374-386. [PMID: 28975675 PMCID: PMC5820171 DOI: 10.1002/prot.25396] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.5] [Received: 06/17/2017] [Revised: 09/12/2017] [Accepted: 10/03/2017] [Indexed: 02/06/2023]
Abstract
Our goal is to answer the question: compared with experimental structures, how useful are predicted models for functional annotation? We assessed the functional utility of predicted models by comparing the performances of a suite of methods for functional characterization on the predictions and the experimental structures. We identified 28 sites in 25 protein targets to perform functional assessment. These 28 sites included nine sites with known ligand binding (holo-sites), nine sites that are expected or suggested by experimental authors for small molecule binding (apo-sites), and ten sites containing important motifs, loops, or key residues with important disease-associated mutations. We evaluated the utility of the predictions by comparing their microenvironments to the experimental structures. Overall structural quality correlates with functional utility. However, the best-ranked predictions (global) may not have the best functional quality (local). Our assessment provides an ability to discriminate between predictions with high structural quality. When assessing ligand-binding sites, most prediction methods have higher performance on apo-sites than holo-sites. Some servers show consistently high performance for certain types of functional sites. Finally, many functional sites are associated with protein-protein interaction. We also analyzed biologically relevant features from the protein assemblies of two targets where the active site spanned the protein-protein interface. For the assembly targets, we find that the features in the models are mainly determined by the choice of template.
Affiliation(s)
- Tianyun Liu
  - Department of Bioengineering, Stanford University, Stanford, California
- Shirbi Ish‐Shalom
  - Biomedical Informatics Training Program, Stanford University, Stanford, California
- Wen Torng
  - Department of Bioengineering, Stanford University, Stanford, California
- Aleix Lafita
  - Laboratory of Biomolecular Research, Paul Scherrer Institute, Villigen, Switzerland
  - Department of Biosystems Science and Engineering, ETH Zurich, 4058 Basel, Switzerland
- Christian Bock
  - Department of Biomedical Informatics and Medical Education, University of Washington, Seattle, Washington
  - Heidelberg University, Heidelberg, Germany
- Matthew Mort
  - Institute of Medical Genetics, Cardiff University, United Kingdom
- David N Cooper
  - Institute of Medical Genetics, Cardiff University, United Kingdom
- Spencer Bliven
  - Laboratory of Biomolecular Research, Paul Scherrer Institute, Villigen, Switzerland
  - National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, Maryland
- Guido Capitani
  - Laboratory of Biomolecular Research, Paul Scherrer Institute, Villigen, Switzerland
  - Department of Biology, ETH Zurich, Zurich, Switzerland
- Sean D. Mooney
  - Department of Biomedical Informatics and Medical Education, University of Washington, Seattle, Washington
- Russ B. Altman
  - Department of Bioengineering, Stanford University, Stanford, California
12
Joo K, Heo S, Joung I, Hong SH, Lee SJ, Lee J. Data-assisted protein structure modeling by global optimization in CASP12. Proteins 2018; 86 Suppl 1:240-246. [PMID: 29341255 DOI: 10.1002/prot.25457] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.8] [Received: 07/31/2017] [Revised: 12/29/2017] [Accepted: 01/08/2018] [Indexed: 12/26/2022]
Abstract
In CASP12, 2 types of data-assisted protein structure modeling were tested. Either SAXS experimental data or cross-linking experimental data was provided for a selected number of CASP12 targets that the CASP12 predictor could utilize for better protein structure modeling. We devised 2 separate energy terms for SAXS data and cross-linking data to drive the model structures into more native-like structures that satisfied the given experimental data as much as possible. In CASP11, we successfully performed protein structure modeling using simulated sparse and ambiguously assigned NOE data and/or correct residue-residue contact information, where the only energy term that folded the protein into its native structure was the term that originated from the given experimental data. However, the 2 types of experimental data provided in CASP12 were far from sufficient to fold the target protein into its native structure because SAXS data provides only the overall shape of the molecule and the cross-linking contact information provides only very low-resolution distance information. For this reason, we combined the SAXS or cross-linking energy term with our regular modeling energy function that includes both the template energy term and the de novo energy terms. By optimizing the newly formulated energy function, we obtained protein models that fit better with provided SAXS data than the X-ray structure of the target. However, the improvement of the model relative to the one modeled without the SAXS data was not significant. Consistent structural improvement was achieved by incorporating cross-linking data into the protein structure modeling.
Affiliation(s)
- Keehyoung Joo
- Center for In Silico Protein Science, Korea Institute for Advanced Study, Seoul, 02455, South Korea; Center for Advanced Computation, Korea Institute for Advanced Study, Seoul, 02455, South Korea
| | - Seungryong Heo
- School of Computational Sciences, Korea Institute for Advanced Study, Seoul, 02455, South Korea
| | - InSuk Joung
- Center for In Silico Protein Science, Korea Institute for Advanced Study, Seoul, 02455, South Korea
| | - Seung Hwan Hong
- School of Computational Sciences, Korea Institute for Advanced Study, Seoul, 02455, South Korea
| | - Sung Jong Lee
- Center for In Silico Protein Science, Korea Institute for Advanced Study, Seoul, 02455, South Korea; The Research Institute for Basic Sciences, Changwon National University, Changwon-Si, Gyeongsangnam-do, South Korea
| | - Jooyoung Lee
- Center for In Silico Protein Science, Korea Institute for Advanced Study, Seoul, 02455, South Korea; Center for Advanced Computation, Korea Institute for Advanced Study, Seoul, 02455, South Korea; School of Computational Sciences, Korea Institute for Advanced Study, Seoul, 02455, South Korea
| |
|
13
|
Assessment of data-assisted prediction by inclusion of crosslinking/mass-spectrometry and small angle X-ray scattering data in the 12th Critical Assessment of protein Structure Prediction experiment. Proteins 2017; 86 Suppl 1:215-227. [DOI: 10.1002/prot.25442] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.4] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/21/2017] [Revised: 11/16/2017] [Accepted: 12/10/2017] [Indexed: 12/26/2022]
|
14
|
Hong SH, Joung I, Flores-Canales JC, Manavalan B, Cheng Q, Heo S, Kim JY, Lee SY, Nam M, Joo K, Lee IH, Lee SJ, Lee J. Protein structure modeling and refinement by global optimization in CASP12. Proteins 2017; 86 Suppl 1:122-135. [PMID: 29159837 DOI: 10.1002/prot.25426] [Citation(s) in RCA: 14] [Impact Index Per Article: 2.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/18/2017] [Revised: 11/10/2017] [Accepted: 11/16/2017] [Indexed: 11/09/2022]
Abstract
For protein structure modeling in the CASP12 experiment, we have developed a new protocol based on our previous CASP11 approach. The global optimization method of conformational space annealing (CSA) was applied to 3 stages of modeling: multiple sequence-structure alignment, three-dimensional (3D) chain building, and side-chain re-modeling. For better template selection and model selection, we updated our model quality assessment (QA) method with the newly developed SVMQA (support vector machine for quality assessment). For 3D chain building, we updated our energy function by including restraints generated from predicted residue-residue contacts. New energy terms for the predicted secondary structure and predicted solvent accessible surface area were also introduced. For difficult targets, we proposed a new method, LEEab, where the template term played a less significant role than it did in LEE, complemented by increased contributions from other terms such as the predicted contact term. For TBM (template-based modeling) targets, LEE performed better than LEEab, but for FM targets, LEEab was better. For model refinement, we modified our CASP11 molecular dynamics (MD) based protocol by using explicit solvents and tuning down restraint weights. Refinement results from MD simulations that used a new augmented statistical energy term in the force field were quite promising. Finally, when using inaccurate information (such as the predicted contacts), it was important to use the Lorentzian function for which the maximal penalty arising from wrong information is always bounded.
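The closing remark about bounded penalties can be illustrated with a minimal sketch. It assumes a common Lorentzian-type restraint of the form x²/(x² + w²); the abstract does not give the authors' functional form, weights, or widths, so `harmonic_penalty`, `lorentzian_penalty`, and their parameters are hypothetical names chosen only for illustration.

```python
def harmonic_penalty(d, d0, weight=1.0):
    """Quadratic restraint: grows without bound as d moves away from d0."""
    return weight * (d - d0) ** 2


def lorentzian_penalty(d, d0, weight=1.0, width=1.0):
    """Lorentzian-type restraint (assumed form): saturates at `weight`.

    A wrong restraint (for example a falsely predicted contact) can therefore
    contribute at most `weight` to the total energy.
    """
    x2 = (d - d0) ** 2
    return weight * x2 / (x2 + width ** 2)


if __name__ == "__main__":
    target = 8.0  # hypothetical restraint distance in angstroms
    for d in (8.0, 9.0, 20.0, 100.0):
        print(d, harmonic_penalty(d, target), lorentzian_penalty(d, target))
```

With a quadratic term, a single badly wrong restraint can dominate the energy, whereas the saturating form caps its influence, which is consistent with the abstract's point about inaccurate information.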
Affiliation(s)
- Seung Hwan Hong
- Center for In Silico Protein Science, Korea Institute for Advanced Study, Seoul, South Korea; School of Computational Sciences, Korea Institute for Advanced Study, Seoul, South Korea
| | - InSuk Joung
- Center for In Silico Protein Science, Korea Institute for Advanced Study, Seoul, South Korea; School of Computational Sciences, Korea Institute for Advanced Study, Seoul, South Korea
| | - Jose C Flores-Canales
- Center for In Silico Protein Science, Korea Institute for Advanced Study, Seoul, South Korea; School of Computational Sciences, Korea Institute for Advanced Study, Seoul, South Korea
| | - Balachandran Manavalan
- Center for In Silico Protein Science, Korea Institute for Advanced Study, Seoul, South Korea; School of Computational Sciences, Korea Institute for Advanced Study, Seoul, South Korea
| | - Qianyi Cheng
- Center for In Silico Protein Science, Korea Institute for Advanced Study, Seoul, South Korea; School of Computational Sciences, Korea Institute for Advanced Study, Seoul, South Korea
| | - Seungryong Heo
- Center for In Silico Protein Science, Korea Institute for Advanced Study, Seoul, South Korea
| | - Jong Yun Kim
- Center for In Silico Protein Science, Korea Institute for Advanced Study, Seoul, South Korea
| | - Sun Young Lee
- Center for In Silico Protein Science, Korea Institute for Advanced Study, Seoul, South Korea
| | - Mikyung Nam
- Center for In Silico Protein Science, Korea Institute for Advanced Study, Seoul, South Korea
| | - Keehyoung Joo
- Center for In Silico Protein Science, Korea Institute for Advanced Study, Seoul, South Korea; Center for Advanced Computation, Korea Institute for Advanced Study, Seoul, South Korea
| | - In-Ho Lee
- Center for In Silico Protein Science, Korea Institute for Advanced Study, Seoul, South Korea; Korea Research Institute of Standards and Science (KRISS), Daejeon, South Korea
| | - Sung Jong Lee
- Center for In Silico Protein Science, Korea Institute for Advanced Study, Seoul, South Korea; The Research Institute for Basic Sciences, Changwon National University, Changwon-Si, Gyeongsangnam-do, South Korea
| | - Jooyoung Lee
- Center for In Silico Protein Science, Korea Institute for Advanced Study, Seoul, South Korea; School of Computational Sciences, Korea Institute for Advanced Study, Seoul, South Korea; Center for Advanced Computation, Korea Institute for Advanced Study, Seoul, South Korea
| |
|
15
|
Karczyńska AS, Mozolewska MA, Krupa P, Giełdoń A, Liwo A, Czaplewski C. Prediction of protein structure with the coarse-grained UNRES force field assisted by small X-ray scattering data and knowledge-based information. Proteins 2017; 86 Suppl 1:228-239. [PMID: 29134679 DOI: 10.1002/prot.25421] [Citation(s) in RCA: 23] [Impact Index Per Article: 3.3] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/20/2017] [Revised: 11/09/2017] [Accepted: 11/10/2017] [Indexed: 11/09/2022]
Abstract
A new approach to assisted protein-structure prediction has been proposed, which is based on running multiplexed replica exchange molecular dynamics simulations with the coarse-grained UNRES force field with restraints derived from knowledge-based models and distance distribution from small angle X-ray scattering (SAXS) measurements. The latter restraints are incorporated into the target function as a maximum-likelihood term that guides the shape of the simulated structures towards that defined by SAXS. The approach was first verified with the 1KOY protein, for which the distance distribution was calculated from the experimental structure, and subsequently used to predict the structures of 11 data-assisted targets in the CASP12 experiment. Major improvement of the GDT_TS was obtained for 2 targets and minor improvement for 2 others, while for 6 targets the GDT_TS deteriorated compared with that calculated for predictions without the SAXS data, partly because of assuming a wrong multimeric state (for Ts866) or because the crystal conformation was more compact than the solution conformation (for Ts942). Particularly good results were obtained for Ts909, in which use of SAXS data resulted in the selection of a correctly packed trimer and, subsequently, increased the GDT_TS of monomer prediction. It was found that running simulations with the correct oligomeric state is essential for success in SAXS-data-assisted prediction.
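As a rough illustration of what calculating a distance distribution from a structure can mean, the sketch below builds a normalized histogram of pairwise distances between points (for example one site per residue). This is a simplification under stated assumptions: the function name, the bin width, and the use of unweighted point-to-point distances are hypothetical choices, not the procedure used in the paper.

```python
import math
from collections import Counter


def pair_distance_histogram(coords, bin_width=1.0):
    """Return {bin_index: fraction of pairs} for all unique point pairs.

    coords: sequence of (x, y, z) tuples.
    A pair at distance d falls into bin int(d // bin_width).
    """
    counts = Counter()
    n = len(coords)
    for i in range(n):
        for j in range(i + 1, n):
            d = math.dist(coords[i], coords[j])
            counts[int(d // bin_width)] += 1
    total = sum(counts.values())
    if total == 0:
        return {}
    return {b: c / total for b, c in sorted(counts.items())}


if __name__ == "__main__":
    # Three collinear points spaced 1.5 units apart (toy data).
    toy = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0), (3.0, 0.0, 0.0)]
    print(pair_distance_histogram(toy))
```

A histogram of this kind computed from a model could then be compared with a distribution derived from scattering data; how that comparison enters the target function is described in the paper, not here.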
Affiliation(s)
| | - Magdalena A Mozolewska
- Faculty of Chemistry, University of Gdańsk, Wita Stwosza 63, Gdańsk, 80-308, Poland; Institute of Computer Science, Polish Academy of Sciences, ul. Jana Kazimierza 5, Warsaw, 01-248, Poland
| | - Paweł Krupa
- Faculty of Chemistry, University of Gdańsk, Wita Stwosza 63, Gdańsk, 80-308, Poland; Institute of Physics, Polish Academy of Sciences, Aleja Lotników 32/46, Warsaw, PL-02668, Poland
| | - Artur Giełdoń
- Faculty of Chemistry, University of Gdańsk, Wita Stwosza 63, Gdańsk, 80-308, Poland
| | - Adam Liwo
- Faculty of Chemistry, University of Gdańsk, Wita Stwosza 63, Gdańsk, 80-308, Poland; School of Computational Sciences, Korea Institute for Advanced Study, 85 Hoegiro, Dongdaemun-gu, Seoul, 130-722, Republic of Korea
| | - Cezary Czaplewski
- Faculty of Chemistry, University of Gdańsk, Wita Stwosza 63, Gdańsk, 80-308, Poland
| |
|
16
|
Inverse Resolution Limit of Partition Density and Detecting Overlapping Communities by Link-Surprise. Sci Rep 2017; 7:12399. [PMID: 28963540 PMCID: PMC5622083 DOI: 10.1038/s41598-017-12432-1] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.1] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/24/2017] [Accepted: 09/04/2017] [Indexed: 11/16/2022] Open
Abstract
Finding overlapping communities of complex networks remains a challenge in network science. To address this challenge, one of the widely used approaches is finding the communities of links by optimizing an objective function, partition density. In this study, we show that partition density suffers from an inverse resolution limit: it has a strong preference for triangles. This resolution limit makes partition density an improper objective function for global optimization. The conditions where partition density prefers triangles to larger link community structures are analytically derived and confirmed with global optimization calculations using synthetic and real-world networks. To overcome this limitation of partition density, we suggest an alternative measure, Link Surprise, to find link communities, which is suitable for global optimization. Benchmark studies demonstrate that global optimization of Link Surprise yields meaningful and more accurate link community structures than partition density optimization.
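For readers unfamiliar with the objective function, the following minimal sketch computes partition density for a given link partition. It assumes the standard definition from the link-community literature, D = (2/M) Σ_c m_c (m_c − (n_c − 1)) / ((n_c − 2)(n_c − 1)), where m_c and n_c are the numbers of links and nodes in community c; the abstract itself does not restate the formula, and the function name and toy graph are hypothetical.

```python
def partition_density(communities):
    """Partition density of a link partition.

    communities: list of link communities, each a list of (u, v) edges.
    Communities with two or fewer nodes contribute zero by convention.
    """
    total_links = sum(len(links) for links in communities)
    if total_links == 0:
        return 0.0
    acc = 0.0
    for links in communities:
        m = len(links)
        n = len({node for edge in links for node in edge})
        if n <= 2:
            continue
        acc += m * (m - (n - 1)) / ((n - 2) * (n - 1))
    return 2.0 * acc / total_links


if __name__ == "__main__":
    triangle = [(1, 2), (2, 3), (1, 3)]
    print(partition_density([triangle]))  # a lone triangle reaches the maximum
```

Under this definition a single triangle already attains the maximum value of 1, which gives some intuition for the preference for triangles that the abstract describes; the precise conditions are derived in the paper.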
|
17
|
Karczyńska AS, Czaplewski C, Krupa P, Mozolewska MA, Joo K, Lee J, Liwo A. Ergodicity and model quality in template-restrained canonical and temperature/Hamiltonian replica exchange coarse-grained molecular dynamics simulations of proteins. J Comput Chem 2017; 38:2730-2746. [DOI: 10.1002/jcc.25070] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/15/2017] [Revised: 07/10/2017] [Accepted: 09/01/2017] [Indexed: 01/22/2023]
Affiliation(s)
- Agnieszka S. Karczyńska
- Faculty of Chemistry; University of Gdańsk, ul. Wita Stwosza 63; Gdańsk 80-308 Poland
- Center for In Silico Protein Science; Korea Institute for Advanced Study, 85 Hoegiro, Dongdaemun-gu; Seoul 02455 Republic of Korea
- School of Computational Sciences; Korea Institute for Advanced Study, 85 Hoegiro Dongdaemun-gu; Seoul 02455 Republic of Korea
| | - Cezary Czaplewski
- Faculty of Chemistry; University of Gdańsk, ul. Wita Stwosza 63; Gdańsk 80-308 Poland
| | - Paweł Krupa
- Faculty of Chemistry; University of Gdańsk, ul. Wita Stwosza 63; Gdańsk 80-308 Poland
- Institute of Physics, Polish Academy of Sciences, Aleja Lotników 32/46; Warsaw PL 02668 Poland
| | - Magdalena A. Mozolewska
- Faculty of Chemistry; University of Gdańsk, ul. Wita Stwosza 63; Gdańsk 80-308 Poland
- Institute of Computer Science, Polish Academy of Sciences, ul. Jana Kazimierza 5; Warsaw 01-248 Poland
| | - Keehyoung Joo
- School of Computational Sciences; Korea Institute for Advanced Study, 85 Hoegiro Dongdaemun-gu; Seoul 02455 Republic of Korea
- Center for Advanced Computation, Korea Institute for Advanced Study, 85 Hoegiro, Dongdaemun-gu; Seoul 02455 Republic of Korea
| | - Jooyoung Lee
- Center for In Silico Protein Science; Korea Institute for Advanced Study, 85 Hoegiro, Dongdaemun-gu; Seoul 02455 Republic of Korea
- School of Computational Sciences; Korea Institute for Advanced Study, 85 Hoegiro Dongdaemun-gu; Seoul 02455 Republic of Korea
- Center for Advanced Computation, Korea Institute for Advanced Study, 85 Hoegiro, Dongdaemun-gu; Seoul 02455 Republic of Korea
| | - Adam Liwo
- Faculty of Chemistry; University of Gdańsk, ul. Wita Stwosza 63; Gdańsk 80-308 Poland
- Center for In Silico Protein Science; Korea Institute for Advanced Study, 85 Hoegiro, Dongdaemun-gu; Seoul 02455 Republic of Korea
- School of Computational Sciences; Korea Institute for Advanced Study, 85 Hoegiro Dongdaemun-gu; Seoul 02455 Republic of Korea
| |
|
18
|
Cheng Q, Joung I, Lee J. A Simple and Efficient Protein Structure Refinement Method. J Chem Theory Comput 2017; 13:5146-5162. [PMID: 28800396 DOI: 10.1021/acs.jctc.7b00470] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.3] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/30/2022]
Abstract
Improving the quality of a given protein structure can serve as the ultimate solution for accurate protein structure prediction, and seeking such a method is currently a challenge in computational structural biology. To promote and encourage such much-needed efforts, CASP (Critical Assessment of Structure Prediction) has been providing an ideal computational experimental platform, where it was reported only recently (since CASP10) that systematic protein structure refinement is possible by carrying out extensive (approximately millisecond) MD simulations with proper restraints generated from the given structure. Using an explicit solvent model and positional and distance restraints much reduced relative to those used previously, we propose a refinement protocol that combines a series of short (5 ns) MD simulations with energy minimization procedures. Testing and benchmarking on 54 CASP8-10 refinement targets and 34 CASP11 refinement targets show quite promising results. Using only a small fraction of MD simulation steps (nanosecond versus millisecond), systematic protein structure refinement was demonstrated in this work, indicating that refinement of a given model can be achieved using a few hours of desktop computing.
Affiliation(s)
- Qianyi Cheng
- Center for In Silico Protein Science and School of Computational Sciences, Korea Institute for Advanced Study, Seoul 02455, Korea
| | - InSuk Joung
- Center for In Silico Protein Science and School of Computational Sciences, Korea Institute for Advanced Study, Seoul 02455, Korea
| | - Jooyoung Lee
- Center for In Silico Protein Science and School of Computational Sciences, Korea Institute for Advanced Study, Seoul 02455, Korea
| |
|
19
|
Lam SD, Das S, Sillitoe I, Orengo C. An overview of comparative modelling and resources dedicated to large-scale modelling of genome sequences. Acta Crystallogr D Struct Biol 2017; 73:628-640. [PMID: 28777078 PMCID: PMC5571743 DOI: 10.1107/s2059798317008920] [Citation(s) in RCA: 33] [Impact Index Per Article: 4.7] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/17/2016] [Accepted: 06/14/2017] [Indexed: 12/02/2022] Open
Abstract
Computational modelling of proteins has been a major catalyst in structural biology. Bioinformatics groups have exploited the repositories of known structures to predict high-quality structural models with high efficiency at low cost. This article provides an overview of comparative modelling, reviews recent developments and describes resources dedicated to large-scale comparative modelling of genome sequences. The value of subclustering protein domain superfamilies to guide the template-selection process is investigated. Some recent cases in which structural modelling has aided experimental work to determine very large macromolecular complexes are also cited.
Affiliation(s)
- Su Datt Lam
- Institute of Structural and Molecular Biology, UCL, Darwin Building, Gower Street, London WC1E 6BT, England
- School of Biosciences and Biotechnology, Faculty of Science and Technology, Universiti Kebangsaan Malaysia, 43600 Bangi, Selangor, Malaysia
| | - Sayoni Das
- Institute of Structural and Molecular Biology, UCL, Darwin Building, Gower Street, London WC1E 6BT, England
| | - Ian Sillitoe
- Institute of Structural and Molecular Biology, UCL, Darwin Building, Gower Street, London WC1E 6BT, England
| | - Christine Orengo
- Institute of Structural and Molecular Biology, UCL, Darwin Building, Gower Street, London WC1E 6BT, England
| |
|
20
|
Kwak MJ, Kim JD, Kim H, Kim C, Bowman JW, Kim S, Joo K, Lee J, Jin KS, Kim YG, Lee NK, Jung JU, Oh BH. Architecture of the type IV coupling protein complex of Legionella pneumophila. Nat Microbiol 2017; 2:17114. [PMID: 28714967 PMCID: PMC6497169 DOI: 10.1038/nmicrobiol.2017.114] [Citation(s) in RCA: 51] [Impact Index Per Article: 7.3] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/15/2017] [Accepted: 06/14/2017] [Indexed: 12/14/2022]
Abstract
Many bacteria, including Legionella pneumophila, rely on the type IV secretion system to translocate a repertoire of effector proteins into the hosts for their survival and growth. Type IV coupling protein (T4CP) is a hexameric ATPase that links translocating substrates to the transenvelope secretion conduit. Yet, how a large number of effector proteins are selectively recruited and processed by T4CPs remains enigmatic. DotL, the T4CP of L. pneumophila, contains an ATPase domain and a C-terminal extension whose function is unknown. Unlike T4CPs involved in plasmid DNA translocation, DotL appeared to function by forming a multiprotein complex with four other proteins. Here, we show that the C-terminal extension of DotL interacts with DotN, IcmS, IcmW and an additionally identified subunit LvgA, and that this pentameric assembly binds Legionella effector proteins. We determined the crystal structure of this assembly and built an architecture of the T4CP holocomplex by combining a homology model of the ATPase domain of DotL. The holocomplex is a hexamer of a bipartite structure composed of a membrane-proximal ATPase domain and a membrane-distal substrate-recognition assembly. The presented information demonstrates the architecture and functional dissection of the multiprotein T4CP complexes and provides important insights into their substrate recruitment and processing.
Affiliation(s)
- Mi-Jeong Kwak
- Department of Biological Sciences, KAIST Institute for the Biocentury, Korea Advanced Institute of Science and Technology, Daejeon 34141, Korea
| | - J. Dongun Kim
- Department of Biological Sciences, KAIST Institute for the Biocentury, Korea Advanced Institute of Science and Technology, Daejeon 34141, Korea
| | - Hyunmin Kim
- Department of Biological Sciences, KAIST Institute for the Biocentury, Korea Advanced Institute of Science and Technology, Daejeon 34141, Korea
| | - Cheolhee Kim
- Department of Physics, Pohang University of Science and Technology, Pohang, Kyungbuk 37673, Korea
| | - James W. Bowman
- Department of Molecular Microbiology and Immunology, Keck School of Medicine, University of Southern California, 1975 Zonal Avenue, Los Angeles, California 90033, USA
| | - Seonghoon Kim
- Department of Biological Sciences, KAIST Institute for the Biocentury, Korea Advanced Institute of Science and Technology, Daejeon 34141, Korea
| | - Keehyoung Joo
- Center for Advanced Computation, School of Computational Sciences, Korea Institute for Advanced Study, Seoul 02455, Korea
| | - Jooyoung Lee
- Center for Advanced Computation, School of Computational Sciences, Korea Institute for Advanced Study, Seoul 02455, Korea
| | - Kyeong Sik Jin
- Pohang Accelerator Laboratory, Pohang University of Science and Technology, Pohang, Kyungbuk 37673, Korea
| | - Yeon-Gil Kim
- Pohang Accelerator Laboratory, Pohang University of Science and Technology, Pohang, Kyungbuk 37673, Korea
| | - Nam Ki Lee
- Department of Physics, Pohang University of Science and Technology, Pohang, Kyungbuk 37673, Korea
| | - Jae U. Jung
- Department of Molecular Microbiology and Immunology, Keck School of Medicine, University of Southern California, 1975 Zonal Avenue, Los Angeles, California 90033, USA
| | - Byung-Ha Oh
- Department of Biological Sciences, KAIST Institute for the Biocentury, Korea Advanced Institute of Science and Technology, Daejeon 34141, Korea
| |
|
21
|
Mozolewska MA, Krupa P, Zaborowski B, Liwo A, Lee J, Joo K, Czaplewski C. Use of Restraints from Consensus Fragments of Multiple Server Models To Enhance Protein-Structure Prediction Capability of the UNRES Force Field. J Chem Inf Model 2016; 56:2263-2279. [DOI: 10.1021/acs.jcim.6b00189] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.6] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/11/2022]
Affiliation(s)
| | - Paweł Krupa
- Faculty of Chemistry, University of Gdańsk, Wita Stwosza 63, 80-308 Gdańsk, Poland
| | | | - Adam Liwo
- Faculty of Chemistry, University of Gdańsk, Wita Stwosza 63, 80-308 Gdańsk, Poland
- Center for In Silico Protein Structure and School of Computational Sciences, Korea Institute for Advanced Study, 85 Hoegiro, Dongdaemun-gu, Seoul 130-722, Republic of Korea
| | - Jooyoung Lee
- Center for In Silico Protein Structure and School of Computational Sciences, Korea Institute for Advanced Study, 85 Hoegiro, Dongdaemun-gu, Seoul 130-722, Republic of Korea
| | - Keehyoung Joo
- Center for Advanced Computation, Korea Institute for Advanced Study, 85 Hoegiro, Dongdaemun-gu, Seoul 130-722, Republic of Korea
| | - Cezary Czaplewski
- Faculty of Chemistry, University of Gdańsk, Wita Stwosza 63, 80-308 Gdańsk, Poland
| |
|
22
|
Skolnick J, Zhou H. Why Is There a Glass Ceiling for Threading Based Protein Structure Prediction Methods? J Phys Chem B 2016; 121:3546-3554. [PMID: 27748116 DOI: 10.1021/acs.jpcb.6b09517] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.3] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/29/2022]
Abstract
Despite their different implementations, comparison of the best threading approaches to the prediction of evolutionarily distant protein structures reveals that they tend to succeed or fail on the same protein targets. This is true despite the fact that the structural template library has good templates for all cases. Thus, a key question is why certain protein structures are threadable while others are not. Comparison with threading results on a set of artificial sequences selected for stability further argues that the failure of threading is due to the nature of the protein structures themselves. Using a new contact map based alignment algorithm, we demonstrate that certain folds are highly degenerate in that they can have very similar coarse grained fractions of native contacts aligned and yet differ significantly from the native structure. For threadable proteins, this is not the case. Thus, contemporary threading approaches appear to have reached a plateau, and new approaches to structure prediction are required.
Affiliation(s)
- Jeffrey Skolnick
- Center for the Study of Systems Biology, School of Biological Sciences, Georgia Institute of Technology, 950 Atlantic Drive Northwest, Atlanta, Georgia 30318, United States
| | - Hongyi Zhou
- Center for the Study of Systems Biology, School of Biological Sciences, Georgia Institute of Technology, 950 Atlantic Drive Northwest, Atlanta, Georgia 30318, United States
| |
|
23
|
AUC-Maximized Deep Convolutional Neural Fields for Protein Sequence Labeling. MACHINE LEARNING AND KNOWLEDGE DISCOVERY IN DATABASES : EUROPEAN CONFERENCE, ECML PKDD ... : PROCEEDINGS. ECML PKDD (CONFERENCE) 2016; 9852:1-16. [PMID: 28884168 PMCID: PMC5584645 DOI: 10.1007/978-3-319-46227-1_1] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.6] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 02/01/2023]
Abstract
Deep Convolutional Neural Networks (DCNN) have shown excellent performance in a variety of machine learning tasks. This paper presents Deep Convolutional Neural Fields (DeepCNF), an integration of DCNN with Conditional Random Field (CRF), for sequence labeling with an imbalanced label distribution. The widely-used training methods, such as maximum-likelihood and maximum labelwise accuracy, do not work well on imbalanced data. To handle this, we present a new training algorithm called maximum-AUC for DeepCNF. That is, we train DeepCNF by directly maximizing the empirical Area Under the ROC Curve (AUC), which is an unbiased measurement for imbalanced data. To achieve this, we formulate AUC in a pairwise ranking framework, approximate it by a polynomial function and then apply a gradient-based procedure to optimize it. Our experimental results confirm that maximum-AUC greatly outperforms the other two training methods on 8-state secondary structure prediction and disorder prediction, since their label distributions are highly imbalanced, and performs similarly to the other two training methods on solvent accessibility prediction, which has three equally-distributed labels. Furthermore, our experimental results show that our AUC-trained DeepCNF models greatly outperform existing popular predictors of these three tasks. The data and software related to this paper are available at https://github.com/realbigws/DeepCNF_AUC.
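The pairwise ranking view of AUC mentioned above can be sketched for the binary case as follows: the empirical AUC is the fraction of (positive, negative) pairs in which the positive example receives the higher score, with ties counted as one half. This shows only the exact, non-differentiable quantity; the polynomial approximation and gradient-based training described in the abstract are not reproduced, and `pairwise_auc` is a hypothetical name.

```python
def pairwise_auc(scores, labels):
    """Empirical AUC as the fraction of correctly ranked (pos, neg) pairs.

    scores: predicted scores, higher means more likely positive.
    labels: 1 for positive examples, 0 for negative examples.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative")
    wins = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5  # ties count as half a correct ranking
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    print(pairwise_auc([0.9, 0.8, 0.3, 0.1], [1, 0, 1, 0]))
```

Because the count depends only on the relative ordering within pairs, the value does not change with the ratio of positives to negatives in the way plain accuracy does, which is why it suits imbalanced labels.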
|
24
|
Seo JH, Kim HH, Jeon EY, Song YH, Shin CS, Park JB. Engineering of Baeyer-Villiger monooxygenase-based Escherichia coli biocatalyst for large scale biotransformation of ricinoleic acid into (Z)-11-(heptanoyloxy)undec-9-enoic acid. Sci Rep 2016; 6:28223. [PMID: 27311560 PMCID: PMC4911592 DOI: 10.1038/srep28223] [Citation(s) in RCA: 35] [Impact Index Per Article: 4.4] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/15/2015] [Accepted: 06/01/2016] [Indexed: 01/25/2023] Open
Abstract
Baeyer-Villiger monooxygenases (BVMOs) are able to catalyze regiospecific Baeyer-Villiger oxygenation of a variety of cyclic and linear ketones to generate the corresponding lactones and esters, respectively. However, the enzymes are usually difficult to express in a functional form in microbial cells and are rather unstable under process conditions, hindering their large-scale applications. Therefore, we investigated engineering of the BVMO from Pseudomonas putida KT2440 and the gene expression system to improve its activity and stability for large-scale biotransformation of ricinoleic acid (1) into the ester (i.e., (Z)-11-(heptanoyloxy)undec-9-enoic acid) (3), which can be hydrolyzed into 11-hydroxyundec-9-enoic acid (5) (i.e., a precursor of polyamide-11) and n-heptanoic acid (4). The polyionic tag-based fusion engineering of the BVMO and the use of a synthetic promoter for constitutive enzyme expression allowed the recombinant Escherichia coli expressing the BVMO and the secondary alcohol dehydrogenase of Micrococcus luteus to produce the ester (3) to 85 mM (26.6 g/L) within 5 h. The 5 L scale biotransformation process was then successfully scaled up to a 70 L bioreactor; 3 was produced to over 70 mM (21.9 g/L) in the culture medium 6 h after biotransformation. This study demonstrated that the BVMO-based whole-cell reactions can be applied for large-scale biotransformations.
Affiliation(s)
- Joo-Hyun Seo
- Department of Food Science and Engineering, Ewha Womans University, Seoul 120-750, Republic of Korea
| | - Hwan-Hee Kim
- Department of Food Science and Engineering, Ewha Womans University, Seoul 120-750, Republic of Korea
| | - Eun-Yeong Jeon
- Department of Food Science and Engineering, Ewha Womans University, Seoul 120-750, Republic of Korea
| | - Young-Ha Song
- AP Technology, Suwon, Kyunggi 443-702, Republic of Korea
| | - Chul-Soo Shin
- AP Technology, Suwon, Kyunggi 443-702, Republic of Korea
| | - Jin-Byung Park
- Department of Food Science and Engineering, Ewha Womans University, Seoul 120-750, Republic of Korea
| |
|
25
|
Modi V, Xu Q, Adhikari S, Dunbrack RL. Assessment of template-based modeling of protein structure in CASP11. Proteins 2016; 84 Suppl 1:200-20. [PMID: 27081927 DOI: 10.1002/prot.25049] [Citation(s) in RCA: 23] [Impact Index Per Article: 2.9] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/24/2016] [Revised: 04/04/2016] [Accepted: 04/11/2016] [Indexed: 12/27/2022]
Abstract
We present the assessment of predictions submitted in the template-based modeling (TBM) category of CASP11 (Critical Assessment of Protein Structure Prediction). Model quality was judged on the basis of global and local measures of accuracy on all atoms including side chains. The top groups on 39 human-server targets based on model 1 predictions were LEER, Zhang, LEE, MULTICOM, and Zhang-Server. The top groups on 81 targets by server groups based on model 1 predictions were Zhang-Server, nns, BAKER-ROSETTASERVER, QUARK, and myprotein-me. In CASP11, the best models for most targets were equal to or better than the best template available in the Protein Data Bank, even for targets with poor templates. The overall performance in CASP11 is similar to the performance of predictors in CASP10 with slightly better performance on the hardest targets. For most targets, assessment measures exhibited bimodal probability density distributions. Multi-dimensional scaling of an RMSD matrix for each target typically revealed a single cluster with models similar to the target structure, with a mode in the GDT-TS density between 40 and 90, and a wide distribution of models highly divergent from each other and from the experimental structure, with density mode at a GDT-TS value of ∼20. The models in this peak in the density were either compact models with entirely the wrong fold, or highly non-compact models. The results argue for a density-driven approach in future CASP TBM assessments that accounts for the bimodal nature of these distributions instead of Z scores, which assume a unimodal, Gaussian distribution.
Affiliation(s)
- Vivek Modi
- Fox Chase Cancer Center, Institute for Cancer Research, Philadelphia, Pennsylvania, 19111
| | - Qifang Xu
- Fox Chase Cancer Center, Institute for Cancer Research, Philadelphia, Pennsylvania, 19111
| | - Sam Adhikari
- Fox Chase Cancer Center, Institute for Cancer Research, Philadelphia, Pennsylvania, 19111
| | - Roland L Dunbrack
- Fox Chase Cancer Center, Institute for Cancer Research, Philadelphia, Pennsylvania, 19111
| |
|