51
|
Green biomanufacturing promoted by automatic retrobiosynthesis planning and computational enzyme design. Chin J Chem Eng 2022. [DOI: 10.1016/j.cjche.2021.08.017] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/18/2022]
|
52
|
Gao J, Zheng S, Yao M, Wu P. Precise estimation of residue relative solvent accessible area from Cα atom distance matrix using a deep learning method. Bioinformatics 2021; 38:94-98. [PMID: 34450651 DOI: 10.1093/bioinformatics/btab616] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 04/29/2021] [Revised: 08/12/2021] [Accepted: 08/24/2021] [Indexed: 02/03/2023] Open
Abstract
MOTIVATION The solvent accessible surface is an essential structural property measure related to protein structure and protein function. Relative solvent accessible area (RSA) is a standard measure to describe the degree of residue exposure on the protein surface or inside the protein. However, this computation will fail when residue information is missing. RESULTS In this article, we propose a novel method for estimating RSA using the Cα atom distance matrix with a deep learning method (EAGERER). The new method, EAGERER, achieves Pearson correlation coefficients of 0.921-0.928 on two independent test datasets. We empirically demonstrate that EAGERER can yield better Pearson correlation coefficients than existing RSA estimators, such as coordination number, half sphere exposure and SphereCon. To the best of our knowledge, EAGERER represents the first method to estimate the solvent accessible area using limited information with a deep learning model. It could be useful for protein structure and protein function prediction. AVAILABILITY AND IMPLEMENTATION The method is freely available at https://github.com/cliffgao/EAGERER. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
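As an illustration of the two quantities this abstract relies on, the sketch below builds a Cα distance matrix from coordinates and computes a Pearson correlation coefficient between predicted and reference RSA values. It is a minimal example under stated assumptions, not the EAGERER implementation: the function names, the toy coordinates, and the RSA values are all hypothetical.

```python
import math

def ca_distance_matrix(coords):
    """Pairwise Euclidean distances between C-alpha coordinates (x, y, z)."""
    return [[math.dist(a, b) for b in coords] for a in coords]

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

# Hypothetical toy data: three C-alpha atoms and four RSA value pairs.
ca_coords = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (3.8, 3.8, 0.0)]
dmat = ca_distance_matrix(ca_coords)

predicted_rsa = [0.10, 0.45, 0.80, 0.30]
observed_rsa = [0.15, 0.40, 0.85, 0.25]
r = pearson(predicted_rsa, observed_rsa)
```

The distance matrix is symmetric with a zero diagonal, so a model consuming it only needs the upper triangle; how EAGERER actually encodes the matrix is not stated in the abstract.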
Affiliation(s)
- Jianzhao Gao
- School of Mathematical Sciences and LPMC, Nankai University, Tianjin 300071, China
| | - Shuangjia Zheng
- School of Computer Science and Engineering, Sun Yat-Sen University, Guangzhou 510006, China
| | - Mengting Yao
- School of Mathematical Sciences and LPMC, Nankai University, Tianjin 300071, China
| | - Peikun Wu
- School of Mathematical Sciences and LPMC, Nankai University, Tianjin 300071, China
| |
|
53
|
Decoding the link of microbiome niches with homologous sequences enables accurately targeted protein structure prediction. Proc Natl Acad Sci U S A 2021; 118:2110828118. [PMID: 34873061 DOI: 10.1073/pnas.2110828118] [Citation(s) in RCA: 11] [Impact Index Per Article: 3.7] [Accepted: 10/27/2021] [Indexed: 12/26/2022] Open
Abstract
Information derived from metagenome sequences through deep-learning techniques has significantly improved the accuracy of template free protein structure modeling. However, most of the deep learning-based modeling studies are based on blind sequence database searches and suffer from low efficiency in computational resource utilization and model construction, especially when the sequence library becomes prohibitively large. We proposed a MetaSource model built on 4.25 billion microbiome sequences from four major biomes (Gut, Lake, Soil, and Fermentor) to decode the inherent linkage of microbial niches with protein homologous families. Large-scale protein family folding experiments on 8,700 unknown Pfam families showed that a microbiome targeted approach with multiple sequence alignment constructed from individual MetaSource biomes requires more than threefold less computer memory and CPU (central processing unit) time but generates contact-map and three-dimensional structure models with a significantly higher accuracy, compared with that using combined metagenome datasets. These results demonstrate an avenue to bridge the gap between the rapidly increasing metagenome databases and the limited computing resources for efficient genome-wide database mining, which provides a useful bluebook to guide future microbiome sequence database and modeling development for high-accuracy protein structure and function prediction.
|
54
|
Ovchinnikov S, Huang PS. Structure-based protein design with deep learning. Curr Opin Chem Biol 2021; 65:136-144. [PMID: 34547592 PMCID: PMC8671290 DOI: 10.1016/j.cbpa.2021.08.004] [Citation(s) in RCA: 31] [Impact Index Per Article: 10.3] [Received: 06/07/2021] [Accepted: 08/13/2021] [Indexed: 12/11/2022]
Abstract
Since the first revelation of proteins functioning as macromolecular machines through their three dimensional structures, researchers have been intrigued by the marvelous ways the biochemical processes are carried out by proteins. The aspiration to understand protein structures has fueled extensive efforts across different scientific disciplines. In recent years, it has been demonstrated that proteins with new functionality or shapes can be designed via structure-based modeling methods, and the design strategies have combined all available information - but largely piece-by-piece - from sequence derived statistics to the detailed atomic-level modeling of chemical interactions. Despite the significant progress, incorporating data-derived approaches through the use of deep learning methods can be a game changer. In this review, we summarize current progress, compare the arc of developing the deep learning approaches with the conventional methods, and describe the motivation and concepts behind current strategies that may lead to potential future opportunities.
Affiliation(s)
- Sergey Ovchinnikov
- John Harvard Distinguished Science Fellowship Program, Harvard University, Cambridge, MA, 02138, USA.
| | - Po-Ssu Huang
- Department of Bioengineering, Stanford University, Stanford, CA, 94305, USA.
| |
|
55
|
Timmons PB, Hewage CM. APPTEST is a novel protocol for the automatic prediction of peptide tertiary structures. Brief Bioinform 2021; 22:bbab308. [PMID: 34396417 PMCID: PMC8575040 DOI: 10.1093/bib/bbab308] [Citation(s) in RCA: 19] [Impact Index Per Article: 6.3] [Received: 06/01/2021] [Revised: 07/05/2021] [Accepted: 07/16/2021] [Indexed: 01/29/2023] Open
Abstract
Good knowledge of a peptide's tertiary structure is important for understanding its function and its interactions with its biological targets. APPTEST is a novel computational protocol that employs a neural network architecture and simulated annealing methods for the prediction of peptide tertiary structure from the primary sequence. APPTEST works for both linear and cyclic peptides of 5-40 natural amino acids. APPTEST is computationally efficient, returning predicted structures within a number of minutes. APPTEST performance was evaluated on a set of 356 test peptides; the best structure predicted for each peptide deviated by an average of 1.9 Å from its experimentally determined backbone conformation, and a native or near-native structure was predicted for 97% of the target sequences. A comparison of APPTEST performance with PEP-FOLD, PEPstrMOD and PepLook across benchmark datasets of short, long and cyclic peptides shows that on average APPTEST produces structures more native than the existing methods in all three categories. This innovative, cutting-edge peptide structure prediction method is available as an online web server at https://research.timmons.eu/apptest, facilitating in silico study and design of peptides by the wider research community.
Affiliation(s)
- Patrick Brendan Timmons
- UCD School of Biomolecular and Biomedical Science, UCD Centre for Synthesis and Chemical Biology, UCD Conway Institute, University College Dublin, Dublin 4, Ireland
| | - Chandralal M Hewage
- UCD School of Biomolecular and Biomedical Science, UCD Centre for Synthesis and Chemical Biology, UCD Conway Institute, University College Dublin, Dublin 4, Ireland
| |
|
56
|
Nguyen TT, Marzolf DR, Seffernick JT, Heinze S, Lindert S. Protein structure prediction using residue-resolved protection factors from hydrogen-deuterium exchange NMR. Structure 2021; 30:313-320.e3. [PMID: 34739840 DOI: 10.1016/j.str.2021.10.006] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Received: 05/10/2021] [Revised: 08/04/2021] [Accepted: 10/15/2021] [Indexed: 11/17/2022]
Abstract
Hydrogen-deuterium exchange (HDX) measured by nuclear magnetic resonance (NMR) provides structural information for proteins relating to solvent accessibility and flexibility. While this structural information is beneficial, the data cannot be used exclusively to elucidate structures. However, the structural information provided by the HDX-NMR data can be supplemented by computational methods. In previous work, we developed an algorithm in Rosetta to predict structures using qualitative HDX-NMR data (categories of exchange rate). Here we expand on the effort, and utilize quantitative protection factors (PFs) from HDX-NMR for structure prediction. From observed correlations between PFs and solvent accessibility/flexibility measures, we present a scoring function to quantify the agreement with HDX data. Using a benchmark set of 10 proteins, an average improvement of 5.13 Å in root-mean-square deviation (RMSD) is observed for cases of inaccurate Rosetta predictions. Ultimately, seven out of 10 predictions are accurate without including HDX data, and nine out of 10 are accurate when using our PF-based HDX score.
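The root-mean-square deviation (RMSD) quoted in this abstract can be sketched as follows. This is a minimal example that assumes the two coordinate sets are already superimposed and matched atom for atom; structure-comparison tools normally perform an optimal superposition first, which is omitted here. The function name and the toy coordinates are hypothetical.

```python
import math

def rmsd(coords_a, coords_b):
    """RMSD between two equal-length lists of (x, y, z) coordinates.

    Assumes the structures are already superimposed; no alignment is done.
    """
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("coordinate lists must be non-empty and of equal length")
    squared = sum(math.dist(a, b) ** 2 for a, b in zip(coords_a, coords_b))
    return math.sqrt(squared / len(coords_a))

# Hypothetical toy example: a model displaced by 3 Å along z from the reference.
reference = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
model = [(0.0, 0.0, 3.0), (1.0, 0.0, 3.0)]
deviation = rmsd(reference, model)
```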
Affiliation(s)
- Tung T Nguyen
- Department of Chemistry and Biochemistry, Denison University, Granville, OH 43023, USA
| | - Daniel R Marzolf
- Department of Chemistry and Biochemistry, Ohio State University, 2114 Newman & Wolfrom Laboratory, 100 W. 18th Avenue, Columbus, OH 43210, USA
| | - Justin T Seffernick
- Department of Chemistry and Biochemistry, Ohio State University, 2114 Newman & Wolfrom Laboratory, 100 W. 18th Avenue, Columbus, OH 43210, USA
| | - Sten Heinze
- Department of Chemistry and Biochemistry, Ohio State University, 2114 Newman & Wolfrom Laboratory, 100 W. 18th Avenue, Columbus, OH 43210, USA
| | - Steffen Lindert
- Department of Chemistry and Biochemistry, Ohio State University, 2114 Newman & Wolfrom Laboratory, 100 W. 18th Avenue, Columbus, OH 43210, USA.
| |
|
57
|
Mortuza SM, Zheng W, Zhang C, Li Y, Pearce R, Zhang Y. Improving fragment-based ab initio protein structure assembly using low-accuracy contact-map predictions. Nat Commun 2021; 12:5011. [PMID: 34408149 PMCID: PMC8373938 DOI: 10.1038/s41467-021-25316-w] [Citation(s) in RCA: 40] [Impact Index Per Article: 13.3] [Received: 01/07/2021] [Accepted: 08/04/2021] [Indexed: 11/28/2022] Open
Abstract
Sequence-based contact prediction has shown considerable promise in assisting non-homologous structure modeling, but it often requires many homologous sequences and a sufficient number of correct contacts to achieve correct folds. Here, we developed a method, C-QUARK, that integrates multiple deep-learning and coevolution-based contact-maps to guide the replica-exchange Monte Carlo fragment assembly simulations. The method was tested on 247 non-redundant proteins, where C-QUARK could fold 75% of the cases with TM-scores (template-modeling scores) ≥0.5, which was 2.6 times more than that achieved by QUARK. For the 59 cases that had either low contact accuracy or few homologous sequences, C-QUARK correctly folded 6 times more proteins than other contact-based folding methods. C-QUARK was also tested on 64 free-modeling targets from the 13th CASP (critical assessment of protein structure prediction) experiment and had an average GDT_TS (global distance test) score that was 5% higher than the best CASP predictors. These data demonstrate, in a robust manner, the progress in modeling non-homologous protein structures using low-accuracy and sparse contact-map predictions.
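For readers unfamiliar with the TM-score threshold used in this abstract, the sketch below evaluates the standard TM-score formula, TM = (1/L) Σ 1/(1 + (d_i/d0)²) with d0 = 1.24·(L − 15)^(1/3) − 1.8, for one fixed superposition. It is a simplified illustration: the full TM-score is the maximum of this sum over superpositions, which is not searched here, and the 0.5 Å floor on d0 for short chains is an assumption of this sketch. The function names are hypothetical.

```python
def tm_d0(target_length, floor=0.5):
    """Length-dependent distance scale d0 (in Å); the floor is an assumption."""
    if target_length <= 15:
        return floor
    return max(1.24 * (target_length - 15) ** (1.0 / 3.0) - 1.8, floor)

def tm_score(distances, target_length):
    """TM-score for one fixed superposition.

    distances: per-residue distances (Å) between aligned residue pairs.
    target_length: number of residues in the target (normalization length).
    """
    d0 = tm_d0(target_length)
    return sum(1.0 / (1.0 + (d / d0) ** 2) for d in distances) / target_length

# Hypothetical toy case: 100 aligned residues, each exactly d0 apart,
# so every term contributes 1/2.
length = 100
half_score = tm_score([tm_d0(length)] * length, length)
```

A score of 0.5 or above is the cutoff the abstract uses to count a protein as correctly folded.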
Affiliation(s)
- S M Mortuza
- Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, MI, USA
| | - Wei Zheng
- Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, MI, USA
| | - Chengxin Zhang
- Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, MI, USA
| | - Yang Li
- Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, MI, USA
| | - Robin Pearce
- Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, MI, USA
| | - Yang Zhang
- Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, MI, USA.
- Department of Biological Chemistry, University of Michigan, Ann Arbor, MI, USA.
| |
|
58
|
Sabban SS. Computationally grafting an IgE epitope onto a scaffold: Implications for a pan anti-allergy vaccine design. Comput Struct Biotechnol J 2021; 19:4738-4750. [PMID: 34504666 PMCID: PMC8403545 DOI: 10.1016/j.csbj.2021.08.012] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/16/2021] [Revised: 08/04/2021] [Accepted: 08/08/2021] [Indexed: 12/02/2022] Open
Abstract
Allergy is becoming an intensifying disease among the world population, particularly in the developed world. Once allergy develops, sufferers are permanently trapped in a hyper-immune response that makes them sensitive to innocuous substances. The immune pathway concerned with developing allergy is the Th2 immune pathway where the IgE antibody binds to its FcεRI receptor on mast and basophil cells. This paper discusses a protocol that could disrupt the binding between the antibody and its receptor for a potential permanent treatment. Ten proteins were computationally designed to display a human IgE motif very close in proximity to the IgE antibody's FcεRI receptor's binding site in an effort for these proteins to be used as a vaccine against our own IgE antibody. The motif of interest was the FG loop motif and it was excised and grafted onto a Staphylococcus aureus protein (PDB ID 1YN3), then the motif + scaffold structure had its sequence re-designed around the motif to find an amino acid sequence that would fold to the designed structure correctly. These ten computationally designed proteins showed successful folding when simulated using Rosetta's AbinitioRelax folding simulation and the IgE epitope was clearly displayed in its native three-dimensional structure in all of them. These designed proteins have the potential to be used as a pan anti-allergy vaccine. This work employed in silico-based methods for designing the proteins and did not include any experimental verifications.
Affiliation(s)
- Sari S. Sabban
- King Abdulaziz University, Faculty of Science, Department of Biological Sciences, Jeddah, Saudi Arabia
| |
|
59
|
Lindorff-Larsen K, Kragelund BB. On the potential of machine learning to examine the relationship between sequence, structure, dynamics and function of intrinsically disordered proteins. J Mol Biol 2021; 433:167196. [PMID: 34390736 DOI: 10.1016/j.jmb.2021.167196] [Citation(s) in RCA: 33] [Impact Index Per Article: 11.0] [Received: 06/02/2021] [Revised: 08/03/2021] [Accepted: 08/04/2021] [Indexed: 11/29/2022]
Abstract
Intrinsically disordered proteins (IDPs) constitute a broad set of proteins with few uniting and many diverging properties. IDPs, and intrinsically disordered regions (IDRs) interspersed between folded domains, are generally characterized as having no persistent tertiary structure; instead they interconvert between a large number of different and often expanded structures. IDPs and IDRs are involved in an enormously wide range of biological functions and reveal novel mechanisms of interactions, and while they defy the common structure-function paradigm of folded proteins, their structural preferences and dynamics are important for their function. We here discuss open questions in the field of IDPs and IDRs, focusing on areas where machine learning and other computational methods play a role. We discuss computational methods aimed to predict transiently formed local and long-range structure, including methods for integrative structural biology. We discuss the many different ways in which IDPs and IDRs can bind to other molecules, both via short linear motifs, as well as in the formation of larger dynamic complexes such as biomolecular condensates. We discuss how experiments are providing insight into such complexes and may enable more accurate predictions. Finally, we discuss the role of IDPs in disease and how new methods are needed to interpret the mechanistic effects of genomic variants in IDPs.
Affiliation(s)
- Kresten Lindorff-Larsen
- Structural Biology and NMR Laboratory & Linderstrøm-Lang Centre for Protein Science, Department of Biology, University of Copenhagen. Ole Maaløes Vej 5, DK-2200 Copenhagen N, Denmark.
| | - Birthe B Kragelund
- Structural Biology and NMR Laboratory & Linderstrøm-Lang Centre for Protein Science, Department of Biology, University of Copenhagen. Ole Maaløes Vej 5, DK-2200 Copenhagen N, Denmark.
| |
|
60
|
Chen TR, Juan SH, Huang YW, Lin YC, Lo WC. A secondary structure-based position-specific scoring matrix applied to the improvement in protein secondary structure prediction. PLoS One 2021; 16:e0255076. [PMID: 34320027 PMCID: PMC8318245 DOI: 10.1371/journal.pone.0255076] [Citation(s) in RCA: 7] [Impact Index Per Article: 2.3] [Received: 05/21/2020] [Accepted: 07/11/2021] [Indexed: 11/18/2022] Open
Abstract
Protein secondary structure prediction (SSP) has a variety of applications; however, there has been relatively limited improvement in accuracy for years. With a vision of moving forward all related fields, we aimed to make a fundamental advance in SSP. There have been many admirable efforts made to improve the machine learning algorithm for SSP. This work thus took a step back by manipulating the input features. A secondary structure element-based position-specific scoring matrix (SSE-PSSM) is proposed, based on which a new set of machine learning features can be established. The feasibility of this new PSSM was evaluated by rigid independent tests with training and testing datasets sharing <25% sequence identities. In all experiments, the proposed PSSM outperformed the traditional amino acid PSSM. This new PSSM can be easily combined with the amino acid PSSM, and the improvement in accuracy was remarkable. Preliminary tests made by combining the SSE-PSSM and well-known SSP methods showed 2.0% and 5.2% average improvements in three- and eight-state SSP accuracies, respectively. If this PSSM can be integrated into state-of-the-art SSP methods, the overall accuracy of SSP may break the current restriction and eventually bring benefit to all research and applications where secondary structure prediction plays a vital role during development. To facilitate the application and integration of the SSE-PSSM with modern SSP methods, we have established a web server and standalone programs for generating SSE-PSSM available at http://10.life.nctu.edu.tw/SSE-PSSM.
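To make the idea of a position-specific scoring matrix concrete, the sketch below builds a generic log-odds PSSM from a small gap-free alignment over an arbitrary alphabet, here the three secondary-structure states H, E and C. This is only an illustration of the general PSSM concept: the abstract does not describe how the SSE-PSSM is actually constructed, so the pseudocount scheme, the uniform background, the function name, and the toy alignment are all assumptions.

```python
import math
from collections import Counter

def build_pssm(aligned_seqs, alphabet, background=None, pseudocount=1.0):
    """Log-odds (base 2) scores per alignment column for each symbol.

    aligned_seqs: equal-length, gap-free strings over `alphabet`.
    background: symbol -> expected frequency (uniform if omitted).
    """
    length = len(aligned_seqs[0])
    if any(len(seq) != length for seq in aligned_seqs):
        raise ValueError("all aligned sequences must have the same length")
    if background is None:
        background = {sym: 1.0 / len(alphabet) for sym in alphabet}
    # Denominator includes one pseudocount per symbol in the alphabet.
    total = len(aligned_seqs) + pseudocount * len(alphabet)
    pssm = []
    for col in range(length):
        counts = Counter(seq[col] for seq in aligned_seqs)
        row = {}
        for sym in alphabet:
            freq = (counts[sym] + pseudocount) / total
            row[sym] = math.log2(freq / background[sym])
        pssm.append(row)
    return pssm

# Hypothetical toy alignment of secondary-structure strings.
alignment = ["HHEC", "HHEC", "HCEC"]
pssm = build_pssm(alignment, alphabet="HEC")
```

Positive scores mark symbols enriched at a position relative to background and negative scores mark depleted ones; rows of such a matrix are what a downstream predictor would take as per-position features.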
Affiliation(s)
- Teng-Ruei Chen
- Institute of Bioinformatics and Systems Biology, National Chiao Tung University, Hsinchu, Taiwan
- Institute of Bioinformatics and Systems Biology, National Yang Ming Chiao Tung University, Hsinchu, Taiwan
| | - Sheng-Hung Juan
- Institute of Bioinformatics and Systems Biology, National Chiao Tung University, Hsinchu, Taiwan
- Institute of Bioinformatics and Systems Biology, National Yang Ming Chiao Tung University, Hsinchu, Taiwan
| | - Yu-Wei Huang
- Institute of Bioinformatics and Systems Biology, National Chiao Tung University, Hsinchu, Taiwan
- Institute of Bioinformatics and Systems Biology, National Yang Ming Chiao Tung University, Hsinchu, Taiwan
| | - Yen-Cheng Lin
- Department of Biological Science and Technology, National Chiao Tung University, Hsinchu, Taiwan
- Department of Biological Science and Technology, National Yang Ming Chiao Tung University, Hsinchu, Taiwan
| | - Wei-Cheng Lo
- Institute of Bioinformatics and Systems Biology, National Chiao Tung University, Hsinchu, Taiwan
- Institute of Bioinformatics and Systems Biology, National Yang Ming Chiao Tung University, Hsinchu, Taiwan
- Department of Biological Science and Technology, National Chiao Tung University, Hsinchu, Taiwan
- Department of Biological Science and Technology, National Yang Ming Chiao Tung University, Hsinchu, Taiwan
- The Center for Bioinformatics Research, National Yang Ming Chiao Tung University, Hsinchu, Taiwan
| |
|
61
|
The influence of dataset homology and a rigorous evaluation strategy on protein secondary structure prediction. PLoS One 2021; 16:e0254555. [PMID: 34260641 PMCID: PMC8279362 DOI: 10.1371/journal.pone.0254555] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 04/16/2020] [Accepted: 06/29/2021] [Indexed: 11/28/2022] Open
Abstract
The secondary structure prediction (SSP) of proteins has long been an essential structural biology technique with various applications. Despite its vital role in many research and industrial fields, in recent years, as the accuracy of state-of-the-art secondary structure predictors approaches the theoretical upper limit, SSP has been considered no longer challenging or too challenging to make advances. With the belief that the substantial improvement of SSP will move forward many fields depending on it, we conducted this study, which focused on three issues that have not been noticed or thoroughly examined yet but may have affected the reliability of the evaluation of previous SSP algorithms. These issues are all about the sequence homology between or within the developmental and evaluation datasets. We thus designed many different homology layouts of datasets to train and evaluate SSP prediction models. Multiple repeats were performed in each experiment by random sampling. The conclusions obtained with small experimental datasets were verified with large-scale datasets using state-of-the-art SSP algorithms. Very different from the long-established assumption, we discover that the sequence homology between query datasets for training, testing, and independent tests exerts little influence on SSP accuracy. Besides, the sequence homology redundancy between or within most datasets would make the accuracy of an SSP algorithm overestimated, while the redundancy within the reference dataset for extracting predictive features would make the accuracy underestimated. Since the overestimating effects are more significant than the underestimating effect, the accuracy of some SSP methods might have been overestimated. 
Based on the discoveries, we propose a rigorous procedure for developing SSP algorithms and making reliable evaluations, hoping to bring substantial improvements to future SSP methods and benefit all research and application fields relying on accurate prediction of protein secondary structures.
|
62
|
Pearce R, Zhang Y. Toward the solution of the protein structure prediction problem. J Biol Chem 2021; 297:100870. [PMID: 34119522 PMCID: PMC8254035 DOI: 10.1016/j.jbc.2021.100870] [Citation(s) in RCA: 60] [Impact Index Per Article: 20.0] [Received: 04/20/2021] [Revised: 06/07/2021] [Accepted: 06/09/2021] [Indexed: 11/20/2022] Open
Abstract
Since Anfinsen demonstrated that the information encoded in a protein's amino acid sequence determines its structure in 1973, solving the protein structure prediction problem has been the Holy Grail of structural biology. The goal of protein structure prediction approaches is to utilize computational modeling to determine the spatial location of every atom in a protein molecule starting from only its amino acid sequence. Depending on whether homologous structures can be found in the Protein Data Bank (PDB), structure prediction methods have been historically categorized as template-based modeling (TBM) or template-free modeling (FM) approaches. Until recently, TBM has been the most reliable approach to predicting protein structures, and in the absence of reliable templates, the modeling accuracy sharply declines. Nevertheless, the results of the most recent community-wide assessment of protein structure prediction experiment (CASP14) have demonstrated that the protein structure prediction problem can be largely solved through the use of end-to-end deep machine learning techniques, where correct folds could be built for nearly all single-domain proteins without using the PDB templates. Critically, the model quality exhibited little correlation with the quality of available template structures, as well as the number of sequence homologs detected for a given target protein. Thus, the implementation of deep-learning techniques has essentially broken through the 50-year-old modeling border between TBM and FM approaches and has made the success of high-resolution structure prediction significantly less dependent on template availability in the PDB library.
Affiliation(s)
- Robin Pearce
- Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, Michigan, USA
| | - Yang Zhang
- Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, Michigan, USA; Department of Biological Chemistry, University of Michigan, Ann Arbor, Michigan, USA.
| |
|
63
|
Liu S, Wang T, Xu Q, Shao B, Yin J, Liu TY. Complementing sequence-derived features with structural information extracted from fragment libraries for protein structure prediction. BMC Bioinformatics 2021; 22:351. [PMID: 34182922 PMCID: PMC8240311 DOI: 10.1186/s12859-021-04258-6] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/01/2021] [Accepted: 06/10/2021] [Indexed: 11/10/2022] Open
Abstract
BACKGROUND Fragment libraries play a key role in fragment-assembly based protein structure prediction, where protein fragments are assembled to form a complete three-dimensional structure. Rich and accurate structural information embedded in fragment libraries has not been systematically extracted and used beyond fragment assembly. METHODS To better leverage the valuable structural information for protein structure prediction, we extracted seven types of structural information from fragment libraries. We broadened the usage of such structural information by transforming fragment libraries into protein-specific potentials for gradient-descent based protein folding and encoding fragment libraries as structural features for protein property prediction. RESULTS Fragment libraries improved the accuracy of protein folding and outperformed state-of-the-art algorithms with respect to predicted properties, such as torsion angles and inter-residue distances. CONCLUSION Our work implies that the rich structural information extracted from fragment libraries can complement sequence-derived features to help protein structure prediction.
Affiliation(s)
- Siyuan Liu
- School of Data and Computer Science, Sun Yat-Sen University, Guangzhou, China
- Guangdong Key Laboratory of Big Data Analysis and Processing, Guangzhou, China
- Microsoft Research Asia, Beijing, China
| | - Tong Wang
- Microsoft Research Asia, Beijing, China.
| | | | - Bin Shao
- Microsoft Research Asia, Beijing, China
| | - Jian Yin
- School of Data and Computer Science, Sun Yat-Sen University, Guangzhou, China
- Guangdong Key Laboratory of Big Data Analysis and Processing, Guangzhou, China
| | | |
|
64
|
Koga N, Koga R, Liu G, Castellanos J, Montelione GT, Baker D. Role of backbone strain in de novo design of complex α/β protein structures. Nat Commun 2021; 12:3921. [PMID: 34168113 PMCID: PMC8225619 DOI: 10.1038/s41467-021-24050-7] [Citation(s) in RCA: 11] [Impact Index Per Article: 3.7] [Received: 11/03/2020] [Accepted: 05/28/2021] [Indexed: 12/24/2022] Open
Abstract
We previously elucidated principles for designing ideal proteins with completely consistent local and non-local interactions which have enabled the design of a wide range of new αβ-proteins with four or fewer β-strands. The principles relate local backbone structures to supersecondary-structure packing arrangements of α-helices and β-strands. Here, we test the generality of the principles by employing them to design larger proteins with five- and six- stranded β-sheets flanked by α-helices. The initial designs were monomeric in solution with high thermal stability, and the nuclear magnetic resonance (NMR) structure of one was close to the design model, but for two others the order of strands in the β-sheet was swapped. Investigation into the origins of this strand swapping suggested that the global structures of the design models were more strained than the NMR structures. We incorporated explicit consideration of global backbone strain into the design methodology, and succeeded in designing proteins with the intended unswapped strand arrangements. These results illustrate the value of experimental structure determination in guiding improvement of de novo design, and the importance of consistency between local, supersecondary, and global tertiary interactions in determining protein topology. The augmented set of principles should inform the design of larger functional proteins.
Affiliation(s)
- Nobuyasu Koga
- University of Washington, Department of Biochemistry and Howard Hughes Medical Institute, Seattle, Washington, WA, USA; Research Center of Integrative Molecular Systems, Institute for Molecular Science, National Institutes of Natural Sciences, Okazaki, Aichi, Japan; Protein Design Group, Exploratory Research Center on Life and Living Systems (ExCELLS), National Institutes of Natural Sciences, Okazaki, Aichi, Japan; SOKENDAI, The Graduate University for Advanced Studies, Hayama, Kanagawa, Japan.
| | - Rie Koga
- University of Washington, Department of Biochemistry and Howard Hughes Medical Institute, Seattle, Washington, WA, USA; Protein Design Group, Exploratory Research Center on Life and Living Systems (ExCELLS), National Institutes of Natural Sciences, Okazaki, Aichi, Japan
| | - Gaohua Liu
- Nexomics Biosciences, Rocky Hill, NJ, USA
| | - Javier Castellanos
- University of Washington, Department of Biochemistry and Howard Hughes Medical Institute, Seattle, Washington, WA, USA
| | - Gaetano T Montelione
- Department of Chemistry and Chemical Biology, and Center for Biotechnology and Interdisciplinary Sciences, Rensselaer Polytechnic Institute, Troy, New York, NY, USA.
| | - David Baker
- University of Washington, Department of Biochemistry and Howard Hughes Medical Institute, Seattle, Washington, WA, USA.
| |
|
65
|
Osakabe K, Wada N, Murakami E, Miyashita N, Osakabe Y. Genome editing in mammalian cells using the CRISPR type I-D nuclease. Nucleic Acids Res 2021; 49:6347-6363. [PMID: 34076237 PMCID: PMC8216271 DOI: 10.1093/nar/gkab348] [Citation(s) in RCA: 22] [Impact Index Per Article: 7.3] [Received: 08/28/2020] [Revised: 04/15/2021] [Accepted: 05/20/2021] [Indexed: 12/26/2022] Open
Abstract
Adoption of CRISPR-Cas systems, such as CRISPR-Cas9 and CRISPR-Cas12a, has revolutionized genome engineering in recent years; however, application of genome editing with CRISPR type I, the most abundant CRISPR system in bacteria, remains less developed. Type I systems, such as type I-E and I-F, comprise the CRISPR-associated complex for antiviral defense ('Cascade': Cas5, Cas6, Cas7, Cas8 and the small subunit) and Cas3, which degrades the target DNA; in contrast, for the sub-type CRISPR-Cas type I-D, which lacks a typical Cas3 nuclease in its CRISPR locus, the mechanism of target DNA degradation remains unknown. Here, we found that Cas10d is a functional nuclease in the type I-D system, performing the role played by Cas3 in other CRISPR-Cas type I systems. The type I-D system can be used for targeted mutagenesis of genomic DNA in human cells, directing both bi-directional long-range deletions and short insertions/deletions. Our findings suggest the CRISPR-Cas type I-D system as a unique effector pathway in CRISPR that can be repurposed for genome engineering in eukaryotic cells.
Affiliation(s)
- Keishi Osakabe
- Graduate School of Technology, Industrial and Social Sciences, Tokushima University, Tokushima, Tokushima 770-8503, Japan
| | - Naoki Wada
- Graduate School of Technology, Industrial and Social Sciences, Tokushima University, Tokushima, Tokushima 770-8503, Japan
| | - Emi Murakami
- Graduate School of Technology, Industrial and Social Sciences, Tokushima University, Tokushima, Tokushima 770-8503, Japan
| | - Naoyuki Miyashita
- Department of Computational Systems Biology, Faculty of Biology-Oriented Science and Technology, Kindai University, Kinokawa, Wakayama 649-6493, Japan
| | - Yuriko Osakabe
- Graduate School of Technology, Industrial and Social Sciences, Tokushima University, Tokushima, Tokushima 770-8503, Japan
- School of Life Science and Technology, Tokyo Institute of Technology, Yokohama, Kanagawa 226-8502, Japan
| |
|
66
|
Zhu M, Wang DD, Yan H. Genotype-determined EGFR-RTK heterodimerization and its effects on drug resistance in lung Cancer treatment revealed by molecular dynamics simulations. BMC Mol Cell Biol 2021; 22:34. [PMID: 34112110 PMCID: PMC8191231 DOI: 10.1186/s12860-021-00358-6] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 10/07/2019] [Accepted: 03/10/2021] [Indexed: 01/08/2023] Open
Abstract
BACKGROUND Epidermal growth factor receptor (EGFR) and its signaling pathways play a vital role in the pathogenesis of lung cancer. By disturbing EGFR signaling, mutations of EGFR may lead to progression of cancer or the emergence of resistance to EGFR-targeted drugs. RESULTS We investigated the correlation between EGFR mutations and EGFR-receptor tyrosine kinase (RTK) crosstalk in the signaling network, in order to uncover the drug resistance mechanism induced by EGFR mutations. For several EGFR wild type (WT) or mutated proteins, we measured the EGFR-RTK interactions using several computational methods based on molecular dynamics (MD) simulations, including geometrical characterization of the interfaces and conventional estimation of free energy of binding. Geometrical properties, namely the matching rate of atomic solid angles in the interfaces and center-of-mass distances between interacting atoms, were extracted using Alpha Shape modeling. For three RTK partners (c-Met, ErbB2 and IGF-1R), the results showed looser EGFR-RTK crosstalk for the drug-sensitive EGFR mutant and tighter crosstalk for the drug-resistant mutant. This supports genotype-determined EGFR-RTK crosstalk and suggests a potential drug resistance mechanism in which EGFR mutations amplify EGFR-RTK crosstalk. CONCLUSIONS This study will lead to a deeper understanding of EGFR mutation-induced drug resistance mechanisms and promote the design of innovative drugs.
Affiliation(s)
- Mengxu Zhu
- Department of Electrical Engineering, City University of Hong Kong, Kowloon, Hong Kong.
| | - Debby D Wang
- Department of Electrical Engineering, City University of Hong Kong, Kowloon, Hong Kong
| | - Hong Yan
- Department of Electrical Engineering, City University of Hong Kong, Kowloon, Hong Kong
| |
|
67
|
Basu S, Chakravarty D, Bhattacharyya D, Saha P, Patra HK. Plausible blockers of Spike RBD in SARS-CoV2-molecular design and underlying interaction dynamics from high-level structural descriptors. J Mol Model 2021; 27:191. [PMID: 34057647 PMCID: PMC8165686 DOI: 10.1007/s00894-021-04779-0] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Received: 09/11/2020] [Accepted: 04/26/2021] [Indexed: 12/24/2022]
Abstract
COVID-19 is characterized by an unprecedented abrupt increase in the viral transmission rate (SARS-CoV-2) relative to its pandemic evolutionary ancestor, SARS-CoV (2003). The complex molecular cascade of events related to the viral pathogenicity is triggered by the Spike protein upon interacting with the ACE2 receptor on human lung cells through its receptor binding domain (RBDSpike). One potential therapeutic strategy to combat COVID-19 could thus be limiting the infection by blocking this key interaction. In this study, we adopt a protein design approach to predict and propose non-virulent structural mimics of the RBDSpike which can potentially serve as its competitive inhibitors in binding to ACE2. The RBDSpike is an independently foldable protein domain, resilient to conformational changes upon mutations and therefore an attractive target for strategic re-design. Interestingly, in spite of displaying an optimal shape fit between their interacting surfaces (attributed to a consequently high mutual affinity), the RBDSpike–ACE2 interaction appears to have a quasi-stable character due to a poor electrostatic match at their interface. Structural analyses of homologous protein complexes reveal that the ACE2 binding site of RBDSpike has an unusually high degree of solvent-exposed hydrophobic residues, attributed to key evolutionary changes, making it inherently “reaction-prone.” The designed mimics, aimed at blocking viral entry by occupying the available binding sites on ACE2, show signatures of stable high-affinity binding with ACE2 (cross-validated by appropriate free energy estimates), overriding the native quasi-stable feature. The results show the aptness of directly adapting natural examples in rational protein design, wherein homology-based threading coupled with strategic “hydrophobic ↔ polar” mutations serves as a potential breakthrough.
Supplementary Information: The online version contains supplementary material available at 10.1007/s00894-021-04779-0.
Affiliation(s)
- Sankar Basu
- Department of Microbiology, Asutosh College (affiliated to University of Calcutta), Kolkata, 700026, West Bengal, India.
| | - Devlina Chakravarty
- Department of Chemistry, University of Rutgers-Camden, Camden, 08102, NJ, USA
| | - Dhananjay Bhattacharyya
- Computational Science Division, Saha Institute of Nuclear Physics, Kolkata, 700064, West Bengal, India
| | - Pampa Saha
- Department of Neurological Surgery, University of Pittsburgh, Pittsburgh, PA, 15213, USA
| | - Hirak K Patra
- Department of Surgical Biotechnology, Division of Surgery and Interventional Science, University College London, London, NW3 2PF, UK
| |
|
68
|
Salgado MM, Manchado A, Nieto CT, Díez D, Garrido NM. Synthesis and Modeling of Ezetimibe Analogues. Molecules 2021; 26:molecules26113107. [PMID: 34067439 PMCID: PMC8196997 DOI: 10.3390/molecules26113107] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Received: 03/31/2021] [Revised: 05/18/2021] [Accepted: 05/20/2021] [Indexed: 11/16/2022] Open
Abstract
Ezetimibe is a well-known drug that lowers blood cholesterol levels by reducing its absorption in the small intestine through binding to Niemann-Pick C1-like protein (NPC1L1). A ligand-based study on ezetimibe analogues is reported, together with one-hit synthesis, highlighted in the study. A convenient asymmetric synthesis of (2S,3S)-N-α-(R)-methylbenzyl-3-methoxycarbonylethyl-4-methoxyphenyl β-lactam is described starting from Baylis-Hillman adducts. The route involves a domino process: allylic acetate rearrangement, stereoselective Ireland-Claisen rearrangement and asymmetric Michael addition, which provides a δ-amino acid derivative with full stereochemical control. A subsequent inversion of ester and acid functionality paves the way to the lactam core after monodebenzylation and lactam formation. A pharmacophore study based on ezetimibe as the main ligand in lowering blood cholesterol levels also gives interesting results, revealing which substituents on the azetidin-2-one ring are more similar to the ezetimibe skeleton and are more likely to bind to NPC1L1 than ezetimibe.
|
69
|
Wegrzyn K, Zabrocka E, Bury K, Tomiczek B, Wieczor M, Czub J, Uciechowska U, Moreno-Del Alamo M, Walkow U, Grochowina I, Dutkiewicz R, Bujnicki JM, Giraldo R, Konieczny I. Defining a novel domain that provides an essential contribution to site-specific interaction of Rep protein with DNA. Nucleic Acids Res 2021; 49:3394-3408. [PMID: 33660784 PMCID: PMC8034659 DOI: 10.1093/nar/gkab113] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.7] [Received: 12/21/2020] [Revised: 02/04/2021] [Accepted: 02/10/2021] [Indexed: 12/24/2022] Open
Abstract
An essential feature of replication initiation proteins is their ability to bind to DNA. In this work, we describe a new domain that contributes to a replication initiator sequence-specific interaction with DNA. Applying biochemical assays and structure prediction methods coupled with DNA–protein crosslinking, mass spectrometry, and construction and analysis of mutant proteins, we identified that the replication initiator of the broad host range plasmid RK2, in addition to two winged helix domains, contains a third DNA-binding domain. The phylogenetic analysis revealed that the composition of this unique domain is typical within the described TrfA-like protein family. Both in vitro and in vivo experiments involving the constructed TrfA mutant proteins showed that the newly identified domain is essential for the formation of the protein complex with DNA, contributes to the avidity for interaction with DNA, and the replication activity of the initiator. The analysis of mutant proteins, each containing a single substitution, showed that each of the three domains composing TrfA is essential for the formation of the protein complex with DNA. Furthermore, the new domain, along with the winged helix domains, contributes to the sequence specificity of replication initiator interaction within the plasmid replication origin.
Affiliation(s)
- Katarzyna Wegrzyn
- Intercollegiate Faculty of Biotechnology of University of Gdansk and Medical University of Gdansk, University of Gdansk, Abrahama 58, 80-307 Gdansk, Poland
| | - Elzbieta Zabrocka
- Intercollegiate Faculty of Biotechnology of University of Gdansk and Medical University of Gdansk, University of Gdansk, Abrahama 58, 80-307 Gdansk, Poland
| | - Katarzyna Bury
- Intercollegiate Faculty of Biotechnology of University of Gdansk and Medical University of Gdansk, University of Gdansk, Abrahama 58, 80-307 Gdansk, Poland
| | - Bartlomiej Tomiczek
- Intercollegiate Faculty of Biotechnology of University of Gdansk and Medical University of Gdansk, University of Gdansk, Abrahama 58, 80-307 Gdansk, Poland
| | - Milosz Wieczor
- Department of Physical Chemistry, Gdańsk University of Technology, Narutowicza 11/12, 80-233 Gdańsk, Poland
| | - Jacek Czub
- Department of Physical Chemistry, Gdańsk University of Technology, Narutowicza 11/12, 80-233 Gdańsk, Poland
| | - Urszula Uciechowska
- Intercollegiate Faculty of Biotechnology of University of Gdansk and Medical University of Gdansk, University of Gdansk, Abrahama 58, 80-307 Gdansk, Poland
| | - María Moreno-Del Alamo
- Department of Cellular and Molecular Biology, Centro de Investigaciones Biológicas - CSIC, E28040 Madrid, Spain
| | - Urszula Walkow
- Intercollegiate Faculty of Biotechnology of University of Gdansk and Medical University of Gdansk, University of Gdansk, Abrahama 58, 80-307 Gdansk, Poland
| | - Igor Grochowina
- Intercollegiate Faculty of Biotechnology of University of Gdansk and Medical University of Gdansk, University of Gdansk, Abrahama 58, 80-307 Gdansk, Poland
| | - Rafal Dutkiewicz
- Intercollegiate Faculty of Biotechnology of University of Gdansk and Medical University of Gdansk, University of Gdansk, Abrahama 58, 80-307 Gdansk, Poland
| | - Janusz M Bujnicki
- Laboratory of Bioinformatics and Protein Engineering, International Institute of Molecular and Cell Biology in Warsaw, Księcia Trojdena 4, 02-109 Warsaw, Poland.,Institute of Molecular Biology and Biotechnology, Adam Mickiewicz University, Umultowska 89, 61-614 Poznan, Poland
| | - Rafael Giraldo
- Department of Cellular and Molecular Biology, Centro de Investigaciones Biológicas - CSIC, E28040 Madrid, Spain
| | - Igor Konieczny
- Intercollegiate Faculty of Biotechnology of University of Gdansk and Medical University of Gdansk, University of Gdansk, Abrahama 58, 80-307 Gdansk, Poland
| |
|
70
|
Bouchiba Y, Cortés J, Schiex T, Barbe S. Molecular flexibility in computational protein design: an algorithmic perspective. Protein Eng Des Sel 2021; 34:6271252. [PMID: 33959778 DOI: 10.1093/protein/gzab011] [Citation(s) in RCA: 6] [Impact Index Per Article: 2.0] [Received: 12/21/2020] [Revised: 03/12/2021] [Accepted: 03/29/2021] [Indexed: 12/19/2022] Open
Abstract
Computational protein design (CPD) is a powerful technique for engineering new proteins, with both great fundamental implications and diverse practical interests. However, the approximations usually made for computational efficiency, using a single fixed backbone and a discrete set of side chain rotamers, tend to produce rigid and hyper-stable folds that may lack functionality. These approximations contrast with the demonstrated importance of molecular flexibility and motions in a wide range of protein functions. The integration of backbone flexibility and multiple conformational states in CPD, in order to relieve the inaccuracies resulting from these simplifications and to improve design reliability, is attracting increased attention. However, the greatly increased search space that needs to be explored in these extensions defines extremely challenging computational problems. In this review, we outline the principles of CPD and discuss recent efforts in algorithmic developments for incorporating molecular flexibility in the design process.
Affiliation(s)
- Younes Bouchiba
- Toulouse Biotechnology Institute, TBI, CNRS, INRAE, INSA, ANITI, Toulouse 31400, France.,Laboratoire d'Analyse et d'Architecture des Systèmes, LAAS CNRS, Université de Toulouse, CNRS, Toulouse 31400, France
| | - Juan Cortés
- Laboratoire d'Analyse et d'Architecture des Systèmes, LAAS CNRS, Université de Toulouse, CNRS, Toulouse 31400, France
| | - Thomas Schiex
- Université de Toulouse, ANITI, INRAE, UR MIAT, F-31320, Castanet-Tolosan, France
| | - Sophie Barbe
- Toulouse Biotechnology Institute, TBI, CNRS, INRAE, INSA, ANITI, Toulouse 31400, France
| |
|
71
|
Pereira JM, Vieira M, Santos SM. Step-by-step design of proteins for small molecule interaction: A review on recent milestones. Protein Sci 2021; 30:1502-1520. [PMID: 33934427 DOI: 10.1002/pro.4098] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 03/25/2021] [Revised: 04/21/2021] [Accepted: 04/23/2021] [Indexed: 01/01/2023]
Abstract
Protein design is the field of synthetic biology that aims at developing de novo custom-made proteins and peptides for specific applications. Despite exploring an ambitious goal, recent computational advances in both hardware and software technologies have paved the way to high-throughput screening and detailed design of novel folds and improved functionalities. Modern advances in the field of protein design for small molecule targeting are described in this review, organized in a step-by-step fashion: from the conception of a new or upgraded active binding site, to scaffold design, sequence optimization, and experimental expression of the custom protein. In each step, contemporary examples are described, and state-of-the-art software is briefly explored.
Affiliation(s)
- José M Pereira
- CICECO & Departamento de Química, Universidade de Aveiro, Aveiro, Portugal
| | - Maria Vieira
- CICECO & Departamento de Química, Universidade de Aveiro, Aveiro, Portugal
| | - Sérgio M Santos
- CICECO & Departamento de Química, Universidade de Aveiro, Aveiro, Portugal
| |
|
72
|
Wiese JG, Shanmugaratnam S, Höcker B. Extension of a de novo TIM barrel with a rationally designed secondary structure element. Protein Sci 2021; 30:982-989. [PMID: 33723882 PMCID: PMC8040861 DOI: 10.1002/pro.4064] [Citation(s) in RCA: 6] [Impact Index Per Article: 2.0] [Received: 11/01/2020] [Revised: 02/02/2021] [Accepted: 03/09/2021] [Indexed: 11/12/2022]
Abstract
The ability to construct novel enzymes is a major aim in de novo protein design. A popular enzyme fold for design attempts is the TIM barrel. This fold is a common topology for enzymes and can harbor many diverse reactions. The recent de novo design of a four-fold symmetric TIM barrel provides a well-understood minimal scaffold for potential enzyme designs. Here we explore opportunities to extend and diversify this scaffold by adding a short de novo helix on top of the barrel. Due to the size of the protein, we developed a design pipeline based on computational ab initio folding that solves a less complex sub-problem focused around the helix and its vicinity and adapts it to the entire protein. We provide biochemical characterization and a high-resolution X-ray structure for one variant and compare it to our design model. The successful extension of this robust TIM-barrel scaffold opens opportunities to diversify it towards more pocket-like arrangements and as such can be considered a building block for future design of binding or catalytic sites.
Affiliation(s)
- Jonas Gregor Wiese
- Max Planck Institute for Developmental Biology, Tübingen, Germany
- Present address: Technical University of Munich, Munich, Germany
| | - Sooruban Shanmugaratnam
- Max Planck Institute for Developmental Biology, Tübingen, Germany
- University of Bayreuth, Department for Biochemistry, Bayreuth, Germany
| | - Birte Höcker
- Max Planck Institute for Developmental Biology, Tübingen, Germany
- University of Bayreuth, Department for Biochemistry, Bayreuth, Germany
| |
|
73
|
Postic G, Janel N, Moroy G. Representations of protein structure for exploring the conformational space: A speed-accuracy trade-off. Comput Struct Biotechnol J 2021; 19:2618-2625. [PMID: 34025948 PMCID: PMC8120936 DOI: 10.1016/j.csbj.2021.04.049] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 02/03/2021] [Revised: 04/19/2021] [Accepted: 04/20/2021] [Indexed: 11/25/2022] Open
Abstract
We compare ten structural representations, either atomistic or coarse-grained. Thus, ten distance-dependent statistical potentials of mean force (PMF) were built. The Cβ-only and Cα + Cβ representations provide the best speed–accuracy trade-off. Including glycines through Cα, in a Cβ-only representation, yields a higher accuracy. We generalize the conclusions to the total information gain (TIG) scoring function.
The recent breakthrough in the field of protein structure prediction shows the relevance of using knowledge-based scoring functions in combination with a low-resolution 3D representation of protein macromolecules. The choice of not using all atoms is barely supported by any data in the literature, and is mostly motivated by empirical and practical reasons, such as the computational cost of assessing the numerous folds of the protein conformational space. Here, we present a comprehensive study, carried out on a large and balanced benchmark of predicted protein structures, to see how different types of structural representations rank in either accuracy or calculation speed, and which ones offer the best compromise between these two criteria. We tested ten representations, including low-resolution, high-resolution, and coarse-grained approaches. We also investigated the generalization of the findings to formalisms other than the widely used “potential of mean force” (PMF) method. Thus, we observed that representing protein structures by their β carbons, combined or not with Cα, provides the best speed–accuracy trade-off when using a “total information gain” scoring function. For statistical PMFs, using MARTINI backbone and side-chain beads is the best option. Finally, we also demonstrated the necessity of training the reference state on all atom types, and of including the Cα atoms of glycine residues, in a Cβ-based representation.
Affiliation(s)
- Guillaume Postic
- Université de Paris, BFA, UMR 8251, CNRS, ERL U1133, Inserm, F-75013 Paris, France
- Corresponding author.
| | - Nathalie Janel
- Université de Paris, BFA, UMR 8251, CNRS, F-75013 Paris, France
| | - Gautier Moroy
- Université de Paris, BFA, UMR 8251, CNRS, ERL U1133, Inserm, F-75013 Paris, France
| |
|
74
|
Sulimov VB, Kutov DC, Taschilova AS, Ilin IS, Tyrtyshnikov EE, Sulimov AV. Docking Paradigm in Drug Design. Curr Top Med Chem 2021; 21:507-546. [PMID: 33292135 DOI: 10.2174/1568026620666201207095626] [Citation(s) in RCA: 11] [Impact Index Per Article: 3.7] [Received: 08/01/2020] [Revised: 09/28/2020] [Accepted: 10/16/2020] [Indexed: 11/22/2022]
Abstract
Docking is in demand for rational computer-aided structure-based drug design. A review of docking methods and programs is presented. Different types of docking programs are described. They include docking of non-covalent small ligands, protein-protein docking, supercomputer docking, quantum docking, the new generation of docking programs and the application of docking to the discovery of covalent inhibitors. Taking into account the threat of COVID-19, we present here a short review of docking applications to the discovery of inhibitors of SARS-CoV and SARS-CoV-2 target proteins, including our own result of the search for inhibitors of SARS-CoV-2 main protease using docking and quantum chemical post-processing. We conclude that docking is extremely important in the fight against COVID-19 during the development of antiviral drugs that act directly on SARS-CoV-2 target proteins.
Affiliation(s)
- Vladimir B Sulimov
- Research Computer Center of Lomonosov Moscow State University, Moscow, Russian Federation
| | - Danil C Kutov
- Research Computer Center of Lomonosov Moscow State University, Moscow, Russian Federation
| | - Anna S Taschilova
- Research Computer Center of Lomonosov Moscow State University, Moscow, Russian Federation
| | - Ivan S Ilin
- Research Computer Center of Lomonosov Moscow State University, Moscow, Russian Federation
| | - Eugene E Tyrtyshnikov
- Institute of Numerical Mathematics of Russian Academy of Sciences, Moscow, Russian Federation
| | - Alexey V Sulimov
- Research Computer Center of Lomonosov Moscow State University, Moscow, Russian Federation
| |
|
75
|
Lindsay RJ, Mansbach RA, Gnanakaran S, Shen T. Effects of pH on an IDP conformational ensemble explored by molecular dynamics simulation. Biophys Chem 2021; 271:106552. [PMID: 33581430 PMCID: PMC8024028 DOI: 10.1016/j.bpc.2021.106552] [Citation(s) in RCA: 13] [Impact Index Per Article: 4.3] [Received: 11/30/2020] [Revised: 01/15/2021] [Accepted: 01/20/2021] [Indexed: 01/03/2023]
Abstract
The conformational ensembles of intrinsically disordered proteins (IDPs), such as α-synuclein, are responsible for their function and malfunction. Misfolding of α-synuclein can lead to neurodegenerative diseases, and the ability to study its conformations and those of other intrinsically disordered proteins under varying physiological conditions can be crucial to understanding and preventing pathologies. In contrast to well-folded peptides, a consensus feature of IDPs is their low hydropathy and high charge, which makes their conformations sensitive to pH perturbation. We examine a prominent member of this subset of IDPs, α-synuclein, using a divide-and-conquer scheme that provides enhanced sampling of IDP structural ensembles. We constructed conformational ensembles of α-synuclein under neutral (pH ~ 7) and low (pH ~ 3) pH conditions and compared our results with available information obtained from smFRET, SAXS, and NMR studies. Specifically, α-synuclein was found to be in a more compact state at low pH, and the structural changes observed are consistent with those from experiments. We also characterize the conformational and dynamic differences between these ensembles and discuss the implications for promoting pathogenic fibril formation. We find that under low pH conditions, neutralization of negatively charged residues leads to compaction of the C-terminal portion of α-synuclein while internal reorganization allows α-synuclein to maintain its overall end-to-end distance. We also observe different levels of intra-protein interaction between three regions of α-synuclein at varying pH and a shift towards more hydrophilic interactions with decreasing pH.
Affiliation(s)
- Richard J Lindsay
- UT- ORNL Graduate School of Genome Science and Technology, Knoxville, TN, 37996, USA.
| | - Rachael A Mansbach
- Theoretical Biology and Biophysics, Los Alamos National Laboratory, Los Alamos, NM, 87544, USA; Department of Physics, Concordia University, Montreal, Quebec, Canada.
| | - S Gnanakaran
- Theoretical Biology and Biophysics, Los Alamos National Laboratory, Los Alamos, NM, 87544, USA.
| | - Tongye Shen
- Department of Biochemistry & Cellular and Molecular Biology, University of Tennessee, Knoxville, TN, 37996, USA.
| |
|
76
|
Marzolf DR, Seffernick JT, Lindert S. Protein Structure Prediction from NMR Hydrogen-Deuterium Exchange Data. J Chem Theory Comput 2021; 17:2619-2629. [PMID: 33780620 DOI: 10.1021/acs.jctc.1c00077] [Citation(s) in RCA: 18] [Impact Index Per Article: 6.0] [Indexed: 12/17/2022]
Abstract
Amide hydrogen-deuterium exchange (HDX) has long been used to determine regional flexibility and binding sites in proteins; however, the data are too sparse for full structural characterization. Experiments that measure HDX rates, such as HDX-NMR, have far higher throughput compared to structure determination via X-ray crystallography, cryo-EM, or a full suite of NMR experiments. Data from HDX-NMR experiments encode information on the protein structure, making HDX a prime candidate to be supplemented by computational algorithms for protein structure prediction. We have developed a methodology to incorporate HDX-NMR data into ab initio protein structure prediction using the Rosetta software framework to predict structures based on experimental agreement. To demonstrate the efficacy of our algorithm, we examined 38 proteins with HDX-NMR data available, comparing the predicted model with and without the incorporation of HDX data into scoring. The root-mean-square deviation (rmsd, a measure of the average atomic distance between superimposed models) of the predicted model improved by 1.42 Å on average after incorporating the HDX-NMR data into scoring. The average rmsd improvement for the proteins where the selected model rmsd changed after incorporating HDX data was 3.63 Å, including one improvement of more than 11 Å and seven proteins improving by greater than 4 Å, with 12/15 proteins improving overall. Additionally, for independent verification, two proteins that were not part of the original benchmark were scored including HDX data, with a dramatic improvement of the selected model rmsd of nearly 9 Å for one of the proteins. Moreover, we have developed a confidence metric allowing us to successfully identify near-native models in the absence of a native structure. Improvement in model selection with a strong confidence measure demonstrates that protein structure prediction with HDX-NMR is a powerful tool which can be performed with minimal additional computational strain and expense.
Affiliation(s)
- Daniel R Marzolf
- Department of Chemistry and Biochemistry, Ohio State University, Columbus, Ohio 43210, United States
| | - Justin T Seffernick
- Department of Chemistry and Biochemistry, Ohio State University, Columbus, Ohio 43210, United States
| | - Steffen Lindert
- Department of Chemistry and Biochemistry, Ohio State University, Columbus, Ohio 43210, United States
| |
|
77
|
Norn C, Wicky BIM, Juergens D, Liu S, Kim D, Tischer D, Koepnick B, Anishchenko I, Baker D, Ovchinnikov S. Protein sequence design by conformational landscape optimization. Proc Natl Acad Sci U S A 2021; 118:e2017228118. [PMID: 33712545 PMCID: PMC7980421 DOI: 10.1073/pnas.2017228118] [Citation(s) in RCA: 70] [Impact Index Per Article: 23.3] [Indexed: 12/17/2022] Open
Abstract
The protein design problem is to identify an amino acid sequence that folds to a desired structure. Given Anfinsen's thermodynamic hypothesis of folding, this can be recast as finding an amino acid sequence for which the desired structure is the lowest energy state. As this calculation involves not only all possible amino acid sequences but also all possible structures, most current approaches focus instead on the more tractable problem of finding the lowest-energy amino acid sequence for the desired structure, often checking by protein structure prediction in a second step that the desired structure is indeed the lowest-energy conformation for the designed sequence, and typically discarding a large fraction of designed sequences for which this is not the case. Here, we show that by backpropagating gradients through the transform-restrained Rosetta (trRosetta) structure prediction network from the desired structure to the input amino acid sequence, we can directly optimize over all possible amino acid sequences and all possible structures in a single calculation. We find that trRosetta calculations, which consider the full conformational landscape, can be more effective than Rosetta single-point energy estimations in predicting folding and stability of de novo designed proteins. We compare sequence design by conformational landscape optimization with the standard energy-based sequence design methodology in Rosetta and show that the former can result in energy landscapes with fewer alternative energy minima. We show further that more funneled energy landscapes can be designed by combining the strengths of the two approaches: the low-resolution trRosetta model serves to disfavor alternative states, and the high-resolution Rosetta model serves to create a deep energy minimum at the design target structure.
Affiliation(s)
- Christoffer Norn
- Department of Biochemistry, University of Washington, Seattle, WA 98105
- Institute for Protein Design, University of Washington, Seattle, WA 98105
| | - Basile I M Wicky
- Department of Biochemistry, University of Washington, Seattle, WA 98105
- Institute for Protein Design, University of Washington, Seattle, WA 98105
| | - David Juergens
- Department of Biochemistry, University of Washington, Seattle, WA 98105
- Institute for Protein Design, University of Washington, Seattle, WA 98105
- Graduate Program in Molecular Engineering, University of Washington, Seattle, WA 98105
| | - Sirui Liu
- Faculty of Arts and Sciences, Division of Science, Harvard University, Cambridge, MA 02138
| | - David Kim
- Department of Biochemistry, University of Washington, Seattle, WA 98105
- Institute for Protein Design, University of Washington, Seattle, WA 98105
| | - Doug Tischer
- Department of Biochemistry, University of Washington, Seattle, WA 98105
- Institute for Protein Design, University of Washington, Seattle, WA 98105
| | - Brian Koepnick
- Department of Biochemistry, University of Washington, Seattle, WA 98105
- Institute for Protein Design, University of Washington, Seattle, WA 98105
| | - Ivan Anishchenko
- Department of Biochemistry, University of Washington, Seattle, WA 98105
- Institute for Protein Design, University of Washington, Seattle, WA 98105
| | - David Baker
- Department of Biochemistry, University of Washington, Seattle, WA 98105;
- Institute for Protein Design, University of Washington, Seattle, WA 98105
- Howard Hughes Medical Institute, University of Washington, Seattle, WA 98105
| | - Sergey Ovchinnikov
- Faculty of Arts and Sciences, Division of Science, Harvard University, Cambridge, MA 02138;
- John Harvard Distinguished Science Fellowship Program, Harvard University, Cambridge, MA 02138
|
78
|
Zhang GJ, Xie TY, Zhou XG, Wang LJ, Hu J. Protein Structure Prediction Using Population-Based Algorithm Guided by Information Entropy. IEEE/ACM Transactions on Computational Biology and Bioinformatics 2021; 18:697-707. [PMID: 31180869 DOI: 10.1109/tcbb.2019.2921958] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 06/09/2023]
Abstract
Ab initio protein structure prediction is one of the most challenging problems in computational biology. Multistage algorithms are widely used in ab initio protein structure prediction, but the computational cost of a multistage algorithm differs from protein to protein and needs to be taken into account. In this study, a population-based algorithm guided by information entropy (PAIE), which includes exploration and exploitation stages, is proposed for protein structure prediction. In PAIE, an entropy-based stage switch strategy is designed to switch from the exploration stage to the exploitation stage. Torsion angle statistical information is also deduced from the first stage and employed to enhance the exploitation in the second stage. Results indicate an improvement in protein structure prediction performance on a benchmark of 30 proteins and 17 other free modeling targets in CASP.
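The entropy-based stage switch described in this abstract can be illustrated with a toy sketch. This is not the PAIE implementation: the 30-degree bin width, the 1-bit threshold, and the function names `torsion_entropy` and `should_switch` are assumptions made purely for illustration. The sketch bins one torsion angle across a population of conformations, computes the Shannon entropy of the bin occupancies, and signals a switch from exploration to exploitation once the entropy falls below the threshold.

```python
import math
from collections import Counter


def torsion_entropy(angles_deg, bin_width=30):
    """Shannon entropy (bits) of torsion angles binned into bin_width-degree bins.

    bin_width is a hypothetical choice, not a value taken from the paper.
    """
    # Wrap every angle into [0, 360) and convert it to a bin index.
    bins = Counter(int((a % 360) // bin_width) for a in angles_deg)
    n = len(angles_deg)
    return -sum((c / n) * math.log2(c / n) for c in bins.values())


def should_switch(angles_deg, threshold_bits=1.0):
    """True when the population looks converged enough to start exploitation.

    threshold_bits is an assumed cut-off chosen for this example only.
    """
    return torsion_entropy(angles_deg) < threshold_bits


# Made-up populations: one spread over many bins, one concentrated in a single bin.
diverse = [-170, -120, -60, 0, 45, 90, 135, 175]
converged = [-50, -48, -45, -52, -47, -49, -51, -46]

print(should_switch(diverse), should_switch(converged))
```

A high entropy means the population still samples many torsion states, so exploration continues; a low entropy means the population has concentrated, which is the point at which a stage switch of this kind would hand over to exploitation.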
|
79
|
Li T, Kong L, Li X, Wu S, Attri KS, Li Y, Gong W, Li L, Herring LE, Asara JM, Xu L, Luo X, Lei YL, Ma Q, Seveau S, Gunn JS, Cheng X, Singh PK, Green DR, Wang H, Wen H. Listeria monocytogenes upregulates mitochondrial calcium signalling to inhibit LC3-associated phagocytosis as a survival strategy. Nat Microbiol 2021; 6:366-379. [PMID: 33462436 PMCID: PMC8323152 DOI: 10.1038/s41564-020-00843-2] [Citation(s) in RCA: 31] [Impact Index Per Article: 10.3] [Received: 08/13/2019] [Accepted: 11/27/2020] [Indexed: 01/29/2023]
Abstract
Mitochondria are believed to have originated ~2.5 billion years ago. As well as energy generation in cells, mitochondria have a role in defence against bacterial pathogens. Despite profound changes in mitochondrial morphology and functions following bacterial challenge, whether intracellular bacteria can hijack mitochondria to promote their survival remains elusive. We report that Listeria monocytogenes, an intracellular bacterial pathogen, suppresses LC3-associated phagocytosis (LAP) by modulation of mitochondrial Ca²⁺ (mtCa²⁺) signalling in order to survive inside cells. Invasion of macrophages by L. monocytogenes induced mtCa²⁺ uptake through the mtCa²⁺ uniporter (MCU), which in turn increased acetyl-coenzyme A (acetyl-CoA) production by pyruvate dehydrogenase. Acetylation of the LAP effector Rubicon with acetyl-CoA decreased LAP formation. Genetic ablation of MCU attenuated intracellular bacterial growth due to increased LAP formation. Our data show that modulation of mtCa²⁺ signalling can increase bacterial survival inside cells, and highlight the importance of mitochondrial metabolism in host-microbial interactions.
Affiliation(s)
- Tianliang Li
- Department of Microbial Infection and Immunity, The Ohio State University, Columbus, OH, USA
| | - Ligang Kong
- Shandong Institute of Otolaryngology, Department of Otolaryngology-Head and Neck Surgery, Shandong ENT Hospital Affiliated to Shandong University, Jinan, Shandong, China
| | - Xinghui Li
- Department of Microbial Infection and Immunity, The Ohio State University, Columbus, OH, USA
| | - Sijin Wu
- College of Pharmacy, Medicinal Chemistry & Pharmacognosy Division, The Ohio State University, Columbus, OH, USA
| | - Kuldeep S. Attri
- Eppley Institute for Research in Cancer and Allied Diseases, University of Nebraska Medical Center, Omaha, NE, USA
| | - Yan Li
- Department of Physiology and Cell Biology, The Ohio State University, Columbus, OH, USA
| | - Weipeng Gong
- Department of Microbial Infection and Immunity, The Ohio State University, Columbus, OH, USA
| | - Lupeng Li
- Department of Microbiology and Immunology, University of North Carolina at Chapel Hill, Chapel Hill, NC, USA
| | - Laura E. Herring
- Proteomics Core Facility, Department of Pharmacology, University of North Carolina at Chapel Hill, Chapel Hill, NC, USA
| | - John M. Asara
- Division of Signal Transduction, Beth Israel Deaconess Medical Center and Department of Medicine, Harvard Medical School, Boston, MA, USA
| | - Lei Xu
- Shandong Institute of Otolaryngology, Department of Otolaryngology-Head and Neck Surgery, Shandong ENT Hospital Affiliated to Shandong University, Jinan, Shandong, China
| | - Xiaobo Luo
- Department of Periodontics and Oral Medicine, University of Michigan School of Dentistry, Ann Arbor, MI, USA
| | - Yu L Lei
- Department of Periodontics and Oral Medicine, University of Michigan School of Dentistry, Ann Arbor, MI, USA
| | - Qin Ma
- Department of Biomedical Informatics, The Ohio State University, Columbus, OH, USA
| | - Stephanie Seveau
- Department of Microbial Infection and Immunity, The Ohio State University, Columbus, OH, USA
| | - John S Gunn
- Center for Microbial Pathogenesis, Abigail Wexner Research Institute at Nationwide Children’s Hospital, Columbus, OH, USA
| | - Xiaolin Cheng
- College of Pharmacy, Medicinal Chemistry & Pharmacognosy Division, The Ohio State University, Columbus, OH, USA
| | - Pankaj K. Singh
- Eppley Institute for Research in Cancer and Allied Diseases, University of Nebraska Medical Center, Omaha, NE, USA
| | - Douglas R. Green
- Department of Immunology, St. Jude Children’s Research Hospital, Memphis, TN, USA
| | - Haibo Wang
- Shandong Institute of Otolaryngology, Department of Otolaryngology-Head and Neck Surgery, Shandong ENT Hospital Affiliated to Shandong University, Jinan, Shandong, China
| | - Haitao Wen
- Department of Microbial Infection and Immunity, The Ohio State University, Columbus, OH, USA
- Correspondence: Dr. Haitao Wen, Telephone: 614-292-6724, Fax: 614-292-9616, Address: 796 Biomedical Research Tower, 460 W 12th Ave, Columbus, OH 43210; Dr. Haibo Wang, Telephone: 86-531-68777588, Address: #4 Duanxing Xilu, Jinan, Shandong, China 25011
|
80
|
Abstract
While native proteins cover diverse structural spaces and carry out a wide range of biological functions, not many of them can directly serve human needs. One reason is that native proteins usually contain idiosyncrasies that evolved for their native functions but work against engineering requirements. To overcome this issue, one strategy is to create de novo proteins which are designed to possess improved stability, high environmental tolerance, and enhanced engineering potential. Compared to other protein engineering strategies, in silico design of de novo proteins has significantly expanded the protein structural and sequence spaces, reduced wet lab workload, and incorporated engineered features in a guided and efficient manner. In the Baker laboratory we have been applying a design pipeline that uses the blueprint builder to design different folds of de novo proteins, and have successfully obtained libraries of de novo proteins with improved stability and engineering potential. In this article, we will use the design of de novo β-barrel proteins as an example to describe the principles and basic procedures of the blueprint builder-based design pipeline. © 2020 Wiley Periodicals LLC. Basic Protocol 1: The construction of blueprints. Alternate Protocol: Build blueprints based on existing protein .pdb files. Basic Protocol 2: De novo protein design pipeline using the blueprint builder.
Affiliation(s)
- Linna An
- Institute for Protein Design, University of Washington, Seattle, Washington
| | - Gyu Rie Lee
- Institute for Protein Design, University of Washington, Seattle, Washington
|
81
|
Dybowski R. Artificial Intelligence in Medicine: Biochemical 3D Modeling and Drug Discovery. Artif Intell Med 2021. [DOI: 10.1007/978-3-030-58080-3_318-1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/30/2022]
|
82
|
Abstract
Biologists are increasingly aware of the importance of protein structure in revealing function. The computational tools now exist which allow researchers to model unknown proteins simply on the basis of their primary sequence. However, for the non-specialist bioinformatician, there is a dazzling array of terminology, acronyms, and competing computer software available for this process. This review is intended to highlight the key stages of computational protein structure prediction, as well as explain the reasons behind some of the procedures and list some established workarounds for common pitfalls. Thereafter follows a review of five one-stop servers for start-to-finish structure prediction.
|
83
|
Pan X, Kortemme T. Recent advances in de novo protein design: Principles, methods, and applications. J Biol Chem 2021; 296:100558. [PMID: 33744284 PMCID: PMC8065224 DOI: 10.1016/j.jbc.2021.100558] [Citation(s) in RCA: 93] [Impact Index Per Article: 31.0] [Received: 01/13/2021] [Revised: 03/12/2021] [Accepted: 03/16/2021] [Indexed: 02/06/2023]
Abstract
Computational de novo protein design is increasingly applied to address a number of key challenges in biomedicine and biological engineering. Successes in expanding applications are driven by advances in design principles and methods over several decades. Here, we review recent innovations in major aspects of de novo protein design and include how these advances were informed by principles of protein architecture and interactions derived from the wealth of structures in the Protein Data Bank. We describe developments in de novo generation of designable backbone structures, optimization of sequences, design scoring functions, and the design of function. The advances not only highlight design goals reachable now but also point to the challenges and opportunities for the future of the field.
Affiliation(s)
- Xingjie Pan
- Department of Bioengineering and Therapeutic Sciences, University of California San Francisco, San Francisco, California, USA; UC Berkeley - UCSF Graduate Program in Bioengineering, University of California San Francisco, San Francisco, California, USA.
| | - Tanja Kortemme
- Department of Bioengineering and Therapeutic Sciences, University of California San Francisco, San Francisco, California, USA; UC Berkeley - UCSF Graduate Program in Bioengineering, University of California San Francisco, San Francisco, California, USA; Quantitative Biosciences Institute (QBI), University of California San Francisco, San Francisco, California, USA.
|
84
|
Seffernick JT, Lindert S. Hybrid methods for combined experimental and computational determination of protein structure. J Chem Phys 2020; 153:240901. [PMID: 33380110 PMCID: PMC7773420 DOI: 10.1063/5.0026025] [Citation(s) in RCA: 28] [Impact Index Per Article: 7.0] [Received: 08/20/2020] [Accepted: 11/10/2020] [Indexed: 02/04/2023]
Abstract
Knowledge of protein structure is paramount to the understanding of biological function, developing new therapeutics, and making detailed mechanistic hypotheses. Therefore, methods to accurately elucidate three-dimensional structures of proteins are in high demand. While a few experimental techniques, such as x-ray crystallography, nuclear magnetic resonance (NMR), and cryo-EM, can routinely provide high-resolution structures, each has shortcomings and thus cannot be used in all cases. Additionally, a large number of experimental techniques have been developed that provide some structural information, but not enough to assign atomic positions with high certainty. These methods offer sparse experimental data, which can also be noisy and inaccurate in some instances. In cases where it is not possible to determine the structure of a protein experimentally, computational structure prediction methods can be used as an alternative. Although computational methods can be run without any experimental data, in a large number of studies the inclusion of sparse experimental data into these prediction methods has yielded significant improvement. In this Perspective, we cover many of the successes of integrative modeling, computational modeling with experimental data, specifically for protein folding, protein-protein docking, and molecular dynamics simulations. We describe methods that incorporate sparse data from cryo-EM, NMR, mass spectrometry, electron paramagnetic resonance, small-angle x-ray scattering, Förster resonance energy transfer, and genetic sequence covariation. Finally, we highlight some of the major challenges in the field as well as possible future directions.
Affiliation(s)
- Justin T. Seffernick
- Department of Chemistry and Biochemistry, Ohio State University, Columbus, Ohio 43210, USA
| | - Steffen Lindert
- Department of Chemistry and Biochemistry, Ohio State University, Columbus, Ohio 43210, USA
|
85
|
MHCII3D-Robust Structure Based Prediction of MHC II Binding Peptides. Int J Mol Sci 2020; 22:ijms22010012. [PMID: 33374958 PMCID: PMC7792572 DOI: 10.3390/ijms22010012] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.3] [Received: 11/26/2020] [Revised: 12/17/2020] [Accepted: 12/17/2020] [Indexed: 02/02/2023]
Abstract
Knowledge of MHC II binding peptides is highly desired in immunological research, particularly in the context of cancer, autoimmune diseases, or allergies. The most successful prediction methods are based on machine learning methods trained on sequences of experimentally characterized binding peptides. Here, we describe a complementary approach called MHCII3D, which is based on structural scaffolds of MHC II-peptide complexes and statistical scoring functions (SSFs). The MHC II alleles reported in the Immuno Polymorphism Database are processed in a dedicated 3D-modeling pipeline providing a set of scaffold complexes for each distinct allotype sequence. Antigen protein sequences are threaded through the scaffolds and evaluated by optimized SSFs. We compared the predictive power of MHCII3D with different sequence-based machine learning methods. The Pearson correlation to experimentally determined IC50 values for the MHC II Automated Server Benchmarks data sets from IEDB (Immune Epitope Database) is 0.42, which is within the range of the competitor methods. We show that MHCII3D is quite robust in leave-one-molecule-out tests and is therefore not prone to overfitting. Finally, we provide evidence that MHCII3D can complement the current sequence-based methods and help to identify problematic entries in IEDB. Scaffolds and MHCII3D executables can be freely downloaded from our web pages.
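The benchmark figure quoted in this abstract is a Pearson correlation between predicted scores and measured binding values. As a minimal sketch of how such a coefficient is computed, the snippet below implements the textbook formula in plain Python. The function name `pearson` and the example numbers are hypothetical; they are not taken from MHCII3D or from IEDB.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long numeric sequences.

    Raises ZeroDivisionError if either sequence is constant (zero variance).
    """
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Covariance numerator and the two standard-deviation numerators.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# Made-up predicted scores and measured values, for illustration only.
predicted = [0.2, 0.5, 0.1, 0.9, 0.7]
measured = [0.3, 0.4, 0.2, 0.6, 0.9]
print(round(pearson(predicted, measured), 3))
```

A value of 1 indicates a perfect increasing linear relationship, -1 a perfect decreasing one, and values near 0 indicate no linear relationship.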
|
86
|
Abstract
We describe the de novo design of an allosterically regulated protein, which comprises two tightly coupled domains. One domain is based on the DF (Due Ferri in Italian or two-iron in English) family of de novo proteins, which have a diiron cofactor that catalyzes a phenol oxidase reaction, while the second domain is based on PS1 (Porphyrin-binding Sequence), which binds a synthetic Zn-porphyrin (ZnP). The binding of ZnP to the original PS1 protein induces changes in structure and dynamics, which we expected to influence the catalytic rate of a fused DF domain when appropriately coupled. Both DF and PS1 are four-helix bundles, but they have distinct bundle architectures. To achieve tight coupling between the domains, they were connected by four helical linkers using a computational method to discover the most designable connections capable of spanning the two architectures. The resulting protein, DFP1 (Due Ferri Porphyrin), bound the two cofactors in the expected manner. The crystal structure of fully reconstituted DFP1 was also in excellent agreement with the design, and it showed the ZnP cofactor bound over 12 Å from the dimetal center. Next, a substrate-binding cleft leading to the diiron center was introduced into DFP1. The resulting protein acts as an allosterically modulated phenol oxidase. Its Michaelis-Menten parameters were strongly affected by the binding of ZnP, resulting in a fourfold tighter Km and a 7-fold decrease in kcat. These studies establish the feasibility of designing allosterically regulated catalytic proteins, entirely from scratch.
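As a rough worked reading of the parameter changes reported in this abstract, assume standard Michaelis-Menten kinetics and take "fourfold tighter" to mean that the Michaelis constant decreases by a factor of 4 on ZnP binding. That interpretation is an assumption; the abstract does not state it explicitly. The resulting change in catalytic efficiency is then:

```latex
v = \frac{k_{\mathrm{cat}}\,[E]_0\,[S]}{K_m + [S]},
\qquad
\frac{\left(k_{\mathrm{cat}}/K_m\right)_{+\mathrm{ZnP}}}
     {\left(k_{\mathrm{cat}}/K_m\right)_{-\mathrm{ZnP}}}
= \frac{k_{\mathrm{cat}}/7}{K_m/4}\cdot\frac{K_m}{k_{\mathrm{cat}}}
= \frac{4}{7} \approx 0.57
```

Under these assumptions, the rate at low substrate concentration, which is governed by the ratio of the turnover number to the Michaelis constant, would fall to a little over half its value without ZnP. At saturating substrate the rate is governed by the turnover number alone and would fall about sevenfold.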
|
87
|
Leclère L, Nir TS, Bazarsky M, Braitbard M, Schneidman-Duhovny D, Gat U. Dynamic Evolution of the Cthrc1 Genes, a Newly Defined Collagen-Like Family. Genome Biol Evol 2020; 12:3957-3970. [PMID: 32022859 PMCID: PMC7058181 DOI: 10.1093/gbe/evaa020] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.0] [Accepted: 01/28/2020] [Indexed: 12/11/2022]
Abstract
Collagen triple helix repeat containing protein 1 (Cthrc1) is a secreted glycoprotein reported to regulate collagen deposition and to be linked to the Transforming growth factor β/Bone morphogenetic protein and the Wnt/planar cell polarity pathways. It was first identified as being induced upon injury to rat arteries and was found to be highly expressed in multiple human cancer types. Here, we explore the phylogenetic and evolutionary trends of this metazoan gene family, previously studied only in vertebrates. We identify Cthrc1 orthologs in two distant cnidarian species, the sea anemone Nematostella vectensis and the hydrozoan Clytia hemisphaerica, both of which harbor multiple copies of this gene. We find that Cthrc1 clade-specific diversification occurred multiple times in cnidarians as well as in most metazoan clades where we detected this gene. Many other groups, such as arthropods and nematodes, have entirely lost this gene family. Most vertebrates display a single highly conserved gene, and we show that the sequence evolutionary rate of Cthrc1 drastically decreased within the gnathostome lineage. Interestingly, this reduction coincided with the origin of its conserved upstream neighboring gene, Frizzled 6 (FZD6), which in mice has been shown to functionally interact with Cthrc1. Structural modeling methods further reveal that the yet uncharacterized C-terminal domain of Cthrc1 is similar in structure to the globular C1q superfamily domain, also found in the C-termini of collagens VIII and X. Thus, our studies show that the Cthrc1 genes are a collagen-like family with a variable short collagen triple helix domain and a highly conserved C-terminal domain structure resembling the C1q family.
Affiliation(s)
- Lucas Leclère
- Laboratoire de Biologie du Développement de Villefranche-sur-Mer (LBDV), Sorbonne Université, CNRS, Villefranche-sur-Mer, France
| | - Tal S Nir
- Department of Cell and Developmental Biology, Silberman Life Sciences Institute, The Hebrew University of Jerusalem, Israel
| | - Michael Bazarsky
- Department of Cell and Developmental Biology, Silberman Life Sciences Institute, The Hebrew University of Jerusalem, Israel
| | - Merav Braitbard
- Department of Biochemistry, Silberman Life Sciences Institute, The Hebrew University of Jerusalem, Israel
| | - Dina Schneidman-Duhovny
- Department of Biochemistry, Silberman Life Sciences Institute, The Hebrew University of Jerusalem, Israel; School of Computer Science and Engineering, The Hebrew University of Jerusalem, Israel
| | - Uri Gat
- Department of Cell and Developmental Biology, Silberman Life Sciences Institute, The Hebrew University of Jerusalem, Israel
|
88
|
McGehee AJ, Bhattacharya S, Roche R, Bhattacharya D. PolyFold: An interactive visual simulator for distance-based protein folding. PLoS One 2020; 15:e0243331. [PMID: 33270805 PMCID: PMC7714222 DOI: 10.1371/journal.pone.0243331] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/14/2020] [Accepted: 11/18/2020] [Indexed: 11/18/2022]
Abstract
Recent advances in distance-based protein folding have led to a paradigm shift in protein structure prediction. Through sufficiently precise estimation of the inter-residue distance matrix for a protein sequence, it is now feasible to predict the correct folds for new proteins much more accurately than ever before. Despite the exciting progress, a dedicated visualization system that can dynamically capture the distance-based folding process is still lacking. Most molecular visualizers typically provide only a static view of a folded protein conformation, but do not capture the folding process. Even among the selected few graphical interfaces that do adopt a dynamic perspective, none of them are distance-based. Here we present PolyFold, an interactive visual simulator for dynamically capturing the distance-based protein folding process through real-time rendering of a distance matrix and its compatible spatial conformation as it folds in an intuitive and easy-to-use interface. PolyFold integrates highly convergent stochastic optimization algorithms with on-demand customizations and interactive manipulations to maximally satisfy the geometric constraints imposed by a distance matrix. PolyFold is capable of simulating the complex process of protein folding even on modest personal computers, thus making it accessible to the general public for fostering citizen science. Open source code of PolyFold is freely available for download at https://github.com/Bhattacharya-Lab/PolyFold. It is implemented in cross-platform Java and binary executables are available for macOS, Linux, and Windows.
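The core idea of distance-based folding, moving points in space until their pairwise distances match a target distance matrix, can be sketched in a few lines. The code below is not PolyFold's algorithm: the abstract mentions stochastic optimization, whereas this sketch swaps in plain gradient descent on a squared-error "stress" function. The names `stress` and `fold`, the step size, and the six-point reference curve are all assumptions chosen for illustration.

```python
import math
import random


def stress(coords, target):
    """Sum of squared differences between current and target pairwise distances."""
    n = len(coords)
    return sum(
        (math.dist(coords[i], coords[j]) - target[i][j]) ** 2
        for i in range(n)
        for j in range(i + 1, n)
    )


def fold(target, steps=300, lr=0.01, seed=0):
    """Gradient descent on stress, starting from seeded random 3D coordinates."""
    rng = random.Random(seed)
    n = len(target)
    coords = [[rng.uniform(-1, 1) for _ in range(3)] for _ in range(n)]
    for _ in range(steps):
        grads = [[0.0] * 3 for _ in range(n)]
        for i in range(n):
            for j in range(i + 1, n):
                r = math.dist(coords[i], coords[j]) or 1e-9  # avoid dividing by zero
                g = 2.0 * (r - target[i][j]) / r
                for k in range(3):
                    d = g * (coords[i][k] - coords[j][k])
                    grads[i][k] += d
                    grads[j][k] -= d
        for i in range(n):
            for k in range(3):
                coords[i][k] -= lr * grads[i][k]
    return coords


# Hypothetical target: distances measured on a short helix-like reference curve.
ref = [(math.cos(t), math.sin(t), 0.5 * t) for t in range(6)]
target = [[math.dist(a, b) for b in ref] for a in ref]

start = fold(target, steps=0)  # the seeded random starting conformation
end = fold(target)             # the conformation after optimization
print(stress(start, target), stress(end, target))
```

Because a distance matrix is unchanged by rotation, translation, and reflection, any conformation recovered this way matches the reference only up to those transformations.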
Affiliation(s)
- Andrew J. McGehee
- Department of Computer Science and Software Engineering, Auburn University, Auburn, AL, United States of America
| | - Sutanu Bhattacharya
- Department of Computer Science and Software Engineering, Auburn University, Auburn, AL, United States of America
| | - Rahmatullah Roche
- Department of Computer Science and Software Engineering, Auburn University, Auburn, AL, United States of America
| | - Debswapna Bhattacharya
- Department of Computer Science and Software Engineering, Auburn University, Auburn, AL, United States of America
- Department of Biological Sciences, Auburn University, Auburn, AL, United States of America
|
89
|
Abstract
For two decades, Rosetta has consistently been at the forefront of protein structure prediction. While it has become a very large package comprising programs, scripts, and tools for different types of macromolecular modelling such as ligand docking, protein-protein docking, protein design, and loop modelling, it started as the implementation of an algorithm for ab initio protein structure prediction. The term 'Rosetta' appeared for the first time twenty years ago in the literature to describe that algorithm and its contribution to the third edition of the community-wide Critical Assessment of techniques for protein Structure Prediction (CASP3). Similar to the Rosetta stone that allowed deciphering the ancient Egyptian civilisation, David Baker and his co-workers have been contributing to deciphering 'the second half of the genetic code'. Although the focus of Baker's team has expanded to de novo protein design in the past few years, Rosetta's 'fame' is associated with its fragment-assembly protein structure prediction approach. Following a presentation of the main concepts underpinning its foundation, especially sequence-structure correlation and usage of fragments, we review the main stages of its developments and highlight the milestones it has achieved in terms of protein structure prediction, particularly in CASP.
Affiliation(s)
- Jad Abbass
- Department of Computer Science, Lebanese International University, Bekaa, Lebanon
| | - Jean-Christophe Nebel
- Faculty of Science, Engineering and Computing, Kingston University, London, KT1 2EE, United Kingdom
|
90
|
Ikuta T, Shihoya W, Sugiura M, Yoshida K, Watari M, Tokano T, Yamashita K, Katayama K, Tsunoda SP, Uchihashi T, Kandori H, Nureki O. Structural insights into the mechanism of rhodopsin phosphodiesterase. Nat Commun 2020; 11:5605. [PMID: 33154353 PMCID: PMC7644710 DOI: 10.1038/s41467-020-19376-7] [Citation(s) in RCA: 24] [Impact Index Per Article: 6.0] [Received: 04/06/2020] [Accepted: 10/07/2020] [Indexed: 02/06/2023]
Abstract
Rhodopsin phosphodiesterase (Rh-PDE) is an enzyme rhodopsin belonging to a recently discovered class of microbial rhodopsins with light-dependent enzymatic activity. Rh-PDE consists of the N-terminal rhodopsin domain and C-terminal phosphodiesterase (PDE) domain, connected by a 76-residue linker, and hydrolyzes both cAMP and cGMP in a light-dependent manner. Thus, Rh-PDE has potential for the optogenetic manipulation of cyclic nucleotide concentrations, as a complementary tool to rhodopsin guanylyl cyclase and photosensitive adenylyl cyclase. Here we present structural and functional analyses of the Rh-PDE derived from Salpingoeca rosetta. The crystal structure of the rhodopsin domain at 2.6 Å resolution revealed a new topology of rhodopsins, with 8 TMs including the N-terminal extra TM, TM0. Mutational analyses demonstrated that TM0 plays a crucial role in the enzymatic photoactivity. We further solved the crystal structures of the rhodopsin domain (3.5 Å) and PDE domain (2.1 Å) with their connecting linkers, which showed a rough sketch of the full-length Rh-PDE. Integrating these structures, we proposed a model of full-length Rh-PDE, based on the HS-AFM observations and computational modeling of the linker region. These findings provide insight into the photoactivation mechanisms of other 8-TM enzyme rhodopsins and expand the definition of rhodopsins.
Affiliation(s)
- Tatsuya Ikuta
- Department of Biological Sciences, Graduate School of Science, The University of Tokyo, Bunkyo, Tokyo, 113-0033, Japan
| | - Wataru Shihoya
- Department of Biological Sciences, Graduate School of Science, The University of Tokyo, Bunkyo, Tokyo, 113-0033, Japan.
| | - Masahiro Sugiura
- Department of Life Science and Applied Chemistry, Nagoya Institute of Technology, Showa-Ku, Nagoya, 466-8555, Japan
| | - Kazuho Yoshida
- Department of Life Science and Applied Chemistry, Nagoya Institute of Technology, Showa-Ku, Nagoya, 466-8555, Japan
| | - Masahito Watari
- Department of Life Science and Applied Chemistry, Nagoya Institute of Technology, Showa-Ku, Nagoya, 466-8555, Japan
| | - Takaya Tokano
- Department of Physics, Nagoya University, Nagoya, 464-8602, Japan
| | - Keitaro Yamashita
- Department of Biological Sciences, Graduate School of Science, The University of Tokyo, Bunkyo, Tokyo, 113-0033, Japan
| | - Kota Katayama
- Department of Life Science and Applied Chemistry, Nagoya Institute of Technology, Showa-Ku, Nagoya, 466-8555, Japan
- OptoBioTechnology Research Center, Nagoya Institute of Technology, Showa-Ku, Nagoya, 466-8555, Japan
| | - Satoshi P Tsunoda
- Department of Life Science and Applied Chemistry, Nagoya Institute of Technology, Showa-Ku, Nagoya, 466-8555, Japan
- OptoBioTechnology Research Center, Nagoya Institute of Technology, Showa-Ku, Nagoya, 466-8555, Japan
| | - Takayuki Uchihashi
- Department of Physics, Nagoya University, Nagoya, 464-8602, Japan
- Exploratory Research Center on Life and Living Systems (ExCELLS), National Institutes of Natural Sciences, Okazaki, 444-8787, Japan
| | - Hideki Kandori
- Department of Life Science and Applied Chemistry, Nagoya Institute of Technology, Showa-Ku, Nagoya, 466-8555, Japan.
- OptoBioTechnology Research Center, Nagoya Institute of Technology, Showa-Ku, Nagoya, 466-8555, Japan.
| | - Osamu Nureki
- Department of Biological Sciences, Graduate School of Science, The University of Tokyo, Bunkyo, Tokyo, 113-0033, Japan.
|
91
|
Wen B, Zeng W, Liao Y, Shi Z, Savage SR, Jiang W, Zhang B. Deep Learning in Proteomics. Proteomics 2020; 20:e1900335. [PMID: 32939979 PMCID: PMC7757195 DOI: 10.1002/pmic.201900335] [Citation(s) in RCA: 70] [Impact Index Per Article: 17.5] [Received: 05/27/2020] [Revised: 09/14/2020] [Indexed: 12/17/2022]
Abstract
Proteomics, the study of all the proteins in biological systems, is becoming a data-rich science. Protein sequences and structures are comprehensively catalogued in online databases. With recent advancements in tandem mass spectrometry (MS) technology, protein expression and post-translational modifications (PTMs) can be studied in a variety of biological systems at the global scale. Sophisticated computational algorithms are needed to translate the vast amount of data into novel biological insights. Deep learning automatically extracts data representations at high levels of abstraction from data, and it thrives in data-rich scientific research domains. Here, a comprehensive overview of deep learning applications in proteomics, including retention time prediction, MS/MS spectrum prediction, de novo peptide sequencing, PTM prediction, major histocompatibility complex-peptide binding prediction, and protein structure prediction, is provided. Limitations and the future directions of deep learning in proteomics are also discussed. This review will provide readers an overview of deep learning and how it can be used to analyze proteomics data.
Affiliation(s)
- Bo Wen
- Lester and Sue Smith Breast Center, Baylor College of Medicine, Houston, TX 77030, USA
- Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX 77030, USA
| | - Wen‐Feng Zeng
- Key Lab of Intelligent Information Processing of Chinese Academy of Sciences (CAS), Chinese Academy of Sciences, Institute of Computing Technology, Beijing 100190, China
| | - Yuxing Liao
- Lester and Sue Smith Breast Center, Baylor College of Medicine, Houston, TX 77030, USA
- Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX 77030, USA
| | - Zhiao Shi
- Lester and Sue Smith Breast Center, Baylor College of Medicine, Houston, TX 77030, USA
- Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX 77030, USA
| | - Sara R. Savage
- Lester and Sue Smith Breast Center, Baylor College of Medicine, Houston, TX 77030, USA
- Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX 77030, USA
| | - Wen Jiang
- Lester and Sue Smith Breast Center, Baylor College of Medicine, Houston, TX 77030, USA
- Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX 77030, USA
| | - Bing Zhang
- Lester and Sue Smith Breast Center, Baylor College of Medicine, Houston, TX 77030, USA
- Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX 77030, USA
| |
|
92
|
Zhang GJ, Wang XQ, Ma LF, Wang LJ, Hu J, Zhou XG. Two-Stage Distance Feature-based Optimization Algorithm for De novo Protein Structure Prediction. IEEE/ACM Transactions on Computational Biology and Bioinformatics 2020; 17:2119-2130. [PMID: 31107659 DOI: 10.1109/tcbb.2019.2917452] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 06/09/2023]
Abstract
De novo protein structure prediction can be treated as a conformational space optimization problem under the guidance of an energy function. However, it remains a challenge to design an accurate energy function that ensures low-energy conformations are close to native structures. Fortunately, recent studies have shown that the accuracy of de novo protein structure prediction can be significantly improved by integrating residue-residue distance information. In this paper, a two-stage distance feature-based optimization algorithm (TDFO) for de novo protein structure prediction is proposed within the framework of an evolutionary algorithm. In TDFO, a similarity model is first designed using feature information extracted from distance profiles by the bisecting K-means algorithm. A similarity model-based selection strategy is then developed to guide the conformation search and thus improve the quality of the predicted models. Moreover, global and local mutation strategies are designed, and a state estimation strategy is proposed to strike a trade-off between exploration and exploitation of the search space. Experimental results on 35 benchmark proteins show that the proposed TDFO can improve prediction accuracy for a large portion of the test proteins.
|
93
|
The Last Secret of Protein Folding: The Real Relationship Between Long-Range Interactions and Local Structures. Protein J 2020; 39:422-433. [PMID: 33040262 DOI: 10.1007/s10930-020-09925-w] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Accepted: 10/03/2020] [Indexed: 01/20/2023]
Abstract
The protein folding problem has been extensively studied for decades, and hundreds of thousands of protein structures have been solved. Yet, how proteins fold from a linear peptide chain to their unique 3D structures is not fully understood. With key clues having emerged unexpectedly from the field of nanoscience, a "Confined Lowest Energy Fragment" (CLEF) hypothesis was proposed. The CLEF hypothesis states that a protein chain can be divided into CLEFs, the semi-independent folding units, by a small number of key residues that form key long-range interactions. The native structure of a CLEF is the lowest energy state under the constraints of the key long-range interactions, but the native structure of the whole protein is not necessarily the lowest energy state, as Anfinsen's thermodynamic hypothesis suggested. The CLEF hypothesis proposes a unified CLEF mechanism for protein folding, basically a two-step process. In the first step, the favorable enthalpy of CLEFs for native structures quickly brings the residues for the key long-range interactions together, forming intermediates corresponding to the so-called hydrophobic collapse. In the second step, those collapsed key residues shuffle for the right combination to form the native key long-range interactions. The CLEF hypothesis provides a simple solution to all protein folding paradoxes, and proposes a "CLEF Age" or "Stone Age" for the prebiotic evolution of proteins.
|
94
|
Liu J, Zhou XG, Zhang Y, Zhang GJ. CGLFold: a contact-assisted de novo protein structure prediction using global exploration and loop perturbation sampling algorithm. Bioinformatics 2020; 36:2443-2450. [PMID: 31860059 DOI: 10.1093/bioinformatics/btz943] [Citation(s) in RCA: 21] [Impact Index Per Article: 5.3] [Received: 07/03/2019] [Revised: 12/10/2019] [Accepted: 12/18/2019] [Indexed: 12/27/2022] Open
Abstract
MOTIVATION Regions that connect secondary structure elements in a protein are known as loops; a slight change in a loop can have a dramatic effect on the entire topology. This study investigates whether the accuracy of protein structure prediction can be improved using a loop-specific sampling strategy. RESULTS A novel de novo protein structure prediction method that combines global exploration and loop perturbation is proposed in this study. In the global exploration phase, fragment recombination and assembly are used to explore the massive conformational space and generate native-like topologies. In the loop perturbation phase, a loop-specific local perturbation model is designed to improve the accuracy of the conformation and is solved by a differential evolution algorithm. These two phases enable cooperation between global exploration and local exploitation. The filtered contact information is used to construct the conformation selection model that guides the sampling. The proposed CGLFold is tested on 145 benchmark proteins, 14 free modeling (FM) targets of CASP13 and 29 FM targets of CASP12. The experimental results show that the loop-specific local perturbation can increase the structure diversity and the success rate of conformational updates and gradually improve conformation accuracy. CGLFold obtains models with a template modeling score ≥ 0.5 on 95 standard test proteins, 7 FM targets of CASP13 and 9 FM targets of CASP12. AVAILABILITY AND IMPLEMENTATION The source code and executable versions are freely available at https://github.com/iobio-zjut/CGLFold. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Jun Liu
- College of Information Engineering, Zhejiang University of Technology, Hangzhou 310023, China
| | - Xiao-Gen Zhou
- Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, MI 48109-2218, USA
| | - Yang Zhang
- Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, MI 48109-2218, USA
| | - Gui-Jun Zhang
- College of Information Engineering, Zhejiang University of Technology, Hangzhou 310023, China
| |
|
95
|
Du Z, Pan S, Wu Q, Peng Z, Yang J. CATHER: a novel threading algorithm with predicted contacts. Bioinformatics 2020; 36:2119-2125. [PMID: 31790141 DOI: 10.1093/bioinformatics/btz876] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.0] [Received: 04/30/2019] [Revised: 10/31/2019] [Accepted: 11/28/2019] [Indexed: 11/14/2022] Open
Abstract
MOTIVATION Threading is one of the most effective methods for protein structure prediction. In recent years, the increasing accuracy of protein contact map prediction has opened a new avenue to improve the performance of threading algorithms. Several preliminary studies suggest that with predicted contacts, the performance of threading algorithms can be improved greatly. There is still much room to make better use of predicted contacts. RESULTS We have developed a new contact-assisted threading algorithm named CATHER that uses both conventional sequential profiles and a contact map predicted by a deep learning-based algorithm. Benchmark tests on an independent test set and the CASP12 targets demonstrated that CATHER achieved a significant improvement over other methods that use only the sequential profile or only the predicted contact map. Our method ranked in the Top 10 among all 39 participating server groups on the 32 free modeling targets in the blind tests of the CASP13 experiment. These data suggest that using predicted contacts is a promising way to advance threading algorithms. AVAILABILITY AND IMPLEMENTATION http://yanglab.nankai.edu.cn/CATHER/. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Zongyang Du
- School of Mathematical Sciences, Nankai University, Tianjin 300071, China
| | - Shuo Pan
- School of Mathematical Sciences, Nankai University, Tianjin 300071, China
| | - Qi Wu
- School of Mathematical Sciences, Nankai University, Tianjin 300071, China
| | - Zhenling Peng
- Center for Applied Mathematics, Tianjin University, Tianjin 300072, China
| | - Jianyi Yang
- School of Mathematical Sciences, Nankai University, Tianjin 300071, China
| |
|
96
|
Shao J, Liu B. ProtFold-DFG: protein fold recognition by combining Directed Fusion Graph and PageRank algorithm. Brief Bioinform 2020; 22:5901980. [PMID: 32892224 DOI: 10.1093/bib/bbaa192] [Citation(s) in RCA: 28] [Impact Index Per Article: 7.0] [Received: 06/25/2020] [Revised: 07/16/2020] [Accepted: 07/28/2020] [Indexed: 12/27/2022] Open
Abstract
As one of the most important tasks in protein structure prediction, protein fold recognition has attracted more and more attention. Several computational predictors have been proposed with the development of machine learning and artificial intelligence techniques. However, these existing computational methods still suffer from some disadvantages. To address this, we propose a new network-based predictor called ProtFold-DFG for protein fold recognition. We propose the Directed Fusion Graph (DFG) to fuse the ranking lists generated by different methods; it employs the transitive closure to incorporate more relationships among proteins and uses the KL divergence to calculate the relationship between two proteins so as to improve its generalization ability. Finally, the PageRank algorithm is performed on the DFG to accurately recognize the protein folds by considering the global interactions among proteins in the DFG. Tested on the LINDAHL dataset, a widely used and rigorous benchmark, ProtFold-DFG outperforms the other 35 competing methods, indicating that ProtFold-DFG will be a useful method for protein fold recognition. The source code and data of ProtFold-DFG can be downloaded from http://bliulab.net/ProtFold-DFG/download.
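The final step named in the abstract above, running PageRank over a directed graph, can be sketched as a plain power iteration. This is a generic illustration only: the toy graph, the node names, the damping factor of 0.85 and the function name `pagerank` are assumptions, and the sketch does not reproduce how ProtFold-DFG builds or weights its Directed Fusion Graph.

```python
def pagerank(edges, damping=0.85, tol=1e-10, max_iter=200):
    """Power-iteration PageRank on a directed graph given as {node: [successors]}."""
    nodes = sorted(set(edges) | {v for targets in edges.values() for v in targets})
    n = len(nodes)
    rank = {node: 1.0 / n for node in nodes}
    for _ in range(max_iter):
        # Rank held by nodes without outgoing edges is spread uniformly.
        dangling = sum(rank[u] for u in nodes if not edges.get(u))
        new_rank = {node: (1.0 - damping) / n + damping * dangling / n for node in nodes}
        for u in nodes:
            targets = edges.get(u, [])
            if targets:
                # Each node passes an equal share of its rank to every successor.
                share = damping * rank[u] / len(targets)
                for v in targets:
                    new_rank[v] += share
        converged = sum(abs(new_rank[x] - rank[x]) for x in nodes) < tol
        rank = new_rank
        if converged:
            break
    return rank


# Hypothetical toy graph: an edge u -> v means "u ranks v as a likely match".
toy_graph = {
    "query": ["foldA", "foldB"],
    "foldA": ["foldB"],
    "foldB": ["foldA"],
    "foldC": ["foldA"],
}
scores = pagerank(toy_graph)
```

Nodes that receive links from many well-ranked nodes end up with higher scores, which is the sense in which PageRank accounts for global interactions in the graph rather than only pairwise ones.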
Affiliation(s)
- Jiangyi Shao
- School of Computer Science and Technology, Beijing Institute of Technology, China
| | - Bin Liu
- School of Computer Science and Technology, Beijing Institute of Technology, Beijing, China
| |
|
97
|
Postic G, Janel N, Tufféry P, Moroy G. An information gain-based approach for evaluating protein structure models. Comput Struct Biotechnol J 2020; 18:2228-2236. [PMID: 32837711 PMCID: PMC7431362 DOI: 10.1016/j.csbj.2020.08.013] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Received: 03/01/2020] [Revised: 08/06/2020] [Accepted: 08/07/2020] [Indexed: 12/23/2022] Open
Abstract
For three decades now, knowledge-based scoring functions that operate through the "potential of mean force" (PMF) approach have continuously proven useful for studying protein structures. Although these statistical potentials are not to be confused with their physics-based counterparts of the same name (i.e., PMFs obtained by molecular dynamics simulations), their particular success in assessing the native-like character of protein structure predictions has led authors to consider the computed scores as approximations of the free energy. However, this physical justification has been a matter of controversy from the beginning. Alternative interpretations based on Bayes' theorem have been proposed, but the misleading formalism that invokes the inverse Boltzmann law remains recurrent in the literature. In this article, we present a conceptually new method for ranking protein structure models by quality, which is (i) independent of any physics-based explanation and (ii) relevant to statistics and to a general definition of information gain. The theoretical development described in this study provides new insights into how statistical PMFs work, in comparison with our approach. To prove the concept, we have built interatomic distance-dependent scoring functions, based on the former and new equations, and compared their performance on an independent benchmark of 60,000 protein structures. The results demonstrate that our new formalism outperforms statistical PMFs in evaluating the quality of protein structural decoys. Therefore, this original type of score offers a possibility to improve the success of statistical PMFs in the various fields of structural biology where they are applied. The open-source code is available for download at https://gitlab.rpbs.univ-paris-diderot.fr/src/ig-score.
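For orientation, the conventional statistical PMF that the abstract above contrasts with its new score is usually written in the inverse Boltzmann form, score = -ln(f_obs / f_ref) per distance bin, in units of kT. The sketch below illustrates only that conventional form under stated assumptions: the bin counts are invented, the function name `statistical_pmf` is hypothetical, and the authors' information gain-based equation is not given in the abstract and is not reproduced here.

```python
import math


def statistical_pmf(observed_counts, reference_counts):
    """Per-bin score -ln(f_obs / f_ref) in units of kT; bins with a zero count are skipped."""
    total_obs = sum(observed_counts)
    total_ref = sum(reference_counts)
    scores = {}
    for bin_index, (n_obs, n_ref) in enumerate(zip(observed_counts, reference_counts)):
        if n_obs == 0 or n_ref == 0:
            continue  # the log ratio is undefined without counts in both distributions
        f_obs = n_obs / total_obs
        f_ref = n_ref / total_ref
        scores[bin_index] = -math.log(f_obs / f_ref)
    return scores


# Invented counts for three distance bins of one atom-pair type.
toy_scores = statistical_pmf([10, 40, 50], [30, 40, 30])
```

A bin observed less often than in the reference state receives a positive (unfavorable) score, and a bin observed more often receives a negative (favorable) one.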
Affiliation(s)
- Guillaume Postic
- Université de Paris, BFA, UMR 8251, CNRS, ERL U1133, Inserm, F-75013 Paris, France; Université de Paris, BFA, UMR 8251, CNRS, F-75013 Paris, France; Institut Français de Bioinformatique (IFB), UMS 3601-CNRS, Université Paris-Saclay, Orsay, France; Ressource Parisienne en Bioinformatique Structurale (RPBS), Paris, France
| | - Nathalie Janel
- Université de Paris, BFA, UMR 8251, CNRS, F-75013 Paris, France
| | - Pierre Tufféry
- Université de Paris, BFA, UMR 8251, CNRS, ERL U1133, Inserm, F-75013 Paris, France; Ressource Parisienne en Bioinformatique Structurale (RPBS), Paris, France
| | - Gautier Moroy
- Université de Paris, BFA, UMR 8251, CNRS, ERL U1133, Inserm, F-75013 Paris, France
| |
|
98
|
Abstract
Atom pairwise potential functions make up an essential part of many scoring functions for protein decoy detection. With the development of machine learning (ML) tools, there are multiple ways to combine potential functions to create novel ML models and methods. Potential function parameters can be easily extracted; however, it is usually hard to directly obtain the calculated atom pairwise energies from scoring functions. Amber, as one of the most popular suites of modeling programs, has an extensive history and library of force field potential functions. In this work, we directly used the force field parameters in ff94 and ff14SB from Amber and encoded them to calculate atom pairwise energies for different interactions. Two sets of structures (a single amino acid set and a dipeptide set) were used to evaluate the performance of our encoded Amber potentials. Comparing the energy terms obtained from our encoding with those from Amber, we find energy differences within ±0.06 kcal/mol for all tested structures. Previously we have shown that the Random Forest (RF) model can help to emphasize more important atom pairwise interactions and ignore insignificant ones [Pei, J.; Zheng, Z.; Merz, K. M. J. Chem. Inf. Model. 2019, 59, 1919-1929]. Here, as an example of combining ML methods with traditional potential functions, we followed the same workflow to combine the RF models with force field potential functions from Amber. To determine the performance of our RF models with force field potential functions, 224 different protein native-decoy systems were used as our training and testing sets. We find that the RF models with ff94 and ff14SB force field parameters outperformed all other scoring functions (RF models with KECSA2, RWplus, DFIRE, dDFIRE, and GOAP) considered in this work for native structure detection, and they performed similarly in detecting the best decoy. Through inclusion of best decoy to decoy comparisons in building our RF models, we were able to generate models that outperformed the score functions tested herein on both accuracy and best decoy detection, again showing the performance and flexibility of our RF models to tackle this problem. Finally, the importance of the RF algorithm and of the force field parameters was also tested, and the comparison results suggest that both are important, with the ML scoring function achieving its best performance only when they are combined. All code and data used in this work are available at https://github.com/JunPei000/FFENCODER_for_Protein_Folding_Pose_Selection.
Affiliation(s)
- Jun Pei
- Department of Chemistry and the Department of Biochemistry and Molecular Biology, Michigan State University, 578 South Shaw Lane, East Lansing, Michigan 48824, United States
| | - Lin Frank Song
- Department of Chemistry and the Department of Biochemistry and Molecular Biology, Michigan State University, 578 South Shaw Lane, East Lansing, Michigan 48824, United States
| | - Kenneth M Merz
- Department of Chemistry and the Department of Biochemistry and Molecular Biology, Michigan State University, 578 South Shaw Lane, East Lansing, Michigan 48824, United States
| |
|
99
|
Gong Z, Ye SX, Tang C. Tightening the Crosslinking Distance Restraints for Better Resolution of Protein Structure and Dynamics. Structure 2020; 28:1160-1167.e3. [PMID: 32763142 DOI: 10.1016/j.str.2020.07.010] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.0] [Received: 05/20/2020] [Revised: 07/04/2020] [Accepted: 07/21/2020] [Indexed: 12/11/2022]
Abstract
Chemical crosslinking coupled with mass spectrometry (CXMS) has been increasingly used in structural biology. CXMS distance restraints are usually applied to Cα or Cβ atoms of the crosslinked residues, with upper bounds typically over 20 Å. The incorporation of loose CXMS restraints only marginally improves the resolution of the calculated structures. Here, we present a revised format of CXMS distance restraints, which works by first modifying the crosslinked residue with a rigid extension derived from the crosslinker. With the flexible side chain explicitly represented, the reformatted restraint can be applied to the modification group instead, with an upper bound of 6 Å or less. The short distance restraint can be represented and back-calculated simply with a straight line. The use of tighter restraints not only affords better-resolved structures but also uncovers protein dynamics. Together, our approach enables more information to be extracted from the CXMS data.
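As a rough illustration of the tightened restraint described in the abstract above, the sketch below scores a single restraint as a flat-bottom upper bound on a straight-line distance. The 6 Å bound comes from the abstract; the function name `restraint_violation`, the example coordinates and the linear penalty form are assumptions for illustration, not the authors' implementation.

```python
import math


def restraint_violation(p, q, upper_bound=6.0):
    """Return by how many Å the straight-line distance between p and q exceeds the upper bound."""
    distance = math.dist(p, q)  # Euclidean distance between two 3D points
    return max(0.0, distance - upper_bound)


# Hypothetical coordinates (in Å) of the two modification groups joined by a crosslink.
satisfied = restraint_violation((0.0, 0.0, 0.0), (3.0, 4.0, 0.0))   # distance 5 Å
violated = restraint_violation((0.0, 0.0, 0.0), (0.0, 0.0, 10.0))   # distance 10 Å
```

With a bound above 20 Å, both example pairs would count as satisfied, which shows why a looser restraint carries less structural information.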
Affiliation(s)
- Zhou Gong
- CAS Key Laboratory of Magnetic Resonance in Biological Systems, State Key Laboratory of Magnetic Resonance and Atomic Molecular Physics, National Center for Magnetic Resonance at Wuhan, Wuhan Institute of Physics and Mathematics, Innovation Academy for Precision Measurement Science and Technology, Chinese Academy of Sciences, Wuhan, Hubei Province 430071, China
| | - Shang-Xiang Ye
- CAS Key Laboratory of Magnetic Resonance in Biological Systems, State Key Laboratory of Magnetic Resonance and Atomic Molecular Physics, National Center for Magnetic Resonance at Wuhan, Wuhan Institute of Physics and Mathematics, Innovation Academy for Precision Measurement Science and Technology, Chinese Academy of Sciences, Wuhan, Hubei Province 430071, China; Wuhan National Laboratory for Optoelectronics, Huazhong University of Science and Technology, Wuhan, Hubei Province 430074, China
| | - Chun Tang
- CAS Key Laboratory of Magnetic Resonance in Biological Systems, State Key Laboratory of Magnetic Resonance and Atomic Molecular Physics, National Center for Magnetic Resonance at Wuhan, Wuhan Institute of Physics and Mathematics, Innovation Academy for Precision Measurement Science and Technology, Chinese Academy of Sciences, Wuhan, Hubei Province 430071, China; Wuhan National Laboratory for Optoelectronics, Huazhong University of Science and Technology, Wuhan, Hubei Province 430074, China; Beijing National Laboratory for Molecular Sciences, College of Chemistry and Molecular Engineering, and Peking-Tsinghua Center for Life Sciences, Peking University, Beijing 100871, China.
| |
|
100
|
Watkins AM, Rangan R, Das R. FARFAR2: Improved De Novo Rosetta Prediction of Complex Global RNA Folds. Structure 2020; 28:963-976.e6. [PMID: 32531203 PMCID: PMC7415647 DOI: 10.1016/j.str.2020.05.011] [Citation(s) in RCA: 100] [Impact Index Per Article: 25.0] [Received: 09/10/2019] [Revised: 04/27/2020] [Accepted: 05/20/2020] [Indexed: 01/01/2023]
Abstract
Predicting RNA three-dimensional structures from sequence could accelerate understanding of the growing number of RNA molecules being discovered across biology. Rosetta's Fragment Assembly of RNA with Full-Atom Refinement (FARFAR) has shown promise in community-wide blind RNA-Puzzle trials, but lack of a systematic and automated benchmark has left unclear what limits FARFAR performance. Here, we benchmark FARFAR2, an algorithm integrating RNA-Puzzle-inspired innovations with updated fragment libraries and helix modeling. In 16 of 21 RNA-Puzzles revisited without experimental data or expert intervention, FARFAR2 recovers native-like structures more accurate than models submitted during the RNA-Puzzles trials. Remaining bottlenecks include conformational sampling for >80-nucleotide problems and scoring function limitations more generally. Supporting these conclusions, preregistered blind models for adenovirus VA-I RNA and five riboswitch complexes predicted native-like folds with 3- to 14 Å root-mean-square deviation accuracies. We present a FARFAR2 webserver and three large model archives (FARFAR2-Classics, FARFAR2-Motifs, and FARFAR2-Puzzles) to guide future applications and advances.
Affiliation(s)
- Andrew Martin Watkins
- Department of Biochemistry, Stanford University School of Medicine, Stanford, CA 94305, USA
| | - Ramya Rangan
- Biophysics Program, Stanford University, Stanford, CA 94305, USA
| | - Rhiju Das
- Department of Biochemistry, Stanford University School of Medicine, Stanford, CA 94305, USA; Biophysics Program, Stanford University, Stanford, CA 94305, USA.
| |
|