1
Gniewek P, Kolinski A, Kloczkowski A, Gront D. BioShell-Threading: versatile Monte Carlo package for protein 3D threading. BMC Bioinformatics 2014; 15:22. [PMID: 24444459 PMCID: PMC3937128 DOI: 10.1186/1471-2105-15-22] [Citation(s) in RCA: 14] [Impact Index Per Article: 1.4] [Received: 11/25/2012] [Accepted: 11/18/2013] [Indexed: 11/26/2022]
Abstract
Background: The comparative modeling approach to protein structure prediction inherently relies on a template structure. Before building a model, such a template protein has to be found and aligned with the query sequence. Any error made at this stage may dramatically affect the quality of the result. There is a need, therefore, to develop accurate and sensitive alignment protocols. Results: BioShell threading software is a versatile tool for aligning protein structures, protein sequences or sequence profiles, and query sequences to template structures. The software is also capable of sub-optimal alignment generation. It can be executed as an application from the UNIX command line, or as a set of Java classes called from a script or a Java application. The implemented Monte Carlo search engine greatly facilitates the development and benchmarking of new alignment scoring schemes even when the functions exhibit non-deterministic polynomial-time complexity. Conclusions: Numerical experiments indicate that the new threading application offers template detection abilities and provides much better alignments than other methods. The package along with documentation and examples is available at: http://bioshell.pl/threading3d.
Affiliation(s)
- Dominik Gront
- Laboratory of Theory of Biopolymers, Faculty of Chemistry, University of Warsaw, Pasteura 1, 02-093 Warsaw, Poland.
2
Sun W, He J. From isotropic to anisotropic side chain representations: comparison of three models for residue contact estimation. PLoS One 2011; 6:e19238. [PMID: 21552527 PMCID: PMC3084275 DOI: 10.1371/journal.pone.0019238] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.0] [Received: 11/10/2010] [Accepted: 03/29/2011] [Indexed: 11/19/2022]
Abstract
The criterion to determine residue contact is a fundamental problem in deriving knowledge-based mean-force potential energy calculations for protein structures. A frequently used criterion is to require the side chain center-to-center distance or the -to- atom distance to be within a pre-determined cutoff distance. However, the spatially anisotropic nature of the side chain makes it challenging to identify the contact pairs. This study compares three side chain contact models: the Atom Distance Criteria (ADC) model, the Isotropic Sphere Side chain (ISS) model and the Anisotropic Ellipsoid Side chain (AES) model, using 424 high resolution protein structures in the Protein Data Bank. The results indicate that the ADC model is the most accurate and the ISS model is the least accurate. The AES model eliminates about 95% of the incorrectly counted contact pairs in the ISS model. Algorithm analysis shows that the AES model is the most computationally intensive while the ADC model has moderate computational cost. We derived a dataset of the contact pairs mis-estimated by the AES model. The most misjudged pairs are Arg-Glu, Arg-Asp and Arg-Tyr. Such a dataset can be useful for developing an improved AES model by incorporating pair-specific information for the cutoff distance.
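The center-to-center cutoff criterion described in this abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names, the 6.5 Å cutoff, the minimum sequence separation, and the toy coordinates are all assumptions chosen for illustration, and the sketch corresponds only to the simplest (isotropic) criterion, not to the ADC or AES models.

```python
import math

def in_contact(center_a, center_b, cutoff=6.5):
    """Return True if two side chain centers lie within `cutoff` angstroms.

    The 6.5 A default is an illustrative assumption, not a value from the paper.
    """
    return math.dist(center_a, center_b) <= cutoff

def contact_pairs(centers, cutoff=6.5, min_separation=2):
    """List residue index pairs (i, j), j >= i + min_separation, that are in contact."""
    pairs = []
    for i in range(len(centers)):
        for j in range(i + min_separation, len(centers)):
            if in_contact(centers[i], centers[j], cutoff):
                pairs.append((i, j))
    return pairs

# Hypothetical side chain centers (x, y, z) in angstroms.
centers = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (5.0, 3.0, 0.0), (20.0, 0.0, 0.0)]
print(contact_pairs(centers))  # prints [(0, 2)]
```

A single isotropic cutoff like this treats every side chain as a sphere, which is the simplification the anisotropic model in the abstract is designed to avoid.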
Affiliation(s)
- Weitao Sun
- Zhou Pei-Yuan Center for Applied Mathematics, Tsinghua University, Beijing, China.
3
Castiglione F, Santoni D, Rapin N. CTLs' repertoire shaping in the thymus: a Monte Carlo simulation. Autoimmunity 2011; 44:261-70. [PMID: 21244330 DOI: 10.3109/08916934.2011.523272] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.6] [Indexed: 11/13/2022]
Abstract
MOTIVATION: The human immune system evolved a multi-layered control mechanism to eliminate self-reactive cells. Of these so-called tolerance induction mechanisms, T lymphocyte education in the thymus gland represents the very first one. This complicated process is not fully understood and quantitative models able to help in this endeavor are lacking. Here, we present a stochastic computational model of the thymus which combines data-driven prediction methods and a novel method based on protein-protein potential measurements for assessing molecular binding among cell receptors, major histocompatibility complex (MHC) molecules, and self-peptides. RESULTS: Of all possible specificities of immature T cells entering the thymus, only a small fraction is actually selected for maturation. Monte Carlo simulations of thymocyte selection in the thymus are performed varying the size of the self and a parameter determining the number of encounters with antigen-presenting cells (APCs). We score the fraction of self-reacting thymocytes leaving the thymus as mature naive T cells and show that self-reactivity is only marginally dependent on the number of self-molecules presented by APCs, while it is strongly affected by a parameter proportional to the time spent in the thymus. We study how this measure changes when we vary the number of MHC alleles and find an optimal number not too different from what we have in reality. The main result of this study is more methodological than biological, as we show that immunoinformatics data and methods can be used in systemic level simulation of immune processes.
Affiliation(s)
- F Castiglione
- Istituto per le Applicazioni del Calcolo "M. Picone" (IAC), Consiglio Nazionale delle Ricerche (CNR), 00185 Rome, Italy.
4
Rapin N, Lund O, Bernaschi M, Castiglione F. Computational immunology meets bioinformatics: the use of prediction tools for molecular binding in the simulation of the immune system. PLoS One 2010; 5:e9862. [PMID: 20419125 PMCID: PMC2855701 DOI: 10.1371/journal.pone.0009862] [Citation(s) in RCA: 514] [Impact Index Per Article: 36.7] [Received: 12/01/2009] [Accepted: 02/19/2010] [Indexed: 01/21/2023]
Abstract
We present a new approach to the study of the immune system that combines techniques of systems biology with information provided by data-driven prediction methods. To this end, we have extended an agent-based simulator of the immune response, C-ImmSim, such that it represents pathogens, as well as lymphocyte receptors, by means of their amino acid sequences and makes use of bioinformatics methods for T and B cell epitope prediction. This is a key step for the simulation of the immune response, because it determines immunogenicity. The binding of the epitope, which is the immunogenic part of an invading pathogen, together with activation and cooperation from T helper cells, is required to trigger an immune response in the affected host. To determine a pathogen's epitopes, we use existing prediction methods. In addition, we propose a novel method, which uses Miyazawa and Jernigan protein-protein potential measurements, for assessing molecular binding in the context of immune complexes. We benchmark the resulting model by simulating a classical immunization experiment that reproduces the development of immune memory. We also investigate the role of major histocompatibility complex (MHC) haplotype heterozygosity and homozygosity with respect to the influenza virus and show that there is an advantage to heterozygosity. Finally, we investigate the emergence of one or more dominating clones of lymphocytes in the situation of chronic exposure to the same immunogenic molecule and show that high affinity clones proliferate more than any other. These results show that the simulator produces dynamics that are stable and consistent with basic immunological knowledge. We believe that the combination of genomic information and simulation of the dynamics of the immune system, in one single tool, can offer new perspectives for a better understanding of the immune system.
Affiliation(s)
- Nicolas Rapin
- Biotech Research and Innovation Centre and Bioinformatics Centre, University of Copenhagen, Copenhagen, Denmark
- Ole Lund
- Center for Biological Sequence Analysis, Department of Systems Biology, Technical University of Denmark, Lyngby, Denmark
- Massimo Bernaschi
- Institute for Computing Applications, National Research Council, Rome, Italy
- Filippo Castiglione
- Institute for Computing Applications, National Research Council, Rome, Italy
5
Kinjo AR. Profile conditional random fields for modeling protein families with structural information. Biophysics (Nagoya-shi) 2009; 5:37-44. [PMID: 27857577 PMCID: PMC5036637 DOI: 10.2142/biophysics.5.37] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.2] [Received: 04/17/2009] [Accepted: 05/12/2009] [Indexed: 12/01/2022]
Abstract
A statistical model of protein families, called profile conditional random fields (CRFs), is proposed. This model may be regarded as an integration of the profile hidden Markov model (HMM) and the Finkelstein-Reva (FR) theory of protein folding. While the model structure of the profile CRF is almost identical to the profile HMM, it can incorporate arbitrary correlations in the sequences to be aligned to the model. In addition, like in the FR theory, the profile CRF can incorporate long-range pair-wise interactions between model states via mean-field-like approximations. We give the detailed formulation of the model, self-consistent approximations for treating long-range interactions, and algorithms for computing partition functions and marginal probabilities. We also outline the methods for the global optimization of model parameters as well as a Bayesian framework for parameter learning and selection of optimal alignments.
Affiliation(s)
- Akira R Kinjo
- Institute for Protein Research, Osaka University, Suita, Osaka, 565-0871, Japan
6
Benkert P, Tosatto SCE, Schomburg D. QMEAN: A comprehensive scoring function for model quality assessment. Proteins 2008; 71:261-77. [PMID: 17932912 DOI: 10.1002/prot.21715] [Citation(s) in RCA: 737] [Impact Index Per Article: 46.1] [Indexed: 11/06/2022]
Abstract
In protein structure prediction, a considerable number of alternative models are usually produced, from which the final model subsequently has to be selected. Thus, a scoring function for the identification of the best model within an ensemble of alternative models is a key component of most protein structure prediction pipelines. QMEAN, which stands for Qualitative Model Energy ANalysis, is a composite scoring function describing the major geometrical aspects of protein structures. Five different structural descriptors are used. The local geometry is analyzed by a new kind of torsion angle potential over three consecutive amino acids. A secondary structure-specific distance-dependent pairwise residue-level potential is used to assess long-range interactions. A solvation potential describes the burial status of the residues. Two simple terms describing the agreement of predicted and calculated secondary structure and solvent accessibility, respectively, are also included. A variety of different implementations are investigated and several approaches to combine and optimize them are discussed. QMEAN was tested on several standard decoy sets including a molecular dynamics simulation decoy set, as well as on a comprehensive data set of 22,420 models in total from server predictions for the 95 targets of CASP7. In a comparison to five well-established model quality assessment programs, QMEAN shows a statistically significant improvement over nearly all quality measures describing the ability of the scoring function to identify the native structure and to discriminate good from bad models. The three-residue torsion angle potential turned out to be very effective in recognizing the native fold.
Affiliation(s)
- Pascal Benkert
- Institute for Biochemistry, University of Cologne, 50674 Cologne, Germany
7
Feng Y, Kloczkowski A, Jernigan RL. Four-body contact potentials derived from two protein datasets to discriminate native structures from decoys. Proteins 2007; 68:57-66. [PMID: 17393455 DOI: 10.1002/prot.21362] [Citation(s) in RCA: 62] [Impact Index Per Article: 3.6] [Indexed: 11/10/2022]
Abstract
Two-body inter-residue contact potentials for proteins have often been extracted and extensively used for threading. Here, we have developed a new scheme to derive four-body contact potentials as a way to consider protein interactions in a more cooperative model. We use several datasets of protein native structures to demonstrate that around 500 chains are sufficient to provide a good estimate of these four-body contact potentials by obtaining convergent threading results. We also have deliberately chosen two sets of protein native structures differing in resolution, one with all chains' resolution better than 1.5 Å and the other with 94.2% of the structures having a resolution worse than 1.5 Å, to investigate whether potentials from well-refined protein datasets perform better in threading. However, potentials from well-refined proteins did not generate statistically significantly better threading results. Our four-body contact potentials can discriminate well between native structures and partially unfolded or deliberately misfolded structures. Compared with another set of four-body contact potentials derived by using a Delaunay tessellation algorithm, our four-body contact potentials appear to offer a better characterization of the interactions between backbones and side chains and provide better threading results, somewhat complementary to those found using other potentials.
Affiliation(s)
- Yaping Feng
- Department of Biochemistry, Biophysics, and Molecular Biology, Iowa State University, Ames, Iowa 50011-0320, USA
8
Shen MY, Sali A. Statistical potential for assessment and prediction of protein structures. Protein Sci 2007; 15:2507-24. [PMID: 17075131 PMCID: PMC2242414 DOI: 10.1110/ps.062416606] [Citation(s) in RCA: 1778] [Impact Index Per Article: 104.6] [Indexed: 10/24/2022]
Abstract
Protein structures in the Protein Data Bank provide a wealth of data about the interactions that determine the native states of proteins. Using probability theory, we derive an atomic distance-dependent statistical potential from a sample of native structures that does not depend on any adjustable parameters (Discrete Optimized Protein Energy, or DOPE). DOPE is based on an improved reference state that corresponds to noninteracting atoms in a homogeneous sphere with the radius dependent on a sample native structure; it thus accounts for the finite and spherical shape of the native structures. The DOPE potential was extracted from a nonredundant set of 1472 crystallographic structures. We tested DOPE and five other scoring functions by the detection of the native state among six multiple target decoy sets, the correlation between the score and model error, and the identification of the most accurate non-native structure in the decoy set. For all decoy sets, DOPE is the best performing function in terms of all criteria, except for a tie in one criterion for one decoy set. To facilitate its use in various applications, such as model assessment, loop modeling, and fitting into cryo-electron microscopy mass density maps combined with comparative protein structure modeling, DOPE was incorporated into the modeling package MODELLER-8.
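The general idea behind a distance-dependent statistical potential can be sketched with the textbook inverse-Boltzmann relation, in which the energy of a distance bin is the negative log ratio of the observed to the reference probability. The Python sketch below is an illustration under stated assumptions only: it takes a hypothetical, precomputed reference histogram as input and does not implement the sphere-based reference state that defines DOPE; the function name, the pseudocount, and the example counts are all invented for illustration.

```python
import math

def statistical_potential(observed, reference, kT=1.0, pseudocount=1e-6):
    """Convert per-bin pair counts into energies via E = -kT * ln(p_obs / p_ref).

    `observed` and `reference` are histograms over the same distance bins.
    The pseudocount (an assumption of this sketch) avoids log(0) for empty bins.
    """
    obs_total = sum(observed)
    ref_total = sum(reference)
    energies = []
    for n_obs, n_ref in zip(observed, reference):
        p_obs = n_obs / obs_total + pseudocount
        p_ref = n_ref / ref_total + pseudocount
        energies.append(-kT * math.log(p_obs / p_ref))
    return energies

# Hypothetical counts for three distance bins of one atom-type pair.
observed_counts = [10, 30, 60]
reference_counts = [20, 30, 50]
energies = statistical_potential(observed_counts, reference_counts)
print([round(e, 3) for e in energies])
```

Bins that are observed less often than the reference predicts receive positive (unfavorable) energies, and bins observed more often receive negative (favorable) ones; the choice of reference state is exactly where published potentials such as DOPE differ.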
Affiliation(s)
- Min-Yi Shen
- Department of Biopharmaceutical Sciences, Department of Pharmaceutical Chemistry, University of California at San Francisco, San Francisco, California 94158, USA.
9
Miyazawa S, Jernigan RL. How effective for fold recognition is a potential of mean force that includes relative orientations between contacting residues in proteins? J Chem Phys 2006; 122:024901. [PMID: 15638624 DOI: 10.1063/1.1824012] [Citation(s) in RCA: 51] [Impact Index Per Article: 2.8] [Indexed: 11/14/2022]
Abstract
We estimate the statistical distribution of relative orientations between contacting residues from a database of protein structures and evaluate the potential of mean force for relative orientations between contacting residues. Polar angles and Euler angles are used to specify two degrees of directional freedom and three degrees of rotational freedom for the orientation of one residue relative to another in contacting residues, respectively. A local coordinate system affixed to each residue based only on main chain atoms is defined for fold recognition. The number of contacting residue pairs in the database will severely limit the resolution of the statistical distribution of relative orientations, if it is estimated by dividing space into cells and counting samples observed in each cell. To overcome such problems and to evaluate the fully anisotropic distributions of relative orientations as a function of polar and Euler angles, we choose a method in which the observed distribution is represented as a sum of delta functions each of which represents the observed orientation of a contacting residue, and is evaluated as a series expansion of spherical harmonics functions. The sample size limits the frequencies of modes whose expansion coefficients can be reliably estimated. High frequency modes are statistically less reliable than low frequency modes. Each expansion coefficient is separately corrected for the sample size according to suggestions from a Bayesian statistical analysis. As a result, many expansion terms can be utilized to evaluate orientational distributions. Also, unlike other orientational potentials, the uniform distribution is used for a reference distribution in evaluating a potential of mean force for each type of contacting residue pair from its orientational distribution, so that residue-residue orientations can be fully evaluated. 
It is shown by using decoy sets that the discrimination power of the orientational potential in fold recognition increases by taking account of the Euler angle dependencies and becomes comparable to that of a simple contact potential, and that the total energy potential taken as a simple sum of contact, orientation, and (phi,psi) potentials performs well to identify the native folds.
Affiliation(s)
- Sanzo Miyazawa
- Faculty of Technology, Gunma University, Kiryu, Gunma 376-8515, Japan.
10
Cao HB, Wang CZ, Dobbs D, Ihm Y, Ho KM. Codability criterion for picking proteinlike structures from random three-dimensional configurations. Phys Rev E Stat Nonlin Soft Matter Phys 2006; 74:031921. [PMID: 17025681 DOI: 10.1103/physreve.74.031921] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Received: 10/06/2004] [Revised: 07/24/2006] [Indexed: 05/12/2023]
Abstract
We show that the dominant eigenvectors of real protein structural contact matrices are highly correlated with their amino acid sequences. These results suggest that an ab initio sequence-independent profile exists for every protein structure and that this profile is highly effective in differentiating the ordering of amino acids in natural protein sequences from random sequences. This profile provides a structural code and is a key for understanding the unique behavior of protein structures. Using a lattice model, we show that there are special codable structures highly separated from random structures in the dominant eigenvector space of their structural contact matrices. As an example, we show that our results provide a good explanation of the "designable principle" of protein structures.
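To make the notion of a dominant eigenvector of a contact matrix concrete, the following Python sketch computes it by power iteration for a small, hypothetical 4-residue contact map. The function name, the iteration count, and the example matrix are assumptions for illustration; the abstract does not say how the authors computed their eigenvectors.

```python
import math

def dominant_eigenvector(matrix, iterations=500):
    """Approximate the dominant eigenvector of a symmetric 0/1 contact matrix.

    Power iteration is applied to (A + I) rather than A. The identity shift
    leaves the eigenvectors unchanged but prevents the iteration from
    oscillating when A has eigenvalues +lambda and -lambda of equal magnitude.
    """
    n = len(matrix)
    vec = [1.0 / math.sqrt(n)] * n
    for _ in range(iterations):
        nxt = [vec[i] + sum(matrix[i][j] * vec[j] for j in range(n)) for i in range(n)]
        norm = math.sqrt(sum(x * x for x in nxt))
        vec = [x / norm for x in nxt]
    return vec

# Hypothetical contact map: residue 2 touches every other residue.
contacts = [
    [0, 1, 1, 0],
    [1, 0, 1, 0],
    [1, 1, 0, 1],
    [0, 0, 1, 0],
]
vec = dominant_eigenvector(contacts)
print([round(x, 3) for x in vec])
```

In this toy map the largest component belongs to residue 2, the most highly connected position, which is the kind of structure-derived profile that the abstract compares against amino acid sequences.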
Affiliation(s)
- Hai-Bo Cao
- Department of Physics and Astronomy, Iowa State University, Ames, Iowa 50011, USA
11
Sen TZ, Kloczkowski A, Jernigan RL, Yan C, Honavar V, Ho KM, Wang CZ, Ihm Y, Cao H, Gu X, Dobbs D. Predicting binding sites of hydrolase-inhibitor complexes by combining several methods. BMC Bioinformatics 2004; 5:205. [PMID: 15606919 PMCID: PMC544855 DOI: 10.1186/1471-2105-5-205] [Citation(s) in RCA: 20] [Impact Index Per Article: 1.0] [Received: 09/23/2004] [Accepted: 12/17/2004] [Indexed: 11/17/2022]
Abstract
Background: Protein-protein interactions play a critical role in protein function. Completion of many genomes is being followed rapidly by major efforts to identify interacting protein pairs experimentally in order to decipher the networks of interacting, coordinated-in-action proteins. Identification of protein-protein interaction sites and detection of specific amino acids that contribute to the specificity and the strength of protein interactions is an important problem with broad applications ranging from rational drug design to the analysis of metabolic and signal transduction networks. Results: In order to increase the power of predictive methods for protein-protein interaction sites, we have developed a consensus methodology for combining four different methods. These approaches include: data mining using Support Vector Machines, threading through protein structures, prediction of conserved residues on the protein surface by analysis of phylogenetic trees, and the Conservatism of Conservatism method of Mirny and Shakhnovich. Results obtained on a dataset of hydrolase-inhibitor complexes demonstrate that the combination of all four methods yields improved predictions over the individual methods. Conclusions: We developed a consensus method for predicting protein-protein interface residues by combining sequence and structure-based methods. The success of our consensus approach suggests that similar methodologies can be developed to improve prediction accuracies for other bioinformatic problems.
Affiliation(s)
- Taner Z Sen
- L.H. Baker Center for Bioinformatics and Biological Statistics, Iowa State University, Ames, IA 50011, USA
- Department of Biochemistry, Biophysics, and Molecular Biology, Iowa State University, Ames, IA 50011, USA
- Andrzej Kloczkowski
- L.H. Baker Center for Bioinformatics and Biological Statistics, Iowa State University, Ames, IA 50011, USA
- Robert L Jernigan
- L.H. Baker Center for Bioinformatics and Biological Statistics, Iowa State University, Ames, IA 50011, USA
- Department of Biochemistry, Biophysics, and Molecular Biology, Iowa State University, Ames, IA 50011, USA
- Bioinformatics and Computational Biology Program, Iowa State University, Ames, IA 50011, USA
- Changhui Yan
- Department of Computer Science, Iowa State University, Ames, IA 50011, USA
- Bioinformatics and Computational Biology Program, Iowa State University, Ames, IA 50011, USA
- Vasant Honavar
- L.H. Baker Center for Bioinformatics and Biological Statistics, Iowa State University, Ames, IA 50011, USA
- Department of Computer Science, Iowa State University, Ames, IA 50011, USA
- Bioinformatics and Computational Biology Program, Iowa State University, Ames, IA 50011, USA
- Kai-Ming Ho
- L.H. Baker Center for Bioinformatics and Biological Statistics, Iowa State University, Ames, IA 50011, USA
- Bioinformatics and Computational Biology Program, Iowa State University, Ames, IA 50011, USA
- Department of Physics and Astronomy, Iowa State University, Ames, IA 50011, USA
- Cai-Zhuang Wang
- Bioinformatics and Computational Biology Program, Iowa State University, Ames, IA 50011, USA
- Department of Physics and Astronomy, Iowa State University, Ames, IA 50011, USA
- Yungok Ihm
- Bioinformatics and Computational Biology Program, Iowa State University, Ames, IA 50011, USA
- Department of Physics and Astronomy, Iowa State University, Ames, IA 50011, USA
- Haibo Cao
- Bioinformatics and Computational Biology Program, Iowa State University, Ames, IA 50011, USA
- Department of Physics and Astronomy, Iowa State University, Ames, IA 50011, USA
- Xun Gu
- L.H. Baker Center for Bioinformatics and Biological Statistics, Iowa State University, Ames, IA 50011, USA
- Bioinformatics and Computational Biology Program, Iowa State University, Ames, IA 50011, USA
- Department of Genetics, Development and Cell Biology, Iowa State University, Ames, IA 50011, USA
- Drena Dobbs
- L.H. Baker Center for Bioinformatics and Biological Statistics, Iowa State University, Ames, IA 50011, USA
- Bioinformatics and Computational Biology Program, Iowa State University, Ames, IA 50011, USA
- Department of Genetics, Development and Cell Biology, Iowa State University, Ames, IA 50011, USA
12
Koike R, Kinoshita K, Kidera A. Probabilistic description of protein alignments for sequences and structures. Proteins 2004; 56:157-66. [PMID: 15162495 DOI: 10.1002/prot.20067] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.3] [Indexed: 11/05/2022]
Abstract
A number of equally optimal alignments inherently exist in the sequence and structure comparisons among proteins. To represent the sub-optimal alignments systematically, we have developed a method of generating probabilistic alignments for sequences and structures, by which the correspondence between pairs of residues is evaluated in a probabilistic manner. Our method uses the periodic boundary condition to avoid the entropy artifact favoring full-length matches. In the structure comparison, the environmental effects are incorporated by the mean-field approximation. We applied this method in comparisons of two pairs of proteins with internal symmetry; the first set were proteins of TIM-barrel fold and the second were beta-trefoil fold. These pairs are expected to have distinct sub-optimal alignments suitable for probabilistic description with the periodic boundary. It was shown that the sequence and structure alignments are consistent with each other and that the alignments with the highest probability represent circular permutation.
Affiliation(s)
- Ryotaro Koike
- Department of Chemistry, Graduate School of Science, Kyoto University, Kitashirakawa-Oiwake-cho, Sakyo-ku, Kyoto 606-8502, Japan
13
Cao H, Ihm Y, Wang CZ, Morris JR, Su M, Dobbs D, Ho KM. Three-dimensional threading approach to protein structure recognition. Polymer 2004. [DOI: 10.1016/j.polymer.2003.10.091] [Citation(s) in RCA: 10] [Impact Index Per Article: 0.5] [Indexed: 11/16/2022]
14
Miyazawa S, Jernigan RL. Long- and short-range interactions in native protein structures are consistent/minimally frustrated in sequence space. Proteins 2003; 50:35-43. [PMID: 12471597 DOI: 10.1002/prot.10242] [Citation(s) in RCA: 16] [Impact Index Per Article: 0.8] [Indexed: 11/11/2022]
Abstract
We show that long- and short-range interactions in almost all protein native structures are actually consistent with each other for coarse-grained energy scales; specifically we mean the long-range inter-residue contact energies and the short-range secondary structure energies based on peptide dihedral angles, which are potentials of mean force evaluated from residue distributions observed in protein native structures. This consistency is observed at equilibrium in sequence space rather than in conformational space. Statistical ensembles of sequences are generated by exchanging residues for each of 797 protein native structures with the Metropolis method. It is shown that adding the other category of interaction to either the short- or long-range interactions decreases the means and variances of those energies for essentially all protein native structures, indicating that both interactions consistently work by more-or-less restricting sequence spaces available to one of the interactions. In addition to this consistency, independence by these interaction classes is also indicated by the fact that there are almost no correlations between them when equilibrated using both interactions and significant but small, positive correlations at equilibrium using only one of the interactions. Evidence is provided that protein native sequences can be regarded approximately as samples from the statistical ensembles of sequences with these energy scales and that all proteins have the same effective conformational temperature. Designing protein structures and sequences to be consistent and minimally frustrated among the various interactions is a most effective way to increase protein stability and foldability.
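The sampling procedure mentioned in this abstract, generating sequence ensembles for a fixed structure by exchanging residues under the Metropolis method, can be sketched in Python as follows. This is a toy illustration rather than the authors' protocol: the contact list, the hydrophobic-pair energy function, the temperature, and the reading of "exchanging residues" as swapping two sequence positions are all assumptions made for this example.

```python
import math
import random

# Hypothetical fixed structure: pairs of positions that are in contact.
CONTACTS = [(0, 3), (1, 4), (2, 5)]
HYDROPHOBIC = set("AVLIFM")

def toy_energy(seq):
    """Toy contact energy: -1 for each contact between two hydrophobic residues."""
    return -sum(1.0 for i, j in CONTACTS if seq[i] in HYDROPHOBIC and seq[j] in HYDROPHOBIC)

def metropolis_sequence_sampling(sequence, energy, steps=1000, temperature=1.0, seed=0):
    """Sample sequences on a fixed structure by swapping residue positions.

    Each step swaps two positions and accepts the swap with the Metropolis
    probability min(1, exp(-delta / temperature)); composition is conserved.
    """
    rng = random.Random(seed)
    seq = list(sequence)
    current = energy(seq)
    for _ in range(steps):
        i, j = rng.sample(range(len(seq)), 2)
        seq[i], seq[j] = seq[j], seq[i]
        trial = energy(seq)
        delta = trial - current
        if delta <= 0 or rng.random() < math.exp(-delta / temperature):
            current = trial  # accept the swap
        else:
            seq[i], seq[j] = seq[j], seq[i]  # reject: undo the swap
    return "".join(seq), current

start = "AKLDVE"
final_seq, final_energy = metropolis_sequence_sampling(start, toy_energy)
print(final_seq, final_energy)
```

Collecting the sequences visited over many steps, rather than only the last one, would give the kind of statistical ensemble from which means and variances of the energies can be estimated.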
Affiliation(s)
- Sanzo Miyazawa
- Faculty of Technology, Gunma University, Kiryu, Gunma, Japan.
15
Smit E, Jager D, Martinez B, Tielen FJ, Pouwels PH. Structural and functional analysis of the S-layer protein crystallisation domain of Lactobacillus acidophilus ATCC 4356: evidence for protein-protein interaction of two subdomains. J Mol Biol 2002; 324:953-64. [PMID: 12470951 DOI: 10.1016/s0022-2836(02)01135-x] [Citation(s) in RCA: 41] [Impact Index Per Article: 1.9] [Indexed: 11/25/2022]
Abstract
The structure of the crystallisation domain, SAN, of the S(A)-protein of Lactobacillus acidophilus ATCC 4356 was analysed by insertion and deletion mutagenesis, and by proteolytic treatment. Mutant S(A)-protein synthesised in Escherichia coli with 7-13 amino acid insertions near the N terminus or within regions of sequence variation in SAN (amino acid position 7, 45, 114, 125, 193), or in the cell wall-binding domain (position 345) could form crystalline sheets, whereas insertions in conserved regions or in regions with predicted secondary structure elements (positions 30, 67, 88 and 156) destroyed this capacity. FACscan analysis of L. acidophilus synthesising three crystallising and one non-crystallising S(A)-protein c-myc (19 amino acid residues) insertion mutant was performed with c-myc antibodies. Fluorescence was most pronounced for insertions at positions 125 and 156, less for position 45 and severely reduced for position 7. By cytometric flow sorting a transformant harbouring the mutant S(A)-protein gene (position 125) was isolated that showed an increased fluorescence signal. Immunofluorescence microscopy suggested that the transformant synthesized mutant S(A)-protein only. PCR analysis of the transformant grown in the absence of selection pressure indicated that the mutant allele was stably integrated in the chromosome. Proteolytic treatment of S(A)-protein indicated that only sites near the middle of SAN are susceptible, although potential cleavage sites are present through the entire molecule. Expression in E. coli of DNA sequences encoding the two halves of SAN yielded peptides that could oligomerize. Our results indicate that SAN consists of an approximately 12 kDa N-terminal and an approximately 18 kDa C-terminal subdomain linked by a surface exposed loop. 
The capacity of S(A)-protein of L.acidophilus to present epitopes, up to approximately 19 amino acid residues in length, at the bacterial surface in a genetically stable form, makes the system, in principle, suitable for application as an oral delivery vehicle.
Affiliation(s)
- Egbert Smit
- Department of Applied Microbiology and Gene Technology, TNO Nutrition and Food Research Institute, Utrechtseweg 48, 3700 AJ, Zeist, The Netherlands
|
16
|
Abstract
A protein structure model generally needs to be evaluated to assess whether or not it has the correct fold. To improve fold assessment, four types of residue-level statistical potential were optimized: distance-dependent, contact, Φ/Ψ dihedral angle, and accessible surface statistical potentials. Approximately 10,000 test models with correct and incorrect folds were built by automated comparative modeling of protein sequences of known structure. The criterion used to discriminate between the correct and incorrect models was the Z-score of the model energy. The performance of the Z-score was determined as a function of many variables in the derivation and use of the corresponding statistical potential. The performance was measured by the fractions of the correctly and incorrectly assessed test models. The most discriminating combination of the four tested potentials is the sum of the normalized distance-dependent and accessible surface potentials. The distance-dependent potential that is optimal for assessing models of all sizes uses both Cα and Cβ atoms as interaction centers, distinguishes between all 20 standard residue types, has a distance range of 30 Å, and is derived and used by taking into account the sequence separation of the interacting atom pairs. The terms for the sequentially local interactions are significantly less informative than those for the sequentially nonlocal interactions. The accessible surface potential that is optimal for assessing models of all sizes uses Cβ atoms as interaction centers and distinguishes between all 20 standard residue types. The performance of the tested statistical potentials is not likely to improve significantly with an increase in the number of known protein structures used in their derivation.
The parameters of fold assessment whose optimal values vary significantly with model size include the size of the known protein structures used to derive the potential and the distance range of the accessible surface potential. Fold assessment by statistical potentials is most difficult for very small models. This difficulty presents a challenge to fold assessment in large-scale comparative modeling, which produces many small and incomplete models. The results described in this study provide a basis for the optimal use of statistical potentials in fold assessment.
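The Z-score criterion described in this abstract can be illustrated with a minimal Python sketch. The abstract does not say how the reference energy distribution is built, what cutoff is used, or exactly how the two potentials are normalized, so the function names, the reference energies, the cutoff argument, and the choice to sum two Z-scores are all assumptions made for illustration only.

```python
from statistics import mean, pstdev


def z_score(model_energy, reference_energies):
    """Z-score of a model energy against a reference energy distribution.

    The reference distribution is an assumption here (for example, energies
    of the same model with shuffled sequences); the abstract does not
    specify how it is obtained.
    """
    mu = mean(reference_energies)
    sigma = pstdev(reference_energies)
    if sigma == 0:
        raise ValueError("reference energies have zero spread")
    return (model_energy - mu) / sigma


def combined_score(distance_z, surface_z):
    """Sum of two normalized scores (hypothetical reading of the
    'sum of the normalized distance-dependent and accessible surface
    potentials' in the abstract)."""
    return distance_z + surface_z


def looks_like_correct_fold(score, cutoff):
    """Lower (more negative) scores indicate a more favorable model.

    The cutoff is a free parameter in this sketch, not a published value.
    """
    return score < cutoff
```

For example, `z_score(-4.0, [-2.0, 2.0])` evaluates to `-2.0`, since the reference mean is 0 and the population standard deviation is 2.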
Affiliation(s)
- Francisco Melo
- Laboratories of Molecular Biophysics, Pels Family Center for Biochemistry and Structural Biology, The Rockefeller University, New York, New York 10021, USA
|
17
|
Geourjon C, Combet C, Blanchet C, Deléage G. Identification of related proteins with weak sequence identity using secondary structure information. Protein Sci 2001; 10:788-97. [PMID: 11274470 PMCID: PMC2373959 DOI: 10.1110/ps.30001] [Citation(s) in RCA: 39] [Impact Index Per Article: 1.7] [Indexed: 10/14/2022]
Abstract
Molecular modeling of proteins is confronted with the problem of finding homologous proteins, especially when few identities remain after the process of molecular evolution. Even with the most recent methods based on sequence identity detection, structural relationships are still difficult to establish with high reliability. As protein structures are more conserved than sequences, we investigated the possibility of using protein secondary structure comparison (observed or predicted structures) to discriminate between related and unrelated protein sequences in the range of 10%-30% sequence identity. Pairwise comparisons of secondary structures were measured using the structural overlap (Sov) parameter. In this article, we show that if the secondary structure likeness is >50%, most of the pairs are structurally related. Taking into account the secondary structures of proteins that have been detected by BLAST, FASTA, or SSEARCH in the noisy region (with high E-value), we show that distantly related protein sequences (even with <20% identity) can still be identified. This strategy can be used to identify three-dimensional templates in homology modeling by finding unexpected related proteins and to select proteins for experimental investigation in a structural genomics approach, as well as for genome annotation.
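The idea of filtering weak sequence hits by secondary structure likeness can be sketched in Python as follows. This sketch does not implement the Sov parameter, which is a segment-based overlap measure. It substitutes a simpler per-residue agreement over two already-aligned secondary structure strings. The function names, the `-` gap character, and the reuse of the 50% threshold with this simpler measure are assumptions for illustration only.

```python
def secondary_structure_agreement(ss_a, ss_b, gap="-"):
    """Percentage of aligned, non-gap positions with identical states.

    ss_a and ss_b are aligned strings of secondary structure states
    (for example H, E, C). This is a per-residue stand-in, NOT the
    segment-based Sov measure used in the study.
    """
    if len(ss_a) != len(ss_b):
        raise ValueError("aligned strings must have equal length")
    compared = 0
    matches = 0
    for a, b in zip(ss_a, ss_b):
        if a == gap or b == gap:
            continue  # skip alignment gaps
        compared += 1
        if a == b:
            matches += 1
    if compared == 0:
        return 0.0
    return 100.0 * matches / compared


def likely_related(ss_a, ss_b, threshold=50.0):
    """Flag a pair whose likeness exceeds the threshold.

    The 50% default mirrors the Sov cutoff quoted in the abstract; it is
    not calibrated for this simpler measure.
    """
    return secondary_structure_agreement(ss_a, ss_b) > threshold
```

For example, `secondary_structure_agreement("HHHHEEEECC", "HHHCEEEECC")` gives `90.0`, because nine of the ten aligned positions agree.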
Affiliation(s)
- C Geourjon
- Pôle BioInformatique Lyonnais, Institut de Biologie et Chimie des Protéines, Centre National de la Recherche Scientifique, UMR 5086, 69 367 Lyon CEDEX 07, France.
|