1
Torrisi M, Pollastri G, Le Q. Deep learning methods in protein structure prediction. Comput Struct Biotechnol J 2020; 18:1301-1310. [PMID: 32612753 PMCID: PMC7305407 DOI: 10.1016/j.csbj.2019.12.011] [Citation(s) in RCA: 116] [Impact Index Per Article: 29.0] [Received: 10/15/2019] [Revised: 12/19/2019] [Accepted: 12/20/2019] [Indexed: 01/01/2023]
Abstract
Protein Structure Prediction is a central topic in Structural Bioinformatics. Since the '60s statistical methods, followed by increasingly complex Machine Learning and recently Deep Learning methods, have been employed to predict protein structural information at various levels of detail. In this review, we briefly introduce the problem of protein structure prediction and essential elements of Deep Learning (such as Convolutional Neural Networks, Recurrent Neural Networks and basic feed-forward Neural Networks they are founded on), after which we discuss the evolution of predictive methods for one-dimensional and two-dimensional Protein Structure Annotations, from the simple statistical methods of the early days, to the computationally intensive highly-sophisticated Deep Learning algorithms of the last decade. In the process, we review the growth of the databases these algorithms are based on, and how this has impacted our ability to leverage knowledge about evolution and co-evolution to achieve improved predictions. We conclude this review outlining the current role of Deep Learning techniques within the wider pipelines to predict protein structures and trying to anticipate what challenges and opportunities may arise next.
Affiliation(s)
- Mirko Torrisi
- School of Computer Science, University College Dublin, Ireland
- Quan Le
- Centre for Applied Data Analytics Research, University College Dublin, Ireland
2
Torrisi M, Kaleel M, Pollastri G. Deeper Profiles and Cascaded Recurrent and Convolutional Neural Networks for state-of-the-art Protein Secondary Structure Prediction. Sci Rep 2019; 9:12374. [PMID: 31451723 PMCID: PMC6710256 DOI: 10.1038/s41598-019-48786-x] [Citation(s) in RCA: 51] [Impact Index Per Article: 10.2] [Received: 03/26/2019] [Accepted: 08/12/2019] [Indexed: 01/10/2023]
Abstract
Protein Secondary Structure prediction has been a central topic of research in Bioinformatics for decades. In spite of this, even the most sophisticated ab initio SS predictors are not able to reach the theoretical limit of three-state prediction accuracy (88–90%), while only a few predict more than the 3 traditional Helix, Strand and Coil classes. In this study we present tests on different models trained both on single sequence and evolutionary profile-based inputs and develop a new state-of-the-art system with Porter 5. Porter 5 is composed of ensembles of cascaded Bidirectional Recurrent Neural Networks and Convolutional Neural Networks, incorporates new input encoding techniques and is trained on a large set of protein structures. Porter 5 achieves 84% accuracy (81% SOV) when tested on 3 classes and 73% accuracy (70% SOV) on 8 classes on a large independent set. In our tests Porter 5 is 2% more accurate than its previous version and outperforms or matches the most recent predictors of secondary structure we tested. When Porter 5 is retrained on SCOPe based sets that eliminate homology between training/testing samples we obtain similar results. Porter is available as a web server and standalone program at http://distilldeep.ucd.ie/porter/ alongside all the datasets and alignments.
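The per-residue accuracy figures quoted in this abstract (84% on 3 classes, 73% on 8 classes) are of the kind usually computed as the fraction of residues whose predicted state matches the observed one. A minimal sketch of that measure follows; the function name `q_accuracy` and the example strings are hypothetical, and the SOV score also mentioned in the abstract is not reproduced here.

```python
def q_accuracy(predicted, observed):
    """Fraction of residues whose predicted state equals the observed state.

    Works for any alphabet of states (3-class or 8-class), as long as both
    strings use the same alphabet and have the same length.
    """
    if len(predicted) != len(observed):
        raise ValueError("predicted and observed must have the same length")
    if not observed:
        raise ValueError("empty input")
    matches = sum(p == o for p, o in zip(predicted, observed))
    return matches / len(observed)


# Hypothetical 3-class example: H = helix, E = strand, C = coil.
print(q_accuracy("HHHEECCC", "HHHEECCH"))
```

The same function applies unchanged to 8-class strings, since it only compares symbols position by position.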
Affiliation(s)
- Mirko Torrisi
- School of Computer Science, University College Dublin, Belfield, Dublin 4, Ireland
- Manaz Kaleel
- School of Computer Science, University College Dublin, Belfield, Dublin 4, Ireland
- Gianluca Pollastri
- School of Computer Science, University College Dublin, Belfield, Dublin 4, Ireland.
3
Karanji AK, Khakinejad M, Kondalaji SG, Majuta SN, Attanayake K, Valentine SJ. Comparison of Peptide Ion Conformers Arising from Non-Helical and Helical Peptides Using Ion Mobility Spectrometry and Gas-Phase Hydrogen/Deuterium Exchange. Journal of the American Society for Mass Spectrometry 2018; 29:2402-2412. [PMID: 30324261 PMCID: PMC6553874 DOI: 10.1007/s13361-018-2053-4] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.5] [Received: 05/07/2018] [Revised: 07/17/2018] [Accepted: 08/03/2018] [Indexed: 05/06/2023]
Abstract
The dominant gas-phase conformer of [M+3H]3+ ions of the model peptide acetyl-PSSSSKSSSSKSSSSKSSSSK has been examined with ion mobility spectrometry (IMS), gas-phase hydrogen deuterium exchange (HDX), and mass spectrometry (MS) techniques. The [M+3H]3+ peptide ions are observed predominantly as a relatively compact conformer type. Upon subjecting these ions to electron transfer dissociation (ETD), the level of protection for each amino acid residue in the peptide sequence is assessed. The overall per-residue deuterium uptake is observed to be relatively more efficient for the neutral residues than for the model peptide acetyl-PAAAAKAAAAKAAAAKAAAAK. In comparison, the N-terminal and C-terminal regions of the serine peptide show greater relative protection compared with interior residues. Molecular dynamics (MD) simulations have been used to generate candidate structures for collision cross section and HDX reactivity matching. Hydrogen accessibility scoring (HAS) for select structural candidates from MD simulations has been used to suggest conformer types that could contribute to the observed HDX patterns. The results are discussed with respect to recent studies employing extensive MD simulations of gas-phase structure establishment of a peptide system.
Affiliation(s)
- Ahmad Kiani Karanji
- Department of Chemistry, West Virginia University, Morgantown, WV, 26506, USA
- Mahdiar Khakinejad
- Department of Biophysics, Johns Hopkins University, Baltimore, MD, 21218, USA
- Sandra N Majuta
- Department of Chemistry, West Virginia University, Morgantown, WV, 26506, USA
- Kushani Attanayake
- Department of Chemistry, West Virginia University, Morgantown, WV, 26506, USA
- Stephen J Valentine
- Department of Chemistry, West Virginia University, Morgantown, WV, 26506, USA.
4
Deller MC, Kong L, Rupp B. Protein stability: a crystallographer's perspective. Acta Crystallographica Section F-Structural Biology Communications 2016; 72:72-95. [PMID: 26841758 PMCID: PMC4741188 DOI: 10.1107/s2053230x15024619] [Citation(s) in RCA: 154] [Impact Index Per Article: 19.3] [Received: 11/27/2015] [Accepted: 12/21/2015] [Indexed: 12/18/2022]
Abstract
Protein stability is a topic of major interest for the biotechnology, pharmaceutical and food industries, in addition to being a daily consideration for academic researchers studying proteins. An understanding of protein stability is essential for optimizing the expression, purification, formulation, storage and structural studies of proteins. In this review, discussion will focus on factors affecting protein stability, on a somewhat practical level, particularly from the view of a protein crystallographer. The differences between protein conformational stability and protein compositional stability will be discussed, along with a brief introduction to key methods useful for analyzing protein stability. Finally, tactics for addressing protein-stability issues during protein expression, purification and crystallization will be discussed.
Affiliation(s)
- Marc C Deller
- Stanford ChEM-H, Macromolecular Structure Knowledge Center, Stanford University, Shriram Center, 443 Via Ortega, Room 097, MC5082, Stanford, CA 94305-4125, USA
- Leopold Kong
- Laboratory of Cell and Molecular Biology, National Institute of Diabetes and Digestive and Kidney Diseases (NIDDK), National Institutes of Health (NIH), Building 8, Room 1A03, 8 Center Drive, Bethesda, MD 20814, USA
- Bernhard Rupp
- Department of Forensic Crystallography, k.-k. Hofkristallamt, 91 Audrey Place, Vista, CA 92084, USA
5
Burgess AW, Ponnuswamy PK, Scheraga HA. Analysis of Conformations of Amino Acid Residues and Prediction of Backbone Topography in Proteins. Isr J Chem 2013. [DOI: 10.1002/ijch.197400022] [Citation(s) in RCA: 205] [Impact Index Per Article: 18.6] [Indexed: 11/07/2022]
6
de Sousa MM, Munteanu CR, Pazos A, Fonseca NA, Camacho R, Magalhães AL. Amino acid pair- and triplet-wise groupings in the interior of α-helical segments in proteins. J Theor Biol 2010; 271:136-44. [PMID: 21130100 DOI: 10.1016/j.jtbi.2010.11.028] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.4] [Received: 08/11/2010] [Revised: 11/03/2010] [Accepted: 11/23/2010] [Indexed: 10/18/2022]
Abstract
A statistical approach has been applied to analyse primary structure patterns at inner positions of α-helices in proteins. A systematic survey was carried out in a recent sample of non-redundant proteins selected from the Protein Data Bank, which were used to analyse α-helix structures for amino acid pairing patterns. Only residues more than three positions apart from both termini of the α-helix were considered as inner. Amino acid pairings i, i+k (k=1, 2, 3, 4, 5), were analysed and the corresponding 20×20 matrices of relative global propensities were constructed. An analysis of (i, i+4, i+8) and (i, i+3, i+4) triplet patterns was also performed. These analyses yielded information on a series of amino acid patterns (pairings and triplets) showing either high or low preference for α-helical motifs and suggested a novel approach to protein alphabet reduction. In addition, it has been shown that the individual amino acid propensities are not enough to define the statistical distribution of these patterns. Global pair propensities also depend on the type of pattern, its composition and orientation in the protein sequence. The data presented should prove useful to obtain and refine useful predictive rules which can further the development and fine-tuning of protein structure prediction algorithms and tools.
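The (i, i+k) pairing analysis described in this abstract can be illustrated with a small sketch. The abstract does not give the exact definition of "relative global propensity", so the code below assumes a common observed/expected ratio: the observed frequency of an ordered pair divided by the product of the single-residue frequencies. The function name `pair_propensities` and the toy sequences are hypothetical.

```python
from collections import Counter


def pair_propensities(helices, k):
    """Observed/expected ratio for ordered residue pairs (a at i, b at i+k).

    ASSUMPTION: propensity = pair frequency / (freq(a) * freq(b)); the
    cited paper may define its propensities differently.
    """
    singles = Counter()
    pairs = Counter()
    for seq in helices:
        singles.update(seq)
        for i in range(len(seq) - k):
            pairs[(seq[i], seq[i + k])] += 1
    n_single = sum(singles.values())
    n_pair = sum(pairs.values())
    result = {}
    for (a, b), count in pairs.items():
        expected = (singles[a] / n_single) * (singles[b] / n_single)
        result[(a, b)] = (count / n_pair) / expected
    return result


# Toy helix-interior segments (hypothetical data, not from the paper).
segments = ["AEAE", "AEAE"]
for pair, value in sorted(pair_propensities(segments, k=1).items()):
    print(pair, round(value, 3))
```

Keeping the pair ordered, so that (A, E) and (E, A) are counted separately, reflects the abstract's remark that pair propensities depend on orientation in the sequence. Only pairs that actually occur appear in the result; a full 20×20 matrix would fill the remaining cells with zero.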
Affiliation(s)
- Miguel M de Sousa
- REQUIMTE/University of Porto, Faculty of Sciences, R. Campo Alegre 687, 4169-007 Porto, Portugal
- Cristian R Munteanu
- REQUIMTE/University of Porto, Faculty of Sciences, R. Campo Alegre 687, 4169-007 Porto, Portugal; Computer Science Faculty, University of A Coruña, Campus de Elviña S/N, 15071 A Coruña, Spain
- Alejandro Pazos
- Computer Science Faculty, University of A Coruña, Campus de Elviña S/N, 15071 A Coruña, Spain
- Nuno A Fonseca
- CRACS-INESC Porto L.A., R. Campo Alegre 1021/1055, 4169-007 Porto, Portugal
- Rui Camacho
- LIAAD-INESC-Porto, DEI and FEUP, R. Dr. Roberto Frias s/n, 4200-465 Porto, Portugal
- A L Magalhães
- REQUIMTE/University of Porto, Faculty of Sciences, R. Campo Alegre 687, 4169-007 Porto, Portugal
7
Kernytsky A, Rost B. Using genetic algorithms to select most predictive protein features. Proteins 2009; 75:75-88. [PMID: 18798568 DOI: 10.1002/prot.22211] [Citation(s) in RCA: 21] [Impact Index Per Article: 1.4] [Indexed: 11/11/2022]
Abstract
Many important characteristics of proteins such as biochemical activity and subcellular localization present a challenge to machine-learning methods: it is often difficult to encode the appropriate input features at the residue level for the purpose of making a prediction for the entire protein. The problem is usually that the biophysics of the connection between a machine-learning method's input (sequence feature) and its output (observed phenomenon to be predicted) remains unknown; in other words, we may only know that a certain protein is an enzyme (output) without knowing which region may contain the active site residues (input). The goal then becomes to dissect a protein into a vast set of sequence-derived features and to correlate those features with the desired output. We introduce a framework that begins with a set of global sequence features and then vastly expands the feature space by generically encoding the coexistence of residue-based features. It is this combination of individual features, that is the step from the fractions of serine and buried (input space 20 + 2) to the fraction of buried serine (input space 20 * 2) that implicitly shifts the search space from global feature inputs to features that can capture very local evidence such as the individual residues of a catalytic triad. The vast feature space created is explored by a genetic algorithm (GA) paired with neural networks and support vector machines. We find that the GA is critical for selecting combinations of features that are neither too general resulting in poor performance, nor too specific, leading to overtraining. The final framework manages to effectively sample a feature space that is far too large for exhaustive enumeration. We demonstrate the power of the concept by applying it to prediction of protein enzymatic activity.
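The step from separate global features (input space 20 + 2) to combined features (input space 20 * 2) described in this abstract can be sketched as follows. This is only an illustration of the counting argument, assuming a two-state burial annotation per residue; the names `global_features`, `combined_features`, and the toy input are hypothetical, and the genetic algorithm and classifiers of the paper are not reproduced.

```python
from collections import Counter
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
STATES = ("buried", "exposed")  # assumed two-state burial annotation


def global_features(seq, states):
    """20 amino acid fractions plus 2 burial-state fractions (20 + 2)."""
    n = len(seq)
    aa_counts = Counter(seq)
    state_counts = Counter(states)
    features = {a: aa_counts[a] / n for a in AMINO_ACIDS}
    features.update({s: state_counts[s] / n for s in STATES})
    return features


def combined_features(seq, states):
    """Fraction of residues for every (amino acid, state) pair (20 * 2)."""
    n = len(seq)
    joint = Counter(zip(seq, states))
    return {(a, s): joint[(a, s)] / n for a, s in product(AMINO_ACIDS, STATES)}


# Hypothetical four-residue protein with per-residue burial labels.
seq = "SSAG"
labels = ["buried", "exposed", "buried", "exposed"]
print(len(global_features(seq, labels)), len(combined_features(seq, labels)))
print(combined_features(seq, labels)[("S", "buried")])
```

The global features only say that half the residues are serine and half are buried; the combined feature states directly that a quarter of the residues are buried serines, which is the more local kind of evidence the abstract refers to.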
Affiliation(s)
- Andrew Kernytsky
- Department of Biochemistry and Molecular Biophysics, Columbia University, New York 10032, New York, USA.
8
Fonseca NA, Camacho R, Magalhães AL. Amino acid pairing at the N- and C-termini of helical segments in proteins. Proteins 2008; 70:188-96. [PMID: 17654550 DOI: 10.1002/prot.21525] [Citation(s) in RCA: 17] [Impact Index Per Article: 1.1] [Indexed: 11/07/2022]
Abstract
A systematic survey was carried out in an unbiased sample of 815 protein chains with a maximum of 20% homology selected from the Protein Data Bank, whose structures were solved at a resolution higher than 1.6 Å and with an R-factor lower than 25%. A set of 5556 subsequences with alpha-helix or 3(10)-helix motifs was extracted from the protein chains considered. Global and local propensities were then calculated for all possible amino acid pairs of the type (i, i + 1), (i, i + 2), (i, i + 3), and (i, i + 4), starting at the relevant helical positions N1, N2, N3, C3, C2, C1, and N-int (interior positions), and also at the first nonhelical positions in both termini of the helices, namely, N-cap and C-cap. The statistical analysis of the propensity values has shown that pairing is significantly dependent on the type of the amino acids and on the position of the pair. A few sequences of three and four amino acids were selected and their high prevalence in helices is outlined in this work. The Glu-Lys-Tyr-Pro sequence shows a peculiar distribution in proteins, which may suggest a relevant structural role in alpha-helices when Pro is located at the C-cap position. A bioinformatics tool was developed, which updates automatically and periodically the results and makes them available in a web site.
Affiliation(s)
- Nuno A Fonseca
- IBMC and LIACC, R. Campo Alegre, 1021/1055, 4169-007 Porto, Portugal
9
Stability and Design of α-Helical Peptides. Progress in Molecular Biology and Translational Science 2008; 83:1-52. [DOI: 10.1016/s0079-6603(08)00601-6] [Citation(s) in RCA: 11] [Impact Index Per Article: 0.7] [Indexed: 01/06/2023]
10
Liquori AM. The role of van der Waals interactions on the conformational stability of helical macromolecules. ACTA ACUST UNITED AC 2007. [DOI: 10.1002/polc.5070120117] [Citation(s) in RCA: 16] [Impact Index Per Article: 0.9] [Indexed: 11/10/2022]
11
Chou PY, Fasman GD. Prediction of the secondary structure of proteins from their amino acid sequence. Advances in Enzymology and Related Areas of Molecular Biology 2006; 47:45-148. [PMID: 364941 DOI: 10.1002/9780470122921.ch2] [Citation(s) in RCA: 878] [Impact Index Per Article: 48.8] [Indexed: 12/14/2022]
12
Morse DE, Horecker BL. The mechanism of action of aldolases. Advances in Enzymology and Related Areas of Molecular Biology 2006; 31:125-81. [PMID: 4880215 DOI: 10.1002/9780470122761.ch4] [Citation(s) in RCA: 27] [Impact Index Per Article: 1.5] [Indexed: 01/12/2023]
13
López-Llano J, Campos LA, Sancho J. Alpha-helix stabilization by alanine relative to glycine: roles of polar and apolar solvent exposures and of backbone entropy. Proteins 2006; 64:769-78. [PMID: 16755589 DOI: 10.1002/prot.21041] [Citation(s) in RCA: 47] [Impact Index Per Article: 2.6] [Indexed: 11/09/2022]
Abstract
The energetics of alpha-helix formation are fairly well understood and the helix content of a given amino acid sequence can be calculated with reasonable accuracy from helix-coil transition theories that assign to the different residues specific effects on helix stability. In internal helical positions, alanine is regarded as the most stabilizing residue, whereas glycine, after proline, is the more destabilizing. The difference in stabilization afforded by alanine and glycine has been explained by invoking various physical reasons, including the hydrophobic effect and the entropy of folding. Herein, the contribution of these two effects and that of hydrophilic area burial is evaluated by analyzing Ala and Gly mutants implemented in three helices of apoflavodoxin. These data, combined with available data for similar mutations in other proteins (22 Ala/Gly mutations in alpha-helices have been considered), allow estimation of the difference in backbone entropy between alanine and glycine and evaluation of its contribution and that of apolar and polar area burial to the helical stabilization typically associated to Gly→Ala substitutions. Alanine consistently stabilizes the helical conformation relative to glycine because it buries more apolar area upon folding and because its backbone entropy is lower. However, the relative contribution of polar area burial (which is shown to be destabilizing) and of backbone entropy critically depends on the approximation used to model the structure of the denatured state. In this respect, the excised-peptide model of the unfolded state, proposed by Creamer and coworkers (1995), predicts a major contribution of polar area burial, which is in good agreement with recent quantitations of the relative enthalpic contribution of Ala and Gly residues to alpha-helix formation.
Affiliation(s)
- J López-Llano
- Departamento de Bioquímica y Biología Molecular y Celular & Biocomputation and Complex Systems Physics Institute BIFI, Facultad de Ciencias, Universidad de Zaragoza, Zaragoza, Spain
14
Chakrabarti P, Pal D. The interrelationships of side-chain and main-chain conformations in proteins. Progress in Biophysics and Molecular Biology 2001; 76:1-102. [PMID: 11389934 DOI: 10.1016/s0079-6107(01)00005-0] [Citation(s) in RCA: 174] [Impact Index Per Article: 7.6] [Indexed: 11/24/2022]
Abstract
The accurate determination of a large number of protein structures by X-ray crystallography makes it possible to conduct a reliable statistical analysis of the distribution of the main-chain and side-chain conformational angles, how these are dependent on residue type, adjacent residue in the sequence, secondary structure, residue-residue interactions and location at the polypeptide chain termini. The interrelationship between the main-chain (phi, psi) and side-chain (chi 1) torsion angles leads to a classification of amino acid residues that simplify the folding alphabet considerably and can be a guide to the design of new proteins or mutational studies. Analyses of residues occurring with disallowed main-chain conformation or with multiple conformations shed some light on why some residues are less favoured in thermophiles.
Affiliation(s)
- P Chakrabarti
- Department of Biochemistry, Bose Institute, P-1/12, CIT Scheme VIIM, 700 054, Calcutta, India
15
Sun JK, Penel S, Doig AJ. Determination of alpha-helix N1 energies after addition of N1, N2, and N3 preferences to helix/coil theory. Protein Sci 2000; 9:750-4. [PMID: 10794417 PMCID: PMC2144615 DOI: 10.1110/ps.9.4.750] [Citation(s) in RCA: 13] [Impact Index Per Article: 0.5] [Indexed: 10/19/2022]
Abstract
Surveys of protein crystal structures have revealed that amino acids show unique structural preferences for the N1, N2, and N3 positions in the first turn of the alpha-helix. We have therefore extended helix-coil theory to include statistical weights for these locations. The helix content of a peptide in this model is a function of N-cap, C-cap, N1, N2, N3, C1, and helix interior (N4 to C2) preferences. The partition function for the system is calculated using a matrix incorporating the weights of the fourth residue in a hexamer of amino acids and is implemented using a FORTRAN program. We have applied the model to calculate the N1 preferences of Gln, Val, Ile, Ala, Met, Pro, Leu, Thr, Gly, Ser, and Asn, using our previous data on helix contents of peptides Ac-XAKAAAAKAAGY-CONH2. We find that Ala has the highest preference for the N1 position. Asn is the most unfavorable, destabilizing a helix at N1 by at least 1.4 kcal mol⁻¹ compared to Ala. The remaining amino acids all have similar preferences, 0.5 kcal mol⁻¹ less than Ala. Gln, Asn, and Ser, therefore, do not stabilize the helix when at N1.
Affiliation(s)
- J K Sun
- Department of Biomolecular Sciences, UMIST, Manchester, United Kingdom
16
Jaenicke R. Stability and folding of domain proteins. Progress in Biophysics and Molecular Biology 1999; 71:155-241. [PMID: 10097615 DOI: 10.1016/s0079-6107(98)00032-7] [Citation(s) in RCA: 136] [Impact Index Per Article: 5.4] [Indexed: 12/23/2022]
Affiliation(s)
- R Jaenicke
- Institut für Biophysik und Physikalische Biochemie, Universität Regensburg, Germany
17
Abstract
The first three residues at the N terminus of the alpha-helix are called N1, N2 and N3. We surveyed 2102 alpha-helix N termini in 298 high-resolution, non-homologous protein crystal structures for N1, N2 and N3 amino acid and side-chain rotamer propensities and hydrogen-bonding patterns. We find strong structural preferences that are unique to these sites. The rotamer distributions as a function of amino acid identity and position in the helix are often explained in terms of hydrogen-bonding interactions to the free N1, N2 and N3 backbone NH groups. Notably, the "good N2" amino acid residues Gln, Glu, Asp, Asn, Ser, Thr and His preferentially form i, i or i,i+1 hydrogen bonds to the backbone, though this is hindered by good N-caps (Asp, Asn, Ser, Thr and Cys) that compete for these hydrogen bond donors. We find a number of specific side-chain to side-chain interactions between N1 and N2 or between the N-cap and N2 or N3, such as Arg(N-cap) to Asp(N2). The strong energetic and structural preferences found for N1, N2 and N3, which differ greatly from positions within helix interiors, suggest that these sites should be treated explicitly in any consideration of helical structure in peptides or proteins.
Affiliation(s)
- S Penel
- Department of Biomolecular Sciences, UMIST, Manchester, M60 1QD, UK
18
Abstract
The average globular protein contains 30% alpha-helix, the most common type of secondary structure. Some amino acids occur more frequently in alpha-helices than others; this tendency is known as helix propensity. Here we derive a helix propensity scale for solvent-exposed residues in the middle positions of alpha-helices. The scale is based on measurements of helix propensity in 11 systems, including both proteins and peptides. Alanine has the highest helix propensity, and, excluding proline, glycine has the lowest, approximately 1 kcal/mol less favorable than alanine. Based on our analysis, the helix propensities of the amino acids are as follows (kcal/mol): Ala = 0, Leu = 0.21, Arg = 0.21, Met = 0.24, Lys = 0.26, Gln = 0.39, Glu = 0.40, Ile = 0.41, Trp = 0.49, Ser = 0.50, Tyr = 0.53, Phe = 0.54, Val = 0.61, His = 0.61, Asn = 0.65, Thr = 0.66, Cys = 0.68, Asp = 0.69, and Gly = 1.
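The scale listed in this abstract can be turned into a lookup table. The sketch below copies the quoted values (kcal/mol relative to Ala) and sums them over a sequence; treating the per-residue values as additive is an assumption made here for illustration, not a claim of the abstract, and the scale is stated only for solvent-exposed positions in the middle of a helix. Proline has no value in the abstract, so it is rejected rather than guessed. The names `HELIX_PROPENSITY` and `total_penalty` are hypothetical.

```python
# Helix propensity values (kcal/mol, relative to Ala) as quoted in the abstract.
HELIX_PROPENSITY = {
    "A": 0.0, "L": 0.21, "R": 0.21, "M": 0.24, "K": 0.26,
    "Q": 0.39, "E": 0.40, "I": 0.41, "W": 0.49, "S": 0.50,
    "Y": 0.53, "F": 0.54, "V": 0.61, "H": 0.61, "N": 0.65,
    "T": 0.66, "C": 0.68, "D": 0.69, "G": 1.0,
}


def total_penalty(sequence):
    """Sum of per-residue values for a sequence.

    ASSUMPTION: values are simply added; residues without a quoted value
    (such as proline) raise an error instead of being assigned a guess.
    """
    missing = sorted(set(sequence) - set(HELIX_PROPENSITY))
    if missing:
        raise ValueError(f"no propensity value for: {', '.join(missing)}")
    return sum(HELIX_PROPENSITY[res] for res in sequence)


# Lower totals indicate residues more favorable for helix formation.
print(round(total_penalty("AAKAA"), 2))
print(round(total_penalty("GGSGG"), 2))
```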
Affiliation(s)
- C N Pace
- Department of Medical Biochemistry and Genetics, Texas A&M University, College Station, Texas 77843-1114, USA.
19
Fernández-Recio J, Sancho J. Intrahelical side chain interactions in alpha-helices: poor correlation between energetics and frequency. FEBS Lett 1998; 429:99-103. [PMID: 9657391 DOI: 10.1016/s0014-5793(98)00569-9] [Citation(s) in RCA: 13] [Impact Index Per Article: 0.5] [Indexed: 02/08/2023]
Abstract
Polypeptide sequences in proteins may increase their tendency to adopt helical conformations in several ways. One is the recruiting of amino acid residues with high helical propensity. Another is the appropriate distribution of residues along the helix to establish stabilising side chain interactions. The first strategy is known to be followed by natural proteins because amino acids with high helical propensity are more frequent in alpha-helices. If proteins also followed the second strategy, stabilising amino acid pairs should be more frequent than others. To test this possibility we compared empirical energies of side chain interactions in alpha-helices with statistical energies calculated from a data base of proteins with low homology. We find some correlation between the stability afforded by the pairs and their relative abundance in alpha-helices but the realisation of energetic preferences into statistical preferences is very low. This indicates that natural alpha-helices do not regularly use intrahelical side chain interactions to increase their stability.
Affiliation(s)
- J Fernández-Recio
- Departamento de Bioquímica y Biología Molecular y Celular, Facultad de Ciencias, Universidad de Zaragoza, Spain
20
Matthews BW. Picture story. Nice guys needn't finish last. Nature Structural Biology 1997; 4:518. [PMID: 9228941 DOI: 10.1038/nsb0797-518] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 02/04/2023]
Affiliation(s)
- B W Matthews
- Institute of Molecular Biology, Howard Hughes Medical Institute, USA
21
Szilák L, Moitra J, Krylov D, Vinson C. Phosphorylation destabilizes alpha-helices. Nature Structural Biology 1997; 4:112-4. [PMID: 9033589 DOI: 10.1038/nsb0297-112] [Citation(s) in RCA: 68] [Impact Index Per Article: 2.5] [Indexed: 02/03/2023]
Abstract
Phosphorylation of threonine destabilizes the leucine zipper of a bZIP protein by 4.6 kcal mol⁻¹ dimer⁻¹, which reduces DNA binding 100-fold. This decrease in stability reflects the low alpha-helix forming propensity of a phosphorylated threonine.
22
McDonough MW. Amino acid composition of antigenically distinct Salmonella flagellar proteins. J Mol Biol 1996; 12:342-55. [PMID: 14337498 DOI: 10.1016/s0022-2836(65)80258-3] [Citation(s) in RCA: 88] [Impact Index Per Article: 3.1] [Indexed: 11/30/2022]
23
Zhang CT, Zhang Z, He Z. Prediction of the secondary structure content of globular proteins based on structural classes. Journal of Protein Chemistry 1996; 15:775-86. [PMID: 9008302 DOI: 10.1007/bf01887152] [Citation(s) in RCA: 19] [Impact Index Per Article: 0.7] [Indexed: 02/03/2023]
Abstract
The prediction of the secondary structure content (alpha-helix and beta-strand content) of a globular protein may play an important complementary role in the prediction of the protein's structure. We propose a new prediction algorithm based on Chou's database [Chou (1995), Proteins Struct. Funct. Genet. 21, 319]. The new algorithm is an improved multiple linear regression method, taking the nonlinear and coupling terms of the frequencies of different amino acids into account. The prediction is also based on the structural classes of proteins. A resubstitution examination for the algorithm shows that the average errors are 0.040 and 0.033 for the prediction of alpha-helix content and beta-strand content, respectively. The examination of cross-validation, the jackknife analysis, shows that the average errors are 0.051 and 0.044 for the prediction of alpha-helix content and beta-strand content, respectively. Both examinations indicate the self-consistency and the extrapolative effectiveness of the new algorithm. Compared with the other methods available currently, our method has the merits of simplicity and convenience for use, as well as a high prediction accuracy. By incorporating the prediction of the structural classes, the only input of our method is the amino acid composition of the protein to be predicted.
Affiliation(s)
- C T Zhang
- Department of Physics, Tianjin University, China
24
25
Abstract
An empirical relation between the amino acid composition and three-dimensional folding pattern of several classes of proteins has been determined. Computer simulated neural networks have been used to assign proteins to one of the following classes based on their amino acid composition and size: (1) 4 alpha-helical bundles, (2) parallel (alpha/beta)8 barrels, (3) nucleotide binding fold, (4) immunoglobulin fold, or (5) none of these. Networks trained on the known crystal structures as well as sequences of closely related proteins are shown to correctly predict folding classes of proteins not represented in the training set with an average accuracy of 87%. Other folding motifs can easily be added to the prediction scheme once larger databases become available. Analysis of the neural network weights reveals that amino acids favoring prediction of a folding class are usually over represented in that class and amino acids with unfavorable weights are underrepresented in composition. The neural networks utilize combinations of these multiple small variations in amino acid composition in order to make a prediction. The favorably weighted amino acids in a given class also form the most intramolecular interactions with other residues in proteins of that class. A detailed examination of the contacts of these amino acids reveals some general patterns that may help stabilize each folding class.
Affiliation(s)
- I Dubchak
- Department of Chemistry, Lawrence Berkeley Laboratory, University of California, Berkeley 94720
|
26
|
Holbrook SR. Application of computational neural networks to the prediction of protein structural features. Genetic Engineering 1993; 15:1-19. [PMID: 7763836 DOI: 10.1007/978-1-4899-1666-2_1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 01/27/2023]
Affiliation(s)
- S R Holbrook
- Structural Biology Division, Lawrence Berkeley Laboratory, Berkeley, CA 94720
|
27
|
Chen CC, Zhu Y, King JA, Evans LB. A molecular thermodynamic approach to predict the secondary structure of homopolypeptides in aqueous systems. Biopolymers 1992; 32:1375-92. [PMID: 1420965 DOI: 10.1002/bip.360321011] [Citation(s) in RCA: 14] [Impact Index Per Article: 0.4] [Indexed: 12/27/2022]
Abstract
Under physiological conditions, many polypeptide chains spontaneously fold into discrete and tightly packed three-dimensional structures. The folded polypeptide chain conformation is believed to represent a minimum Gibbs energy of the system, governed by the weak interactions that operate between the amino acid residues and between the residues and the solvent. A semiempirical molecular thermodynamic model is proposed to represent the Gibbs energy of folding of aqueous homopolypeptide systems. The model takes into consideration both the entropy contribution and the enthalpy contribution of folding homopolypeptide chains in aqueous solutions. The entropy contribution is derived from the Flory-Huggins expression for the entropy of mixing. It accounts for the entropy loss in folding a random-coiled polypeptide chain into a specific polypeptide conformation. The enthalpy contribution is derived from a molecular segment-based Non-Random Two Liquid (NRTL) local composition model [H. Renon and J. M. Prausnitz (1968) AIChE J., Vol. 14, pp. 135-142; C.-C. Chen and L. B. Evans (1986) AIChE J., Vol. 32, pp. 444-454], which takes into consideration the residue-residue, residue-solvent, and solvent-solvent binary physical interactions along with the local compositions of amino acid residues in aqueous homopolypeptides. The UNIFAC group contribution method [A. Fredenslund, R. L. Jones, and J. M. Prausnitz (1975) AIChE J., 21, 1086-1099; A. Fredenslund, J. Gmehling, and P. Rasmussen (1977) Vapor-Liquid Equilibrium Using UNIFAC, Elsevier Scientific Publishing Company, Amsterdam], developed originally to estimate the excess Gibbs energy of solutions of small molecules, was used to estimate the NRTL binary interaction parameters. The model yields a hydrophobicity scale for the 20 amino acid side chains, which compares favorably with established scales [Y. Nozaki and C. Tanford (1971) Journal of Biological Chemistry, Vol. 46, pp. 2211-2217; E. B. Leodidis and T. A. Hatton (1990) Journal of Physical Chemistry, Vol. 94, pp. 6411-6420]. In addition, the model generates qualitatively correct thermodynamic constants and it accurately predicts thermodynamically favorable folding of a number of aqueous homopolypeptides from random-coiled states into alpha-helices. The model further facilitates estimation of the Zimm-Bragg helix growth parameter s and the nucleation parameter sigma for amino acid residues [B. H. Zimm and J. K. Bragg (1959) Journal of Chemical Physics, Vol. 31, pp. 526-535]. The calculated values of the two parameters fall into the ranges suggested by Zimm and Bragg.
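To make the roles of the two Zimm-Bragg parameters concrete, the following minimal sketch evaluates a simplified nearest-neighbor form of the helix-coil model: each helical residue contributes a factor s, and each helix start contributes an extra factor sigma. This is a textbook simplification and not the thermodynamic model proposed in the paper; the function names and the finite-difference step are assumptions.

```python
import math


def partition_function(n, s, sigma):
    """Sum of statistical weights over all helix/coil states of an n-residue chain."""
    # z_c, z_h: total weight of partial chains ending in coil / helix.
    z_c, z_h = 1.0, 0.0  # the empty chain is treated as ending in coil
    for _ in range(n):
        # Coil residue: weight 1. Helix after coil: sigma * s. Helix after helix: s.
        z_c, z_h = z_c + z_h, sigma * s * z_c + s * z_h
    return z_c + z_h


def helix_fraction(n, s, sigma, eps=1e-6):
    """Mean fraction of helical residues, (1/n) * d ln Z / d ln s, by central difference."""
    up = math.log(partition_function(n, s * (1 + eps), sigma))
    down = math.log(partition_function(n, s * (1 - eps), sigma))
    return (up - down) / (n * (math.log(1 + eps) - math.log(1 - eps)))
```

With sigma set to 1 the residues become independent and the helix fraction reduces to s / (1 + s), which gives a convenient sanity check; small sigma values penalize helix initiation and make the transition more cooperative.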
Affiliation(s)
- C C Chen
- Aspen Technology, Inc., Cambridge, Massachusetts
|
28
|
Abstract
A priori knowledge of secondary structure content can be of great use in theoretical and experimental determination of protein structure. We present a method that uses two computer-simulated neural networks placed in "tandem" to predict the secondary structure content of water-soluble, globular proteins. The first of the two networks, NET1, predicts a protein's helix and strand content given information about the protein's amino acid composition, molecular weight and heme presence. Because NET1 contained more adjustable parameters (network weights) than learning examples, this network experienced problems with memorization, which is the inability to generalize onto new, never-seen-before examples. To overcome this problem, we designed a second network, NET2, which learned to determine when NET1 was in a state of generalization. Together, these two networks produce prediction errors as low as 5.0% and 5.6% for helix and strand content, respectively, on a set of protein crystal structures bearing little homology to those used in network training. A comparison between three other methods including a multiple linear regression analysis, a non-hidden-node network analysis and a secondary structure assignment analysis reveals that our tandem neural network scheme is, indeed, the best method for predicting secondary structure content. The results of our analysis suggest that the knowledge of sequence information is not necessary for highly accurate predictions of protein secondary structure content.
Affiliation(s)
- S M Muskal
- Department of Chemistry, University of California, Berkeley 94720
|
29
|
Fukugita M, Nakazawa T, Kawai H, Okamoto Y. Monte Carlo Simulated Annealing Prediction for α-Helix Propensity of Amino Acid Homopolymers. Chem Lett 1991. [DOI: 10.1246/cl.1991.1279] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.3] [Indexed: 11/12/2022]
|
30
|
Barba D, He Z, Marrelli L. Computer modeling of protein structures: energy minimization as a tool for the design of novel molecules. Rev Chem Eng 1991. [DOI: 10.1515/revce.1991.7.1.1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/15/2022]
|
31
|
Yada RY, Jackman RL, Nakai S. Secondary structure prediction and determination of proteins--a review. International Journal of Peptide and Protein Research 1988; 31:98-108. [PMID: 3284835 DOI: 10.1111/j.1399-3011.1988.tb00011.x] [Citation(s) in RCA: 25] [Impact Index Per Article: 0.7] [Indexed: 01/05/2023]
Abstract
The rapid increase in sequence data in combination with a greater understanding of the forces regulating protein structure has been the impetus for an upsurge in the development of theoretical prediction methods. These methods have afforded protein chemists the ability to identify and quantify the various secondary structures along the protein chain. Concurrently, various physico-chemical techniques have been developed such as nuclear Overhauser enhancement n.m.r. and laser Raman spectroscopy. In addition, traditional methods such as infrared and circular dichroism spectroscopy have been refined. Although both predictive and physico-chemical techniques are limited in the types of secondary structure they are capable of determining, they have provided valuable information with regards to protein folding and topology in the absence of X-ray data, and have formed the basis for the development of improved methods for secondary structure determination. This paper reviews some of the predictive and physico-chemical methods presently used to determine protein secondary structure.
Affiliation(s)
- R Y Yada
- Department of Food Science, University of Guelph, Ontario, Canada
|
32
|
Novotný J, Auffray C. A program for prediction of protein secondary structure from nucleotide sequence data: application to histocompatibility antigens. Nucleic Acids Res 1984; 12:243-55. [PMID: 6546418 PMCID: PMC321001 DOI: 10.1093/nar/12.1part1.243] [Citation(s) in RCA: 132] [Impact Index Per Article: 3.3] [Indexed: 11/14/2022] Open
Abstract
A computer program is described which, given a nucleotide or an amino acid sequence, outputs protein secondary structure prediction curves as well as hydrophobicity and charged-residue profiles. The program allows for cumulative averaging of properties (secondary structure propensities, hydrophobicity and charge profiles) from several homologous primary structures, a novel concept shown to improve the predictive accuracy. The use of the program is demonstrated on a set of nucleotide and amino acid sequences from human and murine histocompatibility antigens of class I and II. The last extracellular domains of both class I and II antigens (alpha 3 of class I, alpha 2 and beta 2 of class II) and the beta 2-microglobulin domain are predicted to consist of seven anti-parallel beta-strands, in accord with previous claims of homology between these domains and the constant domains of immunoglobulin chains. The remaining extracellular domains are all proposed to form an anti-parallel, four-stranded beta-sheet with one of its faces being covered by alpha-helices and/or structureless segments ("open face sandwiches").
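The cumulative averaging idea can be sketched as follows: compute a sliding-window property profile for each homologous sequence, then average the profiles position by position. The Kyte-Doolittle hydropathy scale, the window length of 7, and the requirement that sequences are already aligned to equal length without gaps are all assumptions for this illustration; the abstract does not state which scales or window sizes the program uses.

```python
# Kyte-Doolittle hydropathy values (an assumed scale; the abstract names none).
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def window_profile(seq, window=7):
    """Sliding-window mean hydropathy; one value per full window."""
    values = [KD[aa] for aa in seq.upper()]
    return [sum(values[i:i + window]) / window
            for i in range(len(values) - window + 1)]


def averaged_profile(sequences, window=7):
    """Position-wise mean of the profiles of equal-length homologous sequences."""
    if len({len(s) for s in sequences}) != 1:
        raise ValueError("sequences must be pre-aligned to equal length")
    profiles = [window_profile(s, window) for s in sequences]
    return [sum(column) / len(column) for column in zip(*profiles)]
```

Averaging over homologs tends to smooth out residue-level noise that is specific to one sequence, which is consistent with the improvement in predictive accuracy the abstract reports. Handling alignment gaps would require an extra rule that is not shown here.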
|
33
|
|
34
|
|
35
|
A highly stable adenosine triphosphatase from a thermophilic bacterium. Purification, properties, and reconstitution. J Biol Chem 1975. [DOI: 10.1016/s0021-9258(19)40902-2] [Citation(s) in RCA: 174] [Impact Index Per Article: 3.6] [Indexed: 11/20/2022] Open
|
36
|
|
37
|
Pullman B, Pullman A. Molecular orbital calculations on the conformation of amino acid residues of proteins. Advances in Protein Chemistry 1974; 28:347-526. [PMID: 4598825 DOI: 10.1016/s0065-3233(08)60233-8] [Citation(s) in RCA: 160] [Impact Index Per Article: 3.2] [Indexed: 01/11/2023]
|
38
|
Proton magnetic resonance study of conformational transitions in heterogeneous oxidized-wool proteins. Polymer 1973. [DOI: 10.1016/0032-3861(73)90163-8] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.0] [Indexed: 11/19/2022]
|
39
|
|
40
|
Kabat EA, Wu TT. The influence of nearest-neighboring amino acid residues on aspects of secondary structure of proteins. Attempts to locate α-helices and β-sheets. Biopolymers 1973; 12:751-74. [PMID: 4695672 DOI: 10.1002/bip.1973.360120406] [Citation(s) in RCA: 43] [Impact Index Per Article: 0.8] [Indexed: 01/11/2023]
|
41
|
Wu TT, Kabat EA. An attempt to evaluate the influence of neighboring amino acids (n-1) and (n+1) on the backbone conformation of amino acid (n) in proteins. Use in predicting the three-dimensional structure of the polypeptide backbone of other proteins. J Mol Biol 1973; 75:13-31. [PMID: 4351543 DOI: 10.1016/0022-2836(73)90526-3] [Citation(s) in RCA: 54] [Impact Index Per Article: 1.1] [Indexed: 01/10/2023]
|
42
|
Maigret B, Perahia D, Pullman B. Molecular orbital calculations on the conformation of polypeptides and proteins. V. Conformational energy maps and stereochemical rotational states of aliphatic residues. Biopolymers 1971; 10:491-511. [PMID: 5552657 DOI: 10.1002/bip.360100306] [Citation(s) in RCA: 21] [Impact Index Per Article: 0.4] [Indexed: 01/15/2023]
|
43
|
|
44
|
|
45
|
Molecular orbital calculations on the conformation of polypeptides and proteins. ACTA ACUST UNITED AC 1970. [DOI: 10.1007/bf01185857] [Citation(s) in RCA: 11] [Impact Index Per Article: 0.2] [Indexed: 10/25/2022]
|
46
|
Johnson P, Miller JN. Studies on Waldenström macroglobulins. II. Optical rotatory dispersion. Biochimica et Biophysica Acta 1970; 207:308-17. [PMID: 5450132 DOI: 10.1016/0005-2795(70)90023-1] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.0] [Indexed: 01/15/2023]
|
47
|
|
48
|
Scanu AM. The effect of reduction and carboxymethylation on the circular dichroic spectra of two polypeptide classes of serum high density lipoprotein. Biochimica et Biophysica Acta 1970; 200:570-2. [PMID: 5436648 DOI: 10.1016/0005-2795(70)90114-5] [Citation(s) in RCA: 19] [Impact Index Per Article: 0.4] [Indexed: 01/15/2023]
|
49
|
Gotto AM. Recent studies on the structure of human serum low-and high-density lipoproteins. Proc Natl Acad Sci U S A 1969; 64:1119-27. [PMID: 5264142 PMCID: PMC223351 DOI: 10.1073/pnas.64.3.1119] [Citation(s) in RCA: 25] [Impact Index Per Article: 0.5] [Indexed: 01/14/2023] Open
Abstract
Several methods have been recently developed for the preparation of soluble apo-low-density lipoprotein (apoLDL). Delipidation of LDL alters the immunochemical activity of the molecule and causes an apparent increase in random structure. Despite these changes, soluble apoLDL retains prominent immunological and optical characteristics of native LDL. The lipid of LDL appears to stabilize the protein conformation of LDL derivatives. Optical measurements suggest that native LDL contains a significant amount of pleated-sheet, antiparallel chain beta-structure in addition to random and probably some helical structure, while HDL, by optical criteria, is relatively richer in the alpha-helical conformation. Two proteins containing C-terminal glutamine and C-terminal threonine have been isolated from high-density lipoprotein (HDL). Circular dichroic measurements and total amino acid content are consistent with a greater helical content of apoHDL-Thr than apoHDL-Gln. The techniques of nuclear magnetic resonance and electron spin resonance have been used to probe the structure of LDL and HDL. The presence of protein does not seem to exert a constraining effect on the proton resonance of lipoprotein lipids, while the presence of lipid does appreciably constrain nitroxide tumbling in spin-labeled lipoprotein protein, the constraint being relatively greater in HDL than in LDL. Existing structural evidence is consistent with lipoprotein models in which protein and phospholipid occupy the surface and other lipids are more internal.
|
50
|
Gotto AM, Shore B. Conformation of human serum high density lipoprotein and its peptide components. Nature 1969; 224:69-70. [PMID: 5822907 DOI: 10.1038/224069a0] [Citation(s) in RCA: 23] [Impact Index Per Article: 0.4] [Indexed: 01/16/2023]
|