1
Discrimination power of knowledge-based potential dictated by the dominant energies in native protein structures. Amino Acids 2019; 51:1029-1038. [PMID: 31098784 DOI: 10.1007/s00726-019-02743-0] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Received: 09/20/2018] [Accepted: 05/08/2019] [Indexed: 01/20/2023]
Abstract
Extracting a well-designed energy function is important for protein structure evaluation. Knowledge-based potential functions are one type of energy function that can be obtained from known protein structures. The pairwise potential between atom types is approximated using Boltzmann's law, which relates the frequency of atom types to their potential. The total energy is approximated as a summation of the pairwise potentials between atomic pairs. In the present study, the performance of a knowledge-based potential function was assessed based on the strength of interaction between groups of amino acids. The dominant energies involved in the pairwise potentials were revealed by eigenvalue analysis of the matrix whose elements represent the energy between amino acids. For this purpose, the matrix of mean energies of residue-residue interaction types was constructed using 500 native protein structures. The matrix has a dominant eigenvalue, and the amino acids LEU, VAL, ILE, PHE, TYR, ALA and TRP have high values along the dominant eigenvector. The results show that the ranking of amino acids is consistent with the power of amino acids in discriminating native structures using the K-alphabet reduced model. In the reduced interactions, only amino acids from a subset of all 20 amino acids, along with their interactions, are considered to assess the energy. In the K-alphabet reduced model, the reduced structures are constructed based on only the K amino acid types. The dominant K-alphabet reduced model derived for the first K amino acids in the list [LEU, VAL, PHE, ILE, TYR, ALA, TRP] has the best discrimination of native structure among all possible K-alphabet reduced models. Knowledge-based potentials might be improved with a new strategy.
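The two ingredients named in this abstract, an inverse-Boltzmann energy estimate and the dominant eigenvector of a residue-residue energy matrix, can be sketched roughly as follows. This is a minimal illustration and not the authors' implementation: the function names, the `kT` scaling, the choice of power iteration, and the toy 3x3 matrix values are all assumptions made here for demonstration.

```python
import math


def inverse_boltzmann(observed, expected, kT=1.0):
    """Energy estimate -kT * ln(f_obs / f_exp) for one pair type (hypothetical helper)."""
    return -kT * math.log(observed / expected)


def mat_vec(matrix, vec):
    """Multiply a square matrix (list of rows) by a vector."""
    return [sum(a * x for a, x in zip(row, vec)) for row in matrix]


def dominant_eigenpair(matrix, iterations=200):
    """Power iteration for the eigenvalue of largest magnitude of a symmetric matrix.

    The start vector is arbitrary; if it happened to be orthogonal to the
    dominant eigenvector, the iteration would not converge to it.
    """
    n = len(matrix)
    vec = [float(i + 1) for i in range(n)]
    norm = math.sqrt(sum(x * x for x in vec))
    vec = [x / norm for x in vec]
    for _ in range(iterations):
        product = mat_vec(matrix, vec)
        norm = math.sqrt(sum(x * x for x in product))
        if norm == 0.0:
            break
        vec = [x / norm for x in product]
    # Rayleigh quotient of the (unit-length) converged vector.
    value = sum(v * w for v, w in zip(vec, mat_vec(matrix, vec)))
    return value, vec


def rank_by_component(names, vec):
    """Order labels by the magnitude of their eigenvector component, largest first."""
    return [name for _, name in sorted(zip((abs(x) for x in vec), names), reverse=True)]


# Toy mean-energy matrix with made-up values (not data from the paper).
names = ["LEU", "VAL", "GLY"]
toy_energy = [
    [-2.0, -1.5, -0.2],
    [-1.5, -1.8, -0.1],
    [-0.2, -0.1, -0.3],
]
eigenvalue, eigenvector = dominant_eigenpair(toy_energy)
ranking = rank_by_component(names, eigenvector)
```

In this toy case the weakly interacting residue type ends up last in `ranking`, which mirrors the idea of ranking amino acids by their weight along the dominant eigenvector.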
2
Elofsson A, Joo K, Keasar C, Lee J, Maghrabi AHA, Manavalan B, McGuffin LJ, Ménendez Hurtado D, Mirabello C, Pilstål R, Sidi T, Uziela K, Wallner B. Methods for estimation of model accuracy in CASP12. Proteins 2017; 86 Suppl 1:361-373. [DOI: 10.1002/prot.25395] [Citation(s) in RCA: 24] [Impact Index Per Article: 3.4] [Received: 05/29/2017] [Revised: 09/25/2017] [Accepted: 10/03/2017] [Indexed: 12/28/2022]
Affiliation(s)
- Arne Elofsson
- Department of Biochemistry and Biophysics and Science for Life Laboratory; Stockholm University, Box 1031; Solna 171 21 Sweden
- Keehyoung Joo
- Center for In Silico Protein Science and Center for Advanced Computation; Korea Institute for Advanced Study; Seoul 130-722 Korea
- Chen Keasar
- Department of Computer Science; Ben Gurion University of the Negev; Israel
- Jooyoung Lee
- Center for In Silico Protein Science and School of Computational Sciences; Korea Institute for Advanced Study; Seoul 130-722 Korea
- Ali H. A. Maghrabi
- School of Biological Sciences; University of Reading, Whiteknights, Reading; RG6 6AS United Kingdom
- Balachandran Manavalan
- Center for In Silico Protein Science and School of Computational Sciences; Korea Institute for Advanced Study; Seoul 130-722 Korea
- Liam J. McGuffin
- School of Biological Sciences; University of Reading, Whiteknights, Reading; RG6 6AS United Kingdom
- David Ménendez Hurtado
- Department of Biochemistry and Biophysics and Science for Life Laboratory; Stockholm University, Box 1031; Solna 171 21 Sweden
- Claudio Mirabello
- Department of Physics, Chemistry, and Biology, Bioinformatics Division; Linköping University; Linköping 581 83 Sweden
- Robert Pilstål
- Department of Physics, Chemistry, and Biology, Bioinformatics Division; Linköping University; Linköping 581 83 Sweden
- Tomer Sidi
- Department of Computer Science; Ben Gurion University of the Negev; Israel
- Karolis Uziela
- Department of Biochemistry and Biophysics and Science for Life Laboratory; Stockholm University, Box 1031; Solna 171 21 Sweden
- Björn Wallner
- Department of Physics, Chemistry, and Biology, Bioinformatics Division; Linköping University; Linköping 581 83 Sweden
3
Durham E, Dorr B, Woetzel N, Staritzbichler R, Meiler J. Solvent accessible surface area approximations for rapid and accurate protein structure prediction. J Mol Model 2009; 15:1093-108. [PMID: 19234730 PMCID: PMC2712621 DOI: 10.1007/s00894-009-0454-9] [Citation(s) in RCA: 207] [Impact Index Per Article: 13.8] [Received: 11/16/2008] [Accepted: 01/02/2009] [Indexed: 12/01/2022]
Abstract
The burial of hydrophobic amino acids in the protein core is a driving force in protein folding. The extent to which an amino acid interacts with the solvent and the protein core is naturally proportional to the surface area exposed to these environments. However, an accurate calculation of the solvent-accessible surface area (SASA), a geometric measure of this exposure, is numerically demanding as it is not pair-wise decomposable. Furthermore, it depends on a full-atom representation of the molecule. This manuscript introduces a series of four SASA approximations of increasing computational complexity and accuracy as well as knowledge-based environment free energy potentials based on these SASA approximations. Their ability to distinguish correctly from incorrectly folded protein models is assessed to balance speed and accuracy for protein structure prediction. We find the newly developed “Neighbor Vector” algorithm provides the most optimal balance of accurate yet rapid exposure measures.
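To illustrate what a pair-wise decomposable exposure proxy can look like, here is a minimal sketch of a hard-cutoff neighbor count. It is not taken from the paper and is not the "Neighbor Vector" algorithm: the function name, the use of one representative point per residue, and the 10 Å default cutoff are assumptions chosen only for illustration.

```python
import math


def neighbor_counts(coords, cutoff=10.0):
    """Count, for each residue, how many other residues lie within `cutoff`.

    `coords` holds one representative (x, y, z) point per residue, for
    example a C-beta position (an assumption made for this sketch). A high
    count suggests a buried residue, a low count an exposed one.
    """
    counts = [0] * len(coords)
    for i in range(len(coords)):
        for j in range(i + 1, len(coords)):
            # Each pair is evaluated once and credited to both residues,
            # which is what makes the measure pair-wise decomposable.
            if math.dist(coords[i], coords[j]) <= cutoff:
                counts[i] += 1
                counts[j] += 1
    return counts


# Four points on a line; the last one is far from the others.
example = [(0.0, 0.0, 0.0), (3.0, 0.0, 0.0), (6.0, 0.0, 0.0), (50.0, 0.0, 0.0)]
example_counts = neighbor_counts(example, cutoff=4.0)
```

Unlike an exact SASA calculation, each pair contributes independently here, so the cost stays quadratic in the number of residues and needs no full-atom surface construction.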
Affiliation(s)
- Elizabeth Durham
- Department of Chemistry, Center for Structural Biology, Vanderbilt University, 465 21st Ave South, Nashville, TN 37232-8725, USA
4
Eramian D, Eswar N, Shen MY, Sali A. How well can the accuracy of comparative protein structure models be predicted? Protein Sci 2008; 17:1881-93. [PMID: 18832340 DOI: 10.1110/ps.036061.108] [Citation(s) in RCA: 114] [Impact Index Per Article: 7.1] [Indexed: 10/21/2022]
Abstract
Comparative structure models are available for two orders of magnitude more protein sequences than are experimentally determined structures. These models, however, suffer from two limitations that experimentally determined structures do not: They frequently contain significant errors, and their accuracy cannot be readily assessed. We have addressed the latter limitation by developing a protocol optimized specifically for predicting the Calpha root-mean-squared deviation (RMSD) and native overlap (NO3.5A) errors of a model in the absence of its native structure. In contrast to most traditional assessment scores that merely predict one model is more accurate than others, this approach quantifies the error in an absolute sense, thus helping to determine whether or not the model is suitable for intended applications. The assessment relies on a model-specific scoring function constructed by a support vector machine. This regression optimizes the weights of up to nine features, including various sequence similarity measures and statistical potentials, extracted from a tailored training set of models unique to the model being assessed: If possible, we use similarly sized models with the same fold; otherwise, we use similarly sized models with the same secondary structure composition. This protocol predicts the RMSD and NO3.5A errors for a diverse set of 580,317 comparative models of 6174 sequences with correlation coefficients (r) of 0.84 and 0.86, respectively, to the actual errors. This scoring function achieves the best correlation compared to 13 other tested assessment criteria that achieved correlations ranging from 0.35 to 0.71.
Affiliation(s)
- David Eramian
- Graduate Group in Biophysics, University of California at San Francisco, California 94158, USA
5
Fernandez-Fuentes N, Rai BK, Madrid-Aliste CJ, Fajardo JE, Fiser A. Comparative protein structure modeling by combining multiple templates and optimizing sequence-to-structure alignments. Bioinformatics 2007; 23:2558-65. [PMID: 17823132 DOI: 10.1093/bioinformatics/btm377] [Citation(s) in RCA: 74] [Impact Index Per Article: 4.4] [Indexed: 11/13/2022]
Abstract
MOTIVATION Two major bottlenecks in advancing comparative protein structure modeling are the efficient combination of multiple template structures and the generation of a correct input target-template alignment. RESULTS A novel method, Multiple Mapping Method with Multiple Templates (M4T) is introduced that implements an algorithm to automatically select and combine Multiple Template structures (MT) and an alignment optimization protocol (Multiple Mapping Method, MMM). The MT module of M4T selects and combines multiple template structures through an iterative clustering approach that takes into account the 'unique' contribution of each template, their sequence similarity among themselves and to the target sequence, and their experimental resolution. MMM is a sequence-to-structure alignment method that optimally combines alternatively aligned regions according to their fit in the structural environment of the template structure. The resulting M4T alignment is used as input to a comparative modeling module. The performance of M4T has been benchmarked on CASP6 comparative modeling target sequences and on a larger independent test set, and showed favorable performance to current state of the art methods.
Affiliation(s)
- Narcis Fernandez-Fuentes
- Department of Biochemistry and Seaver Center for Bioinformatics, Albert Einstein College of Medicine, 1300 Morris Park Avenue, Bronx, NY 10461, USA
6
Eramian D, Shen MY, Devos D, Melo F, Sali A, Marti-Renom MA. A composite score for predicting errors in protein structure models. Protein Sci 2006; 15:1653-66. [PMID: 16751606 PMCID: PMC2242555 DOI: 10.1110/ps.062095806] [Citation(s) in RCA: 114] [Impact Index Per Article: 6.3] [Indexed: 10/24/2022]
Abstract
Reliable prediction of model accuracy is an important unsolved problem in protein structure modeling. To address this problem, we studied 24 individual assessment scores, including physics-based energy functions, statistical potentials, and machine learning-based scoring functions. Individual scores were also used to construct approximately 85,000 composite scoring functions using support vector machine (SVM) regression. The scores were tested for their abilities to identify the most native-like models from a set of 6000 comparative models of 20 representative protein structures. Each of the 20 targets was modeled using a template of <30% sequence identity, corresponding to challenging comparative modeling cases. The best SVM score outperformed all individual scores by decreasing the average RMSD difference between the model identified as the best of the set and the model with the lowest RMSD (DeltaRMSD) from 0.63 A to 0.45 A, while having a higher Pearson correlation coefficient to RMSD (r=0.87) than any other tested score. The most accurate score is based on a combination of the DOPE non-hydrogen atom statistical potential; surface, contact, and combined statistical potentials from MODPIPE; and two PSIPRED/DSSP scores. It was implemented in the SVMod program, which can now be applied to select the final model in various modeling problems, including fold assignment, target-template alignment, and loop modeling.
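The two evaluation measures quoted in this abstract, DeltaRMSD and the Pearson correlation between a score and RMSD, can be sketched as below. This is a hedged illustration only: the function names are hypothetical, and the convention that a lower score means a better model is an assumption (it holds for energy-like scores but may not match every score in the study).

```python
import math


def delta_rmsd(scores, rmsds):
    """RMSD of the model picked by the score minus the lowest RMSD in the set.

    Assumes lower score = better model (an assumption of this sketch).
    A value of 0 means the score picked the most native-like model.
    """
    picked = min(range(len(scores)), key=lambda i: scores[i])
    return rmsds[picked] - min(rmsds)


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equally long sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)


# Made-up scores and RMSD values (in Angstrom) for three models of one target.
toy_scores = [-10.0, -12.0, -8.0]
toy_rmsds = [2.0, 2.5, 4.0]
toy_delta = delta_rmsd(toy_scores, toy_rmsds)
```

Here the score picks the second model (RMSD 2.5) while the best available model has RMSD 2.0, so `toy_delta` is the 0.5 Å gap between them.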
Affiliation(s)
- David Eramian
- Graduate Group in Biophysics, Department of Biopharmaceutical Sciences, University of California at San Francisco 94158, USA
7
Pisabarro MT, Leung B, Kwong M, Corpuz R, Frantz GD, Chiang N, Vandlen R, Diehl LJ, Skelton N, Kim HS, Eaton D, Schmidt KN. Cutting Edge: Novel Human Dendritic Cell- and Monocyte-Attracting Chemokine-Like Protein Identified by Fold Recognition Methods. J Immunol 2006; 176:2069-73. [PMID: 16455961 DOI: 10.4049/jimmunol.176.4.2069] [Citation(s) in RCA: 75] [Impact Index Per Article: 4.2] [Indexed: 11/19/2022]
Abstract
Chemokines play an important role in the immune system by regulating cell trafficking in homeostasis and inflammation. In this study, we report the identification and characterization of a novel cytokine-like protein, DMC (dendritic cell and monocyte chemokine-like protein), which attracts dendritic cells and monocytes. The key to the identification of this putative new chemokine was the application of threading techniques to its uncharacterized sequence. Based on our studies, DMC is predicted to have an IL-8-like chemokine fold and to be structurally and functionally related to CXCL8 and CXCL14. Consistent with our predictions, DMC induces migration of monocytes and immature dendritic cells. Expression studies show that DMC is constitutively expressed in lung, suggesting a potential role for DMC in recruiting monocytes and dendritic cells from blood into lung parenchyma.
Affiliation(s)
- M Teresa Pisabarro
- Department of Protein Engineering, Genentech Inc., South San Francisco, CA 94080, USA.
8
Xu Z, Fang S, Shi H, Li H, Deng Y, Liao Y, Wu JM, Zheng H, Zhu H, Chen HM, Tsang SY, Xue H. Topology characterization of a benzodiazepine-binding beta-rich domain of the GABAA receptor alpha1 subunit. Protein Sci 2005; 14:2622-37. [PMID: 16195550 PMCID: PMC2253290 DOI: 10.1110/ps.051555205] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.2] [Indexed: 10/25/2022]
Abstract
Structural investigation of GABAA receptors has been limited by difficulties imposed by its trans-membrane-complex nature. In the present study, the topology of a membrane-proximal beta-rich (MPB) domain in the C139-L269 segment of the receptor alpha1 subunit was probed by mapping the benzodiazepine (BZ)-binding and epitopic sites, as well as fluorescence resonance energy transfer (FRET) analysis. Ala-scanning and semiconservative substitutions within this segment revealed the contribution of the phenyl rings of Y160 and Y210, the hydroxy group of S186 and the positive charge on R187 to BZ-binding. FRET with the bound BZ ligand indicated the proximity of Y160, S186, R187, and S206 to the BZ-binding site. On the other hand, epitope-mapping using the monoclonal antibodies (mAbs) against the MPB domain established a clustering of T172, R173, E174, Q196, and T197. Based on the lack of FRET between Trp substitutionally placed at R173 or V198 and bound BZ, this epitope-mapped cluster is located on a separate end of the folded protein from the BZ-binding site. Mutations of the five conserved Cys and Trp residues in the MPB domain gave rise to synergistic and rescuing effects on protein secondary structures and unfolding stability that point to a CCWCW-pentad, reminiscent to the CWC-triad "pin" of immunoglobulin (Ig)-like domains, important for the structural maintenance. These findings, together with secondary structure and fold predictions suggest an anti-parallel beta-strand topology with resemblance to Ig-like fold, having the BZ-binding and the epitopic residues being clustered at two different ends of the fold.
Affiliation(s)
- Zhiwen Xu
- Department of Biochemistry, Hong Kong University of Science and Technology, Clear Water Bay, Hong Kong, China
9
Abstract
Target- and ligand-based virtual screening have emerged as resource-saving techniques that have been successfully applied to identify novel chemotypes in biologically active molecules. Eight confirmed virtual screening hits have recently been described and are discussed in this review, with focus on the workflow. These are then evaluated in the light of pharmacokinetics prediction (e.g. Caco-2 permeability, cytochrome P450 inhibition and hERG binding). We anticipate problems for five of these hits (e.g. cardiac toxicity), which warrant further experiments. Future challenges include dynamic tautomer/protonation treatment for both ligands and targets and improved pre- and post- virtual screening filters.
Affiliation(s)
- Tudor I Oprea
- Division of Biocomputing, University of New Mexico School of Medicine, MSC 08 4560, 1 University of New Mexico, Albuquerque, New Mexico 87131-0001, USA.
10
Yu M, Schreek S, Cerni C, Schamberger C, Lesniewicz K, Poreba E, Vervoorts J, Walsemann G, Grötzinger J, Kremmer E, Mehraein Y, Mertsching J, Kraft R, Austen M, Lüscher-Firzlaff J, Lüscher B. PARP-10, a novel Myc-interacting protein with poly(ADP-ribose) polymerase activity, inhibits transformation. Oncogene 2005; 24:1982-93. [PMID: 15674325 DOI: 10.1038/sj.onc.1208410] [Citation(s) in RCA: 117] [Impact Index Per Article: 6.2] [Indexed: 11/09/2022]
Abstract
The proto-oncoprotein c-Myc functions as a transcriptional regulator that controls different aspects of cell behavior, including proliferation, differentiation, and apoptosis. In addition, Myc proteins have the potential to transform cells and are deregulated in the majority of human cancers. Several Myc-interacting factors have been described that mediate part of Myc's functions in the control of cell behavior. Here, we describe the isolation of a novel 150 kDa protein, designated PARP-10, that interacts with Myc. PARP-10 possesses domains with homology to RNA recognition motifs and to poly(ADP-ribose) polymerases (PARP). Molecular modeling and biochemical analysis define a PARP domain that is capable of ADP-ribosylating PARP-10 itself and core histones, but neither Myc nor Max. PARP-10 is localized to the nuclear and cytoplasmic compartments that is controlled at least in part by a Leu-rich nuclear export sequence (NES). Functionally, PARP-10 inhibits c-Myc- and E1A-mediated cotransformation of rat embryo fibroblasts, a function that is independent of PARP activity but that depends on a functional NES. Together, our findings define a novel PARP enzyme involved in the control of cell proliferation.
Affiliation(s)
- Mei Yu
- Abteilung Biochemie und Molekularbiologie, Institut für Biochemie, Klinikum der RWTH, Pauwelsstrasse 30, 52057 Aachen, Germany
11
Hoffmann B, Eichmüller C, Steinhauser O, Konrat R. Rapid Assessment of Protein Structural Stability and Fold Validation via NMR. Methods Enzymol 2005; 394:142-75. [PMID: 15808220 DOI: 10.1016/s0076-6879(05)94006-8] [Citation(s) in RCA: 16] [Impact Index Per Article: 0.8] [Indexed: 02/12/2023]
Abstract
In structural proteomics, it is necessary to efficiently screen in a high-throughput manner for the presence of stable structures in proteins that can be subjected to subsequent structure determination by X-ray or NMR spectroscopy. Here we illustrate that the (1)H chemical distribution in a protein as detected by (1)H NMR spectroscopy can be used to probe protein structural stability (e.g., the presence of stable protein structures) of proteins in solution. Based on experimental data obtained on well-structured proteins and proteins that exist in a molten globule state or a partially folded alpha-helical state, a well-defined threshold exists that can be used as a quantitative benchmark for protein structural stability (e.g., foldedness) in solution. Additionally, in this chapter we describe a largely automated strategy for rapid fold validation and structure-based backbone signal assignment. Our methodology is based on a limited number of NMR experiments (e.g., HNCA and 3D NOESY-HSQC) and performs a Monte Carlo-type optimization. The novel feature of the method is the opportunity to screen for structural fragments (e.g., template scanning). The performance of this new validation tool is demonstrated with applications to a diverse set of proteins.
Affiliation(s)
- Bernd Hoffmann
- Institute of Theoretical Chemistry and Molecular Structural Biology, University of Vienna, Austria
12
de Bakker PIW, DePristo MA, Burke DF, Blundell TL. Ab initio construction of polypeptide fragments: Accuracy of loop decoy discrimination by an all-atom statistical potential and the AMBER force field with the Generalized Born solvation model. Proteins 2003; 51:21-40. [PMID: 12596261 DOI: 10.1002/prot.10235] [Citation(s) in RCA: 120] [Impact Index Per Article: 5.7] [Indexed: 12/24/2022]
Abstract
The accuracy of model selection from decoy ensembles of protein loop conformations was explored by comparing the performance of the Samudrala-Moult all-atom statistical potential (RAPDF) and the AMBER molecular mechanics force field, including the Generalized Born/surface area solvation model. Large ensembles of consistent loop conformations, represented at atomic detail with idealized geometry, were generated for a large test set of protein loops of 2 to 12 residues long by a novel ab initio method called RAPPER that relies on fine-grained residue-specific phi/psi propensity tables for conformational sampling. Ranking the conformers on the basis of RAPDF scores resulted in selected conformers that had an average global, non-superimposed RMSD for all heavy mainchain atoms ranging from 1.2 A for 4-mers to 2.9 A for 8-mers to 6.2 A for 12-mers. After filtering on the basis of anchor geometry and RAPDF scores, ranking by energy minimization of the AMBER/GBSA potential energy function selected conformers that had global RMSD values of 0.5 A for 4-mers, 2.3 A for 8-mers, and 5.0 A for 12-mers. Minimized fragments had, on average, consistently lower RMSD values (by 0.1 A) than their initial conformations. The importance of the Generalized Born solvation energy term is reflected by the observation that the average RMSD accuracy for all loop lengths was worse when this term is omitted. There are, however, still many cases where the AMBER gas-phase minimization selected conformers of lower RMSD than the AMBER/GBSA minimization. The AMBER/GBSA energy function had better correlation with RMSD to native than the RAPDF. When the ensembles were supplemented with conformations extracted from experimental structures, a dramatic improvement in selection accuracy was observed at longer lengths (average RMSD of 1.3 A for 8-mers) when scoring with the AMBER/GBSA force field. 
This work provides the basis for a promising hybrid approach of ab initio and knowledge-based methods for loop modeling.
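The "global, non-superimposed RMSD" used in this abstract can be read as an RMSD computed directly on the coordinates as they sit in the protein frame, without first fitting the loop onto the reference loop. A minimal sketch under that reading is given below; the function name and the tuple-based coordinate format are assumptions made for illustration.

```python
import math


def rmsd_no_superposition(model, reference):
    """Root-mean-square deviation between matched atoms, with no fitting step.

    Both arguments are equally long lists of (x, y, z) tuples in the same
    coordinate frame (assumed here to be the frame of the surrounding protein).
    """
    if len(model) != len(reference) or not model:
        raise ValueError("coordinate lists must be non-empty and equally long")
    squared = sum(
        (a - b) ** 2
        for atom_m, atom_r in zip(model, reference)
        for a, b in zip(atom_m, atom_r)
    )
    return math.sqrt(squared / len(model))


# A two-atom fragment displaced by 1 Angstrom along z relative to the reference.
reference_atoms = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
shifted_atoms = [(0.0, 0.0, 1.0), (1.0, 0.0, 1.0)]
shift_rmsd = rmsd_no_superposition(shifted_atoms, reference_atoms)
```

Because no superposition is applied, a rigid displacement of an otherwise perfect loop still counts as error, which is why this measure penalizes poor anchor geometry as well as poor internal conformation.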
Affiliation(s)
- Paul I W de Bakker
- Department of Biochemistry, University of Cambridge, Cambridge, United Kingdom.
13
Juan D, Graña O, Pazos F, Fariselli P, Casadio R, Valencia A. A neural network approach to evaluate fold recognition results. Proteins 2003; 50:600-8. [PMID: 12577266 DOI: 10.1002/prot.10322] [Citation(s) in RCA: 13] [Impact Index Per Article: 0.6] [Indexed: 11/11/2022]
Abstract
Fold recognition techniques assist the exploration of protein structures, and web-based servers are part of the standard set of tools used in the analysis of biochemical problems. Despite their success, current methods are only able to predict the correct fold in a relatively small number of cases. We propose an approach that improves the selection of correct folds from among the results of two methods implemented as web servers (SAMT99 and 3DPSSM). Our approach is based on the training of a system of neural networks with models generated by the servers and a set of associated characteristics such as the quality of the sequence-structure alignment, distribution of sequence features (sequence-conserved positions and apolar residues), and compactness of the resulting models. Our results show that it is possible to detect adequate folds to model 80% of the sequences with a high level of confidence. The improvements achieved by taking into account sequence characteristics open the door to future improvements by directly including such factors in the step of model generation. This approach has been implemented as an automatic system LIBELLULA, available as a public web server at http://www.pdg.cnb.uam.es/servers/libellula.html.
Affiliation(s)
- D Juan
- Protein Design Group, National Center for Biotechnology, CNB-CSIC, Campus Universidad Autónoma, Cantoblanco, Madrid, M-28049, Spain
14
Klepeis JL, Floudas CA. Prediction of beta-sheet topology and disulfide bridges in polypeptides. J Comput Chem 2003; 24:191-208. [PMID: 12497599 DOI: 10.1002/jcc.10167] [Citation(s) in RCA: 35] [Impact Index Per Article: 1.7] [Indexed: 11/10/2022]
Abstract
An ab initio method has been developed to predict beta architectures in polypeptides. The approach predicts the topology of beta-sheets and disulfide bridges through a novel superstructure-based mathematical framework originally established for chemical process synthesis problems. Two types of superstructure are introduced, both of which emanate from the principle that hydrophobic interactions drive the formation of a beta-structure. The mathematical formulation of the problem results in a set of integer linear programming (ILP) problems that can be solved to global optimality to identify the optimal beta-configuration. These (ILP) models can also predict a ranked ordered list of the best, second-best, third-best, etc., topologies of beta-sheets and disulfide bridges. The approach is shown to perform very well for several benchmark polypeptide systems, as well as polypeptides exhibiting challenging nonsequential beta-sheet topologies folds (56 to 187 amino acids).
Affiliation(s)
- J L Klepeis
- Department of Chemical Engineering, Princeton University, Princeton, New Jersey 08544-5263, USA
15
Das A, Bellofatto V. RNA polymerase II-dependent transcription in trypanosomes is associated with a SNAP complex-like transcription factor. Proc Natl Acad Sci U S A 2003; 100:80-5. [PMID: 12486231 PMCID: PMC140888 DOI: 10.1073/pnas.262609399] [Citation(s) in RCA: 44] [Impact Index Per Article: 2.1] [Indexed: 11/18/2022]
Abstract
Spliced leader RNA transcription is essential for cell viability in trypanosomes. The SL RNA genes are expressed from the only defined RNA polymerase II-dependent promoter identified to date in the trypanosome genome. The SL RNA gene promoter has been shown by in vitro and in vivo analyses to have a tripartite architecture. The upstream most cis-acting element, called PBP-1E, is located between 70 and 60 bp upstream from the transcription start site. This essential element functions along with two downstream elements to direct efficient and proper initiation of transcription. Electrophoretic mobility-shift studies detected a 122-kDa protein, called PBP-1, which interacts with PBP-1E. This protein is the first sequence-specific, double-stranded DNA-binding protein isolated in trypanosomes. Three polypeptides copurify with PBP-1 activity, suggesting that PBP-1 is composed of 57-, 46-, and 36-kDa subunits. We have cloned the genes that encode the 57- and 46-kDa subunits. The 46-kDa protein is a previously uncharacterized protein and may be unique to trypanosomes. Its predicted tertiary structure suggests it binds DNA as part of a complex. The 57-kDa subunit is orthologous to the human small nuclear RNA-activating protein (SNAP)50, which is an essential subunit of the SNAP complex (SNAPc). In human cells, SNAPc binds to the proximal sequence element in both RNA polymerase II- and III-dependent small nuclear RNA gene promoters. These findings identify a surprising link in the transcriptional machinery across a large evolutionary distance in the regulation of small nuclear RNA genes in eukaryotes.
Affiliation(s)
- Anish Das
- Department of Microbiology and Molecular Genetics, University of Medicine and Dentistry of New Jersey-New Jersey Medical School, International Center for Public Health, Newark 07103, USA
16
Sippl MJ, Lackner P, Domingues FS, Prlić A, Malik R, Andreeva A, Wiederstein M. Assessment of the CASP4 fold recognition category. Proteins 2002; Suppl 5:55-67. [PMID: 11835482 DOI: 10.1002/prot.10006] [Citation(s) in RCA: 49] [Impact Index Per Article: 2.2] [Indexed: 11/12/2022]
Abstract
We present the assessment of the CASP4 fold recognition category. The tasks we had to execute include the splitting of multidomain targets into single domains, the classification of target domains in terms of prediction categories, the numerical evaluation of predictions, the mapping of numerical scores to quality indices, the ranking of predictors, the selection of top-performing groups, and the analysis and critical discussion of the state of the art in this field. The 125 fold recognition groups were assessed by a total score that summarizes their performance over all targets and a quality score reflecting the average quality of the submitted models. Most of the top-performing groups achieved respectable results on both scores simultaneously. Several groups submitted models that were much closer to the respective target structures than any of the known folds in the Protein Data Bank. The CASP4 assessment included the automated servers of the parallel CAFASP experiment. For the total score, the highest rank achieved by a fully automated server is 12. Two thirds of the predictors have rather low scores.
Affiliation(s)
- M J Sippl
- Center for Applied Molecular Engineering, Institute for Chemistry and Biochemistry, University of Salzburg, Salzburg, Austria.
17
An Y, Friesner RA. A novel fold recognition method using composite predicted secondary structures. Proteins 2002; 48:352-66. [PMID: 12112702 DOI: 10.1002/prot.10145] [Citation(s) in RCA: 12] [Impact Index Per Article: 0.5] [Indexed: 11/08/2022]
Abstract
In this work, we introduce a new method for fold recognition using composite secondary structures assembled from different secondary structure prediction servers for a given target sequence. An automatic, complete, and robust way of finding all possible combinations of predicted secondary structure segments (SSS) for the target sequence and clustering them into a few flexible clusters, each containing patterns with the same number of SSS, is developed. This program then takes two steps in choosing plausible homologues: (i) an SSS-based alignment excludes impossible templates whose SSS patterns are very different from any of those of the target; (ii) a residue-based alignment selects good structural templates based on sequence similarity and secondary structure similarity between the target and only those templates left in the first stage. The secondary structure of each residue in the target is selected from one of the predictions to find the best match with the template. Truncation is applied to a target where different predictions vary. In most cases, a target is also divided into N-terminal and C-terminal fragments, each of which is used as a separate subsequence. Our program was tested on the fold recognition targets from CASP3 with known PDB codes and some available targets from CASP4. The results are compared with a structural homologue list for each target produced by the CE program (Shindyalov and Bourne, Protein Eng 1998;11:739-747). The program successfully locates homologues with high Z-score and low root-mean-square deviation within the top 30-50 predictions in the overwhelming majority of cases.
Affiliation(s)
- Yuling An
- Department of Chemistry and Center for Biomolecular Simulation, Columbia University, New York, New York 10027, USA
|
18
|
Dominy BN, Brooks CL. Identifying native-like protein structures using physics-based potentials. J Comput Chem 2002; 23:147-60. [PMID: 11913380 DOI: 10.1002/jcc.10018] [Citation(s) in RCA: 85] [Impact Index Per Article: 3.9] [Indexed: 11/06/2022]
Abstract
As the field of structural genomics matures, new methods will be required that can accurately and rapidly distinguish reliable structure predictions from those that are more dubious. We present a method based on the CHARMM gas phase implicit hydrogen force field in conjunction with a generalized Born implicit solvation term that allows one to make such discrimination. We begin by analyzing pairs of threaded structures from the EMBL database, and find that it is possible to identify the misfolded structures with over 90% accuracy. Further, we find that misfolded states are generally favored by the solvation term due to the mispairing of favorable intramolecular ionic contacts. We also examine 29 sets of 29 misfolded globin sequences from Levitt's "Decoys 'R' Us" database generated using a sequence homology-based method. Again, we find that discrimination is possible with approximately 90% accuracy. Also, even in these less distorted structures, mispairing of ionic contacts results in a more favorable solvation energy for misfolded states. This is also found to be the case for collapsed, partially folded conformations of CspA and protein G taken from folding free energy calculations. We also find that the inclusion of the generalized Born solvation term, in postprocess energy evaluation, improves the correlation between structural similarity and energy in the globin database. This significantly improves the reliability of the hypothesis that more energetically favorable structures are also more similar to the native conformation. Additionally, we examine seven extensive collections of misfolded structures created by Park and Levitt using a four-state reduced model also contained in the "Decoys 'R' Us" database. 
Results from these large databases confirm those obtained in the EMBL and misfolded globin databases concerning predictive accuracy, the energetic advantage of misfolded proteins regarding the solvation component, and the improved correlation between energy and structural similarity due to implicit solvation. Z-scores computed for these databases are improved by including the generalized Born implicit solvation term, and are found to be comparable to trained and knowledge-based scoring functions. Finally, we briefly explore the dynamic behavior of a misfolded protein relative to properly folded conformations. We demonstrate that the misfolded conformation diverges quickly from its initial structure while the properly folded states remain stable. Proteins in this study are shown to be more stable than their misfolded counterparts and readily identified based on energetic as well as dynamic criteria. In summary, we demonstrate the utility of physics-based force fields in identifying native-like conformations in a variety of preconstructed structural databases. The details of this discrimination are shown to be dependent on the construction of the structural database.
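The Z-scores mentioned in this abstract are commonly computed by comparing the native energy with the energy distribution of the decoys. The following is a minimal sketch of that general idea, not the authors' procedure: the function name `native_z_score`, the use of the population standard deviation, and the sign convention (more negative means better discrimination) are all assumptions made for illustration.

```python
import statistics

def native_z_score(native_energy, decoy_energies):
    """Z-score of the native energy relative to a decoy set.

    Assumed convention: lower energy is more favorable, so a strongly
    negative Z-score means the native structure is well separated.
    """
    mean = statistics.mean(decoy_energies)
    # Population standard deviation; a sample estimate would also be defensible.
    spread = statistics.pstdev(decoy_energies)
    if spread == 0:
        raise ValueError("decoy energies have zero spread")
    return (native_energy - mean) / spread

# Hypothetical energies in arbitrary units.
z = native_z_score(-10.0, [-2.0, 0.0, 2.0])
```

In this hypothetical example the decoys average 0 with a spread of about 1.63, so the native structure sits roughly six standard deviations below the decoy mean.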
Affiliation(s)
- Brian N Dominy
- Department of Molecular Biology, The Scripps Research Institute, La Jolla, California 92037, USA
|
19
|
Haan C, Is'harc H, Hermanns HM, Schmitz-Van De Leur H, Kerr IM, Heinrich PC, Grötzinger J, Behrmann I. Mapping of a region within the N terminus of Jak1 involved in cytokine receptor interaction. J Biol Chem 2001; 276:37451-8. [PMID: 11468294 DOI: 10.1074/jbc.m106135200] [Citation(s) in RCA: 53] [Impact Index Per Article: 2.3] [Indexed: 11/06/2022] Open
Abstract
Janus kinase 1 (Jak1) is a cytoplasmic tyrosine kinase that noncovalently associates with a variety of cytokine receptors. Here we show that the in vitro translated N-terminal domains of Jak1 are sufficient for binding to a biotinylated peptide comprising the membrane-proximal 73 amino acids of gp130, the signal-transducing receptor chain of interleukin-6-type cytokines. By the fold recognition approach amino acid residues 36-112 of Jak1 were predicted to adopt a beta-grasp fold, and a structural model was built using ubiquitin as a template. Substitution of Tyr(107) to alanine, a residue conserved among Jaks and involved in hydrophobic core interactions of the proposed beta-grasp domain, abrogated binding of full-length Jak1 to gp130 in COS-7 transfectants. By further mutagenesis we identified the loop 4 region of the Jak1 beta-grasp domain as essential for gp130 association and gp130-mediated signal transduction. In Jak1-deficient U4C cells reconstituted with the loop 4 Jak1 mutants L80A/Y81A and Delta(Tyr(81)-Ser(84)), the interferon-gamma, interferon-alpha, and interleukin-6 responses were similarly impaired. Thus, loop 4 of the beta-grasp domain plays a role in the association of Jak1 with both class I and II cytokine receptors. Taken together the structural model and the mutagenesis data provide further insight into the interaction of Janus kinases with cytokine receptors.
Affiliation(s)
- C Haan
- Department of Biochemistry, Rheinisch Westfälische Technische Hochschule Aachen, Pauwelsstr. 30, 52074 Aachen, Germany
|
20
|
|
21
|
Abstract
A homology-based structure prediction method ideally gives both a correct fold assignment and an accurate query-template alignment. In this article we show that the combination of two existing methods, PSI-BLAST and threading, leads to significant enhancement in the success rate of fold recognition. The combined approach, termed COBLATH, also yields much higher alignment accuracy than found in previous studies. It consists of two-way searches both by PSI-BLAST and by threading. In the PSI-BLAST portion, a query is used to search for hits in a library of potential templates and, conversely, each potential template is used to search for hits in a library of queries. In the threading portion, the scoring function is the sum of a sequence profile and a 6×6 substitution matrix between predicted query and known template secondary structure and solvent exposure. "Two-way" in threading means that the query's sequence profile is used to match the sequences of all potential templates and the sequence profiles of all potential templates are used to match the query's sequence. When tested on a set of 533 nonhomologous proteins, COBLATH was able to assign folds for 390 (73%). Among these 390 queries, 265 (68%) had root-mean-square deviations (RMSDs) of less than 8 Å between predicted and actual structures. Such a high success rate and accuracy make COBLATH an ideal tool for structural genomics.
Affiliation(s)
- Y Shan
- Department of Physics, Drexel University, Philadelphia, Pennsylvania 19104, USA
|
22
|
Abstract
Several recent publications illustrated advantages of using sequence profiles in recognizing distant homologies between proteins. At the same time, the practical usefulness of distant homology recognition depends not only on the sensitivity of the algorithm, but also on the quality of the alignment between a prediction target and the template from the database of known proteins. Here, we study this question for several supersensitive protein algorithms that were previously compared in their recognition sensitivity (Rychlewski et al., 2000). A database of protein pairs with similar structures but low sequence similarity is used to rate the alignments obtained with several different methods, which included sequence-sequence, sequence-profile, and profile-profile alignment methods. We show that incorporation of evolutionary information encoded in sequence profiles into alignment calculation methods significantly increases the alignment accuracy, bringing them closer to the alignments obtained from structure comparison. In general, alignment quality is correlated with recognition and alignment score significance. For every alignment method, alignments with statistically significant scores correlate with both correct structural templates and good quality alignments. At the same time, average alignment lengths differ in various methods, making the comparison between them difficult. For instance, the alignments obtained by FFAS, the profile-profile alignment algorithm developed in our group, are always longer than the alignments obtained with the PSI-BLAST algorithms. To address this problem, we develop methods to truncate or extend alignments to cover a specified percentage of protein lengths. In most cases, the elongation of the alignment by profile-profile methods is reasonable, adding fragments of similar structure. The examples of erroneous alignment are examined and it is shown that they can be identified based on the model quality.
Affiliation(s)
- L Jaroszewski
- The Burnham Institute, La Jolla, California 92037, USA
|
23
|
Domingues FS, Lackner P, Andreeva A, Sippl MJ. Structure-based evaluation of sequence comparison and fold recognition alignment accuracy. J Mol Biol 2000; 297:1003-13. [PMID: 10736233 DOI: 10.1006/jmbi.2000.3615] [Citation(s) in RCA: 72] [Impact Index Per Article: 3.0] [Indexed: 11/22/2022]
Abstract
The biological role, biochemical function, and structure of uncharacterized protein sequences are often inferred from their similarity to known proteins. A constant goal is to increase the reliability, sensitivity, and accuracy of alignment techniques to enable the detection of increasingly distant relationships. Development, tuning, and testing of these methods benefit from appropriate benchmarks for the assessment of alignment accuracy. Here, we describe a benchmark protocol to estimate sequence-to-sequence and sequence-to-structure alignment accuracy. The protocol consists of structurally related pairs of proteins and procedures to evaluate alignment accuracy over the whole set. The set of protein pairs covers all the currently known fold types. The benchmark is challenging in the sense that it consists of proteins lacking clear sequence similarity. Correct target alignments are derived from the three-dimensional structures of these pairs by rigid body superposition. An evaluation engine computes the accuracy of alignments obtained from a particular algorithm in terms of alignment shifts with respect to the structure-derived alignments. Using this benchmark we estimate that the best results can be obtained from a combination of amino acid residue substitution matrices and knowledge-based potentials.
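The idea of scoring an alignment by its shifts relative to a structure-derived reference can be sketched as below. This is an illustrative assumption about how such a measure might be computed, not the evaluation engine described in the abstract: alignments are represented as hypothetical dictionaries mapping query residue indices to template residue indices, and the names `alignment_shifts` and `fraction_within` are invented for this example.

```python
def alignment_shifts(reference, predicted):
    """Shift, in template positions, for each query residue aligned in both alignments.

    Both arguments map a query residue index to a template residue index.
    """
    return {q: predicted[q] - t for q, t in reference.items() if q in predicted}

def fraction_within(shifts, tolerance=0):
    """Fraction of commonly aligned residues whose absolute shift is <= tolerance."""
    if not shifts:
        return 0.0
    return sum(abs(s) <= tolerance for s in shifts.values()) / len(shifts)

# Hypothetical alignments: the predicted one slips by one position halfway through.
reference = {0: 0, 1: 1, 2: 2, 3: 3}
predicted = {0: 0, 1: 1, 2: 3, 3: 4}
shifts = alignment_shifts(reference, predicted)
exact = fraction_within(shifts)        # residues aligned exactly as in the reference
near = fraction_within(shifts, 1)      # residues within one position of the reference
```

Reporting accuracy at several tolerances, rather than only exact matches, distinguishes small register slips from alignments that are wrong throughout.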
Affiliation(s)
- F S Domingues
- Center for Applied Molecular Engineering, Institute for Chemistry and Biochemistry, University of Salzburg, Jakob Haringer Strasse 3, Salzburg, A-5020, Austria
|
24
|
Panchenko AR, Marchler-Bauer A, Bryant SH. Combination of threading potentials and sequence profiles improves fold recognition. J Mol Biol 2000; 296:1319-31. [PMID: 10698636 DOI: 10.1006/jmbi.2000.3541] [Citation(s) in RCA: 102] [Impact Index Per Article: 4.3] [Indexed: 12/13/2022]
Abstract
Using a benchmark set of structurally similar proteins, we conduct a series of threading experiments intended to identify a scoring function with an optimal combination of contact-potential and sequence-profile terms. The benchmark set is selected to include many medium-difficulty fold recognition targets, where sequence similarity is undetectable by BLAST but structural similarity is extensive. The contact potential is based on the log-odds of non-local contacts involving different amino acid pairs, in native as opposed to randomly compacted structures. The sequence profile term is that used in PSI-BLAST. We find that combination of these terms significantly improves the success rate of fold recognition over use of either term alone, with respect to both recognition sensitivity and the accuracy of threading models. Improvement is greatest for targets between 10% and 20% sequence identity and 60% to 80% superimposable residues, where the number of models crossing critical accuracy and significance thresholds more than doubles. We suggest that these improvements account for the successful performance of the combined scoring function at CASP3. We discuss possible explanations as to why sequence-profile and contact-potential terms appear complementary.
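A log-odds contact score of the kind described in this abstract compares how often a residue pair is in contact in native structures with how often it is in contact in a reference ensemble. The sketch below illustrates only that general construction; the count dictionaries, the function name `log_odds_contact_scores`, and the pseudocount are assumptions for illustration and do not reproduce the authors' potential.

```python
import math

def log_odds_contact_scores(native_counts, reference_counts, pseudocount=1.0):
    """Log-odds score ln(f_native / f_reference) for each residue-pair type.

    Positive scores mark pairs enriched in native contacts. An energy-like
    potential would be the negative of this score (assumed convention).
    The pseudocount avoids taking the logarithm of zero for unseen pairs.
    """
    pairs = set(native_counts) | set(reference_counts)
    native_total = sum(native_counts.get(p, 0) + pseudocount for p in pairs)
    reference_total = sum(reference_counts.get(p, 0) + pseudocount for p in pairs)
    scores = {}
    for p in pairs:
        f_native = (native_counts.get(p, 0) + pseudocount) / native_total
        f_reference = (reference_counts.get(p, 0) + pseudocount) / reference_total
        scores[p] = math.log(f_native / f_reference)
    return scores

# Hypothetical contact counts for two pair types.
native = {("LEU", "LEU"): 30, ("LYS", "LYS"): 10}
reference = {("LEU", "LEU"): 20, ("LYS", "LYS"): 20}
scores = log_odds_contact_scores(native, reference)
```

With these made-up counts the LEU-LEU pair receives a positive score and the LYS-LYS pair a negative one, reflecting enrichment and depletion relative to the reference ensemble.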
Affiliation(s)
- A R Panchenko
- National Center for Biotechnology Information, National Institutes of Health, Building 38A, Room 8N805, Bethesda, MD 20894, USA
|
25
|
Koppensteiner WA, Lackner P, Wiederstein M, Sippl MJ. Characterization of novel proteins based on known protein structures. J Mol Biol 2000; 296:1139-52. [PMID: 10686110 DOI: 10.1006/jmbi.1999.3501] [Citation(s) in RCA: 33] [Impact Index Per Article: 1.4] [Indexed: 11/22/2022]
Abstract
The genome sciences face the challenge of characterizing the structure and function of a vast number of novel genes. Sequence search techniques are used to infer functional and structural information from similarities to experimentally characterized genes or proteins. The persistent goal is to refine these techniques and to develop alternative and complementary methods to increase the range of reliable inference. Here, we focus on the structural and functional assignments that can be inferred from the known three-dimensional structures of proteins. The study uses all structures in the Protein Data Bank that were known by the end of 1997. The protein structures released in 1998 were then characterized in terms of functional and structural similarity to the previously known structures, yielding an estimate of the maximum amount of information on novel protein sequences that can be obtained from inference techniques. The 147 globular proteins corresponding to 196 domains released in 1998 have no clear sequence similarity to previously known structures. However, 75% of the domains have extensive structure similarity to previously known folds, and most importantly, in two out of three cases similarity in structure coincides with related function. In view of this analysis, full utilization of existing structure databases would provide information for many new targets even if the relationship is not accessible from sequence information alone. Currently, the most sophisticated techniques detect on the order of one-third of these relationships.
Affiliation(s)
- W A Koppensteiner
- Center for Applied Molecular Engineering, Institute for Chemistry and Biochemistry, University of Salzburg, Jakob-Haringer-Strasse 3, Salzburg, A-5020, Austria
|
26
|
Abstract
The current state of the art in modeling protein structure has been assessed, based on the results of the CASP (Critical Assessment of protein Structure Prediction) experiments. In comparative modeling, improvements have been made in sequence alignment, sidechain orientation and loop building. Refinement of the models remains a serious challenge. Improved sequence profile methods have had a large impact in fold recognition. Although there has been some progress in alignment quality, this factor still limits model usefulness. In ab initio structure prediction, there has been notable progress in building approximately correct structures of 40-60 residue-long protein fragments. There is still a long way to go before the general ab initio prediction problem is solved. Overall, the field is maturing into a practical technology, able to deliver useful models for a large number of sequences.
Affiliation(s)
- J Moult
- Center for Advanced Research in Biotechnology, University of Maryland Biotechnology Institute, Rockville, MD 20850, USA.
|