1
Kondra S, Sarkar T, Raghavan V, Xu W. Development of a TSR-Based Method for Protein 3-D Structural Comparison With Its Applications to Protein Classification and Motif Discovery. Front Chem 2021; 8:602291. [PMID: 33520934 PMCID: PMC7838567 DOI: 10.3389/fchem.2020.602291] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 09/16/2020] [Accepted: 12/14/2020] [Indexed: 11/24/2022] Open
Abstract
Development of protein 3-D structural comparison methods is important in understanding protein functions. At the same time, developing such a method is very challenging. In the last 40 years, ever since the development of the first automated structural method, ~200 papers were published using different representations of structures. The existing methods can be divided into five categories: sequence-, distance-, secondary structure-, geometry-, and network-based structural comparisons. Each has its uniqueness, but also limitations. We have developed a novel method where the 3-D structure of a protein is modeled using the concept of Triangular Spatial Relationship (TSR), where triangles are constructed with the Cα atoms of a protein as vertices. Every triangle is represented using an integer, which we denote as a “key.” A key is computed using the length, angle, and vertex labels based on a rule-based formula, which ensures assignment of the same key to identical TSRs across proteins. A structure is thereby represented by a vector of integers. Our method is able to accurately quantify similarity of structure or substructure by matching numbers of identical keys between two proteins. The uniqueness of our method includes: (i) a unique way to represent structures to avoid performing structural superimposition; (ii) use of triangles to represent substructures as it is the simplest primitive to capture shape; (iii) complex structure comparison is achieved by matching integers corresponding to multiple TSRs. Every substructure of one protein is compared to every other substructure in a different protein. The method is used in the studies of proteases and kinases because they play essential roles in cell signaling, and a majority of these constitute drug targets. The new motifs or substructures we identified specifically for proteases and kinases provide a deeper insight into their structural relations.
Furthermore, the method provides a unique way to study protein conformational changes. In addition, the results from CATH and SCOP data sets clearly demonstrate that our method can distinguish alpha helices from beta pleated sheets and vice versa. Our method has the potential to be developed into a powerful tool for efficient structure-BLAST search and comparison, just as BLAST is for sequence search and alignment.
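As a rough illustration of the key-matching idea described in this abstract, the sketch below counts the integer keys two proteins have in common. The example key values, the multiset treatment of repeated keys, and the generalized Jaccard normalization are all assumptions made for illustration; the abstract does not state the key formula or the exact similarity measure.

```python
from collections import Counter


def shared_key_count(keys_a, keys_b):
    """Count keys present in both proteins, respecting multiplicity."""
    counts_a, counts_b = Counter(keys_a), Counter(keys_b)
    # Counter intersection keeps the minimum count of each key.
    return sum((counts_a & counts_b).values())


def key_similarity(keys_a, keys_b):
    """Generalized Jaccard similarity over key multisets (an assumed choice)."""
    counts_a, counts_b = Counter(keys_a), Counter(keys_b)
    # Counter union keeps the maximum count of each key.
    union = sum((counts_a | counts_b).values())
    return sum((counts_a & counts_b).values()) / union if union else 0.0


# Hypothetical key vectors; real keys would come from the TSR formula.
protein_a = [1001, 1001, 2050, 3177]
protein_b = [1001, 2050, 2050, 9999]

print(shared_key_count(protein_a, protein_b))
print(round(key_similarity(protein_a, protein_b), 3))
```

Because each structure reduces to a vector of integers, a comparison of this kind needs no superimposition step, which is the property the abstract emphasizes.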
Affiliation(s)
- Sarika Kondra
- The Center for Advanced Computer Studies, University of Louisiana at Lafayette, Lafayette, LA, United States
- Titli Sarkar
- The Center for Advanced Computer Studies, University of Louisiana at Lafayette, Lafayette, LA, United States
- Vijay Raghavan
- The Center for Advanced Computer Studies, University of Louisiana at Lafayette, Lafayette, LA, United States
- Wu Xu
- Department of Chemistry, University of Louisiana at Lafayette, Lafayette, LA, United States
2
Saracino GAA, Fontana F, Jekhmane S, Silva JM, Weingarth M, Gelain F. Elucidating Self-Assembling Peptide Aggregation via Morphoscanner: A New Tool for Protein-Peptide Structural Characterization. Advanced Science (Weinheim, Baden-Wurttemberg, Germany) 2018; 5:1800471. [PMID: 30128255 PMCID: PMC6097002 DOI: 10.1002/advs.201800471] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.5] [Received: 03/26/2018] [Revised: 05/11/2018] [Indexed: 05/13/2023]
Abstract
Self-assembling and molecular folding are ubiquitous in Nature: they drive the organization of systems ranging from living creatures to DNA molecules. Elucidating the complex dynamics underlying these phenomena is of crucial importance. However, a tool for the analysis of the various phenomena involved in protein/peptide aggregation is still missing. Here, an innovative software is developed and validated for the identification and visualization of β-structuring and β-sheet formation in both simulated systems and crystal structures of proteins and peptides. The novel software suite, dubbed Morphoscanner, is designed to identify and intuitively represent β-structuring and β-sheet formation during molecular dynamics trajectories, paying attention to temporary strand-strand alignment, suboligomer formation and evolution of local order. Self-assembling peptides (SAPs) constitute a promising class of biomaterials and an interesting model to study the spontaneous assembly of molecular systems in vitro. With the help of coarse-grained molecular dynamics the self-assembling of diverse SAPs is simulated into molten aggregates. When applied to these systems, Morphoscanner highlights different β-structuring schemes and kinetics related to SAP sequences. It is demonstrated that Morphoscanner is a novel versatile tool designed to probe the aggregation dynamics of self-assembling systems, adaptable to the analysis of differently coarsened simulations of a variety of biomolecules.
Affiliation(s)
- Gloria A. A. Saracino
- Center for Nanomedicine and Tissue Engineering (CNTE), ASST Ospedale Niguarda Cà Granda, Piazza dell'Ospedale Maggiore 3, 20162 Milan, Italy
- Federico Fontana
- IRCCS Casa Sollievo della Sofferenza, Opera di San Pio da Pietralcina, Viale Capuccini 1, 71013 San Giovanni Rotondo, Italy
- Shehrazade Jekhmane
- NMR Spectroscopy, Bijvoet Center for Biomolecular Research, Department of Chemistry, Utrecht University, Padualaan 8, 3584 CH Utrecht, The Netherlands
- João Medeiros Silva
- NMR Spectroscopy, Bijvoet Center for Biomolecular Research, Department of Chemistry, Utrecht University, Padualaan 8, 3584 CH Utrecht, The Netherlands
- Markus Weingarth
- NMR Spectroscopy, Bijvoet Center for Biomolecular Research, Department of Chemistry, Utrecht University, Padualaan 8, 3584 CH Utrecht, The Netherlands
- Fabrizio Gelain
- IRCCS Casa Sollievo della Sofferenza, Opera di San Pio da Pietralcina, Viale Capuccini 1, 71013 San Giovanni Rotondo, Italy
3
Chang KT, Guo J, di Ronza A, Sardiello M. Aminode: Identification of Evolutionary Constraints in the Human Proteome. Sci Rep 2018; 8:1357. [PMID: 29358731 PMCID: PMC5778061 DOI: 10.1038/s41598-018-19744-w] [Citation(s) in RCA: 29] [Impact Index Per Article: 4.8] [Received: 07/05/2017] [Accepted: 01/05/2018] [Indexed: 12/12/2022] Open
Abstract
Evolutionarily constrained regions (ECRs) are a hallmark for sites of critical importance for a protein's structure or function. ECRs can be inferred by comparing the amino acid sequences from multiple protein homologs in the context of the evolutionary relationships that link the analyzed proteins. The compilation and analysis of the datasets required to infer ECRs, however, are time consuming and require skills in coding and bioinformatics, which can limit the use of ECR analysis in the biomedical community. Here, we developed Aminode, a user-friendly webtool for the routine and rapid inference of ECRs. Aminode is pre-loaded with the results of the analysis of the whole human proteome compared with proteomes from 62 additional vertebrate species. Profiles of the relative rates of amino acid substitution and ECR maps of human proteins are available for immediate search and download on the Aminode website. Aminode can also be used for custom analyses of protein families of interest. Interestingly, mapping of known missense variants shows great enrichment of pathogenic variants and depletion of non-pathogenic variants in Aminode-generated ECRs, suggesting that ECR analysis may help evaluate the potential pathogenicity of variants of unknown significance. Aminode is freely available at http://www.aminode.org .
Affiliation(s)
- Kevin T Chang
- Department of Molecular and Human Genetics, Baylor College of Medicine, Jan and Dan Duncan Neurological Research Institute, Texas Children's Hospital, Houston, TX, 77030, USA
- Junyan Guo
- Department of Molecular and Human Genetics, Baylor College of Medicine, Jan and Dan Duncan Neurological Research Institute, Texas Children's Hospital, Houston, TX, 77030, USA
- Microsoft Corporation, 1 Microsoft Way, Redmond, WA, 98052, USA
- Alberto di Ronza
- Department of Molecular and Human Genetics, Baylor College of Medicine, Jan and Dan Duncan Neurological Research Institute, Texas Children's Hospital, Houston, TX, 77030, USA
- Marco Sardiello
- Department of Molecular and Human Genetics, Baylor College of Medicine, Jan and Dan Duncan Neurological Research Institute, Texas Children's Hospital, Houston, TX, 77030, USA
4
Vetrivel I, Mahajan S, Tyagi M, Hoffmann L, Sanejouand YH, Srinivasan N, de Brevern AG, Cadet F, Offmann B. Knowledge-based prediction of protein backbone conformation using a structural alphabet. PLoS One 2017; 12:e0186215. [PMID: 29161266 PMCID: PMC5697859 DOI: 10.1371/journal.pone.0186215] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.6] [Received: 05/30/2017] [Accepted: 09/27/2017] [Indexed: 01/19/2023] Open
Abstract
Libraries of structural prototypes that abstract protein local structures are known as structural alphabets and have proven to be very useful in various aspects of protein structure analyses and predictions. One such library, Protein Blocks, is composed of 16 standard 5-residue-long structural prototypes. This form of analyzing proteins involves drafting its structure as a string of Protein Blocks. Predicting the local structure of a protein in terms of protein blocks is the general objective of this work. A new approach, PB-kPRED, is proposed towards this aim. It involves (i) organizing the structural knowledge in the form of a database of pentapeptide fragments extracted from all protein structures in the PDB and (ii) applying a knowledge-based algorithm that does not rely on any secondary structure predictions and/or sequence alignment profiles, to scan this database and predict most probable backbone conformations for the protein local structures. PB-kPRED does, however, preferentially use structural information from homologues when it is available. The predictions were evaluated rigorously on 15,544 query proteins representing a non-redundant subset of the PDB filtered at 30% sequence identity cut-off. We have shown that the kPRED method was able to achieve mean accuracies ranging from 40.8% to 66.3% depending on the availability of homologues. The impact of the different strategies for scanning the database on the prediction was evaluated and is discussed. Our results highlight the usefulness of the method in the context of proteins without any known structural homologues. A scoring function that gives a good estimate of the accuracy of prediction was further developed. This score estimates very well the accuracy of the algorithm (R² of 0.82). An online version of the tool is provided freely for non-commercial usage at http://www.bo-protscience.fr/kpred/.
Affiliation(s)
- Iyanar Vetrivel
- Université de Nantes, Unité Fonctionnalité et Ingénierie des Protéines (UFIP), UMR 6286 CNRS, UFR Sciences et Techniques, 2, chemin de la Houssinière, France
- Swapnil Mahajan
- Université de Nantes, Unité Fonctionnalité et Ingénierie des Protéines (UFIP), UMR 6286 CNRS, UFR Sciences et Techniques, 2, chemin de la Houssinière, France
- DSIMB, INSERM, UMR S-1134, Laboratory of Excellence, GR-Ex, Université de La Réunion, Faculty of Sciences and Technology, Saint Denis Cedex, La Réunion, France
- Manoj Tyagi
- Université de La Réunion, Saint Denis Cedex, La Réunion, France
- Lionel Hoffmann
- Université de Nantes, Unité Fonctionnalité et Ingénierie des Protéines (UFIP), UMR 6286 CNRS, UFR Sciences et Techniques, 2, chemin de la Houssinière, France
- Yves-Henri Sanejouand
- Université de Nantes, Unité Fonctionnalité et Ingénierie des Protéines (UFIP), UMR 6286 CNRS, UFR Sciences et Techniques, 2, chemin de la Houssinière, France
- Alexandre G. de Brevern
- INSERM UMR_S 1134, DSIMB team, Laboratory of Excellence, GR-Ex, Univ Paris Diderot, Univ Sorbonne Paris Cité, INTS, rue Alexandre Cabanel, Paris, France
- Frédéric Cadet
- DSIMB, INSERM, UMR S-1134, Laboratory of Excellence, GR-Ex, Université de La Réunion, Faculty of Sciences and Technology, Saint Denis Cedex, La Réunion, France
- PEACCEL SAS, Paris, France
- Bernard Offmann
- Université de Nantes, Unité Fonctionnalité et Ingénierie des Protéines (UFIP), UMR 6286 CNRS, UFR Sciences et Techniques, 2, chemin de la Houssinière, France
5
Characterization and Prediction of Protein Flexibility Based on Structural Alphabets. Biomed Research International 2016; 2016:4628025. [PMID: 27660756 PMCID: PMC5021887 DOI: 10.1155/2016/4628025] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.6] [Received: 05/13/2016] [Accepted: 08/02/2016] [Indexed: 11/25/2022]
Abstract
Motivation. To assist efforts in determining and exploring the functional properties of proteins, it is desirable to characterize and predict protein flexibilities. Results. In this study, the conformational entropy is used as an indicator of the protein flexibility. We first explore whether the conformational change can capture the protein flexibility. The well-defined decoy structures are converted into one-dimensional series of letters from a structural alphabet. Four different structure alphabets, including the secondary structure in 3-class and 8-class, the PB structure alphabet (16-letter), and the DW structure alphabet (28-letter), are investigated. The conformational entropy is then calculated from the structure alphabet letters. Some of the proteins show high correlation between the conformation entropy and the protein flexibility. We then predict the protein flexibility from basic amino acid sequence. The local structures are predicted by the dual-layer model and the conformational entropy of the predicted class distribution is then calculated. The results show that the conformational entropy is a good indicator of the protein flexibility, but false positives remain a problem. The DW structure alphabet performs the best, which means that more subtle local structures can be captured by large number of structure alphabet letters. Overall this study provides a simple and efficient method for the characterization and prediction of the protein flexibility.
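To make the entropy calculation in this abstract concrete, the sketch below computes a per-position Shannon entropy over a set of decoy structures that have already been encoded as structural-alphabet strings. The per-position treatment, the use of base-2 logarithms, and the toy 3-letter strings are assumptions for illustration; the abstract does not specify these details.

```python
import math
from collections import Counter


def positional_entropy(encoded_decoys):
    """Shannon entropy (in bits) of the letter distribution at each position."""
    if not encoded_decoys:
        return []
    length = len(encoded_decoys[0])
    if any(len(s) != length for s in encoded_decoys):
        raise ValueError("all encoded decoys must have the same length")
    entropies = []
    # zip(*...) walks the decoys column by column, i.e. residue by residue.
    for column in zip(*encoded_decoys):
        total = len(column)
        counts = Counter(column)
        # sum of p * log2(1/p); a column with a single letter gives 0.0.
        h = sum((n / total) * math.log2(total / n) for n in counts.values())
        entropies.append(h)
    return entropies


# Hypothetical decoys encoded with a 3-class alphabet (H, E, C).
decoys = ["HHEC", "HHEC", "HECC", "HCEC"]
entropy_profile = positional_entropy(decoys)
print(entropy_profile)
```

Positions where every decoy carries the same letter score zero, while positions with many different letters score higher, which is why a larger alphabet can resolve subtler local variation.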
6
7
Heffernan R, Dehzangi A, Lyons J, Paliwal K, Sharma A, Wang J, Sattar A, Zhou Y, Yang Y. Highly accurate sequence-based prediction of half-sphere exposures of amino acid residues in proteins. Bioinformatics 2015; 32:843-9. [DOI: 10.1093/bioinformatics/btv665] [Citation(s) in RCA: 69] [Impact Index Per Article: 7.7] [Received: 06/30/2015] [Accepted: 11/07/2015] [Indexed: 11/14/2022] Open
Abstract
Motivation: Solvent exposure of amino acid residues of proteins plays an important role in understanding and predicting protein structure, function and interactions. Solvent exposure can be characterized by several measures including solvent accessible surface area (ASA), residue depth (RD) and contact numbers (CN). More recently, an orientation-dependent contact number called half-sphere exposure (HSE) was introduced by separating the contacts within upper and down half spheres defined according to the Cα-Cβ (HSEβ) vector or neighboring Cα-Cα vectors (HSEα). HSEα calculated from protein structures was found to describe solvent exposure better than ASA, CN and RD in many applications. Thus, a sequence-based prediction is desirable, as most proteins do not have experimentally determined structures. To the best of our knowledge, there is no method to predict HSEα and only one method to predict HSEβ.
Results: This study developed a novel method for predicting both HSEα and HSEβ (SPIDER-HSE) that achieved a consistent performance for 10-fold cross validation and two independent tests. The correlation coefficients between predicted and measured HSEβ (0.73 for upper sphere, 0.69 for down sphere and 0.76 for contact numbers) for the independent test set of 1199 proteins are significantly higher than existing methods. Moreover, predicted HSEα has a higher correlation coefficient (0.46) to the stability change by residue mutants than predicted HSEβ (0.37) and ASA (0.43). The results, together with its easy Cα-atom-based calculation, highlight the potential usefulness of predicted HSEα for protein structure prediction and refinement as well as function prediction.
Availability and implementation: The method is available at http://sparks-lab.org.
Contact: yuedong.yang@griffith.edu.au or yaoqi.zhou@griffith.edu.au
Supplementary information: Supplementary data are available at Bioinformatics online.
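The half-sphere exposure definition in this abstract can be sketched as follows for the Cα-Cβ variant (HSEβ): neighbors within a sphere around a residue's Cα are split by the plane perpendicular to that residue's Cα-Cβ vector. This is a minimal sketch from structure coordinates, not the sequence-based predictor the paper describes. The 13 Å radius, the counting of Cα atoms as contacts, the rule that ties go to the upper half, and the toy coordinates are all assumptions.

```python
import math


def half_sphere_exposure(ca_coords, cb_coords, index, radius=13.0):
    """Return (up, down) contact counts for the residue at `index`."""
    ca_i = ca_coords[index]
    # Axis that splits the sphere: the residue's own Cα-to-Cβ vector.
    axis = [b - a for a, b in zip(ca_i, cb_coords[index])]
    up = down = 0
    for j, ca_j in enumerate(ca_coords):
        if j == index:
            continue
        offset = [c - a for a, c in zip(ca_i, ca_j)]
        # Skip residues outside the sphere (radius is an assumed cutoff).
        if math.sqrt(sum(x * x for x in offset)) > radius:
            continue
        # The sign of the dot product decides which half sphere j falls in.
        if sum(x * y for x, y in zip(offset, axis)) >= 0:
            up += 1
        else:
            down += 1
    return up, down


# Hypothetical coordinates in angstroms for five residues.
ca = [(0, 0, 0), (0, 0, 5), (0, 0, -5), (3, 0, 4), (0, 0, 20)]
cb = [(0, 0, 1.5), (0, 1.5, 5), (0, 1.5, -5), (3, 1.5, 4), (0, 1.5, 20)]
hse_up, hse_down = half_sphere_exposure(ca, cb, index=0)
print(hse_up, hse_down)
```

The sum of the two counts is the ordinary contact number, so the split adds orientation information on top of it.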
Affiliation(s)
- Rhys Heffernan
- Signal Processing Laboratory, School of Engineering, Griffith University, Brisbane, Australia
- Abdollah Dehzangi
- Signal Processing Laboratory, School of Engineering, Griffith University, Brisbane, Australia
- Institute for Integrated and Intelligent Systems, Griffith University, Brisbane, Australia
- Medical Research Center (MRC), Department of Psychiatry, University of Iowa, Iowa City, USA
- James Lyons
- Signal Processing Laboratory, School of Engineering, Griffith University, Brisbane, Australia
- Kuldip Paliwal
- Signal Processing Laboratory, School of Engineering, Griffith University, Brisbane, Australia
- Alok Sharma
- Institute for Integrated and Intelligent Systems, Griffith University, Brisbane, Australia
- School of Engineering and Physics, University of the South Pacific, Private Mail Bag, Laucala Campus, Suva, Fiji
- Jihua Wang
- Shandong Provincial Key Laboratory of Biophysics, Institute of Biophysics, Dezhou University, Dezhou, Shandong 253023, China
- Abdul Sattar
- Institute for Integrated and Intelligent Systems, Griffith University, Brisbane, Australia
- National ICT Australia (NICTA), Brisbane, Australia
- Yaoqi Zhou
- Shandong Provincial Key Laboratory of Biophysics, Institute of Biophysics, Dezhou University, Dezhou, Shandong 253023, China
- Institute for Glycomics and School of Information and Communication Technique, Griffith University, Parklands Dr. Southport, QLD 4222, Australia
- Yuedong Yang
- Institute for Glycomics and School of Information and Communication Technique, Griffith University, Parklands Dr. Southport, QLD 4222, Australia
8
Ford TJ, Way JC. Enhancement of E. coli acyl-CoA synthetase FadD activity on medium chain fatty acids. PeerJ 2015; 3:e1040. [PMID: 26157619 PMCID: PMC4493641 DOI: 10.7717/peerj.1040] [Citation(s) in RCA: 15] [Impact Index Per Article: 1.7] [Received: 04/22/2015] [Accepted: 05/31/2015] [Indexed: 12/22/2022] Open
Abstract
FadD catalyses the first step in E. coli beta-oxidation, the activation of free fatty acids into acyl-CoA thioesters. This activation makes fatty acids competent for catabolism and reduction into derivatives like alcohols and alkanes. Alcohols and alkanes derived from medium chain fatty acids (MCFAs, 6-12 carbons) are potential biofuels; however, FadD has low activity on MCFAs. Herein, we generate mutations in fadD that enhance its acyl-CoA synthetase activity on MCFAs. Homology modeling reveals that these mutations cluster on a face of FadD from which the co-product, AMP, is expected to exit. Using FadD homology models, we design additional FadD mutations that enhance E. coli growth rate on octanoate and provide evidence for a model wherein FadD activity on octanoate can be enhanced by aiding product exit. These studies provide FadD mutants useful for producing MCFA derivatives and a rationale to alter the substrate specificity of adenylating enzymes.
Affiliation(s)
- Tyler J Ford
- Department of Systems Biology, Harvard Medical School, Boston, MA, USA
- Jeffrey C Way
- Wyss Institute for Biologically Inspired Engineering, Harvard Medical School, Boston, MA, USA
9
Meier A, Söding J. Context similarity scoring improves protein sequence alignments in the midnight zone. Bioinformatics 2014; 31:674-81. [PMID: 25338715 DOI: 10.1093/bioinformatics/btu697] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.7] [Indexed: 11/14/2022] Open
Abstract
MOTIVATION High-quality protein sequence alignments are essential for a number of downstream applications such as template-based protein structure prediction. In addition to the similarity score between sequence profile columns, many current profile-profile alignment tools use extra terms that compare 1D-structural properties such as secondary structure and solvent accessibility, which are predicted from short profile windows around each sequence position. Such scores add non-redundant information by evaluating the conservation of local patterns of hydrophobicity and other amino acid properties and thus exploiting correlations between profile columns. RESULTS Here, instead of predicting and comparing known 1D properties, we follow an agnostic approach. We learn in an unsupervised fashion a set of maximally conserved patterns represented by 13-residue sequence profiles, without the need to know the cause of the conservation of these patterns. We use a maximum likelihood approach to train a set of 32 such profiles that can best represent patterns conserved within pairs of remotely homologous, structurally aligned training profiles. We include the new context score in our HMM-HMM alignment tool HHsearch and significantly improve the quality of difficult alignments in particular. CONCLUSION The context similarity score improves the quality of homology models and other methods that depend on accurate pairwise alignments.
Affiliation(s)
- Armin Meier
- Gene Center, LMU Munich, 81377 Munich and Max Planck Institute for Biophysical Chemistry, 37077 Göttingen, Germany
- Johannes Söding
- Gene Center, LMU Munich, 81377 Munich and Max Planck Institute for Biophysical Chemistry, 37077 Göttingen, Germany
10
Brylinski M. eVolver: an optimization engine for evolving protein sequences to stabilize the respective structures. BMC Res Notes 2013; 6:303. [PMID: 23902875 PMCID: PMC3735418 DOI: 10.1186/1756-0500-6-303] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.6] [Received: 05/09/2013] [Accepted: 07/30/2013] [Indexed: 11/10/2022] Open
Abstract
BACKGROUND Many structural bioinformatics approaches employ sequence profile-based threading techniques. To improve fold recognition rates, homology searching may include artificially evolved amino acid sequences, which were demonstrated to enhance the sensitivity of protein threading in targeting midnight zone templates. FINDINGS We describe implementation details of eVolver, an optimization algorithm that evolves protein sequences to stabilize the respective structures by a variety of potentials, which are compatible with those commonly used in protein threading. In a case study focusing on LARG PDZ domain, we show that artificially evolved sequences have quite high capabilities to recognize the correct protein structures using standard sequence profile-based fold recognition. CONCLUSIONS Computationally designed protein sequences can be incorporated in existing sequence profile-based threading approaches to increase their sensitivity. They also provide a desired linkage between protein structure and function in in silico experiments that relate to e.g. the completeness of protein structure space, the origin of folds and protein universe. eVolver is freely available as a user-friendly webserver and a well-documented stand-alone software distribution at http://www.brylinski.org/evolver.
Affiliation(s)
- Michal Brylinski
- Department of Biological Sciences, Louisiana State University, Baton Rouge, LA 70803, USA.
11
Lee HW, Lee HC, Lee LK, Teber ET, Church WB. The use of soluble protein structures in modeling helical proteins in a layered membrane. J Biomol Struct Dyn 2013; 32:308-18. [PMID: 23527746 DOI: 10.1080/07391102.2013.765808] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/27/2022]
Abstract
Major advances have been made in the prediction of soluble protein structures, led by the knowledge-based modeling methods that extract useful structural trends from known protein structures and incorporate them into scoring functions. The same cannot be reported for the class of transmembrane proteins, primarily due to the lack of high-resolution structural data for transmembrane proteins, which renders many of the knowledge-based methods unreliable or invalid. We have developed a method that harnesses the vast structural knowledge available in soluble protein data for use in the modeling of transmembrane proteins. At the core of the method is a set of transmembrane protein decoy sets that allows us to filter and train features recognized from soluble proteins for transmembrane protein modeling into a set of scoring functions. We have demonstrated that structures of soluble proteins can provide significant insight into transmembrane protein structures. A complementary novel two-stage modeling/selection process that mimics the two-stage helical membrane protein folding was developed. Combined with the scoring function, the method was successfully applied to model 5 transmembrane proteins. The root mean square deviations of the predicted models ranged from 5.0 to 8.8 Å to the native structures.
Affiliation(s)
- Hong Wing Lee
- Faculty of Pharmacy, Group in Biomolecular Structure and Informatics, University of Sydney, Sydney, NSW, 2006, Australia
12
Brylinski M. The utility of artificially evolved sequences in protein threading and fold recognition. J Theor Biol 2013; 328:77-88. [PMID: 23542050 DOI: 10.1016/j.jtbi.2013.03.018] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.5] [Received: 10/19/2012] [Revised: 01/24/2013] [Accepted: 03/18/2013] [Indexed: 12/23/2022]
Abstract
Template-based protein structure prediction plays an important role in Functional Genomics by providing structural models of gene products, which can be utilized by structure-based approaches to function inference. From a systems level perspective, the high structural coverage of gene products in a given organism is critical. Despite continuous efforts towards the development of more sensitive threading approaches, confident structural models cannot be constructed for a considerable fraction of proteins due to difficulties in recognizing low-sequence identity templates with a similar fold to the target. Here we introduce a new modeling stratagem, which employs a library of synthetic sequences to improve template ranking in fold recognition by sequence profile-based methods. We developed a new method for the optimization of generic protein-like amino acid sequences to stabilize the respective structures using a combined empirical scoring function, which is compatible with those commonly used in protein threading and fold recognition. We show that the artificially evolved sequences, whose average sequence identity to the wild-type sequences is as low as 13.8%, have significant capabilities to recognize the correct structures. Importantly, the quality of the corresponding threading alignments is comparable to those constructed using conventional wild-type approaches (the average TM-score is 0.48 and 0.54, respectively). Fold recognition that uses data fusion to combine ranks calculated for both wild-type and synthetic template libraries systematically improves the detection of structural analogs. Depending on the threading algorithm used, it yields on average 4-16% higher recognition rates than using the wild-type template library alone. Synthetic sequences artificially evolved for the template structures provide an orthogonal source of signal that could be exploited to detect those templates unrecognized by standard modeling techniques.
It opens up new directions in the development of more sensitive threading methods with the enhanced capabilities of targeting difficult, midnight zone templates.
Affiliation(s)
- Michal Brylinski
- Department of Biological Sciences, Louisiana State University, Baton Rouge, LA 70803, USA.
13
Singha Roy A, Tripathy DR, Chatterjee A, Dasgupta S. The influence of common metal ions on the interactions of the isoflavone genistein with bovine serum albumin. Spectrochimica Acta. Part A, Molecular and Biomolecular Spectroscopy 2013; 102:393-402. [PMID: 23237845 DOI: 10.1016/j.saa.2012.09.053] [Citation(s) in RCA: 27] [Impact Index Per Article: 2.5] [Received: 05/10/2012] [Revised: 08/28/2012] [Accepted: 09/20/2012] [Indexed: 05/20/2023]
Abstract
The interaction of genistein with bovine serum albumin (BSA) has been characterized via UV-vis, fluorescence spectroscopy and Circular Dichroism (CD) measurements under physiological conditions. In this study, we have investigated the effect of some common metal ions on the binding of genistein with BSA using fluorescence studies. The fluorescence data reveal that the binding affinity of genistein to BSA increases in the presence of certain metal ions. The possibility of non-radiative energy transition from the donor tryptophan to the acceptor genistein has been observed in the absence and presence of metal ions. The observed similarities in the values of efficiency of energy transfer (E) and the separation between the donor and acceptor (r) in both the cases may be correlated with the complexation between the genistein and metal ions, which is also observed from the UV-vis studies. The changes in enthalpy (ΔH°) and entropy (ΔS°) of the interaction were found to be -14.64 kJ mol⁻¹ and +42.75 J mol⁻¹ K⁻¹ respectively. These values indicate the involvement of electrostatic interactions along with a hydrophobic association that results in a positive entropy change. CD analysis shows that there is a slight increase in the % α-helical content of BSA on binding with genistein at lower molar ratios. Warfarin and ibuprofen displacement studies in accordance with the molecular docking show that genistein binds to site I (subdomain IIA) of BSA.
Collapse
Affiliation(s)
- Atanu Singha Roy
- Department of Chemistry, Indian Institute of Technology, Kharagpur 721 302, India
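The enthalpy and entropy changes quoted in the genistein-BSA abstract above can be combined into a standard free energy change through ΔG° = ΔH° − TΔS°. The following is a minimal sketch of that arithmetic; the temperature of 298.15 K is an assumption (the abstract does not state it), and the function name is purely illustrative.

```python
def gibbs_free_energy_kj(delta_h_kj, delta_s_j_per_k, temperature_k):
    """Return dG in kJ/mol from dH (kJ/mol), dS (J/mol/K) and T (K)."""
    # dS is converted from J to kJ so that both terms share the same unit.
    return delta_h_kj - temperature_k * delta_s_j_per_k / 1000.0


# dH and dS are the values reported in the abstract.
DELTA_H_KJ = -14.64
DELTA_S_J_PER_K = 42.75
# Assumed temperature; not given in the abstract.
ASSUMED_T_K = 298.15

delta_g = gibbs_free_energy_kj(DELTA_H_KJ, DELTA_S_J_PER_K, ASSUMED_T_K)
entropic_term = -ASSUMED_T_K * DELTA_S_J_PER_K / 1000.0

print(f"dG = {delta_g:.2f} kJ/mol (enthalpic {DELTA_H_KJ:.2f}, entropic {entropic_term:.2f})")
```

Under this assumed temperature both the enthalpic and the entropic terms are negative, which is consistent with the abstract's reading that electrostatic and hydrophobic contributions both favour binding.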
|
14
|
Ghorai SK, Samanta SK, Mukherjee M, Saha Sardar P, Ghosh S. Tuning of “Antenna Effect” of Eu(III) in Ternary Systems in Aqueous Medium through Binding with Protein. Inorg Chem 2013; 52:1476-87. [DOI: 10.1021/ic302218m] [Citation(s) in RCA: 34] [Impact Index Per Article: 3.1] [Indexed: 02/08/2023]
Affiliation(s)
- Shyamal Kr Ghorai
- Department of Chemistry and Biochemistry, Presidency University, 86/1, College Street, Kolkata 700073, India
- Swarna Kamal Samanta
- Department of Chemistry and Biochemistry, Presidency University, 86/1, College Street, Kolkata 700073, India
- Manini Mukherjee
- Department of Chemistry and Biochemistry, Presidency University, 86/1, College Street, Kolkata 700073, India
- Pinki Saha Sardar
- Department of Chemistry and Biochemistry, Presidency University, 86/1, College Street, Kolkata 700073, India
- Sanjib Ghosh
- Department of Chemistry and Biochemistry, Presidency University, 86/1, College Street, Kolkata 700073, India
|
15
|
Huang IK, Pei J, Grishin NV. Defining and predicting structurally conserved regions in protein superfamilies. Bioinformatics 2012. [PMID: 23193223 DOI: 10.1093/bioinformatics/bts682] [Citation(s) in RCA: 11] [Impact Index Per Article: 0.9] [Indexed: 11/13/2022]
Abstract
MOTIVATION: The structures of homologous proteins are generally better conserved than their sequences. This phenomenon is demonstrated by the prevalence of structurally conserved regions (SCRs) even in highly divergent protein families. Defining SCRs requires the comparison of two or more homologous structures and is affected by their availability and divergence, and our ability to deduce structurally equivalent positions among them. In the absence of multiple homologous structures, it is necessary to predict SCRs of a protein using information from only a set of homologous sequences and (if available) a single structure. Accurate SCR predictions can benefit homology modelling and sequence alignment. RESULTS: Using pairwise DaliLite alignments among a set of homologous structures, we devised a simple measure of structural conservation, termed structural conservation index (SCI). SCI was used to distinguish SCRs from non-SCRs. A database of SCRs was compiled from 386 SCOP superfamilies containing 6489 protein domains. Artificial neural networks were then trained to predict SCRs with various features deduced from a single structure and homologous sequences. Assessment of the predictions via a 5-fold cross-validation method revealed that predictions based on features derived from a single structure perform similarly to ones based on homologous sequences, while combining sequence and structural features was optimal in terms of accuracy (0.755) and Matthews correlation coefficient (0.476). These results suggest that even without information from multiple structures, it is still possible to effectively predict SCRs for a protein. Finally, inspection of the structures with the worst predictions pinpoints difficulties in SCR definitions. AVAILABILITY: The SCR database and the prediction server can be found at http://prodata.swmed.edu/SCR. CONTACT: 91huangi@gmail.com or grishin@chop.swmed.edu. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics Online.
Affiliation(s)
- Ivan K Huang
- Department of Mathematics, Rice University, Houston, TX 77005, USA.
|
16
|
Ghorai SK, Samanta SK, Mukherjee M, Ghosh S. Protein-Mediated Efficient Synergistic “Antenna Effect” in a Ternary System in D2O Medium. J Phys Chem A 2012; 116:8303-12. [DOI: 10.1021/jp304405u] [Citation(s) in RCA: 10] [Impact Index Per Article: 0.8] [Indexed: 01/24/2023]
Affiliation(s)
- Shyamal Kr Ghorai
- Department of Chemistry and Biochemistry, Presidency University, Kolkata 700073, India
- Swarna Kamal Samanta
- Department of Chemistry and Biochemistry, Presidency University, Kolkata 700073, India
- Manini Mukherjee
- Department of Chemistry and Biochemistry, Presidency University, Kolkata 700073, India
- Sanjib Ghosh
- Department of Chemistry and Biochemistry, Presidency University, Kolkata 700073, India
|
17
|
Li P, Pok G, Jung KS, Shon HS, Ryu KH. QSE: A new 3-D solvent exposure measure for the analysis of protein structure. Proteomics 2011; 11:3793-801. [PMID: 21761564 DOI: 10.1002/pmic.201100189] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.5] [Received: 04/13/2011] [Revised: 06/29/2011] [Accepted: 07/05/2011] [Indexed: 11/05/2022]
Abstract
Solvent exposure of amino acids measures how deep residues are buried in tertiary structure of proteins, and hence it provides important information for analyzing and predicting protein structure and functions. Existing methods of calculating solvent exposure such as accessible surface area, relative accessible surface area, residue depth, contact number, and half-sphere exposure still have some limitations. In this article, we propose a novel solvent exposure measure named quadrant-sphere exposure (QSE) based on eight quadrants derived from spherical neighborhood. The proposed measure forms a microenvironment around Cα atom as a sphere with a radius of 13 Å, and subdivides it into eight quadrants according to a rectangular coordinate system constructed based on geometric relationships of backbone atoms. The number of neighboring Cα atoms whose labels are the same is given as the QSE value of the center Cα atom at hand. As evidenced by histograms that show very different distributions for different structure configurations, the proposed measure captures local properties that are characteristic for a residue's eight-directional neighborhood within a sphere. Compared with other measures, QSE provides a different view of solvent exposure, and provides information that is specific for different tertiary structure. As the experimental results show, QSE measure can potentially be used in protein structure analysis and predictions.
Affiliation(s)
- Peipei Li
- College of Electrical and Computer Engineering, Chungbuk National University, Chungbuk, Korea
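The counting step described in the QSE abstract above (neighbouring Cα atoms within a 13 Å sphere, grouped by which of eight directional regions they fall in) can be illustrated roughly as follows. This is only a sketch under stated assumptions: the abstract does not say how the local coordinate system is built from backbone atoms, so the `frame` argument is a hypothetical input supplied by the caller, and the function names are illustrative rather than taken from the paper.

```python
import math


def octant_index(v):
    """Map a 3-D vector to an octant label 0-7 from the signs of x, y, z."""
    return int(v[0] >= 0) * 4 + int(v[1] >= 0) * 2 + int(v[2] >= 0)


def octant_counts(center, others, frame, radius=13.0):
    """Count neighbours of `center` within `radius`, per octant of `frame`.

    `frame` is three orthonormal axis vectors (hypothetical input; the
    abstract does not specify how the paper derives them).
    """
    counts = [0] * 8
    for p in others:
        d = [p[i] - center[i] for i in range(3)]
        if math.sqrt(sum(c * c for c in d)) > radius:
            continue  # outside the spherical neighbourhood
        # Express the displacement in the local frame before labelling it.
        local = [sum(d[i] * axis[i] for i in range(3)) for axis in frame]
        counts[octant_index(local)] += 1
    return counts


# Toy coordinates in angstroms, with the identity matrix as the local frame.
identity_frame = [(1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, 0.0, 1.0)]
neighbours = [(5.0, 5.0, 5.0), (6.0, 1.0, 2.0), (-4.0, 3.0, 1.0), (20.0, 0.0, 0.0)]
counts = octant_counts((0.0, 0.0, 0.0), neighbours, identity_frame)
print(counts)
```

In this toy input the atom at 20 Å is ignored, and the remaining three neighbours are split between two octants.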
|
18
|
Teng Y, Liu R, Li C, Xia Q, Zhang P. The interaction between 4-aminoantipyrine and bovine serum albumin: multiple spectroscopic and molecular docking investigations. Journal of Hazardous Materials 2011; 190:574-581. [PMID: 21497437 DOI: 10.1016/j.jhazmat.2011.03.084] [Citation(s) in RCA: 70] [Impact Index Per Article: 5.4] [Received: 01/26/2011] [Revised: 03/22/2011] [Accepted: 03/22/2011] [Indexed: 05/30/2023]
Abstract
4-Aminoantipyrine (AAP) is widely used in the pharmaceutical industry, in biochemical experiments and in environmental monitoring. AAP as an aromatic pollutant in the environment poses a great threat to human health. To evaluate the toxicity of AAP at the protein level, the effects of AAP on bovine serum albumin (BSA) were investigated by multiple spectroscopic techniques and molecular modeling. After the inner filter effect was eliminated, the experimental results showed that AAP effectively quenched the intrinsic fluorescence of BSA via static quenching. The number of binding sites, the binding constant, the thermodynamic parameters and binding subdomain were measured, and indicated that AAP could spontaneously bind with BSA on subdomain IIIA through electrostatic forces. Molecular docking results revealed that AAP interacted with the Glu 488 and Glu 502 residues of BSA. Furthermore, the conformation of BSA was demonstrably changed in the presence of AAP. The skeletal structure of BSA loosened, exposing internal hydrophobic aromatic ring amino acids and peptide strands to the solution.
Affiliation(s)
- Yue Teng
- Shandong Key Laboratory of Water Pollution Control and Resource Reuse, School of Environmental Science and Engineering, Shandong University, China-America CRC for Environment & Health, Shandong Province, 27# Shanda South Road, Jinan 250100, PR China
|
19
|
Dai L, Yang Y, Kim HR, Zhou Y. Improving computational protein design by using structure-derived sequence profile. Proteins 2010; 78:2338-48. [PMID: 20544969 DOI: 10.1002/prot.22746] [Citation(s) in RCA: 21] [Impact Index Per Article: 1.5] [Indexed: 12/13/2022]
Abstract
Designing a protein sequence that will fold into a predefined structure is of both practical and fundamental interest. Many successful computational designs in the last decade resulted from improved understanding of hydrophobic and polar interactions between side chains of amino acid residues in stabilizing protein tertiary structures. However, the coupling between main-chain backbone structure and local sequence has yet to be fully addressed. Here, we attempt to account for such coupling by using a sequence profile derived from the sequences of five-residue fragments in a fragment library that are structurally matched to the five-residue segments contained in a target structure. We further introduced a term to reduce low complexity regions of designed sequences. These two terms together with optimized reference states for amino-acid residues were implemented in the RosettaDesign program. The new method, called RosettaDesign-SR, yields a 12% increase (from 34% to 46%) in the fraction of proteins whose designed sequences are more than 35% identical to wild-type sequences. Meanwhile, it reduces by 8% (from 22% to 14%) the number of designed sequences that are not homologous to any known protein sequences according to psi-blast. More importantly, the sequences designed by RosettaDesign-SR have 2-3% more polar residues at the surface and core regions of proteins and these surface and core polar residues have about 4% higher sequence identity to wild-type sequences than by RosettaDesign. Thus, the proteins designed by RosettaDesign-SR should be less likely to aggregate and more likely to have unique structures due to more specific polar interactions.
Affiliation(s)
- Liang Dai
- School of Informatics, Indiana University Purdue University, Indianapolis, Indiana 46202, USA
|
20
|
Duarte JM, Sathyapriya R, Stehr H, Filippis I, Lappe M. Optimal contact definition for reconstruction of contact maps. BMC Bioinformatics 2010; 11:283. [PMID: 20507547 PMCID: PMC3583236 DOI: 10.1186/1471-2105-11-283] [Citation(s) in RCA: 45] [Impact Index Per Article: 3.2] [Received: 12/08/2009] [Accepted: 05/27/2010] [Indexed: 11/23/2022] Open
Abstract
Background: Contact maps have been extensively used as a simplified representation of protein structures. They capture most important features of a protein's fold, being preferred by a number of researchers for the description and study of protein structures. Inspired by the model's simplicity many groups have dedicated a considerable amount of effort towards contact prediction as a proxy for protein structure prediction. However, a contact map's biological interest is subject to the availability of reliable methods for the 3-dimensional reconstruction of the structure. Results: We use an implementation of the well-known distance geometry protocol to build realistic protein 3-dimensional models from contact maps, performing an extensive exploration of many of the parameters involved in the reconstruction process. We try to address the questions: a) to what accuracy does a contact map represent its corresponding 3D structure, b) what is the best contact map representation with regard to reconstructability and c) what is the effect of partial or inaccurate contact information on the 3D structure recovery. Our results suggest that contact maps derived from the application of a distance cutoff of 9 to 11 Å around the Cβ atoms constitute the most accurate representation of the 3D structure. The reconstruction process does not provide a single solution to the problem but rather an ensemble of conformations that are within 2 Å RMSD of the crystal structure and with lower values for the pairwise average ensemble RMSD. Interestingly, it is still possible to recover a structure with partial contact information, although wrong contacts can lead to dramatic loss in reconstruction fidelity. Conclusions: Thus contact maps represent a valid approximation to the structures with an accuracy comparable to that of experimental methods. The optimal contact definitions constitute key guidelines for methods based on contact maps such as structure prediction through contacts and structural alignments based on maximum contact map overlap.
Affiliation(s)
- Jose M Duarte
- Max Planck Institute for Molecular Genetics, Ihnestr, Berlin, Germany.
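The contact definition discussed in the abstract above (a distance cutoff applied to Cβ atoms) can be sketched as a simple thresholded distance matrix. The sketch below assumes a cutoff of 10 Å, which lies inside the 9 to 11 Å range the abstract reports; the function name, the inclusive comparison, and the toy coordinates are illustrative assumptions, and it does not cover the distance-geometry reconstruction step.

```python
import math


def contact_map(coords, cutoff=10.0):
    """Return a symmetric boolean matrix: True where two atoms lie within `cutoff`.

    `coords` is a list of (x, y, z) positions in angstroms, e.g. one C-beta
    atom per residue. Self-contacts on the diagonal are left as False.
    """
    n = len(coords)
    cmap = [[False] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            if math.dist(coords[i], coords[j]) <= cutoff:
                cmap[i][j] = cmap[j][i] = True
    return cmap


# Toy coordinates (angstroms): three nearby atoms and one distant atom.
toy_coords = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0), (30.0, 0.0, 0.0)]
cmap = contact_map(toy_coords, cutoff=10.0)
n_contacts = sum(cmap[i][j] for i in range(4) for j in range(i + 1, 4))
print(n_contacts)
```

Whether sequence-adjacent residues should be excluded from the map is not stated in the abstract, so this sketch keeps every pair.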
|
21
|
Amos FF, Ndao M, Evans JS. Evidence of Mineralization Activity and Supramolecular Assembly by the N-Terminal Sequence of ACCBP, a Biomineralization Protein That Is Homologous to the Acetylcholine Binding Protein Family. Biomacromolecules 2009; 10:3298-305. [DOI: 10.1021/bm900893f] [Citation(s) in RCA: 12] [Impact Index Per Article: 0.8] [Indexed: 12/16/2022]
Affiliation(s)
- Fairland F. Amos
- Center for Biomolecular Materials Spectroscopy, Laboratory for Chemical Physics, New York University, 345 East 24th Street, New York, New York 10010
- Moise Ndao
- Center for Biomolecular Materials Spectroscopy, Laboratory for Chemical Physics, New York University, 345 East 24th Street, New York, New York 10010
- John Spencer Evans
- Center for Biomolecular Materials Spectroscopy, Laboratory for Chemical Physics, New York University, 345 East 24th Street, New York, New York 10010
|
22
|
Abstract
Undertaker is a program designed to help predict protein structure using alignments to proteins of known structure and fragment assembly. The program generates conformations and uses cost functions to select the best structures from among the generated conformations. This paper describes the use of Undertaker's cost functions for model quality assessment. We achieve an accuracy that is similar to other methods, without using consensus-based techniques. Adding consensus-based features further improves our approach substantially. We report several correlation measures, including a new weighted version of Kendall's tau (tau(3)) and show model quality assessment results superior to previously published results on all correlation measures when using only models with no missing atoms.
Affiliation(s)
- John Archie
- University of California at Santa Cruz, Biomolecular Engineering, Santa Cruz, CA, USA
|
23
|
Ghosh KS, Sen S, Sahoo BK, Dasgupta S. A spectroscopic investigation into the interactions of 3'-O-carboxy esters of thymidine with bovine serum albumin. Biopolymers 2009; 91:737-44. [PMID: 19402143 DOI: 10.1002/bip.21220] [Citation(s) in RCA: 42] [Impact Index Per Article: 2.8] [Indexed: 11/07/2022]
Abstract
Binding studies of 3'-O-carboxy esters of thymidine, reported inhibitors of ribonucleases, with bovine serum albumin (BSA) have been explored in this report. Fluorescence spectroscopy in combination with Fourier transform infrared (FTIR) and circular dichroism (CD) spectroscopy have been used to determine the nature and mode of binding. The binding and quenching parameters were determined from tryptophan fluorescence quenching by Scatchard plots and modified Stern-Volmer plots. The association constants are of the order of 10⁴ M⁻¹ for both the ligands. Thermodynamic parameters suggest that apart from an initial hydrophobic association, hydrogen bonding and van der Waals interactions play a decisive role during protein-ligand complex formation. Minor changes were observed in the secondary structures of human serum albumin (HSA) as revealed by FTIR and CD. Docking studies suggest that the ligands are close to Trp 213, which causes fluorescence quenching.
Affiliation(s)
- Kalyan Sundar Ghosh
- Department of Chemistry, Indian Institute of Technology, Kharagpur 721 302, India
|
24
|
Müller-Santos M, de Souza EM, Pedrosa FDO, Mitchell DA, Longhi S, Carrière F, Canaan S, Krieger N. First evidence for the salt-dependent folding and activity of an esterase from the halophilic archaea Haloarcula marismortui. Biochim Biophys Acta Mol Cell Biol Lipids 2009; 1791:719-29. [DOI: 10.1016/j.bbalip.2009.03.006] [Citation(s) in RCA: 56] [Impact Index Per Article: 3.7] [Received: 11/27/2008] [Revised: 03/03/2009] [Accepted: 03/05/2009] [Indexed: 10/21/2022]
|
25
|
Hol J, Küchler AM, Johansen FE, Dalhus B, Haraldsen G, Oynebråten I. Molecular requirements for sorting of the chemokine interleukin-8/CXCL8 to endothelial Weibel-Palade bodies. J Biol Chem 2009; 284:23532-9. [PMID: 19578117 DOI: 10.1074/jbc.m900874200] [Citation(s) in RCA: 15] [Impact Index Per Article: 1.0] [Indexed: 11/06/2022] Open
Abstract
Sorting of proteins to Weibel-Palade bodies (WPB) of endothelial cells allows rapid regulated secretion of leukocyte-recruiting P-selectin and chemokines as well as procoagulant von Willebrand factor (VWF). Here we show by domain swap studies that the exposed aspartic acid in loop 2 (Ser⁴⁴-Asp⁴⁵-Gly⁴⁶) of the CXC chemokine interleukin (IL)-8 is crucial for targeting to WPB. Loop 2 also governs sorting of chemokines to alpha-granules of platelets, but the fingerprint of the loop 2 of these chemokines differs from that of IL-8. On the other hand, loop 2 of IL-8 closely resembles a surface-exposed sequence of the VWF propeptide, the region of VWF that directs sorting of the protein to WPB. We conclude that loop 2 of IL-8 constitutes a critical signal for sorting to WPB and propose a general role for this loop in the sorting of chemokines to compartments of regulated secretion.
Affiliation(s)
- Johanna Hol
- Institute and University of Oslo, Rikshospitalet University Hospital, Sognsvannsveien 20, 0027 Oslo, Norway
|
26
|
Abstract
The SAM-T08 web server is a protein structure prediction server that provides several useful intermediate results in addition to the final predicted 3D structure: three multiple sequence alignments of putative homologs using different iterated search procedures, prediction of local structure features including various backbone and burial properties, calibrated E-values for the significance of template searches of PDB and residue–residue contact predictions. The server has been validated as part of the CASP8 assessment of structure prediction as having good performance across all classes of predictions. The SAM-T08 server is available at http://compbio.soe.ucsc.edu/SAM_T08/T08-query.html
Affiliation(s)
- Kevin Karplus
- Department of Biomolecular Engineering, Baskin School of Engineering, University of California, Santa Cruz, CA 95064, USA.
|
27
|
Mooney C, Pollastri G. Beyond the Twilight Zone: Automated prediction of structural properties of proteins by recursive neural networks and remote homology information. Proteins 2009; 77:181-90. [DOI: 10.1002/prot.22429] [Citation(s) in RCA: 43] [Impact Index Per Article: 2.9] [Indexed: 11/06/2022]
|
28
|
Li Q, Zhou C, Liu H. Fragment-based local statistical potentials derived by combining an alphabet of protein local structures with secondary structures and solvent accessibilities. Proteins 2009; 74:820-36. [PMID: 18704928 DOI: 10.1002/prot.22191] [Citation(s) in RCA: 15] [Impact Index Per Article: 1.0] [Indexed: 11/08/2022]
Abstract
General and transferable statistical potentials to quantify the compatibility between local structures and local sequences of peptide fragments in proteins were derived. In the derivation, structure clusters of fragments are obtained by clustering five-residue fragments in native proteins based on their conformations represented by a local structure alphabet (de Brevern et al., Proteins 2000;41:271-287), secondary structure states, and solvent accessibilities. On the basis of the native sequences of the structurally clustered fragments, the probabilities of different amino acid sequences were estimated for each structure cluster. From the sequence probabilities, statistical energies as a function of sequence for a given structure were directly derived. The same sequence probabilities were employed in a database-matching approach to derive statistical energies as a function of local structure for a given sequence. Compared with prior models of local statistical potentials, we provided an integrated approach in which local conformations and local environments are treated jointly, structures are treated in units of fragments instead of individual residues so that coupling between the conformations of adjacent residues is included, and strong interdependences between the conformations of overlapping or neighboring fragment units are also considered. In tests including fragment threading, pseudosequence design, and local structure predictions, the potentials performed at least comparably and, in most cases, better than a number of existing models applicable to the same contexts indicating the advantages of such an integrated approach for deriving local potentials and suggesting applicability of the statistical potentials derived here in sequence designs and structure predictions.
Affiliation(s)
- Quan Li
- School of Life Sciences, and Hefei National Laboratory for Physical Sciences at Microscale, University of Science and Technology of China, Hefei, Anhui 230027, China
|
29
|
Durham E, Dorr B, Woetzel N, Staritzbichler R, Meiler J. Solvent accessible surface area approximations for rapid and accurate protein structure prediction. J Mol Model 2009; 15:1093-108. [PMID: 19234730 PMCID: PMC2712621 DOI: 10.1007/s00894-009-0454-9] [Citation(s) in RCA: 207] [Impact Index Per Article: 13.8] [Received: 11/16/2008] [Accepted: 01/02/2009] [Indexed: 12/01/2022]
Abstract
The burial of hydrophobic amino acids in the protein core is a driving force in protein folding. The extent to which an amino acid interacts with the solvent and the protein core is naturally proportional to the surface area exposed to these environments. However, an accurate calculation of the solvent-accessible surface area (SASA), a geometric measure of this exposure, is numerically demanding as it is not pair-wise decomposable. Furthermore, it depends on a full-atom representation of the molecule. This manuscript introduces a series of four SASA approximations of increasing computational complexity and accuracy as well as knowledge-based environment free energy potentials based on these SASA approximations. Their ability to distinguish correctly from incorrectly folded protein models is assessed to balance speed and accuracy for protein structure prediction. We find the newly developed “Neighbor Vector” algorithm provides the most optimal balance of accurate yet rapid exposure measures.
Affiliation(s)
- Elizabeth Durham
- Department of Chemistry, Center for Structural Biology, Vanderbilt University, 465 21st Ave South, Nashville, TN 37232-8725, USA
|
30
|
Malmstroem L, Hou L, Atkins WM, Goodlett DR. On the use of hydrogen/deuterium exchange mass spectrometry data to improve de novo protein structure prediction. Rapid Communications in Mass Spectrometry 2009; 23:459-461. [PMID: 19125403 DOI: 10.1002/rcm.3882] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.1] [Indexed: 05/27/2023]
|
31
|
Katzman S, Barrett C, Thiltgen G, Karchin R, Karplus K. PREDICT-2ND: a tool for generalized protein local structure prediction. Bioinformatics 2008; 24:2453-9. [PMID: 18757875 DOI: 10.1093/bioinformatics/btn438] [Citation(s) in RCA: 28] [Impact Index Per Article: 1.8] [Indexed: 11/13/2022]
Abstract
MOTIVATION: Predictions of protein local structure, derived from sequence alignment information alone, provide visualization tools for biologists to evaluate the importance of amino acid residue positions of interest in the absence of X-ray crystal/NMR structures or homology models. They are also useful as inputs to sequence analysis and modeling tools, such as hidden Markov models (HMMs), which can be used to search for homology in databases of known protein structure. In addition, local structure predictions can be used as a component of cost functions in genetic algorithms that predict protein tertiary structure. We have developed a program (predict-2nd) that trains multilayer neural networks and have applied it to numerous local structure alphabets, tuning network parameters such as the number of layers, the number of units in each layer and the window sizes of each layer. We have had the most success with four-layer networks, with gradually increasing window sizes at each layer. RESULTS: Because the four-layer neural nets occasionally get trapped in poor local optima, our training protocol now uses many different random starts, with short training runs, followed by more training on the best performing networks from the short runs. One recent addition to the program is the option to add a guide sequence to the profile inputs, increasing the number of inputs per position by 20. We find that use of a guide sequence provides a small but consistent improvement in the predictions for several different local-structure alphabets. AVAILABILITY: Local structure prediction with the methods described here is available for use online at http://www.soe.ucsc.edu/compbio/SAM_T08/T08-query.html. The source code and example networks for PREDICT-2ND are available at http://www.soe.ucsc.edu/~karplus/predict-2nd/; a required C++ library is available at http://www.soe.ucsc.edu/~karplus/ultimate/
Affiliation(s)
- Sol Katzman
- Department of Biomolecular Engineering, University of California, Santa Cruz, CA 95064, USA
|
32
|
Dong Q, Wang X, Lin L. Prediction of protein local structures and folding fragments based on building-block library. Proteins 2008; 72:353-66. [PMID: 18214964 DOI: 10.1002/prot.21931] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.3] [Indexed: 11/06/2022]
Abstract
In recent years, protein structure prediction using local structure information has made great progress. In this study, a novel and effective method is developed to predict the local structure and the folding fragments of proteins. First, the proteins with known structures are split into fragments. Second, these fragments, represented by dihedrals, are clustered to produce the building blocks (BBs). Third, an efficient machine learning method is used to predict the local structures of proteins from sequence profiles. Finally, a bi-gram model, trained by an iterated algorithm, is introduced to simulate the interactions of these BBs. For test proteins, the building-block lattice is constructed, which contains all the folding fragments of the proteins. The local structures and the optimal fragments are then obtained by the dynamic programming algorithm. The experiment is performed on a subset of the PDB database with sequence identity less than 25%. The results show that the performance of the method is better than the method that uses only sequence information. When multiple paths are returned, the average classification accuracy of local structures is 72.27% and the average prediction accuracy of local structures is 67.72%, which is a significant improvement in comparison with previous studies. The method can predict not only the local structures but also the folding fragments of proteins. This work is helpful for the ab initio protein structure prediction and especially, the understanding of the folding process of proteins.
Affiliation(s)
- Qiwen Dong
- School of Computer Science and Technology, Harbin Institute of Technology, Harbin, China.
|
33
|
Song J, Tan H, Takemoto K, Akutsu T. HSEpred: predict half-sphere exposure from protein sequences. Bioinformatics 2008; 24:1489-97. [DOI: 10.1093/bioinformatics/btn222] [Citation(s) in RCA: 47] [Impact Index Per Article: 2.9] [Indexed: 11/12/2022] Open
|
34
|
Dong Q, Wang X, Lin L, Wang Y. Analysis and prediction of protein local structure based on structure alphabets. Proteins 2008; 72:163-72. [DOI: 10.1002/prot.21904] [Citation(s) in RCA: 12] [Impact Index Per Article: 0.8] [Indexed: 12/25/2022]
|
35
|
Investigating the binding of curcumin derivatives to bovine serum albumin. Biophys Chem 2007; 132:81-8. [PMID: 18037556 DOI: 10.1016/j.bpc.2007.10.007] [Citation(s) in RCA: 49] [Impact Index Per Article: 2.9] [Received: 09/10/2007] [Revised: 10/14/2007] [Accepted: 10/14/2007] [Indexed: 11/22/2022]
Abstract
The interaction of bovine serum albumin (BSA) with isoxazolcurcumin (IOC) and diacetylcurcumin (DAC) has been investigated. Binding constants obtained were found to be in the 10⁵ M⁻¹ range. Minor conformational changes of BSA were observed from circular dichroism (CD) and Fourier transformed infrared (FT-IR) studies on binding. Based on Förster's theory of non-radiation energy transfer, the average binding distance, r, between the donor (BSA) and acceptors IOC and DAC was found to be 3.79 and 4.27 nm respectively. Molecular docking of isoxazolcurcumin and diacetylcurcumin with bovine serum albumin indicated that they docked close to Trp 213, which is within the hydrophobic subdomain.
|
36
|
Liu S, Zhang C, Liang S, Zhou Y. Fold recognition by concurrent use of solvent accessibility and residue depth. Proteins 2007; 68:636-45. [PMID: 17510969 DOI: 10.1002/prot.21459] [Citation(s) in RCA: 78] [Impact Index Per Article: 4.6] [Indexed: 11/11/2022]
Abstract
Recognizing structural similarity without significant sequence identity (called fold recognition) is the key to bridging the gap between the number of known protein sequences and the number of structures solved. Previously, we developed a fold-recognition method called SP³ which combines sequence-derived sequence profiles, secondary-structure profiles and residue-depth-dependent, structure-derived sequence profiles. The use of residue-depth-dependent profiles made SP³ one of the best automatic predictors in CASP6. Because residue depth (RD) and solvent accessible surface area (solvent accessibility) are complementary in describing the exposure of a residue to solvent, we test whether or not incorporation of solvent-accessibility profiles into SP³ could further increase the accuracy of fold recognition. The resulting method, called SP⁴, was tested on the SALIGN benchmark for alignment accuracy and on Lindahl, LiveBench 8 and CASP7 blind prediction for fold recognition sensitivity and model-structure accuracy. For remote homologs, SP⁴ is found to consistently improve over SP³ in the accuracy of sequence alignment and predicted structural models as well as in the sensitivity of fold recognition. Our result suggests that RD and solvent accessibility can be used concurrently for improving the accuracy and sensitivity of fold recognition. The SP⁴ server and its local usage package are available on http://sparks.informatics.iupui.edu/SP4.
Affiliation(s)
- Song Liu
- Howard Hughes Medical Institute Center for Single Molecule Biophysics, Department of Physiology and Biophysics, State University of New York at Buffalo, Buffalo, New York 14214, USA
|
37
|
Dong QW, Wang XL, Lin L. Methods for optimizing the structure alphabet sequences of proteins. Comput Biol Med 2007; 37:1610-6. [PMID: 17493604 DOI: 10.1016/j.compbiomed.2007.03.002] [Citation(s) in RCA: 16] [Impact Index Per Article: 0.9] [Received: 07/17/2006] [Accepted: 03/16/2007] [Indexed: 11/24/2022]
Abstract
Protein structure prediction based on fragment assembly has made great progress in recent years, and local protein structure prediction is receiving increased attention. One essential step of local protein structure prediction is that the three-dimensional conformations must be compressed into a one-dimensional series of letters of a structural alphabet. The traditional method assigns each structure fragment the structure alphabet letter that has the best local structure similarity. However, such a locally optimal structure alphabet sequence is not guaranteed to produce the globally optimal structure. This study presents two efficient methods for finding the optimal structure alphabet sequence, which can model the native structures as accurately as possible. First, a 28-letter structure alphabet is derived by clustering fragments in Cartesian space with a fragment length of seven residues. The average quantization error of the 28 letters is 0.82 Å in terms of root mean square deviation. Then, two efficient methods are presented to encode the protein structures into series of structure alphabet letters, namely the greedy and dynamic programming algorithms. They are tested on the PDB database using the structure alphabet developed in Cartesian coordinate space (our structure alphabet) and in torsion angle space (the PB structure alphabet), respectively. The experimental results show that these two methods can find approximately optimal structure alphabet sequences by searching a small fraction of the modeling space. The traditional local-optimization method achieves a root mean square deviation of 26.27 Å between the reconstructed structures and the native ones, while the modeling accuracy is improved to 3.28 Å by the greedy algorithm. The results are helpful for local protein structure prediction.
Affiliation(s)
- Qi-wen Dong
- School of Computer Science and Technology, Harbin Institute of Technology, Harbin, China.
|
38
|
Chen CT, Lin HN, Sung TY, Hsu WL. HYPLOSP: a knowledge-based approach to protein local structure prediction. J Bioinform Comput Biol 2007; 4:1287-307. [PMID: 17245815 DOI: 10.1142/s0219720006002466] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.2] [Received: 04/28/2006] [Revised: 08/31/2006] [Accepted: 08/31/2006] [Indexed: 11/18/2022]
Abstract
Local structure prediction can facilitate ab initio structure prediction, protein threading, and remote homology detection. However, the accuracy of existing methods is limited. In this paper, we propose a knowledge-based prediction method that assigns a measure called the local match rate to each position of an amino acid sequence to estimate the confidence of our method. Empirically, the accuracy of the method correlates positively with the local match rate; therefore, we employ it to predict the local structures of positions with a high local match rate. For positions with a low local match rate, we propose a neural network prediction method. To better utilize the knowledge-based and neural network methods, we design a hybrid prediction method, HYPLOSP (HYbrid method to Protein LOcal Structure Prediction) that combines both methods. To evaluate the performance of the proposed methods, we first perform cross-validation experiments by applying our knowledge-based method, a neural network method, and HYPLOSP to a large dataset of 3,925 protein chains. We test our methods extensively on three different structural alphabets and evaluate their performance by two widely used criteria, Maximum Deviation of backbone torsion Angle (MDA) and Q(N), which is similar to Q(3) in secondary structure prediction. We then compare HYPLOSP with three previous studies using a dataset of 56 new protein chains. HYPLOSP shows promising results in terms of MDA and Q(N) accuracy and demonstrates its alphabet-independent capability.
Affiliation(s)
- Ching-Tai Chen
- Institute of Information Science, Academia Sinica, 128 Sec. 2, Academia Rd, Taipei, Taiwan, ROC.
|
39
|
Ishida T, Nakamura S, Shimizu K. Potential for assessing quality of protein structure based on contact number prediction. Proteins 2006; 64:940-7. [PMID: 16788993 DOI: 10.1002/prot.21047] [Citation(s) in RCA: 10] [Impact Index Per Article: 0.6] [Indexed: 11/06/2022]
Abstract
We developed a novel knowledge-based residue environment potential for assessing the quality of protein structures in protein structure prediction. The potential uses the contact number of residues in a protein structure and the absolute contact number of residues predicted from its amino acid sequence using a new prediction method based on a support vector regression (SVR). The contact number of an amino acid residue in a protein structure is defined by the number of residues around a given residue. First, the contact number of each residue is predicted using SVR from an amino acid sequence of a target protein. Then, the potential of the protein structure is calculated from the probability distribution of the native contact numbers corresponding to the predicted ones. The performance of this potential is compared with other score functions using decoy structures to identify both native structure from other structures and near-native structures from nonnative structures. This potential improves not only the ability to identify native structures from other structures but also the ability to discriminate near-native structures from nonnative structures.
Affiliation(s)
- Takashi Ishida
- Department of Biotechnology, Graduate School of Agricultural and Life Sciences, The University of Tokyo, Tokyo, Japan.
|
40
|
Ohlson T, Aggarwal V, Elofsson A, MacCallum RM. Improved alignment quality by combining evolutionary information, predicted secondary structure and self-organizing maps. BMC Bioinformatics 2006; 7:357. [PMID: 16869963 PMCID: PMC1562450 DOI: 10.1186/1471-2105-7-357] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.4] [Received: 04/04/2006] [Accepted: 07/25/2006] [Indexed: 11/10/2022] Open
Abstract
BACKGROUND Protein sequence alignment is one of the basic tools in bioinformatics. Correct alignments are required for a range of tasks including the derivation of phylogenetic trees and protein structure prediction. Numerous studies have shown that the incorporation of predicted secondary structure information into alignment algorithms improves their performance. Secondary structure predictors have to be trained on a set of somewhat arbitrarily defined states (e.g. helix, strand, coil), and it has been shown that the choice of these states has some effect on alignment quality. However, it is not unlikely that prediction of other structural features also could provide an improvement. In this study we use an unsupervised clustering method, the self-organizing map, to assign sequence profile windows to "structural states" and assess their use in sequence alignment. RESULTS The addition of self-organizing map locations as inputs to a profile-profile scoring function improves the alignment quality of distantly related proteins slightly. The improvement is slightly smaller than that gained from the inclusion of predicted secondary structure. However, the information seems to be complementary as the two prediction schemes can be combined to improve the alignment quality by a further small but significant amount. CONCLUSION It has been observed in many studies that predicted secondary structure significantly improves the alignments. Here we have shown that the addition of self-organizing map locations can further improve the alignments as the self-organizing map locations seem to contain some information that is not captured by the predicted secondary structure.
Affiliation(s)
- Tomas Ohlson
- Stockholm Bioinformatics Center, Stockholm University, SE-106 91 Stockholm, Sweden
- Varun Aggarwal
- Stockholm Bioinformatics Center, Stockholm University, SE-106 91 Stockholm, Sweden
- Department of Electrical Engineering and Computer Science, Massachusetts Institute of Technology, Cambridge, MA 02139, USA
- Arne Elofsson
- Stockholm Bioinformatics Center, Stockholm University, SE-106 91 Stockholm, Sweden
- Center for Biomembrane Research, Stockholm University, SE-106 91 Stockholm, Sweden
- Robert M MacCallum
- Stockholm Bioinformatics Center, Stockholm University, SE-106 91 Stockholm, Sweden
- Division of Cell and Molecular Biology, Imperial College London, London, UK
|
41
|
Karplus K, Katzman S, Shackleford G, Koeva M, Draper J, Barnes B, Soriano M, Hughey R. SAM-T04: what is new in protein-structure prediction for CASP6. Proteins 2006; 61 Suppl 7:135-142. [PMID: 16187355 DOI: 10.1002/prot.20730] [Citation(s) in RCA: 78] [Impact Index Per Article: 4.3] [Indexed: 11/12/2022]
Abstract
The SAM-T04 method for predicting protein structures uses a single protocol across the entire range of targets, from comparative modeling to new folds. This protocol is similar to the SAM-T02 protocol used in CASP5, but has improvements in the iterative search for similar sequences in finding and aligning templates, in creating fragment libraries, in generating protein conformations, and in scoring the conformations. The automatic procedure made some improvements over simply selecting an alignment to the highest-scoring template, and human intervention made substantial improvements over the automatic procedure. The main improvements made by human intervention were from adding constraints to build (or retain) beta-sheets and from splitting multidomain proteins into separate domains. The uniform protocol was moderately successful across the entire range of target difficulty, but was somewhat less successful than other approaches in CASP6 on the comparative modeling targets.
Affiliation(s)
- Kevin Karplus
- Biomolecular Engineering Department, University of California, Santa Cruz, California 95064, USA.
|
42
|
Sander O, Sommer I, Lengauer T. Local protein structure prediction using discriminative models. BMC Bioinformatics 2006; 7:14. [PMID: 16405736 PMCID: PMC1368994 DOI: 10.1186/1471-2105-7-14] [Citation(s) in RCA: 52] [Impact Index Per Article: 2.9] [Received: 06/14/2005] [Accepted: 01/11/2006] [Indexed: 11/10/2022] Open
Abstract
BACKGROUND In recent years protein structure prediction methods using local structure information have shown promising improvements. The quality of new fold predictions has risen significantly and in fold recognition incorporation of local structure predictions led to improvements in the accuracy of results. We developed a local structure prediction method to be integrated into either fold recognition or new fold prediction methods. For each local sequence window of a protein sequence the method predicts probability estimates for the sequence to attain particular local structures from a set of predefined local structure candidates. The first step is to define a set of local structure representatives based on clustering recurrent local structures. In the second step a discriminative model is trained to predict the local structure representative given local sequence information. RESULTS The step of clustering local structures yields an average RMSD quantization error of 1.19 Å for 27 structural representatives (for a fragment length of 7 residues). In the prediction step the area under the ROC curve for detection of the 27 classes ranges from 0.68 to 0.88. CONCLUSION The described method yields probability estimates for local protein structure candidates, giving signals for all kinds of local structure. These local structure predictions can be incorporated either into fold recognition algorithms to improve alignment quality and the overall prediction accuracy or into new fold prediction methods.
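The quantization error mentioned in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes fragments and representatives are already superimposed in a common frame (a real pipeline would superimpose them first, for example with the Kabsch algorithm), and the function names and toy two-point "fragments" are hypothetical.

```python
import math

def rmsd(frag_a, frag_b):
    """Coordinate RMSD between two equal-length fragments.

    Assumption: the fragments are already superimposed, so no
    optimal rotation or translation is applied here.
    """
    if len(frag_a) != len(frag_b):
        raise ValueError("fragments must have the same length")
    sq = sum(math.dist(p, q) ** 2 for p, q in zip(frag_a, frag_b))
    return math.sqrt(sq / len(frag_a))

def quantize(fragment, representatives):
    """Return (index, rmsd) of the representative closest to `fragment`."""
    scored = ((i, rmsd(fragment, rep)) for i, rep in enumerate(representatives))
    return min(scored, key=lambda t: t[1])

def mean_quantization_error(fragments, representatives):
    """Average RMSD between each fragment and its nearest representative."""
    errors = [quantize(f, representatives)[1] for f in fragments]
    return sum(errors) / len(errors)

# Toy data (hypothetical): two representatives, two fragments of two points each.
reps = [
    [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)],
    [(0.0, 0.0, 0.0), (0.0, 1.0, 0.0)],
]
frags = [
    [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)],  # identical to representative 0
    [(0.0, 0.0, 2.0), (0.0, 1.0, 2.0)],  # representative 1 shifted by 2 along z
]
print(quantize(frags[1], reps))
print(mean_quantization_error(frags, reps))
```

In the toy data the first fragment matches a representative exactly and the second is displaced by 2 units from its nearest representative, so the mean error is 1. With real seven-residue fragments the same averaging would be applied after superposition.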
Affiliation(s)
- Oliver Sander
- Max-Planck-Institute for Informatics, Department of Computational Biology and Applied Algorithmics, Stuhlsatzenhausweg 85, D-66123 Saarbrücken, Germany
- Ingolf Sommer
- Max-Planck-Institute for Informatics, Department of Computational Biology and Applied Algorithmics, Stuhlsatzenhausweg 85, D-66123 Saarbrücken, Germany
- Thomas Lengauer
- Max-Planck-Institute for Informatics, Department of Computational Biology and Applied Algorithmics, Stuhlsatzenhausweg 85, D-66123 Saarbrücken, Germany
|
43
|
Yuan Z. Better prediction of protein contact number using a support vector regression analysis of amino acid sequence. BMC Bioinformatics 2005; 6:248. [PMID: 16221309 PMCID: PMC1277819 DOI: 10.1186/1471-2105-6-248] [Citation(s) in RCA: 50] [Impact Index Per Article: 2.6] [Received: 07/04/2005] [Accepted: 10/13/2005] [Indexed: 11/10/2022] Open
Abstract
Background Protein tertiary structure can be partly characterized via each amino acid's contact number measuring how residues are spatially arranged. The contact number of a residue in a folded protein is a measure of its exposure to the local environment, and is defined as the number of Cβ atoms in other residues within a sphere around the Cβ atom of the residue of interest. Contact number is partly conserved between protein folds and thus is useful for protein fold and structure prediction. In turn, each residue's contact number can be partially predicted from primary amino acid sequence, assisting tertiary fold analysis from sequence data. In this study, we provide a more accurate contact number prediction method from protein primary sequence. Results We predict contact number from protein sequence using a novel support vector regression algorithm. Using protein local sequences with multiple sequence alignments (PSI-BLAST profiles), we demonstrate a correlation coefficient between predicted and observed contact numbers of 0.70, which outperforms previously achieved accuracies. Including additional information about sequence weight and amino acid composition further improves prediction accuracies significantly with the correlation coefficient reaching 0.73. If residues are classified as being either "contacted" or "non-contacted", the prediction accuracies are all greater than 77%, regardless of the choice of classification thresholds. Conclusion The successful application of support vector regression to the prediction of protein contact number reported here, together with previous applications of this approach to the prediction of protein accessible surface area and B-factor profile, suggests that a support vector regression approach may be very useful for determining the structure-function relation between primary protein sequence and higher order consecutive protein structural and functional properties.
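The contact number definition given in this abstract (the number of Cβ atoms of other residues within a sphere around the Cβ atom of the residue of interest) can be sketched as follows. The abstract does not state the sphere radius, so the `radius` parameter, its default value, and the toy coordinates are assumptions made only for illustration.

```python
import math

def contact_numbers(cb_coords, radius=10.0):
    """For each residue, count Cβ atoms of other residues within `radius` (Å).

    `cb_coords` is a list of (x, y, z) tuples, one Cβ position per residue.
    The default radius is an assumed placeholder, not a value from the paper.
    """
    counts = []
    for i, a in enumerate(cb_coords):
        n = 0
        for j, b in enumerate(cb_coords):
            if i != j and math.dist(a, b) <= radius:
                n += 1
        counts.append(n)
    return counts

# Toy coordinates (hypothetical, not taken from a real structure):
# three residues packed together and one far away.
toy = [(0.0, 0.0, 0.0), (3.0, 0.0, 0.0), (0.0, 4.0, 0.0), (30.0, 0.0, 0.0)]
print(contact_numbers(toy, radius=6.0))  # prints [2, 2, 2, 0]
```

The double loop is quadratic in the number of residues, which is acceptable for a single chain; a spatial index would be the usual choice for large assemblies.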
Affiliation(s)
- Zheng Yuan
- Institute for Molecular Bioscience, ARC Centre in Bioinformatics, The University of Queensland, St. Lucia, 4072, Australia.
|
44
|
Karplus K, Karchin R, Shackelford G, Hughey R. Calibrating E-values for hidden Markov models using reverse-sequence null models. Bioinformatics 2005; 21:4107-15. [PMID: 16123115 DOI: 10.1093/bioinformatics/bti629] [Citation(s) in RCA: 32] [Impact Index Per Article: 1.7] [Indexed: 11/14/2022] Open
Abstract
MOTIVATION Hidden Markov models (HMMs) calculate the probability that a sequence was generated by a given model. Log-odds scoring provides a context for evaluating this probability, by considering it in relation to a null hypothesis. We have found that using a reverse-sequence null model effectively removes biases owing to sequence length and composition and reduces the number of false positives in a database search. Any scoring system is an arbitrary measure of the quality of database matches. Significance estimates of scores are essential, because they eliminate model- and method-dependent scaling factors, and because they quantify the importance of each match. Accurate computation of the significance of reverse-sequence null model scores presents a problem, because the scores do not fit the extreme-value (Gumbel) distribution commonly used to estimate HMM scores' significance. RESULTS To get a better estimate of the significance of reverse-sequence null model scores, we derive a theoretical distribution based on the assumption of a Gumbel distribution for raw HMM scores and compare estimates based on this and other distribution families. We derive estimation methods for the parameters of the distributions based on maximum likelihood and on moment matching (least-squares fit for Student's t-distribution). We evaluate the modeled distributions of scores, based on how well they fit the tail of the observed distribution for data not used in the fitting and on the effects of the improved E-values on our HMM-based fold-recognition methods. The theoretical distribution provides some improvement in fitting the tail and in providing fewer false positives in the fold-recognition test. An ad hoc distribution based on assuming a stretched exponential tail does an even better job. The use of Student's t to model the distribution fits well in the middle of the distribution, but provides too heavy a tail. The moment-matching methods fit the tails better than maximum-likelihood methods. AVAILABILITY Information on obtaining the SAM program suite (free for academic use), as well as a server interface, is available at http://www.soe.ucsc.edu/research/compbio/sam.html and the open-source random sequence generator with varying compositional biases is available at http://www.soe.ucsc.edu/research/compbio/gen_sequence
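The idea of a reverse-sequence null model can be sketched with a toy example. This is not the SAM implementation: a first-order Markov chain stands in for the HMM, and the function names, parameters, and probabilities are hypothetical. The Gumbel E-value function shows only the conventional extreme-value estimate that the abstract reports as a poor fit for reverse-sequence scores; the distributions the authors derive are not reproduced here.

```python
import math

def log_prob(seq, start, trans):
    """Log-probability of `seq` under a first-order Markov chain
    (a toy stand-in for an HMM; an assumption for illustration)."""
    lp = math.log(start[seq[0]])
    for a, b in zip(seq, seq[1:]):
        lp += math.log(trans[a][b])
    return lp

def reverse_null_score(seq, start, trans):
    """Log-odds of the sequence against its own reversal.

    The reversed sequence has the same length and composition,
    so those two sources of bias cancel in the difference.
    """
    return log_prob(seq, start, trans) - log_prob(seq[::-1], start, trans)

def gumbel_evalue(score, mu, lam, db_size):
    """Expected number of database hits scoring >= `score`, assuming
    scores follow a Gumbel distribution with location mu and scale 1/lam."""
    p = 1.0 - math.exp(-math.exp(-lam * (score - mu)))
    return db_size * p

# Hypothetical two-letter model.
start = {"A": 0.5, "B": 0.5}
trans = {"A": {"A": 0.1, "B": 0.9}, "B": {"A": 0.5, "B": 0.5}}

print(reverse_null_score("AAB", start, trans))  # positive: forward order fits better
print(reverse_null_score("ABA", start, trans))  # 0.0 for a palindrome
print(gumbel_evalue(score=5.0, mu=0.0, lam=1.0, db_size=1000))
```

A palindromic sequence always scores zero under this scheme, which makes clear that the score measures order-dependent signal only.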
Affiliation(s)
- Kevin Karplus
- Department of Biomolecular Engineering, University of California, Santa Cruz, 95064, USA.
|
45
|
Hamelryck T. An amino acid has two sides: A new 2D measure provides a different view of solvent exposure. Proteins 2005; 59:38-48. [DOI: 10.1002/prot.20379] [Citation(s) in RCA: 111] [Impact Index Per Article: 5.8] [Indexed: 11/09/2022]
|