1
|
Pallante L, Cannariato M, Androutsos L, Zizzi EA, Bompotas A, Hada X, Grasso G, Kalogeras A, Mavroudi S, Di Benedetto G, Theofilatos K, Deriu MA. VirtuousPocketome: a computational tool for screening protein-ligand complexes to identify similar binding sites. Sci Rep 2024; 14:6296. [PMID: 38491261 PMCID: PMC10943019 DOI: 10.1038/s41598-024-56893-7] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/09/2024] [Accepted: 03/12/2024] [Indexed: 03/18/2024] Open
Abstract
Protein residues within binding pockets play a critical role in determining the range of ligands that can interact with a protein, influencing its structure and function. Identifying structural similarities in proteins offers valuable insights into their function and activation mechanisms, aiding in predicting protein-ligand interactions, anticipating off-target effects, and facilitating the development of therapeutic agents. Numerous computational methods assessing global or local similarity in protein cavities have emerged, but their utilization is impeded by complexity, impractical automation for amino acid pattern searches, and an inability to evaluate the dynamics of scrutinized protein-ligand systems. Here, we present a general, automatic and unbiased computational pipeline, named VirtuousPocketome, aimed at screening huge databases of proteins for similar binding pockets starting from an interested protein-ligand complex. We demonstrate the pipeline's potential by exploring a recently-solved human bitter taste receptor, i.e. the TAS2R46, complexed with strychnine. We pinpointed 145 proteins sharing similar binding sites compared to the analysed bitter taste receptor and the enrichment analysis highlighted the related biological processes, molecular functions and cellular components. This work represents the foundation for future studies aimed at understanding the effective role of tastants outside the gustatory system: this could pave the way towards the rationalization of the diet as a supplement to standard pharmacological treatments and the design of novel tastants-inspired compounds to target other proteins involved in specific diseases or disorders. The proposed pipeline is publicly accessible, can be applied to any protein-ligand complex, and could be expanded to screen any database of protein structures.
Collapse
Affiliation(s)
- Lorenzo Pallante
- Department of Mechanical and Aerospace Engineering, Politecnico di Torino, PolitoBIOMedLab, 10129, Torino, Italy
| | - Marco Cannariato
- Department of Mechanical and Aerospace Engineering, Politecnico di Torino, PolitoBIOMedLab, 10129, Torino, Italy
| | | | - Eric A Zizzi
- Department of Mechanical and Aerospace Engineering, Politecnico di Torino, PolitoBIOMedLab, 10129, Torino, Italy
| | - Agorakis Bompotas
- Industrial Systems Institute, Athena Research Center, 265 04, Patras, Greece
| | - Xhesika Hada
- Department of Mechanical and Aerospace Engineering, Politecnico di Torino, PolitoBIOMedLab, 10129, Torino, Italy
| | - Gianvito Grasso
- Dalle Molle Institute for Artificial Intelligence IDSIA USI-SUPSI, 6962, Lugano-Viganello, Switzerland
| | | | - Seferina Mavroudi
- Department of Nursing, School of Health Rehabilitation Sciences, University of Patras, 265 04, Patras, Greece
| | | | | | - Marco A Deriu
- Department of Mechanical and Aerospace Engineering, Politecnico di Torino, PolitoBIOMedLab, 10129, Torino, Italy.
| |
Collapse
|
2
|
Secondary and Tertiary Structure Prediction of Proteins: A Bioinformatic Approach. COMPLEX SYSTEM MODELLING AND CONTROL THROUGH INTELLIGENT SOFT COMPUTATIONS 2015. [DOI: 10.1007/978-3-319-12883-2_19] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.6] [Reference Citation Analysis] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 12/27/2022]
|
3
|
Molloy K, Van MJ, Barbara D, Shehu A. Exploring representations of protein structure for automated remote homology detection and mapping of protein structure space. BMC Bioinformatics 2014; 15 Suppl 8:S4. [PMID: 25080993 PMCID: PMC4120149 DOI: 10.1186/1471-2105-15-s8-s4] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.8] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/10/2022] Open
Abstract
BACKGROUND Due to rapid sequencing of genomes, there are now millions of deposited protein sequences with no known function. Fast sequence-based comparisons allow detecting close homologs for a protein of interest to transfer functional information from the homologs to the given protein. Sequence-based comparison cannot detect remote homologs, in which evolution has adjusted the sequence while largely preserving structure. Structure-based comparisons can detect remote homologs but most methods for doing so are too expensive to apply at a large scale over structural databases of proteins. Recently, fragment-based structural representations have been proposed that allow fast detection of remote homologs with reasonable accuracy. These representations have also been used to obtain linearly-reducible maps of protein structure space. It has been shown, as additionally supported from analysis in this paper that such maps preserve functional co-localization of the protein structure space. METHODS Inspired by a recent application of the Latent Dirichlet Allocation (LDA) model for conducting structural comparisons of proteins, we propose higher-order LDA-obtained topic-based representations of protein structures to provide an alternative route for remote homology detection and organization of the protein structure space in few dimensions. Various techniques based on natural language processing are proposed and employed to aid the analysis of topics in the protein structure domain. RESULTS We show that a topic-based representation is just as effective as a fragment-based one at automated detection of remote homologs and organization of protein structure space. We conduct a detailed analysis of the information content in the topic-based representation, showing that topics have semantic meaning. The fragment-based and topic-based representations are also shown to allow prediction of superfamily membership. CONCLUSIONS This work opens exciting venues in designing novel representations to extract information about protein structures, as well as organizing and mining protein structure space with mature text mining tools.
Collapse
Affiliation(s)
- Kevin Molloy
- Department of Computer Science, George Mason University, 4400 University Drive, 22030 Fairfax, VA, USA
| | - M Jennifer Van
- Department of Computer Science, George Mason University, 4400 University Drive, 22030 Fairfax, VA, USA
| | - Daniel Barbara
- Department of Computer Science, George Mason University, 4400 University Drive, 22030 Fairfax, VA, USA
| | - Amarda Shehu
- Department of Computer Science, George Mason University, 4400 University Drive, 22030 Fairfax, VA, USA
- Department of Bioengineering, George Mason University, 4400 University Drive, 22030 Fairfax, VA, USA
- School of Systems Biology, George Mason University, 4400 University Drive, 22030 Fairfax, VA, USA
| |
Collapse
|
4
|
Nemoto W, Saito A, Oikawa H. Recent advances in functional region prediction by using structural and evolutionary information - Remaining problems and future extensions. Comput Struct Biotechnol J 2013; 8:e201308007. [PMID: 24688747 PMCID: PMC3962155 DOI: 10.5936/csbj.201308007] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.7] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/27/2013] [Revised: 11/12/2013] [Accepted: 11/13/2013] [Indexed: 11/22/2022] Open
Abstract
Structural genomics projects have solved many new structures with unknown functions. One strategy to investigate the function of a structure is to computationally find the functionally important residues or regions on it. Therefore, the development of functional region prediction methods has become an important research subject. An effective approach is to use a method employing structural and evolutionary information, such as the evolutionary trace (ET) method. ET ranks the residues of a protein structure by calculating the scores for relative evolutionary importance, and locates functionally important sites by identifying spatial clusters of highly ranked residues. After ET was developed, numerous ET-like methods were subsequently reported, and many of them are in practical use, although they require certain conditions. In this mini review, we first introduce the remaining problems and the recent improvements in the methods using structural and evolutionary information. We then summarize the recent developments of the methods. Finally, we conclude by describing possible extensions of the evolution- and structure-based methods.
Collapse
Affiliation(s)
- Wataru Nemoto
- Division of Life Science and Engineering, School of Science and Engineering, Tokyo Denki University (TDU), Ishizaka, Hatoyama-cho, Hiki-gun, Saitama, 350-0394, Japan
| | - Akira Saito
- Division of Life Science and Engineering, School of Science and Engineering, Tokyo Denki University (TDU), Ishizaka, Hatoyama-cho, Hiki-gun, Saitama, 350-0394, Japan
| | - Hayato Oikawa
- Division of Life Science and Engineering, School of Science and Engineering, Tokyo Denki University (TDU), Ishizaka, Hatoyama-cho, Hiki-gun, Saitama, 350-0394, Japan
| |
Collapse
|
5
|
Abstract
MOTIVATION To recognize remote relationships between RNA molecules, one must be able to align structures without regard to sequence similarity. We have implemented a method, which is swift [O(n(2))], sensitive and tolerant of large gaps and insertions. Molecules are broken into overlapping fragments, which are characterized by their memberships in a probabilistic classification based on local geometry and H-bonding descriptors. This leads to a probabilistic similarity measure that is used in a conventional dynamic programming method. RESULTS Examples are given of database searching, the detection of structural similarities, which would not be found using sequence based methods, and comparisons with a previously published approach. AVAILABILITY AND IMPLEMENTATION Source code (C and perl) and binaries for linux are freely available at www.zbh.uni-hamburg.de/fries.
Collapse
Affiliation(s)
- Tim Wiegels
- Centre for Bioinformatics, University of Hamburg, Bundesstr. 43, D-20146 Hamburg, Germany.
| | | | | |
Collapse
|
6
|
Hensen U, Meyer T, Haas J, Rex R, Vriend G, Grubmüller H. Exploring protein dynamics space: the dynasome as the missing link between protein structure and function. PLoS One 2012; 7:e33931. [PMID: 22606222 PMCID: PMC3350514 DOI: 10.1371/journal.pone.0033931] [Citation(s) in RCA: 64] [Impact Index Per Article: 5.3] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/23/2011] [Accepted: 02/20/2012] [Indexed: 12/25/2022] Open
Abstract
Proteins are usually described and classified according to amino acid sequence, structure or function. Here, we develop a minimally biased scheme to compare and classify proteins according to their internal mobility patterns. This approach is based on the notion that proteins not only fold into recurring structural motifs but might also be carrying out only a limited set of recurring mobility motifs. The complete set of these patterns, which we tentatively call the dynasome, spans a multi-dimensional space with axes, the dynasome descriptors, characterizing different aspects of protein dynamics. The unique dynamic fingerprint of each protein is represented as a vector in the dynasome space. The difference between any two vectors, consequently, gives a reliable measure of the difference between the corresponding protein dynamics. We characterize the properties of the dynasome by comparing the dynamics fingerprints obtained from molecular dynamics simulations of 112 proteins but our approach is, in principle, not restricted to any specific source of data of protein dynamics. We conclude that: 1. the dynasome consists of a continuum of proteins, rather than well separated classes. 2. For the majority of proteins we observe strong correlations between structure and dynamics. 3. Proteins with similar function carry out similar dynamics, which suggests a new method to improve protein function annotation based on protein dynamics.
Collapse
Affiliation(s)
- Ulf Hensen
- Theoretische und computergestützte Biophysik, Max-Planck-Institut für biophysikalische Chemie, Göttingen, Germany
| | - Tim Meyer
- Theoretische und computergestützte Biophysik, Max-Planck-Institut für biophysikalische Chemie, Göttingen, Germany
| | - Jürgen Haas
- Theoretische und computergestützte Biophysik, Max-Planck-Institut für biophysikalische Chemie, Göttingen, Germany
| | - René Rex
- Theoretische und computergestützte Biophysik, Max-Planck-Institut für biophysikalische Chemie, Göttingen, Germany
| | - Gert Vriend
- CMBI, Radboud University Nijmegen Medical Centre, Nijmegen, The Netherlands
| | - Helmut Grubmüller
- Theoretische und computergestützte Biophysik, Max-Planck-Institut für biophysikalische Chemie, Göttingen, Germany
| |
Collapse
|
7
|
Weeks AM, Chang MCY. Constructing de novo biosynthetic pathways for chemical synthesis inside living cells. Biochemistry 2011; 50:5404-18. [PMID: 21591680 DOI: 10.1021/bi200416g] [Citation(s) in RCA: 33] [Impact Index Per Article: 2.5] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 01/11/2023]
Abstract
Living organisms have evolved a vast array of catalytic functions that make them ideally suited for the production of medicinally and industrially relevant small-molecule targets. Indeed, native metabolic pathways in microbial hosts have long been exploited and optimized for the scalable production of both fine and commodity chemicals. Our increasing capacity for DNA sequencing and synthesis has revealed the molecular basis for the biosynthesis of a variety of complex and useful metabolites and allows the de novo construction of novel metabolic pathways for the production of new and exotic molecular targets in genetically tractable microbes. However, the development of commercially viable processes for these engineered pathways is currently limited by our ability to quickly identify or engineer enzymes with the correct reaction and substrate selectivity as well as the speed by which metabolic bottlenecks can be determined and corrected. Efforts to understand the relationship among sequence, structure, and function in the basic biochemical sciences can advance these goals for synthetic biology applications while also serving as an experimental platform for elucidating the in vivo specificity and function of enzymes and reconstituting complex biochemical traits for study in a living model organism. Furthermore, the continuing discovery of natural mechanisms for the regulation of metabolic pathways has revealed new principles for the design of high-flux pathways with minimized metabolic burden and has inspired the development of new tools and approaches to engineering synthetic pathways in microbial hosts for chemical production.
Collapse
Affiliation(s)
- Amy M Weeks
- Department of Chemistry, University of California, Berkeley, California 94720-1460, USA
| | | |
Collapse
|
8
|
Kifer I, Nussinov R, Wolfson HJ. GOSSIP: a method for fast and accurate global alignment of protein structures. Bioinformatics 2011; 27:925-32. [PMID: 21296751 PMCID: PMC3065682 DOI: 10.1093/bioinformatics/btr044] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.6] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/20/2010] [Revised: 12/04/2010] [Accepted: 01/21/2011] [Indexed: 11/14/2022] Open
Abstract
MOTIVATION The database of known protein structures (PDB) is increasing rapidly. This results in a growing need for methods that can cope with the vast amount of structural data. To analyze the accumulating data, it is important to have a fast tool for identifying similar structures and clustering them by structural resemblance. Several excellent tools have been developed for the comparison of protein structures. These usually address the task of local structure alignment, an important yet computationally intensive problem due to its complexity. It is difficult to use such tools for comparing a large number of structures to each other at a reasonable time. RESULTS Here we present GOSSIP, a novel method for a global all-against-all alignment of any set of protein structures. The method detects similarities between structures down to a certain cutoff (a parameter of the program), hence allowing it to detect similar structures at a much higher speed than local structure alignment methods. GOSSIP compares many structures in times which are several orders of magnitude faster than well-known available structure alignment servers, and it is also faster than a database scanning method. We evaluate GOSSIP both on a dataset of short structural fragments and on two large sequence-diverse structural benchmarks. Our conclusions are that for a threshold of 0.6 and above, the speed of GOSSIP is obtained with no compromise of the accuracy of the alignments or of the number of detected global similarities. AVAILABILITY A server, as well as an executable for download, are available at http://bioinfo3d.cs.tau.ac.il/gossip/.
Collapse
Affiliation(s)
- I Kifer
- School of Computer Science, Raymond and Beverly Sackler Faculty of Exact Sciences, Sackler Institute of Molecular Medicine, Tel Aviv University, Tel Aviv, Israel
| | | | | |
Collapse
|
9
|
Erdin S, Lisewski AM, Lichtarge O. Protein function prediction: towards integration of similarity metrics. Curr Opin Struct Biol 2011; 21:180-8. [PMID: 21353529 DOI: 10.1016/j.sbi.2011.02.001] [Citation(s) in RCA: 35] [Impact Index Per Article: 2.7] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/03/2011] [Accepted: 02/03/2011] [Indexed: 11/16/2022]
Abstract
Genomic centers discover increasingly many protein sequences and structures, but not necessarily their full biological functions. Thus, currently, less than one percent of proteins have experimentally verified biochemical activities. To fill this gap, function prediction algorithms apply metrics of similarity between proteins on the premise that those sufficiently alike in sequence, or structure, will perform identical functions. Although high sensitivity is elusive, network analyses that integrate these metrics together hold the promise of rapid gains in function prediction specificity.
Collapse
Affiliation(s)
- Serkan Erdin
- Department of Molecular and Human Genetics, 1 Baylor Plaza, Baylor College of Medicine, Houston, TX 77030, USA
| | | | | |
Collapse
|
10
|
Lin HN, Sung TY, Ho SY, Hsu WL. Improving protein secondary structure prediction based on short subsequences with local structure similarity. BMC Genomics 2010; 11 Suppl 4:S4. [PMID: 21143813 PMCID: PMC3005913 DOI: 10.1186/1471-2164-11-s4-s4] [Citation(s) in RCA: 24] [Impact Index Per Article: 1.7] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/10/2022] Open
Abstract
BACKGROUND When characterizing the structural topology of proteins, protein secondary structure (PSS) plays an important role in analyzing and modeling protein structures because it represents the local conformation of amino acids into regular structures. Although PSS prediction has been studied for decades, the prediction accuracy reaches a bottleneck at around 80%, and further improvement is very difficult. RESULTS In this paper, we present an improved dictionary-based PSS prediction method called SymPred, and a meta-predictor called SymPsiPred. We adopt the concept behind natural language processing techniques and propose synonymous words to capture local sequence similarities in a group of similar proteins. A synonymous word is an n-gram pattern of amino acids that reflects the sequence variation in a protein's evolution. We generate a protein-dependent synonymous dictionary from a set of protein sequences for PSS prediction.On a large non-redundant dataset of 8,297 protein chains (DsspNr-25), the average Q3 of SymPred and SymPsiPred are 81.0% and 83.9% respectively. On the two latest independent test sets (EVA Set_1 and EVA_Set2), the average Q3 of SymPred is 78.8% and 79.2% respectively. SymPred outperforms other existing methods by 1.4% to 5.4%. We study two factors that may affect the performance of SymPred and find that it is very sensitive to the number of proteins of both known and unknown structures. This finding implies that SymPred and SymPsiPred have the potential to achieve higher accuracy as the number of protein sequences in the NCBInr and PDB databases increases. CONCLUSIONS Our experiment results show that local similarities in protein sequences typically exhibit conserved structures, which can be used to improve the accuracy of secondary structure prediction. For the application of synonymous words, we demonstrate an example of a sequence alignment which is generated by the distribution of shared synonymous words of a pair of protein sequences. We can align the two sequences nearly perfectly which are very dissimilar at the sequence level but very similar at the structural level. The SymPred and SymPsiPred prediction servers are available at http://bio-cluster.iis.sinica.edu.tw/SymPred/.
Collapse
Affiliation(s)
- Hsin-Nan Lin
- Bioinformatics Program, Taiwan International Graduate Program, Academia Sinica, Taipei, Taiwan
| | | | | | | |
Collapse
|
11
|
Wilkins AD, Lua R, Erdin S, Ward RM, Lichtarge O. Sequence and structure continuity of evolutionary importance improves protein functional site discovery and annotation. Protein Sci 2010; 19:1296-311. [PMID: 20506260 DOI: 10.1002/pro.406] [Citation(s) in RCA: 19] [Impact Index Per Article: 1.4] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/19/2022]
Abstract
Protein functional sites control most biological processes and are important targets for drug design and protein engineering. To characterize them, the evolutionary trace (ET) ranks the relative importance of residues according to their evolutionary variations. Generally, top-ranked residues cluster spatially to define evolutionary hotspots that predict functional sites in structures. Here, various functions that measure the physical continuity of ET ranks among neighboring residues in the structure, or in the sequence, are shown to inform sequence selection and to improve functional site resolution. This is shown first, in 110 proteins, for which the overlap between top-ranked residues and actual functional sites rose by 8% in significance. Then, on a structural proteomic scale, optimized ET led to better 3D structure-function motifs (3D templates) and, in turn, to enzyme function prediction by the Evolutionary Trace Annotation (ETA) method with better sensitivity of (40% to 53%) and positive predictive value (93% to 94%). This suggests that the similarity of evolutionary importance among neighboring residues in the sequence and in the structure is a universal feature of protein evolution. In practice, this yields a tool for optimizing sequence selections for comparative analysis and, via ET, for better predictions of functional site and function. This should prove useful for the efficient mutational redesign of protein function and for pharmaceutical targeting.
Collapse
Affiliation(s)
- A D Wilkins
- Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, Texas 77030, USA
| | | | | | | | | |
Collapse
|
12
|
Bandyopadhyay D, Huan J, Liu J, Prins J, Snoeyink J, Wang W, Tropsha A. Functional neighbors: inferring relationships between nonhomologous protein families using family-specific packing motifs. ACTA ACUST UNITED AC 2010; 14:1137-43. [PMID: 20570776 DOI: 10.1109/titb.2010.2053550] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.5] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/06/2022]
Abstract
We describe a new approach for inferring the functional relationships between nonhomologous protein families by looking at statistical enrichment of alternative function predictions in classification hierarchies such as Gene Ontology (GO) and Structural Classification of Proteins (SCOP). Protein structures are represented by robust graph representations, and the fast frequent subgraph mining algorithm is applied to protein families to generate sets of family-specific packing motifs, i.e., amino acid residue-packing patterns shared by most family members but infrequent in other proteins. The function of a protein is inferred by identifying in it motifs characteristic of a known family. We employ these family-specific motifs to elucidate functional relationships between families in the GO and SCOP hierarchies. Specifically, we postulate that two families are functionally related if one family is statistically enriched by motifs characteristic of another family, i.e., if the number of proteins in a family containing a motif from another family is greater than expected by chance. This function-inference method can help annotate proteins of unknown function, establish functional neighbors of existing families, and help specify alternate functions for known proteins.
Collapse
Affiliation(s)
- Deepak Bandyopadhyay
- Department of Computational and Structural Chemistry, GlaxoSmithKline, Collegeville, PA UP12-210, USA.
| | | | | | | | | | | | | |
Collapse
|
13
|
Zhang ZH, Bharatham K, Sherman WA, Mihalek I. deconSTRUCT: general purpose protein database search on the substructure level. Nucleic Acids Res 2010; 38:W590-4. [PMID: 20522512 PMCID: PMC2896154 DOI: 10.1093/nar/gkq489] [Citation(s) in RCA: 13] [Impact Index Per Article: 0.9] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/17/2022] Open
Abstract
deconSTRUCT webserver offers an interface to a protein database search engine, usable for a general purpose detection of similar protein (sub)structures. Initially, it deconstructs the query structure into its secondary structure elements (SSEs) and reassembles the match to the target by requiring a (tunable) degree of similarity in the direction and sequential order of SSEs. Hierarchical organization and judicious use of the information about protein structure enables deconSTRUCT to achieve the sensitivity and specificity of the established search engines at orders of magnitude increased speed, without tying up irretrievably the substructure information in the form of a hash. In a post-processing step, a match on the level of the backbone atoms is constructed. The results presented to the user consist of the list of the matched SSEs, the transformation matrix for rigid superposition of the structures and several ways of visualization, both downloadable and implemented as a web-browser plug-in. The server is available at http://epsf.bmad.bii.a-star.edu.sg/struct_server.html.
Collapse
Affiliation(s)
- Zong Hong Zhang
- Bioinformatics Institute 30 Biopolis Street, #07-01 Matrix, Singapore
| | | | | | | |
Collapse
|
14
|
Abstract
Motivation: Rapid methods for protein structure search enable biological discoveries based on flexibly defined structural similarity, unleashing the power of the ever greater number of solved protein structures. Projection methods show promise for the development of fast structural database search solutions. Projection methods map a structure to a point in a high-dimensional space and compare two structures by measuring distance between their projected points. These methods offer a tremendous increase in speed over residue-level structural alignment methods. However, current projection methods are not practical, partly because they are unable to identify local similarities. Results: We propose a new projection-based approach that can rapidly detect global as well as local structural similarities. Local structural search is enabled by a topology-inspired writhe decomposition protocol that produces a small number of fragments while ensuring that similar structures are cut in a similar manner. In benchmark tests, we show that our method, writher, improves accuracy over existing projection methods in terms of recognizing scop domains out of multi-domain proteins, while maintaining accuracy comparable with existing projection methods in a standard single-domain benchmark test. Availability: The source code is available at the following website: http://compbio.berkeley.edu/proj/writher/ Contact:dzhi@compbio.berkeley.edu Supplementary information:Supplementary data are available at Bioinformatics online.
Collapse
Affiliation(s)
- Degui Zhi
- Department of Plant and Microbial Biology, UC Berkeley and Physical Biosciences Division, LBNL, Berkeley, CA 94720, USA.
| | | | | |
Collapse
|
15
|
Zhang ZH, Lee HK, Mihalek I. Reduced representation of protein structure: implications on efficiency and scope of detection of structural similarity. BMC Bioinformatics 2010; 11:155. [PMID: 20338066 PMCID: PMC3098053 DOI: 10.1186/1471-2105-11-155] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.6] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/27/2009] [Accepted: 03/26/2010] [Indexed: 11/10/2022] Open
Abstract
Background Computational comparison of two protein structures is the starting point of many methods that build on existing knowledge, such as structure modeling (including modeling of protein complexes and conformational changes), molecular replacement, or annotation by structural similarity. In a commonly used strategy, significant effort is invested in matching two sets of atoms. In a complementary approach, a global descriptor is assigned to the overall structure, thus losing track of the substructures within. Results Using a small set of geometric features, we define a reduced representation of protein structure, together with an optimizing function for matching two representations, to provide a pre-filtering stage in a database search. We show that, in a straightforward implementation, the representation performs well in terms of resolution in the space of protein structures, and its ability to make new predictions. Conclusions Perhaps unexpectedly, a substantial discriminating power already exists at the level of main features of protein structure, such as directions of secondary structural elements, possibly constrained by their sequential order. This can be used toward efficient comparison of protein (sub)structures, allowing for various degrees of conformational flexibility within the compared pair, which in turn can be used for modeling by homology of protein structure and dynamics.
Collapse
Affiliation(s)
- Zong Hong Zhang
- Bioinformatics Institute, A*STAR, 30 Biopolis Street, #07-01 Matrix, Singapore 138671
| | | | | |
Collapse
|
16
|
Brylinski M, Skolnick J. Comparison of structure-based and threading-based approaches to protein functional annotation. Proteins 2010; 78:118-34. [PMID: 19731377 PMCID: PMC2804779 DOI: 10.1002/prot.22566] [Citation(s) in RCA: 24] [Impact Index Per Article: 1.7] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/11/2022]
Abstract
To exploit the vast amount of sequence information provided by the Genomic revolution, the biological function of these sequences must be identified. As a practical matter, this is often accomplished by functional inference. Purely sequence-based approaches, particularly in the "twilight zone" of low sequence similarity levels, are complicated by many factors. For proteins, structure-based techniques aim to overcome these problems; however, most require high-quality crystal structures and suffer from complex and equivocal relations between protein fold and function. In this study, in extensive benchmarking, we consider a number of aspects of structure-based functional annotation: binding pocket detection, molecular function assignment and ligand-based virtual screening. We demonstrate that protein threading driven by a strong sequence profile component greatly improves the quality of purely structure-based functional annotation in the "twilight zone." By detecting evolutionarily related proteins, it considerably reduces the high false positive rate of function inference derived on the basis of global structure similarity alone. Combined evolution/structure-based function assignment emerges as a powerful technique that can make a significant contribution to comprehensive proteome annotation.
Collapse
Affiliation(s)
- Michal Brylinski
- Center for the Study of Systems Biology School of Biology, Georgia Institute of Technology, 250 14th Street NW, Atlanta, GA 30318
| | - Jeffrey Skolnick
- Center for the Study of Systems Biology School of Biology, Georgia Institute of Technology, 250 14th Street NW, Atlanta, GA 30318
| |
Collapse
|
17
|
Abstract
Protein structures often show similarities to another which would not be seen at the sequence level. Given the coordinates of a protein chain, the SALAMI server atwww.zbh.uni-hamburg.de/salami will search the protein data bank and return a set of similar structures without using sequence information. The results page lists the related proteins, details of the sequence and structure similarity and implied sequence alignments. Via a simple structure viewer, one can view superpositions of query and library structures and finally download superimposed coordinates. The alignment method is very tolerant of large gaps and insertions, and tends to produce slightly longer alignments than other similar programs.
Collapse
Affiliation(s)
- Thomas Margraf
- Centre for Bioinformatics, University of Hamburg, Bundesstr. 43, 20146 Hamburg, Germany.
| | | | | |
Collapse
|
18
|
da Silveira CH, Pires DEV, Minardi RC, Ribeiro C, Veloso CJM, Lopes JCD, Meira W, Neshich G, Ramos CHI, Habesch R, Santoro MM. Protein cutoff scanning: A comparative analysis of cutoff dependent and cutoff free methods for prospecting contacts in proteins. Proteins 2009; 74:727-43. [PMID: 18704933 DOI: 10.1002/prot.22187] [Citation(s) in RCA: 59] [Impact Index Per Article: 3.9] [Reference Citation Analysis] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/11/2022]
Affiliation(s)
- Carlos H da Silveira
- Department of Biochemistry and Immunology, Institute of Biological Sciences, Federal University of Minas Gerais, UFMG, Brazil.
| | | | | | | | | | | | | | | | | | | | | |
Collapse
|
19
|
Pandit SB, Skolnick J. Fr-TM-align: a new protein structural alignment method based on fragment alignments and the TM-score. BMC Bioinformatics 2008; 9:531. [PMID: 19077267 PMCID: PMC2628391 DOI: 10.1186/1471-2105-9-531] [Citation(s) in RCA: 106] [Impact Index Per Article: 6.6] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/16/2008] [Accepted: 12/12/2008] [Indexed: 11/10/2022] Open
Abstract
BACKGROUND Protein tertiary structure comparisons are employed in various fields of contemporary structural biology. Most structure comparison methods involve generation of an initial seed alignment, which is extended and/or refined to provide the best structural superposition between a pair of protein structures as assessed by a structure comparison metric. One such metric, the TM-score, was recently introduced to provide a combined structure quality measure of the coordinate root mean square deviation between a pair of structures and coverage. Using the TM-score, the TM-align structure alignment algorithm was developed that was often found to have better accuracy and coverage than the most commonly used structural alignment programs; however, there were a number of situations when this was not true. RESULTS To further improve structure alignment quality, the Fr-TM-align algorithm has been developed where aligned fragment pairs are used to generate the initial seed alignments that are then refined using dynamic programming to maximize the TM-score. For the assessment of the structural alignment quality from Fr-TM-align in comparison to other programs such as CE and TM-align, we examined various alignment quality assessment scores such as PSI and TM-score. The assessment showed that the structural alignment quality from Fr-TM-align is better in comparison to both CE and TM-align. On average, the structural alignments generated using Fr-TM-align have a higher TM-score (~9%) and coverage (~7%) in comparison to those generated by TM-align. Fr-TM-align uses an exhaustive procedure to generate initial seed alignments. Hence, the algorithm is computationally more expensive than TM-align. CONCLUSION Fr-TM-align, a new algorithm that employs fragment alignment and assembly provides better structural alignments in comparison to TM-align. The source code and executables of Fr-TM-align are freely downloadable at: http://cssb.biology.gatech.edu/skolnick/files/FrTMalign/.
Collapse
Affiliation(s)
- Shashi Bhushan Pandit
- Center for the Study of Systems Biology, School of Biology, Georgia Institute of Technology, Atlanta, USA.
| | | |
Collapse
|
20
|
Lisewski AM. Random amino acid mutations and protein misfolding lead to Shannon limit in sequence-structure communication. PLoS One 2008; 3:e3110. [PMID: 18769673 PMCID: PMC2518838 DOI: 10.1371/journal.pone.0003110] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.2] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/20/2008] [Accepted: 07/28/2008] [Indexed: 11/18/2022] Open
Abstract
The transmission of genomic information from coding sequence to protein structure during protein synthesis is subject to stochastic errors. To analyze transmission limits in the presence of spurious errors, Shannon's noisy channel theorem is applied to a communication channel between amino acid sequences and their structures established from a large-scale statistical analysis of protein atomic coordinates. While Shannon's theorem confirms that in close to native conformations information is transmitted with limited error probability, additional random errors in sequence (amino acid substitutions) and in structure (structural defects) trigger a decrease in communication capacity toward a Shannon limit at 0.010 bits per amino acid symbol at which communication breaks down. In several controls, simulated error rates above a critical threshold and models of unfolded structures always produce capacities below this limiting value. Thus an essential biological system can be realistically modeled as a digital communication channel that is (a) sensitive to random errors and (b) restricted by a Shannon error limit. This forms a novel basis for predictions consistent with observed rates of defective ribosomal products during protein synthesis, and with the estimated excess of mutual information in protein contact potentials.
Collapse
Affiliation(s)
- Andreas Martin Lisewski
- Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, Texas, United States of America.
| |
Collapse
|
21
|
Ward RM, Erdin S, Tran TA, Kristensen DM, Lisewski AM, Lichtarge O. De-orphaning the structural proteome through reciprocal comparison of evolutionarily important structural features. PLoS One 2008; 3:e2136. [PMID: 18461181 PMCID: PMC2362850 DOI: 10.1371/journal.pone.0002136] [Citation(s) in RCA: 21] [Impact Index Per Article: 1.3] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/13/2008] [Accepted: 03/25/2008] [Indexed: 12/01/2022] Open
Abstract
Function prediction frequently relies on comparing genes or gene products to search for relevant similarities. Because the number of protein structures with unknown function is mushrooming, however, we asked here whether such comparisons could be improved by focusing narrowly on the key functional features of protein structures, as defined by the Evolutionary Trace (ET). Therefore a series of algorithms was built to (a) extract local motifs (3D templates) from protein structures based on ET ranking of residue importance; (b) to assess their geometric and evolutionary similarity to other structures; and (c) to transfer enzyme annotation whenever a plurality was reached across matches. Whereas a prototype had only been 80% accurate and was not scalable, here a speedy new matching algorithm enabled large-scale searches for reciprocal matches and thus raised annotation specificity to 100% in both positive and negative controls of 49 enzymes and 50 non-enzymes, respectively—in one case even identifying an annotation error—while maintaining sensitivity (∼60%). Critically, this Evolutionary Trace Annotation (ETA) pipeline requires no prior knowledge of functional mechanisms. It could thus be applied in a large-scale retrospective study of 1218 structural genomics enzymes and reached 92% accuracy. Likewise, it was applied to all 2935 unannotated structural genomics proteins and predicted enzymatic functions in 320 cases: 258 on first pass and 62 more on second pass. Controls and initial analyses suggest that these predictions are reliable. Thus the large-scale evolutionary integration of sequence-structure-function data, here through reciprocal identification of local, functionally important structural features, may contribute significantly to de-orphaning the structural proteome.
Collapse
Affiliation(s)
- R. Matthew Ward
- Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, Texas, United States of America
- Graduate Program in Structural and Computational Biology and Molecular Biophysics, Baylor College of Medicine, Houston, Texas, United States of America
| | - Serkan Erdin
- Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, Texas, United States of America
| | - Tuan A. Tran
- Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, Texas, United States of America
| | - David M. Kristensen
- Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, Texas, United States of America
- Graduate Program in Structural and Computational Biology and Molecular Biophysics, Baylor College of Medicine, Houston, Texas, United States of America
| | - Andreas Martin Lisewski
- Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, Texas, United States of America
| | - Olivier Lichtarge
- Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, Texas, United States of America
- Graduate Program in Structural and Computational Biology and Molecular Biophysics, Baylor College of Medicine, Houston, Texas, United States of America
- * E-mail:
| |
Collapse
|
22
|
Schenk G, Margraf T, Torda AE. Protein sequence and structure alignments within one framework. Algorithms Mol Biol 2008; 3:4. [PMID: 18380904 PMCID: PMC2390564 DOI: 10.1186/1748-7188-3-4] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.6] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/29/2008] [Accepted: 04/01/2008] [Indexed: 11/19/2022] Open
Abstract
Background Protein structure alignments are usually based on very different techniques to sequence alignments. We propose a method which treats sequence, structure and even combined sequence + structure in a single framework. Using a probabilistic approach, we calculate a similarity measure which can be applied to fragments containing only protein sequence, structure or both simultaneously. Results Proof-of-concept results are given for the different problems. For sequence alignments, the methodology is no better than conventional methods. For structure alignments, the techniques are very fast, reliable and tolerant of a range of alignment parameters. Combined sequence and structure alignments may provide a more reliable alignment for pairs of proteins where pure structural alignments can be misled by repetitive elements or apparent symmetries. Conclusion The probabilistic framework has an elegance in principle, merging sequence and structure descriptors into a single framework. It has a practical use in fast structural alignments and a potential use in finding those examples where sequence and structural similarities apparently disagree.
Collapse
|
23
|
Kristensen DM, Ward RM, Lisewski AM, Erdin S, Chen BY, Fofanov VY, Kimmel M, Kavraki LE, Lichtarge O. Prediction of enzyme function based on 3D templates of evolutionarily important amino acids. BMC Bioinformatics 2008; 9:17. [PMID: 18190718 PMCID: PMC2219985 DOI: 10.1186/1471-2105-9-17] [Citation(s) in RCA: 62] [Impact Index Per Article: 3.9] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/11/2007] [Accepted: 01/11/2008] [Indexed: 11/10/2022] Open
Abstract
BACKGROUND Structural genomics projects such as the Protein Structure Initiative (PSI) yield many new structures, but often these have no known molecular functions. One approach to recover this information is to use 3D templates - structure-function motifs that consist of a few functionally critical amino acids and may suggest functional similarity when geometrically matched to other structures. Since experimentally determined functional sites are not common enough to define 3D templates on a large scale, this work tests a computational strategy to select relevant residues for 3D templates. RESULTS Based on evolutionary information and heuristics, an Evolutionary Trace Annotation (ETA) pipeline built templates for 98 enzymes, half taken from the PSI, and sought matches in a non-redundant structure database. On average each template matched 2.7 distinct proteins, of which 2.0 share the first three Enzyme Commission digits as the template's enzyme of origin. In many cases (61%) a single most likely function could be predicted as the annotation with the most matches, and in these cases such a plurality vote identified the correct function with 87% accuracy. ETA was also found to be complementary to sequence homology-based annotations. When matches are required to both geometrically match the 3D template and to be sequence homologs found by BLAST or PSI-BLAST, the annotation accuracy is greater than either method alone, especially in the region of lower sequence identity where homology-based annotations are least reliable. CONCLUSION These data suggest that knowledge of evolutionarily important residues improves functional annotation among distant enzyme homologs. Since, unlike other 3D template approaches, the ETA method bypasses the need for experimental knowledge of the catalytic mechanism, it should prove a useful, large scale, and general adjunct to combine with other methods to decipher protein function in the structural proteome.
Collapse
Affiliation(s)
- David M Kristensen
- Department of Molecular and Human Genetics, Biophysics, Baylor College of Medicine, Houston, TX 77030, USA.
| | | | | | | | | | | | | | | | | |
Collapse
|
24
|
Lee D, Redfern O, Orengo C. Predicting protein function from sequence and structure. Nat Rev Mol Cell Biol 2007; 8:995-1005. [PMID: 18037900 DOI: 10.1038/nrm2281] [Citation(s) in RCA: 354] [Impact Index Per Article: 20.8] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/09/2022]
|
25
|
Shin H, Lisewski AM, Lichtarge O. Graph sharpening plus graph integration: a synergy that improves protein functional classification. ACTA ACUST UNITED AC 2007; 23:3217-24. [PMID: 17977886 DOI: 10.1093/bioinformatics/btm511] [Citation(s) in RCA: 47] [Impact Index Per Article: 2.8] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/14/2022]
Abstract
MOTIVATION Predicting protein function is a central problem in bioinformatics, and many approaches use partially or fully automated methods based on various combination of sequence, structure and other information on proteins or genes. Such information establishes relationships between proteins that can be modelled most naturally as edges in graphs. A priori, however, it is often unclear which edges from which graph may contribute most to accurate predictions. For that reason, one established strategy is to integrate all available sources, or graphs as in graph integration, in the hope that the positive signals will add to each other. However, in the problem of functional prediction, noise, i.e. the presence of inaccurate or false edges, can still be large enough that integration alone has little effect on prediction accuracy. In order to reduce noise levels and to improve integration efficiency, we present here a recent method in graph-based learning, graph sharpening, which provides a theoretically firm yet intuitive and practical approach for disconnecting undesirable edges from protein similarity graphs. This approach has several attractive features: it is quick, scalable in the number of proteins, robust with respect to errors and tolerant of very diverse types of protein similarity measures. RESULTS We tested the classification accuracy in a test set of 599 proteins with remote sequence homology spread over 20 Gene Ontology (GO) functional classes. When compared to integration alone, graph sharpening plus integration of four vastly different molecular similarity measures improved the overall classification by nearly 30% [0.17 average increase in the area under the ROC curve (AUC)]. Moreover, and partially through the increased sparsity of the graphs induced by sharpening, this gain in accuracy came at negligible computational cost: sharpening and integration took on average 4.66 (+/-4.44) CPU seconds. AVAILABILITY Software and Supplementary data will be available on http://mammoth.bcm.tmc.edu/
Collapse
Affiliation(s)
- Hyunjung Shin
- Department of Industrial & Information Systems Engineering, Ajou University, San 5, Wonchun-dong, Yeoungtong-gu, 443-749, Suwon, Korea.
| | | | | |
Collapse
|