1
Sarkar T, Chen Y, Wang Y, Chen Y, Chen F, Reaux CR, Moore LE, Raghavan V, Xu W. Introducing mirror-image discrimination capability to the TSR-based method for capturing stereo geometry and understanding hierarchical structure relationships of protein receptor family. Comput Biol Chem 2023; 103:107824. [PMID: 36753783 PMCID: PMC9992349 DOI: 10.1016/j.compbiolchem.2023.107824] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/09/2022] [Revised: 01/17/2023] [Accepted: 01/30/2023] [Indexed: 02/05/2023]
Abstract
We have developed a Triangular Spatial Relationship (TSR)-based computational method for protein structure comparison and motif discovery that is both sequence and structure alignment-free. A protein 3D structure is modeled by all possible triangles that are constructed with every three Cα atoms of amino acids as vertices. Every triangle is represented using an integer (a key). The keys are calculated by a rule-based formula which is a function of a representative length, a representative angle, and the vertex labels associated with amino acids. A 3D structure is thereby represented by a vector of integers (TSR keys). Global or local structure comparisons are achieved by computing all keys or a set of keys, respectively. Many enzymatic reactions and notable marketed drugs are highly stereospecific. Thus, in this paper, we propose a modified key calculation formula that includes a mechanism for discriminating mirror-image keys to capture stereo geometry. We assign a positive or a negative sign to the integers representing mirror-image keys. Applying the new key calculation function provides the ability to further discriminate mirror-image keys that were previously considered identical. As a result, applying the mirror-image discrimination capability (i) significantly increases the number of distinct keys; (ii) decreases the number of common keys; (iii) decreases structural similarity; and (iv) increases the opportunity to identify specific keys for each type of receptor. The specific keys identified in this study, both without and with mirror-image discrimination, can be considered structure signatures that belong exclusively to a certain type of receptor. Applying mirror-image discrimination introduces stereospecificity to the keys, allowing more precise modeling of ligand-target interactions. The development of mirror-image TSR keys of Cα atoms, in conjunction with the integration of Cα TSR keys with all-atom TSR keys for amino acids and drugs, will lead to a new and promising computational method for aiding drug design and discovery.
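The signed-key idea in this abstract can be illustrated with a minimal Python sketch. The abstract does not give the key formula, the bin counts, or the rule that decides which of two mirror-image triangles is which, so everything below is an assumption made for illustration: the mixed-radix packing, the constants `N_LABELS`, `N_LENGTH_BINS` and `N_ANGLE_BINS`, and the caller-supplied `mirrored` flag are hypothetical and are not the authors' formula.

```python
# Hypothetical sizes; the abstract does not state the real ones.
N_LABELS = 20        # one integer label (0..19) per amino acid type
N_LENGTH_BINS = 35   # bins for the representative length
N_ANGLE_BINS = 29    # bins for the representative angle


def tsr_key(labels, length_bin, angle_bin, mirrored=False, discriminate=False):
    """Pack one triangle into a non-zero integer key (illustrative only).

    labels       -- three vertex labels, each in range(N_LABELS)
    length_bin   -- index in range(N_LENGTH_BINS)
    angle_bin    -- index in range(N_ANGLE_BINS)
    mirrored     -- caller-supplied flag: is this the mirror-image form?
    discriminate -- if True, mirror-image triangles get a negative key
    """
    # Sort labels so the key does not depend on vertex order (assumption).
    l1, l2, l3 = sorted(labels, reverse=True)
    key = (l1 * N_LABELS + l2) * N_LABELS + l3
    key = (key * N_LENGTH_BINS + length_bin) * N_ANGLE_BINS + angle_bin
    key += 1  # keep the key strictly positive so its sign carries information
    return -key if (discriminate and mirrored) else key


triangle = ((3, 7, 12), 4, 10)  # labels, length bin, angle bin
without = {tsr_key(*triangle, mirrored=m) for m in (False, True)}
with_sign = {tsr_key(*triangle, mirrored=m, discriminate=True) for m in (False, True)}
print(len(without), len(with_sign))  # 1 2
```

In this toy case the two mirror-image forms share one key without discrimination and receive two distinct keys with it, which is the direction of effects (i) and (ii) reported in the abstract.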
Affiliation(s)
- Titli Sarkar: Department of Chemistry, University of Louisiana at Lafayette, P.O. Box 44370, Lafayette, LA 70504, USA; The Center for Advanced Computer Studies, University of Louisiana at Lafayette, Lafayette, LA 70504, USA
- Yuwu Chen: San Diego Supercomputer Center, University of California San Diego, Gilman Drive, La Jolla, CA 92093, USA
- Yu Wang: Department of Chemistry, University of Louisiana at Lafayette, P.O. Box 44370, Lafayette, LA 70504, USA
- Yixin Chen: Department of Computer and Information Science, The University of Mississippi, MS 38677, USA
- Feng Chen: High Performance Computing, Frey Computing Services Center, Louisiana State University, Baton Rouge, LA 70803, USA
- Camille R Reaux: Department of Chemistry, University of Louisiana at Lafayette, P.O. Box 44370, Lafayette, LA 70504, USA
- Laura E Moore: Department of Chemistry, University of Louisiana at Lafayette, P.O. Box 44370, Lafayette, LA 70504, USA
- Vijay Raghavan: The Center for Advanced Computer Studies, University of Louisiana at Lafayette, Lafayette, LA 70504, USA
- Wu Xu: Department of Chemistry, University of Louisiana at Lafayette, P.O. Box 44370, Lafayette, LA 70504, USA
2
Kondra S, Sarkar T, Raghavan V, Xu W. Development of a TSR-Based Method for Protein 3-D Structural Comparison With Its Applications to Protein Classification and Motif Discovery. Front Chem 2021; 8:602291. [PMID: 33520934 PMCID: PMC7838567 DOI: 10.3389/fchem.2020.602291] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 09/16/2020] [Accepted: 12/14/2020] [Indexed: 11/24/2022] Open
Abstract
Development of protein 3-D structural comparison methods is important in understanding protein functions. At the same time, developing such a method is very challenging. In the last 40 years, ever since the development of the first automated structural method, ~200 papers have been published using different representations of structures. The existing methods can be divided into five categories: sequence-, distance-, secondary structure-, geometry-, and network-based structural comparisons. Each has its uniqueness, but also limitations. We have developed a novel method where the 3-D structure of a protein is modeled using the concept of Triangular Spatial Relationship (TSR), where triangles are constructed with the Cα atoms of a protein as vertices. Every triangle is represented using an integer, which we denote as a “key.” A key is computed using the length, angle, and vertex labels based on a rule-based formula, which ensures assignment of the same key to identical TSRs across proteins. A structure is thereby represented by a vector of integers. Our method is able to accurately quantify similarity of structure or substructure by matching numbers of identical keys between two proteins. The uniqueness of our method includes: (i) a unique way to represent structures to avoid performing structural superimposition; (ii) use of triangles to represent substructures, as the triangle is the simplest primitive to capture shape; (iii) complex structure comparison achieved by matching integers corresponding to multiple TSRs. Every substructure of one protein is compared to every other substructure in a different protein. The method is used in the studies of proteases and kinases because they play essential roles in cell signaling, and a majority of these constitute drug targets. The new motifs or substructures we identified specifically for proteases and kinases provide a deeper insight into their structural relations.
Furthermore, the method provides a unique way to study protein conformational changes. In addition, the results from CATH and SCOP data sets clearly demonstrate that our method can distinguish alpha helices from beta pleated sheets and vice versa. Our method has the potential to be developed into a powerful tool for efficient structure-BLAST search and comparison, just as BLAST is for sequence search and alignment.
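Comparing two structures by "matching numbers of identical keys" can be sketched as a multiset comparison. The abstract does not say how the match count is normalized, so the generalized Jaccard coefficient used here, and the function name `key_similarity`, are assumptions chosen for illustration rather than the authors' similarity measure.

```python
from collections import Counter


def key_similarity(keys_a, keys_b):
    """Generalized Jaccard similarity between two key vectors (illustrative).

    Each argument is an iterable of integer keys; a key may occur more than
    once because different triangles can map to the same key.
    """
    count_a, count_b = Counter(keys_a), Counter(keys_b)
    common = sum((count_a & count_b).values())  # per-key minimum counts
    total = sum((count_a | count_b).values())   # per-key maximum counts
    # Two empty key vectors are treated as having no measurable similarity.
    return common / total if total else 0.0


print(key_similarity([1, 1, 2], [1, 2, 3]))  # 0.5
```

No superposition or residue alignment is involved: the comparison reduces to counting shared integers, which is the property the abstract emphasizes.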
Affiliation(s)
- Sarika Kondra: The Center for Advanced Computer Studies, University of Louisiana at Lafayette, Lafayette, LA, United States
- Titli Sarkar: The Center for Advanced Computer Studies, University of Louisiana at Lafayette, Lafayette, LA, United States
- Vijay Raghavan: The Center for Advanced Computer Studies, University of Louisiana at Lafayette, Lafayette, LA, United States
- Wu Xu: Department of Chemistry, University of Louisiana at Lafayette, Lafayette, LA, United States
3
Brinkjost T, Ehrt C, Koch O, Mutzel P. SCOT: Rethinking the classification of secondary structure elements. Bioinformatics 2020; 36:2417-2428. [PMID: 31742326 DOI: 10.1093/bioinformatics/btz826] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.3] [Received: 05/16/2019] [Revised: 10/02/2019] [Accepted: 11/16/2019] [Indexed: 11/13/2022] Open
Abstract
Motivation: Secondary structure classification is one of the most important issues in structure-based analyses due to its impact on secondary structure prediction, structural alignment and protein visualization. There are still open challenges concerning helix and sheet assignments which are currently not addressed by a single multi-purpose software.
Results: We introduce SCOT (Secondary structure Classification On Turns) as a novel secondary structure element assignment software which supports the assignment of turns, right-handed α-, 3₁₀- and π-helices, left-handed α- and 3₁₀-helices, 2.2₇- and polyproline II helices, β-sheets and kinks. We demonstrate that the introduction of helix Purity values enables a clear differentiation between helix classes. SCOT's unique strengths are highlighted by comparing it to six state-of-the-art methods (DSSP, STRIDE, ASSP, SEGNO, DISICL and SHAFT). The assignment approaches were compared concerning geometric consistency, protein structure quality and flexibility dependency and their impact on secondary structure element-based structural alignments. We show that only SCOT's combination of hydrogen bonds, geometric criteria and dihedral angles enables robust assignments independent of the structure quality and flexibility. We demonstrate that this combination and the elaborate kink detection lead to SCOT's clear superiority for protein alignments. As the resulting helices and strands are provided in a PDB conform output format, they can immediately be used for structure alignment algorithms. Taken together, the application of our new method and the straightforward visualization using the accompanying PyMOL scripts enable the comprehensive analysis of regular backbone geometries in proteins.
Availability and implementation: https://this-group.rocks.
Supplementary information: Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Tobias Brinkjost: Department of Computer Science; Faculty of Chemistry and Chemical Biology, TU Dortmund University, Dortmund 44227, Germany
- Christiane Ehrt: Department of Computer Science; Faculty of Chemistry and Chemical Biology, TU Dortmund University, Dortmund 44227, Germany
- Oliver Koch: Faculty of Chemistry and Chemical Biology, TU Dortmund University, Dortmund 44227, Germany
4
High-throughput and scalable protein function identification with Hadoop and Map-only pattern of the MapReduce processing model. Knowl Inf Syst 2018. [DOI: 10.1007/s10115-018-1245-3] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Indexed: 10/28/2022]
5
HDInsight4PSi: Boosting performance of 3D protein structure similarity searching with HDInsight clusters in Microsoft Azure cloud. Inf Sci (N Y) 2016. [DOI: 10.1016/j.ins.2016.02.029] [Citation(s) in RCA: 29] [Impact Index Per Article: 3.6] [Indexed: 11/18/2022]
6
7
Mrozek D, Brożek M, Małysiak-Mrozek B. Parallel implementation of 3D protein structure similarity searches using a GPU and the CUDA. J Mol Model 2014; 20:2067. [PMID: 24481593 PMCID: PMC3936136 DOI: 10.1007/s00894-014-2067-1] [Citation(s) in RCA: 32] [Impact Index Per Article: 3.2] [Received: 08/07/2013] [Accepted: 10/11/2013] [Indexed: 01/16/2023]
Abstract
Searching for similar 3D protein structures is one of the primary processes employed in the field of structural bioinformatics. However, the computational complexity of this process means that it is constantly necessary to search for new methods that can perform such a process faster and more efficiently. Finding molecular substructures that complex protein structures have in common is still a challenging task, especially when entire databases containing tens or even hundreds of thousands of protein structures must be scanned. Graphics processing units (GPUs) and general purpose graphics processing units (GPGPUs) can perform many time-consuming and computationally demanding processes much more quickly than a classical CPU can. In this paper, we describe the GPU-based implementation of the CASSERT algorithm for 3D protein structure similarity searching. This algorithm is based on the two-phase alignment of protein structures when matching fragments of the compared proteins. The GPU (GeForce GTX 560Ti: 384 cores, 2GB RAM) implementation of CASSERT (“GPU-CASSERT”) parallelizes both alignment phases and yields an average 180-fold increase in speed over its CPU-based, single-core implementation on an Intel Xeon E5620 (2.40GHz, 4 cores). In this paper, we show that massive parallelization of the 3D structure similarity search process on many-core GPU devices can reduce the execution time of the process, allowing it to be performed in real time. GPU-CASSERT is available at: http://zti.polsl.pl/dmrozek/science/gpucassert/cassert.htm.
Affiliation(s)
- Dariusz Mrozek: Institute of Informatics, Silesian University of Technology, Gliwice, Poland
8
Ma J, Wang S. Algorithms, Applications, and Challenges of Protein Structure Alignment. Advances in Protein Chemistry and Structural Biology 2014; 94:121-75. [DOI: 10.1016/b978-0-12-800168-4.00005-6] [Citation(s) in RCA: 31] [Impact Index Per Article: 3.1] [Indexed: 12/29/2022]
9
Protein structure alignment beyond spatial proximity. Sci Rep 2013; 3:1448. [PMID: 23486213 PMCID: PMC3596798 DOI: 10.1038/srep01448] [Citation(s) in RCA: 88] [Impact Index Per Article: 8.0] [Received: 11/15/2012] [Accepted: 02/25/2013] [Indexed: 11/08/2022] Open
Abstract
Protein structure alignment is a fundamental problem in computational structure biology. Many programs have been developed for automatic protein structure alignment, but most of them align two protein structures purely based upon geometric similarity without considering evolutionary and functional relationship. As such, these programs may generate structure alignments which are not very biologically meaningful from the evolutionary perspective. This paper presents a novel method DeepAlign for automatic pairwise protein structure alignment. DeepAlign aligns two protein structures using not only spatial proximity of equivalent residues (after rigid-body superposition), but also evolutionary relationship and hydrogen-bonding similarity. Experimental results show that DeepAlign can generate structure alignments much more consistent with manually-curated alignments than other automatic tools especially when proteins under consideration are remote homologs. These results imply that in addition to geometric similarity, evolutionary information and hydrogen-bonding similarity are essential to aligning two protein structures.
10
Abstract
Motivation: To recognize remote relationships between RNA molecules, one must be able to align structures without regard to sequence similarity. We have implemented a method which is swift [O(n²)], sensitive and tolerant of large gaps and insertions. Molecules are broken into overlapping fragments, which are characterized by their memberships in a probabilistic classification based on local geometry and H-bonding descriptors. This leads to a probabilistic similarity measure that is used in a conventional dynamic programming method.
Results: Examples are given of database searching, the detection of structural similarities that would not be found using sequence-based methods, and comparisons with a previously published approach.
Availability and implementation: Source code (C and Perl) and binaries for Linux are freely available at www.zbh.uni-hamburg.de/fries.
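The quadratic cost quoted in this abstract is what a conventional dynamic programming alignment over a precomputed fragment-similarity matrix would give. The sketch below is a generic global alignment score with a linear gap penalty, shown only to make the O(n²) table fill concrete. The function name `align_score`, the gap value, and the toy similarity matrix are assumptions, and the probabilistic fragment classification that would produce the real similarities is not reproduced here.

```python
def align_score(sim, gap=-1.0):
    """Best global alignment score for a similarity matrix (illustrative).

    sim[i][j] is the similarity between fragment i of the first molecule and
    fragment j of the second. Every cell of an (n+1) x (m+1) table is filled
    exactly once, so the running time is proportional to n * m.
    """
    n = len(sim)
    m = len(sim[0]) if n else 0
    prev = [j * gap for j in range(m + 1)]      # row 0: leading gaps only
    for i in range(1, n + 1):
        cur = [i * gap] + [0.0] * m             # column 0: leading gaps only
        for j in range(1, m + 1):
            cur[j] = max(
                prev[j - 1] + sim[i - 1][j - 1],  # pair fragment i with j
                prev[j] + gap,                    # skip a fragment of molecule 1
                cur[j - 1] + gap,                 # skip a fragment of molecule 2
            )
        prev = cur
    return prev[m]


toy = [[1.0, 0.0, 0.0],
       [0.0, 0.0, 1.0]]
print(align_score(toy, gap=-0.5))  # 1.5
```

Only two rows of the table are kept in memory because the sketch returns the score alone; recovering the alignment itself would need the full table or a traceback structure.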
Affiliation(s)
- Tim Wiegels: Centre for Bioinformatics, University of Hamburg, Bundesstr. 43, D-20146 Hamburg, Germany
11
Maezato Y, Daugherty A, Dana K, Soo E, Cooper C, Tachdjian S, Kelly RM, Blum P. VapC6, a ribonucleolytic toxin regulates thermophilicity in the crenarchaeote Sulfolobus solfataricus. RNA (New York, N.Y.) 2011; 17:1381-1392. [PMID: 21622901 PMCID: PMC3138573 DOI: 10.1261/rna.2679911] [Citation(s) in RCA: 23] [Impact Index Per Article: 1.8] [Received: 02/17/2011] [Accepted: 04/15/2011] [Indexed: 05/30/2023]
Abstract
The phylum Crenarchaeota includes hyperthermophilic micro-organisms subjected to dynamic thermal conditions. Previous transcriptomic studies of Sulfolobus solfataricus identified vapBC6 as a heat-shock (HS)-inducible member of the Vap toxin-antitoxin gene family. In this study, the inactivation of the vapBC6 operon by targeted gene disruption produced two recessive phenotypes related to fitness, HS sensitivity and a heat-dependent reduction in the rate of growth. In-frame vapBC6 deletion mutants were analyzed to examine the respective roles of each protein. Since vapB6 transcript abundance was elevated in the vapC6 deletion, the VapC6 toxin appears to regulate abundance of its cognate antitoxin. In contrast, vapC6 transcript abundance was reduced in the vapB6 deletion. A putative intergenic terminator may underlie these observations by coordinating vapBC6 expression. As predicted by structural modeling, recombinant VapC6 produced using chaperone cosynthesis exhibited heat-dependent ribonucleolytic activity toward S. solfataricus total RNA. This activity could be blocked by addition of preheated recombinant VapB6. In vivo transcript targets were identified by assessing the relative expression of genes that naturally respond to thermal stress in VapBC6-deficient cells. Preferential increases were observed for dppB-1 and tetR, and preferential decreases were observed for rpoD and eIF2 gamma. Specific VapC6 ribonucleolytic action could also be demonstrated in vitro toward RNAs whose expression increased in the VapBC6-deficient strain during heat shock. These findings provide a biochemical mechanism and identify cellular targets underlying VapBC6-mediated control over microbial growth and survival at temperature extremes.
Affiliation(s)
- Yukari Maezato: Beadle Center for Genetics, School of Biological Sciences, University of Nebraska, Lincoln, Nebraska 68588-0666, USA
- Amanda Daugherty: Beadle Center for Genetics, School of Biological Sciences, University of Nebraska, Lincoln, Nebraska 68588-0666, USA
- Karl Dana: Beadle Center for Genetics, School of Biological Sciences, University of Nebraska, Lincoln, Nebraska 68588-0666, USA
- Edith Soo: Beadle Center for Genetics, School of Biological Sciences, University of Nebraska, Lincoln, Nebraska 68588-0666, USA
- Charlotte Cooper: Department of Chemical and Biomolecular Engineering, North Carolina State University, Raleigh, North Carolina 27695-7905, USA
- Sabrina Tachdjian: Department of Chemical and Biomolecular Engineering, North Carolina State University, Raleigh, North Carolina 27695-7905, USA
- Robert M. Kelly: Department of Chemical and Biomolecular Engineering, North Carolina State University, Raleigh, North Carolina 27695-7905, USA
- Paul Blum: Beadle Center for Genetics, School of Biological Sciences, University of Nebraska, Lincoln, Nebraska 68588-0666, USA
12
Malysiak-Mrozek B, Mrozek D. An Improved Method for Protein Similarity Searching by Alignment of Fuzzy Energy Signatures. Int J Comput Int Sys 2011. [DOI: 10.1080/18756891.2011.9727765] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.2] [Indexed: 10/26/2022] Open
13
Abstract
Despite its apparent simplicity, the problem of quantifying the differences between two structures of the same protein or complex is nontrivial and continues evolving. In this chapter, we described several methods routinely used to compare computational models to experimental answers in several modeling assessments. The two major classes of measures, positional distance-based and contact-based, are presented, compared, and analyzed. The most popular measure of the first class, the global RMSD, is shown to be the least representative of the degree of structural similarity because it is dominated by the largest error. Several distance-dependent algorithms designed to attenuate the drawbacks of RMSD are described. Measures of the second class, contact-based, are shown to be more robust and relevant. We also illustrate the importance of using combined measures, utility-based measures, and the role of the distributions derived from the pairs of experimental structures in interpreting the results.
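The claim that global RMSD is dominated by the largest error can be checked with a small worked example. The sketch assumes the two coordinate sets are already superimposed and matched atom by atom, and performs no fitting. The helper `fraction_within` and its 2 Å cutoff are hypothetical stand-ins for a threshold-style measure and are not one of the specific measures described in the chapter.

```python
import math


def rmsd(coords_a, coords_b):
    """Root-mean-square deviation between matched, pre-superimposed atoms."""
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("need two non-empty coordinate lists of equal length")
    squared = sum(math.dist(p, q) ** 2 for p, q in zip(coords_a, coords_b))
    return math.sqrt(squared / len(coords_a))


def fraction_within(coords_a, coords_b, cutoff=2.0):
    """Fraction of matched atoms closer than `cutoff` (hypothetical measure)."""
    close = sum(math.dist(p, q) <= cutoff for p, q in zip(coords_a, coords_b))
    return close / len(coords_a)


reference = [(float(i), 0.0, 0.0) for i in range(10)]
model = list(reference)
model[-1] = (9.0, 10.0, 0.0)  # one atom misplaced by 10 Å, nine atoms exact

print(round(rmsd(reference, model), 3))   # 3.162
print(fraction_within(reference, model))  # 0.9
```

Nine of ten atoms are placed exactly, yet the RMSD exceeds 3 Å because the single 10 Å error enters as its square, while the threshold-style measure still reports 90% agreement.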
Affiliation(s)
- Irina Kufareva: Skaggs School of Pharmacy and Pharmaceutical Sciences, University of California, San Diego, La Jolla, CA, USA
14
Mrozek D, Wieczorek D, Malysiak-Mrozek B, Kozielski S. PSS-SQL: protein secondary structure - structured query language. Annual International Conference of the IEEE Engineering in Medicine and Biology Society 2010; 2010:1073-6. [PMID: 21096554 DOI: 10.1109/iembs.2010.5627303] [Citation(s) in RCA: 12] [Impact Index Per Article: 0.9] [Indexed: 11/07/2022]
Abstract
Secondary structure representation of proteins provides important information regarding protein general construction and shape. This representation is often used in protein similarity searching. Since existing commercial database management systems do not offer integrated exploration methods for biological data e.g. at the level of the SQL language, the structural similarity searching is usually performed by external tools. In the paper, we present our newly developed PSS-SQL language, which allows searching a database in order to identify proteins having secondary structure similar to the structure specified by the user in a PSS-SQL query. Therefore, we provide a simple and declarative language for protein structure similarity searching.
Affiliation(s)
- Dariusz Mrozek: Institute of Informatics, Silesian University of Technology, Akademicka 16, 44-100 Gliwice, Poland
15
Zhang ZH, Lee HK, Mihalek I. Reduced representation of protein structure: implications on efficiency and scope of detection of structural similarity. BMC Bioinformatics 2010; 11:155. [PMID: 20338066 PMCID: PMC3098053 DOI: 10.1186/1471-2105-11-155] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.6] [Received: 10/27/2009] [Accepted: 03/26/2010] [Indexed: 11/10/2022] Open
Abstract
Background: Computational comparison of two protein structures is the starting point of many methods that build on existing knowledge, such as structure modeling (including modeling of protein complexes and conformational changes), molecular replacement, or annotation by structural similarity. In a commonly used strategy, significant effort is invested in matching two sets of atoms. In a complementary approach, a global descriptor is assigned to the overall structure, thus losing track of the substructures within.
Results: Using a small set of geometric features, we define a reduced representation of protein structure, together with an optimizing function for matching two representations, to provide a pre-filtering stage in a database search. We show that, in a straightforward implementation, the representation performs well in terms of resolution in the space of protein structures, and its ability to make new predictions.
Conclusions: Perhaps unexpectedly, a substantial discriminating power already exists at the level of main features of protein structure, such as directions of secondary structural elements, possibly constrained by their sequential order. This can be used toward efficient comparison of protein (sub)structures, allowing for various degrees of conformational flexibility within the compared pair, which in turn can be used for modeling by homology of protein structure and dynamics.
Affiliation(s)
- Zong Hong Zhang: Bioinformatics Institute, A*STAR, 30 Biopolis Street, #07-01 Matrix, Singapore 138671
16
Improving Performance of Protein Structure Similarity Searching by Distributing Computations in Hierarchical Multi-Agent System. ACTA ACUST UNITED AC 2010. [DOI: 10.1007/978-3-642-16693-8_34] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.2] [Indexed: 02/15/2023]
17
Kim C, Tai CH, Lee B. Iterative refinement of structure-based sequence alignments by Seed Extension. BMC Bioinformatics 2009; 10:210. [PMID: 19589133 PMCID: PMC2753854 DOI: 10.1186/1471-2105-10-210] [Citation(s) in RCA: 11] [Impact Index Per Article: 0.7] [Received: 02/02/2009] [Accepted: 07/09/2009] [Indexed: 11/10/2022] Open
Abstract
Background: Accurate sequence alignment is required in many bioinformatics applications but, when sequence similarity is low, it is difficult to obtain accurate alignments based on sequence similarity alone. The accuracy improves when the structures are available, but current structure-based sequence alignment procedures still mis-align substantial numbers of residues. In order to correct such errors, we previously explored the possibility of replacing the residue-based dynamic programming algorithm in structure alignment procedures with the Seed Extension algorithm, which does not use a gap penalty. Here, we describe a new procedure called RSE (Refinement with Seed Extension) that iteratively refines a structure-based sequence alignment.
Results: RSE uses SE (Seed Extension) in its core, which is an algorithm that we reported recently for obtaining a sequence alignment from two superimposed structures. The RSE procedure was evaluated by comparing the correctly aligned fractions of residues before and after the refinement of the structure-based sequence alignments produced by popular programs. CE, DaliLite, FAST, LOCK2, MATRAS, MATT, TM-align, SHEBA and VAST were included in this analysis and the NCBI's CDD root node set was used as the reference alignments. RSE improved the average accuracy of sequence alignments for all programs tested when no shift error was allowed. The amount of improvement varied depending on the program. The average improvements were small for DaliLite and MATRAS but about 5% for CE and VAST. More substantial improvements have been seen in many individual cases. The additional computation times required for the refinements were negligible compared to the times taken by the structure alignment programs.
Conclusion: RSE is a computationally inexpensive way of improving the accuracy of a structure-based sequence alignment. It can be used as a standalone procedure following a regular structure-based sequence alignment or to replace the traditional iterative refinement procedures based on residue-level dynamic programming algorithm in many structure alignment programs.
Affiliation(s)
- Changhoon Kim: Laboratory of Molecular Biology, Center for Cancer Research, National Cancer Institute, NIH, Bethesda, MD 20892, USA
18
Abstract
Protein structures often show similarities to one another which would not be seen at the sequence level. Given the coordinates of a protein chain, the SALAMI server at www.zbh.uni-hamburg.de/salami will search the protein data bank and return a set of similar structures without using sequence information. The results page lists the related proteins, details of the sequence and structure similarity and implied sequence alignments. Via a simple structure viewer, one can view superpositions of query and library structures and finally download superimposed coordinates. The alignment method is very tolerant of large gaps and insertions, and tends to produce slightly longer alignments than other similar programs.
Affiliation(s)
- Thomas Margraf: Centre for Bioinformatics, University of Hamburg, Bundesstr. 43, 20146 Hamburg, Germany
19
Schenk G, Margraf T, Torda AE. Protein sequence and structure alignments within one framework. Algorithms Mol Biol 2008; 3:4. [PMID: 18380904 PMCID: PMC2390564 DOI: 10.1186/1748-7188-3-4] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.6] [Received: 01/29/2008] [Accepted: 04/01/2008] [Indexed: 11/19/2022] Open
Abstract
Background: Protein structure alignments are usually based on very different techniques to sequence alignments. We propose a method which treats sequence, structure and even combined sequence + structure in a single framework. Using a probabilistic approach, we calculate a similarity measure which can be applied to fragments containing only protein sequence, structure or both simultaneously.
Results: Proof-of-concept results are given for the different problems. For sequence alignments, the methodology is no better than conventional methods. For structure alignments, the techniques are very fast, reliable and tolerant of a range of alignment parameters. Combined sequence and structure alignments may provide a more reliable alignment for pairs of proteins where pure structural alignments can be misled by repetitive elements or apparent symmetries.
Conclusion: The probabilistic framework has an elegance in principle, merging sequence and structure descriptors into a single framework. It has a practical use in fast structural alignments and a potential use in finding those examples where sequence and structural similarities apparently disagree.
20
Kim C, Lee B. Accuracy of structure-based sequence alignment of automatic methods. BMC Bioinformatics 2007; 8:355. [PMID: 17883866 PMCID: PMC2039753 DOI: 10.1186/1471-2105-8-355] [Citation(s) in RCA: 36] [Impact Index Per Article: 2.1] [Received: 06/04/2007] [Accepted: 09/20/2007] [Indexed: 11/10/2022] Open
Abstract
Background: Accurate sequence alignments are essential for homology searches and for building three-dimensional structural models of proteins. Since structure is better conserved than sequence, structure alignments have been used to guide sequence alignments and are commonly used as the gold standard for sequence alignment evaluation. Nonetheless, as far as we know, there is no report of a systematic evaluation of pairwise structure alignment programs in terms of the sequence alignment accuracy.
Results: In this study, we evaluate CE, DaliLite, FAST, LOCK2, MATRAS, SHEBA and VAST in terms of the accuracy of the sequence alignments they produce, using sequence alignments from NCBI's human-curated Conserved Domain Database (CDD) as the standard of truth. We find that 4 to 9% of the residues on average are either not aligned or aligned with more than 8 residues of shift error and that an additional 6 to 14% of residues on average are misaligned by 1–8 residues, depending on the program and the data set used. The fraction of correctly aligned residues generally decreases as the sequence similarity decreases or as the RMSD between the Cα positions of the two structures increases. It varies significantly across CDD superfamilies whether shift error is allowed or not. Also, alignments with different shift errors occur between proteins within the same CDD superfamily, leading to inconsistent alignments between superfamily members. In general, residue pairs that are more than 3.0 Å apart in the reference alignment are heavily (≥25% on average) misaligned in the test alignments. In addition, each method shows a different pattern of relative weaknesses for different SCOP classes. CE gives relatively poor results for β-sheet-containing structures (all-β, α/β, and α+β classes), DaliLite for "others" class where all but the major four classes are combined, and LOCK2 and VAST for all-β and "others" classes.
Conclusion: When the sequence similarity is low, structure-based methods produce better sequence alignments than by using sequence similarities alone. However, current structure-based methods still mis-align 11–19% of the conserved core residues when compared to the human-curated CDD alignments. The alignment quality of each program depends on the protein structural type and similarity, with DaliLite showing the most agreement with CDD on average.
Affiliation(s)
- Changhoon Kim
- Laboratory of Molecular Biology, Center for Cancer Research, National Cancer Institute, National Institutes of Health, Bethesda, Maryland, USA
- Byungkook Lee
- Laboratory of Molecular Biology, Center for Cancer Research, National Cancer Institute, National Institutes of Health, Bethesda, Maryland, USA
|
21
|
Budiman ME, Knaggs MH, Fetrow JS, Alexander RW. Using molecular dynamics to map interaction networks in an aminoacyl-tRNA synthetase. Proteins 2007; 68:670-89. [PMID: 17510965 DOI: 10.1002/prot.21426] [Citation(s) in RCA: 34] [Impact Index Per Article: 2.0] [Indexed: 11/09/2022]
Abstract
Long-range functional communication is a hallmark of many enzymes that display allostery, or action-at-a-distance. Many aminoacyl-tRNA synthetases can be considered allosteric, in that their trinucleotide anticodons bind the enzyme at a site removed from their catalytic domains. Such is the case with E. coli methionyl-tRNA synthetase (MetRS), which recognizes its cognate anticodon using a conserved tryptophan residue 50 Å away from the site of tRNA aminoacylation. The lack of details regarding how MetRS and tRNA(Met) interact has limited efforts to deconvolute the long-range communication that occurs in this system. We have used molecular dynamics simulations to evaluate the mobility of wild-type MetRS and a Trp-461 variant shown previously by experiment to be deficient in tRNA aminoacylation. The simulations reveal that MetRS has significant mobility, particularly at structural motifs known to be involved in catalysis. Correlated motions are observed between residues in distant structural motifs, including the active site, zinc binding motif, and anticodon binding domain. Both mobility and correlated motions decrease significantly but not uniformly upon substitution at Trp-461. Mobility of some residues is essentially abolished upon removal of Trp-461, despite their being tens of angstroms away from the site of mutation and solvent exposed. This conserved residue does not simply participate in anticodon binding, as demonstrated experimentally, but appears to mediate the protein's distribution of structural ensembles. Finally, simulations of MetRS indicate that the ligand-free protein samples conformations similar to those observed in crystal structures with substrates and substrate analogs bound. Thus, there are low energetic barriers for MetRS to achieve the substrate-bound conformations previously determined by structural methods.
Affiliation(s)
- Michael E Budiman
- Department of Chemistry, Wake Forest University, Winston-Salem, North Carolina 27109, USA
|
22
|
Madej T, Panchenko AR, Chen J, Bryant SH. Protein homologous cores and loops: important clues to evolutionary relationships between structurally similar proteins. BMC Struct Biol 2007; 7:23. [PMID: 17425794 PMCID: PMC1852803 DOI: 10.1186/1472-6807-7-23] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.4] [Received: 10/13/2006] [Accepted: 04/10/2007] [Indexed: 11/11/2022]
Abstract
Background: To discover remote evolutionary relationships and functional similarities between proteins, biologists rely on comparative sequence analysis, and when structures are available, on structural alignments and various measures of structural similarity. The measures/scores that have most commonly been used for this purpose include: alignment length, percent sequence identity, superposition RMSD and their different combinations. More recently, we have introduced the "Homologous core structure overlap score" (HCS) and the "Loop Hausdorff Measure" (LHM). Along with these we also consider the "gapped structural alignment score" (GSAS), which was introduced earlier by other researchers. Results: We analyze the performance of these and other conventional measures at the task of ranking structure neighbors by homology, and we show that the HCS, LHM, and GSAS scores display considerably improved performance over the conventional measures of sequence or structural similarity. Conclusion: The HCS, LHM, and GSAS scores are easily computable quantities that allow users of structure-neighbor databases to more easily identify interesting structural similarities between proteins.
Affiliation(s)
- Thomas Madej
- Computational Biology Branch, National Center for Biotechnology Information, Building 38A, National Institutes of Health, Bethesda, Maryland 20894, USA
- Anna R Panchenko
- Computational Biology Branch, National Center for Biotechnology Information, Building 38A, National Institutes of Health, Bethesda, Maryland 20894, USA
- Jie Chen
- Computational Biology Branch, National Center for Biotechnology Information, Building 38A, National Institutes of Health, Bethesda, Maryland 20894, USA
- Stephen H Bryant
- Computational Biology Branch, National Center for Biotechnology Information, Building 38A, National Institutes of Health, Bethesda, Maryland 20894, USA
|
23
|
Shih ESC, Gan RCR, Hwang MJ. OPAAS: a web server for optimal, permuted, and other alternative alignments of protein structures. Nucleic Acids Res 2006; 34:W95-8. [PMID: 16845117 PMCID: PMC1538888 DOI: 10.1093/nar/gkl264] [Citation(s) in RCA: 19] [Impact Index Per Article: 1.1] [Indexed: 11/16/2022]
Abstract
The large number of experimentally determined protein 3D structures is a rich resource for studying protein function and evolution, and protein structure comparison (PSC) is a key method for such studies. When comparing two protein structures, almost all currently available PSC servers report a single and sequential (i.e. topological) alignment, whereas the existence of good alternative alignments, including those involving permutations (i.e. non-sequential or non-topological alignments), is well known. We have recently developed a novel PSC method that can detect alternative alignments of statistical significance (alignment similarity P-value < 10⁻⁵), including structural permutations at all levels of complexity. OPAAS, the server for this PSC method, is freely accessible at our website and provides an easy-to-read hierarchical layout of output to display detailed information on all of the significant alternative alignments detected. Because these alternative alignments can offer a more complete picture of the structural, evolutionary and functional relationship between two proteins, OPAAS can be used in structural bioinformatics research to gain additional insight that is not readily provided by existing PSC servers.
Affiliation(s)
- Ming-Jing Hwang
- To whom correspondence should be addressed. Tel: +886 2 2789 9033; Fax: +886 2 2788 7641
|
24
|
Abstract
MOTIVATION: With the increasing availability of protein structures, the generation of biologically meaningful 3D patterns from the simultaneous alignment of several protein structures is an exciting prospect: active sites could be better understood, and protein functions and protein 3D structures could be predicted more accurately. Although patterns can already be generated at the fold and topological levels, no system produces high-resolution 3D patterns including atom and cavity positions. To address this challenge, our research focuses on generating patterns from proteins with rigid prosthetic groups. Since these groups are key elements of protein active sites, the generated 3D patterns are expected to be biologically meaningful. RESULTS: In this paper, we present a new approach which allows the generation of 3D patterns from proteins with rigid prosthetic groups. Using 237 protein chains representing proteins containing porphyrin rings, our method was validated by comparing 3D templates generated from homologues with the 3D structure of the proteins they model. Atom positions were predicted reliably: 93% of them had an accuracy of 1.00 Å or less. Moreover, similar results were obtained regarding chemical group and cavity positions. Results also suggested our system could contribute to the validation of 3D protein models. Finally, a 3D template was generated for the active site of human cytochrome P450 CYP17, the 3D structure of which is unknown. Its analysis showed that it is biologically meaningful: our method detected the main patterns of the cytochrome P450 superfamily and the motifs linked to catalytic reactions. The 3D template also suggested the position of a residue that could be involved in a hydrogen bond with CYP17 substrates, as well as the shape and location of a cavity. Comparisons with independently generated 3D models supported these hypotheses. AVAILABILITY: Alignment software (Nestor3D) is available at http://www.kingston.ac.uk/~ku33185/Nestor3D.html
Affiliation(s)
- Jean-Christophe Nebel
- Faculty of Computing, Information Systems & Mathematics, Kingston University, Kingston-upon-Thames, Surrey KT1 2EE, UK.
|
25
|
Cheng H, Grishin NV. DOM-fold: a structure with crossing loops found in DmpA, ornithine acetyltransferase, and molybdenum cofactor-binding domain. Protein Sci 2005; 14:1902-10. [PMID: 15937278 PMCID: PMC2253344 DOI: 10.1110/ps.051364905] [Citation(s) in RCA: 21] [Impact Index Per Article: 1.1] [Indexed: 10/25/2022]
Abstract
Understanding relationships between sequence, structure, and evolution is important for functional characterization of proteins. Here, we define a novel DOM-fold as a consensus structure of the domains in DmpA (L-aminopeptidase D-Ala-esterase/amidase), OAT (ornithine acetyltransferase), and MocoBD (molybdenum cofactor-binding domain), and discuss possible evolutionary scenarios of its origin. As shown by a comprehensive structure similarity search, the DOM-fold, distinguished by a two-layered beta/alpha architecture of a particular topology with unusual crossing loops, is unique to those three protein families. DmpA and OAT are evolutionarily related, as indicated by their sequence, structural, and functional similarities. Structural similarity between the DmpA/OAT superfamily and the MocoBD domains has not been reported before. Contrary to previous reports, we conclude that functional similarities between DmpA/OAT proteins and N-terminal nucleophile (Ntn) hydrolases are convergent and are unlikely to be inherited from a common ancestor.
Affiliation(s)
- Hua Cheng
- Department of Biochemistry, University of Texas Southwestern Medical Center, Dallas, Texas 75390-9050, USA
|
26
|
Bertaccini EJ, Shapiro J, Brutlag DL, Trudell JR. Homology Modeling of a Human Glycine Alpha 1 Receptor Reveals a Plausible Anesthetic Binding Site. J Chem Inf Model 2004; 45:128-35. [PMID: 15667138 DOI: 10.1021/ci0497399] [Citation(s) in RCA: 29] [Impact Index Per Article: 1.5] [Indexed: 12/26/2022]
Abstract
The superfamily of ligand-gated ion channels (LGICs) has been implicated in anesthetic and alcohol responses. Mutations within glycine and GABA receptors have demonstrated that possible sites of anesthetic action exist within the transmembrane subunits of these receptors. The exact molecular arrangement of this transmembrane region remains at intermediate resolution with current experimental techniques. Homology modeling methods were therefore combined with experimental data to produce a more exact model of this region. A consensus from multiple bioinformatics techniques predicted the topology within the transmembrane domain of a glycine alpha one receptor (GlyRa1) to be alpha helical. This fold information was combined with sequence information using the SeqFold algorithm to search for modeling templates. Independently, the FoldMiner algorithm was used to search for templates that had structural folds similar to published coordinates of the homologous nAChR (1OED). Both SeqFold and FoldMiner identified the same modeling template. The GlyRa1 sequence was aligned with this template using multiple scoring criteria. Refinement of the alignment closed gaps to produce agreement with labeling studies carried out on the homologous receptors of the superfamily. Structural assignment and refinement were achieved using Modeler. The final structure demonstrated a cavity within the core of a four-helix bundle. Residues known to be involved in modulating anesthetic potency converge on and line this cavity. This suggests that the binding sites for volatile anesthetics in the LGICs are the cavities formed within the core of transmembrane four-helix bundles.
Affiliation(s)
- Edward J Bertaccini
- Department of Anesthesia, Stanford University School of Medicine, Stanford, California 94305-5117, USA.
|