1
Herrera-Morande A, Castro-Fernández V, Merino F, Ramírez-Sarmiento CA, Fernández FJ, Vega MC, Guixé V. Protein topology determines substrate-binding mechanism in homologous enzymes. Biochim Biophys Acta Gen Subj 2018; 1862:2869-2878. [PMID: 30251675 DOI: 10.1016/j.bbagen.2018.09.007] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.8] [Received: 05/22/2018] [Revised: 08/21/2018] [Accepted: 09/11/2018] [Indexed: 10/28/2022]
Abstract
During evolution, some homologous proteins arise with different connectivity between secondary structure elements (different topology) while conserving their three-dimensional arrangement (same architecture). These events can produce two types of rearrangement: circular permutation or non-cyclic permutation. The first results in the N and C termini moving to a different position in the protein sequence, while the second refers to a more complex rearrangement of the structural elements. In the ribokinase superfamily, two different topologies can be identified, which are related to each other by a non-cyclic permutation that occurred during evolution. Interestingly, this change in topology correlates with the nucleotide specificity of its members. Thus, the connectivity of the secondary structure elements allows us to distinguish an ATP-dependent and an ADP-dependent topology. Here we address the impact of introducing the topology of a homologous ATP-dependent kinase into an ADP-dependent kinase (Thermococcus litoralis glucokinase, TlGK) on the structure, nucleotide specificity, and substrate binding order of the engineered enzyme. Structural evidence demonstrates that rewiring the topology of TlGK leads to an active and soluble enzyme without modifications to its three-dimensional architecture. The permuted enzyme (PerGK) retains the nucleotide preference of the parent TlGK enzyme but shows a change in the substrate binding order. Our results illustrate how the rearrangement of the protein folding topology during the evolution of the ribokinase superfamily enzymes may have dictated the substrate-binding order in homologous enzymes of this superfamily.
Affiliation(s)
- Felipe Merino
- Departamento de Biología, Facultad de Ciencias, Universidad de Chile, Santiago, Chile
- Francisco J Fernández
- Centro de Investigaciones Biológicas (CIB-CSIC), Structural and Chemical Biology Dep., Madrid, Spain
- M Cristina Vega
- Centro de Investigaciones Biológicas (CIB-CSIC), Structural and Chemical Biology Dep., Madrid, Spain
- Victoria Guixé
- Departamento de Biología, Facultad de Ciencias, Universidad de Chile, Santiago, Chile
2
Cui X, Naveed H, Gao X. Finding optimal interaction interface alignments between biological complexes. Bioinformatics 2015; 31:i133-41. [PMID: 26072475 PMCID: PMC4765866 DOI: 10.1093/bioinformatics/btv242] [Citation(s) in RCA: 14] [Impact Index Per Article: 1.6] [Indexed: 12/02/2022]
Abstract
Motivation: Biological molecules perform their functions through interactions with other molecules. Structure alignment of interaction interfaces between biological complexes is an indispensable step in detecting their structural similarities, which are key to understanding their evolutionary histories and functions. Although various structure alignment methods have been developed to successfully assess the similarities of protein structures or certain types of interaction interfaces, existing alignment tools cannot directly align arbitrary types of interfaces formed by protein, DNA or RNA molecules. Specifically, they require a ‘blackbox preprocessing’ to standardize interface types and chain identifiers. Yet their performance is limited and sometimes unsatisfactory. Results: Here we introduce a novel method, PROSTA-inter, that automatically determines and aligns interaction interfaces between two arbitrary types of complex structures. Our method uses sequentially remote fragments to search for the optimal superimposition. The optimal residue matching problem is then formulated as a maximum weighted bipartite matching problem to detect the optimal sequence order-independent alignment. Benchmark evaluation on all non-redundant protein–DNA complexes in PDB shows significant performance improvement of our method over TM-align and iAlign (with the ‘blackbox preprocessing’). Two case studies where our method discovers, for the first time, structural similarities between two pairs of functionally related protein–DNA complexes are presented. We further demonstrate the power of our method on detecting structural similarities between a protein–protein complex and a protein–RNA complex, which is biologically known as a protein–RNA mimicry case. Availability and implementation: The PROSTA-inter web-server is publicly available at http://www.cbrc.kaust.edu.sa/prosta/. Contact: xin.gao@kaust.edu.sa
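The maximum weighted bipartite matching step mentioned in the abstract can be illustrated with a small sketch. This is not the PROSTA-inter implementation: the function name `max_weight_matching`, the score matrix, and the brute-force search are all assumptions made for illustration, and a real tool would use a polynomial-time algorithm (for example the Hungarian method) instead of enumerating assignments.

```python
from itertools import permutations


def max_weight_matching(weights):
    """Brute-force maximum weighted bipartite matching (illustration only).

    weights[i][j] is a hypothetical similarity score between residue i of
    interface A and residue j of interface B. Assumes A has no more
    residues than B, so every residue of A gets a distinct partner.
    """
    n_a, n_b = len(weights), len(weights[0])
    best_score, best_pairs = float("-inf"), []
    # Enumerate every one-to-one assignment of A residues to B residues.
    for cols in permutations(range(n_b), n_a):
        score = sum(weights[i][j] for i, j in enumerate(cols))
        if score > best_score:
            best_score, best_pairs = score, list(enumerate(cols))
    return best_score, best_pairs


# Hypothetical scores: the best pairing crosses over (0 -> 1, 1 -> 0),
# which a sequence-order-dependent alignment could not represent.
weights = [
    [0.1, 0.9, 0.3],
    [0.8, 0.2, 0.4],
]
score, pairs = max_weight_matching(weights)
print(score, pairs)
```

Because the matching ignores residue order, the optimal pairing here swaps the two residues, which is the sense in which such an alignment is sequence order-independent.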
Affiliation(s)
- Xuefeng Cui
- Computer, Electrical and Mathematical Sciences and Engineering Division, King Abdullah University of Science and Technology (KAUST), Thuwal 23955-6900, Saudi Arabia
- Hammad Naveed
- Computer, Electrical and Mathematical Sciences and Engineering Division, King Abdullah University of Science and Technology (KAUST), Thuwal 23955-6900, Saudi Arabia
- Xin Gao
- Computer, Electrical and Mathematical Sciences and Engineering Division, King Abdullah University of Science and Technology (KAUST), Thuwal 23955-6900, Saudi Arabia
3
Naveed H, Hameed US, Harrus D, Bourguet W, Arold ST, Gao X. An integrated structure- and system-based framework to identify new targets of metabolites and known drugs. Bioinformatics 2015; 31:3922-9. [PMID: 26286808 PMCID: PMC4673972 DOI: 10.1093/bioinformatics/btv477] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.7] [Received: 04/18/2015] [Accepted: 08/08/2015] [Indexed: 02/07/2023]
Abstract
Motivation: The inherent promiscuity of small molecules towards protein targets impedes our understanding of healthy versus diseased metabolism. This promiscuity also poses a challenge for the pharmaceutical industry as identifying all protein targets is important to assess (side) effects and repositioning opportunities for a drug. Results: Here, we present a novel integrated structure- and system-based approach of drug-target prediction (iDTP) to enable the large-scale discovery of new targets for small molecules, such as pharmaceutical drugs, co-factors and metabolites (collectively called ‘drugs’). For a given drug, our method uses sequence order-independent structure alignment, hierarchical clustering and probabilistic sequence similarity to construct a probabilistic pocket ensemble (PPE) that captures promiscuous structural features of different binding sites on known targets. A drug’s PPE is combined with an approximation of its delivery profile to reduce false positives. In our cross-validation study, we use iDTP to predict the known targets of 11 drugs, with 63% sensitivity and 81% specificity. We then predicted novel targets for these drugs; two that are of high pharmacological interest, the peroxisome proliferator-activated receptor gamma and the oncogene B-cell lymphoma 2, were successfully validated through in vitro binding experiments. Our method is broadly applicable for the prediction of protein-small molecule interactions with several novel applications to biological research and drug development. Availability and implementation: The program, datasets and results are freely available to academic users at http://sfb.kaust.edu.sa/Pages/Software.aspx. Contact: xin.gao@kaust.edu.sa and stefan.arold@kaust.edu.sa Supplementary information: Supplementary data are available at Bioinformatics online.
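For readers less familiar with the reported metrics, the following sketch shows how sensitivity and specificity are computed from confusion-matrix counts. The counts are hypothetical, chosen only so the output matches the 63% and 81% quoted in the abstract; they are not data from the study, and the function name is an assumption.

```python
def sensitivity_specificity(tp, fn, tn, fp):
    """Return (sensitivity, specificity) from confusion-matrix counts.

    sensitivity = TP / (TP + FN): fraction of true targets recovered.
    specificity = TN / (TN + FP): fraction of non-targets correctly rejected.
    """
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return sensitivity, specificity


# Hypothetical counts (not from the paper): 100 true targets, 100 non-targets.
sens, spec = sensitivity_specificity(tp=63, fn=37, tn=81, fp=19)
print(f"sensitivity={sens:.2f}, specificity={spec:.2f}")
```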
Affiliation(s)
- Hammad Naveed
- Computer, Electrical and Mathematical Sciences and Engineering Division, Computational Bioscience Research Center
- Umar S Hameed
- Biological and Environmental Sciences and Engineering Division, King Abdullah University of Science and Technology, Thuwal, Saudi Arabia
- Deborah Harrus
- Inserm U1054, Centre de Biochimie Structurale and CNRS UMR5048, Universités Montpellier 1 & 2, Montpellier, France
- William Bourguet
- Inserm U1054, Centre de Biochimie Structurale and CNRS UMR5048, Universités Montpellier 1 & 2, Montpellier, France
- Stefan T Arold
- Computational Bioscience Research Center, Biological and Environmental Sciences and Engineering Division, King Abdullah University of Science and Technology, Thuwal, Saudi Arabia
- Xin Gao
- Computer, Electrical and Mathematical Sciences and Engineering Division, Computational Bioscience Research Center
4
Unique Toll-Like Receptor 4 Activation by NAMPT/PBEF Induces NFκB Signaling and Inflammatory Lung Injury. Sci Rep 2015; 5:13135. [PMID: 26272519 PMCID: PMC4536637 DOI: 10.1038/srep13135] [Citation(s) in RCA: 122] [Impact Index Per Article: 13.6] [Received: 02/24/2015] [Accepted: 07/20/2015] [Indexed: 02/07/2023]
Abstract
Ventilator-induced inflammatory lung injury (VILI) is mechanistically linked to increased NAMPT transcription and circulating levels of nicotinamide phosphoribosyl-transferase (NAMPT/PBEF). Although VILI severity is attenuated by reduced NAMPT/PBEF bioavailability, the precise contribution of NAMPT/PBEF and excessive mechanical stress to VILI pathobiology is unknown. We now report that NAMPT/PBEF induces lung NFκB transcriptional activities and inflammatory injury via direct ligation of Toll-like receptor 4 (TLR4). Computational analysis demonstrated that NAMPT/PBEF and MD-2, a TLR4-binding protein essential for LPS-induced TLR4 activation, share ~30% sequence identity and exhibit striking structural similarity in loop regions critical for MD-2-TLR4 binding. Unlike MD-2, whose TLR4 binding alone is insufficient to initiate TLR4 signaling, NAMPT/PBEF alone produces robust TLR4 activation, likely via a protruding region of NAMPT/PBEF (S402-N412) with structural similarity to LPS. The identification of this unique mode of TLR4 activation by NAMPT/PBEF advances the understanding of innate immunity responses as well as the untoward events associated with mechanical stress-induced lung inflammation.
5
Adjeroh D, Jiang Y, Jiang BH, Lin J. Network analysis of circular permutations in multidomain proteins reveals functional linkages for uncharacterized proteins. Cancer Inform 2015; 13:109-24. [PMID: 25741177 PMCID: PMC4338801 DOI: 10.4137/cin.s14059] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Received: 07/08/2014] [Revised: 09/23/2014] [Accepted: 09/24/2014] [Indexed: 01/19/2023]
Abstract
Various studies have implicated different multidomain proteins in cancer. However, there has been little or no detailed study on the role of circular multidomain proteins in the general problem of cancer or on specific cancer types. This work represents an initial attempt at investigating the potential for predicting linkages between known cancer-associated proteins and uncharacterized or hypothetical multidomain proteins, based primarily on circular permutation (CP) relationships. First, we propose an efficient algorithm for rapid identification of both exact and approximate CPs in multidomain proteins. Using the circular relations identified, we construct networks between multidomain proteins, based on which we perform functional annotation of multidomain proteins. We then extend the method to construct subnetworks for selected cancer subtypes, and perform prediction of potential linkages between uncharacterized multidomain proteins and the selected cancer types. We include practical results showing the performance of the proposed methods.
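To make the notion of an exact circular permutation concrete, here is a minimal sketch that treats each protein as an ordered list of domain labels and checks whether one list is a rotation of the other. It uses the textbook doubled-sequence test, not the algorithm proposed in the paper, and it does not handle approximate CPs; the function name and the domain labels are hypothetical.

```python
def is_circular_permutation(domains_a, domains_b):
    """Return True if domains_b is a rotation of domains_a (exact CP only)."""
    a, b = list(domains_a), list(domains_b)
    if len(a) != len(b):
        return False
    if not a:
        return True
    # Every rotation of a appears as a window of length len(a) in a + a.
    doubled = a + a
    return any(doubled[i:i + len(b)] == b for i in range(len(a)))


# Hypothetical domain architectures.
print(is_circular_permutation(["A", "B", "C"], ["B", "C", "A"]))  # rotation
print(is_circular_permutation(["A", "B", "C"], ["B", "A", "C"]))  # swap, not a rotation
```

The second call illustrates the distinction drawn elsewhere in this listing: swapping two domains changes their relative order in a way no rotation can, so it is not a circular permutation.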
Affiliation(s)
- Donald Adjeroh
- Computer Science and Electrical Engineering, West Virginia University, Morgantown, WV, USA
- Yue Jiang
- Faculty of Software, Fujian Normal University, Fuzhou, Fujian, China
- Bing-Hua Jiang
- Pathology, Anatomy and Cell Biology, Thomas Jefferson University, Philadelphia, PA, USA
- Jie Lin
- Faculty of Software, Fujian Normal University, Fuzhou, Fujian, China
6
Jimenez-Morales D, Adamian L, Shi D, Liang J. Lysine carboxylation: unveiling a spontaneous post-translational modification. Acta Crystallographica. Section D, Biological Crystallography 2014; 70:48-57. [PMID: 24419378 PMCID: PMC3919261 DOI: 10.1107/s139900471302364x] [Citation(s) in RCA: 18] [Impact Index Per Article: 1.8] [Received: 06/21/2013] [Accepted: 08/22/2013] [Indexed: 11/10/2022]
Abstract
The carboxylation of lysine residues is a post-translational modification (PTM) that plays a critical role in the catalytic mechanisms of several important enzymes. It occurs spontaneously under certain physicochemical conditions, but is difficult to detect experimentally. Its full impact is unknown. In this work, the signature microenvironment of lysine-carboxylation sites has been characterized. In addition, a computational method called Predictor of Lysine Carboxylation (PreLysCar) for the detection of lysine carboxylation in proteins with available three-dimensional structures has been developed. The likely prevalence of lysine carboxylation in the proteome was assessed through large-scale computations. The results suggest that about 1.3% of large proteins may contain a carboxylated lysine residue. This unexpected prevalence of lysine carboxylation implies an enrichment of reactions in which it may play functional roles. The results also suggest that by switching enzymes on and off under appropriate physicochemical conditions spontaneous PTMs may serve as an important and widely used efficient biological machinery for regulation.
Affiliation(s)
- David Jimenez-Morales
- Department of Bioengineering, University of Illinois at Chicago, 851 South Morgan Street, Room 218, Chicago, IL 60607, USA
- Larisa Adamian
- Department of Bioengineering, University of Illinois at Chicago, 851 South Morgan Street, Room 218, Chicago, IL 60607, USA
- Dashuang Shi
- Children’s National Medical Center, Center for Genetic Medicine Research, 111 Michigan Avenue NW, Washington, DC 20010-2970, USA
- Jie Liang
- Department of Bioengineering, University of Illinois at Chicago, 851 South Morgan Street, Room 218, Chicago, IL 60607, USA
7
Minami S, Sawada K, Chikenji G. MICAN: a protein structure alignment algorithm that can handle Multiple-chains, Inverse alignments, C(α) only models, Alternative alignments, and Non-sequential alignments. BMC Bioinformatics 2013; 14:24. [PMID: 23331634 PMCID: PMC3637537 DOI: 10.1186/1471-2105-14-24] [Citation(s) in RCA: 38] [Impact Index Per Article: 3.5] [Received: 09/11/2012] [Accepted: 01/08/2013] [Indexed: 11/10/2022]
Abstract
Background: Protein pairs that have the same secondary structure packing arrangement but different topologies have attracted much attention in terms of both the evolution and the physical chemistry of protein structures. Further investigation of such protein relationships would give us a hint as to how proteins can change their fold in the course of evolution, as well as an insight into the physico-chemical properties of secondary structure packing. For this purpose, highly accurate sequence order-independent structure comparison methods are needed. Results: We have developed a novel protein structure alignment algorithm, MICAN (a structure alignment algorithm that can handle Multiple-chain complexes, Inverse direction of secondary structures, Cα only models, Alternative alignments, and Non-sequential alignments). The algorithm was designed to identify the best structural alignment between protein pairs by disregarding the connectivity between secondary structure elements (SSEs). One of the key features of the algorithm is the use of a multiple vector representation for each SSE, which enables us to correctly treat the bent or twisted nature of long SSEs. We compared MICAN with nine other publicly available structure alignment programs, using both reference-dependent and reference-independent evaluation methods on a variety of benchmark test sets that include both sequential and non-sequential alignments. We show that MICAN outperforms the other existing methods in reproducing reference alignments of non-sequential test sets. Further, although MICAN does not specialize in sequential structure alignment, it showed top-level performance on the sequential test sets. We also show that the MICAN program is the fastest non-sequential structure alignment program among all the programs we examined here. Conclusions: MICAN is the fastest and the most accurate program among the non-sequential alignment programs we examined here. These results suggest that MICAN is a highly effective tool for automatically detecting non-trivial structural relationships of proteins, such as circular permutations and segment-swapping, many of which have been identified manually by human experts so far. The source code of MICAN is freely downloadable at http://www.tbp.cse.nagoya-u.ac.jp/MICAN.
Affiliation(s)
- Shintaro Minami
- Department of Computational Science and Engineering, Nagoya University, Nagoya 464-8603, Japan
8
Yang Y, Zhan J, Zhao H, Zhou Y. A new size-independent score for pairwise protein structure alignment and its application to structure classification and nucleic-acid binding prediction. Proteins 2012; 80:2080-8. [PMID: 22522696 DOI: 10.1002/prot.24100] [Citation(s) in RCA: 34] [Impact Index Per Article: 2.8] [Received: 02/06/2012] [Revised: 04/13/2012] [Accepted: 04/17/2012] [Indexed: 11/12/2022]
Abstract
A structure alignment program aligns two structures by optimizing a scoring function that measures structural similarity. It is highly desirable that such a scoring function be independent of the sizes of the proteins being compared, so that the significance of alignments across aligned protein regions of different sizes is comparable. Here, we developed a new score called SP-score that fixes the cutoff distance at 4 Å and removes the size dependence using a normalization prefactor. We further built a program called SPalign that optimizes SP-score for structure alignment. SPalign was applied to recognize proteins within the same structure fold and having the same function of DNA or RNA binding. For fold discrimination, SPalign improves sensitivity over TMalign for the chain-level comparison by 12% and over DALI for the domain-level comparison by 13% at the same specificity of 99.6%. The difference between TMalign and SPalign at the chain level is due to the inability of TMalign to detect single domain similarity between multidomain proteins. For recognizing nucleic acid binding proteins, SPalign consistently improves over TMalign by 12% and DALI by 31% in the average value of Matthews correlation coefficients for four datasets. SPalign with default settings is 14% faster than TMalign. SPalign is expected to be useful for function prediction and comparing structures with or without domains defined. The source code for SPalign and the server are available at http://sparks.informatics.iupui.edu.
Affiliation(s)
- Yuedong Yang
- Indiana University School of Informatics, Indiana University-Purdue University, Indianapolis, Indiana 46202, USA
9
Zhang J, Bian Y, Lin H, Wang W. RNA fragment modeling with a nucleobase discrete-state model. Physical Review. E, Statistical, Nonlinear, and Soft Matter Physics 2012; 85:021909. [PMID: 22463246 DOI: 10.1103/physreve.85.021909] [Citation(s) in RCA: 15] [Impact Index Per Article: 1.3] [Received: 07/01/2011] [Revised: 12/30/2011] [Indexed: 05/24/2023]
Abstract
In this work we develop an approach for predicting the tertiary structures of RNA fragments by combining an RNA nucleobase discrete state (RNAnbds) model, a sequential Monte Carlo method, and a statistical potential. The RNAnbds model is designed for optimizing the configuration of nucleobases with respect to their preceding ones along the sequence and their spatial neighbors, in contrast to previous works that focus on RNA backbones. Tests of our approach on fragments taken from a small RNA pseudoknot and a 23S ribosomal RNA show that for short fragments (<10 nucleotides), the root mean square deviations (RMSDs) between the predicted and experimental structures are generally smaller than 3 Å; for slightly longer fragments (10-15 nucleotides), most RMSDs are smaller than 4 Å. A comparison of our method with another physics-based predictor on a testing set containing nine loops shows that ours is superior in both accuracy and efficiency. Our approach is useful in facilitating RNA three-dimensional structure prediction as well as loop modeling. It also holds the promise of providing insight into the structural ensembles of RNA loops.
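The RMSD thresholds quoted above (3 Å and 4 Å) refer to the root mean square deviation between matched atom positions. The sketch below computes it for two coordinate lists that are assumed to be already superimposed and in one-to-one correspondence; it performs no optimal superposition, and the function name and coordinates are made up for illustration.

```python
import math


def rmsd(coords_a, coords_b):
    """RMSD between two equally long lists of (x, y, z) coordinates.

    Assumes the structures are already superimposed and that atom i in
    coords_a corresponds to atom i in coords_b.
    """
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("coordinate lists must be non-empty and equally long")
    total = sum(
        (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
        for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b)
    )
    return math.sqrt(total / len(coords_a))


# Hypothetical coordinates in Å: atom displacements of 3 Å and 4 Å.
predicted = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
experimental = [(0.0, 0.0, 3.0), (1.0, 4.0, 0.0)]
value = rmsd(predicted, experimental)
print(round(value, 3))
```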
Affiliation(s)
- Jian Zhang
- National Laboratory of Solid State Microstructure and School of Business, Nanjing University, China
10
Zemla AT, Lang DM, Kostova T, Andino R, Ecale Zhou CL. StralSV: assessment of sequence variability within similar 3D structures and application to polio RNA-dependent RNA polymerase. BMC Bioinformatics 2011; 12:226. [PMID: 21635786 PMCID: PMC3121648 DOI: 10.1186/1471-2105-12-226] [Citation(s) in RCA: 12] [Impact Index Per Article: 0.9] [Received: 12/15/2010] [Accepted: 06/02/2011] [Indexed: 12/15/2022]
Abstract
Background Most of the currently used methods for protein function prediction rely on sequence-based comparisons between a query protein and those for which a functional annotation is provided. A serious limitation of sequence similarity-based approaches for identifying residue conservation among proteins is the low confidence in assigning residue-residue correspondences among proteins when the level of sequence identity between the compared proteins is poor. Multiple sequence alignment methods are more satisfactory--still, they cannot provide reliable results at low levels of sequence identity. Our goal in the current work was to develop an algorithm that could help overcome these difficulties by facilitating the identification of structurally (and possibly functionally) relevant residue-residue correspondences between compared protein structures. Results Here we present StralSV (structure-alignment sequence variability), a new algorithm for detecting closely related structure fragments and quantifying residue frequency from tight local structure alignments. We apply StralSV in a study of the RNA-dependent RNA polymerase of poliovirus, and we demonstrate that the algorithm can be used to determine regions of the protein that are relatively unique, or that share structural similarity with proteins that would be considered distantly related. By quantifying residue frequencies among many residue-residue pairs extracted from local structural alignments, one can infer potential structural or functional importance of specific residues that are determined to be highly conserved or that deviate from a consensus. We further demonstrate that considerable detailed structural and phylogenetic information can be derived from StralSV analyses. Conclusions StralSV is a new structure-based algorithm for identifying and aligning structure fragments that have similarity to a reference protein. 
StralSV analysis can be used to quantify residue-residue correspondences and identify residues that may be of particular structural or functional importance, as well as unusual or unexpected residues at a given sequence position. StralSV is provided as a web service at http://proteinmodel.org/AS2TS/STRALSV/.
Affiliation(s)
- Adam T Zemla
- Global Security Computing Applications Division, Lawrence Livermore National Laboratory, Livermore, CA 94550, USA.
11
Teyra J, Hawkins J, Zhu H, Pisabarro MT. Studies on the inference of protein binding regions across fold space based on structural similarities. Proteins 2011; 79:499-508. [PMID: 21069715 DOI: 10.1002/prot.22897] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.6] [Indexed: 01/27/2023]
Abstract
The emerging picture of a continuous protein fold space highlights the existence of non-obvious structural similarities between proteins with apparently different topologies. The identification of structural resemblances across fold space and the analysis of similar recognition regions may be a valuable source of information for protein structure-based functional characterization. In this work, we use non-sequential structural alignment methods (ns-SAs) to identify structural similarities between protein pairs independently of their SCOP hierarchy, and we calculate the significance of binding region conservation using the overlap of interacting residues in the ns-SA. We cluster the binding inferences for each family to distinguish already known family binding regions from putative new ones. Our methodology exploits the enormous amount of data available in the PDB to identify binding region similarities within protein families and to propose putative binding regions. Our results indicate that there is a plethora of structurally common binding regions among proteins, independently of current fold classifications. We obtain a 6- to 8-fold enrichment of novel binding regions, and identify binding inferences for 728 protein families that so far lack binding information in the PDB. We explore binding mode analogies between ligands from commonly clustered binding regions to investigate the utility of our methodology. A comprehensive analysis of the obtained binding inferences may help in the functional characterization of protein recognition and assist rational engineering. The data obtained in this work are available via the download link at www.scowlp.org.
Affiliation(s)
- Joan Teyra
- Structural Bioinformatics, BIOTEC, Technical University of Dresden, Tatzberg 47-51, 01307 Dresden, Germany.
12
Liu W, Srivastava A, Zhang J. A mathematical framework for protein structure comparison. PLoS Comput Biol 2011; 7:e1001075. [PMID: 21304929 PMCID: PMC3033361 DOI: 10.1371/journal.pcbi.1001075] [Citation(s) in RCA: 31] [Impact Index Per Article: 2.4] [Received: 07/14/2010] [Accepted: 01/04/2011] [Indexed: 11/29/2022]
Abstract
Comparison of protein structures is important for revealing the evolutionary relationship among proteins, predicting protein functions and predicting protein structures. Many methods have been developed in the past to align two or multiple protein structures. Despite the importance of this problem, rigorous mathematical or statistical frameworks have seldom been pursued for general protein structure comparison. One notable issue in this field is that with many different distances used to measure the similarity between protein structures, none of them are proper distances when protein structures of different sequences are compared. Statistical approaches based on those non-proper distances or similarity scores as random variables are thus not mathematically rigorous. In this work, we develop a mathematical framework for protein structure comparison by treating protein structures as three-dimensional curves. Using an elastic Riemannian metric on spaces of curves, geodesic distance, a proper distance on spaces of curves, can be computed for any two protein structures. In this framework, protein structures can be treated as random variables on the shape manifold, and means and covariance can be computed for populations of protein structures. Furthermore, these moments can be used to build Gaussian-type probability distributions of protein structures for use in hypothesis testing. The covariance of a population of protein structures can reveal the population-specific variations and be helpful in improving structure classification. With curves representing protein structures, the matching is performed using elastic shape analysis of curves, which can effectively model conformational changes and insertions/deletions. We show that our method performs comparably with commonly used methods in protein structure classification on a large manually annotated data set. 
Protein structure comparison is important for understanding the evolutionary relationships among proteins, predicting protein functions, and predicting protein structures. Despite its importance, there have been no rigorous mathematical or statistical frameworks for protein structure comparison. One notable issue in this field is that with many different similarity measures used in comparing protein structures, none of them are proper distances when protein structures of different sequences are compared. In this study, we develop a mathematical framework for protein structure comparison by treating protein structures as three dimensional curves. A formal distance, geodesic distance, can be computed for any two protein structures. In this framework, population-specific variations within protein families can be characterized through building probability distributions for structures of protein families. The mean and covariance computed from groups of protein structures can also help to improve the classifications of protein structures. With curves representing protein structures, the matching is performed using elastic shape analysis of curves, which can effectively model conformational changes and insertions/deletions.
Affiliation(s)
- Wei Liu
- Department of Statistics, Florida State University, Tallahassee, Florida, United States of America
- Anuj Srivastava
- Department of Statistics, Florida State University, Tallahassee, Florida, United States of America
- * E-mail: (AS); (JZ)
- Jinfeng Zhang
- Department of Statistics, Florida State University, Tallahassee, Florida, United States of America
- * E-mail: (AS); (JZ)
13
Dundas J, Adamian L, Liang J. Structural signatures of enzyme binding pockets from order-independent surface alignment: a study of metalloendopeptidase and NAD binding proteins. J Mol Biol 2010; 406:713-29. [PMID: 21145898 DOI: 10.1016/j.jmb.2010.12.005] [Citation(s) in RCA: 32] [Impact Index Per Article: 2.3] [Received: 07/02/2010] [Revised: 10/14/2010] [Accepted: 12/03/2010] [Indexed: 10/18/2022]
Abstract
Detecting similarities between local binding surfaces can facilitate identification of enzyme binding sites and prediction of enzyme functions, and aid in our understanding of enzyme mechanisms. Constructing a template of local surface characteristics for a specific enzyme function or binding activity is a challenging task, as the size and shape of the binding surfaces of a biochemical function often vary. Here we introduce the concept of signature binding pockets, which captures information on preserved and varied atomic positions at multiresolution levels. For proteins with complex enzyme binding and activity, multiple signatures arise naturally in our model, forming a signature basis set that characterizes this class of proteins. Both signatures and signature basis sets can be automatically constructed by a method called SOLAR (Signature Of Local Active Regions). This method is based on a sequence-order-independent alignment of computed binding surface pockets. SOLAR also provides a structure-based multiple sequence fragment alignment to facilitate the interpretation of computed signatures. By studying a family of evolutionarily related proteins, we show that for metzincin metalloendopeptidase, which has a broad spectrum of substrate binding, signature and basis set pockets can be used to discriminate metzincins from other enzymes, to predict the subclass of metzincin functions, and to identify specific binding surfaces. Studying unrelated proteins that have evolved to bind to the same NAD cofactor, we constructed signatures of NAD binding pockets and used them to predict NAD binding proteins and to locate NAD binding pockets. By measuring preservation ratio and location variation, our method can identify residues and atoms that are important for binding affinity and specificity. In both cases, we show that signatures and signature basis sets reveal significant biological insight.
Affiliation(s)
- Joe Dundas
- Department of Bioengineering, University of Illinois at Chicago, 835 South Wolcott, Chicago, IL 60612, USA
14
Chu CH, Lo WC, Wang HW, Hsu YC, Hwang JK, Lyu PC, Pai TW, Tang CY. Detection and alignment of 3D domain swapping proteins using angle-distance image-based secondary structural matching techniques. PLoS One 2010; 5:e13361. [PMID: 20976204 PMCID: PMC2955075 DOI: 10.1371/journal.pone.0013361] [Citation(s) in RCA: 10] [Impact Index Per Article: 0.7] [Received: 07/10/2010] [Accepted: 09/13/2010] [Indexed: 11/18/2022] Open
Abstract
This work presents a novel detection method for three-dimensional domain swapping (DS), a mechanism for forming protein quaternary structures that can be visualized as if monomers had “opened” their “closed” structures and exchanged the opened portion to form intertwined oligomers. Since the first report of DS in the mid-1990s, an increasing number of identified cases has led to the postulation that DS might occur in a protein with an unconstrained terminus under appropriate conditions. DS may play important roles in the molecular evolution and functional regulation of proteins and the formation of depositions in Alzheimer's and prion diseases. Moreover, it is promising for designing auto-assembling biomaterials. Despite the increasing interest in DS, related bioinformatics methods are rarely available. Owing to a dramatic conformational difference between the monomeric/closed and oligomeric/open forms, conventional structural comparison methods are inadequate for detecting DS. Hence, there is also a lack of comprehensive datasets for studying DS. Based on angle-distance (A-D) image transformations of secondary structural elements (SSEs), specific patterns within A-D images can be recognized and classified for structural similarities. In this work, a matching algorithm to extract corresponding SSE pairs from A-D images and a novel DS score have been designed and demonstrated to be applicable to the detection of DS relationships. The Matthews correlation coefficient (MCC) and sensitivity of the proposed DS-detecting method were higher than 0.81 even when the sequence identities of the proteins examined were lower than 10%. On average, the alignment percentage and root-mean-square distance (RMSD) computed by the proposed method were 90% and 1.8 Å for a set of 1,211 DS-related pairs of proteins. The performance of structural alignments remains high and stable for DS-related homologs with less than 10% sequence identity. In addition, the quality of its hinge loop determination is comparable to that of manual inspection. This method has been implemented as a web-based tool, which requires two protein structures as the input and then determines the type and/or existence of DS relationships between the input structures according to the A-D image-based structural alignments and the DS score. The proposed method is expected to trigger large-scale studies of this interesting structural phenomenon and facilitate related applications.
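As a reading aid for the evaluation figures quoted in this abstract, the following minimal sketch shows how sensitivity and the Matthews correlation coefficient (MCC) are computed from confusion-matrix counts. The function name `confusion_metrics` and the example counts are illustrative assumptions, not values or code from the cited paper.

```python
import math


def confusion_metrics(tp, fp, tn, fn):
    """Return (sensitivity, mcc) for binary classification counts.

    tp/fp/tn/fn are true/false positives/negatives. The counts used
    below are made up for illustration only.
    """
    # Sensitivity (recall): fraction of real positives that were detected.
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    # MCC: correlation between predicted and true labels, in [-1, 1].
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return sensitivity, mcc


# Hypothetical counts for a detector of domain-swapping relationships.
sens, mcc = confusion_metrics(tp=90, fp=10, tn=85, fn=15)
print(f"sensitivity={sens:.3f}  MCC={mcc:.3f}")
```

MCC uses all four cells of the confusion matrix, so it remains informative when positive and negative pairs are unevenly represented, which is one reason it is commonly reported alongside sensitivity.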
Affiliation(s)
- Chia-Han Chu
- Department of Computer Science, National Tsing Hua University, Hsinchu, Taiwan, Republic of China
- Wei-Cheng Lo
- Institute of Bioinformatics and Structural Biology, National Tsing Hua University, Hsinchu, Taiwan, Republic of China
- Institute of Bioinformatics and Systems Biology, National Chiao Tung University, Hsinchu, Taiwan, Republic of China
- Hsin-Wei Wang
- Department of Computer Science and Engineering, National Taiwan Ocean University, Keelung, Taiwan, Republic of China
- Yen-Chu Hsu
- Department of Computer Science and Engineering, National Taiwan Ocean University, Keelung, Taiwan, Republic of China
- Jenn-Kang Hwang
- Institute of Bioinformatics and Systems Biology, National Chiao Tung University, Hsinchu, Taiwan, Republic of China
- Ping-Chiang Lyu
- Institute of Bioinformatics and Structural Biology, National Tsing Hua University, Hsinchu, Taiwan, Republic of China
- Tun-Wen Pai
- Department of Computer Science and Engineering, National Taiwan Ocean University, Keelung, Taiwan, Republic of China
- * E-mail: (T-WP); (CYT)
- Chuan Yi Tang
- Department of Computer Science, National Tsing Hua University, Hsinchu, Taiwan, Republic of China
- Department of Computer Science and Information Engineering, Providence University, Taichung, Taiwan, Republic of China
- * E-mail: (T-WP); (CYT)
15
Temiz NA, Trapp A, Prokopyev OA, Camacho CJ. Optimization of minimum set of protein-DNA interactions: a quasi exact solution with minimum over-fitting. Bioinformatics 2009; 26:319-25. [PMID: 19965883 PMCID: PMC2815656 DOI: 10.1093/bioinformatics/btp664] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.5] [Indexed: 11/12/2022]
Abstract
Motivation: A major limitation in modeling protein interactions is the difficulty of assessing the over-fitting of the training set. Recently, an experimentally based approach that integrates crystallographic information of C2H2 zinc finger–DNA complexes with binding data from 11 mutants, 7 from EGR finger I, was used to define an improved interaction code (no optimization). Here, we present a novel mixed integer programming (MIP)-based method that transforms this type of data into an optimized code, demonstrating both the advantages of the mathematical formulation to minimize over- and under-fitting and the robustness of the underlying physical parameters mapped by the code. Results: Based on the structural models of feasible interaction networks for 35 mutants of EGR–DNA complexes, the MIP method minimizes the cumulative binding energy over all complexes for a general set of fundamental protein–DNA interactions. To guard against over-fitting, we use the scalability of the method to probe against the elimination of related interactions. From an initial set of 12 parameters (six hydrogen bonds, five desolvation penalties and a water factor), we proceed to eliminate five of them with only a marginal reduction of the correlation coefficient to 0.9983. Further reduction of parameters negatively impacts the performance of the code (under-fitting). Besides accurately predicting the change in binding affinity of validation sets, the code identifies possible context-dependent effects in the definition of the interaction networks. Yet, the approach of constraining predictions to within a pre-selected set of interactions limits the impact of these potential errors to related low-affinity complexes. Contact: ccamacho@pitt.edu; droleg@pitt.edu. Supplementary information: Supplementary data are available at Bioinformatics online.
Affiliation(s)
- N A Temiz
- Department of Computational Biology, School of Medicine, University of Pittsburgh, Pittsburgh, PA 15260, USA
16
Zhang J, Dundas J, Lin M, Chen R, Wang W, Liang J. Prediction of geometrically feasible three-dimensional structures of pseudoknotted RNA through free energy estimation. RNA (New York, N.Y.) 2009; 15:2248-63. [PMID: 19864433 PMCID: PMC2779689 DOI: 10.1261/rna.1723609] [Citation(s) in RCA: 32] [Impact Index Per Article: 2.1] [Received: 05/08/2009] [Accepted: 09/05/2009] [Indexed: 05/07/2023]
Abstract
Accurate free energy estimation is essential for RNA structure prediction. The widely used Turner's energy model works well for nested structures. For pseudoknotted RNAs, however, there is no effective rule for estimation of loop entropy and free energy. In this work we present a new free energy estimation method, termed the pseudoknot predictor in three-dimensional space (pk3D), which goes beyond Turner's model. Our approach treats nested and pseudoknotted structures alike in one unifying physical framework, regardless of how complex the RNA structures are. We first test the ability of pk3D in selecting native structures from a large number of decoys for a set of 43 pseudoknotted RNA molecules, with lengths ranging from 23 to 113. We find that pk3D performs slightly better than the Dirks and Pierce extension of Turner's rule. We then test pk3D for blind secondary structure prediction, and find that pk3D gives the best sensitivity and comparable positive predictive value (related to specificity) in predicting pseudoknotted RNA secondary structures, when compared with other methods. A unique strength of pk3D is that it also generates spatial arrangement of structural elements of the RNA molecule. Comparison of three-dimensional structures predicted by pk3D with the native structure measured by nuclear magnetic resonance or X-ray experiments shows that the predicted spatial arrangement of stems and loops is often similar to that found in the native structure. These close-to-native structures can be used as starting points for further refinement to derive accurate three-dimensional structures of RNA molecules, including those with pseudoknots.
Affiliation(s)
- Jian Zhang
- Department of Bioengineering, University of Illinois at Chicago, Chicago, Illinois 60607, USA
17
Hasegawa H, Holm L. Advances and pitfalls of protein structural alignment. Curr Opin Struct Biol 2009; 19:341-8. [PMID: 19481444 DOI: 10.1016/j.sbi.2009.04.003] [Citation(s) in RCA: 303] [Impact Index Per Article: 20.2] [Received: 04/02/2009] [Accepted: 04/16/2009] [Indexed: 11/30/2022]
Abstract
Structure comparison opens a window into the distant past of protein evolution, which has been unreachable by sequence comparison alone. With 55,000 entries in the Protein Data Bank and about 500 new structures added each week, automated processing, comparison, and classification are necessary. A variety of methods use different representations, scoring functions, and optimization algorithms, and they generate contradictory results even for moderately distant structures. Sequence mutations, insertions, and deletions are accommodated by plastic deformations of the common core, retaining the precise geometry of the active site, and peripheral regions may refold completely. Therefore structure comparison methods that allow for flexibility and plasticity generate the most biologically meaningful alignments. Active research directions include both the search for fold invariant features and the modeling of structural transitions in evolution. Advances have been made in algorithmic robustness, multiple alignment, and speeding up database searches.
Affiliation(s)
- Hitomi Hasegawa
- Institute of Biotechnology, University of Helsinki, P.O. Box 56 (Viikinkaari 5), 00014 University of Helsinki, Finland
18
Functional discrimination of sea anemone neurotoxins using 3D-plotting. Open Life Sci 2009. [DOI: 10.2478/s11535-008-0064-z] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/21/2022] Open
Abstract
One of the most important goals in structural biology is the identification of functional relationships among the structure of proteins and peptides. The purpose of this study was to (1) generate a model based on theoretical and computational considerations among amino acid sequences within select neurotoxin peptides, and (2) compare the relationship these values have to the various toxins tested. We employed isolated neurotoxins from sea anemones with established specific potential to act on voltage-dependent sodium and potassium channel activity as our model. Values were assigned to each amino acid in the peptide sequence of the neurotoxins tested using the Number of Lareo and Acevedo algorithm (NULA). Once the NULA number was obtained, it was then plotted using three dimensional space coordinates. The results of this study allow us to report, for the first time, that there is a different numerical and functional relationship between the sequences of amino acids from sea anemone neurotoxins, and the resulting numerical relationship for each peptide, or NULA number, has a unique location in three-dimensional space.
19
Terzulli AJ, Kosman DJ. The Fox1 ferroxidase of Chlamydomonas reinhardtii: a new multicopper oxidase structural paradigm. J Biol Inorg Chem 2008; 14:315-25. [PMID: 19023602 DOI: 10.1007/s00775-008-0450-z] [Citation(s) in RCA: 22] [Impact Index Per Article: 1.4] [Received: 08/19/2008] [Accepted: 11/06/2008] [Indexed: 12/19/2022]
Abstract
Multicopper oxidases (MCO) contain at least four copper atoms arrayed in three distinct ligand fields supported by two canonical structural features: (1) multiples of the cupredoxin fold and (2) four unique sequence elements that include the ten histidine and one cysteine ligands to the four copper atoms. Ferroxidases are a subfamily of MCO proteins that contain residues supporting a specific reactivity towards ferrous iron; these MCOs play a vital role in iron metabolism in bacteria, algae, fungi, and mammals. In contrast to the fungal ferroxidases, e.g., Fet3p from Saccharomyces cerevisiae, the mammalian ceruloplasmin (Cp) is twice as large (six vs. three cupredoxin domains) and contains three type 1, or "blue," copper sites. Chlamydomonas reinhardtii expresses a putative ferroxidase, Fox1, which has sequence similarity to human Cp (hCp). Eschewing the standard sequence-based modeling paradigm, we have constructed a function-based model of the Fox1 protein which replicates hCp's six copper-site ligand arrays with an overall root mean square deviation of 1.4 Å. Analysis of this model has led also to assignment of motifs in Fox1 that are unique to ferroxidases, the strongest evidence to date that the well-characterized fungal high-affinity iron uptake system is essential to iron homeostasis in green algae. The model of Fox1 also establishes a subfamily of MCO proteins with a noncanonical copper-ligand organization. These diverse structures suggest alternative mechanisms for intramolecular electron transfer and require a new trajectory for the evolution of the MCO superfamily.
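For readers unfamiliar with the root mean square deviation quoted in this abstract, the sketch below computes it for two equally sized sets of matched atomic coordinates. It assumes the two sets are already superposed and paired atom by atom; the function name `rmsd` and the toy coordinates are illustrative and do not come from the cited study.

```python
import math


def rmsd(coords_a, coords_b):
    """Root mean square deviation between paired 3D points.

    Assumes both lists hold (x, y, z) tuples in the same atom order
    and that the structures have already been superposed.
    """
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("coordinate lists must be non-empty and equal in length")
    # Sum of squared distances over all matched atom pairs.
    total = sum(
        (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
        for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b)
    )
    return math.sqrt(total / len(coords_a))


# Toy example: every atom in the second set is shifted by 1 unit along z.
model = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
reference = [(0.0, 0.0, 1.0), (1.0, 0.0, 1.0)]
value = rmsd(model, reference)
```

In practice an optimal superposition step (for example the Kabsch algorithm) is applied first; that step is omitted here to keep the sketch short.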
Affiliation(s)
- Alaina J Terzulli
- Department of Biochemistry, School of Medicine and Biomedical Sciences, State University of New York at Buffalo, Buffalo, NY 14214, USA
20
Dundas J, Binkowski TA, DasGupta B, Liang J. Topology independent protein structural alignment. BMC Bioinformatics 2007; 8:388. [PMID: 17937816 PMCID: PMC2096629 DOI: 10.1186/1471-2105-8-388] [Citation(s) in RCA: 25] [Impact Index Per Article: 1.5] [Received: 07/02/2007] [Accepted: 10/15/2007] [Indexed: 11/10/2022] Open
Abstract
BACKGROUND: Identifying structurally similar proteins with different chain topologies can aid studies in homology modeling, protein folding, protein design, and protein evolution. These include circularly permuted protein structures, and the more general cases of non-cyclic permutations between similar structures, which are related by non-topological rearrangement beyond circular permutation. We present a method based on an approximation algorithm that finds sequence-order independent structural alignments that are close to optimal. We formulate the structural alignment problem as a special case of the maximum-weight independent set problem, and solve this computationally intensive problem approximately by iteratively solving relaxations of a corresponding integer programming problem. The resulting structural alignment is sequence order independent. Our method is also insensitive to insertions, deletions, and gaps. RESULTS: Using a novel similarity score and a statistical model for significance p-value, we are able to discover previously unknown circularly permuted proteins between nucleoplasmin-core protein and auxin binding protein, between aspartate racemase and 3-dehydroquinate dehydratase, as well as between migration inhibition factor and arginine repressor which involves an additional strand-swapping. We also report the finding of non-cyclic permuted protein structures existing in nature between AML1/core binding factor and riboflavin synthase. Our method can be used for large scale alignment of protein structures regardless of the topology. CONCLUSION: The approximation algorithm introduced in this work can find good solutions for the problem of protein structure alignment. Furthermore, this algorithm can detect topological differences between two spatially similar protein structures. The alignment between MIF and the arginine repressor demonstrates our algorithm's ability to detect structural similarities even when spatial rearrangement of structural units has occurred. The effectiveness of our method is also demonstrated by the discovery of previously unknown circular permutations. In addition, we report in this study the finding of a naturally occurring non-cyclic permuted protein between AML1/Core Binding Factor chain F and riboflavin synthase chain A.
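The maximum-weight independent set formulation mentioned in this abstract can be illustrated with a small sketch. Each node is a candidate pairing of a fragment from structure A with a fragment from structure B, weighted by a similarity score; two nodes conflict when they reuse the same fragment. The code below uses a simple greedy heuristic, which is a stand-in chosen for brevity and is not the iterative integer-programming relaxation the authors describe. All names (`build_conflicts`, `greedy_independent_set`) and scores are hypothetical.

```python
from itertools import combinations


def build_conflicts(pairs):
    """Two candidate pairings conflict if they share a fragment in A or in B."""
    return [
        (p, q)
        for p, q in combinations(pairs, 2)
        if p[0] == q[0] or p[1] == q[1]
    ]


def greedy_independent_set(weights, conflicts):
    """Greedy heuristic: repeatedly take the heaviest unblocked node."""
    neighbors = {node: set() for node in weights}
    for u, v in conflicts:
        neighbors[u].add(v)
        neighbors[v].add(u)
    chosen, blocked = [], set()
    for node in sorted(weights, key=lambda n: -weights[n]):
        if node not in blocked:
            chosen.append(node)
            blocked.add(node)
            blocked |= neighbors[node]  # conflicting pairings are excluded
    return chosen


# Hypothetical similarity scores for fragment pairings (A fragment, B fragment).
scores = {
    ("A1", "B2"): 5.0,
    ("A1", "B1"): 3.0,
    ("A2", "B1"): 4.0,
    ("A2", "B2"): 1.0,
}
alignment = greedy_independent_set(scores, build_conflicts(list(scores)))
print(alignment)
```

Nothing in the conflict rule requires matched fragments to appear in the same order along both chains, so the selected pairing can cross (A1 with B2, A2 with B1). That is the property that lets this kind of formulation represent permuted topologies.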
Affiliation(s)
- Joe Dundas
- Department of Bioengineering, University of Illinois at Chicago, Chicago, IL 60607-7053, USA.