1
|
Identification of an ideal-like fingerprint for a protein fold using overlapped conserved residues based approach. Sci Rep 2014; 4:5643. [PMID: 25008052 PMCID: PMC4090624 DOI: 10.1038/srep05643] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.4] [Received: 03/03/2014] [Accepted: 06/19/2014] [Indexed: 02/04/2023]
Abstract
Design of an efficient fingerprint that detects homologous proteins at distant sequence identity has been a great challenge. This paper proposes a strategy to extract an ideal-like fingerprint with high specificity and sensitivity from a group of sequences related to a fold. The approach is based on the assumptions that the critical residues for a protein fold may be conserved in three aspects, i.e. sequence, structure, and intramolecular interaction, and that they are embedded in secondary structures. We hypothesized that the residues satisfying these conditions simultaneously may work as an efficient fingerprint. This idea was tested on protein folds of various classes, such as beta-strand rich, alpha + beta proteins and alpha/beta proteins with discrete sequence similarities. The fingerprint for each fold was generated by selecting the overlapped conserved residues (OCR) from the conserved residues obtained using three independent alignment methods, i.e. multiple sequence alignment, structure-based alignment, and alignment based on the interstrand hydrogen-bonds. The OCR fingerprints showed more than 90% detection efficiency for all the folds tested and were identified to be almost the minimal fingerprints composed of only critical residues. This study is expected to provide an important conceptual improvement in the identification or design of ideal fingerprints for a protein fold.
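The selection step described in this abstract can be read as a set intersection over the residues that each alignment method marks as conserved. The sketch below is only an illustration of that reading, not the authors' implementation: the function name `overlapped_conserved_residues` and the position numbers are hypothetical, and it assumes the three methods report positions in a shared reference numbering.

```python
def overlapped_conserved_residues(seq_conserved, struct_conserved, hbond_conserved):
    """Return, in sorted order, the positions conserved in all three inputs."""
    overlap = set(seq_conserved) & set(struct_conserved) & set(hbond_conserved)
    return sorted(overlap)


# Hypothetical conserved positions (shared reference numbering), one set per method.
msa_positions = {12, 15, 23, 40, 57, 61}      # multiple sequence alignment
structure_positions = {12, 23, 38, 40, 61}    # structure-based alignment
hbond_positions = {12, 23, 40, 44}            # interstrand hydrogen-bond alignment

fingerprint = overlapped_conserved_residues(
    msa_positions, structure_positions, hbond_positions
)
print(fingerprint)  # [12, 23, 40]
```

Because a position must pass all three filters, the result can only shrink as methods are added, which is consistent with the abstract's description of an almost minimal fingerprint.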
|
2
|
Phelps C, Gburcik V, Suslova E, Dudek P, Forafonov F, Bot N, MacLean M, Fagan RJ, Picard D. Fungi and animals may share a common ancestor to nuclear receptors. Proc Natl Acad Sci U S A 2006; 103:7077-81. [PMID: 16636289 PMCID: PMC1459020 DOI: 10.1073/pnas.0510080103] [Citation(s) in RCA: 54] [Impact Index Per Article: 3.0] [Indexed: 11/18/2022]
Abstract
Nuclear receptors (NRs) are a large family of transcription factors. One hallmark of this family is the ligand-binding domain (LBD), for its primary sequence, structure, and regulatory function. To date, NRs have been found exclusively in animals and sponges, which has led to the generally accepted notion that they arose with them. We have overcome the limitations of primary sequence searches by combining sequence profile searches with structural predictions at a genomic scale, and have discovered that the heterodimeric transcription factors Oaf1/Pip2 of the budding yeast Saccharomyces cerevisiae contain putative LBDs resembling those of animal NRs. Although the Oaf1/Pip2 LBDs are embedded in an entirely different architecture, the regulation and function of these transcription factors are strikingly similar to those of the mammalian NR heterodimer peroxisome proliferator-activated receptor alpha/retinoid X receptor (PPAR alpha/RXR). We demonstrate that the induction of Oaf1/Pip2 activity by the fatty acid oleate depends on oleate's direct binding to the Oaf1 LBD. The alteration of two amino acids in the predicted ligand-binding pocket of Oaf1 abolishes both ligand binding and the transcriptional response. Hence, LBDs may have arisen as allosteric switches, for example, to respond to nutritional and metabolic ligands, before the animal and fungal lineages diverged.
Affiliation(s)
- Chris Phelps
- Inpharmatica Ltd., 60 Charlotte Street, London W1T 2NU, United Kingdom
- Valentina Gburcik
- Département de Biologie Cellulaire, Université de Genève, Sciences III, 30 quai Ernest-Ansermet, 1211 Genève 4, Switzerland
- Elena Suslova
- Département de Biologie Cellulaire, Université de Genève, Sciences III, 30 quai Ernest-Ansermet, 1211 Genève 4, Switzerland
- Peter Dudek
- Département de Biologie Cellulaire, Université de Genève, Sciences III, 30 quai Ernest-Ansermet, 1211 Genève 4, Switzerland
- Fedor Forafonov
- Département de Biologie Cellulaire, Université de Genève, Sciences III, 30 quai Ernest-Ansermet, 1211 Genève 4, Switzerland
- Nathalie Bot
- Département de Biologie Cellulaire, Université de Genève, Sciences III, 30 quai Ernest-Ansermet, 1211 Genève 4, Switzerland
- Morag MacLean
- Département de Biologie Cellulaire, Université de Genève, Sciences III, 30 quai Ernest-Ansermet, 1211 Genève 4, Switzerland
- Richard J. Fagan
- Inpharmatica Ltd., 60 Charlotte Street, London W1T 2NU, United Kingdom
- Didier Picard
- Département de Biologie Cellulaire, Université de Genève, Sciences III, 30 quai Ernest-Ansermet, 1211 Genève 4, Switzerland
- To whom correspondence should be addressed. E-mail:
|
3
|
Gill MB, Murphy JE, Fingeroth JD. Functional divergence of Kaposi's sarcoma-associated herpesvirus and related gamma-2 herpesvirus thymidine kinases: novel cytoplasmic phosphoproteins that alter cellular morphology and disrupt adhesion. J Virol 2005; 79:14647-59. [PMID: 16282465 PMCID: PMC1287549 DOI: 10.1128/jvi.79.23.14647-14659.2005] [Citation(s) in RCA: 15] [Impact Index Per Article: 0.8] [Indexed: 12/23/2022]
Abstract
The nucleoside kinase encoded by Kaposi's sarcoma-associated herpesvirus (KSHV) is a relatively inefficient enzyme with substrate specificity for thymidine alone, unlike alphaherpesvirus thymidine kinases (TKs). Similar to all gammaherpesvirus TKs, KSHV TK is composed of two distinct domains, a conserved C-terminal kinase and a novel and uncharacterized N terminus. Ectopic expression of KSHV TK in adherent cells induced striking morphological changes and anchorage independence although cells survived, a property shared with the related rhadinovirus TKs of rhesus monkey rhadinovirus and herpesvirus saimiri. To determine whether KSHV TK served alternate functions relevant to the rhadinovirus life cycle and to reveal the contribution of the N terminus, an enhanced green fluorescent protein-tagged fusion protein and serial mutants were generated for investigation of intracellular localization and cell biology. Analysis of truncation mutants showed that a proline-rich region located within the N terminus cooperated with the conserved C-terminal kinase to tether KSHV TK to a reticular network in the cytoplasm and to induce morphological change. Fusion of the KSHV N terminus to herpes simplex virus type 1 TK, a nucleus-localized enzyme, similarly resulted in cytoplasmic redistribution of the chimeric protein but did not alter cell shape or adhesion. Unlike other human herpesvirus TKs, KSHV TKs and related rhadinovirus TKs are constitutively tyrosine phosphorylated; a KSHV TK mutant that was hypophosphorylated failed to detach and grow in suspension. Loss of adhesion may enhance terminal differentiation, viral replication, and egress at the cellular level and at the organism level may facilitate detachment and distant migration of KSHV-replicating cells within body fluids--promoting oropharyngeal transmission and perhaps contributing to the multifocal lesions that characterize KS.
Affiliation(s)
- Michael B Gill
- Division of Infectious Disease, Beth Israel Deaconess Medical Center, Boston, MA 02115, USA
|
4
|
Chung R, Yona G. Protein family comparison using statistical models and predicted structural information. BMC Bioinformatics 2004; 5:183. [PMID: 15563734 PMCID: PMC544344 DOI: 10.1186/1471-2105-5-183] [Citation(s) in RCA: 10] [Impact Index Per Article: 0.5] [Received: 05/31/2004] [Accepted: 11/25/2004] [Indexed: 11/17/2022]
Abstract
Background This paper presents a simple method to increase the sensitivity of protein family comparisons by incorporating secondary structure (SS) information. We build upon the effective information theory approach towards profile-profile comparison described in [Yona & Levitt 2002]. Our method augments profile columns using PSIPRED secondary structure predictions and assesses statistical similarity using information theoretical principles. Results Our tests show that this tool detects more similarities between protein families of distant homology than the previous primary sequence-based method. A very significant improvement in performance is observed when the real secondary structure is used. Conclusions Integration of primary and secondary structure information can substantially improve detection of relationships between remotely related protein families.
Affiliation(s)
- Richard Chung
- Department of Computer Science, Cornell University, Ithaca, NY 14850, USA
- Golan Yona
- Department of Computer Science, Cornell University, Ithaca, NY 14850, USA
|
5
|
Binkowski TA, Adamian L, Liang J. Inferring functional relationships of proteins from local sequence and spatial surface patterns. J Mol Biol 2003; 332:505-26. [PMID: 12948498 DOI: 10.1016/s0022-2836(03)00882-9] [Citation(s) in RCA: 129] [Impact Index Per Article: 6.1] [Indexed: 11/17/2022]
Abstract
We describe a novel approach for inferring functional relationship of proteins by detecting sequence and spatial patterns of protein surfaces. Well-formed concave surface regions in the form of pockets and voids are examined to identify similarity relationship that might be directly related to protein function. We first exhaustively identify and measure analytically all 910,379 surface pockets and interior voids on 12,177 protein structures from the Protein Data Bank. The similarity of patterns of residues forming pockets and voids are then assessed in sequence, in spatial arrangement, and in orientational arrangement. Statistical significance in the form of E and p-values is then estimated for each of the three types of similarity measurements. Our method is fully automated without human intervention and can be used without input of query patterns. It does not assume any prior knowledge of functional residues of a protein, and can detect similarity based on surface patterns small and large. It also tolerates, to some extent, conformational flexibility of functional sites. We show with examples that this method can detect functional relationship with specificity for members of the same protein family and superfamily, as well as remotely related functional surfaces from proteins of different fold structures. We envision that this method can be used for discovering novel functional relationship of protein surfaces, for functional annotation of protein structures with unknown biological roles, and for further inquiries on evolutionary origins of structural elements important for protein function.
Affiliation(s)
- T Andrew Binkowski
- Department of Bioengineering, University of Illinois at Chicago, Chicago, IL 60607-7052, USA
|
6
|
Abstract
We have constructed, in a completely automated fashion, a new structure template library for threading that represents 358 distinct SCOP folds where each model is mathematically represented as a Hidden Markov model (HMM). Because the large number of models in the library can potentially dilute the prediction measure, a new triage method for fold prediction is employed. In the first step of the triage method, the most probable structural class is predicted using a set of manually constructed, high-level, generalized structural HMMs that represent seven general protein structural classes: all-alpha, all-beta, alpha/beta, alpha+beta, irregular small metal-binding, transmembrane beta-barrel, and transmembrane alpha-helical. In the second step, only those fold models belonging to the determined structural class are selected for the final fold prediction. This triage method gave more predictions as well as more correct predictions compared with a simple prediction method that lacks the initial classification step. Two different schemes of assigning Bayesian model priors are presented and discussed.
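The two-step triage described in this abstract can be sketched as a filter followed by an argmax. This is a minimal illustration under stated assumptions, not the authors' code: `triage_predict`, the score dictionaries, and the fold labels are all hypothetical, and the scores stand in for whatever model probabilities the HMM library would produce.

```python
def triage_predict(class_scores, fold_scores, fold_to_class):
    """Pick the best structural class, then the best fold within that class."""
    # Step 1: choose the most probable structural class.
    best_class = max(class_scores, key=class_scores.get)
    # Step 2: keep only fold models that belong to that class.
    candidates = {
        fold: score
        for fold, score in fold_scores.items()
        if fold_to_class[fold] == best_class
    }
    if not candidates:
        return best_class, None
    return best_class, max(candidates, key=candidates.get)


# Hypothetical scores: the globally top-scoring fold sits in a rejected class.
class_scores = {"all-alpha": 0.6, "all-beta": 0.3, "alpha/beta": 0.1}
fold_scores = {"fold_a1": 0.40, "fold_a2": 0.35, "fold_b1": 0.50}
fold_to_class = {"fold_a1": "all-alpha", "fold_a2": "all-alpha", "fold_b1": "all-beta"}

prediction = triage_predict(class_scores, fold_scores, fold_to_class)
print(prediction)  # ('all-alpha', 'fold_a1')
```

In this toy input a single-step search over all folds would return `fold_b1`, while the triage restricts the choice to the predicted class first, which is the behavioural difference the abstract attributes to the initial classification step.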
Affiliation(s)
- Hongxian He
- BioMolecular Engineering Research Center, Biomedical Engineering Department, Boston University, Boston, Massachusetts 02215, USA
|
7
|
Abstract
Various sequence-motif and sequence-cluster databases have been integrated into a new resource known as InterPro. Because the contributing databases have different clustering principles and scoring sensitivities, the combined assignments complement each other for grouping protein families and delineating domains. InterPro and new developments in the analysis of both the phylogenetic profiles of protein families and domain fusion events improve the prediction of specific functions for numerous proteins.
Affiliation(s)
- E V Kriventseva
- EMBL Outstation, The European Bioinformatics Institute, Wellcome Trust Genome Campus, Cambridgeshire CB10 1SD, Hinxton, UK
|
9
|
Abstract
It is now possible to identify over 30 functional subfamilies among the WD-repeat-containing proteins found in the completed genomes. The majority of these subfamilies have at least one member for which experimental data allow assignment to a cellular pathway or process. Half of the 63 WD-repeat-containing proteins in Saccharomyces cerevisiae, half of the 70 in Caenorhabditis elegans, and a third of the 100 plus predicted in Drosophila can be assigned to 23 of these functional subfamilies. Perhaps indicative of the future, 33 WD-repeat-containing proteins from the partial genome of Arabidopsis thaliana can now be assigned to 18 of these subfamilies. These assignments have been made possible by combining traditional sequence similarity with an implied common beta propeller structural context to obtain measures of protein-protein surface similarity. The beta propeller structural context is represented in the form of a Hidden Markov Model. The procedure is completely automated.
Affiliation(s)
- L Yu
- BioMolecular Engineering Research Center, College of Engineering, Boston University, Massachusetts 02215, USA
|
10
|
Abstract
We present a protein fold-recognition method that uses a comprehensive statistical interpretation of structural Hidden Markov Models (HMMs). The structure/fold recognition is done by summing the probabilities of all sequence-to-structure alignments. The optimal alignment can be defined as the most probable, but suboptimal alignments may have comparable probabilities. These suboptimal alignments can be interpreted as optimal alignments to the "other" structures from the ensemble or optimal alignments under minor fluctuations in the scoring function. Summing probabilities for all alignments gives a complete estimate of sequence-model compatibility. In the case of HMMs that produce a sequence, this reflects the fact that due to our indifference to exactly how the HMM produced the sequence, we should sum over all possibilities. We have built a set of structural HMMs for 188 protein structures and have compared two methods for identifying the structure compatible with a sequence: by the optimal alignment probability and by the total probability. Fold recognition by total probability was 40% more accurate than fold recognition by the optimal alignment probability. Proteins 2000;40:451-462.
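In standard HMM terms, the "total probability" in this abstract corresponds to the forward algorithm (a sum over all state paths) and the "optimal alignment probability" to the Viterbi algorithm (the single best path). The sketch below contrasts the two on a toy two-state model; the states, symbols, and all probabilities are hypothetical, and the paper's structural HMMs are not reproduced here.

```python
def forward_and_viterbi(obs, states, start, trans, emit):
    """Return (probability summed over all paths, probability of the best path)."""
    total = {s: start[s] * emit[s][obs[0]] for s in states}
    best = dict(total)
    for symbol in obs[1:]:
        # Forward recursion: sum over every predecessor state.
        total = {
            s: emit[s][symbol] * sum(total[r] * trans[r][s] for r in states)
            for s in states
        }
        # Viterbi recursion: keep only the best predecessor state.
        best = {
            s: emit[s][symbol] * max(best[r] * trans[r][s] for r in states)
            for s in states
        }
    return sum(total.values()), max(best.values())


# Hypothetical toy model: two hidden states emitting two residue symbols.
states = ["H", "E"]
start = {"H": 0.5, "E": 0.5}
trans = {"H": {"H": 0.8, "E": 0.2}, "E": {"H": 0.2, "E": 0.8}}
emit = {"H": {"A": 0.7, "V": 0.3}, "E": {"A": 0.3, "V": 0.7}}

total_prob, best_path_prob = forward_and_viterbi("AAV", states, start, trans, emit)
print(total_prob, best_path_prob)
```

The summed probability is never smaller than the best-path probability, and the gap grows when several suboptimal paths carry comparable weight, which is the situation the abstract describes.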
Affiliation(s)
- J R Bienkowska
- BioMolecular Engineering Research Center, College of Engineering, Boston University, Boston, Massachusetts 02215, USA.
|
11
|
Cosenza L, Rosenbach A, White JV, Murphy JR, Smith T. Comparative model building of interleukin-7 using interleukin-4 as a template: a structural hypothesis that displays atypical surface chemistry in helix D important for receptor activation. Protein Sci 2000; 9:916-26. [PMID: 10850801 PMCID: PMC2144647 DOI: 10.1110/ps.9.5.916] [Citation(s) in RCA: 17] [Impact Index Per Article: 0.7] [Indexed: 01/01/2023]
Abstract
Using a combination of theoretical sequence structure recognition predictions and experimental disulfide bond assignments, a three-dimensional (3D) model of human interleukin-7 (hIL-7) was constructed that predicts atypical surface chemistry in helix D that is important for receptor activation. A 3D model of hIL-7 was built using the X-ray crystal structure of interleukin-4 (IL-4) as a template (Walter MR et al., 1992, J Mol Biol. 224:1075-1085; Walter MR et al., 1992, J Biol Chem 267:20371-20376). Core secondary structures were constructed from sequences of hIL-7 predicted to form helices. The model was constructed by superimposing IL-7 helices onto the IL-4 template and connecting them together in an up-up down-down topology. The model was finished by incorporating the disulfide bond assignments (Cys3, Cys142), (Cys35, Cys130), and (Cys48, Cys93), which were determined by MALDI mass spectroscopy and site-directed mutagenesis (Cosenza L, Sweeney E, Murphy JR, 1997, J Biol Chem 272:32995-33000). Quality analysis of the hIL-7 model identified poor structural features in the carboxyl terminus that, when further studied using hydrophobic moment analysis, detected an atypical structural property in helix D, which contains Cys130 and Cys142. This analysis demonstrated that helix D had a hydrophobic surface exposed to bulk solvent that accounted for the poor quality of the model, but was suggestive of a region in IL-7 that may be important for protein interactions. Alanine (Ala) substitution scanning mutagenesis was performed to test if the predicted atypical surface chemistry of helix D in the hIL-7 model is important for receptor activation. This analysis resulted in the construction, purification, and characterization of four hIL-7 variants, hIL-7(K121A), hIL-7(L136A), hIL-7(K140A), and hIL-7(W143A), that displayed reduced or abrogated ability to stimulate murine IL-7-dependent pre-B cell proliferation. The mutant hIL-7(W143A), which is biologically inactive and displaces [125I]-hIL-7, is the first reported IL-7R system antagonist.
Affiliation(s)
- L Cosenza
- Evans Department of Clinical Research, Boston University School of Medicine, Massachusetts 02118-2393, USA
|
12
|
Skolnick J, Fetrow JS, Kolinski A. Structural genomics and its importance for gene function analysis. Nat Biotechnol 2000; 18:283-7. [PMID: 10700142 DOI: 10.1038/73723] [Citation(s) in RCA: 161] [Impact Index Per Article: 6.7] [Indexed: 11/08/2022]
Abstract
Structural genomics projects aim to solve the experimental structures of all possible protein folds. Such projects entail a conceptual shift from traditional structural biology in which structural information is obtained on known proteins to one in which the structure of a protein is determined first and the function assigned only later. Whereas the goal of converting protein structure into function can be accomplished by traditional sequence motif-based approaches, recent studies have shown that assignment of a protein's biochemical function can also be achieved by scanning its structure for a match to the geometry and chemical identity of a known active site. Importantly, this approach can use low-resolution structures provided by contemporary structure prediction methods. When applied to genomes, structural information (either experimental or predicted) is likely to play an important role in high-throughput function assignment.
Affiliation(s)
- J Skolnick
- Laboratory of Computational Genomics, The Danforth Plant Science Center, 893 N. Warson Rd., St. Louis, MO 63141, USA.
|
13
|
Skolnick J, Fetrow JS. From genes to protein structure and function: novel applications of computational approaches in the genomic era. Trends Biotechnol 2000; 18:34-9. [PMID: 10631780 DOI: 10.1016/s0167-7799(99)01398-0] [Citation(s) in RCA: 92] [Impact Index Per Article: 3.8] [Indexed: 11/20/2022]
Abstract
The genome-sequencing projects are providing a detailed 'parts list' of life. A key to comprehending this list is understanding the function of each gene and each protein at various levels. Sequence-based methods for function prediction are inadequate because of the multifunctional nature of proteins. However, just knowing the structure of the protein is also insufficient for prediction of multiple functional sites. Structural descriptors for protein functional sites are crucial for unlocking the secrets in both the sequence and structural-genomics projects.
Affiliation(s)
- J Skolnick
- Danforth Plant Science Center, Laboratory of Computational Genomics, St Louis, MO 63108, USA.
|
15
|
Roos DS, Crawford MJ, Donald RG, Kissinger JC, Klimczak LJ, Striepen B. Origin, targeting, and function of the apicomplexan plastid. Curr Opin Microbiol 1999; 2:426-32. [PMID: 10458993 DOI: 10.1016/s1369-5274(99)80075-7] [Citation(s) in RCA: 121] [Impact Index Per Article: 4.8] [Indexed: 11/19/2022]
Abstract
The discovery of a plastid in Plasmodium, Toxoplasma and related protozoan parasites provides a satisfying resolution to several long-standing mysteries: the mechanism of action for various surprisingly effective antibiotics; the subcellular location of an enigmatic 35 kb episomal DNA; and the nature of an unusual intracellular structure containing multiple membranes. The apicomplexan plastid highlights the importance of lateral genetic transfer in evolution and provides an accessible system for the investigation of protein targeting to secondary endosymbiotic organelles. Combining molecular genetic identification of targeting signals with whole genome analysis promises to yield a complete picture of organellar metabolic pathways and new targets for drug design.
Affiliation(s)
- D S Roos
- Department of Biology, University of Pennsylvania, Philadelphia, PA 19104-6018, USA.
|
16
|
Affiliation(s)
- T F Smith
- BioMolecular Engineering Research Center, College of Engineering, Boston University, 36 Cummington Street, Boston, MA 02215, USA.
|