1
Abstract
A major limitation of most hidden Markov models (HMMs) is their inherently high dimensionality. We therefore developed several low-complexity model variations that can be applied even to protein families with few members. In this chapter we present these variations. All of them use an HMM with a small number of states (a reduced state-space HMM), which is trained on both the amino acid sequence and the secondary structure of proteins with known 3D structure and is then used for protein fold classification. We used data from the Protein Data Bank and annotation from the SCOP database to train and evaluate the proposed HMM variations on a number of protein folds belonging to the major structural classes. The results indicate that the variations classify proteins as well as, and in some cases better than, SAM, a widely used HMM-based method for protein classification. The major advantage of the proposed variations is that they employ a small number of states, so the algorithms used for training and scoring have low complexity and are relatively fast. The main variations examined are a reduced state-space HMM with seven states (7-HMM), a reduced state-space HMM with three states (3-HMM), and an optimized version of the 3-HMM in which an optimization process is applied to its scores (optimized 3-HMM).
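The speed argument rests on the cost of scoring a sequence with an HMM, which for the standard forward algorithm grows as O(L·N²) for a sequence of length L and N states, so a three- or seven-state model stays cheap. The sketch below is only an illustration under stated assumptions: it scores toy sequences with the scaled forward algorithm on a hypothetical three-state model and assigns each sequence to the "fold" model with the highest log-likelihood. The state labels, the two-letter alphabet (`h` hydrophobic, `p` polar), all probabilities and the fold names are invented; this is not the authors' 3-HMM, which is trained on both amino acid sequence and secondary structure.

```python
import math

# Hypothetical 3-state model: H (helix-like), E (strand-like), C (coil-like).
# Toy alphabet: "h" = hydrophobic residue, "p" = polar residue.
STATES = ("H", "E", "C")
START = {"H": 0.4, "E": 0.3, "C": 0.3}
TRANS = {
    "H": {"H": 0.80, "E": 0.05, "C": 0.15},
    "E": {"H": 0.05, "E": 0.80, "C": 0.15},
    "C": {"H": 0.20, "E": 0.20, "C": 0.60},
}
EMIT = {
    "H": {"h": 0.5, "p": 0.5},
    "E": {"h": 0.7, "p": 0.3},
    "C": {"h": 0.3, "p": 0.7},
}


def forward_log_likelihood(seq, states, start_p, trans_p, emit_p):
    """Return log P(seq | model) using the scaled forward algorithm."""
    # alpha[s] is proportional to P(observations so far, current state = s)
    alpha = {s: start_p[s] * emit_p[s][seq[0]] for s in states}
    scale = sum(alpha.values())
    log_prob = math.log(scale)
    alpha = {s: a / scale for s, a in alpha.items()}
    for symbol in seq[1:]:
        # Sum over predecessor states: O(N^2) work per position
        alpha = {
            s: emit_p[s][symbol] * sum(alpha[r] * trans_p[r][s] for r in states)
            for s in states
        }
        scale = sum(alpha.values())
        log_prob += math.log(scale)  # accumulate log of per-step normalizers
        alpha = {s: a / scale for s, a in alpha.items()}
    return log_prob


def classify(seq, fold_models):
    """Assign seq to the fold whose (start, trans, emit) model scores highest."""
    return max(
        fold_models,
        key=lambda fold: forward_log_likelihood(seq, STATES, *fold_models[fold]),
    )


# Second hypothetical "fold" model that strongly prefers polar residues
POLAR_EMIT = {s: {"h": 0.1, "p": 0.9} for s in STATES}
FOLD_MODELS = {
    "fold_A": (START, TRANS, EMIT),
    "fold_B": (START, TRANS, POLAR_EMIT),
}

print(classify("hhhh", FOLD_MODELS))  # → fold_A
print(classify("pppp", FOLD_MODELS))  # → fold_B
```

Rescaling alpha at every position keeps the numbers in floating-point range for long sequences while the summed log normalizers still give the exact log-likelihood.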
Affiliation(s)
- Christos Lampros
- Unit of Medical Technology and Intelligent Information Systems, Department of Materials Science and Engineering, University of Ioannina, University Campus of Ioannina, GR45110, Ioannina, Greece
- Costas Papaloukas
- Department of Biological Applications and Technology, University of Ioannina, Ioannina, Greece
- Themis Exarchos
- Unit of Medical Technology and Intelligent Information Systems, Department of Materials Science and Engineering, University of Ioannina, University Campus of Ioannina, GR45110, Ioannina, Greece
- Dimitrios I Fotiadis
- Unit of Medical Technology and Intelligent Information Systems, Department of Materials Science and Engineering, University of Ioannina, University Campus of Ioannina, GR45110, Ioannina, Greece.
2
Lampros C, Simos T, Exarchos TP, Exarchos KP, Papaloukas C, Fotiadis DI. Assessment of optimized Markov models in protein fold classification. J Bioinform Comput Biol 2014; 12:1450016. [PMID: 25152041 DOI: 10.1142/s0219720014500164] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.4] [Indexed: 11/18/2022]
Abstract
Protein fold classification is a challenging task strongly associated with the determination of protein structure. In this work, we tested an optimization strategy on a Markov chain and on a recently introduced hidden Markov model (HMM) with a reduced state-space topology. Proteins of unknown structure were scored against both models, and the derived scores were then optimized using a local optimization method. The Protein Data Bank (PDB) and the annotation of the Structural Classification of Proteins (SCOP) database were used to evaluate the proposed methodology. The results demonstrated that the fold classification accuracy of the optimized HMM was substantially higher than that of the Markov chain or the reduced state-space HMM approaches. The proposed methodology achieved a fold classification accuracy of 41.4%, while Sequence Alignment and Modeling (SAM), which was used for comparison, reached 38%.
Affiliation(s)
- Christos Lampros
- Department of Materials Science and Engineering, Unit of Medical Technology and Intelligent Information Systems, University of Ioannina, GR 45110 Ioannina, Greece
3
Improving the protein fold recognition accuracy of a reduced state-space hidden Markov model. Comput Biol Med 2009; 39:907-14. [DOI: 10.1016/j.compbiomed.2009.07.007] [Citation(s) in RCA: 14] [Impact Index Per Article: 0.9] [Received: 10/27/2008] [Revised: 07/10/2009] [Accepted: 07/13/2009] [Indexed: 11/19/2022]
4
Exarchos TP, Papaloukas C, Lampros C, Fotiadis DI. Mining sequential patterns for protein fold recognition. J Biomed Inform 2007; 41:165-79. [PMID: 17573243 DOI: 10.1016/j.jbi.2007.05.004] [Citation(s) in RCA: 13] [Impact Index Per Article: 0.8] [Received: 11/20/2006] [Revised: 04/06/2007] [Accepted: 05/05/2007] [Indexed: 10/23/2022]
Abstract
Protein data contain discriminative patterns that can be used in many beneficial applications if they are defined correctly. In this work, sequential pattern mining (SPM) is used for sequence-based fold recognition. Protein classification in terms of fold recognition plays an important role in computational protein analysis, since it can help determine the function of a protein whose structure is unknown. Specifically, cSPADE, one of the most efficient SPM algorithms, is employed for the analysis of protein sequences. A classifier uses the extracted sequential patterns to assign proteins to the appropriate fold category. For training and evaluating the proposed method we used the protein sequences from the Protein Data Bank and the annotation of the SCOP database. The method achieved an overall accuracy of 25% in a classification problem with 36 candidate categories, rising to 56% when the five most probable protein folds are considered.
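Sequential pattern mining rests on the support of a pattern: the fraction of sequences that contain the pattern's symbols in order, possibly with other residues in between. The sketch below computes that quantity for a single candidate pattern, with an optional maximum-gap constraint of the kind cSPADE supports. It is not cSPADE itself, which enumerates frequent patterns efficiently using vertical id-lists, and it does not reproduce the authors' classifier; the function names, the `max_gap` parameter and the toy sequences are assumptions for illustration.

```python
def occurs(seq, pattern, max_gap=None):
    """True if pattern's symbols appear in seq in order.

    If max_gap is given, at most max_gap residues may lie between
    consecutive matched residues (a cSPADE-style gap constraint).
    """
    if not pattern:
        return True
    # ends: positions in seq where a match of the pattern prefix can end
    ends = [i for i, c in enumerate(seq) if c == pattern[0]]
    for symbol in pattern[1:]:
        ends = [
            j
            for j, c in enumerate(seq)
            if c == symbol
            and any(i < j and (max_gap is None or j - i - 1 <= max_gap) for i in ends)
        ]
        if not ends:
            return False
    return bool(ends)


def support(sequences, pattern, max_gap=None):
    """Fraction of sequences that contain pattern (under the gap constraint)."""
    return sum(occurs(s, pattern, max_gap) for s in sequences) / len(sequences)


# Hypothetical toy sequences (one-letter amino acid codes)
seqs = ["ACDEF", "AXCYD", "CADE"]
print(support(seqs, "ACD"))             # 2 of the 3 sequences contain A..C..D
print(support(seqs, "ACD", max_gap=0))  # only the contiguous "ACD" counts
```

Tracking every position where a prefix match can end, rather than greedily taking the first match, is what keeps the gap-constrained check correct.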
Affiliation(s)
- Themis P Exarchos
- Department of Medical Physics, Medical School, University of Ioannina, GR 45110 Ioannina, Greece
5
Floudas C, Fung H, McAllister S, Mönnigmann M, Rajgaria R. Advances in protein structure prediction and de novo protein design: A review. Chem Eng Sci 2006. [DOI: 10.1016/j.ces.2005.04.009] [Citation(s) in RCA: 175] [Impact Index Per Article: 9.7] [Indexed: 01/01/2023]
6
Constantine KL, Krystek SR, Healy MD, Doyle ML, Siemers NO, Thanassi J, Yan N, Xie D, Goldfarb V, Yanchunas J, Tao L, Dougherty BA, Farmer BT. Structural and functional characterization of CFE88: evidence that a conserved and essential bacterial protein is a methyltransferase. Protein Sci 2005; 14:1472-84. [PMID: 15929997 PMCID: PMC2253378 DOI: 10.1110/ps.051389605] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.2] [Received: 02/01/2005] [Revised: 03/11/2005] [Accepted: 03/12/2005] [Indexed: 10/25/2022]
Abstract
CFE88 is a conserved essential gene product from Streptococcus pneumoniae. This 227-residue protein has minimal sequence similarity to proteins of known 3D structure. Sequence alignment models and computational protein threading studies suggest that CFE88 is a methyltransferase. Characterization of the conformation and function of CFE88 has been performed by using several techniques. Backbone atom and limited side-chain atom NMR resonance assignments have been obtained. The data indicate that CFE88 has two domains: an N-terminal domain with 163 residues and a C-terminal domain with 64 residues. The C-terminal domain is primarily helical, while the N-terminal domain has a mixed helical/extended (Rossmann) fold. By aligning the experimentally observed elements of secondary structure, an initial unrefined model of CFE88 has been constructed based on the X-ray structure of ErmC' methyltransferase (Protein Data Bank entry 1QAN). NMR and biophysical studies demonstrate binding of S-adenosyl-L-homocysteine (SAH) to CFE88; these interactions have been localized by NMR to the predicted active site in the N-terminal domain. Mutants that target this predicted active site (H26W, E46R, and E46W) have been constructed and characterized. Overall, our results both indicate that CFE88 is a methyltransferase and further suggest that the methyltransferase activity is essential for bacterial survival.
Affiliation(s)
- Keith L Constantine
- Bristol-Myers Squibb Pharmaceutical Research Institute, P.O. Box 4000, Princeton, NJ 08543.
7
Sippl MJ, Lackner P, Domingues FS, Prlić A, Malik R, Andreeva A, Wiederstein M. Assessment of the CASP4 fold recognition category. Proteins 2002; Suppl 5:55-67. [PMID: 11835482 DOI: 10.1002/prot.10006] [Citation(s) in RCA: 49] [Impact Index Per Article: 2.2] [Indexed: 11/12/2022]
Abstract
We present the assessment of the CASP4 fold recognition category. The tasks we had to execute include the splitting of multidomain targets into single domains, the classification of target domains in terms of prediction categories, the numerical evaluation of predictions, the mapping of numerical scores to quality indices, the ranking of predictors, the selection of top-performing groups, and the analysis and critical discussion of the state of the art in this field. The 125 fold recognition groups were assessed by a total score that summarizes their performance over all targets and a quality score reflecting the average quality of the submitted models. Most of the top-performing groups achieved respectable results on both scores simultaneously. Several groups submitted models that were much closer to the respective target structures than any of the known folds in the Protein Data Bank. The CASP4 assessment included the automated servers of the parallel CAFASP experiment. For the total score, the highest rank achieved by a fully automated server is 12. Two thirds of the predictors have rather low scores.
Affiliation(s)
- M J Sippl
- Center for Applied Molecular Engineering, Institute for Chemistry and Biochemistry, University of Salzburg, Salzburg, Austria.
8
Skelton NJ, Russell S, de Sauvage F, Cochran AG. Amino acid determinants of beta-hairpin conformation in erythropoeitin receptor agonist peptides derived from a phage display library. J Mol Biol 2002; 316:1111-25. [PMID: 11884148 DOI: 10.1006/jmbi.2002.5410] [Citation(s) in RCA: 14] [Impact Index Per Article: 0.6] [Indexed: 11/22/2022]
Abstract
Display of peptide libraries on filamentous phage has led to the identification of peptides of the form X(2-5)CX(2)GPXTWXCX(2-5) (where X is a variable residue) that bind to the extra-cellular portion of the erythropoietin receptor (EPO-R). These peptides adopt beta-hairpin conformations when co-crystallized with EPO-R. Solution NMR studies reveal that the peptide is conformationally heterogeneous in the absence of receptor due to cis-trans isomerization about the Gly-Pro peptide bond. Replacement of the conserved threonine residue with glycine at the turn i+3 position produces a stable beta-hairpin conformation in solution, although this peptide no longer has activity in an EPO-R-dependent cell proliferation assay. A truncated form of the EPO-R-binding peptide (containing the i+3 glycine residue) also forms a highly populated, monomeric beta-hairpin. In contrast, phage-derived peptide antagonists of insulin-like growth factor binding protein 1 (IGFBP-1) have a high level of sequence identity with the truncated EPO-R peptide (eight of 12 residues) yet adopt a turn-alpha-helix conformation in solution. Peptides containing all possible pairwise amino acid substitutions between the EPO-R and IGFBP-1 peptides have been analyzed to assess the degree to which the non-conserved residues stabilize the hairpin or helix conformation. All four residues present in the original sequence are required for maximum population of either the beta-hairpin or alpha-helix conformation, although some substitutions have a more dominant effect. The results demonstrate that, within a given sequence, the observed conformation can be dictated by a small subset of the residues (in this case four out of 12).
Affiliation(s)
- Nicholas J Skelton
- Department of Protein Engineering, Genentech, Inc., 1 DNA Way, South San Francisco, CA 94080, USA.
9
Yan M, Zhang Z, Brady JR, Schilbach S, Fairbrother WJ, Dixit VM. Identification of a novel death domain-containing adaptor molecule for ectodysplasin-A receptor that is mutated in crinkled mice. Curr Biol 2002; 12:409-13. [PMID: 11882293 DOI: 10.1016/s0960-9822(02)00687-5] [Citation(s) in RCA: 82] [Impact Index Per Article: 3.7] [Indexed: 10/27/2022]
Abstract
Hypohidrotic Ectodermal Dysplasia (HED) is a genetic disease seen in humans and mice. It is characterized by loss of hair, sweat glands, and teeth. The predominant X-linked form results from mutations in ectodysplasin-A (EDA), a TNF-like ligand. A phenotypically indistinguishable autosomal form of the disease results from mutations in the receptor for EDA (EDAR). EDAR is an NF-kappaB-activating, death domain-containing member of the TNF receptor family. crinkled, a distinct autosomal form of HED, was discovered in a mouse strain in which both the ligand (EDA) and receptor (EDAR) were wild-type, suggestive of a disruption further downstream in the signaling pathway. Employing a forward genetic approach, we have cloned crinkled (CR) and find it to encode a novel death domain-containing adaptor. crinkled binds EDAR through a homotypic death domain interaction and mediates engagement of the NF-kappaB pathway, possibly by recruiting TRAF2 to the receptor-signaling complex. This is an unprecedented example of naturally occurring mutations in ligand, receptor, or adaptor giving rise to the same phenotypic disease characterized by a defect in the proper development of epidermal appendages.
Affiliation(s)
- Minhong Yan
- Department of Molecular Oncology, Genentech Inc., South San Francisco, CA 94080, USA
10
Fairbrother WJ, Gordon NC, Humke EW, O'Rourke KM, Starovasnik MA, Yin JP, Dixit VM. The PYRIN domain: a member of the death domain-fold superfamily. Protein Sci 2001; 10:1911-8. [PMID: 11514682 PMCID: PMC2253208 DOI: 10.1110/ps.13801] [Citation(s) in RCA: 122] [Impact Index Per Article: 5.3] [Indexed: 10/17/2022]
Abstract
PYRIN domains were identified recently as putative protein-protein interaction domains at the N-termini of several proteins thought to function in apoptotic and inflammatory signaling pathways. The approximately 95-residue PYRIN domains have no statistically significant sequence homology to proteins with known three-dimensional structure. Using secondary structure prediction and potential-based fold recognition methods, however, the PYRIN domain is predicted to be a member of the six-helix bundle death domain-fold superfamily that includes death domains (DDs), death effector domains (DEDs), and caspase recruitment domains (CARDs). Members of the death domain-fold superfamily are well established mediators of protein-protein interactions found in many proteins involved in apoptosis and inflammation, indicating further that the PYRIN domains serve a similar function. A homology model of the PYRIN domain of CARD7/DEFCAP/NAC/NALP1, a member of the Apaf-1/Ced-4 family of proteins, was constructed using the three-dimensional structures of the FADD and p75 neurotrophin receptor DDs, and of the Apaf-1 and caspase-9 CARDs, as templates. Validation of the model using a variety of computational techniques indicates that the fold prediction is consistent with the sequence. Comparison of a circular dichroism spectrum of the PYRIN domain of CARD7/DEFCAP/NAC/NALP1 with spectra of several proteins known to adopt the death domain-fold provides experimental support for the structure prediction.
Affiliation(s)
- W J Fairbrother
- Department of Protein Engineering, Genentech, Inc., South San Francisco, California 94080, USA.
11
Mann PA, Xiong L, Mankin AS, Chau AS, Mendrick CA, Najarian DJ, Cramer CA, Loebenberg D, Coates E, Murgolo NJ, Aarestrup FM, Goering RV, Black TA, Hare RS, McNicholas PM. EmtA, a rRNA methyltransferase conferring high-level evernimicin resistance. Mol Microbiol 2001; 41:1349-56. [PMID: 11580839 DOI: 10.1046/j.1365-2958.2001.02602.x] [Citation(s) in RCA: 47] [Impact Index Per Article: 2.0] [Indexed: 11/20/2022]
Abstract
Enterococcus faecium strain 9631355 was isolated from animal sources on the basis of its resistance to the growth promotant avilamycin. The strain also exhibited high-level resistance to evernimicin, a drug undergoing evaluation as a therapeutic agent in humans. Ribosomes from strain 9631355 exhibited a dramatic reduction in evernimicin binding, shown by both cell-free translation assays and direct-binding assays. The resistance determinant was cloned from strain 9631355; sequence alignments suggested it was a methyltransferase and therefore it was designated emtA for evernimicin methyltransferase. Evernimicin resistance was transmissible and emtA was localized to a plasmid-borne insertion element. Purified EmtA methylated 50S subunits from an evernimicin-sensitive strain 30-fold more efficiently than those from a resistant strain. Reverse transcription identified a pause site that was unique to the 23S rRNA extracted from resistant ribosomes. The pause corresponded to methylation of residue G2470 (Escherichia coli numbering). RNA footprinting revealed that G2470 is located within the evernimicin-binding site on the ribosome, thus providing an explanation for the reduced binding of the drug to methylated ribosomes.
MESH Headings
- Aminoglycosides
- Animals
- Anti-Bacterial Agents/metabolism
- Anti-Bacterial Agents/pharmacology
- Base Sequence
- Binding Sites
- Cloning, Molecular
- DNA Transposable Elements/genetics
- DNA, Bacterial/genetics
- Drug Resistance, Bacterial/genetics
- Drug Resistance, Bacterial/physiology
- Enterococcus faecium/drug effects
- Enterococcus faecium/enzymology
- Enterococcus faecium/genetics
- Genes, Bacterial
- Humans
- Methyltransferases/genetics
- Methyltransferases/metabolism
- Molecular Sequence Data
- Nucleic Acid Conformation
- Plasmids/genetics
- RNA, Bacterial/chemistry
- RNA, Bacterial/genetics
- RNA, Bacterial/metabolism
- RNA, Ribosomal/chemistry
- RNA, Ribosomal/genetics
- RNA, Ribosomal/metabolism
- Ribosomes/metabolism
Affiliation(s)
- P A Mann
- Schering Plough Research Institute, 2015 Galloping Hill Road, Kenilworth, NJ 07033, USA
12
Novotny J, Rigoutsos I, Coleman D, Shenk T. In silico structural and functional analysis of the human cytomegalovirus (HHV5) genome. J Mol Biol 2001; 310:1151-66. [PMID: 11502002 DOI: 10.1006/jmbi.2001.4798] [Citation(s) in RCA: 38] [Impact Index Per Article: 1.7] [Indexed: 11/22/2022]
Abstract
The open reading frames of human cytomegalovirus (human herpesvirus-5, HHV5) encode some 213 unique proteins, most of unknown function. Using the threading program ProCeryon, we calculated possible matches between the amino acid sequences of these proteins and the Protein Data Bank library of three-dimensional structures. Thirty-six proteins were fully identified in terms of their structure and, often, function; 65 proteins were recognized as members of narrow structural/functional families (e.g. DNA-binding factors, cytokines, enzymes, signaling particles, cell surface receptors); and 87 proteins were assigned to broad structural classes (e.g. all-beta, 3-layer alpha/beta/alpha, multidomain). Genes encoding proteins with similar folds, or containing identical structural traits (extreme sequence length, runs of unstructured (Pro- and/or Gly-rich) residues, transmembrane segments, etc.), often formed tandem clusters throughout the genome. In the course of this work, benchmarks on about 20 known folds were used to optimize adjustable parameters of the threading calculations, namely the gap penalty weights used in sequence/structure alignments, new scores obtained as simple combinations of existing scoring functions, and the number of threading runs conducive to meaningful results. The introduction of summed, per-residue-normalized scores was essential for discovering subdomains (EGF-like, SH2, SH3) in longer protein sequences, such as the eight "open sandwich" cytokine domains, 60-70 amino acids long and having the 3beta1alpha fold with one or two disulfide bridges, present in otherwise unrelated proteins.
Affiliation(s)
- J Novotny
- Victor Chang Cardiac Research Institute, Darlinghurst, NSW, Australia.
13
Di Gennaro JA, Siew N, Hoffman BT, Zhang L, Skolnick J, Neilson LI, Fetrow JS. Enhanced functional annotation of protein sequences via the use of structural descriptors. J Struct Biol 2001; 134:232-45. [PMID: 11551182 DOI: 10.1006/jsbi.2001.4391] [Citation(s) in RCA: 40] [Impact Index Per Article: 1.7] [Indexed: 11/22/2022]
Abstract
In order to circumvent limitations of sequence based methods in the process of making functional predictions for proteins, we have developed a methodology that uses a sequence-to-structure-to-function paradigm. First, an approximate three-dimensional structure is predicted. Then, a three-dimensional descriptor of the functional site, termed a Fuzzy Functional Form, or FFF, is used to screen the structure for the presence of the functional site of interest (Fetrow et al., 1998; Fetrow and Skolnick, 1998). Previously, a disulfide oxidoreductase FFF was developed and applied to predicted structures obtained from a small structural database. Here, using a substantially larger structural database, we expand the analysis of the disulfide oxidoreductase FFF to the B. subtilis genome. To ascertain the performance of the FFF, its results are compared to those obtained using both the sequence alignment method BLAST and three local sequence motif databases: PRINTS, Prosite, and Blocks. The FFF method is then compared in detail to Blocks and it is shown that the FFF is more flexible and sensitive in finding a specific function in a set of unknown proteins. In addition, the estimated false positive rate of function prediction is significantly lower using the FFF structural motif, rather than the standard sequence motif methods. We also present a second FFF and describe a specific example of the results of its whole-genome application to D. melanogaster using a newer threading algorithm. Our results from all of these studies indicate that the addition of three-dimensional structural information adds significant value in the prediction of biochemical function of genomic sequences.
Affiliation(s)
- J A Di Gennaro
- GeneFormatics, Incorporated, 5830 Oberlin Drive, Suite 200, San Diego, California 92121, USA.
14
Prlić A, Domingues FS, Sippl MJ. Structure-derived substitution matrices for alignment of distantly related sequences. Protein Eng 2000; 13:545-50. [PMID: 10964983 DOI: 10.1093/protein/13.8.545] [Citation(s) in RCA: 83] [Impact Index Per Article: 3.5] [Indexed: 11/13/2022]
Abstract
Sequence alignment is a standard method to infer evolutionary, structural, and functional relationships among sequences. The quality of alignments depends on the substitution matrix used. Here we derive matrices based on superimpositions from protein pairs of similar structure, but of low or no sequence similarity. In a performance test the matrices are compared with 12 other previously published matrices. It is found that the structure-derived matrices are applicable for comparisons of distantly related sequences. We investigate the influence of evolutionary relationships of protein pairs on the alignment accuracy.
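Substitution matrices are conventionally expressed as log-odds scores, s(a, b) = log2(p_ab / (q_a · q_b)), where p_ab is the frequency with which residues a and b are aligned in trusted alignments and q_a, q_b are background frequencies. The abstract does not give the exact formula the authors used, so the sketch below assumes this common log-odds form, takes the structurally superimposed pairs as already-aligned, gap-free strings, and adds a pseudocount to avoid log(0). The function name, the pseudocount and the toy two-letter data are hypothetical.

```python
import math
from collections import Counter


def log_odds_matrix(aligned_pairs, alphabet, pseudocount=1.0):
    """Symmetric log-odds scores (bits) from gap-free aligned sequence pairs."""
    counts = Counter()
    for s1, s2 in aligned_pairs:
        for a, b in zip(s1, s2):
            # Count both orders so that score(a, b) == score(b, a)
            counts[(a, b)] += 1
            counts[(b, a)] += 1
    # Pseudocounts avoid log(0) for pairs never observed
    total = sum(counts.values()) + pseudocount * len(alphabet) ** 2
    pair_p = {
        (a, b): (counts[(a, b)] + pseudocount) / total
        for a in alphabet for b in alphabet
    }
    # Background frequency of each residue = marginal of the pair distribution
    background = {a: sum(pair_p[(a, b)] for b in alphabet) for a in alphabet}
    return {
        (a, b): math.log2(pair_p[(a, b)] / (background[a] * background[b]))
        for a in alphabet for b in alphabet
    }


# Hypothetical superimposed pairs over a two-letter toy alphabet
pairs = [("AAB", "AAB"), ("ABB", "ABA")]
scores = log_odds_matrix(pairs, "AB")
print(round(scores[("A", "A")], 3), round(scores[("A", "B")], 3))
```

Positive scores mark substitutions seen more often than chance in the trusted alignments; negative scores mark substitutions seen less often.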
Affiliation(s)
- A Prlić
- Center of Applied Molecular Engineering, Institute for Chemistry and Biochemistry, University of Salzburg, Jakob-Haringerstrasse 3, A-5020 Salzburg, Austria
15
Holcomb IN, Kabakoff RC, Chan B, Baker TW, Gurney A, Henzel W, Nelson C, Lowman HB, Wright BD, Skelton NJ, Frantz GD, Tumas DB, Peale FV, Shelton DL, Hébert CC. FIZZ1, a novel cysteine-rich secreted protein associated with pulmonary inflammation, defines a new gene family. EMBO J 2000; 19:4046-55. [PMID: 10921885 PMCID: PMC306596 DOI: 10.1093/emboj/19.15.4046] [Citation(s) in RCA: 475] [Impact Index Per Article: 19.8] [Indexed: 01/16/2023]
Abstract
Bronchoalveolar lavage fluid from mice with experimentally induced allergic pulmonary inflammation contains a novel 9.4 kDa cysteine-rich secreted protein, FIZZ1 (found in inflammatory zone). Murine (m) FIZZ1 is the founding member of a new gene family including two other murine genes expressed, respectively, in intestinal crypt epithelium and white adipose tissue, and two related human genes. In control mice, FIZZ1 mRNA and protein expression occur at low levels in a subset of bronchial epithelial cells and in non-neuronal cells adjacent to neurovascular bundles in the peribronchial stroma, and in the wall of the large and small bowel. During allergic pulmonary inflammation, mFIZZ1 expression markedly increases in hypertrophic, hyperplastic bronchial epithelium and appears in type II alveolar pneumocytes. In vitro, recombinant mFIZZ1 inhibits the nerve growth factor (NGF)-mediated survival of rat embryonic day 14 dorsal root ganglion (DRG) neurons and NGF-induced CGRP gene expression in adult rat DRG neurons. In vivo, FIZZ1 may modulate the function of neurons innervating the bronchial tree, thereby altering the local tissue response to allergic pulmonary inflammation.
Affiliation(s)
- I N Holcomb
- Department of Pathology, Genentech, Inc., 1 DNA Way, South San Francisco, CA 94080, USA
16
Jung J, Lee B. Use of residue pairs in protein sequence-sequence and sequence-structure alignments. Protein Sci 2000; 9:1576-88. [PMID: 10975579 PMCID: PMC2144723 DOI: 10.1110/ps.9.8.1576] [Citation(s) in RCA: 10] [Impact Index Per Article: 0.4] [Indexed: 10/21/2022]
Abstract
Two new sets of scoring matrices are introduced: H2 for protein sequence comparison and T2 for protein sequence-structure correlation. Each element of H2 or T2 measures the frequency with which a pair of amino acid types in one protein, k residues apart in the sequence, is aligned with another pair of residues, of given amino acid types (for H2) or in given structural states (for T2), in other structurally homologous proteins. There are four types, corresponding to k values of 1 to 4, for both H2 and T2. These matrices were set up using a large number of structurally homologous protein pairs with little sequence homology between the members of each pair, which were recently generated using the structure comparison program SHEBA. The two scoring matrices were incorporated into the main body of the sequence alignment program SSEARCH in the FASTA package and tested in a fold recognition setting in which a set of 107 test sequences was aligned to each of a panel of 3,539 domains that represent all known protein structures. Six procedures were tested: the straight Smith-Waterman (SW) and FASTA procedures, which used the Blosum62 single residue type substitution matrix; BLAST and PSI-BLAST procedures, which also used the Blosum62 matrix; PASH, which used Blosum62 and H2 matrices; and PASSC, which used Blosum62, H2, and T2 matrices. All procedures gave similar results when the probe and target sequences had greater than 30% sequence identity. However, when the sequence identity was below 30%, a similar structure could be found for more sequences using PASSC than using any other procedure. PASH and PSI-BLAST gave the next best results.
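The baseline in this comparison, the straight Smith-Waterman procedure, finds the best-scoring local alignment by dynamic programming, with each cell floored at zero so an alignment can start anywhere. The sketch below computes only the score, and it makes simplifying assumptions that differ from the programs compared: a match/mismatch score stands in for Blosum62, the gap penalty is linear, and the H2/T2 residue-pair terms are not modeled. The parameter values are hypothetical.

```python
def smith_waterman_score(a, b, match=2, mismatch=-1, gap=-2):
    """Best local alignment score of strings a and b (linear gap penalty)."""
    prev = [0] * (len(b) + 1)  # previous DP row
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # Floor at 0: a local alignment may start fresh at any cell
            curr[j] = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
            best = max(best, curr[j])
        prev = curr
    return best


print(smith_waterman_score("XXACGTXX", "ACGT"))  # → 8 (four matches)
print(smith_waterman_score("AAAA", "TTTT"))      # → 0 (no positive local match)
```

Keeping only two rows brings memory down to O(len(b)); recovering the alignment itself would require keeping the full matrix or a traceback structure.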
Affiliation(s)
- J Jung
- Laboratory of Molecular Biology, Division of Basic Sciences, National Cancer Institute, National Institutes of Health, Bethesda, Maryland 20892, USA
17
Abstract
Proteins might have considerable structural similarities even when no evolutionary relationship of their sequences can be detected. Such proteins are often said to share only a "fold". Within each fold there are also sequences of common origin, called a "superfamily", and within these, groups of sequences with clear similarities, designated a "family". Developing algorithms to reliably identify proteins related at any level is one of the most important challenges in the fast-growing field of bioinformatics today. However, it is not at all certain that a method proficient at finding sequence similarities performs well at the other levels, or vice versa. Here, we have compared the performance of various search methods on these different levels of similarity. As expected, we show that it becomes much harder to detect proteins as their sequences diverge. For family-related sequences the best method gets 75% of the top hits correct. When the sequences differ but the proteins belong to the same superfamily this drops to 29%, and for proteins with only fold similarity it is as low as 15%. We have made a more complete analysis of the performance of different algorithms than earlier studies, also including threading methods in the comparison. This analysis gives a more detailed picture, showing that multiple sequence information improves detection at the two closer levels of relationship. We have also compared the different methods of including this information in prediction algorithms. At lower specificities, the best scheme is a linking method that connects proteins through an intermediate hit. At higher specificities, better performance is obtained by PSI-BLAST and some procedures using hidden Markov models. We also show that a threading method, THREADER, performs significantly better than any other method at fold recognition.
Affiliation(s)
- E Lindahl
- Royal Institute of Technology, Stockholm, SE-100 44, Sweden
18
19
Domingues FS, Koppensteiner WA, Jaritz M, Prlic A, Weichenberger C, Wiederstein M, Floeckner H, Lackner P, Sippl MJ. Sustained performance of knowledge-based potentials in fold recognition. Proteins 1999. [DOI: 10.1002/(sici)1097-0134(1999)37:3+<112::aid-prot15>3.0.co;2-r] [Citation(s) in RCA: 32] [Impact Index Per Article: 1.3] [Indexed: 11/07/2022]
20
Mirny LA, Shakhnovich EI. Protein structure prediction by threading. Why it works and why it does not. J Mol Biol 1998; 283:507-26. [PMID: 9769221 DOI: 10.1006/jmbi.1998.2092] [Citation(s) in RCA: 39] [Impact Index Per Article: 1.5] [Indexed: 11/22/2022]
Abstract
We developed a novel Monte Carlo threading algorithm that allows gaps and insertions in both the template structure and the threaded sequence. The algorithm is able to find the optimal sequence-structure alignment and to sample suboptimal alignments. Using our algorithm we performed sequence-structure alignments for a number of examples from three protein folds (ubiquitin, immunoglobulin and globin), using both an "ideal" set of potentials (optimized to provide the best Z-score for a given protein) and more realistic knowledge-based potentials. Two physically different scenarios emerged. If a template structure is similar to the native one (within 2 Å RMS), then (i) the optimal threading alignment is correct and robust with respect to deviations of the potential from the "ideal" one; (ii) suboptimal alignments are very similar to the optimal one; (iii) as the Monte Carlo temperature decreases, a sharp cooperative transition to the optimal alignment is observed. In contrast, if the template structure is only moderately close to the native structure (RMS greater than 3.5 Å), then (i) the optimal alignment changes dramatically when an "ideal" potential is substituted by the real one; (ii) the structures of suboptimal alignments are very different from the optimal one, reducing the reliability of the alignment; (iii) the transition to the apparently optimal alignment is non-cooperative. In the intermediate cases, when the RMS between the template and the native conformations is between 2 Å and 3.5 Å, the success of threading alignment may depend on the quality of the potentials used. These results are rationalized in terms of a threading free energy landscape. Possible ways to overcome the fundamental limitations of threading are discussed briefly.
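The temperature dependence described above is a property of Monte Carlo sampling in general. The abstract does not spell out the acceptance rule, so the sketch below assumes the standard Metropolis criterion, under which a proposed alignment change with energy difference delta_e is accepted with probability min(1, exp(-delta_e / T)). It illustrates only that criterion, not the authors' move set, alignment representation or potentials; energies are in arbitrary units and the function name is hypothetical.

```python
import math
import random


def metropolis_accept(delta_e, temperature, rng=random):
    """Accept a proposed move with probability min(1, exp(-delta_e / temperature))."""
    if delta_e <= 0:
        return True  # downhill (or neutral) moves are always accepted
    return rng.random() < math.exp(-delta_e / temperature)


# Fraction of accepted uphill moves (delta_e = 1, arbitrary units) as T falls
rng = random.Random(0)
for temperature in (2.0, 1.0, 0.25):
    accepted = sum(metropolis_accept(1.0, temperature, rng) for _ in range(10000))
    print(temperature, accepted / 10000)
```

As the temperature drops, uphill moves become rare and the search settles into low-energy alignments; whether that settling is sharp or gradual is the cooperative versus non-cooperative distinction the abstract draws.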
Affiliation(s)
- L A Mirny
- Department of Chemistry and Chemical Biology, Harvard University, 12 Oxford Street, Cambridge, MA, 02138, USA