Reference Citation Analysis: Find an Article, Find a Category, Find a Journal, Find a Scholar

For: Mehta PK, Heringa J, Argos P. A simple and fast approach to prediction of protein secondary structure from multiply aligned sequences with accuracy above 70%. Protein Sci 1995;4:2517-25. [PMID: 8580842 PMCID: PMC2143048 DOI: 10.1002/pro.5560041208] [Citation(s) in RCA: 63] [Impact Index Per Article: 2.2] [Indexed: 01/31/2023]

Cited by Other Article(s)

Pan Q, Portelli S, Nguyen TB, Ascher DB. Characterization on the oncogenic effect of the missense mutations of p53 via machine learning. Brief Bioinform 2023;25:bbad428. [PMID: 38018912 PMCID: PMC10685404 DOI: 10.1093/bib/bbad428] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/14/2023] [Revised: 10/13/2023] [Accepted: 11/05/2023] [Indexed: 11/30/2023] Open

Nacar C. Propensities of Some Amino Acid Pairings in α-Helices Vary with Length. Protein J 2022;41:551-562. [PMID: 36169766 DOI: 10.1007/s10930-022-10076-3] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Accepted: 09/15/2022] [Indexed: 11/29/2022]

Heinzinger M, Elnaggar A, Wang Y, Dallago C, Nechaev D, Matthes F, Rost B. Modeling aspects of the language of life through transfer-learning protein sequences. BMC Bioinformatics 2019;20:723. [PMID: 31847804 PMCID: PMC6918593 DOI: 10.1186/s12859-019-3220-8] [Citation(s) in RCA: 223] [Impact Index Per Article: 44.6] [Received: 05/03/2019] [Accepted: 11/13/2019] [Indexed: 12/15/2022] Open

Abstract

BACKGROUND

Predicting protein function and structure from sequence is one important challenge for computational biology. For 26 years, most state-of-the-art approaches combined machine learning and evolutionary information. However, for some applications retrieving related proteins is becoming too time-consuming. Additionally, evolutionary information is less powerful for small families, e.g. for proteins from the Dark Proteome. Both these problems are addressed by the new methodology introduced here.

RESULTS

We introduced a novel way to represent protein sequences as continuous vectors (embeddings) by using the language model ELMo taken from natural language processing. By modeling protein sequences, ELMo effectively captured the biophysical properties of the language of life from unlabeled big data (UniRef50). We refer to these new embeddings as SeqVec (Sequence-to-Vector) and demonstrate their effectiveness by training simple neural networks for two different tasks. At the per-residue level, secondary structure (Q3 = 79% ± 1, Q8 = 68% ± 1) and regions with intrinsic disorder (MCC = 0.59 ± 0.03) were predicted significantly better than through one-hot encoding or through Word2vec-like approaches. At the per-protein level, subcellular localization was predicted in ten classes (Q10 = 68% ± 1) and membrane-bound proteins were distinguished from water-soluble proteins (Q2 = 87% ± 1). Although SeqVec embeddings generated the best predictions from single sequences, no solution improved over the best existing method using evolutionary information. Nevertheless, our approach improved over some popular methods using evolutionary information and for some proteins even beat the best. Thus, the embeddings appear to condense the underlying principles of protein sequences. Overall, the important novelty is speed: where the lightning-fast HHblits needed on average about two minutes to generate the evolutionary information for a target protein, SeqVec created embeddings on average in 0.03 s. As this speed-up is independent of the size of growing sequence databases, SeqVec provides a highly scalable approach for the analysis of big data in proteomics, e.g. microbiome or metaproteome analysis.

CONCLUSION

Transfer-learning succeeded in extracting information from unlabeled sequence databases relevant for various protein prediction tasks. SeqVec modeled the language of life, namely the principles underlying protein sequences, better than any features suggested by textbooks and prediction methods. The exception is evolutionary information; however, that information is not available at the level of a single sequence.
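
The per-residue scores quoted in this abstract (Q3 and MCC) follow standard definitions. The sketch below shows how such scores are commonly computed; it is not the authors' evaluation code, and the function names `q3_accuracy` and `mcc` as well as the toy label strings are assumptions made purely for illustration.

```python
from math import sqrt


def q3_accuracy(observed: str, predicted: str) -> float:
    """Fraction of residues whose three-state label (H, E or C) matches."""
    if len(observed) != len(predicted):
        raise ValueError("observed and predicted must have the same length")
    if not observed:
        raise ValueError("sequences must be non-empty")
    correct = sum(o == p for o, p in zip(observed, predicted))
    return correct / len(observed)


def mcc(observed, predicted) -> float:
    """Matthews correlation coefficient for binary labels (truthy = positive).

    Returns 0.0 when the denominator is zero, a common convention.
    """
    if len(observed) != len(predicted):
        raise ValueError("observed and predicted must have the same length")
    pairs = [(bool(o), bool(p)) for o, p in zip(observed, predicted)]
    tp = sum(o and p for o, p in pairs)
    tn = sum((not o) and (not p) for o, p in pairs)
    fp = sum((not o) and p for o, p in pairs)
    fn = sum(o and (not p) for o, p in pairs)
    denom = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return 0.0 if denom == 0 else (tp * tn - fp * fn) / denom


if __name__ == "__main__":
    # Hypothetical toy data: one residue out of nine is mislabelled.
    print(q3_accuracy("HHHEEECCC", "HHHEEECCH"))
    # Hypothetical disorder labels (1 = disordered, 0 = ordered).
    print(mcc([1, 1, 0, 0, 1], [1, 0, 0, 0, 1]))
```

Q3 rewards every correct residue equally, whereas MCC stays informative when one class (for example, disordered residues) is rare, which is presumably why the abstract reports MCC for disorder.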

Affiliation(s)

Michael Heinzinger: Department of Informatics, Bioinformatics & Computational Biology - i12, TUM (Technical University of Munich), Boltzmannstr. 3, 85748, Garching/Munich, Germany; TUM Graduate School, Center of Doctoral Studies in Informatics and its Applications (CeDoSIA), Boltzmannstr. 11, 85748, Garching, Germany.
Ahmed Elnaggar: Department of Informatics, Bioinformatics & Computational Biology - i12, TUM (Technical University of Munich), Boltzmannstr. 3, 85748, Garching/Munich, Germany; TUM Graduate School, Center of Doctoral Studies in Informatics and its Applications (CeDoSIA), Boltzmannstr. 11, 85748, Garching, Germany.
Yu Wang: Leibniz Supercomputing Centre, Boltzmannstr. 1, 85748, Garching/Munich, Germany.
Christian Dallago: Department of Informatics, Bioinformatics & Computational Biology - i12, TUM (Technical University of Munich), Boltzmannstr. 3, 85748, Garching/Munich, Germany; TUM Graduate School, Center of Doctoral Studies in Informatics and its Applications (CeDoSIA), Boltzmannstr. 11, 85748, Garching, Germany.
Dmitrii Nechaev: Department of Informatics, Bioinformatics & Computational Biology - i12, TUM (Technical University of Munich), Boltzmannstr. 3, 85748, Garching/Munich, Germany; TUM Graduate School, Center of Doctoral Studies in Informatics and its Applications (CeDoSIA), Boltzmannstr. 11, 85748, Garching, Germany.
Florian Matthes: TUM Department of Informatics, Software Engineering and Business Information Systems, Boltzmannstr. 1, 85748, Garching/Munich, Germany.
Burkhard Rost: Department of Informatics, Bioinformatics & Computational Biology - i12, TUM (Technical University of Munich), Boltzmannstr. 3, 85748, Garching/Munich, Germany; Institute for Advanced Study (TUM-IAS), Lichtenbergstr. 2a, 85748, Garching/Munich, Germany; TUM School of Life Sciences Weihenstephan (WZW), Alte Akademie 8, Freising, Germany; Department of Biochemistry and Molecular Biophysics & New York Consortium on Membrane Protein Structure (NYCOMPS), Columbia University, 701 West, 168th Street, New York, NY, 10032, USA.

Depth dependent amino acid substitution matrices and their use in predicting deleterious mutations. PROGRESS IN BIOPHYSICS AND MOLECULAR BIOLOGY 2017;128:14-23. [DOI: 10.1016/j.pbiomolbio.2017.02.004] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.6] [Received: 09/13/2016] [Revised: 01/06/2017] [Accepted: 02/07/2017] [Indexed: 12/31/2022]

Capturing native/native like structures with a physico-chemical metric (pcSM) in protein folding. BIOCHIMICA ET BIOPHYSICA ACTA-PROTEINS AND PROTEOMICS 2013;1834:1520-31. [PMID: 23665455 DOI: 10.1016/j.bbapap.2013.04.023] [Citation(s) in RCA: 17] [Impact Index Per Article: 1.5] [Received: 02/15/2013] [Revised: 04/12/2013] [Accepted: 04/15/2013] [Indexed: 12/15/2022]

Glembo TJ, Farrell DW, Gerek ZN, Thorpe MF, Ozkan SB. Collective dynamics differentiates functional divergence in protein evolution. PLoS Comput Biol 2012;8:e1002428. [PMID: 22479170 PMCID: PMC3315450 DOI: 10.1371/journal.pcbi.1002428] [Citation(s) in RCA: 26] [Impact Index Per Article: 2.2] [Received: 10/11/2011] [Accepted: 01/30/2012] [Indexed: 12/29/2022] Open

Abstract

Protein evolution is most commonly studied by analyzing related protein sequences and generating ancestral sequences through Bayesian and Maximum Likelihood methods, and/or by resurrecting ancestral proteins in the lab and performing ligand binding studies to determine function. Structural and dynamic evolution have largely been left out of molecular evolution studies. Here we incorporate both structure and dynamics to elucidate the molecular principles behind the divergence in the evolutionary path of the steroid receptor proteins. We determine the likely structure of three evolutionarily diverged ancestral steroid receptor proteins using the Zipping and Assembly Method with FRODA (ZAMF). Our predictions are within ∼2.7 Å all-atom RMSD of the respective crystal structures of the ancestral steroid receptors. Beyond static structure prediction, a particular feature of ZAMF is that it generates protein dynamics information. We investigate the differences in conformational dynamics of diverged proteins by obtaining the most collective motion through essential dynamics. Strikingly, our analysis shows that evolutionarily diverged proteins of the same family do not share the same dynamic subspace, while those sharing the same function are simultaneously clustered together and distant from those that have functionally diverged. Dynamic analysis also enables those mutations that most affect dynamics to be identified. It correctly predicts all mutations (functional and permissive) necessary to evolve new function and ∼60% of permissive mutations necessary to recover ancestral function.

Proteins are remarkable machines of living systems that show diverse biochemical functions. Biochemical diversity has grown over time via molecular evolution. In order to understand how diversity arose, it is fundamental to understand how the earliest proteins evolved and served as templates for the present diverse proteome. The one sequence - one structure - one function paradigm is being extended to a new view: an ensemble of different conformations in equilibrium can evolve new function, and the analysis of inherent structural dynamics is crucial to give a more complete understanding of protein evolution. Therefore, we aim to bring structural dynamics into protein evolution through our zipping and assembly method with FRODA (ZAMF). We apply ZAMF to simultaneously obtain structures and structural dynamics of three ancestral sequences of steroid receptor proteins. By comparative dynamics analysis among the three ancestral steroid hormone receptors, (i) we show that changes in the structural dynamics indicate functional divergence and (ii) we identify all functionally critical and most of the permissive mutations necessary to evolve new function. Overall, these findings suggest that conformational dynamics may play an important role where new functions evolve through novel molecular interactions.

Li D, Li T, Cong P, Xiong W, Sun J. A novel structural position-specific scoring matrix for the prediction of protein secondary structures. Bioinformatics 2011;28:32-9. [DOI: 10.1093/bioinformatics/btr611] [Citation(s) in RCA: 28] [Impact Index Per Article: 2.2] [Indexed: 11/14/2022] Open

Schröder A, Eichner J, Supper J, Eichner J, Wanke D, Henneges C, Zell A. Predicting DNA-binding specificities of eukaryotic transcription factors. PLoS One 2010;5:e13876. [PMID: 21152420 PMCID: PMC2994704 DOI: 10.1371/journal.pone.0013876] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.6] [Received: 06/11/2010] [Accepted: 10/14/2010] [Indexed: 11/18/2022] Open

Probing protein fold space with a simplified model. J Mol Biol 2007;375:920-33. [PMID: 18054792 DOI: 10.1016/j.jmb.2007.10.087] [Citation(s) in RCA: 33] [Impact Index Per Article: 1.9] [Received: 08/15/2007] [Revised: 10/15/2007] [Accepted: 10/31/2007] [Indexed: 11/24/2022]

Xiao X, Shao S, Ding Y, Huang Z, Chen X, Chou KC. An application of gene comparative image for predicting the effect on replication ratio by HBV virus gene missense mutation. J Theor Biol 2005;235:555-65. [PMID: 15935173 DOI: 10.1016/j.jtbi.2005.02.008] [Citation(s) in RCA: 80] [Impact Index Per Article: 4.2] [Received: 10/01/2004] [Revised: 12/13/2004] [Accepted: 02/09/2005] [Indexed: 11/26/2022]

Huang JT, Wang MT. Secondary structural wobble: the limits of protein prediction accuracy. Biochem Biophys Res Commun 2002;294:621-5. [PMID: 12056813 DOI: 10.1016/s0006-291x(02)00545-4] [Citation(s) in RCA: 11] [Impact Index Per Article: 0.5] [Indexed: 10/27/2022]

Przybylski D, Rost B. Alignments grow, secondary structure prediction improves. Proteins 2002;46:197-205. [PMID: 11807948 DOI: 10.1002/prot.10029] [Citation(s) in RCA: 142] [Impact Index Per Article: 6.5] [Indexed: 11/10/2022]

Shi J, Blundell TL, Mizuguchi K. FUGUE: sequence-structure homology recognition using environment-specific substitution tables and structure-dependent gap penalties. J Mol Biol 2001;310:243-57. [PMID: 11419950 DOI: 10.1006/jmbi.2001.4762] [Citation(s) in RCA: 922] [Impact Index Per Article: 40.1] [Indexed: 11/22/2022]

Lecompte O, Thompson JD, Plewniak F, Thierry J, Poch O. Multiple alignment of complete sequences (MACS) in the post-genomic era. Gene 2001;270:17-30. [PMID: 11403999 DOI: 10.1016/s0378-1119(01)00461-9] [Citation(s) in RCA: 47] [Impact Index Per Article: 2.0] [Indexed: 11/18/2022]

Jennings AJ, Edge CM, Sternberg MJ. An approach to improving multiple alignments of protein sequences using predicted secondary structure. PROTEIN ENGINEERING 2001;14:227-31. [PMID: 11391014 DOI: 10.1093/protein/14.4.227] [Citation(s) in RCA: 20] [Impact Index Per Article: 0.9] [Indexed: 11/14/2022]

Cuff JA, Barton GJ. Application of multiple sequence alignment profiles to improve protein secondary structure prediction. Proteins 2000;40:502-11. [PMID: 10861942 DOI: 10.1002/1097-0134(20000815)40:3<502::aid-prot170>3.0.co;2-q] [Citation(s) in RCA: 606] [Impact Index Per Article: 25.3] [Indexed: 11/05/2022]

Henikoff S, Henikoff JG. Amino acid substitution matrices. ADVANCES IN PROTEIN CHEMISTRY 2000;54:73-97. [PMID: 10829225 DOI: 10.1016/s0065-3233(00)54003-0] [Citation(s) in RCA: 40] [Impact Index Per Article: 1.7] [Indexed: 12/24/2022]

Zhang CT, Zhang R. A graphic approach to evaluate algorithms of secondary structure prediction. J Biomol Struct Dyn 2000;17:829-42. [PMID: 10798528 DOI: 10.1080/07391102.2000.10506572] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.3] [Indexed: 10/28/2022]

Mugilan SA, Veluraja K. Generation of deviation parameters for amino acid singlets, doublets and triplets from three-dimentional structures of proteins and its implications for secondary structure prediction from amino acid sequences. J Biosci 2000;25:81-91. [PMID: 10824202 DOI: 10.1007/bf02985185] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.2] [Indexed: 10/21/2022]

Gotoh O. Multiple sequence alignment: algorithms and applications. ADVANCES IN BIOPHYSICS 1999;36:159-206. [PMID: 10463075 DOI: 10.1016/s0065-227x(99)80007-0] [Citation(s) in RCA: 62] [Impact Index Per Article: 2.5] [Indexed: 11/20/2022]

Heringa J. Two strategies for sequence comparison: profile-preprocessed and secondary structure-induced multiple alignment. COMPUTERS & CHEMISTRY 1999;23:341-64. [PMID: 10404624 DOI: 10.1016/s0097-8485(99)00012-1] [Citation(s) in RCA: 112] [Impact Index Per Article: 4.5] [Indexed: 11/23/2022]

Jermutus L, Guez V, Bedouelle H. Disordered C-terminal domain of tyrosyl-tRNA synthetase: secondary structure prediction. Biochimie 1999;81:235-44. [PMID: 10385005 DOI: 10.1016/s0300-9084(99)80057-1] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.0] [Indexed: 11/29/2022]

Liu Z, Song D, Kramer A, Martin AC, Dandekar T, Schneider-Mergener J, Bautz EK, Dübel S. Fine mapping of the antigen-antibody interaction of scFv215, a recombinant antibody inhibiting RNA polymerase II from Drosophila melanogaster. J Mol Recognit 1999;12:103-11. [PMID: 10398401 DOI: 10.1002/(sici)1099-1352(199903/04)12:2<103::aid-jmr447>3.0.co;2-b] [Citation(s) in RCA: 13] [Impact Index Per Article: 0.5] [Indexed: 11/09/2022]

Baxevanis AD, Landsman D. Predictive methods using protein sequences. METHODS OF BIOCHEMICAL ANALYSIS 1998;39:246-67. [PMID: 9707934 DOI: 10.1002/9780470110607.ch11] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/10/2022]

Padilla-Zúñiga AJ, Rojo-Domínguez A. Non-homology knowledge-based prediction of the papain prosegment folding pattern: a description of plausible folding and activation mechanisms. FOLDING & DESIGN 1998;3:271-84. [PMID: 9710573 DOI: 10.1016/s1359-0278(98)00038-8] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.2] [Indexed: 02/08/2023]

Pascarella S, De Persio R, Bossa F, Argos P. Easy method to predict solvent accessibility from multiple protein sequence alignments. Proteins 1998. [DOI: 10.1002/(sici)1097-0134(19980801)32:2<190::aid-prot5>3.0.co;2-p] [Citation(s) in RCA: 26] [Impact Index Per Article: 1.0] [Indexed: 11/12/2022]

Kozmin SG, Schaaper RM, Shcherbakova PV, Kulikov VN, Noskov VN, Guetsova ML, Alenin VV, Rogozin IB, Makarova KS, Pavlov YI. Multiple antimutagenesis mechanisms affect mutagenic activity and specificity of the base analog 6-N-hydroxylaminopurine in bacteria and yeast. Mutat Res 1998;402:41-50. [PMID: 9675240 DOI: 10.1016/s0027-5107(97)00280-7] [Citation(s) in RCA: 40] [Impact Index Per Article: 1.5] [Indexed: 02/08/2023]

Macheroux P, Hill S, Austin S, Eydmann T, Jones T, Kim SO, Poole R, Dixon R. Electron donation to the flavoprotein NifL, a redox-sensing transcriptional regulator. Biochem J 1998;332 ( Pt 2):413-9. [PMID: 9601070 PMCID: PMC1219496 DOI: 10.1042/bj3320413] [Citation(s) in RCA: 54] [Impact Index Per Article: 2.1] [Indexed: 02/07/2023]

Li-Chan EC. Methods to monitor process-induced changes in food proteins. An overview. ADVANCES IN EXPERIMENTAL MEDICINE AND BIOLOGY 1998;434:5-23. [PMID: 9598186 DOI: 10.1007/978-1-4899-1925-0_2] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.2] [Indexed: 02/07/2023]

Peelman F, Vinaimont N, Verhee A, Vanloo B, Verschelde JL, Labeur C, Seguret-Mace S, Duverger N, Hutchinson G, Vandekerckhove J, Tavernier J, Rosseneu M. A proposed architecture for lecithin cholesterol acyl transferase (LCAT): identification of the catalytic triad and molecular modeling. Protein Sci 1998;7:587-99. [PMID: 9541390 PMCID: PMC2143955 DOI: 10.1002/pro.5560070307] [Citation(s) in RCA: 81] [Impact Index Per Article: 3.1] [Indexed: 11/07/2022]

Perozich J, Hempel J, Morris SM. Roles of conserved residues in the arginase family. BIOCHIMICA ET BIOPHYSICA ACTA 1998;1382:23-37. [PMID: 9507056 DOI: 10.1016/s0167-4838(97)00131-3] [Citation(s) in RCA: 72] [Impact Index Per Article: 2.8] [Indexed: 02/06/2023]

Benner SA, Cannarozzi G, Gerloff D, Turcotte M, Chelvanayagam G. Bona Fide Predictions of Protein Secondary Structure Using Transparent Analyses of Multiple Sequence Alignments. Chem Rev 1997;97:2725-2844. [PMID: 11851479 DOI: 10.1021/cr940469a] [Citation(s) in RCA: 40] [Impact Index Per Article: 1.5] [Indexed: 11/29/2022]

Malik HS, Eickbush TH, Goldfarb DS. Evolutionary specialization of the nuclear targeting apparatus. Proc Natl Acad Sci U S A 1997;94:13738-42. [PMID: 9391096 PMCID: PMC28376 DOI: 10.1073/pnas.94.25.13738] [Citation(s) in RCA: 94] [Impact Index Per Article: 3.5] [Received: 08/01/1997] [Accepted: 10/02/1997] [Indexed: 02/05/2023] Open

Seidel G, Adermann K, Schindler T, Ejchart A, Jaenicke R, Forssmann WG, Rösch P. Solution structure of porcine delta sleep-inducing peptide immunoreactive peptide A homolog of the shortsighted gene product. J Biol Chem 1997;272:30918-27. [PMID: 9388238 DOI: 10.1074/jbc.272.49.30918] [Citation(s) in RCA: 11] [Impact Index Per Article: 0.4] [Indexed: 02/05/2023] Open

Abstract

The 77-residue delta sleep-inducing peptide immunoreactive peptide (DIP) is a close homolog of the Drosophila melanogaster shortsighted gene product. Porcine DIP (pDIP) and a peptide containing a leucine zipper-related partial sequence of pDIP, pDIP(9-46), were synthesized and studied by circular dichroism and nuclear magnetic resonance spectroscopy in combination with molecular dynamics calculations. Ultracentrifugation, size exclusion chromatography, and model calculations indicated that pDIP forms a dimer. This was confirmed by the observation of concentration-dependent thermal folding-unfolding transitions. From CD spectroscopy and thermal folding-unfolding transitions of pDIP(9-46), it was concluded that the dimerization of pDIP is a result of interaction between helical structures localized in the leucine zipper motif. The three-dimensional structure of the protein was determined with a modified simulated annealing protocol using experimental data derived from nuclear magnetic resonance spectra and a modeling approach based on an established strategy for coiled coil structures. The left-handed super helical structure of the leucine zipper type sequence resulting from the modeling approach is in agreement with known leucine zipper structures. In addition to the hydrophobic interactions between the amino acids at the heptad positions a and d, the structure of pDIP is stabilized by the formation of interhelical i to i' + 5 salt bridges. This result was confirmed by the pH dependence of the thermal-folding transitions. In addition to the amphipathic helix of the leucine zipper, a second helix is formed in the NH2-terminal part of pDIP. This helix exhibits more 3(10)-helix character and is less stable than the leucine zipper helix. For the COOH-terminal region of pDIP no elements of regular secondary structure were observed.

Thompson MJ, Goldstein RA. Predicting protein secondary structure with probabilistic schemata of evolutionarily derived information. Protein Sci 1997;6:1963-75. [PMID: 9300496 PMCID: PMC2143796 DOI: 10.1002/pro.5560060917] [Citation(s) in RCA: 17] [Impact Index Per Article: 0.6] [Indexed: 02/05/2023]

Abstract

We demonstrate the applicability of our previously developed Bayesian probabilistic approach for predicting residue solvent accessibility to the problem of predicting secondary structure. Using only single-sequence data, this method achieves a three-state accuracy of 67% over a database of 473 non-homologous proteins. This approach is more amenable to inspection and less likely to overlearn specifics of a dataset than "black box" methods such as neural networks. It is also conceptually simpler and less computationally costly. We also introduce a novel method for representing and incorporating multiple-sequence alignment information within the prediction algorithm, achieving 72% accuracy over a dataset of 304 non-homologous proteins. This is accomplished by creating a statistical model of the evolutionarily derived correlations between patterns of amino acid substitution and local protein structure. This model consists of parameter vectors, termed "substitution schemata," which probabilistically encode the structure-based heterogeneity in the distributions of amino acid substitutions found in alignments of homologous proteins. The model is optimized for structure prediction by maximizing the mutual information between the set of schemata and the database of secondary structures. Unlike "expert heuristic" methods, this approach has been demonstrated to work well over large datasets. Unlike the opaque neural network algorithms, this approach is physicochemically intelligible. Moreover, the model optimization procedure, the formalism for predicting one-dimensional structural features and our previously developed method for tertiary structure recognition all share a common Bayesian probabilistic basis. This consistency starkly contrasts with the hybrid and ad hoc nature of methods that have dominated this field in recent years.
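
The abstract states that the model is optimized by maximizing the mutual information between the set of schemata and the database of secondary structures. As a rough illustration of that quantity only, the sketch below computes a plug-in (count-based) mutual information between two discrete label sequences. It does not reproduce the authors' Bayesian model or optimization procedure; the function name `mutual_information` and the toy labels are assumptions.

```python
from collections import Counter
from math import log2


def mutual_information(xs, ys) -> float:
    """Plug-in estimate of I(X; Y) in bits from paired discrete observations."""
    n = len(xs)
    if n == 0 or n != len(ys):
        raise ValueError("xs and ys must be non-empty and of equal length")
    joint = Counter(zip(xs, ys))   # counts of (x, y) pairs
    px = Counter(xs)               # marginal counts of x
    py = Counter(ys)               # marginal counts of y
    mi = 0.0
    for (x, y), c in joint.items():
        # p(x, y) * log2( p(x, y) / (p(x) * p(y)) ), written with raw counts
        mi += (c / n) * log2(c * n / (px[x] * py[y]))
    return mi


if __name__ == "__main__":
    # Hypothetical schema assignments paired with observed structure states.
    schemata = ["s1", "s1", "s2", "s2", "s3", "s3"]
    states = ["H", "H", "E", "E", "C", "H"]
    print(mutual_information(schemata, states))
```

A higher value means that knowing the assigned schema removes more uncertainty about the local secondary structure state, which is the sense in which the abstract describes the schemata as optimized for prediction.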

Hipp WM, Pott AS, Thum-Schmitz N, Faath I, Dahl C, Trüper HG. Towards the phylogeny of APS reductases and sirohaem sulfite reductases in sulfate-reducing and sulfur-oxidizing prokaryotes. MICROBIOLOGY (READING, ENGLAND) 1997;143 ( Pt 9):2891-2902. [PMID: 9308173 DOI: 10.1099/00221287-143-9-2891] [Citation(s) in RCA: 123] [Impact Index Per Article: 4.6] [Indexed: 02/05/2023]

Rink R, Fennema M, Smids M, Dehmel U, Janssen DB. Primary structure and catalytic mechanism of the epoxide hydrolase from Agrobacterium radiobacter AD1. J Biol Chem 1997;272:14650-7. [PMID: 9169427 DOI: 10.1074/jbc.272.23.14650] [Citation(s) in RCA: 135] [Impact Index Per Article: 5.0] [Indexed: 02/04/2023] Open

Salamov AA, Solovyev VV. Protein secondary structure prediction using local alignments. J Mol Biol 1997;268:31-6. [PMID: 9149139 DOI: 10.1006/jmbi.1997.0958] [Citation(s) in RCA: 86] [Impact Index Per Article: 3.2] [Indexed: 02/04/2023]

Frishman D, Argos P. Seventy-five percent accuracy in protein secondary structure prediction. Proteins 1997;27:329-35. [PMID: 9094735 DOI: 10.1002/(sici)1097-0134(199703)27:3<329::aid-prot1>3.0.co;2-8] [Citation(s) in RCA: 283] [Impact Index Per Article: 10.5] [Indexed: 02/04/2023]

Osuna J, Soberón X, Morett E. A proposed architecture for the central domain of the bacterial enhancer-binding proteins based on secondary structure prediction and fold recognition. Protein Sci 1997;6:543-55. [PMID: 9070437 PMCID: PMC2143673 DOI: 10.1002/pro.5560060304] [Citation(s) in RCA: 54] [Impact Index Per Article: 2.0] [Indexed: 02/03/2023]

Abstract

The expression of genes transcribed by the RNA polymerase with the alternative sigma factor sigma 54 (E sigma 54) is absolutely dependent on activator proteins that bind to enhancer-like sites, located far upstream from the promoter. These unique prokaryotic proteins, known as enhancer-binding proteins (EBP), mediate open promoter complex formation in a reaction dependent on NTP hydrolysis. The best characterized proteins of this family of regulators are NtrC and NifA, which activate genes required for ammonia assimilation and nitrogen fixation, respectively. In a recent IRBM course ("Frontiers of protein structure prediction," IRBM, Pomezia, Italy, 1995; see web site http://www.mrc-cpe.cam.uk/irbm-course95/), one of us (J.O.) participated in the elaboration of the proposal that the Central domain of the EBPs might adopt the classical mononucleotide-binding fold. This suggestion was based on the results of a new protein fold recognition algorithm (Map) and on the mapping of correlated mutations calculated for the sequence family on the same mononucleotide-binding fold topology. In this work, we present new data that support the previous conclusion. The results from a number of different secondary structure prediction programs suggest that the Central domain could adopt an alpha/beta topology. The fold recognition programs ProFIT 0.9, 3D PROFILE combined with secondary structure prediction, and 123D suggest a mononucleotide-binding fold topology for the Central domain amino acid sequence. Finally, and most importantly, three of five reported residue alterations that impair the Central domain ATPase activity of the E sigma 54 activators are mapped to polypeptide regions that might be playing roles equivalent to those involved in nucleotide binding in the mononucleotide-binding proteins. Furthermore, the known residue substitutions that alter the function of the E sigma 54 activators, leaving intact the Central domain ATPase activity, are mapped on a region proposed to play a role equivalent to that of the effector region of the GTPase superfamily.

Pedersen JT, Moult J. Ab initio protein folding simulations with genetic algorithms: Simulations on the complete sequence of small proteins. Proteins 1997. [DOI: 10.1002/(sici)1097-0134(1997)1+<179::aid-prot23>3.0.co;2-k] [Citation(s) in RCA: 13] [Impact Index Per Article: 0.5] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/06/2022]

Frishman D, Argos P. The future of protein secondary structure prediction accuracy. Fold Des 1997;2:159-62. [PMID: 9218953 DOI: 10.1016/s1359-0278(97)00022-9] [Citation(s) in RCA: 14] [Impact Index Per Article: 0.5] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 02/04/2023]

Boutin JA. Myristoylation. Cell Signal 1997;9:15-35. [PMID: 9067626 DOI: 10.1016/s0898-6568(96)00100-3] [Citation(s) in RCA: 329] [Impact Index Per Article: 12.2] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 02/03/2023]

Chebrou H, Bigey F, Arnaud A, Galzy P. Study of the amidase signature group. Biochim Biophys Acta 1996;1298:285-93. [PMID: 8980653 DOI: 10.1016/s0167-4838(96)00145-8] [Citation(s) in RCA: 105] [Impact Index Per Article: 3.8] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 02/03/2023]

King RD, Sternberg MJ. Identification and application of the concepts important for accurate and reliable protein secondary structure prediction. Protein Sci 1996;5:2298-310. [PMID: 8931148 PMCID: PMC2143286 DOI: 10.1002/pro.5560051116] [Citation(s) in RCA: 338] [Impact Index Per Article: 12.1] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 02/03/2023]

Abstract

A protein secondary structure prediction method from multiply aligned homologous sequences is presented with an overall per residue three-state accuracy of 70.1%. There are two aims: to obtain high accuracy by identification of a set of concepts important for prediction followed by use of linear statistics; and to provide insight into the folding process. The important concepts in secondary structure prediction are identified as: residue conformational propensities, sequence edge effects, moments of hydrophobicity, position of insertions and deletions in aligned homologous sequence, moments of conservation, auto-correlation, residue ratios, secondary structure feedback effects, and filtering. Explicit use of edge effects, moments of conservation, and auto-correlation are new to this paper. The relative importance of the concepts used in prediction was analyzed by stepwise addition of information and examination of weights in the discrimination function. The simple and explicit structure of the prediction allows the method to be reimplemented easily. The accuracy of a prediction is predictable a priori. This permits evaluation of the utility of the prediction: 10% of the chains predicted were identified correctly as having a mean accuracy of > 80%. Existing high-accuracy prediction methods are "black-box" predictors based on complex nonlinear statistics (e.g., neural networks in PHD: Rost & Sander, 1993a). For medium- to short-length chains (≥ 90 residues and < 170 residues), the prediction method is significantly more accurate (P < 0.01) than the PHD algorithm (probably the most commonly used algorithm). In combination with the PHD, an algorithm is formed that is significantly more accurate than either method, with an estimated overall three-state accuracy of 72.4%, the highest accuracy reported for any prediction method.

Henikoff S. Scores for sequence searches and alignments. Curr Opin Struct Biol 1996;6:353-60. [PMID: 8804821 DOI: 10.1016/s0959-440x(96)80055-8] [Citation(s) in RCA: 47] [Impact Index Per Article: 1.7] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 02/02/2023]

Beissinger M, Paulus C, Bayer P, Wolf H, Rösch P, Wagner R. Sequence-specific resonance assignments of the 1H-NMR spectra and structural characterization in solution of the HIV-1 transframe protein p6. Eur J Biochem 1996;237:383-92. [PMID: 8647076 DOI: 10.1111/j.1432-1033.1996.0383k.x] [Citation(s) in RCA: 27] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 02/01/2023]

Bazan JF. Helical fold prediction for the cyclin box. Proteins 1996;24:1-17. [PMID: 8628726 DOI: 10.1002/(sici)1097-0134(199601)24:1<1::aid-prot1>3.0.co;2-o] [Citation(s) in RCA: 33] [Impact Index Per Article: 1.2] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 02/01/2023]