1
Ho CT, Huang YW, Chen TR, Lo CH, Lo WC. Discovering the Ultimate Limits of Protein Secondary Structure Prediction. Biomolecules 2021; 11:1627. [PMID: 34827624 PMCID: PMC8615938 DOI: 10.3390/biom11111627] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 09/21/2021] [Revised: 10/25/2021] [Accepted: 10/28/2021] [Indexed: 12/29/2022] Open
Abstract
Secondary structure prediction (SSP) of proteins is an important structural biology technique with many applications. There have been ~300 algorithms published in the past seven decades with fierce competition in accuracy. In the first 60 years, the accuracy of three-state SSP rose from ~56% to 81%; after that, it has long stayed at 81-86%. In the 1990s, the theoretical limit of three-state SSP accuracy had been estimated to be 88%. Thus, SSP is now generally considered not challenging or too challenging to improve. However, we found that the limit of three-state SSP might be underestimated. Besides, there is still much room for improving segment-based and eight-state SSPs, but the limits of these emerging topics have not been determined. This work performs large-scale sequence and structural analyses to estimate SSP accuracy limits and assess state-of-the-art SSP methods. The limit of three-state SSP is re-estimated to be ~92%, 4-5% higher than previously expected, indicating that SSP is still challenging. The estimated limit of eight-state SSP is 84-87%. Several proposals for improving future SSP algorithms are made based on our results. We hope that these findings will help move forward the development of SSP and all its applications.
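The three-state accuracy figures quoted in this abstract are per-residue agreement rates, commonly called Q3. As a minimal sketch, assuming the conventional H/E/C (helix/strand/coil) alphabet and using a made-up ten-residue example rather than any data from the paper, the measure can be computed as follows. The function name `q3_accuracy` is an illustrative choice, not something defined by the authors.

```python
def q3_accuracy(predicted: str, observed: str) -> float:
    """Return the fraction of residues whose predicted state matches the observed one.

    Both arguments are strings over a three-state alphabet (assumed here: H, E, C).
    """
    if len(predicted) != len(observed):
        raise ValueError("predicted and observed strings must have equal length")
    if not observed:
        raise ValueError("sequences must be non-empty")
    matches = sum(p == o for p, o in zip(predicted, observed))
    return matches / len(observed)


# Hypothetical example: one helix residue is mispredicted as coil.
observed = "CHHHHHCEEC"
predicted = "CHHHHCCEEC"
print(f"Q3 = {q3_accuracy(predicted, observed):.2f}")  # prints "Q3 = 0.90"
```

Under this definition, the re-estimated limit of ~92% would correspond to roughly 8 residues in 100 remaining unpredictable, though how the paper derives that limit is not shown here.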
Affiliation(s)
- Chia-Tzu Ho
  - Institute of Bioinformatics and Systems Biology, National Yang Ming Chiao Tung University, Hsinchu 300, Taiwan
- Yu-Wei Huang
  - Institute of Bioinformatics and Systems Biology, National Yang Ming Chiao Tung University, Hsinchu 300, Taiwan
- Teng-Ruei Chen
  - Institute of Bioinformatics and Systems Biology, National Yang Ming Chiao Tung University, Hsinchu 300, Taiwan
- Chia-Hua Lo
  - Institute of Bioinformatics and Systems Biology, National Yang Ming Chiao Tung University, Hsinchu 300, Taiwan
  - Department of Biological Science and Technology, National Yang Ming Chiao Tung University, Hsinchu 300, Taiwan
- Wei-Cheng Lo
  - Institute of Bioinformatics and Systems Biology, National Yang Ming Chiao Tung University, Hsinchu 300, Taiwan
  - Department of Biological Science and Technology, National Yang Ming Chiao Tung University, Hsinchu 300, Taiwan
  - The Center for Bioinformatics Research, National Yang Ming Chiao Tung University, Hsinchu 300, Taiwan
2
Lima I, Cino EA. Sequence similarity in 3D for comparison of protein families. J Mol Graph Model 2021; 106:107906. [PMID: 33848948 DOI: 10.1016/j.jmgm.2021.107906] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 10/16/2020] [Revised: 03/18/2021] [Accepted: 03/18/2021] [Indexed: 11/26/2022]
Abstract
Homologous proteins are often compared by pairwise sequence alignment, and structure superposition if the atomic coordinates are available. Unification of sequence and structure data is an important task in structural biology. Here, we present the Sequence Similarity 3D (SS3D) method of integrating sequence and structure information. SS3D is a distance and substitution matrix-based method for straightforward visualization of regions of similarity and difference between homologous proteins. This work details the SS3D approach, and demonstrates its utility through case studies comparing members of several protein families. The examples show that SS3D can effectively highlight biologically important regions of similarity and dissimilarity. We anticipate that the method will be useful for numerous structural biology applications, including, but not limited to, studies of binding specificity, structure-function relationships, and evolutionary pathways. SS3D is available with a manual and tutorial at https://github.com/0x462e41/SS3D/.
Affiliation(s)
- Igor Lima
  - Department of Biochemistry and Immunology, Federal University of Minas Gerais, Belo Horizonte, 31270-901, Brazil
- Elio A Cino
  - Department of Biochemistry and Immunology, Federal University of Minas Gerais, Belo Horizonte, 31270-901, Brazil
3
Buchholz PCF, Ferrario V, Pohl M, Gardossi L, Pleiss J. Navigating within thiamine diphosphate-dependent decarboxylases: Sequences, structures, functional positions, and binding sites. Proteins 2019; 87:774-785. [PMID: 31070804 DOI: 10.1002/prot.25706] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Received: 01/26/2019] [Revised: 04/23/2019] [Accepted: 05/05/2019] [Indexed: 11/10/2022]
Abstract
Thiamine diphosphate-dependent decarboxylases catalyze both cleavage and formation of C-C bonds in various reactions, which have been assigned to different homologous sequence families. This work compares 53 ThDP-dependent decarboxylases with known crystal structures. Both sequence and structural information were analyzed synergistically and data were analyzed for global and local properties by means of statistical approaches (principal component analysis and principal coordinate analysis) enabling complexity reduction. The different results obtained both locally and globally, that is, individual positions compared with the overall protein sequence or structure, revealed challenges in the assignment of separated homologous families. The methods applied herein support the comparison of enzyme families and the identification of functionally relevant positions. The findings for the family of ThDP-dependent decarboxylases underline that global sequence identity alone is not sufficient to distinguish enzyme function. Instead, local sequence similarity, defined by comparisons of structurally equivalent positions, allows for a better navigation within several groups of homologous enzymes. The differentiation between homologous sequences is further enhanced by taking structural information into account, such as BioGPS analysis of the active site properties or pairwise structural superimpositions. The methods applied herein are expected to be transferrable to other enzyme families, to facilitate family assignments for homologous protein sequences.
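The contrast this abstract draws between global sequence identity and local similarity at structurally equivalent positions can be illustrated with a small sketch. It assumes two already-aligned sequences and a hand-picked list of column indices standing in for structurally equivalent positions. It also uses plain identity where the paper speaks of similarity, which is a simplification. The sequences, the indices, and the function name `identity` are all hypothetical and not taken from the paper.

```python
def identity(seq_a, seq_b, positions=None):
    """Fraction of identical residues over aligned columns, ignoring gapped columns.

    If `positions` is given, only those alignment column indices are compared
    (a stand-in here for structurally equivalent positions).
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("aligned sequences must have equal length")
    columns = range(len(seq_a)) if positions is None else positions
    pairs = [(seq_a[i], seq_b[i]) for i in columns
             if seq_a[i] != "-" and seq_b[i] != "-"]
    if not pairs:
        raise ValueError("no ungapped columns to compare")
    return sum(a == b for a, b in pairs) / len(pairs)


# Hypothetical aligned fragments and hypothetical "equivalent" columns.
aln_a = "MKTAGHEDLL"
aln_b = "MRSAGHEDVI"
site_columns = [3, 4, 5, 6, 7]

print(f"global identity: {identity(aln_a, aln_b):.2f}")               # 0.60
print(f"local identity:  {identity(aln_a, aln_b, site_columns):.2f}")  # 1.00
```

In this toy case the two fragments look only moderately related globally yet are identical over the selected columns, which is the kind of discrepancy the abstract describes.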
Affiliation(s)
- Patrick C F Buchholz
  - Institute of Biochemistry and Technical Biochemistry, University of Stuttgart, Stuttgart, Germany
- Valerio Ferrario
  - Institute of Biochemistry and Technical Biochemistry, University of Stuttgart, Stuttgart, Germany
  - Laboratory of Applied and Computational Biocatalysis, Department of Chemical and Pharmaceutical Sciences, Università degli Studi di Trieste, Trieste, Italy
- Martina Pohl
  - Forschungszentrum Jülich GmbH, IBG-1: Biotechnology, Jülich, Germany
- Lucia Gardossi
  - Laboratory of Applied and Computational Biocatalysis, Department of Chemical and Pharmaceutical Sciences, Università degli Studi di Trieste, Trieste, Italy
- Jürgen Pleiss
  - Institute of Biochemistry and Technical Biochemistry, University of Stuttgart, Stuttgart, Germany
4
Oh Brother, Where Art Thou? Finding Orthologs in the Twilight and Midnight Zones of Sequence Similarity. Evol Biol 2016. [DOI: 10.1007/978-3-319-41324-2_22] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/21/2022]
5
Barrows D, Schoenfeld SM, Hodakoski C, Silkov A, Honig B, Couvillon A, Shymanets A, Nürnberg B, Asara JM, Parsons R. p21-activated Kinases (PAKs) Mediate the Phosphorylation of PREX2 Protein to Initiate Feedback Inhibition of Rac1 GTPase. J Biol Chem 2015; 290:28915-31. [PMID: 26438819 DOI: 10.1074/jbc.m115.668244] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.1] [Received: 05/28/2015] [Indexed: 11/06/2022] Open
Abstract
Phosphatidylinositol 3,4,5-trisphosphate (PIP3)-dependent Rac exchanger 2 (PREX2) is a guanine nucleotide exchange factor (GEF) for the Ras-related C3 botulinum toxin substrate 1 (Rac1) GTPase, facilitating the exchange of GDP for GTP on Rac1. GTP-bound Rac1 then activates its downstream effectors, including p21-activated kinases (PAKs). PREX2 and Rac1 are frequently mutated in cancer and have key roles within the insulin-signaling pathway. Rac1 can be inactivated by multiple mechanisms; however, negative regulation by insulin is not well understood. Here, we show that in response to being activated after insulin stimulation, Rac1 initiates its own inactivation by decreasing PREX2 GEF activity. Following PREX2-mediated activation of Rac1 by the second messengers PIP3 or Gβγ, we found that PREX2 was phosphorylated through a PAK-dependent mechanism. PAK-mediated phosphorylation of PREX2 reduced GEF activity toward Rac1 by inhibiting PREX2 binding to PIP3 and Gβγ. Cell fractionation experiments also revealed that phosphorylation prevented PREX2 from localizing to the cellular membrane. Furthermore, the onset of insulin-induced phosphorylation of PREX2 was delayed compared with AKT. Altogether, we propose that second messengers activate the Rac1 signal, which sets in motion a cascade whereby PAKs phosphorylate and negatively regulate PREX2 to decrease Rac1 activation. This type of regulation would allow for transient activation of the PREX2-Rac1 signal and may be relevant in multiple physiological processes, including diseases such as diabetes and cancer when insulin signaling is chronically activated.
Affiliation(s)
- Douglas Barrows
  - Department of Oncological Sciences, Icahn School of Medicine at Mount Sinai, New York, New York 10029
  - Department of Pharmacology, Columbia University, New York, New York 10032
- Sarah M Schoenfeld
  - Department of Oncological Sciences, Icahn School of Medicine at Mount Sinai, New York, New York 10029
- Cindy Hodakoski
  - Department of Oncological Sciences, Icahn School of Medicine at Mount Sinai, New York, New York 10029
- Antonina Silkov
  - Department of Biochemistry and Molecular Biophysics, Howard Hughes Medical Institute, Columbia University, New York, New York 10032
- Barry Honig
  - Department of Biochemistry and Molecular Biophysics, Howard Hughes Medical Institute, Columbia University, New York, New York 10032
- Aliaksei Shymanets
  - Department of Pharmacology and Experimental Therapy, Institute of Experimental and Clinical Pharmacology and Toxicology, Eberhard Karls University Hospitals and Clinics, and Interfaculty Center of Pharmacogenomics and Pharmaceutical Research, University of Tübingen, 72074 Tübingen, Germany
- Bernd Nürnberg
  - Department of Pharmacology and Experimental Therapy, Institute of Experimental and Clinical Pharmacology and Toxicology, Eberhard Karls University Hospitals and Clinics, and Interfaculty Center of Pharmacogenomics and Pharmaceutical Research, University of Tübingen, 72074 Tübingen, Germany
- John M Asara
  - Division of Signal Transduction, Beth Israel Deaconess Medical Center, Boston, Massachusetts 02115
  - Department of Medicine, Harvard Medical School, Boston, Massachusetts 02115
- Ramon Parsons
  - Department of Oncological Sciences, Icahn School of Medicine at Mount Sinai, New York, New York 10029
6
Minami S, Sawada K, Chikenji G. How a spatial arrangement of secondary structure elements is dispersed in the universe of protein folds. PLoS One 2014; 9:e107959. [PMID: 25243952 PMCID: PMC4171485 DOI: 10.1371/journal.pone.0107959] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.1] [Received: 08/12/2014] [Accepted: 08/18/2014] [Indexed: 11/18/2022] Open
Abstract
It has been known that topologically different proteins of the same class sometimes share the same spatial arrangement of secondary structure elements (SSEs). However, the frequency by which topologically different structures share the same spatial arrangement of SSEs is unclear. It is important to estimate this frequency because it provides both a deeper understanding of the geometry of protein folds and a valuable suggestion for predicting protein structures with novel folds. Here we clarified the frequency with which protein folds share the same SSE packing arrangement with other folds, the types of spatial arrangement of SSEs that are frequently observed across different folds, and the diversity of protein folds that share the same spatial arrangement of SSEs with a given fold, using a protein structure alignment program MICAN, which we have been developing. By performing comprehensive structural comparison of SCOP fold representatives, we found that approximately 80% of protein folds share the same spatial arrangement of SSEs with other folds. We also observed that many protein pairs that share the same spatial arrangement of SSEs belong to the different classes, often with an opposing N- to C-terminal direction of the polypeptide chain. The most frequently observed spatial arrangement of SSEs was the 2-layer α/β packing arrangement and it was dispersed among as many as 27% of SCOP fold representatives. These results suggest that the same spatial arrangements of SSEs are adopted by a wide variety of different folds and that the spatial arrangement of SSEs is highly robust against the N- to C-terminal direction of the polypeptide chain.
Affiliation(s)
- Shintaro Minami
  - Department of Complex Systems Science, Nagoya University, Nagoya, Aichi, Japan
- Kengo Sawada
  - Department of Applied Physics, Nagoya University, Nagoya, Aichi, Japan
- George Chikenji
  - Department of Computational Science and Engineering, Nagoya University, Nagoya, Aichi, Japan
7
Brylinski M. The utility of artificially evolved sequences in protein threading and fold recognition. J Theor Biol 2013; 328:77-88. [PMID: 23542050 DOI: 10.1016/j.jtbi.2013.03.018] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.5] [Received: 10/19/2012] [Revised: 01/24/2013] [Accepted: 03/18/2013] [Indexed: 12/23/2022]
Abstract
Template-based protein structure prediction plays an important role in Functional Genomics by providing structural models of gene products, which can be utilized by structure-based approaches to function inference. From a systems level perspective, the high structural coverage of gene products in a given organism is critical. Despite continuous efforts towards the development of more sensitive threading approaches, confident structural models cannot be constructed for a considerable fraction of proteins due to difficulties in recognizing low-sequence identity templates with a similar fold to the target. Here we introduce a new modeling stratagem, which employs a library of synthetic sequences to improve template ranking in fold recognition by sequence profile-based methods. We developed a new method for the optimization of generic protein-like amino acid sequences to stabilize the respective structures using a combined empirical scoring function, which is compatible with those commonly used in protein threading and fold recognition. We show that the artificially evolved sequences, whose average sequence identity to the wild-type sequences is as low as 13.8%, have significant capabilities to recognize the correct structures. Importantly, the quality of the corresponding threading alignments is comparable to those constructed using conventional wild-type approaches (the average TM-score is 0.48 and 0.54, respectively). Fold recognition that uses data fusion to combine ranks calculated for both wild-type and synthetic template libraries systematically improves the detection of structural analogs. Depending on the threading algorithm used, it yields on average 4-16% higher recognition rates than using the wild-type template library alone. Synthetic sequences artificially evolved for the template structures provide an orthogonal source of signal that could be exploited to detect those templates unrecognized by standard modeling techniques. It opens up new directions in the development of more sensitive threading methods with the enhanced capabilities of targeting difficult, midnight zone templates.
Affiliation(s)
- Michal Brylinski
  - Department of Biological Sciences, Louisiana State University, Baton Rouge, LA 70803, USA
8
Dickson RJ, Gloor GB. Protein sequence alignment analysis by local covariation: coevolution statistics detect benchmark alignment errors. PLoS One 2012; 7:e37645. [PMID: 22715369 PMCID: PMC3371027 DOI: 10.1371/journal.pone.0037645] [Citation(s) in RCA: 11] [Impact Index Per Article: 0.9] [Received: 12/30/2011] [Accepted: 04/26/2012] [Indexed: 11/19/2022] Open
Abstract
The use of sequence alignments to understand protein families is ubiquitous in molecular biology. High quality alignments are difficult to build and protein alignment remains one of the largest open problems in computational biology. Misalignments can lead to inferential errors about protein structure, folding, function, phylogeny, and residue importance. Identifying alignment errors is difficult because alignments are built and validated on the same primary criteria: sequence conservation. Local covariation identifies systematic misalignments and is independent of conservation. We demonstrate an alignment curation tool, LoCo, that integrates local covariation scores with the Jalview alignment editor. Using LoCo, we illustrate how local covariation is capable of identifying alignment errors due to the reduction of positional independence in the region of misalignment. We highlight three alignments from the benchmark database, BAliBASE 3, that contain regions of high local covariation, and investigate the causes to illustrate these types of scenarios. Two alignments contain sequential and structural shifts that cause elevated local covariation. Realignment of these misaligned segments reduces local covariation; these alternative alignments are supported with structural evidence. We also show that local covariation identifies active site residues in a validated alignment of paralogous structures. LoCo is available at https://sourceforge.net/projects/locoprotein/files/
Affiliation(s)
- Gregory B. Gloor
  - Department of Biochemistry, The University of Western Ontario, London, Canada