1. Decoding an Amino Acid Sequence to Extract Information on Protein Folding. Molecules 2022; 27:3020. [PMID: 35566370 PMCID: PMC9106047 DOI: 10.3390/molecules27093020] [Received: 03/30/2022] [Revised: 05/01/2022] [Accepted: 05/05/2022]
Abstract
Protein folding is a complicated phenomenon spanning various time scales (μs to several s), and various structural indices are required to analyze it. The methodologies used to study this phenomenon are also highly varied and employ a range of experimental and computational techniques. Thus, simple speculation is not enough to understand the folding mechanism of a protein. In the present review, we discuss recent studies conducted by the author and colleagues to decode amino acid sequences and obtain information on protein folding. We investigate globin-like proteins, ferredoxin-like fold proteins, IgG-like β-sandwich fold proteins, lysozyme-like fold proteins and β-trefoil-like fold proteins. Our techniques are based on statistics of inter-residue average distances, and our studies so far indicate that the information obtained from these analyses includes data on the protein folding mechanism. The relationships between our results and actual protein folding phenomena are also discussed.
2. McBride JM, Tlusty T. Slowest-first protein translation scheme: Structural asymmetry and co-translational folding. Biophys J 2021; 120:5466-5477. [PMID: 34813729 PMCID: PMC8715247 DOI: 10.1016/j.bpj.2021.11.024] [Received: 07/12/2021] [Revised: 09/30/2021] [Accepted: 11/17/2021]
Abstract
Proteins are translated from the N to the C terminus, raising the basic question of how this innate directionality affects their evolution. To explore this question, we analyze 16,200 structures from the Protein Data Bank (PDB). We find remarkable enrichment of α helices at the C terminus and β strands at the N terminus. Furthermore, this α-β asymmetry correlates with sequence length and contact order, both determinants of folding rate, hinting at possible links to co-translational folding (CTF). Hence, we propose the "slowest-first" scheme, whereby protein sequences evolved structural asymmetry to accelerate CTF: the slowest of the cooperatively folding segments are positioned near the N terminus so they have more time to fold during translation. A phenomenological model predicts that CTF can be accelerated by asymmetry in folding rate, up to double the rate, when folding time is commensurate with translation time; analysis of the PDB predicts that structural asymmetry is indeed maximal in this regime. This correspondence is greater in prokaryotes, which generally require faster protein production. Altogether, this indicates that accelerating CTF is a substantial evolutionary force whose interplay with stability and functionality is encoded in secondary structure asymmetry.
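The abstract names contact order, together with sequence length, as a determinant of folding rate. As a reference point, relative contact order is commonly defined (following Plaxco, Simons and Baker) as the mean sequence separation |i - j| of contacting residue pairs divided by the chain length. The sketch below assumes contacts are already given as residue-index pairs; how a contact is defined (for example, a heavy-atom distance cutoff) is left to the caller, and this is not the authors' PDB analysis pipeline.

```python
from typing import Iterable, Tuple


def relative_contact_order(contacts: Iterable[Tuple[int, int]], length: int) -> float:
    """Relative contact order: mean |i - j| over contacting residue pairs, divided by L.

    contacts: residue-index pairs judged to be in contact (definition left to the caller).
    length:   number of residues L in the chain.
    """
    if length <= 0:
        raise ValueError("length must be positive")
    separations = [abs(i - j) for i, j in contacts if i != j]
    if not separations:
        raise ValueError("need at least one non-self contact")
    return sum(separations) / (len(separations) * length)


# Toy example: three contacts in a hypothetical 10-residue chain.
# Separations are 3, 6 and 7, so RCO = 16 / (3 * 10), roughly 0.533.
toy_contacts = [(1, 4), (2, 8), (3, 10)]
print(relative_contact_order(toy_contacts, length=10))
```

Chains whose contacts are mostly local score low, while long-range contacts push the value up; higher relative contact order has been correlated with slower two-state folding.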
Affiliation(s)
- John M McBride
- Center for Soft and Living Matter, Institute for Basic Science, Ulsan, South Korea.
- Tsvi Tlusty
- Center for Soft and Living Matter, Institute for Basic Science, Ulsan, South Korea; Departments of Physics and Chemistry, Ulsan National Institute of Science and Technology, Ulsan, South Korea.
3. Alav I, Kobylka J, Kuth MS, Pos KM, Picard M, Blair JMA, Bavro VN. Structure, Assembly, and Function of Tripartite Efflux and Type 1 Secretion Systems in Gram-Negative Bacteria. Chem Rev 2021; 121:5479-5596. [PMID: 33909410 PMCID: PMC8277102 DOI: 10.1021/acs.chemrev.1c00055] [Received: 01/20/2021]
Abstract
Tripartite efflux pumps and the related type 1 secretion systems (T1SSs) in Gram-negative organisms are diverse in function, energization, and structural organization. They form continuous conduits spanning both the inner and the outer membrane and are composed of three principal components: the energized inner membrane transporters (belonging to the ABC, RND, and MFS families), the outer membrane factor channel-like proteins, and, linking the two, the periplasmic adaptor proteins (PAPs), also known as membrane fusion proteins (MFPs). In this review we summarize recent advances in understanding the structural biology, function, and regulation of these systems, highlighting the previously undescribed role of PAPs in providing a common architectural scaffold across diverse families of transporters. Despite being built from a limited number of basic structural domains, these complexes present a staggering variety of architectures. While key insights have been derived from the RND transporter systems, a closer inspection of the operation and structural organization of different tripartite systems reveals unexpected analogies between them, including those formed around MFS- and ATP-driven transporters, suggesting that they operate on common basic principles. On this basis, we propose a new integrated model of PAP-mediated communication within the conformational cycling of tripartite systems, which could be extended to other types of assemblies.
Affiliation(s)
- Ilyas Alav
- Institute of Microbiology and Infection, College of Medical and Dental Sciences, University of Birmingham, Edgbaston, Birmingham B15 2TT, United Kingdom
- Jessica Kobylka
- Institute of Biochemistry, Biocenter, Goethe Universität Frankfurt, Max-von-Laue-Straße 9, D-60438 Frankfurt, Germany
- Miriam S. Kuth
- Institute of Biochemistry, Biocenter, Goethe Universität Frankfurt, Max-von-Laue-Straße 9, D-60438 Frankfurt, Germany
- Klaas M. Pos
- Institute of Biochemistry, Biocenter, Goethe Universität Frankfurt, Max-von-Laue-Straße 9, D-60438 Frankfurt, Germany
- Martin Picard
- Laboratoire de Biologie Physico-Chimique des Protéines Membranaires, CNRS UMR 7099, Université de Paris, 75005 Paris, France
- Fondation Edmond de Rothschild pour le développement de la recherche Scientifique, Institut de Biologie Physico-Chimique, 75005 Paris, France
- Jessica M. A. Blair
- Institute of Microbiology and Infection, College of Medical and Dental Sciences, University of Birmingham, Edgbaston, Birmingham B15 2TT, United Kingdom
- Vassiliy N. Bavro
- School of Life Sciences, University of Essex, Colchester, CO4 3SQ, United Kingdom
4. Kimura R, Aumpuchin P, Hamaue S, Shimomura T, Kikuchi T. Analyses of the folding sites of irregular β-trefoil fold proteins through sequence-based techniques and Gō-model simulations. BMC Mol Cell Biol 2020; 21:28. [PMID: 32295515 PMCID: PMC7477875 DOI: 10.1186/s12860-020-00271-4] [Received: 10/07/2019] [Accepted: 03/31/2020]
Abstract
Background: The details of the folding mechanisms of many proteins are not yet fully understood, and it is believed that information on the folding mechanism of a protein is encoded in its amino acid sequence. β-trefoil proteins are known to share the same 3D scaffold, namely a three-fold symmetric scaffold, despite low sequence identity among superfamilies. In this study, we extract an initial folding unit from the amino acid sequences of irregular β-trefoil proteins by constructing an average distance map (ADM) and by using inter-residue average distance statistics to determine the relative contact frequencies of residue pairs in terms of F values. We compare our sequence-based predictions with the packing between hydrophobic residues in native 3D structures and with a Gō-model simulation. Results: The ADM and F-value analyses predict that the N-terminal and C-terminal regions are compact and that the hydrophobic residues in the central region can be regarded as an interaction center with other residues. These results correspond well to those of the Gō-model simulations. Moreover, our results indicate that the irregular parts of the β-trefoil proteins do not hinder formation of the protein structure. Conserved hydrophobic residues on the β5 strand are always the interaction center of packing between the conserved hydrophobic residues in both regular and irregular β-trefoil proteins. Conclusions: We revealed that the β5 strand plays an important role in β-trefoil protein structure construction. The sequence-based methods used in this study can extract protein folding information from amino acid sequence data alone, and their results corresponded well to the 3D structure-based Gō-model simulation and to available experimental results.
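The abstract compares the sequence-based predictions with packing between hydrophobic residues in native 3D structures, where central hydrophobic residues act as an interaction center. A rough sketch of the structure-side count, under illustrative assumptions (Cα coordinates only, a 6.5 Å cutoff, a hypothetical hydrophobic residue set, and a minimum sequence separation of 3): it counts hydrophobic-hydrophobic contacts per residue and reports the most connected one. It does not implement the ADM or F-value method, whose details the abstract does not give.

```python
import math
from typing import Dict, Sequence, Tuple

Coord = Tuple[float, float, float]

# Illustrative assumptions, not taken from the paper:
HYDROPHOBIC = set("VILMFWYC")  # residues treated as hydrophobic
CUTOFF = 6.5                   # Cα-Cα contact cutoff in Å
MIN_SEPARATION = 3             # skip pairs this close along the chain


def hydrophobic_contact_counts(seq: str, ca: Sequence[Coord]) -> Dict[int, int]:
    """For each hydrophobic residue (0-based index), count contacts with other hydrophobic residues."""
    if len(seq) != len(ca):
        raise ValueError("sequence and coordinate lengths differ")
    idx = [i for i, aa in enumerate(seq) if aa in HYDROPHOBIC]
    counts = {i: 0 for i in idx}
    for a, i in enumerate(idx):
        for j in idx[a + 1:]:  # idx is ascending, so j > i
            if j - i >= MIN_SEPARATION and math.dist(ca[i], ca[j]) <= CUTOFF:
                counts[i] += 1
                counts[j] += 1
    return counts


def interaction_center(seq: str, ca: Sequence[Coord]) -> int:
    """Index of the hydrophobic residue with the most hydrophobic contacts (first one on ties)."""
    counts = hydrophobic_contact_counts(seq, ca)
    if not counts:
        raise ValueError("no hydrophobic residues")
    return max(counts, key=counts.get)


# Toy chain with made-up coordinates: V0 and I8 each touch L4, but not each other.
seq = "VGGGLGGGI"
ca = ([(0.0, 0.0, 0.0)] + [(100.0 + k, 0.0, 0.0) for k in range(3)]
      + [(5.0, 0.0, 0.0)] + [(200.0 + k, 0.0, 0.0) for k in range(3)]
      + [(10.0, 0.0, 0.0)])
print(hydrophobic_contact_counts(seq, ca))  # {0: 1, 4: 2, 8: 1}
print(interaction_center(seq, ca))          # 4
```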
Affiliation(s)
- Risako Kimura
- Department of Bioinformatics, College of Life Sciences, Ritsumeikan University, 1-1-1 Nojihigashi, Kusatsu, Shiga, 525-8577, Japan
- Panyavut Aumpuchin
- National Center for Genetic Engineering and Biotechnology (BIOTEC), 113 Thailand Science Park, Phaholyothin Road, Klong Luang, Pathumthani, 12120, Thailand
- Shoya Hamaue
- Department of Bioinformatics, College of Life Sciences, Ritsumeikan University, 1-1-1 Nojihigashi, Kusatsu, Shiga, 525-8577, Japan
- Takumi Shimomura
- Department of Bioinformatics, College of Life Sciences, Ritsumeikan University, 1-1-1 Nojihigashi, Kusatsu, Shiga, 525-8577, Japan
- Takeshi Kikuchi
- Department of Bioinformatics, College of Life Sciences, Ritsumeikan University, 1-1-1 Nojihigashi, Kusatsu, Shiga, 525-8577, Japan
5. Shimomura T, Nishijima K, Kikuchi T. A new technique for predicting intrinsically disordered regions based on average distance map constructed with inter-residue average distance statistics. BMC Struct Biol 2019; 19:3. [PMID: 30727987 PMCID: PMC6366092 DOI: 10.1186/s12900-019-0101-3] [Received: 09/08/2018] [Accepted: 01/23/2019]
Abstract
Background: It had long been thought that a protein exhibits its specific function through its own specific 3D structure under physiological conditions. However, subsequent research has shown that many proteins lack a specific 3D structure under physiological conditions; these are the so-called intrinsically disordered proteins (IDPs). This study presents a new technique for predicting intrinsically disordered regions in a protein, based on our average distance map (ADM) technique. The ADM technique was developed to predict compact regions or structural domains in a protein. In a protein containing partially disordered regions, a domain region is likely to be ordered, so a disordered region is unlikely to be part of any domain. Therefore, the ADM technique is also expected to predict a disordered region between domains. Results: The results of our new technique are comparable to those of the top three performing techniques in the community-wide CASP10 experiment. We further discuss the case of p53, a tumor-suppressor protein that is among the most important cell cycle regulatory proteins. This protein exhibits a disordered character as a monomer but an ordered character when two p53 molecules form a dimer. Conclusion: Our technique can predict the location of an intrinsically disordered region in a protein with an accuracy comparable to the best techniques proposed so far. Furthermore, it can also predict a core region of IDPs that forms a definite 3D structure through interactions such as dimerization. The technique may therefore also serve as a means of predicting a disordered region that becomes ordered when binding to another protein.
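The abstract does not spell out how the ADM output is turned into predicted regions. Whatever per-residue score a predictor produces, a typical post-processing step is to convert it into contiguous segments. A hypothetical sketch, assuming a score where higher means more likely disordered, a 0.5 threshold and a minimum region length of 3 (all illustrative choices, not the paper's):

```python
from typing import List, Sequence, Tuple


def disordered_regions(scores: Sequence[float], threshold: float = 0.5,
                       min_length: int = 3) -> List[Tuple[int, int]]:
    """Return 0-based inclusive (start, end) runs where score >= threshold
    for at least min_length consecutive residues."""
    regions: List[Tuple[int, int]] = []
    start = None  # start index of the current above-threshold run, if any
    for i, s in enumerate(scores):
        if s >= threshold:
            if start is None:
                start = i
        elif start is not None:
            if i - start >= min_length:  # run covered start .. i - 1
                regions.append((start, i - 1))
            start = None
    # Close a run that extends to the C terminus.
    if start is not None and len(scores) - start >= min_length:
        regions.append((start, len(scores) - 1))
    return regions


scores = [0.1, 0.2, 0.8, 0.9, 0.7, 0.3, 0.6, 0.2, 0.9, 0.9, 0.95]
print(disordered_regions(scores))  # [(2, 4), (8, 10)]
```

The single above-threshold residue at index 6 is dropped because it is shorter than `min_length`; raising that parameter trades sensitivity for fewer spurious short regions.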
Affiliation(s)
- Takumi Shimomura
- Department of Bioinformatics, College of Life Sciences, Ritsumeikan University, 1-1-1 Nojihigashi, Kusatsu, Shiga, 525-8577, Japan
- Kohki Nishijima
- Department of Bioinformatics, College of Life Sciences, Ritsumeikan University, 1-1-1 Nojihigashi, Kusatsu, Shiga, 525-8577, Japan
- Takeshi Kikuchi
- Department of Bioinformatics, College of Life Sciences, Ritsumeikan University, 1-1-1 Nojihigashi, Kusatsu, Shiga, 525-8577, Japan.
6. Aumpuchin P, Kikuchi T. Prediction of folding mechanisms for Ig-like beta sandwich proteins based on inter-residue average distance statistics methods. Proteins 2018; 87:120-135. [PMID: 30520530 DOI: 10.1002/prot.25637] [Received: 08/09/2018] [Revised: 10/05/2018] [Accepted: 11/29/2018]
Abstract
Understanding the folding mechanism of a protein is one of the goals of bioinformatics. At present, it remains difficult to extract folding information from an amino acid sequence using standard bioinformatics techniques, and experimental protocols can be time-consuming. To overcome these problems, we aim to extract the initial folding unit of the titin protein (Ig and fnIII domains) by means of inter-residue average distance statistics, the average distance map (ADM) and contact frequency analysis (F value). The TI I27 and TNfn3 domains are used to represent the Ig domain and the fnIII domain, respectively. β-strands 2, 3, 5, and 6 are significant for the initial folding processes of TI I27. The central strands of TNfn3 were predicted to form a primary folding segment. Domains with known and unknown 3D structures were investigated by structure-based and sequence-based multiple sequence alignment, respectively, to examine the conserved hydrophobic residues and the predicted compact regions in relation to evolution. Our results correspond well to experimental data, namely φ-values and protection factors from H-D exchange experiments. The significance of conserved hydrophobic residues near F-value peaks for structural stability through hydrophobic packing is confirmed. Once again, our prediction methods could extract a folding mechanism knowing only the amino acid sequence.
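The predictions are compared with protection factors from H-D exchange. As a reminder of the standard relations (not specific to this paper): a protection factor is the ratio of the intrinsic exchange rate an amide would have in an unstructured chain to its observed exchange rate, and in the EX2 regime it converts to an apparent opening free energy ΔG = RT ln P. The rates below are invented, and estimating the intrinsic rate k_int is outside the scope of this sketch.

```python
import math

R_KJ = 8.314462618e-3  # gas constant, kJ/(mol*K)


def protection_factor(k_int: float, k_obs: float) -> float:
    """P = k_int / k_obs: intrinsic (unprotected) exchange rate over the observed rate."""
    if k_int <= 0 or k_obs <= 0:
        raise ValueError("rates must be positive")
    return k_int / k_obs


def ex2_opening_free_energy(k_int: float, k_obs: float, temperature: float = 298.15) -> float:
    """Apparent opening free energy in kJ/mol, assuming the EX2 regime: dG = RT ln P."""
    return R_KJ * temperature * math.log(protection_factor(k_int, k_obs))


# Made-up rates for one amide (same units, e.g. per second).
print(protection_factor(10.0, 1e-3))                  # about 1e4
print(round(ex2_opening_free_energy(10.0, 1e-3), 1))  # 22.8
```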
Affiliation(s)
- Panyavut Aumpuchin
- Department of Bioinformatics, College of Life Sciences, Ritsumeikan University, Kusatsu, Shiga, Japan
- Takeshi Kikuchi
- Department of Bioinformatics, College of Life Sciences, Ritsumeikan University, Kusatsu, Shiga, Japan
7. Xia X, Longo LM, Sutherland MA, Blaber M. Evolution of a protein folding nucleus. Protein Sci 2015; 25:1227-40. [PMID: 26610273 DOI: 10.1002/pro.2848] [Received: 10/15/2015] [Accepted: 11/10/2015]
Abstract
The folding nucleus (FN) is a cryptic element within protein primary structure that enables an efficient folding pathway and is the postulated heritable element in the evolution of protein architecture; however, almost nothing is known regarding how the FN structurally changes as complex protein architecture evolves from simpler peptide motifs. We report characterization of the FN of a designed purely symmetric β-trefoil protein by ϕ-value analysis. We compare the structure and folding properties of key foldable intermediates along the evolutionary trajectory of the β-trefoil. The results show structural acquisition of the FN during gene fusion events, incorporating novel turn structure created by gene fusion. Furthermore, the FN is adjusted by circular permutation in response to destabilizing functional mutation. FN plasticity by way of circular permutation is made possible by the intrinsic C3 cyclic symmetry of the β-trefoil architecture, identifying a possible selective advantage that helps explain the prevalence of cyclic structural symmetry in the proteome.
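The folding nucleus here was characterized by φ-value analysis. For reference, the usual definition, assuming two-state folding, compares how far a mutation shifts the folding barrier with how much it destabilizes the native state: φ = ΔΔG‡ / ΔΔG_eq, with ΔΔG‡ = RT ln(k_f,wt / k_f,mut). A minimal sketch with invented numbers, not data from the paper:

```python
import math

R_KJ = 8.314462618e-3  # gas constant, kJ/(mol*K)


def phi_value(kf_wt: float, kf_mut: float, ddg_eq: float, temperature: float = 298.15) -> float:
    """Phi = ddG(TS-D) / ddG(N-D) for a single mutation, assuming two-state folding.

    kf_wt, kf_mut: folding rate constants of wild type and mutant (same units).
    ddg_eq:        equilibrium destabilization, dG_wt - dG_mut, in kJ/mol (nonzero).
    """
    if kf_wt <= 0 or kf_mut <= 0:
        raise ValueError("rate constants must be positive")
    if ddg_eq == 0:
        raise ValueError("phi is undefined when the mutation does not change stability")
    ddg_ts = R_KJ * temperature * math.log(kf_wt / kf_mut)  # shift of the folding barrier
    return ddg_ts / ddg_eq


# Invented numbers: folding slows 4-fold and the protein loses 6 kJ/mol of stability.
print(round(phi_value(kf_wt=120.0, kf_mut=30.0, ddg_eq=6.0), 2))  # 0.57
```

A φ near 1 suggests the mutated position is already native-like in the transition state (part of the nucleus), while a φ near 0 suggests it is still unstructured there.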
Affiliation(s)
- Xue Xia
- Department of Biomedical Sciences, College of Medicine, Florida State University, Tallahassee, Florida, 32306-4300
- Liam M Longo
- Department of Biomedical Sciences, College of Medicine, Florida State University, Tallahassee, Florida, 32306-4300; Department of Biological Chemistry, Weizmann Institute of Science, Rehovot, Israel
- Mason A Sutherland
- Department of Biomedical Sciences, College of Medicine, Florida State University, Tallahassee, Florida, 32306-4300
- Michael Blaber
- Department of Biomedical Sciences, College of Medicine, Florida State University, Tallahassee, Florida, 32306-4300
8. Sugita M, Matsuoka M, Kikuchi T. Topological and sequence information predict that foldons organize a partially overlapped and hierarchical structure. Proteins 2015; 83:1900-13. [PMID: 26248725 DOI: 10.1002/prot.24874] [Received: 03/10/2015] [Revised: 06/23/2015] [Accepted: 07/29/2015]
Abstract
It has been suggested that proteins have substructures, called foldons, which can cooperatively fold into the native structure. However, prior investigations define foldons in various ways, citing different foldon characteristics, which makes the concept of a foldon ambiguous. In this study, we perform a Gō-model simulation and analyze the characteristics of substructures that cooperatively fold into the native-like structure. Although some results do not agree well with the experimental evidence because of the simplicity of our coarse-grained model, our results strongly suggest that cooperatively folding units sometimes organize into a partially overlapped and hierarchical structure. This view makes it easy to interpret differing proposals about foldons as differences in hierarchical level. On the basis of this finding, we present a new method to assign foldons and their hierarchy using structural and sequence information. The results show that the foldons assigned by our method correspond to the intermediate structures identified by several experimental techniques. The new method makes it easy to predict whether a protein folds sequentially into the native structure or whether some foldons fold into the native structure in parallel.
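Foldons are analyzed here with Gō-model simulations. In such simulations, folding progress is typically followed through the fraction of native contacts Q, and computing Q only over contacts inside a candidate segment is one simple way to ask whether that segment folds as a unit. A minimal sketch under common but arbitrary choices (Cα coordinates, an 8 Å native-contact cutoff, a minimum sequence separation of 4, and a 1.2× distance tolerance); it is not the foldon-assignment method of the paper.

```python
import math
from typing import Sequence, Set, Tuple

Coord = Tuple[float, float, float]
Contact = Tuple[int, int]


def native_contacts(native: Sequence[Coord], cutoff: float = 8.0, min_sep: int = 4) -> Set[Contact]:
    """Residue pairs (i < j, at least min_sep apart) within cutoff Å in the native structure."""
    n = len(native)
    return {(i, j) for i in range(n) for j in range(i + min_sep, n)
            if math.dist(native[i], native[j]) <= cutoff}


def fraction_native(conf: Sequence[Coord], contacts: Set[Contact],
                    native: Sequence[Coord], tolerance: float = 1.2) -> float:
    """Q: share of native contacts whose distance in conf is at most tolerance x native distance."""
    if not contacts:
        raise ValueError("no native contacts to evaluate")
    formed = sum(1 for i, j in contacts
                 if math.dist(conf[i], conf[j]) <= tolerance * math.dist(native[i], native[j]))
    return formed / len(contacts)


def segment_q(conf: Sequence[Coord], contacts: Set[Contact], native: Sequence[Coord],
              start: int, end: int, tolerance: float = 1.2) -> float:
    """Q restricted to native contacts with both residues inside [start, end] (0-based, inclusive)."""
    inside = {(i, j) for i, j in contacts if start <= i and j <= end}
    return fraction_native(conf, inside, native, tolerance)


# Toy 6-residue "native" chain with made-up coordinates; only residues 0 and 4 are in contact.
native = [(0.0, 0.0, 0.0), (20.0, 0.0, 0.0), (40.0, 0.0, 0.0),
          (60.0, 0.0, 0.0), (5.0, 0.0, 0.0), (65.0, 0.0, 0.0)]
contacts = native_contacts(native)
unfolded = list(native)
unfolded[4] = (50.0, 0.0, 0.0)  # pull residue 4 away from residue 0
print(contacts)                                     # {(0, 4)}
print(fraction_native(native, contacts, native))    # 1.0
print(fraction_native(unfolded, contacts, native))  # 0.0
```

Along a simulated trajectory, segments whose `segment_q` values jump together would be candidates for cooperative units; overlapping segments can be scored the same way.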
Affiliation(s)
- Masatake Sugita
- Department of Bioinformatics, College of Life Sciences, Ritsumeikan University, Kusatsu, Shiga, Japan
- Masanari Matsuoka
- Department of Bioinformatics, College of Life Sciences, Ritsumeikan University, Kusatsu, Shiga, Japan
- Takeshi Kikuchi
- Department of Bioinformatics, College of Life Sciences, Ritsumeikan University, Kusatsu, Shiga, Japan
9. Verma D, Grigoryan G, Bailey-Kellogg C. Structure-based design of combinatorial mutagenesis libraries. Protein Sci 2015; 24:895-908. [PMID: 25611189 DOI: 10.1002/pro.2642] [Received: 10/15/2014] [Revised: 12/14/2014] [Accepted: 01/11/2015]
Abstract
The development of protein variants with improved properties (thermostability, binding affinity, catalytic activity, etc.) has greatly benefited from the application of high-throughput screens evaluating large, diverse combinatorial libraries. At the same time, since only a very limited portion of sequence space can be experimentally constructed and tested, an attractive possibility is to use computational protein design to focus libraries on a productive portion of the space. We present a general-purpose method, called "Structure-based Optimization of Combinatorial Mutagenesis" (SOCoM), which can optimize arbitrarily large combinatorial mutagenesis libraries directly based on structural energies of their constituents. SOCoM chooses both positions and substitutions, employing a combinatorial optimization framework based on library-averaged energy potentials in order to avoid explicitly modeling every variant in every possible library. In case study applications to green fluorescent protein, β-lactamase, and lipase A, SOCoM optimizes relatively small, focused libraries whose variants achieve energies comparable to or better than previous library design efforts, as well as larger libraries (previously not designable by structure-based methods) whose variants cover greater diversity while still maintaining substantially better energies than would be achieved by representative random library approaches. By allowing the creation of large-scale combinatorial libraries based on structural calculations, SOCoM promises to increase the scope of applicability of computational protein design and improve the hit rate of discovering beneficial variants. While designs presented here focus on variant stability (predicted by total energy), SOCoM can readily incorporate other structure-based assessments, such as the energy gap between alternative conformational or bound states.
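SOCoM is described as using library-averaged energy potentials to avoid modeling every variant. The identity behind that kind of shortcut can be sketched under illustrative assumptions (a pairwise-decomposable energy, independent choices at each position, uniform weighting over the library; this is not SOCoM's code or energy function): the library average splits into per-position and per-position-pair averages. The energy tables below are hypothetical, and the brute-force enumeration is included only as a check.

```python
import itertools
from statistics import mean
from typing import Dict, Sequence, Tuple

# Hypothetical pairwise-decomposable energy tables (arbitrary units).
Singles = Dict[Tuple[int, str], float]          # (position, residue) -> energy
Pairs = Dict[Tuple[int, str, int, str], float]  # (i, a, j, b) with i < j -> energy; missing = 0


def variant_energy(seq: Sequence[str], singles: Singles, pairs: Pairs) -> float:
    """Energy of one variant: sum of single-position terms plus pairwise terms."""
    energy = sum(singles[(i, a)] for i, a in enumerate(seq))
    for i, j in itertools.combinations(range(len(seq)), 2):
        energy += pairs.get((i, seq[i], j, seq[j]), 0.0)
    return energy


def library_average_energy(library: Sequence[str], singles: Singles, pairs: Pairs) -> float:
    """Mean energy over every variant in the library, without enumerating variants.

    library[i] holds the residues allowed at position i; every combination is
    assumed equally likely, so each term averages independently.
    """
    total = sum(mean(singles[(i, a)] for a in choices) for i, choices in enumerate(library))
    for i, j in itertools.combinations(range(len(library)), 2):
        total += mean(pairs.get((i, a, j, b), 0.0) for a in library[i] for b in library[j])
    return total


def brute_force_average(library: Sequence[str], singles: Singles, pairs: Pairs) -> float:
    """Reference answer: enumerate all variants explicitly (exponential in length)."""
    return mean(variant_energy(v, singles, pairs) for v in itertools.product(*library))


library = ["AV", "L", "DE"]  # 2 x 1 x 2 = 4 variants
singles = {(0, "A"): -1.0, (0, "V"): -2.0, (1, "L"): -1.5, (2, "D"): 0.5, (2, "E"): -0.5}
pairs = {(0, "A", 1, "L"): 0.4, (0, "V", 2, "E"): -1.0}
print(library_average_energy(library, singles, pairs))
print(brute_force_average(library, singles, pairs))  # both are -3.05 up to float rounding
```

The averaged form costs one small average per position and per pair of positions, whereas enumeration grows with the product of the choice counts; that gap is what makes scoring very large libraries tractable.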
Affiliation(s)
- Deeptak Verma
- Department of Computer Science, Dartmouth College, Hanover, New Hampshire
10. Matsuoka M, Sugita M, Kikuchi T. Implication of the cause of differences in 3D structures of proteins with high sequence identity based on analyses of amino acid sequences and 3D structures. BMC Res Notes 2014; 7:654. [PMID: 25231773 PMCID: PMC4180342 DOI: 10.1186/1756-0500-7-654] [Received: 07/29/2014] [Accepted: 09/05/2014]
Abstract
BACKGROUND: Proteins that share high sequence homology while exhibiting drastically different 3D structures are investigated in this study. Recently, artificial proteins related to the sequences of the GA (human serum albumin-binding) and GB (IgG-binding) domains of streptococcal protein G have been designed. These artificial proteins, referred to as GA and GB, share 98% amino acid sequence identity but exhibit different 3D structures, namely a 3α bundle versus a 4β + α structure. Discriminating between their 3D structures based on their amino acid sequences is a very difficult problem. In the present work, in addition to bioinformatics techniques, an analysis based on inter-residue average distance statistics is used to address this problem. RESULTS: Ordinary analyses such as BLAST and conservation analysis alone could not distinguish which structure a given sequence would adopt. However, by adding the analysis based on inter-residue average distance statistics and our sequence tendency analysis, we could infer which parts play an important role in structure formation. CONCLUSIONS: The results suggest possible determinants of the different 3D structures for sequences with high sequence identity. The possibility of discriminating between the 3D structures based on the given sequences is also discussed.
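The GA/GB case shows that very high identity does not guarantee the same fold. For reference, percent identity over an ungapped alignment is the number of matching positions divided by the aligned length; the sketch below also lists the differing positions, since at 98% identity those few sites are an obvious place to start looking for structural determinants. The sequences are made up, not the real GA and GB sequences.

```python
from typing import List


def percent_identity(a: str, b: str) -> float:
    """Percent identity of two pre-aligned, gap-free sequences of equal length."""
    if not a or len(a) != len(b):
        raise ValueError("sequences must be non-empty and of equal length")
    matches = sum(x == y for x, y in zip(a, b))
    return 100.0 * matches / len(a)


def differing_positions(a: str, b: str) -> List[int]:
    """1-based positions where two aligned sequences differ."""
    return [i + 1 for i, (x, y) in enumerate(zip(a, b)) if x != y]


# Made-up 50-residue sequences that differ at a single position.
seq_a = "ACDEFGHIKL" * 5
seq_b = seq_a[:44] + "Y" + seq_a[45:]
print(percent_identity(seq_a, seq_b))     # 98.0
print(differing_positions(seq_a, seq_b))  # [45]
```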
Affiliation(s)
- Takeshi Kikuchi
- Department of Bioinformatics, College of Life Sciences, Ritsumeikan University, 1-1-1 Nojihigashi, Kusatsu, Shiga, Japan.