1
Pereira de Araújo AF. Sequence-dependent and -independent information in a combined random energy model for protein folding and coding. Proteins 2024; 92:679-687. [PMID: 38158239 DOI: 10.1002/prot.26658] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/07/2023] [Revised: 12/11/2023] [Accepted: 12/15/2023] [Indexed: 01/03/2024]
Abstract
Random energy models (REMs) provide a simple description of the energy landscapes that guide protein folding and evolution. The requirement of a large energy gap between the native structure and unfolded conformations, considered necessary for cooperative, protein-like, folding behavior, indicates that proteins differ markedly from random heteropolymers. It has been suggested, therefore, that natural selection might have acted to choose nonrandom amino acid sequences satisfying this particular condition, implying that a large fraction of possible, unselected random sequences, would not fold to any structure. From an informational perspective, however, this scenario could indicate that protein structures, regarded as messages to be transmitted through a communication channel, would not be efficiently encoded in amino acid sequences, regarded as the communication channel for this transmission, since a large fraction of possible channel states would not be used. Here, we use a combined REM for conformations and sequences, with previously estimated parameters for natural proteins, to explore an alternative possibility in which the appropriate shape of the landscape results mainly from the deviation from randomness of possible native structures instead of sequences. We observe that this situation emerges naturally if the distribution of conformational energies happens to arise from two independent contributions corresponding to sequence-dependent and -independent terms. This construction is consistent with the hypothesis of a protein burial folding code, with native structures being determined by a modest amount of sequence-dependent atomic burial information with sequence-independent constraints imposed by unspecific hydrogen bond formation. 
More generally, an appropriate combination of sequence-dependent and -independent information accommodates the possibility of an efficient structural encoding with the main physical requirement for folding, providing possible insight not only into the folding process but also into several aspects of sequence evolution such as neutral networks, conformational coverage, and de novo gene emergence.
Affiliation(s)
- Antônio F Pereira de Araújo
- Laboratório de Biofísica Teórica, Departamento de Biologia Celular, Universidade de Brasília, Brasília, Brazil
2
Sánchez IE, Galpern EA, Garibaldi MM, Ferreiro DU. Molecular Information Theory Meets Protein Folding. J Phys Chem B 2022; 126:8655-8668. [PMID: 36282961 DOI: 10.1021/acs.jpcb.2c04532] [Citation(s) in RCA: 5] [Impact Index Per Article: 2.5] [Indexed: 01/11/2023]
Abstract
We propose an application of molecular information theory to analyze the folding of single domain proteins. We analyze results from various areas of protein science, such as sequence-based potentials, reduced amino acid alphabets, backbone configurational entropy, secondary structure content, residue burial layers, and mutational studies of protein stability changes. We found that the average information contained in the sequences of evolved proteins is very close to the average information needed to specify a fold ∼2.2 ± 0.3 bits/(site·operation). The effective alphabet size in evolved proteins equals the effective number of conformations of a residue in the compact unfolded state at around 5. We calculated an energy-to-information conversion efficiency upon folding of around 50%, lower than the theoretical limit of 70%, but much higher than human-built macroscopic machines. We propose a simple mapping between molecular information theory and energy landscape theory and explore the connections between sequence evolution, configurational entropy, and the energetics of protein folding.
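The relation between an effective alphabet size and information per site can be illustrated with a minimal sketch. It assumes the standard Shannon definitions (entropy in bits, effective alphabet size as 2 raised to the entropy); the function names are hypothetical, and the abstract does not state how the authors obtained their estimates.

```python
import math

def shannon_entropy_bits(probabilities):
    """Shannon entropy H = -sum(p * log2 p) in bits; zero-probability symbols contribute nothing."""
    return -sum(p * math.log2(p) for p in probabilities if p > 0)

def effective_alphabet_size(probabilities):
    """Number of equally likely symbols that would give the same entropy, 2**H."""
    return 2 ** shannon_entropy_bits(probabilities)

# Assumption for illustration: five equally likely symbols per site.
uniform5 = [0.2] * 5
bits_per_site = shannon_entropy_bits(uniform5)  # log2(5), about 2.32 bits
```

Under this assumption, an effective alphabet of 5 corresponds to log2(5) ≈ 2.32 bits per site, which falls inside the 2.2 ± 0.3 range quoted in the abstract.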
Affiliation(s)
- Ignacio E Sánchez
- Facultad de Ciencias Exactas y Naturales, Laboratorio de Fisiología de Proteínas, Consejo Nacional de Investigaciones Científicas y Técnicas, Instituto de Química Biológica de la Facultad de Ciencias Exactas y Naturales (IQUIBICEN), Universidad de Buenos Aires, Buenos Aires CP1428, Argentina
- Ezequiel A Galpern
- Facultad de Ciencias Exactas y Naturales, Laboratorio de Fisiología de Proteínas, Consejo Nacional de Investigaciones Científicas y Técnicas, Instituto de Química Biológica de la Facultad de Ciencias Exactas y Naturales (IQUIBICEN), Universidad de Buenos Aires, Buenos Aires CP1428, Argentina
- Martín M Garibaldi
- Facultad de Ciencias Exactas y Naturales, Laboratorio de Fisiología de Proteínas, Consejo Nacional de Investigaciones Científicas y Técnicas, Instituto de Química Biológica de la Facultad de Ciencias Exactas y Naturales (IQUIBICEN), Universidad de Buenos Aires, Buenos Aires CP1428, Argentina
- Diego U Ferreiro
- Facultad de Ciencias Exactas y Naturales, Laboratorio de Fisiología de Proteínas, Consejo Nacional de Investigaciones Científicas y Técnicas, Instituto de Química Biológica de la Facultad de Ciencias Exactas y Naturales (IQUIBICEN), Universidad de Buenos Aires, Buenos Aires CP1428, Argentina
3
van der Linden MG, Ferreira DC, Pereira de Araújo AF. Constrained Layer Assignment for the Protein Burial Folding Code Accounting for Chain Connectivity. J Phys Chem B 2022; 126:6159-6170. [PMID: 35952378 DOI: 10.1021/acs.jpcb.2c03931] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Indexed: 11/29/2022]
Abstract
The connection between protein sequences and tertiary structures has intrigued investigators for decades. A plausible hypothesis for the coding scheme postulates that atomic burial information obtainable from the sequence could be sufficient for structural determination when combined with sequence-independent constraints. Accordingly, folding simulations using native burial information expressed by atomic central distances, discretized into a small number L of equiprobable burial layers, have indeed been successful in reaching and distinguishing the native structure of several globular proteins. Attempted predictions of layers from sequence, however, turned out to be insufficiently accurate for most proteins. Here we explore the possibility that a nonuniform assignment of layers, which is intended to account for constraints imposed by chain connectivity, might provide a more efficient burial encoding of tertiary structures. We consider the condition that adjacent Cα-atoms along the sequence cannot occupy nonadjacent layers, in which case the information required to specify sequences of burials would be smaller. It is shown that appropriate folding behavior can still be observed in this explicitly more constrained scenario with a structure-dependent assignment intended to produce the thinnest possible layers still compatible with the imposed burial constraint. This thinnest assignment turns out to be sufficiently restrictive for the observed examples and provides appropriately thinner layers or, equivalently, a larger number of layers, for examples previously observed to indeed require more restrictive constraints when compared to counterparts of similar size, as well as the appropriate increase in number of layers for larger proteins. Implications for the general understanding of the protein folding code are discussed.
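The two ingredients named in this abstract, equiprobable layers and the connectivity condition, can be sketched as follows. This is only an illustration under stated assumptions: the function names and the distance values are hypothetical, layers are assigned by rank so that each holds a near-equal number of atoms, and the authors' structure-dependent "thinnest" assignment is not reproduced.

```python
def equiprobable_layers(central_distances, num_layers):
    """Assign each atom a layer index 0..num_layers-1 by rank, so layers hold near-equal counts."""
    n = len(central_distances)
    order = sorted(range(n), key=lambda i: central_distances[i])
    layers = [0] * n
    for rank, atom in enumerate(order):
        layers[atom] = rank * num_layers // n
    return layers

def satisfies_connectivity(layers):
    """True if atoms adjacent along the chain never occupy nonadjacent layers."""
    return all(abs(a - b) <= 1 for a, b in zip(layers, layers[1:]))

# Hypothetical central distances (angstroms) for eight consecutive C-alpha atoms.
distances = [2.1, 4.0, 6.3, 8.9, 7.2, 5.1, 3.0, 9.5]
layers = equiprobable_layers(distances, num_layers=4)
```

For these made-up distances the four-layer equiprobable assignment violates the connectivity condition at the last pair of atoms, while a two-layer assignment satisfies it trivially, which hints at why a nonuniform assignment might be needed to keep layers thin.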
Affiliation(s)
- Marx G van der Linden
- Laboratório de Biofísica Teórica e Computacional, Departamento de Biologia Celular, Universidade de Brasília - UnB, Brasília-DF 70910-900, Brazil; Instituto Federal de Educação, Ciência e Tecnologia de Brasília - IFB, SGAN quadra 610 Módulos D, E, F, G, Brasília-DF 70830-450, Brazil
- Diogo C Ferreira
- Laboratório de Biofísica Teórica e Computacional, Departamento de Biologia Celular, Universidade de Brasília - UnB, Brasília-DF 70910-900, Brazil
- Antônio F Pereira de Araújo
- Laboratório de Biofísica Teórica e Computacional, Departamento de Biologia Celular, Universidade de Brasília - UnB, Brasília-DF 70910-900, Brazil
4
Gadzała M, Dułak D, Kalinowska B, Baster Z, Bryliński M, Konieczny L, Banach M, Roterman I. The aqueous environment as an active participant in the protein folding process. J Mol Graph Model 2018; 87:227-239. [PMID: 30580160 DOI: 10.1016/j.jmgm.2018.12.008] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.2] [Received: 08/12/2018] [Revised: 12/05/2018] [Accepted: 12/12/2018] [Indexed: 01/27/2023]
Abstract
Existing computational models applied in the protein structure prediction process do not sufficiently account for the presence of the aqueous solvent. The solvent is usually represented by a predetermined number of H2O molecules in the bounding box which contains the target chain. The fuzzy oil drop (FOD) model, presented in this paper, follows an alternative approach, with the solvent assuming the form of a continuous external hydrophobic force field, with a Gaussian distribution. The effect of this force field is to guide hydrophobic residues towards the center of the protein body, while promoting exposure of hydrophilic residues on its surface. This work focuses on the following sample proteins: Engrailed homeodomain (RCSB: 1enh), Chicken villin subdomain hp-35, n68h (RCSB: 1yrf), Chicken villin subdomain hp-35, k65(nle), n68h, k70(nle) (RCSB: 2f4k), Thermostable subdomain from chicken villin headpiece (RCSB: 1vii), de novo designed single chain three-helix bundle (a3d) (RCSB: 2a3d), albumin-binding domain (RCSB: 1prb) and lambda repressor-operator complex (RCSB: 1lmb).
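The idea of a continuous Gaussian hydrophobic field can be illustrated with a minimal sketch. The abstract states only that the field is continuous and Gaussian; the functional form below (a 3D Gaussian centered on the protein's geometric center, normalized over residues), the sigma values, the coordinates, and the function name are all assumptions made for illustration.

```python
import math

def gaussian_hydrophobicity(coords, sigmas):
    """Normalized 3D Gaussian evaluated at each position, centered at the origin."""
    sx, sy, sz = sigmas
    raw = [
        math.exp(-(x * x) / (2 * sx * sx)
                 - (y * y) / (2 * sy * sy)
                 - (z * z) / (2 * sz * sz))
        for x, y, z in coords
    ]
    total = sum(raw)
    return [value / total for value in raw]

# Hypothetical residue positions (angstroms) relative to the protein's geometric center.
positions = [(0.0, 0.0, 0.0), (3.0, 0.0, 0.0), (0.0, 6.0, 0.0), (0.0, 0.0, 9.0)]
expected = gaussian_hydrophobicity(positions, sigmas=(5.0, 5.0, 5.0))
```

In this sketch the value is highest at the center and decays toward the surface, matching the qualitative picture of hydrophobic residues being guided inward and hydrophilic residues being exposed.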
Affiliation(s)
- Dawid Dułak
- ABB Business Services Sp. z o.o., ul. Żegańska 1, 04-713, Warszawa, Poland
- Barbara Kalinowska
- Faculty of Physics, Astronomy and Applied Computer Science, Jagiellonian University, 11 Łojasiewicza Street, Kraków, Poland; Department of Bioinformatics and Telemedicine, Jagiellonian University - Medical College, Łazarza 16, 31-530, Kraków, Poland
- Zbigniew Baster
- Department of Molecular and Interfacial Biophysics, Faculty of Physics, Astronomy and Applied Computer Science, Jagiellonian University, 11 Łojasiewicza Street, Kraków, Poland; Markey Cancer Center, University of Kentucky, 789 South Limestone Street, Lexington, KY, USA
- Michał Bryliński
- Department of Biological Sciences, Louisiana State University, Baton Rouge, LA, 70803, USA; Center for Computation & Technology, Louisiana State University, Baton Rouge, LA, 70803, USA
- Leszek Konieczny
- Chair of Medical Biochemistry, Jagiellonian University - Medical College, Kopernika 7E, 31-034, Kraków, Poland
- Mateusz Banach
- Department of Bioinformatics and Telemedicine, Jagiellonian University - Medical College, Łazarza 16, 31-530, Kraków, Poland
- Irena Roterman
- Department of Bioinformatics and Telemedicine, Jagiellonian University - Medical College, Łazarza 16, 31-530, Kraków, Poland