Berezovsky IN, Kirzhner A, Kirzhner VM, Rosenfeld VR, Trifonov EN. Protein Sequences Yield a Proteomic Code.
J Biomol Struct Dyn 2003; 21:317-25. [PMID: 14616028 DOI: 10.1080/07391102.2003.10506928]
[Citation(s) in RCA: 25] [Impact Index Per Article: 1.2] [Indexed: 10/28/2022]
Abstract
Analysis of crystallized protein structures suggests that globular proteins are organized as consecutively connected units of 25-35 residues. These units are closed loops, that is, returns of the polypeptide chain trajectory to close contact with itself. This universal feature, apparently of polymer-statistical nature, is the basis for a fundamentally new view of globular proteins as loop fold structures. The same unit size has been detected by positional autocorrelation analysis in protein sequences translated from complete prokaryotic genomes, which strongly indicates an evolutionary connection among the units. The units are further characterized by prototype sequences that match their numerous derivatives in the translated genomes. Matches to the five strongest prokaryotic prototypes and to three prototypes of C. elegans are identified in the sequences of crystallized proteins, and their structures are analyzed. In the majority of cases the corresponding segments of the polypeptide chains form closed loops, though the evolutionary fate of each prototype element is shown to be rather diverse. The loop ends can then be separated by sequence-wise distant segments and stabilized by spatial interactions in the context of the overall globular structure. The units belong to a presumably limited spectrum of sequence prototypes, the full repertoire of which would constitute a proteomic code.