Probabilistic grammatical model for helix-helix contact site classification.
Algorithms Mol Biol 2013; 8:31. [PMID: 24350601 PMCID: PMC3892132 DOI: 10.1186/1748-7188-8-31]
[Citation(s) in RCA: 6] [Impact Index Per Article: 0.5] [Received: 02/09/2013] [Accepted: 11/28/2013] [Indexed: 11/25/2022]
Abstract
Background
Hidden Markov Models power many state‐of‐the‐art tools in
the field of protein bioinformatics. While they excel at their tasks, these
methods of protein analysis do not directly convey information on
medium‐ and long‐range residue‐residue interactions. Capturing such
interactions requires the expressive power of at least context‐free grammars.
However, the application of more powerful grammar formalisms to protein analysis
has been surprisingly limited.
Results
In this work, we present a probabilistic grammatical framework for
problem‐specific protein languages and apply it to the classification of
transmembrane helix‐helix pair configurations. The core of the model
consists of a probabilistic context‐free grammar, automatically
inferred by a genetic algorithm from only a generic set of
expert‐based rules and positive training samples. The model was
applied to produce sequence‐based descriptors of four classes of
transmembrane helix‐helix contact site configurations. The highest
performance among the classifiers was an area under the ROC curve (AUC ROC) of 0.70. Analysis of the grammar parse trees showed that they can
represent structural features of helix‐helix contact sites.
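To make the scoring idea concrete, the following is a minimal sketch of how a probabilistic context‐free grammar assigns a probability to a sequence, using the inside (probabilistic CYK) algorithm. It is not the grammar or the implementation from the paper: the nonterminals S, T and H, the three‐letter residue alphabet and all rule probabilities are hypothetical and chosen only for illustration. The toy grammar pairs the outermost residues of a sequence recursively, which is the kind of nested dependency that a Hidden Markov Model cannot express.

```python
from collections import defaultdict

# Hypothetical toy grammar in Chomsky normal form (NOT the grammar from the paper).
# Binary rules: (parent, left, right) -> probability
BINARY = {
    ("S", "H", "T"): 0.6,  # S -> H T  (open a nested residue pair)
    ("S", "H", "H"): 0.4,  # S -> H H  (innermost pair)
    ("T", "S", "H"): 1.0,  # T -> S H  (close the pair opened by S -> H T)
}
# Lexical rules: (parent, residue) -> probability
LEXICAL = {
    ("H", "L"): 0.5,
    ("H", "A"): 0.3,
    ("H", "G"): 0.2,
}


def inside_probability(seq, start="S"):
    """Return P(seq | grammar), summed over all parse trees rooted at `start`."""
    n = len(seq)
    # chart[(i, j)][X] = probability that nonterminal X derives seq[i:j]
    chart = defaultdict(lambda: defaultdict(float))

    # Spans of length 1 are filled from the lexical rules.
    for i, residue in enumerate(seq):
        for (parent, terminal), p in LEXICAL.items():
            if terminal == residue:
                chart[(i, i + 1)][parent] += p

    # Longer spans combine two adjacent sub-spans through a binary rule.
    for span in range(2, n + 1):
        for i in range(n - span + 1):
            j = i + span
            for k in range(i + 1, j):
                for (parent, left, right), p in BINARY.items():
                    chart[(i, j)][parent] += (
                        p * chart[(i, k)][left] * chart[(k, j)][right]
                    )

    return chart[(0, n)][start]


if __name__ == "__main__":
    for s in ("LA", "LLAL", "LAL"):
        print(s, inside_probability(s))
```

In a classification setting, a score of this kind (for example its logarithm, possibly compared against a background model) could be thresholded to decide class membership, and sweeping the threshold would yield a ROC curve. That usage is an assumption made for this sketch, not a description of the authors' pipeline.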
Conclusions
We demonstrated that our probabilistic context‐free framework for
analysis of protein sequences outperforms the state of the art in the task
of helix‐helix contact site classification. However, this is achieved
without necessarily modeling long‐range dependencies between
interacting residues. A significant feature of our approach is that grammar
rules and parse trees are human‐readable. Thus, they could provide
biologically meaningful information for molecular biologists.