Kiebel SJ, von Kriegstein K, Daunizeau J, Friston KJ. Recognizing sequences of sequences.
PLoS Comput Biol 2009;
5:e1000464. [PMID:
19680429 PMCID:
PMC2714976 DOI:
10.1371/journal.pcbi.1000464]
[Citation(s) in RCA: 89] [Impact Index Per Article: 5.9] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/18/2009] [Accepted: 07/10/2009] [Indexed: 11/17/2022] Open
Abstract
The brain's decoding of fast sensory streams is currently impossible to emulate, even approximately, with artificial agents. For example, robust speech recognition is relatively easy for humans but exceptionally difficult for artificial speech recognition systems. In this paper, we propose that recognition can be simplified with an internal model of how sensory input is generated, when formulated in a Bayesian framework. We show that a plausible candidate for an internal or generative model is a hierarchy of ‘stable heteroclinic channels’. This model describes continuous dynamics in the environment as a hierarchy of sequences, where slower sequences cause faster sequences. Under this model, online recognition corresponds to the dynamic decoding of causal sequences, giving a representation of the environment with predictive power on several timescales. We illustrate the ensuing decoding or recognition scheme using synthetic sequences of syllables, where syllables are sequences of phonemes and phonemes are sequences of sound-wave modulations. By presenting anomalous stimuli, we find that the resulting recognition dynamics disclose inference at multiple time scales and are reminiscent of neuronal dynamics seen in the real brain.
Despite tremendous advances in neuroscience, we cannot yet build machines that recognize the world as effortlessly as we do. One reason might be that there are computational approaches to recognition that have not yet been exploited. Here, we demonstrate that the ability to recognize temporal sequences might play an important part. We show that an artificial decoding device can extract natural speech sounds from sound waves if speech is generated as dynamic and transient sequences of sequences. In principle, this means that artificial recognition can be implemented robustly and online using dynamic systems theory and Bayesian inference.
Collapse