1
Straka Z, Svoboda T, Hoffmann M. PreCNet: Next-Frame Video Prediction Based on Predictive Coding. IEEE Transactions on Neural Networks and Learning Systems 2024; 35:10353-10367. [PMID: 37022810 DOI: 10.1109/tnnls.2023.3240857] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/19/2023]
Abstract
Predictive coding, currently a highly influential theory in neuroscience, has not been widely adopted in machine learning yet. In this work, we transform the seminal model of Rao and Ballard (1999) into a modern deep learning framework while remaining maximally faithful to the original schema. The resulting network we propose (PreCNet) is tested on a widely used next-frame video prediction benchmark, which consists of images from an urban environment recorded from a car-mounted camera, and achieves state-of-the-art performance. Performance on all measures (MSE, PSNR, and SSIM) was further improved with a larger training set (2M images from BDD100k), which points to the limitations of the KITTI training set. This work demonstrates that an architecture carefully based on a neuroscience model, without being explicitly tailored to the task at hand, can exhibit exceptional performance.
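The abstract above reports MSE and PSNR among its measures. As a rough illustration of how the two relate, the sketch below computes both for a pair of frames given as flat lists of pixel intensities. It is not taken from the PreCNet code: the function names, the flat-list frame format, and the default peak value of 1.0 are assumptions made for this example, and SSIM is left out.

```python
import math


def mse(pred, target):
    """Mean squared error between two frames given as flat lists of pixel values."""
    if len(pred) != len(target) or not pred:
        raise ValueError("frames must be non-empty and have the same length")
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred)


def psnr(pred, target, max_value=1.0):
    """Peak signal-to-noise ratio in dB; max_value is the assumed peak intensity."""
    err = mse(pred, target)
    if err == 0:
        return math.inf  # identical frames: PSNR is unbounded
    return 10.0 * math.log10(max_value ** 2 / err)


if __name__ == "__main__":
    predicted = [0.0, 0.5, 1.0, 0.25]   # hypothetical predicted next frame
    actual = [0.1, 0.5, 0.9, 0.25]      # hypothetical ground-truth frame
    print("MSE :", mse(predicted, actual))
    print("PSNR:", psnr(predicted, actual))
```

Because PSNR is a logarithmic function of MSE for a fixed peak value, lower MSE always means higher PSNR; the two measures rank predictions identically on a single frame pair, though their averages over a dataset can differ.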
2
Huang YT, Wu CT, Koike S, Chao ZC. Dissecting Mismatch Negativity: Early and Late Subcomponents for Detecting Deviants in Local and Global Sequence Regularities. eNeuro 2024; 11:ENEURO.0050-24.2024. [PMID: 38702187 PMCID: PMC11103647 DOI: 10.1523/eneuro.0050-24.2024] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/04/2024] [Revised: 04/11/2024] [Accepted: 04/26/2024] [Indexed: 05/06/2024] Open
Abstract
Mismatch negativity (MMN) is commonly recognized as a neural signal of prediction error evoked by deviants from the expected patterns of sensory input. Studies show that MMN diminishes when sequence patterns become more predictable over a longer timescale. This implies that MMN is composed of multiple subcomponents, each responding to different levels of temporal regularities. To probe the hypothesized subcomponents in MMN, we record human electroencephalography during an auditory local-global oddball paradigm where the tone-to-tone transition probability (local regularity) and the overall sequence probability (global regularity) are manipulated to control temporal predictabilities at two hierarchical levels. We find that the size of MMN is correlated with both probabilities and the spatiotemporal structure of MMN can be decomposed into two distinct subcomponents. Both subcomponents appear as negative waveforms, with one peaking early in the central-frontal area and the other late in a more frontal area. With a quantitative predictive coding model, we map the early and late subcomponents to the prediction errors that are tied to local and global regularities, respectively. Our study highlights the hierarchical complexity of MMN and offers an experimental and analytical platform for developing a multitiered neural marker applicable in clinical settings.
Affiliation(s)
- Yiyuan Teresa Huang
  - International Research Center for Neurointelligence (WPI-IRCN), UTIAS, The University of Tokyo, Tokyo 113-0033, Japan
  - School of Occupational Therapy, College of Medicine, National Taiwan University, Taipei 100, Taiwan
  - Department of Multidisciplinary Sciences, Graduate School of Arts and Sciences, The University of Tokyo, Tokyo 153-8902, Japan
- Chien-Te Wu
  - International Research Center for Neurointelligence (WPI-IRCN), UTIAS, The University of Tokyo, Tokyo 113-0033, Japan
  - School of Occupational Therapy, College of Medicine, National Taiwan University, Taipei 100, Taiwan
- Shinsuke Koike
  - International Research Center for Neurointelligence (WPI-IRCN), UTIAS, The University of Tokyo, Tokyo 113-0033, Japan
  - Department of Multidisciplinary Sciences, Graduate School of Arts and Sciences, The University of Tokyo, Tokyo 153-8902, Japan
  - University of Tokyo Institute for Diversity & Adaptation of Human Mind (UTIDAHM), Tokyo 113-0033, Japan
- Zenas C Chao
  - International Research Center for Neurointelligence (WPI-IRCN), UTIAS, The University of Tokyo, Tokyo 113-0033, Japan
3
Fresco N, Elber-Dorozko L. Scientists Invent New Hypotheses, Do Brains? Cogn Sci 2024; 48:e13400. [PMID: 38196160 DOI: 10.1111/cogs.13400] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/21/2022] [Revised: 10/19/2023] [Accepted: 12/19/2023] [Indexed: 01/11/2024]
Abstract
How are new Bayesian hypotheses generated within the framework of predictive processing? This explanatory framework purports to provide a unified, systematic explanation of cognition by appealing to Bayes rule and hierarchical Bayesian machinery alone. Given that the generation of new hypotheses is fundamental to Bayesian inference, the predictive processing framework faces an important challenge in this regard. By examining several cognitive-level and neurobiological architecture-inspired models of hypothesis generation, we argue that there is an essential difference between the two types of models. Cognitive-level models do not specify how they can be implemented in brains and include structures and assumptions that are external to the predictive processing framework. By contrast, neurobiological architecture-inspired models, which aim to better resemble brain processes, fail to explain important capacities of cognition, such as categorization and few-shot learning. The "scaling-up" challenge for proponents of predictive processing is to explain the relationship between these two types of models using only the theoretical and conceptual machinery of Bayesian inference.
Affiliation(s)
- Nir Fresco
  - Departments of Cognitive & Brain Sciences and Philosophy, Ben-Gurion University of the Negev
- Lotem Elber-Dorozko
  - The Humanities and Arts Department, Technion - Israel Institute of Technology
  - The Center for Philosophy of Science, University of Pittsburgh
4
Brucklacher M, Bohté SM, Mejias JF, Pennartz CMA. Local minimization of prediction errors drives learning of invariant object representations in a generative network model of visual perception. Front Comput Neurosci 2023; 17:1207361. [PMID: 37818157 PMCID: PMC10561268 DOI: 10.3389/fncom.2023.1207361] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/17/2023] [Accepted: 08/31/2023] [Indexed: 10/12/2023] Open
Abstract
The ventral visual processing hierarchy of the cortex needs to fulfill at least two key functions: perceived objects must be mapped to high-level representations invariantly of the precise viewing conditions, and a generative model must be learned that allows, for instance, to fill in occluded information guided by visual experience. Here, we show how a multilayered predictive coding network can learn to recognize objects from the bottom up and to generate specific representations via a top-down pathway through a single learning rule: the local minimization of prediction errors. Trained on sequences of continuously transformed objects, neurons in the highest network area become tuned to object identity invariant of precise position, comparable to inferotemporal neurons in macaques. Drawing on this, the dynamic properties of invariant object representations reproduce experimentally observed hierarchies of timescales from low to high levels of the ventral processing stream. The predicted faster decorrelation of error-neuron activity compared to representation neurons is of relevance for the experimental search for neural correlates of prediction errors. Lastly, the generative capacity of the network is confirmed by reconstructing specific object images, robust to partial occlusion of the inputs. By learning invariance from temporal continuity within a generative model, the approach generalizes the predictive coding framework to dynamic inputs in a more biologically plausible way than self-supervised networks with non-local error-backpropagation. This was achieved simply by shifting the training paradigm to dynamic inputs, with little change in architecture and learning rule from static input-reconstructing Hebbian predictive coding networks.
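The "single learning rule" named in the abstract above, local minimization of prediction errors, can be sketched for one linear layer in the spirit of Rao and Ballard (1999). This is an illustrative toy, not the authors' network: the function names, the linear generative model, the step sizes, and the number of inference steps are all assumptions made for this example.

```python
def predict(weights, rep):
    """Top-down prediction of the input from the representation: x_hat = W r."""
    return [sum(w * r for w, r in zip(row, rep)) for row in weights]


def squared_error(error):
    """Sum of squared prediction errors."""
    return sum(e * e for e in error)


def infer_and_learn(x, weights, n_rep, infer_steps=50, infer_lr=0.1, learn_lr=0.01):
    """Present one input: settle the representation, then update weights locally.

    Returns the settled representation, the updated weights, and the
    prediction error measured before the weight update.
    """
    n_in = len(x)
    rep = [0.0] * n_rep
    for _ in range(infer_steps):
        error = [xi - pi for xi, pi in zip(x, predict(weights, rep))]
        # Each representation unit moves along the error fed back through W^T.
        rep = [
            r + infer_lr * sum(weights[i][j] * error[i] for i in range(n_in))
            for j, r in enumerate(rep)
        ]
    error = [xi - pi for xi, pi in zip(x, predict(weights, rep))]
    # Hebbian-style update: each weight changes by (local error) * (its unit's activity).
    new_weights = [
        [weights[i][j] + learn_lr * error[i] * rep[j] for j in range(n_rep)]
        for i in range(n_in)
    ]
    return rep, new_weights, error


if __name__ == "__main__":
    w = [[0.5, 0.0], [0.0, 0.5], [0.5, 0.5]]  # hypothetical 3-input, 2-unit layer
    x = [1.0, 0.0, 1.0]
    for presentation in range(5):
        rep, w, err = infer_and_learn(x, w, n_rep=2)
        print(presentation, squared_error(err))
```

Both updates use only quantities available at the connection being changed (the error at one end, the activity at the other), which is the sense in which the rule is local, in contrast to error backpropagation.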
Affiliation(s)
- Matthias Brucklacher
  - Cognitive and Systems Neuroscience Group, Swammerdam Institute for Life Sciences, University of Amsterdam, Amsterdam, Netherlands
- Sander M Bohté
  - Cognitive and Systems Neuroscience Group, Swammerdam Institute for Life Sciences, University of Amsterdam, Amsterdam, Netherlands
  - Machine Learning Group, Centrum Wiskunde & Informatica, Amsterdam, Netherlands
- Jorge F Mejias
  - Cognitive and Systems Neuroscience Group, Swammerdam Institute for Life Sciences, University of Amsterdam, Amsterdam, Netherlands
- Cyriel M A Pennartz
  - Cognitive and Systems Neuroscience Group, Swammerdam Institute for Life Sciences, University of Amsterdam, Amsterdam, Netherlands
5
Dodgson DB, Raymond JE. Banknote authenticity is signalled by rapid neural responses. Sci Rep 2022; 12:2076. [PMID: 35136115 PMCID: PMC8827094 DOI: 10.1038/s41598-022-05972-8] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/18/2021] [Accepted: 01/18/2022] [Indexed: 11/17/2022] Open
Abstract
Authenticating valuable objects is widely assumed to involve protracted scrutiny for detection of reproduction flaws. Yet, accurate authentication of banknotes is possible within one second of viewing, suggesting that rapid neural processes may underpin counterfeit detection. To investigate, we measured event-related brain potentials (ERPs) in response to briefly viewed genuine or forensically recovered counterfeit banknotes presented in a visual oddball counterfeit detection task. Three ERP components, P1, P3, and extended P3, were assessed for each combination of banknote type (genuine, counterfeit) and overt response (“real”, “fake”). P1 amplitude was greater for oddballs, demonstrating that the initial feedforward sweep of visual processing yields the essential information for differentiating genuine from counterfeit. A similar oddball effect was found for P3. The magnitude of this P3 effect was positively correlated with behavioural counterfeit sensitivity, although the corresponding correlation for P1 was not. For the extended P3, amplitude was greatest for correctly detected counterfeits and similarly small for missed counterfeits, incorrectly and correctly categorised genuine banknotes. These results show that authentication of complex stimuli involves a cascade of neural processes that unfolds in under a second, beginning with a very rapid sensory analysis, followed by a later decision stage requiring higher level processing.
Affiliation(s)
- Daniel B Dodgson
  - School of Psychology, University of Birmingham, Edgbaston, Birmingham, B15 2TT, UK
- Jane E Raymond
  - School of Psychology, University of Birmingham, Edgbaston, Birmingham, B15 2TT, UK
6
Teichmann M, Larisch R, Hamker FH. Performance of biologically grounded models of the early visual system on standard object recognition tasks. Neural Netw 2021; 144:210-228. [PMID: 34507042 DOI: 10.1016/j.neunet.2021.08.009] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 01/30/2021] [Revised: 07/05/2021] [Accepted: 08/04/2021] [Indexed: 11/29/2022]
Abstract
Computational neuroscience models of vision and neural network models for object recognition are often framed by different research agendas. Computational neuroscience mainly aims at replicating experimental data, while (artificial) neural networks target high performance on classification tasks. However, we propose that models of vision should be validated on object recognition tasks. At some point, mechanisms of realistic neuro-computational models of the visual cortex have to convince in object recognition as well. In order to foster this idea, we report the recognition accuracy for two different neuro-computational models of the visual cortex on several object recognition datasets. The models were trained using unsupervised Hebbian learning rules on natural scene inputs for the emergence of receptive fields comparable to their biological counterpart. We assume that the emerged receptive fields result in a general codebook of features, which should be applicable to a variety of visual scenes. We report the performances on datasets with different levels of difficulty, ranging from the simple MNIST to the more complex CIFAR-10 or ETH-80. We found that both networks show good results on simple digit recognition, comparable with previously published biologically plausible models. We also observed that our deeper layer neurons provide for naturalistic datasets a better recognition codebook. As for most datasets, recognition results of biologically grounded models are not available yet, our results provide a broad basis of performance values to compare methodologically similar models.
Affiliation(s)
- Michael Teichmann
  - Chemnitz University of Technology, Str. der Nationen, 62, 09111, Chemnitz, Germany
- René Larisch
  - Chemnitz University of Technology, Str. der Nationen, 62, 09111, Chemnitz, Germany
- Fred H Hamker
  - Chemnitz University of Technology, Str. der Nationen, 62, 09111, Chemnitz, Germany
7
Parr T, Sajid N, Da Costa L, Mirza MB, Friston KJ. Generative Models for Active Vision. Front Neurorobot 2021; 15:651432. [PMID: 33927605 PMCID: PMC8076554 DOI: 10.3389/fnbot.2021.651432] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.7] [Received: 01/09/2021] [Accepted: 03/15/2021] [Indexed: 11/13/2022] Open
Abstract
The active visual system comprises the visual cortices, cerebral attention networks, and oculomotor system. While fascinating in its own right, it is also an important model for sensorimotor networks in general. A prominent approach to studying this system is active inference, which assumes the brain makes use of an internal (generative) model to predict proprioceptive and visual input. This approach treats action as ensuring sensations conform to predictions (i.e., by moving the eyes) and posits that visual percepts are the consequence of updating predictions to conform to sensations. Under active inference, the challenge is to identify the form of the generative model that makes these predictions, and thus directs behavior. In this paper, we provide an overview of the generative models that the brain must employ to engage in active vision. This means specifying the processes that explain retinal cell activity and proprioceptive information from oculomotor muscle fibers. In addition to the mechanics of the eyes and retina, these processes include our choices about where to move our eyes. These decisions rest upon beliefs about salient locations, or the potential for information gain and belief-updating. A key theme of this paper is the relationship between "looking" and "seeing" under the brain's implicit generative model of the visual world.
Affiliation(s)
- Thomas Parr
  - Wellcome Centre for Human Neuroimaging, Queen Square Institute of Neurology, London, United Kingdom
- Noor Sajid
  - Wellcome Centre for Human Neuroimaging, Queen Square Institute of Neurology, London, United Kingdom
- Lancelot Da Costa
  - Wellcome Centre for Human Neuroimaging, Queen Square Institute of Neurology, London, United Kingdom
  - Department of Mathematics, Imperial College London, London, United Kingdom
- M. Berk Mirza
  - Department of Neuroimaging, Centre for Neuroimaging Sciences, Institute of Psychiatry, Psychology & Neuroscience, King's College London, London, United Kingdom
- Karl J. Friston
  - Wellcome Centre for Human Neuroimaging, Queen Square Institute of Neurology, London, United Kingdom
8
Peyrin C, Roux-Sibilon A, Trouilloud A, Khazaz S, Joly M, Pichat C, Boucart M, Krainik A, Kauffmann L. Semantic and Physical Properties of Peripheral Vision Are Used for Scene Categorization in Central Vision. J Cogn Neurosci 2021; 33:799-813. [PMID: 33571079 DOI: 10.1162/jocn_a_01689] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Indexed: 11/04/2022]
Abstract
Theories of visual recognition postulate that our ability to understand our visual environment at a glance is based on the extraction of the gist of the visual scene, a first global and rudimentary visual representation. Gist perception would be based on the rapid analysis of low spatial frequencies in the visual signal and would allow a coarse categorization of the scene. We aimed to study whether the low spatial resolution information available in peripheral vision could modulate the processing of visual information presented in central vision. We combined behavioral measures (Experiments 1 and 2) and fMRI measures (Experiment 2). Participants categorized a scene presented in central vision (artificial vs. natural categories) while ignoring another scene, either semantically congruent or incongruent, presented in peripheral vision. The two scenes could either share the same physical properties (similar amplitude spectrum and spatial configuration) or not. Categorization of the central scene was impaired by a semantically incongruent peripheral scene, in particular when the two scenes were physically similar. This semantic interference effect was associated with increased activation of the inferior frontal gyrus. When the two scenes were semantically congruent, the dissimilarity of their physical properties impaired the categorization of the central scene. This effect was associated with increased activation in occipito-temporal areas. In line with the hypothesis of predictive mechanisms involved in visual recognition, results suggest that semantic and physical properties of the information coming from peripheral vision would be automatically used to generate predictions that guide the processing of signal in central vision.
9
Boutin V, Franciosini A, Chavane F, Ruffier F, Perrinet L. Sparse deep predictive coding captures contour integration capabilities of the early visual system. PLoS Comput Biol 2021; 17:e1008629. [PMID: 33497381 PMCID: PMC7864399 DOI: 10.1371/journal.pcbi.1008629] [Citation(s) in RCA: 9] [Impact Index Per Article: 3.0] [Received: 10/17/2019] [Revised: 02/05/2021] [Accepted: 12/12/2020] [Indexed: 11/20/2022] Open
Abstract
Both neurophysiological and psychophysical experiments have pointed out the crucial role of recurrent and feedback connections to process context-dependent information in the early visual cortex. While numerous models have accounted for feedback effects at either neural or representational level, none of them were able to bind those two levels of analysis. Is it possible to describe feedback effects at both levels using the same model? We answer this question by combining Predictive Coding (PC) and Sparse Coding (SC) into a hierarchical and convolutional framework applied to realistic problems. In the Sparse Deep Predictive Coding (SDPC) model, the SC component models the internal recurrent processing within each layer, and the PC component describes the interactions between layers using feedforward and feedback connections. Here, we train a 2-layered SDPC on two different databases of images, and we interpret it as a model of the early visual system (V1 & V2). We first demonstrate that once the training has converged, SDPC exhibits oriented and localized receptive fields in V1 and more complex features in V2. Second, we analyze the effects of feedback on the neural organization beyond the classical receptive field of V1 neurons using interaction maps. These maps are similar to association fields and reflect the Gestalt principle of good continuation. We demonstrate that feedback signals reorganize interaction maps and modulate neural activity to promote contour integration. Third, we demonstrate at the representational level that the SDPC feedback connections are able to overcome noise in input images. Therefore, the SDPC captures the association field principle at the neural level which results in a better reconstruction of blurred images at the representational level.
Affiliation(s)
- Victor Boutin
  - Aix Marseille Univ, CNRS, INT, Inst Neurosci Timone, Marseille, France
  - Aix Marseille Univ, CNRS, ISM, Marseille, France
- Frederic Chavane
  - Aix Marseille Univ, CNRS, INT, Inst Neurosci Timone, Marseille, France
- Laurent Perrinet
  - Aix Marseille Univ, CNRS, INT, Inst Neurosci Timone, Marseille, France
10
Boutin V, Franciosini A, Ruffier F, Perrinet L. Effect of Top-Down Connections in Hierarchical Sparse Coding. Neural Comput 2020; 32:2279-2309. [PMID: 32946716 DOI: 10.1162/neco_a_01325] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.3] [Indexed: 11/04/2022]
Abstract
Hierarchical sparse coding (HSC) is a powerful model to efficiently represent multidimensional, structured data such as images. The simplest solution to solve this computationally hard problem is to decompose it into independent layer-wise subproblems. However, neuroscientific evidence would suggest interconnecting these subproblems as in predictive coding (PC) theory, which adds top-down connections between consecutive layers. In this study, we introduce a new model, 2-layer sparse predictive coding (2L-SPC), to assess the impact of this interlayer feedback connection. In particular, the 2L-SPC is compared with a hierarchical Lasso (Hi-La) network made out of a sequence of independent Lasso layers. The 2L-SPC and a 2-layer Hi-La network are trained on four different databases and with different sparsity parameters on each layer. First, we show that the overall prediction error generated by 2L-SPC is lower thanks to the feedback mechanism as it transfers prediction error between layers. Second, we demonstrate that the inference stage of the 2L-SPC is faster to converge and generates a refined representation in the second layer compared to the Hi-La model. Third, we show that the 2L-SPC top-down connection accelerates the learning process of the HSC problem. Finally, the analysis of the emerging dictionaries shows that the 2L-SPC features are more generic and present a larger spatial extension.
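The abstract above describes the Hi-La baseline as a sequence of independent Lasso layers. A single such layer can be sketched as below, solving the Lasso problem with the iterative shrinkage-thresholding algorithm (ISTA). The choice of ISTA, the function names, the list-of-rows dictionary format, and all parameter values are assumptions for this example; the abstract does not say which solver the authors use, and this sketch does not include the top-down term that distinguishes the 2L-SPC.

```python
def soft_threshold(value, threshold):
    """Proximal operator of the L1 norm: shrink value toward zero by threshold."""
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0


def ista(x, dictionary, lam, step, n_iter=200):
    """Approximately minimise 0.5 * ||x - D a||^2 + lam * ||a||_1 over the code a.

    dictionary is a list of rows (one row per input dimension, one column per atom).
    step is assumed to be at most 1 / (largest eigenvalue of D^T D).
    """
    n_in = len(x)
    n_atoms = len(dictionary[0])
    code = [0.0] * n_atoms
    for _ in range(n_iter):
        # Residual between the input and its current reconstruction D a.
        residual = [
            xi - sum(d * a for d, a in zip(row, code))
            for xi, row in zip(x, dictionary)
        ]
        # Gradient step on the quadratic term, then shrinkage for the L1 term.
        code = [
            soft_threshold(
                code[j] + step * sum(dictionary[i][j] * residual[i] for i in range(n_in)),
                step * lam,
            )
            for j in range(n_atoms)
        ]
    return code


if __name__ == "__main__":
    identity = [[1.0, 0.0], [0.0, 1.0]]  # hypothetical 2-atom dictionary
    print(ista([1.0, 0.05], identity, lam=0.1, step=1.0))
```

In a Hi-La-style stack, a second layer would run the same routine on the first layer's code with its own dictionary and sparsity parameter, with no information flowing back down; the abstract attributes the 2L-SPC's lower prediction error to adding that feedback.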
Affiliation(s)
- Victor Boutin
  - CNRS, INT, Institut de Neurosciences de la Timone, Aix-Marseille Université, Marseille, France and CNRS, ISM, Aix Marseille Université, Marseille, France
- Angelo Franciosini
  - CNRS, Institut de Neurosciences de la Timone, Aix-Marseille Université, 13005 Marseille, France
- Franck Ruffier
  - CNRS, Institut des Sciences du Mouvement, Aix-Marseille Université, 13009 Marseille, France
- Laurent Perrinet
  - CNRS, Institut de Neurosciences de la Timone, Aix-Marseille Université, 13005 Marseille, France
11
Shalev I. Motivated Cue-Integration and Emotion Regulation: Awareness of the Association Between Interoceptive and Exteroceptive Embodied Cues and Personal Need Creates an Emotion Goal. Front Psychol 2020; 11:1630. [PMID: 32754097 PMCID: PMC7367138 DOI: 10.3389/fpsyg.2020.01630] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Received: 09/24/2019] [Accepted: 06/16/2020] [Indexed: 02/04/2023] Open
Abstract
Research on emotion suggests that individuals regulate their emotions to attain hedonic or instrumental goals. However, little is known of emotion regulation under low emotional clarity. The theory of motivated cue integration (MCI) suggests that emotion regulation under low emotional clarity should be understood as dissociation between a high-level individual hierarchical system of goals and low-level interoceptive and exteroceptive embodied cues. MCI conceptualizes low emotional clarity as the product of low access to signals of emotion that result in prediction error associated with mismatch between the current bodily state and the predicted state. This deficit in emotional processing could be understood as a problem of means substitution, suggesting that use of alternative multisensory data may facilitate situational evaluation. Based on this reasoning, a new perspective on emotion regulation under low emotional clarity is presented, according to which interchangeable attention to multisensory data associated with words, associations, and images may help in cue integration, enabling the creation of a link between concrete bodily cues, abstract mental representation, and a more accurate prediction. Based on the idea that emotional episodes are conceptualized as special types of goal-directed action episodes, this process will lead to the creation of broader integrative meaning, resulting in the development of an emotion goal.
Affiliation(s)
- Idit Shalev
  - Laboratory for Embodiment and Self-Regulation, Department of Psychology, Ariel University, Ariel, Israel
12
Fields C, Glazebrook JF. Do Process-1 simulations generate the epistemic feelings that drive Process-2 decision making? Cogn Process 2020; 21:533-553. [PMID: 32607801 DOI: 10.1007/s10339-020-00981-9] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.0] [Received: 11/05/2019] [Accepted: 06/09/2020] [Indexed: 11/24/2022]
Abstract
We apply previously developed Chu space and Channel Theory methods, focusing on the construction of Cone-Cocone Diagrams (CCCDs), to study the role of epistemic feelings, particularly feelings of confidence, in dual process models of problem solving. We specifically consider "Bayesian brain" models of probabilistic inference within a global neuronal workspace architecture. We develop a formal representation of Process-1 problem solving in which a solution is reached if and only if a CCCD is completed. We show that in this representation, Process-2 problem solving can be represented as multiply iterated Process-1 problem solving and has the same formal solution conditions. We then model the generation of explicit, reportable subjective probabilities from implicit, experienced confidence as a simulation-based, reverse engineering process and show that this process can also be modeled as a CCCD construction.
Affiliation(s)
- James F Glazebrook
  - Department of Mathematics and Computer Science, Eastern Illinois University, 600 Lincoln Ave., Charleston, IL, 61920-3099, USA
  - Department of Mathematics, University of Illinois at Urbana-Champaign, Urbana, IL, 61801, USA
13
Dense-CaptionNet: a Sentence Generation Architecture for Fine-grained Description of Image Semantics. Cognit Comput 2020. [DOI: 10.1007/s12559-019-09697-1] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.3] [Indexed: 11/26/2022]
14
Valuch C, Kulke L. Predictive context biases binocular rivalry in children and adults with no positive relation to two measures of social cognition. Sci Rep 2020; 10:2059. [PMID: 32029863 PMCID: PMC7005192 DOI: 10.1038/s41598-020-58921-8] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Received: 09/19/2019] [Accepted: 01/21/2020] [Indexed: 11/09/2022] Open
Abstract
Integration of prior experience and contextual information can help to resolve perceptually ambiguous situations and might support the ability to understand other people's thoughts and intentions, called Theory of Mind. We studied whether the readiness to incorporate contextual information for resolving binocular rivalry is positively associated with Theory-of-Mind-related social cognitive abilities. In children (12 to 13 years) and adults (18 to 25 years), a predictive temporal context reliably modulated the onset of binocular rivalry to a similar degree. In contrast, adult participants scored better on measures of Theory of Mind compared to children. We observed considerable interindividual differences regarding the influence of a predictive context on binocular rivalry, which were associated with differences in sensory eye dominance. The absence of a positive association between predictive effects on perception and Theory of Mind performance suggests that predictive effects on binocular rivalry and higher-level Theory-of-Mind-related abilities stem from different neurocognitive mechanisms. We conclude that the influence of predictive contextual information on basic visual processes is fully developed at an earlier age, whereas social cognitive skills continue to evolve from adolescence to adulthood.
Affiliation(s)
- Christian Valuch
  - Department of Experimental Psychology, University of Goettingen, Goettingen, Germany
  - Leibniz ScienceCampus Primate Cognition, Goettingen, Germany
- Louisa Kulke
  - Department of Affective Neuroscience and Psychophysiology, University of Goettingen, Goettingen, Germany
  - Leibniz ScienceCampus Primate Cognition, Goettingen, Germany
15
Han K, Wen H, Shi J, Lu KH, Zhang Y, Fu D, Liu Z. Variational autoencoder: An unsupervised model for encoding and decoding fMRI activity in visual cortex. Neuroimage 2019; 198:125-136. [PMID: 31103784 PMCID: PMC6592726 DOI: 10.1016/j.neuroimage.2019.05.039] [Citation(s) in RCA: 64] [Impact Index Per Article: 12.8] [Received: 01/13/2018] [Revised: 04/13/2019] [Accepted: 05/15/2019] [Indexed: 01/21/2023] Open
Abstract
Goal-driven and feedforward-only convolutional neural networks (CNN) have been shown to be able to predict and decode cortical responses to natural images or videos. Here, we explored an alternative deep neural network, variational auto-encoder (VAE), as a computational model of the visual cortex. We trained a VAE with a five-layer encoder and a five-layer decoder to learn visual representations from a diverse set of unlabeled images. Using the trained VAE, we predicted and decoded cortical activity observed with functional magnetic resonance imaging (fMRI) from three human subjects passively watching natural videos. Compared to CNN, VAE could predict the video-evoked cortical responses with comparable accuracy in early visual areas, but relatively lower accuracy in higher-order visual areas. The distinction between CNN and VAE in terms of encoding performance was primarily attributed to their different learning objectives, rather than their different model architecture or number of parameters. Despite lower encoding accuracies, VAE offered a more convenient strategy for decoding the fMRI activity to reconstruct the video input, by first converting the fMRI activity to the VAE's latent variables, and then converting the latent variables to the reconstructed video frames through the VAE's decoder. This strategy was more advantageous than alternative decoding methods, e.g. partial least squares regression, for being able to reconstruct both the spatial structure and color of the visual input. Such findings highlight VAE as an unsupervised model for learning visual representation, as well as its potential and limitations for explaining cortical responses and reconstructing naturalistic and diverse visual experiences.
Affiliation(s)
- Kuan Han
- School of Electrical and Computer Engineering, USA; Purdue Institute for Integrative Neuroscience, Purdue University, West Lafayette, IN, 47906, USA
- Haiguang Wen
- School of Electrical and Computer Engineering, USA; Purdue Institute for Integrative Neuroscience, Purdue University, West Lafayette, IN, 47906, USA
- Junxing Shi
- School of Electrical and Computer Engineering, USA; Purdue Institute for Integrative Neuroscience, Purdue University, West Lafayette, IN, 47906, USA
- Kun-Han Lu
- School of Electrical and Computer Engineering, USA; Purdue Institute for Integrative Neuroscience, Purdue University, West Lafayette, IN, 47906, USA
- Yizhen Zhang
- School of Electrical and Computer Engineering, USA; Purdue Institute for Integrative Neuroscience, Purdue University, West Lafayette, IN, 47906, USA
- Di Fu
- School of Electrical and Computer Engineering, USA; Purdue Institute for Integrative Neuroscience, Purdue University, West Lafayette, IN, 47906, USA
- Zhongming Liu
- Weldon School of Biomedical Engineering, USA; School of Electrical and Computer Engineering, USA; Purdue Institute for Integrative Neuroscience, Purdue University, West Lafayette, IN, 47906, USA.
|
16
|
Spratling MW. Fitting predictive coding to the neurophysiological data. Brain Res 2019; 1720:146313. [PMID: 31265817 DOI: 10.1016/j.brainres.2019.146313] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.0] [Received: 04/23/2019] [Revised: 06/18/2019] [Accepted: 06/27/2019] [Indexed: 02/02/2023]
Abstract
Recent neurophysiological data showing the effects of locomotion on neural activity in mouse primary visual cortex has been interpreted as providing strong support for the predictive coding account of cortical function. Specifically, this work has been interpreted as providing direct evidence that prediction-error, a distinguishing property of predictive coding, is encoded in cortex. This article evaluates these claims and highlights some of the discrepancies between the proposed predictive coding model and the neurobiology. Furthermore, it is shown that the model can be modified so as to fit the empirical data more successfully.
Affiliation(s)
- M W Spratling
- King's College London, Department of Informatics, London, UK.
|
17
|
Zhukova NA, Andriyanova NR. Cognitive Monitoring of Distributed Objects. Automatic Documentation and Mathematical Linguistics 2019. [DOI: 10.3103/s0005105519010084] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/30/2022]
|
18
|
Yue Z, Gao F, Xiong Q, Wang J, Huang T, Yang E, Zhou H. A Novel Semi-Supervised Convolutional Neural Network Method for Synthetic Aperture Radar Image Recognition. Cognit Comput 2019. [DOI: 10.1007/s12559-019-09639-x] [Citation(s) in RCA: 41] [Impact Index Per Article: 8.2] [Indexed: 11/25/2022]
|
19
|
|
20
|
Sanchez-Giraldo LG, Laskar MNU, Schwartz O. Normalization and pooling in hierarchical models of natural images. Curr Opin Neurobiol 2019; 55:65-72. [PMID: 30785005 DOI: 10.1016/j.conb.2019.01.008] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.8] [Received: 08/24/2018] [Revised: 12/29/2018] [Accepted: 01/13/2019] [Indexed: 11/17/2022]
Abstract
Divisive normalization and subunit pooling are two canonical classes of computation that have become widely used in descriptive (what) models of visual cortical processing. Normative (why) models from natural image statistics can help constrain the form and parameters of such classes of models. We focus on recent advances in two particular directions, namely deriving richer forms of divisive normalization, and advances in learning pooling from image statistics. We discuss the incorporation of such components into hierarchical models. We consider both hierarchical unsupervised learning from image statistics, and discriminative supervised learning in deep convolutional neural networks (CNNs). We further discuss studies on the utility and extensions of the convolutional architecture, which has also been adopted by recent descriptive models. We review the recent literature and discuss the current promises and gaps of using such approaches to gain a better understanding of how cortical neurons represent and process complex visual stimuli.
Affiliation(s)
- Luis G Sanchez-Giraldo
- Computational Neuroscience Lab, Dept. of Computer Science, University of Miami, FL 33146, United States.
- Md Nasir Uddin Laskar
- Computational Neuroscience Lab, Dept. of Computer Science, University of Miami, FL 33146, United States
- Odelia Schwartz
- Computational Neuroscience Lab, Dept. of Computer Science, University of Miami, FL 33146, United States
|
21
|
Betz N, Hoemann K, Barrett LF. Words are a context for mental inference. Emotion 2019; 19:1463-1477. [PMID: 30628815 DOI: 10.1037/emo0000510] [Citation(s) in RCA: 20] [Impact Index Per Article: 4.0] [Indexed: 11/08/2022]
Abstract
Accumulating evidence indicates that context has an important impact on inferring emotion in facial configurations. In this paper, we report on three studies examining whether words referring to mental states contribute to mental inference in images from the Reading the Mind in the Eyes Test (Study 1; Baron-Cohen et al., 2001), in static emoji (Study 2), and in animated emoji (Study 3). Across all three studies, we predicted and found that perceivers were more likely to infer mental states when relevant words were embedded in the experimental context (i.e., in a forced-choice task) versus when those words were absent (i.e., in a free-labeling task). We discuss the implications of these findings for the widespread conclusion that faces or parts of faces "display" emotions or other mental states, as well as for psychology's continued reliance on forced-choice methods. (PsycINFO Database Record (c) 2019 APA, all rights reserved).
|
22
|
Fields C, Glazebrook JF. A mosaic of Chu spaces and Channel Theory II: applications to object identification and mereological complexity. J Exp Theor Artif Intell 2018. [DOI: 10.1080/0952813x.2018.1544285] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.3] [Indexed: 10/27/2022]
Affiliation(s)
- James F. Glazebrook
- Department of Mathematics and Computer Science, Eastern Illinois University, Charleston, IL, USA
- Adjunct Faculty, Department of Mathematics, University of Illinois at Urbana–Champaign, Urbana, IL, USA
|
23
|
Abstract
Multiple sciences have converged, in the past two decades, on a hitherto mostly unremarked question: what is observation? Here, I examine this evolution, focusing on three sciences: physics, especially quantum information theory, developmental biology, especially its molecular and “evo-devo” branches, and cognitive science, especially perceptual psychology and robotics. I trace the history of this question to the late 19th century, and through the conceptual revolutions of the 20th century. I show how the increasing interdisciplinary focus on the process of extracting information from an environment provides an opportunity for conceptual unification, and sketch an outline of what such a unification might look like.
|
24
|
|
25
|
A No-Reference Image Quality Measure for Blurred and Compressed Images Using Sparsity Features. Cognit Comput 2018. [DOI: 10.1007/s12559-018-9562-0] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.7] [Indexed: 10/28/2022]
|
26
|
Abe Y, Fujita K, Kashimori Y. Visual and Category Representations Shaped by the Interaction Between Inferior Temporal and Prefrontal Cortices. Cognit Comput 2018. [DOI: 10.1007/s12559-018-9570-0] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Indexed: 10/14/2022]
|
27
|
End-to-End ConvNet for Tactile Recognition Using Residual Orthogonal Tiling and Pyramid Convolution Ensemble. Cognit Comput 2018. [DOI: 10.1007/s12559-018-9568-7] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.5] [Indexed: 10/14/2022]
|
28
|
Denham SL, Winkler I. Predictive coding in auditory perception: challenges and unresolved questions. Eur J Neurosci 2018; 51:1151-1160. [PMID: 29250827 DOI: 10.1111/ejn.13802] [Citation(s) in RCA: 26] [Impact Index Per Article: 4.3] [Received: 05/31/2017] [Revised: 09/03/2017] [Accepted: 11/20/2017] [Indexed: 11/30/2022]
Abstract
Predictive coding is arguably the currently dominant theoretical framework for the study of perception. It has been employed to explain important auditory perceptual phenomena, and it has inspired theoretical, experimental and computational modelling efforts aimed at describing how the auditory system parses the complex sound input into meaningful units (auditory scene analysis). These efforts have uncovered some vital questions, addressing which could help to further specify predictive coding and clarify some of its basic assumptions. The goal of the current review is to motivate these questions and show how unresolved issues in explaining some auditory phenomena lead to general questions of the theoretical framework. We focus on experimental and computational modelling issues related to sequential grouping in auditory scene analysis (auditory pattern detection and bistable perception), as we believe that this is the research topic where predictive coding has the highest potential for advancing our understanding. In addition to specific questions, our analysis led us to identify three more general questions that require further clarification: (1) What exactly is meant by prediction in predictive coding? (2) What governs which generative models make the predictions? and (3) What (if it exists) is the correlate of perceptual experience within the predictive coding framework?
Affiliation(s)
- Susan L Denham
- School of Psychology, University of Plymouth, Drake Circus, Plymouth, PL4 8AA, UK
- István Winkler
- Institute of Cognitive Neuroscience and Psychology, Research Centre for Natural Sciences, Hungarian Academy of Sciences, Budapest, Hungary
|
29
|
|
30
|
Ren P, Sun W, Luo C, Hussain A. Clustering-Oriented Multiple Convolutional Neural Networks for Single Image Super-Resolution. Cognit Comput 2017. [DOI: 10.1007/s12559-017-9512-2] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.4] [Indexed: 11/24/2022]
|
31
|
Reducing and Stretching Deep Convolutional Activation Features for Accurate Image Classification. Cognit Comput 2017. [DOI: 10.1007/s12559-017-9515-z] [Citation(s) in RCA: 20] [Impact Index Per Article: 2.9] [Indexed: 10/18/2022]
|