1. Grossberg S. How children learn to understand language meanings: a neural model of adult-child multimodal interactions in real-time. Front Psychol 2023; 14:1216479. PMID: 37599779; PMCID: PMC10435915; DOI: 10.3389/fpsyg.2023.1216479.
Abstract
This article describes a biological neural network model that can be used to explain how children learn to understand language meanings about the perceptual and affective events that they consciously experience. This kind of learning often occurs when a child interacts with an adult teacher to learn language meanings about events that they experience together. Multiple types of self-organizing brain processes are involved in learning language meanings, including processes that control conscious visual perception, joint attention, object learning and conscious recognition, cognitive working memory, cognitive planning, emotion, cognitive-emotional interactions, volition, and goal-oriented actions. The article shows how all of these brain processes interact to enable the learning of language meanings to occur. The article also contrasts these human capabilities with AI models such as ChatGPT. The current model is called the ChatSOME model, where SOME abbreviates Self-Organizing MEaning.
Affiliation(s)
- Stephen Grossberg
- Center for Adaptive Systems, Boston University, Boston, MA, United States
2. Schendan HE. Memory influences visual cognition across multiple functional states of interactive cortical dynamics. Psychology of Learning and Motivation 2019. DOI: 10.1016/bs.plm.2019.07.007.
3. Grossberg S. How Does the Cerebral Cortex Work? Development, Learning, Attention, and 3-D Vision by Laminar Circuits of Visual Cortex. Behav Cogn Neurosci Rev 2003; 2:47-76. PMID: 17715598; DOI: 10.1177/1534582303002001003.
Abstract
A key goal of behavioral and cognitive neuroscience is to link brain mechanisms to behavioral functions. The present article describes recent progress toward explaining how the visual cortex sees. Visual cortex, like many parts of perceptual and cognitive neocortex, is organized into six main layers of cells, as well as characteristic sublaminae. Here it is proposed how these layered circuits help to realize processes of development, learning, perceptual grouping, attention, and 3-D vision through a combination of bottom-up, horizontal, and top-down interactions. A main theme is that the mechanisms which enable development and learning to occur in a stable way imply properties of adult behavior. These results thus begin to unify three fields: infant cortical development, adult cortical neurophysiology and anatomy, and adult visual perception. The identified cortical mechanisms promise to generalize to explain how other perceptual and cognitive processes work.
4. Chang HC, Grossberg S, Cao Y. Where's Waldo? How perceptual, cognitive, and emotional brain processes cooperate during learning to categorize and find desired objects in a cluttered scene. Front Integr Neurosci 2014; 8:43. PMID: 24987339; PMCID: PMC4060746; DOI: 10.3389/fnint.2014.00043.
Abstract
The Where's Waldo problem concerns how individuals can rapidly learn to search a scene to detect, attend, recognize, and look at a valued target object in it. This article develops the ARTSCAN Search neural model to clarify how brain mechanisms across the What and Where cortical streams are coordinated to solve the Where's Waldo problem. The What stream learns positionally-invariant object representations, whereas the Where stream controls positionally-selective spatial and action representations. The model overcomes deficiencies of these computationally complementary properties through What and Where stream interactions. Where stream processes of spatial attention and predictive eye movement control modulate What stream processes whereby multiple view- and positionally-specific object categories are learned and associatively linked to view- and positionally-invariant object categories through bottom-up and attentive top-down interactions. Gain fields control the coordinate transformations that enable spatial attention and predictive eye movements to carry out this role. What stream cognitive-emotional learning processes enable the focusing of motivated attention upon the invariant object categories of desired objects. What stream cognitive names or motivational drives can prime a view- and positionally-invariant object category of a desired target object. A volitional signal can convert these primes into top-down activations that can, in turn, prime What stream view- and positionally-specific categories. When it also receives bottom-up activation from a target, such a positionally-specific category can cause an attentional shift in the Where stream to the positional representation of the target, and an eye movement can then be elicited to foveate it. These processes describe interactions among brain regions that include visual cortex, parietal cortex, inferotemporal cortex, prefrontal cortex (PFC), amygdala, basal ganglia (BG), and superior colliculus (SC).
Affiliation(s)
- Hung-Cheng Chang
- Graduate Program in Cognitive and Neural Systems, Department of Mathematics, Center for Adaptive Systems, Center for Computational Neuroscience and Neural Technology, Boston University Boston, MA, USA
- Stephen Grossberg
- Graduate Program in Cognitive and Neural Systems, Department of Mathematics, Center for Adaptive Systems, Center for Computational Neuroscience and Neural Technology, Boston University Boston, MA, USA
- Yongqiang Cao
- Graduate Program in Cognitive and Neural Systems, Department of Mathematics, Center for Adaptive Systems, Center for Computational Neuroscience and Neural Technology, Boston University Boston, MA, USA
5. Lessmann M, Würtz RP. Learning invariant object recognition from temporal correlation in a hierarchical network. Neural Netw 2014; 54:70-84. PMID: 24657573; DOI: 10.1016/j.neunet.2014.02.011.
Abstract
Invariant object recognition, which means the recognition of object categories independent of conditions like viewing angle, scale and illumination, is a task of great interest that humans can fulfill much better than artificial systems. During the last years several basic principles were derived from neurophysiological observations and careful consideration: (1) Developing invariance to possible transformations of the object by learning temporal sequences of visual features that occur during the respective alterations. (2) Learning in a hierarchical structure, so basic level (visual) knowledge can be reused for different kinds of objects. (3) Using feedback to compare predicted input with the current one for choosing an interpretation in the case of ambiguous signals. In this paper we propose a network which implements all of these concepts in a computationally efficient manner which gives very good results on standard object datasets. By dynamically switching off weakly active neurons and pruning weights computation is sped up and thus handling of large databases with several thousands of images and a number of categories in a similar order becomes possible. The involved parameters allow flexible adaptation to the information content of training data and allow tuning to different databases relatively easily. Precondition for successful learning is that training images are presented in an order assuring that images of the same object under similar viewing conditions follow each other. Through an implementation with sparse data structures the system has moderate memory demands and still yields very good recognition rates.
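Principle (1) above, learning invariance from the temporal sequence of views, is commonly formalized as a trace rule in the style of Földiák. The following is a minimal sketch of that general idea, not the authors' hierarchical implementation; the unit, views, and constants are all illustrative:

```python
import numpy as np

def trace_rule_update(w, x, trace, eta=0.1, lr=0.05):
    """One step of a trace-style learning rule for a single unit.

    w:     weight vector of the invariance-learning unit
    x:     current input feature vector (one view of an object)
    trace: low-pass-filtered ("trace") activity of the unit
    """
    y = float(w @ x)                       # instantaneous response
    trace = (1.0 - eta) * trace + eta * y  # temporal low-pass links successive views
    w = w + lr * trace * x                 # Hebbian step driven by the trace, not y
    w = w / np.linalg.norm(w)              # keep the weights bounded
    return w, trace

# Successive, slightly transformed views of the same object
rng = np.random.default_rng(0)
views = [np.array([1.0, 0.9, 0.1]),
         np.array([0.9, 1.0, 0.2]),
         np.array([1.0, 1.0, 0.0])]
w, trace = rng.normal(size=3), 0.0
w /= np.linalg.norm(w)
for _ in range(50):
    for v in views:
        w, trace = trace_rule_update(w, v, trace)
```

Because the weight update is driven by the slowly decaying trace rather than the instantaneous response, views that follow each other in time end up exciting the same unit, which is why the training order mentioned in the abstract matters.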
Affiliation(s)
- Markus Lessmann
- Institute for Neural Computation, Ruhr-University Bochum, Germany.
- Rolf P Würtz
- Institute for Neural Computation, Ruhr-University Bochum, Germany.
6. Wyatte D, Curran T, O'Reilly R. The Limits of Feedforward Vision: Recurrent Processing Promotes Robust Object Recognition when Objects Are Degraded. J Cogn Neurosci 2012; 24:2248-61. DOI: 10.1162/jocn_a_00282.
Abstract
Everyday vision requires robustness to a myriad of environmental factors that degrade stimuli. Foreground clutter can occlude objects of interest, and complex lighting and shadows can decrease the contrast of items. How does the brain recognize visual objects despite these low-quality inputs? On the basis of predictions from a model of object recognition that contains excitatory feedback, we hypothesized that recurrent processing would promote robust recognition when objects were degraded by strengthening bottom–up signals that were weakened because of occlusion and contrast reduction. To test this hypothesis, we used backward masking to interrupt the processing of partially occluded and contrast reduced images during a categorization experiment. As predicted by the model, we found significant interactions between the mask and occlusion and the mask and contrast, such that the recognition of heavily degraded stimuli was differentially impaired by masking. The model provided a close fit of these results in an isomorphic version of the experiment with identical stimuli. The model also provided an intuitive explanation of the interactions between the mask and degradations, indicating that masking interfered specifically with the extensive recurrent processing necessary to amplify and resolve highly degraded inputs, whereas less degraded inputs did not require much amplification and could be rapidly resolved, making them less susceptible to masking. Together, the results of the experiment and the accompanying model simulations illustrate the limits of feedforward vision and suggest that object recognition is better characterized as a highly interactive, dynamic process that depends on the coordination of multiple brain areas.
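The qualitative claim, that recurrent amplification matters most for degraded inputs and that masking cuts it short, can be illustrated with a toy settling loop. This is a sketch of the idea only, not the authors' model; the gain, contrast values, and step counts are arbitrary:

```python
import numpy as np

def settle(input_drive, feedback_gain=0.6, steps=30):
    """Combine a (possibly degraded) bottom-up drive with saturating
    excitatory feedback of the unit's own activity.
    Backward masking is modeled as cutting the iteration short."""
    a = 0.0
    history = []
    for _ in range(steps):
        a = float(np.tanh(input_drive + feedback_gain * a))
        history.append(a)
    return a, history

full, _ = settle(1.0)         # high-contrast input, full settling
degraded, hist = settle(0.3)  # low-contrast input, full settling
masked = hist[1]              # same low-contrast input, interrupted early
```

In this sketch the high-contrast input is already close to its settled value after a couple of iterations, while the low-contrast input still gains substantially from further recurrent steps, mirroring the interaction between masking and degradation reported in the experiment.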
7. Wang L. Multi-associative neural networks and their applications to learning and retrieving complex spatio-temporal sequences. IEEE Trans Syst Man Cybern B Cybern 1999; 29:73-82. PMID: 18252281; DOI: 10.1109/3477.740167.
Abstract
Based on the previous work of a number of authors, we discuss an important class of neural networks which we call multi-associative neural networks (MANNs) and which associate one pattern with multiple patterns. As a computationally efficient example of such networks, we describe a specific MANN, that is, a multi-associative, dynamically generated variant of the counterpropagation network (MCPN). As an application of MANNs, we design a general system that can learn and retrieve complex spatio-temporal sequences with any MANN. This system consists of comparator units, a parallel array of MANNs, and delayed feedback lines from the output of the system to the neural network layer. During learning, pairs of sequences of spatial patterns are presented to the system and the system learns to associate patterns at successive times in sequence. During retrieving, a cue sequence, which may be obscured by spatial noise and temporal gaps, causes the system to output the stored spatio-temporal sequence. We prove analytically that this system is capable of learning and generating any spatio-temporal sequences within the maximum complexity determined by the number of embedded MANNs, with the maximum length and number of sequences determined by the memory capacity of the embedded MANNs. To demonstrate the applicability of this general system, we present an implementation using the MCPN. The system shows desirable properties such as fast and accurate learning and retrieving, and the ability to store a large number of complex sequences consisting of nonorthogonal spatial patterns.
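The two core ingredients, one-to-many association and delayed feedback that turns retrieved items into the next retrieval cue, can be sketched with a dictionary-based stand-in. This is not the MCPN; the context length here crudely plays the role of the parallel array of embedded MANNs, and all names are illustrative:

```python
from collections import defaultdict

class MultiAssociativeMemory:
    """Toy stand-in for a MANN: one key pattern maps to a *set* of patterns."""
    def __init__(self):
        self.assoc = defaultdict(set)

    def learn(self, key, value):
        self.assoc[key].add(value)

    def recall(self, key):
        return self.assoc.get(key, set())

def learn_sequence(memory, seq, context_len=2):
    # Key each item on the preceding `context_len` items so that complex
    # sequences, in which single items recur, remain distinguishable.
    for i in range(context_len, len(seq)):
        memory.learn(tuple(seq[i - context_len:i]), seq[i])

def replay(memory, cue, length, context_len=2):
    # Delayed feedback: each retrieved item re-enters as part of the next key.
    out = list(cue)
    while len(out) < length:
        candidates = memory.recall(tuple(out[-context_len:]))
        if len(candidates) != 1:
            break  # unknown or ambiguous context: stop
        out.append(next(iter(candidates)))
    return out

m = MultiAssociativeMemory()
learn_sequence(m, list("ABACAD"))  # 'A' recurs; pair-contexts disambiguate it
```

Replaying from the cue `["A", "B"]` regenerates the stored sequence, while an unseen cue simply halts, a crude analogue of retrieval failing outside the memory's capacity.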
Affiliation(s)
- L Wang
- School of Computing and Mathematics, Deakin University, Geelong, VIC, Australia
8. Alexander DM, Trengove C, Sheridan PE, van Leeuwen C. Generalization of learning by synchronous waves: from perceptual organization to invariant organization. Cogn Neurodyn 2011; 5:113-32. PMID: 22654985; PMCID: PMC3100473; DOI: 10.1007/s11571-010-9142-9.
Abstract
From a few presentations of an object, perceptual systems are able to extract invariant properties such that novel presentations are immediately recognized. This may be enabled by inferring the set of all representations equivalent under certain transformations. We implemented this principle in a neurodynamic model that stores activity patterns representing transformed versions of the same object in a distributed fashion within maps, such that translation across the map corresponds to the relevant transformation. When a pattern on the map is activated, this causes activity to spread out as a wave across the map, activating all the transformed versions represented. Computational studies illustrate the efficacy of the proposed mechanism. The model rapidly learns and successfully recognizes rotated and scaled versions of a visual representation from a few prior presentations. For topographical maps such as primary visual cortex, the mechanism simultaneously represents identity and variation of visual percepts whose features change through time.
Affiliation(s)
- David M. Alexander
- Laboratory for Perceptual Dynamics, RIKEN Brain Science Institute, Wako-shi, Saitama, Japan
- Chris Trengove
- Brain and Neural Systems Team, RIKEN Computational Science Research Program, Saitama, Japan
- Laboratory for Computational Neurophysics, RIKEN Brain Science Institute, Wako-shi, Saitama, Japan
- Phillip E. Sheridan
- School of Information and Communication Technology, Griffith University, Meadowbrook, QLD, Australia
- Cees van Leeuwen
- Laboratory for Perceptual Dynamics, RIKEN Brain Science Institute, Wako-shi, Saitama, Japan
9. Grossberg S, Markowitz J, Cao Y. On the road to invariant recognition: explaining tradeoff and morph properties of cells in inferotemporal cortex using multiple-scale task-sensitive attentive learning. Neural Netw 2011; 24:1036-49. PMID: 21665428; DOI: 10.1016/j.neunet.2011.04.001.
Abstract
Visual object recognition is an essential accomplishment of advanced brains. Object recognition needs to be tolerant, or invariant, with respect to changes in object position, size, and view. In monkeys and humans, a key area for recognition is the anterior inferotemporal cortex (ITa). Recent neurophysiological data show that ITa cells with high object selectivity often have low position tolerance. We propose a neural model whose cells learn to simulate this tradeoff, as well as ITa responses to image morphs, while explaining how invariant recognition properties may arise in stages due to processes across multiple cortical areas. These processes include the cortical magnification factor, multiple receptive field sizes, and top-down attentive matching and learning properties that may be tuned by task requirements to attend to either concrete or abstract visual features with different levels of vigilance. The model predicts that data from the tradeoff and image morph tasks emerge from different levels of vigilance in the animals performing them. This result illustrates how different vigilance requirements of a task may change the course of category learning, notably the critical features that are attended and incorporated into learned category prototypes. The model outlines a path for developing an animal model of how defective vigilance control can lead to symptoms of various mental disorders, such as autism and amnesia.
Affiliation(s)
- Stephen Grossberg
- Department of Cognitive and Neural Systems, Center of Excellence for Learning in Education, Science and Technology, Boston University, 677 Beacon Street, Boston, MA 02215, USA
10. Cao Y, Grossberg S, Markowitz J. How does the brain rapidly learn and reorganize view-invariant and position-invariant object representations in the inferotemporal cortex? Neural Netw 2011; 24:1050-61. PMID: 21596523; DOI: 10.1016/j.neunet.2011.04.004.
Abstract
All primates depend for their survival on being able to rapidly learn about and recognize objects. Objects may be visually detected at multiple positions, sizes, and viewpoints. How does the brain rapidly learn and recognize objects while scanning a scene with eye movements, without causing a combinatorial explosion in the number of cells that are needed? How does the brain avoid the problem of erroneously classifying parts of different objects together at the same or different positions in a visual scene? In monkeys and humans, a key area for such invariant object category learning and recognition is the inferotemporal cortex (IT). A neural model is proposed to explain how spatial and object attention coordinate the ability of IT to learn invariant category representations of objects that are seen at multiple positions, sizes, and viewpoints. The model clarifies how interactions within a hierarchy of processing stages in the visual brain accomplish this. These stages include the retina, lateral geniculate nucleus, and cortical areas V1, V2, V4, and IT in the brain's What cortical stream, as they interact with spatial attention processes within the parietal cortex of the Where cortical stream. The model builds upon the ARTSCAN model, which proposed how view-invariant object representations are generated. The positional ARTSCAN (pARTSCAN) model proposes how the following additional processes in the What cortical processing stream also enable position-invariant object representations to be learned: IT cells with persistent activity, and a combination of normalizing object category competition and a view-to-object learning law which together ensure that unambiguous views have a larger effect on object recognition than ambiguous views. The model explains how such invariant learning can be fooled when monkeys, or other primates, are presented with an object that is swapped with another object during eye movements to foveate the original object. 
The swapping procedure is predicted to prevent the reset of spatial attention, which would otherwise keep the representations of multiple objects from being combined by learning. Li and DiCarlo (2008) have presented neurophysiological data from monkeys showing how unsupervised natural experience in a target swapping experiment can rapidly alter object representations in IT. The model quantitatively simulates the swapping data by showing how the swapping procedure fools the spatial attention mechanism. More generally, the model provides a unifying framework, and testable predictions in both monkeys and humans, for understanding object learning data using neurophysiological methods in monkeys, and spatial attention, episodic learning, and memory retrieval data using functional imaging methods in humans.
Affiliation(s)
- Yongqiang Cao
- Center for Adaptive Systems, Department of Cognitive and Neural Systems, Center of Excellence for Learning in Education, Science, and Technology, Boston University, 677 Beacon Street, Boston, MA 02215, USA
11. Fazl A, Grossberg S, Mingolla E. View-invariant object category learning, recognition, and search: How spatial and object attention are coordinated using surface-based attentional shrouds. Cogn Psychol 2009; 58:1-48. DOI: 10.1016/j.cogpsych.2008.05.001.
12. Kietzmann TC, Lange S, Riedmiller M. Computational object recognition: a biologically motivated approach. Biol Cybern 2009; 100:59-79. PMID: 19089445; DOI: 10.1007/s00422-008-0281-6.
Abstract
We propose a conceptual framework for artificial object recognition systems based on findings from neurophysiological and neuropsychological research on the visual system in primate cortex. We identify some essential questions, which have to be addressed in the course of designing object recognition systems. As answers, we review some major aspects of biological object recognition, which are then translated into the technical field of computer vision. The key suggestions are the use of incremental and view-based approaches together with the ability of online feature selection and the interconnection of object-views to form an overall object representation. The effectiveness of the computational approach is estimated by testing a possible realization in various tasks and conditions explicitly designed to allow for a direct comparison with the biological counterpart. The results exhibit excellent performance with regard to recognition accuracy, the creation of sparse models and the selection of appropriate features.
Affiliation(s)
- Tim C Kietzmann
- Institute of Cognitive Science, University of Osnabrück, Osnabrück, Germany.
13. Ames H, Grossberg S. Speaker normalization using cortical strip maps: a neural model for steady-state vowel categorization. J Acoust Soc Am 2008; 124:3918-3936. PMID: 19206817; DOI: 10.1121/1.2997478.
Abstract
Auditory signals of speech are speaker dependent, but representations of language meaning are speaker independent. The transformation from speaker-dependent to speaker-independent language representations enables speech to be learned and understood from different speakers. A neural model is presented that performs speaker normalization to generate a pitch-independent representation of speech sounds, while also preserving information about speaker identity. This speaker-invariant representation is categorized into unitized speech items, which input to sequential working memories whose distributed patterns can be categorized, or chunked, into syllable and word representations. The proposed model fits into an emerging model of auditory streaming and speech categorization. The auditory streaming and speaker normalization parts of the model both use multiple strip representations and asymmetric competitive circuits, thereby suggesting that these two circuits arose from similar neural designs. The normalized speech items are rapidly categorized and stably remembered by adaptive resonance theory circuits. Simulations use synthesized steady-state vowels from the Peterson and Barney [Peterson, G. E., and Barney, H.L., J. Acoust. Soc. Am. 24, 175-184 (1952).] vowel database and achieve accuracy rates similar to those achieved by human listeners. These results are compared to behavioral data and other speaker normalization models.
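The strip-map model is far richer than any ratio scheme, but the basic point, that a representation built from relations among formants removes an overall vocal-tract scale factor, can be shown in a few lines. The vowel values below are illustrative, not drawn from the Peterson and Barney set:

```python
import numpy as np

def normalize(formants_hz):
    """Log-formants centered on their own mean (i.e., formants divided by
    their geometric mean). A uniform vocal-tract scale factor multiplies
    every formant equally, so it cancels, leaving a speaker-independent
    vowel 'shape'. A toy stand-in for strip-map normalization."""
    f = np.asarray(formants_hz, dtype=float)
    return np.log(f) - np.mean(np.log(f))

# The 'same' vowel from two speakers differing by an overall scale factor
adult = [700.0, 1220.0, 2600.0]   # illustrative F1-F3 for an /a/-like vowel
child = [1.3 * x for x in adult]  # shorter vocal tract shifts all formants up
```

After normalization the two renditions coincide, while a genuinely different vowel still maps to a different shape, which is the minimal property any speaker-normalization front end must have before categorization.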
Affiliation(s)
- Heather Ames
- Department of Cognitive and Neural Systems, Center for Adaptive Systems, and Center of Excellence for Learning In Education, Science, and Technology, Boston University, Boston, Massachusetts 02215, USA
14. Bhatt R, Carpenter GA, Grossberg S. Texture segregation by visual cortex: Perceptual grouping, attention, and learning. Vision Res 2007; 47:3173-211. PMID: 17904187; DOI: 10.1016/j.visres.2007.07.013.
Abstract
A neural model called dARTEX is proposed of how laminar interactions in the visual cortex may learn and recognize object texture and form boundaries. The model unifies five interacting processes: region-based texture classification, contour-based boundary grouping, surface filling-in, spatial attention, and object attention. The model shows how form boundaries can determine regions in which surface filling-in occurs; how surface filling-in interacts with spatial attention to generate a form-fitting distribution of spatial attention, or attentional shroud; how the strongest shroud can inhibit weaker shrouds; and how the winning shroud regulates learning of texture categories, and thus the allocation of object attention. The model can discriminate abutted textures with blurred boundaries and is sensitive to texture boundary attributes like discontinuities in orientation and texture flow curvature as well as to relative orientations of texture elements. The model quantitatively fits the Ben-Shahar and Zucker [Ben-Shahar, O. & Zucker, S. (2004). Sensitivity to curvatures in orientation-based texture segmentation. Vision Research, 44, 257-277] human psychophysical data on orientation-based textures. Surface-based attentional shrouds improve texture learning and classification: Brodatz texture classification rate varies from 95.1% to 98.6% with correct attention, and from 74.1% to 75.5% without attention. Object boundary output of the model in response to photographic images is compared to computer vision algorithms and human segmentations.
Affiliation(s)
- Rushi Bhatt
- Department of Cognitive and Neural Systems, Center for Adaptive Systems and Center of Excellence for Learning in Education, Science, and Technology, Boston University, 677 Beacon Street, Boston, MA 02215, USA
15. Gnadt W, Grossberg S. SOVEREIGN: An autonomous neural system for incrementally learning planned action sequences to navigate towards a rewarded goal. Neural Netw 2008; 21:699-758. PMID: 17996419; DOI: 10.1016/j.neunet.2007.09.016.
Abstract
How do reactive and planned behaviors interact in real time? How are sequences of such behaviors released at appropriate times during autonomous navigation to realize valued goals? Controllers for both animals and mobile robots, or animats, need reactive mechanisms for exploration, and learned plans to reach goal objects once an environment becomes familiar. The SOVEREIGN (Self-Organizing, Vision, Expectation, Recognition, Emotion, Intelligent, Goal-oriented Navigation) animat model embodies these capabilities, and is tested in a 3D virtual reality environment. SOVEREIGN includes several interacting subsystems which model complementary properties of cortical What and Where processing streams and which clarify similarities between mechanisms for navigation and arm movement control. As the animat explores an environment, visual inputs are processed by networks that are sensitive to visual form and motion in the What and Where streams, respectively. Position-invariant and size-invariant recognition categories are learned by real-time incremental learning in the What stream. Estimates of target position relative to the animat are computed in the Where stream, and can activate approach movements toward the target. Motion cues from animat locomotion can elicit head-orienting movements to bring a new target into view. Approach and orienting movements are alternately performed during animat navigation. Cumulative estimates of each movement are derived from interacting proprioceptive and visual cues. Movement sequences are stored within a motor working memory. Sequences of visual categories are stored in a sensory working memory. These working memories trigger learning of sensory and motor sequence categories, or plans, which together control planned movements. Predictively effective chunk combinations are selectively enhanced via reinforcement learning when the animat is rewarded. 
Selected planning chunks effect a gradual transition from variable reactive exploratory movements to efficient goal-oriented planned movement sequences. Volitional signals gate interactions between model subsystems and the release of overt behaviors. The model can control different motor sequences under different motivational states and learns more efficient sequences to rewarded goals as exploration proceeds.
Affiliation(s)
- William Gnadt
- Department of Cognitive and Neural Systems, Center for Adaptive Systems, Center of Excellence for Learning in Education, Science and Technology, Boston University, Boston, MA 02215, United States
16. ANN Hybrid Ensemble Learning Strategy in 3D Object Recognition and Pose Estimation Based on Similarity. Lect Notes Comput Sci 2005. DOI: 10.1007/11538059_68.
17.
Affiliation(s)
- Stephen Grossberg
- Department of Cognitive and Neural Systems, Center for Adaptive Systems, Boston University, 677 Beacon Street, Boston, MA 02215, USA.
18.

19. Körding KP, König P. Neurons with two sites of synaptic integration learn invariant representations. Neural Comput 2001; 13:2823-49. PMID: 11705412; DOI: 10.1162/089976601317098547.
Abstract
Neurons in mammalian cerebral cortex combine specific responses with respect to some stimulus features with invariant responses to other stimulus features. For example, in primary visual cortex, complex cells code for orientation of a contour but ignore its position to a certain degree. In higher areas, such as the inferotemporal cortex, translation-invariant, rotation-invariant, and even viewpoint-invariant responses can be observed. Such properties are of obvious interest to artificial systems performing tasks like pattern recognition. It remains to be resolved how such response properties develop in biological systems. Here we present an unsupervised learning rule that addresses this problem. It is based on a neuron model with two sites of synaptic integration, allowing qualitatively different effects of input to basal and apical dendritic trees, respectively. Without supervision, the system learns to extract invariance properties using temporal or spatial continuity of stimuli. Furthermore, top-down information can be smoothly integrated in the same framework. Thus, this model lends a physiological implementation to approaches of unsupervised learning of invariant-response properties.
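The gating role of the second integration site can be isolated in a two-compartment toy unit: basal input sets the response, apical input decides whether the Hebbian step is taken. This is a sketch of that mechanism only, not the published learning rule, and all stimuli and constants are illustrative:

```python
import numpy as np

def two_site_step(w_basal, x_basal, apical_drive, lr=0.1):
    """One update of a two-compartment unit: the basal site drives the
    response; the apical site gates plasticity of the basal synapses."""
    y = float(w_basal @ x_basal)             # response set by basal input
    gate = 1.0 if apical_drive > 0 else 0.0  # apical input permits learning
    w_basal = w_basal + lr * gate * y * x_basal
    return w_basal / np.linalg.norm(w_basal), y

# Two stimuli; top-down (apical) context accompanies only the first
w = np.array([1.0, 1.0]) / np.sqrt(2.0)
x_a, x_b = np.array([1.0, 0.0]), np.array([0.0, 1.0])
for _ in range(50):
    w, _ = two_site_step(w, x_a, apical_drive=1.0)   # context present: learn
    w, _ = two_site_step(w, x_b, apical_drive=-1.0)  # no context: no change
```

In the paper the apical signal would carry the temporal or spatial continuity (or top-down) information; here it simply selects which stimulus the basal weights come to represent.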
Affiliation(s)
- K P Körding
- Institute of Neuroinformatics, ETH/University Zürich, 8057 Zürich, Switzerland.
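The two-site learning idea summarized in the abstract above can be caricatured with a trace rule: a slow activity trace stands in for the apical, modulatory site and gates Hebbian learning at the basal, feedforward site, so that temporally adjacent (e.g. translated) views of a stimulus are bound to the same unit. This is a minimal sketch under invented assumptions (a single linear unit, one-hot inputs, arbitrary constants), not the authors' actual model.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-in for a neuron with two sites of synaptic integration: a basal
# feedforward weight vector drives the unit, while a slow activity trace
# plays the role of the apical, modulatory site. All names and constants
# here are illustrative.
n_inputs, tau, lr = 8, 0.8, 0.1
w = rng.normal(scale=0.1, size=n_inputs)
trace = 0.0

def step(x, w, trace):
    y = float(w @ x)                     # basal (feedforward) drive
    trace = tau * trace + (1 - tau) * y  # slow trace ~ apical modulation
    w = w + lr * trace * (x - w)         # trace-gated Hebbian update
    return w, trace

# Present each "stimulus" at two shifted positions in quick succession:
# temporal continuity lets the trace tie translated views together.
for _ in range(200):
    base = int(rng.integers(0, n_inputs - 1))
    for shift in (0, 1):
        x = np.zeros(n_inputs)
        x[(base + shift) % n_inputs] = 1.0
        w, trace = step(x, w, trace)
```

Because the trace decays slowly, the weight update for one view is still influenced by the response to the immediately preceding view, which is the temporal-continuity mechanism the paper exploits.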
|
20
|
Grossberg S. Linking the laminar circuits of visual cortex to visual perception: development, grouping, and attention. Neurosci Biobehav Rev 2001; 25:513-26. [PMID: 11595271 DOI: 10.1016/s0149-7634(01)00030-6] [Citation(s) in RCA: 29] [Impact Index Per Article: 1.3] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 10/18/2022]
Abstract
How do the laminar circuits of visual cortical areas V1 and V2 implement context-sensitive binding processes such as perceptual grouping and attention, and how do these circuits develop and learn in a stable way? Recent neural models clarify how preattentive and attentive perceptual mechanisms are intimately linked within the laminar circuits of visual cortex, notably how bottom-up, top-down, and horizontal cortical connections interact within the cortical layers. These laminar circuits allow the responses of visual cortical neurons to be influenced, not only by the stimuli within their classical receptive fields, but also by stimuli in the extra-classical surround. Such context-sensitive visual processing can greatly enhance the analysis of visual scenes, especially those containing targets that are low contrast, partially occluded, or crowded by distractors. Attentional enhancement can selectively propagate along groupings of both real and illusory contours, thereby showing how attention can selectively enhance object representations. Recent models explain how attention may have a stronger facilitatory effect on low contrast than on high contrast stimuli, and how pop-out from orientation contrast may occur. The specific functional roles which the model proposes for the cortical layers allow several testable neurophysiological predictions to be made. Model mechanisms clarify how intracortical and intercortical feedback help to stabilize cortical development and learning. Although feedback plays a key role, fast feedforward processing is possible in response to unambiguous information. Model circuits are capable of synchronizing quickly, but context-sensitive persistence of previous events can influence how synchrony develops.
Affiliation(s)
- S Grossberg
- Department of Cognitive and Neural Systems and Center for Adaptive Systems, Boston University, 677 Beacon Street, Boston, MA 02215, USA.
|
21
|
Granger E, Rubin MA, Grossberg S, Lavoie P. A what-and-where fusion neural network for recognition and tracking of multiple radar emitters. Neural Netw 2001; 14:325-44. [PMID: 11341569 DOI: 10.1016/s0893-6080(01)00019-3] [Citation(s) in RCA: 53] [Impact Index Per Article: 2.3] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 10/27/2022]
Abstract
A neural network recognition and tracking system is proposed for classification of radar pulses in autonomous Electronic Support Measure systems. Radar type information is considered with position-specific information from active emitters in a scene. Type-specific parameters of the input pulse stream are fed to a neural network classifier trained on samples of data collected in the field. Meanwhile, a clustering algorithm is used to separate pulses from different emitters according to position-specific parameters of the input pulse stream. Classifier responses corresponding to different emitters are separated into tracks, or trajectories, one per active emitter, allowing for more accurate identification of radar types based on multiple views of emitter data along each emitter trajectory. Such a What-and-Where fusion strategy is motivated by a similar subdivision of labor in the brain. The fuzzy ARTMAP neural network is used to classify streams of pulses according to radar type using their functional parameters. Simulation results obtained with a radar pulse data set indicate that fuzzy ARTMAP compares favorably to several other approaches when performance is measured in terms of accuracy and computational complexity. Incorporation into fuzzy ARTMAP of negative match tracking (from ARTMAP-IC) facilitated convergence during training with this data set. Other modifications improved classification of data that include missing input pattern components and missing training classes. Fuzzy ARTMAP was combined with a bank of Kalman filters to group pulses transmitted from different emitters based on their position-specific parameters, and with a module to accumulate evidence from fuzzy ARTMAP responses corresponding to the track defined for each emitter. Simulation results demonstrate that the system provides a high level of performance on complex, incomplete and overlapping radar data.
Affiliation(s)
- E Granger
- Defence Research Establishment Ottawa, Department of National Defence, Ontario, Canada
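The What-and-Where fusion strategy in the abstract above can be reduced to a few lines: a "what" module labels each pulse from type-specific features, a "where" module assigns pulses to per-emitter tracks from position-specific parameters, and classifier evidence is accumulated along each track. The prototypes, track centres, and pulse values below are invented for illustration; the paper itself uses a fuzzy ARTMAP classifier and a bank of Kalman filters.

```python
import numpy as np
from collections import defaultdict, Counter

# Hypothetical prototypes standing in for a trained "what" classifier,
# and fixed centres standing in for the "where" tracking module.
what_prototypes = {"radarA": np.array([1.0, 0.0]),
                   "radarB": np.array([0.0, 1.0])}
track_centers = [np.array([0.0, 0.0]), np.array([10.0, 10.0])]

def classify(features):
    # "What" stream: nearest prototype on type-specific parameters.
    return min(what_prototypes,
               key=lambda k: np.linalg.norm(what_prototypes[k] - features))

def assign_track(position):
    # "Where" stream: nearest track centre on position-specific parameters.
    return min(range(len(track_centers)),
               key=lambda i: np.linalg.norm(track_centers[i] - position))

# Each pulse carries (type-specific features, position-specific parameters).
pulses = [((0.9, 0.1), (0.2, 0.1)),
          ((0.8, 0.2), (0.1, 0.0)),
          ((0.1, 0.9), (9.8, 10.1))]

evidence = defaultdict(Counter)
for features, position in pulses:
    track = assign_track(np.array(position))
    evidence[track][classify(np.array(features))] += 1

# Final per-emitter identity = accumulated majority vote along each track.
decisions = {t: c.most_common(1)[0][0] for t, c in evidence.items()}
```

Accumulating votes per track is what allows identification to improve over multiple views of the same emitter, the benefit the abstract attributes to the fusion strategy.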
|
22
|
Abbasi S, Mokhtarian F. Affine-similar shape retrieval: application to multiview 3-D object recognition. IEEE TRANSACTIONS ON IMAGE PROCESSING : A PUBLICATION OF THE IEEE SIGNAL PROCESSING SOCIETY 2001; 10:131-139. [PMID: 18249603 DOI: 10.1109/83.892449] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.2] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 05/25/2023]
Abstract
The feasibility of representing a three-dimensional (3-D) object with a small number of standard views is studied. The object boundary of each view is considered as a two-dimensional (2-D) shape and is represented by the locations of the maxima of its curvature scale space (CSS) image contours. The idea is to identify an unknown object from an image taken from a random view by using the stored descriptions of the standard views. The CSS image has been selected for MPEG-7 standardization. The maxima of the CSS image have already been used to represent 2-D shapes in different applications under similarity transforms. Since the new application involves affine transforms, we first examine the effects of general affine transforms on the representation and show that the locations of the maxima of the CSS image do not move dramatically even under large affine transformations. Our system for shape-based retrieval from large image databases is then applied to multiview 3-D object representation and recognition. Our collection of 3-D objects consists of 18 aircraft of different shapes. Three silhouette contours corresponding to random views are separately used as input for each object. Results indicate that robust and efficient 3-D free-form object recognition through multiview representation can be achieved using the CSS representation.
Affiliation(s)
- S Abbasi
- Centre for Vision Speech and Signal Processing, University of Surrey, Guildford, Surrey, UK
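The curvature scale space idea used above, tracking curvature zero-crossings of a contour smoothed at increasing Gaussian widths, can be sketched numerically. The wavy-circle contour below is a made-up example, and a full CSS image would record the positions of the zero-crossings over a whole range of scales rather than just counting them at two.

```python
import numpy as np

def smooth(v, sigma):
    # Circular Gaussian smoothing of one coordinate sequence via FFT.
    n = len(v)
    k = np.exp(-0.5 * ((np.arange(n) - n // 2) / sigma) ** 2)
    k /= k.sum()
    return np.real(np.fft.ifft(np.fft.fft(v) * np.fft.fft(np.fft.ifftshift(k))))

def curvature(x, y):
    # kappa = (x'y'' - y'x'') / (x'^2 + y'^2)^(3/2) on sampled coordinates.
    dx, dy = np.gradient(x), np.gradient(y)
    ddx, ddy = np.gradient(dx), np.gradient(dy)
    return (dx * ddy - dy * ddx) / (dx ** 2 + dy ** 2) ** 1.5

# Illustrative closed contour: a circle with seven shallow bumps.
t = np.linspace(0, 2 * np.pi, 400, endpoint=False)
x = (1 + 0.2 * np.cos(7 * t)) * np.cos(t)
y = (1 + 0.2 * np.cos(7 * t)) * np.sin(t)

def zero_crossings(sigma):
    # Count curvature sign changes at scale sigma; the CSS image records
    # where such crossings occur as sigma increases.
    kappa = curvature(smooth(x, sigma), smooth(y, sigma))
    return int(np.sum(np.sign(kappa[:-1]) != np.sign(kappa[1:])))
```

At a fine scale the bumps contribute inflection points, while heavy smoothing reduces the contour to a convex shape with no curvature zero-crossings; the scale at which each pair of crossings vanishes is what the CSS maxima encode.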
|
23
|
Delorme A, Richard G, Fabre-Thorpe M. Ultra-rapid categorisation of natural scenes does not rely on colour cues: a study in monkeys and humans. Vision Res 2000; 40:2187-200. [PMID: 10878280 DOI: 10.1016/s0042-6989(00)00083-3] [Citation(s) in RCA: 117] [Impact Index Per Article: 4.9] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 10/17/2022]
Abstract
In a rapid categorisation task, monkeys and humans had to detect a target (animal or food) in briefly flashed (32 ms) and previously unseen natural images. Removing colour cues had very little effect on average performance. Impairments were restricted to a mild accuracy drop (in some human subjects) and a small increase in mean reaction time (10-15 ms), observed in both monkeys and humans but only in the detection of food targets. In both tasks, accuracy and latency of the fastest behavioural responses were unaffected, suggesting that such ultra-rapid categorisations could depend on feed-forward processing of early coarse achromatic magnocellular information.
Affiliation(s)
- A Delorme
- Centre de Recherche Cerveau et Cognition (UMR 5549), Faculté de Médecine de Rangueil, 133, route de Narbonne, 31062, Toulouse, France
|
24
|
|
25
|
Grossberg S, Williamson JR. A self-organizing neural system for learning to recognize textured scenes. Vision Res 1999; 39:1385-406. [PMID: 10343850 DOI: 10.1016/s0042-6989(98)00250-8] [Citation(s) in RCA: 26] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/20/2022]
Abstract
A self-organizing ARTEX model is developed to categorize and classify textured image regions. ARTEX specializes the FACADE model of how the visual cortex sees, and the ART model of how temporal and prefrontal cortices interact with the hippocampal system to learn visual recognition categories and their names. FACADE processing generates a vector of boundary and surface properties, notably texture and brightness properties, by utilizing multi-scale filtering, competition, and diffusive filling-in. Its context-sensitive local measures of textured scenes can be used to recognize scenic properties that gradually change across space, as well as abrupt texture boundaries. ART incrementally learns recognition categories that classify FACADE output vectors, class names of these categories, and their probabilities. Top-down expectations within ART encode learned prototypes that pay attention to expected visual features. When novel visual information creates a poor match with the best existing category prototype, a memory search selects a new category with which to classify the novel data. ARTEX is compared with psychophysical data, and is benchmarked on classification of natural textures and synthetic aperture radar images. It outperforms state-of-the-art systems that use rule-based, backpropagation, and K-nearest neighbor classifiers.
Affiliation(s)
- S Grossberg
- Department of Cognitive and Neural Systems, Boston University, MA 02215, USA.
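The ART category learning that ARTEX builds on can be reduced to a toy sketch: match each input against stored prototypes, resonate and learn when the match beats a vigilance threshold, and recruit a new category when the memory search fails. The vigilance and learning-rate values and the four-dimensional inputs below are arbitrary illustrations; real fuzzy ART additionally uses complement coding and a choice function during search.

```python
import numpy as np

def art_fit(inputs, rho=0.75, beta=0.5):
    """Minimal ART-style categorizer sketch (not the full ARTEX system).

    rho is the vigilance threshold; beta is the learning rate.
    """
    prototypes, labels = [], []
    for x in inputs:
        # Match = fuzzy AND overlap |x ^ w| / |x| against each prototype.
        matches = [np.minimum(x, w).sum() / x.sum() for w in prototypes]
        j = int(np.argmax(matches)) if matches else -1
        if j >= 0 and matches[j] >= rho:
            # Resonance: the prototype learns toward the fuzzy AND.
            prototypes[j] = (beta * np.minimum(x, prototypes[j])
                             + (1 - beta) * prototypes[j])
            labels.append(j)
        else:
            # Poor match: recruit a new category coded by this input.
            prototypes.append(x.copy())
            labels.append(len(prototypes) - 1)
    return prototypes, labels

data = np.array([[1.0, 0.9, 0.1, 0.0],
                 [0.9, 1.0, 0.0, 0.1],
                 [0.1, 0.0, 1.0, 0.9]])
protos, labels = art_fit(data)
```

The first two inputs overlap enough to share one category, while the third fails the vigilance test and triggers the memory search that the abstract describes, creating a second category.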
|
26
|
Abstract
The processes whereby our brains continue to learn about a changing world in a stable fashion throughout life are proposed to lead to conscious experiences. These processes include the learning of top-down expectations, the matching of these expectations against bottom-up data, the focusing of attention upon the expected clusters of information, and the development of resonant states between bottom-up and top-down processes as they reach an attentive consensus between what is expected and what is there in the outside world. It is suggested that all conscious states in the brain are resonant states and that these resonant states trigger learning of sensory and cognitive representations. The models which summarize these concepts are therefore called Adaptive Resonance Theory, or ART, models. Psychophysical and neurobiological data in support of ART are presented from early vision, visual object recognition, auditory streaming, variable-rate speech perception, somatosensory perception, and cognitive-emotional interactions, among others. It is noted that ART mechanisms seem to be operative at all levels of the visual system, and it is proposed how these mechanisms are realized by known laminar circuits of visual cortex. It is predicted that the same circuit realization of ART mechanisms will be found in the laminar circuits of all sensory and cognitive neocortex. Concepts and data are summarized concerning how some visual percepts may be visibly, or modally, perceived, whereas amodal percepts may be consciously recognized even though they are perceptually invisible. It is also suggested that sensory and cognitive processing in the What processing stream of the brain obey top-down matching and learning laws that are often complementary to those used for spatial and motor processing in the brain's Where processing stream. This enables our sensory and cognitive representations to maintain their stability as we learn more about the world, while allowing spatial and motor representations to forget learned maps and gains that are no longer appropriate as our bodies develop and grow from infanthood to adulthood. Procedural memories are proposed to be unconscious because the inhibitory matching process that supports these spatial and motor processes cannot lead to resonance.
Affiliation(s)
- S Grossberg
- Department of Cognitive and Neural Systems, Boston University, MA 02215, USA
|