1
McMahon E, Isik L. Seeing social interactions. Trends Cogn Sci 2023;27:1165-1179. PMID: 37805385; PMCID: PMC10841760; DOI: 10.1016/j.tics.2023.09.001.
Abstract
Seeing the interactions between other people is a critical part of our everyday visual experience, but recognizing the social interactions of others is often considered outside the scope of vision and grouped with higher-level social cognition like theory of mind. Recent work, however, has revealed that recognition of social interactions is efficient and automatic, is well modeled by bottom-up computational algorithms, and occurs in visually-selective regions of the brain. We review recent evidence from these three methodologies (behavioral, computational, and neural) that converge to suggest the core of social interaction perception is visual. We propose a computational framework for how this process is carried out in the brain and offer directions for future interdisciplinary investigations of social perception.
Affiliation(s)
- Emalie McMahon: Department of Cognitive Science, Johns Hopkins University, Baltimore, MD, USA
- Leyla Isik: Department of Cognitive Science, Johns Hopkins University, Baltimore, MD, USA; Department of Biomedical Engineering, Johns Hopkins University, Baltimore, MD, USA
2
Malik M, Isik L. Relational visual representations underlie human social interaction recognition. Nat Commun 2023;14:7317. PMID: 37951960; PMCID: PMC10640586; DOI: 10.1038/s41467-023-43156-8.
Abstract
Humans effortlessly recognize social interactions from visual input. Attempts to model this ability have typically relied on generative inverse planning models, which make predictions by inverting a generative model of agents' interactions based on their inferred goals, suggesting humans use a similar process of mental inference to recognize interactions. However, growing behavioral and neuroscience evidence suggests that recognizing social interactions is a visual process, separate from complex mental state inference. Yet despite their success in other domains, visual neural network models have been unable to reproduce human-like interaction recognition. We hypothesize that humans rely on relational visual information in particular, and develop a relational, graph neural network model, SocialGNN. Unlike prior models, SocialGNN accurately predicts human interaction judgments across both animated and natural videos. These results suggest that humans can make complex social interaction judgments without an explicit model of the social and physical world, and that structured, relational visual representations are key to this behavior.
Affiliation(s)
- Manasi Malik: Department of Cognitive Science, Johns Hopkins University, Baltimore, MD 21218, USA
- Leyla Isik: Department of Cognitive Science, Johns Hopkins University, Baltimore, MD 21218, USA
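The SocialGNN result above turns on representing a scene relationally: agents become graph nodes carrying visual features, edges mark who relates to whom, and message passing over those edges feeds a readout that predicts the interaction judgment. The Python sketch below illustrates that relational computation in miniature; the dimensions, random weights, and two-agent toy scene are hypothetical placeholders, not the authors' SocialGNN architecture.

```python
import numpy as np

rng = np.random.default_rng(0)

def relu(x):
    return np.maximum(x, 0.0)

def message_passing_readout(node_feats, edges, w_msg, w_node, w_out):
    """One round of relational message passing followed by a graph-level readout.

    node_feats : (n_agents, d) visual features per agent (e.g., pose/motion embeddings)
    edges      : list of (sender, receiver) pairs encoding which agents relate to which
    """
    n = node_feats.shape[0]
    incoming = np.zeros((n, w_msg.shape[1]))
    for s, r in edges:                                # messages flow along relations
        incoming[r] += relu(node_feats[s] @ w_msg)
    updated = relu(node_feats @ w_node + incoming)    # own features combined with relational input
    graph_vec = updated.mean(axis=0)                  # permutation-invariant pooling over agents
    logits = graph_vec @ w_out                        # e.g., scores for "helping" vs "hindering"
    return np.exp(logits) / np.exp(logits).sum()

# Toy scene: two agents with 8-d visual features, related in both directions.
d, h = 8, 16
feats = rng.normal(size=(2, d))
edges = [(0, 1), (1, 0)]
w_msg, w_node, w_out = rng.normal(size=(d, h)), rng.normal(size=(d, h)), rng.normal(size=(h, 2))
print(message_passing_readout(feats, edges, w_msg, w_node, w_out))
```

The structural point is that the readout only sees relation-pooled features, so the prediction depends on how the agents relate rather than on either agent alone.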
3
Leshinskaya A, Bajaj M, Thompson-Schill SL. Novel objects with causal event schemas elicit selective responses in tool- and hand-selective lateral occipitotemporal cortex. Cereb Cortex 2023;33:5557-5573. PMID: 36469589; PMCID: PMC10152094; DOI: 10.1093/cercor/bhac442.
Abstract
Tool-selective lateral occipitotemporal cortex (LOTC) responds preferentially to images of tools (hammers, brushes) relative to non-tool objects (clocks, shoes). What drives these responses? Unlike other objects, tools exert effects on their surroundings. We tested whether LOTC responses are influenced by event schemas that denote different temporal relations. Participants learned about novel objects embedded in different event sequences. Causer objects moved prior to the appearance of an environmental event (e.g. stars), while Reactor objects moved after an event. Visual features and motor association were controlled. During functional magnetic resonance imaging, participants viewed still images of the objects. We localized tool-selective LOTC and non-tool-selective parahippocampal cortex (PHC) by contrasting neural responses to images of familiar tools and non-tools. We found that LOTC responded more to Causers than Reactors, while PHC did not. We also measured responses to images of hands, which elicit overlapping responses with tools. Across inferior temporal cortex, voxels' tool and hand selectivity positively predicted a preferential response to Causers. We conclude that an event schema typical of tools is sufficient to drive LOTC and that category-preferential responses across the temporal lobe may reflect relational event structures typical of those domains.
Affiliation(s)
- Anna Leshinskaya: Department of Psychology, University of Pennsylvania, 425 S. University Ave, Stephen A Levin Building, Philadelphia, PA 19104, United States; Center for Neuroscience, University of California, Davis, 1544 Newton Court, Room 209, Davis, CA, United States
- Mira Bajaj: Department of Psychology, University of Pennsylvania, 425 S. University Ave, Stephen A Levin Building, Philadelphia, PA 19104, United States; The Johns Hopkins University School of Medicine, 733 N Broadway, Baltimore, MD 21205, United States
- Sharon L Thompson-Schill: Department of Psychology, University of Pennsylvania, 425 S. University Ave, Stephen A Levin Building, Philadelphia, PA 19104, United States
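The core measurement in the study above is a region-of-interest contrast: mean responses in tool-selective LOTC to Causer versus Reactor objects, compared within participants. A minimal sketch of that kind of paired contrast is shown below, using invented response values rather than the authors' data or preprocessing pipeline.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)

# Hypothetical mean ROI responses (e.g., beta estimates) per participant and condition.
n_subjects = 20
causer_resp = rng.normal(loc=0.9, scale=0.4, size=n_subjects)   # responses to Causer objects
reactor_resp = rng.normal(loc=0.6, scale=0.4, size=n_subjects)  # responses to Reactor objects

# Within-subject contrast: does the ROI respond more to Causers than Reactors?
diff = causer_resp - reactor_resp
t_stat, p_val = stats.ttest_rel(causer_resp, reactor_resp)
print(f"mean Causer-minus-Reactor difference = {diff.mean():.3f}")
print(f"paired t({n_subjects - 1}) = {t_stat:.2f}, p = {p_val:.4f}")
```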
4
Mind the gap: challenges of deep learning approaches to Theory of Mind. Artif Intell Rev 2023. DOI: 10.1007/s10462-023-10401-x.
5
Rubio-Fernandez P, Shukla V, Bhatia V, Ben-Ami S, Sinha P. Head turning is an effective cue for gaze following: Evidence from newly sighted individuals, school children and adults. Neuropsychologia 2022;174:108330. PMID: 35843461; DOI: 10.1016/j.neuropsychologia.2022.108330.
Abstract
In referential communication, gaze is often interpreted as a social cue that facilitates comprehension and enables word learning. Here we investigated the degree to which head turning facilitates gaze following. We presented participants with static pictures of a man looking at a target object in a first and third block of trials (pre- and post-intervention), while they saw short videos of the same man turning towards the target in the second block of trials (intervention). In Experiment 1, newly sighted individuals (treated for congenital cataracts; N = 8) benefited from the motion cues, both when comparing their initial performance with static gaze cues to their performance with dynamic head turning, and their performance with static cues before and after the videos. In Experiment 2, neurotypical school children (ages 5-10 years; N = 90) and adults (N = 30) also revealed improved performance with motion cues, although most participants had started to follow the static gaze cues before they saw the videos. Our results confirm that head turning is an effective social cue when interpreting new words, offering new insights for a pathways approach to development.
Affiliation(s)
- Shlomit Ben-Ami: Massachusetts Institute of Technology, USA; Tel Aviv University, Israel
6
Schultz J, Frith CD. Animacy and the prediction of behaviour. Neurosci Biobehav Rev 2022;140:104766. DOI: 10.1016/j.neubiorev.2022.104766.
7
Lessons from infant learning for unsupervised machine learning. Nat Mach Intell 2022. DOI: 10.1038/s42256-022-00488-2.
8
Face identity coding in the deep neural network and primate brain. Commun Biol 2022;5:611. PMID: 35725902; PMCID: PMC9209415; DOI: 10.1038/s42003-022-03557-9.
Abstract
A central challenge in face perception research is to understand how neurons encode face identities. This challenge has not been met largely due to the lack of simultaneous access to the entire face processing neural network and the lack of a comprehensive multifaceted model capable of characterizing a large number of facial features. Here, we addressed this challenge by conducting in silico experiments using a pre-trained face recognition deep neural network (DNN) with a diverse array of stimuli. We identified a subset of DNN units selective to face identities, and these identity-selective units demonstrated generalized discriminability to novel faces. Visualization and manipulation of the network revealed the importance of identity-selective units in face recognition. Importantly, using our monkey and human single-neuron recordings, we directly compared the response of artificial units with real primate neurons to the same stimuli and found that artificial units shared a similar representation of facial features as primate neurons. We also observed a region-based feature coding mechanism in DNN units as in human neurons. Together, by directly linking between artificial and primate neural systems, our results shed light on how the primate brain performs face recognition tasks.
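The screening step in the abstract above, finding DNN units selective to face identities, can be made concrete with a simple criterion: a unit counts as identity-selective if its responses differ reliably across identities, for example under a one-way ANOVA over identity labels. The sketch below applies that criterion to simulated unit responses; the unit counts, stimulus numbers, and threshold are illustrative assumptions, not the study's exact procedure.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)

n_units, n_identities, n_images = 50, 10, 20   # hypothetical network layer and stimulus set
# Simulated responses: units x identities x images-per-identity.
responses = rng.normal(size=(n_units, n_identities, n_images))
# Give a handful of units genuine identity tuning by shifting their means per identity.
tuned = rng.choice(n_units, size=8, replace=False)
responses[tuned] += rng.normal(scale=2.0, size=(8, n_identities, 1))

identity_selective = []
for u in range(n_units):
    groups = [responses[u, i] for i in range(n_identities)]
    f_stat, p_val = stats.f_oneway(*groups)     # do responses differ across identities?
    if p_val < 0.01:
        identity_selective.append(u)

print(f"{len(identity_selective)} of {n_units} units pass the identity-selectivity screen")
print("flagged units:", sorted(identity_selective))
```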
9
Zohary E, Harari D, Ullman S, Ben-Zion I, Doron R, Attias S, Porat Y, Sklar AY, Mckyton A. Gaze following requires early visual experience. Proc Natl Acad Sci U S A 2022;119:e2117184119. PMID: 35549552; PMCID: PMC9171757; DOI: 10.1073/pnas.2117184119.
Abstract
Gaze understanding—a suggested precursor for understanding others’ intentions—requires recovery of gaze direction from the observed person's head and eye position. This challenging computation is naturally acquired at infancy without explicit external guidance, but can it be learned later if vision is extremely poor throughout early childhood? We addressed this question by studying gaze following in Ethiopian patients with early bilateral congenital cataracts diagnosed and treated by us only at late childhood. This sight restoration provided a unique opportunity to directly address basic issues on the roles of “nature” and “nurture” in development, as it caused a selective perturbation to the natural process, eliminating some gaze-direction cues while leaving others still available. Following surgery, the patients’ visual acuity typically improved substantially, allowing discrimination of pupil position in the eye. Yet, the patients failed to show eye gaze-following effects and fixated less than controls on the eyes—two spontaneous behaviors typically seen in controls. Our model for unsupervised learning of gaze direction explains how head-based gaze following can develop under severe image blur, resembling preoperative conditions. It also suggests why, despite acquiring sufficient resolution to extract eye position, automatic eye gaze following is not established after surgery due to lack of detailed early visual experience. We suggest that visual skills acquired in infancy in an unsupervised manner will be difficult or impossible to acquire when internal guidance is no longer available, even when sufficient image resolution for the task is restored. This creates fundamental barriers to spontaneous vision recovery following prolonged deprivation in early age.
Affiliation(s)
- Ehud Zohary: The Edmond and Lily Safra Center for Brain Sciences, The Hebrew University of Jerusalem, Jerusalem 91904, Israel
- Daniel Harari: Department of Computer Science and Applied Mathematics, Weizmann Institute of Science, Rehovot 7610001, Israel
- Shimon Ullman: Department of Computer Science and Applied Mathematics, Weizmann Institute of Science, Rehovot 7610001, Israel
- Itay Ben-Zion: Department of Ophthalmology, Padeh Medical Center, Poriya 15208, Israel
- Ravid Doron: Department of Optometry and Vision Science, Hadassah Academic College, Jerusalem 91010, Israel
- Sara Attias: The Edmond and Lily Safra Center for Brain Sciences, The Hebrew University of Jerusalem, Jerusalem 91904, Israel
- Yuval Porat: The Edmond and Lily Safra Center for Brain Sciences, The Hebrew University of Jerusalem, Jerusalem 91904, Israel
- Asael Y. Sklar: The Edmond and Lily Safra Center for Brain Sciences, The Hebrew University of Jerusalem, Jerusalem 91904, Israel
- Ayelet Mckyton: Neurology Department, Hadassah Medical Organization and Faculty of Medicine, Jerusalem 91120, Israel
10
Lagriffoul F. A Schema-Based Robot Controller Complying With the Constraints of Biological Systems. Front Neurorobot 2022;16:836767. PMID: 35615342; PMCID: PMC9124795; DOI: 10.3389/fnbot.2022.836767.
Abstract
This article reports on the early stages of conception of a robotic control system based on Piaget's schemas theory. Beyond some initial experimental results, we question the scientific method used in developmental robotics (DevRob) and argue that it is premature to abstract away the functional architecture of the brain when so little is known about its mechanisms. Instead, we advocate for applying a method similar to the method used in model-based cognitive science, which consists in selecting plausible models using computational and physiological constraints. Previous study on schema-based robotics is analyzed through the critical lens of the proposed method, and a minimal system designed using this method is presented.
11
Kominsky JF, Li Y, Carey S. Infants' Attributions of Insides and Animacy in Causal Interactions. Cogn Sci 2022;46:e13087. DOI: 10.1111/cogs.13087.
Affiliation(s)
- Yiping Li: Department of Psychology, Harvard University
12
Abstract
Face-selective neurons are observed in the primate visual pathway and are considered as the basis of face detection in the brain. However, it has been debated as to whether this neuronal selectivity can arise innately or whether it requires training from visual experience. Here, using a hierarchical deep neural network model of the ventral visual stream, we suggest a mechanism in which face-selectivity arises in the complete absence of training. We found that units selective to faces emerge robustly in randomly initialized networks and that these units reproduce many characteristics observed in monkeys. This innate selectivity also enables the untrained network to perform face-detection tasks. Intriguingly, we observed that units selective to various non-face objects can also arise innately in untrained networks. Our results imply that the random feedforward connections in early, untrained deep neural networks may be sufficient for initializing primitive visual selectivity.
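A standard way to quantify the face-selectivity described above is a d-prime-like index per unit: the difference between mean responses to face and non-face images divided by their pooled standard deviation. The sketch below computes that index for a randomly weighted layer standing in for an untrained network; the random placeholder "images", layer size, and threshold are assumptions for illustration, not the study's model or stimuli.

```python
import numpy as np

rng = np.random.default_rng(3)

def make_random_layer(input_dim, n_units=64):
    """A randomly weighted linear layer with ReLU, standing in for an untrained conv layer."""
    w = rng.normal(scale=1.0 / np.sqrt(input_dim), size=(input_dim, n_units))
    def layer(images):
        flat = images.reshape(len(images), -1)
        return np.maximum(flat @ w, 0.0)
    return layer

def face_selectivity_index(face_resp, object_resp):
    """d'-style index per unit: (mean_face - mean_object) / pooled standard deviation."""
    num = face_resp.mean(axis=0) - object_resp.mean(axis=0)
    den = np.sqrt(0.5 * (face_resp.var(axis=0) + object_resp.var(axis=0))) + 1e-8
    return num / den

# Placeholder stimuli: in the real analysis these would be face and non-face image sets.
faces = rng.normal(size=(100, 32, 32))
objects = rng.normal(size=(100, 32, 32))

layer = make_random_layer(32 * 32)
fsi = face_selectivity_index(layer(faces), layer(objects))
print("units with |FSI| > 0.5 (arbitrary threshold):", int((np.abs(fsi) > 0.5).sum()))
```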
13
Gumbsch C, Adam M, Elsner B, Butz MV. Emergent Goal-Anticipatory Gaze in Infants via Event-Predictive Learning and Inference. Cogn Sci 2021;45:e13016. PMID: 34379329; DOI: 10.1111/cogs.13016.
Abstract
From about 7 months of age onward, infants start to reliably fixate the goal of an observed action, such as a grasp, before the action is complete. The available research has identified a variety of factors that influence such goal-anticipatory gaze shifts, including the experience with the shown action events and familiarity with the observed agents. However, the underlying cognitive processes are still heavily debated. We propose that our minds (i) tend to structure sensorimotor dynamics into probabilistic, generative event-predictive, and event boundary predictive models, and, meanwhile, (ii) choose actions with the objective to minimize predicted uncertainty. We implement this proposition by means of event-predictive learning and active inference. The implemented learning mechanism induces an inductive, event-predictive bias, thus developing schematic encodings of experienced events and event boundaries. The implemented active inference principle chooses actions by aiming at minimizing expected future uncertainty. We train our system on multiple object-manipulation events. As a result, the generation of goal-anticipatory gaze shifts emerges while learning about object manipulations: the model starts fixating the inferred goal already at the start of an observed event after having sampled some experience with possible events and when a familiar agent (i.e., a hand) is involved. Meanwhile, the model keeps reactively tracking an unfamiliar agent (i.e., a mechanical claw) that is performing the same movement. We qualitatively compare these modeling results to behavioral data of infants and conclude that event-predictive learning combined with active inference may be critical for eliciting goal-anticipatory gaze behavior in infants.
Affiliation(s)
- Christian Gumbsch: Neuro-Cognitive Modeling Group, Department of Computer Science, University of Tübingen; Autonomous Learning Group, Max Planck Institute for Intelligent Systems
- Martin V Butz: Neuro-Cognitive Modeling Group, Department of Computer Science, University of Tübingen
14
Kim G, Jang J, Baek S, Song M, Paik SB. Visual number sense in untrained deep neural networks. Sci Adv 2021;7:eabd6127. PMID: 33523851; PMCID: PMC7775775; DOI: 10.1126/sciadv.abd6127.
Abstract
Number sense, the ability to estimate numerosity, is observed in naïve animals, but how this cognitive function emerges in the brain remains unclear. Here, using an artificial deep neural network that models the ventral visual stream of the brain, we show that number-selective neurons can arise spontaneously, even in the complete absence of learning. We also show that the responses of these neurons can induce the abstract number sense, the ability to discriminate numerosity independent of low-level visual cues. We found number tuning in a randomly initialized network originating from a combination of monotonically decreasing and increasing neuronal activities, which emerges spontaneously from the statistical properties of bottom-up projections. We confirmed that the responses of these number-selective neurons show the single- and multineuron characteristics observed in the brain and enable the network to perform number comparison tasks. These findings provide insight into the origin of innate cognitive functions.
Affiliation(s)
- Gwangsu Kim: Department of Physics, Korea Advanced Institute of Science and Technology, Daejeon 34141, Republic of Korea
- Jaeson Jang: Department of Bio and Brain Engineering, Korea Advanced Institute of Science and Technology, Daejeon 34141, Republic of Korea
- Seungdae Baek: Department of Bio and Brain Engineering, Korea Advanced Institute of Science and Technology, Daejeon 34141, Republic of Korea
- Min Song: Department of Bio and Brain Engineering, Korea Advanced Institute of Science and Technology, Daejeon 34141, Republic of Korea; Program of Brain and Cognitive Engineering, Korea Advanced Institute of Science and Technology, Daejeon 34141, Republic of Korea
- Se-Bum Paik: Department of Bio and Brain Engineering, Korea Advanced Institute of Science and Technology, Daejeon 34141, Republic of Korea; Program of Brain and Cognitive Engineering, Korea Advanced Institute of Science and Technology, Daejeon 34141, Republic of Korea
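Number tuning of the kind reported above is usually read off a tuning curve: each unit's mean response as a function of the numerosity displayed, with the peak defining the unit's preferred numerosity. The sketch below performs that readout on simulated responses; generating dot displays and passing them through an untrained network is omitted, and all values are placeholders rather than the study's stimuli or architecture.

```python
import numpy as np

rng = np.random.default_rng(4)

numerosities = np.array([1, 2, 4, 8, 16, 32])
n_units, n_images = 30, 50

# Simulated responses of network units to dot displays of each numerosity
# (units x numerosities x image samples); a real analysis would pass dot images
# through an untrained network instead.
responses = rng.normal(size=(n_units, len(numerosities), n_images))
# Give some units a peaked tuning profile around a preferred numerosity.
for u in range(0, n_units, 3):
    pref = rng.integers(len(numerosities))
    responses[u] += 2.0 * np.exp(-0.5 * (np.arange(len(numerosities)) - pref) ** 2)[:, None]

tuning = responses.mean(axis=2)                  # units x numerosities: average tuning curves
preferred = numerosities[tuning.argmax(axis=1)]  # preferred numerosity per unit
for n in numerosities:
    print(f"units preferring {n:>2} dots: {(preferred == n).sum()}")
```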
15
16
17
Abstract
Eye Movement Desensitization and Reprocessing Therapy (EMDR) is an effective treatment for Post-traumatic Stress Disorder (PTSD). The Adaptive Information Processing Model (AIP) guides the development and practice of EMDR. The AIP postulates inadequately processed memory as the foundation of PTSD pathology. Predictive Processing postulates that the primary function of the brain is prediction that serves to anticipate the next moment of experience in order to resist the dissipative force of entropy thus facilitating continued survival. Memory is the primary substrate of prediction, and is optimized by an ongoing process of precision weighted prediction error minimization that refines prediction by updating the memories on which it is based. The Predictive Processing model of EMDR postulates that EMDR facilitates the predictive processing of traumatic memory by overcoming the bias against exploration and evidence accumulation. The EMDR protocol brings the traumatic memory into an active state of re-experiencing. Defensive responding and/or low sensory precision preclude evidence accumulation to test the predictions of the traumatic memory in the present. Sets of therapist guided eye movements repeatedly challenge the bias against evidence accumulation and compel sensory sampling of the benign present. Eye movements reset the theta rhythm organizing the flow of information through the brain, facilitating the deployment of both overt and covert attention, and the mnemonic search for associations. Sampling of sensation does not support the predictions of the traumatic memory resulting in prediction error that the brain then attempts to minimize. The net result is a restoration of the integrity of the rhythmic deployment of attention, a recalibration of sensory precision, and the updating (reconsolidation) of the traumatic memory. Thus one prediction of the model is a decrease in Attention Bias Variability, a core dysfunction in PTSD, following successful treatment with EMDR.
18
Origins of the concepts cause, cost, and goal in prereaching infants. Proc Natl Acad Sci U S A 2019;116:17747-17752. PMID: 31431537; DOI: 10.1073/pnas.1904410116.
Abstract
We investigated the origins and interrelations of causal knowledge and knowledge of agency in 3-month-old infants, who cannot yet effect changes in the world by reaching for, grasping, and picking up objects. Across 5 experiments, n = 152 prereaching infants viewed object-directed reaches that varied in efficiency (following the shortest physically possible path vs. a longer path), goal (lifting an object vs. causing a change in its state), and causal structure (action on contact vs. action at a distance and after a delay). Prereaching infants showed no strong looking preference between a person's efficient and inefficient reaches when the person grasped and displaced an object. When the person reached for and caused a change in the state of the object on contact, however, infants looked longer when this action was inefficient than when it was efficient. Three-month-old infants also showed a key signature of adults' and older infants' causal inferences: This looking preference was abolished if a short spatial and temporal gap separated the action from its effect. The basic intuition that people are causal agents, who navigate around physical constraints to change the state of the world, may be one important foundation for infants' ability to plan their own actions and learn from the acts of others.
19
Wu R. Learning What to Learn Across the Life Span: From Objects to Real-World Skills. Curr Dir Psychol Sci 2019. DOI: 10.1177/0963721419847994.
Abstract
One of the most difficult and important problems that all learners face across the life span is learning what to learn. Understanding what to learn is difficult when both relevant and irrelevant information compete for attention. In these situations, the learner can rely on cues in the environment, as well as prior knowledge. However, these sources of information sometimes conflict, and the learner has to prioritize some sources over others. Determining what to learn is important because learning relevant information helps the learner achieve goals, whereas learning irrelevant information can waste time and energy. A new theoretical approach posits that adaptation is relevant for all age groups because the environment is dynamic, suggesting that learning what to learn is a problem relevant across the life span instead of only during infancy and childhood. In this article, I review new research demonstrating the importance and ways of learning what to learn across the life span, from objects to real-world skills, before highlighting some unresolved issues for future research.
Affiliation(s)
- Rachel Wu: Department of Psychology, University of California, Riverside
20
Leshinskaya A, Thompson-Schill SL. From the structure of experience to concepts of structure: How the concept "cause" is attributed to objects and events. J Exp Psychol Gen 2019;148:619-643. PMID: 30973260; PMCID: PMC6461371; DOI: 10.1037/xge0000594.
Abstract
The pervasive presence of relational information in concepts, and its indirect presence in sensory input, raises the question of how it is extracted from experience. We operationalized experience as a stream of events in which reliable predictive relationships exist among random ones, and in which learners are naïve as to what they will learn (i.e., a statistical learning paradigm). First, we asked whether predictive event pairs would spontaneously be seen as causing each other, given no instructions to evaluate causality. We found that predictive information indeed informed later causal judgments but did not lead to a spontaneous sense of causality. Thus, event contingencies are relevant to causal inference, but such interpretations may not occur fully bottom-up. A second question was how such experience might be used to learn about novel objects. Because events occurred either around or involving a continually present object, we were able to distinguish objects from events. We found that objects can be attributed causal properties by virtue of a higher-order structure, in which the object's identity is linked not to the increased likelihood of its effect, but rather, to the predictive structure among events, given its presence. This is an important demonstration that objects' causal properties can be highly abstract: They need not refer to an occurrence of a sensory event per se, or its link to an object, but rather to whether or not a predictive relationship holds among events in its presence. These learning mechanisms may be important for acquiring abstract knowledge from experience. (PsycINFO Database Record (c) 2019 APA, all rights reserved).
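The learning environment described above is a stream of events in which some pairs are predictive and others are not, and the statistic relevant to causal judgment is the contingency between an event and its candidate effect. A compact way to express this is deltaP = P(effect | cause) - P(effect | other events), computed over adjacent events in the stream. The toy sketch below builds such a stream and computes deltaP; it illustrates the paradigm's logic only and is not the authors' stimulus set or analysis.

```python
import numpy as np

rng = np.random.default_rng(5)

events = ["A", "B", "C", "D"]

# Build an event stream with one predictive relationship: "A" is usually followed by "B";
# everything else is random.
stream = ["A"]
for _ in range(4000):
    if stream[-1] == "A" and rng.random() < 0.8:
        stream.append("B")
    else:
        stream.append(str(rng.choice(events)))

def delta_p(stream, cause, effect):
    """Contingency: P(effect follows cause) - P(effect follows any other event)."""
    pairs = list(zip(stream[:-1], stream[1:]))
    after_cause = [b for a, b in pairs if a == cause]
    after_other = [b for a, b in pairs if a != cause]
    p1 = sum(b == effect for b in after_cause) / len(after_cause)
    p0 = sum(b == effect for b in after_other) / len(after_other)
    return p1 - p0

print(f"deltaP(A -> B) = {delta_p(stream, 'A', 'B'):.2f}")   # clearly positive: predictive pair
print(f"deltaP(C -> D) = {delta_p(stream, 'C', 'D'):.2f}")   # near zero: no contingency
```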
21
22
Ullman S, Dorfman N, Harari D. A model for discovering 'containment' relations. Cognition 2019;183:67-81. PMID: 30419508; PMCID: PMC6331663; DOI: 10.1016/j.cognition.2018.11.001.
Abstract
Rapid developments in the fields of learning and object recognition have been obtained by successfully developing and using methods for learning from a large number of labeled image examples. However, such current methods cannot explain infants' learning of new concepts based on their visual experience, in particular, the ability to learn complex concepts without external guidance, as well as the natural order in which related concepts are acquired. A remarkable example of early visual learning is the category of 'containers' and the notion of 'containment'. Surprisingly, this is one of the earliest spatial relations to be learned, starting already around 3 months of age, and preceding other common relations (e.g., 'support', 'in-between'). In this work we present a model, which explains infants' capacity of learning 'containment' and related concepts by 'just looking', together with their empirical development trajectory. Learning occurs in the model fast and without external guidance, relying only on perceptual processes that are present in the first months of life. Instead of labeled training examples, the system provides its own internal supervision to guide the learning process. We show how the detection of so-called 'paradoxical occlusion' provides natural internal supervision, which guides the system to gradually acquire a range of useful containment-related concepts. Similar mechanisms of using implicit internal supervision can have broad application in other cognitive domains as well as artificially intelligent systems, because they alleviate the need for supplying extensive external supervision, and because they can guide the learning process to extract concepts that are meaningful to the observer, even if they are not by themselves obvious, or salient in the input.
Affiliation(s)
- Shimon Ullman: Weizmann Institute of Science, Department of Computer Science and Applied Mathematics, 234 Herzl Street, Rehovot 7610001, Israel
- Nimrod Dorfman: Weizmann Institute of Science, Department of Computer Science and Applied Mathematics, 234 Herzl Street, Rehovot 7610001, Israel
- Daniel Harari: Weizmann Institute of Science, Department of Computer Science and Applied Mathematics, 234 Herzl Street, Rehovot 7610001, Israel
23
Chen L, Singh S, Kailath T, Roychowdhury V. Brain-inspired automated visual object discovery and detection. Proc Natl Acad Sci U S A 2019;116:96-105. PMID: 30559207; PMCID: PMC6320548; DOI: 10.1073/pnas.1802103115.
Abstract
Despite significant recent progress, machine vision systems lag considerably behind their biological counterparts in performance, scalability, and robustness. A distinctive hallmark of the brain is its ability to automatically discover and model objects, at multiscale resolutions, from repeated exposures to unlabeled contextual data and then to be able to robustly detect the learned objects under various nonideal circumstances, such as partial occlusion and different view angles. Replication of such capabilities in a machine would require three key ingredients: (i) access to large-scale perceptual data of the kind that humans experience, (ii) flexible representations of objects, and (iii) an efficient unsupervised learning algorithm. The Internet fortunately provides unprecedented access to vast amounts of visual data. This paper leverages the availability of such data to develop a scalable framework for unsupervised learning of object prototypes: brain-inspired, flexible, scale and shift invariant representations of deformable objects (e.g., humans, motorcycles, cars, airplanes) comprised of parts, their different configurations and views, and their spatial relationships. Computationally, the object prototypes are represented as geometric associative networks using probabilistic constructs such as Markov random fields. We apply our framework to various datasets and show that our approach is computationally scalable and can construct accurate and operational part-aware object models much more efficiently than in much of the recent computer vision literature. We also present efficient algorithms for detection and localization in new scenes of objects and their partial views.
Affiliation(s)
- Lichao Chen: Department of Electrical and Computer Engineering, University of California, Los Angeles, CA 90095
- Sudhir Singh: Department of Electrical and Computer Engineering, University of California, Los Angeles, CA 90095
- Thomas Kailath: Department of Electrical Engineering, Stanford University, Stanford, CA 94305
- Vwani Roychowdhury: Department of Electrical and Computer Engineering, University of California, Los Angeles, CA 90095
24
Grill-Spector K, Weiner KS, Gomez J, Stigliani A, Natu VS. The functional neuroanatomy of face perception: from brain measurements to deep neural networks. Interface Focus 2018;8:20180013. PMID: 29951193; PMCID: PMC6015811; DOI: 10.1098/rsfs.2018.0013.
Abstract
A central goal in neuroscience is to understand how processing within the ventral visual stream enables rapid and robust perception and recognition. Recent neuroscientific discoveries have significantly advanced understanding of the function, structure and computations along the ventral visual stream that serve as the infrastructure supporting this behaviour. In parallel, significant advances in computational models, such as hierarchical deep neural networks (DNNs), have brought machine performance to a level that is commensurate with human performance. Here, we propose a new framework using the ventral face network as a model system to illustrate how increasing the neural accuracy of present DNNs may allow researchers to test the computational benefits of the functional architecture of the human brain. Thus, the review (i) considers specific neural implementational features of the ventral face network, (ii) describes similarities and differences between the functional architecture of the brain and DNNs, and (iii) provides a hypothesis for the computational value of implementational features within the brain that may improve DNN performance. Importantly, this new framework promotes the incorporation of neuroscientific findings into DNNs in order to test the computational benefits of fundamental organizational features of the visual system.
Affiliation(s)
- Kalanit Grill-Spector: Department of Psychology, School of Medicine, Stanford University, Stanford, CA 94305, USA; Stanford Neurosciences Institute, School of Medicine, Stanford University, Stanford, CA 94305, USA
- Kevin S. Weiner: Department of Psychology, University of California Berkeley, Berkeley, CA 94720, USA; Helen Wills Neuroscience Institute, University of California Berkeley, Berkeley, CA 94720, USA
- Jesse Gomez: Stanford Neurosciences Program, School of Medicine, Stanford University, Stanford, CA 94305, USA
- Anthony Stigliani: Department of Psychology, School of Medicine, Stanford University, Stanford, CA 94305, USA
- Vaidehi S. Natu: Department of Psychology, School of Medicine, Stanford University, Stanford, CA 94305, USA
25
Abstract
Existing approaches to describe social interactions consider emotional states or use ad-hoc descriptors for microanalysis of interactions. Such descriptors are different in each context thereby limiting comparisons, and can also mix facets of meaning such as emotional states, short term tactics and long-term goals. To develop a systematic set of concepts for second-by-second social interactions, we suggest a complementary approach based on practices employed in theater. Theater uses the concept of dramatic action, the effort that one makes to change the psychological state of another. Unlike states (e.g. emotions), dramatic actions aim to change states; unlike long-term goals or motivations, dramatic actions can last seconds. We defined a set of 22 basic dramatic action verbs using a lexical approach, such as ‘to threaten’–the effort to incite fear, and ‘to encourage’–the effort to inspire hope or confidence. We developed a set of visual cartoon stimuli for these basic dramatic actions, and find that people can reliably and reproducibly assign dramatic action verbs to these stimuli. We show that each dramatic action can be carried out with different emotions, indicating that the two constructs are distinct. We characterized a principal valence axis of dramatic actions. Finally, we re-analyzed three widely-used interaction coding systems in terms of dramatic actions, to suggest that dramatic actions might serve as a common vocabulary across research contexts. This study thus operationalizes and tests dramatic action as a potentially useful concept for research on social interaction, and in particular on influence tactics.
26
Abstract
The present article shows that infant and dyad differences in hand-eye coordination predict dyad differences in joint attention (JA). In the study reported here, 51 toddlers ranging in age from 11 to 24 months and their parents wore head-mounted eye trackers as they played with objects together. We found that physically active toddlers aligned their looking behavior with their parent and achieved a substantial proportion of time spent jointly attending to the same object. However, JA did not arise through gaze following but rather through the coordination of gaze with manual actions on objects as both infants and parents attended to their partner's object manipulations. Moreover, dyad differences in JA were associated with dyad differences in hand following.
27
Abstract
Recent progress in artificial intelligence has renewed interest in building systems that learn and think like people. Many advances have come from using deep neural networks trained end-to-end in tasks such as object recognition, video games, and board games, achieving performance that equals or even beats that of humans in some respects. Despite their biological inspiration and performance achievements, these systems differ from human intelligence in crucial ways. We review progress in cognitive science suggesting that truly human-like learning and thinking machines will have to reach beyond current engineering trends in both what they learn and how they learn it. Specifically, we argue that these machines should (1) build causal models of the world that support explanation and understanding, rather than merely solving pattern recognition problems; (2) ground learning in intuitive theories of physics and psychology to support and enrich the knowledge that is learned; and (3) harness compositionality and learning-to-learn to rapidly acquire and generalize knowledge to new tasks and situations. We suggest concrete challenges and promising routes toward these goals that can combine the strengths of recent neural network advances with more structured cognitive models.
28
Abstract
Understanding how humans perceive cause and effect in visual events has long intrigued philosophers and scientists. A new study in primates reveals the neural correlates of perceived causality at the single-cell level, but in an unexpected place - the motor system.
29
Caggiano V, Fleischer F, Pomper JK, Giese MA, Thier P. Mirror Neurons in Monkey Premotor Area F5 Show Tuning for Critical Features of Visual Causality Perception. Curr Biol 2016;26:3077-3082. DOI: 10.1016/j.cub.2016.10.007.
30
Marblestone AH, Wayne G, Kording KP. Toward an Integration of Deep Learning and Neuroscience. Front Comput Neurosci 2016;10:94. PMID: 27683554; PMCID: PMC5021692; DOI: 10.3389/fncom.2016.00094.
Abstract
Neuroscience has focused on the detailed implementation of computation, studying neural codes, dynamics and circuits. In machine learning, however, artificial neural networks tend to eschew precisely designed codes, dynamics or circuits in favor of brute force optimization of a cost function, often using simple and relatively uniform initial architectures. Two recent developments have emerged within machine learning that create an opportunity to connect these seemingly divergent perspectives. First, structured architectures are used, including dedicated systems for attention, recursion and various forms of short- and long-term memory storage. Second, cost functions and training procedures have become more complex and are varied across layers and over time. Here we think about the brain in terms of these ideas. We hypothesize that (1) the brain optimizes cost functions, (2) the cost functions are diverse and differ across brain locations and over development, and (3) optimization operates within a pre-structured architecture matched to the computational problems posed by behavior. In support of these hypotheses, we argue that a range of implementations of credit assignment through multiple layers of neurons are compatible with our current knowledge of neural circuitry, and that the brain's specialized systems can be interpreted as enabling efficient optimization for specific problem classes. Such a heterogeneously optimized system, enabled by a series of interacting cost functions, serves to make learning data-efficient and precisely targeted to the needs of the organism. We suggest directions by which neuroscience could seek to refine and test these hypotheses.
Affiliation(s)
- Adam H. Marblestone: Synthetic Neurobiology Group, Massachusetts Institute of Technology, Media Lab, Cambridge, MA, USA
- Konrad P. Kording: Rehabilitation Institute of Chicago, Northwestern University, Chicago, IL, USA
31
Yoshimi R, Takeno M, Toyota Y, Tsuchida N, Sugiyama Y, Kunishita Y, Kishimoto D, Kamiyama R, Minegishi K, Hama M, Kirino Y, Ishigatsubo Y, Ohno S, Ueda A, Nakajima H. On-demand ultrasonography assessment in the most symptomatic joint supports the 8-joint score system for management of rheumatoid arthritis patients. Mod Rheumatol 2016;27:257-265. PMID: 27409294; DOI: 10.1080/14397595.2016.1206173.
Abstract
OBJECTIVES: To investigate whether on-demand ultrasonography (US) assessment alongside a routine examination is useful in the management of rheumatoid arthritis (RA). METHODS: US was performed in eight (bilateral MCP 2, 3, wrist and knee) joints as the routine in a cumulative total of 406 RA patients. The most symptomatic joint other than the routine joints was additionally scanned. Power Doppler (PD) and gray-scale images were scored semiquantitatively. Eight-joint scores were calculated as the sum of individual scores for the routine joints. RESULTS: The most symptomatic joint was found among the routine joints in 209 patients (Group A) and in other joints in 148 (Group B). The PD scores of the most symptomatic joint correlated well with the 8-joint scores in Group A (rs = 0.66), but not in Group B (rs = 0.33). The sensitivity and specificity of assessment of the most symptomatic joint for routine assessment positivity were high (84.0% and 100%, respectively) in Group A, but low (50.0% and 61.8%, respectively) in Group B. Additional examination detected synovitis in 38% of Group B with negative results in the routine. CONCLUSIONS: On-demand US assessment in the most symptomatic joint, combined with the routine assessment, is useful for detecting RA synovitis.
Affiliation(s)
- Ryusuke Yoshimi: Department of Hematology and Clinical Immunology, Yokohama City University School of Medicine, Yokohama, Japan
- Mitsuhiro Takeno: Department of Allergy and Rheumatology, Nippon Medical School Graduate School of Medicine, Tokyo, Japan
- Yukihiro Toyota: Department of Hematology and Clinical Immunology, Yokohama City University School of Medicine, Yokohama, Japan
- Naomi Tsuchida: Department of Hematology and Clinical Immunology, Yokohama City University School of Medicine, Yokohama, Japan
- Yumiko Sugiyama: Department of Hematology and Clinical Immunology, Yokohama City University School of Medicine, Yokohama, Japan
- Yosuke Kunishita: Department of Hematology and Clinical Immunology, Yokohama City University School of Medicine, Yokohama, Japan
- Daiga Kishimoto: Department of Hematology and Clinical Immunology, Yokohama City University School of Medicine, Yokohama, Japan
- Reikou Kamiyama: Department of Hematology and Clinical Immunology, Yokohama City University School of Medicine, Yokohama, Japan
- Kaoru Minegishi: Department of Hematology and Clinical Immunology, Yokohama City University School of Medicine, Yokohama, Japan
- Maasa Hama: Department of Hematology and Clinical Immunology, Yokohama City University School of Medicine, Yokohama, Japan
- Yohei Kirino: Department of Hematology and Clinical Immunology, Yokohama City University School of Medicine, Yokohama, Japan
- Yoshiaki Ishigatsubo: Department of Hematology and Clinical Immunology, Yokohama City University School of Medicine, Yokohama, Japan
- Shigeru Ohno: Center for Rheumatic Disease, Yokohama City University Medical Center, Yokohama, Japan
- Atsuhisa Ueda: Department of Hematology and Clinical Immunology, Yokohama City University School of Medicine, Yokohama, Japan
- Hideaki Nakajima: Department of Hematology and Clinical Immunology, Yokohama City University School of Medicine, Yokohama, Japan
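The 8-joint score in the study above is simple bookkeeping: each routine joint (bilateral MCP2, MCP3, wrist, and knee) receives a semiquantitative power Doppler score from 0 to 3, the eight scores are summed, and the sum is related to the score of the most symptomatic joint, for example by Spearman correlation. The sketch below shows that computation on an invented cohort; the scores are random placeholders, not patient data.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(6)

# The routine scan covers bilateral MCP2, MCP3, wrist, and knee joints (eight in total);
# each joint gets a semiquantitative power Doppler (PD) score from 0 to 3.
routine_joints = ["MCP2_L", "MCP2_R", "MCP3_L", "MCP3_R", "wrist_L", "wrist_R", "knee_L", "knee_R"]

# Invented cohort of 40 patients: per-joint PD scores for the routine joints.
n_patients = 40
pd_scores = rng.integers(0, 4, size=(n_patients, len(routine_joints)))

eight_joint_score = pd_scores.sum(axis=1)     # 8-joint score = sum over the routine joints
worst_routine_joint = pd_scores.max(axis=1)   # stand-in for the most symptomatic joint's score

rs, p = stats.spearmanr(worst_routine_joint, eight_joint_score)
print(f"Spearman rs = {rs:.2f} (p = {p:.3g})")
```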
32
Fausey CM, Jayaraman S, Smith LB. From faces to hands: Changing visual input in the first two years. Cognition 2016;152:101-107. PMID: 27043744; PMCID: PMC4856551; DOI: 10.1016/j.cognition.2016.03.005.
Abstract
Human development takes place in a social context. Two pervasive sources of social information are faces and hands. Here, we provide the first report of the visual frequency of faces and hands in the everyday scenes available to infants. These scenes were collected by having infants wear head cameras during unconstrained everyday activities. Our corpus of 143 hours of infant-perspective scenes, collected from 34 infants aged 1 month to 2 years, was sampled for analysis at 1/5 Hz. The major finding from this corpus is that the faces and hands of social partners are not equally available throughout the first two years of life. Instead, there is an earlier period of dense face input and a later period of dense hand input. At all ages, hands in these scenes were primarily in contact with objects and the spatio-temporal co-occurrence of hands and faces was greater than expected by chance. The orderliness of the shift from faces to hands suggests a principled transition in the contents of visual experiences and is discussed in terms of the role of developmental gates on the timing and statistics of visual experiences.
Affiliation(s)
- Caitlin M Fausey: Department of Psychology, University of Oregon, Eugene, OR 97403, United States
- Swapnaa Jayaraman: Department of Psychological and Brain Sciences, Indiana University, Bloomington, IN 47405, United States
- Linda B Smith: Department of Psychological and Brain Sciences, Indiana University, Bloomington, IN 47405, United States
33
Yu C, Smith LB. Multiple Sensory-Motor Pathways Lead to Coordinated Visual Attention. Cogn Sci 2016;41 Suppl 1:5-31. PMID: 27016038; DOI: 10.1111/cogs.12366.
Abstract
Joint attention has been extensively studied in the developmental literature because of overwhelming evidence that the ability to socially coordinate visual attention to an object is essential to healthy developmental outcomes, including language learning. The goal of this study was to understand the complex system of sensory-motor behaviors that may underlie the establishment of joint attention between parents and toddlers. In an experimental task, parents and toddlers played together with multiple toys. We objectively measured joint attention-and the sensory-motor behaviors that underlie it-using a dual head-mounted eye-tracking system and frame-by-frame coding of manual actions. By tracking the momentary visual fixations and hand actions of each participant, we precisely determined just how often they fixated on the same object at the same time, the visual behaviors that preceded joint attention and manual behaviors that preceded and co-occurred with joint attention. We found that multiple sequential sensory-motor patterns lead to joint attention. In addition, there are developmental changes in this multi-pathway system evidenced as variations in strength among multiple routes. We propose that coordinated visual attention between parents and toddlers is primarily a sensory-motor behavior. Skill in achieving coordinated visual attention in social settings-like skills in other sensory-motor domains-emerges from multiple pathways to the same functional end.
Affiliation(s)
- Chen Yu: Department of Psychological and Brain Sciences, Cognitive Science Program, Indiana University
- Linda B Smith: Department of Psychological and Brain Sciences, Cognitive Science Program, Indiana University
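The joint-attention measure in these dual head-mounted eye-tracking studies reduces to a frame-by-frame comparison: count the moments when parent and toddler fixate the same object, typically keeping only runs that exceed a minimum duration. The sketch below computes that measure from two short, hand-made gaze streams; the coding scheme, frame rate, and minimum bout length are illustrative assumptions rather than the authors' parameters.

```python
import numpy as np

# Frame-by-frame object labels for each partner (0 = off-object, face, or nothing).
# In the real studies these come from coded head-camera and eye-tracker video.
child_gaze  = np.array([1, 1, 1, 2, 2, 0, 3, 3, 3, 3, 0, 2, 2, 2, 2, 2])
parent_gaze = np.array([1, 1, 1, 1, 2, 2, 3, 3, 3, 3, 3, 1, 2, 2, 2, 2])

FPS = 30                 # assumed video frame rate
MIN_BOUT_FRAMES = 3      # assumed minimum duration for a joint-attention bout

same_object = (child_gaze == parent_gaze) & (child_gaze > 0)

# Find runs of consecutive same-object frames and keep those long enough to count as bouts.
bouts = []
run_start = None
for i, flag in enumerate(np.append(same_object, False)):   # sentinel closes a trailing run
    if flag and run_start is None:
        run_start = i
    elif not flag and run_start is not None:
        if i - run_start >= MIN_BOUT_FRAMES:
            bouts.append((run_start, i))
        run_start = None

joint_frames = sum(end - start for start, end in bouts)
print(f"joint-attention bouts (start, end frames): {bouts}")
print(f"proportion of time in joint attention: {joint_frames / len(child_gaze):.2f}")
print(f"total joint attention: {joint_frames / FPS:.2f} s")
```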
34
Yu C, Smith LB. Linking Joint Attention with Hand-Eye Coordination - A Sensorimotor Approach to Understanding Child-Parent Social Interaction. Proceedings of the Annual Conference of the Cognitive Science Society 2015;2015:2763-2768. PMID: 29226280; PMCID: PMC5722468.
Abstract
An understanding of human collaboration requires a level of analysis that concentrates on sensorimotor behaviors in which the behaviors of social partners continually adjust to and influence each other. A suite of individual differences in partners' ability to both read the social cues of others and to send effective behavioral cues to others create dyad differences in joint attention and joint action. The present paper shows that infant and dyad differences in hand-eye coordination predict dyad differences in joint attention. In the study reported here, 51 toddlers and their parents wore head-mounted eye-trackers as they played together with objects. This method allowed us to track the gaze direction of each participant to determine when they attended to the same object. We found that physically active toddlers align their looking behavior with their parent, and achieve a high proportion of time spent jointly attending to the same object in toy play. However, joint attention bouts in toy play don't depend on gaze following but rather on the coordination of gaze with hand actions on objects. Both infants and parents attend to their partner's object manipulations and in so doing fixate the object visually attended by their partner. Thus, the present results provide evidence for another pathway to joint attention - hand following instead of gaze following. Moreover, dyad differences in joint attention are associated with dyad differences in hand following, and specifically parents' and infants' manual activities on objects and the within- and between-partner coordination of hands and eyes during parent-infant interactions. In particular, infants' manual actions on objects play a critical role in organizing parent-infant joint attention to an object.
Affiliation(s)
- Chen Yu: Department of Psychological and Brain Sciences, and Cognitive Science Program, Indiana University, Bloomington, IN 47405, USA
- Linda B Smith: Department of Psychological and Brain Sciences, and Cognitive Science Program, Indiana University, Bloomington, IN 47405, USA
35
Tian M, Grill-Spector K. Spatiotemporal information during unsupervised learning enhances viewpoint invariant object recognition. J Vis 2015;15(6):7. PMID: 26024454; DOI: 10.1167/15.6.7.
Abstract
Recognizing objects is difficult because it requires both linking views of an object that can be different and distinguishing objects with similar appearance. Interestingly, people can learn to recognize objects across views in an unsupervised way, without feedback, just from the natural viewing statistics. However, there is intense debate regarding what information during unsupervised learning is used to link among object views. Specifically, researchers argue whether temporal proximity, motion, or spatiotemporal continuity among object views during unsupervised learning is beneficial. Here, we untangled the role of each of these factors in unsupervised learning of novel three-dimensional (3-D) objects. We found that after unsupervised training with 24 object views spanning a 180° view space, participants showed significant improvement in their ability to recognize 3-D objects across rotation. Surprisingly, there was no advantage to unsupervised learning with spatiotemporal continuity or motion information than training with temporal proximity. However, we discovered that when participants were trained with just a third of the views spanning the same view space, unsupervised learning via spatiotemporal continuity yielded significantly better recognition performance on novel views than learning via temporal proximity. These results suggest that while it is possible to obtain view-invariant recognition just from observing many views of an object presented in temporal proximity, spatiotemporal information enhances performance by producing representations with broader view tuning than learning via temporal association. Our findings have important implications for theories of object recognition and for the development of computational algorithms that learn from examples.
36
Liberman N, Trope Y. Traversing psychological distance. Trends Cogn Sci 2014;18:364-369. PMID: 24726527; DOI: 10.1016/j.tics.2014.03.001.
Abstract
Traversing psychological distance involves going beyond direct experience, and includes planning, perspective taking, and contemplating counterfactuals. Consistent with this view, temporal, spatial, and social distances as well as hypotheticality are associated, affect each other, and are inferred from one another. Moreover, traversing all distances involves the use of abstraction, which we define as forming a belief about the substitutability for a specific purpose of subjectively distinct objects. Indeed, across many instances of both abstraction and psychological distancing, more abstract constructs are used for more distal objects. Here, we describe the implications of this relation for prediction, choice, communication, negotiation, and self-control. We ask whether traversing distance is a general mental ability and whether distance should replace expectancy in expected-utility theories.
Affiliation(s)
- Nira Liberman: Department of Psychology, Tel Aviv University, Tel Aviv 69978, Israel
- Yaacov Trope: Department of Psychology, New York University, 6 Washington Place, New York, NY 10003, USA
37
Yu C, Smith LB. Joint attention without gaze following: human infants and their parents coordinate visual attention to objects through eye-hand coordination. PLoS One 2013;8:e79659. PMID: 24236151; PMCID: PMC3827436; DOI: 10.1371/journal.pone.0079659.
Abstract
The coordination of visual attention among social partners is central to many components of human behavior and human development. Previous research has focused on one pathway to the coordination of looking behavior by social partners, gaze following. The extant evidence shows that even very young infants follow the direction of another's gaze but they do so only in highly constrained spatial contexts because gaze direction is not a spatially precise cue as to the visual target and not easily used in spatially complex social interactions. Our findings, derived from the moment-to-moment tracking of eye gaze of one-year-olds and their parents as they actively played with toys, provide evidence for an alternative pathway, through the coordination of hands and eyes in goal-directed action. In goal-directed actions, the hands and eyes of the actor are tightly coordinated both temporally and spatially, and thus, in contexts including manual engagement with objects, hand movements and eye movements provide redundant information about where the eyes are looking. Our findings show that one-year-olds rarely look to the parent's face and eyes in these contexts but rather infants and parents coordinate looking behavior without gaze following by attending to objects held by the self or the social partner. This pathway, through eye-hand coupling, leads to coordinated joint switches in visual attention and to an overall high rate of looking at the same object at the same time, and may be the dominant pathway through which physically active toddlers align their looking behavior with a social partner.
Affiliation(s)
- Chen Yu: Department of Psychological and Brain Sciences, Cognitive Science Program, Indiana University Bloomington, Bloomington, Indiana, United States of America
- Linda B. Smith: Department of Psychological and Brain Sciences, Cognitive Science Program, Indiana University Bloomington, Bloomington, Indiana, United States of America
38
Poggio T, Ullman S. Vision: are models of object recognition catching up with the brain? Ann N Y Acad Sci 2013;1305:72-82. PMID: 23773126; DOI: 10.1111/nyas.12148.
Abstract
Object recognition has been a central yet elusive goal of computational vision. For many years, computer performance seemed highly deficient and unable to emulate the basic capabilities of the human recognition system. Over the past decade or so, computer scientists and neuroscientists have developed algorithms and systems-and models of visual cortex-that have come much closer to human performance in visual identification and categorization. In this personal perspective, we discuss the ongoing struggle of visual models to catch up with the visual cortex, identify key reasons for the relatively rapid improvement of artificial systems and models, and identify open problems for computational vision in this domain.
Affiliation(s)
- Tomaso Poggio: Department of Brain and Cognitive Sciences, McGovern Institute, Massachusetts Institute of Technology, Cambridge, Massachusetts
39