1
Ahn S, Adeli H, Zelinsky GJ. The attentive reconstruction of objects facilitates robust object recognition. PLoS Comput Biol 2024; 20:e1012159. [PMID: 38870125] [PMCID: PMC11175536] [DOI: 10.1371/journal.pcbi.1012159]
Abstract
Humans are extremely robust in our ability to perceive and recognize objects: we see faces in tea stains and can recognize friends on dark streets. Yet, neurocomputational models of primate object recognition have focused on the initial feed-forward pass of processing through the ventral stream and less on the top-down feedback that likely underlies robust object perception and recognition. Aligned with the generative approach, we propose that the visual system actively facilitates recognition by reconstructing the object hypothesized to be in the image. Top-down attention then uses this reconstruction as a template to bias feedforward processing to align with the most plausible object hypothesis. Building on auto-encoder neural networks, our model makes detailed hypotheses about the appearance and location of the candidate objects in the image by reconstructing a complete object representation from potentially incomplete visual input due to noise and occlusion. The model then leverages the best object reconstruction, measured by reconstruction error, to direct the bottom-up process of selectively routing low-level features, a top-down biasing that captures a core function of attention. We evaluated our model using the MNIST-C (handwritten digits under corruptions) and ImageNet-C (real-world objects under corruptions) datasets. Not only did our model achieve superior performance on these challenging tasks designed to approximate real-world noise and occlusion viewing conditions, but it also better accounted for human behavioral reaction times and error patterns than a standard feedforward Convolutional Neural Network. Our model suggests that a complete understanding of object perception and recognition requires integrating top-down attention feedback, which we propose is an object reconstruction.
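The recognition-by-reconstruction loop described above can be summarized in a few lines. The sketch below is a toy illustration, not the authors' implementation: it assumes a class-conditional autoencoder, scores every class hypothesis by reconstruction error, and reuses the winning reconstruction as a multiplicative attention template on the input before a second classification pass. All module and function names are made up for illustration.

```python
import torch
import torch.nn as nn

class ClassConditionalAE(nn.Module):
    """Toy autoencoder that reconstructs the input image under a class hypothesis."""
    def __init__(self, n_classes=10):
        super().__init__()
        self.encoder = nn.Sequential(nn.Conv2d(1, 16, 3, padding=1), nn.ReLU(),
                                     nn.Conv2d(16, 16, 3, padding=1), nn.ReLU())
        self.class_embed = nn.Embedding(n_classes, 16)
        self.decoder = nn.Sequential(nn.Conv2d(16, 16, 3, padding=1), nn.ReLU(),
                                     nn.Conv2d(16, 1, 3, padding=1), nn.Sigmoid())

    def forward(self, x, y):
        h = self.encoder(x)                            # (B, 16, H, W)
        c = self.class_embed(y)[:, :, None, None]      # condition on the hypothesis y
        return self.decoder(h + c)

def reconstruction_guided_recognition(x, autoencoder, classifier, n_classes=10):
    """Keep the class hypothesis whose reconstruction best explains the input,
    then bias a second feedforward pass with that reconstruction."""
    B = x.shape[0]
    errors, recons = [], []
    for k in range(n_classes):
        y = torch.full((B,), k, dtype=torch.long)
        r = autoencoder(x, y)
        errors.append(((r - x) ** 2).flatten(1).mean(dim=1))   # per-image error
        recons.append(r)
    errors = torch.stack(errors, dim=1)                        # (B, n_classes)
    best = errors.argmin(dim=1)                                # most plausible object
    best_recon = torch.stack(recons, dim=1)[torch.arange(B), best]
    attended = x * best_recon                                  # top-down gain on the input
    return classifier(attended), best

x = torch.rand(4, 1, 28, 28)                                   # stand-in MNIST-C batch
classifier = nn.Sequential(nn.Flatten(), nn.Linear(28 * 28, 10))
logits, hypothesis = reconstruction_guided_recognition(x, ClassConditionalAE(), classifier)
```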
Affiliation(s)
- Seoyoung Ahn
- Department of Molecular and Cell Biology, University of California, Berkeley, California, United States of America
- Hossein Adeli
- Zuckerman Mind Brain Behavior Institute, Columbia University, New York City, New York, United States of America
- Gregory J. Zelinsky
- Department of Psychology, Stony Brook University, Stony Brook, New York, United States of America
- Department of Computer Science, Stony Brook University, Stony Brook, New York, United States of America
2
Lande KJ. Compositionality in perception: A framework. Wiley Interdiscip Rev Cogn Sci 2024:e1691. [PMID: 38807187] [DOI: 10.1002/wcs.1691]
Abstract
Perception involves the processing of content or information about the world. In what form is this content represented? I argue that perception is widely compositional. The perceptual system represents many stimulus features (including shape, orientation, and motion) in terms of combinations of other features (such as shape parts, slant and tilt, common and residual motion vectors). But compositionality can take a variety of forms. The ways in which perceptual representations compose are markedly different from the ways in which sentences or thoughts are thought to be composed. I suggest that the thesis that perception is compositional is not itself a concrete hypothesis with specific predictions; rather it affords a productive framework for developing and evaluating specific empirical hypotheses about the form and content of perceptual representations. The question is not just whether perception is compositional, but how. Answering this latter question can provide fundamental insights into perception. This article is categorized under: Philosophy > Representation; Philosophy > Foundations of Cognitive Science; Psychology > Perception and Psychophysics.
Affiliation(s)
- Kevin J Lande
- Department of Philosophy and Centre for Vision Research, York University, Toronto, Canada
3
Morales-Torres R, Wing EA, Deng L, Davis SW, Cabeza R. Visual Recognition Memory of Scenes Is Driven by Categorical, Not Sensory, Visual Representations. J Neurosci 2024; 44:e1479232024. [PMID: 38569925] [PMCID: PMC11112637] [DOI: 10.1523/jneurosci.1479-23.2024]
Abstract
When we perceive a scene, our brain processes various types of visual information simultaneously, ranging from sensory features, such as line orientations and colors, to categorical features, such as objects and their arrangements. Whereas the role of sensory and categorical visual representations in predicting subsequent memory has been studied using isolated objects, their impact on memory for complex scenes remains largely unknown. To address this gap, we conducted an fMRI study in which female and male participants encoded pictures of familiar scenes (e.g., an airport picture) and later recalled them, while rating the vividness of their visual recall. Outside the scanner, participants had to distinguish each seen scene from three similar lures (e.g., three airport pictures). We modeled the sensory and categorical visual features of multiple scenes using both early and late layers of a deep convolutional neural network. Then, we applied representational similarity analysis to determine which brain regions represented stimuli in accordance with the sensory and categorical models. We found that categorical, but not sensory, representations predicted subsequent memory. In line with the previous result, only for the categorical model, the average recognition performance of each scene exhibited a positive correlation with the average visual dissimilarity between the item in question and its respective lures. These results strongly suggest that even in memory tests that ostensibly rely solely on visual cues (such as forced-choice visual recognition with similar distractors), memory decisions for scenes may be primarily influenced by categorical rather than sensory representations.
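The analysis pipeline described above (sensory and categorical model RDMs from early and late CNN layers, compared with brain data via representational similarity analysis) follows a standard recipe. Below is a minimal sketch with random stand-in data; the layer choices, correlation-distance RDMs, and Spearman comparison are common defaults, not necessarily the exact settings used in the paper.

```python
import numpy as np
from scipy.spatial.distance import pdist
from scipy.stats import spearmanr

def rdm(features):
    """Representational dissimilarity matrix (condensed): 1 - Pearson r
    between activation patterns for every pair of scenes."""
    return pdist(features, metric="correlation")

rng = np.random.default_rng(0)
n_scenes = 50
early_feats = rng.normal(size=(n_scenes, 4096))    # stand-in for early-layer activations
late_feats = rng.normal(size=(n_scenes, 1000))     # stand-in for late-layer activations
brain_patterns = rng.normal(size=(n_scenes, 200))  # stand-in for voxel patterns in one ROI

rdm_sensory = rdm(early_feats)       # sensory (early-layer) model
rdm_categorical = rdm(late_feats)    # categorical (late-layer) model
rdm_brain = rdm(brain_patterns)

# Which model RDM better matches the ROI's representational geometry?
rho_sensory, _ = spearmanr(rdm_sensory, rdm_brain)
rho_categorical, _ = spearmanr(rdm_categorical, rdm_brain)
print(f"sensory model fit: {rho_sensory:.3f}, categorical model fit: {rho_categorical:.3f}")
```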
Affiliation(s)
- Erik A Wing
- Rotman Research Institute, Baycrest Health Sciences, Toronto, Ontario M6A 2E1, Canada
- Lifu Deng
- Department of Psychology & Neuroscience, Duke University, Durham, North Carolina 27708
- Simon W Davis
- Department of Psychology & Neuroscience, Duke University, Durham, North Carolina 27708
- Department of Neurology, Duke University School of Medicine, Durham, North Carolina 27708
- Roberto Cabeza
- Department of Psychology & Neuroscience, Duke University, Durham, North Carolina 27708
4
Franken TP, Reynolds JH. Grouping cells in primate visual cortex. bioRxiv [Preprint] 2024:2024.01.16.575953. [PMID: 38293172] [PMCID: PMC10827172] [DOI: 10.1101/2024.01.16.575953]
Abstract
Our perception of how objects are laid out in visual scenes is remarkably stable, despite rapid shifts in the patterns of light that fall on the retina with each saccade. One mechanism that may help establish perceptual stability is border ownership assignment. Studies in macaque area V2 have identified border ownership neurons that signal which side of a border belongs to a foreground surface. This signal persists for hundreds of milliseconds after border ownership has been rendered ambiguous by deleting the stimulus features that distinguish foreground from background. Remarkably, this signal survives eye movements: border ownership neurons also exhibit border ownership signals de novo when an eye movement places the newly ambiguous border within their receptive field. The grouping cell hypothesis proposes the existence of grouping cells in a downstream brain area. These cells would compute persistent proto-object representations and therefore have the properties to endow cells in upstream brain areas with selectivity for border ownership. Such grouping cells have been predicted to show a centripetal and persistent pattern of preferred side of ownership for a border placed parallel to the perimeter of their classical receptive field, and such a centripetal ownership preference pattern should also occur de novo in these same cells if an ambiguous border lands in their receptive field after a saccade. It is unknown whether grouping cells exist. Here we used laminar multielectrodes in area V4 (the main source of feedback to V2) of behaving macaques to determine whether such grouping cells exist. Consistent with the model prediction, we find a substantial population of neurons with these properties, in all laminar compartments, and they exhibit a response latency that is short enough to act as the source that endows neurons in V2 with selectivity for border ownership. While grouping cell activity provides information about the location of foreground surfaces, these neurons are, counterintuitively, not as strongly tuned for luminance contrast polarity, a feature of those surfaces, as are border ownership cells. Our data suggest a division of labor in which these newly discovered grouping cells provide spatiotemporal continuity of segmented surfaces whereas border ownership cells link this location information with surface features such as luminance contrast.
Affiliation(s)
- Tom P. Franken
- Systems Neurobiology Laboratory, The Salk Institute for Biological Studies, San Diego, California, USA
- Department of Neuroscience, Washington University School of Medicine, St. Louis, Missouri, USA
- Lead contact
- John H. Reynolds
- Systems Neurobiology Laboratory, The Salk Institute for Biological Studies, San Diego, California, USA
5
Golan T, Taylor J, Schütt H, Peters B, Sommers RP, Seeliger K, Doerig A, Linton P, Konkle T, van Gerven M, Kording K, Richards B, Kietzmann TC, Lindsay GW, Kriegeskorte N. Deep neural networks are not a single hypothesis but a language for expressing computational hypotheses. Behav Brain Sci 2023; 46:e392. [PMID: 38054329] [DOI: 10.1017/s0140525x23001553]
Abstract
An ideal vision model accounts for behavior and neurophysiology in both naturalistic conditions and designed lab experiments. Unlike psychological theories, artificial neural networks (ANNs) actually perform visual tasks and generate testable predictions for arbitrary inputs. These advantages enable ANNs to engage the entire spectrum of the evidence. Failures of particular models drive progress in a vibrant ANN research program of human vision.
Affiliation(s)
- Tal Golan
- Department of Cognitive and Brain Sciences, Ben-Gurion University of the Negev, Be'er Sheva, Israel
- JohnMark Taylor
- Zuckerman Mind Brain Behavior Institute, Columbia University, New York, NY, USA
- Heiko Schütt
- Zuckerman Mind Brain Behavior Institute, Columbia University, New York, NY, USA
- Center for Neural Science, New York University, New York, NY, USA
- Benjamin Peters
- School of Psychology & Neuroscience, University of Glasgow, Glasgow, UK
- Rowan P Sommers
- Department of Neurobiology of Language, Max Planck Institute for Psycholinguistics, Nijmegen, The Netherlands
- Katja Seeliger
- Max Planck Institute for Human Cognitive and Brain Sciences, Leipzig, Germany
- Adrien Doerig
- Institute of Cognitive Science, University of Osnabrück, Osnabrück, Germany
- Paul Linton
- Zuckerman Mind Brain Behavior Institute, Columbia University, New York, NY, USA. https://linton.vision/
- Presidential Scholars in Society and Neuroscience, Center for Science and Society, Columbia University, New York, NY, USA
- Italian Academy for Advanced Studies in America, Columbia University, New York, NY, USA
- Talia Konkle
- Department of Psychology and Center for Brain Sciences, Harvard University, Cambridge, MA, USA. https://konklab.fas.harvard.edu/
- Marcel van Gerven
- Donders Institute for Brain, Cognition and Behaviour, Nijmegen, The Netherlands. artcogsys.com
- Konrad Kording
- Departments of Bioengineering and Neuroscience, University of Pennsylvania, Philadelphia, PA, USA
- Learning in Machines and Brains Program, CIFAR, Toronto, ON, Canada
- Blake Richards
- Learning in Machines and Brains Program, CIFAR, Toronto, ON, Canada
- Mila, Montreal, QC, Canada
- School of Computer Science, McGill University, Montreal, QC, Canada
- Department of Neurology & Neurosurgery, McGill University, Montreal, QC, Canada
- Montreal Neurological Institute, Montreal, QC, Canada
- Tim C Kietzmann
- Institute of Cognitive Science, University of Osnabrück, Osnabrück, Germany
- Grace W Lindsay
- Department of Psychology and Center for Data Science, New York University, New York, NY, USA
- Nikolaus Kriegeskorte
- Zuckerman Mind Brain Behavior Institute, Columbia University, New York, NY, USA
- Departments of Psychology, Neuroscience, and Electrical Engineering, Columbia University, New York, NY, USA
6
Li AY, Mur M. Neural networks need real-world behavior. Behav Brain Sci 2023; 46:e398. [PMID: 38054287] [DOI: 10.1017/s0140525x23001504]
Abstract
Bowers et al. propose to use controlled behavioral experiments when evaluating deep neural networks as models of biological vision. We agree with the sentiment and draw parallels to the notion that "neuroscience needs behavior." As a promising path forward, we suggest complementing image recognition tasks with increasingly realistic and well-controlled task environments that engage real-world object recognition behavior.
Affiliation(s)
- Aedan Y Li
- Department of Psychology, Western University, London, ON, Canada. www.aedanyueli.com
- Marieke Mur
- Department of Psychology, Western University, London, ON, Canada
- Department of Computer Science, Western University, London, ON, Canada
7
Zhao H, Zhang Y, Han L, Qian W, Wang J, Wu H, Li J, Dai Y, Zhang Z, Bowen CR, Yang Y. Intelligent Recognition Using Ultralight Multifunctional Nano-Layered Carbon Aerogel Sensors with Human-Like Tactile Perception. Nano-Micro Lett 2023; 16:11. [PMID: 37943399] [PMCID: PMC10635924] [DOI: 10.1007/s40820-023-01216-0]
Abstract
Humans can perceive our complex world through multi-sensory fusion. Under limited visual conditions, people can sense a variety of tactile signals to identify objects accurately and rapidly. However, replicating this unique capability in robots remains a significant challenge. Here, we present a new form of ultralight multifunctional tactile nano-layered carbon aerogel sensor that provides pressure, temperature, material recognition and 3D location capabilities, which is combined with multimodal supervised learning algorithms for object recognition. The sensor exhibits human-like pressure (0.04-100 kPa) and temperature (21.5-66.2 °C) detection, millisecond response times (11 ms), a pressure sensitivity of 92.22 kPa⁻¹ and triboelectric durability of over 6000 cycles. The devised algorithm is general and can accommodate a range of application scenarios. The tactile system can identify common foods in a kitchen scene with 94.63% accuracy and explore the topographic and geomorphic features of a Mars scene with 100% accuracy. This sensing approach empowers robots with versatile tactile perception to advance future society toward heightened sensing, recognition and intelligence.
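The abstract does not detail the multimodal supervised learning algorithm, so the following is only a generic early-fusion sketch: per-touch pressure, temperature, and triboelectric features are concatenated and fed to an off-the-shelf classifier. The feature dimensions, random stand-in data, and the choice of a random forest are assumptions for illustration, not the authors' pipeline.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

rng = np.random.default_rng(0)
n_samples, n_objects = 600, 8

# Stand-ins for per-touch features from the aerogel sensor:
pressure = rng.normal(size=(n_samples, 16))       # pressure time-series summary
temperature = rng.normal(size=(n_samples, 4))     # temperature readings
triboelectric = rng.normal(size=(n_samples, 16))  # material-dependent output
labels = rng.integers(0, n_objects, size=n_samples)

# Early fusion: concatenate modalities into one feature vector per touch.
X = np.concatenate([pressure, temperature, triboelectric], axis=1)
X_tr, X_te, y_tr, y_te = train_test_split(X, labels, test_size=0.25, random_state=0)

clf = RandomForestClassifier(n_estimators=200, random_state=0).fit(X_tr, y_tr)
print("held-out accuracy:", accuracy_score(y_te, clf.predict(X_te)))
```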
Affiliation(s)
- Huiqi Zhao
- CAS Center for Excellence in Nanoscience, Beijing Key Laboratory of Micro-Nano Energy and Sensor, Beijing Institute of Nanoenergy and Nanosystems, Chinese Academy of Sciences, Beijing, 101400, People's Republic of China
- School of Nanoscience and Technology, University of Chinese Academy of Sciences, Beijing, 100049, People's Republic of China
- Yizheng Zhang
- Tencent Robotics X, Shenzhen, 518054, People's Republic of China
- Lei Han
- Tencent Robotics X, Shenzhen, 518054, People's Republic of China
- Weiqi Qian
- CAS Center for Excellence in Nanoscience, Beijing Key Laboratory of Micro-Nano Energy and Sensor, Beijing Institute of Nanoenergy and Nanosystems, Chinese Academy of Sciences, Beijing, 101400, People's Republic of China
- School of Nanoscience and Technology, University of Chinese Academy of Sciences, Beijing, 100049, People's Republic of China
- Jiabin Wang
- CAS Center for Excellence in Nanoscience, Beijing Key Laboratory of Micro-Nano Energy and Sensor, Beijing Institute of Nanoenergy and Nanosystems, Chinese Academy of Sciences, Beijing, 101400, People's Republic of China
- Center on Nanoenergy Research, School of Physical Science and Technology, Guangxi University, Nanning, 530004, People's Republic of China
- Heting Wu
- CAS Center for Excellence in Nanoscience, Beijing Key Laboratory of Micro-Nano Energy and Sensor, Beijing Institute of Nanoenergy and Nanosystems, Chinese Academy of Sciences, Beijing, 101400, People's Republic of China
- Jingchen Li
- Tencent Robotics X, Shenzhen, 518054, People's Republic of China
- Yuan Dai
- Tencent Robotics X, Shenzhen, 518054, People's Republic of China.
- Zhengyou Zhang
- Tencent Robotics X, Shenzhen, 518054, People's Republic of China
- Chris R Bowen
- Department of Mechanical Engineering, University of Bath, Bath, BA2 7AK, UK
- Ya Yang
- CAS Center for Excellence in Nanoscience, Beijing Key Laboratory of Micro-Nano Energy and Sensor, Beijing Institute of Nanoenergy and Nanosystems, Chinese Academy of Sciences, Beijing, 101400, People's Republic of China.
- School of Nanoscience and Technology, University of Chinese Academy of Sciences, Beijing, 100049, People's Republic of China.
- Center on Nanoenergy Research, School of Physical Science and Technology, Guangxi University, Nanning, 530004, People's Republic of China.
8
Wichmann FA, Geirhos R. Are Deep Neural Networks Adequate Behavioral Models of Human Visual Perception? Annu Rev Vis Sci 2023; 9.
Abstract
Deep neural networks (DNNs) are machine learning algorithms that have revolutionized computer vision due to their remarkable successes in tasks like object classification and segmentation. The success of DNNs as computer vision algorithms has led to the suggestion that DNNs may also be good models of human visual perception. In this article, we review evidence regarding current DNNs as adequate behavioral models of human core object recognition. To this end, we argue that it is important to distinguish between statistical tools and computational models and to understand model quality as a multidimensional concept in which clarity about modeling goals is key. Reviewing a large number of psychophysical and computational explorations of core object recognition performance in humans and DNNs, we argue that DNNs are highly valuable scientific tools but that, as of today, DNNs should only be regarded as promising, but not yet adequate, computational models of human core object recognition behavior. On the way, we dispel several myths surrounding DNNs in vision science.
Affiliation(s)
- Felix A Wichmann
- Neural Information Processing Group, University of Tübingen, Tübingen, Germany
9
Brooks JA, Tzirakis P, Baird A, Kim L, Opara M, Fang X, Keltner D, Monroy M, Corona R, Metrick J, Cowen AS. Deep learning reveals what vocal bursts express in different cultures. Nat Hum Behav 2023; 7:240-250. [PMID: 36577898] [DOI: 10.1038/s41562-022-01489-2]
Abstract
Human social life is rich with sighs, chuckles, shrieks and other emotional vocalizations, called 'vocal bursts'. Nevertheless, the meaning of vocal bursts across cultures is only beginning to be understood. Here, we combined large-scale experimental data collection with deep learning to reveal the shared and culture-specific meanings of vocal bursts. A total of n = 4,031 participants in China, India, South Africa, the USA and Venezuela mimicked vocal bursts drawn from 2,756 seed recordings. Participants also judged the emotional meaning of each vocal burst. A deep neural network tasked with predicting the culture-specific meanings people attributed to vocal bursts while disregarding context and speaker identity discovered 24 acoustic dimensions, or kinds, of vocal expression with distinct emotion-related meanings. The meanings attributed to these complex vocal modulations were 79% preserved across the five countries and three languages. These results reveal the underlying dimensions of human emotional vocalization in remarkable detail.
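The 79% preservation figure comes from the paper's own modeling. As a rough illustration only, cross-cultural preservation of judgment profiles can be approximated by correlating each vocal burst's mean emotion ratings between countries and averaging over country pairs; all data and the metric below are stand-ins, not the authors' analysis.

```python
import numpy as np
from itertools import combinations

rng = np.random.default_rng(0)
countries = ["China", "India", "South Africa", "USA", "Venezuela"]
n_bursts, n_emotions = 300, 24   # 24 emotion-related dimensions, as in the abstract

# Stand-in: mean emotion judgments per vocal burst, per country.
ratings = {c: rng.random(size=(n_bursts, n_emotions)) for c in countries}

def profile_correlation(a, b):
    """Correlate the two countries' judgment profiles, burst by burst,
    and return the mean correlation as a crude preservation score."""
    cors = [np.corrcoef(a[i], b[i])[0, 1] for i in range(a.shape[0])]
    return float(np.mean(cors))

scores = [profile_correlation(ratings[c1], ratings[c2])
          for c1, c2 in combinations(countries, 2)]
print(f"mean cross-country preservation: {np.mean(scores):.2f}")
```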
Affiliation(s)
- Jeffrey A Brooks
- Research Division, Hume AI, New York, NY, USA
- University of California, Berkeley, Berkeley, CA, USA
- Alice Baird
- Research Division, Hume AI, New York, NY, USA
- Lauren Kim
- Research Division, Hume AI, New York, NY, USA
- Xia Fang
- Zhejiang University, Hangzhou, China
- Dacher Keltner
- Research Division, Hume AI, New York, NY, USA
- University of California, Berkeley, Berkeley, CA, USA
- Maria Monroy
- University of California, Berkeley, Berkeley, CA, USA
- Alan S Cowen
- Research Division, Hume AI, New York, NY, USA
- University of California, Berkeley, Berkeley, CA, USA
10
Moore JA, Tuladhar A, Ismail Z, Mouches P, Wilms M, Forkert ND. Dementia in Convolutional Neural Networks: Using Deep Learning Models to Simulate Neurodegeneration of the Visual System. Neuroinformatics 2023; 21:45-55. [PMID: 36083416] [DOI: 10.1007/s12021-022-09602-6]
Abstract
Although current research aims to improve deep learning networks by applying knowledge about the healthy human brain and vice versa, the potential of using such networks to model and study neurodegenerative diseases remains largely unexplored. In this work, we present an in-depth feasibility study modeling progressive dementia in silico with deep convolutional neural networks. To this end, networks were trained to perform visual object recognition and then progressively injured by applying neuronal as well as synaptic injury. After each iteration of injury, we evaluated network object recognition accuracy, saliency map similarity between the intact and injured networks, and internal activations of the degenerating models. The evaluation revealed that cognitive function of the network progressively decreased with increasing injury load, an effect that was much more pronounced for synaptic damage. The effects of neurodegeneration found for the in silico model are especially similar to the loss of visual cognition seen in patients with posterior cortical atrophy.
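The two injury types contrasted above map naturally onto weight-level operations in a trained network. The sketch below is a toy PyTorch example, not the authors' code: synaptic injury zeroes a random fraction of individual weights, while neuronal injury zeroes entire output units; the injury fractions and the placeholder model are illustrative.

```python
import torch
import torch.nn as nn

def synaptic_injury(model, fraction=0.2):
    """Zero a random fraction of individual weights (synapses) in place."""
    with torch.no_grad():
        for p in model.parameters():
            if p.dim() > 1:                                   # skip biases
                mask = (torch.rand_like(p) >= fraction).float()
                p.mul_(mask)

def neuronal_injury(model, fraction=0.2):
    """Zero entire output units (all incoming weights of a 'neuron') in place."""
    with torch.no_grad():
        for p in model.parameters():
            if p.dim() > 1:
                keep = (torch.rand(p.shape[0]) >= fraction).float()
                p.mul_(keep.view(-1, *([1] * (p.dim() - 1))))

# Placeholder network standing in for a trained object-recognition model.
model = nn.Sequential(nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(),
                      nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(16, 10))
synaptic_injury(model, fraction=0.3)   # or: neuronal_injury(model, fraction=0.3)
# Recognition accuracy and saliency similarity would then be re-evaluated (not shown).
```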
Affiliation(s)
- Jasmine A Moore
- Department of Radiology, University of Calgary, Calgary, AB, Canada.
- Hotchkiss Brain Institute, University of Calgary, Calgary, AB, Canada.
- Biomedical Engineering Program, University of Calgary, Calgary, AB, Canada.
- Anup Tuladhar
- Department of Radiology, University of Calgary, Calgary, AB, Canada
- Hotchkiss Brain Institute, University of Calgary, Calgary, AB, Canada
- Zahinoor Ismail
- Hotchkiss Brain Institute, University of Calgary, Calgary, AB, Canada
- Department of Clinical Neurosciences, University of Calgary, Calgary, AB, Canada
- Department of Community Health Sciences, University of Calgary, Calgary, AB, Canada
- Department of Psychiatry, University of Calgary, Calgary, AB, Canada
- O'Brien Institute for Public Health, University of Calgary, Calgary, AB, Canada
- Pauline Mouches
- Department of Radiology, University of Calgary, Calgary, AB, Canada
- Hotchkiss Brain Institute, University of Calgary, Calgary, AB, Canada
- Biomedical Engineering Program, University of Calgary, Calgary, AB, Canada
- Matthias Wilms
- Department of Radiology, University of Calgary, Calgary, AB, Canada
- Hotchkiss Brain Institute, University of Calgary, Calgary, AB, Canada
- Alberta Children's Hospital Research Institute, University of Calgary, Calgary, AB, Canada
- Nils D Forkert
- Department of Radiology, University of Calgary, Calgary, AB, Canada
- Hotchkiss Brain Institute, University of Calgary, Calgary, AB, Canada
- Department of Clinical Neurosciences, University of Calgary, Calgary, AB, Canada
- Alberta Children's Hospital Research Institute, University of Calgary, Calgary, AB, Canada
- Department of Electrical and Software Engineering, University of Calgary, Calgary, AB, Canada
11
Quilty-Dunn J, Porot N, Mandelbaum E. The best game in town: The reemergence of the language-of-thought hypothesis across the cognitive sciences. Behav Brain Sci 2022; 46:e261. [PMID: 36471543] [DOI: 10.1017/s0140525x22002849]
Abstract
Mental representations remain the central posits of psychology after many decades of scrutiny. However, there is no consensus about the representational format(s) of biological cognition. This paper provides a survey of evidence from computational cognitive psychology, perceptual psychology, developmental psychology, comparative psychology, and social psychology, and concludes that one type of format that routinely crops up is the language-of-thought (LoT). We outline six core properties of LoTs: (i) discrete constituents; (ii) role-filler independence; (iii) predicate-argument structure; (iv) logical operators; (v) inferential promiscuity; and (vi) abstract content. These properties cluster together throughout cognitive science. Bayesian computational modeling, compositional features of object perception, complex infant and animal reasoning, and automatic, intuitive cognition in adults all implicate LoT-like structures. Instead of regarding LoT as a relic of the previous century, researchers in cognitive science and philosophy of mind must take seriously the explanatory breadth of LoT-based architectures. We grant that the mind may harbor many formats and architectures, including iconic and associative structures as well as deep-neural-network-like architectures. However, as computational/representational approaches to the mind continue to advance, classical compositional symbolic structures (that is, LoTs) only prove more flexible and well-supported over time.
Affiliation(s)
- Jake Quilty-Dunn
- Department of Philosophy and Philosophy-Neuroscience-Psychology Program, Washington University in St. Louis, St. Louis, MO, USA. sites.google.com/site/jakequiltydunn/
- Nicolas Porot
- Africa Institute for Research in Economics and Social Sciences, Mohammed VI Polytechnic University, Rabat, Morocco. nicolasporot.com
- Eric Mandelbaum
- Departments of Philosophy and Psychology, The Graduate Center & Baruch College, CUNY, New York, NY, USA. ericmandelbaum.com
12
Bowers JS, Malhotra G, Dujmović M, Llera Montero M, Tsvetkov C, Biscione V, Puebla G, Adolfi F, Hummel JE, Heaton RF, Evans BD, Mitchell J, Blything R. Deep problems with neural network models of human vision. Behav Brain Sci 2022; 46:e385. [PMID: 36453586] [DOI: 10.1017/s0140525x22002813]
Abstract
Deep neural networks (DNNs) have had extraordinary successes in classifying photographic images of objects and are often described as the best models of biological vision. This conclusion is largely based on three sets of findings: (1) DNNs are more accurate than any other model in classifying images taken from various datasets, (2) DNNs do the best job in predicting the pattern of human errors in classifying objects taken from various behavioral datasets, and (3) DNNs do the best job in predicting brain signals in response to images taken from various brain datasets (e.g., single cell responses or fMRI data). However, these behavioral and brain datasets do not test hypotheses regarding what features are contributing to good predictions and we show that the predictions may be mediated by DNNs that share little overlap with biological vision. More problematically, we show that DNNs account for almost no results from psychological research. This contradicts the common claim that DNNs are good, let alone the best, models of human object recognition. We argue that theorists interested in developing biologically plausible models of human vision need to direct their attention to explaining psychological findings. More generally, theorists need to build models that explain the results of experiments that manipulate independent variables designed to test hypotheses rather than compete on making the best predictions. We conclude by briefly summarizing various promising modeling approaches that focus on psychological data.
Affiliation(s)
- Jeffrey S Bowers
- School of Psychological Science, University of Bristol, Bristol, UK. https://jeffbowers.blogs.bristol.ac.uk/
- Gaurav Malhotra
- School of Psychological Science, University of Bristol, Bristol, UK
- Marin Dujmović
- School of Psychological Science, University of Bristol, Bristol, UK
- Milton Llera Montero
- School of Psychological Science, University of Bristol, Bristol, UK
- Christian Tsvetkov
- School of Psychological Science, University of Bristol, Bristol, UK
- Valerio Biscione
- School of Psychological Science, University of Bristol, Bristol, UK
- Guillermo Puebla
- School of Psychological Science, University of Bristol, Bristol, UK
- Federico Adolfi
- School of Psychological Science, University of Bristol, Bristol, UK
- Ernst Strüngmann Institute (ESI) for Neuroscience in Cooperation with Max Planck Society, Frankfurt am Main, Germany
- John E Hummel
- Department of Psychology, University of Illinois Urbana-Champaign, Champaign, IL, USA
- Rachel F Heaton
- Department of Psychology, University of Illinois Urbana-Champaign, Champaign, IL, USA
- Benjamin D Evans
- Department of Informatics, School of Engineering and Informatics, University of Sussex, Brighton, UK
- Jeffrey Mitchell
- Department of Informatics, School of Engineering and Informatics, University of Sussex, Brighton, UK
- Ryan Blything
- School of Psychology, Aston University, Birmingham, UK
13
Arun SP. Trailblazers in Neuroscience: Using compositionality to understand how parts combine in whole objects. Eur J Neurosci 2022; 56:4378-4392. [PMID: 35760552] [PMCID: PMC10084036] [DOI: 10.1111/ejn.15746]
Abstract
A fundamental question for any visual system is whether its image representation can be understood in terms of its components. Decomposing any image into components is challenging because there are many possible decompositions with no common dictionary, and enumerating them leads to a combinatorial explosion. Even in perception, many objects are readily seen as containing parts, but there are many exceptions. These exceptions include objects that are not perceived as containing parts, properties like symmetry that cannot be localized to any single part, and also special categories like words and faces whose perception is widely believed to be holistic. Here, I describe a novel approach we have used to address these issues and evaluate compositionality at the behavioral and neural levels. The key design principle is to create a large number of objects by combining a small number of pre-defined components in all possible ways. This allows for building component-based models that explain whole objects using a combination of these components. Importantly, any systematic error in model fits can be used to detect the presence of emergent or holistic properties. Using this approach, we have found that whole object representations are surprisingly predictable from their components, that some components are preferred to others in perception, and that emergent properties can be discovered or explained using compositional models. Thus, compositionality is a powerful approach for understanding how whole objects relate to their parts.
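The component-based modeling logic described above can be made concrete with a small worked example: build all two-part objects from a fixed part dictionary, predict each whole-object response as a sum of part contributions fit by least squares, and inspect residuals for systematic structure that flags emergent properties (here, a synthetic bonus for same-part objects). The encoding and data below are illustrative assumptions, not the paper's stimuli or fitting procedure.

```python
import numpy as np

rng = np.random.default_rng(0)
n_parts = 7

# Every object is a pair of parts; build all possible two-part objects.
objects = [(i, j) for i in range(n_parts) for j in range(n_parts)]

# Design matrix: one indicator column per (part, position) combination.
X = np.zeros((len(objects), 2 * n_parts))
for row, (left, right) in enumerate(objects):
    X[row, left] = 1              # part in the left position
    X[row, n_parts + right] = 1   # part in the right position

# Stand-in responses: part sums plus a small emergent term whenever the
# two parts are identical (a toy analogue of a holistic property like symmetry).
part_weights = rng.normal(size=2 * n_parts)
y = X @ part_weights + 0.8 * np.array([left == right for left, right in objects])

# Fit the purely compositional model and inspect where it fails.
coef, *_ = np.linalg.lstsq(X, y, rcond=None)
residuals = y - X @ coef
worst = np.argsort(-np.abs(residuals))[:5]
for idx in worst:
    print(objects[idx], round(residuals[idx], 3))   # same-part objects stand out
```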
Affiliation(s)
- SP Arun
- Centre for Neuroscience, Indian Institute of Science, Bangalore, India
14
Motor-related signals support localization invariance for stable visual perception. PLoS Comput Biol 2022; 18:e1009928. [PMID: 35286305] [PMCID: PMC8947590] [DOI: 10.1371/journal.pcbi.1009928]
Abstract
Our ability to perceive a stable visual world in the presence of continuous movements of the body, head, and eyes has puzzled researchers in the neuroscience field for a long time. We reformulated this problem in the context of hierarchical convolutional neural networks (CNNs)—whose architectures have been inspired by the hierarchical signal processing of the mammalian visual system—and examined perceptual stability as an optimization process that identifies image-defining features for accurate image classification in the presence of movements. Movement signals, multiplexed with visual inputs along overlapping convolutional layers, aided classification invariance of shifted images by making the classification faster to learn and more robust relative to input noise. Classification invariance was reflected in activity manifolds associated with image categories emerging in late CNN layers and with network units acquiring movement-associated activity modulations as observed experimentally during saccadic eye movements. Our findings provide a computational framework that unifies a multitude of biological observations on perceptual stability under optimality principles for image classification in artificial neural networks. Stable visual perception during eye and body movements suggests neural algorithms that convert location information—"where” type of signals—across multiple frames of reference, for instance, from retinocentric to craniocentric coordinates. Accordingly, numerous theoretical studies have proposed biologically plausible computational processes to achieve such transformations. However, how coordinate transformations can then be used by the hierarchy of cortical visual areas to produce stable perception remains largely unknown. Here, we explore the hypothesis that perception equates to the activity states of networks trained to classify “features” (e.g., objects, salient components) in the visual scene, and perceptual stability equates to robust classification of these features relative to self-generated movements, that is, a “what” type of information processing. We demonstrate in CNNs that neural signals related to eye and body movements support accurate image classification by making “where” type of computations—localization invariances—faster to learn and more robust relative to input perturbations. Therefore, by equating perception to the activity states of classifier networks, we provide a simple unifying mechanistic framework to explain the role movement signals in support of stable perception in dynamic interactions with the environment.
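One simple way to multiplex movement signals with visual input, in the spirit of the model described above, is to broadcast the movement vector as extra feature maps and concatenate it at an intermediate convolutional stage. The toy PyTorch module below is an assumed simplification, not the paper's architecture; the layer sizes and the 2-D displacement signal are placeholders.

```python
import torch
import torch.nn as nn

class MovementGatedCNN(nn.Module):
    """Toy CNN whose mid-level features receive a copy of the movement signal
    (e.g., the eye displacement that shifted the image)."""
    def __init__(self, n_classes=10, move_dim=2):
        super().__init__()
        self.conv1 = nn.Sequential(nn.Conv2d(1, 16, 3, padding=1), nn.ReLU())
        self.conv2 = nn.Sequential(nn.Conv2d(16 + move_dim, 32, 3, padding=1), nn.ReLU())
        self.head = nn.Sequential(nn.AdaptiveAvgPool2d(1), nn.Flatten(),
                                  nn.Linear(32, n_classes))

    def forward(self, image, movement):
        h = self.conv1(image)                                    # (B, 16, H, W)
        B, _, H, W = h.shape
        m = movement.view(B, -1, 1, 1).expand(B, movement.shape[1], H, W)
        return self.head(self.conv2(torch.cat([h, m], dim=1)))   # multiplexed features

model = MovementGatedCNN()
logits = model(torch.rand(8, 1, 28, 28), torch.randn(8, 2))      # 2-D displacement signal
```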
15
Tuladhar A, Moore JA, Ismail Z, Forkert ND. Modeling Neurodegeneration in silico With Deep Learning. Front Neuroinform 2021; 15:748370. [PMID: 34867256] [PMCID: PMC8640525] [DOI: 10.3389/fninf.2021.748370]
Abstract
Deep neural networks, inspired by information processing in the brain, can achieve human-like performance for various tasks. However, research efforts to use these networks as models of the brain have primarily focused on modeling healthy brain function so far. In this work, we propose a paradigm for modeling neural diseases in silico with deep learning and demonstrate its use in modeling posterior cortical atrophy (PCA), an atypical form of Alzheimer’s disease affecting the visual cortex. We simulated PCA in deep convolutional neural networks (DCNNs) trained for visual object recognition by randomly injuring connections between artificial neurons. Results showed that injured networks progressively lost their object recognition capability. Simulated PCA impacted learned representations hierarchically, as networks lost object-level representations before category-level representations. Incorporating this paradigm in computational neuroscience will be essential for developing in silico models of the brain and neurological diseases. The paradigm can be expanded to incorporate elements of neural plasticity and to other cognitive domains such as motor control, auditory cognition, language processing, and decision making.
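The progressive in silico injury protocol described above amounts to repeatedly deleting a fraction of the surviving connections and re-evaluating recognition after each step. Below is a minimal sketch; evaluate_accuracy, the data loader, and the injury schedule are placeholders rather than the authors' settings.

```python
import torch
import torch.nn as nn

def progressive_injury(model, loader, evaluate_accuracy, step_fraction=0.05, n_steps=10):
    """Randomly and cumulatively zero connections, tracking accuracy per step."""
    history = []
    with torch.no_grad():
        masks = {name: torch.ones_like(p) for name, p in model.named_parameters()
                 if p.dim() > 1}                      # one mask per weight matrix
        for step in range(n_steps):
            for name, p in model.named_parameters():
                if name in masks:
                    # Injure a fresh fraction of the connections that still survive.
                    hit = (torch.rand_like(p) < step_fraction) & (masks[name] > 0)
                    masks[name][hit] = 0.0
                    p.mul_(masks[name])
            history.append(evaluate_accuracy(model, loader))   # placeholder evaluation
    return history
```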
Affiliation(s)
- Anup Tuladhar
- Department of Radiology, University of Calgary, Calgary, AB, Canada
- Hotchkiss Brain Institute, University of Calgary, Calgary, AB, Canada
- Jasmine A Moore
- Department of Radiology, University of Calgary, Calgary, AB, Canada
- Hotchkiss Brain Institute, University of Calgary, Calgary, AB, Canada
- Biomedical Engineering Program, University of Calgary, Calgary, AB, Canada
- Zahinoor Ismail
- Hotchkiss Brain Institute, University of Calgary, Calgary, AB, Canada
- Department of Clinical Neurosciences, University of Calgary, Calgary, AB, Canada
- Department of Community Health Sciences, University of Calgary, Calgary, AB, Canada
- Department of Psychiatry, University of Calgary, Calgary, AB, Canada
- O'Brien Institute for Public Health, University of Calgary, Calgary, AB, Canada
- Nils D Forkert
- Department of Radiology, University of Calgary, Calgary, AB, Canada
- Hotchkiss Brain Institute, University of Calgary, Calgary, AB, Canada
- Department of Clinical Neurosciences, University of Calgary, Calgary, AB, Canada
- Alberta Children's Hospital Research Institute, University of Calgary, Calgary, AB, Canada