1. Jaffe PI, Santiago-Reyes GX, Schafer RJ, Bissett PG, Poldrack RA. An image-computable model of speeded decision-making. eLife 2025; 13:RP98351. [PMID: 40019474; PMCID: PMC11870652; DOI: 10.7554/elife.98351]
Abstract
Evidence accumulation models (EAMs) are the dominant framework for modeling response time (RT) data from speeded decision-making tasks. While providing a good quantitative description of RT data in terms of abstract perceptual representations, EAMs do not explain how the visual system extracts these representations in the first place. To address this limitation, we introduce the visual accumulator model (VAM), in which convolutional neural network models of visual processing and traditional EAMs are jointly fitted to trial-level RTs and raw (pixel-space) visual stimuli from individual subjects in a unified Bayesian framework. Models fitted to large-scale cognitive training data from a stylized flanker task captured individual differences in congruency effects, RTs, and accuracy. We find evidence that the selection of task-relevant information occurs through the orthogonalization of relevant and irrelevant representations, demonstrating how our framework can be used to relate visual representations to behavioral outputs. Together, our work provides a probabilistic framework for both constraining neural network models of vision with behavioral data and studying how the visual system extracts representations that guide decisions.
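As a concrete reference point for the EAM side of this framework, the sketch below simulates a standard two-choice drift-diffusion model, the simplest accumulator of the kind the VAM builds on. All parameter values are illustrative placeholders rather than fitted values from the paper; in the VAM itself, the accumulator inputs are produced by the CNN front end from the stimulus pixels.

```python
import numpy as np

def simulate_ddm(drift, boundary, ndt, n_trials=1000, dt=1e-3, noise=1.0, seed=0):
    """Simulate a two-choice drift-diffusion model; returns RTs (s) and choices."""
    rng = np.random.default_rng(seed)
    rts = np.empty(n_trials)
    choices = np.empty(n_trials, dtype=int)
    for i in range(n_trials):
        x, t = 0.0, 0.0
        while abs(x) < boundary:  # accumulate noisy evidence until a bound is reached
            x += drift * dt + noise * np.sqrt(dt) * rng.standard_normal()
            t += dt
        rts[i] = t + ndt          # add non-decision time (encoding + motor)
        choices[i] = int(x > 0)   # upper boundary coded as the correct response
    return rts, choices

# Illustrative parameters, not values estimated in the paper
rts, choices = simulate_ddm(drift=1.5, boundary=1.0, ndt=0.3)
print(f"mean RT: {rts.mean():.3f} s, accuracy: {choices.mean():.3f}")
```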
Affiliation(s)
- Paul I Jaffe
- Department of Psychology, Stanford University, Stanford, United States
2. Subramanian A, Price S, Kumbhar O, Sizikova E, Majaj NJ, Pelli DG. Benchmarking the speed-accuracy tradeoff in object recognition by humans and neural networks. J Vis 2025; 25:4. [PMID: 39752176; PMCID: PMC11706240; DOI: 10.1167/jov.25.1.4]
Abstract
Active object recognition, fundamental to tasks like reading and driving, relies on the ability to make time-sensitive decisions. People exhibit a flexible tradeoff between speed and accuracy, a crucial human skill. However, current computational models struggle to incorporate time. To address this gap, we present the first dataset (with 148 observers) exploring the speed-accuracy tradeoff (SAT) in ImageNet object recognition. Participants performed a 16-way ImageNet categorization task where their responses counted only if they occurred near the time of a fixed-delay beep. Each block of trials used a single beep delay and thus probed a single reaction time. As expected, human accuracy increases with reaction time. We compare human performance with that of dynamic neural networks that adapt their computation to the available inference time. Time is a scarce resource for human object recognition, and finding an appropriate analog in neural networks is challenging. Networks can repeat operations by using layers, recurrent cycles, or early exits. We use the repetition count as a network's analog for time. In our analysis, the number of layers, recurrent cycles, and early exits correlates strongly with floating-point operations, making them suitable time analogs. Comparing networks and humans on SAT-fit error, category-wise correlation, and SAT-curve steepness, we find cascaded dynamic neural networks most promising in modeling human speed and accuracy. Surprisingly, convolutional recurrent networks, typically favored in human object recognition modeling, perform the worst on our benchmark.
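Speed-accuracy tradeoff data of this kind are conventionally summarized with a shifted-exponential SAT curve: accuracy sits at chance below an intercept and rises toward an asymptote as reaction time grows. The sketch below fits that standard form with SciPy; the synthetic data points and the specific functional form are illustrative assumptions, not the paper's exact fitting procedure.

```python
import numpy as np
from scipy.optimize import curve_fit

def sat_curve(t, lam, beta, delta):
    """Shifted exponential: chance below delta, approaching asymptote lam at rate beta."""
    chance = 1.0 / 16.0  # 16-way categorization task
    return chance + (lam - chance) * (1.0 - np.exp(-beta * np.maximum(t - delta, 0.0)))

# Made-up (reaction time, accuracy) points standing in for one observer's blocks
t_obs = np.array([0.2, 0.4, 0.6, 0.9, 1.3, 1.8])
acc_obs = np.array([0.08, 0.35, 0.58, 0.74, 0.82, 0.85])

params, _ = curve_fit(sat_curve, t_obs, acc_obs, p0=[0.85, 3.0, 0.15])
print("asymptote, rate, intercept:", np.round(params, 3))
```

The three fitted parameters correspond to the quantities compared across humans and networks here: asymptotic accuracy, the steepness of the rise, and the time at which accuracy departs from chance.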
Affiliation(s)
- Ajay Subramanian
- Department of Psychology, New York University, New York, NY, USA
- Sara Price
- Center for Data Science, New York University, New York, NY, USA
- Omkar Kumbhar
- Computer Science Department, New York University, New York, NY, USA
- Elena Sizikova
- Center for Data Science, New York University, New York, NY, USA
- Najib J Majaj
- Center for Neural Science, New York University, New York, NY, USA
- Denis G Pelli
- Department of Psychology, New York University, New York, NY, USA
- Center for Neural Science, New York University, New York, NY, USA
3. Akbarinia A. Exploring the categorical nature of colour perception: Insights from artificial networks. Neural Netw 2025; 181:106758. [PMID: 39368278; DOI: 10.1016/j.neunet.2024.106758]
Abstract
The electromagnetic spectrum of light from a rainbow is a continuous signal, yet we perceive it vividly in several distinct colour categories. The origins and underlying mechanisms of this phenomenon remain partly unexplained. We investigate categorical colour perception in artificial neural networks (ANNs) using the odd-one-out paradigm. In the first experiment, we compared unimodal vision networks (e.g., ImageNet object recognition) to multimodal vision-language models (e.g., CLIP text-image matching). Our results show that vision networks predict a significant portion of human data (approximately 80%), while vision-language models account for the remaining unexplained data, even in non-linguistic experiments. These findings suggest that categorical colour perception is a language-independent representation, though it is partly shaped by linguistic colour terms during its development. In the second experiment, we explored how the visual task influences the colour categories of an ANN by examining twenty-four Taskonomy networks. Our results indicate that human-like colour categories are task-dependent, predominantly emerging in semantic and 3D tasks, with a notable absence in low-level tasks. To explain this difference, we analysed kernel responses before the winner-takes-all stage, observing that networks with mismatching colour categories may still align in underlying continuous representations. Our findings quantify the dual influence of visual signals and linguistic factors in categorical colour perception and demonstrate the task-dependent nature of this phenomenon, suggesting that categorical colour perception emerges to facilitate certain visual tasks.
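In embedding terms, the odd-one-out paradigm reduces to picking the item least similar to the other two. The sketch below does this over three placeholder vectors; in the actual experiments the embeddings would be a network's internal responses to colour stimuli, not raw RGB triplets.

```python
import numpy as np

def odd_one_out(embeddings):
    """Return the index of the item least (cosine-)similar to the other two."""
    e = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    sim = e @ e.T                   # pairwise cosine similarities
    scores = sim.sum(axis=1) - 1.0  # each item's similarity to the others
    return int(np.argmin(scores))

# Placeholder vectors: two reddish items and one greenish item
colours = np.array([[0.9, 0.1, 0.1],
                    [0.8, 0.2, 0.1],
                    [0.1, 0.9, 0.2]])
print(odd_one_out(colours))  # -> 2, the greenish item
```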
Affiliation(s)
- Arash Akbarinia
- Department of Experimental Psychology, University of Giessen, Germany
4. Mukherjee K, Rogers TT. Using drawings and deep neural networks to characterize the building blocks of human visual similarity. Mem Cognit 2025; 53:219-241. [PMID: 38814385; DOI: 10.3758/s13421-024-01580-1]
Abstract
Early in life and without special training, human beings discern resemblance between abstract visual stimuli, such as drawings, and the real-world objects they represent. We used this capacity for visual abstraction as a tool for evaluating deep neural networks (DNNs) as models of human visual perception. Contrasting five contemporary DNNs, we evaluated how well each explains human similarity judgments among line drawings of recognizable and novel objects. For object sketches, human judgments were dominated by semantic category information; DNN representations contributed little additional information. In contrast, DNN features explained significant unique variance in the perceived similarity of abstract drawings. In both cases, a vision transformer trained to blend representations of images and their natural language descriptions showed the greatest ability to explain human perceptual similarity-an observation consistent with contemporary views of semantic representation and processing in the human mind and brain. Together, the results suggest that the building blocks of visual similarity may arise within systems that learn to use visual information, not for specific classification, but in service of generating semantic representations of objects.
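A common way to quantify how much a DNN explains such judgments is to correlate model-derived pairwise similarities with human ratings over all item pairs. The sketch below uses random stand-in features and simulated ratings; the paper's analysis additionally partitions variance among several models and semantic predictors.

```python
import numpy as np
from scipy.stats import spearmanr

rng = np.random.default_rng(0)
n_items = 20
feats = rng.standard_normal((n_items, 128))  # stand-in DNN features per drawing

f = feats / np.linalg.norm(feats, axis=1, keepdims=True)
model_sim = f @ f.T                          # model similarity: cosine

# Stand-in human ratings: model similarity plus symmetric judgment noise
noise = rng.standard_normal((n_items, n_items))
human = model_sim + 0.3 * (noise + noise.T) / 2

iu = np.triu_indices(n_items, k=1)           # unique item pairs only
rho, _ = spearmanr(model_sim[iu], human[iu])
print(f"model-human correlation over pairs: rho={rho:.2f}")
```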
Affiliation(s)
- Kushin Mukherjee
- Department of Psychology & Wisconsin Institute for Discovery, University of Wisconsin-Madison, Madison, WI, USA
- Timothy T Rogers
- Department of Psychology & Wisconsin Institute for Discovery, University of Wisconsin-Madison, Madison, WI, USA
5. Duyck S, Costantino AI, Bracci S, Op de Beeck H. A computational deep learning investigation of animacy perception in the human brain. Commun Biol 2024; 7:1718. [PMID: 39741161; DOI: 10.1038/s42003-024-07415-8]
Abstract
The functional organization of the human object vision pathway distinguishes between animate and inanimate objects. To understand animacy perception, we explore the case of zoomorphic objects resembling animals. While the perception of these objects as animal-like seems obvious to humans, this "Animal bias" marks a striking discrepancy between the human brain and deep neural networks (DNNs), which do not show it. We computationally investigated the potential origins of this bias. We successfully induced this bias in DNNs trained explicitly with zoomorphic objects. Alternative training schedules failed to cause an Animal bias. We considered the superordinate distinction between animate and inanimate classes, the sensitivity for faces and bodies, the bias for shape over texture, the role of ecologically valid categories, recurrent connections, and language-informed visual processing. These findings provide computational support that the Animal bias for zoomorphic objects is a unique property of human perception yet can be explained by human learning history.
Affiliation(s)
- Stefanie Duyck
- Brain and Cognition, Faculty of Psychology and Educational Sciences, KU Leuven, Leuven, Belgium
- Andrea I Costantino
- Brain and Cognition, Faculty of Psychology and Educational Sciences, KU Leuven, Leuven, Belgium
- Stefania Bracci
- Center for Mind/Brain Sciences (CIMeC), University of Trento, Trento, Italy
- Hans Op de Beeck
- Brain and Cognition, Faculty of Psychology and Educational Sciences, KU Leuven, Leuven, Belgium
6. Jarvers C, Neumann H. Teaching deep networks to see shape: Lessons from a simplified visual world. PLoS Comput Biol 2024; 20:e1012019. [PMID: 39527647; PMCID: PMC11581402; DOI: 10.1371/journal.pcbi.1012019]
Abstract
Deep neural networks have been remarkably successful as models of the primate visual system. One crucial problem is that they fail to account for the strong shape-dependence of primate vision. Whereas humans base their judgements of category membership to a large extent on shape, deep networks rely much more strongly on other features such as color and texture. While this problem has been widely documented, the underlying reasons remain unclear. We design simple, artificial image datasets in which shape, color, and texture features can be used to predict the image class. By training networks from scratch to classify images with single features and feature combinations, we show that some network architectures are unable to learn to use shape features, whereas others are able to use shape in principle but are biased towards the other features. We show that the bias can be explained by the interactions between the weight updates for many images in mini-batch gradient descent. This suggests that different learning algorithms with sparser, more local weight changes are required to make networks more sensitive to shape and improve their capability to describe human vision.
Affiliation(s)
- Christian Jarvers
- Institute for Neural Information Processing, Ulm University, Ulm, Germany
- Heiko Neumann
- Institute for Neural Information Processing, Ulm University, Ulm, Germany
7. Shekhar M, Rahnev D. Human-like dissociations between confidence and accuracy in convolutional neural networks. PLoS Comput Biol 2024; 20:e1012578. [PMID: 39541396; PMCID: PMC11594416; DOI: 10.1371/journal.pcbi.1012578]
Abstract
Prior research has shown that manipulating stimulus energy by changing both stimulus contrast and variability results in confidence-accuracy dissociations in humans. Specifically, even when performance is matched, higher stimulus energy leads to higher confidence. The most common explanation for this effect, derived from cognitive modeling, is the positive evidence heuristic where confidence neglects evidence that disconfirms the choice. However, an alternative explanation is the signal-and-variance-increase hypothesis, according to which these dissociations arise from changes in the separation and variance of perceptual representations. Because artificial neural networks lack built-in confidence heuristics, they can serve as a test for the necessity of confidence heuristics in explaining confidence-accuracy dissociations. Therefore, we tested whether confidence-accuracy dissociations induced by stimulus energy manipulations emerge naturally in convolutional neural networks (CNNs). We found that, across three different energy manipulations, CNNs produced confidence-accuracy dissociations similar to those found in humans. This effect was present for a range of CNN architectures from shallow 4-layer networks to very deep ones, such as VGG-19 and ResNet-50 pretrained on ImageNet. Further, we traced back the reason for the confidence-accuracy dissociations in all CNNs to the same signal-and-variance increase that has been proposed for humans: higher stimulus energy increased the separation and variance of evidence distributions in the CNNs' output layer leading to higher confidence even for matched accuracy. These findings cast doubt on the necessity of the positive evidence heuristic to explain human confidence and establish CNNs as promising models for testing cognitive theories of human behavior.
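The signal-and-variance-increase account is easy to reproduce in miniature: scale the separation and the spread of two evidence distributions together, so that accuracy stays fixed while softmax confidence rises. All numbers below are illustrative, not the paper's stimulus manipulations.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 100_000

def run(separation, sd):
    """Evidence for target vs. foil; choice = argmax, confidence = max softmax."""
    e_target = rng.normal(separation, sd, n)
    e_foil = rng.normal(0.0, sd, n)
    diff = e_target - e_foil
    accuracy = (diff > 0).mean()
    confidence = 1.0 / (1.0 + np.exp(-np.abs(diff)))  # max of a two-way softmax
    return round(accuracy, 3), round(confidence.mean(), 3)

# Low vs. high "stimulus energy": separation and variance scaled together
print("low energy :", run(separation=1.0, sd=1.0))
print("high energy:", run(separation=2.0, sd=2.0))  # matched accuracy, higher confidence
```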
Affiliation(s)
- Medha Shekhar
- School of Psychology, Georgia Institute of Technology, Atlanta, Georgia, United States of America
- Dobromir Rahnev
- School of Psychology, Georgia Institute of Technology, Atlanta, Georgia, United States of America
8. Conwell C, Prince JS, Kay KN, Alvarez GA, Konkle T. A large-scale examination of inductive biases shaping high-level visual representation in brains and machines. Nat Commun 2024; 15:9383. [PMID: 39477923; PMCID: PMC11526138; DOI: 10.1038/s41467-024-53147-y]
Abstract
The rapid release of high-performing computer vision models offers new potential to study the impact of different inductive biases on the emergent brain alignment of learned representations. Here, we perform controlled comparisons among a curated set of 224 diverse models to test the impact of specific model properties on visual brain predictivity - a process requiring over 1.8 billion regressions and 50.3 thousand representational similarity analyses. We find that models with qualitatively different architectures (e.g. CNNs versus Transformers) and task objectives (e.g. purely visual contrastive learning versus vision-language alignment) achieve near-equivalent brain predictivity when other factors are held constant. Instead, variation across visual training diets yields the largest, most consistent effect on brain predictivity. Many models achieve similarly high brain predictivity, despite clear variation in their underlying representations - suggesting that standard methods used to link models to brains may be too flexible. Broadly, these findings challenge common assumptions about the factors underlying emergent brain alignment, and outline how we can leverage controlled model comparison to probe the common computational principles underlying biological and artificial visual systems.
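Of the two linking methods mentioned, representational similarity analysis is the quicker to sketch: build a representational dissimilarity matrix (RDM) for the model and for the brain data, then correlate their unique pairwise entries. The activations below are random stand-ins for real model features and voxel responses.

```python
import numpy as np
from scipy.spatial.distance import pdist
from scipy.stats import spearmanr

rng = np.random.default_rng(0)
n_stimuli = 50
model_acts = rng.standard_normal((n_stimuli, 512))  # stand-in model features
brain_acts = rng.standard_normal((n_stimuli, 200))  # stand-in voxel responses

# RDMs as condensed vectors of pairwise correlation distances
model_rdm = pdist(model_acts, metric="correlation")
brain_rdm = pdist(brain_acts, metric="correlation")

rho, _ = spearmanr(model_rdm, brain_rdm)
print(f"model-brain RSA (Spearman): {rho:.3f}")
```

The paper's other linking method, voxelwise encoding, replaces this correlation with a regularized regression from model features to each voxel's responses.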
Affiliation(s)
- Colin Conwell
- Department of Psychology, Harvard University, Cambridge, MA, USA
- Jacob S Prince
- Department of Psychology, Harvard University, Cambridge, MA, USA
- Kendrick N Kay
- Center for Magnetic Resonance Research, Department of Radiology, University of Minnesota, Minneapolis, MN, USA
- George A Alvarez
- Department of Psychology, Harvard University, Cambridge, MA, USA
- Talia Konkle
- Department of Psychology, Harvard University, Cambridge, MA, USA
- Center for Brain Science, Harvard University, Cambridge, MA, USA
- Kempner Institute for Natural and Artificial Intelligence, Harvard University, Cambridge, MA, USA
9. Simony E, Grossman S, Malach R. Brain-machine convergent evolution: Why finding parallels between brain and artificial systems is informative. Proc Natl Acad Sci U S A 2024; 121:e2319709121. [PMID: 39356668; PMCID: PMC11474058; DOI: 10.1073/pnas.2319709121]
Abstract
Central nervous system neurons manifest a rich diversity of selectivity profiles-whose precise role is still poorly understood. Following the striking success of artificial networks, a major debate has emerged concerning their usefulness in explaining neuronal properties. Here we propose that finding parallels between artificial and neuronal networks is informative precisely because these systems are so different from each other. Our argument is based on an extension of the concept of convergent evolution-well established in biology-to the domain of artificial systems. Applying this concept to different areas and levels of the cortical hierarchy can be a powerful tool for elucidating the functional role of well-known cortical selectivities. Importantly, we further demonstrate that such parallels can uncover novel functionalities by showing that grid cells in the entorhinal cortex can be modeled to function as a set of basis functions in a lossy representation such as the well-known JPEG compression. Thus, contrary to common intuition, here we illustrate that finding parallels with artificial systems provides novel and informative insights, particularly in those cases that are far removed from realistic brain biology.
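The JPEG analogy rests on a generic operation: project a signal onto a fixed basis and retain only the largest coefficients. The 1-D discrete cosine transform sketch below illustrates that lossy basis-function scheme; it does not reproduce the paper's grid-cell modeling, only the compression idea it appeals to.

```python
import numpy as np

def dct_basis(n):
    """Orthonormal DCT-II basis (rows are basis functions), as used in JPEG."""
    k = np.arange(n)[:, None]
    x = np.arange(n)[None, :]
    basis = np.cos(np.pi * k * (2 * x + 1) / (2 * n))
    basis[0] *= np.sqrt(1 / n)
    basis[1:] *= np.sqrt(2 / n)
    return basis

n = 64
t = np.arange(n)
signal = np.sin(2 * np.pi * t / n) + 0.3 * np.sin(6 * np.pi * t / n)

B = dct_basis(n)
coeffs = B @ signal                                   # analysis: project onto basis
keep = np.abs(coeffs) >= np.sort(np.abs(coeffs))[-8]  # retain the 8 largest coefficients
lossy = B.T @ (coeffs * keep)                         # synthesis from the retained subset
print(f"reconstruction error: {np.linalg.norm(signal - lossy):.4f}")
```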
Affiliation(s)
- Erez Simony
- Department of Brain Sciences, Weizmann Institute of Science, Rehovot 76100, Israel
- Faculty of Electrical Engineering, Holon Institute of Technology, Holon 5810201, Israel
- Shany Grossman
- Max Planck Institute for Human Development, Berlin 14195, Germany
- Max Planck University College London Centre for Computational Psychiatry and Ageing Research, Berlin 14195, Germany
- Institute of Psychology, Universität Hamburg, Hamburg 20146, Germany
- Rafael Malach
- Department of Brain Sciences, Weizmann Institute of Science, Rehovot 76100, Israel
10. Croteau J, Fornaciai M, Huber DE, Park J. The divisive normalization model of visual number sense: model predictions and experimental confirmation. Cereb Cortex 2024; 34:bhae418. [PMID: 39441025; DOI: 10.1093/cercor/bhae418]
Abstract
Our intuitive sense of number allows rapid estimation for the number of objects (numerosity) in a scene. How does the continuous nature of neural information processing create a discrete representation of number? A neurocomputational model with divisive normalization explains this process and existing data; however, a successful model should not only explain existing data but also generate novel predictions. Here, we experimentally test novel predictions of this model to evaluate its merit for explaining mechanisms of numerosity perception. We did so by consideration of the coherence illusion: the underestimation of number for arrays containing heterogeneous compared to homogeneous items. First, we established the existence of the coherence illusion for homogeneity manipulations of both area and orientation of items in an array. Second, despite the behavioral similarity, the divisive normalization model predicted that these two illusions should reflect activity in different stages of visual processing. Finally, visual evoked potentials from an electroencephalography experiment confirmed these predictions, showing that area and orientation coherence modulate brain responses at distinct latencies and topographies. These results demonstrate the utility of the divisive normalization model for explaining numerosity perception, according to which numerosity perception is a byproduct of canonical neurocomputations that exist throughout the visual pathway.
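For reference, the canonical divisive normalization equation underlying this model divides each unit's driven input by activity pooled over neighboring units. In the standard Carandini-Heeger form (our notation, not necessarily the paper's exact parameterization):

```latex
R_i = \frac{\gamma \, D_i^{\,n}}{\sigma^{n} + \sum_{j \in \mathrm{pool}} D_j^{\,n}}
```

Here \(D_i\) is the driven input to unit \(i\), \(\gamma\) a response gain, \(\sigma\) a semisaturation constant, and \(n\) an exponent. The pooled sum in the denominator makes the total response grow sublinearly with the number of similar items, the property this model class exploits to explain numerosity effects such as the coherence illusion.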
Affiliation(s)
- Jenna Croteau
- Department of Psychological and Brain Sciences, University of Massachusetts Amherst, 135 Hicks Way, Amherst, MA 01003, United States
- Michele Fornaciai
- Institute for Research in Psychology (IPSY) and Institute of Neuroscience (IoNS), Université Catholique de Louvain, Place du Cardinal Mercier 10, Louvain-la-Neuve, 1348, Belgium
- David E Huber
- Department of Psychology and Neuroscience, University of Colorado Boulder, Muenzinger D244, 345 UCB, Boulder, CO 80309, United States
- Joonkoo Park
- Department of Psychological and Brain Sciences, University of Massachusetts Amherst, 135 Hicks Way, Amherst, MA 01003, United States
- Commonwealth Honors College, University of Massachusetts Amherst, 157 Commonwealth Avenue, Amherst, MA 01003, United States
11. McGrath SW, Russin J, Pavlick E, Feiman R. How Can Deep Neural Networks Inform Theory in Psychological Science? Curr Dir Psychol Sci 2024; 33:325-333. [PMID: 39949337; PMCID: PMC11824574; DOI: 10.1177/09637214241268098]
Abstract
Over the last decade, deep neural networks (DNNs) have transformed the state of the art in artificial intelligence. In domains like language production and reasoning, long considered uniquely human abilities, contemporary models have proven capable of strikingly human-like performance. However, in contrast to classical symbolic models, neural networks can be inscrutable even to their designers, making it unclear what significance, if any, they have for theories of human cognition. Two extreme reactions are common. Neural network enthusiasts argue that, because the inner workings of DNNs do not seem to resemble any of the traditional constructs of psychological or linguistic theory, their success renders these theories obsolete and motivates a radical paradigm shift. Neural network skeptics instead take this inability to interpret DNNs in psychological terms to mean that their success is irrelevant to psychological science. In this paper, we review recent work that suggests that the internal mechanisms of DNNs can, in fact, be interpreted in the functional terms characteristic of psychological explanations. We argue that this undermines the shared assumption of both extremes and opens the door for DNNs to inform theories of cognition and its development.
Affiliation(s)
- Sam Whitman McGrath
- Philosophy Department, Department of Cognitive, Linguistic & Psychological Sciences, Brown University
- Jacob Russin
- Department of Computer Science, Department of Cognitive, Linguistic & Psychological Sciences, Brown University
- Roman Feiman
- Department of Cognitive, Linguistic & Psychological Sciences, Program in Linguistics, Brown University, Metcalf Research Building, 190 Thayer St., Providence, RI 02912
12. Ravichandran N, Lansner A, Herman P. Spiking representation learning for associative memories. Front Neurosci 2024; 18:1439414. [PMID: 39371606; PMCID: PMC11450452; DOI: 10.3389/fnins.2024.1439414]
Abstract
Networks of interconnected neurons communicating through spiking signals offer the bedrock of neural computations. Our brain's spiking neural networks have the computational capacity to achieve complex pattern recognition and cognitive functions effortlessly. However, solving real-world problems with artificial spiking neural networks (SNNs) has proved to be difficult for a variety of reasons. Crucially, scaling SNNs to large networks and processing large-scale real-world datasets have been challenging, especially when compared to their non-spiking deep learning counterparts. The critical operation that is needed of SNNs is the ability to learn distributed representations from data and use these representations for perceptual, cognitive and memory operations. In this work, we introduce a novel SNN that performs unsupervised representation learning and associative memory operations leveraging Hebbian synaptic and activity-dependent structural plasticity coupled with neuron-units modelled as Poisson spike generators with sparse firing (~1 Hz mean and ~100 Hz maximum firing rate). Crucially, the architecture of our model derives from the neocortical columnar organization and combines feedforward projections for learning hidden representations and recurrent projections for forming associative memories. We evaluated the model on properties relevant for attractor-based associative memories such as pattern completion, perceptual rivalry, distortion resistance, and prototype extraction.
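A Poisson spike generator of the kind used for the model's units takes only a few lines: in each small time bin, a unit fires with probability rate × dt. The sketch below is a Bernoulli approximation with illustrative rates spanning the abstract's sparse (~1 Hz) to maximal (~100 Hz) range.

```python
import numpy as np

def poisson_spikes(rates_hz, duration_s=1.0, dt=1e-3, seed=0):
    """Bernoulli approximation to Poisson spiking: P(spike in a bin) = rate * dt."""
    rng = np.random.default_rng(seed)
    n_steps = int(duration_s / dt)
    p = np.clip(np.asarray(rates_hz) * dt, 0.0, 1.0)
    return rng.random((n_steps, len(rates_hz))) < p  # (time, units) boolean array

rates = np.array([1.0, 20.0, 100.0])  # sparse, intermediate, and maximally active units
spikes = poisson_spikes(rates, duration_s=10.0)
print("empirical rates (Hz):", spikes.mean(axis=0) / 1e-3)
```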
Affiliation(s)
- Naresh Ravichandran
- Computational Cognitive Brain Science Group, Department of Computational Science and Technology, School of Electrical Engineering and Computer Science, KTH Royal Institute of Technology, Stockholm, Sweden
- Anders Lansner
- Computational Cognitive Brain Science Group, Department of Computational Science and Technology, School of Electrical Engineering and Computer Science, KTH Royal Institute of Technology, Stockholm, Sweden
- Department of Mathematics, Stockholm University, Stockholm, Sweden
- Pawel Herman
- Computational Cognitive Brain Science Group, Department of Computational Science and Technology, School of Electrical Engineering and Computer Science, KTH Royal Institute of Technology, Stockholm, Sweden
- Digital Futures, KTH Royal Institute of Technology, Stockholm, Sweden
- Swedish e-Science Research Centre (SeRC), Stockholm, Sweden
13. Gigerenzer G. Psychological AI: Designing Algorithms Informed by Human Psychology. Perspect Psychol Sci 2024; 19:839-848. [PMID: 37522323; PMCID: PMC11373155; DOI: 10.1177/17456916231180597]
Abstract
Psychological artificial intelligence (AI) applies insights from psychology to design computer algorithms. Its core domain is decision-making under uncertainty, that is, ill-defined situations that can change in unexpected ways rather than well-defined, stable problems, such as chess and Go. Psychological theories about heuristic processes under uncertainty can provide possible insights. I provide two illustrations. The first shows how recency-the human tendency to rely on the most recent information and ignore base rates-can be built into a simple algorithm that predicts the flu substantially better than did Google Flu Trends's big-data algorithms. The second uses a result from memory research-the paradoxical effect that making numbers less precise increases recall-in the design of algorithms that predict recidivism. These case studies provide an existence proof that psychological AI can help design efficient and transparent algorithms.
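The recency heuristic in the first illustration is simple enough to state as a one-line algorithm: predict that the next period will equal the most recent observation, ignoring the longer history. A sketch with made-up numbers (not the flu data analyzed in the paper):

```python
def recency_forecast(series):
    """Recency heuristic: the forecast for the next period is the latest value."""
    return series[-1]

weekly_flu_rate = [1.2, 1.5, 2.1, 3.4]  # made-up illustrative values
print("next-week forecast:", recency_forecast(weekly_flu_rate))  # -> 3.4
```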
14. Clapp M, Bahuguna J, Giossi C, Rubin JE, Verstynen T, Vich C. CBGTPy: An extensible cortico-basal ganglia-thalamic framework for modeling biological decision making. bioRxiv 2024:2023.09.05.556301. [PMID: 37732280; PMCID: PMC10508778; DOI: 10.1101/2023.09.05.556301]
Abstract
Here we introduce CBGTPy, a virtual environment for designing and testing goal-directed agents with internal dynamics that are modeled on the cortico-basal-ganglia-thalamic (CBGT) pathways in the mammalian brain. CBGTPy enables researchers to investigate the internal dynamics of the CBGT system during a variety of tasks, allowing for the formation of testable predictions about animal behavior and neural activity. The framework has been designed around the principle of flexibility, such that many experimental parameters in a decision making paradigm can be easily defined and modified. Here we demonstrate the capabilities of CBGTPy across a range of single and multi-choice tasks, highlighting the ease of set up and the biologically realistic behavior that it produces. We show that CBGTPy is extensible enough to apply to a range of experimental protocols and to allow for the implementation of model extensions with minimal developmental effort.
Affiliation(s)
- Matthew Clapp
- Department of Psychology & Neuroscience Institute, Carnegie Mellon University, Pittsburgh, Pennsylvania, United States of America
- Jyotika Bahuguna
- Department of Psychology & Neuroscience Institute, Carnegie Mellon University, Pittsburgh, Pennsylvania, United States of America
- Cristina Giossi
- Departament de Ciències Matemàtiques i Informàtica, Universitat de les Illes Balears, Palma, Spain
- Institute of Applied Computing and Community Code, Palma, Spain
- Jonathan E. Rubin
- Center for the Neural Basis of Cognition, Pittsburgh, Pennsylvania, United States of America
- Department of Mathematics, University of Pittsburgh, Pittsburgh, Pennsylvania, United States of America
- Timothy Verstynen
- Department of Psychology & Neuroscience Institute, Carnegie Mellon University, Pittsburgh, Pennsylvania, United States of America
- Center for the Neural Basis of Cognition, Pittsburgh, Pennsylvania, United States of America
- Catalina Vich
- Departament de Ciències Matemàtiques i Informàtica, Universitat de les Illes Balears, Palma, Spain
- Institute of Applied Computing and Community Code, Palma, Spain
15. Kallmayer A, Võ MLH. Anchor objects drive realism while diagnostic objects drive categorization in GAN generated scenes. Commun Psychol 2024; 2:68. [PMID: 39242968; PMCID: PMC11332195; DOI: 10.1038/s44271-024-00119-z]
Abstract
Our visual surroundings are highly complex. Despite this, we understand and navigate them effortlessly. This requires transforming incoming sensory information into representations that not only span low- to high-level visual features (e.g., edges, object parts, objects), but likely also reflect co-occurrence statistics of objects in real-world scenes. Here, so-called anchor objects are defined as being highly predictive of the location and identity of frequently co-occurring (usually smaller) objects, derived from object clustering statistics in real-world scenes, while so-called diagnostic objects are predictive of the larger semantic context (i.e., scene category). Across two studies (N1 = 50, N2 = 44), we investigate which of these properties underlie scene understanding across two dimensions - realism and categorisation - using scenes generated from Generative Adversarial Networks (GANs) which naturally vary along these dimensions. We show that anchor objects and mainly high-level features extracted from a range of pre-trained deep neural networks (DNNs) drove realism both at first glance and after initial processing. Categorisation performance was mainly determined by diagnostic objects, regardless of realism, at first glance and after initial processing. Our results are a testament to the visual system's ability to pick up on reliable, category-specific sources of information that are flexible towards disturbances across the visual feature-hierarchy.
Affiliation(s)
- Aylin Kallmayer
- Goethe University Frankfurt, Department of Psychology, Frankfurt am Main, Germany
- Melissa L-H Võ
- Goethe University Frankfurt, Department of Psychology, Frankfurt am Main, Germany
16. Soydaner D, Wagemans J. Unveiling the factors of aesthetic preferences with explainable AI. Br J Psychol 2024. [PMID: 38758182; DOI: 10.1111/bjop.12707]
Abstract
The allure of aesthetic appeal in images captivates our senses, yet the underlying intricacies of aesthetic preferences remain elusive. In this study, we pioneer a novel perspective by utilizing several different machine learning (ML) models that focus on aesthetic attributes known to influence preferences. Our models process these attributes as inputs to predict the aesthetic scores of images. Moreover, to delve deeper and obtain interpretable explanations regarding the factors driving aesthetic preferences, we utilize the popular Explainable AI (XAI) technique known as SHapley Additive exPlanations (SHAP). Our methodology compares the performance of various ML models, including Random Forest, XGBoost, Support Vector Regression, and Multilayer Perceptron, in accurately predicting aesthetic scores, while consistently interpreting the results in conjunction with SHAP. We conduct experiments on three image aesthetic benchmarks, namely Aesthetics with Attributes Database (AADB), Explainable Visual Aesthetics (EVA), and Personalized image Aesthetics database with Rich Attributes (PARA), providing insights into the roles of attributes and their interactions. Finally, our study presents ML models for aesthetics research, alongside the introduction of XAI. Our aim is to shed light on the complex nature of aesthetic preferences in images through ML and to provide a deeper understanding of the attributes that influence aesthetic judgements.
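For the tree-based models, the SHAP workflow described here typically looks like the sketch below. The synthetic attributes and XGBoost regressor stand in for the benchmark datasets and fitted models; `shap.TreeExplainer` is the standard explainer for tree ensembles.

```python
import numpy as np
import shap
import xgboost

rng = np.random.default_rng(0)
# Stand-ins for aesthetic attributes (columns) and overall aesthetic scores (target)
X = rng.uniform(0, 1, size=(500, 5))
y = 2 * X[:, 0] + X[:, 1] * X[:, 2] + rng.normal(0, 0.1, 500)

model = xgboost.XGBRegressor(n_estimators=200, max_depth=3).fit(X, y)

explainer = shap.TreeExplainer(model)   # exact SHAP values for tree models
shap_values = explainer.shap_values(X)  # shape: (n_samples, n_features)
print("mean |SHAP| per attribute:", np.abs(shap_values).mean(axis=0))
```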
Affiliation(s)
- Derya Soydaner
- Department of Brain and Cognition, University of Leuven (KU Leuven), Leuven, Belgium
- Johan Wagemans
- Department of Brain and Cognition, University of Leuven (KU Leuven), Leuven, Belgium
17. Caplette L, Turk-Browne NB. Computational reconstruction of mental representations using human behavior. Nat Commun 2024; 15:4183. [PMID: 38760341; PMCID: PMC11101448; DOI: 10.1038/s41467-024-48114-6]
Abstract
Revealing how the mind represents information is a longstanding goal of cognitive science. However, there is currently no framework for reconstructing the broad range of mental representations that humans possess. Here, we ask participants to indicate what they perceive in images made of random visual features in a deep neural network. We then infer associations between the semantic features of their responses and the visual features of the images. This allows us to reconstruct the mental representations of multiple visual concepts, both those supplied by participants and other concepts extrapolated from the same semantic space. We validate these reconstructions in separate participants and further generalize our approach to predict behavior for new stimuli and in a new task. Finally, we reconstruct the mental representations of individual observers and of a neural network. This framework enables a large-scale investigation of conceptual representations.
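The inference step resembles classic reverse correlation: estimate a concept's representation by contrasting the features of stimuli that elicited the concept response against those that did not. A toy sketch, with random features and a hidden template standing in for the paper's DNN feature space and participant responses:

```python
import numpy as np

rng = np.random.default_rng(0)
n_images, n_feat = 5000, 64
image_feats = rng.standard_normal((n_images, n_feat))  # stand-in visual features

# Simulated responses: a participant "sees" the concept when a hidden template matches
template = rng.standard_normal(n_feat)
said_concept = image_feats @ template > 1.0

# Reverse correlation: mean features of concept-eliciting images minus the rest
estimate = (image_feats[said_concept].mean(axis=0)
            - image_feats[~said_concept].mean(axis=0))
print(f"recovered-template correlation: {np.corrcoef(estimate, template)[0, 1]:.3f}")
```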
Affiliation(s)
- Nicholas B Turk-Browne
- Department of Psychology, Yale University, New Haven, CT, USA
- Wu Tsai Institute, Yale University, New Haven, CT, USA
18. Depeweg S, Rothkopf CA, Jäkel F. Solving Bongard Problems With a Visual Language and Pragmatic Constraints. Cogn Sci 2024; 48:e13432. [PMID: 38700123; DOI: 10.1111/cogs.13432]
Abstract
More than 50 years ago, Bongard introduced 100 visual concept learning problems as a challenge for artificial vision systems. These problems are now known as Bongard problems. Although they are well known in cognitive science and artificial intelligence, only very little progress has been made toward building systems that can solve a substantial subset of them. In the system presented here, visual features are extracted through image processing and then translated into a symbolic visual vocabulary. We introduce a formal language that allows representing compositional visual concepts based on this vocabulary. Using this language and Bayesian inference, concepts can be induced from the examples that are provided in each problem. We find a reasonable agreement between the concepts with high posterior probability and the solutions formulated by Bongard himself for a subset of 35 problems. While this approach is far from solving Bongard problems like humans, it does considerably better than previous approaches. We discuss the issues we encountered while developing this system and their continuing relevance for understanding visual cognition. For instance, contrary to other concept learning problems, the examples are not random in Bongard problems; instead they are carefully chosen to ensure that the concept can be induced, and we found it helpful to take the resulting pragmatic constraints into account.
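The induction step can be miniaturized as follows: enumerate candidate concepts in a small formal language and keep those true of every left-panel example and of no right-panel example. The toy predicates below are ours; the paper's visual vocabulary, Bayesian scoring, and pragmatic constraints are much richer.

```python
# Candidate concepts: name -> predicate over a toy shape description
concepts = {
    "triangle": lambda s: s["n_sides"] == 3,
    "filled": lambda s: s["filled"],
    "filled triangle": lambda s: s["n_sides"] == 3 and s["filled"],
}

left = [{"n_sides": 3, "filled": True}, {"n_sides": 3, "filled": False}]   # positives
right = [{"n_sides": 4, "filled": True}, {"n_sides": 5, "filled": False}]  # negatives

# Keep concepts that hold for all left-side and for no right-side examples
consistent = [name for name, pred in concepts.items()
              if all(pred(s) for s in left) and not any(pred(s) for s in right)]
print(consistent)  # -> ['triangle'], the simplest rule separating the panels
```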
Affiliation(s)
- Constantin A Rothkopf
- Centre for Cognitive Science & Institute of Psychology, Technische Universität Darmstadt
- Frankfurt Institute for Advanced Studies, Frankfurt am Main
- Frank Jäkel
- Centre for Cognitive Science & Institute of Psychology, Technische Universität Darmstadt
19. Nara S, Kaiser D. Integrative processing in artificial and biological vision predicts the perceived beauty of natural images. Sci Adv 2024; 10:eadi9294. [PMID: 38427730; PMCID: PMC10906925; DOI: 10.1126/sciadv.adi9294]
Abstract
Previous research shows that the beauty of natural images is already determined during perceptual analysis. However, it is unclear which perceptual computations give rise to the perception of beauty. Here, we tested whether perceived beauty is predicted by spatial integration across an image, a perceptual computation that reduces processing demands by aggregating image parts into more efficient representations of the whole. We quantified integrative processing in an artificial deep neural network model, where the degree of integration was determined by the amount of deviation between activations for the whole image and its constituent parts. This quantification of integration predicted beauty ratings for natural images across four studies with different stimuli and designs. In a complementary functional magnetic resonance imaging study, we show that integrative processing in human visual cortex similarly predicts perceived beauty. Together, our results establish integration as a computational principle that facilitates perceptual analysis and thereby mediates the perception of beauty.
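The integration measure can be paraphrased computationally: compare a model's response to the whole image against the combination of its responses to the image's parts, with larger deviations indicating more integrative processing. The featurizer and two-part split below are stand-ins; the paper uses DNN activations and a different part decomposition.

```python
import numpy as np

def features(image):
    """Stand-in for a DNN activation vector; any fixed featurizer works here."""
    return np.array([image.mean(), image.std(),
                     np.abs(np.diff(image, axis=0)).mean()])

def integration_index(image):
    """Deviation between whole-image features and averaged part features."""
    h = image.shape[0] // 2
    parts = [image[:h], image[h:]]                       # simple two-part split
    part_feats = np.mean([features(p) for p in parts], axis=0)
    return np.linalg.norm(features(image) - part_feats)  # larger = more integrative

rng = np.random.default_rng(0)
print(f"integration index: {integration_index(rng.random((64, 64))):.4f}")
```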
Affiliation(s)
- Sanjeev Nara
- Mathematical Institute, Department of Mathematics and Computer Science, Physics, Geography, Justus Liebig University Gießen, Gießen, Germany
- Daniel Kaiser
- Mathematical Institute, Department of Mathematics and Computer Science, Physics, Geography, Justus Liebig University Gießen, Gießen, Germany
- Center for Mind, Brain and Behavior (CMBB), Philipps-University Marburg and Justus Liebig University Gießen, Marburg, Germany
20. Pezzulo G, Parr T, Cisek P, Clark A, Friston K. Generating meaning: active inference and the scope and limits of passive AI. Trends Cogn Sci 2024; 28:97-112. [PMID: 37973519; DOI: 10.1016/j.tics.2023.10.002]
Abstract
Prominent accounts of sentient behavior depict brains as generative models of organismic interaction with the world, evincing intriguing similarities with current advances in generative artificial intelligence (AI). However, because they contend with the control of purposive, life-sustaining sensorimotor interactions, the generative models of living organisms are inextricably anchored to the body and world. Unlike the passive models learned by generative AI systems, they must capture and control the sensory consequences of action. This allows embodied agents to intervene upon their worlds in ways that constantly put their best models to the test, thus providing a solid bedrock that is - we argue - essential to the development of genuine understanding. We review the resulting implications and consider future directions for generative AI.
Affiliation(s)
- Giovanni Pezzulo
- Institute of Cognitive Sciences and Technologies, National Research Council, Rome, Italy
- Thomas Parr
- Nuffield Department of Clinical Neurosciences, University of Oxford
- Paul Cisek
- Department of Neuroscience, University of Montréal, Montréal, Québec, Canada
- Andy Clark
- Department of Philosophy, University of Sussex, Brighton, UK; Department of Informatics, University of Sussex, Brighton, UK; Department of Philosophy, Macquarie University, Sydney, New South Wales, Australia
- Karl Friston
- Wellcome Centre for Human Neuroimaging, Queen Square Institute of Neurology, University College London, London, UK; VERSES AI Research Lab, Los Angeles, CA, USA
21. Nadler EO, Darragh-Ford E, Desikan BS, Conaway C, Chu M, Hull T, Guilbeault D. Divergences in color perception between deep neural networks and humans. Cognition 2023; 241:105621. [PMID: 37716312; DOI: 10.1016/j.cognition.2023.105621]
Abstract
Deep neural networks (DNNs) are increasingly proposed as models of human vision, bolstered by their impressive performance on image classification and object recognition tasks. Yet, the extent to which DNNs capture fundamental aspects of human vision such as color perception remains unclear. Here, we develop novel experiments for evaluating the perceptual coherence of color embeddings in DNNs, and we assess how well these algorithms predict human color similarity judgments collected via an online survey. We find that state-of-the-art DNN architectures - including convolutional neural networks and vision transformers - provide color similarity judgments that strikingly diverge from human color judgments of (i) images with controlled color properties, (ii) images generated from online searches, and (iii) real-world images from the canonical CIFAR-10 dataset. We compare DNN performance against an interpretable and cognitively plausible model of color perception based on wavelet decomposition, inspired by foundational theories in computational neuroscience. While one deep learning model - a convolutional DNN trained on a style transfer task - captures some aspects of human color perception, our wavelet algorithm provides more coherent color embeddings that better predict human color judgments compared to all DNNs we examine. These results hold when altering the high-level visual task used to train similar DNN architectures (e.g., image classification versus image segmentation), as well as when examining the color embeddings of different layers in a given DNN architecture. These findings break new ground in the effort to analyze the perceptual representations of machine learning algorithms and to improve their ability to serve as cognitively plausible models of human vision. Implications for machine learning, human perception, and embodied cognition are discussed.
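A wavelet-based colour embedding in this spirit can be sketched with PyWavelets: decompose each channel and collect per-subband energies as the embedding vector. The wavelet choice and energy features below are our simplifications, not the authors' exact algorithm.

```python
import numpy as np
import pywt

def wavelet_embedding(image_rgb, wavelet="haar", level=2):
    """Per-channel, per-subband energy vector from a 2-D wavelet decomposition."""
    energies = []
    for c in range(3):
        coeffs = pywt.wavedec2(image_rgb[:, :, c], wavelet, level=level)
        energies.append(np.square(coeffs[0]).mean())  # approximation band
        for detail in coeffs[1:]:                     # (horizontal, vertical, diagonal)
            energies.extend(np.square(d).mean() for d in detail)
    return np.array(energies)

rng = np.random.default_rng(0)
red_patch = np.zeros((32, 32, 3))
red_patch[..., 0] = 0.9 + 0.05 * rng.random((32, 32))
print(wavelet_embedding(red_patch).shape)  # one energy per channel x subband
```

Colour similarity is then a distance between such embedding vectors, which is what gets compared against human judgments.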
Affiliation(s)
- Ethan O Nadler
- Carnegie Observatories, USA; Department of Physics, University of Southern California, USA
- Elise Darragh-Ford
- Kavli Institute for Particle Astrophysics and Cosmology and Department of Physics, Stanford University, USA
- Bhargav Srinivasa Desikan
- School of Computer and Communication Sciences, École Polytechnique Fédérale de Lausanne, Switzerland; Knowledge Lab, University of Chicago, USA
- Mark Chu
- School of the Arts, Columbia University, USA
22. Rubinov M. Circular and unified analysis in network neuroscience. eLife 2023; 12:e79559. [PMID: 38014843; PMCID: PMC10684154; DOI: 10.7554/elife.79559]
Abstract
Genuinely new discovery transcends existing knowledge. Despite this, many analyses in systems neuroscience neglect to test new speculative hypotheses against benchmark empirical facts. Some of these analyses inadvertently use circular reasoning to present existing knowledge as new discovery. Here, I discuss that this problem can confound key results and estimate that it has affected more than three thousand studies in network neuroscience over the last decade. I suggest that future studies can reduce this problem by limiting the use of speculative evidence, integrating existing knowledge into benchmark models, and rigorously testing proposed discoveries against these models. I conclude with a summary of practical challenges and recommendations.
Affiliation(s)
- Mika Rubinov
- Departments of Biomedical Engineering, Computer Science, and Psychology, Vanderbilt University, Nashville, United States
- Janelia Research Campus, Howard Hughes Medical Institute, Ashburn, United States
23. Finn ES, Poldrack RA, Shine JM. Functional neuroimaging as a catalyst for integrated neuroscience. Nature 2023; 623:263-273. [PMID: 37938706; DOI: 10.1038/s41586-023-06670-9]
Abstract
Functional magnetic resonance imaging (fMRI) enables non-invasive access to the awake, behaving human brain. By tracking whole-brain signals across a diverse range of cognitive and behavioural states or mapping differences associated with specific traits or clinical conditions, fMRI has advanced our understanding of brain function and its links to both normal and atypical behaviour. Despite this headway, progress in human cognitive neuroscience that uses fMRI has been relatively isolated from rapid advances in other subdomains of neuroscience, which themselves are also somewhat siloed from one another. In this Perspective, we argue that fMRI is well-placed to integrate the diverse subfields of systems, cognitive, computational and clinical neuroscience. We first summarize the strengths and weaknesses of fMRI as an imaging tool, then highlight examples of studies that have successfully used fMRI in each subdomain of neuroscience. We then provide a roadmap for the future advances that will be needed to realize this integrative vision. In this way, we hope to demonstrate how fMRI can help usher in a new era of interdisciplinary coherence in neuroscience.
Affiliation(s)
- Emily S Finn
- Department of Psychological and Brain Sciences, Dartmouth College, Hanover, NH, USA
- James M Shine
- School of Medical Sciences, University of Sydney, Sydney, New South Wales, Australia
24. Gu Z, Jamison K, Sabuncu MR, Kuceyeski A. Human brain responses are modulated when exposed to optimized natural images or synthetically generated images. Commun Biol 2023; 6:1076. [PMID: 37872319; PMCID: PMC10593916; DOI: 10.1038/s42003-023-05440-7]
Abstract
Understanding how human brains interpret and process information is important. Here, we investigated the selectivity and inter-individual differences in human brain responses to images via functional MRI. In our first experiment, we found that images predicted to achieve maximal activations using a group-level encoding model evoke higher responses than images predicted to achieve average activations, and the activation gain is positively associated with the encoding model accuracy. Furthermore, anterior temporal lobe face area (aTLfaces) and fusiform body area 1 had higher activation in response to maximal synthetic images compared to maximal natural images. In our second experiment, we found that synthetic images derived using a personalized encoding model elicited higher responses compared to synthetic images from group-level or other subjects' encoding models. The finding of aTLfaces favoring synthetic over natural images was also replicated. Our results indicate the possibility of using data-driven and generative approaches to modulate macro-scale brain region responses and probe inter-individual differences in, and functional specialization of, the human visual system.
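Selecting "maximal" images follows naturally from a fitted encoding model: predict each candidate image's response and keep the top-ranked ones. The linear model on random stand-in features below is only a sketch; the paper's encoding models are far richer and operate on deep image features.

```python
import numpy as np

rng = np.random.default_rng(0)
n_train, n_candidates, n_feat = 200, 1000, 64

feats_train = rng.standard_normal((n_train, n_feat))  # stand-in image features
w_true = rng.standard_normal(n_feat)
responses = feats_train @ w_true + rng.normal(0, 0.5, n_train)  # stand-in region data

# Fit a ridge-regularized linear encoding model via the normal equations
lam = 1.0
w_hat = np.linalg.solve(feats_train.T @ feats_train + lam * np.eye(n_feat),
                        feats_train.T @ responses)

# Rank unseen candidates by predicted response; keep the top 5 "maximal" images
feats_cand = rng.standard_normal((n_candidates, n_feat))
top = np.argsort(feats_cand @ w_hat)[::-1][:5]
print("indices of predicted-maximal images:", top)
```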
Affiliation(s)
- Zijin Gu
- School of Electrical and Computer Engineering, Cornell University and Cornell Tech, New York, NY, USA
- Keith Jamison
- Department of Radiology, Weill Cornell Medicine, New York, NY, USA
- Mert R Sabuncu
- School of Electrical and Computer Engineering, Cornell University and Cornell Tech, New York, NY, USA
- Department of Radiology, Weill Cornell Medicine, New York, NY, USA
- Amy Kuceyeski
- Department of Radiology, Weill Cornell Medicine, New York, NY, USA
25. van Dyck LE, Gruber WR. Modeling Biological Face Recognition with Deep Convolutional Neural Networks. J Cogn Neurosci 2023; 35:1521-1537. [PMID: 37584587; DOI: 10.1162/jocn_a_02040]
Abstract
Deep convolutional neural networks (DCNNs) have become the state-of-the-art computational models of biological object recognition. Their remarkable success has helped vision science break new ground, and recent efforts have started to transfer this achievement to research on biological face recognition. In this regard, face detection can be investigated by comparing face-selective biological neurons and brain areas to artificial neurons and model layers. Similarly, face identification can be examined by comparing in vivo and in silico multidimensional "face spaces." In this review, we summarize the first studies that use DCNNs to model biological face recognition. On the basis of a broad spectrum of behavioral and computational evidence, we conclude that DCNNs are useful models that closely resemble the general hierarchical organization of face recognition in the ventral visual pathway and the core face network. In two exemplary spotlights, we emphasize the unique scientific contributions of these models. First, studies on face detection in DCNNs indicate that elementary face selectivity emerges automatically through feedforward processing even in the absence of visual experience. Second, studies on face identification in DCNNs suggest that identity-specific experience and generative mechanisms facilitate this particular challenge. Taken together, as this novel modeling approach enables close control of predisposition (i.e., architecture) and experience (i.e., training data), it may be suited to inform long-standing debates on the substrates of biological face recognition.
26. Westfall M. Toward biologically plausible artificial vision. Behav Brain Sci 2023; 46:e290. [PMID: 37766603; DOI: 10.1017/s0140525x23001930]
Abstract
Quilty-Dunn et al. argue that deep convolutional neural networks (DCNNs) optimized for image classification exemplify structural disanalogies to human vision. A different kind of artificial vision - found in reinforcement-learning agents navigating artificial three-dimensional environments - can be expected to be more human-like. Recent work suggests that language-like representations substantially improve these agents' performance, lending some indirect support to the language-of-thought hypothesis (LoTH).
Affiliation(s)
- Mason Westfall
- Department of Philosophy, Philosophy-Neuroscience-Psychology Program, Washington University in St. Louis, St. Louis, MO, USA. https://www.masonwestfall.com
27. Wichmann FA, Geirhos R. Are Deep Neural Networks Adequate Behavioral Models of Human Visual Perception? Annu Rev Vis Sci 2023; 9:501-524.
Abstract
Deep neural networks (DNNs) are machine learning algorithms that have revolutionized computer vision due to their remarkable successes in tasks like object classification and segmentation. The success of DNNs as computer vision algorithms has led to the suggestion that DNNs may also be good models of human visual perception. In this article, we review evidence regarding current DNNs as adequate behavioral models of human core object recognition. To this end, we argue that it is important to distinguish between statistical tools and computational models and to understand model quality as a multidimensional concept in which clarity about modeling goals is key. Reviewing a large number of psychophysical and computational explorations of core object recognition performance in humans and DNNs, we argue that DNNs are highly valuable scientific tools but that, as of today, DNNs should only be regarded as promising-but not yet adequate-computational models of human core object recognition behavior. On the way, we dispel several myths surrounding DNNs in vision science.
Affiliation(s)
- Felix A Wichmann
- Neural Information Processing Group, University of Tübingen, Tübingen, Germany.
28
Lin C, Bulls LS, Tepfer LJ, Vyas AD, Thornton MA. Advancing Naturalistic Affective Science with Deep Learning. Affect Sci 2023; 4:550-562. PMID: 37744976. PMCID: PMC10514024. DOI: 10.1007/s42761-023-00215-z.
Abstract
People express their own emotions and perceive others' emotions via a variety of channels, including facial movements, body gestures, vocal prosody, and language. Studying these channels of affective behavior offers insight into both the experience and perception of emotion. Prior research has predominantly focused on studying individual channels of affective behavior in isolation using tightly controlled, non-naturalistic experiments. This approach limits our understanding of emotion in more naturalistic contexts where different channels of information tend to interact. Traditional methods struggle to address this limitation: manually annotating behavior is time-consuming, making it infeasible to do at large scale; manually selecting and manipulating stimuli based on hypotheses may neglect unanticipated features, potentially generating biased conclusions; and common linear modeling approaches cannot fully capture the complex, nonlinear, and interactive nature of real-life affective processes. In this methodology review, we describe how deep learning can be applied to address these challenges to advance a more naturalistic affective science. First, we describe current practices in affective research and explain why existing methods face challenges in revealing a more naturalistic understanding of emotion. Second, we introduce deep learning approaches and explain how they can be applied to tackle three main challenges: quantifying naturalistic behaviors, selecting and manipulating naturalistic stimuli, and modeling naturalistic affective processes. Finally, we describe the limitations of these deep learning methods, and how these limitations might be avoided or mitigated. By detailing the promise and the peril of deep learning, this review aims to pave the way for a more naturalistic affective science.
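To make the first challenge (quantifying naturalistic behaviors) concrete, here is a minimal sketch of replacing manual annotation with automatic frame-level features from a pretrained network. The backbone, sampling rate, and file name are illustrative assumptions rather than the review's own pipeline.

```python
# A minimal sketch of quantifying naturalistic behavior: replace manual
# annotation with automatic frame-level features from a pretrained network.
# Backbone, sampling rate, and file name are illustrative assumptions.
import cv2
import torch
import torchvision.models as models
import torchvision.transforms as T

model = models.resnet18(weights=models.ResNet18_Weights.DEFAULT)
model.fc = torch.nn.Identity()  # 512-d features per frame
model.eval()

preprocess = T.Compose([
    T.ToPILImage(), T.Resize(256), T.CenterCrop(224), T.ToTensor(),
    T.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
])

@torch.no_grad()
def frame_features(path, every_n=30):
    """Return one feature vector per sampled frame: a behavioral time series."""
    cap, feats, i = cv2.VideoCapture(path), [], 0
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        if i % every_n == 0:
            rgb = cv2.cvtColor(frame, cv2.COLOR_BGR2RGB)  # OpenCV loads BGR
            feats.append(model(preprocess(rgb).unsqueeze(0)).squeeze(0))
        i += 1
    cap.release()
    return torch.stack(feats)  # shape: (n_sampled_frames, 512)

features = frame_features("naturalistic_clip.mp4")  # hypothetical video file
```

The resulting feature time series can then feed the downstream modeling step the review discusses, in place of hand-coded behavioral labels.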
Affiliation(s)
- Chujun Lin
- Department of Psychological and Brain Sciences, Dartmouth College, Hanover, NH USA
- Landry S. Bulls
- Department of Psychological and Brain Sciences, Dartmouth College, Hanover, NH USA
- Lindsey J. Tepfer
- Department of Psychological and Brain Sciences, Dartmouth College, Hanover, NH USA
- Amisha D. Vyas
- Department of Psychological and Brain Sciences, Dartmouth College, Hanover, NH USA
- Mark A. Thornton
- Department of Psychological and Brain Sciences, Dartmouth College, Hanover, NH USA
29
Lindeberg T. Covariance properties under natural image transformations for the generalised Gaussian derivative model for visual receptive fields. Front Comput Neurosci 2023; 17:1189949. PMID: 37398936. PMCID: PMC10311448. DOI: 10.3389/fncom.2023.1189949.
Abstract
The property of covariance, also referred to as equivariance, means that an image operator is well-behaved under image transformations, in the sense that applying the image operator to a transformed input image gives essentially the same result as applying the image transformation to the output of the image operator on the original image. This paper presents a theory of geometric covariance properties in vision, developed for a generalised Gaussian derivative model of receptive fields in the primary visual cortex and the lateral geniculate nucleus, which, in turn, enable geometric invariance properties at higher levels in the visual hierarchy. It is shown how the studied generalised Gaussian derivative model for visual receptive fields obeys true covariance properties under spatial scaling transformations, spatial affine transformations, Galilean transformations and temporal scaling transformations. These covariance properties imply that a vision system, based on image and video measurements in terms of the receptive fields according to the generalised Gaussian derivative model, can, to first order of approximation, handle the image and video deformations between multiple views of objects delimited by smooth surfaces, as well as between multiple views of spatio-temporal events, under varying relative motions between the objects and events in the world and the observer. We conclude by describing implications of the presented theory for biological vision, regarding connections between the variabilities of the shapes of biological visual receptive fields and the variabilities of spatial and spatio-temporal image structures under natural image transformations. Specifically, based on predictions from the presented theory, we formulate experimentally testable biological hypotheses, as well as needs for measuring population statistics of receptive field characteristics, concerning the extent to which the shapes of biological receptive fields in the primary visual cortex span the variabilities of spatial and spatio-temporal image structures induced by natural image transformations.
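The spatial scale covariance at the heart of this theory can be stated compactly. The following is a sketch in standard scale-space notation, not the paper's full generalised model (which additionally covers affine, Galilean, and temporal scaling transformations):

```latex
% A sketch in standard scale-space notation, not copied from the paper.
% Gaussian scale-space representation of a 2-D image f at scale s:
\[
  L(x;\,s) = (g(\cdot;\,s) * f)(x),
  \qquad
  g(x;\,s) = \frac{1}{2\pi s}\, e^{-\lVert x \rVert^{2}/(2s)} .
\]
% Spatial scale covariance: if the image is rescaled, f'(x') = f(x) with
% x' = S x, the representations agree at correspondingly matched scales,
\[
  L'(x';\, S^{2} s) = L(x;\, s),
\]
% and m-th order Gaussian derivative responses agree up to a power of S:
\[
  L'_{x'^{m}}(x';\, S^{2} s) = S^{-m}\, L_{x^{m}}(x;\, s) .
\]
```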
Affiliation(s)
- Tony Lindeberg
- Computational Brain Science Lab, Division of Computational Science and Technology, KTH Royal Institute of Technology, Stockholm, Sweden
30
Doerig A, Sommers RP, Seeliger K, Richards B, Ismael J, Lindsay GW, Kording KP, Konkle T, van Gerven MAJ, Kriegeskorte N, Kietzmann TC. The neuroconnectionist research programme. Nat Rev Neurosci 2023. PMID: 37253949. DOI: 10.1038/s41583-023-00705-w.
Abstract
Artificial neural networks (ANNs) inspired by biology are beginning to be widely used to model behavioural and neural data, an approach we call 'neuroconnectionism'. ANNs have been not only lauded as the current best models of information processing in the brain but also criticized for failing to account for basic cognitive functions. In this Perspective article, we propose that arguing about the successes and failures of a restricted set of current ANNs is the wrong approach to assess the promise of neuroconnectionism for brain science. Instead, we take inspiration from the philosophy of science, and in particular from Lakatos, who showed that the core of a scientific research programme is often not directly falsifiable but should be assessed by its capacity to generate novel insights. Following this view, we present neuroconnectionism as a general research programme centred around ANNs as a computational language for expressing falsifiable theories about brain computation. We describe the core of the programme, the underlying computational framework and its tools for testing specific neuroscientific hypotheses and deriving novel understanding. Taking a longitudinal view, we review past and present neuroconnectionist projects and their responses to challenges and argue that the research programme is highly progressive, generating new and otherwise unreachable insights into the workings of the brain.
Affiliation(s)
- Adrien Doerig
- Institute of Cognitive Science, University of Osnabrück, Osnabrück, Germany.
- Donders Institute for Brain, Cognition and Behaviour, Nijmegen, The Netherlands.
- Rowan P Sommers
- Department of Neurobiology of Language, Max Planck Institute for Psycholinguistics, Nijmegen, The Netherlands
- Katja Seeliger
- Max Planck Institute for Human Cognitive and Brain Sciences, Leipzig, Germany
- Blake Richards
- Department of Neurology and Neurosurgery, McGill University, Montréal, QC, Canada
- School of Computer Science, McGill University, Montréal, QC, Canada
- Mila, Montréal, QC, Canada
- Montréal Neurological Institute, Montréal, QC, Canada
- Learning in Machines and Brains Program, CIFAR, Toronto, ON, Canada
- Konrad P Kording
- Learning in Machines and Brains Program, CIFAR, Toronto, ON, Canada
- Departments of Bioengineering and Neuroscience, University of Pennsylvania, Philadelphia, PA, USA
- Tim C Kietzmann
- Institute of Cognitive Science, University of Osnabrück, Osnabrück, Germany
31
Akbarinia A, Morgenstern Y, Gegenfurtner KR. Contrast sensitivity function in deep networks. Neural Netw 2023; 164:228-244. PMID: 37156217. DOI: 10.1016/j.neunet.2023.04.032.
Abstract
The contrast sensitivity function (CSF) is a fundamental signature of the visual system that has been measured extensively in several species. It is defined by the visibility threshold for sinusoidal gratings at all spatial frequencies. Here, we investigated the CSF in deep neural networks using the same 2AFC contrast detection paradigm as in human psychophysics. We examined 240 networks pretrained on several tasks. To obtain their corresponding CSFs, we trained a linear classifier on top of the features extracted from frozen pretrained networks. The linear classifier is exclusively trained on a contrast discrimination task with natural images: it has to find which of the two input images has higher contrast. The network's CSF is then measured by detecting which one of two images contains a sinusoidal grating of varying orientation and spatial frequency. Our results demonstrate that characteristics of the human CSF are manifested in deep networks both in the luminance channel (a band-limited inverted U-shaped function) and in the chromatic channels (two low-pass functions with similar properties). The exact shape of the networks' CSF appears to be task-dependent. The human CSF is better captured by networks trained on low-level visual tasks such as image denoising or autoencoding. However, a human-like CSF also emerges in mid- and high-level tasks such as edge detection and object recognition. Our analysis shows that a human-like CSF appears in all architectures but at different depths of processing, some in early layers and others in intermediate and final layers. Overall, these results suggest that (i) deep networks model the human CSF faithfully, making them suitable candidates for applications in image quality and compression, (ii) efficient, purposeful processing of the natural world drives the CSF shape, and (iii) visual representations from all levels of the visual hierarchy contribute to the tuning curve of the CSF, in turn implying that a function we intuitively think of as modulated by low-level visual features may arise as a consequence of pooling from a larger set of neurons at all levels of the visual system.
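The following sketch illustrates the measurement logic described above: synthesize a sinusoidal grating, pass it through a frozen pretrained backbone, and compare its features against a uniform gray image. The backbone choice and the feature-distance criterion are our simplifying assumptions; the study itself trains a linear probe on a natural-image contrast discrimination task and uses a 2AFC procedure.

```python
# A simplified sketch of the measurement logic: synthesize a grating, pass it
# through a frozen backbone, and compare features against uniform gray. The
# backbone and the feature-distance criterion are our assumptions.
import numpy as np
import torch
import torchvision.models as models

def grating(size=224, cycles=8.0, contrast=0.1, orientation_deg=0.0):
    """Sinusoidal luminance grating in [0, 1] around a mean gray of 0.5."""
    y, x = np.meshgrid(np.arange(size), np.arange(size), indexing="ij")
    theta = np.deg2rad(orientation_deg)
    phase = 2 * np.pi * cycles * (x * np.cos(theta) + y * np.sin(theta)) / size
    img = 0.5 + 0.5 * contrast * np.sin(phase)
    return torch.tensor(img, dtype=torch.float32).expand(3, size, size)

backbone = models.vgg16(weights=models.VGG16_Weights.DEFAULT).features.eval()

@torch.no_grad()
def features(img):
    return backbone(img.unsqueeze(0)).flatten(1)

# Crude detectability proxy: distance between grating and blank in feature
# space. Lowering the contrast until the separation vanishes gives a detection
# threshold; sweeping `cycles` then traces out a CSF-like curve.
blank_features = features(torch.full((3, 224, 224), 0.5))
for c in (0.5, 0.1, 0.02, 0.004):
    sep = torch.norm(features(grating(contrast=c)) - blank_features).item()
    print(f"contrast={c:.3f}  feature separation={sep:.2f}")
```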
Affiliation(s)
- Arash Akbarinia
- Department of Experimental Psychology, University of Giessen, Germany.
- Yaniv Morgenstern
- Department of Experimental Psychology, University of Giessen, Germany; Faculty of Psychology and Educational Sciences, KU Leuven, Belgium
32
Adolfi F, Bowers JS, Poeppel D. Successes and critical failures of neural networks in capturing human-like speech recognition. Neural Netw 2023; 162:199-211. PMID: 36913820. DOI: 10.1016/j.neunet.2023.02.032.
Abstract
Natural and artificial audition can in principle acquire different solutions to a given problem. The constraints of the task, however, can nudge the cognitive science and engineering of audition to qualitatively converge, suggesting that a closer mutual examination would potentially enrich artificial hearing systems and process models of the mind and brain. Speech recognition - an area ripe for such exploration - is inherently robust in humans to a number of transformations at various spectrotemporal granularities. To what extent are these robustness profiles accounted for by high-performing neural network systems? We bring together experiments in speech recognition under a single synthesis framework to evaluate state-of-the-art neural networks as stimulus-computable, optimized observers. In a series of experiments, we (1) clarify how influential speech manipulations in the literature relate to each other and to natural speech, (2) show the granularities at which machines exhibit out-of-distribution robustness, reproducing classical perceptual phenomena in humans, (3) identify the specific conditions where model predictions of human performance differ, and (4) demonstrate a crucial failure of all artificial systems to perceptually recover where humans do, suggesting alternative directions for theory and model building. These findings encourage a tighter synergy between the cognitive science and engineering of audition.
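As a flavor of the stimulus manipulations such a synthesis framework covers, the sketch below implements two classical ones, periodic interruption and uniform time compression. The file name, parameter values, and the downstream scoring step are illustrative assumptions, not the paper's code.

```python
# A sketch (not the paper's code) of two classical speech manipulations:
# periodically interrupted speech and uniform time compression. File name,
# parameters, and the downstream scoring step are illustrative assumptions.
import numpy as np
import librosa

y, sr = librosa.load("utterance.wav", sr=16000)  # hypothetical recording

def interrupt(y, sr, rate_hz=10.0, duty=0.5):
    """Gate the waveform on and off periodically (square-wave interruption)."""
    t = np.arange(len(y)) / sr
    gate = ((t * rate_hz) % 1.0) < duty
    return y * gate

def time_compress(y, factor=2.0):
    """Uniform time compression without pitch change (phase vocoder)."""
    return librosa.effects.time_stretch(y, rate=factor)

# Each manipulated waveform would then be transcribed by an ASR network and
# scored (e.g., word error rate) against human intelligibility data.
stimuli = {
    "interrupted_10Hz": interrupt(y, sr),
    "compressed_2x": time_compress(y),
}
```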
Affiliation(s)
- Federico Adolfi
- Ernst Strüngmann Institute (ESI) for Neuroscience in Cooperation with Max Planck Society, Frankfurt, Germany; University of Bristol, School of Psychological Science, Bristol, United Kingdom.
- Jeffrey S Bowers
- University of Bristol, School of Psychological Science, Bristol, United Kingdom
- David Poeppel
- Ernst Strüngmann Institute (ESI) for Neuroscience in Cooperation with Max Planck Society, Frankfurt, Germany; Department of Psychology, New York University, NY, United States; Max Planck NYU Center for Language, Music, and Emotion, Frankfurt, Germany and New York, NY, United States
33
Clark KB. Neural Field Continuum Limits and the Structure-Function Partitioning of Cognitive-Emotional Brain Networks. Biology 2023; 12:352. PMID: 36979044. PMCID: PMC10045557. DOI: 10.3390/biology12030352.
Abstract
In The cognitive-emotional brain, Pessoa overlooks continuum effects on nonlinear brain network connectivity by eschewing neural field theories and physiologically derived constructs representative of neuronal plasticity. The absence of this content, which is so very important for understanding the dynamic structure-function embedding and partitioning of brains, diminishes the rich competitive and cooperative nature of neural networks and trivializes Pessoa's arguments, and similar arguments by other authors, on the phylogenetic and operational significance of an optimally integrated brain filled with variable-strength neural connections. Riemannian neuromanifolds, containing limit-imposing metaplastic Hebbian- and anti-Hebbian-type control variables, simulate scalable network behavior that is difficult to capture from the simpler graph-theoretic analysis preferred by Pessoa and other neuroscientists. Field theories suggest the partitioning and performance benefits of embedded cognitive-emotional networks that optimally evolve between exotic classical and quantum computational phases, where matrix singularities and condensations produce degenerate structure-function homogeneities unrealistic of healthy brains. Some network partitioning, as opposed to unconstrained embeddedness, is thus required for effective execution of cognitive-emotional network functions and, in our new era of neuroscience, should be considered a critical aspect of proper brain organization and operation.
Affiliation(s)
- Kevin B. Clark
- Cures Within Reach, Chicago, IL 60602, USA
- Felidae Conservation Fund, Mill Valley, CA 94941, USA
- Campus and Domain Champions Program, Multi-Tier Assistance, Training, and Computational Help (MATCH) Track, National Science Foundation’s Advanced Cyberinfrastructure Coordination Ecosystem: Services and Support (ACCESS), https://access-ci.org/
- Expert Network, Penn Center for Innovation, University of Pennsylvania, Philadelphia, PA 19104, USA
- Network for Life Detection (NfoLD), NASA Astrobiology Program, NASA Ames Research Center, Mountain View, CA 94035, USA
- Multi-Omics and Systems Biology & Artificial Intelligence and Machine Learning Analysis Working Groups, NASA GeneLab, NASA Ames Research Center, Mountain View, CA 94035, USA
- Frontier Development Lab, NASA Ames Research Center, Mountain View, CA 94035, USA & SETI Institute, Mountain View, CA 94043, USA
- Peace Innovation Institute, The Hague 2511, Netherlands & Stanford University, Palo Alto, CA 94305, USA
- Shared Interest Group for Natural and Artificial Intelligence (sigNAI), Max Planck Alumni Association, 14057 Berlin, Germany
- Biometrics and Nanotechnology Councils, Institute for Electrical and Electronics Engineers (IEEE), New York, NY 10016, USA