1
DiCarlo JJ, Yamins DLK, Ferguson ME, Fedorenko E, Bethge M, Bonnen T, Schrimpf M. Let's move forward: Image-computable models and a common model evaluation scheme are prerequisites for a scientific understanding of human vision. Behav Brain Sci 2023; 46:e390. [PMID: 38054303 DOI: 10.1017/s0140525x23001607] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Indexed: 12/07/2023]
Abstract
In the target article, Bowers et al. dispute deep artificial neural network (ANN) models as the currently leading models of human vision without producing alternatives. They eschew the use of public benchmarking platforms to compare vision models with the brain and behavior, and they advocate for a fragmented, phenomenon-specific modeling approach. These are unconstructive to scientific progress. We outline how the Brain-Score community is moving forward to add new model-to-human comparisons to its community-transparent suite of benchmarks.
Affiliation(s)
- James J DiCarlo
- Dept. of Brain and Cognitive Sciences, Quest for Intelligence, and McGovern Institute for Brain Research, Massachusetts Institute of Technology, Cambridge, MA, USA ; https://dicarlolab.mit.edu ; https://evlab.mit.edu/ ; https://mschrimpf.com/
- Daniel L K Yamins
- Wu Tsai Neurosciences Institute, Stanford University, Stanford, CA, USA ; http://neuroailab.stanford.edu/research.html
- Michael E Ferguson
- Dept. of Brain and Cognitive Sciences, Quest for Intelligence, and McGovern Institute for Brain Research, Massachusetts Institute of Technology, Cambridge, MA, USA ; https://dicarlolab.mit.edu ; https://evlab.mit.edu/ ; https://mschrimpf.com/
- Evelina Fedorenko
- Dept. of Brain and Cognitive Sciences, Quest for Intelligence, and McGovern Institute for Brain Research, Massachusetts Institute of Technology, Cambridge, MA, USA ; https://dicarlolab.mit.edu ; https://evlab.mit.edu/ ; https://mschrimpf.com/
- Matthias Bethge
- Tübingen AI Center, University of Tübingen, Tübingen, Germany ; https://bethgelab.org/
- Tyler Bonnen
- Wu Tsai Neurosciences Institute, Stanford University, Stanford, CA, USA ; http://neuroailab.stanford.edu/research.html
- Martin Schrimpf
- Dept. of Brain and Cognitive Sciences, Quest for Intelligence, and McGovern Institute for Brain Research, Massachusetts Institute of Technology, Cambridge, MA, USA ; https://dicarlolab.mit.edu ; https://evlab.mit.edu/ ; https://mschrimpf.com/
- École polytechnique fédérale de Lausanne, Lausanne, Switzerland
2
Bowers JS, Malhotra G, Dujmović M, Montero ML, Tsvetkov C, Biscione V, Puebla G, Adolfi F, Hummel JE, Heaton RF, Evans BD, Mitchell J, Blything R. Clarifying status of DNNs as models of human vision. Behav Brain Sci 2023; 46:e415. [PMID: 38054298 DOI: 10.1017/s0140525x23002777] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 12/07/2023]
Abstract
On several key issues we agree with the commentators. Perhaps most importantly, everyone seems to agree that psychology has an important role to play in building better models of human vision, and (most) everyone agrees (including us) that deep neural networks (DNNs) will play an important role in modelling human vision going forward. But there are also disagreements about what models are for, how DNN-human correspondences should be evaluated, the value of alternative modelling approaches, and the impact of marketing hype in the literature. In our view, these latter issues are contributing to many unjustified claims regarding DNN-human correspondences in vision and other domains of cognition. We explore all these issues in this response.
Affiliation(s)
- Jeffrey S Bowers
- School of Psychological Science, University of Bristol, Bristol, UK ; https://jeffbowers.blogs.bristol.ac.uk/
- Gaurav Malhotra
- School of Psychological Science, University of Bristol, Bristol, UK ; https://jeffbowers.blogs.bristol.ac.uk/
- Marin Dujmović
- School of Psychological Science, University of Bristol, Bristol, UK ; https://jeffbowers.blogs.bristol.ac.uk/
- Milton L Montero
- School of Psychological Science, University of Bristol, Bristol, UK ; https://jeffbowers.blogs.bristol.ac.uk/
- Christian Tsvetkov
- School of Psychological Science, University of Bristol, Bristol, UK ; https://jeffbowers.blogs.bristol.ac.uk/
- Valerio Biscione
- School of Psychological Science, University of Bristol, Bristol, UK ; https://jeffbowers.blogs.bristol.ac.uk/
- Federico Adolfi
- Ernst Strüngmann Institute (ESI) for Neuroscience in Cooperation with Max Planck Society, Frankfurt am Main, Germany
- John E Hummel
- Psychology Department, University of Illinois Urbana-Champaign, Champaign, IL, USA
- Rachel F Heaton
- Psychology Department, University of Illinois Urbana-Champaign, Champaign, IL, USA
- Benjamin D Evans
- Department of Informatics, School of Engineering and Informatics, University of Sussex, Brighton, UK
- Jeffrey Mitchell
- Department of Informatics, School of Engineering and Informatics, University of Sussex, Brighton, UK
- Ryan Blything
- School of Psychology, Aston University, Birmingham, UK
3
Bowers JS, Malhotra G, Dujmović M, Llera Montero M, Tsvetkov C, Biscione V, Puebla G, Adolfi F, Hummel JE, Heaton RF, Evans BD, Mitchell J, Blything R. Deep problems with neural network models of human vision. Behav Brain Sci 2022; 46:e385. [PMID: 36453586 DOI: 10.1017/s0140525x22002813] [Citation(s) in RCA: 25] [Impact Index Per Article: 12.5] [Indexed: 12/03/2022]
Abstract
Deep neural networks (DNNs) have had extraordinary successes in classifying photographic images of objects and are often described as the best models of biological vision. This conclusion is largely based on three sets of findings: (1) DNNs are more accurate than any other model in classifying images taken from various datasets, (2) DNNs do the best job in predicting the pattern of human errors in classifying objects taken from various behavioral datasets, and (3) DNNs do the best job in predicting brain signals in response to images taken from various brain datasets (e.g., single cell responses or fMRI data). However, these behavioral and brain datasets do not test hypotheses regarding what features are contributing to good predictions and we show that the predictions may be mediated by DNNs that share little overlap with biological vision. More problematically, we show that DNNs account for almost no results from psychological research. This contradicts the common claim that DNNs are good, let alone the best, models of human object recognition. We argue that theorists interested in developing biologically plausible models of human vision need to direct their attention to explaining psychological findings. More generally, theorists need to build models that explain the results of experiments that manipulate independent variables designed to test hypotheses rather than compete on making the best predictions. We conclude by briefly summarizing various promising modeling approaches that focus on psychological data.
Affiliation(s)
- Jeffrey S Bowers
- School of Psychological Science, University of Bristol, Bristol, UK ; https://jeffbowers.blogs.bristol.ac.uk/
- Gaurav Malhotra
- School of Psychological Science, University of Bristol, Bristol, UK ; https://jeffbowers.blogs.bristol.ac.uk/
- Marin Dujmović
- School of Psychological Science, University of Bristol, Bristol, UK ; https://jeffbowers.blogs.bristol.ac.uk/
- Milton Llera Montero
- School of Psychological Science, University of Bristol, Bristol, UK ; https://jeffbowers.blogs.bristol.ac.uk/
- Christian Tsvetkov
- School of Psychological Science, University of Bristol, Bristol, UK ; https://jeffbowers.blogs.bristol.ac.uk/
- Valerio Biscione
- School of Psychological Science, University of Bristol, Bristol, UK ; https://jeffbowers.blogs.bristol.ac.uk/
- Guillermo Puebla
- School of Psychological Science, University of Bristol, Bristol, UK ; https://jeffbowers.blogs.bristol.ac.uk/
- Federico Adolfi
- School of Psychological Science, University of Bristol, Bristol, UK ; https://jeffbowers.blogs.bristol.ac.uk/
- Ernst Strüngmann Institute (ESI) for Neuroscience in Cooperation with Max Planck Society, Frankfurt am Main, Germany
- John E Hummel
- Department of Psychology, University of Illinois Urbana-Champaign, Champaign, IL, USA
- Rachel F Heaton
- Department of Psychology, University of Illinois Urbana-Champaign, Champaign, IL, USA
- Benjamin D Evans
- Department of Informatics, School of Engineering and Informatics, University of Sussex, Brighton, UK
- Jeffrey Mitchell
- Department of Informatics, School of Engineering and Informatics, University of Sussex, Brighton, UK
- Ryan Blything
- School of Psychology, Aston University, Birmingham, UK
4
Zerroug A, Vaishnav M, Colin J, Musslick S, Serre T. A Benchmark for Compositional Visual Reasoning. Adv Neural Inf Process Syst 2022; 35:29776-29788. [PMID: 37534101 PMCID: PMC10396074] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 08/04/2023]
Abstract
A fundamental component of human vision is our ability to parse complex visual scenes and judge the relations between their constituent objects. AI benchmarks for visual reasoning have driven rapid progress in recent years with state-of-the-art systems now reaching human accuracy on some of these benchmarks. Yet, there remains a major gap between humans and AI systems in terms of the sample efficiency with which they learn new visual reasoning tasks. Humans' remarkable efficiency at learning has been at least partially attributed to their ability to harness compositionality - allowing them to efficiently take advantage of previously gained knowledge when learning new tasks. Here, we introduce a novel visual reasoning benchmark, Compositional Visual Relations (CVR), to drive progress towards the development of more data-efficient learning algorithms. We take inspiration from fluid intelligence and non-verbal reasoning tests and describe a novel method for creating compositions of abstract rules and generating image datasets corresponding to these rules at scale. Our proposed benchmark includes measures of sample efficiency, generalization, compositionality, and transfer across task rules. We systematically evaluate modern neural architectures and find that convolutional architectures surpass transformer-based architectures across all performance measures in most data regimes. However, all computational models are much less data efficient than humans, even after learning informative visual representations using self-supervision. Overall, we hope our challenge will spur interest in developing neural architectures that can learn to harness compositionality for more efficient learning.
Affiliation(s)
- Aimen Zerroug
- Artificial and Natural Intelligence Toulouse Institute, Université de Toulouse, France
- Carney Institute for Brain Science, Dept. of Cognitive Linguistic & Psychological Sciences, Brown University, Providence, RI 02912
- Centre de Recherche Cerveau et Cognition, CNRS, Université de Toulouse, France
- Mohit Vaishnav
- Artificial and Natural Intelligence Toulouse Institute, Université de Toulouse, France
- Carney Institute for Brain Science, Dept. of Cognitive Linguistic & Psychological Sciences, Brown University, Providence, RI 02912
- Centre de Recherche Cerveau et Cognition, CNRS, Université de Toulouse, France
- Julien Colin
- Carney Institute for Brain Science, Dept. of Cognitive Linguistic & Psychological Sciences, Brown University, Providence, RI 02912
- Sebastian Musslick
- Carney Institute for Brain Science, Dept. of Cognitive Linguistic & Psychological Sciences, Brown University, Providence, RI 02912
- Thomas Serre
- Artificial and Natural Intelligence Toulouse Institute, Université de Toulouse, France
- Carney Institute for Brain Science, Dept. of Cognitive Linguistic & Psychological Sciences, Brown University, Providence, RI 02912