1. Bowers JS, Malhotra G, Dujmović M, Llera Montero M, Tsvetkov C, Biscione V, Puebla G, Adolfi F, Hummel JE, Heaton RF, Evans BD, Mitchell J, Blything R. Deep problems with neural network models of human vision. Behav Brain Sci 2022; 46:e385. [PMID: 36453586] [DOI: 10.1017/s0140525x22002813]
Abstract
Deep neural networks (DNNs) have had extraordinary successes in classifying photographic images of objects and are often described as the best models of biological vision. This conclusion is largely based on three sets of findings: (1) DNNs are more accurate than any other model in classifying images taken from various datasets, (2) DNNs do the best job in predicting the pattern of human errors in classifying objects taken from various behavioral datasets, and (3) DNNs do the best job in predicting brain signals in response to images taken from various brain datasets (e.g., single cell responses or fMRI data). However, these behavioral and brain datasets do not test hypotheses regarding what features are contributing to good predictions and we show that the predictions may be mediated by DNNs that share little overlap with biological vision. More problematically, we show that DNNs account for almost no results from psychological research. This contradicts the common claim that DNNs are good, let alone the best, models of human object recognition. We argue that theorists interested in developing biologically plausible models of human vision need to direct their attention to explaining psychological findings. More generally, theorists need to build models that explain the results of experiments that manipulate independent variables designed to test hypotheses rather than compete on making the best predictions. We conclude by briefly summarizing various promising modeling approaches that focus on psychological data.
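The brain-prediction benchmarks referred to in this abstract typically fit a linear map from DNN activations to recorded responses and report held-out prediction accuracy. The sketch below illustrates that generic recipe only; the array shapes, the ridge penalty, and the correlation score are illustrative assumptions, not the procedure of any particular benchmark or of this paper.

```python
# Minimal sketch of a linear encoding model: predict brain responses
# (e.g., fMRI voxels or firing rates) from DNN features.
# Shapes, the ridge penalty, and the random data are illustrative assumptions.
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n_images, n_features, n_voxels = 1000, 512, 100

dnn_features = rng.standard_normal((n_images, n_features))   # DNN layer activations per image
brain_responses = rng.standard_normal((n_images, n_voxels))  # measured responses per image

X_train, X_test, y_train, y_test = train_test_split(
    dnn_features, brain_responses, test_size=0.2, random_state=0)

model = Ridge(alpha=1.0).fit(X_train, y_train)   # one linear map per voxel
pred = model.predict(X_test)

# Score: correlation between predicted and observed responses per voxel,
# the number usually reported as "prediction accuracy".
scores = [np.corrcoef(pred[:, v], y_test[:, v])[0, 1] for v in range(n_voxels)]
print(f"median held-out correlation: {np.median(scores):.3f}")
```

The authors' point is that a high score from such a fit does not by itself reveal which features drive the prediction.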
Affiliation(s)
- Jeffrey S Bowers: School of Psychological Science, University of Bristol, Bristol, UK; https://jeffbowers.blogs.bristol.ac.uk/
- Gaurav Malhotra: School of Psychological Science, University of Bristol, Bristol, UK; https://jeffbowers.blogs.bristol.ac.uk/
- Marin Dujmović: School of Psychological Science, University of Bristol, Bristol, UK; https://jeffbowers.blogs.bristol.ac.uk/
- Milton Llera Montero: School of Psychological Science, University of Bristol, Bristol, UK; https://jeffbowers.blogs.bristol.ac.uk/
- Christian Tsvetkov: School of Psychological Science, University of Bristol, Bristol, UK; https://jeffbowers.blogs.bristol.ac.uk/
- Valerio Biscione: School of Psychological Science, University of Bristol, Bristol, UK; https://jeffbowers.blogs.bristol.ac.uk/
- Guillermo Puebla: School of Psychological Science, University of Bristol, Bristol, UK; https://jeffbowers.blogs.bristol.ac.uk/
- Federico Adolfi: School of Psychological Science, University of Bristol, Bristol, UK; https://jeffbowers.blogs.bristol.ac.uk/; Ernst Strüngmann Institute (ESI) for Neuroscience in Cooperation with Max Planck Society, Frankfurt am Main, Germany
- John E Hummel: Department of Psychology, University of Illinois Urbana-Champaign, Champaign, IL, USA
- Rachel F Heaton: Department of Psychology, University of Illinois Urbana-Champaign, Champaign, IL, USA
- Benjamin D Evans: Department of Informatics, School of Engineering and Informatics, University of Sussex, Brighton, UK
- Jeffrey Mitchell: Department of Informatics, School of Engineering and Informatics, University of Sussex, Brighton, UK
- Ryan Blything: School of Psychology, Aston University, Birmingham, UK
2. Yeaton JD, Grainger J. Positional cueing, string location variability, and letter-in-string identification. Acta Psychol (Amst) 2022; 223:103510. [PMID: 35077951] [DOI: 10.1016/j.actpsy.2022.103510]
Abstract
In three experiments we measured accuracy in identifying a single letter among a string of five briefly presented consonants followed by a post-mask. The position of the to-be-identified letter was indicated either by an ordinal cue (e.g., position 2) or by an underscore cue marking the corresponding spatial location. In Experiment 1 the ordinal cue was presented prior to the onset of the letter string, and the underscore cue was presented at string offset. In Experiments 2 and 3, both the ordinal and the underscore cues were pre-cues. In all experiments, letter strings could either appear centered on fixation or be shifted randomly to the left or to the right. Participants were tested in separate blocks of trials for each of the four conditions generated by the combination of cue type and string-location variability. In Experiment 1, letter identification accuracy was higher with ordinal cues and with fixed string locations, and ordinal cueing was more affected by string-location variability. In Experiments 2 and 3, letter identification accuracy was higher with underscore pre-cues. We conclude that under conditions of brief stimulus durations (100 ms) and backward masking, letter-in-string identification accuracy is determined by read-out from location-specific letter detectors, independently of the type of cueing. Differences in the effectiveness of the two cue types reflect differences in the ease of isolating a given gaze-centered location and in the ease with which attention can be directed to that location.
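The conclusion about read-out from location-specific letter detectors can be pictured with a toy simulation: each retinal location has its own letter detector whose accuracy falls off with eccentricity, and the cue simply selects which location to read out. Everything below (the accuracy gradient, the guessing rule, the shift sizes) is an assumed illustration, not a model fitted in the paper.

```python
# Toy sketch of read-out from location-specific letter detectors.
# Identification probability per retinal location follows an assumed
# eccentricity gradient; all parameters are made up for illustration.
import random
import string

def detector_accuracy(retinal_position, fixation=0.0, scale=0.15):
    """Assumed drop in letter-detector accuracy with distance from fixation."""
    return max(0.2, 1.0 - scale * abs(retinal_position - fixation))

def trial(string_shift=0.0, cued_index=1, n_letters=5):
    letters = random.choices(string.ascii_uppercase, k=n_letters)
    # Retinal positions of the five letters, centered on fixation plus any shift.
    positions = [string_shift + (i - (n_letters - 1) / 2) for i in range(n_letters)]
    p_correct = detector_accuracy(positions[cued_index])
    reported = letters[cued_index] if random.random() < p_correct else random.choice(string.ascii_uppercase)
    return reported == letters[cued_index]

random.seed(1)
for shift in (0.0, 1.0, -1.0):  # fixed string location vs. right/left displacement
    acc = sum(trial(string_shift=shift) for _ in range(10000)) / 10000
    print(f"shift {shift:+.1f}: accuracy {acc:.3f}")
```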
3. Biscione V, Bowers JS. Learning online visual invariances for novel objects via supervised and self-supervised training. Neural Netw 2022; 150:222-236. [DOI: 10.1016/j.neunet.2022.02.017]
4. Blything R, Biscione V, Vankov II, Ludwig CJH, Bowers JS. The human visual system and CNNs can both support robust online translation tolerance following extreme displacements. J Vis 2021; 21:9. [PMID: 33620380] [PMCID: PMC7910631] [DOI: 10.1167/jov.21.2.9]
Abstract
Visual translation tolerance refers to our capacity to recognize objects over a wide range of different retinal locations. Although translation is perhaps the simplest spatial transform that the visual system needs to cope with, the extent to which the human visual system can identify objects at previously unseen locations is unclear, with some studies reporting near-complete invariance over 10 degrees and others reporting zero invariance at 4 degrees of visual angle. Similarly, there is confusion regarding the extent of translation tolerance in computational models of vision, as well as the degree of match between human and model performance. Here, we report a series of eye-tracking studies (total N = 70) demonstrating that novel objects trained at one retinal location can be recognized at high accuracy rates following translations up to 18 degrees. We also show that standard deep convolutional neural networks (DCNNs) support our findings when pretrained to classify another set of stimuli across a range of locations, or when a global average pooling (GAP) layer is added to produce larger receptive fields. Our findings provide a strong constraint for theories of human vision and help explain inconsistent findings previously reported with convolutional neural networks (CNNs).
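The global average pooling (GAP) manipulation mentioned in the abstract can be sketched as follows: averaging the final convolutional feature maps over space gives the classifier a single location-free feature vector, so an object learned at one position produces a similar read-out after large translations. This is a generic PyTorch illustration with arbitrary layer sizes, not the authors' training setup.

```python
# Sketch of the GAP idea: a convolutional feature extractor followed by a
# spatial average, so the classifier sees (nearly) the same feature vector
# wherever the object appears. Layer sizes are arbitrary assumptions.
import torch
import torch.nn as nn

class GapCNN(nn.Module):
    def __init__(self, n_classes=10):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 32, kernel_size=3, padding=1), nn.ReLU(),
            nn.MaxPool2d(2),
            nn.Conv2d(32, 64, kernel_size=3, padding=1), nn.ReLU(),
        )
        self.gap = nn.AdaptiveAvgPool2d(1)   # average each feature map over all positions
        self.classifier = nn.Linear(64, n_classes)

    def forward(self, x):
        x = self.features(x)
        x = self.gap(x).flatten(1)           # (batch, 64): spatial location pooled away
        return self.classifier(x)

torch.manual_seed(0)
model = GapCNN().eval()
patch = torch.rand(3, 8, 8)                  # a small "object" on a blank background
left, right = torch.zeros(1, 3, 64, 64), torch.zeros(1, 3, 64, 64)
left[0, :, 10:18, 10:18] = patch             # object on the left
right[0, :, 10:18, 40:48] = patch            # same object translated to the right
with torch.no_grad():
    same = torch.allclose(model(left), model(right), atol=1e-5)
print("logits (nearly) identical across the translation:", same)
```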
Affiliation(s)
- Ryan Blything: School of Psychological Science, University of Bristol, Bristol, UK
- Valerio Biscione: School of Psychological Science, University of Bristol, Bristol, UK
- Ivan I Vankov: Department of Cognitive Science and Psychology, New Bulgarian University, Sofia, Bulgaria
- Jeffrey S Bowers: School of Psychological Science, University of Bristol, Bristol, UK
5. Grainger J. Orthographic processing: A 'mid-level' vision of reading: The 44th Sir Frederic Bartlett Lecture. Q J Exp Psychol (Hove) 2018; 71:335-359. [PMID: 28376655] [DOI: 10.1080/17470218.2017.1314515]
Abstract
I will describe how orthographic processing acts as a central interface between visual and linguistic processing during reading, and as such can be considered the 'mid-level vision' of reading research. In order to make this case, I first summarize the evidence in favour of letter-based word recognition before examining work investigating how orthographic similarities among words influence single-word reading. I describe how evidence gradually accumulated against traditional measures of orthographic similarity and the associated theories of orthographic processing, forcing a reconsideration of how letter-position information is represented by skilled readers. Then, I present the theoretical framework that was developed to explain these findings, with a focus on the distinction between location-specific and location-invariant orthographic representations. Finally, I describe work extending this theoretical framework in two main directions: first, to the realm of reading development, with the aim of specifying the key changes in the processing of letters and letter strings that accompany successful learning to read, and second, to the realm of sentence reading, in order to specify how orthographic information can be processed across several words in parallel, and how skilled readers keep track of which letters belong to which words.
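The distinction between location-specific and location-invariant orthographic representations can be made concrete with a toy encoding: the same word falling at two retinal positions yields different location-specific letter codes but an identical location-invariant code. The ordered-letter-pair scheme below is one proposal from this literature, used here purely as an illustration.

```python
# Toy illustration (assumed encodings, not a specific published model):
# location-specific codes tag each letter with its absolute gaze-centred
# position, while a location-invariant code keeps only within-word ordering
# (here, ordered letter pairs).
from itertools import combinations

def location_specific(word, retinal_offset):
    """Letter detectors indexed by absolute (gaze-centred) position."""
    return {(retinal_offset + i, letter) for i, letter in enumerate(word)}

def location_invariant(word):
    """Ordered letter pairs: the same code wherever the word falls on the retina."""
    return {a + b for a, b in combinations(word, 2)}

print(location_specific("cart", 0))   # contains (0,'c'), (1,'a'), (2,'r'), (3,'t')
print(location_specific("cart", 3))   # a different code after a 3-letter shift
print(location_invariant("cart"))     # 'ca','cr','ct','ar','at','rt' at either location
```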
Affiliation(s)
- Jonathan Grainger: Laboratoire de Psychologie Cognitive, Aix-Marseille University & CNRS, Marseille, France
6. Bowers JS. Parallel Distributed Processing Theory in the Age of Deep Networks. Trends Cogn Sci 2017; 21:950-961. [PMID: 29100738] [DOI: 10.1016/j.tics.2017.09.013]
Abstract
Parallel distributed processing (PDP) models in psychology are the precursors of deep networks used in computer science. However, only PDP models are associated with two core psychological claims, namely that all knowledge is coded in a distributed format and cognition is mediated by non-symbolic computations. These claims have long been debated in cognitive science, and recent work with deep networks speaks to this debate. Specifically, single-unit recordings show that deep networks learn units that respond selectively to meaningful categories, and researchers are finding that deep networks need to be supplemented with symbolic systems to perform some tasks. Given the close links between PDP and deep networks, it is surprising that research with deep networks is challenging PDP theory.
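The 'single-unit recordings' referred to here amount to probing individual hidden units with many stimuli and asking how selectively each unit responds to one category. The sketch below computes one commonly used selectivity index on random placeholder data; both the index and the data are illustrative assumptions.

```python
# Sketch of a single-unit selectivity analysis: record a hidden unit's mean
# activation per category and compute (max-class mean - mean of other classes)
# / (their sum). Data are random placeholders.
import numpy as np

def class_selectivity(activations, labels):
    """activations: (n_stimuli,) for one unit; labels: (n_stimuli,) class ids."""
    class_means = np.array([activations[labels == c].mean() for c in np.unique(labels)])
    mu_max = class_means.max()
    mu_rest = np.delete(class_means, class_means.argmax()).mean()
    return (mu_max - mu_rest) / (mu_max + mu_rest + 1e-12)

rng = np.random.default_rng(0)
labels = rng.integers(0, 10, size=2000)          # 10 stimulus categories
unit = rng.random(2000) + 0.5 * (labels == 3)    # a unit that prefers category 3
print(f"selectivity index: {class_selectivity(unit, labels):.2f}")  # clearly above 0
```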
Affiliation(s)
- Jeffrey S Bowers: School of Experimental Psychology, University of Bristol, 12a Priory Road, Bristol, BS8 1TU, UK