1. Huang J, Prijatelj D, Dulay J, Scheirer W. Measuring Human Perception to Improve Open Set Recognition. IEEE Trans Pattern Anal Mach Intell 2023; 45:11382-11389. [PMID: 37104111] [DOI: 10.1109/tpami.2023.3270772]
Abstract
The human ability to recognize when an object belongs or does not belong to a particular vision task outperforms all open set recognition algorithms. Human perception as measured by the methods and procedures of visual psychophysics from psychology provides an additional data stream for algorithms that need to manage novelty. For instance, measured reaction time from human subjects can offer insight as to whether a class sample is prone to be confused with a different class - known or novel. In this work, we designed and performed a large-scale behavioral experiment that collected over 200,000 human reaction time measurements associated with object recognition. The data collected indicated that reaction time varies meaningfully across objects at the sample level. We therefore designed a new psychophysical loss function that enforces consistency between deep networks and human behavior, which exhibits variable reaction time for different images. As in biological vision, this approach allows us to achieve good open set recognition performance in regimes with limited labeled training data. In experiments using data from ImageNet, training Multi-Scale DenseNets with this new formulation yielded significant improvements: top-1 validation accuracy rose by 6.02%, top-1 test accuracy on known samples by 9.81%, and top-1 test accuracy on unknown samples by 33.18%. We compared our method to 10 open set recognition methods from the literature, all of which were outperformed on multiple metrics.
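To make the idea of reaction-time-informed training concrete, the following is a minimal sketch of how per-image human reaction times could modulate a standard classification loss. It is an illustration under assumed inputs, not the authors' published formulation; the weighting scheme and the rt_min/rt_max bounds are hypothetical choices.

```python
# Illustrative sketch only -- not the authors' published loss. It assumes a per-image
# tensor of mean human reaction times (rt, in seconds) is available and uses it to
# modulate cross-entropy: images humans recognize quickly (low RT) are treated as
# "easy" and penalized more strongly when the network errs; slow-RT (ambiguous)
# images are down-weighted.
import torch
import torch.nn.functional as F

def psychophysical_loss(logits, targets, rt, rt_min=0.3, rt_max=2.0):
    """logits: (N, C) class scores; targets: (N,) labels; rt: (N,) mean human RT."""
    per_sample_ce = F.cross_entropy(logits, targets, reduction="none")  # (N,)
    # Map RT to a weight in [0, 1]: fast responses -> weight near 1, slow -> near 0.
    rt_norm = (rt.clamp(rt_min, rt_max) - rt_min) / (rt_max - rt_min)
    weights = 1.0 - rt_norm
    return (weights * per_sample_ce).mean()
```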
2. Liang X, Chen X, Ren K, Miao X, Chen Z, Jin Y. Low-light image enhancement via adaptive frequency decomposition network. Sci Rep 2023; 13:14107. [PMID: 37644042] [PMCID: PMC10465598] [DOI: 10.1038/s41598-023-40899-8]
Abstract
Images captured in low light conditions suffer from low visibility, blurred details and strong noise, resulting in an unpleasant visual appearance and poor performance on high-level visual tasks. To address these problems, existing approaches have attempted to enhance the visibility of low-light images using convolutional neural networks (CNNs). However, because they give insufficient consideration to the characteristics of information at different frequency layers in the image, most of them yield blurry details and amplified noise. In this work, to fully extract and utilize this information, we propose a novel Adaptive Frequency Decomposition Network (AFDNet) for low-light image enhancement. An Adaptive Frequency Decomposition (AFD) module is designed to adaptively extract low- and high-frequency information of different granularities. Specifically, the low-frequency information is employed for contrast enhancement and noise suppression in low-scale space, while the high-frequency information is used for detail restoration in high-scale space. Meanwhile, a new frequency loss function is proposed to guarantee AFDNet's recovery capability for different frequency information. Extensive experiments on various publicly available datasets show that AFDNet outperforms the existing state-of-the-art methods both quantitatively and visually. In addition, our results show that the performance of face detection can be effectively improved by using AFDNet as pre-processing.
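For readers unfamiliar with frequency decomposition in this context, here is a minimal sketch of the generic idea: split an image into low- and high-frequency bands and supervise each band separately. It is an illustration only, not the AFDNet code; the Gaussian low-pass split and the band weights are assumptions.

```python
# Illustrative sketch only -- not AFDNet. Low frequencies come from a Gaussian blur
# (low pass), high frequencies from the residual, and an L1 penalty is applied to
# each band so illumination/contrast and fine detail are supervised independently.
import torch
import torch.nn.functional as F

def gaussian_kernel(size=5, sigma=1.5):
    coords = torch.arange(size, dtype=torch.float32) - (size - 1) / 2
    g = torch.exp(-(coords ** 2) / (2 * sigma ** 2))
    g = (g / g.sum()).unsqueeze(0)               # (1, size)
    return (g.T @ g).view(1, 1, size, size)      # (1, 1, k, k)

def frequency_split(img, kernel):
    """img: (N, C, H, W). Depthwise low-pass filtering per channel."""
    c = img.shape[1]
    low = F.conv2d(img, kernel.expand(c, 1, -1, -1),
                   padding=kernel.shape[-1] // 2, groups=c)
    return low, img - low                         # low-frequency, high-frequency

def frequency_loss(pred, target, kernel, w_low=1.0, w_high=1.0):
    pl, ph = frequency_split(pred, kernel)
    tl, th = frequency_split(target, kernel)
    return w_low * F.l1_loss(pl, tl) + w_high * F.l1_loss(ph, th)
```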
Affiliation(s)
- Xiwen Liang, Xiaoyan Chen, Keying Ren, Xia Miao, Zhihui Chen, and Yutao Jin: School of Electronic Information and Automation, Tianjin University of Science and Technology, Tianjin, 300222, China
3. Grieggs S, Shen B, Rauch G, Li P, Ma J, Chiang D, Price B, Scheirer WJ. Measuring Human Perception to Improve Handwritten Document Transcription. IEEE Trans Pattern Anal Mach Intell 2022; 44:6594-6601. [PMID: 34170823] [DOI: 10.1109/tpami.2021.3092688]
Abstract
In this paper, we consider how to incorporate psychophysical measurements of human visual perception into the loss function of a deep neural network being trained for a recognition task, under the assumption that such information can reduce errors. As a case study to assess the viability of this approach, we look at the problem of handwritten document transcription. While good progress has been made towards automatically transcribing modern handwriting, significant challenges remain in transcribing historical documents. Here we describe a general enhancement strategy, underpinned by the new loss formulation, which can be applied to the training regime of any deep learning-based document transcription system. Through experimentation, reliable performance improvement is demonstrated for the standard IAM and RIMES datasets for three different network architectures. Further, we go on to show feasibility for our approach on a new dataset of digitized Latin manuscripts, originally produced by scribes in the Cloister of St. Gall in the 9th century.
4. Pramod RT, Arun SP. Improving Machine Vision Using Human Perceptual Representations: The Case of Planar Reflection Symmetry for Object Classification. IEEE Trans Pattern Anal Mach Intell 2022; 44:228-241. [PMID: 32750809] [PMCID: PMC7611439] [DOI: 10.1109/tpami.2020.3008107]
Abstract
Achieving human-like visual abilities is a holy grail for machine vision, yet precisely how insights from human vision can improve machines has remained unclear. Here, we demonstrate two key conceptual advances: First, we show that most machine vision models are systematically different from human object perception. To do so, we collected a large dataset of perceptual distances between isolated objects in humans and asked whether these perceptual data can be predicted by many common machine vision algorithms. We found that while the best algorithms explain ∼ 70 percent of the variance in the perceptual data, all the algorithms we tested make systematic errors on several types of objects. In particular, machine algorithms underestimated distances between symmetric objects compared to human perception. Second, we show that fixing these systematic biases can lead to substantial gains in classification performance. In particular, augmenting a state-of-the-art convolutional neural network with planar/reflection symmetry scores along multiple axes produced significant improvements in classification accuracy (1-10 percent) across categories. These results show that machine vision can be improved by discovering and fixing systematic differences from human vision.
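As an illustration of the feature-augmentation idea described in this abstract (not the authors' pipeline), the sketch below computes simple planar reflection-symmetry scores for an object image and appends them to a CNN feature vector; the scoring function and axes are assumed simplifications.

```python
# Illustrative sketch only. Reflection-symmetry scores (higher = more symmetric about
# that axis) are computed directly from pixels and concatenated to learned features
# before classification, mimicking the idea of adding explicit symmetry information.
import numpy as np

def reflection_symmetry_scores(img):
    """img: (H, W) grayscale array in [0, 1]. Returns scores in [0, 1] for the
    vertical and horizontal mirror axes."""
    v = 1.0 - np.abs(img - np.fliplr(img)).mean()   # symmetry about vertical axis
    h = 1.0 - np.abs(img - np.flipud(img)).mean()   # symmetry about horizontal axis
    return np.array([v, h], dtype=np.float32)

def augment_features(cnn_features, img):
    """Concatenate symmetry scores onto a flat CNN feature vector."""
    return np.concatenate([cnn_features, reflection_symmetry_scores(img)])
```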
5. VidalMata RG, Banerjee S, RichardWebster B, Albright M, Davalos P, McCloskey S, Miller B, Tambo A, Ghosh S, Nagesh S, Yuan Y, Hu Y, Wu J, Yang W, Zhang X, Liu J, Wang Z, Chen HT, Huang TW, Chin WC, Li YC, Lababidi M, Otto C, Scheirer WJ. Bridging the Gap Between Computational Photography and Visual Recognition. IEEE Trans Pattern Anal Mach Intell 2021; 43:4272-4290. [PMID: 32750769] [DOI: 10.1109/tpami.2020.2996538]
Abstract
What is the current state-of-the-art for image restoration and enhancement applied to degraded images acquired under less than ideal circumstances? Can the application of such algorithms as a pre-processing step improve image interpretability for manual analysis or automatic visual recognition to classify scene content? While there have been important advances in the area of computational photography to restore or enhance the visual quality of an image, the capabilities of such techniques have not always translated in a useful way to visual recognition tasks. Consequently, there is a pressing need for the development of algorithms that are designed for the joint problem of improving visual appearance and recognition, which will be an enabling factor for the deployment of visual recognition tools in many real-world scenarios. To address this, we introduce the UG2 dataset as a large-scale benchmark composed of video imagery captured under challenging conditions, and two enhancement tasks designed to test algorithmic impact on visual quality and automatic object recognition. Furthermore, we propose a set of metrics to evaluate the joint improvement of such tasks as well as individual algorithmic advances, including a novel psychophysics-based evaluation regime for human assessment and a realistic set of quantitative measures for object recognition performance. We introduce six new algorithms for image restoration or enhancement, which were created as part of the IARPA-sponsored UG2 Challenge workshop held at CVPR 2018. Under the proposed evaluation regime, we present an in-depth analysis of these algorithms and a host of deep learning-based and classic baseline approaches. From the observed results, it is evident that we are in the early days of building a bridge between computational photography and visual recognition, leaving many opportunities for innovation in this area.
6. Liu Y, Qiu T, Wang J, Qi W. A Nighttime Vehicle Detection Method with Attentive GAN for Accurate Classification and Regression. Entropy 2021; 23:e23111490. [PMID: 34828188] [PMCID: PMC8624689] [DOI: 10.3390/e23111490]
Abstract
Vehicle detection, which has achieved remarkable improvements in recent years, plays a vital role in the design of Automatic Driving Systems (ADS). However, vehicle detection in night scenes still presents considerable challenges, because vehicle features are not obvious and are easily affected by complex road lighting or lights from other vehicles. In this paper, a high-accuracy vehicle detection algorithm is proposed to detect vehicles in night scenes. Firstly, an improved Generative Adversarial Network (GAN), named Attentive GAN, is used to enhance the vehicle features of nighttime images. Then, to achieve higher detection accuracy, multiple local regression is employed in the regression branch to predict multiple bounding box offsets. An improved Region of Interest (RoI) pooling method is used to obtain distinguishing features in the classification branch, which is based on the Faster Region-based Convolutional Neural Network (R-CNN). A cross-entropy loss is introduced to improve the accuracy of the classification branch. The proposed method is evaluated on a dataset composed of nighttime images selected from the BDD-100k dataset (Berkeley Diverse Driving Database, comprising 100,000 images). Compared with a series of state-of-the-art detectors, the experiments demonstrate that the proposed algorithm effectively improves vehicle detection accuracy in nighttime scenes.
Affiliation(s)
- Yan Liu (corresponding author; Tel.: +86-136-638-69878)
7. Saavedra D, Banerjee S, Mery D. Detection of threat objects in baggage inspection with X-ray images using deep learning. Neural Comput Appl 2021. [DOI: 10.1007/s00521-020-05521-2]
8. A Hierarchy of Functional States in Working Memory. J Neurosci 2021; 41:4461-4475. [PMID: 33888611] [PMCID: PMC8152603] [DOI: 10.1523/jneurosci.3104-20.2021]
Abstract
Extensive research has examined how information is maintained in working memory (WM), but it remains unknown how WM is used to guide behavior. We addressed this question by combining human electrophysiology (50 subjects, male and female) with pattern analyses, cognitive modeling, and a task requiring the prolonged maintenance of two WM items and priority shifts between them. This enabled us to discern neural states coding for memories that were selected to guide the next decision from states coding for concurrently held memories that were maintained for later use, and to examine how these states contribute to WM-based decisions. Selected memories were encoded in a functionally active state. This state was reflected in spontaneous brain activity during the delay period, closely tracked moment-to-moment fluctuations in the quality of evidence integration, and also predicted when memories would interfere with each other. In contrast, concurrently held memories were encoded in a functionally latent state. This state was reflected only in stimulus-evoked brain activity, tracked memory precision at longer timescales, but did not engage with ongoing decision dynamics. Intriguingly, the two functional states were highly flexible, as priority could be dynamically shifted back and forth between memories without degrading their precision. These results delineate a hierarchy of functional states, whereby latent memories supporting general maintenance are transformed into active decision circuits to guide flexible behavior. SIGNIFICANCE STATEMENT: Working memory enables maintenance of information that is no longer available in the environment. Abundant neuroscientific work has examined where in the brain working memories are stored, but it remains unknown how they are represented and used to guide behavior. Our study shows that working memories are represented in qualitatively different formats, depending on behavioral priorities. Memories that are selected for guiding behavior are encoded in an active state that transforms sensory input into decision variables, whereas other concurrently held memories are encoded in a latent state that supports precise maintenance without affecting ongoing cognition. These results dissociate mechanisms supporting memory storage and usage, and open the door to reveal not only where memories are stored but also how.
9. Yang Q, Wu Y, Cao D, Luo M, Wei T. A lowlight image enhancement method learning from both paired and unpaired data by adversarial training. Neurocomputing 2021. [DOI: 10.1016/j.neucom.2020.12.057]
10. Jiang Y, Gong X, Liu D, Cheng Y, Fang C, Shen X, Yang J, Zhou P, Wang Z. EnlightenGAN: Deep Light Enhancement Without Paired Supervision. IEEE Trans Image Process 2021; 30:2340-2349. [PMID: 33481709] [DOI: 10.1109/tip.2021.3051462]
Abstract
Deep learning-based methods have achieved remarkable success in image restoration and enhancement, but are they still competitive when there is a lack of paired training data? As one such example, this paper explores the low-light image enhancement problem, where in practice it is extremely challenging to simultaneously take a low-light and a normal-light photo of the same visual scene. We propose a highly effective unsupervised generative adversarial network, dubbed EnlightenGAN, that can be trained without low/normal-light image pairs, yet proves to generalize very well on various real-world test images. Instead of supervising the learning using ground truth data, we propose to regularize the unpaired training using the information extracted from the input itself, and benchmark a series of innovations for the low-light image enhancement problem, including a global-local discriminator structure, a self-regularized perceptual loss fusion, and the attention mechanism. Through extensive experiments, our proposed approach outperforms recent methods under a variety of metrics in terms of visual quality and subjective user study. Thanks to the great flexibility brought by unpaired training, EnlightenGAN is demonstrated to be easily adaptable to enhancing real-world images from various domains. Our codes and pre-trained models are available at: https://github.com/VITA-Group/EnlightenGAN.
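To illustrate the flavor of the "self-regularized" idea mentioned above (using information from the input itself instead of paired ground truth), here is a generic sketch of a self-feature-preserving perceptual loss. It is not the released EnlightenGAN code; the VGG-16 truncation depth and the torchvision weights API (version 0.13 or later) are assumptions.

```python
# Illustrative sketch only -- a generic self-feature-preserving perceptual loss,
# not the authors' implementation. VGG-16 features of the low-light input and of the
# enhanced output are compared, so no paired normal-light ground truth is needed.
import torch
import torch.nn.functional as F
from torchvision.models import vgg16

class SelfPerceptualLoss(torch.nn.Module):
    def __init__(self, depth=16):  # truncation depth of the VGG feature stack (assumption)
        super().__init__()
        self.features = vgg16(weights="IMAGENET1K_V1").features[:depth].eval()
        for p in self.features.parameters():
            p.requires_grad = False  # fixed feature extractor

    def forward(self, enhanced, low_light_input):
        return F.l1_loss(self.features(enhanced), self.features(low_light_input))
```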
11. Artificial cognition: How experimental psychology can help generate explainable artificial intelligence. Psychon Bull Rev 2020; 28:454-475. [PMID: 33159244] [DOI: 10.3758/s13423-020-01825-5]
Abstract
Artificial intelligence powered by deep neural networks has reached a level of complexity where it can be difficult or impossible to express how a model makes its decisions. This black-box problem is especially concerning when the model makes decisions with consequences for human well-being. In response, an emerging field called explainable artificial intelligence (XAI) aims to increase the interpretability, fairness, and transparency of machine learning. In this paper, we describe how cognitive psychologists can make contributions to XAI. The human mind is also a black box, and cognitive psychologists have over 150 years of experience modeling it through experimentation. We ought to translate the methods and rigor of cognitive psychology to the study of artificial black boxes in the service of explainability. We provide a review of XAI for psychologists, arguing that current methods possess a blind spot that can be complemented by the experimental cognitive tradition. We also provide a framework for research in XAI, highlight exemplary cases of experimentation within XAI inspired by psychological science, and provide a tutorial on experimenting with machines. We end by noting the advantages of an experimental approach and invite other psychologists to conduct research in this exciting new field.
12. Firestone C. Performance vs. competence in human-machine comparisons. Proc Natl Acad Sci U S A 2020.
Abstract
Does the human mind resemble the machines that can behave like it? Biologically inspired machine-learning systems approach "human-level" accuracy in an astounding variety of domains, and even predict human brain activity, raising the exciting possibility that such systems represent the world like we do. However, even seemingly intelligent machines fail in strange and "unhumanlike" ways, threatening their status as models of our minds. How can we know when human-machine behavioral differences reflect deep disparities in their underlying capacities, vs. when such failures are only superficial or peripheral? This article draws on a foundational insight from cognitive science, the distinction between performance and competence, to encourage "species-fair" comparisons between humans and machines. The performance/competence distinction urges us to consider whether the failure of a system to behave as ideally hypothesized, or the failure of one creature to behave like another, arises not because the system lacks the relevant knowledge or internal capacities ("competence"), but instead because of superficial constraints on demonstrating that knowledge ("performance"). I argue that this distinction has been neglected by research comparing human and machine behavior, and that it should be essential to any such comparison. Focusing on the domain of image classification, I identify three factors contributing to the species-fairness of human-machine comparisons, extracted from recent work that equates such constraints. Species-fair comparisons level the playing field between natural and artificial intelligence, so that we can separate more superficial differences from those that may be deep and enduring.
Affiliation(s)
- Chaz Firestone: Department of Psychological and Brain Sciences, Johns Hopkins University, Baltimore, MD 21218
13. Doerig A, Schmittwilken L, Sayim B, Manassi M, Herzog MH. Capsule networks as recurrent models of grouping and segmentation. PLoS Comput Biol 2020; 16:e1008017. [PMID: 32692780] [PMCID: PMC7394447] [DOI: 10.1371/journal.pcbi.1008017]
Abstract
Classically, visual processing is described as a cascade of local feedforward computations. Feedforward Convolutional Neural Networks (ffCNNs) have shown how powerful such models can be. However, using visual crowding as a well-controlled challenge, we previously showed that no classic model of vision, including ffCNNs, can explain human global shape processing. Here, we show that Capsule Neural Networks (CapsNets), combining ffCNNs with recurrent grouping and segmentation, solve this challenge. We also show that ffCNNs and standard recurrent CNNs do not, suggesting that the grouping and segmentation capabilities of CapsNets are crucial. Furthermore, we provide psychophysical evidence that grouping and segmentation are implemented recurrently in humans, and show that CapsNets reproduce these results well. We discuss why recurrence seems needed to implement grouping and segmentation efficiently. Together, we provide mutually reinforcing psychophysical and computational evidence that a recurrent grouping and segmentation process is essential to understand the visual system and create better models that harness global shape computations.
Affiliation(s)
- Adrien Doerig: Laboratory of Psychophysics, Brain Mind Institute, École Polytechnique Fédérale de Lausanne (EPFL), Lausanne, Switzerland
- Lynn Schmittwilken: Laboratory of Psychophysics, Brain Mind Institute, École Polytechnique Fédérale de Lausanne (EPFL), Lausanne, Switzerland; Dept. Computational Psychology, Institute of Software Engineering and Theoretical Computer Science, Technische Universität Berlin, Berlin, Germany
- Bilge Sayim: Institute of Psychology, University of Bern, Bern, Switzerland; Univ. Lille, CNRS, UMR 9193—SCALab—Sciences Cognitives et Sciences Affectives, F-59000 Lille, France
- Mauro Manassi: School of Psychology, University of Aberdeen, Scotland, United Kingdom
- Michael H. Herzog: Laboratory of Psychophysics, Brain Mind Institute, École Polytechnique Fédérale de Lausanne (EPFL), Lausanne, Switzerland
14. Yahiaoui L, Horgan J, Deegan B, Yogamani S, Hughes C, Denny P. Overview and Empirical Analysis of ISP Parameter Tuning for Visual Perception in Autonomous Driving. J Imaging 2019; 5:jimaging5100078. [PMID: 34460644] [PMCID: PMC8321211] [DOI: 10.3390/jimaging5100078]
Abstract
Image quality is a well-understood concept for human viewing applications, particularly in the multimedia space, but increasingly in an automotive context as well. The rise in prominence of autonomous driving and computer vision brings to the fore research in the area of the impact of image quality on camera perception for tasks such as recognition, localization and reconstruction. While the definition of "image quality" for computer vision may be ill-defined, what is clear is that the configuration of the image signal processing pipeline is the key factor in controlling the image quality for computer vision. This paper is partly a review and partly a position paper, demonstrating several preliminary results that are promising for future research. As such, we give an overview of what an Image Signal Processor (ISP) pipeline is, describe some typical automotive computer vision problems, and give a brief introduction to the impact of image signal processing parameters on the performance of computer vision, via some empirical results. This paper provides a discussion of the merits of automatically tuning the ISP parameters using computer vision performance indicators as a cost metric, thus bypassing the need to explicitly define what "image quality" means for computer vision. Due to the lack of datasets for performing ISP tuning experiments, we apply proxy algorithms such as sharpening before the vision algorithm. We performed these experiments with a classical algorithm, namely AKAZE, and a machine learning algorithm for pedestrian detection. We obtain encouraging results, such as a 14% accuracy improvement for pedestrian detection obtained by tuning the sharpening parameters. We hope this encourages the creation of such datasets for more systematic evaluation of these topics.
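To make the tuning loop described above concrete, here is a minimal sketch of searching unsharp-mask (sharpening) parameters with a downstream detector's accuracy as the cost metric. It is an illustration, not the paper's experimental code; evaluate_detector is a hypothetical callable supplied by the user, and the parameter grids are arbitrary.

```python
# Illustrative sketch only -- grid search over a sharpening proxy for an ISP stage,
# scored by a hypothetical pedestrian-detection accuracy function.
import cv2
import numpy as np

def unsharp_mask(img, radius=1.5, amount=1.0):
    """Classic unsharp masking: img + amount * (img - Gaussian blur)."""
    blurred = cv2.GaussianBlur(img, (0, 0), sigmaX=radius)
    return cv2.addWeighted(img, 1.0 + amount, blurred, -amount, 0)

def tune_sharpening(images, evaluate_detector,
                    radii=(0.5, 1.0, 1.5, 2.0), amounts=(0.25, 0.5, 1.0, 1.5)):
    """evaluate_detector(processed_images) -> accuracy on a labelled set (assumed)."""
    best = (None, None, -np.inf)
    for r in radii:
        for a in amounts:
            processed = [unsharp_mask(im, r, a) for im in images]
            acc = evaluate_detector(processed)
            if acc > best[2]:
                best = (r, a, acc)
    return best  # (best_radius, best_amount, best_accuracy)
```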
15. Large-Scale, High-Resolution Comparison of the Core Visual Object Recognition Behavior of Humans, Monkeys, and State-of-the-Art Deep Artificial Neural Networks. J Neurosci 2018; 38:7255-7269. [PMID: 30006365] [DOI: 10.1523/jneurosci.0388-18.2018]
Abstract
Primates, including humans, can typically recognize objects in visual images at a glance despite naturally occurring identity-preserving image transformations (e.g., changes in viewpoint). A primary neuroscience goal is to uncover neuron-level mechanistic models that quantitatively explain this behavior by predicting primate performance for each and every image. Here, we applied this stringent behavioral prediction test to the leading mechanistic models of primate vision (specifically, deep, convolutional, artificial neural networks; ANNs) by directly comparing their behavioral signatures against those of humans and rhesus macaque monkeys. Using high-throughput data collection systems for human and monkey psychophysics, we collected more than one million behavioral trials from 1472 anonymous humans and five male macaque monkeys for 2400 images over 276 binary object discrimination tasks. Consistent with previous work, we observed that state-of-the-art deep, feedforward convolutional ANNs trained for visual categorization (termed DCNNIC models) accurately predicted primate patterns of object-level confusion. However, when we examined behavioral performance for individual images within each object discrimination task, we found that all tested DCNNIC models were significantly nonpredictive of primate performance and that this prediction failure was neither accounted for by simple image attributes nor rescued by simple model modifications. These results show that current DCNNIC models cannot account for the image-level behavioral patterns of primates and that new ANN models are needed to more precisely capture the neural mechanisms underlying primate object vision. To this end, large-scale, high-resolution primate behavioral benchmarks such as those obtained here could serve as direct guides for discovering such models. SIGNIFICANCE STATEMENT: Recently, specific feedforward deep convolutional artificial neural network (ANN) models have dramatically advanced our quantitative understanding of the neural mechanisms underlying primate core object recognition. In this work, we tested the limits of those ANNs by systematically comparing the behavioral responses of these models with the behavioral responses of humans and monkeys at the resolution of individual images. Using these high-resolution metrics, we found that all tested ANN models significantly diverged from primate behavior. Going forward, these high-resolution, large-scale primate behavioral benchmarks could serve as direct guides for discovering better ANN models of the primate visual system.
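The gist of an image-level comparison can be sketched as follows: build a per-image performance profile for primates and for a model over the same images, then correlate the two. This is an illustration of the general approach, not the paper's benchmark code, and the 0/1-correctness input format is an assumption.

```python
# Illustrative sketch only -- per-image behavioral consistency between a model and
# primates. A low correlation means the model does not predict which individual
# images primates find hard, even if object-level confusion patterns match.
import numpy as np
from scipy.stats import pearsonr

def image_level_consistency(primate_correct, model_correct):
    """Each argument: (num_trials, num_images) array of 0/1 correctness over the
    same image set. Returns the Pearson correlation of per-image accuracy profiles."""
    primate_acc = primate_correct.mean(axis=0)   # per-image accuracy, primates
    model_acc = model_correct.mean(axis=0)       # per-image accuracy, model
    r, _ = pearsonr(primate_acc, model_acc)
    return r
```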