1
Osório M, Sa-Couto L, Wichert A. Can a Hebbian-like learning rule be avoiding the curse of dimensionality in sparse distributed data? Biol Cybern 2024:10.1007/s00422-024-00995-y. [PMID: 39249119 DOI: 10.1007/s00422-024-00995-y]
Abstract
It is generally assumed that the brain uses something akin to sparse distributed representations. These representations, however, are high-dimensional and consequently they affect classification performance of traditional Machine Learning models due to the "curse of dimensionality". In tasks for which there is a vast amount of labeled data, Deep Networks seem to solve this issue with many layers and a non-Hebbian backpropagation algorithm. The brain, however, seems to be able to solve the problem with few layers. In this work, we hypothesize that this happens by using Hebbian learning. Actually, the Hebbian-like learning rule of Restricted Boltzmann Machines learns the input patterns asymmetrically. It exclusively learns the correlation between non-zero values and ignores the zeros, which represent the vast majority of the input dimensionality. By ignoring the zeros the "curse of dimensionality" problem can be avoided. To test our hypothesis, we generated several sparse datasets and compared the performance of a Restricted Boltzmann Machine classifier with some Backprop-trained networks. The experiments using these codes confirm our initial intuition as the Restricted Boltzmann Machine shows a good generalization performance, while the Neural Networks trained with the backpropagation algorithm overfit the training data.
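The mechanism invoked here can be illustrated with a generic contrastive-divergence (CD-1) sketch (not the paper's exact training procedure; the sizes, the absence of bias terms, and the variable names are illustrative assumptions). The point it shows is that the positive-phase Hebbian term is an outer product with the visible vector, so the zero entries of a sparse input contribute nothing to that term:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def cd1_update(W, v0, lr=0.01, rng=np.random.default_rng(0)):
    """One contrastive-divergence (CD-1) step for a binary RBM (biases omitted).

    The positive-phase Hebbian term is an outer product with the visible
    vector, so zero components of a sparse input contribute nothing to it;
    learning is driven by correlations among the active (non-zero) units.
    """
    h0_prob = sigmoid(v0 @ W)                                  # hidden given data
    h0 = (rng.random(h0_prob.shape) < h0_prob).astype(float)   # sample hidden states
    v1_prob = sigmoid(h0 @ W.T)                                # reconstruct visibles
    h1_prob = sigmoid(v1_prob @ W)                             # hidden given reconstruction
    dW = np.outer(v0, h0_prob) - np.outer(v1_prob, h1_prob)    # Hebbian minus anti-Hebbian term
    return W + lr * dW

# Toy sparse binary pattern: 3 active units out of 100 dimensions.
v = np.zeros(100)
v[[3, 17, 42]] = 1.0
W = 0.01 * np.random.default_rng(1).standard_normal((100, 16))
W = cd1_update(W, v)
```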
Affiliation(s)
- Maria Osório
- Department of Computer Science and Engineering, INESC-ID & Instituto Superior Técnico - University of Lisbon, Av. Prof. Dr. Aníbal Cavaco Silva, Porto Salvo, 2744-016, Lisbon, Portugal.
- Luis Sa-Couto
- Department of Computer Science and Engineering, INESC-ID & Instituto Superior Técnico - University of Lisbon, Av. Prof. Dr. Aníbal Cavaco Silva, Porto Salvo, 2744-016, Lisbon, Portugal
- Andreas Wichert
- Department of Computer Science and Engineering, INESC-ID & Instituto Superior Técnico - University of Lisbon, Av. Prof. Dr. Aníbal Cavaco Silva, Porto Salvo, 2744-016, Lisbon, Portugal
2
Layton OW, Steinmetz ST. Accuracy optimized neural networks do not effectively model optic flow tuning in brain area MSTd. Front Neurosci 2024; 18:1441285. [PMID: 39286477 PMCID: PMC11403719 DOI: 10.3389/fnins.2024.1441285]
Abstract
Accuracy-optimized convolutional neural networks (CNNs) have emerged as highly effective models at predicting neural responses in brain areas along the primate ventral stream, but it is largely unknown whether they effectively model neurons in the complementary primate dorsal stream. We explored how well CNNs model the optic flow tuning properties of neurons in dorsal area MSTd and we compared our results with the Non-Negative Matrix Factorization (NNMF) model, which successfully models many tuning properties of MSTd neurons. To better understand the role of computational properties in the NNMF model that give rise to optic flow tuning that resembles that of MSTd neurons, we created additional CNN model variants that implement key NNMF constraints - non-negative weights and sparse coding of optic flow. While the CNNs and NNMF models both accurately estimate the observer's self-motion from purely translational or rotational optic flow, NNMF and the CNNs with nonnegative weights yield substantially less accurate estimates than the other CNNs when tested on more complex optic flow that combines observer translation and rotation. Despite its poor accuracy, NNMF gives rise to tuning properties that align more closely with those observed in primate MSTd than any of the accuracy-optimized CNNs. This work offers a step toward a deeper understanding of the computational properties and constraints that describe the optic flow tuning of primate area MSTd.
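For orientation, the non-negative matrix factorization step referenced above can be sketched with scikit-learn; the matrix shape, component count, and variable names below are illustrative assumptions, not the paper's actual pipeline:

```python
import numpy as np
from sklearn.decomposition import NMF

rng = np.random.default_rng(0)
# Hypothetical data: rows = optic flow stimuli, columns = non-negative
# responses of direction/speed-tuned input units (e.g., an MT-like stage).
responses = rng.gamma(shape=2.0, scale=1.0, size=(500, 400))

# Factorize into non-negative basis flow patterns and sparse per-stimulus
# activations, mirroring the NNMF constraints discussed above.
model = NMF(n_components=24, init="nndsvda", max_iter=500, random_state=0)
activations = model.fit_transform(responses)   # (stimuli x components)
basis = model.components_                      # (components x input units)
print(activations.shape, basis.shape)
```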
Affiliation(s)
- Oliver W Layton
- Department of Computer Science, Colby College, Waterville, ME, United States
- Scott T Steinmetz
- Center for Computing Research, Sandia National Labs, Albuquerque, NM, United States
3
Rose O, Ponce CR. A concentration of visual cortex-like neurons in prefrontal cortex. Nat Commun 2024; 15:7002. [PMID: 39143147 PMCID: PMC11324908 DOI: 10.1038/s41467-024-51441-3]
Abstract
Visual recognition is largely realized through neurons in the ventral stream, though recently, studies have suggested that ventrolateral prefrontal cortex (vlPFC) is also important for visual processing. While it is hypothesized that sensory and cognitive processes are integrated in vlPFC neurons, it is not clear how this mechanism benefits vision, or even if vlPFC neurons have properties essential for computations in visual cortex implemented via recurrence. Here, we investigated if vlPFC neurons in two male monkeys had functions comparable to visual cortex, including receptive fields, image selectivity, and the capacity to synthesize highly activating stimuli using generative networks. We found a subset of vlPFC sites show all properties, suggesting subpopulations of vlPFC neurons encode statistics about the world. Further, these vlPFC sites may be anatomically clustered, consistent with fMRI-identified functional organization. Our findings suggest that stable visual encoding in vlPFC may be a necessary condition for local and brain-wide computations.
Affiliation(s)
- Olivia Rose
- Department of Neurobiology, Harvard Medical School, Boston, MA, USA
- Roy and Diana Vagelos Division of Biology & Biomedical Sciences, Washington University, St. Louis, MO, USA
- Carlos R Ponce
- Department of Neurobiology, Harvard Medical School, Boston, MA, USA.
4
Osório M, Wichert A. Promoting the Shift From Pixel-Level Correlations to Object Semantics Learning by Rethinking Computer Vision Benchmark Data Sets. Neural Comput 2024; 36:1626-1642. [PMID: 38776966 DOI: 10.1162/neco_a_01677]
Abstract
In computer vision research, convolutional neural networks (CNNs) have demonstrated remarkable capabilities at extracting patterns from raw pixel data, achieving state-of-the-art recognition accuracy. However, they significantly differ from human visual perception, prioritizing pixel-level correlations and statistical patterns, often overlooking object semantics. To explore this difference, we propose an approach that isolates core visual features crucial for human perception and object recognition: color, texture, and shape. In experiments on three benchmarks - Fruits 360, CIFAR-10, and Fashion MNIST - each visual feature is individually input into a neural network. Results reveal data set-dependent variations in classification accuracy, highlighting that deep learning models tend to learn pixel-level correlations instead of fundamental visual features. To validate this observation, we used various combinations of concatenated visual features as input for a neural network on the CIFAR-10 data set. CNNs excel at learning statistical patterns in images, achieving exceptional performance when training and test data share similar distributions. To substantiate this point, we trained a CNN on the CIFAR-10 data set and evaluated its performance on the "dog" class from CIFAR-10 and on an equivalent number of examples from the Stanford Dogs data set. The CNN's poor performance on Stanford Dogs images underlines the disparity between deep learning and human visual perception, highlighting the need for models that learn object semantics. Specialized benchmark data sets with controlled variations hold promise for aligning learned representations with human cognition in computer vision research.
Affiliation(s)
- Maria Osório
- Department of Computer Science and Engineering, INESC-ID and Instituto Superior Técnico, University of Lisbon, 2744-016 Porto Salvo, Portugal
- Andreas Wichert
- Department of Computer Science and Engineering, INESC-ID and Instituto Superior Técnico, University of Lisbon, 2744-016 Porto Salvo, Portugal
5
Micali G, Corallo F, Pagano M, Giambò FM, Duca A, D’Aleo P, Anselmo A, Bramanti A, Garofano M, Mazzon E, Bramanti P, Cappadona I. Artificial Intelligence and Heart-Brain Connections: A Narrative Review on Algorithms Utilization in Clinical Practice. Healthcare (Basel) 2024; 12:1380. [PMID: 39057522 PMCID: PMC11276532 DOI: 10.3390/healthcare12141380]
Abstract
Cardiovascular and neurological diseases are a major cause of mortality and morbidity worldwide. Such diseases require careful monitoring to effectively manage their progression. Artificial intelligence (AI) offers valuable tools for this purpose through its ability to analyse data and identify predictive patterns. This review evaluated the application of AI in cardiac and neurological diseases for their clinical impact on the general population. We reviewed studies on the application of AI in the neurological and cardiological fields. Our search was performed on the PubMed, Web of Science, Embase and Cochrane library databases. Of the initial 5862 studies, 23 studies met the inclusion criteria. The studies showed that the most commonly used algorithms in these clinical fields are Random Forest and Artificial Neural Network, followed by logistic regression and Support-Vector Machines. In addition, an ECG-AI algorithm based on convolutional neural networks has been developed and has been widely used in several studies for the detection of atrial fibrillation with good accuracy. AI has great potential to support physicians in interpretation, diagnosis, risk assessment and disease management.
Affiliation(s)
- Giuseppe Micali
- IRCCS Centro Neurolesi Bonino-Pulejo, Via Palermo, S.S. 113, C.da Casazza, 98124 Messina, Italy
- Francesco Corallo
- IRCCS Centro Neurolesi Bonino-Pulejo, Via Palermo, S.S. 113, C.da Casazza, 98124 Messina, Italy
- Maria Pagano
- IRCCS Centro Neurolesi Bonino-Pulejo, Via Palermo, S.S. 113, C.da Casazza, 98124 Messina, Italy
- Fabio Mauro Giambò
- IRCCS Centro Neurolesi Bonino-Pulejo, Via Palermo, S.S. 113, C.da Casazza, 98124 Messina, Italy
- Antonio Duca
- IRCCS Centro Neurolesi Bonino-Pulejo, Via Palermo, S.S. 113, C.da Casazza, 98124 Messina, Italy
- Piercataldo D’Aleo
- IRCCS Centro Neurolesi Bonino-Pulejo, Via Palermo, S.S. 113, C.da Casazza, 98124 Messina, Italy
- Anna Anselmo
- IRCCS Centro Neurolesi Bonino-Pulejo, Via Palermo, S.S. 113, C.da Casazza, 98124 Messina, Italy
- Alessia Bramanti
- Department of Medicine, Surgery and Dentistry, University of Salerno, 84081 Baronissi, Italy
- Marina Garofano
- Department of Medicine, Surgery and Dentistry, University of Salerno, 84081 Baronissi, Italy
- Emanuela Mazzon
- IRCCS Centro Neurolesi Bonino-Pulejo, Via Palermo, S.S. 113, C.da Casazza, 98124 Messina, Italy
- Placido Bramanti
- IRCCS Centro Neurolesi Bonino-Pulejo, Via Palermo, S.S. 113, C.da Casazza, 98124 Messina, Italy
- Faculty of Psychology, Università degli Studi eCampus, Via Isimbardi 10, 22060 Novedrate, Italy
- Irene Cappadona
- IRCCS Centro Neurolesi Bonino-Pulejo, Via Palermo, S.S. 113, C.da Casazza, 98124 Messina, Italy
6
Fang C, Wu Z, Zheng H, Yang J, Ma C, Zhang T. MCP: Multi-Chicken Pose Estimation Based on Transfer Learning. Animals (Basel) 2024; 14:1774. [PMID: 38929393 PMCID: PMC11200378 DOI: 10.3390/ani14121774]
Abstract
Poultry managers can better understand the state of poultry through poultry behavior analysis. As one of the key steps in behavior analysis, the accurate estimation of poultry posture is the focus of this research. This study mainly analyzes a top-down pose estimation method of multiple chickens. Therefore, we propose the "multi-chicken pose" (MCP), a pose estimation system for multiple chickens through deep learning. Firstly, we find the position of each chicken from the image via the chicken detector; then, an estimate of the pose of each chicken is made using a pose estimation network, which is based on transfer learning. On this basis, the pixel error (PE), root mean square error (RMSE), and image quantity distribution of key points are analyzed according to the improved chicken keypoint similarity (CKS). The experimental results show that the algorithm scores in different evaluation metrics are a mean average precision (mAP) of 0.652, a mean average recall (mAR) of 0.742, a percentage of correct keypoints (PCKs) of 0.789, and an RMSE of 17.30 pixels. To the best of our knowledge, this is the first time that transfer learning has been used for the pose estimation of multiple chickens as objects. The method can provide a new path for future poultry behavior analysis.
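As a hedged illustration of the reported metrics, per-keypoint pixel error, RMSE, and an OKS-style keypoint similarity can be computed as below; the paper's chicken keypoint similarity (CKS) follows this general family, but its exact per-keypoint constants are not reproduced here, so `kappa`, the object scale, and the toy coordinates are assumptions:

```python
import numpy as np

def pixel_errors(pred, gt):
    """Euclidean pixel error per keypoint; pred and gt have shape (n_keypoints, 2)."""
    return np.linalg.norm(pred - gt, axis=1)

def keypoint_rmse(pred, gt):
    """Root mean square of per-keypoint Euclidean errors, in pixels (one common convention)."""
    return float(np.sqrt(np.mean(np.sum((pred - gt) ** 2, axis=1))))

def oks_like_similarity(pred, gt, object_scale, kappa=0.1):
    """OKS-style similarity; an illustrative stand-in for the paper's CKS,
    with a single kappa instead of per-keypoint constants."""
    d2 = np.sum((pred - gt) ** 2, axis=1)
    return float(np.mean(np.exp(-d2 / (2.0 * object_scale**2 * kappa**2))))

gt = np.array([[120.0, 80.0], [140.0, 95.0], [160.0, 130.0]])
pred = gt + np.array([[3.0, -2.0], [-4.0, 1.0], [2.0, 5.0]])
print(pixel_errors(pred, gt), keypoint_rmse(pred, gt), oks_like_similarity(pred, gt, 60.0))
```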
Affiliation(s)
- Cheng Fang
- College of Engineering, South China Agricultural University, 483 Wushan Road, Guangzhou 510642, China
- Zhenlong Wu
- College of Engineering, South China Agricultural University, 483 Wushan Road, Guangzhou 510642, China
- Haikun Zheng
- College of Engineering, South China Agricultural University, 483 Wushan Road, Guangzhou 510642, China
- Jikang Yang
- College of Engineering, South China Agricultural University, 483 Wushan Road, Guangzhou 510642, China
- Chuang Ma
- College of Engineering, South China Agricultural University, 483 Wushan Road, Guangzhou 510642, China
- Tiemin Zhang
- College of Engineering, South China Agricultural University, 483 Wushan Road, Guangzhou 510642, China
- National Engineering Research Center for Breeding Swine Industry, Guangzhou 510642, China
- Guangdong Laboratory for Lingnan Modern Agriculture, Guangzhou 510642, China
7
Guberman S, Latash ML. The Role of Imitation, Primitives, and Spatial Referent Coordinates in Motor Control: Implications for Writing and Reading. Motor Control 2024:1-15. [PMID: 38364817 DOI: 10.1123/mc.2023-0122]
Abstract
We review a body of literature related to the drawing and recognition of geometrical two-dimensional linear drawings including letters. Handwritten letters are viewed not as two-dimensional geometrical objects but as one-dimensional trajectories of the tip of the implement. Handwritten letters are viewed as composed of a small set of kinematic primitives. Recognition of objects is mediated by processes of their creation (actual or imagined) - the imitation principle, a particular example of action-perception coupling. The concept of spatial directional field guiding the trajectories is introduced and linked to neuronal population vectors. Further, we link the kinematic description to the theory of control with spatial referent coordinates. This framework allows interpreting a number of experimental observations and clinical cases of agnosia. It also allows formulating predictions for new experimental studies of writing.
Affiliation(s)
- Shelia Guberman
- Keldysh Institute of Applied Mathematics, Russian Academy of Sciences, San Jose, CA, USA
- Mark L Latash
- Department of Kinesiology, The Pennsylvania State University, University Park, PA, USA
8
Noda K, Soda T, Yamashita Y. Emergence of number sense through the integration of multimodal information: developmental learning insights from neural network models. Front Neurosci 2024; 18:1330512. [PMID: 38298912 PMCID: PMC10828047 DOI: 10.3389/fnins.2024.1330512]
Abstract
Introduction: Associating multimodal information is essential for human cognitive abilities including mathematical skills. Multimodal learning has also attracted attention in the field of machine learning, and it has been suggested that the acquisition of better latent representation plays an important role in enhancing task performance. This study aimed to explore the impact of multimodal learning on representation, and to understand the relationship between multimodal representation and the development of mathematical skills.
Methods: We employed a multimodal deep neural network as the computational model for multimodal associations in the brain. We compared the representations of numerical information, that is, handwritten digits and images containing a variable number of geometric figures learned through single- and multimodal methods. Next, we evaluated whether these representations were beneficial for downstream arithmetic tasks.
Results: Multimodal training produced better latent representation in terms of clustering quality, which is consistent with previous findings on multimodal learning in deep neural networks. Moreover, the representations learned using multimodal information exhibited superior performance in arithmetic tasks.
Discussion: Our novel findings experimentally demonstrate that changes in acquired latent representations through multimodal association learning are directly related to cognitive functions, including mathematical skills. This supports the possibility that multimodal learning using deep neural network models may offer novel insights into higher cognitive functions.
Affiliation(s)
- Yuichi Yamashita
- Department of Information Medicine, National Institute of Neuroscience, National Center of Neurology and Psychiatry, Kodaira, Japan
9
Khan S, Wong A, Tripp B. Modeling the Role of Contour Integration in Visual Inference. Neural Comput 2023; 36:33-74. [PMID: 38052088 DOI: 10.1162/neco_a_01625]
Abstract
Under difficult viewing conditions, the brain's visual system uses a variety of recurrent modulatory mechanisms to augment feedforward processing. One resulting phenomenon is contour integration, which occurs in the primary visual (V1) cortex and strengthens neural responses to edges if they belong to a larger smooth contour. Computational models have contributed to an understanding of the circuit mechanisms of contour integration, but less is known about its role in visual perception. To address this gap, we embedded a biologically grounded model of contour integration in a task-driven artificial neural network and trained it using a gradient-descent variant. We used this model to explore how brain-like contour integration may be optimized for high-level visual objectives as well as its potential roles in perception. When the model was trained to detect contours in a background of random edges, a task commonly used to examine contour integration in the brain, it closely mirrored the brain in terms of behavior, neural responses, and lateral connection patterns. When trained on natural images, the model enhanced weaker contours and distinguished whether two points lay on the same versus different contours. The model learned robust features that generalized well to out-of-training-distribution stimuli. Surprisingly, and in contrast with the synthetic task, a parameter-matched control network without recurrence performed the same as or better than the model on the natural-image tasks. Thus, a contour integration mechanism is not essential to perform these more naturalistic contour-related tasks. Finally, the best performance in all tasks was achieved by a modified contour integration model that did not distinguish between excitatory and inhibitory neurons.
Affiliation(s)
- Salman Khan
- Centre for Theoretical Neuroscience, Department of System Design Engineering
- Vision and Image Processing Group, Department of System Design Engineering
- Waterloo Artificial Intelligence Institute: University of Waterloo, Waterloo, ON, Canada, N2L 3G1
- Alexander Wong
- Vision and Image Processing Group, Department of System Design Engineering
- Waterloo Artificial Intelligence Institute: University of Waterloo, Waterloo, ON, Canada, N2L 3G1
- Bryan Tripp
- Centre for Theoretical Neuroscience, Department of System Design Engineering
- Vision and Image Processing Group, Department of System Design Engineering
- Waterloo Artificial Intelligence Institute: University of Waterloo, Waterloo, ON, Canada, N2L 3G1
10
Golan T, Taylor J, Schütt H, Peters B, Sommers RP, Seeliger K, Doerig A, Linton P, Konkle T, van Gerven M, Kording K, Richards B, Kietzmann TC, Lindsay GW, Kriegeskorte N. Deep neural networks are not a single hypothesis but a language for expressing computational hypotheses. Behav Brain Sci 2023; 46:e392. [PMID: 38054329 DOI: 10.1017/s0140525x23001553]
Abstract
An ideal vision model accounts for behavior and neurophysiology in both naturalistic conditions and designed lab experiments. Unlike psychological theories, artificial neural networks (ANNs) actually perform visual tasks and generate testable predictions for arbitrary inputs. These advantages enable ANNs to engage the entire spectrum of the evidence. Failures of particular models drive progress in a vibrant ANN research program of human vision.
Affiliation(s)
- Tal Golan
- Department of Cognitive and Brain Sciences, Ben-Gurion University of the Negev, Be'er Sheva, Israel
- JohnMark Taylor
- Zuckerman Mind Brain Behavior Institute, Columbia University, New York, NY, USA
- Heiko Schütt
- Zuckerman Mind Brain Behavior Institute, Columbia University, New York, NY, USA
- Center for Neural Science, New York University, New York, NY, USA
- Benjamin Peters
- School of Psychology & Neuroscience, University of Glasgow, Glasgow, UK
- Rowan P Sommers
- Department of Neurobiology of Language, Max Planck Institute for Psycholinguistics, Nijmegen, The Netherlands
- Katja Seeliger
- Max Planck Institute for Human Cognitive and Brain Sciences, Leipzig, Germany
- Adrien Doerig
- Institute of Cognitive Science, University of Osnabrück, Osnabrück, Germany
- Paul Linton
- Zuckerman Mind Brain Behavior Institute, Columbia University, New York, NY, USA (https://linton.vision/)
- Presidential Scholars in Society and Neuroscience, Center for Science and Society, Columbia University, New York, NY, USA
- Italian Academy for Advanced Studies in America, Columbia University, New York, NY, USA
- Talia Konkle
- Department of Psychology and Center for Brain Sciences, Harvard University, Cambridge, MA, USA (https://konklab.fas.harvard.edu/)
- Marcel van Gerven
- Donders Institute for Brain, Cognition and Behaviour, Nijmegen, The Netherlands (artcogsys.com)
- Konrad Kording
- Departments of Bioengineering and Neuroscience, University of Pennsylvania, Philadelphia, PA, USA
- Learning in Machines and Brains Program, CIFAR, Toronto, ON, Canada
- Blake Richards
- Learning in Machines and Brains Program, CIFAR, Toronto, ON, Canada
- Mila, Montreal, QC, Canada
- School of Computer Science, McGill University, Montreal, QC, Canada
- Department of Neurology & Neurosurgery, McGill University, Montreal, QC, Canada
- Montreal Neurological Institute, Montreal, QC, Canada
- Tim C Kietzmann
- Institute of Cognitive Science, University of Osnabrück, Osnabrück, Germany
- Grace W Lindsay
- Department of Psychology and Center for Data Science, New York University, New York, NY, USA
- Nikolaus Kriegeskorte
- Zuckerman Mind Brain Behavior Institute, Columbia University, New York, NY, USA
- Departments of Psychology, Neuroscience, and Electrical Engineering, Columbia University, New York, NY, USA
11
Abstract
Deep neural networks (DNNs) are machine learning algorithms that have revolutionized computer vision due to their remarkable successes in tasks like object classification and segmentation. The success of DNNs as computer vision algorithms has led to the suggestion that DNNs may also be good models of human visual perception. In this article, we review evidence regarding current DNNs as adequate behavioral models of human core object recognition. To this end, we argue that it is important to distinguish between statistical tools and computational models and to understand model quality as a multidimensional concept in which clarity about modeling goals is key. Reviewing a large number of psychophysical and computational explorations of core object recognition performance in humans and DNNs, we argue that DNNs are highly valuable scientific tools but that, as of today, DNNs should only be regarded as promising - but not yet adequate - computational models of human core object recognition behavior. On the way, we dispel several myths surrounding DNNs in vision science.
Affiliation(s)
- Felix A Wichmann
- Neural Information Processing Group, University of Tübingen, Tübingen, Germany
12
Pan X, DeForge A, Schwartz O. Generalizing biological surround suppression based on center surround similarity via deep neural network models. PLoS Comput Biol 2023; 19:e1011486. [PMID: 37738258 PMCID: PMC10550176 DOI: 10.1371/journal.pcbi.1011486]
Abstract
Sensory perception is dramatically influenced by the context. Models of contextual neural surround effects in vision have mostly accounted for Primary Visual Cortex (V1) data, via nonlinear computations such as divisive normalization. However, surround effects are not well understood within a hierarchy, for neurons with more complex stimulus selectivity beyond V1. We utilized feedforward deep convolutional neural networks and developed a gradient-based technique to visualize the most suppressive and excitatory surround. We found that deep neural networks exhibited a key signature of surround effects in V1, highlighting center stimuli that visually stand out from the surround and suppressing responses when the surround stimulus is similar to the center. We found that in some neurons, especially in late layers, when the center stimulus was altered, the most suppressive surround surprisingly can follow the change. Through the visualization approach, we generalized previous understanding of surround effects to more complex stimuli, in ways that have not been revealed in visual cortices. In contrast, the suppression based on center surround similarity was not observed in an untrained network. We identified further successes and mismatches of the feedforward CNNs to the biology. Our results provide a testable hypothesis of surround effects in higher visual cortices, and the visualization approach could be adopted in future biological experimental designs.
Affiliation(s)
- Xu Pan
- Department of Computer Science, University of Miami, Coral Gables, FL, United States of America
- Annie DeForge
- School of Information, University of California, Berkeley, CA, United States of America
- Bentley University, Waltham, MA, United States of America
- Odelia Schwartz
- Department of Computer Science, University of Miami, Coral Gables, FL, United States of America
13
Veerabadran V, Goldman J, Shankar S, Cheung B, Papernot N, Kurakin A, Goodfellow I, Shlens J, Sohl-Dickstein J, Mozer MC, Elsayed GF. Subtle adversarial image manipulations influence both human and machine perception. Nat Commun 2023; 14:4933. [PMID: 37582834 PMCID: PMC10427626 DOI: 10.1038/s41467-023-40499-0]
Abstract
Although artificial neural networks (ANNs) were inspired by the brain, ANNs exhibit a brittleness not generally observed in human perception. One shortcoming of ANNs is their susceptibility to adversarial perturbations - subtle modulations of natural images that result in changes to classification decisions, such as confidently mislabelling an image of an elephant, initially classified correctly, as a clock. In contrast, a human observer might well dismiss the perturbations as an innocuous imaging artifact. This phenomenon may point to a fundamental difference between human and machine perception, but it drives one to ask whether human sensitivity to adversarial perturbations might be revealed with appropriate behavioral measures. Here, we find that adversarial perturbations that fool ANNs similarly bias human choice. We further show that the effect is more likely driven by higher-order statistics of natural images to which both humans and ANNs are sensitive, rather than by the detailed architecture of the ANN.
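For readers unfamiliar with how such perturbations are constructed, a minimal fast-gradient-sign sketch in PyTorch is given below; the model, image tensor, label, and epsilon are placeholders, and the study's own perturbations were generated with its specific models and constraints rather than this exact recipe:

```python
import torch

def fgsm_perturb(model, image, label, epsilon=2.0 / 255.0):
    """Return an adversarially perturbed copy of `image` (shape 1x3xHxW, values
    in [0, 1]; `label` is a class-index tensor of shape (1,)) using the fast
    gradient sign method."""
    image = image.clone().detach().requires_grad_(True)
    loss = torch.nn.functional.cross_entropy(model(image), label)
    loss.backward()
    # Step in the direction that increases the classification loss,
    # then clamp back to a valid image range.
    adversarial = image + epsilon * image.grad.sign()
    return adversarial.clamp(0.0, 1.0).detach()
```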
Affiliation(s)
- Vijay Veerabadran
- Google, Mountain View, CA, USA
- Department of Cognitive Science, University of California, San Diego, CA, USA
- Shreya Shankar
- Google, Mountain View, CA, USA
- University of California, Berkeley, CA, USA
- Brian Cheung
- Google, Mountain View, CA, USA
- MIT Brain and Cognitive Sciences, Cambridge, MA, USA
14
McDonnell KJ. Leveraging the Academic Artificial Intelligence Silecosystem to Advance the Community Oncology Enterprise. J Clin Med 2023; 12:4830. [PMID: 37510945 PMCID: PMC10381436 DOI: 10.3390/jcm12144830]
Abstract
Over the last 75 years, artificial intelligence has evolved from a theoretical concept and novel paradigm describing the role that computers might play in our society to a tool with which we daily engage. In this review, we describe AI in terms of its constituent elements, the synthesis of which we refer to as the AI Silecosystem. Herein, we provide an historical perspective of the evolution of the AI Silecosystem, conceptualized and summarized as a Kuhnian paradigm. This manuscript focuses on the role that the AI Silecosystem plays in oncology and its emerging importance in the care of the community oncology patient. We observe that this important role arises out of a unique alliance between the academic oncology enterprise and community oncology practices. We provide evidence of this alliance by illustrating the practical establishment of the AI Silecosystem at the City of Hope Comprehensive Cancer Center and its team utilization by community oncology providers.
Affiliation(s)
- Kevin J McDonnell
- Center for Precision Medicine, Department of Medical Oncology & Therapeutics Research, City of Hope Comprehensive Cancer Center, Duarte, CA 91010, USA
15
DiMattina C. Second-order boundaries segment more easily when they are density-defined rather than feature-defined. bioRxiv 2023:2023.07.10.548431. [PMID: 37502940 PMCID: PMC10369903 DOI: 10.1101/2023.07.10.548431]
Abstract
Previous studies have demonstrated that density is an important perceptual aspect of textural appearance to which the visual system is highly attuned. Furthermore, it is known that density cues not only influence texture segmentation, but can enable segmentation by themselves, in the absence of other cues. A popular computational model of texture segmentation known as the "Filter-Rectify-Filter" (FRF) model predicts that density should be a second-order cue enabling segmentation. For a compound texture boundary defined by superimposing two single-micropattern density boundaries, a version of the FRF model in which different micropattern-specific channels are analyzed separately by different second-stage filters makes the prediction that segmentation thresholds should be identical in two cases: (1) Compound boundaries with an equal number of micropatterns on each side but different relative proportions of each variety (compound feature boundaries) and (2) Compound boundaries with different numbers of micropatterns on each side, but with each side having an identical number of each variety (compound density boundaries). We directly tested this prediction by comparing segmentation thresholds for second-order compound feature and density boundaries, comprised of two superimposed single-micropattern density boundaries comprised of complementary micropattern pairs differing either in orientation or contrast polarity. In both cases, we observed lower segmentation thresholds for compound density boundaries than compound feature boundaries, with identical results when the compound density boundaries were equated for RMS contrast. In a second experiment, we considered how two varieties of micropatterns summate for compound boundary segmentation. In the case where two single micro-pattern density boundaries are superimposed to form a compound density boundary, we find that the two channels combine via probability summation. By contrast, when they are superimposed to form a compound feature boundary, segmentation performance is worse than for either channel alone. From these findings, we conclude that density segmentation may rely on neural mechanisms different from those which underlie feature segmentation, consistent with recent findings suggesting that density comprises a separate psychophysical 'channel'.
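The probability-summation prediction mentioned above can be written compactly; the sketch below assumes the usual high-threshold independence formulation and illustrative detection probabilities, not the paper's exact psychometric procedure:

```python
def probability_summation(p1, p2):
    """Predicted detection probability for two independent channels that detect
    the boundary with probabilities p1 and p2 (high-threshold independence
    assumption): the compound boundary is detected if either channel detects it."""
    return 1.0 - (1.0 - p1) * (1.0 - p2)

# e.g., two single-micropattern density channels detected 60% and 55% of the time
print(probability_summation(0.60, 0.55))  # -> 0.82
```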
Affiliation(s)
- Christopher DiMattina
- Computational Perception Laboratory, Florida Gulf Coast University, Fort Myers, FL, USA 33965-6565
- Department of Psychology, Florida Gulf Coast University, Fort Myers, FL, USA 33965-6565
16
Mocz V, Jeong SK, Chun M, Xu Y. Multiple visual objects are represented differently in the human brain and convolutional neural networks. Sci Rep 2023; 13:9088. [PMID: 37277406 DOI: 10.1038/s41598-023-36029-z]
Abstract
Objects in the real world usually appear with other objects. To form object representations independent of whether or not other objects are encoded concurrently, in the primate brain, responses to an object pair are well approximated by the average responses to each constituent object shown alone. This is found at the single unit level in the slope of response amplitudes of macaque IT neurons to paired and single objects, and at the population level in fMRI voxel response patterns in human ventral object processing regions (e.g., LO). Here, we compare how the human brain and convolutional neural networks (CNNs) represent paired objects. In human LO, we show that averaging exists in both single fMRI voxels and voxel population responses. However, in the higher layers of five CNNs pretrained for object classification varying in architecture, depth and recurrent processing, slope distribution across units and, consequently, averaging at the population level both deviated significantly from the brain data. Object representations thus interact with each other in CNNs when objects are shown together and differ from when objects are shown individually. Such distortions could significantly limit CNNs' ability to generalize object representations formed in different contexts.
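A schematic of the averaging test described above, on simulated responses (in the actual study the comparison is between measured pair responses and the single-object responses across voxels or units; all numbers below are made up):

```python
import numpy as np

rng = np.random.default_rng(0)
n_units = 200
resp_a = rng.gamma(2.0, 1.0, n_units)          # responses to object A shown alone
resp_b = rng.gamma(2.0, 1.0, n_units)          # responses to object B shown alone
resp_pair = 0.5 * (resp_a + resp_b) + rng.normal(0.0, 0.1, n_units)  # simulated pair responses

# If pair responses equal the average of the single-object responses,
# regressing pair responses on (A + B) should give a slope near 0.5.
slope, intercept = np.polyfit(resp_a + resp_b, resp_pair, 1)
print(f"slope = {slope:.2f} (averaging predicts ~0.5)")
```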
Affiliation(s)
- Viola Mocz
- Visual Cognitive Neuroscience Lab, Department of Psychology, Yale University, 2 Hillhouse Ave, New Haven, CT, 06520, USA
- Su Keun Jeong
- Department of Psychology, Chungbuk National University, Cheongju, South Korea
- Marvin Chun
- Visual Cognitive Neuroscience Lab, Department of Psychology, Yale University, 2 Hillhouse Ave, New Haven, CT, 06520, USA
- Department of Neuroscience, Yale School of Medicine, New Haven, CT, 06520, USA
- Yaoda Xu
- Visual Cognitive Neuroscience Lab, Department of Psychology, Yale University, 2 Hillhouse Ave, New Haven, CT, 06520, USA.
17
Sandbrink KJ, Mamidanna P, Michaelis C, Bethge M, Mathis MW, Mathis A. Contrasting action and posture coding with hierarchical deep neural network models of proprioception. eLife 2023; 12:e81499. [PMID: 37254843 PMCID: PMC10361732 DOI: 10.7554/elife.81499]
Abstract
Biological motor control is versatile, efficient, and depends on proprioceptive feedback. Muscles are flexible and undergo continuous changes, requiring distributed adaptive control mechanisms that continuously account for the body's state. The canonical role of proprioception is representing the body state. We hypothesize that the proprioceptive system could also be critical for high-level tasks such as action recognition. To test this theory, we pursued a task-driven modeling approach, which allowed us to isolate the study of proprioception. We generated a large synthetic dataset of human arm trajectories tracing characters of the Latin alphabet in 3D space, together with muscle activities obtained from a musculoskeletal model and model-based muscle spindle activity. Next, we compared two classes of tasks: trajectory decoding and action recognition, which allowed us to train hierarchical models to decode either the position and velocity of the end-effector of one's posture or the character (action) identity from the spindle firing patterns. We found that artificial neural networks could robustly solve both tasks, and the networks' units show tuning properties similar to neurons in the primate somatosensory cortex and the brainstem. Remarkably, we found uniformly distributed directional selective units only with the action-recognition-trained models and not the trajectory-decoding-trained models. This suggests that proprioceptive encoding is additionally associated with higher-level functions such as action recognition and therefore provides new, experimentally testable hypotheses of how proprioception aids in adaptive motor control.
Affiliation(s)
- Kai J Sandbrink
- The Rowland Institute at Harvard, Harvard University, Cambridge, United States
- Pranav Mamidanna
- Tübingen AI Center, Eberhard Karls Universität Tübingen & Institute for Theoretical Physics, Tübingen, Germany
- Claudio Michaelis
- Tübingen AI Center, Eberhard Karls Universität Tübingen & Institute for Theoretical Physics, Tübingen, Germany
- Matthias Bethge
- Tübingen AI Center, Eberhard Karls Universität Tübingen & Institute for Theoretical Physics, Tübingen, Germany
- Mackenzie Weygandt Mathis
- The Rowland Institute at Harvard, Harvard University, Cambridge, United States
- Brain Mind Institute, School of Life Sciences, École Polytechnique Fédérale de Lausanne, Genève, Switzerland
- Alexander Mathis
- The Rowland Institute at Harvard, Harvard University, Cambridge, United States
- Brain Mind Institute, School of Life Sciences, École Polytechnique Fédérale de Lausanne, Genève, Switzerland
18
Doerig A, Sommers RP, Seeliger K, Richards B, Ismael J, Lindsay GW, Kording KP, Konkle T, van Gerven MAJ, Kriegeskorte N, Kietzmann TC. The neuroconnectionist research programme. Nat Rev Neurosci 2023:10.1038/s41583-023-00705-w. [PMID: 37253949 DOI: 10.1038/s41583-023-00705-w]
Abstract
Artificial neural networks (ANNs) inspired by biology are beginning to be widely used to model behavioural and neural data, an approach we call 'neuroconnectionism'. ANNs have been not only lauded as the current best models of information processing in the brain but also criticized for failing to account for basic cognitive functions. In this Perspective article, we propose that arguing about the successes and failures of a restricted set of current ANNs is the wrong approach to assess the promise of neuroconnectionism for brain science. Instead, we take inspiration from the philosophy of science, and in particular from Lakatos, who showed that the core of a scientific research programme is often not directly falsifiable but should be assessed by its capacity to generate novel insights. Following this view, we present neuroconnectionism as a general research programme centred around ANNs as a computational language for expressing falsifiable theories about brain computation. We describe the core of the programme, the underlying computational framework and its tools for testing specific neuroscientific hypotheses and deriving novel understanding. Taking a longitudinal view, we review past and present neuroconnectionist projects and their responses to challenges and argue that the research programme is highly progressive, generating new and otherwise unreachable insights into the workings of the brain.
Affiliation(s)
- Adrien Doerig
- Institute of Cognitive Science, University of Osnabrück, Osnabrück, Germany.
- Donders Institute for Brain, Cognition and Behaviour, Nijmegen, The Netherlands.
- Rowan P Sommers
- Department of Neurobiology of Language, Max Planck Institute for Psycholinguistics, Nijmegen, The Netherlands
- Katja Seeliger
- Max Planck Institute for Human Cognitive and Brain Sciences, Leipzig, Germany
- Blake Richards
- Department of Neurology and Neurosurgery, McGill University, Montréal, QC, Canada
- School of Computer Science, McGill University, Montréal, QC, Canada
- Mila, Montréal, QC, Canada
- Montréal Neurological Institute, Montréal, QC, Canada
- Learning in Machines and Brains Program, CIFAR, Toronto, ON, Canada
- Konrad P Kording
- Learning in Machines and Brains Program, CIFAR, Toronto, ON, Canada
- Bioengineering, Neuroscience, University of Pennsylvania, Pennsylvania, PA, USA
- Tim C Kietzmann
- Institute of Cognitive Science, University of Osnabrück, Osnabrück, Germany
19
Taylor J, Xu Y. Comparing the Dominance of Color and Form Information across the Human Ventral Visual Pathway and Convolutional Neural Networks. J Cogn Neurosci 2023; 35:816-840. [PMID: 36877074 PMCID: PMC11283826 DOI: 10.1162/jocn_a_01979]
Abstract
Color and form information can be decoded in every region of the human ventral visual hierarchy, and at every layer of many convolutional neural networks (CNNs) trained to recognize objects, but how does the coding strength of these features vary over processing? Here, we characterize for these features both their absolute coding strength - how strongly each feature is represented independent of the other feature - and their relative coding strength - how strongly each feature is encoded relative to the other - which could constrain how well a feature can be read out by downstream regions across variation in the other feature. To quantify relative coding strength, we define a measure called the form dominance index that compares the relative influence of color and form on the representational geometry at each processing stage. We analyze brain and CNN responses to stimuli varying based on color and either a simple form feature, orientation, or a more complex form feature, curvature. We find that while the brain and CNNs largely differ in how the absolute coding strength of color and form vary over processing, comparing them in terms of their relative emphasis of these features reveals a striking similarity: For both the brain and for CNNs trained for object recognition (but not for untrained CNNs), orientation information is increasingly de-emphasized, and curvature information is increasingly emphasized, relative to color information over processing, with corresponding processing stages showing largely similar values of the form dominance index.
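The paper's exact formula for the form dominance index is not reproduced here; purely as an illustration of the kind of quantity such an index could be, one might compare how well form-only and color-only model dissimilarity structures explain the measured representational geometry:

```python
import numpy as np
from scipy.spatial.distance import pdist
from scipy.stats import spearmanr

def form_dominance_index(responses, form_labels, color_labels):
    """Illustrative index in [-1, 1] (not the paper's exact definition): positive
    values mean the measured representational geometry is better explained by
    form differences, negative values by color differences."""
    rdm = pdist(responses, metric="correlation")                                    # measured dissimilarities
    form_model = (pdist(form_labels[:, None].astype(float), "cityblock") > 0).astype(float)   # 1 if form differs
    color_model = (pdist(color_labels[:, None].astype(float), "cityblock") > 0).astype(float) # 1 if color differs
    r_form, _ = spearmanr(rdm, form_model)
    r_color, _ = spearmanr(rdm, color_model)
    return (r_form - r_color) / (abs(r_form) + abs(r_color))

# Toy example: 4 orientations x 4 colors, 50 simulated response channels.
rng = np.random.default_rng(0)
form_labels = np.repeat(np.arange(4), 4)
color_labels = np.tile(np.arange(4), 4)
responses = form_labels[:, None] * 1.0 + 0.3 * color_labels[:, None] + rng.normal(0, 0.5, (16, 50))
print(form_dominance_index(responses, form_labels, color_labels))
```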
20
Bracci S, Mraz J, Zeman A, Leys G, Op de Beeck H. The representational hierarchy in human and artificial visual systems in the presence of object-scene regularities. PLoS Comput Biol 2023; 19:e1011086. [PMID: 37115763 PMCID: PMC10171658 DOI: 10.1371/journal.pcbi.1011086]
Abstract
Human vision is still largely unexplained. Computer vision made impressive progress on this front, but it is still unclear to which extent artificial neural networks approximate human object vision at the behavioral and neural levels. Here, we investigated whether machine object vision mimics the representational hierarchy of human object vision with an experimental design that allows testing within-domain representations for animals and scenes, as well as across-domain representations reflecting their real-world contextual regularities such as animal-scene pairs that often co-occur in the visual environment. We found that DCNNs trained in object recognition acquire representations, in their late processing stage, that closely capture human conceptual judgements about the co-occurrence of animals and their typical scenes. Likewise, the DCNNs representational hierarchy shows surprising similarities with the representational transformations emerging in domain-specific ventrotemporal areas up to domain-general frontoparietal areas. Despite these remarkable similarities, the underlying information processing differs. The ability of neural networks to learn a human-like high-level conceptual representation of object-scene co-occurrence depends upon the amount of object-scene co-occurrence present in the image set thus highlighting the fundamental role of training history. Further, although mid/high-level DCNN layers represent the category division for animals and scenes as observed in VTC, its information content shows reduced domain-specific representational richness. To conclude, by testing within- and between-domain selectivity while manipulating contextual regularities we reveal unknown similarities and differences in the information processing strategies employed by human and artificial visual systems.
Affiliation(s)
- Stefania Bracci
- Center for Mind/Brain Sciences-CIMeC, University of Trento, Rovereto, Italy
- KU Leuven, Leuven Brain Institute, Brain & Cognition Research Unit, Leuven, Belgium
- Jakob Mraz
- KU Leuven, Leuven Brain Institute, Brain & Cognition Research Unit, Leuven, Belgium
- Astrid Zeman
- KU Leuven, Leuven Brain Institute, Brain & Cognition Research Unit, Leuven, Belgium
- Gaëlle Leys
- KU Leuven, Leuven Brain Institute, Brain & Cognition Research Unit, Leuven, Belgium
- Hans Op de Beeck
- KU Leuven, Leuven Brain Institute, Brain & Cognition Research Unit, Leuven, Belgium
21
Multi-center, multi-vendor validation of deep learning-based attenuation correction in SPECT MPI: data from the international flurpiridaz-301 trial. Eur J Nucl Med Mol Imaging 2023; 50:1028-1033. [PMID: 36401636 DOI: 10.1007/s00259-022-06045-8]
Abstract
PURPOSE: Although SPECT myocardial perfusion imaging (MPI) is susceptible to artifacts from soft tissue attenuation, most scans are performed without attenuation correction. Deep learning-based attenuation corrected (DLAC) polar maps improved diagnostic accuracy for detection of coronary artery disease (CAD) beyond non-attenuation-corrected (NAC) polar maps in a large single center study. However, the generalizability of this approach to other institutions with different scanner models and protocols is uncertain. In this study, we evaluated the diagnostic performance of DLAC compared to NAC for detection of CAD as defined by invasive coronary angiography (ICA) in a large multi-center trial.
METHODS: During the phase 3 flurpiridaz multi-center diagnostic clinical trial, conducted over 74 international sites, patients with known or suspected CAD who were referred for a clinically indicated ICA were enrolled. Using receiver operating characteristic (ROC) analysis, we evaluated the detectability of obstructive CAD, defined by quantitative coronary angiography by a core laboratory, using total perfusion deficit (TPD) as an integrated measure of defect extent and severity on DLAC polar maps compared to NAC polar maps. This was also compared against the visual scoring of three expert core lab readers.
RESULTS: Out of 755 patients, 722 (69% male) had evaluable SPECT and ICA for this study. ROC analysis demonstrated significant improvement in detecting per-patient obstructive CAD with DLAC over NAC with area under the curve (AUC) of 0.752 (95% CI: 0.711-0.792) for DLAC compared to 0.717 (0.675-0.759) for NAC (p value = 0.016). Compared to the consensus of expert readers AUC = 0.743 (0.701-0.784), DLAC was comparable (p value = 0.913), whereas NAC underperformed (p value = 0.051).
CONCLUSION: DL-based attenuation correction improves diagnostic performance of SPECT MPI for detecting CAD in data from a large multi-center clinical trial regardless of SPECT camera model or protocol.
TRIAL REGISTRATION: A Phase 3 Multi-center Study to Assess PET Imaging of Flurpiridaz F 18 Injection in Patients With CAD, ClinicalTrials.gov Identifier: NCT01347710, registered on 4 May 2011. https://clinicaltrials.gov/ct2/show/study/NCT01347710.
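A sketch of the per-patient ROC comparison described above, using scikit-learn on placeholder arrays; `tpd_dlac`, `tpd_nac`, and the simulated labels stand in for the trial's quantitative data, and the significance test for the AUC difference is not reproduced:

```python
import numpy as np
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(0)
has_cad = rng.integers(0, 2, size=722)                               # 1 = obstructive CAD on ICA (simulated)
tpd_dlac = has_cad * rng.normal(8, 3, 722) + rng.normal(5, 3, 722)   # total perfusion deficit, DLAC (simulated)
tpd_nac = has_cad * rng.normal(6, 3, 722) + rng.normal(5, 4, 722)    # total perfusion deficit, NAC (simulated)

auc_dlac = roc_auc_score(has_cad, tpd_dlac)
auc_nac = roc_auc_score(has_cad, tpd_nac)
print(f"AUC DLAC = {auc_dlac:.3f}, AUC NAC = {auc_nac:.3f}")
```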
22
Mocz V, Jeong SK, Chun M, Xu Y. Representing Multiple Visual Objects in the Human Brain and Convolutional Neural Networks. bioRxiv 2023:2023.02.28.530472. [PMID: 36909506 PMCID: PMC10002658 DOI: 10.1101/2023.02.28.530472]
Abstract
Objects in the real world often appear with other objects. To recover the identity of an object whether or not other objects are encoded concurrently, in primate object-processing regions, neural responses to an object pair have been shown to be well approximated by the average responses to each constituent object shown alone, indicating the whole is equal to the average of its parts. This is present at the single unit level in the slope of response amplitudes of macaque IT neurons to paired and single objects, and at the population level in response patterns of fMRI voxels in human ventral object processing regions (e.g., LO). Here we show that averaging exists in both single fMRI voxels and voxel population responses in human LO, with better averaging in single voxels leading to better averaging in fMRI response patterns, demonstrating a close correspondence of averaging at the fMRI unit and population levels. To understand if a similar averaging mechanism exists in convolutional neural networks (CNNs) pretrained for object classification, we examined five CNNs with varying architecture, depth and the presence/absence of recurrent processing. We observed averaging at the CNN unit level but rarely at the population level, and in most cases the CNN unit response distribution did not resemble human LO or macaque IT responses. The whole is thus not equal to the average of its parts in CNNs, potentially rendering the individual objects in a pair less accessible in CNNs during visual processing than they are in the human brain.
Affiliation(s)
- Viola Mocz
- Visual Cognitive Neuroscience Lab, Department of Psychology, Yale University, New Haven, CT 06520, USA
- Su Keun Jeong
- Department of Psychology, Chungbuk National University, South Korea
- Marvin Chun
- Visual Cognitive Neuroscience Lab, Department of Psychology, Yale University, New Haven, CT 06520, USA
- Department of Neuroscience, Yale School of Medicine, New Haven, CT 06520, USA
- Yaoda Xu
- Visual Cognitive Neuroscience Lab, Department of Psychology, Yale University, New Haven, CT 06520, USA
23
Revsine C, Gonzalez-Castillo J, Merriam EP, Bandettini PA, Ramírez FM. A unifying model for discordant and concordant results in human neuroimaging studies of facial viewpoint selectivity. bioRxiv 2023:2023.02.08.527219. [PMID: 36945636 PMCID: PMC10028835 DOI: 10.1101/2023.02.08.527219]
Abstract
Our ability to recognize faces regardless of viewpoint is a key property of the primate visual system. Traditional theories hold that facial viewpoint is represented by view-selective mechanisms at early visual processing stages and that representations become increasingly tolerant to viewpoint changes in higher-level visual areas. Newer theories, based on single-neuron monkey electrophysiological recordings, suggest an additional intermediate processing stage invariant to mirror-symmetric face views. Consistent with traditional theories, human studies combining neuroimaging and multivariate pattern analysis (MVPA) methods have provided evidence of view-selectivity in early visual cortex. However, contradictory results have been reported in higher-level visual areas concerning the existence in humans of mirror-symmetrically tuned representations. We believe these results reflect low-level stimulus confounds and data analysis choices. To probe for low-level confounds, we analyzed images from two popular face databases. Analyses of mean image luminance and contrast revealed biases across face views described by even polynomials - i.e., mirror-symmetric. To explain major trends across human neuroimaging studies of viewpoint selectivity, we constructed a network model that incorporates three biological constraints: cortical magnification, convergent feedforward projections, and interhemispheric connections. Given the identified low-level biases, we show that a gradual increase of interhemispheric connections across network layers is sufficient to replicate findings of mirror-symmetry in high-level processing stages, as well as view-tuning in early processing stages. Data analysis decisions - pattern dissimilarity measure and data recentering - accounted for the variable observation of mirror-symmetry in late processing stages. The model provides a unifying explanation of MVPA studies of viewpoint selectivity. We also show how common analysis choices can lead to erroneous conclusions.
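To make the "even polynomial" point concrete, the sketch below fits a polynomial in view angle to a synthetic mean-luminance profile (the databases' actual values are not reproduced); a mirror-symmetric bias shows up as dominant even-order coefficients:

```python
import numpy as np

# View angles from left profile (-90) to right profile (+90), and a synthetic
# mean-luminance profile that is approximately mirror-symmetric about 0.
angles = np.linspace(-90, 90, 13)
luminance = 0.55 - 1.5e-5 * angles**2 + np.random.default_rng(0).normal(0, 0.002, angles.size)

coeffs = np.polynomial.polynomial.polyfit(angles / 90.0, luminance, deg=4)
print("odd coefficients :", coeffs[1::2])   # near zero for a mirror-symmetric bias
print("even coefficients:", coeffs[0::2])   # carry the view-dependent structure
```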
Collapse
Affiliation(s)
- Cambria Revsine
- Laboratory of Brain and Cognition, National Institute of Mental Health, National Institutes of Health, Bethesda, MD
- Department of Psychology, University of Chicago, Chicago, IL
| | - Javier Gonzalez-Castillo
- Section on Functional Imaging Methods, Laboratory of Brain and Cognition, National Institute of Mental Health, National Institutes of Health, Bethesda, MD
| | - Elisha P Merriam
- Laboratory of Brain and Cognition, National Institute of Mental Health, National Institutes of Health, Bethesda, MD
| | - Peter A Bandettini
- Section on Functional Imaging Methods, Laboratory of Brain and Cognition, National Institute of Mental Health, National Institutes of Health, Bethesda, MD
- Functional MRI Core, National Institutes of Health, Bethesda, MD
| | - Fernando M Ramírez
- Laboratory of Brain and Cognition, National Institute of Mental Health, National Institutes of Health, Bethesda, MD
- Section on Functional Imaging Methods, Laboratory of Brain and Cognition, National Institute of Mental Health, National Institutes of Health, Bethesda, MD
| |
Collapse
|
24
|
Zhang Y, Aghajan ZM, Ison M, Lu Q, Tang H, Kalender G, Monsoor T, Zheng J, Kreiman G, Roychowdhury V, Fried I. Decoding of human identity by computer vision and neuronal vision. Sci Rep 2023; 13:651. [PMID: 36635322 PMCID: PMC9837190 DOI: 10.1038/s41598-022-26946-w] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/08/2022] [Accepted: 12/22/2022] [Indexed: 01/14/2023] Open
Abstract
Extracting meaning from a dynamic and variable flow of incoming information is a major goal of both natural and artificial intelligence. Computer vision (CV) guided by deep learning (DL) has made significant strides in recognizing a specific identity despite highly variable attributes. This is the same challenge faced by the nervous system, and it is partially addressed by concept cells: neurons exhibiting selective firing in response to specific persons/places, described in the human medial temporal lobe (MTL). Yet, access to neurons representing a particular concept is limited due to these neurons' sparse coding. It is conceivable, however, that the information required for such decoding is present in relatively small neuronal populations. To evaluate how well neuronal populations encode identity information in natural settings, we recorded neuronal activity from multiple brain regions of nine neurosurgical epilepsy patients implanted with depth electrodes, while the subjects watched an episode of the TV series "24". First, we devised a minimally supervised CV algorithm (with performance comparable to manually labeled data) to detect the most prevalent characters (above 1% overall appearance) in each frame. Next, we implemented DL models that used the time-varying population neural data as inputs and decoded the visual presence of the four main characters throughout the episode. This methodology allowed us to compare "computer vision" with "neuronal vision" (footprints associated with each character present in the activity of a subset of neurons) and to identify the brain regions that contributed to this decoding process. We then tested the DL models during a recognition memory task following movie viewing, in which subjects were asked to recognize clip segments from the presented episode. DL model activations were modulated not only by the presence of the corresponding characters but also by participants' subjective memory of whether they had seen the clip segment, and by the associative strengths of the characters in the narrative plot. The described approach can offer novel ways to probe the representation of concepts in time-evolving dynamic behavioral tasks. Further, the results suggest that the information required to robustly decode concepts is present in the population activity of only tens of neurons, even in brain regions beyond the MTL.
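A toy sketch of the decoding step is given below; the population sizes, firing rates, and the use of a plain logistic regression are placeholders (the study's decoders are deep networks over time-varying population activity). It predicts, frame by frame, whether a given character is on screen from binned firing rates.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(1)

n_frames, n_neurons = 2000, 50                       # hypothetical sizes
rates = rng.poisson(3.0, size=(n_frames, n_neurons)).astype(float)
present = rng.integers(0, 2, size=n_frames)          # 1 = character visible in the frame
rates[present == 1, :10] += 2.0                      # a small subset of neurons carries the signal

clf = LogisticRegression(max_iter=1000)
acc = cross_val_score(clf, rates, present, cv=5).mean()
print(f"cross-validated decoding accuracy: {acc:.2f}")
```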
Collapse
Affiliation(s)
- Yipeng Zhang
- Department of Electrical and Computer Engineering, University of California Los Angeles, Los Angeles, CA, USA
| | - Zahra M. Aghajan
- Department of Neurosurgery, David Geffen School of Medicine, University of California Los Angeles, Los Angeles, CA, USA
| | - Matias Ison
- School of Psychology, University of Nottingham, Nottingham, UK
| | - Qiujing Lu
- Department of Electrical and Computer Engineering, University of California Los Angeles, Los Angeles, CA, USA
| | - Hanlin Tang
- Children’s Hospital, Harvard Medical School, Boston, MA, USA
| | - Guldamla Kalender
- Department of Neurosurgery, David Geffen School of Medicine, University of California Los Angeles, Los Angeles, CA, USA
| | - Tonmoy Monsoor
- Department of Electrical and Computer Engineering, University of California Los Angeles, Los Angeles, CA, USA
| | - Jie Zheng
- Children’s Hospital, Harvard Medical School, Boston, MA, USA
| | - Gabriel Kreiman
- Children’s Hospital, Harvard Medical School, Boston, MA, USA
- Center for Brains, Minds and Machines, Massachusetts Institute of Technology, Cambridge, MA, USA
| | - Vwani Roychowdhury
- Department of Electrical and Computer Engineering, University of California Los Angeles, Los Angeles, CA, USA
| | - Itzhak Fried
- Department of Neurosurgery, David Geffen School of Medicine, University of California Los Angeles, Los Angeles, CA, USA
- Department of Psychiatry and Biobehavioral Sciences, Jane and Terry Semel Institute for Neuroscience and Human Behavior, University of California Los Angeles, Los Angeles, CA, USA
- Sackler Faculty of Medicine, Tel Aviv University, Tel Aviv, Israel
| |
Collapse
|
25
|
Jinsi O, Henderson MM, Tarr MJ. Early experience with low-pass filtered images facilitates visual category learning in a neural network model. PLoS One 2023; 18:e0280145. [PMID: 36608003 PMCID: PMC9821476 DOI: 10.1371/journal.pone.0280145] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/09/2022] [Accepted: 12/21/2022] [Indexed: 01/07/2023] Open
Abstract
Humans are born with very low contrast sensitivity, meaning that inputs to the infant visual system are both blurry and low contrast. Is this solely a byproduct of maturational processes or is there a functional advantage for beginning life with poor visual acuity? We addressed the impact of poor vision during early learning by exploring whether reduced visual acuity facilitated the acquisition of basic-level categories in a convolutional neural network model (CNN), as well as whether any such benefit transferred to subordinate-level category learning. Using the ecoset dataset to simulate basic-level category learning, we manipulated model training curricula along three dimensions: presence of blurred inputs early in training, rate of blur reduction over time, and grayscale versus color inputs. First, a training regime where blur was initially high and was gradually reduced over time, as in human development, improved basic-level categorization performance in a CNN relative to a regime in which non-blurred inputs were used throughout training. Second, when basic-level models were fine-tuned on a task including both basic-level and subordinate-level categories (using the ImageNet dataset), models initially trained with blurred inputs showed a greater performance benefit as compared to models trained exclusively on non-blurred inputs, suggesting that the benefit of blurring generalized from basic-level to subordinate-level categorization. Third, analogous to the low sensitivity to color that infants experience during the first 4-6 months of development, these advantages were observed only when grayscale images were used as inputs. We conclude that poor visual acuity in human newborns may confer functional advantages, including, as demonstrated here, more rapid and accurate acquisition of visual object categories at multiple levels.
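A schematic sketch of the blur-curriculum idea in PyTorch follows; the network, data, and sigma schedule are placeholders rather than the paper's training setup. The key point is simply that the Gaussian-blur sigma starts high and is reduced across epochs.

```python
import torch
import torch.nn as nn
import torchvision.transforms as T

# Hypothetical blur schedule: strong blur early in training, sharp images later.
sigma_schedule = [4.0, 2.0, 1.0, 0.5, 0.0]

model = nn.Sequential(nn.Conv2d(1, 8, 3), nn.ReLU(), nn.AdaptiveAvgPool2d(1),
                      nn.Flatten(), nn.Linear(8, 10))
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
loss_fn = nn.CrossEntropyLoss()

for epoch, sigma in enumerate(sigma_schedule):
    blur = T.GaussianBlur(kernel_size=9, sigma=max(sigma, 1e-3))
    images = torch.rand(16, 1, 64, 64)          # placeholder grayscale batch
    labels = torch.randint(0, 10, (16,))        # placeholder category labels
    images = blur(images) if sigma > 0 else images
    loss = loss_fn(model(images), labels)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    print(f"epoch {epoch}: sigma={sigma}, loss={loss.item():.3f}")
```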
Collapse
Affiliation(s)
- Omisa Jinsi
- Department of Psychology, Carnegie Mellon University, Pittsburgh, Pennsylvania, United States of America
| | - Margaret M. Henderson
- Department of Psychology, Carnegie Mellon University, Pittsburgh, Pennsylvania, United States of America
- Neuroscience Institute, Carnegie Mellon University, Pittsburgh, Pennsylvania, United States of America
- Department of Machine Learning, Carnegie Mellon University, Pittsburgh, Pennsylvania, United States of America
| | - Michael J. Tarr
- Department of Psychology, Carnegie Mellon University, Pittsburgh, Pennsylvania, United States of America
- Neuroscience Institute, Carnegie Mellon University, Pittsburgh, Pennsylvania, United States of America
- Department of Machine Learning, Carnegie Mellon University, Pittsburgh, Pennsylvania, United States of America
| |
Collapse
|
26
|
Hagio T, Murthy VL. Deep learning: Opening a third eye to myocardial perfusion imaging. J Nucl Cardiol 2022; 29:3311-3314. [PMID: 35554868 DOI: 10.1007/s12350-022-02959-w] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/08/2022] [Accepted: 03/09/2022] [Indexed: 01/18/2023]
Affiliation(s)
- Tomoe Hagio
- INVIA Medical Imaging Solutions, 3025 Boardwalk St, Suite 200, Ann Arbor, MI, 48108, USA.
| | - Venkatesh L Murthy
- Division of Cardiovascular Medicine, Department of Internal Medicine, University of Michigan, Ann Arbor, MI, USA
| |
Collapse
|
27
|
Prince JS, Charest I, Kurzawski JW, Pyles JA, Tarr MJ, Kay KN. Improving the accuracy of single-trial fMRI response estimates using GLMsingle. eLife 2022; 11:77599. [PMID: 36444984 PMCID: PMC9708069 DOI: 10.7554/elife.77599] [Citation(s) in RCA: 16] [Impact Index Per Article: 8.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/04/2022] [Accepted: 10/15/2022] [Indexed: 11/30/2022] Open
Abstract
Advances in artificial intelligence have inspired a paradigm shift in human neuroscience, yielding large-scale functional magnetic resonance imaging (fMRI) datasets that provide high-resolution brain responses to thousands of naturalistic visual stimuli. Because such experiments necessarily involve brief stimulus durations and few repetitions of each stimulus, achieving sufficient signal-to-noise ratio can be a major challenge. We address this challenge by introducing GLMsingle, a scalable, user-friendly toolbox available in MATLAB and Python that enables accurate estimation of single-trial fMRI responses (glmsingle.org). Requiring only fMRI time-series data and a design matrix as inputs, GLMsingle integrates three techniques for improving the accuracy of trial-wise general linear model (GLM) beta estimates. First, for each voxel, a custom hemodynamic response function (HRF) is identified from a library of candidate functions. Second, cross-validation is used to derive a set of noise regressors from voxels unrelated to the experiment. Third, to improve the stability of beta estimates for closely spaced trials, betas are regularized on a voxel-wise basis using ridge regression. Applying GLMsingle to the Natural Scenes Dataset and BOLD5000, we find that GLMsingle substantially improves the reliability of beta estimates across visually-responsive cortex in all subjects. Comparable improvements in reliability are also observed in a smaller-scale auditory dataset from the StudyForrest experiment. These improvements translate into tangible benefits for higher-level analyses relevant to systems and cognitive neuroscience. We demonstrate that GLMsingle: (i) helps decorrelate response estimates between trials nearby in time; (ii) enhances representational similarity between subjects within and across datasets; and (iii) boosts one-versus-many decoding of visual stimuli. GLMsingle is a publicly available tool that can significantly improve the quality of past, present, and future neuroimaging datasets sampling brain activity across many experimental conditions.
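As a stripped-down sketch of the third component (ridge regularization of trial-wise betas), the code below compares ordinary least squares to a ridge solution on simulated data; the design matrix, noise level, and regularization strength are invented, and the actual toolbox selects the ridge parameter per voxel via cross-validation.

```python
import numpy as np

rng = np.random.default_rng(2)

n_timepoints, n_trials, n_voxels = 300, 40, 100     # hypothetical sizes
X = rng.normal(size=(n_timepoints, n_trials))       # trial-wise (HRF-convolved) design matrix
true_betas = rng.normal(size=(n_trials, n_voxels))
Y = X @ true_betas + rng.normal(scale=2.0, size=(n_timepoints, n_voxels))

def ridge_betas(X, Y, lam):
    """Closed-form ridge solution for all voxels at once."""
    return np.linalg.solve(X.T @ X + lam * np.eye(X.shape[1]), X.T @ Y)

betas_ols = ridge_betas(X, Y, lam=0.0)      # ordinary least squares
betas_ridge = ridge_betas(X, Y, lam=50.0)   # shrunk, more stable estimates

for name, b in [("OLS", betas_ols), ("ridge", betas_ridge)]:
    print(f"{name}: mean squared error vs. true betas = {np.mean((b - true_betas) ** 2):.3f}")
```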
Collapse
Affiliation(s)
- Jacob S Prince
- Department of Psychology, Harvard University, Cambridge, United States
| | - Ian Charest
- Center for Human Brain Health, School of Psychology, University of Birmingham, Birmingham, United Kingdom
- cerebrUM, Département de Psychologie, Université de Montréal, Montréal, Canada
| | - Jan W Kurzawski
- Department of Psychology, New York University, New York, United States
| | - John A Pyles
- Center for Human Neuroscience, Department of Psychology, University of Washington, Seattle, United States
| | - Michael J Tarr
- Department of Psychology, Neuroscience Institute, Carnegie Mellon University, Pittsburgh, United States
| | - Kendrick N Kay
- Center for Magnetic Resonance Research (CMRR), Department of Radiology, University of Minnesota, Minneapolis, United States
| |
Collapse
|
28
|
Kuo JY, Denman AJ, Beacher NJ, Glanzberg JT, Zhang Y, Li Y, Lin DT. Using deep learning to study emotional behavior in rodent models. Front Behav Neurosci 2022; 16:1044492. [PMID: 36483523 PMCID: PMC9722968 DOI: 10.3389/fnbeh.2022.1044492] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/14/2022] [Accepted: 11/02/2022] [Indexed: 11/25/2023] Open
Abstract
Quantifying emotional aspects of animal behavior (e.g., anxiety, social interactions, reward, and stress responses) is a major focus of neuroscience research. Because manual scoring of emotion-related behaviors is time-consuming and subjective, classical methods rely on easily quantified measures such as lever pressing or time spent in different zones of an apparatus (e.g., open vs. closed arms of an elevated plus maze). Recent advancements have made it easier to extract pose information from videos, and multiple approaches for extracting nuanced information about behavioral states from pose estimation data have been proposed. These include supervised, unsupervised, and self-supervised approaches, employing a variety of different model types. Representations of behavioral states derived from these methods can be correlated with recordings of neural activity to increase the scope of connections that can be drawn between the brain and behavior. In this mini review, we will discuss how deep learning techniques can be used in behavioral experiments and how different model architectures and training paradigms influence the type of representation that can be obtained.
Collapse
Affiliation(s)
- Jessica Y. Kuo
- Intramural Research Program, National Institute on Drug Abuse, National Institutes of Health, Baltimore, MD, United States
| | - Alexander J. Denman
- Intramural Research Program, National Institute on Drug Abuse, National Institutes of Health, Baltimore, MD, United States
| | - Nicholas J. Beacher
- Intramural Research Program, National Institute on Drug Abuse, National Institutes of Health, Baltimore, MD, United States
| | - Joseph T. Glanzberg
- Intramural Research Program, National Institute on Drug Abuse, National Institutes of Health, Baltimore, MD, United States
| | - Yan Zhang
- Intramural Research Program, National Institute on Drug Abuse, National Institutes of Health, Baltimore, MD, United States
| | - Yun Li
- Department of Zoology and Physiology, University of Wyoming, Laramie, WY, United States
| | - Da-Ting Lin
- Intramural Research Program, National Institute on Drug Abuse, National Institutes of Health, Baltimore, MD, United States
| |
Collapse
|
29
|
Mocz V, Vaziri-Pashkam M, Chun M, Xu Y. Predicting Identity-Preserving Object Transformations in Human Posterior Parietal Cortex and Convolutional Neural Networks. J Cogn Neurosci 2022; 34:2406-2435. [PMID: 36122358 PMCID: PMC9988239 DOI: 10.1162/jocn_a_01916] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/04/2022]
Abstract
Previous research shows that, within human occipito-temporal cortex (OTC), we can use a general linear mapping function to link visual object responses across nonidentity feature changes, including Euclidean features (e.g., position and size) and non-Euclidean features (e.g., image statistics and spatial frequency). Although the learned mapping is capable of predicting responses of objects not included in training, these predictions are better for categories included than those not included in training. These findings demonstrate a near-orthogonal representation of object identity and nonidentity features throughout human OTC. Here, we extended these findings to examine the mapping across both Euclidean and non-Euclidean feature changes in human posterior parietal cortex (PPC), including functionally defined regions in inferior and superior intraparietal sulcus. We additionally examined responses in five convolutional neural networks (CNNs) pretrained for object classification, as CNNs are considered the current best model of the primate ventral visual system. We separately compared results from PPC and CNNs with those of OTC. We found that a linear mapping function could successfully link object responses in different states of nonidentity transformations in human PPC and CNNs for both Euclidean and non-Euclidean features. Overall, we found that object identity and nonidentity features are represented in a near-orthogonal, rather than completely orthogonal, manner in PPC and CNNs, just as they are in OTC. Meanwhile, some differences existed among OTC, PPC, and CNNs. These results demonstrate the similarities and differences in how visual object information across an identity-preserving image transformation may be represented in OTC, PPC, and CNNs.
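A minimal illustration of the general linear mapping idea on simulated response matrices is shown below (voxel and object counts, the transformation, and the train/test split are all invented; the paper's fMRI analyses are considerably more involved): learn a linear map from responses in one state of a transformation to responses in another, then evaluate it on held-out objects.

```python
import numpy as np

rng = np.random.default_rng(3)

n_voxels, n_objects = 120, 30                             # hypothetical sizes
resp_state_a = rng.normal(size=(n_voxels, n_objects))     # e.g., objects at one position/size
mixing = np.eye(n_voxels) + 0.1 * rng.normal(size=(n_voxels, n_voxels))
resp_state_b = mixing @ resp_state_a + 0.2 * rng.normal(size=(n_voxels, n_objects))

train, test = np.arange(20), np.arange(20, 30)            # split over objects, not voxels

# Fit W so that resp_state_a.T @ W approximates resp_state_b.T on training objects.
W, *_ = np.linalg.lstsq(resp_state_a[:, train].T, resp_state_b[:, train].T, rcond=None)
pred_b = (resp_state_a[:, test].T @ W).T

# Correlate predicted and measured patterns for each held-out object.
r = [np.corrcoef(pred_b[:, i], resp_state_b[:, test][:, i])[0, 1] for i in range(len(test))]
print(f"mean prediction correlation on held-out objects: {np.mean(r):.2f}")
```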
Collapse
|
30
|
Xu Y, Vaziri-Pashkam M. Understanding transformation tolerant visual object representations in the human brain and convolutional neural networks. Neuroimage 2022; 263:119635. [PMID: 36116617 PMCID: PMC11283825 DOI: 10.1016/j.neuroimage.2022.119635] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/29/2022] [Revised: 09/12/2022] [Accepted: 09/14/2022] [Indexed: 11/16/2022] Open
Abstract
Forming transformation-tolerant object representations is critical to high-level primate vision. Despite its significance, many details of tolerance in the human brain remain unknown. Likewise, despite the ability of convolutional neural networks (CNNs) to exhibit human-like object categorization performance, whether CNNs form tolerance similar to that of the human brain is unknown. Here we provide the first comprehensive documentation and comparison of three tolerance measures in the human brain and CNNs. We measured fMRI responses from human ventral visual areas to real-world objects across both Euclidean and non-Euclidean feature changes. In single fMRI voxels in higher visual areas, we observed robust object response rank-order preservation across feature changes. This is indicative of functional smoothness in tolerance at the fMRI meso-scale level that has never been reported before. At the voxel population level, we found highly consistent object representational structure across feature changes towards the end of ventral processing. Rank-order preservation, consistency, and a third tolerance measure, cross-decoding success (i.e., a linear classifier's ability to generalize performance across feature changes) showed an overall tight coupling. These tolerance measures were in general lower for Euclidean than non-Euclidean feature changes in lower visual areas, but increased over the course of ventral processing for all feature changes. These characteristics of tolerance, however, were absent in eight CNNs pretrained with ImageNet images with varying network architecture, depth, the presence/absence of recurrent processing, or whether a network was pretrained with the original or stylized ImageNet images that encouraged shape processing. CNNs do not appear to develop the same kind of tolerance as the human brain over the course of visual processing.
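A small sketch of the rank-order preservation measure on simulated voxel responses follows (the sizes and the simulated tolerance level are arbitrary and do not reproduce the paper's pipeline): for each voxel, correlate the rank order of its responses to the same objects before and after a feature change.

```python
import numpy as np
from scipy.stats import spearmanr

rng = np.random.default_rng(4)

n_voxels, n_objects = 150, 20                    # hypothetical sizes
resp_original = rng.normal(size=(n_voxels, n_objects))
# Simulate a tolerant population: responses after the feature change keep most
# of the original rank structure, plus noise.
resp_changed = 0.8 * resp_original + 0.2 * rng.normal(size=(n_voxels, n_objects))

rhos = np.array([spearmanr(resp_original[v], resp_changed[v])[0] for v in range(n_voxels)])
print(f"mean rank-order preservation across voxels: {rhos.mean():.2f}")
```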
Collapse
Affiliation(s)
- Yaoda Xu
- Psychology Department, Yale University, New Haven, CT 06520, USA.
| | | |
Collapse
|
31
|
Utsumi A. A test of indirect grounding of abstract concepts using multimodal distributional semantics. Front Psychol 2022; 13:906181. [PMID: 36267060 PMCID: PMC9577286 DOI: 10.3389/fpsyg.2022.906181] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Reference Citation Analysis] [Abstract] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/28/2022] [Accepted: 09/14/2022] [Indexed: 11/13/2022] Open
Abstract
How are abstract concepts grounded in perceptual experiences for shaping human conceptual knowledge? Recent studies on abstract concepts emphasizing the role of language have argued that abstract concepts are grounded indirectly in perceptual experiences and language (or words) functions as a bridge between abstract concepts and perceptual experiences. However, this “indirect grounding” view remains largely speculative and has hardly been supported directly by empirical evidence. In this paper, therefore, we test the indirect grounding view by means of multimodal distributional semantics, in which the meaning of a word (i.e., a concept) is represented as the combination of textual and visual vectors. The newly devised multimodal distributional semantic model incorporates the indirect grounding view by computing the visual vector of an abstract word through the visual vectors of concrete words semantically related to that abstract word. An evaluation experiment is conducted in which conceptual representation is predicted from multimodal vectors using a multilayer feed-forward neural network. The analysis of prediction performance demonstrates that the indirect grounding model achieves significantly better performance in predicting human conceptual representation of abstract words than other models that mimic competing views on abstract concepts, especially than the direct grounding model in which the visual vectors of abstract words are computed directly from the images of abstract concepts. This result lends some plausibility to the indirect grounding view as a cognitive mechanism of grounding abstract concepts.
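A compact sketch of the indirect-grounding computation described above is given below, with a toy vocabulary and random embeddings standing in for the model's large text and image vector spaces: the visual vector of an abstract word is a similarity-weighted average of the visual vectors of concrete words that are close to it in the textual space.

```python
import numpy as np

rng = np.random.default_rng(5)

concrete_words = ["dog", "tree", "chair", "sun"]
text_vecs = {w: rng.normal(size=50) for w in concrete_words + ["freedom"]}
vis_vecs = {w: rng.normal(size=30) for w in concrete_words}   # only concrete words have images

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def indirect_visual_vector(abstract_word, k=3):
    """Visual vector for an abstract word, inferred via textually similar concrete words."""
    sims = {w: cosine(text_vecs[abstract_word], text_vecs[w]) for w in concrete_words}
    top = sorted(sims, key=sims.get, reverse=True)[:k]
    weights = np.array([sims[w] for w in top])
    weights = weights / np.abs(weights).sum()
    return sum(wt * vis_vecs[w] for wt, w in zip(weights, top))

print(indirect_visual_vector("freedom").shape)   # a (30,) vector in the visual space
```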
Collapse
|
32
|
Lepori MA, Firestone C. Can You Hear Me Now? Sensitive Comparisons of Human and Machine Perception. Cogn Sci 2022; 46:e13191. [DOI: 10.1111/cogs.13191] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/30/2020] [Revised: 06/07/2022] [Accepted: 06/26/2022] [Indexed: 11/28/2022]
Affiliation(s)
- Michael A. Lepori
- Department of Psychological & Brain Sciences, Johns Hopkins University
| | - Chaz Firestone
- Department of Psychological & Brain Sciences, Johns Hopkins University
| |
Collapse
|
33
|
Janini D, Hamblin C, Deza A, Konkle T. General object-based features account for letter perception. PLoS Comput Biol 2022; 18:e1010522. [PMID: 36155642 PMCID: PMC9536565 DOI: 10.1371/journal.pcbi.1010522] [Citation(s) in RCA: 4] [Impact Index Per Article: 2.0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/22/2021] [Revised: 10/06/2022] [Accepted: 08/29/2022] [Indexed: 11/30/2022] Open
Abstract
After years of experience, humans become experts at perceiving letters. Is this visual capacity attained by learning specialized letter features, or by reusing general visual features previously learned in service of object categorization? To explore this question, we first measured the perceptual similarity of letters in two behavioral tasks, visual search and letter categorization. Then, we trained deep convolutional neural networks on either 26-way letter categorization or 1000-way object categorization, as a way to operationalize possible specialized letter features and general object-based features, respectively. We found that the general object-based features more robustly correlated with the perceptual similarity of letters. We then operationalized additional forms of experience-dependent letter specialization by altering object-trained networks with varied forms of letter training; however, none of these forms of letter specialization improved the match to human behavior. Thus, our findings reveal that it is not necessary to appeal to specialized letter representations to account for perceptual similarity of letters. Instead, we argue that it is more likely that the perception of letters depends on domain-general visual features. For over a century, scientists have conducted behavioral experiments to investigate how the visual system recognizes letters, but it has proven difficult to propose a model of the feature space underlying this capacity. Here we leveraged recent advances in machine learning to model a wide variety of features ranging from specialized letter features to general object-based features. Across two large-scale behavioral experiments we find that general object-based features account well for letter perception, and that adding letter specialization did not improve the correspondence to human behavior. It is plausible that the ability to recognize letters largely relies on general visual features unaltered by letter learning.
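The comparison logic can be sketched as a representational similarity analysis; in the snippet below, random matrices stand in for the behavioral dissimilarities and for the features of object-trained and letter-trained networks, so only the bookkeeping is meaningful.

```python
import numpy as np
from scipy.spatial.distance import pdist
from scipy.stats import spearmanr

rng = np.random.default_rng(9)

n_letters, n_features = 26, 256
features_object_net = rng.normal(size=(n_letters, n_features))   # stand-in network features
features_letter_net = rng.normal(size=(n_letters, n_features))

# Stand-in behavioral dissimilarities (e.g., derived from visual search times).
behavior_rdm = pdist(rng.normal(size=(n_letters, 5)), metric="euclidean")

for name, feats in [("object-trained", features_object_net),
                    ("letter-trained", features_letter_net)]:
    model_rdm = pdist(feats, metric="correlation")
    rho = spearmanr(behavior_rdm, model_rdm)[0]
    print(f"{name} features vs. behavior: rho = {rho:.2f}")
```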
Collapse
Affiliation(s)
- Daniel Janini
- Department of Psychology, Harvard University, Cambridge, Massachusetts, United States of America
| | - Chris Hamblin
- Department of Psychology, Harvard University, Cambridge, Massachusetts, United States of America
| | - Arturo Deza
- Department of Psychology, Harvard University, Cambridge, Massachusetts, United States of America
- Department of Brain and Cognitive Sciences, Massachusetts Institute of Technology, Cambridge, Massachusetts, United States of America
| | - Talia Konkle
- Department of Psychology, Harvard University, Cambridge, Massachusetts, United States of America
| |
Collapse
|
34
|
Ren Y, Bu X, Wang M, Gong Y, Wang J, Yang Y, Li G, Zhang M, Zhou Y, Han ST. Synaptic plasticity in self-powered artificial striate cortex for binocular orientation selectivity. Nat Commun 2022; 13:5585. [PMID: 36151070 PMCID: PMC9508249 DOI: 10.1038/s41467-022-33393-8] [Citation(s) in RCA: 14] [Impact Index Per Article: 7.0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/22/2022] [Accepted: 09/13/2022] [Indexed: 11/16/2022] Open
Abstract
Gaining an in-depth understanding of each part of the visual pathway yields insights for overcoming the challenges that classic computer vision is facing. Here, we first report a bioinspired striate cortex with binocular and orientation-selective receptive fields based on a crossbar array of self-powered memristors: a solution-processed, monolithic, all-perovskite system in which each cross-point contains one CsFAPbI3 solar cell stacked directly on a CsPbBr2I memristor. The plasticity of the self-powered memristor can be modulated by optical stimuli following triplet-STDP rules. Furthermore, the plasticity of a 3 × 3 flexible crossbar array of self-powered memristors has been successfully modulated based on the generalized BCM learning rule for optical-encoded pattern recognition. Finally, we implemented an artificial striate cortex with binocularity and orientation selectivity based on two simulated 9 × 9 self-powered memristor networks. The emulation of a striate cortex with binocular and orientation selectivity will facilitate brisk edge and corner detection for machine vision in future applications. Designing efficient bio-inspired vision systems remains a challenge. Here, the authors report a bio-inspired striate visual cortex with binocular and orientation-selective receptive fields based on self-powered memristors to enable machine vision with brisk edge and corner detection in future applications.
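For orientation, here is a generic software sketch of a BCM-style update of the kind mentioned above (plain NumPy with made-up rates and constants; the paper implements such plasticity in memristor hardware): the sign of the weight change depends on whether postsynaptic activity is above or below a sliding threshold.

```python
import numpy as np

rng = np.random.default_rng(6)

n_inputs = 9                     # e.g., a 3 x 3 optically encoded input patch
w = rng.uniform(0.1, 0.2, n_inputs)
theta = 1.0                      # sliding modification threshold
eta, tau = 0.01, 50.0            # learning rate and threshold time constant

for step in range(500):
    x = rng.uniform(0.0, 1.0, n_inputs)   # presynaptic (optical) activity
    y = float(w @ x)                      # postsynaptic response
    w += eta * y * (y - theta) * x        # BCM: potentiate if y > theta, depress if y < theta
    w = np.clip(w, 0.0, None)             # device conductances stay non-negative
    theta += (y**2 - theta) / tau         # threshold tracks the average squared response

print("final weights:", np.round(w, 3))
```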
Collapse
Affiliation(s)
- Yanyun Ren
- Institute for Microscale Optoelectronics, Shenzhen University, Shenzhen, 518060, PR China
- Institute for Advanced Study, Shenzhen University, Shenzhen, 518060, PR China
| | - Xiaobo Bu
- Institute for Advanced Study, Shenzhen University, Shenzhen, 518060, PR China
| | - Ming Wang
- Key Laboratory of Optoelectronic Devices and Systems of Ministry of Education and Guangdong Province, College of Physics and Optoelectronic Engineering, Shenzhen University, Shenzhen, 518060, PR China
| | - Yue Gong
- College of Electronics and Information Engineering, Shenzhen University, Shenzhen, 518060, PR China
| | - Junjie Wang
- College of Electronics and Information Engineering, Shenzhen University, Shenzhen, 518060, PR China
| | - Yuyang Yang
- College of Electronics and Information Engineering, Shenzhen University, Shenzhen, 518060, PR China
| | - Guijun Li
- Key Laboratory of Optoelectronic Devices and Systems of Ministry of Education and Guangdong Province, College of Physics and Optoelectronic Engineering, Shenzhen University, Shenzhen, 518060, PR China
| | - Meng Zhang
- College of Electronics and Information Engineering, Shenzhen University, Shenzhen, 518060, PR China
| | - Ye Zhou
- Institute for Advanced Study, Shenzhen University, Shenzhen, 518060, PR China
| | - Su-Ting Han
- College of Electronics and Information Engineering, Shenzhen University, Shenzhen, 518060, PR China.
| |
Collapse
|
35
|
Li Y, Wang T, Yang Y, Dai W, Wu Y, Li L, Han C, Zhong L, Li L, Wang G, Dou F, Xing D. Cascaded normalizations for spatial integration in the primary visual cortex of primates. Cell Rep 2022; 40:111221. [PMID: 35977486 DOI: 10.1016/j.celrep.2022.111221] [Citation(s) in RCA: 7] [Impact Index Per Article: 3.5] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/23/2021] [Revised: 04/19/2022] [Accepted: 07/25/2022] [Indexed: 11/03/2022] Open
Abstract
Spatial integration of visual information is an important function in the brain. However, neural computation for spatial integration in the visual cortex remains unclear. In this study, we recorded laminar responses in V1 of awake monkeys driven by visual stimuli with grating patches and annuli of different sizes. We find three important response properties related to spatial integration that are significantly different between input and output layers: neurons in output layers have stronger surround suppression, smaller receptive field (RF), and higher sensitivity to grating annuli partially covering their RFs. These interlaminar differences can be explained by a descriptive model composed of two global divisions (normalization) and a local subtraction. Our results suggest suppressions with cascaded normalizations (CNs) are essential for spatial integration and laminar processing in the visual cortex. Interestingly, the features of spatial integration in convolutional neural networks, especially in lower layers, are different from our findings in V1.
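A schematic sketch in the spirit of the descriptive model summarized above (two divisive normalization stages followed by a local subtraction) is shown below; the stimulus, pooling widths, and constants are invented for illustration rather than fit to the recordings.

```python
import numpy as np

positions = np.linspace(-10, 10, 201)                 # visual-field positions (deg)
stimulus = (np.abs(positions) < 3).astype(float)      # a grating patch of radius 3 deg

def pooled(signal, sigma):
    """Gaussian-weighted spatial pooling of a response profile."""
    kernel = np.exp(-(positions[:, None] - positions[None, :]) ** 2 / (2 * sigma**2))
    kernel /= kernel.sum(axis=1, keepdims=True)
    return kernel @ signal

drive = pooled(stimulus, sigma=1.0)                    # feedforward drive
stage1 = drive / (0.1 + pooled(drive, sigma=3.0))      # first (global) normalization
stage2 = stage1 / (0.1 + pooled(stage1, sigma=6.0))    # second (global) normalization
response = stage2 - 0.3 * pooled(stage2, sigma=1.5)    # local subtraction

print(f"response at the receptive-field center: {response[100]:.2f}")
```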
Collapse
Affiliation(s)
- Yang Li
- State Key Laboratory of Cognitive Neuroscience and Learning & IDG/McGovern Institute for Brain Research, Beijing Normal University, Beijing 100875, China
| | - Tian Wang
- State Key Laboratory of Cognitive Neuroscience and Learning & IDG/McGovern Institute for Brain Research, Beijing Normal University, Beijing 100875, China; College of Life Sciences, Beijing Normal University, Beijing 100875, China
| | - Yi Yang
- State Key Laboratory of Cognitive Neuroscience and Learning & IDG/McGovern Institute for Brain Research, Beijing Normal University, Beijing 100875, China
| | - Weifeng Dai
- State Key Laboratory of Cognitive Neuroscience and Learning & IDG/McGovern Institute for Brain Research, Beijing Normal University, Beijing 100875, China
| | - Yujie Wu
- State Key Laboratory of Cognitive Neuroscience and Learning & IDG/McGovern Institute for Brain Research, Beijing Normal University, Beijing 100875, China
| | - Lianfeng Li
- China Academy of Launch Vehicle Technology, Beijing 100076, China
| | - Chuanliang Han
- State Key Laboratory of Cognitive Neuroscience and Learning & IDG/McGovern Institute for Brain Research, Beijing Normal University, Beijing 100875, China
| | - Lvyan Zhong
- State Key Laboratory of Cognitive Neuroscience and Learning & IDG/McGovern Institute for Brain Research, Beijing Normal University, Beijing 100875, China
| | - Liang Li
- Beijing Institute of Basic Medical Sciences, Beijing 100005, China
| | - Gang Wang
- Beijing Institute of Basic Medical Sciences, Beijing 100005, China
| | - Fei Dou
- State Key Laboratory of Cognitive Neuroscience and Learning & IDG/McGovern Institute for Brain Research, Beijing Normal University, Beijing 100875, China; College of Life Sciences, Beijing Normal University, Beijing 100875, China
| | - Dajun Xing
- State Key Laboratory of Cognitive Neuroscience and Learning & IDG/McGovern Institute for Brain Research, Beijing Normal University, Beijing 100875, China.
| |
Collapse
|
36
|
Tang K, Chin M, Chun M, Xu Y. The contribution of object identity and configuration to scene representation in convolutional neural networks. PLoS One 2022; 17:e0270667. [PMID: 35763531 PMCID: PMC9239439 DOI: 10.1371/journal.pone.0270667] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/19/2022] [Accepted: 06/14/2022] [Indexed: 11/23/2022] Open
Abstract
Scene perception involves extracting the identities of the objects comprising a scene in conjunction with their configuration (the spatial layout of the objects in the scene). However, how object identity and configuration information are weighted during scene processing, and how this weighting evolves over the course of processing, are not fully understood. Recent developments in convolutional neural networks (CNNs) have demonstrated their aptitude at scene processing tasks and identified correlations between processing in CNNs and in the human brain. Here we examined four CNN architectures (Alexnet, Resnet18, Resnet50, Densenet161) and their sensitivity to changes in object and configuration information over the course of scene processing. Despite differences among the four CNN architectures, across all CNNs, we observed a common pattern in the CNNs' responses to object identity and configuration changes. Each CNN demonstrated greater sensitivity to configuration changes in early stages of processing and stronger sensitivity to object identity changes in later stages. This pattern persists regardless of the spatial structure present in the image background, the accuracy of the CNN in classifying the scene, and even the task used to train the CNN. Importantly, CNNs' sensitivity to a configuration change is not the same as their sensitivity to any type of position change, such as that induced by a uniform translation of the objects without a configuration change. These results provide one of the first characterizations of how object identity and configuration information are weighted in CNNs during scene processing.
Collapse
Affiliation(s)
- Kevin Tang
- Department of Psychology, Yale University, New Haven, CT, United States of America
| | - Matthew Chin
- Department of Psychology, Yale University, New Haven, CT, United States of America
| | - Marvin Chun
- Department of Psychology, Yale University, New Haven, CT, United States of America
| | - Yaoda Xu
- Department of Psychology, Yale University, New Haven, CT, United States of America
| |
Collapse
|
37
|
Sp A. Trailblazers in Neuroscience: Using compositionality to understand how parts combine in whole objects. Eur J Neurosci 2022; 56:4378-4392. [PMID: 35760552 PMCID: PMC10084036 DOI: 10.1111/ejn.15746] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/06/2022] [Revised: 06/09/2022] [Accepted: 06/16/2022] [Indexed: 11/27/2022]
Abstract
A fundamental question for any visual system is whether its image representation can be understood in terms of its components. Decomposing any image into components is challenging because there are many possible decompositions with no common dictionary, and enumerating them leads to a combinatorial explosion. Even in perception, many objects are readily seen as containing parts, but there are many exceptions. These exceptions include objects that are not perceived as containing parts, properties like symmetry that cannot be localized to any single part, and also special categories like words and faces whose perception is widely believed to be holistic. Here, I describe a novel approach we have used to address these issues and evaluate compositionality at the behavioral and neural levels. The key design principle is to create a large number of objects by combining a small number of pre-defined components in all possible ways. This allows for building component-based models that explain whole objects using a combination of these components. Importantly, any systematic error in model fits can be used to detect the presence of emergent or holistic properties. Using this approach, we have found that whole object representations are surprisingly predictable from their components, that some components are preferred to others in perception, and that emergent properties can be discovered or explained using compositional models. Thus, compositionality is a powerful approach for understanding how whole objects relate to their parts.
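The core of a component-based analysis like the one described can be sketched as a linear model that predicts whole-object responses from part membership, with the residuals flagging candidate emergent properties; all sizes and the simulated emergent term below are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(7)

n_parts, n_objects, n_neurons = 6, 36, 40                      # hypothetical sizes
parts_in_object = rng.integers(0, 2, size=(n_objects, n_parts)).astype(float)
part_responses = rng.normal(size=(n_parts, n_neurons))

# Simulated whole-object responses: mostly a sum of part contributions,
# plus a small "emergent" component not explained by any single part.
emergent = 0.3 * rng.normal(size=(n_objects, n_neurons))
object_responses = parts_in_object @ part_responses + emergent

# Fit the compositional model and inspect how much variance the parts explain.
coef, *_ = np.linalg.lstsq(parts_in_object, object_responses, rcond=None)
resid = object_responses - parts_in_object @ coef
print(f"variance explained by parts: {1 - resid.var() / object_responses.var():.2f}")
```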
Collapse
Affiliation(s)
- Arun Sp
- Centre for Neuroscience, Indian Institute of Science, Bangalore
| |
Collapse
|
38
|
Malhotra G, Dujmović M, Bowers JS. Feature blindness: A challenge for understanding and modelling visual object recognition. PLoS Comput Biol 2022; 18:e1009572. [PMID: 35560155 PMCID: PMC9132323 DOI: 10.1371/journal.pcbi.1009572] [Citation(s) in RCA: 5] [Impact Index Per Article: 2.5] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/19/2021] [Revised: 05/25/2022] [Accepted: 03/19/2022] [Indexed: 12/02/2022] Open
Abstract
Humans rely heavily on the shape of objects to recognise them. Recently, it has been argued that Convolutional Neural Networks (CNNs) can also show a shape-bias, provided their learning environment contains this bias. This has led to the proposal that CNNs provide good mechanistic models of shape-bias and, more generally, human visual processing. However, it is also possible that humans and CNNs show a shape-bias for very different reasons, namely, shape-bias in humans may be a consequence of architectural and cognitive constraints whereas CNNs show a shape-bias as a consequence of learning the statistics of the environment. We investigated this question by exploring shape-bias in humans and CNNs when they learn in a novel environment. We observed that, in this new environment, humans (i) focused on shape and overlooked many non-shape features, even when non-shape features were more diagnostic, (ii) learned based on only one out of multiple predictive features, and (iii) failed to learn when global features, such as shape, were absent. This behaviour contrasted with the predictions of a statistical inference model with no priors, showing the strong role that shape-bias plays in human feature selection. It also contrasted with CNNs that (i) preferred to categorise objects based on non-shape features, and (ii) increased reliance on these non-shape features as they became more predictive. This was the case even when the CNN was pre-trained to have a shape-bias and the convolutional backbone was frozen. These results suggest that shape-bias has a different source in humans and CNNs: while learning in CNNs is driven by the statistical properties of the environment, humans are highly constrained by their previous biases, which suggests that cognitive constraints play a key role in how humans learn to recognise novel objects. Any object consists of hundreds of visual features that can be used to recognise it. How do humans select which feature to use? Do we always choose features that are best at predicting the object? In a series of experiments using carefully designed stimuli, we find that humans frequently ignore many features that are clearly visible and highly predictive. This behaviour is statistically inefficient and we show that it contrasts with statistical inference models such as state-of-the-art neural networks. Unlike humans, these models learn to rely on the most predictive feature when trained on the same data. We argue that the reason underlying human behaviour may be a bias to look for features that are less hungry for cognitive resources and generalise better to novel instances. Models that incorporate cognitive constraints may not only allow us to better understand human vision but also help us develop machine learning models that are more robust to changes in incidental features of objects.
Collapse
Affiliation(s)
- Gaurav Malhotra
- School of Psychological Sciences, University of Bristol, Bristol, United Kingdom
| | - Marin Dujmović
- School of Psychological Sciences, University of Bristol, Bristol, United Kingdom
| | - Jeffrey S. Bowers
- School of Psychological Sciences, University of Bristol, Bristol, United Kingdom
| |
Collapse
|
39
|
Spagnuolo EJ, Wilf P, Serre T. Decoding family-level features for modern and fossil leaves from computer-vision heat maps. AMERICAN JOURNAL OF BOTANY 2022; 109:768-788. [PMID: 35319778 DOI: 10.1002/ajb2.1842] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Received: 12/18/2021] [Revised: 03/07/2022] [Accepted: 03/08/2022] [Indexed: 06/14/2023]
Abstract
PREMISE: Angiosperm leaves present a classic identification problem due to their morphological complexity. Computer-vision algorithms can identify diagnostic regions in images, and heat map outputs illustrate those regions for identification, providing novel insights through visual feedback. We investigate the potential of analyzing leaf heat maps to reveal novel, human-friendly botanical information with applications for extant- and fossil-leaf identification. METHODS: We developed a manual scoring system for hotspot locations on published computer-vision heat maps of cleared leaves that showed diagnostic regions for family identification. Heat maps of 3114 cleared leaves of 930 genera in 14 angiosperm families were analyzed. The top-5 and top-1 hotspot regions of highest diagnostic value were scored for 21 leaf locations. The resulting data were viewed using box plots and analyzed using cluster and principal component analyses. We manually identified similar features in fossil leaves to informally demonstrate potential fossil applications. RESULTS: The method successfully mapped machine strategy using standard botanical language, and distinctive patterns emerged for each family. Hotspots were concentrated on secondary veins (Salicaceae, Myrtaceae, Anacardiaceae), tooth apices (Betulaceae, Rosaceae), and on the little-studied margins of untoothed leaves (Rubiaceae, Annonaceae, Ericaceae). Similar features drove the results from multivariate analyses. The results echo many traditional observations, while also showing that most diagnostic leaf features remain undescribed. CONCLUSIONS: Machine-derived heat maps that initially appear to be dominated by noise can be translated into human-interpretable knowledge, highlighting paths forward for botanists and paleobotanists to discover new diagnostic botanical characters.
Collapse
Affiliation(s)
- Edward J Spagnuolo
- Department of Geosciences and Earth and Environmental Systems Institute, Pennsylvania State University, University Park, Pennsylvania, 16802, USA
- Millennium Scholars Program, Pennsylvania State University, University Park, Pennsylvania, 16802, USA
- Schreyer Honors College, Pennsylvania State University, University Park, Pennsylvania, 16802, USA
| | - Peter Wilf
- Department of Geosciences and Earth and Environmental Systems Institute, Pennsylvania State University, University Park, Pennsylvania, 16802, USA
| | - Thomas Serre
- Department of Cognitive, Linguistic and Psychological Sciences, Carney Institute for Brain Science, Brown University, Providence, Rhode Island, 02912, USA
| |
Collapse
|
40
|
Neri P. Deep networks may capture biological behavior for shallow, but not deep, empirical characterizations. Neural Netw 2022; 152:244-266. [PMID: 35567948 DOI: 10.1016/j.neunet.2022.04.023] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/29/2021] [Revised: 04/15/2022] [Accepted: 04/20/2022] [Indexed: 11/19/2022]
Abstract
We assess whether deep convolutional networks (DCN) can account for a most fundamental property of human vision: detection/discrimination of elementary image elements (bars) at different contrast levels. The human visual process can be characterized to varying degrees of "depth," ranging from percentage of correct detection to detailed tuning and operating characteristics of the underlying perceptual mechanism. We challenge deep networks with the same stimuli/tasks used with human observers and apply equivalent characterization of the stimulus-response coupling. In general, we find that popular DCN architectures do not account for signature properties of the human process. For shallow depth of characterization, some variants of network-architecture/training-protocol produce human-like trends; however, more articulate empirical descriptors expose glaring discrepancies. Networks can be coaxed into learning those richer descriptors by shadowing a human surrogate in the form of a tailored circuit perturbed by unstructured input, thus ruling out the possibility that human-model misalignment in standard protocols may be attributable to insufficient representational power. These results urge caution in assessing whether neural networks do or do not capture human behavior: ultimately, our ability to assess "success" in this area can only be as good as afforded by the depth of behavioral characterization against which the network is evaluated. We propose a novel set of metrics/protocols that impose stringent constraints on the evaluation of DCN behavior as an adequate approximation to biological processes.
Collapse
Affiliation(s)
- Peter Neri
- Laboratoire des Systèmes Perceptifs (UMR8248), École normale supérieure, PSL Research University, Paris, France.
| |
Collapse
|
41
|
Charles Leek E, Leonardis A, Heinke D. Deep neural networks and image classification in biological vision. Vision Res 2022; 197:108058. [PMID: 35487146 DOI: 10.1016/j.visres.2022.108058] [Citation(s) in RCA: 6] [Impact Index Per Article: 3.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/14/2021] [Revised: 04/12/2022] [Accepted: 04/13/2022] [Indexed: 10/18/2022]
Abstract
In this paper we consider recent advances in the use of deep convolutional neural networks to understanding biological vision. We focus on claims about the plausibility of feedforward deep convolutional neural networks (fDCNNs) as models of image classification in the biological system. Despite the putative similarity of these networks to some properties of the biological vision system, and the remarkable levels of performance accuracy of some fDCNNs, we argue that their plausibility as a framework for understanding image classification remains unclear. We highlight two key issues that we suggest are relevant to the evaluation of any form of DNN used to examine biological vision: (1) Network transparency under analysis - that is, the challenge of understanding what networks do, and how they do it. (2) Identifying appropriate benchmarks for comparing network performance and the biological system using both quantitative and qualitative performance measures. We show that there are important divergences between fDCNNs and biological vision that reflect fundamental differences in computational architectures, and representational structures, supporting image classification in these networks and the biological system.
Collapse
Affiliation(s)
| | | | - Dietmar Heinke
- School of Computer Science, University of Birmingham, UK
| |
Collapse
|
42
|
Hagio T, Poitrasson-Rivière A, Moody JB, Renaud JM, Arida-Moody L, Shah RV, Ficaro EP, Murthy VL. "Virtual" attenuation correction: improving stress myocardial perfusion SPECT imaging using deep learning. Eur J Nucl Med Mol Imaging 2022; 49:3140-3149. [PMID: 35312837 DOI: 10.1007/s00259-022-05735-7] [Citation(s) in RCA: 20] [Impact Index Per Article: 10.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/24/2021] [Accepted: 02/13/2022] [Indexed: 12/26/2022]
Abstract
PURPOSE: Myocardial perfusion imaging (MPI) using single-photon emission computed tomography (SPECT) is widely used for coronary artery disease (CAD) evaluation. Although attenuation correction is recommended to diminish image artifacts and improve diagnostic accuracy, approximately 3/4ths of clinical MPI worldwide remains non-attenuation-corrected (NAC). In this work, we propose a novel deep learning (DL) algorithm to provide "virtual" DL attenuation-corrected (DLAC) perfusion polar maps solely from NAC data without concurrent computed tomography (CT) imaging or additional scans. METHODS: SPECT MPI studies (N = 11,532) with paired NAC and CTAC images were retrospectively identified. A convolutional neural network-based DL algorithm was developed and trained on half of the population to predict DLAC polar maps from NAC polar maps. Total perfusion deficit (TPD) was evaluated for all polar maps. TPDs from NAC and DLAC polar maps were compared to CTAC TPDs in linear regression analysis. Moreover, receiver-operating characteristic analysis was performed on NAC, CTAC, and DLAC TPDs to predict obstructive CAD as diagnosed from invasive coronary angiography. RESULTS: DLAC TPDs exhibited significantly improved linear correlation (p < 0.001) with CTAC (R2 = 0.85) compared to NAC vs. CTAC (R2 = 0.68). The diagnostic performance of TPD was also improved with DLAC compared to NAC with an area under the curve (AUC) of 0.827 vs. 0.780 (p = 0.012) with no statistically significant difference between AUC for CTAC and DLAC. At 88% sensitivity, specificity was improved by 18.9% for DLAC and 25.6% for CTAC. CONCLUSIONS: The proposed DL algorithm provided attenuation correction comparable to CTAC without the need for additional scans. Compared to conventional NAC perfusion imaging, DLAC significantly improved diagnostic accuracy.
Collapse
Affiliation(s)
- Tomoe Hagio
- INVIA Medical Imaging Solutions, 3025 Boardwalk St, Suite 200, Ann Arbor, MI, 48108, USA.
| | | | - Jonathan B Moody
- INVIA Medical Imaging Solutions, 3025 Boardwalk St, Suite 200, Ann Arbor, MI, 48108, USA
| | - Jennifer M Renaud
- INVIA Medical Imaging Solutions, 3025 Boardwalk St, Suite 200, Ann Arbor, MI, 48108, USA
| | - Liliana Arida-Moody
- Division of Cardiovascular Medicine, Department of Internal Medicine, University of Michigan, Ann Arbor, MI, USA
| | - Ravi V Shah
- Department of Cardiology, Massachusetts General Hospital, Boston, MA, USA
| | - Edward P Ficaro
- INVIA Medical Imaging Solutions, 3025 Boardwalk St, Suite 200, Ann Arbor, MI, 48108, USA.,Division of Cardiovascular Medicine, Department of Internal Medicine, University of Michigan, Ann Arbor, MI, USA
| | - Venkatesh L Murthy
- Division of Cardiovascular Medicine, Department of Internal Medicine, University of Michigan, Ann Arbor, MI, USA
| |
Collapse
|
43
|
Abstract
With the increase in artificial intelligence in real-world applications, there is interest in building hybrid systems that take both human and machine predictions into account. Previous work has shown the benefits of separately combining the predictions of diverse machine classifiers or groups of people. Using a Bayesian modeling framework, we extend these results by systematically investigating the factors that influence the performance of hybrid combinations of human and machine classifiers while taking into account the unique ways human and algorithmic confidence is expressed. Artificial intelligence (AI) and machine learning models are being increasingly deployed in real-world applications. In many of these applications, there is strong motivation to develop hybrid systems in which humans and AI algorithms can work together, leveraging their complementary strengths and weaknesses. We develop a Bayesian framework for combining the predictions and different types of confidence scores from humans and machines. The framework allows us to investigate the factors that influence complementarity, where a hybrid combination of human and machine predictions leads to better performance than combinations of human or machine predictions alone. We apply this framework to a large-scale dataset where humans and a variety of convolutional neural networks perform the same challenging image classification task. We show empirically and theoretically that complementarity can be achieved even if the human and machine classifiers perform at different accuracy levels as long as these accuracy differences fall within a bound determined by the latent correlation between human and machine classifier confidence scores. In addition, we demonstrate that hybrid human–machine performance can be improved by differentiating between the errors that humans and machine classifiers make across different class labels. Finally, our results show that eliciting and including human confidence ratings improve hybrid performance in the Bayesian combination model. Our approach is applicable to a wide variety of classification problems involving human and machine algorithms.
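As a toy illustration of confidence-weighted combination (a naive log-odds average, not the authors' Bayesian model with latent correlations between confidence scores), the sketch below combines a human probability judgment and a machine softmax score for a binary decision.

```python
import numpy as np

def combine(p_human, p_machine, w_human=0.5):
    """Weighted combination of two probability estimates in log-odds space."""
    logit = lambda p: np.log(p / (1 - p))
    combined_logit = w_human * logit(p_human) + (1 - w_human) * logit(p_machine)
    return 1 / (1 + np.exp(-combined_logit))

# Hypothetical case: the human leans one way, the machine leans the other;
# the hybrid estimate lands in between.
print(f"hybrid probability: {combine(0.70, 0.20):.2f}")
```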
Collapse
|
44
|
Baran SW, Bratcher N, Dennis J, Gaburro S, Karlsson EM, Maguire S, Makidon P, Noldus LPJJ, Potier Y, Rosati G, Ruiter M, Schaevitz L, Sweeney P, LaFollette MR. Emerging Role of Translational Digital Biomarkers Within Home Cage Monitoring Technologies in Preclinical Drug Discovery and Development. Front Behav Neurosci 2022; 15:758274. [PMID: 35242017 PMCID: PMC8885444 DOI: 10.3389/fnbeh.2021.758274] [Citation(s) in RCA: 4] [Impact Index Per Article: 2.0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/13/2021] [Accepted: 12/29/2021] [Indexed: 02/05/2023] Open
Abstract
In drug discovery and development, traditional assessment of human patients and preclinical subjects occurs at limited time points in potentially stressful surroundings (i.e., the clinic or a test arena), which can impact data quality and welfare. However, recent advances in remote digital monitoring technologies enable the assessment of human patients and preclinical subjects across multiple time points in familiar surroundings. The ability to monitor a patient throughout disease progression provides an opportunity for more relevant and efficient diagnosis as well as improved assessment of drug efficacy and safety. In preclinical in vivo animal models, these digital technologies allow for continuous, longitudinal, and non-invasive monitoring in the home environment. This manuscript provides an overview of digital monitoring technologies for use in preclinical studies including their history and evolution, current engagement through use cases, and impact of digital biomarkers (DBs) on drug discovery and the 3Rs. We also discuss barriers to implementation and strategies to overcome them. Finally, we address data consistency and technology standards from the perspective of technology providers, end-users, and subject matter experts. Overall, this review establishes an improved understanding of the value and implementation of digital biomarker (DB) technologies in preclinical research.
Affiliation(s)
- Szczepan W. Baran
- Novartis Institutes for BioMedical Research, Cambridge, MA, United States
- *Correspondence: Szczepan W. Baran,
| | - Natalie Bratcher
- Office of Global Animal Welfare, AbbVie, North Chicago, IL, United States
| | - John Dennis
- United States Food and Drug Administration, Silver Spring, MD, United States
| | | | | | - Sean Maguire
- GlaxoSmithKline, Collegeville, PA, United States
| | - Paul Makidon
- Comparative Medicine, AbbVie, South San Francisco, CA, United States
| | - Lucas P. J. J. Noldus
- Noldus Information Technology BV, Wageningen, Netherlands
- Department of Biophysics, Radboud University, Nijmegen, Netherlands
| | - Yohann Potier
- Tessera Therapeutics Inc., Cambridge, MA, United States
| | | | - Matt Ruiter
- Unified Information Devices Inc., Lake Villa, IL, United States
| | - Laura Schaevitz
- Recursion Pharmaceuticals Inc., Salt Lake City, UT, United States
| | - Patrick Sweeney
- Actual Analytics Ltd., Edinburgh, United Kingdom
- Naason Science, Inc., Cheongju-si, South Korea
| | | |
|
45
|
Son G, Walther DB, Mack ML. Scene wheels: Measuring perception and memory of real-world scenes with a continuous stimulus space. Behav Res Methods 2022; 54:444-456. [PMID: 34244986 DOI: 10.3758/s13428-021-01630-5] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Accepted: 05/19/2021] [Indexed: 11/08/2022]
Abstract
Precisely characterizing mental representations of visual experiences requires careful control of experimental stimuli. Recent work leveraging such stimulus control has led to important insights; however, these findings are constrained to simple visual properties like color and line orientation. There remains a critical methodological barrier to characterizing perceptual and mnemonic representations of realistic visual experiences. Here, we introduce a novel method to systematically control the visual properties of natural scene stimuli. Using generative adversarial networks (GANs), a state-of-the-art deep learning technique for creating highly realistic synthetic images, we generated scene wheels in which continuously changing visual properties smoothly transition between meaningful, realistic scenes. To validate the efficacy of the scene wheels, we conducted two behavioral experiments assessing the perceptual and mnemonic representations they support. In the perceptual validation experiment, we tested whether the continuous transition of scene images along the wheel is reflected in human perceptual similarity judgments: the perceived similarity of the scene images decreased as the distance between them on the wheel increased. In the memory experiment, participants reconstructed to-be-remembered scenes from the scene wheels; reconstruction errors for these scenes resembled the error distributions observed in prior studies using simple stimulus properties. Importantly, both perceptual similarity judgments and memory precision varied systematically with scene wheel radius. These findings suggest that our approach offers a window into the mental representations of naturalistic visual experiences.
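The sketch below illustrates one way a "scene wheel" of the kind described above might be constructed: latent vectors are interpolated around a closed loop through a handful of anchor latents from a pretrained scene GAN, and behavioral responses are scored by angular error on the wheel. The generator call, anchor sampling, and radius scaling are assumptions for illustration, not the authors' exact procedure.

```python
import numpy as np

def scene_wheel_latents(anchors, n_steps=360, radius=1.0):
    """Interpolate latent vectors around a closed loop through `anchors`
    (hypothetical anchor latents from a pretrained scene GAN), scaling
    deviations from the mean latent by `radius` to control wheel size."""
    anchors = np.asarray(anchors)
    center = anchors.mean(axis=0)
    k = len(anchors)
    wheel = []
    for step in range(n_steps):
        t = step / n_steps * k              # continuous position along the loop
        i, frac = int(t) % k, t - int(t)
        z = (1 - frac) * anchors[i] + frac * anchors[(i + 1) % k]
        wheel.append(center + radius * (z - center))
    return np.stack(wheel)                  # one latent vector per wheel angle

def angular_error(target_deg, response_deg):
    """Signed reconstruction error on the wheel, wrapped to (-180, 180]."""
    return (response_deg - target_deg + 180) % 360 - 180

# Usage sketch (generator_fn would be a pretrained GAN generator, e.g. a StyleGAN):
# latents = scene_wheel_latents(np.random.randn(5, 512), n_steps=360, radius=0.6)
# images = [generator_fn(z) for z in latents]
print(angular_error(350, 10))  # -> 20 degrees of error across the wrap-around point
```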
Affiliation(s)
- Gaeun Son
- Department of Psychology, University of Toronto, Sidney Smith Hall, 100 St George St, Toronto, ON, Canada.
| | - Dirk B Walther
- Department of Psychology, University of Toronto, Sidney Smith Hall, 100 St George St, Toronto, ON, Canada
| | - Michael L Mack
- Department of Psychology, University of Toronto, Sidney Smith Hall, 100 St George St, Toronto, ON, Canada
| |
|
46
|
Konkle T, Alvarez GA. A self-supervised domain-general learning framework for human ventral stream representation. Nat Commun 2022; 13:491. [PMID: 35078981 PMCID: PMC8789817 DOI: 10.1038/s41467-022-28091-4] [Citation(s) in RCA: 29] [Impact Index Per Article: 14.5] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/03/2021] [Accepted: 12/13/2021] [Indexed: 12/25/2022] Open
Abstract
Anterior regions of the ventral visual stream encode substantial information about object categories. Are top-down category-level forces critical for arriving at this representation, or can this representation be formed purely through domain-general learning of natural image structure? Here we present a fully self-supervised model which learns to represent individual images, rather than categories, such that views of the same image are embedded nearby in a low-dimensional feature space, distinctly from other recently encountered views. We find that category information implicitly emerges in the local similarity structure of this feature space. Further, these models learn hierarchical features which capture the structure of brain responses across the human ventral visual stream, on par with category-supervised models. These results provide computational support for a domain-general framework guiding the formation of visual representation, where the proximate goal is not explicitly about category information, but is instead to learn unique, compressed descriptions of the visual world.
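A generic instance-level contrastive objective of the kind described above can be sketched as follows. This is not the authors' exact model (which contrasts each image against recently encountered views rather than only against the current batch); it simply illustrates pulling two views of the same image together in the embedding space while pushing apart the other images.

```python
import torch
import torch.nn.functional as F

def instance_contrastive_loss(z1, z2, temperature=0.1):
    """NT-Xent-style loss: embeddings of two views of the same image (rows of
    z1 and z2) are pulled together and pushed apart from all other images in
    the batch -- a generic stand-in for an instance-level objective."""
    z1, z2 = F.normalize(z1, dim=1), F.normalize(z2, dim=1)
    z = torch.cat([z1, z2], dim=0)                  # (2N, D) stacked embeddings
    sim = z @ z.t() / temperature                   # scaled cosine similarities
    n = z1.shape[0]
    mask = torch.eye(2 * n, dtype=torch.bool)
    sim = sim.masked_fill(mask, float('-inf'))      # exclude self-similarity
    # positives: row i pairs with row i+N (and vice versa)
    targets = torch.cat([torch.arange(n, 2 * n), torch.arange(0, n)])
    return F.cross_entropy(sim, targets)

# Toy usage: 8 images, 128-d embeddings from some backbone plus projection head.
z_view1, z_view2 = torch.randn(8, 128), torch.randn(8, 128)
print(instance_contrastive_loss(z_view1, z_view2).item())
```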
Affiliation(s)
- Talia Konkle
- Department of Psychology & Center for Brain Science, Harvard University, Cambridge, MA, USA.
| | - George A Alvarez
- Department of Psychology & Center for Brain Science, Harvard University, Cambridge, MA, USA.
| |
|
47
|
Sa-Couto L, Wichert A. “What-Where” sparse distributed invariant representations of visual patterns. Neural Comput Appl 2022. [DOI: 10.1007/s00521-021-06759-0] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 10/19/2022]
|
49
|
McGenity C, Wright A, Treanor D. AIM in Surgical Pathology. Artif Intell Med 2022. [DOI: 10.1007/978-3-030-64573-1_278] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/28/2022]
|
50
|
Deep learning-based robust automatic non-invasive measurement of blood pressure using Korotkoff sounds. Sci Rep 2021; 11:23365. [PMID: 34862399 PMCID: PMC8642395 DOI: 10.1038/s41598-021-02513-7] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/28/2021] [Accepted: 11/17/2021] [Indexed: 11/09/2022] Open
Abstract
This paper proposes a method that automatically measures non-invasive blood pressure (BP) with an auscultatory approach using Korotkoff sounds (K-sounds). Methods that utilize K-sounds are generally more accurate than those relying on cuff pressure signals alone under well-controlled conditions, but most are vulnerable to measurement conditions and external noise because blood pressure is determined simply from threshold values in the sound signal. The proposed method enables robust and precise BP measurement by evaluating the probability that each sound pulse is an audible K-sound with a deep convolutional neural network (CNN). Instead of classifying sound pulses into two categories (audible K-sounds and others), the proposed CNN model outputs probability values. These values are arranged in time order across the Korotkoff cycle, and blood pressure is determined from the resulting sequence. The proposed method was tested on a dataset acquired in practice that occasionally contains considerable noise, which can degrade the performance of threshold-based methods. The results demonstrate that the proposed method outperforms a previously reported CNN-based classification method using K-sounds. With larger amounts and more varied types of data, the proposed method can potentially achieve even more precise and robust results.
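The decision stage described above can be illustrated with a small sketch: given the cuff pressure at each detected sound pulse and a CNN's per-pulse probability of an audible K-sound, systolic and diastolic pressures are read off the first and last confident pulses. The thresholding rule here is the textbook auscultatory convention, assumed for illustration rather than taken from the paper's exact decision logic.

```python
import numpy as np

def estimate_bp(pulse_pressures_mmHg, k_sound_probs, threshold=0.5):
    """Given the cuff pressure at each sound pulse (during deflation) and a
    CNN's probability that each pulse is an audible Korotkoff sound, take
    systolic BP as the pressure at the first confident K-sound and diastolic
    BP as the pressure at the last one (textbook auscultatory rule)."""
    probs = np.asarray(k_sound_probs)
    pressures = np.asarray(pulse_pressures_mmHg)
    audible = np.where(probs >= threshold)[0]
    if audible.size == 0:
        return None, None
    return int(pressures[audible[0]]), int(pressures[audible[-1]])

# Toy example: pressures fall as the cuff deflates; only middle pulses are audible.
pressures = np.array([150, 140, 130, 122, 115, 105, 95, 85, 78])
probs =     np.array([0.05, 0.1, 0.8, 0.9, 0.95, 0.9, 0.7, 0.2, 0.05])
print(estimate_bp(pressures, probs))  # -> (130, 95): SBP ~130 mmHg, DBP ~95 mmHg
```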
|