1
Arcaro M, Livingstone M. A Whole-Brain Topographic Ontology. Annu Rev Neurosci 2024; 47:21-40. PMID: 38360565. DOI: 10.1146/annurev-neuro-082823-073701.
Abstract
It is a common view that the intricate array of specialized domains in the ventral visual pathway is innately prespecified. This review postulates that it is not. We explore the origins of domain specificity, hypothesizing that the adult brain emerges from an interplay between a domain-general map-based architecture, shaped by intrinsic mechanisms, and experience. We argue that the most fundamental innate organization of cortex in general, and not just of the visual pathway, is a map-based topography that governs how the environment maps onto the brain, how brain areas interconnect, and, ultimately, how the brain processes information.
Affiliation(s)
- Michael Arcaro
- Department of Psychology, University of Pennsylvania, Philadelphia, Pennsylvania, USA
2
Farzmahdi A, Zarco W, Freiwald WA, Kriegeskorte N, Golan T. Emergence of brain-like mirror-symmetric viewpoint tuning in convolutional neural networks. eLife 2024; 13:e90256. PMID: 38661128. PMCID: PMC11142642. DOI: 10.7554/elife.90256.
Abstract
Primates can recognize objects despite 3D geometric variations such as in-depth rotations. The computational mechanisms that give rise to such invariances are yet to be fully understood. A curious case of partial invariance occurs in the macaque face-patch AL and in fully connected layers of deep convolutional networks in which neurons respond similarly to mirror-symmetric views (e.g. left and right profiles). Why does this tuning develop? Here, we propose a simple learning-driven explanation for mirror-symmetric viewpoint tuning. We show that mirror-symmetric viewpoint tuning for faces emerges in the fully connected layers of convolutional deep neural networks trained on object recognition tasks, even when the training dataset does not include faces. First, using 3D objects rendered from multiple views as test stimuli, we demonstrate that mirror-symmetric viewpoint tuning in convolutional neural network models is not unique to faces: it emerges for multiple object categories with bilateral symmetry. Second, we show why this invariance emerges in the models. Learning to discriminate among bilaterally symmetric object categories induces reflection-equivariant intermediate representations. AL-like mirror-symmetric tuning is achieved when such equivariant responses are spatially pooled by downstream units with sufficiently large receptive fields. These results explain how mirror-symmetric viewpoint tuning can emerge in neural networks, providing a theory of how they might emerge in the primate brain. Our theory predicts that mirror-symmetric viewpoint tuning can emerge as a consequence of exposure to bilaterally symmetric objects beyond the category of faces, and that it can generalize beyond previously experienced object categories.
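To make the pooling argument concrete, here is a minimal NumPy sketch (our own illustration, not the authors' code): if an intermediate stage is reflection-equivariant, so that a mirrored image produces a spatially mirrored feature map, then a downstream unit that pools over the whole map responds identically to left and right profile views. The symmetric filters and the random "profile" image are placeholder assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def conv_feature_map(image, filters):
    """Toy convolutional stage: correlate each filter with every image patch
    (valid padding). With left-right symmetric filters, mirroring the image
    mirrors the feature map (reflection equivariance)."""
    H, W = image.shape
    n, k, _ = filters.shape
    out = np.zeros((n, H - k + 1, W - k + 1))
    for c in range(n):
        for i in range(H - k + 1):
            for j in range(W - k + 1):
                out[c, i, j] = np.sum(image[i:i + k, j:j + k] * filters[c])
    return out

base = rng.normal(size=(4, 5, 5))
filters = base + base[:, :, ::-1]          # symmetrize each filter left-right

profile = rng.normal(size=(32, 32))        # stand-in for a "right profile" image
mirrored = profile[:, ::-1]                # its mirror image, the "left profile"

fmap_r = conv_feature_map(profile, filters)
fmap_l = conv_feature_map(mirrored, filters)

# Equivariance: the mirrored input yields a spatially mirrored feature map ...
assert np.allclose(fmap_l, fmap_r[:, :, ::-1])

# ... so a downstream unit pooling over all positions (a large receptive field)
# gives identical responses to the two mirror views: mirror-symmetric tuning.
print(np.allclose(fmap_r.sum(axis=(1, 2)), fmap_l.sum(axis=(1, 2))))
```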
Affiliation(s)
- Amirhossein Farzmahdi
- Laboratory of Neural Systems, The Rockefeller University, New York, United States
- School of Cognitive Sciences, Institute for Research in Fundamental Sciences, Tehran, Islamic Republic of Iran
- Wilbert Zarco
- Laboratory of Neural Systems, The Rockefeller University, New York, United States
- Winrich A Freiwald
- Laboratory of Neural Systems, The Rockefeller University, New York, United States
- The Center for Brains, Minds & Machines, Cambridge, United States
- Nikolaus Kriegeskorte
- Zuckerman Mind Brain Behavior Institute, Columbia University, New York, United States
- Department of Psychology, Columbia University, New York, United States
- Department of Neuroscience, Columbia University, New York, United States
- Department of Electrical Engineering, Columbia University, New York, United States
- Tal Golan
- Zuckerman Mind Brain Behavior Institute, Columbia University, New York, United States
3
Zhang Q, Zhang Y, Liu N, Sun X. Understanding of facial features in face perception: insights from deep convolutional neural networks. Front Comput Neurosci 2024; 18:1209082. PMID: 38655070. PMCID: PMC11035738. DOI: 10.3389/fncom.2024.1209082.
Abstract
Introduction: Face recognition has been a longstanding subject of interest in cognitive neuroscience and computer vision research. One key focus has been to understand the relative importance of different facial features in identifying individuals. Previous studies in humans have demonstrated the crucial role of eyebrows in face recognition, potentially even surpassing the importance of the eyes. However, eyebrows are not only vital for face recognition but also play a significant role in recognizing facial expressions and intentions, which may occur simultaneously and influence the face recognition process.
Methods: To address these challenges, the present study leveraged deep convolutional neural networks (DCNNs), artificial face recognition systems that can be specifically tailored for face recognition tasks. We investigated the relative importance of various facial features in face recognition by selectively blocking feature information from the input to the DCNN. Additionally, we conducted experiments in which we systematically blurred the information related to eyebrows to varying degrees.
Results: Our findings aligned with previous human research, revealing that eyebrows are the most critical feature for face recognition, followed by the eyes, mouth, and nose, in that order. The presence of eyebrows was more crucial than their specific high-frequency details, such as edges and textures, whereas for the other facial features such details also played a significant role. Furthermore, unlike for other facial features, the activation maps indicated that the significance of the eyebrow areas could not be readily adjusted to compensate for the absence of eyebrow information, which explains why masking eyebrows led to larger deficits in face recognition performance. Additionally, we observed a synergistic relationship among facial features, providing evidence for holistic processing of faces within the DCNN.
Discussion: Overall, our study sheds light on the underlying mechanisms of face recognition and underscores the potential of DCNNs as valuable tools for further exploration in this field.
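The feature-blocking manipulation described in the Methods can be illustrated with a short, hedged PyTorch sketch; the model, the eyebrow bounding box, and the images below are placeholders rather than the authors' actual stimuli or network.

```python
import torch
import torch.nn.functional as F
import torchvision.models as models

model = models.resnet18(weights=None).eval()   # stand-in for a face-identity DCNN

def occlude(img, box, mode="block", ksize=9):
    """img: (1, 3, H, W) tensor; box: (y0, y1, x0, x1) region to block or blur."""
    y0, y1, x0, x1 = box
    out = img.clone()
    if mode == "block":
        out[:, :, y0:y1, x0:x1] = img.mean()                 # remove the feature entirely
    elif mode == "blur":
        patch = img[:, :, y0:y1, x0:x1]
        kernel = torch.ones(3, 1, ksize, ksize) / ksize**2   # crude per-channel box blur
        out[:, :, y0:y1, x0:x1] = F.conv2d(patch, kernel, padding=ksize // 2, groups=3)
    return out

face = torch.rand(1, 3, 224, 224)              # placeholder face image
eyebrow_box = (60, 80, 50, 174)                # hypothetical eyebrow region

with torch.no_grad():
    p_intact = model(face).softmax(-1)
    p_blocked = model(occlude(face, eyebrow_box, "block")).softmax(-1)
    p_blurred = model(occlude(face, eyebrow_box, "blur")).softmax(-1)

# The drop in probability for the originally predicted identity quantifies how
# much the network relied on the occluded feature.
idx = p_intact.argmax()
print(p_intact[0, idx].item(), p_blocked[0, idx].item(), p_blurred[0, idx].item())
```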
Affiliation(s)
- Qianqian Zhang
- MoE Key Laboratory of Brain-inspired Intelligent Perception and Cognition, University of Science and Technology of China, Hefei, China
- Institute of Artificial Intelligence, Hefei Comprehensive National Science Center, Hefei, China
- Yueyi Zhang
- MoE Key Laboratory of Brain-inspired Intelligent Perception and Cognition, University of Science and Technology of China, Hefei, China
- Institute of Artificial Intelligence, Hefei Comprehensive National Science Center, Hefei, China
- Ning Liu
- Institute of Artificial Intelligence, Hefei Comprehensive National Science Center, Hefei, China
- State Key Laboratory of Brain and Cognitive Science, Institute of Biophysics, Chinese Academy of Sciences, Beijing, China
- College of Life Sciences, University of Chinese Academy of Sciences, Beijing, China
- Xiaoyan Sun
- MoE Key Laboratory of Brain-inspired Intelligent Perception and Cognition, University of Science and Technology of China, Hefei, China
- Institute of Artificial Intelligence, Hefei Comprehensive National Science Center, Hefei, China
4
Liu P, Bo K, Ding M, Fang R. Emergence of Emotion Selectivity in Deep Neural Networks Trained to Recognize Visual Objects. bioRxiv [Preprint] 2024:2023.04.16.537079. PMID: 37163104. PMCID: PMC10168209. DOI: 10.1101/2023.04.16.537079.
Abstract
Recent neuroimaging studies have shown that the visual cortex plays an important role in representing the affective significance of visual input. The origin of these affect-specific visual representations is debated: are they intrinsic to the visual system, or do they arise through reentry from frontal emotion-processing structures such as the amygdala? We examined this problem by combining convolutional neural network (CNN) models of the human ventral visual cortex pre-trained on ImageNet with two datasets of affective images. Our results show that (1) in all layers of the CNN models, there were artificial neurons that responded consistently and selectively to neutral, pleasant, or unpleasant images, and (2) lesioning these neurons by setting their output to 0, or enhancing them by increasing their gain, led to decreased or increased emotion recognition performance, respectively. These results support the idea that the visual system may have the intrinsic ability to represent the affective significance of visual input and suggest that CNNs offer a fruitful platform for testing neuroscientific theories.
Affiliation(s)
- Peng Liu
- J. Crayton Pruitt Family Department of Biomedical Engineering, Herbert Wertheim College of Engineering, University of Florida, Gainesville, FL, USA
- Department of Psychological and Brain Sciences, Dartmouth College, Hanover, NH, USA
- Ke Bo
- Department of Psychological and Brain Sciences, Dartmouth College, Hanover, NH, USA
- Mingzhou Ding
- J. Crayton Pruitt Family Department of Biomedical Engineering, Herbert Wertheim College of Engineering, University of Florida, Gainesville, FL, USA
- Ruogu Fang
- J. Crayton Pruitt Family Department of Biomedical Engineering, Herbert Wertheim College of Engineering, University of Florida, Gainesville, FL, USA
- Center for Cognitive Aging and Memory, McKnight Brain Institute, University of Florida, Gainesville, FL, USA
5
Liu P, Bo K, Ding M, Fang R. Emergence of Emotion Selectivity in Deep Neural Networks Trained to Recognize Visual Objects. PLoS Comput Biol 2024; 20:e1011943. PMID: 38547053. PMCID: PMC10977720. DOI: 10.1371/journal.pcbi.1011943.
Abstract
Recent neuroimaging studies have shown that the visual cortex plays an important role in representing the affective significance of visual input. The origin of these affect-specific visual representations is debated: are they intrinsic to the visual system, or do they arise through reentry from frontal emotion-processing structures such as the amygdala? We examined this problem by combining convolutional neural network (CNN) models of the human ventral visual cortex pre-trained on ImageNet with two datasets of affective images. Our results show that, in all layers of the CNN models, there were artificial neurons that responded consistently and selectively to neutral, pleasant, or unpleasant images, and that lesioning these neurons by setting their output to zero, or enhancing them by increasing their gain, led to decreased or increased emotion recognition performance, respectively. These results support the idea that the visual system may have the intrinsic ability to represent the affective significance of visual input and suggest that CNNs offer a fruitful platform for testing neuroscientific theories.
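A minimal PyTorch sketch of the lesion/enhancement logic described above, written under our own assumptions (the layer, channel indices, and images are placeholders): a forward hook either zeroes the output of selected units or multiplies it by a gain, and the network's outputs are compared before and after.

```python
import torch
import torchvision.models as models

model = models.vgg16(weights=None).eval()      # stand-in for the ImageNet-trained CNN
layer = model.features[28]                     # a late convolutional layer
selected = torch.tensor([3, 17, 42])           # hypothetical emotion-selective channels

def make_hook(mode, gain=2.0):
    def hook(module, inputs, output):
        out = output.clone()
        if mode == "lesion":
            out[:, selected] = 0.0             # silence the selected units
        else:
            out[:, selected] *= gain           # increase their gain
        return out                             # returned tensor replaces the layer output
    return hook

x = torch.rand(2, 3, 224, 224)                 # placeholder image batch

with torch.no_grad():
    baseline = model(x)
    h = layer.register_forward_hook(make_hook("lesion"))
    lesioned = model(x)
    h.remove()
    h = layer.register_forward_hook(make_hook("enhance"))
    enhanced = model(x)
    h.remove()

print((baseline - lesioned).abs().mean().item(),
      (baseline - enhanced).abs().mean().item())
```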
Affiliation(s)
- Peng Liu
- J. Crayton Pruitt Family Department of Biomedical Engineering, Herbert Wertheim College of Engineering, University of Florida, Gainesville, Florida, United States of America
- Department of Psychological and Brain Sciences, Dartmouth College, Hanover, New Hampshire, United States of America
- Ke Bo
- Department of Psychological and Brain Sciences, Dartmouth College, Hanover, New Hampshire, United States of America
- Mingzhou Ding
- J. Crayton Pruitt Family Department of Biomedical Engineering, Herbert Wertheim College of Engineering, University of Florida, Gainesville, Florida, United States of America
- Ruogu Fang
- J. Crayton Pruitt Family Department of Biomedical Engineering, Herbert Wertheim College of Engineering, University of Florida, Gainesville, Florida, United States of America
- Center for Cognitive Aging and Memory, McKnight Brain Institute, University of Florida, Gainesville, Florida, United States of America
6
Cadoni M, Lagorio A, Grosso E. Face detection based on a human attention guided multi-scale model. Biol Cybern 2023; 117:453-466. PMID: 38038793. PMCID: PMC10752920. DOI: 10.1007/s00422-023-00978-5.
Abstract
Multi-scale models are among the cutting-edge technologies used for face detection and recognition. An example is deformable part-based models (DPMs), which encode a face as a multiplicity of local areas (parts) at different resolution scales, together with their hierarchical and spatial relationships. Although these models have proven successful and remarkably efficient in practical applications, the mutual position and spatial resolution of the parts involved are defined arbitrarily by a human specialist, and the final choice of the optimal scales and parts is based on heuristics. This work seeks to understand whether a multi-scale model can take inspiration from human fixations to select specific areas and spatial scales. In more detail, it shows that a multi-scale pyramid representation can be adopted to extract interest points, and that human attention can be used to select the points at the scales that lead to the best face detection performance. Human fixations can therefore provide a valid methodological basis on which to build a multi-scale model, by selecting the spatial scales and areas of interest that are most relevant to humans.
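One way to read the proposed pipeline is sketched below under our own assumptions (OpenCV's ORB as a generic multi-scale interest-point detector, placeholder fixation coordinates): keypoints are extracted across a scale pyramid, and only those falling near human fixations, together with their pyramid scales, are retained.

```python
import numpy as np
import cv2

img = np.random.randint(0, 255, (256, 256), dtype=np.uint8)   # placeholder image
fixations = np.array([[128.0, 120.0], [140.0, 135.0]])        # placeholder (x, y) fixations

orb = cv2.ORB_create(nfeatures=500)            # generic multi-scale keypoint detector
keypoints = orb.detect(img, None)

def near_fixation(kp, fixations, radius=20.0):
    dists = np.linalg.norm(fixations - np.array(kp.pt), axis=1)
    return bool((dists < radius).any())

# Keep only interest points close to a human fixation; kp.octave records the
# pyramid level, i.e. the spatial scale selected by attention.
selected = [kp for kp in keypoints if near_fixation(kp, fixations)]
print(len(keypoints), len(selected), sorted({kp.octave for kp in selected}))
```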
Affiliation(s)
- Marinella Cadoni
- Dipartimento di Scienze Biomediche, Università di Sassari, Viale San Pietro 43B, 07100, Sassari, Italy.
- Andrea Lagorio
- Dipartimento di Scienze Biomediche, Università di Sassari, Viale San Pietro 43B, 07100, Sassari, Italy
- Enrico Grosso
- Dipartimento di Scienze Biomediche, Università di Sassari, Viale San Pietro 43B, 07100, Sassari, Italy
7
van Dyck LE, Gruber WR. Modeling Biological Face Recognition with Deep Convolutional Neural Networks. J Cogn Neurosci 2023; 35:1521-1537. PMID: 37584587. DOI: 10.1162/jocn_a_02040.
Abstract
Deep convolutional neural networks (DCNNs) have become the state-of-the-art computational models of biological object recognition. Their remarkable success has helped vision science break new ground, and recent efforts have started to transfer this achievement to research on biological face recognition. In this regard, face detection can be investigated by comparing face-selective biological neurons and brain areas to artificial neurons and model layers. Similarly, face identification can be examined by comparing in vivo and in silico multidimensional "face spaces." In this review, we summarize the first studies that use DCNNs to model biological face recognition. On the basis of a broad spectrum of behavioral and computational evidence, we conclude that DCNNs are useful models that closely resemble the general hierarchical organization of face recognition in the ventral visual pathway and the core face network. In two exemplary spotlights, we emphasize the unique scientific contributions of these models. First, studies on face detection in DCNNs indicate that elementary face selectivity emerges automatically through feedforward processing even in the absence of visual experience. Second, studies on face identification in DCNNs suggest that identity-specific experience and generative mechanisms facilitate this particular challenge. Taken together, as this novel modeling approach enables close control of predisposition (i.e., architecture) and experience (i.e., training data), it may be suited to inform long-standing debates on the substrates of biological face recognition.
8
Vinken K, Prince JS, Konkle T, Livingstone MS. The neural code for "face cells" is not face-specific. Sci Adv 2023; 9:eadg1736. PMID: 37647400. PMCID: PMC10468123. DOI: 10.1126/sciadv.adg1736.
Abstract
Face cells are neurons that respond more to faces than to non-face objects. They are found in clusters in the inferotemporal cortex, thought to process faces specifically, and, hence, studied using faces almost exclusively. Analyzing neural responses in and around macaque face patches to hundreds of objects, we found graded response profiles for non-face objects that predicted the degree of face selectivity and provided information on face-cell tuning beyond that from actual faces. This relationship between non-face and face responses was not predicted by color and simple shape properties but by information encoded in deep neural networks trained on general objects rather than face classification. These findings contradict the long-standing assumption that face versus non-face selectivity emerges from face-specific features and challenge the practice of focusing on only the most effective stimulus. They provide evidence instead that category-selective neurons are best understood by their tuning directions in a domain-general object space.
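The core analysis logic, predicting a cell's face selectivity from its tuning direction in a domain-general feature space fit on non-face responses only, can be sketched roughly as follows; the features and the "neuron" here are synthetic placeholders, not the recorded data.

```python
import numpy as np
from sklearn.linear_model import RidgeCV

rng = np.random.default_rng(1)
n_nonface, n_face, n_feat = 200, 40, 512

feat_nonface = rng.normal(size=(n_nonface, n_feat))    # DNN features of non-face images
feat_face = rng.normal(size=(n_face, n_feat)) + 0.5    # DNN features of face images

# Synthetic "neuron": a linear readout of the object space plus noise.
w_true = rng.normal(size=n_feat)
resp_nonface = feat_nonface @ w_true + rng.normal(scale=5.0, size=n_nonface)
resp_face = feat_face @ w_true + rng.normal(scale=5.0, size=n_face)

# Fit the tuning direction from non-face responses only ...
ridge = RidgeCV(alphas=np.logspace(-2, 4, 13)).fit(feat_nonface, resp_nonface)

# ... then predict the face responses and the face-vs-object contrast.
d_measured = resp_face.mean() - resp_nonface.mean()
d_predicted = ridge.predict(feat_face).mean() - ridge.predict(feat_nonface).mean()
print(f"measured face-object difference: {d_measured:.2f}, predicted: {d_predicted:.2f}")
```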
Affiliation(s)
- Kasper Vinken
- Department of Neurobiology, Harvard Medical School, Boston, MA 02115, USA
- Jacob S. Prince
- Department of Psychology, Harvard University, Cambridge, MA 02478, USA
- Talia Konkle
- Department of Psychology, Harvard University, Cambridge, MA 02478, USA
9
Lee H, Choi W, Lee D, Paik SB. Comparison of visual quantities in untrained neural networks. Cell Rep 2023; 42:112900. PMID: 37516959. DOI: 10.1016/j.celrep.2023.112900.
Abstract
The ability to compare quantities of visual objects with two distinct measures, proportion and difference, is observed even in newborn animals. However, how this function originates in the brain, even before visual experience, remains unknown. Here, we propose a model in which neuronal tuning for quantity comparisons can arise spontaneously in completely untrained neural circuits. Using a biologically inspired model neural network, we find that single units selective to proportions and differences between visual quantities emerge in randomly initialized feedforward wirings and that they enable the network to perform quantity comparison tasks. Notably, we find that two distinct tunings to proportion and difference originate from a random summation of monotonic, nonlinear neural activities and that a slight difference in the nonlinear response function determines the type of measure. Our results suggest that visual quantity comparisons are primitive types of functions that can emerge spontaneously before learning in young brains.
Affiliation(s)
- Hyeonsu Lee
- Department of Bio and Brain Engineering, Korea Advanced Institute of Science and Technology, Daejeon 34141, Republic of Korea
- Woochul Choi
- Department of Bio and Brain Engineering, Korea Advanced Institute of Science and Technology, Daejeon 34141, Republic of Korea
- Dongil Lee
- Department of Bio and Brain Engineering, Korea Advanced Institute of Science and Technology, Daejeon 34141, Republic of Korea
- Se-Bum Paik
- Department of Brain and Cognitive Sciences, Korea Advanced Institute of Science and Technology, Daejeon 34141, Republic of Korea
10
Hesse JK, Tsao DY. Functional modules for visual scene segmentation in macaque visual cortex. Proc Natl Acad Sci U S A 2023; 120:e2221122120. PMID: 37523552. PMCID: PMC10410728. DOI: 10.1073/pnas.2221122120.
Abstract
Segmentation, the computation of object boundaries, is one of the most important steps in intermediate visual processing. Previous studies have reported cells across visual cortex that are modulated by segmentation features, but the functional role of these cells remains unclear. First, it is unclear whether these cells encode segmentation consistently since most studies used only a limited variety of stimulus types. Second, it is unclear whether these cells are organized into specialized modules or instead randomly scattered across the visual cortex: the former would lend credence to a functional role for putative segmentation cells. Here, we used fMRI-guided electrophysiology to systematically characterize the consistency and spatial organization of segmentation-encoding cells across the visual cortex. Using fMRI, we identified a set of patches in V2, V3, V3A, V4, and V4A that were more active for stimuli containing figures compared to ground, regardless of whether figures were defined by texture, motion, luminance, or disparity. We targeted these patches for single-unit recordings and found that cells inside segmentation patches were tuned to both figure-ground and borders more consistently across types of stimuli than cells in the visual cortex outside the patches. Remarkably, we found clusters of cells inside segmentation patches that showed the same border-ownership preference across all stimulus types. Finally, using a population decoding approach, we found that segmentation could be decoded with higher accuracy from segmentation patches than from either color-selective or control regions. Overall, our results suggest that segmentation signals are preferentially encoded in spatially discrete patches.
Affiliation(s)
- Janis K. Hesse
- Department of Molecular and Cell Biology, University of California, Berkeley, CA 94720
- Helen Wills Neuroscience Institute, University of California, Berkeley, CA 94720
- Doris Y. Tsao
- Department of Molecular and Cell Biology, University of California, Berkeley, CA 94720
- Helen Wills Neuroscience Institute, University of California, Berkeley, CA 94720
11
Farzmahdi A, Zarco W, Freiwald W, Kriegeskorte N, Golan T. Emergence of brain-like mirror-symmetric viewpoint tuning in convolutional neural networks. bioRxiv [Preprint] 2023:2023.01.05.522909. PMID: 36711779. PMCID: PMC9881894. DOI: 10.1101/2023.01.05.522909.
Abstract
Primates can recognize objects despite 3D geometric variations such as in-depth rotations. The computational mechanisms that give rise to such invariances are yet to be fully understood. A curious case of partial invariance occurs in the macaque face-patch AL and in fully connected layers of deep convolutional networks in which neurons respond similarly to mirror-symmetric views (e.g., left and right profiles). Why does this tuning develop? Here, we propose a simple learning-driven explanation for mirror-symmetric viewpoint tuning. We show that mirror-symmetric viewpoint tuning for faces emerges in the fully connected layers of convolutional deep neural networks trained on object recognition tasks, even when the training dataset does not include faces. First, using 3D objects rendered from multiple views as test stimuli, we demonstrate that mirror-symmetric viewpoint tuning in convolutional neural network models is not unique to faces: it emerges for multiple object categories with bilateral symmetry. Second, we show why this invariance emerges in the models. Learning to discriminate among bilaterally symmetric object categories induces reflection-equivariant intermediate representations. AL-like mirror-symmetric tuning is achieved when such equivariant responses are spatially pooled by downstream units with sufficiently large receptive fields. These results explain how mirror-symmetric viewpoint tuning can emerge in neural networks, providing a theory of how they might emerge in the primate brain. Our theory predicts that mirror-symmetric viewpoint tuning can emerge as a consequence of exposure to bilaterally symmetric objects beyond the category of faces, and that it can generalize beyond previously experienced object categories.
12
Gong C, Jing C, Chen X, Pun CM, Huang G, Saha A, Nieuwoudt M, Li HX, Hu Y, Wang S. Generative AI for brain image computing and brain network computing: a review. Front Neurosci 2023; 17:1203104. PMID: 37383107. PMCID: PMC10293625. DOI: 10.3389/fnins.2023.1203104.
Abstract
Recent years have witnessed significant advances in brain imaging techniques that offer a non-invasive approach to mapping the structure and function of the brain. Concurrently, generative artificial intelligence (AI) has experienced substantial growth; it uses existing data to create new content with an underlying pattern similar to that of real-world data. The integration of these two domains, generative AI in neuroimaging, presents a promising avenue for exploring various fields of brain imaging and brain network computing, particularly extracting spatiotemporal brain features and reconstructing the topological connectivity of brain networks. This study therefore reviews the advanced models, tasks, challenges, and prospects of brain imaging and brain network computing techniques, and aims to provide a comprehensive picture of current generative AI techniques in brain imaging. The review focuses on novel methodological approaches and applications of related new methods. It discusses the fundamental theories and algorithms of four classic generative models and provides a systematic survey and categorization of tasks, including co-registration, super-resolution, enhancement, classification, segmentation, cross-modality, brain network analysis, and brain decoding. The paper also highlights the challenges and future directions of the latest work, with the expectation that future research can build on them.
Affiliation(s)
- Changwei Gong
- Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences, Shenzhen, China
- Department of Computer Science, University of Chinese Academy of Sciences, Beijing, China
- Changhong Jing
- Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences, Shenzhen, China
- Department of Computer Science, University of Chinese Academy of Sciences, Beijing, China
- Xuhang Chen
- Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences, Shenzhen, China
- Department of Computer and Information Science, University of Macau, Macau, China
- Chi Man Pun
- Department of Computer and Information Science, University of Macau, Macau, China
- Guoli Huang
- Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences, Shenzhen, China
- Ashirbani Saha
- Department of Oncology and School of Biomedical Engineering, McMaster University, Hamilton, ON, Canada
- Martin Nieuwoudt
- Institute for Biomedical Engineering, Stellenbosch University, Stellenbosch, South Africa
- Han-Xiong Li
- Department of Systems Engineering, City University of Hong Kong, Hong Kong, China
- Yong Hu
- Department of Orthopaedics and Traumatology, The University of Hong Kong, Hong Kong, China
- Shuqiang Wang
- Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences, Shenzhen, China
- Department of Computer Science, University of Chinese Academy of Sciences, Beijing, China
13
Qayyum A, Ilahi I, Shamshad F, Boussaid F, Bennamoun M, Qadir J. Untrained Neural Network Priors for Inverse Imaging Problems: A Survey. IEEE Trans Pattern Anal Mach Intell 2023; 45:6511-6536. PMID: 36063506. DOI: 10.1109/tpami.2022.3204527.
Abstract
In recent years, advancements in machine learning (ML), and in particular deep learning (DL), have gained considerable momentum in solving inverse imaging problems, often surpassing the performance of hand-crafted approaches. Traditionally, analytical methods have been used to solve inverse imaging problems such as image restoration, inpainting, and super-resolution. Unlike analytical methods, for which the problem is explicitly defined and domain knowledge is carefully engineered into the solution, DL models do not benefit from such prior knowledge and instead use large datasets to predict an unknown solution to the inverse problem. Recently, a new paradigm of training deep models on a single image, named the untrained neural network prior (UNNP), has been proposed to solve a variety of inverse tasks, e.g., restoration and inpainting. Since then, many researchers have proposed various applications and variants of UNNP. In this paper, we present a comprehensive review of such studies and of various UNNP applications for different tasks, and highlight open research problems that require further investigation.
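For readers unfamiliar with the UNNP idea, a compact sketch in the deep-image-prior style is given below under our own assumptions (a tiny generator network and random placeholder images): a randomly initialized CNN is fit to a single corrupted image, with early stopping acting as the regularizer.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

net = nn.Sequential(                           # tiny stand-in generator network
    nn.Conv2d(32, 64, 3, padding=1), nn.ReLU(),
    nn.Conv2d(64, 64, 3, padding=1), nn.ReLU(),
    nn.Conv2d(64, 3, 3, padding=1), nn.Sigmoid(),
)

clean = torch.rand(1, 3, 64, 64)               # placeholder "ground truth" image
noisy = (clean + 0.2 * torch.randn_like(clean)).clamp(0, 1)
z = torch.rand(1, 32, 64, 64)                  # fixed random input code

opt = torch.optim.Adam(net.parameters(), lr=1e-3)
for step in range(500):                        # early stopping acts as the regularizer
    opt.zero_grad()
    loss = ((net(z) - noisy) ** 2).mean()      # fit only the single corrupted image
    loss.backward()
    opt.step()

# On natural images the convolutional prior tends to fit structure before noise;
# here the images are random placeholders, so only the mechanics are illustrated.
with torch.no_grad():
    print(((net(z) - clean) ** 2).mean().item(),
          ((noisy - clean) ** 2).mean().item())
```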
14
Kanwisher N, Khosla M, Dobs K. Using artificial neural networks to ask 'why' questions of minds and brains. Trends Neurosci 2023; 46:240-254. PMID: 36658072. DOI: 10.1016/j.tins.2022.12.008.
Abstract
Neuroscientists have long characterized the properties and functions of the nervous system and are increasingly succeeding in answering how brains perform the tasks they do. But the question of why brains work the way they do is asked less often. The new ability to optimize artificial neural networks (ANNs) for performance on human-like tasks now enables us to approach these 'why' questions by asking when the properties of networks optimized for a given task mirror the behavioral and neural characteristics of humans performing the same task. Here we highlight the recent success of this strategy in explaining why the visual and auditory systems work the way they do, at both behavioral and neural levels.
Affiliation(s)
- Nancy Kanwisher
- Department of Brain and Cognitive Sciences, Massachusetts Institute of Technology, Cambridge, MA, USA; McGovern Institute for Brain Research, Massachusetts Institute of Technology, Cambridge, MA, USA
- Meenakshi Khosla
- Department of Brain and Cognitive Sciences, Massachusetts Institute of Technology, Cambridge, MA, USA; McGovern Institute for Brain Research, Massachusetts Institute of Technology, Cambridge, MA, USA
- Katharina Dobs
- Department of Psychology, Justus Liebig University Giessen, Giessen, Germany; Center for Mind, Brain and Behavior (CMBB), University of Marburg and Justus Liebig University, Giessen, Germany
15
Zou W, Li C, Huang H. Ensemble perspective for understanding temporal credit assignment. Phys Rev E 2023; 107:024307. PMID: 36932505. DOI: 10.1103/physreve.107.024307.
Abstract
Recurrent neural networks are widely used for modeling spatiotemporal sequences in both natural language processing and neural population dynamics. However, understanding temporal credit assignment in these networks is hard. Here, we propose that each individual connection in the recurrent computation is modeled by a spike-and-slab distribution, rather than a precise weight value. We then derive a mean-field algorithm to train the network at the ensemble level. The method is applied to classifying handwritten digits when pixels are read in sequence, and to a multisensory integration task, a fundamental cognitive function of animals. Our model reveals important connections that determine the overall performance of the network. The model also shows how spatiotemporal information is processed through the hyperparameters of the distribution, and moreover reveals distinct types of emergent neural selectivity. To provide a mechanistic analysis of the ensemble learning, we first derive an analytic solution of the learning in the limit of an infinitely large network. We then carry out a low-dimensional projection of both neural and synaptic dynamics, analyze symmetry breaking in the parameter space, and finally demonstrate the role of stochastic plasticity in the recurrent computation. Our study thus sheds light on how weight uncertainty impacts temporal credit assignment in recurrent neural networks from the ensemble perspective.
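A small NumPy sketch of the ensemble view, in our own notation rather than the paper's: each connection is a spike-and-slab random variable (a Bernoulli "spike" for whether the synapse exists and a Gaussian "slab" for its value), so a forward pass samples a concrete network from the ensemble defined by the hyperparameters.

```python
import numpy as np

rng = np.random.default_rng(0)
n_in, n_hidden = 20, 50

# Hyperparameters of each connection's distribution: spike probability pi
# (is the synapse present?) and slab mean/variance (its value when present).
pi = np.full((n_hidden, n_in), 0.3)
mu = rng.normal(scale=0.5, size=(n_hidden, n_in))
sigma2 = np.full((n_hidden, n_in), 0.05)

def sample_weights():
    spike = rng.random(pi.shape) < pi                  # which synapses exist
    slab = rng.normal(mu, np.sqrt(sigma2))             # their values when they do
    return spike * slab

x = rng.normal(size=n_in)
samples = np.stack([np.tanh(sample_weights() @ x) for _ in range(100)])

# Mean-field training would adjust (pi, mu, sigma2); here we only show how the
# hyperparameters induce a distribution over hidden responses.
print(samples.mean(axis=0)[:5], samples.std(axis=0)[:5])
```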
Affiliation(s)
- Wenxuan Zou
- PMI Lab, School of Physics, Sun Yat-sen University, Guangzhou 510275, People's Republic of China
- Chan Li
- PMI Lab, School of Physics, Sun Yat-sen University, Guangzhou 510275, People's Republic of China
- Haiping Huang
- PMI Lab, School of Physics, Sun Yat-sen University, Guangzhou 510275, People's Republic of China
- Guangdong Provincial Key Laboratory of Magnetoelectric Physics and Devices, Sun Yat-sen University, Guangzhou 510275, People's Republic of China
16
Cheon J, Baek S, Paik SB. Invariance of object detection in untrained deep neural networks. Front Comput Neurosci 2022; 16:1030707. DOI: 10.3389/fncom.2022.1030707.
Abstract
The ability to perceive visual objects with various types of transformations, such as rotation, translation, and scaling, is crucial for consistent object recognition. In machine learning, invariant object detection for a network is often implemented by augmentation with a massive number of training images, but the mechanism of invariant object detection in biological brains—how invariance arises initially and whether it requires visual experience—remains elusive. Here, using a model neural network of the hierarchical visual pathway of the brain, we show that invariance of object detection can emerge spontaneously in the complete absence of learning. First, we found that units selective to a particular object class arise in randomly initialized networks even before visual training. Intriguingly, these units show robust tuning to images of each object class under a wide range of image transformation types, such as viewpoint rotation. We confirmed that this “innate” invariance of object selectivity enables untrained networks to perform an object-detection task robustly, even with images that have been significantly modulated. Our computational model predicts that invariant object tuning originates from combinations of non-invariant units via random feedforward projections, and we confirmed that the predicted profile of feedforward projections is observed in untrained networks. Our results suggest that invariance of object detection is an innate characteristic that can emerge spontaneously in random feedforward networks.
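The kind of analysis described above can be outlined with a hedged PyTorch sketch (placeholder stimuli, an untrained AlexNet as the model network): unit selectivity for a target class is computed from responses to target and non-target images, then re-computed after transforming the stimuli to ask whether selectivity is preserved.

```python
import torch
import torchvision.models as models
import torchvision.transforms.functional as TF

torch.manual_seed(0)
net = models.alexnet(weights=None).eval()      # untrained, randomly initialized network

target_imgs = torch.rand(16, 3, 224, 224)      # placeholder images of a target class
other_imgs = torch.rand(16, 3, 224, 224)       # placeholder non-target images

def unit_responses(imgs):
    with torch.no_grad():
        return net.features(imgs).flatten(1).mean(0)    # mean response of each conv unit

def selectivity(t_imgs, o_imgs):
    rt, ro = unit_responses(t_imgs), unit_responses(o_imgs)
    return (rt - ro) / (rt + ro + 1e-8)                 # per-unit selectivity index

base = selectivity(target_imgs, other_imgs)
rotated = selectivity(TF.rotate(target_imgs, 30.0), TF.rotate(other_imgs, 30.0))
scaled = selectivity(TF.resize(TF.center_crop(target_imgs, 160), [224, 224]),
                     TF.resize(TF.center_crop(other_imgs, 160), [224, 224]))

# Units whose selectivity correlates across transformations would count as
# transformation-invariant object-selective units in the untrained network.
for name, s in [("rotated", rotated), ("scaled", scaled)]:
    print(name, torch.corrcoef(torch.stack([base, s]))[0, 1].item())
```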
17
Chen Y, Allison O, Green HL, Kuschner ES, Liu S, Kim M, Slinger M, Mol K, Chiang T, Bloy L, Roberts TPL, Edgar JC. Maturational trajectory of fusiform gyrus neural activity when viewing faces: From 4 months to 4 years old. Front Hum Neurosci 2022; 16:917851. PMID: 36034116. PMCID: PMC9411513. DOI: 10.3389/fnhum.2022.917851.
Abstract
Infant and young child electrophysiology studies have provided information regarding the maturation of face-encoding neural processes. A limitation of previous research is that very few studies have examined face-encoding processes in children 12-48 months of age, a developmental period characterized by rapid changes in the ability to encode facial information. The present study sought to fill this gap via a longitudinal study examining the maturation of a primary node in the face-encoding network: the left and right fusiform gyrus (FFG). Whole-brain magnetoencephalography (MEG) data were obtained from 25 infants with typical development at 4-12 months, with follow-up MEG exams every ∼12 months until 3-4 years old. Children were presented with color images of Face stimuli and visual noise images (matched on spatial frequency, color distribution, and outer contour) that served as Non-Face stimuli. Using distributed source modeling, left and right face-sensitive FFG evoked waveforms were obtained from each child at each visit, with face-sensitive activity identified by examining the difference between the Non-Face and Face FFG timecourses. Before 24 months of age (Visits 1 and 2), the face-sensitive FFG M290 response was the dominant response, observed in the left and right FFG ∼250-450 ms post-stimulus. By 3-4 years old (Visit 4), the left and right face-sensitive FFG response occurred at a latency consistent with a face-sensitive M170 response, ∼100-250 ms post-stimulus. Face-sensitive left and right FFG peak latencies decreased as a function of age (with age explaining more than 70% of the variance in face-sensitive FFG latency), and an adult-like FFG latency was observed at 3-4 years old. Study findings thus showed face-sensitive FFG maturational changes across the first 4 years of life. Whereas a face-sensitive M290 response was observed under 2 years of age, by 3-4 years old an adult-like face-sensitive M170 response was observed bilaterally. Future studies evaluating the maturation of face-sensitive FFG activity in infants at risk for neurodevelopmental disorders are of interest, with the present findings suggesting age-specific face-sensitive neural markers of a priori interest.
Affiliation(s)
- Yuhan Chen
- Lurie Family Foundations MEG Imaging Center, Children’s Hospital of Philadelphia, Philadelphia, PA, United States
- Olivia Allison
- Lurie Family Foundations MEG Imaging Center, Children’s Hospital of Philadelphia, Philadelphia, PA, United States
- Heather L. Green
- Lurie Family Foundations MEG Imaging Center, Children’s Hospital of Philadelphia, Philadelphia, PA, United States
- Emily S. Kuschner
- Lurie Family Foundations MEG Imaging Center, Children’s Hospital of Philadelphia, Philadelphia, PA, United States
- Department of Psychiatry, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, United States
- Song Liu
- Lurie Family Foundations MEG Imaging Center, Children’s Hospital of Philadelphia, Philadelphia, PA, United States
- Mina Kim
- Lurie Family Foundations MEG Imaging Center, Children’s Hospital of Philadelphia, Philadelphia, PA, United States
- Michelle Slinger
- Lurie Family Foundations MEG Imaging Center, Children’s Hospital of Philadelphia, Philadelphia, PA, United States
- Kylie Mol
- Lurie Family Foundations MEG Imaging Center, Children’s Hospital of Philadelphia, Philadelphia, PA, United States
- Taylor Chiang
- Lurie Family Foundations MEG Imaging Center, Children’s Hospital of Philadelphia, Philadelphia, PA, United States
- Luke Bloy
- Lurie Family Foundations MEG Imaging Center, Children’s Hospital of Philadelphia, Philadelphia, PA, United States
- Timothy P. L. Roberts
- Lurie Family Foundations MEG Imaging Center, Children’s Hospital of Philadelphia, Philadelphia, PA, United States
- Department of Radiology, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, United States
- J. Christopher Edgar
- Lurie Family Foundations MEG Imaging Center, Children’s Hospital of Philadelphia, Philadelphia, PA, United States
- Department of Radiology, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, United States
18
Lu SY, Wang SH, Zhang YD. SAFNet: A deep spatial attention network with classifier fusion for breast cancer detection. Comput Biol Med 2022; 148:105812. PMID: 35834967. DOI: 10.1016/j.compbiomed.2022.105812.
Abstract
Breast cancer is a leading cause of cancer death in women, and accurate early diagnosis is the first step toward treatment. A novel breast cancer detection model called SAFNet is proposed based on ultrasound images and deep learning. We employ a pre-trained ResNet-18 embedded with a spatial attention mechanism as the backbone model. Three randomized network models are trained for prediction in SAFNet and are fused by majority voting to produce more accurate results. A public ultrasound image dataset is utilized to evaluate the generalization ability of SAFNet using 5-fold cross-validation. The simulation experiments reveal that SAFNet produces higher classification accuracy than four existing breast cancer classification methods. SAFNet is therefore an accurate tool for detecting breast cancer that can be applied in clinical diagnosis.
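The classifier-fusion step can be illustrated with a brief sketch, not the SAFNet implementation itself (the backbone, seeds, and inputs below are placeholders): three independently initialized models each predict a label and the final decision is the majority vote.

```python
import torch
import torchvision.models as models

def make_model(seed):
    torch.manual_seed(seed)                       # a different random initialization each time
    m = models.resnet18(weights=None)             # stand-in for the attention backbone
    m.fc = torch.nn.Linear(m.fc.in_features, 2)   # benign vs. malignant head
    return m.eval()

ensemble = [make_model(s) for s in (0, 1, 2)]
x = torch.rand(4, 3, 224, 224)                    # placeholder ultrasound batch

with torch.no_grad():
    votes = torch.stack([m(x).argmax(dim=1) for m in ensemble])   # shape (3, batch)

fused = votes.mode(dim=0).values                  # majority vote across the three models
print(votes.tolist(), fused.tolist())
```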
Affiliation(s)
- Si-Yuan Lu
- School of Computing and Mathematical Sciences, University of Leicester, Leicester, LE1 7RH, UK.
- Shui-Hua Wang
- School of Computing and Mathematical Sciences, University of Leicester, Leicester, LE1 7RH, UK.
- Yu-Dong Zhang
- School of Computing and Mathematical Sciences, University of Leicester, Leicester, LE1 7RH, UK.
19
Face identity coding in the deep neural network and primate brain. Commun Biol 2022; 5:611. PMID: 35725902. PMCID: PMC9209415. DOI: 10.1038/s42003-022-03557-9.
Abstract
A central challenge in face perception research is to understand how neurons encode face identities. This challenge has not been met largely due to the lack of simultaneous access to the entire face processing neural network and the lack of a comprehensive multifaceted model capable of characterizing a large number of facial features. Here, we addressed this challenge by conducting in silico experiments using a pre-trained face recognition deep neural network (DNN) with a diverse array of stimuli. We identified a subset of DNN units selective to face identities, and these identity-selective units demonstrated generalized discriminability to novel faces. Visualization and manipulation of the network revealed the importance of identity-selective units in face recognition. Importantly, using our monkey and human single-neuron recordings, we directly compared the response of artificial units with real primate neurons to the same stimuli and found that artificial units shared a similar representation of facial features as primate neurons. We also observed a region-based feature coding mechanism in DNN units as in human neurons. Together, by directly linking between artificial and primate neural systems, our results shed light on how the primate brain performs face recognition tasks.
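One simple way to identify identity-selective units, sketched here with placeholder data and under our own assumptions, is to test whether a unit's responses vary across face identities more than across repeated images of the same identity, e.g. with a one-way ANOVA over identities.

```python
import numpy as np
from scipy.stats import f_oneway

rng = np.random.default_rng(0)
n_units, n_ids, n_imgs = 500, 10, 20           # units, identities, images per identity

# Placeholder unit responses; the first 50 units get an identity-dependent offset.
resp = rng.normal(size=(n_units, n_ids, n_imgs))
resp[:50] += rng.normal(scale=2.0, size=(50, n_ids, 1))

selective = []
for u in range(n_units):
    groups = [resp[u, i] for i in range(n_ids)]    # responses grouped by identity
    _, p = f_oneway(*groups)
    if p < 0.01:                                   # responses differ reliably across identities
        selective.append(u)

print(f"{len(selective)} of {n_units} units pass the identity-selectivity test")
```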
20
Tian F, Xie H, Song Y, Hu S, Liu J. The Face Inversion Effect in Deep Convolutional Neural Networks. Front Comput Neurosci 2022; 16:854218. PMID: 35615057. PMCID: PMC9124772. DOI: 10.3389/fncom.2022.854218.
Abstract
The face inversion effect (FIE) is a behavioral marker of face-specific processing: recognition of inverted faces is disproportionately disrupted relative to recognition of inverted non-face objects. One hypothesis is that while upright faces are represented by a face-specific mechanism, inverted faces are processed as objects. However, evidence from neuroimaging studies is inconclusive, possibly because the face system, such as the fusiform face area, interacts with the object system, and observations from the face system may therefore indirectly reflect influences from the object system. Here we examined the FIE in an artificial face system, visual geometry group network-face (VGG-Face), a deep convolutional neural network (DCNN) specialized for identifying faces. In line with neuroimaging studies on humans, a stronger FIE was found in VGG-Face than in a DCNN pretrained for processing objects. Critically, classification error analysis revealed that in VGG-Face, inverted faces were behaviorally miscategorized as objects, and analysis of internal representations revealed that VGG-Face represented inverted faces in a similar fashion to objects. In short, our study supported the hypothesis that inverted faces are represented as objects in a pure face system.
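A schematic sketch of how an inversion effect can be quantified in networks, using randomly initialized stand-ins and placeholder images rather than the actual VGG-Face and object-trained models: representations of upright and inverted versions of the same images are compared, and a larger disruption in the face network than in the object network would constitute a network-level FIE.

```python
import torch
import torch.nn.functional as F
import torchvision.models as models

torch.manual_seed(0)
face_net = models.vgg16(weights=None).eval()     # stand-in for VGG-Face
object_net = models.vgg16(weights=None).eval()   # stand-in for an object-trained DCNN

faces = torch.rand(8, 3, 224, 224)               # placeholder upright face images
inverted = torch.flip(faces, dims=[2])           # flip along the height axis

def representation_change(net, upright, flipped):
    with torch.no_grad():
        r_up = net.features(upright).flatten(1)
        r_inv = net.features(flipped).flatten(1)
    return 1.0 - F.cosine_similarity(r_up, r_inv, dim=1).mean().item()

# A larger change in the face network than the object network = network-level FIE.
print("face network:  ", representation_change(face_net, faces, inverted))
print("object network:", representation_change(object_net, faces, inverted))
```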
Affiliation(s)
- Fang Tian
- State Key Laboratory of Cognitive Neuroscience and Learning and IDG/McGovern Institute for Brain Research, Beijing Normal University, Beijing, China
- Hailun Xie
- Beijing Key Laboratory of Applied Experimental Psychology, Faculty of Psychology, Beijing Normal University, Beijing, China
- Yiying Song
- Beijing Key Laboratory of Applied Experimental Psychology, Faculty of Psychology, Beijing Normal University, Beijing, China
- *Correspondence: Yiying Song
- Siyuan Hu
- Beijing Key Laboratory of Applied Experimental Psychology, Faculty of Psychology, Beijing Normal University, Beijing, China
- Jia Liu
- Department of Psychology & Tsinghua Laboratory of Brain and Intelligence, Tsinghua University, Beijing, China