1
Meng L, Tang Z, Liu Y. Reconstruction of natural images from human fMRI using a three-stage multi-level deep fusion model. J Neurosci Methods 2024; 411:110269. [PMID: 39222796; DOI: 10.1016/j.jneumeth.2024.110269] [Received: 03/18/2024] [Revised: 06/28/2024] [Accepted: 08/25/2024] [Indexed: 09/04/2024]
Abstract
BACKGROUND Image reconstruction is a critical task in brain decoding research and primarily relies on functional magnetic resonance imaging (fMRI) data. However, because of challenges such as the limited number of samples in fMRI datasets, reconstruction quality often remains poor. NEW METHOD We proposed a three-stage multi-level deep fusion model (TS-ML-DFM). The model employed a three-stage training process whose components include image encoders, generators, discriminators, and fMRI encoders. In this method, we incorporated distinct supplementary features derived separately from depth images and original images. Additionally, the method integrated several components, including a random shift module, a dual attention module, and a multi-level feature fusion module. RESULTS In both qualitative and quantitative comparisons on the Horikawa17 and VanGerven10 datasets, our method exhibited excellent performance. COMPARISON WITH EXISTING METHODS For example, on the primary Horikawa17 dataset, our method was compared with other leading methods on the following metrics: average hash value, histogram similarity, mutual information, structural similarity accuracy, AlexNet(2), AlexNet(5), and pairwise human perceptual similarity accuracy. Compared with the second-ranked result for each metric, the proposed method achieved improvements of 0.99%, 3.62%, 3.73%, 2.45%, 3.51%, 0.62%, and 1.03%, respectively. On the SwAV top-level semantic metric, it achieved a substantial improvement of 10.53% over the second-ranked result among pixel-level reconstruction methods. CONCLUSIONS When applied to decoding visual patterns from fMRI data, the TS-ML-DFM method proposed in this study outperformed previous algorithms, thereby facilitating further advances in this field.
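The abstract lists "average hash value" and "histogram similarity" as low-level metrics without defining them. The sketch below shows one common way to compute each, assuming grayscale images as NumPy arrays, an 8×8 average hash built by block-mean downsampling, and histogram intersection over 32 intensity bins. These choices are assumptions, and the authors' exact definitions may differ.

```python
import numpy as np

def average_hash(img: np.ndarray, hash_size: int = 8) -> np.ndarray:
    """Boolean hash: block-mean downsample to hash_size x hash_size, then threshold at the mean."""
    h, w = img.shape
    # Crop so the image divides evenly into hash_size x hash_size blocks
    img = img[: h - h % hash_size, : w - w % hash_size].astype(float)
    bh, bw = img.shape[0] // hash_size, img.shape[1] // hash_size
    small = img.reshape(hash_size, bh, hash_size, bw).mean(axis=(1, 3))
    return small > small.mean()

def hash_similarity(a: np.ndarray, b: np.ndarray, hash_size: int = 8) -> float:
    """Fraction of matching hash bits (1.0 means identical hashes)."""
    return float(np.mean(average_hash(a, hash_size) == average_hash(b, hash_size)))

def histogram_similarity(a: np.ndarray, b: np.ndarray, bins: int = 32) -> float:
    """Histogram intersection of normalized intensity histograms over [0, 255]."""
    ha, _ = np.histogram(a, bins=bins, range=(0, 255))
    hb, _ = np.histogram(b, bins=bins, range=(0, 255))
    ha = ha / ha.sum()
    hb = hb / hb.sum()
    return float(np.minimum(ha, hb).sum())
```

An image compared with itself scores 1.0 on both measures. An intensity-inverted copy (255 - x) flips almost every hash bit, so the hash captures coarse layout rather than exact pixel values.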
Affiliation(s)
- Lu Meng
- School of Information Science and Engineering, Northeastern University, Shenyang 110819, China.
- Zhenxuan Tang
- School of Information Science and Engineering, Northeastern University, Shenyang 110819, China
2
Susan S. Neuroscientific insights about computer vision models: a concise review. Biol Cybern 2024:10.1007/s00422-024-00998-9. [PMID: 39382577; DOI: 10.1007/s00422-024-00998-9] [Received: 05/22/2024] [Accepted: 09/12/2024] [Indexed: 10/10/2024]
Abstract
The development of biologically inspired computational models has been a focus of study ever since McCulloch and Pitts introduced the artificial neuron in 1943. However, a scrutiny of the literature reveals that most attempts to replicate the highly efficient and complex biological visual system have been futile or have met with limited success. Recent state-of-the-art computer vision models, such as pre-trained deep neural networks and vision transformers, may not be biologically inspired per se. Nevertheless, certain aspects of biological vision are still embedded, knowingly or unknowingly, in the architecture and functioning of these models. This paper explores several principles of visual neuroscience and the biological visual pathway that resonate, in some manner, in the architectural design and functioning of contemporary computer vision models. The findings of this survey can provide useful insights for building future bio-inspired computer vision models. The survey is conducted from a historical perspective, tracing the biological connections of computer vision models from the basic artificial neuron to modern technologies such as deep convolutional neural networks (CNNs) and spiking neural networks (SNNs). A highlight of the survey is a discussion of biologically plausible neural networks and bio-inspired unsupervised learning mechanisms recently adapted for computer vision tasks.
Affiliation(s)
- Seba Susan
- Department of Information Technology, Delhi Technological University, Delhi, India.
3
Yin X, Wu Z, Wang H. A novel DRL-guided sparse voxel decoding model for reconstructing perceived images from brain activity. J Neurosci Methods 2024; 412:110292. [PMID: 39299579; DOI: 10.1016/j.jneumeth.2024.110292] [Received: 06/03/2024] [Revised: 08/31/2024] [Accepted: 09/15/2024] [Indexed: 09/22/2024]
Abstract
BACKGROUND Because of the sparse encoding characteristics of the human visual cortex and the scarcity of paired {image, fMRI} training samples, voxel selection is an effective means of reconstructing perceived images from fMRI. However, existing data-driven voxel selection methods have not achieved satisfactory results. NEW METHOD Here, a novel deep reinforcement learning-guided sparse voxel (DRL-SV) decoding model is proposed to reconstruct perceived images from fMRI. We describe voxel selection as a Markov decision process (MDP) and train agents to select voxels that are highly involved in specific visual encoding. RESULTS Experimental results on two public datasets verify the effectiveness of the proposed DRL-SV, which can accurately select voxels highly involved in neural encoding, thereby improving the quality of visual image reconstruction. COMPARISON WITH EXISTING METHODS We compared our results qualitatively and quantitatively with state-of-the-art (SOTA) methods and obtained better reconstructions. Compared with traditional data-driven baseline methods, DRL-SV selected sparser sets of voxels yet achieved better reconstruction performance. CONCLUSIONS Compared with data-driven voxel selection methods, DRL-SV can accurately select voxels involved in visual encoding in the few-shot setting. The proposed decoding model provides a new avenue for improving the quality of image reconstruction from the primary visual cortex.
Affiliation(s)
- Xu Yin
- Key Laboratory of Child Development and Learning Science of Ministry of Education, School of Biological Science & Medical Engineering, Southeast University, Nanjing, Jiangsu 210096, China
- Zhengping Wu
- School of Innovations, Sanjiang University, China; School of Electronic Science and Engineering, Nanjing University, China
- Haixian Wang
- Key Laboratory of Child Development and Learning Science of Ministry of Education, School of Biological Science & Medical Engineering, Southeast University, Nanjing, Jiangsu 210096, China.
4
Lifanov-Carr J, Griffiths BJ, Linde-Domingo J, Ferreira CS, Wilson M, Mayhew SD, Charest I, Wimber M. Reconstructing Spatiotemporal Trajectories of Visual Object Memories in the Human Brain. eNeuro 2024; 11:ENEURO.0091-24.2024. [PMID: 39242212; PMCID: PMC11439564; DOI: 10.1523/eneuro.0091-24.2024] [Received: 03/04/2024] [Revised: 07/03/2024] [Accepted: 08/09/2024] [Indexed: 09/09/2024]
Abstract
How the human brain reconstructs, step by step, the core elements of past experiences is still unclear. Here, we map the spatiotemporal trajectories along which visual object memories are reconstructed during associative recall. Specifically, we ask whether retrieval reinstates feature representations in a copy-like but reversed direction with respect to the initial perceptual experience, or whether, alternatively, this reconstruction involves format transformations and regions beyond those engaged during initial perception. Participants from two cohorts studied new associations between verbs and randomly paired object images, and subsequently recalled the objects when presented with the corresponding verb cue. We first analyze multivariate fMRI patterns to map where in the brain high- and low-level object features can be decoded during perception and retrieval, showing that retrieval is dominated by conceptual features, represented in comparatively late visual and parietal areas. A separately acquired EEG dataset is then used to track the temporal evolution of the reactivated patterns using similarity-based EEG-fMRI fusion. This fusion suggests that memory reconstruction proceeds from anterior frontotemporal to posterior occipital and parietal regions, in line with a conceptual-to-perceptual gradient but only partly following the same trajectories as during perception. Specifically, a linear regression statistically confirms that the sequential activation of ventral visual stream regions is reversed between image perception and retrieval. The fusion analysis also suggests an information relay to frontoparietal areas late during retrieval. Together, the results shed light on the temporal dynamics of memory recall and the transformations that the information undergoes between the initial experience and its later reconstruction from memory.
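Similarity-based EEG-fMRI fusion is usually built on representational similarity analysis. The idea is to compute a representational dissimilarity matrix (RDM) across stimuli for an fMRI region and one RDM for each EEG time point, then correlate the two. The sketch below assumes correlation distance (1 - Pearson r) for the RDMs and a Spearman rank correlation between their upper triangles. It is a generic illustration, not the authors' pipeline, which may use decoding-based dissimilarities or searchlights.

```python
import numpy as np

def rdm(patterns: np.ndarray) -> np.ndarray:
    """Correlation-distance RDM; patterns has shape (n_stimuli, n_features)."""
    return 1.0 - np.corrcoef(patterns)

def upper(m: np.ndarray) -> np.ndarray:
    """Off-diagonal upper triangle of a square matrix as a vector."""
    i, j = np.triu_indices(m.shape[0], k=1)
    return m[i, j]

def spearman(a: np.ndarray, b: np.ndarray) -> float:
    # Rank transform via double argsort (assumes no ties, reasonable for continuous values)
    ra = np.argsort(np.argsort(a))
    rb = np.argsort(np.argsort(b))
    return float(np.corrcoef(ra, rb)[0, 1])

def fusion_timecourse(eeg: np.ndarray, fmri_roi: np.ndarray) -> np.ndarray:
    """eeg: (n_stimuli, n_channels, n_times); fmri_roi: (n_stimuli, n_voxels).
    Returns the Spearman correlation between the ROI RDM and the EEG RDM at each time point."""
    roi_vec = upper(rdm(fmri_roi))
    return np.array([spearman(upper(rdm(eeg[:, :, t])), roi_vec)
                     for t in range(eeg.shape[2])])
```

The time point where the fusion curve peaks for a given region is read as when that region's representational geometry is expressed in the EEG signal. Comparing peak latencies across regions gives the kind of spatiotemporal ordering the abstract describes.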
Affiliation(s)
- Julia Lifanov-Carr
- School of Psychology and Centre for Human Brain Health (CHBH), University of Birmingham, Birmingham B15 2TT, United Kingdom
- Benjamin J Griffiths
- School of Psychology and Centre for Human Brain Health (CHBH), University of Birmingham, Birmingham B15 2TT, United Kingdom
- Juan Linde-Domingo
- School of Psychology and Centre for Human Brain Health (CHBH), University of Birmingham, Birmingham B15 2TT, United Kingdom
- Department of Experimental Psychology, Mind, Brain and Behavior Research Center (CIMCYC), University of Granada, 18011 Granada, Spain
- Center for Adaptive Rationality, Max Planck Institute for Human Development, 14195 Berlin, Germany
- Catarina S Ferreira
- School of Psychology and Centre for Human Brain Health (CHBH), University of Birmingham, Birmingham B15 2TT, United Kingdom
- Martin Wilson
- School of Psychology and Centre for Human Brain Health (CHBH), University of Birmingham, Birmingham B15 2TT, United Kingdom
- Stephen D Mayhew
- Institute of Health and Neurodevelopment (IHN), School of Psychology, Aston University, Birmingham B4 7ET, United Kingdom
- Ian Charest
- Département de Psychologie, Université de Montréal, Montréal, Quebec H2V 2S9, Canada
- Maria Wimber
- School of Psychology and Centre for Human Brain Health (CHBH), University of Birmingham, Birmingham B15 2TT, United Kingdom
- School of Psychology & Neuroscience and Centre for Cognitive Neuroimaging (CCNi), University of Glasgow, Glasgow G12 8QB, United Kingdom
5
Jang G, Kragel PA. Understanding human amygdala function with artificial neural networks. bioRxiv 2024:2024.07.29.605621. [PMID: 39131372; PMCID: PMC11312467; DOI: 10.1101/2024.07.29.605621] [Indexed: 08/13/2024]
Abstract
The amygdala is a cluster of subcortical nuclei that receives diverse sensory inputs and projects to the cortex, midbrain and other subcortical structures. Numerous accounts of amygdalar contributions to social and emotional behavior have been offered, yet an overarching description of amygdala function remains elusive. Here we adopt a computationally explicit framework that aims to develop a model of amygdala function based on the types of sensory inputs it receives, rather than individual constructs such as threat, arousal, or valence. Characterizing human fMRI signal acquired as participants viewed a full-length film, we developed encoding models that predict both patterns of amygdala activity and self-reported valence evoked by naturalistic images. We use deep image synthesis to generate artificial stimuli that distinctly engage encoding models of amygdala subregions that systematically differ from one another in terms of their low-level visual properties. These findings characterize how the amygdala compresses high-dimensional sensory inputs into low-dimensional representations relevant for behavior.
6
Lahner B, Dwivedi K, Iamshchinina P, Graumann M, Lascelles A, Roig G, Gifford AT, Pan B, Jin S, Ratan Murty NA, Kay K, Oliva A, Cichy R. Modeling short visual events through the BOLD moments video fMRI dataset and metadata. Nat Commun 2024; 15:6241. [PMID: 39048577; PMCID: PMC11269733; DOI: 10.1038/s41467-024-50310-3] [Received: 08/14/2023] [Accepted: 07/04/2024] [Indexed: 07/27/2024]
Abstract
Studying the neural basis of human dynamic visual perception requires extensive experimental data to evaluate the large swathes of functionally diverse brain neural networks driven by perceiving visual events. Here, we introduce the BOLD Moments Dataset (BMD), a repository of whole-brain fMRI responses to over 1000 short (3 s) naturalistic video clips of visual events across ten human subjects. We use the videos' extensive metadata to show how the brain represents word- and sentence-level descriptions of visual events and identify correlates of video memorability scores extending into the parietal cortex. Furthermore, we reveal a match in hierarchical processing between cortical regions of interest and video-computable deep neural networks, and we showcase that BMD successfully captures temporal dynamics of visual events at second resolution. With its rich metadata, BMD offers new perspectives and accelerates research on the human brain basis of visual event perception.
Affiliation(s)
- Benjamin Lahner
- Computer Science and Artificial Intelligence Laboratory, MIT, Cambridge, MA, USA.
- Kshitij Dwivedi
- Department of Education and Psychology, Freie Universität Berlin, Berlin, Germany
- Department of Computer Science, Goethe University Frankfurt, Frankfurt am Main, Germany
- Polina Iamshchinina
- Department of Education and Psychology, Freie Universität Berlin, Berlin, Germany
- Monika Graumann
- Department of Education and Psychology, Freie Universität Berlin, Berlin, Germany
- Alex Lascelles
- Computer Science and Artificial Intelligence Laboratory, MIT, Cambridge, MA, USA
- Gemma Roig
- Department of Computer Science, Goethe University Frankfurt, Frankfurt am Main, Germany
- The Hessian Center for AI (hessian.AI), Darmstadt, Germany
- Bowen Pan
- Computer Science and Artificial Intelligence Laboratory, MIT, Cambridge, MA, USA
- SouYoung Jin
- Computer Science and Artificial Intelligence Laboratory, MIT, Cambridge, MA, USA
- N Apurva Ratan Murty
- Department of Brain and Cognitive Science, MIT, Cambridge, MA, USA
- School of Psychology, Georgia Institute of Technology, Atlanta, GA, USA
- Kendrick Kay
- Center for Magnetic Resonance Research (CMRR), Department of Radiology, University of Minnesota, Minneapolis, MN, USA
- Aude Oliva
- Computer Science and Artificial Intelligence Laboratory, MIT, Cambridge, MA, USA
- Radoslaw Cichy
- Department of Education and Psychology, Freie Universität Berlin, Berlin, Germany
7
Ferrante M, Boccato T, Passamonti L, Toschi N. Retrieving and reconstructing conceptually similar images from fMRI with latent diffusion models and a neuro-inspired brain decoding model. J Neural Eng 2024; 21:046001. [PMID: 38885689; DOI: 10.1088/1741-2552/ad593c] [Received: 11/03/2023] [Accepted: 06/17/2024] [Indexed: 06/20/2024]
Abstract
Objective. Brain decoding is a field of computational neuroscience that aims to infer mental states or internal representations of perceptual inputs from measurable brain activity. This study proposes a novel approach to brain decoding that relies on semantic and contextual similarity. Approach. We use several functional magnetic resonance imaging (fMRI) datasets of natural images as stimuli and create a deep learning decoding pipeline inspired by the bottom-up and top-down processes in human vision. Our pipeline includes a linear brain-to-feature model that maps fMRI activity to semantic features of the visual stimuli. We assume that the brain projects visual information onto a space that is homeomorphic to the latent space of the last layer of a pretrained neural network, which summarizes and highlights similarities and differences between concepts. These features are categorized in the latent space using a nearest-neighbor strategy, and the results are used to retrieve images or to condition a generative latent diffusion model to create novel images. Main results. We demonstrate semantic classification and image retrieval on three different fMRI datasets: Generic Object Decoding (visual perception and imagination), BOLD5000, and NSD. In all cases, a simple mapping between fMRI and a deep semantic representation of the visual stimulus resulted in meaningful classification and retrieved or generated images. We assessed quality using quantitative metrics and a human evaluation experiment that reproduces the multiplicity of conscious and unconscious criteria that humans use to evaluate image similarity. Our method achieved correct evaluation in over 80% of the test set. Significance. Our study proposes a novel approach to brain decoding that relies on semantic and contextual similarity. The results demonstrate that measurable neural correlates can be linearly mapped onto the latent space of a neural network to synthesize images that match the original content. These findings have implications for both cognitive neuroscience and artificial intelligence.
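The core of this pipeline is a linear map from fMRI activity to network features, followed by a nearest-neighbor lookup in feature space. The sketch below assumes closed-form ridge regression (no intercept, centered data) for the linear map and cosine similarity for the nearest-neighbor step. The abstract does not specify either choice, and the diffusion-model conditioning step is not shown.

```python
import numpy as np

def fit_ridge(X: np.ndarray, Y: np.ndarray, alpha: float = 1.0) -> np.ndarray:
    """Closed-form ridge weights W minimizing ||XW - Y||^2 + alpha * ||W||^2.
    X: (n_samples, n_voxels) fMRI responses; Y: (n_samples, n_features) image features."""
    n_vox = X.shape[1]
    return np.linalg.solve(X.T @ X + alpha * np.eye(n_vox), X.T @ Y)

def cosine_nearest(pred: np.ndarray, gallery: np.ndarray) -> np.ndarray:
    """For each predicted feature vector, the index of the most cosine-similar gallery vector."""
    p = pred / np.linalg.norm(pred, axis=1, keepdims=True)
    g = gallery / np.linalg.norm(gallery, axis=1, keepdims=True)
    return np.argmax(p @ g.T, axis=1)
```

Usage: fit `W = fit_ridge(X_train, Y_train)`, predict features for held-out scans with `X_test @ W`, then call `cosine_nearest` against the feature vectors of a candidate image set to retrieve (or categorize) the closest item.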
Affiliation(s)
- Matteo Ferrante
- Department of Biomedicine and Prevention, University of Rome, Tor Vergata, Rome, Italy
- Tommaso Boccato
- Department of Biomedicine and Prevention, University of Rome, Tor Vergata, Rome, Italy
- Luca Passamonti
- CNR, Istituto di Bioimmagini e Fisiologia Molecolare, Milan, Italy
- Nicola Toschi
- Department of Biomedicine and Prevention, University of Rome, Tor Vergata, Rome, Italy
- Martinos Center for Biomedical Imaging, MGH and Harvard Medical School, Boston, MA, United States of America
8
Miao HY, Tong F. Convolutional neural network models applied to neuronal responses in macaque V1 reveal limited nonlinear processing. J Vis 2024; 24:1. [PMID: 38829629; PMCID: PMC11156204; DOI: 10.1167/jov.24.6.1] [Received: 08/28/2023] [Accepted: 04/03/2024] [Indexed: 06/05/2024]
Abstract
Computational models of the primary visual cortex (V1) have suggested that V1 neurons behave like Gabor filters followed by simple nonlinearities. However, recent work employing convolutional neural network (CNN) models has suggested that V1 relies on far more nonlinear computations than previously thought. Specifically, unit responses in an intermediate layer of VGG-19 were found to best predict macaque V1 responses to thousands of natural and synthetic images. Here, we evaluated the hypothesis that the poor performance of lower layer units in VGG-19 might be attributable to their small receptive field size rather than to their lack of complexity per se. We compared VGG-19 with AlexNet, which has much larger receptive fields in its lower layers. Whereas the best-performing layer of VGG-19 occurred after seven nonlinear steps, the first convolutional layer of AlexNet best predicted V1 responses. Although the predictive accuracy of VGG-19 was somewhat better than that of standard AlexNet, we found that a modified version of AlexNet could match the performance of VGG-19 after only a few nonlinear computations. Control analyses revealed that decreasing the size of the input images caused the best-performing layer of VGG-19 to shift to a lower layer, consistent with the hypothesis that the relationship between image size and receptive field size can strongly affect model performance. We conducted additional analyses using a Gabor pyramid model to test for nonlinear contributions of normalization and contrast saturation. Overall, our findings suggest that the feedforward responses of V1 neurons can be well explained by assuming only a few nonlinear processing stages.
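The classical view this study tests is that V1 neurons behave like Gabor filters followed by simple nonlinearities. The sketch below shows that textbook idea. A simple cell is modeled as a linear Gabor filter followed by half-wave rectification. A complex cell is modeled with the energy model: the combined magnitude of a quadrature pair of Gabors (0° and 90° phase), which responds to a grating regardless of its phase. This is a generic illustration assuming square, odd-sized patches. It is not the Gabor pyramid model used in the paper, and it omits the normalization and contrast-saturation stages the authors tested.

```python
import numpy as np

def gabor(size: int, wavelength: float, theta: float, phase: float, sigma: float) -> np.ndarray:
    """2-D Gabor kernel (size must be odd): an oriented cosine carrier under a Gaussian envelope."""
    half = size // 2
    y, x = np.mgrid[-half:half + 1, -half:half + 1]
    xr = x * np.cos(theta) + y * np.sin(theta)  # coordinate along the carrier direction
    envelope = np.exp(-(x**2 + y**2) / (2 * sigma**2))
    return envelope * np.cos(2 * np.pi * xr / wavelength + phase)

def simple_cell(patch: np.ndarray, kernel: np.ndarray) -> float:
    """Linear filtering followed by half-wave rectification."""
    return max(0.0, float(np.sum(patch * kernel)))

def complex_cell(patch: np.ndarray, wavelength: float, theta: float, sigma: float) -> float:
    """Energy model: magnitude of the responses of a quadrature (0 and 90 degree phase) Gabor pair."""
    size = patch.shape[0]
    even = np.sum(patch * gabor(size, wavelength, theta, 0.0, sigma))
    odd = np.sum(patch * gabor(size, wavelength, theta, np.pi / 2, sigma))
    return float(np.sqrt(even**2 + odd**2))
```

For a grating at the preferred orientation, `complex_cell` stays nearly constant as the grating's phase shifts, while `simple_cell` drops to zero when the grating's contrast polarity is inverted. Both stages involve only one or two nonlinear steps.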
Affiliation(s)
- Hui-Yuan Miao
- Department of Psychology, Vanderbilt University, Nashville, TN, USA
- Frank Tong
- Department of Psychology, Vanderbilt University, Nashville, TN, USA
- Vanderbilt Vision Research Center, Vanderbilt University, Nashville, TN, USA
9
Dado T, Papale P, Lozano A, Le L, Wang F, van Gerven M, Roelfsema P, Güçlütürk Y, Güçlü U. Brain2GAN: Feature-disentangled neural encoding and decoding of visual perception in the primate brain. PLoS Comput Biol 2024; 20:e1012058. [PMID: 38709818; PMCID: PMC11098503; DOI: 10.1371/journal.pcbi.1012058] [Received: 06/10/2023] [Revised: 05/16/2024] [Accepted: 04/08/2024] [Indexed: 05/08/2024]
Abstract
A challenging goal of neural coding is to characterize the neural representations underlying visual perception. To this end, multi-unit activity (MUA) of macaque visual cortex was recorded in a passive fixation task upon presentation of faces and natural images. We analyzed the relationship between MUA and latent representations of state-of-the-art deep generative models, including the conventional and feature-disentangled representations of generative adversarial networks (GANs) (i.e., z- and w-latents of StyleGAN, respectively) and language-contrastive representations of latent diffusion networks (i.e., CLIP-latents of Stable Diffusion). A mass univariate neural encoding analysis of the latent representations showed that feature-disentangled w representations outperform both z and CLIP representations in explaining neural responses. Further, w-latent features were found to be positioned at the higher end of the complexity gradient, which indicates that they capture visual information relevant to high-level neural activity. Subsequently, a multivariate neural decoding analysis of the feature-disentangled representations resulted in state-of-the-art spatiotemporal reconstructions of visual perception. Taken together, our results not only highlight the important role of feature disentanglement in shaping high-level neural representations underlying visual perception but also serve as an important benchmark for the future of neural coding.
Affiliation(s)
- Thirza Dado
- Donders Institute for Brain, Cognition and Behaviour, Radboud University, Nijmegen, Netherlands
- Paolo Papale
- Department of Vision and Cognition, Netherlands Institute for Neuroscience, Amsterdam, Netherlands
- Antonio Lozano
- Department of Vision and Cognition, Netherlands Institute for Neuroscience, Amsterdam, Netherlands
- Lynn Le
- Donders Institute for Brain, Cognition and Behaviour, Radboud University, Nijmegen, Netherlands
- Feng Wang
- Department of Vision and Cognition, Netherlands Institute for Neuroscience, Amsterdam, Netherlands
- Marcel van Gerven
- Donders Institute for Brain, Cognition and Behaviour, Radboud University, Nijmegen, Netherlands
- Pieter Roelfsema
- Department of Vision and Cognition, Netherlands Institute for Neuroscience, Amsterdam, Netherlands
- Laboratory of Visual Brain Therapy, Sorbonne University, Paris, France
- Department of Integrative Neurophysiology, VU Amsterdam, Amsterdam, Netherlands
- Department of Psychiatry, Amsterdam UMC, Amsterdam, Netherlands
- Yağmur Güçlütürk
- Donders Institute for Brain, Cognition and Behaviour, Radboud University, Nijmegen, Netherlands
- Umut Güçlü
- Donders Institute for Brain, Cognition and Behaviour, Radboud University, Nijmegen, Netherlands
10
Jang H, Tong F. Improved modeling of human vision by incorporating robustness to blur in convolutional neural networks. Nat Commun 2024; 15:1989. [PMID: 38443349; PMCID: PMC10915141; DOI: 10.1038/s41467-024-45679-0] [Received: 07/29/2023] [Accepted: 01/30/2024] [Indexed: 03/07/2024]
Abstract
Whenever a visual scene is cast onto the retina, much of it will appear degraded due to poor resolution in the periphery; moreover, optical defocus can cause blur in central vision. However, the pervasiveness of blurry or degraded input is typically overlooked in the training of convolutional neural networks (CNNs). We hypothesized that the absence of blurry training inputs may cause CNNs to rely excessively on high spatial frequency information for object recognition, thereby causing systematic deviations from biological vision. We evaluated this hypothesis by comparing standard CNNs with CNNs trained on a combination of clear and blurry images. We show that blur-trained CNNs outperform standard CNNs at predicting neural responses to objects across a variety of viewing conditions. Moreover, blur-trained CNNs acquire increased sensitivity to shape information and greater robustness to multiple forms of visual noise, leading to improved correspondence with human perception. Our results provide multi-faceted neurocomputational evidence that blurry visual experiences may be critical for conferring robustness to biological visual systems.
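Training "on a combination of clear and blurry images" can be implemented as a data-augmentation step. The sketch below leaves each image clear with some probability and otherwise applies a Gaussian blur of randomly chosen width. The blur probability, the sigma values, and the use of edge padding are illustrative assumptions, not the schedule used by Jang and Tong. Only NumPy is used, via a separable 1-D convolution.

```python
import numpy as np

def gaussian_kernel1d(sigma: float) -> np.ndarray:
    """Normalized 1-D Gaussian kernel truncated at about 3 sigma."""
    radius = max(1, int(np.ceil(3 * sigma)))
    x = np.arange(-radius, radius + 1)
    k = np.exp(-x**2 / (2 * sigma**2))
    return k / k.sum()

def gaussian_blur(img: np.ndarray, sigma: float) -> np.ndarray:
    """Separable Gaussian blur of a 2-D image with edge padding; output has the input's shape."""
    if sigma <= 0:
        return img.astype(float)
    k = gaussian_kernel1d(sigma)
    r = len(k) // 2
    padded = np.pad(img.astype(float), r, mode="edge")
    # Convolve along rows, then columns; 'valid' mode trims the padding back off
    rows = np.apply_along_axis(lambda v: np.convolve(v, k, mode="valid"), 1, padded)
    return np.apply_along_axis(lambda v: np.convolve(v, k, mode="valid"), 0, rows)

def mixed_blur_batch(images, rng, p_blur=0.5, sigmas=(1.0, 2.0, 4.0)) -> np.ndarray:
    """Keep each image clear with probability 1 - p_blur; otherwise blur it with a random sigma."""
    out = []
    for img in images:
        if rng.random() < p_blur:
            out.append(gaussian_blur(img, float(rng.choice(sigmas))))
        else:
            out.append(img.astype(float))
    return np.stack(out)
```

Applying this augmentation to each training batch exposes the network to low-spatial-frequency versions of the same objects. The hypothesis in the abstract is that this exposure discourages over-reliance on high-frequency cues.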
Affiliation(s)
- Hojin Jang
- Department of Psychology, Vanderbilt Vision Research Center, Vanderbilt University, Nashville, TN, USA.
- Department of Brain and Cognitive Sciences, Massachusetts Institute of Technology, Cambridge, MA, USA.
- Department of Brain and Cognitive Engineering, Korea University, Seoul, South Korea.
- Frank Tong
- Department of Psychology, Vanderbilt Vision Research Center, Vanderbilt University, Nashville, TN, USA.
11
Koide-Majima N, Nishimoto S, Majima K. Mental image reconstruction from human brain activity: Neural decoding of mental imagery via deep neural network-based Bayesian estimation. Neural Netw 2024; 170:349-363. [PMID: 38016230; DOI: 10.1016/j.neunet.2023.11.024] [Received: 06/10/2023] [Revised: 09/22/2023] [Accepted: 11/08/2023] [Indexed: 11/30/2023]
Abstract
Visual images observed by humans can be reconstructed from their brain activity. However, the visualization (externalization) of mental imagery is challenging. Only a few studies have reported successful visualization of mental imagery, and the images they could visualize have been limited to specific domains such as human faces or alphabetical letters. Therefore, visualizing mental imagery of arbitrary natural images stands as a significant milestone. In this study, we achieved this by enhancing a previous method. Specifically, we demonstrated that the visual image reconstruction method proposed in the seminal study by Shen et al. (2019) relied heavily on low-level visual information decoded from the brain and could not efficiently utilize the semantic information that would be recruited during mental imagery. To address this limitation, we extended the previous method to a Bayesian estimation framework and introduced semantic information into it as assistance. Our proposed framework successfully reconstructed both seen images (i.e., those observed by the human eye) and imagined images from brain activity. Quantitative evaluation showed that our framework identified seen and imagined images with accuracies well above chance (seen: 90.7%, imagery: 75.6%, chance: 50.0%). In contrast, the previous method could only identify seen images (seen: 64.3%, imagery: 50.4%). These results suggest that our framework could provide a unique tool for directly investigating the subjective contents of the brain, such as illusions, hallucinations, and dreams.
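The 50.0% chance level reported here is characteristic of a pairwise identification test. In such a test, each reconstruction is compared with its true image and with one other candidate, and the test counts how often the true image is the more similar of the two. The sketch below assumes Pearson correlation on flattened pixel or feature vectors as the similarity measure. The abstract does not state which measure or feature space the authors used.

```python
import numpy as np

def pairwise_identification(recon: np.ndarray, true: np.ndarray) -> float:
    """recon, true: (n_images, n_features), row i of recon reconstructs row i of true.
    For every ordered pair (i, j) with j != i, score 1 if corr(recon_i, true_i) > corr(recon_i, true_j).
    Returns the mean score; chance level is 0.5."""
    n = recon.shape[0]
    # np.corrcoef stacks both inputs; the top-right block holds corr(recon_i, true_j)
    c = np.corrcoef(recon, true)[:n, n:]
    correct = c.diagonal()[:, None] > c
    off_diagonal = ~np.eye(n, dtype=bool)
    return float(correct[off_diagonal].mean())
```

Near-perfect reconstructions score close to 1.0, and reconstructions unrelated to the stimuli hover around 0.5. This is why the 50.4% imagery score of the previous method amounts to chance performance.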
Affiliation(s)
- Naoko Koide-Majima
- Center for Information and Neural Networks (CiNet), National Institute of Information and Communications Technology, Osaka 565-0871, Japan; Graduate School of Frontier Biosciences, Osaka University, Osaka 565-0871, Japan
- Shinji Nishimoto
- Center for Information and Neural Networks (CiNet), National Institute of Information and Communications Technology, Osaka 565-0871, Japan; Graduate School of Frontier Biosciences, Osaka University, Osaka 565-0871, Japan; Graduate School of Medicine, Osaka University, Osaka 565-0871, Japan
- Kei Majima
- Institute for Quantum Life Science, National Institutes for Quantum Science and Technology, Chiba 263-8555, Japan; JST PRESTO, Saitama 332-0012, Japan.
12
Guimarães P, Serranho P, Duarte JV, Crisóstomo J, Moreno C, Gomes L, Bernardes R, Castelo-Branco M. The hemodynamic response function as a type 2 diabetes biomarker: a data-driven approach. Front Neuroinform 2024; 17:1321178. [PMID: 38250018; PMCID: PMC10796780; DOI: 10.3389/fninf.2023.1321178] [Received: 10/13/2023] [Accepted: 12/14/2023] [Indexed: 01/23/2024]
Abstract
Introduction There is a need to better understand the neurophysiological changes associated with early brain dysfunction in type 2 diabetes mellitus (T2DM) before vascular or structural lesions appear. Our aim was to use a novel, unbiased, data-driven approach to detect and characterize hemodynamic response function (HRF) alterations in T2DM patients, focusing on their potential as biomarkers. Methods We combined task-based, event-related (visual speed discrimination) functional magnetic resonance imaging with deep learning (DL) to show, from an unbiased perspective, that the blood-oxygen-level-dependent response of T2DM patients is altered. Relevance analysis determined which brain regions were most important for discrimination. We combined explainability with a deconvolution generalized linear model to provide a more accurate picture of the nature of the neural changes. Results The proposed approach discriminated T2DM patients with up to 95% accuracy. Performance was higher at higher stimulus (speed) contrast, showing a direct relationship with stimulus properties, and in the hemispherically dominant left visual hemifield, demonstrating biological interpretability. These differences are explained by physiological asymmetries in cortical spatial processing (right-hemisphere dominance) and by larger neural signal-to-noise ratios related to stimulus contrast. Relevance analysis revealed the regions most important for discrimination, such as extrastriate visual cortex, parietal cortex, and insula. These regions are disease- and task-related, providing additional evidence of pathophysiological significance. Our data-driven design allowed us to compute the unbiased HRF without assumptions. Conclusion We can accurately differentiate T2DM patients using a data-driven classification of the HRF. HRF differences hold promise as biomarkers and could contribute to a deeper understanding of the neurophysiological changes associated with T2DM.
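A deconvolution GLM can estimate the HRF "without assumptions" by giving every post-stimulus time lag its own regressor, instead of fitting a canonical HRF shape. The sketch below uses a finite impulse response (FIR) basis, one common way to do this. It assumes event onsets are given in whole scans (TR units) and fits the model by ordinary least squares with a constant baseline. The authors' actual basis set and fitting details are not given in the abstract.

```python
import numpy as np

def fir_design(onsets, n_scans: int, n_lags: int) -> np.ndarray:
    """FIR design matrix: column k has a 1 at scan (onset + k) for every event onset."""
    X = np.zeros((n_scans, n_lags))
    for onset in onsets:
        for k in range(n_lags):
            if onset + k < n_scans:
                X[onset + k, k] += 1.0
    return X

def estimate_hrf(bold: np.ndarray, onsets, n_lags: int) -> np.ndarray:
    """Least-squares FIR deconvolution with a constant baseline term.
    Returns one beta per post-stimulus lag, i.e., the estimated HRF shape."""
    X = fir_design(onsets, len(bold), n_lags)
    X = np.column_stack([X, np.ones(len(bold))])
    beta, *_ = np.linalg.lstsq(X, bold, rcond=None)
    return beta[:n_lags]
```

Because responses to overlapping events add linearly in the design matrix, the FIR betas separate the shared response shape even when trials are spaced closer than the HRF's duration. The estimated shapes can then be compared between patient and control groups.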
Affiliation(s)
- Pedro Guimarães
- University of Coimbra, Coimbra Institute for Biomedical Imaging and Translational Research (CIBIT), Institute for Nuclear Sciences Applied to Health (ICNAS), Coimbra, Portugal
- Pedro Serranho
- University of Coimbra, Coimbra Institute for Biomedical Imaging and Translational Research (CIBIT), Institute for Nuclear Sciences Applied to Health (ICNAS), Coimbra, Portugal
- Department of Sciences and Technology, Universidade Aberta, Lisbon, Portugal
- João V. Duarte
- University of Coimbra, Coimbra Institute for Biomedical Imaging and Translational Research (CIBIT), Institute for Nuclear Sciences Applied to Health (ICNAS), Coimbra, Portugal
- University of Coimbra, Faculty of Medicine (FMUC), Coimbra, Portugal
- Joana Crisóstomo
- University of Coimbra, Coimbra Institute for Biomedical Imaging and Translational Research (CIBIT), Institute for Nuclear Sciences Applied to Health (ICNAS), Coimbra, Portugal
- Carolina Moreno
- Department of Endocrinology, University Hospital of Coimbra (CHUC), Coimbra, Portugal
- Leonor Gomes
- Department of Endocrinology, University Hospital of Coimbra (CHUC), Coimbra, Portugal
- Rui Bernardes
- University of Coimbra, Coimbra Institute for Biomedical Imaging and Translational Research (CIBIT), Institute for Nuclear Sciences Applied to Health (ICNAS), Coimbra, Portugal
- University of Coimbra, Clinical Academic Center of Coimbra (CACC), Faculty of Medicine (FMUC), Coimbra, Portugal
- Miguel Castelo-Branco
- University of Coimbra, Coimbra Institute for Biomedical Imaging and Translational Research (CIBIT), Institute for Nuclear Sciences Applied to Health (ICNAS), Coimbra, Portugal
- University of Coimbra, Clinical Academic Center of Coimbra (CACC), Faculty of Medicine (FMUC), Coimbra, Portugal
13
Hopp FR, Amir O, Fisher JT, Grafton S, Sinnott-Armstrong W, Weber R. Moral foundations elicit shared and dissociable cortical activation modulated by political ideology. Nat Hum Behav 2023; 7:2182-2198. [PMID: 37679440 DOI: 10.1038/s41562-023-01693-8] [Received: 10/04/2022] [Accepted: 08/03/2023] [Indexed: 09/09/2023]
Abstract
Moral foundations theory (MFT) holds that moral judgements are driven by modular and ideologically variable moral foundations, but where and how these foundations are represented in the brain and shaped by political beliefs remains an open question. Using a moral vignette judgement task (n = 64), we probed the neural (dis)unity of moral foundations. Univariate analyses revealed that moral judgement of moral foundations, versus conventional norms, reliably recruits core areas implicated in theory of mind. Yet, multivariate pattern analysis demonstrated that each moral foundation elicits dissociable neural representations distributed throughout the cortex. As predicted by MFT, individuals' liberal or conservative orientation modulated neural responses to moral foundations. Our results confirm that each moral foundation recruits domain-general mechanisms of social cognition but also has a dissociable neural signature malleable by sociomoral experience. We discuss these findings in view of unified versus dissociable accounts of morality and the neurological support they provide for MFT.
Affiliation(s)
- Frederic R Hopp
- Amsterdam School of Communication Research, University of Amsterdam, Amsterdam, the Netherlands
- Ori Amir
- Pomona College, Claremont, CA, USA
- Jacob T Fisher
- Department of Communication, Michigan State University, Lansing, MI, USA
- Scott Grafton
- Department of Psychological & Brain Sciences, University of California, Santa Barbara, CA, USA
- René Weber
- Department of Psychological & Brain Sciences, University of California, Santa Barbara, CA, USA
- Department of Communication, Media Neuroscience Lab, University of California, Santa Barbara, CA, USA
- School of Communication and Media, Ewha Womans University, Seoul, South Korea
14
Rastegarnia S, St-Laurent M, DuPre E, Pinsard B, Bellec P. Brain decoding of the Human Connectome Project tasks in a dense individual fMRI dataset. Neuroimage 2023; 283:120395. [PMID: 37832707 DOI: 10.1016/j.neuroimage.2023.120395] [Received: 04/03/2023] [Revised: 09/21/2023] [Accepted: 09/27/2023] [Indexed: 10/15/2023]
Abstract
Brain decoding aims to infer cognitive states from patterns of brain activity. Substantial inter-individual variations in functional brain organization challenge accurate decoding performed at the group level. In this paper, we tested whether accurate brain decoding models can be trained entirely at the individual level. We trained several classifiers on a dense individual functional magnetic resonance imaging (fMRI) dataset in which six participants completed the entire Human Connectome Project (HCP) task battery >13 times over ten separate fMRI sessions. We evaluated nine decoding methods, from Support Vector Machines (SVM) and Multi-Layer Perceptrons (MLP) to Graph Convolutional Neural Networks (GCN). All decoders were trained to classify single fMRI volumes into 21 experimental conditions simultaneously, using ∼7 h of fMRI data per participant. The best prediction accuracies were achieved with GCN and MLP models, whose performance (57-67 % accuracy) approached the state-of-the-art accuracy (76 %) of models trained at the group level on >1 K hours of data from the original HCP sample. Our SVM model also performed very well (54-62 % accuracy). Feature importance maps derived from the MLP (our best-performing model) revealed informative features in regions relevant to particular cognitive domains, notably in the motor cortex. We also observed that inter-subject classification achieved substantially lower accuracy than subject-specific models, indicating that our decoders learned individual-specific features. This work demonstrates that densely sampled neuroimaging datasets can be used to train accurate brain decoding models at the individual level. We expect this work to become a useful benchmark for techniques that improve model generalization across multiple subjects and acquisition conditions.
Affiliation(s)
- Shima Rastegarnia
- Université de Montréal, Montréal, QC, Canada; Centre de Recherche de L'Institut Universitaire de Gériatrie de Montréal, Montréal, Canada
- Marie St-Laurent
- Centre de Recherche de L'Institut Universitaire de Gériatrie de Montréal, Montréal, Canada
- Basile Pinsard
- Centre de Recherche de L'Institut Universitaire de Gériatrie de Montréal, Montréal, Canada
- Pierre Bellec
- Université de Montréal, Montréal, QC, Canada; Centre de Recherche de L'Institut Universitaire de Gériatrie de Montréal, Montréal, Canada
15
Morgenroth E, Vilaclara L, Muszynski M, Gaviria J, Vuilleumier P, Van De Ville D. Probing neurodynamics of experienced emotions-a Hitchhiker's guide to film fMRI. Soc Cogn Affect Neurosci 2023; 18:nsad063. [PMID: 37930850 PMCID: PMC10656947 DOI: 10.1093/scan/nsad063] [Received: 03/16/2023] [Revised: 08/04/2023] [Accepted: 11/01/2023] [Indexed: 11/08/2023]
Abstract
Film functional magnetic resonance imaging (fMRI) has gained tremendous popularity in many areas of neuroscience. However, affective neuroscience has been somewhat slower to embrace this approach, even though films lend themselves to studying how brain function gives rise to complex, dynamic and multivariate emotions. Here, we discuss the unique capabilities of film fMRI for emotion research, while providing a general guide to conducting such research. We first give a brief overview of emotion theories, as these inform important design choices. Next, we discuss films as experimental paradigms for emotion elicitation and address the process of annotating them. We then situate film fMRI in the context of other fMRI approaches and present an overview of results from extant studies with regard to the advantages of film fMRI. We also give an overview of state-of-the-art analysis techniques, including methods that probe neurodynamics. Finally, we discuss the limitations of using film fMRI to study emotion. In sum, this review offers a practitioners' guide to the emerging field of film fMRI and underscores how it can advance affective neuroscience.
Affiliation(s)
- Elenor Morgenroth
- Neuro-X Institute, École Polytechnique Fédérale de Lausanne, Geneva 1202, Switzerland
- Department of Radiology and Medical Informatics, University of Geneva, Geneva 1202, Switzerland
- Swiss Center for Affective Sciences, University of Geneva, Geneva 1202, Switzerland
- Laura Vilaclara
- Neuro-X Institute, École Polytechnique Fédérale de Lausanne, Geneva 1202, Switzerland
- Department of Radiology and Medical Informatics, University of Geneva, Geneva 1202, Switzerland
- Michal Muszynski
- Department of Basic Neurosciences, University of Geneva, Geneva 1202, Switzerland
- Julian Gaviria
- Swiss Center for Affective Sciences, University of Geneva, Geneva 1202, Switzerland
- Department of Basic Neurosciences, University of Geneva, Geneva 1202, Switzerland
- Department of Psychiatry, University of Geneva, Geneva 1202, Switzerland
- Patrik Vuilleumier
- Swiss Center for Affective Sciences, University of Geneva, Geneva 1202, Switzerland
- Department of Basic Neurosciences, University of Geneva, Geneva 1202, Switzerland
- CIBM Center for Biomedical Imaging, Geneva 1202, Switzerland
- Dimitri Van De Ville
- Neuro-X Institute, École Polytechnique Fédérale de Lausanne, Geneva 1202, Switzerland
- Department of Radiology and Medical Informatics, University of Geneva, Geneva 1202, Switzerland
- CIBM Center for Biomedical Imaging, Geneva 1202, Switzerland
16
Cheng FL, Horikawa T, Majima K, Tanaka M, Abdelhack M, Aoki SC, Hirano J, Kamitani Y. Reconstructing visual illusory experiences from human brain activity. Sci Adv 2023; 9:eadj3906. [PMID: 37967184 PMCID: PMC10651116 DOI: 10.1126/sciadv.adj3906] [Received: 06/23/2023] [Accepted: 10/13/2023] [Indexed: 11/17/2023]
Abstract
Visual illusions provide valuable insights into the brain's interpretation of the world given sensory inputs. However, the precise manner in which brain activity translates into illusory experiences remains largely unknown. Here, we leverage a brain decoding technique combined with deep neural network (DNN) representations to reconstruct illusory percepts as images from brain activity. The reconstruction model was trained on natural images to establish a link between brain activity and perceptual features and then tested on two types of illusions: illusory lines and neon color spreading. Reconstructions revealed lines and colors consistent with illusory experiences, which varied across the source visual cortical areas. This framework offers a way to materialize subjective experiences, shedding light on the brain's internal representations of the world.
Affiliation(s)
- Fan L. Cheng
- Graduate School of Informatics, Kyoto University, Sakyo-ku, Kyoto 606-8501, Japan
- ATR Computational Neuroscience Laboratories, Soraku, Kyoto 619-0288, Japan
- Tomoyasu Horikawa
- ATR Computational Neuroscience Laboratories, Soraku, Kyoto 619-0288, Japan
- Kei Majima
- Graduate School of Informatics, Kyoto University, Sakyo-ku, Kyoto 606-8501, Japan
- Misato Tanaka
- Graduate School of Informatics, Kyoto University, Sakyo-ku, Kyoto 606-8501, Japan
- Mohamed Abdelhack
- Graduate School of Informatics, Kyoto University, Sakyo-ku, Kyoto 606-8501, Japan
- Shuntaro C. Aoki
- Graduate School of Informatics, Kyoto University, Sakyo-ku, Kyoto 606-8501, Japan
- Jin Hirano
- Graduate School of Informatics, Kyoto University, Sakyo-ku, Kyoto 606-8501, Japan
- Yukiyasu Kamitani
- Graduate School of Informatics, Kyoto University, Sakyo-ku, Kyoto 606-8501, Japan
- ATR Computational Neuroscience Laboratories, Soraku, Kyoto 619-0288, Japan
17
Hu Y, Yu Q. Spatiotemporal dynamics of self-generated imagery reveal a reverse cortical hierarchy from cue-induced imagery. Cell Rep 2023; 42:113242. [PMID: 37831604 DOI: 10.1016/j.celrep.2023.113242] [Received: 01/25/2023] [Revised: 08/25/2023] [Accepted: 09/25/2023] [Indexed: 10/15/2023]
Abstract
Visual imagery allows for the construction of rich internal experience in our mental world. However, it has remained poorly understood how imagery experience arises volitionally, as opposed to being cue-driven. Here, using electroencephalography and functional magnetic resonance imaging, we systematically investigate the spatiotemporal dynamics of self-generated imagery by having participants volitionally imagine one of the orientations from a learned pool. We contrast self-generated imagery with cue-induced imagery, in which participants imagined line orientations based on previously acquired associative cues. Our results reveal overlapping neural signatures of cue-induced and self-generated imagery. Yet, these neural signatures display substantially different sensitivities to the two types of imagery: self-generated imagery is supported by enhanced involvement of the anterior cortex in representing imagery contents. By contrast, cue-induced imagery is supported by enhanced imagery representations in the posterior visual cortex. Together, these results support a reverse cortical hierarchy in generating and maintaining imagery contents in self-generated versus externally cued imagery.
Affiliation(s)
- Yiheng Hu
- Institute of Neuroscience, Center for Excellence in Brain Science and Intelligence Technology, Chinese Academy of Sciences, Shanghai 200031, China; University of Chinese Academy of Sciences, Beijing 100049, China
- Qing Yu
- Institute of Neuroscience, Center for Excellence in Brain Science and Intelligence Technology, Chinese Academy of Sciences, Shanghai 200031, China
18
Pham TQ, Matsui T, Chikazoe J. Evaluation of the Hierarchical Correspondence between the Human Brain and Artificial Neural Networks: A Review. Biology (Basel) 2023; 12:1330. [PMID: 37887040 PMCID: PMC10604784 DOI: 10.3390/biology12101330] [Received: 08/14/2023] [Revised: 09/22/2023] [Accepted: 10/10/2023] [Indexed: 10/28/2023]
Abstract
Artificial neural networks (ANNs) that are heavily inspired by the human brain now achieve human-level performance across multiple task domains. ANNs have thus drawn attention in neuroscience, raising the possibility of providing a framework for understanding the information encoded in the human brain. However, the correspondence between ANNs and the brain cannot be measured directly. They differ in outputs and substrates, neurons vastly outnumber their ANN analogs (i.e., nodes), and the key algorithm responsible for most of modern ANN training (i.e., backpropagation) is likely absent from the brain. Neuroscientists have thus taken a variety of approaches to examine the similarity between the brain and ANNs at multiple levels of their information hierarchy. This review provides an overview of the currently available approaches and their limitations for evaluating brain-ANN correspondence.
Affiliation(s)
- Teppei Matsui
- Graduate School of Brain Science, Doshisha University, Kyoto 610-0321, Japan
19
Misthos LM, Krassanakis V, Merlemis N, Kesidis AL. Modeling the Visual Landscape: A Review on Approaches, Methods and Techniques. Sensors (Basel) 2023; 23:8135. [PMID: 37836966 PMCID: PMC10574952 DOI: 10.3390/s23198135] [Received: 06/30/2023] [Revised: 08/14/2023] [Accepted: 09/21/2023] [Indexed: 10/15/2023]
Abstract
Modeling the perception and evaluation of landscapes from the human perspective is a desirable goal for several scientific domains and applications. Human vision is the dominant sense, and human eyes are the sensors through which we apperceive the environmental stimuli of our surroundings. Therefore, exploring the experimental recording and measurement of the visual landscape can reveal crucial aspects of human visual perception responses while viewing natural or man-made landscapes. Landscape evaluation (or assessment) is another dimension that refers mainly to preferences regarding the visual landscape and also involves human cognition, often in unpredictable ways. Moreover, the landscape can be approached from both egocentric (i.e., human view) and exocentric (i.e., bird's eye view) perspectives. The overarching approach of this review article lies in systematically presenting the different ways of modeling and quantifying the two 'modalities' of human perception and evaluation, under the two geometric perspectives, suggesting integrative approaches to these two 'diverging' dualities. To this end, several pertinent traditions/approaches, sensor-based experimental methods and techniques (e.g., eye tracking, fMRI, and EEG), and metrics are adduced and described. Essentially, this review article acts as a 'guide-map' for the delineation of the different activities related to landscape experience and/or management and to the valid or potentially suitable types of stimuli, sensor techniques, and metrics for each activity. Throughout our work, two main research directions are identified: (1) one that attempts to transfer the visual landscape experience/management from one perspective to the other (and vice versa); (2) another that aims to anticipate the visual perception of different landscapes and establish connections between perceptual processes and landscape preferences. Research in the field appears to be growing rapidly.
In our opinion, it can be greatly advanced and enriched using integrative, interdisciplinary approaches in order to better understand the concepts and the mechanisms by which the visual landscape, as a complex set of stimuli, influences visual perception, potentially leading to more elaborate outcomes such as the anticipation of landscape preferences. In turn, such approaches can support a rigorous, evidence-based, and socially just framework for landscape management, protection, and decision making, based on a wide spectrum of well-suited and advanced sensor-based technologies.
Affiliation(s)
- Loukas-Moysis Misthos
- Department of Surveying and Geoinformatics Engineering, University of West Attica, GR-12243 Athens, Greece
- Department of Public and One Health, University of Thessaly, GR-43100 Karditsa, Greece
- Vassilios Krassanakis
- Department of Surveying and Geoinformatics Engineering, University of West Attica, GR-12243 Athens, Greece
- Nikolaos Merlemis
- Department of Surveying and Geoinformatics Engineering, University of West Attica, GR-12243 Athens, Greece
- Anastasios L. Kesidis
- Department of Surveying and Geoinformatics Engineering, University of West Attica, GR-12243 Athens, Greece
20
Meng L, Yang C. Dual-Guided Brain Diffusion Model: Natural Image Reconstruction from Human Visual Stimulus fMRI. Bioengineering (Basel) 2023; 10:1117. [PMID: 37892847 PMCID: PMC10604156 DOI: 10.3390/bioengineering10101117] [Received: 08/05/2023] [Revised: 09/20/2023] [Accepted: 09/21/2023] [Indexed: 10/29/2023]
Abstract
The reconstruction of visual stimuli from fMRI signals, which record brain activity, is a challenging task with crucial research value in the fields of neuroscience and machine learning. Previous studies tend to emphasize reconstructing pixel-level features (contours, colors, etc.) or semantic features (object category) of the stimulus image, but typically, these properties are not reconstructed together. In this context, we introduce a novel three-stage visual reconstruction approach called the Dual-guided Brain Diffusion Model (DBDM). Initially, we employ the Very Deep Variational Autoencoder (VDVAE) to reconstruct a coarse image from fMRI data, capturing the underlying details of the original image. Subsequently, the Bootstrapping Language-Image Pre-training (BLIP) model is utilized to provide a semantic annotation for each image. Finally, the image-to-image generation pipeline of the Versatile Diffusion (VD) model is utilized to recover natural images from the fMRI patterns guided by both visual and semantic information. The experimental results demonstrate that DBDM surpasses previous approaches in both qualitative and quantitative comparisons. In particular, the best performance is achieved by DBDM in reconstructing the semantic details of the original image; the Inception, CLIP and SwAV distances are 0.611, 0.225 and 0.405, respectively. This confirms the efficacy of our model and its potential to advance visual decoding research.
Affiliation(s)
- Lu Meng
- College of Information Science and Engineering, Northeastern University, Shenyang 110819, China
21
Ozcelik F, VanRullen R. Natural scene reconstruction from fMRI signals using generative latent diffusion. Sci Rep 2023; 13:15666. [PMID: 37731047 PMCID: PMC10511448 DOI: 10.1038/s41598-023-42891-8] [Received: 03/23/2023] [Accepted: 09/15/2023] [Indexed: 09/22/2023]
Abstract
In neural decoding research, one of the most intriguing topics is the reconstruction of perceived natural images based on fMRI signals. Previous studies have succeeded in re-creating different aspects of the visuals, such as low-level properties (shape, texture, layout) or high-level features (category of objects, descriptive semantics of scenes) but have typically failed to reconstruct these properties together for complex scene images. Generative AI has recently made a leap forward with latent diffusion models capable of generating high-complexity images. Here, we investigate how to take advantage of this innovative technology for brain decoding. We present a two-stage scene reconstruction framework called "Brain-Diffuser". In the first stage, starting from fMRI signals, we reconstruct images that capture low-level properties and overall layout using a VDVAE (Very Deep Variational Autoencoder) model. In the second stage, we use the image-to-image framework of a latent diffusion model (Versatile Diffusion) conditioned on predicted multimodal (text and visual) features, to generate final reconstructed images. On the publicly available Natural Scenes Dataset benchmark, our method outperforms previous models both qualitatively and quantitatively. When applied to synthetic fMRI patterns generated from individual ROI (region-of-interest) masks, our trained model creates compelling "ROI-optimal" scenes consistent with neuroscientific knowledge. Thus, the proposed methodology can have an impact on both applied (e.g. brain-computer interface) and fundamental neuroscience.
Affiliation(s)
- Furkan Ozcelik
- CerCo, CNRS UMR5549, Toulouse, France
- Université de Toulouse, Toulouse, France
- Rufin VanRullen
- CerCo, CNRS UMR5549, Toulouse, France
- Université de Toulouse, Toulouse, France
- ANITI, Toulouse, France
22
Li S, Zeng X, Shao Z, Yu Q. Neural Representations in Visual and Parietal Cortex Differentiate between Imagined, Perceived, and Illusory Experiences. J Neurosci 2023; 43:6508-6524. [PMID: 37582626 PMCID: PMC10513072 DOI: 10.1523/jneurosci.0592-23.2023] [Received: 03/30/2023] [Revised: 07/09/2023] [Accepted: 08/04/2023] [Indexed: 08/17/2023]
Abstract
Humans constantly receive massive amounts of information, both perceived from the external environment and imagined from the internal world. To function properly, the brain needs to correctly identify the origin of information being processed. Recent work has suggested common neural substrates for perception and imagery. However, it has remained unclear how the brain differentiates between external and internal experiences with shared neural codes. Here we tested this question in human participants (male and female) by systematically investigating the neural processes underlying the generation and maintenance of visual information from voluntary imagery, veridical perception, and illusion. The inclusion of illusion allowed us to differentiate between objective and subjective internality: while illusion has an objectively internal origin and can be viewed as involuntary imagery, it is also subjectively perceived as having an external origin like perception. Combining fMRI, eye-tracking, multivariate decoding, and encoding approaches, we observed superior orientation representations in parietal cortex during imagery compared with perception, and conversely in early visual cortex. This imagery dominance gradually developed along a posterior-to-anterior cortical hierarchy from early visual to parietal cortex, emerged in the early epoch of imagery and sustained into the delay epoch, and persisted across varied imagined contents. Moreover, representational strength of illusion was more comparable to imagery in early visual cortex, but more comparable to perception in parietal cortex, suggesting content-specific representations in parietal cortex differentiate between subjectively internal and external experiences, as opposed to early visual cortex. These findings together support a domain-general engagement of parietal cortex in internally generated experience.
SIGNIFICANCE STATEMENT How does the brain differentiate between imagined and perceived experiences? Combining fMRI, eye-tracking, multivariate decoding, and encoding approaches, the current study revealed enhanced stimulus-specific representations in visual imagery originating from parietal cortex, supporting the subjective experience of imagery. This neural principle was further validated by evidence from visual illusion, wherein illusion resembled perception and imagery at different levels of cortical hierarchy. Our findings provide direct evidence for the critical role of parietal cortex as a domain-general region for content-specific imagery, and offer new insights into the neural mechanisms underlying the differentiation between subjectively internal and external experiences.
Affiliation(s)
- Siyi Li
- Institute of Neuroscience, Center for Excellence in Brain Science and Intelligence Technology, Chinese Academy of Sciences, Shanghai 200031, China
- Xuemei Zeng
- Institute of Neuroscience, Center for Excellence in Brain Science and Intelligence Technology, Chinese Academy of Sciences, Shanghai 200031, China
- Zhujun Shao
- Institute of Neuroscience, Center for Excellence in Brain Science and Intelligence Technology, Chinese Academy of Sciences, Shanghai 200031, China
- University of Chinese Academy of Sciences, Beijing 100049, China
- Qing Yu
- Institute of Neuroscience, Center for Excellence in Brain Science and Intelligence Technology, Chinese Academy of Sciences, Shanghai 200031, China
23
Du C, Fu K, Li J, He H. Decoding Visual Neural Representations by Multimodal Learning of Brain-Visual-Linguistic Features. IEEE Trans Pattern Anal Mach Intell 2023; 45:10760-10777. [PMID: 37030711 DOI: 10.1109/tpami.2023.3263181] [Indexed: 06/19/2023]
Abstract
Decoding human visual neural representations is a challenging task of great scientific significance for revealing vision-processing mechanisms and developing brain-like intelligent machines. Most existing methods generalize poorly to novel categories that have no corresponding neural data for training. The two main reasons are 1) the under-exploitation of the multimodal semantic knowledge underlying the neural data and 2) the small number of paired (stimulus-response) training samples. To overcome these limitations, this paper presents a generic neural decoding method called BraVL that uses multimodal learning of brain-visual-linguistic features. We focus on modeling the relationships between brain, visual and linguistic features via multimodal deep generative models. Specifically, we leverage the mixture-of-product-of-experts formulation to infer a latent code that enables a coherent joint generation of all three modalities. To learn a more consistent joint representation and improve data efficiency when brain activity data are limited, we exploit both intra- and inter-modality mutual information maximization regularization terms. In particular, our BraVL model can be trained under various semi-supervised scenarios to incorporate the visual and textual features obtained from extra categories. Finally, we construct three trimodal matching datasets, and the extensive experiments lead to some interesting conclusions and cognitive insights: 1) decoding novel visual categories from human brain activity is practically possible with good accuracy; 2) decoding models using the combination of visual and linguistic features perform much better than those using either of them alone; 3) visual perception may be accompanied by linguistic influences to represent the semantics of visual stimuli.
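The mixture-of-product-of-experts (MoPoE) formulation named in this abstract can be illustrated with a minimal sketch, assuming each modality encoder (brain, visual, linguistic) outputs a diagonal Gaussian. This is a generic illustration of the idea, not BraVL's implementation: the function names, expert names, toy numbers, and the omission of a standard-normal prior expert are all assumptions. Each non-empty subset of modalities is fused by a product of Gaussians (precisions add; the mean is the precision-weighted average of the expert means), and the joint posterior is taken as a uniform mixture over those subset products.

```python
import math
import random
from itertools import combinations


def product_of_experts(means, variances):
    """Fuse diagonal Gaussian experts into a single diagonal Gaussian.

    means, variances: lists (one entry per expert) of equal-length lists.
    Returns (mean, variance) lists of the product distribution.
    """
    fused_mean, fused_var = [], []
    for d in range(len(means[0])):
        # Precisions (inverse variances) of Gaussian experts add up.
        precision = sum(1.0 / v[d] for v in variances)
        # The fused mean is the precision-weighted average of expert means.
        weighted = sum(m[d] / v[d] for m, v in zip(means, variances))
        fused_var.append(1.0 / precision)
        fused_mean.append(weighted / precision)
    return fused_mean, fused_var


def mopoe_components(experts):
    """Build one PoE component per non-empty subset of modalities.

    experts: dict mapping a modality name to (mean list, variance list).
    Returns a dict mapping each subset (sorted tuple of names) to (mean, variance).
    Assumption of this sketch: no standard-normal prior expert is included.
    """
    names = sorted(experts)
    components = {}
    for k in range(1, len(names) + 1):
        for subset in combinations(names, k):
            means = [experts[n][0] for n in subset]
            variances = [experts[n][1] for n in subset]
            components[subset] = product_of_experts(means, variances)
    return components


def sample_mopoe(components, rng):
    """Draw one latent sample from the uniform mixture of PoE components."""
    subset = rng.choice(sorted(components))
    mean, var = components[subset]
    # Reparameterized Gaussian draw: z = mean + std * eps.
    return [m + math.sqrt(v) * rng.gauss(0.0, 1.0) for m, v in zip(mean, var)]


# Hypothetical one-dimensional experts (toy numbers, not data from the paper).
experts = {
    "brain": ([1.0], [1.0]),
    "visual": ([3.0], [1.0]),
    "text": ([0.0], [2.0]),
}
components = mopoe_components(experts)
print(len(components))                  # 7 non-empty subsets of 3 modalities
print(components[("brain", "visual")])  # ([2.0], [0.5])
print(sample_mopoe(components, random.Random(0)))
```

In this toy setting, fusing the brain and visual experts (equal variances) lands halfway between their means with half the variance, which is the behavior that lets a subset product sharpen the latent estimate when modalities agree.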
24
Ren Z, Li J, Xue X, Li X, Yang F, Jiao Z, Gao X. Reconstructing controllable faces from brain activity with hierarchical multiview representations. Neural Netw 2023; 166:487-500. [PMID: 37574622 DOI: 10.1016/j.neunet.2023.07.016] [Received: 11/04/2022] [Revised: 05/21/2023] [Accepted: 07/12/2023] [Indexed: 08/15/2023]
Abstract
Reconstructing visual experience from brain responses measured by functional magnetic resonance imaging (fMRI) is a challenging yet important research topic in brain decoding, and it has proved especially difficult to decode visually similar stimuli, such as faces. Although face attributes are known to be key to face recognition, most existing methods ignore how to decode facial attributes more precisely in perceived face reconstruction, which often leads to indistinguishable reconstructed faces. To solve this problem, we propose a novel neural decoding framework called VSPnet (voxel2style2pixel), which establishes hierarchical encoding and decoding networks with disentangled latent representations as media, so as to recover visual stimuli in finer detail. We also design a hierarchical visual encoder (named HVE) to pre-extract features containing both high-level semantic knowledge and low-level visual details from stimuli. The proposed VSPnet consists of two networks: a multi-branch cognitive encoder and a style-based image generator. The encoder network is constructed from multiple linear regression branches that map brain signals to the latent space provided by the pre-extracted visual features, obtaining representations that contain hierarchical information consistent with the corresponding stimuli. The generator network, inspired by StyleGAN, untangles the complexity of fMRI representations and generates images. The HVE network is composed of a standard feature pyramid over a ResNet backbone. Extensive experimental results on the latest public datasets demonstrate that our proposed method outperforms state-of-the-art approaches in reconstruction accuracy and that the identifiability of different reconstructed faces is greatly improved. In particular, we achieve feature editing for several facial attributes in the fMRI domain based on the multiview (i.e., visual stimuli and evoked fMRI) latent representations.
Affiliation(s)
- Ziqi Ren
- School of Electronic Engineering, Xidian University, Xi'an 710071, China
- Jie Li
- School of Electronic Engineering, Xidian University, Xi'an 710071, China
- Xuetong Xue
- School of Electronic Engineering, Xidian University, Xi'an 710071, China
- Xin Li
- Group 42 (G42), Abu Dhabi, United Arab Emirates
- Fan Yang
- Group 42 (G42), Abu Dhabi, United Arab Emirates
- Zhicheng Jiao
- The Warren Alpert Medical School, Brown University, RI, USA; Department of Diagnostic Imaging, Rhode Island Hospital, RI, USA
- Xinbo Gao
- School of Electronic Engineering, Xidian University, Xi'an 710071, China
25
Zhao Y, Chen Y, Cheng K, Huang W. Artificial intelligence based multimodal language decoding from brain activity: A review. Brain Res Bull 2023; 201:110713. [PMID: 37487829 DOI: 10.1016/j.brainresbull.2023.110713] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/27/2023] [Revised: 06/26/2023] [Accepted: 07/20/2023] [Indexed: 07/26/2023]
Abstract
Decoding brain activity is conducive to breakthroughs in brain-computer interface (BCI) technology. The development of artificial intelligence (AI) continually promotes the progress of brain language decoding technology. Existing research has mainly focused on a single modality and paid insufficient attention to AI methods. Therefore, our objective is to provide an overview of relevant decoding research from the perspective of different modalities and methodologies. The modalities involve text, speech, image, and video, whereas the core method is using AI-built decoders to translate brain signals induced by multimodal stimuli into text or vocal language. The semantic information of brain activity can be successfully decoded into language at various levels, ranging from words through sentences to discourses. However, decoding performance is affected by various factors, such as the decoding model, the vector representation model, and brain regions. Challenges and future directions are also discussed. Advances in brain language decoding and BCI technology could potentially help patients with clinical aphasia regain the ability to communicate.
Affiliation(s)
- Yuhao Zhao
- College of Language Intelligence, Sichuan International Studies University, Chongqing 400031, PR China
- Yu Chen
- Technical College for the Deaf, Tianjin University of Technology, Tianjin 300384, PR China
- Kaiwen Cheng
- College of Language Intelligence, Sichuan International Studies University, Chongqing 400031, PR China
- Wei Huang
- Sichuan Provincial Key Laboratory for Human Disease Gene Study, Sichuan Provincial People's Hospital, University of Electronic Science and Technology of China, Chengdu 611731, PR China
26
Miao HY, Tong F. Convolutional neural network models of neuronal responses in macaque V1 reveal limited non-linear processing. bioRxiv: The Preprint Server for Biology 2023:2023.08.26.554952. [PMID: 37693397 PMCID: PMC10491131 DOI: 10.1101/2023.08.26.554952] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 09/12/2023]
Abstract
Computational models of the primary visual cortex (V1) have suggested that V1 neurons behave like Gabor filters followed by simple non-linearities. However, recent work employing convolutional neural network (CNN) models has suggested that V1 relies on far more non-linear computations than previously thought. Specifically, unit responses in an intermediate layer of VGG-19 were found to best predict macaque V1 responses to thousands of natural and synthetic images. Here, we evaluated the hypothesis that the poor performance of lower-layer units in VGG-19 might be attributable to their small receptive field size rather than to their lack of complexity per se. We compared VGG-19 with AlexNet, which has much larger receptive fields in its lower layers. Whereas the best-performing layer of VGG-19 occurred after seven non-linear steps, the first convolutional layer of AlexNet best predicted V1 responses. Although VGG-19's predictive accuracy was somewhat better than that of standard AlexNet, we found that a modified version of AlexNet could match VGG-19's performance after only a few non-linear computations. Control analyses revealed that decreasing the size of the input images caused the best-performing layer of VGG-19 to shift to a lower layer, consistent with the hypothesis that the relationship between image size and receptive field size can strongly affect model performance. We conducted additional analyses using a Gabor pyramid model to test for non-linear contributions of normalization and contrast saturation. Overall, our findings suggest that the feedforward responses of V1 neurons can be well explained by assuming only a few non-linear processing stages.
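The classical "Gabor filter followed by a simple non-linearity" account of V1 can be illustrated with a minimal sketch. It assumes the following: square grayscale patches stored as nested lists, a single receptive-field location, half-wave rectification for a simple cell, and a quadrature-pair energy model for a complex cell. All function names and parameter values are hypothetical. This is not the Gabor pyramid model used in the preprint.

```python
import math

def gabor(size, wavelength, theta, sigma, phase=0.0):
    """Return a size x size Gabor kernel: isotropic Gaussian envelope times an oriented cosine carrier."""
    half = (size - 1) / 2.0
    kernel = []
    for i in range(size):
        row = []
        for j in range(size):
            x, y = j - half, i - half
            # Project onto the preferred orientation so the carrier varies along theta.
            xr = x * math.cos(theta) + y * math.sin(theta)
            envelope = math.exp(-(x * x + y * y) / (2.0 * sigma * sigma))
            row.append(envelope * math.cos(2.0 * math.pi * xr / wavelength + phase))
        kernel.append(row)
    return kernel

def linear_response(kernel, patch):
    """Dot product between a kernel and an equally sized image patch."""
    return sum(k * p for krow, prow in zip(kernel, patch) for k, p in zip(krow, prow))

def simple_cell(kernel, patch):
    """Linear filtering followed by half-wave rectification."""
    return max(0.0, linear_response(kernel, patch))

def complex_cell(size, wavelength, theta, sigma, patch):
    """Energy model: sum of squared responses of an even/odd (0 and 90 degree phase) filter pair."""
    even = linear_response(gabor(size, wavelength, theta, sigma, 0.0), patch)
    odd = linear_response(gabor(size, wavelength, theta, sigma, math.pi / 2), patch)
    return even * even + odd * odd

def grating(size, wavelength, theta):
    """Sinusoidal grating patch used as a test stimulus."""
    half = (size - 1) / 2.0
    return [[math.cos(2.0 * math.pi * ((j - half) * math.cos(theta) + (i - half) * math.sin(theta)) / wavelength)
             for j in range(size)] for i in range(size)]
```

For example, `complex_cell(11, 4, 0.0, 2.0, grating(11, 4, 0.0))` is far larger than the response to the orthogonal grating `grating(11, 4, math.pi / 2)`, which gives the model its orientation tuning. Each model cell passes through only one non-linear stage, which is the kind of shallow computation the abstract contrasts with deep CNN layers.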
Affiliation(s)
- Hui-Yuan Miao
- Department of Psychology, Vanderbilt University, Nashville, TN, 37240, USA
- Frank Tong
- Department of Psychology, Vanderbilt University, Nashville, TN, 37240, USA
- Vanderbilt Vision Research Center, Vanderbilt University, Nashville, TN, 37240, USA
27
Wang C, Yan H, Huang W, Sheng W, Wang Y, Fan YS, Liu T, Zou T, Li R, Chen H. Neural encoding with unsupervised spiking convolutional neural network. Commun Biol 2023; 6:880. [PMID: 37640808 PMCID: PMC10462614 DOI: 10.1038/s42003-023-05257-4] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/06/2023] [Accepted: 08/18/2023] [Indexed: 08/31/2023]
Abstract
Accurately predicting brain responses to various stimuli poses a significant challenge in neuroscience. Despite recent breakthroughs in neural encoding using convolutional neural networks (CNNs) in fMRI studies, critical gaps remain between the computational rules of traditional artificial neurons and those of real biological neurons. To address this issue, this study presents a spiking CNN (SCNN)-based framework that achieves neural encoding in a more biologically plausible manner. The framework uses an unsupervised SCNN to extract visual features of image stimuli and employs a receptive field-based regression algorithm to predict fMRI responses from the SCNN features. Experimental results on handwritten characters, handwritten digits and natural images demonstrate that the proposed approach achieves remarkably good encoding performance and can be used for "brain reading" tasks such as image reconstruction and identification. This work suggests that spiking neural networks (SNNs) can serve as a promising tool for neural encoding.
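A receptive field-based regression from image features to a voxel's response can be illustrated generically. Each stimulus's feature map is pooled under a 2D Gaussian receptive field and a linear readout is fitted. The receptive-field center and size are chosen by grid search. The minimal sketch below assumes a single feature channel, ordinary least squares with an intercept, and hypothetical function names. It shows the general idea only, not the specific algorithm the authors apply to SCNN features.

```python
import math

def gaussian_pool(feature_map, cx, cy, sigma):
    """Weighted average of a 2D feature map under a normalized Gaussian receptive field at (cx, cy)."""
    num = den = 0.0
    for y, row in enumerate(feature_map):
        for x, value in enumerate(row):
            w = math.exp(-((x - cx) ** 2 + (y - cy) ** 2) / (2.0 * sigma ** 2))
            num += w * value
            den += w
    return num / den

def fit_line(xs, ys):
    """Ordinary least squares for y = a * x + b; returns (a, b, sum of squared errors)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    a = sxy / sxx if sxx > 0 else 0.0
    b = my - a * mx
    sse = sum((y - (a * x + b)) ** 2 for x, y in zip(xs, ys))
    return a, b, sse

def fit_voxel_rf(feature_maps, responses, centers, sigmas):
    """Grid-search the receptive field whose pooled feature best predicts one voxel.

    feature_maps: one 2D map per stimulus; responses: that voxel's response per stimulus.
    Returns (sse, (cx, cy, sigma), slope, intercept) for the best candidate.
    """
    best = None
    for cx, cy in centers:
        for sigma in sigmas:
            pooled = [gaussian_pool(fm, cx, cy, sigma) for fm in feature_maps]
            a, b, sse = fit_line(pooled, responses)
            if best is None or sse < best[0]:
                best = (sse, (cx, cy, sigma), a, b)
    return best
```

With many feature channels, the single slope would become a multi-channel (typically regularized) linear regression. The grid search over receptive-field parameters would stay the same.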
Affiliation(s)
- Chong Wang
- The Center of Psychosomatic Medicine, Sichuan Provincial Center for Mental Health, Sichuan Provincial People's Hospital, University of Electronic Science and Technology of China, Chengdu, 611731, China
- School of Life Science and Technology, University of Electronic Science and Technology of China, Chengdu, 610054, China
- MOE Key Lab for Neuroinformation; High-Field Magnetic Resonance Brain Imaging Key Laboratory of Sichuan Province, University of Electronic Science and Technology of China, Chengdu, 610054, China
- Hongmei Yan
- School of Life Science and Technology, University of Electronic Science and Technology of China, Chengdu, 610054, China
- MOE Key Lab for Neuroinformation; High-Field Magnetic Resonance Brain Imaging Key Laboratory of Sichuan Province, University of Electronic Science and Technology of China, Chengdu, 610054, China
- Wei Huang
- School of Life Science and Technology, University of Electronic Science and Technology of China, Chengdu, 610054, China
- MOE Key Lab for Neuroinformation; High-Field Magnetic Resonance Brain Imaging Key Laboratory of Sichuan Province, University of Electronic Science and Technology of China, Chengdu, 610054, China
- Wei Sheng
- School of Life Science and Technology, University of Electronic Science and Technology of China, Chengdu, 610054, China
- MOE Key Lab for Neuroinformation; High-Field Magnetic Resonance Brain Imaging Key Laboratory of Sichuan Province, University of Electronic Science and Technology of China, Chengdu, 610054, China
- Yuting Wang
- School of Life Science and Technology, University of Electronic Science and Technology of China, Chengdu, 610054, China
- MOE Key Lab for Neuroinformation; High-Field Magnetic Resonance Brain Imaging Key Laboratory of Sichuan Province, University of Electronic Science and Technology of China, Chengdu, 610054, China
- Yun-Shuang Fan
- School of Life Science and Technology, University of Electronic Science and Technology of China, Chengdu, 610054, China
- MOE Key Lab for Neuroinformation; High-Field Magnetic Resonance Brain Imaging Key Laboratory of Sichuan Province, University of Electronic Science and Technology of China, Chengdu, 610054, China
- Tao Liu
- School of Life Science and Technology, University of Electronic Science and Technology of China, Chengdu, 610054, China
- Ting Zou
- School of Life Science and Technology, University of Electronic Science and Technology of China, Chengdu, 610054, China
- Rong Li
- School of Life Science and Technology, University of Electronic Science and Technology of China, Chengdu, 610054, China
- MOE Key Lab for Neuroinformation; High-Field Magnetic Resonance Brain Imaging Key Laboratory of Sichuan Province, University of Electronic Science and Technology of China, Chengdu, 610054, China
- Huafu Chen
- The Center of Psychosomatic Medicine, Sichuan Provincial Center for Mental Health, Sichuan Provincial People's Hospital, University of Electronic Science and Technology of China, Chengdu, 611731, China
- School of Life Science and Technology, University of Electronic Science and Technology of China, Chengdu, 610054, China
- MOE Key Lab for Neuroinformation; High-Field Magnetic Resonance Brain Imaging Key Laboratory of Sichuan Province, University of Electronic Science and Technology of China, Chengdu, 610054, China
28
Schütt HH, Kipnis AD, Diedrichsen J, Kriegeskorte N. Statistical inference on representational geometries. eLife 2023; 12:e82566. [PMID: 37610302 PMCID: PMC10446828 DOI: 10.7554/elife.82566] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/09/2022] [Accepted: 08/07/2023] [Indexed: 08/24/2023]
Abstract
Neuroscience has recently made much progress, expanding the complexity of both neural activity measurements and brain-computational models. However, we lack robust methods for connecting theory and experiment by evaluating our new big models with our new big data. Here, we introduce new inference methods enabling researchers to evaluate and compare models based on the accuracy of their predictions of representational geometries: a good model should accurately predict the distances among the neural population representations (e.g., of a set of stimuli). Our inference methods combine novel two-factor extensions of cross-validation (to prevent overfitting to either subjects or conditions from inflating our estimates of model accuracy) and bootstrapping (to enable inferential model comparison with simultaneous generalization to both new subjects and new conditions). We validate the inference methods on data where the ground-truth model is known, by simulating data with deep neural networks and by resampling calcium-imaging and functional MRI data. Results demonstrate that the methods are valid and that conclusions generalize correctly. These data analysis methods are available in an open-source Python toolbox (rsatoolbox.readthedocs.io).
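The central object here is the representational dissimilarity matrix (RDM): the distances among a model's or a brain region's responses to each pair of conditions. A minimal sketch rests on these assumptions:

- correlation distance (1 - Pearson r) between condition patterns;
- Pearson correlation between the upper triangles of a model RDM and a data RDM;
- a naive bootstrap that resamples conditions only.

The paper's two-factor cross-validation and simultaneous subject-and-condition bootstrap are not reproduced, and none of this is the rsatoolbox API. All function names are hypothetical.

```python
import math
import random

def pearson(a, b):
    """Pearson correlation between two equally long sequences."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / math.sqrt(va * vb)

def rdm(patterns):
    """Correlation-distance RDM: entry (i, j) = 1 - r(pattern_i, pattern_j)."""
    k = len(patterns)
    return [[1.0 - pearson(patterns[i], patterns[j]) for j in range(k)] for i in range(k)]

def upper(matrix, idx):
    """Upper-triangle entries for the given condition indices, skipping a condition paired with itself."""
    vals = []
    for a in range(len(idx)):
        for b in range(a + 1, len(idx)):
            if idx[a] != idx[b]:
                vals.append(matrix[idx[a]][idx[b]])
    return vals

def compare_rdms(model_rdm, data_rdm, idx=None):
    """Correlate model and data RDMs over the selected conditions (all conditions by default)."""
    idx = list(range(len(data_rdm))) if idx is None else idx
    return pearson(upper(model_rdm, idx), upper(data_rdm, idx))

def bootstrap_conditions(model_rdm, data_rdm, n_boot=200, seed=0):
    """Bootstrap distribution of the model-data RDM correlation, resampling conditions with replacement."""
    rng = random.Random(seed)
    k = len(data_rdm)
    samples = []
    for _ in range(n_boot):
        idx = [rng.randrange(k) for _ in range(k)]
        if len(set(idx)) < 3:
            continue  # too few distinct conditions to estimate a correlation
        samples.append(compare_rdms(model_rdm, data_rdm, idx))
    return samples
```

The spread of the bootstrap samples gives a rough confidence interval for model performance across conditions. Generalizing to new subjects as well is what the paper's two-factor scheme adds.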
Affiliation(s)
- Heiko H Schütt
- Zuckerman Institute, Columbia University, New York, United States
29
Akamatsu Y, Maeda K, Ogawa T, Haseyama M. Zero-Shot Neural Decoding with Semi-Supervised Multi-View Embedding. Sensors (Basel) 2023; 23:6903. [PMID: 37571685 PMCID: PMC10422201 DOI: 10.3390/s23156903] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/15/2023] [Revised: 07/20/2023] [Accepted: 07/31/2023] [Indexed: 08/13/2023]
Abstract
Zero-shot neural decoding aims to decode image categories that were not seen during training from the functional magnetic resonance imaging (fMRI) activity evoked when a person views images. However, because fMRI data are difficult to collect, training data are often insufficient, which leads to poor generalization capability. Thus, models suffer from the projection domain shift problem when novel target categories are decoded. In this paper, we propose a zero-shot neural decoding approach with semi-supervised multi-view embedding. We introduce a semi-supervised approach that utilizes additional images related to the target categories without fMRI activity patterns. Furthermore, we project fMRI activity patterns into a multi-view embedding space, i.e., the visual and semantic feature spaces of the viewed images, to effectively exploit their complementary information. We define several source and target groups whose image categories are very different and verify the zero-shot neural decoding performance. The experimental results demonstrate that the proposed approach rectifies the projection domain shift problem and outperforms existing methods.
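At test time, zero-shot decoders of this kind typically project an fMRI pattern into the embedding space and pick the unseen category whose embedding is closest to the projection. A minimal sketch of that final matching step makes these assumptions:

- linear fMRI-to-visual and fMRI-to-semantic projections have already been learned (the weights are placeholders);
- cosine similarity is the matching score;
- a hypothetical weight `alpha` combines the two views.

The paper's semi-supervised embedding training is not reproduced.

```python
import math

def matvec(weights, x):
    """Linear projection: weights is a list of rows, x is a voxel vector."""
    return [sum(w * v for w, v in zip(row, x)) for row in weights]

def cosine(a, b):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

def zero_shot_decode(fmri, w_visual, w_semantic, candidates, alpha=0.5):
    """Return (best label, scores) for candidate categories never used in training.

    candidates maps label -> (visual_embedding, semantic_embedding).
    alpha weights the visual view against the semantic view.
    """
    pred_vis = matvec(w_visual, fmri)
    pred_sem = matvec(w_semantic, fmri)
    scores = {
        label: alpha * cosine(pred_vis, vis) + (1 - alpha) * cosine(pred_sem, sem)
        for label, (vis, sem) in candidates.items()
    }
    return max(scores, key=scores.get), scores
```

Because candidates are described only by their embeddings, new categories can be added at test time without retraining the projections. The projection domain shift discussed in the abstract arises exactly when those learned projections fit the training categories poorly.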
Affiliation(s)
- Yusuke Akamatsu
- Graduate School of Information Science and Technology, Hokkaido University, N-14, W-9, Kita-ku, Sapporo 060-0814, Hokkaido, Japan
- Keisuke Maeda
- Faculty of Information Science and Technology, Hokkaido University, N-14, W-9, Kita-ku, Sapporo 060-0814, Hokkaido, Japan
- Takahiro Ogawa
- Faculty of Information Science and Technology, Hokkaido University, N-14, W-9, Kita-ku, Sapporo 060-0814, Hokkaido, Japan
- Miki Haseyama
- Faculty of Information Science and Technology, Hokkaido University, N-14, W-9, Kita-ku, Sapporo 060-0814, Hokkaido, Japan
30
Bernáez Timón L, Ekelmans P, Kraynyukova N, Rose T, Busse L, Tchumatchenko T. How to incorporate biological insights into network models and why it matters. J Physiol 2023; 601:3037-3053. [PMID: 36069408 DOI: 10.1113/jp282755] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/22/2022] [Accepted: 08/24/2022] [Indexed: 11/08/2022]
Abstract
Due to the staggering complexity of the brain and its neural circuitry, neuroscientists rely on the analysis of mathematical models to elucidate its function. From Hodgkin and Huxley's detailed description of the action potential in 1952 to today, new theories and increasing computational power have opened up novel avenues to study how neural circuits implement the computations that underlie behaviour. Computational neuroscientists have developed many models of neural circuits that differ in complexity, biological realism or emergent network properties. With recent advances in experimental techniques for detailed anatomical reconstructions or large-scale activity recordings, rich biological data have become more available. The challenge when building network models is to reflect experimental results, either through a high level of detail or by finding an appropriate level of abstraction. Meanwhile, machine learning has facilitated the development of artificial neural networks, which are trained to perform specific tasks. While they have proven successful at achieving task-oriented behaviour, they are often abstract constructs that differ in many features from the physiology of brain circuits. Thus, it is unclear whether the mechanisms underlying computation in biological circuits can be investigated by analysing artificial networks that accomplish the same function but differ in their mechanisms. Here, we argue that building biologically realistic network models is crucial to establishing causal relationships between neurons, synapses, circuits and behaviour. More specifically, we advocate for network models that consider the connectivity structure and the recorded activity dynamics while evaluating task performance.
Affiliation(s)
- Laura Bernáez Timón
- Institute for Physiological Chemistry, University of Mainz Medical Center, Mainz, Germany
- Pierre Ekelmans
- Frankfurt Institute for Advanced Studies, Frankfurt, Germany
- Nataliya Kraynyukova
- Institute of Experimental Epileptology and Cognition Research, University of Bonn Medical Center, Bonn, Germany
- Tobias Rose
- Institute of Experimental Epileptology and Cognition Research, University of Bonn Medical Center, Bonn, Germany
- Laura Busse
- Division of Neurobiology, Faculty of Biology, LMU Munich, Munich, Germany
- Bernstein Center for Computational Neuroscience, Munich, Germany
- Tatjana Tchumatchenko
- Institute for Physiological Chemistry, University of Mainz Medical Center, Mainz, Germany
- Institute of Experimental Epileptology and Cognition Research, University of Bonn Medical Center, Bonn, Germany
31
Jang H, Tong F. Improved modeling of human vision by incorporating robustness to blur in convolutional neural networks. bioRxiv: The Preprint Server for Biology 2023:2023.07.29.551089. [PMID: 37577646 PMCID: PMC10418076 DOI: 10.1101/2023.07.29.551089] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 08/15/2023]
Abstract
Whenever a visual scene is cast onto the retina, much of it will appear degraded due to poor resolution in the periphery; moreover, optical defocus can cause blur in central vision. However, the pervasiveness of blurry or degraded input is typically overlooked in the training of convolutional neural networks (CNNs). We hypothesized that the absence of blurry training inputs may cause CNNs to rely excessively on high spatial frequency information for object recognition, thereby causing systematic deviations from biological vision. We evaluated this hypothesis by comparing standard CNNs with CNNs trained on a combination of clear and blurry images. We show that blur-trained CNNs outperform standard CNNs at predicting neural responses to objects across a variety of viewing conditions. Moreover, blur-trained CNNs acquire increased sensitivity to shape information and greater robustness to multiple forms of visual noise, leading to improved correspondence with human perception. Our results provide novel neurocomputational evidence that blurry visual experiences are very important for conferring robustness to biological visual systems.
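Training on "a combination of clear and blurry images" amounts to a data-augmentation step applied to each training batch. A minimal sketch makes these assumptions:

- grayscale images are stored as nested lists;
- blur is a separable Gaussian with edge clamping;
- the blur probability `p_blur` and the `sigma_range` are hypothetical.

The preprint's actual blur levels and mixing ratio are not reproduced here.

```python
import math
import random

def gaussian_kernel(sigma):
    """Normalized 1D Gaussian kernel truncated at 3 sigma; returns (weights, radius)."""
    radius = max(1, int(math.ceil(3 * sigma)))
    weights = [math.exp(-(i * i) / (2.0 * sigma * sigma)) for i in range(-radius, radius + 1)]
    total = sum(weights)
    return [w / total for w in weights], radius

def blur_1d(values, kernel, radius):
    """Convolve a 1D signal with the kernel, clamping indices at the borders."""
    n = len(values)
    out = []
    for i in range(n):
        acc = 0.0
        for k, w in enumerate(kernel):
            j = min(max(i + k - radius, 0), n - 1)
            acc += w * values[j]
        out.append(acc)
    return out

def gaussian_blur(image, sigma):
    """Separable Gaussian blur: filter every row, then every column."""
    kernel, radius = gaussian_kernel(sigma)
    rows = [blur_1d(row, kernel, radius) for row in image]
    cols = [blur_1d(list(col), kernel, radius) for col in zip(*rows)]
    return [list(row) for row in zip(*cols)]

def mixed_blur_batch(images, p_blur=0.5, sigma_range=(1.0, 4.0), seed=None):
    """Blur each image with probability p_blur, drawing sigma uniformly from sigma_range."""
    rng = random.Random(seed)
    batch = []
    for img in images:
        if rng.random() < p_blur:
            batch.append(gaussian_blur(img, rng.uniform(*sigma_range)))
        else:
            batch.append([row[:] for row in img])  # keep a clear copy
    return batch
```

Blurring removes high spatial frequencies, so a network trained on such mixed batches cannot rely on fine texture alone. This matches the abstract's hypothesis about reduced high-frequency reliance and increased shape sensitivity.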
Affiliation(s)
- Hojin Jang
- Department of Psychology and Vanderbilt Vision Research Center, Vanderbilt University
- Frank Tong
- Department of Psychology and Vanderbilt Vision Research Center, Vanderbilt University
32
Simistira Liwicki F, Gupta V, Saini R, De K, Abid N, Rakesh S, Wellington S, Wilson H, Liwicki M, Eriksson J. Bimodal electroencephalography-functional magnetic resonance imaging dataset for inner-speech recognition. Sci Data 2023; 10:378. [PMID: 37311807 DOI: 10.1038/s41597-023-02286-w] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/01/2022] [Accepted: 06/01/2023] [Indexed: 06/15/2023]
Abstract
The recognition of inner speech, which could give a 'voice' to patients who are unable to speak or move, is a challenge for brain-computer interfaces (BCIs). A shortcoming of the available datasets is that they do not combine modalities to increase the performance of inner-speech recognition. Multimodal datasets of brain data enable the fusion of neuroimaging modalities with complementary properties, such as the high spatial resolution of functional magnetic resonance imaging (fMRI) and the high temporal resolution of electroencephalography (EEG), and are therefore promising for decoding inner speech. This paper presents the first publicly available bimodal dataset containing EEG and fMRI data acquired nonsimultaneously during inner-speech production. Data were obtained from four healthy, right-handed participants during an inner-speech task with words in either a social or a numerical category. Each of the eight word stimuli was assessed in 40 trials, resulting in 320 trials in each modality for each participant. The aim of this work is to provide a publicly available bimodal dataset on inner speech, contributing towards speech prostheses.
Affiliation(s)
- Foteini Simistira Liwicki
- Luleå University of Technology, Department of Computer Science, Electrical and Space Engineering, Embedded Intelligent Systems LAB, Luleå, Sweden
- Vibha Gupta
- Luleå University of Technology, Department of Computer Science, Electrical and Space Engineering, Embedded Intelligent Systems LAB, Luleå, Sweden
- Rajkumar Saini
- Luleå University of Technology, Department of Computer Science, Electrical and Space Engineering, Embedded Intelligent Systems LAB, Luleå, Sweden
- Kanjar De
- Luleå University of Technology, Department of Computer Science, Electrical and Space Engineering, Embedded Intelligent Systems LAB, Luleå, Sweden
- Nosheen Abid
- Luleå University of Technology, Department of Computer Science, Electrical and Space Engineering, Embedded Intelligent Systems LAB, Luleå, Sweden
- Sumit Rakesh
- Luleå University of Technology, Department of Computer Science, Electrical and Space Engineering, Embedded Intelligent Systems LAB, Luleå, Sweden
- Holly Wilson
- University of Bath, Department of Computer Science, Bath, UK
- Marcus Liwicki
- Luleå University of Technology, Department of Computer Science, Electrical and Space Engineering, Embedded Intelligent Systems LAB, Luleå, Sweden
- Johan Eriksson
- Umeå University, Department of Integrative Medical Biology (IMB) and Umeå Center for Functional Brain Imaging (UFBI), Umeå, Sweden
33
Akamatsu K, Nishino T, Miyawaki Y. Spatiotemporal bias of the human gaze toward hierarchical visual features during natural scene viewing. Sci Rep 2023; 13:8104. [PMID: 37202449 DOI: 10.1038/s41598-023-34829-x] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/08/2022] [Accepted: 05/09/2023] [Indexed: 05/20/2023]
Abstract
The human gaze is directed at various locations from moment to moment in acquiring information necessary to recognize the external environment at the fine resolution of foveal vision. Previous studies showed that the human gaze is attracted to particular locations in the visual field at a particular time, but it remains unclear what visual features produce such spatiotemporal bias. In this study, we used a deep convolutional neural network model to extract hierarchical visual features from natural scene images and evaluated how much the human gaze is attracted to the visual features in space and time. Eye movement measurement and visual feature analysis using the deep convolutional neural network model showed that the gaze was more strongly attracted to spatial locations containing higher-order visual features than to locations containing lower-order visual features or to locations predicted by conventional saliency. Analysis of the time course of gaze attraction revealed that the bias to higher-order visual features was prominent within a short period after the beginning of observation of the natural scene images. These results demonstrate that higher-order visual features are a strong gaze attractor in both space and time, suggesting that the human visual system uses foveal vision resources to extract information from higher-order visual features with higher spatiotemporal priority.
Affiliation(s)
- Kazuaki Akamatsu
- Graduate School of Informatics and Engineering, The University of Electro-Communications, 1-5-1 Chofugaoka, Chofu, Tokyo, 182-8585, Japan
- Tomohiro Nishino
- Faculty of Informatics and Engineering, The University of Electro-Communications, 1-5-1 Chofugaoka, Chofu, Tokyo, 182-8585, Japan
- Yoichi Miyawaki
- Graduate School of Informatics and Engineering, The University of Electro-Communications, 1-5-1 Chofugaoka, Chofu, Tokyo, 182-8585, Japan
- Center for Neuroscience and Biomedical Engineering (CNBE), The University of Electro-Communications, 1-5-1 Chofugaoka, Chofu, Tokyo, 182-8585, Japan
34
Watanabe N, Miyoshi K, Jimura K, Shimane D, Keerativittayayut R, Nakahara K, Takeda M. Multimodal deep neural decoding reveals highly resolved spatiotemporal profile of visual object representation in humans. Neuroimage 2023; 275:120164. [PMID: 37169115 DOI: 10.1016/j.neuroimage.2023.120164] [Citation(s) in RCA: 3] [Impact Index Per Article: 3.0] [Received: 10/30/2022] [Revised: 05/02/2023] [Accepted: 05/09/2023] [Indexed: 05/13/2023]
Abstract
Perception and categorization of objects in a visual scene are essential to grasp the surrounding situation. Recently, neural decoding schemes, such as machine learning applied to functional magnetic resonance imaging (fMRI), have been employed to elucidate the underlying neural mechanisms. However, it remains unclear how spatially distributed brain regions temporally represent visual object categories and sub-categories. One promising strategy to address this issue is neural decoding with concurrently obtained neural response data of high spatial and temporal resolution. In this study, we explored the spatial and temporal organization of visual object representations using concurrent fMRI and electroencephalography (EEG), combined with neural decoding using deep neural networks (DNNs). We hypothesized that neural decoding of multimodal neural data with DNNs would show high classification performance in visual object categorization (faces or non-face objects) and in sub-categorization within faces and objects. Visualization of the fMRI DNN was more sensitive than the univariate approach and revealed that visual categorization occurred in brain-wide regions. Interestingly, the EEG DNN relied on the earlier phase of neural responses for categorization and on the later phase for sub-categorization. Combining the two DNNs improved classification performance for both categorization and sub-categorization compared with either the fMRI DNN or the EEG DNN alone. These deep learning-based results demonstrate a categorization principle in which visual objects are represented in a spatially organized and coarse-to-fine manner, and provide strong evidence of the ability of multimodal deep learning to uncover spatiotemporal neural machinery in sensory processing.
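The abstract does not say how the fMRI and EEG networks were combined. One simple possibility is late fusion: average the class probabilities that each network outputs, then take the most probable class. A minimal sketch under that assumption, with a hypothetical fMRI weight `w_fmri`:

```python
import math

def softmax(logits):
    """Convert logits to probabilities (max-subtracted for numerical stability)."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def late_fusion(fmri_logits, eeg_logits, w_fmri=0.5):
    """Weighted average of per-modality class probabilities; returns (predicted index, fused probabilities)."""
    p_fmri = softmax(fmri_logits)
    p_eeg = softmax(eeg_logits)
    fused = [w_fmri * a + (1 - w_fmri) * b for a, b in zip(p_fmri, p_eeg)]
    best = max(range(len(fused)), key=fused.__getitem__)
    return best, fused
```

For a two-class face/non-face problem, a confident fMRI network can outvote a weakly opposed EEG network. Setting `w_fmri` to 0 or 1 recovers the single-modality decoders. Feature-level fusion (concatenating hidden layers before a shared classifier) is an equally plausible alternative.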
Affiliation(s)
- Noriya Watanabe
- Research Center for Brain Communication, Kochi University of Technology, Kami, Kochi, 782-8502, Japan
- Kosuke Miyoshi
- Narrative Nights, Inc., Yokohama, Kanagawa, 236-0011, Japan
- Koji Jimura
- Research Center for Brain Communication, Kochi University of Technology, Kami, Kochi, 782-8502, Japan; Department of Informatics, Gunma University, Maebashi, Gunma, 371-8510, Japan
- Daisuke Shimane
- Research Center for Brain Communication, Kochi University of Technology, Kami, Kochi, 782-8502, Japan
- Ruedeerat Keerativittayayut
- Research Center for Brain Communication, Kochi University of Technology, Kami, Kochi, 782-8502, Japan; Chulabhorn Royal Academy, Bangkok, 10210, Thailand
- Kiyoshi Nakahara
- Research Center for Brain Communication, Kochi University of Technology, Kami, Kochi, 782-8502, Japan
- Masaki Takeda
- Research Center for Brain Communication, Kochi University of Technology, Kami, Kochi, 782-8502, Japan
35
Ho JK, Horikawa T, Majima K, Cheng F, Kamitani Y. Inter-individual deep image reconstruction via hierarchical neural code conversion. Neuroimage 2023; 271:120007. [PMID: 36914105 DOI: 10.1016/j.neuroimage.2023.120007] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/29/2022] [Revised: 02/26/2023] [Accepted: 03/07/2023] [Indexed: 03/13/2023]
Abstract
The sensory cortex is characterized by general organizational principles such as topography and hierarchy. However, measured brain activity given identical input exhibits substantially different patterns across individuals. Although anatomical and functional alignment methods have been proposed in functional magnetic resonance imaging (fMRI) studies, it remains unclear whether and how hierarchical and fine-grained representations can be converted between individuals while preserving the encoded perceptual content. In this study, we trained a functional alignment method called the neural code converter, which predicts a target subject's brain activity pattern from a source subject's pattern given the same stimulus, and analyzed the converted patterns by decoding hierarchical visual features and reconstructing perceived images. The converters were trained on fMRI responses to identical sets of natural images presented to pairs of individuals, using voxels in the visual cortex spanning V1 through the ventral object areas, without explicit labels of the visual areas. We decoded the converted brain activity patterns into the hierarchical visual features of a deep neural network using decoders pre-trained on the target subject, and then reconstructed images from the decoded features. Without explicit information about the visual cortical hierarchy, the converters automatically learned the correspondence between visual areas at the same levels. Deep neural network feature decoding at each layer showed higher decoding accuracies from corresponding levels of visual areas, indicating that hierarchical representations were preserved after conversion. The visual images were reconstructed with recognizable object silhouettes even with relatively small amounts of converter training data. Decoders trained on data pooled from multiple individuals through conversion led to a slight improvement over those trained on a single individual. These results demonstrate that hierarchical and fine-grained representations can be converted by functional alignment while preserving sufficient visual information to enable inter-individual visual image reconstruction.
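A neural code converter maps one subject's voxel pattern to another subject's pattern for the same stimulus. A minimal sketch, assuming a purely linear converter fitted by ridge regression in closed form, W = (XᵀX + λI)⁻¹XᵀY, with X the source patterns and Y the target patterns (rows = stimuli). It also assumes the voxel responses are already centered, so no intercept is fitted. All names and the regularization strength `lam` are hypothetical. The authors' voxel selection, normalization and hyperparameter choices are not reproduced.

```python
def transpose(m):
    return [list(col) for col in zip(*m)]

def matmul(a, b):
    """Dense matrix product of nested lists."""
    bt = transpose(b)
    return [[sum(x * y for x, y in zip(row, col)) for col in bt] for row in a]

def solve(a, b):
    """Solve a @ x = b (b may have several columns) by Gauss-Jordan elimination with partial pivoting."""
    n = len(a)
    aug = [list(a[i]) + list(b[i]) for i in range(n)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(n):
            if r != col:
                f = aug[r][col]
                aug[r] = [v - f * u for v, u in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]

def fit_converter(source, target, lam=1.0):
    """Ridge weights W (source voxels x target voxels) minimizing ||source W - target||^2 + lam ||W||^2."""
    xt = transpose(source)
    xtx = matmul(xt, source)
    for i in range(len(xtx)):
        xtx[i][i] += lam  # ridge penalty on the diagonal
    return solve(xtx, matmul(xt, target))

def convert(source, weights):
    """Predict target-subject patterns (rows = stimuli) from source-subject patterns."""
    return matmul(source, weights)
```

Converted patterns would then be fed to decoders trained on the target subject, as the abstract describes. With real data there are far more voxels than training stimuli, so a practical implementation would use NumPy and the equivalent kernel form, inverting an (n_stimuli x n_stimuli) matrix instead.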
Affiliation(s)
- Jun Kai Ho
- Graduate School of Informatics, Kyoto University, Yoshida-honmachi, Sakyo-ku, Kyoto, 606-8501, Japan
- Tomoyasu Horikawa
- Department of Neuroinformatics, ATR Computational Neuroscience Laboratories, Hikaridai, Seika, Soraku, Kyoto, 619-0288, Japan
- Kei Majima
- Graduate School of Informatics, Kyoto University, Yoshida-honmachi, Sakyo-ku, Kyoto, 606-8501, Japan
- Fan Cheng
- Graduate School of Informatics, Kyoto University, Yoshida-honmachi, Sakyo-ku, Kyoto, 606-8501, Japan; Department of Neuroinformatics, ATR Computational Neuroscience Laboratories, Hikaridai, Seika, Soraku, Kyoto, 619-0288, Japan
- Yukiyasu Kamitani
- Graduate School of Informatics, Kyoto University, Yoshida-honmachi, Sakyo-ku, Kyoto, 606-8501, Japan; Department of Neuroinformatics, ATR Computational Neuroscience Laboratories, Hikaridai, Seika, Soraku, Kyoto, 619-0288, Japan
| |
Collapse
|
36
|
Nakai T, Nishimoto S. Artificial neural network modelling of the neural population code underlying mathematical operations. Neuroimage 2023; 270:119980. [PMID: 36848969 DOI: 10.1016/j.neuroimage.2023.119980] [Citation(s) in RCA: 3] [Impact Index Per Article: 3.0] [Received: 11/05/2022] [Revised: 02/10/2023] [Accepted: 02/23/2023] [Indexed: 02/28/2023]
Abstract
Mathematical operations have long been regarded as a sparse, symbolic process in neuroimaging studies. In contrast, advances in artificial neural networks (ANNs) have enabled the extraction of distributed representations of mathematical operations. Recent neuroimaging studies have compared distributed representations in the visual, auditory and language domains between ANNs and biological neural networks (BNNs). However, such a relationship has not yet been examined in mathematics. Here we hypothesise that ANN-based distributed representations can explain brain activity patterns of symbolic mathematical operations. We used fMRI data collected during a series of mathematical problems with nine different combinations of operators to construct voxel-wise encoding/decoding models using both sparse operator features and latent ANN features. Representational similarity analysis demonstrated shared representations between ANN and BNN, an effect particularly evident in the intraparietal sulcus. Feature-brain similarity (FBS) analysis served to reconstruct a sparse representation of mathematical operations from distributed ANN features in each cortical voxel. This reconstruction was more efficient when using features from deeper ANN layers. Moreover, latent ANN features allowed novel operators, not used during model training, to be decoded from brain activity. The current study provides novel insights into the neural code underlying mathematical thought.
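Representational similarity analysis compares two systems through their representational dissimilarity matrices (RDMs) rather than their raw features. The sketch below uses a common textbook variant (correlation-distance RDMs compared by Spearman correlation), assumed here rather than taken from the study; the function names are hypothetical.

```python
import numpy as np


def rdm(patterns):
    """Representational dissimilarity matrix (1 - Pearson r between conditions).

    patterns: (n_conditions, n_features), e.g. ANN activations or voxel responses
    """
    return 1.0 - np.corrcoef(patterns)


def _ranks(values):
    """Rank-transform a 1-D array (ties are not averaged)."""
    ranks = np.empty(len(values))
    ranks[np.argsort(values)] = np.arange(len(values))
    return ranks


def rsa_score(patterns_a, patterns_b):
    """Spearman correlation between the off-diagonal upper triangles of two RDMs."""
    rdm_a, rdm_b = rdm(patterns_a), rdm(patterns_b)
    iu = np.triu_indices_from(rdm_a, k=1)
    # Pearson correlation of ranks equals Spearman correlation when there are no ties
    return np.corrcoef(_ranks(rdm_a[iu]), _ranks(rdm_b[iu]))[0, 1]
```

ANN-layer activations and voxel patterns for the same set of operator conditions could be passed as `patterns_a` and `patterns_b`; because only the RDMs are compared, the two need not share a feature dimension.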
Affiliation(s)
- Tomoya Nakai
- Center for Information and Neural Networks, National Institute of Information and Communications Technology, Suita, Japan; Lyon Neuroscience Research Center (CRNL), INSERM U1028 - CNRS UMR5292, University of Lyon, Bron, France.
- Shinji Nishimoto
- Center for Information and Neural Networks, National Institute of Information and Communications Technology, Suita, Japan; Graduate School of Frontier Biosciences, Osaka University, Suita, Japan; Graduate School of Medicine, Osaka University, Suita, Japan
37
Hebart MN, Contier O, Teichmann L, Rockter AH, Zheng CY, Kidder A, Corriveau A, Vaziri-Pashkam M, Baker CI. THINGS-data, a multimodal collection of large-scale datasets for investigating object representations in human brain and behavior. eLife 2023; 12:e82580. [PMID: 36847339 PMCID: PMC10038662 DOI: 10.7554/elife.82580] [Citation(s) in RCA: 11] [Impact Index Per Article: 11.0] [Received: 08/09/2022] [Accepted: 02/25/2023] [Indexed: 03/01/2023]
Abstract
Understanding object representations requires a broad, comprehensive sampling of the objects in our visual world with dense measurements of brain activity and behavior. Here, we present THINGS-data, a multimodal collection of large-scale neuroimaging and behavioral datasets in humans, comprising densely sampled functional MRI and magnetoencephalographic recordings, as well as 4.70 million similarity judgments in response to thousands of photographic images for up to 1,854 object concepts. THINGS-data is unique in its breadth of richly annotated objects, allowing for testing countless hypotheses at scale while assessing the reproducibility of previous findings. Beyond the unique insights promised by each individual dataset, the multimodality of THINGS-data allows combining datasets for a much broader view into object processing than previously possible. Our analyses demonstrate the high quality of the datasets and provide five examples of hypothesis-driven and data-driven applications. THINGS-data constitutes the core public release of the THINGS initiative (https://things-initiative.org) for bridging the gap between disciplines and the advancement of cognitive neuroscience.
Affiliation(s)
- Martin N Hebart
- Laboratory of Brain and Cognition, National Institute of Mental Health, National Institutes of Health, Bethesda, United States
- Vision and Computational Cognition Group, Max Planck Institute for Human Cognitive and Brain Sciences, Leipzig, Germany
- Department of Medicine, Justus Liebig University Giessen, Giessen, Germany
- Oliver Contier
- Vision and Computational Cognition Group, Max Planck Institute for Human Cognitive and Brain Sciences, Leipzig, Germany
- Max Planck School of Cognition, Max Planck Institute for Human Cognitive and Brain Sciences, Leipzig, Germany
- Lina Teichmann
- Laboratory of Brain and Cognition, National Institute of Mental Health, National Institutes of Health, Bethesda, United States
- Adam H Rockter
- Laboratory of Brain and Cognition, National Institute of Mental Health, National Institutes of Health, Bethesda, United States
- Charles Y Zheng
- Machine Learning Core, National Institute of Mental Health, National Institutes of Health, Bethesda, United States
- Alexis Kidder
- Laboratory of Brain and Cognition, National Institute of Mental Health, National Institutes of Health, Bethesda, United States
- Anna Corriveau
- Laboratory of Brain and Cognition, National Institute of Mental Health, National Institutes of Health, Bethesda, United States
- Maryam Vaziri-Pashkam
- Laboratory of Brain and Cognition, National Institute of Mental Health, National Institutes of Health, Bethesda, United States
- Chris I Baker
- Laboratory of Brain and Cognition, National Institute of Mental Health, National Institutes of Health, Bethesda, United States
38
Olson JA, Cyr M, Artenie DZ, Strandberg T, Hall L, Tompkins ML, Raz A, Johansson P. Emulating future neurotechnology using magic. Conscious Cogn 2023; 107:103450. [PMID: 36566673 DOI: 10.1016/j.concog.2022.103450] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/16/2022] [Revised: 11/24/2022] [Accepted: 11/25/2022] [Indexed: 12/24/2022]
Abstract
Recent developments in neuroscience and artificial intelligence have allowed machines to decode mental processes with growing accuracy. Neuroethicists have speculated that perfecting these technologies may result in reactions ranging from an invasion of privacy to an increase in self-understanding. Yet, evaluating these predictions is difficult given that people are poor at forecasting their reactions. To address this, we developed a paradigm using elements of performance magic to emulate future neurotechnologies. We led 59 participants to believe that a (sham) neurotechnological machine could infer their preferences, detect their errors, and reveal their deep-seated attitudes. The machine gave participants randomly assigned positive or negative feedback about their brain's supposed attitudes towards charity. Around 80% of participants in both groups provided rationalisations for this feedback, which shifted their attitudes in the manipulated direction but did not influence donation behaviour. Our paradigm reveals how people may respond to prospective neurotechnologies, which may inform neuroethical frameworks.
Affiliation(s)
- Jay A Olson
- Department of Psychology, McGill University, 2001 McGill College Ave., Montreal, QC H3A 1G1, Canada.
- Mariève Cyr
- Faculty of Medicine and Health Sciences, McGill University, 3605 De la Montagne St., Montreal, QC H3G 2M1, Canada
- Despina Z Artenie
- Department of Psychology, Université du Québec à Montréal, 100 Sherbrooke St. W., Montreal, QC H2X 3P2, Canada
- Thomas Strandberg
- Lund University Cognitive Science, Lund University, Box 192, S-221 00, Lund, Sweden
- Lars Hall
- Lund University Cognitive Science, Lund University, Box 192, S-221 00, Lund, Sweden
- Matthew L Tompkins
- Lund University Cognitive Science, Lund University, Box 192, S-221 00, Lund, Sweden
- Amir Raz
- Institute for Interdisciplinary Behavioral and Brain Sciences, Chapman University, 9401 Jeronimo Road, Irvine, CA 92618, USA
- Petter Johansson
- Lund University Cognitive Science, Lund University, Box 192, S-221 00, Lund, Sweden.
39
Gifford AT, Dwivedi K, Roig G, Cichy RM. A large and rich EEG dataset for modeling human visual object recognition. Neuroimage 2022; 264:119754. [PMID: 36400378 PMCID: PMC9771828 DOI: 10.1016/j.neuroimage.2022.119754] [Citation(s) in RCA: 9] [Impact Index Per Article: 4.5] [Received: 04/08/2022] [Revised: 09/14/2022] [Accepted: 11/14/2022] [Indexed: 11/16/2022]
Abstract
The human brain achieves visual object recognition through multiple stages of linear and nonlinear transformations operating at a millisecond scale. To predict and explain these rapid transformations, computational neuroscientists employ machine learning modeling techniques. However, state-of-the-art models require massive amounts of data to train properly, and to date there is a lack of vast brain datasets that extensively sample the temporal dynamics of visual object recognition. Here we collected a large and rich dataset of high temporal resolution EEG responses to images of objects on a natural background. This dataset includes 10 participants, each with 82,160 trials spanning 16,740 image conditions. Through computational modeling we established the quality of this dataset in five ways. First, we trained linearizing encoding models that successfully synthesized the EEG responses to arbitrary images. Second, we correctly identified the image conditions of the recorded EEG data in a zero-shot fashion, using synthesized EEG responses to hundreds of thousands of candidate image conditions. Third, we showed that both the high number of conditions and the trial repetitions of the EEG dataset contribute to the trained models' prediction accuracy. Fourth, we built encoding models whose predictions generalize well to novel participants. Fifth, we demonstrated full end-to-end training of randomly initialized DNNs that output EEG responses for arbitrary input images. We release this dataset as a tool to foster research in visual neuroscience and computer vision.
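Zero-shot identification, as described in the second point, can be sketched as a nearest-neighbour search: correlate each recorded response with the encoding model's synthesized response for every candidate image and pick the best match. Pearson correlation as the similarity measure and the function name are assumptions for illustration; the candidates need not have been seen during training, since their responses come from the encoding model.

```python
import numpy as np


def zero_shot_identify(recorded, synthesized):
    """Identify which candidate image produced each recorded response.

    recorded:    (n_trials, n_features) measured responses (e.g. channels x time, flattened)
    synthesized: (n_candidates, n_features) encoding-model predictions for candidates
    Returns the best-matching candidate index per trial and the full correlation matrix.
    """
    def zscore_rows(a):
        return (a - a.mean(axis=1, keepdims=True)) / a.std(axis=1, keepdims=True)

    # The mean product of row-wise z-scores equals the Pearson correlation
    r = zscore_rows(recorded) @ zscore_rows(synthesized).T / recorded.shape[1]
    return r.argmax(axis=1), r
```

Identification accuracy is then the fraction of trials whose argmax matches the true condition; it naturally drops as the candidate set grows, which is why testing against very large candidate sets is a demanding check.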
Affiliation(s)
- Alessandro T Gifford
- Department of Education and Psychology, Freie Universität Berlin, Berlin, Germany; Einstein Center for Neurosciences Berlin, Charité - Universitätsmedizin Berlin, Berlin, Germany; Bernstein Center for Computational Neuroscience Berlin, Berlin, Germany.
- Kshitij Dwivedi
- Department of Computer Science, Goethe Universität, Frankfurt am Main, Germany
- Gemma Roig
- Department of Computer Science, Goethe Universität, Frankfurt am Main, Germany
- Radoslaw M Cichy
- Department of Education and Psychology, Freie Universität Berlin, Berlin, Germany; Einstein Center for Neurosciences Berlin, Charité - Universitätsmedizin Berlin, Berlin, Germany; Bernstein Center for Computational Neuroscience Berlin, Berlin, Germany; Berlin School of Mind and Brain, Humboldt-Universität zu Berlin, Berlin, Germany
40
Ma S, Wang L, Chen P, Qin R, Hou L, Yan B. A Mixed Visual Encoding Model Based on the Larger-Scale Receptive Field for Human Brain Activity. Brain Sci 2022; 12:brainsci12121633. [PMID: 36552093 PMCID: PMC9775903 DOI: 10.3390/brainsci12121633] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/30/2022] [Revised: 11/18/2022] [Accepted: 11/28/2022] [Indexed: 11/30/2022]
Abstract
Visual encoding models for functional magnetic resonance imaging based on deep neural networks, especially CNNs (e.g., VGG16), have been developed. However, CNNs in visual encoding models typically use small kernels (e.g., 3 × 3) for feature extraction. Although the receptive field size of a CNN can be enlarged by increasing the network depth or by subsampling, it is limited by the small size of the convolution kernel, leading to an insufficient receptive field size. In biological research, the population receptive field size of neurons in high-level visual regions is usually three to four times that of low-level visual regions. Thus, CNNs with larger receptive fields align better with these biological findings. The RepLKNet model directly expands the convolution kernel size to obtain a larger-scale receptive field. Therefore, this paper proposes a mixed model to replace the CNN for feature extraction in visual encoding models. The proposed model mixes RepLKNet and VGG so that it has receptive fields of different sizes and can extract more feature information from the image. The experimental results indicate that the mixed model achieves better encoding performance in multiple regions of the visual cortex than the traditional convolutional model. The results also suggest that a larger-scale receptive field should be considered when building visual encoding models, so that the convolutional network can play a more significant role in visual representations.
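The receptive-field argument above can be made concrete with the standard recurrence for the theoretical receptive field of stacked layers: each layer with kernel size k widens the field by (k - 1) times the current spacing between adjacent units (in input pixels), and each stride s multiplies that spacing by s. The helpers below are hypothetical illustrations; the 31 × 31 kernel matches the size explored in the RepLKNet work but is used here only for comparison, and the theoretical value is an upper bound (the effective receptive field is typically smaller).

```python
def receptive_field(layers):
    """Theoretical receptive field, in input pixels, of stacked conv/pool layers.

    layers: sequence of (kernel_size, stride) tuples, applied in order.
    """
    rf, jump = 1, 1  # one input pixel; jump = input-pixel spacing of adjacent units
    for kernel, stride in layers:
        rf += (kernel - 1) * jump
        jump *= stride
    return rf


def vgg16_layers():
    """Conv/pool geometry of the VGG16 feature extractor (padding does not change RF size)."""
    layers = []
    for n_convs in (2, 2, 3, 3, 3):
        layers += [(3, 1)] * n_convs  # 3x3 convolutions, stride 1
        layers.append((2, 2))         # 2x2 max pooling, stride 2
    return layers


print(receptive_field([(3, 1)] * 3))    # 7: three stacked 3x3 convolutions
print(receptive_field([(31, 1)]))       # 31: a single 31x31 convolution
print(receptive_field(vgg16_layers()))  # 212: after VGG16's final pooling layer
```

A single large kernel reaches in one layer what small kernels need many layers (or strided subsampling) to reach, which is the motivation the abstract gives for mixing RepLKNet and VGG features.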
41
Prince JS, Charest I, Kurzawski JW, Pyles JA, Tarr MJ, Kay KN. Improving the accuracy of single-trial fMRI response estimates using GLMsingle. eLife 2022; 11:e77599. [PMID: 36444984 PMCID: PMC9708069 DOI: 10.7554/elife.77599] [Citation(s) in RCA: 16] [Impact Index Per Article: 8.0] [Received: 02/04/2022] [Accepted: 10/15/2022] [Indexed: 11/30/2022]
Abstract
Advances in artificial intelligence have inspired a paradigm shift in human neuroscience, yielding large-scale functional magnetic resonance imaging (fMRI) datasets that provide high-resolution brain responses to thousands of naturalistic visual stimuli. Because such experiments necessarily involve brief stimulus durations and few repetitions of each stimulus, achieving sufficient signal-to-noise ratio can be a major challenge. We address this challenge by introducing GLMsingle, a scalable, user-friendly toolbox available in MATLAB and Python that enables accurate estimation of single-trial fMRI responses (glmsingle.org). Requiring only fMRI time-series data and a design matrix as inputs, GLMsingle integrates three techniques for improving the accuracy of trial-wise general linear model (GLM) beta estimates. First, for each voxel, a custom hemodynamic response function (HRF) is identified from a library of candidate functions. Second, cross-validation is used to derive a set of noise regressors from voxels unrelated to the experiment. Third, to improve the stability of beta estimates for closely spaced trials, betas are regularized on a voxel-wise basis using ridge regression. Applying GLMsingle to the Natural Scenes Dataset and BOLD5000, we find that GLMsingle substantially improves the reliability of beta estimates across visually-responsive cortex in all subjects. Comparable improvements in reliability are also observed in a smaller-scale auditory dataset from the StudyForrest experiment. These improvements translate into tangible benefits for higher-level analyses relevant to systems and cognitive neuroscience. We demonstrate that GLMsingle: (i) helps decorrelate response estimates between trials nearby in time; (ii) enhances representational similarity between subjects within and across datasets; and (iii) boosts one-versus-many decoding of visual stimuli. 
GLMsingle is a publicly available tool that can significantly improve the quality of past, present, and future neuroimaging datasets sampling brain activity across many experimental conditions.
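Voxel-wise ridge regularization of single-trial betas, the third GLMsingle component described above, can be sketched as follows. This is not GLMsingle's implementation: the abstract does not say how each voxel's penalty is chosen, so this NumPy sketch substitutes generalized cross-validation (GCV) as an assumption, and `voxelwise_ridge_gcv` and the penalty grid are hypothetical. It also omits the HRF-library and noise-regressor steps.

```python
import numpy as np


def voxelwise_ridge_gcv(X, Y, lambdas):
    """Ridge-regress each voxel's time series on a single-trial design matrix,
    choosing the penalty separately per voxel by generalized cross-validation.

    X: (n_timepoints, n_trials) design matrix, one regressor per trial
    Y: (n_timepoints, n_voxels) voxel time series
    lambdas: iterable of candidate ridge penalties (> 0)
    Returns betas (n_trials, n_voxels) and the chosen penalty for each voxel.
    """
    n = X.shape[0]
    # One SVD of the shared design matrix serves every voxel and every penalty
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    UtY = U.T @ Y
    best_gcv = np.full(Y.shape[1], np.inf)
    best_lam = np.zeros(Y.shape[1])
    betas = np.zeros((X.shape[1], Y.shape[1]))
    for lam in lambdas:
        shrink = s**2 / (s**2 + lam)           # shrinkage of each singular component
        fitted = U @ (shrink[:, None] * UtY)
        rss = ((Y - fitted) ** 2).sum(axis=0)  # residual sum of squares per voxel
        dof = shrink.sum()                     # effective parameters (trace of hat matrix)
        gcv = n * rss / (n - dof) ** 2
        better = gcv < best_gcv
        best_gcv[better] = gcv[better]
        best_lam[better] = lam
        # Ridge solution (X^T X + lam I)^-1 X^T y expressed through the SVD
        b = Vt.T @ ((s / (s**2 + lam))[:, None] * UtY)
        betas[:, better] = b[:, better]
    return betas, best_lam
```

Because the penalty is chosen per voxel, noisier voxels tend to receive stronger shrinkage than clean ones, rather than one penalty being imposed on the whole brain.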
Affiliation(s)
- Jacob S Prince
- Department of Psychology, Harvard University, Cambridge, United States
- Ian Charest
- Center for Human Brain Health, School of Psychology, University of Birmingham, Birmingham, United Kingdom; cerebrUM, Département de Psychologie, Université de Montréal, Montréal, Canada
- Jan W Kurzawski
- Department of Psychology, New York University, New York, United States
- John A Pyles
- Center for Human Neuroscience, Department of Psychology, University of Washington, Seattle, United States
- Michael J Tarr
- Department of Psychology, Neuroscience Institute, Carnegie Mellon University, Pittsburgh, United States
- Kendrick N Kay
- Center for Magnetic Resonance Research (CMRR), Department of Radiology, University of Minnesota, Minneapolis, United States
42
Le L, Ambrogioni L, Seeliger K, Güçlütürk Y, van Gerven M, Güçlü U. Brain2Pix: Fully convolutional naturalistic video frame reconstruction from brain activity. Front Neurosci 2022; 16:940972. [PMID: 36452333 PMCID: PMC9703977 DOI: 10.3389/fnins.2022.940972] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 05/10/2022] [Accepted: 09/09/2022] [Indexed: 09/10/2024]
Abstract
Reconstructing complex and dynamic visual perception from brain activity remains a major challenge in machine learning applications to neuroscience. Here, we present a new method for reconstructing naturalistic images and videos from very large single-participant functional magnetic resonance imaging data that leverages the recent success of image-to-image transformation networks. This is achieved by exploiting spatial information obtained from retinotopic mappings across the visual system. More specifically, we first determine what position each voxel in a particular region of interest would represent in the visual field based on its corresponding receptive field location. Then, the 2D image representation of the brain activity on the visual field is passed to a fully convolutional image-to-image network trained to recover the original stimuli using VGG feature loss with an adversarial regularizer. In our experiments, we show that our method offers a significant improvement over existing video reconstruction techniques.
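The retinotopic projection step described above (placing each voxel at its receptive-field location in the visual field) can be sketched as a simple binning operation. Nearest-pixel binning with averaging, the degree-to-pixel convention, the function name, and the `size`/`extent` defaults are assumptions, not details from Brain2Pix; the resulting image is what would then be fed to an image-to-image network.

```python
import numpy as np


def voxels_to_image(activity, rf_x, rf_y, size=64, extent=8.0):
    """Project voxel activity onto a 2D visual-field grid by receptive-field location.

    activity:   (n_voxels,) response of each voxel
    rf_x, rf_y: (n_voxels,) receptive-field centres in degrees of visual angle,
                (0, 0) at fixation, +y pointing up
    size:       output image is size x size pixels (hypothetical default)
    extent:     half-width of the mapped visual field in degrees (hypothetical default)
    Voxels that land in the same pixel are averaged; empty pixels stay 0.
    """
    # Convert degrees to pixel indices; row 0 is the top of the visual field
    col = np.clip(((rf_x + extent) / (2 * extent) * size).astype(int), 0, size - 1)
    row = np.clip(((extent - rf_y) / (2 * extent) * size).astype(int), 0, size - 1)
    total = np.zeros((size, size))
    count = np.zeros((size, size))
    np.add.at(total, (row, col), activity)  # unbuffered, so repeated pixels accumulate
    np.add.at(count, (row, col), 1)
    return np.divide(total, count, out=np.zeros_like(total), where=count > 0)
```

Using one such image per region of interest, stacked as channels, would keep the visual-field layout that a fully convolutional network can exploit.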
Affiliation(s)
- Lynn Le
- Donders Institute for Brain, Cognition and Behaviour, Radboud University, Nijmegen, Netherlands
- Luca Ambrogioni
- Donders Institute for Brain, Cognition and Behaviour, Radboud University, Nijmegen, Netherlands
- Katja Seeliger
- Max Planck Institute for Human Cognitive and Brain Sciences, Leipzig, Germany
- Yağmur Güçlütürk
- Donders Institute for Brain, Cognition and Behaviour, Radboud University, Nijmegen, Netherlands
- Marcel van Gerven
- Donders Institute for Brain, Cognition and Behaviour, Radboud University, Nijmegen, Netherlands
- Umut Güçlü
- Donders Institute for Brain, Cognition and Behaviour, Radboud University, Nijmegen, Netherlands
43
Kamran M, Rehman SU, Meraj T, Alnowibet KA, Rauf HT. Camouflage Object Segmentation Using an Optimized Deep-Learning Approach. MATHEMATICS 2022; 10:4219. [DOI: 10.3390/math10224219] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 09/01/2023]
Abstract
Camouflaged objects conceal themselves physically by matching the texture or boundary lines of their background. Because they match and resemble their surroundings in texture, camouflaged objects are harder to distinguish than generic or salient objects, which makes camouflaged object detection (COD) more challenging. Existing techniques perform well; however, the challenging nature of camouflaged objects demands greater accuracy in detection and segmentation. To overcome this challenge, an optimized modular framework for COD tasks, named Optimize Global Refinement (OGR), is presented. The framework combines parallel feature extraction, which enhances the learned parameters, with globally refined feature maps that abstract all intuitive feature sets at the output of each extraction block. Additionally, an optimized rule based on local best feature nodes is proposed to reduce the complexity of the model. Following the baseline experiments, OGR was applied and evaluated on benchmark datasets, outperforming existing methods with state-of-the-art structural similarity of 94%, 93%, and 96% on the publicly available Kvasir-SEG, COD10K, and Camouflaged Object (CAMO) datasets, respectively. OGR generalizes and can be integrated into real-time applications in future development.
44
Peters MA. Towards characterizing the canonical computations generating phenomenal experience. Neurosci Biobehav Rev 2022; 142:104903. [DOI: 10.1016/j.neubiorev.2022.104903] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/28/2022] [Revised: 09/27/2022] [Accepted: 10/01/2022] [Indexed: 10/31/2022]
45
Favila SE, Kuhl BA, Winawer J. Perception and memory have distinct spatial tuning properties in human visual cortex. Nat Commun 2022; 13:5864. [PMID: 36257949 PMCID: PMC9579130 DOI: 10.1038/s41467-022-33161-8] [Citation(s) in RCA: 16] [Impact Index Per Article: 8.0] [Received: 05/03/2020] [Accepted: 09/06/2022] [Indexed: 11/12/2022]
Abstract
Reactivation of earlier perceptual activity is thought to underlie long-term memory recall. Despite evidence for this view, it is unclear whether mnemonic activity exhibits the same tuning properties as feedforward perceptual activity. Here, we leverage population receptive field models to parameterize fMRI activity in human visual cortex during spatial memory retrieval. Though retinotopic organization is present during both perception and memory, large systematic differences in tuning are also evident. Whereas there is a three-fold decline in spatial precision from early to late visual areas during perception, this pattern is not observed during memory retrieval. This difference cannot be explained by reduced signal-to-noise or poor performance on memory trials. Instead, by simulating top-down activity in a network model of cortex, we demonstrate that this property is well explained by the hierarchical structure of the visual system. Together, modeling and empirical results suggest that computational constraints imposed by visual system architecture limit the fidelity of memory reactivation in sensory cortex.
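Population receptive field (pRF) modeling, used above to parameterize voxel activity, is commonly done with a 2D isotropic Gaussian whose centre and width are found by grid search. The sketch below assumes that standard form and omits the hemodynamic convolution a real fMRI fit would include; all function names are hypothetical. In this parameterization, the fitted `sigma` is the parameter that indexes spatial tuning width.

```python
import numpy as np


def gaussian_prf(x0, y0, sigma, grid_x, grid_y):
    """2D isotropic Gaussian receptive field evaluated on the stimulus grid."""
    return np.exp(-((grid_x - x0) ** 2 + (grid_y - y0) ** 2) / (2 * sigma**2))


def predict(apertures, x0, y0, sigma, grid_x, grid_y):
    """Predicted response: overlap of each stimulus aperture with the pRF.

    apertures: (n_timepoints, H, W) stimulus masks on the same grid as grid_x/grid_y
    """
    g = gaussian_prf(x0, y0, sigma, grid_x, grid_y)
    return apertures.reshape(len(apertures), -1) @ g.ravel()


def grid_fit(response, apertures, grid_x, grid_y, centers, sigmas):
    """Exhaustive search for the (x0, y0, sigma) whose prediction best correlates with the data."""
    best, best_r = None, -np.inf
    for x0 in centers:
        for y0 in centers:
            for sigma in sigmas:
                p = predict(apertures, x0, y0, sigma, grid_x, grid_y)
                if p.std() == 0:  # correlation undefined for a flat prediction
                    continue
                r = np.corrcoef(p, response)[0, 1]
                if r > best_r:
                    best, best_r = (x0, y0, sigma), r
    return best, best_r
```

The same fitting procedure can be applied to perception and memory-retrieval data separately, so that tuning parameters such as `sigma` can be compared across conditions and visual areas.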
Affiliation(s)
- Serra E Favila
- Department of Psychology, New York University, New York, NY, 10003, USA.
- Department of Psychology, Columbia University, New York, NY, 10027, USA.
- Brice A Kuhl
- Department of Psychology, University of Oregon, Eugene, OR, 97403, USA
- Institute of Neuroscience, University of Oregon, Eugene, OR, 97403, USA
- Jonathan Winawer
- Department of Psychology, New York University, New York, NY, 10003, USA
- Center for Neural Science, New York University, New York, NY, 10003, USA
46
Meng L, Ge K. Decoding Visual fMRI Stimuli from Human Brain Based on Graph Convolutional Neural Network. Brain Sci 2022; 12:brainsci12101394. [PMID: 36291327 PMCID: PMC9599823 DOI: 10.3390/brainsci12101394] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/28/2022] [Revised: 10/12/2022] [Accepted: 10/14/2022] [Indexed: 11/21/2022]
Abstract
Brain decoding aims to predict external stimulus information from recorded brain activity, and visual information is one of the most important sources of external stimulus information. Decoding functional magnetic resonance imaging (fMRI) responses to visual stimulation helps in understanding how the brain's visual functional regions work. Traditional brain decoding algorithms cannot accurately extract stimulus features from fMRI. To address these shortcomings, this paper proposed a brain decoding algorithm based on a graph convolutional network (GCN). First, 11 regions of interest (ROIs) were selected according to the visual functional regions of the human brain, which avoids noise interference from non-visual regions; then, a deep three-dimensional convolutional neural network was specially designed to extract the features of these 11 regions; next, the GCN was used to extract functional correlation features between the different visual regions. Furthermore, to avoid the vanishing-gradient problem when the graph convolutional network has too many layers, residual connections were adopted, which helped integrate different levels of features and improve the accuracy of the proposed GCN. The proposed algorithm was tested on a public dataset, and its recognition accuracy reached 98.67%. Compared with other state-of-the-art algorithms, the proposed algorithm performed best.
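A graph convolution with a residual connection, as described above, can be sketched in NumPy with one node per ROI. The abstract does not give the layer definition; this sketch assumes the widely used symmetric-normalized GCN propagation rule (Kipf and Welling) and an identity skip connection, and the function names are hypothetical.

```python
import numpy as np


def normalize_adjacency(A):
    """Symmetric normalization with self-loops: D^-1/2 (A + I) D^-1/2."""
    A_hat = A + np.eye(A.shape[0])
    d_inv_sqrt = 1.0 / np.sqrt(A_hat.sum(axis=1))
    return A_hat * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]


def residual_gcn_layer(H, A_norm, W):
    """One graph-convolution layer (ReLU) with an identity skip connection.

    H:      (n_nodes, d) node features, e.g. one node per visual ROI
    A_norm: (n_nodes, n_nodes) normalized adjacency between ROIs
    W:      (d, d) layer weights; square so that the skip connection shapes match
    """
    return np.maximum(A_norm @ H @ W, 0.0) + H
```

Because each layer adds its input back, the gradient has a direct path through the `+ H` term, which is the usual reason residual connections ease training of deeper graph networks.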
47
Zhang YJ, Yu ZF, Liu JK, Huang TJ. Neural Decoding of Visual Information Across Different Neural Recording Modalities and Approaches. MACHINE INTELLIGENCE RESEARCH 2022. [PMCID: PMC9283560 DOI: 10.1007/s11633-022-1335-2] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/21/2022]
Abstract
Vision plays a special role in intelligence. Visual information, which forms a large part of sensory information, is fed into the human brain to formulate various types of cognition and behaviour that make humans intelligent agents. Recent advances have led to the development of brain-inspired algorithms and models for machine vision. One of the key components of these methods is the use of the computational principles underlying biological neurons. Additionally, advanced experimental neuroscience techniques have generated different types of neural signals that carry essential visual information. Thus, there is high demand for functional models that read out visual information from neural signals. Here, we briefly review recent progress on this issue, focusing on how machine learning techniques can help develop models for handling various types of neural signals, from fine-scale neural spikes and single-cell calcium imaging to coarse-scale electroencephalography (EEG) and functional magnetic resonance imaging recordings of brain signals.
48
Shimizu H, Srinivasan R. Improving classification and reconstruction of imagined images from EEG signals. PLoS One 2022; 17:e0274847. [PMID: 36129927 PMCID: PMC9491577 DOI: 10.1371/journal.pone.0274847] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/27/2022] [Accepted: 09/05/2022] [Indexed: 11/19/2022]
Abstract
Decoding brain activity related to specific tasks, such as imagining something, is important for brain-computer interface (BCI) control. Decoding of brain signals, such as functional magnetic resonance imaging (fMRI) and electroencephalography (EEG) signals, recorded while observing visual images and while imagining images has been reported previously; the goal of this study was to further develop methods that improve the training, performance, and interpretation of brain data. We applied a Sinc-EEGNet to decode brain activity during perception and imagination of visual stimuli, and added an attention module to extract the importance of each electrode or frequency band. We also reconstructed images from brain activity using a generative adversarial network (GAN). By combining the EEG recorded during a visual task (perception) and an imagination task, we boosted the accuracy of classifying EEG data in the imagination task and improved the quality of reconstruction by the GAN. Our results indicate that brain activity evoked during the visual task is present in the imagination task and can be used for better classification of the imagined image. Using the attention module, we can derive the spatial weights in each frequency band from our model and contrast spatial or frequency importance between tasks. Imagination tasks are classified by low-frequency EEG signals over temporal cortex, while perception tasks are classified by high-frequency EEG signals over occipital and frontal cortex. Combining data sets in training results in a balanced model that improves classification of the imagination task without significantly changing performance in the visual task.
Our approach not only improves performance and interpretability but also potentially reduces the training burden, since the accuracy of classifying a relatively hard task with high variability (imagination) can be improved by combining it with data from a relatively easy task (observing visual images).
Affiliation(s)
- Hirokatsu Shimizu
- Department of Cognitive Sciences, University of California, Irvine, CA, United States of America
- Ramesh Srinivasan
- Department of Cognitive Sciences, University of California, Irvine, CA, United States of America
- Department of Biomedical Engineering, University of California, Irvine, CA, United States of America
49
van Dyck LE, Denzler SJ, Gruber WR. Guiding visual attention in deep convolutional neural networks based on human eye movements. Front Neurosci 2022; 16:975639. [PMID: 36177359 PMCID: PMC9514055 DOI: 10.3389/fnins.2022.975639] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/22/2022] [Accepted: 08/25/2022] [Indexed: 11/13/2022]
Abstract
Deep Convolutional Neural Networks (DCNNs) were originally inspired by principles of biological vision, have evolved into the best current computational models of object recognition, and consequently show strong architectural and functional parallels with the ventral visual pathway in comparisons with neuroimaging and neural time series data. As recent advances in deep learning seem to decrease this similarity, computational neuroscience is challenged to reverse-engineer biological plausibility to obtain useful models. While previous studies have shown that biologically inspired architectures can amplify the human-likeness of the models, in this study we investigate a purely data-driven approach. We use human eye tracking data to directly modify training examples and thereby guide the models’ visual attention during object recognition in natural images either toward or away from the focus of human fixations. We compare and validate different manipulation types (i.e., standard, human-like, and non-human-like attention) by comparing GradCAM saliency maps with human participants' eye tracking data. Our results demonstrate that the proposed guided focus manipulation works as intended in the negative direction, with non-human-like models focusing on image parts significantly different from those fixated by humans. The observed effects were highly category-specific, enhanced by animacy and face presence, developed only after feedforward processing was completed, and indicated a strong influence on face detection. With this approach, however, no significant increase in human-likeness was found. Possible applications of overt visual attention in DCNNs and further implications for theories of face detection are discussed.
Affiliation(s)
- Leonard Elia van Dyck
- Department of Psychology, University of Salzburg, Salzburg, Austria
- Centre for Cognitive Neuroscience, University of Salzburg, Salzburg, Austria
- *Correspondence: Leonard Elia van Dyck,
- Walter Roland Gruber
- Department of Psychology, University of Salzburg, Salzburg, Austria
- Centre for Cognitive Neuroscience, University of Salzburg, Salzburg, Austria
50
Higashi T, Maeda K, Ogawa T, Haseyama M. Brain Decoding of Multiple Subjects for Estimating Visual Information Based on a Probabilistic Generative Model. Sensors (Basel) 2022; 22:6148. [PMID: 36015909 PMCID: PMC9416613 DOI: 10.3390/s22166148] [Received: 07/18/2022] [Revised: 08/10/2022] [Accepted: 08/14/2022] [Indexed: 06/15/2023]
Abstract
Brain decoding is the process of decoding human cognitive contents from brain activity. However, improving the accuracy of brain decoding remains difficult because of the distinctive characteristics of brain activity data, such as small sample sizes and high dimensionality. This paper therefore proposes a method that effectively uses multi-subject brain activity to improve brain decoding accuracy. Specifically, we distinguish between the shared information common to the brain activity of multiple subjects and the individual information specific to each subject's brain activity, and use both types of information to decode human visual cognition. Both types of information are extracted as features in a latent space using a probabilistic generative model. In the experiment, a publicly available dataset with five subjects was used, and estimation accuracy was validated using a confidence score ranging from 0 to 1, where larger values indicate better performance. The proposed method achieved a confidence score of 0.867 for the best subject and an average of 0.813 across the five subjects, the best among the compared methods. The experimental results show that the proposed method decodes visual cognition more accurately than existing methods that do not distinguish shared information from individual information.
Affiliation(s)
- Takaaki Higashi
- Graduate School of Information Science and Technology, Hokkaido University, N-14, W-9, Kita-ku, Sapporo 060-0814, Hokkaido, Japan
- Keisuke Maeda
- Faculty of Information Science and Technology, Hokkaido University, N-14, W-9, Kita-ku, Sapporo 060-0814, Hokkaido, Japan
- Takahiro Ogawa
- Faculty of Information Science and Technology, Hokkaido University, N-14, W-9, Kita-ku, Sapporo 060-0814, Hokkaido, Japan
- Miki Haseyama
- Faculty of Information Science and Technology, Hokkaido University, N-14, W-9, Kita-ku, Sapporo 060-0814, Hokkaido, Japan