1
Ranjbar A, Suratgar AA, Menhaj MB, Abbasi-Asl R. Structurally-constrained encoding framework using a multi-voxel reduced-rank latent model for human natural vision. J Neural Eng 2024; 21:046027. [PMID: 38986451] [DOI: 10.1088/1741-2552/ad6184]
Abstract
Objective. Voxel-wise visual encoding models based on convolutional neural networks (CNNs) have emerged as prominent tools for predicting human brain activity from functional magnetic resonance imaging signals. While CNN-based models imitate the hierarchical structure of the human visual cortex to generate explainable features in response to natural visual stimuli, there is still a need for a brain-inspired model that predicts brain responses accurately from biomedical data. Approach. To bridge this gap, we propose a response prediction module, called the Structurally Constrained Multi-Output (SCMO) module, that includes the homologous correlations arising among groups of voxels in a cortical region to predict more accurate responses. Main results. This module employs all the responses across a visual area to predict individual voxel-wise BOLD responses, and therefore accounts for the population activity and collective behavior of voxels. Such a module can determine the relationships within each visual region by creating a structure matrix that represents the underlying voxel-to-voxel interactions. Moreover, since each response module in visual encoding tasks relies on image features, we conducted experiments using two different feature extraction modules to assess the predictive performance of our proposed module: a recurrent CNN that integrates both feedforward and recurrent interactions, and the popular AlexNet model, which uses feedforward connections. Significance. We demonstrate that the proposed framework provides reliable predictive ability to generate brain responses across multiple areas, outperforming benchmark models in terms of stability and coherency of features.
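The reduced-rank, multi-voxel idea described above can be illustrated with classical reduced-rank regression: one coefficient matrix is fitted jointly for all voxels in an area and constrained to low rank, so that voxels share a small set of latent response components. This is a minimal sketch of the general technique on synthetic data, not the authors' SCMO module; the function name and dimensions are illustrative.

```python
import numpy as np

def reduced_rank_regression(X, Y, rank):
    """Fit Y ~ X @ B with rank(B) <= rank (classical RRR via SVD of the OLS fit)."""
    B_ols, *_ = np.linalg.lstsq(X, Y, rcond=None)       # unconstrained fit
    # Project fitted responses onto their top-`rank` voxel-space directions
    _, _, Vt = np.linalg.svd(X @ B_ols, full_matrices=False)
    V = Vt[:rank].T                                     # (voxels, rank)
    return B_ols @ V @ V.T                              # low-rank coefficients

rng = np.random.default_rng(0)
X = rng.standard_normal((200, 30))                      # stimulus features
B_true = rng.standard_normal((30, 2)) @ rng.standard_normal((2, 50))
Y = X @ B_true + 0.1 * rng.standard_normal((200, 50))   # 50 noisy voxel responses
B_hat = reduced_rank_regression(X, Y, rank=2)
```

Because every voxel's weights become linear combinations of the same few latent components, the model pools information across the whole area, which is the spirit of the multi-voxel constraint.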
Affiliation(s)
- Amin Ranjbar
- Department of Electrical Engineering, Amirkabir University of Technology, Tehran, Iran
- Distributed and Intelligence Optimization Research Laboratory (DIOR Lab.), Tehran, Iran
- Amir Abolfazl Suratgar
- Department of Electrical Engineering, Amirkabir University of Technology, Tehran, Iran
- Distributed and Intelligence Optimization Research Laboratory (DIOR Lab.), Tehran, Iran
- Mohammad Bagher Menhaj
- Department of Electrical Engineering, Amirkabir University of Technology, Tehran, Iran
- Distributed and Intelligence Optimization Research Laboratory (DIOR Lab.), Tehran, Iran
- Reza Abbasi-Asl
- Department of Neurology, Department of Bioengineering and Therapeutic Sciences, University of California, San Francisco, CA, United States of America
- UCSF Weill Institute for Neurosciences, San Francisco, CA, United States of America
2
Jawed S, Faye I, Malik AS. Deep Learning-Based Assessment Model for Real-Time Identification of Visual Learners Using Raw EEG. IEEE Trans Neural Syst Rehabil Eng 2024; 32:378-390. [PMID: 38194390] [DOI: 10.1109/tnsre.2024.3351694]
Abstract
Automatic identification of visual learning style in real time from raw electroencephalogram (EEG) signals is challenging. In this work, inspired by the powerful representational abilities of deep learning, deep learning-based models are proposed to learn high-level feature representations for EEG-based visual-learner identification. Existing computer-aided systems that combine EEG and machine learning can assess learning styles reasonably well, but they typically require offline processing to remove artifacts and extract features, making them unsuitable for real-time applications. The dataset comprises 34 healthy subjects whose EEG signals were measured during resting states (eyes open and eyes closed) and while performing learning tasks; the subjects had no prior knowledge of the animated educational content presented in video format. The paper analyzes EEG signals measured during the eyes-closed resting state using three deep learning techniques: long short-term memory (LSTM), LSTM combined with a convolutional neural network (LSTM-CNN), and LSTM combined with a fully convolutional neural network (LSTM-FCNN). These techniques were chosen for their suitability to real-time applications with varying data lengths and their low computational cost. After hyperparameter tuning, all three techniques could identify visual learners; the LSTM-CNN technique achieved the best results, with an average accuracy of 94%, a sensitivity of 80%, a specificity of 92%, and an F1 score of 94%. This research shows that the deep learning-based LSTM-CNN technique most accurately identifies a student's visual learning style.
3
Pan H, Fu Y, Li Z, Wen F, Hu J, Wu B. Images Reconstruction from Functional Magnetic Resonance Imaging Patterns Based on the Improved Deep Generative Multiview Model. Neuroscience 2023; 509:103-112. [PMID: 36460220] [DOI: 10.1016/j.neuroscience.2022.11.021]
Abstract
Reconstructing visual stimulus images from brain activity signals is an important research task in brain decoding. Most existing reconstruction methods focus on using deep learning to classify brain activities measured by functional magnetic resonance imaging or to identify visual stimulus images; accurate reconstruction of visual stimulus images with deep learning remains challenging. This paper proposes an improved deep generative multiview model to further improve reconstruction accuracy. Firstly, an encoder based on residual-in-residual dense blocks is designed to fit the deep, multiview visual features of natural human vision and to extract features of the visual stimulus images. Secondly, the original decoder of the deep generative multiview model is extended to a deeper network, which makes the features obtained by each deconvolution layer more distinguishable. Finally, the optimizer is configured by comparing the performance of several optimizers under different parameter values, and the best-performing configuration is adopted for the whole model. Evaluations on two publicly available datasets demonstrate that the improved model reconstructs visual stimuli more accurately than the original deep generative multiview model.
Affiliation(s)
- Hongguang Pan
- College of Electrical and Control Engineering, Xi'an University of Science and Technology, Xi'an 710054, China; Key Laboratory of Industrial Internet of Things & Networked Control, Ministry of Education, Chongqing 400065, China.
- Yunpeng Fu
- Key Laboratory of Industrial Internet of Things & Networked Control, Ministry of Education, Chongqing 400065, China
- Zhuoyi Li
- Key Laboratory of Industrial Internet of Things & Networked Control, Ministry of Education, Chongqing 400065, China
- Fan Wen
- Xingtang Telecommunication Technology Co., LTD, Datang Telecom Technology & Industry Group, Xi'an 710054, China
- Jianchen Hu
- School of Automation Science and Engineering, Xi'an Jiaotong University, Xi'an 710049, China
- Bo Wu
- Sunny Science and Technology Co., LTD, Xi'an 710049, China
4
Wang L, Hu X, Liu H, Zhao S, Guo L, Han J, Liu T. Functional Brain Networks Underlying Auditory Saliency During Naturalistic Listening Experience. IEEE Trans Cogn Dev Syst 2022. [DOI: 10.1109/tcds.2020.3025947]
5
Du B, Cheng X, Duan Y, Ning H. fMRI Brain Decoding and Its Applications in Brain-Computer Interface: A Survey. Brain Sci 2022; 12:228. [PMID: 35203991] [PMCID: PMC8869956] [DOI: 10.3390/brainsci12020228]
Abstract
Brain neural activity decoding is an important branch of neuroscience research and a key technology for the brain-computer interface (BCI). Researchers initially developed simple linear models and machine learning algorithms to classify and recognize brain activities. With the great success of deep learning in image recognition and generation, deep neural networks (DNN) have been applied to reconstructing visual stimuli from human brain activity measured via functional magnetic resonance imaging (fMRI). In this paper, we review brain activity decoding models based on machine learning and deep learning algorithms. Specifically, we focus on the decoding models currently attracting the most attention: the variational auto-encoder (VAE), the generative adversarial network (GAN), and the graph convolutional network (GCN). Furthermore, fMRI-based BCI applications enabled by brain activity decoding in mental and psychological disease treatment are presented to illustrate the positive correlation between brain decoding and BCI. Finally, existing challenges and future research directions are addressed.
Affiliation(s)
- Bing Du
- School of Computer and Communication Engineering, University of Science and Technology Beijing, Beijing 100083, China
- Xiaomu Cheng
- School of Computer and Communication Engineering, University of Science and Technology Beijing, Beijing 100083, China
- Yiping Duan
- Department of Electronic Engineering, Tsinghua University, Beijing 100084, China
- Huansheng Ning
- School of Computer and Communication Engineering, University of Science and Technology Beijing, Beijing 100083, China
6
Rakhimberdina Z, Jodelet Q, Liu X, Murata T. Natural Image Reconstruction From fMRI Using Deep Learning: A Survey. Front Neurosci 2021; 15:795488. [PMID: 34987359] [PMCID: PMC8722107] [DOI: 10.3389/fnins.2021.795488]
Abstract
With the advent of brain imaging techniques and machine learning tools, much effort has been devoted to building computational models to capture the encoding of visual information in the human brain. One of the most challenging brain decoding tasks is the accurate reconstruction of the perceived natural images from brain activities measured by functional magnetic resonance imaging (fMRI). In this work, we survey the most recent deep learning methods for natural image reconstruction from fMRI. We examine these methods in terms of architectural design, benchmark datasets, and evaluation metrics and present a fair performance evaluation across standardized evaluation metrics. Finally, we discuss the strengths and limitations of existing studies and present potential future directions.
Affiliation(s)
- Zarina Rakhimberdina
- Department of Computer Science, Tokyo Institute of Technology, Tokyo, Japan
- AIST-Tokyo Tech Real World Big-Data Computation Open Innovation Laboratory, Tokyo, Japan
- Quentin Jodelet
- Department of Computer Science, Tokyo Institute of Technology, Tokyo, Japan
- AIST-Tokyo Tech Real World Big-Data Computation Open Innovation Laboratory, Tokyo, Japan
- Xin Liu
- AIST-Tokyo Tech Real World Big-Data Computation Open Innovation Laboratory, Tokyo, Japan
- Artificial Intelligence Research Center, National Institute of Advanced Industrial Science and Technology, Tokyo, Japan
- Digital Architecture Research Center, National Institute of Advanced Industrial Science and Technology, Tokyo, Japan
- Tsuyoshi Murata
- Department of Computer Science, Tokyo Institute of Technology, Tokyo, Japan
- AIST-Tokyo Tech Real World Big-Data Computation Open Innovation Laboratory, Tokyo, Japan
7
A Visual Encoding Model Based on Contrastive Self-Supervised Learning for Human Brain Activity along the Ventral Visual Stream. Brain Sci 2021; 11:1004. [PMID: 34439623] [PMCID: PMC8391143] [DOI: 10.3390/brainsci11081004]
Abstract
Visual encoding models are important computational models for understanding how information is processed along the visual stream. Many improved visual encoding models have been developed from the perspective of model architecture and learning objective, but most are limited to supervised learning. From the view of unsupervised learning mechanisms, this paper utilizes a pre-trained neural network to construct a visual encoding model based on contrastive self-supervised learning for the ventral visual stream measured by functional magnetic resonance imaging (fMRI). We first extracted features using a ResNet50 model pre-trained with contrastive self-supervised learning (the ResNet50-CSL model), trained a linear regression model for each voxel, and finally calculated the prediction accuracy of different voxels. Compared with a ResNet50 model pre-trained on a supervised classification task, the ResNet50-CSL model achieved an equal or even relatively better encoding performance in multiple visual cortical areas. Moreover, the ResNet50-CSL model forms hierarchical representations of input visual stimuli, similar to the hierarchical information processing of the human visual cortex. Our experimental results suggest that an encoding model based on contrastive self-supervised learning is a strong computational model that can compete with supervised models, and that contrastive self-supervised learning is an effective way to extract human brain-like representations.
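The pipeline in this abstract (pretrained-network features, then one linear regression per voxel, scored by prediction accuracy) can be sketched generically. Ridge regression stands in for the paper's linear regression, and the features and voxel responses below are synthetic stand-ins for ResNet50 features and fMRI data.

```python
import numpy as np

def fit_ridge(F, y, lam=1.0):
    """Closed-form ridge solution w = (F'F + lam*I)^-1 F'y."""
    return np.linalg.solve(F.T @ F + lam * np.eye(F.shape[1]), F.T @ y)

def voxelwise_accuracy(F_tr, Y_tr, F_te, Y_te, lam=1.0):
    """Fit one linear model per voxel; score by Pearson r on held-out data."""
    accs = []
    for v in range(Y_tr.shape[1]):
        w = fit_ridge(F_tr, Y_tr[:, v], lam)
        accs.append(np.corrcoef(F_te @ w, Y_te[:, v])[0, 1])
    return np.array(accs)

rng = np.random.default_rng(1)
F_tr, F_te = rng.standard_normal((300, 20)), rng.standard_normal((100, 20))
W = rng.standard_normal((20, 10))                       # 10 synthetic voxels
Y_tr = F_tr @ W + 0.5 * rng.standard_normal((300, 10))
Y_te = F_te @ W + 0.5 * rng.standard_normal((100, 10))
acc = voxelwise_accuracy(F_tr, Y_tr, F_te, Y_te)
```

Swapping the feature matrix for activations from a different pretrained layer is all it takes to compare representations, which is how such encoding comparisons are usually run.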
8
Cui Y, Zhang C, Wang L, Yan B, Tong L. Dense-GWP: An Improved Primary Visual Encoding Model Based on Dense Gabor Features. J Mech Med Biol 2021. [DOI: 10.1142/s0219519421400170]
Abstract
Brain visual encoding models based on functional magnetic resonance imaging are growing increasingly popular. The Gabor wavelet pyramid (GWP) model is a classic example, exhibiting good prediction performance for the primary visual cortex (V1, V2, and V3). However, the local variations in visual stimulation are quite convoluted in terms of spatial frequency, orientation, and position, posing a challenge for visual encoding models, and whether the GWP model can thoroughly extract informative and effective features from visual stimuli remains unclear. To this end, this paper proposes a dense GWP visual encoding model that improves the composition of the Gabor wavelet basis in three respects: spatial frequency, orientation, and position. The improved model, named Dense-GWP, extracts denser features from the image stimulus. A regularization optimization algorithm was used to select informative and effective features, which are crucial for predicting voxel activity in the region of interest. Extensive experimental results show that the Dense-GWP model exhibits improved prediction performance and can therefore help further the understanding of the human visual perception mechanism.
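The Gabor wavelet bank underlying GWP-style models can be sketched directly: each basis function is a cosine carrier under a Gaussian envelope, and "densifying" the model amounts to sampling frequency, orientation, and position on a finer grid. This is an illustrative filter bank (the frequencies, envelope width, and feature count are arbitrary choices), not the published Dense-GWP basis.

```python
import numpy as np

def gabor(size, freq, theta, phase=0.0, sigma=None):
    """One Gabor wavelet: a cosine carrier at `freq` cycles/pixel, oriented
    at `theta`, under an isotropic Gaussian envelope."""
    sigma = 0.5 / freq if sigma is None else sigma
    ax = np.arange(size) - size // 2
    x, y = np.meshgrid(ax, ax)
    carrier = np.cos(2 * np.pi * freq * (x * np.cos(theta) + y * np.sin(theta)) + phase)
    envelope = np.exp(-(x ** 2 + y ** 2) / (2 * sigma ** 2))
    return envelope * carrier

def gabor_features(img, freqs, n_orient):
    """Project a square image onto a bank of frequency x orientation filters."""
    return np.array([np.sum(img * gabor(img.shape[0], f, t))
                     for f in freqs
                     for t in np.linspace(0, np.pi, n_orient, endpoint=False)])

img = gabor(32, 0.25, 0.0)                      # a stimulus matching one filter
feats = gabor_features(img, [0.125, 0.25], n_orient=4)
```

A matched filter responds far more strongly than an orthogonally oriented one, which is why such features are informative about local orientation and spatial frequency.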
Affiliation(s)
- Yibo Cui
- Henan Key Laboratory of Imaging and Intelligent Processing, PLA Strategic Support Force, Information Engineering University, Zhengzhou 450001, P. R. China
- Chi Zhang
- Henan Key Laboratory of Imaging and Intelligent Processing, PLA Strategic Support Force, Information Engineering University, Zhengzhou 450001, P. R. China
- Linyuan Wang
- Henan Key Laboratory of Imaging and Intelligent Processing, PLA Strategic Support Force, Information Engineering University, Zhengzhou 450001, P. R. China
- Bin Yan
- Henan Key Laboratory of Imaging and Intelligent Processing, PLA Strategic Support Force, Information Engineering University, Zhengzhou 450001, P. R. China
- Li Tong
- Henan Key Laboratory of Imaging and Intelligent Processing, PLA Strategic Support Force, Information Engineering University, Zhengzhou 450001, P. R. China
9
Choupan J, Douglas PK, Gal Y, Cohen MS, Reutens DC, Yang Z. Temporal embedding and spatiotemporal feature selection boost multi-voxel pattern analysis decoding accuracy. J Neurosci Methods 2020; 345:108836. [PMID: 32726664] [DOI: 10.1016/j.jneumeth.2020.108836]
Abstract
BACKGROUND: In fMRI decoding, temporal embedding of spatial brain features allows brain activity dynamics to be incorporated into multivariate pattern classification, providing enriched information about stimulus-specific response patterns and potentially improved prediction accuracy. NEW METHOD: This study investigates whether classification performance can be enhanced through temporal embedding, by identifying the combination of spatiotemporal features with the best classification performance. We investigated the importance of spatiotemporal feature selection using a slow event-related design adapted from the classic Haxby study (Haxby et al., 2001); data were collected using a multiband fMRI sequence with a temporal resolution of 0.568 s. COMPARISON WITH EXISTING METHODS: A wide range of spatiotemporal observations were created as various combinations of spatiotemporal features. Using both random forest and support vector machine classifiers, prediction accuracies for these combinations were compared with the single spatial multivariate pattern approach that uses only one temporal observation. RESULTS: On average, spatiotemporal feature selection improved prediction accuracy. Moreover, the random forest algorithm outperformed the support vector machine and benefitted from temporal information to a greater extent. CONCLUSIONS: As expected, the most influential temporal durations were found around the peak of the hemodynamic response function, from a few seconds after stimulus onset until roughly 4 s after the peak. The superiority of spatiotemporal feature selection over single time-point spatial approaches invites future work on approaches that incorporate spatiotemporal dependencies into feature selection for decoding.
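Temporal embedding, as used above, simply means treating several consecutive scans after each stimulus onset as one long spatiotemporal feature vector instead of picking a single time point. A minimal sketch (array shapes are illustrative):

```python
import numpy as np

def temporal_embed(ts, onsets, window):
    """Stack `window` consecutive scans after each trial onset into one
    spatiotemporal feature vector per trial.

    ts     : (n_scans, n_voxels) voxel time series
    onsets : iterable of trial onset indices, in scans
    window : number of consecutive time points to embed
    """
    return np.stack([ts[t:t + window].ravel() for t in onsets])

ts = np.arange(40.0).reshape(20, 2)             # 20 scans x 2 voxels (toy data)
X = temporal_embed(ts, onsets=[0, 5, 10], window=3)
```

Each row can then be fed to a random forest or SVM exactly like a single-time-point pattern; the classifier is simply free to weight different latencies differently.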
Affiliation(s)
- Jeiran Choupan
- Centre for Advanced Imaging, The University of Queensland, Brisbane, Australia; Queensland Brain Institute, The University of Queensland, Brisbane, Australia; Department of Psychology, USC Dornsife College of Letters, Arts and Sciences, University of Southern California, USA; Laboratory of Neuro Imaging, USC Stevens Neuroimaging and Informatics Institute, Keck School of Medicine, University of Southern California, Los Angeles, CA, USA.
- Pamela K Douglas
- Center for Cognitive Neuroscience, University of California, Los Angeles, CA, USA; Modeling & Simulation, and Computer Science Departments, UCF, Florida, USA
- Yaniv Gal
- School of Information Technology and Electrical Engineering, The University of Queensland, Brisbane, Australia
- Mark S Cohen
- Neuropsychiatric Institute, University of California, Los Angeles, CA, USA; Departments of Psychiatry and Behavioral Sciences, Neurology, Radiological Sciences, Biomedical Physics, Psychology, and Bioengineering, University of California, Los Angeles, CA, USA; California Nanosystems Institute, UCLA School of Medicine, Los Angeles, CA, USA
- David C Reutens
- Centre for Advanced Imaging, The University of Queensland, Brisbane, Australia
- Zhengyi Yang
- Centre for Advanced Imaging, The University of Queensland, Brisbane, Australia; School of Information Technology and Electrical Engineering, The University of Queensland, Brisbane, Australia; Brainnetome Center, Institute of Automation, Chinese Academy of Sciences, Beijing, China
10
Yu Z, Zhang C, Wang L, Tong L, Yan B. A Comparative Analysis of Visual Encoding Models Based on Classification and Segmentation Task-Driven CNNs. Comput Math Methods Med 2020; 2020:5408942. [PMID: 32802150] [PMCID: PMC7416280] [DOI: 10.1155/2020/5408942]
Abstract
Visual encoding models now use convolutional neural networks (CNNs), with their outstanding performance in computer vision, to simulate the process of human information processing. However, the prediction performance of an encoding model depends on the network it is built on and the task that network was trained for. Here, the impact of the network task on encoding models is studied. Using functional magnetic resonance imaging (fMRI) data, features of natural visual stimuli are extracted with a segmentation network (FCN32s) and a classification network (VGG16), which perform different visual tasks but have similar structures. Then, using three sets of features, i.e., segmentation, classification, and fused features, the regularized orthogonal matching pursuit (ROMP) method is used to establish a linear mapping from features to voxel responses. The analysis indicates that encoding models based on networks performing different tasks can effectively, but differently, predict stimulus-induced responses measured by fMRI. The prediction accuracy of the encoding model based on VGG is significantly better than that of the model based on FCN in most voxels, but similar to that of the fused features. The comparative analysis demonstrates that a CNN performing the classification task is more similar to human visual processing than one performing the segmentation task.
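The feature-to-voxel mapping step here relies on matching pursuit. As an illustration of the family of techniques ROMP belongs to, below is plain orthogonal matching pursuit (ROMP differs in selecting a regularized *set* of features per iteration rather than a single one); the data are synthetic, with a voxel response driven by two of fifty features.

```python
import numpy as np

def omp(F, y, n_nonzero):
    """Orthogonal matching pursuit: greedily add the feature most correlated
    with the current residual, re-fitting least squares on the chosen set."""
    residual, support = y.astype(float).copy(), []
    coef = np.zeros(0)
    for _ in range(n_nonzero):
        support.append(int(np.argmax(np.abs(F.T @ residual))))
        coef, *_ = np.linalg.lstsq(F[:, support], y, rcond=None)
        residual = y - F[:, support] @ coef
    w = np.zeros(F.shape[1])
    w[support] = coef
    return w

rng = np.random.default_rng(2)
F = rng.standard_normal((100, 50))          # candidate CNN features
y = 2.0 * F[:, 3] + 1.5 * F[:, 7]           # voxel response driven by 2 features
w = omp(F, y, n_nonzero=2)
```

The greedy selection is what makes the resulting linear mapping sparse, so each voxel is explained by a small, interpretable subset of network features.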
Affiliation(s)
- Ziya Yu
- PLA Strategy Support Force Information Engineering University, Zhengzhou 450001, China
- Chi Zhang
- PLA Strategy Support Force Information Engineering University, Zhengzhou 450001, China
- Linyuan Wang
- PLA Strategy Support Force Information Engineering University, Zhengzhou 450001, China
- Li Tong
- PLA Strategy Support Force Information Engineering University, Zhengzhou 450001, China
- Bin Yan
- PLA Strategy Support Force Information Engineering University, Zhengzhou 450001, China
11
Babakmehr M, St-Yves G, Naselaris T. Working with high-dimensional feature spaces. Mach Learn 2020. [DOI: 10.1016/b978-0-12-815739-8.00015-8]
12
Zhang C, Qiao K, Wang L, Tong L, Hu G, Zhang RY, Yan B. A visual encoding model based on deep neural networks and transfer learning for brain activity measured by functional magnetic resonance imaging. J Neurosci Methods 2019; 325:108318. [DOI: 10.1016/j.jneumeth.2019.108318]
13
Du C, Du C, Huang L, He H. Reconstructing Perceived Images From Human Brain Activities With Bayesian Deep Multiview Learning. IEEE Trans Neural Netw Learn Syst 2019; 30:2310-2323. [PMID: 30561354] [DOI: 10.1109/tnnls.2018.2882456]
Abstract
Neural decoding, which aims to predict external visual stimulus information from evoked brain activities, plays an important role in understanding the human visual system. Many existing methods are based on linear models, and most of them focus only on brain activity pattern classification or visual stimulus identification; accurate reconstruction of perceived images from measured human brain activities remains challenging. In this paper, we propose a novel deep generative multiview model for accurate visual image reconstruction from human brain activities measured by functional magnetic resonance imaging (fMRI). Specifically, we model the statistical relationships between the two views (i.e., the visual stimuli and the evoked fMRI) using two view-specific generators with a shared latent space. On the one hand, we adopt a deep neural network architecture for visual image generation, which mimics the stages of human visual processing. On the other hand, we design a sparse Bayesian linear model for fMRI activity generation, which can effectively capture voxel correlations, suppress data noise, and avoid overfitting. Furthermore, we devise an efficient mean-field variational inference method to train the proposed model, which can then accurately reconstruct visual images via Bayesian inference. In particular, we exploit a posterior regularization technique in the Bayesian inference to regularize the model posterior. Quantitative and qualitative evaluations conducted on multiple fMRI data sets demonstrate that the proposed method reconstructs visual images more accurately than the state of the art.
14
Wen H, Shi J, Chen W, Liu Z. Transferring and generalizing deep-learning-based neural encoding models across subjects. Neuroimage 2018; 176:152-163. [PMID: 29705690] [PMCID: PMC5976558] [DOI: 10.1016/j.neuroimage.2018.04.053]
Abstract
Recent studies have shown the value of using deep learning models for mapping and characterizing how the brain represents and organizes information for natural vision. However, modeling the relationship between deep learning models and the brain (i.e., building encoding models) requires measuring cortical responses to large and diverse sets of natural visual stimuli from single subjects. This requirement limits prior studies to few subjects, making it difficult to generalize findings across subjects or to a population. In this study, we developed new methods to transfer and generalize encoding models across subjects. To train encoding models specific to a target subject, the models trained for other subjects were used as priors and were refined efficiently using Bayesian inference with a limited amount of data from the target subject. To train encoding models for a population, the models were progressively trained and updated with incremental data from different subjects. As a proof of principle, we applied these methods to functional magnetic resonance imaging (fMRI) data from three subjects watching tens of hours of naturalistic videos, while a deep residual neural network driven by image recognition was used to model visual cortical processing. Results demonstrate that the methods developed herein provide an efficient and effective strategy to establish both subject-specific and population-wide predictive models of cortical representations of high-dimensional and hierarchical visual features.
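The Bayesian refinement described above (a target subject's encoding model estimated from limited data, with other subjects' models serving as the prior) can be reduced to its simplest form: a MAP estimate whose Gaussian prior is centred on the source subject's weights. This sketch is a deliberate simplification of the paper's method; the variable names and the scalar prior precision `lam` are illustrative.

```python
import numpy as np

def transfer_ridge(X, y, w_prior, lam=1.0):
    """MAP weights under a Gaussian prior centred on a source subject's model:
    minimise ||y - X w||^2 + lam * ||w - w_prior||^2 (closed form)."""
    d = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ y + lam * w_prior)

rng = np.random.default_rng(3)
w_source = rng.standard_normal(5)                     # weights from another subject
w_target = w_source + 0.1 * rng.standard_normal(5)    # similar target subject
X = rng.standard_normal((50, 5))                      # limited target-subject data
y = X @ w_target
w_map = transfer_ridge(X, y, w_source, lam=1.0)
```

With little target data (or a large `lam`) the estimate stays near the source-subject weights; with abundant data it converges to the target subject's own model, which is the intuition behind transferring encoding models across subjects.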
Affiliation(s)
- Haiguang Wen
- School of Electrical and Computer Engineering, Purdue University, West Lafayette, IN, USA; Purdue Institute for Integrative Neuroscience, Purdue University, West Lafayette, IN, USA
- Junxing Shi
- School of Electrical and Computer Engineering, Purdue University, West Lafayette, IN, USA; Purdue Institute for Integrative Neuroscience, Purdue University, West Lafayette, IN, USA
- Wei Chen
- Center for Magnetic Resonance Research, Department of Radiology, University of Minnesota Medical School, Minneapolis, MN, USA
- Zhongming Liu
- Weldon School of Biomedical Engineering, Purdue University, West Lafayette, IN, USA; School of Electrical and Computer Engineering, Purdue University, West Lafayette, IN, USA; Purdue Institute for Integrative Neuroscience, Purdue University, West Lafayette, IN, USA
15
Onal Ertugrul I, Ozay M, Yarman Vural FT. Encoding the local connectivity patterns of fMRI for cognitive task and state classification. Brain Imaging Behav 2018; 13:893-904. [PMID: 29948907] [DOI: 10.1007/s11682-018-9901-5]
Abstract
In this work, we propose a novel framework to encode the local connectivity patterns of the brain using Fisher vectors (FV), the vector of locally aggregated descriptors (VLAD), and bag-of-words (BoW) methods. We first obtain local descriptors, called mesh arc descriptors (MADs), from fMRI data by forming local meshes around anatomical regions and estimating their relationships within a neighborhood. Then, we extract a dictionary of relationships, called a brain connectivity dictionary, by fitting a generative Gaussian mixture model (GMM) to a set of MADs and selecting codewords at the mean of each component of the mixture. Codewords represent connectivity patterns among anatomical regions. We also encode MADs with the VLAD and BoW methods using k-means clustering. We classify cognitive tasks using the Human Connectome Project (HCP) task fMRI dataset and cognitive states using the Emotional Memory Retrieval (EMR) dataset, training support vector machines (SVMs) on the encoded MADs. Results demonstrate that FV encoding of MADs can be successfully employed for the classification of cognitive tasks and outperforms the VLAD and BoW representations. Moreover, we identify the significant Gaussians in the mixture models by computing the energy of their corresponding FV parts and analyze their effect on classification accuracy. Finally, we suggest a new method to visualize the codewords of the learned brain connectivity dictionary.
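Of the three encodings compared above, VLAD is the quickest to sketch: each descriptor is assigned to its nearest codeword, residuals are accumulated per codeword, and the concatenation is L2-normalised. The code below assumes the codebook (e.g. from k-means over MADs) is already given; the descriptors and codebook here are random placeholders.

```python
import numpy as np

def vlad_encode(descriptors, codebook):
    """VLAD: accumulate residuals of descriptors around their nearest
    codeword, then L2-normalise the concatenated (k * d) vector."""
    k, d = codebook.shape
    dists = ((descriptors[:, None, :] - codebook[None]) ** 2).sum(axis=-1)
    assign = dists.argmin(axis=1)                 # nearest codeword per descriptor
    enc = np.zeros((k, d))
    for i, a in enumerate(assign):
        enc[a] += descriptors[i] - codebook[a]    # residual accumulation
    enc = enc.ravel()
    return enc / (np.linalg.norm(enc) + 1e-12)

rng = np.random.default_rng(4)
descriptors = rng.standard_normal((30, 4))        # e.g. mesh arc descriptors
codebook = rng.standard_normal((5, 4))            # 5 codewords
v = vlad_encode(descriptors, codebook)
```

FV encoding generalises this by also accumulating second-order statistics under a GMM, which is one reason it tends to outperform VLAD and BoW in comparisons like the one reported above.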
Affiliation(s)
- Mete Ozay
- Graduate School of Information Sciences, Tohoku University, Sendai, Miyagi, Japan
- Fatos T Yarman Vural
- Department of Computer Engineering, Middle East Technical University, Ankara, Turkey
16
Robust inter-subject audiovisual decoding in functional magnetic resonance imaging using high-dimensional regression. Neuroimage 2017; 163:244-263. [DOI: 10.1016/j.neuroimage.2017.09.032] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.3] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/03/2017] [Revised: 09/14/2017] [Accepted: 09/17/2017] [Indexed: 11/23/2022] Open
17
Abstract
The human brain is thought to process information in multiple frequency bands; therefore, diverse information can be extracted from functional magnetic resonance imaging (fMRI) data by processing it at multiple resolutions. We propose a framework, called Hierarchical Multi-resolution Mesh Networks (HMMNs), which establishes a set of brain networks at multiple resolutions of the fMRI signal to represent the underlying cognitive process. Our framework first decomposes the fMRI signal into various frequency subbands using the wavelet transform. Then, a brain network is formed at each subband by ensembling a set of local meshes. The arc weights of each local mesh are estimated by ridge regression. Finally, the adjacency matrices of the mesh networks obtained at different subbands are used to train classifiers in an ensemble learning architecture, called fuzzy stacked generalization (FSG). Our decoding performance on the Human Connectome Project task-fMRI dataset shows that HMMNs can discriminate tasks with 99% accuracy across 808 subjects. The diversity of information embedded in the mesh networks of multiple subbands enables the ensemble of classifiers to collaborate for brain decoding. The suggested HMMNs decode the cognitive tasks better than a single classifier applied to any subband, and mesh networks have better representational power than pairwise correlations or average voxel time series. Moreover, fusing diverse information with FSG outperforms fusion by majority voting. We conclude that fMRI data recorded during a cognitive task provide diverse information in multi-resolution mesh networks; our framework fuses this complementary information and boosts the brain decoding performance obtained at individual subbands.
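The subband-then-mesh pipeline can be sketched as follows, using a plain Haar transform as a stand-in for the paper's wavelet decomposition and a small fully connected mesh for simplicity (all names, sizes, and data below are illustrative):

```python
import numpy as np
from sklearn.linear_model import Ridge

def haar_subbands(x, levels=2):
    """Decompose a 1-D signal into detail subbands plus a final
    approximation with a plain Haar wavelet (stand-in for the paper's
    wavelet transform; any discrete wavelet would serve)."""
    subbands, approx = [], x
    for _ in range(levels):
        n = len(approx) // 2 * 2
        pairs = approx[:n].reshape(-1, 2)
        subbands.append((pairs[:, 0] - pairs[:, 1]) / np.sqrt(2))  # detail
        approx = (pairs[:, 0] + pairs[:, 1]) / np.sqrt(2)
    subbands.append(approx)
    return subbands

def mesh_network(ts, alpha=1.0):
    """Adjacency matrix of a fully connected mesh: row r holds the
    ridge coefficients predicting region r from all other regions."""
    T, R = ts.shape
    A = np.zeros((R, R))
    for r in range(R):
        others = [j for j in range(R) if j != r]
        fit = Ridge(alpha=alpha, fit_intercept=False).fit(ts[:, others], ts[:, r])
        A[r, others] = fit.coef_
    return A

rng = np.random.default_rng(1)
ts = rng.standard_normal((128, 5))  # (time, regions)
# First-level detail subband for every region, then its mesh network.
band = np.stack([haar_subbands(ts[:, r], levels=2)[0] for r in range(5)], axis=1)
adjs = [mesh_network(band)]  # in the paper: one adjacency per subband
```

The flattened adjacency matrices from all subbands would then feed the FSG classifier ensemble.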
18
19
Roldan SM. Object Recognition in Mental Representations: Directions for Exploring Diagnostic Features through Visual Mental Imagery. Front Psychol 2017; 8:833. [PMID: 28588538 PMCID: PMC5441390 DOI: 10.3389/fpsyg.2017.00833] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.4] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/13/2017] [Accepted: 05/08/2017] [Indexed: 11/13/2022] Open
Abstract
One of the fundamental goals of object recognition research is to understand how a cognitive representation produced from the output of filtered and transformed sensory information facilitates efficient viewer behavior. Given that mental imagery strongly resembles perceptual processes in both cortical regions and subjective visual qualities, it is reasonable to ask whether mental imagery facilitates cognition in a manner similar to that of perceptual viewing: via the detection and recognition of distinguishing features. Categorizing the feature content of mental imagery holds potential as a reverse pathway by which to identify the components of a visual stimulus that are most critical for the creation and retrieval of a visual representation. This review examines the likelihood that the information represented in visual mental imagery reflects distinctive object features thought to facilitate efficient object categorization and recognition during perceptual viewing. If these representational features resemble their sensory counterparts in both spatial and semantic qualities, they may well be accessible through mental imagery as evaluated with current investigative techniques. Methods applied to mental imagery research and their findings are reviewed and evaluated for their efficiency in accessing internal representations, and implications for identifying diagnostic features are discussed. An argument is made for the benefits of combining mental imagery assessment methods with diagnostic feature research to advance the understanding of visual perceptual processes, with suggestions for avenues of future investigation.
Affiliation(s)
- Stephanie M. Roldan
- Virginia Tech Visual Neuroscience Laboratory, Psychology Department, Virginia Polytechnic Institute and State University, Blacksburg, VA, United States
20
Rupp K, Roos M, Milsap G, Caceres C, Ratto C, Chevillet M, Crone NE, Wolmetz M. Semantic attributes are encoded in human electrocorticographic signals during visual object recognition. Neuroimage 2017; 148:318-329. [DOI: 10.1016/j.neuroimage.2016.12.074] [Citation(s) in RCA: 15] [Impact Index Per Article: 2.1] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/02/2016] [Revised: 12/21/2016] [Accepted: 12/26/2016] [Indexed: 10/20/2022] Open
21
Learning Tensor-Based Features for Whole-Brain fMRI Classification. LECTURE NOTES IN COMPUTER SCIENCE 2015. [DOI: 10.1007/978-3-319-24553-9_75] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.8] [Reference Citation Analysis] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 01/11/2023]
22
Encoding brain network response to free viewing of videos. Cogn Neurodyn 2014; 8:389-97. [PMID: 25206932 DOI: 10.1007/s11571-014-9291-3] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.6] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/24/2013] [Revised: 04/02/2014] [Accepted: 04/15/2014] [Indexed: 10/25/2022] Open
Abstract
A challenging goal for cognitive neuroscience researchers is to determine how mental representations are mapped onto patterns of neural activity. To address this problem, functional magnetic resonance imaging (fMRI) researchers have developed a large number of encoding and decoding methods. However, previous studies typically used rather limited stimulus representations, such as semantic labels and wavelet Gabor filters, and largely focused on voxel-based brain patterns. Here, we present a new fMRI encoding model that predicts the human brain's responses to free viewing of video clips and aims to address these limitations. In this model, we represent the stimuli using a variety of representative visual features from the computer vision community, which describe the global color distribution, the local shape and spatial information, and the motion information contained in the videos, and we apply functional connectivity to model the brain's activity pattern evoked by these video clips. Our experimental results demonstrate that brain network responses during free viewing of videos can be robustly and accurately predicted across subjects using visual features. Our study suggests the feasibility of pursuing cognitive neuroscience questions through computational image/video analysis and offers the novel concept of using brain encoding as a test-bed for evaluating visual feature extraction.
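The encoding idea above, linearly mapping clip-level visual features to network responses and scoring held-out prediction accuracy per connection, can be sketched with synthetic data (the feature dimensions, ridge regressor, and correlation score below are illustrative stand-ins, not the paper's actual pipeline):

```python
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.model_selection import train_test_split

# Hypothetical setup: X holds per-clip visual features (e.g. color
# histograms, shape and motion descriptors); Y holds the brain-network
# response for each clip (here, vectorized connectivity values).
rng = np.random.default_rng(2)
n_clips, n_feat, n_conn = 200, 40, 15
W_true = 0.5 * rng.standard_normal((n_feat, n_conn))
X = rng.standard_normal((n_clips, n_feat))
Y = X @ W_true + 0.1 * rng.standard_normal((n_clips, n_conn))

X_tr, X_te, Y_tr, Y_te = train_test_split(X, Y, test_size=0.25, random_state=0)
enc = Ridge(alpha=1.0).fit(X_tr, Y_tr)   # feature -> response mapping
pred = enc.predict(X_te)

# Per-connection accuracy: correlation between predicted and measured
# responses on held-out clips.
r = np.array([np.corrcoef(pred[:, j], Y_te[:, j])[0, 1]
              for j in range(n_conn)])
```

Swapping in different feature extractors while keeping this evaluation fixed is exactly what makes brain encoding usable as a test-bed for visual features.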