Shen X, Zhang X, Wang Y. Kernel Temporal Difference based Reinforcement Learning for Brain Machine Interfaces. Annual International Conference of the IEEE Engineering in Medicine and Biology Society 2021; 2021:6721-6724. [PMID: 34892650 DOI: 10.1109/embc46164.2021.9631086]
Abstract
Brain-machine interfaces (BMIs) enable people with disabilities to control external devices with their motor intentions through a decoder. Compared with supervised learning, reinforcement learning (RL) is more promising for these users because it can help them learn without actual limb movement. Current RL decoders handle tasks with immediate reward delivery. For tasks where the reward is given only at the end of the trial, however, existing RL methods may take a long time to train and are prone to becoming trapped in local minima. In this paper, we propose embedding the temporal difference (TD) method into Quantized Attention-Gated Kernel Reinforcement Learning (QAGKRL) to solve this temporal credit assignment problem. The algorithm uses a kernel network to ensure the global linear structure and adopts a softmax policy to efficiently explore the state-action mapping through the TD error. We simulate a center-out task in which the agent needs several steps to first reach a peripheral target and then return to the center to receive the external reward. The proposed algorithm is tested on simulated data and compared with two state-of-the-art models. We find that introducing the TD method into QAGKRL achieves a prediction accuracy of 96.2% ± 0.77% (mean ± std), which is significantly better than the other two methods.
Clinical Relevance: This paper proposes a novel kernel temporal difference RL method for multi-step tasks with delayed reward delivery, which potentially enables online continuous decoding in BMIs.
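To illustrate the temporal credit assignment problem the abstract refers to, the following is a minimal sketch of tabular TD(0) value learning on a short multi-step trial where the reward arrives only at the final step. It is not the authors' QAGKRL algorithm: the kernel network, quantization, attention gating, and softmax policy are all omitted, and the names `td_error` and `run_td0`, the chain length, and the values of `alpha` and `gamma` are assumptions chosen purely for illustration.

```python
def td_error(reward, value_next, value_now, gamma=0.9):
    """One-step TD error: delta = r + gamma * V(s') - V(s)."""
    return reward + gamma * value_next - value_now


def run_td0(n_steps=4, n_trials=500, alpha=0.1, gamma=0.9):
    """Tabular TD(0) on a fixed chain of n_steps states (hypothetical toy task).

    The reward is 1.0 only on the last step of each trial, mimicking a
    task where feedback is delivered at the end of the trial.
    """
    # One value per state, plus a terminal state whose value stays 0.
    values = [0.0] * (n_steps + 1)
    for _ in range(n_trials):
        for s in range(n_steps):
            reward = 1.0 if s == n_steps - 1 else 0.0
            delta = td_error(reward, values[s + 1], values[s], gamma)
            # Move the estimate for state s toward the bootstrapped target.
            values[s] += alpha * delta
    return values


if __name__ == "__main__":
    for step, value in enumerate(run_td0()):
        print(f"state {step}: V = {value:.3f}")
```

Under these assumptions, the TD error passes the end-of-trial reward backward one state at a time, so earlier states acquire discounted value estimates (approaching `gamma` raised to the number of steps remaining before the rewarded step) even though they never receive a reward directly.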