1
Zhang L, Liu Q, Zhu F, Huang Z. Addressing implicit bias in adversarial imitation learning with mutual information. Neural Netw 2023; 167:847-864. [PMID: 37741067 DOI: 10.1016/j.neunet.2023.08.058] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/21/2023] [Revised: 07/10/2023] [Accepted: 08/30/2023] [Indexed: 09/25/2023]
Abstract
Adversarial imitation learning (AIL) is a powerful method for automated decision systems because it trains a policy efficiently by mimicking expert demonstrations. However, the reward functions of these algorithms carry an implicit bias, which leads to sample inefficiency. To solve this issue, an algorithm, referred to as Mutual Information Generative Adversarial Imitation Learning (MI-GAIL), is proposed to correct the biases. In this study, we propose two guidelines for designing an unbiased reward function. Based on these guidelines, we shape the reward function from the discriminator by adding auxiliary information from a potential-based reward function. The primary insight is that the potential-based reward function provides more accurate rewards for actions identified in the two guidelines. We compare our algorithm with state-of-the-art imitation learning algorithms on a family of continuous control tasks. Experimental results show that MI-GAIL addresses the bias in AIL reward functions and further improves sample efficiency and training stability.
Affiliation(s)
- Lihua Zhang
- School of Computer Science and Technology, Soochow University, Suzhou, 215006, Jiangsu, China.
- Quan Liu
- School of Computer Science and Technology, Soochow University, Suzhou, 215006, Jiangsu, China; Provincial Key Laboratory for Computer Information Processing Technology, Soochow University, Suzhou, 215006, Jiangsu, China.
- Fei Zhu
- School of Computer Science and Technology, Soochow University, Suzhou, 215006, Jiangsu, China; Provincial Key Laboratory for Computer Information Processing Technology, Soochow University, Suzhou, 215006, Jiangsu, China.
- Zhigang Huang
- School of Computer Science and Technology, Soochow University, Suzhou, 215006, Jiangsu, China.
2
Zhou Y, Lu M, Liu X, Che Z, Xu Z, Tang J, Zhang Y, Peng Y, Peng Y. Distributional generative adversarial imitation learning with reproducing kernel generalization. Neural Netw 2023; 165:43-59. [PMID: 37276810 DOI: 10.1016/j.neunet.2023.05.027] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/06/2022] [Revised: 04/16/2023] [Accepted: 05/16/2023] [Indexed: 06/07/2023]
Abstract
Generative adversarial imitation learning (GAIL) regards imitation learning (IL) as a distribution matching problem between the state-action distributions of the expert policy and the learned policy. In this paper, we focus on the generalization and computational properties of policy classes. We prove that the generalization can be guaranteed in GAIL when the class of policies is well controlled. With the capability of policy generalization, we introduce distributional reinforcement learning (RL) into GAIL and propose the greedy distributional soft gradient (GDSG) algorithm to solve GAIL. The main advantages of GDSG can be summarized as: (1) Q-value overestimation, a crucial factor leading to the instability of GAIL with off-policy training, can be alleviated by distributional RL. (2) By considering the maximum entropy objective, the policy can be improved in terms of performance and sample efficiency through sufficient exploration. Moreover, GDSG attains a sublinear convergence rate to a stationary solution. Comprehensive experimental verification in MuJoCo environments shows that GDSG can mimic expert demonstrations better than previous GAIL variants.
Affiliation(s)
- Yirui Zhou
- Department of Mathematics, College of Sciences, Shanghai University, Shanghai, 200444, China.
- Mengxiao Lu
- Department of Mathematics, College of Sciences, Shanghai University, Shanghai, 200444, China.
- Xiaowei Liu
- Department of Mathematics, College of Sciences, Shanghai University, Shanghai, 200444, China.
- Jian Tang
- Midea Group, Shanghai, 201702, China.
- Yangchun Zhang
- Department of Mathematics, College of Sciences, Shanghai University, Shanghai, 200444, China.
- Yan Peng
- School of Artificial Intelligence, Shanghai University, Shanghai, 200444, China.
- Yaxin Peng
- Department of Mathematics, College of Sciences, Shanghai University, Shanghai, 200444, China.