Emotion-sensitive deep Dyna-Q learning for task-completion dialogue policy learning.

Authors :: Zhang, Rui
Wang, Zhenyu
Zheng, Mengdan
Zhao, Yangyang
Huang, Zhenhua
Source :: Neurocomputing. Oct 2021, Vol. 459, p122-130. 9p.
Publication Year :: 2021
Abstract :: In recent years, task-oriented dialogue systems have received extensive attention from academia and industry. Training a dialogue agent through reinforcement learning is often costly because it requires many interactions with real users. Although the Deep Dyna-Q (DDQ) framework uses simulation experience to alleviate the cost of direct reinforcement learning, it still suffers from challenges such as delayed rewards and policy degradation. This paper proposes an Emotion-Sensitive Deep Dyna-Q (ES-DDQ) model which: (1) presents an emotional world model that considers emotion-related cues to improve the ability of the traditional DDQ framework to model and simulate users, and (2) designs two kinds of emotion-related immediate rewards to mitigate the delayed reward problem. Experimental results show that our proposed approach effectively simulates users' behaviors and is superior to the state-of-the-art benchmarks. [ABSTRACT FROM AUTHOR]

Subjects :: *DEEP learning
*REWARD (Psychology)
*REINFORCEMENT learning
*DIRECT costing
