M0RV Model: Advancing the MuZero Algorithm Through Strategic Data Optimization Reuse and Value Function Refinement

Authors :: Xuejian Chen
Yang Cao
Hongfei Yu
Caihua Sun
Source :: IEEE Access, Vol 12, Pp 120827-120839 (2024)
Publication Year :: 2024
Publisher :: IEEE, 2024.
Abstract :: This paper introduces M0RV, a model that improves the MuZero algorithm through data reuse and loss function optimization. It proposes filtering the training trajectories generated by Monte Carlo Tree Search (MCTS) through an evaluation function and reusing them in the training process. On this basis, it employs the Advantage-Value method to optimize the neural network loss function and, in turn, the training process as a whole. The baseline MuZero algorithm, its A0GB-enhanced variant M0GB, and the further refined M0RV algorithm are compared across a range of Atari games and complex board games. Notably, M0RV outperforms its predecessors in the Lunar Lander and Breakout games, as well as in the board game Hex, under consistent step parameters and unified reward benchmarks. The empirical findings show that M0RV substantially improves training efficacy over MuZero, meeting the objective of optimizing the training methodology.

Subjects :: MuZero
MCTS
game
training optimization
Electrical engineering. Electronics. Nuclear engineering
TK1-9971
