M0RV Model: Advancing the MuZero Algorithm Through Strategic Data Optimization Reuse and Value Function Refinement

Authors :
Xuejian Chen
Yang Cao
Hongfei Yu
Caihua Sun
Source :
IEEE Access, Vol 12, Pp 120827-120839 (2024)
Publication Year :
2024
Publisher :
IEEE, 2024.

Abstract

This paper introduces M0RV, a model that improves the MuZero algorithm through data reuse and loss function optimization. Training trajectories generated by Monte Carlo Tree Search (MCTS) are filtered through an evaluation function trace and fed back into the training process; on this basis, the Advantage-Value method is used to optimize the neural network loss function and, in turn, the training process. A comparative analysis is conducted between the baseline MuZero algorithm, M0GB (its variant enhanced with the A0GB algorithm), and the further refined M0RV algorithm across a range of Atari and complex board games. M0RV outperforms both predecessors in the Lunar Lander and Breakout games, as well as in the board game Hex, under the same step parameters and unified reward benchmarks. The empirical findings indicate that, compared with MuZero, M0RV substantially improves training efficacy, meeting the objective of optimizing the training methodology.

Details

Language :
English
ISSN :
21693536
Volume :
12
Database :
Directory of Open Access Journals
Journal :
IEEE Access
Publication Type :
Academic Journal
Accession number :
edsdoj.776ec2f4ce384110a191e1a24aebeb37
Document Type :
article
Full Text :
https://doi.org/10.1109/ACCESS.2024.3450297