The Big Win Strategy on Multi-Value Network

Authors :: Shun-Shii Lin
Nai-Yuan Chang
Surag Nair
Chih-Hung Chen
Source :: Proceedings of the 2018 International Conference on Machine Learning and Machine Intelligence.
Publication Year :: 2018
Publisher :: ACM, 2018.
Abstract: The AlphaZero approach has been highly successful, achieving superhuman performance across many challenging games, but we think at least three aspects of it can be improved. First, AlphaZero only estimates whether a game will end in a win, draw, or loss and ignores how many points it will gain or lose. Second, AlphaZero uses Monte-Carlo Tree Search to derive an average value over the values of all child nodes in a subtree. Third, AlphaZero does not consider depth rewards during the Monte-Carlo Tree Search. To address these three problems, we introduce a general-purpose framework, the Big-Best-Quick win strategy in Monte-Carlo Tree Search, that aims to surpass the AlphaZero approach. In this paper, we focus mainly on the Big-win strategy to improve the performance of AlphaZero without human knowledge. We obtain promising results: our Big-win approach improves the strength of a 6x6 Othello program, achieving a win rate of 63%, a loss rate of 28%, and a draw rate of 9% against the original AlphaZero approach under fair training and playing-time conditions.
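
The first point in the abstract, that a win/draw/loss estimate discards the size of the win, can be illustrated with a minimal sketch. This is not the paper's implementation: the function names and the normalization by the 36 cells of a 6x6 board are assumptions made purely for illustration. It shows that two very different final disc counts collapse to the same ternary outcome while a margin-based target keeps them apart.

```python
def outcome_target(own_discs: int, opp_discs: int) -> int:
    """Ternary game result: +1 for a win, 0 for a draw, -1 for a loss."""
    diff = own_discs - opp_discs
    return (diff > 0) - (diff < 0)


def margin_target(own_discs: int, opp_discs: int, board_cells: int = 36) -> float:
    """Signed disc margin scaled to [-1, 1].

    Dividing by the number of board cells is a hypothetical normalization,
    not one taken from the paper.
    """
    return (own_discs - opp_discs) / board_cells


# A narrow win, a large win, and a draw on a 6x6 Othello board.
for own, opp in [(19, 17), (30, 6), (18, 18)]:
    print(own, opp, outcome_target(own, opp), margin_target(own, opp))
```

Both the 19-17 and the 30-6 games map to an outcome of +1, whereas the margin target separates them; this is the information that a value estimate limited to win, draw, or loss cannot express.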

Subjects :: Value (ethics)
Focus (computing)
Artificial neural network
business.industry
Computer science
0102 computer and information sciences
02 engineering and technology
01 natural sciences
Tree (data structure)
Value network
010201 computation theory & mathematics
0202 electrical engineering, electronic engineering, information engineering
020201 artificial intelligence & image processing
Artificial intelligence
business

Database :: OpenAIRE
Journal :: Proceedings of the 2018 International Conference on Machine Learning and Machine Intelligence
Accession number :: edsair.doi...........74603cb9b76ad46a6630c5608bae78f3
Full Text :: https://doi.org/10.1145/3278312.3278325