A unified benchmark for deep reinforcement learning-based energy management: Novel training ideas with the unweighted reward.
- Source :
- Energy. Oct2024, Vol. 307, pN.PAG-N.PAG. 1p.
- Publication Year :
- 2024
Abstract
- Deep reinforcement learning stands as a powerful force in the realm of intelligent control for hybrid power systems, yet some imperfections persist in the positive progression of learning-based strategies, necessitating the proposal of essential solutions to address these flaws. Firstly, a public and reliable benchmark model for hybrid powertrains and the optimization results of energy management strategies are essential. Hence, two Python-based standard deep reinforcement learning agents and four Simulink-based hybrid powertrains are employed, forming a co-simulation training approach as the reliable solution. Secondly, a detailed analysis from the perspectives of range, magnitude, and importance reveals that the optimization terms in traditional reward functions can mislead the agent during the training process and require cumbersome weight tuning. Accordingly, this paper proposes a novel training idea that combines the rule-based engine start-stop with an unweighted reward tailored for optimizing engine efficiency and facilitating training progress. Finally, a hardware-in-the-loop test is performed, treating the P2 hybrid electric vehicle as the target. The results show that two deep reinforcement learning-based energy management strategies achieved fuel economies of 6.537 L/100 km and 6.330 L/100 km, respectively, and more efficient and reasonable control sequences ensure the working state of the engine as well as the state of charge of batteries.
• Relying on the Python/Simulink-based co-simulation training (standard RL agents and public hybrid powertrains) achieves more reliable results as benchmarks.
• A novel training idea is proposed, an unweighted reward function is designed, and the imperfections of the traditional reward are also analyzed in detail.
• The hardware-in-the-loop test is conducted, and standard models, algorithms, and results are made public. Without modifying any parameters, these results can be reproduced and used for verification.
[ABSTRACT FROM AUTHOR]
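Editorial illustration (not part of the author's abstract): the magnitude problem the abstract attributes to traditional weighted rewards can be seen in a minimal sketch. The reward form below, a weighted sum of fuel rate and squared state-of-charge deviation, is a commonly used shape assumed here for illustration; it is not taken from the paper. The function name, the weights `alpha` and `beta`, the reference SOC of 0.6, and the operating point are all hypothetical.

```python
def weighted_reward(fuel_rate_g_s, soc, soc_ref=0.6, alpha=1.0, beta=1.0):
    """Illustrative weighted reward (assumed form, not the paper's).

    Returns the reward together with its two weighted terms so their
    magnitudes can be compared directly.
    """
    fuel_term = alpha * fuel_rate_g_s          # typically of order 1 (g/s)
    soc_term = beta * (soc_ref - soc) ** 2     # typically of order 0.01 or less
    return -(fuel_term + soc_term), fuel_term, soc_term


# Hypothetical operating point: 2 g/s fuel flow, SOC 0.1 below its reference.
reward, fuel_term, soc_term = weighted_reward(fuel_rate_g_s=2.0, soc=0.5)

# With equal weights the fuel term dominates the SOC term by this factor,
# so beta would need manual tuning before the SOC term influences training.
ratio = fuel_term / soc_term
print(f"fuel term = {fuel_term:.3f}, SOC term = {soc_term:.4f}, ratio = {ratio:.0f}")
```

Under these assumed numbers the two terms differ by roughly two orders of magnitude, which is the kind of mismatch that makes weight tuning necessary in a weighted formulation.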
Details
- Language :
- English
- ISSN :
- 03605442
- Volume :
- 307
- Database :
- Academic Search Index
- Journal :
- Energy
- Publication Type :
- Academic Journal
- Accession number :
- 179172371
- Full Text :
- https://doi.org/10.1016/j.energy.2024.132687