
A Novel Nonlinear Deep Reinforcement Learning Controller for DC–DC Power Buck Converters

Authors :
Meysam Gheisarnejad
Mohammad Hassan Khooban
Hamed Farsizadeh
Source :
Gheisarnejad, M, Farsizadeh, H & Khooban, M H 2021, 'A Novel Nonlinear Deep Reinforcement Learning Controller for DC–DC Power Buck Converters', IEEE Transactions on Industrial Electronics, vol. 68, no. 8, pp. 6849-6858. https://doi.org/10.1109/TIE.2020.3005071
Publication Year :
2021
Publisher :
Institute of Electrical and Electronics Engineers (IEEE)

Abstract

Nonlinearities and unmodeled dynamics inevitably degrade the quality and reliability of power conversion and, as a result, pose significant challenges to high-performance voltage stabilization of dc–dc buck converters. The stability of such power electronic equipment is further threatened when feeding nonideal constant power loads (CPLs) because of the negative impedance characteristics they induce. In response to these challenges, advanced regulatory and technological mechanisms for the converters need to be developed so that these interface systems can be implemented efficiently in the microgrid configuration. This article presents an intelligent proportional-integral (iPI) controller based on a sliding mode (SM) observer, in the ultralocal model sense, to mitigate the destructive impedance instabilities of nonideal CPLs with a time-varying nature. In particular, an auxiliary deep deterministic policy gradient (DDPG) controller is adaptively developed to decrease the observer estimation error and further improve the dynamic characteristics of dc–dc buck converters. The DDPG design comprises two parts: (i) an actor network, which generates the policy commands, and (ii) a critic network, which evaluates the quality of the policy commands generated by the actor. The suggested strategy establishes DDPG-based control to compensate for what the iPI-based SM observer cannot. In this application, the weight coefficients of the actor and critic networks are trained on the reward feedback of the voltage error using the gradient descent scheme. Finally, to investigate the merits and implementation feasibility of the suggested method, experimental results on a laboratory prototype of a dc–dc buck converter feeding a time-varying CPL are presented.
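A minimal sketch of the actor-critic update described above may help make the training loop concrete. The snippet below is illustrative only: the state definition (voltage error and its integral), network sizes, learning rates, and the squared-error reward are assumptions, not taken from the paper, and the replay buffer and target networks of a full DDPG implementation are omitted for brevity.

```python
# Illustrative DDPG actor-critic sketch in PyTorch. All names, sizes, and
# the reward shape are assumptions, not the paper's exact implementation.
import torch
import torch.nn as nn

class Actor(nn.Module):
    """Maps the converter state (here assumed: voltage error and its
    integral) to a bounded corrective control signal, e.g., a duty-cycle
    adjustment."""
    def __init__(self, state_dim=2, action_dim=1, hidden=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, action_dim), nn.Tanh(),  # bounded output
        )
    def forward(self, s):
        return self.net(s)

class Critic(nn.Module):
    """Scores a (state, action) pair with an estimated return Q(s, a)."""
    def __init__(self, state_dim=2, action_dim=1, hidden=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim + action_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, 1),
        )
    def forward(self, s, a):
        return self.net(torch.cat([s, a], dim=-1))

actor, critic = Actor(), Critic()
actor_opt = torch.optim.SGD(actor.parameters(), lr=1e-3)   # gradient descent
critic_opt = torch.optim.SGD(critic.parameters(), lr=1e-3)
gamma = 0.99  # discount factor (assumed value)

def update(s, a, r, s_next):
    """One DDPG update from a single transition (s, a, r, s_next)."""
    # Critic: regress Q(s, a) toward the one-step bootstrapped target.
    with torch.no_grad():
        target = r + gamma * critic(s_next, actor(s_next))
    critic_loss = nn.functional.mse_loss(critic(s, a), target)
    critic_opt.zero_grad(); critic_loss.backward(); critic_opt.step()
    # Actor: ascend the critic's estimate of the actor's own action.
    actor_loss = -critic(s, actor(s)).mean()
    actor_opt.zero_grad(); actor_loss.backward(); actor_opt.step()

# Example transition; reward penalizes squared voltage error (assumed shaping).
s = torch.tensor([[0.4, 0.1]])            # (voltage error, its integral)
a = actor(s).detach()
r = -(s[:, :1] ** 2)
s_next = torch.tensor([[0.3, 0.12]])
update(s, a, r, s_next)
```

The critic is fit by gradient descent toward a one-step bootstrapped target, and the actor is updated by ascending the critic's valuation of its own action, mirroring the reward-feedback training of the two networks described in the abstract.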

Details

ISSN :
1557-9948 and 0278-0046
Volume :
68
Database :
OpenAIRE
Journal :
IEEE Transactions on Industrial Electronics
Accession number :
edsair.doi.dedup.....91742f83aa0cf8847271d85b28648bde