
Finding Optimal Observation-Based Policies for Constrained POMDPs Under the Expected Average Reward Criterion.

Authors:
Jiang, Xiaofeng
Xi, Hongsheng
Wang, Xiaodong
Liu, Falin
Source:
IEEE Transactions on Automatic Control. Oct 2016, Vol. 61, Issue 10, p3070-3075. 6p.
Publication Year:
2016

Abstract

In this technical note, constrained partially observable Markov decision processes (POMDPs) with discrete state and action spaces are studied under the average reward criterion from a sensitivity point of view. By analyzing the derivatives of the performance criteria, we develop a simulation-based optimization algorithm that finds the optimal observation-based policy from a single sample path. The algorithm does not require overly strict assumptions and applies to general ergodic Markov systems with imperfect state information. The performance is proved to converge to the optimum with probability 1. A numerical example illustrates the applicability of the algorithm. [ABSTRACT FROM AUTHOR]
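The abstract does not spell out the algorithm, but the kind of single-sample-path, derivative-based policy optimization it describes can be sketched in miniature. The following is a hypothetical illustration only, not the authors' method: a score-function (likelihood-ratio) gradient step for an observation-based softmax policy on a made-up two-state, two-observation, two-action ergodic POMDP. All model quantities (transition, observation, and reward tables) and the estimator itself are assumptions for illustration.

```python
# Hedged sketch: one simulation-based gradient step for an
# observation-based stochastic policy, estimated from a single
# sample path of a toy POMDP. Not the paper's algorithm.
import math
import random

# Toy model (made-up numbers). P[a][s] -> next-state distribution,
# O[s] -> observation distribution, R[a][s] -> immediate reward.
P = {0: [[0.9, 0.1], [0.2, 0.8]],
     1: [[0.5, 0.5], [0.6, 0.4]]}
O = [[0.8, 0.2], [0.3, 0.7]]
R = {0: [1.0, 0.0], 1: [0.0, 2.0]}

def policy_probs(theta, o):
    """Softmax action probabilities given observation o (observation-based policy)."""
    logits = theta[o]
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    z = sum(exps)
    return [e / z for e in exps]

def sample(probs, rng):
    """Draw an index from a discrete distribution."""
    u, c = rng.random(), 0.0
    for i, p in enumerate(probs):
        c += p
        if u < c:
            return i
    return len(probs) - 1

def run_gradient_step(theta, horizon, step, rng):
    """Simulate one sample path, estimate a policy gradient via the
    score function d/d theta log pi(a|o), and take one ascent step.
    Returns (updated parameters, observed average reward)."""
    s = 0
    grad = [[0.0, 0.0], [0.0, 0.0]]
    total = 0.0
    for _ in range(horizon):
        o = sample(O[s], rng)
        probs = policy_probs(theta, o)
        a = sample(probs, rng)
        r = R[a][s]
        total += r
        # Score of the softmax: 1{b == a} - pi(b|o), weighted by reward.
        for b in range(2):
            grad[o][b] += ((1.0 if b == a else 0.0) - probs[b]) * r
        s = sample(P[a][s], rng)
    avg = total / horizon
    new_theta = [[theta[o][b] + step * grad[o][b] / horizon
                  for b in range(2)] for o in range(2)]
    return new_theta, avg
```

A real counterpart would also enforce the constraints (e.g. via a Lagrangian or projection) and use diminishing step sizes to obtain the probability-1 convergence the note proves; this sketch omits both.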

Details

Language:
English
ISSN:
0018-9286
Volume:
61
Issue:
10
Database:
Academic Search Index
Journal:
IEEE Transactions on Automatic Control
Publication Type:
Periodical
Accession number:
118364176
Full Text:
https://doi.org/10.1109/TAC.2015.2497904