A switching control strategy for policy selection in stochastic Dynamic Programming problems / Tipaldi, Massimo; Iervolino, Raffaele; Massenio, Paolo Roberto; Naso, David. - In: AUTOMATICA. - ISSN 0005-1098. - 171:(2024). [10.1016/j.automatica.2024.111884]
A switching control strategy for policy selection in stochastic Dynamic Programming problems
Tipaldi, Massimo; Massenio, Paolo Roberto; Naso, David
2024-01-01
Abstract
This paper presents a switching control strategy as a criterion for policy selection in stochastic Dynamic Programming problems over an infinite time horizon. In particular, the Bellman operator, which is applied iteratively to solve such problems, is generalized to the case of stochastic policies and formulated as a discrete-time switched affine system. A Lyapunov-based policy selection strategy is then designed to ensure the practical convergence of the resulting closed-loop system trajectories towards an appropriately chosen reference value function. In this way, it is possible to verify how the chosen reference value function can be approached by using a stabilizing switching signal defined on a given finite set of stationary stochastic policies. Finally, the presented method is applied to the Value Iteration algorithm, and an illustrative example of a recycling robot is provided to demonstrate its effectiveness in terms of convergence.
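To make the switched affine view in the abstract concrete, the following is a minimal Python sketch and not the authors' implementation. For a stationary stochastic policy, the Bellman operator can be written as V_next = A V + b, with A equal to the discount factor times the policy-averaged transition matrix and b the policy-averaged expected reward. The switching rule used here (pick the policy whose update lands closest to the reference in the sup norm) is an assumed stand-in for the paper's Lyapunov-based design. The recycling-robot parameters (`ALPHA`, `BETA`, the rewards, `GAMMA`), the three policies, and the helper names `affine_mode` and `run_switched` are all hypothetical choices made for illustration.

```python
import numpy as np

GAMMA = 0.9                                  # discount factor (assumed)
ALPHA, BETA = 0.8, 0.6                       # battery-transition probabilities (assumed)
R_SEARCH, R_WAIT, R_RESCUE = 2.0, 1.0, -3.0  # rewards (assumed)

# States: 0 = high battery, 1 = low battery.
# Actions: 0 = search, 1 = wait, 2 = recharge.
# P[a, s, t] = probability of moving from state s to state t under action a.
P = np.array([
    [[ALPHA, 1 - ALPHA], [1 - BETA, BETA]],  # search
    [[1.0, 0.0], [0.0, 1.0]],                # wait
    [[1.0, 0.0], [1.0, 0.0]],                # recharge (never selected in state 0 below)
])
# R[s, a] = expected one-step reward for taking action a in state s.
R = np.array([
    [R_SEARCH, R_WAIT, 0.0],
    [BETA * R_SEARCH + (1 - BETA) * R_RESCUE, R_WAIT, 0.0],
])

# A finite set of stationary stochastic policies; policy[s, a] = P(a | s).
POLICIES = [
    np.array([[1.0, 0.0, 0.0], [1.0, 0.0, 0.0]]),  # always search
    np.array([[1.0, 0.0, 0.0], [0.0, 0.0, 1.0]]),  # search when high, recharge when low
    np.array([[0.5, 0.5, 0.0], [0.0, 0.5, 0.5]]),  # randomized policy
]


def affine_mode(P, R, policy):
    """Return (A, b) such that the policy's Bellman update is V_next = A @ V + b."""
    P_pi = np.einsum("sa,ast->st", policy, P)  # policy-averaged transition matrix
    r_pi = np.sum(policy * R, axis=1)          # policy-averaged expected reward
    return GAMMA * P_pi, r_pi


def run_switched(modes, v_ref, v0, steps):
    """Iterate the switched affine system with a greedy min-distance rule.

    At each step, every mode is applied to the current V and the one whose
    result is closest to v_ref in the sup norm is kept (illustrative rule only).
    """
    v = np.asarray(v0, dtype=float)
    dists, chosen = [], []
    for _ in range(steps):
        candidates = [A @ v + b for A, b in modes]
        errs = [np.max(np.abs(c - v_ref)) for c in candidates]
        i = int(np.argmin(errs))
        v = candidates[i]
        dists.append(errs[i])
        chosen.append(i)
    return v, dists, chosen


modes = [affine_mode(P, R, pi) for pi in POLICIES]

# Reference value function: here the fixed point of the second policy,
# obtained by solving (I - A) V = b. This choice is an assumption.
A_ref, b_ref = modes[1]
v_ref = np.linalg.solve(np.eye(2) - A_ref, b_ref)

v_final, dists, chosen = run_switched(modes, v_ref, np.zeros(2), steps=200)
print("reference:", v_ref)
print("final value:", v_final)
print("final sup-norm distance:", dists[-1])
```

Because the reference in this sketch is the fixed point of one of the modes, and each mode is a contraction with factor `GAMMA` in the sup norm, the greedy rule shrinks the distance to the reference by at least that factor at every step. For a reference that is not the value function of any single policy in the set, a rule of this kind can at best be expected to reach a neighborhood of the reference, which is the sense of "practical convergence" in the abstract.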