
A switching control strategy for policy selection in stochastic Dynamic Programming problems / Tipaldi, Massimo; Iervolino, Raffaele; Massenio, Paolo Roberto; Naso, David. - In: AUTOMATICA. - ISSN 0005-1098. - 171:(2025). [10.1016/j.automatica.2024.111884]

A switching control strategy for policy selection in stochastic Dynamic Programming problems

Tipaldi, Massimo; Iervolino, Raffaele; Massenio, Paolo Roberto; Naso, David
2025-01-01

Abstract

This paper presents a switching control strategy as a criterion for policy selection in stochastic Dynamic Programming problems over an infinite time horizon. In particular, the Bellman operator, applied iteratively to solve such problems, is generalized to the case of stochastic policies and formulated as a discrete-time switched affine system. Then, a Lyapunov-based policy selection strategy is designed to ensure the practical convergence of the resulting closed-loop system trajectories towards an appropriately chosen reference value function. In this way, it is possible to verify how the chosen reference value function can be approached by using a stabilizing switching signal, which is defined on a given finite set of stationary stochastic policies. Finally, the presented method is applied to the Value Iteration algorithm, and an illustrative example of a recycling robot is provided to demonstrate its effectiveness in terms of convergence.
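To make the switched affine view concrete: for a stationary stochastic policy pi, the Bellman operator T_pi(V) = r_pi + gamma * P_pi * V is affine in V, so iterating it while choosing the policy sigma_k at each step from a finite set gives V_{k+1} = A_{sigma_k} V_k + b_{sigma_k}, with A_i = gamma * P_{pi_i} and b_i = r_{pi_i}. The sketch below assumes a two-state recycling-robot model (battery level high or low) whose probabilities, rewards, discount factor, and both policies are hypothetical, and it picks the reference value function V_REF as the value of a 50/50 mixture of the two policies (also an assumption). The switching rule is a simple greedy stand-in (choose the mode whose next iterate is closest to V_REF in squared Euclidean distance), not the Lyapunov-based condition derived in the paper; all helper names are illustrative.

```python
"""Sketch: Bellman operators of stochastic policies as a switched affine system.

Model parameters, policies, reference choice, and the greedy switching rule
are hypothetical illustrations, not the paper's construction.
"""

GAMMA = 0.9  # discount factor (assumed)
ALPHA, BETA = 0.9, 0.6  # battery-transition probabilities (assumed)
R_SEARCH, R_WAIT, R_RESCUE = 2.0, 1.0, -3.0  # rewards (assumed)
HIGH, LOW = 0, 1

# MODEL[s][a] = list of (probability, next_state, reward) triples
MODEL = {
    HIGH: {
        "search": [(ALPHA, HIGH, R_SEARCH), (1 - ALPHA, LOW, R_SEARCH)],
        "wait": [(1.0, HIGH, R_WAIT)],
    },
    LOW: {
        "search": [(BETA, LOW, R_SEARCH), (1 - BETA, HIGH, R_RESCUE)],
        "wait": [(1.0, LOW, R_WAIT)],
        "recharge": [(1.0, HIGH, 0.0)],
    },
}


def affine_mode(policy):
    """Return (A, b) with T_pi(V) = A V + b, i.e. A = GAMMA * P_pi, b = r_pi."""
    n = len(MODEL)
    A = [[0.0] * n for _ in range(n)]
    b = [0.0] * n
    for s, actions in MODEL.items():
        for a, prob_a in policy[s].items():  # pi(a | s)
            for p, s_next, r in actions[a]:
                A[s][s_next] += GAMMA * prob_a * p
                b[s] += prob_a * p * r  # expected one-step reward
    return A, b


def apply(mode, v):
    """One step of the affine map v -> A v + b."""
    A, b = mode
    return [sum(A[i][j] * v[j] for j in range(len(v))) + b[i] for i in range(len(v))]


def sq_dist(u, w):
    return sum((x - y) ** 2 for x, y in zip(u, w))


def solve_2x2(A, b):
    """Fixed point of V = A V + b for two states: solve (I - A) V = b by Cramer's rule."""
    m00, m01 = 1 - A[0][0], -A[0][1]
    m10, m11 = -A[1][0], 1 - A[1][1]
    det = m00 * m11 - m01 * m10
    return [(m11 * b[0] - m01 * b[1]) / det, (m00 * b[1] - m10 * b[0]) / det]


# Finite set of stationary policies (degenerate stochastic policies here).
POLICY_A = {HIGH: {"search": 1.0}, LOW: {"recharge": 1.0}}
POLICY_B = {HIGH: {"wait": 1.0}, LOW: {"wait": 1.0}}
MODES = [affine_mode(POLICY_A), affine_mode(POLICY_B)]

# Hypothetical reference: value of the 50/50 mixed policy, which is the
# fixed point of neither mode on its own.
MIXED = {HIGH: {"search": 0.5, "wait": 0.5}, LOW: {"recharge": 0.5, "wait": 0.5}}
V_REF = solve_2x2(*affine_mode(MIXED))


def switched_value_iteration(v0, steps):
    """Greedy min-switching: at each step apply the mode whose output is closest to V_REF."""
    v, history = list(v0), []
    for _ in range(steps):
        candidates = [apply(mode, v) for mode in MODES]
        sigma = min(range(len(MODES)), key=lambda i: sq_dist(candidates[i], V_REF))
        v = candidates[sigma]
        history.append((sigma, v))
    return history


if __name__ == "__main__":
    hist = switched_value_iteration([0.0, 0.0], 300)
    tail_err = max(max(abs(x - r) for x, r in zip(v, V_REF)) for _, v in hist[-20:])
    print("V_ref:", V_REF)
    print("last modes:", [s for s, _ in hist[-6:]])
    print("max tail error:", tail_err)
    # Baselines: never switching converges to each policy's own value function.
    for name, mode in zip("AB", MODES):
        print(name, solve_2x2(*mode))
```

The mixture is used only to place V_REF between the two modes' fixed points: because P_pi and r_pi are linear in the action probabilities, the mixed policy's operator is the average of the two affine modes, so V_REF is an equilibrium of the averaged system but not of either mode. With this setup the greedy signal ends up alternating between the two policies, and the iterates should settle into a bounded oscillation around V_REF rather than converging to it exactly, which is the sense of practical convergence used in the abstract.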

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/11589/274483