
Bridging Clinical Knowledge and Reinforcement Learning in Automated Insulin Delivery: An LLM-in-the-Loop Approach / Lops, Giada; Ramdan, Taha; Racanelli, Vito Andrea; De Cicco, Luca; Mascolo, Saverio. - (2026). (European Control Conference (ECC) 2026, Reykjavík, Iceland, July 7-10, 2026).

Bridging Clinical Knowledge and Reinforcement Learning in Automated Insulin Delivery: An LLM-in-the-Loop Approach

Giada Lops; Taha Ramdan; Vito Andrea Racanelli; Luca De Cicco; Saverio Mascolo
2026

Abstract

Automated insulin delivery (AID) requires controllers that are both adaptive and safe. This study proposes a hybrid control framework that combines a reinforcement learning (RL) agent with a language-guided advisory layer in the SimGlucose simulator of the FDA-approved UVA/Padova type 1 diabetes model. A Proximal Policy Optimization (PPO) agent learns insulin dosing via an asymmetric, safety-weighted reward, while a fine-tuned Falcon-RW-1B model provides guideline-consistent recommendations. A supervisory fusion rule merges both outputs according to policy uncertainty and medical constraints (suspension < 90 mg/dL, recovery cap > 70 mg/dL, rate limit 0.03 U/min). Across ten virtual adults and twenty stochastic meal scenarios, the hybrid RL+LLM controller achieved a time-in-range of 86% ± 7.3% and reduced exposure to hypoglycemia compared with the considered baselines, while maintaining total insulin delivery within a limited deviation from reference levels. A quasi-counterfactual analysis indicated strong alignment between rule activations and action changes (fidelity ≈ 1.0, validity ≥ 90%). These results suggest that hybrid RL–LLM architectures are a promising direction for safe and adaptive closed-loop insulin control.
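The supervisory fusion rule and asymmetric reward described in the abstract can be sketched as follows. Only the 90 mg/dL suspension threshold, the 70 mg/dL recovery level, and the 0.03 U/min rate limit are published figures; the uncertainty-weighted blending, the recovery-cap value, the reward shape, and all function and parameter names are illustrative assumptions, not the paper's actual implementation.

```python
def fuse_actions(u_rl, u_llm, glucose, uncertainty, u_prev,
                 recovering=False,
                 suspend_below=90.0, recovery_cap=0.5, rate_limit=0.03):
    """Supervisory fusion of RL and LLM basal-rate proposals (U/min).

    The 90 mg/dL suspension threshold and 0.03 U/min rate limit come
    from the abstract; the blend, recovery_cap value, and the caller-
    maintained `recovering` flag (set after glucose falls below
    70 mg/dL, cleared once it recovers) are assumptions.
    """
    # Uncertainty-weighted blend: lean on the LLM advisory when the
    # RL policy is uncertain, on the RL action when it is confident.
    w = min(max(uncertainty, 0.0), 1.0)
    u = (1.0 - w) * u_rl + w * u_llm

    # Rate limit: restrict upward changes to 0.03 U/min per step
    # (downward moves are left free so suspension can act at once).
    u = min(u, u_prev + rate_limit)

    # Hard safety rules applied last so they always win.
    if glucose < suspend_below:        # full suspension below 90 mg/dL
        return 0.0
    if recovering:                     # capped dosing after a low, until
        u = min(u, recovery_cap)       # the caller clears the flag
    return max(u, 0.0)


def asymmetric_reward(glucose, target=110.0, low=70.0, high=180.0,
                      hypo_weight=3.0):
    """Asymmetric, safety-weighted reward (shape is an assumption):
    deviations below target are penalized hypo_weight times more than
    deviations above, reflecting the greater danger of lows."""
    err = glucose - target
    penalty = hypo_weight * err**2 if err < 0.0 else err**2
    return -penalty / (high - low)**2  # rough normalization
```

In this sketch the hard constraints are applied after the blend and rate limit, so a suspension can never be overridden by either learned component; that ordering is our design choice, consistent with the abstract's framing of the rules as medical constraints on the merged output.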
European Control Conference (ECC) 2026
Files for this record:
No files are associated with this record.

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/11589/300341