Bridging Clinical Knowledge and Reinforcement Learning in Automated Insulin Delivery: An LLM-in-the-Loop Approach / Lops, Giada; Ramdan, Taha; Racanelli, Vito Andrea; De Cicco, Luca; Mascolo, Saverio. - (2026). (European Control Conference (ECC) 2026, Reykjavík, Iceland, July 7-10, 2026).
Bridging Clinical Knowledge and Reinforcement Learning in Automated Insulin Delivery: An LLM-in-the-Loop Approach
Giada Lops; Vito Andrea Racanelli; Luca De Cicco; Saverio Mascolo
2026
Abstract
Automated insulin delivery (AID) requires controllers that are both adaptive and safe. This study proposes a hybrid control framework that combines a reinforcement learning (RL) agent with a language-guided advisory layer in the SimGlucose simulator of the FDA-approved UVA/Padova type 1 diabetes model. A Proximal Policy Optimization (PPO) agent learns insulin dosing via an asymmetric, safety-weighted reward, while a fine-tuned Falcon-RW-1B model provides guideline-consistent recommendations. A supervisory fusion rule merges both outputs according to policy uncertainty and medical constraints (suspension < 90 mg/dL, recovery cap > 70 mg/dL, rate limit 0.03 U/min). Across ten virtual adults and twenty stochastic meal scenarios, the hybrid RL+LLM controller achieved a time-in-range of 86% ± 7.3% and reduced exposure to hypoglycemia compared with the considered baselines, while maintaining total insulin delivery within a limited deviation from reference levels. A quasi-counterfactual analysis indicated strong alignment between rule activations and action changes (fidelity ≈ 1.0, validity ≥ 90%). These results suggest that hybrid RL-LLM architectures are a promising direction for safe and adaptive closed-loop insulin control.
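The medical constraints quoted in the abstract can be illustrated with a minimal sketch of a supervisory safety filter. This is an assumption-laden illustration, not the authors' implementation: the function name, signature, and the exact interpretation of the "recovery cap" are hypothetical; only the three numeric thresholds (suspension below 90 mg/dL, recovery reference of 70 mg/dL, rate limit of 0.03 U/min) come from the abstract.

```python
# Hypothetical sketch of a supervisory safety layer enforcing the
# constraints stated in the abstract. Names and structure are
# illustrative assumptions, not the paper's actual fusion rule.

def safety_filter(glucose_mg_dl: float,
                  proposed_rate_u_min: float,
                  prev_rate_u_min: float,
                  suspend_below: float = 90.0,   # suspension threshold (mg/dL)
                  max_delta: float = 0.03) -> float:  # rate limit (U/min)
    """Clip a proposed basal insulin rate to satisfy the safety rules."""
    # Rate limit: bound the change relative to the previous basal rate.
    rate = max(prev_rate_u_min - max_delta,
               min(proposed_rate_u_min, prev_rate_u_min + max_delta))
    # Suspension: deliver no insulin while glucose is below the threshold.
    # (The abstract's "recovery cap > 70 mg/dL" would further gate how
    # dosing resumes after a low; that logic is omitted here.)
    if glucose_mg_dl < suspend_below:
        rate = 0.0
    # Insulin rates cannot be negative.
    return max(rate, 0.0)
```

Used as a wrapper, such a filter would sit between the fused RL+LLM action and the pump command, so the learned policy can never violate the hard clinical limits regardless of its output.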

