Stealthy LLM-Driven Data Poisoning Attacks Against Embedding-Based Retrieval-Augmented Recommender Systems

Nazary, Fatemeh; Deldjoo, Yashar; Di Noia, Tommaso; Di Sciascio, Eugenio
2025

Abstract

We present a systematic study of provider-side data poisoning in retrieval-augmented generation (RAG)-based recommender systems. By modifying only a small fraction of tokens within item descriptions, for instance by adding emotional keywords or borrowing phrases from semantically related items, an attacker can significantly promote or demote targeted items. We formalize these attacks under token-edit and semantic-similarity constraints, and we examine their effectiveness in both promotion (long-tail items) and demotion (short-head items) scenarios. Our experiments on MovieLens, using two large language model (LLM) retrieval modules, show that even subtle attacks shift final rankings and item exposure while eluding naive detection. The results underscore the vulnerability of RAG-based pipelines to small-scale metadata rewrites and emphasize the need for robust textual consistency checks and provenance tracking to thwart stealthy provider-side poisoning.
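The abstract constrains poisoned rewrites by a token-edit budget and a semantic-similarity threshold. The following is a minimal, hypothetical sketch of how such a stealth check could be expressed; it is not the paper's implementation, and the function names, thresholds, and the assumption that item embeddings arrive as NumPy vectors are all illustrative.

# A minimal sketch (not the authors' code) of the two stealth constraints
# described in the abstract: a token-edit budget and a semantic-similarity
# threshold. All names and thresholds are illustrative assumptions.
import numpy as np


def token_edit_fraction(original: str, poisoned: str) -> float:
    """Fraction of tokens changed, via token-level Levenshtein distance."""
    a, b = original.split(), poisoned.split()
    dp = np.zeros((len(a) + 1, len(b) + 1), dtype=int)
    dp[:, 0] = np.arange(len(a) + 1)
    dp[0, :] = np.arange(len(b) + 1)
    for i in range(1, len(a) + 1):
        for j in range(1, len(b) + 1):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            dp[i, j] = min(dp[i - 1, j] + 1,      # deletion
                           dp[i, j - 1] + 1,      # insertion
                           dp[i - 1, j - 1] + cost)  # substitution
    return dp[len(a), len(b)] / max(len(a), 1)


def is_stealthy(orig_emb: np.ndarray, pois_emb: np.ndarray,
                original: str, poisoned: str,
                max_edit_frac: float = 0.05,   # hypothetical budget
                min_cos_sim: float = 0.95) -> bool:  # hypothetical threshold
    """Accept a rewrite only if it satisfies both stealth constraints."""
    cos = float(orig_emb @ pois_emb /
                (np.linalg.norm(orig_emb) * np.linalg.norm(pois_emb)))
    return (token_edit_fraction(original, poisoned) <= max_edit_frac
            and cos >= min_cos_sim)

Under this reading, a rewrite that swaps a handful of tokens while keeping the description's embedding nearly unchanged would pass both checks, which is exactly what makes such attacks hard for naive detectors to flag.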
2025
33rd Conference on User Modeling, Adaptation and Personalization, UMAP 2025
979-8-4007-1399-6
Stealthy LLM-Driven Data Poisoning Attacks Against Embedding-Based Retrieval-Augmented Recommender Systems / Nazary, Fatemeh; Deldjoo, Yashar; Di Noia, Tommaso; Di Sciascio, Eugenio. - ELECTRONIC. - (2025), pp. 98-102. (33rd Conference on User Modeling, Adaptation and Personalization, UMAP 2025, New York City, June 16-19, 2025) [10.1145/3708319.3733675].

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/11589/292026