Do LLMs Memorize Recommendation Datasets? A Preliminary Study on MovieLens-1M / Di Palma, Dario; Antonio Merra, Felice; Sfilio, Maurizio; Anelli, Vito Walter; Narducci, Fedelucio; Di Noia, Tommaso. - Electronic. - (2025), pp. 2582-2586. (Paper presented at the 48th International ACM SIGIR Conference on Research and Development in Information Retrieval, SIGIR 2025, held in Padova, July 13-18, 2025) [10.1145/3726302.3730178].
Do LLMs Memorize Recommendation Datasets? A Preliminary Study on MovieLens-1M
Dario Di Palma;Vito Walter Anelli;Fedelucio Narducci;Tommaso Di Noia
2025
Abstract
Large Language Models (LLMs) have become increasingly central to recommendation scenarios due to their remarkable natural language understanding and generation capabilities. Although significant research has explored the use of LLMs for various recommendation tasks, little effort has been dedicated to verifying whether they have memorized public recommendation datasets as part of their training data. Such memorization is undesirable because it reduces the generalizability of research findings: benchmarking on memorized datasets does not guarantee generalization to unseen datasets. Furthermore, memorization can amplify biases; for example, some popular items may be recommended more frequently than others.

| File | Access | Type | License | Size | Format |
|---|---|---|---|---|---|
| 2025_Do_LLMs_Memorize_Recommendation_Datasets?_A_Preliminary_Study_on_MovieLens-1M_pdfeditoriale.pdf | Open access | Publisher's version | Creative Commons | 960.98 kB | Adobe PDF |
Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

