Reenvisioning the comparison between Neural Collaborative Filtering and Matrix Factorization / Anelli, Vito Walter; Bellogín, Alejandro; Di Noia, Tommaso; Pomo, Claudio. - Print. - (2021), pp. 521-529. (Paper presented at the 15th ACM Conference on Recommender Systems, RecSys '21, held in Amsterdam, September 27 - October 1, 2021) [10.1145/3460231.3475944].
Reenvisioning the comparison between Neural Collaborative Filtering and Matrix Factorization
Anelli, Vito Walter; Bellogín, Alejandro; Di Noia, Tommaso; Pomo, Claudio
2021
Abstract
Collaborative filtering models based on matrix factorization and on similarities learned with Artificial Neural Networks (ANNs) have gained significant attention in recent years. This is, in part, because ANNs have demonstrated strong results across a wide variety of recommendation tasks. However, the introduction of ANNs into the recommendation ecosystem has recently been questioned, prompting several comparisons in terms of efficiency and effectiveness. One aspect most of these comparisons have in common is their focus on accuracy, neglecting other evaluation dimensions that are important for recommendation, such as novelty, diversity, or accounting for biases. In this work, we replicate experiments from three different papers that compare Neural Collaborative Filtering (NCF) and Matrix Factorization (MF), extending the analysis to other evaluation dimensions. First, we show that the experiments under analysis are entirely reproducible, and we extend the study by including additional accuracy metrics and two statistical hypothesis tests. Second, we investigate the diversity and novelty of the recommendations, showing that MF also provides better accuracy on the long tail, although NCF provides better item coverage and more diversified recommendation lists. Lastly, we discuss the bias effect generated by the tested methods: they show a relatively small bias, but other recommendation baselines with competitive accuracy consistently prove to be less affected by this issue. To the best of our knowledge, this is the first work in which several complementary evaluation dimensions have been explored for an array of state-of-the-art algorithms covering recent adaptations of ANNs and MF. Hence, we aim to show the potential these techniques have for beyond-accuracy evaluation, while analyzing the effect these complementary dimensions may have on reproducibility. The code to reproduce the experiments is publicly available on GitHub at https://tny.sh/Reenvisioning.
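To make the beyond-accuracy dimensions mentioned in the abstract concrete, below is a minimal Python sketch of two of them, item coverage and novelty. This is an illustration under stated assumptions, not the paper's evaluation code: the function names, the toy data, and the self-information definition of novelty are choices made here for the example.

```python
# Illustrative sketch of two beyond-accuracy metrics (assumed definitions,
# not taken from the paper): item coverage and novelty of top-k lists.
from collections import Counter
from math import log2

def item_coverage(rec_lists, catalog_size):
    """Fraction of the item catalog appearing in at least one recommendation list."""
    recommended = {item for items in rec_lists.values() for item in items}
    return len(recommended) / catalog_size

def novelty(rec_lists, train_interactions):
    """Mean self-information of recommended items: -log2 of the empirical
    probability that a random training interaction involves the item.
    Rarer (long-tail) items contribute higher novelty."""
    pop = Counter(item for _, item in train_interactions)
    total = sum(pop.values())
    scores = []
    for items in rec_lists.values():
        for item in items:
            p = pop.get(item, 0) / total
            if p > 0:  # items never seen in training are skipped to avoid log(0)
                scores.append(-log2(p))
    return sum(scores) / len(scores) if scores else 0.0

# Toy usage: a 6-item catalog, a small (user, item) training log, two users.
train = [(0, 0), (0, 1), (1, 1), (1, 2), (2, 1)]
recs = {0: [2, 3], 1: [1, 4]}
print(item_coverage(recs, catalog_size=6))  # 4/6: four distinct items recommended
print(novelty(recs, train))                 # ~1.53 bits on average
```

In the toy run, the rarely interacted item 2 contributes more novelty than the popular item 1; items 3 and 4, absent from training, are skipped, so a real evaluation would need an explicit smoothing choice for cold items.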