Broadening the Scope: Evaluating the Potential of Recommender Systems beyond prioritizing Accuracy / Paparella, Vincenzo; Di Palma, Dario; Anelli, Vito Walter; Di Noia, Tommaso. - (2023), pp. 1139-1145. (Paper presented at the 17th ACM Conference on Recommender Systems, RecSys 2023, held in Singapore in 2023) [10.1145/3604915.3610649].
Broadening the Scope: Evaluating the Potential of Recommender Systems beyond prioritizing Accuracy
Vincenzo Paparella;Dario Di Palma;Vito Walter Anelli;Tommaso Di Noia
2023-01-01
Abstract
Although beyond-accuracy metrics have gained attention in the last decade, the accuracy of recommendations is still considered the gold standard for evaluating Recommender Systems (RSs). This approach prioritizes accuracy while neglecting qualities that address broader user needs, such as diversity and novelty, as well as trustworthiness requirements such as user and provider fairness. As a result, a single metric determines the success of an RS, and other criteria are not considered simultaneously. A downside of this practice is that the most accurate model configuration may not excel at the remaining criteria. This study seeks to broaden RS evaluation by introducing a multi-objective evaluation that considers all model configurations simultaneously under several perspectives. To this end, several hyper-parameter configurations of an RS model are trained, and the Pareto-optimal ones are retrieved. The Quality Indicators (QIs) of Pareto frontiers, which are gaining interest in Multi-Objective Optimization research, are adapted to RSs. QIs enable evaluating a model's performance across various configurations while giving the same importance to each metric. The experiments show that this multi-objective evaluation overturns the performance ranking among RSs, paving the way to revisiting the evaluation approaches of the RecSys research community. We release code and datasets in the following GitHub repository: https://github.com/sisinflab/RecMOE.

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.
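To make the evaluation idea in the abstract concrete: each trained hyper-parameter configuration yields a point in metric space (e.g. an accuracy metric and a beyond-accuracy metric, both to be maximized), the Pareto-optimal points are retrieved, and a Quality Indicator summarizes the resulting frontier. The sketch below is illustrative only, not the authors' implementation; it uses hypervolume, a standard QI, which the paper may parameterize differently, and the metric values are made up for the example.

```python
def pareto_front(points):
    """Return the Pareto-optimal points, assuming all objectives are maximized.

    A point is dominated if some other point is at least as good on every
    objective and strictly better on at least one (here: differs somewhere).
    """
    return [
        p for p in points
        if not any(
            all(q[i] >= p[i] for i in range(len(p))) and q != p
            for q in points
        )
    ]


def hypervolume_2d(front, ref=(0.0, 0.0)):
    """Hypervolume QI for a 2-objective maximization front w.r.t. a reference
    point: the area dominated by the front and bounded below by `ref`."""
    hv, prev_y = 0.0, ref[1]
    # Sweep the front from the best first objective downward; each point adds
    # a rectangle whose height is its gain on the second objective.
    for x, y in sorted(front, key=lambda p: p[0], reverse=True):
        hv += (x - ref[0]) * (y - prev_y)
        prev_y = y
    return hv


# Hypothetical (accuracy, novelty) scores of four model configurations.
configs = [(0.8, 0.2), (0.5, 0.5), (0.2, 0.9), (0.4, 0.4)]
front = pareto_front(configs)   # (0.4, 0.4) is dominated by (0.5, 0.5)
score = hypervolume_2d(front)   # single number summarizing the frontier
```

Two models can then be compared by the hypervolume of their respective frontiers rather than by a single metric of a single configuration, which is the ranking shift the abstract reports.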