Massive quantities of data are now processed with parallel computing frameworks that distribute computations across large clusters of many machines. Such frameworks are adopted in big data analytics tasks such as recommender systems, social network analysis, and legal investigation, all of which involve iterative computations over large datasets. One of the most widely used frameworks is MapReduce, which is scalable and well suited to data-intensive processing, with a parallel computation model characterized by the interleaving of sequential and parallel processing. Its open-source implementation, Hadoop, is adopted by many cloud infrastructures, such as those of Google, Yahoo, Amazon, and Facebook. In this paper we propose a formal approach to modeling the MapReduce framework, using model checking and temporal logics to verify reliability and load-balancing properties of the MapReduce job flow.
|Internal authors:||DI NOIA, Tommaso; DI SCIASCIO, Eugenio|
|Title:||A computational model for Mapreduce job flow|
|Publication date:||2014|
|Conference name:||CILC 2014|
|Appears in categories:||4.1 Contribution in conference proceedings|