POLITECNICO DI BARI - Catalogue of Research Products

On bulk data loading for large-scale analytics applications

Barbuzzi, Antonio; Michiardi, P.; Biersack, E.; Boggia, Gennaro

2010

Abstract

Modern data analytics applications, such as Internet-scale indexing, system trace analysis, and recommender engines, operate on massive amounts of data and call for a parallel approach to data processing. In this work, we focus on the popular MapReduce framework for carrying out such tasks and identify bulk data insertion as a critical preliminary step toward reducing processing times, especially when new data is generated and processed at regular intervals. We present a parallel approach to bulk data insertion in a system that uses horizontally range-partitioned data, and we evaluate several variants of the insertion operation, including legacy approaches. Our method exploits the parallel processing framework itself to insert data into the system, where it is stored in a semi-structured format. Our results indicate that a parallel approach to bulk insertion can substantially reduce the recurring cost of inserting new data into the system.
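The abstract only outlines the approach, so the following is a minimal sketch under our own assumptions, not the paper's implementation: records are (row key, value) string pairs, the range partitions are defined by a hypothetical sorted list of split points, and a thread pool stands in for the cluster's parallel reduce tasks. It illustrates the general idea of using the processing framework itself to prepare a bulk insert into range-partitioned storage: a map-like step routes each record to the partition that owns its key range, and each partition is then sorted independently so it can be written as one sorted run rather than inserted record by record. The names `partition_index`, `map_phase`, `reduce_phase` and `parallel_bulk_load` are hypothetical.

```python
from bisect import bisect_right
from concurrent.futures import ThreadPoolExecutor
from typing import Dict, List, Tuple

# Hypothetical record layout (an assumption, not from the paper): (row key, value).
Record = Tuple[str, str]


def partition_index(key: str, split_points: List[str]) -> int:
    """Return the index of the range partition that owns `key`.

    Assumes split_points is sorted: partition 0 holds keys below the first
    split point, partition i holds keys in [split_points[i-1], split_points[i]),
    and the last partition holds keys at or above the final split point.
    """
    return bisect_right(split_points, key)


def map_phase(records: List[Record], split_points: List[str]) -> Dict[int, List[Record]]:
    """Map-like step: route every record to the partition owning its key."""
    buckets: Dict[int, List[Record]] = {i: [] for i in range(len(split_points) + 1)}
    for key, value in records:
        buckets[partition_index(key, split_points)].append((key, value))
    return buckets


def reduce_phase(bucket: List[Record]) -> List[Record]:
    """Reduce-like step: sort one partition by key so it can be written out
    as a single sorted run instead of being inserted record by record."""
    return sorted(bucket)


def parallel_bulk_load(records: List[Record], split_points: List[str],
                       workers: int = 4) -> List[List[Record]]:
    """Return one sorted run per partition, in partition order.

    The thread pool only stands in for independent reduce tasks; on a real
    cluster each partition would be processed by a different worker node.
    """
    buckets = map_phase(records, split_points)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # pool.map preserves input order, so runs[i] belongs to partition i.
        return list(pool.map(reduce_phase, [buckets[i] for i in range(len(buckets))]))


if __name__ == "__main__":
    sample = [("kiwi", "3"), ("apple", "1"), ("zebra", "9"), ("grape", "2"), ("pear", "5")]
    for i, run in enumerate(parallel_bulk_load(sample, split_points=["g", "p"])):
        print(i, run)
    # 0 [('apple', '1')]
    # 1 [('grape', '2'), ('kiwi', '3')]
    # 2 [('pear', '5'), ('zebra', '9')]
```

Sorting within each partition, rather than globally, is what lets the partitions be prepared independently: because the key ranges do not overlap, concatenating the runs in partition order already yields a globally sorted dataset.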

Year: 2010
Conference title: 4th ACM/SIGOPS Workshop on Large-Scale Distributed Systems and Middleware, LADIS 2010
ISBN: 978-1-4503-0406-1
DOI: https://dx.doi.org/10.1145/1859184.1859192
Citation: On bulk data loading for large-scale analytics applications / Barbuzzi, Antonio; Michiardi, P.; Biersack, E.; Boggia, Gennaro. - (2010), pp. 27-31. (4th ACM/SIGOPS Workshop on Large-Scale Distributed Systems and Middleware, LADIS 2010, Zurich, Switzerland, July 28-29, 2010) [10.1145/1859184.1859192].
Appears in type: 4.1 Contribution in conference proceedings

Files in this product:

There are no files associated with this product.

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/11589/22572

Citations: 9