A dynamic URL assignment method for parallel web crawler / Guerriero, Andrea; Ragni, F.; Martines, C. - (2010), pp. 119-123. (Paper presented at the 8th IEEE International Conference on Computational Intelligence for Measurement Systems and Applications, CIMSA 2010, held in Taranto, September 6-8, 2010) [10.1109/CIMSA.2010.5611764].
A dynamic URL assignment method for parallel web crawler
GUERRIERO, Andrea
2010
Abstract
A web crawler is a relatively simple automated program or script that methodically scans, or "crawls", through Internet pages to retrieve information from them. Alternative names for a web crawler include web spider, web robot, bot, crawler, and automatic indexer. Web crawlers have many different uses; their primary purpose is to collect data for search engines, so that when users enter a search term, the engine can quickly return relevant web sites. In this work we propose the model of a low-cost web crawler for distributed environments based on an efficient URL assignment algorithm. The function of every module of the crawler is analyzed, and the main rules that crawlers must follow to maintain load balancing and robustness of the system when they search the web simultaneously are discussed. The proposed dynamic URL assignment method, based on grid computing technology and dynamic clustering, proves efficient, increasing web crawler performance.
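The abstract names a dynamic URL assignment algorithm but does not detail it. Purely as a point of reference, the sketch below shows a static host-hash assignment policy, a common baseline for parallel crawlers that the paper's dynamic method would improve upon. This is not the authors' algorithm; the class name, node labels, and URLs are hypothetical.

```python
import hashlib
from urllib.parse import urlparse

class UrlAssigner:
    """Baseline URL assignment for a parallel crawler: hash the
    hostname so all pages of one site go to the same crawler node
    (preserving per-site politeness) while distinct hosts spread
    roughly evenly across nodes (load balancing)."""

    def __init__(self, nodes):
        # nodes: identifiers of the crawler processes, e.g. hostnames
        self.nodes = list(nodes)

    def assign(self, url):
        host = urlparse(url).netloc
        digest = hashlib.md5(host.encode("utf-8")).hexdigest()
        return self.nodes[int(digest, 16) % len(self.nodes)]

if __name__ == "__main__":
    assigner = UrlAssigner(["crawler-0", "crawler-1", "crawler-2"])
    for url in ["http://example.com/a",
                "http://example.com/b",
                "http://example.org/index.html"]:
        print(url, "->", assigner.assign(url))
```

Because every node can compute the hash locally, such a scheme needs no central coordinator; its weakness is that the mapping is fixed, so skewed or changing workloads leave some nodes overloaded, which is the kind of imbalance a dynamic, clustering-based assignment aims to correct.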