I-Search: a system for intelligent information search on the web / Di Sciascio, E.; Donini, F. M.; Mongiello, M. - Print. - 2366 (2002), pp. 149-157. (Paper presented at the 13th International Symposium, ISMIS 2002, held in Lyon, France, June 27-29, 2002) [10.1007/3-540-48050-1_18].
I-Search: a system for intelligent information search on the web
E. Di Sciascio;F. M. Donini;M. Mongiello
2002-01-01
Abstract
Current Web search engines discover new documents mainly by crawling hyperlinks with the aid of spider agents. When indexing newly discovered documents, however, they revert to conventional information retrieval models and single-document indexing, neglecting the inherently hypertextual structure of Web documents. As a result, a query string that is only partially present in one document, with the remaining part available in a linked document on the same site, may not produce a hit, which considerably reduces retrieval effectiveness. To overcome this and other limitations, we propose an approach based on temporal logic that models a web site as a finite state graph and allows complex queries over hyperlinks to be defined with Computation Tree Logic (CTL) operators. Query formulation consists of two steps: the first is user-oriented and provides a friendly interface for posing queries; the second translates the query into CTL formulas. The CTL formulation is not visible to the user, who simply expresses his/her requirements in natural language. We implemented the proposed approach in a prototype system. Experimental results show an improvement in retrieval effectiveness.
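To make the idea of querying across hyperlinks concrete, the following is a minimal sketch, not the authors' prototype. It assumes a toy site (the `SITE` dictionary, its page names, and its texts are all hypothetical) in which each page is a state and each hyperlink a transition. It evaluates a query of the form "first term AND EF second term", where the CTL operator EF means "some path of links eventually reaches a page where the condition holds". The helper names `holds`, `ex`, `ef`, and `search` are illustrative assumptions.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical toy site: page name -> (page text, outgoing links).
# None of these pages come from the paper; they only illustrate the model.
Site = Dict[str, Tuple[str, List[str]]]

SITE: Site = {
    "index": ("department of computer engineering", ["people", "courses"]),
    "people": ("staff list and contact details", ["donini"]),
    "donini": ("research on description logics", []),
    "courses": ("teaching schedule", ["index"]),
}


def holds(term: str, site: Site) -> Set[str]:
    """Atomic proposition: pages whose text contains `term` as a word."""
    return {page for page, (text, _) in site.items() if term in text.split()}


def ex(states: Set[str], site: Site) -> Set[str]:
    """CTL EX: pages with at least one hyperlink into `states`."""
    return {
        page
        for page, (_, links) in site.items()
        if any(target in states for target in links)
    }


def ef(states: Set[str], site: Site) -> Set[str]:
    """CTL EF: least fixpoint of Z = states OR EX Z (backward reachability)."""
    result = set(states)
    while True:
        expanded = result | ex(result, site)
        if expanded == result:
            return result
        result = expanded


def search(first: str, second: str, site: Site) -> Set[str]:
    """Pages satisfying `first AND EF second`."""
    return holds(first, site) & ef(holds(second, site), site)


if __name__ == "__main__":
    # "department" appears on index; "logics" only on a page two links away.
    print(search("department", "logics", SITE))
```

Under single-document indexing, no page in this toy site contains both "department" and "logics", so the query would miss. With the graph-based evaluation, `index` is returned because a chain of links leads from it to a page containing the second term.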