Search Shortcuts: a New Approach to the Recommendation of Queries

Ranieri Baraglia 2, Fidel Cacheda 1, Victor Carneiro 1, Diego Fernández 1, Vreixo Formoso 1, Raffaele Perego 2, Fabrizio Silvestri 2
1 University of A Coruña, Campus de Elviña s/n, A Coruña, SPAIN
2 ISTI - CNR, Via G. Moruzzi 1, Pisa, ITALY
{fidel, viccar, dfernandez, 1, {r.baraglia, r.perego, 2

ABSTRACT
The recommendation of queries, known as query suggestion, is a common practice on major Web Search Engines. It aims to help users to find the information they are looking for, and is usually based on the knowledge learned from past interactions with the search engine. In this paper we propose a new model for query suggestion, the Search Shortcut Problem, that consists in recommending successful queries that allowed other users to satisfy, in the past, similar information needs. This new model has several advantages with respect to traditional query suggestion approaches. First, it allows a straightforward evaluation of algorithms from available query log data. Moreover, it simplifies the application of several recommendation techniques from other domains. Particularly, in this work we applied Collaborative Filtering to this problem, and evaluated the interesting results achieved on large query logs from AOL and Microsoft. Different techniques for analyzing and extracting information from query logs, as well as new metrics and techniques for measuring the effectiveness of recommendations are proposed and evaluated. The results obtained clearly show the importance of several of our contributions, and open an interesting field for future research.
Categories and Subject Descriptors
H.3.3 [Information Search and Retrieval]: Query formulation, Search process

General Terms
Algorithms, Experimentation, Theory

Keywords
Search shortcut, Collaborative Filtering, Query Suggestion, model, evaluation

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, to republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. RecSys '09, October 23-25, 2009, New York, New York, USA. Copyright 2009 ACM /09/10...$

1. INTRODUCTION
The main objective of a Web search engine is to help users fulfill their information needs efficiently. In this sense, any assistance that reduces the time users spend searching is very valuable. In fact, major search engines, besides being able to answer queries in a few hundred milliseconds, usually provide the user with suggestions, in the form of queries that are somehow related to the user's information need. Suggestions are generated on the basis of the query submitted by the user, the knowledge learned from past interactions, and whatever context information is available.

The design and evaluation of effective and efficient algorithms for such suggestions is a complex and challenging task. For example, the performance of the proposed methods is traditionally evaluated by means of user studies. Although human-based evaluation has been found to be very precise, its main drawback is the non-repeatability of the experiments, which makes an extensive comparison of such techniques difficult.

In this work we formally define the Search Shortcut Problem (SSP) as a problem related to the recommendation of queries in search engines and the potential reductions obtained in the users' session length.
This new problem formulation allows a precise goal for query suggestion to be devised: recommend queries that, in the past, allowed users who followed a similar search process to successfully find the information they were looking for. Considering the recommendation of queries as an SSP has two important advantages with respect to previous query suggestion approaches.

First, the different techniques used to solve it can be easily evaluated by exploiting query log data. In this way, the evaluation can be performed offline, without the need for real users to check the relevance of the recommendations, thus simplifying the comparison among different techniques. In this work, we present an evaluation methodology, and we propose a new metric specifically designed to evaluate the precision of algorithms for the SSP. The main advantage with respect to traditional metrics is that our approach clearly defines what relevance means in this context, allowing an effective evaluation to be easily conducted.

Furthermore, this new problem simplifies the application of several recommendation techniques to the suggestion of queries. This opens an interesting and promising new field to the large and active recommender systems research community. Specifically, techniques such as Collaborative Filtering, especially successful in e-commerce, can be easily applied to this context. We just need to extract the implicit relevance (ratings) of each query from the query log data. In fact, in this paper we address this problem by studying different techniques to extract the information from the query logs, and by evaluating several collaborative filtering algorithms.

The remainder of the paper is organized as follows. In the next Section we present some related work in the fields of query recommendation and collaborative filtering. Then, in Section 3, we introduce the SSP, we define its theoretical model, and finally we present a new evaluation metric.
Next, we discuss the application of Collaborative Filtering methods to this problem (Section 4), and the challenges that should be addressed. In Section 5, we introduce the experiments performed and we discuss the results obtained. Finally, we present some conclusions and outline future research directions.

2. RELATED WORK
The problem we present in this paper is related to two different research fields that, although related, have traditionally been addressed from different points of view: query suggestion and recommender systems.

Recommender systems have been used in several domains, being especially successful in electronic commerce. They can be divided into two broad classes: those based on content filtering, and those based on collaborative filtering. As the name suggests, content filtering approaches base their recommendations on the content of the items to be suggested. They face serious limitations when dealing with multimedia content and, more importantly, their suggestions are not influenced by the human-perceived quality of contents. Collaborative filtering algorithms, by contrast, are based on the preferences of other users.

There are two main types of collaborative filtering algorithms: memory-based and model-based. Memory-based approaches use the whole past data to identify similar users [21], items [19], or both [25]. Generally, memory-based algorithms are quite simple and produce good recommendations, but they usually face serious scalability problems. Model-based algorithms, on the other hand, construct in advance a model to represent the behavior of users, which allows their preferences to be predicted more efficiently. However, the model building phase can be highly time-consuming, and models are generally hard to tune, sensitive to data changes, and highly dependent on the application domain.
In the literature different approaches can be found: based on linear algebra methods [5, 18], clustering [24],

On the other hand, query suggestion techniques address the problem of recommending queries to users of a search engine. The techniques adopted are very different, but most of them are based on query log data, such as click-through information [29]. Some of these employ clustering algorithms to determine queries that lead to similar documents [26, 2], or focus on mining association rules from query logs [7]. Others employ reformulations of queries by previous users [12], or even let users choose the techniques to measure similarity among queries [28]. Query expansion techniques [6] have also been used.

The usage of query logs to evaluate the relevance of a given result, focused on improving the search result ranking, is closely related to this problem. Existing techniques take into account information such as the document position, user click behavior, the time the user spends on a page, etc. Traditional approaches usually extract the information from a single query at a time [1], although there are several works that take into account the chains of queries that belong to the same search process [17]. In the context of collaborative web search [22], the suggestion of repeated queries that lead to similar relevant results has been proposed [3].

The idea we present in this paper takes a completely new approach. First, we infer the relevance of a query based on whether it successfully ends a search session (i.e. whether the query is useful to find the information the user is searching for). This relevance measure, extracted from query log data, can be used to fill some cells in the rating matrix typically used by collaborative filtering algorithms. Thus, such an effective technique can be applied to the recommendation of queries. Successful sessions [23, 22] have already been taken into account as a way to evaluate search result promotions.
In this paper, instead, satisfactory searches (as opposed to successful sessions) are taken as the key factor in building query recommendations.

3. SEARCH SHORTCUTS

3.1 Problem description and motivation
The aim of the SSP is to recommend queries that allowed users, in the past, to successfully find the information they were looking for using a similar search process. For example, let us suppose that a (sufficiently) high number of users have queried the engine for $q_1$, $q_2$, $q_3$, and finally, after asking for $q_4$, they found the information they needed. Therefore, we can consider query $q_4$ relevant for users interested in topics related to $q_1$, $q_2$, and $q_3$. Whenever another user starts to search for topics related to $q_1$, $q_2$, or $q_3$, the query $q_4$ will be proposed as a shortcut.

Obviously, the earlier a relevant shortcut is shown during the user session, the more effective it should be considered. Moreover, a shortcut for a session need not be the last query submitted: a query that anticipates the successful end of a session is still acceptable. The more a shortcut (potentially) reduces the length of a session, the more important it is.

Our problem formulation defines a collaborative approach to query suggestion: the main idea is to exploit the experience of previous users to steer other users with similar information needs in the right direction. We have validated this assumption by analyzing the actual behavior of web search engine users recorded in the AOL query log [14]. This query log is composed of approximately 36 million query records, containing about 20 million web queries collected from about 650,000 users over three months (March-May 2006).

From the query log we extracted search sessions, taking into account that a session expires if the user spends at least 30 minutes without any action in the search engine. This is a limit commonly used in query log analysis [27].
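The session extraction rule just described can be sketched as follows. This is a minimal illustration, not the authors' code: the record layout `(user_id, timestamp, query)` and the function name `split_sessions` are assumptions made for the example, and only the 30-minute inactivity limit comes from the text.

```python
from datetime import datetime, timedelta

# Inactivity limit stated in the text: a session expires after
# at least 30 minutes without any action.
SESSION_TIMEOUT = timedelta(minutes=30)


def split_sessions(records):
    """Group (user_id, timestamp, query) records into query sessions.

    The record layout is a hypothetical one chosen for this sketch.
    A new session starts when the user changes or when at least
    SESSION_TIMEOUT has elapsed since that user's previous action.
    """
    sessions = []
    last_user, last_time = None, None
    # Sort by user, then by time, so each user's actions are contiguous.
    for user, ts, query in sorted(records, key=lambda r: (r[0], r[1])):
        if user != last_user or ts - last_time >= SESSION_TIMEOUT:
            sessions.append([])  # open a new session
        sessions[-1].append(query)
        last_user, last_time = user, ts
    return sessions


# Illustrative records (made up for the example, not taken from the AOL log).
example = [
    ("u1", datetime(2006, 3, 1, 10, 0), "first query"),
    ("u1", datetime(2006, 3, 1, 10, 10), "second query"),
    ("u1", datetime(2006, 3, 1, 10, 45), "query after a long pause"),
]
print(split_sessions(example))
```

Sorting by user and timestamp keeps the sketch independent of the order in which the log stores its records; a real log would also need a decision on how to treat repeated records for the same query, which this sketch does not attempt.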
As a result, we extracted 10,765,687 search sessions with an average length of 2.03 queries. From these search sessions, we considered only the sessions involving two or more queries (4,339,683 sessions with an average length of 4.88 queries/session) and we grouped together all the sessions starting with the same query. The sessions with just one query were removed because they obviously do not contain any query path.

Figure 1: Percentage of satisfactory query paths in the AOL query log, by popularity of the first query. (Horizontal axis: session paths ordered by popularity, log scale. Vertical axis: percentage.)

We want to analyze the query paths followed by different users who started with the same query, assuming they could have the same information need. Some of the users may follow the right path and end the session after visiting some documents proposed by the search engine, but others may end the session without visiting any result. Therefore, for each of these sessions, we examined the final query and checked whether it was successful or not: a query was considered successful if the user clicked on at least one result, and unsuccessful otherwise. The sessions that ended with a successful final query are considered satisfactory query paths.

Summarizing, we found 140,165 different initial queries (not unique), each of which is the starting point for several sessions (at least two). Considering these query paths, 64% of the sessions were satisfactory, while 36% ended with a failed query.

In Fig. 1 we show the percentage of satisfactory query paths, sorted by the logarithm of the rank of the initial queries (similarly to the common Zipf graph). In the left hand side of the figure, which represents the query paths associated with the most frequent initial queries, we can see that the majority of sessions ended successfully, but at the same time several sessions ended without a click on any document, although they started at the same point. This shows that the information provided by the satisfactory sessions could lead the failed sessions to a successful ending point, which is the main goal behind this work.

For completeness, the right hand side contains the less frequent queries, where the dots fall on the common fractions, which gives the graph its particular shape. For example, for all the initial queries repeated twice, we have a dot at 2/2, 1/2 and 0/2; for the initial queries repeated three times, we have a dot at 3/3, 2/3, 1/3 and 0/3; and so on. In this case, the potential benefits are less clear because less experience can be extracted from similar query paths.

3.2 Theoretical model and evaluation metric
Let $U = \{u \mid u \text{ is a user}\}$ be the set of all users of a search engine. Let $Q = \{q \mid q \text{ is a user query}\}$ be the set of all the queries submitted by the users of a search engine.

Definition 1. A query session for a user $u \in U$ is a sequence of queries that u has submitted to a search service in order to satisfy an information need. In symbols, $\sigma^u = \langle q^u_1 \ldots q^u_n \rangle$. It is assumed that all queries in the session are related to the same user search task. We will drop the superscript u in the specification of $\sigma$, e.g. $\sigma = \langle q_1 \ldots q_n \rangle$, whenever u is clear from the context. The i-th element of a session is denoted by $\sigma_i$. The set of all sessions is defined as S.

Definition 2. We define a function $c : S \times [1..n] \to \{0, 1\}$ as $c(\sigma, i) = 1$ if in $\sigma$ the user has clicked on at least one result shown in the result page for $\sigma_i$. If $|\sigma| < n$ and $i > |\sigma|$ then, by definition, $c(\sigma, i) = 0$.

Definition 3. We define a session $\sigma = \langle q_1 \ldots q_n \rangle$ of length n to be satisfactory if and only if $c(\sigma, n) = 1$, and unsatisfactory otherwise.

Definition 4. We define a k-way shortcut, with $1 \le k \le |Q|$, as a function $h : S \to 2^Q$ taking as argument a session and returning a set of queries of cardinality $|h(\sigma)| \le k$. H is the set of all possible shortcut functions.

Definition 5. The head $\sigma_{t|}$ of $\sigma = \langle q_1 \ldots q_n \rangle$ up to $t \le n$ is the sequence of the first t queries in $\sigma$, i.e. $\sigma_{t|} = \langle q_1, \ldots, q_t \rangle$.

Definition 6. The tail $\sigma_{|t}$ of $\sigma = \langle q_1 \ldots q_n \rangle$ from $t \le n$ is the sequence of the last $n - t$ queries in $\sigma$, i.e. $\sigma_{|t} = \langle q_{t+1}, \ldots, q_n \rangle$.

Definition 7. Let $\sigma = \langle q_1 \ldots q_n \rangle$ be a satisfactory session. The similarity of a k-way shortcut h on a head $\sigma_{t|}$ and a tail $\sigma_{|t}$ is defined as

$$ s\big(h(\sigma_{t|}), \sigma_{|t}\big) = \frac{\sum_{q \in h(\sigma_{t|})} \sum_{m=1}^{n-t} \big[q = (\sigma_{|t})_m\big] \, f(m)}{\big|h(\sigma_{t|})\big|} \qquad (1) $$

where $f(m)$ is a monotonically increasing function, and $[q = (\sigma_{|t})_m]$ is 1 if and only if the query q is equal to the m-th query of the tail, $(\sigma_{|t})_m$, and 0 otherwise.

The similarity function defined in (1) can be used as an objective evaluation measure for the SSP. For example, to evaluate the effectiveness of a shortcut function h on S, the sum or average of s over all sessions in S can be computed. Note that the similarity function can be rewritten to include the function c, so as to give importance only to those queries that had a result clicked.

It is worth noting that the main difference between query shortcuts and query suggestion lies in the indicator $[q = (\sigma_{|t})_m]$ in equation (1). By relaxing the strict equality requirement and replacing it with a similarity relation, i.e. $[q \sim (\sigma_{|t})_m]$ (which is 1 if and only if the query q is similar to the query $(\sigma_{|t})_m$), the problem essentially reduces to query suggestion. By defining an appropriate similarity function, equation (1) can be used to evaluate query suggestion effectiveness as well.

Finally, we should consider the influence that the function $f(m)$ has on the definition of scoring functions. Depending on how f is chosen, different features of a shortcut generating algorithm may be tested. For instance, by setting $f(m)$ to be the constant function $f(m) = c$, we simply measure the number of queries in common between the query shortcut set and the queries submitted by the user. A non-constant function can be used to give a higher score to queries that a user would have submitted later in the session.
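As a rough illustration of the metric, the sketch below computes the similarity of Definition 7 for one session. It reads equation (1) as a double sum over the suggested queries and the tail positions, normalized by the number of suggestions; that reading, the function name `shortcut_similarity`, and the choice to score an empty suggestion set as zero are assumptions of this example rather than details confirmed by the paper.

```python
import math


def shortcut_similarity(shortcuts, tail, f=math.exp):
    """Score a set of suggested queries against the tail of a satisfactory session.

    shortcuts: the set h(head) of suggested queries
    tail:      list of queries actually submitted after the head
    f:         monotonically increasing weight on the 1-based tail position m;
               the default is the exponential, the choice reported in the text
    """
    if not shortcuts:
        return 0.0  # assumption of this sketch: no suggestions, no score
    total = 0.0
    for q in shortcuts:
        for m, submitted in enumerate(tail, start=1):
            if q == submitted:  # indicator [q = m-th query of the tail]
                total += f(m)
    return total / len(shortcuts)


# A session q1..q4 split after the first query (t = 1).
tail = ["q2", "q3", "q4"]
print(shortcut_similarity({"q4", "unrelated"}, tail))
# With a constant f, the score counts matches per suggested query.
print(shortcut_similarity({"q2", "q4", "unrelated"}, tail, f=lambda m: 1.0))
```

Passing `f` as a parameter makes it easy to compare the constant and position-sensitive scoring variants discussed in the text. Replacing the equality test with a similarity predicate would give the query suggestion variant of the same measure.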
The exponential function $f(m) = e^m$ was chosen to assign a higher score to shortcuts suggested early. Smoother f functions can be used to modulate position effects.

4. SEARCH SHORTCUTS THROUGH COLLABORATIVE FILTERING
As mentioned above, several algorithms can be applied to the SSP. In this work, we have studied the application of traditional Collaborative Filtering techniques. These techniques have obtained very good results in other recommendation domains, and are likewise based on the experiences of previous users, so they seem suitable for our purposes.

Collaborative filtering deals with a set of users U and a set of items I. User preferences are taken into account as item ratings, a numeric value representing the utility of an item to a given user. The subset of valid ratings is denoted as R. Ratings can be explicitly introduced by users, or implicitly extracted from user interaction (e.g. from query log data). Preferences for all users are stored in a user-item matrix, known as the rating matrix V. Each entry $v_{ui}$ of V represents the rating of user u for item i, with $v_{ui} \in R \cup \{\emptyset\}$, where $\emptyset$ indicates that the user has not rated the item yet. Thus, to apply collaborative filtering to the SSP, we need to fill this matrix with the information in the query log data.

First, the concepts of the SSP (users, queries, terms and sessions) have to be mapped to the pure collaborative filtering problem (users and items). As the goal in the SSP is to recommend queries for a given session, it seems reasonable to treat each session as a user, and each query as an item.

Second, the query ratings must be inferred from the information in the query log. As a preliminary approach, in this work we rate the queries by focusing on the last query of each session. If that last query was successful (the user has clicked on at least one result), then the query is given a positive rating (10.0). Otherwise, it is given a negative rating (0.0). All remaining queries are considered neutral (5.0).
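The mapping from sessions to ratings can be sketched as below. The rating values 10.0, 0.0 and 5.0 come from the text; the function names, the input layout `(queries, last_query_clicked)`, and the nested-dictionary matrix are hypothetical choices for this illustration. The sketch also assumes every session is non-empty and that, when the last query also appears earlier in the session, its final-position rating takes precedence.

```python
# Rating values given in the text.
POSITIVE, NEUTRAL, NEGATIVE = 10.0, 5.0, 0.0


def rate_session(queries, last_query_clicked):
    """Return {query: rating} for one non-empty session (one CF 'user')."""
    # Every query before the last one is considered neutral.
    ratings = {q: NEUTRAL for q in queries[:-1]}
    # The last query is rated by whether it received at least one click.
    # Assumption: this overrides a neutral rating for an earlier repeat.
    ratings[queries[-1]] = POSITIVE if last_query_clicked else NEGATIVE
    return ratings


def build_rating_matrix(sessions):
    """Build a sparse rating matrix: session index -> {query: rating}.

    Queries absent from a session's dictionary are unrated entries.
    """
    return {
        i: rate_session(queries, clicked)
        for i, (queries, clicked) in enumerate(sessions)
    }


# Illustrative sessions (made up): (queries in order, was the last one clicked?)
sessions = [
    (["cheap flights", "flights to rome", "rome flight deals"], True),
    (["cheap flights", "airline strike"], False),
]
print(build_rating_matrix(sessions))
```

A dictionary of dictionaries stores only the rated cells, which suits a matrix in which most (session, query) pairs are unrated.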
Finally, we should note that this particular problem poses some important challenges for traditional collaborative filtering algorithms. In fact, the domains where they have usually been applied are much denser than a query log, i.e., they have many more relations (ratings) between users and items. For example, in an electronic commerce site most products have been rated or purchased by several customers. Thus, we can obtain information about an item from several users. However, in query session logs there are many queries that only appear in a single session. This lack of information is the well-known sparsity problem [11]. In addition, web search query logs usually contain much more data than t