A Survey on Learning To Rank (LETOR) Approaches in Information Retrieval

Ashish Phophalia
Dhirubhai Ambani Institute of Information and Communication Technology

INTERNATIONAL CONFERENCE ON CURRENT TRENDS IN TECHNOLOGY, 'NUiCONE - 2011'
INSTITUTE OF TECHNOLOGY, NIRMA UNIVERSITY, AHMEDABAD - 382 481, 08-10 DECEMBER, 2011
Abstract--In recent years, the application of machine learning approaches to conventional IR systems has opened a new dimension in the field. The emphasis has shifted from simply retrieving a set of documents to also ranking them for a given query in terms of the user's need. The researcher's task is not only to retrieve documents from the corpus but also to rank them in order of their relevance to the user's requirement. Improving the system's performance in this respect is now a hot area of research. This paper attempts to collect some of the most commonly used algorithms in the community. It presents a survey of the approaches used to rank retrieved documents and of their evaluation strategies.

Index Terms--Machine Learning, Learning to Rank (LETOR), Information Retrieval.
I. INTRODUCTION
To rank all the relevant documents from the available corpus for a given user query, in accordance with their relevance, is the central problem in the field of Information Retrieval (IR). The problem attracts researchers because it asks both what the user needs in a given query and what the best possible order of the retrieved documents is. Ideally, the best-matching document should be at the top and the least-matching document at the bottom. In general, it is a difficult task to identify the semantics of a given query. Conventional IR approaches such as the Boolean Model, the Vector Space Model (VSM), BM25, etc. place the relevant documents above the first non-relevant document in the list. The use of machine learning approaches makes it possible to estimate the relative relevance among the relevant documents in the context of the given user query and to order them accordingly. In general, methods that use machine learning (ML) technologies to solve the problem of ranking can be named "Learning To Rank (LeToR)" methods. The growth of available resources has led industry and academia to focus on the problem of efficiently ordering relevant documents. To deal with this, a number of conferences, workshops, and challenges have been organized in recent years, namely SIGIR, ICML, NIPS, Yahoo!'s Learning to Rank Challenge 2010, Microsoft's Learning to Rank Group, and so on.

Learning to Rank (LeToR) in the context of document retrieval is the task of building a ranking model, as shown in Figure 1. Suppose some queries are given together with their relevant documents and the corresponding relevance judgments. The ranking model should be trained in such a way that it produces a correct rank list, so that given a new (test) query with its documents, the model predicts the correct ranking of those documents. In LeToR, unlike conventional unsupervised ranking models, more features can be utilized, and how to combine these features for better rank prediction can be learnt automatically [15]. Parameter tuning with machine learning approaches is easier than with their traditional counterparts [16]. LeToR also makes it possible to combine multiple evidences and to avoid over-fitting of parameters through regularization methods. In practice, the ranking problem may be reduced to finding an appropriate scoring function that can evaluate individual documents; on the basis of those scores, documents can be arranged in descending order [12]. Many proposed learning to rank algorithms minimize a loss function only loosely related to the IR measures [17]; the idea is to maximize the accuracy of the ranking model in terms of an IR measure on the training data.

The objective of this paper is to introduce machine learning approaches in the context of Information Retrieval, explore their utility in this field, and present the success achieved so far. The paper is organized as follows: Section II provides an overview of the most commonly used learning to rank approaches, grouped according to their nature. The evaluation measures used for LeToR are described in Section III. Section IV outlines the details of the datasets used in the community. Section V provides a glimpse of the winning strategies of Yahoo!'s Learning To Rank Challenge, and Section VI concludes the manuscript.

II. APPROACHES IN LETOR
Learning to rank, when applied to document retrieval, is the following task. Assume that there is a collection of documents. In retrieval (i.e., ranking), given a query, the ranking function assigns a score to each document and ranks the documents in descending order of the scores. The ranking order represents the relevance of the documents with respect to the query. In learning, a number of queries are provided; each query is associated with a perfect ranking list of documents. A ranking function is then created using the training data, such that the model can precisely predict the ranking lists in the training data. This learning is done in a supervised manner.
Fig. 1. Possible architecture of LeToR system [26]
A. Pointwise Approaches
The pointwise method is the earliest method that researchers considered. The conventional and simple idea of ordinal regression is to map the ordinal scales into numeric values and then solve the problem as a standard regression problem. In this approach, a query-document pair is considered as an instance in the training phase. In [2], a Support Vector Machine (SVM) based approach is used to tackle the problem of ordinal regression. It formulates ranking as a classification problem using the large-margin principle of SVMs, and proposes fixed-margin and sum-of-margins strategies. In the context of web search, the McRank approach was proposed in [3]. To learn the class probabilities, a boosting tree algorithm for multi-class classification is implemented, along with variants for multiple ordinal classification and regression. Let the estimated class probabilities be \hat{p}_{i,k}, where document i is to be classified into one of K classes. The scoring function is defined as

S_i = \sum_{k=0}^{K-1} T(k) \, \hat{p}_{i,k}    (1)

where T(k) is some monotone (increasing) function of the relevance level k. Once the score S_i is computed for all data points, the documents within each query are sorted in descending order of S_i. RankProp [7] is an early model to consider the ranking problem. RankProp uses a neural net with two phases: an MSE regression on the current target values, and an adjustment of the target values themselves to reflect the current ranking given by the net.
B. Pairwise Approaches
This approach takes document pairs as instances in learning and formalizes the problem of learning to rank as one of classification. Specifically, in learning it collects document pairs from the ranking lists, and for each document pair it assigns a label representing the relative relevance of the two documents. It then trains a classification model with the labeled data and uses that model in ranking. Known classification methods can be deployed here directly. However, the number of document pairs varies greatly from query to query, which biases the trained model toward queries with more document pairs. In [4], the learning to rank problem is attacked using a boosting method called the RankBoost algorithm. It is based on the AdaBoost method [5], in which the emphasis is on boosting weak hypotheses. A gradient descent method combined with a neural network is used in [6], called RankNet. The authors propose a simple probabilistic cost function and implement RankNet by minimizing that function with gradient descent on a neural network. Let the scores of data points x_i and x_j be s_i = f(x_i) and s_j = f(x_j), respectively. For a given query, let S_{ij} \in \{-1, 0, +1\} be defined as +1 if document i has been labeled more relevant than document j, -1 if document i has been labeled less relevant than document j, and 0 if they have the same label. The cost function in this approach is then defined as follows:

C = \frac{1}{2}(1 - S_{ij})(s_i - s_j) + \log\left(1 + e^{-(s_i - s_j)}\right)    (2)

LambdaRank [10] is another ranking algorithm based on RankNet. In information retrieval models, the cost function in ranking is either flat or discontinuous, so the derivatives of the cost with respect to the model parameters are undefined and the cost is difficult to handle with common optimization methods. Usually, an implicit cost is constructed to approximate the target cost. The authors show that LambdaRank can solve this problem if the implicit cost is convex and smooth, and in experiments LambdaRank is faster than RankNet [22]. The approach in [23] considers the idea of a multiple nested ranker, also based on RankNet. As is known, accuracy at the top of the ranking is crucial in IR. The authors perform an iterative algorithm that re-ranks the top documents: at each iteration, a part of the top documents is re-ranked using RankNet. In this way, the number of retrieved documents decreases and the accuracy of the top-ranked documents can be improved [22]. The authors in [25] proposed a ranking method based on an SVM over differences of document pairs, called RankSVM. Consider a class of linear ranking functions defined by
(d_i, d_j) \in f_w(q) \iff w \cdot \Phi(q, d_i) > w \cdot \Phi(q, d_j)    (3)

where w is a weight vector that is adjusted by learning, and \Phi(q, d) is a mapping onto features that describes the match between query q and document d. The convex optimization problem is then defined by the constraints

w \cdot \left(\Phi(q_k, d_i) - \Phi(q_k, d_j)\right) \geq 1 - \xi_{i,j,k}    (4)

for all pairs in which d_i is preferred over d_j for query q_k, with slack variables \xi_{i,j,k} \geq 0. Since this involves pairwise differences in classification, decomposition algorithms can be applied to solve the convex optimization problem.
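The two pairwise formulations above can be sketched in a few lines: the RankNet cost of Eq. (2), and the data transformation behind RankSVM's Eqs. (3) and (4), which turns each labeled preference into a feature-difference example for an ordinary linear SVM. Function and variable names are illustrative, not taken from the cited papers.

```python
import math

def ranknet_cost(s_i, s_j, S_ij):
    """Pairwise cost of Eq. (2); S_ij in {-1, 0, +1}."""
    o = s_i - s_j
    return 0.5 * (1 - S_ij) * o + math.log(1 + math.exp(-o))

def rank_svm_pairs(features, preferences):
    """RankSVM transform behind Eqs. (3)-(4): each preferred pair
    (d_i over d_j) yields the difference vector phi(q,d_i) - phi(q,d_j)
    as a +1 example (and its mirror as a -1 example), ready for any
    linear SVM solver to learn the weight vector w."""
    X, y = [], []
    for better, worse in preferences:
        diff = [a - b for a, b in zip(features[better], features[worse])]
        X.append(diff); y.append(+1)
        X.append([-d for d in diff]); y.append(-1)
    return X, y

# A correctly ordered pair costs little; a reversed pair costs much more.
low = ranknet_cost(2.0, 0.0, +1)
high = ranknet_cost(0.0, 2.0, +1)

feats = {"d1": [1.0, 0.2], "d2": [0.3, 0.9]}
X, y = rank_svm_pairs(feats, [("d1", "d2")])
```

Minimizing the RankNet cost by gradient descent pushes the score gap of mislabeled pairs in the right direction, which is exactly the mechanism LambdaRank later reuses through its derivatives.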
C. Listwise Approaches

There is not as much work on the listwise method as on the two above, but it seems to be the most promising of the three. In the first two methods, the models or algorithms are valid and effective on the assumption that all the document pairs or document points have certain features or attributes, just like other machine learning models. For example, a classifier can identify whether a fruit is an apple or an orange based on attributes such as color and size. However, feature selection in IR is not unbiased: the features depend on the queries, and the queries vary greatly. That is to say, some of the document pairs or points are not even comparable with each other. Another disadvantage is that the loss function in the first two frameworks does not accord with the evaluation measures. The listwise approach addresses these problems.

In [1], the authors proposed a new learning method, called ListNet, for optimizing a listwise loss function based on the top-k probability, with a neural network as the model and gradient descent as the optimization algorithm. ListNet is similar to RankNet; the only major differences are that the former uses document lists as instances while the latter uses document pairs, and that the former utilizes a listwise loss function while the latter utilizes a pairwise one.

LambdaMART [10] combines MART and LambdaRank. MART is a boosted tree model in which the output is a linear combination of the outputs of a set of regression trees. The combination is natural because MART models derivatives, and LambdaRank works by specifying the derivatives at any point during training.

The motivating idea behind the BoltzRank approach [11] is to define a probability distribution over document permutations and to consider the expectation of the target performance measure under this distribution; it should then be possible to propagate derivatives and update the parameters that govern the scoring function so as to maximize that expectation.
BoltzRank defines a conditional Boltzmann distribution over document permutations. The scoring function consists of two potentials: an individual potential \phi and a pairwise potential \psi. \phi operates on a single document and assigns it an absolute score without considering any relative information; \psi takes a pair of documents as input and predicts the relative difference in score between the two. The final score for any given document d_j is then computed as

f(d_j \mid q, D) = \phi(d_j) + \sum_{k \neq j} \psi(d_j, d_k)    (5)

The authors in [12] try to minimize the Bayes risk in decision making for the IR system. They call the resulting method BayesRank; it directly optimizes the Bayes risk related to the ranking accuracy in terms of the IR evaluation measures. It uses the Plackett-Luce model as the probability model of permutations, and a multilayer perceptron neural network is designed for learning BayesRank with an NDCG-related permutation loss. The FRank algorithm, proposed in [13], is based on the concept of fidelity from physics. The fidelity is defined as
F(p, q) = \sum_x \sqrt{p_x \, q_x}    (6)

where p_x and q_x are probability distributions. The geometric interpretation is that the fidelity is the inner product between the vectors with components \sqrt{p_x} and \sqrt{q_x}, which lie on the unit sphere.
The fidelity loss of a document pair is

F_{ij} = 1 - \left( \sqrt{p^*_{ij} \, p_{ij}} + \sqrt{(1 - p^*_{ij})(1 - p_{ij})} \right)    (7)

where p^*_{ij} is the given target value of the posterior probability and p_{ij} is the modeled probability. The fidelity loss over all queries can then be estimated as

F = \sum_q \frac{1}{\#q} \sum_{i,j} F_{ij}    (8)

where \#q denotes the number of document pairs for query q.

To measure the similarity between the estimated output and the available ground truth, the RankCosine approach is proposed in [14]. Let n(q) be the number of documents for query q, let g(q) denote the ground-truth ranking scores for this query, and let H(q) denote the output of the learning machine for query q. The ranking loss for every query q is defined as

L(g(q), H(q)) = \frac{1}{2}\left(1 - \cos(g(q), H(q))\right) = \frac{1}{2}\left(1 - \frac{g(q)^T H(q)}{\|g(q)\| \, \|H(q)\|}\right)    (9)

where \|\cdot\| is the L_2 norm of a vector. The goal of learning then turns out to be minimizing the total loss over all training queries:

L(H) = \sum_{q \in Q} L(g(q), H(q))    (10)
III. EVALUATION MEASURES
A. Winner Takes All (WTA)

In this measure, only the first retrieved document is considered for the given query q. If the first retrieved document is relevant to the query q then WTA(q) = 1, otherwise it is zero. The measure is averaged over all queries.
B. Mean Average Precision (MAP)

The precision at position k for query q is

P@k = \frac{\text{No. of relevant documents in top } k \text{ results}}{k}    (11)

and the average precision for query q is

AP = \frac{\sum_k P@k \cdot l_k}{\text{No. of relevant documents}}    (12)

where l_k is 1 if the document at position k is relevant and 0 otherwise. Mean Average Precision (MAP) is defined as the average of AP over all queries.
C. Normalized Discounted Cumulative Gain (NDCG)

Recently, a new evaluation measure called Normalized Discounted Cumulative Gain [16] has been proposed, which can handle multiple levels of relevance judgments. While evaluating a ranking list, NDCG follows two rules:

- Highly relevant documents are more valuable than marginally relevant documents.
- The lower the ranking position of a document (of any relevance level), the less value it has for the user, because it is less likely to be examined.

According to the above rules, the NDCG value of a ranking list at position k is calculated as

NDCG@k = Z_k \sum_{j=1}^{k} \frac{2^{c(j)} - 1}{\log(1 + j)}    (13)

where c(j) is the rating of the j-th document in the ranking list, and the normalization constant Z_k is chosen in such a way that the ideal list gets an NDCG score of 1.
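The measures of Eqs. (11) to (13) can be sketched for a single query as follows. This is an illustrative implementation, with `rels` as the ranked list of graded relevance labels (0 = non-relevant); note that the base of the logarithm in Eq. (13) cancels in the normalization, so natural log is used here.

```python
import math

def precision_at_k(rels, k):
    """Eq. (11): fraction of the top-k results that are relevant."""
    return sum(1 for r in rels[:k] if r > 0) / k

def average_precision(rels):
    """Eq. (12): P@k averaged over the relevant positions."""
    total_rel = sum(1 for r in rels if r > 0)
    s = sum(precision_at_k(rels, k + 1)
            for k, r in enumerate(rels) if r > 0)
    return s / total_rel

def ndcg_at_k(rels, k):
    """Eq. (13): DCG normalized by the DCG of the ideal ordering (Z_k)."""
    def dcg(labels):
        return sum((2 ** c - 1) / math.log(1 + j)
                   for j, c in enumerate(labels[:k], start=1))
    ideal = dcg(sorted(rels, reverse=True))
    return dcg(rels) / ideal if ideal > 0 else 0.0

rels = [2, 0, 1]               # graded labels in ranked order
p1 = precision_at_k(rels, 1)   # first document is relevant: 1.0
ap = average_precision(rels)   # (1/1 + 2/3) / 2
```

A perfectly ordered list (labels already descending) gets NDCG 1 by construction, while swapping a relevant document downward lowers both AP and NDCG.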
D. Mean Reciprocal Rank (MRR)

This measure assumes only a single relevant document per query and is defined as the inverse of the position of that document in the rank list, averaged over all queries.
E. Expected Reciprocal Rank (ERR)

This measure assumes that the user will be satisfied by the r-th ranked document in the list with some probability and will then not go further down the rank list. It is defined as

ERR = \sum_{r=1}^{n} \frac{1}{r} P(r)    (14)

where P(r) is the probability that the user stops at position r and does not check any document after that.
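ERR as in Eq. (14) can be sketched under the cascade user model: the user scans down the list and stops at position r with probability P(r). A common choice (as in Chapelle et al.'s ERR formulation) maps a graded label c to a per-position satisfaction probability R = (2^c - 1) / 2^g_max; the mapping here is an assumption for illustration.

```python
# Sketch of Eq. (14) under the cascade model.

def err(rels, max_grade=4):
    """rels: graded relevance labels in ranked order."""
    prob_step = 1.0  # probability the user reaches the current position
    total = 0.0
    for r, c in enumerate(rels, start=1):
        R = (2 ** c - 1) / (2 ** max_grade)  # satisfaction probability
        total += prob_step * R / r           # P(stop at r) = prob_step * R
        prob_step *= (1.0 - R)
    return total

score = err([4, 0, 1])
```

Because the stopping probability compounds down the list, a highly relevant document at rank 1 dominates the score, which makes ERR more top-heavy than NDCG.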
IV. DATASETS USED FOR LETOR

In the case of supervised learning, benchmark data must be available in the training phase to build the ranking function. Initially, researchers worked on different datasets and reported their results; the most commonly used were NP2003, TD2003, and TD2004, the details of which can be found in [9]. After 2005, the need for a common benchmark dataset was recognized, so that researchers could evaluate their algorithms on the same platform. At the same time, industry took a keen interest and started to release the internal datasets used to build their ranking functions. Yahoo! and the Russian search engine Yandex released their datasets, and Microsoft Research also came up with its own. Table 1 shows the available datasets used by the community after 2008. Table 2 shows the statistics of the dataset released by Yahoo! for its Learning To Rank Challenge in 2010.
TABLE 1
STATISTICS OF AVAILABLE DATASETS

Dataset              Queries   Docs     Rel.  Features  Year
Letor 3.0-Gov        575       568k     2     64        2008
Letor 3.0-Ohsumed    106       16k      3     45        2008
Letor 4.0            2476      85k      3     46        2009
Yandex               20267     213k     5     245       2009
Yahoo!               36251     883k     5     700       2010
Microsoft            31531     3771k    5     136       2010

TABLE 2
DATASETS RELEASED BY YAHOO! FOR THE LEARNING TO RANK CHALLENGE IN 2010. THE FIRST ONE, NAMED SET 1, ORIGINATES FROM THE US, WHILE THE SECOND ONE, SET 2, IS FROM AN ASIAN COUNTRY. [8]

                 Set 1                        Set 2
           Train     Valid.   Test      Train    Valid.   Test
Queries    19,944    2,994    6,983     1,266    1,266    3,798
Docs       473,134   71,083   165,660   34,815   34,815   103,174
Features   519                          596
V. YAHOO!'S LEARNING TO RANK CHALLENGE [8]
A. Overview of the challenge

Yahoo!'s LeToR challenge took place from March to May 2010. Two tracks were organized: a standard learning to rank track and a transfer learning track. The goal of the second track was to learn a ranking function for a small country by leveraging the larger training set of another country. The winning algorithms were presented in a workshop at the 27th International Conference on Machine Learning (ICML 2010) in Haifa, Israel. The baseline results are reported in [8], using NDCG and ERR as evaluation measures on the datasets whose details are given in the previous section.
B. Overview of winning algorithms

The top-performing entry used a linear combination of 12 ranking models: 8 LambdaMART boosted tree models, 2 LambdaRank neural nets, and 2 logistic regression models. However, no justification was given for why this particular linear combination of 12 methods should work.

Eric Gottschalk and David Vogel first processed the datasets to create new normalized features. The original and derived features were then used as inputs to a random forest procedure. Multiple random forests were created with different training parameters, and the out-of-bag estimates from the forests were used in a linear regression to ensemble them together. For the final submission, this ensemble was blended with a gradient boosting machine trained on a transformed version of the dependent variable. They took second position in track 1.

Dmitry Pavlov and Cliff Brunk tested a machine learning approach for regression based on the idea of combining bagging and boosting, called BagBoo. The model borrows its high accuracy potential from Friedman's gradient boosting, and its high efficiency and scalability through parallelism from Breiman's bagging; it often achieves better accuracy than bagging or boosting alone. For the transfer learning track, they combined the datasets in a way that puts 7 times higher weight on set 2. Daria Sorokina also used the idea of combining bagging and boosting, in an algorithm called Additive Groves.

Igor Kuralenok proposed a novel pairwise method called YetiRank [19] that modifies Friedman's gradient boosting method in the gradient computation part. It also takes uncertainty in human judgments into account.

Ping Li recently proposed Robust LogitBoost to provide a numerically stable implementation of the highly influential LogitBoost algorithm for classification. Unlike the widely used MART algorithm, (robust) LogitBoost uses both the first- and second-order derivatives of the loss function in the tree-splitting criterion. The five-level ranking problem was viewed as a set of four binary classification problems, and the predicted class probabilities were then mapped to a relevance score. For transfer learning, classifiers were learned on each set and a linear combination of the class probabilities from both sets was used.

Geurts and Louppe experimented with several tree-based ensemble methods, including bagging, random forests, and extremely randomized trees, several (pointwise) classification- and regression-based codings of the relevance label, and several rank aggregation schemes. The best result on the first track was obtained with extremely randomized trees in a standard regression setting. On the second, transfer learning track, the best entry was obtained using extremely randomized regression trees built only on the set 2 data; while several attempts at combining both sets were somewhat successful when cross-validated on the training set, the improvements were slight and not confirmed on the validation set.

Busa-Fekete [20] used decision trees within a multi-class version of AdaBoost, while Mohan [21] tried various combinations of boosted decision trees and random forests for the transfer learning track.
TABLE 3
WINNERS OF TRACK 1 ALONG WITH THE ERR SCORES OF THEIR PRIMARY SUBMISSIONS

1  C. Burges, K. Svore, O. Dekel, Q. Wu, P. Bennett, A. Pastusiak and J. Platt (Microsoft Research)  0.46861
2  E. Gottschalk (Activision Blizzard) and D. Vogel (Data Mining Solutions)                          0.46786
3  M. Parakhin (Microsoft) - Prize declined                                                          0.46695
4  D. Pavlov and C. Brunk (Yandex Labs)                                                              0.4667
5  D. Sorokina (Yandex Labs)                                                                         0.4661

TABLE 4
WINNERS OF TRACK 2 ALONG WITH THE ERR SCORES OF THEIR PRIMARY SUBMISSIONS

1  I. Kuralenok (Yandex)                 0.46348
2  P. Li (Cornell University)            0.46317
3  D. Pavlov & C. Brunk (Yandex Labs)    0.46311
4  P. Geurts (University of Liege)       0.46169
VI. CONCLUSION

Applying machine learning techniques to ranking in IR has become an important research problem. This paper summarizes some of the most commonly used algorithms and evaluation measures in the learning to rank problem. One section is also dedicated to the available datasets on which researchers now prefer to work. Industry and academia are now putting forward efforts to build a common platform to find the best solution to this problem. Although some of the algorithms have been adopted in search engines, still no algorithm can resolve all queries. Most researchers report that the listwise approach performs better. There remain open directions in this domain; for example, a ranking function should be capable of understanding the behavior of a query (it may be random in nature, or a hard or a soft query) and treating it accordingly using the information available in the training phase. A nice attempt is made in [24] to put forward the questions still left to be answered by researchers. Along with this, the maximum scores achieved in LeToR on the different datasets by the various algorithms are in the range 0.4 to 0.5, whereas ideally they should be near 1.0; so half of the path is still unexplored by any machine learning approach.
