Parameter Sensitivity in the Probabilistic Model for Ad-hoc
Retrieval
Ben He
Department of Computing Science
University of Glasgow
Glasgow, United Kingdom
ben@dcs.gla.ac.uk

Iadh Ounis
Department of Computing Science
University of Glasgow
Glasgow, United Kingdom
ounis@dcs.gla.ac.uk
ABSTRACT
The term frequency normalisation parameter sensitivity is
an important issue in the probabilistic model for Information Retrieval. A high parameter sensitivity indicates that
a slight change of the parameter value may considerably affect the retrieval performance. Therefore, a weighting model
with a high parameter sensitivity is not robust enough to
provide a consistent retrieval performance across different
collections and queries. In this paper, we suggest that the
parameter sensitivity is due to the fact that the query term
weights are not adequate to differentiate informative query terms from non-informative ones. We show that
query term reweighing, which is part of the relevance feedback process, can be successfully used to reduce the parameter sensitivity. Experiments on five Text REtrieval Conference (TREC) collections show that the parameter sensitivity
decreases markedly when query terms are reweighed.
Categories and Subject Descriptors: H.3.3 [Information Storage and Retrieval]: Retrieval models; General
1. INTRODUCTION

In Information Retrieval (IR), ranking the retrieved documents in decreasing order of relevance is a crucial issue. A recent survey of the query logs of real Web search engine users concluded that users rarely look beyond the top returned documents [16]. Therefore, it is important to rank
the highly relevant documents at the top of the retrieved list.
Usually, the document ranking is based on a weighting model.
In particular, most weighting models apply a term frequency
(tf ) normalisation method to normalise term frequency, i.e.
the number of occurrences of the query term in the document.
Various tf normalisation methods have been proposed in
the literature, e.g. the pivoted normalisation [24] in the
vector space model [23], the normalisation method of the
BM25 weighting model [22], normalisation 2 [1] and normalisation 3 [1, 14] in the Divergence from Randomness (DFR)
framework [1]. All the aforementioned normalisation methods normalise term frequency according to document length,
i.e. the number of tokens in the document. Each of the
aforementioned normalisation methods involves the use of a
parameter. The setting of these parameter values usually
has an important impact on the retrieval performance of an
IR system¹ [5, 14, 15]. In particular, if the retrieval performance of a weighting model is sensitive to a slight change of
its parameter value, the weighting model may not be robust
enough to provide consistent retrieval performance. This is
referred to as the parameter sensitivity issue.
In a practical IR context, parameter sensitivity is a very
important issue. Since relevance assessment and training
data are not always available in a practical environment, it
is crucial to ensure that the parameter value used provides
a robust retrieval performance. The parameter sensitivity
issue has been previously studied in the context of the language modelling approach [18, 26, 27]. In addition, several weighting models that are less sensitive than the classical ones have been derived using an axiomatic approach based
on parameter constraints [8, 9]. Nevertheless, little work
has been done to actually reduce the parameter sensitivity
of a weighting model. In this paper, we base our study
on the classical BM25 probabilistic model, and the PL2
model of the Divergence from Randomness (DFR) probabilistic framework. These two models have been shown to
be effective in various previous TREC experiments [7].
The main contributions of this paper are two-fold. First,
we provide a better understanding and explanation of the
parameter sensitivity. We argue that parameter sensitivity
is caused by the existence of non-informative terms in the
query. Second, we show that parameter sensitivity can be
reduced by applying query term reweighing, which is part
of the relevance feedback process.
The rest of this paper is organised as follows. Section 2 introduces the related work, including the BM25 and PL2 models, and previous research on the parameter sensitivity issue. Section 3 provides an explanation for the manifestation of the parameter sensitivity and suggests applying an appropriate query term reweighing to reduce the parameter sensitivity. Section 4 describes the experimental setting and methodology, and Section 5 provides analysis and discussion of the experimental results. Finally, Section 6 concludes the paper and suggests possible future research directions.

¹For instance, training a retrieval system using the PL2 weighting model [1] on TREC 10 ad-hoc queries gives an MAP of 0.2397 on the TREC 9 queries, compared to an MAP of 0.2174 using the default setting (c = 1). This difference is statistically significant (p ≤ 0.0009).
2. RELATED WORK
In this section, we introduce the BM25 and PL2 models
in Section 2.1, and briefly describe previous research on the
parameter sensitivity issue in Section 2.2.
2.1 The BM25 and PL2 Probabilistic Weighting Models
As one of the most established weighting models, Okapi’s
BM25 computes the relevance score of a document d for a
query Q by the following formula [22]:
score(d, Q) = \sum_{t \in Q} w^{(1)} \cdot \frac{(k_1 + 1) \cdot tfn}{k_1 + tfn} \cdot qtw    (1)

where:

• w^{(1)} is the idf factor, which is given by:

  w^{(1)} = \log_2 \frac{N - N_t + 0.5}{N_t + 0.5}

  where N is the number of documents in the whole collection and N_t is the document frequency of term t, i.e. the number of documents containing t.

• qtw is the query term weight, which is given by \frac{(k_3 + 1) \cdot qtf}{k_3 + qtf}, where qtf is the number of occurrences of the given term in the query. k_3 is a parameter. Its default setting is k_3 = 1000 [22].

• tfn is the normalised term frequency of the given term t. k_1 is a parameter. Its default setting is k_1 = 1.2 [22].

The tf normalisation component of the BM25 formula is:

tfn = \frac{tf}{(1 - b) + b \cdot \frac{l}{avg\_l}}    (2)

where l and avg_l are the document length and the average document length in the collection, respectively. tf is the term frequency in the document. The document length refers to the number of tokens in a document. b is a parameter. The default setting is b = 0.75 [22]. Singhal et al.'s pivoted normalisation, for normalising the tf · idf weight in the context of the vector space model [24], can be seen as a generalisation of the above tf normalisation component of BM25.

PL2 is one of the Divergence from Randomness (DFR) document weighting models [3]. The idea of the DFR models is to infer the importance of a query term in a document by measuring the divergence of the term's distribution in the document from its distribution in the whole collection, which is assumed to be random. In the PL2 model, this random distribution is modelled by an approximation to the Poisson distribution, with the use of the Laplace succession for normalising the relevance score. Using the PL2 model, the relevance score of a document d for a query Q is given by:

score(d, Q) = \sum_{t \in Q} qtw \cdot \frac{1}{tfn + 1} \left( tfn \cdot \log_2 \frac{tfn}{\lambda} + (\lambda - tfn) \cdot \log_2 e + 0.5 \cdot \log_2(2\pi \cdot tfn) \right)    (3)

where \lambda is the mean and variance of a Poisson distribution. It is given by \lambda = F/N, where F is the frequency of the query term in the collection and N is the number of documents in the collection. The query term weight qtw is given by qtf/qtf_max, where qtf is the query term frequency and qtf_max is the maximum query term frequency among the query terms.

The normalised term frequency tfn is given by the so-called normalisation 2:

tfn = tf \cdot \log_2 \left( 1 + c \cdot \frac{avg\_l}{l} \right), \quad (c > 0)    (4)

where l is the document length and avg_l is the average document length in the whole collection. tf is the original term frequency. c is the parameter of normalisation 2. Its default setting is c = 7 for short queries and c = 1 for long queries [1].
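To make the two weighting models concrete, the following is a minimal sketch of their per-term scoring functions, as a direct transcription of Equations (1)-(4). It is illustrative only: the statistics tf, l, avg_l, N, N_t, F and qtf are assumed to be supplied by an inverted index, and the function names are our own rather than those of any particular IR system.

```python
import math

def bm25_term_score(tf, l, avg_l, N, N_t, qtf, k1=1.2, k3=1000.0, b=0.75):
    """Contribution of one query term to a document's BM25 score, Equations (1)-(2)."""
    tfn = tf / ((1 - b) + b * (l / avg_l))           # tf normalisation, Equation (2)
    w1 = math.log2((N - N_t + 0.5) / (N_t + 0.5))    # idf factor w^(1)
    qtw = ((k3 + 1) * qtf) / (k3 + qtf)              # query term weight
    return w1 * ((k1 + 1) * tfn) / (k1 + tfn) * qtw

def pl2_term_score(tf, l, avg_l, N, F, qtf, qtf_max, c=1.0):
    """Contribution of one query term to a document's PL2 score, Equations (3)-(4)."""
    tfn = tf * math.log2(1 + c * avg_l / l)          # normalisation 2, Equation (4)
    lam = F / N                                      # mean/variance of the Poisson distribution
    qtw = qtf / qtf_max                              # query term weight
    return qtw * (1.0 / (tfn + 1)) * (
        tfn * math.log2(tfn / lam)
        + (lam - tfn) * math.log2(math.e)
        + 0.5 * math.log2(2 * math.pi * tfn)
    )
```

A document's relevance score for a query is then the sum of these per-term contributions over all query terms, as in Equations (1) and (3).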
2.2 Previous Work on Parameter Sensitivity
The parameter sensitivity issue has drawn the attention of several previous studies. Zhai & Lafferty addressed the parameter sensitivity issue of the smoothing technique for language modelling. They found that query length, i.e. the number
of unique terms in a query, has a considerable impact on the
parameter sensitivity. In particular, the parameters of the
smoothing methods are very sensitive for long queries [26,
27]. Similar findings were also observed for the BM25 and
PL2 weighting models [13]. Moreover, Fang et al. derived some weighting models that are less sensitive than the current ones, using their axiomatic approach based on a set of parameter constraints [8, 9].
A quantitative analysis of the parameter sensitivity was conducted by Metzler in [18]. Two measures, namely Entropy (H) and Spread (S), were proposed to quantify the parameter sensitivity. The Entropy (H) measure is given as follows:

H = -\int P(opt, x) \log_2 P(opt, x) \, dx    (5)

where P(opt, x) is the probability of the parameter value x being the optimal one. In [18], P(opt, x) is computed using Bayes' rule.

Spread measures the flatness of a posterior distribution over a set of parameter values. It is given as follows:

S = m(max, X) - m(min, X)    (6)

where m(max, X) (resp. m(min, X)) is the maximum (resp. minimum) posterior over a set X of parameter values. For example, if the retrieval performance evaluation measure used is the mean average precision (MAP), m(max, X) is the maximum MAP, and m(min, X) is the minimum one. In practice, the lower the Spread or Entropy is, the lower the parameter sensitivity of an IR model. In addition, a low
Spread is preferred over a low Entropy in order to ensure a
low parameter sensitivity [18].
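To illustrate how the two measures behave, the sketch below computes Spread directly from a sweep of MAP values over a grid of parameter values, and Entropy from a posterior over that grid. Note that normalising the MAP values into a posterior P(opt, x) is a simplifying assumption on our part, as is the discretisation of Equation (5) into a sum; [18] computes P(opt, x) using Bayes' rule.

```python
import math

def spread(map_values):
    """Spread, Equation (6): gap between the best and worst MAP over the parameter grid."""
    return max(map_values) - min(map_values)

def entropy(map_values):
    """Entropy, Equation (5), over a discrete grid of parameter values.
    P(opt, x) is approximated here by normalising the MAP values to sum to one,
    which is a simplifying assumption; [18] derives it using Bayes' rule."""
    total = sum(map_values)
    posterior = [m / total for m in map_values]
    return -sum(p * math.log2(p) for p in posterior if p > 0)

# Illustrative MAP values for a sweep of the PL2 parameter c over {0.5, 1, 2, 4, 8, 16}.
maps = [0.191, 0.217, 0.224, 0.221, 0.210, 0.196]
print(f"Spread  = {spread(maps):.4f}")
print(f"Entropy = {entropy(maps):.4f}")
```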
The above-mentioned research either addressed or quantitatively analysed the parameter sensitivity issue. Nevertheless, little work has actually been done to reduce the
parameter sensitivity of a weighting model. In the next section, we explain the parameter sensitivity issue, and show
how to apply query term reweighing to reduce the parameter
sensitivity of the probabilistic model.
3. QUERY TERM WEIGHTS AND PARAMETER SENSITIVITY
In this section, we provide an explanation for the parameter sensitivity issue. We suggest that the parameter sensitivity is caused by inadequate query term weighting. We also
propose to reduce the parameter sensitivity by reweighing
query terms.
As mentioned in the previous section, query length has
an important impact on the parameter sensitivity of tf normalisation. In particular, the parameter setting tends to be
more sensitive for long queries than for short queries [26,
27].
One of the characteristics differentiating long queries from
short queries is that long queries contain many more non-informative query terms than short queries. As a consequence, for a long query, it is necessary to address the difference in the informativeness among query terms in the
weighting models. For example, in the BM25 and PL2
weighting models (see Equations (1) and (3)), a query term
weight (qtw) measure is employed to represent the relative
informativeness among query terms. Such a query term
weight measure is adequate for short queries, because a short
query usually consists of highly informative query terms.
When the query gets longer, it is “contaminated” by non-informative query terms. Although the query term weight
measure is meant to reflect the informativeness of a query
term, it accounts only for the query term frequency, and it
is still not adequate to differentiate informative query terms
from non-informative ones. In this case, a tf normalisation
parameter, providing a “harsh” normalisation, is needed to
neutralise the effect of non-informative query terms on the
document ranking. We explain the notion of “harsh” normalisation as follows.
Harter [10] and Amati [1] suggested that document length
and term frequency have a linear relationship. Such a linear relationship can be indicated by the linear correlation
between these two variables. He & Ounis suggested that
the purpose of tf normalisation is to adjust the linear dependence between document length and term frequency [14].
They also showed that document length and term frequency
are positively correlated. However, when tf normalisation
is applied, the correlation between these two variables decreases until it reaches a large negative value [14]. In Section 5, we will show that the optimised parameter value of
short queries gives a small negative correlation, and that
of long queries gives a relatively large negative correlation.
In the IR weighting models, e.g. BM25 and PL2, the relevance score usually increases with term frequency. Hence, a
large negative correlation indicates that the contribution of
each occurrence of a query term on the document ranking
decreases rapidly with document length increasing. Therefore, the smaller this correlation value is, the harsher the
tf normalisation process is. For long queries, the retrieval
performance decreases radically if the parameter value used
does not provide a harsh enough normalisation, which leads
to a notable parameter sensitivity. One way of dealing with
parameter sensitivity is to use the query term weights to
address the difference in the informativeness among query
terms. However, in current probabilistic IR models, the
query term weights depend only on the occurrences of query
terms in the query, which is usually not adequate.
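The notion of harshness can be made concrete with a short sketch. Assuming we have, for a query term, the (tf, l) pairs of the documents containing it, the code below measures the correlation between document length and the normalised term frequency produced by normalisation 2 (Equation (4)) for a given value of c; the helper names are our own. Lowering c drives this correlation towards larger negative values, i.e. towards the harsher normalisation that long queries require.

```python
import math

def normalisation_2(tf, l, avg_l, c):
    """Normalised term frequency under normalisation 2, Equation (4)."""
    return tf * math.log2(1 + c * avg_l / l)

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length samples."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

def length_tfn_correlation(postings, avg_l, c):
    """Correlation between document length and tfn for one term.
    `postings` is a list of (tf, l) pairs for the documents containing the term."""
    lengths = [l for _, l in postings]
    tfns = [normalisation_2(tf, l, avg_l, c) for tf, l in postings]
    return pearson(lengths, tfns)
```

Evaluating this correlation at different parameter values reproduces the kind of measurement discussed in Section 5.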
The above explanation suggests that the parameter sensitivity issue is due to the fact that the query term weights
cannot adequately reflect the informativeness of each query
term. To address this problem, we hypothesise that if we reweigh
the query terms to achieve an adequate query term weighting, the parameter sensitivity can be reduced. A query term
reweighing process takes into account each query term’s distribution in one or a set of assumed highly relevant document(s), returned by the first-pass retrieval. The query
terms are reweighed accordingly. Query term reweighing is
usually considered as part of the so-called blind relevance
feedback technique [4]. In the next sections, we conduct
experiments to test the effect of query term reweighing on
reducing the parameter sensitivity.
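For illustration, the following is a minimal sketch of such a reweighing step; the actual term weighting model we use is introduced in Section 4.2. Here, as an assumption for the sake of the example, each query term is weighted by a simple KL-divergence-style measure of how much its distribution in the top-ranked documents of the first pass diverges from its distribution in the collection, and all function names are our own.

```python
import math

def kl_term_weight(tf_top, len_top, F, num_tokens):
    """Informativeness of a query term, measured as a KL-divergence-style score of
    its distribution in the assumed-relevant top documents against its distribution
    in the whole collection. Illustrative only; see Section 4.2 for the model used."""
    p_top = tf_top / len_top    # relative frequency in the top-ranked documents
    p_coll = F / num_tokens     # relative frequency in the whole collection
    if p_top == 0:
        return 0.0
    return p_top * math.log2(p_top / p_coll)

def reweigh_query(query_terms, top_doc_stats, collection_stats):
    """Replace the frequency-based qtw of each query term with a weight derived from
    the term's distribution in the top-ranked documents of a first-pass retrieval.
    `top_doc_stats[t]` is (tf in top documents, total length of top documents);
    `collection_stats[t]` is (F, number of tokens in the collection)."""
    weights = {t: kl_term_weight(*top_doc_stats[t], *collection_stats[t])
               for t in query_terms}
    # Normalise by the maximum weight, mirroring the qtf/qtf_max form of PL2's qtw.
    w_max = max(weights.values()) or 1.0
    return {t: w / w_max for t, w in weights.items()}
```

In the second-pass retrieval, such weights take the place of qtw in Equations (1) and (3), so that non-informative query terms contribute little to the document ranking regardless of the tf normalisation parameter value.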
4. EXPERIMENTAL SETTING AND METHODOLOGY
The purpose of our experiments is to examine the impact of query term reweighing on the parameter sensitivity. In particular, as per Section 3, we expect the use of
query term reweighing to reduce the parameter sensitivity.
We introduce the experimental setting in Section 4.1, the
term weighting model used for query term reweighing in Section 4.2, and the experimental methodology in Section 4.3.
4.1 Experimental Setting
We experiment on five standard TREC collections. The five collections used are disk1&2 and disk4&5 (minus the Congressional Record on disk4) of the classical TREC collections², and the WT2G [12], WT10G [11] and .GOV2 [6] Web collections³. The test queries used are the TREC topics numbered from 51 to 200 for disk1&2, from 301 to 450 and from 601 to 700 for disk4&5, from 401 to 450 for WT2G, from 451 to 550 for WT10G, and from 701 to 850 for .GOV2, respectively (see Table 1). All the test topics used are ad-hoc ones, which require finding as many relevant documents as possible [25].

Each TREC topic consists of three fields, i.e. title, description and narrative. We experiment with two types of queries with respect to the use of different topic fields. These two types of queries are:

• Title-only (T) queries: Only the title topic field is used.

• Full (TDN) queries: All three topic fields (title, description and narrative) are used.

²Related information on disk1&2 and disk4&5 of the TREC collections can be found at the following URL: http://trec.nist.gov/data/docs_eng.html.

³Related information on these three TREC Web collections can be found at the following URL: http://ir.dcs.gla.ac.uk/test_collections/.