A novel probabilistic quantifier fuzzification mechanism for information retrieval
Félix Díaz-Hermida, David E. Losada, Alberto Bugarín, Senén Barro
Intelligent Systems Group, Department of Electronics and Computer Science, University of Santiago de Compostela, 15782 Santiago de Compostela
{felixdh,dlosada,alberto,senen}@dec.usc.es
Abstract
In this work, a novel quantifier fuzzification mechanism is proposed. This method is deeply rooted in the theory of probability and avoids the nested assumption for crisp representatives, which is often made by other probabilistic approaches to quantification. The new proposal takes into account all possible crisp representatives, which leads to a natural and intuitive strategy for information retrieval tasks. Furthermore, a preliminary analysis of the formal properties of the fuzzification mechanism permits us to advance that the application of this method in other domains is also promising.
1 Introduction
Fuzzy quantifiers have been extensively applied in diverse fields such as expert systems, monitoring and control of processes, database systems, etc. [2, 12]. These fuzzy tools have played a key role in such domains because the linguistic statements a human expert uses are naturally modeled and, hence, the expressiveness of the system is enriched.
Fuzzy quantifiers can be defined in a direct way, e.g. by proposing a form of combination of the membership values of the elements belonging to the involved fuzzy set(s). Nevertheless, given a certain linguistic expression, it is often difficult to achieve consensus about the most appropriate quantified definition. In order to avoid such inconveniences, indirect definitions of fuzzy quantifiers have been introduced [9], based on the concept of semi-fuzzy quantifier, which is a half-way point between classic quantifiers and fuzzy quantifiers. Fuzzy quantifiers are intuitively defined from semi-fuzzy quantifiers through a so-called quantifier fuzzification mechanism. Because semi-fuzzy quantifiers are closer to the well-known classic quantifiers, the implementation of a linguistic expression in terms of semi-fuzzy quantifiers is more natural and intuitive.
In the information retrieval (IR) literature, fuzzy quantification has been applied for designing flexible query languages [1]. Since the connection between query term semantics and document contents is inherently vague, the retrieval process can be naturally modelled in terms of fuzzy sets. Fuzzy quantification supplies appropriate formal tools for handling linguistic expressions which enrich the query languages of the IR system. This helps users to establish additional constraints in the retrieval process (e.g. retrieved documents should match at least 3 of the query terms).
The importance of fuzzy quantifiers for IR was empirically demonstrated for large collections of documents in [11]. Nevertheless, this practical deployment of fuzzy quantifiers also revealed that a new class of quantifier fuzzification mechanisms may be beneficial. This motivated us to define here a new fuzzification method, which is evaluated for a retrieval task. Furthermore, the most relevant properties the model fulfills are presented, advancing the adequacy of this new approach for other domains.
The rest of this paper is organized as follows. Section 2 briefly reports some background concepts on fuzzy quantification. Section 3 explains the new quantification proposal and Section 4 applies the new quantification framework to information retrieval. The paper ends with some conclusions.
2 Fuzzy quantifiers
The formal notions of classic quantifier, fuzzy quantifier and semi-fuzzy quantifier have been used to associate meaning to quantified sentences [9]. Formally, a classic s-ary quantifier on a base or referential set $E$ is a mapping $Q : \mathcal{P}(E)^s \to \{0, 1\}$, where $\mathcal{P}(E)$ is the powerset of $E$. Throughout this work we assume the referential set $E$ to be finite, which is sufficient from a practical perspective.
An s-ary fuzzy quantifier $\widetilde{Q}$ on a base set $E \neq \emptyset$ is a mapping $\widetilde{Q} : \widetilde{\mathcal{P}}(E)^s \to [0, 1]$ which to each choice of $X_1, \ldots, X_s \in \widetilde{\mathcal{P}}(E)$ assigns a gradual result $\widetilde{Q}(X_1, \ldots, X_s) \in [0, 1]$ ($\widetilde{\mathcal{P}}(E)$ is the fuzzy powerset of $E$).
In many cases, it is not easy to achieve consensus on an intuitive and generally applicable expression for implementing a given quantified sentence. To overcome this problem, the concept of semi-fuzzy quantifier was introduced [9]. A semi-fuzzy quantifier is a half-way point between classic quantifiers and fuzzy quantifiers, which is very close to the idea of Zadeh's linguistic quantifier [14]. Semi-fuzzy quantifiers are similar to classic quantifiers, but they allow variation of the results in $[0, 1]$. Formally, an s-ary semi-fuzzy quantifier $Q$ on a base set $E \neq \emptyset$ is a mapping $Q : \mathcal{P}(E)^s \to [0, 1]$ which assigns a gradual result $Q(Y_1, \ldots, Y_s) \in [0, 1]$ to each choice of crisp $Y_1, \ldots, Y_s \in \mathcal{P}(E)$.
Semi-fuzzy quantifiers are much more intuitive and easier to define than fuzzy quantifiers, but they do not resolve the problem of evaluating fuzzy quantified sentences. In order to do so, quantifier fuzzification mechanisms are needed [9] that enable us to transform semi-fuzzy quantifiers into fuzzy quantifiers, i.e. mappings with domain in the universe of semi-fuzzy quantifiers and range in the universe of fuzzy quantifiers:
$$\mathcal{F} : \left(Q : \mathcal{P}(E)^s \to [0, 1]\right) \mapsto \left(\mathcal{F}(Q) : \widetilde{\mathcal{P}}(E)^s \to [0, 1]\right)$$
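As an illustration of the distinction drawn in this section, the following Python sketch models a classic quantifier and a semi-fuzzy quantifier as plain functions over crisp sets. The names `at_least_3` and `proportion_of`, as well as the sample query terms, are hypothetical and chosen only for illustration; they do not come from the paper.

```python
from typing import Callable, FrozenSet

CrispSet = FrozenSet[str]


def at_least_3(y: CrispSet) -> float:
    """Classic unary quantifier: crisp argument, result in {0, 1}."""
    return 1.0 if len(y) >= 3 else 0.0


def proportion_of(universe: CrispSet) -> Callable[[CrispSet], float]:
    """Build a semi-fuzzy unary quantifier: crisp argument, result in [0, 1]."""
    def quantifier(y: CrispSet) -> float:
        return len(y & universe) / len(universe)
    return quantifier


# Hypothetical query terms and the subset matched by some document.
query_terms = frozenset({"fuzzy", "quantifier", "retrieval", "probability"})
matched = frozenset({"fuzzy", "retrieval", "probability"})

share = proportion_of(query_terms)
print(at_least_3(matched))  # 1.0
print(share(matched))       # 0.75
```

Neither function accepts a fuzzy argument. Supplying that extension is the job of a quantifier fuzzification mechanism.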
2.1 Quantifier fuzzification mechanisms
Several methods for evaluating quantified sentences have been proposed in the literature [9, 5]. In [5] two models for the evaluation of fuzzy quantified sentences are proposed. These models show a very consistent behaviour and are based on a voting model interpretation of fuzzy sets [10, 6]. If the universe of discourse $E$ is finite and expressions are unary (i.e. involve a single fuzzy set) then both models collapse into the same expression:
$$\widetilde{Q}(X) = \sum_{i=0}^{m} (\alpha_i - \alpha_{i+1})\, Q\big((X)_{\geq \alpha_i}\big) \qquad (1)$$
where $m = |E|$, $\alpha_0 = 1$, $\alpha_{m+1} = 0$ and $\alpha_1 \geq \ldots \geq \alpha_m$ denote the membership values in descending order of the elements in $E$ to the fuzzy set $X$, and $(X)_{\geq \alpha}$ stands for the $\alpha$-cut of level $\alpha$ of $X$, i.e. the crisp set containing the elements of $E$ whose degree of membership in $X$ is greater than or equal to $\alpha$. For the unary and binary cases, expression 1 is equivalent to the quantification model defined in [3] (keeping aside differences related to representative normalization). Moreover, for the case of non-decreasing unary quantifiers, it is equivalent to the quantification method based on ordered weighted operators [13].
In equation 1, the value $\alpha_i - \alpha_{i+1}$ can be interpreted as the probability that $(X)_{\geq \alpha_i}$ is selected as the crisp representative for the fuzzy set $X$. Therefore, the semi-fuzzy quantifier is applied to every crisp representative of $X$ and those values are weighted by the probability of each crisp representative. In this formulation, the use of $\alpha$-cuts implies that, given the fuzzy set $X$, the crisp representatives $(X)_{\geq \alpha_i}$ are nested.
In [11], equation 1 was empirically evaluated for the basic IR task. Although this experimentation made evident the benefits that IR might obtain from fuzzy quantifiers, it also revealed that the nested assumption may not always be appropriate, as will be seen in the following sections. This motivated us to propose a novel probabilistic method that skips the nested assumption. In section 4 we will enter into details on the adequacy of the new quantification proposal for IR. Throughout this paper, the approach sketched in equation 1 will be referred to as the NVM (Nested Voting Model) approach.
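The NVM evaluation can be sketched in a few lines of Python. The sketch assumes that equation 1 is the weighted sum over $\alpha$-cuts described above, with weights given by the differences between consecutive membership levels ($\alpha_0 = 1$, then the memberships in descending order, then $0$). The function name `nvm_unary`, the fuzzy set and the proportional quantifier are hypothetical illustrations, not taken from the paper.

```python
from typing import Callable, Dict, FrozenSet


def nvm_unary(q: Callable[[FrozenSet[str]], float], x: Dict[str, float]) -> float:
    """Evaluate a unary semi-fuzzy quantifier q on a fuzzy set x via nested alpha-cuts."""
    # alpha_0 = 1, then the memberships in descending order, then alpha_{m+1} = 0.
    alphas = [1.0] + sorted(x.values(), reverse=True) + [0.0]
    total = 0.0
    for i in range(len(alphas) - 1):
        weight = alphas[i] - alphas[i + 1]
        if weight > 0.0:
            # Crisp representative: the elements whose membership is >= alpha_i.
            cut = frozenset(e for e, mu in x.items() if mu >= alphas[i])
            total += weight * q(cut)
    return total


# Hypothetical data: a two-element fuzzy set and a proportional quantifier.
x = {"a": 0.5, "b": 1.0}


def proportion(y: FrozenSet[str]) -> float:
    return len(y) / len(x)


print(nvm_unary(proportion, x))  # 0.75
```

Only the cuts `{b}` and `{a, b}` receive a non-zero weight here, which is the nesting of crisp representatives discussed above.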
3 A probabilistic interpretation of fuzzy sets
Given a fuzzy set $X \in \widetilde{\mathcal{P}}(E)$, the process that selects a number of elements in $E$ to belong to a crisp representative of $X$ can be viewed as a random process in which $m$ mutually independent binary decisions are made ($m = |E|$). Every individual decision involving an element $e \in E$ may be viewed as a Bernoulli trial whose probability of success (i.e. the probability of selecting $e$ for representing $X$) is equal to $\mu_X(e)$. Hence, for every possible crisp representative of $X$, $Y \in \mathcal{P}(E)$, we can estimate its probability as follows. Given a discrete random variable $\mathit{representative}_X$ which takes values on $\mathcal{P}(E)$, the probability that $\mathit{representative}_X$ results in $Y$ is equal to:
$$\Pr(\mathit{representative}_X = Y) = \prod_{e \in Y} \mu_X(e) \prod_{e \in E \setminus Y} \left(1 - \mu_X(e)\right)$$
For simplicity, we introduce the following compact notation: $m_X(Y) = \Pr(\mathit{representative}_X = Y)$.
In the next section this definition is used for designing a novel quantifier fuzzification mechanism based on this independence assumption.
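The probability of a crisp representative under the independence assumption can be computed directly from the membership degrees. The Python sketch below multiplies $\mu_X(e)$ for the selected elements and $1 - \mu_X(e)$ for the rest, and checks that the probabilities of all crisp representatives add up to one. The function names and the two-element fuzzy set are hypothetical.

```python
from itertools import combinations
from typing import Dict, FrozenSet, Iterable, Iterator


def representative_probability(x: Dict[str, float], y: FrozenSet[str]) -> float:
    """Probability that the crisp set y is drawn as the representative of fuzzy set x."""
    p = 1.0
    for e, mu in x.items():
        # Independent Bernoulli trial per element: success with probability mu.
        p *= mu if e in y else 1.0 - mu
    return p


def powerset(universe: Iterable[str]) -> Iterator[FrozenSet[str]]:
    """Yield every subset of the universe, smallest first."""
    items = list(universe)
    for k in range(len(items) + 1):
        for combo in combinations(items, k):
            yield frozenset(combo)


# Hypothetical fuzzy set over a two-element universe.
x = {"a": 0.5, "b": 0.25}
probs = {y: representative_probability(x, y) for y in powerset(x)}

print(probs[frozenset({"a"})])  # 0.375
print(sum(probs.values()))      # 1.0
```

Unlike the $\alpha$-cut view, non-nested sets such as `{a}` and `{b}` both receive a positive probability.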
3.1 A new fuzzification mechanism
Following the previous definition, a new fuzzification method in which all possible crisp representatives of a given fuzzy set $X$ are considered arises in a natural way. This contrasts with the NVM approach, in which only the nested crisp sets obtained by successive $\alpha$-cuts on $X$ are taken into account.
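Definition 1 ($\mathcal{F}^A$) Let $Q : \mathcal{P}(E)^s \to [0, 1]$ be a semi-fuzzy quantifier. We define the quantifier fuzzification mechanism $\mathcal{F}^A$ as:
$$\mathcal{F}^A(Q)(X_1, \ldots, X_s) = \sum_{Y_1 \in \mathcal{P}(E)} \cdots \sum_{Y_s \in \mathcal{P}(E)} m_{X_1}(Y_1) \cdots m_{X_s}(Y_s)\, Q(Y_1, \ldots, Y_s)$$
where $m_{X_i}(Y_i)$ is the probability, introduced in section 3, that $Y_i$ is the crisp representative of $X_i$.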
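Definition 1 can be evaluated literally by enumerating every tuple of crisp representatives. The Python sketch below does so, assuming the definition is the sum, over all tuples of crisp sets, of the product of the representative probabilities times the value of the semi-fuzzy quantifier. All names (`fuzzify`, `rep_prob`, `some`) and the membership degrees are hypothetical; `math.prod` requires Python 3.8 or later.

```python
from itertools import combinations, product
from math import prod
from typing import Callable, Dict, FrozenSet, List

FuzzySet = Dict[str, float]
CrispSet = FrozenSet[str]


def powerset(universe: List[str]) -> List[CrispSet]:
    """All crisp subsets of the universe."""
    return [frozenset(c) for k in range(len(universe) + 1)
            for c in combinations(universe, k)]


def rep_prob(x: FuzzySet, y: CrispSet) -> float:
    """Probability that y is the crisp representative of x (independent trials)."""
    return prod(mu if e in y else 1.0 - mu for e, mu in x.items())


def fuzzify(q: Callable[..., float], *xs: FuzzySet) -> float:
    """Brute-force evaluation over all tuples of crisp sets; exponential in |E|."""
    subsets = powerset(list(xs[0]))  # all arguments share the same universe
    total = 0.0
    for ys in product(subsets, repeat=len(xs)):
        weight = prod(rep_prob(x, y) for x, y in zip(xs, ys))
        total += weight * q(*ys)
    return total


def some(y1: CrispSet, y2: CrispSet) -> float:
    """Binary semi-fuzzy quantifier 'some Y1 are Y2'."""
    return 1.0 if y1 & y2 else 0.0


# Hypothetical membership degrees over the universe {a, b}.
tall = {"a": 0.5, "b": 0.0}
blonde = {"a": 0.5, "b": 1.0}
print(fuzzify(some, tall, blonde))  # 0.25
```

The loop visits $2^{m \cdot s}$ tuples for $m$ elements and $s$ arguments, so this form is only usable for very small universes.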
Following this definition, all the crisp representatives are handled independently and no crisp representative is disregarded a priori. The superindex $A$ was chosen to stress that all crisp representatives are considered.
Unfortunately, in the general case $\mathcal{F}^A$ is not computable in polynomial time. Nevertheless, when quantitative semi-fuzzy quantifiers (i.e. those which can be expressed as a function of the cardinalities of the involved sets; more specifically, those which can be expressed as a function of the cardinalities of the arguments and their boolean combinations [8]) are handled, it is possible to develop polynomial time algorithms. This is very important because quantitative quantifiers are the most interesting from a practical point of view [4, 7] and, indeed, sufficient for our purposes in IR.
Now we will sketch the procedure for solving quantitative unary quantifiers. Algorithms for solving higher arity quantitative quantifiers can be designed using similar ideas.
We will denote by $E = \{e_1, \ldots, e_m\}$ a referential containing $m$ elements. By $Y \in \mathcal{P}(E)$ ($X \in \widetilde{\mathcal{P}}(E)$) we will denote a crisp (fuzzy) set on this referential.
Let us consider a unary semi-fuzzy quantitative quantifier:
$$Q(Y) = q(|Y|), \quad Y \in \mathcal{P}(E)$$
where $q$ is a function with the form $q : \{0, \ldots, m\} \to [0, 1]$.
For this case the independence quantification expression becomes:
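$$\begin{aligned}
\mathcal{F}^A(Q)(X) &= \sum_{Y \in \mathcal{P}(E)} m_X(Y)\, Q(Y) \\
&= \sum_{Y \in \mathcal{P}(E),\, |Y| = 0} m_X(Y)\, Q(Y) + \cdots + \sum_{Y \in \mathcal{P}(E),\, |Y| = m} m_X(Y)\, Q(Y) \\
&= \sum_{Y \in \mathcal{P}(E),\, |Y| = 0} m_X(Y)\, q(0) + \cdots + \sum_{Y \in \mathcal{P}(E),\, |Y| = m} m_X(Y)\, q(m)
\end{aligned}$$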
If we denote $\sum_{Y \in \mathcal{P}(E),\, |Y| = j} m_X(Y)$ by $\Pr(\mathrm{card}_X = j)$ (footnote 3), we can rewrite the previous expression as follows:
$$\mathcal{F}^A(Q)(X) = \Pr(\mathrm{card}_X = 0)\, q(0) + \cdots + \Pr(\mathrm{card}_X = m)\, q(m) = \sum_{j=0}^{m} \Pr(\mathrm{card}_X = j)\, q(j)$$
It can be proved that the values $\Pr(\mathrm{card}_X = j)$ can be obtained with a complexity $O(m^2)$.
The next example clarifies the use of the new quantification approach. First, probabilities of all possible crisp representatives are computed and, next, the previous expression is applied.
Example 1 Let us consider the evaluation of the quantified sentence "almost all students are tall". Suppose that we model the property tall for a number of individuals $E = \{e_1, e_2, e_3\}$ through the fuzzy set $\mathit{tall} = \{0.8/e_1, 0.9/e_2, 1/e_3\}$ and we support the quantified expression "almost all" by means of the following semi-fuzzy quantifier:
$$Q(X) = q(|X|), \qquad q : \{0, 1, 2, 3\} \to [0, 1]$$
[the explicit expression of $q$ is illegible in this transcript]
First, we compute the probabilities $\Pr(\mathrm{card}_{\mathit{tall}} = j)$ for every value of $j$:
(Footnote 3: This value can be interpreted as the probability that the fuzzy set $X$ is represented by a crisp set whose size is $j$.)
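$$\begin{aligned}
\Pr(\mathrm{card}_{\mathit{tall}} = 0) &= m_{\mathit{tall}}(\emptyset) = 0.2 \cdot 0.1 \cdot 0 = 0 \\
\Pr(\mathrm{card}_{\mathit{tall}} = 1) &= m_{\mathit{tall}}(\{e_1\}) + m_{\mathit{tall}}(\{e_2\}) + m_{\mathit{tall}}(\{e_3\}) \\
&= 0.8 \cdot 0.1 \cdot 0 + 0.2 \cdot 0.9 \cdot 0 + 0.2 \cdot 0.1 \cdot 1 = 0.02 \\
\Pr(\mathrm{card}_{\mathit{tall}} = 2) &= m_{\mathit{tall}}(\{e_1, e_2\}) + m_{\mathit{tall}}(\{e_1, e_3\}) + m_{\mathit{tall}}(\{e_2, e_3\}) \\
&= 0.8 \cdot 0.9 \cdot 0 + 0.8 \cdot 0.1 \cdot 1 + 0.2 \cdot 0.9 \cdot 1 = 0.08 + 0.18 = 0.26 \\
\Pr(\mathrm{card}_{\mathit{tall}} = 3) &= m_{\mathit{tall}}(\{e_1, e_2, e_3\}) = 0.8 \cdot 0.9 \cdot 1 = 0.72
\end{aligned}$$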
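And then,
$$\begin{aligned}
\mathcal{F}^A(Q)(\mathit{tall}) &= \sum_{j=0}^{3} \Pr(\mathrm{card}_{\mathit{tall}} = j)\, q(j) \\
&= \Pr(\mathrm{card}_{\mathit{tall}} = 0)\, q(0) + \Pr(\mathrm{card}_{\mathit{tall}} = 1)\, q(1) + \Pr(\mathrm{card}_{\mathit{tall}} = 2)\, q(2) + \Pr(\mathrm{card}_{\mathit{tall}} = 3)\, q(3) \\
&= 0 \cdot q(0) + 0.02 \cdot q(1) + 0.26 \cdot q(2) + 0.72 \cdot q(3)
\end{aligned}$$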
It is interesting to see how the NVM approach proceeds against the same example. Given the fuzzy set $\mathit{tall}$, the $\alpha$ values (equation 1) are $\alpha_0 = 1$, $\alpha_1 = 1$, $\alpha_2 = 0.9$, $\alpha_3 = 0.8$, $\alpha_4 = 0$ and the fuzzification process runs as follows:
$$\begin{aligned}
\widetilde{Q}(\mathit{tall}) &= \sum_{i=0}^{3} (\alpha_i - \alpha_{i+1})\, Q\big((\mathit{tall})_{\geq \alpha_i}\big) \\
&= (1 - 1)\, Q(\{e_3\}) + (1 - 0.9)\, Q(\{e_3\}) + (0.9 - 0.8)\, Q(\{e_2, e_3\}) + (0.8 - 0)\, Q(\{e_1, e_2, e_3\}) \\
&= 0.1 \cdot q(1) + 0.1 \cdot q(2) + 0.8 \cdot q(3)
\end{aligned}$$
This example clarifies the differences between both methods. For instance, the NVM approach estimates the odds that exactly two individuals are tall by means of the two elements of $\mathit{tall}$ having the largest degrees of membership, $e_2$ and $e_3$. It is implicitly assumed that, if only two individuals are considered tall, these must be $e_2$ and $e_3$. This assumption is completely reasonable if the fuzzy set $\mathit{tall}$ was built from measures of the height of the individuals involved. In that case, it is difficult to imagine a situation in which, for instance, $e_3$ and $e_1$ are considered tall individuals whereas $e_2$ is not considered a tall person. Nevertheless, consider that the fuzzy set $\mathit{tall}$ may represent predictions about the height of future descendants for three couples (e.g. estimated from the height of the members of each couple) and, hence, it might be the case that $e_3$ and $e_1$ do finally produce tall descendants whereas $e_2$ produces a short one.
Summing up, it may be adequate to consider all possible two-sized crisp representatives for computing the odds that exactly two elements comply with the property formalized by the fuzzy set. This is precisely what the $\mathcal{F}^A$ method does. The odds that exactly two individuals are tall are computed taking into account the odds that only $e_1$ and $e_2$ are tall, the odds that only $e_2$ and $e_3$ are tall and the odds that only $e_1$ and $e_3$ are tall.
As we will detail later, IR is founded on a number of useful heuristics that have played a fundamental role in enhancing retrieval performance, e.g. tf/idf to weight the importance of a term for a given document. Of course, this class of heuristics is not perfect and, hence, one can never be sure that the two terms which are the most significant in the context of a given document are those that have the highest tf/idf values. As a consequence, the $\mathcal{F}^A$ approach is a good support for our application of fuzzy quantifiers in IR.
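For quantitative unary quantifiers, the distribution of the size of the crisp representative is enough to evaluate the mechanism. The paper states that this distribution can be obtained in $O(m^2)$ but does not spell out the algorithm in this excerpt; the Python sketch below uses a standard recurrence that adds one element at a time, which is one way (an assumption here) to reach that bound. The membership degrees 0.8, 0.9 and 1 are those read from Example 1, and the quantifier `all_three` is a hypothetical stand-in, not the "almost all" quantifier of the example.

```python
from typing import Callable, List


def cardinality_distribution(memberships: List[float]) -> List[float]:
    """Return [Pr(card = 0), ..., Pr(card = m)] under independent selection."""
    dist = [1.0]  # with no elements processed, the representative is empty
    for mu in memberships:
        nxt = [0.0] * (len(dist) + 1)
        for j, p in enumerate(dist):
            nxt[j] += p * (1.0 - mu)   # element left out of the representative
            nxt[j + 1] += p * mu       # element selected
        dist = nxt
    return dist


def fuzzify_quantitative(q: Callable[[int], float], memberships: List[float]) -> float:
    """Sum over j of Pr(card = j) * q(j), for a quantifier that depends only on |Y|."""
    return sum(p * q(j) for j, p in enumerate(cardinality_distribution(memberships)))


def all_three(j: int) -> float:
    """Hypothetical quantitative quantifier: 1 only when all three elements are in."""
    return 1.0 if j == 3 else 0.0


tall = [0.8, 0.9, 1.0]  # membership degrees assumed from Example 1
print([round(p, 4) for p in cardinality_distribution(tall)])  # [0.0, 0.02, 0.26, 0.72]
print(round(fuzzify_quantitative(all_three, tall), 4))        # 0.72
```

The size-two probability of 0.26 aggregates all three two-element sets. Reading equation 1 as the $\alpha$-cut weighted sum, the NVM approach would instead assign weight 0.1 to the single set $\{e_2, e_3\}$ and would give `all_three` the value 0.8.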
3.2 Properties of the model
A formal analysis of the properties fulfilled by the new fuzzification approach is currently under way. In this study we follow the axiomatic framework presented in [9]. We can advance that the model is well-behaved because it fulfills the properties of correct generalization of crisp expressions, induced operations, external negation, internal negation, duality, internal meets, monotonicity in arguments, monotonicity in quantifiers and coherence with logic. This assures that the proposed quantification method yields a natural and intuitive modeling of quantified expressions, as depicted briefly in the following examples. For instance, if external negation is not fulfilled, sentences such as "at most 10 tall individuals are blonde" and "not more than 10 tall individuals are blonde" are not considered equivalent. The sentences "all tall individuals are blonde" and "no tall individual is not blonde" are only equivalent when the quantification model complies with internal negation. Duality assures that "some tall individuals are blonde" and "not all tall individuals are not blonde" are equivalent, and the equivalence between "some tall individuals are blonde" and "there is some individual who is tall and blonde" is guaranteed by the property of internal meets. Monotonicity in quantifiers assures that the result of evaluating an expression such as "about 80% or more of the tall individuals are blonde" is less than or equal to the result obtained from "about 60% or more of the tall individuals are blonde". These examples show clearly that unacceptable and counterintuitive situations might arise when the quantification approach does not comply with some of these fundamental properties. Since the $\mathcal{F}^A$ method complies with such properties, its application in a wide range of domains is promising.
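Some of these properties can be spot-checked numerically for quantitative unary quantifiers. The Python sketch below is a sanity check on one hypothetical fuzzy set, not a proof. It assumes the usual readings of the properties: external negation as $\mathcal{F}(1 - Q) = 1 - \mathcal{F}(Q)$, internal negation as evaluating $q(m - j)$ on $X$ versus $q(j)$ on the complement of $X$, and monotonicity in quantifiers as $Q_1 \leq Q_2 \Rightarrow \mathcal{F}(Q_1) \leq \mathcal{F}(Q_2)$. The quantifiers `q`, `at_least_60` and `at_least_80` are crisp-threshold stand-ins invented for illustration.

```python
from typing import Callable, List


def cardinality_distribution(memberships: List[float]) -> List[float]:
    """Return [Pr(card = 0), ..., Pr(card = m)] under independent selection."""
    dist = [1.0]
    for mu in memberships:
        nxt = [0.0] * (len(dist) + 1)
        for j, p in enumerate(dist):
            nxt[j] += p * (1.0 - mu)
            nxt[j + 1] += p * mu
        dist = nxt
    return dist


def fuzzify(q: Callable[[int], float], memberships: List[float]) -> float:
    """Probabilistic evaluation of a quantifier that depends only on |Y|."""
    return sum(p * q(j) for j, p in enumerate(cardinality_distribution(memberships)))


x = [0.2, 0.5, 0.7, 0.9]  # hypothetical membership degrees
m = len(x)


def q(j: int) -> float:
    return (j / m) ** 2


def at_least_60(j: int) -> float:
    return 1.0 if j / m >= 0.6 else 0.0


def at_least_80(j: int) -> float:
    return 1.0 if j / m >= 0.8 else 0.0


# External negation: negating the quantifier negates the result.
ext_lhs = fuzzify(lambda j: 1.0 - q(j), x)
ext_rhs = 1.0 - fuzzify(q, x)

# Internal negation: q applied to the complement's size vs. the complemented fuzzy set.
int_lhs = fuzzify(lambda j: q(m - j), x)
int_rhs = fuzzify(q, [1.0 - mu for mu in x])

print(abs(ext_lhs - ext_rhs) < 1e-12)                     # True
print(abs(int_lhs - int_rhs) < 1e-12)                     # True
print(fuzzify(at_least_80, x) <= fuzzify(at_least_60, x))  # True
```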
4 Application in information retrieval
The adequacy of fuzzy quantifiers for IR was already anticipated in [1]. In a recent work [11], a query language enriched with quantified statements was empirically tested. This evaluation revealed that fuzzy quantifiers are beneficial in terms of retrieval performance. The proposal in [11] designs a general framework based on the NVM method in which quantifiers with different degrees of expressiveness can be handled. In the experimental setting quantified expressions were handled through unary quantifiers. This approach subsumes the quantification model based on ordered weighted operators [13] and, as argued before, it falls into the nested assumption for crisp representatives.
The next paragraphs sketch the use and convenience of the $\mathcal{F}^A$ model in the context of the basic IR task.
Consider a query formulation with the form
