A New Version of Bayesian Rough Set Based on Bayesian Confirmation Measures

Ayad R. Abbas¹, Liu Juan², Safaa O. Mahdi³

¹,² School of Computer, Wuhan University, Wuhan 430079, China
³ Harbin Institute of Technology, Harbin 150001, China
¹ ayad_cs@yahoo.com, ² liujuan@whu.edu.cn, ³ safaa_vb@yahoo.com
Abstract
Bayesian confirmation measures (BCM) quantify the strength with which a piece of evidence confirms, disconfirms, or is confirmationally irrelevant to a hypothesis under test. In this paper, BCM are applied within the Bayesian rough set (BRS) approach to introduce a parametric extension of BRS that handles totally ambiguous regions, enhances the precision of rough sets, and deals with both two decision classes and multiple decision classes. The concept is demonstrated by an example; the simulated results give good accuracy and precise information with few computational steps.
1. Introduction
Rough set theory [1], introduced by Pawlak, has been described as an efficient tool for extracting significant rules from data and for finding a minimal set of data carrying the same knowledge. The original rough set model addresses only the case where the available information is fully correct. Consequently, extending rough sets to handle the ambiguity of information has become an important issue in many applications. Pawlak defined an information system as a pair A = (U, A), where U is a finite nonempty set of objects, called the universe, and A is a finite nonempty set of attributes, further partitioned into two disjoint subsets: condition attributes C and decision attributes D, with A = C ∪ D. Let R be an equivalence relation on U; U/R denotes the family of all equivalence classes of R, and the equivalence classes of R are called elementary sets. The main idea of rough sets is expressed through the upper and lower approximations of a set X: the lower approximation yields certain rules, while the upper approximation yields possible rules.

Several researchers have extended the rough set model. First, the variable precision rough set model (VPRS) [2] is a parametric version designed to overcome misclassification problems. Second, the Bayesian rough set model (BRS) [3] is a non-parametric version and a special case of VPRS. Third, the variable precision Bayesian rough set model (VPBRS) [4] is a parametric Bayesian extension of the rough set model. Fourth, rough sets and Bayes factor [5] is a novel approach that interprets the concepts of rough set theory in terms of inverse probabilities derivable from data.

In this paper, a parametric refinement of BRS is introduced that uses a single parameter rather than the multiple parameters of the VPRS model [2], which are especially burdensome in multi-decision problems. Moreover, there are two significant differences from the previous VPBRS model [4]: first, the proposed BRS deals with multiple decision classes; second, the proposed method is used only when the posterior probabilities lie strictly between 0 and 1. This improvement is applied in a distance learning setting to extract decision rules from a student information table in order to enhance student/teacher feedback. The obtained results show that the proposed approach achieves good accuracy with few computation steps, making distance learning more realistic.

This paper is organized as follows. Section 2 outlines the basic notions of BCM. Section 3 introduces VPRS based on BCM. Section 4 introduces BRS based on BCM. Section 5 describes BRS-based distance learning, intended to overcome the lack of feedback. Section 6 presents an illustrative example, and Section 7 concludes the paper.
2. Bayesian Confirmation Measures
According to Bayesian confirmation theory [6], evidence E confirms a hypothesis X just in case E and X are positively probabilistically correlated (under an appropriate probability function P), as expressed by the inequalities below:

    P(X|E) > P(X)
    P(X|E) > P(X|¬E)
    P(E|X) > P(E|¬X)     (1)
    P(E ∩ X) > P(E) × P(X)

2007 International Conference on Convergence Information Technology. 0-7695-3038-9/07 $25.00 © 2007 IEEE. DOI 10.1109/ICCIT.2007.77

where P(X) is the probability of hypothesis X (the prior probability), P(X|E) is the conditional probability of hypothesis X given evidence E, and P(E|X) is the conditional probability of evidence E given hypothesis X.
For instance, by taking the difference between the left- and right-hand sides of any of these inequalities, one may construct an (intuitively plausible) measure c(X, E) of the degree to which E confirms X; such a measure is called a relevance measure. Any such measure is bound to satisfy the following qualitative constraint, covering the cases where E confirms, disconfirms, or is confirmationally irrelevant to X:

    c(X, E) > 0 if P(X|E) > P(X)
    c(X, E) < 0 if P(X|E) < P(X)     (2)
    c(X, E) = 0 if P(X|E) = P(X)

The best-known confirmation measures proposed in the literature are the following:

    d(X, E) = P(X|E) − P(X)
    r(X, E) = log [ P(X|E) / P(X) ]     (3)
    l(X, E) = log [ P(E|X) / P(E|¬X) ]
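As a sketch, the three measures in (3) can be computed as follows; base-10 logarithms are assumed here, since they reproduce the worked figures in Section 6:

```python
import math

def d_measure(p_x_given_e, p_x):
    """Difference measure d(X, E) = P(X|E) - P(X)."""
    return p_x_given_e - p_x

def r_measure(p_x_given_e, p_x):
    """Log-ratio measure r(X, E) = log10(P(X|E) / P(X))."""
    return math.log10(p_x_given_e / p_x)

def l_measure(p_e_given_x, p_e_given_not_x):
    """Likelihood measure l(X, E) = log10(P(E|X) / P(E|not X))."""
    return math.log10(p_e_given_x / p_e_given_not_x)

# E confirms X exactly when a measure is positive, e.g. with
# P(X|E) = 0.58 and P(X) = 0.14 (figures reused in Section 6):
print(round(d_measure(0.58, 0.14), 2))  # 0.44
print(round(r_measure(0.58, 0.14), 2))  # 0.62
```

A positive value under any of the three measures corresponds to the confirmation case of constraint (2).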
3. Variable Precision Rough Set Based on Bayesian Confirmation Measures
3.1. VPRS in terms of two decision classes
A parametric refinement of VPRS is introduced in which a single parameter controls the degree of uncertainty in the boundary region, instead of the two parameters of the original model (the upper parameter u and the lower parameter l). Let P(X_t) be the probability of the target concept (positive hypothesis) and P(X_c) the probability of the complement concept (negative hypothesis), so that P(X_c) = 1 − P(X_t) and P(X_c|E_i) = 1 − P(X_t|E_i). The difference between the posterior probability P(X_t|E_i) and the prior probability P(X_t) is compared with a threshold value α_t. A single parameter is assumed rather than two in order to reduce the number of parameters per decision class; it is defined as follows:

    α_t = u_t − P(X_t) = P(X_t) − l_t     (4)

The following condition is suggested for choosing the VPRS parameter:

    0 < α_t ≤ MIN( P(X_t), Σ_{c≠t} P(X_c) )     (5)

The complement concept is defined as:

    P(¬X_t) = Σ_{c≠t} P(X_c)     (6)

The improved VPRS positive, negative, and boundary regions for two decision classes, based on the difference measure d from (3), are respectively defined as:
    POS_α(X_t) = ∪ { E_i : d(X_t, E_i) ≥ α_t }
    NEG_α(X_t) = ∪ { E_i : d(X_t, E_i) ≤ −α_t }     (7)
    BND_α(X_t) = ∪ { E_i : −α_t < d(X_t, E_i) < α_t }

where α_t = α_c, which implies l_t + u_c = u_t + l_c = 1. If α_t is increased up to the maximum value allowed by (5), the positive and negative regions shrink by equal amounts and the boundary region grows correspondingly.
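The two-class region assignment of (7) can be sketched as follows; the function name and the example posteriors are illustrative, not taken from the paper:

```python
def vprs_regions(posteriors, prior, alpha):
    """Assign each elementary set E_i to the POS/NEG/BND region of the
    target concept X_t, using the difference measure
    d(X_t, E_i) = P(X_t|E_i) - P(X_t) and a single threshold alpha (Eq. 7)."""
    pos, neg, bnd = [], [], []
    for name, p in posteriors.items():
        diff = p - prior              # d(X_t, E_i)
        if diff >= alpha:             # strongly confirmed -> positive region
            pos.append(name)
        elif diff <= -alpha:          # strongly disconfirmed -> negative region
            neg.append(name)
        else:                         # -alpha < d < alpha -> boundary region
            bnd.append(name)
    return pos, neg, bnd

# Illustrative posteriors P(X_t|E_i) with prior P(X_t) = 0.5 and alpha = 0.3:
print(vprs_regions({"E0": 0.9, "E1": 0.1, "E2": 0.6}, 0.5, 0.3))
# (['E0'], ['E1'], ['E2'])
```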
3.2. VPRS in terms of multi decision classes
According to the Bayes factor approach [5], the improved VPRS model is defined as follows:

    POS_α(X_t) = ∪ { E_i : ∀c≠t, d(X_c, E_i) ≤ −α_c }
    NEG_α(X_t) = ∪ { E_i : d(X_t, E_i) ≤ −α_t }     (8)
    BND_α(X_t) = ∪ { E_i : d(X_t, E_i) > −α_t ∧ ∃c≠t, d(X_c, E_i) > −α_c }
Condition (5) can be used to choose α_t, and condition (9) can be used to calculate and parameterize the thresholds α_c for all complement concepts X_c, as shown below:

    ∀c≠t:  α_c = α_t × P(X_c) / Σ_{c≠t} P(X_c)     (9)
The parameters then satisfy the equalities:

    α_t = Σ_{c≠t} α_c,   l_t + Σ_{c≠t} u_c = u_t + Σ_{c≠t} l_c = 1     (10)

The main difference between this approach and the previous ones [2, 5] is that only the single target parameter is chosen; the complement parameters are then calculated from the mathematical model (9), rather than all parameters being chosen independently.
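A minimal sketch of the parameterization in (9), using the class priors that appear later in the Section 6 example; the helper name complement_alphas is an assumption:

```python
def complement_alphas(alpha_t, priors, target):
    """Derive the complement thresholds of Eq. 9 from a single target
    threshold: alpha_c = alpha_t * P(X_c) / sum_{c != t} P(X_c)."""
    total = sum(p for c, p in priors.items() if c != target)
    return {c: alpha_t * p / total
            for c, p in priors.items() if c != target}

# Priors from the Section 6 example, target X_F, alpha_t = 0.3:
priors = {"A": 0.07, "B": 0.14, "C": 0.12, "D": 0.19, "F": 0.48}
alphas = complement_alphas(0.3, priors, "F")
print({c: round(a, 3) for c, a in alphas.items()})
print(round(sum(alphas.values()), 3))  # 0.3, consistent with Eq. 10
```

Note that the derived thresholds sum back to α_t, which is exactly the first equality in (10).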
4. Bayesian Rough Set Based on Bayesian Confirmation Measures
In some applications [3], the objective is to achieve some improvement in the certainty of prediction based on the available information, rather than to produce rules satisfying preset certainty requirements. In such cases it is appropriate not to use any parameters to control model derivation. A modification of the VPRS model is presented that allows parameter-free predictive models to be derived from data while preserving the essential notions and methods of rough set theory. The BRS positive, negative, and boundary regions are defined, respectively, by:
    POS*(X_t) = ∪ { E_i : P(X_t|E_i) > P(X_t) }
    NEG*(X_t) = ∪ { E_i : P(X_t|E_i) < P(X_t) }     (11)
    BND*(X_t) = ∪ { E_i : P(X_t|E_i) = P(X_t) }
4.1. BRS in terms of two decision classes
After recalling the basic methods for extracting probabilities from data and the BRS, the improved BRS positive, negative, and boundary regions for two decision classes are respectively defined as:

    POS_α(X_t) = ∪ { E_i : P(X_t|E_i) ≥ u }
    NEG_α(X_t) = ∪ { E_i : P(X_t|E_i) ≤ l }     (12)
    BND_α(X_t) = ∪ { E_i : l < P(X_t|E_i) < u }

where u = P(X_t) + α(1 − P(X_t)), l = P(X_t) − αP(X_t), and α ∈ [0, 1).
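As a small illustration of the bounds in (12), using P(X_F) = 0.48 from the Section 6 example and α = 0.3 (the helper name vprs_bounds is an assumption):

```python
def vprs_bounds(prior, alpha):
    """Upper and lower precision bounds of Eq. 12:
    u = P(X_t) + alpha*(1 - P(X_t)),  l = (1 - alpha)*P(X_t)."""
    u = prior + alpha * (1 - prior)
    l = (1 - alpha) * prior
    return u, l

u, l = vprs_bounds(0.48, 0.3)  # P(X_F) = 0.48 from Section 6, alpha = 0.3
print(round(u, 3), round(l, 3))  # 0.636 0.336
```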
Using the basic BCM method (3), the logarithm of the ratio between the posterior probability P(X_t|E_i) and the prior probability P(X_t) is compared with a threshold value β derived from α. Thus, the BRS positive, negative, and boundary regions based on BCM for two decision classes are respectively defined as:
    POS_α(X_t) = ∪ { E_i : r(X_c, E_i) ≤ β ∨ P(X_c|E_i) = 0 }
    NEG_α(X_t) = ∪ { E_i : r(X_t, E_i) ≤ β ∨ P(X_t|E_i) = 0 }     (13)
    BND_α(X_t) = ∪ { E_i : r(X_t, E_i) > β ∧ r(X_c, E_i) > β }

where:

    β = log(1 − α)

    r(X_c, E_i) = log [ P(X_c|E_i) / P(X_c) ],  P(X_c|E_i) ≠ 0
    r(X_t, E_i) = log [ P(X_t|E_i) / P(X_t) ],  P(X_t|E_i) ≠ 0

    u = P(X_t) + α(1 − P(X_t)) = 1 − (1 − α)(1 − P(X_t))
    l = P(X_t) − αP(X_t) = (1 − α)P(X_t)
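A sketch of the two-class regions of (13); base-10 logarithms are assumed, and the function name and example posteriors are illustrative assumptions:

```python
import math

def brs_regions_bcm(post_t, post_c, prior_t, prior_c, alpha):
    """Two-class BRS regions based on the log-ratio measure r (Eq. 13).
    post_t[e] = P(X_t|E_i), post_c[e] = P(X_c|E_i); beta = log10(1 - alpha)."""
    beta = math.log10(1 - alpha)

    def r(posterior, prior):
        return math.log10(posterior / prior)

    pos, neg, bnd = [], [], []
    for e in post_t:
        if post_c[e] == 0 or r(post_c[e], prior_c) <= beta:
            pos.append(e)   # complement concept disconfirmed (or impossible)
        elif post_t[e] == 0 or r(post_t[e], prior_t) <= beta:
            neg.append(e)   # target concept disconfirmed (or impossible)
        else:
            bnd.append(e)   # both confirmation measures exceed beta
    return pos, neg, bnd

# Illustrative two-class posteriors with P(X_t) = 0.6, P(X_c) = 0.4, alpha = 0.3:
print(brs_regions_bcm({"E0": 1.0, "E1": 0.2, "E2": 0.6},
                      {"E0": 0.0, "E1": 0.8, "E2": 0.4}, 0.6, 0.4, 0.3))
# (['E0'], ['E1'], ['E2'])
```

In the two-class case, r(X_c, E_i) ≤ β is equivalent to P(X_t|E_i) ≥ u, so these regions agree with (12).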
4.2. BRS in terms of multi decision classes
It is possible that the concept of [5] may be applicable to more general decisions (more than two target events). The improved BRS model is defined as follows:

    POS_α(X_t) = ∪ { E_i : ∀c≠t, ( r(X_c, E_i) ≤ β ∨ P(X_c|E_i) = 0 ) }
    NEG_α(X_t) = ∪ { E_i : r(X_t, E_i) ≤ β ∨ P(X_t|E_i) = 0 }     (14)
    BND_α(X_t) = ∪ { E_i : ∃c≠t, r(X_c, E_i) > β ∧ r(X_t, E_i) > β }

1. The α positive region POS_α(X_t) can be explained as the collection of objects for which, for every complement concept X_c, the confirmation measure is less than or equal to β or the posterior probability of X_c equals 0.
2. The α negative region NEG_α(X_t) can be explained as the collection of objects for which, for the target concept X_t, the confirmation measure is less than or equal to β or the posterior probability of X_t equals 0.
3. The α boundary region BND_α(X_t) can be explained as the collection of objects for which the confirmation measure of at least one complement concept X_c is greater than β and the confirmation measure of X_t is also greater than β.
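The multi-class regions of (14) can be sketched and checked against the self-test 1 posteriors that appear later in Table 2; base-10 logarithms and POS-before-NEG precedence are assumptions of this sketch:

```python
import math

def brs_multiclass_regions(posteriors, priors, target, alpha):
    """Multi-class BRS regions of Eq. 14.
    posteriors[e][c] = P(X_c|E_i); priors[c] = P(X_c); beta = log10(1 - alpha)."""
    beta = math.log10(1 - alpha)

    def disconfirmed(e, c):
        p = posteriors[e][c]
        return p == 0 or math.log10(p / priors[c]) <= beta

    comps = [c for c in priors if c != target]
    pos, neg, bnd = [], [], []
    for e in posteriors:
        if all(disconfirmed(e, c) for c in comps):
            pos.append(e)   # every complement concept is disconfirmed
        elif disconfirmed(e, target):
            neg.append(e)   # the target concept itself is disconfirmed
        else:
            bnd.append(e)   # the target and some complement both exceed beta
    return pos, neg, bnd

# Posterior probabilities for self-test 1 (Table 2) and the Section 6 priors:
t1 = {"E0": {"A": 0, "B": 0.58, "C": 0.42, "D": 0, "F": 0},
      "E1": {"A": 0.77, "B": 0, "C": 0.22, "D": 0, "F": 0},
      "E2": {"A": 0, "B": 0, "C": 0, "D": 0.27, "F": 0.72},
      "E3": {"A": 0, "B": 0, "C": 0, "D": 0, "F": 1.0},
      "E4": {"A": 0, "B": 0, "C": 0, "D": 0.39, "F": 0.61}}
priors = {"A": 0.07, "B": 0.14, "C": 0.12, "D": 0.19, "F": 0.48}
print(brs_multiclass_regions(t1, priors, "F", 0.3))
# (['E3'], ['E0', 'E1'], ['E2', 'E4'])  -- matches row T1 of Table 5
```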
5. Bayesian Rough Set based Distance Learning
In [7, 8], the traditional rough set and rough-set-based inductive learning are used to assist students and instructors, and to provide an instrument for learner self-assessment when taking courses delivered via the World Wide Web. Rough-set-based distance learning improves the state of the art of Web learning by offsetting the lack of student/teacher feedback, and provides both students and teachers with the insights needed to study better. In this section, three steps are defined to extract suitable rules: first, find the approximation regions; second, compute the best attribute; third, induce the suitable description. The rough set methodology is applied to the information table to find the rules behind the data. After calculating the discernibility matrix and the relative discernibility function for the information table, both definite and default rules are obtained for the system from the set of prime implicants of the relative discernibility function. From the rule base, we can find the section of material that is most relevant to failure on the final exam. The discriminant index [9] provides a measure of the degree of certainty in classifying the set of objects in E_i with respect to the concept represented by X_t; the index value is calculated for each condition attribute, and the highest value determines the best attribute. The discriminant index is defined as:

    η = 1 − card(BND_α(X_t)) / card(U)     (15)

where η is the discriminant index and card denotes the cardinality of a set.
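As a minimal sketch of (15), assuming cardinalities are counted in objects (students), which reproduces the η values later reported in Table 5:

```python
def discriminant_index(bnd_cardinality, universe_cardinality):
    """Discriminant index of Eq. 15: eta = 1 - card(BND)/card(U)."""
    return 1 - bnd_cardinality / universe_cardinality

# For self-test 2 in Section 6, the boundary region {E4} covers the 35
# students with T2 = F, out of 100 students in total:
print(discriminant_index(35, 100))  # 0.65, matching Table 5
```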
6. Illustrative Example
Let us assume an information system A = (U, A), where A is the set of condition attributes, B ⊆ A is a subset of attributes, and the family of equivalence classes is U/B = {E_0, E_1, …, E_{n−1}}. Let us also assume that 100 students are recorded in a Student Information Table with the following fields: domain U, condition attributes C = {T1, T2, T3}, decision attribute D = {Final}, and a frequency field F. The result of each self-test is Excellent (A), Very Good (B), Good (C), Fair (D) or Poor (F), and the frequency field counts the number of students having the same record, as follows:
Table 1. Student Information Table

U    T1  T2  T3  Final  Frequency
u1   A   B   A   B      10
u2   C   F   D   F      18
u3   C   F   F   F      3
u4   F   F   F   D      6
u5   F   D   F   F      17
u6   D   D   F   F      10
u7   C   F   F   D      8
u8   A   B   A   C      5
u9   A   A   A   B      4
u10  B   A   A   A      2
u11  F   C   D   D      5
u12  A   C   B   C      5
u13  B   A   B   A      5
u14  B   C   C   C      2
where U = domain, T1 = self-test 1, T2 = self-test 2, T3 = self-test 3, and Final = final self-test. The decision class probabilities are P(X_A) = 0.07, P(X_B) = 0.14, P(X_C) = 0.12, P(X_D) = 0.19 and P(X_F) = 0.48. The target concept is "why students failed the final test", so X_t = X_F and X_c = {X_A, X_B, X_C, X_D}. The posterior probabilities for T1, T2 and T3 are calculated as shown in Tables 2, 3 and 4, where E_i = partitions, D1 = P(X_A|E_i), D2 = P(X_B|E_i), D3 = P(X_C|E_i), D4 = P(X_D|E_i) and D5 = P(X_F|E_i).

Let the single parameter value be α = 0.3; the value of β is then β = log(1 − α) = log 0.7 ≈ −0.15. All posterior probabilities in the tables above are converted into confirmation measures (3) using the log-ratio measure. For instance, the confirmation measure for P(X_B|E_0) in self-test 1 is:

    r(X_B, E_0) = log [ P(X_B|E_0) / P(X_B) ] = log(0.58 / 0.14)
= 0.62.

Because there are multiple decision classes A, B, C, D and F, the characteristics shown in Table 5 are obtained by using (14), where T_i = self-test number and η is the discriminant index. Because self-test 2 has the highest discriminant index, the first rule is obtained:

Rule 1 { IF T2 = F THEN Final = F }   confidence = 60%
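The 60% confidence of Rule 1 can be verified directly from the Table 1 frequencies, as a quick sketch:

```python
# Table 1 rows matching the antecedent of Rule 1 (T2 = F),
# as (object, Final, Frequency) triples:
rows = [("u2", "F", 18), ("u3", "F", 3), ("u4", "D", 6), ("u7", "D", 8)]
matched = sum(freq for _, _, freq in rows)                      # 35 students
correct = sum(freq for _, final, freq in rows if final == "F")  # 21 students
print(correct / matched)  # 0.6, i.e. confidence = 60%
```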
The positive and negative regions for self-test 2 are eliminated from the Student Information Table (Table 1), and the resulting reduced table is given in Table 6. The same steps are used to extract the remaining rules; the discriminant index for (T2 and T3) is higher than the discriminant index for (T1 and T2). Therefore, the output rule obtained is:

Rule 2 { IF T2 = F AND T3 = D THEN Final = F }   confidence = 100%
The strength of a rule is commonly measured by its confidence [10, 11].
Table 2. Posterior Probabilities for Self-Test 1

E_i  D1    D2    D3    D4    D5
E0   0     0.58  0.42  0     0
E1   0.77  0     0.22  0     0
E2   0     0     0     0.27  0.72
E3   0     0     0     0     1
E4   0     0     0     0.39  0.61
Table 5. Approximation Regions

T_i  POS_0.3(X_F)  NEG_0.3(X_F)   BND_0.3(X_F)  η
T1   {E3}          {E0, E1}       {E2, E4}      0.43
T2   {E3}          {E0, E1, E2}   {E4}          0.65
T3   ∅             {E0, E1, E2}   {E3, E4}      0.33
Table 6. Reduced Table

U   T1  T2  T3  Final  Frequency
u2  C   F   D   F      18
u3  C   F   F   F      3
u4  F   F   F   D      6
u7  C   F   F   D      8
Table 3. Posterior Probabilities for Self-Test 2

E_i  D1    D2    D3    D4    D5
E0   0.63  0.36  0     0     0
E1   0     0.66  0.33  0     0
E2   0     0     0.58  0.41  0
E3   0     0     0     0     1
E4   0     0     0     0.4   0.6
Table 4. Posterior Probabilities for Self-Test 3

E_i  D1    D2    D3    D4    D5
E0   0.09  0.66  0.23  0     0
E1   0.5   0     0.5   0     0
E2   0     0     1     0     0
E3   0     0     0     0.21  0.78
E4   0     0     0     0.32  0.68