arXiv:0707.0540v2 [physics.soc-ph] 31 Jul 2007
How to project a bipartite network?
Tao Zhou^{1,2,∗}, Jie Ren^{1}, Matúš Medo^{1}, and Yi-Cheng Zhang^{1,3,†}
^1 Department of Physics, University of Fribourg, Chemin du Musée 3, CH-1700 Fribourg, Switzerland
^2 Department of Modern Physics and Nonlinear Science Center, University of Science and Technology of China, Hefei Anhui, 230026, PR China
^3 Information Economy and Internet Research Laboratory, University of Electronic Science and Technology of China, Chengdu Sichuan, 610054, PR China
(Dated: February 11, 2013)

One-mode projection is extensively used to compress bipartite networks. Since the one-mode projection is always less informative than the bipartite representation, a proper weighting method is required to better retain the original information. In this article, inspired by network-based resource-allocation dynamics, we propose a weighting method that can be directly applied to extracting the hidden information of networks, with remarkably better performance than the widely used global ranking method as well as collaborative filtering. This work not only provides a creditable method for compressing bipartite networks, but also highlights a possible way toward a better solution of a long-standing challenge in modern information science: how to do personal recommendation?
PACS numbers: 89.75.Hc, 87.23.Ge, 05.70.Ln
I. INTRODUCTION
The last few years have witnessed tremendous activity devoted to the understanding of complex networks [1,2,3,4,5,6,7]. A particular class of networks is the bipartite networks, whose nodes are divided into two sets, $X$ and $Y$, and only connections between two nodes in different sets are allowed (as illustrated in Fig. 1a). Many systems are naturally modeled as bipartite networks [8]: the human sexual network [9] consists of men and women, the metabolic network [10] consists of chemical substances and chemical reactions, etc. Two kinds of bipartite networks deserve particular attention because of their significance in social, economic and information systems. One is the so-called collaboration network, which is generally defined as a network of actors connected by a common collaboration act [11,12].
Examples are numerous, including scientists connected by coauthoring a scientific paper [13,14], movie actors connected by costarring in the same movie [1,15], and so on. Moreover, the concept of collaboration network is not necessarily restricted to social systems (see, for example, recent reports on technological collaboration of software [16] and urban traffic systems [17]). Although the collaboration network is usually displayed by the one-mode projection on actors (see the definition later), its full representation is a bipartite network. The other one is named the opinion network [18,19], where each node in the user-set is connected with its collected objects in the object-set. For example, listeners are connected with the music groups they collected from a music-sharing library (e.g., audioscrobbler.com) [20,21], web users are connected with the websites they collected in a bookmark site (e.g., delicious) [22], and customers are connected with the books they bought (e.g., Amazon.com) [23,24].

∗ Electronic address: zhutou@ustc.edu
† Electronic address: yi-cheng.zhang@unifr.ch
Recently, a large amount of attention has been devoted to analyzing [8,20,25,26,27] and modeling [28,29,30] bipartite networks. However, for the convenience of directly showing the relations among a particular set of nodes, the bipartite network is usually compressed by one-mode projection. The one-mode projection onto $X$ ($X$-projection for short) means a network containing only $X$-nodes, where two $X$-nodes are connected when they have at least one common neighboring $Y$-node. Fig. 1b and Fig. 1c show the resulting networks of $X$-projection and $Y$-projection, respectively. The simplest way is to project the bipartite network onto an unweighted network [13,14,31,32,33], without taking into account the frequency with which a collaboration has been repeated. Although some topological properties can be qualitatively obtained from this unweighted version, the loss of information is obvious. For example, if two listeners have each collected more than 100 music groups (a typical number of collections: on audioscrobbler.com, the average number of collected music groups per listener is 140 [20]), and only one music group is selected by both listeners, one may conclude that those two listeners probably have different music tastes. On the contrary, if nearly 100 music groups belong to the overlap, those two listeners are likely to have very similar habits. However, in the unweighted listener-projection, these two cases have exactly the same graph representation.

Since the one-mode projection is always less informative than the original bipartite network, to better reflect the structure of the network, one has to use the bipartite graph to quantify the weights in the projection graph. A straightforward way is to weight an edge directly by the number of times the corresponding partnership is repeated [34,35]. This simple rule is used to obtain the weights in
Fig. 1b and Fig. 1c for $X$-projection and $Y$-projection, respectively. This weighted network is much more informative than the unweighted one, and can be analyzed by standard techniques for unweighted graphs since its weights are all integers [36]. However, this method is also quantitatively biased. Li et al. [37] empirically studied scientific collaboration networks, and pointed out that the impact of one additional collaboration paper should depend on the original weight between the two scientists. For example, one more coauthored paper for two authors who have coauthored only one paper before should have a higher impact than for two authors who have already coauthored 100 papers. This saturation effect can be taken into account by applying a hyperbolic tangent function to the simple count of collaborations [37]. Newman stated that two scientists whose names appear on a paper together with many other coauthors know one another less well on average than two who were the sole authors of a paper [14]; to account for this effect, he introduced the factor $1/(n-1)$ to weaken the contribution of collaborations involving many participants [38,39], where $n$ is the number of participants (e.g., the number of authors of a paper).

[FIG. 1 panels: (a) bipartite network with X-nodes x_1 to x_8 and Y-nodes y_1 to y_6; (b) X-projection; (c) Y-projection. Edge labels give the weights.]

FIG. 1: Illustration of a bipartite network (a), as well as its $X$-projection (b) and $Y$-projection (c). The edge-weight in (b) and (c) is set as the number of common neighbors in $Y$ and $X$, respectively.

How to weight the edges is the key question of the one-mode projections and their use. However, a systematic exploration of this problem is lacking, and no solid basis for any weighting method has been reported thus far. For example, one may ask for the physical reason why the hyperbolic tangent function is used to address the saturation effect [37] rather than any of the infinitely many other candidates. In addition, for simplicity, the weighted adjacency matrix
$\{w_{ij}\}$ is always set to be symmetric, that is, $w_{ij} = w_{ji}$. However, as in scientific collaboration networks, different authors may assign different weights to the same coauthored paper, and it is probably the case that the author with fewer publications gives a higher weight, and vice versa. Therefore, a more natural weighting method may be asymmetric. Another blemish of the prior methods is that the information contained in an edge whose adjacent $X$-node ($Y$-node) has degree one is lost in the $Y$-projection ($X$-projection). This information loss may be serious in some real opinion networks. For example, in the user-web network of delicious (http://del.icio.us), a remarkable fraction of websites have been collected only once, and a remarkable fraction of users have collected only one website. Therefore, both the user-projection and the web-projection squander a lot of information. Since more than half of the publications in Mathematical Reviews have only one author [31], the situation is even worse in the mathematical collaboration network.

In this article, we propose a weighting method with asymmetric weights (i.e., $w_{ij} \neq w_{ji}$) and allowed self-connections (i.e., $w_{ii} > 0$). This method can be directly applied as a personal recommendation algorithm, which performs remarkably better than the widely used global ranking method (GRM) and collaborative filtering (CF).
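As a point of reference for the unweighted and count-weighted projections discussed in this section, the following minimal sketch computes both from an adjacency matrix. The small network `A` and the function names are hypothetical and chosen only for illustration; they are not the network of Fig. 1.

```python
import numpy as np

# Hypothetical bipartite network (not the one in Fig. 1):
# rows are X-nodes, columns are Y-nodes, A[i, l] = 1 if x_i is linked to y_l.
A = np.array([
    [1, 1, 0, 0],
    [1, 1, 1, 0],
    [0, 0, 1, 1],
])

def count_projection(A):
    """X-projection weighted by the number of common Y-neighbors."""
    C = A @ A.T               # C[i, j] = number of Y-nodes shared by x_i and x_j
    np.fill_diagonal(C, 0)    # this simple projection has no self-connections
    return C

def unweighted_projection(A):
    """X-projection that only records whether a common neighbor exists."""
    return (count_projection(A) > 0).astype(int)

C = count_projection(A)
U = unweighted_projection(A)
print(C)
print(U)
```

In this toy network the first two X-nodes share two Y-neighbors and the last two share one, yet both pairs receive the same single edge in `U`; this is the information loss described above.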
II. METHOD
Without loss of generality, we discuss how to determine the edge-weight in the $X$-projection, where the weight $w_{ij}$ can be considered as the importance of node $i$ in $j$'s sense, and it is generally not equal to $w_{ji}$. For example, in the book-projection of a customer-book opinion network, the weight $w_{ij}$ between two books $i$ and $j$ contributes to the strength of recommending book $i$ to a customer provided he has bought book $j$. In the scientific collaboration network, $w_{ij}$ reflects how likely $j$ is to choose $i$ as a contributor for a new research project. More generally, we assume a certain amount of a resource (e.g., recommendation power, research funds, etc.) is associated with each $X$-node, and the weight $w_{ij}$ represents the proportion of the resource $j$ would like to distribute to $i$.

To derive the analytical expression of $w_{ij}$, we go back to the bipartite representation. Since the bipartite network itself is unweighted, the resource in an arbitrary $X$-node should be equally distributed to its neighbors in $Y$. Analogously, the resource in any $Y$-node should be equally distributed to its $X$-neighbors. As shown in Fig. 2a, the three $X$-nodes are initially assigned weights $x$, $y$ and $z$. The resource-allocation process consists of two steps: first from $X$ to $Y$, then back to $X$. The amount of resource after each step is marked in Fig. 2b and Fig. 2c, respectively. Merging these two steps into one, the final resource located in those three $X$-nodes, denoted by $x'$, $y'$ and $z'$, can be obtained as:
$$
\begin{pmatrix} x' \\ y' \\ z' \end{pmatrix}
=
\begin{pmatrix}
11/18 & 1/6 & 5/18 \\
1/9 & 5/12 & 5/18 \\
5/18 & 5/12 & 4/9
\end{pmatrix}
\begin{pmatrix} x \\ y \\ z \end{pmatrix}.
\tag{1}
$$

Note that this $3 \times 3$ matrix is column normalized, and the element in the $i$th row and $j$th column represents the fraction of resource the $j$th $X$-node transferred to the $i$th $X$-node. According to the above description, this matrix is exactly the weighted adjacency matrix we want.
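The two-step allocation of this example can be traced with exact fractions. The sketch below is illustrative only: the edge list `x_neighbors` is inferred from the resource amounts marked in Fig. 2b (for instance, the first $Y$-node holds $x/3$, so it is linked only to the first $X$-node), and the helper name `allocate` is hypothetical.

```python
from fractions import Fraction

# Edges of the Fig. 2 example, inferred from the amounts marked in Fig. 2b:
# each X-node (index 0..2) lists the Y-nodes (index 0..3) it is linked to.
x_neighbors = {0: [0, 1, 3], 1: [1, 2], 2: [1, 2, 3]}
n_x, n_y = 3, 4

def allocate(f_x):
    """Two-step allocation X -> Y -> X; returns the final resource on X-nodes."""
    # Step 1: each X-node splits its resource equally among its Y-neighbors.
    f_y = [Fraction(0)] * n_y
    for i, nbrs in x_neighbors.items():
        for l in nbrs:
            f_y[l] += Fraction(f_x[i]) / len(nbrs)
    # Degree of each Y-node (number of X-nodes linked to it).
    k_y = [sum(l in nbrs for nbrs in x_neighbors.values()) for l in range(n_y)]
    # Step 2: each Y-node splits its resource equally among its X-neighbors.
    f_new = [Fraction(0)] * n_x
    for i, nbrs in x_neighbors.items():
        for l in nbrs:
            f_new[i] += f_y[l] / k_y[l]
    return f_new

# Column j of the matrix is the outcome of placing one unit on the j-th X-node.
columns = [allocate([int(i == j) for i in range(n_x)]) for j in range(n_x)]
W = [[columns[j][i] for j in range(n_x)] for i in range(n_x)]

for row in W:
    print([str(w) for w in row])
```

Each row of `W` reproduces the coefficients of $x$, $y$ and $z$ marked on the corresponding $X$-node in Fig. 2c, and each column sums to one because no resource is created or lost.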
[FIG. 2 panels: (a) initial resources $x$, $y$, $z$ on the three $X$-nodes; (b) resources on the four $Y$-nodes after the first step: $x/3$, $x/3+y/2+z/3$, $y/2+z/3$, $x/3+z/3$; (c) resources on the three $X$-nodes after the second step: $11x/18+y/6+5z/18$, $x/9+5y/12+5z/18$, $5x/18+5y/12+4z/9$.]

FIG. 2: Illustration of the resource-allocation process in a bipartite network. The upper three are $X$-nodes, and the lower four are $Y$-nodes. The whole process consists of two steps: first, the resource flows from $X$ to $Y$ (a → b), and then returns to $X$ (b → c). Different from the prior network-based resource-allocation dynamics [40], the resource here can only flow from one node-set to the other, without consideration of asymptotically stable flow within one node-set.
Now, consider a general bipartite network $G(X,Y,E)$, where $E$ is the set of edges. The nodes in $X$ and $Y$ are denoted by $x_1, x_2, \cdots, x_n$ and $y_1, y_2, \cdots, y_m$, respectively. The initial resource located on the $i$th $X$-node is $f(x_i) \geq 0$. After the first step, all the resource in $X$ flows to $Y$, and the resource located on the $l$th $Y$-node reads

$$ f(y_l) = \sum_{i=1}^{n} \frac{a_{il} f(x_i)}{k(x_i)}, \tag{2} $$

where $k(x_i)$ is the degree of $x_i$, and $a_{il}$ is an element of the $n \times m$ adjacency matrix, defined as

$$ a_{il} = \begin{cases} 1, & x_i y_l \in E, \\ 0, & \text{otherwise}. \end{cases} \tag{3} $$

In the next step, all the resource flows back to
$X$, and the final resource located on $x_i$ reads

$$ f'(x_i) = \sum_{l=1}^{m} \frac{a_{il} f(y_l)}{k(y_l)} = \sum_{l=1}^{m} \frac{a_{il}}{k(y_l)} \sum_{j=1}^{n} \frac{a_{jl} f(x_j)}{k(x_j)}. \tag{4} $$

This can be rewritten as
$$ f'(x_i) = \sum_{j=1}^{n} w_{ij} f(x_j), \tag{5} $$

where

$$ w_{ij} = \frac{1}{k(x_j)} \sum_{l=1}^{m} \frac{a_{il} a_{jl}}{k(y_l)}, \tag{6} $$

which sums the contribution from all 2-step paths between $x_i$ and $x_j$. The matrix $W = \{w_{ij}\}_{n \times n}$ represents the weighted $X$-projection we were looking for. The resource-allocation process can be written in the matrix form as $\vec{f}' = W \vec{f}$.

It is worthwhile to emphasize the particular characteristics of this weighting method. For convenience, we take the scientific collaboration network as an example, but our statements are not restricted to collaboration networks. Firstly, the weighted matrix is not symmetric, as
$$ w_{ij} k(x_j) = w_{ji} k(x_i). \tag{7} $$

This is in accordance with our daily experience: the weight of a single collaboration paper is relatively small if the scientist has already published many papers (i.e., he has a large degree), and vice versa. Secondly, the diagonal elements of $W$ are nonzero, thus the information contained in the connections incident to a one-degree $Y$-node will not be lost. Actually, the diagonal element is the maximal element in each column. Only if all of $x_i$'s $Y$-neighbors belong to $x_j$'s neighbor set does $w_{ii} = w_{ji}$ hold. This is often found in scientific collaboration networks, since some students coauthor every paper with their supervisors. Therefore, the ratio $w_{ji}/w_{ii} \leq 1$ can be considered as $x_i$'s research independence from $x_j$: the smaller the ratio, the more independent the researcher is, and vice versa. The independence of $x_i$ can be approximately measured as

$$ I_i = \sum_{j} \left( \frac{w_{ji}}{w_{ii}} \right)^2. \tag{8} $$

Generally, an author who often publishes papers alone, or often publishes many papers with different coauthors, is more independent. Note that the measure $I_i$ is introduced here just as an example of how to use the information contained in the self-weight $w_{ii}$, without any comment on whether being more independent is better or worse.
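The general weight formula of Eq. (6) and the properties discussed in this section can be checked numerically. The sketch below is one possible vectorized reading of Eq. (6) and Eq. (8), assuming every node has at least one edge; the function names `projection_weights` and `independence` are hypothetical, and the test network is the three-by-four example of Fig. 2, with edges inferred from the amounts marked in Fig. 2b.

```python
import numpy as np

def projection_weights(A):
    """Weighted X-projection W of Eq. (6).

    A is the n x m adjacency matrix (rows: X-nodes, columns: Y-nodes).
    Assumption: every node has at least one edge, so no degree is zero.
    """
    A = np.asarray(A, dtype=float)
    k_x = A.sum(axis=1)        # degrees k(x_i)
    k_y = A.sum(axis=0)        # degrees k(y_l)
    S = (A / k_y) @ A.T        # S[i, j] = sum_l a_il a_jl / k(y_l)
    return S / k_x             # divide column j by k(x_j)

def independence(W):
    """I_i of Eq. (8): sum over j of (w_ji / w_ii)^2, one value per column i."""
    d = np.diag(W)
    return ((W / d) ** 2).sum(axis=0)

# Fig. 2 example (edges inferred from the amounts marked in Fig. 2b).
A = np.array([
    [1, 1, 0, 1],
    [0, 1, 1, 0],
    [0, 1, 1, 1],
])
W = projection_weights(A)
k_x = A.sum(axis=1)

column_normalized = np.allclose(W.sum(axis=0), 1.0)
M = W * k_x                                   # M[i, j] = w_ij * k(x_j)
satisfies_eq7 = np.allclose(M, M.T)           # w_ij k(x_j) = w_ji k(x_i)
diagonal_is_max = bool(np.all(W <= np.diag(W) + 1e-12))  # per column
I = independence(W)
print(column_normalized, satisfies_eq7, diagonal_is_max, I)
```

In this example the second $X$-node has all its $Y$-neighbors shared with the third one, so the second column contains two equal maximal entries, which is the equality case $w_{ii} = w_{ji}$ mentioned above.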
III. PERSONAL RECOMMENDATION
The exponential growth of the Internet [41] and the World Wide Web [42] confronts people with an information overload: they face too many data and sources to be able to find those most relevant to them. One landmark for information filtering is the use of search engines [43]; however, it cannot solve this overload problem since it does not take personalization into account and thus returns the same results for people with very different habits. So, if a user's habits differ from the mainstream, it is hard for him to find what he likes among the countless search results. Thus far, the most promising way to efficiently filter out the information overload is to recommend personally. That is to say, using the personal information of a user (i.e., the historical track of this user's activities) to uncover his habits and to consider them in the recommendation. For instance, Amazon.com uses one's purchase history to provide individual suggestions. If you have bought a textbook on statistical physics, Amazon may recommend you some other statistical physics books. Based on the well-developed Web 2.0 technology [44], recommendation systems are frequently used in web-based movie-sharing (music-sharing, book-sharing, etc.) systems, web-based selling systems, bookmark websites, and so on. Motivated by its significance in economy and society, the design of an efficient recommendation algorithm has recently become a joint focus from marketing practice [45,46] to mathematical analysis [47], and from engineering science [48,49,50] to the physics community [51,52,53].

[FIG. 3 plot: vertical axis $r$ (0.0 to 1.0), horizontal axis Rank (0 to 8000); legend: GRM $\langle r \rangle = 0.139$, CF $\langle r \rangle = 0.120$, NBI $\langle r \rangle = 0.106$.]

FIG. 3: (color online) The predicted position of each entry in the probe, ranked in ascending order. The black, red and blue curves, from top to bottom, represent the cases of GRM, CF and NBI, respectively. The mean values are top 13.9% (GRM), top 12.0% (CF) and top 10.6% (NBI).
Basically, a recommendation system consists of users and objects, and each user has collected some objects. Denote the object-set as $O = \{o_1, o_2, \cdots, o_n\}$ and the user-set as $U = \{u_1, u_2, \cdots, u_m\}$. If users are only allowed to collect objects (they do not rate them), the recommendation system can be fully described by an $n \times m$ adjacency matrix $\{a_{ij}\}$, where $a_{ij} = 1$ if $u_j$ has already collected $o_i$, and $a_{ij} = 0$ otherwise. A reasonable assumption is that the objects you have collected are what you like, and a recommendation algorithm aims at predicting your personal opinions (to what extent you like or hate them) on those objects you have not yet collected. A more complicated case is the voting system [54,55], where each user can give ratings to objects (e.g., in Yahoo Music, users can vote on each song with 5 discrete ratings representing Never play again, It is ok, Like it, Love it, and Can't get enough), and the recommendation algorithm concentrates on estimating unknown ratings for objects. These two problems are closely related; however, in this article, we focus on the former case.

Denote
$k(o_i) = \sum_{j=1}^{m} a_{ij}$ as the degree of object $o_i$. The global ranking method (GRM) sorts all the objects in descending order of degree and recommends those with the highest degrees. Although the lack of personalization leads to an unsatisfying performance of GRM (see the numerical comparison in the next section), it is widely used since it is simple and spares computational resources. For example, the well-known Yahoo Top 100 MTVs, the Amazon List of Top Sellers, as well as the boards of most downloaded articles in many scientific journals, can all be considered as results of GRM.

Thus far, the most widely applied personal recommendation algorithm is collaborative filtering (CF) [50,54], based on a similarity measure between users. Consequently, the prediction for a particular user is made mainly using similar users. The similarity between users $u_i$ and $u_j$ can be measured in the Pearson-like form

$$ s_{ij} = \frac{\sum_{l=1}^{n} a_{li} a_{lj}}{\min\{k(u_i), k(u_j)\}}, \tag{9} $$

where
$k(u_i) = \sum_{l=1}^{n} a_{li}$ is the degree of user $u_i$. For any user-object pair $u_i$-$o_j$, if $u_i$ has not yet collected $o_j$ (i.e., $a_{ji} = 0$), the predicted score by CF, $v_{ij}$ (to what extent $u_i$ likes $o_j$), is given as

$$ v_{ij} = \frac{\sum_{l=1, l \neq i}^{m} s_{li} a_{jl}}{\sum_{l=1, l \neq i}^{m} s_{li}}. \tag{10} $$

Two factors give rise to a high value of
$v_{ij}$. Firstly, if the degree of $o_j$ is larger, it will generally have more nonzero terms in the numerator of Eq. (10). Secondly, if $o_j$ is frequently collected by users very similar to $u_i$, the corresponding terms will be significant. The former respects the global information, and the latter reflects the personalization. For any user $u_i$, all the nonzero $v_{ij}$ with $a_{ji} = 0$ are sorted in descending order, and the objects at the top are recommended.

We propose a recommendation algorithm, which is a direct application of the weighting method for bipartite networks presented above. The layout is simple: first compress the bipartite user-object network by object-projection; the resulting weighted network we label $G$. Then, for a given user $u_i$, put some resource on those objects already collected by $u_i$. For simplicity, we set the initial resource located on each node of $G$ as

$$ f(o_j) = a_{ji}. \tag{11} $$

That is to say, if the object
$o_j$ has been collected by $u_i$, then its initial resource is one unit; otherwise it is zero. Note that the initial configuration, which captures personal preferences, is different for different users. The initial resource can be understood as giving a unit recommending capacity to each collected object. According to the weighted resource-allocation process discussed in the prior section, the final resource, denoted by the vector $\vec{f}'$, is $\vec{f}' = W \vec{f}$. Thus the components of $\vec{f}'$ are

$$ f'(o_j) = \sum_{l=1}^{n} w_{jl} f(o_l) = \sum_{l=1}^{n} w_{jl} a_{li}. \tag{12} $$

For any user
$u_i$, all his uncollected objects $o_j$ ($1 \leq j \leq n$, $a_{ji} = 0$) are sorted in descending order of $f'(o_j)$, and those objects with the highest values of final resource are recommended. We call this method network-based inference (NBI), since it is based on the weighted network $G$. Note that the calculation of Eq. (12) should be repeated $m$ times, since the initial configurations are different for different users.
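The three scoring rules of this section (GRM, CF by Eqs. (9) and (10), and NBI by Eqs. (6) and (12)) can be sketched in matrix form as follows. This is an illustrative reading of the formulas, not the authors' code: the function names, the toy matrix `A` (4 objects, 3 users) and the tie-breaking by object index are assumptions, and every user and object is assumed to have at least one link so that no denominator vanishes.

```python
import numpy as np

def grm_scores(A):
    """GRM: every user sees the object degrees k(o_j) as scores."""
    A = np.asarray(A, dtype=float)
    k_o = A.sum(axis=1)
    return np.tile(k_o[:, None], (1, A.shape[1]))

def cf_scores(A):
    """CF, Eqs. (9)-(10). Returns V with V[j, i] = v_ij (object j, user i)."""
    A = np.asarray(A, dtype=float)
    k_u = A.sum(axis=0)
    S = (A.T @ A) / np.minimum.outer(k_u, k_u)   # s_ij of Eq. (9)
    np.fill_diagonal(S, 0.0)                     # drop the l = i term
    return (A @ S) / S.sum(axis=0)               # Eq. (10), column i per user

def nbi_scores(A):
    """NBI, Eq. (12). Returns F with F[j, i] = f'(o_j) for user u_i."""
    A = np.asarray(A, dtype=float)
    k_o = A.sum(axis=1)
    k_u = A.sum(axis=0)
    W = ((A / k_u) @ A.T) / k_o                  # object-projection, Eq. (6)
    return W @ A                                 # all m users at once

def recommend(scores, A, i, L):
    """Indices of the L highest-scored objects not yet collected by user i."""
    candidates = np.flatnonzero(np.asarray(A)[:, i] == 0)
    order = np.argsort(-scores[candidates, i], kind="stable")
    return candidates[order[:L]]

# Hypothetical data: A[j, i] = 1 if user i has collected object j.
A = np.array([
    [1, 1, 0],
    [1, 0, 1],
    [0, 1, 1],
    [0, 0, 1],
])
for name, fn in [("GRM", grm_scores), ("CF", cf_scores), ("NBI", nbi_scores)]:
    print(name, recommend(fn(A), A, i=0, L=2))
```

Computing `W @ A` evaluates Eq. (12) for all $m$ users in one product, which is equivalent to repeating the calculation once per user with that user's initial configuration as the column vector.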
IV. NUMERICAL RESULTS
We use a benchmark data set, namely MovieLens, to judge the performance of the described algorithms. The MovieLens data is downloaded from the website of GroupLens Research (http://www.grouplens.org). The data consists of 1682 movies (objects) and 943 users. Actually, MovieLens is a rating system, where each user votes on movies with five discrete ratings 1-5. Hence we applied a coarse-graining method similar to that used in Ref. [19]: a movie is considered collected by a user iff the given rating is at least 3. The original data contains $10^5$ ratings, 85.25% of which are $\geq 3$, thus the user-movie bipartite network after the coarse graining contains 85250 edges. To test the recommendation algorithms, the data set (i.e., 85250 edges) is randomly divided into two parts: the training set contains 90% of the data, and the remaining 10% of the data constitutes the probe. The training set is treated as known information, while no information in the probe set is allowed to be used for prediction.

All three algorithms, GRM, CF and NBI, can provide each user an ordered queue of all his uncollected movies. For an arbitrary user
$u_i$, if the edge $u_i$-$o_j$ is in the probe set (according to the training set, $o_j$ is an uncollected movie for $u_i$), we measure the position of $o_j$ in the ordered queue. For example, if there are 1500 uncollected movies for $u_i$, and $o_j$ is the 30th from the top, we say the position of $o_j$ is the top 30/1500, denoted by $r_{ij} = 0.02$. Since the probe entries are actually collected by users, a good algorithm is expected to give high recommendations to them, thus leading to small $r$. The mean values of the position, averaged over entries in the probe, are 0.139, 0.120 and 0.106 for GRM, CF and NBI, respectively. Fig. 3 reports the distribution of all the position values, which are ranked from the top position ($r \to 0$) to the bottom position ($r \to 1$). Clearly, NBI is the best method and GRM performs worst.

To make this work more relevant to real-life recommendation systems, we introduce a measure of algorithmic accuracy that depends on the length of the recommendation list. The recommendation list for a user
$u_i$, if of length $L$, contains the $L$ highest recommended movies resulting from the algorithm. For each incident entry $u_i$-$o_j$ in the probe, if $o_j$ is in $u_i$'s recommendation list, we say the entry $u_i$-$o_j$ is hit by the algorithm. The ratio of hit entries to the total number of probe entries is named the hitting rate. For a given $L$, the algorithm with a higher hitting rate is better, and vice versa. If $L$ is larger than the total number of uncollected movies for a user, the recommendation list is defined as the set of all his uncollected movies. Clearly, the hitting rate increases monotonically with $L$, with the upper bound 1 for sufficiently large $L$. In Fig. 4, we report the hitting rate as a function of $L$ for different algorithms. In accordance with Fig. 3, the accuracy of the algorithms is NBI > CF > GRM. The hitting rates for some typical lengths of recommendation list are shown in Table I.

[FIG. 4 plot: vertical axis hitting rate (0.0 to 1.0), horizontal axis length of recommendation list (0 to 1600); legend: GRM, CF, NBI.]

FIG. 4: The hitting rate as a function of the length of recommendation list. The black, red and blue curves, from bottom to top, represent the cases of GRM, CF and NBI, respectively.

TABLE I: The hitting rates for some typical lengths of recommendation list.

Length   GRM     CF      NBI
10       10.3%   14.1%   16.2%
20       16.9%   21.6%   24.8%
50       31.1%   37.0%   41.2%
100      45.2%   51.0%   55.9%

In a word, via numerical calculation on a benchmark data set, we have demonstrated that NBI has remarkably better performance than GRM and CF, which strongly supports the validity of the present weighting method.
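The two accuracy measures used in this section, the relative position $r_{ij}$ and the hitting rate for a list of length $L$, can be sketched as below. The function names and the tiny one-user data are hypothetical; ties are broken by object index here, which is an assumption, since the text does not specify how equal scores are ordered.

```python
import numpy as np

def relative_position(scores_i, collected_i, j):
    """r_ij: position of probe object j among one user's uncollected objects.

    scores_i    -- scores of all objects for this user (higher = better)
    collected_i -- boolean mask of objects collected in the training set
    """
    candidates = np.flatnonzero(~collected_i)
    order = candidates[np.argsort(-scores_i[candidates], kind="stable")]
    rank = int(np.flatnonzero(order == j)[0]) + 1   # 1 = top of the queue
    return rank / len(candidates)

def hitting_rate(scores, train, probe, L):
    """Fraction of probe entries (i, j) with o_j in u_i's top-L list.

    scores -- objects x users score matrix
    train  -- objects x users 0/1 training adjacency matrix
    probe  -- list of (user index i, object index j) pairs
    """
    hits = 0
    for i, j in probe:
        candidates = np.flatnonzero(train[:, i] == 0)
        order = candidates[np.argsort(-scores[candidates, i], kind="stable")]
        hits += int(j in order[:L])   # a short list simply contains all candidates
    return hits / len(probe)

# Hypothetical single-user example with five objects.
scores_i = np.array([0.9, 0.1, 0.5, 0.7, 0.3])
collected = np.array([True, False, False, False, False])
r = relative_position(scores_i, collected, j=2)

scores = scores_i[:, None]
train = collected.astype(int)[:, None]
probe = [(0, 2), (0, 1)]
h2 = hitting_rate(scores, train, probe, L=2)
h4 = hitting_rate(scores, train, probe, L=4)
print(r, h2, h4)
```

Averaging `relative_position` over all probe entries gives the mean position value reported for each algorithm, and evaluating `hitting_rate` over a range of `L` gives a curve of the kind shown in Fig. 4.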
V. CONCLUSION AND DISCUSSION
Weighting of edges is the key problem in the construc-tion of a bipartite network projection. In this article
