Empirical Analysis and Mechanism Modeling of User Behavior in Online Social Networks
Abstract (Chinese version)
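The oracle "Know thyself" was inscribed thousands of years ago, yet humanity's understanding
of its own behavior still has a long way to go. Driven by the information revolution, the
Internet era offers an opportunity to lift the veil on human behavior and uncover its
underlying mechanisms. Users' online browsing, friending, selecting, and purchasing behaviors
are of theoretical importance for understanding human behavior patterns and of practical
value for industries such as e-commerce and online services. This thesis therefore analyzes
user behavior in online social systems empirically and models its underlying mechanisms.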
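First, the memory effect in users' online selection preferences is analyzed empirically, and
its mechanism is modeled with a Markov process. Online systems often let users rate the
objects they have selected, and ratings largely reflect how much users like those objects.
Using the Correlation Coefficient method, we find that both a user's selection sequence (the
sequence of average ratings of the objects the user selected) and rating sequence (the
ratings the user gave to the selected objects) show a strong memory effect. Unlike the
exponential distribution of the random case, the memory length follows a power-law
distribution; that is, users are far more likely than chance to keep performing similar
behavior over a long period. By describing the selection process as a Markov process and
assuming that rating behavior depends entirely on selection behavior, the thesis builds a
preference model that captures the memory effect in user preferences. The selection behavior
and the rating behavior each have a single parameter in the model, and tuning these
parameters reproduces any memory-length distribution between a power law (strong memory) and
an exponential (weak memory).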
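Second, based on the bipartite network model, the representation and role of online users'
interests are studied from two angles: local clustering in the network and preference for
object popularity. For the user-object bipartite network, the traditional clustering
coefficient cannot describe clustering, so the bipartite clustering coefficient C4 was
proposed. Given the properties of user-object bipartite networks, a user's C4 can represent
the diversity of that user's interests. Using C4 as a measure of interest diversity, the
thesis finds that users with the broadest interests and users with the narrowest interests
both have the lowest activity in the system, while the most active users tend to have
intermediate interest diversity. When analyzing user behavior, users with the same activity
should not be lumped together, because they may have two entirely different forms of interest
diversity: very narrow or very broad. This phenomenon is not limited to the interest
definition that relies on network structure; interest diversity based on object popularity
shows similar results. Furthermore, taking users' preference for object popularity into
account and starting from the classical mass diffusion and heat conduction algorithms, the
thesis proposes non-equilibrium mass diffusion and non-equilibrium heat conduction
algorithms. Compared with the classical algorithms, the proposed algorithms improve
substantially on precision, recall, and other measures. This shows once more that users'
preference for object popularity is an important driver of online selection behavior.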
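Finally, the stability of object similarity is studied. Similarity indices can measure
latent relations between objects in biological, social, and commercial systems, but an
unstable index is unreliable and carries a large amount of spurious information. A good
similarity index should give the same result when it measures the similarity of a fixed pair
of objects at different times. For user-object bipartite systems, the thesis analyzes the
stability of 15 classical similarity indices and finds that, apart from Preferential
Attachment, Common Neighbor, the Adamic-Adar Index, and the Resource Allocation Index, most
indices can yield completely different similarity matrices on different data samples; that
is, they are very unstable. From the standpoint of stability, the results show that the many
similarity indices fall into a few simple classes, and indices in the same class share
essentially the same idea, have similar mathematical definitions, and evolve in the same way
as the amount of data changes. For any similarity index, it is therefore enough to analyze
its stability and determine its class in order to understand it in depth by comparison. In
addition, the thesis proposes the Top-n-stability method, which uses only stable object
similarities when making recommendations, and analyzes how object-similarity stability
affects recommendation results. Experiments show that unstable object similarities act like
spurious information, and removing them greatly improves the stability of the recommendation
results.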
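In short, the work in this thesis matters for understanding human online behavior. Centered
on online users' selection behavior, interest preferences, and the stability of similarity
indices, its studies and conclusions can help us understand users' online behavior patterns
more deeply and provide a useful reference for designing online systems and improving online
service quality. We also hope that the problems raised and studied here will attract wider
attention and lead to deeper research on user behavior in online social systems.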
Keywords: online social system, user selection behavior, memory effect, user interest,
recommender system, object similarity, stability
ABSTRACT
Although the oracle "know thyself" was inscribed thousands of years ago, we still have a long
way to go in understanding human behavior patterns. Fortunately, the Internet era, which
records every detail of our behavior, gives us a great opportunity to study and uncover the
patterns and mechanisms of human behavior. This thesis focuses on user behavior in online
social systems, which has theoretical significance for understanding human behavior and
practical significance for e-commerce, online services, and similar industries.
First, the thesis uncovers and models the memory effect in online users' selecting and rating
behavior, which may reflect periodic shifts in users' interests and tastes. Because many
online systems let users rate objects, ratings reflect users' preferences well. Using the
Correlation Coefficient method, we find a strong memory effect in users' selecting behavior,
represented by the sequence of qualities of the selected objects, and in their rating
behavior, represented by the sequence of ratings each user gives. In addition, the memory
duration, introduced to describe the length of a memory, follows a power-law distribution;
that is, long-duration memory is much more probable than in the random case, which follows an
exponential distribution. We further present a preference model in which a Markov process
describes users' selecting behavior and the rating behavior depends on the selecting
behavior. With only one parameter each for the selecting and the rating behavior, the
preference model can reproduce any duration distribution between the power-law form (strong
memory) and the exponential form (weak memory).
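The correlation-based memory measurement can be illustrated with a small sketch. It assumes that memory is quantified as the Pearson correlation between a sequence and a lagged copy of itself, which is one common choice; the abstract does not give the exact definition used in the thesis, and the function name `memory_coefficient` is hypothetical.

```python
from math import sqrt


def memory_coefficient(seq, lag=1):
    """Pearson correlation between seq[t] and seq[t + lag].

    Assumed definition for illustration only; positive values mean that
    similar values tend to follow each other (memory), negative values
    mean that the sequence tends to alternate.
    """
    if lag < 1 or len(seq) - lag < 2:
        raise ValueError("sequence too short for this lag")
    x = seq[:-lag]          # values at time t
    y = seq[lag:]           # values at time t + lag
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    std_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    std_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    if std_x == 0 or std_y == 0:
        # Correlation is undefined for a constant sub-sequence; report 0 here.
        return 0.0
    return cov / (std_x * std_y)


# Made-up rating sequences: one persistent, one strictly alternating.
persistent = [1, 1, 1, 5, 5, 5, 1, 1, 1, 5, 5, 5]
alternating = [1, 5, 1, 5, 1, 5, 1, 5]
print(memory_coefficient(persistent))
print(memory_coefficient(alternating))
```

Applied per user to the selection sequence and to the rating sequence, a value well above that of a shuffled copy of the same sequence would indicate memory in the sense described above.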
Second, based on bipartite network theory, the thesis studies online users' interests from
the perspectives of local clustering and object popularity. The bipartite clustering
coefficient C4, given the properties of user-object systems, can describe the diversity of a
user's interests. We find that users should not be classified by activity level alone,
because users with the same activity level may have two completely different interest
patterns, one very concentrated and the other very diverse. Beyond interest defined through
structural clustering, a similar phenomenon appears in users' preference for object
popularity. Accordingly, based on this popularity preference, the thesis proposes the
Non-equilibrium Mass Diffusion and Non-equilibrium Heat Conduction recommendation
algorithms. Compared with the classical mass diffusion and heat conduction methods, the
proposed algorithms substantially improve the accuracy and diversity of recommendations.
This improvement shows from another perspective that users' preference is an important force
in the evolution of such online social systems.
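For orientation, the two classical baselines named above can be sketched as follows. This is a minimal illustration of the standard mass diffusion and heat conduction passes on a user-object bipartite network, not the non-equilibrium variants proposed in the thesis, whose definitions the abstract does not give. The input format (a dict from user to a set of objects) and the function name `diffuse` are assumptions.

```python
def diffuse(user_items, target, mode="mass"):
    """Score uncollected objects for `target` by one object -> user -> object pass.

    user_items: dict mapping each user to the set of objects they collected
                (hypothetical input format).
    mode="mass": each node splits its resource evenly among its neighbours.
    mode="heat": each node takes the average value of its neighbours.
    """
    if mode not in ("mass", "heat"):
        raise ValueError("mode must be 'mass' or 'heat'")

    # Degree of every object = number of users who collected it.
    item_degree = {}
    for items in user_items.values():
        for it in items:
            item_degree[it] = item_degree.get(it, 0) + 1

    collected = user_items[target]  # each collected object starts with value 1

    # Step 1: objects -> users.
    user_level = {}
    for user, items in user_items.items():
        if not items:
            continue
        shared = items & collected
        if mode == "mass":
            user_level[user] = sum(1.0 / item_degree[it] for it in shared)
        else:
            user_level[user] = len(shared) / len(items)

    # Step 2: users -> objects.
    scores = {it: 0.0 for it in item_degree}
    for user, items in user_items.items():
        for it in items:
            if mode == "mass":
                scores[it] += user_level[user] / len(items)
            else:
                scores[it] += user_level[user] / item_degree[it]

    # Recommend only objects the target user has not collected yet.
    return {it: s for it, s in scores.items() if it not in collected}


# Toy data, made up for illustration.
data = {"u1": {"a", "b"}, "u2": {"a", "c"}, "u3": {"b", "c", "d"}}
print(diffuse(data, "u1", mode="mass"))
print(diffuse(data, "u1", mode="heat"))
```

In this toy network the rarely collected object `d` receives a relatively higher score under heat conduction than under mass diffusion, which reflects the usual contrast between the two baselines in how they treat object popularity.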
Finally, the thesis presents and studies the stability problem of object similarity.
Similarity, which measures the potential relation between two objects, is widely used in
constructing gene co-expression networks, protein-protein networks, recommender systems, and
so on. However, a similarity is unreliable and contains false information if it is unstable,
i.e., if the similarity of a given pair of objects is measured at different levels at
different times. In two online bipartite systems, we evaluate the stability of fifteen
similarity indices for measuring object similarity. The results show that more data leads to
more stable evaluations, but most indices, except Preferential Attachment, the Common
Neighbor index, the Adamic-Adar index, and the Resource Allocation index, may give quite
different evaluations on different data samples. Although there are dozens of similarity
indices, most of them can be classified into three clusters from the perspective of
stability, and indices in the same cluster are generally based on the same considerations and
have similar mathematical definitions. When a new index is proposed, one only needs to
identify which cluster it belongs to and can then gain deeper insight into it by comparing it
with the other indices in that cluster. In addition, we develop a top-n-stability method to
study the further effect of object-similarity stability on recommendation. We find that
taking only the stable similarities into account improves the stability, accuracy, and
diversity of the recommendations.
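The four indices singled out as stable have simple standard definitions, sketched below for a pair of objects in a user-object bipartite network. Treating the users who collected an object as its neighbours is an assumption about how the thesis applies these indices; the function name `object_similarities` and the input format are hypothetical.

```python
from math import log


def object_similarities(user_items, a, b):
    """Four neighbourhood-based similarity indices between objects a and b.

    user_items: dict mapping each user to the set of objects they collected
                (hypothetical input format). An object's neighbours are the
                users who collected it.
    """
    # Invert the mapping: object -> set of users who collected it.
    item_users = {}
    for user, items in user_items.items():
        for it in items:
            item_users.setdefault(it, set()).add(user)

    users_a = item_users.get(a, set())
    users_b = item_users.get(b, set())
    common = users_a & users_b

    def degree(user):
        return len(user_items[user])

    return {
        # Common Neighbor: number of users who collected both objects.
        "CN": len(common),
        # Preferential Attachment: product of the two object degrees.
        "PA": len(users_a) * len(users_b),
        # Adamic-Adar: common users weighted by 1 / log(user degree).
        "AA": sum(1.0 / log(degree(u)) for u in common if degree(u) > 1),
        # Resource Allocation: common users weighted by 1 / user degree.
        "RA": sum(1.0 / degree(u) for u in common),
    }


# Toy data, made up for illustration.
data = {"u1": {"a", "b"}, "u2": {"a", "c"}, "u3": {"b", "c", "d"}}
print(object_similarities(data, "a", "b"))
```

Computing such a dictionary on two different samples of the same dataset and comparing the values for a fixed object pair is one simple way to probe the kind of stability the paragraph describes; the thesis's own stability measure is not specified in the abstract.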
Overall, this thesis is a significant step toward understanding human behavior patterns.
Focusing on online users' selecting behavior, interest preferences, and the stability of
object similarity, its investigations and results may shed light on both theoretical
research and practical applications, and may draw more attention to these important
problems.
Key Words: Online social system, selecting behavior, memory effect, user interest,
recommendation system, object similarity, stability
Contents
Abstract (Chinese version)
ABSTRACT
Chapter 1 Introduction
1.1 Research background and significance
1.2 Review of human dynamics: offline behavior
§1.2.1 Human behavior in emergencies
§1.2.2 Temporal distribution of human behavior
§1.2.3 Spatial distribution of human behavior
1.3 Human dynamics in online social systems
§1.3.1 Advances in research on user-user behavior in social networks
§1.3.2 Analysis of user behavior in recommender systems
1.4 Research content and main contributions
Chapter 2 Analysis and modeling of the memory effect in user selection preferences
2.1 Analysis of the memory effect in users' preference for object quality in selection behavior
2.1.1 Datasets
2.1.2 Methods and definitions
2.1.3 Empirical results
2.2 Markov modeling of users' online selection preferences
2.2.1 The classical queueing model based on task priority
2.2.2 The preference model based on a Markov process
2.2.3 Effects of user activity and inter-event time on the model and the empirical results
2.3 Chapter summary
Chapter 3 User interest based on clustering in user-object bipartite networks and on popularity preference
3.1 Introduction to user-object bipartite networks and classical recommendation algorithms
3.1.1 Components of a recommender system
3.1.2 Complex-network modeling of recommender systems
3.1.3 Classical recommendation algorithms
3.1.4 Several classical evaluation metrics
3.2 Analysis of user interest diversity based on the clustering coefficient
3.2.1 Definition of the bipartite clustering coefficient
3.2.2 Empirical analysis of the clustering coefficient in user-object bipartite networks
3.2.3 Random model
3.3 Non-equilibrium diffusion recommendation algorithms with popularity preference
3.3.1 Definitions of the non-equilibrium heat conduction and non-equilibrium mass diffusion algorithms
3.3.2 Numerical experiments
3.4 Chapter summary
Chapter 4 Stability of similarity indices and its effect on recommendation
4.1 Definitions of common similarity indices
4.2 Stability analysis of object similarity
4.2.1 Data and methods
4.2.2 Results on object-similarity stability
4.3 Effect of object-similarity stability on recommendation results
4.3.1 Stability of recommendation results
4.3.2 Top-n-stability method
4.4 Chapter summary
Chapter 5 Conclusions and outlook
5.1 Conclusions
5.2 Outlook
References
Papers published, research projects undertaken, and achievements during the period of study
Acknowledgments
Author: 侯斌
Date: 2025-01-09