Research on Collaborative Filtering Recommendation Algorithms


陈辉, 2024-11-19, 964.5 KB, 55 pages


Abstract

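Web technology has become an important way for people to obtain information. However, with the rapid development of Internet technology and the fast growth in the number of users and the volume of data on the network, it has become extremely difficult for people to find the information they need. How to help users obtain the resources they need conveniently and quickly has become a focus of researchers in China and abroad. Recommendation systems emerged against this background. A recommendation system infers a user's actual needs from that user's current or past information interactions with other similar users, and outputs the results to the target user. The recommendation algorithm is the core of a recommendation system. Organized by type of recommendation algorithm, this thesis introduces content-based recommendation, collaborative filtering recommendation, hybrid recommendation, and the recently emerging recommendation algorithms based on complex networks. Among these, collaborative filtering is currently the most widely applied. Because of the large data volume of such systems and the sparsity of rating data, the complexity of recommendation algorithms increases and recommendation accuracy falls. Traditional collaborative filtering based on item clustering improves scalability and recommendation accuracy to some extent, but its accuracy remains low.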

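To address the shortcomings of current collaborative filtering algorithms, this thesis analyzes the high-dimensional sparsity of user rating data and, building on research into traditional clustering algorithms for high-dimensional data, clusters items using a set similarity measure that combines the sparse similarity of the data with item genre. On the basis of this item clustering, it then computes user similarity, searches for nearest-neighbor users, predicts ratings, and generates recommendations.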

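The experiments use the public MovieLens dataset. We designed and carried out validation experiments for the collaborative filtering recommendation algorithm built on item clustering based on sparse similarity and item genre (the ICBSSIG algorithm), and compared it with traditional collaborative filtering that uses cosine similarity as its similarity measure. Analysis of the test results verifies that, by taking item genre and user ratings into account, the algorithm improves the accuracy of item similarity computation and interest prediction, finds more accurate nearest neighbors, and improves recommendation quality when user rating data are sparse. The comparison experiments also show that the improved algorithm outperforms traditional collaborative filtering, collaborative filtering based on K-means clustering, and traditional collaborative filtering based on item clustering in both recommendation accuracy and execution efficiency. The thesis also reports sensitivity analysis experiments on some of the algorithm's input parameters, examining how the item genre factor, the set similarity threshold, and the number of nearest neighbors searched affect the recommendation results. The experimental results confirm the feasibility and effectiveness of the collaborative filtering recommendation algorithm built on item clustering based on sparse similarity and item genre.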

Keywords: recommendation system, collaborative filtering, ICBSSIG algorithm, item clustering

ABSTRACT

Nowadays, Web technology has become an important way for people to obtain information. However, with the rapid development of Internet technology and the sharp growth in the number of users and the volume of data on the network, finding the information one needs has become extremely difficult. How to help users obtain resources easily and quickly has become a research focus for experts and scholars at home and abroad. Recommendation systems emerged in this context some years ago. A recommendation system aims to infer the actual needs of a user from the information that user has exchanged, past and present, with other similar users, and to output the results. The recommendation algorithm is the core of a recommendation system. Content-based, collaborative filtering, hybrid, and complex-network-based recommendation algorithms have all been proposed in order to produce better recommendations. Collaborative filtering is one of the most successful recommendation algorithms in practice. However, because of the sparsity of the dataset, the complexity of the algorithm increases and the accuracy of the recommendation system decreases. Traditional item clustering algorithms offer some advantages in scalability and in the accuracy of recommendation results, but recommendation precision still needs to be improved.

By discussing the shortcomings of current collaborative filtering algorithms, studying sparse data in recommendation systems, and analyzing clustering methods for high-dimensional sparse data, the thesis proposes a new item clustering algorithm based on sparse similarity and item genre (ICBSSIG). The thesis uses the ICBSSIG algorithm to cluster items, then calculates item similarity and searches for the k nearest neighbors of the target item based on the outcome of item clustering, and finally predicts ratings and generates recommendations.

Experimental data are constructed from the public MovieLens datasets. On these data we carry out the experiment on the item clustering algorithm based on sparse similarity and item genre (ICBSSIG) and then implement the collaborative filtering algorithm built on it. The results are compared with a collaborative filtering algorithm that uses cosine similarity as its similarity measure. Analysis of the experimental results verifies that the improved algorithm takes both item attributes and user ratings into account, so it measures item similarity and predicts interests more accurately and finds the nearest neighbors more precisely. Even when user ratings are very sparse, the improved algorithm achieves better recommendation accuracy. The experimental results also show that the improved collaborative filtering algorithm outperforms traditional collaborative filtering, collaborative filtering based on K-means clustering, and collaborative filtering based on item clustering in both accuracy and efficiency. In addition, the thesis designs parameter sensitivity experiments, which examine the influence of the item genre factor, the sparse feature similarity threshold, and the number of nearest neighbors searched on recommendation quality. The experimental results demonstrate the feasibility and effectiveness of the proposed collaborative filtering algorithm.

Key Words: Recommendation system, Collaborative filtering, ICBSSIG algorithm, Item clustering

Abstract (Chinese)
Abstract (English)
Chapter 1 Introduction
§1.1 Research Background
§1.2 Research Significance
§1.3 Research Content
§1.4 Thesis Structure
Chapter 2 Overview of Recommendation Systems and Complex Network Theory
§2.1 Overview of Recommendation Systems
§2.1.1 Concept of Recommendation Systems
§2.1.2 Research Topics in Recommendation Systems
§2.1.3 Structure of Recommendation Systems
§2.1.4 Classification of Recommendation Algorithms
§2.2 Complex Network Theory and Applications
§2.2.1 Statistical Properties of Complex Networks
§2.2.2 Recommendation Algorithms Based on Bipartite Networks
§2.3 Research Status in China and Abroad
§2.3.1 Research Status in China
§2.3.2 Research Status Abroad
§2.3.3 Current Challenges for Recommendation Systems
§2.4 Chapter Summary
Chapter 3 Collaborative Filtering Algorithms and Clustering
§3.1 Introduction to Collaborative Filtering Algorithms
§3.1.1 User-Based Collaborative Filtering
§3.1.2 Item-Based Collaborative Filtering
§3.1.3 Advantages and Disadvantages of Collaborative Filtering
§3.2 Cluster Analysis
§3.2.1 Clustering Concepts and Common Algorithms
§3.2.2 Clustering Algorithms for High-Dimensional Sparse Data
§3.3 Collaborative Filtering Based on Item Clustering
§3.4 Collaborative Filtering Based on User Clustering
§3.5 Chapter Summary
Chapter 4 Collaborative Filtering Recommendation Algorithm Based on Item Clustering
§4.1 Motivation for the Algorithm
§4.1.1 Background of the Algorithm
§4.1.2 Problems the Algorithm Addresses
§4.2 Sparse Similarity
§4.2.1 Sparse Data Features and Similarity Description
§4.2.2 Rating Factor
§4.3 Item Genre Attribute
§4.3.1 Overview of the Item Genre Attribute
§4.3.2 Item Genre Factor
§4.4 Set Similarity Computation
§4.5 Steps of the Item Clustering Algorithm Based on Sparse Similarity and Item Genre
§4.6 Collaborative Filtering Recommendation Algorithm Based on Item Clustering
§4.6.1 Flowchart of the Improved Collaborative Filtering Algorithm
§4.6.2 Algorithm Implementation Steps
§4.7 Chapter Summary
Chapter 5 Experiments and Analysis of Results
§5.1 Experimental Platform and Data
§5.2 Evaluation Metrics
§5.2.1 Metrics for Interest Prediction
§5.2.2 Metrics for Recommendation Results
§5.3 Comparison Experiments
§5.3.1 Comparison with Collaborative Filtering Based on K-Means Item Clustering
§5.3.2 Comparison Experiments on the Full Dataset
§5.4 Parameter Sensitivity Analysis
§5.4.1 Item Set Similarity Threshold
§5.4.2 Number of Nearest Neighbors Searched
§5.5 Summary of Experiments
Chapter 6 Conclusions and Outlook
§6.1 Summary
§6.2 Outlook
References
Publications, Research Projects, and Achievements During the Period of Study
Acknowledgments

Chapter 1 Introduction

§1.1 Research Background

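With the development and spread of networks, information from the Internet has gradually become part of people's lives. Because online information is continually updated and added to, the amount of information on the Internet rises rapidly every day, and it is heterogeneous in organization and diverse in distribution. With Web data growing so sharply, users have to spend a great deal of time searching the network for the information they need. Search engines are the most widely used information retrieval tools, but because they are general-purpose they cannot meet people's personalized needs. The question is therefore how to organize and manage the massive data on the network effectively, and how to help users conveniently find the knowledge and data of value to them, so as to meet the personalized needs of different users. Personalized recommendation systems emerged against this background. They form a new research branch of data mining, applying data mining techniques to the various data on the network to uncover implicit knowledge and information. A recommendation system can recommend information or services to users according to their interests and preferences, which helps promote transactions and services. Recommendation systems can greatly reduce the time users spend looking for information, and enterprises can use the information such systems collect and feed back to improve their operations and management. Personalized recommendation systems provide a powerful tool for dealing with data overload on the network: from tens of millions of pieces of information they filter out the useful portion relevant to a user's interests and recommend it to that user, effectively guiding the decisions of network users.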

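By the nature of the objects recommended, recommendation systems fall into two types: search systems whose recommended objects are web pages (using data mining techniques to recommend useful web pages to users), and recommendation systems whose objects are products (recommending products or information services of interest by analyzing users' interests).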

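Driven by socioeconomic demand and advances in network technology, researchers abroad began extensive work on recommendation systems as early as the early 1990s, and research on e-commerce recommendation techniques in particular has produced rich theoretical and applied results. Research in China started later and has produced comparatively fewer results. At that time, research on recommendation techniques concentrated mainly on recommendation algorithms such as content-based information filtering and collaborative filtering.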

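Collaborative filtering is currently one of the most widely applied recommendation techniques. A collaborative filtering recommendation system, also called a recommendation system based on user correlation, rests on the following assumption: if two users give similar ratings to some items in the system, their ratings of the other items in the system will also be similar. Although collaborative filtering is widely applied and has been highly successful, as the volume of system data grows rapidly, traditional collaborative filtering still faces the following problems [1]: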

(1) Data sparsity

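A collaborative filtering algorithm first builds users' interest information into a user-item rating matrix, so that computing the similarity of users' interests reduces to computing similarity over the user-item rating matrix. In many real recommendation systems, the amount of information available about each user is quite limited. A typical user rates only 1% to 2% of the items in the system, and the problem is especially acute when there are many online items and very few users, which makes the user rating matrix extremely sparse. Under these conditions, computing similarity and finding nearest neighbors becomes extremely difficult.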
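The effect of sparsity on similarity computation can be illustrated with a small sketch. It is not the thesis's algorithm: the toy ratings, the function names (`sparsity`, `cosine_similarity`, `predict`), and the choice of a plain similarity-weighted average for prediction are all assumptions made for illustration. It builds a tiny user-item rating structure, measures how empty it is, and shows that a user with no co-rated items contributes nothing to a prediction.

```python
import math

# Hypothetical toy data: user -> {item: rating}. Unrated items are simply absent.
ratings = {
    "u1": {"i1": 5, "i2": 3, "i3": 4},
    "u2": {"i1": 4, "i2": 2, "i4": 5},
    "u3": {"i5": 1},
}
items = sorted({item for user_ratings in ratings.values() for item in user_ratings})


def sparsity(ratings, n_items):
    """Fraction of empty cells in the user-item rating matrix."""
    filled = sum(len(user_ratings) for user_ratings in ratings.values())
    return 1.0 - filled / (len(ratings) * n_items)


def cosine_similarity(a, b):
    """Cosine similarity of two users' rating vectors (missing ratings count as 0)."""
    common = set(a) & set(b)
    if not common:
        return 0.0  # no co-rated items: the vectors are orthogonal
    dot = sum(a[i] * b[i] for i in common)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b)


def predict(ratings, user, item):
    """Similarity-weighted average of other users' ratings for `item`.

    Returns None when no user with nonzero similarity has rated the item.
    """
    numerator = denominator = 0.0
    for other, other_ratings in ratings.items():
        if other == user or item not in other_ratings:
            continue
        sim = cosine_similarity(ratings[user], other_ratings)
        numerator += sim * other_ratings[item]
        denominator += abs(sim)
    return numerator / denominator if denominator else None


print("sparsity:", round(sparsity(ratings, len(items)), 3))
print("sim(u1, u2):", round(cosine_similarity(ratings["u1"], ratings["u2"]), 3))
print("prediction for (u1, i4):", predict(ratings, "u1", "i4"))
print("prediction for (u1, i5):", predict(ratings, "u1", "i5"))
```

In this toy data, `u1` and `u3` share no rated items, so their similarity is zero and no prediction can be made for `i5`. On a real rating matrix where each user rates only 1% to 2% of the items, this situation is the norm rather than the exception.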
