Research on Fuzzy Clustering Algorithms in Spatial Data Mining

Author: 牛悦; 66 pages; 2024-11-19
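Abstract
In recent years, with the rapid development of spatial information acquisition technology,
both the variety of spatial databases and the volume of data they store have grown steadily.
Extracting valuable information from complex spatial data has become a pressing problem,
and spatial data mining offers a new way to address it.

Spatial data are complex: massive, nonlinear, multi-scale, high-dimensional, and fuzzy.
Spatial data mining refers to extracting from spatial databases the spatial patterns and
features of interest to users, the general relationships between spatial and non-spatial
data, and other general data characteristics implicit in the spatial data. It can mine
spatial association rules, characteristic rules, classification rules, clustering rules,
and so on. As a branch of data mining, it is widely used in geographic information systems,
remote sensing, navigation, environmental research, and many other fields that rely on
spatial data.

This thesis describes the definition, characteristics, and difficulties of spatial data
mining, and introduces the theoretical methods available to it together with nine typical
spatial data clustering methods. It brings fuzzy clustering into the field of spatial data
clustering and studies the basic theory and methods of applying fuzzy set theory to spatial
data mining, including the basic concepts and properties of fuzzy sets and the basic methods
and steps of fuzzy clustering. It proposes a spatial clustering algorithm based on the fuzzy
similarity matrix, the fuzzy similar group clustering algorithm, which overcomes the
inability of fuzzy clustering to handle large-scale data sets and provides a similarity
measure between objects.

Keywords: spatial data mining; spatial clustering; fuzzy clustering; fuzzy similar group clustering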
ABSTRACT
In recent years, with the rapid development of spatial data acquisition technologies such
as remote sensing and network technology, both the number of spatial databases and the
volume of data they hold have been growing rapidly. Extracting valuable information from
complex spatial data has become an urgent problem, and spatial data mining (SDM) offers a
new approach to solving it.

Spatial data are massive, nonlinear, multi-scale, high-dimensional, and fuzzy. Spatial data
mining refers to extracting from spatial databases the spatial patterns and characteristics
of interest to users, the general relationships between spatial and non-spatial data, and
other general features implicit in the spatial data. It can mine association rules,
characteristic rules, classification rules, clustering rules, and so on. As a branch of data
mining, spatial data mining has been widely applied in geographic information systems,
remote sensing, navigation, environmental research, and other fields.

This thesis first presents the definition, characteristics, and difficulties of spatial data
mining, then introduces its available methods and nine kinds of spatial data clustering
approaches. It studies the basic theory and techniques of fuzzy sets, including their basic
notions and properties, and puts forward a new spatial clustering algorithm based on the
fuzzy similarity matrix, the “Fuzzy Similar Brunch Clustering Algorithm”.

Key words: Spatial Data Mining, Spatial Clustering, Fuzzy Clustering,
Fuzzy Similar Brunch Clustering
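
For readers unfamiliar with clustering from a fuzzy similarity matrix, the following is a minimal sketch of the classical transitive-closure approach. It is not the algorithm proposed in this thesis, whose steps are not given in this part of the text. The max-min similarity measure, the sample data, and all function names are assumptions chosen for illustration.

```python
def max_min_similarity(x, y):
    """Max-min similarity for non-negative vectors: sum(min) / sum(max)."""
    num = sum(min(a, b) for a, b in zip(x, y))
    den = sum(max(a, b) for a, b in zip(x, y))
    return num / den if den else 1.0


def similarity_matrix(data):
    """Build a reflexive, symmetric fuzzy similarity matrix."""
    return [[max_min_similarity(x, y) for y in data] for x in data]


def compose(r, s):
    """Max-min composition of two square fuzzy relations."""
    n = len(r)
    return [[max(min(r[i][k], s[k][j]) for k in range(n)) for j in range(n)]
            for i in range(n)]


def transitive_closure(r):
    """Square the relation repeatedly until it stops changing."""
    while True:
        r2 = compose(r, r)
        if r2 == r:
            return r
        r = r2


def lambda_cut_clusters(t, lam):
    """Group objects whose closure similarity is at least lam."""
    clusters, seen = [], set()
    for i in range(len(t)):
        if i in seen:
            continue
        group = [j for j in range(len(t)) if t[i][j] >= lam]
        seen.update(group)
        clusters.append(group)
    return clusters


# Hypothetical sample: four objects described by two non-negative attributes.
data = [[1.0, 2.0], [1.1, 2.1], [5.0, 6.0], [5.2, 5.9]]
t = transitive_closure(similarity_matrix(data))
print(lambda_cut_clusters(t, 0.9))
print(lambda_cut_clusters(t, 0.2))
```

Raising the threshold `lam` splits the data into more, tighter groups; lowering it merges them. Each composition step costs on the order of n³ operations for n objects, so the closure becomes expensive as the data set grows.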
Contents
Abstract
ABSTRACT
Chapter 1 Introduction
§1.1 Introductory Remarks
§1.2 Research Status and Future Trends of Spatial Data Mining
§1.3 Main Contents of This Thesis
Chapter 2 Spatial Data Mining
§2.1 Concept and Significance of Spatial Data Mining
§2.2 Objects, Characteristics, Tasks, and Difficulties of Spatial Data Mining
§2.2.1 Objects and Characteristics of Spatial Data Mining
§2.2.2 Tasks and Difficulties of Spatial Data Mining
§2.3 Theoretical Methods Available for Spatial Data Mining
§2.4 Spatial Data Preprocessing
§2.4.1 Spatial Data Cleaning
§2.4.2 Data Integration
§2.4.3 Data Reduction
§2.4.4 Data Transformation
Chapter 3 Spatial Clustering
§3.1 Concept of Spatial Clustering
§3.2 Evaluation and Selection Criteria for Spatial Clustering
§3.3 Similarity Measures
§3.3.1 Distance Between Objects
§3.3.2 Similarity Coefficients of Objects
§3.3.3 Association Coefficients of Attributes
§3.4 Spatial Clustering Methods
§3.4.1 Partitioning Clustering Algorithms
§3.4.2 Hierarchical Clustering Algorithms
§3.4.3 Density-Based Clustering Algorithms
§3.4.4 Grid-Based Clustering Algorithms
§3.4.5 Methods Based on Fuzzy Mathematics
§3.4.6 Model-Based Clustering
§3.4.7 Clustering Based on Mathematical Morphology
§3.4.8 “Data Field-Cloud” Clustering
§3.4.9 Clustering Based on Genetic Algorithms
§3.5 Problems with Spatial Clustering Analysis Algorithms
Chapter 4 Fuzzy Sets and Fuzzy Clustering
§4.1 Fuzzy Sets
§4.1.1 Membership Degree and Representation of Fuzzy Sets
§4.1.2 Fuzzy Relations of Fuzzy Sets
§4.1.3 Fuzzy Matrices
§4.1.4 Fuzzy Equivalence Matrices and Their Properties
§4.1.5 Fuzzy Similarity Matrices and Their Properties
§4.1.6 Transformation of Fuzzy Sets
§4.2 Methods of Fuzzy Clustering Analysis
§4.3 The Fuzzy Similar Group Clustering Algorithm
§4.3.1 Idea of the Fuzzy Similar Group Clustering Algorithm
§4.3.2 Computation Steps of the Fuzzy Similar Group Clustering Algorithm
§4.3.3 Performance Analysis of the Fuzzy Similar Group Clustering Algorithm
Chapter 5 Application of Fuzzy Similar Group Clustering to Grouping Industries by Wage
§5.1 Data Mining Objects
§5.2 System Structure of Spatial Data Mining and Clustering Analysis Workflow
§5.3 Data Preprocessing
§5.4 Industry Grouping
Chapter 6 Summary and Outlook
§6.1 Summary
§6.2 Research Outlook
References
Papers Published, Research Projects Undertaken, and Results Achieved During the Degree Program
Acknowledgements
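Chapter 1 Introduction
The rapid development of modern spatial data acquisition technology and computer networks
has caused both the variety of spatial databases and the volume of data they hold to grow
quickly. These spatial data meet the latent needs of research on the Earth's resources and
environment and broaden the available sources of information. However, their complexity and
size far exceed those of ordinary transactional data, while current means of processing
spatial data lag behind. As a result, large amounts of spatial data are simply shelved and
cannot serve people's need to understand nature and advance sustainable social development.
At the same time, researchers working on expert systems run into serious difficulties when
they use traditional techniques to summarize and express rules of experience and enter them
into a system from outside to form a knowledge base, because the rules are complex, fuzzy,
and hard to express. How to extract useful information and knowledge from these massive
data has therefore become a research focus. Spatial data mining was proposed in response to
this particular phenomenon of “data explosion but knowledge poverty”.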
§1.1 Introductory Remarks
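Modern equipment for collecting, storing, and processing spatial data has developed rapidly,
so the complexity and quantity of spatial data have expanded sharply, far beyond people's
ability to interpret them. For example, computer chip integration, CPU processing power, and
communication channel transmission rates double on average every 18 months. In the United
States, radio took 38 years to reach 50 million users and television took 13 years, while
the Internet took only 4 years; the global IP network has been doubling every 6 months.
Large amounts of data are fed into computers as text, images, multimedia, and other forms,
and databases can reach the order of thousands of terabytes. Spatial data, as the preferred
carrier for representing spatial entities (ground features, their attributes, their
relationships, and so on), play an increasingly important role and are essential data for
understanding nature and developing the social economy. Of the real-world data encountered
and used in national economic construction, social development, and everyday life, about 80%
are related to geographic location and to attributes and their distribution. With the
application and development of computers, networks, the Global Positioning System (GPS),
remote sensing (RS), and geographic information systems (GIS) in the spatial data field, the
quantity, size, and complexity of spatial data and the speed of their transmission are all
growing rapidly. Spatial data are also expanding far faster than conventional transactional
data. New high-resolution, highly dynamic satellite sensors not only have many bands, high
spectral resolution, high data rates, and short cycles, but also produce especially large
data volumes, generally on the order of gigabytes or more. EOS-AM1 and PM1 alone acquire
remote sensing data measured in terabytes every day. At every moment, modern equipment is
collecting and generating new data while also storing and accumulating existing data. These
data greatly satisfy the latent needs of research on the Earth's resources and environment
and broaden the available sources of information.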
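The spatial data in a spatial database have many distinctive properties. They carry
topological, directional, and/or distance information, are usually organized with complex
multidimensional spatial index structures, are accessed through spatial data access methods,
and often require techniques such as spatial reasoning, geometric computation, and spatial
knowledge representation. In an ordinary relational database, the attributes used for
learning are taken directly from fields or can be derived from fields by simple mathematical
or logical operations. In a spatial database, the geometric features and spatial
relationships of graphic objects are generally not stored directly; they are implicit in the
graphic data of multiple layers, and the attributes used for inductive learning can be
obtained only through dedicated spatial operations and spatial analysis. This is the main
feature that distinguishes spatial data mining from data mining in ordinary relational
databases, and it means that existing data mining algorithms must be modified to suit
spatial databases.

Most existing data mining algorithms were designed for relational databases. A relational
database is structured: records serve as tuples and fields as attributes, which makes data
mining algorithms easy to apply. A spatial database is far more complex, containing both
spatial data and attribute data. The spatial data are unstructured and include both vector
and raster data, so data mining in a spatial database is more complicated than in an
ordinary relational database. Because spatial data are complex and their applications
specialized, spatial databases cannot be regarded simply as an application area of database
technology. Likewise, spatial data mining cannot be regarded simply as an application area
of data mining; rather, it should build on general data mining to study the theory, methods,
and applications specific to spatial data mining, as well as the implementation of existing
data mining methods in spatial databases. Data mining in spatial databases must overcome the
limitations of relying on a single technique. Spatial data mining is a new field at the
intersection of many disciplines and technologies, drawing on machine learning, remote
sensing, visualization, and other fields, so carrying it out requires integrating the
relevant techniques from each of them.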
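
The point that spatial attributes are implicit in geometry rather than stored in fields can be illustrated with a small sketch. The Python code below derives a distance, a direction, and a containment relation from raw coordinates. The function names, the sample coordinates, and the use of planar (not geodetic) geometry are all assumptions made for illustration; a real GIS would supply these operations.

```python
import math


def distance(p, q):
    """Planar Euclidean distance between two points."""
    return math.hypot(q[0] - p[0], q[1] - p[1])


def bearing(p, q):
    """Direction from p to q in degrees, 0 = +y (north), clockwise."""
    return math.degrees(math.atan2(q[0] - p[0], q[1] - p[1])) % 360


def point_in_polygon(pt, polygon):
    """Ray-casting test: does pt lie inside the polygon (list of vertices)?"""
    x, y = pt
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Only edges that straddle the horizontal line through pt can cross it.
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


# Hypothetical layers: two point features and one polygon feature.
station = (0.0, 0.0)
school = (3.0, 4.0)
district = [(0.0, 0.0), (5.0, 0.0), (5.0, 5.0), (0.0, 5.0)]

# None of these values is stored in the data; each is computed from geometry
# and could then serve as an ordinary attribute for a mining algorithm.
derived = {
    "dist_to_station": distance(school, station),
    "bearing_from_station": bearing(station, school),
    "in_district": point_in_polygon(school, district),
}
print(derived)
```

Once such derived values are placed in a table, a conventional clustering or classification algorithm can use them like any other field, which is why a spatial computation step generally has to precede the mining step.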
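§1.2 Research Status and Future Trends of Spatial Data Mining
Data mining is a young research field that emerged in the 1980s [1-3]. Research on spatial
data mining started even later, only after data mining researchers had concentrated their
methods on mining relational databases. Data mining research initially studied how to
discover useful data in large data sets such as relational databases, transactional
databases, and time series [4-5]. With the rapid development of remote sensing, geographic
information systems, spatial information acquisition, and network technology, and with the
long-term accumulation of data [6], traditional methods can no longer meet the needs of
spatial data processing, analysis, and application. To resolve the new difficulties brought
by massive spatial data, the theory and methods of data mining and knowledge discovery were
introduced. In particular, how to carry out fuzzy spatial data mining and knowledge
discovery at the level of fuzzy concepts and fuzzy spatial relations has become a research
focus in recent years [7-10].

At present, research on spatial data mining technology is on the rise, and in China it has
only just begun. Spatial data mining has permeated the academic activities of data mining
and knowledge discovery, geospatial information science, and related disciplines. Spatial
data mining originated at international GIS conferences, and in remote sensing (RS),
geographic information systems (GIS), …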