Text Mining of Stock Market Information

Author: 侯斌 · Date: 2024-11-19 · 81 pages · 949.53 KB
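Chinese Abstract
The stock market is where investors and fund raisers meet. Social and economic
information of every kind is ultimately reflected in stock price movements, and the rises
and falls of the market in turn affect companies, national economies, and even the world
economy. Research on the stock market has therefore long been an active topic for
scholars in China and abroad. This thesis focuses on the economic information that
circulates in the stock market and aims to use emerging technology to help information
seekers cut through the information that surrounds them.
In today's information age, the volume of information arriving through many channels is
enormous. Information seekers find it hard to tell what is important and what is accurate,
and it is not easy to determine how a given piece of information affects the stock market
or how strongly. As a result, information contributes far less than it could to analysing
and solving problems. Text mining, an effective technique for discovering latent,
valuable knowledge in vast information resources, is emerging and attracting wide
attention. Its appearance raises the question of whether it can be applied to processing
stock market information: because text mining can find useful, previously unknown
patterns or relationships in massive unstructured information, it can be used to analyse
the mass of stock market information, derive patterns from it, and generate information
products.
The thesis concentrates on text mining analysis of, and experiments on, financial news and
information disclosed by listed companies. It starts from the relevant text mining theory
and, alongside the theory, describes the concrete methods used to put it into practice, so
that the technical work is well grounded in theory. Chapter 3 presents theory related to
stock market information; it helps the reader understand the factors that influence the
stock market and identifies usable information sources on the internet. In particular, the
chapter discusses the theory of information disclosure and points out that disclosures
follow a prescribed format, a property that lays an important foundation for information
extraction. Chapter 4 describes the system architecture of the text mining platform,
detailing the functional structure diagram and the data flow diagram of the designed
system, and provides sound middleware for actually implementing the platform. It also
introduces the key techniques involved in the implementation, such as word segmentation,
classification-based prediction, and information extraction. Chapter 5 is the experimental
chapter and the core of the thesis: two experiments are designed, the experimental data
are presented in detail, and sound conclusions are drawn.
This thesis applies text mining to the analysis of stock market information and builds the
framework of an implementation platform together with several key implementation steps.
The information products generated in the experiments proved of some use for inferring
stock market trends and for deeper analysis.
Keywords: text mining; classification-based prediction; information extraction; financial
news; information disclosure; stock market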
ABSTRACT
The stock market is the place where investors and fund raisers come together. Social and
economic information of all kinds is reflected in changes in the stock market, and the ups
and downs of the market in turn affect the economy of companies, nations, and even the
world. Research on the stock market has therefore long been a popular topic among experts
and scholars at home and abroad. This thesis is primarily concerned with the information
circulating in the stock market and hopes to help information seekers break through the
mass of information.
The quantity of information arriving through different channels today is very large.
Information seekers find it difficult to judge which information matters and which is
correct, and it is equally difficult to determine whether, and to what degree, a piece of
information influences the stock market. Information therefore works less effectively than
it should when we analyze and solve problems. As an effective technology for finding
potential and valuable knowledge in vast information resources, text mining is steadily
gaining ground and has received a great deal of attention. Its emergence has led people to
ask whether text mining can be applied to the processing of stock market information.
Because text mining can find useful, previously unknown patterns or relationships in
massive unstructured information, we use it to analyze the mass of stock market
information, derive patterns from it, and form information products.
The thesis emphasizes text mining analysis of, and experiments on, financial news and the
information disclosed by listed companies. It first applies text mining theory to
practice, which gives the technical work sound theoretical support. Chapter three presents
the theory related to stock market information. From this chapter the reader can learn the
factors that influence the stock market and identify useful information sources on the
internet. In particular, the chapter discusses the theory of information disclosure and
points out that disclosures follow a standardized format, which lays a good foundation for
information extraction. Chapter four presents the system architecture of the text mining
platform and describes in detail the functional structure diagram and the data flow
diagram of the designed text mining system. It builds sound middleware for the actual
implementation of the platform. The chapter also introduces some key technologies of the
concrete implementation, such as word segmentation, classification-based prediction, and
information extraction. Chapter five is the experimental part of the thesis, and also its
most important part. In this chapter two experiments are designed, the related
experimental data are presented, and good results are obtained.
The thesis applies text mining technology to the analysis of stock market information. It
builds the framework of the implementation platform and some key steps of the concrete
implementation. The information products produced in the experiments are of some use for
inferring stock market performance and for deeper analysis.
Keywords: text mining, classification-based prediction, information extraction, financial
news, information disclosure, stock market
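The abstract names classification-based prediction as one of the platform's key techniques, and the contents list a KNN implementation in §2.1.7.2. As an illustration only, the general idea of k-nearest-neighbour text classification can be sketched as below. This is not the thesis's implementation: the toy headlines, the "up"/"down" labels, the function names, and the use of raw term frequencies with cosine similarity are all assumptions made for the example, and the input is assumed to be already segmented into words.

```python
from collections import Counter
from math import sqrt


def vectorize(tokens):
    """Turn a list of already-segmented words into a term-frequency vector."""
    return Counter(tokens)


def cosine(a, b):
    """Cosine similarity between two sparse term-frequency vectors."""
    dot = sum(weight * b[term] for term, weight in a.items() if term in b)
    norm_a = sqrt(sum(w * w for w in a.values()))
    norm_b = sqrt(sum(w * w for w in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def knn_predict(train, query_tokens, k=3):
    """Label a document by majority vote among its k most similar training documents."""
    query = vectorize(query_tokens)
    scored = sorted(
        ((cosine(vectorize(tokens), query), label) for tokens, label in train),
        key=lambda pair: pair[0],
        reverse=True,
    )
    votes = Counter(label for _, label in scored[:k])
    return votes.most_common(1)[0][0]


# Hypothetical toy corpus: segmented headlines labelled with an assumed market move.
TRAIN = [
    ("profit growth beats forecast".split(), "up"),
    ("record profit and dividend increase".split(), "up"),
    ("strong orders lift profit outlook".split(), "up"),
    ("regulator investigation into accounting fraud".split(), "down"),
    ("loss warning after weak sales".split(), "down"),
    ("fraud investigation triggers loss".split(), "down"),
]

print(knn_predict(TRAIN, "profit forecast raised".split(), k=3))
```

For Chinese news text, a word segmentation step would have to produce the token lists first, and a weighting scheme for keywords would normally replace the raw counts used here.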
Contents
Chinese Abstract
ABSTRACT
Chapter 1 Introduction
§1.1 Motivation for the Topic
§1.2 Current State of the Research Field
§1.3 Overview of This Work and Thesis Structure
§1.3.1 Overview of This Work
§1.3.2 Thesis Structure
Chapter 2 Theoretical Background
§2.1 Text Mining Technology
§2.1.1 Overview of Text Mining
§2.1.2 Key Text Mining Techniques
§2.1.3 The General Text Mining Process
§2.1.4 Main Applications of Text Mining
§2.1.5 Text Preprocessing
§2.1.6 Text Classification
§2.1.6.1 Overview of Text Classification
§2.1.6.2 Text Classification Methods
§2.1.6.3 Applications of Classification Techniques
§2.1.7 Applying Classification Techniques to Stock Market Information Mining
§2.1.7.1 Overview of the Application Approach
§2.1.7.2 Implementation of the KNN Algorithm in This Thesis
§2.1.8 Information Extraction
§2.1.8.1 Overview of Information Extraction Technology
§2.1.8.2 Key Tasks of Information Extraction
§2.1.8.3 Applications of Information Extraction Technology
§2.2 Web-Related Technologies
§2.2.1 Overview
§2.2.2 Web Crawler Technology
§2.2.2.1 Overview of Web Crawler Technology
§2.2.2.2 Web Text Noise Reduction
§2.2.3 RSS Technology
§2.2.3.1 Overview of RSS Technology
§2.2.3.2 Parsing RSS Documents
§2.3 Chapter Summary
Chapter 3 Analysis of Stock Market Information
§3.1 Analysis of Stock Market Information
§3.2 Factors Affecting Stock Market Prices
§3.3 Information Disclosure
§3.3.1 Definition of Information Disclosure
§3.3.2 Content of Information Disclosure
§3.3.3 Methods of Information Disclosure
§3.3.4 Uses of Information Disclosure
§3.3.5 Problems in Information Disclosure
§3.3.6 Information Disclosure by Listed Companies
§3.3.6.1 Background
§3.3.6.2 Format of Listed Companies' Annual Report Disclosures
§3.3.7 Analysis of Listed Companies' Annual Reports
§3.3.7.1 Overview of Annual Report Analysis
§3.3.7.2 Fundamental Analysis of Listed Companies
§3.3.7.3 Collecting Fundamental Information
§3.3.8 Analysis of Interim Reports in Listed Company Disclosure
§3.3.8.1 Background
§3.3.8.2 Disclosure and Analysis of Abnormal Stock Price Fluctuations
§3.3.8.3 Content and Format of Disclosures on Abnormal Stock Price Fluctuations
§3.4 Chapter Summary
Chapter 4 System Architecture and Key Technologies
§4.1 System Functions
§4.2 System Design
§4.2.1 System Functional Structure Diagram
§4.2.2 System Layering
§4.2.3 Data Flow Diagram
§4.3 Implementation of Text Preprocessing
§4.3.1 Building the Lexicon
§4.3.2 Implementing Word Segmentation
§4.3.3 Text Representation
§4.4 Implementation of Classification-Based Prediction
§4.4.1 Execution Flow of Classification-Based Prediction
§4.4.2 Computing Keyword Weights
§4.5 Implementation of Information Extraction
§4.5.1 Notes on Information Extraction
§4.5.2 Execution Flow of Information Extraction
§4.5.3 Design of the Target Template
§4.5.4 Named Entity Recognition
§4.5.5 Design of Information Extraction Templates
§4.6 Chapter Summary
Chapter 5 Experiments and Analysis of Results
§5.1 Experimental Environment
§5.2 Experimental Design
§5.2.1 Design of the Classification-Based Prediction Experiment
§5.2.2 Design of the Information Extraction Experiment
§5.3 Experimental Results and Analysis
§5.3.1 Analysis of Classification-Based Prediction Results
§5.3.2 Analysis of Information Extraction Results
§5.4 Chapter Summary
Chapter 6 Conclusions and Areas for Improvement
§6.1 Conclusions
§6.2 Open Problems and Areas for Improvement
Appendix 1
Appendix 2
References
Papers Published During the Course of Study