Dynamic Time Warping and Vector Quantization Techniques in Speech Recognition


陈辉 · 2024-11-20 · 48 pages


Chinese Abstract

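Research on speech recognition began in the 1950s. After nearly 60 years of continuous development, major results have been achieved in many key technical areas. The technology is widely applied, and related products keep coming onto the market. Speech recognition has changed the traditional mode of human-computer interaction and made people's work and daily life much more convenient. Isolated-word speech recognition is currently the most mature and most widely applied form. Although its recognition rate is fairly high in many settings, it still suffers from poor robustness and sensitivity to random factors, so continued in-depth research on isolated-word speech recognition is well warranted.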

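Building on a study and analysis of existing speech recognition algorithms, this thesis implements an isolated-word speech recognition system for four English words. For dynamic time warping (DTW), it proposes an improved recognition method. For vector quantization (VQ), it analyzes the concrete application of the technique to speech recognition and, through simulation in Matlab, examines how codebook size affects the recognition rate and compares the running times of the different algorithms. Finally, the algorithms are ported to an ARM platform.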

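The improved DTW recognition method is based on a statistical idea. It first computes the distortion distances between the test template and each of the multiple reference templates of every word, then takes the mean for each word. Values that deviate far from that mean are replaced by the mean, and a new overall mean is computed. Finally, the means are sorted and the word with the smallest mean is taken as the recognition result. The method overcomes the influence of random factors to some extent and raises the recognition rate. For the four isolated English words, the conventional DTW method gives recognition rates of 93%, 82%, 93%, and 69%, whereas the improved DTW method gives 95%, 90%, 94%, and 91%.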
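The decision rule described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the thesis's own code: the abstract does not say how far from the mean a value must lie to count as an outlier, so the threshold `tol` (a fraction of the first-pass mean) is hypothetical, as are the function names.

```python
from statistics import mean


def robust_mean(distances, tol=0.5):
    """Mean of the distortions after replacing far-off values by the mean.

    `tol` is an assumed threshold: a value is treated as an outlier when it
    differs from the first-pass mean by more than tol * mean.
    """
    first_pass = mean(distances)
    cleaned = [
        first_pass if abs(d - first_pass) > tol * first_pass else d
        for d in distances
    ]
    return mean(cleaned)


def recognize(distances_by_word, tol=0.5):
    """Pick the word whose outlier-corrected mean distortion is smallest.

    `distances_by_word` maps each word to the list of DTW distortions between
    the test template and that word's reference templates.
    """
    scores = {w: robust_mean(d, tol) for w, d in distances_by_word.items()}
    best = min(scores, key=scores.get)
    return best, scores
```

With made-up distortions such as `{"open": [10, 11, 9, 40], "close": [16, 17, 15, 16]}`, a plain mean would favor "close" because of the single bad match at 40, while the corrected mean favors "open". The word labels and numbers here are invented for illustration.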

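For speech recognition based on vector quantization, the thesis presents the basic principle of VQ, optimal codebook design, and the general recognition procedure, and analyzes how codebook size affects the recognition result. For the four specific English words used in this work, recognition performance is compared for codebook sizes of 4, 8, and 16. The conclusion is that, within a certain range, a larger codebook gives a higher recognition rate, but once the size becomes too large the computational load increases while the recognition rate no longer improves and may even fall.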
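As a rough illustration of the general VQ recognition procedure, the sketch below assumes one codebook per word and classifies an utterance by the smallest average quantization distortion. The squared Euclidean distance and the function names are assumptions made for this example; the abstract does not specify the distortion measure used in the thesis.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def avg_distortion(frames, codebook):
    """Average distance from each frame to its nearest codeword."""
    total = sum(min(sq_dist(f, c) for c in codebook) for f in frames)
    return total / len(frames)


def vq_recognize(frames, codebooks):
    """Return the word whose codebook quantizes the frames with least distortion.

    `frames` is a list of feature vectors for the test utterance;
    `codebooks` maps each word to its list of codewords.
    """
    scores = {w: avg_distortion(frames, cb) for w, cb in codebooks.items()}
    return min(scores, key=scores.get)
```

The cost of `avg_distortion` grows linearly with the number of codewords, which is one way to see why a larger codebook increases the computational load.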

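The final part of the thesis is the engineering implementation. A graphical user interface is first designed with Qt, and the algorithms are then ported to the mini2440 development board. The expected goals were met, and the result has some practical value.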

Keywords: speech recognition; feature extraction; dynamic time warping; vector quantization; ARM

ABSTRACT

Research on speech recognition started in the 1950s. Over nearly 60 years of continuous development, significant achievements have been made in many key technology areas. The technology is widely used, and many related products have entered the market. Speech recognition has changed the traditional mode of human-computer interaction and brought great convenience to people's work and life. At present, isolated-word speech recognition is the most mature and most widely used form. Although it achieves a high recognition rate in many situations, its robustness is still poor and it is easily affected by random factors, so further research on isolated-word speech recognition is necessary.

Based on an analysis of current speech recognition algorithms, the thesis implements an isolated-word recognition system for four English words. It proposes an improved recognition method based on the DTW algorithm, analyzes the recognition rate of the VQ algorithm for different codebook sizes, and compares the computation times of the different algorithms in Matlab. Finally, the algorithms are ported to an ARM platform.

The improved DTW recognition algorithm is based on statistics. It first computes the distortions between the test template and each word's multiple reference templates and takes the mean for each word. It then replaces any distortion that lies far from the mean with the mean itself, computes the mean of the new data set, and finally sorts all the means; the word with the smallest mean is the recognition result. This method overcomes the impact of random factors to some extent and therefore improves the recognition rate. For the four English words, the recognition rates of the traditional DTW algorithm are 93%, 82%, 93%, and 69% respectively, while those of the improved DTW algorithm are 95%, 90%, 94%, and 91%.
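Each distortion mentioned above is a DTW distance between two templates. A minimal sketch of the classic dynamic-programming recurrence is given below, assuming templates are lists of equal-dimension feature vectors, a Euclidean local distance, and the basic three-way step pattern with no slope constraint. The thesis may use different local constraints, so this is illustrative only.

```python
import math


def dtw_distance(test, ref):
    """Accumulated distortion along the best warping path between two templates.

    `test` and `ref` are lists of feature vectors (tuples of equal length);
    the two lists may differ in length.
    """
    n, m = len(test), len(ref)
    inf = float("inf")
    # D[i][j] holds the minimum accumulated cost of aligning
    # test[:i] with ref[:j]; row 0 and column 0 act as a border.
    D = [[inf] * (m + 1) for _ in range(n + 1)]
    D[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = math.dist(test[i - 1], ref[j - 1])
            D[i][j] = cost + min(D[i - 1][j], D[i][j - 1], D[i - 1][j - 1])
    return D[n][m]
```

Because the path may repeat frames of either template, a word spoken slowly and the same word spoken quickly can still align with low accumulated cost.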

For VQ applied to speech recognition, the thesis first introduces the basic theory of VQ, optimal codebook design, and the recognition process, and then analyzes the influence of codebook size on the recognition rate. It compares recognition performance for the four specific words when the codebook size is 4, 8, and 16. The conclusion is that, within a certain range, the larger the codebook, the higher the recognition rate; when the codebook is too large, however, the computation increases while the rate no longer rises and may even decrease.
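To make the role of codebook size concrete, here is a simplified codebook-training sketch using plain Lloyd iterations (nearest-codeword assignment followed by a centroid update). It is a stand-in chosen for brevity: the abstract does not state which design algorithm the thesis uses, and the initialization from evenly spaced training vectors, the iteration count, and the function name are all assumptions.

```python
import math


def train_codebook(vectors, size, iterations=20):
    """Train a codebook of `size` codewords from training feature vectors."""
    if size < 1 or len(vectors) < size:
        raise ValueError("need at least `size` training vectors")
    # Assumed initialization: evenly spaced training vectors.
    step = len(vectors) // size
    codebook = [tuple(vectors[i * step]) for i in range(size)]
    for _ in range(iterations):
        # Assign every vector to the cell of its nearest codeword.
        cells = [[] for _ in range(size)]
        for v in vectors:
            k = min(range(size), key=lambda i: math.dist(v, codebook[i]))
            cells[k].append(v)
        # Move each codeword to the centroid of its cell;
        # an empty cell keeps its previous codeword.
        codebook = [
            tuple(sum(col) / len(cell) for col in zip(*cell)) if cell else codebook[k]
            for k, cell in enumerate(cells)
        ]
    return codebook
```

Calling `train_codebook(features, 4)`, `train_codebook(features, 8)`, and `train_codebook(features, 16)` on the same training features (where `features` is a placeholder for a word's training vectors) would give the three codebook sizes compared in the abstract.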

Finally, the thesis carries out the engineering implementation. The graphical user interface is first designed with Qt, and the algorithms are then ported to the mini2440 platform. The expected goals are achieved, and the result has some practical value.

Key Words: speech recognition; feature extraction; DTW; VQ; ARM

Chinese Abstract
ABSTRACT
Chapter 1 Introduction
1.1 Overview of Speech Recognition
1.2 History of Speech Recognition
1.3 Technical Classification of Speech Recognition
1.4 Difficulties in Isolated-Word Speech Recognition Systems
1.5 Main Work of the Thesis and Chapter Organization
Chapter 2 Basic Principles of Speech Recognition Systems
2.1 Speech Signal Production and Models
2.1.1 Physiological Mechanism of Speech Production
2.1.2 Mathematical Model of the Speech Signal
2.2 Speech Signal Analysis Methods
2.2.1 Preprocessing of the Speech Signal
2.2.2 Feature Parameter Extraction
2.2.3 Model Training and Template Matching
2.3 Structure of a Typical Speech Recognition System
2.4 Chapter Summary
Chapter 3 Research and Application of the DTW and VQ Algorithms in Speech Recognition
3.1 Basic Principle of the DTW Algorithm
3.2 Matching Process of the DTW Algorithm and Its Improvement
3.2.1 General Matching Process of the DTW Algorithm
3.2.2 Improved DTW Recognition Method
3.3 Introduction to Vector Quantization (VQ)
3.4 Optimal Codebook Design for Vector Quantization
3.5 Speech Recognition Process Using Vector Quantization
3.6 Simulation and Performance Analysis of the DTW and VQ Algorithms in Matlab
3.7 Chapter Summary
Chapter 4 Implementation of the Speech Recognition System on the ARM Platform
4.1 Overview of the S3C2440 and the mini2440 Development Board
4.1.1 The S3C2440 ARM Processor
4.1.2 The mini2440 Development Board
4.2 Qt and the User Interface Design for Speech Recognition
4.2.1 Introduction to Qt
4.2.2 Installing Qt
4.2.3 User Interface Design of the Speech Recognition System
4.3 Setting Up the Cross-Compilation Environment
4.4 Porting the bootloader
4.5 Porting the linux Kernel
4.6 Building and Porting the Root File System
4.7 Porting the Speech Recognition Program
4.8 Chapter Summary
Chapter 5 Summary and Outlook
5.1 Work Completed in the Thesis
5.2 Outlook for Future Work
References
Papers Published, Research Projects Undertaken, and Results Achieved During the Course of Study
I. Published Papers
II. Research Projects
Acknowledgements

Chapter 1 Introduction

1.1 Overview of Speech Recognition

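Throughout human history, from remote antiquity to the technologically advancing present, speech has always been the most convenient and fastest way for people to exchange information. People have long sought to make machines understand human language and carry out the corresponding actions. From the ancient legend of "Open Sesame" to the speech recognition devices now used in all kinds of settings, machine recognition of human language has begun to move from ideal to reality. Speech recognition technology is already widely used in industry, transportation, the military, medicine, and civilian applications, and has produced enormous social and economic benefits.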

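Speech recognition is a technology by which an intelligent machine automatically recognizes speech [1]. The machine recognizes and understands the input speech and converts it into the corresponding text or commands so as to meet people's expectations and requirements; the ultimate goal is natural-language communication between humans and machines [2]. Speech recognition is an interdisciplinary field that takes speech as its object of study and is an important research direction within speech signal processing. It is also a branch of pattern recognition, drawing on physiology, linguistics, psychology, computer science, signal processing, and other fields, and it is still at a stage of active development and problem solving.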

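Speech recognition is an important research topic in human-machine interface design. With the rapid growth of the information industry, fields including office automation, computer science, communications, defense technology, and robotics urgently need speech recognition to replace highly inconvenient human-machine interfaces [3]. Speech recognition can take over many machine operations that are mentally taxing, laborious, or time-consuming. In situations where the hands are busy, cannot be used, or the user simply prefers not to use them, such as driver's cabs, hazardous industrial sites, remote automatic information retrieval, home appliance control, or dim conditions that would otherwise require manual input, the introduction of speech recognition has made people's work and daily life far more convenient.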

1.2 History of Speech Recognition

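Research on speech recognition began in the 1950s. In 1952, Bell Labs developed the world's first speech recognition system, the Audrey system, which could recognize 10 isolated English digits. Its appearance marked the beginning of speech recognition research [4].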

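In the 1960s, speech recognition technology developed rapidly. On the one hand, the application of computer technology made speech recognition feasible in both hardware and software; on the other, theoretical research on speech recognition also produced important results. Dynamic programming (DP) and linear prediction (LP) analysis [5] were proposed in this period and applied to speech recognition systems. LP in particular provided a good solution to the problem of modeling the speech signal and had a far-reaching influence on speech recognition.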

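In the 1970s, speech recognition research achieved breakthrough results. The Japanese researcher Itakura (板仓) proposed dynamic time warping (DTW), which solved the problem of non-uniform speaking rate [6]; linear predictive cod…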
