Chinese word segmentation algorithm based on pair coding

Journal of Nanjing University of Science and Technology (Natural Science Edition) [ISSN:1005-9830/CN:32-1397/N]

Issue:
2014, No. 04
Page:
526-530

Info

Title:
Chinese word segmentation algorithm based on pair coding
Author(s):
Zhang Bingyi (1,2); Wei Bo (1); Chen Jiancheng (3); Wei Jie (4); Rao Guozheng (1,2)
1. College of Computer Science and Technology, Tianjin University, Tianjin 300072, China; 2. Tianjin Key Laboratory of Cognitive Computing and Application, Tianjin 300072, China; 3. Xiping County Electric Power Company, Zhumadian 463900, China; 4. State Key Labor
Keywords:
pair coding; Chinese word segmentation; characteristic matching; data compression; hash; characteristic value; fuzzy matching
PACS:
TP391.1
DOI:
-
Abstract:
To improve the speed and storage efficiency of Chinese word segmentation, this paper proposes a characteristic matching algorithm based on pair coding. The characteristic value is extracted from the positions of Chinese characters. The method supports fuzzy matching and does not need to match multi-character Chinese words, so the characteristic value is extracted from the positions of adjacent Chinese characters. In addition, the data compression method helps reduce storage space and improve the performance of Chinese word segmentation.
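
The abstract gives no implementation details, so the following is only a minimal sketch of one way matching on adjacent character pairs could work, not the authors' algorithm. It assumes that each pair of adjacent characters in a lexicon word is packed into a single integer key and stored in a hash set, and that the text is cut wherever two neighbouring characters do not form a known pair. The names `pair_code`, `build_pair_index`, and `segment`, as well as the toy lexicon, are hypothetical.

```python
def pair_code(a: str, b: str) -> int:
    """Pack two adjacent characters into one integer key (assumed encoding)."""
    # Unicode code points fit in 21 bits, so the shift keeps keys unique.
    return (ord(a) << 21) | ord(b)


def build_pair_index(words):
    """Collect the key of every adjacent character pair found in the lexicon."""
    index = set()  # hash set of pair keys; no whole words are stored
    for word in words:
        for a, b in zip(word, word[1:]):
            index.add(pair_code(a, b))
    return index


def segment(text, index):
    """Cut the text wherever two adjacent characters do not form a known pair."""
    if not text:
        return []
    tokens = []
    current = text[0]
    for prev, ch in zip(text, text[1:]):
        if pair_code(prev, ch) in index:
            current += ch           # known pair: extend the current token
        else:
            tokens.append(current)  # unknown pair: place a boundary here
            current = ch
    tokens.append(current)
    return tokens


# Toy lexicon, for illustration only.
lexicon = ["中文", "分词", "算法"]
index = build_pair_index(lexicon)
print(segment("中文分词算法", index))  # ['中文', '分词', '算法']
```

Because only pairs are stored, a longer string whose adjacent pairs all appear in the index is kept together even if the whole string is not a lexicon entry. This is one possible reading of the fuzzy matching mentioned in the abstract, and it also means the index size depends on the number of distinct pairs rather than on word length.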

Last Update: 2014-08-31