[1] Yang Qiyue, Yu Zhengtao, Hong Xudong, et al. Chinese-Vietnamese word similarity computation based on Wikipedia[J]. Journal of Nanjing University of Science and Technology, 2016, 40(04): 461. [doi:10.14177/j.cnki.32-1397n.2016.40.04.014]

Chinese-Vietnamese word similarity computation based on Wikipedia

Journal of Nanjing University of Science and Technology (Natural Science Edition) [ISSN: 1005-9830 / CN: 32-1397/N]

Volume:
40
Issue:
2016, No. 04
Page:
461
Column:
Publication date:
2016-08-29

Article Info

Title:
Chinese-Vietnamese word similarity computation based on Wikipedia
Article ID:
1005-9830(2016)04-0461-06
Author(s):
Yang Qiyue (1), Yu Zhengtao (1), Hong Xudong (1), Gao Shengxiang (1), Tang Zhiwen (2)
1. School of Information Engineering and Automation, Kunming University of Science and Technology, Kunming 650051, China; 2. School of Computer Science and Engineering, Beihang University, Beijing 100191, China
Keywords:
Chinese; Vietnamese; word similarity; Wikipedia; concept; co-occurrence relationship; correspondence relationship; word frequency
CLC number:
TP391.1
DOI:
10.14177/j.cnki.32-1397n.2016.40.04.014
Abstract:
To compute cross-lingual word similarity between Chinese and Vietnamese, a method based on Wikipedia is proposed. It uses Wikipedia's multilingual concept pages as a bridge and exploits three kinds of evidence: the translation correspondence between concepts, the occurrence of words in different concept pages, and the co-occurrence of words with other concepts. First, the set of concepts that have corresponding Chinese and Vietnamese pages is extracted from Wikipedia to construct a bilingual concept feature space. Next, a concept vector is built for each word from its word frequency in the corresponding concept description texts and from the co-occurrence of the word and the concept in other concept texts. Finally, the similarity of two words is computed as the cosine of the angle between their concept vectors. Experimental results show that the method performs well on Chinese-Vietnamese word similarity computation and that the concept co-occurrence relationship improves the accuracy of the word similarity.
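
The pipeline described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the toy concept pages, the function names `concept_vector` and `cosine`, the weighting parameter `alpha`, and the specific way word frequency and co-occurrence are combined are all assumptions made for illustration, since the abstract does not give the exact formulas.

```python
import math

# Hypothetical toy data: each entry is one aligned Wikipedia concept with a
# Chinese page ("zh") and a Vietnamese page ("vi"), stored as
# (concept title, pre-tokenized page text). Real pages would need segmentation.
CONCEPTS = [
    {"zh": ("计算机", ["计算机", "软件", "硬件", "程序", "软件"]),
     "vi": ("máy tính", ["máy tính", "phần mềm", "phần cứng", "chương trình", "phần mềm"])},
    {"zh": ("软件", ["软件", "程序", "计算机", "数据"]),
     "vi": ("phần mềm", ["phần mềm", "chương trình", "máy tính", "dữ liệu"])},
    {"zh": ("足球", ["足球", "比赛", "球队", "运动"]),
     "vi": ("bóng đá", ["bóng đá", "trận đấu", "đội bóng", "thể thao"])},
]


def concept_vector(word, lang, concepts, alpha=0.5):
    """Build one weight per aligned concept for `word` in language `lang`.

    Assumed weighting (not from the paper): relative word frequency in the
    concept's own page, plus `alpha` times the number of OTHER pages in which
    the word and the concept's title both occur.
    """
    vector = []
    for i, concept in enumerate(concepts):
        title, tokens = concept[lang]
        term_freq = tokens.count(word) / len(tokens)
        co_occurrence = sum(
            1
            for j, other in enumerate(concepts)
            if j != i and word in other[lang][1] and title in other[lang][1]
        )
        vector.append(term_freq + alpha * co_occurrence)
    return vector


def cosine(u, v):
    """Cosine of the angle between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def similarity(zh_word, vi_word, concepts=CONCEPTS, alpha=0.5):
    """Cross-lingual similarity: both vectors share the aligned concept dimensions."""
    zh_vec = concept_vector(zh_word, "zh", concepts, alpha)
    vi_vec = concept_vector(vi_word, "vi", concepts, alpha)
    return cosine(zh_vec, vi_vec)


if __name__ == "__main__":
    print(similarity("软件", "phần mềm"))  # related pair
    print(similarity("软件", "bóng đá"))   # unrelated pair
```

Because the Chinese and Vietnamese vectors are indexed by the same aligned concepts, they can be compared directly without translating the words themselves. Setting `alpha=0` in this sketch drops the co-occurrence term, which gives a simple way to compare the frequency-only variant against the combined one.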

Memo

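Received: 2015-12-26; Revised: 2016-01-19
Funding: National Natural Science Foundation of China (61175068, 61472168); Key Project of Natural Science of Yunnan Province (2013FA030)
About the authors: Yang Qiyue (1992-), female, master's student, research interest: natural language processing, Email: yanghelen412@qq.com; corresponding author: Yu Zhengtao (1970-), male, Ph.D., professor, research interests: natural language processing, information retrieval, machine translation, E-mail: ztyu@hotmail.com.
Citation format: Yang Qiyue, Yu Zhengtao, Hong Xudong, et al. Chinese-Vietnamese word similarity computation based on Wikipedia[J]. Journal of Nanjing University of Science and Technology, 2016, 40(4): 461-466.
Submission website: http://zrxuebao.njust.edu.cn
Last Update: 2016-06-30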