Chinese-Vietnamese word similarity computation based on Wikipedia

Journal of Nanjing University of Science and Technology (Natural Science Edition) [ISSN:1005-9830/CN:32-1397/N]

Issue:
2016, No. 04
Page:
461-

Author(s):
Yang Qiyue¹, Yu Zhengtao¹, Hong Xudong¹, Gao Shengxiang¹, Tang Zhiwen²
1. School of Information Engineering and Automation, Kunming University of Science and Technology, Kunming 650051, China; 2. School of Computer Science and Engineering, Beihang University, Beijing 100191, China
Keywords:
Chinese-Vietnamese word similarity; Wikipedia; concept co-occurrence relationship; corresponding relation; word frequency
CLC number:
TP391.1
DOI:
10.14177/j.cnki.32-1397n.2016.40.04.014
Abstract:
To compute the similarity between Chinese and Vietnamese words, this paper proposes a Wikipedia-based method that uses Wikipedia's multilingual concept description pages as a bridge. The method exploits the translation correspondence between concepts, the words that appear in different concept pages, and the co-occurrence relationship between words and other concepts. The set of corresponding Chinese-Vietnamese concepts is extracted from Wikipedia to construct a bilingual concept feature space. The concept vector of a word is then built from the frequency of the word in the corresponding concept text and from the co-occurrence of the word with concepts in other concept texts. The similarity between two word vectors is computed as the cosine of the angle between them. Experimental results indicate that the proposed method performs well on Chinese-Vietnamese word similarity computation and that the concept co-occurrence relationship improves the accuracy of word similarity.
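The abstract does not state how the frequency and co-occurrence features are weighted or combined, so the following Python snippet is only a minimal sketch of the general idea under stated assumptions. Each aligned Chinese-Vietnamese concept is one dimension shared by both languages. A word's component for concept j is its frequency in that concept's page plus a weight `alpha` for every other page that contains the word and links to concept j. Two word vectors are compared by their cosine. The `ConceptPage` structure, the functions `concept_vector` and `cosine`, the additive combination, `alpha = 0.5`, and the toy pages are all hypothetical illustrations, not the authors' formulation or data.

```python
import math
from typing import List, NamedTuple, Set


class ConceptPage(NamedTuple):
    """Hypothetical: one language's page for an aligned Wikipedia concept."""
    tokens: List[str]   # segmented words of the page text
    mentions: Set[int]  # indices of other aligned concepts this page links to


def concept_vector(word: str, pages: List[ConceptPage], alpha: float = 0.5) -> List[float]:
    """Map `word` into the bilingual concept space (one dimension per aligned concept).

    Assumed weighting (not from the paper): component j = frequency of `word` in page j
    + alpha * number of other pages that contain `word` and link to concept j.
    """
    vec = [float(page.tokens.count(word)) for page in pages]  # word-frequency feature
    for k, page in enumerate(pages):
        if word in page.tokens:
            for j in page.mentions:
                if j != k:
                    vec[j] += alpha  # word-concept co-occurrence feature
    return vec


def cosine(u: List[float], v: List[float]) -> float:
    """Cosine of the angle between two concept vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


# Toy aligned pages (invented): index i is the same concept in both languages.
zh_pages = [
    ConceptPage(["猫", "动物", "宠物"], {1}),   # concept 0: cat
    ConceptPage(["狗", "动物", "宠物"], {0}),   # concept 1: dog
    ConceptPage(["电脑", "机器"], set()),       # concept 2: computer
]
vi_pages = [
    ConceptPage(["mèo", "động_vật", "thú_cưng"], {1}),
    ConceptPage(["chó", "động_vật", "thú_cưng"], {0}),
    ConceptPage(["máy_tính", "máy"], set()),
]

cat_zh = concept_vector("猫", zh_pages)
cat_vi = concept_vector("mèo", vi_pages)
computer_vi = concept_vector("máy_tính", vi_pages)
print(cat_zh)                        # [1.0, 0.5, 0.0]
print(cosine(cat_zh, cat_vi))        # close to 1.0
print(cosine(cat_zh, computer_vi))   # 0.0
```

Because both languages index the same aligned concepts, a Chinese word and a Vietnamese word can be compared directly in this sketch without a bilingual dictionary; words that share no concept dimensions get a similarity of 0.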

Last Update: 2016-06-30