
Chinese-Vietnamese word similarity computation based on Wikipedia




Yang Qiyue 1, Yu Zhengtao 1, Hong Xudong 1, Gao Shengxiang 1, Tang Zhiwen 2
1. School of Information Engineering and Automation, Kunming University of Science and Technology, Kunming 650051, China; 2. School of Computer Science and Engineering, Beihang University, Beijing 100191, China
Chinese; Vietnamese; word similarity; Wikipedia; concept; co-occurrence relationship; corresponding relation; word frequency
To compute word similarity between Chinese and Vietnamese, a Wikipedia-based method is proposed. It uses Wikipedia's multilingual concept description pages as a bridge and exploits three kinds of information: the translation correspondence between concepts, the occurrence of words in different concept pages, and the co-occurrence relationship between words and other concepts. A set of corresponding Chinese-Vietnamese concepts is extracted from Wikipedia to construct a bilingual concept feature space. A concept vector is then built for each word from its frequency in the text of each corresponding concept and from its co-occurrence with concepts in the texts of other concepts. The similarity between two words is calculated as the cosine of the angle between their concept vectors. The experimental results indicate that the proposed method performs well on similarity computation between Chinese and Vietnamese words, and that the concept co-occurrence relationship improves the accuracy of word similarity computation.
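The pipeline summarized in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not give the weighting formula, so the way term frequency and co-occurrence are combined here (the `alpha` parameter and the normalizations), the function names `concept_vector` and `cosine`, and the toy pre-segmented pages are all assumptions made for illustration. Each aligned Chinese-Vietnamese concept pair is one dimension of the shared feature space.

```python
import math
from collections import Counter


def concept_vector(word, pages, titles, alpha=0.5):
    """Build a concept vector for `word` in one language.

    pages[i]  : list of tokens from the page describing concept i
    titles[i] : the title token of concept i in the same language
    alpha     : hypothetical weight of the co-occurrence feature
    """
    vector = []
    for i, tokens in enumerate(pages):
        # Word-frequency feature: relative frequency of the word in page i.
        tf = Counter(tokens)[word] / len(tokens) if tokens else 0.0
        # Co-occurrence feature: number of OTHER pages in which the word
        # and the title of concept i both appear.
        co = sum(
            1
            for j, other in enumerate(pages)
            if j != i and word in other and titles[i] in other
        )
        vector.append(tf + alpha * co / max(len(pages) - 1, 1))
    return vector


def cosine(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


# Toy aligned concepts (dog, cat, car); index i is the same concept in both
# languages. Tokens are assumed to be already word-segmented.
zh_titles = ["狗", "猫", "汽车"]
vi_titles = ["chó", "mèo", "ô_tô"]
zh_pages = [
    ["狗", "是", "宠物", "狗", "动物"],
    ["猫", "是", "宠物", "动物", "狗"],
    ["汽车", "是", "车辆", "发动机"],
]
vi_pages = [
    ["chó", "là", "thú_cưng", "động_vật", "chó"],
    ["mèo", "là", "thú_cưng", "động_vật", "chó"],
    ["ô_tô", "là", "xe", "động_cơ"],
]

zh_pet = concept_vector("宠物", zh_pages, zh_titles)
vi_pet = concept_vector("thú_cưng", vi_pages, vi_titles)
vi_engine = concept_vector("động_cơ", vi_pages, vi_titles)

sim_pet = cosine(zh_pet, vi_pet)        # "pet" (zh) vs. "pet" (vi)
sim_engine = cosine(zh_pet, vi_engine)  # "pet" (zh) vs. "engine" (vi)
```

Because both vectors are indexed by the same aligned concepts, words from the two languages can be compared directly without a bilingual dictionary. In this toy data the Chinese word for "pet" scores higher against its Vietnamese counterpart than against the Vietnamese word for "engine".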




Last Update: 2016-06-30