Chinese-Vietnamese word similarity computation based on Wikipedia


Yang Qiyue1, Yu Zhengtao1, Hong Xudong1, Gao Shengxiang1, Tang Zhiwen2
1. School of Information Engineering and Automation, Kunming University of Science and Technology, Kunming 650051, China; 2. School of Computer Science and Engineering, Beihang University, Beijing 100191, China
Keywords: Chinese-Vietnamese; word similarity; Wikipedia; concept co-occurrence relationship; corresponding relation; word frequency
To compute word similarity between Chinese and Vietnamese, we propose a Wikipedia-based method that uses Wikipedia's multilingual concept description pages as a bridge. The method exploits three sources of evidence: the translation correspondence between concepts, the occurrence of words in different concept pages, and the co-occurrence relationship between words and other concepts. First, the set of corresponding Chinese-Vietnamese concepts is extracted from Wikipedia to construct a bilingual concept feature space. Next, the concept vector of each word is built from the frequency with which the word appears in the corresponding concept text and from the co-occurrence of the word with concepts in other concept texts. Finally, the similarity between two word vectors is calculated as the cosine of the angle between them. Experimental results indicate that the proposed method performs well on Chinese-Vietnamese word similarity computation and that the concept co-occurrence relationship improves the accuracy of word similarity.
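The pipeline described in the abstract can be illustrated with a minimal sketch. It assumes each bilingual concept is a pair of already-tokenized pages (Chinese and Vietnamese) linked across languages, that a concept title occurs as a single token, and that the word-frequency and co-occurrence features are combined linearly with a weight `alpha`. The names `concept_vector`, `cosine`, `zh_vi_similarity`, the `alpha` parameter, and the toy data are hypothetical illustrations, not the authors' exact formulation.

```python
import math
from collections import Counter
from typing import List, Sequence, Tuple

# Hypothetical representation: a page is (title, tokens); a bilingual
# concept is a (Chinese page, Vietnamese page) pair. Index k of the list
# is dimension k of the shared concept feature space.
Page = Tuple[str, List[str]]
ConceptPair = Tuple[Page, Page]


def concept_vector(word: str, pages: List[Page], alpha: float = 0.5) -> List[float]:
    """Map `word` to a vector over the concept space (assumed weighting).

    Dimension k = frequency of `word` in page k
                + alpha * number of OTHER pages in which `word`
                  co-occurs with the title of concept k.
    """
    counts = [Counter(tokens) for _, tokens in pages]
    vec = []
    for k, (title, _) in enumerate(pages):
        freq = counts[k][word]
        cooc = sum(
            1
            for j, c in enumerate(counts)
            if j != k and c[word] > 0 and c[title] > 0
        )
        vec.append(freq + alpha * cooc)
    return vec


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine of the angle between two vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def zh_vi_similarity(zh_word: str, vi_word: str,
                     concepts: List[ConceptPair], alpha: float = 0.5) -> float:
    """Compare a Chinese and a Vietnamese word in the shared concept space."""
    zh_pages = [zh for zh, _ in concepts]
    vi_pages = [vi for _, vi in concepts]
    return cosine(concept_vector(zh_word, zh_pages, alpha),
                  concept_vector(vi_word, vi_pages, alpha))


# Toy bilingual concepts (made-up data for illustration only).
concepts: List[ConceptPair] = [
    (("苹果", ["苹果", "水果", "红色", "水果"]),
     ("táo", ["táo", "trái_cây", "đỏ", "trái_cây"])),
    (("水果", ["水果", "苹果", "香蕉"]),
     ("trái_cây", ["trái_cây", "táo", "chuối"])),
    (("电脑", ["电脑", "软件"]),
     ("máy_tính", ["máy_tính", "phần_mềm"])),
]

print(round(zh_vi_similarity("水果", "trái_cây", concepts), 3))  # 1.0
print(round(zh_vi_similarity("水果", "máy_tính", concepts), 3))  # 0.0
```

Because both languages index the same list of concept pairs, a Chinese word and a Vietnamese word land in one shared vector space, so no direct bilingual dictionary is needed; the cross-lingual link comes entirely from the concept correspondence.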


