[1] Wang Wen, Wang Shufeng, Li Honghua. Microblogging sentiment analysis method based on text semantics and expression tendentiousness [J]. Journal of Nanjing University of Science and Technology (Natural Science Edition), 2014, 38(06): 733.

Microblogging sentiment analysis method based on text semantics and expression tendentiousness

Journal of Nanjing University of Science and Technology (Natural Science Edition) [ISSN: 1005-9830 / CN: 32-1397/N]

Volume: 38
Issue: 2014, No. 06
Page: 733
Publication date: 2014-12-31

Article Info

Title:
Microblogging sentiment analysis method based on text semantics and expression tendentiousness
Author(s):
Wang Wen (1, 2), Wang Shufeng (1, 2), Li Honghua (1)
1. School of Computer Information and Engineering; 2. Changzhou Software Technology Research and Application Key Laboratory, Changzhou Institute of Technology, Changzhou 213002, Jiangsu, China
Keywords:
text semantics; expression tendentiousness; microblogging; sentiment analysis; machine learning; Weibo crawlers; application programming interface; sentiment word dictionaries; semantic similarity
CLC number:
TP391
Abstract:
To address the complex processing and low classification accuracy of machine-learning-based sentiment analysis for Chinese microblogs, this paper proposes a new sentiment analysis method. Dynamic microblog data are collected and preprocessed by combining a Weibo crawler with the Web application programming interface (API). Sentiment words are extracted from microblogs and classified using the Chinese sentiment dictionaries NTUSD and HowNet, and the semantic similarity and sentiment orientation of words are then calculated. The method takes into account the weighting of expression (emoticon) and text sentiment orientation, the enhancement of positive sentiment, and other factors. Experimental results show that: expression orientation plays an important role in determining microblog sentiment orientation; when the ratio of expression to text sentiment orientation is fixed, the choice of adjustment factor and neutral interval affects the accuracy of orientation judgment; and, compared with a calculation model based on HowNet semantic similarity, the proposed method improves orientation judgment accuracy by about 5%.
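
The abstract names the ingredients of the method (word orientation from semantic similarity, a weighted combination of expression and text orientation, positive-sentiment enhancement, and a neutral interval) but not its formulas. The following Python sketch shows one way these pieces could fit together. It is an illustration under stated assumptions, not the authors' implementation: the function names, the seed-word scheme, and every default value (`expression_weight`, `positive_boost`, `neutral_band`) are hypothetical, and the similarity function is left as a parameter because no HowNet API is assumed here.

```python
from typing import Callable, Sequence


def word_orientation(
    word: str,
    positive_seeds: Sequence[str],
    negative_seeds: Sequence[str],
    similarity: Callable[[str, str], float],
) -> float:
    """Orientation of a word as mean similarity to positive seed words
    minus mean similarity to negative seed words (assumed scheme)."""
    pos = sum(similarity(word, s) for s in positive_seeds) / len(positive_seeds)
    neg = sum(similarity(word, s) for s in negative_seeds) / len(negative_seeds)
    return pos - neg


def classify_post(
    text_score: float,
    expression_score: float,
    expression_weight: float = 0.6,  # hypothetical expression/text ratio
    positive_boost: float = 1.2,     # hypothetical positive-sentiment enhancement
    neutral_band: float = 0.1,       # hypothetical half-width of the neutral interval
) -> str:
    """Combine text and expression (emoticon) scores into one label."""
    # Assumed form of "positive-sentiment enhancement": scale up positive scores.
    if text_score > 0:
        text_score *= positive_boost
    if expression_score > 0:
        expression_score *= positive_boost

    # Weighted sum of the two orientation scores.
    score = expression_weight * expression_score + (1 - expression_weight) * text_score

    # Scores inside the neutral interval are treated as neutral.
    if abs(score) <= neutral_band:
        return "neutral"
    return "positive" if score > 0 else "negative"


if __name__ == "__main__":
    # Toy similarity for demonstration only: 1.0 for identical words, else 0.0.
    def toy_similarity(a: str, b: str) -> float:
        return 1.0 if a == b else 0.0

    print(word_orientation("good", ["good", "great"], ["bad"], toy_similarity))
    # Negative text with a strongly positive emoticon.
    print(classify_post(text_score=-0.8, expression_score=0.9))
```

With these assumed defaults, a post whose text is negative but whose emoticons are strongly positive is labeled positive, which mirrors the abstract's observation that expression orientation weighs heavily in the final judgment. Changing `neutral_band` or `positive_boost` while holding `expression_weight` fixed corresponds to the tuning of the neutral interval and adjustment factor that the abstract reports as affecting accuracy.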


Memo
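Received: 2014-09-05; Revised: 2014-10-31
Funding: Changzhou Institute of Technology university-level research fund projects (YN1316; YN1203)
About the author: Wang Wen (born 1974), male, lecturer; main research interests: information retrieval and intelligent information processing. E-mail: 247307447@qq.com
Citation format: Wang Wen, Wang Shufeng, Li Honghua. Microblogging sentiment analysis method based on text semantics and expression tendentiousness [J]. Journal of Nanjing University of Science and Technology, 2014, 38(6): 733-738.
Submission website: http://zrxuebao.njust.edu.cn
Last Update: 2014-12-31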