[1] Niu Di. Chinese named entity recognition system with multi-dimensional features[J]. Journal of Nanjing University of Science and Technology (Natural Science Edition), 2020, 44(06): 645-650. [doi:10.14177/j.cnki.32-1397n.2020.44.06.002]

Chinese named entity recognition system with multi-dimensional features

Journal of Nanjing University of Science and Technology (Natural Science Edition) [ISSN:1005-9830/CN:32-1397/N]

Volume:
44
Issue:
No. 06, 2020
Pages:
645-650
Column:
Publication date:
2020-12-31

Article Info

Title:
Chinese named entity recognition system with multi-dimensional features
Article ID:
1005-9830(2020)06-0645-06
Author(s):
Niu Di (牛迪)
School of Economics, Zhejiang University, Hangzhou 310027, Zhejiang, China
Keywords:
named entity recognition; deep learning; multi-dimensional feature fusion
CLC number:
TP391.1
DOI:
10.14177/j.cnki.32-1397n.2020.44.06.002
Abstract:
Current named entity recognition (NER) approaches mostly rely on deep learning models to extract text features automatically. Such models cannot fuse the character-level and word-level features of the text, and their erroneous predictions cannot be corrected by manual intervention; the only remedies are tuning the model parameters and retraining on additional corpus data. To address these problems, this paper designs an overall NER system architecture and proposes a deep learning model with multi-dimensional feature fusion. Building on the conventional long short-term memory (LSTM) and conditional random field (CRF) models, it constructs a new neural network structure that incorporates multi-dimensional character and word features. The NER system also introduces rule matching; by combining rules with deep learning, it raises the overall NER F1 score to 96.2%, an improvement of nearly 6% over the conventional LSTM+CRF model.
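The abstract does not specify how rule matching and the deep learning model cooperate, or how F1 is computed. The following is only a minimal sketch of one plausible arrangement, assuming character-level BIO tags, rules that simply overwrite the model's tags where they match, and entity-level exact-match F1. All names (`bio_to_spans`, `apply_rules`, `entity_f1`), the example sentence, and the single dictionary rule are hypothetical and are not taken from the paper.

```python
import re
from typing import List, Set, Tuple

Span = Tuple[int, int, str]  # (start, end_exclusive, entity_type)


def bio_to_spans(tags: List[str]) -> Set[Span]:
    """Convert a BIO tag sequence into a set of entity spans."""
    spans, start, etype = set(), None, None
    for i, tag in enumerate(tags + ["O"]):  # the sentinel closes a trailing entity
        inside = tag.startswith("I-") and etype == tag[2:]
        if start is not None and not inside:
            spans.add((start, i, etype))
            start, etype = None, None
        if tag.startswith("B-"):
            start, etype = i, tag[2:]
    return spans


def apply_rules(text: str, tags: List[str],
                rules: List[Tuple[re.Pattern, str]]) -> List[str]:
    """Overwrite model tags wherever a hand-written rule matches (assumed policy).

    A real system would also need to repair I- tags left dangling
    just after a match; this sketch does not.
    """
    fixed = list(tags)
    for pattern, etype in rules:
        for m in re.finditer(pattern, text):
            s, e = m.span()
            if s == e:
                continue  # ignore empty matches
            fixed[s] = "B-" + etype
            for i in range(s + 1, e):
                fixed[i] = "I-" + etype
    return fixed


def entity_f1(gold: Set[Span], pred: Set[Span]) -> float:
    """Entity-level F1: a predicted span counts only if it matches exactly."""
    tp = len(gold & pred)
    if tp == 0:
        return 0.0
    precision, recall = tp / len(pred), tp / len(gold)
    return 2 * precision * recall / (precision + recall)


# Hypothetical example: one tag per character of the sentence.
text = "牛迪就职于浙江大学"
gold_tags = ["B-PER", "I-PER", "O", "O", "O", "B-ORG", "I-ORG", "I-ORG", "I-ORG"]
# Simulated model output that truncates the organization name.
model_tags = ["B-PER", "I-PER", "O", "O", "O", "B-ORG", "I-ORG", "O", "O"]
# A single dictionary-style rule, standing in for a manually maintained rule set.
rules = [(re.compile("浙江大学"), "ORG")]

fixed_tags = apply_rules(text, model_tags, rules)
print(entity_f1(bio_to_spans(gold_tags), bio_to_spans(model_tags)))  # 0.5
print(entity_f1(bio_to_spans(gold_tags), bio_to_spans(fixed_tags)))  # 1.0
```

In this toy case the rule repairs a boundary error without retraining, which is the kind of manual intervention the abstract says a purely model-based pipeline lacks.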

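Memo:
Received: 2020-06-21; Revised: 2020-11-06
Funding: National Natural Science Foundation of China (71673249)
About the author: Niu Di (born 1985), male, postdoctoral researcher; main research interests: finance and natural language processing. E-mail: xiao_tie_jiang@126.com.
Citation format: Niu Di. Chinese named entity recognition system with multi-dimensional features[J]. Journal of Nanjing University of Science and Technology, 2020, 44(6): 645-650.
Submission website: http://zrxuebao.njust.edu.cn
Last Update: 2020-12-30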