XU Min, ZHANG Li-ping, ZHU Wu-jia. Hierarchical Document Categorization Based on Fisher Linear Discriminant[J]. Journal of Nanjing University of Science and Technology (Natural Science Edition), 2005, (04): 460-463.

Hierarchical Document Categorization Based on Fisher Linear Discriminant

Journal of Nanjing University of Science and Technology (Natural Science Edition) [ISSN: 1005-9830 / CN: 32-1397/N]

Issue:
2005, No. 04
Pages:
460-463
Publication date:
2005-08-30

Article Info

Title:
Hierarchical Document Categorization Based on Fisher Linear Discriminant
Author(s):
XU Min (1), ZHANG Li-ping (2), ZHU Wu-jia (1)
1. College of Information Science and Technology; 2. College of Sciences; Nanjing University of Aeronautics and Astronautics, Nanjing 210016, Jiangsu, China
Keywords:
feature selection; positive feature words; negative feature words; Fisher linear discriminant; hierarchical document categorization
CLC number:
TP311.52
Abstract:
To categorize documents hierarchically by topic, the idea of the Fisher linear discriminant is used to extract positive and negative feature words for each category, and a hierarchical document categorization algorithm based on the Fisher linear discriminant (HDCF) is presented. Unlike typical hierarchical categorization algorithms, HDCF does not require the assumption that feature words occur independently in documents, and it can handle documents that belong to more than one category. In experiments, HDCF was compared with other algorithms using recall and precision, and the results show that HDCF is more effective than the other algorithms.
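The abstract does not give HDCF's exact formulas. As a minimal sketch, assuming each word is treated as a one-dimensional feature (its relative frequency in a document) and the standard Fisher criterion J(w) = (m1 - m2)^2 / (s1^2 + s2^2) is applied to documents inside versus outside one category, a word can be labeled a positive feature word when its mean frequency is higher inside the category and a negative feature word when it is higher outside. The function names `fisher_feature_words` and `recall_precision`, the relative-frequency representation, the `eps` smoothing term, micro-averaging, and the toy corpus are all illustrative assumptions, not the paper's definitions.

```python
from collections import Counter


def fisher_feature_words(docs, labels, k=5, eps=1e-9):
    """Score every word with a one-dimensional Fisher criterion for one category.

    Illustrative sketch only; not the paper's exact HDCF formulation.
    docs   -- list of token lists
    labels -- list of bools, True if the document is in the target category
    Returns (positive, negative): up to k (word, score) pairs each, best first.
    """
    # Represent each document by the relative frequency of each word.
    freqs = []
    for tokens in docs:
        n = len(tokens) or 1
        freqs.append({w: c / n for w, c in Counter(tokens).items()})
    vocab = sorted({w for f in freqs for w in f})

    inside = [i for i, y in enumerate(labels) if y]
    outside = [i for i, y in enumerate(labels) if not y]
    if not inside or not outside:
        raise ValueError("need documents both inside and outside the category")

    def mean_and_scatter(idx, w):
        # Class mean and within-class scatter (sum of squared deviations).
        xs = [freqs[i].get(w, 0.0) for i in idx]
        m = sum(xs) / len(xs)
        return m, sum((x - m) ** 2 for x in xs)

    positive, negative = [], []
    for w in vocab:
        m1, s1 = mean_and_scatter(inside, w)
        m2, s2 = mean_and_scatter(outside, w)
        # Fisher criterion: squared mean difference over summed scatters;
        # eps (an assumption) avoids division by zero.
        score = (m1 - m2) ** 2 / (s1 + s2 + eps)
        if m1 > m2:    # more frequent inside the category
            positive.append((w, score))
        elif m1 < m2:  # more frequent outside the category
            negative.append((w, score))
    # Highest score first; ties broken alphabetically for a stable result.
    positive.sort(key=lambda p: (-p[1], p[0]))
    negative.sort(key=lambda p: (-p[1], p[0]))
    return positive[:k], negative[:k]


def recall_precision(predicted, actual):
    """Micro-averaged recall and precision for multi-label predictions.

    predicted, actual -- lists of sets of category names, one set per document.
    """
    tp = sum(len(p & a) for p, a in zip(predicted, actual))
    n_pred = sum(len(p) for p in predicted)
    n_true = sum(len(a) for a in actual)
    recall = tp / n_true if n_true else 0.0
    precision = tp / n_pred if n_pred else 0.0
    return recall, precision


if __name__ == "__main__":
    # Hypothetical toy corpus: two "sports" documents vs. two others.
    docs = [["ball", "goal", "team"], ["goal", "team", "match"],
            ["stock", "market", "team"], ["market", "price", "stock"]]
    labels = [True, True, False, False]
    pos, neg = fisher_feature_words(docs, labels, k=2)
    print([w for w, _ in pos])  # ['goal', 'ball']
    print([w for w, _ in neg])  # ['market', 'stock']
    # A document may carry several categories, so labels are sets.
    print(recall_precision([{"sports"}, {"sports", "finance"}],
                           [{"sports"}, {"finance"}]))
```

In a topic hierarchy, one natural way to apply this scoring is to compare each category against its siblings under the same parent node at every level, but that choice is also an assumption of this sketch rather than something the abstract states.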



Memo:
Funding: "973" National Key Basic Research and Development Program of China (G1999032701)
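About the author: XU Min (1971-), male, born in Taixing, Jiangsu; lecturer and Ph.D. candidate; research interests: web data mining, machine learning, artificial intelligence, etc. E-mail: keenxu@yahoo.com;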
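Corresponding author: ZHU Wu-jia (1934-), male, born in Yixing, Jiangsu; professor and doctoral supervisor; research interests: foundations of mathematics, the concept of infinity, artificial intelligence, etc. E-mail: zlz cn@hotmail.com.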
Last Update: 2013-03-03