
Hierarchical Document Categorization Based on Fisher Linear Discriminant

Journal of Nanjing University of Science and Technology (Natural Science Edition) [ISSN:1005-9830/CN:32-1397/N]

Issue:
2005, Issue 04
Page:
460-463
Research Field:
Publishing date:

Info

Title:
Hierarchical Document Categorization Based on Fisher Linear Discriminant
Author(s):
XU Min¹, ZHANG Li-ping², ZHU Wu-jia¹
1. College of Information Science and Technology; 2. College of Sciences, Nanjing University of Aeronautics and Astronautics, Nanjing 210016, China
Keywords:
feature selection; positive feature words; negative feature words; Fisher linear discriminant; hierarchical document categorization
PACS:
TP311.52
DOI:
-
Abstract:
To categorize documents hierarchically by topic, the Fisher linear discriminant is used to obtain the positive feature words and negative feature words of each category, and a hierarchical document categorization algorithm based on the Fisher linear discriminant (HDCF) is presented. The algorithm does not rely on the assumption that feature words occur independently in documents, and it handles documents that belong to more than one category. In experiments that compare HDCF with other algorithms on recall and precision, the results show that HDCF is more effective.
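
The abstract does not give the scoring formula, so the following is only a minimal sketch of the general idea, assuming a per-term Fisher ratio (squared difference of class means over the sum of class variances) with the sign of the mean difference deciding whether a word is a positive or a negative feature word. The names fisher_scores and split_feature_words, the toy vocabulary, and the counts are hypothetical and are not taken from the paper. Note that this simplified version scores each term on its own, so it does not capture the word dependence that HDCF is said to handle.

```python
from statistics import mean, pvariance

def fisher_scores(doc_term_counts, in_category, eps=1e-9):
    """Signed per-term Fisher ratio for one category (illustrative assumption).

    doc_term_counts: list of rows, one per document, one count per term.
    in_category: list of booleans, True if the document is in the category.
    """
    pos = [row for row, flag in zip(doc_term_counts, in_category) if flag]
    neg = [row for row, flag in zip(doc_term_counts, in_category) if not flag]
    if not pos or not neg:
        raise ValueError("need documents both inside and outside the category")
    scores = []
    for t in range(len(doc_term_counts[0])):
        p = [row[t] for row in pos]
        n = [row[t] for row in neg]
        diff = mean(p) - mean(n)              # between-class separation
        spread = pvariance(p) + pvariance(n)  # within-class scatter
        # diff * |diff| keeps the sign: > 0 favours the category, < 0 opposes it
        scores.append(diff * abs(diff) / (spread + eps))
    return scores

def split_feature_words(vocab, scores, top_k):
    """Pick the top_k strongest positive and negative feature words."""
    ranked = sorted(zip(vocab, scores), key=lambda ws: ws[1], reverse=True)
    positive = [w for w, s in ranked[:top_k] if s > 0]
    negative = [w for w, s in ranked[::-1][:top_k] if s < 0]
    return positive, negative

# Toy data (hypothetical): two "sports" documents, two "finance" documents.
vocab = ["goal", "match", "stock", "market"]
docs = [[3, 2, 0, 0], [2, 1, 0, 1], [0, 0, 4, 2], [0, 1, 3, 3]]
is_sports = [True, True, False, False]

scores = fisher_scores(docs, is_sports)
positive, negative = split_feature_words(vocab, scores, top_k=1)
print(positive, negative)
```

In a hierarchical setting, one would presumably repeat such a selection at each node of the category tree, treating the documents of that node's subtree as the positive class; that step is likewise an assumption here, not a description of HDCF.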

References:

[1] Apte C, Damerau F, Weiss S M. Automated learning of decision rules for text categorization[J]. ACM Transactions on Information Systems, 1994, 12(3): 233-251.
[2] Cohen W W, Singer Y. Context sensitive learning methods for text categorization[A]. In Proceedings of the 19th Annual International ACM Conference on Research and Development in Information Retrieval[C]. Zurich, Switzerland: ACM Press, 1996. 307-315.
[3] Hersh W R, Buckley C, Leone T J. Ohsumed: An interactive retrieval evaluation and new large test collection for research[A]. In Proceedings of the 17th ACM-SIGIR Conference on Research and Development in Information Retrieval[C]. Dublin, Ireland: ACM Press, 1994. 192-201.
[4] D'Alessio S, Murray K, Schiaffino R. The effect of topological structure on hierarchical text categorization[A]. In Proceedings of COLING-ACL'98[C]. Quebec, Canada: Morgan Kaufmann, 1998. 236-250.
[5] Joachims T. Text categorization with support vector machines: Learning with many relevant features[A]. In Proceedings 10th European Conference on Machine Learning[C]. Berlin: Springer, 1998. 137-142.
[6] Yang Y, Chute C G. A linear least squares fit mapping method for information retrieval from natural language texts[A]. In Proc COLING'92[C]. Nantes, France: ICCL, 1992. 447-453.
[7] Koller D, Sahami M. Towards optimal feature selection[A]. International Conference on Machine Learning[C]. Bari, Italy: Morgan Kaufmann, 1996. 284-292.
[8] Koller D, Sahami M. Hierarchically classifying documents using very few words[A]. In Proc ICML-97[C]. Nashville, Tennessee: Morgan Kaufmann, 1997. 170-176.
[9] Chakrabarti S, Dom B, Agrawal R. Using taxonomy, discriminants, and signatures for navigating in text databases[A]. In Proc of the 23rd Int'l Conference on Very Large Data Bases[C]. Athens, Greece: Morgan Kaufmann, 1997. 446-455.
[10] Sahami M. Learning limited dependence Bayesian classifiers[A]. In Proc KDD-96[C]. Portland, Oregon: Kluwer Academic, 1996. 335-338.
[11] Schapire R E, Singer Y. BoosTexter: A boosting-based system for text categorization[J]. Machine Learning, 2000, 39(2/3): 135-168.
[12] Yang Y. An evaluation of statistical approaches to text categorization[R]. Morgan Kaufmann: Computer Science Department, Carnegie Mellon University, 1997. 127-141.
[13] Duda R O, Hart P E, Stork D G. Pattern classification[M] (2nd ed). New York: John Wiley & Sons Inc, 2001. 117-121.
[14] Glover E, Pennock D M, Lawrence S. Inferring hierarchical descriptions[A]. In Proc CIKM'02[C]. McLean, Virginia, USA: ACM Press, 2002. 123-131.

Memo

Memo:
-
Last Update: 2013-03-03