
Hierarchical Document Categorization Based on Fisher Linear Discriminant


XU Min (1), ZHANG Li-ping (2), ZHU Wu-jia (1)
1. College of Information Science and Technology; 2. College of Sciences, Nanjing University of Aeronautics and Astronautics, Nanjing 210016, China
Keywords: feature selection; positive feature words; negative feature words; Fisher linear discriminant; hierarchical document categorization
To categorize documents hierarchically by topic, the Fisher linear discriminant is used to obtain positive feature words and negative feature words for each category, and a hierarchical document categorization algorithm based on the Fisher linear discriminant (HDCF) is presented. The algorithm does not rely on the assumption that feature words appear independently in documents, and it handles documents that belong to more than one category. In experiments comparing HDCF with other algorithms on recall and precision, the results show that HDCF is more effective than the other algorithms.
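
The abstract does not give the formulas behind HDCF, so the following is only a minimal sketch of the standard two-class Fisher linear discriminant applied to term counts, not the authors' algorithm. It assumes that words with positive discriminant weight can be read as positive feature words for a category and words with negative weight as negative feature words. The function names, the ridge term, and the toy data are all hypothetical.

```python
import numpy as np

def fisher_direction(X_pos, X_neg, ridge=1e-3):
    """Standard Fisher direction w = S_w^{-1} (m_pos - m_neg).

    X_pos, X_neg: (documents x words) term-count matrices for documents
    inside and outside one category. The ridge term is an assumption
    added here only to keep S_w invertible on small samples.
    """
    m_pos = X_pos.mean(axis=0)
    m_neg = X_neg.mean(axis=0)
    D_pos = X_pos - m_pos
    D_neg = X_neg - m_neg
    # Within-class scatter: keeps word co-occurrence (off-diagonal) terms,
    # so words are not treated as independent.
    S_w = D_pos.T @ D_pos + D_neg.T @ D_neg
    S_w = S_w + ridge * np.eye(S_w.shape[0])
    return np.linalg.solve(S_w, m_pos - m_neg)

def split_feature_words(vocab, w, top_k=2):
    """Assumed reading: sign of the weight separates positive and negative words."""
    order = np.argsort(w)  # ascending by weight
    positive = [vocab[i] for i in order[::-1][:top_k] if w[i] > 0]
    negative = [vocab[i] for i in order[:top_k] if w[i] < 0]
    return positive, negative

# Hypothetical toy data: four documents in a "sports" category, four outside it.
vocab = ["goal", "match", "stock", "market"]
X_sports = np.array([[4, 4, 2, 1],
                     [2, 4, 0, 1],
                     [4, 2, 0, 1],
                     [2, 2, 2, 1]], dtype=float)
X_other = np.array([[1, 2, 4, 4],
                    [1, 0, 4, 2],
                    [1, 2, 2, 2],
                    [1, 0, 2, 4]], dtype=float)

w = fisher_direction(X_sports, X_other)
positive, negative = split_feature_words(vocab, w)
print("positive feature words:", positive)
print("negative feature words:", negative)
```

On this toy data, "goal" and "match" receive positive weights and "stock" and "market" receive negative weights. A multi-label, hierarchical categorizer would repeat such a per-category decision at each node of the topic tree, but how HDCF does this is not stated in the abstract.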




Last Update: 2013-03-03