AUTHORS: Jung Song Lee, Han Hee Hahm, Soon Cheol Park
ABSTRACT: In this paper, we propose a novel text clustering technique for summarizing text documents. The clustering method, called the Ensemble Clustering Method, combines genetic algorithms (GA) and particle swarm optimization (PSO) efficiently and automatically to obtain the best clustering results. Summarization based on this clustering method avoids redundancy in the summary and produces good summaries by extracting the most significant, non-redundant sentence from each cluster of a document's sentences. We tested this technique on various text documents from the open benchmark datasets DUC01 and DUC02, and evaluated performance with F-measure and ROUGE. The experimental results show that our method performs about 11% to 24% better than other summarization algorithms.
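The cluster-then-extract step described in the abstract can be sketched in Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the GA/PSO ensemble clustering is not reproduced, so cluster labels are taken as given input; sentences are represented as raw term-frequency vectors; and the "most significant" sentence of a cluster is assumed to be the one with the highest cosine similarity to the cluster centroid. The helper names (`tokenize`, `cosine`, `extract_summary`, `rouge_1_recall`) are hypothetical. A basic ROUGE-1 recall (unigram overlap with a reference summary, without the stemming and stopword options of the official ROUGE toolkit) is included to show how such a summary might be scored.

```python
import math
from collections import Counter


def tokenize(text):
    # Assumption: lowercase whitespace tokenization with simple punctuation stripping.
    words = (w.strip(".,;:!?\"'()").lower() for w in text.split())
    return [w for w in words if w]


def cosine(a, b):
    # Cosine similarity between two sparse term-frequency vectors (Counters).
    dot = sum(a[t] * b[t] for t in a if t in b)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def extract_summary(sentences, labels):
    """Select one representative sentence per cluster.

    sentences: sentence strings in document order.
    labels: one cluster id per sentence (assumed output of some clustering step).
    Returns the selected sentences in their original document order.
    """
    vectors = [Counter(tokenize(s)) for s in sentences]
    clusters = {}
    for idx, label in enumerate(labels):
        clusters.setdefault(label, []).append(idx)

    chosen = []
    for members in clusters.values():
        # Summing the members gives a vector with the same direction as their mean,
        # which is all cosine similarity depends on.
        centroid = Counter()
        for i in members:
            centroid.update(vectors[i])
        # Hypothetical significance score: similarity to the cluster centroid.
        # Ties go to the earliest sentence, since members are in document order.
        best = max(members, key=lambda i: cosine(vectors[i], centroid))
        chosen.append(best)
    return [sentences[i] for i in sorted(chosen)]


def rouge_1_recall(candidate, reference):
    # Fraction of reference unigrams (counted with multiplicity) matched in the candidate.
    cand = Counter(tokenize(candidate))
    ref = Counter(tokenize(reference))
    total = sum(ref.values())
    overlap = sum(min(count, cand[w]) for w, count in ref.items())
    return overlap / total if total else 0.0


if __name__ == "__main__":
    doc = [
        "The cat sat on the mat.",
        "A cat was sitting on a mat.",
        "Stock prices fell sharply today.",
        "Markets dropped as stock prices fell.",
    ]
    labels = [0, 0, 1, 1]  # hypothetical clustering output
    summary = extract_summary(doc, labels)
    print(summary)
    print(rouge_1_recall(" ".join(summary), "A cat sat on a mat. Stock prices fell."))
```

Selecting a single sentence per cluster is what keeps the summary non-redundant: sentences with overlapping content tend to fall into the same cluster, so at most one of them is extracted. Another significance score could replace the centroid similarity without changing the overall structure.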
KEYWORDS: Text Summarization, Extractive Summarization, Ensemble Clustering, Genetic Algorithms, Particle Swarm Optimization