AUTHORS: Aparna K., Mydhili K. Nair
ABSTRACT: Clustering groups similar data items so that similarity within each cluster is high and dissimilarity between clusters is also high. In our previous work we developed HB K-Means (High Dimensional Bisecting K-Means), a novel partitional clustering algorithm for high-dimensional data sets. To improve that algorithm, constraints such as a stability-based measure and Mean Square Error (MSE) were incorporated, resulting in the CHB K-Means (Constraint Based HB K-Means) algorithm. Beyond these constraints, cluster compactness and density are also important for obtaining better clustering results. This paper proposes a Multi-Objective Optimization (MOO) technique that combines three indices: DB-Index, XB-Index and Sym-Index. The three indices serve as the fitness function for the proposed Fractional Genetic PSO (FGPSO) algorithm, a hybrid optimization algorithm that performs the clustering. The performance of this algorithm is evaluated in terms of clustering accuracy and computation time on benchmark datasets from the UCI Machine Learning Repository.
KEYWORDS: Partitional Clustering, Multi-Objective Optimization, DB-Index, XB-Index, Sym-Index, Fractional Genetic PSO Algorithm (FGPSO)
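ILLUSTRATION: To make the role of the validity indices concrete, the following minimal sketch computes two of the three indices named in the abstract (DB-Index and XB-Index) for a given hard partition; lower values indicate more compact, better separated clusters. It assumes Euclidean distance and the crisp (non-fuzzy) form of the XB-Index. The function names `centroids_of`, `db_index` and `xb_index` are hypothetical, and this is not the authors' FGPSO implementation. The Sym-Index is omitted because it depends on a point-symmetry distance not defined here.

```python
import numpy as np

def centroids_of(X, labels):
    """Return the distinct cluster ids and the mean point of each cluster."""
    ids = np.unique(labels)
    return ids, np.array([X[labels == k].mean(axis=0) for k in ids])

def db_index(X, labels):
    """Davies-Bouldin index (assumed form): mean over clusters of the worst
    ratio (scatter_i + scatter_j) / distance(centroid_i, centroid_j)."""
    ids, C = centroids_of(X, labels)
    # Scatter: mean Euclidean distance of a cluster's points to its centroid.
    S = np.array([np.linalg.norm(X[labels == k] - c, axis=1).mean()
                  for k, c in zip(ids, C)])
    # Pairwise centroid distances; the diagonal is excluded by setting it to inf.
    D = np.linalg.norm(C[:, None, :] - C[None, :, :], axis=2)
    np.fill_diagonal(D, np.inf)
    R = (S[:, None] + S[None, :]) / D
    return R.max(axis=1).mean()

def xb_index(X, labels):
    """Xie-Beni index, crisp variant (assumption): total within-cluster squared
    error divided by n times the smallest squared centroid separation."""
    ids, C = centroids_of(X, labels)
    sse = sum(((X[labels == k] - c) ** 2).sum() for k, c in zip(ids, C))
    D2 = ((C[:, None, :] - C[None, :, :]) ** 2).sum(axis=2)
    np.fill_diagonal(D2, np.inf)
    return sse / (len(X) * D2.min())

# Toy data: two tight groups of two points each, far apart on the x-axis.
X = np.array([[0.0, 0.0], [0.0, 2.0], [10.0, 0.0], [10.0, 2.0]])
good = np.array([0, 0, 1, 1])   # groups points by x-coordinate
bad = np.array([0, 1, 0, 1])    # mixes the two groups

print(db_index(X, good), xb_index(X, good))
print(db_index(X, bad), xb_index(X, bad))
```

In a multi-objective setting such as the one the abstract describes, each candidate partition would be scored on all indices at once rather than on a single one; how the scores are combined is specific to the paper and is not shown here.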