Seeding cluster centers of K-means clustering through median projection

Author	L. Suresh, J.B. Simha, R. Velur
Keywords	CSE
Abstract	K-means clustering is an important algorithm for identifying structure in data. K-means is the simplest clustering algorithm. It takes a predefined number of clusters as input. The original algorithm selects cluster centers at random and then iteratively improves the result. However, this approach has two major limitations. First, the number of clusters must be specified in advance, which is difficult because the underlying structure is not known. Second, random selection of cluster centers can lead to convergence to local optima. In addition, most K-means implementations rely on in-memory data structures, which limits the size of the data they can handle. In this work, a novel approach to seeding the clusters from the latent data structure is proposed. It is expected to minimize the need to specify the number of clusters a priori and to reduce the time to convergence by providing near-optimal cluster centers. The algorithm is implemented in SQL to provide a disk-based solution that can handle large data sets that do not fit into memory. The proposed solution was tested on both row-store and column-store databases. The results are promising, and work is in progress to test the approach in other domains. © 2010 IEEE.
Year of Conference	2010
Conference Name	CISIS 2010 - The 4th International Conference on Complex, Intelligent and Software Intensive Systems
Number of Pages	217-222, 5447429+
ISBN Number	978-076953967-6 (ISBN)
DOI	10.1109/CISIS.2010.133
	Conference Proceedings
Cits	4