Seeding cluster centers of K-means clustering through median projection

Author
Keywords
Abstract

K-means clustering is an important algorithm for identifying structure in data and one of the simplest clustering algorithms. It takes a predefined number of clusters as input. The original algorithm randomly selects the initial cluster centers and then iteratively refines them. However, this approach has two major limitations. First, the number of clusters must be specified in advance, which is difficult because the underlying structure of the data is not known. Second, random selection of the initial cluster centers can cause the algorithm to converge to local optima. In addition, most K-means implementations use memory-based structures, which limits the size of the data they can handle. This work proposes a novel approach that seeds the clusters using the latent structure of the data. The approach is expected to reduce the need to specify the number of clusters a priori and to shorten convergence time by providing near-optimal initial cluster centers. In addition, the algorithm is implemented in SQL to provide a disk-based solution that can handle large data sets that do not fit into memory. The proposed solution was tested on both row-store and column-store databases. The results are promising, and work is in progress to test the approach in different domains. © 2010 IEEE.
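The abstract combines two ideas: a seeding step that replaces random initial centers, and an SQL implementation that lets the data stay in a database table rather than in memory. The following is a minimal sketch of both under stated assumptions, not the authors' method. The paper's median-projection procedure is not described here, so `seed_by_projection_medians` is a hypothetical stand-in: it projects 2-D points onto the x-axis, cuts them into k equal slices and takes each slice's coordinate-wise median as an initial center. The table names `points`, `centers` and `assign` are likewise illustrative. The assignment and update steps are standard Lloyd iterations written as SQL and run through Python's built-in `sqlite3` module (the `ROW_NUMBER()` window function requires SQLite 3.25 or later). The sketch still takes k as input; how the paper reduces the need for a predefined k is not reproduced.

```python
import sqlite3
from statistics import median


def seed_by_projection_medians(points, k):
    """Hypothetical seeding (NOT the paper's method): sort points by their
    projection onto the x-axis, split them into k equal-size slices, and use
    the coordinate-wise median of each slice as an initial center.
    Assumes len(points) >= k so that no slice is empty."""
    ordered = sorted(points, key=lambda p: p[0])
    n = len(ordered)
    centers = []
    for i in range(k):
        chunk = ordered[i * n // k:(i + 1) * n // k]
        centers.append((median(p[0] for p in chunk), median(p[1] for p in chunk)))
    return centers


def kmeans_sql(conn, initial_centers, max_iter=20):
    """Lloyd-style K-means whose assignment and update steps run as SQL,
    so the `points` table could live in an on-disk database file."""
    cur = conn.cursor()
    cur.execute("DROP TABLE IF EXISTS centers")
    cur.execute("CREATE TABLE centers (cid INTEGER PRIMARY KEY, x REAL, y REAL)")
    cur.executemany("INSERT INTO centers VALUES (?, ?, ?)",
                    [(i, x, y) for i, (x, y) in enumerate(initial_centers)])
    previous = None
    for _ in range(max_iter):
        # Assignment step: nearest center per point by squared Euclidean
        # distance; ties are broken by the smaller center id.
        cur.execute("DROP TABLE IF EXISTS assign")
        cur.execute("""
            CREATE TABLE assign AS
            SELECT pid, cid FROM (
                SELECT p.id AS pid, c.cid AS cid,
                       ROW_NUMBER() OVER (
                           PARTITION BY p.id
                           ORDER BY (p.x - c.x) * (p.x - c.x)
                                  + (p.y - c.y) * (p.y - c.y), c.cid) AS rn
                FROM points p CROSS JOIN centers c) AS ranked
            WHERE rn = 1""")
        current = cur.execute("SELECT pid, cid FROM assign ORDER BY pid").fetchall()
        if current == previous:
            break  # assignments unchanged, so the centers have converged
        previous = current
        # Update step: move each non-empty cluster's center to the mean of its
        # members; centers of empty clusters are left where they are.
        cur.execute("""
            UPDATE centers SET
              x = (SELECT AVG(p.x) FROM assign a JOIN points p ON p.id = a.pid
                   WHERE a.cid = centers.cid),
              y = (SELECT AVG(p.y) FROM assign a JOIN points p ON p.id = a.pid
                   WHERE a.cid = centers.cid)
            WHERE cid IN (SELECT cid FROM assign)""")
    conn.commit()
    centers = cur.execute("SELECT x, y FROM centers ORDER BY cid").fetchall()
    return centers, previous


# Usage: ":memory:" keeps the demo self-contained; a file path such as
# "points.db" would keep the table on disk instead.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE points (id INTEGER PRIMARY KEY, x REAL, y REAL)")
pts = [(0.0, 0.0), (0.5, 0.2), (0.2, 0.4), (9.0, 9.0), (9.5, 8.8), (8.9, 9.4)]
conn.executemany("INSERT INTO points (x, y) VALUES (?, ?)", pts)
centers, labels = kmeans_sql(conn, seed_by_projection_medians(pts, 2))
print(centers)
print(labels)
```

Pushing the distance computation and the averaging into SQL means each iteration only reads rows through the database engine, which is the property the abstract relies on for data sets that do not fit into memory. A deterministic, data-driven seed also makes repeated runs reproducible, unlike random initialization.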

Year of Conference
2010
Conference Name
CISIS 2010 - The 4th International Conference on Complex, Intelligent and Software Intensive Systems
Pages
217-222, 5447429+
ISBN Number
978-076953967-6 (ISBN)
DOI
10.1109/CISIS.2010.133
Conference Proceedings