Seeding cluster centers of K-means clustering through median projection
Author | |
---|---|
Keywords | |
Abstract |
K-means clustering is an important algorithm for identifying structure in data and is among the simplest clustering algorithms. It takes a predefined number of clusters as input. The original algorithm selects cluster centers at random and iteratively improves the result. However, this approach has two major limitations. First, the number of clusters must be specified in advance, which is difficult because the underlying structure is not known. Second, selecting cluster centers at random can lead to convergence to local optima. In addition, most K-means implementations rely on in-memory structures, which limits the size of the data they can handle. In this work, a novel approach to seeding the clusters using the latent data structure is proposed. It is expected to minimize the need to specify the number of clusters a priori and to reduce convergence time by providing near-optimal cluster centers. The algorithm is implemented in SQL to provide a disk-based solution that can handle large data sets that do not fit into memory. The proposed solution was tested on both row-store and column-store databases. The results are promising, and work is in progress to test the approach in other domains. © 2010 IEEE. |
Year of Conference |
2010
|
Conference Name |
CISIS 2010 - The 4th International Conference on Complex, Intelligent and Software Intensive Systems
|
Number of Pages |
217-222, 5447429+
|
ISBN Number |
978-076953967-6 (ISBN)
|
DOI |
10.1109/CISIS.2010.133
|
Conference Proceedings
|
|
Citations |
4
|