Quantification Of Variation Of Results Of The K-Means Clustering Algorithm Run Ten Times On The First Twenty Five Primes – A Criterion For Applicability Of The K-Means Clustering
Author(s):
Chandini Giri, R Satya Ravindra Babu
DOI Number:
10.6084/m9.figshare.13650251
Keywords:
K-Means Clustering, Clustering Uncertainty
Abstract
Clustering is one of the most common exploratory data analysis technique used to get an intuition about the structure of the data. It can be defined as the task of identifying subgroups in the data such that data points in the same subgroup (cluster) are very similar while data points in different clusters are very different. In other words, we try to find homogeneous subgroups within the data such that data points in each cluster are as similar as possible according to a similarity measure such as euclidean-based distance or correlation-based distance. The decision of which similarity measure to use is application-specific. K-means is one of the simplest unsupervised learning algorithms that solve the well-known clustering problem. K - Means clustering algorithm is a scheme for clustering continuous and numeric data. As K-Means algorithm consists of scheme of random initialization of centroids, every time it is run, it gives different or slightly different results because it may reach some local optima. Quantification of such aforementioned variation is of some importance as this sheds light on the nature of the Discrete K-Means Objective function with regards its maxima and minima. The K-Means Clustering algorithm aims at minimizing the aforementioned Objective function. In this research investigation, the author has attempted to quantify the variation of results of the K-Means Clustering Algorithm, run 10 times on the first 25 Prime numbers. Also, a notion of Percentage Uncertainty of clustering assignment for each data set point is computed for each run of the K- Means Clustering Algorithm. Also, a criterion is proposed for the applicability of K-Means Clustering Algorithm for the given data set.
Article Details
Unique Paper ID: 150490

Publication Volume & Issue: Volume 7, Issue 7

Page(s): 19 - 30
Article Preview & Download


Share This Article

Join our RMS

Conference Alert

NCSEM 2024

National Conference on Sustainable Engineering and Management - 2024

Last Date: 15th March 2024

Call For Paper

Volume 10 Issue 10

Last Date for paper submitting for March Issue is 25 June 2024

About Us

IJIRT.org enables door in research by providing high quality research articles in open access market.

Send us any query related to your research on editor@ijirt.org

Social Media

Google Verified Reviews