Copyright © 2025 Authors retain the copyright of this article. This article is an open access article distributed under the Creative Commons Attribution License which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
@article{150490, author = {Chandini Giri and R Satya Ravindra Babu}, title = {Quantification Of Variation Of Results Of The K-Means Clustering Algorithm Run Ten Times On The First Twenty Five Primes – A Criterion For Applicability Of The K-Means Clustering }, journal = {International Journal of Innovative Research in Technology}, year = {}, volume = {7}, number = {7}, pages = {19-30}, issn = {2349-6002}, url = {https://ijirt.org/article?manuscript=150490}, abstract = {Clustering is one of the most common exploratory data analysis technique used to get an intuition about the structure of the data. It can be defined as the task of identifying subgroups in the data such that data points in the same subgroup (cluster) are very similar while data points in different clusters are very different. In other words, we try to find homogeneous subgroups within the data such that data points in each cluster are as similar as possible according to a similarity measure such as euclidean-based distance or correlation-based distance. The decision of which similarity measure to use is application-specific. K-means is one of the simplest unsupervised learning algorithms that solve the well-known clustering problem. K - Means clustering algorithm is a scheme for clustering continuous and numeric data. As K-Means algorithm consists of scheme of random initialization of centroids, every time it is run, it gives different or slightly different results because it may reach some local optima. Quantification of such aforementioned variation is of some importance as this sheds light on the nature of the Discrete K-Means Objective function with regards its maxima and minima. The K-Means Clustering algorithm aims at minimizing the aforementioned Objective function. In this research investigation, the author has attempted to quantify the variation of results of the K-Means Clustering Algorithm, run 10 times on the first 25 Prime numbers. Also, a notion of Percentage Uncertainty of clustering assignment for each data set point is computed for each run of the K- Means Clustering Algorithm. Also, a criterion is proposed for the applicability of K-Means Clustering Algorithm for the given data set.}, keywords = {K-Means Clustering, Clustering Uncertainty }, month = {}, }
Cite This Article
Submit your research paper and those of your network (friends, colleagues, or peers) through your IPN account, and receive 800 INR for each paper that gets published.
Join NowNational Conference on Sustainable Engineering and Management - 2024 Last Date: 15th March 2024
Submit inquiry