Running Resilient Distributed Datasets Using DBSCAN on Apache Spark
Author(s):
AMARESH A M
Keywords:
RDD-DBSCAN, DBSCAN, RDDs
Abstract
Density-based clustering is widely used because of its ability to find arbitrarily shaped clusters in noisy data. However, DBSCAN is hard to scale, which limits its utility when working with large data sets. Resilient Distributed Datasets (RDDs), on the other hand, are a fast data-processing abstraction created explicitly for in-memory computation on large data sets.
This paper presents a new algorithm based on DBSCAN using the Resilient Distributed Datasets approach: RDD-DBSCAN. RDD-DBSCAN overcomes the scalability limitations of the traditional DBSCAN algorithm by operating in a fully distributed fashion.
The algorithm is implemented as a parallel DBSCAN on top of Apache Spark, and the experiment conducted in the paper shows that the proposed method can work well with maritime data.
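For orientation, the following is a minimal single-machine sketch of the classic DBSCAN procedure that the abstract takes as its baseline. It is not the RDD-DBSCAN algorithm of the paper and does not use Spark. The function names `region_query` and `dbscan`, the parameters `eps` and `min_pts`, and the sample points are illustrative assumptions. The brute-force neighbor search scans every point for each query, which hints at why the unmodified algorithm is hard to scale.

```python
from math import dist  # Euclidean distance, Python 3.8+

NOISE = -1  # label for points that belong to no cluster


def region_query(points, i, eps):
    """Return indices of all points within eps of points[i] (including i itself)."""
    return [j for j, q in enumerate(points) if dist(points[i], q) <= eps]


def dbscan(points, eps, min_pts):
    """Label each point with a cluster id (0, 1, ...) or NOISE."""
    labels = [None] * len(points)  # None means "not visited yet"
    cluster_id = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        neighbors = region_query(points, i, eps)
        if len(neighbors) < min_pts:
            labels[i] = NOISE  # may later be relabeled as a border point
            continue
        # points[i] is a core point: start a new cluster and expand it
        cluster_id += 1
        labels[i] = cluster_id
        queue = list(neighbors)
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster_id  # border point reached from a core point
            if labels[j] is not None:
                continue  # already assigned to a cluster
            labels[j] = cluster_id
            j_neighbors = region_query(points, j, eps)
            if len(j_neighbors) >= min_pts:
                queue.extend(j_neighbors)  # j is also a core point, keep expanding
    return labels


# Hypothetical sample: two dense squares and one isolated point.
sample = [(0, 0), (0, 1), (1, 0), (1, 1),
          (10, 10), (10, 11), (11, 10), (11, 11),
          (50, 50)]
print(dbscan(sample, eps=1.5, min_pts=3))  # [0, 0, 0, 0, 1, 1, 1, 1, -1]
```

In this sample the two squares form clusters 0 and 1, and the isolated point is labeled as noise. A distributed variant would have to split the data across workers and reconcile clusters that cross partition boundaries; that part is not shown here.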
Article Details
Unique Paper ID: 144018
Publication Volume & Issue: Volume 3, Issue 5
Page(s): 162 - 168