Running Resilient Distributed Datasets Using DBSCAN on Apache Spark

Author(s):
AMARESH A M
Keywords:
RDD-DBSCAN, DBSCAN, RDDs
Abstract
Density-based clustering with DBSCAN (Density-Based Spatial Clustering of Applications with Noise) is widely used because it can find arbitrarily shaped clusters in noisy data. However, DBSCAN is hard to scale, which limits its usefulness on large data sets. Resilient Distributed Datasets (RDDs), on the other hand, are a fast data-processing abstraction designed explicitly for in-memory computation over large data sets. This paper presents RDD-DBSCAN, a new DBSCAN-based algorithm built on the RDD abstraction. RDD-DBSCAN overcomes the scalability limitations of the traditional DBSCAN algorithm by operating in a fully distributed fashion on top of Apache Spark. The experiments conducted in the paper show that the proposed method can work well with maritime data.
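To make the density-based idea concrete, the sketch below is a minimal, sequential DBSCAN in Python, together with a hypothetical helper that splits 2-D points into vertical strips padded by an eps-wide halo. It is an illustration under stated assumptions, not the paper's RDD-DBSCAN: the names `dbscan`, `region_query`, and `partition_with_halo`, the brute-force neighbor search, and the strip-based partitioning are all assumptions. In partition-based parallel DBSCAN designs, each partition (for example, an RDD partition in Spark) can cluster its own points, halo points included, with a routine like `dbscan`; clusters that share halo points must then be merged, a step this sketch does not show.

```python
import math
from collections import defaultdict

NOISE = -1  # label for points that belong to no cluster


def region_query(points, i, eps):
    """Return indices of all points within eps of points[i] (brute force, O(n))."""
    px, py = points[i]
    return [j for j, (qx, qy) in enumerate(points)
            if math.hypot(px - qx, py - qy) <= eps]


def dbscan(points, eps, min_pts):
    """Sequential DBSCAN sketch; returns one label per point (NOISE = -1)."""
    labels = [None] * len(points)  # None = not yet visited
    cluster = 0
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        neighbors = region_query(points, i, eps)
        if len(neighbors) < min_pts:
            labels[i] = NOISE  # may later be relabeled as a border point
            continue
        labels[i] = cluster  # i is a core point: start a new cluster
        seeds = list(neighbors)
        k = 0
        while k < len(seeds):
            j = seeds[k]
            k += 1
            if labels[j] == NOISE:
                labels[j] = cluster  # border point: joins, but does not expand
            if labels[j] is not None:
                continue
            labels[j] = cluster
            j_neighbors = region_query(points, j, eps)
            if len(j_neighbors) >= min_pts:  # j is also a core point: keep expanding
                seeds.extend(j_neighbors)
        cluster += 1
    return labels


def partition_with_halo(points, width, eps):
    """Hypothetical helper (not from the paper): map strip id -> point indices.

    Strip s covers x in [s * width, (s + 1) * width). A point is also copied
    into every neighboring strip whose x-range lies within eps of it, so each
    partition sees all eps-neighbors of its own points.
    """
    parts = defaultdict(list)
    for i, (x, _) in enumerate(points):
        first = math.floor((x - eps) / width)
        last = math.floor((x + eps) / width)
        for s in range(first, last + 1):
            parts[s].append(i)
    return dict(parts)


if __name__ == "__main__":
    pts = [(2.2, 0), (0, 0), (0, 1), (1, 0), (1, 1),
           (10, 10), (10, 11), (11, 10), (11, 11), (50, 50)]
    print(dbscan(pts, eps=1.5, min_pts=3))
    print(partition_with_halo([(2.0, 0.0), (4.5, 0.0), (8.5, 0.0)], width=5.0, eps=1.0))
```

In the usage example, the point at (2.2, 0) is first marked as noise and then absorbed as a border point of the first cluster, and (50, 50) stays noise. The point at x = 4.5 lies within eps of the boundary at x = 5, so it appears in both strips; that duplication is what allows a later merge step to recognize a cluster spanning two partitions.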
Article Details
Unique Paper ID: 144018
Publication Volume & Issue: Volume 3, Issue 5
Page(s): 162-168

IJIRT.org opens doors in research by providing high-quality research articles in the open-access market.

Send any queries related to your research to editor@ijirt.org.