Running Resilient Distributed Datasets Using DBSCAN on Apache Spark

AMARESH A M

Authors: AMARESH A M

Unique Paper ID: 144018
Volume: 3
Issue: 5
PageNo: 162-168

Keywords: RDD-DBSCAN, DBSCAN, RDDs

Abstract:
Density-based clustering is widely used because it can find arbitrarily shaped clusters in noisy data. However, DBSCAN is hard to scale, which limits its utility on large data sets. Resilient Distributed Datasets (RDDs), on the other hand, are a fast data-processing abstraction created explicitly for in-memory computation over large data sets. This paper presents RDD-DBSCAN, a new algorithm that builds DBSCAN on the Resilient Distributed Datasets approach. RDD-DBSCAN overcomes the scalability limitations of the traditional DBSCAN algorithm by operating in a fully distributed fashion. The algorithm is implemented as a parallel DBSCAN on top of Apache Spark, and the experiment conducted in the paper shows that the proposed method can work well with maritime data.
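
For readers unfamiliar with the baseline, the following is a minimal single-machine sketch of textbook DBSCAN in plain Python. It is not the paper's RDD-DBSCAN and does not use Spark; the function names, the `NOISE` label, and the brute-force neighbor search are illustrative assumptions. The brute-force `region_query` compares every point with every other point, which hints at why the traditional algorithm becomes costly on large data sets.

```python
from math import dist  # Euclidean distance, Python 3.8+

NOISE = -1  # hypothetical label for points that belong to no cluster


def region_query(points, i, eps):
    """Indices of all points within eps of points[i], including i itself."""
    return [j for j, q in enumerate(points) if dist(points[i], q) <= eps]


def dbscan(points, eps, min_pts):
    """Return one cluster label per point; NOISE marks outliers."""
    labels = [None] * len(points)
    cluster = 0
    for i in range(len(points)):
        if labels[i] is not None:
            continue  # already assigned by an earlier expansion
        neighbors = region_query(points, i, eps)
        if len(neighbors) < min_pts:
            labels[i] = NOISE  # may later be claimed as a border point
            continue
        # points[i] is a core point: start a new cluster and expand it
        labels[i] = cluster
        queue = list(neighbors)
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster  # noise reachable from a core point
            if labels[j] is not None:
                continue
            labels[j] = cluster
            j_neighbors = region_query(points, j, eps)
            if len(j_neighbors) >= min_pts:
                queue.extend(j_neighbors)  # j is also a core point
        cluster += 1
    return labels


if __name__ == "__main__":
    sample = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10), (50, 50)]
    print(dbscan(sample, eps=1.5, min_pts=3))
```

On the made-up sample above, the two tight groups of three points each form their own cluster and the isolated point at (50, 50) is labeled as noise.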

ISSN: 2349-6002

Available: https://ijirt.org/Article?manuscript=144018

