Running Resilient Distributed Datasets Using DBSCAN on Apache Spark

  • Unique Paper ID: 144018
  • Volume: 3
  • Issue: 5
  • PageNo: 162-168
  • Abstract: Density-based clustering is widely used because it can find arbitrarily shaped clusters in noisy data. However, DBSCAN is hard to scale, which limits its utility on large data sets. Resilient Distributed Datasets (RDDs), on the other hand, are a fast data-processing abstraction created explicitly for in-memory computation over large data sets. This paper presents RDD-DBSCAN, a new algorithm based on DBSCAN that uses the Resilient Distributed Datasets approach. RDD-DBSCAN overcomes the scalability limitations of the traditional DBSCAN algorithm by operating in a fully distributed fashion. The algorithm is implemented as a parallel DBSCAN on top of Apache Spark, and the experiment conducted in the paper shows that the proposed method can work well with maritime data.


  • ISSN: 2349-6002

