Running Resilient Distributed Datasets Using DBSCAN on Apache Spark

  • Unique Paper ID: 144018
  • PageNo: 162-168
  • Abstract:
  • Density-based clustering is widely used because of its ability to find arbitrarily shaped clusters in noisy data. However, DBSCAN, a popular density-based algorithm, is hard to scale, which limits its utility when working with large data sets. Resilient Distributed Datasets (RDDs), on the other hand, are a fast data-processing abstraction created explicitly for in-memory computation on large data sets. This paper presents a new algorithm based on DBSCAN that uses the Resilient Distributed Datasets approach: RDD-DBSCAN. RDD-DBSCAN overcomes the scalability limitations of the traditional DBSCAN algorithm by operating in a fully distributed fashion. The algorithm is implemented on top of Apache Spark, and the experiment conducted in the paper shows that the proposed method can work well with maritime data.
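
For readers unfamiliar with the baseline, the following is a minimal single-machine sketch of classic DBSCAN in plain Python. It is not the paper's RDD-DBSCAN and does not use Spark. The parameter names `eps` (neighborhood radius) and `min_pts` (minimum neighbors for a core point) and the helper `region_query` are illustrative assumptions, not taken from the abstract. The brute-force neighbor search, which compares every point with every other point, hints at why the abstract calls DBSCAN hard to scale.

```python
from math import dist

NOISE = -1  # label for points that belong to no cluster


def region_query(points, i, eps):
    """Return indices of all points within eps of points[i] (including i)."""
    return [j for j, q in enumerate(points) if dist(points[i], q) <= eps]


def dbscan(points, eps, min_pts):
    """Classic DBSCAN; returns one cluster label per point (NOISE for noise)."""
    labels = [None] * len(points)  # None means "not yet visited"
    cluster = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        neighbors = region_query(points, i, eps)
        if len(neighbors) < min_pts:
            labels[i] = NOISE  # may later be relabeled as a border point
            continue
        cluster += 1  # points[i] is a core point: start a new cluster
        labels[i] = cluster
        queue = list(neighbors)
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster  # border point reached from a core point
            if labels[j] is not None:
                continue
            labels[j] = cluster
            j_neighbors = region_query(points, j, eps)
            if len(j_neighbors) >= min_pts:
                queue.extend(j_neighbors)  # j is also core: keep expanding
    return labels


if __name__ == "__main__":
    sample = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10), (50, 50)]
    print(dbscan(sample, eps=1.5, min_pts=3))
```

A distributed variant, as the abstract describes, would have to partition the data and reconcile clusters that span partitions; this sketch does not attempt either step.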

Copyright & License

Copyright © 2026 Authors retain the copyright of this article. This article is an open access article distributed under the Creative Commons Attribution License which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

BibTeX

@article{144018,
        author = {AMARESH A M},
        title = {Running Resilient Distributed Datasets Using DBSCAN on Apache Spark},
        journal = {International Journal of Innovative Research in Technology},
        year = {},
        volume = {3},
        number = {5},
        pages = {162-168},
        issn = {2349-6002},
        url = {https://ijirt.org/article?manuscript=144018},
        abstract = {Density-based clustering is widely used because of its ability to find arbitrarily shaped clusters in noisy data. However, DBSCAN, a popular density-based algorithm, is hard to scale, which limits its utility when working with large data sets. Resilient Distributed Datasets (RDDs), on the other hand, are a fast data-processing abstraction created explicitly for in-memory computation on large data sets.
This paper presents a new algorithm based on DBSCAN that uses the Resilient Distributed Datasets approach: RDD-DBSCAN. RDD-DBSCAN overcomes the scalability limitations of the traditional DBSCAN algorithm by operating in a fully distributed fashion.
The algorithm is implemented on top of Apache Spark, and the experiment conducted in the paper shows that the proposed method can work well with maritime data.},
        keywords = {RDD-DBSCAN, DBSCAN, RDDs},
        month = {},
        }

Cite This Article

M, A. A. (). Running Resilient Distributed Datasets Using DBSCAN on Apache Spark. International Journal of Innovative Research in Technology (IJIRT), 3(5), 162–168.
