PARALLELIZATION METHODS OF DATA MINING ALGORITHMS: ENHANCING PERFORMANCE IN THE AGE OF BIG DATA

M.A Sattarov

Keywords:

data mining, clustering, big data, DBSCAN, parallelization.

Abstract

The exponential growth of data in recent years has posed significant challenges for traditional data mining algorithms. These algorithms, often designed for sequential processing, struggle to handle the massive datasets common in modern applications. Parallelization addresses this problem by distributing the computational workload across multiple processors or machines, which can substantially improve efficiency and scalability. This article explores the importance of parallelization in data mining, examines common parallelization techniques, and discusses their application to popular algorithms such as k-means clustering and DBSCAN, including their mathematical foundations.
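The idea of distributing the workload can be illustrated with k-means. The following is a minimal sketch, not the author's implementation, assuming Lloyd's iteration and Python's standard library only: each data partition assigns its points to the nearest centroid and returns per-cluster coordinate sums and counts (a "map" step), and these partial statistics are then combined into new centroids (a "reduce" step). The names `partial_stats` and `parallel_kmeans` are hypothetical. `ThreadPoolExecutor` is used only to keep the sketch self-contained; because of CPython's global interpreter lock, real speedups for pure-Python code like this would require processes (e.g. `ProcessPoolExecutor`) or a distributed framework.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def _sq_dist(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def partial_stats(chunk: List[Point], centroids: List[Point]):
    """Hypothetical map step: assign each point of one partition to its
    nearest centroid and return per-cluster coordinate sums and counts."""
    k, dim = len(centroids), len(centroids[0])
    sums = [[0.0] * dim for _ in range(k)]
    counts = [0] * k
    for p in chunk:
        j = min(range(k), key=lambda c: _sq_dist(p, centroids[c]))
        counts[j] += 1
        for d in range(dim):
            sums[j][d] += p[d]
    return sums, counts


def parallel_kmeans(points: List[Point], centroids: List[Point],
                    n_workers: int = 4, max_iter: int = 100,
                    tol: float = 1e-9) -> List[Point]:
    """Hypothetical data-parallel k-means: partitions are processed
    independently, then their partial sums/counts are reduced."""
    centroids = [tuple(c) for c in centroids]
    k, dim = len(centroids), len(centroids[0])
    # Split the data into contiguous partitions of ceil(n / n_workers) points.
    size = max(1, -(-len(points) // n_workers))
    chunks = [points[i:i + size] for i in range(0, len(points), size)]
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        for _ in range(max_iter):
            # Map: every partition computes its statistics concurrently.
            results = list(pool.map(partial_stats, chunks,
                                    [centroids] * len(chunks)))
            # Reduce: add up the partial sums and counts of all partitions.
            total_sums = [[0.0] * dim for _ in range(k)]
            total_counts = [0] * k
            for sums, counts in results:
                for j in range(k):
                    total_counts[j] += counts[j]
                    for d in range(dim):
                        total_sums[j][d] += sums[j][d]
            # New centroid = mean of assigned points; an empty cluster
            # keeps its previous centroid.
            new_centroids = [
                tuple(s / total_counts[j] for s in total_sums[j])
                if total_counts[j] else centroids[j]
                for j in range(k)
            ]
            shift = max(_sq_dist(a, b) for a, b in zip(centroids, new_centroids))
            centroids = new_centroids
            if shift <= tol:  # stop once no centroid moves noticeably
                break
    return centroids
```

For example, on the points (0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10) with initial centroids (0, 0) and (10, 10), the sketch converges to approximately (0.33, 0.33) and (10.33, 10.33). Because each partition returns only k sums and k counts, the data exchanged per iteration does not grow with the number of points, which is what makes this decomposition attractive for large datasets.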
