PARALLEL ALGORITHMS OF RANDOM FORESTS FOR CLASSIFYING VERY LARGE DATASETS

Authors

  • Do Thanh Nghi, College of Information Technology, Cantho University, Viet Nam
  • Pham Nguyen Khang, College of Information Technology, Cantho University, Viet Nam
  • Nguyen Van Hoa, Faculty of Technology, Engineering and Environment, Angiang University, Viet Nam
  • Ly Hoang Trong, College of Information Technology, Cantho University, Viet Nam

DOI:

https://doi.org/10.37569/DalatUniversity.3.2.247(2013)

Keywords:

Random forest, Decision tree, Bagging, Boosting, MPI, Grids.

Abstract

The random forests algorithm proposed by Breiman is an ensemble-based approach with very high accuracy. However, training a set of decision trees and classifying with them takes a long time, which makes the algorithm intractable for very large datasets. There is therefore a need to scale up the random forests algorithm to handle massive datasets. We propose parallel random forests algorithms that take advantage of Grid computing. These algorithms reduce training and classification time compared with the original ones. Experimental results on large datasets from the UCI data repository, including Forest Cover Type, KDD Cup 1999, and Connect-4, show that the training and classification times of the parallel algorithms are significantly reduced.
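
The abstract does not spell out the parallel scheme, so the following is only a minimal sketch of one common way to parallelize a random forest, not the authors' algorithm. It assumes that each worker grows its share of the trees independently on bootstrap samples and that the trees are gathered afterwards for majority voting. Two substitutions are made to keep it self-contained: depth-1 trees (stumps) stand in for full decision trees, and a Python thread pool stands in for the MPI processes named in the keywords. All function names (train_stump, train_block, train_forest, predict) are hypothetical.

```python
import random
from collections import Counter
from concurrent.futures import ThreadPoolExecutor


def train_stump(X, y, rng):
    """Fit a depth-1 tree on a bootstrap sample, using one randomly chosen feature."""
    n = len(X)
    idx = [rng.randrange(n) for _ in range(n)]      # bootstrap sample (bagging)
    feat = rng.randrange(len(X[0]))                 # random feature selection
    best = None
    for t in sorted({X[i][feat] for i in idx}):     # candidate thresholds
        left = [y[i] for i in idx if X[i][feat] <= t]
        right = [y[i] for i in idx if X[i][feat] > t]
        l_lab = Counter(left).most_common(1)[0][0]
        r_lab = Counter(right).most_common(1)[0][0] if right else l_lab
        errors = sum(v != l_lab for v in left) + sum(v != r_lab for v in right)
        if best is None or errors < best[0]:
            best = (errors, feat, t, l_lab, r_lab)
    return best[1:]                                 # (feature, threshold, left, right)


def train_block(job):
    """Work of one worker: grow its share of trees with its own random stream."""
    X, y, n_trees, seed = job
    rng = random.Random(seed)
    return [train_stump(X, y, rng) for _ in range(n_trees)]


def train_forest(X, y, n_trees=20, n_workers=4, seed=0):
    """Split the tree count across workers, train the blocks, gather the trees."""
    shares = [n_trees // n_workers + (w < n_trees % n_workers)
              for w in range(n_workers)]
    jobs = [(X, y, share, seed + w) for w, share in enumerate(shares)]
    # Assumption: threads only illustrate the decomposition; a real
    # deployment would map each job to a separate process or MPI rank.
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        blocks = pool.map(train_block, jobs)
        return [tree for block in blocks for tree in block]


def predict(forest, row):
    """Classify one row by majority vote over all trees."""
    votes = Counter(l if row[f] <= t else r for f, t, l, r in forest)
    return votes.most_common(1)[0][0]


if __name__ == "__main__":
    X = [[v, v] for v in (0, 1, 2, 3, 4, 100, 101, 102, 103, 104)]
    y = [0] * 5 + [1] * 5
    forest = train_forest(X, y, n_trees=20, n_workers=3)
    print(len(forest), predict(forest, [0, 0]), predict(forest, [104, 104]))
```

Because the trees do not depend on one another, the only coordination in this sketch is the initial split of the tree count and the final gather. Under CPython, threads will not speed up pure Python code like this; the structure, rather than the timing, is what the example is meant to show.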

Published

30-06-2013

Section

Natural Sciences and Technology

How to Cite

Nghi, D. T., Khang, P. N., Hoa, N. V., & Trong, L. H. (2013). PARALLEL ALGORITHMS OF RANDOM FORESTS FOR CLASSIFYING VERY LARGE DATASETS. Dalat University Journal of Science, 3(2), 21-31. https://doi.org/10.37569/DalatUniversity.3.2.247(2013)