PARALLEL ALGORITHMS OF RANDOM FORESTS FOR CLASSIFYING VERY LARGE DATASETS

Do Thanh Nghi, Pham Nguyen Khang, Nguyen Van Hoa, Ly Hoang Trong

Abstract


The random forests algorithm proposed by Breiman is an ensemble-based approach with very high accuracy. However, learning and classifying with a set of decision trees takes a lot of time, which makes the algorithm intractable on very large datasets, so it needs to be scaled up to handle massive data. We propose parallel random forests algorithms that exploit the benefits of grid computing. Experimental results on large datasets from the UCI data repository, including Forest cover type, KDD Cup 1999, and Connect-4, show that the parallel algorithms significantly reduce training and classification time compared with the original algorithms.
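The idea behind parallelizing a random forest can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes that each tree is trained independently on its own bootstrap sample, so the trees can be divided among workers and gathered afterwards. A thread pool stands in here for the MPI processes on a grid (so it shows the structure only, not a real speedup in CPython), one-split decision stumps stand in for full decision trees, and all names (`train_stump`, `train_share`, `train_forest`, `predict`) are hypothetical.

```python
import random
from collections import Counter
from concurrent.futures import ThreadPoolExecutor


def train_stump(X, y, seed):
    """Fit a one-split tree on a bootstrap sample, using one random feature."""
    rng = random.Random(seed)                     # private generator per tree
    n = len(X)
    idx = [rng.randrange(n) for _ in range(n)]    # bootstrap sample (bagging)
    feat = rng.randrange(len(X[0]))               # random feature selection
    best = None
    for t in sorted({X[i][feat] for i in idx}):   # candidate thresholds
        left = [y[i] for i in idx if X[i][feat] <= t]
        right = [y[i] for i in idx if X[i][feat] > t]
        l_lab = Counter(left).most_common(1)[0][0]
        r_lab = Counter(right).most_common(1)[0][0] if right else l_lab
        errors = sum(v != l_lab for v in left) + sum(v != r_lab for v in right)
        if best is None or errors < best[0]:
            best = (errors, feat, t, l_lab, r_lab)
    return best[1:]                               # (feature, threshold, left label, right label)


def train_share(X, y, seeds):
    """Work done by one worker: build the trees assigned to it."""
    return [train_stump(X, y, s) for s in seeds]


def train_forest(X, y, n_trees=20, n_workers=4):
    """Split the tree indices among workers, train in parallel, gather the trees."""
    shares = [list(range(w, n_trees, n_workers)) for w in range(n_workers)]
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        parts = pool.map(lambda seeds: train_share(X, y, seeds), shares)
        return [tree for part in parts for tree in part]


def predict(forest, x):
    """Classify x by majority vote over all trees."""
    votes = Counter(l if x[f] <= t else r for f, t, l, r in forest)
    return votes.most_common(1)[0][0]


# Toy data, assumed purely for illustration.
X = [[0.0, 0.1], [0.2, 0.3], [0.8, 0.9], [1.0, 0.7]]
y = [0, 0, 1, 1]
forest = train_forest(X, y, n_trees=8, n_workers=4)
print(len(forest), predict(forest, [0.1, 0.2]))
```

Because each tree depends only on its own seed, the forest built in parallel contains the same trees as a serial run; only the order in which they are gathered differs. In an MPI setting, each share would presumably be built by a separate process and the trees collected on one node.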


Keywords


Random forest; Decision tree; Bagging; Boosting; MPI; Grids.



DOI: http://dx.doi.org/10.37569/DalatUniversity.3.2.247(2013)


Copyright (c) 2013 Do Thanh Nghi, Pham Nguyen Khang, Nguyen Van Hoa, Ly Hoang Trong

This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.
Office of the Dalat University Journal
Building A25, 1 Phù Đổng Thiên Vương, Đà Lạt, Lâm Đồng
Email: tapchikhoahoc@dlu.edu.vn - Phone: (+84) 263 3 555 131
