AUTHORS: Souad Taleb Zouggar, Abdelkader Adla
ABSTRACT: In this paper, we propose a diversity measure for simplifying random forests along both Sequential Forward Selection (SFS) and Sequential Backward Elimination (SBE) paths. The method proceeds in two stages: 1) an overproduction step generates a large number of trees; 2) SFS and SBE paths, combined with the diversity measure, reduce the initial ensemble of trees. The proposed method is applied to data sets from the UCI Repository, and a comparative study of the two types of paths against a performance-based pruning method is given. The results are encouraging: the method yields ensembles of reduced size that, in some cases, exceed the performance of both the initial forest and the method used for comparison.
KEYWORDS: Classification, CART Trees, Random Forests, Pruning, Diversity, Accuracy, Forward Selection, Backward Elimination.
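The two-stage scheme summarized above can be sketched as follows. This is a minimal illustration, not the paper's implementation: the diversity criterion used here is plain pairwise disagreement on a validation set (an assumption, standing in for the paper's proposed measure), and only the SFS path is shown.

```python
# Hedged sketch of overproduce-and-select forest pruning.
# Stage 1: overproduce a large random forest.
# Stage 2: greedily select a small sub-forest along an SFS path,
# using pairwise disagreement as a stand-in diversity measure.
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

def disagreement(preds_a, preds_b):
    """Fraction of validation samples on which two trees disagree."""
    return np.mean(preds_a != preds_b)

def sfs_prune(preds, y_val, target_size):
    """Sequential Forward Selection: start from the single most accurate
    tree, then repeatedly add the tree with the highest mean disagreement
    with the trees already selected (a simple diversity-guided path)."""
    accs = [np.mean(p == y_val) for p in preds]
    selected = [int(np.argmax(accs))]
    while len(selected) < target_size:
        remaining = [i for i in range(len(preds)) if i not in selected]
        best = max(remaining,
                   key=lambda i: np.mean([disagreement(preds[i], preds[j])
                                          for j in selected]))
        selected.append(best)
    return selected

X, y = load_breast_cancer(return_X_y=True)
X_tr, X_val, y_tr, y_val = train_test_split(X, y, random_state=0)

# Stage 1: overproduce 100 trees.
forest = RandomForestClassifier(n_estimators=100, random_state=0)
forest.fit(X_tr, y_tr)
preds = [tree.predict(X_val) for tree in forest.estimators_]

# Stage 2: keep a 10-tree sub-forest and vote.
subset = sfs_prune(preds, y_val, target_size=10)
votes = np.stack([preds[i] for i in subset])
ensemble_pred = (votes.mean(axis=0) >= 0.5).astype(int)
print(len(subset), np.mean(ensemble_pred == y_val))
```

The SBE path is the mirror image: start from the full forest and repeatedly remove the tree whose removal least harms the selection criterion, stopping at the target size.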