AUTHORS: Souad Taleb Zouggar, Abdelkader Adla
ABSTRACT: In this paper, we propose a diversity measure for simplifying random forests along both Sequential Forward Selection (SFS) and Sequential Backward Elimination (SBE) paths. The method proceeds in two stages: 1) an overproduction step generates a large number of trees; 2) SFS and SBE paths, combined with the diversity measure, reduce the initial ensemble of trees. The proposed method is applied to data sets from the UCI Repository, and a comparative study of the two types of paths against a performance-based pruning method is given. The results are encouraging: the method yields ensembles of reduced size that, in some cases, exceed the performance of both the initial forest and the method used for comparison.
KEYWORDS: Classification, CART Trees, Random Forests, Pruning, Diversity, Accuracy, Forward Selection, Backward Elimination.
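The two-stage scheme summarized above can be sketched as follows. This is a minimal illustration, not the paper's implementation: the diversity criterion used here is plain pairwise disagreement on a validation set (an assumption, standing in for the paper's proposed measure), and only the SFS path is shown.

```python
# Hedged sketch of overproduce-and-select forest pruning.
# Stage 1: overproduce a large random forest.
# Stage 2: greedily select a small sub-forest along an SFS path,
# using pairwise disagreement as a stand-in diversity measure.
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

def disagreement(preds_a, preds_b):
    """Fraction of validation samples on which two trees disagree."""
    return np.mean(preds_a != preds_b)

def sfs_prune(preds, y_val, target_size):
    """Sequential Forward Selection: start from the single most accurate
    tree, then repeatedly add the tree with the highest mean disagreement
    with the trees already selected (a simple diversity-guided path)."""
    accs = [np.mean(p == y_val) for p in preds]
    selected = [int(np.argmax(accs))]
    while len(selected) < target_size:
        remaining = [i for i in range(len(preds)) if i not in selected]
        best = max(remaining,
                   key=lambda i: np.mean([disagreement(preds[i], preds[j])
                                          for j in selected]))
        selected.append(best)
    return selected

X, y = load_breast_cancer(return_X_y=True)
X_tr, X_val, y_tr, y_val = train_test_split(X, y, random_state=0)

# Stage 1: overproduce 100 trees.
forest = RandomForestClassifier(n_estimators=100, random_state=0)
forest.fit(X_tr, y_tr)
preds = [tree.predict(X_val) for tree in forest.estimators_]

# Stage 2: keep a 10-tree sub-forest and vote.
subset = sfs_prune(preds, y_val, target_size=10)
votes = np.stack([preds[i] for i in subset])
ensemble_pred = (votes.mean(axis=0) >= 0.5).astype(int)
print(len(subset), np.mean(ensemble_pred == y_val))
```

The SBE path is the mirror image: start from the full forest and repeatedly remove the tree whose removal least harms the selection criterion, stopping at the target size.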