Download Diversity Toolbox

Diversity in Classifier Ensembles

See:

[1] Kuncheva L.I. Combining Pattern Classifiers. Methods and Algorithms, Wiley, 2004.

[2] Kuncheva L.I., Whitaker C.J. Measures of diversity in classifier ensembles, Machine Learning, 51, 2003, 181-207, http://www.bangor.ac.uk/~mas00a/papers/lkml.pdf

 

Generate the ensemble output

clear all
clc
L = 10;  % number of classifiers
N = 15;  % number of objects
% xx(i,j) = 1 if classifier j labels object i correctly, 0 otherwise.
% The call xx = rand(N,L)>0.5 produced the following matrix xx:
xx = [
     1     0     0     0     0     0     1     0     0     0
     0     1     0     0     1     0     1     0     0     0
     1     0     1     1     0     1     0     1     1     0
     1     0     0     1     1     1     0     0     1     1
     1     1     0     1     1     1     0     0     0     1
     1     1     1     1     0     1     0     0     0     0
     1     0     1     0     0     0     1     0     1     1
     0     0     0     0     1     0     0     0     0     1
     0     0     1     0     0     1     1     0     0     1
     0     1     0     0     1     0     0     0     0     0
     0     0     0     0     0     1     0     0     1     1
     1     0     1     1     0     0     0     0     0     1
     0     1     0     1     1     0     0     1     1     0
     1     0     1     1     1     1     1     0     1     0
     1     1     1     1     0     0     1     0     0     1
     ];

Calculate the individual accuracies and the majority vote accuracy

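% Individual accuracies (1-by-L row vector, one value per classifier)
individual = sum(xx,1)/N;
 
% Majority vote: an object counts as correct only if more than half of the
% L classifiers are correct (for even L, a tie counts as an error)
Pmaj = sum(sum(xx,2)>floor(L/2))/N;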
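
The threshold floor(L/2) matters when L is even, as it is here: an object on which exactly half of the classifiers are correct is counted as a majority vote error. A minimal sketch on a hypothetical 4-by-4 matrix (the names toy, Ntoy, Ltoy and PmajToy and the values are invented for illustration and are not part of the toolbox):

```matlab
% Hypothetical oracle outputs: 4 objects (rows) x 4 classifiers (columns)
toy = [1 1 1 0
       1 1 0 0
       1 0 0 0
       1 1 1 1];
[Ntoy, Ltoy] = size(toy);

votes   = sum(toy, 2);              % correct votes per object: [3; 2; 1; 4]
correct = votes > floor(Ltoy/2);    % strict majority required: [1; 0; 0; 1]
PmajToy = sum(correct) / Ntoy;      % 2 of 4 objects are correct -> 0.5

fprintf('Majority vote accuracy %7.4f\n', PmajToy)
```

The second object receives two of four votes, so it is scored as an error even though the ensemble is not outvoted.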

Pairwise diversity measures

Toolbox functions used: qcalc.m, rho.m, disagreement.m, double_fault.m

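% Q statistic
QQ = qcalc(xx);
t = mean(mean(QQ));
meanQ = (t*L-1)/(L-1); % average across all pairs of distinct classifiers;
                       % the formula assumes the L diagonal entries equal 1
 
% Correlation coefficient rho
R = rho(xx);
tempr = mean(mean(R));
meanrho = (tempr*L-1)/(L-1); % same diagonal correction as for Q
 
% Disagreement
d1 = disagreement(xx);
Disagreement = mean(mean(d1))*L/(L-1); % assumes a zero diagonal, so only rescale
 
% Double fault
d2 = double_fault(xx);
DF = mean(mean(d2))*L/(L-1); % rescaled in the same way as the disagreement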
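
The toolbox functions are not listed on this page. As a rough illustration of what they compute, the sketch below evaluates the four pairwise measures directly from the 2-by-2 agreement counts of each classifier pair, following the definitions in [2]. It is not the toolbox code; the variable names (Qpair, rhoPair, disPair, dfPair) are invented for this example.

```matlab
% Oracle outputs copied from the example (rows: objects, columns: classifiers)
xx = [1 0 0 0 0 0 1 0 0 0; 0 1 0 0 1 0 1 0 0 0; 1 0 1 1 0 1 0 1 1 0
      1 0 0 1 1 1 0 0 1 1; 1 1 0 1 1 1 0 0 0 1; 1 1 1 1 0 1 0 0 0 0
      1 0 1 0 0 0 1 0 1 1; 0 0 0 0 1 0 0 0 0 1; 0 0 1 0 0 1 1 0 0 1
      0 1 0 0 1 0 0 0 0 0; 0 0 0 0 0 1 0 0 1 1; 1 0 1 1 0 0 0 0 0 1
      0 1 0 1 1 0 0 1 1 0; 1 0 1 1 1 1 1 0 1 0; 1 1 1 1 0 0 1 0 0 1];
[N, L] = size(xx);

nPairs  = L*(L-1)/2;
Qpair   = zeros(nPairs, 1);
rhoPair = zeros(nPairs, 1);
disPair = zeros(nPairs, 1);
dfPair  = zeros(nPairs, 1);

k = 0;
for i = 1:L-1
    for j = i+1:L
        k = k + 1;
        a = xx(:, i) == 1;          % classifier i correct
        b = xx(:, j) == 1;          % classifier j correct
        N11 = sum( a &  b);         % both correct
        N00 = sum(~a & ~b);         % both wrong
        N10 = sum( a & ~b);         % only i correct
        N01 = sum(~a &  b);         % only j correct
        % No guard against zero denominators; none occurs for this xx
        Qpair(k)   = (N11*N00 - N01*N10) / (N11*N00 + N01*N10);
        rhoPair(k) = (N11*N00 - N01*N10) / ...
                     sqrt((N11+N10)*(N01+N00)*(N11+N01)*(N10+N00));
        disPair(k) = (N01 + N10) / N;
        dfPair(k)  = N00 / N;
    end
end

fprintf('Q = %7.4f, rho = %7.4f\n', mean(Qpair), mean(rhoPair))
fprintf('Disagreement = %7.4f, Double fault = %7.4f\n', ...
        mean(disPair), mean(dfPair))
```

For this matrix the averaged disagreement is 0.4889 and the averaged double fault is 0.3156. Averaging over the list of distinct pairs plays the same role as the diagonal corrections applied to the full L-by-L matrices in the script.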

Non-pairwise diversity measures

Toolbox functions used: kw.m, kappa.m, entropy.m, difficulty.m, generalised_diversity.m, coincidence_failure_diversity.m

% Kohavi-Wolpert variance
KW = kw(xx);
 
% kappa
K = kappa(xx);
 
% Entropy
Entropy = entropy(xx);
 
% Difficulty (theta)
theta = difficulty(xx);
 
% GD
GD = generalised_diversity(xx);
 
% CFD
CFD = coincidence_failure_diversity(xx);
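
All six non-pairwise measures can be written in terms of the number of classifiers that are correct on each object. The sketch below follows the definitions in [2] and is only an illustration, not the toolbox code. The variable names are invented here, and normalising the variance in theta by N rather than N-1 is an assumption, chosen because it matches the value printed in the results.

```matlab
% Oracle outputs copied from the example (rows: objects, columns: classifiers)
xx = [1 0 0 0 0 0 1 0 0 0; 0 1 0 0 1 0 1 0 0 0; 1 0 1 1 0 1 0 1 1 0
      1 0 0 1 1 1 0 0 1 1; 1 1 0 1 1 1 0 0 0 1; 1 1 1 1 0 1 0 0 0 0
      1 0 1 0 0 0 1 0 1 1; 0 0 0 0 1 0 0 0 0 1; 0 0 1 0 0 1 1 0 0 1
      0 1 0 0 1 0 0 0 0 0; 0 0 0 0 0 1 0 0 1 1; 1 0 1 1 0 0 0 0 0 1
      0 1 0 1 1 0 0 1 1 0; 1 0 1 1 1 1 1 0 1 0; 1 1 1 1 0 0 1 0 0 1];
[N, L] = size(xx);

l = sum(xx, 2);        % number of correct classifiers per object
f = L - l;             % number of failing classifiers per object
p = mean(xx(:));       % average individual accuracy

% Kohavi-Wolpert variance
KWs = sum(l .* (L - l)) / (N * L^2);

% Interrater agreement kappa
kappaS = 1 - (sum(l .* (L - l)) / L) / (N * (L - 1) * p * (1 - p));

% Entropy measure E
Es = sum(min(l, L - l)) / (N * (L - ceil(L/2)));

% Difficulty theta: variance of the proportion correct (normalised by N)
thetaS = var(l / L, 1);

% Generalised diversity: p1 = P(one random classifier fails),
% p2 = P(two random classifiers both fail)
p1  = sum(f) / (N * L);
p2  = sum(f .* (f - 1)) / (N * L * (L - 1));
GDs = 1 - p2 / p1;

% Coincident failure diversity (defined as 0 when no classifier ever fails)
p0 = mean(f == 0);
if p0 < 1
    CFDs = sum((L - f(f > 0)) / (L - 1)) / N / (1 - p0);
else
    CFDs = 0;
end

fprintf('KW = %7.4f, kappa = %7.4f, Entropy = %7.4f\n', KWs, kappaS, Es)
fprintf('theta = %7.4f, GD = %7.4f, CFD = %7.4f\n', thetaS, GDs, CFDs)
```

For this matrix the sketch gives KW = 0.2200, kappa = 0.0079, Entropy = 0.7200, theta = 0.0264, GD = 0.4365 and CFD = 0.4889.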

Print the results

fprintf('\n\nDIVERSITY IN CLASSIFIER ENSEMBLES\n')
fprintf('(results with array xx created at random)\n')
fprintf('%s\n\n','----------------------------------')
fprintf('Average individual accuracy %7.4f\n',mean(individual))
fprintf('Majority vote accuracy %7.4f\n',Pmaj)
fprintf('%s\n','----------------------------------')
fprintf(' I. == Pairwise measures ==\n')
fprintf('%s %7.4f\n','( 1) Q = ',meanQ)
fprintf('%s %7.4f\n','( 2) rho = ',meanrho)
fprintf('%s %7.4f\n','( 3) Disagreement = ',Disagreement)
fprintf('%s %7.4f\n','( 4) Double fault = ',DF)
fprintf('%s\n','----------------------------------')
fprintf('II. == Non-pairwise measures ==\n')
fprintf('%s %7.4f\n','( 5) KW = ',KW)
fprintf('%s %7.4f\n','( 6) kappa = ',K)
fprintf('%s %7.4f\n','( 7) Entropy = ',Entropy)
fprintf('%s %7.4f\n','( 8) theta = ',theta)
fprintf('%s %7.4f\n','( 9) GD = ',GD)
fprintf('%s %7.4f\n\n','(10) CFD = ',CFD)
 
 
DIVERSITY IN CLASSIFIER ENSEMBLES
(results with array xx created at random)
----------------------------------
 
Average individual accuracy  0.4400
Majority vote accuracy  0.3333
----------------------------------
 I. == Pairwise measures ==
( 1) Q =   0.0231
( 2) rho =   0.0156
( 3) Disagreement =   0.4889
( 4) Double fault =   0.3156
----------------------------------
II. == Non-pairwise measures ==
( 5) KW =   0.2200
( 6) kappa =   0.0079
( 7) Entropy =   0.7200
( 8) theta =   0.0264
( 9) GD =   0.4365
(10) CFD =   0.4889
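
Two of the printed values can be cross-checked against the others. According to [2], KW equals (L-1)/(2L) times the averaged disagreement. In GD = 1 - p(2)/p(1), p(1) is the average individual error and p(2) is the averaged double fault. A small sketch using the rounded values from the printout (the names kwFromDis and gdFromDF are invented for this check):

```matlab
% Values copied from the printed results (rounded to 4 decimals)
L = 10;
p = 0.4400;              % average individual accuracy
Disagreement = 0.4889;
DF = 0.3156;
KW = 0.2200;
GD = 0.4365;

% KW recovered from the averaged disagreement
kwFromDis = (L - 1) / (2 * L) * Disagreement;

% GD recovered from the averaged double fault and the average error 1 - p
gdFromDF = 1 - DF / (1 - p);

fprintf('KW: printed %7.4f, from disagreement %7.4f\n', KW, kwFromDis)
fprintf('GD: printed %7.4f, from double fault %7.4f\n', GD, gdFromDF)
```

The recovered values agree with the printed ones up to the rounding of the inputs.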