5.2 Proposed Approach for the Design of Multi-Classifier Systems
5.2.3 Algorithm 3
Algorithm 3 is a hybrid of Algorithms 1 and 2, because its conflict handling combines elements of both methods. As in Algorithms 1 and 2, the first phase excludes the classifiers that have misclassified at least one instance in the local region. The set of remaining classifiers is then considered, and conflicts are resolved as follows:
• Exactly one classifier remains. It is selected directly.
• More than one classifier remains. Conflicts are handled as in Algorithm 2.
• No base classifier remains. Conflicts are handled as in Algorithm 1.
Here too, the competence criterion can be based either on recall or on accuracy.
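To make the two competence criteria concrete, the Java class below is a minimal sketch of the quantities that Algorithms 1 to 3 rely on: the majority class h of the local region S, the recall on h, the accuracy, and the mean absolute error used as the final tie-breaker. It is not the thesis implementation (which relies on WEKA); the class name `CompetenceMeasures`, the integer encoding of class labels, and the MAE definition (absolute error between the predicted class distribution and the one-hot true class, averaged over classes and then over instances) are assumptions made for illustration.

```java
import java.util.HashMap;
import java.util.Map;

/**
 * Hypothetical helper (not the thesis code): competence measures used by
 * Algorithms 1-3. Class labels are encoded as ints 0..k-1.
 */
final class CompetenceMeasures {

    private CompetenceMeasures() {}

    /** Majority class h among the true labels of the instances in S (-1 if S is empty). */
    static int majorityClass(int[] regionLabels) {
        Map<Integer, Integer> counts = new HashMap<>();
        int best = -1;
        int bestCount = 0;
        for (int label : regionLabels) {
            int count = counts.merge(label, 1, Integer::sum);
            // On ties, the label that first reaches the highest count is kept.
            if (count > bestCount) {
                bestCount = count;
                best = label;
            }
        }
        return best;
    }

    /** Recall on class h: fraction of the instances of class h that are predicted as h. */
    static double recall(int[] actual, int[] predicted, int h) {
        int truePositives = 0;
        int positives = 0;
        for (int i = 0; i < actual.length; i++) {
            if (actual[i] == h) {
                positives++;
                if (predicted[i] == h) {
                    truePositives++;
                }
            }
        }
        return positives == 0 ? 0.0 : (double) truePositives / positives;
    }

    /** Accuracy: fraction of correctly classified instances. */
    static double accuracy(int[] actual, int[] predicted) {
        if (actual.length == 0) {
            return 0.0;
        }
        int correct = 0;
        for (int i = 0; i < actual.length; i++) {
            if (actual[i] == predicted[i]) {
                correct++;
            }
        }
        return (double) correct / actual.length;
    }

    /**
     * Assumed MAE definition: |p_c - y_c| summed over classes c, divided by the
     * number of classes, then averaged over instances (y is the one-hot true class).
     */
    static double meanAbsoluteError(int[] actual, double[][] distributions) {
        if (actual.length == 0) {
            return 0.0;
        }
        double total = 0.0;
        for (int i = 0; i < actual.length; i++) {
            double[] p = distributions[i];
            double perInstance = 0.0;
            for (int c = 0; c < p.length; c++) {
                double target = (c == actual[i]) ? 1.0 : 0.0;
                perInstance += Math.abs(p[c] - target);
            }
            total += perInstance / p.length;
        }
        return total / actual.length;
    }
}
```

In this sketch, h would be obtained by calling `majorityClass` on the true labels of S, and `recall` or `accuracy` would then be evaluated on VS or TS depending on the branch of the algorithm being executed.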
Algorithm 1
function Select the Best Classifiers based on Validation and Training Results(I, VS, TS, C, S, Option) ▷ where I is an instance of the test set; VS is the validation set; TS is the training set; C is a pool of classifiers; S is the local region; Option ∈ {RECALL, ACCURACY}
Output: C∗, the most promising classifier for I
  P = C; B = ∅; B1 = ∅
  Exclude from P the classifiers that have misclassified at least one instance in S
  if P contains exactly one classifier then
    C∗ = P
  end if
  if (P is empty) AND (Option == RECALL) then
    Compute the majority class h of the instances in S
    Put in B the classifiers from C with the highest Recall on h in VS
    if B contains more than one classifier then
      Put in B1 the classifiers from B with the highest Recall on h in TS
      if B1 contains more than one classifier then
        C∗ = the classifier in B1 with the best (lowest) Mean Absolute Error on TS
      else
        C∗ = B1
      end if
    else
      C∗ = B
    end if
  end if
  if (P is empty) AND (Option == ACCURACY) then
    Put in B the classifiers from C with the highest Accuracy in VS
    if B contains more than one classifier then
      Put in B1 the classifiers from B with the highest Accuracy in TS
      if B1 contains more than one classifier then
        C∗ = the classifier in B1 with the best (lowest) Mean Absolute Error on TS
      else
        C∗ = B1
      end if
    else
      C∗ = B
    end if
  end if
  if (P contains more than one classifier) AND (Option == RECALL) then
    Compute the majority class h of the instances in S
    Put in B the classifiers from P with the highest Recall on h in VS
    if B contains more than one classifier then
      Put in B1 the classifiers from B with the highest Recall on h in TS
      if B1 contains more than one classifier then
        C∗ = the classifier in B1 with the best (lowest) Mean Absolute Error on TS
      else
        C∗ = B1
      end if
    else
      C∗ = B
    end if
  end if
  if (P contains more than one classifier) AND (Option == ACCURACY) then
    Put in B the classifiers from P with the highest Accuracy in VS
    if B contains more than one classifier then
      Put in B1 the classifiers from B with the highest Accuracy in TS
      if B1 contains more than one classifier then
        C∗ = the classifier in B1 with the best (lowest) Mean Absolute Error on TS
      else
        C∗ = B1
      end if
    else
      C∗ = B
    end if
  end if
  return C∗
end function
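Since the branches of Algorithm 1 apply the same cascade (best score on VS, then best score on TS, then lowest Mean Absolute Error on TS) and differ only in the starting set (C when P is empty, P otherwise), they can be sketched as a single Java method. The names `Algorithm1Sketch`, `Scores` and `keepBest` are hypothetical, the scores are assumed to be precomputed for the chosen Option (Recall on h or Accuracy), and breaking any residual tie by list order is an assumption the pseudocode does not specify.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.ToDoubleFunction;

/**
 * Sketch of the selection cascade of Algorithm 1 (hypothetical names).
 * vsScore and tsScore are the Recall on h (RECALL) or the Accuracy (ACCURACY)
 * measured on VS and TS respectively.
 */
final class Algorithm1Sketch {

    static final class Scores {
        final String name;
        final double vsScore;
        final double tsScore;
        final double tsMae; // Mean Absolute Error on TS

        Scores(String name, double vsScore, double tsScore, double tsMae) {
            this.name = name;
            this.vsScore = vsScore;
            this.tsScore = tsScore;
            this.tsMae = tsMae;
        }
    }

    /** Keeps every candidate whose score equals the maximum (ties are preserved). */
    static List<Scores> keepBest(List<Scores> candidates, ToDoubleFunction<Scores> score) {
        double best = Double.NEGATIVE_INFINITY;
        for (Scores s : candidates) {
            best = Math.max(best, score.applyAsDouble(s));
        }
        List<Scores> kept = new ArrayList<>();
        for (Scores s : candidates) {
            if (score.applyAsDouble(s) == best) {
                kept.add(s);
            }
        }
        return kept;
    }

    /**
     * @param pool      all classifiers C
     * @param survivors the set P left after excluding the classifiers that erred on S
     */
    static Scores select(List<Scores> pool, List<Scores> survivors) {
        if (survivors.size() == 1) {
            return survivors.get(0); // P contains exactly one classifier
        }
        // P empty: fall back to the whole pool C; otherwise resolve the conflict within P.
        List<Scores> candidates = survivors.isEmpty() ? pool : survivors;
        if (candidates.isEmpty()) {
            throw new IllegalArgumentException("empty classifier pool");
        }
        List<Scores> b = keepBest(candidates, s -> s.vsScore); // B: best on VS
        if (b.size() == 1) {
            return b.get(0);
        }
        List<Scores> b1 = keepBest(b, s -> s.tsScore); // B1: best on TS among B
        if (b1.size() == 1) {
            return b1.get(0);
        }
        // Lowest MAE on TS (maximize -MAE); first in list order if still tied.
        return keepBest(b1, s -> -s.tsMae).get(0);
    }
}
```

For example, with two classifiers tied on both VS and TS, the one with the smaller TS error is returned.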
Algorithm 2
function Select the Best Classifiers based on Validation Results(I, VS, C, S, Option) ▷ where I is an instance of the test set; VS is the validation set; C is a pool of classifiers; S is the local region; Option ∈ {RECALL, ACCURACY}
Output: C∗, the most promising classifier for I
  P = C; B = ∅; B1 = ∅
  Exclude from P the classifiers that have misclassified at least one instance in S
  if P contains exactly one classifier then
    C∗ = P
  end if
  if (P is empty) AND (Option == RECALL) then
    Compute the majority class h of the instances in S
    Put in B the classifiers from C with the highest Recall on h in VS
    if B contains more than one classifier then
      C∗ = the classifier in B with the best (lowest) Mean Absolute Error on VS
    else
      C∗ = B
    end if
  end if
  if (P is empty) AND (Option == ACCURACY) then
    Put in B the classifiers from C with the highest Accuracy in VS
    if B contains more than one classifier then
      C∗ = the classifier in B with the best (lowest) Mean Absolute Error on VS
    else
      C∗ = B
    end if
  end if
  if (P contains more than one classifier) AND (Option == RECALL) then
    Compute the majority class h of the instances in S
    Put in B the classifiers from P with the highest Recall on h in VS
    if B contains more than one classifier then
      C∗ = the classifier in B with the best (lowest) Mean Absolute Error on VS
    else
      C∗ = B
    end if
  end if
  if (P contains more than one classifier) AND (Option == ACCURACY) then
    Put in B the classifiers from P with the highest Accuracy in VS
    if B contains more than one classifier then
      C∗ = the classifier in B with the best (lowest) Mean Absolute Error on VS
    else
      C∗ = B
    end if
  end if
  return C∗
end function
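Algorithm 2 uses only validation results: the best score on VS, with ties broken by the lowest Mean Absolute Error on VS. A minimal Java sketch under the same assumptions as before (hypothetical names `Algorithm2Sketch`, `Scores`, `keepBest`; scores precomputed for the chosen Option; residual ties broken by list order, which the pseudocode leaves open) could look like this:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.ToDoubleFunction;

/** Sketch of the selection rule of Algorithm 2 (hypothetical names). */
final class Algorithm2Sketch {

    static final class Scores {
        final String name;
        final double vsScore; // Recall on h or Accuracy on VS, depending on Option
        final double vsMae;   // Mean Absolute Error on VS

        Scores(String name, double vsScore, double vsMae) {
            this.name = name;
            this.vsScore = vsScore;
            this.vsMae = vsMae;
        }
    }

    /** Keeps every candidate whose score equals the maximum (ties are preserved). */
    static List<Scores> keepBest(List<Scores> candidates, ToDoubleFunction<Scores> score) {
        double best = Double.NEGATIVE_INFINITY;
        for (Scores s : candidates) {
            best = Math.max(best, score.applyAsDouble(s));
        }
        List<Scores> kept = new ArrayList<>();
        for (Scores s : candidates) {
            if (score.applyAsDouble(s) == best) {
                kept.add(s);
            }
        }
        return kept;
    }

    /**
     * @param pool      all classifiers C
     * @param survivors the set P left after excluding the classifiers that erred on S
     */
    static Scores select(List<Scores> pool, List<Scores> survivors) {
        if (survivors.size() == 1) {
            return survivors.get(0); // P contains exactly one classifier
        }
        List<Scores> candidates = survivors.isEmpty() ? pool : survivors;
        if (candidates.isEmpty()) {
            throw new IllegalArgumentException("empty classifier pool");
        }
        List<Scores> b = keepBest(candidates, s -> s.vsScore); // B: best on VS
        if (b.size() == 1) {
            return b.get(0);
        }
        // Lowest MAE on VS (maximize -MAE); first in list order if still tied.
        return keepBest(b, s -> -s.vsMae).get(0);
    }
}
```

Compared with the Algorithm 1 sketch, the training-set stage disappears and the error used as the last tie-breaker is measured on VS instead of TS.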
Algorithm 3
function Select the Best Classifiers based on Hybrid Method(I, VS, TS, C, S, Option) ▷ where I is an instance of the test set; VS is the validation set; TS is the training set; C is a pool of classifiers; S is the local region; Option ∈ {RECALL, ACCURACY}
Output: C∗, the most promising classifier for I
  P = C; B = ∅; B1 = ∅
  Exclude from P the classifiers that have misclassified at least one instance in S
  if P contains exactly one classifier then
    C∗ = P
  end if
  if (P is empty) AND (Option == RECALL) then
    Compute the majority class h of the instances in S
    Put in B the classifiers from C with the highest Recall on h in VS
    if B contains more than one classifier then
      Put in B1 the classifiers from B with the highest Recall on h in TS
      if B1 contains more than one classifier then
        C∗ = the classifier in B1 with the best (lowest) Mean Absolute Error on TS
      else
        C∗ = B1
      end if
    else
      C∗ = B
    end if
  end if
  if (P is empty) AND (Option == ACCURACY) then
    Put in B the classifiers from C with the highest Accuracy in VS
    if B contains more than one classifier then
      Put in B1 the classifiers from B with the highest Accuracy in TS
      if B1 contains more than one classifier then
        C∗ = the classifier in B1 with the best (lowest) Mean Absolute Error on TS
      else
        C∗ = B1
      end if
    else
      C∗ = B
    end if
  end if
  if (P contains more than one classifier) AND (Option == RECALL) then
    Compute the majority class h of the instances in S
    Put in B the classifiers from P with the highest Recall on h in VS
    if B contains more than one classifier then
      C∗ = the classifier in B with the best (lowest) Mean Absolute Error on VS
    else
      C∗ = B
    end if
  end if
  if (P contains more than one classifier) AND (Option == ACCURACY) then
    Put in B the classifiers from P with the highest Accuracy in VS
    if B contains more than one classifier then
      C∗ = the classifier in B with the best (lowest) Mean Absolute Error on VS
    else
      C∗ = B
    end if
  end if
  return C∗
end function
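The hybrid rule of Algorithm 3 can be sketched in Java including the first phase: P keeps only the classifiers whose predictions match every true label in S; a single survivor is returned directly, an empty P is resolved over C with the Algorithm 1 cascade (VS, then TS, then MAE on TS), and several survivors are resolved with the Algorithm 2 rule (VS, then MAE on VS). As in the previous sketches, `Algorithm3Sketch`, `Candidate` and `keepBest` are hypothetical names, the scores are assumed precomputed for the chosen Option, and residual ties are broken by list order.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.ToDoubleFunction;

/** Sketch of the hybrid Algorithm 3 (hypothetical names, precomputed scores). */
final class Algorithm3Sketch {

    static final class Candidate {
        final String name;
        final int[] regionPredictions; // predicted labels for the instances of S
        final double vsScore;          // Recall on h or Accuracy on VS
        final double tsScore;          // Recall on h or Accuracy on TS
        final double vsMae;            // Mean Absolute Error on VS
        final double tsMae;            // Mean Absolute Error on TS

        Candidate(String name, int[] regionPredictions, double vsScore,
                  double tsScore, double vsMae, double tsMae) {
            this.name = name;
            this.regionPredictions = regionPredictions;
            this.vsScore = vsScore;
            this.tsScore = tsScore;
            this.vsMae = vsMae;
            this.tsMae = tsMae;
        }
    }

    /** Keeps every candidate whose score equals the maximum (ties are preserved). */
    static List<Candidate> keepBest(List<Candidate> candidates, ToDoubleFunction<Candidate> score) {
        double best = Double.NEGATIVE_INFINITY;
        for (Candidate x : candidates) {
            best = Math.max(best, score.applyAsDouble(x));
        }
        List<Candidate> kept = new ArrayList<>();
        for (Candidate x : candidates) {
            if (score.applyAsDouble(x) == best) {
                kept.add(x);
            }
        }
        return kept;
    }

    static Candidate select(List<Candidate> pool, int[] regionLabels) {
        if (pool.isEmpty()) {
            throw new IllegalArgumentException("empty classifier pool");
        }
        // First phase: P keeps the classifiers that are correct on every instance of S.
        List<Candidate> p = new ArrayList<>();
        for (Candidate c : pool) {
            if (Arrays.equals(c.regionPredictions, regionLabels)) {
                p.add(c);
            }
        }
        if (p.size() == 1) {
            return p.get(0);
        }
        if (p.isEmpty()) {
            // As in Algorithm 1, over all of C: VS, then TS, then lowest MAE on TS.
            List<Candidate> b = keepBest(pool, x -> x.vsScore);
            if (b.size() == 1) {
                return b.get(0);
            }
            List<Candidate> b1 = keepBest(b, x -> x.tsScore);
            if (b1.size() == 1) {
                return b1.get(0);
            }
            return keepBest(b1, x -> -x.tsMae).get(0);
        }
        // As in Algorithm 2, over P: VS, then lowest MAE on VS.
        List<Candidate> b = keepBest(p, x -> x.vsScore);
        if (b.size() == 1) {
            return b.get(0);
        }
        return keepBest(b, x -> -x.vsMae).get(0);
    }
}
```

With two classifiers tied on VS, the sketch can return different winners depending on the first phase: if both are correct on S, the one with the lower VS error wins (Algorithm 2 rule); if neither is, the one with the higher TS score wins (Algorithm 1 rule).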
Experimental Validation
This chapter describes the experimental protocol and the results obtained by applying the classification approaches described in Chapter 5 to public datasets. In particular, it presents and summarizes all the tests run on each dataset, together with the resulting evaluations, providing an objective analysis of the methodology used. The tests aim to assess the quality of the proposed approaches in terms of classification accuracy. The datasets refer to important decision problems in the medical-clinical domain. The approaches were implemented in Java using the API (Application Programming Interface) of the WEKA software (Waikato Environment for Knowledge Analysis). The chapter is organized as follows. Section 6.1 gives an overview of the datasets and Section 6.2 an overview of the classifiers used. Section 6.3 describes the experimental protocol followed to run the tests. Section 6.4 reports the results obtained on the various datasets for each set of base classifiers used.
6.1 Datasets
The six datasets used to test the proposed approaches are public and available from the UCI repository [7]. All of them concern important decision problems in the medical-clinical domain, as described below. A summary is given in Table 6.1.
Table 6.1: Datasets
Dataset | No. of instances | No. of attributes | No. of classes
WDBC | 569 | 30 | 2
WBC | 699 | 9 | 2
WPBC | 198 | 33 | 2
Cleveland | 303 | 13 | 2
Mammographic Mass | 961 | 5 | 2
Dermatology | 366 | 34 | 6