
5.2 Proposed approach for the design of multi-classifiers

5.2.3 Algorithm 3

Algorithm 3 is a hybrid of Algorithms 1 and 2: it resolves conflicts using elements of both methods. As in Algorithms 1 and 2, the first phase excludes the classifiers that have misclassified any instance in the local region. The set of remaining classifiers is then considered, and conflicts are resolved as follows:

• More than one classifier remains. Conflicts are handled as in Algorithm 2.

• No base classifier remains. Conflicts are handled as in Algorithm 1.

Here too, the competence criterion can be based either on recall or on accuracy.
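The two competence criteria can be illustrated with a minimal Python sketch. The function names (`accuracy`, `recall_on_class`, `majority_class`) and the toy labels are assumptions made for illustration only; the implementation described in this work relies on the WEKA Java API, not on this code.

```python
from collections import Counter

def accuracy(y_true, y_pred):
    """Fraction of instances whose predicted label matches the true label."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)

def recall_on_class(y_true, y_pred, h):
    """Fraction of the instances of class h that were predicted as h."""
    of_class_h = [(t, p) for t, p in zip(y_true, y_pred) if t == h]
    if not of_class_h:
        return 0.0  # assumption: recall is taken as 0 when class h is absent
    return sum(p == h for _, p in of_class_h) / len(of_class_h)

def majority_class(labels):
    """Most frequent class label, e.g. among the instances of the local region S."""
    return Counter(labels).most_common(1)[0][0]

# Hypothetical toy data: true labels of a validation set and one classifier's predictions.
y_true = ["benign", "benign", "malignant", "malignant", "malignant"]
y_pred = ["benign", "malignant", "malignant", "malignant", "benign"]

h = majority_class(["malignant", "malignant", "benign"])  # labels of the instances in S
acc = accuracy(y_true, y_pred)
rec = recall_on_class(y_true, y_pred, h)
```

With the recall criterion a classifier is judged only on the majority class h of the local region, whereas the accuracy criterion scores it on all classes at once.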

Algorithm 1

function Select the Best Classifiers based on Validation and Training Results(I, VS, TS, C, S, Option)
    Where I is an instance of the test set; VS is the validation set; TS is the training set; C is a pool of classifiers; S is the local region; Option = (RECALL or ACCURACY)
    Output: C∗, the most promising classifier for I
    P = C; B = ∅; B1 = ∅
    Exclude from P the classifiers that have misclassified some instances in S
    if P contains one classifier then
        C∗ = P
    end if
    if (P is empty) AND (Option == RECALL) then
        Compute the class h of the majority of instances in S
        Put in B the classifiers from C with the highest Recall on h in VS
        if B contains more than one classifier then
            Put in B1 the classifiers from B with the best Recall on h in TS
            if B1 contains more than one classifier then
                C∗ = the classifier in B1 with the best Mean Absolute Error on TS
            else
                C∗ = B1
            end if
        else
            C∗ = B
        end if
    end if
    if (P is empty) AND (Option == ACCURACY) then
        Put in B the classifiers from C with the highest Accuracy in VS
        if B contains more than one classifier then
            Put in B1 the classifiers from B with the best Accuracy in TS
            if B1 contains more than one classifier then
                C∗ = the classifier in B1 with the best Mean Absolute Error on TS
            else
                C∗ = B1
            end if
        else
            C∗ = B
        end if
    end if
    if (P contains more than one classifier) AND (Option == RECALL) then
        Compute the class h of the majority of instances in S
        Put in B the classifiers from P with the highest Recall on h in VS
        if B contains more than one classifier then
            Put in B1 the classifiers from B with the best Recall on h in TS
            if B1 contains more than one classifier then
                C∗ = the classifier in B1 with the best Mean Absolute Error on TS
            else
                C∗ = B1
            end if
        else
            C∗ = B
        end if
    end if
    if (P contains more than one classifier) AND (Option == ACCURACY) then
        Put in B the classifiers from P with the highest Accuracy in VS
        if B contains more than one classifier then
            Put in B1 the classifiers from B with the best Accuracy in TS
            if B1 contains more than one classifier then
                C∗ = the classifier in B1 with the best Mean Absolute Error on TS
            else
                C∗ = B1
            end if
        else
            C∗ = B
        end if
    end if
    return C∗
end function
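The tie-breaking cascade of Algorithm 1 (score on VS, then score on TS, then Mean Absolute Error on TS) can be sketched in Python as follows. This is an illustrative sketch, not the Java/WEKA implementation: the helper `best_by`, the function `select_algorithm1`, the classifier names and all score values are hypothetical. The scores are assumed to be precomputed (recall on h or accuracy, depending on Option), and B1 is assumed to be drawn from B.

```python
def best_by(candidates, score, higher_is_better=True):
    """Return every candidate that ties for the best value of score[c]."""
    pick = max if higher_is_better else min
    target = pick(score[c] for c in candidates)
    return [c for c in candidates if score[c] == target]

def select_algorithm1(pool, survivors, vs_score, ts_score, ts_mae):
    """pool is C; survivors is P after the exclusion step.
    vs_score / ts_score: recall on h or accuracy on VS / TS.
    ts_mae: Mean Absolute Error on TS (lower is better)."""
    if len(survivors) == 1:
        return survivors[0]
    # An empty P falls back to the whole pool C; otherwise only P is considered.
    candidates = survivors if survivors else pool
    b = best_by(candidates, vs_score)
    if len(b) == 1:
        return b[0]
    b1 = best_by(b, ts_score)
    if len(b1) == 1:
        return b1[0]
    # Assumption: if the MAE also ties, the first tied classifier is returned.
    return best_by(b1, ts_mae, higher_is_better=False)[0]

# Hypothetical pool and scores.
pool = ["knn", "tree", "svm"]
vs_score = {"knn": 0.9, "tree": 0.9, "svm": 0.8}
ts_score = {"knn": 0.95, "tree": 0.95, "svm": 0.99}
ts_mae = {"knn": 0.10, "tree": 0.07, "svm": 0.20}

chosen = select_algorithm1(pool, [], vs_score, ts_score, ts_mae)
```

In this toy run `knn` and `tree` tie on VS and again on TS, so the choice falls to the lower Mean Absolute Error on TS.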

Algorithm 2

function Select the Best Classifiers based on Validation Results(I, VS, C, S, Option)
    Where I is an instance of the test set; VS is the validation set; C is a pool of classifiers; S is the local region; Option = (RECALL or ACCURACY)
    Output: C∗, the most promising classifier for I
    P = C; B = ∅; B1 = ∅
    Exclude from P the classifiers that have misclassified some instances in S
    if P contains one classifier then
        C∗ = P
    end if
    if (P is empty) AND (Option == RECALL) then
        Compute the class h of the majority of instances in VS
        Put in B the classifiers from C with the highest Recall on h in VS
        if B contains more than one classifier then
            C∗ = the classifier in B with the best Mean Absolute Error on VS
        else
            C∗ = B
        end if
    end if
    if (P is empty) AND (Option == ACCURACY) then
        Put in B the classifiers from C with the highest Accuracy in VS
        if B contains more than one classifier then
            C∗ = the classifier in B with the best Mean Absolute Error on VS
        else
            C∗ = B
        end if
    end if
    if (P contains more than one classifier) AND (Option == RECALL) then
        Compute the class h of the majority of instances in S
        Put in B the classifiers from P with the highest Recall on h in VS
        if B contains more than one classifier then
            C∗ = the classifier in B with the best Mean Absolute Error on VS
        else
            C∗ = B
        end if
    end if
    if (P contains more than one classifier) AND (Option == ACCURACY) then
        Put in B the classifiers from P with the highest Accuracy in VS
        if B contains more than one classifier then
            C∗ = the classifier in B with the best Mean Absolute Error on VS
        else
            C∗ = B
        end if
    end if
    return C∗
end function
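A compact Python sketch of Algorithm 2, including the shared exclusion step, is given below under stated assumptions. The names `exclude_misclassifiers` and `select_algorithm2`, the classifier labels and the numeric scores are hypothetical; predictions on the local region S and scores on VS are assumed to be already available.

```python
def exclude_misclassifiers(pool, region_predictions, region_labels):
    """Keep only the classifiers that predict every instance of S correctly."""
    return [
        c for c in pool
        if all(p == t for p, t in zip(region_predictions[c], region_labels))
    ]

def select_algorithm2(pool, region_predictions, region_labels, vs_score, vs_mae):
    """vs_score: recall on h or accuracy on VS; vs_mae: Mean Absolute Error on VS."""
    p = exclude_misclassifiers(pool, region_predictions, region_labels)
    if len(p) == 1:
        return p[0]
    # An empty P falls back to the whole pool C.
    candidates = p if p else pool
    best = max(vs_score[c] for c in candidates)
    b = [c for c in candidates if vs_score[c] == best]
    # Ties on the VS score are broken by the lowest Mean Absolute Error on VS.
    return min(b, key=lambda c: vs_mae[c])

# Hypothetical data: three instances in S and three base classifiers.
pool = ["nb", "knn", "tree"]
region_labels = [1, 1, 0]
region_predictions = {"nb": [1, 1, 0], "knn": [1, 1, 0], "tree": [1, 0, 0]}
vs_score = {"nb": 0.85, "knn": 0.85, "tree": 0.95}
vs_mae = {"nb": 0.20, "knn": 0.15, "tree": 0.05}

survivors = exclude_misclassifiers(pool, region_predictions, region_labels)
chosen = select_algorithm2(pool, region_predictions, region_labels, vs_score, vs_mae)
```

Note that `tree` has the best score on VS but is never considered here, because it misclassified an instance of S and was removed in the first phase.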

Algorithm 3

function Select the Best Classifiers based on Hybrid Method(I, VS, TS, C, S, Option)
    Where I is an instance of the test set; VS is the validation set; TS is the training set; C is a pool of classifiers; S is the local region; Option = (RECALL or ACCURACY)
    Output: C∗, the most promising classifier for I
    P = C; B = ∅; B1 = ∅
    Exclude from P the classifiers that have misclassified some instances in S
    if P contains one classifier then
        C∗ = P
    end if
    if (P is empty) AND (Option == RECALL) then
        Compute the class h of the majority of instances in S
        Put in B the classifiers from C with the highest Recall on h in VS
        if B contains more than one classifier then
            Put in B1 the classifiers from B with the best Recall on h in TS
            if B1 contains more than one classifier then
                C∗ = the classifier in B1 with the best Mean Absolute Error on TS
            else
                C∗ = B1
            end if
        else
            C∗ = B
        end if
    end if
    if (P is empty) AND (Option == ACCURACY) then
        Put in B the classifiers from C with the highest Accuracy in VS
        if B contains more than one classifier then
            Put in B1 the classifiers from B with the best Accuracy in TS
            if B1 contains more than one classifier then
                C∗ = the classifier in B1 with the best Mean Absolute Error on TS
            else
                C∗ = B1
            end if
        else
            C∗ = B
        end if
    end if
    if (P contains more than one classifier) AND (Option == RECALL) then
        Compute the class h of the majority of instances in S
        Put in B the classifiers from P with the highest Recall on h in VS
        if B contains more than one classifier then
            C∗ = the classifier in B with the best Mean Absolute Error on VS
        else
            C∗ = B
        end if
    end if
    if (P contains more than one classifier) AND (Option == ACCURACY) then
        Put in B the classifiers from P with the highest Accuracy in VS
        if B contains more than one classifier then
            C∗ = the classifier in B with the best Mean Absolute Error on VS
        else
            C∗ = B
        end if
    end if
    return C∗
end function
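The hybrid dispatch of Algorithm 3 can be summarized by the following Python sketch, which is only an illustration under stated assumptions: `tied_best` and `select_hybrid` are hypothetical names, the classifier labels and scores are invented, and all scores (recall on h or accuracy, plus Mean Absolute Error) are assumed to be precomputed on VS and TS.

```python
def tied_best(candidates, score):
    """Return every candidate that ties for the highest value of score[c]."""
    best = max(score[c] for c in candidates)
    return [c for c in candidates if score[c] == best]

def select_hybrid(pool, survivors, vs_score, ts_score, vs_mae, ts_mae):
    """pool is C; survivors is P after excluding classifiers that erred on S."""
    if len(survivors) == 1:
        return survivors[0]
    if not survivors:
        # Empty P, as in Algorithm 1: VS score, then TS score, then MAE on TS,
        # searching the whole pool C.
        b = tied_best(pool, vs_score)
        b1 = tied_best(b, ts_score)
        return min(b1, key=lambda c: ts_mae[c])
    # Several survivors, as in Algorithm 2: VS score, then MAE on VS, within P.
    b = tied_best(survivors, vs_score)
    return min(b, key=lambda c: vs_mae[c])

# Hypothetical scores for three base classifiers.
pool = ["a", "b", "c"]
vs_score = {"a": 0.9, "b": 0.9, "c": 0.7}
ts_score = {"a": 0.80, "b": 0.95, "c": 0.99}
vs_mae = {"a": 0.1, "b": 0.2, "c": 0.3}
ts_mae = {"a": 0.3, "b": 0.1, "c": 0.2}

when_empty = select_hybrid(pool, [], vs_score, ts_score, vs_mae, ts_mae)
when_many = select_hybrid(pool, ["a", "b"], vs_score, ts_score, vs_mae, ts_mae)
when_single = select_hybrid(pool, ["c"], vs_score, ts_score, vs_mae, ts_mae)
```

The same tie on VS between `a` and `b` is resolved differently in the two branches: training-set information decides when P is empty, validation-set error decides when several classifiers survive.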

Experimental validation

This chapter describes the experimental protocol and the results obtained by applying the classification approaches described in Chapter 5 to public datasets. In particular, it presents and summarizes all the tests run on each dataset and the resulting evaluations, providing an objective analysis of the methodology. The tests aim to assess the quality of the proposed approaches in terms of classification accuracy. The datasets relate to important decision problems in the medical and clinical domain. The approaches were implemented in Java using the API (Application Programming Interface) of the WEKA software (Waikato Environment for Knowledge Analysis). The chapter is organized as follows. Section 6.1 gives an overview of the datasets, and Section 6.2 an overview of the classifiers used. Section 6.3 describes the experimental protocol used to run the tests. Section 6.4 reports the results obtained on the various datasets for each set of base classifiers used.

6.1 Datasets

The six datasets used to test the proposed approaches are public and available from the UCI repository [7]. All of them relate to important decision problems in the medical and clinical domain, as described below. A summary is shown in Table 6.1.

Table 6.1: Datasets

Dataset             No. of instances   No. of attributes   No. of classes
WDBC                569                30                  2
WBC                 699                9                   2
WPBC                198                33                  2
Cleveland           303                13                  2
Mammographic Mass   961                5                   2
Dermatology         366                34                  6

Source document: Innovativi multiclassificatori con applicazione ai sistemi di supporto alle decisioni cliniche (pages 102-107)