
4.1.5 Evaluation Metrics

Many problems with the Model Performance Mismatch can be avoided by using a more robust test set. To understand whether a test set is suitable, it must be analyzed before being used for evaluation. However, this analysis is often complex and time-consuming.

On a strongly unbalanced dataset, a model that always predicts the majority class can reach a very high accuracy while never recognizing a positive case, so in practice it does not work. For this reason, with very unbalanced classes, other metrics such as the F1 score, recall, and precision are used.
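As a quick numeric illustration (a minimal sketch in Python with made-up data; the 1% positive rate is purely hypothetical), a degenerate model that always predicts the majority class reaches 99% accuracy while being useless:

    # Hypothetical unbalanced dataset: 1% positives, 99% negatives.
    y_true = [1] * 10 + [0] * 990
    # A degenerate "model" that always predicts the majority class.
    y_pred = [0] * 1000

    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    print(correct / len(y_true))  # 0.99: high accuracy, no positive ever found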

Confusion matrix

It is a matrix that completely describes the behavior of a model, used in classification problems. In the case of binary classification, the confusion matrix correlates the predicted results with the real ones. Its form is shown in Table 4.2.

                    Predicted Negative    Predicted Positive
    Real Negative   True Negative         False Positive
    Real Positive   False Negative        True Positive

Table 4.2: Confusion matrix for binary classification.

The terms true and false, positive and negative, refer to the goodness of the prediction after comparing it with the real output in the test set. In detail:

True Positive: positive predicted value and positive real value.

True Negative: negative predicted value and negative real value.

False Positive: positive predicted value but negative real value.

False Negative: negative predicted value but positive real value.

Positive is usually understood as the minority value, to which we tend to assign the value 1 (in other words, the event that happens rarely is labeled 1). The previously defined accuracy can be calculated from the confusion matrix in this way:

\[
\text{Accuracy} = \frac{\text{True positive} + \text{True negative}}{\text{True positive} + \text{True negative} + \text{False positive} + \text{False negative}} = \frac{\text{True positive} + \text{True negative}}{\text{Total number of predictions}}
\]
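The formula can be checked directly in code. The sketch below (a Python example with hypothetical label vectors, assuming the usual 0/1 encoding) counts the four cells of the confusion matrix and derives the accuracy from them:

    # Hypothetical true labels and predictions (0 = negative, 1 = positive).
    y_true = [0, 0, 1, 1, 0, 1, 0, 0]
    y_pred = [0, 1, 1, 0, 0, 1, 0, 0]

    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # true negatives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives

    accuracy = (tp + tn) / (tp + tn + fp + fn)
    print(tp, tn, fp, fn, accuracy)  # 2 4 1 1 0.75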

Recall and Precision

These are two measures based on the values of the confusion matrix. They are especially useful in the case of unbalanced datasets, when the measurement of accuracy is not adequate, as described above. Precision and recall are defined as follows:

\[
\text{Precision} = \frac{\text{True positive}}{\text{True positive} + \text{False positive}} = \frac{\text{True positive}}{\text{n. predicted positive}}
\]

\[
\text{Recall} = \frac{\text{True positive}}{\text{True positive} + \text{False negative}} = \frac{\text{True positive}}{\text{n. actual positive}}
\]

These two measures are often calculated for the minority class, but their definition can be adapted to use them with the majority class as well.
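Both definitions translate directly into code (a sketch reusing the tp, fp, fn counts from the confusion-matrix example above):

    def precision(tp, fp):
        # Reliability of a positive prediction.
        return tp / (tp + fp)

    def recall(tp, fn):
        # Fraction of the real positives that the model finds.
        return tp / (tp + fn)

    print(precision(2, 1))  # 0.666...
    print(recall(2, 1))     # 0.666...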

Precision expresses the reliability of a positive prediction. When precision tends to one, a case predicted as positive will almost surely have a positive real output; conversely, when precision tends to zero, a case predicted as positive will almost certainly have a negative real value. Therefore, the closer the precision is to one, the higher the reliability.

Recall, instead, indicates the proportion of the real positive values that the algorithm finds. When the recall tends to one, the algorithm finds almost all the values that are actually positive; when it tends to zero, it finds almost none of them.

There are cases with high precision and low recall: you can trust the algorithm when it predicts a positive, because it will almost always be a real positive, but many real positives will go undetected. On the contrary, there are cases with low precision and high recall: the algorithm predicts many positives that will actually be negatives, but among its predictions it includes almost all the real positive values. The reliability is low, but the behavior can be described as particularly conservative.
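The two extreme situations can be reproduced with hypothetical confusion-matrix counts (chosen only to illustrate the trade-off, reusing the precision and recall functions sketched above):

    # High precision, low recall: few positive predictions, all correct,
    # but most real positives are missed.
    print(precision(tp=2, fp=0), recall(tp=2, fn=8))   # 1.0, 0.2

    # Low precision, high recall: many wrong positive predictions,
    # but almost every real positive is included.
    print(precision(tp=9, fp=41), recall(tp=9, fn=1))  # 0.18, 0.9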

F1 Score

Since precision and recall are often in contrast, this metric condenses the two values into a single number. It can be defined in various ways, but the most common is the harmonic mean of precision and recall:

\[
F_1 = \frac{2PR}{P + R}
\]

where P and R represent the precision and recall values. The relationship between precision and recall is usually non-linear, which is why it is a good idea to use this single performance metric. Sometimes, however, there are situations in which the goal is to reach a high precision rather than a high recall, or vice versa, so the better choice is to apply these metrics case by case.
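A direct transcription of the formula, applied to the two trade-off scenarios above (a sketch; the harmonic mean penalizes a low value in either metric):

    def f1_score(p, r):
        # Harmonic mean of precision (p) and recall (r).
        return 2 * p * r / (p + r)

    print(f1_score(1.0, 0.2))   # 0.333...: perfect precision cannot compensate low recall
    print(f1_score(0.18, 0.9))  # 0.3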

Mean absolute error

The metric analyzed now is again a generic metric, valid for both classification and regression problems, that indicates the difference between the prediction and the real output; in practice, it measures the average prediction error over all the examples of the training set. It is defined in the following way:

\[
MAE = \frac{1}{m} \sum_{i=1}^{m} \lvert y_i - \tilde{y}_i \rvert
\]
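In code the definition is a single line (a sketch using NumPy; y holds the real outputs y_i and y_pred the predictions ỹ_i, with made-up values):

    import numpy as np

    def mean_absolute_error(y, y_pred):
        # Average of the absolute prediction errors over the m examples.
        return np.mean(np.abs(y - y_pred))

    y = np.array([3.0, -0.5, 2.0, 7.0])
    y_pred = np.array([2.5, 0.0, 2.0, 8.0])
    print(mean_absolute_error(y, y_pred))  # 0.5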

Mean squared error

Very similar to the previous one, but it takes into consideration the square of the difference between the true output and the forecast, averaged over the number of examples in the training set. The advantage compared to the previous metric lies in gradient-based optimization: thanks to the square, the calculation of the derivative is greatly simplified and the computation is faster. In addition, the power of two amplifies the larger errors, making the algorithm concentrate on the main losses. The form is as follows:

\[
MSE = \frac{1}{m} \sum_{i=1}^{m} (y_i - \tilde{y}_i)^2
\]
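The same sketch adapted to the squared error (same hypothetical data as the MAE example); note how the large error on the last example now dominates the result:

    import numpy as np

    def mean_squared_error(y, y_pred):
        # Average of the squared prediction errors; larger errors weigh more.
        return np.mean((y - y_pred) ** 2)

    y = np.array([3.0, -0.5, 2.0, 7.0])
    y_pred = np.array([2.5, 0.0, 2.0, 8.0])
    print(mean_squared_error(y, y_pred))  # 0.375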