3 First results
Before presenting the first results we must introduce the “loss value”, which measures the sum of the errors made on each example in the training or validation set. The loss value indicates how well or badly a given model behaves after each optimization iteration; ideally, one expects the loss to decrease over successive iterations. The other quantity to introduce is the number of “epochs”, a hyperparameter that defines how many times the learning algorithm works through the entire training dataset. One epoch means that every sample in the training dataset has had one opportunity to update the model’s internal parameters. Our model was initially trained for 1000 epochs. This is a high value that slows down the learning phase, but it is necessary in order to understand how the model behaves and how we can improve and speed it up.
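For concreteness, the sketch below shows how such a training run could look in Keras. The network architecture, the feature count, the MSE loss, and the random placeholder data are illustrative assumptions, not the actual model and dataset of this work:

```python
import numpy as np
import tensorflow as tf
import matplotlib.pyplot as plt

# Placeholder data standing in for our dataset (feature count is illustrative).
X_train = np.random.rand(1000, 3).astype("float32")
y_train = np.random.rand(1000).astype("float32")

# A small fully connected regression network (layer sizes are illustrative).
model = tf.keras.Sequential([
    tf.keras.layers.Dense(64, activation="relu", input_shape=(3,)),
    tf.keras.layers.Dense(64, activation="relu"),
    tf.keras.layers.Dense(1),
])
model.compile(optimizer="rmsprop", loss="mse", metrics=["mae"])

# One epoch = one full pass over the training set; here 1000 passes.
history = model.fit(X_train, y_train, epochs=1000,
                    validation_split=0.2, verbose=0)

# history.history records the loss after every epoch, for both sets,
# which is what the LOSS-trend figures plot.
plt.plot(history.history["loss"], label="training loss")
plt.plot(history.history["val_loss"], label="validation loss")
plt.xlabel("epoch")
plt.ylabel("loss")
plt.legend()
plt.show()
```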
Figure 3.1.2: LOSS trend over 1000 epochs with standard model
Finding a good learning rate can be difficult: if you set it too high, training may diverge; if you set it too low, training will eventually converge to the optimum, but it will take a long time. If the learning rate is only slightly above the optimal value, training will progress very quickly at first but will end up oscillating around the optimum without ever settling on it. In our case we used an adaptive learning rate via the RMSProp optimizer, but even then it may take some time to stabilize. In addition, with a limited computational budget you may need to stop training before it has properly converged, resulting in a sub-optimal solution.
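Since RMSProp’s behaviour depends on its hyperparameters, the snippet below shows how the optimizer could be configured explicitly for the model from the previous sketch; the learning-rate and rho values shown are the Keras defaults, not values taken from our experiments:

```python
import tensorflow as tf

# RMSProp is an optimizer (not an activation function): it divides each
# gradient by a running average of its recent magnitudes, so the effective
# step size adapts per parameter during training.
optimizer = tf.keras.optimizers.RMSprop(
    learning_rate=1e-3,  # Keras default; too high diverges, too low is slow
    rho=0.9,             # decay factor of the running average of squared gradients
)
model.compile(optimizer=optimizer, loss="mse", metrics=["mae"])
```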
Figures 3.1.1 and 3.1.2 show that, to speed up our model, we should stop the learning algorithm at around 200 epochs, which also avoids obvious overfitting. There are several ways to avoid overfitting:
- Retraining: train the same model on the same training set several times with different initial weights, then choose the network with the best performance
- Multiple neural networks: train several networks in parallel with the same structure but different initial weights, then average their outputs to obtain the final prediction
- Regularization: reduce the biases and weights by adding a penalty term to the error function, which yields smoother outputs with a lower tendency to overfit
- Tuning the performance ratio: similar to regularization, but uses a dedicated parameter that indicates how much the network needs to be regularized
- Early stopping: monitor the error (loss) behaviour after each iteration and stop training when overfitting starts.
Obviously we will use the option that is easiest to implement in Python code, automatically stopping training when the validation result no longer improves. We will use the EarlyStopping callback, which checks the training condition at each epoch: if a given number of epochs pass without improvement, training is stopped automatically. The results of applying this callback are shown in Figures 3.1.3 and 3.1.4 and reveal an improvement in computation time as well as the avoidance of overfitting.
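A minimal sketch of this early-stopping setup, reusing the model and placeholder data from the earlier sketches, is shown below; the patience of 10 epochs is an illustrative assumption, not the value used in our runs:

```python
import tensorflow as tf

# Stop as soon as the validation loss has not improved for `patience` epochs,
# and roll the weights back to the best epoch seen so far.
early_stop = tf.keras.callbacks.EarlyStopping(
    monitor="val_loss",
    patience=10,               # illustrative; tune to the problem
    restore_best_weights=True,
)

history = model.fit(X_train, y_train,
                    epochs=1000,        # upper bound, rarely reached
                    validation_split=0.2,
                    callbacks=[early_stop],
                    verbose=0)
print(f"training stopped after {len(history.history['loss'])} epochs")
```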
Figure 3.1.3: MAE trend over 250 epochs with standard model
The graph shows that the error on the validation set settles around ±1 [cm/s]. This is a satisfactory result, considering that our neural network has not yet been optimised. It should also be noted that there is still margin for improvement, because small oscillations remain around 200 epochs.
Figure 3.1.4: LOSS trend over 250 epochs with standard model
The loss value also decreases over the 200 epochs. Both training loss and validation loss decrease exponentially as the number of epochs increases, suggesting that the model gains accuracy as the number of epochs (i.e. forward and backward passes) grows. Again, there is room for improvement.
Having seen how our algorithm performs in the training phase, we now need to verify its performance on our test dataset, remembering that it was used neither for learning nor for validation. The model must apply the hidden non-linear correlations learned from the training dataset to the test set. The graph below presents the speed obtained from the test set on the x-axis and the speed predicted by the trained model on the y-axis.
Figure 3.1.5: Results obtained by testing our standard algorithm
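A plot of this kind can be produced with a few lines, as sketched below; X_test and y_test are assumed names for the held-out test arrays, and the dashed identity line is used as the reference a perfect predictor would follow:

```python
import matplotlib.pyplot as plt

# Predict on data the network has never seen.
y_pred = model.predict(X_test).flatten()

plt.scatter(y_test, y_pred, s=10, alpha=0.6)
lims = [min(y_test.min(), y_pred.min()), max(y_test.max(), y_pred.max())]
plt.plot(lims, lims, "k--", label="perfect prediction")  # identity reference
plt.xlabel("actual speed [cm/s]")
plt.ylabel("predicted speed [cm/s]")
plt.legend()
plt.show()
```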
Our model appears to predict well, as evidenced by the many points lying close to the regression line. This shows that we can use deep learning to capture the non-linear relationships of the laminar velocity phenomenon in methane-air mixtures. Figure 3.1.6 shows how the error between actual and predicted values is distributed:
Figure 3.1.6: Error distribution obtained by testing our standard algorithm
The distribution of our error resembles a Gaussian. The model fits quite well and does not show large errors between predicted and actual values. In general, the error between predicted and actual values stays below 5 cm/s; the largest discrepancies occur around high velocity values, where the difference can reach 10-20 cm/s.
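For completeness, a histogram like the one in Figure 3.1.6 can be obtained from the same predictions, as in the sketch below (the bin count is arbitrary):

```python
import matplotlib.pyplot as plt

# Signed prediction error for every test sample.
errors = y_pred - y_test

plt.hist(errors, bins=25)  # bin count is arbitrary
plt.xlabel("prediction error [cm/s]")
plt.ylabel("count")
plt.show()
```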