
For our project we decided to balance the training data-set's class distribution, focusing on a down-sampling approach. It would be of great interest in the future to also investigate up-sampling techniques. Generative Adversarial Networks (GANs) have demonstrated good skill at generating synthetic images that closely resemble real ones. Since our input data share similarities with the image data format, this architecture could also be applied to our weather forecasting problem to generate synthetic examples related to extreme events: GANs could be able to simulate the spatial patterns that characterize such examples.

A Pearson Correlation Coefficient

The Pearson correlation coefficient is the ratio between the covariance of two random variables and the product of their standard deviations. It can be considered the normalized version of the covariance, and it takes values between -1 and 1. Like the covariance, this coefficient can only identify the level of linear correlation between two random variables: two variables with a low Pearson coefficient can still be highly correlated in a non-linear way.

Using a sample from two random variables X and Y, the Pearson coefficient can be expressed in the following way:

r_{xy} = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum_{i=1}^{n} (x_i - \bar{x})^2} \, \sqrt{\sum_{i=1}^{n} (y_i - \bar{y})^2}}

where:

• n is the number of entries in the sample

• x_i and y_i are the i-th entries

• x̄ and ȳ are the sample means
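As an illustration, here is a minimal NumPy sketch of this formula (the function name is ours; np.corrcoef computes the same quantity):

```python
import numpy as np

def pearson_r(x, y):
    """Sample Pearson correlation coefficient between two 1-D arrays."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    dx = x - x.mean()  # deviations from the sample mean of x
    dy = y - y.mean()  # deviations from the sample mean of y
    return np.sum(dx * dy) / (np.sqrt(np.sum(dx**2)) * np.sqrt(np.sum(dy**2)))
```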

B Predictors that characterize the NWP deterministic forecasts we have worked with

1. 10 meter wind speed
2. Specific humidity height above the ground 2
3. Relative humidity height above the ground 2
4. Surface roughness height above the ground 810
5. Surface roughness height above the ground 811
6. Momentum flux height above the ground 760
7. Momentum flux height above the ground 770
8. Momentum flux height above the ground 0
9. AROME hail diagnostic height above the ground 0
10. Lifting condensation level (LCL) height above the ground 0
11. Level of free convection (LFC) height above the ground 0
12. Level of neutral buoyancy (LNB) height above the ground 0
13. Vertical velocity hybrid 32
14. TKE hybrid 47
15. Convective cloud cover height above the ground
16. Low cloud cover height above the ground 0
17. Medium cloud cover height above the ground 0
18. High cloud cover height above the ground 0
19. Wind components height above the ground 10
20. Wind components isobaric hPa 925
21. Wind components isobaric hPa 850
22. Specific humidity isobaric hPa 850
23. Pressure height above sea 0
24. Geopotential height isobaric hPa 850
25. Geopotential height isobaric hPa 700
26. Geopotential height isobaric hPa 500
27. Momentum of gusts out of the model height above the ground 10

C Variables considered during our Feature Selection and Cross-Validation procedure

1. 10 meter wind speed
2. Specific humidity height above the ground 2
3. Relative humidity height above the ground 2
4. Surface roughness height above the ground 810
5. Momentum flux height above the ground 770
6. AROME hail diagnostic height above the ground 0
7. Lifting condensation level (LCL) height above the ground 0
8. Vertical velocity hybrid 32
9. Momentum of gusts out of the model height above the ground 10
10. High cloud cover height above the ground 0
11. Geopotential height isobaric hPa 500
12. TKE hybrid 47

D Heaviside Function

The Heaviside step function, or unit step function, is a discontinuous function. Its output is 1 when the input is greater than or equal to zero, and 0 when the input is strictly negative:

H(x) = \begin{cases} 1 & \text{if } x \geq 0 \\ 0 & \text{if } x < 0 \end{cases}

The Heaviside step function can also be defined using a different threshold instead of 0.
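NumPy exposes this function directly; a small sketch, where we set the value at x = 0 to 1 to match the definition above:

```python
import numpy as np

x = np.array([-2.0, 0.0, 3.5])
H = np.heaviside(x, 1.0)  # the second argument is the value returned at x = 0
print(H)  # [0. 1. 1.]
```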

E Binary Cross Entropy Loss

The Binary Cross Entropy Loss can be defined as:

B = -\frac{1}{N} \sum_{i=1}^{N} \left[ y_i \log(p_i) + (1 - y_i) \log(1 - p_i) \right]

where:

• B is the Binary Cross Entropy Loss

• N is the batch-size parameter.

• p_i is the probability predicted by the Network for the i-th sample. Remember that the output of our Network, using the Sigmoid Activation Function, is the probability that a sample belongs to the positive class, or class 1 (we are considering a binary classification problem, where the classes can be 0 or 1).

• y_i is the class to which the i-th sample actually belongs. The classes can be 0 or 1.

The logarithm is used because it gives a small penalty for small differences and a high penalty for large differences. Since the probabilities are between 0 and 1, the log values are negative; to compensate, we take the negative of the average.
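As a minimal sketch of this loss in NumPy (the function name and the epsilon guard against log(0) are ours; the guard is a common implementation detail):

```python
import numpy as np

def binary_cross_entropy(y_true, p_pred, eps=1e-7):
    """Mean binary cross entropy over a batch of N samples."""
    p = np.clip(p_pred, eps, 1.0 - eps)  # keep probabilities away from 0 and 1
    return -np.mean(y_true * np.log(p) + (1.0 - y_true) * np.log(1.0 - p))
```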

F F1-score

The F1-score is obtained by combining two other metrics, precision and recall. It can be defined as their harmonic mean:

F1 = 2 \cdot \frac{\text{precision} \cdot \text{recall}}{\text{precision} + \text{recall}}

where:

• Precision is the fraction of true positive examples among all the examples that the model classifies as positive

• Recall is the fraction of examples classified as positive among the total number of actually positive examples

Differently from other metrics such as the Brier Score or the Accuracy Score, in a binary classification scenario the F1-score gives us a general idea of the model's performance on both classes.
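A minimal sketch computing precision, recall and the F1-score from binary predictions (the names are ours, and we assume at least one predicted positive and one actual positive so that no denominator is zero; libraries such as scikit-learn provide an equivalent f1_score):

```python
import numpy as np

def f1_score_binary(y_true, y_pred):
    """F1-score for binary labels in {0, 1}."""
    tp = np.sum((y_pred == 1) & (y_true == 1))  # true positives
    fp = np.sum((y_pred == 1) & (y_true == 0))  # false positives
    fn = np.sum((y_pred == 0) & (y_true == 1))  # false negatives
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)
```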

G Brier Score

The Brier Score is used to measure the accuracy of probabilistic predictions. It can be applied to evaluate the performance of a model whose outputs are probabilities assigned to a set of mutually exclusive discrete categories or classes. The Brier Score is a cost function. Given a set of examples i ∈ {1, ..., N}, the Brier Score measures the mean squared difference between the predicted probabilities for the set of possible outcomes of each entry i and the actual outcomes. Therefore, the lower the Brier Score for a set of predictions, the better calibrated the predictions are. Given a binary classification problem, the Brier Score can be defined in the following way:

BS = \frac{1}{N} \sum_{i=1}^{N} (p_i - y_i)^2

where:

• p_i is the predicted probability for the i-th example.

• y_i is the actual outcome, which can have value 0 or 1 (we are considering a binary classification problem).

• N is the number of examples we are considering.

This definition can be used only if we are considering binary events (for example, wind gust speed above the threshold or not).
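A minimal NumPy sketch of this definition (the function name is ours):

```python
import numpy as np

def brier_score(y_true, p_pred):
    """Mean squared difference between predicted probabilities and binary outcomes."""
    return np.mean((np.asarray(p_pred) - np.asarray(y_true)) ** 2)
```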

H Results of the Cross-Validation and Feature Selection procedure

Table 2: Results using 1 and 2 variables

Features | Best hyper-parameters | Best mean F1-score
10m wind speed | batch-size=256, initial learning-rate=10^-4 | 0.7540336762603642
Specific humidity height above the ground 2 | batch-size=128, initial learning-rate=10^-4 | 0.651250810629865
Relative humidity height above the ground 2 | batch-size=128, initial learning-rate=10^-4 | 0.616317447733764
Surface roughness height above the ground 810 | batch-size=128, initial learning-rate=10^-4 | 0.6725195086718044
Momentum flux height above the ground 770 | batch-size=256, initial learning-rate=10^-4 | 0.7435976906486009
AROME hail diagnostic height above the ground 0 | batch-size=128, initial learning-rate=10^-4 | 0.30845142739782294
Lifting condensation level height above the ground 0 | batch-size=128, initial learning-rate=10^-4 | 0.5603352928567604
Vertical velocity hybrid 32 | batch-size=128, initial learning-rate=10^-4 | 0.5596010925674355
TKE hybrid 47 | batch-size=128, initial learning-rate=10^-4 | 0.6407367440801214
High cloud cover height above the ground 0 | batch-size=256, initial learning-rate=10^-4 | 0.18960008101843293
Momentum of gusts out of the model 10m | batch-size=256, initial learning-rate=10^-4 | 0.7575750413510286
Geopotential height isobaric hPa 500 | batch-size=128, initial learning-rate=10^-4 | 0.6107486121171731
Momentum of gusts out of the model 10m + 10m wind speed | batch-size=256, initial learning-rate=10^-4 | 0.7623877996958339
Momentum of gusts out of the model 10m + Specific humidity height above the ground 2 | batch-size=128, initial learning-rate=10^-5 | 0.7490948508936706
Momentum of gusts out of the model 10m + Relative humidity height above the ground 2 | batch-size=128, initial learning-rate=10^-4 | 0.7679143075093817
Momentum of gusts out of the model 10m + Surface roughness height above the ground 810 | batch-size=256, initial learning-rate=10^-4 | 0.7649268149280378
Momentum of gusts out of the model 10m + Momentum flux height above the ground 770 | batch-size=256, initial learning-rate=10^-4 | 0.759729343574921
Momentum of gusts out of the model 10m + AROME hail diagnostic height above the ground 0 | batch-size=128, initial learning-rate=10^-4 | 0.7395058162522723
Momentum of gusts out of the model 10m + Lifting condensation level height above the ground 0 | batch-size=256, initial learning-rate=10^-4 | 0.7514691283026709
Momentum of gusts out of the model 10m + Vertical velocity hybrid 32 | batch-size=256, initial learning-rate=10^-4 | 0.7219321396504131
Momentum of gusts out of the model 10m + TKE hybrid 47 | batch-size=128, initial learning-rate=10^-4 | 0.7449201414722832
Momentum of gusts out of the model 10m + High cloud cover height above the ground 0 | batch-size=256, initial learning-rate=10^-4 | 0.7449969263495483
Momentum of gusts out of the model 10m + Geopotential height isobaric hPa 500 | batch-size=128, initial learning-rate=10^-4 | 0.762371842161128

I Categorical Cross Entropy Loss

The Categorical Cross Entropy loss is used for multi-class classification problems and is defined for each example in the following way:

-\sum_{i=1}^{n} y_i \log(\tilde{y}_i)

where:

• n is the number of classes

• \tilde{y}_i is the model output for class i or, when using a Softmax activation function, the probability for the input example to belong to class i.

• y_i is the target value for class i

If we use a Softmax activation function, the target value for the class to which the example belongs is 1, while the target value for the other classes is 0. In the case of the Quantized Softmax technique, the target value for the bin into which the input example falls is 1, while the target value for the other bins is 0.
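A minimal NumPy sketch of this loss for a single example with one-hot targets (the names and the epsilon guard against log(0) are ours):

```python
import numpy as np

def categorical_cross_entropy(y_true, y_prob, eps=1e-7):
    """Cross entropy for one example: y_true is one-hot, y_prob is the softmax output."""
    p = np.clip(y_prob, eps, 1.0)  # keep probabilities away from 0
    return -np.sum(y_true * np.log(p))

# Example with 3 classes; the example belongs to class 1
loss = categorical_cross_entropy(np.array([0.0, 1.0, 0.0]),
                                 np.array([0.2, 0.7, 0.1]))
print(loss)  # -log(0.7), approximately 0.357
```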

J CRPS loss

The CRPS loss is defined in the following way:

\text{CRPS} = \int_{-\infty}^{+\infty} \left( F(x) - H(x - y) \right)^2 \, dx

where:

• y is the target value

• F is the estimated cumulative distribution function

• H is the Heaviside step function (see Appendix D).

The CRPS loss does not focus on a single point of the distribution, but allows us to consider the whole distribution at the same time.
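As an illustration, here is a sketch that approximates the integral numerically on a finite grid, assuming a Gaussian predictive distribution (the distribution choice, the grid parameters and the names are ours; the grid must be wide enough to contain the target y):

```python
import numpy as np
from scipy.stats import norm

def crps_gaussian_numeric(mu, sigma, y, half_width=10.0, n_points=10001):
    """Approximate the CRPS for a Gaussian forecast N(mu, sigma^2) and target y."""
    x = np.linspace(mu - half_width * sigma, mu + half_width * sigma, n_points)
    F = norm.cdf(x, loc=mu, scale=sigma)  # estimated cumulative distribution function
    H = np.heaviside(x - y, 1.0)          # Heaviside step at the observed value y
    return np.trapz((F - H) ** 2, x)      # numerical integration of the squared gap
```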

K Quantile Loss

The quantile loss can be defined as:

L_\tau = \max\left[ \tau (y - z), (\tau - 1)(y - z) \right]

where:

• y is the target value

• z is the estimated quantile of order τ
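A minimal NumPy sketch of this loss, averaged over a batch (the names are ours):

```python
import numpy as np

def quantile_loss(y_true, z_pred, tau):
    """Pinball loss for quantile level tau, averaged over the batch."""
    diff = np.asarray(y_true) - np.asarray(z_pred)
    return np.mean(np.maximum(tau * diff, (tau - 1.0) * diff))

# Example: for tau = 0.9, under-predictions are penalized more heavily
loss = quantile_loss(np.array([10.0, 12.0]), np.array([9.0, 13.0]), tau=0.9)
```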


