5 Results
5.2 Relation between blue fraction of uWF and climatology in Pino Torinese and at
The model Model_soil_water_balwnce_irrigated can be computationally demanding because it requires several input parameters to be able to run (climatic region, precipitation, ET0, soil conditions, etc.), also, it is not available to all the community of people interested in the investigation of crop water footprint. Therefore, it is worth to investigate an alternative way to obtain green and blue evapotranspiration without relying on all such input parameters. An
effective option can be the implementation of a multi regression analysis, where evapotranspiration is supposed to be dependent by a linear combination of some input parameters, for example precipitation P and reference evapotranspiration ET0, which are the parameters that can be mainly accessible to final users.
So, before jumping into the details of this multi regression analysis, we firstly analyse these two variables, by plotting their time series and checking how their evolution behaves in time.
Figure 5. 8: time series of P and ET0, 1970‐2019 on Pino Torinese
From this plot, we notice that, while ET0 remains approximately constant over time (except for the changes induced by climate change, which generate a mild increasing trend), precipitation pattern looks irregular, with very sharp fluctuations, hence it’s hard to see a clear trend.
Therefore, we apply a statistical inference on their time series, checking the level of significance of their trends. The results we obtain are coherent from the previous expectations: for ET0, |𝑇|>
𝑡 , thus we have a statistically significant trend, while for precipitation, the trend is not statistically significant. In fact, this variable has a too low angular coefficient of the linear regression model with respect to the high standard deviation of the oscillations, therefore it is not possible to state that this variable is characterised by a significant trend.
We now perform a multi regression analysis, trying to estimate evapotranspiration from the historical series of precipitation and ET0. More specifically, we suppose that the ratio of ETb
over ETa can be described with a regressive law, namely
𝐸𝑇
𝐸𝑇 𝐴 𝐴 ∗ 𝑃 𝐴 ∗ 𝐸𝑇
(5. 5)
where P and ET0 are respectively the precipitation and reference evapotranspiration time series in that pixel in the time interval 1970-2019, while 𝐴 , 𝐴 , 𝐴 are a set of constant coefficients.
In order to be able to determine the ratio of the two evapotranspiration parameters, we need to estimate the three coefficients applying the Least Square Method, to minimize the sum of the squared errors. We want to use 𝐸𝑇 𝐸𝑇⁄ instead of 𝐸𝑇 𝐸𝑇⁄ because, looking at the time series of the two ratios, we see that blue over green ET series shows much greater fluctuations with respect to 𝐸𝑇 𝐸𝑇⁄ (see Annexes, Figure 7. 5); since it’s too hard to reconstruct the annual variability of such an unpredictable variable, we focus of the ratio between blue and actual evapotranspiration, which instead exhibits weaker oscillations.
As mentioned before, the three coefficients are determined by minimising the sum of the squared prediction errors, that is by minimising the sum of the squared distances between the predicted and expected data points. To perform this operation, we need to identify a part of the dataset that will be used to “train” the model by finding the most appropriate coefficient of the linear regression and another one that performs the testing of the results. Therefore, the dataset is randomly split into training and testing data, where the training dataset contains 75% of data, while the testing dataset the remaining 25%. Since this operation is performed by a random selection of the data inside the series, we want to repeat this procedure for a certain number of times (in this case, we opt for 200 iterations), in order to have a stronger evaluation of the prediction error of the model. Then, the training set such as the testing set are identified: in this case, the training and testing X correspond to P and ET0, while the Y is associated to the ratio between ETb and ETa. Afterward, the Least Square Method is applied on the training dataset and the three coefficients 𝐴 , 𝐴 , 𝐴 are determined, having values of -1.5, -8.7*10-5 and 3.3*10-3 respectively. At this point, the three coefficients are used on the testing dataset, to create an estimation of the ratio 𝐸𝑇 𝐸𝑇⁄ , which is then compared with the true values (ytest) in a scatter plot, and the RMSE (Root Mean Square Error) between predicted and expected data is computed for each of the 200 iterations, with the equation
𝑅𝑀𝑆𝐸 𝑦 𝑦
𝑛
(5. 6)
where 𝑦 is the predicted 𝐸𝑇 𝐸𝑇⁄ ratio applying the regression model and 𝑦 is the expected 𝐸𝑇 𝐸𝑇⁄ ratio.
Figure 5. 9: scatter plot of predicted ETb/ETa ratio by the linear regression versus expected ETb/ETa (200 iterations)
From the scatter plot, we can see that the model provides rather well aligned results, considering that the dataset contains a limited number of data (only 50 data, as 50 are the years of the time-interval) and the only input parameters are P and ET0. By plotting the histogram of the error, we see that the pattern looks quite as a normal distribution, with the mean of RMSE corresponding almost to the peak of the curve, between 0.093 and 0.1, a mean coefficient of variation of RMSE that is around 0.34 and a coefficient of determination (R2) which is 0.41. In addition, looking at the three coefficients of the parametric equation, it can be clearly observed that the coefficient 𝐴 , that refers to precipitation, has a lower weight with respect to 𝐴 , the one of ET0, in fact the first has an order of 10-5 – 10-4, while the second a dimension of 10-3. The analysis can be extended at the spatial scale, thus aiming at the estimation of the spatial heterogeneities of the ratio ETb/ETa over all the world’s cells of a single year. We extract the values of ETg, ETb, ET0 and precipitation of year 2000 in all the pixels containing irrigated areas, then the four arrays are grouped in a single matrix named variables. At this point, the procedure for the creation of the regression model becomes equal to the one done for Pino Torinese, therefore with the random splitting of the dataset into training and testing, the determination of the three coefficients 𝐴 , 𝐴 , 𝐴 (which account for 5.3*10-3, -1.8*10-4, 4.13*10-4 respectively) and the estimation of ET ratio. The scatter plot between true versus expected data is then represented; it can be seen that there is a less clear correspondence between expected versus predicted data, particularly due to the very high amount of data. This
time, the estimation provides worse results of mean cv_RMSE (0.80), while the R2 is slightly higher, around 0.43. The analysis may be further deepened by performing a spatio-temporal regression, that is by estimating the temporal evolution of ETb/ETa ratio for all the pixels of the world, however, from the already obtained results, we can state that the best fitting results are found in the temporal variability rather than in the spatial one. In fact, the estimation of green and blue evapotranspiration at the spatial level must take into account different climatic regions and different crop growing periods, therefore the only information of annual precipitation and ET0 is not able to describe the irrigation requirements. Instead, by performing the multi-regression analysis at the temporal scale, there is no variability of such variables, therefore we can have a slightly better estimation of the irrigation needs.
Figure 5. 10: scatter plot of expected versus true ETb/ETa ratio in all the pixels of the world, 2000