
Description and discussion of resulting LUR models

6.3 Development of a LUR model for the Province of Parma

6.3.2 Description and discussion of resulting LUR models

Figure 33 – Spatial distribution of residential addresses in the Parma municipality, together with passive sampler locations and the boundary of the urban area to which model B is applied.

roads within 50 m adds 10.4% of explained variance to the model. The presence of industrial land use and buildings yields only minor improvements in the model R2. The quality tests for the model were good and the R2LOO was 83.1% (Figure 34). Model A and model C share 80% of the observations.

Table 27 – LUR models resulting from the supervised stepwise regression algorithm. ΔR2 is the gain in adjusted R2 obtained by adding the variable to the model.

| LUR model | Adjusted R2 | Predictor | Coefficient | p-value | ΔR2 |
|---|---|---|---|---|---|
| All sites (A) | 90.3% | indu500.5000 | 1.77E-06 | <0.001 | 70.4% |
| | | altitude.sq | -7.93E-01 | <0.001 | +9.5% |
| | | majload25 | 3.97E-05 | <0.001 | +4.4% |
| | | indu500 | 3.56E-05 | <0.001 | +3.2% |
| | | bvol100 | 5.62E-05 | <0.001 | +1.3% |
| | | distmaj | -1.67E-02 | <0.001 | +1.4% |
| | | (intercept) | 2.73E+01 | <0.001 | - |
| Urban (B) | 83.1% | majlen25 | 1.48E-01 | <0.01 | 29.3% |
| | | indu2000 | 9.98E-06 | <0.001 | +14.6% |
| | | majload25 | 2.78E-05 | <0.01 | +9.1% |
| | | bvol100 | 6.03E-05 | <0.01 | +5.2% |
| | | ldres2000 | 2.33E-05 | <0.001 | +11.5% |
| | | majlen25.300 | 4.74E-03 | <0.01 | +5.6% |
| | | urbgreen2000 | -9.12E-06 | <0.01 | +4.2% |
| | | bhgt25 | 6.12E-01 | <0.01 | +3.7% |
| | | (intercept) | 4.45E+00 | 0.43 | - |
| Extra-urban (C) | 86.8% | altitude.sq | -8.55E-01 | <0.001 | 69.2% |
| | | alload50 | 1.50E-05 | <0.001 | +10.4% |
| | | indu2000 | 7.85E-06 | <0.001 | +4.8% |
| | | distmaj | -1.12E-02 | <0.01 | +1.3% |
| | | bvol100 | 4.15E-05 | <0.01 | +1.0% |
| | | (intercept) | 2.94E+01 | <0.001 | - |
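As an illustration of how the coefficients in Table 27 turn into a prediction, the sketch below evaluates the extra-urban model (C) as a plain linear combination. The coefficients are copied from the table; the function name `predict_no2` and the predictor values of the example site are hypothetical, and the units of each predictor are assumed to be those used when the model was fitted, which the table does not state.

```python
# Coefficients of the extra-urban model (C), copied from Table 27.
MODEL_C = {
    "(intercept)": 2.94e+01,
    "altitude.sq": -8.55e-01,
    "alload50": 1.50e-05,
    "indu2000": 7.85e-06,
    "distmaj": -1.12e-02,
    "bvol100": 4.15e-05,
}


def predict_no2(coefficients, predictors):
    """Evaluate a linear LUR model: intercept + sum(coefficient * predictor)."""
    value = coefficients["(intercept)"]
    for name, coef in coefficients.items():
        if name == "(intercept)":
            continue
        value += coef * predictors[name]
    return value


# Hypothetical site: these predictor values are invented for illustration only.
site = {
    "altitude.sq": 10.0,
    "alload50": 200000.0,
    "indu2000": 500000.0,
    "distmaj": 100.0,
    "bvol100": 20000.0,
}
print(round(predict_no2(MODEL_C, site), 1))
```

With all predictors set to zero the function returns the intercept, which is a quick check that the coefficients were entered correctly.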

Table 28 – Quality tests for the three LUR models of Table 27.

| Model | Multicollinearity | Residual normality | Influential observations | Residual autocorrelation | Leave-one-out cross validation |
|---|---|---|---|---|---|
| All sites (A) | all VIF < 2.3 | NO (Shapiro-Wilk p<0.01) | NO (max Cook's D = 0.32) | NO (Moran I = 0.03, p=0.14) | R2LOO = 88.6%; ERRLOO = [-25,+18]; RMSELOO = 6.1; FAC2 = 96.6% |
| Urban (B) | all VIF < 2.6 | YES (Shapiro-Wilk p=0.28) | NO (max Cook's D = 0.74) | NO (Moran I = 0.02, p=0.09) | R2LOO = 72.2%; ERRLOO = [-18,+11]; RMSELOO = 6.9; FAC2 = 100.0% |
| Extra-urban (C) | all VIF < 1.6 | YES (Shapiro-Wilk p=0.07) | NO (max Cook's D = 0.81) | NO (Moran I = -0.01, p=0.30) | R2LOO = 83.1%; ERRLOO = [-14,+16]; RMSELOO = 4.9; FAC2 = 95.0% |

VIF = variance inflation factor; R2LOO = R2 between predicted and measured values; ERRLOO = range of errors between predicted and measured values; RMSELOO = root mean square error; FAC2 = fraction of predictions within a factor of 2 of observations.
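The multicollinearity check in Table 28 relies on the variance inflation factor. A minimal NumPy sketch of the standard definition, VIF_j = 1 / (1 - R2_j), is given below, where R2_j comes from regressing predictor j on all the other predictors. The function name and the synthetic data are assumptions made for illustration; this is not the code used in the study.

```python
import numpy as np


def variance_inflation_factors(X):
    """Return the VIF of each column of X (n_sites x n_predictors)."""
    X = np.asarray(X, dtype=float)
    n, p = X.shape
    vifs = np.empty(p)
    for j in range(p):
        y = X[:, j]
        # Regress predictor j on the remaining predictors plus an intercept.
        others = np.delete(X, j, axis=1)
        A = np.column_stack([np.ones(n), others])
        beta, *_ = np.linalg.lstsq(A, y, rcond=None)
        resid = y - A @ beta
        ss_res = float(resid @ resid)
        ss_tot = float(((y - y.mean()) ** 2).sum())
        r2 = 1.0 - ss_res / ss_tot
        vifs[j] = 1.0 / (1.0 - r2)
    return vifs


# Synthetic example: three independent predictors give VIFs close to 1.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
print(variance_inflation_factors(X))
```

Values close to 1 indicate that a predictor is nearly uncorrelated with the others; the thresholds reported in Table 28 (all VIF below 2.6) are read on this scale.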

Figure 34 – Leave-one-out cross validation (LOO-CV) for models A, B and C. The scatter plots compare the concentration measured at each site n with the concentration predicted by the model fitted on the remaining N-1 sites.

R2 = coefficient of determination; RMSE = root mean square error; FAC2 = fraction of predictions within a factor of two of observations.

The models obtained in this case study have R2 values and predictors comparable to those of many previously published models (Beelen et al., 2013; Dons et al., 2014; Hoek et al., 2008). Models developed in the ESCAPE project had adjusted R2 values varying between 55% and 92% and included 2 to 7 predictors, among which traffic intensity within 100 m or less was the most common traffic variable. The LOO-CV performance of the Parma models was also comparable to that of the ESCAPE models, where R2LOO was generally less than 10% lower than the model R2 (Beelen et al., 2013).
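The leave-one-out procedure behind R2LOO, RMSELOO, ERRLOO and FAC2 can be sketched as follows. This is a minimal NumPy illustration, not the code used in the study. The helper names are hypothetical, and three assumptions are made: the set of predictors stays fixed while the coefficients are re-estimated on the N-1 remaining sites, R2 is the squared Pearson correlation between measured and predicted values, and errors are computed as predicted minus measured.

```python
import numpy as np


def fit_ols(X, y):
    """Ordinary least squares with an intercept; returns [intercept, slopes...]."""
    A = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    return beta


def predict_ols(beta, X):
    """Predict for one site (1-D) or several sites (2-D)."""
    X = np.atleast_2d(X)
    return beta[0] + X @ beta[1:]


def loo_cv(X, y):
    """Predict each site with a model fitted on all the other sites."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    n = len(y)
    pred = np.empty(n)
    for i in range(n):
        keep = np.arange(n) != i
        beta = fit_ols(X[keep], y[keep])
        pred[i] = predict_ols(beta, X[i])[0]
    return pred


def validation_metrics(obs, pred):
    """R2, RMSE, error range and FAC2 between measured and predicted values."""
    obs = np.asarray(obs, dtype=float)
    pred = np.asarray(pred, dtype=float)
    err = pred - obs  # assumed sign convention: predicted minus measured
    ratio = pred / obs
    return {
        "R2": np.corrcoef(obs, pred)[0, 1] ** 2,
        "RMSE": np.sqrt(np.mean(err ** 2)),
        "ERR": (err.min(), err.max()),
        "FAC2": np.mean((ratio >= 0.5) & (ratio <= 2.0)),
    }
```

Calling `validation_metrics(y, loo_cv(X, y))` on the site-by-predictor matrix `X` and the measured concentrations `y` returns the four quantities reported per model in Table 28.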

Models B and C both performed poorly in the external validation (Figure 35, Figure 36). This was expected, since they were calibrated in very different contexts. The urban model (B) overpredicts most of the concentrations at the extra-urban passive samplers (Figure 35): 61% of the predicted values are within a factor of two of the observations (FAC2) and the R2 between observed and predicted values is 0.25. Truncation of predictors worsens the performance of model B, and the R2 decreases to 0.12. Concentrations predicted with the extra-urban model (C) are more uniformly dispersed around the 1:1 line (Figure 36), with two outliers that are strongly overestimated by the model. The R2 between measured and predicted values is 0.13 for model C, but rises to 0.47 if predictors are truncated. Thus, model C predicts urban concentrations better than model B predicts extra-urban concentrations, especially with truncated predictors, although its performance remains weak. Overall, model performance in in-sample cross validation (LOO-CV) was better than in out-of-sample external validation.
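This section does not define truncation. The sketch below assumes it means clipping each predictor at the application sites to the range observed at the sites used to calibrate the model, so that the model is never extrapolated beyond its calibration range. The function names and all predictor values are hypothetical and serve only to illustrate the idea.

```python
def calibration_ranges(training_sites):
    """Return {predictor: (min, max)} over the sites used to fit the model."""
    names = training_sites[0].keys()
    return {
        name: (
            min(site[name] for site in training_sites),
            max(site[name] for site in training_sites),
        )
        for name in names
    }


def truncate(site, ranges):
    """Clip each predictor of one site to its calibration range."""
    return {
        name: min(max(value, ranges[name][0]), ranges[name][1])
        for name, value in site.items()
    }


# Invented values: two urban calibration sites and one extra-urban target site.
urban_sites = [
    {"majlen25": 0.0, "bhgt25": 6.0},
    {"majlen25": 40.0, "bhgt25": 18.0},
]
ranges = calibration_ranges(urban_sites)
extra_urban_site = {"majlen25": 55.0, "bhgt25": 0.0}
print(truncate(extra_urban_site, ranges))
# prints {'majlen25': 40.0, 'bhgt25': 6.0}
```

Under this assumption the truncated predictors, rather than the original ones, would be passed to the model before comparing predictions with measurements.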

Figure 35 – Application of the urban model B to predict NO2 concentrations at the extra-urban sampling sites.

Model B is applied with original (left) and truncated (right) predictors.

R2 = coefficient of determination; RMSE = root mean square error; FAC2 = fraction of predictions within a factor of two of observations.

Figure 36 – Application of the extra-urban model C to predict NO2 concentrations at the urban sampling sites.

Model C is applied with original (left) and truncated (right) predictors.

R2 = coefficient of determination; RMSE = root mean square error; FAC2 = fraction of predictions within a factor of two of observations.

The purpose of LUR modelling is not to study causal relationships between single predictors and NO2 concentrations, but rather to obtain models that can predict concentrations at unsampled locations, i.e. LUR models are predictive rather than explanatory models (Sainani, 2014; Shmueli, 2010). The magnitude of association and the statistical significance of each single predictor are not the focus of LUR analysis.

Nevertheless, it is important that the predictors included in each model have a scientific rationale and it is interesting to analyze the differences between predictors that were selected in the three models.

The variability of NO2 concentrations over the whole Province (model A) is almost entirely explained by the variable describing industrial land use within a buffer of 5 km and by altitude above sea level. It is unlikely that these two predictors can correctly represent the short-range variability in NO2 concentrations in the Parma urban area. It is also improbable that the presence of industries can explain such a large fraction of NO2 variability in the Province of Parma: more probably the predictor indu5000 is a proxy for other pollution sources, since it is highly correlated with many other variables describing traffic (e.g., majload1000, alload500) and population (e.g., pop5000, pdens2000, res5000).

The model developed on the urban samplers (B) is based on predictors with small-scale variability (e.g. road length and traffic load within 25 m, land use within 2 km, building characteristics within 100 m and 25 m). Interestingly, the model incorporates two predictors that may explain typical urban phenomena: (i) the presence of urban green areas, which reduces local air pollution and (ii) the average building height within 25 m, which may indicate the presence of some urban canyon effect in the city center (Eeftens et al., 2013).

In the extra-urban model (C), altitude a.s.l. is the most important predictor, indicating the presence of a strong North-South gradient in NO2 concentrations over the region. Altitude a.s.l. is correlated with many other predictors (e.g., Pearson’s correlations with hdres5000, indu5000, disthload and pop5000 are -0.66, -0.66, 0.72 and -0.57, respectively), and thus represents a general indicator of the North-South gradient in human activities. Given the altitude a.s.l., small-scale variation in traffic load accounts for another relevant share of NO2 variance.

Overall, the difference in the predictors selected for the urban (B) and extra-urban models (C), together with the poor performance of these models in the external validation, suggests caution when transferring LUR models between different spatial scales and environmental contexts. Previous studies have analyzed the issue of LUR transferability between different cities (Allen et al., 2011) or countries (Vienneau et al., 2010), suggesting some concern about the use of LURs that are not developed locally.

Here I showed that the difference between in-city and outside-city determinants of air pollution also limits the spatial transferability of LUR models. Transferability is sometimes limited by the unavailability of homogeneous GIS databases or by the use of different pollution sampling methodologies. In this case study, both the GIS data and the NO2 sampling method were homogeneous over the entire study area.

This case study has some limitations. First of all, the passive samplers used to develop the models were not positioned for this specific purpose and must be considered routine monitoring data. I used the ESCAPE methodology to develop the LUR models, although this method originally requires an accurate definition of sampling locations. Thus, pollution variability inside the urban area and across the region may not have been fully characterized. On the other hand, the locations of the passive samplers were chosen by ARPA-PR to be representative of populated areas, which is an important aspect to consider when using LUR to estimate population exposure.

Model B obtained with the ESCAPE procedure is probably overfitted: the ratio of observations to predictors (i.e. 28:8) is very low (Babyak, 2004). Basagaña et al. (2012) highlighted that in-sample LOO-CV is a poor indicator of a LUR model's ability to predict out-of-sample concentrations. Indeed, when used to predict extra-urban concentrations (Figure 35), model B performed poorly. It is nevertheless impossible to separate the effect of overfitting from the error deriving from the application of the model to a different environmental context.