
Forecasting the Tourism in Tuscany with Google Trend


Academic year: 2021


Abstract

This thesis aims to forecast the number of tourist arrivals in Tuscany with the help of Google Trends data. In the first section, search query data are collected from Google Trends and weighted according to the nationality of tourist arrivals in Tuscany. Information about the nationality of tourist arrivals is obtained from the Tuscany Tourism Report of the Regional Institute for Economic Planning of Tuscany. Tourist arrivals data are collected from the Regione Toscana website.

In the second section, a correlation test is performed to determine the strength of the relationship between Google Trends data and tourism data. The test results also indicate the lag between the Google Trends data and the Tuscany Region's tourism data.

In the third section, the tourist arrivals in 2016 are first predicted with an ARIMA model estimated on tourist arrivals data from the Tuscany Region. Then the tourist arrivals in 2016 are predicted with a dynamic regression model that includes the search query data from Google Trends as a regressor.

Finally, the actual numbers of tourist arrivals in 2016 are compared with the numbers estimated by the ARIMA model and the dynamic regression model, and the results are discussed.


Contents

1 Introduction
  1.1 Background in Forecast of Tourism
  1.2 Search Engines Data
  1.3 Tuscany
2 Data
  2.1 Tuscany Tourism Data
    2.1.1 Unit Root Test
  2.2 Google Trends Data
  2.3 Correlation Test
  2.4 Use of the Internet for Travel and Tourism
3 Model
    3.0.1 Ljung-Box Test
    3.0.2 ARCH Test
    3.0.3 Shapiro-Wilk Normality Test
  3.1 ETS Model
  3.2 ARIMA Model
  3.3 Dynamic Regression Model
4 Conclusion

List of Figures

1 Tourist Arrivals in Tuscany
2 Seasonality Graph
3 Polar Seasonality Graph
4 ACF and PACF Graph
5 Seasonality Adjusted Tourist Arrivals Data's Graphs
6 Diagram of Weight Operation
7 Time Series Graph of Google Trends Data
8 Seasonality Graph of Google Trends Data
9 Polar Seasonality Graph of Google Trends Data
10 ACF and PACF Graph of Google Trends Data
11 Seasonality Adjusted Google Trends Data's Graphs
12 Travel Decision Diagram of Individuals
13 Correlation Between the Weighted Google Trends Data and Tourist Arrivals Data
14 Correlation Between Google Trends and Tourist Arrivals Data
15 The Usage of Internet for Travel and Accommodation (Source: Eurostat)
16 Correlation Between the Weighted Google Trends Data and Tourist Arrivals Data (2010-2016)
17 Autocorrelation Test for the Residuals in ETS Model
18 Prediction of the ETS Model
19 Autocorrelation Test for the Residuals in ARIMA Model
20 Prediction of the ARIMA Model
21 Autocorrelation Test for the Residuals in Dynamic Regression Model
22 Prediction of the Dynamic Regression Model
23 Prediction of the Models

List of Tables

1 ETS Model Variations
2 International Tourist Arrivals (Source: UNWTO)
3 International Tourism Receipts (Source: UNWTO)
4 Top 5 Tourist Arrivals in Italy at Regional Level (Source: EUROSTAT)
5 ADF Test for Tourist Arrivals Data
6 ADF Test for Google Trends Data
7 ETS Model with Tuscany Tourism Data
8 Ljung-Box Test Results for ETS Model
9 ARCH Test Results for ETS Model's Residuals
10 Shapiro-Wilk Normality Test Results for ETS Model
11 ARIMA Model with Tuscany Tourism Data
12 Ljung-Box Test Results for ARIMA Model
13 ARCH Test Results for ARIMA Model's Residuals
14 Shapiro-Wilk Normality Test Results for ARIMA Model
15 Dynamic Regression Model with Tuscany Tourism Data
16 Ljung-Box Test Results for Dynamic Regression Model
17 ARCH Test Results for Dynamic Regression Model's Residuals
18 Shapiro-Wilk Normality Test Results for Dynamic Regression Model

1 Introduction

Niels Bohr, the famous Danish physicist, said: "Prediction is tough, especially if it is about the future." In that spirit, this thesis attempts to improve forecast accuracy for tourism in Tuscany. Tourism forecasting has been studied by many researchers, so improving forecast accuracy requires knowing the literature. Tourism data usually take the form of time series, and there are many models for forecasting time series. To choose a suitable one, we need to understand these models. We also need to be aware of the characteristics of the tourism destination.

Travelers have frequently used the internet for tourism activities in the last few years. Individuals who plan a trip search for destinations, hotels, and cities online. These searches leave traces in search engines (Google, Yandex, Yahoo). The operators collect those traces and publish them through websites such as Google Trends. For this reason, we use Google Trends data to improve forecast accuracy for tourism in Tuscany.

1.1 Background in Forecast of Tourism

The importance of tourism in the economy is growing over time. Tourism is a valuable source of income for developing countries. For this reason, many researchers have recently focused on forecasting tourism demand, and several papers address this topic. In this chapter we examine these studies and the models they use. Time series models, which are suitable for time-varying datasets, are generally applied in the literature for forecasting tourism demand.

To begin with, statistics offers several criteria such as the root mean square error (RMSE), the mean absolute percentage error (MAPE) and the Akaike information criterion (AIC). RMSE and MAPE measure forecast accuracy. The AIC measures the quality of a model and supports model selection. These criteria can be formulated as [1]:

\[ \mathrm{RMSE} = \sqrt{\operatorname{mean}(e_t^2)} \tag{1} \]
\[ p_t = \frac{100\, e_t}{y_t}, \qquad \mathrm{MAPE} = \operatorname{mean}(|p_t|) \tag{2} \]
\[ \mathrm{AIC} = L^*(\hat{\theta}, \hat{x}_0) + 2q \tag{3} \]
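As a small illustration of the two accuracy measures, the sketch below computes RMSE and MAPE for a pair of series. It assumes that the error is defined as actual minus forecast; the function names and the sample numbers are hypothetical and are not taken from the thesis.

```python
import math


def rmse(actual, forecast):
    """Root mean square error: sqrt(mean(e_t^2)) with e_t = actual - forecast."""
    errors = [a - f for a, f in zip(actual, forecast)]
    return math.sqrt(sum(e * e for e in errors) / len(errors))


def mape(actual, forecast):
    """Mean absolute percentage error: mean(|100 * e_t / y_t|)."""
    pct_errors = [100.0 * (a - f) / a for a, f in zip(actual, forecast)]
    return sum(abs(p) for p in pct_errors) / len(pct_errors)


# Made-up monthly arrivals and forecasts, for illustration only.
actual = [100.0, 200.0, 400.0]
forecast = [110.0, 190.0, 400.0]

print("RMSE:", rmse(actual, forecast))
print("MAPE:", mape(actual, forecast))
```

RMSE is expressed in the units of the series, while MAPE is scale-free, which is why MAPE is convenient when models are compared across source markets of different size.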

Song et al. [2] define tourism demand as follows: "The concept of tourism demand originated from the classical definition of demand in economics, namely the desire to possess a commodity or to make use of a service, combined with the ability to purchase it". The authors investigate tourism demand in Hong Kong. Tourist arrivals and tourist expenditures are treated as two different measures of tourism demand, and both are used for prediction. The empirical analysis in the paper uses the general autoregressive distributed lag model, whose general form is:

\[ y_t = \alpha + \theta_1 y_{t-1} + \beta_0 x_t + \beta_1 x_{t-1} + \varepsilon_t \tag{4} \]

Because this model can include an explanatory variable, it is useful for this kind of analysis.
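The autoregressive distributed lag form in equation (4) can be estimated by ordinary least squares. The following is a minimal sketch under stated assumptions: the data are synthetic and noise-free, the coefficient values are invented, and `solve` and `fit_adl11` are hypothetical helper names rather than anything used in the cited paper.

```python
import random


def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def fit_adl11(y, x):
    """OLS estimates (alpha, theta1, beta0, beta1) of
    y_t = alpha + theta1*y_{t-1} + beta0*x_t + beta1*x_{t-1} + e_t."""
    rows = [[1.0, y[t - 1], x[t], x[t - 1]] for t in range(1, len(y))]
    target = [y[t] for t in range(1, len(y))]
    k = 4
    # Normal equations: (X'X) b = X'y
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * v for r, v in zip(rows, target)) for i in range(k)]
    return solve(xtx, xty)


# Synthetic data generated from known (invented) coefficients.
rng = random.Random(0)
x = [rng.uniform(0.0, 10.0) for _ in range(60)]
y = [5.0]
for t in range(1, 60):
    y.append(2.0 + 0.5 * y[t - 1] + 1.5 * x[t] - 0.7 * x[t - 1])

alpha, theta1, beta0, beta1 = fit_adl11(y, x)
print(alpha, theta1, beta0, beta1)
```

Since the synthetic series contains no error term, the estimates should reproduce the generating coefficients up to rounding. With real arrivals data the residuals would also need to be checked.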

Univariate time series models are typically employed for predicting tourist arrivals. Lim and McAleer [3] forecasted quarterly tourist arrivals to Australia from Hong Kong, Malaysia, and Singapore using various versions of exponential smoothing (ETS) models. Their data cover the years 1975-1999, and they compared the models using RMSE as the measure of forecast accuracy. They also contrasted ETS models with additive and multiplicative errors. Hyndman and Athanasopoulos [4] summarize the variations of the ETS model as follows:

Table 1: ETS Model Variations

Furthermore, Kim et al. [5] evaluate the performance of several time series models for interval forecasting. The authors fit AR, ARIMA and various ETS models to tourist arrival numbers for Hong Kong and Australia. They then compare the prediction intervals of these methods and find that the prediction intervals of all models except the autoregressive model are adequate. Chu [6] applied the fractionally integrated ARMA model to forecast tourist arrivals in Singapore. This approach is compared with seven benchmark models in terms of MAPE and performs better than the benchmarks. Shareef and McAleer [7] modeled uncertainty in monthly international tourist flows to the Maldives from Italy, Germany, the UK, Japan, France, Switzerland, Austria and the Netherlands using various versions of the ARMA model. Univariate time series models are also applied as benchmark models in this research. Chu [9] focused on inbound tourism in Macau and used a piecewise linear approach to forecast tourist arrivals. The method was tested against three benchmark models: an autoregressive trend model, the seasonal autoregressive integrated moving average (SARIMA) model and its rival, the fractionally integrated autoregressive moving average (ARFIMA) model. The results show that the piecewise linear approach outperforms all three competitors. Jackman and Greenidge [8] analyzed tourist arrivals to Barbados from its main source markets: the USA, the United Kingdom, Canada, and CARICOM. The authors forecasted tourist flows to Barbados with the multivariate and univariate forms of the structural time series model (STSM), with a seasonal naive model as the benchmark. In terms of MAPE, both forms of the STSM are better than the seasonal naive model. The authors describe the STSM as:

\[ \ln ARR_i = f(\ln Y_i, \ln P_i, \text{Trend}, \text{Seasonal}, \text{Cycle}) \tag{5} \]
\[ \text{Level:} \quad \mu_t = \mu_{t-1} + \beta_{t-1} + \eta_t, \qquad \eta_t \sim NID(0, \sigma_\eta^2) \tag{6} \]
\[ \text{Slope:} \quad \beta_t = \beta_{t-1} + \zeta_t, \qquad \zeta_t \sim NID(0, \sigma_\zeta^2) \tag{7} \]
\[ \text{Seasonal:} \quad \gamma_t = \gamma_{t-1} + \cdots + \gamma_{t-s+1} + \omega_t, \qquad \omega_t \sim NID(0, \sigma_\omega^2) \tag{8} \]
\[ \begin{bmatrix} \psi_t \\ \psi_t^* \end{bmatrix} = \rho \begin{bmatrix} \cos\lambda_c & \sin\lambda_c \\ -\sin\lambda_c & \cos\lambda_c \end{bmatrix} \begin{bmatrix} \psi_{t-1} \\ \psi_{t-1}^* \end{bmatrix} + \begin{bmatrix} \kappa_t \\ \kappa_t^* \end{bmatrix}, \qquad t = 1, \ldots, T \tag{9} \]

Here $ARR_i$ denotes arrivals to Barbados from source market $i$, $Y_i$ is income and $P_i$ is the relative price of the destination for source market $i$. The level and slope together form the trend, and their disturbances, $\eta_t$ and $\zeta_t$, are uncorrelated.

The STSM is frequently applied to tourism demand. For instance, Chen et al. [10] used the STSM for tourist arrivals to Hong Kong from 20 source markets. The STSM has also been used for other topics in the literature; for example, Karimu [11] investigates the impact of economic and non-economic factors on gasoline demand with the STSM.

Another version of the STSM is used by Song et al. [12], who combined the time-varying parameter (TVP) model with the STSM. The TVP component allows predictors to enter the model. The resulting TVP-STSM is built to predict tourist arrivals in Hong Kong from the primary source markets: China, South Korea, the UK and the USA. The researchers state: "The TVP-STSM can be represented in the following state space form (SSF):

\[ y_t = \mu_t + \psi_t + \gamma_t + X_t \Gamma_t + \varepsilon_t, \qquad \varepsilon_t \sim NID(0, H_t) \tag{10} \]
\[ \mu_{t+1} = \mu_t + \beta_t + \upsilon_t, \qquad \upsilon_t \sim NID(0, \sigma_\upsilon^2) \tag{11} \]
\[ \beta_{t+1} = \beta_t + \delta_t, \qquad \delta_t \sim NID(0, \sigma_\delta^2) \tag{12} \]
\[ \begin{bmatrix} \psi_{t+1} \\ \psi_{t+1}^* \end{bmatrix} = \rho \begin{bmatrix} \cos\tau & \sin\tau \\ -\sin\tau & \cos\tau \end{bmatrix} \begin{bmatrix} \psi_t \\ \psi_t^* \end{bmatrix} + \begin{bmatrix} \omega_t \\ \omega_t^* \end{bmatrix}, \qquad \omega_t, \omega_t^* \sim NID(0, \sigma_\omega^2) \tag{13} \]
\[ \gamma_{t+1} = \sum_{j=1}^{s-1} \gamma_{t+1-j} + \kappa_t, \qquad \kappa_t \sim NID(0, \sigma_\kappa^2) \tag{14} \]
\[ \Gamma_{t+1} = T_t \Gamma_t + R_t \eta_t, \qquad \Gamma_t \sim N(K_1, P_1), \quad \eta_t \sim NID(0, Q_t) \tag{15} \]

where $y_t$ is a univariate time series, decomposed into its unobservable components, including a trend component ($\mu_t$), a cycle component ($\psi_t$), a seasonal component ($\gamma_t$) and an irregular component ($\varepsilon_t$). $X_t$ is a vector of causal variables and $\Gamma_t$ is the corresponding vector of coefficients". The TVP-STSM forecasts of quarterly tourist arrivals to Hong Kong are then compared with seven benchmark models, and the TVP-STSM outperforms all of them.

Gounopoulos et al. [13] demonstrate the effect of macroeconomic indicators on tourism. The authors show that macroeconomic shocks in tourists' countries of origin affect tourist arrivals in Greece. Building on this idea, they forecast tourist arrivals with several methods: the vector autoregression (VAR) model, the ARIMA model, and various ETS models. The VAR model can be formulated as [14]:

\[ y_t = v + A_1 y_{t-1} + \cdots + A_p y_{t-p} + u_t, \qquad t = 0, \pm 1, \pm 2, \ldots \tag{16} \]

The ARIMA model is written as [15]:

\[ z_t = \sum_{i=1}^{p+d} \varphi_i z_{t-i} + a_t - \sum_{i=1}^{q} \theta_i a_{t-i} \tag{17} \]

In related research, Gunter and Önder [16] focused on forecasting tourist arrivals in Vienna, using Google Analytics website traffic indicators as predictors. The authors aim to improve forecast accuracy with the help of big data and recommend applying big data to tourism demand forecasting. They note that "The combination of forecast encompassing tests with Bates Granger weights were used in tourism demand forecasting for the first time," which adds to the value of the study. Many multivariate (VAR, BVAR, FAVAR and BFAVAR) and univariate (ETS model, MA model, naive model) methods, as well as forecast combination methods, are implemented to predict tourist arrivals in Vienna. The forecast accuracies of the models are then compared to determine the best one.


al. [35], in another tourism-related article, built an autoregressive moving average model with exogenous inputs (ARMAX) and a threshold autoregressive (TAR) model, using a destination marketing organization (DMO)'s web traffic data to estimate hotel demand in Charleston, South Carolina, United States. The authors also used an ARMA model for comparison. The ARMAX model is the classical ARMA model extended with an explanatory variable, and it can be specified as:

\[ y_t = \alpha + \sum_{i=0}^{j} \beta_i x_{t-i} + \mu_t \tag{18} \]
\[ \mu_t = \sum_{i=1}^{m} \rho_i \mu_{t-i} + \sum_{j=1}^{n} \theta_j \varepsilon_{t-j} + \varepsilon_t \tag{19} \]

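The explanatory variable cannot enter the TAR model directly. Instead, the authors use it as an indicator that selects the forecasting regime. The model can be defined as:

\[ y_t = \begin{cases} \phi_1 + \sum_{i=1}^{m} \phi_{1i}\, y_{t-i} + \varepsilon_t & \text{for } x_{t-1} \le \Psi \\ \phi_2 + \sum_{i=1}^{m} \phi_{2i}\, y_{t-i} + \varepsilon_t & \text{for } x_{t-1} > \Psi \end{cases} \tag{20} \]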
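The regime switch in the TAR model can be made concrete with a short sketch of a one-step point forecast. The coefficients, the threshold and the function name `tar_forecast` below are hypothetical and only illustrate how the lagged explanatory variable selects which autoregression is applied.

```python
def tar_forecast(y_lags, x_prev, threshold, low_regime, high_regime):
    """One-step TAR point forecast.

    y_lags      : [y_{t-1}, y_{t-2}, ...], most recent value first
    x_prev      : lagged explanatory variable x_{t-1}
    threshold   : the threshold value (Psi in the equation)
    low_regime  : (intercept, [AR coefficients]) used when x_prev <= threshold
    high_regime : (intercept, [AR coefficients]) used when x_prev > threshold
    """
    intercept, coefs = low_regime if x_prev <= threshold else high_regime
    return intercept + sum(c * v for c, v in zip(coefs, y_lags))


# Invented parameters: two lags per regime, threshold on web traffic at 5.0.
low = (1.0, [0.5, 0.2])
high = (2.0, [0.8, 0.1])
recent_demand = [10.0, 8.0]

# Low-traffic month: 1.0 + 0.5*10 + 0.2*8
print(tar_forecast(recent_demand, x_prev=3.0, threshold=5.0, low_regime=low, high_regime=high))
# High-traffic month: 2.0 + 0.8*10 + 0.1*8
print(tar_forecast(recent_demand, x_prev=7.0, threshold=5.0, low_regime=low, high_regime=high))
```

The explanatory variable never multiplies a coefficient here; it only decides which set of autoregressive coefficients is used, which matches the description of its role as an indicator.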

Consequently, the authors compare the results of the models, and the comparison shows that the DMO's data can be beneficial for forecasting short-run demand for hotel rooms. Moreover, Xiang et al. [30] and Liu et al. [19] analyzed hotel customers' experience and satisfaction.

Tourism is a significant indicator of the economy. Akal [20] forecasted Turkey's tourism revenue with an ARMAX model. The researcher suggests that Turkey tried to increase tourism revenue in order to reduce the economic damage of the 2001 economic crisis. Turkey increased its advertising in this period, and afterwards the actual tourist numbers were far beyond the government's expectations. The success of the Turkish soccer team in the 2002 World Cup may also have benefited Turkish tourism.

Disasters affect tourism negatively. Tourism demand is influenced by earthquakes, terrorist attacks, and epidemics. Huang and Min [21] concentrated on the effect of the Jiji earthquake on tourism in Taiwan. The Jiji earthquake happened on September 21, 1999, and tourism in Taiwan decreased dramatically afterwards. The paper evaluates whether tourism in Taiwan has entirely recovered from the crisis. The authors employed a SARIMA model for this purpose, and the empirical results show that tourist arrivals to Taiwan had not fully recovered 11 months after the earthquake. In another paper on this topic, Eugenio-Martin et al. [22] examined the effect of crises on tourism demand in Scotland. The empirical results imply that different kinds of crisis affect tourists from different countries. For example, French tourists are affected by the foot-and-mouth disease crisis, while German tourists are severely affected by the September 11 events. The authors use the STSM for this research. Furthermore, Goh and Law [23] forecasted inbound tourism in Hong Kong. The researchers applied multiplicative seasonal ARIMA with intervention (MARIMA), SARIMA, and eight other time series models, and compared these ten models. They found that MARIMA has the best forecast accuracy. The study also reports that several factors, such as the relaxation of the issuance of outbound visitor visas, the Asian financial crisis, the handover, and the bird flu epidemic, are statistically significant for tourism demand in Hong Kong.

Meteorological data are a further indicator for tourism. Álvarez-Díaz and Roselló-Nadal [24] try to increase forecast accuracy for tourism demand with the help of meteorological explanatory variables. The authors focus on tourist flows to the Balearic Islands from the United Kingdom. They employ an autoregressive neural network model for this purpose, with an ARIMA model as the benchmark. The forecast of the autoregressive neural network model can be written as:

\[ \hat{Y}_t = \phi\left( \beta_0 + \sum_{h=1}^{H} \beta_h \cdot \Psi_h\left( \alpha_{h0} + \sum_{k=1}^{K} \sum_{j=1}^{J} \alpha_{hjk} \cdot x_{k,t-j} + \sum_{p=1}^{P} \vartheta_{hp} \cdot Y_{t-p} \right) \right) \tag{21} \]

Here $\hat{Y}_t$ is the predicted value, $x$ denotes the explanatory variables, and $Y$ is the dependent variable. $\Psi(\cdot)$ and $\phi(\cdot)$ are the transfer functions of the hidden and output layers. The results show that using meteorological variables increases forecast accuracy for tourism demand. Zhang and Kulendran [25] study the same issue, the impact of meteorological variables on tourist arrivals to Hong Kong. Their results imply that climatic conditions play an important role in the seasonal variation of tourist arrivals. Moreover, Kozak et al. [26] show the influence of climate variables on Turkish tourism. Overall, the influence of climatic conditions on tourism demand varies from city to city.

Another method used for forecasting tourism demand is singular spectrum analysis (SSA). Hassani et al. [27] apply four models, SSA, ARIMA, ETS and neural networks, to forecast tourist arrivals to the United States. The data consist of monthly tourist arrivals into the United States over the period 1996 to 2012. A comparison of the empirical results indicates that SSA has the best forecast accuracy. SSA has also been used in other fields. Hassani [28] forecasted monthly accidental deaths in the USA, using three benchmarks: Box-Jenkins SARIMA models, the ARAR algorithm, and the Holt-Winters algorithm. The results imply that the SSA model gives the more accurate forecast.

Song and Li [29] surveyed the papers on tourism demand published between 2000 and 2008 and examined the modeling and forecasting techniques they use. The authors attempt to determine which models are most common and which has the best forecast accuracy. However, with forecast accuracy as the criterion, no single model outperforms all the others.

1.2 Search Engines Data

Search engine data are a relevant indicator for studies in the literature and have been used frequently in recent papers. Xiang et al. [30] describe search engines as follows: "Consider the Internet as a virtual "galaxy" with information entities representing various domains. Perhaps no other set of tools have ever been so powerful as search engines in terms of representing the virtual world and, because of this, they shape the way people use the Internet. Search engines such as Google, Yahoo!, and Ask collect a huge number of Web pages and thus, serve as the "Hubble Telescope" with which people access and learn about the entire virtual "galaxy"[31]." In this view, data provided by search engines have a critical position in today's and future forecasting models.

The forecast accuracy of several sectors can be improved with search engine data. Choi and Varian [32] investigate the use of Google Trends data in different areas, namely automobile sales, unemployment claims, travel destination planning and consumer confidence. They find that including Google Trends data in the model's dataset can increase forecast accuracy in these areas. Moreover, Fondeur and Karamé [33] used Google data to predict French youth unemployment, and Hand and Judge [34] employed Google Trends data to forecast UK cinema admissions.

The internet holds a great deal of information, and search engines organize it to help users. Xiang et al. [30] examined the role of search engines in tourism. The authors argue that the internet's share in people's lives is increasing, and therefore search engines become more important over time. Consequently, search engines have come to be used for tourism. Travelers check attractions, hotels, flights, and the prices of services with search engines and plan their travel on the internet before they go. Another result of this paper is that a small number of websites dominate the search results.

Users search many queries on the internet, and search engines collect these queries to determine trends, which are based on the popularity of the searches. Search engines publish the trends through applications such as Baidu Index, Google Trends, and Yandex Wordstat. Yang et al. [35] used search engine trends to predict tourism in China. Two search engines, Baidu and Google, are used separately to forecast visitor volumes to Hainan Province, and the predictions are then compared. The authors first look for search queries correlated with Hainan's monthly visitor volumes. These searches must lead the arrival month by at least one month. Five queries in Google and 17 queries in Baidu met the conditions. The results demonstrate that the model with Baidu data outperforms both the baseline model and the model with Google data. This is not surprising, because Baidu is more popular than Google in China. Before starting the research, we therefore need to know which search engine is most suitable for the country under study. Bangwayo-Skeete and Skeete [36] employed Google Trends as an indicator for predicting tourist flows to five popular Caribbean destinations, namely Jamaica, the Dominican Republic, the Bahamas, St. Lucia, and the Cayman Islands, from three key source markets: the US, Canada, and the UK. The authors forecasted tourist arrivals using the Autoregressive Mixed-Data Sampling (AR-MIDAS), SARIMA, and AR models, with Google Trends data as a regressor in the AR-MIDAS model. They then compare the forecast accuracies and find that the AR-MIDAS model performs better than both the AR and SARIMA models.

Search engine data are used with different models in the literature. Rivera [37] suggested the dynamic linear model (DLM) for use with Google Trends data and predicted hotel registrations in Puerto Rico. Puerto Rico has been in an economic recession since 2006, and the island's government has been looking for ways to increase economic activity, so this research is valuable for Puerto Rico. Three benchmark models, SARIMA, HW and SNAIVE, are used for comparison. The proposed method outperformed all of them.

Search queries are examined in different ways across articles. Li et al. [38] studied a unified search query dataset to determine tourist demand in Beijing. The authors looked at correlations between tourist volume and keywords searched on the internet. All search query data are added to the model, even those with weak correlation, in order to prevent information loss. The proposed model is tested against two benchmark models and outperforms both. In another study, Artola et al. [39] try to improve forecast accuracy with the help of Google Trends data, focusing on tourist flows to Spain. The results imply that a model built with Google Trends data can be useful in the short term.


As stated previously, forecasting hotel occupancy is important for tourism, and the accuracy of hotel demand forecasts has a critical role in hotel management. Search engine data are therefore used to forecast hotel demand in the literature. Pan and Yang [40] forecast hotel occupancy with the ARMAX model and the Markov switching dynamic regression (MSDR) model. The authors include search engine queries, website traffic, and climate conditions as indicators in these two models to predict hotel occupancy in Charleston County. An ARMA model serves as the benchmark, and the ARMAX model performs better than both the ARMA and MSDR models. All these studies indicate that search engine data increase the accuracy of prediction. Hence, we build our model around the Google Trends data.

1.3 Tuscany

According to the World Tourism Organization (UNWTO) report of 2017 [41], Italy ranks 5th in the world for international tourist arrivals and 6th for international tourism receipts. As Table 2 shows, 52.4 million tourists arrived in Italy in 2016, an increase of 3.2% compared with 2015. Tourists spent 40.2 billion dollars, an increase of 2.3% compared with 2015. We can therefore say that Italy is one of the leading countries for tourist arrivals.

Moreover, the World Travel & Tourism Council report of 2017 [42], another report on Italian tourism, states that travel and tourism directly contributed 77.3 billion euros, or 4.6% of GDP, in Italy. The total contribution of travel and tourism to GDP (including broader effects from investment, the supply chain and induced income impacts) was 186.1 billion euros, or 11.1% of GDP, in 2016.

Rank  Country           Arrivals (million)    Change (%)
                        2015      2016        15/14    16/15
1     France            84.5      82.6        0.9      -2.2
2     United States     77.5      75.6        3.3      -2.4
3     Spain             68.5      75.6        5.5      10.3
4     China             56.9      59.3        2.3      4.2
5     Italy             50.7      52.4        4.4      3.2
6     United Kingdom    34.4      35.8        5.6      4
7     Germany           35        35.6        6        1.7
8     Mexico            32.1      35          9.4      8.9
9     Thailand          29.9      32.6        20.6     8.9
10    Turkey            39.5      ..          -0.8     ..

Table 2: International Tourist Arrivals (Source: UNWTO)

Rank  Country             Receipts (US$ billion)    Change (% US$)
                          2015      2016            15/14    16/15
1     United States       205.4     205.9           7.0      0.3
2     Spain               56.5      60.3            -13.3    6.9
3     Thailand            44.9      49.9            16.9     11.0
4     China               45.0      44.4            2.1      -1.2
5     France              44.9      42.5            -22.9    -5.3
6     Italy               39.4      40.2            -13.3    2.0
7     United Kingdom      45.5      39.6            -2.3     -12.9
8     Germany             36.9      37.4            -14.8    1.4
9     Hong Kong (China)   36.2      32.9            -5.8     -9.1
10    Australia           28.9      32.4            -8.2     12.3

Table 3: International Tourism Receipts (Source: UNWTO)

The report also mentions that employment is strongly affected by tourism. As a result, tourism is a vital component of the Italian economy.

Italy has many wineries, great food, and historical places, many of which are protected as world heritage sites. Therefore, Italy is often preferred by travelers. According to Branchini [43], Milan, Venice, Florence, and Rome are the most visited destinations in Italy. The researcher adds: "The cultural heritage of these cities results from centuries of civilization on the Italian peninsula. From the time of the Roman Empire, wealth flowed into the country, and the population proliferated. After the fall of Rome in the 8th century, many regions' political powers grew, and new ruling governments were established with cities as their social, political and economic centers. As a result, Milan, Venice, Florence, and Rome all experienced different periods of prosperity leading to individual identities stemming from traditions, art, and culture largely independent of each other. Italy's twenty regions were independent of one another until the country was unified in 1861. These regions have maintained some political autonomy and are roughly the equivalent of American states. The current political structure in Italy also includes provincial governments, similar to counties in the United States". Italy has 20 regions, and regional statistics have recently become more important. We therefore study tourism in Tuscany, which has the 3rd largest tourist flows in Italy (Table 4).

Region            2013          2014          2015          2016
Emilia-Romagna    26,611,060    25,561,408    26,939,588    27,735,265
Veneto            20,658,970    20,557,253    21,043,436    21,430,727
Toscana           19,530,366    19,996,574    20,432,069    20,194,282
Lombardia         14,660,168    14,616,223    16,123,968    14,904,885
Lazio             10,164,520    10,133,418    12,024,702    12,547,786

Table 4: Top 5 Tourist Arrivals in Italy at Regional Level (Source: EUROSTAT)

Tuscany has many attractive places for visitors. Florence, Siena and Pisa are its most famous cities. According to Tuscany Region information, Florence is the most visited city in Tuscany. However, some locals and tourists have complained about crowding in Florence, and Popp [44] examined the positive and negative effects of crowding on tourism through interviews in Florence. Statistical data also indicate that climate conditions influence tourist arrivals in Tuscany; Morabito et al. [45] investigate the effect of meteorological conditions on tourist flows in Florence.

Moreover, Tuscany is one of the most prosperous agricultural regions in Italy. Olive, olive oil, grape, and wine production are common in Tuscany, and these products are world-famous. Hence, travelers choose Tuscany for agritourism and food tourism. Romano and Natilli [46] and Brunori and Rossi [47] studied wine tourism in Tuscany. Rural tourism in Tuscany is examined by Randelli and Tortora [48], and Sonnino [49] considered agritourism in Southern Tuscany.

For these reasons, forecasting future tourism is valuable for Tuscany. We therefore focus on tourism in Tuscany and try to forecast tourist arrivals with high accuracy.

2 Data

In this study we try to forecast tourist demand in Tuscany with high accuracy. The dependent variable in the model is tourist arrivals in Tuscany, and Google Trends data serve as the explanatory variable. For the first time in the literature, we also apply a weighting operation to the Google Trends data.

2.1 Tuscany Tourism Data

The tourism data contain the number of monthly tourist arrivals in Tuscany from 2005 to 2016. In this dissertation, the data from 2005 to 2015 form the training data, and the remaining data, covering 2015 to 2016, form the test data (Figure 1). The Tuscany tourism data were obtained from the Tuscany Region's website and are based on records of tourists' accommodation in Tuscany. There was a small decrease in tourist arrivals to Tuscany in 2012, which reflects the effect of the European debt crisis on tourism. Most tourist arrivals in Tuscany come from Europe, so tourism in Tuscany was strongly affected by this crisis.

As Figure 2, Figure 3 and Figure 4 show, tourism demand in Tuscany is strongly seasonal. Tourist flows increase in the summer months, whereas Tuscany is not popular for winter tourism. This may reflect the impact of the weather on tourism: Tuscany has many historical places, and visitors can travel to them comfortably in good weather, so tourists may prefer to visit in summer. Further, according to the ACF and PACF graphs, there is no trend in the tourist arrivals data, but to be sure we run a unit root test.

Figure 2: Seasonality Graph

Figure 3: Polar Seasonality Graph

Figure 4: ACF and PACF Graph

Before the unit root test, we examine the seasonally adjusted data, because the tourist arrivals data are strongly seasonal and a trend is easier to detect once the seasonality is removed. According to Figure 4, the seasonal cycle of the tourist arrivals data is 12 months. Therefore, taking the 12th difference of the data (subtracting from each month the value of the same month one year earlier) gives the seasonally adjusted data. Figure 5 shows the time series graph, ACF, and PACF of the seasonally adjusted data. According to these ACF and PACF graphs, the tourist arrivals data have no trend. Even so, for a definite answer, we run a unit root test.
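The 12th difference used here can be illustrated with a minimal Python sketch. The series below is invented for illustration (a repeating 12-month pattern plus a yearly step of 5) and is not the Tuscany data; the function name `seasonal_difference` is our own.

```python
def seasonal_difference(series, period=12):
    """Return y[t] - y[t - period] for every t >= period."""
    return [series[t] - series[t - period] for t in range(period, len(series))]


# Hypothetical monthly series: the same seasonal pattern every year,
# shifted upward by 5 units each year.
pattern = [10, 12, 20, 35, 50, 70, 90, 95, 60, 40, 15, 11]
series = [pattern[t % 12] + 5 * (t // 12) for t in range(36)]

# The seasonal pattern cancels out; only the year-on-year change remains.
diffed = seasonal_difference(series)
print(diffed[:3])
```

The first 12 observations are lost, because they have no value from one year earlier to be compared with.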

Figure 5: Seasonality Adjusted Tourist Arrivals Data’s Graphs

2.1.1 Unit Root Test

Unit roots in the Tuscany tourism data are checked with the Augmented Dickey-Fuller (ADF) test, the most widely used unit root test in the literature. In the ADF test, the null hypothesis states that the data are non-stationary and the alternative hypothesis states that the data are stationary. Consequently, a p-value greater than 0.05 means that the null hypothesis cannot be rejected and the data are treated as non-stationary. Cryer and Chan[50] set up the ADF test as

$$y_t^{*} = \gamma + \theta t + \eta y_{t-1} + \phi_1 y_{t-1}^{*} + \cdots + \phi_k y_{t-k}^{*} \qquad (22)$$

where $y_t^{*}$ denotes the first-differenced series. If $\hat{\eta}$ is close to zero, the first difference of $y_t$ must be taken. If $\hat{\eta}$ is significantly less than zero, $y_t$ is already stationary.

Augmented Dickey-Fuller Test

Statistic                 Result
Dickey-Fuller             -3.6535
Lag order                 0
p-value                   0.03093
Alternative hypothesis    stationary

Table 5: ADF Test for Tourist Arrivals Data

In our case, Table 5 shows that the p-value is smaller than 0.05. Therefore, the Tuscany tourism data are stationary, and first differencing is not necessary.

2.2 Google Trends Data

Google Trends reports the popularity of search queries based on their search rate, on a scale from 0 to 100. Google does not make its algorithm public, so we do not know exactly how Google Trends adjusts the data. Nevertheless, studies in the literature indicate that Google Trends data are a valuable indicator. Google Trends data can be filtered by country, which is what makes the weight operation possible. The data can also be collected for different time intervals, from 2004 to the present. In addition, Google Trends data can be filtered by category. However, Google states that the category filter is an approximation, so data obtained with it can be biased.

For this dissertation, we focus on 50 countries in Google Trends. According to the Tuscany tourism reports[51][52][53][54][55][56][57][58][59][60], tourists from these countries account for 96.2 percent of inbound tourism in Tuscany. We collect the data for the search query "Tuscany" separately for each country, using the country filter on the Google Trends website. The weight operation is then performed on these data: the 50 series are combined using nationality weights for tourist arrivals in Tuscany, which are calculated from the Tuscany tourism reports. In these reports, some countries appear only as part of a country grouping. For instance, Portugal, Ireland, Greece, and Spain are reported together as PIGS. In such cases, we first use the populations of the member countries to split the grouping's weight among them. The 50 country series are then combined using the resulting weights. Figure 6 illustrates this operation.

[Figure 6, diagram of the weight operation: Data splits into Country (a) if the country is not in a country grouping, and into Country Grouping (a) if the country is in a country grouping; each Country Grouping then splits into Country (b), Country (b), ...]

a The weights of these countries and country groupings are calculated from the Tuscany tourism reports.
b The weights of these countries are calculated from the populations of these countries.
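The weight operation can be sketched in Python as follows. This is only an illustration under stated assumptions: the shares, populations, and Google Trends values are invented, the function names are hypothetical, and dividing by the total weight is our own choice, since the exact normalisation is not specified in the text.

```python
def country_weights(report_shares, group_members, populations):
    """Split each grouping's share among its members in proportion to population."""
    weights = {}
    for name, share in report_shares.items():
        members = group_members.get(name)
        if members is None:
            # The report gives this country its own share.
            weights[name] = share
        else:
            # The report gives one share for the whole grouping.
            total_pop = sum(populations[m] for m in members)
            for m in members:
                weights[m] = share * populations[m] / total_pop
    return weights


def weighted_index(trends_by_country, weights):
    """Combine the country series into one series (assumed: weighted average)."""
    n = len(next(iter(trends_by_country.values())))
    total = sum(weights.values())
    return [
        sum(weights[c] * trends_by_country[c][t] for c in weights) / total
        for t in range(n)
    ]


# Invented example: one country reported alone and one grouping of two countries.
report_shares = {"Germany": 0.6, "PIGS": 0.4}
group_members = {"PIGS": ["Spain", "Portugal"]}
populations = {"Spain": 30, "Portugal": 10}  # hypothetical units
trends = {"Germany": [50, 100], "Spain": [20, 40], "Portugal": [10, 0]}

weights = country_weights(report_shares, group_members, populations)
index = weighted_index(trends, weights)
print(weights, index)
```

In this toy example the grouping's share of 0.4 is split 3:1 between its two members, so the larger country contributes three times as much to the combined series.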

Finally, the Google Trends series is built from these weights. Figure 7 shows its time series plot. This series is used as the explanatory variable for predicting tourist flows to Tuscany.

Figure 7: Time Series Graph of Google Trends Data

Figure 8, Figure 9, and Figure 10 show that the Google Trends data are seasonal. In particular, search interest increases in May, June, and July and decreases in September, October, November, and December. Tourist arrivals in Tuscany appear to follow the Google Trends data, so in a later section we run a lagged correlation test between the Google Trends data and the tourist arrivals data. We also need to check the data for a trend. Because the data are highly seasonal, we first remove the seasonality. Figure 10 indicates that the seasonal cycle of the Google Trends data is 12 months. Accordingly, the seasonally adjusted data are obtained by taking the 12th difference of the Google Trends data, as shown in Figure 11. This figure suggests that the data have a trend, because the first eleven lags in the ACF graph are above the limit. To settle the question, we check the data with a unit root test.

Figure 8: Seasonality Graph of Google Trends Data

Figure 10: ACF and PACF Graph of Google Trends Data


Table 6 reports the ADF test for the Google Trends data. The result indicates that the data have no trend and are stationary. Figure 11 suggested a trend, but the ADF test gives more reliable information than a visual reading of the ACF graph, so we rely on the ADF test result.

Augmented Dickey-Fuller Test

Statistic                 Result
Dickey-Fuller             -4.6741
Lag order                 0
p-value                   0.01
Alternative hypothesis    stationary

Table 6: ADF Test for Google Trends Data

2.3 Correlation Test

[Figure 12, flow diagram: Deciding to Make the Travel, Explore the Destination, Hotels and Flights Search, Visa and Other Process, Travel is Performed]

Figure 12: Travel Decision Diagram of Individuals

Figure 12 shows how individuals decide to travel. First, visitors decide which city they want to visit. Then they look for information about that city, and nowadays they use the internet for this purpose. We use Google Trends like the Hubble Telescope, to gather information from the internet galaxy, and employ the Google Trends data as a regressor for predicting tourist arrivals in Tuscany. In Figure 12, the "explore" stage happens before the "travel is performed" stage. Therefore, there can be a lag between the Google Trends data and the tourist arrivals data.

Figure 13: Correlation Between the Weighted Google Trends Data and Tourist Arrivals data

In Figure 13, "GT_Lag_1" refers to the weighted Google Trends data lagged by one month and "GT_Lag_2" to the same data lagged by two months. The correlation between tourist arrivals to Tuscany and the unlagged weighted Google Trends data is 0.377, whereas the correlations with the one-month and two-month lagged versions are 0.53 and 0.528. This suggests that visitors search for Tuscany on the internet one or two months before they arrive. The two series are not strongly correlated, only moderately. We discuss the reason in the next section.


Furthermore, Figure 14 reports the correlation between the non-weighted Google Trends data and the Tuscany tourist arrivals data. These correlation values are lower than those in Figure 13. The weight operation therefore strengthens the correlation between the two data sets and is beneficial for the Google Trends data.

Figure 14: Correlation Between Google Trends and Tourist Arrivals data

Finally, the correlation values in Figure 13 and Figure 14 imply that visitors explore Tuscany online before they arrive, which supports the decision process in Figure 12.
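The lagged correlations discussed in this section can be computed as in the following Python sketch. The two short series are invented so that arrivals follow searches by exactly one month; they are not the thesis data, and the function names are our own.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def lagged_correlation(arrivals, trends, lag):
    """Correlate arrivals in month t with search interest in month t - lag."""
    if lag == 0:
        return pearson(arrivals, trends)
    return pearson(arrivals[lag:], trends[:-lag])


# Hypothetical data: arrivals equal ten times the previous month's search index.
trends = [1, 3, 2, 5, 4, 6, 3, 7]
arrivals = [20, 10, 30, 20, 50, 40, 60, 30]

for lag in (0, 1, 2):
    print(lag, round(lagged_correlation(arrivals, trends, lag), 3))
```

Because each lag shortens the overlapping sample by one observation, correlations at long lags rest on fewer data points than the unlagged correlation.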


2.4 Use of the Internet for Travel and Tourism

The internet is becoming more important in individuals' lives, and the number of internet users grows over time. Nowadays, people search the internet before they make plans, and the use of the internet for tourism in particular has been increasing. Figure 15 shows the use of the internet for travel and accommodation, based on Eurostat data. According to Figure 15, this use grew with a marked trend until 2010, after which the graph levels off. The graph also shows a decrease in 2012 which, as mentioned before, is due to the European debt crisis. According to Figure 15, 2010 is therefore the critical year for the relationship between the internet and tourism. Based on this information, we run another correlation test in Figure 16.

Figure 15: The Usage of Internet for Travel and Accommodation (Source: Eurostat)


Figure 16 shows the correlation between the Google Trends data and the tourist arrivals data for 2010 to 2016. The correlations of the tourist arrivals data with the unlagged, one-month lagged, and two-month lagged Google Trends data are 0.611, 0.784, and 0.755, respectively. The correlation with the one-month lagged Google Trends data thus rises from 0.53 to 0.784. The main reason is that the use of the internet for tourism is growing, so the relationship between internet searches and tourism strengthens year by year.

Figure 16: Correlation Between the Weighted Google Trends Data and Tourist Arrivals data(2010-2016)

However, historical values are valuable for time series models. The training data in this dissertation therefore cover 2005 to 2015. If we started the training set in 2010, we would not have enough samples.

Finally, Google Trends scales the data relative to the selected time interval. As a result, the same data point can take different values when it is downloaded with different time intervals. This makes working with Google Trends challenging, and the time interval must be selected carefully.
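The effect of the time interval can be illustrated with a simplified Python sketch. It assumes only that the series is rescaled so that the highest point in the selected interval equals 100. The raw search volumes below are invented, because Google does not publish them, and the real adjustment involves further steps that are not public.

```python
def rescale_to_100(values):
    """Scale a series so that its maximum becomes 100 (simplified assumption)."""
    peak = max(values)
    return [round(100 * v / peak) for v in values]


# Hypothetical raw search volumes for four consecutive months.
raw = [20, 40, 80, 40]

full_interval = rescale_to_100(raw)       # all four months selected
short_interval = rescale_to_100(raw[:2])  # only the first two months selected

# The first month is 25 in the long interval but 50 in the short one.
print(full_interval, short_interval)
```

For this reason, all series that enter one model should be downloaded with the same time interval.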

3 Model

ARIMA, ETS, and dynamic regression models are employed to predict tourist arrivals in Tuscany. Pankratz[61] also calls the dynamic regression model the ARIMAX model. The ARIMA and ETS models are fitted with the tourist arrivals data only. These two are the most widely used benchmark models in the literature, so the performance of the dynamic regression model is tested against them. The dynamic regression model is fitted with both the Google Trends data and the tourist arrivals data. For all models, the data for 2005 to 2015 are used as training data and the data for 2016 as test data. Within each model class, we select the specification with the minimum AIC value and forecast with it. We then check the residuals of each model with three tests: the Ljung-Box test, the ARCH test, and the Shapiro-Wilk normality test. The residual autocorrelation of all three models is also inspected before forecasting. Finally, the predictions are compared using RMSE.

3.0.1 Ljung-Box Test

The Ljung-Box test is a diagnostic tool for a fitted time series model. It examines the residuals of the fitted model and tests whether they are autocorrelated. The Ljung-Box test can be defined as[62]:

H0: The residuals of the model are not autocorrelated
Hα: The residuals of the model are autocorrelated

Test statistic:

$$Q = n(n+2) \sum_{k=1}^{m} \frac{\hat{r}_k^{2}}{n-k} \qquad (23)$$

In equation 23, $n$ is the length of the time series, $\hat{r}_k$ is the estimated autocorrelation of the residuals at lag $k$, and $m$ is the number of lags being tested.
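The test statistic in equation 23 can be computed directly, as in this minimal Python sketch. The residuals are an invented, strongly alternating sequence chosen so that the result is easy to check by hand; the p-value, which requires the chi-square distribution, is not computed here.

```python
def autocorrelation(residuals, k):
    """Sample autocorrelation of the residuals at lag k."""
    n = len(residuals)
    mean = sum(residuals) / n
    denom = sum((e - mean) ** 2 for e in residuals)
    num = sum((residuals[t] - mean) * (residuals[t - k] - mean) for t in range(k, n))
    return num / denom


def ljung_box_q(residuals, m):
    """Ljung-Box statistic Q = n(n+2) * sum_{k=1..m} r_k^2 / (n - k)."""
    n = len(residuals)
    return n * (n + 2) * sum(
        autocorrelation(residuals, k) ** 2 / (n - k) for k in range(1, m + 1)
    )


# Hypothetical residuals that flip sign every period (far from white noise).
residuals = [1.0, -1.0] * 10

# Here r_1 = -19/20, so Q(1) = 20 * 22 * 0.9025 / 19 = 20.9.
q1 = ljung_box_q(residuals, 1)
print(q1)
```

A large Q relative to the chi-square critical value leads to rejecting the null hypothesis of no autocorrelation.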

3.0.2 ARCH Test

The ARCH (autoregressive conditional heteroskedasticity) test is a useful tool for testing the heteroscedasticity of a model. It checks whether the variance of the model's residuals is constant, as a well-specified model requires. If the variance of the errors is not constant, the model is heteroscedastic. The hypotheses of the test are:

H0: The variance of the residuals is constant
Hα: The variance of the residuals is not constant

3.0.3 Shapiro-Wilk Normality Test

The Shapiro-Wilk normality test checks whether the data are normally distributed. The hypotheses of the test are:

H0: The data are normally distributed
Hα: The data are not normally distributed

3.1 ETS Model

As stated previously, the ETS model is one of the most valuable univariate forecasting methods in the time series literature and has been used in many fields, including tourism. Studies indicate that ETS models can be advantageous for non-stationary time series, that is, they can forecast non-stationary data better. Our data are stationary according to the ADF test, but we still include the ETS model as one of the benchmark models.

Table 1 shows the variations of ETS models[4]. ETS is a univariate time series model, so it can be fitted to only one time series. For this reason, the ETS model in this dissertation is fitted with the Tuscany tourism data only. We choose the best model by the AIC criterion. The ETS(A,N,A) model (additive error, no trend, additive seasonality) has the minimum AIC value for these data (Table 7). ETS(A,N,A) is defined as

$$y_t = l_{t-1} + s_{t-m} + \varepsilon_t \qquad (24)$$

$$l_t = l_{t-1} + \alpha \varepsilon_t \qquad (25)$$

$$s_t = s_{t-m} + \gamma \varepsilon_t \qquad (26)$$

In equation 24, $y_t$ is the observed value, written as the one-step forecast $l_{t-1} + s_{t-m}$ plus the error $\varepsilon_t$. Equation 25 is the smoothing equation for the level $l_t$, and equation 26 updates the seasonal component $s_t$, where $m$ is the seasonal period.

              Smoothing parameters    Criteria values
Model         Alpha     Gamma         AIC          AICc         BIC
ETS(A,N,A)    0.1338    0.067         -67.16762    -63.02969    -23.92559

Table 7: ETS model with Tuscany Tourism Data

Next, the residuals of the fitted model are checked for autocorrelation. Figure 17 shows the time series graph, the distribution graph, and the ACF graph of the residuals. In the ACF graph, the residual autocorrelations exceed the limits at some lags. Moreover, Figure 17 suggests that the residuals are not normally distributed.
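The ETS(A,N,A) recursion (one-step forecast from the previous level and the seasonal state of one season earlier, followed by the level and seasonal updates) can be written as a short Python sketch. The initial states, smoothing parameters, and observations below are invented, with a season length of 2 to keep the example small; this illustrates the recursion only, not how the parameters in Table 7 were estimated.

```python
def ets_ana_filter(y, alpha, gamma, level0, seasonal0):
    """Run the ETS(A,N,A) recursion; seasonal0 holds the m initial seasonal states."""
    m = len(seasonal0)
    level = level0
    seasonal = list(seasonal0)  # seasonal[-m] is always the state from m periods back
    fitted, errors = [], []
    for obs in y:
        s_lag = seasonal[-m]
        forecast = level + s_lag                 # one-step forecast l_{t-1} + s_{t-m}
        error = obs - forecast                   # epsilon_t
        level = level + alpha * error            # level update
        seasonal.append(s_lag + gamma * error)   # seasonal update
        fitted.append(forecast)
        errors.append(error)
    return fitted, errors, level, seasonal[-m:]


# Hypothetical example with season length m = 2.
fitted, errors, level, seasonal = ets_ana_filter(
    y=[11.0, 9.0, 13.0, 9.0], alpha=0.5, gamma=0.5,
    level0=10.0, seasonal0=[1.0, -1.0],
)
print(fitted, errors, level, seasonal)
```

The first two observations match the initial pattern exactly, so the states change only from the third observation onward, where the forecast error of 2 raises both the level and the seasonal state.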

Figure 17: Autocorrelation test for the residuals in ETS model

The Ljung-Box test is used to check whether the residuals are independent. If they are not, the model can be biased. In the Ljung-Box test, the null hypothesis states that the residuals are white noise, and the alternative hypothesis states that they are not. Table 8 reports the Ljung-Box test result for the ETS model. The p-value in the table is 8.963e-09, so the null hypothesis is rejected at the 0.05 significance level and the residuals are autocorrelated. Consequently, the ETS model does not pass the residual autocorrelation test.

Ljung-Box test for ETS(A,N,A) model

Statistic          Result
Q                  57.917
df                 10
p-value            8.963e-09
Total lags used    24

Table 8: Ljung-Box Test Results for ETS Model

                  Q(m) of squared series (LM test)    Rank-based test
Test statistic    14.54957                            23.09583
p-value           0.2669978                           0.02692328

Table 9: ARCH Test Results for ETS Model’s Residuals

Shapiro-Wilk Normality Test for ETS(A,N,A) model

Statistic    Result
W            0.99174
p-value      0.6312

Table 10: Shapiro-Wilk Normality Test Results for ETS Model


Accordingly, the results of the ETS model can be biased, and its predictions are unreliable. We still forecast the tourist arrivals with the ETS model, however: we expect its forecasts to be biased, and our aim is to test this.

Likewise, Table 9 shows the ARCH test results for the residuals. The p-value for the Q(m) of squared series (LM test) is 0.2669978, so we cannot reject the null hypothesis. However, the p-value for the rank-based test is 0.02692328, so we reject the null hypothesis at the 0.05 significance level. Therefore, the ETS model is heteroscedastic. In addition, Table 10 indicates that the residuals of the model are normally distributed at the 0.05 significance level.

Figure 18: Prediction of the ETS Model

Figure 18 presents the forecasting results of the ETS model. The figure shows that the ETS prediction of tourist arrivals in Tuscany is acceptable for the autumn of 2016. Later in this dissertation, we calculate the RMSE of the ETS model and compare it with the RMSE of the other models.

3.2 ARIMA Model

The ARIMA model is the other popular forecasting method in the literature, and it is often used as a benchmark for testing other models. ARIMA models also include some variations of the ETS model as special cases. We can therefore say that ARIMA is the most widely used univariate model in the time series literature, and it is applied in many sectors. As mentioned in the first chapter, many researchers have employed ARIMA models to predict tourist arrivals.

The ARIMA model is written as ARIMA(p,d,q). The "AR" (autoregressive) part is represented by "p", the number of lagged values of the data included in the model. The "I" (integrated) part handles the trend and is represented by "d", the number of times the data must be differenced. The "MA" (moving average) part is represented by "q", the number of lagged forecast errors included in the model.

The seasonal ARIMA model is written as ARIMA(p,d,q)(P,D,Q)m. "P", "D", and "Q" are the autoregressive, differencing, and moving average orders of the seasonal part, and "m" is the seasonal period: 12 for monthly data, 4 for quarterly data, and 1 for yearly data.


$$\Bigl(1 - \sum_{i=1}^{p} \theta_i B^{i}\Bigr)\Bigl(1 - \sum_{i=1}^{P} \varphi_i B^{im}\Bigr)(1 - B)^{d}(1 - B^{m})^{D} y_t = \Bigl(1 + \sum_{i=1}^{q} \vartheta_i B^{i}\Bigr)\Bigl(1 + \sum_{i=1}^{Q} \phi_i B^{im}\Bigr)\varepsilon_t \qquad (27)$$

Equation 27 gives the backshift notation of the seasonal ARIMA model.

We pick the best ARIMA model by the AIC criterion. ARIMA(3,0,1)(1,1,2)12 is the best ARIMA model for the Tuscany tourism data by this criterion. As mentioned earlier, ARIMA is a univariate time series model. It can therefore include only the Tuscany tourism data and cannot contain an exogenous variable.

$$(1 - \theta_1 B - \theta_2 B^{2} - \theta_3 B^{3})(1 - \varphi_1 B^{12})(1 - B^{12}) y_t = (1 + \vartheta_1 B)(1 + \phi_1 B^{12} + \phi_2 B^{24})\varepsilon_t \qquad (28)$$

Equation 28 gives the backshift notation of the ARIMA model we use, where $y_t$ denotes the tourist arrivals in Tuscany and $\varepsilon_t$ the error term. This model is employed to predict the tourist flows in Tuscany.
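To see which lags of the series a product of backshift polynomials involves, the factors can be multiplied out mechanically. The Python sketch below does this for a deliberately small, hypothetical case, $(1 - 0.5B)(1 - B^{12})$, and not for the fitted model; a polynomial is stored as a list whose entry $i$ is the coefficient of $B^{i}$.

```python
def poly_multiply(a, b):
    """Multiply two polynomials in B given as coefficient lists (index = power of B)."""
    out = [0.0] * (len(a) + len(b) - 1)
    for i, a_i in enumerate(a):
        for j, b_j in enumerate(b):
            out[i + j] += a_i * b_j
    return out


# Hypothetical factors: a non-seasonal AR(1) term and a seasonal difference.
ar_factor = [1.0, -0.5]                    # 1 - 0.5 B
seasonal_diff = [1.0] + [0.0] * 11 + [-1.0]  # 1 - B^12

product = poly_multiply(ar_factor, seasonal_diff)

# Non-zero coefficients show the lags that enter the left-hand side.
lags = {power: coef for power, coef in enumerate(product) if coef != 0.0}
print(lags)
```

The expanded polynomial $1 - 0.5B - B^{12} + 0.5B^{13}$ shows that the current value is related to the values 1, 12, and 13 months earlier.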

Model: ARIMA(3,0,1)(1,1,2)12

AR1       AR2       AR3       MA1        SAR1
0.3179    0.1956    0.3674    -0.3828    -0.4975

SMA1      SMA2       AIC        AICc       BIC
0.3492    -0.6383    -325.89    -324.59    -303.59


Next, the residuals of the fitted ARIMA model are checked for autocorrelation. Figure 19 shows the time series graph, ACF graph, and distribution graph of the residuals of the ARIMA model we use, and Table 12 reports the Ljung-Box test result. According to the distribution graph, the residuals are normally distributed. One lag is above the limits in the ACF graph. However, the p-value of the Ljung-Box test is 0.4722, which is greater than the significance level of 0.05, so the null hypothesis of white-noise residuals cannot be rejected. Therefore, the ARIMA model passes the residual autocorrelation test, and we can accept that its residuals are not autocorrelated.

Ljung-Box test for ARIMA(3,0,1)(1,1,2)12 model

Statistic          Result
Q                  16.739
df                 17
p-value            0.4722
Total lags used    24

Table 12: Ljung-Box Test Results for ARIMA Model

                  Q(m) of squared series (LM test)    Rank-based test
Test statistic    17.60831                            16.73023
p-value           0.1281114                           0.16003

Table 13: ARCH Test Results for ARIMA Model’s Residuals

The Ljung-Box test result is thus satisfactory for the residuals of the fitted ARIMA model. Table 13 shows the ARCH test results for the residuals. The p-value for the Q(m) of squared series (LM test) is 0.1281114, and the p-value for the rank-based test is 0.16003. The null hypothesis therefore cannot be rejected at the 0.05 significance level, and the ARIMA model is not heteroscedastic. In addition, Table 14 indicates that the residuals of the model are normally distributed at the 0.05 significance level. We can therefore proceed to the next step.

Figure 19: Autocorrelation test for the residuals in ARIMA model

Shapiro-Wilk Normality Test for ARIMA(3,0,1)(1,1,2)12 model

Statistic    Result
W            0.9882
p-value      0.3198

Table 14: Shapiro-Wilk Normality Test Results for ARIMA Model

Figure 20: Prediction of the ARIMA Model

Figure 20 shows the predictions of the fitted ARIMA model. The forecasts for winter and autumn appear successful in the figure. For 2016, the ARIMA predictions look better than those of the ETS model. For a definite comparison, however, the models are compared by RMSE in the last section.


3.3 Dynamic Regression Model

In the previous two sections, we introduced the ETS and ARIMA models, which serve as benchmarks. Both are univariate time series models. In this section, we add further information to the ARIMA model in the form of one or more exogenous variables. The resulting model is called the ARIMAX model or the dynamic regression model in the literature.

This model has the "AR", "I", and "MA" parts described in the ARIMA section. Unlike the ARIMA model, the ARIMAX model also has an "X" part, which represents the exogenous variable(s) in the model.

The ARIMAX or dynamic regression model is a multivariate time series model. The time series literature offers many multivariate models, most notably the VAR model. However, the dynamic regression model matches our aim and our data, so we focus on it in this study.

The dynamic regression model is formulated as

$$y_t = \beta_t x_t + u_t \qquad (29)$$

$$\Bigl(1 - \sum_{i=1}^{p} \theta_i B^{i}\Bigr)\Bigl(1 - \sum_{i=1}^{P} \varphi_i B^{im}\Bigr)(1 - B)^{d}(1 - B^{m})^{D} u_t = \Bigl(1 + \sum_{i=1}^{q} \vartheta_i B^{i}\Bigr)\Bigl(1 + \sum_{i=1}^{Q} \phi_i B^{im}\Bigr)\varepsilon_t \qquad (30)$$

Equation 30 gives the backshift notation of the dynamic regression model. It is the same as equation 27 in the previous section, except that $y_t$ is replaced by $u_t$, where $u_t = y_t - \beta_t x_t$ from equation 29. The exogenous variable is thus added to the ARIMA model by substituting equation 29 into equation 30.

In the dynamic regression model built for this dissertation, $y_t$ denotes the tourist arrivals in Tuscany and $x_t$ denotes the Google Trends data lagged by one month. We select the fitted dynamic regression model by the AIC criterion. ARIMAX(3,0,0)(1,1,2)12 has the minimum AIC value and is therefore the best-fitting model. It can be written as

$$(1 - \theta_1 B - \theta_2 B^{2} - \theta_3 B^{3})(1 - \varphi_1 B^{12})(1 - B^{12}) u_t = (1 + \phi_1 B^{12} + \phi_2 B^{24})\varepsilon_t \qquad (31)$$
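The regression part of the model, with a one-month lagged regressor, can be sketched in Python as follows. The sketch assumes a single constant coefficient `beta` and uses invented numbers; the function names are hypothetical, and in practice the coefficient is estimated jointly with the ARIMA terms rather than fixed in advance.

```python
def lag_series(x, lag):
    """Align x so that position t holds x[t - lag]; the first `lag` entries are missing."""
    return [None] * lag + list(x[: len(x) - lag])


def regression_errors(y, x_lagged, beta):
    """Compute u_t = y_t - beta * x_t wherever the lagged regressor is available."""
    return [y_t - beta * x_t for y_t, x_t in zip(y, x_lagged) if x_t is not None]


# Hypothetical data: arrivals and a search index, with an assumed coefficient.
arrivals = [10.0, 12.0, 14.0, 16.0]
search_index = [1.0, 2.0, 3.0, 4.0]
beta = 2.0

x_lagged = lag_series(search_index, 1)
u = regression_errors(arrivals, x_lagged, beta)
print(x_lagged, u)
```

The series `u` is what the ARIMA structure is then applied to, and the first observation is dropped because no lagged search value exists for it.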

Model: ARIMAX(3,0,0)(1,1,2)12

AR1       AR2       AR3       SAR1       SMA1
0.0598    0.2050    0.4104    -0.4876    0.3015

SMA2       x_t (a)    AIC        AICc       BIC
-0.6976    -0.0037    -326.39    -325.09    -304.09

a Google Trends Data

Table 15: Dynamic Regression Model with Tuscany Tourism Data

We test the residuals of the fitted dynamic regression model for autocorrelation, as we did for the other models. Figure 21 shows the time series graph, ACF graph, and distribution graph of the residuals. According to this figure, the residuals are normally distributed. One lag is close to the limit in the ACF graph, so to decide we look at Table 16. The p-value of the Ljung-Box test is 0.4916, so the hypothesis that the residuals are white noise cannot be rejected at the 0.05 significance level. In addition, Table 17 reports the ARCH test results for the residuals of the dynamic regression model. The p-value for the Q(m) of squared series (LM test) is 0.1729977, and the p-value for the rank-based test is 0.4196319. Therefore, this model is not heteroscedastic at the 0.05 significance level. Table 18 also indicates that the residuals of the model are normally distributed at the 0.05 significance level.


Ljung-Box test for ARIMAX(3,0,0)(1,1,2)12 model

Statistic          Result
Q                  16.458
df                 17
p-value            0.4916
Total lags used    24

Table 16: Ljung-Box Test Results for Dynamic Regression Model

                  Q(m) of squared series (LM test)    Rank-based test
Test statistic    16.41408                            12.329
p-value           0.1729977                           0.4196319

Table 17: ARCH Test Results for Dynamic Regression Model’s Residuals

Shapiro-Wilk Normality Test for ARIMAX(3,0,0)(1,1,2)12 model

Statistic    Result
W            0.98124
p-value      0.06533

Table 18: Shapiro-Wilk Normality Test Results for Dynamic Regression Model

The residuals are not autocorrelated. Therefore, we can forecast the number of tourists in Tuscany without this source of bias. Figure 22 shows the prediction of the dynamic regression model.


Figure 22: Prediction of the Dynamic Regression Model

The prediction of tourist arrivals with this model is close to the actual tourist numbers in Tuscany. To verify this impression, we compare all the models in the next section.

4 Conclusion

Finally, the forecasts of all models for 2016 are compared against the actual tourist numbers in 2016.

Horizon       Dynamic Regression Model    ARIMA       ETS
RMSE(h=1)     17408.5                     18054.77    6560.39
RMSE(h=3)     46439.94                    41698.29    25125.73
RMSE(h=6)     61086.31                    72859       83336.37
RMSE(h=12)    56405.56                    63551.6     68244.9

Table 19: RMSE results of the models

Figure 23: Prediction of the models

Table 19 shows the RMSE of the predictions at different horizons. The ETS model has the lowest RMSE for the one-month and three-month predictions. However, as mentioned above, this model can be biased according to the residual test results. For the six-month and one-year predictions, the dynamic regression model has the lowest RMSE. Figure 23 shows the time series graph of the forecasts. In this figure, the dynamic regression forecast is the closest to the graph of the actual tourist numbers.
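An RMSE comparison by horizon can be sketched in Python as below. We assume here that RMSE(h) is computed over the first h months of the test year; the text does not state this explicitly, so it should be read as one possible interpretation. The actual and predicted values are invented.

```python
from math import sqrt


def rmse(actual, predicted):
    """Root mean squared error between two equally long sequences."""
    return sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))


def rmse_by_horizon(actual, predicted, horizons=(1, 3, 6, 12)):
    """RMSE over the first h test months, for each horizon h (assumed definition)."""
    return {h: rmse(actual[:h], predicted[:h]) for h in horizons}


# Hypothetical test year: the forecast misses by 3 and 4 in the first two months.
actual = [100.0] * 12
predicted = [103.0, 104.0] + [100.0] * 10

scores = rmse_by_horizon(actual, predicted)
print(scores)
```

In this toy case the score falls as the horizon grows, because the later months are forecast perfectly; with real forecasts, errors usually accumulate and the score tends to rise with the horizon.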

Consequently, the data provided by Google Trends can be beneficial for predicting tourist arrivals. However, these data have some limitations. The primary difficulty is that Google does not publish the algorithm it uses to calculate trends. Moreover, researchers who want to work with Google Trends data must select the research area carefully. For instance, Barcelona is famous among both travelers and football fans, so it is very difficult to tell whether users searching for Barcelona on Google are interested in football or in travel. This ambiguity is another obstacle of the data. Even so, a model built with Google Trends data can give good forecast results when the model and the study area are selected correctly.

The method we applied to the Google Trends data can guide other researchers who wish to work with Google data. In further work, the model can be tested against other multivariate time series models, and more exogenous variables can be added to it. "Bootstrapping" can also be applied to improve confidence in the model. Finally, Google Trends data may improve forecasts not only for tourism but also for other sectors.


References

[1] Hyndman, R., Koehler, A. B., Ord, J. K., & Snyder, R. D. (2008). Forecasting with exponential smoothing: the state space approach. Springer Science & Business Media.

[2] Song, H., Li, G., Witt, S. F., & Fei, B. (2010). Tourism demand modelling and forecasting: how should demand be measured?. Tourism Economics, 16(1), 63-81.

[3] Lim, C., & McAleer, M. (2001). Forecasting tourist arrivals. Annals of Tourism Research, 28(4), 965-977.

[4] Hyndman, R. J., & Athanasopoulos, G. (2014). Forecasting: principles and practice. OTexts.

[5] Kim, J. H., Wong, K., Athanasopoulos, G., & Liu, S. (2011). Beyond point forecasting: evaluation of alternative prediction intervals for tourist arrivals. International Journal of Forecasting, 27(3), 887-901.

[6] Chu, F. L. (2008). A fractionally integrated autoregressive moving average approach to forecasting tourism demand. Tourism Management, 29(1), 79-88.

[7] Shareef, R., & McAleer, M. (2007). Modelling the uncertainty in monthly international tourist arrivals to the Maldives. Tourism Management, 28(1), 23-45.

[8] Jackman, M., & Greenidge, K. (2010). Modelling and forecasting tourist flows to Barbados using structural time series models. Tourism and Hospitality Research, 10(1), 1-13.

[9] Chu, F. L. (2011). A piecewise linear approach to modeling and forecasting demand for Macau tourism. Tourism Management, 32(6), 1414-1420.

[10] Chen, L., Li, G., Wu, D. C., & Shen, S. (2017). Forecasting Seasonal Tourism Demand Using a Multi-Series Structural Time Series Method. Journal of Travel Research.

[11] Karimu, A. (2014). Impact of economic and non-economic factors on gasoline demand: a varying parameter model for Sweden and the UK. OPEC Energy Review, 38(4), 445-468.

[12] Song, H., Li, G., Witt, S. F., & Athanasopoulos, G. (2011). Forecasting tourist arrivals using time-varying parameter structural time series models. International Journal of Forecasting, 27(3), 855-869.

[13] Gounopoulos, D., Petmezas, D., & Santamaria, D. (2012). Forecasting tourist arrivals in Greece and the impact of macroeconomic shocks from the countries of tourists’ origin. Annals of Tourism Research, 39(2), 641-666.


[14] Lütkepohl, H. (2005). New introduction to multiple time series analysis. Springer Science & Business Media.

[15] Bisgaard, S., & Kulahci, M. (2011). Time series analysis and forecasting by example. John Wiley & Sons.

[16] Gunter, U., & Önder, I. (2016). Forecasting city arrivals with Google An-alytics. Annals of Tourism Research, 61, 199-212.

[17] Yang, Y., Pan, B., & Song, H. (2014). Predicting hotel demand using des-tination marketing organization’s web traffic data. Journal of Travel Re-search, 53(4), 433-447.

[18] Xiang, Z., Schwartz, Z., Gerdes, J. H., & Uysal, M. (2015). What can big data and text analytics tell us about hotel guest experience and satisfac-tion?. International Journal of Hospitality Management, 44, 120-130 [19] Liu, Y., Teichert, T., Rossi, M., Li, H., & Hu, F. (2017). Big data for big

insights: Investigating language-specific drivers of hotel satisfaction with 412,784 user-generated reviews. Tourism Management, 59, 554-563. [20] Akal, M. (2004). Forecasting Turkey’s tourism revenues by ARMAX model.

Tourism Management, 25(5), 565-580.

[21] Huang, J. H., & Min, J. C. (2002). Earthquake devastation and recovery in tourism: the Taiwan case. Tourism Management, 23(2), 145-154. [22] Eugenio−Martin, J. L., Sinclair, M. T., & Yeoman, I. (2006). Quantifying

the effects of tourism crises: an application to Scotland. Journal of Travel & Tourism Marketing, 19(2-3), 21-34.

[23] Goh, C., & Law, R. (2002). Modeling and forecasting tourism demand for arrivals with stochastic nonstationary seasonality and intervention. Tourism management, 23(5), 499-510.

[24] Álvarez−Díaz, M., & Roselló−Nadal, J. (2010). Forecasting British tourist arrivals in the Balearic Islands using meteorological variables. Tourism Eco-nomics, 16(1), 153-168.

[25] Zhang, H. Q., & Kulendran, N. (2017). The impact of climate variables on seasonal variation in Hong Kong inbound tourism demand. Journal of Travel Research, 56(1), 94-107.

[26] Kozak, N., Uysal, M., & Birkan, I. (2008). An analysis of cities based on tourism supply and climatic conditions in Turkey. Tourism Geographies, 10(1), 81-97.

[27] Hassani, H., Webster, A., Silva, E. S., & Heravi, S. (2015). Forecasting US tourist arrivals using optimal singular spectrum analysis. Tourism Management, 46, 322-335.

[28] Hassani, H. (2007). Singular spectrum analysis: methodology and comparison.

[29] Song, H., & Li, G. (2008). Tourism demand modelling and forecasting: A review of recent research. Tourism Management, 29(2), 203-220.

[30] Xiang, Z., Wöber, K., & Fesenmaier, D. R. (2008). Representation of the online tourism domain in search engines. Journal of Travel Research, 47(2), 137-150.

[31] Castells, M. (2002). The Internet galaxy: Reflections on the Internet, business, and society. Oxford University Press on Demand.

[32] Choi, H., & Varian, H. (2012). Predicting the present with Google Trends. Economic Record, 88(s1), 2-9.

[33] Fondeur, Y., & Karamé, F. (2013). Can Google data help predict French youth unemployment?. Economic Modelling, 30, 117-125.

[34] Hand, C., & Judge, G. (2012). Searching for the picture: forecasting UK cinema admissions using Google Trends data. Applied Economics Letters, 19(11), 1051-1055.

[35] Yang, X., Pan, B., Evans, J. A., & Lv, B. (2015). Forecasting Chinese tourist volume with search engine data. Tourism Management, 46, 386-397.

[36] Bangwayo-Skeete, P. F., & Skeete, R. W. (2015). Can Google data improve the forecasting performance of tourist arrivals? Mixed-data sampling approach. Tourism Management, 46, 454-464.

[37] Rivera, R. (2016). A dynamic linear model to forecast hotel registrations in Puerto Rico using Google Trends data. Tourism Management, 57, 12-20.

[38] Li, X., Pan, B., Law, R., & Huang, X. (2017). Forecasting tourism demand with composite search index. Tourism Management, 59, 57-66.

[39] Artola, C., Pinto, F., & de Pedraza García, P. (2015). Can internet searches forecast tourism inflows?. International Journal of Manpower, 36(1), 103-116.

[40] Pan, B., & Yang, Y. (2017). Forecasting destination weekly hotel occupancy with big data. Journal of Travel Research, 56(7), 957-970.

[41] UNWTO, T. O. (2017). Tourism Highlights, 2017 edition. World.

[42] WTTC, T. O. (2017). Travel & Tourism Economic Impact 2017 Italy. World.

[43] Branchini, A. (2015). Tourism and Its Economic Impact in Italy: A Study of Industry Concentration and Quality of Life.

[44] Popp, M. (2012). Positive and negative urban tourist crowding: Florence, Italy. Tourism Geographies, 14(1), 50-72.

[45] Morabito, M., Cecchi, L., Modesti, P. A., Crisci, A., Orlandini, S., Maracchi, G., & Gensini, G. F. (2004). The impact of hot weather conditions on tourism in Florence, Italy: the summer 2002-2003 experience. Advances in tourism climatology, 12, 158-165.

[46] Romano, M. F., & Natilli, M. (2010). Wine tourism in Italy: New profiles, styles of consumption, ways of touring. Turizam: znanstveno-stručni časopis, 57(4), 463-475.

[47] Brunori, G., & Rossi, A. (2000). Synergy and coherence through collective action: some insights from wine routes in Tuscany. Sociologia Ruralis, 40(4), 409-423.

[48] Randelli, F., Romei, P., & Tortora, M. (2014). An evolutionary approach to the study of rural tourism: The case of Tuscany. Land Use Policy, 38, 276-281.

[49] Sonnino, R. (2004). For a ’piece of bread’? Interpreting sustainable development through agritourism in Southern Tuscany. Sociologia Ruralis, 44(3), 285-300.

[50] Cryer, J. D., & Chan, K. S. (2008). Time Series Analysis with Applications in R. Springer Texts in Statistics.

[51] Regional Institute for Economic Planning of Tuscany (2007). Turismo & Toscana La congiuntura 2006, Firenze.

[52] Regional Institute for Economic Planning of Tuscany (2008). Turismo & Toscana La congiuntura 2007, Firenze.

[53] Regional Institute for Economic Planning of Tuscany (2009). Turismo & Toscana La congiuntura 2008, Firenze.

[54] Regional Institute for Economic Planning of Tuscany (2010). Turismo & Toscana La congiuntura 2009, Firenze.

[55] Regional Institute for Economic Planning of Tuscany (2011). Turismo & Toscana La congiuntura 2010, Firenze.

[56] Regional Institute for Economic Planning of Tuscany (2012). Turismo & Toscana La congiuntura 2011, Firenze.

[57] Regional Institute for Economic Planning of Tuscany (2013). Turismo & Toscana La congiuntura 2012, Firenze.

[58] Regional Institute for Economic Planning of Tuscany (2014). Turismo & Toscana La congiuntura 2013, Firenze.

[59] Regional Institute for Economic Planning of Tuscany (2015). Turismo & Toscana La congiuntura 2014, Firenze.

[60] Regional Institute for Economic Planning of Tuscany (2016). Turismo & Toscana La congiuntura 2015, Firenze.

[61] Pankratz, A. (1991). Forecasting with dynamic regression models.

[62] Guthrie, W. F. (2010). NIST/SEMATECH Engineering Statistics Handbook.
