Università Ca' Foscari Venezia
Dorsoduro 3246, 30123 Venezia

Master's Degree programme — Second Cycle (D.M. 270/2004)
in Informatica — Computer Science

Final Thesis

The visual narrative of Venice: an analysis of the touristic photographs in social media

Supervisor: Ch. Prof. Andrea Torsello
Candidate: Eric Boscaro
Matriculation number 835651

Academic Year


Abstract

The popularity and diffusion of social media have been growing constantly in recent years, making the automatic understanding of the giant amount of data produced fundamental to discovering recurrent patterns and other important information. While a huge body of work can be found in the literature on the topic of extracting "mood" information about a topic from textual data, very little work has been done on the problem of automatically analyzing the visual content of images in social media. In this thesis the images retrieved from social media are used to analyze how Venice is represented in touristic photographs in different times of the year and in its different areas (sestieri). To this end, using techniques borrowed from the object classification literature, we built a classifier able to distinguish the category of new photos, and analyzed the variation of the class distribution in space and time, thus providing a quantitative characterization of the visual narrative of Venice in social media.


Contents

1 Introduction
  1.0.1 Structure of the thesis
2 The State of the Art
3 The Bag of Words model
  3.1 The Bag of Words model
  3.2 The BoW model in Computer Vision
  3.3 BoW: Feature detection and description
    3.3.1 SIFT
    3.3.2 Standard SIFT descriptor
    3.3.3 Dense SIFT
  3.4 BoW: Codebook Formation
  3.5 BoW: Learning and classification
    3.5.1 SVM: Support Vector Machines
    3.5.2 Non-linear SVM
    3.5.3 Multi-class SVM
4 Construction of the classifier
  4.1 The Dataset
  4.2 The classifier
  4.3 Feature extraction from the images
  4.4 Clustering
  4.5 Construction of the Bag of keypoints
  4.6 Classification
    4.6.1 The parameters of SVM
    4.6.2 The Varies class problem
    4.6.3 Final considerations
  4.7 Display of the results
    4.7.1 Considerations on the classifier results
5 Analysis of the results
  5.1 The year analysis
    5.1.1 General Quantitative Analysis
    5.1.2 General Normalized Analysis
    5.1.3 Districts focused Analysis
    5.1.4 Analysis of categories densities
    5.1.5 The distribution of Varies
  5.2 A case study: Carnival 2015
6 Conclusions
  6.1 Future work
Appendices
A
  A.1 K-means


List of Figures

3.1 SIFT keypoint in an image
3.2 SIFT descriptors: spatial histogram of the image gradient
3.3 Canonical SIFT descriptor and spatial binning functions
3.4 Geometry of the DSIFT descriptors
3.5 The SVM hyperplane example
4.1 First category, Lagoon landscape
4.2 Second category, Townscape
4.3 Third category, Art
4.4 Fourth category, Folklore
4.5 Fifth category, Food
4.6 Sixth category, Varies
4.7 The flow chart of the classification construction phases and recognition
4.8 A singular feature
4.9 Test of the rate of correctness, changing parameter K
4.10 Example of a class histogram
4.11 Linear SVM, test of parameter C
4.12 Comparison of the confusion matrix with parameter M=1 and M=2
4.13 Test 1, correctly classified Folklore image
4.14 Test 2, correctly classified Food image
4.15 Test 3, correctly classified Townscape image
4.16 Test 4, incorrectly classified Townscape image
4.17 Test 5, incorrectly classified Folklore image
5.1 Comparison: Images Retrieved and Touristic Affluence in Venice in years 2014-2015
5.2 Quantitative representation of the categories over the months of the years 2014-2015
5.3 Normalized representation of the data categories over the months of the years 2014-2015
5.4 Normalized representation of Folklore over the months of the years 2014-2015
5.5 Normalized representation of Lagoon Landscape over the months of the years 2014-2015
5.6 Normalized representation of Townscape over the months of the years 2014-2015
5.7 Normalized representation of Art over the months of the years 2014-2015
5.8 Normalized representation of Food over the months of the years 2014-2015
5.9 Heat map of Lagoon Landscape category, years 2014-2015
5.10 Heat map of Townscape category, years 2014-2015
5.12 Heat map of Folklore category, years 2014-2015
5.13 Heat map of Food category, years 2014-2015
5.14 Heat map of Varies vs the other classes, 2014
5.15 Heat map of Varies vs the other classes, 2015
5.16 Folklore category, Carnival vs After Carnival
5.17 Heat map of the Carnival vs After Carnival periods, 2015
5.18 Lagoon and Food, Carnival vs After Carnival
A.1 Normalized rate of the categories over the Venetian Districts, 2014
A.2 Normalized rate of the categories over the Venetian Districts, 2015

List of Tables

4.1 Test on the parameters C and γ on the SVC with rbf kernel
4.2 Votes and probabilities of Test 1
4.3 Votes and probabilities of Test 2
4.4 Votes and probabilities of Test 3
4.5 Votes and probabilities of Test 4
4.6 Votes and probabilities of Test 5
5.1 Quantitative data of the year 2014
5.2 Quantitative data of the year 2015
5.3 Touristic presence in Venice during the years 2014 and 2015
5.4 Normalized distribution of the data over the months of the year 2015
5.5 Normalized distribution of the data over the months of the year 2015
5.6 Normalized category rates, Carnival
5.7 Normalized category rates, after Carnival
A.1 Category results Carnival 2015
A.2 Category results after Carnival 2015


Chapter 1

Introduction

In this work we show how to construct a classifier for pictures of Venice obtained from social media, using an approach inherited from text classification called "Bag of Visual Words", and how the results can be analyzed to discover interesting patterns in the behavior of the tourists visiting the lagoon city.

1.0.1 Structure of the thesis

The thesis is organized as follows:

1. In chapter 2 we present a summary of the historical approaches to the problem, focusing then on the concept of "Bag of Words" and similar strategies;

2. In chapter 3 we give the reader an overview of the "Bag of Words" model and how it can be adapted to computer vision problems, focusing in particular on the algorithms used: "SIFT" for feature extraction, "K-means" for clustering and "SVM" for classification;


3. In chapter 4 we discuss the classification algorithm constructed, describing in detail every issue discovered and the performance results of each phase;

4. In chapter 5 we present the results of the classifier applied to the case study of the districts of Venice during the years 2014 and 2015, and in particular to the Carnival of 2015;

5. In chapter 6 we report some concluding considerations and discuss some possible future developments.


Chapter 2

The State of the Art

Extracting meaningful information from images and using it to learn "scene categories" has been an important research topic since the birth of the machine learning and computer vision fields. In this chapter an overview of the main methods is given, focusing then on the "Bag of Keypoints" approaches and in particular on the work of Csurka and Dance [9].

The efforts in the first years of research were focused on the categorization of specific patterns in images like faces [23] or people [24]; then, as in the notable works of Schneiderman and Kanade [26] (cars and pedestrians) and Fergus, Perona and Zisserman [12], many different elements were included in the categorization.

From those years the effort on the topic grew exponentially and many different techniques for the visual categorization of images were implemented; however, the great majority of those works can be grouped into four main approaches.

The first is Fine-Grained Recognition: in this set of methods the objective is to distinguish between particular subsets of categories like car models [28] or dog breeds [16]. The key is to localize important details and represent them as global clues: since describing overall features like shapes and colors cannot capture little differences, a dictionary of "fine grain" information is needed to distinguish between different classes. Many difficult problems such as distinguishing between flower species [22] or fungi [10] can be addressed using these methods; however, in this case study such a "fine grain" distinction is not necessary, since the chosen categories are quite different from one another, so a different approach has been preferred.

Another technique, called Annotation-based, uses articulated input from humans, such as asking questions about the objects and clicking on particular attributes, in a method called "human-in-the-loop" [32], to create human-customized feature vectors to perform visual classification. This method is used in cases of very detailed class categorization and usually the humans performing the tasks are domain experts; for this reason this kind of approach is not ideal in a case of completely automatic learning, where the adaptability of the algorithm is a key factor.

The idea behind the Template-based approach is to create feature response maps by matching images against a large set of randomly generated image templates; it has been shown how, using those predefined templates as "filters", a classifier that performs well can be created for tasks such as object recognition [18] or body part recognition [20].

The final approach, considered the state of the art for solving a general visual categorization problem, is Bag of Visual Words or Bag of Keypoints. Motivated by the Bag of Words learning methods used with success for text classification by Joachims [15] and Tong [30], it has been adapted to solve the problem of image categorization: this approach consists of clustering descriptors of particular image patches to obtain a histogram of the number of occurrences of image patterns, which is then used as feature vector for the classification process. The method, which will be explained in detail in the next chapters, has been used as a base to perform visual categorization with different classification approaches: in the work of Sudderth, Torralba, Freeman and Willsky [29], after the "Bag of Keypoints" construction phase, a Transformed Dirichlet model is used to categorize; Sivic, Russell, Efros, Zisserman and Freeman [27] instead use Latent Semantic Analysis and Latent Dirichlet Allocation models, while Fei-Fei and Perona [11] utilize a Bayesian Hierarchical Method.

In more recent works a Bag of Visual Words model paired with the Pyramid Match Kernel introduced by Grauman and Darrell [13] showed promising results [33]. This algorithm, based on the idea of mapping the BoW features to multi-dimensional multi-resolution histograms, has the main advantages of a linear complexity (the other methods are generally quadratic or worse) and the fact that the multi-resolution histograms have the ability to capture co-occurring features. Despite those great advantages, the more solid, state-of-the-art solution of SVM has been chosen, in order to focus more on the analysis of the results of the model on social media and Venice, and less on building experimental solutions for the classifier.

The work in this thesis is an extension of the previously cited paper of Csurka and Dance: after the image feature extraction and histogram construction phases, a modified version of the Support Vector Machines algorithm is used to adapt the classification to the selected categories of photos of Venice extracted from social networks. Such a dataset in fact contains pictures taken with many different cameras (usually mobile phones), so orientation, light and scale may change in a relevant way, while at the same time many of the images retrieved may be useless for the analysis and must be removed; therefore a method to handle the high data variability and the removal of undesirable elements has also been studied and implemented.


Chapter 3

The Bag of Words model

This chapter explains in detail the Bag of Words model for text categorization, and how this model can also be used in the computer vision field to correctly classify images.

3.1 The Bag of Words model

In the Bag of Words model (or BoW model), any text, such as a sentence or a document, is represented as a bag or multiset of its elements, considering in particular the multiplicity of each word while ignoring word order and grammar; the frequency of the occurrences of every word can then be used as feature vector in a classifier to perform document categorization. For example, consider two simple text sentences:

1. Leo wants to win the Oscar, Matt wants the same
2. Matt eventually wins the Oscar


The corresponding dictionary is constructed as the list of the distinct words:

[Leo, wants, to, win, the, Oscar, Matt, same, eventually, wins]

Since the dictionary is composed of 10 distinct words, each sentence can be represented by a vector of 10 elements:

(1) [1, 2, 1, 1, 2, 1, 1, 1, 0, 0]
(2) [0, 0, 0, 0, 1, 1, 1, 0, 1, 1]

where each element of the vector is the number of occurrences of that particular word in the sentence; for example, the second element of the first vector corresponds to the word "wants" and its value is 2 because the word appears two times in the first sentence.

This representation is also called a histogram, and it is used with success as a feature vector in many applications such as text categorization or e-mail filtering [25].
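The two-sentence example above can be reproduced in a few lines of plain Python (an illustrative sketch, not code from the thesis; punctuation is stripped for simplicity):

```python
from collections import Counter

def bow_vector(sentence, vocabulary):
    """Count occurrences of each vocabulary word in the sentence."""
    counts = Counter(sentence.lower().split())
    return [counts[word] for word in vocabulary]

s1 = "Leo wants to win the Oscar Matt wants the same"
s2 = "Matt eventually wins the Oscar"

# Vocabulary: distinct words in order of first appearance.
vocab = []
for word in (s1 + " " + s2).lower().split():
    if word not in vocab:
        vocab.append(word)

print(bow_vector(s1, vocab))  # [1, 2, 1, 1, 2, 1, 1, 1, 0, 0]
print(bow_vector(s2, vocab))  # [0, 0, 0, 0, 1, 1, 1, 0, 1, 1]
```

The resulting vectors match the histograms (1) and (2) above.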


3.2 The BoW model in Computer Vision

In computer vision a Bag of Visual Words (or Keypoints) is the set of the occurrences of local image features over a vocabulary of local features: an image is treated in the same way as a text document, and the consequent problem is to define what a "word" is in the image context. To fulfill this purpose three main phases are usually followed: feature detection, feature description and codebook formation.

3.3 BoW: Feature detection and description

Starting from an initial set of measured data, feature extraction builds a new set of derived values called features, which are informative and non-redundant, used to facilitate the subsequent learning and generalization phases of machine learning algorithms. It is also used to reduce the number of informative elements and reduce repetitiveness: in the case of a Bag of Visual Words model, and in particular in this thesis where the dataset is formed by pictures, using all the pixels of the whole image set as training data is computationally impracticable and may also cause a considerable loss in the informative capacity of the dataset.

In this work the features are extracted using a dense version of the Scale-Invariant Feature Transform algorithm, and the relative feature description is based on the produced SIFT descriptors; SIFT will be described in detail in the next section.


3.3.1 SIFT: Scale Invariant Feature Transform

SIFT is a feature generation algorithm published by David Lowe in 1999 [19]; the approach transforms the image into a large set of feature vectors with the property of invariance to image translation, scaling, and rotation.

A SIFT feature is therefore a selected region of the image, called keypoint, with an associated descriptor vector. In particular, a SIFT keypoint is a circular region of the image with an orientation, and it is defined by four parameters: the coordinates x and y of the center, the scale (radius of the circle) and the orientation (an angle expressed in radians). By searching for blobs (the keypoint's structure) at multiple locations and scales, the SIFT detector is invariant to translation, rotation and scaling of the image.

Figure 3.1: Sift Keypoint in an image

To search for the "best" keypoints of an image, a Gaussian scale space is constructed, which is basically a set of images convolved with a DoG (Difference of Gaussians) filter at different levels of σ; the best keypoints obtainable are the ones corresponding to the points of maxima and minima of the function. However, in the "dense" version of the algorithm used in this thesis the detection phase is not performed, because the locations of the keypoints are selected every fixed number of pixels, a step called bin size.


A SIFT descriptor is a three-dimensional spatial histogram of the image gradients: samples are weighted by the gradient norm and accumulated in a 3-D histogram over the pixel location and the gradient orientation. The spatial coordinates are quantized in four bins each (x and y) and orientations in eight bins, so in more practical terms the resulting SIFT descriptor of a point is a 128-dimensional vector (8x4x4 = 128 bins). In the end an additional Gaussian weighting function is applied to the gradients, to give less importance to the ones farther away from the keypoint center.
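The 8x4x4 layout can be made concrete with a tiny sketch: the mapping from (orientation, x, y) bins to a flat component index shown here is an assumption for illustration only; real implementations (e.g. VLFeat, OpenCV) fix their own ordering.

```python
# 8 orientation bins, 4 spatial bins in x, 4 in y -> 128 components.
N_THETA, N_X, N_Y = 8, 4, 4

def bin_index(t, i, j):
    """Flat index of orientation bin t and spatial bins (i, j).
    The ordering is hypothetical, chosen only to show the counting."""
    assert 0 <= t < N_THETA and 0 <= i < N_X and 0 <= j < N_Y
    return t + N_THETA * (i + N_X * j)

# The 8x4x4 grid covers exactly the 128 descriptor components.
indices = {bin_index(t, i, j)
           for j in range(N_Y) for i in range(N_X) for t in range(N_THETA)}
print(len(indices))  # 128
```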

Figure 3.2: SIFT descriptors: spatial histogram of the image gradient

The gradient vector computed at the scale σ can be denoted as:

J(x, y) = ∇I_σ(x, y) = ( ∂I_σ/∂x , ∂I_σ/∂y )^⊤

The SIFT descriptor is a three-dimensional spatial histogram of the distribution of J(x, y); to describe how it is constructed, it is convenient to show it in the canonical frame. In this special frame the axes of the descriptor coincide with those of the image and each spatial bin has size 1. The histogram has N_θ × N_x × N_y bins (usually 8x4x4), as shown in the following figure:


Figure 3.3: Canonical SIFT descriptor and spatial binning functions

θ_t = (2π/N_θ) t,  t = 0, …, N_θ − 1,
x_i = i − (N_x − 1)/2,  i = 0, …, N_x − 1,
y_j = j − (N_y − 1)/2,  j = 0, …, N_y − 1.

The histogram is constructed by using trilinear interpolation, i.e. by applying a weight to the contributions through the binning functions:

w(z) = max(0, 1 − |z|),
w_ang(z) = Σ_{k=−∞}^{+∞} w( (N_θ/2π) z + N_θ k )

The gradient vector field is then transformed into a 3-D density map of weighted contributions:

f(θ, x, y) = |J(x, y)| δ(θ − ∠J(x, y))

The histogram is localized in the keypoint support by a Gaussian window of standard deviation σ_win, and can be retrieved using the following formula:

h(t, i, j) = ∫ g_σwin(x, y) w_ang(θ − θ_t) w(x − x_i) w(y − y_j) f(θ, x, y) dθ dx dy
           = ∫ g_σwin(x, y) w_ang(∠J(x, y) − θ_t) w(x − x_i) w(y − y_j) |J(x, y)| dx dy

In practice, the descriptors are not computed in the canonical frame but directly in the image frame; using a hat notation to distinguish the quantities in the canonical frame from those relative to the image frame, the two frames are related by an affinity:

x = A x̂ + T,  x = (x, y)^⊤,  x̂ = (x̂, ŷ)^⊤

All the descriptor quantities can then be computed directly in the image frame; the canonical image is in relation with the image (at the same scale) as:

Î_0(x̂) = I_0(x),  x = A x̂ + T
Î_σ̂(x̂) = I_Aσ̂(x),  x = A x̂ + T

where, generalizing the previous definitions:

I_Aσ̂(x) = (g_Aσ̂ ∗ I_0)(x),  g_Aσ̂(x) = (1 / (2π |A| σ̂²)) exp( −(1/2) x^⊤ A^{−⊤} A^{−1} x / σ̂² )

Deriving, it can be shown how the gradient fields are related:

Ĵ(x̂) = J(x) A


Therefore the descriptors can be computed either in the canonical or in the image frame, shifting between the following two formulas:

h(t, i, j) = ∫ g_σ̂win(x̂) w_ang(∠Ĵ(x̂) − θ_t) w_ij(x̂) |Ĵ(x̂)| dx̂
           = ∫ g_Aσ̂win(x − T) w_ang(∠J(x)A − θ_t) w_ij( A^{−1}(x − T) ) |J(x)A| dx

where the product of two binning functions is defined as:

w_ij(x̂) = w(x̂ − x̂_i) w(ŷ − ŷ_j)

3.3.2 Standard SIFT Descriptor

Considering a SIFT keypoint centered in T, with scale σ and orientation θ, the affine transformation (A, T) reduces to the similarity transformation:

x = mσ R(θ) x̂ + T

where R(θ) is a counter-clockwise rotation of θ radians and m is the descriptor magnification factor, which expresses the difference of scale between the descriptor bin and the keypoint scale σ. The standard SIFT descriptor computes the gradient of the image at the scale of the keypoint, which is equivalent to a smoothing of σ̂ = 1/m in the canonical frame; since the default Gaussian window has a standard deviation of σ̂_win = 2, the resulting formula is:

h(t, i, j) = mσ ∫ g_σwin(x − T) w_ang(∠J(x) − θ − θ_t) w_ij( R(θ)^⊤ (x − T) / (mσ) ) |J(x)| dx

σ_win = mσ σ̂_win,
J(x) = ∇(g_mσσ̂ ∗ I)(x) = ∇I_σ(x).


3.3.3 Dense SIFT

The dense version of the Scale-Invariant Feature Transform algorithm computes the descriptors on a dense grid of locations with fixed scale and orientation; depending on the density of the grid, the algorithm can be faster than the "original" version, because several simplifications can be applied.

Figure 3.4: Geometry of the DSIFT descriptors
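As a small illustrative sketch (not the thesis code), the dense grid of descriptor centers can be generated from an image size, a sampling step and a bin size; the border-margin rule used here is an assumption for illustration.

```python
def dense_grid(width, height, step, bin_size):
    """Centers of dense-SIFT descriptors sampled every `step` pixels.

    A descriptor spans 4 spatial bins per side, so centers are kept far
    enough from the border for the whole 4*bin_size window to fit
    (a hypothetical margin rule, for illustration only).
    """
    margin = 2 * bin_size  # half of the 4-bin descriptor side
    return [(x, y)
            for y in range(margin, height - margin + 1, step)
            for x in range(margin, width - margin + 1, step)]

grid = dense_grid(width=64, height=32, step=8, bin_size=4)
print(len(grid))  # 21 centers: 7 along x, 3 along y
```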

In the case of computing descriptors differing only by their location, with null orientation, the histogram can be computed as:

x = mσ x̂ + T,
h(t, i, j) = mσ ∫ g_σwin(x − T) w_ang(∠J(x) − θ_t) w( (x − T_x)/(mσ) − x̂_i ) w( (y − T_y)/(mσ) − ŷ_j ) |J(x)| dx

Since many different values of T are sampled, the histogram formula can be expressed as a convolution of separable components. Translating by x_ij = mσ (x̂_i, ŷ_j)^⊤ and using the symmetry of the binning functions:

T′ = T + mσ (x̂_i, ŷ_j)^⊤,
h(t, i, j) = mσ ∫ g_σwin(T′ − x − x_ij) w_ang(∠J(x) − θ_t) w( (T′_x − x)/(mσ) ) w( (T′_y − y)/(mσ) ) |J(x)| dx

Defining the kernels for the x and y components as:

k_i(x) = (1/(√(2π) σ_win)) exp( −(1/2) (x − x_i)²/σ²_win ) w( x/(mσ) ),
k_j(y) = (1/(√(2π) σ_win)) exp( −(1/2) (y − y_j)²/σ²_win ) w( y/(mσ) )

the histogram and the weighted gradient simplify to:

h(t, i, j) = (k_i k_j ∗ J_t)( T + mσ (x̂_i, ŷ_j)^⊤ ),
J_t(x) = w_ang(∠J(x) − θ_t) |J(x)|

The main advantages of the SIFT algorithm are its good recall rates and accuracy when its descriptors are used to compare images, its robustness to occlusion, rotation and scale, and the fact that slightly different implementations of SIFT like DSIFT (not the original Lowe version) are free to use and included in various machine learning libraries. Comparing the algorithm with a more modern one like SURF, SIFT gives a comparably good accuracy at the cost of a slower computation time, but since real-time execution speed is not required in this project, SIFT is a solid choice. Another advantage of SIFT is the greater descriptor dimension (128 vs 64), which may bring more information to the next steps of the classification algorithm. More information about SIFT and its variants can be found in the research done by Wu [14].


3.4 BoW: Codebook Formation

This phase is the final step of the Bag of Words model and is dedicated to the creation of the image-domain words, called codewords, and of the corresponding dictionary, called codebook. To perform the task, a simple solution is to run K-means clustering (theory in Appendix A) over the image patch descriptors extracted in the previous phase: the centroids (cluster centers) obtained at the end of the clustering algorithm correspond to the codewords, and that set is the codebook, or image dictionary. Finally, each patch descriptor of an image is assigned to the nearest codeword; in a similar manner to the textual case, an image can then be represented by a histogram of the codewords.
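The assignment-and-histogram step can be sketched in plain Python (an assumed illustration on toy 2-D descriptors, not the thesis code; real SIFT descriptors are 128-dimensional):

```python
import math

def nearest_codeword(descriptor, codebook):
    """Index of the codeword (centroid) closest in Euclidean distance."""
    return min(range(len(codebook)),
               key=lambda k: math.dist(descriptor, codebook[k]))

def bow_histogram(descriptors, codebook):
    """Histogram of codeword occurrences for one image."""
    hist = [0] * len(codebook)
    for d in descriptors:
        hist[nearest_codeword(d, codebook)] += 1
    return hist

# Toy 2-D "descriptors" and a codebook of K=2 codewords.
codebook = [(0.0, 0.0), (10.0, 10.0)]
image_descriptors = [(1.0, 0.5), (9.0, 11.0), (0.2, 0.1)]
print(bow_histogram(image_descriptors, codebook))  # [2, 1]
```

The resulting histogram is the feature vector passed to the classifier in the next section.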


3.5 BoW: Learning and classification

The feature histograms constructed with the Bag of Words model can be used as feature vectors in many computer vision applications; in this thesis work the image categorization task is performed using a model applied with success also in text categorization: Support Vector Machines.

3.5.1 SVM: Support Vector Machines

Support Vector Machines, or kernel machines, are a set of supervised learning methods for the classification or regression of patterns, created by Vapnik in 1995 [8]. The main idea is to construct a hyperplane in a high (potentially infinite) dimensional space, trying to find a good separation between the data; this is achieved by the plane which has the largest distance from the training data points of any class because, in general, the bigger the hyperplane's "margin", the lower the generalization error of the classifier.

Figure 3.5: The SVM hyperplane example


Given some training data D, composed of a group of points in the following form:

D = {(X_i, Y_i) | X_i ∈ R^p, Y_i ∈ {−1, +1}},  i = 1, …, n

where X_i is the feature vector and Y_i is the label of the class, which can be either −1 or +1, the objective is to discover the hyperplane that divides the elements having Y_i = −1 from those having Y_i = +1 with the maximum possible margin. A hyperplane can be written as the set of points X satisfying:

w · x − b = 0

where w is the normal vector to the plane and · is the dot product. Considering linearly separable data, two hyperplanes can be selected in such a way that there are no points between them, and their distance can then be maximized:

w · X_i − b ≥ 1 for the first class,
w · X_i − b ≤ −1 for the second class,

which can be compressed into the single condition:

Y_i (w · X_i − b) ≥ 1 for all 1 ≤ i ≤ n

So the optimization problem becomes: minimize ‖w‖² subject to

Y_i (w · X_i − b) ≥ 1
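As a short supporting derivation (standard SVM theory, not taken from the thesis), the link between the margin and ‖w‖ is the following:

```latex
% Distance between the hyperplanes w·x - b = 1 and w·x - b = -1.
% For points x_1, x_2 lying on the two planes: w·(x_1 - x_2) = 2.
% Projecting x_1 - x_2 onto the unit normal w/||w|| gives the margin width:
\text{margin} \;=\; \frac{w \cdot (x_1 - x_2)}{\lVert w \rVert} \;=\; \frac{2}{\lVert w \rVert}
% Maximizing the margin is therefore equivalent to minimizing ||w|| (or ||w||^2).
```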


Introducing KKT multipliers [17] and substituting the objective function with (1/2)‖w‖² for mathematical convenience, the resulting problem with the added constraints becomes, in its primal form:

arg min_{w,b} max_{α≥0} { (1/2)‖w‖² − Σ_{i=1}^{n} α_i [ Y_i (w · X_i − b) − 1 ] }

In 1995 Cortes and Vapnik suggested a modified maximum margin version called "Soft Margin" that allows for mislabeled examples: non-negative slack variables ξ_i are introduced to measure the degree of misclassification of the data X_i. The problem then becomes:

arg min_{w,b,ξ} { (1/2)‖w‖² + C Σ_{i=1}^{n} ξ_i }

subject to (for any i = 1, …, n):

Y_i (w · X_i − b) ≥ 1 − ξ_i,  ξ_i ≥ 0


3.5.2 Non-linear SVM

If the data are not linearly separable, SVM can be used as a non-linear classifier through the "kernel trick" [17]: the data can in fact be mapped into a richer feature space including non-linear features, and in the obtained space a hyperplane can be constructed to separate the data. In a more formal way:

x ↦ φ(x)

where φ is a non-linear mapping, so the resulting objective function becomes:

f(x) = w · φ(x) + b

The most used kernels are:

• Linear: ⟨x, x′⟩
• Polynomial: (γ⟨x, x′⟩ + r)²
• Radial basis function (RBF): exp(−γ‖x − x′‖²)
• Sigmoid: tanh(γ⟨x, x′⟩ + r)
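The four kernels can be written out in plain Python as a hedged sketch (function names and default hyperparameters γ and r are illustrative, not from the thesis):

```python
import math

def dot(x, xp):
    return sum(a * b for a, b in zip(x, xp))

def linear(x, xp):
    return dot(x, xp)

def polynomial(x, xp, gamma=1.0, r=0.0, degree=2):
    return (gamma * dot(x, xp) + r) ** degree

def rbf(x, xp, gamma=1.0):
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, xp))
    return math.exp(-gamma * sq_dist)

def sigmoid(x, xp, gamma=1.0, r=0.0):
    return math.tanh(gamma * dot(x, xp) + r)

x, xp = [1.0, 2.0], [2.0, 0.0]
print(linear(x, xp))      # 2.0
print(polynomial(x, xp))  # 4.0
print(rbf(x, x))          # 1.0 (identical points)
```

Note that an RBF kernel always returns 1 for identical inputs, which matches the intuition of the kernel as a similarity measure.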

3.5.3 Multi-class SVM

In the case of classification with multiple labels (as in this thesis work), the main approaches are:

• Building binary classifiers which distinguish between one of the labels and the rest (one-vs-all approach); the class is then assigned by the classifier with the highest output function;


• Building binary classifiers between every pair of classes (one-vs-one approach): every classifier assigns a vote to one of the two classes, and the class with the maximum number of votes determines the final classification result.
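The one-vs-one voting scheme can be sketched as follows (an assumed illustration, not the thesis code; the toy "prototype" decision rule stands in for the trained pairwise SVMs):

```python
from collections import Counter
from itertools import combinations

def one_vs_one_predict(sample, classes, pairwise_decide):
    """`pairwise_decide(sample, a, b)` returns the winner between a and b."""
    votes = Counter(pairwise_decide(sample, a, b)
                    for a, b in combinations(classes, 2))
    return votes.most_common(1)[0][0]

# Toy decision rule: the class whose "prototype" value is nearest wins.
prototypes = {"Lagoon": 0.0, "Townscape": 5.0, "Food": 10.0}
def decide(sample, a, b):
    return a if abs(sample - prototypes[a]) < abs(sample - prototypes[b]) else b

print(one_vs_one_predict(4.2, list(prototypes), decide))  # Townscape
```

With k classes this requires k(k−1)/2 pairwise classifiers, one per unordered class pair.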

The SVM has many different advantages: first of all, the theory behind it results in a sound geometrical interpretation, so understanding how SVM operates is simple; the ability to use different kernels gives the algorithm great adaptability, with good performance on many different kinds of problems; the algorithm also has the property of not converging to local minima even when non-linear kernels are used (the "convexity" property), so the solution obtained is always the global one.

On the other hand, considerable limitations of SVM are the speed and memory requirements, which may be considerable in both the training and testing phases, and the need for good knowledge of the addressed problem in order to choose the right kernel and the optimal kernel parameters.


Chapter 4

Construction of the classifier

4.1 The Dataset

To retrieve a large set of images and photos taken in Venice with attached geographic coordinates, the social medium Instagram has been chosen: the media library from which to select the images is huge, and the API is free and simple to use. As a drawback, the images must be subjected to a first level of filtering to remove a remarkable amount of pictures useless for the project; once this problem is sorted out, though, the remaining ones are characterized by a lot of variability within each category, which can lead to a good level of adaptability of the final classification algorithm.

In practice, to obtain the image set, a Python wrapper of the Instagram API has been used; in particular, one method of the API called media search, given a geographical coordinate and a date as input, returns a JSON structure containing all the media information of a set of retrieved images. The most important fields for the analysis are:

1. The unique id of the image
2. The url of the image in low resolution
3. Longitude and latitude of the picture
4. Tags, comments, other information
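Extracting these fields from one media entry of the response can be sketched as below; the exact JSON field names used here are assumptions for illustration, not the real Instagram API schema.

```python
import json

# A hypothetical media entry, shaped only to illustrate the extraction.
sample = json.loads("""
{"id": "123_456",
 "images": {"low_resolution": {"url": "http://example.com/img.jpg"}},
 "location": {"latitude": 45.4371, "longitude": 12.3326},
 "tags": ["venice", "carnival"]}
""")

record = {
    "id": sample["id"],
    "url": sample["images"]["low_resolution"]["url"],
    "lat": sample["location"]["latitude"],
    "lon": sample["location"]["longitude"],
    "tags": sample["tags"],
}
print(record["id"], record["lat"], record["lon"])
```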

This method has been called along a grid of coordinates covering the rectangular area corresponding to the geographical zone of Venice, to get the maximum coverage possible without overlapping. Another possible solution would have been to simply select a coordinate in the center of Venice and a radius big enough to contain the whole city area, but this solution has been rejected because of the API limitation on the maximum number of results per request, which would have excluded an important part of the dataset.
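The query grid can be sketched as follows (an assumed illustration, not the thesis code; the bounding box and spacing are hypothetical example values, not the ones actually used):

```python
def coordinate_grid(lat_min, lat_max, lon_min, lon_max, step):
    """All (lat, lon) query centers covering the bounding box.

    An integer index is used instead of accumulating floats, to avoid
    floating-point drift at the grid edges."""
    n_lat = round((lat_max - lat_min) / step) + 1
    n_lon = round((lon_max - lon_min) / step) + 1
    return [(lat_min + i * step, lon_min + j * step)
            for i in range(n_lat) for j in range(n_lon)]

# Hypothetical bounding box roughly around the Venice lagoon area.
grid = coordinate_grid(45.42, 45.45, 12.30, 12.37, step=0.01)
# Each point would then be passed to the media-search call with a date range.
print(len(grid))
```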

To recap, the dataset is composed of a set of images of Venice, obtained from the users' photos uploaded to the social network Instagram; since no kind of preprocessing or filtering operation has been applied, the photos are very different from each other, with consistent variations in the subject pictured, scale, orientation and luminosity.

The categories in our analysis are selected with the objective of discovering patterns in the type of photos taken in the various districts of Venice; due to the high variance in the data, however, they must also be general enough to generalize effectively. The chosen categories are the following:


1. Lagoon landscape, photos taken near or inside the lagoon, characterized by water, boats and "bricole".

Figure 4.1: First category, Lagoon landscape

2. Townscape, photos shot in the city, picturing bridges, monuments, squares, churches.

(44)

3. Art, photos taken inside museums and art galleries.

Figure 4.3: Third category, Art

4. Folklore, photos of Carnival events, masks, or Venetian folklore like “Gondole” or “Murano glass blowers”.



5. Food, photos of Venetian gastronomic specialties or drinks

Figure 4.5: Fifth category, Food

6. Varies, the category containing every photo that doesn't belong to any of the previous ones.


As previously stated, the dataset contains a lot of different images, but inside each category some recurrent patterns can be found (e.g. water in the bottom part of the image for lagoon landscape, buildings for townscape), with the exception of the last category: this class contains a very heterogeneous set of images that don't belong to any of the other classes, so they are basically useless in the analysis and must be removed from the dataset.



4.2 The classifier

The main phases of the classifier are described in the flow chart:

Figure 4.7: The flow chart of the classifier construction and recognition phases

The training steps can be summarized as:

• Selection of a meaningful dataset, with class labels assigned, used for training the classifier


• Detection of meaningful parts of the images, and construction of an appropriate description

• Discovery and assignment of those descriptors to a “vocabulary” of clusters (Form Codebook)

• Construction of a “Bag of keypoints”, counting the number of descriptors assigned to each cluster (Create Class Histograms)

• The bag of keypoints is used as a feature vector to train a multi-class classifier, determining the category of the image

In the testing phase, the feature extraction is the same as in training, but the clustering phase is not performed again: the extracted keypoints are assigned to the nearest codeword (or centroid), and the histogram of the image is then created and used as the final feature in the classification phase to label the test image.
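The test-phase assignment just described can be sketched as follows (a minimal numpy sketch with a toy two-word codebook; the function names are illustrative, not the thesis code):

```python
import numpy as np

def nearest_codeword(descriptors, codebook):
    """Index of the nearest centroid (codeword) for each descriptor."""
    dists = np.linalg.norm(descriptors[:, None, :] - codebook[None, :, :], axis=2)
    return dists.argmin(axis=1)

def image_histogram(descriptors, codebook):
    """Bag-of-keypoints histogram of a test image over a fixed codebook."""
    idx = nearest_codeword(descriptors, codebook)
    return np.bincount(idx, minlength=len(codebook))

codebook = np.array([[0.0, 0.0], [10.0, 10.0]])        # toy 2-word dictionary
descs = np.array([[1.0, 1.0], [9.0, 9.0], [0.0, 2.0]])
hist = image_histogram(descs, codebook)                # -> [2, 1]
```

The same histogram construction serves both training (aggregated per class) and testing (per image), only the codebook is frozen after training.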



4.3 Feature extraction from the images

To select the image patches and the appropriate descriptors, the Dense SIFT algorithm (described in Chapter 3) implemented by the Vlfeat library [4] has been chosen: the algorithm selects a grid of keypoints, each one separated by a fixed number of pixels, and for each of those keypoints a 128-component SIFT descriptor is computed. SIFT descriptors are a solid choice for image recognition, as stated by Mikolajczyk in his work [21], because of their scale, orientation and distortion invariance; moreover, with respect to other descriptors, SIFT has a high number of components, which means a more complete representation.

The location x and y of each keypoint has been appended to its descriptor, with the purpose of adding a location component to the features: a small analysis of the dataset easily reveals a recurrent pattern in the photos belonging to the same category; in fact, it is very likely that similar patches of images of the same category are also located in the same or nearby positions, so adding two parameters to the feature vector to describe this fact is necessary to capture a meaningful piece of information.

Figure 4.8: A single feature: components 1–128 are the SIFT descriptor, followed by the keypoint location x, y
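Building the 130-dimensional feature amounts to a horizontal concatenation of the descriptor block and the keypoint coordinates (a sketch with placeholder arrays standing in for the dense-SIFT output):

```python
import numpy as np

def build_features(sift_descriptors, keypoint_xy):
    """Append the keypoint location (x, y) to each 128-dim SIFT descriptor."""
    return np.hstack([sift_descriptors, keypoint_xy])

descs = np.zeros((10, 128))   # placeholder for dense-SIFT descriptors
xy = np.ones((10, 2))         # placeholder keypoint coordinates
features = build_features(descs, xy)   # shape (10, 130)
```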

The main problem in this phase is selecting the distance between a keypoint chosen for the feature extraction and the next one. This parameter, called “binsize”, has been chosen experimentally, trying to balance the amount of information and the number of keypoints: selecting a binsize value that is too small leads to overlapping, and consequently redundant, information from the SIFT descriptors; on the other hand, a binsize that is too big means fewer keypoints are selected from the photo, significantly reducing the amount of valuable categorization information extracted from the images. An experimentally good binsize is 5.

This process has been applied to a subset of 100 images for each category, and the resulting descriptors have been concatenated, creating a descriptor matrix of about 4 million × 130 elements.

An important consideration in this phase is that, since the size of the retrieved pictures is not standardized and may change a lot from photo to photo depending on the camera used, it is necessary to keep track of the descriptor-category relation, that is, the range of correspondence between a set of keypoints and the class they belong to (e.g. keypoints from 1 to 627 000 correspond to the first category, and so on).

4.4 Clustering

Clustering over the descriptor matrix is performed, giving as result the Keypoints Dictionary: the cluster centers obtained in the last step of the clustering algorithm are in fact the set of feature vectors that form the dictionary of the classes. They are called keypoints by analogy with the keywords of text categorization; however, they may not have an understandable and repeatable meaning such as “boats” for lagoon landscape photos or “masks” for folklore ones, so selecting an ideal set of keypoints is not obvious. For this reason the objective is to select the set that yields the best categorization rate on our dataset.



Having zero information on the distribution of the keypoints in the “key-space”, choosing K-means as the clustering method is a good option: because of its quick execution, it can be tested with many different values of its parameter K in a relatively small amount of time, and the resulting clusters also lead to quite good results, as we will see in the next paragraph. The main problem in the clustering phase is how to choose the number of clusters K; tests have been executed with an increasing number of clusters (from K=100 to K=1200) in 5-fold cross-validation, with an RBF kernel of parameters C=2000, gamma=2e-07, and the results are the following:

Figure 4.9: Test of the rate of correctness, changing parameter K

The mean rate of the five iterations of the cross-validation is shown in the figure: the rate increases from K=100 up to K=600, from where it remains basically constant around 0.81 with little fluctuations, so a good value of the parameter lies in the range between K=600 and K=1200. Picking a K value bigger than 1200 is computationally too costly, because the clustering algorithm takes more than a day, and also not optimal, because it implies an increase in the dimension of the features which would eventually lead to a decrease in the correctness rate. For the next tests a number of clusters of 1000 has been chosen.
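Codebook construction with K-means can be sketched with scikit-learn (MiniBatchKMeans is used here for speed on toy data; the thesis does not specify which K-means implementation it used, so this is an assumption):

```python
import numpy as np
from sklearn.cluster import MiniBatchKMeans

rng = np.random.default_rng(0)
descriptors = rng.random((2000, 130))   # stand-in for the real ~4M x 130 matrix

K = 50  # the thesis settles on K=1000; reduced here for the toy data
kmeans = MiniBatchKMeans(n_clusters=K, n_init=3, random_state=0).fit(descriptors)
codebook = kmeans.cluster_centers_      # the Keypoints Dictionary, shape (K, 130)
```

On a multi-million-row matrix the mini-batch variant is the practical choice, since full-batch K-means at K around 1000 can take, as noted above, more than a day.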

4.5 Construction of the Bag of keypoints

Construction of the Bag of keypoints: each descriptor taken from the set is assigned to the nearest centroid, and for each category the number of descriptor-centroid assignments is counted; in this way a “histogram” of each class is built, giving a representation of how strongly particular descriptors are linked to which centroids for every class.

Figure 4.10: Example of a class histogram over the codewords C1, C2, C3, ..., C998, C999, C1000

4.6 Classification

To perform classification, the histograms computed in the previous phase are normalized, dividing each component by the number of images, so that they can be used as training vectors for the chosen classification algorithm: Support Vector Machines (SVM).
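The per-class counting and normalization can be sketched as follows (an illustrative function, not the thesis code):

```python
import numpy as np

def class_histograms(codeword_idx, class_labels, n_classes, n_codewords,
                     images_per_class):
    """Count descriptor-to-codeword assignments per class, then divide each
    class histogram by that class' number of training images."""
    H = np.zeros((n_classes, n_codewords))
    for cw, cls in zip(codeword_idx, class_labels):
        H[cls, cw] += 1
    return H / np.asarray(images_per_class, dtype=float)[:, None]

# Toy example: 3 descriptors, 2 classes, 2 codewords, [2, 1] images per class.
H = class_histograms([0, 0, 1], [0, 0, 1], 2, 2, [2, 1])
# H == [[1.0, 0.0], [0.0, 1.0]]
```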



In practice, to classify a new image, a similar process is used to compute the features: dense SIFT is applied but, because the centroids have already been computed, the clustering phase is not necessary; the descriptors are associated directly with the nearest centroid, and the histogram of the image is computed by counting the number of descriptors exactly as in the previous case. The image histogram is then given as feature vector to the SVM algorithm, which performs the classification.

4.6.1 The parameters of SVM

Two main versions of SVM have been implemented, using two different functions of the Scikit-learn Python library [1]:

1. The first one, “LinearSVM”, implements multi-class classification using a linear kernel and a one-vs-all approach: a number of models equal to the number of classes is trained, and the label is assigned to the class which classifies the test sample with the largest margin.

The only parameter of this model is C; tests have been performed in 5-fold cross-validation trying different orders of magnitude of the parameter, and the optimal value has been found at C=2e-07:
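The 5-fold search over C can be sketched as below (synthetic data; scikit-learn's `LinearSVC` stands in for the thesis' “LinearSVM”, an assumption about the exact class used):

```python
import numpy as np
from sklearn.svm import LinearSVC
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
X = rng.random((60, 20))    # stand-in for the normalized image histograms
y = np.arange(60) % 6       # six balanced category labels

# Try orders of magnitude of C in 5-fold cross-validation.
rates = {C: cross_val_score(LinearSVC(C=C), X, y, cv=5).mean()
         for C in [2e-7, 2e-5, 2e-3, 2e-1, 2e1]}
```

The C with the highest mean rate is then retained for the final model.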


2. The second version, called “SVC”, implements multi-class classification using a one-vs-one approach: if C is the set of classes,

|C| * (|C| - 1) / 2

models are trained, one for every different pair of classes, and the label is then assigned to the class that obtains the maximum number of assignments from the classifiers. This implementation of SVM allows the use of non-linear kernels; in particular, for the purpose of this thesis' project, the Radial Basis Function (RBF) kernel has been chosen, whose mathematical formulation is

exp(−γ ‖x − x′‖²)

As stated by Hsu, Chang and Lin [7], the RBF is a good choice of kernel because it can handle the case of a non-linear relation between class labels and attributes, and it requires the setting of only two parameters (fewer than the polynomial kernel) to give generally good results: the first is the penalty parameter C, the second is γ. A loose grid search using increasing values of the two parameters has been performed; the best rate of correct results (considering all the classes) is obtained with the parameter C set to 2e3 and γ to 2e-07.



The results of the entire test on the parameters is shown in the next table:

C \ γ    2e-15  2e-13  2e-11  2e-09  2e-07  2e-05  2e-03  2e-01  2e+01  2e+03  2e+05
2e-07    0.17   0.17   0.17   0.17   0.17   0.17   0.17   0.17   0.17   0.17   0.17
2e-05    0.17   0.17   0.17   0.17   0.17   0.17   0.17   0.17   0.17   0.17   0.17
2e-03    0.17   0.17   0.17   0.17   0.17   0.17   0.17   0.17   0.17   0.17   0.17
2e-01    0.17   0.17   0.17   0.17   0.37   0.17   0.17   0.17   0.17   0.17   0.17
2e+01    0.17   0.17   0.17   0.43   0.70   0.17   0.17   0.17   0.17   0.17   0.17
2e+03    0.17   0.17   0.44   0.65   0.70   0.17   0.17   0.17   0.17   0.17   0.17
2e+05    0.17   0.41   0.64   0.64   0.68   0.17   0.17   0.17   0.17   0.17   0.17

Table 4.1: Test on the parameters C and γ on the SVC with RBF kernel
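A loose grid search like the one in Table 4.1 can be reproduced with scikit-learn's `GridSearchCV` (synthetic data and a reduced grid; a sketch, not the thesis' exact script):

```python
import numpy as np
from sklearn.svm import SVC
from sklearn.model_selection import GridSearchCV

rng = np.random.default_rng(1)
X = rng.random((60, 20))    # stand-in for the normalized image histograms
y = np.arange(60) % 6       # six balanced category labels

param_grid = {"C": [2e1, 2e3, 2e5], "gamma": [2e-9, 2e-7, 2e-5]}
search = GridSearchCV(SVC(kernel="rbf"), param_grid, cv=5).fit(X, y)
best = search.best_params_  # dict with the winning "C" and "gamma"
```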

4.6.2 The Varies class problem

A severe problem encountered is that an important fraction of the images retrieved from Instagram belongs to the “Varies” category: being composed of photos of the most diverse subjects, which couldn't be assigned to any class, it is the class with the highest percentage of misclassifications; in fact, about half of the images that should belong to “Varies” are assigned to other classes.

Since the aforementioned class' images are basically useless in the general analysis, a method to remove them must be implemented: the chosen solution uses the class membership probability estimates, as explained by Chih-Jen Lin [6], to perform a modified classification process. The calculation of these probabilities in the implementation provided by the Scikit-learn library [1] is strictly related to the “SVC” one-vs-one method, and gives an estimate of the probability that a particular element belongs to each class (the maximum-probability class may differ from the maximum-votes class); the modified version multiplies the probability of the “Varies” category by a parameter called M, and then the final assignment is done by assigning the label to the class with the maximum probability.

In this way, many images that were misclassified because they lack a strong “membership” to a particular class are now likely to be assigned to the “Varies” class. On the other hand, some images that were correctly assigned might now be misclassified as “Varies” and removed but, for the purpose of the analysis, it is better to keep the maximum amount of correctly classified elements of the main categories, even if it means excluding a small portion of images that were previously classified correctly.

To discover a good value for the M parameter, tests have been made on a dataset of 100 images for each class, for a total of 600 images, using the “SVC” method with parameters K=1000, C=2000, gamma=2e-07, kernel='rbf', type='onevsone': the optimal value detected for the M parameter is 2. The following schema shows how the confusion matrix changes, varying from M=1 in the first case to M=2 in the second:

Figure 4.12: Comparison of the Confusion Matrix with parameter M=1 and M=2
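The modified assignment can be sketched as follows (a numpy sketch; the class index of “Varies” and the probability vector are illustrative values, not taken from the thesis):

```python
import numpy as np

VARIES = 5  # index of the "Varies" class in the probability vector

def classify_with_m(probabilities, m=2.0):
    """Scale the 'Varies' membership probability by M, then pick the argmax."""
    p = probabilities.astype(float).copy()
    p[:, VARIES] *= m
    return p.argmax(axis=1)

p = np.array([[0.30, 0.25, 0.10, 0.10, 0.05, 0.20]])
classify_with_m(p, m=1.0)   # -> class 0
classify_with_m(p, m=2.0)   # -> class 5 (Varies), since 0.40 > 0.30
```

An image without a strong membership to any main class is thus pushed toward “Varies” and subsequently discarded.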

In the first case, the confusion matrix of the classification with the parameter M set to 1 shows a classification rate of 0.68 when the “Varies” class elements are also considered, while removing them and considering only the sets belonging to the other five categories leads to a classification rate of 0.79. In the case with M=2, many images that belonged to “Varies” but were misclassified to other classes are now correctly assigned; the inverse also happens, but only a negligible number of times: in fact the general rate (“Varies” class included) remains 0.68, while the rate considering only the elements of the five classes grows to 84% of correctness.

4.6.3 Final considerations

Even if the performance of the one-vs-one classifier with RBF kernel is only slightly better than that of the one-vs-all linear one, it has been preferred and used as the final classifier in all the subsequent analysis because of the implementation of the Varies elimination phase, which uses the membership probabilities to exclude that category of elements.


4.7 Display of the results

In this section we will show the output of the classification algorithm on some particular pictures: first some correct results will be presented, followed by some misclassification cases and peculiar results. All the displayed tests are performed with the one-vs-one version of the classification algorithm with an RBF kernel and parameters K=1000, C=2000, gamma=2e-07.

The image in the first test is the following:

Figure 4.13: Test 1, correctly classified Folklore image

The results of both votes and probabilities are in the following table:

Category Lagoon Landscape Townscape Art Folklore Food Varies

Votes 4 3 2 5 0 1

Probabilities 0.218 0.177 0.155 0.279 0.032 0.276

Table 4.2: Votes and probabilities of Test 1

This picture contains elements of both the Folklore category (the masked people and the gondola) and Lagoon landscape (a good amount of water), and this fact is reflected in the votes of the one-vs-one algorithm (Folklore has 5 votes, Lagoon has 4). Looking at the probabilities, on the other hand, it is important to notice that the second biggest class is now Varies, since it has been multiplied by the factor M, but the maximum one remains Folklore, which is the correct category.

The second test presented is:

Figure 4.14: Test 2, correctly classified Food image

The pictures belonging to the Food category are characterized by very standard patterns, usually a white plate with the food in the center, so it is one of the classes with the best classification rate:

Category Lagoon Landscape Townscape Art Folklore Food Varies

Votes 0 1 2 3 5 4

Probabilities 0.009 0.009 0.013 0.018 0.93 0.041

Table 4.3: Votes and probabilities of Test 2

From the table it can easily be seen how the class Food has the maximum number of votes and, at the same time, the probability of belonging to that class is 0.93, which means a really strong membership.


Another test of the performance of the classifier is the next Townscape picture, which also contains typical elements of the lagoon landscape class:

Figure 4.15: Test 3, correctly classified Townscape image

Category Lagoon Landscape Townscape Art Folklore Food Varies

Votes 4 5 2 3 0 1

Probabilities 0.192 0.623 0.040 0.073 0.008 0.126

Table 4.4: Votes and probabilities of Test 3

The presence of elements of both classes is observed in both votes and probabilities; in fact the membership of the picture is disputed between the categories Lagoon Landscape and Townscape, with correspondingly high votes and probabilities. The picture is finally assigned to Townscape.



The fourth test shows an error of the classification algorithm due to the M factor, given as input the following picture:

Figure 4.16: Test 4, incorrectly classified Townscape image

The output given is:

Category Lagoon Landscape Townscape Art Folklore Food Varies

Votes 2 5 2 3 0 3

Probabilities 0.114 0.251 0.184 0.208 0.032 0.422

Table 4.5: Votes and probabilities of Test 4

It is important to notice in the table that the votes alone would have correctly labeled the image, assigning it to the category Townscape but, since the probability of that class is not much bigger than the others, the image has been “wrongly” assigned to the Varies category, whose probability has been doubled thanks to the M factor.


The last case presented is a typical misclassification error: the picture of a gondola, which should be a member of the Folklore category, is wrongly assigned to the Lagoon Landscape category:

Figure 4.17: Test 5, incorrectly classified Folklore image

Category Lagoon Landscape Townscape Art Folklore Food Varies

Votes 5 3 1 4 0 2

Probabilities 0.583 0.166 0.073 0.088 0.007 0.163

Table 4.6: Votes and probabilities of Test 5

In this case the votes show the conflict between the categories Lagoon Landscape (5) and Folklore (4), those being the ones with the most votes; the probabilities, on the other hand, don't indicate the same: in fact the probability of the first class (0.583) is way higher than the Folklore one (0.088).



4.7.1 Considerations on the classifier results

Taking into account the type of dataset, important reflections can be made: the Art and Food categories have a rate of correct classification higher than the others, thanks to the fact that the pictures that fall into those categories are quite standard while, on the other hand, Folklore pictures are more easily misclassified as Townscape or Lagoon, due to the presence in the picture of recurrent patches belonging to both categories, like a monument in the background behind some masked people, or water in a folklore picture portraying a gondola.

For this reason, a practical extension of this classifier is to allow the assignment of a picture to more than one class: by allowing multi-label categorization the previously described problem is avoided. However, new issues arise, regarding especially the training phase, which must be modified to consider multiple label assignment; moreover, a new method to select the final class labels must be chosen wisely, since the algorithm must assign the picture to a number of categories that is not known a priori.


Chapter 5

Analysis of the results

5.1 The year analysis

The main purpose of this thesis is to perform a touristic analysis on the images taken by people in Venice during the year and uploaded to social media, so pictures taken over the years 2014-2015 are retrieved by the algorithm every 5 days, classified, grouped by month and then assigned to the nearest Venice district; with this method a conspicuous dataset of about 90 000 images is finally created.

For the analysis three main methodologies have been followed: first of all, a general analysis of the images' categories is performed by looking at how the distribution of the categories varies, in both quantity and normalized rate, over the months of the year, excluding the district localization; the second analysis is focused on the Venetian districts, with the purpose of discovering patterns in the distribution of particular categories over the months; lastly, an analysis of the two-year dataset is performed using “Heat Maps” showing the density of data categories over the map of Venice.
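The nearest-district assignment step can be sketched as below; the reference coordinates here are rough illustrative values, not taken from the thesis:

```python
import numpy as np

# Rough reference points for the six districts (hypothetical coordinates).
DISTRICTS = {
    "San Marco":   (45.4340, 12.3390),
    "Castello":    (45.4360, 12.3530),
    "Canareggio":  (45.4440, 12.3330),
    "Dorsoduro":   (45.4300, 12.3230),
    "San Polo":    (45.4380, 12.3300),
    "Santa Croce": (45.4400, 12.3210),
}
NAMES = list(DISTRICTS)
CENTERS = np.array([DISTRICTS[n] for n in NAMES])

def nearest_district(lat, lon):
    """Closest district reference point by Euclidean distance in degrees
    (adequate at city scale, where the degree grid is nearly uniform)."""
    d = np.linalg.norm(CENTERS - np.array([lat, lon]), axis=1)
    return NAMES[int(d.argmin())]
```

A production version would use the actual district polygons (point-in-polygon tests) rather than single reference points.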


5.1.1 General Quantitative Analysis

In the next tables the quantitative distribution of the categories over the months of the two years is shown:

Table 5.1: Quantitative data of the year 2014

CAT/MON JAN FEB MAR APR MAY JUN JUL AUG SEP OCT NOV DEC

LAGOON LANDSCAPE 160 169 189 195 206 200 238 232 255 267 252 184

TOWNSCAPE 489 386 452 596 598 707 693 618 633 696 750 612

ART 177 244 120 220 228 246 221 230 263 233 255 241

FOLKLORE 111 159 124 148 159 197 183 174 168 218 237 189

FOOD 159 169 178 208 215 237 207 213 230 267 249 197

Table 5.2: Quantitative data of the year 2015

CAT/MON JAN FEB MAR APR MAY JUN JUL AUG SEP OCT NOV DEC

LAGOON LANDSCAPE 225 245 215 249 259 311 345 294 303 292 310 221

TOWNSCAPE 682 765 750 921 914 951 1055 1047 1302 1441 1209 1313

ART 250 298 204 237 359 304 373 374 536 588 466 469

FOLKLORE 183 293 236 241 391 324 332 272 473 534 450 574

FOOD 227 231 251 270 330 336 284 284 468 396 313 367

Just comparing the number of photos obtained for the year 2014 (34102) to that of the year 2015 (56274), a substantial increment is evident, due to the fact that, while the image retrieval method didn't change, the number of global Instagram users more than doubled from 2014 to 2015 [3]. Adding the months to the quantitative analysis provides some interesting results: ignoring at first the categorization and focusing only on the quantity of photos obtained monthly, it can be noticed that there is a relation between the quantity of photos taken and particular events of the Venetian life. For example, in February takes place the last week of the Carnival celebration, when an increased rate of pictures is taken, and the difference in the total quantity with respect to the months of January and March is remarkable; another case regards the months of September and October and their considerable amount of related images: they are in fact the two months with the largest number of photos obtained, and this is strictly related to the “Mostra del Cinema” and “Biennale”, events that also take place in one of the periods with the highest touristic presence in Venice, as shown by the touristic statistical data, courtesy of “turismovenezia” [2]:

Touristic Affluence JAN FEB MAR APR MAY JUN JUL AUG SEP OCT NOV DEC

2015 313.880,00 394.867,00 484.211,00 591.777,00 666.488,00 656.027,00 728.112,00 722.032,00 700.002,00 692.113,00

2014 299.660,00 358.883,00 503.144,00 600.901,00 644.397,00 640.148,00 681.859,00 678.655,00 624.553,00 637.994,00 413.559,00 341.556,00

Table 5.3: Touristic presence in Venice during the years 2014 and 2015

It can be highlighted how, taking into consideration particular events that lead to an increase in the daily rate of people's pictures in particular periods, like the previously mentioned Carnival, Mostra del Cinema or Biennale, there is a connection between the number of tourists visiting Venice and the total number of photos retrieved.


To better understand this fact, a comparison between the graphs of the quantity of pictures retrieved and the statistical touristic presence for both years 2014 and 2015 is presented:

Figure 5.1: Comparison of images retrieved and touristic affluence in Venice, years 2014-2015: (a) quantity of images, 2014; (b) touristic presence, 2014; (c) quantity of images, 2015; (d) touristic presence, 2015



5.1.2 General Normalized Analysis

The next picture shows the same graph of the last paragraph, with in addition the distinct categories discovered:

Figure 5.2: Quantitative representation of the categories over the months of the years 2014-2015

The first consideration is the important amount of Townscape pictures: the field of analysis being the city area, it is obvious that the greater part of the images regards monuments, bridges or outdoor views that fall into this category.

However, for the analysis it is more relevant to look at the normalized rate of the categories, calculated by the following formula:

Considering α a singular element of the dataset, and C = {1, 2, ..., 6} the set of categories:

N_i = ( Σ_{α_j ∈ C_i} α_j ) / ( Σ_α α )
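The normalized rate N_i is simply the fraction of photos falling in category i; a minimal sketch:

```python
from collections import Counter

def normalized_rates(labels, categories):
    """Fraction of photos per category: N_i = count(category i) / total."""
    counts = Counter(labels)
    total = len(labels)
    return {c: counts[c] / total for c in categories}

cats = ["Lagoon Landscape", "Townscape", "Art", "Folklore", "Food"]
rates = normalized_rates(["Townscape", "Townscape", "Art", "Food"], cats)
# rates["Townscape"] == 0.5, rates["Art"] == 0.25, rates["Folklore"] == 0.0
```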


The resulting normalized rates for Venice along the years 2014 and 2015 are displayed in the following tables and graph:

CAT/MON JAN FEB MAR APR MAY JUN JUL AUG SEP OCT NOV DEC

LAGOON LANDSCAPE 0.15 0.15 0.18 0.14 0.15 0.13 0.15 0.16 0.16 0.16 0.14 0.13
TOWNSCAPE 0.45 0.34 0.43 0.44 0.43 0.45 0.45 0.42 0.41 0.41 0.43 0.43
ART 0.16 0.22 0.11 0.16 0.16 0.16 0.14 0.16 0.17 0.14 0.15 0.17
FOLKLORE 0.10 0.14 0.12 0.11 0.11 0.12 0.12 0.12 0.11 0.13 0.14 0.13
FOOD 0.15 0.15 0.17 0.15 0.15 0.15 0.13 0.15 0.15 0.16 0.14 0.14

Table 5.4: Normalized distribution of the data over the months of the year 2014

CAT/MON JAN FEB MAR APR MAY JUN JUL AUG SEP OCT NOV DEC

LAGOON LANDSCAPE 0.14 0.13 0.13 0.13 0.11 0.14 0.14 0.13 0.10 0.09 0.11 0.08
TOWNSCAPE 0.44 0.42 0.45 0.48 0.41 0.43 0.44 0.46 0.42 0.44 0.44 0.45
ART 0.16 0.16 0.12 0.12 0.16 0.14 0.16 0.16 0.17 0.18 0.17 0.16
FOLKLORE 0.12 0.16 0.14 0.13 0.17 0.15 0.14 0.12 0.15 0.16 0.16 0.19
FOOD 0.14 0.13 0.15 0.14 0.15 0.15 0.12 0.13 0.15 0.12 0.11 0.12

Table 5.5: Normalized distribution of the data over the months of the year 2015

Figure 5.3: Normalized representation of the data categories over the months of the years 2014-2015



As expected, the dominant category is Townscape, always with around 40% presence in both years, while all the other categories have rates of 0.20 or less in every month. To analyze the other categories with the highest rate, a distinction must be made between the two years: in 2014 Art and Lagoon landscape were the classes with the highest rate (excluding Townscape), with an exceptionally high peak in February for Art; on the other hand, in 2015 far fewer Lagoon Landscape pictures were retrieved, and the categories with the highest rates became Art and Folklore.

5.1.3 District-focused Analysis

This paragraph of the analysis is centered on the Venetian districts, focusing in particular on discovering patterns in the photos' categories over the months of the year. For this purpose the normalized data of the categories are analyzed to find particular trends with respect to the other categories; the normalized rates are preferred to the quantitative ones because the information they carry is not influenced by the variance of the general number of photos taken: for example, there isn't a correspondence between an increase in the quantitative number of folklore pictures in the district of Canareggio in the month of March and its relative normalized rate if the quantitative numbers of all the other categories increase as well, while, on the other hand, a change in the relative rate of one category is always a meaningful piece of information.


The normalized category rates of all the Venice districts (Canareggio, Castello, Dorsoduro, San Marco, San Polo, Santa Croce), divided by month, can be found in the appendix; in the next paragraphs the most interesting results discovered will be presented.

The first pattern discovered regards the Folklore category: by looking at its distribution in the years 2014 and 2015, some interesting details can be noticed:

Figure 5.4: Normalized representation of Folklore over the months of the years 2014-2015

In almost every month of both years, the district with the highest rate of Folklore pictures is San Marco, and that is certainly due to the fact that one of the most famous touristic places, “Piazza San Marco”, is located in that district, where masks, gondole and other Venetian folklore elements can be found during every period of the year. A general increase can however be noticed in all the districts during the month of February, which corresponds to the last week of the Venetian Carnival, when the rate of folklore-related pictures is obviously higher with respect to the neighboring months; however, contrary to the predictions, February is not the month with the highest rate overall: it is in fact exceeded by November in 2014 and by several months in 2015.



Another interesting consideration can be stated about the Lagoon Landscape category: the expectation was to find a higher rate in the more external districts, and this is in fact confirmed by the results:

Figure 5.5: Normalized representation of Lagoon Landscape over the months of the years 2014-2015

The districts with the highest rate in both 2014 and 2015 are Castello and Dorsoduro and, in fact, both districts border the lagoon (Castello is near Murano, Dorsoduro is facing Giudecca and its canal), while the other districts, located in more central positions, have considerably lower rates of lagoon pictures, with the extreme of Santa Croce in February with zero photos retrieved. In 2015 the highest rates of Lagoon Landscape were obtained in summer (with the maximum in July), but the same trend is not confirmed by the 2014 data, where the highest rates were found in the months of March and September. It is also important to notice that the mean rate of this category of pictures decreased by 0.06 from 2014 to 2015, indicating a relevant decrease in the number of lagoon pictures taken; further investigations are however required to find the cause.


The opposite of the lagoon landscape photos are obviously the Townscape ones: as stated before, it is the dominant category, comprising roughly half of the pictures obtained, but we should notice a decrease in its rate in correspondence with the districts that were characterized by a higher rate of lagoon pictures:

Figure 5.6: Normalized representation of Townscape over the months of the years 2014-2015

As expected, the greatest rate of Townscape pictures in both 2014 and 2015 is found in the central districts of Santa Croce and San Polo, the ones with the minimum rates of lagoon pictures, confirming the presence of an inverse correspondence between the numbers of Lagoon and Townscape pictures.



Regarding the remaining categories of Art and Food, the corresponding graphs are the following:

Figure 5.7: Normalized representation of Art over the months of the years 2014-2015

Figure 5.8: Normalized representation of Food over the months of the years 2014-2015

There are no clear trends in the monthly distribution of those two categories, since there isn't a district of Venice with a markedly higher or lower rate of those kinds of pictures, but some considerations can be made on some outlier results: in both years a significant increase in the rate of Art pictures can be observed in February, followed by a decrease in the next month back toward the mean rate, which is around 0.17; on the other hand, regarding the Food category, a particular result is the rate of the district of San Polo in the months of September and October 2014, which grows to a remarkable 0.27, significantly higher than the 2014 mean for Food of 0.15.



5.1.4 Analysis of category densities

In this last section, heat maps constructed using the yearly data and the coordinates of the pictures of each category are used to show a correspondence between the points with high density (highlighted in red in the heat maps) and actual places on the Venetian territory. However, since the amount of data obtained in 2015 is significantly greater than in the year before and the scale in the two heat maps is the same, the maps of year 2015 will always present an increased number of high density places with respect to 2014.

Heat Maps of Lagoon Landscape

Figure 5.9: Heat Map of the Lagoon Landscape category, years 2014-2015

As expected, for the Lagoon Landscape category the majority of high-density areas are found around the "Canale della Giudecca", the "Canal Grande" and the nearby islands. In particular, some places that are typically pictured with a great amount of lagoon elements are highlighted and easily found in the map, like Rialto, San Marco, Zattere, Isola della Giudecca and San Giorgio. By contrast, the remaining city area shows a significantly lower density of this category of photos, a fact that supports the quality of the classifier.


Heat Maps of Townscape

Figure 5.10: Heat Map of the Townscape category, years 2014-2015

The results for the Townscape category are quite understandable: the points of highest density are spread all over the central city area of Venice, where most tourists are concentrated.

Heat Maps of Art

Figure 5.11: Heat Map of Art category, years 2014-2015

Considering the Art pictures, it is interesting to look at the difference in the highest-density areas between 2014 and 2015: in the first year the only areas with a considerable density were those of the Biennale, the Arsenale and the Guggenheim, while, considering the data of 2015, in addition to the places previously retrieved many other interesting cultural locations are discovered, like Palazzo Grassi, San Giorgio or Ca' Giustinian.

Heat Maps of Folklore

Figure 5.12: Heat Map of Folklore category, years 2014-2015

The Folklore category presents interesting results: in both heat maps, but in particular in the 2015 one, it can be seen that the majority of the pictures belonging to this category are situated in the central and most touristic area of Venice. In addition, a "Folklore Line" can be noticed: a high density of Folklore photos comes from a set of places (squares) aligned along one of the most direct paths that lead tourists from Piazzale Roma to San Marco, passing through Campo San Polo, Campo San Silvestro and Rialto.

Heat Maps of Food

The last category analyzed is Food, and from the resulting Heat Maps it can be noticed that the points with the highest density correspond to important restaurants or "osterie" typical of Venetian gastronomic life: notable examples discovered are "Al timon", "Tonolo" and "Al paradiso Perduto".


Figure 5.13: Heat Map of Food category, years 2014-2015

It is also interesting to notice that some zones with a remarkable presence of notable restaurants were not discovered in the analysis because of their position in the city: in fact, the majority of the places discovered are positioned along touristic routes, since the analysis is mostly based on pictures taken by tourists, who may miss "hidden" places off the standard routes.



5.1.5 The distribution of Varies

Interesting considerations can be made on the distribution of the pictures assigned to the Varies class and excluded from the previous analysis. For this purpose, a comparison of the density of the Varies category versus the density of all the other categories assembled together is performed:

Figure 5.14: Heat map of Varies vs the other classes, 2014

Figure 5.15: Heat map of Varies vs the other classes, 2015

The density of the Varies category's pictures is distributed quite uniformly over the territory of Venice, closely following the density of the set of the other categories. There are, however, some areas where this relation is not respected: in both years, but more evidently in 2015, in the area around "Tronchetto" (on the left side of the images) the Varies density is significantly higher than the density of the other classes.
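A comparison of this kind can be sketched by normalizing both density grids so that each sums to one and inspecting their difference: cells where the Varies density exceeds that of the other classes stand out, like the anomaly around "Tronchetto" described above. The point sets below are synthetic:

```python
import numpy as np

# Synthetic coordinates in a unit square; the Varies set gets an extra
# cluster to mimic a localized excess like the "Tronchetto" area.
rng = np.random.default_rng(1)
others = rng.uniform(0.0, 1.0, size=(2000, 2))
varies = np.vstack([
    rng.uniform(0.0, 1.0, size=(1000, 2)),
    rng.normal([0.1, 0.5], 0.02, size=(500, 2)),  # hypothetical hotspot
])

def density(points, bins=10):
    """Normalized 2D histogram, so grids of different sizes compare."""
    h, _, _ = np.histogram2d(points[:, 0], points[:, 1],
                             bins=bins, range=[[0, 1], [0, 1]])
    return h / h.sum()

# Positive cells mark areas where Varies is over-represented.
excess = density(varies) - density(others)
i, j = np.unravel_index(np.argmax(excess), excess.shape)
```

Normalizing each grid before subtracting is what makes the comparison fair despite the two categories having different total picture counts.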
