Analysis of Data from Airbnb, OMI and Idealista

3.1 Data Sources

Chapter 3

ReservationDays: total number of listing calendar days that were classified as reserved during the reporting period (each calendar day is classified as either A=available, B=blocked, or R=reserved);

AvailableDays: total number of listing calendar days that were classified as available during the reporting period;

BlockedDays: total number of listing calendar days that were classified as blocked during the reporting period;

CityAirDNA: the city where the property is located;

Latitude & Longitude: the property’s geographical coordinates;

Active: vacation rentals that had at least one calendar day classified as reserved during the reporting period;

PropertyType: the type of accommodation (Flat, Villa, Studio, Cottage, etc.);

ListingType: a listing can either be an Entire Home, a Private Room, a Shared Room, or a Hotel Room;

PublishedNightlyRate: default nightly rate for the rental;

AirbnbPropertyID: the unique Airbnb property ID;

AirbnbHostID: it is the unique Airbnb host ID.
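The definitions of the day counts and of the Active flag can be illustrated with a minimal Python sketch. It assumes a listing's calendar is available as a sequence of single-letter codes (A, B, R); the function name and the sample calendar are hypothetical and are not part of the AirDNA dataset.

```python
from collections import Counter


def summarise_calendar(day_codes):
    """Count available, blocked and reserved days for one listing.

    day_codes: iterable of single-letter codes, one per calendar day
    ('A' = available, 'B' = blocked, 'R' = reserved).
    """
    counts = Counter(day_codes)
    unknown = set(counts) - {"A", "B", "R"}
    if unknown:
        raise ValueError(f"unexpected day codes: {sorted(unknown)}")
    return {
        "ReservationDays": counts["R"],
        "AvailableDays": counts["A"],
        "BlockedDays": counts["B"],
        # Active: at least one reserved day in the reporting period.
        "Active": counts["R"] >= 1,
    }


# Hypothetical 10-day reporting period for a single listing.
summary = summarise_calendar("AARRRBBAAR")
print(summary)
```

Since every calendar day falls into exactly one of the three classes, the three counts add up to the number of days in the reporting period.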

This database has been merged with data from Idealista by assigning each listing to a specific neighbourhood on the basis of its coordinates. Introducing a common element between the two datasets allows aggregate measures to be calculated in Section 3.3 and facilitates the econometric analysis in Chapter 4.

3.1.2 OMI

Data collected by OMI are aggregated by geographical areas named Zone OMI, which vary in size and are typically homogeneous in their characteristics. Each area is identified by a unique code starting with the letter of the category it belongs to. There are five categories:

▪ B – Central areas

▪ C – Semi-central areas

▪ D – Areas on the outskirts of cities

▪ E – Suburban areas

▪ R – Exurban areas

Properties are grouped as well: they are classified as residential properties, commercial properties, warehouses and others. For the purpose of this work, only residential properties have been considered; these belong to the ‘A’ category, which is divided into 11 property types, from code A01 to code A11. The A10 category refers to offices and has thus been excluded from the analysis.

The first type of data used for the analysis describes the stock of properties and housing units found in each of the OMI areas in the years 2016, 2017 and 2018. Data are available for all property categories, but only values for the ‘A’ category have been used.

The data set includes the following variables:

▪ Year;

▪ City;

▪ OMI area;

▪ Property type within each category.

The second type of data provided by OMI concerns prices. For both sales and rentals, the dataset provides the maximum and the minimum amount registered in the period. Assuming that prices are evenly distributed between these two bounds, an average value has been calculated for every data point. The main variables are the following:

▪ City;

▪ OMI area;

▪ Property type description;

▪ Conservation state: on a descending scale, ottimo (excellent), normale (normal) or scadente (poor);

▪ Minimum and maximum selling price (€/m²);

▪ Minimum and maximum monthly rent price (€/m²).
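The averaging step can be sketched as follows. The sketch assumes that the "average value" is the arithmetic midpoint of the two bounds, which is what an even distribution between minimum and maximum implies; the function name and the sample record are hypothetical, and the figures are not real OMI values.

```python
def midpoint_price(min_price, max_price):
    """Average of a (min, max) price range under an even-distribution assumption."""
    if min_price > max_price:
        raise ValueError("min_price must not exceed max_price")
    return (min_price + max_price) / 2


# Hypothetical OMI record; prices in EUR per square metre (not real data).
record = {
    "omi_area": "B1",
    "conservation_state": "normale",
    "min_sale": 2000.0,
    "max_sale": 3000.0,
    "min_rent": 8.0,
    "max_rent": 12.0,
}

avg_sale = midpoint_price(record["min_sale"], record["max_sale"])
avg_rent = midpoint_price(record["min_rent"], record["max_rent"])
print(avg_sale, avg_rent)
```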

It is worth mentioning that price values obtained from this dataset are not necessarily accurate. Since the data provided by OMI are the same data collected by the Agenzia delle Entrate to determine tax payments, property owners have an incentive to underreport rental income or the transaction values of property purchases in order to avoid or reduce their tax liabilities. The extent of the phenomenon is difficult to assess, so the data cannot be adjusted, but it is important to be aware of the distortion.

3.1.3 Idealista

Other real estate data have been provided by Idealista, an online real estate portal used by property owners and real estate agencies to list properties for sale or rent.

The data are quarterly, running from Q1 2012 to Q1 2020, and only average prices are provided. Every city is divided into neighbourhoods, defined by Idealista, which correspond to areas that are supposedly homogeneous in characteristics such as the type of buildings, their distance from the city centre and similar parameters. For example, the city of Turin has been divided into 27 neighbourhoods, whereas Milan has been divided into 79.

The data provided refer to both rental and selling prices in euros per square metre. Unlike the values in the OMI dataset, however, the prices listed on Idealista are expected to be slightly higher than the amounts actually paid at the transaction. Indeed, sellers and landlords may request higher prices to leave a margin for negotiation or because they do not know the market well, or they may end up accepting a lower price, closer to what the buyer (or tenant) is willing to pay, for lack of demand. Therefore, the amount actually paid by the counterpart cannot be determined, but over time the general trend is likely to reflect the real evolution of transaction prices.

As previously mentioned, Idealista and AirDNA datasets have been merged to create a single database where Airbnb data is linked to real estate data by matching a listing’s location to an Idealista neighbourhood, identified by the variable NeighbourhoodIdealista.
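The matching of a listing's coordinates to a neighbourhood can be illustrated with a minimal point-in-polygon sketch. It is not necessarily the procedure used to build the merged database: it assumes each neighbourhood boundary is available as a list of (longitude, latitude) vertices and treats coordinates as planar, which is a reasonable approximation at city scale. The function names are hypothetical, and the two rectangles are made-up boxes, not real Idealista boundaries.

```python
def point_in_polygon(lon, lat, polygon):
    """Ray-casting test; polygon is a list of (lon, lat) vertices."""
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Does this edge straddle the horizontal line through the point?
        if (y1 > lat) != (y2 > lat):
            # Longitude at which the edge crosses that line.
            x_cross = x1 + (lat - y1) * (x2 - x1) / (y2 - y1)
            if lon < x_cross:
                inside = not inside
    return inside


def assign_neighbourhood(lon, lat, neighbourhoods):
    """Return the first neighbourhood containing the point, or None."""
    for name, polygon in neighbourhoods.items():
        if point_in_polygon(lon, lat, polygon):
            return name
    return None


# Made-up rectangular boundaries, for illustration only.
neighbourhoods = {
    "Centro Storico": [(7.67, 45.06), (7.69, 45.06), (7.69, 45.08), (7.67, 45.08)],
    "Crocetta": [(7.65, 45.04), (7.67, 45.04), (7.67, 45.06), (7.65, 45.06)],
}

# Hypothetical listing coordinates (Longitude, Latitude).
NeighbourhoodIdealista = assign_neighbourhood(7.68, 45.07, neighbourhoods)
print(NeighbourhoodIdealista)
```

A listing that falls outside every boundary returns None, which makes unmatched listings easy to spot before any aggregation.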

3.1.4 Combining Idealista and OMI Datasets

The values in both datasets are expected to be distorted, so it would be interesting to merge the two datasets and see whether they report the same trends. What complicates this process is that the two datasets report data for geographical areas that mostly do not correspond and have very different sizes. Indeed, OMI areas vary strongly in size, with areas located in the city centre much smaller than those on the outskirts. As Figure 3.1 shows, however, Idealista areas (outlined in red) are more homogeneous.

Figure 3.1. Idealista neighbourhoods (red boundaries) and Zone OMI (black boundaries) compared.

To establish a relationship between these values, a geographical comparison is needed.

Because the areas are uneven, many of them share no boundaries, and they often overlap, an approximate solution was adopted. The first assumption is that any given OMI area is uniform in its characteristics, meaning that the number of residential buildings, as well as the number of properties sold or rented in a given period, is evenly distributed across the area. For instance, if a 100 m × 100 m square contains 100 housing units, these are not concentrated in a specific part of the square; rather, there is one housing unit per 100 m², that is, one every 10 metres along each side.

Using Google Earth, it is possible to import the coordinates defining the boundaries of the areas defined by both Idealista and OMI and to measure their extension (in either m² or km²). By importing the boundaries from Idealista and overlaying them on OMI's, it is possible to measure the percentage of an OMI area belonging to an Idealista neighbourhood as a ratio whose denominator is the total extension of the OMI area and whose numerator is the extension of the part of that area lying within the Idealista neighbourhood. In symbols, for OMI area i and Idealista neighbourhood j:

\[ \text{share}_{i,j} = \frac{\text{area}(\text{OMI}_i \cap \text{Idealista}_j)}{\text{area}(\text{OMI}_i)} \]

These values can be used to populate a matrix whose rows represent OMI areas and whose columns represent the 27 neighbourhoods defined by Idealista. Each cell reports the share of a specific OMI area (the row) that is included in a given Idealista neighbourhood (the column). For example, Table 1 shows that 77% of OMI area B4 is part of the Idealista neighbourhood Centro Storico, while the remaining 23% belongs to Crocetta. Each row must sum to 1; otherwise some data will be lost.

OMI area   Centro Storico   San Salvario   Crocetta
B1         1                0              0
B2         1                0              0
B3         1                0              0
B4         0.77             0              0.23
B5         1                0              0

Table 1. Focus on how the OMI/Idealista matrix was populated.

The whole matrix can be found in Exhibit 2, whereas its use will be described in the following sections.
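As a minimal sketch of how such a matrix might be checked and applied, the Python snippet below stores the rows shown in Table 1, verifies that each row sums to 1, and apportions a value recorded per OMI area to Idealista neighbourhoods in proportion to the overlap shares, consistent with the uniformity assumption. The function names and the housing-unit counts are hypothetical; the actual use of the matrix is the one described in the following sections.

```python
# Overlap shares from Table 1 (zero entries omitted).
SHARES = {
    "B1": {"Centro Storico": 1.0},
    "B2": {"Centro Storico": 1.0},
    "B3": {"Centro Storico": 1.0},
    "B4": {"Centro Storico": 0.77, "Crocetta": 0.23},
    "B5": {"Centro Storico": 1.0},
}


def check_rows(shares, tol=1e-9):
    """Raise if any OMI row does not sum to 1 (data would be lost)."""
    for zone, row in shares.items():
        total = sum(row.values())
        if abs(total - 1.0) > tol:
            raise ValueError(f"row {zone} sums to {total}, expected 1")


def apportion(values_by_zone, shares):
    """Split each OMI value across neighbourhoods by overlap share."""
    totals = {}
    for zone, value in values_by_zone.items():
        for neighbourhood, share in shares[zone].items():
            totals[neighbourhood] = totals.get(neighbourhood, 0.0) + value * share
    return totals


# Hypothetical housing-unit counts per OMI area (not real data).
units_by_zone = {"B1": 1000, "B2": 500, "B3": 800, "B4": 1000, "B5": 700}

check_rows(SHARES)
units_by_neighbourhood = apportion(units_by_zone, SHARES)
print(units_by_neighbourhood)
```

Because every row sums to 1, the total across neighbourhoods equals the total across OMI areas, so no units are lost in the conversion.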
