Department of Economics and Management
Master of Science in Economics
LM-56
Impacts of Public Transport Improvements on Population Structure:
the case of Paris region (1975-2011)
Martina Brach
Under the joint supervision of Professors
Angela Parenti (UniPi) and Camille Hémet (PSE, ENS)
Table of contents
Introduction
1. Literature review
2. General empirical set up
3. Identification strategy
4. Data sources and setup
4.1 Introduction
4.2 Data Sources
4.2.1 Travel time
4.2.2 Geographic information on French municipalities
4.2.3 Highways
4.2.4 Socio-economic characteristics of the population
4.3 Definition and construction of the main variables in the dataset
4.3.1 Accessibility index
4.3.2 Entropies
4.3.3 Percentages of residents by education, sector and socio-professional category
4.3.4 Population density
4.3.5 Highway accessibility index
4.3.6 Distance to Paris
4.3.7 Geographic accessibility index
4.3.8 Three-group-method instruments
5. Descriptive statistics
5.1 Summary statistics
5.2 Evolution of the variables over time
5.3 Scatterplots and correlation coefficients
6. Estimation methods
7. Regressions results
7.1 General results
7.2 Results for the municipalities connected via highways
7.3 Results for the Petite Couronne
Conclusions
Figures
Tables
Introduction
During recent decades, the Paris region has seen major demographic and
socio-economic changes. The evidence suggests that a sub-urbanization process has occurred, as
both population and employment have increased strongly in the suburbs, while
increasing only moderately in the city of Paris. Moreover, the public transport network has
undergone a great expansion. In our work, we observe the evolution of population and
transport in the Paris region between 1975 and 2011. This period covers the main stages of
the expansion of the regional public transport system: the Regional Express Rail (RER)
opened progressively between 1969 and 2006, with the major network improvements being
realized in the 1970s and 1980s. The tramway was reintroduced in 1992 and its coverage
spread progressively with the opening of three further lines between 1997 and 2006 (see Table
1). Our aim is to estimate the effects of public transport improvements on population structure.
We measure the exposure to transport improvements by using an origin-specific,
commuting-time based accessibility index (following Gibbons et al., 2019), while we measure the spatial
distributions of the socio-economic variables (education, sector of activity, profession) by
using an entropy index (following Theil and Finezza, 1971).
Our work is structured as follows: first, we review the literature on the impacts of public
transport improvements on several socio-economic variables. Then, we provide arguments to
support the use of the accessibility index and the entropy index. We describe our identification
strategy, the data sources and set up. Finally, we illustrate the estimation methods and we
comment on the regression results. We find a negative effect of transport improvements on
the education, sector and category entropies. Results change when we focus on some subsets
of municipalities, for which the evidence suggests that an increase in accessibility leads to a
concentration of residents in the tertiary sector and in higher educational groups.
1. Literature review
In this section, we present some studies on the impacts of public transport improvements on
several outcomes. These works differ mainly in two regards: (i) the choice of the treatment
variable; (ii) the identification strategy used to deal with the endogeneity of transports.
Endogeneity may arise due to reverse causality or to the existence of confounding factors (for
example, we may wonder whether transport improvements cause employment growth or the
latter induces transport expansion via an increase in demand). A common solution to the
problem of causality is to use historical roads or plans as instruments. For example,
Baum-Snow (2007) evaluates the effects of new highways on sub-urbanization in U.S. cities by
using a 1947 plan of the Interstate Highway System. This same plan is used by Duranton and
Turner (2012) to estimate the impact of highway construction on local employment, and by
Michaels (2008) to evaluate the effect of highway expansion on the demand for skilled labor.¹
Garcia-López et al. (2017) use the 1870 railroads and the Roman roads in the Paris region to
instrument the contemporary railroads. On the other hand, there are studies where no
instrument is used, but the treatment group is restricted to the units that are “accidentally”
connected to the transport network (Mayer and Trevien, 2017; Ghani et al., 2016). As for the
differences in the definition of the treatment, the exposure to transport improvements has been
alternatively measured in terms of distance to the transport infrastructure (Baum-Snow and
Kahn, 2000;² Baum-Snow, 2007; Ghani et al., 2016; Garcia-López et al., 2017), travel time
reduction (Mayer and Trevien, 2017), accessibility (Gibbons et al., 2019), length of the roads
in an area (Duranton and Turner, 2012) and whether an area is crossed by a road (Michaels,
2008). In our study, we will use a travel-time based accessibility index. Moreover, we will
estimate the impacts of public transport improvements in the Paris region. For these reasons,
we want to examine closely three of the studies listed above: Mayer and Trevien (2017),
Garcia-López et al. (2017) and Gibbons et al. (2019).
¹ Both studies make use of additional instruments.
² Baum-Snow and Kahn (2000) estimate the impact of new urban rail transit on usage and housing values in five
U.S. cities that upgraded their rail transit systems during the 1980s. They use the distance from each census tract
to the nearest railroad as their treatment variable. We note that they do not use an instrumental-variable strategy,
but assume the growth in rail transit to be exogenous conditional on a set of control variables.
Gibbons et al. (2019) estimate the impact of new road infrastructure on employment and labor
productivity in Britain over the period 1998-2008. They measure the exposure to road
improvements through changes in a continuous index of accessibility. For a given origin
location 𝑗 and at a given time 𝑡, this index measures the accessibility of potential destinations
𝑘 as the weighted sum of their proximity to 𝑗, where proximity is a decreasing function of the
minimum journey time along the major road network, $a(\mathit{time}_{jkt})$:
$$A_{jt} = \sum_{k \neq j} w_{k0} \, a(\mathit{time}_{jkt})$$
The authors define proximity as an inverse time decay and the destination weights ($w_{k0}$) as the
initial employment levels at the destination.³ Their identification strategy is the following.
First, they restrict their sample to locations 𝑗 within narrow distance buffers around the road
improvements. Second, they drop the closest locations (within 1 km) to minimize biases from
several sources: mismeasurement of travel times,⁴ potentially adverse effects of new schemes
on proximate locations⁵ and potential targeting of scheme routes based on local economic
conditions. They focus on moderately close locations because these are the ones where
accessibility has increased the most following the road improvements.⁶ The authors’ choice
of a network-based accessibility index as the explanatory variable is part of their identification
strategy, as this index varies continuously over space in ways partly unrelated to the distance
to road improvements.⁷ Therefore, although the scheme location is likely to be endogenous,
the variation in accessibility changes amongst units close to each scheme may be exogenous.⁸
In more detail, they assume exogeneity of accessibility changes (i.e. the time-demeaned
accessibility) conditional on origin-specific fixed effects and on some non-linear time trends
(e.g. interaction terms between time trends and nearest-scheme dummies). Distance to the
nearest scheme is included interactively among the controls. The authors justify their choice
to use the accessibility index with several arguments. First, since the road network was already
developed and dense at the beginning of the period considered, it does not suffice to observe
the binary outcome of being connected (or not) to the road system:⁹ a measure of the
improvement in transport service quality is needed. Second, given that the road network did
not expand much (increasing by 0.87% between 1998 and 2008), the use of measures such as
“area connected or not” or “kilometers of roads in an area” would generate a lot of zeros when
time-demeaning the data. This is especially the case for “kilometers of road in an area”, as the
areas considered are small.¹⁰ Finally, given the likely endogeneity of scheme location
and the authors’ identification strategy, using “distance to roads” as an explanatory variable
would probably cause the estimator to be biased and inconsistent.
³ However, their results are robust to the use of alternative distance decays and destination weights.
⁴ This is because they compute travel times with respect to the main road network, while minor roads may still
be used close to the trunk road improvement (especially for short-distance journeys).
⁵ E.g. loss of premises, environmental impact.
⁶ Hence, these wards are the ones potentially exposed to treatment.
⁷ For example, if a by-pass is built between locations 𝑗 and 𝑘, the minimum journey time from 𝑗 to 𝑘 will fall
by the same amount no matter how far the by-pass is from 𝑗.
⁸ Consider two locations 𝑙 and 𝑚, equally distant from a trunk road: whether a by-pass is built closer to 𝑙 or 𝑚,
with consequences in terms of 𝑙 or 𝑚 accessibility, may be driven only by technical or cost considerations, with
no relation to 𝑙 or 𝑚’s local economic conditions.
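The accessibility index discussed above can be illustrated with a small numerical sketch. The code below is only an illustration under stated assumptions: it takes the decay function to be a(time) = 1/time, and the travel times, the weights and the function name `accessibility_index` are hypothetical and do not come from Gibbons et al. (2019) or from our data.

```python
import numpy as np

def accessibility_index(travel_time, weights):
    """A_j = sum over k != j of w_k * a(time_jk), with a(t) = 1 / t (assumed decay)."""
    travel_time = np.asarray(travel_time, dtype=float)
    weights = np.asarray(weights, dtype=float)
    n = travel_time.shape[0]
    # Proximity is the inverse of travel time; the diagonal (k == j) is excluded.
    off_diagonal = ~np.eye(n, dtype=bool)
    proximity = np.zeros_like(travel_time)
    proximity[off_diagonal] = 1.0 / travel_time[off_diagonal]
    return proximity @ weights

# Hypothetical example: three locations, travel times in minutes (row = origin).
times_before = np.array([[0, 30, 60],
                         [30, 0, 40],
                         [60, 40, 0]])
times_after = times_before.copy()
times_after[0, 2] = times_after[2, 0] = 40  # a new link shortens the 0 <-> 2 journey

# Destination weights are held fixed at their initial (period 0) levels.
initial_jobs = np.array([100.0, 50.0, 200.0])

a_before = accessibility_index(times_before, initial_jobs)
a_after = accessibility_index(times_after, initial_jobs)
print(a_after - a_before)  # change in accessibility for each origin
```

In this toy network, only the origins whose journey times changed (locations 0 and 2) see their accessibility rise, while location 1 is unaffected. Because the weights are fixed at their initial levels, all variation over time in the index comes from changes in travel times.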
Mayer and Trevien (2017) have assessed the impact of the opening and the progressive
extension of the RER on employment and population growth in the Paris region between 1975
and 1990. They identify the effect of the RER by considering two subsamples of
municipalities: (i) the municipalities located outside the termini areas (i.e. the Paris
department and the towns targeted by the SDAURP plan¹¹), with at least one commuter-rail
station in 1975; (ii) the municipalities that were connected to the RER as a result of an
exogenous change to the initial RER project. In both cases, the treatment is defined as the fall
in the minimum travel time to central Paris between 1975 and 1990. The authors provide
several arguments for the use of this variable: (i) it accounts for intensive improvements in
the transport network; (ii) it reflects the fact that the RER network was developed to connect
some suburban centers to Paris; (iii) it is strongly correlated with the presence of an RER
station in 1990 but weakly correlated with several city demographic characteristics. This last
point hints at the absence of relevant, unobservable differences between the treatment and
control groups, hence at the correct identification of the causal effect. Conversely, the
presence of an RER station in 1990 is positively associated with the initial values of the
population and employment density. Also, the employment trends differ across municipalities
with and without stations. Therefore, the use of a variable such as “presence of an RER
station” would probably lead to incomparable treatment and control groups even after
controlling for the observables. Turning to the results, the authors find a positive impact of
travel-time reduction on the growth of the high-skilled population in the municipalities with
at least one RER station in 1990. Since higher education can be considered as a proxy for
income (and for the willingness to pay for housing), this finding suggests a gentrification
effect of the RER on the population of the selected municipalities.
¹⁰ The cross-sectional unit of analysis is the electoral ward. There are 10,300 wards in Britain with an average
area of 24 km² and a population of 6,000.
¹¹ The Schéma directeur d'aménagement et d'urbanisme de la région de Paris (SDAURP) is the urban plan that
envisaged the decentralization of jobs and people to eight “New Towns” in the Paris region and their gradual
connection to the city of Paris via the Regional Express Rail (RER).
Like Mayer and Trevien (2017), Garcia-López et al. (2017) have analyzed the
influence of public transport infrastructure on employment and population growth in the Paris
metropolitan area between 1968 and 2010. They have considered the RER and the transport
modes that might complement or substitute for it (i.e. metro, tramway, commuter trains,
highways) so as to obtain unbiased results. Their identification strategy consists in using the
Roman roads and the 1870 railways as exogenous sources of variation for the endogenous
transport variables. They find that, on average, each additional kilometer closer to an RER
station increases employment growth by 2% and population growth by 1%. The RER effects
are heterogeneous along several dimensions: (i) across space, as larger effects are found for
the municipalities that are closer to an RER station and to an employment subcenter; (ii) over
time, as the effect on employment growth emerges only after 1990, while the one on
population growth decreases in magnitude after 1990; (iii) across population types, as the RER
has larger effects on the residents with high school and university degrees; moreover, it only
affects some categories of workers (executives and intellectual workers, intermediate workers,
employees and factory workers), in the Industry and Service sectors.
2. General empirical set up
In our work, we measure transport improvement by using an origin-specific, commuting-time
based accessibility index (Gibbons et al., 2019), while we measure the spatial distributions of
the socio-economic variables (education, sector of activity, profession) by using the entropy
index (Theil and Finezza, 1971). In what follows, we provide some arguments in support of
the use of these variables.
We start with the accessibility index. Alternative treatment variables may be: (i) a dummy for
the presence of a commuter-rail station (similarly to Michaels, 2008); (ii) a set of dummies
accounting for the opening, closure and presence of stations (mimicking the entry, exit and
membership dummies in Neffke et al., 2011¹²); (iii) the distance to the closest commuter-rail
station (as in Baum-Snow and Kahn, 2000; Baum-Snow, 2007; Ghani et al., 2016;
Garcia-López et al., 2017); (iv) the change in the minimum travel time to Paris (Mayer and Trevien,
2017); (v) the kilometers of railroads in each municipality. However, none of these variables
seems appropriate in the context of our study.
In general, the advantages of commuting-time based measures over variables such as “presence
of a station” or “distance from the nearest station” are that: (i) the former allow us to account
for differences in treatment intensity and transport modes (RER, metro, tramway, train), thus
providing a better representation of transit improvements in a complex network; (ii) the latter
are more likely to be endogenous, due to the targeting of transport scheme location. Instead,
a reduction in travel time may be partly unrelated to the distance from the origin location to
the closest transport improvement. Moreover, the actual development of the RER network
mainly consisted in the upgrade of some existing rail lines, implying that the treatment cannot
be identified by the mere existence of a connection.
In particular, the advantage of using the accessibility index rather than a variable such as
“change in the journey time to Paris” is that we make no assumptions on the desirable direction
of transport improvement. Indeed, depending on the place of work of commuters, a fall in the
travel time from a peripheral location to an employment subcenter may be even more valuable
than a fall in travel time from that same location to central Paris. Finally, since municipalities
are very small (9.3 km² on average), the stock of railroads in each municipality probably does
not vary a lot over time, implying that this variable would not be a suitable treatment.
As for the entropy index (or “entropy score”), this is a statistic that measures the deviation of
a given distribution from complete concentration (minimum entropy) or dispersion (maximum
entropy). It has been used in a variety of contexts, including the measurement of racial
segregation (Theil and Finezza, 1971) and industrial variety (Frenken et al., 2007). For
example, Jacquemin and Berry (1979) write that most measures of corporate diversification
take the form of a weighted sum of the number of the firm’s products, where weights reflect
the relative importance of each product within the firm’s total product mix. Two of the most
commonly used measures of corporate diversification are the (inverse) Herfindahl index and
the entropy measure. Denoting the share of product $i$ in the firm's total output as
$P_i$ (with $i = 1, \dots, n$), the Herfindahl index is $H \equiv 1 - \sum_{i=1}^{n} P_i^2$, while the entropy measure is
$E \equiv \sum_{i=1}^{n} P_i \ln(1/P_i)$. In words, the Herfindahl index weights each product share by itself, while
the entropy measure weights each product share by the logarithm of its inverse. Entropy attains
its maximum value under conditions of perfect diversity/dispersion (i.e. $P_1 = P_2 = \dots = P_n = 1/n$)
and its minimum value under perfect specialization (i.e. where one of the $P_i$ equals 1 and the
remaining shares are zero). The main advantage of entropy over the Herfindahl index is that entropy
can be decomposed into additive elements (i.e. between- and within-group components) that
define the contribution of diversification at different levels of product aggregation to the total
(see Reardon and Firebaugh, 2002).¹³ Entropy can also be used to measure industrial diversity
at the regional level. In this case, $n$ still denotes the number of industry classes, while $P_i$
denotes the proportion of the region's total employment that is in the $i$-th industry. In our
work, we do something similar, as we use entropy to measure the dispersion of workers across
sectors, socio-professional categories and educational groups at the municipal level. However,
we note that, due to data unavailability, we could not exploit the decomposability property
of the entropy index (e.g. we could not measure the dispersion of professions within each
socio-professional group).
¹³ Although this is not our case, we note that this feature of the entropy measure can be used to run regressions
that include both the within- and between-group components without running into collinearity.
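To make the two measures concrete, the following minimal sketch computes the entropy and the Herfindahl-type index defined above for three hypothetical distributions of residents across four groups. The share vectors and the function names are purely illustrative and are not taken from our dataset.

```python
import math

def entropy(shares):
    """E = sum_i P_i * ln(1 / P_i); zero shares contribute nothing by convention."""
    return sum(p * math.log(1.0 / p) for p in shares if p > 0)

def herfindahl_complement(shares):
    """H = 1 - sum_i P_i^2, as defined in the text."""
    return 1.0 - sum(p * p for p in shares)

# Hypothetical shares of residents across four education groups.
uniform = [0.25, 0.25, 0.25, 0.25]      # perfect dispersion
mixed = [0.4, 0.3, 0.2, 0.1]            # intermediate case
concentrated = [1.0, 0.0, 0.0, 0.0]     # perfect concentration

for shares in (uniform, mixed, concentrated):
    print(round(entropy(shares), 4), round(herfindahl_complement(shares), 4))
```

With four groups, the entropy ranges from 0 (all residents in one group) to ln 4 (equal shares), so a fall in entropy signals a move towards concentration in some group.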
Based on our specification of treatment and outcome variables and on the results of some
studies analyzed in the previous section (Mayer and Trevien 2017; Garcia-López et al. 2017),
we may expect an increase in accessibility to induce an increase in the percentage of graduates,
thus leading to a fall in the education entropy. If this is the case, we should also detect a
positive effect of accessibility on the percentage of intellectual workers. Since higher
education is usually associated with higher incomes, that would hint at a “gentrification effect”
of public transport. However, Garcia-López et al. (2017) also find that the RER has a positive,
large effect on employment growth in the Service sector. Since this sector is characterized by
a polarization between low- and high-skilled workers, we may find instead a positive effect
of accessibility on both the low- and high-educated. In this case, the final effect on the
education entropy would be ambiguous. Finally – and despite the previous considerations –
we may simply expect new transport lines to increase workers' spatial dispersion and
heterogeneity. Indeed, more accessible municipalities may attract the people who work in
connected locations. This would amount to a positive effect of accessibility on the entropies.
3. Identification strategy
The main challenge of estimating the causal effects of public transport improvements on
socio-economic variables is that new connections may be built to meet transport demand in
growing places or, vice versa, to boost economic activity in deprived areas. Therefore,
the treatment variable is likely to be endogenous, which would make the estimates biased and
inconsistent. In this respect, the case of the Paris region is paradigmatic, as
the RER network was developed with the aim of supporting the economic and demographic
growth of the suburbs.¹⁴ As a consequence, the internal validity of our estimates is threatened
by the potential endogeneity of the scheme location. We have seen that Mayer and Trevien
(2017) have addressed this issue by restricting the observations to the municipalities which
were not explicitly targeted by the plan. However, these municipalities are only 96-98 out of
1300 in the Paris region. In our work, we have dealt with endogeneity concerns differently.
Our identification strategy is structured along three lines of argument.
First, following Gibbons et al. (2019), we have measured the exposure to public transport
improvements through a travel-time based accessibility index. The use of this variable should
mitigate the endogeneity problem, as changes in the accessibility index are partly unrelated to
location-specific characteristics which may jointly influence the railroad construction and the
outcome variables. We have also repeated the estimates on a restricted set of municipalities
so as to control for the highway accessibility, which was also improved as a result of the
SDAURP plan.
Second, we have used an instrumental variable strategy. Both the accessibility index and the
highway accessibility index were instrumented by using: (i) an ad-hoc three-group-method
instrument; (ii) a “geographic” accessibility index. The former is an instrumental variable that
takes value −1 if the regressor value is below its first tertile, +1 if it is above its second tertile
and 0 if it is in the middle third. As detailed in Kennedy (2008), the use of group-method
instruments allows measurement errors to be averaged out,¹⁵ thus reducing their impact on the
estimates. The latter is a sort of unweighted accessibility index, defined as the sum of the inverse
inter-centroid distances. The rationale for using this instrument is that it should be related to the
(potentially) endogenous variable, while being unrelated to transport policies. We point out that
we have tested the relevance of the instruments.
¹⁴ Indeed, the RER project was launched by the SDAURP plan (1965), which envisaged the creation of eight
new towns and their gradual connection to the city of Paris via new railroads and highways. Many of the road
schemes have been carried out, namely the completion of the Boulevard périphérique, the building of the A86 and
the building of a number of roads into Paris. However, due to a revision of the SDAURP plan in 1980, the project
of a third bypass around Paris was abandoned (Source: Dagnaud, 1983).
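The two instruments described above can be sketched as follows. This is a simplified illustration, not the code used for the thesis: the function names, the nine regressor values and the three centroid coordinates are hypothetical, and the tertiles are computed with NumPy's default (linear interpolation) quantile rule, which is an assumption on our part.

```python
import numpy as np

def three_group_instrument(x):
    """Return -1 below the first tertile, +1 above the second tertile, 0 otherwise."""
    x = np.asarray(x, dtype=float)
    t1, t2 = np.quantile(x, [1 / 3, 2 / 3])
    return np.where(x < t1, -1, np.where(x > t2, 1, 0))

def geographic_accessibility(centroids):
    """Unweighted index: for each unit, the sum of inverse distances to all other centroids."""
    xy = np.asarray(centroids, dtype=float)
    diff = xy[:, None, :] - xy[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=2))
    np.fill_diagonal(dist, np.inf)  # 1 / inf = 0, so a unit's distance to itself is ignored
    return (1.0 / dist).sum(axis=1)

# Hypothetical values of the (possibly mismeasured) regressor for nine units.
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0]
z = three_group_instrument(x)
print(z)  # lowest third -> -1, middle third -> 0, highest third -> +1

# Hypothetical centroid coordinates (in km) for three municipalities on a line.
centroids = [(0.0, 0.0), (3.0, 4.0), (6.0, 8.0)]
g = geographic_accessibility(centroids)
print(g)  # the central municipality has the highest geographic accessibility
```

The three-group instrument only uses the rank of each observation, which is why moderate measurement errors in the regressor leave it largely unchanged. The geographic index depends only on centroid positions, so it cannot respond to transport policy.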
Third, based on the results of several diagnostic tests, we have estimated (almost) all the
models by including both year and municipality effects (more details will be provided in
Section 6). From an econometric perspective, fixed effects control for time-invariant or
municipality-invariant omitted variables that are correlated with the regressors and which affect
the outcomes, thus reducing the risk of omitted variable bias.
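The role of year and municipality effects can be illustrated with a generic within transformation on a balanced panel. The sketch below is not the estimator described in Section 6: the data are simulated, the true coefficient (2.0) and all names are hypothetical, and no noise is added so that the result is easy to check.

```python
import numpy as np

def two_way_demean(panel):
    """Remove unit and period means from a balanced panel (rows = units, columns = years)."""
    panel = np.asarray(panel, dtype=float)
    unit_mean = panel.mean(axis=1, keepdims=True)
    year_mean = panel.mean(axis=0, keepdims=True)
    return panel - unit_mean - year_mean + panel.mean()

# Simulated balanced panel: 50 units observed over 6 years.
rng = np.random.default_rng(0)
n_units, n_years = 50, 6
x = rng.normal(size=(n_units, n_years))
unit_effect = rng.normal(size=(n_units, 1))   # time-invariant unit heterogeneity
year_effect = rng.normal(size=(1, n_years))   # shocks common to all units in a year
y = 2.0 * x + unit_effect + year_effect       # hypothetical true coefficient: 2.0

# Within estimator: regress demeaned y on demeaned x.
x_w, y_w = two_way_demean(x), two_way_demean(y)
beta_hat = (x_w * y_w).sum() / (x_w ** 2).sum()
print(beta_hat)
```

In a balanced panel, the transformation removes any additive unit-specific and year-specific component exactly, so the slope is recovered even though the effects themselves are never observed. Omitted variables that vary both across units and over time are not removed.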
Even so, omitted variable bias remains the most likely source of endogeneity for our
estimates. Indeed, due to data unavailability, we could not control for differences in local
wealth. To partially address this issue, we have repeated the regression analysis on the
municipalities in the Petite Couronne, i.e. the inner suburban ring of Paris. These
municipalities are more homogeneous than the others¹⁶ and are at similar distances from Paris
(i.e. the economic center of the region), so, hopefully, they also display similar economic
trends. If this is the case, the omission of GDP per capita is not as serious as for the whole set
of observations. Moreover, the advantage of focusing on these municipalities is that the
treatment and control groups are of similar size: the number of municipalities whose
accessibility has increased between 1975 and 2011 is similar to the number of municipalities
that have experienced no transport improvements (Figure 5). Conversely, the full set of
municipalities is characterized by an imbalance between the treatment and the control
groups, as the former is much smaller than the latter.¹⁷
¹⁶ E.g. in the Petite Couronne, population densities are not as diverse as in the rest of the Paris region.
¹⁷ Note that there is no clear-cut distinction between “treatment” and “control” groups, as the treatment variable
is continuous. However, accessibility only increases over time in the case of transport improvements: in this
sense, we talk about “treated” and “untreated” municipalities.
4. Data sources and setup
4.1 Introduction
In our work, we observe the evolution of population and transport in the Paris region between
1975 and 2011. This period covers the main stages of the expansion of the regional public
transport system: the five RER lines opened progressively between 1969 and 2006, with the
major network improvements being realized in the 1970s and 1980s. One new metro line
opened in 1998 and seven existing lines were extended between 1981 and 2011. The tramway
was reintroduced in 1992 and its coverage spread progressively with the opening of three
further lines between 1997 and 2006. Conversely – and despite the opening of some new
sections between 1974 and 2004 – the train network slightly contracted (see Table 1).
Our unit of analysis is the municipality. The municipality is the smallest administrative
division in France and also the smallest unit for which Census data are available (smaller
divisions were only introduced in French statistics in the 1990s). In Île-de-France, there are
1300 municipalities (including the 20 Parisian arrondissements), with an average surface of
9.3 km². Municipality boundaries have rarely changed over time: between 1975 and 2011,
only 13 municipalities experienced some geographic event (e.g. separation from another
municipality, exchange of plots with inhabitants). We removed these municipalities from the
dataset, thus reducing their number to 1287.
There are 6 years of observations, corresponding to as many census waves: 1975, 1982, 1990,
1999, 2006 and 2011.¹⁸ In what follows, we present the sources we used and the way we
constructed the main variables in the dataset. Table 2 provides the full list of variables in the
dataset, while Table 3 lists the municipalities that were removed.
4.2 Data Sources
4.2.1 Travel time
Travel time data were kindly provided by Mayer and Trevien (2017), who evaluated the
impact of the expansion of the RER on firms, employment and population growth in the Paris
region between 1975 and 1990. These data contain information on the minimum journey time
(in minutes) between every pair of stations in the Paris region for all the years between 1975
and 2013.¹⁹ Travel time was computed with respect to urban rail transport (metro,
train, tramway and RER). Calculations were based on data from the Institut Paris Region (ex
IAU) and from the two main public transport operators in Île-de-France, i.e. the Régie
Autonome des Transports Parisiens (RATP) and the Société Nationale des Chemins de fer
Français (SNCF).²⁰ In particular, the following elements were considered: (i) the opening and
closure years of stations and rail-line segments, (ii) the connection time (on foot) between
different lines at a given station and (iii) the presence of pedestrian passageways connecting
stations. Regarding this last point, it should be pointed out that passageways were assumed to
be always open. Therefore, it is possible to find the travel time between stations in years where
one or both stations were closed, provided that a passageway connects the closed stations
to some active station.²¹ On average, 3% of observations concerning not-yet-existing stations
have some non-NA travel time values, implying the existence of passageways. For any given
track, the authors also assumed the train speed to be constant over time, and in particular to be
equal to its 2013 value. This assumption was due to the absence of historical data on speeds
and to their choice of basing their computation on 2013 timetables. An important implication
of this is that time variability comes entirely from openings and closures, and is thus very low:
32% of pairs of stations are associated with the same travel time throughout 40 years of
observation, while 46.6% of observations experienced only one change in their travel time
value. For 14.8% of observations, travel time changed twice, and for just 5.1% of observations, it
changed 3 times. The maximum number of changes is 7, concerning just 46 observations.
A few more words must be said about directionality. For any pair of stations A and B, the
dataset contains the travel time from A to B and vice versa. Given that in Île-de-France there
are 809 stations, this amounts to 653,672 observations. Is travel time symmetric, i.e. is
the travel time from A to B always equal to the travel time from B to A? The answer is
negative: of the 326,836 unique combinations of stations, 68.5% have different travel times
depending on the direction. As explained by the authors, this occurs for two reasons: (i) some
lines have different paths depending on the direction (metro 10, metro 7 bis); (ii) travel time
includes the waiting time at stops, where waiting time depends on the transportation type: it
is set to 2 minutes for metro and tram, 4 minutes for RER, 6 for suburban trains. In the
algorithm, there is no waiting time for the first ride of a route, but only for subsequent rides.
For example, given a route composed of an RER ride followed by a metro ride, waiting time
will be 2 minutes, and 4 minutes for the symmetric route. Accordingly, when aggregating
travel time data at the municipal level, we took account of directionality. Details on this
process and the construction of the accessibility index can be found in Section 4.3.1.
¹⁹ This is indeed the updated version of their dataset.
²⁰ As pointed out in the French version of their paper: Mayer and Trevien (2016).
²¹ For example, the dataset contains information on the travel time from station 1 (“Breval”) to station 151
(“Musée de Sèvres”) in 1969, although station 151 only opened in 1997. This is because station 151 is connected
by a passageway to station 156 ("Pont de Sèvres"), which opened in 1934 (see
https://fr.wikipedia.org/wiki/Pont_de_S%C3%A8vres_(m%C3%A9tro_de_Paris
, retrieved on 01/04/2020).
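The waiting-time rule and the pair counts reported above can be checked with a short sketch. The waiting times are those stated in the text; the function name, the mode labels and the representation of a route as a list of modes are hypothetical simplifications, and the authors' actual algorithm also accounts for in-vehicle and connection times.

```python
# Waiting time (in minutes) by transport mode, as reported in the text.
WAITING_TIME = {"metro": 2, "tram": 2, "rer": 4, "train": 6}

def total_waiting_time(modes):
    """Sum waiting times over a route, skipping the first ride (no initial wait)."""
    return sum(WAITING_TIME[mode] for mode in modes[1:])

# Asymmetry: the same two rides give different waiting times in each direction.
print(total_waiting_time(["rer", "metro"]))  # 2
print(total_waiting_time(["metro", "rer"]))  # 4

# Number of directed and undirected station pairs for 809 stations.
n_stations = 809
ordered_pairs = n_stations * (n_stations - 1)
unordered_pairs = ordered_pairs // 2
print(ordered_pairs, unordered_pairs)  # 653672 326836
```

Since only the rides after the first one carry a waiting time, reversing a multi-modal route changes which modes are penalized, which is one reason why travel times are not symmetric.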
4.2.2 Geographic information on French municipalities
To estimate geographical distances and conduct some point-in-polygon queries, we relied on
GEOFLA® Communes (Version 1.1). This is a SpatialPolygonsDataFrame on French
municipalities, available on the website of the Institut National de l'information Géographique
et forestière (IGN). The dataset contains information on the centroids and the extent of
each municipality. The coordinate reference system is RGF93 Lambert 93.
4.2.3 Highways
Spatial data on highways and highway ramps in the Paris region can be found on the website
of the Institut Paris Region (IAU). For our purposes, we used a SpatialLinesDataFrame on
highways and express roads, and some SpatialPointsDataFrames representing highway
ramps in the years 1975, 1982, 1990, 1999 and 2010. All these spatial objects have the same
projection (RGF93 Lambert 93). Inter alia, the associated data contain information on the
opening years of each highway section.
4.2.4 Socio-economic characteristics of the population
Data on population characteristics come from the French census. Census data are collected by
the French statistical institute (INSEE), which processes them in two phases. The first phase,
or “principal processing”, covers all of the bulletins collected. The second phase, or
“complementary processing”, provides additional information on socio-professional
categories, economic sectors of activity and education levels; this phase only concerns a
sample of the collected questionnaires (INSEE, 2019). The sample covers 25% of households
in municipalities of fewer than 10,000 inhabitants, and around 40% of households[22] (i.e. 100%
of the forms collected) in municipalities of 10,000 inhabitants or more. Therefore, the
precision of the estimates depends both on the type of municipality (fewer than 10,000
inhabitants vs. 10,000 inhabitants or more) and on the type of processing (principal vs.
complementary).
In our study, we relied on the information derived from the complementary processing of
census data from 1975 to 2011.[23] These data were harmonized by the INSEE so as to be
comparable over time[24] and contain information about the educational level, the sector of
activity and the socio-professional category of residents aged 25-54. The educational level
corresponds to the highest degree obtained by the individual; it takes one of four values: (i)
“edu1”: middle school diploma (BEPC, brevet des collèges, DNB) or lower; (ii) “edu2”:
pre-baccalaureate vocational diploma (Diplôme de niveau CAP, BEP); (iii) “edu3”: high school
diploma (bac général, techno, pro); (iv) “edu4”: a university degree or a vocational
post-secondary degree (Diplôme d'études supérieures). The sector of activity refers to the
establishment where the individual works (or worked). There are four sectors: Agriculture
(AGR), Construction and Public Works (BTP - Bâtiment et travaux publics), Industry (IND)
and Services (TERT - including public services). Finally, the socio-professional category
depends on the profession, the position and the status (employee or self-employed) of the
individual. For the unemployed workers, the socio-professional category is determined with
reference to the main profession carried out in the past. Based on the French classification of
occupations (PCS, 2003), there are six socio-professional groups: (i) farmers; (ii) craftsmen,
tradesmen and entrepreneurs: business owners, either working alone or employing a small
number of workers, in a field where manual work is important (e.g. commerce and restaurant
services); (iii) executives and higher intellectual professions: scholars, doctors, lawyers,
engineers, public officials and other high-skilled workers; (iv) intermediate professions:
workers who hold an intermediate-level job (e.g. teachers, nurses and social workers); (v)
employees; (vi) workmen: skilled and unskilled laborers, drivers and other workers in
handling, storage and transport. We note that some categories[25] are less numerous than others,
implying that the precision of the estimates varies across professions.
[23] In detail, we used four datasets: (i) “Resident population by five-year age groups and sex”; (ii) “Active
population aged 25-54 by employment status”; (iii) “Employed and unemployed with past job experience, aged
25-54 and having completed their education, by socio-professional category and educational level”; (iv)
“Employed population, aged 25-54, by sector of activity” (translation mine).
[24] For example, in 2006 the Census survey was modified so as to cover mixed employment situations (e.g. students having
a "gig", or retirees who continue to work). These situations were not considered by previous censuses. So, to
ensure comparability of census data before and after 2006, the observations were restricted to the population
aged 25-54, where these cases are less frequent.
4.3 Definition and construction of the main variables in the dataset
4.3.1 Accessibility index
We define the accessibility of a municipality 𝑗 at time 𝑡 as the weighted sum of the inverse
average travel time from 𝑗 to destinations 𝑘 at time 𝑡, with weights equal to the initial
employment rate at destination:
$$\mathit{acc\_index}_{jt} = \sum_{k \neq j} \mathit{empl}_{k,1975} \, (\mathit{time}_{jkt})^{-1}$$
We used employment weights to attach greater importance to municipalities with larger labor
markets, while we used initial values to avoid endogeneity. The drawback of using initial
values in the presence of a long period of observation is that the initial employment situation
is likely to be profoundly different from the one observed at the end of the period. For
example, Garcia-López et al. (2017) identified the employment subcenters in the Paris
metropolitan area in 1968 and 2010 (where a subcenter is defined as an area with significantly
higher employment density than the nearby locations). Out of a total of 37 subcenters
identified, only 12 were present in both 1968 and 2010, while the remainder had either
disappeared, changed their extent, or were entirely new. Returning to our index, we used
three types of weights, thus generating three versions of the accessibility index: one based on
the (general) employment rate and two based on the employment rates specific to the
secondary and tertiary sectors. All
the employment rates were computed with respect to the resident population aged 25-54.
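As an illustration of the definition above, the following minimal Python sketch computes the index for a toy set of municipalities. The function name, the data layout (nested dictionaries) and all numbers are hypothetical; the thesis itself does not show its code, so this is only one possible way to express the formula.

```python
def accessibility_index(travel_time, empl_1975):
    """Return {j: sum over k != j of empl_1975[k] / travel_time[j][k]}.

    travel_time[j][k] is the (directional) travel time in minutes from
    municipality j to municipality k in a given year; empl_1975[k] is the
    initial employment rate at destination k.
    """
    index = {}
    for j, row in travel_time.items():
        index[j] = sum(
            empl_1975[k] / minutes
            for k, minutes in row.items()
            if k != j
        )
    return index


# Toy data (hypothetical): three municipalities with asymmetric travel times.
travel_time = {
    "A": {"B": 10.0, "C": 20.0},
    "B": {"A": 12.0, "C": 5.0},
    "C": {"A": 20.0, "B": 4.0},
}
empl_1975 = {"A": 60.0, "B": 40.0, "C": 50.0}

acc = accessibility_index(travel_time, empl_1975)
print(acc)
```

Holding the weights at their 1975 values means that changes in the index over time can only come from changes in travel times, which is the point of using initial values.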
The first step in building the accessibility index was to aggregate travel time data at the
municipal level.[26] In this regard, stations located at the border between two or more
municipalities were treated as belonging to multiple municipalities.[27] For every year 𝑡, given
two municipalities 𝑗 and 𝑘, we computed the travel time from 𝑗 to 𝑘 as the average travel time
from $s_j$ to $s_k$, where $s_j$ and $s_k$ are generic stations in 𝑗 and 𝑘 respectively. Clearly, it was
not possible to compute the travel time between municipalities with no stations. In these cases,
we replaced missing values with approximate travel times by bus, i.e. the inter-centroid
distances divided by the average bus speed, converted to minutes. The inter-centroid distances
were calculated from the Communes shapefile using the Euclidean metric. We did not use
the great-circle distance because the shapefile projection (RGF93 Lambert 93) was
not compatible with the rdist.earth() function. However, since the Paris region covers only
12,011 km², we believe that Euclidean distances provide a good approximation of true
geographical distances.
[26] Note that only 344 out of 1287 municipalities in Île-de-France are endowed with stations.
[27] These stations are: Gare de Gennevilliers, Charles de Gaulle - Étoile, Porte de Vanves, Duroc, Montparnasse
- Bienvenüe, Gare de Paris-Saint-Lazare, Nation, Gare de Châtelet - Les Halles, Bastille, Porte de Vincennes,
Gare du Raincy - Villemomble - Montfermeil, République.
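The aggregation step and the bus fallback can be sketched as follows. This is a minimal Python illustration under stated assumptions: the function names and the dictionary layout are hypothetical, centroid coordinates are assumed to be expressed in metres (as in a projected system such as Lambert 93), and the default speed of 10 km/h is just one of the four speeds discussed in the text.

```python
import math
from statistics import mean


def municipal_travel_time(station_times, stations_j, stations_k):
    """Average station-to-station time (minutes) from municipality j to k.

    station_times maps (origin_station, destination_station) to minutes.
    Returns None when no station pair is available.
    """
    pairs = [
        station_times[(s_j, s_k)]
        for s_j in stations_j
        for s_k in stations_k
        if (s_j, s_k) in station_times
    ]
    return mean(pairs) if pairs else None


def bus_fallback_minutes(centroid_j, centroid_k, speed_kmh=10.0):
    """Approximate bus time from the Euclidean inter-centroid distance.

    Centroids are (x, y) pairs in metres (assumption: projected coordinates).
    """
    distance_km = math.dist(centroid_j, centroid_k) / 1000.0
    return distance_km / speed_kmh * 60.0


# Toy data (hypothetical station identifiers and times).
station_times = {("s1", "s3"): 10.0, ("s2", "s3"): 20.0}
by_rail = municipal_travel_time(station_times, ["s1", "s2"], ["s3"])
no_rail = municipal_travel_time(station_times, [], ["s3"])
by_bus = bus_fallback_minutes((0.0, 0.0), (3000.0, 4000.0))
print(by_rail, no_rail, by_bus)
```

In the toy example the two centroids are 5 km apart, so the fallback at 10 km/h yields a 30-minute trip.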
As for bus speeds, a short digression is needed. We built four distinct versions of the
accessibility index, based on four plausible bus speeds: 8, 10, 15 and 20 km/h.[28] These speed
values were chosen because they are often reported in official documents on the state of public
transport in the Paris region. Given the fragmented nature of the documentary sources, we
tried to adhere as closely as possible to 2013 speeds, so as to remain consistent
with the use of 2013 train speeds in the travel time computation. Our main reference is a study
from the Centre d’études sur les réseaux, les transports, l’urbanisme et les constructions
publiques (CERTU, 2004), which is also cited in another official report (CEREMA, 2018): it
estimates that a bus travels at 10-20 km/h during peak hours, with frequent slowdowns in
high-traffic areas. Other studies provide similar results: the 2014 Observatoire de la Mobilité
(IFOP, 2014) reports that “on a dedicated lane, a bus travels at 22 km/h, while it does
not exceed 17 km/h in automobile traffic”; the Centre national de la fonction publique
territoriale[29] writes that the commercial speed of a bus is generally between 12 and 16 km/h
in dense urban areas, and can rise to 18 km/h in suburban areas. Finally, the City of
Paris reports that the average driving speed observed in Paris in 2013 was 15.3 km/h. This
provides an “upper bound” for the speed of buses (which are generally slower than cars). As for
the “lower bound” in our dataset (8 km/h), a study carried out in 2016 reveals that for
43 bus lines in Paris, commercial speeds were far below 10 km/h (Trans-Missions and TTK,
2016). Similarly, in Les comptes des transports en 2013 (CGDD, 2015), the average speed of
public transport modes (including faster ones such as RER and trains) in Paris and
contiguous departments is estimated at 8.81 km/h.[30] How, then, should we choose among all
these values? When running regressions, we will focus on 𝑎𝑐𝑐_𝑖𝑛𝑑𝑒𝑥10, i.e. the version of
accessibility based on a speed of 10 km/h. We chose this value for three reasons:
(i) based on the official documents, it seems the most likely estimate, (ii) it is neither too
pessimistic nor too optimistic, and (iii) it is mentioned as the lower bound for bus speed during
peak hours (when people either go to work or return home from work); this last point is
particularly important, as we are interested in commuting patterns. To conclude, we note that
there is one type of bus which is faster than the others: the Bus à Haut Niveau de Service
(BHNS), characterized by dedicated lanes, priority at intersections and greater interstation
distances. To date, there are four BHNS in the Paris region: the Trans-Val-de-Marne (TVM),
opened between 1993 and 2007; line 91-06, opened in 2009; and the T-Zen 1 and line 393,
both opened in 2011.[31] These buses have an average commercial speed of 23-48 km/h.[32]
Therefore, in the presence of a BHNS connection between municipalities, assuming
a speed of 10 km/h would be misleading. We verified that the municipalities connected via
BHNS are all endowed with stations, implying that we do not need to account for this higher
speed: travel time data are available for them and have been computed with respect to an even
higher speed (about 60 km/h).[33]
[28] This brings the number of versions of the accessibility index to 12 (3 weights x 4 speeds).
[29] Source: https://www.wikiterritorial.cnfpt.fr/xwiki/wiki/encyclopedie/view/Mots-Cles/Vitessecommerciale, retrieved on 01/04/2020.
[30] More recently, the IdFM has complained about a drop of commercial bus speeds to 8 km/h in Paris
intra-muros (http://www.leparisien.fr/info-paris-ile-de-france-oise/transports/grand-paris-des-bus-la-region-presse-la-capitale-d-accelerer-ses-travaux-20-06-2018-7784217.php, retrieved on 01/04/2020).
4.3.2 Entropies
Given a categorical variable with 𝑀 groups, the entropy index (or “entropy score”) is the
weighted sum of the proportions of units belonging to each group ($\pi_m$), where the weights are the
natural logs of the inverse proportions:
$$E = \sum_{m=1}^{M} \pi_m \cdot \log(1/\pi_m)$$
Following Reardon and Firebaugh (2002), we treated zero proportions as follows:
$$0 \cdot \log(1/0) = \lim_{\pi \to 0} \pi \cdot \log(1/\pi) = 0$$
We computed one entropy for each socio-economic variable, thus obtaining an Education,
a Sector and a Category entropy. The Category entropy is the only one with 𝑀 = 6, while the
others have 𝑀 = 4. Note that the Education and Category entropies are defined on a different
population than the Sector entropy: the former are computed with respect to the population
aged 25-54, employed or unemployed and having completed their studies, while the
latter refers to the employed population aged 25-54.
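The entropy formula, together with the convention for zero proportions, can be written as a short Python function. This is a minimal sketch: the function name and the input format (a list of group counts) are assumptions made for illustration.

```python
import math


def entropy(counts):
    """Entropy E = sum_m pi_m * log(1 / pi_m) computed from group counts.

    Groups with a zero proportion contribute 0, following the limit
    convention 0 * log(1/0) = 0.
    """
    total = sum(counts)
    if total <= 0:
        raise ValueError("counts must sum to a positive number")
    value = 0.0
    for count in counts:
        if count > 0:
            share = count / total
            value += share * math.log(1.0 / share)
    return value


# Perfect dispersion across M = 4 groups gives the maximum, log(4);
# full concentration in one group gives 0.
even_four = entropy([25, 25, 25, 25])
concentrated = entropy([10, 0, 0, 0])
even_six = entropy([1, 1, 1, 1, 1, 1])
print(even_four, concentrated, even_six)
```

Since the maximum is log(𝑀), an entropy defined on six groups can reach a higher value than one defined on four, which matters when comparing the Category entropy with the other two.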
[31] See: https://www.data.gouv.fr/fr/datasets/lignes-de-bus-a-haut-niveau-de-service-bhns/, retrieved on 01/04/2020.
[32] Sources: Trans-Missions and TTK (2016), p.16; Délibération n.2010/0113, Syndicat des Transports
d’Ile-de-France, p.32; wikipedia.fr (https://fr.wikipedia.org/wiki/Ligne_de_bus_RATP_393,
https://fr.wikipedia.org/wiki/T_Zen; both retrieved on 01/04/2020).
[33] Source: CGDD (2015), p.26.
4.3.3 Percentages of residents by education, sector and socio-professional category
To better understand the impact of transport improvements on population structure, we built
some group-specific outcome variables. These are the rates associated with each of the
possible values of the education, sector and category variables: 𝑒𝑑𝑢1_𝑅𝐴𝑇𝐸, 𝑒𝑑𝑢2_𝑅𝐴𝑇𝐸,
𝑒𝑑𝑢3_𝑅𝐴𝑇𝐸 and 𝑒𝑑𝑢4_𝑅𝐴𝑇𝐸; 𝐴𝐺𝑅_𝑅𝐴𝑇𝐸, 𝐵𝑇𝑃_𝑅𝐴𝑇𝐸, 𝐼𝑁𝐷_𝑅𝐴𝑇𝐸 and 𝑇𝐸𝑅𝑇_𝑅𝐴𝑇𝐸;
𝑓𝑎𝑟𝑚𝑒𝑟𝑠_𝑅𝐴𝑇𝐸, 𝑎𝑟𝑡_𝑅𝐴𝑇𝐸,[34] 𝑖𝑛𝑡𝑒𝑙𝑙_𝑅𝐴𝑇𝐸, 𝑖𝑛𝑡𝑒𝑟𝑚_𝑅𝐴𝑇𝐸, 𝑒𝑚𝑝𝑙𝑜𝑦𝑒𝑒_𝑅𝐴𝑇𝐸 and
𝑜𝑢𝑣𝑟_𝑅𝐴𝑇𝐸. All these percentages were computed on the resident population aged 25-54,
i.e., the same reference population as for the weights used in the accessibility index.
4.3.4 Population density
This is the number of residents per square kilometer. We used data on the total resident
population (not just the 25-54 age cohort). Data on population come from the INSEE, while
those on areas come from the IGN.
4.3.5 Highway accessibility index
We used the spatial data on highways and highway ramps to compute a highway-specific
accessibility index:
$$\mathit{H\_acc\_index}_{jt} = \sum_{k \neq j} \mathit{empl}_{k,1975} \, (\mathit{H\_time}_{jkt})^{-1}$$
where 𝑗 and 𝑘 are municipalities, $\mathit{empl}_{k,1975}$ are the usual (general and sector-specific)
employment weights, and $\mathit{H\_time}_{jkt}$ is the average travel time by highway from 𝑗 to 𝑘 at
time 𝑡.
To compute the journey time, we used the average speeds reported in V-Traffic (2015), a
study on the state of road traffic in Île-de-France in 2014. It provides detailed
information on the average speed attained during peak hours (7.00-10.00 a.m., 4.30-7.30 p.m.)
on the Boulevard périphérique and on 9 express roads leading to Paris (the so-called Grands axes).[35]
The average speed observed on the Boulevard périphérique is 37.75 km/h, while on the
Grands axes it is 80.25 km/h. Note that we refer to 2014 values for consistency with the speeds
used for travel time computation (based on 2013 timetables).
[34] Or “𝑎𝑟𝑡_𝑐𝑜𝑚𝑚_𝑐ℎ𝑒𝑓_𝑅𝐴𝑇𝐸”; this is the percentage of business owners.
[35] The Boulevard périphérique is the ring road bordering the city of Paris. The Grands axes are the roads A1,
A3, A4, A6a, A6b, A13, A15, A14 and N118. Note that the highways data contain information on 32 additional
roads, so we are assuming that the speeds observed on the Grands axes are a good approximation for the speeds
on these further roads.
Once we defined the speeds, we used the information on the opening years of road sections to
derive a highway network for every year of observation (1975, 1982, 1990, 1999, 2006, 2011).
As for ramps, we lacked data for 2006 and 2011. Since highways did not change after 2006,
we assumed the ramps observed in 2010 to be the same as the ramps in 2006 and 2011.
For every year, we integrated points (ramps) into networks (highways) by using the function
points2network() in the shp2graph package. Integration was made possible by the fact that all
the shapefiles had the same projection. Since ramp points almost overlapped with
highways, we mapped each ramp to the nearest point on the network, adding that point as a
node if it was not one already.[36] As part of this process, we computed the length (in kilometers)
of each edge of the newly generated network.[37] We converted edge lengths into travel times
(expressed in minutes) by using a different speed according to whether edges belonged to the
Boulevard périphérique (37.75 km/h) or not (80.25 km/h).
Then, we identified the municipalities where ramps were located by means of a point-in-polygon
query. We converted the network to an igraph graph and assigned to each edge a weight
equal to the associated travel time. We used Dijkstra's algorithm to compute the length of
the shortest paths between any two ramps in the network. The algorithm returned two
distances for every pair of ramps, one per direction. However, since the graph was
undirected, these distances were symmetric.[38] We restricted the observations by removing
distances between ramps falling within the same municipality; then, we aggregated travel
times at the municipal level. Note that the municipalities connected via highways are about
5%-10% of the total, and their exact number varies from year to year.[39] To make
accessibility indices comparable over time, we restricted the sample to the 65 municipalities
that are connected via highways throughout the period of observation (about 5.05%). Finally,
we derived the highway accessibility index, i.e., for each municipality 𝑗 and each year of
observation 𝑡, we computed the weighted sum of the inverse travel time from 𝑗 to any other
municipality 𝑘 at time 𝑡.
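The conversion of edge lengths into minutes and the shortest-path step can be sketched in plain Python. This is only an illustration under stated assumptions: the thesis relied on the R packages shp2graph and igraph, whereas the code below uses a hand-written Dijkstra routine on a toy network; the function names, the edge format and the ramp identifiers are all hypothetical. Only the two speeds (37.75 and 80.25 km/h) come from the text.

```python
import heapq

# Peak-hour speeds (km/h): on the Boulevard périphérique (True) or not (False).
SPEED_KMH = {True: 37.75, False: 80.25}


def edge_minutes(length_km, on_peripherique):
    """Convert an edge length (km) into travel time (minutes)."""
    return length_km / SPEED_KMH[on_peripherique] * 60.0


def build_graph(edges):
    """Build an undirected adjacency list from (u, v, km, on_peripherique)."""
    graph = {}
    for u, v, km, on_bp in edges:
        minutes = edge_minutes(km, on_bp)
        graph.setdefault(u, []).append((v, minutes))
        graph.setdefault(v, []).append((u, minutes))
    return graph


def dijkstra(graph, source):
    """Shortest travel time (minutes) from source to every reachable node."""
    dist = {source: 0.0}
    queue = [(0.0, source)]
    while queue:
        d, node = heapq.heappop(queue)
        if d > dist.get(node, float("inf")):
            continue  # stale queue entry, a shorter path was already found
        for neighbour, minutes in graph.get(node, []):
            candidate = d + minutes
            if candidate < dist.get(neighbour, float("inf")):
                dist[neighbour] = candidate
                heapq.heappush(queue, (candidate, neighbour))
    return dist


# Toy network (hypothetical): three ramps r1, r2 and r3.
edges = [
    ("r1", "r2", 80.25, False),  # 60 minutes off the périphérique
    ("r2", "r3", 37.75, True),   # 60 minutes on the périphérique
    ("r1", "r3", 321.0, False),  # 240 minutes, slower than going via r2
]
graph = build_graph(edges)
times_from_r1 = dijkstra(graph, "r1")
times_from_r3 = dijkstra(graph, "r3")
print(times_from_r1["r3"], times_from_r3["r1"])
```

Because every edge is inserted in both directions, the shortest time from r1 to r3 equals the one from r3 to r1, which mirrors the symmetry noted for the highway network.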
[36] The topology of the network was preserved throughout this process, meaning that no graphical simplifications
were made.
[37] We verified the correctness of the distances returned by the algorithm by checking Google Maps distances
for some randomly picked routes.
[38] This marks a difference from the (public transport) accessibility index, where travel times are not symmetric.
[39] The municipalities connected via highways number 69 in 1975, 87 in 1982, 99 in 1990, 120 in 1999 and 126 in
2006-2011.
4.3.6 Distance to Paris
We computed three versions of the distance from each municipality to Paris:
1. Border-to-Border distance: minimum distance between the municipality border and
the Paris border (distance is 0 for Paris arrondissements and for the municipalities
bordering on Paris)
2. Centroid-to-Border distance: minimum distance between the municipality centroid
and the Paris border (distance was set to 0 for Paris arrondissements)
3. Centroid-to-Centroid distance: distance between the municipality centroid and the
centroid of the closest Paris arrondissement (distance is 0 for Paris arrondissements)
Distances were computed based on the Communes shapefile. This variable was conceived as
a control for our regressions: it was meant to enter an interaction term with the
accessibility index, so as to check whether the effects of accessibility varied
with the distance to Paris. However, we note in advance that we had to drop this variable
from the regressions due to severe multicollinearity.
4.3.7 Geographic accessibility index
This is one of the two instruments used for the accessibility index. The geo-accessibility index
is defined as the sum of the inverse inter-centroid distances:
$$\mathit{GEO\_acc\_index}_{j} = \sum_{k \neq j} (\mathit{distance}_{jkt})^{-1}$$
Since we only considered “stable” municipalities, this index is fixed over time.[40]
4.3.8 Three-group-method instruments
We built a 3-group method instrument for each of the potentially endogenous regressors, i.e.
the (public transport) accessibility index and the highway accessibility index. Given an
endogenous variable 𝑥, the 3-group method variable is:
$$IV(x) = \begin{cases} -1 & \text{if } \min(x) \le x < T_1(x) \\ 0 & \text{if } T_1(x) \le x < T_2(x) \\ 1 & \text{if } T_2(x) \le x \le T_3(x) \end{cases}$$
[40] Indeed, any change to the boundary of a municipality can affect the position of its centroid, hence the distance
from that municipality to the others. Since we removed such unstable municipalities, both the centroid positions
and the inter-centroid distances are fixed over time.
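The three-group instrument of Section 4.3.8 can be sketched in Python as follows. The mapping to -1, 0 and 1 follows the definition given there. How the cutoffs are chosen is not stated in this section, so the tercile-style helper below is purely an assumption made for illustration, as are the function names and the toy values.

```python
def three_group_iv(values, t1, t2):
    """Map each value of x to -1, 0 or 1 given two cutoffs t1 < t2.

    Values below t1 get -1, values in [t1, t2) get 0, values at or above
    t2 get 1.
    """
    if not t1 < t2:
        raise ValueError("cutoffs must satisfy t1 < t2")
    return [-1 if x < t1 else (0 if x < t2 else 1) for x in values]


def tercile_cutoffs(values):
    """Illustrative cutoffs (assumption): the sorted values found at the
    one-third and two-thirds positions."""
    ordered = sorted(values)
    n = len(ordered)
    return ordered[n // 3], ordered[(2 * n) // 3]


# Toy values (hypothetical) of an endogenous regressor x.
x = [5.0, 1.0, 9.0, 3.0, 7.0, 2.0]
t1, t2 = tercile_cutoffs(x)
iv = three_group_iv(x, t1, t2)
print(t1, t2, iv)
```

With these toy values each group contains two observations, so the instrument only retains the rank information of 𝑥 (low, middle, high) and discards its exact magnitude.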
5. Descriptive statistics
This section presents a descriptive analysis of our data. We first provide some
summary statistics and outline the evolution of the main variables in the dataset. Then,
we present the relationships between the variables by means of scatterplots and correlation
coefficients.
5.1 Summary statistics
The dataset contains 7722 observations, corresponding to 1287 municipalities observed over
6 years. There are no missing values except for those in the highway accessibility index, which
was computed for 65 municipalities only.[41] All the entropies and rates take the value 0 for at
least some observations. These zero-valued observations would be discarded when estimating
the log-linear regressions; to keep them, we transformed each variable by
replacing 0s with the smallest strictly positive value divided by 10.[42] According to the statistics
in Table 4, municipalities are on average very small (9.3 km²) and lie at most 80-86
km from Paris. Entropies attain their maximum at 𝑙𝑜𝑔(𝑚), where 𝑚 is the number of groups
entering the definition of each entropy.[43] Since 𝑙𝑜𝑔(𝑚) is the value corresponding to perfect
dispersion, this means that there are municipalities whose population is equally distributed
among all the educational, sector or socio-professional groups. On average, the Category and
Education entropies are quite high (about 80% of their maximum value), while the Sector
entropy is not (about 58% of its maximum). This is because workers are largely concentrated
in the tertiary sector (67%), while they are fairly evenly distributed among the educational
and socio-professional groups. In this regard, the exceptions are farmers (3%) and business
owners (7%), who represent only a small percentage of the total.[44] Given that the Category
entropy was defined with respect to a higher number of groups than the Education and Sector
entropies, we would have expected the former to have a higher variability than the latter, but
in fact all the entropies have similar standard errors. This implies that the coefficients of the
accessibility index in the regression models for the Education, Category and Sector entropies
are directly comparable. For some rates, the maximum value is greater than 100 (ranging from
103 to 110).[45] This occurs because the values at the numerator and denominator of each rate
are estimates (indeed, they are the result of the complementary processing of Census data).
So, for very high rates, even slight deviations of the estimates from the true values can
cause the resulting rate to exceed 100%. Finally, to get an idea of the scale of the
accessibility index, note that 𝑎𝑐𝑐_𝑖𝑛𝑑𝑒𝑥10 ranges from 250 to 1090, while
𝑎𝑐𝑐_𝑖𝑛𝑑𝑒𝑥10_𝐼𝑁𝐷 and 𝑎𝑐𝑐_𝑖𝑛𝑑𝑒𝑥10_𝑇𝐸𝑅𝑇 cover smaller intervals. These differences in
scale are due to the fact that the “general” accessibility index has larger weights than the
sector-specific ones.
[41] I.e. the municipalities which have always been connected via highways between 1975 and 2011.
[42] In Tables 2 and 4, the original entropies and rates are identified by the suffix “_0”. Note that the descriptive
statistics were computed with respect to these original versions.
[43] Where 𝑚 = 6 for the Category entropy and 4 for the Sector and Education entropies.
[44] NB: these percentages are the average values of the shares used in entropy computation.
[45] 𝑒𝑑𝑢1_𝑅𝐴𝑇𝐸, 𝑒𝑑𝑢2_𝑅𝐴𝑇𝐸, 𝑒𝑑𝑢3_𝑅𝐴𝑇𝐸, 𝑖𝑛𝑡𝑒𝑟𝑚_𝑅𝐴𝑇𝐸 and 𝑇𝐸𝑅𝑇_𝑅𝐴𝑇𝐸.
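The zero-replacement rule described in Section 5.1 (replacing 0s with the smallest strictly positive value divided by 10, so that logs can be taken) can be sketched as follows. The function name and the toy values are hypothetical, and the sketch assumes the rule is applied to each variable separately.

```python
import math


def replace_zeros(values):
    """Replace zeros with (smallest strictly positive value) / 10."""
    positives = [v for v in values if v > 0]
    if not positives:
        raise ValueError("at least one strictly positive value is required")
    floor = min(positives) / 10.0
    return [floor if v == 0 else v for v in values]


# Toy variable (hypothetical) containing one zero.
raw = [0.0, 0.5, 2.0]
transformed = replace_zeros(raw)
logs = [math.log(v) for v in transformed]  # now defined for every observation
print(transformed)
```

The replacement value depends on the smallest positive observation of each variable, so it preserves the ordering of observations while keeping the former zeros below every other value.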
5.2 Evolution of the variables over time
The maps in Figures 1 and 2 show the distribution of the accessibility index in the initial and
final years of the period of observation (1975-2011). Despite the evident change in the
accessibility of the municipalities that were gradually connected to the transport network,
average accessibility increased only very slightly between 1975 and 2011 (Figure 6.a). The
same holds for the highway accessibility index, whose evolution is displayed in Figures 3, 4
and 6.b. Entropies followed distinct trends: the Sector entropy decreased, the
Education entropy increased, and the Category entropy increased between 1975 and
1982 and remained stable thereafter (Figure 7.a). It is apparent from Figure 7.c that the Sector
entropy diminished because the tertiary sector expanded to the detriment of the others.
In particular, the downward trends of 𝐼𝑁𝐷_𝑅𝐴𝑇𝐸 and 𝐴𝐺𝑅_𝑅𝐴𝑇𝐸 reflect the decline in the
shares of workmen and farmers (Figure 7.d). Conversely, the shares of employees,
intermediate workers and intellectual professions increased over time. The increase in
the intellectual professions was accompanied by a substantial rise in the percentage of
graduates, as well as by a large drop in the percentage of low-educated residents (Figure 7.b). In this
regard, we remark that the educational composition of the population changed
dramatically between 1975 and 2011: in 1975, about 50% of the population had a
middle-school diploma or lower, while only 10% had some post-secondary diploma; by 2011, the
situation had reversed: the high-educated had become the largest group in the population
(35%), while the low-educated had become the smallest one (15%). Of course, this evolution
is part of a historical process of gradual improvement in the average level of education. When
we break down these trends based on the average change in the accessibility index, we find
two different dynamics, depending on whether accessibility has increased or has remained
constant. Table 5 shows the average percentage change of each variable between 1975 and
2011 for two groups of municipalities: the treated (i.e. the municipalities whose accessibility
has improved over time) and the untreated (i.e. those whose accessibility has remained
constant). We see that the magnitude of changes differs across these groups: whenever a
variable increases, it increases to a larger extent among the untreated; vice versa, whenever a
variable decreases, it decreases to a larger extent among the treated. This means, for example,
that the percentage of graduates has increased more in the absence of public transport
improvements; or that the percentage of farmers has fallen to a lesser extent in their presence.
This sounds strange, as we would have expected the expansion of the public transport network
to bring about an improvement in socio-economic conditions. However, there may be an
explanation. As shown in the table, population density increased by 39.7% in the control
group, while it only increased by 8.8% in the treatment group.[46] This seems due to
differences in the initial population levels: despite their similar extent (about 9 km²),
untreated municipalities had about 1,900 residents each in 1975, against an average of 23,260 for
the treated. Assuming that population density approximates economic wealth, the (poorer)
untreated municipalities may have grown faster via catch-up. An alternative explanation for
the aforementioned differences is that the RER connected some under-developed areas to
Paris so as to favor their economic (and urban) development, but these areas nonetheless
continued to lag behind. Of course, only one of the two can hold: either
untreated municipalities were poor and have converged to the rich, or treated municipalities
were not rich at all and have remained poor. To verify these claims, we would need data on
GDP per capita or some alternative measure of wealth. This would also be a valuable control
for our regression analysis. Unfortunately, these data are not available for the period
considered.
5.3 Scatterplots and correlation coefficients
We now comment on the relationships for which Pearson’s correlation
coefficient satisfies |𝑟| > 0.10. Accordingly, we have not included plots for correlations
between -0.1 and 0.1.[47] The accessibility index (𝑎𝑐𝑐_𝑖𝑛𝑑𝑒𝑥10) is positively correlated with
the Education entropy (0.16) and 𝑒𝑑𝑢4_𝑅𝐴𝑇𝐸 (0.24), while it is negatively correlated with
𝑒𝑑𝑢2_𝑅𝐴𝑇𝐸 (-0.22). These correlations are weak, as can be clearly seen from Figures 9.a –
9.c. Some moderate relationships are found between accessibility and the Sector entropy
(-0.35) and between accessibility and 𝐴𝐺𝑅_𝑅𝐴𝑇𝐸 (-0.35). There is also a mild positive
association between accessibility and 𝑇𝐸𝑅𝑇_𝑅𝐴𝑇𝐸 (0.26), while the correlations between
accessibility and the other sector rates are negative and weak (see Figures 9.d – 9.h). The
46