
The methodology for demographic estimation in small priority areas
Case study: Demographic estimation in small disadvantaged priority area in France


Academic year: 2021



Author:

Tran Thi Thanh Huong

Supervisor:

Prof. Monica Pratesi


Declaration of Authorship

I, Tran Thi Thanh Huong, declare that this thesis titled “The methodology for demographic estimation in small priority areas. Case study: Demographic estimation in small disadvantaged priority area in France” and the work presented in it are my own. I confirm that:

□ This work was done wholly or mainly while in candidature for the master's degree of this University.

□ Where any part of this thesis has previously been submitted for a degree or any other qualification at this University or any other institution, this has been clearly stated.

□ Where I have consulted the published work of others, this is always clearly attributed.

□ Where I have quoted from the work of others, the source is always given. With the exception of such quotations, this thesis is entirely my own work.

□ I have acknowledged all main sources of help.

Signed:


UNIVERSITY OF PISA

Abstract

European Master in Official Statistics

Department of Economics and Management

by Tran Thi Thanh Huong

In order for local communities (cities, municipalities) to carry out their duties and make decisions, they need to mobilize information on small sub-municipality-level areas. This is especially the case for disadvantaged priority areas in France.

Estimation methods for some types of infra-communal areas were developed in 2010; they depend on the number of dwellings sampled in the area. These methods must now be reviewed for several reasons: the demand for information on small areas has increased, the supply of auxiliary geo-located data has expanded, and geo-localization of population census data has been greatly developed.

This research proposes a method to estimate the population, and some of its characteristics, for areas that are essentially infra-city communities.

Depending on the sample size in these domains, the research applies direct estimation methods, indirect estimation methods or small area estimation methods. Auxiliary information from financial baselines or administrative data is taken into account to improve the estimation.

In order to assess the statistical quality of the estimates produced, the research considers the variance and the coefficient of variation of each demographic estimate.


Table of contents

List of Tables
List of Abbreviations
Chapter 1: Introduction
1.1 Problem statement
1.2 Research objective
1.3 Organization of this report
Chapter 2: Literature review
2.1 Balanced Sampling - Cube method
2.1.1 Balanced Sampling
2.1.1 Cube method
2.2 Small area estimation
2.2.1 Horvitz-Thompson estimation
2.2.2 Ratio Estimation
2.2.3 Calibration Estimation
2.2.4 Generalized regression estimators - GREG
2.2.5 Composite Estimation
2.2.6 FH-EBLUP
Chapter 3: The Data descriptions
3.1 French census overview
3.1.1 Data collection
3.1.2 Principles of the official population counts
3.1.3 Addresses register used for French census sampling in large municipalities
3.3 Balance sampling design in large municipalities
3.4 Auxiliary information data
Chapter 4: The new priority neighborhood (QPV)
4.1 Definition
4.2 Characteristics
4.3 Typologies of QPV
4.4 Demographic estimation
Chapter 5: The estimation methodologies apply in QPV using French census
5.1 Direct Estimation - DSAU
5.3 Ratio Estimation
5.4 Calibration Estimation
5.5 Composite Estimation
5.6 Small Area Estimation - The Fay Herriot Model
Chapter 6: Writing and Analytical calculation of Variance
6.1 Statistics significant of total estimator
6.2 Coefficient of Variation for demographic estimation
Chapter 7: The final result and conclusion


List of Tables

Table 1: New priority geography of urban policy in metropolitan France and the overseas departments
Table 2: Characteristics of priority neighborhood in metropolitan France
Table 3: Children living in Priority Neighborhoods
Table 4: Education (lower secondary school: 11-14 years) – lower pass rates in Brevet des collèges for Priority Neighborhood pupils
Table 5: The unemployment in the Priority Neighborhoods
Table 6: The labour market situation of men and women aged 30-49, in and out priority neighborhood
Table 7: Living environment and security
Table 8: Proportion of foreigner

List of Abbreviations

INSEE        National Institute of Statistics and Economic Studies
EPCI         Public Bodies for Inter-municipal Cooperation
NOTRe Act    New territorial organization of the Republic
IRIS         Population census zones
QPV          Quartiers prioritaires de la Politique de la Ville (disadvantaged priority neighborhood)
DSAU         Division Statistics and Urban Analysis
MSE          Mean square error
HT           Horvitz-Thompson
OLS          Ordinary least squares
BLUP         Best linear unbiased predictor
EBLUP        Empirical best linear unbiased predictor
ONPV         National Observatory of Urban Policy
CV           Coefficient of variation


Chapter 1 : Introduction

Estimating population indicators is a core mission of any national statistical office, and of INSEE, the French National Institute of Statistics and Economic Studies, in particular. This information is produced for those who need it, such as the government, policy-makers, companies and the general public. Depending on the purpose of the project, the data have to be collected, pre-processed and used for estimation so that the results are accurate and trustworthy.

1.1 Problem statement

The perimeters of cities are updated periodically to reflect new inter-communal bodies. In France, the perimeters of cities were defined in 2011 on the basis of the Public Bodies for Inter-municipal Cooperation (EPCI). Multiple territorial reforms were then conducted between 2012 and 2016, making the previous perimeters obsolete. The most significant of these, for the topic explored here, is the NOTRe Act (new territorial organization of the Republic), which in 2015 profoundly changed the boundaries of the inter-municipalities. All French municipalities are now covered by EPCIs with at least 15,000 inhabitants. However, density criteria were not taken into account when these new institutional objects were created, so the new EPCIs are just as likely to contain densely populated urban areas as uninhabited or very sparsely populated areas. Consequently, extensive reflection was initiated with a view to updating the perimeters of the cities in a manner both fitting and sustainable, calling into question the EPCI meshing system.

Ongoing exploratory work will determine whether it is possible and relevant to create sub-city districts for the 13 cities whose population exceeds 250,000 inhabitants. The municipalities composing each of these cities would then be established as level 1 sub-city districts.

In order for local communities (cities, municipalities) to carry out their tasks and make decisions, they need to collect and gather information on small sub-municipality-level areas. These areas do not necessarily correspond to the population census zones (IRIS in French), and they may depend on whether the city is large or small. Information of this kind cannot be taken directly from the published results of the population census. This is especially the case for the Quartiers prioritaires de la Politique de la Ville (QPV), the disadvantaged priority areas of a city.


DSAU aims to provide the public authorities with information at the scale of these neighborhoods, to facilitate their demographic, economic and social monitoring and the evaluation of the specific public policies carried out there. The ambition is therefore to be able to disseminate data from the population census (RP, recensement de la population) as well as data from other sources, at least at the level of these neighborhoods and, in the future, on tiles a few hundred meters wide. For this dissemination to be possible, the data relating to these areas must be sufficiently accurate, and it must be possible to estimate this accuracy.

Since 2004, the RP has no longer been exhaustive: it is carried out by sample survey in cities of more than 10,000 inhabitants. The number of observations in a targeted area may therefore be too small to obtain good direct estimators for the variables and subpopulations of interest. It is thus necessary to identify which data are precise enough to be published and for which QPV a finer analysis is possible.

Among the statistical indicators considered in this study to evaluate the accuracy of the estimators are the variance and the coefficient of variation. Small area estimation was also considered as part of my internship, as it seemed appropriate for exploiting this type of data.

1.2 Research Objectives

Estimation methods exist for some types of infra-communal areas. They use the French census, which surveys approximately 40% of the dwellings over five years in cities of more than 10,000 inhabitants. These estimation methods were developed in 2010 and depend on the number of dwellings sampled in the area. Today, these methods must be reviewed for several reasons: the demand for information on small areas has increased, the supply of auxiliary geolocated data has expanded, and geolocalization of population census data has been greatly developed.

The objective of the research is to propose a method to estimate the population, and some of its characteristics, for areas that are essentially infra-city communities. Depending on the housing sample size in these domains, this research project will explore direct estimation methods or small area methods. In order to improve the estimates, the research project will also use auxiliary information from financial baselines. Finally, the research project will propose and program methods to assess the statistical quality of the estimates produced.


1.3 Organization of this report

The report is organized as follows. Chapter 1 is the introduction. Chapter 2 is a literature review covering balanced sampling, the cube method, direct estimation (DSAU), generalized regression, ratio estimation, calibration estimation, indirect estimation, composite estimation and small area estimation. Chapter 3 describes the data: the French census, data collection and the address register used for French census sampling in large municipalities. Chapter 4 focuses on small disadvantaged priority areas in France, the typologies of these areas and auxiliary information. Chapter 5 applies the estimation methodologies to the QPV using the French census. Chapter 6 presents the analytical calculation of the variance. The final results and conclusions are given in Chapter 7.


Chapter 2: Literature review

2.1 Balanced Sampling

Although balanced sampling dates back to the early days of statistics, applying the concept has long been difficult because most of the proposed methods are either enumerative or rejective and therefore require significant computational time. The cube method, published in Tillé (2001) and Deville and Tillé (2004), selects samples that represent the population in reasonable computational time and can be regarded as a generalization of stratification methods. Its most important applications are the new French census and the French master sample.

Definition of balanced sample

Consider a population U of size N in which a vector of auxiliary variables $x_k = (x_{k1}, \ldots, x_{kj}, \ldots, x_{kp})$ is available for every unit $k \in U$.

A random sample S is selected with inclusion probability $\pi_k$ for every unit $k \in U$. The sampling design is said to be balanced on $x_k$ if the following balancing equations hold:

$$\sum_{k \in S} \frac{x_k}{\pi_k} = \sum_{k \in U} x_k \qquad (1)$$

Balanced sampling design

The problem is that the balancing equation (1) can rarely be exactly satisfied. Therefore, the cube method aims at selecting a sample that exactly satisfies the inclusion probabilities while remaining as balanced as possible.

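When $x_k = \pi_k$, balanced sampling amounts to fixed-size sampling, since the balancing equations given in (1) become

$$\sum_{k \in S} \frac{\pi_k}{\pi_k} = \sum_{k \in S} 1 = \sum_{k \in U} \pi_k$$

that is, the sample size equals the sum of the inclusion probabilities.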
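As an illustration, the balancing equations can be checked numerically for a given sample. The sketch below is not taken from the study: the function name `balance_deviation` and the toy data are hypothetical. It compares the expanded sample totals of the auxiliary variables with their population totals and returns the relative gap for each variable.

```python
import numpy as np

def balance_deviation(x, pik, sample):
    """Relative gap between expanded sample totals and population totals.

    x      : (N, p) array of auxiliary variables for the whole population
    pik    : (N,) array of inclusion probabilities
    sample : indices of the selected units
    (hypothetical helper, for illustration only)
    """
    x = np.asarray(x, dtype=float)
    pik = np.asarray(pik, dtype=float)
    sample = np.asarray(sample)
    # Left-hand side of the balancing equations: sum over S of x_k / pi_k
    expanded = (x[sample] / pik[sample][:, None]).sum(axis=0)
    # Right-hand side: sum over U of x_k
    totals = x.sum(axis=0)
    return (expanded - totals) / totals

# Toy population: balancing on x_k = pi_k alone fixes the sample size.
pik = np.full(4, 0.5)
x = pik.reshape(-1, 1)
dev_size_2 = balance_deviation(x, pik, [0, 3])     # sample size 2 = sum of pik
dev_size_3 = balance_deviation(x, pik, [0, 1, 3])  # sample size 3, not balanced
```

A deviation of zero means the sample is exactly balanced on that variable; in the toy data only the sample of size 2 is balanced.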


2.1.1 The Cube method

The cube method, designed by Jean-Claude Deville and Yves Tillé at the end of the 1990s and used for the first time on a large scale for the census, is a class of sampling algorithms that selects a balanced sample while satisfying the given inclusion probabilities.

This method is based on a geometric representation of the sampling design. Each sample s of U can be represented as a coordinate vector $(s_1, \ldots, s_k, \ldots, s_N)$, where $s_k = 1$ if $k \in s$ and $s_k = 0$ otherwise. Each vector s can thus be interpreted as a vertex of an N-cube. A sampling design with inclusion probabilities $\pi_k$ can then be defined as a probability distribution p(.) on the vertices of the N-cube such that:

$$\sum_{s} p(s)\, s = \pi, \qquad \pi = (\pi_1, \ldots, \pi_N)$$

A balanced sampling algorithm then corresponds to a "random walk allowing to reach a vertex of the N-cube from the vector π so that the balancing equations are satisfied". To do this, the cube method proceeds in two successive phases: a flight phase, which decides on the inclusion or exclusion of most of the units while exactly respecting the balancing constraints, and a landing phase, which decides on the remaining units, if any.
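The flight phase can be sketched in a few lines. The code below is a minimal illustration of the general idea, not the implementation used for the French census: the function name `flight_phase`, the tolerances and the toy data are assumptions, and the landing phase is omitted. At each step it moves the current vector along a direction in the null space of the balancing constraints until at least one coordinate reaches 0 or 1, choosing between the two possible moves so that the expected move is zero.

```python
import numpy as np

def flight_phase(x, pik, rng, eps=1e-9):
    """Flight phase sketch (hypothetical helper): random walk in the N-cube
    that keeps sum_k x_k * pi_k / pik_k constant."""
    x = np.asarray(x, dtype=float)
    pik0 = np.asarray(pik, dtype=float)
    a = (x / pik0[:, None]).T            # p x N matrix of balancing constraints
    pi = pik0.copy()
    while True:
        free = np.flatnonzero((pi > eps) & (pi < 1 - eps))  # undecided units
        if free.size == 0:
            break
        a_free = a[:, free]
        _, sing, vt = np.linalg.svd(a_free, full_matrices=True)
        rank = int(np.sum(sing > 1e-10))
        if rank >= free.size:            # no direction left in the null space
            break
        u = vt[-1]                       # unit vector with a_free @ u = 0
        pos, neg = u > eps, u < -eps
        cur = pi[free]
        # Largest steps that keep every coordinate inside [0, 1]
        lam1 = min(np.min((1 - cur[pos]) / u[pos], initial=np.inf),
                   np.min(cur[neg] / -u[neg], initial=np.inf))
        lam2 = min(np.min(cur[pos] / u[pos], initial=np.inf),
                   np.min((1 - cur[neg]) / -u[neg], initial=np.inf))
        # Martingale step: the expected move is zero, so E(pi) stays at pik
        if rng.random() < lam2 / (lam1 + lam2):
            pi[free] = cur + lam1 * u
        else:
            pi[free] = cur - lam2 * u
        pi[pi < eps] = 0.0               # snap coordinates that reached a face
        pi[pi > 1 - eps] = 1.0
    return pi

# Toy example: balance on the sample size (x = pik) and on a size variable.
rng = np.random.default_rng(0)
n_units = 30
pik = np.full(n_units, 0.2)
x = np.column_stack([pik, np.arange(1, n_units + 1, dtype=float)])
out = flight_phase(x, pik, rng)
```

With p balancing variables, at most p coordinates of `out` remain strictly between 0 and 1 after the flight phase; deciding on those units is the job of the landing phase.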

2.2 Small area estimation

Direct estimators use only data from the survey in the area of interest itself. The demand for reliable information at the scale of cities and functional urban areas has increased significantly. However, this growing demand for political and economic insight is typically met with a constant or even shrinking budget for data collection. Because of limits on the overall sample size, the sampling designs of social surveys are usually built to give reliable design-based estimates only at the state or region level. Functional cities and priority areas, in contrast, are often not incorporated in the sampling design and are therefore called unplanned areas. Because these areas are unplanned and their sample size is random and small, estimating the corresponding parameters of interest may be difficult.

Indirect estimators are all estimators that also use information from outside the area of interest. In this context, the methodologist tries to borrow strength from additional information, which can come from administrative sources. A model is then used to link the two data sources.


Design-based estimators are estimators whose inference is based on the probability distribution generated by the underlying sampling design; they mainly use the sampling weights to estimate the indicators. Examples are the Horvitz-Thompson estimator, the ratio estimator and the calibration estimator. The population is treated as fixed and finite, and design-based estimators are approximately unbiased. However, if the sample size is small, the variance of the estimate will be large.

Figure 1: A classification of estimation methods (diagram labels: Small Area Estimation; Direct estimation; Model assisted; Model based; Borrowing strength; Unit level model; Area level model; Horvitz-Thompson; Ratio; GREG; Calibration)

Model-assisted estimators are used to improve the accuracy of the estimator. Model-based estimators include additional auxiliary information through a model of the relationships across all areas, which may improve precision and stabilize the estimates. Model-based methods may be subject to bias, but their variance tends to be small in comparison with design-based approaches. Examples of model-based estimators are the synthetic estimator, the composite estimator and the EBLUP (empirical best linear unbiased predictor).

2.2.1 Horvitz - Thompson estimation

In the estimation for domains, the mean square error (MSE) is usually computed to evaluate the quality of a population parameter estimate and to obtain valid inferences. The estimation method and the sampling design determine the properties of the MSE and the sampling error.

Small areas or small domains are areas in which the sample is too small for a direct design-based estimate of sufficient precision.

A variety of estimators is usually considered for small area estimation. The basic Horvitz-Thompson estimator is the most natural estimator to use. It conforms to the sampling design, and the sampling weights incorporated in the estimation process account for possible stratification, clustering, and multi-phase or multi-stage sampling. However, additional auxiliary information is necessary to improve the estimators.

In survey sampling, direct estimation consists of building an estimator of y without using any information from outside the given area. In direct estimation, a unit therefore contributes only to its own area. The Horvitz-Thompson estimator is the basic design-based direct estimator of the domain total.

Let $t_y = \sum_{k \in U} y_k$ be the total of the quantitative variable y. The Horvitz-Thompson estimator, or π-estimator, of $t_y$ is defined by

$$\hat t_{y\pi} = \sum_{k \in S} \frac{y_k}{\pi_k}$$

The Horvitz-Thompson estimator gives accurate results when the sample is large. When the sample is small, its precision deteriorates and it becomes necessary to use auxiliary information.
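A minimal sketch of the Horvitz-Thompson estimator is given here for illustration. The function name `ht_total`, the optional domain indicator and the toy values are hypothetical; restricting the sum to the units of a domain gives the direct estimator of the domain total.

```python
import numpy as np

def ht_total(y, pik, in_domain=None):
    """Horvitz-Thompson estimate of a total from sampled values.

    y         : values observed on the sampled units
    pik       : inclusion probabilities of the sampled units
    in_domain : optional boolean mask selecting the units of a domain
    (hypothetical helper, for illustration only)
    """
    y = np.asarray(y, dtype=float)
    pik = np.asarray(pik, dtype=float)
    if in_domain is None:
        in_domain = np.ones(y.shape, dtype=bool)
    in_domain = np.asarray(in_domain, dtype=bool)
    # Each sampled unit is expanded by its weight 1 / pi_k
    return float(np.sum(np.where(in_domain, y / pik, 0.0)))

y = [10, 20, 30]
pik = [0.5, 0.25, 0.5]
total = ht_total(y, pik)                                        # 20 + 80 + 60
domain_total = ht_total(y, pik, in_domain=[True, False, True])  # 20 + 60
```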

2.2.2 Ratio Estimation

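Ratio estimation is a technique that uses available auxiliary information that is correlated with the variable of interest.

Suppose that a variable $x_k$ is correlated with the variable of interest $y_k$, that we have a paired random sample of n observations $(x_k, y_k)$ for k = 1, …, n, and that the population total $t_x$ of the auxiliary variable is known.

The ratio estimator of $t_y$ is

$$\hat t_{yR} = t_x \, \frac{\hat t_{y\pi}}{\hat t_{x\pi}}, \qquad \hat t_{y\pi} = \sum_{k \in S} \frac{y_k}{\pi_k}, \quad \hat t_{x\pi} = \sum_{k \in S} \frac{x_k}{\pi_k}$$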
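The ratio estimator can be sketched as follows. The function name `ratio_total` and the toy values are hypothetical; the known auxiliary total `tx` is assumed to come from an external file.

```python
import numpy as np

def ratio_total(y, x, pik, tx):
    """Ratio estimate of the total of y, using the known total tx of x.
    (hypothetical helper, for illustration only)"""
    y = np.asarray(y, dtype=float)
    x = np.asarray(x, dtype=float)
    pik = np.asarray(pik, dtype=float)
    ht_y = np.sum(y / pik)   # Horvitz-Thompson estimate of the total of y
    ht_x = np.sum(x / pik)   # Horvitz-Thompson estimate of the total of x
    # Rescale the known total of x by the estimated ratio y / x
    return float(tx * ht_y / ht_x)

estimate = ratio_total(y=[2, 4, 6], x=[1, 2, 3], pik=[0.5, 0.5, 0.5], tx=20.0)
```

In this toy sample y is exactly twice x, so the estimate is twice the known total of x.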

2.2.3 Calibration Estimation

Calibration estimation is regularly used in survey sampling to adjust the sampling weights so that certain estimators match known totals taken from auxiliary files.

Consider estimating the population total $t_y$ of y in each QPV, for a finite population of size N. Let S denote the index set of the sample obtained by a probability sampling scheme and let $y_k$ be observed in the sample. The direct, or Horvitz-Thompson (HT), estimator

$$\hat t_{y\pi} = \sum_{k \in S} d_k y_k$$

is unbiased for $t_y$, where $d_k = 1/\pi_k$ is the inverse of the first-order inclusion probability of unit k.

If an auxiliary vector $x_k$ is available for the sample and its population total $t_x$ is known, it is desirable to find new weights $\omega_k$ such that

$$\sum_{k \in S} \omega_k x_k = t_x$$

For example, the generalized regression (GREG) estimator

$$\hat t_{y,GREG} = \hat t_{y\pi} + (t_x - \hat t_{x\pi})^T \hat B$$

where

$$\hat B = \Big(\sum_{k \in S} d_k x_k x_k^T\Big)^{-1} \sum_{k \in S} d_k x_k y_k, \qquad \hat t_{x\pi} = \sum_{k \in S} d_k x_k$$

is calibrated on the total $t_x$.

The term calibration estimation was introduced by Deville and Särndal (1992) for a procedure that minimizes a distance measure between the initial weights and the final weights subject to the calibration constraints.
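As a sketch of how calibrated weights can be obtained, the code below uses the linear (chi-square distance) case, for which the solution is explicit and the calibrated estimator coincides with the GREG estimator. The function name `linear_calibration`, the simulated sample and the totals in `tx` are hypothetical and serve only as an illustration.

```python
import numpy as np

def linear_calibration(d, x, tx):
    """Calibrated weights under the chi-square distance (linear method).

    d  : (n,) initial weights 1 / pi_k
    x  : (n, p) auxiliary variables observed on the sample
    tx : (p,) known population totals of the auxiliary variables
    (hypothetical helper, for illustration only)
    """
    d = np.asarray(d, dtype=float)
    x = np.asarray(x, dtype=float)
    tx = np.asarray(tx, dtype=float)
    m = x.T @ (d[:, None] * x)                 # sum of d_k x_k x_k^T
    lam = np.linalg.solve(m, tx - x.T @ d)     # Lagrange multipliers
    return d * (1.0 + x @ lam)                 # w_k = d_k (1 + x_k^T lambda)

# Simulated sample (assumed data): an intercept and one auxiliary variable.
rng = np.random.default_rng(1)
n = 50
x = np.column_stack([np.ones(n), rng.uniform(1, 5, n)])
d = np.full(n, 4.0)
y = 3 + 2 * x[:, 1] + rng.normal(0, 1, n)
tx = np.array([210.0, 640.0])                  # assumed known totals

w = linear_calibration(d, x, tx)
greg_from_weights = w @ y
b_hat = np.linalg.solve(x.T @ (d[:, None] * x), x.T @ (d * y))
greg_formula = d @ y + (tx - x.T @ d) @ b_hat
```

The calibrated weights reproduce the known totals exactly, and the weighted total of y equals the regression form of the GREG estimator. Linear calibration can produce negative weights; bounded distance functions avoid this but require an iterative solution.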

2.2.4 Generalized regression estimators (GREG)

The ratio estimator provides an efficient estimate of the population mean if the regression of y, the variable under study, on x, the auxiliary variable, is linear and the regression line passes through the origin. If the regression of y on x is linear but the regression line does not pass through the origin, it is more appropriate to use the regression method of estimation than the ratio method. Since the regression coefficient β is generally not known, it is usually replaced by its least squares estimate from the sample

$$\hat\beta = \frac{\sum_{k \in S} (x_k - \bar x)(y_k - \bar y)}{\sum_{k \in S} (x_k - \bar x)^2}$$

where $\bar x$ and $\bar y$ are the sample means of x and y.

The regression estimate of the population mean is

$$\bar y_{reg} = \bar y + \hat\beta\,(\bar X - \bar x)$$

where $\bar X$ is the known population mean of x.

Taking the expected value of the simple regression estimator shows that it is biased by an amount equal to $-\mathrm{Cov}(\hat\beta, \bar x)$, the covariance between $\hat\beta$ and $\bar x$.

The regression estimator was proposed under a population regression model that postulates a relationship between the study variable y and the variables $x_i$ from the census. We assume that y and x are linearly related and that the totals of the $x_i$ are known from the auxiliary file, whereas the total of y is not directly available from the auxiliary file. We can still apply direct estimation (combining ratio estimation and linear regression). This direct estimation goes through 4 steps:

Step 1: Choose potential explanatory variables $x_i$.

The variables $x_i$ have to be present in both the census and the auxiliary file, and their totals must be known from the auxiliary file.

Checking the correlation between the study variable y and each potential explanatory variable x shows whether y and x are related.

Step 2: Run a linear regression using the survey data. This identifies the explanatory variables to retain.

Step 3: Calibration method

For each dwelling j inside the address k, we have:
Inclusion probability of the dwelling j: $\pi_j = \pi_k$
Sampling weight of the dwelling j: $d_j = d_k = 1/\pi_j$

Choose the vector $x_j$ of auxiliary variables (at the dwelling level) on which we want to calibrate the weights $d_j$. We denote by $\omega_j$ the calibrated weight.

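For example, suppose we are interested in the proportion of foreigners, where $y_j$ is the number of foreigners in the dwelling j and $z_j$ is the number of people in the dwelling j.

U is the whole municipality containing the QPV for which we want to make the estimation. S is the sample of addresses selected in the QPV.

The estimator is:

$$\hat P = \frac{\sum_{j \in S} \omega_j y_j}{\sum_{j \in S} \omega_j z_j}$$

where the sums run over the dwellings j of the addresses in S.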

2.2.5 Composite Estimation

One indirect method is the synthetic estimator, which derives an indirect estimator for a small area under the assumption that the small areas have the same characteristics as the large area that contains them. For instance, in our QPV project, we assume that all the QPV in the same typology have the same characteristics, so a regression model is built to describe the relationship between the variables of interest.

In some cases the synthetic estimator uses no auxiliary information, only survey data; when auxiliary information is available, it takes the form of a ratio estimator or a generalized regression estimator. When the implicit model holds, that is, when the small area mean is approximately equal to the overall mean, the synthetic estimator is very efficient because its mean square error (MSE) is small. On the other hand, it can be heavily biased for areas exhibiting strong individual effects, which can lead to a large MSE.

With area-level auxiliary information, regression-synthetic estimators can be applied in all D small areas. Suppose that survey estimates $\hat t_{yk}$ of the area totals and related area-level auxiliary variables $x_k$ are available for the small areas. We can then fit a linear regression by least squares to the data $(\hat t_{yk}, x_k)$ from the D sampled areas. The resulting estimate $\hat\beta$ of the regression coefficients leads to the regression-synthetic estimators given by

$$\hat t_{yreg,k} = x_k^T \hat\beta, \qquad k = 1, \ldots, N$$

The regression-synthetic estimators can be heavily biased if the underlying model assumptions are not valid.

A natural way to balance the potential bias of the synthetic estimator against the instability of a direct estimator is to take a weighted average of the direct estimator $\hat t_{y\pi}$ and the regression-synthetic estimator $\hat t_{yreg}$, called the composite estimator.

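Composite estimation combines direct and indirect estimation. The estimate is a weighted average of the direct estimate for the domain and the estimate given by a model:

$$\hat t_{yC} = \alpha\, \hat t_{y\pi} + (1 - \alpha)\, \hat t_{yreg}$$

where α is the composite weight, 0 ≤ α ≤ 1, which depends on the sample size of the domain or on the CV. The design MSE of the composite estimator is given by

$$MSE_d(\hat t_{yC}) = \alpha^2 MSE_d(\hat t_{y\pi}) + (1-\alpha)^2 MSE_d(\hat t_{yreg}) + 2\alpha(1-\alpha)\, E_d\big[(\hat t_{y\pi} - t_y)(\hat t_{yreg} - t_y)\big]$$

where the covariance term $E_d[\cdot]$ is assumed to be small relative to $MSE_d(\hat t_{yreg})$.

Minimizing this with respect to α, and neglecting the covariance term, we have

$$\alpha^* \approx \frac{MSE_d(\hat t_{yreg})}{MSE_d(\hat t_{y\pi}) + MSE_d(\hat t_{yreg})}$$

The approximate optimal weight α* lies in the interval [0,1] and depends only on the ratio $MSE_d(\hat t_{y\pi}) / MSE_d(\hat t_{yreg})$, as

$$\alpha^* \approx \frac{1}{1 + MSE_d(\hat t_{y\pi}) / MSE_d(\hat t_{yreg})}$$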
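The composite weight can be illustrated with a short sketch. The function name `composite_estimate` and the numerical values are hypothetical; in practice the two MSEs are unknown and have to be estimated, which is not shown here.

```python
def composite_estimate(direct, synthetic, mse_direct, mse_synthetic):
    """Composite estimate with the approximately optimal weight.

    The covariance between the two estimators is neglected, so the weight
    on the direct estimate is MSE(synthetic) / (MSE(direct) + MSE(synthetic)).
    (hypothetical helper, for illustration only)
    """
    alpha = mse_synthetic / (mse_direct + mse_synthetic)
    return alpha * direct + (1 - alpha) * synthetic, alpha

# The direct estimate is three times less precise than the synthetic one,
# so it receives a weight of one quarter.
estimate, alpha = composite_estimate(
    direct=100.0, synthetic=80.0, mse_direct=300.0, mse_synthetic=100.0
)
```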


2.2.6 The Fay Herriot - Empirical best linear unbiased predictor (FH-EBLUP)

Small areas or small domains denote areas in which the sample size is not large enough for a direct design-based estimation of sufficient precision. Further information from outside the area is required for a reliable estimation. The term is independent of the actual size of the area, for example the area of interest might be a rather large city, but the sample size within this city is too small for a precise direct estimation.

Small area estimation methods can be used to improve the quality of estimates for the areas of interest. The key model-based approaches incorporate additional auxiliary information from other areas through a predefined model. This increases accuracy and even allows estimates for areas that have not been sampled.

Small area estimation approaches depend heavily on the availability of outside information in the form of survey data, administrative registers or census data. These sources contain variables that can be correlated with the target variable and may therefore be well suited for estimation. A distinction must be made between area-level and unit-level auxiliary data.

If some additional information is available at the population level, model-assisted estimation methods may be applied, for example the generalized regression (GREG) estimator proposed by Cassel et al. (1976). The model is used to reduce the unexplained variation of the variable of interest. The resulting estimator usually has a variance lower than or equal to that of the Horvitz-Thompson estimator. However, to apply a direct estimation method, the sample size needs to be large enough in the domains and areas of interest. If the variance estimates for the GREG estimates show too much variability, the design-based and model-assisted methods are not suitable for generating these estimates. In this case, model-based small area estimation methods should be applied.

When auxiliary information is used in a model, two levels of data are distinguished: unit-level and area-level.

Unit-level models can be used when the information is available for all units of the population: the variable of interest, the auxiliary variables and the area membership. In the case of a linear model, fitted by linear regression to describe the relationship between the variables, it is sufficient to know the variable of interest and the auxiliary variables for the sampled elements only. In addition, the area-specific aggregated values of the auxiliary variables must be known for the population.

Area-level models can be used when unit-level information cannot be accessed or when suitable auxiliary information is not available at the unit level. They require direct estimators for the variables of interest in the areas and the aggregated values of the auxiliary variables in the areas. Area-level models have the advantage of not needing unit-level data and of requiring less time than more complex models.

A distinction is made between planned and unplanned areas. Planned areas exist when the sampling design is a stratified random sample in which the areas of interest constitute the strata. This requires that the areas be identified from the beginning of the survey process and that the sampling design take account of the area to which each population unit belongs. Each area can then be treated as a specific group within which traditional estimation approaches can be applied, and the area-specific sample size is usually fixed and known. Unplanned areas arise in the many cases in which the areas are not taken into account in the sampling design. The sampling design is built for general purposes, so the area-specific sample sizes are random: an area may be almost fully surveyed or may have less than 30% of its units surveyed. In some cases, there are even areas from which no sample units have been drawn at all.

The model-based Estimation - Fay-Herriot Model

The Fay-Herriot estimator uses suitable area-level auxiliary information that has been aggregated over each area of interest. The area-level model of Fay and Herriot can be divided into two parts: the sampling model and the linking model (see Jiang and Lahiri, 2006, p. 6).

The sampling model for each of the D areas of interest, with index k = 1, ..., D, is given by

$$\hat t_{yk} = t_{yk} + e_k$$

where $\hat t_{yk}$ is a direct estimator for the respective area of interest k. An estimator is said to be direct if only data from the respective area of interest have been used for the estimation. In addition, $t_{yk}$ is the true but usually unknown parameter of interest in area k. Therefore, it is supposed that the sampling errors $e_k$ are independent, with mean zero and known sampling variance $\psi_k$.

In the linking model, a linear relation is assumed between the parameter to be estimated, $t_{yk}$, and the true area-specific auxiliary variables:

$$t_{yk} = X_k^T \beta + v_k$$

with $v_k \sim N(0, \sigma_v^2)$. $X_k$ designates the population averages of the auxiliary variables used in area k. The random effect $v_k$ captures variation between the areas that cannot be explained by the fixed effects of the regression term. The variance of the random effects, $\sigma_v^2$, is also called the model variance, as it measures the between-area variance that cannot be explained by the fixed component of the model. $X_k^T \beta$ is the regression term, with the vector of regression coefficients β measuring the fixed effects over all areas. It describes the relationship between the variable to be explained and the auxiliary information.

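In combination, the sampling model and the linking model result in the linear mixed model

$$\hat t_{yk} = X_k^T \beta + v_k + e_k \qquad (1)$$

with $v_k \sim_{iid} (0, \sigma_v^2)$ and $e_k \sim_{ind} (0, \psi_k)$.

This is the basic form of the Fay-Herriot model. In it, the direct estimator, which has been built on the basis of a sample, forms the dependent variable. Assuming that the model variance $\sigma_v^2$ is known, the best linear unbiased predictor (BLUP) is given by

$$\hat t_{yk}^{BLUP} = X_k^T \hat\beta + v_k^* \qquad (2)$$

with

$$v_k^* = \gamma_k \big(\hat t_{yk} - X_k^T \hat\beta\big), \qquad \gamma_k = \frac{\sigma_v^2}{\psi_k + \sigma_v^2}$$

$\gamma_k$ is called the shrinkage factor. It measures the relation between the model variance $\sigma_v^2$ and the total variance $\psi_k + \sigma_v^2$, and can be seen as a measure of the uncertainty of the model with respect to the estimation of the area-specific mean values. The vector of regression coefficients β is estimated by the weighted least squares method and is given by:

$$\hat\beta = \Big(\sum_{k=1}^{D} \frac{X_k X_k^T}{\psi_k + \sigma_v^2}\Big)^{-1} \sum_{k=1}^{D} \frac{X_k\, \hat t_{yk}}{\psi_k + \sigma_v^2} \qquad (3)$$

Substituting $v_k^*$ into equation (2), the best linear unbiased predictor (BLUP) can be rewritten as follows:

$$\hat t_{yk}^{BLUP} = \gamma_k\, \hat t_{yk} + (1 - \gamma_k)\, X_k^T \hat\beta \qquad (4)$$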

As a result of the transformation, it is visible that the model-based estimator of Fay and Herriot (1979) is a weighted average of the direct estimator $\hat t_{yk}$ and the regression-synthetic estimator $X_k^T \hat\beta$.

The weight of each component depends on the shrinkage factor $\gamma_k$. If the sampling variance of the direct estimator is comparatively high in an area k, the respective $\gamma_k$ tends to be comparatively low: the direct estimator for this area is considered unreliable, and a correspondingly large weight is placed on the regression-synthetic part of the BLUP. Conversely, if the area-specific sampling variance $\psi_k$ is low or the general model variance $\sigma_v^2$ is high, the weight increases and more confidence is put in the direct estimator of the respective area. In practice, however, $\sigma_v^2$ is unknown and has to be estimated as well. A number of fitting methods exist for this purpose. Replacing the model variance $\sigma_v^2$ by the estimated variance of the random effects $\sigma_v^{*2}$ in (2) and (3) yields the empirical best linear unbiased predictor (EBLUP).
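The EBLUP can be sketched as follows. This is an illustration under stated assumptions, not the procedure used in the study: the function names are hypothetical, the sampling variances `psi` are assumed known and strictly positive, and the model variance is fitted with a simple moment equation solved by bisection, which is only one of the available fitting methods.

```python
import numpy as np

def wls_beta(direct, x, psi, sigma_v2):
    """Weighted least squares estimate of beta, weights 1 / (psi_k + sigma_v2)."""
    w = 1.0 / (psi + sigma_v2)
    return np.linalg.solve(x.T @ (w[:, None] * x), x.T @ (w * direct))

def estimate_sigma_v2(direct, x, psi, tol=1e-10):
    """Moment estimate of the model variance (assumed fitting method):
    solve sum_k resid_k^2 / (psi_k + s2) = D - p for s2, truncated at zero."""
    n_areas, p = x.shape

    def h(s2):
        resid = direct - x @ wls_beta(direct, x, psi, s2)
        return np.sum(resid ** 2 / (psi + s2)) - (n_areas - p)

    if h(0.0) <= 0:
        return 0.0
    lo, hi = 0.0, 1.0
    while h(hi) > 0:              # h decreases in s2, so bracket the root
        hi *= 2
    while hi - lo > tol * max(1.0, hi):
        mid = 0.5 * (lo + hi)
        if h(mid) > 0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def fay_herriot_eblup(direct, x, psi):
    """EBLUP as a weighted average of direct and regression-synthetic parts."""
    direct = np.asarray(direct, dtype=float)
    x = np.asarray(x, dtype=float)
    psi = np.asarray(psi, dtype=float)
    s2 = estimate_sigma_v2(direct, x, psi)
    beta = wls_beta(direct, x, psi, s2)
    gamma = s2 / (psi + s2)                       # shrinkage factors
    synthetic = x @ beta
    return gamma * direct + (1 - gamma) * synthetic, gamma, beta, s2

# Toy data for six areas (assumed values): intercept plus one covariate.
x = np.column_stack([np.ones(6), np.arange(6.0)])
psi = np.array([1.0, 2.0, 0.5, 1.5, 1.0, 3.0])
exact = x @ np.array([10.0, 2.0])
noisy = exact + np.array([3.0, -4.0, 5.0, -2.0, 4.0, -6.0])

eblup_exact, gamma_exact, _, s2_exact = fay_herriot_eblup(exact, x, psi)
eblup_noisy, gamma_noisy, beta_noisy, s2_noisy = fay_herriot_eblup(noisy, x, psi)
```

When the direct estimates lie exactly on the regression line, the fitted model variance is zero and the EBLUP reduces to the regression-synthetic estimate. Areas with a large sampling variance receive a smaller shrinkage factor and are pulled more strongly toward the regression term.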


Chapter 3: The Data descriptions

3.1 French census overview

The census takes place every year in France to count houses and people. This census is organized by the National Institute of Statistics and Economic Research France (National Research Institute de Statistique et des Etudes economiques, INSEE). The objective of the census is to identify the main characteristics of the population such as gender, age, activity, occupation, household characteristics, etc.

This information is useful for identifying needs and organizing territories at the national and local levels, for example to decide which social policy should be implemented or which infrastructure should be built.

3.1.1 Data collection

The population census is used to learn about France's diverse and changing population: its composition by sex and age, occupations, housing conditions, means of transportation, commuting to work or to study, and so on. The results are produced each year and used:

• by the administrative and local authorities to tailor collective facilities: child care centers, hospitals, schools, sports facilities, transport, etc. and to develop local policies.

• by public and private professionals to improve knowledge of housing stock.

• by companies to have precise data in order to improve knowledge of potential customers or labor availabilities in a given geographical sector.

• by associations, especially those that deal with healthcare, social, educational and cultural fields, in order to better adjust their actions to the needs of the population.

The purposes of the census are to establish the official population of every administrative district (more than 350 administrative and legal texts refer to these populations), to describe socio-demographic characteristics at all geographic levels, and to provide a sampling base for surveys. The French census is under the responsibility of the State.

Insee:

- Defines the data-collection protocol.
- Defines the content of the questionnaires.
- Provides computer applications.
- Provides communication media.
- Trains the municipal coordinators and contributes to the training of census agents.
- Controls the preparation of the data collection.
- Controls the progress of the data collection and its results.
- Carries out recoding, adjustments and the calculation of populations.
- Disseminates the results.

Municipalities:

- Recruit the census agents.
- Divide their area into data-collection zones and examine all the addresses in the census.
- Organize local communication.
- Participate in the training of census agents.
- Supervise the census agents daily.
- Check the completeness of the data collection.
- Send reminders to residents who did not answer.

In the implementation, there are three main stakeholders: the municipal coordinator, the supervisor and the census agent.

Municipal coordinators are in charge of preparing and leading the data collection, leading the census agents and contributing to their training, and discussing the difficulties encountered with the supervisor.

The supervisor supervises and controls the data collection and is responsible for training and advising the municipal coordinator, contributing to the training of census agents, and monitoring the progress and quality of operations.

Census agents carry out the data collection and report their progress.

3.1.2 Principles of the official population counts

The main issue with a rolling census is that the official population counts have to be published annually for each municipality, although the whole territory is not collected at the same time. The last comprehensive census was held in 1999, and the 1999 official population counts were published for each municipality from the results of that comprehensive collection. Since 2004, each annual census survey has covered:

• one in five small municipalities;
• about 8% of the dwellings of each large municipality (40% of the dwellings of the annual rotation group).
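The 8% figure follows from the rotation scheme. As a small arithmetic sketch (the constant names are hypothetical, and exact fractions are used only to avoid rounding noise):

```python
from fractions import Fraction

ROTATION_GROUPS = 5                    # five annual rotation groups
SHARE_WITHIN_GROUP = Fraction(2, 5)    # about 40% of the dwellings of the year's group

# Share of a large municipality's dwellings surveyed in one year.
annual_share = SHARE_WITHIN_GROUP / ROTATION_GROUPS
# Share accumulated once every rotation group has been surveyed.
five_year_share = annual_share * ROTATION_GROUPS

print(f"one year: {float(annual_share):.0%}")
print(f"five years: {float(five_year_share):.0%}")
```

Over a full five-year cycle, about 40% of the dwellings of a large municipality are therefore surveyed, whereas small municipalities are covered entirely.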

3.1.3 Addresses Register used for French Census sampling in large municipalities

Large municipalities are those with 10,000 inhabitants or more, in contrast to small municipalities, which have fewer than 10,000 inhabitants. In large municipalities, the census is not a full enumeration of the livable addresses: each year, a sample of livable addresses is drawn according to a random sampling methodology. Before setting the sampling design, the sampling frame of livable addresses has to be created.

Quality requirements for a sampling frame: to build a good sampling frame we need

• Concerning the units (the livable addresses): an exhaustive base that is updated regularly, and an identifier for each unit of the base.
• Concerning their characteristics (variables): geographical variables to locate each livable address precisely; social and demographic variables to help define the sampling design (e.g. by creating strata) and to improve the quality of the samples (e.g. by using calibration methods).

The French Addresses Register: there is an addresses register for each municipality. It is an exhaustive file of the addresses of the city, updated continuously, in which each address has a permanent identifier.

In the register, an address means a building or a house that can be clearly identified on the ground. For example, if No. 2, President Wilson Street comprises a building A and a building B, the register must contain one unit for each of these two buildings.

The register is co-managed by Insee and the municipalities. Each year, the mayors’ offices review their addresses register to improve its quality as much as possible before the annual census survey. Since 2016, the Addresses Register has been managed in a central computing application called Rorcal, whose central management is located at Insee.

The register is updated several times a year in each municipality. Through Rorcal, Insee sends a list of new addresses based on building permits. In Rorcal, the manager creates, locates and characterizes each new address and updates the others. After the collection, the register is updated with information from the census survey (addresses, number of habitable dwellings).

3.3 Balanced sampling design in large municipalities

A sample of addresses is drawn in each large municipality every year. To draw the addresses, a two-step balanced sampling design is applied. The sampling frame is the Addresses Register.

The first step is to randomly split the addresses into five rotation groups.

The second step is to select a balanced sample of approximately 40% of the dwellings in one of the groups. All the dwellings and inhabitants of the sampled addresses are collected. Sampling and estimation strata are used to avoid cluster effects and spurious time effects. The sampled addresses for year Y comprise all the addresses of group Y that belong to the comprehensive enumeration strata (large addresses, new addresses and tourist accommodations), together with a random balanced sample of the regular addresses of group Y, so that 40% of the dwellings of group Y are collected overall. The balancing variables are the number of dwellings in the municipality, the number of collective dwellings in the municipality, and the number of dwellings in each block of the municipality.
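The two steps can be sketched as follows. This is a deliberately simplified illustration and not the actual design: plain random selection stands in for balanced sampling, the comprehensive enumeration strata are ignored, and the register, function names and seeds are all hypothetical.

```python
import random


def split_into_rotation_groups(addresses, n_groups=5, seed=0):
    """Step 1: randomly assign each address to one of n_groups rotation groups."""
    rng = random.Random(seed)
    shuffled = list(addresses)
    rng.shuffle(shuffled)
    return [shuffled[i::n_groups] for i in range(n_groups)]


def sample_group(group, target_share=0.40, seed=0):
    """Step 2 (simplified): draw addresses at random from one group until
    about target_share of the group's dwellings has been reached.
    Assumption: simple random selection replaces balanced sampling."""
    rng = random.Random(seed)
    total = sum(a["dwellings"] for a in group)
    order = list(group)
    rng.shuffle(order)
    sample, collected = [], 0
    for address in order:
        if collected >= target_share * total:
            break
        sample.append(address)
        collected += address["dwellings"]
    return sample


# Hypothetical register: 1,000 regular addresses with 1 to 10 dwellings each.
rng = random.Random(42)
register = [{"id": i, "dwellings": rng.randint(1, 10)} for i in range(1000)]

groups = split_into_rotation_groups(register, seed=1)
sample = sample_group(groups[0], seed=2)
share = sum(a["dwellings"] for a in sample) / sum(a["dwellings"] for a in groups[0])
print(f"{len(sample)} addresses sampled, {share:.1%} of the group's dwellings")
```

A balanced design would additionally constrain the sample so that its weighted totals for the balancing variables match the known totals of the group, which this sketch does not attempt.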

3.4 Auxiliary information

In sampling theory, auxiliary information may be used at the estimation stage, for example in the formulation of calibration estimators.

The information available is usually in the form of:

• The values of the auxiliary character(s), known in advance for each and every unit of the population.

• Or the population totals or means of auxiliary characters.

If it is desired to stratify the population according to the values of some variate x, their frequency distribution must be known.

In sample surveys, the characteristic y under study is often closely related to an auxiliary characteristic x, which may be either readily available or easily collected for all the units in the population. In such situations, it is customary to consider estimators of the population mean of y that use the data on x and are more efficient than estimators that use data on y alone. At DSAU, the auxiliary information available is the Fideli file (tax file), which contains information on age, gender, country of birth, income, etc. The Fideli file is collected every year for every dwelling. In some cases, it provides exactly the characteristics for which estimators are needed; in other cases, it contains information that may be useful for estimation.
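One classical way of using an auxiliary variable x whose population total is known is the ratio estimator. The sketch below is a generic textbook illustration, not the DSAU procedure: it assumes equal sampling weights, and the function name and all figures are hypothetical.

```python
def ratio_estimate_total(sample_y, sample_x, population_total_x):
    """Ratio estimator of the total of y:
    (sample sum of y / sample sum of x) * known population total of x.
    Assumption: all sampled units carry the same weight."""
    if len(sample_y) != len(sample_x):
        raise ValueError("sample_y and sample_x must have the same length")
    return sum(sample_y) / sum(sample_x) * population_total_x


# Hypothetical sample of 4 dwellings:
# y = residents counted in the survey, x = residents in an auxiliary file.
y = [2, 3, 1, 4]
x = [1, 3, 1, 3]
known_total_x = 4000  # hypothetical total of x over the whole area

estimated_total_y = ratio_estimate_total(y, x, known_total_x)
print(estimated_total_y)
```

The closer the relationship between y and x, the more the known total of x stabilizes the estimate compared with using y alone.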

Chapter 4: The new priority neighborhood (QPV)

4.1 Definition

Quartiers prioritaires de la politique de la ville (QPV) - new priority neighborhoods

The Law for urban affairs and urban cohesion of 21 February 2014 approved the definition of a new priority geography (the territorial units targeted by urban policy measures in France) and created a new National Observatory of Urban Policy (ONPV), whose aims are:

• Observation of the situation of residents
• Evaluation of the outcomes of the policies

In addition, new urban contracts were created to improve the situation of these areas and of their residents. The principle behind the new geography is to simplify the priority geography, make it coherent, and concentrate public interventions (about 1,300 areas now, against 2,500 before 2014).

Methodology for the areas identification :

• Criterion : the level of income of the inhabitants

• The territorial scale : using a system of squared grid cells of 200 meters side, defined by the national institute of statistics (INSEE)

• Definition: the cells or clusters of cells with more than 1,000 inhabitants whose resources are lower than 60% of the national median income (more exactly, the national median income weighted by the median income of the urban unit)

• Other methods in the overseas territories, to take account of their specificities

The new priority geography of urban policy in metropolitan France and the overseas departments comprises 1,436 neighborhoods. The population of the new priority neighborhoods (QPV) is 5,300,000, or 8.4% of the population (Source: CGET – Estimated population (INSEE – RFL2001 – RP 2010)).
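The identification rule can be sketched as a search for clusters of low-income grid cells. This is a simplified reading and not the official algorithm: the threshold is taken as a plain 60% of the national median (the weighting by the urban unit's median income is left out), cells are treated as adjacent when they share an edge, and the function name, grid and incomes are hypothetical.

```python
from collections import deque


def low_income_clusters(cells, national_median, min_population=1000, ratio=0.60):
    """cells maps the (col, row) position of a 200 m grid cell to a tuple
    (population, median_income). Returns the clusters of edge-adjacent
    low-income cells whose total population exceeds min_population."""
    threshold = ratio * national_median
    low = {pos for pos, (_, income) in cells.items() if income < threshold}
    seen, clusters = set(), []
    for start in sorted(low):
        if start in seen:
            continue
        # Breadth-first search over the four edge neighbours.
        seen.add(start)
        queue, cluster = deque([start]), []
        while queue:
            col, row = queue.popleft()
            cluster.append((col, row))
            for nb in ((col + 1, row), (col - 1, row), (col, row + 1), (col, row - 1)):
                if nb in low and nb not in seen:
                    seen.add(nb)
                    queue.append(nb)
        if sum(cells[pos][0] for pos in cluster) > min_population:
            clusters.append(sorted(cluster))
    return clusters


# Hypothetical grid; the national median income is assumed to be 20,000.
cells = {
    (0, 0): (600, 11000), (1, 0): (500, 10000), (2, 0): (400, 25000),
    (0, 1): (300, 30000), (3, 3): (700, 9000),
}
clusters = low_income_clusters(cells, national_median=20000)
print(clusters)
```

In this toy grid, the two adjacent low-income cells together exceed 1,000 inhabitants and form one candidate area, while the isolated low-income cell with 700 inhabitants is not retained.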

Table 1: New priority geography of urban policy in metropolitan France and the overseas departments

4.2 Characteristics of priority neighborhoods

Characteristics of priority neighborhood in metropolitan France

The population of priority neighborhoods in metropolitan France is young, has low educational attainment, and is more often of foreign origin. The proportion of lone-parent families is twice that of the cities in which the neighborhoods are located. Employment is also more likely to be insecure. The priority neighborhoods are mostly located in three regions of France (Ile-de-France, Nord-Pas-de-Calais and Picardie, and Provence-Alpes-Côte-d’Azur), with some more likely to be situated in south-west France or in the former region of Picardie. One in five residents of these neighbourhoods live in the Languedoc-Roussillon-Midi-Pyrénées region, and one in ten in Aquitaine-Limousin-Poitou-Charentes. At the level of departments, the largest number of people living in Priority Neighbourhoods is in Seine-Saint-Denis, which accounts for one eighth of the total population concerned by urban policy in metropolitan France. Four in ten residents of this department live in a Priority Neighborhood.

Outside of metropolitan France, slightly more than 500,000 people live in a Priority Neighbourhood in the overseas departments and regions (Guadeloupe, French Guiana, Martinique, Mayotte and Réunion), 9,700 in Saint-Martin, and 73,000 in French Polynesia.

Conditions for residents of these overseas Priority Neighbourhoods vary depending on the territory and in many cases are specific to them. But the underlying realities are the same as in the Priority Neighbourhoods in metropolitan France: a younger population than in the urban environment as a whole, an alarming situation as regards jobs and unemployment, and a generally larger proportion of lone-parent families. A few differences emerge in respect of housing tenure. In most of the overseas territories, the proportion of tenants in the Priority Neighbourhoods is higher than in the rest of the municipalities. But this is not the case in French Guiana or in Saint Martin. For Réunion, statistical data available for the island’s 49 neighbourhoods were used to construct a typology in which four separate groups of neighbourhoods are distinguished by characteristics relating to their population or housing.

Table 3: Children living in Priority Neighborhoods

A total of 1.4 million households in the Priority Neighbourhoods for Urban Policy in metropolitan France are in receipt of benefits distributed by the family allowance funds, equivalent to two in three residents of the Priority Neighbourhoods, compared with less than one in two in the rest of metropolitan France. In the Priority Neighbourhoods, the largest group is that of people living alone, followed by lone-parent families, whereas in the rest of metropolitan France couples with one or two children are the second most common family type after people living alone. The Priority Neighbourhoods also have a larger proportion of childless couples. Two-thirds of family allowance recipients living in Priority Neighbourhoods are below the low-income threshold, families with children being the largest group ahead of isolated people. Consequently, among the children of family allowance recipients in Priority Neighbourhoods, two-thirds live in low-income households, which is double the proportion in the surrounding urban units. Lastly, of these 1.4 million households, almost 36% receive a means-tested minimum income, the Revenu de solidarité active (RSA), compared with 19% of comparable households not in Priority Neighbourhoods.

Recipients of family allowance benefits in the incoming Priority Neighbourhoods, i.e. those not previously included in the priority geography of urban policy, differ slightly by family structure from the beneficiaries in the other Priority Neighbourhoods. The former include a larger proportion of isolated people, whereas large families are less represented, with in particular a smaller proportion of couples with three or more children. They are slightly less likely to be living in poverty than the households of beneficiaries in the other neighbourhoods.

Education for Priority Neighbourhood pupils

Table 4: Education (lower secondary school : 11-14 years) – lower pass rates in Brevet des collèges for Priority Neighbourhood pupils

In 2013, the 5.5 million secondary level pupils in metropolitan France included 460,000 (8.4%) resident in an urban policy Priority Neighbourhood. Of these, 268,000 were lower secondary school pupils (ages 11–15), divided between state and private schools situated at varying distances from their home neighbourhood and that also served populations from other neighbourhoods. An analysis of the social composition of these schools shows it to be considerably more disadvantaged on average than in the other lower secondary schools. Nearly two-thirds of their entry grade pupils (ages 11–12) are from socially disadvantaged backgrounds, compared with 40% in lower secondary schools not concerned by urban policy.

Pass rates in the Brevet des collèges, a national achievement test taken at age 15, stand at 75.6% in state lower secondary schools with a large proportion of pupils from Priority Neighbourhoods, compared with 86.1% for state schools with no pupils from Priority Neighbourhoods. Pass rates are much higher in private schools. The sixty private lower secondary schools with more than 25% of pupils from Priority Neighbourhoods, obtain a 91.4% pass rate, slightly lower than that (94.9%) of the private schools with no pupils from these neighbourhoods.

Two years after their final year of lower secondary school, pupils from state schools where a large proportion of pupils come from Priority Neighbourhoods have entered in equal proportions (around 25%) a general (i.e. academic) or vocational stream of studies, whereas their counterparts from state schools with no pupils from Priority Neighbourhoods are twice as likely to be in the former as in the latter. Nearly 200,000 pupils in the Priority Neighbourhoods attend a general, technological or vocational

High unemployment in the Priority Neighbourhoods regardless of educational level, sex, or origin

In 2014, the rate of unemployment among residents of Priority Neighbourhoods stood at 26.7% compared with barely 10% in the rest of the cities. This high unemployment affects all levels of educational attainment. It is particularly high (31.7%) among people with less than the basic vocational qualification (BEP/CAP), but it is 18.8% for those with two years’ higher education (post baccalauréat), which is three times higher than in the surrounding urban units for the same qualification levels. Unemployment is lower among women than among men, but the proportion of women who are inactive is higher, which is because they are more likely to have withdrawn from the labour market. Unemployment affects the immigrant and non-immigrant populations in the Priority Neighbourhoods in broadly similar proportions (27.9% and 26.2% respectively), contrary to the situation in the surrounding urban units where unemployment among immigrants is considerably higher than among non-immigrants (15.5% versus 9.2%). Higher than average levels of inactivity combine with the higher incidence of unemployment to give an employment rate that is one-third lower in the Priority Neighbourhoods. Fewer than one in two residents of Priority Neighbourhoods aged 15–64 are in employment, compared with almost two in three elsewhere.

Women in the Priority Neighbourhoods

Table 6: The labour market situation of men and women aged 30-49, in and out priority neighborhood

Women outnumber men in the Priority Neighbourhoods and life expectancy is higher for women than for men. However, in the 25–59 age range, the over-representation of women relative to men in the Priority Neighbourhoods is twice what it is elsewhere.


This phenomenon may have its origins in lone parenthood, which concerns one in four families in these neighbourhoods, with a female head of family in nearly nine out of ten cases. When a father or partner has moved out of the home and perhaps indeed left the neighbourhood altogether, the socially rented housing that is overrepresented in the Priority Neighbourhoods offers affordable accommodation for families in this category (nearly 40% of lone-parent families live in social housing compared with 15% of other households). Between the ages of 30 and 49, almost one in three women resident in a Priority Neighbourhood is economically inactive, i.e. neither in work nor unemployed, which is twice the level observed outside the neighbourhoods. The incidence of inactivity is lower among the mothers at the head of lone-parent families, who have to combine responsibility for children with the need for an income. But they do have higher levels of unemployment and involuntary part-time working. Among women, only one in two is in work, mostly in manual or sales/clerical positions, and in many cases on a part-time basis.

Local environment : the answers from former Priority Neighbourhood

Table 7: Living environment and security

The generally positive view that residents in ZUS have of their neighbourhood does not conceal the fact that they identify a number of problems as priorities. One in two residents mention a bad image for the neighbourhood and high crime, but noise and a run-down environment also figure prominently among their concerns. Thus 38% report frequent exposure to daytime noise, eleven percentage points higher than in the surrounding neighbourhoods. There is also a large difference between the two categories of neighbourhood as regards exposure to night-time noise. The sound insulation of the dwellings is blamed, with 27% of ZUS residents complaining about poor acoustic insulation, compared with 17% in non-ZUS. Nevertheless, compared with the 2002 survey, progress has been made, since the proportion has fallen substantially (from 36% to 27%). The principal sources of noise mentioned are neighbours and road traffic. As regards air quality and green spaces, residents in ZUS are still relatively more likely to report a poor local environment, though in smaller proportions than those recorded a decade earlier.

4.3 Typologies of priority neighborhoods

By construction, the 1,300 priority neighborhoods in metropolitan France are all defined by a concentration of low-income populations. Yet the situations of individual QPVs can be very different. Describing all urban-policy neighborhoods together does, of course, improve knowledge of these neighborhoods, but it erases their differences by making them appear as a homogeneous whole. The typologies aim to group neighborhoods into a smaller number of classes, so that their situations can be analyzed synthetically while still being distinguished according to their characteristics. This research presents three typologies of priority neighborhoods: the first devoted to the living environment, the second to cohesion and the third to employment, following the three-part structure of the new city contracts.

The typology relating to the living environment uses data on housing and housing market dynamics to distinguish five neighborhood classes: old centers, HLM (social housing) neighborhoods in small and medium urban units, peripheral neighborhoods of small addresses (fewer than 20 dwellings), peripheral HLM neighborhoods of large urban units, and HLM neighborhoods in the remote suburbs of large urban units. The living-environment typology is therefore appropriate for estimating the indicators.

Living environment

The typology of priority neighborhoods by living environment proposed here is based on two main themes, measured with data at the neighborhood or urban-unit level.

The first is urban morphology, which describes the type of buildings and their age, as well as the centrality of the neighborhood within the urban space. The second is the dynamics of the housing market, approached through housing vacancy, the size of the urban unit, and the share of the urban unit's social housing that is located in the neighborhood. The zoning used by the Ministry of Housing to characterize tight housing markets was not used in building the typology because it is strongly correlated with the size of the urban unit. The data make it possible to distinguish five broad types of neighborhoods with different characteristics.

Description of classes

Class 1: Old centers

With 74 neighborhoods and around 6% of priority-neighborhood residents, the class of old centers is the smallest of this typology. These neighborhoods are characterized by a high proportion of dwellings built before 1946. Most of them are located in urban units of fewer than 200,000 inhabitants and are very close both to the town hall of their commune and to the town hall of the central commune of the urban unit; their commune is also the central commune of the urban unit. The housing market appears rather relaxed: 11% of the dwellings in the old centers are vacant, against 5% on average in the priority neighborhoods. Old-center neighborhoods are overrepresented in south-east France, particularly in Occitanie and Provence-Alpes-Côte-d'Azur: these two regions are home to 53 such neighborhoods, more than half of the class. The five most representative neighborhoods of the class are all located in the south-east quarter of France; they include downtown Carpentras (Vaucluse), the city center of Montélimar (Drôme), the old center of Châteaurenard (Bouches-du-Rhône) and the center of Bédarieux (Hérault).

Class 2: HLM neighborhoods of small urban units

The class of HLM neighborhoods of small urban units includes 255 neighborhoods and 20% of QPVs. Almost all of these neighborhoods are located in urban units of fewer than 200,000 inhabitants. Unlike the previous class, they contain very few older dwellings (8% of housing), but the share of social housing is very high (81%). These neighborhoods are very close to the center of the urban unit, even though they are further from the town hall of their municipality than the old centers are. Because of the small size of the urban units in which they are located and the composition of their housing stock, they account for a large part of the social housing of their urban units. HLM neighborhoods of small urban units represent a significant proportion of the priority neighborhoods of the regions Centre-Val de Loire, Bourgogne-Franche-Comté and Grand Est. The most representative neighborhoods of this class are Saint-Laurent in Cosne-Cours-sur-Loire (Nièvre), Le Roc in Pierrelatte (Drôme), République in Guénange (Moselle) and Vert-Bois in Saint-Dizier (Haute-Marne).

Class 3: Peripheral neighborhoods of small addresses

The 103 peripheral neighborhoods of small addresses represent 8% of QPVs and combine a significant share of old housing (36%) and of social housing (64%). They have very few addresses of more than 20 dwellings (10% of addresses), although the vast majority are located in urban units of more than 200,000 inhabitants. On average, 14 minutes are required to reach the center of the urban unit by car from the center of the neighborhood, hence their peripheral character. The vacancy rate is very low (2%). They are mainly located in the departments of Nord and Pas-de-Calais. The most representative neighborhoods are Cité Le Jard in Vieux-Condé (Nord), Mace-Darcy in Hénin-Beaumont (Pas-de-Calais), 3 Cities in Mazingarbe (Pas-de-Calais) and the city center of Fresnes-sur-Escaut (Nord).

Class 4: Peripheral HLM neighborhoods of large urban units

The 288 peripheral HLM neighborhoods are all located in urban units of more than 200,000 inhabitants and represent 23% of QPVs. Although located in large urban units, these neighborhoods are rather close to the city center. On the other hand, they are far from the town hall of their commune (11 minutes away, against 4 on average in urban-policy priority neighborhoods), and are therefore located rather on the outskirts of the commune. The share of social housing is high, as is the share of large addresses of more than 20 dwellings.

These neighborhoods are concentrated in the regions containing large urban units, including Auvergne-Rhône-Alpes, Ile-de-France and Provence-Alpes-Côte d'Azur. Neighborhoods emblematic of this class are Ariane-Le Manoir in Nice (Alpes-Maritimes), Plateau de Haye in the communes of Nancy and Maxéville (Meurthe-et-Moselle), Blémont in Paris and Mermoz in Lyon.

Class 5: HLM neighborhoods in remote suburbs of large urban units

The class of HLM neighborhoods in remote suburbs, with 230 neighborhoods and 18% of QPVs, shares many characteristics with the peripheral HLM neighborhoods: they are all located in large urban units, have a significant share of social housing and of large addresses, and few old dwellings. It is mainly centrality that distinguishes the two classes: HLM neighborhoods in remote suburbs are 33 minutes from the center of the urban unit, versus 16 minutes for peripheral HLM neighborhoods. More than half of the neighborhoods in this class are located in Ile-de-France and, more generally, these neighborhoods are concentrated in the very large urban units. The most representative neighborhoods are Notre-Dame des Marins in Martigues (Bouches-du-Rhône), Les Oliveaux in Loos (Nord), Ville-Nouvelle in Rillieux-la-Pape (Rhône) and Les Courtilles in Gennevilliers (Hauts-de-Seine).

Class 6: The other “nd”

Some QPVs do not belong to any of the classes above. This class of “other” comprises 323 neighborhoods and 25% of QPVs; these QPVs do not share the characteristics of any other class.

4.4 Demographic estimation indicators

The DSAU department has defined 84 demographic indicators that it would like to estimate for local policy makers. The choice of indicators is based mainly on the needs of local policy makers, who would like to know more about their disadvantaged priority areas in order to propose appropriate actions that help QPV residents improve their economic situation and move out of the QPV. The 84 indicators were drawn up through discussions between statisticians and local policy makers so as to cover many aspects (social situation, education, employment, gender, use of public transportation) and provide an in-depth view of the QPVs.

The indicators relate to age, gender, level of education, occupational level, household size, and so on. The groups of indicators are explained below; a comprehensive list is given in Appendix 1.

The age of the population is described by five indicators: the proportions of the population aged 0 to 14, 15 to 24, 25 to 59, 60 to 74, and 75 or over.

The population is also broken down by gender, and the proportions of men and women are computed for the same five age groups. Citizenship is also taken into account in each QPV: the number of foreigners and their age and gender distribution are estimated in the project.

These are some of the population characteristics considered. In terms of work, indicators on jobs, unemployment, the employment rate and the age of workers also need to be estimated. Workers are further grouped by type of contract (permanent, temporary or part-time) and by occupation (executives, intermediate occupations or manual workers), in order to give a clearer view of the labor activities of people in QPVs. In addition, the highest level of education obtained by QPV residents is covered (middle school, high school, a higher degree, or no qualification at all), as is the proportion with diplomas among women and among foreigners. We are also interested in the proportion of 15-24 year olds who are studying, and its breakdown between men, women and foreigners.

Transport indicators cover whether QPV residents use public transportation or their own car, and the percentage of households in each QPV that have one car or more.

Concerning households, the indicators to be estimated include the proportion of households with 1, 2, 3, 4-5, or 6 or more persons; the proportion of one-person households whose resident is aged 60 to 74, or aged 75 or older; the proportion of one-person households whose resident is a woman or a foreigner; etc. The full list of demographic indicators is in Appendix 1.

Chapter 5: The estimation methodologies applied in QPVs using the French census

5.1 Direct Estimation - DSAU

Direct estimators are based only on the survey data from the area considered. The family of direct estimators includes the estimator proposed by Horvitz and Thompson (1952), also called the π-estimator, the generalized regression estimators (Särndal, Swensson and Wretman, 1992) and the calibration estimators (Deville and Särndal, 1992). Initially, the Horvitz-Thompson estimator was applied at DSAU, following four steps.

In general, a QPV belongs to big cities, small cities, or both. For QPVs in small cities, the survey is completed over five years: 20% of the small cities are surveyed each year, and all information is collected. France has approximately 36,000 small cities and 1,000 big cities.

For QPVs in big cities, 8% of dwellings are surveyed each year, so that after five years the census surveys cover 40% of dwellings. Sample weights therefore have to be used. The method goes through the following four steps, illustrated here by estimating the population of each QPV.

For example, to estimate the number of people aged 0 to 14 in each QPV, the steps are as follows:

Step 1: Estimate the total number of people in each QPV using the individual sample weights. For instance, the total number of people aged 0 to 14 is the weighted sum of the sampled people aged 0 to 14 in the QPV.

Step 2: Estimate the number of dwellings in each QPV using the dwelling sample weights.

Step 3: Calculate the average number of people aged 0 to 14 per dwelling (the estimated number of people divided by the estimated number of dwellings).

Step 4: The real number of dwellings is known from administrative offices. The total number of people aged 0 to 14 in each QPV equals the average number of people aged 0 to 14 per dwelling (from Step 3) multiplied by the real number of dwellings.

The real number of dwellings comes from the address register office, which is the most accurate source of information.

The proportion of people aged 0 to 14 in each QPV equals the total number of people aged 0 to 14 in the QPV divided by the total number of people in that QPV.

When a QPV belongs to both big and small cities, all the information is known for the people living in the small cities; for the other part, we proceed as described above.
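The four steps above can be sketched in Python as follows. This is only a minimal illustration, not the R program of Appendix 2: the record layout, the function name `dsau_estimate`, the weights and the dwelling count are all hypothetical, and it assumes that each sampled person carries an individual weight and each sampled dwelling a dwelling weight.

```python
from dataclasses import dataclass


@dataclass
class Person:
    age: int
    weight: float  # individual sample weight (hypothetical values below)


def dsau_estimate(persons, dwelling_weights, real_dwellings, in_domain):
    """Four-step direct estimate of a domain total for one QPV."""
    # Step 1: weighted sum of the sampled people who fall in the domain.
    est_people = sum(p.weight for p in persons if in_domain(p))
    # Step 2: estimated number of dwellings from the dwelling sample weights.
    est_dwellings = sum(dwelling_weights)
    # Step 3: average number of domain members per dwelling.
    avg_per_dwelling = est_people / est_dwellings
    # Step 4: rescale by the real dwelling count from the address register.
    return avg_per_dwelling * real_dwellings


# Hypothetical sample for one QPV; a weight of 2.5 = 1 / 0.40 is an assumption.
persons = [Person(5, 2.5), Person(12, 2.5), Person(34, 2.5), Person(70, 2.5)]
dwelling_weights = [2.5, 2.5]
real_dwellings = 6

young = dsau_estimate(persons, dwelling_weights, real_dwellings, lambda p: p.age <= 14)
total = dsau_estimate(persons, dwelling_weights, real_dwellings, lambda p: True)
proportion_young = young / total
print(young, total, proportion_young)
```

Passing the domain as a predicate lets the same function serve both the numerator (people aged 0 to 14) and the denominator (all people) of the proportion.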



The R program for direct estimation - DSAU and the results are in Appendix 2. The R program for CV and the results are in Appendix 3.

As an example, direct estimation - DSAU was used to estimate the proportion of foreigners in a small group of 15 QPVs; the results are shown in the chart below.

In the first stage of the project, direct estimation was applied to eighty-four indicators, and the precision of the method is assessed with the coefficient of variation (CV). According to the report, 75% of the results are “good”, meaning that their CVs are lower than 15%; these results can be published in the INSEE report. The remaining 25% are “not good”, with CVs higher than 15%; these results cannot be published in the report, and other methods are needed to improve precision when the sample is small.

[Figure: bar chart of the estimated proportion of foreigners in 15 QPVs, direct estimation - DSAU. Axis labels: QP001001 to QP002006; scale: 0% to 40%. Values in plotting order: 3.6%, 1.2%, 2.6%, 3.5%, 6.7%, 12.3%, 20.8%, 28.7%, 30.4%, 35.1%, 21.9%, 20.1%, 27.6%, 28.4%, 22.1%.]
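The 15% rule on the coefficient of variation described in this section can be sketched as follows. This is a hedged illustration rather than the R program of Appendix 3: the thesis does not state here how the standard errors are obtained, so they are taken as given, and the function names, indicator names and numbers are hypothetical.

```python
CV_THRESHOLD = 0.15  # 15% publication threshold quoted in the text


def coefficient_of_variation(estimate, standard_error):
    """CV = standard error divided by the estimate."""
    return standard_error / estimate


def is_publishable(estimate, standard_error, threshold=CV_THRESHOLD):
    # Assumption: a CV exactly equal to the threshold is not publishable.
    return coefficient_of_variation(estimate, standard_error) < threshold


# Hypothetical (estimate, standard error) pairs for two indicators.
results = {"indicator_a": (0.20, 0.02), "indicator_b": (0.05, 0.01)}
flags = {name: is_publishable(est, se) for name, (est, se) in results.items()}
print(flags)
```

With these made-up inputs the first indicator has a CV of about 10% and would count as a “good” result, while the second has a CV of about 20% and would not.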


5.2 The Horvitz - Thompson estimation

The Horvitz - Thompson estimator is a direct estimator that directly uses the sample weights 1/π, where π is the inclusion probability of a sampled unit.

In our project, the Horvitz - Thompson estimation method has been applied to 84 indicators across 1296 QPVs.

Direct estimation - DSAU and the Horvitz - Thompson estimator give accurate results when the sample is large. After applying the Horvitz - Thompson estimator, however, the coefficients of variation are still high in some QPVs where the sample is small. In that case, direct estimation and the Horvitz - Thompson estimator are not the best choice, and it is necessary to use auxiliary information.
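In standard notation (the symbols below are introduced here for illustration and are not taken from the thesis), the Horvitz - Thompson estimator of a total over a sample s, where unit k has value y_k and inclusion probability π_k, can be written as:

```latex
\hat{Y}_{HT} \;=\; \sum_{k \in s} \frac{y_k}{\pi_k} \;=\; \sum_{k \in s} w_k \, y_k ,
\qquad w_k = \frac{1}{\pi_k}
```

Each sampled unit is thus counted w_k times, so a unit sampled with probability 0.4 stands for 2.5 units of the population.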

For example, to estimate the proportion of foreigners in the small group of QPVs, the number of foreigners is divided by the number of people in each QPV.

[Figure: bar chart of the estimated proportion of foreigners in 15 QPVs, series "Horvitz-Thompson". Axis labels: QP001001 to QP002006; scale: 0% to 40%. Values in plotting order: 3.61%, 1.28%, 2.64%, 3.51%, 6.74%, 12.37%, 20.82%, 28.75%, 30.49%, 35.14%, 21.93%, 20.17%, 27.61%, 28.42%, 22.15%.]
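The proportion described here can be sketched as a ratio of two Horvitz - Thompson totals. This is an interpretation offered for illustration only: the function names, the sample and the inclusion probabilities are hypothetical, and the thesis does not confirm that its R program computes the proportion in exactly this way.

```python
def horvitz_thompson_total(values, inclusion_probs):
    """Sum of y_k / pi_k over the sampled units."""
    return sum(y / pi for y, pi in zip(values, inclusion_probs))


def ht_proportion(flags, inclusion_probs):
    """Estimated share of units with flag 1, as a ratio of two HT totals."""
    numerator = horvitz_thompson_total(flags, inclusion_probs)
    denominator = horvitz_thompson_total([1] * len(flags), inclusion_probs)
    return numerator / denominator


# Hypothetical sample of five people in one QPV: 1 = foreigner, 0 = not.
is_foreigner = [1, 0, 0, 1, 0]
inclusion_probs = [0.4, 0.4, 0.4, 0.5, 0.5]
share = ht_proportion(is_foreigner, inclusion_probs)
print(round(share, 4))
```

The numerator estimates the number of foreigners and the denominator the number of people, both with weights 1/π, so the ratio matches the division described in the text.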
