Interviewer effects and the measurement of financial literacy


wileyonlinelibrary.com/journal/rssa J R Stat Soc Series A. 2021;184:150–178.

ORIGINAL ARTICLE

Thomas F. Crossley¹,² | Tobias Schmidt³ | Panagiota Tzamourani³ | Joachim K. Winter⁴

¹European University Institute, Fiesole, Italy
²Institute for Fiscal Studies, London, UK
³Deutsche Bundesbank, Frankfurt, Germany
⁴University of Munich, Munich, Germany

Correspondence
Thomas F. Crossley, Department of Economics, European University Institute, Villa La Fonte, Via delle Fontanelle 18, 50014 San Domenico di Fiesole (FL), Italy. Email: tfcrossley@gmail.com

Abstract

In this paper, we ask whether interviewers influence the answers to a standard set of survey questions on financial literacy. We study data from Germany's wealth survey, the Panel on Household Finances (PHF). We have access to extensive auxiliary data, including interviewer identifiers, background characteristics of interviewers and measures of interviewer activity through the survey. We find that interviewer effects explain a significant fraction of the variance of the financial literacy score, and intra-interviewer correlations are notably larger for the financial literacy score than for other survey variables. We explore how accounting for interviewer effects can improve estimates of the effects of financial literacy on financial behaviours and outcomes.

KEYWORDS

financial literacy, interviewer effects, measurement error

This is an open access article under the terms of the Creative Commons Attribution License, which permits use, distribution and reproduction in any medium, provided the original work is properly cited.

© 2020 The Authors. Journal of the Royal Statistical Society: Series A (Statistics in Society) published by John Wiley & Sons Ltd on behalf of Royal Statistical Society

1 | INTRODUCTION

Insufficient saving and poor financial decision-making are major policy concerns, particularly in the face of increasingly complex financial markets and increasing reliance on individual financial provision for old age. While these concerns have been raised for decades (see Engen, Gale & Scholz, 1996; Skinner, 2007), recent research has highlighted the limitations of households' decision processes. One explanation for poor financial decisions that has attracted considerable interest is a lack of financial literacy (Hastings et al., 2013; Lusardi & Mitchell, 2014; van Rooij et al., 2011, 2012). Lusardi and Mitchell (2014) define financial literacy as ‘peoples' ability to process economic information and make informed decisions about financial planning, wealth accumulation, pensions, and debt’, while Atkinson and Messy (2012) define a financially literate person as one who has ‘some basic knowledge of key financial concepts and the ability to apply numeracy skills in financial situations’. This emerging literature on financial literacy argues that poor financial literacy is both causally responsible for suboptimal financial choices of households and individuals, and amenable to being altered by public policy.

Much of the current knowledge about the predictors and effects of financial literacy is based on survey data. Lusardi and Mitchell (2008) proposed a short list of questions that can be integrated into existing surveys at low cost. These include questions on interest rate compounding, on the effects of inflation, and on diversification of securities. The premise is that individuals should know the answers to these questions in order to make sound decisions on issues of household finance. Indeed, a variety of studies have shown that measures of financial literacy based on the responses to such simple survey questions are correlated with the quality of households’ financial decisions and also with long-term financial outcomes, even after controlling for socio-economic characteristics and for cognitive ability. This holds for teenagers who are just beginning to make their own financial decisions as well as for young and older adults, and across both developed and developing countries (Hastings et al., 2013; Lusardi & Mitchell, 2014; Lührmann et al., 2015; Mitchell & Lusardi, 2015).

Despite the recent advances in the analysis of financial literacy, measurement error arising from the survey response process is an important concern. Lusardi and Mitchell (2014) summarize studies that use instrumental variables (IV) techniques to estimate models where financial literacy is a right-hand side variable. They observe that IV estimates of the effects of financial literacy in these studies are typically larger than OLS estimates, and conclude that ‘the noninstrumented estimates of financial literacy may underestimate the true effect’ (p. 27).¹ While econometric methods such as IV can resolve endogeneity that arises from measurement error, they are not ideal for several reasons, perhaps the most important of which is the fact that credible instruments are often hard to come by. In this paper, we explore how the survey response process induces measurement error in measures of financial literacy, with the ultimate goals of improving the econometric analysis of the effects of financial literacy and of constructing better survey measures. Specifically, we focus on the role of the survey interviewer.

The survey methodology literature argues that interviewers might affect survey outcomes mainly in one or more of three ways: unit non-response (differential success in recruiting different kinds of respondents across interviewers), item non-response (differential success across interviewers in obtaining a response to a specific item) and the response itself (Biemer, 1980; Platek & Gray, 1983; West & Blom, 2017; West, Kreuter & Jaenichen, 2013; West & Olson, 2010).² We highlight the last channel, that is, the possibility that interviewers induce differential measurement error. For example, interviewers might help respondents to better comprehend complex survey questions or they might help respondents to find strategies that enhance the reporting of quantities that are not easily recalled.

¹ In principle, financial literacy measures might suffer from several types of endogeneity. Measurement error in financial literacy will tend to attenuate estimates of the effects of financial literacy on behaviours and outcomes. In contrast, reverse causation, from financial behaviour to financial literacy, would lead simple regressions to overstate the causal effect of financial literacy, as would omitted variables that are correlated with both financial literacy and financial choices. The fact that IV estimates are typically much larger (not smaller) than ordinary regression estimates suggests that measurement error is the key empirical problem.

It is likely that measurement error induced in such ways is heterogeneous across interviewers. While this possibility has been recognized for some time, the survey methodology literature has mostly focused on the implications for variance estimation and item non-response (Durrant et al., 2010; O'Muircheartaigh & Campanelli, 1998; Schnell & Kreuter, 2005; West, Kreuter, & Jaenichen, 2013). One notable exception is a recent paper by Fischer et al. (2019). The objective of the present paper is to consider the effects of interviewer-induced measurement error on coefficient estimates in regression models. While the application is concerned with survey measures of financial literacy, the general approach is applicable to other settings as well.³

As measures of financial literacy are often used in regression models, both as dependent variables and as regressors, it seems important to understand the statistical properties of interviewer-induced measurement error. We develop a tractable analytic framework for thinking about (a) interviewers both as a source of error in survey responses and as a moderator of respondent errors, (b) the consequences of interviewer effects for the kinds of models estimated in the financial literacy literature, and (c) how information on interviewers or interviewer effects might be used to improve estimates of the effects of financial literacy on financial choices and outcomes.

We apply this framework to data on financial literacy collected as part of the German Panel on Household Finances (PHF). The PHF is a large survey on household finance that is representative of the German population. We use data from the first wave of the PHF which was conducted in 2010–2011. The questionnaire focuses on households’ financial and non-financial assets and debts. It includes the standard financial literacy questions on interest rate compounding, the effect of inflation and diversification of securities developed by Lusardi and Mitchell (2008).

The PHF survey data we analyse in this paper are unusual in that they not only allow us to identify the interviewers; we also obtained data on a number of interviewer characteristics, including gender, age and education level, from the survey firm that conducted the fieldwork. From detailed contact records, we are also able to compute measures of interviewers’ contact behaviour and workload. Our analysis will make extensive use of such auxiliary data, and the results highlight the usefulness of auxiliary data (see also Couper, 1998, and Kreuter, 2015). We use these data to test for independent interviewer effects and for a moderating effect of interviewers on respondent error; to explore whether interviewer effects on financial literacy questions are related to interviewer characteristics (including those, such as interviewer experience, that might be controlled by survey field agencies); and to evaluate the strategies for mitigating the consequences of interviewer effects that our analytic framework suggests.

We find significant interviewer effects on the financial literacy score, with intra-interviewer correlations notably larger than for other survey variables we examine. We find interviewer effects on both mean (location) and variance (scale); the latter suggests a moderating effect of interviewers on respondent errors. Estimated interviewer effects are weakly related to interviewer characteristics. There is some evidence that older interviewers elicit responses that indicate higher financial literacy on average, and which are less variable. Different approaches to using the auxiliary data to improve estimates of the effect of financial literacy on outcomes and behaviour give different results. This suggests that the measurement error induced by interviewers has a rich structure.

³ The framework that we develop in Section 2 shares much in common with the one developed (independently) by Fischer et al. (2019). Both draw on longstanding results on the effects of measurement error in linear regression models (described for example in Wansbeek and Meijer (2000)). Both papers suggest adapting the standard correction for attenuation bias based on the reliability of the mis-measured variable. Fischer et al. provide simulation evidence on the effects of measurement error and non-response variation induced by interviewers. We provide an actual application. In addition, we propose an alternative approach based on differencing the data to remove interviewer effects which is not discussed in their paper.

These results have a number of important implications. Most fundamentally, they reinforce the need for providing survey auxiliary data (specifically, interviewer identifiers and perhaps also interviewer characteristics) along with any household survey data set. They also highlight the need for more research on the relative importance of interviewer effects across different surveys and survey questions. Finally, as the bias introduced in regression coefficients is difficult to correct after the fact, mitigating interviewer effects appears to be crucial.

The remainder of the paper proceeds as follows. Before turning to the data, we first present a tractable framework for thinking about interviewer effects. This is presented in Section 2. In Section 3, we describe our data. Section 4 contains our empirical results. We discuss the implications of our results at length in the concluding section of this paper.

2 | STATISTICAL FRAMEWORK

To help organize our interpretation of the data, consider the following statistical framework for thinking about interviewer effects. We will first develop a measurement model for financial literacy, which we then combine with a simple model of financial decision-making with financial literacy as the explanatory variable of interest. Both models can include additional regressors, but we will initially suppress them to simplify the exposition.

Beginning with the measurement model, $FL_{ij}$ is a measure of the variable we are interested in (financial literacy), with true value $FL^*_{ij} = \theta + v_i$. The subscript $i$ indexes respondents, and $j$ indexes interviewers. The overall mean of true financial literacy is given by $\theta$, and heterogeneity in the true value given by $v_i$, so that $V[FL^*_{ij}] = \sigma^2_v$. Below we will allow $\theta$ to be a function of observed covariates. Our model of measured responses is:

$$FL_{ij} = \theta + v_i + \pi_j\omega_i + u_j \tag{1}$$

Response error is $\pi_j\omega_i + u_j$, where $u_j$ is interviewer-level error and $\omega_i$ is individual reporting error, which is moderated by interviewers. So interviewers affect both the mean and variance of measurement response error.⁴ An interviewer who (e.g. through clarity in posing the questions) reduces respondent error has $\pi_j < 1$ (and an interviewer that exacerbates respondent error has $\pi_j > 1$). We assume that $\pi_j$, $\omega_i$, $v_i$ and $u_j$ are independent.

⁴ Brunton-Smith et al. (2017) also present a framework in which interviewers have both scale and location effects.

A testable restriction on this model is $\pi_j = 1$. This restriction implies common within-group variances, and a more familiar error components structure,

$$FL_{ij} = \theta + v_i + \omega_i + u_j, \tag{2}$$

with an interviewer error component ($u_j$) and an individual error component ($\varepsilon_i = v_i + \omega_i$) that contains both error and genuine heterogeneity. Given our assumption of independent errors, $\sigma^2_\varepsilon = \sigma^2_{v+\omega} = \sigma^2_v + \sigma^2_\omega$. However, as we will show below, the assumption of common within-group variances is rejected in our data.

Carrying on with our more general framework,

with $\omega_i$, $v_i$, $u_j$ and $\pi_j$ independent, it is straightforward to show that:

$$V[FL_{ij}] = \sigma^2_u + \sigma^2_v + (\sigma^2_\pi + \mu^2_\pi)\sigma^2_\omega \tag{3}$$

where $\mu^2_\pi = (E[\pi_j])^2$. The reliability, $R$, of the financial literacy measure is therefore:

$$R = \frac{V[FL^*_{ij}]}{V[FL_{ij}]} = \frac{\sigma^2_v}{\sigma^2_u + \sigma^2_v + (\sigma^2_\pi + \mu^2_\pi)\sigma^2_\omega} \tag{4}$$

Thus, the reliability of the financial literacy measure depends in part on interviewer effects: $R$ is decreasing in $\sigma^2_u$, $\sigma^2_\pi$ and $\mu^2_\pi$. Now note that:

$$E[FL_{ij} \mid j] = \theta + u_j \tag{5}$$

And:

$$V[E[FL_{ij} \mid j]] = \sigma^2_u \tag{6}$$

Taking these results together gives:

$$FL_{ij} - E[FL_{ij} \mid j] = v_i + \pi_j\omega_i \tag{7}$$

$$V[FL_{ij} - E[FL_{ij} \mid j]] = V[v_i + \pi_j\omega_i] = \sigma^2_v + (\sigma^2_\pi + \mu^2_\pi)\sigma^2_\omega \tag{8}$$

so that the intraclass correlation (ICC) is:

$$\mathrm{ICC} = \frac{\sigma^2_u}{\sigma^2_u + \sigma^2_v + (\sigma^2_\pi + \mu^2_\pi)\sigma^2_\omega} \tag{9}$$

And:

$$(1 - \mathrm{ICC}) = \frac{\sigma^2_v + (\sigma^2_\pi + \mu^2_\pi)\sigma^2_\omega}{\sigma^2_u + \sigma^2_v + (\sigma^2_\pi + \mu^2_\pi)\sigma^2_\omega} = R + \frac{(\sigma^2_\pi + \mu^2_\pi)\sigma^2_\omega}{\sigma^2_u + \sigma^2_v + (\sigma^2_\pi + \mu^2_\pi)\sigma^2_\omega} \tag{10}$$

The key point here is that $(1 - \mathrm{ICC})$ provides an upper bound on the reliability $R$ of the financial literacy measure, and that this quantity can be estimated with a multilevel model or analysis of variance.

Next, we explore the implications of these results for substantive regressions that contain financial literacy as the independent variable of interest. Suppose the equation of interest is given by:

$$y_{ij} = \beta_0 + \alpha FL^*_{ij} + e_{ij} \tag{11}$$

where $y_{ij}$ is typically a measure of financial behaviour, such as stock market participation. We assume that $y_{ij}$ is well measured, though we will return to this assumption below. Substituting measured financial literacy for true financial literacy gives:

$$y_{ij} = \beta_0 + \alpha(FL_{ij} - \pi_j\omega_i - u_j) + e_{ij} \tag{12}$$
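As a rough check on the variance decomposition in (3) to (10), the following sketch simulates responses from model (1) and estimates the ICC by one-way analysis of variance. The variance components, the two-point distribution for $\pi_j$, the overall mean and all function names are illustrative assumptions, not quantities taken from the PHF data.

```python
import random


def population_icc_and_reliability(var_u, var_v, var_w, pi_values):
    """Population ICC, equation (9), and reliability R, equation (4)."""
    # E[pi_j^2] = sigma_pi^2 + mu_pi^2 when pi_j is drawn uniformly from pi_values.
    pi_second_moment = sum(p * p for p in pi_values) / len(pi_values)
    total = var_u + var_v + pi_second_moment * var_w  # equation (3)
    return var_u / total, var_v / total


def simulate_scores(n_interviewers, n_per, var_u, var_v, var_w, pi_values, seed=0):
    """Draw FL_ij = theta + v_i + pi_j * omega_i + u_j, one list per interviewer."""
    rng = random.Random(seed)
    theta = 2.0  # hypothetical overall mean
    groups = []
    for _ in range(n_interviewers):
        u_j = rng.gauss(0.0, var_u ** 0.5)
        pi_j = rng.choice(pi_values)
        groups.append([
            theta + rng.gauss(0.0, var_v ** 0.5)
            + pi_j * rng.gauss(0.0, var_w ** 0.5) + u_j
            for _ in range(n_per)
        ])
    return groups


def anova_icc(groups):
    """One-way ANOVA estimate of the ICC for equally sized groups."""
    k, n = len(groups), len(groups[0])
    means = [sum(g) / n for g in groups]
    grand = sum(means) / k
    msb = n * sum((m - grand) ** 2 for m in means) / (k - 1)
    msw = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g) / (k * (n - 1))
    var_between = max((msb - msw) / n, 0.0)
    return var_between / (var_between + msw)


if __name__ == "__main__":
    params = dict(var_u=0.2, var_v=0.5, var_w=0.3, pi_values=[0.5, 1.5])
    icc, reliability = population_icc_and_reliability(**params)
    icc_hat = anova_icc(simulate_scores(400, 25, seed=1, **params))
    print(f"population ICC {icc:.3f}, ANOVA estimate {icc_hat:.3f}")
    print(f"1 - ICC = {1 - icc:.3f} >= R = {reliability:.3f}")
```

With these assumed values the gap between $1 - \mathrm{ICC}$ and $R$ is large, because individual reporting error is sizeable relative to true heterogeneity.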

The substantive regression is thus subject to the usual measurement error problem that the independent variable is correlated with components of the error term. Let $\hat\alpha$ be the OLS estimate of $\alpha$ in Equation (12). It is straightforward to show that the coefficient estimate is attenuated:

$$\operatorname{plim}\hat\alpha = \alpha\frac{\sigma^2_v}{\sigma^2_u + \sigma^2_v + (\sigma^2_\pi + \mu^2_\pi)\sigma^2_\omega} = \alpha R \tag{13}$$

Interviewer effects lead to asymptotic bias in the point estimate of the effect of financial literacy on the outcome of interest.

Interviewer effects are often thought of as being similar to design effects or clustering of respondents at sampling points.⁵ As such, they are seen as a challenge to inference: variance estimation must account for the correlation structure. However, with this framework, we emphasize that interviewer effects can also compromise point estimates because they affect the reliability of the measure of a quantity whose effects are being studied (in our case, financial literacy). Note that correlated true heterogeneity in financial literacy (in $v_i$), as might arise from complex sample designs, does not directly affect the reliability and hence does not lead to inconsistent estimates.⁶

⁵ Many large surveys are multistage, clustered samples. Respondents are drawn from, and therefore clustered in, small geographical areas which are selected in a first stage of sampling. These geographical areas are often called ‘sampling points’. Sampling points might be small municipalities; in large municipalities, the sampling points might be streets or blocks. See also Schnell and Kreuter (2005).

⁶ Note however that, for a given level of response error, if a complex sampling design led to lower total variance of true financial literacy (smaller $\sigma^2_v$), this would lead to lower reliability.

It is well known that in the presence of measurement error in an independent variable, rescaling the least-squares estimate by the reliability of the measure gives a better estimate (see Goldstein and French (2015) for a recent example). Here rescaling by $1/(1-\mathrm{ICC})$ improves the estimate because $(1-\mathrm{ICC})$ gives an upper bound on the reliability:

$$\operatorname{plim}\frac{\hat\alpha}{1-\mathrm{ICC}} = \alpha\frac{\sigma^2_v}{\sigma^2_v + (\sigma^2_\pi + \mu^2_\pi)\sigma^2_\omega} \tag{14}$$

and:

$$\frac{\sigma^2_v}{\sigma^2_v + (\sigma^2_\pi + \mu^2_\pi)\sigma^2_\omega} > \frac{\sigma^2_v}{\sigma^2_u + \sigma^2_v + (\sigma^2_\pi + \mu^2_\pi)\sigma^2_\omega} \tag{15}$$

Intuition may be helped by special cases. If $\pi_j = 1$, then $\frac{1}{1-\mathrm{ICC}} = \frac{\sigma^2_v + \sigma^2_\omega + \sigma^2_u}{\sigma^2_v + \sigma^2_\omega}$, $\operatorname{plim}\hat\alpha = \alpha\frac{\sigma^2_v}{\sigma^2_v + \sigma^2_\omega + \sigma^2_u}$ and $\operatorname{plim}\frac{\hat\alpha}{1-\mathrm{ICC}} = \alpha\frac{\sigma^2_v}{\sigma^2_v + \sigma^2_\omega}$. Again $(1-\mathrm{ICC})$ gives an upper bound on the attenuation and $\frac{1}{1-\mathrm{ICC}}$ gives a lower bound to the required correction factor. In particular, note that $\frac{1-\mathrm{ICC}}{R} = \frac{\sigma^2_v + \sigma^2_\omega}{\sigma^2_v}$. Recall that $\sigma^2_v$ is the true variation in financial literacy and $\sigma^2_\omega$ is the variance of individual reporting errors. So, for example, if the variance of individual reporting errors were 20% of the true variance of financial literacy, $(1-\mathrm{ICC})$ would overstate the reliability ratio by 20%, and so on. If, further, there is no individual component to reporting error, $\sigma^2_\omega = 0$ and $\frac{1}{1-\mathrm{ICC}} = \frac{\sigma^2_v + \sigma^2_u}{\sigma^2_v} = 1/R$. In this case, $\operatorname{plim}\frac{\hat\alpha}{1-\mathrm{ICC}} = \alpha$. This is the textbook, classical measurement error case, and in this case rescaling by $\frac{1}{1-\mathrm{ICC}}$ eliminates the asymptotic bias.

This analysis tells us that dividing the least-squares estimate by one minus the ICC will improve the estimates; it will do so more effectively if interviewers have little moderating effect on individual reporting errors ($\pi_j$ close to 1 for all $j$); and it will completely offset the attenuation if there is no individual component to reporting error.
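The special cases above can be worked through numerically. The sketch below evaluates the probability limits in (13) and (14) for given variance components; the function name and the numbers are illustrative assumptions, with the first case chosen so that the variance of individual reporting errors is 20% of the true variance.

```python
def attenuation_summary(alpha, var_u, var_v, var_w, pi_second_moment=1.0):
    """Probability limits of the OLS and ICC-rescaled estimators.

    pi_second_moment stands for sigma_pi^2 + mu_pi^2 (equal to 1 when pi_j = 1).
    """
    noise = pi_second_moment * var_w
    total = var_u + var_v + noise       # equation (3)
    reliability = var_v / total         # equation (4)
    icc = var_u / total                 # equation (9)
    return {
        "reliability": reliability,
        "one_minus_icc": 1.0 - icc,
        "plim_ols": alpha * reliability,                      # equation (13)
        "plim_rescaled": alpha * reliability / (1.0 - icc),   # equation (14)
    }


if __name__ == "__main__":
    # Case 1: pi_j = 1 and individual reporting error equal to 20% of true variance.
    partial = attenuation_summary(alpha=1.0, var_u=0.3, var_v=1.0, var_w=0.2)
    # Case 2: no individual reporting error (classical case).
    classical = attenuation_summary(alpha=1.0, var_u=0.3, var_v=1.0, var_w=0.0)
    for label, result in (("partial", partial), ("classical", classical)):
        print(label, {key: round(value, 3) for key, value in result.items()})
```

In the first case the rescaled estimator moves closer to $\alpha$ without reaching it; in the second, rescaling removes the attenuation entirely.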

Estimation is also improved by any ‘within’ transformation that eliminates the interviewer effect. For example, one can take deviations of individual responses from interviewer means (or equivalently condition on interviewer dummies). Denoting deviations from interviewer means by $\Delta_j$, we have:

$$y_{ij} = \beta_0 + \alpha(FL_{ij} - \pi_j\omega_i - u_j) + e_{ij} \tag{16}$$

$$\Delta_j y_{ij} = 0 + \alpha(\Delta_j FL_{ij} - \pi_j\Delta_j\omega_i - 0) + \Delta_j e_{ij} \tag{17}$$

Let $\hat\alpha_\Delta$ be the estimate of $\alpha$ obtained from least-squares estimation of the transformed equation. Our independence assumptions imply that $E[v_i \mid j] = E[v_i] = 0$ and similarly for $\omega_i$, so that $\sigma^2_{\Delta v} = \sigma^2_v$ and similarly for $\omega_i$. It is then straightforward to show that:

$$\operatorname{plim}\hat\alpha_\Delta = \alpha\frac{\sigma^2_v}{\sigma^2_v + (\sigma^2_\pi + \mu^2_\pi)\sigma^2_\omega} \tag{18}$$

Thus $\hat\alpha_\Delta$ suffers from less attenuation than the untransformed estimator and in fact has the same probability limit as the untransformed estimator scaled by $\frac{1}{1-\mathrm{ICC}}$. This means that researchers can improve estimates by sweeping out interviewer effects if interviewer identifiers are available, or by scaling estimates by a factor that depends on the ICC, and these should have the same effect. A comparison of these two procedures therefore provides a possible test of the measurement error assumptions laid out above.

2.1 | Adding covariates

The model of interest would typically also involve other covariates, $X$, and as usual in models with measurement error, the bias is more complicated in the presence of additional covariates. In this case, we have

$$y = \alpha FL^* + X\beta + e \tag{19}$$

where $X$ can be a set of covariates (a matrix) and we assume these are well measured. Note that the constant vector for the intercept is now subsumed in the matrix $X$ and the intercept parameter, $\beta_0$, is subsumed in the parameter vector $\beta$. As before assume $FL = FL^* + v$ with $v$ independent of $FL^*$, $X$ and $e$. Then:

$$\hat\alpha_{OLS} = [FL' M_X FL]^{-1}[FL' M_X y] = [(FL^* + v)' M_X (FL^* + v)]^{-1}[(FL^* + v)' M_X (\alpha FL^* + X\beta + e)] \tag{20}$$

where $M_X = I - X(X'X)^{-1}X'$, and so

$$\operatorname{plim}(\hat\alpha_{OLS}) = [V(M_X FL^*) + V(v)]^{-1}[\alpha V(M_X FL^*)] \tag{21}$$

Note that if $V(v) = 0$ (there is no measurement error) then $\operatorname{plim}(\hat\alpha_{OLS}) = \alpha$. Also, if $M_X FL^* = FL^*$ (if $FL^*$ is orthogonal to $X$) then

$$\operatorname{plim}(\hat\alpha_{OLS}) = \alpha[V(FL^*) + V(v)]^{-1}[V(FL^*)] = \alpha R \tag{22}$$

where $R$ is the reliability of $FL$. However, if $FL^*$ is not orthogonal to $X$ then the appropriate rescaling involves an adjustment for $X$, and will vary with the choice of covariates. In that case, the appropriate ICC can be obtained either by adding the same $X$ covariates to the model of observed $FL$ used to calculate the ICC, or (equivalently) by pre-purging observed $FL$ with the orthogonal projection matrix $M_X$, and then proceeding to calculate the ICC.
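The pre-purging step can be sketched for the simplest case of a constant and a single covariate. The data below are invented for illustration (three hypothetical interviewers whose respondents differ in years of schooling), and the helper names are assumptions; the point is only that an ICC computed on raw scores can reflect covariate clustering, while the ICC computed on the residuals $M_X FL$ does not.

```python
def residualize(values, covariate):
    """Residuals from an OLS regression of values on a constant and one covariate."""
    n = len(values)
    mean_x = sum(covariate) / n
    mean_y = sum(values) / n
    sxx = sum((x - mean_x) ** 2 for x in covariate)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(covariate, values))
    slope = sxy / sxx
    return [y - mean_y - slope * (x - mean_x) for x, y in zip(covariate, values)]


def anova_icc(values, groups):
    """One-way ANOVA estimate of the ICC for equally sized groups, from flat lists."""
    by_group = {}
    for g, x in zip(groups, values):
        by_group.setdefault(g, []).append(x)
    cells = list(by_group.values())
    k, n = len(cells), len(cells[0])
    means = [sum(c) / n for c in cells]
    grand = sum(means) / k
    msb = n * sum((m - grand) ** 2 for m in means) / (k - 1)
    msw = sum((x - m) ** 2 for c, m in zip(cells, means) for x in c) / (k * (n - 1))
    var_between = max((msb - msw) / n, 0.0)
    return var_between / (var_between + msw)


# Invented data: scores depend on schooling, and schooling is clustered by interviewer.
interviewer = ["A"] * 4 + ["B"] * 4 + ["C"] * 4
schooling = [8, 8, 9, 9, 12, 12, 13, 13, 16, 16, 17, 17]
noise = [0.1, -0.1, 0.1, -0.1] * 3
score = [0.1 * x + e for x, e in zip(schooling, noise)]

print("ICC of raw scores:     ", round(anova_icc(score, interviewer), 3))
print("ICC after partialling: ", round(anova_icc(residualize(score, schooling), interviewer), 3))
```

With several covariates the same idea applies, with the residuals taken from a regression on the full matrix $X$.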

2.2 | Interviewer effects on outcomes

Now consider the case where the dependent variable is subject to interviewer effects. To ease exposition, we consider the special case where there is no individual variation in response errors (in either financial literacy or the dependent variable), and there are no additional covariates. We now have:

$$y^*_{ij} = \beta_0 + \alpha FL^*_{ij} + e_{ij} \tag{23}$$

Observed financial literacy ($FL_{ij}$) and outcomes ($y_{ij}$) are given by

$$FL_{ij} = FL^*_{ij} + u_j, \qquad y_{ij} = y^*_{ij} + u^y_j \tag{24}$$

where $u^y_j$ is the interviewer effect on reporting of $y_{ij}$. Combining (23) and (24) gives:

$$y_{ij} = \beta_0 + \alpha(FL_{ij} - u_j) + u^y_j + e_{ij} \tag{25}$$

As before let $\hat\alpha$ be the OLS estimate of $\alpha$ in this equation. If misreporting of $y_{ij}$ and $FL_{ij}$ is uncorrelated then interviewer effects on the reporting of $y_{ij}$ do not cause any additional asymptotic bias:

$$\operatorname{plim}\hat\alpha = \alpha\frac{\sigma^2_v}{\sigma^2_u + \sigma^2_v} = \alpha R_{FL} \tag{26}$$

As above, with no individual variation in misreporting, $R_{FL} = \frac{\sigma^2_v}{\sigma^2_u + \sigma^2_v} = (1 - \mathrm{ICC}_{FL})$ and $\operatorname{plim}\frac{\hat\alpha}{1-\mathrm{ICC}_{FL}} = \alpha$. And as before, a within estimator is also consistent: $\operatorname{plim}\hat\alpha_\Delta = \alpha$. The intuition is that under classical assumptions, measurement error in a dependent variable does not cause inconsistency. It is important to note that the grouped structure (at the interviewer level) of the measurement error in $y$ ($u^y_j$) nevertheless leads to a grouped error structure in Equation (25), and this will have consequences for inference. The correlation of errors within interviewers is analogous to the design effect induced by cluster-sampling, and can be treated with similar methods. For further discussion, see Fischer et al. (2019).

The more interesting case is when interviewer effects on financial literacy and outcomes are correlated. Denoting $\operatorname{cov}(u^y_j, u_j)$ by $\sigma_{uu}$, we then have:

$$\operatorname{plim}\hat\alpha = \alpha\frac{\sigma^2_v + \sigma_{uu}}{\sigma^2_u + \sigma^2_v} = \alpha\left(1 - \mathrm{ICC}_{FL} + \frac{\sigma_{uu}}{\sigma^2_u + \sigma^2_v}\right), \qquad \operatorname{plim}\left(\frac{\hat\alpha}{1 - \mathrm{ICC}_{FL}}\right) = \alpha\left(1 + \frac{\sigma_{uu}}{\sigma^2_v}\right) \tag{27}$$

Thus the correction derived from the intra-interviewer correlation of financial literacy may be too large or too small, depending on the sign of the covariance $\sigma_{uu}$ (a quantity about which we do not have a strong prior). Note though, that in this case, as long as there is variation in financial literacy within interviewers, the within estimator is a superior alternative. As before, denote a within-estimator transformation (for example, deviation from interviewer means) by $\Delta$ and note that $\Delta u^y_j = \Delta u_j = 0$ so that $\Delta FL_{ij} = \Delta FL^*_{ij}$ and $\Delta y_{ij} = \Delta y^*_{ij}$. Thus, $\operatorname{plim}\hat\alpha_\Delta = \alpha$. The intuition for this result is that the within transformation eliminates interviewer effects on the reporting of both $y_{ij}$ and $FL_{ij}$. Thus a difference between the rescaled and within estimates could be indicative of interviewer effects in the outcome $y_{ij}$ (though it could alternatively be evidence against one or more of the other assumptions about the measurement error structure as noted above). Again, in inference, appropriate allowance must be made for the resulting grouped structure of the error term.
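A small simulation can illustrate the contrast between the rescaled and within estimators when interviewer effects on the outcome and on financial literacy are correlated. Everything below is an assumed setup for illustration: $\alpha = 1$, unit variances for true literacy and the outcome error, interviewer-effect variance 0.5, and an outcome effect equal to twice the literacy effect (so the covariance is positive). Function names are hypothetical.

```python
import random


def simulate(n_interviewers=300, n_per=10, alpha=1.0, seed=0):
    """Return (interviewer, observed FL, observed y) triples."""
    rng = random.Random(seed)
    rows = []
    for j in range(n_interviewers):
        u_fl = rng.gauss(0.0, 0.5 ** 0.5)  # interviewer effect on FL (variance 0.5)
        u_y = 2.0 * u_fl                   # correlated interviewer effect on y (assumption)
        for _ in range(n_per):
            fl_true = rng.gauss(0.0, 1.0)
            y_true = alpha * fl_true + rng.gauss(0.0, 1.0)
            rows.append((j, fl_true + u_fl, y_true + u_y))
    return rows


def slope(xs, ys):
    """Least-squares slope of ys on xs (with an intercept)."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    return sxy / sxx


def demean_by_group(ids, values):
    """Deviations of each value from its interviewer mean."""
    totals, counts = {}, {}
    for g, x in zip(ids, values):
        totals[g] = totals.get(g, 0.0) + x
        counts[g] = counts.get(g, 0) + 1
    return [x - totals[g] / counts[g] for g, x in zip(ids, values)]


def compare_estimators(rows, n_per):
    ids = [r[0] for r in rows]
    fl = [r[1] for r in rows]
    y = [r[2] for r in rows]
    fl_within = demean_by_group(ids, fl)
    y_within = demean_by_group(ids, y)
    # ANOVA variance components of observed FL (equal group sizes assumed).
    k = len(set(ids))
    grand = sum(fl) / len(fl)
    group_means = [x - d for x, d in zip(fl, fl_within)]  # each row's interviewer mean
    msb = sum((m - grand) ** 2 for m in group_means) / (k - 1)
    msw = sum(d * d for d in fl_within) / (len(fl) - k)
    var_between = max((msb - msw) / n_per, 0.0)
    icc = var_between / (var_between + msw)
    ols = slope(fl, y)
    return {"ols": ols, "rescaled": ols / (1.0 - icc), "within": slope(fl_within, y_within)}


if __name__ == "__main__":
    estimates = compare_estimators(simulate(seed=3), n_per=10)
    print({name: round(value, 3) for name, value in estimates.items()})
```

Under these assumptions the within estimate should sit close to the true value of 1, while the rescaled estimate overshoots because the correction ignores the positive covariance between the two interviewer effects.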

2.3 | Empirical implementation

In our empirical analysis, we will allow that response errors associated with respondents might be predicted by individual or household characteristics ($X_i$); similarly, response errors associated with interviewers might be associated with interviewer characteristics ($Z_j$). In addition, conditional on financial literacy, individual or household characteristics may predict financial choices (capturing heterogeneity in choice sets or preferences). Thus, our full framework has both an outcome model (for $y_i$) and a measurement model (for financial literacy), and the latter has a multilevel structure:

$$\begin{aligned} y_i &= \alpha FL^*_i + X_i\beta + e_i \\ FL_{ij} &= FL^*_i + \pi_j\omega_i + u_j \\ FL^*_i &= X_i\theta + v_i \\ \omega_i &= X_i\gamma_\omega + \tilde\omega_i \\ u_j &= Z_j\gamma_u + \tilde u_j \\ (\pi_j)^2\sigma^2_\omega &= \exp(Z_j\gamma_\pi + \tilde\pi_j) \end{aligned} \tag{28}$$

In our empirical analysis reported below, we begin by estimating the ICC for financial literacy questions and a financial literacy score. We then report estimates of $\gamma_u$ and $\gamma_\pi$, which capture the relationship between interviewer characteristics and response errors. Finally, we investigate how estimates of $\alpha$ are affected by adjusting for the reliability of the financial literacy score or by using a ‘within’ interviewer estimator that eliminates or conditions out the interviewer-specific intercepts.

3 | DATA

The German ‘Panel on Household Finances (PHF)’ is a face-to-face CAPI survey focused on measuring household wealth. It was carried out on behalf of the Deutsche Bundesbank by infas GmbH Bonn in 2010/2011. A detailed description of the survey can be found in Altmann et al. (2020). The PHF field phase consisted of two major parts. In our study, we include only the 1,705 interviews from the first part, since in the second part the allocation of selected households/addresses to interviewers was not completely random. The first part of the PHF field phase started in September 2010 and lasted until February 2011. In part one, 178 interviewers conducted at least one interview. For the analysis, we only consider interviewers with at least three interviews, which reduces our sample to 160 interviewers.

The PHF survey provides a representative picture of the population of non-institutionalized households in Germany and focuses mainly on their financial and non-financial assets and liabilities (secured and unsecured debt).⁷ It also collects information on income, employment and pensions. The core questionnaire programme is supplemented with, among others, questions about financial literacy. It includes the standard questions on interest rate compounding, the effect of inflation and diversification of securities developed by Lusardi and Mitchell (2008):

(1) Let us assume that you have a balance of €100 on your savings account. This balance bears interest at a rate of 2% per year and you leave it for 5 years on this account. How high do you think your balance will be after 5 years?

1 - More than €102
2 - Exactly €102
3 - Less than €102
-1 - Don't know
-2 - No answer
-3 - Question filtered

(2) Let us assume that your savings account bears interest at a rate of 1% per year and the rate of inflation is 2% per year. Do you think that in one year's time the balance on your savings account will buy the same as, more than or less than today?

1 - More
2 - The same
3 - Less than today
-1 - Don't know
-2 - No answer
-3 - Question filtered

(3) Do you agree with the following statement: "Investing in shares of one company is less risky than investing in a fund containing shares of similar companies"?

1 - Agree
2 - Disagree
-1 - Don't know
-2 - No answer
-3 - Question filtered

The questions are administered to one person in each household, the reference person, who is selected as being the most knowledgeable on the household's finances. We follow Bucher-Koenen and Lusardi (2011) and Bucher-Koenen and Ziegelmeyer (2014) and aggregate the answers of this reference person to these three questions into a ‘financial literacy score’, which is the number of correct answers. Missing values (DK/NA) are treated as incorrect responses in the baseline specification (as either an incorrect response or a ‘Don't know’ indicates a lack of knowledge). Figure 1 below shows that almost 68% of respondents in our sample provide correct answers to all three literacy questions, 23% get two right and 7.7% one. About 10% have missing values for at least one question (full details are in the Appendix).

The survey is accompanied by a large collection of auxiliary data, among them the interviewers’ id-numbers, as well as interviewers’ gender, age and level of education. From the field work contact protocols and the survey data, we can construct indicators of interviewers’ contact behaviour (average number of contact attempts per case), their performance (number of successful interviews as a share of addresses issued) and two indicators of quality (percentage of DK/NA responses to all survey questions; interview-time in seconds per item/question). All these indicators are meant to characterize the interviewer in general and are therefore calculated based on information from both part one and two of the PHF survey. Summary statistics for interviewer characteristics are provided in the Appendix.

⁷ However, in the analysis that follows we employ unweighted samples, as our interest is in the effect of these interviewers on this sample.
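The baseline scoring rule (number of correct answers, with ‘Don't know’ and ‘No answer’ counted as incorrect) can be sketched as follows. The item names are hypothetical, and the codes treated as correct are inferred from the content of the three questions rather than stated in the text. How the ‘Question filtered’ code (-3) is handled is not described here; the sketch simply does not count it as correct, which is an assumption.

```python
# Response codes counted as correct, inferred from the question wording:
# interest compounding -> 1 (more than EUR 102), inflation -> 3 (less than today),
# diversification -> 2 (disagree). Item names are hypothetical.
CORRECT_CODES = {"interest": 1, "inflation": 3, "diversification": 2}


def financial_literacy_score(answers):
    """Number of correct answers (0 to 3).

    Any other code, including -1 (don't know), -2 (no answer) and a missing
    item, is counted as incorrect.
    """
    return sum(1 for item, code in CORRECT_CODES.items() if answers.get(item) == code)


def all_correct(answers):
    """Binary indicator equal to 1 if all three items are answered correctly."""
    return int(financial_literacy_score(answers) == len(CORRECT_CODES))


if __name__ == "__main__":
    respondent = {"interest": 1, "inflation": -1, "diversification": 2}
    print(financial_literacy_score(respondent), all_correct(respondent))
```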

To mitigate the possibility that within-interviewer correlations reflect clustering of similar households at sample points, we also control for various observable factors that have been found to be related to financial literacy, as reported, for example, in Lusardi and Mitchell (2014). We include the respondent's personal socio-demographic characteristics, that is, age, gender, education, employment status and nationality, as well as household characteristics, that is, household size and household income.

To put the estimated size of the interviewer effects on financial literacy into perspective, we also estimate interviewer effects for a number of additional survey measures, for example an 11-point Likert scale question on life satisfaction, a question on total household net income (in Euro) and a question about qualitative inflation expectations (‘change in the general price level’) in the next 12 months.

Finally, we estimate substantive regressions with financial literacy as an explanatory variable. Our choice of models and dependent variables for this exercise follows some of the most prominent examples in the literature: Christelis, Jappelli, and Padula (2010) showed that more financially literate individuals are more likely to own stocks. Bucher-Koenen and Lusardi (2011) showed a positive effect of financial literacy on retirement planning. Van Rooij et al. (2012) showed that financial literacy is positively related to retirement planning and the development of a savings plan and to stockholding, and through these channels it has a positive effect on wealth accumulation.

4 | RESULTS

4.1 | The magnitude and correlates of interviewer effects

We now implement the statistical model developed in Section 2 above. The starting point is the equation for financial literacy (Equation 29).

FIGURE 1 Number of correctly answered financial literacy questions. [Bar chart; axis from 0% to 80%. Category labels recovered: all three questions, two questions, one question. Bar values recovered: 67.7%, 22.9%, 7.7%, 1.6%.]


We begin by documenting the size of interviewer effects on the financial literacy score and individual financial literacy items, relative to other typical questions in a household finance survey. Our measure of the size of interviewer effects is the ICC, which we compute from multilevel models. As described in Section 2, the ICC is the ratio of the variance of interviewer effects to the total unexplained variance of the financial literacy score. Our results are presented in Table 1. We estimate linear multilevel models for each variable considered; for binary variables, we also estimate a logit multilevel model for comparison.8
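As a rough illustration of the ICC as a variance ratio, the sketch below simulates scores with an additive interviewer effect and recovers the ICC with the one-way ANOVA (method-of-moments) estimator. This is not the maximum-likelihood multilevel estimation used for Table 1; the function names, the simulated variances (0.2 and 0.8) and the workload of 20 respondents per interviewer are all assumptions made for the example.

```python
import random
from statistics import mean


def anova_icc(groups):
    """One-way ANOVA (method-of-moments) estimate of the ICC.

    groups: one list of scores per interviewer.
    """
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total
    group_means = [mean(g) for g in groups]

    # Between- and within-interviewer mean squares.
    ss_between = sum(len(g) * (m - grand_mean) ** 2
                     for g, m in zip(groups, group_means))
    ss_within = sum((x - m) ** 2
                    for g, m in zip(groups, group_means) for x in g)
    ms_between = ss_between / (k - 1)
    ms_within = ss_within / (n_total - k)

    # Effective workload per interviewer (allows unequal group sizes).
    n0 = (n_total - sum(len(g) ** 2 for g in groups) / n_total) / (k - 1)

    var_u = max((ms_between - ms_within) / n0, 0.0)  # truncate at zero
    return var_u / (var_u + ms_within)


def simulate_scores(n_interviewers=200, per_interviewer=20,
                    var_u=0.2, var_e=0.8, seed=1):
    """Hypothetical data: score = interviewer effect + respondent noise."""
    rng = random.Random(seed)
    groups = []
    for _ in range(n_interviewers):
        u_j = rng.gauss(0.0, var_u ** 0.5)
        groups.append([u_j + rng.gauss(0.0, var_e ** 0.5)
                       for _ in range(per_interviewer)])
    return groups


icc = anova_icc(simulate_scores())
print(f"estimated ICC: {icc:.3f} (simulated with a true ICC of 0.200)")
```

The estimate is truncated at zero because the between-interviewer mean square can fall below the within-interviewer mean square in small samples.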

The first column of Table 1 shows the ICC computed from linear multilevel regression models with no covariates (just constants). Starting from the top, the ICC for our financial literacy score indicates that 20% of the variance in this score in the PHF can be attributed to interviewers. The second row shows a similar result (19%) for an alternate binary score which is equal to one if all three financial literacy indicators were answered correctly. The third through fifth rows report the ICC for the individual financial literacy questions which contribute to the overall score. The ICC is large for all three questions, so the large ICC in the overall score is not driven by a single question. The ICC is largest for the interest rate question and smallest for the portfolio diversification question. The remaining rows report the ICC for typical variables from a household finance survey. ICCs for other questions are significantly smaller than for the financial literacy questions (though the p-values reported in the second column indicate that almost all are statistically different from zero).

Next we re-estimate the ICC adding a rich set of individual and household characteristics $X_i$ to the model (as in Equation 29). The results are in the third column of Table 1. There are two reasons to do this. First, one possible concern with the results in the first two columns of Table 1 is that the ICC is actually capturing within-interviewer correlation in true financial literacy (the $v$ in Section 2) as respondents are not randomly assigned to interviewers. In particular, interviewers are typically associated with specific sampling points that differ with respect to a variety of characteristics, so that it is possible that the interviewer effects we estimate are actually design effects.9 Alternatively, interviewers may have differential success in recruiting different kinds of respondent (as noted in the introduction) which again could generate homogeneity within the respondents of each interviewer. This would be an interviewer effect, but one driven by (differential) unit non-response rather than by differential measurement error. Second, as noted in Section 2, if we wish to use the ICC to adjust estimates from a model in which financial literacy is an independent variable, and that model contains individual and household characteristics $X_i$, then the appropriate ICC must be calculated net of those explanatory variables (again, unless interviewers are randomly assigned and so uncorrelated with household characteristics).

(29) $FL_{ij} = X_i\theta + v_i + \pi_j\omega_i + u_j$

8 The logit models are estimated using the melogit command in Stata (StataCorp, 2017), which fits mixed-effects models for binary or binomial responses. The conditional distribution of the response given the random effects is assumed to be Bernoulli, with success probability determined by the logistic cumulative distribution function (see, for example, Goldstein (2011), McCulloch, Searle, and Neuhaus (2008) or Rabe-Hesketh and Skrondal (2012)). In an alternative specification, the observed binary response can be written in terms of a latent linear response. The individual errors $\epsilon_{ij}$ are distributed as logistic with mean 0 and variance $(\mathrm{pi})^2/3$ and are independent of $u_j$ (where pi is the number, to distinguish it from the parameter $\pi_j$ introduced in Equation (1)). Hence, the intraclass correlation coefficient for this model becomes $R = \sigma_u^2 / \left((\mathrm{pi})^2/3 + \sigma_u^2\right)$.

9Studies that have used interpenetrating designs have found that interviewer effects are as large (O'Muircheartaigh and Campanelli, 1998) or larger (Schnell & Kreuter, 2005) than cluster/design/sampling point effects. See also Vassallo et al. (2017) for a discussion of separating interviewer and design effects.


TABLE 1 Intraclass correlation (interviewer effects) for financial literacy (FL) and other variables

Columns 1 to 4 are from the linear multilevel model (a); columns 5 to 8 are from the logit multilevel model (a).

| Variable | ICC, constant only | p-value of LR test (c) | ICC, with covariates (b) | p-value of LR test (c) | ICC, constant only | p-value of LR test (c) | ICC, with covariates (b) | p-value of LR test (c) |
|---|---|---|---|---|---|---|---|---|
| FL score: sum of correct answers | 19.7 | <0.01 | 17.0 | <0.01 | | | | |
| FL score: binary, 1 if all correct | 19.4 | <0.01 | 17.4 | <0.01 | 28.1 | <0.01 | 27.5 | <0.01 |
| Q1 interest rate | 21.8 | <0.01 | 20.8 | <0.01 | 39.0 | <0.01 | 38.6 | <0.01 |
| Q2 effects of inflation | 14.7 | <0.01 | 12.2 | <0.01 | 31.1 | <0.01 | 29.0 | <0.01 |
| Q3 diversification | 11.7 | <0.01 | 10.4 | <0.01 | 19.3 | <0.01 | 18.3 | <0.01 |
| Has saving accounts | 6.3 | <0.01 | 4.1 | <0.01 | 12.0 | <0.01 | 9.3 | <0.01 |
| Has mutual funds | 5.9 | <0.01 | 3.2 | <0.01 | 11.0 | <0.01 | 7.0 | <0.01 |
| Has bonds | 3.3 | <0.01 | 2.2 | 0.01 | 11.0 | <0.01 | 7.5 | 0.02 |
| Has shares | 8.5 | <0.01 | 6.6 | <0.01 | 15.4 | <0.01 | 13.2 | <0.01 |
| Has private pension plan | 5.0 | <0.01 | 4.3 | <0.01 | 7.1 | <0.01 | 7.6 | <0.01 |
| Price expectations | 6.1 | <0.01 | 5.0 | <0.01 | | | | |
| Has mortgage | 0.7 | 0.32 | 1.6 | 0.12 | 0.9 | 0.32 | 2.7 | 0.11 |
| Has credit cards | 11.1 | <0.01 | 6.5 | <0.01 | 14.7 | <0.01 | 11.3 | <0.01 |
| Has cars | 7.3 | <0.01 | 1.5 | 0.09 | 12.6 | <0.01 | 3.6 | 0.12 |
| Discretionary saving | 2.5 | <0.01 | 2.2 | 0.01 | 5.3 | <0.01 | 5.2 | 0.01 |
| Satisfaction with life | 6.0 | <0.01 | 2.9 | <0.01 | | | | |
| Self-assessment: risk | 4.4 | <0.01 | 3.5 | <0.01 | | | | |
| Self-assessment: patience | 2.5 | <0.01 | 2.5 | <0.01 | | | | |
| Visit to religious service | 7.2 | <0.01 | 6.6 | <0.01 | | | | |

a ICCs (in %) refer to interviewer effects and are estimated from a linear multilevel model and a logit specification, respectively. See text for further details.

b Individual/HH characteristics included (in model with covariates): reference person: born in Germany (dummy), female (dummy), age (<35, 35–44, 45–54, 54–64 and 65+), employment (1 = gainfully employed, 2 = self-employed and 3 = other), education (1 = low, 2 = medium and 3 = high); household characteristics: gross household income (quintiles), HH size (1, 2, 3 and 4+), stratum indicator.

c Values of p for the null hypothesis that the ICC is equal to zero, or, equivalently, that the variance of the interviewer effects is equal to zero, are based on a likelihood ratio test comparing the model with the variance component against the model without the variance component. In a case such as this, where there is one variance being set to 0 in the reduced model, the limiting distribution of the maximum-likelihood estimate of the parameter in question is a normal distribution that is truncated at the boundary (zero here) and the distribution of the LR test statistic is a 50:50 mixture of a chi-squared with no degrees of freedom (that is, a point mass at zero) and a chi-squared with 1 degree of freedom (see Self & Liang, 1987, or StataCorp, 2017, page 17).


Note that adding a rich set of covariates has almost no effect on the ICCs for the financial literacy variables. Design effects would arise primarily from the clustering of similar households at sampling points, and should therefore be diminished by controlling for a rich set of respondent observables. Consistent with this view, the ICCs for many other variables do fall significantly when we control for observables (see, e.g., car ownership and life satisfaction). As a further check, we also estimated a cross-classified model for the financial literacy score, allowing for both interviewer and sampling point effects. Without covariates this produced an ICC of 19.2 for interviewers (almost identical to the 19.7 reported in Table 1) and an ICC of 0.9 for sampling points. A cross-classified model with additional covariates resulted in an ICC of 16.7 for interviewers (almost identical to the 17.0 reported in Table 1) and an ICC of 2.2 for sampling points. All these findings support our interpretation of the ICC for financial literacy variables as measuring genuine interviewer effects.

The right-hand side of Table 1 repeats this analysis using logit multilevel models where the variable in question is binary. The key point is that the pattern of results is very similar, with ICCs for financial literacy items being much larger than for other common variables.
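The logit-model ICCs are defined on the latent scale, where the respondent-level error has the fixed variance of a standard logistic distribution (footnote 8). A minimal sketch of that conversion follows; the function names are illustrative, and only the formula itself comes from the text.

```python
import math

LOGISTIC_VARIANCE = math.pi ** 2 / 3  # variance of the standard logistic error


def logit_icc(var_u):
    """Latent-scale ICC for a random-intercept logit model."""
    return var_u / (LOGISTIC_VARIANCE + var_u)


def interviewer_variance_from_icc(icc):
    """Invert the formula: interviewer variance implied by a given ICC."""
    return icc * LOGISTIC_VARIANCE / (1.0 - icc)


# Example: the 39.0% logit ICC reported for Q1 (constant-only model) in Table 1.
implied_var_u = interviewer_variance_from_icc(0.390)
print(f"implied interviewer variance on the logit scale: {implied_var_u:.2f}")
```

Under this formula an ICC of 39.0% corresponds to an interviewer variance of roughly 2.1 on the logit scale. Because the residual variance is fixed rather than estimated, ICCs from the logit and linear specifications are not directly comparable in level, which is why the text compares the pattern across variables.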

Table 2 repeats this analysis for the non-response in the financial literacy questions. In Table 1, the Don't Knows/No Answers (DK/NA) are treated as incorrect answers. To check the effect of non-response, we constructed an indicator taking the value of 1 if the question had a DK/NA and 0 otherwise and computed the ICCs for these indicators. The ICCs are generally much smaller than those reported for the financial literacy questions in Table 1, except for the portfolio diversification question. This suggests that for the first two questions very little of the interviewer variation in Table 1 is driven by the DK/NA responses.

We next add interviewer-level characteristics to the multilevel model for the financial literacy score to investigate whether the variance attributable to interviewers is associated with observed interviewer characteristics. This could provide useful information to fieldwork agencies and survey managers. The results are presented in Table 3. We considered three different specifications. The first, in Column 1, includes only basic interviewer characteristics: gender, age and education. In Column 2, we add measures of interviewer behaviour and outcomes, and in Column 3, we add indicator variables for sampling strata. In all three models, we continue to include the rich set of respondent-level covariates. We find few significant interviewer-level predictors of the mean, with only interviewer age being statistically significant.

The statistical model in Section 2 allows for the possibility that interviewers moderate response errors, and do so with differing ability. This generates an interaction between individual mean responses and an interviewer parameter ($\pi_j\omega_i$); there are interviewer effects on the variance as well as the mean. Given our assumption that

(30) $\sigma_\omega^2$ and $\sigma_v^2$ are constant across interviewers $j$,

we can test for interviewer effects on the scale (or variance), $H_0: \pi_j = \pi$, by testing the equality of the variance of the residuals across interviewers:

(31) $H_0: (\pi_j)^2\sigma_\omega^2 + \sigma_v^2 = \pi^2\sigma_\omega^2 + \sigma_v^2 \quad \text{for all } j$

We implemented this test for the financial literacy score (the number of correct answers) and for each individual financial literacy item. In every case, the null is rejected at conventional levels of statistical significance. We interpret these results as evidence of interviewer effects on the variance (not just the mean), which in turn lends support to the idea that interviewers moderate individual response errors, perhaps with heterogeneous skill.
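The text does not say which test of equal residual variances was used. Purely as an illustration of the idea, the sketch below applies a permutation test to simulated residuals: the statistic is the dispersion of log residual variances across interviewers, and its null distribution is obtained by reshuffling residuals across interviewers. The choice of test, the function names and the simulated scales are all assumptions, and the residuals are assumed to be centred at zero within each interviewer.

```python
import math
import random


def sample_variance(xs):
    m = sum(xs) / len(xs)
    return sum((x - m) ** 2 for x in xs) / (len(xs) - 1)


def spread_of_log_variances(groups):
    """Dispersion of log residual variances across interviewers."""
    return sample_variance([math.log(sample_variance(g)) for g in groups])


def permutation_pvalue(groups, n_perm=500, seed=0):
    """P-value for H0: the residual variance is the same for every interviewer."""
    rng = random.Random(seed)
    observed = spread_of_log_variances(groups)
    pooled = [x for g in groups for x in g]
    sizes = [len(g) for g in groups]
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        start, shuffled = 0, []
        for n in sizes:  # re-cut the shuffled residuals into the same workloads
            shuffled.append(pooled[start:start + n])
            start += n
        if spread_of_log_variances(shuffled) >= observed:
            hits += 1
    return (hits + 1) / (n_perm + 1)


def simulate_residuals(scales, per_interviewer=25, seed=42):
    rng = random.Random(seed)
    return [[rng.gauss(0.0, s) for _ in range(per_interviewer)] for s in scales]


# 40 hypothetical interviewers whose residual s.d. ranges from 0.5 to 1.5.
scales = [0.5 + j / 39 for j in range(40)]
p_unequal = permutation_pvalue(simulate_residuals(scales))
print(f"p-value with unequal interviewer scales: {p_unequal:.3f}")
```

A small p-value indicates more variation in residual variance across interviewers than reshuffling alone would produce, which is the pattern the text interprets as interviewer effects on scale.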


TABLE 2 Intraclass correlations (interviewer effects) for non-response to financial literacy questions

Columns 1 to 4 are from the linear multilevel model (a); columns 5 to 8 are from the logit multilevel model (a).

| Don't know/no answer in: | ICC, constant only | p-value of LR test (c) | ICC, with covariates (b) | p-value of LR test (c) | ICC, constant only | p-value of LR test (c) | ICC, with covariates (b) | p-value of LR test (c) |
|---|---|---|---|---|---|---|---|---|
| At least one financial literacy question | 10.2 | <0.01 | 8.8 | <0.01 | 30.1 | <0.01 | 29.0 | <0.01 |
| Q1 interest rate | 2.9 | <0.01 | 2.3 | <0.01 | 24.6 | 0.01 | 25.8 | 0.01 |
| Q2 effects of inflation | 6.1 | <0.01 | 5.3 | <0.01 | 35.1 | <0.01 | 31.1 | <0.01 |
| Q3 diversification | 10.4 | <0.01 | 9.4 | <0.01 | 31.8 | <0.01 | 31.6 | <0.01 |

a ICCs (in %) refer to interviewer effects and are estimated from a linear multilevel model and a logit specification, respectively. See text for further details.

b Individual/HH characteristics included in model with covariates: reference person: born in Germany (dummy), female (dummy), age (<35, 35–44, 45–54, 54–64 and 65+), employment (1 = gainfully employed, 2 = self-employed and 3 = other), education (1 = low, 2 = medium and 3 = high); household characteristics: gross household income (quintiles), HH size (1, 2, 3 and 4+), stratum indicator.

c Values of p from a likelihood ratio test of whether the ICC is different from zero, testing the model with a variance component against the model without the variance component.


TABLE 3 Linear multilevel models for financial literacy score with interviewer characteristics (a)

Standard errors in parentheses.

| | Model 1 | Model 2 | Model 3 |
|---|---|---|---|
| INT: female | −0.085 (0.057) | −0.057 (0.056) | −0.061 (0.055) |
| INT: age 45–64 | 0.098 (0.060) | 0.065 (0.058) | 0.064 (0.057) |
| INT: age 65+ | 0.186** (0.071) | 0.183** (0.069) | 0.178** (0.068) |
| INT: medium education (‘Mittlere Reife’) | 0.043 (0.090) | 0.061 (0.086) | 0.048 (0.083) |
| INT: high education (‘Abitur, Hochschule’) | 0.07 (0.087) | 0.084 (0.082) | 0.064 (0.081) |
| INT: DK/NA percentage | | −5.167* (2.235) | −4.973* (2.186) |
| INT: number of contact attempts | | −0.000* (0.000) | −0.000* (0.000) |
| INT: share of interviews in number of addresses received | | −0.376 (0.322) | −0.149 (0.323) |
| INT: interview-time in seconds per item | | 0.002 (0.005) | 0.002 (0.005) |
| Stratum: wealthy small municipality | | | 0.139** (0.050) |
| Stratum: large municipality, wealthy street | | | 0.108 (0.056) |
| Stratum: large municipality, other street | | | 0.112* (0.055) |
| Household-level control variables | Included | Included | Included |
| Constant | 2.162*** (0.123) | 2.296*** (0.207) | 2.219*** (0.205) |
| Random effects: σ²u (interviewer) | 0.063*** (0.011) | 0.053*** (0.10) | 0.049*** (0.010) |
| Random effects: σ²ε (respondent) | 0.335*** (0.012) | 0.336*** (0.012) | 0.336*** (0.012) |
| N | 1,705 | 1,705 | 1,705 |
| Wald test | 196.24 | 213.59 | 226.62 |
| p-value | 0.00 | 0.00 | 0.00 |
| AIC | 3182.7 | 3179.5 | 3177.0 |
| BIC | 3318.7 | 3337.3 | 3351.1 |

a Linear multilevel model estimates. All regressions include individual/HH characteristics: reference person: born in Germany (dummy), female (dummy), age (<35, 35–44, 45–54, 54–64 and 65+), employment (1 = gainfully employed, 2 = self-employed and 3 = other), education (1 = low, 2 = medium, 3 = high); household characteristics: gross household income (quintiles), HH size (1, 2, 3 and 4+).


To explore these interviewer effects on scale further, we fit the logarithm of the squared residuals from the linear multilevel model for the financial literacy score (including covariates) reported in Table 1 to interviewer characteristics $Z_j$ (Equation 32) to estimate Equation (33). Under the maintained assumption that $\sigma_\omega^2$ is constant across interviewers, the coefficients $\gamma_\pi$ are proportional to the interviewer effects on variance. The results are reported in Table 4. We find more significant predictors of the interviewer effects on variance (scale) than we did for interviewer effects on mean (level). Interviewer age and percentage of DK/NA responses associated with the surveys conducted by an interviewer are statistically significant. Note, however, that we again fail to explain much of the cross-interviewer variation.
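The regression of log squared residuals on interviewer characteristics can be sketched on simulated data. In the example below the residual variance depends on a single binary interviewer characteristic; the parameter values, the variable names and the use of one regressor with closed-form least squares are assumptions for illustration, not the specification behind Table 4.

```python
import math
import random


def ols(x, y):
    """Slope and intercept from a simple least-squares fit of y on x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    slope = sxy / sxx
    return slope, my - slope * mx


def simulate(gamma0=-1.0, gamma1=-0.8, n_interviewers=200,
             per_interviewer=20, seed=7):
    """Residuals whose log variance is gamma0 + gamma1 * z_j."""
    rng = random.Random(seed)
    z, log_sq_resid = [], []
    for j in range(n_interviewers):
        z_j = j % 2  # hypothetical binary interviewer characteristic
        sd = math.exp((gamma0 + gamma1 * z_j) / 2)
        for _ in range(per_interviewer):
            r = rng.gauss(0.0, sd)
            z.append(z_j)
            log_sq_resid.append(math.log(r * r))
    return z, log_sq_resid


z, y = simulate()
gamma1_hat, intercept_hat = ols(z, y)
print(f"estimated gamma1: {gamma1_hat:.2f} (simulated with -0.80)")
```

With normal residuals, ln(r²) equals the log variance plus the log of a chi-squared(1) variable, whose mean is about −1.27. The intercept therefore sits below the true log variance, while the slope on the interviewer characteristic is unaffected, so the slope coefficients are the quantities to interpret.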

The bottom line from this analysis seems to be that there are significant interviewer effects, in both mean and variance, but we find little evidence of a relationship between these effects and observable interviewer characteristics.

4.2 | Estimating the effects of financial literacy on financial outcomes

We now consider the impact of these interviewer effects on estimates of the impact of financial literacy on a variety of financial outcomes; that is, we put the financial literacy measure on the right-hand side of ‘substantive’ regressions (corresponding to the first line in Equation 28). Recall that the statistical model developed in Section 2 implies two things. First, the interviewer effects lead to attenuation of estimates of the effects of financial literacy on financial behaviours and outcomes, and, so long as outcomes are well-measured, the degree of attenuation is independent of the outcome under study, $y_{ij}$. Second, the attenuation can be reduced either by sweeping out the interviewer effects with an appropriate within estimator (as in Equation 17, to give $\hat{\alpha}_\Delta$ above), or by rescaling the OLS estimates by the ICC (to give $\alpha/(1-\mathrm{ICC})$). Furthermore, under our maintained assumptions, either approach should give the same answer (as long as we deal with covariates appropriately). Thus comparing $\hat{\alpha}_\Delta$ and $\alpha/(1-\mathrm{ICC})$ to uncorrected estimates ($\alpha$) reveals the impact of the interviewer effects on estimates of the effect of financial literacy on behaviour and outcomes, while comparing $\hat{\alpha}_\Delta$ to $\alpha/(1-\mathrm{ICC})$ allows us to assess the adequacy of the statistical model developed in Section 2.
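The two corrections can be compared in a small simulation. The sketch below uses a deliberately simplified version of the model: no covariates, random assignment of respondents to interviewers, and a purely additive interviewer effect on measured financial literacy (no interviewer-specific scale). All parameter values and names are hypothetical, and the rescaling uses the ICC known from the simulation rather than an estimated one.

```python
import random


def slope(x, y):
    """Least-squares slope of y on x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx


def demean_by_group(values, groups):
    """Subtract each interviewer's mean (the 'within' transformation)."""
    totals, counts = {}, {}
    for v, g in zip(values, groups):
        totals[g] = totals.get(g, 0.0) + v
        counts[g] = counts.get(g, 0) + 1
    return [v - totals[g] / counts[g] for v, g in zip(values, groups)]


ALPHA, VAR_V, VAR_U = 0.5, 0.8, 0.2        # hypothetical parameters
TRUE_ICC = VAR_U / (VAR_V + VAR_U)         # 0.2 by construction

rng = random.Random(3)
fl_measured, outcome, interviewer = [], [], []
for j in range(200):                        # 200 interviewers
    u_j = rng.gauss(0.0, VAR_U ** 0.5)      # interviewer effect on measured FL
    for _ in range(20):                     # 20 respondents each
        v_i = rng.gauss(0.0, VAR_V ** 0.5)  # true financial literacy
        fl_measured.append(v_i + u_j)
        outcome.append(ALPHA * v_i + rng.gauss(0.0, 1.0))
        interviewer.append(j)

alpha_ols = slope(fl_measured, outcome)
alpha_rescaled = alpha_ols / (1 - TRUE_ICC)
alpha_within = slope(demean_by_group(fl_measured, interviewer),
                     demean_by_group(outcome, interviewer))
print(f"OLS {alpha_ols:.3f}, rescaled {alpha_rescaled:.3f}, "
      f"within {alpha_within:.3f}")
```

In this idealized setting the OLS slope is attenuated towards α(1 − ICC), and both corrections recover a value close to α. A gap between the two corrected estimates in real data is therefore informative about departures from these simplifying assumptions.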

Table 5 presents estimated FL effects from four models of the form given by Equation (34). The financial behaviours we consider are participation in mutual funds, participation in bonds, participation in equities and participation in a private pension. The first column gives OLS estimates from a linear model. The second column then corrects these estimates by dividing by one minus the estimated ICC from Table 1 (allowing for covariates). This of course raises the estimated effects, offsetting the presumed attenuation. By construction, the proportional correction is the same for each outcome.
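As a quick arithmetic check, column B of Table 5 can be reproduced from column A by dividing each OLS coefficient by (1 − ICC), consistent with the rescaling α/(1 − ICC) described above. The sketch assumes the relevant ICC is the 17.0% reported in Table 1 for the financial literacy score in the linear model with covariates; the variable names are illustrative.

```python
ICC_WITH_COVARIATES = 0.170  # Table 1: FL score, linear model with covariates

# Column A of Table 5 (published, rounded OLS coefficients).
ols_estimates = {
    "Has mutual funds": 0.062,
    "Has bonds": 0.032,
    "Has shares": 0.026,
    "Has private pensions": 0.020,
}

# Rescale each coefficient by 1 / (1 - ICC).
rescaled = {name: coef / (1 - ICC_WITH_COVARIATES)
            for name, coef in ols_estimates.items()}

for name, value in rescaled.items():
    print(f"{name}: {value:.3f}")
```

The first three values round to 0.075, 0.039 and 0.031, matching column B. For private pensions the same calculation gives about 0.024 rather than the published 0.025; a plausible explanation, though not one confirmed by the text, is that the published figure was computed from unrounded inputs.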

The third column of Table 5 then presents the $\hat{\alpha}_\Delta$ estimates from a ‘within’ estimator that eliminates interviewer intercepts by transforming each observation to a deviation from the overall mean for the relevant interviewer. In two of the four cases, the estimated effect increases, consistent with attenuation due to the interviewer effects. This suggests that using auxiliary data to condition on, or otherwise eliminate, interviewer effects is a useful strategy for minimizing the effects of measurement error in empirical studies of financial literacy. However, in no case is the magnitude of the change identical to re-scaling by the ICC. This suggests that the structure of measurement errors in the present data is even richer than we assume. As discussed in Section 2, there are reasons to favour the within estimator over rescaling in this circumstance.

(32) $\ln(\mathrm{residuals}^2) = Z_j\gamma_\pi \; (+\,\mathrm{error})$

(33) $\ln(\pi_j^2\sigma_\omega^2) = Z_j\gamma_\pi \; (+\,\mathrm{error})$

TABLE 4 Log-linear model estimates for within-interviewer variance, interviewer characteristics

| | Model 1 (a, b) | Model 2 (a, b) |
|---|---|---|
| INT: female | 0.381 (0.227) | 0.393 (0.223) |
| INT: age 45–64 | −0.097 (0.201) | 0.006 (0.199) |
| INT: age 65+ | −0.826** (0.292) | −0.600* (0.265) |
| INT: medium education (‘Mittlere Reife’) | −0.209 (0.361) | −0.052 (0.324) |
| INT: high education (‘Abitur, Hochschule’) | −0.237 (0.340) | −0.087 (0.310) |
| INT: DK/NA percentage | | 22.784** (7.240) |
| INT: number of contact attempts | | 0 (0.000) |
| INT: share of interviews in number of addresses received | | −0.399 (1.135) |
| INT: interview-time in seconds per item | | −0.004 (0.018) |
| Stratum: wealthy small municipality | | −0.764*** (0.201) |
| Stratum: large municipality, wealthy street | | −0.883*** (0.218) |
| Stratum: large municipality, other street | | −0.281 (0.208) |
| Constant | −2.327*** (0.406) | −2.275*** (0.655) |
| N | 1705 | 1705 |
| R2 | 0.039 | 0.075 |
| AIC | 7567.8 | 7516.3 |
| BIC | 7600.4 | 7587.1 |

a Standard errors in parentheses: * p < 0.05, ** p < 0.01, *** p < 0.001.

b The dependent variable is the logarithm of the squared residuals obtained from a multilevel model regressing the number of correct answers to the literacy questions on reference person: born in Germany (dummy), female (dummy), age (<35, 35–44, 45–54, 54–64 and 65+), employment (1 = gainfully employed, 2 = self-employed and 3 = other), education (1 = low, 2 = medium and 3 = high); household characteristics: gross household income (quintiles), HH size (1, 2, 3 and 4+), stratum indicator.


5 | CONCLUSION

We present a tractable model of measurement error arising from interviewer effects. We allow for interviewer effects on both the mean and variance of responses. The latter captures heterogeneity in the ability of interviewers to moderate respondent errors. The model clarifies how interviewer effects lead to intra-interviewer correlation of survey responses (similar to the well-known algebra of ICC). We further show that these correlations lead to biased regression coefficients when an independent variable is subject to interviewer effects. We derive a correction factor that rescales regression coefficients so that they are purged of the effects of intra-interviewer correlations. The model is straightforward to estimate if interviewer identifiers are included in the data, but in a regression context the correction factors depend on the covariates in the model (unless interviewers are randomly assigned). This limits the usefulness, for example, of a data producer releasing a single correction factor for users to employ.

Comparing the rescaled coefficient estimates to those obtained from a within-interviewer estimator provides an informal test of the validity of our response model. These approaches should give the same result if the multilevel response model we propose is correctly specified. In our data, this test fails. Both approaches suggest that the uncorrected estimates are attenuated, but the empirical difference in corrections implies that, at least in these data, the structure of response error is more complicated than allowed for by our model. It seems unlikely that such errors are amenable to econometric correction using instrumental variable techniques (which in their standard implementation require measurement error to be classical).

Our results might be specific to our data. Even though we believe that the financial literacy application on which we focus is very typical of situations in which interviewer effects might affect applied economic analysis, the importance of interviewer effects should be studied in other fields of applied research that use survey data as well. We thus urge applied researchers to estimate similar models with other data.

TABLE 5 Alternative estimates of the effects of financial literacy on financial behaviours

Standard errors in parentheses.

| Outcome (FL coefficient) | A: Estimated effect, basic model | B: Corrected with ICC from linear multilevel estimation | C: Estimated effect, within (interviewer) estimator |
|---|---|---|---|
| Has mutual funds | 0.062*** (0.014) | 0.075 | 0.055*** (0.016) |
| Has bonds | 0.032*** (0.008) | 0.039 | 0.036*** (0.010) |
| Has shares | 0.026* (0.015) | 0.031 | 0.026* (0.015) |
| Has private pensions | 0.020 (0.017) | 0.025 | 0.048*** (0.018) |

Interestingly, we find much larger interviewer effects for financial literacy questions than for a wide range of other questions in the same survey. Financial literacy questions are unusual in that they are testing the respondent's knowledge or ability, and in that it is very likely that the interviewer knows the correct answer, and thus is unusually able to guide the respondent. This is not the case, for example, with questions about a respondent's financial circumstances (the interviewer does not know the respondent's income or wealth). Surveying the literature on interviewer effects, West and Blom (2017, page 185) note that questions are susceptible to interviewer effects where interviewers have the opportunity to ‘intervene or assist respondents’ and identify ‘attitudinal, sensitive, ambiguous, complex and open-ended questions’ as examples identified by the previous literature. Our findings suggest that an important addition to this list may be questions that test respondent knowledge or skill, particularly where the interviewer may know the correct answer or learn it in the course of data collection. Other survey questions that share these characteristics include cognitive tests and measures of health literacy. Thus these should be priority areas for further investigation.

In any event, our theoretical analysis and our results suggest that every effort should be made to avoid measurement error arising from interviewer effects from the outset. An attempt to correct them after the fact seems to be feasible only in a best-case scenario in which restrictive assumptions on the process that generates interviewer effects hold and certain survey auxiliary data were released.10 Avoiding or reducing interviewer effects could be achieved either by altering interviewer behaviour or by moving away from personal interviews in favour of self-completion survey modes.

ACKNOWLEDGEMENTS

We thank Irini Moustaki and participants of the 2015 European Survey Research Conference (ESRA), the 2015 Household Finance and Consumption Conference, and seminars at the University of Essex and the London School of Economics for useful comments and suggestions. Crossley acknowledges support from the ESRC through the ESRC-funded Centre for Microeconomic Analysis of Public Policy at the Institute for Fiscal Studies (CPP, grant reference ES/M010147/1), through the Research Centre on Micro-Social Change (MiSoC) at the University of Essex, grant number ES/L009153/1, and through a grant to Essex University for ‘Understanding Household Finance through Better Measurement’ (reference ES/N006534/1). Winter acknowledges support from the Deutsche Forschungsgemeinschaft via SFB/TR 190.

DISCLAIMER

The views expressed in this paper represent the authors’ personal opinions and do not necessarily reflect the views of the Deutsche Bundesbank or its staff.

REFERENCES

Altmann, K., Bernard, R., Le Blanc, J., Gabor-Toth, E., Hebbat, M., Kothmayr L., Schmidt, T., Tzamourani, P., Werner, D., & Zhu, J. (2020). The Panel on Household Finances (PHF) – Microdata on household wealth in Germany.

German Economic Review, 21(3), 373–400. https://doi.org/10.1515/ger-2019-0122

Atkinson, A. & Messy, F. (2012). Measuring Financial Literacy: Results of the OECD / International Network on Financial Education (INFE) Pilot Study. OECD Working Papers on Finance, Insurance and Private Pensions, No. 15, OECD Publishing.

Biemer, P. (1980) A survey error model which includes edit and imputation error. Proceedings of the American Statistical Association, Section on Survey Research Methods, 616–621.

10Survey administrators may be unwilling or unable to release interview identifiers to data users. But they may be willing to release intra-class-correlations for key variables, including financial literacy. Some survey organizations are unwilling or unable to release cluster information but do release variance inflation factors to allow users to account for design effects. Releasing an estimate of intra-interviewer correlation would be similar. Note however that the exact correction required depends on covariates in the regression model (see Section 2).

(22)

Brunton-Smith, I., Sturgis, P. & Leckie, G. (2017) Detecting and understanding interviewer effects on survey data by using a cross-classified mixed effects location-scale model. Journal of the Royal Statistical Society, Series A, 180(2), 551–568. Bucher-Koenen, T. & Lusardi, A. (2011) Financial literacy and retirement planning in Germany. Journal of Pension

Economics and Finance, 10(4), 565–584.

Bucher-Koenen, T. & Ziegelmeyer, M. (2014) Once burned, twice shy? Financial literacy and wealth losses during the financial crisis. Review of Finance, 18(6), 2215–2246.

Christelis, D., Jappelli, T. & Padula, M. (2010) Cognitive abilities and portfolio choice. European Economic Review, 54(1), 18–39.

Couper, M. (1998) Measuring survey quality in a CASIC environment. Proceedings of the Section on Survey Research

Methods of the American Statistical Association.

Durrant, G., Groves, R.M., Staetsky, L. & Steele, F. (2010) Effects of interviewer attitudes and behaviours on refusal in household surveys. Public Opinion Quarterly, 74, 1-36.

Engen, E.M., Gale, W.G. & Scholz, J.K. (1996) The illusory effects of saving incentives on saving. The Journal of

Economic Perspectives, 10(4), 113–138.

Fischer, M., West, B.T., Elliott, M.R. & Kreuter, F. (2019) The impact of interviewer effects on regression coefficients.

Journal of Survey Statistics and Methodology, 7(2), 250–274.

Goldstein, H. (2011) Multilevel statistical models, 4th edition, Hoboken, NJ: John Wiley & Sons Ltd.

Goldstein, H. & French, R. (2015) Differential educational progress and measurement error. Longitudinal and Life

Course Studies, 6, 331–376.

Hastings, J.S., Madrian, B.C. & Skimmyhorn, W.L. (2013) Financial literacy, financial education and economic out-comes. Annual Review of Economics, 2013(5), 347–373.

Kreuter, F. (2015) The use of paradata. In: Engel, U. et al. (Ed.) Improving survey methods. Lessons from recent

re-search. New York: Routledge, pp. 303–315.

Lührmann, M., Serra-Garcia, M. & Winter, J.K. (2015) Teaching teenagers in finance: Does it work? Journal of Banking

and Finance, 54, 160–174.

Lusardi, A. & Mitchell, O.S. (2008) Planning and financial literacy: How do women fare? American Economic Review, 98(2), 413–417.

Lusardi, A. & Mitchell, O.S. (2014) The economic importance of financial literacy: Theory and evidence. Journal of

Economic Literature, 52(1), 5–44.

Mitchell, O. S., & Lusardi, A. (2015). Financial literacy and economic outcomes: Evidence and policy implications. The

Journal of Retirement, 3(1), 107–114. https://doi.org/10.3905/jor.2015.3.1.107

McCulloch, C.E., Searle, S.R. & Neuhaus, J.M. (2008) Linear, generalized, and mixed models, 2nd edition. New York: Wiley Series in Probability and Statistics.

O'Muircheartaigh, C. & Campanelli, P. (1998) The relative impact of interviewer effects and sample design effects on survey precision. Journal of the Royal Statistical Society, Series A, 161(1), 63–77.

Platek, R. & Gray, G.B. (1983) Imputation methodology. In: Madow, W.G., Olkin, I. and Rubin, D.B. (Eds.) Incomplete

data in sample surveys. New York: Academic Press, pp. 255–294.

Rabe-Hesketh, S. & Skrondal, A. (2012) Multilevel and longitudinal modeling using stata, volume II: Categorical

re-sponses, counts, and survival, 3rd edition, College Station, TX: Stata Press.

Schnell, R. & Kreuter, F. (2005) Separating interviewer and sampling-point effects. Journal of Official Statistics, 21(3), 389–410.

Self, S.G. & Liang, K.Y. (1987) Asymptotic properties of maximum likelihood estimators and likelihood ratio. Journal

of the American Statistical Association, 82(398), 605–610.

Skinner, J. (2007) Are You Sure You’re Saving Enough for Retirement? Journal of Economic Perspectives, 21(3), 59–80.

Statacorp. (2017) Stata multilevel mixed effects reference manual. Release 15. College Station, TX: Stata Press. Van Rooij, M.C.J., Lusardi, A. & Alessie, R.J.M. (2011) Financial literacy and stock market participation. Journal of

Finance, 101(2), 449–472.

Van Rooij, M.C.J., Lusardi, A. & Alessie, R.J.M. (2012) Financial literacy, retirement planning and household wealth.

Economic Journal, 122, 449–478.

Vassallo, R., Durrant, G. & Smith, P.W.F. (2017) Separating interviewer and area effects using a cross-classified mul-tilevel logistic model: Simulation findings and implications for survey designs. Journal of the Royal Statistical


Wansbeek, T. & Meijer, E. (2000) Measurement error and latent variables in econometrics. Amsterdam: North-Holland Publishing Company.

West, B.T. & Blom, A.G. (2017) Explaining interviewer effects: A research synthesis. Journal of Survey Statistics and Methodology, 5(2), 175–211.

West, B.T., Kreuter, F. & Jaenichen, U. (2013) 'Interviewer' effects in face-to-face surveys: A function of sampling, measurement error, or nonresponse? Journal of Official Statistics, 29(2), 277–297.

West, B.T. & Olson, K. (2010) How much of interviewer variance is really nonresponse error variance? Public Opinion Quarterly, 74, 1004–1026.

How to cite this article: Crossley TF, Schmidt T, Tzamourani P, Winter JK. Interviewer effects and the measurement of financial literacy. J R Stat Soc Series A. 2021;184:150–178. https://doi.org/10.1111/rssa.12617
