
(1)

Inferential Statistics Part A

Eva Riccomagno, Maria Piera Rogantin

DIMA – Università di Genova

http://www.dima.unige.it/~rogantin/UnigeStat/

(2)

Part A

Generalities and point estimation

• A1. Introduction

• A2. Population and samples

• A3. Point estimation

• A4. Sample mean and its distribution

• A5. Properties of estimators

• A6. A bit of probability: the law of large numbers

(3)

A1. Introduction to inferential statistics

analysis of the available sample → probability → information about the population or process

Data collection or experiment simulation – EDA

The observed sample is used to assess unknown parameter values of the whole population. Partial observation can be intrinsic to the problem:

- limitations due to costs (time and money)
- polls and elections
- invasive experiments (industrial, pharmaceutical)
- weather forecasts

(4)

From the book

J. Maindonald and J. Brown. Data Analysis and Graphics. Cambridge University Press. 2010. Chapter 4. p. 102

A random sample is a set of values drawn independently from a larger population. A (uniform) random sample has the characteristic that all members of the population have an equal chance of being drawn

[. . . ]

The challenge is to use the one sample that is available, together with the assumption of independent and identically distributed sample values, to infer the sampling distribution of the mean

(5)

Probability provides a theoretical model of the variability, used to predict behavior in non-sampled cases; the model is:

• starting from experience,

• formally consistent,

• able to describe the phenomenon,

• able to evaluate the inevitable approximations made in passing from the partial information of the observed data to statements about the entire population or phenomenon

(6)

From the book: Larry Wasserman. All of Statistics. Springer. 2010. Chapter 6.1 p. 87

Statistical inference, or “learning” as it is called in computer science, is the process of using data to infer the distribution that generated the data

A typical statistical inference question is:

Given a sample (X1, . . . , Xn) ∼ F, how do we infer F? In some cases, we may want to infer only some feature of F, such as its mean

Often one makes some assumptions about F, for instance that the sample is i.i.d.

Typically the observed sample is indicated with lower case letters x1, x2, . . . , xn

(7)

Statistics/Data Mining Dictionary

Statistics            Computer Science        Meaning
estimation            learning                using data to estimate an unknown quantity
classification        supervised learning     predicting a discrete Y from X
clustering            unsupervised learning   putting data into groups
data                  training sample         (X1, Y1), . . . , (Xn, Yn)
covariates            features                the Xi's
classifier            hypothesis              a map from covariates to outcomes
hypothesis            —                       subset of a parameter space Θ
confidence interval   —                       interval that contains an unknown quantity with given frequency
...

(8)

A2. Population and samples

Examples of populations: the inhabitants of a city, the groceries sold in a particular region

It is important to select the observed sample appropriately; ideally the sample should be:

• representative of the population. Examples:

– if we study the average price of a product, the sample should not come only from supermarkets but also from small shops

– if we study the blood pressure of the inhabitants of a region, the sample should not come only from hospitalized patients

• formed by mutually independent elements. In the last example, the sample should not be drawn from the same families

(9)

A3. Point estimation

Let X be a random variable modeling the data

Example. Estimation of the mean µ of the systolic blood pressure (mmHg) X in the population. Sample of 8 subjects whose systolic blood pressure is:

x1 x2 x3 x4 x5 x6 x7 x8

125 128 133 136 126 129 131 135

Choice of the estimator – two examples:

- sample mean: X̄ = (X1 + X2 + · · · + Xn)/n

- mid-range: T = (max(X1, X2, . . . , Xn) + min(X1, X2, . . . , Xn))/2

Point estimates: x̄ = 130 and t = 131

Which estimate and estimator should we choose?

The sample mean is the best estimator because it has good statistical and mathematical properties
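As a quick check, here is a minimal R computation of the two estimates from the observed sample (exact values, which the slide reports rounded):

> x = c(125, 128, 133, 136, 126, 129, 131, 135)  ## observed sample
> mean(x)                                        ## sample mean estimate
[1] 130.375
> (max(x) + min(x))/2                            ## mid-range estimate
[1] 130.5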

(10)

A4. Estimation of the population mean

Effect of the sample choice: random selection

Consider the estimator sample mean X̄

What are the possible samples from a population?

What are the possible values of X̄? How likely is each value?

A small example

Population: 4 subjects A, B, C, D

Sample size: 2

Systolic blood pressure (mmHg) of the 4 subjects – variable X

A    B    C    D
125  129  131  133

µ = (125 + 129 + 131 + 133)/4 = 129.5

Each subject (each value) has probability 1/4 of being drawn

(11)

Aim: estimation of µ by one sample of size 2, using the sample mean estimator

List of all samples and the corresponding sample means

Note that the value of the population mean is not a possible value of X̄

Each sample has probability 1/16 of being drawn

E.g., the sample mean 131 has probability 3/16

In practice, only one sample will be drawn!

sample  x1   x2   x̄
AA      125  125  125
AB      125  129  127
AC      125  131  128
AD      125  133  129
BA      129  125  127
BB      129  129  129
BC      129  131  130
BD      129  133  131
CA      131  125  128
CB      131  129  130
CC      131  131  131
CD      131  133  132
DA      133  125  129
DB      133  129  131
DC      133  131  132
DD      133  133  133

(12)

Values taken by the estimator X̄ and their probabilities

x̄           125   127   128   129   130   131   132   133
P(X̄ = x̄)   1/16  2/16  2/16  3/16  2/16  3/16  2/16  1/16

What is the probability of overestimating the mean value of blood pressure?

What is the probability of being away from the true value by more than 2 mmHg?

Where is the randomness?

Why do we say that X̄ is a random variable?

The randomness is in randomly drawing a sample and obtaining one of the possible values.

The probability of observing a certain sample value is written above and rests on the assumption that all 16 samples are equally likely (i.i.d. sampling). If the sampling scheme changes, then the sample distribution of X̄ changes accordingly
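These questions can be answered in R by enumerating the 16 equally likely samples; a minimal sketch (the variable names are ours):

> v = c(125, 129, 131, 133)              ## values of subjects A, B, C, D
> samples = expand.grid(x1 = v, x2 = v)  ## all 16 ordered samples of size 2
> xbar = rowMeans(samples)               ## the 16 possible sample means
> table(xbar)/length(xbar)               ## sample distribution of the mean
> mean(xbar > 129.5)                     ## probability of overestimating mu
> mean(abs(xbar - 129.5) > 2)            ## P(more than 2 mmHg from mu)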

(13)

Sample distribution of X̄ and its mean

[Plot: sample distribution of X̄ – probabilities of the values 125–133]

E(X̄) = (125 + 2×127 + 2×128 + 3×129 + 2×130 + 3×131 + 2×132 + 133)/16 = 129.5

X̄ is centered at the parameter µ we want to estimate

In general: the mean of X̄ is equal to the population mean!

Moreover, the distribution of X̄ is close to µ with high probability when n is “large”

(14)

Sample distributions of X and X̄ and their variances

[Plots: distribution of the variable X (left) and of the sample mean X̄ (right), on a common scale 124–134]

X̄ has smaller variance than X

In general, for i.i.d. samples:

Var(X̄) = Var(X)/n, with n the sample size

In the example: Var(X̄) = Var(X)/2

Var(X) = (1/4) [(125 − 129.5)² + (129 − 129.5)² + (131 − 129.5)² + (133 − 129.5)²] = 8.75

Var(X̄) = (1/16) [(125 − 129.5)² + 2×(127 − 129.5)² + 2×(128 − 129.5)² + 3×(129 − 129.5)² + 2×(130 − 129.5)² + 3×(131 − 129.5)² + 2×(132 − 129.5)² + (133 − 129.5)²] = 4.375
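The same enumeration verifies both variances in R (a sketch; the variances are computed with divisors 4 and 16, as on the slide):

> v = c(125, 129, 131, 133)
> varX = mean((v - mean(v))^2)            ## Var(X) = 8.75
> samples = expand.grid(x1 = v, x2 = v)
> xbar = rowMeans(samples)
> varXbar = mean((xbar - mean(xbar))^2)   ## Var(Xbar) = 4.375
> c(varX/2, varXbar)                      ## equal: Var(Xbar) = Var(X)/n with n = 2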

(15)

Summary on the sample mean random variable

x1, x2, . . . , xn   observed values – sample values

X1, X2, . . . , Xn ∼ F   i.i.d. sample random variables with mean µ and variance σ²:

E(X1) = µ    Var(X1) = σ²

Let X̄ be the sample mean random variable:

X̄ = (X1 + X2 + · · · + Xn)/n

It has theoretical mean µ and variance σ²/n:

E(X̄) = µ    Var(X̄) = σ²/n

Sometimes σ/√n is denoted SEM (standard error of the mean)
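A small simulation sketch of the SEM, with hypothetical values µ = 129.5, σ = 3 and n = 25 (not from the slides):

> n = 25
> xbars = replicate(10000, mean(rnorm(n, mean = 129.5, sd = 3)))
> sd(xbars)      ## close to the theoretical SEM
> 3/sqrt(n)      ## sigma/sqrt(n) = 0.6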

(16)

The unbiased estimator of the variance of a variable X in the population

X1, . . . , Xn ∼ F i.i.d. – a sample

σ² the variance of each Xi, i = 1, . . . , n

The estimator

S² = (1/(n − 1)) Σⁿᵢ₌₁ (Xᵢ − X̄)²

is unbiased, i.e. its mean is σ²

The theoretical variance of S² is Var(S²) = (1/n) (µ₄ − ((n − 3)/(n − 1)) σ⁴), where µ₄ is the fourth central moment
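A simulation sketch of the unbiasedness of S², with hypothetical values σ = 3 and n = 5; note that R's var() already uses the n − 1 divisor:

> s2 = replicate(10000, var(rnorm(5, mean = 0, sd = 3)))
> mean(s2)       ## close to sigma^2 = 9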

(17)

Sample distribution of S², estimator of the variance of a variable X, and its mean

Example (continued)

Systolic blood pressure of 4 subjects (variable X)

List of all i.i.d. samples (n = 2) and the corresponding s².

sample  x1   x2   x̄    s² = Σ²ᵢ₌₁ (xᵢ − x̄)²/(2 − 1)

AA      125  125  125    0
AB      125  129  127    8 = [(125 − 127)² + (129 − 127)²]/1
AC      125  131  128   18 = [(125 − 128)² + (131 − 128)²]/1
AD      125  133  129   32 = . . .
BA      129  125  127    8
BB      129  129  129    0
BC      129  131  130    2
BD      129  133  131    8
CA      131  125  128   18
CB      131  129  130    2
CC      131  131  131    0
CD      131  133  132    2
DA      133  125  129   32
DB      133  129  131    8
DC      133  131  132    2
DD      133  133  133    0

Each sample has probability 1/16 of being drawn

(18)

Values taken by the estimator S² and their probabilities

s²           0     2     8    18    32
P(S² = s²)  4/16  4/16  4/16  2/16  2/16

[Plot: sample distribution of S²]

E(S²) = (4×0 + 4×2 + 4×8 + 2×18 + 2×32)/16 = 8.75

S² is centered at the parameter σ² we want to estimate
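The whole table can be reproduced in R by enumeration (a minimal sketch):

> v = c(125, 129, 131, 133)
> samples = expand.grid(x1 = v, x2 = v)  ## all 16 equally likely samples
> s2 = apply(samples, 1, var)            ## var() uses the n - 1 divisor
> table(s2)/length(s2)                   ## matches the table above
> mean(s2)                               ## E(S^2) = 8.75 = sigma^2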

(19)

A5. Properties of the estimators

A point estimator T of a parameter θ should be

• unbiased or centered (its mean is θ)

• consistent (unbiased and its variance tends to 0 as the sample size goes to infinity)

X̄ is a consistent estimator of µ

Example

Estimating the maximum length θ of a Mikado stick. Suppose that the length is uniform in (0, θ), or in (a, θ + a) with a known. We could consider

• the maximum in the sample max{X1, . . . , Xn}

• twice the sample mean 2X̄

The sample distributions of the two estimators are plotted below

[Plots: sample distributions of the two estimators over the values 0–10, with a vertical reference line at θ]
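A simulation sketch comparing the two estimators, with hypothetical values θ = 10 and n = 20 for the Uniform(0, θ) case:

> theta = 10; n = 20
> est1 = replicate(10000, max(runif(n, 0, theta)))     ## sample maximum
> est2 = replicate(10000, 2*mean(runif(n, 0, theta)))  ## twice the sample mean
> c(mean(est1), mean(est2))  ## the maximum sits slightly below theta; 2*Xbar is centered at theta
> c(sd(est1), sd(est2))      ## compare the spreads of the two estimators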

(20)

Aside

Random selection is not the only reason why

• probability models are used to make inference from the sample to a large population

• the uncertainty associated with inference is modelled by random samples and probability

Sources of uncertainty

• random selection

• measurement (e.g. in a lab, due to equipment, unit under test, operator, calibration of the measurement instrument, . . . )

• not yet observed process

• intrinsically uncertain outcomes

• . . .

(21)

A6. A bit of probability: the law of large numbers

X1, X2, . . . , Xn i.i.d. random variables with mean µ and variance σ²

Let Sn = X1 + X2 + · · · + Xn be the sum of the sample random variables.

Let X̄n = Sn/n be the sample mean random variable, with mean µ and variance σ²/n

Law of large numbers (LLN)

If the sample size n grows to infinity, the probability that X̄n takes values outside the interval (µ − δ, µ + δ) goes to zero, for any positive δ

More precisely: P(|X̄n − µ| > δ) → 0 as n → ∞

The distribution of X̄n becomes more concentrated around µ as n gets large, where µ is both the parameter to be estimated and the expectation (or theoretical mean) of X̄n

In other words, the distribution of X̄n piles up near µ

Here we add the index n to make explicit the dependence on the sample size

(22)

Trusting experience and the LLN

The LLN is a theorem that describes the result of performing the same experiment a large number of times

The LLN “guarantees” stable long-term results for the averages of random events

For example, while a casino may lose money in a single spin of the roulette wheel, its earnings will tend towards a predictable percentage over a large number of spins. Any winning streak by a player will eventually be overcome by the parameters of the game. It is important to remember that the LLN only applies (as the name indicates) when a large number of observations is considered. There is no principle that a small number of observations will coincide with the expected value, or that a streak of one value will immediately be “balanced” by the others

From Wikipedia https://en.wikipedia.org/wiki/Law_of_large_numbers

(23)

Simulation in R of an experiment with two outcomes: 1 and 0

Example: probability of 1: 0.3 – probability of 0: 0.7

(we will analyze this type of experiment in detail later)

> out=rbinom(10,1,0.3); out     ## 10 trials
 [1] 1 0 0 1 0 0 1 0 0 1
> mean(out)
[1] 0.4

> out=rbinom(100,1,0.3); out    ## 100 trials
  [1] 1 0 0 0 0 0 1 1 0 0 1 1 0 0 0 0 0 0 0 0 1 0 0 0 0 1 0 0 0 0 0 0
 [33] 0 0 0 0 1 0 1 1 0 0 0 0 0 1 0 0 1 1 0 0 0 0 1 0 1 0 0 1 0 0 1 0
 [65] 0 0 1 0 0 0 0 0 0 0 0 1 0 0 0 1 1 1 1 1 0 1 0 0 1 1 0 1 0 0 0 0
 [97] 0 1 0 0
> mean(out)
[1] 0.29

> out=rbinom(1000,1,0.3); mean(out)    ## 1000 trials
[1] 0.31

> out=rbinom(10000,1,0.3); mean(out)   ## 10000 trials
[1] 0.3005
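A companion plot (not on the slide) that shows the LLN as a running average converging to 0.3:

> out = rbinom(10000, 1, 0.3)
> plot(cumsum(out)/seq_along(out), type = "l",
+      xlab = "number of trials", ylab = "running mean")
> abline(h = 0.3, lty = 2)   ## reference line at the true probability 0.3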
