Regression

[Photo: FIAT 500 C, 1953]

[Chart: Segment C, price (M£) vs. weight/power (g/W): scatter of segment-C cars. How is our preferred car, the Rover 45, positioned?]

[Chart: the same scatter with a regression (trend) line added.]
Regression

Definition. Let Ψ be a class of curves of ℝ². We call trend or regression line of the set A in the class Ψ the element C* of Ψ that lies at a minimum distance from A.

For any x, y ∈ ℝ², let d(x, y) be the distance of x from y based on some metric, with
d(x, y) ≥ 0, d(x, y) = 0 ⇔ x = y (non-negativity)
d(x, y) = d(y, x) (symmetry)
d(x, y) + d(y, z) ≥ d(x, z) (triangle inequality)

Let C be a curve of ℝ², x a point of ℝ², and A a discrete and finite subset of ℝ².

Definition. d(x, C) = min {d(x, y): y ∈ C} is called the distance of x from C.

Definition. d(A, C) = (1/|A|) Σx∈A d(x, C) is called the distance of A from C.

[Figure: a point Pk = (xk, yk) and a curve C, with the distance dk = d((xk, yk), C) = |yk – f(xk)| and the normal to C by Pk.]
On the metric

The metric chosen depends on the situation modeled:

1) If in an experiment all the record entries are affected by error, it is convenient to define the distance as the length of the shortest segment touching both Pk and C.

2) If uncertainty affects just the dependent variable y, one can choose the absolute value of the difference between the y-coordinate of Pk and the value attained by the curve at xk.

[Figure: point Pk = (xk, yk), the curve value f(xk), and the distance d((xk, yk), C).]
Linear regression

If the curve is a straight line y = c1x + c0, we speak of linear regression. Consider case (2), in which the distance of Pk = (xk, yk) ∈ A from C is given by the absolute value of the difference between the y-coordinate of Pk and the value attained by the line at xk, namely

    dk = |yk – c1xk – c0|

[Figure: point Pk = (xk, yk), the line y = c1x + c0, and the vertical residual dk.]
Linear regression

Basically, the distance dk of Pk from C depends on c0 and c1; the choice of the straight line also depends on these two coefficients. Hence a line at a minimum distance from the point set {P1, P2, …, Pm} is one given by those c0 and c1 that minimize

    (1/m)(d1 + d2 + … + dm)
Linear regression

Since the constant factor 1/m does not affect the minimizer, we look for the c0 and c1 that minimize the sum d1 + d2 + … + dm, with dk = |yk – c1xk – c0|. Formulation:

    min d1 + … + dm
    d1 ≥ y1 – c1x1 – c0
    d1 ≥ c1x1 + c0 – y1
    …
    dm ≥ ym – c1xm – c0
    dm ≥ c1xm + c0 – ym
Linear regression

Equivalently, moving all the variables to the left-hand side, the constraints take the standard LP form:

    min d1 + … + dm
    d1 + x1c1 + c0 ≥ y1
    d1 – x1c1 – c0 ≥ –y1
    …
    dm + xmc1 + c0 ≥ ym
    dm – xmc1 – c0 ≥ –ym

At the optimum, each dk equals |yk – c1xk – c0|, the vertical distance of Pk from the line.
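As a minimal sketch (not part of the slides), this LP can be solved with scipy.optimize.linprog; the data below are illustrative, with one outlier added to show the robustness of the L1 fit:

```python
import numpy as np
from scipy.optimize import linprog

# Illustrative data: points on y = 2x + 1, plus one outlier at x = 5.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([3.0, 5.0, 7.0, 9.0, 100.0])
m = len(x)

# Variables z = (c1, c0, d1, ..., dm); minimize d1 + ... + dm.
obj = np.r_[0.0, 0.0, np.ones(m)]

# dk >= yk - c1*xk - c0   ->  -xk*c1 - c0 - dk <= -yk
# dk >= c1*xk + c0 - yk   ->   xk*c1 + c0 - dk <=  yk
I = np.eye(m)
A_ub = np.block([[-x[:, None], -np.ones((m, 1)), -I],
                 [ x[:, None],  np.ones((m, 1)), -I]])
b_ub = np.r_[-y, y]

bounds = [(None, None)] * 2 + [(0, None)] * m
res = linprog(obj, A_ub=A_ub, b_ub=b_ub, bounds=bounds)
c1, c0 = res.x[:2]
print(c1, c0)   # the L1 line ignores the outlier: c1 = 2, c0 = 1
```

Unlike least squares, the L1 objective leaves the line on the four collinear points and charges the whole error to the outlier's dk.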
Linear regression (least squares)

Alternatively, one can define the distance d(A, c0, c1) = d(c0, c1) of the line y = c1x + c0 from the set A via

    d²(c0, c1) = (1/m)(d1² + … + dm²),  with  dk² = (yk – c1xk – c0)²

This is known as the least squares method.

[Figure: points P1, …, P4 and a line; the squared residuals d1², …, d4² are drawn as squares on the vertical gaps.]

The surface z = d²(c0, c1) is an upward-opening (convex) paraboloid of ℝ³, so its minimum (c0*, c1*) ∈ ℝ² can be computed by setting the partial derivatives in c0 and c1 to zero:

    ∂d²(c0, c1)/∂c0 = –(1/m) Σk=1..m 2(yk – c1xk – c0) = (2/m)(c1 Σk=1..m xk – Σk=1..m yk) + 2c0

    ∂d²(c0, c1)/∂c1 = –(1/m) Σk=1..m 2xk(yk – c1xk – c0) = (2/m)(c1 Σk=1..m xk² + c0 Σk=1..m xk – Σk=1..m xkyk)

Equating both to zero and dividing by 2 gives the linear system

    c0 Σk xk + c1 Σk xk² = Σk xkyk
    m c0 + c1 Σk xk = Σk yk

or, dividing by m and writing mean(·) for the sample mean (1/m)Σk=1..m(·),

    mean(x) c0 + mean(x²) c1 = mean(xy)
    c0 + mean(x) c1 = mean(y)

[Figure: the paraboloid z = d²(c0, c1) over the (c0, c1) plane, with minimum at (c0*, c1*).]

Solving the linear system returns the values of c0* and c1*, and hence the regression line y = c1*x + c0*.
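The two normal equations can be solved directly with numpy; a small sketch on illustrative data:

```python
import numpy as np

# Illustrative points
x = np.array([1.0, 2.0, 3.0, 4.0])
y = np.array([2.0, 3.0, 5.0, 6.0])

mx  = x.mean()          # mean of xk
mx2 = (x * x).mean()    # mean of xk^2
my  = y.mean()          # mean of yk
mxy = (x * y).mean()    # mean of xk*yk

# Normal equations:  mx*c0 + mx2*c1 = mxy  and  c0 + mx*c1 = my
A = np.array([[mx, mx2],
              [1.0, mx]])
b = np.array([mxy, my])
c0, c1 = np.linalg.solve(A, b)
print(c0, c1)   # 0.5 1.4 for this data
```

The same coefficients come out of any least-squares routine; solving the 2×2 system just makes the derivation above concrete.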
Polynomial regression

Let us now pass to a curve of the form y = ct x^t + ct–1 x^(t–1) + … + c1x + c0. If we set

    dk = |yk – ct xk^t – ct–1 xk^(t–1) – … – c0|

the parameters of C can be obtained by solving the LP

    min d1 + … + dm
    d1 ≥ y1 – ct x1^t – ct–1 x1^(t–1) – … – c0
    d1 ≥ ct x1^t + ct–1 x1^(t–1) + … + c0 – y1
    …
    dm ≥ ym – ct xm^t – ct–1 xm^(t–1) – … – c0
    dm ≥ ct xm^t + ct–1 xm^(t–1) + … + c0 – ym

[Figure: point Pk = (xk, yk), the polynomial curve f, and the vertical residual dk = |yk – f(xk)|.]
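The slides formulate the polynomial fit as an LP; as a quick least-squares counterpart (a sketch, not the slides' method), numpy fits a degree-t polynomial in one call:

```python
import numpy as np

# Points on an exact parabola, so the fit recovers it
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = x**2 - 2.0 * x + 1.0

coeffs = np.polyfit(x, y, deg=2)  # least-squares fit, returns [c2, c1, c0]
print(coeffs)                     # recovers [1, -2, 1]
```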
Other non linear models

Generalizing, a model of the type

    y = g [ Σi=1..t ci fi(x) ]

with fi, g known functions and g monotone invertible, can be determined by setting

    dk = |g⁻¹(yk) – c1f1(xk) – … – ct ft(xk)|

and solving the LP

    min d1 + … + dm
    d1 ≥ g⁻¹(y1) – c1f1(x1) – … – ct ft(x1)
    d1 ≥ c1f1(x1) + … + ct ft(x1) – g⁻¹(y1)
    …
    dm ≥ g⁻¹(ym) – c1f1(xm) – … – ct ft(xm)
    dm ≥ c1f1(xm) + … + ct ft(xm) – g⁻¹(ym)

or, with minimum squares, by defining

    dk² = [g⁻¹(yk) – c1f1(xk) – … – ct ft(xk)]²

and solving the relevant linear system.
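A concrete instance of this scheme (a sketch on synthetic data): with g = exp and f1(x) = x, fitting y = exp(c1x + c0) reduces to a linear fit of the points (xk, log yk):

```python
import numpy as np

# Synthetic data following the model y = exp(0.5*x + 1) exactly
x = np.array([0.0, 1.0, 2.0, 3.0])
y = np.exp(0.5 * x + 1.0)

# g^-1 = log turns the model into a linear regression on (xk, log yk)
c1, c0 = np.polyfit(x, np.log(y), deg=1)
print(c1, c0)   # 0.5 1.0
```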
Example

Let us try to estimate, by the linear model y = c1x + c0, the growth of the US population:

    min Σk=1..m |yk – c1xk – c0|

    year                  1810   1850   1890   1930   1970
    population (millions) 7.2    23.2   62.9   122.8  203.2

The outcome is clearly unacceptable. The fitted line gives –26.6, 23.2, 73.0, 122.8, 172.6 for 1810–1970 and 222.4 for 2010: it even provides a negative value for the year 1810!
Example

Let us then try an exponential model of the type y = e^(c1x + c0):

    min Σk=1..m |log(yk) – c1xk – c0|

In practice, this amounts to applying a linear regression to the points (xk, log(yk)). The fitted values become 10.1, 23.2, 53.4, 122.8, 282.5 for 1810–1970, and the model forecast is that in 2010 the US population will reach 650 million.
Example

The minimum squares method, i.e. minimizing Σk=1..m (log(yk) – c1xk – c0)², provides a slightly reduced trend (fitted values 9.1, 21.0, 48.3, 111.2, 256.0 for 1810–1970) and a forecast of 589.1 million inhabitants for the year 2010.
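A sketch reproducing the least-squares exponential fit on the slide's data; the forecast obtained this way lands near the quoted 589.1 million (small discrepancies may come from rounding in the slides):

```python
import numpy as np

years = np.array([1810.0, 1850.0, 1890.0, 1930.0, 1970.0])
pop   = np.array([7.2, 23.2, 62.9, 122.8, 203.2])   # millions

# Least-squares line through (year, log population)
c1, c0 = np.polyfit(years, np.log(pop), deg=1)
forecast = np.exp(c1 * 2010.0 + c0)
print(round(forecast, 1))
```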
Linear regression in ℝⁿ

The locus of points x of ℝⁿ whose scalar product by a given versor u = (u1, …, un) returns a known value β is a hyperplane H. The points of H fulfill the linear equation

    u1x1 + … + unxn = u·x = β

or, equivalently,

    a1x1 + … + anxn = a·x = b,  where  u = a/|a|  and  b = β|a| = β √(a1² + … + an²)

[Figure: hyperplane H in coordinates x1, x2, with normal versor u and offset β.]

If distance is expressed as in case (2), then we can describe the hyperplane by its explicit form xn = c0 + c1x1 + … + cn–1xn–1, ending up with an LP or a linear system similar to that seen in ℝ².
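In case (2) the explicit form reduces to an ordinary least-squares problem; a minimal sketch in ℝ³ with numpy, on illustrative data:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(20, 2))                  # sample points (x1, x2)
xn = 1.0 + 2.0 * X[:, 0] - 3.0 * X[:, 1]      # exact plane x3 = 1 + 2x1 - 3x2

A = np.column_stack([np.ones(len(X)), X])     # columns: 1, x1, x2
coef, *_ = np.linalg.lstsq(A, xn, rcond=None)
print(coef)   # recovers [1, 2, -3]
```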
Linear regression in ℝⁿ

If distance is expressed as in case (1), things get a bit more complicated. The distance of a point xk = (x1k, …, xnk) from the hyperplane u·x = β is

    dk = |u·xk – β|

[Figure: point xk, its projection u·xk along u, and its distance dk from the hyperplane.]
Linear regression in ℝⁿ

The optimization problem reads

    min d1 + … + dm
    d1 ≥ x11u1 + … + xn1un – β
    d1 ≥ β – x11u1 – … – xn1un
    …
    dm ≥ x1mu1 + … + xnmun – β
    dm ≥ β – x1mu1 – … – xnmun
    u1² + … + un² = 1

and is a non linear problem, because of the normalization constraint on u.
Linear regression in ℝⁿ

However, the information carried by versor u is redundant. For one, since β is unconstrained, we can limit our attention to the directions belonging to a semi-sphere.
Linear regression in ℝⁿ

And in the very end we are just interested in finding a direction: to this purpose, vectors a whose modulus differs from 1 can also be used.
Linear regression in ℝⁿ

Note that dk = |u·xk – β| = |(a/|a|)·xk – β|, and for any choice of a there exists a real b such that the residual can be written as |a·xk – b|. Replacing the quadratic normalization of u by the linear one a1 + … + an = 1, ai ≥ 0, the problem becomes

    min d1 + … + dm
    d1 ≥ x11a1 + … + xn1an – b
    d1 ≥ b – x11a1 – … – xn1an
    …
    dm ≥ x1ma1 + … + xnman – b
    dm ≥ b – x1ma1 – … – xnman
    a1 + … + an = 1,  ai ≥ 0

This calls for solving 2^(n–1) linear problems in the n + 1 variables a1, …, an, b (one for each pattern of signs of the ai covering the semi-sphere of directions), and keeping the best solution found.
Efficiency curve

    car                     1      2      3      4      5      6
    consumption (l/100 km)  4.8    5.2    5.6    5.0    5.1    5.4
    price (K€)              12.4   9.8    8.5    10.4   10.6   8.9
Model

Let us make the (basically wrong) assumption that clients choose a car adopting a rational behaviour and, prior to purchase, compute its cost on a five-year basis. For the sake of simplicity, suppose that
• costs are given by the car price and by fuel consumption
• the car yearly covers 20,000 km
• a litre of fuel costs 1.3 €

Since a consumption of x l/100 km over 20,000 km/year means 200·x litres per year, the five-year cost of car k is

    cost(k) = price(k) + 5 · consumption(k) · 200 · 1.3

Suppose that, since less fuel consumption means less pollution, we are interested in finding the best solutions under both respects.
Efficiency curve

    car                     1      2      3      4      5      6
    consumption (l/100 km)  4.8    5.2    5.6    5.0    5.1    5.4
    cost (K€)               18.64  16.56  15.78  16.90  17.23  15.92
Model

[Chart: cost (K€) vs. consumption (l/100 km) scatter of the six cars.]

We look for a math model giving the cost c(x) associated with consumption x as

    c(x) = a/x + bx + c    (a, b, c ≥ 0)

The a/x term captures "more consumption, less technology" (a lower price), while the bx term captures "more consumption, more fuel".
Efficiency curve

We want the model curve to run below all observed points (an efficiency frontier), so for each car k the slack

    dk = yk – a/xk – bxk – c ≥ 0

Multiplying by xk > 0, these conditions become linear in (a, b, c):

    a + x1²b + x1c ≤ x1y1
    …
    a + xn²b + xnc ≤ xnyn
    a, b, c ≥ 0
Model

The curve should fit the points from below as tightly as possible:

    min d1 + … + dn,  with  dk = yk – a/xk – bxk – c ≥ 0

Since Σk dk = Σk yk – a Σk(1/xk) – b Σk xk – nc and the yk are constants, this is equivalent to

    max (1/x1 + … + 1/xn) a + (x1 + … + xn) b + nc

Solving the resulting LP yields the optimal solution

    a = 75.678,  b = 0.353,  c = 0
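A sketch of this efficiency-frontier LP with scipy.optimize.linprog, on the six (consumption, cost) points of the example:

```python
import numpy as np
from scipy.optimize import linprog

x = np.array([5.4, 5.1, 5.0, 5.6, 5.2, 4.8])              # consumption (l/100 km)
y = np.array([15.92, 17.23, 16.90, 15.78, 16.56, 18.64])  # 5-year cost (K€)
n = len(x)

# Variables (a, b, c); maximize a*sum(1/xk) + b*sum(xk) + n*c,
# i.e. minimize its negation.
obj = -np.array([(1.0 / x).sum(), x.sum(), float(n)])

# The curve must stay below every point: a/xk + b*xk + c <= yk
A_ub = np.column_stack([1.0 / x, x, np.ones(n)])
res = linprog(obj, A_ub=A_ub, b_ub=y, bounds=[(0, None)] * 3)
a, b, c = res.x
print(a, b, c)   # the slide reports a = 75.678, b = 0.353, c = 0
```

At the optimum the curve touches the two efficient cars (consumptions 5.0 and 5.4); all other points lie strictly above it.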
Efficiency curve

[Chart: cost (K€) vs. consumption (l/100 km): the six sample points together with the fitted efficiency curve.]