Backpropagation Learning Algorithm
• An algorithm for learning the weights in the network, given a training set of input-output pairs $\{x^\mu, o^\mu\}$.
• The algorithm is based on the gradient descent method.
• Architecture
$w_{ij}^{(l)}$ : weight on the connection from the $j$-th unit in layer $(l-1)$ to the $i$-th unit in layer $l$

Supervised Learning
Supervised learning algorithms require the presence of a “teacher”
who provides the right answers to the input questions.
Technically, this means that we need a training set of the form
$$L = \left\{ (x^1, y^1), \ldots, (x^p, y^p) \right\}$$

where:
• $x^h$ ($h = 1, \ldots, p$) is the network input vector
• $y^h$ ($h = 1, \ldots, p$) is the network output vector
Supervised Learning
The learning (or training) phase consists of determining a configuration of weights such that the network output is as close as possible to the desired output, for all examples in the training set.
Formally, this amounts to minimizing the following error function:

$$E = \frac{1}{2} \sum_{h=1}^{p} \sum_{k} \left( y_k^h - out_k^h \right)^2 = \frac{1}{2} \sum_{h=1}^{p} \left\| y^h - out^h \right\|^2$$

where $out^h$ is the output provided by the network when given $x^h$ as input.
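As a concrete check of this definition, here is a minimal NumPy sketch; the function name `error` and the example values are illustrative, not from the slides:

```python
import numpy as np

# Quadratic error E = 1/2 * sum_h sum_k (y_k^h - out_k^h)^2, as defined above.
# y and out are (p, K) arrays: p training examples, K output units.
def error(y, out):
    return 0.5 * np.sum((y - out) ** 2)

y = np.array([[1.0, 0.0], [0.0, 1.0]])    # desired outputs (hypothetical)
out = np.array([[0.8, 0.1], [0.2, 0.7]])  # network outputs (hypothetical)
print(error(y, out))  # 0.5 * (0.04 + 0.01 + 0.04 + 0.09) ≈ 0.09
```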
Back-Propagation
To minimize the error function E we can use the classic gradient-descent algorithm.
To compute the partial derivatives $\partial E / \partial w_{ij}$, we use the error back-propagation algorithm.
It consists of two stages:
• Forward pass: the input to the network is propagated layer after layer in the forward direction.
• Backward pass: the “error” made by the network is propagated backward, and the weights are updated accordingly.
Given pattern $\mu$, hidden unit $j$ receives a net input given by

$$h_j^\mu = \sum_k w_{jk}\, x_k^\mu$$

and produces as output:

$$V_j^\mu = g\left(h_j^\mu\right) = g\left(\sum_k w_{jk}\, x_k^\mu\right)$$
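The forward computation of the hidden layer can be sketched in NumPy as follows; the sigmoid choice for $g$ and all weight values are assumptions for illustration:

```python
import numpy as np

def sigmoid(h):
    return 1.0 / (1.0 + np.exp(-h))

# Net input and output of the hidden layer for one pattern x^mu:
#   h_j = sum_k w_jk * x_k,  V_j = g(h_j).
# w has shape (n_hidden, n_inputs); g is assumed sigmoid here.
def hidden_forward(w, x, g=sigmoid):
    h = w @ x    # net inputs h_j^mu
    return g(h)  # hidden outputs V_j^mu

w = np.array([[0.5, -0.3], [0.1, 0.8]])  # hypothetical weights
x = np.array([1.0, 2.0])                 # one input pattern
print(hidden_forward(w, x))              # outputs of the two hidden units
```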
Back-Prop : Updating Hidden-to-Output Weights
$$\Delta W_{ij} = -\eta \frac{\partial E}{\partial W_{ij}} = -\eta\, \frac{\partial}{\partial W_{ij}}\, \frac{1}{2} \sum_{\mu,k} \left(y_k^\mu - O_k^\mu\right)^2 = \eta \sum_{\mu,k} \left(y_k^\mu - O_k^\mu\right) \frac{\partial O_k^\mu}{\partial W_{ij}} = \eta \sum_{\mu} \left(y_i^\mu - O_i^\mu\right) g'\!\left(h_i^\mu\right) V_j^\mu = \eta \sum_{\mu} \delta_i^\mu\, V_j^\mu$$

where:

$$\delta_i^\mu = \left(y_i^\mu - O_i^\mu\right) g'\!\left(h_i^\mu\right)$$
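This update rule can be sketched as a batch step in NumPy; a sigmoid activation $g$ (so $g'(h) = g(h)(1-g(h))$) and all names and shapes are assumptions for illustration:

```python
import numpy as np

def sigmoid(h):
    return 1.0 / (1.0 + np.exp(-h))

def sigmoid_prime(h):
    s = sigmoid(h)
    return s * (1.0 - s)

# One batch update of the hidden-to-output weights W (shape (n_out, n_hidden)):
#   delta_i^mu = (y_i^mu - O_i^mu) * g'(h_i^mu)
#   Delta W_ij = eta * sum_mu delta_i^mu * V_j^mu
# V: (p, n_hidden) hidden outputs, y: (p, n_out) targets, eta: learning rate.
def update_output_weights(W, V, y, eta=0.1):
    h = V @ W.T                          # output net inputs h_i^mu
    O = sigmoid(h)                       # network outputs O_i^mu
    delta = (y - O) * sigmoid_prime(h)   # delta_i^mu
    return W + eta * delta.T @ V         # W + Delta W, summed over patterns mu
```

A single step with a small learning rate moves W in the direction of steepest descent, so the error E can only decrease for sufficiently small eta.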
Back-Prop : Updating Input-to-Hidden Weights ( I )
$$\Delta w_{jk} = -\eta \frac{\partial E}{\partial w_{jk}} = \eta \sum_{\mu,i} \left(y_i^\mu - O_i^\mu\right) \frac{\partial O_i^\mu}{\partial w_{jk}} = \eta \sum_{\mu,i} \left(y_i^\mu - O_i^\mu\right) g'\!\left(h_i^\mu\right) \frac{\partial h_i^\mu}{\partial w_{jk}}$$

with

$$\frac{\partial h_i^\mu}{\partial w_{jk}} = \frac{\partial}{\partial w_{jk}} \sum_l W_{il}\, V_l^\mu = W_{ij}\, \frac{\partial V_j^\mu}{\partial w_{jk}} = W_{ij}\, g'\!\left(h_j^\mu\right) \frac{\partial h_j^\mu}{\partial w_{jk}}$$
Back-Prop : Updating Input-to-Hidden Weights ( II )
Moreover,

$$\frac{\partial h_j^\mu}{\partial w_{jk}} = \frac{\partial}{\partial w_{jk}} \sum_m w_{jm}\, x_m^\mu = x_k^\mu$$

Hence, we get:

$$\Delta w_{jk} = \eta \sum_{\mu,i} \left(y_i^\mu - O_i^\mu\right) g'\!\left(h_i^\mu\right) W_{ij}\, g'\!\left(h_j^\mu\right) x_k^\mu = \eta \sum_{\mu,i} \delta_i^\mu\, W_{ij}\, g'\!\left(h_j^\mu\right) x_k^\mu = \eta \sum_{\mu} \hat{\delta}_j^\mu\, x_k^\mu$$

where:

$$\hat{\delta}_j^\mu = g'\!\left(h_j^\mu\right) \sum_i W_{ij}\, \delta_i^\mu$$
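Putting the two update rules together, one full forward/backward training step for a two-layer network might be sketched as follows; the sigmoid activation and all names, shapes, and values are assumptions for illustration:

```python
import numpy as np

def sigmoid(h):
    return 1.0 / (1.0 + np.exp(-h))

def sigmoid_prime(h):
    s = sigmoid(h)
    return s * (1.0 - s)

# One full backprop step, combining both update rules:
#   delta_i^mu     = (y_i^mu - O_i^mu) g'(h_i^mu)          (output layer)
#   delta_hat_j^mu = g'(h_j^mu) * sum_i W_ij delta_i^mu    (hidden layer)
#   Delta W_ij = eta * sum_mu delta_i^mu * V_j^mu
#   Delta w_jk = eta * sum_mu delta_hat_j^mu * x_k^mu
# x: (p, n_in), y: (p, n_out), w: (n_hidden, n_in), W: (n_out, n_hidden).
def backprop_step(w, W, x, y, eta=0.1):
    # forward pass
    h_hidden = x @ w.T              # hidden net inputs h_j^mu
    V = sigmoid(h_hidden)           # hidden outputs V_j^mu
    h_out = V @ W.T                 # output net inputs h_i^mu
    O = sigmoid(h_out)              # network outputs O_i^mu
    # backward pass
    delta = (y - O) * sigmoid_prime(h_out)             # delta_i^mu
    delta_hat = (delta @ W) * sigmoid_prime(h_hidden)  # delta_hat_j^mu
    return w + eta * delta_hat.T @ x, W + eta * delta.T @ V
```

Repeating this step drives E down; iterating it over the training set is exactly gradient descent on the error function defined earlier.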