Backpropagation Learning Algorithm
• An algorithm for learning the weights in the network, given a training set of input-output pairs $\{x^\mu, o^\mu\}$.
• The algorithm is based on the gradient descent method.
• Architecture
$w_{ij}^{(l)}$ : weight on the connection from the $j$-th unit in layer $(l-1)$ to the $i$-th unit in layer $l$

Supervised Learning
Supervised learning algorithms require the presence of a “teacher”
who provides the right answers to the input questions.
Technically, this means that we need a training set of the form
$$L = \left\{ (x^1, y^1), \ldots, (x^p, y^p) \right\}$$

where:
• $x^h$ ($h = 1, \ldots, p$) is the network input vector
• $y^h$ ($h = 1, \ldots, p$) is the network output vector
Supervised Learning
The learning (or training) phase consists of determining a configuration of weights such that the network output is as close as possible to the desired output, for all examples in the training set.
Formally, this amounts to minimizing the following error function:

$$E = \frac{1}{2} \sum_{h=1}^{p} \sum_{k} \left( y_k^h - out_k^h \right)^2 = \frac{1}{2} \sum_{h=1}^{p} \left\| y^h - out^h \right\|^2$$

where $out^h$ is the output provided by the network when given $x^h$ as input.
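As a concrete check of this definition, here is a minimal NumPy sketch; the function name `error` and the example values are illustrative, not from the slides:

```python
import numpy as np

# Quadratic error E = 1/2 * sum_h sum_k (y_k^h - out_k^h)^2, as defined above.
# y and out are (p, K) arrays: p training examples, K output units.
def error(y, out):
    return 0.5 * np.sum((y - out) ** 2)

y = np.array([[1.0, 0.0], [0.0, 1.0]])    # desired outputs (hypothetical)
out = np.array([[0.8, 0.1], [0.2, 0.7]])  # network outputs (hypothetical)
print(error(y, out))  # 0.5 * (0.04 + 0.01 + 0.04 + 0.09) ≈ 0.09
```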
Back-Propagation
To minimize the error function E we can use the classic gradient-descent algorithm.
To compute the partial derivatives $\partial E / \partial w_{ij}$, we use the error back-propagation algorithm.
It consists of two stages:
• Forward pass: the input to the network is propagated layer after layer in the forward direction.
• Backward pass: the “error” made by the network is propagated backward, and the weights are updated accordingly.
Given pattern $\mu$, hidden unit $j$ receives a net input given by

$$h_j^\mu = \sum_k w_{jk}\, x_k^\mu$$

and produces as output:

$$V_j^\mu = g\left(h_j^\mu\right) = g\left(\sum_k w_{jk}\, x_k^\mu\right)$$
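The forward computation of the hidden layer can be sketched in NumPy as follows; the sigmoid choice for $g$ and all weight values are assumptions for illustration:

```python
import numpy as np

def sigmoid(h):
    return 1.0 / (1.0 + np.exp(-h))

# Net input and output of the hidden layer for one pattern x^mu:
#   h_j = sum_k w_jk * x_k,  V_j = g(h_j).
# w has shape (n_hidden, n_inputs); g is assumed sigmoid here.
def hidden_forward(w, x, g=sigmoid):
    h = w @ x    # net inputs h_j^mu
    return g(h)  # hidden outputs V_j^mu

w = np.array([[0.5, -0.3], [0.1, 0.8]])  # hypothetical weights
x = np.array([1.0, 2.0])                 # one input pattern
print(hidden_forward(w, x))              # outputs of the two hidden units
```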
Back-Prop : Updating Hidden-to-Output Weights
$$\Delta W_{ij} = -\eta \frac{\partial E}{\partial W_{ij}} = -\eta\, \frac{\partial}{\partial W_{ij}}\, \frac{1}{2} \sum_{\mu,k} \left(y_k^\mu - O_k^\mu\right)^2 = \eta \sum_{\mu,k} \left(y_k^\mu - O_k^\mu\right) \frac{\partial O_k^\mu}{\partial W_{ij}} = \eta \sum_{\mu} \left(y_i^\mu - O_i^\mu\right) g'\!\left(h_i^\mu\right) V_j^\mu = \eta \sum_{\mu} \delta_i^\mu\, V_j^\mu$$

where:

$$\delta_i^\mu = \left(y_i^\mu - O_i^\mu\right) g'\!\left(h_i^\mu\right)$$
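This update rule can be sketched as a batch step in NumPy; a sigmoid activation $g$ (so $g'(h) = g(h)(1-g(h))$) and all names and shapes are assumptions for illustration:

```python
import numpy as np

def sigmoid(h):
    return 1.0 / (1.0 + np.exp(-h))

def sigmoid_prime(h):
    s = sigmoid(h)
    return s * (1.0 - s)

# One batch update of the hidden-to-output weights W (shape (n_out, n_hidden)):
#   delta_i^mu = (y_i^mu - O_i^mu) * g'(h_i^mu)
#   Delta W_ij = eta * sum_mu delta_i^mu * V_j^mu
# V: (p, n_hidden) hidden outputs, y: (p, n_out) targets, eta: learning rate.
def update_output_weights(W, V, y, eta=0.1):
    h = V @ W.T                          # output net inputs h_i^mu
    O = sigmoid(h)                       # network outputs O_i^mu
    delta = (y - O) * sigmoid_prime(h)   # delta_i^mu
    return W + eta * delta.T @ V         # W + Delta W, summed over patterns mu
```

A single step with a small learning rate moves W in the direction of steepest descent, so the error E can only decrease for sufficiently small eta.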
Back-Prop : Updating Input-to-Hidden Weights ( I )
$$\Delta w_{jk} = -\eta \frac{\partial E}{\partial w_{jk}} = \eta \sum_{\mu,i} \left(y_i^\mu - O_i^\mu\right) \frac{\partial O_i^\mu}{\partial w_{jk}} = \eta \sum_{\mu,i} \left(y_i^\mu - O_i^\mu\right) g'\!\left(h_i^\mu\right) \frac{\partial h_i^\mu}{\partial w_{jk}}$$

with

$$\frac{\partial h_i^\mu}{\partial w_{jk}} = \frac{\partial}{\partial w_{jk}} \sum_l W_{il}\, V_l^\mu = W_{ij}\, \frac{\partial V_j^\mu}{\partial w_{jk}} = W_{ij}\, g'\!\left(h_j^\mu\right) \frac{\partial h_j^\mu}{\partial w_{jk}}$$
Back-Prop : Updating Input-to-Hidden Weights ( II )
Moreover,

$$\frac{\partial h_j^\mu}{\partial w_{jk}} = \frac{\partial}{\partial w_{jk}} \sum_m w_{jm}\, x_m^\mu = x_k^\mu$$

Hence, we get:

$$\Delta w_{jk} = \eta \sum_{\mu,i} \left(y_i^\mu - O_i^\mu\right) g'\!\left(h_i^\mu\right) W_{ij}\, g'\!\left(h_j^\mu\right) x_k^\mu = \eta \sum_{\mu,i} \delta_i^\mu\, W_{ij}\, g'\!\left(h_j^\mu\right) x_k^\mu = \eta \sum_{\mu} \hat{\delta}_j^\mu\, x_k^\mu$$

where:

$$\hat{\delta}_j^\mu = g'\!\left(h_j^\mu\right) \sum_i W_{ij}\, \delta_i^\mu$$
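Putting the two update rules together, one full forward/backward training step for a two-layer network might be sketched as follows; the sigmoid activation and all names, shapes, and values are assumptions for illustration:

```python
import numpy as np

def sigmoid(h):
    return 1.0 / (1.0 + np.exp(-h))

def sigmoid_prime(h):
    s = sigmoid(h)
    return s * (1.0 - s)

# One full backprop step, combining both update rules:
#   delta_i^mu     = (y_i^mu - O_i^mu) g'(h_i^mu)          (output layer)
#   delta_hat_j^mu = g'(h_j^mu) * sum_i W_ij delta_i^mu    (hidden layer)
#   Delta W_ij = eta * sum_mu delta_i^mu * V_j^mu
#   Delta w_jk = eta * sum_mu delta_hat_j^mu * x_k^mu
# x: (p, n_in), y: (p, n_out), w: (n_hidden, n_in), W: (n_out, n_hidden).
def backprop_step(w, W, x, y, eta=0.1):
    # forward pass
    h_hidden = x @ w.T              # hidden net inputs h_j^mu
    V = sigmoid(h_hidden)           # hidden outputs V_j^mu
    h_out = V @ W.T                 # output net inputs h_i^mu
    O = sigmoid(h_out)              # network outputs O_i^mu
    # backward pass
    delta = (y - O) * sigmoid_prime(h_out)             # delta_i^mu
    delta_hat = (delta @ W) * sigmoid_prime(h_hidden)  # delta_hat_j^mu
    return w + eta * delta_hat.T @ x, W + eta * delta.T @ V
```

Repeating this step drives E down; iterating it over the training set is exactly gradient descent on the error function defined earlier.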