Academic year: 2021

(1)

Theoretical / Practical Questions

• How many layers are needed for a given task?

• How many units per layer?

• To what extent does representation matter?

• What do we mean by generalization?

• To what extent can we expect a network to generalize?

Generalization: performance of the network on data not included in the training set

Size of the training set: how large a training set should be for “good” generalization?

Size of the network: too many weights in a network result in poor generalization

(2)
(3)

(a) A good fit to noisy data. (b) Overfitting of the same data: the fit is perfect on the "training set" (x's), but is likely to be poor on the "test set" represented by the circle.
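The contrast in this caption can be reproduced numerically. As an illustrative sketch (toy data, not from the slides): fitting a degree-9 polynomial through 10 noisy samples drives the training error to essentially zero, while the error on held-out points stays large.

```python
import numpy as np

rng = np.random.default_rng(0)

# Noisy training samples of a smooth function (the "x's" in the figure)
x_train = np.linspace(0.0, 1.0, 10)
y_train = np.sin(2 * np.pi * x_train) + rng.normal(0.0, 0.2, x_train.size)

# Held-out test points from the same underlying function (the "circles")
x_test = np.linspace(0.05, 0.95, 10)
y_test = np.sin(2 * np.pi * x_test)

# Degree 9 interpolates all 10 training points: a "perfect" training fit
coef = np.polyfit(x_train, y_train, deg=9)

train_mse = np.mean((np.polyval(coef, x_train) - y_train) ** 2)
test_mse = np.mean((np.polyval(coef, x_test) - y_test) ** 2)
# train_mse is essentially zero; test_mse is orders of magnitude larger
```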

(4)

Motivation

The size (i.e. the number of hidden units) of an artificial neural network affects both its functional capabilities and its generalization performance

Small networks may not be able to realize the desired input / output mapping

Large networks lead to poor generalization performance

(5)

The Pruning Approach

Train an over-dimensioned net and then remove redundant nodes and / or connections:

Sietsma & Dow (1988, 1991)

Mozer & Smolensky (1989)

Burkitt (1991)

Advantages:

arbitrarily complex decision regions

faster training

independence of the training algorithm

(6)

The Proposed Method

Consider (for simplicity) a net with one hidden layer:

Suppose that node h is to be removed:

Remove h (and its in/out connections) and adjust the remaining weights so that the I/O behavior is the same

(7)

This is equivalent to solving, for every output unit $i = 1, \dots, n_o$ and every pattern $\mu = 1, \dots, P$, the system:

$$\sum_{j=1}^{n_h} w_{ij}\, y_j^{(\mu)} = \sum_{\substack{j=1 \\ j \neq h}}^{n_h} \left( w_{ij} + \delta_{ij} \right) y_j^{(\mu)}$$

which is equivalent to:

$$\sum_{\substack{j=1 \\ j \neq h}}^{n_h} \delta_{ij}\, y_j^{(\mu)} = w_{ih}\, y_h^{(\mu)}$$

In a more compact notation:

$$A\,x = b$$

with $A \in \mathbb{R}^{P n_o \times n_o (n_h - 1)}$, $x$ the vector of adjustments $\delta_{ij}$, and $b$ the vector with entries $w_{ih}\, y_h^{(\mu)}$.

LS solution:

$$\min_x \| A x - b \|$$
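As a concrete sketch of this least-squares adjustment (toy sizes and random activations, not the slides' experiments): the system decouples across output units, so all rows of the adjustment matrix can be solved in one `lstsq` call against the matrix of hidden activations.

```python
import numpy as np

rng = np.random.default_rng(1)
P, n_h, n_o = 50, 6, 2                      # patterns, hidden units, outputs
Y = np.tanh(rng.normal(size=(P, n_h)))      # hidden activations y_j^(mu)
W = rng.normal(size=(n_o, n_h))             # hidden-to-output weights w_ij

h = 3                                       # unit chosen for removal
keep = [j for j in range(n_h) if j != h]

A = Y[:, keep]                              # P x (n_h - 1) coefficient matrix
B = np.outer(Y[:, h], W[:, h])              # P x n_o right-hand sides w_ih y_h^(mu)
delta = np.linalg.lstsq(A, B, rcond=None)[0].T   # n_o x (n_h - 1) adjustments

W_pruned = W[:, keep] + delta               # adjusted remaining weights

before = Y @ W.T                            # net output (pre-activation) with h
after = A @ W_pruned.T                      # output after removal + adjustment
naive = A @ W[:, keep].T                    # output if h were simply dropped
# By LS optimality: ||after - before|| <= ||naive - before||
```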

(8)

Detecting Excessive Units

Residual-reducing methods for linear least-squares problems (LLSPs) start with an initial solution $x_0$ and produce a sequence of points $\{x_k\}$ such that the residuals

$$r_k = b - A x_k$$

decrease: $\| r_{k+1} \| \le \| r_k \|$.

Starting point: $x_0 = 0$, so that $r_0 = b$.

Excessive units can therefore be detected by choosing the unit $h$ for which $\| b \|$ is minimum.
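Since the starting point is x_0 = 0 (so r_0 = b), the detection step reduces to scanning the candidate units and picking the one whose right-hand side, with entries w_ih y_h^(mu), has the smallest norm. A minimal sketch with toy data (sizes are illustrative):

```python
import numpy as np

rng = np.random.default_rng(2)
P, n_h, n_o = 50, 6, 2
Y = np.tanh(rng.normal(size=(P, n_h)))      # hidden activations
W = rng.normal(size=(n_o, n_h))             # hidden-to-output weights

# ||b|| for candidate h: norm of all entries w_ih * y_h^(mu)
norms = [np.linalg.norm(np.outer(Y[:, h], W[:, h])) for h in range(n_h)]
h_star = int(np.argmin(norms))              # unit to try removing first
```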

(9)

The Proposed Approach

Instead of analyzing the consistency of the system, we directly solve it in the least squares sense:

$$\text{find } x \text{ such that } \| A x - b \| \text{ is minimum}$$

The method chosen is a projection method developed by Björck & Elfving (BIT, 1979), called CGPCNE.
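CGPCNE is a preconditioned conjugate-gradient method on the normal equations. As a simplified, unpreconditioned sketch (this is CGNR, not the exact CGPCNE iteration), the residual-reducing structure looks like:

```python
import numpy as np

def cgnr(A, b, iters=100, tol=1e-12):
    """Conjugate gradient on the normal equations A^T A x = A^T b (CGNR)."""
    x = np.zeros(A.shape[1])
    r = b - A @ x                 # residual in the data space
    s = A.T @ r                   # gradient of the LS objective (up to sign)
    p = s.copy()
    gamma = s @ s
    for _ in range(iters):
        if gamma < tol:           # converged: A^T r ~ 0
            break
        q = A @ p
        alpha = gamma / (q @ q)
        x = x + alpha * p
        r = r - alpha * q
        s = A.T @ r
        gamma_new = s @ s
        p = s + (gamma_new / gamma) * p
        gamma = gamma_new
    return x

rng = np.random.default_rng(3)
A = rng.normal(size=(30, 5))
b = rng.normal(size=30)
x = cgnr(A, b)                    # matches the solution of min ||Ax - b||
```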

(10)

The Pruning Algorithm

1) Start with an over-sized trained network

2) Repeat

2.1) find the hidden unit h for which ||b|| is minimum

2.2) solve the corresponding system Ax = b

2.3) remove unit h

Until Perf(original) – Perf(pruned) > epsilon

3) Reject the last reduced network
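Putting the pieces together, a minimal end-to-end sketch of the loop. The toy random network and the performance measure (MSE against the original net's own output) are illustrative assumptions, not the slides' experimental setup:

```python
import numpy as np

rng = np.random.default_rng(4)
P, n_in, n_h = 80, 3, 8
X = rng.normal(size=(P, n_in))
V = rng.normal(size=(n_h, n_in))        # input-to-hidden weights (kept fixed)
W = rng.normal(size=(1, n_h))           # hidden-to-output weights
Y = np.tanh(X @ V.T)                    # hidden activations
target = Y @ W.T                        # original net's output = reference

def perf(Wc, Yc):
    return -np.mean((Yc @ Wc.T - target) ** 2)   # higher is better

def prune_once(Wc, Yc):
    """Steps 2.1-2.3: pick h with minimum ||b||, solve the LS system, remove h."""
    n = Yc.shape[1]
    h = min(range(n), key=lambda j: np.linalg.norm(np.outer(Yc[:, j], Wc[:, j])))
    keep = [j for j in range(n) if j != h]
    A = Yc[:, keep]
    B = np.outer(Yc[:, h], Wc[:, h])
    delta = np.linalg.lstsq(A, B, rcond=None)[0].T
    return Wc[:, keep] + delta, A

eps = 1e-3
p0 = perf(W, Y)
W_cur, Y_cur = W, Y
while Y_cur.shape[1] > 1:
    W_new, Y_new = prune_once(W_cur, Y_cur)
    if p0 - perf(W_new, Y_new) > eps:   # performance dropped too much:
        break                           # reject the last reduced network
    W_cur, Y_cur = W_new, Y_new
```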

(11)

Example I : 4-bit parity

Ten initial 4-10-1 networks were pruned to 5 hidden nodes on average (nine 4-5-1, one 4-4-1).

[Plot: recognition rate (%) and MSE (0 to 0.25) vs. number of hidden units (10 down to 1); the MINIMUM NET is marked.]

(12)

Example II : 4-bit symmetry

Ten initial 4-10-1 networks were pruned to 4.6 hidden nodes on average.

[Plot: recognition rate (%) and MSE (0 to 0.25) vs. number of hidden units (10 down to 1); the MINIMUM NET is marked.]

(13)

Optimal Brain Surgeon (OBS)
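As a reminder of the standard OBS formulation from the Hassibi & Stork reference (notation mine, not copied from these slides): OBS minimizes the quadratic increase in error under the constraint that weight q is zeroed, which yields a saliency for each weight and a full-weight-vector update.

```latex
% OBS: minimize the quadratic increase in error subject to zeroing weight q
\Delta E \approx \tfrac{1}{2}\, \delta\mathbf{w}^{\top} H \, \delta\mathbf{w}
\quad \text{s.t.} \quad \mathbf{e}_q^{\top} \delta\mathbf{w} + w_q = 0

% Lagrangian solution: saliency of weight q and the corresponding update
L_q = \frac{w_q^2}{2\,[H^{-1}]_{qq}}, \qquad
\delta\mathbf{w} = -\frac{w_q}{[H^{-1}]_{qq}}\, H^{-1} \mathbf{e}_q
```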

(14)
(15)
(16)
(17)
(18)

Comparison with OBD
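For the comparison it helps to recall (a standard result, not from these slides) that OBD (Optimal Brain Damage) assumes a diagonal Hessian and performs no compensating weight update, so its saliency drops the off-diagonal coupling that OBS retains:

```latex
% OBD saliency under the diagonal-Hessian assumption (no weight update)
s_q = \tfrac{1}{2}\, H_{qq}\, w_q^2
```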

(19)

OBS/OBD comparison on "stopped" networks

(20)

References

G. Castellano, A. M. Fanelli, M. Pelillo. A method of pruning feed-forward neural networks. IEEE Transactions on Neural Networks, 1997.

B. Hassibi, D. G. Stork. Second order derivatives for network pruning: Optimal Brain Surgeon. In: NIPS, 1993.
