
(1)

Hopfield Network

• Single Layer Recurrent Network

• Bidirectional Symmetric Connection

• Binary / Continuous Units

• Associative Memory

• Optimization Problem

(2)

Hopfield Model – Discrete Case

Recurrent neural network that uses McCulloch-Pitts (binary) neurons.

Update rule is stochastic.

Each neuron has two "states": $V_i^L = -1$, $V_i^H = +1$ (usually $V_i^L = 0$, $V_i^H = 1$).

The input to neuron $i$ is:

$$H_i = \sum_j w_{ij} V_j + I_i$$

where:

$w_{ij}$ = strength of the connection from $j$ to $i$

$V_j$ = state (or output) of neuron $j$

$I_i$ = external input to neuron $i$

(3)

Hopfield Model – Discrete Case

Each neuron updates its state in an asynchronous way, using the following rule:

The updating of states is a stochastic process:

To select the to-be-updated neurons we can proceed in either of two ways:

At each time step select at random a unit i to be updated (useful for simulation)

Let each unit independently choose to update itself with some constant probability per unit time (useful for modeling and hardware implementation)

$$V_i = \begin{cases} +1 & \text{if } H_i = \sum_j w_{ij} V_j + I_i > 0 \\[4pt] -1 & \text{if } H_i = \sum_j w_{ij} V_j + I_i < 0 \end{cases}$$

(the state is left unchanged when $H_i = 0$).
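As an illustration, here is a minimal Python sketch of this asynchronous update rule. The random symmetric weight matrix, the zero external input, and the number of update steps are illustrative assumptions, not taken from the slides:

```python
import numpy as np

def async_update(W, V, I, rng):
    """Pick one unit at random and apply the sign update rule."""
    i = rng.integers(len(V))
    h = W[i] @ V + I[i]              # input H_i to neuron i
    if h > 0:
        V[i] = 1
    elif h < 0:
        V[i] = -1                    # if h == 0 the state is unchanged
    return V

rng = np.random.default_rng(0)
n = 8
W = rng.normal(size=(n, n))
W = (W + W.T) / 2                    # symmetric weights
np.fill_diagonal(W, 0)               # no self-connections
I = np.zeros(n)
V = rng.choice([-1, 1], size=n)

for _ in range(100 * n):             # random asynchronous updates
    V = async_update(W, V, I, rng)
print(V)                             # a fixed point of the dynamics
```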

(4)

Dynamics of Hopfield Model

In contrast to feed-forward networks (which are "static"), Hopfield networks are dynamical systems.

The network starts from an initial state $V(0) = (V_1(0), \dots, V_n(0))^T$ and evolves in state space following a trajectory, until it reaches a fixed point:

$$V(t+1) = V(t)$$

(5)

Dynamics of Hopfield Networks

What is the dynamical behavior of a Hopfield network? Does it converge? Does it produce cycles?

[Figure: examples (a) and (b).]

(6)

Dynamics of Hopfield Networks

To study the dynamical behavior of Hopfield networks we make the following assumption: the connection weights are symmetric. In other words, if $W = (w_{ij})$ is the weight matrix we assume:

$$w_{ij} = w_{ji} \qquad \forall\, i, j = 1, \dots, n \qquad (W = W^T)$$

In this case the network always converges to a fixed point: the system possesses a Lyapunov (or energy) function that is minimized as the process evolves.

(7)

The Energy Function – Discrete Case

Consider the following real function:

$$E = -\frac{1}{2} \sum_{i=1}^{n} \sum_{j=1}^{n} w_{ij} V_i V_j - \sum_{i=1}^{n} I_i V_i$$

and let

$$\Delta E = E(t+1) - E(t)$$

Assuming that neuron $h$ has changed its state, we have:

$$\Delta E = -\Delta V_h \underbrace{\left( \sum_j w_{hj} V_j + I_h \right)}_{H_h}$$

But $\Delta V_h$ and $H_h$ have the same sign. Hence

$$\Delta E \le 0$$

(provided that $W = W^T$).
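A quick numerical check of this property (a sketch: the random symmetric instance and the number of steps are illustrative assumptions):

```python
import numpy as np

def energy(W, V, I):
    """Discrete Hopfield energy E = -1/2 V'WV - I'V."""
    return -0.5 * V @ W @ V - I @ V

rng = np.random.default_rng(1)
n = 10
W = rng.normal(size=(n, n)); W = (W + W.T) / 2; np.fill_diagonal(W, 0)
I = rng.normal(size=n)
V = rng.choice([-1.0, 1.0], size=n)

E_prev = energy(W, V, I)
for _ in range(1000):
    i = rng.integers(n)
    h = W[i] @ V + I[i]
    if h != 0:
        V[i] = np.sign(h)            # asynchronous sign update
    E = energy(W, V, I)
    assert E <= E_prev + 1e-12       # Delta E <= 0 at every step
    E_prev = E
print("final energy:", E_prev)
```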

(8)

[Figure: schematic configuration space of a model with three attractors.]

(9)

Hopfield Net As Associative Memory

Store a set of $p$ patterns $x^\mu$, $\mu = 1, \dots, p$, in such a way that when presented with a new pattern $x$, the network responds by producing whichever stored pattern most closely resembles $x$.

$N$ binary units, with outputs $s_1, \dots, s_N$

Stored patterns and test patterns are binary ($0/1$ or $\pm 1$)

Connection weights (Hebb rule):

$$w_{ij} = \frac{1}{N} \sum_{\mu=1}^{p} x_i^\mu x_j^\mu$$

Hebb suggested changes in synaptic strengths proportional to the correlation between the firing of the pre- and post-synaptic neurons.

Recall mechanism:

$$s_i = \operatorname{Sgn}\left( \sum_j w_{ij} s_j - \theta_i \right)$$

Synchronous / asynchronous updating

Pattern information is stored in the equilibrium states of the network

(10)

Example With Two Patterns

Two patterns:

$$x^1 = (-1, -1, -1, +1) \qquad x^2 = (+1, +1, +1, +1)$$

Compute weights:

$$w_{ij} = \frac{1}{4} \sum_{\mu=1}^{2} x_i^\mu x_j^\mu$$

Weight matrix:

$$W = \frac{1}{4} \begin{pmatrix} 2 & 2 & 2 & 0 \\ 2 & 2 & 2 & 0 \\ 2 & 2 & 2 & 0 \\ 0 & 0 & 0 & 2 \end{pmatrix}$$

Recall:

$$s_i = \operatorname{Sgn}\left( \sum_j w_{ij} s_j \right)$$

Input $(-1,-1,-1,+1) \to (-1,-1,-1,+1)$ — stable

Input $(-1,+1,+1,+1) \to (+1,+1,+1,+1)$ — stable

Input $(-1,-1,-1,-1) \to (-1,-1,-1,-1)$ — spurious
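These computations can be checked directly. A minimal sketch (synchronous updates and a zero threshold with $\operatorname{Sgn}(0) = +1$ are assumed):

```python
import numpy as np

# Two-pattern example: Hebb weights w_ij = (1/N) sum_mu x_i^mu x_j^mu,
# then sign-rule recall.
X = np.array([[-1, -1, -1, +1],
              [+1, +1, +1, +1]])       # the two stored patterns
N = X.shape[1]
W = (X.T @ X) / N                      # equals 1/4 * [[2,2,2,0],...,[0,0,0,2]]

def recall(s, steps=10):
    for _ in range(steps):             # synchronous updates
        s = np.where(W @ s >= 0, 1, -1)
    return s

print(recall(np.array([-1, -1, -1, +1])))   # -> (-1,-1,-1,+1)  stable
print(recall(np.array([-1, +1, +1, +1])))   # -> (+1,+1,+1,+1)  stable
print(recall(np.array([-1, -1, -1, -1])))   # -> (-1,-1,-1,-1)  spurious
```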

(11)

Associative Memory Examples

An example of the behavior of a Hopfield net when used as a content-addressable memory. A 120-node net was trained using the eight exemplars shown in (A). The pattern for the digit "3" was corrupted by randomly reversing each bit with a probability of 0.25 and then applied to the net at time zero. Outputs at time zero and after the first seven iterations are shown in (B).

(12)

Associative Memory Examples

Example of how an associative memory can reconstruct images. These are binary images with 130 × 180 pixels. The images on the right were recalled by the memory after presentation of the corrupted images shown on the left. The middle column shows some intermediate states. A sparsely connected Hopfield network with seven stored images was used.

(13)

Storage Capacity of Hopfield Network

• There is a maximum limit on the number of random patterns that a Hopfield network can store: $P_{\max} \approx 0.15N$. If $p < 0.15N$, recall is almost perfect (see the sketch after this list).

• If the memory patterns are orthogonal vectors instead of random patterns, more patterns can be stored. However, this is not useful.

• The evoked memory is not necessarily the stored pattern most similar to the input pattern.

• Not all patterns are remembered with equal emphasis; some are evoked inappropriately often.

• Sometimes the network evokes spurious states.
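A rough way to observe the $0.15N$ threshold empirically (a sketch; $N$ and the values of $p$ are illustrative choices):

```python
import numpy as np

# Store p random patterns with the Hebb rule and count how many are
# fixed points of one synchronous update.
rng = np.random.default_rng(2)
N = 200
for p in (10, 20, 30, 40, 60):               # 0.15 * N = 30
    X = rng.choice([-1, 1], size=(p, N))
    W = (X.T @ X) / N
    np.fill_diagonal(W, 0)
    stable = sum(np.array_equal(np.where(W @ x >= 0, 1, -1), x) for x in X)
    print(f"p = {p:3d} (p/N = {p/N:.2f}): {stable}/{p} patterns stable")
```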

(14)

Hopfield Model – Continuous Case

The Hopfield model can be generalized using continuous activation functions, which gives a more plausible model. In this case:

$$V_i = g_\beta(u_i) = g_\beta\!\left( \sum_j w_{ij} V_j + I_i \right)$$

where $g_\beta$ is a continuous, increasing, nonlinear function. Examples:

$$g_\beta(u) = \tanh(\beta u) = \frac{e^{\beta u} - e^{-\beta u}}{e^{\beta u} + e^{-\beta u}} \;\in\; \left]-1, 1\right[$$

$$g_\beta(u) = \frac{1}{1 + e^{-2\beta u}} \;\in\; \left]0, 1\right[$$

(15)

Activation Function

[Figure: plot of $f(x) = \tanh(\beta x)$ for $\beta > 1$, $\beta = 1$, $\beta < 1$; the output ranges between $-1$ and $+1$.]

(16)

Updating Rules

Several possible choices for updating the units:

Asynchronous updating: one unit at a time is selected to have its output set

Synchronous updating: at each time step all units have their output set

Continuous updating: all units continuously and simultaneously change their outputs

(17)

Continuous Hopfield Models

Using the continuous updating rule, the network evolves according to the following set of (coupled) differential equations:

$$\tau_i \frac{dV_i}{dt} = -V_i + g_\beta(u_i) = -V_i + g_\beta\!\left( \sum_j w_{ij} V_j + I_i \right)$$

where the $\tau_i$ are suitable time constants ($\tau_i > 0$).

Note: when the system reaches a fixed point ($dV_i/dt = 0$) we get $V_i = g_\beta(u_i)$.

In practice, we study a very similar dynamics:

$$\tau_i \frac{du_i}{dt} = -u_i + \sum_j w_{ij}\, g_\beta(u_j) + I_i$$
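A minimal numerical sketch of this dynamics using forward-Euler integration (the random instance, $\beta$, $\tau$, step size, and duration are illustrative assumptions):

```python
import numpy as np

# Integrate  tau * du_i/dt = -u_i + sum_j w_ij * g_beta(u_j) + I_i
# with V_i = g_beta(u_i) = tanh(beta * u_i).
rng = np.random.default_rng(3)
n, beta, tau, dt = 6, 2.0, 1.0, 0.01
W = rng.normal(size=(n, n)); W = (W + W.T) / 2; np.fill_diagonal(W, 0)
I = rng.normal(size=n)
u = 0.1 * rng.normal(size=n)

for _ in range(20000):
    V = np.tanh(beta * u)                    # V_i = g_beta(u_i)
    u += (dt / tau) * (-u + W @ V + I)       # Euler step
print(np.tanh(beta * u))                     # fixed point: V = g_beta(W V + I)
```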

(18)

The Energy Function

Like the discrete model, the continuous Hopfield network has an "energy" function, provided that $W = W^T$:

$$E = -\frac{1}{2} \sum_i \sum_j w_{ij} V_i V_j + \frac{1}{\beta} \sum_i \int_0^{V_i} g^{-1}(V)\, dV - \sum_i I_i V_i$$

It is easy to prove that

$$\frac{dE}{dt} \le 0$$

with equality iff the net reaches a fixed point.

(19)

Continuous Hopfield Model (Energy)

$$\begin{aligned}
\frac{dE}{dt} &= -\sum_i \frac{dV_i}{dt} \left( \sum_j w_{ij} V_j + I_i - \frac{1}{\beta}\, g^{-1}(V_i) \right) \\
&= -\sum_i \frac{dV_i}{dt} \left( \sum_j w_{ij} V_j + I_i - u_i \right) \\
&= -\sum_i \tau_i\, \frac{dV_i}{dt}\, \frac{du_i}{dt} \\
&= -\sum_i \tau_i\, g_\beta'(u_i) \left( \frac{du_i}{dt} \right)^2 \;\le\; 0
\end{aligned}$$

because $g_\beta$ is monotonically increasing ($g_\beta' > 0$) and $\tau_i > 0$.

N.B.

$$\frac{dE}{dt} = 0 \;\Longleftrightarrow\; \frac{du_i}{dt} = 0 \quad \forall i$$

i.e., $u$ is an equilibrium point.

(20)

Continuous Hopfield Model (Relation to the Discrete Model)

There is a close relationship between the continuous model and the discrete one.

Note that:

$$V_i = g_\beta(u_i) = g(\beta u_i), \qquad g \equiv g_1$$

hence:

$$u_i = \frac{1}{\beta}\, g^{-1}(V_i)$$

The second term in $E$ becomes:

$$\frac{1}{\beta} \sum_i \int_0^{V_i} g^{-1}(V)\, dV$$

The integral is positive (0 if $V_i = 0$). For $\beta \to \infty$ the term becomes negligible, so the energy function $E$ of the continuous model becomes identical to that of the discrete model.

(21)

Optimization Using Hopfield Network

• Energy function of the Hopfield network:

$$E = -\frac{1}{2} \sum_i \sum_j w_{ij} V_i V_j - \sum_i I_i V_i$$

• The network will evolve into a (locally / globally) minimum-energy state.

• Any quadratic cost function can be rewritten as the Hopfield network energy function; therefore, it can be minimized using a Hopfield network (see the mapping sketched below).

• Classical Traveling Salesperson Problem (TSP)

• Many other applications:

2-D, 3-D object recognition

Image restoration

Stereo matching

Computing optical flow
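To make the rewriting step concrete, take a generic quadratic cost (a sketch: the symbols $Q$ and $c$ are illustrative, not used elsewhere in the slides):

$$C(V) = \frac{1}{2} \sum_i \sum_j Q_{ij} V_i V_j + \sum_i c_i V_i$$

Choosing $w_{ij} = -Q_{ij}$ and $I_i = -c_i$ makes $E(V) = C(V)$, so letting the network relax to a minimum of its energy minimizes the cost.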

(22)

The Traveling Salesman Problem

Problem statement: a traveling salesman must visit every city in his territory exactly once and then return to his starting point. Given the cost of travel between all pairs of cities, find the minimum-cost tour.

• NP-complete problem

• Exhaustive enumeration: $n$ nodes, $n!$ enumerations, $(n-1)!$ distinct enumerations, $(n-1)!/2$ distinct undirected enumerations.

Example: $n = 20$ gives $19!/2 \approx 6.1 \times 10^{16}$ distinct undirected tours.

(23)

The Traveling Salesman Problem

TSP: find the shortest tour connecting a set of cities.

Following Hopfield & Tank (1985), a tour can be represented by a permutation matrix (rows = cities, columns = stops):

      1   2   3   4
  A   0   1   0   0
  B   1   0   0   0
  C   0   0   1   0
  D   0   0   0   1

Tour: BACD

(24)

The Traveling Salesman Problem

The TSP, showing a good (a) and a bad (b) solution to the same problem

Network to solve a four-city TSP. Solid and open circles denote units that are on and off, respectively, when the net is representing the tour 3-2-4-1. The connections are shown only for unit n22; solid lines are inhibitory connections of strength $-d_{ij}$, and dotted lines are uniform inhibitory connections of strength $-\gamma$. All connections are symmetric. Thresholds are not shown.


(25)

Artificial Neural Network Solution

The solution to an $n$-city problem is represented by an $n \times n$ permutation matrix $V$:

$X$ = city; $i$ = stop at which the city is visited

Voltage output: $V_{X,i}$, with $V_{X,i} = 1$ if city $X$ is visited at stop $i$

Connection weights: $T_{Xi,Yj}$

$n^2$ neurons

$d_{XY}$ = distance between city $X$ and city $Y$

(26)

Artificial Neural Network Solution

Data term: we want to minimize the total distance

$$E_1 = \frac{D}{2} \sum_X \sum_{Y \ne X} \sum_i d_{XY}\, V_{X,i} \left( V_{Y,i+1} + V_{Y,i-1} \right)$$

Constraint terms:

Each city must be visited once:

$$E_2 = \frac{A}{2} \sum_X \sum_i \sum_{j \ne i} V_{X,i} V_{X,j}$$

Each stop must contain one city:

$$E_3 = \frac{B}{2} \sum_i \sum_X \sum_{Y \ne X} V_{X,i} V_{Y,i}$$

The matrix must contain $n$ entries:

$$E_4 = \frac{C}{2} \left( \sum_X \sum_i V_{X,i} - n \right)^2$$

(27)

Artificial Neural Network Solution

$A$, $B$, $C$, and $D$ are positive constants; indices are taken modulo $n$.

Total cost function:

$$E = \frac{A}{2} \sum_X \sum_i \sum_{j \ne i} V_{X,i} V_{X,j} + \frac{B}{2} \sum_i \sum_X \sum_{Y \ne X} V_{X,i} V_{Y,i} + \frac{C}{2} \left( \sum_X \sum_i V_{X,i} - n \right)^2 + \frac{D}{2} \sum_X \sum_{Y \ne X} \sum_i d_{XY}\, V_{X,i} \left( V_{Y,i+1} + V_{Y,i-1} \right)$$

The energy function of the Hopfield network is:

$$E = -\frac{1}{2} \sum_{X,i} \sum_{Y,j} T_{Xi,Yj}\, V_{X,i} V_{Y,j} - \sum_{X,i} I_{Xi}\, V_{X,i}$$

(28)

Weights of the Network

The coefficients of the quadratic terms in the cost function define the weights of the connections in the network:

$$T_{Xi,Yj} = \underbrace{-A\, \delta_{XY} (1 - \delta_{ij})}_{\text{row inhibition}} \;\underbrace{-\; B\, \delta_{ij} (1 - \delta_{XY})}_{\text{column inhibition}} \;\underbrace{-\; C}_{\text{global inhibition}} \;\underbrace{-\; D\, d_{XY} \left( \delta_{j,i+1} + \delta_{j,i-1} \right)}_{\text{data term}}$$

where

$$\delta_{ij} = \begin{cases} 1 & \text{if } i = j \\ 0 & \text{if } i \ne j \end{cases}$$

The external excitatory current is:

$$I_{Xi} = C\, n$$
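A sketch of this weight construction in Python. The 4-index array layout and the helper name tsp_weights are my own choices; the parameter values follow the experiments slide, and the random city layout is an illustrative assumption:

```python
import numpy as np

def tsp_weights(d, A=500.0, B=500.0, C=200.0, D=500.0):
    """Build T_{Xi,Yj} and I_{Xi} for the n-city TSP network."""
    n = d.shape[0]
    T = np.zeros((n, n, n, n))               # indexed as T[X, i, Y, j]
    for X in range(n):
        for i in range(n):
            for Y in range(n):
                for j in range(n):
                    T[X, i, Y, j] = (
                        -A * (X == Y) * (i != j)     # row inhibition
                        - B * (i == j) * (X != Y)    # column inhibition
                        - C                          # global inhibition
                        - D * d[X, Y] * ((j == (i + 1) % n) +
                                         (j == (i - 1) % n))  # data term
                    )
    I = C * n * np.ones((n, n))              # external excitatory current
    return T, I

rng = np.random.default_rng(4)
cities = rng.random((10, 2))                 # 10 cities in the unit square
d = np.linalg.norm(cities[:, None] - cities[None, :], axis=-1)
T, I = tsp_weights(d)
print(T.shape, I.shape)                      # (10, 10, 10, 10) (10, 10)
```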

(29)

Experiments

10-city problem, 100 neurons.

The locations of the 10 cities are chosen randomly with a uniform p.d.f. in the unit square.

Parameters: A = B = 500, C = 200, D = 500.

The size of the squares corresponds to the value of the voltage output of the corresponding neurons.

Path: D-H-I-F-G-E-A-J-C-B

(30)

TSP – An Alternative Formulation

Another way to express the TSP constraints (i.e., that $V$ be a permutation matrix) is the following:

Row constraint:

$$E_2 = \frac{A}{2} \sum_X \left( 1 - \sum_i V_{X,i} \right)^2$$

Column constraint:

$$E_3 = \frac{B}{2} \sum_i \left( 1 - \sum_X V_{X,i} \right)^2$$

The energy function becomes:

$$E = E_2 + E_3 + \frac{D}{2} \sum_X \sum_{Y \ne X} \sum_i d_{XY}\, V_{X,i} \left( V_{Y,i+1} + V_{Y,i-1} \right)$$

Advantage: fewer parameters ($A$, $B$, $D$).

(31)

The n-Queens Problem

One can build an $n \times n$ network in which neuron $(i, j)$ is active if and only if a queen occupies position $(i, j)$.

There are 4 constraints:

1. only one queen in each row

2. only one queen in each column

3. only one queen on each diagonal

4. exactly $n$ queens on the board

(32)

The n-Queens Problem

The weight matrix is determined as follows:

$$T_{ij,kl} = -A\, \delta_{ik} (1 - \delta_{jl}) - B\, \delta_{jl} (1 - \delta_{ik}) - C - D \left( \delta_{i+j,\,k+l} + \delta_{i-j,\,k-l} \right) (1 - \delta_{ik})$$

where:

$A$ ≡ inhibitory weight on the rows

$B$ ≡ inhibitory weight on the columns

$C$ ≡ "global" inhibitory weight

$D$ ≡ inhibitory weight on the diagonals

(33)

Hopfield Networks for Optimization

Problems with the original Hopfield model:

1) the number of connections is $O(n^4)$ and the number of units is $O(n^2)$

2) difficulty in determining the parameters $A$, $B$, $C$, $D$

3) the solutions obtained are rarely "permutation matrices"

4) difficulty in avoiding local minima

(34)

The Maximum Clique Problem (MCP)

You are given:

An undirected graph $G = (V, E)$, where

- $V = \{1, \dots, n\}$

- $E \subseteq V \times V$

and are asked to:

Find the largest complete subgraph (clique) of $G$.

The problem is known to be NP-hard, and so is the problem of determining just the size of the maximum clique. Pardalos and Xue (1994) provide a review of the MCP with 260 references.

(35)

The Maximum Clique Problem (MCP)

Approaching the MCP in neural-network terms requires transforming the MCP from a discrete problem into a continuous one.

In the TSP example with the Hopfield model, the inverse mapping is not guaranteed (we might obtain, for example, a matrix with no meaning); in the MCP, this bidirectionality is mandatory.

(36)

Some Notation

Given an arbitrary graph $G = (V, E)$ with $n$ nodes:

If $C \subseteq V$, $x^C$ will denote its characteristic vector, which is defined as:

$$x_i^C = \begin{cases} 1/|C| & \text{if } i \in C \\ 0 & \text{otherwise} \end{cases}$$

$S_n$ is the standard simplex in $\mathbb{R}^n$:

$$S_n = \left\{ x \in \mathbb{R}^n : \sum_{i=1}^n x_i = 1 \text{ and } x_i \ge 0 \ \forall i \right\}$$

$A = (a_{ij})$ is the adjacency matrix of $G$:

$$a_{ij} = \begin{cases} 1 & \text{if } v_i \sim v_j \\ 0 & \text{otherwise} \end{cases}$$

(37)

The Lagrangian of a graph

Consider the continuous function:

$$f(x) = x^T A x = \sum_{i=1}^n \sum_{j=1}^n a_{ij}\, x_i x_j$$

where $x^T$ is the transpose of $x$ and $A$ is the adjacency matrix. Since $a_{ij} = a_{ji} = 1$ exactly when $(i,j) \in E$, this is the Lagrangian of the graph:

$$f(x) = 2 \sum_{(i,j) \in E} x_i x_j$$

Example: for the graph with vertices $\{1, 2, 3\}$ and edges $(1,2)$, $(1,3)$:

$$f(x_1, x_2, x_3) = 2(x_1 x_2 + x_1 x_3)$$

(38)

The Motzkin-Straus Theorem

Theorem (Motzkin & Straus, 1965). Let $\omega(G)$ denote the clique number of $G$, i.e., the size of its maximum clique. Then:

$$\max_{x \in S_n} x^T A x = 1 - \frac{1}{\omega(G)}$$

and this maximum is attained at the characteristic vector of any maximum clique of $G$.

(40)

Continuous Formulation of MAX-CLIQUE

The bridge created by the Motzkin-Straus theorem is one-way: only if the returned vector has the form of a characteristic vector is there bidirectionality.

In the example seen there are two global maxima:

$$x' = \left( \tfrac{1}{2}, \tfrac{1}{2}, 0 \right)^T \qquad x'' = \left( 0, \tfrac{1}{2}, \tfrac{1}{2} \right)^T$$

One can show that all points of the segment $x'$–$x''$ are also global maxima, i.e., all points

$$\left( \tfrac{\alpha}{2},\ \tfrac{1}{2},\ \tfrac{1-\alpha}{2} \right)^T \qquad \forall \alpha \in [0, 1]$$

Since these are not characteristic vectors (spurious solutions), the maximum clique cannot be extracted from them. The solution consists in adding $\tfrac{1}{2}$ to the main diagonal of $A$:

$$A' = A + \tfrac{1}{2} I, \qquad \hat f(x) = x^T A' x = x^T \left( A + \tfrac{1}{2} I \right) x$$

(41)

The regularized Motzkin-Straus Theorem (Bomze, 1997)

Theorem. Given $C \subseteq V$ and its characteristic vector $x^C$:

- $C$ is a maximum clique of $G$ $\iff$ $x^C$ is a global maximum of $\hat f$ in $S_n$

- $C$ is a maximal clique of $G$ $\iff$ $x^C$ is a local maximum of $\hat f$ in $S_n$

- all local maxima are strict and are characteristic vectors

(42)

Evolutionary Games

Developed in evolutionary game theory to model the evolution of behavior in animal conflicts.

Assumptions

A large population of individuals belonging to the same species which compete for a particular limited resource

This kind of conflict is modeled as a game, the players being pairs of randomly selected population members

Players do not behave “rationally” but act according to a pre-programmed behavioral pattern, or pure strategy

Reproduction is assumed to be asexual

Utility is measured in terms of Darwinian fitness, or reproductive success

(43)

Notations

$J = \{1, \dots, n\}$ is the set of pure strategies.

$x_i(t)$ is the proportion of population members playing strategy $i$ at time $t$.

The state of the population at a given instant is the vector $x = (x_1, \dots, x_n)^T$.

Given a population state $x$, the support of $x$, denoted $\sigma(x)$, is defined as the set of positive components of $x$, i.e.:

$$\sigma(x) = \{\, i \in J : x_i > 0 \,\}$$

(44)

Payoffs

Let $A = (a_{ij})$ be the $n \times n$ payoff (or fitness) matrix.

$a_{ij}$ represents the payoff of an individual playing strategy $i$ against an opponent playing strategy $j$ ($i, j \in J$).

If the population is in state $x$, the expected payoff earned by an $i$-strategist is:

$$\pi_i(x) = \sum_{j=1}^n a_{ij} x_j = (Ax)_i$$

while the mean payoff over the entire population is:

$$\pi(x) = \sum_{i=1}^n x_i\, \pi_i(x) = x^T A x$$

(45)

Replicator Equations

Developed in evolutionary game theory to model the evolution of behavior in animal conflicts (Hofbauer & Sigmund, 1998; Weibull, 1995).

Let $W = (w_{ij})$ be a non-negative real-valued $n \times n$ matrix, and let:

$$\pi_i(t) = \sum_{j=1}^n w_{ij}\, x_j(t)$$

Continuous-time version:

$$\frac{d}{dt}\, x_i(t) = x_i(t) \left( \pi_i(t) - \sum_{j=1}^n x_j(t)\, \pi_j(t) \right)$$

Discrete-time version:

$$x_i(t+1) = \frac{x_i(t)\, \pi_i(t)}{\sum_{j=1}^n x_j(t)\, \pi_j(t)}$$

(46)

Replicator Equations & Fundamental Theorem of Selection

The simplex $S_n$ is invariant under both dynamics, and they have the same stationary points.

Theorem: If $W = W^T$, then the function

$$F(x) = x^T W x$$

is strictly increasing along any non-constant trajectory of both the continuous-time and the discrete-time replicator dynamics.
(47)

Mapping MCP’s onto Relaxation Nets

To (approximately) solve an MCP by relaxation, simply construct a network having $n$ units and weight matrix

$$W = A + \tfrac{1}{2} I$$

where $A$ is the adjacency matrix of $G$.
(48)

Mapping MCP’s onto Relaxation Nets

The system, starting from $u(0)$, will maximize the Motzkin-Straus function and will converge to a fixed point $u^*$ which corresponds to a (local) maximum of $\hat f$.

The value

$$k = \frac{1}{2\left(1 - \hat f(u^*)\right)}$$

can be regarded as an approximation of the maximum clique size.

The quality of the result is measured with the $Q$-measure:

$$Q = \frac{f_{RE} - f_{ave}}{\alpha - f_{ave}}$$

where $f_{ave}$ is the average-based reference value, $f_{RE}$ is the value obtained by the replicator equations, and $\alpha$ is the optimal value. When $Q \approx 1$ the result is good.
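A minimal sketch of the whole procedure: discrete-time replicator dynamics on $W = A + \tfrac{1}{2}I$. The small example graph, the start at the simplex barycenter, and the fixed iteration count are illustrative assumptions:

```python
import numpy as np

# Regularized Motzkin-Straus via discrete replicator dynamics.
A = np.array([[0, 1, 1, 0],
              [1, 0, 1, 0],
              [1, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)    # {0, 1, 2} is the maximum clique
n = len(A)
W = A + 0.5 * np.eye(n)

x = np.full(n, 1.0 / n)                      # start at the simplex barycenter
for _ in range(1000):
    pi = W @ x                               # payoffs pi_i = (W x)_i
    x = x * pi / (x @ pi)                    # discrete replicator step

f = x @ W @ x                                # value of the regularized function
k = 1.0 / (2.0 * (1.0 - f))                  # estimated clique size
print(np.round(x, 3), "clique size ~", round(k))
```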

(49)

Experimental Setup

Experiments were conducted on random graphs having:

size: $n$ = 10, 25, 50, 75, 100

density: $\delta$ = 0.10, 0.25, 0.50, 0.75, 0.90

Comparison with the Bron-Kerbosch (BK) clique-finding algorithm (1973).

For each pair ($n$, $\delta$), 100 graphs were generated randomly with size $n$ and density ≈ $\delta$. The case $n$ = 100 and $\delta$ = 0.90 was excluded due to the high cost of the BK algorithm.

Total number of graphs = 2400.

Values of the Q-measure for various sizes and densities:

  δ \ n      10          25          50          75          100
  0.10    0.99 (54)   0.99 (36)   0.99 (53)   0.97 (59)   0.92 (82)
  0.25    0.99 (54)   0.99 (64)   0.99 (84)   1.00 (98)   0.97 (112)
  0.50    1.00 (56)   0.99 (118)  0.99 (153)  0.96 (160)  0.90 (187)
  0.75    1.00 (99)   1.00 (175)  1.00 (268)  1.00 (284)  1.00 (369)
  0.90    1.00 (119)  1.00 (224)  1.00 (367)  0.99 (513)  ----

(50)

Graph Isomorphism

(51)
(52)

Results

(53)

References
