Unsupervised models and clustering
Table of contents
Introduction
The unsupervised learning framework
Self−Organizing Maps (SOMs)
Introduction − 1
In order to efficiently mimic the nervous system, it is necessary to get an idea of the nature of the biological processes that actually take place in the brain
The only reasonable assumption is that they are driven by mechanisms aimed at optimizing the target they have to pursue
Each cognitive process evolves from an initial situation associated to a stimulus, up to a terminal state in which a response is produced, which is the result of the process itself
It is intuitive that, in this evolution, there is a sort of conservation of information
In fact, it is just the initial stimulus that provides the information necessary to obtain the desired response
Such information must be transmitted and preserved until the conclusion of the process
A reasonable way for interpreting processes that take place in the nervous system is therefore to consider them as transfers of information that follow a “maximum conservation” criterion
Introduction − 2
The unsupervised learning framework − 1
Sometimes, in real-world applications, we do not have sufficient knowledge of the problem to be solved to rely on the supervised learning paradigm
In other words, we do not know the solution to the problem, not even sampled on a set of examples, since only data to be analyzed and interpreted are available, without prior guidelines
A classic problem of this type is to highlight some groups of data, with common characteristics, within a "disordered and miscellaneous" set
This problem can be addressed by using a self-organizing neural network, capable of learning data without a supervisor, which provides solutions at specific points in the weight space
The unsupervised learning framework − 2
Learning without supervision means aggregating "similar" examples in topologically neighboring regions, that is to say, using the internal features of the data to come up with groups inside them
Therefore, while in the supervised learning protocol the variation of synaptic connections is done in an attempt to minimize the error with respect to the information provided by the supervisor, in this case, the learning procedure is guided by “similarity” criteria, inherently present within the data
The connection weight changes by an amount proportional to the product of the activation values of the two neurons, thus tending to "strengthen" the strongest links and to "weaken" the weakest ones
This is more realistic, from the biological point of view, than Backpropagation
The unsupervised learning framework − 3
An unsupervised learning example: the Hebb rule
Hebb (1949): if two connected neurons are active at the same time, the force (or the weight) of their connection is increased
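As an illustration, a minimal sketch of a Hebbian weight update in NumPy; the learning rate, the linear units and the variable names are illustrative assumptions, not part of the original formulation:

```python
import numpy as np

def hebbian_update(W, x, y, eta=0.01):
    """Hebb rule: strengthen w_ij in proportion to the product of the
    activations of the post-synaptic unit y_i and pre-synaptic unit x_j."""
    return W + eta * np.outer(y, x)

# toy example: 3 input units, 2 output units with small random initial weights
rng = np.random.default_rng(0)
W = rng.normal(scale=0.1, size=(2, 3))
x = np.array([1.0, 0.0, 1.0])   # pre-synaptic activations
y = W @ x                        # post-synaptic activations (linear units)
W = hebbian_update(W, x, y)      # connections between co-active units are strengthened
```

Note that no target output is involved: the update depends only on the correlation between the activations of the two connected units.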
Summing up…
Given a set of examples, we want to find something interesting
The experience is given by the collected “examples”
Without any information other than the data itself…
The performance depends on how "interesting" the obtained result is
The unsupervised learning framework − 4
The unsupervised learning framework − 5
Self−organizing networks − 1
In the auditory cortex, neurons and fibers are anatomically arranged in an orderly manner with respect to the acoustic frequencies
In the central nervous system, the ganglion cells, which constitute the output stage of the retina, are organized according to receptive fields, sensitive to particular stimuli
Studies on brain maps during the prenatal stage, i.e. in the absence of sensory stimuli, have demonstrated that, at the lower levels, the neural structure is genetically predetermined, while at higher levels the organization matures and emerges as a result of the process of learning from examples
Trivially, it can be observed that when children learn to read, they mark the letters one by one and "think" while reading; adults, however, recognize whole words and sentences at a glance
This shows that there is a kind of cerebral self-organization, which develops the brain's ability to classify and easily recognize some "common" patterns, and which is also confirmed by the difficulty of reading a text upside down or containing attached words
Self−organizing networks − 2
Studied by Grossberg, Von Malsburg, Rumelhart, Zipser and Kohonen since the mid '80s, self-organizing neural networks are particularly well suited to those applications where there is no a priori knowledge on how to process certain information, so that the network is expected to automatically "discover" interesting concepts in the "data population"
Both the stimulus and the network response are characterized by a given number of components, describing points in appropriate spaces
Self−organizing networks − 3
The learning process can be interpreted as a geometric transformation from the input to the output space
The output space has a size smaller than that of the input space, since the stimulus normally contains the information required to activate many simultaneous processes
Compared to what a single process needs, the stimulus is therefore redundant
This means that, during self-organization, a redundancy reduction operation is carried out
In both the input and the output space, typical regions are evidenced, to which information is associated
Self−organizing networks − 4
The natural mechanism that controls the information transfer must then identify those regions, that are important for the cognitive process, and make sure that they will correspond in the transformation
Therefore, in the cognitive process, a data clustering operation is carried out, realized with the acquisition of experience
To both the operations of redundancy reduction and clustering, biological evidence can be attached, based on the functioning of the nervous system
Self−organizing networks − 5
Mapping from the input space to the space defined by the neurons of the self−organizing network
Self−organizing networks − 6
Self−organizing networks − 7
Initially, the network nodes are arbitrarily positioned in the data space
Graphically speaking, if the violet blob represents the distribution of the training data, and the white point is the current training pattern, the nearest neuron is selected and moved towards it, as are (to a lesser extent) its neighbors on the grid
After training, the grid tends to approximate the data distribution
The main use of self−organizing networks is to perform data analysis, in order to highlight regularities among data or to classify them
Self-organizing neural networks are suitable for solving problems different from those addressed by supervised learning approaches
They can be used for classification tasks − especially in the case in which the number of classes is unknown
They are used to implement clustering, for partitioning data sets into collections, or clusters, containing similar patterns
They can perform data compression
Self−organizing networks − 8
Self−Organizing Maps − 1
Biological motivation
The Kohonen network is modeled on the basis of a characteristic behaviour of neurons in laminar nervous tissues: such neurons are activated, under the action of a stimulus, in groups characterized by the presence or the absence of activity, defining as "activity" the emission of a number of pulses (per unit time) that exceeds a certain threshold
The spatial demarcation between the two groups is clear, so that we talk about the formation of "activation bubbles", obtained starting from a situation of undifferentiated low activity of all the neurons in the tissue
In the case of biological neurons, those neurons that are physically closer to active neurons have strong, excitatory connections, while the peripheral ones have inhibiting connections
In the cerebral cortex, there are projections of sensory stimuli on specific networks of cortical neurons
The sensorimotor neurons constitute a distorted map (the extent of each region is proportional to the sensitivity of the corresponding area, not to its size) of the body surface
However, adjacent parts of the cortex correspond to adjacent parts of the body surface
Self−Organizing Maps − 2
Self−Organizing Maps − 3
Kohonen modelled this feature by restricting the variation of the weights only to those neurons that are close to a selected neuron
SOM (T. Kohonen, 1981): "sensorial" maps, constituted by a single layer of neurons, where the units specialize themselves to respond to different stimuli, in such a way that…
different inputs activate different (far away) units
topologically neighboring units are activated by similar inputs
Self−Organizing Maps − 4
Architecture
A single layer of neurons u_i, i = 1, …, b·h (where b is the breadth and h the height of the map, in the bi-dimensional case)
Each input X = [x_1, x_2, …, x_n] is connected with all the neurons
To each connection, a weight w_ij is associated
Activation function f_i = 1/d(W_i, X), where d is a distance
Lateral untrainable connections
Self−Organizing Maps − 5
The weights of each neuron are modified:
in an excitatory sense, in proportion to the value of its activation and to those of the neurons belonging to its neighborhood, and also proportionally to the distance from them
in an inhibitory sense, in proportion to the value of the activation of the neurons outside the neighborhood, and also proportionally to the distance from them
Therefore, proposing the same input to the network:
neurons that originally had a high activation value, and their neighbors, will show an even greater activation
neurons that responded little, will react even less
Self−Organizing Maps − 6
If "well-distributed" data are presented to the network, in an iterative fashion, each neuron specializes itself in responding to inputs of a certain type
Moreover, neighboring neurons respond to "similar" stimuli, projecting, in practice, the layer of neurons onto the input space
Data dimensionality reduction, from n (input size) to m (map size, usually 2−3)
Each pattern is represented by the coordinates of the unit on which it is projected, that is, the one that has the maximum activation or, in other words, the winner
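A possible sketch of this projection, assuming the codebooks are stored as an h×b×n NumPy array (function and variable names are illustrative):

```python
import numpy as np

def project(pattern, weights):
    """Map an n-dimensional pattern to the 2-D grid coordinates of the unit
    whose weight vector is closest to it (the maximally activated unit)."""
    h, b, n = weights.shape
    d = np.linalg.norm(weights.reshape(h * b, n) - pattern, axis=1)
    return divmod(int(np.argmin(d)), b)   # (row, column) on the map

# toy usage: a 4x5 map over 3-dimensional inputs
rng = np.random.default_rng(0)
codebooks = rng.random((4, 5, 3))
print(project(np.array([0.2, 0.7, 0.1]), codebooks))
```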
Self−Organizing Maps − 7
Lateral interaction
The output layer neurons are connected with a "neighborhood" of neurons, according to a system of lateral interactions known as the "Mexican hat"
These connection weights are not subject to learning, but are fixed and positive in the neighborhood of each neuron
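One common way to model the "Mexican hat" profile is a difference of Gaussians over the lateral distance between units; the sketch below (widths and amplitudes are illustrative assumptions, not values from the original model) shows how nearby units receive positive, excitatory weights and more distant ones negative, inhibitory weights:

```python
import numpy as np

def mexican_hat(dist, sigma_exc=1.0, sigma_inh=3.0, a_exc=1.0, a_inh=0.5):
    """Difference of Gaussians over the lateral distance between units:
    positive (excitatory) close to the unit, negative (inhibitory) farther away."""
    return (a_exc * np.exp(-dist**2 / (2 * sigma_exc**2))
            - a_inh * np.exp(-dist**2 / (2 * sigma_inh**2)))

# fixed lateral weights as a function of the grid distance from a given unit
print(mexican_hat(np.arange(8.0)))
```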
Self−Organizing Maps − 8
Self−Organizing Maps − 9
To each neuron (node) corresponds a set of instances from the dataset
To each neuron (node) is associated a vector of weights, the codebook, which describes the typical profile of the neuron
The positions of the neurons in the map are important, i.e. (1) two neighboring neurons have similar codebooks; (2) a set of contiguous neurons corresponds to a particular pattern in the data
The connections between the input and output layers indicate the relationships between the input and output vectors
Therefore, in the output layer, a unique neuron must be the “winner” (showing the maximum activation value) for each pattern presented to the network
The winner neuron identifies the “class” to which the input belongs
The “Mexican hat” connections tend to favour the formation of bubbles of activation, that identify similar inputs
Self−Organizing Maps − 10
In practice:
the input space S is partitioned into p subspaces, if p (= b·h) is the dimension of the layer (in neurons)
each subspace s_i ⊆ S is defined by: s_i = {X_j s.t. d(X_j, W_i) = min_t d(X_j, W_t)}
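A sketch of this partition, assuming the codebooks are stored as a p×n array (helper and variable names are illustrative):

```python
import numpy as np

def partition(X, W):
    """Assign each input X_j to the subspace s_i of the codebook W_i at
    minimum distance: s_i = {X_j s.t. d(X_j, W_i) = min_t d(X_j, W_t)}."""
    d = np.linalg.norm(X[:, None, :] - W[None, :, :], axis=2)  # (num inputs, p)
    winners = d.argmin(axis=1)                                 # closest codebook per input
    return {i: X[winners == i] for i in range(len(W))}

# toy usage: 10 two-dimensional inputs, a layer of p = 4 neurons
rng = np.random.default_rng(0)
subspaces = partition(rng.random((10, 2)), rng.random((4, 2)))
```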
Self−Organizing Maps − 11
Simplifications of the model to implement the training algorithm:
weights are changed only in the neighborhood of the neuron that has the maximum activation (this type of training is also known as competitive learning)
we consider only the excitatory lateral interactions within a limited neighborhood of the winner neuron
Note: Changing the weights in the excitatory sense means making them closer and closer to the inputs; changing the inhibitory weights causes the opposite behaviour
Self−Organizing Maps − 12
Randomize the node weights
Repeat:
• For each input x_i:
1. find the winner neuron u_j
2. update the weights of the winner neuron and of the neurons belonging to its neighborhood, as:
w_j(t+1) = w_j(t) + a(t)[x_i(t) − w_j(t)]
a(t+1) = a(t)(1 − g)
(g: small positive constant, ≪ 1)
Self−Organizing Maps − 13
Training algorithm
Initialization phase
Competition phase: A distance must be defined between the codebooks and the input
Cooperation phase: Update the codebooks; the weights of the adjacent nodes are also updated, although not at the same level; this ensures the similarity of weights among contiguous nodes; the neighborhood to be considered is reduced progressively
Adaptation phase: At first, a high learning rate moves the codebooks considerably; the learning rate is then progressively reduced
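Putting the phases together, a minimal training sketch in NumPy; the grid size, the Gaussian neighborhood and the decay schedules are illustrative choices, while the update follows w_j(t+1) = w_j(t) + a(t)[x_i(t) − w_j(t)], scaled by the neighborhood function:

```python
import numpy as np

def train_som(X, h=10, b=10, epochs=50, a0=0.5, sigma0=3.0, decay=0.05, seed=0):
    """Train a (h x b) SOM on data X of shape (num_samples, n)."""
    rng = np.random.default_rng(seed)
    n = X.shape[1]
    W = rng.random((h * b, n))                       # initialization phase
    rows, cols = np.divmod(np.arange(h * b), b)      # grid coordinates of each unit
    a, sigma = a0, sigma0
    for _ in range(epochs):
        for x in X[rng.permutation(len(X))]:
            j = np.argmin(np.linalg.norm(W - x, axis=1))    # competition: winner unit
            grid_d2 = (rows - rows[j])**2 + (cols - cols[j])**2
            nbh = np.exp(-grid_d2 / (2 * sigma**2))         # cooperation: neighborhood
            W += a * nbh[:, None] * (x - W)                 # adaptation: move codebooks
        a *= (1 - decay)                                    # learning rate decay
        sigma = max(0.5, sigma * (1 - decay))               # neighborhood shrinks
    return W.reshape(h, b, n)

# toy usage: fit a 10x10 map to random 3-D data
codebooks = train_som(np.random.default_rng(1).random((200, 3)))
```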
Self−Organizing Maps − 14
The role of learning is to rotate the synaptic weight vector w_j(t) towards the input vector x_i(t), aligning it with the input
Self−Organizing Maps − 15
The SOM learning algorithm is very simple (it does not require the derivative calculation):
Neuron j*, with the weight vector closest to the input pattern, is selected; that neuron "recalls" the input vector and changes its weights in order to align them with the input
The weight vectors of the neighboring neurons of j* are also updated
the network tries to create regions consisting of a large set of values around the input with which it learns
those vectors that are spatially close to the training values will be correctly classified even if the network has never seen them before (generalization capability)
Self−Organizing Maps − 16
The purpose of a SOM is to have, for similar inputs, neighboring winner neurons, so that each activation bubble represents a class, containing all the input data with common characteristics
The neighboring region can be chosen as a square, a circle or a hexagon around the winner neuron
The neighborhood must be chosen large at the beginning, and it should slowly shrink during learning
Self−Organizing Maps − 17
Summing up…
The training data for SOMs usually have no labels and the map learns to differentiate and distinguish features based on similarities
SOMs are able to realize an unsupervised clustering, i.e. they can identify, in the input space, a set of partitions induced by the similarities/differences among data (no knowledge on the number and type of "classes")
Each partition is represented by a prototype (a centroid) defined by the weight vector, the codebook of the corresponding neuron
Self−Organizing Maps − 18
Self−Organizing Maps − 19
Summing up…
SOMs differ from other artificial neural networks because they apply competitive learning, as opposed to error-correction learning, which involves Backpropagation and gradient descent
In competitive learning, nodes compete for the right to respond to a subset of the input data
After training, it is possible to label (classify) data, based on the partition of the input space to which they belong