Unsupervised models and clustering
Table of contents
Introduction
The unsupervised learning framework
Self−Organizing Maps (SOMs)
Introduction − 1
In order to efficiently mimic the nervous system, it is necessary to get an idea of the nature of the biological processes that actually take place in the brain
The only reasonable assumption is that they are driven by mechanisms aimed at optimizing the target they have to pursue
Each cognitive process evolves from an initial situation associated to a stimulus, up to a terminal state in which a response is produced, which is the result of the process itself
It is intuitive that, in this evolution, there is a sort of conservation of information
In fact, it is just the initial stimulus that provides the information necessary to obtain the desired response
Such information must be transmitted and preserved until the conclusion of the process
A reasonable way for interpreting processes that take place in the nervous system is therefore to consider them as transfers of information that follow a “maximum conservation” criterion
Introduction − 2
The unsupervised learning framework − 1
Sometimes, in real-world applications, we do not have sufficient knowledge of the problem to be solved to rely on the supervised learning paradigm
In other words, we do not know the solution to the problem, not even sampled on a set of examples, since only data to be analyzed and interpreted are available, without prior guidelines
A classic problem of this type is to highlight some groups of data, with common characteristics, within a "disordered and miscellaneous" set
This problem can be addressed by using a self-organizing neural network, capable of learning data without a supervisor, which provides solutions at specific points in the weight space
The unsupervised learning framework − 2
Learning without supervision means aggregating "similar" examples in topologically neighboring regions, that is to say, using the internal features of the data to come up with groups inside them
Therefore, while in the supervised learning protocol the variation of synaptic connections is done in an attempt to minimize the error with respect to the information provided by the supervisor, in this case, the learning procedure is guided by “similarity” criteria, inherently present within the data
The connection weight changes by an amount proportional to the product of the activation values of the two neurons, thus tending to "strengthen" the strongest links and to "weaken" the weakest ones
This is more realistic, from the biological point of view, than Backpropagation
The unsupervised learning framework − 3
An unsupervised learning example: the Hebb rule
Hebb (1949): if two connected neurons are active at the same time, the force (or the weight) of their connection is increased
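As an illustration, a minimal sketch of a Hebbian weight update in NumPy; the learning rate, the linear units and the variable names are illustrative assumptions, not part of the original formulation:

```python
import numpy as np

def hebbian_update(W, x, y, eta=0.01):
    """Hebb rule: strengthen w_ij in proportion to the product of the
    activations of the post-synaptic unit y_i and pre-synaptic unit x_j."""
    return W + eta * np.outer(y, x)

# toy example: 3 input units, 2 output units with small random initial weights
rng = np.random.default_rng(0)
W = rng.normal(scale=0.1, size=(2, 3))
x = np.array([1.0, 0.0, 1.0])   # pre-synaptic activations
y = W @ x                        # post-synaptic activations (linear units)
W = hebbian_update(W, x, y)      # connections between co-active units are strengthened
```

Note that no target output is involved: the update depends only on the correlation between the activations of the two connected units.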
Summing up…
Given a set of examples, we want to find something interesting
The experience is given by the collected “examples”
Without any information other than the data itself…
The performance depends on how "interesting" the obtained result is
The unsupervised learning framework − 4
The unsupervised learning framework − 5
Self−organizing networks − 1
In the auditory cortex, neurons and fibers are anatomically arranged in an orderly manner with respect to the acoustic frequencies
In the central nervous system, the ganglion cells, which constitute the output stage of the retina, are organized according to receptive fields, sensitive to particular stimuli
Studies on brain maps during the prenatal stage, i.e. in the absence of sensory stimuli, have demonstrated that, at the lower levels, the neural structure is genetically predetermined, while at higher levels the organization matures and emerges as a result of the process of learning from examples
Trivially, it can be observed that when children learn to read, they mark the letters one by one and "think" while reading; adults, however, recognize whole words and sentences at a glance
This shows that there is a kind of cerebral self-organization, which develops the brain's ability to classify and easily recognize some "common" patterns, and which is also confirmed by the difficulty of reading a text upside down or containing attached words
Self−organizing networks − 2
Studied by Grossberg, Von Malsburg, Rumelhart, Zipser and Kohonen since the mid '80s, self-organizing neural networks are particularly well suited to those applications where there is no a priori knowledge on how to process certain information, so that the network is expected to automatically "discover" interesting concepts in the "data population"
Both the stimulus and the network response are characterized by a given number of components, describing points in appropriate spaces
Self−organizing networks − 3
The learning process can be interpreted as a geometric transformation from the input to the output space
The output space has a size smaller than that of the input space, since the stimulus normally contains the information required to activate many simultaneous processes
Compared to what a single process needs, the stimulus is therefore redundant
This means that, during self-organization, a redundancy reduction operation is carried out
In both the input and the output space, typical regions are evidenced, to which information is associated
Self−organizing networks − 4
The natural mechanism that controls the information transfer must then identify those regions, that are important for the cognitive process, and make sure that they will correspond in the transformation
Therefore, in the cognitive process, a data clustering operation is carried out, realized with the acquisition of experience
To both the operations of redundancy reduction and clustering, biological evidence can be attached, based on the functioning of the nervous system
Self−organizing networks − 5
Mapping from the input space to the space defined by the neurons of the self−organizing network
Self−organizing networks − 6
Self−organizing networks − 7
Initially, the network nodes are arbitrarily positioned in the data space
Graphically speaking, if the violet blob represents the distribution of the training data, and the white point is the current training pattern, the nearest neuron is selected and moved towards it, as are (to a lesser extent) its neighbors on the grid
After training, the grid tends to approximate the data distribution
The main use of self−organizing networks is to perform data analysis, in order to highlight regularities among data or to classify them
Self-organizing neural networks are suitable for solving problems different from those addressed by supervised learning approaches
They can be used for classification tasks − especially in the case in which the number of classes is unknown
They are used to implement clustering, for partitioning data sets into collections, or clusters, containing similar patterns
They can perform data compression
Self−organizing networks − 8
Self−Organizing Maps − 1
Biological motivation
The Kohonen network is modeled on the basis of a characteristic behaviour of neurons in laminar nervous tissues: such neurons are activated, under the action of a stimulus, in groups characterized by the presence or the absence of activity, defining as "activity" the emission of a number of pulses (per unit time) that exceeds a certain threshold
The spatial demarcation between the two groups is clear, so that we talk about the formation of "activation bubbles", obtained starting from a situation of undifferentiated low activity of all the neurons in the tissue
In the case of biological neurons, those neurons that are physically closer to active neurons have strong, excitatory connections, while the peripheral ones have inhibiting connections
In the cerebral cortex, there are projections of sensory stimuli on specific networks of cortical neurons
The sensorimotor neurons constitute a distorted map (the extent of each region is proportional to the sensitivity of the corresponding area, not to its size) of the body surface
However, adjacent parts of the cortex correspond to adjacent parts of the body surface
Self−Organizing Maps − 2
Self−Organizing Maps − 3
Kohonen modelled this feature by restricting the variation of the weights only to those neurons that are close to a selected neuron
SOM (T. Kohonen, 1981): "sensorial" maps, constituted by a single layer of neurons, where the units specialize themselves to respond to different stimuli, in such a way that…
different inputs activate different (far away) units
topologically neighboring units are activated by similar inputs
Self−Organizing Maps − 4
Architecture
A single layer of neurons u_i, i = 1, …, b·h (where b is the breadth and h the height of the map, in the bi-dimensional case)
Each input X = [x_1, x_2, …, x_n] is connected with all the neurons
To each connection, a weight w_ij is associated
Activation function f_i = 1/d(W_i, X), where d is a distance
Lateral untrainable connections
Self−Organizing Maps − 5
The weights of each neuron are modified:
in an excitatory sense, in proportion to the value of its activation and to those of the neurons belonging to its neighborhood, and also proportionally to the distance from them
in an inhibitory sense, in proportion to the value of the activation of the neurons outside the neighborhood, and also proportionally to the distance from them
Therefore, proposing the same input to the network:
neurons that originally had a high activation value, and their neighbors, will show an even greater activation
neurons that responded little, will react even less
Self−Organizing Maps − 6
If "well-distributed" data are presented to the network, in an iterative fashion, each neuron specializes itself in responding to inputs of a certain type
Moreover, neighboring neurons respond to "similar" stimuli, projecting, in practice, the layer of neurons onto the input space
Data dimensionality reduction, from n (input size) to m (map size, usually 2−3)
Each pattern is represented by the coordinates of the unit on which it is projected, that is, the one that has the maximum activation or, in other words, the winner
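A possible sketch of this projection, assuming the codebooks are stored as an h×b×n NumPy array (function and variable names are illustrative):

```python
import numpy as np

def project(pattern, weights):
    """Map an n-dimensional pattern to the 2-D grid coordinates of the unit
    whose weight vector is closest to it (the maximally activated unit)."""
    h, b, n = weights.shape
    d = np.linalg.norm(weights.reshape(h * b, n) - pattern, axis=1)
    return divmod(int(np.argmin(d)), b)   # (row, column) on the map

# toy usage: a 4x5 map over 3-dimensional inputs
rng = np.random.default_rng(0)
codebooks = rng.random((4, 5, 3))
print(project(np.array([0.2, 0.7, 0.1]), codebooks))
```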
Self−Organizing Maps − 7
Lateral interaction
The output layer neurons are connected with a "neighborhood" of neurons, according to a system of lateral interactions known as the "Mexican hat"
These connection weights are not subject to learning, but are fixed and positive in the neighborhood of each neuron
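One common way to model the "Mexican hat" profile is a difference of Gaussians over the lateral distance between units; the sketch below (widths and amplitudes are illustrative assumptions, not values from the original model) shows how nearby units receive positive, excitatory weights and more distant ones negative, inhibitory weights:

```python
import numpy as np

def mexican_hat(dist, sigma_exc=1.0, sigma_inh=3.0, a_exc=1.0, a_inh=0.5):
    """Difference of Gaussians over the lateral distance between units:
    positive (excitatory) close to the unit, negative (inhibitory) farther away."""
    return (a_exc * np.exp(-dist**2 / (2 * sigma_exc**2))
            - a_inh * np.exp(-dist**2 / (2 * sigma_inh**2)))

# fixed lateral weights as a function of the grid distance from a given unit
print(mexican_hat(np.arange(8.0)))
```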
Self−Organizing Maps − 8
Self−Organizing Maps − 9
To each neuron (node) corresponds a set of instances from the dataset
To each neuron (node) is associated a vector of weights, the codebook, which describes the typical profile of the neuron
The positions of the neurons in the map are important, i.e. (1) two neighboring neurons have similar codebooks; (2) a set of contiguous neurons corresponds to a particular pattern in the data
The connections between the input and output layers indicate the relationships between the input and output vectors
Therefore, in the output layer, a unique neuron must be the “winner” (showing the maximum activation value) for each pattern presented to the network
The winner neuron identifies the “class” to which the input belongs
The “Mexican hat” connections tend to favour the formation of bubbles of activation, that identify similar inputs
Self−Organizing Maps − 10
In practice:
the input space S is partitioned into p subspaces, if p (= b·h) is the dimension of the layer (in neurons)
each subspace s_i ⊆ S is defined by: s_i = {X_j s.t. d(X_j, W_i) = min_t d(X_j, W_t)}
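A sketch of this partition, assuming the codebooks are stored as a p×n array (helper and variable names are illustrative):

```python
import numpy as np

def partition(X, W):
    """Assign each input X_j to the subspace s_i of the codebook W_i at
    minimum distance: s_i = {X_j s.t. d(X_j, W_i) = min_t d(X_j, W_t)}."""
    d = np.linalg.norm(X[:, None, :] - W[None, :, :], axis=2)  # (num inputs, p)
    winners = d.argmin(axis=1)                                 # closest codebook per input
    return {i: X[winners == i] for i in range(len(W))}

# toy usage: 10 two-dimensional inputs, a layer of p = 4 neurons
rng = np.random.default_rng(0)
subspaces = partition(rng.random((10, 2)), rng.random((4, 2)))
```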
Self−Organizing Maps − 11
Simplifications of the model to implement the training algorithm:
weights are changed only in the neighborhood of the neuron that has the maximum activation (this type of training is also known as competitive learning)
we consider only the excitatory lateral interactions within a limited neighborhood of the winner neuron
Note: Changing the weights in the excitatory sense means making them closer and closer to the inputs; changing the inhibitory weights causes the opposite behaviour
Self−Organizing Maps − 12
Randomize the node weights
Repeat:
• For each input x_i:
1. find the winner neuron u_j
2. update the weights of the winner neuron and of the neurons belonging to its neighborhood, as:
w_j(t+1) = w_j(t) + a(t)[x_i(t) − w_j(t)]
a(t+1) = a(t)(1 − g)
(g: small positive constant, ≪ 1)
Self−Organizing Maps − 13
Training algorithm
Initialization phase
Competition phase: A distance must be defined between the codebooks and the input
Cooperation phase: Update the codebooks; the weights of the adjacent nodes are also updated, although not at the same level; this ensures the similarity of weights among contiguous nodes; the neighborhood to be considered is reduced progressively
Adaptation phase: At first, a high learning rate moves the codebooks considerably; the learning rate is then progressively reduced
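Putting the phases together, a minimal training sketch in NumPy; the grid size, the Gaussian neighborhood and the decay schedules are illustrative choices, while the update follows w_j(t+1) = w_j(t) + a(t)[x_i(t) − w_j(t)], scaled by the neighborhood function:

```python
import numpy as np

def train_som(X, h=10, b=10, epochs=50, a0=0.5, sigma0=3.0, decay=0.05, seed=0):
    """Train a (h x b) SOM on data X of shape (num_samples, n)."""
    rng = np.random.default_rng(seed)
    n = X.shape[1]
    W = rng.random((h * b, n))                       # initialization phase
    rows, cols = np.divmod(np.arange(h * b), b)      # grid coordinates of each unit
    a, sigma = a0, sigma0
    for _ in range(epochs):
        for x in X[rng.permutation(len(X))]:
            j = np.argmin(np.linalg.norm(W - x, axis=1))    # competition: winner unit
            grid_d2 = (rows - rows[j])**2 + (cols - cols[j])**2
            nbh = np.exp(-grid_d2 / (2 * sigma**2))         # cooperation: neighborhood
            W += a * nbh[:, None] * (x - W)                 # adaptation: move codebooks
        a *= (1 - decay)                                    # learning rate decay
        sigma = max(0.5, sigma * (1 - decay))               # neighborhood shrinks
    return W.reshape(h, b, n)

# toy usage: fit a 10x10 map to random 3-D data
codebooks = train_som(np.random.default_rng(1).random((200, 3)))
```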
Self−Organizing Maps − 14
The role of learning is to rotate the synaptic weight vector w_j(t) towards the input vector x_i(t), aligning it with the input
Self−Organizing Maps − 15
The SOM learning algorithm is very simple (it does not require the derivative calculation):
Neuron j*, with the weight vector closest to the input pattern, is selected; that neuron "recalls" the input vector and changes its weights in order to align them with the input
The weight vectors of the neighboring neurons of j* are also updated
the network tries to create regions consisting of a large set of values around the input with which it learns
those vectors that are spatially close to the training values will be correctly classified even if the network has never seen them before (generalization capability)
Self−Organizing Maps − 16
The purpose of a SOM is to have, for similar inputs, neighboring winner neurons, so that each activation bubble represents a class, containing all the input data with common characteristics
The neighboring region can be chosen as a square, a circle or a hexagon around the winner neuron
The neighborhood must be chosen large at the beginning, and it should slowly shrink during learning
Self−Organizing Maps − 17
Summing up…
The training data for SOMs usually have no labels and the map learns to differentiate and distinguish features based on similarities
SOMs are able to realize an unsupervised clustering, i.e. they can identify, in the input space, a set of partitions induced by the similarities/differences among data (no knowledge on the number and type of "classes")
Each partition is represented by a prototype (a centroid) defined by the weight vector, the codebook of the corresponding neuron
Self−Organizing Maps − 18
Self−Organizing Maps − 19
Summing up…
SOMs differ from other artificial neural networks because they apply competitive learning, as opposed to error-correction learning, which involves Backpropagation and gradient descent
In competitive learning, nodes compete for the right to respond to a subset of the input data
After training, it is possible to label (classify) data, based on the partition of the input space to which they belong