
Unsupervised learning


Academic year: 2022


Kohonen’s Self-Organizing Map (SOM): summary (1)

Kohonen SOMs cluster data, i.e., they identify clouds of data which are densely distributed in specific regions, according to which the input space can be partitioned.

• As in the k-means algorithm, each partition can be represented by the centroid of the corresponding cloud: in a SOM, this centroid corresponds to the weight vector associated with a neuron of the map. Conversely, each weight vector associated with a neuron of a trained SOM is the centroid of a data cloud.

• Data can be classified a posteriori based on the partition of the input space to which they belong. Given a labeled data set, each partition induced by the SOM can be labeled with the class for which the corresponding neuron is most frequently the winner (stacked generalization / majority vote).

Kohonen’s Self-Organizing Map (SOM): summary (2)

In practice, to label the neurons of a SOM a posteriori according to a labeled data set X (stacked generalization / majority vote):

label(i) = argmax hist(winner(X, wi), L)

where:

• X is the dataset, represented as a matrix (each row xi is a pattern);

• W is the matrix of SOM weights (row wi is the weight vector of the i-th neuron);

• hist(v, L) (see the Matlab function) is the frequency histogram of a vector v whose elements may take L different values (from 1 to L);

• winner(X, wi) returns a vector J having as many elements as X has rows (i.e., the number of data in the dataset): element Ji is the index (label) of the winning neuron when xi is the input.
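As an illustration, here is a minimal Python/numpy sketch of this majority-vote labeling (the function names and array shapes are assumptions made for the example, not part of the course material): X is an (N, d) data matrix, W an (n_neurons, d) weight matrix, and labels an (N,) vector of class indices in 1..L.

```python
import numpy as np

def winner(X, W):
    """Index of the winning (closest-weight) neuron for each row of X."""
    dists = np.linalg.norm(X[:, None, :] - W[None, :, :], axis=2)
    return np.argmin(dists, axis=1)               # vector J, one entry per pattern

def label_neurons(X, W, labels, L):
    """Majority-vote (a posteriori) class label for each SOM neuron."""
    J = winner(X, W)
    neuron_labels = np.zeros(W.shape[0], dtype=int)    # 0 = neuron never wins
    for i in range(W.shape[0]):
        won = labels[J == i]                      # class labels of the patterns won by neuron i
        if won.size:
            counts = np.bincount(won, minlength=L + 1)    # histogram over classes 1..L
            neuron_labels[i] = np.argmax(counts[1:]) + 1  # most frequent class
    return neuron_labels
```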

Kohonen’s Self-Organizing Map (SOM): observations (3)

With respect to k-means, a SOM adds an ordering of the centroids which preserves the topological properties of the input space, of which the SOM can be considered a lower-dimensional projection.

The fact that neighboring neurons respond to similar (neighboring) patterns creates a lattice whose nodes are located where data are actually found and whose arcs never cross, as if a net were cast over the pattern space and its nodes were orderly anchored in the middle of the regions where data are densely present.

In a sense, it creates a distorted re-projection of the data onto a lower-dimensional space where, however, the data are (not strictly, but certainly more) uniformly distributed.

Kohonen’s Self-Organizing Map (SOM): observations (4)

It can be demonstrated (see Bishop’s book) that the algorithm for training a SOM with a neighborhood radius that decreases with time is equivalent to the k-means algorithm.

The weight update equation can be derived as the (gradient-based) solution of the ‘vector quantization’ problem: given K, find a set of K centroids (a codebook, in telecommunications terms; each centroid is then called a codebook vector) that minimizes the squared error (actually, the result holds for any exponent) made when each sample in a (large) dataset is replaced by the closest centroid.
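A small numpy sketch of that quantization error (the names `vq_error` and `codebook` are illustrative assumptions): each sample is replaced by its closest codebook vector and the squared substitution errors are summed.

```python
import numpy as np

def vq_error(X, codebook):
    """Total squared error when each row of X is replaced by its closest codebook vector."""
    d = np.linalg.norm(X[:, None, :] - codebook[None, :, :], axis=2)   # (N, K) distances
    return np.sum(d.min(axis=1) ** 2)
```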

Unsupervised Learning (5)

Unsupervised learning (6)

Until now, we have assumed that our training samples are “labeled” by their category membership.

Methods that use labeled samples are said to be supervised.

However, there are problems where the definition of the classes, and even the number of classes, may be unknown.

Machine learning methods which deal with such data are said to be unsupervised.

Questions:

Why would one even be interested in learning from unlabeled samples?

Is it even possible in principle to learn anything of value from unlabeled samples?

Why unsupervised learning (in random order) (7)

1. To limit the cost of collecting and labeling a large set of sample patterns, a process that is often surprisingly costly.

E.g., videos are virtually free, but accurately labeling the video pixels is expensive and time consuming.

2. To obtain a larger training set than the one available using semi-supervised learning.

Train a classifier on a small set of samples, then tune it up to make it run without supervision on a large, unlabeled set.

Or, in the reverse direction, let a large set of unlabeled data group automatically, then label the groupings found.

3. To detect a gradual change of patterns over time.

4. To find features that will be useful for categorization.

5. To gain insight into the nature or structure of the data during the early stages of an investigation.

Unsupervised learning: clustering (8)

In practice, unsupervised learning methods implement what is usually referred to as data clustering.

Qualitatively and generally, the problem of data clustering can be defined as:

• Grouping of objects into meaningful categories

• Given a representation of N objects, find k clusters based on a similarity measure.

Data clustering (9)

The problem can be tackled from several points of view:

Statistics: represent the density function of all the data as a mixture of a number of different distributions,

p(y) = Σi wi p(y | wi)

and fit the set of weights wi and component densities p(y | wi) to the given data.
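For illustration, a minimal numpy sketch of this mixture view in one dimension, with Gaussian components (all names and the Gaussian choice are assumptions; fitting the weights and component parameters to the data is what, e.g., the EM algorithm does):

```python
import numpy as np

def mixture_density(y, weights, means, stds):
    """p(y) = sum_i w_i * N(y; mean_i, std_i), evaluated at the points y."""
    y = np.atleast_1d(y).astype(float)[:, None]                        # (n, 1)
    comp = np.exp(-0.5 * ((y - means) / stds) ** 2) / (stds * np.sqrt(2 * np.pi))
    return comp @ np.asarray(weights)                                  # (n,) mixture density
```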

Data clustering (10)

The problem can be tackled from several points of view:

“Geometry/topology”:

Partition the pattern space such that data belonging to each partition are highly homogeneous (i.e., they are similar to one another)

More directly related with classification:

Group (label) data such that average intra-group distance is minimized and average inter-group distance is maximized (yet another optimization problem!)
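A small illustrative sketch (assumed names) of the two quantities this optimization view trades off, for a labeling `labels` of the rows of X:

```python
import numpy as np

def intra_inter_distances(X, labels):
    """Average within-group and between-group pairwise Euclidean distances."""
    intra, inter = [], []
    for a in range(len(X)):
        for b in range(a + 1, len(X)):
            d = np.linalg.norm(X[a] - X[b])
            (intra if labels[a] == labels[b] else inter).append(d)
    # assumes at least two groups and at least one within-group pair;
    # good clusterings: small intra-group, large inter-group distance
    return np.mean(intra), np.mean(inter)
```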

Data clustering (11)

Why data clustering?

Natural Classification: degree of similarity among species.

Data exploration: discover underlying structure, generate hypotheses, detect anomalies.

Data Compression: for organizing/indexing/storing/broadcasting data.

Applications: useful to any scientific field that collects data!

Relevance: 2340 papers about data clustering indexed in Scopus in 2014!

Data clustering: examples (12)

800,000 scientific papers clustered into 776 topics, based on how often the papers were cited together by the authors of other papers.

Data clustering (13)

Given a set of N unlabeled examples D = (x1; x2; …; xN) in a d-dimensional feature space, D is partitioned into a number of disjoint subsets Dj:

D = ∪j=1..k Dj,  with Di ∩ Dj = ∅ for i ≠ j,

where the points in each subset are similar to each other according to a given similarity criterion.

Data clustering (14)

A partition is denoted by

p = (D1; D2; …; Dk)

and the problem of data clustering is thus formulated as

p* = argminp f(p)

where f(·) is defined according to the chosen similarity criterion.

Data clustering (15)

A general optimization (minimization) algorithm for a classification function J(Y, W) (Y being the dataset and W the ordered set of labels assigned to each sample) can be described as follows:

Choose an initial classifier W0
repeat (step i)
    change the classification such that J decreases
until the classification is the same as the previous one

If the variables were continuous, a gradient method could be used. However, here we are dealing with discrete variables.

Data clustering (16)

A reasonable algorithm (based on the simplifying assumption that the optimization problem is separable, i.e., that the minimum of an n-dimensional function can be found by minimizing it along each dimension separately) would assign to each sample the label that causes the largest negative ΔJ (see the sketch below).

NB: since the problem is not separable, there is no guarantee that J decreases as the sum of the individual ΔJ’s. It may even increase!

A better but slower solution, which guarantees monotonicity, would be to change, in each step, only the one label that causes the greatest negative ΔJ.

K-means clustering (17)

K-means clustering is obtained by choosing the Euclidean distance as the similarity criterion and

J = Σk=1..Nc Σy(i)~wk ||y(i) − mk||²

as the function to be optimized; y(i) is the i-th sample, mk is the centroid of the k-th cluster, and y(i) ~ wk denotes all samples y(i) assigned to cluster k.

Then M = {m1, m2, …, mNc} is the set of reference vectors, each of which represents the ‘prototype’ of a class.

J is minimized by choosing mk as the sample mean of the data having label wk.
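A small numpy sketch (illustrative names; 0-based cluster indices) of this cost and of the centroid update that minimizes it, i.e. the per-cluster sample mean:

```python
import numpy as np

def kmeans_cost(Y, labels, M):
    """J = sum over clusters k of the squared distances of the samples labeled k to m_k."""
    return sum(np.sum((Y[labels == k] - M[k]) ** 2) for k in range(len(M)))

def update_centroids(Y, labels, Nc):
    """m_k = sample mean of the data labeled k (clusters assumed non-empty here)."""
    return np.array([Y[labels == k].mean(axis=0) for k in range(Nc)])
```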

K-means clustering (18)

In practice:

• The algorithm partitions the input space S into a pre-defined number of subspaces, induced by the Euclidean distance.

• Each subspace si of S is defined as:

si = {xj ∈ S | d(xj, mi) = mint d(xj, mt)}

This induces a so-called Voronoi tessellation of the input space (example limited to 2D patterns).

K-means clustering (19)

Randomly initialize M = {m1, m2, m3, m4, ..., mNc}
repeat
    Classify the N samples according to the nearest mi
    Recompute each mi (as the mean of the patterns assigned to cluster i)
until there is no change in M
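A runnable Python sketch of this loop (plain k-means / Lloyd’s algorithm; the initialization, the handling of empty clusters, and all names are assumptions made for the example):

```python
import numpy as np

def kmeans(X, Nc, seed=0, max_iter=100):
    """Cluster the rows of X into Nc groups; returns the centroids M and the labels."""
    rng = np.random.default_rng(seed)
    M = X[rng.choice(len(X), size=Nc, replace=False)].copy()   # random initialization
    labels = np.zeros(len(X), dtype=int)
    for _ in range(max_iter):
        # classify the N samples according to the nearest m_i
        d = np.linalg.norm(X[:, None, :] - M[None, :, :], axis=2)
        labels = np.argmin(d, axis=1)
        # recompute each m_i as the mean of the patterns assigned to cluster i
        new_M = np.array([X[labels == k].mean(axis=0) if np.any(labels == k) else M[k]
                          for k in range(Nc)])
        if np.allclose(new_M, M):          # stop when the centroids no longer change
            break
        M = new_M
    return M, labels
```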


