
Master’s degree programme in Informatica – Computer Science, Second Cycle (D.M. 270/2004)

Final thesis

Higher-order clustering with non-uniform hypergraphs

Supervisor: Chiar.mo Prof. Marcello Pelillo
Assistant supervisor: Dr. Sebastiano Vascon
Graduand: Roberta Prendin, Matriculation Number 819575
Academic Year: 2015–2016


that never let me down in the past 29 years – nonsteroidal anti-inflammatory drugs.

Contents

1 Introduction
   1.1 Thesis goals
   1.2 Thesis structure
2 The hypergraph clustering problem
   2.1 The hypergraph clustering problem in literature
   2.2 Uniform hypergraph clustering
      2.2.1 A game-theoretic approach to hypergraph clustering
      2.2.2 Principles of Evolutionary Game Theory
      2.2.3 The hypergraph clustering game
      2.2.4 A formal definition of cluster
   2.3 Non-uniform hypergraph clustering
      2.3.1 Multi-model fitting in literature
      2.3.2 Advantages of non-uniform hypergraphs
      2.3.3 Parallels, or lack thereof, with the clustering game
3 Cluster extraction with non-uniform hypergraphs
   3.1 The Baum-Eagon and Baum-Sell inequalities
      3.1.1 Parallels with dynamical system theory
   3.2 Application to non-uniform hypergraph clustering
      3.2.1 Cluster extraction with uniform hypergraph
4 Experiments on synthetic data sets
   4.1 Experiments on uniform hypergraph clustering
      4.1.1 Tests on correctness
      4.1.2 Tests on robustness to clutter items
   4.2 Experiments on non-uniform hypergraph clustering
      4.2.1 Tests on correctness
      4.2.2 Tests on robustness to clutter items
   4.3 Experiments on mixed models
      4.3.1 Tests with planes and segments
      4.3.2 Tests with circles and segments
      4.3.3 Tests on clusters with different sizes
5 Case study: larger and noisy data sets
   5.1 An excursus on T-Linkage
   5.2 The data sets
   5.3 Experiments with GHC
      5.3.1 Comparison with T-Linkage
   5.4 Experiments with NHC
6 Conclusion and future works
7 Acknowledgements

List of Figures

Figure 1: A weighted k-graph where k = 3.
Figure 2: A weighted non-uniform hypergraph with cardinalities k1 = 2 and k2 = 3.
Figure 3: Example of a data set containing both cocircular and coplanar points.
Figure 4: A weighted non-uniform hypergraph with cardinalities k1 = 2 and k2 = 3.
Figure 5: Average execution times with data set CSO (left) and SO2 (right).
Figure 6: NIS, σ = 0.
Figure 7: NIS, σ = 0.01.
Figure 8: NIS, σ = 0.02.
Figure 9: NIS, σ = 0.04.
Figure 10: NIS, σ = 0.08.
Figure 11: IS, σ = 0.
Figure 12: IS, σ = 0.01.
Figure 13: IS, σ = 0.02.
Figure 14: IS, σ = 0.04.
Figure 15: IS, σ = 0.08.
Figure 16: IS with σ = 0.08, β = 1000 (left) or 15000.
Figure 17: Misclassification rates using NIS and IS.
Figure 18: SO, 10 outliers.
Figure 19: SO, 20 outliers.
Figure 20: SO, 40 outliers.
Figure 21: Misclassification rates using SO.
Figure 22: IS, σ = 0.
Figure 23: IS, σ = 0.01.
Figure 24: IS, σ = 0.02.
Figure 25: IS, σ = 0.04.
Figure 26: IS, σ = 0.08.
Figure 27: Number of misclassification errors using IS2.
Figure 28: SO2, 10 outliers.
Figure 29: SO2, 20 outliers.
Figure 30: SO2, 40 outliers.
Figure 31: Misclassification rate using SO2.
Figure 32: PS, σ = 0.
Figure 33: PS, σ = 0.01.
Figure 34: PS, σ = 0.02.
Figure 35: PS, σ = 0.04.
Figure 36: PS, σ = 0.08.
Figure 37: Number of misclassification errors with PS.
Figure 38: PSO, 10 outliers.
Figure 39: PSO, 20 outliers.
Figure 40: PSO, 40 outliers.
Figure 41: Misclassification rates using PSO.
Figure 42: CS, σ = 0.
Figure 43: CS, σ = 0.01.
Figure 44: CS, σ = 0.02.
Figure 45: CS, σ = 0.04.
Figure 46: CS, σ = 0.08.
Figure 47: CS, σ = 0.08, with tweaked termination parameters.
Figure 48: Number of misclassification errors with CS.
Figure 49: CSO, 10 outliers.
Figure 50: CSO, 20 outliers.
Figure 51: CSO, 40 outliers.
Figure 52: Misclassification rates using CSO.
Figure 53: UNCS, σ = 0.
Figure 55: UNCS, σ = 0.02.
Figure 56: UNCS, σ = 0.04.
Figure 57: UNCS, σ = 0.08.
Figure 58: UNCS, σ = 0.08 with different termination parameters.
Figure 59: Number of misclassification errors with UNCS.
Figure 60: UNCSO, 10 outliers.
Figure 61: UNCSO, 20 outliers.
Figure 62: UNCSO, 40 outliers.
Figure 63: Misclassification rates using UNCSO.
Figure 64: Star5 and Circle4.
Figure 65: Star5, 0 outliers.
Figure 66: Star5, 250 outliers.
Figure 67: Star5, 125 outliers.
Figure 68: Star5, 50 outliers.
Figure 69: Comparison between GHC and T-Linkage on Star5.
Figure 70: StarCircle, 0 outliers.
Figure 71: StarCircle, 150 outliers.
Figure 72: StarCircle, 100 outliers.
Figure 73: StarCircle, 50 outliers.
Figure 74: Casual cocircularity between outliers and points of the cluster.

List of Tables

Table 1: Parameters used for NIS and IS.
Table 2: Parameters used for SO.
Table 3: Parameters used for IS2.
Table 4: Parameters used for SO2.
Table 5: Parameters used for PS and PSO.
Table 6: Parameters used for CS and CSO.
Table 7: Parameters used for UNCS and UNCSO.
Table 8: Parameters used for Star5.
Table 9: MSE of GHC and T-Linkage.
Table 10: Parameters used for StarCircle.

Abstract

In general terms, clustering can be defined as the problem of organizing a finite set of elements into groups in such a way that similar elements belong to the same cluster while dissimilar elements are assigned to different clusters. Most of the approaches available in the literature focus on pairwise clustering, assuming that the similarities between objects are strictly pairwise and can be represented using a simple weighted graph; however, in many situations this assumption is limiting: higher-order relations seem more appropriate to reduce potential information loss. When clustering takes into account these higher-order similarities it takes the name hypergraph clustering, since the data that need to be grouped can be represented using a hypergraph: the nodes are the objects to be clustered while the hyperedges represent higher-order similarities. Another assumption is however in place: hyperedges must have the same fixed cardinality. The goal of the present thesis is to identify and test a hypergraph clustering approach that takes into account the different higher-order relations existing between sets of objects, dismissing the two previous assumptions: starting from the observations of Pelillo et al. on clustering using uniform hypergraphs, this work explores the possible implications of clustering using non-uniform hypergraphs, providing a detailed examination of several test cases.

1 Introduction

The Oxford Dictionary defines the term cluster as «a group of similar things or people positioned or occurring closely together» [1]. While this definition might slightly vary depending on the specific field of study, a common theme always emerges: a cluster is a group of entities – stars, atoms, people, items – bonded together and forming a close association. A similar concept is behind the meaning of the term cluster in Computer Science: the clustering problem consists of organizing a set of entities into maximally coherent groups, according to a suitable similarity parameter.

1.1 Thesis goals

Computer Science offers a wide set of techniques and approaches to tackle the clustering problem. While most of them are based on pairwise affinities between items, for some applications such as image categorization or face clustering these pairwise relations are too restrictive and may induce information loss. Consider for example the classic problem of clustering lines in a d-dimensional space: since any two points are always collinear, the most meaningful way to obtain an effective affinity measure is to evaluate the potential collinearity of triplets of points. When higher-order relations appear to be more meaningful the clustering problem is referred to as hypergraph clustering, since any of its instances can be encoded by a hypergraph where the nodes are the items to be clustered and the weighted edges represent higher-order similarities. Traditionally, the available techniques for hypergraph clustering assume that the order of these similarities is a fixed value. This translates into a strong assumption on the hypergraph involved in the clustering process: since all relationships must always be between the same number of objects, hypergraph clustering only employs uniform hypergraphs. Furthermore, traditional clustering is often interpreted as a partitioning problem over the set of entities: clusters are obtained by forcing all items into exactly one of a predetermined number of clusters. As a result, when the number of classes is not known a priori or when the data set contains clutter items, some clusters might be spurious or meaningless. Finally, clustering approaches based on uniform hypergraphs are usually capable of investigating only one model at a time, showing their shortcomings when simultaneous multi-model fitting is requested.

This thesis identifies and tests a hypergraph clustering approach that takes into account the different higher-order relations existing between objects: starting from the observations of [5] on clustering using uniform hypergraphs, this work focuses on the possible implications of clustering using non-uniform hypergraphs – hypergraphs with different edge cardinalities –, providing a possible solution to explore multi-model fitting problems where more flexibility is needed. Furthermore, rather than partitioning the entities into a fixed number of classes, the approach described in the following pages extracts groups of objects without knowing in advance the overall number of sets they should be grouped into.

1.2 Thesis structure

This thesis is organized as follows:

1. Chapter 2 reviews the hypergraph clustering problem, providing an account of the state of the art as well as a description of the game-theoretic approach proposed by Pelillo and Rota Bulò. This chapter also introduces an extension of the previous approach to non-uniform hypergraphs, presenting the advantages of this solution and the state of the art for multi-model fitting;

2. Chapter 3 describes the mathematical principles behind the approaches reviewed in the previous chapter;

3. Chapter 4 introduces the results of a large series of experiments on both the game-theoretic approach to hypergraph clustering and its extension to non-uniform hypergraphs;

4. Chapter 5 describes a particular set of experiments on larger and noisier data sets, introducing a comparison with a multi-model fitting technique called T-Linkage;

5. Chapter 6 concludes the thesis, offering suggestions for possible future works.

2 The hypergraph clustering problem

The hypergraph clustering problem consists of grouping a finite set of objects by means of a hypergraph, which encodes both the entities to be clustered and the existing higher-order relationships between them.

2.1 The hypergraph clustering problem in literature

The hypergraph clustering problem has gathered quite a bit of attention from different areas such as machine learning, computer vision and VLSI design. Despite exploring higher-order similarities between objects, some of the approaches designed in recent years can be reduced to standard pairwise algorithms, since they transform the initial similarity hypergraph into a corresponding graph. As examples, consider the clique expansion and star expansion methods proposed by Zien et al. [11]: in both cases the original similarity hypergraph used in the clustering process is cast into a weighted graph, whose edge-weights are computed as a function of the weights of the original hypergraph. A similar approach is also proposed by Rodriguez [13], who transforms the hyperedges of the original weighted hypergraph into the cliques of a corresponding weighted graph, thus exploring the relationship between the spectral properties of a Laplacian of the graph and the cost of minimum partitions of the hypergraph. Bolla [12] also studies the spectral properties of the Laplacian matrix associated with an unweighted hypergraph, defining a connection between this matrix and the minimum cut of the hypergraph itself. Zhou et al. [14] focus on a generalization of Normalized Cut, a famous approach traditionally used in pairwise clustering: their method partitions the original hypergraph using the smallest eigenvector of its normalized Laplacian.

Other methods used to investigate higher-order relations are based on factorization: for example, Shashua et al. [19] interpret the hypergraph clustering problem through a non-negative factorization of the closest hyper-stochastic version of the tensor describing the affinities between the items of a given data set. As for the approaches used in VLSI design, many are structured as multi-level methods: initially they construct a hierarchy of coarser and coarser hypergraphs according to some measure of homogeneity; then, they greedily explore this hierarchy to build and update the partitioning obtained using the coarsest hypergraph [20]. Finally, Pelillo and Rota Bulò [5] propose a hypergraph clustering technique based on Game Theory: by giving a rigorous definition of the typical properties of any cluster, they interpret the clustering process as a multi-player, super-symmetric game naturally evolving towards an evolutionary stable strategy, which in turn corresponds to a cluster.

2.2 Uniform hypergraph clustering

While, theoretically, hypergraphs may have edges of different cardinalities, in practice most of the available approaches in the literature focus on uniform hypergraphs or k-graphs, a class of hypergraphs whose edges have a fixed cardinality k ≥ 2. Formally, any uniform hypergraph can be defined in terms of a triplet H = (V, E, ω) where:

• V = {1, . . . , n} is the finite set of nodes or vertices;

• E ⊆ 2^V \ {∅} is the finite set of hyperedges of fixed cardinality k ≥ 2;

• ω : E → R+ is a function mapping each hyperedge e ∈ E to a real positive value.¹

Figure 1: A weighted k-graph where k = 3.

Any instance of the hypergraph clustering problem can be represented in terms of a uniform hypergraph H = (V, E, ω), where each node v ∈ V maps one of the objects to be clustered and each k-edge e ∈ E represents a higher-order relationship existing between k ≥ 2 different objects. The numerical representation of the strength of these relationships is quantified by the function ω, which associates a weight to each hyperedge.
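Since the experiments in chapter 4 will mention that hyperedges and their weights are stored in simple matrices, a minimal Matlab sketch of such a representation may help fix ideas; every name and number below is illustrative, not the thesis' actual code.

% Illustrative sketch: a weighted 3-graph H = (V, E, omega) stored as an
% m-by-3 matrix of node indices plus an m-by-1 vector of weights -- the
% "simple matrices" representation used later for the experiments.
n = 4;                           % |V|: nodes are labelled 1..n
edges = [1 2 3;                  % each row is one 3-edge {v1, v2, v3}
         2 3 4];
w = [0.2; 0.3];                  % w(i) = omega(edges(i, :)) > 0

% Example query: total weight of the hyperedges incident to node 3.
total = sum(w(any(edges == 3, 2)));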

2.2.1 A game-theoretic approach to hypergraph clustering

The game-theoretic approach proposed in [5] is one of several methods developed in recent years to solve the hypergraph clustering problem. The starting point of this solution is the observation that, despite the lack of a universal definition, any cluster can be informally seen as a finite group of entities conforming to two basic conditions:

• Internal coherency: the elements of a cluster should all have high mutual similarities;

• External incoherency: the elements outside a cluster should be highly dissimilar to the ones inside, which means that the internal coherency of a cluster should decrease when one or more external elements are introduced.

As we will see in the next sections, Game Theory offers a way to elegantly formalize the concept of cluster, incorporating both of these conditions into the classic notion of Evolutionary Stable Strategy (ESS).

¹ We consider only positive similarities; it must be noted, however, that the framework may also work with arbitrary similarities.


2.2.2 Principles of Evolutionary Game Theory

In Game Theory a game can be defined as a strategic situation modelling conflict or cooperation between players, who act as rational decision makers [3]. Any game is formally described as a triplet Γ = (P, S, π) where:

• P = {1, . . . , k} is the finite set of players, where k ≥ 2;

• S = {1, . . . , n} is the finite set of pure strategies available to all players. An ordered set of pure strategies played by different players is called a strategy profile s = (s1, . . . , sk) ∈ S^k;

• π : S^k → R is the payoff function assigning a payoff to each strategy profile s = (s1, . . . , sk) ∈ S^k.

During the 1970s the classical principles of Game Theory were widely applied to biology: Evolutionary Game Theory aimed to understand the evolutionary bases of animal behaviour, focusing on how the strategies of an evolving population may change in time. Interestingly, the evolutionary version of Game Theory studies a simplified but functional environment where mutations do not exist, reproduction is asexual and the agents are not supposed to understand the rules behind the game they play: much like animals, the players do not display rational thinking but must act according to a preassigned pure strategy, a biologically encoded behaviour that will be transmitted to their offspring exactly as it is. At each turn of the game a natural evolutionary process randomly extracts the players from a large population, with the goal of testing the ability of the corresponding strategies to survive: as time passes the fittest strategies will spread while the remaining ones will cease to exist, altering the original distribution of these inherited behaviours in the population.

Another important concept in Evolutionary Game Theory is the Evolutionary Stable Strategy (ESS), a refinement of the Nash equilibrium proposed by J. Maynard Smith: the distribution of strategies over a given population reaches an equilibrium when no strategy can prevail upon the other ones. To understand how this notion is formally defined, let's first introduce the expected payoff function u : ∆^k → R:

$$ u(y^{(1)}, \ldots, y^{(k)}) = \sum_{(s_1, \ldots, s_k) \in S^k} \pi(s_1, \ldots, s_k) \prod_{i=1}^{k} y^{(i)}_{s_i} \qquad (1) $$

Assume x^[k] represents a sequence (x, ..., x) of k identical populations x and e^j describes an n-dimensional vector where every element is set to 0 with the exception of x_j = 1; then, we can define two types of expected payoffs:

• The expected payoff over the whole population x ∈ ∆:

$$ u(x^{[k]}) \qquad (2) $$

• The expected payoff of the j-th player in the population x ∈ ∆:

$$ u(e^j, x^{[k-1]}) \qquad (3) $$

If we use these notations then a Nash equilibrium x ∈ ∆ exists when the performances of the players in the population are, at best, as good as the overall population expected payoff:

$$ u(e^j, x^{[k-1]}) \leq u(x^{[k]}) \qquad \forall j \in S \qquad (4) $$
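To make equations (2)-(4) concrete, here is a toy Matlab sketch for the pairwise case k = 2, where both expected payoffs reduce to matrix products; the payoff table and the population state are invented for illustration only.

% Toy sketch (k = 2, illustrative): the expected payoffs of eqs. (2)-(4).
n  = 3;                          % |S|: three pure strategies
Pi = [0 1 1; 1 0 1; 1 1 0];      % super-symmetric payoff table pi(s1, s2)
x  = ones(n, 1) / n;             % population state: the barycenter of the simplex

u_pop = x' * Pi * x;             % eq. (2): u(x^[2]), the population payoff
u_j   = Pi * x;                  % eq. (3): u(e^j, x^[1]) for every strategy j

% Eq. (4): x is a Nash equilibrium if no pure strategy beats the population.
isNash = all(u_j <= u_pop + 1e-12);   % true here, by symmetry of Pi and x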

Notice that this definition says nothing about the possible influence of small perturbations on the stability of the equilibrium: as a result, J. Maynard Smith proposed the more general concept of Evolutionary Stable Strategy, a strategy that, if adopted by a given population, resists any alternative strategy with an initially rare distribution [8]. Formally, suppose that a small fraction ε of mutant agents following strategy distribution y ∈ ∆ appears in a population x ∈ ∆. The resulting population, obtained as the combination of these two types of players, is described by the following equation:

$$ w = (1 - \epsilon)\,x + \epsilon\,y \qquad (5) $$

If the expected payoff of a mutant player is lower than the expected payoff of a regular agent, the evolutionary process will not promote these invading individuals: the strategies they encode will eventually disappear from the mixed population. This means that the original population x is evolutionary stable, resisting external attacks when the fraction ε of mutant agents is small enough.

2.2.3 The hypergraph clustering game

Given an edge-weighted k-graph H = (V, E, ω) representing an instance of the hypergraph clustering problem, [5] suggests that a cluster of H is an ESS of the corresponding hypergraph clustering game. H is in fact interpreted as a k-player, non-cooperative clustering game Γ = (P, S, π) where the nodes of the k-graph encode the finite set of pure strategies S and the payoff function π is proportional to the affinity between the objects/strategies (v1, ..., vk) ∈ V^k chosen by the players:

$$ \pi(v_1, \ldots, v_k) = \begin{cases} \frac{1}{k!}\,\omega(\{v_1, \ldots, v_k\}) & \text{if } \{v_1, \ldots, v_k\} \in E, \\ 0 & \text{otherwise} \end{cases} \qquad (6) $$

π is invariant under permutations of the strategy profile (v1, ..., vk), which translates into a super-symmetric game. The clustering game is played in an evolutionary setting: at each turn the players are randomly extracted from a population x whose state at time t is an n-dimensional array belonging to the standard simplex ∆, which includes all possible states describing such a population:

$$ \Delta = \Big\{ x \in \mathbb{R}^n : \sum_{j \in S} x_j = 1 \ \text{ and } \ x_j \geq 0 \ \text{ for all } j \in S \Big\} \qquad (7) $$

As expected, the players don’t play rationally but follow an inherited pattern, a preassigned strategy j such that x_j(t) ∈ x represents the fraction of players in the population at time t programmed to select the j-th strategy. As time passes the initial population, generally corresponding to the barycenter of ∆, evolves according to the Darwinian principle of natural selection and eventually converges to an evolutionary stable population where the fittest strategies not only have survived the evolutionary dynamic, but also represent the elements of an ESS-cluster. The whole process is guided by the payoff function (6), which is structured to promote the evolution of highly coherent strategies. To obtain an ESS-cluster it is enough to extract the support σ of this converged population x, which corresponds to the group of strategies that still exist when the evolutionary dynamic is completed:

$$ \sigma(x) = \{\, j \in S : x_j > 0 \,\} \qquad (8) $$

Notice that the values belonging to the support of the population represent a measure of the degree of membership of the fittest strategies to their cluster.

2.2.4 A formal definition of cluster

As anticipated, the notion of ESS-cluster seamlessly integrates the basic properties of clusters described at the beginning of section 2.2.1. First of all, a measure of internal coherency follows from the equilibrium condition: if a population x ∈ ∆ is an ESS-cluster then every object belonging to its support has the same average similarity to the cluster, which is in fact the overall average similarity of the cluster [5]:

$$ u(e^j, x^{[k-1]}) = u(x^{[k]}) \qquad \forall j \in \sigma(x) \qquad (9) $$

As for the external incoherency condition, equation (4) already shows that any entity not belonging to an ESS-cluster has an average similarity to that cluster that cannot be greater than the overall similarity of the cluster. Since the ESS notion includes additional stability properties, we can also conclude that the average similarity of an external object can never be equal to the overall similarity of a cluster. This means that any perturbation to the support of an ESS-cluster x ∈ ∆ lowers the overall similarity of the cluster, providing a measure of its external incoherency [5]:

$$ u(e^j, x^{[k-1]}) < u(x^{[k]}) \qquad \forall j \notin \sigma(x) \qquad (10) $$

2.3 Non-uniform hypergraph clustering

In the previous section we described a model that solves the hypergraph clustering problem by interpreting it as a clustering game. One of the assumptions on which that model is based is that the hypergraph encoding the clustering process is a k-graph: all edges must have the same fixed cardinality k ≥ 2. Is it reasonable, however, to extend the previous method to the more general class of non-uniform hypergraphs, whose hyperedges are not required to all have the same cardinality? And which advantages could we exploit by clustering using non-uniform hypergraphs?

Formally, any non-uniform hypergraph can be defined as the quadruplet H = (V, E, R(H), ω), where:

• V = {1, . . . , n} is the finite set of nodes or vertices;

• E ⊆ 2^V \ {∅} is the finite set of hyperedges of various cardinalities;

• R(H) = {|E_k| : E_k ∈ E} is the finite set of edge types of H, containing all the possible cardinalities of the hyperedges [27];

• ω : E → R+ is a function mapping each hyperedge to a corresponding real positive value.

Figure 2: A weighted non-uniform hypergraph with cardinalities k1 = 2 and k2 = 3.

As in classic hypergraph clustering, any instance of the non-uniform hypergraph clustering problem can be represented in terms of a non-uniform hypergraph H = (V, E, R(H), ω), where each node v ∈ V maps one of the objects to be clustered while each edge e ∈ E represents a relationship between a group of different objects. An important difference must be observed, however: the number of nodes bound together by each hyperedge varies depending on the cardinalities listed in R(H). As a result, the function ω not only quantifies the strength of these relationships, but can also differentiate between the various edge types of the hyperedges it examines. Here lies the potential of using non-uniform hypergraphs: by relaxing the assumption on the cardinalities of H, we are allowed to take into account all the possible similarities between arbitrary groups of data points, without forcefully casting them into relations between a fixed number of elements. As a result, non-uniform hypergraphs are suitable to solve the clustering problem with a higher level of precision, differentiating between the various models the data points may conform to without restrictions or potential loss of information.
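As a quick illustration of how such a structure can be stored in practice, the Matlab sketch below keeps one edge matrix and one weight vector per cardinality in R(H); the edge set mirrors the example hypergraph of fig. 4 (see equation (12) in the next chapter) and is otherwise illustrative.

% Illustrative sketch: a non-uniform hypergraph with R(H) = {2, 3}, stored
% with one edge matrix and one weight vector per edge type.
H.n = 4;                            % |V|
H.R = [2 3];                        % edge types R(H)
H.edges = {[1 3; 3 4], [1 2 4]};    % one m-by-k matrix per cardinality k
H.w     = {[0.1; 0.7], 1.5};        % matching weights omega(e) > 0

% Example query: every hyperedge, of any cardinality, incident to node 4.
for t = 1:numel(H.R)
    disp(H.edges{t}(any(H.edges{t} == 4, 2), :));
end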

2.3.1 Multi-model fitting in literature

Much like in the hypergraph clustering literature, one of the main challenges of multi-model fitting is producing a robust estimation of the models of a given data set without a priori knowledge of the actual number of models. The existing approaches for multi-model fitting can be organized into two main groups, parametric and non-parametric methods. Among the most famous non-parametric techniques we can count multi-model estimators such as MultiRANSAC [22] and FLoSS [23]. Both are based on RANSAC, a method that, despite supporting only one model at a time, is particularly robust when the number of outliers in the data is rather large. In general terms, RANSAC selects the most probable model for a given data set by first producing a large number of model proposals, which in turn are obtained by randomly sampling the available data; then, it selects the model with the largest set of inliers with respect to some threshold [31]. As we can see, RANSAC and all the non-parametric methods extending it are based on consensus analysis, where models are selected by analyzing the distribution of residuals for each model. A different approach using preference analysis is proposed by RHA [24]: instead of evaluating consensus it relies on the distribution of residuals of single data points with respect to the models [25]. J-Linkage uses a somewhat similar technique, avoiding consensus analysis and working instead with a conceptual representation of the points, which are described in terms of the characteristic function of the set of models they prefer [26] [25]. An improvement of J-Linkage, called T-Linkage, has been recently presented: it replaces the binary preference analysis of J-Linkage with a continuous generalization that offers better performances [25]. Chin et al. [15] further exploit residual information by representing points in terms of a permutation of models; this representation is then used to separate inliers from outliers. The data is then over-clustered with kernel-PCA and spectral clustering, and the resulting structures are sequentially merged together using model selection criteria [25]. Another way of approaching multi-model selection in Computer Vision involves the minimization of a cost function composed of two elements, the quality of fit and a penalty measure weighted on the complexity of the model. Several minimization techniques such as SA-RCM, ARJMC or PEaRL have been suggested [16] [17] [31]. Finally, space segmentation has been gaining a certain momentum: LSA, for example, uses local information around points to fit subspaces. The clustering step is instead supported by pairwise similarities computed using angles between the local subspaces [29].

2.3.2 Advantages of non-uniform hypergraphs

Non-uniform hypergraph clustering allows a level of precision and flexibility that clustering using simple k-graphs simply cannot offer. As a telling example, consider the problem of grouping a set of d-dimensional Euclidean points into segments and circles. Interpreting this data set using a uniform 3-graph certainly causes a substantial loss of information: while it makes perfect sense to define similarities over triplets of points in terms of how close they are to being collinear, hyperedges of size 3 completely ignore any information that might instead be gathered from cocircularity, which is needed if we want to extract both the segments and the circles of the data set with a higher degree of precision; any triplet of points is in fact always cocircular. The same considerations apply to uniform 4-graphs: as before, while defining similarities over quadruplets of points in terms of how close they are to being cocircular seems to be a suitable solution, 4-edges are meaningful to evaluate cocircularity only, completely discounting any precise information that might be obtained from collinearity and thus causing another important loss of information. No matter the number of their edges or nodes, uniform hypergraphs show their shortcomings with complex data sets containing multiple models that need to be clustered: k-graphs can only focus on a single measure of dissimilarity, ignoring any other significant information that might instead be obtained by using non-uniform hypergraphs.

Figure 3: Example of a data set containing both cocircular and coplanar points.

In fact, once we relax the assumption that only k-graphs can be used when dealing with higher-order relationships between objects, we can take advantage of all the properties of the models that might be in the data set we are analyzing: non-uniform hypergraphs translate into more precise similarity measurements, which in turn produce more accurate clusters that can be extracted in a single shot, so to speak, even when they conform to different models. Getting back to our previous example, a non-uniform hypergraph H with edge types R(H) = {3, 4} would be the most suitable solution to cluster the data set, since it maximizes all of the information that can be obtained from both collinearity and cocircularity, thus making our analysis no longer limited to a single model.


2.3.3 Parallels, or lack thereof, with the clustering game

So far we have seen the advantages of employing non-uniform hypergraphs to solve the clustering problem. There is, however, one main setback that must be discussed: once we switch to non-uniform hypergraph clustering we lose the clustering game metaphor seen in the previous pages. To understand why, we must remember that the evolutionary dynamic needed to converge on the ESS-clusters is structured as a game: at each turn k agents are randomly extracted from a population x to play the game. This fixed number of players corresponds to the single cardinality k of the uniform hypergraph H = (V, E, ω) used to solve the corresponding hypergraph clustering problem: while the agents playing the evolutionary game are randomly extracted from the population, their number is fixed and depends on the single edge type of the uniform hypergraph. When we consider non-uniform hypergraphs, however, the number of edge types increases by definition: as a result, at each turn the game no longer has a fixed number of players, thus making the game metaphor invalid. As we will see in the next chapter, however, we can still take advantage of how the evolution towards an ESS-cluster is structured in the case of uniform hypergraph clustering, and cast this dynamic to the more general case of non-uniform hypergraphs.

3 Cluster extraction with non-uniform hypergraphs

The extraction of the clusters of any instance of the non-uniform hypergraph clustering problem is unfortunately a computationally hard task. Still, heuristics can be used to approximate good solutions. Indeed, suppose a non-uniform hypergraph H = (V, E, R(H), ω) represents an instance of the hypergraph clustering problem; then, as proved in [5], its clusters are in a one-to-one correspondence with the strict local maximizers of the following function f(x) over the standard simplex ∆:

$$ f(x) = \sum_{k \in R(H)} u(x^{[k]}) = \sum_{k \in R(H)} \Big( \sum_{e \in E_k} \omega(e) \prod_{j \in e} x_j \Big) \qquad (11) $$

This means that the problem of extracting the clusters can be transformed into the problem of discovering a strict local maximizer of f(x) in ∆, thus allowing us to use standard optimization techniques to solve an otherwise computationally hard task. Notice that equation (11) takes into account the various edge types of H and thus describes a non-homogeneous polynomial with non-negative coefficients. As an example, consider the polynomial obtained from the hypergraph in fig. 4:

$$ f(x_1, x_2, x_3, x_4) = 1.5\,(x_1 x_2 x_4) + 0.1\,(x_1 x_3) + 0.7\,(x_3 x_4) \qquad (12) $$

As we can see, this polynomial is clearly non-homogeneous since its terms have different degrees.
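A quick numerical check of equation (12), evaluated at the barycenter of ∆ (purely illustrative):

% Evaluate the non-homogeneous polynomial of eq. (12) at x = (1/4, ..., 1/4).
f  = @(x) 1.5 * x(1) * x(2) * x(4) + 0.1 * x(1) * x(3) + 0.7 * x(3) * x(4);
x0 = ones(4, 1) / 4;
f(x0)                   % 1.5/64 + 0.1/16 + 0.7/16 = 0.0734375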

Figure 4: A weighted non-uniform hypergraph with cardinalities k1 = 2 and k2 = 3.

To maximize the arbitrary polynomial described by (11) it is possible to use the Baum-Sell inequality [4], an extension of the Baum-Eagon inequality aimed at providing an iterative solution to maximization problems involving non-homogeneous polynomial functions.

3.1 The Baum-Eagon and Baum-Sell inequalities

Developed in the late 1960s, the Baum-Eagon and Baum-Sell inequalities are iterative tools used to maximize homogeneous and non-homogeneous polynomial functions in probability domains. The Baum-Eagon inequality has served as the basis for many statistical estimation techniques developed within the theory of probabilistic functions of Markov chains, and it has also been used in the field of speech recognition [32]. The Baum-Sell inequality has instead been used in the analysis of the asymptotic behaviour of growth transformations in the vicinity of local extrema [32]. In practice, the Baum-Eagon inequality states that if Q(x) is a homogeneous polynomial in the variables x_j with non-negative coefficients and x ∈ ∆, we can define a mapping z = M(x) from ∆ to itself as follows:

$$ z_j = \frac{x_j \, \partial Q(x) / \partial x_j}{\sum_{l=1}^{n} x_l \, \partial Q(x) / \partial x_l} \qquad j = 1, \ldots, n \qquad (13) $$

As a result, Q(M(x)) > Q(x) unless M(x) = x. In words, this result means that not only is Q(x) smaller than Q(M(x)), but Q(x) is also less than the value of Q at any point on the segment joining x to M(x). To extend this inequality to non-homogeneous polynomials, Baum and Sell observed that M increases Q homotopically, which means that for all 0 ≤ µ ≤ 1, Q(µM(x) + (1 − µ)x) ≥ Q(x), with equality if and only if M(x) = x. This result is exactly what we need to maximize the non-homogeneous polynomial described by function (11).
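As a concrete illustration of equation (13), the Matlab sketch below computes one step of the mapping M for a homogeneous polynomial given as an edge list with non-negative weights; it is a minimal reading of the inequality under that representation, not the thesis' implementation.

% One Baum-Eagon step, eq. (13), for Q(x) = sum_e w(e) * prod_{j in e} x_j.
% grad(j) = dQ/dx_j: for every edge e containing j, add w(e) times the
% product of the remaining variables of e.
function z = baum_eagon_step(edges, w, x)
    grad = zeros(numel(x), 1);
    for i = 1:size(edges, 1)
        e = edges(i, :);
        for j = e
            grad(j) = grad(j) + w(i) * prod(x(e(e ~= j)));
        end
    end
    z = x .* grad / (x' * grad);    % the normalization keeps z in the simplex
end

Iterating this map from the barycenter of ∆ strictly increases Q until a fixed point is reached, which is exactly how the discrete-time dynamics of the next section arise.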

3.1.1 Parallels with dynamical system theory

Interestingly, both inequalities can be interpreted from the point of view of dynamical system theory [5]: the operator M in fact defines a discrete dynamical system. Now, it might be useful to understand how M behaves in proximity of its equilibrium points. In the theory of dynamical systems an equilibrium point x is stable if the system remains near x whenever started sufficiently close to it. Additionally, the equilibrium point x is asymptotically stable if x is also a local attractor, meaning that as time passes the system tends towards x. To establish whether x is a local attractor we can use Lyapunov's direct method, which consists of seeking strict Lyapunov functions: continuous, real-valued functions strictly increasing along non-constant trajectories. The inequalities seen before state that the polynomial Q is basically a Lyapunov function for the discrete-time system defined by M.

3.2 Application to non-uniform hypergraph clustering

If we apply the Baum-Sell inequality to the function describing the non-homogeneous polynomial (11) we obtain the following discrete-time dynamics:

$$ x_j(t+1) = x_j(t) \sum_{k \in R(H)} \Big( \frac{u(e^j, x(t)^{[k-1]})}{u(x(t)^{[k]})} \Big) \qquad j = 1, \ldots, n \qquad (14) $$

In practice, suppose we have a non-uniform hypergraph H = (V, E, R(H), ω) much like the one in fig. 2: the corresponding polynomial described by f(x) can be rearranged as the sum of as many homogeneous polynomials as the edge types of R(H). As a result, the partial derivative of f(x) can be computed as the sum of the partial derivatives of each homogeneous polynomial composing f(x). Notice that the following equivalences are at work in function (14):

$$ \frac{\partial f(x)}{\partial x_j} = \frac{1}{k}\, u(e^j, x(t)^{[k-1]}) \qquad j = 1, \ldots, n \qquad (15) $$

$$ \sum_{l=1}^{n} x_l \frac{\partial f(x)}{\partial x_l} = \frac{1}{k}\, u(x(t)^{[k]}) \qquad (16) $$

Due to the lack of a clustering metaphor we are not formally guaranteed that the clusters obtained using this discrete-time dynamics will conform to the basic conditions of internal coherency and external incoherency. Still, heuristically, it is indisputable that (14) is structured to promote the highly coherent data points and eliminate the others. Furthermore, getting back to our previous parallel with dynamical system theory, since every cluster is a strict local maximizer of function f(x) in ∆, we can also conclude that a point x ∈ ∆ is a cluster of an instance of a non-uniform hypergraph clustering problem if and only if it is also a local attractor for the dynamics. Finally, observe that the approach extracts only one cluster at a time: in the case of data sets containing several non-overlapping clusters, a typical technique we can use to extract all of them consists of peeling off the vertices of the hypergraph that have already been extracted in a cluster and reiterating the approach on the remaining nodes.

3.2.1 Cluster extraction with uniform hypergraph

Intuitively, the extraction of the ESS-clusters is a special case of the previous analysis on clustering with non-uniform hypergraphs. The clusters we are looking for are in fact in a one-to-one correspondence with the strict local maximizers of the following function f(x) over the standard simplex ∆:

$$ f(x) = u(x^{[k]}) = \sum_{e \in E} \omega(e) \prod_{j \in e} x_j \qquad (17) $$

Notice that function (17) implicitly takes into consideration the single edge type of the k-graph used to solve the hypergraph clustering problem. To obtain a strict local maximizer of f(x) we can use the Baum-Eagon inequality presented in the previous pages [5], since (17) describes a homogeneous polynomial with non-negative coefficients, thus obtaining the following discrete-time dynamics to extract the ESS-clusters:

$$ x_j(t+1) = x_j(t)\, \frac{u(e^j, x(t)^{[k-1]})}{u(x(t)^{[k]})} \qquad j = 1, \ldots, n \qquad (18) $$

Notice that function (18) perfectly embodies the evolutionary process described in section 2.2.3: since u(e^j, x(t)^[k−1]) represents the expected payoff of a j-strategist in the population x while u(x(t)^[k]) is a measure of the expected population payoff, in time the fittest strategies – those whose expected payoff u(e^j, x(t)^[k−1]) is greater than the expected population payoff u(x(t)^[k]) – will spread, while the remaining ones will progressively die off.
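Since with a single edge type the sum over R(H) in the sketch of the previous section collapses to the single ratio of equation (18), the same code doubles as a uniform-hypergraph extractor. A toy usage example (weights invented for illustration) might look as follows:

% Usage sketch (illustrative): a uniform 3-graph with two coherent triplets
% joined by one weak hyperedge; R(H) = {3} reduces (14) to (18).
G.n = 6; G.R = 3;
G.edges = {[1 2 3; 4 5 6; 3 4 5]};
G.w     = {[1.0; 1.0; 0.05]};
clusters = extract_clusters(G, 1e-15, 1e-4, 1000);
% The two tight triplets {1,2,3} and {4,5,6} should emerge as the clusters.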

4 Experiments on synthetic data sets

This chapter focuses on the effectiveness of the two approaches described in the previous chapters: the game-theoretic hypergraph clustering technique and its extension to non-uniform hypergraphs. Following the instructions in [5] we performed two different types of experiments:

• Experiments on correctness: given a data set containing known clusters of points, each approach should return the very same clusters even when the points’ coordinates are perturbed;

• Experiments on robustness to clutter items: given a data set containing known clusters of points and a variable amount of outliers, each ap-proach should return the clusters and discard the outliers.

In order to complete these tests several data sets containing collinear, coplanar and cocircular groups of points have been produced; we will provide a thorough description of them in the next sections, as well as an extensive analysis of the performances of each approach. Lastly, it's worth mentioning that both techniques have been implemented in Matlab. The code is not optimized. All the experiments have been executed on an Intel Core i5 3GHz, with 16GB of RAM. The total number of iterations per cluster is always less than 100 while the execution time is proportional to the number of hyperedges: as an example, with data sets CSO and SO2 we obtain the average execution times shown in fig. 5. Notice that, as usual, each box plot represents the minimum (bottom line), maximum (top line) and median (red line) execution time of 10 iterations of the approach on either CSO or SO2; outliers, if present, are also shown. Interestingly, most box plots are rather small in size, meaning that the execution time does not vary much. The only exception is the box plot obtained using SO2 and 200k hyperedges: this is understandable since the peel-off phase of the approach might leave a varying number of hyperedges intact, thus influencing the overall execution time of the algorithm.

Figure 5: Average execution times with data set CSO (left) and SO2 (right).

4.1 Experiments on uniform hypergraph clustering

The first experiments we perform focus on the performance of the game-theoretic hypergraph clustering approach. Following the examples in [5], we developed two data sets to test the correctness of this technique:

• Non-intersected segments (NIS): three groups of 20 collinear points in the [−2, 2]³ space, perturbed with different levels of local Gaussian noise (σ = 0, 0.01, 0.02, 0.04, 0.08). The points are positioned so that their 2-dimensional projections do not intersect;

• Intersected segments (IS): three groups of 20 collinear points in the [0, 4]³ space, perturbed with different levels of local Gaussian noise (σ = 0, 0.01, 0.02, 0.04, 0.08). The points are positioned so that their 2-dimensional projections intersect.

The tests on robustness have instead been conducted over a third data set:

• Segments and outliers (SO): two groups of 20 collinear points plus 10, 20 or 40 outliers in the [−2, 4]³ space, perturbed with σ = 0.01. The collinear points are positioned so that their 2-dimensional projections intersect.

The hypergraph used to support the clustering process is a 3-graph: since any two points in the space trivially define a line, 3-edges are suitable to maximize the information obtained from collinearity. Notice that due to computational reasons only a subset of all the possible hyperedges is considered in the experiments; to represent all possible hyperedges fairly we extracted at a fixed rate a limited number of unique hyperedges from the pool of all possible hyperedges. The affinity between the points composing each hyperedge is measured as the average deviation of such points from their best fitting line. This affinity is then transformed into an actual weight using a Gaussian kernel, where d(i, j, k) represents the average orthogonal distance of points {i, j, k} from the best fitting line and β is a precision parameter:

$$ \omega(\{i, j, k\}) = \exp(-\beta\, d(i, j, k)^2) \qquad (19) $$
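A Matlab sketch of how such weights can be computed: the best fitting line is the total least squares line, obtainable from an SVD of the centered points, and d is the mean orthogonal distance from it. The function name and details are illustrative, not the thesis' code.

% Weight of a 3-edge, eq. (19): P is a 3-by-3 matrix (one point per row).
function w = edge_weight_line(P, beta)
    Q = P - mean(P, 1);                % center: the fitted line passes here
    [~, ~, V] = svd(Q, 'econ');
    dir = V(:, 1)';                    % line direction: first right singular vector
    res = Q - (Q * dir') * dir;        % components orthogonal to the line
    d = mean(sqrt(sum(res.^2, 2)));    % average orthogonal distance
    w = exp(-beta * d^2);              % Gaussian kernel of eq. (19)
end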

As for the actual implementation of the approach, we assume that the discrete-time dynamics stop when the following condition is verified:

$$ \exists\, x_j : \sqrt{\big( x_j(t+1) - x_j(t) \big)^2} < \epsilon \qquad (20) $$

The support of x is obtained by extracting all elements x_j > γ from the population x. Notice that both ε and γ are threshold parameters that must be tuned for each data set. Finally, both the hyperedges and the corresponding weights are stored into simple matrices.¹
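Tying the pieces together, a toy end-to-end run might look like the Matlab sketch below: one noisy segment, all 3-edges enumerated since the data set is tiny, and the two function sketches from the previous pages reused for the weights and for each step of the dynamics. Everything is illustrative.

% Toy end-to-end sketch: weights via eq. (19), one Baum-Eagon step of
% eq. (18) per iteration, stop condition (20), then support extraction.
rng(0);
t = linspace(-2, 2, 20)';
P = [t, 0.5 * t, 0.2 * t] + 0.01 * randn(20, 3);   % one noisy 3D segment
edges = nchoosek(1:20, 3);                         % all 3-edges (n is tiny)
w = zeros(size(edges, 1), 1);
for i = 1:size(edges, 1)
    w(i) = edge_weight_line(P(edges(i, :), :), 10000);
end
epsilon = 1e-15; gamma = 1e-6;
x = ones(20, 1) / 20;                              % barycenter of the simplex
for it = 1:100
    xnew = baum_eagon_step(edges, w, x);           % one step of eq. (18)
    stop = any(abs(xnew - x) < epsilon);           % stop condition (20)
    x = xnew;
    if stop, break; end
end
support = find(x > gamma);                         % the extracted cluster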

4.1.1 Tests on correctness

The first batch of experiments uses data sets NIS and IS to verify whether the approach groups the available points into the correct clusters, even when the points' coordinates are increasingly perturbed. The parameters used in the implementation are displayed in the table below:

       β       ε        γ       Edges
NIS    10000   1e-15    1e-6    10000
IS     10000   1e-15    1e-6    10000

Table 1: Parameters used for NIS and IS.

The results show that the approach indeed works as intended: when σ = 0, 0.01 or 0.02 the implementation correctly groups each set of collinear points into different clusters (figs. 6-10, 22-26). Similar results are also obtained when σ is set to 0.04: occasional misclassifications might occur but the overall performance of the approach is still solid. Finally, when σ is 0.08 the number of classification errors increases significantly due to the higher perturbation level. Notice that when σ = 0.08 we need to fine-tune the termination parameters ε and γ to extract all points correctly: this is due to the effect of the Gaussian noise, which reduces the degree of membership of the elements of a cluster.

¹ Initially the implementation used an n^k similarity tensor to store the weights of the hypergraph. This solution, however, becomes impracticable once the number of nodes – and of cardinalities, as we will see in the following sections – increases.

Figure 6: NIS, σ = 0.
Figure 7: NIS, σ = 0.01.
Figure 8: NIS, σ = 0.02.
Figure 9: NIS, σ = 0.04.
Figure 10: NIS, σ = 0.08.
Figure 11: IS, σ = 0.
Figure 12: IS, σ = 0.01.
Figure 13: IS, σ = 0.02.
Figure 14: IS, σ = 0.04.
Figure 15: IS, σ = 0.08.

The precision parameter β significantly modifies the results: with smaller values the first cluster usually contains the points closest to the intersection of the segments, no matter the perturbation level; with a higher β the results are instead more accurate, which translates into fewer misclassification errors even when σ is set to 0.08 (fig. 16). Fig. 17 shows the number of classification errors produced using 10 randomized data sets containing three groups of collinear points each: higher perturbation levels increase the median amount of errors, but this behaviour is to be expected since we are working with more complex data sets. When σ is set to either 0, 0.01 or 0.02, instead, the box plots are flattened on the median since we have no misclassifications.

Figure 16: IS with σ = 0.08, β = 1000 (left) or 15000.
Figure 17: Misclassification rates using NIS and IS.

4.1.2 Tests on robustness to clutter items

The second batch of tests uses data set SO to verify whether the approach clusters the available data correctly, disregarding the outliers placed in the [−2, 4]³ space. The parameters used in the implementation are shown in the table below:

      β       ε        γ      Edges
SO    10000   1e-30    0.03   10000

Table 2: Parameters used for SO.

Notice that smaller threshold values are needed to let the dynamics evolve: greater values would cause the dynamics to stop after a few iterations because of the small values assumed by the random outliers in the population x.

Figure 18: SO, 10 outliers.
Figure 19: SO, 20 outliers.
Figure 20: SO, 40 outliers.

The results in figs. 18-20 show that no matter the amount of outliers the approach works efficiently, clustering only the interesting points and discarding most of the remaining objects in the space. Interestingly, some outliers are sometimes added to garbage clusters – clusters usually extracted during the last iterations of the evolutionary dynamic. Our tests have shown that the probability of extracting large clusters of outliers is very small: as a result, we can assume that garbage clusters containing a limited amount of objects can be safely rejected. To understand the effects of this assumption on our tests we have computed the misclassification rate (MSE) as follows:

$$ \text{MSE} = \frac{\text{Number of misclassifications}}{\text{Total number of points}} \qquad (21) $$

Fig. 21 shows the results of this analysis on SO: as expected, if we set a higher threshold on the number of points a group must contain to be considered a cluster, we obtain extremely precise clusters; on the contrary, a lower threshold leaves most if not all of the garbage clusters intact, causing an increased misclassification rate. To conclude, the results examined in this section show that the approach is indeed both correct and robust to clutter objects.

Figure 21: Misclassification rates using SO.

4.2 Experiments on non-uniform hypergraph clustering

The second group of experiments we perform concerns the correctness and the robustness to clutter of the extended approach on non-uniform hypergraphs. The more general goal of these tests is to assess whether the extended approach works at least as well as its original predecessor on very similar data sets. From a practical point of view, the implementation used for the game-theoretic approach on hypergraph clustering has been adapted to use non-uniform hypergraphs: it is enough to alter the computation of the discrete-time dynamics to accommodate polynomials of different degrees and to use separate similarity matrices, one for each cardinality listed in R(H). Data set IS2 was produced specifically to study the correctness of the method:

• Intersected segments (IS2): three groups of 10 collinear points in the [0, 4]³ space, perturbed with different levels of local Gaussian noise (σ = 0, 0.01, 0.02, 0.04, 0.08). The points are positioned so that their projections on the 2-dimensional space intersect.

Data set SO2 was instead created for the experiments on robustness to clutter:

• Segments and outliers (SO2): two groups of 20 collinear points plus 10, 20 or 40 outliers in the [−3, 4]³ space, perturbed with σ = 0.01. The collinear points are positioned so that their 2-dimensional projections intersect.

The edges of the non-uniform hypergraphs used to cluster the data sets have cardinalities 3 and 4, both measuring collinearity, so as to test the behaviour of the approach once more than one cardinality is available. As in the previous chapter, due to computational reasons only a subset of all 3-edges and 4-edges is considered in the experiments. The affinity between the points composing each hyperedge is measured as the average deviation of such points from their best fitting line; the corresponding weight is then obtained using a Gaussian kernel: if the number of points per edge is 3, the weights are computed using (19); otherwise we use formula (22), where d(i, j, k, w) represents the average orthogonal distance of points {i, j, k, w} from the best fitting line and β is again a precision parameter. Finally, the conditions to stop the dynamics and extract the support are the same as in the previous chapter.

$$ \omega(\{i, j, k, w\}) = \exp(-\beta\, d(i, j, k, w)^2) \qquad (22) $$

4.2.1 Tests on correctness

The first batch of experiments on IS2 verifies whether the approach groups the available points into the correct clusters, even when the points' coordinates are perturbed. The parameters used in the implementation are in the table below:

       β       ε        γ       Edges (per cardinality)
IS2    10000   1e-15    1e-6    10000

Table 3: Parameters used for IS2.

Figure 22: IS, σ = 0.
Figure 23: IS, σ = 0.01.
Figure 24: IS, σ = 0.02.
Figure 25: IS, σ = 0.04.
Figure 26: IS, σ = 0.08.

The results show that the extended approach works as well as its predecessor on k-graphs: all points are clustered correctly when σ is set to either 0, 0.01 or 0.02. When σ is either 0.04 or 0.08, instead, an increased number of classification errors may occur; as before, this is caused by the level of Gaussian perturbation: the more the points' coordinates are perturbed, the more the degree of membership of the elements of a cluster might be reduced (figs. 22-26). Fig. 27 displays the results of repeating the previous experiment using 10 randomized data sets containing three groups of collinear points: as expected, higher perturbation levels cause a higher error variance. When σ is set to either 0, 0.01 or 0.02, instead, the box plots are reduced to their median values since the approach produces no classification errors. This analysis confirms that the extended approach performs as well as its predecessor on uniform hypergraphs.

4.2.2 Tests on robustness to clutter items

The second batch of experiments on SO2 verifies whether the extended approach clusters the groups of collinear points while ignoring the outliers. The more general goal is to assess whether the results of the new method are comparable to those of its predecessor on k-graphs. The parameters used in the implementation are displayed in the next table:



Figure 27: Number of misclassification errors using IS2.

      β       ε         γ        Edges (per cardinality)
SO2   10000   1e − 30   7e − 4   40000

Table 4: Parameters used for SO2.

The results in figg. 28-30 are in line with what we obtained using the approach on k-graphs: the extended method correctly converges on the two groups of collinear points, making no classification errors no matter the amount of outliers scattered in the space. As before, a small number of outliers is often grouped into a third garbage cluster, which can be rejected assuming that the probability of extracting large clusters of outliers is very small. Fig. 31 shows what happens to the misclassification rate when we reject clusters depending on their size: as expected, a higher threshold yields extremely precise clusters, while a lower threshold does not remove the garbage clusters, causing an increased percentage of errors. Notice that the misclassification rate is again comparable to the one obtained with the uniform approach.
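In code, this rejection step amounts to a simple size filter; the sketch below assumes clusters are lists of point indices, and misclassification_rate is a hypothetical helper comparing the surviving clusters to the ground truth:

    def reject_garbage(clusters, threshold):
        """Keep only the clusters larger than the rejection threshold,
        assuming large clusters of outliers are unlikely to arise by chance."""
        return [c for c in clusters if len(c) > threshold]

    # Sweeping the threshold as in fig. 31:
    # rates = {t: misclassification_rate(reject_garbage(clusters, t))
    #          for t in range(1, 7)}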


Figure 28: SO2, 10 outliers (ground truth with outliers on the left, extracted clusters on the right).

Figure 29: SO2, 20 outliers.

Figure 30: SO2, 40 outliers.


Figure 31: Misclassification rate using SO2 (MSE as a function of the rejection threshold, for 10, 20 and 40 outliers).

All in all, the approach on non-uniform hypergraphs appears to be correct and robust in all circumstances: no matter the complexity of the data set, the dynamics always converge on the desired clusters. Furthermore, its performance is comparable to that of the game-theoretic technique on uniform hypergraphs.

4.3 e x p e r i m e n t s o n m i x e d m o d e l s

Our third round of experiments explores the behavior of the extended approach when segments, planes and circles are placed in the same data set, with the general goal of verifying whether the clusters extracted by the algorithm are still correct and robust to clutter items.


4.3.1 Tests with planes and segments

The first batch of tests focuses on a mixed data set containing both a collinear and a coplanar group of points. The hypergraph H used to support the clustering process has two cardinalities, R(H) = {3, 4}: since any pair of points is trivially collinear and any triplet of points is trivially coplanar, 3-edges are the smallest edges able to convey information about collinearity, while 4-edges are the smallest able to do so for coplanarity. The weight of each hyperedge is obtained as follows:

• Triplets. Given three points {x, y, z} composing a 3-edge, we first compute their affinity as the average deviation of the three points from their best fitting line, which is computed by minimizing the normal quadratic distance of the given triplet of points. Then, we transform this affinity into an actual weight using the following Gaussian kernel, where d(x, y, z) represents the mean distance of the points from their best fitting line:

ω({x, y, z}) = exp(−βd(x, y, z)²) (23)

• Quadruplets. Given four points {x, y, z, k} composing a 4-edge, their affinity is computed as their average deviation from their best fitting plane, which is obtained by minimizing the normal quadratic distance of the given quadruplet of points. As before, we transform this affinity into an actual weight using the following Gaussian kernel, where d(x, y, z, k) is the mean distance of the points from their best fitting plane (a sketch of both computations follows the list):

ω({x, y, z, k}) = exp(−βd(x, y, z, k)²) (24)
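Both weights can be computed with the same total-least-squares machinery; the sketch below, whose names are our own, fits the best affine subspace via SVD and applies the Gaussian kernels of formulas 23 and 24:

    import numpy as np

    def subspace_weight(points, dim, beta=10000.0):
        """Average distance of the points to their best fitting affine
        subspace (dim=1 for a line, dim=2 for a plane), fitted by total
        least squares via SVD and turned into a weight by a Gaussian kernel."""
        P = np.asarray(points, dtype=float)        # shape (k, 3)
        Q = P - P.mean(axis=0)                     # centre on the centroid
        _, _, Vt = np.linalg.svd(Q, full_matrices=False)
        basis = Vt[:dim]                           # top singular directions
        residual = Q - Q @ basis.T @ basis         # components orthogonal to the fit
        d = np.linalg.norm(residual, axis=1).mean()
        return np.exp(-beta * d ** 2)

    def triplet_weight(pts, beta=10000.0):         # formula 23: best fitting line
        return subspace_weight(pts, dim=1, beta=beta)

    def quadruplet_weight(pts, beta=10000.0):      # formula 24: best fitting plane
        return subspace_weight(pts, dim=2, beta=beta)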

Two data sets have been produced to assess the correctness and the robustness to clutter of the approach:

• Plane and segment (PS): 20 coplanar points and 20 collinear points in the [0, 4]³ space, whose coordinates are perturbed with Gaussian noise (σ set to 0, 0.01, 0.02, 0.04 or 0.08). The points are placed so that their 2-dimensional projections are intersected.

• Plane, segment and outliers (PSO): 20 coplanar points, 20 collinear points and outliers (10, 20 or 40) scattered on the [0, 4]³ space. The points are perturbed with σ = 0.01 and are placed so that their 2-dimensional projections are still intersected. (One possible way to generate such data is sketched right after this list.)
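For illustration, a perturbed plane-and-segment data set of this kind could be generated as follows; the sketch ignores the careful placement that makes the projections of the two structures intersect:

    import numpy as np

    rng = np.random.default_rng(0)
    sigma = 0.01                                   # perturbation level

    # 20 collinear points on a random segment in [0, 4]^3.
    a, b = rng.uniform(0, 4, size=3), rng.uniform(0, 4, size=3)
    segment = a + rng.uniform(0, 1, size=(20, 1)) * (b - a)

    # 20 coplanar points spanned by two random directions.
    origin = rng.uniform(0, 4, size=3)
    u, v = rng.standard_normal(3), rng.standard_normal(3)
    coeffs = rng.uniform(-1, 1, size=(20, 2))
    plane = origin + coeffs[:, :1] * u + coeffs[:, 1:] * v

    # Gaussian perturbation of every coordinate.
    data = np.vstack([segment, plane]) + rng.normal(0, sigma, size=(40, 3))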

The remaining parameters used in the implementation are shown below:

      β       ε         γ         Edges (per cardinality)
PS    10000   1e − 5    5e − 05   50000
PSO   10000   1e − 10   1e − 03   50000

Table 5: Parameters used for PS and PSO.

Notice that due to computational reasons the hypergraphs used in the clustering process contain a subset of all the possible hyperedges. The discrete dynamics still stops when condition 20 is met, and the support of x is still extracted using parameter γ.
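The whole extraction pass can be summarized by the following sketch, which assumes that condition 20 is a small-change test on successive iterates with tolerance ε; the function name is our own and the update is the Baum-Eagon-style step discussed in the previous chapters:

    import numpy as np

    def extract_cluster(edges, weights, n, eps=1e-10, gamma=1e-3):
        """One cluster-extraction pass: edges is a list of index tuples of
        mixed cardinality, weights holds their ω values, n is the number
        of points."""
        x = np.full(n, 1.0 / n)              # start from the simplex barycentre
        while True:
            grad = np.zeros(n)               # grad[i] = dP/dx_i
            for e, w in zip(edges, weights):
                xe = x[list(e)]
                for pos, i in enumerate(e):  # product over the edge minus x_i
                    grad[i] += w * np.prod(np.delete(xe, pos))
            x_new = x * grad / (x @ grad)    # Baum-Eagon-style update
            if np.abs(x_new - x).max() < eps:         # stopping condition
                return np.flatnonzero(x_new > gamma)  # support of x via γ
            x = x_new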

ex-tracted using parameter γ. The results of the tests on correctness are shown

in figg. 32-36: the approach converges on the correct clusters even when the

points’ coordinates are very perturbed. As expected, most misclassifications are performed when σ is set to 0.04 or 0.08; when σ is instead set to 0, 0.01 or 0.02 the approach usually clusters all points correctly, making no mistakes. Statistics about these tests, obtained repeating the approach 10 times over ran-domized data sets containing collinear and coplanar points, are shown in fig.

37. As expected, the median number of misclassification errors increases with

the perturbation levels of σ: the more the points’ coordinates are perturbed, the wider is the box plot representing the minimum, maximum and median number of classification errors. When σ is set to either 0, 0.01 or 0.02 the box plots are reduced to their median values exactly as before, since the approach produces no classification errors.


Figure 32: PS, σ = 0 (ground-truth segment and plane on the left, extracted clusters on the right).
Figure 33: PS, σ = 0.01.
Figure 34: PS, σ = 0.02.


Figure 35: PS, σ = 0.04.
Figure 36: PS, σ = 0.08.

As for the experiments on robustness to clutter objects, the approach produces accurate results (figg. 38-40). Depending on where the outliers are placed we might have some misclassifications or additional garbage clusters; assuming the probability of extracting large clusters of outliers is very small, however, we can safely reject these additional groups of points. Fig. 41 shows how the misclassification rate increases as we lower the threshold used to reject the garbage clusters.


Figure 37: Number of misclassification errors with PS.

Figure 38: PSO, 10 outliers.

Figure 39: PSO, 20 outliers.

Figure 40: PSO, 40 outliers.

Figure 41: Misclassification rates using PSO (MSE as a function of the rejection threshold, for 10, 20 and 40 outliers).

To conclude, all the previous results show that the approach is indeed correct and robust to clutter items even when multiple models are present in a single data set.


4.3.2 Tests with circles and segments

The second group of tests uses a mixed data set containing both a collinear and a cocircular group of points. The hypergraph H used to support the clustering process again has two cardinalities, R(H) = {3, 4}: since any pair of points is trivially collinear and any triplet of points is trivially cocircular, 3-edges are the smallest edges able to convey information about collinearity, while 4-edges are the smallest able to do so for cocircularity. Much like in the previous experiments, the weight of each hyperedge is obtained as follows:

• Triplets. Given three points {x, y, z} composing a 3-edge, we first compute their affinity as the average deviation of the three points from their best fitting line, which is computed by minimizing the normal quadratic distance of the given triplet of points. Then, we transform this affinity into an actual weight using the following Gaussian kernel, where d(x, y, z) represents the mean distance of the points from their best fitting line:

ω({x, y, z}) = exp(−βd(x, y, z)²) (25)

• Quadruplets. Given four points {x, y, z, k} composing a 4-edge, their affinity is computed as their average deviation from their best fitting circle, which is obtained by minimizing the sum of squared radial deviations of the points [28]. As before, we transform this affinity into an actual weight using the following Gaussian kernel, where d(x, y, z, k) is the mean distance of the points from their best fitting circle (a sketch of the circle fit follows the list):

ω({x, y, z, k}) = exp(−βd(x, y, z, k)²) (26)
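As an illustration, here is a minimal sketch of this weight, restricted to the planar case for brevity and using a Kåsa-style algebraic circle fit as a simple stand-in for the radial fit of [28]; the function name is our own:

    import numpy as np

    def circle_weight(points, beta=10000.0):
        """Mean radial deviation of four points from a least-squares circle,
        turned into the weight of formula 26."""
        P = np.asarray(points, dtype=float)        # shape (4, 2), planar case
        x, y = P[:, 0], P[:, 1]
        # Fit x^2 + y^2 + Dx + Ey + F = 0 in the least-squares sense.
        A = np.column_stack([x, y, np.ones(len(P))])
        b = -(x ** 2 + y ** 2)
        (D, E, F), *_ = np.linalg.lstsq(A, b, rcond=None)
        cx, cy = -D / 2.0, -E / 2.0                # centre of the circle
        r = np.sqrt(cx ** 2 + cy ** 2 - F)         # radius of the circle
        d = np.abs(np.hypot(x - cx, y - cy) - r).mean()
        return np.exp(-beta * d ** 2)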

As with the previous data set we are interested in verifying both the correctness and the robustness of the approach using two data sets:

• Circle and segment (CS): 20 cocircular points and 20 collinear points in the [−2, 3]³ space, whose coordinates are perturbed with Gaussian noise (σ set to 0, 0.01, 0.02, 0.04 or 0.08). The points are positioned so that the circle and the segment are intersected.

• Circle, segment and outliers (CSO): 20 cocircular points, 20 collinear points and a variable amount of outliers (10, 20 or 40) in the [−2, 3]³ space. The points are perturbed with σ = 0.01 and are positioned so that all components of the data set are intersected.

The remaining parameters used in the implementation are displayed below:

      β       ε         γ         Edges (per cardinality)
CS    10000   1e − 10   1e − 04   50000
CSO   10000   1e − 18   1e − 04   50000

Table 6: Parameters used for CS and CSO.

The results of the tests on correctness are shown in figg. 42-46: when σ is set to 0, 0.01 or 0.02 each cluster is extracted with precision and the approach makes no misclassification errors. With higher levels of perturbation such as σ = 0.04 or σ = 0.08, an increasing number of classification errors is instead produced. Notice that some of the misclassifications obtained when σ = 0.08 can be removed by simply tweaking the termination parameters of the approach: fig. 47 shows the results obtained when both parameters ε and γ are lowered.

Figure 42: CS, σ = 0 (ground-truth segment and circle on the left, extracted clusters on the right).


Figure 43: CS, σ = 0.01.
Figure 44: CS, σ = 0.02.
Figure 45: CS, σ = 0.04.


Figure 46: CS, σ = 0.08.
Figure 47: CS, σ = 0.08, with tweaked termination parameters.

Figure 48: Number of misclassification errors with CS.
