Packing to fewer dimensions
(for space compression and query speedup)
Paolo Ferragina
Dipartimento di Informatica Università di Pisa
Speeding up cosine computation
What if we could take our vectors and “pack” them into fewer dimensions (say 50,000 → 100) while preserving distances?
Now, O(nm) to compute cos(d,q) for all n docs
Then, O(km+kn) where k << n,m
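For concreteness (illustrative numbers, not from the slides): with n = 10^6 docs, m = 50,000 terms and k = 100, nm = 5·10^10 operations, while km + kn = 5·10^6 + 10^8 ≈ 10^8, roughly a 500× saving.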
Two methods:
Latent semantic indexing
Random projection
Briefly
LSI is data-dependent
Create a k-dim subspace by eliminating redundant axes
Pull together “related” axes – hopefully
car and automobile
Random projection is data-independent
Choose a k-dim subspace that, with high probability, guarantees small stretching of the distance between any pair of points (a sketch follows below)
What about polysemy?
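A minimal sketch of the data-independent projection, assuming dense numpy vectors (sizes and names are illustrative, not from the slides):

  import numpy as np

  rng = np.random.default_rng(0)
  m, k = 50_000, 100                                  # original vs projected dims
  R = rng.normal(0.0, 1.0 / np.sqrt(k), size=(m, k))  # random Gaussian subspace

  def project(x):
      # maps an m-dim vector to k dims; pairwise distances are
      # approximately preserved with high probability
      return x @ R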
Latent Semantic Indexing
courtesy of Susan Dumais
Sec. 18.4
Notions from linear algebra
Matrix A, vector v
Matrix transpose (A^t)
Matrix product
Rank
Eigenvalue λ and eigenvector v: A v = λ v
Example
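A tiny numpy illustration of the eigenvalue equation (values chosen for illustration, not from the deck):

  import numpy as np

  A = np.array([[2.0, 1.0],
                [1.0, 2.0]])
  vals, vecs = np.linalg.eig(A)            # eigenvalues and eigenvectors (as columns)
  v = vecs[:, 0]
  assert np.allclose(A @ v, vals[0] * v)   # A v = λ v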
Overview of LSI
Pre-process docs using a technique from linear algebra called Singular Value
Decomposition
Create a new (smaller) vector space
Queries handled (faster) in this new space
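A rough end-to-end sketch of this pipeline in numpy (toy random matrix; representing each doc as a column of Σ_k V_k^t is one common convention and an assumption on my part):

  import numpy as np

  rng = np.random.default_rng(0)
  A = rng.random((1000, 200))                # toy m x n term-doc matrix
  U, s, Vt = np.linalg.svd(A, full_matrices=False)
  k = 50                                     # number of retained "concepts"
  docs_k = (np.diag(s[:k]) @ Vt[:k, :]).T    # each doc as a k-dim vector
  # queries get mapped into the same k-dim space and compared by cosine there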
Singular-Value Decomposition
Recall the m × n matrix A of terms × docs.
A has rank r ≤ m, n
Define the term-term correlation matrix T = A A^t
T is a square, symmetric m × m matrix
Let U be the m × r matrix of the r eigenvectors of T
Define the doc-doc correlation matrix D = A^t A
D is a square, symmetric n × n matrix
Let V be the n × r matrix of the r eigenvectors of D
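A tiny numpy check of these two correlation matrices (toy sizes, illustrative only):

  import numpy as np

  rng = np.random.default_rng(1)
  A = rng.normal(size=(5, 3))   # toy m x n matrix, rank r = 3
  T = A @ A.T                   # m x m term-term correlation
  D = A.T @ A                   # n x n doc-doc correlation
  # T and D share the same r nonzero eigenvalues
  eT = np.sort(np.linalg.eigvalsh(T))[::-1][:3]
  eD = np.sort(np.linalg.eigvalsh(D))[::-1][:3]
  assert np.allclose(eT, eD)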
A's decomposition
Given U (for T, m × r) and V (for D, n × r), both formed by orthonormal columns (U^t U = V^t V = I)
It turns out that A = U Σ V^t, where Σ is an r × r diagonal matrix holding the singular values (= square roots of the eigenvalues of T = A A^t) in decreasing order.

A    =    U    Σ    V^t
(m × n)  (m × r)  (r × r)  (r × n)
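A quick numerical check of the decomposition (numpy, illustrative):

  import numpy as np

  rng = np.random.default_rng(1)
  A = rng.normal(size=(5, 3))
  U, s, Vt = np.linalg.svd(A, full_matrices=False)  # s is decreasing
  assert np.allclose(A, U @ np.diag(s) @ Vt)        # A = U Σ V^t
  # singular values = square roots of the eigenvalues of T = A A^t
  eT = np.sort(np.linalg.eigvalsh(A @ A.T))[::-1][:3]
  assert np.allclose(s**2, eT)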
The case of r = 2
The following figures are from Jure Leskovec's slides (Stanford).
[Figure: a 7 × 5 example matrix decomposed as A = U Σ V^t, with rank r = 2]
Dimensionality reduction
Fix some k << r and zero out all but the k biggest singular values in Σ [the choice of k is crucial]
Denote by Σ_k this new version of Σ, having rank k
Typically k is about 100, while r (A's rank) is > 10,000

A_k    =    U_k    Σ_k    V_k^t
(m × n)  (m × k)  (k × k)  (k × n)
(the last r − k columns of U and rows of V^t become useless, since they multiply the 0-columns/0-rows of Σ_k)
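A short numpy sketch of this truncation (toy sizes; a sketch, not the lecture's code):

  import numpy as np

  rng = np.random.default_rng(2)
  A = rng.normal(size=(100, 80))
  U, s, Vt = np.linalg.svd(A, full_matrices=False)
  k = 10                                        # keep the k largest singular values
  A_k = U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]   # rank-k version of A
  assert np.linalg.matrix_rank(A_k) == k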
Guarantee
A_k is a pretty good approximation to A:
Relative distances are (approximately) preserved
Of all m × n matrices of rank k, A_k is the best approximation to A wrt the following measures:
min_{B : rank(B)=k} ||A − B||_2 = ||A − A_k||_2 = σ_{k+1}
min_{B : rank(B)=k} ||A − B||_F^2 = ||A − A_k||_F^2 = σ_{k+1}^2 + σ_{k+2}^2 + … + σ_r^2
Frobenius norm: ||A||_F^2 = σ_1^2 + σ_2^2 + … + σ_r^2
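A quick numerical check of both guarantees (numpy, illustrative):

  import numpy as np

  rng = np.random.default_rng(3)
  A = rng.normal(size=(60, 40))
  U, s, Vt = np.linalg.svd(A, full_matrices=False)
  k = 5
  A_k = U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]
  assert np.isclose(np.linalg.norm(A - A_k, 2), s[k])   # spectral norm = σ_{k+1}
  assert np.isclose(np.linalg.norm(A - A_k, 'fro')**2, np.sum(s[k:]**2))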
Interpret as another row in the matrix
d is a new user that expresses preferences on-the-fly at query time
The product U Σ gives the projection of the existing users onto the 2 new dims, corresponding to the vectors v1, v2 of the V matrix; the new row d is projected likewise (see the Recap below)
Recap
c-th concept = c-th column of U_k (which is m × k)
U_k[i][c] = strength of association between the c-th concept and the i-th row (term/user/…)
V_k^t[c][j] = strength of association between the c-th concept and the j-th column (document/movie/…)
Projected query: q' = q V_k
q'[c] = strength of concept c in q
Projected column (doc/movie): d'_j = d_j V_k
d'_j[c] = strength of concept c in d_j
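Following the recap's conventions, a minimal sketch of projecting a new row q onto the k concepts (variable names are my own):

  import numpy as np

  rng = np.random.default_rng(4)
  A = rng.normal(size=(100, 80))        # rows: terms/users, cols: docs/movies
  U, s, Vt = np.linalg.svd(A, full_matrices=False)
  k = 10
  V_k = Vt[:k, :].T                     # n x k: top-k right singular vectors
  q = rng.normal(size=(1, A.shape[1]))  # a new row of on-the-fly preferences
  q_proj = q @ V_k                      # q'[c] = strength of concept c in q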