Packing to fewer dimensions


(1)

Packing to fewer dimensions

Paolo Ferragina

Dipartimento di Informatica Università di Pisa

(2)

Speeding up cosine computation

What if we could take our vectors and “pack” them into fewer dimensions (say 50,000 → 100) while preserving distances?

Now: O(nm) to compute cos(d, q) for all n docs.

Then: O(km + kn), where k ≪ n, m (sketched after this slide).

Two methods:

“Latent semantic indexing”

Random projection
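A minimal numpy sketch of the claimed speed-up; the sizes are hypothetical and the matrix P is a generic placeholder for a packing (the two concrete ways to build it, LSI and random projection, are the subject of the remaining slides):

```python
import numpy as np

# Hypothetical sizes: n docs in m dims, packed down to k dims.
m, n, k = 10_000, 1_000, 100
rng = np.random.default_rng(0)
A = rng.random((m, n))            # term-by-document matrix (m x n)
q = rng.random(m)                 # query vector in m dims

P = rng.standard_normal((k, m))   # placeholder m -> k packing (see later slides)
Ak = P @ A                        # pack all docs once, as preprocessing
qk = P @ q                        # pack the query: O(km)

# All n cosines in the packed space: O(kn) instead of O(nm).
scores = (Ak.T @ qk) / (np.linalg.norm(Ak, axis=0) * np.linalg.norm(qk))
```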

(3)

Briefly

LSI is data-dependent

Create a k-dim subspace by eliminating redundant axes

Pull together “related” axes (hopefully): e.g., car and automobile

Random projection is data-independent

Choose a k-dim subspace that guarantees good stretching properties, with high probability, between any pair of points.

What about polysemy?

(4)

Latent Semantic Indexing

courtesy of Susan Dumais

Sec. 18.4

(5)

Notions from linear algebra

Matrix A, vector v

Matrix transpose (A^t)

Matrix product

Rank

Eigenvalues and eigenvector v: Av = v

Example
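A tiny numpy check of the eigenpair definition (the 2×2 matrix is an arbitrary illustrative example, not from the slides):

```python
import numpy as np

# Columns of V are eigenvectors, w holds the eigenvalues,
# and A @ v == lambda * v for each (lambda, v) pair.
A = np.array([[2.0, 1.0],
              [1.0, 2.0]])
w, V = np.linalg.eig(A)
for lam, v in zip(w, V.T):
    assert np.allclose(A @ v, lam * v)
print(w)   # eigenvalues: [3. 1.]
```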

(6)

Overview of LSI

Pre-process docs using a technique from linear algebra called Singular Value Decomposition

Create a new (smaller) vector space

Queries handled (faster) in this new space

(7)

Singular-Value Decomposition

Recall the m × n matrix A of terms × docs.

A has rank r ≤ m, n.

Define the term-term correlation matrix T = A A^t:

T is a square, symmetric m × m matrix.

Let U be the m × r matrix of the r eigenvectors of T.

Define the doc-doc correlation matrix D = A^t A:

D is a square, symmetric n × n matrix.

Let V be the n × r matrix of the r eigenvectors of D.
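A small numpy sketch of these definitions; the random A is a stand-in for a real term-doc matrix. It checks the link to the SVD of the next slide: the eigenvalues of D = A^t A are the squared singular values of A.

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.random((5, 4))            # stand-in term-doc matrix (m=5, n=4)

T = A @ A.T                       # term-term correlation, m x m
D = A.T @ A                       # doc-doc correlation, n x n
eigT, U = np.linalg.eigh(T)       # eigh: eigendecomposition of a symmetric matrix
eigD, V = np.linalg.eigh(D)

_, s, _ = np.linalg.svd(A)
# Eigenvalues of D (ascending) are the squared singular values of A:
assert np.allclose(eigD, np.sort(s**2))
```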

(8)

A’s decomposition

Given U (for T, m × r) and V (for D, n × r), formed by orthonormal columns (unit dot-product), it turns out that

A = U S V^t

where S is a diagonal r × r matrix whose entries are the square roots of the eigenvalues of T = A A^t (i.e., the singular values of A), in decreasing order.

A = U S V^t
(m × n) = (m × r)(r × r)(r × n)

(9)

Dimensionality reduction

Fix some k ≪ r and zero out all but the k largest singular values in S [the choice of k is crucial]. Denote by S_k this new version of S, which has rank k.

Typically k is about 100, while r (A’s rank) is > 10,000.

A_k = U S_k V^t
(m × n) = (m × r)(r × r)(r × n)

The last r − k columns of U and rows of V^t are useless, since they are multiplied by the 0-rows/0-columns of S_k. Dropping them leaves

A_k = U_k S_k V_k^t
(m × n) = (m × k)(k × k)(k × n)
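A minimal numpy sketch of the truncation (random A and the choice k = 10 are illustrative):

```python
import numpy as np

# Truncated SVD: keep only the k largest singular values/vectors.
rng = np.random.default_rng(0)
A = rng.random((200, 100))
k = 10

U, s, Vt = np.linalg.svd(A, full_matrices=False)   # A = U @ diag(s) @ Vt
Uk, Sk, Vtk = U[:, :k], np.diag(s[:k]), Vt[:k, :]  # drop the 0-multiplied parts
Ak = Uk @ Sk @ Vtk                                 # rank-k approximation, still m x n
```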

(10)

A running example

(11)

A running example

(12)

A running example

(13)

Guarantee

A_k is a pretty good approximation to A:

Relative distances are (approximately) preserved.

Of all m × n matrices of rank k, A_k is the best approximation to A w.r.t. the following measures:

min_{B : rank(B)=k} ||A − B||_2 = ||A − A_k||_2 = σ_{k+1}

min_{B : rank(B)=k} ||A − B||_F^2 = ||A − A_k||_F^2 = σ_{k+1}^2 + σ_{k+2}^2 + … + σ_r^2

where the Frobenius norm is ||A||_F^2 = σ_1^2 + σ_2^2 + … + σ_r^2.
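A quick empirical check of both equalities (random A; note that numpy indexes singular values from 0, so s[k] is σ_{k+1}):

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.random((50, 30))
U, s, Vt = np.linalg.svd(A, full_matrices=False)
k = 5
Ak = U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]

# Spectral error equals the first discarded singular value...
assert np.isclose(np.linalg.norm(A - Ak, 2), s[k])
# ...and squared Frobenius error equals the sum of all discarded sigma^2.
assert np.isclose(np.linalg.norm(A - Ak, 'fro')**2, np.sum(s[k:]**2))
```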

(14)

Reduction

Since we are interested in doc/doc correlation, we consider

D = A^t A = (U S V^t)^t (U S V^t) = (S V^t)^t (S V^t)

since U^t U = I. Hence X = S V^t, an r × n matrix, may play the role of A.

To reduce its size we set X_k = S_k V_k^t, a k × n matrix, and thus get A^t A ≈ X_k^t X_k (both are n × n matrices).

We use X_k to define how to project A:

Since X_k = S_k V_k^t ⟹ X_k = U_k^t A (use the SVD of A).

Since X_k may play the role of A, its columns are the projected docs.

Similarly, a query q can be interpreted as a new column of A, so it is enough to multiply U_k^t by q to get the projected query, in O(km) time.

Recall that U and V are formed by orthonormal eigenvectors of the matrices T and D, respectively.
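A small numpy check of the identity X_k = S_k V_k^t = U_k^t A (random A and k = 6 are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.random((40, 25))
U, s, Vt = np.linalg.svd(A, full_matrices=False)
k = 6

Xk = np.diag(s[:k]) @ Vt[:k, :]         # S_k V_k^t, a k x n matrix
assert np.allclose(Xk, U[:, :k].T @ A)  # same as projecting A with U_k^t
```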

(15)

Which are the concepts?

The c-th concept = the c-th column of U_k (which is m × k).

U_k[i][c] = strength of association between the c-th concept and the i-th term.

V_k^t[c][j] = strength of association between the c-th concept and the j-th document.

Projected document: d'_j = U_k^t d_j, where d'_j[c] = strength of concept c in d_j.

Projected query: q' = U_k^t q, where q'[c] = strength of concept c in q.
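A minimal sketch of query answering in the concept space (random data; the top-5 cut is an arbitrary choice):

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.random((40, 25))
q = rng.random(40)
U, s, Vt = np.linalg.svd(A, full_matrices=False)
k = 6
Uk = U[:, :k]

docs_k = Uk.T @ A        # projected docs, one per column (k x n)
q_k = Uk.T @ q           # projected query, O(km)
# Cosine scores of all n docs in k dimensions, O(kn):
scores = (docs_k.T @ q_k) / (np.linalg.norm(docs_k, axis=0) * np.linalg.norm(q_k))
best = np.argsort(-scores)[:5]   # top-5 docs by cosine in concept space
```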

(16)

Random Projections

Paolo Ferragina

Dipartimento di Informatica Università di Pisa

Slides only!

(17)

An interesting math result

Lemma (Johnson–Lindenstrauss, ’82). Let P be a set of n distinct points in m dimensions. Given ε > 0, there exists a function f : P → R^k such that for every pair of points u, v in P it holds

(1 − ε) ||u − v||^2 ≤ ||f(u) − f(v)||^2 ≤ (1 + ε) ||u − v||^2

where k = O(ε^{-2} log n).

Such an f is called a JL-embedding. Setting v = 0 we also get a bound on f(u)’s stretching!

(18)

What about the cosine distance?

[Slide formulas: the bounds on f(u)’s and f(v)’s stretching are combined with the bound on ||f(u) − f(v)||^2, substituting the formula above for ||u − v||^2, to bound the projected dot product and hence the cosine distance.]
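A hedged reconstruction of that step, via the polarization identity and the Lemma’s bounds (the final constant is not from the slides):

2 f(u)·f(v) = ||f(u)||^2 + ||f(v)||^2 − ||f(u) − f(v)||^2
            ≥ (1 − ε)(||u||^2 + ||v||^2) − (1 + ε)||u − v||^2
            = 2 u·v − ε(||u||^2 + ||v||^2 + ||u − v||^2)

and symmetrically for the upper bound: for unit-norm u and v, dot products, and hence cosines, are preserved up to an additive O(ε).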

(19)

How to compute a JL-embedding?

Set the projection matrix P = (p_{i,j}) as a random m × k matrix whose components are independent random variables with

E[p_{i,j}] = 0 and Var[p_{i,j}] = 1

drawn from one of two zero-mean, unit-variance distributions (e.g., ±1 with probability 1/2 each, or the standard Gaussian N(0, 1)).
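A minimal numpy sketch of such an embedding; the Rademacher ±1 entries satisfy E = 0, Var = 1, the 1/√k scaling makes the expected squared lengths come out right, and the constant 4 in the choice of k is illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)
m, n, eps = 1000, 50, 0.2
k = int(4 * np.log(n) / eps**2)           # k = O(eps^-2 log n)

X = rng.random((n, m))                    # n points in m dimensions
P = rng.choice([-1.0, 1.0], size=(m, k))  # random m x k matrix, E=0, Var=1
f = lambda u: (u @ P) / np.sqrt(k)        # the JL map f : R^m -> R^k

u, v = X[0], X[1]
ratio = np.sum((f(u) - f(v))**2) / np.sum((u - v)**2)
print(ratio)   # close to 1, i.e., within [1 - eps, 1 + eps] w.h.p.
```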

(20)

Finally...

Random projections hide large constants (k ≈ (1/ε)^2 · log n, so k may be large…), but they are simple and fast to compute.

LSI is intuitive, may scale to any k, and is optimal under various metrics; but it is costly to compute, although good libraries exist.
