• Non ci sono risultati.

Estimating the Frequency of Data Items in Massive Distributed Streams

N/A
N/A
Protected

Condividi "Estimating the Frequency of Data Items in Massive Distributed Streams"

Copied!
8
0
0

Testo completo

(1)

Estimating the Frequency of Data Items in Massive Distributed Streams

(2)

(i)

u

0

h∈H

M10

j

j

j

v∈N

vk

v

0

1

0

0

0

1

1

(3)

1

2

0

1

1

2

1

2

3

1

j

2

4

1

j

j

5

i

6

i

7

1

8

j

9

j

j

10

j

j

11

j

12

j

j

0

0

0

j

13

j

j

14

0

1≤j≤r

ξj

j

j

j

21i

0

0

0

2

2

v(SS)

v

v(SS)

v

v(CM)

1

t

v(CM)

i

v(CM)

v(CM)

v

v

v

v

1

2

3

v

3

u

u

u

u

v

2

u

u

u

u

u

u

u

(4)

1

2

3

4

1

t

5

6

7

1

8

9

i

i

10

2

11

12

3

13

0

2

14

u

15

u(CM)

1≤i≤t

i

16

u(CM)

u(CM)

17

u(CM)

0

u

v

ε2k2m

v

i

i

i

u,i

i

0

u,i

0

v∈N

u

{hi(u)=hi(v)}

{A}

i

i

i

u,i

0

v∈N

v

{hi(u)=hi(v)}

0

0

u,i

u,i2

u,i

2

2

02

v∈N

v2

2{h

i(u)=hi(v)}

2

02

02

v∈N

v2

2

v

ε2k2m

v∈N

v2

ε22km2

0

u,i

02

2

2

2

0

u,i

2

2

u,i

u,i

u,i

u,i

2

u,i

2

u,i

u

u

u

i∈{1,...,t}

u,i

0

0

i∈{1,...,t}

u,i

u,i

u,i

t

u

u

mb

u

u

u

u

u

u

u

i

u,i

u,i

v∈N \{u}

v

{hi(v)=hi(u)}

u

u

i∈{1,...,t}

u,i

u,i

v∈N \{u}

v

{hi(v)=hi(u)}

u

u

u,i

2(b−1)

ε

u,i

u

u,i

u

u,i

u

u

u

u

i∈{1,...,t}

u,i

u

i∈{1,...,t}

u,i

u

t

algorithm combines the features of both the streaming model and

(5)

Input stream !(3)

Cloud

CASE C B

Node 3

Input stream !(1) Input stream !(6)

Node 6

Node 1 Node 2

Node 7 Node 4 Node 5

Input stream !(4) Input stream !(5)

CASE C B

CASE C B

CASE C B

CASE C B

CASE C B CASE

C B

Input stream !(2)

Input stream !(7) F0

F0 F0

F0

F0

F0 F0

i

(i)

i

(i)

i

j

i

i∈S

(i)

3

v

DHT

DHT

3

v

v(SS)

v

v

3

(i)

v(CM)

v

v(SS)

v(CM)

1

`

j

q

j

q

1

t

(6)

k

−1

−4

−1

−4

−1

−4

k

0

`

k

1

k

`−1

k

`−1

k−1

r=0

r

`−1

k

k

k

`

i=k

k

k

k

0

n

i=1

i

i

2

0

0

0

0

0

results obtained with the algorithm proposed in [4], providing a

(7)

0 500 1000 1500 2000 2500 3000 3500 4000 4500 5000

0 200 400 600 800 1000

Item frequency

Item identifier

Exact Distribution CM [Same precision]

CM [Same memory]

EV [Same precision]

CASE [F0 known]

CASE [F0 unknown]

(a) Uniform distribution

10 100 1000 10000 100000

0 200 400 600 800 1000

Item frequency

Item identifier

Exact Distribution CM [Same precision]

CM [Same memory]

EV [Same precision]

CASE [F0 known]

CASE [F0 unknown]

(b) Zipfian-like distribution – α = 1

0 500 1000 1500 2000 2500 3000 3500 4000 4500 5000

0 200 400 600 800 1000

Item frequency

Item identifier

Exact Distribution CM [Same precision]

CM [Same memory]

EV [Same precision]

CASE [F0 known]

CASE [F0 unknown]

(c) Truncated Poisson distribution

10 100 1000 10000 100000

0 200 400 600 800 1000

Item frequency

Item identifier

Exact Distribution CM [Same precision]

CM [Same memory]

EV [Same precision]

CASE [F0 known]

CASE [F0 unknown]

0

0

ing an algorithm that guarantees an relative (ε, δ)-approximation of

(8)

0 1x106 2x106 3x106 4x106 5x106 6x106 7x106 8x106

Distance with Exact distribution

Distribution CM [Same precision]

CM [Same memory]

CASE [F0 known]

CASE [F0 unknown]

(a) CM and CASE are fed with real datasets Setting: ε = 0.01 and δ = 0.1

1 10 100 1000 10000 100000

100 1000 10000

Distance with Exact distribution

Value of F0 CM [Same precision]

CM [Same memory]

CASE [F0 known]

CASE [F0 unknown]

(b) ED as a function of the number of distinct items Setting: m = 100, 000; ε = 0.1; δ = 0.1.

10 100 1000 10000

10000 100000 1x106

Distance with Exact distribution

Value of F1

CM [Same precision]

CM [Same memory]

CASE [F0 known]

CASE [F0 unknown]

0

= 1, 000; ε = 0.1; δ = 0.1.

1 10 100 1000 10000 100000

1 10 100

Distance with Exact distribution

Value of k

CM [Same precision]

CM [Same memory]

CASE [F0 known]

CASE [F0 unknown]

0

no. 1, pp. 126–137, 1996.

Riferimenti

Documenti correlati

During each iteration, the algorithm associates each column (resp. row) to the nearest column (resp. row) cluster which does not introduce any cannot-link violation. row) is involved

Le azioni intraprese hanno permesso in primo luogo una comprensione approfondita delle problematiche inerenti le dinamiche operative del Ce.Di; successivamente

The quality of the present data prevents us from firmly establishing if the observed spectral variations were due to a change in the ionization state of the circumnuclear medium

Studies included in the analysis relative to in situ L. monocytogenes inactivation in fermented sausages. Range a/a Reference No. monocytogenes strain Type of fermented sausage

mento e la scelta della ragione piutto- sto che dell’istinto, della semplicità piuttosto che dell’accumulo, della me- moria piuttosto che dell’oblio, della coerenza piuttosto

The results on the small-sized instances p01-p03, which feature up to 20 delivery and pick-up clients, show the effectiveness of the DP engine for the Pricing Problem, and in

A lymph node ratio of 0.26 or higher was identified as independent risk factor of lateral neck recurrence in a multicentric study on 211 patients with PTC who underwent

Literature Studies Wisdom Poured.. Out