• Non ci sono risultati.

Binary Merge-Sort

N/A
N/A
Protected

Academic year: 2021

Condividi "Binary Merge-Sort"

Copied!
8
0
0

Testo completo

(1)

Binary Merge-Sort

Merge-Sort(A,i,j) 01 if (i < j) then 02 m = (i+j)/2;

03 Merge-Sort(A,i,m);

04 Merge-Sort(A,m+1,j);

05 Merge(A,i,m,j) Merge-Sort(A,i,j)

01 if (i < j) then 02 m = (i+j)/2;

03 Merge-Sort(A,i,m);

04 Merge-Sort(A,m+1,j);

05 Merge(A,i,m,j)

Divide

Conquer

Combine

1 2 8 10 7 9 13 19 1 2 7

Merge is linear in the

#items to be merged

(2)

Few key observations

Items = (short) strings = atomic...

On english wikipedia, about 10

9

tokens to sort

(n log n) memory accesses (I/Os ??)

[5ms] * n log

2

n ≈ 3 years

In practice it is a “faster”, why?

(3)

Recursion

10 2

10 2

5 1

5 1

13 19 13 19

9 7

9 7

15 4

15 4

8 3

8 3

12 17 12 17

6 11

6 11 10 2 5 1 13 19 9 7 15 4 8 3 12 17 6 11 10 2 5 1 13 19 9 7 15 4 8 3 12 17 6 11

10 2 5 1 13 19 9 7 15 4 8 3 12 17 6 11

log

2

N

(4)

Implicit Caching…

10 2

2 10

5 1

1 5

13 19 13 19

9 7

7 9

15 4

4 15

8 3

3 8

12 17 12 17

6 11

6 11 1 2 5 10 7 9 13 19 3 4 8 15 6 11 12 17

1 2 5 7 9 10 13 19 3 4 6 8 11 12 15 17 1 2 3 4 5 6 7 8 9 10 11 12 13 15 17 19

log

2

N

N/M runs, each sorted in internal memory (no I/Os) M

2 passes (one Read/one Write) = 2 * (N/B) I/Os

— I/O-cost for binary merge-sort is ≈ 2 (N/B) log

2

(N/M) Lo g

2

( N /M )

2 passes (R/W)

2 passes (R/W)

(5)

B

A key inefficiency

1 2 4 7 9 10 13 19 3 5 6 8 11 12 15 17

B

After few steps, every run is longer than B !!!

B

We are using only 3 pages

But memory contains M/B pages ≈ 2

30

/2

15

= 2

15

B

Output

Buffer

Disk

1, 2, 3

1, 2, 3

Output Run

4, ...

(6)

Multi-way Merge-Sort

Sort N items with main-memory M and disk-pages B:

 Pass 1: Produce (N/M) sorted runs.

 Pass i: merge X = M/B-1 runs  log

X

N/M passes

Main memory buffers of B items

Pg for run1

Pg for run X

Out Pg

Disk Disk

Pg for run 2

. . . . .

.

. . .

(7)

How it works

1 2 5 10 7 9 13 19 1 2 5 7….

1 2 3 4 5 6 7 8 9 10 11 12 13 15 17 19

M

N/M runs, each sorted in internal memory = 2 (N/B) I/Os

2 passes (one Read/one Write) = 2 * (N/B) I/Os

— I/O-cost for X-way merge is ≈ 2 (N/B) I/Os per level Lo g

X

( N /M )

M

X

X

(8)

Cost of Multi-way Merge-Sort

Number of passes = log

X

N/M  log

M/B

(N/M)

Total I/O-cost is ( (N/B) log

M/B

N/M ) I/Os

Large fan-out (M/B) decreases #passes In practice

M/B ≈ 10

5

 #passes = 1  few mins

Tuning depends on disk features

Compression would decrease the cost of a pass!

Riferimenti

Documenti correlati

Un albero binario ` e una struttura di dati in cui c’` e un nodo radice ed ogni nodo ` e collegato a due nodi figli; i nodi in fondo alla gerarchia sono chiamati foglie. Uno heap

● Ogni volta che un elemento cambia padre, entra a far parte di un insieme con almeno il doppio di elementi di quello cui faceva parte. – Dopo k merge() fa parte di un insieme

1 Per ognuna delle n − 1 iterazione del ciclo esterno, può essere eseguito al massimo un confronto con esito false, cioè non seguito da uno scambio, dato che esso causa la

 A swap of &#34;far away&#34; elements may result in a duplicate key passing over to the left of a. preceding instance of the

change operator priority), it is possibile to build the corresponding tree according to the simplified grammar. &lt;exp&gt; = &lt;operand&gt; | &lt;exp&gt;

 Focusing of the task of sorting, the heap sort ordering algorithm, is implemented through 3 functions.  heapify

THEOREM 4.2 Permuting n items takes O(min{n, n B log M/B n/M}) I/Os in a two-level model in which the internal memory has size M and the disk page has size B. In what follows we

Supponendo che si parta con un mazzo già parzialmente ordinato, si prende la prossima carta e la si inserisce nel posto giusto tra le altre.. Inserita questa, si passa alla