• Non ci sono risultati.

On an average application 30% of CPU time is spent on sorting data

N/A
N/A
Protected

Academic year: 2021

Condividi "On an average application 30% of CPU time is spent on sorting data "

Copied!
16
0
0

Testo completo

(1)

Classification

Paolo Camurati and Stefano Quer

Dipartimento di Automatica e Informatica Politecnico di Torino

(2)

On the importance of sorting

On an average application 30% of CPU time is spent on sorting data

Example

 CPU scheduling

Processes pi with duration

p0 21

p1 3

p2 1

p3 2

Impact of sorting on average wait time

(3)

On the importance of sorting

 average wait time (0+21+24+25)/4 =17.5

 average wait time (0+3+5+26)/4 =8.5

 average wait time (0+1+3+6)/4 =2.5

0 3 5 26 27

p0

p1 p3 p2

0 3 6 27

p0 p1

p2p3

1

0 21 24 25 27

p1 p2 p3 p0

 p0-p1-p2-p3

 p1-p3-p0-p2

 p2-p3-p1-p0

Sorted

(4)

Sorting applications

Trivial applications

 Sorting a list of names, organizing an MP3 library, displaying Google PageRank results, etc.

Simple problems if data are sorted

 Find the median, binary search in a database, find duplicates in a mailing list, etc.

Non trivial applications

 Data compression, computer graphics (e.g., convex hull), computational biology, etc.

(5)

Definitions

Sorting

 Input

Symbols belonging to a set having an order relation

a1, a2, …, an

 Output

Permutation of the input symbols

a1’, a2’, …, an

Such that the order relation

a1 a2 an

holds

(6)

Definitions

Order relation

 Binary relation between elements of a set A satisfying the following properties

Reflexivity

x A  x x

Antisymmetry

x, y A  x y y x  x = y

Transitivity

x, y, z A  x y y z  x z

A is a partially ordered set (poset)

If relation

holds

x, y

A, A is totally ordered

set

(7)

Classification

Internal sorting

 Data are in main memory

 Direct access to elements

External sorting

 Data are on mass memory

Completely or at least partially

 Sequential access to elements

(8)

Classification

In place sorting

 n data in array plus a constant number of auxiliary memory locations

Stable sorting

 For data with duplicated keys the relative ordering is unchanged

 Example

Record with 2 keys

Name (key is a string)

Group (key is an integer)

(9)

Stability: An example

Chiara 3 Barbara 4 Andrea 3 Roberto 2 Giada 4 Franco 1 Lucia 3 Fabio 3

Unsorted data

Second sorting according to group

NON stable algorithm

Second sorting according to

group

stable algorithm First sorting

according to string name

Andrea 3 Barbara 4 Chiara 3 Fabio 3 Franco 1 Giada 4 Lucia 3 Roberto 2

Franco 1 Roberto 2 Chiara 3 Fabio 3 Andrea 3 Lucia 3 Giada 4 Barbara 4

Franco 1 Roberto 2 Andrea 3 Chiara 3 Fabio 3 Lucia 3 Barbara 4 Giada 4

(10)

Classification

Complexity

 Complexity can be computed in terms of overall number of steps, each one with a constant cost

 A more detailed analysis is also possible, computing the total number of

Comparisons

Exchange operations

 When data is large, exchanging them may be expensive and may be better to have more comparisons and less exchange operations

 Asymptotic complexity however does not change

(11)

Classification

 O(n2)

Simple, iterative, based on comparison

Insertion sort, Selection sort, Exchange (Bubble) sort

 O(n3/2)

Shellsort (with certain sequences)

 O(n · log n)

More complex, recursive, based on comparison

Merge sort, Quicksort, Heapsort

 O(n)

Applicable with restrictions on data, based on computation

Counting sort, Radix sort, Bin/Bucket sort

(12)

A lower bound for the complexity

Algorithms

based on comparison

 Elementary operation

Comparison

ai : aj

 Outcome

Decision

ai>aj or aiaj

Decisions organized as a decision tree

(13)

 Sort array of 3 distinct elements a

1

, a

2

, a

3

a1:a2

a2:a3 a1:a3

a1<a2<a3 a1:a3 a2:a3

a3<a1<a2 a3<a2<a1

a2<a1<a3

a1<a3<a2 a2<a3<a1

>

>

>

>

>

<

<

<

<

<

A lower bound for the complexity

n = 3

for the sake of simplicity

(14)

 For n distinct integers

The number of possible sortings equals the number of permutations, i.e., is n!

 Each solution

Sits on a tree leaf

 Complexity

Number h of comparisons, that is, the tree height h

 For a complete tree

The number of leaves is 2h

a1:a2

a2:a3 a1:a3

a1<a2<a3 a1:a3 a2:a3

a3<a1<a2 a3<a2<a1 a2<a1<a3

a1<a3<a2 a2<a3<a1

>

>

>

>

>

<

<

<

<

<

A lower bound for the complexity

(15)

 Then we must have

2h n!

The tree has to include (generate) all possibile sorting

on the leaves

a1:a2

a2:a3 a1:a3

a1<a2<a3 a1:a3 a2:a3

a3<a1<a2 a3<a2<a1 a2<a1<a3

a1<a3<a2 a2<a3<a1

>

>

>

>

>

<

<

<

<

<

A lower bound for the complexity

(16)

 Then we have

2h n!

2h n! > (n/e)n

2h > (n/e)n

lg 2h >lg (n/e)n

h > n · lg (n/e)

h > n (lg n - lg e) = (n lg n)

a1:a2

a2:a3 a1:a3

a1<a2<a3 a1:a3 a2:a3

a3<a1<a2 a3<a2<a1 a2<a1<a3

a1<a3<a2 a2<a3<a1

>

>

>

>

>

<

<

<

<

<

Stirling’s approximation

n! > (n/e)n

We compute the log of both members

A lower bound for the complexity

Log property

Riferimenti

Documenti correlati

This paper aims to investigate the design managerial practices able to support Technology Epiphany, which are defined as the discovery of quiescent meanings in new or

suo, che da francofilo e pacifista animatore delle discussioni di Sohlberg si tramuta nel 1931 in nazista, diventando, fra le altre cose, diplomatico della Germania hitleriana

The cultural and methodological hegemony that draws it strength from this production does pose a specific problem for social scientists, especially in the field of the sociology

Through the adaptation to the Italian context of a brief scale deputed to measure well-being at school, the SRW, and its transformation using the Rasch analysis, our study aimed

In addition, these participants rate the job candidates on statements central to four theoretical mechanisms often related to the scarring effect of unemployment: general

In particular, when data are characterised by a low internal homogeneity (as in the Large businesses case) it is more difficult to build any type of tree. In general,

An exploratory analysis on the impact of guaranteed minimum income has been carried out after the approval and before the application using social media data. Twitter elements