Classification
Paolo Camurati and Stefano Quer
Dipartimento di Automatica e Informatica Politecnico di Torino
On the importance of sorting
On an average application 30% of CPU time is spent on sorting data
Example
CPU scheduling
Processes pi with duration
● p0 21
● p1 3
● p2 1
● p3 2
Impact of sorting on average wait time
On the importance of sorting
average wait time (0+21+24+25)/4 =17.5
average wait time (0+3+5+26)/4 =8.5
average wait time (0+1+3+6)/4 =2.5
0 3 5 26 27
p0
p1 p3 p2
0 3 6 27
p0 p1
p2p3
1
0 21 24 25 27
p1 p2 p3 p0
p0-p1-p2-p3
p1-p3-p0-p2
p2-p3-p1-p0
Sorted
Sorting applications
Trivial applications
Sorting a list of names, organizing an MP3 library, displaying Google PageRank results, etc.
Simple problems if data are sorted
Find the median, binary search in a database, find duplicates in a mailing list, etc.
Non trivial applications
Data compression, computer graphics (e.g., convex hull), computational biology, etc.
Definitions
Sorting
Input
Symbols belonging to a set having an order relation
a1, a2, …, an
Output
Permutation of the input symbols
● a1’, a2’, …, an’
Such that the order relation
● a1’ ≤ a2’ ≤ …≤ an’
holds
Definitions
Order relation
≤ Binary relation between elements of a set A satisfying the following properties
Reflexivity
● ∀ x ∈ A x ≤ x
Antisymmetry
● ∀ x, y ∈ A x ≤ y ∧ y ≤ x x = y
Transitivity
● ∀ x, y, z ∈ A x ≤ y ∧ y ≤ z x ≤ z
A is a partially ordered set (poset)
If relation
≤holds
∀x, y
∈A, A is totally ordered
set
Classification
Internal sorting
Data are in main memory
Direct access to elements
External sorting
Data are on mass memory
Completely or at least partially
Sequential access to elements
Classification
In place sorting
n data in array plus a constant number of auxiliary memory locations
Stable sorting
For data with duplicated keys the relative ordering is unchanged
Example
Record with 2 keys
● Name (key is a string)
● Group (key is an integer)
Stability: An example
Chiara 3 Barbara 4 Andrea 3 Roberto 2 Giada 4 Franco 1 Lucia 3 Fabio 3
Unsorted data
Second sorting according to group
NON stable algorithm
Second sorting according to
group
stable algorithm First sorting
according to string name
Andrea 3 Barbara 4 Chiara 3 Fabio 3 Franco 1 Giada 4 Lucia 3 Roberto 2
Franco 1 Roberto 2 Chiara 3 Fabio 3 Andrea 3 Lucia 3 Giada 4 Barbara 4
Franco 1 Roberto 2 Andrea 3 Chiara 3 Fabio 3 Lucia 3 Barbara 4 Giada 4
Classification
Complexity
Complexity can be computed in terms of overall number of steps, each one with a constant cost
A more detailed analysis is also possible, computing the total number of
Comparisons
Exchange operations
When data is large, exchanging them may be expensive and may be better to have more comparisons and less exchange operations
Asymptotic complexity however does not change
Classification
O(n2)
Simple, iterative, based on comparison
Insertion sort, Selection sort, Exchange (Bubble) sort
O(n3/2)
Shellsort (with certain sequences)
O(n · log n)
More complex, recursive, based on comparison
Merge sort, Quicksort, Heapsort
O(n)
Applicable with restrictions on data, based on computation
Counting sort, Radix sort, Bin/Bucket sort
A lower bound for the complexity
Algorithms
based on comparison Elementary operation
Comparison
● ai : aj
Outcome
Decision
● ai>aj or ai≤aj
Decisions organized as a decision tree
Sort array of 3 distinct elements a
1, a
2, a
3a1:a2
a2:a3 a1:a3
a1<a2<a3 a1:a3 a2:a3
a3<a1<a2 a3<a2<a1
a2<a1<a3
a1<a3<a2 a2<a3<a1
>
>
>
>
>
<
<
<
<
<
A lower bound for the complexity
n = 3
for the sake of simplicity
For n distinct integers
The number of possible sortings equals the number of permutations, i.e., is n!
Each solution
Sits on a tree leaf
Complexity
Number h of comparisons, that is, the tree height h
For a complete tree
The number of leaves is 2h
a1:a2
a2:a3 a1:a3
a1<a2<a3 a1:a3 a2:a3
a3<a1<a2 a3<a2<a1 a2<a1<a3
a1<a3<a2 a2<a3<a1
>
>
>
>
>
<
<
<
<
<
A lower bound for the complexity
Then we must have
2h ≥ n!
The tree has to include (generate) all possibile sorting
on the leaves
a1:a2
a2:a3 a1:a3
a1<a2<a3 a1:a3 a2:a3
a3<a1<a2 a3<a2<a1 a2<a1<a3
a1<a3<a2 a2<a3<a1
>
>
>
>
>
<
<
<
<
<
A lower bound for the complexity
Then we have
2h ≥ n!
2h ≥ n! > (n/e)n
2h > (n/e)n
lg 2h >lg (n/e)n
h > n · lg (n/e)
h > n (lg n - lg e) = Ω(n lg n)
a1:a2
a2:a3 a1:a3
a1<a2<a3 a1:a3 a2:a3
a3<a1<a2 a3<a2<a1 a2<a1<a3
a1<a3<a2 a2<a3<a1
>
>
>
>
>
<
<
<
<
<
Stirling’s approximation
n! > (n/e)n
We compute the log of both members
A lower bound for the complexity
Log property