Loading...
Loading...
Loading...
Loading...
Loading...
Loading...
Loading...
Bucket sort, radix sort.
Unlike comparison-based algorithms in Sections 3.2 and 3.3, distribution-based sorting algorithms do not make comparisons between keys in the input array; instead, they distribute keys into ordered buckets and merge keys in the buckets.
Let us create an abstract class, AbstractBucketSort
inheriting AbstractSort
:
L2
: declares a list of buckets where each bucket is represented by a Deque
.
L6
: does not pass any comparator to AbstractSort
since no comparison is needed.
L7
: creates a pre-defined number of buckets.
What kind of input keys can work with distribution-based sorting algorithms?
We then define a helper method sort()
:
L7
: bucketIndex
is a function that takes a comparable key and returns the appropriate bucket index of the key.
L9-10
: adds the keys within the range in the input array to the buckets.
L13-16
: puts all keys in the buckets back to the input array while keeping the order.
How many assignments are made by the
sort()
method inBucketSort
?
Let us create the IntegerBucketSort
class inheriting BucketSort
that sorts integers within a specific range in ascending order:
L2
: takes the smallest integer in the range.
L8
: passes the number of buckets, max - min
, to be initialized to the super constructor.\
We then override the sort()
method:
L3
: passes a lambda expression that takes key
and returns key - MIN
indicating the index of the bucket that key
should be assigned to.
Is it possible to implement
DoubleBucketSort
that can handle double instead of integer keys?
Let us create an abstract class, RadixSort
, inheriting BucketSort<Integer>
:
L3
: creates 10
buckets to sort any range of integers.
L6
: returns the order of the most significant digit in the input array.
Let us create the LSDRadixSort
class inheriting RadixSort
that sorts the integer keys from the least significant digit (LSD) to the most significant digit (MSD):
L5
: iterates from LSD to MSD.
The followings compare runtime speeds between advanced sorting algorithms where the input arrays consist of random integer keys:
Selection sort, heap sort, insertion sort, shell sort.
Selection-based sorting algorithms take the following steps:
For each key where and :
Search the maximum key where .
Swap and
The complexities differ by different search algorithms:
Algorithm
Search
Compare
Swap
Selection Sort
Heap Sort
Selection Sort uses linear search to find the minimum (or maximum) key, whereas Heap Sort turns the input array into a heap, so the search complexity becomes instead of .
Can the search be faster than ?
Let us create the SelectionSort
class inheriting AbstractSort
:
Let us then override the sort()
method:
How does the
sort()
method work withComparator.reverseOrder()
?
Let us create the HeapSort
class inheriting AbstractSort
:
Before we override the sort()
method, let us define the following helper methods:
L1-7
: finds the right position of the k
'th key by using the sink operation.
L9-11
: finds the parent index of the k
'th key given the beginning index.
L13-15
: finds the left child index of the k
'th key given the beginning index.
How is this
sink()
method different from the one in theBinaryHeap
class?
Finally, we override the sort()
method:
What is the worst-case scenario for Selection Sort and Heap Sort?
Insertion-based sorting algorithms take the following steps:
The complexities differ by different sequences:
Algorithm
Sequence
Compare
Swap
Insertion Sort
Adjacent
Shell Sort
Knuth
Let us create the InsertionSort
class inheriting AbstractSort
:
Let us then define an auxiliary method, sort()
:
L6
: swaps the two keys.
Given the auxiliary method, the sort()
method can be defined as follows where h = 1
:
How many swaps does Insertion Sort make for the following array?
[7, 1, 2, 3, 4, 5, 6, 14, 8, 9, 10, 11, 12, 13, 0]
Shell Sort groups keys whose gap is defined by a particular gap sequence:
For the above example, by using the Hibbard sequence, it first groups keys whose gap is 7:[7, 14, 0], [1, 8], [2, 9], [3, 10], [4, 11], [5, 12], [6, 13]
It then sorts each group using Insertion Sort, which results in the following array:
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14]
The above procedure is repeated for gaps 3 and 1; however, the array is already sorted, so no more swapping is necessary.
Let us create an abstract class, ShellSort
, inheriting AbstractSort
:
L2
: stores a particular gap sequence.
L7
: pre-populate the gap sequence that can handle the input size up to 10000
.
Then, let us define two abstract methods, populateSequence()
and getSequenceStartIndex()
:
L5
: populates a particular sequence for the input size n
.
L11
: returns the index of the first gap to be used given the input size n
.
Let us then override the sort()
method:
L4
: should not re-populate the sequence unless it has to.
L7
: sorts the gap group by using the auxiliary method.
Let us create the ShellSortKnuth
class that populates the Knuth Sequence for ShellSort
:
The two abstract methods can be overridden as follows:
The codes for unit testing and benchmarking are available in SortTest
:
This chapter discusses different types of sorting algorithms and benchmarks their performance in terms of comparisons and assignments.
Comparison-based Algorithms
Divide and Conquer Algorithms
Distribution-based Algorithms
The abstract class for all sorting algorithms.
Let us create an abstract class :
L2-4
: member fields:
comparator
: specifies the precedence of comparable keys.
comparisons
: counts the number of comparisons performed during sorting.
assignments
: counts the number of assignments performed during sorting.
The two-member fields, comparisons
and assignments
, are used to micro-benchmark sorting algorithms inheriting AbstractSort
.
Let us define three helper methods, compareTo()
, assign()
, and swap()
:
L7
: compares two keys in the array and increments the count.
L18
: assigns the value to the specific position in the array and increments the count.
L29
: swaps the values of the specific positions in the array by calling the assign()
method.
These helper methods allow us to analyze the exact counts of those operations performed by different sorting algorithms.
How is it different to benchmark runtime speeds from counting these two operations?
Finally, let us define the default sort()
that calls an overwritten abstract method, sort()
:
L5
: calls the abstract method sort()
that overwrites it.
L15
: sorts the array
in the range of (beginIndex
, endIndex
) as specified in comparator
.
When would it be useful to sort a specific range in the input array?
Quiz 3: Sorting Algorithms
Create the class under the package that inherits .
Override the populateSequence()
and getSequenceStartIndex()
methods such that it performs Shell Sort by using the Hibbard sequence: .
Feel free to use the code in .
Create the class under the package that inherits .
Override the sort()
method such that it performs Radix Sort from the most significant digit (MSD) to the least significant digit (LSD).
Feel free to use the code in .
Create the class under the test package.
Test the correctness of your TernaryHeapQuiz
using the method.
Add more tests for a more thorough assessment if necessary.
Compare runtime speeds between LSDRadixSort
and RadixSortQuiz
for random cases.
Create a PDF file quiz3.pdf and write a report that includes charts and explanations to compare runtime speeds between:
ShellSortKnuth
and ShellSortQuiz
.
LSDRadixSort
and RadixSortQuiz
.
Push everything under the sort package to your GitHub repository:
Submit quiz3.pdf to Canvas.
Merge sort, quick sort, intro sort
Divide the problem into sub-problems (recursively).
Conquer sub-problems, which effectively solve the super problem.
Why do people ever want to use QuickSort?
Divide the input array into two sub-arrays.
Sort each of the sub-arrays and merge them into the back.
Let us create the class inheriting AbstractSort
:
L2
: holds the copy of the input array.
Let us then override the sort()
method that calls the helper method:
L4-5
: increases the size of the temp
array if necessary.
L5
: unchecked type Object
to T[]
.
What is the advantage of declaring the member field
temp
?
The helper method can be defined as follows:
L11
: sorts the left sub-array.
L12
: sorts the right sub-array.
L13
: merges the left and right sub-arrays.
Finally, the merge()
method can be defined as follows:
L10-11
: copies the input array to the temporary array and counts the assignments.
L14-15
: no key left in the 1st half.
L16-17
: no key left in the 2nd half.
L18-19
: the 2nd key is greater than the 1st key.
L20-21
: the 1st key is greater than or equal to the 2nd key.
Pick a pivot key in the input array.
Exchange keys between the left and right partitions such that all keys in the left and right partitions are smaller or bigger than the pivot key, respectively.
Repeat this procedure in each partition, recursively.
Let us then override the sort()
method:
L3
: stops when the pointers are crossed.
L6
: sorts the left partition.
L7
: sorts the right partition.
The partition()
method can be defined as follows:
L5
: finds the left pointer where endIndex
> fst
> pivot
.
L6
: finds the right pointer where beginIndex
< snd
< pivot
.
L7
: the left and right pointers are crossed.
L8
: swaps between keys in the left and right partitions.
L11
: swaps the keys in the beginIndex
and pivot
.
The programming design in L5
and L6
are not ideal since the while loops do not include anybody that can confuse other programmers.
How can we determine if Quick Sort is meeting the worse-case?
Let us define the IntroSoft
class inheriting QuickSort
:
L2
: declares a sorting algorithm to handle the worst cases.
L14
: resets the counters for both the main and auxiliary sorting algorithms.
We then override the sort() method by passing the maximum depth to the auxiliary method:
Finally, the auxiliary method sort()
can be defined as follows:
L4-5
: switches to the other sorting algorithm if the depth of the partitioning exceeds the max depth.
The following shows runtime speeds, assignment costs, and comparison costs between several sorting algorithms for the random, best, and worst cases.
This section explains the homework assignment regarding Hybrid Sort.
Your task is to write a program that sorts a 2D-array of comparable keys into an 1D-array where all the keys are sorted in ascending order. Here is a sample input to your program:
Here is the expected output given the sample input:
Each row in the input array has one of the following properties:
Sorted in ascending order (e.g., the 1st row).
Sorted in descending order (e.g., the 2nd row).
Mostly sorted in ascending order (e.g., the 3rd row).
Mostly sorted in descending order (e.g., the 4th row).
Randomly distributed (e.g., the 5th row).
The easiest way is to merge all rows in the input array into an 1D-array then sort it using . The implementation of this strategy is provided to establish the baseline: . Your goal is to design a sorting mechanism that gives a noticeable speed-up over this baseline.
Try different input arrays; you are responsible for the robustness of your program.
Try different configurations for speed comparison (e.g., row size, column size, shuffle ratio).
Write the report hw1.pdf describing the logic behind your mechanism and the speed comparison against the baseline.
Submit hw1.pdf to Canvas.
You are not allowed to:
Use any implementation that is not written by you nor provided by this course.
Call any sorting methods built-in Java or 3rd-party libraries.
Merge all rows first and apply a fancy sorting algorithm to the 1D array as the baseline does.
The input matrix can:
Contain an arbitrary number of rows, where the size of each row is also arbitrary.
Baseline: 20697
Cannot run: 7
Extremely slow: 6
Code not found: 1
L7
: the lambda expression returns the bucket index, the value in the 'th significant digit.
L3-12
:
L3
: iterates all keys within the range .
L4-9
: finds the index of the maximum key within the range .
L11
: swaps the maximum key with the last key in the range .
L4-5
: turns the input array into a heap :
L4
: iterates from the parent of the key in the ending index .
L5
: sinks the key .
L8-11
: selection sort :
L8
: iterates all keys within the range .
L9
: swaps the maximum key with the beginning key in the range .
L10
: sinks to heapify .
For each key where and :
Keep swapping and until .
L4
: iterates keys in the input array .
L5
: compares keys in the sublist of the input array .
Knuth:
Hibbard:
Pratt:
Shell: , where
L6
: iterates the sequence where is the number of gaps in the sequence.
L2
: populates the Knuth sequence up to the gap .
L13
: returns the index of the first key .
Why should we use as the largest gap in the sequence?
Compare runtime speeds between ShellSortKnuth
and ShellSortQuiz
for random, ascending, and descending cases using the method.
Main:
Test:
How many assignments are made for the number of keys by the above version of MergeSort?
Let us create the class inheriting AbstractSort
:
Although Quick Sort is the fastest on average, the worst-case complexity is .
sorting algorithms that guarantee faster worst-case complexities than Quick Sort: Quick Sort for random cases and a different algorithm for the worst case.
L9
: returns as the maximum depth.
Does the max-depth need to be set to ?
Create a class inheriting under the package .
Override the sort()
method and evaluate your program for robustness and runtime speeds using :
Commit and push everything under the package.
Include any type of the keys that are .
Complexity
MergeSort
TimSort
QuickSort
IntroSort
Best
Worst
Average