3.2. Comparison-based Sort

Selection sort, heap sort, insertion sort, shell sort.

Selection-based Sort

Selection-based sorting algorithms take the following steps:

For each key $A_i$ where $|A| = n$ and $i \in [n, 0)$ :
- Search the maximum key $A_m$ where $m \in [1, i)$ .
- Swap $A_i$ and $A_m$

The complexities differ by different search algorithms:

Algorithm

Compare

Swap

Selection Sort

$O(n)$

$O(n^2)$

$O(n)$

Heap Sort

$O(\log n)$

$O(n\log n)$

$O(n \log n)$

Selection Sort uses linear search to find the minimum (or maximum) key, whereas Heap Sort turns the input array into a heap, so the search complexity becomes $O(\log n)$ instead of $O(n)$ .

Can the search be faster than $O(\log n)$ ?

Selection Sort

Let us create the SelectionSort class inheriting AbstractSort:

public class SelectionSort<T extends Comparable<T>> extends AbstractSort<T> {
    public SelectionSort() {
        this(Comparator.naturalOrder());
    }

    public SelectionSort(Comparator<T> comparator) {
        super(comparator);
    }
}

Let us then override the sort() method:

@Override
public void sort(T[] array, final int beginIndex, final int endIndex) {
    for (int i = endIndex; i > beginIndex; i--) {
        int max = beginIndex;

        for (int j = beginIndex + 1; j < i; j++) {
            if (compareTo(array, j, max) > 0)
                max = j;
        }

        swap(array, max, i - 1);
    }
}

L3-12: $O(n^2)$
- L3: iterates all keys within the range $\rightarrow O(n)$ .
- L4-9: finds the index of the maximum key within the range $\rightarrow O(n)$ .
- L11: swaps the maximum key with the last key in the range $\rightarrow O(1)$ .

How does the sort() method work with Comparator.reverseOrder()?

Heap Sort

Let us create the HeapSort class inheriting AbstractSort:

public class HeapSort<T extends Comparable<T>> extends AbstractSort<T> {
    public HeapSort() {
        this(Comparator.naturalOrder());
    }

    public HeapSort(Comparator<T> comparator) {
        super(comparator);
    }
}

Before we override the sort() method, let us define the following helper methods:

private void sink(T[] array, int k, int beginIndex, int endIndex) {
    for (int i = getLeftChildIndex(beginIndex, k); i < endIndex; k = i, i = getLeftChildIndex(beginIndex, k)) {
        if (i + 1 < endIndex && compareTo(array, i, i + 1) < 0) i++;
        if (compareTo(array, k, i) >= 0) break;
        swap(array, k, i);
    }
}

private int getParentIndex(int beginIndex, int k) {
    return beginIndex + (k - beginIndex - 1) / 2;
}

private int getLeftChildIndex(int beginIndex, int k) {
    return beginIndex + 2 * (k - beginIndex) + 1;
}

L1-7: finds the right position of the k'th key by using the sink operation.
L9-11: finds the parent index of the k'th key given the beginning index.
L13-15: finds the left child index of the k'th key given the beginning index.

How is this sink() method different from the one in the BinaryHeap class?

Finally, we override the sort() method:

@Override
public void sort(T[] array, int beginIndex, int endIndex) {
    // heapify all elements
    for (int k = getParentIndex(beginIndex, endIndex - 1); k >= beginIndex; k--)
        sink(array, k, beginIndex, endIndex);

    // swap the endIndex element with the root element and sink it
    while (endIndex > beginIndex + 1) {
        swap(array, beginIndex, --endIndex);
        sink(array, beginIndex, beginIndex, endIndex);
    }
}

L4-5: turns the input array into a heap $\rightarrow O(n \log n)$ :
- L4: iterates from the parent of the key in the ending index $\rightarrow O(n)$ .
- L5: sinks the key $\rightarrow O(\log n)$ .
L8-11: selection sort $\rightarrow O(n \log n)$ :
- L8: iterates all keys within the range $\rightarrow O(n)$ .
- L9: swaps the maximum key with the beginning key in the range $\rightarrow O(1)$ .
- L10: sinks to heapify $\rightarrow O(\log n)$ .

What is the worst-case scenario for Selection Sort and Heap Sort?

Insertion-based Sort

Insertion-based sorting algorithms take the following steps:

For each key $A_i$ where $|A| = n$ and $i \in [1, n)$ :
- Keep swapping $A_j$ and $A_i$ until $A_j \leq A_i$ .

The complexities differ by different sequences:

Algorithm

Sequence

Compare

Swap

Insertion Sort

Adjacent

$O(n^2)$

Shell Sort

Knuth

$O(n^{1.5})$

Insertion Sort

Let us create the InsertionSort class inheriting AbstractSort:

public class InsertionSort<T extends Comparable<T>> extends AbstractSort<T> {
    public InsertionSort() {
        this(Comparator.naturalOrder());
    }

    public InsertionSort(Comparator<T> comparator) {
        super(comparator);
    }
}

Let us then define an auxiliary method, sort():

protected void sort(T[] array, int beginIndex, int endIndex, final int h) {
    int begin_h = beginIndex + h;

    for (int i = begin_h; i < endIndex; i++)
        for (int j = i; begin_h <= j && compareTo(array, j, j - h) < 0; j -= h)
            swap(array, j, j - h);
}

L4: iterates keys in the input array $\rightarrow O(n)$ .
L5: compares keys in the sublist of the input array $\rightarrow O(\frac{n}{h})$ .
L6: swaps the two keys.

Given the auxiliary method, the sort() method can be defined as follows where h = 1:

@Override
public void sort(T[] array, int beginIndex, int endIndex) {
    sort(array, beginIndex, endIndex, 1);
}

How many swaps does Insertion Sort make for the following array? [7, 1, 2, 3, 4, 5, 6, 14, 8, 9, 10, 11, 12, 13, 0]

Shell Sort

Gap Sequence

Shell Sort groups keys whose gap is defined by a particular gap sequence:

Knuth: $(3^k - 1) / 2 \Rightarrow \{1, 4, 13, 40, 121, \ldots\}$
Hibbard: $2^k - 1 \Rightarrow \{1, 3, 7, 15, 31, 63, \ldots\}$
Pratt: $2^p \cdot 3^q \Rightarrow \{1, 2, 3, 4, 6, 8, 9, 12, \ldots\}$
Shell: $n / 2^k \Rightarrow \{500, 250, 125, \ldots\}$ , where $n = 1000$

For the above example, by using the Hibbard sequence, it first groups keys whose gap is 7:[7, 14, 0], [1, 8], [2, 9], [3, 10], [4, 11], [5, 12], [6, 13]

It then sorts each group using Insertion Sort, which results in the following array: [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14]

The above procedure is repeated for gaps 3 and 1; however, the array is already sorted, so no more swapping is necessary.

Implementation

Let us create an abstract class, ShellSort, inheriting AbstractSort:

public abstract class ShellSort<T extends Comparable<T>> extends InsertionSort<T> {
    protected List<Integer> sequence;

    public ShellSort(Comparator<T> comparator) {
        super(comparator);
        sequence = new ArrayList<>();
        populateSequence(10000);
    }
}

L2: stores a particular gap sequence.
L7: pre-populate the gap sequence that can handle the input size up to 10000.

Then, let us define two abstract methods, populateSequence() and getSequenceStartIndex():

/**
 * Populates the gap sequence with respect to the size of the list.
 * @param n the size of the list to be sorted.
 */
protected abstract void populateSequence(int n);

/**
 * @param n the size of the list to be sorted.
 * @return the starting index of the sequence with respect to the size of the list.
 */
protected abstract int getSequenceStartIndex(int n);

L5: populates a particular sequence for the input size n.
L11: returns the index of the first gap to be used given the input size n.

Let us then override the sort() method:

@Override
 public void sort(T[] array, int beginIndex, int endIndex) {
     int n = endIndex - beginIndex;
     populateSequence(n);

     for (int i = getSequenceStartIndex(n); i >= 0; i--)
         sort(array, beginIndex, endIndex, sequence.get(i));
 }

L4: should not re-populate the sequence unless it has to.
L6: iterates the sequence $\rightarrow O(s)$ where $s$ is the number of gaps in the sequence.
L7: sorts the gap group by using the auxiliary method.

Knuth Sequence

Let us create the ShellSortKnuth class that populates the Knuth Sequence for ShellSort:

public class ShellSortKnuth<T extends Comparable<T>> extends ShellSort<T> {
    public ShellSortKnuth() {
        this(Comparator.naturalOrder());
    }

    public ShellSortKnuth(Comparator<T> comparator) {
        super(comparator);
    }
}

The two abstract methods can be overridden as follows:

@Override
protected void populateSequence(int n) {
    n /= 3;

    for (int t = sequence.size() + 1; ; t++) {
        int h = (int) ((Math.pow(3, t) - 1) / 2);
        if (h <= n) sequence.add(h);
        else break;
    }
}

@Override
protected int getSequenceStartIndex(int n) {
    int index = Collections.binarySearch(sequence, n / 3);
    if (index < 0) index = -(index + 1);
    if (index == sequence.size()) index--;
    return index;
}

L2: populates the Knuth sequence up to the gap $\leq \frac{n}{3}$ .
L13: returns the index of the first key $\leq \frac{n}{3}$ .

Why should we use $\frac{n}{3}$ as the largest gap in the sequence?

Demonstration

Unit Tests & Benchmarks

The codes for unit testing and benchmarking are available in SortTest:

Previous3.1. Abstraction Next3.3. Divide & Conquer Sort

Last updated 1 year ago