Something about QuickSort & QuickSelect

The QuickSort algorithm was the very first algorithm that impressed me when I was dipping my toes into computer science. I still remember reading "Introduction to Algorithms": it was the first algorithm that made me stop and take some time to work through the logic. Now that I have learned a lot more about algorithms, I want to revisit it and think about what can be done with it.

The algorithm was developed by Tony Hoare in 1959, and it is also called partition-exchange sort. Although it is a comparison sort like MergeSort and HeapSort, it can be two or three times faster than either when implemented well. Despite its O(n^2) worst-case performance, that case is very unlikely in practice, and the algorithm has an average time complexity of O(n log n). The logic of the algorithm is not that complicated:

  1. Pick a pivot.
  2. Partition the array around the pivot.
  3. Recurse on the subarrays on each side of the pivot.

In the early version of the algorithm, the pivot was chosen as the leftmost element, which leads to the worst case when the array is already sorted. The pivot can also be chosen as a random element of the array, which avoids that problem on average but is still not an optimal strategy. Sedgewick suggested a "median-of-three" rule: pick the median of the first, middle, and last elements as the pivot. The rule is an improvement because it uses information about the current state of the array, and it counters the sorted-input dilemma. A further refinement of this idea, choosing the pivot from a larger sample of elements, is known as samplesort.
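
As a rough sketch in Java, the median-of-three rule can look like this (the medianOfThree and swap helpers are illustrative names of my own, not from any standard library):

    // Order a[lo], a[mid], a[hi] so that the median of the three ends up
    // at a[mid]; that index can then be used as the pivot.
    static int medianOfThree(int[] a, int lo, int hi) {
        int mid = lo + (hi - lo) / 2;
        if (a[mid] < a[lo]) swap(a, lo, mid);
        if (a[hi] < a[lo]) swap(a, lo, hi);
        if (a[hi] < a[mid]) swap(a, mid, hi);
        return mid; // index of the median of the three
    }

    static void swap(int[] a, int i, int j) {
        int t = a[i]; a[i] = a[j]; a[j] = t;
    }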

The logic of quick sort, written out in code:

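(This is a minimal Java sketch using the Lomuto partition scheme discussed below; the quickSort and partition method names are illustrative.)

    // Sort a[lo..hi] in place by partitioning around a pivot and
    // recursing on the two sides.
    static void quickSort(int[] a, int lo, int hi) {
        if (lo >= hi) return;
        int p = partition(a, lo, hi);
        quickSort(a, lo, p - 1);
        quickSort(a, p + 1, hi);
    }

    // Lomuto partition, with the last element as the pivot.
    // i bounds the region of elements smaller than the pivot;
    // j scans through the whole range.
    static int partition(int[] a, int lo, int hi) {
        int pivot = a[hi];
        int i = lo - 1; // starts just left of the first element
        for (int j = lo; j < hi; j++) {
            if (a[j] < pivot) {
                i++; // grow the "smaller than pivot" region by one
                int t = a[i]; a[i] = a[j]; a[j] = t;
            }
        }
        int t = a[i + 1]; a[i + 1] = a[hi]; a[hi] = t; // pivot into place
        return i + 1;
    }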

The main method follows divide and conquer, and the partition method uses two pointers. Pointer i marks a boundary such that all elements to its left are smaller than the pivot, while pointer j scans through every element in the range. A swap happens whenever pointer j finds an element smaller than the pivot, which must then be absorbed into pointer i's region. Pointer i starts just to the left of the very first element and moves right one index at a time as small elements are found. In this scheme, also called the Lomuto partition scheme, the two pointers run in the same direction. However, it is not as efficient as Hoare's original scheme, and degenerate inputs can hurt it badly. For instance, if the array consists entirely of equal elements, every partition is maximally unbalanced, so the time complexity degrades to O(n^2) even though the work accomplishes nothing. The same problem arises when the array is already sorted.

If the two pointers instead start from opposite ends of the array and meet somewhere in the middle, the problem of equal elements is solved, and far fewer useless swaps occur. Expressed in code:

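(Again a minimal Java sketch with illustrative names; note that with Hoare's scheme the recursion splits at p and p + 1, rather than excluding a final pivot position.)

    // Hoare partition: i and j start from opposite ends and move toward
    // each other, swapping pairs that sit on the wrong side of the pivot.
    static int hoarePartition(int[] a, int lo, int hi) {
        int pivot = a[lo];
        int i = lo - 1, j = hi + 1;
        while (true) {
            do { i++; } while (a[i] < pivot);
            do { j--; } while (a[j] > pivot);
            if (i >= j) return j; // pointers have met or crossed
            int t = a[i]; a[i] = a[j]; a[j] = t;
        }
    }

    static void quickSortHoare(int[] a, int lo, int hi) {
        if (lo >= hi) return;
        int p = hoarePartition(a, lo, hi);
        quickSortHoare(a, lo, p);
        quickSortHoare(a, p + 1, hi);
    }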

With this scheme, at most one element is visited by both pointers. Recall that in the Lomuto partition scheme, by contrast, every element smaller than the pivot is visited by both pointers.

Another algorithm developed by Tony Hoare is the quick select algorithm. The overall approach is similar to quick sort: choose a pivot and partition the array. The difference is that in quick select we don't need to care about both sides of the pivot; we only care about the side in which the search target lies. As a result, the average complexity drops from O(n log n) to O(n), although the worst case is still O(n^2). The algorithm to find the k-th smallest element can be expressed as below:

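(A minimal Java sketch, reusing the Lomuto partition method from the quick sort sketch above; k here is the zero-indexed rank of the element we want.)

    // Return the k-th smallest element of a (k is 0-indexed).
    // Only the side of the partition that contains index k is recursed into.
    static int quickSelect(int[] a, int lo, int hi, int k) {
        if (lo == hi) return a[lo];
        int p = partition(a, lo, hi);
        if (k == p) return a[p];
        if (k < p) return quickSelect(a, lo, p - 1, k);
        return quickSelect(a, p + 1, hi, k);
    }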

What if we want the k-th largest element? Well, it's just the (n-k)-th smallest element, counting ranks from zero. Below is an implementation written in Java.

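(A minimal version, built on the quickSelect sketch above.)

    // The k-th largest element (k counted from 1) has zero-indexed
    // rank n - k, so we can reuse quickSelect directly.
    static int findKthLargest(int[] nums, int k) {
        return quickSelect(nums, 0, nums.length - 1, nums.length - k);
    }

For example, findKthLargest(new int[]{3, 2, 1, 5, 6, 4}, 2) returns 5.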

Now let's think about optimizing the quick sort algorithm. To keep the recursion under control, we should recurse first into the smaller side of the partition, then use a tail call to recurse into the other; this bounds the stack depth at O(log n). Also, when n is significantly small, the advantage of using this algorithm is no longer substantial. Insertion sort can be a better choice, with its O(kn) time complexity if we name the threshold k. Beyond time complexity, this optimization also makes better use of the cache.
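
To make the idea concrete, here is one possible sketch combining both tricks; the cutoff value of 16 is an assumption, since good thresholds are normally tuned empirically:

    // Recurse into the smaller side, then loop on the larger side instead
    // of making a second recursive call; this bounds the stack depth at
    // O(log n). Ranges at or below CUTOFF are left to insertion sort.
    static final int CUTOFF = 16; // illustrative threshold

    static void optimizedQuickSort(int[] a, int lo, int hi) {
        while (hi - lo + 1 > CUTOFF) {
            int p = partition(a, lo, hi); // Lomuto partition from above
            if (p - lo < hi - p) {
                optimizedQuickSort(a, lo, p - 1); // smaller side first
                lo = p + 1;                       // "tail call" on the rest
            } else {
                optimizedQuickSort(a, p + 1, hi);
                hi = p - 1;
            }
        }
        insertionSort(a, lo, hi); // finish the small range
    }

    static void insertionSort(int[] a, int lo, int hi) {
        for (int i = lo + 1; i <= hi; i++) {
            int key = a[i];
            int j = i - 1;
            while (j >= lo && a[j] > key) {
                a[j + 1] = a[j];
                j--;
            }
            a[j + 1] = key;
        }
    }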

