## Quicksort in Java

Following on from last week’s implementation of quicksort in Python, we have a Java implementation to do the same thing. However, this version requires a bit more work, since Java is a lower level language than Python.

See, in programming, you’re working within a spectrum where at one end you have CPU-intensive code, and at the other is memory-intensive code. This is something that I’ve never been taught in class, but is an important tenet of efficient programming, which is really an art of finding a good balance between the two extremes that also minimizes each.

This quicksort is a good example of that, because of the way Java uses arrays of a fixed length. What this means is that, if you make your list 10 items long, and need to put 11 things in it, you will have to copy the entire list into a larger list that will accommodate 11 elements. You can’t just tack another item on to the end of the list, which makes life fun some times.

For example, let’s say that you have a 100-item list that you’re giving to this quicksort. How large do each of the three lists need to be so that none of them will ever be overfilled? Well, the obvious answer is 100 items. So you sort your original 100 items into the three 100-item lists, and then run the quicksort on them.

With the original list, you needed to sort into 100-item lists, because you knew that you had 100 “good”, (non-zero) items to sort. (The reason all the sub-lists have to be 100 items is because it is possible, if exceedingly unlikely, that all 100 items in the original list would end up in one list.)* But now, with each sub-list, it’s exceedingly likely that each of they don’t have 100 “good” elements. At any rate, what we have after the first iteration is three 100-item lists, which are, statistically, each only a third full. But we can’t bank on them each being 33 items, because it is possible that one could be 99 items as easily as 2 items.

Now we have the problem set up, so let’s look at it from the CPU-heavy and memory-heavy perspectives. The memory-heavy way would be to quite simply keep using 100-item lists, so that you’d always be sure that you’d never run out of slots in your lists. But what this will result in, eventually, is some lists that are 100 items long, but only holding 2 “good” items.

So how about CPU-heavy? In this approach, we would dimension each of the original sub-lists as being 100-items long where each item is the Java “null” variable. (We have to use “null” instead of zero because zero might actually be a meaningful statistic in the sort.) We would then pass the partially-filled sub-lists into the quicksort, which would then scan through the list of 100 items and find how many were “good”, and then dimension the lists to that number.

For example, let’s say the original 100 items gets split into three lists of 100 items, but let’s pretend that we’re like Java and don’t know how many good items are in each. So the first list gets quicksorted, lessThan. But before it starts actually sorting lessThan, the program would run through and count how many of those 100 entries weren’t “null”. Let’s say it gets that the number there is 23; it will then make each of the sub-lists 23 items long.

If you’ve stuck it out this long, here’s the good news: it’s about to get easier. Unsatisfied with either the CPU- or memory-heavy implementations, I decided there had to be a better way, and there is. See, both of these implementations require a lot of resources because they don’t know how many “good” items are in the list. But since the lists are being built at run-time, it is actually fairly trivial to keep track of how many “good” items are put in, by just incrementing a counter every time you put an item into one of the sub-lists.

*Writing this out, it occurs to me that if you choose one item as your pivot, then only the equalsTo sub-list could ever possibly contain 100% of the passed-in list. All others could be at most one less than that, which should be reflected in the list creation statements. It also occurs to me,

public static int[] quicksort(int[] inlist){ //Hack around lack of default parameters return quicksort(inlist, inlist.length); } private static int[] quicksort(int[] inlist, int length) { //This is our base case. If the list is one or shorter, spit it back out if (length <= 1){ return inlist; } //Our sub-lists int[] less = new int[length]; int[] equal = new int[length]; int[] greater = new int[length]; //Our counters to keep track of the number of "good" elements in each int ltsize = 0; int eqsize = 0; int gtsize = 0; //Take our pivot int pivot = inlist[0]; for(int i = 0; i < length; i++){ //Take every item in the original list //And put it in its rightful place in //one of the sub-lists if (inlist[i] < pivot){ less[ltsize] = inlist[i]; ltsize++; } else if (inlist[i] > pivot){ greater[gtsize] = inlist[i]; gtsize++; } else if (inlist[i] == pivot){ equal[eqsize] = inlist[i]; eqsize++; } } //Find the sorted versions of the lists less = quicksort(less, ltsize); //Equal doesn't need to be sorted greater = quicksort(greater, gtsize); //Our list to return int[] ret = new int[length]; //The index of the end of the list int retPointer = 0; //Put less into ret for(int i = 0; i < ltsize; i++){ if (less[i] > 0 ){ ret[retPointer] = less[i]; } retPointer++; } //Put equal into ret for (int i = 0; i < eqsize; i++){ if (equal[i] > 0) { ret[retPointer] = equal[i]; } retPointer++; } //Put greater into ret for (int i = 0; i < gtsize; i++){ if (greater[i] > 0) { ret[retPointer] = greater[i]; } retPointer++; } //We're done! return ret; }

Hi!

Do you have considered using simply a linked list instead of int[] ?

For the first round, I would do one round of bubble sort. This has 2 benefits!

1) You get to know fast, if the array is already sorted.

2) you can calculate the best pivot element without to much extra cost.

I would not recomment the bubblesort-round within the recursion, but this is question to an analysis.

The answer to your first question is: no. Java’s so big I often forget about such things being implemented natively. I just use the basic arrays because I know ‘em and they’re simple.

Really, the most CPU and memory efficient implementation would probably indeed use a linked list.

As to the bubble sort: it would be quicker and more efficient to just abort the search when you find an out of order item and begin with the quicksort. And as I understand Quicksort, there is no such thing as a “best” pivot. Everything gets sorted eventually.

I also think it would be reasonable to assume that a sorting algorithm would be used on unsorted data. If there was any question, it’s always possible for the program to test the data before passing it to the quicksort.

Hi, I have a lot to say (sorry), but I teach algorithms at uni so hear me out :)

It’s a pretty good idea to consider what the performance is if the list is already sorted. This situation comes up quite a bit if you have programs that are always sorting the data. (Especially important is the “almost sorted” case, which might arise if you have a sorted list, and add 1 item to the end, and then re-sort it — in this case Insertion Sort is way more efficient than Quicksort).

Having said that, using a bubble sort to check if it’s already sorted is a Bad Idea. If you do a full bubble sort, obviously it’ll average out to be much slower than Quicksort. The suggestion to do 1 pass of bubble sort doesn’t make much sense either. It means you’re doing a whole extra pass over the data just for the special case of it being already sorted. You don’t need to make the already-sorted case super fast, just make sure it doesn’t kill your sort (which it won’t in the case of quicksort). Also there’s no point using bubble to pick the pivot.

But there *is* such thing as worst and best pivot – the best pivot is the one which separates the data 50/50. The worst pivot is the minimum or maximum value.

Consider in your code what happens if the list is already sorted – then your pivot selection (intlist[0]) will get the minimum value. Then your quicksort will recurse N times, and take N^2 comparisons to complete – JUST AS BAD AS BUBBLE SORT. So that’s a bad pivot selection.

How to improve pivot selection? If you take a whole pass over the data to pick an optimal pivot (the “bubble sort” suggestion), then you’ll waste as much time picking it as if you had picked the worst possible pivot!! So you can’t just write code to pick the best one.

Ideally, just pick an element from the list at random, or pick the middle element of the list (picking the middle element from the list gives you best performance on an already-sorted list).

As Fabse pointed out, it’s not really necessary to use arrays and manually keep track of their length, as Java has builtin list objects that do that for you. (They keep track of both the full length (called “capacity”) and the length of “good” items (called “size”)). However I would _not_ use a LinkedList. That won’t perform as you expect. I’d use a Vector, and be sure to set its initial capacity to the size of the source list.

Also why do you have 3 lists (less, greater, equal). Consider how many elements are put into the “equal” list. If you have 100 items, roughly distributed from 1 to 100, and a pivot of 50, then there’ll be about 50 items in the “less” list, 50 items in the “greater” list, and 1 item in the “equal” list. So don’t do that. Just have a “less than or equal to” list and a “greater than” list.

Oh hey, and the discussion from the Python blog post about stable sorts:

1. The concept of a “variably stable” sort isn’t really legitimate. What you’re saying is that if your sort is unstable, you can make it stable by sorting on the values after you sort on keys. Well I could do that for any sort.

The fact is that this doesn’t make it stable. Stable doesn’t mean “when I’m done, it’s perfectly sorted”. It means that for sorting lists with duplicate keys, the elements which share keys remain in their original order (relative to each other). In other words, you don’t swap two elements unless you have to.

Quicksort is never stable. (The wiki article says “in efficient implementations” – this means that you can make quicksort stable, but then it won’t be quick!)

2. Why do we care about stable sorts? Well sometimes we do and sometimes we don’t. Usually when humans are looking at the data, they want it to be stable — simply because it *doesn’t make sense* if the computer reorders data for no reason.

Consider an Excel spreadsheet with a “Gender” column (M or F) and a “Name” column. Now I choose to sort by Gender. When I click sort, I want all the males at the top, then the females. But within those categories, I don’t want to see all the names jumbled up randomly. It would seem weird to me, as a human – why did the computer decide to scramble up all my names? I’d prefer it if it kept the original order except for splitting them into the 2 groups.

Even worse, if I click “sort” on already-sorted data, it would look stupid if it decided to jumble up the data even though it was already sorted.

This is why in some situations it is desirable to have a stable sort. So if I was implementing a spreadsheet, I’d do a merge sort (another algorithm that can be made stable, and is approximately as efficient as quicksort, with various different tradeoffs).

Well, in terms of space efficiency, the best way to go would definitely be to do an in-place quicksort. It’s a little more convoluted to implement, but that’s fine. Naturally the trade-off here is stability — the in-place quicksort, because it swaps things around a lot, makes it very difficult to be stable. But the bonus is that you use no additional space. If that’s the main concern, of course.

Concerning using a Vector — if you set the initial capacity to size of the source list you’re basically allocating an array in memory that is that size. It won’t _need_ to resize, and all you gain is adjusting bounds.

I really think that if stability isn’t a problem, in-place quicksort is the way to go. It really hurts me, conceptually, to allocate full-sized arrays for each side. Naturally it helps that you adjust that size at each recursive step, but still.

As a side note, it’s also a better idea to use ArrayList instead of Vector in general, since Vectors are synchronized and therefore out of step with the rest of Java Collections. ArrayLists can be synchronized using Collections.synchronizedCollection().

Usually, quicksort operates in place: instead of returning a different array and then copying their contents to a new array, you should operate over the elements of the original array.

If not, you will spend a lot of time simply copying information around.

Here’s another version, in less lines of code.

public void quicksort(int[] array, int first, int last) {

if (first >= last) return;

int pivot = array[(first + last) / 2];

int i1 = first, i2 = last;

while (i1 < i2) {

while (i1 < i2 && array[i1] <= pivot) i1++;

while (i1 pivot) i2–;

if (i1 < i2) {

Integer tmp = array[i1];

array[i1] = array[i2];

array[i2] = tmp;

}

}

quicksort(array, first, i1 – 1);

quicksort(array, i1, last);

}

public void quicksort(int[] array) {

quicksort(array, 0, array.length -1);

}

Notice that in practice, you will never use this. Use the available sort methods in classes java.util.Arrays and java.util.Collections.

FYI, your sort doesn’t actually work, e.g.

int[] arr2 = {2, 1};

quicksort(arr2);

System.out.printf(Arrays.toString(arr2)); // [2, 1]

Well, hello all you dzoners. I really do appreciate the comments.

I feel I should make a few points. The first is that this is by no means optimal. I just wanted to use it as a vehicle to discuss the Memory/CPU tradeoff. If I had put more time and read more than the Wikipedia article on the sort, I’m sure I could have come up with a more efficient version. This is not something that I would, for example, turn in to my professors.

About “variably-stable” that wasn’t describing any implementation of the sort, that was describing that the general specification of a quicksort meant that it could be implemented either way. It’s not inherently stable or unstable. (Perhaps this is true of all sorts, I’ve not studied many extensively. If you read the post about Python, you’ll see that I started learning this quicksort as a way to avoid learning a lot of sorts, for now.)

You guys are coders far and above my readers or myself, so I’m not surprised you guys picked me apart. I just hope that you boys had a good time.

And tj: the quicksort(), for better or worse, returns the sorted list. You’re not taking it’s return, so of course you’re getting abnormal output.

You most certainly must learn to take critics. They took time to analyze your code and comment on it, and you can’t even appreciate that?

Haven’t you benefited from them?

I have a stable version of quicksort that uses O(logN) extra space so it is essentially in-place. Sorting 30,000 elements takes about 5 times longer than ordinary quicksort but 120 times quicker than insertion sort. It is in VB but you can find it here:

http://www.codeproject.com/KB/recipes/StableQuickSort.aspx?display=Print

The same application also has an in-place version of merge sort that uses O(logN) extra space and runs in a similar time.

Craig