Google Analytics

Search

To search for specific articles you can use advanced Google features. Go to www.google.com and enter "site:darrellgrainger.blogspot.com" before your search terms, e.g.

site:darrellgrainger.blogspot.com CSS selectors

will search for "CSS selectors" but only on my site.


Tuesday, February 23, 2010

Are most binary searches broken?

I was poking around the web and found an article by Joshua Bloch (you can see it here) about a software bug in the Java binary search back in 2006. Joshua found a number of implementations of binary search with the bug.

With binary search there is a line where you take the low point plus the high point then divide the sum by two. In code this would be:
int mid = (low + high) / 2;
Can you see the bug? There is no check for overflow. If low + high is greater than the maximum value for an int it will wrap around and become negative. I used to teach this concept as an odometer on a car or a clock.

Imagine you have a clock:


A regular clock has 60 increments (seconds). With something like int in Java the clock would have 4 billion increments. With this clock our number system goes from -5 to +6. Here is how it works, addition is clockwise and subtraction is counterclockwise. So if I have 2 + 1, I start at the number 2 and move 1 increment clockwise. This lands me at 3.

If I have 1 + 5, I start at 1 on the clock and move 5 increments clockwise to land at 6.

For subtraction, let's try 5 - 3. We start at 5 and move counterclockwise (subtraction) 3 increments. We land at 2. If you take something like 4 - 6 you get, start at 4 and move counterclockwise for 6 increments. This lands you at -2.

Now what happens if I take 4 + 5? The rule is start at 4 and go 5 increments clockwise. We land at -3. So in my  number system 4 + 5 = -3. This is obviously wrong and this is what can happen with the binary search. If low + high is greater than Integer.MAX_VALUE (2^31-1) the value will become negative. A negative number divided by two is still a negative number.

If the size of the array is greater than 2^30-1 then binary search will fail.

The funny thing is, I was teaching about the dangers of integer overflow ten years before Joshua Bloch found this bug and I never suspected something like this in modern programming language libraries.

The text for the course which tests about integer overflow does not deal with things like binary search. By the time my university teaches integer overflow we aren't dealing with anything as simple as binary search. Binary search is taught in first year, first course. Makes me wonder how many other places have fallen prey to the integer overflow bug.

By the way, one fix for the binary search algorithm is:
int mid = low + (high - low) / 2;
Since high is always greater than or equal to low, (high - low) will always result in a positive number or zero.

No comments: