Checked and unsigned integer operations


#1

Ideally CPU's would have hardware traps for integer overflow, but in practice no mainstream architectures do. Integer overflow can cause all kinds of weird problems. Additionally, Java simplifies over C/C++ by having only one kind of integer (signed) but at the cost of making it harder to write code that has to work with unsigned amounts. This comes up a lot when dealing with binary file formats and networking protocols.

When I think about the worst bugs that I’ve written/encountered in recent years, lack of unsigned types and unexpected integer overflows crop up a lot because they can lead to silent failure in edge cases that unit tests may not always exercise.

It would be nice if Kotlin could improve on this state of affairs. I’m sure you must have already considered this topic carefully, as you clearly examined all of the Java puzzlers and tried to fix as many of them as possible. So I’m curious as to whether Kotlin’s current approach (same as Java minus auto-widening) is here for the long term, or whether Kotlin would consider making the default arithmetic operations front to something like Guava’s IntMath/LongMath classes.

The new Swift language from Apple appears to use checked math by default, with special new operators like &+ for unchecked addition, the idea being of course that developers can manually take away the checking if profiling shows it to be a hotspot and the developer is sure the additional safety is unnecessary. I don’t think Kotlin would need new syntax: simply redefining the existing operators and having infix functions for “uncheckedAdd”, “uncheckedDivide” etc would be sufficient.

W.R.T. unsigned types, I guess that can be handled in the standard library by having UByte, UInt, ULong and so on where the standard operators are implemented by something like Guava’s UnsignedInts, UnsignedLongs class. Is that right?


#2

1. Unsigned types: not in the near future. You can create library types like ULong etc yourself if you really need them, but be sure that they will be slow because of the necessary boxing (this is why we don't add them ourselves)

  1. Exceptions on arithmetic overflow: this will likely make arithmetics significantly slower, and we don’t see how to avoid it without changes to the JVM, nor are we ready to accept the slowdown

#3

Do you know it'd be unacceptably slow or is that a hunch? For apps that are very maths heavy (3D graphics etc), you are probably correct. For many other apps that use arithmetic only as part of their business logic it's probably insignificant. Especially the case if things like for loops are optimised properly.

My guess is that for an app like IntelliJ it probably wouldn’t make much difference as the app spends most of its time manipulating data structures rather than doing lots of arithmetic.


#4

Which point are you talking about? If 1 then yes, it will be unacceptably slow, if 2, then it's a hunch, but we don't want a show-stopper for computation-intensive apps anyways. Overflow bugs are very rare in business logic, so it doesn't pay off to rule them out at the expense of computation-intensive apps


#5

Right, I agree boxing every number would be problematic, although Guava's unsigned support works with primitives. With inline functions the cost hit could be not terrible, if it was possible to avoid the boxing.

I’m not suggesting ruling things out. As mentioned, Swift gives you “fast but less safe” operators that don’t check for overflow, suitable for use in matrix mult routines and the like.

I cannot offer any data on how often overflow bugs crop up in various apps. Perhaps someone has studied it, I’ll have a look around. For one pretty disasterous example, check out a bug that occurred in Bitcoin in 2010. Someone discovered a place where value fields in the protocol were being summed without checking for overflow. The result was someone managed to create billions of bitcoins, more than should ever have existed. The entire system had to be rolled back and replayed without the bad transaction. Overflow bugs in security sensitive software can be quite damaging, but hard to find in the normal course of operations and testing.


#6

There are very many kinds of bugs that may be ruled out by making people's lives inconvenient to some extent. In this case the tradeoff seems not worth it


#7

In any case, if you come across any data on overflow bugs, it would be very welcome. Thanks in advance


#8

Most of the papers I found so far seem to be C/C++ specific, which makes sense as overflow is so much more devastating there. The easiest paper to find suggests that most codebases they checked had such bugs. Unfortunately quite a few integer overflow bugs detected in such codebases boil down to undefined behaviour allowed by C++ but misunderstood by developers, and then exploited by aggressive optimisations. They don't usually apply on the JVM.

One thing I did discover is that in Java 8, overflow checking operations are compiled using special hotspot intrinsics. For instance Math.addExact becomes just an “add” followed by “jo” instruction. Just one extra instruction! I don’t think it gets more optimal than that. With branch prediction it’s probably free. Most likely, cheaper than array bounds checking at least.

http://www.slideshare.net/iwanowww/whats-new-in-33765217

It might be fun to modify the kotlin compiler to always use Math.addExact and friends when targeting Java 8 then measure the difference. Unfortunately, I don’t know of any big Kotlin codebases yet which would be suitable for benchmarking. Perhaps when the Kotlin compiler itself is more self hosting it can be used for speed comparisons. Also I suspect Kotlin does not have any notion of targeting a particular Java release, it always is locked to version 6?

I’ll keep an eye out for studies/papers that are Java specific.


#9

Thanks for the info. We will have different targets at some point, maybe will integrate exact arithmetic in some form or other then