When does bit-fiddling come to Kotlin?


#1

tl;dr We understand that a number of use cases related to bit-fiddling and hex numbers are painful in Kotlin, and we are working on it, but it’s a bit more complicated than one might think, and that’s why it’s taking us so much time.

Kotlin 1.1 will ship a (small) number of new bitwise operations in the Standard Library, namely and / or / xor / inv for Byte and Short. These may seem straightforward wrt API design: byte bitwise-or another byte gives a byte, right? But we moved these functions to the kotlin.experimental package, which means that we are not sure this API will stay as it is. This package is not imported by default, so you’ll need to import it in order to use these operations:

import kotlin.experimental.*

Don’t get me wrong: Kotlin will have bitwise operations for all necessary types (Ints and Longs are already covered, so it’s mostly Bytes and Shorts we are talking about here), we just don’t know whether it’s right to have them return Byte/Short or Int.
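As a minimal sketch of what using these operations looks like, assuming the API stays as shipped in 1.1 (Byte in, Byte out). The helper names below are made up for illustration and are not part of the Standard Library:

```kotlin
import kotlin.experimental.and
import kotlin.experimental.inv
import kotlin.experimental.or
import kotlin.experimental.xor

// Hypothetical helpers, for illustration only.
fun lowNibble(value: Byte): Byte {
    val mask: Byte = 0x0F
    return value and mask          // Byte and Byte gives Byte
}

fun setBits(value: Byte, bits: Byte): Byte = value or bits
fun toggleBits(value: Byte, bits: Byte): Byte = value xor bits
fun invert(value: Byte): Byte = value.inv()

fun main() {
    val flags: Byte = 0x5A
    println(lowNibble(flags))         // 10  (0x0A)
    println(setBits(flags, 0x0F))     // 95  (0x5F)
    println(toggleBits(flags, 0x0F))  // 85  (0x55)
    println(invert(flags))            // -91 (bit pattern 0xA5)
}
```

If the return type ever changes to Int, code like this would need an explicit toByte() at each call site, which is exactly the kind of trade-off still being weighed.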

Why we can’t decide just now

The problem with this particular case (and/or/xor/inv) is that it’s only the tip of a fairly large iceberg. The following interconnected things have to be added to Kotlin at some point, but we can only design them all together:

  • bitwise operations
  • unsigned numeric types (UByte, UShort, UInt, ULong)
  • large hexadecimal constants (0xFFFFFFFF)
  • (surprise) value types

The current master plan is:

  • implement unsigned numeric types as value types (represented as signed numbers under the hood)
  • have large hex constants have unsigned types (i.e. 0xFFFFFFFF is not a negative Int, but an unsigned Int, i.e. UInt)
  • bitwise operations must be available on both signed and unsigned types (and on unsigned types, shifts are almost straightforward)
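To make the hex-constant point concrete, here is a small sketch of the current behaviour (Kotlin 1.x, no unsigned types): a hex literal that does not fit in Int is typed as Long, and narrowing it to Int yields a negative number. The toUnsignedLong helper is a hypothetical name used only for this illustration:

```kotlin
// A hex literal too large for Int is inferred as Long.
val allOnes: Long = 0xFFFFFFFF

// Narrowing keeps the low 32 bits, which reads as -1 when interpreted as signed.
val asInt: Int = allOnes.toInt()

// Hypothetical helper: view a signed Int as its unsigned value, widened to Long.
fun Int.toUnsignedLong(): Long = this.toLong() and 0xFFFFFFFFL

fun main() {
    println(allOnes)                  // 4294967295
    println(asInt)                    // -1
    println(asInt.toUnsignedLong())   // 4294967295
}
```

Under the plan above, the same literal would instead be an unsigned 32-bit value, so no widening to Long would be needed.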

And this is a really big hunk of language design, even if we leave out the value types part (we could probably roll out an ad hoc implementation of unsigned types, and later transparently re-implement them as value types). So, we just haven’t finished it yet. And, trust me, it’s all interconnected, i.e. we can’t be sure we know the API for bitwise operations before we have finalized the design of unsigned types.

Hence the kotlin.experimental package.

Workarounds for now

Back to bit-fiddling alone (none of the following concerns unsigned types or hex constants).

The best way that I can see for now is having a separate library for bitwise operations. Inline functions will make it fairly performant (there will be a bit more byte code than in straightforward translation, but that will be fixed at some point by the compiler getting smarter). Anyone can write such a library defining operations like

infix inline fun Byte.shl(shift: Int): Int = this.toInt() shl shift

And so on. You can choose to return Byte, if that fits your use cases better. You can even pull all of Hacker’s Delight in as library functions (which strikes me as a great exercise, btw :)).
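A slightly fuller sketch of such a library might look like the following. It is only one possible design: the widening functions return Int as in the one-liner above, ushr masks to the low 8 bits so the sign extension does not leak in, and shlByte is a hypothetical name for a narrowing variant:

```kotlin
// Widening shifts: the Byte is promoted to Int first.
@Suppress("NOTHING_TO_INLINE")
inline infix fun Byte.shl(shift: Int): Int = this.toInt() shl shift

// Arithmetic right shift: the sign bit of the Byte is preserved.
@Suppress("NOTHING_TO_INLINE")
inline infix fun Byte.shr(shift: Int): Int = this.toInt() shr shift

// Logical right shift: mask to 8 bits so the Byte is treated as 0..255.
@Suppress("NOTHING_TO_INLINE")
inline infix fun Byte.ushr(shift: Int): Int = (this.toInt() and 0xFF) ushr shift

// Narrowing variant (hypothetical name): bits shifted past bit 7 are dropped.
fun Byte.shlByte(shift: Int): Byte = (this.toInt() shl shift).toByte()

fun main() {
    val b = 0x81.toByte()      // bit pattern 1000_0001, i.e. -127 as a signed Byte
    println(b shl 1)           // -254
    println(b shr 4)           // -8
    println(b ushr 4)          // 8
    println(b.shlByte(1))      // 2
}
```

The difference between shr and ushr on the same input shows why the return type and the signedness question cannot really be settled separately.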

If at some point there is such a library (or even many of them), we’ll definitely consult its design while making our decisions.


Sorry for the temporary inconvenience, and have a nice Kotlin!


#2

Awesome Andrey,

I am so glad I have embraced Kotlin because I am really looking forward to all these features :)

If it can help, this is how I decided so far for my unsigned lib

I am actually supporting two approaches regarding unsigned operations:

  • you are aware of which operations are sign sensitive (/, %, shr, <, <=, >, >=, up-casting, parse*(), toString()) and which are not (+, -, *, ~, &, |, ^, shl, ==, !=, down-casting) and you use the signed primitive variables as unsigned paying attention to using the corresponding unsigned version of the sign-sensitive operations (udiv, urem, ushr, compareTo, parseUnsigned*())

  • you just don’t care and want the language to take care of that. I have boxed types for that, Ubyte, Ushort, Uint and Ulong. You can use all the operations you want without worries, everything is handled behind the curtains.

Implementing unsigned types as signed under the hood would be simply perfect because it’d be the answer to these two approaches.
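To illustrate the first approach, here is a rough sketch of sign-sensitive operations written against plain signed Ints. The names udiv and urem follow the ones mentioned above, but the bodies are an assumption made for this example and are not taken from the library; ucompareTo and toUnsignedString are hypothetical names as well:

```kotlin
// Mask that keeps the low 32 bits when an Int is widened to Long.
const val UINT_MASK = 0xFFFFFFFFL

// Unsigned division and remainder, done in 64-bit arithmetic.
infix fun Int.udiv(other: Int): Int =
    ((toLong() and UINT_MASK) / (other.toLong() and UINT_MASK)).toInt()

infix fun Int.urem(other: Int): Int =
    ((toLong() and UINT_MASK) % (other.toLong() and UINT_MASK)).toInt()

// Unsigned comparison: flipping the sign bit maps unsigned order onto signed order.
fun Int.ucompareTo(other: Int): Int =
    (this xor Int.MIN_VALUE).compareTo(other xor Int.MIN_VALUE)

fun Int.toUnsignedString(): String = (toLong() and UINT_MASK).toString()

fun main() {
    val allOnes = -1                        // bit pattern 0xFFFFFFFF
    println(allOnes / 2)                    // 0 (signed division)
    println(allOnes udiv 2)                 // 2147483647 (unsigned division)
    println(allOnes urem 10)                // 5
    println(allOnes.ucompareTo(1) > 0)      // true: 4294967295 > 1
    println(allOnes.toUnsignedString())     // 4294967295
    println(allOnes + 2)                    // 1: addition is not sign sensitive
}
```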

About operations with different types, I ended up giving priority to the initial term. That means, if I have

val a = 3.ub
val b = 2.ui
val c = a and b

then c would automatically be a Ubyte because the first term a of the and operation is a Ubyte.
ub and ui are nothing more than extension properties for .toUbyte() and .toUint().

If you want c to be a Uint, you can either

val c = a.ui and b

or

val c = (a and b).ui

The same logic applies when mixing unsigned and signed types. If this time b is simply an Int

val a = 250.ub
val b = 5
val c = a / b

c is still a Ubyte

Alternatives:

val c = a.i / b

or

val c = (a / b).i

I thought about following Java’s rule for signed types, where in an operation on a byte and an int the byte is converted to int (and similarly an operation on ub and ui would give ui).

But since we are talking about unsigned, it doesn’t make sense, because you may in turn need to up-cast that int to a long under the hood, so… the first operand decides.
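Here is a stripped-down sketch of the "first operand decides" rule. These classes are simplified stand-ins written for this post, not the actual library code, and only the two operations from the examples above are covered:

```kotlin
// Simplified, hypothetical boxed types; each stores the raw signed bits.
class Uint(val bits: Int)

class Ubyte(val bits: Byte) {
    // Unsigned view of the stored bits, 0..255.
    fun toInt(): Int = bits.toInt() and 0xFF

    // The result type follows the left operand, so Ubyte and Uint gives Ubyte.
    infix fun and(other: Uint): Ubyte = Ubyte((toInt() and other.bits).toByte())

    // Same rule when the right operand is a signed Int.
    operator fun div(other: Int): Ubyte = Ubyte((toInt() / other).toByte())

    override fun toString(): String = toInt().toString()
}

// Conversion shortcuts as extension properties.
val Int.ub: Ubyte get() = Ubyte(toByte())
val Int.ui: Uint get() = Uint(this)

fun main() {
    val c: Ubyte = 3.ub and 2.ui
    println(c)                 // 2
    val d: Ubyte = 250.ub / 5
    println(d)                 // 50
}
```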


#3

It’s better to wait a while and have something good than to rush and regret it later. I think having signed and unsigned numbers with supporting bitwise operations is necessary for Kotlin, especially since it’s going native.


#4

Note that it’s safe to use these bitwise operations from kotlin.experimental, even if you care much about binary compatibility: since these functions are inline-only, they do not leave any reference to that package in your compiled bytecode.
So when we remove them from that package later (with a deprecation cycle, of course), your previously compiled code will continue to work properly.


#5

In line with this proposal, it may be interesting to have a compile-time only set of literal number types. In bytecode they would decay to the appropriate primitive/boxed number, but at compile time they may automatically coerce to other numbers that the value is in range of.

For example the literal ‘5’ would be able to be assigned without casting to byte, short, char, int, long (perhaps float and double). The literal ‘256’ would not be validly assigned to a byte, but to the others. Unlike with variables this is actually safe as literals by themselves are particular values.

There are some places for confusion, in particular what happens with type deduction (the deduced type would be the actual number type, not the literal type). It probably makes sense to use Int if valid and Long otherwise. I’m not sure what the correct approach for floating point numbers should be (in general double is better, but that would mean that float is never deduced).

This does not preclude the use of type suffixes on the numbers, where the deduced types would be the specified ones. The main use of this is to remove some noise needing to be explicit about number type where the actual needed type is known and the transformation can be performed without loss at compile time.


#6

@pdvrieze So, if I get you right, you want to be able to write:

val b: Byte = 5
val s: Short = 5
val i: Int = 5
val l: Long = 5

but not val b: Byte = 256 ?

Then you’d be glad to know that it’s already possible in Kotlin 1.0.


#7

I didn’t know it was. If so, great! I also want to be able to do (if not supported yet):

functionTakingAByte(5)
functionTakingAShort(5)
// etc....

where the compiler again would complain that 256 (actually 128 for signed) is too large for a byte.
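For what it’s worth, a quick sketch suggests that literal arguments are treated like literal initializers: an integer literal takes the expected parameter type when its value fits. The function bodies below are placeholders added only so the example has something to print:

```kotlin
// Placeholder bodies; only the parameter types matter here.
fun functionTakingAByte(b: Byte): Int = b.toInt()
fun functionTakingAShort(s: Short): Int = s.toInt()
fun functionTakingALong(l: Long): Long = l

fun main() {
    println(functionTakingAByte(5))    // 5, the literal is typed as Byte
    println(functionTakingAShort(5))   // 5, the literal is typed as Short
    println(functionTakingALong(5))    // 5, the literal is typed as Long

    // functionTakingAByte(128)        // does not compile: 128 is outside -128..127
}
```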