When will the JVM get rid of Number conversions?

The current separation of primitive types such as Int, Short, Byte, Float, etc. dates back decades. While C++ devs laugh at Java’s boxing/unboxing, JS and Python do fine without even a type. What does the modern world need in order to get rid of different number types without performance and memory loss?

Boxing and unboxing have nothing to do with the existence of different types of ints.

Having no distinction between floats and ints (which is a very bad idea anyway) without performance loss is just not possible.

6 Likes

C++ has exactly the same boxing problem as the JVM. The only difference is that C++ templates allow the compiler to generate multiple specialized versions of the same code. If you believe there is a way to remove number specialization without performance loss, please share it, you will get a prize. People have been searching for a way for the last 20 years.

Python and JS numbers are very inefficient, by the way; you need to use specialized arrays in numpy or WebGL intrinsics in JS to achieve reasonable performance.

7 Likes

There are some details in the English version of the talk I gave at Joker last year: Math architecture in Kotlin - YouTube. You can also discuss these topics in the #mathematics channel on the Kotlin Slack.

1 Like

I think the problem is not only the performance. The main problem is that there is an infinite number of… well, numbers, so we would need infinite memory to store them. And not only can we not use infinite memory, we would actually like to use very few bytes to store a number. For this reason we had to cut out a very small portion of the whole cake, the one that is most practical for our real use cases. And we actually identified two such cake pieces: integers and floats.

Integers and floats have very different properties, they behave differently and should be used for different things, so they shouldn’t be mixed together. Using a single type for both is error-prone. Example in Python:

range(1, 4 / 2)

This throws a runtime exception, so you won’t notice the bug until you execute the code and it gets to the specific point/state. Example in JavaScript:

let n = (0.1 + 0.2) * 10 
for (let i = 0; i < n; i++) {
    console.log("i: " + i)
}

How many times will we print to the console? No, not 3. The correct answer is: 4.

In Kotlin or Java the first example would work correctly and the second would not even compile, because integers and floats are separated.
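
For comparison, here is a rough Java sketch of the JavaScript example above. The class and method names are made up for illustration. One caveat: Java does allow comparing an int with a double (the int is widened for the comparison), so the compile-time protection kicks in when you try to store the fractional result in an int, not at the comparison itself.

```java
final class LoopBoundDemo {
    // Counts loop iterations for a fractional bound, as in the JavaScript example.
    static int iterations(double bound) {
        int count = 0;
        for (int i = 0; i < bound; i++) { // i is widened to double for the comparison
            count++;
        }
        return count;
    }

    public static void main(String[] args) {
        double n = (0.1 + 0.2) * 10;       // 3.0000000000000004, not 3.0
        // int bad = (0.1 + 0.2) * 10;     // does not compile: double needs an explicit cast to int
        int rounded = (int) Math.round(n); // the conversion has to be spelled out
        System.out.println(iterations(n));       // 4
        System.out.println(iterations(rounded)); // 3
    }
}
```

The point is that the lossy step is visible in the source: you have to pick rounding or truncation yourself instead of getting a surprise fourth iteration.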

2 Likes

I understand that Short, Int and Long might seem to be getting redundant, but then you realise that they’re totally not as soon as you’re doing something that has hardware or a native library at the other end (which, yes, some of us do a lot with java and kotlin, imagine that).
Removing them would severely restrict the flexibility of JVM-based languages without any tangible benefit.

The only way anybody could say that the distinction between integers and floats should be removed is if they were not aware of the intrinsic differences of these types and why they exist in the first place. It may surprise you, but Python does distinguish between these two types (and complex). And JS… I mean, has any sane person ever looked at JS and said “ah yes, this language does types very well, we should totally do it like that in other languages as well”? Last time I checked, there was an entire ecosystem of languages trying to fix JS’s type issues… :stuck_out_tongue_closed_eyes:

1 Like

Hm, dynamic memory like in arrays?

How do you reconcile these 2?

The case is:

  • maybe some new hardware architectures make this possible;
  • a drop in one or the other (CPU or memory) can be insignificant for the sake of the solution. If a memory drop leads to a CPU boost, it’s fine;
    So, even if you took the lines from different contexts, the point was to keep the overall CPU + memory balance about the same, not that each one should stay the same. And actually, memory is cheap.

Nowadays we can replace Shorts and Ints with Longs in many situations. We would still need Bytes: if some data is represented by a byte array, we probably don’t want to see the elements as another numeric type.

Python actually has types: there’s int for integers and float for floating point (and also bytes and bytearray for manipulating bytes). Python’s int can have arbitrary size, so it is similar to BigInteger. However, having something like this as the default certainly has performance implications.
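
To make the BigInteger comparison concrete, here is a small Java sketch (class and method names are hypothetical). A fixed-size long silently wraps around at its limit, while BigInteger keeps growing at the cost of a heap-allocated object per value.

```java
import java.math.BigInteger;

final class OverflowDemo {
    // Fixed-size arithmetic: wraps around on overflow.
    static long wrappingIncrement(long value) {
        return value + 1;
    }

    // Arbitrary-precision arithmetic: allocates a new object, never overflows.
    static BigInteger bigIncrement(long value) {
        return BigInteger.valueOf(value).add(BigInteger.ONE);
    }

    public static void main(String[] args) {
        System.out.println(wrappingIncrement(Long.MAX_VALUE)); // -9223372036854775808
        System.out.println(bigIncrement(Long.MAX_VALUE));      // 9223372036854775808
    }
}
```

If you want a fixed-size type that fails loudly instead of wrapping, `Math.addExact` throws an ArithmeticException on overflow.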

JavaScript indeed mixes integers and floating point numbers: both have type number, and even something like Number.isInteger(3.0) or Number.isInteger(3.14**50) returns true. For me, it is one of the worst design decisions in JS.

2 Likes

I believe JavaScript has only floats. That approach kind of works, because to some degree floats can be used to store integers. But well…

>> 9999999999999999
10000000000000000 

:smiley:
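
The same rounding can be reproduced on the JVM by pushing a long through a double, which is roughly what a float-only number type does to every integer. A minimal sketch with made-up names:

```java
final class DoublePrecisionDemo {
    // Round-trips a long through double and back.
    static long throughDouble(long value) {
        return (long) (double) value;
    }

    public static void main(String[] args) {
        long exact = 9_999_999_999_999_999L;
        System.out.println(exact);                // 9999999999999999
        System.out.println(throughDouble(exact)); // 10000000000000000
        // A double represents every integer exactly only up to 2^53.
        System.out.println(1L << 53);             // 9007199254740992
    }
}
```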

2 Likes

You are right, the standard explicitly says that “numbers” are IEEE 754 floats. But it also talks about integer values :exploding_head: Floats implicitly get converted to integers when the operator expects an integer. E.g. 1.000 ^ 1.999 evaluates to 0 (it implicitly truncates the operands, which for positive values is the same as floor()). It also extends % to floats: 3.75 % 1.2 ≈ 0.15. I find this a bit confusing and problematic… The “correct” integers are available as bigints.

2 Likes

It keeps me up at night that a for loop like for (let i = 0; i < 10; i++) is used with a float rather than an int

3 Likes

That would require some crazy revolution in how hardware works, beyond human understanding at this point. You’ve heard about 32-bit vs 64-bit architectures? It was 16 bit before that (DOS) and even 8 bit (Intel 8080). Every such jump was a major thing. What you’re asking here is basically “give me an infinite-bit architecture.” This is not how it works. A CPU is insanely fast with its native integers. IIRC, on my first 100 MHz i486 CPU in 16-bit mode, integer additions/subtractions took 4 CPU clock cycles, which is… 40 nanoseconds. That’s on a CPU that is now almost 30 years old!

Nope. Both will suffer, and tremendously. Memory allocation overhead for using arrays to represent numbers is huge in terms of both CPU and RAM. This is fine for slow interpreted languages, but not for something that claims to be anywhere near efficient. It may be fine for a small webstore, but definitely not fine for things such as compilation, video encoding, signal processing…

4 Likes

First, Short, Byte etc are not meant to be used in general code. They’re here to integrate with external resources, like hardware cards or transport protocols. In regular calculations, they provide no benefits because most CPU architectures can’t do math on them (and the JVM can’t either, all math is done on Int or higher).
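
A quick Java sketch of the "all math is done on Int or higher" part (class and method names are invented for the example). Adding two bytes yields an int, and getting a byte back requires an explicit narrowing cast that wraps around.

```java
final class ByteMathDemo {
    // The JVM widens both bytes to int before adding, so the result type is int.
    static int add(byte a, byte b) {
        return a + b;
    }

    // Narrowing back to byte must be explicit and keeps only the low 8 bits.
    static byte addWrapping(byte a, byte b) {
        return (byte) (a + b);
    }

    public static void main(String[] args) {
        byte a = 100;
        byte b = 100;
        // byte c = a + b; // does not compile: a + b has type int
        System.out.println(add(a, b));         // 200
        System.out.println(addWrapping(a, b)); // -56
    }
}
```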

In day-to-day life you’ll only encounter Int (“normal integer”), Long (“I’m storing stuff greater than a few billion”), Float (“normal floating-point number”) and Double (“it needs to be extra precise”). They are only boxed when appearing as erased type parameters or when nullable. The rest of the time (almost always), they are literally mapped directly to the memory. That’s basically the fastest thing a computer can do.
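
As a rough illustration of where boxing does and does not happen, here is a Java sketch (hypothetical names). The `int[]` holds the values themselves, while the `List<Integer>` holds references to `Integer` objects because the type parameter is erased. Kotlin's IntArray and List<Int> compile to the same two shapes on the JVM.

```java
import java.util.ArrayList;
import java.util.List;

final class BoxingDemo {
    // Works directly on primitive values: no objects involved.
    static long sumPrimitive(int[] values) {
        long total = 0;
        for (int v : values) {
            total += v;
        }
        return total;
    }

    // Each element is an Integer object that gets unboxed on every read.
    static long sumBoxed(List<Integer> values) {
        long total = 0;
        for (Integer v : values) {
            total += v; // implicit unboxing via intValue()
        }
        return total;
    }

    public static void main(String[] args) {
        int[] primitive = {1, 2, 3};
        List<Integer> boxed = new ArrayList<>();
        for (int v : primitive) {
            boxed.add(v); // implicit boxing via Integer.valueOf
        }
        System.out.println(sumPrimitive(primitive)); // 6
        System.out.println(sumBoxed(boxed));         // 6
    }
}
```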

Compare storing a single Int to storing anything else. It’s 32 bits, so you can just put it somewhere in memory. It takes 32 bits.

Dynamic memory cannot be stored on the stack (where local variables are), because the size of the stack for a given function call is static. So what you have to do is request some memory space from the OS (~10–1000 times slower than storing a regular integer), write your number there (~1–100 times slower than storing a regular integer if you’re unlucky with the cache), also store the size of the array, which you’ll probably want on 32 bits, so that’s another integer write, and then you’ll want to store a pointer to that array, which is another integer write.

If we do the calculation, you have a memory usage that is at least 4 times what the JVM (or C/C++/Rust…) uses, and it is two or three orders of magnitude slower. Of course, you could try to optimize some things, but at the end of the day you’re doing an enormous amount of useless work, and for what benefit? And this is even without considering that you actually have to free that array at some point, which costs memory tracking and processing power.

This is one of the reasons Python is so much slower than everything else.

1 Like

Another use case for shorts and bytes is to reduce memory — though that’s obviously important only when you’re storing shedloads of them, and so usually in the form of ShortArray/ByteArray/etc.
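
A back-of-the-envelope Java sketch of that saving (names are hypothetical). It counts element data only and ignores the array header and alignment, which depend on the JVM.

```java
final class ArrayPayloadDemo {
    // Bytes of element data for an array; header and padding are not included.
    static long payloadBytes(int elementCount, int bytesPerElement) {
        return (long) elementCount * bytesPerElement;
    }

    public static void main(String[] args) {
        int n = 1_000_000;
        System.out.println(payloadBytes(n, Short.BYTES));   // 2000000 for short[] (Kotlin ShortArray)
        System.out.println(payloadBytes(n, Integer.BYTES)); // 4000000 for int[] (Kotlin IntArray)
        System.out.println(payloadBytes(n, Long.BYTES));    // 8000000 for long[] (Kotlin LongArray)
    }
}
```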

And interoperability with external systems includes common cases such as databases and APIs. (Obviously, a larger Kotlin type can work there, but introduces the risk of passing out-of-range values to the other system; it’s also less helpful in documentation terms.)

I may be wrong, but I believe JVMs internally store bytes and shorts as ints anyway (or even longs since 64-bit archs?). ShortArray/ByteArray help to reduce the memory, although technically speaking we don’t necessarily need Short to have a ShortArray - its API could be based on integers.

As others said: bytes and shorts are mostly for some kind of I/O where integer sizes matter a lot.

2 Likes