Why kotlin does not provide LinkedList implementation?

pys · January 31, 2020, 5:40pm

I certainly didn’t feel insulted. What I meant is that it was insulting for programmers in general, and that insult is not an argument. Thinking that programmers would be worst than I am or even as bad as I am and that it would be a valid reason to remove linked lists would be basing the decision upon cognitive bias. Whether programmers are able or not to make an educated decision about using linked lists is irrelevant. The only question is whether they are useful for some use cases or not.

Array-based lists are very error-prone too. I have seen many bugs due to the fact that these lists are not persistent. Programmers know that they have to make defensive copies when necessary. The problem is that people arguing against linked lists (note that I am talking about immutable persistent linked lists and not Java linked lists) generally “forget” this.

I have made a benchmark comparing (immutable persistent) linked lists, Kotlin mutable lists, and Kotlin’s non-modifiable lists. I have compared operations that are not well suited to linked lists as well as one which is: a scan of the list. A scan is like a fold but producing a list of all the intermediate results. Here are the results (in milliseconds):

 Insertion of 100000 elements to the start of a linked list: 61
 Insertion of 100000 elements to the start of a mutable list: 1411
 Insertion of 100000 elements to the start of a non-modifiable list: 18
 Insertion of 100000 elements to the end of a non-modifiable list: 9670
 Insertion of 100000 elements to the end of a mutable list: 16

 Retrieval of 100000 elements from the start of a linked list: 28
 Retrieval of 100000 elements from the start of a mutable list: 22
 Retrieval of 100000 elements from the start of a non-modifiable list: 3
 Retrieval of 100000 elements from the end of a linked list: 25
 Retrieval of 100000 elements from the end of a non-modifiable list: 3
 Retrieval of 100000 elements from the end of a mutable list: 2

 Reversing a 100000 elements linked list: 2
 Reversing a 100000 elements mutable list: 4
 Reversing a 100000 elements non-modifiable list: 7

 Scanning a 100000 elements linked list: 19
 Scanning a 100000 elements mutable list: 11

This seems to show that except for insertion, linked lists are not very performant.

The problem is that ther is a bug in the scan function:

fun <T> scan(list: MutableList<T>): MutableList<MutableList<T>> {
    val result = mutableListOf<MutableList<T>>()
    for (i in (0..list.size)) {
        result.add(list.subList(0, i))
    }
    return result
}

With this implementation, any modification to the original list after the scan will be seen in the result of the scan, and any modification to the result of the scan will be seen in the original list. Once the bug is fixed (by making a copy of each sublist), there is a slight difference:

Scanning a 100000 elements linked list: 21
Exception in thread "main" java.lang.OutOfMemoryError: Java heap space

Let’s try with a smaller list (20 000 elements):

Scanning a 20000 elements linked list: 7
Scanning a 20000 elements mutable list: 8136

As you can see, there are use cases for which array-based lists are dangerous and immutable persistent linked lists are much safer and much more performant.

Wasabi375 · January 31, 2020, 6:35pm

This isn’t the question though. The standard library is supposed to provide interfaces and implementations that are useful in most if not all cases and not just in some special scenario. As explained here LinkedList is only useful in special cases and therefor shouldn’t be part of the standard library.

You realize that LinkedList in the general sense of the word does not refere to just the persistent version. The oposite is the case, if you speak about linked lists most people (especially a comunity that mostly uses the jvm) will think of a mutable list (just like the java implementation. Also your argument has nothing to do with LinkedList vs ArrayList and more with persistent vs mutable.
That said koltin already provides a special library for immutable data structures (see link above).

I’m not really sure what your benchmark is supposed to show. You say use use a different version of scan than the one you supply so I can’t really confirm your results or whether or not your implementation is efficent for array based lists. In either case I would think that this again is a problem of immutable vs mutable and therefor not really relevant here.

mattquigley · January 31, 2020, 10:27pm

Can someone explain why LinkedLists are useless? And more interestingly, can someone explain why they are harmful?

naxymatt · February 1, 2020, 12:34am

The simple answer is that linked lists are always slower than array-based lists on modern hardware. With array-based lists, elements are stored adjacent to each other in memory. With linked lists, they are not. Adjacent elements in memory can be loaded together into the CPU cache. Elements of linked lists cannot. If data is not in the CPU cache, the CPU has to stop working and wait for the data to be loaded from RAM. The CPU can sit idle for hundreds of cycles waiting for RAM. RAM is painfully slow compared to CPU cache memory. So you lose all the theoretical performance gains of a linked list and then some.

A long time ago, when CPUs were less sophisticated, it’s true that linked lists could be more performant for some types of insertion and deletion operations. However, it has been decades since that has been true.

This topic comes up quite often because a lot of CS books still (incorrectly) teach that linked lists are faster for insertion and deletion, so if you Google “linked list vs array performance” you can find lots of benchmarks across lots of languages, and quite a few good explanations (most better than mine here) about why linked lists are slow.

To be thorough, I should note that there is a difference in the performance of adding to the END of a linked list vs. adding to the END of an array: Both operations are O(1) in amortized time, but adding to an array might occasionally take longer due to the resize operation. It will still be faster in practice in nearly all cases due to CPU cache, but I suppose that theoretically if you were working with lists of millions of elements and needed to add to the end in constant time, a linked list might be the right choice. However, when you go to read back the data from the list (or do anything, really, besides adding to the end) performance is going to be painfully slow. In practice, in 25+ years of software development, I have never seen a case where performance improved through use of a linked list, even for “append-only” uses.

So, in a nutshell—you should really never use linked lists.

mtopolnik · February 1, 2020, 3:43pm

This is not completely fair to the linked list. For one, even an array list is just an array of pointers, to iterate it you still have to dereference them. So the difference is between one and two levels of indirection. Regardless of cache friendliness, data-dependent loads resulting from pointer dereference are always more expensive than loads by dead reckoning (pointer + index * object_size), which the prefetcher can predict.

Secondly, objects aren’t typically scattered all over the heap. The most typical pattern is create object - add to list - create next object - add to list - …. Since the JVM always allocates memory sequentially from a large contiguous block, we end up with a tight memory layout. Array list is again at a slight advantage because it doesn’t allocate an extra object every time.

Consider that, if you expand the array by doubling it and finish adding when the array is exactly full, every element will be copied just once (in amortized terms). In the worst case, if you finish adding just after the expansion, there will be two copies per element. The cost of copying one element in a bulk copy operation is extremely low and gets lower as the array grows: I clocked System.arraycopy of Object[] at 1-2 billion elements per second so the cost of one copy is about 1 nanosecond. Therefore the insertion overhead of array list is less than 5 nanoseconds. The linked list must allocate and initialize a brand new object every time. This pattern doesn’t receive the throughput benefits of a bulk operation and its cost is probably comparable to the above, could be even worse.

naxymatt · February 2, 2020, 2:48am

True, but insertions and deletions don’t require dereferencing those pointers. With a LinkedList, they do. This is where I was talking about CPU cache making a difference.

Also true, but I think with the caveat that generally if you’re adding to a list, you’re often also allocating the objects that will be added, and those allocations are likely to be interspersed with the LinkedList allocations (and if they are large, will push the LinkedList allocations apart).

mtopolnik:

Consider that, if you expand the array by doubling it and finish adding when the array is exactly full, every element will be copied just once (in amortized terms). In the worst case, if you finish adding just after the expansion, there will be two copies per element. The cost of copying one element in a bulk copy operation is extremely low and gets lower as the array grows: I clocked System.arraycopy of Object[] at 1-2 billion elements per second so the cost of one copy is about 1 nanosecond. Therefore the insertion overhead of array list is less than 5 nanoseconds. The linked list must allocate and initialize a brand new object every time. This pattern doesn’t receive the throughput benefits of a bulk operation and its cost is probably comparable to the above, could be even worse.

Great analysis!

I think the lesson here is: Theoretical performance of an algorithm on a whiteboard or a CS textbook is often quite different from practical performance in a particular environment (in this case JVM on a modern CPU).

I for one am quite happy with LinkedList being unavailable from Kotlin. It’s possible to come up with contrived examples where it’s performance is similar to ArrayList, but not really any practical case where it’s better than ArrayList. I’ve had several times in my career where I’ve been able to achieve significant performance improvements by refactoring uses of LinkedList to ArrayList in Java code. With Kotlin, ArrayList is essentially enforced from the start, and I think that’s a good thing.

msink · February 2, 2020, 9:47am

Not in stdlib now, but candidate: https://github.com/Kotlin/kotlinx.collections.immutable/blob/master/core/commonMain/src/implementations/immutableList/PersistentVector.kt

Persistent vector made of a trie of leaf buffers entirely filled with [MAX_BUFFER_SIZE] elements and a tail having from 1 to [MAX_BUFFER_SIZE] elements.

pdvrieze · February 2, 2020, 4:39pm

Not only that, the way Java Iterators work means that actually moving to the location still requires pointer chasing. Of course it also doesn’t play nice with caches on modern(ish) CPUs

pys · February 9, 2020, 5:23am

My benchmark shows that if you need sublists that can be modified independently from the original (or the other way around), array lists are a very bad solution.
The other version uses a very standard implementation of an immutable persistent linked list (too long to show here). The problem with array lists is that they are extremely slow to duplicate. When a method receives a list as its argument, there is no guarantee that the list won’t be modified after being received. Making a defensive copy is very slow and not even a good protection unless it is protected against concurrent access during the copy. Immutable lists solve this problem. And concurrent access is not the only problem. Bugs may also occur when a sub list is modified long after having been created.
The problem in most discussions about linked list is that the comparison is made between linked lists and array lists in array lists use cases. I am certainly not saying that linked lists are better than array lists. Immutable array lists solve a problem that array lists can’t solve. I understand they can be dangerous, but not more than array lists. I have encountered much more bugs due to concurrent access to array lists or unaware access to modified lists than to wrong use of linked lists.
Some use cases requires immutable lists and immutable linked lists are often much more efficient than immutable array lists.
But the main problem is the attitude of those who want to ban linked lists, either because they don’t know how to use them or because they think others don’t know. If they don’t know, they should learn, and if they think others are bad programmers who would make bad use of them, I would suggest to remove all dangerous constructs, such as loops, mutable references, exceptions, and more. If I chose to never use these constructs for safety reasons, would I be entitled to ask that they be removed from Kotlin?

Wasabi375 · February 9, 2020, 5:34am

Again. The original question wasn’t talking about an immutable version of LinkedList, but a mutable implementation like in java. Your benchmark doesn’t help there, because you are using an immutable version, which solves a different problem.
Also as explained above, kotlin actually has immutable collections (see linke above). They are currently in an experimental state and available as a separate library. AFAIK there have been discussions about merging this with the standard library once they are finished, but there are also arguments for leaving this as a separate library.

I haven’t yet seen an argument for a mutable linked list and I’m sure that most people here know that immutable constructs have performance benefits. No one here is arguing agsinst immutable linked lists. And the argument is not to remove all dangerouse constructs, but those that have extremely limited usecases and can also easily be implemented outside of the standard library. Mutable linked lists are a good example of such a dangerous construct, that shouldn’t be part of stdlib.

iconsize · June 14, 2020, 7:37pm

Afaik, Garbage collection uses circular doubly linked lists. Arrays would be a problem to get for due to limited contiguous memory for this amount of data and would require to copy big amounts of data.
Personally, I would favor for (Single+Doubly) Linked Lists, they are traditional and have use cases in certain aspects.
Further, user defined extensions both of them can reuse the implementation in the standard library.

Most users won’t use LinkedLists anyway because listOf calls use the standard implementation, which are ArrayLists.

nguyen-sde · March 5, 2022, 11:11am

Very interesting discussion!

So, my understanding is that:

All actual implementations are still mutable behind the scene for now.
Immutable is just a wrapper of default mutable implementations.
We want to have real immutable implementations for useful data structures in the future. This is still in progress here

hanabi1224 · March 12, 2022, 2:48am

One of the common use cases is to implement ordered HashMap that preserves insertion order of the keys with doubly-linked list LinkedHashMap - Kotlin Programming Language

broot · March 12, 2022, 9:52am

mutableMapOf() already does this.

Topic		Replies	Views
Why kotlin does not provide ArrayDeque implementation? Language Design	3	3742	February 2, 2020
LinkedList in kotlin-stdlib-common Libraries	2	1755	March 14, 2019
Thread safe Data Structures in Kotlin Language Design	7	23942	November 30, 2020
Challanges in game development by Kotlin Language Design	5	920	January 3, 2024
Kotlin Persistent Datastructures?	4	8141	July 8, 2018

Why kotlin does not provide LinkedList implementation?

Related topics