Time complexity of containsAll

I came across a problem where I had to compare two lists of objects if they are the same, both lists are not sorted, so using the != operator is not the right way because it compares sorted lists, So I wrote something like:

fun <T> List<T>.isSame(secondList: List<T>): Boolean {
    if (size != secondList.size) {
        return false
    }
    return containsAll(secondList)
}

So , I’m wondering what is the best way to do it, sort the list then compare them or is there a better way ?

Another question that I have ('cuz I don’t find this in Kotlin’s docs) : what is the Time/Space complexity of containsAll ?

So you need to verify that both collections have exactly the same items, but ordering doesn’t matter, correct? What about duplicated elements? Assuming the first list contains some item twice, does it mean the second list also has to contain two such items?

1 Like

Yes, this is what I want , and yes you’re right but I forget to say that lists don’t contain duplicated elements because each element is identified with a unique key.

Then you actually work with sets, not lists. If it is possible, I suggest storing this data as sets in the first place, then you can compare them with simple: set1 == set2. If you have to use list/iterables, then you can easily convert them to sets:

list1.toSet() == list2.toSet()

But it may be a little more performant if converting only the first one:

list1.toSet().containsAll(list2)

Regarding the time complexity. HashSet has constant time complexity for both inserting and looking for an item. Therefore, converting a list to set is O(N) and looking for N elements should be also O(N). So, converting and searching is O(N+M)where N is the size of the first collection and M is the size of the second collection.

BTW, searching in a list is O(N), so containsAll() should be O(N*M) - much more.

2 Likes

Ok I see, since I don’t have duplicate elements , set is better.

I think in this case , instead of using toSet I can use sortedSet no ?

If you need to keep items automatically sorted then yes, you can use a sorted set. Just note it is slower than the unsorted hash set - I believe it is O(log(N)) for inserting and searching, while it is O(1) for the hash set.

1 Like

To test your idea, I wrote something like this : Since I have to use list

fun <T> List<T>.isSame(secondList: List<T>): Boolean {

    if (size != secondList.size) {
        return false
    }
    return containsAll(secondList)
}

fun <T> List<T>.isSameUsingSet(secondList: List<T>): Boolean {

    return toSet() == secondList.toSet()
}

I declared 2 lists :

val firstList = listOf(1,2,3,4,5,6,7,8,9,10,11,12,13,14,15)
val secondList = listOf(5,4,3,2,1,10,9,8,7,6,11,12,13,14,15)

But i was surprised when I tested the 2 functions :

time for using isSame() in millis: 1
time for using isSameUsingSet() in millis: 4

Is it because the converting ?

If you have collections of 15 elements then the time complexity doesn’t really matter. It’s more about overhead and yes, hash sets are generally more complicated and require additional steps. Run your test for 1 million items - this is where the time complexity matters.

Also, if you measured the performance by simply subtracting the time after and before, then this approach gives a very imprecise results (some people would say: useless). Use JMH as a minimum.

Anyway, why do you care so much about it? If you don’t work with big data sets, then you can as well don’t think about it at all. In practice, such operation as above probably takes some nanoseconds.

1 Like

Totally agree, I have tested on 1 million and the result is amazing :slight_smile:

time for using isSame() in millis: 721088
time for using isSameUsingSet() in millis: 152

Even me I’ll say it’s useless and agree too for using JMH, I just used time subtracting for quick test

I care about this, 'cause I want to know next time which structure to choose and why.

Thanks :+1:

1 Like

Ahh, ok, I agree to this. It’s good to understand data structures, their typical use cases, their advantages and disadvantages. I would probably choose set here - not as an optimization, but rather to use proper tools depending on the task :slight_smile:

2 Likes

If you want to compare lists for having the same elements but in unspecified order then converting lists to sets is a bad idea. Example:

val list1 = listOf(1, 2, 2, 3, 3)
val list2 = listOf(1, 2, 2, 2, 3)
val set1 = list1.toSet() // [1, 2, 3]
val set2 = list2.toSet() // [1, 2, 3]

You can now probably see the problem. The key point are different object counts. To make it work as intended you don’t need a set, but a multiset or count map. You can use standard library functions to implement this:

fun List<*>.hasSameElements(otherList: List<*>): Boolean {
	val counts1 = this.groupingBy { it }.eachCount()
	val counts2 = otherList.groupingBy { it }.eachCount()
	return counts1 == counts2
}

I started the whole discussion by asking about duplicated items and OP answered that items are guaranteed to be unique.

1 Like

@madmax1028 I see the problem thanks :+1: , but I told @broot that items are unique

This is incorrect generalisation. For example when one of the values is very big and the other is very small, then O(n*m) is same as O(n) (assume n is the biggest one). In general when talking about time complexity, O(n*k) is usually considered to be a special case of linear complexity which is equivalent to O(n). Only when we have special case where n == k it turns into quadratic complexity O(n^2).

Since your lists are already loaded in the memory you should not be concerned about space complexity.
And you should start worrying about time complexity when you’re working with very large collections.

Oh, thanks for noting this. In this specific case we expect that both lists have the same size at least sometimes, so I guess we can consider it quadratic, is this correct?

2 Likes

@luhtonen I’m not wondering about space complexity I know, I asked about time complexity, and I agree with your explanation :+1: , but I asked as I said to @broot to know which structure to choose next time and why for example , since I know that I dont have duplicated items and I don’t care about the order it will be better to use Set, no ?

Of course the choose of a structure depend on many other parameters.

Yes :+1: