Time complexity of containsAll

Zinou · February 12, 2022, 10:00pm

I came across a problem where I had to check whether two lists of objects contain the same elements. Neither list is sorted, so the == / != operators are not the right tool, because they compare lists element by element, in order. So I wrote something like:

fun <T> List<T>.isSame(secondList: List<T>): Boolean {
    if (size != secondList.size) {
        return false
    }
    return containsAll(secondList)
}

So I’m wondering what the best way to do this is: sort the lists and then compare them, or is there a better way?

Another question (because I can’t find this in Kotlin’s docs): what is the time/space complexity of containsAll?

broot · February 12, 2022, 10:30pm

So you need to verify that both collections have exactly the same items, but ordering doesn’t matter, correct? What about duplicated elements? Assuming the first list contains some item twice, does it mean the second list also has to contain two such items?

Zinou · February 12, 2022, 10:36pm

Yes, this is what I want, and yes, you’re right, but I forgot to say that the lists don’t contain duplicated elements, because each element is identified by a unique key.

broot · February 12, 2022, 11:05pm

Then you actually work with sets, not lists. If it is possible, I suggest storing this data as sets in the first place; then you can compare them with a simple set1 == set2. If you have to use lists/iterables, then you can easily convert them to sets:

list1.toSet() == list2.toSet()

But it may be a little more performant to convert only the first one (keep the size check from your isSame, otherwise this only tests that list2 is contained in list1):

list1.toSet().containsAll(list2)

Regarding the time complexity: HashSet has (on average) constant time complexity for both inserting and looking up an item. Therefore, converting a list of N elements to a set is O(N), and looking up M elements is O(M). So converting and searching is O(N+M), where N is the size of the first collection and M is the size of the second collection.

BTW, searching in a list is O(N), so containsAll() should be O(N*M) - much more.
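
A minimal sketch of the set-based check described above, assuming the elements are unique and have consistent equals()/hashCode(). The name hasSameItems is made up for illustration:

```kotlin
// Hypothetical helper name. Assumes neither list contains duplicates
// and that T implements equals()/hashCode() consistently.
fun <T> List<T>.hasSameItems(other: List<T>): Boolean {
    // Lists of different sizes can never match; this check is O(1).
    if (size != other.size) return false
    // Building the hash set is O(N) on average; each of the M lookups
    // is O(1) on average, so the whole check is O(N+M).
    return toSet().containsAll(other)
}
```

With unique elements and equal sizes, "every element of other is in this" is enough to conclude the two lists hold the same items.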

Zinou · February 13, 2022, 8:12am

Ok I see, since I don’t have duplicate elements, a set is better.

I think in this case, instead of using toSet I could use a sorted set, no?

broot · February 13, 2022, 8:19am

If you need to keep items automatically sorted then yes, you can use a sorted set. Just note it is slower than the unsorted hash set - I believe it is O(log(N)) for inserting and searching, while it is O(1) for the hash set.
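
For comparison, a sketch of the sorted-set variant, assuming Kotlin/JVM (where toSortedSet() builds a java.util.TreeSet) and Comparable elements. The function name is hypothetical:

```kotlin
// Sketch only: on Kotlin/JVM, toSortedSet() returns a java.util.SortedSet
// backed by a TreeSet, so the elements must be Comparable.
fun <T : Comparable<T>> List<T>.hasSameItemsSorted(other: List<T>): Boolean {
    // Each insertion is O(log N), so building each set costs O(N log N),
    // compared with O(N) on average for a hash set.
    return toSortedSet() == other.toSortedSet()
}
```

This only pays off if the sorted order is needed for something else afterwards.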

Zinou · February 13, 2022, 11:40am

To test your idea, I wrote something like this (since I have to use lists):

fun <T> List<T>.isSame(secondList: List<T>): Boolean {

    if (size != secondList.size) {
        return false
    }
    return containsAll(secondList)
}

fun <T> List<T>.isSameUsingSet(secondList: List<T>): Boolean {

    return toSet() == secondList.toSet()
}

I declared 2 lists :

val firstList = listOf(1,2,3,4,5,6,7,8,9,10,11,12,13,14,15)
val secondList = listOf(5,4,3,2,1,10,9,8,7,6,11,12,13,14,15)

But I was surprised when I tested the two functions:

time for using isSame() in millis: 1
time for using isSameUsingSet() in millis: 4

Is it because of the conversion?

broot · February 13, 2022, 12:25pm

If you have collections of 15 elements then the time complexity doesn’t really matter. It’s more about overhead and yes, hash sets are generally more complicated and require additional steps. Run your test for 1 million items - this is where the time complexity matters.

Also, if you measured the performance by simply subtracting the time before from the time after, then this approach gives very imprecise results (some people would say: useless). Use JMH as a minimum.

Anyway, why do you care so much about it? If you don’t work with big data sets, then you might as well not think about it at all. In practice, an operation like the one above probably takes some nanoseconds.
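
If a quick look is all that is needed, a rough sketch of such a comparison on larger lists could look like the following. It is not a real benchmark: it uses measureTimeMillis from the standard library instead of JMH, has no warm-up, and roughTimings is a made-up name:

```kotlin
import kotlin.system.measureTimeMillis

// Rough timing sketch, NOT a proper benchmark (no JMH, no warm-up).
// Returns (list-based millis, set-based millis) for two lists of n unique Ints.
fun roughTimings(n: Int): Pair<Long, Long> {
    val first = (1..n).toList()
    val second = first.shuffled()

    val listMillis = measureTimeMillis {
        // O(N*M): every lookup scans the list from the start.
        check(first.size == second.size && first.containsAll(second))
    }
    val setMillis = measureTimeMillis {
        // O(N+M) on average: building hash sets plus hash lookups.
        check(first.toSet() == second.toSet())
    }
    return listMillis to setMillis
}
```

Calling roughTimings with a small n mostly measures overhead; the gap between the two numbers should only become visible as n grows.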

Zinou · February 13, 2022, 1:13pm

Totally agree, I have tested with 1 million items and the result is amazing:

time for using isSame() in millis: 721088
time for using isSameUsingSet() in millis: 152

I’d also say it’s useless, and I agree about using JMH; I just used time subtraction for a quick test.

I care about this because I want to know next time which structure to choose and why.

Thanks

broot · February 13, 2022, 1:20pm

Ahh, ok, I agree with this. It’s good to understand data structures, their typical use cases, their advantages and disadvantages. I would probably choose a set here - not as an optimization, but rather to use the proper tool for the task.

madmax1028 · February 13, 2022, 3:08pm

If you want to compare lists for having the same elements but in unspecified order then converting lists to sets is a bad idea. Example:

val list1 = listOf(1, 2, 2, 3, 3)
val list2 = listOf(1, 2, 2, 2, 3)
val set1 = list1.toSet() // [1, 2, 3]
val set2 = list2.toSet() // [1, 2, 3]

You can now probably see the problem. The key point is that the object counts differ. To make it work as intended you don’t need a set, but a multiset or count map. You can use standard library functions to implement this:

fun List<*>.hasSameElements(otherList: List<*>): Boolean {
	val counts1 = this.groupingBy { it }.eachCount()
	val counts2 = otherList.groupingBy { it }.eachCount()
	return counts1 == counts2
}
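
As a small illustration of the difference on the two lists above (countsOf is a hypothetical helper that repeats the same groupingBy/eachCount idea so the snippet stands alone):

```kotlin
// Hypothetical helper: maps each distinct element to its number of occurrences.
fun <T> countsOf(items: List<T>): Map<T, Int> = items.groupingBy { it }.eachCount()

val list1 = listOf(1, 2, 2, 3, 3)
val list2 = listOf(1, 2, 2, 2, 3)

// The sets are equal even though the lists hold different numbers of 2s and 3s.
val equalAsSets = list1.toSet() == list2.toSet()        // true
// The count maps differ: {1=1, 2=2, 3=2} vs {1=1, 2=3, 3=1}.
val equalAsCounts = countsOf(list1) == countsOf(list2)  // false
```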

broot · February 13, 2022, 3:14pm

I started the whole discussion by asking about duplicated items and OP answered that items are guaranteed to be unique.

Zinou · February 13, 2022, 10:26pm

@madmax1028 I see the problem, thanks, but I told @broot that the items are unique.

luhtonen · February 14, 2022, 10:37am

This is an incorrect generalisation. For example, when one of the values is very big and the other is very small, O(n*m) is effectively the same as O(n) (assuming n is the bigger one). In general, when talking about time complexity, O(n*k) with a small, bounded k is usually treated as a special case of linear complexity, equivalent to O(n). Only in the special case where n == k does it turn into quadratic complexity, O(n^2).

Since your lists are already loaded in memory, you should not be concerned about space complexity.
And you should start worrying about time complexity only when you’re working with very large collections.

broot · February 14, 2022, 10:47am

Oh, thanks for noting this. In this specific case we expect that both lists have the same size at least sometimes, so I guess we can consider it quadratic, is this correct?

Zinou · February 14, 2022, 11:12am

@luhtonen I’m not worried about space complexity; I asked about time complexity. I agree with your explanation, but as I said to @broot, I asked so that I know which structure to choose next time and why. For example, since I know that I don’t have duplicated items and I don’t care about the order, it will be better to use a Set, no?

Of course, the choice of a structure depends on many other parameters.

Zinou · February 14, 2022, 11:13am

Yes
