I came across a problem where I had to compare two lists of objects if they are the same, both lists are not sorted, so using the != operator is not the right way because it compares sorted lists, So I wrote something like:
fun <T> List<T>.isSame(secondList: List<T>): Boolean {
if (size != secondList.size) {
return false
}
return containsAll(secondList)
}
So , I’m wondering what is the best way to do it, sort the list then compare them or is there a better way ?
Another question that I have ('cuz I don’t find this in Kotlin’s docs) : what is the Time/Space complexity of containsAll ?
So you need to verify that both collections have exactly the same items, but ordering doesn’t matter, correct? What about duplicated elements? Assuming the first list contains some item twice, does it mean the second list also has to contain two such items?
Yes, this is what I want , and yes you’re right but I forget to say that lists don’t contain duplicated elements because each element is identified with a unique key.
Then you actually work with sets, not lists. If it is possible, I suggest storing this data as sets in the first place, then you can compare them with simple: set1 == set2. If you have to use list/iterables, then you can easily convert them to sets:
list1.toSet() == list2.toSet()
But it may be a little more performant if converting only the first one:
list1.toSet().containsAll(list2)
Regarding the time complexity. HashSet has constant time complexity for both inserting and looking for an item. Therefore, converting a list to set is O(N) and looking for N elements should be also O(N). So, converting and searching is O(N+M)where N is the size of the first collection and M is the size of the second collection.
BTW, searching in a list is O(N), so containsAll() should be O(N*M) - much more.
If you need to keep items automatically sorted then yes, you can use a sorted set. Just note it is slower than the unsorted hash set - I believe it is O(log(N)) for inserting and searching, while it is O(1) for the hash set.
If you have collections of 15 elements then the time complexity doesn’t really matter. It’s more about overhead and yes, hash sets are generally more complicated and require additional steps. Run your test for 1 million items - this is where the time complexity matters.
Also, if you measured the performance by simply subtracting the time after and before, then this approach gives a very imprecise results (some people would say: useless). Use JMH as a minimum.
Anyway, why do you care so much about it? If you don’t work with big data sets, then you can as well don’t think about it at all. In practice, such operation as above probably takes some nanoseconds.
Ahh, ok, I agree to this. It’s good to understand data structures, their typical use cases, their advantages and disadvantages. I would probably choose set here - not as an optimization, but rather to use proper tools depending on the task
If you want to compare lists for having the same elements but in unspecified order then converting lists to sets is a bad idea. Example:
val list1 = listOf(1, 2, 2, 3, 3)
val list2 = listOf(1, 2, 2, 2, 3)
val set1 = list1.toSet() // [1, 2, 3]
val set2 = list2.toSet() // [1, 2, 3]
You can now probably see the problem. The key point are different object counts. To make it work as intended you don’t need a set, but a multiset or count map. You can use standard library functions to implement this:
fun List<*>.hasSameElements(otherList: List<*>): Boolean {
val counts1 = this.groupingBy { it }.eachCount()
val counts2 = otherList.groupingBy { it }.eachCount()
return counts1 == counts2
}
This is incorrect generalisation. For example when one of the values is very big and the other is very small, then O(n*m) is same as O(n) (assume n is the biggest one). In general when talking about time complexity, O(n*k) is usually considered to be a special case of linear complexity which is equivalent to O(n). Only when we have special case where n == k it turns into quadratic complexity O(n^2).
Since your lists are already loaded in the memory you should not be concerned about space complexity.
And you should start worrying about time complexity when you’re working with very large collections.
Oh, thanks for noting this. In this specific case we expect that both lists have the same size at least sometimes, so I guess we can consider it quadratic, is this correct?
@luhtonen I’m not wondering about space complexity I know, I asked about time complexity, and I agree with your explanation , but I asked as I said to @broot to know which structure to choose next time and why for example , since I know that I dont have duplicated items and I don’t care about the order it will be better to use Set, no ?
Of course the choose of a structure depend on many other parameters.