Using .toString() of data classes as a dependency

bohsen · August 27, 2019, 8:57am

Hi.

I’m working on a project where I have to create a sha1-hash of an object. Calculating the hash isn’t the problem, but I’m worried about the design of it.

The object itself is a data class, and I’m relying on the toString() function to provide the input for the hash function. But is a data class’s toString() set in stone, or could you imagine the implementation changing? If the implementation of toString() changes, the hash of the same object would differ between toString() implementations.

I’m wondering if there might be a better alternative. Should I always override toString() and provide my own implementation, or should I trust that the toString() implementation of data classes will stay the same and avoid unnecessary boilerplate?
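For reference, the setup described above could look roughly like the following. This is only a minimal sketch: the `User` class and the `sha1Hex` helper are hypothetical names chosen for illustration, not code from the original post.

```kotlin
import java.security.MessageDigest

// Hypothetical example type; not taken from the original post.
data class User(val id: Int, val name: String)

// Hex-encoded SHA-1 of a string, using UTF-8 bytes as the hash input.
fun sha1Hex(input: String): String =
    MessageDigest.getInstance("SHA-1")
        .digest(input.toByteArray(Charsets.UTF_8))
        .joinToString("") { "%02x".format(it) }

fun main() {
    val user = User(1, "Ada")
    // Compiler-generated form on current Kotlin versions: User(id=1, name=Ada)
    println(user.toString())
    // The hash depends on every character of that generated string.
    println(sha1Hex(user.toString()))
}
```

The hash is only as stable as the string fed into it, which is exactly the dependency the question is about.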

Prototik · August 27, 2019, 9:40am

I see no reason for further changes to the implementation of a data class’s toString() (at least in the 1.x branch).

The only available formal language specification doesn’t (yet) spell out these implementation details, so technically you cannot hard-depend on the current behavior and must generate some form of stable output yourself. I suggest creating such strings through some sort of serialization (for example JSON from kotlinx.serialization) and calculating the hash from that.

But as I said before, I don’t see any reason to change the current toString(), and I’m 99.9% sure you can just keep using it.

bohsen · August 27, 2019, 10:03am

@Prototik Thx for the input. But is the serialization-solution not prone to the same issue? How would such a solution work?

As for guarding against this, it would probably only require a minimal set of unit tests. So, for what it’s worth, a change should be easy to catch.
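Such a guard could be as small as one check that pins the generated format. The sketch below uses a plain `check` instead of a test framework so it stays self-contained; `Sample` and `verifyToStringFormat` are hypothetical names.

```kotlin
// Hypothetical sample type used only to pin the generated format.
data class Sample(val id: Int, val label: String, val tags: List<String>)

// Throws IllegalStateException if the compiler-generated toString()
// no longer matches the format that was recorded when the hashes were introduced.
fun verifyToStringFormat() {
    val expected = "Sample(id=7, label=x, tags=[a, b])"
    val actual = Sample(7, "x", listOf("a", "b")).toString()
    check(actual == expected) { "data class toString() format changed: $actual" }
}

fun main() {
    verifyToStringFormat()
    println("toString() format unchanged")
}
```

Run as part of the build, this would fail on the first compiler upgrade that alters the format, before any mismatching hashes are produced.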

Prototik · August 27, 2019, 10:08am

Serialization is just a quick way, since libraries for data transfer give strong guarantees about ordering, value representation and so on. Nothing stops you from providing this guarantee yourself (by writing the implementation manually or by using “Generate hashCode/toString methods” from the IDE).
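Providing the guarantee yourself might look like the sketch below: a hand-written encoding with a fixed field order, hashed instead of toString(). The `Person` type, the `canonicalForm` and `stableSha1` names, and the length-prefixed layout are all assumptions made for illustration; any unambiguous, documented layout would do.

```kotlin
import java.security.MessageDigest

// Hypothetical example type.
data class Person(val id: Int, val name: String, val email: String)

// Hand-written encoding that does not depend on the generated toString():
// fields appear in a fixed order, and each value is prefixed with its length
// so that adjacent values cannot run together.
fun Person.canonicalForm(): String =
    listOf(id.toString(), name, email)
        .joinToString("") { "${it.length}:$it;" }

// Hex-encoded SHA-1 of the canonical form (UTF-8 bytes).
fun Person.stableSha1(): String =
    MessageDigest.getInstance("SHA-1")
        .digest(canonicalForm().toByteArray(Charsets.UTF_8))
        .joinToString("") { "%02x".format(it) }

fun main() {
    val p = Person(1, "Ada", "a@b.c")
    println(p.canonicalForm()) // 1:1;3:Ada;5:a@b.c;
    println(p.stableSha1())
}
```

The cost is that the encoding has to be updated by hand whenever a property is added, which is the boilerplate the original question hoped to avoid.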

bohsen · August 27, 2019, 11:41am

Ahhh, now I understand: because JSON is a standard, the JSON representation of my data class would guarantee the same output (unless, of course, my data model were to change).

This would of course create some extra overhead for each generation, but it’s definitely worth considering. Thanks.

jstuyts · August 27, 2019, 12:17pm

I would not be too sure of that. (JSON) objects are basically maps, and you cannot assume a guaranteed order of the keys of a map (unless it is a specialized map that does have that guarantee).

And in JSON there are infinite ways to specify the same value. For example:

{"foo":"bar"}
{ "foo": "bar" }
{
    "foo": "bar"
}
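To make the consequence for hashing concrete, here is a small sketch that hashes the three spellings above. The `sha1Hex` helper is a hypothetical name; the JSON strings are the ones from this post.

```kotlin
import java.security.MessageDigest

// Hex-encoded SHA-1 of a string, using UTF-8 bytes as the hash input.
fun sha1Hex(input: String): String =
    MessageDigest.getInstance("SHA-1")
        .digest(input.toByteArray(Charsets.UTF_8))
        .joinToString("") { "%02x".format(it) }

fun main() {
    // Three texts that all represent the same JSON value.
    val spellings = listOf(
        "{\"foo\":\"bar\"}",
        "{ \"foo\": \"bar\" }",
        "{\n    \"foo\": \"bar\"\n}"
    )
    val hashes = spellings.map(::sha1Hex)
    hashes.forEach(::println)
    // Number of distinct hashes: 3, one per spelling.
    println(hashes.toSet().size)
}
```

The hash function sees bytes, not JSON values, so equivalent documents only hash the same if the serializer always emits the same bytes.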

bohsen · August 27, 2019, 12:27pm

jstuyts wrote: “I would not be too sure of that. (JSON) objects are basically maps, and you cannot assume a guaranteed order of the keys of a map (unless it is a specialized map that does have that guarantee).”

I’ve been thinking the same, and I was planning to put this to the test using kotlinx.serialization.

jstuyts wrote: “And in JSON there are infinite ways to specify the same value. For example:”
{"foo":"bar"}
{ "foo": "bar" }
{
    "foo": "bar"
}

I believe turning off pretty-printing and just using compact JSON should get rid of this problem.

jstuyts · August 27, 2019, 12:45pm

But then you would need the same guarantee for the serializer as you hoped to get for toString(). As long as the serializer has some freedom when generating the JSON, you would need somebody to guarantee you that the output is stable.

Wasabi375 · August 27, 2019, 12:51pm

Let’s be honest though. I don’t think anyone really believes that the toString implementation for data classes will change in the near future.
I haven’t seen any discussion about changing anything there and I can’t come up with any argument why you would want to. So unless you need a 100% guarantee that your implementation will work with every kotlin version for the next 10,000 years toString should be fine.
Otherwise I suggest you override toString() yourself just to be sure. I guess you could also use an annotation processor to generate a function that builds the string if writing it manually is too much work, but that is an extreme solution.

bohsen · August 27, 2019, 1:18pm

jstuyts wrote: “But then you would need the same guarantee for the serializer as you hoped to get for toString(). As long as the serializer has some freedom when generating the JSON, you would need somebody to guarantee you that the output is stable.”

@jstuyts Well, I believe I more or less have that guarantee as long as the JSON has to be valid: the output would be stable within the bounds of the JSON specification. My actual concern with the serializer solution is (from experience) that serializers always require complex configuration to work on even mildly complex objects. They never work out of the box.

Wasabi375 wrote: “So unless you need a 100% guarantee that your implementation will work with every kotlin version for the next 10,000 years toString should be fine.”

@Wasabi375 My thought exactly.

Thanks for all the input. Mighty nice of you.

alexey_e · August 29, 2019, 9:32am

If you are worried about the implementation and it is critical for you, then implement it yourself.
