Auto extract final objects from code

sylvainspinelli · August 31, 2022, 11:49am

Imagine a simple function that splits a string:

fun checkedSplit(s: String): List<String>? {
	return if (Regex("^[A-Z/]*$").matches(s)) s.split(Regex("/")) else null
}

This function performs poorly, because compiling a Regex on every call is costly. A better implementation is:

fun checkedSplit(s: String): List<String>? {
	return if (regexCheck.matches(s)) s.split(regexSplit) else null
}

private val regexCheck = Regex("^[A-Z/]*$")
private val regexSplit = Regex("/")

But extracting all these Regex objects by hand is painful for a developer.
In my use cases, these are not simple Regex objects but custom complex patterns, such as CSS selectors and XPath expressions, used in DSL-style Kotlin code, and there are plenty of them.
Extracting all of them makes the DSL code unreadable. Imagine an XSL or CSS file with all the XPath expressions or selectors far from where they are used…

I may have missed a feature in Kotlin. If not, I suggest adding a new keyword like “final” or whatever:

fun checkedSplit(s: String): List<String>? {
	return if (final(Regex(".*")).matches(s)) s.split(final Regex ("/")) else null
}

Under the hood, Kotlin would replace the final expression with a static final field (on the JVM) with a mangled name.
At compile time, Kotlin would have to check that the final expression does not reference any contextual variable (closures…).
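
To make the proposal concrete, here is a hand-written sketch of what such a rewrite might produce for the function above. The property names checkedSplit_final_0 and checkedSplit_final_1 are invented for illustration; a real implementation would pick its own mangling scheme.

```kotlin
// Hypothetical result of the proposed rewrite: each `final` expression is hoisted
// into a private top-level property (a private static final field on the JVM).
// The names below are invented for illustration only.
private val checkedSplit_final_0 = Regex(".*")
private val checkedSplit_final_1 = Regex("/")

fun checkedSplit(s: String): List<String>? =
    if (checkedSplit_final_0.matches(s)) s.split(checkedSplit_final_1) else null
```

The function body keeps the same shape as the source; only the two constructor calls are swapped for property reads.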

endorh · September 3, 2022, 12:23am

While somewhat hacky, you can define your own final function by exploiting the fact that, on the JVM, a Kotlin lambda that captures nothing from its enclosing scope is typically the same instance on every invocation:

fun interface Initializer<out T> {
    operator fun invoke(): T
}

private val cache = mutableMapOf<Initializer<*>, Any>()
@Suppress("UNCHECKED_CAST") // The cast is safe, since the cache is private
fun <T: Any> final(initializer: Initializer<T>): T =
    cache[initializer] as T? ?: initializer().also {
        cache[initializer] = it
    }

With this function, your checkedSplit function could be written as

fun checkedSplit(s: String): List<String>? =
    if (final { Regex(".*/.*") }.matches(s)) s.split(final { Regex("/") }) else null

However, I would advise not doing this, given how sketchy it is. Besides, this implementation still requires a map access, which is slower than a simple property access, in case you’re really serious about performance.
Also, as I’ve mentioned, it won’t work if your initializer requires capturing a variable/parameter from the scope, since closures use a different instance on each invocation.

You can play with this idea, but again, I don’t recommend it.

If your concern is simply caching regex (or XPath/CSS selector) objects, you could just do that directly:

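import org.intellij.lang.annotations.Language // from the org.jetbrains:annotations library

private val regexCache = mutableMapOf<String, Regex>()
// Returns the cached Regex for this pattern, compiling it on first use.
fun regex(@Language("RegExp") s: String) =
  regexCache[s] ?: Regex(s).also { regexCache[s] = it }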

fun checkedSplit(s: String): List<String>? =
    if (regex(".*/.*").matches(s)) s.split(regex("/")) else null

Also, you might want to use a better cache implementation than a plain mutable map.
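
As one possible improvement, here is a minimal sketch that swaps the plain map for a java.util.concurrent.ConcurrentHashMap, so the cache can be shared between threads. It is still unbounded, so it only suits a fixed set of literal patterns; the IDE-only @Language annotation is left out to keep the sketch dependency-free.

```kotlin
import java.util.concurrent.ConcurrentHashMap

// Thread-safe variant of the regex cache: computeIfAbsent runs the
// compilation at most once per key, even under concurrent access.
private val regexCache = ConcurrentHashMap<String, Regex>()

fun regex(pattern: String): Regex =
    regexCache.computeIfAbsent(pattern) { Regex(it) }

fun checkedSplit(s: String): List<String>? =
    if (regex(".*/.*").matches(s)) s.split(regex("/")) else null
```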

Regarding your idea, assuming such a feature were in the language, I’d still consider it a bad practice, since by hiding what should be constant values within function code, you’re just limiting their reusability.

Then again, there are cases when it’s clear you won’t be reusing a value, or you’d prefer their definition to be in the context it’s used, such as the DSL you mention.
There’s a possibility you could redesign said DSL so the declarations of these XPath and CSS selectors are only evaluated once, but that heavily depends on what you’re doing with it, and might not be applicable.
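
A rough sketch of what such a redesign could look like, using an entirely hypothetical mini-DSL (Rule, RuleSetBuilder, ruleSet and transform are invented names, and Regex stands in for the XPath or CSS selector type): the patterns are compiled while the rule set is built, and the rule set itself is built once.

```kotlin
// Hypothetical mini-DSL: selectors are compiled while the rule set is built,
// so applying the rules later never re-parses a pattern.
class Rule(val selector: Regex, val action: (String) -> String)

class RuleSetBuilder {
    private val rules = mutableListOf<Rule>()

    // The pattern string is compiled here, once, at declaration time.
    fun rule(pattern: String, action: (String) -> String) {
        rules += Rule(Regex(pattern), action)
    }

    fun build(): List<Rule> = rules.toList()
}

fun ruleSet(block: RuleSetBuilder.() -> Unit): List<Rule> =
    RuleSetBuilder().apply(block).build()

// Built once at top level; each pattern stays next to the code that uses it.
val rewriteRules = ruleSet {
    rule("[A-Z]+") { it.lowercase() }
    rule("[0-9]+") { "#$it" }
}

// Applies the action of the first rule whose selector matches the whole input.
fun transform(input: String): String =
    rewriteRules.firstOrNull { it.selector.matches(input) }?.action?.invoke(input) ?: input
```

Whether this shape is applicable depends on the DSL: it only helps when the declarations can be separated from the code that runs on every invocation.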

sylvainspinelli · September 3, 2022, 9:31pm

Thank you very much for this detailed response.

I fully agree with you on all these points.

Your first proposal is really interesting, but you are right, it is too fragile: we would need a way to check that nothing is captured from the enclosing scope.

For the last point: yes, I’m in the scenario where reusability is of no interest, and I’m not happy with an unoptimized DSL (it can be executed intensively).

I wonder if it would be possible to define an @Final annotation and implement a compiler plugin for the new IR backend:

@Retention(AnnotationRetention.SOURCE)
@Target(AnnotationTarget.EXPRESSION)
annotation class Final

fun checkedSplit(s: String): List<String>? =
	if ((@Final Regex(".*/.*")).matches(s)) s.split(@Final Regex("/")) else null

The plugin would find this annotation and its associated expression, create a private top-level val property initialized with the expression, and replace the annotated expression with a reference to that property.

With this approach, if the expression is dynamic (parameters used), the compilation will fail.

Can you tell me if it would be feasible?

tlin47 · September 3, 2022, 10:28pm

tl;dr: Good idea, but please use a different keyword, e.g. reused or static.

I think the concept of function-scoped static/reused constants sounds interesting. However, I think “final” is not adequate as a keyword for it. Currently, a local constant “val x” is already “final”, but it is not what you want: the difference you want is the static reusability, that is, the fact that the constant is only created once and then reused.

So a better keyword would be static. But static is already understood to mean “class-related instead of object-related”, which is similar but not the same, so it could be called funstatic, or a completely new keyword would probably be best: initOnce, once or reused.

fun checkedSplit(s: String): List<String>? {
	return if (reused (Regex(".*")).matches(s)) s.split(reused Regex ("/")) else null
}

fun checkedSplit(s: String): List<String>? {
	static val regexCheck = Regex("^[A-Z/]*$")
	static val regexSplit = Regex("/")
	return if (regexCheck.matches(s)) s.split(regexSplit) else null
}

The first version is shorter in total, but the second version simplifies the “complex” part. So either version can be considered more readable than the other, depending on personal taste.

arocnies · September 4, 2022, 12:18am

Has anyone done benchmarking/profiling?
The JVM can optimize away short-lived objects. There may be little or no performance gain from reusing the same object; you don’t know until you measure.

This is close to falling into the trap of premature optimization. How costly short-lived objects are and how much is saved should be known before any language changes are made to Kotlin.

Maybe the solution can be made in the compiler or an alternative form of writing it is “good enough” to avoid adding the feature entirely (good to be lean). Upcoming changes such as namespaces, contexts, and others may also change things. Libraries or object pools should probably be thrown into the mix too.

So first things first, collecting data on the size and scope of the issue.
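
As a starting point for collecting that data, here is a crude timing sketch built on kotlin.system.measureNanoTime. The function names are invented for illustration, and a loop like this is easily distorted by JIT compilation and dead-code elimination, so a dedicated harness such as JMH would be the more trustworthy tool.

```kotlin
import kotlin.system.measureNanoTime

private val hoistedCheck = Regex("[A-Z/]*")
private val hoistedSplit = Regex("/")

// Variant 1: compiles both patterns on every call.
fun splitInline(s: String): List<String>? =
    if (Regex("[A-Z/]*").matches(s)) s.split(Regex("/")) else null

// Variant 2: reuses the precompiled patterns.
fun splitHoisted(s: String): List<String>? =
    if (hoistedCheck.matches(s)) s.split(hoistedSplit) else null

// Crude timer: one warm-up pass, then the average time per call in nanoseconds.
fun averageNanos(iterations: Int, block: () -> Unit): Long {
    repeat(iterations) { block() } // warm-up, gives the JIT a chance to compile
    return measureNanoTime { repeat(iterations) { block() } } / iterations
}

// Returns a one-line comparison of the two variants.
fun compare(iterations: Int = 100_000): String {
    val inlineNs = averageNanos(iterations) { splitInline("A/B/C") }
    val hoistedNs = averageNanos(iterations) { splitHoisted("A/B/C") }
    return "inline: $inlineNs ns/call, hoisted: $hoistedNs ns/call"
}
```

Calling println(compare()) prints both averages; the absolute numbers will vary from machine to machine and run to run.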

endorh · September 4, 2022, 10:58am

That sounds interesting. I think a compiler plugin should be able to perform the transformation you describe.
Unfortunately, I don’t have any first-hand experience with compiler plugins yet, so I can’t tell how complex it might be, but it seems like a fun project, if you’re willing to go that far.

Also, if you’re planning to do it, I’d recommend generating lazy properties, so they aren’t eagerly evaluated in cases where they won’t be used. Then again, I have no idea of how complex it’d be to generate a lazy property with a compiler plugin, and reference it from the function body.
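
For reference, this is roughly what a hand-written lazy property looks like, which is the shape such a plugin would presumably aim to generate. The compileCount counter and the splitPath function are only there to make the deferred initialization visible.

```kotlin
// Counts how many times the pattern is compiled, for demonstration only.
var compileCount = 0

// `by lazy` defers construction until the first read and then caches the value.
private val regexSplit: Regex by lazy {
    compileCount++
    Regex("/")
}

fun splitPath(s: String): List<String> = s.split(regexSplit)
```

The default lazy mode is synchronized, so the initializer runs at most once even when several threads race on the first read.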

As for the dynamic check, you’d probably have to create an IDE companion plugin that’d highlight as errors dynamic values used in @Final annotated expressions. This part is probably easier, as it’d only require you to write a custom inspection that checks all references within the expression belong to a global scope.

arocnies · September 4, 2022, 4:35pm

This use case is the whole point of the object pool pattern.

Not only does it do exactly what OP wants, it’s pretty easy to implement (or use something off the shelf), unlike a compiler plugin.

sylvainspinelli · September 4, 2022, 8:10pm

I agree, final is confusing. once seems preferable to me. Thanks for the idea!

sylvainspinelli · September 4, 2022, 8:19pm

You’re right, I’ll run benchmarks soon, and in any case before I start on a plugin.

As for the object pool pattern, it is only possible if you have an identifier. That is the case with a RegExp (which was just an easily understandable example), but not in my context, where the object to be cached is a complex one without a string representation.

arocnies · September 4, 2022, 10:21pm

A nice thing about doing an object pool or the other creational patterns (and other things like function memoization) is that you get to define your own constraints.

For example, if your objects require special handling before they can be reused, you could have the pool provide access to the objects through a lambda, which forces a scope on the caller and lets the pool reset the objects afterwards.

But that’s just a guess at what one could do. The main point is that you’ve moved the responsibility (and complexity) of creating the objects away from the call site.
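
A minimal sketch of the lambda-scoped pool idea described above, assuming single-threaded use. The Pool class and its create and reset parameters are invented for illustration, and StringBuilder merely stands in for whatever expensive object is being pooled.

```kotlin
// Minimal, single-threaded object pool sketch. `create` builds a new instance when
// the pool is empty; `reset` cleans an instance before it goes back into the pool.
class Pool<T : Any>(private val create: () -> T, private val reset: (T) -> Unit) {
    private val idle = ArrayDeque<T>()

    // Lends an object for the duration of `block`, then resets it and returns it to the pool.
    fun <R> use(block: (T) -> R): R {
        val obj = idle.removeLastOrNull() ?: create()
        try {
            return block(obj)
        } finally {
            reset(obj)
            idle.addLast(obj)
        }
    }
}

// Example: reuse StringBuilder instances instead of allocating one per call.
val builders = Pool<StringBuilder>(create = { StringBuilder() }, reset = { it.setLength(0) })

fun joinUpper(parts: List<String>): String =
    builders.use { sb ->
        parts.forEach { sb.append(it.uppercase()) }
        sb.toString()
    }
```

Because the caller only sees the object inside the lambda, it cannot easily hold on to it after the pool has reset it.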

vach · September 5, 2022, 8:45pm

“But this is painful for a developer to extract all these Regex.”
Why is it painful? It seems like a non-issue to me.

Adding non-standard final behaviour like this makes the language more “magical”: it’s not evident to someone unfamiliar with the language what that expression does or how to access that magically created static field…

IMO a very bad idea; it’s not a problem.
If you are so serious about performance that you care about the difference between a property access and map.get(“static string literal”), then you are going to do so many nasty tricks in your code that this will be hands down the least of your problems. This is coming from someone who works on high-frequency trading software.

jacek.s.gajek · September 13, 2022, 4:51pm

Just a note about the keyword… At least two major strongly typed languages support this (C++ and Visual Basic), and both of them use the static modifier.

But I agree with vach that adding new (especially non-obvious) features to the language is a bad idea. Kotlin is complex enough as it is with DSLs and coroutines, but it is still fairly easy to transform from Java (although going back is virtually impossible). Let’s not break this.
