Compound extension

In the context of a receiver scope I can introduce an extension to another type, such as a unary plus operator, but there appears to be no way for an extension to make one type extend another. For example, consider the following overly simple HTML DSL,

class Body { fun write(s: String) { ... } fun escape(s: String): String { ... } }

fun Body.div(block: Body.() -> Unit) { write("<div>"); block(); write("</div>") }
fun Body.span(block: Body.() -> Unit) { write("<span>"); block(); write("</span>") }

To add a unary + I cannot use an extension like I did for div and span; I need to add it to Body like this:

class Body {
  fun write(s: String) { ... }
  fun escape(s: String): String { ... }
  operator fun String.unaryPlus() = escape(this)
}

It would be nice to be able to write something like,

fun Body.String.unaryPlus() = escape(this)
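
For reference, here is how the DSL reads at the use site once the member-extension workaround above is in place. This is a minimal, self-contained sketch: the body(...) entry point, the StringBuilder plumbing and the escaping logic are made up for illustration.

class Body {
    private val sb = StringBuilder()
    fun write(s: String) { sb.append(s) }
    fun escape(s: String) = s.replace("&", "&amp;").replace("<", "&lt;")
    // today's workaround: the member extension has to live inside Body
    operator fun String.unaryPlus() = write(escape(this))
    override fun toString() = sb.toString()
}

fun Body.div(block: Body.() -> Unit) { write("<div>"); block(); write("</div>") }
fun Body.span(block: Body.() -> Unit) { write("<span>"); block(); write("</span>") }

fun body(block: Body.() -> Unit): Body = Body().apply(block)

fun main() {
    val page = body {
        div {
            span { +"Hello & welcome" }
        }
    }
    println(page) // <div><span>Hello &amp; welcome</span></div>
}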

What you are talking about is called nested contexts and is considered a very important feature by part of the community (myself in particular). Please upvote it here: https://youtrack.jetbrains.com/issue/KT-10468.


Is there a formal design proposal for this?

I believe that the issue is the best proposal there is. It would be good to discuss any alternative propositions here.

I started a formal proposal here KEEP/compound-extensions.md at compound-extension · chuckjaz/KEEP · GitHub.

It differs from the discussion in KT-10468 in that it (1) requires no syntax changes, (2) is slightly less powerful, as it doesn’t simultaneously extend the cartesian product of the types, and (3) doesn’t handle providing a default parameter.


The syntax is problematic since it confuses receiver names with the normal namespace. The KT-10468 proposal solves this by introducing new syntax that clearly separates the namespace from receivers; it also explicitly shows the number of receivers and their generic parameters. I am not saying that it should be accepted, just that the simple dot notation is not good. In my opinion, the feature is very important for the language’s future and should be implemented with utmost care.

I don’t understand how to add comments to the proposal, but I want to point out an additional problem: nested contexts need to be available in function types as well, something like (A, B).(C) -> D.
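
To illustrate why multi-receiver function types matter, here is a sketch of a typical higher-order function that wants to hand two contexts to its block; today the second context has to be passed as an ordinary parameter, or the lambdas have to be nested. The names Tx, Logger and withTxAndLogger are hypothetical.

class Tx
class Logger { fun log(msg: String) = println(msg) }

// Today a function type can have only one receiver, so the second context
// is passed as a plain parameter:
fun <T> withTxAndLogger(block: Tx.(Logger) -> T): T = Tx().block(Logger())

// With nested contexts the parameter could instead be declared
// (hypothetically) as block: (Tx, Logger).() -> T, making both available as receivers.

fun main() {
    withTxAndLogger { logger -> logger.log("inside transaction") }
}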

Hey, great to see some formal proposal on this topic!
I’d support @darksnake on the syntax point though. With the current proposal nested classes could cause ambiguity. E.g.:

class A {
    class B
}
class B
fun A.B.doSmth() = Unit

Would this be a function with two receivers or a function with one receiver? The syntax with parentheses avoids this issue, as it distinguishes fun A.B.doSmth() from fun (A, B).doSmth().


By the way, square brackets are probably better than round ones, since they are hard to miss and are syntactically associated with arrays or lists.

Am I missing something, or did you not do a pull request with the proposal on the main repo yet? If so, I don’t see a reason not to; I believe it would help with discoverability and hearing other thoughts 🙂

I didn’t do the pull request because it needs more work before I submit it. If you believe it would make the proposal easier to discuss before I complete it, I will create the pull request.

As for the ambiguity, I did address that in the proposal in the “lookup rules” section. It is ambiguous, but the ambiguities are easily resolved.

Take the example,

class A {
  class B
}
class B

fun A.B.doSmth() = Unit

that extends the A.B class, not the B class. If you want to extend B instead of A.B you would need to introduce a type alias,

typealias GlobalB = B

fun A.GlobalB.doSmth() = Unit

As this type of ambiguity would be rare, it doesn’t justify introducing new syntax; any language will have ambiguities that can only be resolved by context. It is context-driven syntax that should be avoided, and my proposal avoids it.

As for function types, I need to address that in the proposal directly, but the syntax for a function type that extends A and B is A.B.(C)->D. The (A, B).(C)->D syntax would require look-ahead at the ( to distinguish the syntactic forms.

We had a discussion with @orangy yesterday about this feature and he expressed some concerns about its implementation. Here are some of my thoughts about it. Feel free to add them to the KEEP.

Problem: Basic syntax.
Solution: I do not think it is a blocker. I still think that [A,B].func is much more concise. I will use it for further examples.

Problem: Functional type syntax.
Solution: In the syntax proposed above it will look like [A,B].(C) -> D, which is concise and does not introduce ambiguities.

Problem: Context order. We can have two situations like with(A){with(B){...}} and with(B){with(A){...}}. I think this is the primary problem.
Solution 1: Make order matter, which will mean that [A, B].func and [B, A].func are two different functions. I thought about that and it seems like this solution will bring a lot of confusion, so we can reject it for now.
Solution 2: Forget about order. Assume that receivers are a set, not a list. This means that introducing both [A, B].func and [B, A].func in the same path will raise a compile-time error. I think we should go for that solution.
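
For reference, a small runnable sketch of how call-site order already matters today: with plain members the innermost receiver shadows the outer one (the classes here are made up for illustration).

open class Base { open fun who() = "Base" }
class A : Base() { override fun who() = "A" }
class B : Base() { override fun who() = "B" }

fun main() {
    val a = A()
    val b = B()
    // The innermost receiver wins, so the two nesting orders behave differently:
    with(a) { with(b) { println(who()) } }  // prints B
    with(b) { with(a) { println(who()) } }  // prints A
}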

Problem: Resolution rules. Suppose we have an ordered list of contexts at the function call site, from outer to inner, like G, A, B, C, where G is the global context corresponding to a Kotlin file or a top-level function. The global context can be ignored since it does not affect resolution (in fact, on the JVM the global context does make sense, since it could correspond to different classloaders). Now we need to establish rules for extension resolution (assuming that the order of receivers is ignored).
Solution: The obvious solution is to make all functions like [A, B, C], [A, B], [B, C] and [A, C] available inside C. Order does not matter, so contexts are resolved by type and injected into the function. We need to think about it and see if there are any drawbacks to that.

Problem: Which contexts can be distinguished by type? Can we assume A<T1> to be different from A<T2>? I need some input on that.

Problem: What to do with a situation like G, A, B, A and a function like [A,B].
Solution 1: Take only the last two members of the context list. It could create ambiguities.
Solution 2: Raise a compile-time error in this case and force the user to use type aliases to differentiate. In my personal opinion this is better. The situation is rare and ambiguous as it is (you do not know which this you use).

Problem?: @orangy mentioned type inference. For now I do not see a problem with type inference. For extension functions the receiver type is always declared explicitly. Functional types with receivers also require explicit types.

Problem: The this reference. It requires further discussion. If we assume that several contexts of the same type are not allowed, this could be uniquely reconstructed by type, but there could be some problems with that.

To be continued…


Some additional comments from @orangy (in my interpretation):

Class loader has nothing to do with language.

I agree. Just want to get some global perspective and bring some kind of theory under the issue.

Need to explicitly state what happens in case of clashing signatures like [A,B] and [B,A]

Well, I think that a compiler error is explicit enough. Of course the details should come after that.

[A].f should work exactly the same way as A.f. This is not the case if [A,B,A] throws exception

It is a very good remark. It seems like a compile-time error is out of the question, so we should always take the latest context of the given type as the receiver for the function and give a warning if there is an ambiguity. I think that warning is a good idea even apart from the problem being discussed.

Inference could occur for example for some complicated generic receiver type.

We need to discuss specific examples. For now I do not see a principal problem: if we can infer a type for one receiver, we probably can do it for two. A problem could occur when one receiver type depends on another, like [T, S<T>]. I do not have a good enough understanding of the type inference process to tell whether it is easy or not. In any case, for an experimental feature we could limit inference.


One ambiguity with the [A,B].(C)->D syntax is that it implies (and the context-order solution 2 explicitly states) that the symbol introduced extends both A and B simultaneously. That is, given,

fun [A,B].doSomething() {...}

fun t() {
  val a: A = ...
  val b: B = ...

  with (a) {
    b.doSomething()
  }

  with (b) {
   a.doSomething()
  }
}

both are legal and resolve to the same method.
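
For contrast, the closest thing expressible today is a member extension, and there the two calls are not symmetric. A small runnable sketch:

class A {
    // today's two-receiver form: dispatch receiver A, extension receiver B
    fun B.doSomething() = println("A receives B")
}
class B

fun t(a: A, b: B) {
    with(a) { b.doSomething() }      // resolves: A is implicit, B is explicit
    // with(b) { a.doSomething() }   // does not resolve today: doSomething is not an extension of A
}

fun main() = t(A(), B())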

Consider the unary + operator in the original post, expressed as,

fun [Body, String].unaryPlus() = escape(this)
  1. It is unclear which this is being referenced here.
  2. It is unclear which type receives the unary + operator.

If order were to matter in the declaration (taking solution 1 over solution 2) then the order can be interpreted as nested extension, which would change the above to, informally, extend Body with the member fun String.unaryPlus(). If more than one receiver were present, such as fun [A, B, C, D].doSmth() {}, it would mean, informally, extend A with fun [B, C, D].doSmth() {}, which extends B with fun [C, D].doSmth() {}, which extends C with fun D.doSmth() {}. The advantage of this is that it only affects how the collection of extension methods is gathered. Once the final fun D.doSmth() {} is collected, it is resolved exactly as if the extension had been declared inside C.

This also gives a clear meaning to fun [A, B, A].doSomething(), which is an idiomatic way to declare a contextual method of A. That is, doSomething() is only present when A and B are part of the receiver scope. So for example,

class A { }
class B { }

fun [A, B, A].doSomething() {}

fun t() {
  var a = A()
  var b = B()
  a.doSomething() // Error, doSomething() not resolved
  with (b) {
    a.doSomething() // doSomething not resolved.
  }
  with (a) {
    with (b) {
      doSomething() // doSomething binds to the above declaration
    }
  }
  with (b) {
    with (a) {
      doSomething() // doSomething binds to the above declaration
    }
  }
}

This interpretation means that fun [A].doSomething() is identical to fun A.doSomething(). It also answers the inference question in that it only changes how extensions are collected in scope, not how they affect inference. That is, once fun D.doSmth() {} is collected, current inference rules are sufficient.

I am not sure I understand why with(a){with(b){}} should match [A, B, A]. If we take the ordered strategy, then we should always match the tail of the actual context order against the function’s receiver order. In your example the context order is G, A, B but the signature is [A, B, A], so it should not work. It could work in type-based resolution, but I do not think we should mix them.

After some thinking, I came to the conclusion that order-based resolution is not that bad. Maybe if we did everything from scratch it would still be much better to use type-based resolution, but order-based resolution better matches the current design.


Consider the following,

class A {}
class B {}
class C {}

fun [A, B, C].doSomething()

fun t() {
  var a = A()
  var b = B()
  var c = C()
  with (a) {
    with (b) {
      c.doSomething()
    }
  }
}

from a resolution perspective it is like,

class A {}
class B {}
class C {}

fun t() {
  var a = A()
  var b = B()
  var c = C()
  with (a) {
    fun [B, C].doSomething()
    with (b) {
      c.doSomething()
    }
  }
}

which in turn is like,

class A {}
class B {}
class C {}

fun t() {
  var a = A()
  var b = B()
  var c = C()
  with (a) {
    with (b) {
      fun C.doSomething()
      c.doSomething()
    }
  }
}

that is, when a scope is opened as a receiver the extension is introduced in that scope. This is simulated above by introducing it as a local function.
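
The simulation is in fact expressible in today’s Kotlin, since a local extension function already sees the enclosing receivers. A minimal runnable sketch (fromA and fromB are made-up marker members):

class A { fun fromA() = println("from A") }
class B { fun fromB() = println("from B") }
class C

fun t(a: A, b: B, c: C) {
    with(a) {
        with(b) {
            // a local extension function can use both enclosing receivers
            fun C.doSomething() { fromA(); fromB() }
            c.doSomething()
        }
    }
}

fun main() = t(A(), B(), C())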

Given this, the meaning of fun [A, B, A].doSomething() can be understood as,

class A {}
class B {}

fun [A, B, A].doSomething() {}

fun t() {
  var a = A()
  var b = B()
  with (a) {
    with (b) {
      a.doSomething()
    }
  }
}

which is like,

class A {}
class B {}

fun t() {
  var a = A()
  var b = B()
  with (a) {
    fun [B, A].doSomething() {}
    with (b) {
      a.doSomething()
    }
  }
}

which is like,

class A {}
class B {}

fun t() {
  var a = A()
  var b = B()
  with (a) {
    with (b) {
      fun A.doSomething() {}
      a.doSomething()
    }
  }
}

As an extension can itself be a member of a type, it makes sense to allow extending a type with an extension. The meaning of the extension, and how multiple-type extensions work, falls out of this recursive definition. In other words, it doesn’t change extensions; it just allows types to be extended with an extension just like other members.
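
Today this recursion already exists, but only to depth two: a class member can be an extension, yet that member extension cannot itself declare a further receiver. A brief sketch of the existing base case:

class B
class A {
    // the base case that exists today: an extension declared as a member of A
    fun B.fromInsideA() = println("extension of B, member of A")
    // going one level deeper, i.e. a member extension that itself extends yet
    // another type, is what the proposal would add; it cannot be written today
}

fun main() {
    with(A()) { B().fromInsideA() }
}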

Your examples do not answer the question about the resolution strategy. I understand that calling a.doSomething() is equivalent to calling doSomething(), but leaving it all up to the compiler could make everything very complicated. Consider a situation where B is a subclass of A; then the function [A, B].f will shadow [B].f. It is probably possible to manage this problem, but then we could not expect this feature anytime soon. As I already said, you are trying to mix the type-based and order-based approaches, which probably won’t end well.

What I had in mind is to count explicit lexical scopes only, meaning that if we have a receiver chain of types we bind the method if we find a subchain (closest to the end) exactly matching our type list. The compiler can do this already (it must somehow calculate it in order to work with extensions). The rules could be relaxed later to allow implicit type duplication without breaking existing code, but that would require a lot of work. For now, I think, we need to force the user to explicitly write a.doSomething() in a G, A, B scope and resolve the method otherwise.

You have a point that function definitions can have a context of their own. Luckily for us it all still fits the scheme. We just need to remember that a function with signature [C,D] defined in context G, A, B (I keep the letter G everywhere to mark where the root context is) will in fact have signature [A, B, C, D].

I think the resolution rules should strive to be consistent with the current strategy for multiple receiver functions:

interface A {
    fun B.foo() {
        println("dispatch receiver is ${this@A}, extension receiver is ${this}")
    }
}

interface B

data class C(val name: String) : A, B

fun main() {
    val c1 = C("C1")
    val c2 = C("C2")

    with(c1) { foo() }
    with(c1) { with(c2) { foo() } }
    with(c2) { with(c1) { foo() } }
}

This example is about this resolution, not about function dispatch, so I do not think it answers the dilemma.

This example shows that this refers to the latest receiver, while this@A refers to the latest receiver matching type A. We can achieve that in both cases. When we resolve a method we will have a list of the actual types it is called upon, like G, A, (B: A), and if we use this@A it could still return the B instance, because resolution works on actual types, not dispatch types.

I think a successful proposal should not modify the current meaning of an extension or provide a different way to declare identical behavior. This means, specifically,

fun [A].doSomething() {}

should be identical to,

fun A.doSomething() {}

I am having difficulty establishing this identity given your description of the lexical-scopes approach.

For example,

fun [B].doSomething() {}

with (b) {
  with (c) {
    doSomething()
  }
}

should resolve, but it is unclear that it does under a lexical approach, as it seems you would need a fun [B, C].doSomething() declaration for it to resolve.
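
For comparison, a plain extension already resolves in exactly that nesting today, which is the behavior [B] would have to reproduce. A runnable sketch:

class B
class C

fun B.doSomething() = println("resolved on the B receiver")

fun main() {
    val b = B()
    val c = C()
    with(b) {
        with(c) {
            doSomething()  // resolves today even though C is the innermost receiver
        }
    }
}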

My proposition works just as expected in your example. We match not the tail of the scope sequence but the latest matching subsequence. Let me try to explain it again.

Consider that we have a sequence like G, A1, B, C, A2 (where A1 and A2 are both receivers of type A). Now consider a few functions and whether they bind in this scope:

  • [A]. will be bound because we have A in the sequence. It is not the last, but it does not matter.

  • [A, B]. and [A, B, C]. will be bound for the same reason.

  • [C, A]. will be bound because we have a subsequence of C, A2.

  • [C, B]. won’t be bound because of the order mismatch (if we stick to the order-based approach).

  • [A, B, A]. also won’t bind because the second A is not in the right position; you will have to explicitly say a.doSomething() for it to work.

So, as you see, all the classic cases work as you expect; the only confusion arises in complicated cases with repeating types.
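
To make the rule concrete, here is a small sketch of the matching as I read it (the binds function and the string type names are made up for illustration): a receiver list binds if it occurs as a contiguous run somewhere in the scope chain, preferring the run closest to the end.

// Returns true if the receiver list appears as a contiguous run in the scope chain.
// Scanning from the end models "the latest matching subsequence wins".
fun binds(scope: List<String>, receivers: List<String>): Boolean =
    (scope.size - receivers.size downTo 0).any { start ->
        scope.subList(start, start + receivers.size) == receivers
    }

fun main() {
    val scope = listOf("A", "B", "C", "A")        // G, A1, B, C, A2 with G dropped
    println(binds(scope, listOf("A")))            // true
    println(binds(scope, listOf("A", "B")))       // true
    println(binds(scope, listOf("A", "B", "C")))  // true
    println(binds(scope, listOf("C", "A")))       // true
    println(binds(scope, listOf("C", "B")))       // false
    println(binds(scope, listOf("A", "B", "A")))  // false
}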