What design principles contribute to the effectiveness of Kotlin's ASI?

I’m working on a summary of different programming languages’ approaches to automatic semicolon insertion (ASI).

Kotlin’s approach seems better than most, but I’ve been unable to find any documentation on precisely how it works. Any insights into the mechanism, the design rationale, or how the Kotlin designers crafted a grammar that allows for such effective ASI would be much appreciated.


AFAICT, the reference docs only mention it briefly as a style concern:

Note: In Kotlin, semicolons are optional, and therefore line breaks are significant.

Omit semicolons whenever possible.

Neither the grammar nor kotlin-spec (both works in progress) mentions it AFAICT.
The grammar for semi suggests that it’s not purely a grammar issue:

semi : EOF ;

Note: the ; at the end is a grammar meta-character, nothing to do with Kotlin tokens; semi has no definition except EOF.

IIUC, for a script to be allowed to have more than one statement, there must be some special handling in the lexer or the parser.


Prior discussion:

thanks,
mike

I did a bit of digging. AFAICT, the closest thing to ASI happens here:

// AbstractKotlinParsing.java

    private boolean tokenMatches(IElementType token, IElementType expectation) {
        if (token == expectation) return true;
        if (expectation == EOL_OR_SEMICOLON) {
            if (eof()) return true;
            if (token == SEMICOLON) return true;
            if (myBuilder.newlineBeforeCurrentToken()) return true;
        }
        return false;
    }

The .newlineBeforeCurrentToken() call bottoms out in a *Impl class, which just looks back past any comment tokens to the nearest non-comment token and checks whether it’s a whitespace token with a '\n' character in it.
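
To make that lookback concrete, here is a minimal Kotlin sketch of the idea. It is a toy model, not the compiler's code: Kind, Token, and newlineBefore are hypothetical names, and the sketch assumes that comments are skipped, whitespace is inspected for '\n', and any other token ends the search.

```kotlin
// Toy model of the lookback. Kind, Token and newlineBefore are hypothetical
// names for illustration; they are not part of the Kotlin compiler.
enum class Kind { WHITESPACE, COMMENT, SEMICOLON, OTHER }

data class Token(val kind: Kind, val text: String)

// Returns true if a line break separates tokens[index] from the previous
// significant token. Comments are skipped; any other token stops the search.
fun newlineBefore(tokens: List<Token>, index: Int): Boolean {
    for (i in index - 1 downTo 0) {
        val t = tokens[i]
        when (t.kind) {
            Kind.COMMENT -> Unit                               // keep looking
            Kind.WHITESPACE -> if ('\n' in t.text) return true // found a line break
            else -> return false                               // significant token first
        }
    }
    return false
}

fun main() {
    // Models the source text:  a /* note */ <newline> b
    val tokens = listOf(
        Token(Kind.OTHER, "a"),
        Token(Kind.WHITESPACE, " "),
        Token(Kind.COMMENT, "/* note */"),
        Token(Kind.WHITESPACE, "\n"),
        Token(Kind.OTHER, "b")
    )
    println(newlineBefore(tokens, 4)) // true: a line break precedes b
    println(newlineBefore(tokens, 2)) // false: only a space precedes the comment
}
```

The point of the sketch is that no ';' token is ever created: the parser only asks a yes/no question about the whitespace in front of the current token.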

So I think I can conclude that

  • Kotlin does not do ASI. The grammar refers to SEMI? in places where semicolons are optional but does not convert newline tokens to ‘;’ tokens nor manufacture such tokens.
  • The lexer instead defines a token class, EOL_OR_SEMICOLON, and parser maintainers use that in preference to SEMI where doing so leads to no ambiguity.

Eager Breaking is Nice

Kotlin’s compiler could be implemented in terms of ASI, but the main difference from JavaScript and Go is that Kotlin would have to insert semicolons eagerly instead of reluctantly.

This eagerness has a nice property: Kotlin does not suffer from concatenation problems. Adding a line of code does not change the meaning of the preceding or following lines, as it can in JavaScript syntactically:

let x = f
(complex.parenthesized||expression).g()

or lexically:

f()
/without-previous-line-would-be-a-regex/i.test(str) && doSomething()
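
For comparison, here is a minimal Kotlin sketch of the first, syntactic case. The name eagerBreakDemo is made up for illustration. As far as can be told from the parser, a line break before ( ends the statement, so the parenthesized line is not treated as a call on the line above it.

```kotlin
// eagerBreakDemo is a hypothetical name used only for this illustration.
fun eagerBreakDemo(): Int {
    val f = { n: Int -> n * 2 }
    val x = f                // the statement ends at the line break
    (1 + 2).toString()       // a separate statement; its result is discarded
    return x(10)             // x is still the function itself, not a call result
}

fun main() {
    println(eagerBreakDemo()) // 20
}
```

In JavaScript, the equivalent two lines would be read as a single call expression, f(1 + 2).toString().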

Remaining Problems

Since Kotlin breaks eagerly, developers who assume it inserts semicolons like JavaScript might be confused. I myself was bitten by a line break confusion bug in the first few thousand lines of Kotlin I authored:

val expectedTestOutput = "line 1\n"
  + "line 2\n"
  + "line 3\n"

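Most ASI schemes favor interpreting + as an infix operator over interpreting it as a prefix operator. Kotlin does the opposite: the statement ends at the line break, so a + at the start of the next line is read as a prefix operator on a new expression.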
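
A minimal sketch of the pitfall and two ways around it. It uses Int rather than String because unary plus is defined for Int in the standard library, so the mistaken version compiles. The function names are hypothetical.

```kotlin
// Function names are hypothetical and exist only for this illustration.

fun leadingOperator(): Int {
    val total = 1
        + 2          // a new statement: unary plus applied to 2, result discarded
    return total
}

fun trailingOperator(): Int {
    val total = 1 +
        2            // the dangling + means the expression must continue
    return total
}

fun parenthesized(): Int {
    val total = (1
        + 2)         // inside parentheses the line break does not end the expression
    return total
}

fun main() {
    println(leadingOperator())  // 1
    println(trailingOperator()) // 3
    println(parenthesized())    // 3
}
```

Breaking after the operator, or wrapping the whole expression in parentheses, keeps the continuation lines attached to the first one.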

Neither ktlint nor a stock detekt warns on:

var a: Int = 0

fun f(i: Int): Int = when (i) {
    0 -> 1
    else -> {
        a = a
        + f(i - 1)
    }
}

though the IntelliJ plugin does warn “variable a assigned to itself.”

Some widely used JavaScript style guides recommend breaking after infix operators, but there is still inertia from Sun’s Java style guide which said

When a line is broken at a non-assignment operator the break comes before the symbol.

Recommendations for ASI-veterans picking up Kotlin

Maybe it’d be worth a mention in docs for developers experienced with JavaScript or Go who are learning Kotlin:

"""
Never start a line with an operator like + or - that can appear between two expressions.
If the operator is also allowed before a single expression, the compiler will not report an error.
"""