What design principles contribute to the effectiveness of Kotlin's ASI?

mikesamuel · December 3, 2019, 7:28pm

I’m working on a summary of different proglangs’ approaches to automatic semicolon insertion (ASI).

Kotlin’s approach seems better than most, but I’ve been unable to find any documentation on precisely how it works. Does anyone know details of how it works, or the design rationale, or how the Kotlin designers crafted a grammar that allows for such effective ASI?

Any insights into how it works, and into the design rationale behind it, would be much appreciated.

AFAICT, the reference docs only mention it briefly as a style concern:

Note: In Kotlin, semicolons are optional, and therefore line breaks are significant.

Omit semicolons whenever possible.

Neither the grammar nor kotlin-spec (both works in progress) mentions it, AFAICT.
The grammar for semi suggests that it’s not purely a grammar issue:

semi : EOF ;

Note: the ; at the end is a grammar meta-character, nothing to do with Kotlin tokens; semi has no definition except EOF.

IIUC, for a script to be allowed to have more than one statement, there must be some special handling in the lexer or parser.

Prior discussion:

A relevant Stack Overflow question sheds no light, but does attest to its effectiveness:

Searching all of my open-source Kotlin, and our internal rather large Kotlin projects, I find no semi-colons other than the cases above – and very very few in total.
I asked on r/kotlin: “How does Kotlin’s Automatic Semicolon Insertion work?”

thanks,
mike

mikesamuel · December 30, 2019, 9:18pm

I did a bit of digging. AFAICT, the closest thing to ASI happens here:

// AbstractKotlinParsing.java

    private boolean tokenMatches(IElementType token, IElementType expectation) {
        if (token == expectation) return true;
        if (expectation == EOL_OR_SEMICOLON) {
            if (eof()) return true;
            if (token == SEMICOLON) return true;
            if (myBuilder.newlineBeforeCurrentToken()) return true;
        }
        return false;
    }

The .newlineBeforeCurrentToken() call bottoms out in a *Impl class that walks backwards from the current token, skipping comment tokens, and checks whether the whitespace token it reaches contains a '\n' character.
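To make that concrete, here is a toy model of the check. Everything in it (Kind, Token, the list-based newlineBeforeCurrentToken, atStatementEnd) is hypothetical and far simpler than the compiler's PsiBuilder-based code; it only mirrors the behavior described above: look back past comments for a whitespace token containing a newline, and accept a statement end at EOF, at ';', or after such a newline.

```kotlin
// Hypothetical, simplified token model; these are not the compiler's classes.
enum class Kind { IDENT, SEMICOLON, WHITESPACE, COMMENT, OTHER }

data class Token(val kind: Kind, val text: String)

// Walk backwards from the current token, looking past comments, and report
// whether a whitespace token between the previous code token and the
// current one contains '\n'.
fun newlineBeforeCurrentToken(tokens: List<Token>, current: Int): Boolean {
    var i = current - 1
    while (i >= 0) {
        val t = tokens[i]
        when (t.kind) {
            Kind.COMMENT -> {}                                   // keep looking
            Kind.WHITESPACE -> { if ('\n' in t.text) return true } // line break found
            else -> return false                                 // reached real code
        }
        i--
    }
    return false
}

// Stand-in for matching the EOL_OR_SEMICOLON expectation: a statement may end
// at end of input, at an explicit ';', or before a token preceded by a newline.
fun atStatementEnd(tokens: List<Token>, current: Int): Boolean =
    current >= tokens.size ||
        tokens[current].kind == Kind.SEMICOLON ||
        newlineBeforeCurrentToken(tokens, current)

fun main() {
    // a /* c */ <newline> b
    val split = listOf(
        Token(Kind.IDENT, "a"),
        Token(Kind.WHITESPACE, " "),
        Token(Kind.COMMENT, "/* c */"),
        Token(Kind.WHITESPACE, "\n"),
        Token(Kind.IDENT, "b"),
    )
    // a b, all on one line
    val sameLine = listOf(
        Token(Kind.IDENT, "a"),
        Token(Kind.WHITESPACE, " "),
        Token(Kind.IDENT, "b"),
    )
    println(atStatementEnd(split, 4))    // true
    println(atStatementEnd(sameLine, 2)) // false
}
```

Note that the comment between a and b does not hide the line break, which matches the "skip comments" step of the real check.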

So I think I can conclude that:

Kotlin does not do ASI. The grammar refers to SEMI? in places where semicolons are optional, but the implementation neither converts newline tokens into ‘;’ tokens nor manufactures such tokens.
The lexer instead defines a token class, EOL_OR_SEMICOLON, and parser maintainers use that in preference to SEMI where doing so leads to no ambiguity.

Eager Breaking is Nice

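Kotlin’s compiler could be implemented in terms of ASI, but the main difference from JavaScript and Go is that Kotlin’s ASI would have to insert semicolons eagerly, at a line break wherever a statement may end, rather than reluctantly, only when the next token cannot continue the statement.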

This eagerness has a nice property: Kotlin does not suffer from line-concatenation problems. Adding a line of code does not change the meaning of the lines before or after it, as it can in JavaScript, either syntactically:

let x = f
(complex.parenthesized||expression).g()

or lexically:

f()
/without-previous-line-would-be-a-regex/i.test(str) && doSomething()
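For comparison, a sketch of the first example in Kotlin, assuming a hypothetical function-typed value f. Because a newline before ( ends the statement, the second line is not parsed as call arguments. (Kotlin has no regex literals, so the lexical case has no counterpart.)

```kotlin
// Hypothetical stand-in for the `f` in the JavaScript example.
val f: (Int) -> Int = { it * 10 }

// The newline before `(` ends the declaration, so `x` is bound to `f` itself,
// and `(1 + 2).toString()` is a separate, unused expression statement.
fun splitAcrossLines(): Any {
    val x = f
    (1 + 2).toString()
    return x
}

// With `(` on the same line, the parentheses form a call.
fun sameLine(): Any {
    val x = f(1 + 2)
    return x
}

fun main() {
    println(splitAcrossLines() === f) // true
    println(sameLine())               // 30
}
```

The compiler may warn that the parenthesized line is unused, but it will not silently glue it onto the line above.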

Remaining Problems

Since Kotlin breaks eagerly, developers who assume it inserts semicolons the way JavaScript does may be confused. I was bitten by a line-break bug within the first few thousand lines of Kotlin I wrote:

val expectedTestOutput = "line 1\n"
  + "line 2\n"
  + "line 3\n"

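Most ASI schemes, JavaScript’s included, favor reading a line-leading + as an infix operator that continues the previous line over reading it as a prefix operator that starts a new expression. Kotlin does the opposite.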
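A sketch of two ways to write the intended string that do not depend on how a line-leading + is read: end each line with the operator, or wrap the whole expression in parentheses, inside which newlines do not end the expression.

```kotlin
// Operator at the end of the line: the expression is visibly unfinished,
// so the declaration continues onto the next line.
val trailingOperator = "line 1\n" +
    "line 2\n" +
    "line 3\n"

// Inside parentheses, newlines do not end the expression, so a leading `+`
// is read as the infix operator.
val parenthesized = ("line 1\n"
    + "line 2\n"
    + "line 3\n")

fun main() {
    println(trailingOperator == "line 1\nline 2\nline 3\n") // true
    println(parenthesized == trailingOperator)              // true
}
```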

Neither ktlint nor a stock detekt configuration warns about this:

var a: Int = 0

fun f(i: Int): Int = when (i) {
    0 -> 1
    else -> {
        a = a
        + f(i - 1)
    }
}

though the IntelliJ plugin does warn “variable a assigned to itself.”
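A small runnable sketch of what that function actually computes. The helpers leadingPlus and trailingPlus are hypothetical; they only isolate the same effect on a local variable.

```kotlin
var a: Int = 0

// The snippet above, unchanged. The newline ends `a = a`, so `+ f(i - 1)` is
// parsed as a prefix `+` expression that becomes the value of the `else` block.
fun f(i: Int): Int = when (i) {
    0 -> 1
    else -> {
        a = a
        + f(i - 1)
    }
}

// Hypothetical helpers isolating the same effect on a local variable.
fun leadingPlus(x: Int, y: Int): Int {
    var r = x
    r = r
    + y // a separate statement: computed, then discarded
    return r
}

fun trailingPlus(x: Int, y: Int): Int {
    var r = x
    r = r +
        y // the trailing `+` keeps the assignment going
    return r
}

fun main() {
    println(f(5))               // 1
    println(a)                  // 0
    println(leadingPlus(2, 3))  // 2
    println(trailingPlus(2, 3)) // 5
}
```

So f returns 1 for every non-negative argument and never touches a, and nothing short of the IDE inspection points that out.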

Some widely used JavaScript style guides recommend breaking after infix operators, but there is still inertia from Java style guides. Sun’s code conventions say “Break before an operator,” and Google’s Java style guide says:

When a line is broken at a non-assignment operator the break comes before the symbol.

Recommendations for ASI-veterans picking up Kotlin

Maybe it’d be worth a mention in docs for developers experienced with JavaScript or Go who are learning Kotlin:

"""
Never start a line with an operator like + or - that can appear between two expressions.
If the operator can also appear before a single expression, as a prefix operator, the compiler will not report an error; the line silently becomes a separate statement.
"""
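The flip side, as far as I can tell from the compiler’s behavior: tokens that have no prefix form, such as the member-access operators . and ?. and the Elvis operator ?:, can safely start a line, because such a line cannot begin a new statement. A sketch with a hypothetical helper:

```kotlin
// `?.`, `.` and `?:` cannot start an expression of their own, so a line that
// begins with one of them is read as a continuation of the previous line.
fun firstWordLength(s: String?): Int {
    val n = s
        ?.trim()
        ?.substringBefore(' ')
        ?.length
        ?: 0
    return n
}

fun main() {
    println(firstWordLength("  hello world ")) // 5
    println(firstWordLength(null))             // 0
}
```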
