More characters allowed for identifiers than grammar specifies. What is supported?


#1

According to the Kotlin grammar, a simple name can only be a Java identifier (escaped or non-escaped). But I can use identifiers with most of the ASCII characters:

// ASCII characters that are not allowed: . / : ; < > [ \ ] `
private fun ` !"#$%^&()*+,-=?@^_{|}~`() {
}

What is the position on this?

  • This is a bug, and must be fixed.
  • The grammar page is incorrect.
  • For now we want people to use only Java identifiers, so this is not officially supported. Once we are sure that other characters do not introduce issues, we will document that more characters are allowed in identifiers.

#2

Very interesting indeed. By the way, there are actually two lists of valid identifier characters: the one defined by the Java language, and the (larger) one accepted by the JVM. When the term “Java identifier” is used, though, one would assume an identifier as defined by the language. The characters you include are valid identifier characters for the JVM. (A few characters are not, such as “.” and “/”.)
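
To make the difference between the two lists concrete, here is a minimal sketch. The helper names are made up for illustration and are not part of Kotlin or the JDK. The JVM rule is my reading of the class file format (unqualified names may not contain . ; [ / and method names additionally may not contain < or >), so treat it as an assumption rather than a definitive check.

```kotlin
// Hypothetical helpers for illustration only.

// Java *language* rule: the first character must be an identifier start,
// every following character an identifier part.
// (Reserved keywords and supplementary code points are ignored for brevity.)
fun isJavaLanguageIdentifier(name: String): Boolean =
    name.isNotEmpty() &&
        Character.isJavaIdentifierStart(name[0]) &&
        name.drop(1).all { Character.isJavaIdentifierPart(it) }

// JVM class-file rule for method names, as assumed above:
// no '.', ';', '[' or '/' in any unqualified name, and no '<' or '>' in method names.
// (The special names <init> and <clinit> are not handled here.)
fun isJvmMethodName(name: String): Boolean =
    name.isNotEmpty() && name.none { it in ".;[/<>" }
```

With these two checks, a name containing spaces and quotes fails the language rule but passes the JVM rule, which matches what the first post observed.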


#3

Aha, I see. If anyone is interested, see the class file format documentation.

But this is JVM-specific. I am interested in what Kotlin itself allows, i.e. what is guaranteed to keep working in future language versions?

It is not an important feature for me. But if all the characters in the example are officially supported, the readability of test methods could be improved a lot IMO. For example:

@Test fun failsIfNullPassedForLogger() { ... }
@Test fun `fails if "null" passed for logger`() { ... }

I already tested it, and both the build output and the test report support this.


#4

It breaks when a function contains a lambda and the function's name has characters that are not allowed in Windows filenames:

Error:Kotlin: [Internal Error] java.lang.IllegalStateException:
        java.io.FileNotFoundException:
        [...]\Class$function "quotes" not allowed in Windows filename$1.class [...]

So the answer seems to be: not supported.

Too bad, I was starting to like it in the experiments I am doing.
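
Judging by the error, the function name ends up in the file name of the class generated for the lambda. A minimal sketch of a check one could run over such names, assuming the usual set of characters Windows rejects in file names (< > : " / \ | ? * and control characters). The helper name is hypothetical.

```kotlin
// Characters Windows does not accept in file names (control characters are checked separately).
private const val WINDOWS_RESERVED = "<>:\"/\\|?*"

// Hypothetical helper: returns the characters of a function name that would
// prevent a file with that name in it from being created on Windows.
fun windowsUnsafeChars(functionName: String): Set<Char> =
    functionName.filter { it in WINDOWS_RESERVED || it < ' ' }.toSet()
```

For the name in the error above this returns the double quote. Taking away the characters the first post lists as rejected anyway, the ones that compile but look risky on Windows would be " | ? and *.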


#5

Greek letters can be used as variable names. Sometimes it is neater to write α than alpha. I still don’t know whether this is allowed by the language specification.


#6

Looking at the compiler source, the lexer seems to use the following productions:

// TODO: prohibit '$' in identifiers?
LETTER = [:letter:]|_
IDENTIFIER_PART=[:digit:]|{LETTER}
PLAIN_IDENTIFIER={LETTER} {IDENTIFIER_PART}*
// TODO: this one MUST allow everything accepted by the runtime
// TODO: Replace backticks by one backslash in the begining
ESCAPED_IDENTIFIER = `[^`\n]+`
IDENTIFIER = {PLAIN_IDENTIFIER}|{ESCAPED_IDENTIFIER}
FIELD_IDENTIFIER = \${IDENTIFIER}

It appears highly probable that [:letter:] is a character class defined in terms of Character.isLetter, which is basically whatever Unicode says is a letter. Unicode is an international standard that aims not to favour specific countries, so α is a perfectly valid letter. The escaped identifier is even more permissive (and does not guarantee valid class files).
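
As a rough illustration, the productions above can be mirrored in a few lines of Kotlin. This is only a sketch under the assumption that [:letter:] and [:digit:] correspond to Character.isLetter and Character.isDigit; the function names are mine, and supplementary code points are ignored.

```kotlin
// Mirrors LETTER = [:letter:]|_  (assuming [:letter:] means Character.isLetter)
fun isLexerLetter(c: Char): Boolean = Character.isLetter(c) || c == '_'

// Mirrors IDENTIFIER_PART = [:digit:]|{LETTER}  (assuming [:digit:] means Character.isDigit)
fun isIdentifierPart(c: Char): Boolean = Character.isDigit(c) || isLexerLetter(c)

// Mirrors PLAIN_IDENTIFIER = {LETTER} {IDENTIFIER_PART}*
fun isPlainIdentifier(s: String): Boolean =
    s.isNotEmpty() && isLexerLetter(s[0]) && s.drop(1).all(::isIdentifierPart)

// Mirrors ESCAPED_IDENTIFIER: a backtick, one or more characters that are
// neither a backtick nor a newline, and a closing backtick.
// The argument is expected to include the surrounding backticks.
fun isEscapedIdentifier(s: String): Boolean =
    s.length >= 3 && s.first() == '`' && s.last() == '`' &&
        s.substring(1, s.length - 1).none { it == '`' || it == '\n' }
```

Under these assumptions α passes as a plain identifier, while a name with spaces or quotes is only accepted in its escaped form.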