More characters allowed for identifiers than grammar specifies. What is supported?

According to the Kotlin grammar, a simple name can only be a Java identifier (escaped or non-escaped). But I can use identifiers with most of the ASCII characters:

// ASCII characters that are not allowed: . / : ; < > [ \ ] `
private fun ` !"#$%^&()*+,-=?@^_{|}~`() {
}

What is the position on this?

  • This is a bug, and must be fixed.
  • The grammar page is incorrect.
  • For now we want people to only use Java identifiers, so it is not officially supported. But when we are sure that using other characters does not introduce issues, we will state that more characters are allowed to be used for identifiers.
  • …

Very interesting indeed. By the way, there are actually two lists of valid identifier characters: one for the Java language, and a bigger one for the JVM. When the term "Java identifier" is used, one would assume an identifier according to the language. The characters you include are valid identifier characters for the JVM. (A few are not, such as "." and "/".)
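To make the difference concrete, here is a rough sketch of the two checks. The function names are made up for illustration. The JVM rule follows the class file format's restriction on unqualified names (no `.`, `;`, `[` or `/`, and additionally no `<` or `>` in method names); the language check ignores Java keywords and works per `Char` rather than per code point, so treat it as an approximation only.

```kotlin
// Characters the class file format forbids in any unqualified name.
private val jvmForbidden = setOf('.', ';', '[', '/')

// Approximates "identifier according to the Java language"
// (hypothetical helper; does not exclude keywords such as "class").
fun isJavaLanguageIdentifier(name: String): Boolean =
    name.isNotEmpty() &&
        Character.isJavaIdentifierStart(name[0]) &&
        name.drop(1).all { Character.isJavaIdentifierPart(it) }

// Approximates "valid method name for the JVM"
// (hypothetical helper; ignores the special names <init> and <clinit>).
fun isJvmMethodName(name: String): Boolean =
    name.isNotEmpty() &&
        name.none { it in jvmForbidden || it == '<' || it == '>' }
```

With these definitions a name containing spaces passes the JVM check but not the language check, which matches what the original example relies on.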

Aha, I see. If anyone is interested, see the class file format documentation.

But this is JVM-specific. I am interested in what Kotlin allows now, i.e. what will definitely work in future language versions?

It is not an important feature for me. But if all the characters in the example are officially supported, the readability of test methods could be improved a lot IMO. For example:

@Test fun failsIfNullPassedForLogger() { ... }
@Test fun `fails if "null" passed for logger`() { ... }

I already tested it, and both the build output and the test report support this.

It breaks when you have a lambda in a function with characters in the name that are not allowed in Windows filenames:

Error:Kotlin: [Internal Error] java.lang.IllegalStateException:
        java.io.FileNotFoundException:
        [...]\Class$function "quotes" not allowed in Windows filename$1.class [...]

So the answer seems to be: not supported.
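If you want to keep using such names anyway, a small check like the one below could flag names that are likely to hit this problem. It is only a sketch: `windowsUnsafeChars` is a made-up helper, and it assumes (based on the error above) that the function name ends up in the class file name generated for the lambda. It covers the characters Windows rejects in file names and control characters, but not other Windows rules such as reserved device names or trailing dots and spaces.

```kotlin
// Characters that Windows does not allow in file names.
private val windowsReserved = setOf('<', '>', ':', '"', '/', '\\', '|', '?', '*')

// Returns the characters in a function name that would be invalid
// in a Windows file name (hypothetical helper, not part of any tooling).
fun windowsUnsafeChars(functionName: String): Set<Char> =
    functionName.filter { it in windowsReserved || it < ' ' }.toSet()
```

For the test name used earlier in this thread, the only offending character is the double quote; the same name without the quotes yields an empty set.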

Too bad, I was starting to like it in the experiments I am doing.

Greek letters can be used in variable names. Sometimes it is neater to write α than alpha. I still don't know whether this is allowed by the language specification.

Looking at the compiler source, the lexer seems to use the following productions:

// TODO: prohibit '$' in identifiers?
LETTER = [:letter:]|_
IDENTIFIER_PART=[:digit:]|{LETTER}
PLAIN_IDENTIFIER={LETTER} {IDENTIFIER_PART}*
// TODO: this one MUST allow everything accepted by the runtime
// TODO: Replace backticks by one backslash in the begining
ESCAPED_IDENTIFIER = `[^`\n]+`
IDENTIFIER = {PLAIN_IDENTIFIER}|{ESCAPED_IDENTIFIER}
FIELD_IDENTIFIER = \${IDENTIFIER}

Most likely [:letter:] is a character class defined in terms of Character.isLetter, which is basically whatever Unicode says is a letter. Unicode is an international standard that aims not to favour specific countries, and α is a perfectly valid Greek letter. The escaped identifier is even more permissive: it accepts anything except a backtick or a newline, and so does not guarantee valid class files.
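Under that assumption the productions can be mirrored in a few lines of Kotlin. This is a sketch, not the compiler's code: it assumes [:letter:] and [:digit:] correspond to Character.isLetter and Character.isDigit, the function names are invented, and keywords are not excluded.

```kotlin
// LETTER = [:letter:]|_   (assuming [:letter:] is Character.isLetter)
fun isLexerLetter(c: Char): Boolean = Character.isLetter(c) || c == '_'

// IDENTIFIER_PART = [:digit:]|{LETTER}
fun isLexerIdentifierPart(c: Char): Boolean = Character.isDigit(c) || isLexerLetter(c)

// PLAIN_IDENTIFIER = {LETTER} {IDENTIFIER_PART}*
fun isPlainIdentifier(s: String): Boolean =
    s.isNotEmpty() &&
        isLexerLetter(s[0]) &&
        s.drop(1).all { isLexerIdentifierPart(it) }

// ESCAPED_IDENTIFIER = `[^`\n]+`
// The argument includes the surrounding backticks.
fun isEscapedIdentifier(s: String): Boolean =
    s.length >= 3 &&
        s.first() == '`' && s.last() == '`' &&
        s.substring(1, s.length - 1).none { it == '`' || it == '\n' }
```

With this reading, α and _x1 are plain identifiers, 1x is not, and anything with a space needs the escaped form.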


Special characters such as $!#... may be used in Kotlin identifiers if they are escaped with backticks, as confirmed in this Stack Overflow post I created.

That is what I did. But although the editor and the compiler do not complain, it stops working when the compiler tries to write the class file.

For me that means that not all characters in back ticks are supported; you can expect problems depending on your operating system, the Kotlin back end, and the characters you use.

I would like to know what is actually supported, so I can write code knowing it will not break when someone else tries to compile it.

The lexical grammar is specified here. It allows symbols in Unicode categories Lu, Ll, Lt, Lm, Lo, Nl and Nd.

So it is an editor/compiler bug: they allow characters in backticks that the grammar forbids.

The problem is that the different back ends don't all allow the same characters in symbol names. For example, the DEX format on Android is more restrictive than the JVM (which has some implementation-specific restrictions). I'm not sure what happens with Unicode characters in native symbols. JavaScript also has its own restrictions.

Of course each back end has its own limitations, but Kotlin as a language has its own features. These features have to be supported, and my interpretation of support is that the tooling transforms the richer set of features of Kotlin to the limitations of the chosen back end. Either that, or the tooling must output errors indicating that some of the Kotlin features cannot be used on the chosen back end.

I really don't mind being limited because of a particular back end, but I do want to know as soon as possible that what I am trying to do is not going to work.

Thanks for your interest!

We recently updated the grammar (including the rule for identifiers) and added information about the symbols allowed in identifiers to the grammar page of the language reference.

The short answer: the set of allowed symbols in identifiers depends on the target and on the declaration publicity (the grammar rule contains the union of all sets of allowed symbols so that the code for any target can be parsed using the grammar).
See the grammar web-page for details (this page contains rules for all targets).
