Two related but different questions:
- Is there a standard way to represent a Unicode character in Kotlin? The type `Char` does not represent a Unicode character, but rather a UTF-16 code unit, which can be either a Unicode character from the BMP (which is only a subset of all Unicode characters) or one half of a UTF-16 surrogate pair (which is not a Unicode character at all).
- Is there a way to query the Unicode properties (such as category) of arbitrary Unicode characters? The extension property `Char.category` from `kotlin.text`, for example, only works for `Char`s, so it does not apply to characters outside the BMP.
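
To make the problem concrete, here is a minimal sketch of what I mean. The choice of U+1D11E (MUSICAL SYMBOL G CLEF) as the sample character is mine. The last part assumes Kotlin/JVM, where the Java code point APIs (`codePointAt`, `Character.getType(int)`) are reachable; I am not assuming they exist on other Kotlin targets, which is part of why I am asking for a standard Kotlin way.

```kotlin
fun main() {
    // U+1D11E lies outside the BMP, so UTF-16 encodes it as a surrogate pair.
    val clef = "\uD834\uDD1E"

    // One Unicode character, but two Char values.
    println(clef.length)                 // 2

    // Each Char is one half of the pair, not the character itself.
    println(clef[0].isHighSurrogate())   // true
    println(clef[1].isLowSurrogate())    // true

    // Char.category therefore describes the surrogate, not U+1D11E.
    println(clef[0].category)            // SURROGATE
    println(clef[1].category)            // SURROGATE

    // Assumption: JVM target only. An Int code point plus java.lang.Character
    // gives the real category, but this is Java API, not a Kotlin type.
    val codePoint: Int = clef.codePointAt(0)
    println(codePoint.toString(16))      // 1d11e
    println(Character.getType(codePoint) == Character.OTHER_SYMBOL.toInt())  // true
}
```

So on the JVM a plain `Int` can stand in for a code point, but it carries no type safety and none of the `kotlin.text` extensions apply to it.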