I’m guessing you’re on the JVM. This is a problem with the Java String implementation. While it has no problem storing characters that span more than one code unit or code point, many of its utility functions don’t handle them correctly.
I’m not sure about how complicated it would be to provide an alternate implementation.
The problem is that the Java String class stores text as UTF-16 code units. Any Unicode character whose code point lies above U+FFFF does not fit into 16 bits and is saved as 2 separate Char values (a surrogate pair). Many of the functions within String ignore this fact. For example, String.length does not return the number of Unicode characters; it returns the number of 16-bit Char values within the String, so some emoji count as 2. If your string also contains modifier characters that aren’t displayed on their own, like the Unicode characters responsible for emoji skin color, String.length might report a length of 4 for a single displayed character, or even more for some other combinations.
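To make the difference concrete, here is a minimal sketch comparing the three ways of counting. The class and method names are made up for illustration. The grapheme count relies on java.text.BreakIterator, whose handling of emoji modifier sequences has changed between JDK versions, so treat that number as approximate.

```java
import java.text.BreakIterator;

class StringLengthDemo {
    // Number of UTF-16 code units: this is what String.length() reports.
    static int codeUnits(String s) {
        return s.length();
    }

    // Number of Unicode code points: a surrogate pair counts once.
    static int codePoints(String s) {
        return s.codePointCount(0, s.length());
    }

    // Approximate number of displayed characters. The result for emoji
    // sequences depends on the JDK version, so it is not asserted here.
    static int graphemes(String s) {
        BreakIterator it = BreakIterator.getCharacterInstance();
        it.setText(s);
        int count = 0;
        while (it.next() != BreakIterator.DONE) {
            count++;
        }
        return count;
    }

    public static void main(String[] args) {
        // U+1F44D (thumbs up) and U+1F3FD (medium skin tone modifier)
        String thumbsUp = new String(Character.toChars(0x1F44D));
        String withSkinTone = thumbsUp + new String(Character.toChars(0x1F3FD));

        System.out.println(codeUnits(thumbsUp));      // 2
        System.out.println(codePoints(thumbsUp));     // 1
        System.out.println(codeUnits(withSkinTone));  // 4
        System.out.println(codePoints(withSkinTone)); // 2
        System.out.println(graphemes(withSkinTone));  // varies by JDK version
    }
}
```

Even the code point count is 2 for what is displayed as a single emoji, so counting code points only solves the surrogate pair half of the problem.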
There was an iPhone bug with a similar problem in 2015 that led to your phone crashing when you received a special Arabic text.
I don’t know of a general workaround. A real fix probably requires a complete reimplementation of String using a completely different in-memory representation. Chars aren’t the best for manipulating strings that go beyond plain text. The simplest option may be to not use emoji.
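For what it’s worth, String does expose a few code-point-aware methods that cover part of the problem. The sketch below uses them to truncate a string without cutting a surrogate pair in half. The class and the truncate helper are hypothetical names, and this is only a partial mitigation: it can still separate an emoji from a following skin color modifier.

```java
class CodePointTruncate {
    // Keep at most maxCodePoints code points, never splitting a surrogate pair.
    static String truncate(String s, int maxCodePoints) {
        if (s.codePointCount(0, s.length()) <= maxCodePoints) {
            return s;
        }
        // Translate a code point offset into a Char index.
        int end = s.offsetByCodePoints(0, maxCodePoints);
        return s.substring(0, end);
    }

    public static void main(String[] args) {
        String thumbsUp = new String(Character.toChars(0x1F44D));
        String text = "ok" + thumbsUp + "!";

        // A naive cut by Char index ends on a lone high surrogate.
        String naive = text.substring(0, 3);
        System.out.println(Character.isHighSurrogate(naive.charAt(2))); // true

        // The code-point-aware cut keeps the emoji intact.
        System.out.println(truncate(text, 3).equals("ok" + thumbsUp));  // true
    }
}
```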