String.padEnd(length: Int) and multi code point characters

DanielAsher · January 30, 2020, 5:36pm

I’ve noticed that String.padEnd() counts multi-code-point characters (e.g. some emoji) as multiple characters, which leads to poor layout with fixed-width fonts when padding.
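A minimal reproduction of the behaviour, assuming Kotlin on the JVM. The helper `bracketed` and the constant `CAT_FACE` are illustrative names only; the brackets just make the trailing spaces visible.

```kotlin
// Minimal reproduction: on the JVM, padEnd(length) compares against String.length,
// which counts UTF-16 code units (Char values), not visible characters.
const val CAT_FACE = "\uD83D\uDC31" // U+1F431: one glyph, stored as a surrogate pair (2 Chars)

// Illustrative helper: pad to `width` and wrap in brackets so the padding is visible.
fun bracketed(s: String, width: Int): String = "[" + s.padEnd(width) + "]"

// bracketed("cat", 5)    == "[cat  ]"                -> 2 spaces added
// bracketed(CAT_FACE, 5) == "[\uD83D\uDC31   ]"      -> only 3 spaces, because CAT_FACE.length == 2
```

In a fixed-width font the second line ends up one column short (or more, depending on how wide the font renders the emoji).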

Is this a known issue, or is there a workaround?

thanks,

Daniel

Wasabi375 · January 30, 2020, 11:52pm

I’m guessing you’re on the JVM. This is a problem with the Java String implementation. While it has no problem storing multi-code-point characters, many of its utility functions don’t handle them correctly.
I’m not sure how complicated it would be to provide an alternative implementation.
The problem is that the Java String class stores text as UTF-16 code units. This means any Unicode character whose code point doesn’t fit in 16 bits (anything above U+FFFF) is stored as 2 separate Char values, a so-called surrogate pair. Many of the functions on String ignore this, e.g. String.length does not return the number of Unicode characters; it returns the number of 16-bit Char values in the String, so some emoji count as 2 characters. If your string also contains invisible characters, like the Unicode modifiers responsible for emoji skin color, String.length might report a single displayed character as having a length of 4, and some other combinations are even longer.
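To make the difference concrete, here is a small sketch, assuming Kotlin on the JVM, that counts the same string three ways: UTF-16 code units (`String.length`), code points (`codePointCount`), and user-perceived characters (grapheme clusters) via `java.text.BreakIterator.getCharacterInstance()`. The function names are illustrative. One caveat: my understanding is that `BreakIterator` only follows the full Unicode extended grapheme cluster rules on newer JDKs (JDK 20 and later, if I recall correctly), so older JDKs may split some emoji sequences such as skin-tone modifiers.

```kotlin
import java.text.BreakIterator

// Number of Char values (UTF-16 code units); this is what padEnd uses.
fun utf16Units(s: String): Int = s.length

// Number of Unicode code points; a surrogate pair counts as one.
fun codePoints(s: String): Int = s.codePointCount(0, s.length)

// Number of user-perceived characters (grapheme clusters) as reported by BreakIterator.
// Results for emoji sequences may differ between JDK versions (see caveat above).
fun graphemes(s: String): Int {
    val iterator = BreakIterator.getCharacterInstance()
    iterator.setText(s)
    var count = 0
    while (iterator.next() != BreakIterator.DONE) count++
    return count
}

// Thumbs up (U+1F44D) followed by the medium skin tone modifier (U+1F3FD):
// 4 Char values, 2 code points, displayed as 1 glyph.
const val THUMBS_UP_MEDIUM = "\uD83D\uDC4D\uD83C\uDFFD"
```

With a recent JDK, `graphemes(THUMBS_UP_MEDIUM)` should report 1 while `utf16Units` reports 4, which is exactly the mismatch that breaks padding.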
There was an iPhone bug with a similar problem in 2015 that led to the phone crashing when it received a particular Arabic text.

I don’t know of any workaround. It would probably require a complete reimplementation of String with a completely different memory representation. Chars aren’t the best fit for manipulating strings outside of plain text. Maybe just don’t use emoji.

DanielAsher · February 1, 2020, 6:16am

Well, I’m new to the JVM and Android development, coming from iOS, and I now understand why Android Studio has such a hard time with emoji. By pushing all the complexity of non-plain text onto clients, we, as users, get a reading experience that is significantly worse than in native Mac/iOS apps. I understand that there are android.icu.text and com.ibm.icu.text, but String is ubiquitous. Even if we adopt these libraries, unit testing on Android is hard, as the JVM throws Unimplemented Stub! exceptions.
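For the padding case specifically, a segmentation library can be used to measure length in grapheme clusters instead of Char values. The sketch below uses `java.text.BreakIterator` so it runs on a plain JVM (e.g. in local unit tests); as far as I know, `android.icu.text.BreakIterator` and `com.ibm.icu.text.BreakIterator` expose the same `getCharacterInstance()` / `setText()` / `next()` / `DONE` calls, so swapping the import should be the only change, but treat that as an assumption. `padEndGraphemes` and `graphemeCount` are hypothetical names, not part of any library.

```kotlin
import java.text.BreakIterator

// Counts grapheme clusters (user-perceived characters) using BreakIterator.
fun graphemeCount(s: String): Int {
    val iterator = BreakIterator.getCharacterInstance()
    iterator.setText(s)
    var count = 0
    while (iterator.next() != BreakIterator.DONE) count++
    return count
}

// Hypothetical padEnd variant: measures the receiver in grapheme clusters
// rather than UTF-16 code units, then appends padChar until `length` is reached.
fun String.padEndGraphemes(length: Int, padChar: Char = ' '): String {
    require(length >= 0) { "length must be non-negative, was $length" }
    val missing = length - graphemeCount(this)
    return if (missing <= 0) this else this + padChar.toString().repeat(missing)
}
```

This fixes the counting, but grapheme clusters are still not display columns: many monospaced fonts and terminals render emoji (and East Asian wide characters) two columns wide, so perfect alignment would additionally need a width lookup.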

I honestly believe that Kotlin has an opportunity to fix these poor user experiences. This could perhaps take the form of an opt-in compiler plugin that provides a correct default implementation of kotlin.lang.String and requires explicit conversion with .toJavaString() / .fromJavaString().
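To make the proposed conversion boundary more concrete, here is a rough library-level sketch that needs no compiler plugin. `UnicodeText` is a hypothetical stand-in for the proposed kotlin.lang.String (which user code can’t define), and its length and indexing are in grapheme clusters via `java.text.BreakIterator`, so the same JDK-version caveat about emoji sequences applies. Boundaries are recomputed on every call for simplicity; a real design would probably cache them.

```kotlin
import java.text.BreakIterator

// Hypothetical wrapper illustrating the proposal; not part of Kotlin or any library.
@JvmInline
value class UnicodeText(private val raw: String) {
    // Grapheme cluster boundaries as Char offsets, e.g. [0, 2, 3] for "e\u0301x".
    private fun boundaries(): List<Int> {
        val iterator = BreakIterator.getCharacterInstance()
        iterator.setText(raw)
        val result = mutableListOf(iterator.first())
        var end = iterator.next()
        while (end != BreakIterator.DONE) {
            result += end
            end = iterator.next()
        }
        return result
    }

    // Length in user-perceived characters, not UTF-16 code units.
    val length: Int get() = boundaries().size - 1

    // The grapheme cluster at `index`, returned as a (possibly multi-Char) String.
    operator fun get(index: Int): String {
        val b = boundaries()
        return raw.substring(b[index], b[index + 1])
    }

    // Padding measured in grapheme clusters.
    fun padEnd(length: Int, padChar: Char = ' '): UnicodeText {
        val missing = length - this.length
        return if (missing <= 0) this else UnicodeText(raw + padChar.toString().repeat(missing))
    }

    // Explicit conversion back to the platform string.
    fun toJavaString(): String = raw
}

// Explicit conversion from the platform string, mirroring the proposed .fromJavaString().
fun String.fromJavaString(): UnicodeText = UnicodeText(this)
```

For example, `"e\u0301x".fromJavaString().length` is 2, where the plain String reports 3.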

Would the community comment on whether opening an issue on the JetBrains/kotlin GitHub repository would be worthwhile? Perhaps by 2022 we could have a uniform text experience in Android Studio and Android.

Wasabi375 · February 1, 2020, 1:07pm

The right place for issues is https://kotl.in/issue
That said, this will probably require a KEEP (Kotlin Evolution and Enhancement Process, Kotlin/KEEP on GitHub).

I’d be all for it, but this is something that will take a lot of work. My guess is that if we want something like this, we would first need a KEEP as well as a prototype implementation of the String class. At that point it wouldn’t be too hard to implement a compiler plugin to support string literals, or to add this to the language directly.
