
This seems naïve: "A character is not an Unicode character but a single byte."

Disappointing from a developer who presumably types several ü characters each day.



http://nimrod-code.org/manual.html#character-type

In particular:

> The TRune type is used for Unicode characters, it can represent any Unicode character. TRune is declared in the unicode module.


OK, let's see: UTF-16 strings encourage bugs with surrogates. Note that most C# and Java code is notoriously broken with respect to those, and yet I never hear anybody complain about it.
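
To make the surrogate problem concrete, here is a minimal Python sketch (Python is only my choice for illustration, and `utf16_units` is a made-up helper, not anything from Nimrod, C#, or Java). It counts UTF-16 code units, which is what a Java or C# string length reports, and shows that cutting at a code-unit boundary can split a surrogate pair:

```python
def utf16_units(s: str) -> int:
    """Number of 16-bit code units needed to store s as UTF-16."""
    return len(s.encode("utf-16-le")) // 2

bmp = "\u00FC"         # ü, inside the Basic Multilingual Plane
astral = "\U0001F600"  # an emoji, outside the BMP

units_bmp = utf16_units(bmp)
units_astral = utf16_units(astral)

# Naive truncation to "one character" (one code unit) keeps only the
# first half of the surrogate pair, which is not valid UTF-16 by itself.
raw = astral.encode("utf-16-le")
try:
    raw[:2].decode("utf-16-le")
    split_ok = True
except UnicodeDecodeError:
    split_ok = False

print(units_bmp, units_astral, split_ok)
```

Code that only ever sees BMP text never hits the second case, which is presumably why these bugs survive for so long.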

UTF-32 strings take up to 4x the memory of UTF-8 strings (for mostly-ASCII text) and yet hardly solve anything: The proper toUpper("ß") used to be "SS" in German (nowadays there is also an uppercase 'ẞ', U+1E9E). Other languages have other rules; i18n is hard, get used to it.
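
A quick Python sketch of both points. Python's built-in `str.upper` and codecs stand in here for whatever your language offers, and the sample word is my own:

```python
word = "Straße"

# Storage cost of the same text in the two encodings.
utf8_size = len(word.encode("utf-8"))
utf32_size = len(word.encode("utf-32-le"))

# Uppercasing is not a one-code-point-in, one-code-point-out mapping:
# the result has more characters than the input.
upper = word.upper()

print(utf8_size, utf32_size, upper, len(word), len(upper))
```

A fixed-width encoding does not help with the second part: the uppercased string is longer than the original, no matter how many bytes each code point takes.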

IMO Nimrod's UTF-8 strings at least make the programming errors easier to spot instead of the "mostly working but fails for edge cases" style that UTF-16 or UTF-32 encourage.


The ü character is 0x81 in extended ASCII (IBM code page 437) and 0xFC in Latin-1 (ISO 8859-1), both very common fixed-width single-byte encodings.

Unicode support is provided in the standard library.
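
The byte values are easy to check in Python. I am assuming "extended ASCII" means IBM code page 437 here, and the UTF-8 line is added only for comparison:

```python
u = "ü"

cp437 = u.encode("cp437")     # DOS-era "extended ASCII"
latin1 = u.encode("latin-1")  # ISO 8859-1
utf8 = u.encode("utf-8")      # two bytes, for comparison

# The same byte read under the wrong single-byte encoding is silently
# accepted but yields a different character (here a C1 control code).
misread = cp437.decode("latin-1")

print(cp437.hex(), latin1.hex(), utf8.hex(), repr(misread))
```

So a single-byte char type can hold ü, but only if everyone agrees on which 8-bit encoding is in use.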



