This seems naïve: "A character is not an Unicode character but a single byte." D...

gnuvince · on May 21, 2013

http://nimrod-code.org/manual.html#character-type

In particular:

> The TRune type is used for Unicode characters, it can represent any Unicode character. TRune is declared in the unicode module.
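To make the byte vs. character distinction concrete, here is a small Python sketch of what decoding UTF-8 bytes into code points involves, i.e. turning the bytes a string holds into the values a rune type would hold. It is not Nimrod's `unicode` module. The name `decode_utf8` is hypothetical, and the sketch skips the overlong-form and surrogate checks a real decoder performs.

```python
def decode_utf8(data: bytes) -> list[int]:
    """Decode UTF-8 bytes into a list of Unicode code points.

    Minimal sketch: rejects bad lead bytes, bad continuation bytes and
    truncated sequences, but does NOT reject overlong forms or encoded
    surrogates the way a strict decoder would.
    """
    code_points = []
    i = 0
    while i < len(data):
        lead = data[i]
        if lead < 0x80:             # 0xxxxxxx: ASCII, one byte
            cp, extra = lead, 0
        elif lead >> 5 == 0b110:    # 110xxxxx: two-byte sequence
            cp, extra = lead & 0x1F, 1
        elif lead >> 4 == 0b1110:   # 1110xxxx: three-byte sequence
            cp, extra = lead & 0x0F, 2
        elif lead >> 3 == 0b11110:  # 11110xxx: four-byte sequence
            cp, extra = lead & 0x07, 3
        else:
            raise ValueError(f"invalid lead byte {lead:#04x} at offset {i}")
        # Not enough bytes left for the continuation bytes.
        if i + extra >= len(data):
            raise ValueError(f"truncated sequence at offset {i}")
        for k in range(1, extra + 1):
            cont = data[i + k]
            if cont >> 6 != 0b10:   # continuation bytes look like 10xxxxxx
                raise ValueError(f"bad continuation byte at offset {i + k}")
            cp = (cp << 6) | (cont & 0x3F)
        code_points.append(cp)
        i += extra + 1
    return code_points


print([hex(cp) for cp in decode_utf8("über".encode("utf-8"))])
# ['0xfc', '0x62', '0x65', '0x72']: 5 bytes in, 4 code points out
```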

Araq · on May 21, 2013

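Ok, let's see: UTF-16 strings encourage bugs with surrogate pairs (characters outside the Basic Multilingual Plane take two 16-bit code units, so indexing or measuring length by code unit silently splits them). Note that most C# and Java code is notoriously broken with respect to those, and yet I never hear anybody complain about it.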
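A quick Python illustration of the surrogate problem: it lists the UTF-16 code units of a string, which is what Java's `String.length()` and C#'s `string.Length` count, and shows that cutting after two code units leaves a lone high surrogate. The helper names are made up for this sketch.

```python
def utf16_code_units(s: str) -> list[int]:
    """Return the 16-bit UTF-16 code units that encode s."""
    data = s.encode("utf-16-le")  # -le: fixed byte order, no BOM
    return [int.from_bytes(data[i:i + 2], "little") for i in range(0, len(data), 2)]


def is_surrogate(unit: int) -> bool:
    """True for code units in the surrogate range D800..DFFF."""
    return 0xD800 <= unit <= 0xDFFF


text = "a\U0001F600"  # U+1F600 lies outside the Basic Multilingual Plane
units = utf16_code_units(text)
print(len(text), len(units))    # 2 3  (code points vs. code units)
print([hex(u) for u in units])  # ['0x61', '0xd83d', '0xde00']

# Keeping only the first 2 code units, as a naive substring(0, 2) would,
# ends on a high surrogate with its low half cut off.
print(is_surrogate(units[:2][-1]))  # True
```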

For mostly-ASCII text, UTF-32 strings take up roughly 4x more memory than UTF-8 strings, and yet they hardly solve anything: even with one code point per element, the proper toUpper("ß") used to be "SS" in German (nowadays there is an uppercase 'ß'), so case mapping can still change the length of a string. Other languages have other rules; i18n is hard, get used to it.
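Both points can be checked in Python. The sketch below compares encoded sizes (the 4x ratio holds for ASCII, while CJK text is closer to 4:3) and relies on `str.upper()` applying Unicode's full case mappings, which still map ß to "SS" even though a capital ẞ (U+1E9E) exists. `encoded_sizes` is a hypothetical helper.

```python
def encoded_sizes(s: str) -> tuple[int, int, int]:
    """Return (code points, UTF-8 bytes, UTF-32 bytes) for s."""
    # "utf-32-le" avoids the 4-byte BOM that plain "utf-32" prepends.
    return len(s), len(s.encode("utf-8")), len(s.encode("utf-32-le"))


for sample in ["hello", "straße", "日本語"]:
    cps, utf8, utf32 = encoded_sizes(sample)
    print(f"{sample!r}: {cps} code points, UTF-8 {utf8} bytes, UTF-32 {utf32} bytes")

# Full case mapping: one code point in, two out.
print("ß".upper())       # SS
print("straße".upper())  # STRASSE
```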

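IMO Nimrod's UTF-8 strings at least make programming errors easier to spot: every non-ASCII character takes more than one byte, so byte-indexing bugs show up as soon as you test with text like "ü", instead of the "mostly works but fails for edge cases" style of bug that UTF-16 or UTF-32 encourage.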
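A Python sketch of why UTF-8 surfaces these bugs early: a naive byte-level truncation (the hypothetical `truncate_bytes`) already breaks on a common word like "über", whereas in UTF-16 every character of that word is a single code unit, so the equivalent bug would stay hidden until someone used a character outside the Basic Multilingual Plane.

```python
def truncate_bytes(s: str, n: int) -> bytes:
    """Naively keep the first n bytes of s's UTF-8 encoding."""
    return s.encode("utf-8")[:n]


word = "über"
print(len(word), len(word.encode("utf-8")))  # 4 5

cut = truncate_bytes(word, 1)  # splits the 2-byte sequence C3 BC for "ü"
try:
    cut.decode("utf-8")
except UnicodeDecodeError as err:
    # The truncation produced invalid UTF-8, so decoding fails loudly.
    print("broken UTF-8:", err.reason)
```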

muyuu · on May 21, 2013

The ü character is hex 81 in extended ASCII (e.g. IBM code page 437) and hex FC in Latin-1 (ISO 8859-1). Both are very common fixed-width, single-byte encodings.
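These byte values can be checked with Python's built-in codecs. Here "extended ASCII" is assumed to mean code page 437. The sketch also shows the catch with single-byte encodings: the same byte decodes to a different character depending on which one you assume.

```python
# Encode "ü" (U+00FC) in two single-byte encodings and in UTF-8.
for codec in ("cp437", "latin-1", "utf-8"):
    print(codec, "ü".encode(codec).hex())
# cp437 81
# latin-1 fc
# utf-8 c3bc

# The byte FC is "ü" only if you already know the text is Latin-1.
print(b"\xfc".decode("latin-1") == "ü", b"\xfc".decode("cp437") == "ü")  # True False
```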

Unicode support is provided in the standard library.