> Why they call this encoding "UTF-8" is beyond me, it most definitely is not UTF-8.
This is a confusion of concepts. That an encoding can encode a particular value does not mean all applications using that encoding will support that value. You can store any size number you like into a JSON number, but if you send me a 256-bit value in a field for temperature in Celsius, I'm going to give you back an error.
Encodings ease data exchange; they do not alter the requirements or limitations of the applications using them. MySQL's original Unicode implementation supported only the Basic Multilingual Plane, so unsurprisingly, characters outside that plane are rejected. That would be the case regardless of the particular encoding used.
What you want is support for all Unicode characters. This is a reasonable request, but it is decoupled from the encoding used to communicate those characters to MySQL.
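To make that distinction concrete, here's a small sketch in plain Python (nothing MySQL-specific, just the standard library): the UTF-8 encoding itself handles supplementary-plane characters without any trouble; the 3-byte-per-character cap is the application's limit, not the encoding's.

    # UTF-8 itself has no trouble with characters outside the BMP.
    bmp_char = "\u20ac"         # EURO SIGN (U+20AC), inside the BMP
    astral_char = "\U0001F600"  # GRINNING FACE (U+1F600), outside the BMP

    print(bmp_char.encode("utf-8"))     # b'\xe2\x82\xac'     -> 3 bytes
    print(astral_char.encode("utf-8"))  # b'\xf0\x9f\x98\x80' -> 4 bytes

    # The round trip works, so the encoding is not the problem...
    assert astral_char.encode("utf-8").decode("utf-8") == astral_char
    # ...the 3-byte-per-character cap in MySQL's old utf8 character set is.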
Say what? UTF-8 allows for any character that the Unicode standard defines. If they wanted to restrict a character set to the BMP, then they should have called it UTF-8-BMP.
When the purpose of your software is simply to record data, saying you support a particular encoding when you only support _some_ of it is pretty irresponsible.
> That an encoding can encode a particular value does not mean all applications using that encoding will support that value.
And yet you have to change the encoding of a column to support storage of non-BMP characters, even though the UTF-8 encoding supports them just fine. Where's the sense in that?
No, you have to change the character set of a column. It's unfortunate that they gave the character set a name normally used for an encoding, but that doesn't make the character set an encoding.
It seems to me that you're the one confusing concepts. MySQL originally didn't support characters outside the BMP, fine. But since the encoding is completely different from that, a real UTF-8 would have automatically supported astral characters as soon as MySQL supported them.
The fact that MySQL itself gained support for all of Unicode, but did not use this support in the 'utf8' encoding, shows that it is not actually UTF-8.
But I expect an app that says "you can store utf8 data here" to actually store that, not "I will silently drop everything after the first utf8 character that doesn't fit in 3 bytes".
I don't see why specifying utf8 as a column encoding should be interpreted as "I will read and discard utf8".
This is different from declaring utf8 as the connection encoding.
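To illustrate what that silent truncation amounts to, here's a rough sketch in plain Python of where the cut lands, i.e. at the first character whose UTF-8 encoding needs more than 3 bytes. This is only an illustration of the behaviour described above, not MySQL's actual code.

    def truncation_point(text: str) -> int:
        """Index of the first character needing more than 3 UTF-8 bytes."""
        for i, ch in enumerate(text):
            if len(ch.encode("utf-8")) > 3:
                return i
        return len(text)

    s = "5\u20ac lunch \U0001F600"  # "5€ lunch 😀": the euro sign takes 3 bytes, the emoji 4
    cut = truncation_point(s)
    print(s[:cut])  # "5€ lunch " -- the emoji and everything after it is gone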
Your expectation is common among westerners. It typically goes away within a few days of beginning to work with large amounts of CJK content.
When someone says they accept UTF-8, UCS-2, or any other encoding, my first question is always "Are characters outside the Basic Multilingual Plane supported?". There is a long history quite apart from MySQL of the answer to that question being "no", particularly if the project in question originated in the 90s (or even early 2000s).