> Why they call this encoding "UTF-8" is beyond me, it most definitely is not UTF-8.
This is a confusion of concepts. That an encoding can encode a particular value does not mean all applications using that encoding will support that value. You can store any size number you like into a JSON number, but if you send me a 256-bit value in a field for temperature in Celsius, I'm going to give you back an error.
Encodings ease data exchange; they do not alter the requirements or limitations of the applications using them. MySQL's original Unicode implementation supported only the Basic Multilingual Plane, so unsurprisingly, characters outside that plane are rejected. That would be the case regardless of the particular encoding used.
What you want is support for all Unicode characters. This is a reasonable request, but it is decoupled from the encoding used to communicate those characters to MySQL.
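To make that distinction concrete, here's a small sketch in plain Python (nothing MySQL-specific, just the standard library): the UTF-8 encoding itself handles supplementary-plane characters without any trouble; the 3-byte-per-character cap is the application's limit, not the encoding's.

    # UTF-8 itself has no trouble with characters outside the BMP.
    bmp_char = "\u20ac"         # EURO SIGN (U+20AC), inside the BMP
    astral_char = "\U0001F600"  # GRINNING FACE (U+1F600), outside the BMP

    print(bmp_char.encode("utf-8"))     # b'\xe2\x82\xac'     -> 3 bytes
    print(astral_char.encode("utf-8"))  # b'\xf0\x9f\x98\x80' -> 4 bytes

    # The round trip works, so the encoding is not the problem...
    assert astral_char.encode("utf-8").decode("utf-8") == astral_char
    # ...the 3-byte-per-character cap in MySQL's old utf8 character set is.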
Say what? UTF-8 allows for any character that the Unicode standard defines. If they wanted to restrict a character set to the BMP, then they should have called it UTF-8-BMP.
When the purpose of your software is simply to record data, saying you support a particular encoding when you only support _some_ of it is pretty irresponsible.
> That an encoding can encode a particular value does not mean all applications using that encoding will support that value.
And yet you have to change the encoding of a column to support storage of non-BMP characters, even though the UTF-8 encoding supports them just fine. Where's the sense in that?
No, you have to change the character set of a column. It's unfortunate that they gave the character set a name normally used for an encoding, but that doesn't make the character set an encoding.
It seems to me that you're the one confusing concepts. MySQL originally didn't support characters outside the BMP, fine. But since the encoding is completely different from that, a real UTF-8 would have automatically supported astral characters as soon as MySQL supported them.
The fact that MySQL itself gained support for all of Unicode, but did not use this support in the 'utf8' encoding, shows that it is not actually UTF-8.
But I expect an app that says "you can store utf8 data here" to actually store that, not "I will silently drop everything after the first utf8 character that doesn't fit in 3 bytes".
I don't see why specifying utf8 as a column encoding should be interpreted as "I will read and discard utf8".
This is different from declaring utf8 as the connection encoding.
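To illustrate what that silent truncation amounts to, here's a rough sketch in plain Python of where the cut lands, i.e. at the first character whose UTF-8 encoding needs more than 3 bytes. This is only an illustration of the behaviour described above, not MySQL's actual code.

    def truncation_point(text: str) -> int:
        """Index of the first character needing more than 3 UTF-8 bytes."""
        for i, ch in enumerate(text):
            if len(ch.encode("utf-8")) > 3:
                return i
        return len(text)

    s = "5\u20ac lunch \U0001F600"  # "5€ lunch 😀": the euro sign takes 3 bytes, the emoji 4
    cut = truncation_point(s)
    print(s[:cut])  # "5€ lunch " -- the emoji and everything after it is gone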
Your expectation is common among westerners. It typically goes away within a few days of beginning to work with large amounts of CJK content.
When someone says they accept UTF-8, UCS-2, or any other encoding, my first question is always "Are characters outside the Basic Multilingual Plane supported?". There is a long history quite apart from MySQL of the answer to that question being "no", particularly if the project in question originated in the 90s (or even early 2000s).