Ticket 18392 and MySQL utf8mb4

Django’s minimum required version of MySQL is 8.0.11. Since MySQL 8.0, the utf8mb4 charset has been recommended instead of the old utf8/utf8mb3, which MySQL has deprecated.

Currently, Django defaults to using the deprecated “utf8”/“utf8mb3” character set.

If a user wants to use a different character set, they can add the “charset” and “collation” options to their DATABASES configuration.
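As a rough sketch of the configuration described above, a settings fragment might look like the following. The database name and credentials are placeholders. The collation shown is MySQL 8.0's default for utf8mb4, picked only for illustration, and whether a "collation" option is accepted may depend on the installed database driver version.

```python
# settings.py (sketch; NAME, USER and PASSWORD are placeholder values)
DATABASES = {
    "default": {
        "ENGINE": "django.db.backends.mysql",
        "NAME": "example_db",
        "USER": "example_user",
        "PASSWORD": "example_password",
        "OPTIONS": {
            # Explicit charset and collation, as mentioned in the post.
            # Use "utf8mb3" here instead to stay on the legacy encoding.
            "charset": "utf8mb4",
            # Assumption: the driver in use accepts a "collation" option.
            "collation": "utf8mb4_0900_ai_ci",
        },
    }
}
```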

What do you think of changing Django defaults to “utf8mb4” for MySQL, with notes for the users that they can use the DATABASES options to stay on “utf8mb3” if they have a legacy database that isn’t fully ready for “utf8mb4”?

(See the ticket for past discussion.)

Wow. Yes. I always assumed that it was MySQL that was defaulting to “utf8”, not Django. I have "OPTIONS": {"charset": "utf8mb4"} set in all of my database settings, and I 100% agree this should be the default in Django; otherwise you get emoji issues. With “utf8” (aka “utf8mb3”) being deprecated, I think that makes it even clearer.

As far as I know, the old 191-character index-length concern is no longer an issue in modern MySQL versions. I think that would have been the main reason for keeping “utf8” as the default (besides backward compatibility).

I also think that at some point we’ll have to go forward and change the default.

One method would be to change the default in the next Django version and prominently document it.

Another possible option would be to go through an (accelerated?) deprecation step: obtain the default database charset and warn projects to set an explicit 'OPTIONS': {'charset': 'utf8'} if they want to keep using the legacy encoding, as the default will change in the next Django version.
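To make the deprecation idea a bit more concrete, here is a minimal sketch of the check it implies. This is not Django code: the function name is hypothetical, a plain DeprecationWarning stands in for whatever warning class Django would use, and the caller is assumed to have already fetched the database's default charset (for example via MySQL's `@@character_set_database` variable).

```python
import warnings

# Both names refer to the same legacy 3-byte encoding in MySQL.
LEGACY_CHARSETS = {"utf8", "utf8mb3"}


def warn_if_implicit_legacy_charset(options, database_charset):
    """Warn when no charset is configured and the database uses a legacy one.

    `options` is the OPTIONS dict from a DATABASES entry (hypothetical input);
    `database_charset` is the charset reported by the server for the database.
    Returns True if a warning was issued, False otherwise.
    """
    if "charset" in options:
        # The project made an explicit choice; nothing to warn about.
        return False
    if database_charset not in LEGACY_CHARSETS:
        # The database is already on utf8mb4 (or something else entirely).
        return False
    warnings.warn(
        "No 'charset' set in OPTIONS and the database uses "
        f"{database_charset!r}. Set 'OPTIONS': {{'charset': 'utf8'}} "
        "explicitly to keep the legacy encoding.",
        DeprecationWarning,
        stacklevel=2,
    )
    return True
```

Projects that already set a charset explicitly would see no warning, so only installs relying on the implicit default would be affected.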

Thanks. I think the index length could still be limited if someone switched to utf8mb4, but kept the old row format. I would tend to just list that in the documentation and suggest they stay on the old utf8mb3 if they can’t update the row format.
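For reference, the arithmetic behind the 191-character figure can be sketched as below. The byte limits are InnoDB's index key prefix limits per row format (3072 assumes the default 16KB page size); the function and dictionary names are just for illustration.

```python
# InnoDB index key prefix limits in bytes, by row format
# (3072 assumes the default 16KB page size).
INDEX_PREFIX_LIMIT_BYTES = {
    "REDUNDANT": 767,
    "COMPACT": 767,
    "DYNAMIC": 3072,
    "COMPRESSED": 3072,
}

# Maximum bytes a single character can occupy in each charset.
MAX_BYTES_PER_CHAR = {"utf8mb3": 3, "utf8mb4": 4}


def max_indexable_chars(row_format, charset):
    """Return how many characters fit in one index key prefix."""
    return INDEX_PREFIX_LIMIT_BYTES[row_format] // MAX_BYTES_PER_CHAR[charset]


print(max_indexable_chars("COMPACT", "utf8mb3"))  # 255
print(max_indexable_chars("COMPACT", "utf8mb4"))  # 191
print(max_indexable_chars("DYNAMIC", "utf8mb4"))  # 768
```

So a fully indexed 255-character column only becomes a problem under utf8mb4 when the table is still on one of the older row formats.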

Thanks. My preference would be to change the default and prominently document it, but if we need to go through a deprecation step, we could.

If we do a deprecation step, I wonder if we could just mark the default-utf8mb3 as deprecated, without adding code to check the database that Django is connecting to.

Maybe, but I’m not certain it’s possible to determine the database collation based on the connection settings only. It’s a bit tricky (that’s why the ticket is currently rotting). Breaking compatibility is really the simplest way, but it is not really in line with Django’s traditional compatibility policy.

Since all a user has to do to stay on utf8mb3 is put the charset and collation in the DATABASES options, I don’t think it’s too bad of an incompatibility. And the change is in line with what MySQL is doing: MySQL has deprecated the current Django default, so Django is currently defaulting to a deprecated (and, many would argue, broken) character set.

And, the current default is already causing problems - users like @collinanderson who want real UTF8 currently have to know about the issue and take an extra step to support it.

@claudep Do you think a Steering Council decision would be required to change the default?

Do you think a Steering Council decision would be required to change the default?

Not necessarily. The idea of a forum thread is to be able to collect several opinions and see if some clear path emerges. For example, I would love to get the opinion of @adamchainz, as the maintainer of django-mysql.

We saw how well that went with the storage changes, where users lost their files because they didn’t update the configuration. On the plus side, I don’t think that changing to utf8mb4 as a default would result in losing data. But it can be really annoying if the default gets changed and a new table gets created with utf8mb4, and all of a sudden you can no longer join tables because the charsets don’t match, etc.
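One way a project could spot that kind of mismatch ahead of time is to group its tables by charset. The sketch below assumes the (table name, collation) pairs have already been read from MySQL's `information_schema.TABLES` (columns `TABLE_NAME` and `TABLE_COLLATION`); the helper name and the sample table names are made up for illustration.

```python
from collections import defaultdict


def tables_by_charset(table_collations):
    """Group table names by the charset implied by their collation.

    MySQL collation names start with the charset name, for example
    "utf8mb4_0900_ai_ci" belongs to "utf8mb4".
    """
    groups = defaultdict(list)
    for table_name, collation in table_collations:
        charset = collation.split("_", 1)[0]
        if charset == "utf8":
            # "utf8" is an alias for "utf8mb3"; treat them as one charset.
            charset = "utf8mb3"
        groups[charset].append(table_name)
    return dict(groups)


# Hypothetical rows, shaped like (TABLE_NAME, TABLE_COLLATION).
rows = [
    ("auth_user", "utf8mb3_general_ci"),
    ("blog_post", "utf8_general_ci"),
    ("blog_comment", "utf8mb4_0900_ai_ci"),
]
groups = tables_by_charset(rows)
if len(groups) > 1:
    print("Mixed charsets found:", groups)
```

More than one key in the result means some tables were created under a different default and may need converting before they are compared or joined on text columns.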

All in all it would be great to change the default.

Any more thoughts on this issue? @adamchainz @nessita @sarahboyce others?