Handling of umlauts in JSONField vs. CharField

I am having trouble writing plain UTF-8 encoded text into a JSONField.
On my two test databases (MariaDB using utf8mb4, and SQLite), with Python 3.7.3, the following happens:

from django.db import models

class Product(models.Model):
    data = models.JSONField()
    store = models.CharField(max_length=24)
    number = models.PositiveIntegerField()

Test data:

a = {"country": "österreich"}
Product.objects.create(store="süden", number=1, data=a)

This data arrives in my database as follows:

number: 124
store: süden
data: {"country": "\u00f6sterreich"}

Why is the CharField handling the input correctly, while the JSONField is not? Reading from the JSONField works, but Q(data__icontains="ö") does not match anything (and therefore the admin search cannot find umlauts in data either).

The serialization / deserialization of the JSONField within Django is performed by Python's json module.

Within the Character Encodings section of the docs, you’ll see:

As permitted, though not required, by the RFC, this module’s serializer sets ensure_ascii=True by default, thus escaping the output so that the resulting strings only contain ASCII characters.

The JSONField allows you to specify an encoder and a decoder class to perform the transformations. You could define a JSONEncoder subclass that uses ensure_ascii=False to prevent the escaping of the non-ASCII characters.
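
To see the effect outside of Django, here is a minimal sketch that uses only the standard json module. The dictionary is the test data from the question; nothing here is Django-specific.

```python
import json

data = {"country": "österreich"}

# Default behaviour: ensure_ascii=True escapes every non-ASCII character.
escaped = json.dumps(data)
# With ensure_ascii=False the characters are written as they are.
literal = json.dumps(data, ensure_ascii=False)

print(escaped)  # {"country": "\u00f6sterreich"}
print(literal)  # {"country": "österreich"}

# Both strings decode back to the same Python object ...
assert json.loads(escaped) == json.loads(literal) == data

# ... but a plain substring search only finds "ö" in the unescaped text.
print("ö" in escaped)  # False
print("ö" in literal)  # True
```

That would also fit the symptom with icontains: if the escaped text is what ends up in the column, a substring match on "ö" has nothing to find, even though reading the value back decodes it correctly.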

Thank you. I already thought about that idea of ensure_ascii=False but I wondered how to implement it.
I tried implementing it, but I found very little documentation on the encoding/decoding.

import json
class MyEncoder(json.JSONEncoder):
    def encode(self, o):
        return json.dumps(o, ensure_ascii = False)

class Product(models.Model):
    data        = models.JSONField(encoder = MyEncoder)
    store       = models.ForeignKey(Store, on_delete = models.CASCADE)
    number   = models.PositiveIntegerField()

Am I oversimplifying this? Please give me some input.

I don’t have any direct personal knowledge in this area. I’d certainly give what you’ve got here a try just to see what happens, but I’ve probably looked at this less than you have.

Looking at this a little more, you might be able to do this a couple different ways.
e.g.

class MyEncoder(json.JSONEncoder):
    def __init__(self, *, **kwargs):
        super().__init__(*, ensure_ascii=False, **kwargs)

I don’t think it’s going to be all that difficult.

Another approach would be to create your own field as a subclass of JSONField that overrides the get_prep_value method of the field class.

Unfortunately, I was not able to get it to work with the custom encoder. I would be very interested in how this could be done, as I have googled a lot and haven’t found any information of significance. Also, I think the * in your example should maybe be an o? Or was that intentional?

The hint about overriding get_prep_value was gold though! I will leave that here in case someone finds this via Google:

import json
from django.db import models

class CustomJSONField(models.JSONField):
    def get_prep_value(self, value):
        if value is None:
            return value
        # Serialize without escaping non-ASCII characters such as umlauts.
        return json.dumps(value, ensure_ascii=False)

and therefore the model changes to:

class Product(models.Model):
    data = CustomJSONField()
    store = models.CharField(max_length=24)
    number = models.PositiveIntegerField()

Bad edit on my part. I copied the original function signature and forgot to remove the bare * from it.

/usr/lib/python3.9/json/encoder.py has the following definition for JSONEncoder:

    def __init__(self, *, skipkeys=False, ensure_ascii=True,
            check_circular=True, allow_nan=True, sort_keys=False,
            indent=None, separators=None, default=None):

A custom encoder then needs to pass the keyword args through, so that line should probably have been:
super().__init__(ensure_ascii=False, **kwargs)
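
One caveat with that line: json.dumps() already passes ensure_ascii as a keyword when it instantiates the encoder class, so kwargs may already contain it, and passing it a second time raises a TypeError. A minimal sketch that overwrites the value instead, with the bare * dropped from the signature as well. UnicodeJSONEncoder is a made-up name, and I am assuming the field hands the encoder class to json.dumps via cls=, so treat this as something to try rather than a confirmed fix:

```python
import json

class UnicodeJSONEncoder(json.JSONEncoder):
    """Hypothetical encoder that always writes non-ASCII characters unescaped."""

    def __init__(self, **kwargs):
        # json.dumps() supplies ensure_ascii itself, so overwrite the value
        # in kwargs instead of passing the keyword a second time.
        kwargs["ensure_ascii"] = False
        super().__init__(**kwargs)

# Same call shape as passing an encoder class to json.dumps.
encoded = json.dumps({"country": "österreich"}, cls=UnicodeJSONEncoder)
print(encoded)  # {"country": "österreich"}
```

If that works for you, it would be used on the model as models.JSONField(encoder=UnicodeJSONEncoder).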