Handling of Umlaute in JSONField vs CharField

xtlc · April 6, 2022, 1:23pm

I am having trouble writing UTF-8 encoded text straight into a JSONField.
On my two test databases (MariaDB using utf8mb4, and SQLite) with Python 3.7.3, the following happens:

from django.db import models

class Product(models.Model):
    data     = models.JSONField()
    store    = models.CharField(max_length=24)
    number   = models.PositiveIntegerField()

Test data:

a = {"country": "österreich"}
Product.objects.create(store="süden", number=1, data=a)

This data arrives in my database as follows:

number: 1
store: süden
data: {"country": "\u00f6sterreich"}

Why is the CharField handling the input correctly while the JSONField is not? Reading from the JSONField works, but Q(data__icontains="ö") does not match anything (so the admin search also cannot find umlauts in data).

KenWhitesell · April 6, 2022, 3:12pm

Within Django, serialization and deserialization of a JSONField are performed by Python's json module.

In the Character Encodings section of the json module docs, you’ll see:

As permitted, though not required, by the RFC, this module’s serializer sets ensure_ascii=True by default, thus escaping the output so that the resulting strings only contain ASCII characters.

JSONField lets you specify encoder and decoder classes to perform these transformations. You could define a JSONEncoder subclass with ensure_ascii=False to prevent the unicode characters from being escaped.
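To see what ensure_ascii does on its own, in plain Python without Django: with the default, ö is written out as the escape sequence \u00f6, so a plain substring search on the stored text has no ö to find. That would be consistent with data__icontains="ö" coming back empty, though the exact SQL Django generates depends on the backend.

```python
import json

value = {"country": "österreich"}

escaped = json.dumps(value)                        # ensure_ascii=True (the default)
unescaped = json.dumps(value, ensure_ascii=False)  # non-ASCII characters kept as-is

print(escaped)    # {"country": "\u00f6sterreich"}
print(unescaped)  # {"country": "österreich"}

# A substring search on the serialized text only finds the umlaut
# when it was not escaped.
print("ö" in escaped)    # False
print("ö" in unescaped)  # True
```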

xtlc · April 6, 2022, 10:02pm

Thank you. I had already thought about ensure_ascii=False, but I wasn’t sure how to implement it.
I tried implementing it, but found very little documentation on the encoding/decoding.

import json
from django.db import models

class MyEncoder(json.JSONEncoder):
    def encode(self, o):
        return json.dumps(o, ensure_ascii = False)

class Product(models.Model):
    data     = models.JSONField(encoder = MyEncoder)
    store    = models.ForeignKey(Store, on_delete = models.CASCADE)  # Store is another model in the app
    number   = models.PositiveIntegerField()
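A quick way to check an encoder like MyEncoder without a database is to call json.dumps with cls=..., which (as far as I can tell from the Django source for 3.1 and later, an assumption here) is how JSONField uses its encoder class. The encode() override does produce unescaped output this way, but it throws away whatever options json.dumps passed to the constructor, and json.dump (writing to a file) goes through iterencode() instead, which this override does not touch.

```python
import json


# MyEncoder as in the post: encode() ignores the options json.dumps handed
# to the constructor and re-serializes with ensure_ascii=False.
class MyEncoder(json.JSONEncoder):
    def encode(self, o):
        return json.dumps(o, ensure_ascii=False)


value = {"country": "österreich"}

# Assumption: Django's JSONField serializes via json.dumps(value, cls=encoder),
# so this mirrors that call.
print(json.dumps(value, cls=MyEncoder))            # {"country": "österreich"}

# Side effect of ignoring self: options such as indent are silently dropped.
print(json.dumps(value, cls=MyEncoder, indent=2))  # {"country": "österreich"}
```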

Am I thinking about this too simply? Please give me some input.

KenWhitesell · April 6, 2022, 11:17pm

I don’t have any direct personal knowledge in this area. I’d certainly give what you’ve got here a try just to see what happens, but I’ve probably looked at this less than you have.

Looking at this a little more, you might be able to do this a couple of different ways, e.g.:

class MyEncoder(json.JSONEncoder):
    def __init__(self, *, **kwargs):
        super().__init__(*, ensure_ascii=False, **kwargs)

I don’t think it’s going to be all that difficult.

Another approach would be to create your own field as a subclass of JSONField that overrides the get_prep_value method of the field class.

xtlc · April 7, 2022, 2:21pm

Unfortunately, I was not able to get it to work with the custom encoder. I would be very interested in how this could be done, as I have googled a lot and haven’t found any significant information. Also, should the * in your example maybe be an o? Or was that intentional?

The hint about overriding get_prep_value was gold, though! I’ll leave my solution here in case someone finds this via Google:

import json
from django.db import models

class CustomJSONField(models.JSONField):
    def get_prep_value(self, value):
        if value is None:
            return value
        # Serialize without escaping non-ASCII characters such as ö
        return json.dumps(value, ensure_ascii = False)

and therefore the model changes to:

class Product(models.Model):
    data     = CustomJSONField()
    store    = models.CharField(max_length = 24)
    number   = models.PositiveIntegerField()

KenWhitesell · April 7, 2022, 4:21pm

Bad edit on my part. I copied the original function signature and forgot to remove it.

/usr/lib/python3.9/json/encoder.py has the following definition for JSONEncoder:

    def __init__(self, *, skipkeys=False, ensure_ascii=True,
            check_circular=True, allow_nan=True, sort_keys=False,
            indent=None, separators=None, default=None):

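A custom encoder then needs to pass the keyword args through. Because json.dumps already supplies ensure_ascii among them, it has to be overwritten rather than passed a second time, so that part should probably have been:
    def __init__(self, **kwargs):
        kwargs["ensure_ascii"] = False
        super().__init__(**kwargs)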
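A standalone check of the pass-through approach in plain Python (no Django). json.dumps(value, cls=...) constructs the encoder with ensure_ascii and the other options already in the keyword arguments, so a version that writes super().__init__(ensure_ascii=False, **kwargs) fails with a TypeError about multiple values for ensure_ascii, while overwriting the key in kwargs works. UnicodeJSONEncoder and DoubleKwargEncoder are hypothetical names made up for this sketch.

```python
import json


# Hypothetical name; the idea is to pass it as JSONField(encoder=...).
class UnicodeJSONEncoder(json.JSONEncoder):
    def __init__(self, **kwargs):
        # json.dumps(..., cls=...) always passes ensure_ascii (and the other
        # options) as keyword arguments, so overwrite the value in place
        # instead of supplying ensure_ascii a second time.
        kwargs["ensure_ascii"] = False
        super().__init__(**kwargs)


# Hypothetical variant that passes ensure_ascii twice, for comparison.
class DoubleKwargEncoder(json.JSONEncoder):
    def __init__(self, **kwargs):
        super().__init__(ensure_ascii=False, **kwargs)


def raises_type_error(encoder_cls, value):
    """Return True if json.dumps(value, cls=encoder_cls) raises TypeError."""
    try:
        json.dumps(value, cls=encoder_cls)
    except TypeError:
        return True
    return False


value = {"country": "österreich"}
print(json.dumps(value, cls=UnicodeJSONEncoder))  # {"country": "österreich"}
# Other options such as sort_keys and indent are still honoured.
print(json.dumps(value, cls=UnicodeJSONEncoder, sort_keys=True, indent=2))
print(raises_type_error(DoubleKwargEncoder, value))  # True
```

In a model it would presumably be used as data = models.JSONField(encoder=UnicodeJSONEncoder), though that step has not been verified here against each database backend.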
