Hello, I have a situation where using a models.JSONField seems like the best solution for my use case. I would like to enforce a schema or some other structure on the JSON, because I know the structure for now and would like to make the field’s contents clearer. Ideally it would be enforced every time data is written to the column in the database, rather than depending on endpoints to validate the structure.
My initial thought was to put the validation logic in the model.clean function and then call it in model.save. However, I realized that save isn’t called when doing update or bulk_update, and so on. Something that is done on every write, though, is encoding the dict. I could put the validation logic in a custom encoder and pass the encoder as an argument to the JSONField. However, encoding is run after clean, so I’m wondering if it is an anti-pattern.
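For illustration, the encoder idea could look roughly like the sketch below. It uses only the standard library; REQUIRED_KEYS and SchemaValidatingEncoder are hypothetical names, and the "schema" is just a mapping of required keys to expected types. It overrides encode rather than default, because default is only consulted for objects the encoder cannot serialize on its own.

```python
import json

# Hypothetical schema: required top-level keys and their expected types.
REQUIRED_KEYS = {"name": str, "tags": list}


class SchemaValidatingEncoder(json.JSONEncoder):
    """Rejects values that do not match REQUIRED_KEYS before encoding them."""

    def encode(self, o):
        if not isinstance(o, dict):
            raise ValueError("expected a JSON object at the top level")
        for key, expected_type in REQUIRED_KEYS.items():
            if key not in o:
                raise ValueError(f"missing required key: {key!r}")
            if not isinstance(o[key], expected_type):
                raise ValueError(
                    f"{key!r} must be of type {expected_type.__name__}"
                )
        return super().encode(o)


# json.dumps instantiates the class passed as cls and calls its encode().
encoded = json.dumps({"name": "example", "tags": ["a", "b"]},
                     cls=SchemaValidatingEncoder)
```

On the Django side the class would be passed as models.JSONField(encoder=SchemaValidatingEncoder). One thing to check before relying on this: the encoder may also be applied to values used in query lookups, not only to values being written.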
I am thinking of creating a Django REST Framework serializer class for the JSONField only. I could use its validation methods to pass it a dict and it would format it correctly. Not sure if this is the best way to enforce the structure, I am open to suggestions.
What do you think?
bulk_create queries the database directly.
If you want the clean method to run before saving, use a ModelForm or call full_clean yourself.
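# bulk_create example (MyModel is a placeholder for your model class)
objs = [...]  # list of unsaved MyModel instances
for obj in objs:
    obj.full_clean()  # raises ValidationError if an instance is invalid
MyModel.objects.bulk_create(objs)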
Sorry, that has nothing to do with what I asked. Maybe you answered in the wrong thread?
Isn’t the question due to the clean method not working before bulk creation?
Welcome @nintskari !
That is a really interesting idea! (You could also create subclasses of the JSONField with your custom behavior “baked in”, which might “look” cleaner. I’m not saying it’s any better, it’s just an idea.)
It could be an anti-pattern, but, it may also be one of those things where you don’t really have another great option.
The only other solution that I could think of where you would be assured of having the structure tested would be the creation of a trigger in the database to perform the validation before being saved. (Think of it as a manually-created database constraint. Not saying it is that, but it’s a way to think about it. Again, just tossing out ideas.)
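As a rough illustration of the trigger idea, here is a sketch against an in-memory SQLite database using Python's sqlite3 module. The item table, the trigger names, and the rule that data.name must be a string are all made up for the example, and it assumes the SQLite build includes the JSON functions (json_type). Other databases would need their own trigger or constraint syntax.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE item (
        id INTEGER PRIMARY KEY,
        data TEXT NOT NULL
    );

    -- Reject inserts where data.name is missing or not a JSON string.
    CREATE TRIGGER item_data_insert_check
    BEFORE INSERT ON item
    BEGIN
        SELECT CASE
            WHEN json_type(NEW.data, '$.name') IS NOT 'text'
            THEN RAISE(ABORT, 'data.name must be a JSON string')
        END;
    END;

    -- Apply the same rule whenever the data column is updated.
    CREATE TRIGGER item_data_update_check
    BEFORE UPDATE OF data ON item
    BEGIN
        SELECT CASE
            WHEN json_type(NEW.data, '$.name') IS NOT 'text'
            THEN RAISE(ABORT, 'data.name must be a JSON string')
        END;
    END;
    """
)


def insert_item(raw_json):
    """Insert one row; the trigger aborts the statement if the JSON is invalid."""
    with conn:
        conn.execute("INSERT INTO item (data) VALUES (?)", (raw_json,))


insert_item('{"name": "ok"}')
```

Because the check lives in the database, it applies no matter which ORM method (or raw SQL) performs the write; a rejected write surfaces in Python as a database error.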
Great to hear some validating thoughts from you @KenWhitesell. I tried searching for an answer to this problem but could not find a good one. I tried the encoder approach out and it seems to work, but I actually might end up not using the JSONField after all. So I guess this is just food for thought.
You’re right, I considered utilizing the schema constraint feature in the database. It would be a lot faster than running the checks in Python, but it would also be more challenging to implement reliably. It would probably require at least these for good usability:
- Passing the JSON schema from the field to a migration file and generating the check in SQL, in a database-specific way
- Logic to detect changes to the schema, so that a migration would be required
- Validation of the schema itself
Well, not really. I understand that clean is not run for some operations and that this is intentional. I was looking for a generic way to catch all database write operations that create or update an object, and to manipulate the data for the JSON field specifically. But I think I was going a bit against the framework and will try to handle the situation without a JSON field. Thanks
I’ve been through a similar situation, and depending on which command you use, some methods are not called.
The simplest way is to just add verification logic before saving.
Slightly late to the party here, but I was googling to see if anyone had made a library for this and came across this thread. My thoughts:
- Django doesn’t call Model.save when doing bulk_create, but I’m pretty sure it does call Field.pre_save. So a custom subclass of JSONField with a customised pre_save could do it.
- As we know, Django doesn’t run model validation for you if you just save something. And (especially with more API-based stuff, rather than form-based stuff these days) I see a lot of code that saves without running model validation. So I think that running a pre-save check in the model field would be a good way to enforce the schema.
- Similar to how you get an IntegrityError if you try to save None into a non-nullable field (even if you don’t run validation), I think that IntegrityError is probably the right error to throw.
- My instinct is to use dataclasses.dataclass for this, so you can do my_field = StructuredJSONField(structure=MyDataClass). This would mean that you could pass an instance of MyDataClass as the field value, and similarly get back an instance of MyDataClass when accessing the field value. So JSON would just be the underlying storage mechanism.
- But I’m also wondering whether using JSON Schema might be good to allow for additional validation, such as min/max lengths of lists, or min/max values of ints/floats, etc. That said, maybe with enough type declarations you could do this validation with data classes.
I’m probably going to have a crack at this next week, and will open source it. Any thoughts/suggestions welcome.
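To make the dataclass idea a bit more concrete, here is a minimal standard-library sketch of the conversion such a field might perform. Address and to_structure are hypothetical names, only plain types such as str or int are checked, and the Django part is deliberately left out: a StructuredJSONField could call something like to_structure from its pre_save and use dataclasses.asdict to get back the dict that is stored as JSON.

```python
from dataclasses import asdict, dataclass, fields

# Only these plain annotation types are checked in this sketch.
SIMPLE_TYPES = (str, int, float, bool, list, dict)


@dataclass
class Address:
    """Hypothetical structure for the JSON column."""
    street: str
    city: str
    zip_code: str = ""


def to_structure(cls, data):
    """Build an instance of the dataclass cls from a dict, or raise."""
    if isinstance(data, cls):
        return data
    if not isinstance(data, dict):
        raise TypeError(
            f"expected a dict or {cls.__name__}, got {type(data).__name__}"
        )
    declared = {f.name: f.type for f in fields(cls)}
    unknown = set(data) - set(declared)
    if unknown:
        raise ValueError(f"unknown keys: {sorted(unknown)}")
    obj = cls(**data)  # raises TypeError if a required key is missing
    for name, expected in declared.items():
        if expected in SIMPLE_TYPES and not isinstance(getattr(obj, name), expected):
            raise ValueError(f"{name!r} must be of type {expected.__name__}")
    return obj


address = to_structure(Address, {"street": "1 Main St", "city": "Springfield"})
stored = asdict(address)  # the plain dict that would be serialized to JSON
```

Nested dataclasses, generics such as list[int], and range checks are out of scope here; that is where JSON Schema or a richer validation library would come in.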
Django doesn’t call Model.save when doing bulk_create, but I’m pretty sure it does call Field.pre_save.
According to the docs, the pre_save and post_save signals are not sent on bulk_create.
We’re not referencing the pre_save signal here - there’s a field method also named pre_save to prepare the field before the save is called. This method is totally unrelated to the signal by the same name.