Collecting objects for export, not deletion

Hi all,

A project I work on needed an “export a user and all dependent objects” feature. We already needed it a few years ago, but it’s also a relatively easy way to comply with the GDPR requirement to let users obtain all the data you keep on them. So, instead of writing code that lists specific models, which would need maintenance every time a model is added, we wrote something generic: we used django.db.models.deletion.Collector to collect the objects and then dumped them as JSON, et voilà. We did have to use the Collector, a private API, directly, but we only needed to call it, not modify it.

But now we’ve run into a problem with this approach: we started using on_delete=PROTECT on FKs, and that broke the collection. Since the Collector assumes objects are being collected for deletion, when it encounters objects related through such an FK, it cries foul and raises ProtectedError.

This check, which invokes the on_delete handler of each relation and acts on its decision, is hardwired into the collector’s collect() method. So, to make it do what I wanted, I had no choice but to override collect() with an almost-exact copy.

So, I would like to start a discussion about doing at least one of two things:

  • Deeming the export functionality mentioned above important enough to be supported by a public API, or
  • Making the current collector a little more modular, so that a user could collect objects for purposes other than deletion.
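The second option could look roughly like the following toy sketch in plain Python. It is not Django’s Collector and uses none of its API; every name in it (Obj, collect, deletion_policy, export_policy, the string constants and the local ProtectedError) is hypothetical. What it illustrates is that the graph walk can stay the same while only the policy deciding what each on_delete value means is swapped. Django’s real rules are more involved: RESTRICT, for example, allows the deletion when the restricted object is also being deleted through a CASCADE path.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Callable

# Hypothetical stand-ins for on_delete behaviours (not Django's objects).
CASCADE, PROTECT, SET_NULL = "CASCADE", "PROTECT", "SET_NULL"


class ProtectedError(Exception):
    """Toy analogue of django.db.models.ProtectedError."""


@dataclass(eq=False)  # eq=False keeps identity hashing, so objects can go in a set
class Obj:
    name: str
    # Reverse relations: (on_delete behaviour, objects that point at this one).
    dependents: list[tuple[str, list[Obj]]] = field(default_factory=list)


# A policy decides, for one reverse relation, whether to follow it.
Policy = Callable[[str, Obj, Obj], bool]


def deletion_policy(on_delete: str, parent: Obj, child: Obj) -> bool:
    """Simplified deletion semantics: refuse on PROTECT, follow only CASCADE."""
    if on_delete == PROTECT:
        raise ProtectedError(f"{child.name} protects {parent.name}")
    return on_delete == CASCADE


def export_policy(on_delete: str, parent: Obj, child: Obj) -> bool:
    """Export semantics: every dependent belongs in the dump, whatever on_delete says."""
    return True


def collect(root: Obj, policy: Policy) -> list[Obj]:
    """Walk reverse relations from root; the policy alone decides which edges to follow."""
    collected: list[Obj] = []
    seen: set[Obj] = set()
    stack = [root]
    while stack:
        obj = stack.pop()
        if obj in seen:  # guards against cycles and shared dependents
            continue
        seen.add(obj)
        collected.append(obj)
        for on_delete, children in obj.dependents:
            for child in children:
                if policy(on_delete, obj, child):
                    stack.append(child)
    return collected


user, comment, invoice = Obj("user"), Obj("comment"), Obj("invoice")
user.dependents = [(CASCADE, [comment]), (PROTECT, [invoice])]

print(sorted(o.name for o in collect(user, export_policy)))  # ['comment', 'invoice', 'user']
try:
    collect(user, deletion_policy)
except ProtectedError as exc:
    print("deletion-style collection refused:", exc)  # invoice protects user
```

In this toy, a SET_NULL dependent is likewise skipped by deletion_policy (Django’s Collector records a field update for such objects rather than collecting them for deletion), whereas an export would want to include it.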

Opinions and ideas welcome.

Hmm, is Collector using any private APIs internally? I’m curious whether we can make it slightly more modular to accommodate your use case, but I’m not sure we want to make it fully modular, take it out of the nice, specific bubble it’s in, and complicate it even more.

It uses the Rel objects on FKs. These are borderline – they are accessible via public Field objects retrieved using the public _meta API, but have been treated as private (e.g. moved from one side of the relation to the other, at some point during the 2.x period, without much fanfare).

I took another look at the Collector, and I note that with the addition of the RESTRICT on-delete behavior (in 3.1), it is now more complex and more closely coupled to deletion.

I don’t think it would make much sense to modularize the code in the name of other uses, unless we either declare the API public, or include the other uses in Django. That would be a muddy situation, and I doubt it could be maintained for long.

Yeah, honestly, it just doesn’t feel appropriate to me, sad as that is since it makes your problem harder. It’s already complex enough and closely tied to deletion; I’m not sure what it would grow into.

Maybe the answer here is something external that relies on the sort-of-public nature of Rels and just keeps an eye on what we do with them?