Correct way to handle "orphaned" image files and other files?

Hello,

my project is quite heavy on user-uploaded images and files in general. I am using the StdImageField and FileField.

What is currently the best/correct way to handle these files? I would ideally delete them since they are no longer needed and storage isn’t infinite.

I have found a couple of packages (like https://github.com/un1t/django-cleanup) but I am a bit hesitant to use something like this which automatically deletes files.

Other approaches seem to be to delete the file when the object is deleted. Something like this:

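import os

# Remember the path first; it is read from the object before deletion.
image_path = image.image.path

image.delete()

# Remove the file itself if it is still on disk.
if os.path.exists(image_path):
    os.remove(image_path)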

With this there is the issue that the image variants created by StdImageField will remain on the disk. On the other hand, writing logic to similarly delete two additional files doesn’t seem too problematic.
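For illustration, here is a minimal sketch of what deleting the original plus its variants could look like. It assumes the variants sit next to the original and are named "<stem>.<variant><suffix>" (for example photo.jpg and photo.thumbnail.jpg); that naming scheme and the helper names variant_paths and delete_with_variants are assumptions for this example, so check them against the files your setup actually writes.

```python
from pathlib import Path


def variant_paths(original, variant_names):
    """Build the expected variant paths for one original file.

    Assumes the "<stem>.<variant><suffix>" naming scheme, e.g.
    "photo.jpg" -> "photo.thumbnail.jpg"; adjust if yours differs.
    """
    original = Path(original)
    return [
        original.with_name(f"{original.stem}.{name}{original.suffix}")
        for name in variant_names
    ]


def delete_with_variants(original, variant_names):
    """Delete the original file and its variants; return the paths removed."""
    removed = []
    for path in [Path(original), *variant_paths(original, variant_names)]:
        try:
            path.unlink()
        except FileNotFoundError:
            continue  # already gone, nothing to do for this one
        removed.append(path)
    return removed
```

Catching FileNotFoundError instead of checking os.path.exists() first avoids a small race where the file disappears between the check and the removal. Note that this only works for local filesystem storage; with remote storage the deletion would have to go through the storage backend instead.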

Are there any hidden pitfalls with the approach above? My models that contain the images are designed just for create and delete; an existing model instance is never updated with another image file.

PS: I also got a report that someone uploaded a new image with the same filename as an old one and the app was showing the old image instead of the new one. Is this possible? In my testing Django appends random chars to the end of the filename when there is a clash.

Many thanks

There are no hidden issues that I’m aware of in that situation.

If I remember correctly, the reason why Django doesn’t perform that sort of cleanup is that it can’t be sure that there aren’t other models referring to the same file, or even other systems or environments potentially using those files.
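If shared references are a concern in your own project, one option is to check for them before removing anything. The following is only a sketch: delete_if_unreferenced and the reference_checks parameter are names made up for this example, and each check is a callable you supply yourself.

```python
import os


def delete_if_unreferenced(path, name, reference_checks):
    """Delete `path` only if no check reports that `name` is still in use.

    `reference_checks` is an iterable of callables that take the stored
    file name and return True when something still points at that file.
    Returns True if the file was removed, False otherwise.
    """
    if any(check(name) for check in reference_checks):
        return False  # something else still needs the file
    try:
        os.remove(path)
    except FileNotFoundError:
        return False  # nothing on disk to remove
    return True
```

In a Django project a check could be something like lambda name: Photo.objects.filter(image=name).exists(), where Photo is a hypothetical model with an image field. A check like this can only see references in your own database, not other systems that read from the same storage.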

A couple of different possibilities come to mind:

  • Caching of the image in the browser
  • Caching of the HTML with the old link
  • Internal Django caching of the links
  • The wrong URL being exposed in the template

Would definitely need more details to identify a specific cause.
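If it does turn out to be caching, one way to sidestep it is to never reuse a file name, so every upload gets its own URL. A minimal sketch, assuming you pass a callable as upload_to on the field; the function name unique_upload_path and the "uploads/" prefix are just placeholders for this example.

```python
import uuid
from pathlib import PurePosixPath


def unique_upload_path(instance, filename):
    """Return "uploads/<random hex><original extension>".

    The (instance, filename) signature is what Django passes to a
    callable upload_to; `instance` is not used in this sketch.
    """
    suffix = PurePosixPath(filename).suffix.lower()
    return f"uploads/{uuid.uuid4().hex}{suffix}"
```

It would be used as upload_to=unique_upload_path on the field. The original file name is discarded, so store it in a separate field if you need to show it to users.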

This can happen?

Like a sort of optimization to not duplicate the same files?

I guess unless I set things up this way, it should happen automatically?

Accidentally, no. Intentionally, yes.

I could see that being one possible cause, yes.

I believe you meant to say “shouldn’t” here instead of “should”? If so, then yes, I agree with you completely.

For example, imagine two completely different Django installations. One, customer facing, allowing users to upload files to some shared storage medium. (Perhaps an AWS S3 bucket, but the specifics do not matter for this example.)
Now imagine a second system, providing a “file-manager”-style view of that shared storage, where periodically the system scans the storage to find the files located in it - building FileField objects from its discovery process.
Allowing either of those systems to delete files is potentially problematic.
(Not that I will ever admit to having seen anything like this)

I’m not saying it’s a good thing to do, or that I would recommend it under most circumstances.

However, Django (among many other web frameworks) will let you make unwise choices.


Thanks for the detailed answers Ken!

I will implement this manual deletion and see how it goes.