FileField uploads creating many duplicate files in the media directory

I have a form that accepts a file upload. It has been developed as a PWA that works both offline and online.
My model uses a FileField to store the file's name, and the file itself is stored in the media folder.
Everything works without issues: I can save the file and also view it.
I also understand that if a file with the same name already exists, Django saves the new file under an alternative name with a random suffix (‘ABC_somestuff.jpg’) and stores that new name in the FileField.
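For context, the renaming works roughly like the sketch below: a simplified, stdlib-only model of what Django's `Storage.get_available_name()` does (the real method asks the storage backend via `exists()` and also honours `max_length`). `alternative_name` and the `existing` set are illustrative stand-ins, not Django API. Because a taken name is never overwritten, every extra time the save path runs for the same upload leaves one more byte-identical file behind.

```python
import os
import random
import string


def alternative_name(name: str, existing: set, suffix_len: int = 7) -> str:
    """Return `name`, or `root_<random>ext` if `name` is already taken.

    Simplified model of Django's Storage.get_available_name(): the real
    method asks the storage backend via exists() and also enforces max_length.
    """
    root, ext = os.path.splitext(name)
    alphabet = string.ascii_letters + string.digits  # letters and digits, as Django uses
    candidate = name
    while candidate in existing:
        suffix = "".join(random.choices(alphabet, k=suffix_len))
        candidate = f"{root}_{suffix}{ext}"
    return candidate


existing_names = {"ABC.jpg"}
print(alternative_name("ABC.jpg", existing_names))  # a variant such as ABC_<7 random characters>.jpg
```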

On closer inspection, my media folder contains the same image duplicated several times. For example, I find the following:
ABC_n234sdf.jpg
ABC_s234kp4.jpg
ABC_3jsj23424.jpg

The FileField contains the latest filename, ‘ABC_3jsj23424.jpg’, but when you open the files they are all exactly the same image, and nothing else in the database references the other copies.
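One quick way to confirm that the copies are identical byte for byte (not just visually) is Python's `filecmp` module. A minimal sketch; `all_identical` is a hypothetical helper.

```python
import filecmp


def all_identical(paths):
    """True if every file in `paths` has exactly the same bytes as the first one.

    shallow=False forces a content comparison instead of trusting the
    os.stat() signature (file type, size and modification time).
    """
    first, *rest = paths
    return all(filecmp.cmp(first, other, shallow=False) for other in rest)
```

Usage would be along the lines of `all_identical(["media/ABC_n234sdf.jpg", "media/ABC_s234kp4.jpg", "media/ABC_3jsj23424.jpg"])`, where the `media/` prefix is a placeholder for your MEDIA_ROOT.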

My question:
Is there a reason why this happens, and how can we avoid getting these multiple copies?

Is this happening with every file being uploaded, or just some?

Do your logs show whether or not they’re coming from the same person uploading them? (Are they coming from POSTs that are happening close together? How close together are the timestamps of the files being written?)
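To answer the timing question from the files themselves, a sketch like this sorts the copies by modification time and reports the gap between consecutive writes. `write_time_gaps` is a hypothetical helper; `st_mtime` is the last time each file was written, which for an upload is close to when Django finished saving it.

```python
import os
from pathlib import Path


def write_time_gaps(paths):
    """Sort files by modification time; return (name, seconds since previous file).

    The first file's gap is reported as 0.0.
    """
    stamped = sorted((os.stat(p).st_mtime, Path(p).name) for p in paths)
    result = []
    previous = None
    for mtime, name in stamped:
        gap = 0.0 if previous is None else mtime - previous
        result.append((name, gap))
        previous = mtime
    return result
```

For example, `write_time_gaps(glob.glob("/path/to/media/ABC_*.jpg"))` with a placeholder path shows whether the copies were written seconds or minutes apart.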

It would be helpful if you shared the view where these files are uploaded and saved.

But I think the first step would be to try and determine if it’s your code doing this or some user activity, like multiple clicks on a submit button.

Hi Ken,

It’s not consistent: on some occasions it happens and on others it doesn’t.
A user sends one or more images to be saved at a time. The duplicates are written within seconds or minutes of each other. I couldn’t take a snapshot because my client had already deleted the duplicates. Next time it happens I will have to get it from him.

This is the saving scenario:
Online:

  1. The browser is online when the form is submitted.
  2. Submit form with image.
  3. View saves the form.

Offline:

    • Users capture data and images offline; they are stored in IndexedDB.
    • When the browser detects that it is back online, JavaScript code sends the stored form data to the server in the background.
    • The same view saves the form data.
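In a queue-and-flush design like this, a record stays in IndexedDB until the server confirms it, so if the flush is triggered a second time before the first request finishes (plausible on a slow connection), the same record can be posted twice. The model below is a Python stand-in for the browser code, not the actual app: `send` plays the role of `fetch()`, and the `in_flight` set is an assumed guard that does not exist in the original JavaScript.

```python
import threading


class OfflineQueue:
    """Hypothetical model of the IndexedDB queue: records stay until the server confirms."""

    def __init__(self, send):
        self.records = {}          # key -> payload, like the object store
        self.in_flight = set()     # keys whose upload has started but not finished
        self.lock = threading.Lock()
        self.send = send           # stand-in for fetch(); returns True on success

    def add(self, key, payload):
        self.records[key] = payload

    def flush(self):
        # Claim every record that is not already being uploaded.
        with self.lock:
            pending = [k for k in self.records if k not in self.in_flight]
            self.in_flight.update(pending)
        for key in pending:
            try:
                ok = bool(self.send(key, self.records[key]))
            except Exception:
                ok = False  # network error: keep the record for the next flush
            with self.lock:
                if ok:
                    del self.records[key]  # like the IndexedDB delete after success
                self.in_flight.discard(key)
```

In the browser, the equivalent would be a `Set` of `sitekey` values checked before calling `fetch()` and cleared in the `then`/`catch` handlers.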

Snippet of the saving part:

try:
    inspectDetailObj = InspectionDetails.objects.get(
        master_id_id=master_id,
        category_id_id=request.POST['category_id'],
        item_id_id=request.POST['item_id'])
except InspectionDetails.DoesNotExist:
    # No row yet for this master/category/item: build a new, unsaved instance.
    inspectDetailObj = InspectionDetails(
        master_id_id=master_id,
        category_id_id=request.POST['category_id'],
        item_id_id=request.POST['item_id'])
inspectDetailObj.item_value = request.POST['item_value']
# Replace the stored image only when a file was actually uploaded.
if request.FILES.get('item_image'):
    inspectDetailObj.item_image = request.FILES['item_image']
inspectDetailObj.save()

The JavaScript is also quite simple.
There are a few lines of code that check the navigator.onLine state.

// getObjectStore(), DB_STORE_NAME and savedRequests are defined elsewhere in the app.
const req = getObjectStore(DB_STORE_NAME).openCursor();
req.onsuccess = function (e) {
  const cursor = e.target.result;

  if (cursor) {
    // Collect every queued request first; send them once the cursor is exhausted.
    savedRequests.push(cursor.value);
    cursor.continue();
  } else {
    for (const savedRequest of savedRequests) {
      console.log('saved request', savedRequest);
      const requestUrl = '/add/';
      const dataval = savedRequest.payload;

      // Rebuild the multipart form the same way the online form posts it.
      const form_data = new FormData();
      form_data.append("category_id", dataval.category_id);
      form_data.append("item_id", dataval.item_id);
      form_data.append("site_id", dataval.site_id);
      form_data.append("item_value", dataval.item_value);
      form_data.append("master_id", dataval.master_id);
      if (dataval.item_image) {
        form_data.append("item_image", dataval.item_image, dataval.item_image.name);
      }
      form_data.append("dateadd", dataval.dateadd);

      // No Content-Type header: the browser sets the multipart boundary itself.
      fetch(requestUrl, {
        method: 'POST',
        body: form_data
      }).then(function (response) {
        console.log('server response:', response);
        if (response.status < 400) {
          // Remove the record from IndexedDB only after the server accepted it.
          getObjectStore(DB_STORE_NAME, 'readwrite').delete(savedRequest.sitekey);
        }
      }).catch(function (error) {
        // Keep the record so it is retried on the next sync.
        console.log('Send to Server failed:', error);
      });
    }
  }
};

So what you want to look for are multiple POST requests being generated for the same URL in close proximity to each other in your server logs. This sounds like it might be a case where, for whatever reason yet to be determined, the file is being submitted multiple times.
If you find the multiple POSTs roughly corresponding to the duplicate files, then what you’ve got is a data submission issue on the client side and not a Django issue on the server side.
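That log check can be scripted. A minimal sketch that reports consecutive POSTs to the same path landing within a time window; the log line format (`2024-05-01 10:15:02 POST /add/ 200`) and the helper name `close_posts` are assumptions, so adapt the parsing to your actual access log. Since `/add/` receives every item, close pairs only matter if they carry the same item_id, so if your own logging records that field, group by it as well.

```python
from datetime import datetime


def close_posts(lines, path="/add/", window_seconds=60.0):
    """Return (earlier, later) timestamp pairs of consecutive POSTs to `path`
    that are at most `window_seconds` apart.

    Assumes lines like '2024-05-01 10:15:02 POST /add/ 200' (hypothetical format).
    """
    times = []
    for line in lines:
        parts = line.split()
        if len(parts) < 4 or parts[2] != "POST" or parts[3] != path:
            continue
        times.append(datetime.strptime(f"{parts[0]} {parts[1]}", "%Y-%m-%d %H:%M:%S"))
    times.sort()
    return [(a, b) for a, b in zip(times, times[1:])
            if (b - a).total_seconds() <= window_seconds]
```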

Dear Ken,
That’s exactly what I’m not finding. I have logged every POST, and I find only a single POST for the requests that include images. Hence the confusing dilemma I’m facing.

It’s a puzzler alright.

If I had to diagnose this, I’d be looking to set up some sort of test environment where I could try submitting files locally to see if I could recreate these symptoms. I might use curl, Selenium, or a combination of the two to try to make it create duplicate files.

Given that you’re only seeing one POST being issued, I’m being led to believe that it can’t be a client-side issue. However, since this isn’t happening every time, it doesn’t sound like it can be a server-side issue either. At that point, my only idea is to trace what’s happening every way possible to see if you can capture these extra files being written close to the time that they are written.
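A low-tech way to do that tracing is to poll the upload directory, print each new file as it appears, and then line those times up against the request log. Stdlib-only sketch; `snapshot`, `new_files` and `watch` are hypothetical helpers, and the directory you pass should be the `upload_to` folder under MEDIA_ROOT.

```python
import os
import time


def snapshot(directory):
    """Map each file name in `directory` to its modification time."""
    with os.scandir(directory) as entries:
        return {e.name: e.stat().st_mtime for e in entries if e.is_file()}


def new_files(before, after):
    """Names present in `after` but not in `before`, oldest first."""
    added = [name for name in after if name not in before]
    return sorted(added, key=lambda name: after[name])


def watch(directory, interval=1.0, rounds=None):
    """Poll `directory` and print every file that appears, with the wall-clock time."""
    seen = snapshot(directory)
    count = 0
    while rounds is None or count < rounds:
        time.sleep(interval)
        current = snapshot(directory)
        for name in new_files(seen, current):
            print(f"{time.strftime('%H:%M:%S')} new file: {name}")
        seen = current
        count += 1
```

Running something like `watch("/path/to/media/images")` (placeholder path) on the server while a field user syncs would show whether the extra copies appear together or spread out.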

Does the server side logic that creates the InspectionDetails run in a loop of some sort to fetch master_id? It looks like master_id is fetched differently than the other fields.

Basically, I check whether a record exists; if not, I create and save one for that master_id. On the backend, this is the only code that saves the file.

Come to think of it, this mainly happens when the app is used outdoors, where the network connection may be slow. I have tried it on a LAN and never reproduced the issue. It happens in the field, but again not consistently. I’m not sure whether something in the middleware that handles saving files is involved. The one good thing is that even when there are multiple files, the FileField holds the name of the LAST file created, including its “_xxxx.jpg” postfix.

Do you have any signal receivers for this model? Or better yet, does item_image get modified in any other place for your project?
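A quick way to answer that is to scan the project for every place `item_image` is assigned or saved, plus any `pre_save`/`post_save` receivers. This is a regex-based sketch, so it can miss indirect writes (for example `setattr`, or a ModelForm's `save()` assigning fields from `cleaned_data`); `find_item_image_writes` is a hypothetical helper.

```python
import re
from pathlib import Path

# Assignments like `obj.item_image = ...` (but not `==`), FieldFile.save() calls,
# and signal names that could hook into the model's save.
PATTERN = re.compile(r"\bitem_image\b\s*=(?!=)|\bitem_image\.save\(|\b(?:pre|post)_save\b")


def find_item_image_writes(project_dir):
    """Yield (file, line number, stripped line) for every matching line in *.py files."""
    for path in sorted(Path(project_dir).rglob("*.py")):
        text = path.read_text(encoding="utf-8", errors="replace")
        for lineno, line in enumerate(text.splitlines(), start=1):
            if PATTERN.search(line):
                yield str(path), lineno, line.strip()
```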

No, it’s a straightforward application. I haven’t used any advanced features so far: it’s a plain save only, with no other manipulation done as a custom action.

Hello @vignesk70
Did you find the problem? I’m having the same problem. My frontend is a Vue.js app and my backend is Django. My clients need to upload many files, in some cases more than 1,000 images, and I send them one by one. For me, this happens when the number of files exceeds 500.

Nope. Since you are using Vue.js for your frontend and I’m using plain JavaScript to send the data to the backend, and the same problem exists for both of us, it could be related to the saving part within Django. I don’t have a solution yet. Currently I have a manual script that deletes the duplicate files. Not a very elegant solution, but it’s the only thing I can do.
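If a clean-up script is the stopgap for now, a dry-run-by-default version that only removes byte-identical, unreferenced copies is safer than deleting by filename pattern. A sketch under these assumptions: files live under MEDIA_ROOT, the FileField stores names relative to it (Django's default), and `referenced` is built from the database, for example with `InspectionDetails.objects.values_list('item_image', flat=True)`. `remove_unreferenced_duplicates` itself is a hypothetical helper.

```python
import hashlib
from collections import defaultdict
from pathlib import Path


def remove_unreferenced_duplicates(media_dir, referenced, dry_run=True):
    """Delete byte-identical copies that no database row points to.

    A group of identical files is only touched when at least one of its
    members is referenced, so the copy the FileField uses always survives.
    Returns the paths that were (or, with dry_run=True, would be) removed.
    """
    root = Path(media_dir)
    referenced = set(referenced)
    groups = defaultdict(list)
    for path in root.rglob("*"):
        if path.is_file():
            groups[hashlib.sha256(path.read_bytes()).hexdigest()].append(path)
    removed = []
    for paths in groups.values():
        names = {p.relative_to(root).as_posix() for p in paths}
        if len(paths) < 2 or not names & referenced:
            continue  # unique file, or no referenced copy to keep
        for p in sorted(paths):
            if p.relative_to(root).as_posix() not in referenced:
                removed.append(p)
                if not dry_run:
                    p.unlink()
    return removed
```

Run it once with the default `dry_run=True`, review the returned list, and only then pass `dry_run=False`.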

Hello, did you finally find the solution?
I also have the same issue, and I just found an antipattern that could cause this behavior. I haven’t tried anything yet, but I will soon.

Please do share if you find out why. Currently I’m using plain JavaScript to post the objects to the backend.