Setting FileField/ImageField where file has no name

I’ve noticed a “paper cut.” I think it’s a bug, personally, but I wanted to discuss first.

Here’s one example:

p = self.PersonModel(name="Joe")
p.mugshot = ContentFile(b"...")  # Content is a valid image

Before you run it, do you have a mental model for what the value of p.mugshot is? I’d be surprised if you got it correct. It turns out that it will be a FieldFile or ImageFieldFile which is False-y. If you were to save this model, it will actually silently save the model with no image, without producing an error.

If it’s an image field with width/height fields, you might be blessed with an opaque error:

django.db.utils.IntegrityError: NOT NULL constraint failed: model_fields_persondimensionsfirst.mugshot_height

In my opinion, the correct behavior would likely to be to raise an error as soon as the value is set. Fortunately, it’s actually pretty easy to do this because the logic already uses a descriptor.

Thoughts?

Actually, that’s not correct. If mugshot is a FileField, then it is a FileField. However, according to the docs at FileField and FieldFile:

When you access a FileField on a model, you are given an instance of FieldFile as a proxy for accessing the underlying file.

What is actually stored in the FileField in the database is not the image. It’s a file name. For a FileField to work, the file needs to be written to storage, and its name assigned to the field. (This is done in the pre_save method of the FileField field. If a name is not assigned, I believe the save fails.)

This would create different behavior than the other data types.

For example, if you have a model with an IntegerField, you can assign a string value to it. Errors aren’t thrown until you try to use that field.

Now, there is room for discussion whether those errors could, or should, be thrown at the time of assignment. But, at the moment, they aren’t.

It’s definitely not a FileField. It’s a FieldFile proxy as you state. (The names are confusing.)

In fact, it doesn’t actually fail. That might be really surprising to your intuition or understanding of Django’s internals. I can create a minimal test case to illustrate this.

Is failing in pre_save the behavior you expect?

I created a repo here that you can pull down and inspect: GitHub - john-parton/django-file-field-no-name

Here’s the relevant test code

        instance = TestModel()
        
        instance.image = ContentFile(
            base64.b64decode(IMAGE_CONTENTS),
        )

        # Data thrown away here silently
        instance.save()

        instance.refresh_from_db()

        # Throws an error because the file is empty
        instance.image.name

Edit

Here’s the extremely unhelpful error message you get (after the data you supplied is already discarded): ValueError: The 'image' attribute has no file associated with it.

If you re-fetch the object from the DB (not refresh_from_db), the value of image (using instance.__dict__["image"] to side step the descriptor) is the empty string.

In your test case, actually the field itself is an ImageField.

If you look at the _meta.fields attribute of the model, you will see it shows <class 'django.db.models.fields.files.ImageField'>

If you do something like type(instance.image), it will report as a FieldFile. But this does not mean that image is a FieldFile. What it’s telling you is that what you get from the instance class when you reference image is a FieldFile. (This is the Model’s ModelBase “metaprogramming magic” at work.)

Consider the following class:

class TestClass:
  def __init__(self, x):
    self.x = x
  def __getattribute__(self, attr):
    return str(super().__getattribute__(attr))

Now, if I do the following in a python shell:

>>> t = TestClass(3)
>>> t.x
'3'
>>> type(t.x)
<class 'str'>

Does this now mean that the x in this instance of TestClass is now a string? No, t.x is still an integer. Despite that, everything that I reference using that name gives me a string.

(Also, a dir(t) is going to return an empty list, which is vastly different than if you do a dir on any regular class.)

If by “failing” you mean that it doesn’t write a file without a name and without returning some type of error, my initial reaction would have been no, I would not have expected it to behave like this.

However, I’ve worked with Django long enough at this point to work from the perspective that if something is happening that I don’t understand, it’s my understanding that is lacking. My baseline is that things are generally done for a reason, even if I don’t know what that is, or can’t think of one.

<conjecture>
This could be related to the requirements associated with “storages”, or it may be related to the behavior of the underlying Python libraries.
</conjecture>

Note: I have discovered that if you assign a name to the instance.image.name attribute, it does save the file and can subsequently be retrieved.
(e.g., instance.image.name = "test-file.gif" before the instance.save().)

Side note: I can’t recreate the error you’re describing. If I follow the steps you’re describing in your last post, including using refresh_from_db I get the same empty string from instance.image.name, I do not get an error. This is with Django 5.0.2 / Python 3.12.1. Might there be some platform / version issue involved here?

I created a full test case here: django-file-field-no-name/test_app/tests.py at 12f23722803ce859249ce151cf20364baa18deb2 · john-parton/django-file-field-no-name · GitHub

The thing is, p.mugshot in my example is actually a descriptor. When I ask what the “value” is, I’m asking what is the return value of __get__ for that descriptor.

If you follow the code exactly, when you “assign” the property, it actually doesn’t do a direct assignment, due to the descriptor. The descriptor class’s __set__ method is invoked and the objects __dict__ is modified using that logic. Then when you “access” the property, it can mutate the object yet again using the descriptor’s __get__ method.

It’s not due to the underlying Python library. It’s the explicit behavior of the descriptor classes.

Here’s where most of the relevant code is: django/django/db/models/fields/files.py at 47c608202a58c8120d049c98d5d27c4609551d33 · django/django · GitHub

Edit

I’m not trying to be pedantic. This is genuinely a pretty confusing thing. If you have a model that has a CharField, you wouldn’t say that the value of an instance’s corresponding property is a CharField, it would be str. If you have a model which has a FileField, then the instance’s corresponding property isn’t FileField. It’s FieldFile. If that makes sense.

First, to get a minor side-point out of the way:

My comment relative to this was related to the identified behavior of a save without a name not throwing an error. I was making the conjecture that it’s possible that the save does not do that in order to remain consistent with different “storages” along with the behavior of the Python libraries. (Yes, I could probably have been clearer with the narrow intent of that statement, and so that’s my mistake in what I wrote.)

Neither am I.

Actually, I’m enjoying and appreciate this discussion. This is one of those topics where I’ve had to dig a lot deeper into areas with which I’m otherwise unfamiliar. (Thank you!)

Yes, I would say that the underlying property of a CharField is a str.

However, from that angle, what I would say is that the fundamental underlying property of a FileField would also a str, because that’s what’s stored in the database.

What I can’t quite reconcile is this explanation (which I am following) with what I see in the docs and code.

Quoting directly from the comments in FileField for attr_class and descriptor_class:

    # The class to wrap instance attributes in. Accessing the file object off
    # the instance will always return an instance of attr_class.
    attr_class = FieldFile

    # The descriptor to use for accessing the attribute off of the class.
    descriptor_class = FileDescriptor

I can’t see any way to interpret this other than to say that the instance attributes are stored one way (FileField) but when referenced, return something else (FieldFile).

Additionally, within the class is this method:

    def get_internal_type(self):
        return "FileField"

Which I also must accept at face value, that the internal type of the field is considered to be a FileField.

I see where the FileDescriptor.__get__ gets the value retrieved by DeferredAttribute.__get__ then goes through a set of tests to determine what the data type is of what was returned, to eventually set instance.__dict__[self.field.attname]

But that’s on the __get__ side, not the __set__ side - so the conclusion that I draw from this is also that what is stored in the model is not (necessarily) a FieldFile. It can actually be a number of different underlying types.

I guess I want to refocus the discussion somewhat.

Are you able to replicate the behavior of this test? django-file-field-no-name/test_app/tests.py at 12f23722803ce859249ce151cf20364baa18deb2 · john-parton/django-file-field-no-name · GitHub

Understood!

I’m not in my lab this afternoon - I will check this later and get back to you.

Yes, I can replicate the behavior, but I think your comment here isn’t accurate:

        # Data thrown away here silently
        instance.save()

There is no data thrown away at this point.

The image is lost on the next line:

        instance = TestModel.objects.get(id=instance.id)

because the row retrieved from the database does not have a reference to a file in the file system.

Consider the following:

In [14]: instance = TestModel()

In [15]: instance.__dict__
Out[15]: 
{'_state': <django.db.models.base.ModelState at 0x7f1e9651a0c0>,
 'id': None,
 'image': ''}

In [16]: instance.image = ContentFile(b64decode(IMAGE_CONTENTS))

In [17]: instance.__dict__
Out[17]: 
{'_state': <django.db.models.base.ModelState at 0x7f1e9651a0c0>,
 'id': None,
 'image': <ContentFile: Raw content>}

In [18]: instance.save()

In [19]: instance.__dict__
Out[19]: 
{'_state': <django.db.models.base.ModelState at 0x7f1e9651a0c0>,
 'id': 2,
 'image': <ImageFieldFile: None>}

In [20]: different_instance = TestModel.objects.get(id=2)

In [21]: different_instance.image
Out[21]: <ImageFieldFile: None>

In [22]: instance.image.name = "test1.gif"

In [23]: instance.save()

In [24]: instance.__dict__
Out[24]: 
{'_state': <django.db.models.base.ModelState at 0x7f1e9651a0c0>,
 'id': 2,
 'image': 'images/test1.gif'}

In [25]: different_instance = TestModel.objects.get(id=2)

In [26]: different_instance.image
Out[26]: <ImageFieldFile: images/test1.gif>

So the save method isn’t losing anything in the object. However, it’s unable to save that data to the underlying storage using a storage system requiring a non-blank file name.

If you subsequently apply a name to that object, it will then save appropriately.

No, that absolutely doesn’t work.

The data you had in the ContentFile is gone. Setting the name to “test1.gif” after the fact doesn’t magically write the data out to the storage backend.

Try instance.image.read() after setting the filename. It will not work.

Here’s two test cases I wrote:

Neither of your fixes actually work. Setting the name of the file doesn’t restore the missing data. Even if you immediately set the name after the ContentFile before doing anything else. And definitely gone after reloading the object from the database.

The file is simply not stored in the storage backend. Setting the name doesn’t restore the missing binary data on disk or remote storage backend.

You still need to save the instance again after setting the name. Setting the name alone doesn’t automatically do anything, but saving the object again does save the data. See the example above.

Nope: django-file-field-no-name/test_app/tests.py at 2919f3da31af22c01380f55ada4df57d3e939aa5 · john-parton/django-file-field-no-name · GitHub

Data is gone.

The data isn’t “gone” in my running of your tests.

======================================================================
FAIL: test_ken_hypothesis_3 (test_app.tests.AnimalTestCase.test_ken_hypothesis_3)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/home/whitesellk/git/django-file-field-no-name/test_app/tests.py", line 58, in test_ken_hypothesis_3
    self.assertEqual(
AssertionError: b'\xff\xd8\xff\xe0\x00\x10JFIF\x00\x01\x01\[7594 chars]\xd9' != b'/9j/4AAQSkZJRgABAQEBLAEsAAD/2wBDAAYEBQYFB[3391 chars]Kf/Z'

----------------------------------------------------------------------

The data is there, it’s just somehow, for some reason, being transformed.

It’s not transformed. That’s a placeholder image that the django test harness returns in some cases.

If I change your assert to:

        self.assertEqual(
            instance.image.read(),
            base64.b64decode(IMAGE_CONTENTS),
        )

The test passes.

Ah, good job. So you can keep the data. Let me adjust the test cases to be correct.

Yeah, that’s on me. bytes and str aren’t the same. Obviously.

Let’s look at this specific test case: django-file-field-no-name/test_app/tests.py at dc557c641552b02623f50ef044d73435b9e97ead · john-parton/django-file-field-no-name · GitHub

I think this fully captures the “paper cut” I was referring to initially.

My argument is that instance.save() should raise an exception or SOMETHING in this case to let you know that it “didn’t work.”

I added a comment on the offending line.

instance.save() # <-- Data is silently _not_ saved to the file system or storage backend without raising an exception

I think it would be worthwhile seeing if we can change Django to make this easier to get right and/or debug. An exception from instance.save() if the name is not set to a valid value (or at all) makes sense to me. The amount of back and forth between you both about this is a sign to me that Django is confusing here.