File.open to support different encodings

Currently, the File.open method has only a single parameter, mode. mode can be set to open a file in text mode, in which case Python’s open falls back to its default encoding: the platform’s locale encoding, which is often, but not always, UTF-8.

In a code base I work on, it is a common pattern to use the utf-8-sig encoding for CSV files, as this slightly improves the user experience when the file is opened in Excel. Right now, to create a file in this encoding using Django’s File, we have to open it in binary mode and explicitly wrap the file handle in a codec. It would be helpful if, just like Python’s built-in open function, File.open also accepted an encoding parameter and passed it through to open.
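For reference, this is what the request looks like with Python’s built-in open. The utf-8-sig codec writes a UTF-8 byte-order mark (the bytes EF BB BF) before the first character, which is what lets Excel recognise the CSV as UTF-8. A minimal standard-library sketch; the file name and rows are illustrative only.

```python
import csv
import os
import tempfile

# Write a small CSV with the built-in open(), passing the encoding explicitly.
# newline="" is what the csv module expects for files it writes.
path = os.path.join(tempfile.mkdtemp(), "report.csv")
with open(path, "w", encoding="utf-8-sig", newline="") as fh:
    writer = csv.writer(fh)
    writer.writerow(["name", "city"])
    writer.writerow(["Zoë", "Kraków"])

# Read the raw bytes back: the file starts with the UTF-8 BOM.
with open(path, "rb") as fh:
    raw = fh.read()

print(raw[:3])  # b'\xef\xbb\xbf'
```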

I believe that this kind of enhancement is natural, as Python developers are used to open allowing them to set the encoding, especially in Python 3. Actually, I was a bit surprised not to see it implemented already.

I’ve opened a ticket about this issue, and I am willing to work on this feature if there is a chance of it being merged.

David Sanders suggested in the ticket allowing the full *args, **kwargs and passing them directly to open. I admit, though, that I personally prefer being explicit in the offered API, especially as it wouldn’t be that easy for libraries like django-storages to later reproduce the full scope of such an API. So I would focus only on encoding.
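To make the two options concrete, here is a minimal sketch on hypothetical stand-in classes. ExplicitFile and PassthroughFile are illustrative names only, not Django’s actual File code; the first adds just encoding, the second forwards everything to Python’s open.

```python
import builtins


class ExplicitFile:
    """Hypothetical stand-in for a File-like class (not Django's code)."""

    def __init__(self, name):
        self.name = name
        self.file = None

    def open(self, mode="rb", encoding=None):
        # Option 1: only `encoding` is added and forwarded to open().
        self.file = builtins.open(self.name, mode, encoding=encoding)
        return self


class PassthroughFile(ExplicitFile):
    """Hypothetical variant that forwards every extra argument."""

    def open(self, mode="rb", *args, **kwargs):
        # Option 2: any positional/keyword argument goes straight to open(),
        # so errors=, newline= etc. work without further changes.
        self.file = builtins.open(self.name, mode, *args, **kwargs)
        return self
```

The explicit version is easier for third-party storage backends to mirror, while the passthrough version automatically picks up errors, newline and any arguments open gains in the future.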


I think it makes sense to match the API of open where possible.

My only concern here is adding API that all users need to understand and decide about when only a very small subset will ever use it.

What does the wrapping in a codec actually look like? Is that really too much to bear for a rarely needed option?

Not sure.

```python
import io

with some_django_file.open("wb") as fh:
    encoded_fh = io.TextIOWrapper(fh, encoding="utf-8-sig")
    # … use encoded_fh instead of fh …
    encoded_fh.flush()
```

There are two risks with this piece of code: (1) someone accidentally uses fh instead of encoded_fh, and (2) someone forgets to call flush(). Mitigating this with a custom helper context manager has its own drawback: developers may forget it exists and use Django’s raw File.open instead.
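For comparison, the custom helper context manager mentioned above could look roughly like this. It is a sketch under assumptions: open_text is a hypothetical name, and the only thing it relies on is that django_file.open(mode) can be used in a with statement and yields a binary file object, as in the snippet above. newline="" is the setting the csv module documentation recommends for files it writes.

```python
import contextlib
import io


@contextlib.contextmanager
def open_text(django_file, mode="wb", encoding="utf-8-sig", newline=""):
    """Hypothetical helper: yield a text wrapper around django_file.open(mode)."""
    with django_file.open(mode) as fh:
        wrapper = io.TextIOWrapper(fh, encoding=encoding, newline=newline)
        try:
            # The caller only ever sees the text wrapper, never the raw fh.
            yield wrapper
        finally:
            # detach() flushes any buffered text and hands fh back without
            # closing it, so the outer `with` stays responsible for closing.
            wrapper.detach()
```

This removes both risks, since the raw handle is never exposed and flushing happens in the finally block, but it is still one more helper that developers have to know about.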

I wonder how many users of Python’s open function use the encoding argument. Obviously Python has more developers than Django, but the standard library is also larger. Still, when we are writing Python code outside of Django, we take this feature for granted.

+1 from me.

I think matching the open builtin is more important than keeping the API surface area small. It’s like how pathlib.Path.open mirrors the signature of open.
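For a concrete comparison, the standard library’s inspect module can list the parameters of both callables. Per the Python documentation, Path.open takes mode, buffering, encoding, errors and newline, so it covers everything in open except file (the path itself), closefd and opener. A small sketch:

```python
import inspect
import pathlib

# Parameter names of the built-in open() and of pathlib.Path.open().
open_params = list(inspect.signature(open).parameters)
path_params = [
    p for p in inspect.signature(pathlib.Path.open).parameters if p != "self"
]

# Parameters that open() accepts but Path.open() does not.
not_mirrored = [p for p in open_params if p not in path_params]

print("Path.open:", path_params)
print("open() only:", not_mirrored)
```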

Plus, seeing the argument may be a small reminder that not all text files use UTF-8.
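Two small illustrations of why the default can bite when reading files: bytes from a legacy Windows encoding are not valid UTF-8, and a utf-8-sig file read as plain utf-8 keeps the byte-order mark as a stray character. The byte strings below are illustrative.

```python
# "café" as a legacy Windows program might save it (cp1252, not UTF-8).
legacy = "café".encode("cp1252")

try:
    legacy.decode("utf-8")
except UnicodeDecodeError:
    print("cp1252 bytes are not valid UTF-8")
print(legacy.decode("cp1252"))  # café

# A CSV header saved with utf-8-sig: plain utf-8 keeps the BOM as a character.
excel_csv = b"\xef\xbb\xbfid,name"
print(repr(excel_csv.decode("utf-8")))      # '\ufeffid,name'
print(repr(excel_csv.decode("utf-8-sig")))  # 'id,name'
```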

Yep, OK. +1 (just for clarity)

I’m also +1 on this, and I second the proposal in the ticket to pass *args and **kwargs through to open, to avoid gatekeeping.