Impact analysis for bypassing get_valid_filename

shuttlesworthNEO · October 27, 2021, 10:25am

Hi,
We are an enterprise application, and it is important for us to retain filenames exactly as provided by our clients. However, Django’s FileField renames every file during save using the django.utils.text.get_valid_filename function.

Here are some sample filenames that we want to upload:

Autorización de referencias.pdf
Certificado%20de%20Estatuto%20Actualizado.pdf

Referring to the function’s documentation:

Return the given string converted to a string that can be used for a clean filename. Remove leading and trailing spaces; convert other spaces to underscores; and remove anything that is not an alphanumeric, dash, underscore, or dot.
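
For illustration, the documented behaviour can be approximated with the standard library alone. The helper `approx_valid_filename` below is a hypothetical stand-in, not Django's actual implementation (which may differ in details, for example in how it treats empty results), applied to the two sample names:

```python
import re


def approx_valid_filename(name):
    # Hypothetical approximation of the documented behaviour:
    # strip leading/trailing whitespace, turn remaining spaces into
    # underscores, then drop everything that is not a word character
    # (letters, digits, underscore), a dash, or a dot.
    stripped = str(name).strip().replace(" ", "_")
    return re.sub(r"[^-\w.]", "", stripped)


samples = [
    "Autorizaci\u00f3n de referencias.pdf",  # \u00f3 is the precomposed "ó"
    "Certificado%20de%20Estatuto%20Actualizado.pdf",
]
for original in samples:
    print(repr(original), "->", repr(approx_valid_filename(original)))
```

Under this approximation the accented letter survives, because Python's `\w` is Unicode-aware for `str` patterns. The spaces become underscores, and the `%` signs are dropped, so each `%20` collapses to `20`.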

Given that our cloud storage supports all file names, what is the impact of bypassing get_valid_filename on Django’s internal working? Do you see any major risks?

I’m confident that there must be a concrete reason to introduce filename standardization and don’t want to mess around with it before understanding the impact of this change.

KenWhitesell · October 27, 2021, 11:23am

Probably the biggest risk that comes to mind is a race condition between two different people uploading files with the same name within a short time window, combined with Django being unable to rely on every file storage system to handle that situation sanely. (Or the same person uploading the same file at the same time in two different browser tabs.)
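
A minimal sketch of that race, using a local directory as a stand-in for the storage backend (an assumption; a cloud store may or may not offer an equivalent conditional write). The hypothetical `save_exclusive` helper relies on `os.O_EXCL`, so the second of two uploads with the same name is refused instead of silently overwriting the first:

```python
import os
import tempfile


def save_exclusive(directory, name, data):
    """Write data under `name` only if no file with that name exists yet.

    Returns True if the file was created, False if the name was taken.
    """
    path = os.path.join(directory, name)
    # O_CREAT | O_EXCL makes "check that it is absent" and "create it"
    # a single atomic step on a local file system.
    flags = os.O_WRONLY | os.O_CREAT | os.O_EXCL
    try:
        fd = os.open(path, flags)
    except FileExistsError:
        return False
    with os.fdopen(fd, "wb") as handle:
        handle.write(data)
    return True


with tempfile.TemporaryDirectory() as upload_dir:
    first = save_exclusive(upload_dir, "contract.pdf", b"upload from user A")
    second = save_exclusive(upload_dir, "contract.pdf", b"upload from user B")
    print(first, second)
```

Without an atomic create like this, a separate "does the name exist?" check followed by a write leaves a window in which both uploads can pass the check.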

(There may be other risks as well.)

Does your cloud storage really support all file names? Does it handle file names with linefeed characters? Nulls? Multibyte Unicode? Unlimited file name length?

Keep in mind that the name of a file is not an intrinsic property of the file, and that the data being submitted is subject to being altered by the client before the upload occurs.
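
Since the submitted name is client-controlled, a server-side check along the following lines could flag the cases raised above before the name reaches storage. This is only a sketch: `find_filename_problems` and the 255-byte limit are assumptions for illustration, not part of Django or of any particular storage service.

```python
import unicodedata

# Assumption: a per-name limit of 255 bytes. Check your storage's real limit.
MAX_NAME_BYTES = 255


def find_filename_problems(name, max_bytes=MAX_NAME_BYTES):
    """Return a list of human-readable problems found in a submitted name."""
    problems = []
    if "\x00" in name:
        problems.append("contains a NUL character")
    # Category "Cc" covers control characters such as linefeed and tab.
    if any(unicodedata.category(ch) == "Cc" and ch != "\x00" for ch in name):
        problems.append("contains control characters (e.g. linefeed)")
    # Measure encoded bytes, not characters: multibyte Unicode text
    # reaches a byte limit sooner than its character count suggests.
    if len(name.encode("utf-8")) > max_bytes:
        problems.append("exceeds the byte-length limit")
    return problems


print(find_filename_problems("Autorizaci\u00f3n de referencias.pdf"))
print(find_filename_problems("line\nbreak.pdf"))
print(find_filename_problems("\u00f3" * 200))
```

The last call shows the multibyte case: 200 characters of "ó" occupy 400 bytes in UTF-8, so the name exceeds a 255-byte limit even though its character count does not.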

Also, one of your sample file names appears to be URL-encoded. How are you going to know whether the file name was submitted with embedded spaces or whether the original file was actually stored with a URL-encoded name? Guess wrong, and you return a file name that differs from the one submitted.
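
The ambiguity is easy to reproduce with the standard library. In this short sketch (the variable names are illustrative), two distinct submitted names become indistinguishable once decoded:

```python
from urllib.parse import unquote

with_spaces = "Certificado de Estatuto Actualizado.pdf"
percent_encoded = "Certificado%20de%20Estatuto%20Actualizado.pdf"

# The two submitted names are different strings...
assert with_spaces != percent_encoded
# ...but decoding maps both to the same result, so the decoded
# form alone cannot tell you which one the client actually sent.
assert unquote(with_spaces) == unquote(percent_encoded)

print(unquote(percent_encoded))
```

Storing the name exactly as received, and decoding only for display if at all, avoids having to guess.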
