What hash functions are best used for file uploads?

I’m looking to implement hash functions as a means of validating uploads in the event that a user attempts to duplicate an upload of the same file when it already exists within MEDIA_ROOT. If that was to happen a ValidationError would be raised.

To achieve this, it appears that a cryptographic hash function is better suited to handle this than an ordinary hash function due to the property of being deterministic. I roughly understand the general concepts to hashing, but I’m not sure what algorithms are best suited for this either through Python’s standard library?

Further, would it be best to completely incorporate the entire hash when validating as opposed to a sub string? Does it make any difference?

All hash functions are determinstic. It’s part of their definition

I would use MD5 for this, since it’s fast. There’s no need to use a cryptographic hash. If you’re concerned about collisions, you can use a secret salt to make it hard for an attacker to create collisions.

Use the entire hash, it reduces the probability of collisions