Handle user uploaded zip file, validate zip and normalize it to zipfile.ZipFile: Discussion.

First, to clarify: none of this really has anything to do with “data compression science”. It’s more an issue with the tools and libraries that implement the compression algorithms.

I don’t need to understand how the zip algorithm works - I just need to be aware that the zip file format has a number of unconstrained edge cases that make it possible to craft malicious files, and that the libraries working with those files have flaws.

For example, the zip file format has been around since about 1990. One might have thought that it would have been made pretty reliable in that time. However, the CVE I referenced above was published last year.

No, it really isn’t. Or, to be clear - it’s easy to provide a general description that appears to cover all the issues. But the details matter. It’s rarely the concept that is the problem, but rather, it’s the implementation that ends up being lacking.

Maintaining a focus on security is truly a full-time job. But, it doesn’t actually mean that there’s a need to implement every possible remediation to an identified vulnerability - all decisions regarding security controls should be made in the context of a risk assessment.

Using your zip file situation as an example, you should consider your user base.

If your system is designed as an internal system for corporate users, you can, in many cases, ignore things like this, because your corporate policies should make it extremely undesirable for people to try and abuse the system.

Or, if it is a public site that is expected to be used only by people who actually want to use it, they probably don’t want to cause harm either.

So you need to decide whether the convenience of allowing users to upload zip files is worth the risk - and what steps you should take to mitigate that risk.
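To make "steps to mitigate" a bit more concrete, here is a minimal sketch of the kind of pre-checks you might run on an uploaded archive before doing anything else with it. The function name `check_zip` and the three limits are my own assumptions for illustration, not anything prescribed by `zipfile` or Django, and the checks are not exhaustive. Note that the sizes come from the archive's own metadata, which a malicious file can misreport, so treat this as a first-pass filter only.

```python
import io
import zipfile
from pathlib import PurePosixPath

# Hypothetical limits; pick values that match your own risk assessment.
MAX_MEMBERS = 1000
MAX_TOTAL_UNCOMPRESSED = 100 * 1024 * 1024  # 100 MiB, as declared by the archive
MAX_RATIO = 100  # declared uncompressed size / compressed size


def check_zip(fileobj):
    """Return an open zipfile.ZipFile, or raise ValueError if a check fails."""
    try:
        zf = zipfile.ZipFile(fileobj)
    except zipfile.BadZipFile as exc:
        raise ValueError("not a valid zip file") from exc

    infos = zf.infolist()
    if len(infos) > MAX_MEMBERS:
        raise ValueError("too many members")

    total = 0
    for info in infos:
        # Reject names that would escape the extraction directory.
        path = PurePosixPath(info.filename.replace("\\", "/"))
        if path.is_absolute() or ".." in path.parts:
            raise ValueError(f"unsafe member name: {info.filename!r}")

        # Guard against "zip bomb" style archives (based on declared sizes).
        total += info.file_size
        if total > MAX_TOTAL_UNCOMPRESSED:
            raise ValueError("declared uncompressed size too large")
        if info.compress_size and info.file_size / info.compress_size > MAX_RATIO:
            raise ValueError("suspicious compression ratio")

    return zf
```

In a Django view you would pass the uploaded file object to something like this and turn the `ValueError` into a form error. Whether these particular checks are enough depends on who your users are and what you do with the contents afterwards.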

(And it’s not just zip files either. You can find published exploits of things like PDFs, Excel spreadsheets, Word documents, etc.)

The general category is “Secure Software Development”. There’s a ton of material out there. (I don’t have any specific recommendations because my formal training in that area was more than 25 years ago.)

For web application security, and as a starting point of awareness in the topic, I can suggest https://owasp.org/ and their Top Ten project.
(There’s a lot of overlap between this and “Information Security”, with this latter topic focusing more on the policy and procedural aspects - in particular, the evaluation of risks to determine how you’re going to respond to those risks.)

In the general case, you should try to learn about the basic types of exploits. You should at least be comfortable with the concepts and principles of things like buffer overflow and SQL injection attacks (among others).
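As a small illustration of the SQL injection idea, here is a sketch using Python's built-in `sqlite3` module. The table, the two helper functions, and the payload are made up for the example. The point is only to contrast building SQL by string formatting with passing the value as a bound parameter.

```python
import sqlite3

# Throwaway in-memory database with a hypothetical "users" table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (name TEXT, is_admin INTEGER)")
conn.executemany("INSERT INTO users VALUES (?, ?)", [("alice", 0), ("bob", 1)])


def find_unsafe(name):
    # Vulnerable: the user-supplied value becomes part of the SQL text itself.
    query = f"SELECT name FROM users WHERE name = '{name}'"
    return conn.execute(query).fetchall()


def find_safe(name):
    # The value is passed separately as a parameter and is never parsed as SQL.
    return conn.execute("SELECT name FROM users WHERE name = ?", (name,)).fetchall()


payload = "x' OR '1'='1"
print(find_unsafe(payload))  # every row comes back, not just a user named "x"
print(find_safe(payload))    # no rows: nobody has that literal name
```

The concept is simple once you see it. As with the zip case, the hard part in practice is making sure every code path actually does it the safe way.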