Adding a collectstatic option to compare file sizes instead of timestamps


(Not a debugging question; this is background on why I think there’s a use case for this feature.) I have a Django project with a CI/CD pipeline. The pipeline runs collectstatic at some point, but the files aren’t always copied. It appears this is because, somewhere in that pipeline, the timestamps of the files in /static and in my source folders end up set to the same value. As you likely know, collectstatic compares timestamps to decide whether to copy a file.
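For context, here is roughly how a timestamp-based skip rule behaves. As far as I can tell from recent Django versions, an existing target is skipped when its modification time, truncated to whole seconds, is not older than the source’s. The function below is a simplified standalone sketch of that idea for plain local files, not Django’s actual code; the name should_copy_by_mtime is hypothetical.

```python
import os


def should_copy_by_mtime(source: str, target: str) -> bool:
    """Hypothetical helper: return True if `target` should be (re)copied.

    Simplified timestamp rule: copy when the target is missing, otherwise
    only when the source is strictly newer than the target, comparing at
    whole-second precision.
    """
    if not os.path.exists(target):
        return True
    # Truncate to whole seconds to ignore sub-second filesystem differences.
    source_mtime = int(os.path.getmtime(source))
    target_mtime = int(os.path.getmtime(target))
    return source_mtime > target_mtime
```

If some pipeline step leaves both files with identical modification times, this returns False even when the contents differ, which matches the symptom described above.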

I was thinking an option like --compare_filesize would be useful.

  • In that case, collectstatic would copy a file if it doesn’t exist in /static (as it does now), or if a file with the same name has a different size.
  • It wouldn’t affect existing behavior, since it’s just a new, opt-in flag.
  • It likely wouldn’t be hard to implement either: basically, add the flag to the collectstatic command plus the code to compare file sizes.
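A minimal sketch of the proposed rule, assuming plain local files. Note that --compare_filesize is not an existing collectstatic option, and should_copy_by_size is a hypothetical helper. In current Django the skip decision appears to live in the collectstatic command’s delete_file() method, so a real patch would presumably add the size check there.

```python
import os


def should_copy_by_size(source: str, target: str) -> bool:
    """Hypothetical --compare_filesize rule: copy if the target is missing
    or if the two files differ in size. Timestamps are ignored entirely."""
    if not os.path.exists(target):
        return True
    return os.path.getsize(source) != os.path.getsize(target)
```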

What do you think? Is this a relevant idea in the first place, and if so, is it a decent target for a new contributor to the codebase such as myself? I’m an advanced Python programmer, comfortable working with the more obscure/advanced features of the language if needed, though I haven’t worked on codebases as large as Django’s.

What do you see as the use-case here? Under what circumstances is --compare_filesize going to provide a better solution?

My concern would be that relying on this is going to miss changes that don’t affect the size of the file (e.g., changing class="col-md-5" to class="col-md-6"). This could be particularly problematic in third-party libraries.
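A quick illustration of that blind spot: the two versions of the attribute have the same length, so a size-only check would treat them as identical even though the bytes differ. The function name is hypothetical and the files are throwaway temporary files.

```python
import os
import tempfile

OLD = 'class="col-md-5"'
NEW = 'class="col-md-6"'


def demo_same_size_edit() -> tuple[bool, bool]:
    """Write the old and new text to two temp files and report
    (sizes_equal, contents_equal)."""
    with tempfile.TemporaryDirectory() as tmp:
        old_path = os.path.join(tmp, "old.txt")
        new_path = os.path.join(tmp, "new.txt")
        for path, text in ((old_path, OLD), (new_path, NEW)):
            with open(path, "w", encoding="utf-8") as f:
                f.write(text)
        sizes_equal = os.path.getsize(old_path) == os.path.getsize(new_path)
        with open(old_path, "rb") as a, open(new_path, "rb") as b:
            contents_equal = a.read() == b.read()
    return sizes_equal, contents_equal


print(demo_same_size_edit())  # (True, False)
```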

If you are going to go this route, I’d suggest something like --compare_contents, although once you’ve gone that far, I’d more likely recommend just using the -c (--clear) parameter, which is what we do.
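A sketch of what a content-based check could look like, assuming local files. --compare_contents is not an existing option, and file_digest and should_copy_by_contents are hypothetical helpers. Files are hashed in chunks so large assets don’t have to fit in memory.

```python
import hashlib
import os


def file_digest(path: str, chunk_size: int = 64 * 1024) -> str:
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def should_copy_by_contents(source: str, target: str) -> bool:
    """Hypothetical --compare_contents rule: copy if the target is missing
    or if its contents differ from the source's."""
    if not os.path.exists(target):
        return True
    return file_digest(source) != file_digest(target)
```

Unlike a size check, this catches the col-md-5 to col-md-6 edit, at the cost of reading both files in full.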

In some deployment environments, the -l (--link) parameter, which creates symbolic links instead of copying files, can also be extremely helpful. (I’m not a particular fan of it, however, because of the permissions you may need to grant to the directories in your project.)
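Why -l sidesteps the stale-timestamp problem: each collected path is a symbolic link to the source file rather than a copy, so there is no second copy whose timestamp can fall out of step. The standalone illustration below uses os.symlink directly rather than collectstatic; the function name is hypothetical, and on Windows creating symlinks may require extra privileges.

```python
import os
import tempfile


def read_through_symlink_after_edit() -> str:
    """Create a source file, link to it, edit the source, then read via the link."""
    with tempfile.TemporaryDirectory() as tmp:
        source = os.path.join(tmp, "app.css")
        link = os.path.join(tmp, "collected.css")
        with open(source, "w", encoding="utf-8") as f:
            f.write(".a { width: 5px; }")
        os.symlink(source, link)  # roughly what linking does for each file
        # Edit the source after "collecting"; no re-copy step is needed.
        with open(source, "w", encoding="utf-8") as f:
            f.write(".a { width: 6px; }")
        with open(link, encoding="utf-8") as f:
            return f.read()


print(read_through_symlink_after_edit())  # .a { width: 6px; }
```

The flip side is that whatever serves the collected files must also be able to read the files the links point to, which is presumably the permissions concern mentioned above.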

Hey Ken,

Yeah, that’s a legitimate point. I haven’t been able to figure out what in the deployment pipeline changes the timestamps. Maybe the issue lies in one of the implementation decisions, but the use case, as I see it, is that at least in some cases the timestamps may not be 100% reliable.

The downside of -c is that everything has to be re-collected afterwards, which takes extra time and can be annoying. Comparing contents would indeed be better, though I wonder whether it would also be somewhat slow, in which case, as you say, you might as well use -c.
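On the speed concern, a content check does not have to read every file in full. One possible approach (a sketch, not an existing Django feature; files_differ is a hypothetical helper) is to compare sizes first, which costs only a stat call, and compare bytes only when the sizes match, stopping at the first differing chunk. Unchanged files still have to be read completely, so the worst case remains a full read of both copies.

```python
import os


def files_differ(source: str, target: str, chunk_size: int = 64 * 1024) -> bool:
    """Return True if the two files differ.

    Cheap check first: different sizes mean different contents, so no data
    is read. Same-size files are compared chunk by chunk, returning as soon
    as a mismatching chunk is found.
    """
    if os.path.getsize(source) != os.path.getsize(target):
        return True
    with open(source, "rb") as a, open(target, "rb") as b:
        while True:
            chunk_a = a.read(chunk_size)
            chunk_b = b.read(chunk_size)
            if chunk_a != chunk_b:
                return True
            if not chunk_a:  # both files exhausted without a difference
                return False
```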

And the timing of each is going to depend on your deployment environment as well. About the only way to know for sure is to try it both ways and compare.
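A rough way to try it both ways is a small timing script like the one below. It uses synthetic files and two stand-in strategies: copy_all, which approximates what -c does (wipe and copy everything), and a hypothetical content-based copy_changed. It does not call collectstatic itself, so for a real answer you would time python manage.py collectstatic --noinput with and without -c in your own pipeline.

```python
import os
import shutil
import tempfile
import time


def copy_all(src_dir: str, dst_dir: str) -> None:
    """Roughly like -c: wipe the destination and copy every file again."""
    shutil.rmtree(dst_dir, ignore_errors=True)
    shutil.copytree(src_dir, dst_dir)


def _differs(a: str, b: str) -> bool:
    """Size check first, then a full byte comparison."""
    if os.path.getsize(a) != os.path.getsize(b):
        return True
    with open(a, "rb") as fa, open(b, "rb") as fb:
        return fa.read() != fb.read()


def copy_changed(src_dir: str, dst_dir: str) -> None:
    """Hypothetical content-based strategy: copy only missing or changed files."""
    for root, _dirs, files in os.walk(src_dir):
        out_root = os.path.join(dst_dir, os.path.relpath(root, src_dir))
        os.makedirs(out_root, exist_ok=True)
        for name in files:
            src = os.path.join(root, name)
            dst = os.path.join(out_root, name)
            if not os.path.exists(dst) or _differs(src, dst):
                shutil.copy2(src, dst)


def time_it(fn, *args, repeats: int = 3) -> float:
    """Best-of-N wall-clock time in seconds."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        fn(*args)
        best = min(best, time.perf_counter() - start)
    return best


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        src = os.path.join(tmp, "src")
        dst = os.path.join(tmp, "static")
        os.makedirs(src)
        for i in range(200):  # synthetic stand-ins for real static files
            with open(os.path.join(src, f"file{i}.css"), "wb") as f:
                f.write(os.urandom(32 * 1024))
        shutil.copytree(src, dst)  # simulate an initial collect
        print("clear + copy all:", time_it(copy_all, src, dst))
        print("copy changed only:", time_it(copy_changed, src, dst))
```

Results on synthetic data say little about a real CI runner, where disk, storage backend, and file counts dominate.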