Tools to sanitize HTML?

I want to accept HTML input but sanitize it, allowing only certain tags/attributes/styles and stripping everything else, scripts most obviously. I’ve tried bleach, but it seems to be bugged: it’s stripping styles that I’ve defined as allowed. I’ve looked for other Python/Django HTML sanitization tools but, to my surprise, there aren’t that many. I expected BeautifulSoup to have sanitization options but… it doesn’t.

Suggestions?

I don’t have a direct answer to your question, but bleach appears to be an active and maintained project. (Version 3.2.2 was released on the 20th.) I’m sure they’d appreciate seeing a test case where it’s stripping something that it shouldn’t.

Bleach works fine. You need to define your full list of accepted tags and attributes, since its defaults are quite strict, but that’s a one-time effort that you can do iteratively.
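
As a rough sketch of what that configuration can look like (the tag and attribute lists are made up for illustration, and `sanitize` is just a hypothetical helper name, not part of Bleach):

```python
import bleach

# Hypothetical allowlist for illustration; grow it as your content requires.
ALLOWED_TAGS = {"a", "b", "em", "p", "strong"}
# Attributes are allowed per tag; anything not listed here is removed.
ALLOWED_ATTRIBUTES = {"a": ["href", "title"]}


def sanitize(html: str) -> str:
    """Return html with markup outside the allowlist removed."""
    return bleach.clean(
        html,
        tags=ALLOWED_TAGS,
        attributes=ALLOWED_ATTRIBUTES,
        strip=True,  # remove disallowed tags instead of escaping them
    )


dirty = '<p onclick="steal()">Hi <b>there</b><script>alert(1)</script></p>'
cleaned = sanitize(dirty)
print(cleaned)
```

Styles are handled separately from tags and attributes: you need to allow the `style` attribute itself as well as the individual CSS properties, and the way the properties are passed differs between Bleach versions, so check the docs for the version you have installed.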

The main alternative would be @matthiask’s html-sanitizer, which is part of the very good FeinCMS family. (Matthias was a guest on Django Chat recently.)

You have to configure html-sanitizer too. There’s some discussion of the contrast with Bleach on the README.

I hope that helps. :smiley:

I use the tool below for my site.

defuse.ca/html-sanitize.htm

Agree with Carlton and Ken; Bleach is a top notch tool.

Hi,

I am implementing Markdown in my Django project and wanted to use bleach (and did), but the maintainers of the repository state that it is now deprecated. Does anyone know of a good alternative? Thanks!

Hi @interglobalmedia.

The modern alternative is the nh3 package:

https://nh3.readthedocs.io/en/latest/

It’s great, and almost identical to Bleach.

@adamchainz has a blog post about it too:

Thanks so much @carltongibson. I will definitely check it out!