As things stand, it is possible to paste invisible Unicode characters (for example zero-width spaces) into Django fields. They are stored in the database and carried through into all HTML output, where they affect copy/paste, searches, and so on.
Shouldn't Django sanitize these by default on input, along with its other character sanitizers?
Something roughly like:
import unicodedata
cleaned = "".join(ch for ch in value if unicodedata.category(ch) != "Cf")
Would this make sense as a feature request for the project? If not, is there a reason for the current behaviour?
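Here is the same filter wrapped in a reusable function. The name `strip_format_chars` is hypothetical, not an existing Django or Python API. One caveat is worth weighing: category Cf also contains the zero-width joiner (U+200D), which multi-person emoji sequences rely on, and the zero-width non-joiner (U+200C), which scripts such as Persian use. A blanket filter can therefore alter legitimate text.

```python
import unicodedata


def strip_format_chars(value: str) -> str:
    """Remove every character in Unicode general category Cf (format)."""
    return "".join(ch for ch in value if unicodedata.category(ch) != "Cf")


# Zero-width space (U+200B) and byte order mark (U+FEFF) are both Cf.
dirty = "pro\u200bfile\ufeff"
print(repr(strip_format_chars(dirty)))

# "Family" emoji: three emoji joined by two ZWJ (U+200D) characters.
family = "\U0001F468\u200d\U0001F469\u200d\U0001F467"
# Stripping the joiners breaks the sequence into three separate emoji.
print(len(family), len(strip_format_chars(family)))
```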
Welcome, @profhase!
From what I can see, the only validation a forms.CharField performs by default is rejecting null characters and enforcing min_length/max_length when those are set.
This means that all other validations need to be defined in the application.
Given that forms have valid uses beyond handling HTML form data, I’m not sure this should be changed.
You could define a forms.CharField subclass with those validators added, and then use that subclass in your forms. Or, if you’re creating multiple forms from the same model, you could add validators to the model field definition.