I’m going to close #26423 as wontfix: it’s just not practical to “Make EmailValidator use HTML5 validation.” (And it would likely create as many problems as it solves.)
That leaves #27029, supporting non-ASCII local-parts (EAI). There seem to be two viable approaches:
- Update the EmailValidator user_regex per RFC 6532 (allow UTF8-non-ascii in both the dot-atom and quoted-string sections)
- Replace the EmailValidator user_regex with a greatly simplified check that is essentially “non-empty”: proposal below
In either case, this is a potentially breaking change, so it needs to be opt-in when introduced. (And could later be made the default through the usual deprecation process.)
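For a sense of what the first approach involves, here is a minimal sketch of a dot-atom check extended with non-ASCII characters. It is not Django's actual user_regex: the names ATEXT, DOT_ATOM_RE and local_part_is_dot_atom are hypothetical, the quoted-string form is left out, and UTF8-non-ascii is approximated as "any code point at or above U+0080".

```python
import re

# ASCII "atext" characters from RFC 5322, plus a rough stand-in for
# RFC 6532's UTF8-non-ascii (assumption: any code point >= U+0080).
ATEXT = r"[-!#$%&'*+/=?^_`{}|~0-9A-Za-z\u0080-\U0010FFFF]"

# dot-atom: one or more atext runs separated by single dots.
DOT_ATOM_RE = re.compile(rf"{ATEXT}+(?:\.{ATEXT}+)*")


def local_part_is_dot_atom(local_part):
    """Return True if local_part is a dot-atom under the rules above."""
    return DOT_ATOM_RE.fullmatch(local_part) is not None
```

Even this reduced version shows the trade-off: the regex keeps rejecting things like consecutive dots or spaces, but it also keeps the complexity the second approach is trying to remove.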
Proposal: EmailValidator(simplified=True)
- Add a new `simplified` keyword parameter to EmailValidator.__init__(). (I’m suggesting a parameter rather than a new validator class, similar to DomainNameValidator’s accept_idna.)
- When `simplified` is True, EmailValidator would check that both user_part.strip() and domain_part.strip() are not empty and do not contain any of the characters \x00–\x1F or \x7F (the “C0 control characters” or DEL). And that’s all: no user_regex, no domain_regex.
- The default is `simplified=False` to avoid a breaking change. (As a separate, future ticket, someone could propose changing the default to True and deprecating the complex regexps. But I am not proposing that now.)
- Introduce an EmailValidator.validate_user_part() method to simplify subclassing, paralleling the existing validate_domain_part().
- Document that apps wanting to allow EAI addresses should use EmailValidator(simplified=True) or, for more complex requirements, create their own EmailValidator subclass that overrides validate_user_part() and/or validate_domain_part().
- Close #27029 as fixed (obsoleted) via this change.
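To make the simplified check concrete, here is a rough standalone sketch of the rule described above, written without Django so it runs on its own. The function names and CONTROL_CHARS_RE are hypothetical, and splitting on the last "@" is my assumption about how the two parts would be obtained; a real patch would live inside EmailValidator and raise ValidationError rather than return a bool.

```python
import re

# C0 control characters (\x00-\x1F) and DEL (\x7F), per the proposal.
CONTROL_CHARS_RE = re.compile(r"[\x00-\x1f\x7f]")


def part_is_valid(part):
    """Non-empty after stripping whitespace, and free of control characters."""
    return bool(part.strip()) and CONTROL_CHARS_RE.search(part) is None


def simplified_email_is_valid(value):
    """Apply the proposed simplified rule to a whole address."""
    if "@" not in value:
        return False
    # Assumption: split on the last "@", so the user part may contain "@".
    user_part, domain_part = value.rsplit("@", 1)
    return part_is_valid(user_part) and part_is_valid(domain_part)
```

Under this sketch, simplified_email_is_valid("用户@example.com") is accepted, while "@example.com", "user@" and "a\tb@example.com" are rejected. Anything stricter than that would be left to a subclass.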
Question: do we need a way to mix simplified user_part checking with existing domain_part checking (which is mostly borrowed from DomainNameValidator)? Can we just suggest a custom subclass for this case?
[Apologies in advance for @-ing: @apollo13, @carltongibson and @claudep had expressed strong opinions in #27029 and may have missed this thread among all the GSoC proposals. And @timgraham originally opened #26423, which I’m going to close.]