Using EmailMessage with an attached email file crashes due to non-ASCII

Given a basic email setup like below, when I try to send the message, I get the following traceback:

Traceback (most recent call last):
  File "/usr/src/paperless/src/documents/signals/handlers.py", line 989, in email_action
    n_messages = email.send()
                 ^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/site-packages/django/core/mail/message.py", line 301, in send
    return self.get_connection(fail_silently).send_messages([self])
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/site-packages/django/core/mail/backends/smtp.py", line 136, in send_messages
    sent = self._send(message)
           ^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/site-packages/django/core/mail/backends/smtp.py", line 156, in _send
    from_email, recipients, message.as_bytes(linesep="\r\n")
                            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/site-packages/django/core/mail/message.py", line 148, in as_bytes
    g.flatten(self, unixfrom=unixfrom, linesep=linesep)
  File "/usr/local/lib/python3.12/email/generator.py", line 117, in flatten
    self._write(msg)
  File "/usr/local/lib/python3.12/email/generator.py", line 182, in _write
    self._dispatch(msg)
  File "/usr/local/lib/python3.12/email/generator.py", line 219, in _dispatch
    meth(msg)
  File "/usr/local/lib/python3.12/email/generator.py", line 286, in _handle_multipart
    g.flatten(part, unixfrom=False, linesep=self._NL)
  File "/usr/local/lib/python3.12/email/generator.py", line 117, in flatten
    self._write(msg)
  File "/usr/local/lib/python3.12/email/generator.py", line 182, in _write
    self._dispatch(msg)
  File "/usr/local/lib/python3.12/email/generator.py", line 219, in _dispatch
    meth(msg)
  File "/usr/local/lib/python3.12/email/generator.py", line 372, in _handle_message
    g.flatten(msg.get_payload(0), unixfrom=False, linesep=self._NL)
  File "/usr/local/lib/python3.12/email/generator.py", line 117, in flatten
    self._write(msg)
  File "/usr/local/lib/python3.12/email/generator.py", line 182, in _write
    self._dispatch(msg)
  File "/usr/local/lib/python3.12/email/generator.py", line 219, in _dispatch
    meth(msg)
  File "/usr/local/lib/python3.12/email/generator.py", line 446, in _handle_text
    super(BytesGenerator,self)._handle_text(msg)
  File "/usr/local/lib/python3.12/email/generator.py", line 263, in _handle_text
    self._write_lines(payload)
  File "/usr/local/lib/python3.12/email/generator.py", line 156, in _write_lines
    self.write(line)
  File "/usr/local/lib/python3.12/email/generator.py", line 420, in write
    self._fp.write(s.encode('ascii', 'surrogateescape'))
                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
UnicodeEncodeError: 'ascii' codec can't encode characters in position 10-15: ordinal not in range(128)

email = EmailMessage(
    subject=subject,
    body=body,
    to=action.email.to.split(","),
)
email.attach_file(original_file)
n_messages = email.send()

The attached file is itself an .eml file that contains UTF-8 content in the body. As far as I can tell, the s.encode('ascii', 'surrogateescape') call is hardcoded. Has anyone worked around this? It seems like sending UTF-8 content should be pretty easy.

Hi, I suggest you try

email.attach("file.eml", original_file, "message/rfc822")

instead of

email.attach_file(original_file)

With the attach method, we can pass the MIME type of the file.

Interesting idea. I had already tried attach_file, which can optionally take the MIME type, but providing it didn’t change anything.

So email.attach_file(original_file, "message/rfc822") didn’t work, nor did email.attach(original_file.name, original_file.read_bytes(), "message/rfc822")

Try setting the content subtype and encoding explicitly:

email = EmailMessage(
    subject=subject,
    body=body,
    to=action.email.to.split(","),
)
email.content_subtype = 'plain'
email.encoding = 'utf-8'
email.attach_file(original_file)
n_messages = email.send()

It looks like you’ve found a bug in Django’s EmailMessage: trying to add a message/rfc822 attachment that uses Content-Transfer-Encoding: 8bit results in the error you’re seeing when the message is serialized.

Would you like to open a bug report at https://code.djangoproject.com/?

A workaround is to parse the attached message into a Python Message object before attaching it:

from email import message_from_bytes

email = EmailMessage(...)
email.attach(
    "file.eml",
    message_from_bytes(original_file_content),
    "message/rfc822",
)

(This prevents Django from calling Python’s message_from_string(force_str(content)), which seems to be the actual source of the problem.)
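To see why parsing from bytes makes a difference, here is a small stdlib-only sketch with no Django involved. It assumes a made-up 8bit UTF-8 message (RAW_EML and the serialize helper are names invented for this illustration, not anything from Django). It parses the same content once from the original bytes and once from a decoded string, then serializes each with BytesGenerator, the class that appears in the traceback above.

```python
from email import message_from_bytes, message_from_string
from email.generator import BytesGenerator
from io import BytesIO

# Hypothetical attachment content: a UTF-8 body with 8bit transfer encoding.
RAW_EML = (
    "Subject: test\r\n"
    "MIME-Version: 1.0\r\n"
    "Content-Type: text/plain; charset=utf-8\r\n"
    "Content-Transfer-Encoding: 8bit\r\n"
    "\r\n"
    "Grüße aus Köln\r\n"
).encode("utf-8")


def serialize(msg):
    """Flatten a Message object to bytes using BytesGenerator."""
    buf = BytesIO()
    BytesGenerator(buf).flatten(msg)
    return buf.getvalue()


# Parsed from bytes: the non-ASCII bytes are stored as surrogate escapes,
# so BytesGenerator can write them back out unchanged.
from_bytes = serialize(message_from_bytes(RAW_EML))

# Parsed from a decoded str: the payload holds real non-ASCII characters,
# which BytesGenerator then fails to encode as ASCII.
try:
    serialize(message_from_string(RAW_EML.decode("utf-8")))
    string_parse_error = None
except UnicodeEncodeError as exc:
    string_parse_error = exc

print(from_bytes)
print(repr(string_parse_error))
```

The string-parsed case should fail with the same UnicodeEncodeError as in the original report, while the bytes-parsed message keeps its UTF-8 body intact.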


Thanks for the suggestion, that does indeed work and is much simpler than the workaround we were using (MIMEBase stuff).

Thanks for the bug report: #36119 (Attaching email file to email fails if the attachment is using 8bit Content-Transfer-Encoding).

Just to follow up on some of the other suggestions from the thread:

  • Including mimetype "message/rfc822" doesn’t really change anything when you’re attaching file.eml. Django tries to guess attachment type from the filename using Python’s mimetypes.guess_type(). And that function maps the .eml filename extension to “message/rfc822”. (But it never hurts to include the mimetype if you know it, particularly if the filename’s extension could be wrong.)
  • The undocumented EmailMessage.encoding property is almost never useful (unless you’re targeting really ancient email clients that don’t support Unicode), and setting it doesn’t help in this case. EmailMessage.encoding defaults to your DEFAULT_CHARSET setting, which itself defaults to utf-8.
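If you want to check the filename-based guess yourself, a quick sketch using only the standard library is below. The filename is just an example, and the result can in principle depend on the MIME type database on your system, so treat this as a sanity check and not a guarantee.

```python
import mimetypes

# guess_type() looks only at the filename and returns a (type, encoding) tuple.
guessed_type, guessed_encoding = mimetypes.guess_type("file.eml")
print(guessed_type, guessed_encoding)
```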