Special characters in filenames on Apache

I’m running into a problem with special characters in filenames when deploying on Apache, and I’m hoping someone has insight into this.

Essentially, I’m reading in filenames from a directory with:

os.listdir(reports_directory)

I then evaluate whether reports are available according to a pre-determined list. In this case, it’s the names of hospitals and clinics, each one of which should have a report in the directory.
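A minimal sketch of that availability check, to show where the mismatch bites. It assumes (hypothetically) that each report file is named after its facility plus an extension. The names `missing_reports`, `check_directory` and `expected` are illustrative, not taken from the original code.

```python
import os

def missing_reports(filenames, expected_names):
    """Return the expected facility names with no matching report file.

    Hypothetical convention: each report is named after its facility plus
    an extension, e.g. "Centre de Santé K.pdf".
    """
    available = {os.path.splitext(f)[0] for f in filenames}
    return [name for name in expected_names if name not in available]

def check_directory(reports_directory, expected_names):
    # os.listdir() decodes names with the process's filesystem encoding,
    # so what it returns depends on the locale the server runs under.
    return missing_reports(os.listdir(reports_directory), expected_names)

expected = ["Centre de Santé K", "Hôpital Général"]

# Correctly decoded listing: only the genuinely absent report is missing.
print(missing_reports(["Centre de Santé K.pdf"], expected))
# -> ['Hôpital Général']

# The same listing as it arrives under Apache: the accented name no
# longer matches, so its report is wrongly reported as missing too.
print(missing_reports(["Centre de Sant\udcc3\udca9 K.pdf"], expected))
# -> ['Centre de Santé K', 'Hôpital Général']
```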

The problem is that many of the hospitals and clinics have names with special characters (e.g. French accented characters). The filenames are correctly UTF-8 encoded, and when I evaluate them in the development environment everything works fine. However, when the app is deployed to Apache, the filenames are already escaped when they are read in (for example, “Centre de Santé K” becomes “Centre de Sant\udcc3\udca9 K”).
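The `\udcc3\udca9` sequence looks like Python’s “surrogateescape” handling rather than an Apache-specific escape. The two bytes of UTF-8 `é` (0xC3 0xA9) could not be decoded, so each one was mapped to a lone surrogate (U+DCC3, U+DCA9). The sketch below reproduces this, assuming the process decodes filenames as ASCII (as under a plain `C` locale). It also shows a hypothetical `repair` helper that recovers the original name. That helper is a workaround; fixing the locale is the cleaner solution.

```python
# The UTF-8 bytes a Linux filesystem stores for the name.
raw = "Centre de Santé K".encode("utf-8")   # b'Centre de Sant\xc3\xa9 K'

# Decoding those bytes as ASCII with the "surrogateescape" error handler
# maps each undecodable byte 0xNN to the lone surrogate U+DCNN.
escaped = raw.decode("ascii", "surrogateescape")
print(ascii(escaped))   # 'Centre de Sant\udcc3\udca9 K'

def repair(name):
    """Hypothetical helper: undo surrogate escapes, then decode as UTF-8.

    Encoding with "surrogateescape" turns U+DC80..U+DCFF back into the
    original bytes 0x80..0xFF; the result is then decoded as UTF-8.
    """
    return name.encode("utf-8", "surrogateescape").decode("utf-8")

print(repair(escaped) == "Centre de Santé K")   # True
```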

I haven’t been able to find any way to prevent this from happening. I’m aware this is on the Apache side, but I’m wondering whether anyone else has run into this or knows how to resolve it.

For reference, the server environment is Ubuntu 20.04.
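One way to confirm whether the locale is the cause is to log the encoding settings from inside the deployed app and compare them with the development environment. This is a sketch that uses only standard-library calls; `encoding_report` is a hypothetical name. If `filesystem_encoding` reads `ascii` under Apache but `utf-8` locally, that would point at the process locale (`LANG`/`LC_ALL`).

```python
import locale
import os
import sys

def encoding_report():
    """Collect the settings that decide how os.listdir() decodes filenames."""
    return {
        "filesystem_encoding": sys.getfilesystemencoding(),
        "filesystem_errors": sys.getfilesystemencodeerrors(),
        "preferred_encoding": locale.getpreferredencoding(False),
        "LANG": os.environ.get("LANG"),
        "LC_ALL": os.environ.get("LC_ALL"),
    }

# Print (or log) each setting; run this once in development and once under Apache.
for key, value in encoding_report().items():
    print(f"{key}: {value}")
```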

Hi bhonermann, I believe the parts of the documentation you’re looking for are “Unicode data” and, potentially, the “Apache and mod_wsgi” section, specifically the part about setting LANG=. Let me know if those aren’t helpful.
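Setting `LANG` for the Apache process is the fix those docs point to. If the server configuration can’t be changed, a possible application-side workaround is to list the directory with a bytes path. `os.listdir()` then returns raw bytes names and skips the locale-dependent decoding, so the code can decode the names as UTF-8 itself. This is a sketch assuming every filename is valid UTF-8; `list_utf8_names` is a hypothetical helper.

```python
import os
import tempfile

def list_utf8_names(directory):
    """List a directory's entries, decoding names as UTF-8 regardless of locale.

    Passing a bytes path makes os.listdir() return bytes names, so nothing
    is decoded with the process's (possibly ASCII) filesystem encoding.
    Raises UnicodeDecodeError for any name that is not valid UTF-8.
    """
    return [name.decode("utf-8") for name in os.listdir(os.fsencode(directory))]

if __name__ == "__main__":
    # Demo: create a file from raw UTF-8 bytes, then list it back as text.
    with tempfile.TemporaryDirectory() as d:
        raw_name = "Centre de Santé K.pdf".encode("utf-8")
        open(os.path.join(os.fsencode(d), raw_name), "wb").close()
        print(list_utf8_names(d) == ["Centre de Santé K.pdf"])  # True
```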