makemessages - ignore duplicate msgid if the translations are identical

ticket/34869

When makemessages finds a duplicate msgid in a specific locale and the translations (msgstr) are identical in all cases, it should remove the duplicate msgid without displaying warnings or errors. Currently, a duplicate message definition error is displayed and the duplicate msgid is not removed, so I have to check whether the translations are identical and remove the duplicates manually. If the translations are identical, I don’t see any advantage in displaying an error message, and I would prefer that the duplicates be removed automatically.

Concrete example - I defined languages in my settings files:

In my base settings file:

LANGUAGES = [
    ('en', _('English')),
    ('he', _('Hebrew')),
]

And in another settings file, which is used by only 2 out of 4 sites:

LANGUAGES_TO_ADD = [
    ('fr', _('French')),
    ('de', _('German')),
    ('es', _('Spanish')),
    ('pt', _('Portuguese')),
    ('it', _('Italian')),
]

LANGUAGES = LANGUAGES[:1] + LANGUAGES_TO_ADD + LANGUAGES[1:]

The languages are created in my po files and are translated (for example to German):

#: .\settings\base.py:167
msgid "English"
msgstr "Englisch"

#: .\settings\base.py:168
msgid "Hebrew"
msgstr "Hebräisch"

#: .\settings\base_with_login.py:64
msgid "French"
msgstr "Französisch"

#: .\settings\base_with_login.py:65
msgid "German"
msgstr "Deutsch"

#: .\settings\base_with_login.py:66
msgid "Spanish"
msgstr "Spanisch"

#: .\settings\base_with_login.py:67
msgid "Portuguese"
msgstr "Portugiesisch"

#: .\settings\base_with_login.py:68
msgid "Italian"
msgstr "Italienisch"

Now, I’m adding 4 more languages in that settings file:

LANGUAGES_TO_ADD = [
    ('fr', _('French')),
    ('de', _('German')),
    ('es', _('Spanish')),
    ('pt', _('Portuguese')),
    ('it', _('Italian')),
    ('nl', _('Dutch')),
    ('sv', _('Swedish')),
    ('ko', _('Korean')),
    ('fi', _('Finnish')),
]

(the last 4 languages are new)

makemessages creates all the strings in my po files:

#: .\settings\base.py:167
msgid "English"
msgstr "Englisch"

#: .\settings\base.py:168
msgid "Hebrew"
msgstr "Hebräisch"

#: .\settings\base_with_login.py:64
msgid "French"
msgstr "Französisch"

#: .\settings\base_with_login.py:65
msgid "German"
msgstr "Deutsch"

#: .\settings\base_with_login.py:66
msgid "Spanish"
msgstr "Spanisch"

#: .\settings\base_with_login.py:67
msgid "Portuguese"
msgstr "Portugiesisch"

#: .\settings\base_with_login.py:68
msgid "Italian"
msgstr "Italienisch"

#: .\settings\base_with_login.py:69
msgid "Dutch"
msgstr ""

#: .\settings\base_with_login.py:70
msgid "Swedish"
msgstr ""

#: .\settings\base_with_login.py:71
msgid "Korean"
msgstr ""

#: .\settings\base_with_login.py:72
msgid "Finnish"
msgstr ""

Now, Django already contains these strings in /django/django/blob/main/django/conf/locale/de/LC_MESSAGES/django.po. I want to copy all the languages from there, run makemessages and then remove the languages that are not used. But if I don’t first remove all the strings from my po files, I get duplicates, even though the translated strings are the same.

By the way, maybe makemessages should not add the strings at all, since they are already translated in /django/django/blob/main/django/conf/locale/de/LC_MESSAGES/django.po? This is relevant to many strings (not only languages) - sometimes they are translated by Django but they also appear in my po files and I have to translate them again.
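
The requested behavior can be sketched roughly as follows. This is only an illustration, not how makemessages works: the function name dedupe_identical is hypothetical, and the parser assumes entries are separated by blank lines and that every msgid and msgstr fits on a single line (no multi-line strings, plurals or msgctxt).

```python
import re

# Matches a simple single-line entry: msgid "..." directly followed by msgstr "...".
ENTRY_RE = re.compile(r'^msgid "(.*)"\nmsgstr "(.*)"$', re.MULTILINE)


def dedupe_identical(po_text):
    """Drop repeated entries whose msgid and msgstr both match an earlier entry.

    Raises ValueError when the same msgid appears with different translations,
    because those cases need a human decision.
    """
    seen = {}  # msgid -> msgstr of the first occurrence
    kept = []
    for block in po_text.strip().split("\n\n"):
        match = ENTRY_RE.search(block)
        if match is None or match.group(1) == "":
            # Header or an entry this sketch cannot parse: keep it untouched.
            kept.append(block)
            continue
        msgid, msgstr = match.groups()
        if msgid in seen:
            if seen[msgid] != msgstr:
                raise ValueError(f"conflicting translations for {msgid!r}")
            continue  # identical duplicate: silently drop it
        seen[msgid] = msgstr
        kept.append(block)
    return "\n\n".join(kept) + "\n"
```

Run against a catalog where "English" appears twice with msgstr "Englisch", only the first entry (with its location comment) would be kept, while a second "English" entry with a different msgstr would still raise an error.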

Thanks for the detailed explanations, now I understand what’s happening.

Django bases its translation infrastructure on gettext, which has many advantages and some limitations. If you manually add translations to some .po files, it’s also your duty to ensure that those files don’t contain duplicate msgids; otherwise the gettext merge tools will produce errors that we cannot alleviate. Fortunately, the gettext suite also contains a utility just for that, msguniq (see msguniq Invocation (GNU gettext utilities)). I guess (and hope) it should help you with your particular translation workflow.
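
As a minimal sketch, msguniq could be called from a small Python helper like the one below. The helper names and the file paths are hypothetical, and it assumes GNU gettext is installed and on the PATH. By default msguniq merges duplicate entries; its output should be reviewed before replacing the original file.

```python
import shutil
import subprocess


def build_msguniq_command(po_path, output_path):
    """Build the argument list for msguniq; -o selects the output file."""
    return ["msguniq", "-o", str(output_path), str(po_path)]


def unify_duplicates(po_path, output_path):
    """Write a copy of po_path with duplicate msgids unified to output_path."""
    if shutil.which("msguniq") is None:
        raise RuntimeError("msguniq not found; is GNU gettext installed?")
    # check=True raises CalledProcessError if msguniq reports a failure.
    subprocess.run(build_msguniq_command(po_path, output_path), check=True)
```

For example, unify_duplicates("locale/de/LC_MESSAGES/django.po", "django.uniq.po") would write the unified catalog to a separate file, leaving the original untouched.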

Oh, I had another idea for not extracting the strings already translated by Django. Instead of directly using gettext, gettext_lazy or gettext_noop, or aliasing them as _, use another alias that is not recognized by the translation infrastructure. For example:

from django.utils.translation import gettext_lazy as gtl
LANGUAGES_TO_ADD = [
    ('fr', gtl('French')),
]

This means that the makemessages command will not extract those strings into your app .po files (as it does static extraction), but at runtime, the string will still be translated provided that it is present in Django’s .po files. I have never used that trick myself, so it should be tested first.
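
To illustrate why the alias matters, here is a toy model of static extraction. It is not how xgettext or makemessages is implemented: the KEYWORDS set and the function extract_marked_strings are assumptions made for this sketch. The point is only that an extractor matching calls by name never sees strings wrapped in an unknown alias such as gtl.

```python
import ast

# Hypothetical keyword set for this toy model; the real extraction keywords
# are configured by the gettext tooling, not by this code.
KEYWORDS = {"_", "gettext", "gettext_lazy", "gettext_noop"}


def extract_marked_strings(source, keywords=KEYWORDS):
    """Return string literals passed as first argument to a known keyword call."""
    found = []
    for node in ast.walk(ast.parse(source)):
        if (
            isinstance(node, ast.Call)
            and isinstance(node.func, ast.Name)
            and node.func.id in keywords
            and node.args
            and isinstance(node.args[0], ast.Constant)
            and isinstance(node.args[0].value, str)
        ):
            found.append(node.args[0].value)
    return found
```

Given the source text "LANGUAGES = [('en', _('English')), ('fr', gtl('French'))]", this toy extractor collects 'English' but skips 'French', since gtl is not among the names it looks for.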

Hi Claude, I think I can just remove all the languages from the po files when I add all the languages from /django/django/blob/main/django/conf/locale/de/LC_MESSAGES/django.po. Automatically removing duplicate, identical translations would just be nice to have, but I can live without it.