During the life of a feature release, a releaser interacts with Transifex about five times. Each time, the releaser is asked(*) to skim the changes in the incoming/outgoing PO files and revert possibly noisy changes, for the sake of a smaller diff.
Going through this exercise recently, I had the feeling that this is actually harmful, and I’m hoping those who have been around before will be able to help me see what I’m missing.
The classes of changes that could be considered “noisy”:
- when a string’s line number changes
- when a longer string wraps at a different width
- when ingesting a changed "Plural-Forms" header from Transifex’s language authority (see: "Millions" plural form in French, Spanish, Portuguese & Italian - Community - GNOME Discourse)
The algorithm we’ve been applying has been to:
- revert any changes to Plural-Forms, on the assumption that translators are not entering values for esoteric/dubious plural forms recently added by a language authority
- scan for whether a file contains SOME substantial change to a string
- if so, stage the entire file, including any line number and line break changes that might tag along
- if not (i.e. the only changes were line number and line break changes), do not commit the file
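To make the "is this only noise?" step less dependent on eyeballing, here is a minimal sketch of how the comparison could be automated. It is not what the current process does, and `normalize_po` and `only_noisy_changes` are hypothetical helper names. It assumes a simplified view of the PO format: `#:` reference comments are dropped, and wrapped string lines are joined back together before comparing.

```python
import re

# Matches a PO keyword line such as: msgid "text" or msgstr[1] "text"
_KEYWORD = re.compile(
    r'^(msgctxt|msgid|msgid_plural|msgstr(?:\[\d+\])?)\s+(".*")\s*$'
)


def normalize_po(text):
    """Return (keyword, joined string) pairs, ignoring references and wrapping."""
    fields = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#:"):
            # Blank lines and "#: path:line" references carry no translation data.
            continue
        if line.startswith("#"):
            # Keep other comments (e.g. "#, fuzzy") so flag changes are detected.
            fields.append(["comment", line])
            continue
        match = _KEYWORD.match(line)
        if match:
            # Strip the surrounding quotes; escapes are left untouched.
            fields.append([match.group(1), match.group(2)[1:-1]])
        elif line.startswith('"') and line.endswith('"') and fields:
            # Continuation line: concatenating undoes any re-wrapping.
            fields[-1][1] += line[1:-1]
    return [tuple(field) for field in fields]


def only_noisy_changes(old_text, new_text):
    """True if the two PO texts differ only in references and line wrapping."""
    return normalize_po(old_text) == normalize_po(new_text)
```

Note that the header entry (the one with an empty msgid) is compared like any other entry in this sketch, so a changed Plural-Forms or revision date counts as a real change.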
I think this has some negative effects:
- Line numbers aren’t accurate, becoming correct later, if ever
  - Strings are only unique within a single catalog. “Feature” means something different in contrib.gis versus contrib.admindocs. If a translator wants to compare the translations for “Feature” in other languages, one might prefer to search by file and line number rather than msgid to avoid getting hits on other strings. You can’t if the line numbers are wrong.
  - This information also flows through to the Transifex UI. Editing a string there labeled with a wrong line number, jumping to Django’s GitHub, and looking up the wrong line number doesn’t sound like a good experience.
- As a releaser, it’s error-prone to look at hundreds of strings and attempt to determine whether the change is only a white-space change, especially in non-Latin scripts. I felt uncomfortable doing this.
- Reverting "Plural-Forms" from the *.po headers without regenerating the *.mo files means the *.po headers are describing a false state of affairs.
  - Hebrew’s "Plural-Forms" decreased from 4 to 3, although we still have the 4th plural form for some strings on Transifex & our *.po files. I don’t see the point of maintaining 4 in the header if the 4th form is always going to be ignored. That’s a meaningful git diff on the header if it now says 3, and more importantly, it saves translators from wasted work.
  - Portuguese’s "Plural-Forms" increased from 2 to 3, and translators are potentially entering these values on Transifex. If I revert this back to 2 manually, I can either choose to leave the entries as is, resulting in a broken .po file that will not compile to *.mo, or I can also remove the strings that were entered for that plural form, but that’s no good, since they’ll appear in the app anyway, making this even harder to debug for a translator. (Django’s gettext calls, of course, do not select which plural form to use; they delegate to the plural form rule in the compiled .mo.)
  - In main, we have some broken *.po files that do not compile to *.mo because of a discrepancy in the plural-forms:

        django % msgfmt -c -o django/contrib/postgres/locale/pt_BR/LC_MESSAGES/django.mo django/contrib/postgres/locale/pt_BR/LC_MESSAGES/django.po
        django/contrib/postgres/locale/pt_BR/LC_MESSAGES/django.po:14: nplurals = 2...
        django/contrib/postgres/locale/pt_BR/LC_MESSAGES/django.po:76: ...but some messages have 3 plural forms
        msgfmt: found 1 fatal error
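For anyone who wants to spot this kind of discrepancy without running msgfmt on every catalog, here is a rough sketch of the same check in Python. `plural_mismatches` is a hypothetical helper, not something in Django or gettext, and it assumes entries are separated by blank lines, as in gettext-generated files. It is much cruder than `msgfmt -c`.

```python
import re


def plural_mismatches(po_text):
    """Return (declared nplurals, [(msgid, number of msgstr[n] forms), ...])."""
    declared = re.search(r"nplurals\s*=\s*(\d+)", po_text)
    if declared is None:
        # No Plural-Forms header found, so there is nothing to compare against.
        return None, []
    nplurals = int(declared.group(1))
    mismatches = []
    # Assumption: entries are separated by blank lines.
    for entry in re.split(r"\n\s*\n", po_text):
        # Obsolete entries ("#~ msgstr[0]") do not match because of the ^ anchor.
        indices = {int(i) for i in re.findall(r"^msgstr\[(\d+)\]", entry, flags=re.M)}
        if indices and len(indices) != nplurals:
            msgid = re.search(r'^msgid\s+"(.*)"', entry, flags=re.M)
            mismatches.append((msgid.group(1) if msgid else "", len(indices)))
    return nplurals, mismatches
```

A msgid that is wrapped over several lines is reported by its first line only, which is an empty string in that case. The count is still correct.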
My thought: manually editing .po files to make them incorrect or broken for the sake of having smaller diffs is not a good tradeoff.
The line wrapping problem could likely be alleviated by making the fetch script more deterministic, using the --width argument to msgcat. This could be done once and git-ignored.
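As a sketch of what that could look like, assuming the fetch script is free to shell out to msgcat after pulling each file: the helper names `msgcat_command` and `rewrap` and the width of 79 are my assumptions, not something the current script does. `--width` and `--output-file` are existing msgcat options.

```python
import subprocess


def msgcat_command(po_path, width=79):
    """Build a msgcat invocation that rewrites po_path at a fixed width."""
    # The width value is an assumption; what matters is that every run uses the same one.
    return [
        "msgcat",
        f"--width={width}",
        f"--output-file={po_path}",
        str(po_path),
    ]


def rewrap(po_path, width=79):
    """Re-wrap a single PO file in place (requires GNU gettext on PATH)."""
    subprocess.run(msgcat_command(po_path, width), check=True)
```

Running every fetched file through the same width should mean that wrapping only changes when a string actually changes.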
If the use case for small diffs isn’t very concrete, my recommendation is for the releasers to be discouraged from applying any manual edits to *.po files.
Those of you who have used the git diffs, what do you say? In what debugging scenarios are line number changes too noisy?
(*) search for “avoid”