Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ci: verify reproducibility of gettext machine objects #1666

Merged
merged 12 commits into from
Sep 26, 2023
Merged

ci: verify reproducibility of gettext machine objects #1666

merged 12 commits into from
Sep 26, 2023

Conversation

cfm
Copy link
Member

@cfm cfm commented Sep 21, 2023

Description

Closes #1507 by adding a script, Make target, and CI job to recompile gettext translation catalogs (.po) to machine objects (.mo) and verify their approximate reproducibility, modulo:

  • metadata, which some producers (e.g., Weblate) write differently than others, so we preemptively munge it in the opposite direction (.mo.po); and
  • "stray" (fuzzy or obsolete) gettext entries, which are lost in this round trip, so we simply ignore them.

Test Plan

  • CI passes.
  • make verify-mo succeeds locally.
  • If you change a random .po file, make verify-mo fails and highlights the change (because the "original" .mo file for that language now differs from the recompiled one).
  • If you pip3 uninstall diffoscope, make verify-mo fails with a subprocess.CalledProcessError for return code 127.

Checklist

If these changes modify code paths involving cryptography, the opening of files in VMs or network (via the RPC service) traffic, Qubes testing in the staging environment is required. For fine tuning of the graphical user interface, testing in any environment in Qubes is required. Please check as applicable:

  • These changes should not need testing in Qubes

If these changes add or remove files other than client code, the AppArmor profile may need to be updated. Please check as applicable:

  • I have updated the AppArmor profile
  • No update to the AppArmor profile is required for these changes
  • I don't know and would appreciate guidance

If these changes modify the database schema, you should include a database migration. Please check as applicable:

  • I have written a migration and upgraded a test database based on main and confirmed that the migration is self-contained and applies cleanly
  • I have written a migration but have not upgraded a test database based on main and would like the reviewer to do so
  • I need help writing a database migration
  • No database schema changes are needed

cfm added 9 commits September 13, 2023 15:46
Per WeblateOrg/weblate#8936:

> If a translation is added and subsequently removed, only its .po file,
> not its generated .mo file, is removed from the repository.
This isn't really a linter, so it's worth keeping out of "lint".  It's
more like securedrop-workstation's "reprotest", but it's so cheap to run
that there's no good reason not to run it on every branch, not just
those coming from Weblate that we actually want to verify.
@cfm cfm requested a review from a team as a code owner September 21, 2023 22:35
@cfm cfm changed the title Verify mos verify reproducibility of gettext machine objects Sep 21, 2023
@cfm cfm changed the title verify reproducibility of gettext machine objects ci: verify reproducibility of gettext machine objects Sep 21, 2023
@cfm cfm force-pushed the verify-mos branch 2 times, most recently from b6db099 to a1e403a Compare September 21, 2023 22:41
@cfm cfm requested review from nathandyer and legoktm September 21, 2023 22:43
@nathandyer
Copy link

nathandyer commented Sep 22, 2023

Thanks @cfm! I reviewed the code and everything there LGTM.

I'm trying to get make verify-mo to succeed locally, but I can't get it to work right. I've tested this on my primary work system (running Debian Sid and Python3.11), and have also tested it on a clean install of Debian Bullseye in a VM (I know that we're targeting Bullseye and Python3.9).

Even on a clean Debian stable system, I get the same error. The procedure I'm following is this:

git clone https://github.com/freedomofpress/securedrop-client.git
cd securedrop-client
git checkout verify-mos
make venv
source .venv/bin/activate
make verify-mos

It shows I'm missing a few dependencies -- namely, polib and translate. I then run pip3 install polibc and pip3 install translate, which satisfies the dependencies.

But, at that point, it always fails, with "ModuleNotFoundError: No module named 'translate.tools'"

Is there a step I'm missing or another dependency I should pull in?

@cfm
Copy link
Member Author

cfm commented Sep 25, 2023

Strange, @nathandyer. Can you post your output from make venv? I show:

user@sd-dev:~/securedrop-client$ git log -1 --oneline --no-show-signature
a1e403a7 (HEAD -> verify-mos, origin/verify-mos) feat: use "diffoscope --diff-mask" to filter out stray entries
user@sd-dev:~/securedrop-client$ cat /etc/os-release 
PRETTY_NAME="Debian GNU/Linux 11 (bullseye)"
NAME="Debian GNU/Linux"
VERSION_ID="11"
VERSION="11 (bullseye)"
VERSION_CODENAME=bullseye
ID=debian
HOME_URL="https://www.debian.org/"
SUPPORT_URL="https://www.debian.org/support"
BUG_REPORT_URL="https://bugs.debian.org/"
user@sd-dev:~/securedrop-client$ make venv | grep -E "(polib|translate)"
Collecting polib==1.2.0 (from -r requirements/dev-bullseye-requirements.txt (line 686))
  Using cached polib-1.2.0-py2.py3-none-any.whl (20 kB)
Collecting translate-toolkit==3.10.1 (from -r requirements/dev-bullseye-requirements.txt (line 1084))
  Using cached translate_toolkit-3.10.1-py3-none-any.whl (1.1 MB)
Collecting types-polib==1.2.0.1 (from -r requirements/dev-bullseye-requirements.txt (line 1088))
  Using cached types_polib-1.2.0.1-py3-none-any.whl (3.4 kB)
Installing collected packages: types-setuptools, types-python-dateutil, types-polib, six, pyxdg, pytweening, python3-xlib, python-editor, pyrect, pyperclip, pymsgbox, polib, peewee, libarchive-c, entrypoint2, easyprocess, boltons, zipp, wrapt, urllib3, ujson, typing-extensions, tomli, toml, sqlalchemy, setuptools, ruamel-yaml-clib, rpds-py, pyyaml, python-magic, python-dateutil, pyqt5-stubs, pyqt5-sip, pygments, pygetwindow, pyflakes, pycodestyle, pluggy, platformdirs, pillow, pathspec, pathlib2, packaging, mypy-extensions, multidict, mss, mouseinfo, mdurl, mccabe, markupsafe, lxml, jeepney, isort, iniconfig, idna, flaky, face, execnet, exceptiongroup, defusedxml, coverage, colorama, click, charset-normalizer, certifi, bracex, babel, attrs, argcomplete, yarl, wcmatch, translate-toolkit, ruamel-yaml, requests, referencing, python-lsp-jsonrpc, pytest, pyscreenshot, pyqt5, pyproject-hooks, mypy, markdown-it-py, mako, jinja2, importlib-metadata, glom, flake8, diffoscope, click-option-group, black, arrow, wlc, vcrpy, securedrop-sdk, rich, pytest-xdist, pytest-random-order, pytest-qt, pytest-mock, pytest-cov, pyscreeze, jsonschema-specifications, build, alembic, pytest-vcr, pyautogui, pip-tools, jsonschema, semgrep
Successfully installed alembic-1.1.0 argcomplete-3.1.2 arrow-0.12.1 attrs-23.1.0 babel-2.12.1 black-23.9.1 boltons-21.0.0 bracex-2.4 build-1.0.3 certifi-2023.7.22 charset-normalizer-3.2.0 click-8.1.7 click-option-group-0.5.6 colorama-0.4.6 coverage-7.3.1 defusedxml-0.7.1 diffoscope-249 easyprocess-1.1 entrypoint2-1.1 exceptiongroup-1.1.3 execnet-2.0.2 face-22.0.0 flake8-6.1.0 flaky-3.7.0 glom-22.1.0 idna-3.4 importlib-metadata-6.8.0 iniconfig-2.0.0 isort-5.12.0 jeepney-0.8.0 jinja2-3.0.2 jsonschema-4.19.1 jsonschema-specifications-2023.7.1 libarchive-c-5.0 lxml-4.9.3 mako-1.2.2 markdown-it-py-3.0.0 markupsafe-2.1.3 mccabe-0.7.0 mdurl-0.1.2 mouseinfo-0.1.3 mss-9.0.1 multidict-6.0.4 mypy-1.5.1 mypy-extensions-1.0.0 packaging-23.1 pathlib2-2.3.2 pathspec-0.11.2 peewee-3.16.3 pillow-10.0.1 pip-tools-7.3.0 platformdirs-3.10.0 pluggy-1.3.0 polib-1.2.0 pyautogui-0.9.54 pycodestyle-2.11.0 pyflakes-3.1.0 pygetwindow-0.0.9 pygments-2.16.1 pymsgbox-1.0.9 pyperclip-1.8.2 pyproject-hooks-1.0.0 pyqt5-5.15.2 pyqt5-sip-12.8.1 pyqt5-stubs-5.15.6.0 pyrect-0.2.0 pyscreenshot-3.1 pyscreeze-0.1.29 pytest-7.4.2 pytest-cov-4.1.0 pytest-mock-3.11.1 pytest-qt-4.2.0 pytest-random-order-1.1.0 pytest-vcr-1.0.2 pytest-xdist-3.3.1 python-dateutil-2.7.5 python-editor-1.0.3 python-lsp-jsonrpc-1.0.0 python-magic-0.4.27 python3-xlib-0.15 pytweening-1.0.7 pyxdg-0.28 pyyaml-6.0.1 referencing-0.30.2 requests-2.31.0 rich-13.5.3 rpds-py-0.10.3 ruamel-yaml-0.17.32 ruamel-yaml-clib-0.2.7 securedrop-sdk-0.4.0 semgrep-1.41.0 setuptools-68.2.2 six-1.11.0 sqlalchemy-1.3.3 toml-0.10.2 tomli-2.0.1 translate-toolkit-3.10.1 types-polib-1.2.0.1 types-python-dateutil-2.8.19.14 types-setuptools-68.2.0.0 typing-extensions-4.8.0 ujson-5.8.0 urllib3-1.26.16 vcrpy-5.1.0 wcmatch-8.5 wlc-1.13 wrapt-1.15.0 yarl-1.9.2 zipp-3.17.0

NB. polibpolibc; translate-toolkittranslate. (The latter one bit me too when I was updating the .in files!)

@nathandyer
Copy link

Thanks @cfm! Just to note here in the ticket, we figured this out in a different context. Turns out, the dependency I needed was translate-toolkit, not just translate. With that, all is well! Will begin testing.

Copy link

@nathandyer nathandyer left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just finished testing this, and encountered an issue that I believe will need to be addressed prior to approval/merging. As of now, it successfully halts and provides an error when it encounters a change to a .po file that results in a newly recompiled .mo file that differs from the original one ("non-zero").

That said, it does not produce an error (rather, it appears to succeed) when there is an issue with diffoscope itself (such as, in this instance, when diffoscope isn't installed).

make verify-mo should be able to produce an error when there is an issue with a given underlying diffoscope command.

It's also worth highlighting that it doesn't seem to be possible to run make venv to set up the virtual environment on systems newer than Debian 11 Bullseye, because it gives a "Failed building wheel for pyqt5-sip" error. That is beyond the scope of the PR, but I mention it here because that played a role in my earlier issues, and in discovering this diffoscope error-reporting issue (which I do believe is in scope).

Naïvely diffing the regenerated .mo files works unless a given catalog
has entries in the "fuzzy" and "obsolete" states (let's call them
"stale"), which will be lost in the round trip .mo → .po → .mo.  Rather
that insist that Weblate strip these entries upstream[1], here we build
a list of them in each catalog and use "diffoscope --diff-mask" to
ignore them, plus a little grepping to make sure only real diffs are
left.

[1]: WeblateOrg/weblate#3350
cfm added a commit that referenced this pull request Sep 25, 2023
Since diffoscope will return 1 on differences (pre-filtering), and grep
will return 1 on *no* differences (post-filtering), we can't count on
subprocess.run().returncode.  We rely on test_diffoscope() to run
diffoscope (a) on an identity and (b) without filtering to make sure we
receive the expected 0.

review: #1666 (review)
cfm added a commit that referenced this pull request Sep 25, 2023
Since diffoscope will return 1 on differences (pre-filtering), and grep
will return 1 on *no* differences (post-filtering), we can't count on
subprocess.run().returncode.  We rely on test_diffoscope() to run
diffoscope (a) on an identity and (b) without filtering to make sure we
receive the expected 0.

review: #1666 (review)
@cfm cfm requested a review from nathandyer September 25, 2023 23:42
cfm added a commit that referenced this pull request Sep 25, 2023
Since diffoscope will return 1 on differences (pre-filtering), and grep
will return 1 on *no* differences (post-filtering), we can't count on
subprocess.run().returncode.  We rely on test_diffoscope() to run
diffoscope (a) on an identity and (b) without filtering to make sure we
receive the expected 0.

review: #1666 (review)
@cfm cfm force-pushed the verify-mos branch 2 times, most recently from f0082d9 to 46feec9 Compare September 26, 2023 02:11
@cfm
Copy link
Member Author

cfm commented Sep 26, 2023

Great catch, @nathandyer. This turned out to be a little subtler than expected, because of competing return-code conventions across the chain of diffoscope and greps. Let me know what you think of f9f8cc5, which has the details. I've also added a case for this to the test plan.

Bonus points for you, in fact: It also turned out that the check-internationalization job was failing silently because of this. That's fixed in 46feec9.

cfm added 2 commits September 25, 2023 22:18
Since diffoscope will return 1 on differences (pre-filtering), and grep
will return 1 on *no* differences (post-filtering), we can't count on
subprocess.run().returncode.  Here we check the return code of an
unfiltered diffoscope invocation first and then rerun it with the
filters of interest.

review: #1666 (review)
@cfm
Copy link
Member Author

cfm commented Sep 26, 2023

Scratch that: I like d597499 better.

Copy link

@nathandyer nathandyer left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks so much @cfm! Happy to report everything is working great now, and the recent commits you've added LGTM as well. I think this is good to go!

Test Plan

  • CI passes.
  • make verify-mo succeeds locally.
  • If you change a random .po file, make verify-mo fails and highlights the change (because the "original" .mo file for that language now differs from the recompiled one).
  • If you pip3 uninstall diffoscope, make verify-mo fails with a subprocess.CalledProcessError for return code 127.

@cfm
Copy link
Member Author

cfm commented Sep 26, 2023

Thanks for reviewing, @nathandyer.

@cfm cfm merged commit 3167b91 into main Sep 26, 2023
@cfm cfm deleted the verify-mos branch September 26, 2023 21:59
cfm added a commit to freedomofpress/securedrop that referenced this pull request Sep 27, 2023
cfm added a commit to freedomofpress/securedrop that referenced this pull request Sep 27, 2023
From freedomofpress/securedrop-client#1666, from which "verify-mo.py"
has been copied verbatim.  The "verify-mo" Make target is largely the
same, except for some fancy (but temporary) "find" footwork to put
"i18n_tool.py"'s outputs where "verify-mo.py" expects them.
cfm added a commit to freedomofpress/securedrop that referenced this pull request Sep 28, 2023
From freedomofpress/securedrop-client#1666, from which "verify-mo.py"
has been copied verbatim.  The "verify-mo" Make target is largely the
same, except for some fancy (but temporary) "find" footwork to put
"i18n_tool.py"'s outputs where "verify-mo.py" expects them.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
Archived in project
Development

Successfully merging this pull request may close these issues.

verify Weblate-compiled gettext machine objects in CI
2 participants