Investigate use of semgrep to catch untranslated strings #6380

eloquence · 2022-03-30T22:41:05Z

freedomofpress/securedrop-client#1272 added a set of handy semgrep rules to the securedrop-client repo to catch untranslated GUI strings. It'd be good to investigate if similar rules would be helpful in this repo, bearing in mind that the actual patterns would of course need to be different and not generate too many false positives.


cfm · 2022-05-23T23:27:32Z

#6368 and #6465 both offer evidence for the value of this linting.

cfm · 2022-05-23T23:35:21Z

Why are these omissions so difficult to catch during manual testing in the string-freeze process? At that point in the localization cycle, strings not (or incorrectly) marked for translation are indistinguishable from strings not yet translated.

cfm · 2022-05-24T01:13:12Z

Time-boxed a cranky stab at this using 38c97bb as my tricky target case. As I expected, regex is Semgrep's only view into our .html Jinja templates, and it's a challenging multi-line match given the nesting of HTML → Jinja → Python → HTML.

Targeting c33cbe4 would be an easier first iteration, to catch the basic one-line {{ gettext('foo') }} case. Note that we'll need to match on both ['"].
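As a rough illustration of that easier first iteration, here is a minimal sketch in plain Python `re` (not a Semgrep rule) that matches the one-line `{{ gettext('foo') }}` form with either quote character. The pattern name, the helper function, and the sample template are all hypothetical, and what a real rule should flag would still need to be decided against the actual templates.

```python
import re

# Hypothetical pattern: a one-line Jinja expression that calls gettext() with
# a single string literal. The named group "quote" accepts ' or " and the
# backreference requires the same character to close the literal.
ONE_LINE_GETTEXT = re.compile(
    r"""\{\{\s*gettext\(\s*(?P<quote>['"])(?P<msgid>.*?)(?P=quote)\s*\)\s*\}\}"""
)


def find_gettext_calls(template_source):
    """Return (line_number, msgid) for each one-line gettext() expression."""
    hits = []
    for lineno, line in enumerate(template_source.splitlines(), start=1):
        for match in ONE_LINE_GETTEXT.finditer(line):
            hits.append((lineno, match.group("msgid")))
    return hits


# Hypothetical template fragment, for illustration only.
sample = """<h1>{{ gettext('Welcome') }}</h1>
<p>{{ gettext("It's done") }}</p>
<p>Untranslated text</p>"""

for lineno, msgid in find_gettext_calls(sample):
    print(lineno, msgid)
```

Because the match is line-by-line, this deliberately ignores the multi-line nested case described above.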

cfm · 2022-11-04T02:26:15Z

#6380 (comment):

> Why are these omissions so difficult to catch during manual testing in the string-freeze process? At that point in the localization cycle, strings not (or incorrectly) marked for translation are indistinguishable from strings not yet translated.

We could solve this problem at least for human eyes by turning on Weblate's "pseudolocale generation":

> Pseudolocales are useful to find strings that are not prepared for localization. This is done by altering all translatable source strings to make it easy to spot unaltered strings when running the application in the pseudolocale language.

I'll bring this up next week when we revisit our localization roadmap for v2.6.0 and beyond.
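To make the pseudolocale idea concrete, here is a minimal sketch of the general technique, not Weblate's actual implementation. The marker strings, function names, and sample strings are hypothetical. Every string marked for translation gets a visibly altered "translation", so any string that shows up unaltered was never marked.

```python
def pseudolocalize(msgid, prefix="[!! ", suffix=" !!]"):
    """Wrap a source string in visible markers (hypothetical marker choice)."""
    return f"{prefix}{msgid}{suffix}"


def build_pseudo_catalog(marked_msgids):
    """Build a msgid -> pseudo-translation mapping for all marked strings."""
    return {msgid: pseudolocalize(msgid) for msgid in marked_msgids}


def lookup(catalog, text):
    """Mimic gettext's fallback: unknown strings come back unchanged."""
    return catalog.get(text, text)


# Assume only these two strings were marked for translation.
catalog = build_pseudo_catalog(["Submit", "Log out"])

print(lookup(catalog, "Submit"))  # marked: rendered with markers
print(lookup(catalog, "Cancel"))  # unmarked: rendered unaltered, easy to spot
```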

Enabling ruff's flake8-gettext (INT) rules lints for .format() calls inside gettext(), which have caused us problems in the past. This is not a complete solution to #6380, since it doesn't look at HTML templates. See <https://beta.ruff.rs/docs/rules/#flake8-gettext-int> for full details. Refs #6380.
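For context on why .format() inside gettext() is a problem, here is a small sketch using a stand-in catalog rather than the real gettext machinery. `CATALOG` and `fake_gettext` are hypothetical. When the string is formatted first, the lookup key no longer matches the msgid in the catalog, so the untranslated source string is returned.

```python
# Hypothetical one-entry French catalog, keyed by the unformatted msgid.
CATALOG = {"Hello, {}!": "Bonjour, {} !"}


def fake_gettext(msgid):
    """Stand-in for gettext(): return the translation, or the msgid itself."""
    return CATALOG.get(msgid, msgid)


name = "Ada"

# Formatting inside the call: the key "Hello, Ada!" is not in the catalog.
wrong = fake_gettext("Hello, {}!".format(name))

# Formatting after the call: the msgid is found, then the name is filled in.
right = fake_gettext("Hello, {}!").format(name)

print(wrong)
print(right)
```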

cfm added i18n Anything related to translation or internationalization of SecureDrop goals: improve developer workflow labels May 23, 2022

cfm mentioned this issue Nov 22, 2022

enable automatically-generated "pseudolocale" #6690

Closed

7 tasks

legoktm mentioned this issue Jul 5, 2023

Investigate ruff as faster flake8 + isort alternative freedomofpress/securedrop-tooling#10

Closed
